date:20140908

Re: [Qemu-devel] [PATCH v2 0/5] Convert remaining legacy chardevs to QAPI

2014-09-08 Thread Paolo Bonzini

Il 02/09/2014 12:24, Peter Maydell ha scritto:
> This patchset converts the two remaining legacy chardevs
> ('socket' and 'udp') to use the new-style parse/kind
> mechanisms, and removes all the no-longer-required
> legacy machinery.
> 
> Patch 1 was posted to the list back in June
> (https://patches.linaro.org/32298/). I've fixed the obvious
> bug picked up in code review, and as far as I can tell from
> the thread we decided that the blocking/non-blocking
> difference between QAPI and legacy wasn't a problem.
> 
> Patch 2 fixes a hole in the functionality of QAPI-described
> UDP chardevs, to avoid regressing the commandline functionality
> when we convert the UDP backend in patch 3.
> 
> Patch 4 may be easier to review as an ignore-whitespaces
> diff (the de-indentation makes the diff a bit awkward).
> 
> Changes v1->v2:
>  * fixed the has_* values as suggested by Markus
>  * added patch 5 which renames the _qapi() function now
>the legacy version has gone (again, as suggested by
>Markus)
> 
> Peter Maydell (5):
>   qemu-char: Convert socket backend to QAPI
>   util/qemu-sockets.c: Support specifying IPv4 or IPv6 in socket_dgram()
>   qemu-char: Convert udp backend to QAPI
>   qemu-char: Remove register_char_driver() machinery
>   qemu-char: Rename register_char_driver_qapi() to
> register_char_driver()
> 
>  backends/baum.c   |   2 +-
>  backends/msmouse.c|   2 +-
>  backends/testdev.c|   2 +-
>  include/sysemu/char.h |   3 +-
>  qemu-char.c   | 353 
> --
>  spice-qemu-char.c |   8 +-
>  ui/console.c  |   3 +-
>  util/qemu-sockets.c   |   3 +-
>  8 files changed, 180 insertions(+), 196 deletions(-)
> 

Hi Peter, are you going to apply these directly?

Paolo

[Qemu-devel] [PATCH v2 6/9] virtio-serial: fix virtio-serial child refcount in transports

2014-09-08 Thread arei.gonglei

From: Gonglei 

object_initialize() leaves the object with a refcount of 1.
object_property_add_child() adds its own reference which is dropped
again when the property is deleted.

The upshot of this is that we always have a refcount >= 1.  Upon hot
unplug the virtio-serial child is not finalized!

Drop our reference after the child property has been added to the
parent.

Signed-off-by: Gonglei 
---
 hw/s390x/s390-virtio-bus.c | 1 +
 hw/s390x/virtio-ccw.c  | 1 +
 hw/virtio/virtio-pci.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c
index 31f5286..422402e 100644
--- a/hw/s390x/s390-virtio-bus.c
+++ b/hw/s390x/s390-virtio-bus.c
@@ -227,6 +227,7 @@ static void s390_virtio_serial_instance_init(Object *obj)
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_SERIAL);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
+object_unref(OBJECT(&dev->vdev));
 }
 
 static int s390_virtio_scsi_init(VirtIOS390Device *s390_dev)
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 271104d..5d7f3a6 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -853,6 +853,7 @@ static void virtio_ccw_serial_instance_init(Object *obj)
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_SERIAL);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
+object_unref(OBJECT(&dev->vdev));
 }
 
 static int virtio_ccw_balloon_init(VirtioCcwDevice *ccw_dev)
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 5a5b534..87ceea3 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1399,6 +1399,7 @@ static void virtio_serial_pci_instance_init(Object *obj)
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_SERIAL);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
+object_unref(OBJECT(&dev->vdev));
 }
 
 static const TypeInfo virtio_serial_pci_info = {
-- 
1.7.12.4

[Qemu-devel] [PATCH v2 0/9] virtio: fix virtio child recount in transports

2014-09-08 Thread arei.gonglei

From: Gonglei 

virtio-$device-{pci, s390, ccw} all duplicate the
qdev properties of their virtio child. This approach does
not work well with string or pointer properties since we
must be careful about leaking or double-freeing them.

Use the QOM alias property to forward property accesses to the
VirtIORNG child.  This way no duplication is necessary.

For their child, object_initialize() leaves the object with a refcount of 1.
object_property_add_child() adds its own reference which is dropped
again when the property is deleted.

The upshot of this is that we always have a refcount >= 1.  Upon hot
unplug the virtio-$device child is not finalized!

Drop our reference after the child property has been added to the
parent.

The v1 as below:
 http://lists.gnu.org/archive/html/qemu-devel/2014-09/msg01208.html

Changes since v1:
 1. using alias properties avoid to double-free property.(Stefan)
 2. add handling all other virtio-devices had the same probleam.
 3. same handling for CCW and s390-virito.
 
Acknowledgements:
 I copied Stefan's commit message about virtio-blk which summarized
 reasons very well, I cannot agree more with him. Holp Stefan do not
 mind, thank you so much!

Gonglei (9):
  virtio-net: use aliases instead of duplicate qdev properties
  virtio: fix virtio-net child refcount in transports
  virtio/vhost scsi: use aliases instead of duplicate qdev properties
  virtio/vhost-scsi: fix virtio-scsi/vhost-scsi child refcount in
transports
  virtio-serial: use aliases instead of duplicate qdev properties
  virtio-serial: fix virtio-serial child refcount in transports
  virtio-rng: use aliases instead of duplicate qdev properties
  virtio-rng: fix virtio-rng child refcount in transports
  virtio-balloon: fix virtio-balloon child refcount in transports

 hw/s390x/s390-virtio-bus.c | 16 ++--
 hw/s390x/virtio-ccw.c  | 18 +++---
 hw/virtio/virtio-pci.c | 18 +++---
 3 files changed, 32 insertions(+), 20 deletions(-)

-- 
1.7.12.4

[Qemu-devel] [PATCH v2 5/9] virtio-serial: use aliases instead of duplicate qdev properties

2014-09-08 Thread arei.gonglei

From: Gonglei 

virtio-serial-{pci, s390, ccw} all duplicate the
qdev properties of their VirtIOSerial child.
This approach does not work well with string or pointer
properties since we must be careful about leaking or
double-freeing them.

Use the QOM alias property to forward property accesses to the
VirtIOSerial child.  This way no duplication is necessary.

Signed-off-by: Gonglei 
---
 hw/s390x/s390-virtio-bus.c | 2 +-
 hw/s390x/virtio-ccw.c  | 2 +-
 hw/virtio/virtio-pci.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c
index 4276034..31f5286 100644
--- a/hw/s390x/s390-virtio-bus.c
+++ b/hw/s390x/s390-virtio-bus.c
@@ -226,6 +226,7 @@ static void s390_virtio_serial_instance_init(Object *obj)
 VirtIOSerialS390 *dev = VIRTIO_SERIAL_S390(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_SERIAL);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
 static int s390_virtio_scsi_init(VirtIOS390Device *s390_dev)
@@ -537,7 +538,6 @@ static const TypeInfo s390_virtio_blk = {
 };
 
 static Property s390_virtio_serial_properties[] = {
-DEFINE_VIRTIO_SERIAL_PROPERTIES(VirtIOSerialS390, vdev.serial),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index a466674..271104d 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -852,6 +852,7 @@ static void virtio_ccw_serial_instance_init(Object *obj)
 VirtioSerialCcw *dev = VIRTIO_SERIAL_CCW(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_SERIAL);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
 static int virtio_ccw_balloon_init(VirtioCcwDevice *ccw_dev)
@@ -1432,7 +1433,6 @@ static const TypeInfo virtio_ccw_blk = {
 
 static Property virtio_ccw_serial_properties[] = {
 DEFINE_PROP_STRING("devno", VirtioCcwDevice, bus_id),
-DEFINE_VIRTIO_SERIAL_PROPERTIES(VirtioSerialCcw, vdev.serial),
 DEFINE_PROP_BIT("ioeventfd", VirtioCcwDevice, flags,
 VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 00dbfc9..5a5b534 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1376,7 +1376,6 @@ static Property virtio_serial_pci_properties[] = {
 VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
 DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
 DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
-DEFINE_VIRTIO_SERIAL_PROPERTIES(VirtIOSerialPCI, vdev.serial),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1399,6 +1398,7 @@ static void virtio_serial_pci_instance_init(Object *obj)
 VirtIOSerialPCI *dev = VIRTIO_SERIAL_PCI(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_SERIAL);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
 static const TypeInfo virtio_serial_pci_info = {
-- 
1.7.12.4

[Qemu-devel] [PATCH v2 1/9] virtio-net: use aliases instead of duplicate qdev properties

2014-09-08 Thread arei.gonglei

From: Gonglei 

virtio-net-pci, virtio-net-s390, and virtio-net-ccw all duplicate the
qdev properties of their VirtIONet child. This approach does not work
well with string or pointer properties since we must be careful about
leaking or double-freeing them.

Use the QOM alias property to forward property accesses to the
VirtIONet child.  This way no duplication is necessary.

Signed-off-by: Gonglei 
---
 hw/s390x/s390-virtio-bus.c | 3 +--
 hw/s390x/virtio-ccw.c  | 3 +--
 hw/virtio/virtio-pci.c | 3 +--
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c
index 6b6fb61..5b5d595 100644
--- a/hw/s390x/s390-virtio-bus.c
+++ b/hw/s390x/s390-virtio-bus.c
@@ -161,6 +161,7 @@ static void s390_virtio_net_instance_init(Object *obj)
 VirtIONetS390 *dev = VIRTIO_NET_S390(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_NET);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
 static int s390_virtio_blk_init(VirtIOS390Device *s390_dev)
@@ -493,10 +494,8 @@ static unsigned virtio_s390_get_features(DeviceState *d)
 / S390 Virtio Bus Device Descriptions ***/
 
 static Property s390_virtio_net_properties[] = {
-DEFINE_NIC_PROPERTIES(VirtIONetS390, vdev.nic_conf),
 DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features),
 DEFINE_VIRTIO_NET_FEATURES(VirtIOS390Device, host_features),
-DEFINE_VIRTIO_NET_PROPERTIES(VirtIONetS390, vdev.net_conf),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 33a1d86..7d67577 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -794,6 +794,7 @@ static void virtio_ccw_net_instance_init(Object *obj)
 VirtIONetCcw *dev = VIRTIO_NET_CCW(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_NET);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
 static int virtio_ccw_blk_init(VirtioCcwDevice *ccw_dev)
@@ -1374,8 +1375,6 @@ static int virtio_ccw_load_config(DeviceState *d, 
QEMUFile *f)
 static Property virtio_ccw_net_properties[] = {
 DEFINE_PROP_STRING("devno", VirtioCcwDevice, bus_id),
 DEFINE_VIRTIO_NET_FEATURES(VirtioCcwDevice, host_features[0]),
-DEFINE_VIRTIO_NET_PROPERTIES(VirtIONetCcw, vdev.net_conf),
-DEFINE_NIC_PROPERTIES(VirtIONetCcw, vdev.nic_conf),
 DEFINE_PROP_BIT("ioeventfd", VirtioCcwDevice, flags,
 VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index ddb5da1..6722156 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1414,8 +1414,6 @@ static Property virtio_net_properties[] = {
 VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, false),
 DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3),
 DEFINE_VIRTIO_NET_FEATURES(VirtIOPCIProxy, host_features),
-DEFINE_NIC_PROPERTIES(VirtIONetPCI, vdev.nic_conf),
-DEFINE_VIRTIO_NET_PROPERTIES(VirtIONetPCI, vdev.net_conf),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1456,6 +1454,7 @@ static void virtio_net_pci_instance_init(Object *obj)
 VirtIONetPCI *dev = VIRTIO_NET_PCI(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_NET);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
 static const TypeInfo virtio_net_pci_info = {
-- 
1.7.12.4

[Qemu-devel] [PATCH v2 3/9] virtio/vhost scsi: use aliases instead of duplicate qdev properties

2014-09-08 Thread arei.gonglei

From: Gonglei 

{virtio, vhost}-scsi-{pci, s390, ccw} all duplicate the
qdev properties of their VirtIOSCSI/VHostSCSI child.
This approach does not work well with string or pointer
properties since we must be careful about leaking or
double-freeing them.

Use the QOM alias property to forward property accesses to the
VirtIOSCSI/VHostSCSI child. This way no duplication is necessary.

Signed-off-by: Gonglei 
---
 hw/s390x/s390-virtio-bus.c | 4 ++--
 hw/s390x/virtio-ccw.c  | 4 ++--
 hw/virtio/virtio-pci.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c
index 297eac2..eaaa275 100644
--- a/hw/s390x/s390-virtio-bus.c
+++ b/hw/s390x/s390-virtio-bus.c
@@ -258,6 +258,7 @@ static void s390_virtio_scsi_instance_init(Object *obj)
 VirtIOSCSIS390 *dev = VIRTIO_SCSI_S390(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_SCSI);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
 #ifdef CONFIG_VHOST_SCSI
@@ -279,6 +280,7 @@ static void s390_vhost_scsi_instance_init(Object *obj)
 VHostSCSIS390 *dev = VHOST_SCSI_S390(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VHOST_SCSI);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 #endif
 
@@ -614,7 +616,6 @@ static const TypeInfo virtio_s390_device_info = {
 };
 
 static Property s390_virtio_scsi_properties[] = {
-DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOSCSIS390, vdev.parent_obj.conf),
 DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features),
 DEFINE_VIRTIO_SCSI_FEATURES(VirtIOS390Device, host_features),
 DEFINE_PROP_END_OF_LIST(),
@@ -640,7 +641,6 @@ static const TypeInfo s390_virtio_scsi = {
 #ifdef CONFIG_VHOST_SCSI
 static Property s390_vhost_scsi_properties[] = {
 DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features),
-DEFINE_VHOST_SCSI_PROPERTIES(VHostSCSIS390, vdev.parent_obj.conf),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index bb699f2..458aabc 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -938,6 +938,7 @@ static void virtio_ccw_scsi_instance_init(Object *obj)
 VirtIOSCSICcw *dev = VIRTIO_SCSI_CCW(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_SCSI);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
 #ifdef CONFIG_VHOST_SCSI
@@ -959,6 +960,7 @@ static void vhost_ccw_scsi_instance_init(Object *obj)
 VHostSCSICcw *dev = VHOST_SCSI_CCW(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VHOST_SCSI);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 #endif
 
@@ -1481,7 +1483,6 @@ static const TypeInfo virtio_ccw_balloon = {
 
 static Property virtio_ccw_scsi_properties[] = {
 DEFINE_PROP_STRING("devno", VirtioCcwDevice, bus_id),
-DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOSCSICcw, vdev.parent_obj.conf),
 DEFINE_VIRTIO_SCSI_FEATURES(VirtioCcwDevice, host_features[0]),
 DEFINE_PROP_BIT("ioeventfd", VirtioCcwDevice, flags,
 VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true),
@@ -1510,7 +1511,6 @@ static const TypeInfo virtio_ccw_scsi = {
 #ifdef CONFIG_VHOST_SCSI
 static Property vhost_ccw_scsi_properties[] = {
 DEFINE_PROP_STRING("devno", VirtioCcwDevice, bus_id),
-DEFINE_VHOST_SCSI_PROPERTIES(VirtIOSCSICcw, vdev.parent_obj.conf),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 19c9a6c..2dd8360 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1124,7 +1124,6 @@ static Property virtio_scsi_pci_properties[] = {
 DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors,
DEV_NVECTORS_UNSPECIFIED),
 DEFINE_VIRTIO_SCSI_FEATURES(VirtIOPCIProxy, host_features),
-DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOSCSIPCI, vdev.parent_obj.conf),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1176,6 +1175,7 @@ static void virtio_scsi_pci_instance_init(Object *obj)
 VirtIOSCSIPCI *dev = VIRTIO_SCSI_PCI(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_SCSI);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
 static const TypeInfo virtio_scsi_pci_info = {
@@ -1192,7 +1192,6 @@ static const TypeInfo virtio_scsi_pci_info = {
 static Property vhost_scsi_pci_properties[] = {
 DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors,
DEV_NVECTORS_UNSPECIFIED),
-DEFINE_VHOST_SCSI_PROPERTIES(VHostSCSIPCI, vdev.parent_obj.conf),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1232,6 +1231,7

[Qemu-devel] [PATCH v2 8/9] virtio-rng: fix virtio-rng child refcount in transports

2014-09-08 Thread arei.gonglei

From: Gonglei 

object_initialize() leaves the object with a refcount of 1.
object_property_add_child() adds its own reference which is dropped
again when the property is deleted.

The upshot of this is that we always have a refcount >= 1.  Upon hot
unplug the virtio-rng child is not finalized!

Drop our reference after the child property has been added to the
parent.

Signed-off-by: Gonglei 
---
 hw/s390x/s390-virtio-bus.c | 1 +
 hw/s390x/virtio-ccw.c  | 1 +
 hw/virtio/virtio-pci.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c
index 6d0a7f3..ca682bb 100644
--- a/hw/s390x/s390-virtio-bus.c
+++ b/hw/s390x/s390-virtio-bus.c
@@ -312,6 +312,7 @@ static void s390_virtio_rng_instance_init(Object *obj)
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_RNG);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
+object_unref(OBJECT(&dev->vdev));
 object_property_add_link(obj, "rng", TYPE_RNG_BACKEND,
  (Object **)&dev->vdev.conf.rng,
  qdev_prop_allow_set_link_before_realize,
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index da2e427..de0764d 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -1543,6 +1543,7 @@ static void virtio_ccw_rng_instance_init(Object *obj)
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_RNG);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
+object_unref(OBJECT(&dev->vdev));
 object_property_add_link(obj, "rng", TYPE_RNG_BACKEND,
  (Object **)&dev->vdev.conf.rng,
  qdev_prop_allow_set_link_before_realize,
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 28b9b74..d93ffad 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1514,6 +1514,7 @@ static void virtio_rng_initfn(Object *obj)
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_RNG);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
+object_unref(OBJECT(&dev->vdev));
 object_property_add_link(obj, "rng", TYPE_RNG_BACKEND,
  (Object **)&dev->vdev.conf.rng,
  qdev_prop_allow_set_link_before_realize,
-- 
1.7.12.4

[Qemu-devel] [PATCH v2 9/9] virtio-balloon: fix virtio-balloon child refcount in transports

2014-09-08 Thread arei.gonglei

From: Gonglei 

object_initialize() leaves the object with a refcount of 1.
object_property_add_child() adds its own reference which is dropped
again when the property is deleted.

The upshot of this is that we always have a refcount >= 1.  Upon hot
unplug the virtio-balloon child is not finalized!

Drop our reference after the child property has been added to the
parent.

Signed-off-by: Gonglei 
---
 hw/s390x/virtio-ccw.c  | 2 +-
 hw/virtio/virtio-pci.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index de0764d..c074f64 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -900,7 +900,7 @@ static void virtio_ccw_balloon_instance_init(Object *obj)
 VirtIOBalloonCcw *dev = VIRTIO_BALLOON_CCW(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_BALLOON);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
-
+object_unref(OBJECT(&dev->vdev));
 object_property_add(obj, "guest-stats", "guest statistics",
 balloon_ccw_stats_get_all, NULL, NULL, dev, NULL);
 
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index d93ffad..e6cdaca 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1314,7 +1314,7 @@ static void virtio_balloon_pci_instance_init(Object *obj)
 VirtIOBalloonPCI *dev = VIRTIO_BALLOON_PCI(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_BALLOON);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
-
+object_unref(OBJECT(&dev->vdev));
 object_property_add(obj, "guest-stats", "guest statistics",
 balloon_pci_stats_get_all, NULL, NULL, dev,
 NULL);
-- 
1.7.12.4

[Qemu-devel] [PATCH v2 2/9] virtio: fix virtio-net child refcount in transports

2014-09-08 Thread arei.gonglei

From: Gonglei 

object_initialize() leaves the object with a refcount of 1.
object_property_add_child() adds its own reference which is dropped
again when the property is deleted.

The upshot of this is that we always have a refcount >= 1.  Upon hot
unplug the virtio-net child is not finalized!

Drop our reference after the child property has been added to the
parent.

Signed-off-by: Gonglei 
---
 hw/s390x/s390-virtio-bus.c | 1 +
 hw/s390x/virtio-ccw.c  | 1 +
 hw/virtio/virtio-pci.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c
index 5b5d595..297eac2 100644
--- a/hw/s390x/s390-virtio-bus.c
+++ b/hw/s390x/s390-virtio-bus.c
@@ -161,6 +161,7 @@ static void s390_virtio_net_instance_init(Object *obj)
 VirtIONetS390 *dev = VIRTIO_NET_S390(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_NET);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+object_unref(OBJECT(&dev->vdev));
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 7d67577..bb699f2 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -794,6 +794,7 @@ static void virtio_ccw_net_instance_init(Object *obj)
 VirtIONetCcw *dev = VIRTIO_NET_CCW(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_NET);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+object_unref(OBJECT(&dev->vdev));
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 6722156..19c9a6c 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1454,6 +1454,7 @@ static void virtio_net_pci_instance_init(Object *obj)
 VirtIONetPCI *dev = VIRTIO_NET_PCI(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_NET);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+object_unref(OBJECT(&dev->vdev));
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
-- 
1.7.12.4

[Qemu-devel] [PATCH v2 4/9] virtio/vhost-scsi: fix virtio-scsi/vhost-scsi child refcount in transports

2014-09-08 Thread arei.gonglei

From: Gonglei 

object_initialize() leaves the object with a refcount of 1.
object_property_add_child() adds its own reference which is dropped
again when the property is deleted.

The upshot of this is that we always have a refcount >= 1.  Upon hot
unplug the virtio-scsi/vhost-scsi child is not finalized!

Drop our reference after the child property has been added to the
parent.

Signed-off-by: Gonglei 
---
 hw/s390x/s390-virtio-bus.c | 2 ++
 hw/s390x/virtio-ccw.c  | 2 ++
 hw/virtio/virtio-pci.c | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c
index eaaa275..4276034 100644
--- a/hw/s390x/s390-virtio-bus.c
+++ b/hw/s390x/s390-virtio-bus.c
@@ -258,6 +258,7 @@ static void s390_virtio_scsi_instance_init(Object *obj)
 VirtIOSCSIS390 *dev = VIRTIO_SCSI_S390(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_SCSI);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+object_unref(OBJECT(&dev->vdev));
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
@@ -280,6 +281,7 @@ static void s390_vhost_scsi_instance_init(Object *obj)
 VHostSCSIS390 *dev = VHOST_SCSI_S390(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VHOST_SCSI);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+object_unref(OBJECT(&dev->vdev));
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 #endif
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 458aabc..a466674 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -938,6 +938,7 @@ static void virtio_ccw_scsi_instance_init(Object *obj)
 VirtIOSCSICcw *dev = VIRTIO_SCSI_CCW(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_SCSI);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+object_unref(OBJECT(&dev->vdev));
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
@@ -960,6 +961,7 @@ static void vhost_ccw_scsi_instance_init(Object *obj)
 VHostSCSICcw *dev = VHOST_SCSI_CCW(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VHOST_SCSI);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+object_unref(OBJECT(&dev->vdev));
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 #endif
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 2dd8360..00dbfc9 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1175,6 +1175,7 @@ static void virtio_scsi_pci_instance_init(Object *obj)
 VirtIOSCSIPCI *dev = VIRTIO_SCSI_PCI(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_SCSI);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+object_unref(OBJECT(&dev->vdev));
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
@@ -1231,6 +1232,7 @@ static void vhost_scsi_pci_instance_init(Object *obj)
 VHostSCSIPCI *dev = VHOST_SCSI_PCI(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VHOST_SCSI);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+object_unref(OBJECT(&dev->vdev));
 qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 }
 
-- 
1.7.12.4

[Qemu-devel] [PATCH v2 7/9] virtio-rng: use aliases instead of duplicate qdev properties

2014-09-08 Thread arei.gonglei

From: Gonglei 

virtio-rng-{pci, s390, ccw} all duplicate the
qdev properties of their VirtIORNG child.
This approach does not work well with string or pointer
properties since we must be careful about leaking or
double-freeing them.

Use the QOM alias property to forward property accesses to the
VirtIORNG child.  This way no duplication is necessary.

Signed-off-by: Gonglei 
---
 hw/s390x/s390-virtio-bus.c | 2 +-
 hw/s390x/virtio-ccw.c  | 2 +-
 hw/virtio/virtio-pci.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c
index 422402e..6d0a7f3 100644
--- a/hw/s390x/s390-virtio-bus.c
+++ b/hw/s390x/s390-virtio-bus.c
@@ -311,6 +311,7 @@ static void s390_virtio_rng_instance_init(Object *obj)
 VirtIORNGS390 *dev = VIRTIO_RNG_S390(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_RNG);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 object_property_add_link(obj, "rng", TYPE_RNG_BACKEND,
  (Object **)&dev->vdev.conf.rng,
  qdev_prop_allow_set_link_before_realize,
@@ -561,7 +562,6 @@ static const TypeInfo s390_virtio_serial = {
 
 static Property s390_virtio_rng_properties[] = {
 DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features),
-DEFINE_VIRTIO_RNG_PROPERTIES(VirtIORNGS390, vdev.conf),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 5d7f3a6..da2e427 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -1542,6 +1542,7 @@ static void virtio_ccw_rng_instance_init(Object *obj)
 VirtIORNGCcw *dev = VIRTIO_RNG_CCW(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_RNG);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 object_property_add_link(obj, "rng", TYPE_RNG_BACKEND,
  (Object **)&dev->vdev.conf.rng,
  qdev_prop_allow_set_link_before_realize,
@@ -1550,7 +1551,6 @@ static void virtio_ccw_rng_instance_init(Object *obj)
 
 static Property virtio_ccw_rng_properties[] = {
 DEFINE_PROP_STRING("devno", VirtioCcwDevice, bus_id),
-DEFINE_VIRTIO_RNG_PROPERTIES(VirtIORNGCcw, vdev.conf),
 DEFINE_PROP_BIT("ioeventfd", VirtioCcwDevice, flags,
 VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 87ceea3..28b9b74 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1472,7 +1472,6 @@ static const TypeInfo virtio_net_pci_info = {
 /* virtio-rng-pci */
 
 static Property virtio_rng_pci_properties[] = {
-DEFINE_VIRTIO_RNG_PROPERTIES(VirtIORngPCI, vdev.conf),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1514,6 +1513,7 @@ static void virtio_rng_initfn(Object *obj)
 VirtIORngPCI *dev = VIRTIO_RNG_PCI(obj);
 object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_RNG);
 object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
+qdev_alias_all_properties(DEVICE(&dev->vdev), obj);
 object_property_add_link(obj, "rng", TYPE_RNG_BACKEND,
  (Object **)&dev->vdev.conf.rng,
  qdev_prop_allow_set_link_before_realize,
-- 
1.7.12.4

Re: [Qemu-devel] [RFC PATCH v6 00/14] Reverse execution.

2014-09-08 Thread Pavel Dovgaluk

> From: Frederic Konrad [mailto:fred.kon...@greensocs.com]
> On 08/09/2014 10:29, Paolo Bonzini wrote:
> > Il 08/09/2014 10:09, Frederic Konrad ha scritto:
> >> By the way how do you want to have this discussion?
> >>
> >> At the KVM forum? Or by phone on KVM phone call?
> > Or both. :)
> >
> > Seriously, Pavel is presenting on record/replay at KVM Forum.  I guess
> > there will be some discussion after his presentation, but anything that
> > happens _before_ could make the presentation more interesting.
> >
> > Paolo
> Ok fine with us,
> 
> So is that possible to allocate time for this on the next KVM phone
> calls I think it's
> next week?
> If that suits to you Pavel?

Yes, I can take part in phone calls next week.
What should I do for that?

Pavel Dovgalyuk

Re: [Qemu-devel] [PULL 6/7] net: complete all queued packets on VM stop

2014-09-08 Thread Jason Wang

On 09/04/2014 11:50 PM, Stefan Hajnoczi wrote:
> From: "Michael S. Tsirkin" 
>
> This completes all packets, ensuring that callbacks
> will not run when VM is stopped.
>
> Cc: qemu-sta...@nongnu.org
> Cc: Jason Wang 
> Signed-off-by: Michael S. Tsirkin 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  net/net.c | 33 -
>  1 file changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/net/net.c b/net/net.c
> index 962c05f..7acc162 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -48,6 +48,7 @@
>  # define CONFIG_NET_BRIDGE
>  #endif
>  
> +static VMChangeStateEntry *net_change_state_entry;
>  static QTAILQ_HEAD(, NetClientState) net_clients;
>  
>  const char *host_net_devices[] = {
> @@ -511,7 +512,8 @@ void qemu_purge_queued_packets(NetClientState *nc)
>  qemu_net_queue_purge(nc->peer->incoming_queue, nc);
>  }
>  
> -void qemu_flush_queued_packets(NetClientState *nc)
> +static
> +void qemu_flush_or_purge_queued_packets(NetClientState *nc, bool purge)
>  {
>  nc->receive_disabled = 0;
>  
> @@ -525,9 +527,17 @@ void qemu_flush_queued_packets(NetClientState *nc)
>   * the file descriptor (for tap, for example).
>   */
>  qemu_notify_event();
> +} else if (purge) {
> +/* Unable to empty the queue, purge remaining packets */
> +qemu_net_queue_purge(nc->incoming_queue, nc);
>  }
>  }
>  
> +void qemu_flush_queued_packets(NetClientState *nc)
> +{
> +qemu_flush_or_purge_queued_packets(nc, false);
> +}
> +
>  static ssize_t qemu_send_packet_async_with_flags(NetClientState *sender,
>   unsigned flags,
>   const uint8_t *buf, int 
> size,
> @@ -1175,6 +1185,22 @@ void qmp_set_link(const char *name, bool up, Error 
> **errp)
>  }
>  }
>  
> +static void net_vm_change_state_handler(void *opaque, int running,
> +RunState state)
> +{
> +/* Complete all queued packets, to guarantee we don't modify
> + * state later when VM is not running.
> + */
> +if (!running) {
> +NetClientState *nc;
> +NetClientState *tmp;
> +
> +QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) {
> +qemu_flush_or_purge_queued_packets(nc, true);
> +}
> +}
> +}

Something like net_drain_all_queue() in do_vm_stop() looks simpler. And
doing this is tricky if it depends on other handlers to be called first.

E.g virtio-net vm change state handler will set vm_running to false. And
net_vm_change_state_handler() will be called after this. This means
virtio_net_flush_tx() will still hit the assert since it will be called
by packet cb (virtio_net_tx_complete()).
> +
>  void net_cleanup(void)
>  {
>  NetClientState *nc;
> @@ -1190,6 +1216,8 @@ void net_cleanup(void)
>  qemu_del_net_client(nc);
>  }
>  }
> +
> +qemu_del_vm_change_state_handler(net_change_state_entry);
>  }
>  
>  void net_check_clients(void)
> @@ -1275,6 +1303,9 @@ int net_init_clients(void)
>  #endif
>  }
>  
> +net_change_state_entry =
> +qemu_add_vm_change_state_handler(net_vm_change_state_handler, NULL);
> +
>  QTAILQ_INIT(&net_clients);
>  
>  if (qemu_opts_foreach(qemu_find_opts("netdev"), net_init_netdev, NULL, 
> 1) == -1)

[Qemu-devel] [PATCH v7 RESEND 7/8] exec: report error when memory < hpagesize

2014-09-08 Thread Hu Tao

Report an error when memory < hpagesize in file_ram_alloc() so callers
can handle the error.

If user adds a memory-backend-file object using object_add command,
specifying a size that is less than huge page size, qemu will core dump
with message:

  Bad ram offset f000
  Aborted (core dumped)

This patch fixes the problem. With this patch, qemu reports error
message like:

  qemu-system-x86_64: -object 
memory-backend-file,mem-path=/hugepages,id=mem-file0,size=1M: memory
  size 0x10 must be equal to or larger than huge page size 0x20

Signed-off-by: Hu Tao 
---
 exec.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/exec.c b/exec.c
index 0362cd8..21af6c9 100644
--- a/exec.c
+++ b/exec.c
@@ -1059,9 +1059,9 @@ static void *file_ram_alloc(RAMBlock *block,
 char *filename;
 char *sanitized_name;
 char *c;
-void *area;
+void *area = NULL;
 int fd;
-unsigned long hpagesize;
+uint64_t hpagesize;
 
 hpagesize = gethugepagesize(path);
 if (!hpagesize) {
@@ -1069,7 +1069,10 @@ static void *file_ram_alloc(RAMBlock *block,
 }
 
 if (memory < hpagesize) {
-return NULL;
+error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
+   "or larger than huge page size 0x%" PRIx64,
+   memory, hpagesize);
+goto error;
 }
 
 if (kvm_enabled() && !kvm_has_sync_mmu()) {
-- 
1.9.3

[Qemu-devel] [PATCH v7 RESEND 6/8] exec: file_ram_alloc: don't exit if failed to preallocate memory

2014-09-08 Thread Hu Tao

When using monitor command object_add to add a memory backend file
but failed to preallocate memory for it, qemu exits silently. In
the case we'd better give an error message and keep guest running.

The problem can be reproduced as follows:

1. run qemu with -mem-prealloc
2. (monitor)object_add 
memory-backend-file,size=10G,mem-path=/hugepages,id=mem-file0

Signed-off-by: Hu Tao 
---
 exec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 2b9c4c5..0362cd8 100644
--- a/exec.c
+++ b/exec.c
@@ -1129,8 +1129,8 @@ static void *file_ram_alloc(RAMBlock *block,
 return area;
 
 error:
-if (mem_prealloc) {
-exit(1);
+if (area && area != MAP_FAILED) {
+munmap(area, memory);
 }
 return NULL;
 }
-- 
1.9.3

[Qemu-devel] [PATCH v7 RESEND 8/8] exec: add parameter errp to gethugepagesize

2014-09-08 Thread Hu Tao

Add parameter errp to gethugepagesize thus callers can handle errors.

If user adds a memory-backend-file object using object_add command,
specifying a non-existing directory for property mem-path, qemu will
core dump with message:

  /nonexistingdir: No such file or directory
  Bad ram offset f000
  Aborted (core dumped)

This patch fixes the problem. With this patch, qemu reports an error
message like:

  qemu-system-x86_64: -object 
memory-backend-file,mem-path=/nonexistingdir,id=mem-file0,size=128M:
  failed to get page size of file /nonexistingdir: No such file or directory

Signed-off-by: Hu Tao 
Reviewed-by: Peter Crosthwaite 
---
 exec.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index 21af6c9..0f70c04 100644
--- a/exec.c
+++ b/exec.c
@@ -1031,7 +1031,7 @@ void qemu_mutex_unlock_ramlist(void)
 
 #define HUGETLBFS_MAGIC   0x958458f6
 
-static long gethugepagesize(const char *path)
+static long gethugepagesize(const char *path, Error **errp)
 {
 struct statfs fs;
 int ret;
@@ -1041,7 +1041,8 @@ static long gethugepagesize(const char *path)
 } while (ret != 0 && errno == EINTR);
 
 if (ret != 0) {
-perror(path);
+error_setg_errno(errp, errno, "failed to get page size of file %s",
+ path);
 return 0;
 }
 
@@ -1062,9 +1063,11 @@ static void *file_ram_alloc(RAMBlock *block,
 void *area = NULL;
 int fd;
 uint64_t hpagesize;
+Error *local_err = NULL;
 
-hpagesize = gethugepagesize(path);
-if (!hpagesize) {
+hpagesize = gethugepagesize(path, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
 goto error;
 }
 
-- 
1.9.3

[Qemu-devel] [PATCH v7 RESEND 2/8] memory: add parameter errp to memory_region_init_ram

2014-09-08 Thread Hu Tao

Add parameter errp to memory_region_init_ram and update all call sites
to pass in &error_abort.

Signed-off-by: Hu Tao 
Reviewed-by: Peter Crosthwaite 
---
 backends/hostmem-ram.c   |  2 +-
 hw/alpha/typhoon.c   |  3 ++-
 hw/arm/armv7m.c  |  7 ---
 hw/arm/cubieboard.c  |  2 +-
 hw/arm/digic_boards.c|  2 +-
 hw/arm/exynos4210.c  |  9 +
 hw/arm/highbank.c|  5 +++--
 hw/arm/integratorcp.c|  5 +++--
 hw/arm/kzm.c |  4 ++--
 hw/arm/mainstone.c   |  3 ++-
 hw/arm/musicpal.c|  6 --
 hw/arm/omap1.c   |  6 --
 hw/arm/omap2.c   |  6 --
 hw/arm/omap_sx1.c|  6 --
 hw/arm/palm.c|  3 ++-
 hw/arm/pxa2xx.c  | 11 +++
 hw/arm/realview.c|  9 ++---
 hw/arm/spitz.c   |  2 +-
 hw/arm/strongarm.c   |  3 ++-
 hw/arm/tosa.c|  2 +-
 hw/arm/versatilepb.c |  3 ++-
 hw/arm/vexpress.c| 15 ++-
 hw/arm/virt.c|  3 ++-
 hw/arm/xilinx_zynq.c |  6 --
 hw/block/onenand.c   |  2 +-
 hw/core/loader.c |  2 +-
 hw/cris/axis_dev88.c |  6 --
 hw/display/cg3.c |  6 --
 hw/display/qxl.c |  6 +++---
 hw/display/sm501.c   |  2 +-
 hw/display/tc6393xb.c|  3 ++-
 hw/display/tcx.c |  5 +++--
 hw/display/vga.c |  3 ++-
 hw/display/vmware_vga.c  |  3 ++-
 hw/i386/kvm/pci-assign.c |  3 ++-
 hw/i386/pc.c |  3 ++-
 hw/i386/pc_sysfw.c   |  5 +++--
 hw/input/milkymist-softusb.c |  4 ++--
 hw/lm32/lm32_boards.c|  6 --
 hw/lm32/milkymist.c  |  3 ++-
 hw/m68k/an5206.c |  4 ++--
 hw/m68k/dummy_m68k.c |  2 +-
 hw/m68k/mcf5208.c|  4 ++--
 hw/microblaze/petalogix_ml605_mmu.c  |  5 +++--
 hw/microblaze/petalogix_s3adsp1800_mmu.c |  6 --
 hw/mips/mips_fulong2e.c  |  5 +++--
 hw/mips/mips_jazz.c  |  8 +---
 hw/mips/mips_malta.c |  6 --
 hw/mips/mips_mipssim.c   |  6 --
 hw/mips/mips_r4k.c   |  5 +++--
 hw/moxie/moxiesim.c  |  4 ++--
 hw/net/milkymist-minimac2.c  |  2 +-
 hw/openrisc/openrisc_sim.c   |  2 +-
 hw/pci-host/prep.c   |  3 ++-
 hw/pci/pci.c |  2 +-
 hw/ppc/mac_newworld.c|  3 ++-
 hw/ppc/mac_oldworld.c|  3 ++-
 hw/ppc/ppc405_boards.c   |  8 +---
 hw/ppc/ppc405_uc.c   |  3 ++-
 hw/s390x/s390-virtio-ccw.c   |  2 +-
 hw/s390x/s390-virtio.c   |  2 +-
 hw/sh4/r2d.c |  2 +-
 hw/sh4/shix.c|  8 +---
 hw/sparc/leon3.c |  4 ++--
 hw/sparc/sun4m.c | 10 ++
 hw/sparc64/sun4u.c   |  6 --
 hw/unicore32/puv3.c  |  3 ++-
 hw/xtensa/sim.c  |  4 ++--
 hw/xtensa/xtfpga.c   |  8 +---
 include/exec/memory.h|  4 +++-
 memory.c |  5 +++--
 numa.c   |  4 ++--
 xen-hvm.c|  3 ++-
 73 files changed, 203 insertions(+), 128 deletions(-)

diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index d9a8290..e55d066 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -27,7 +27,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 
 path = object_get_canonical_path_component(OBJECT(backend));
 memory_region_init_ram(&backend->mr, OBJECT(backend), path,
-   backend->size);
+   backend->size, &error_abort);
 g_free(path);
 }
 
diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
index 31947d9..5310006 100644
--- a/hw/alpha/typhoon.c
+++ b/hw/alpha/typhoon.c
@@ -844,7 +844,8 @@ PCIBus *typhoon_init(ram_addr_t ram_size, ISABus **isa_bus,
 
 /* Main memory region, 0x00...  Real hardware supports 32GB,
but the address space hole reserved at this point is 8TB.  */
-memory_region_init_ram(&s->ram_region, OBJECT(s), "ram", ram_size);
+memory_region_init_ra

[Qemu-devel] [PATCH v7 RESEND 3/8] memory: add parameter errp to memory_region_init_ram_ptr

2014-09-08 Thread Hu Tao

Add parameter errp to memory_region_init_ram_ptr and update all call
sites to pass in &error_abort.

Reviewed-by: Peter Crosthwaite 
Signed-off-by: Hu Tao 
---
 hw/display/g364fb.c  | 2 +-
 hw/i386/kvm/pci-assign.c | 3 ++-
 hw/misc/ivshmem.c| 5 +++--
 hw/misc/vfio.c   | 3 ++-
 hw/ppc/spapr.c   | 2 +-
 include/exec/memory.h| 4 +++-
 memory.c | 5 +++--
 7 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/hw/display/g364fb.c b/hw/display/g364fb.c
index 46f7b41..cce33ae 100644
--- a/hw/display/g364fb.c
+++ b/hw/display/g364fb.c
@@ -487,7 +487,7 @@ static void g364fb_init(DeviceState *dev, G364State *s)
 
 memory_region_init_io(&s->mem_ctrl, NULL, &g364fb_ctrl_ops, s, "ctrl", 
0x18);
 memory_region_init_ram_ptr(&s->mem_vram, NULL, "vram",
-   s->vram_size, s->vram);
+   s->vram_size, s->vram, &error_abort);
 vmstate_register_ram(&s->mem_vram, dev);
 memory_region_set_coalescing(&s->mem_vram);
 }
diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index 13b9de0..b37d606 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -456,7 +456,8 @@ static void assigned_dev_register_regions(PCIRegion 
*io_regions,
  object_get_typename(OBJECT(pci_dev)), i);
 memory_region_init_ram_ptr(&pci_dev->v_addrs[i].real_iomem,
OBJECT(pci_dev), name,
-   cur_region->size, virtbase);
+   cur_region->size, virtbase,
+   &error_abort);
 vmstate_register_ram(&pci_dev->v_addrs[i].real_iomem,
  &pci_dev->dev.qdev);
 }
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index bd9d718..906a6b5 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -352,7 +352,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int 
fd) {
 ptr = mmap(0, s->ivshmem_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
 
 memory_region_init_ram_ptr(&s->ivshmem, OBJECT(s), "ivshmem.bar2",
-   s->ivshmem_size, ptr);
+   s->ivshmem_size, ptr, &error_abort);
 vmstate_register_ram(&s->ivshmem, DEVICE(s));
 memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
 
@@ -480,7 +480,8 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, 
int flags)
 map_ptr = mmap(0, s->ivshmem_size, PROT_READ|PROT_WRITE, MAP_SHARED,
 incoming_fd, 0);
 memory_region_init_ram_ptr(&s->ivshmem, OBJECT(s),
-   "ivshmem.bar2", s->ivshmem_size, map_ptr);
+   "ivshmem.bar2", s->ivshmem_size, map_ptr,
+   &error_abort);
 vmstate_register_ram(&s->ivshmem, DEVICE(s));
 
 IVSHMEM_DPRINTF("guest h/w addr = %p, size = %" PRIu64 "\n",
diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 40dcaa6..abb8d1b 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -2910,7 +2910,8 @@ static int vfio_mmap_bar(VFIODevice *vdev, VFIOBAR *bar,
 goto empty_region;
 }
 
-memory_region_init_ram_ptr(submem, OBJECT(vdev), name, size, *map);
+memory_region_init_ram_ptr(submem, OBJECT(vdev), name, size, *map,
+   &error_abort);
 } else {
 empty_region:
 /* Create a zero sized sub-region to make cleanup easy. */
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2ab4460..3c4aaf0 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1402,7 +1402,7 @@ static void ppc_spapr_init(MachineState *machine)
 if (rma_alloc_size && rma) {
 rma_region = g_new(MemoryRegion, 1);
 memory_region_init_ram_ptr(rma_region, NULL, "ppc_spapr.rma",
-   rma_alloc_size, rma);
+   rma_alloc_size, rma, &error_abort);
 vmstate_register_ram_global(rma_region);
 memory_region_add_subregion(sysmem, 0, rma_region);
 }
diff --git a/include/exec/memory.h b/include/exec/memory.h
index fd4131b..dcb35a2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -351,12 +351,14 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
  * @name: the name of the region.
  * @size: size of the region.
  * @ptr: memory to be mapped; must contain at least @size bytes.
+ * @errp: pointer to Error*, to store an error if it happens.
  */
 void memory_region_init_ram_ptr(MemoryRegion *mr,
 struct Object *owner,
 const char *name,
 uint64_t size,
-void *ptr);
+void *ptr,
+Error **errp);
 
 /**
  * memo

[Qemu-devel] [PATCH v7 RESEND 5/8] hostmem-ram: don't exit qemu if size of memory-backend-ram is way too big

2014-09-08 Thread Hu Tao

When using monitor command object_add to add a memory backend whose
size is way too big to allocate memory for it, qemu just exits. In
the case we'd better give an error message and keep guest running.

The problem can be reproduced as follows:

1. run qemu
2. (monitor)object_add memory-backend-ram,size=10G,id=ram0

Reviewed-by: Peter Crosthwaite 
Signed-off-by: Hu Tao 
---
 backends/hostmem-ram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index e55d066..a67a134 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -27,7 +27,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 
 path = object_get_canonical_path_component(OBJECT(backend));
 memory_region_init_ram(&backend->mr, OBJECT(backend), path,
-   backend->size, &error_abort);
+   backend->size, errp);
 g_free(path);
 }
 
-- 
1.9.3

[Qemu-devel] [PATCH v7 RESEND 4/8] memory: add parameter errp to memory_region_init_rom_device

2014-09-08 Thread Hu Tao

Add parameter errp to memory_region_init_rom_device and update all call
sites to pass in &error_abort.

Reviewed-by: Peter Crosthwaite 
Signed-off-by: Hu Tao 
---
 hw/block/pflash_cfi01.c | 2 +-
 hw/block/pflash_cfi02.c | 2 +-
 include/exec/memory.h   | 4 +++-
 memory.c| 5 +++--
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 593fbc5..4d51bfd 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -773,7 +773,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 memory_region_init_rom_device(
 &pfl->mem, OBJECT(dev),
 pfl->be ? &pflash_cfi01_ops_be : &pflash_cfi01_ops_le, pfl,
-pfl->name, total_len);
+pfl->name, total_len, &error_abort);
 vmstate_register_ram(&pfl->mem, DEVICE(pfl));
 pfl->storage = memory_region_get_ram_ptr(&pfl->mem);
 sysbus_init_mmio(SYS_BUS_DEVICE(dev), &pfl->mem);
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index e196f4d..29bb0dc 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -608,7 +608,7 @@ static void pflash_cfi02_realize(DeviceState *dev, Error 
**errp)
 
 memory_region_init_rom_device(&pfl->orig_mem, OBJECT(pfl), pfl->be ?
   &pflash_cfi02_ops_be : &pflash_cfi02_ops_le,
-  pfl, pfl->name, chip_len);
+  pfl, pfl->name, chip_len, &error_abort);
 vmstate_register_ram(&pfl->orig_mem, DEVICE(pfl));
 pfl->storage = memory_region_get_ram_ptr(&pfl->orig_mem);
 pfl->chip_len = chip_len;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index dcb35a2..4921d27 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -388,13 +388,15 @@ void memory_region_init_alias(MemoryRegion *mr,
  * @ops: callbacks for write access handling.
  * @name: the name of the region.
  * @size: size of the region.
+ * @errp: pointer to Error*, to store an error if it happens.
  */
 void memory_region_init_rom_device(MemoryRegion *mr,
struct Object *owner,
const MemoryRegionOps *ops,
void *opaque,
const char *name,
-   uint64_t size);
+   uint64_t size,
+   Error **errp);
 
 /**
  * memory_region_init_reservation: Initialize a memory region that reserves
diff --git a/memory.c b/memory.c
index ffe24ff..38c29df 100644
--- a/memory.c
+++ b/memory.c
@@ -1202,7 +1202,8 @@ void memory_region_init_rom_device(MemoryRegion *mr,
const MemoryRegionOps *ops,
void *opaque,
const char *name,
-   uint64_t size)
+   uint64_t size,
+   Error **errp)
 {
 memory_region_init(mr, owner, name, size);
 mr->ops = ops;
@@ -1210,7 +1211,7 @@ void memory_region_init_rom_device(MemoryRegion *mr,
 mr->terminates = true;
 mr->rom_device = true;
 mr->destructor = memory_region_destructor_rom_device;
-mr->ram_addr = qemu_ram_alloc(size, mr, &error_abort);
+mr->ram_addr = qemu_ram_alloc(size, mr, errp);
 }
 
 void memory_region_init_iommu(MemoryRegion *mr,
-- 
1.9.3

[Qemu-devel] [PATCH v7 RESEND 1/8] exec: add parameter errp to qemu_ram_alloc and qemu_ram_alloc_from_ptr

2014-09-08 Thread Hu Tao

Add parameter errp to qemu_ram_alloc and qemu_ram_alloc_from_ptr so that
we can handle errors.

Signed-off-by: Hu Tao 
Reviewed-by: Peter Crosthwaite 
---
 exec.c  | 36 +++-
 include/exec/ram_addr.h |  4 ++--
 memory.c|  6 +++---
 3 files changed, 32 insertions(+), 14 deletions(-)

diff --git a/exec.c b/exec.c
index 7dddcc8..2b9c4c5 100644
--- a/exec.c
+++ b/exec.c
@@ -1259,7 +1259,7 @@ static int memory_try_enable_merging(void *addr, size_t 
len)
 return qemu_madvise(addr, len, QEMU_MADV_MERGEABLE);
 }
 
-static ram_addr_t ram_block_add(RAMBlock *new_block)
+static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
 {
 RAMBlock *block;
 ram_addr_t old_ram_size, new_ram_size;
@@ -1276,9 +1276,11 @@ static ram_addr_t ram_block_add(RAMBlock *new_block)
 } else {
 new_block->host = phys_mem_alloc(new_block->length);
 if (!new_block->host) {
-fprintf(stderr, "Cannot set up guest memory '%s': %s\n",
-memory_region_name(new_block->mr), strerror(errno));
-exit(1);
+error_setg_errno(errp, errno,
+ "cannot set up guest memory '%s'",
+ memory_region_name(new_block->mr));
+qemu_mutex_unlock_ramlist();
+return -1;
 }
 memory_try_enable_merging(new_block->host, new_block->length);
 }
@@ -1329,6 +1331,8 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, 
MemoryRegion *mr,
 Error **errp)
 {
 RAMBlock *new_block;
+ram_addr_t addr;
+Error *local_err = NULL;
 
 if (xen_enabled()) {
 error_setg(errp, "-mem-path not supported with Xen");
@@ -1358,14 +1362,22 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, 
MemoryRegion *mr,
 return -1;
 }
 
-return ram_block_add(new_block);
+addr = ram_block_add(new_block, &local_err);
+if (local_err) {
+g_free(new_block);
+error_propagate(errp, local_err);
+return -1;
+}
+return addr;
 }
 #endif
 
 ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
-   MemoryRegion *mr)
+   MemoryRegion *mr, Error **errp)
 {
 RAMBlock *new_block;
+ram_addr_t addr;
+Error *local_err = NULL;
 
 size = TARGET_PAGE_ALIGN(size);
 new_block = g_malloc0(sizeof(*new_block));
@@ -1376,12 +1388,18 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, 
void *host,
 if (host) {
 new_block->flags |= RAM_PREALLOC;
 }
-return ram_block_add(new_block);
+addr = ram_block_add(new_block, &local_err);
+if (local_err) {
+g_free(new_block);
+error_propagate(errp, local_err);
+return -1;
+}
+return addr;
 }
 
-ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr)
+ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr, Error **errp)
 {
-return qemu_ram_alloc_from_ptr(size, NULL, mr);
+return qemu_ram_alloc_from_ptr(size, NULL, mr, errp);
 }
 
 void qemu_ram_free_from_ptr(ram_addr_t addr)
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 6593be1..cf1d4c7 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -26,8 +26,8 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, 
MemoryRegion *mr,
 bool share, const char *mem_path,
 Error **errp);
 ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
-   MemoryRegion *mr);
-ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr);
+   MemoryRegion *mr, Error **errp);
+ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr, Error **errp);
 int qemu_get_ram_fd(ram_addr_t addr);
 void *qemu_get_ram_block_host_ptr(ram_addr_t addr);
 void *qemu_get_ram_ptr(ram_addr_t addr);
diff --git a/memory.c b/memory.c
index 1bae951..17cf40d 100644
--- a/memory.c
+++ b/memory.c
@@ -1148,7 +1148,7 @@ void memory_region_init_ram(MemoryRegion *mr,
 mr->ram = true;
 mr->terminates = true;
 mr->destructor = memory_region_destructor_ram;
-mr->ram_addr = qemu_ram_alloc(size, mr);
+mr->ram_addr = qemu_ram_alloc(size, mr, &error_abort);
 }
 
 #ifdef __linux__
@@ -1178,7 +1178,7 @@ void memory_region_init_ram_ptr(MemoryRegion *mr,
 mr->ram = true;
 mr->terminates = true;
 mr->destructor = memory_region_destructor_ram_from_ptr;
-mr->ram_addr = qemu_ram_alloc_from_ptr(size, ptr, mr);
+mr->ram_addr = qemu_ram_alloc_from_ptr(size, ptr, mr, &error_abort);
 }
 
 void memory_region_init_alias(MemoryRegion *mr,
@@ -1208,7 +1208,7 @@ void memory_region_init_rom_device(MemoryRegion *mr,
 mr->terminates = true;
 mr->rom_device = true;
 mr->destructor = memory_region_destruct

[Qemu-devel] [PATCH v7 RESEND 0/8] memory API improvements and bug fixes for memory backends

2014-09-08 Thread Hu Tao

This is merely a rebase of v7, and fixes two merge conflicts.

This series includes two parts:
 
  1. part 1 includes patches 1-4, which improves error handling of
 memory_region_init_ram, memory_region_init_ram_ptr and
 memory_region_init_rom_device
 
  2. part 2 includes patches 5-8, each fixes a bug of memory backend.
 
changes:

  v7:
  - split the memory-file preallocation fix into its own patch(patch 6)
  - some commit messages tweaks

  v6:
  - split patch 6 in v5 into 2
  - use local_err instead of errp
  - typo fixes

  v5:
  - don't introduce _may_fail and _nofail versions of memory API
  - the patches order of bug fix and memory API change is exchanged, first
comes the API change, then bug fix.

  v4:
  - fix build on 32bit host
  - add memory API renaming
  - rebase on latest master

  v3:
  - introduce memory_region_init_ram_may_fail and
memory_region_init_ram_ptr_may_fail
  - address comments by MST
  - missing the functions renaming. will send later.

  v2:
  - split patch 1 in v1 into 2 patches
  - don't rely on ram_block_add to return -1
  - error message tweak in file_ram_alloc
  - add error messages reported by qemu to commit message of patch 3

  v1:
  - initial version


Hu Tao (8):
  exec: add parameter errp to qemu_ram_alloc and qemu_ram_alloc_from_ptr
  memory: add parameter errp to memory_region_init_ram
  memory: add parameter errp to memory_region_init_ram_ptr
  memory: add parameter errp to memory_region_init_rom_device
  hostmem-ram: don't exit qemu if size of memory-backend-ram is way too
big
  exec: file_ram_alloc: don't exit if failed to preallocate memory
  exec: report error when memory < hpagesize
  exec: add parameter errp to gethugepagesize

 backends/hostmem-ram.c   |  2 +-
 exec.c   | 60 ++--
 hw/alpha/typhoon.c   |  3 +-
 hw/arm/armv7m.c  |  7 ++--
 hw/arm/cubieboard.c  |  2 +-
 hw/arm/digic_boards.c|  2 +-
 hw/arm/exynos4210.c  |  9 ++---
 hw/arm/highbank.c|  5 +--
 hw/arm/integratorcp.c|  5 +--
 hw/arm/kzm.c |  4 +--
 hw/arm/mainstone.c   |  3 +-
 hw/arm/musicpal.c|  6 ++--
 hw/arm/omap1.c   |  6 ++--
 hw/arm/omap2.c   |  6 ++--
 hw/arm/omap_sx1.c|  6 ++--
 hw/arm/palm.c|  3 +-
 hw/arm/pxa2xx.c  | 11 +++---
 hw/arm/realview.c|  9 +++--
 hw/arm/spitz.c   |  2 +-
 hw/arm/strongarm.c   |  3 +-
 hw/arm/tosa.c|  2 +-
 hw/arm/versatilepb.c |  3 +-
 hw/arm/vexpress.c| 15 +---
 hw/arm/virt.c|  3 +-
 hw/arm/xilinx_zynq.c |  6 ++--
 hw/block/onenand.c   |  2 +-
 hw/block/pflash_cfi01.c  |  2 +-
 hw/block/pflash_cfi02.c  |  2 +-
 hw/core/loader.c |  2 +-
 hw/cris/axis_dev88.c |  6 ++--
 hw/display/cg3.c |  6 ++--
 hw/display/g364fb.c  |  2 +-
 hw/display/qxl.c |  6 ++--
 hw/display/sm501.c   |  2 +-
 hw/display/tc6393xb.c|  3 +-
 hw/display/tcx.c |  5 +--
 hw/display/vga.c |  3 +-
 hw/display/vmware_vga.c  |  3 +-
 hw/i386/kvm/pci-assign.c |  6 ++--
 hw/i386/pc.c |  3 +-
 hw/i386/pc_sysfw.c   |  5 +--
 hw/input/milkymist-softusb.c |  4 +--
 hw/lm32/lm32_boards.c|  6 ++--
 hw/lm32/milkymist.c  |  3 +-
 hw/m68k/an5206.c |  4 +--
 hw/m68k/dummy_m68k.c |  2 +-
 hw/m68k/mcf5208.c|  4 +--
 hw/microblaze/petalogix_ml605_mmu.c  |  5 +--
 hw/microblaze/petalogix_s3adsp1800_mmu.c |  6 ++--
 hw/mips/mips_fulong2e.c  |  5 +--
 hw/mips/mips_jazz.c  |  8 +++--
 hw/mips/mips_malta.c |  6 ++--
 hw/mips/mips_mipssim.c   |  6 ++--
 hw/mips/mips_r4k.c   |  5 +--
 hw/misc/ivshmem.c|  5 +--
 hw/misc/vfio.c   |  3 +-
 hw/moxie/moxiesim.c  |  4 +--
 hw/net/milkymist-minimac2.c  |  2 +-
 hw/openrisc/openrisc_sim.c   |  2 +-
 hw/pci-host/prep.c   |  3 +-
 hw/pci/pci.c |  2 +-
 hw/ppc/mac_newworld.c|  3 +-
 hw/ppc/mac_oldworld.c|  3 +-
 hw/ppc/ppc405_boards.c

[Qemu-devel] [PULL] qemu-sparc update

2014-09-08 Thread Mark Cave-Ayland

Hi Peter,

I've just updated my qemu-sparc branch with an implementation of the apb PCI 
error registers. Please pull.


ATB,

Mark. 


The following changes since commit 1bc0e405816c9c6bde5695af20b07a1491ce1f9f:

  Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into 
staging (2014-09-08 13:14:41 +0100)

are available in the git repository at:


  https://github.com/mcayland/qemu.git tags/qemu-sparc-signed

for you to fetch changes up to de739df8e0032d441a57ec1a8f9ab8e9bf72cef0:

  apb: implement PCI bus error interrupt map registers (2014-09-09 06:07:12 
+0100)


apb: implement PCI bus error interrupt map registers


Mark Cave-Ayland (1):
  apb: implement PCI bus error interrupt map registers

 hw/pci-host/apb.c |   15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

[Qemu-devel] [PATCH v14 4/5] raw-posix: Add falloc and full preallocation option

2014-09-08 Thread Hu Tao

This patch adds a new option preallocation for raw format, and implements
falloc and full preallocation.

Signed-off-by: Hu Tao 
Reviewed-by: Max Reitz 
---
 block/raw-posix.c | 93 +++
 qemu-doc.texi |  9 ++
 qemu-img.texi |  9 ++
 3 files changed, 92 insertions(+), 19 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7208c05..27c854c 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -30,6 +30,7 @@
 #include "block/thread-pool.h"
 #include "qemu/iov.h"
 #include "raw-aio.h"
+#include "qapi/util.h"
 
 #if defined(__APPLE__) && (__MACH__)
 #include 
@@ -1365,6 +1366,9 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
 int result = 0;
 int64_t total_size = 0;
 bool nocow = false;
+PreallocMode prealloc;
+char *buf = NULL;
+Error *local_err = NULL;
 
 strstart(filename, "file:", &filename);
 
@@ -1372,37 +1376,83 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
 total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
   BDRV_SECTOR_SIZE);
 nocow = qemu_opt_get_bool(opts, BLOCK_OPT_NOCOW, false);
+buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
+prealloc = qapi_enum_parse(PreallocMode_lookup, buf,
+   PREALLOC_MODE_MAX, PREALLOC_MODE_OFF,
+   &local_err);
+g_free(buf);
+if (local_err) {
+error_propagate(errp, local_err);
+result = -EINVAL;
+goto out;
+}
 
 fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
0644);
 if (fd < 0) {
 result = -errno;
 error_setg_errno(errp, -result, "Could not create file");
-} else {
-if (nocow) {
+goto out;
+}
+
+if (nocow) {
 #ifdef __linux__
-/* Set NOCOW flag to solve performance issue on fs like btrfs.
- * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value
- * will be ignored since any failure of this operation should not
- * block the left work.
- */
-int attr;
-if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
-attr |= FS_NOCOW_FL;
-ioctl(fd, FS_IOC_SETFLAGS, &attr);
-}
-#endif
+/* Set NOCOW flag to solve performance issue on fs like btrfs.
+ * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value
+ * will be ignored since any failure of this operation should not
+ * block the left work.
+ */
+int attr;
+if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
+attr |= FS_NOCOW_FL;
+ioctl(fd, FS_IOC_SETFLAGS, &attr);
 }
+#endif
+}
 
-if (ftruncate(fd, total_size) != 0) {
-result = -errno;
-error_setg_errno(errp, -result, "Could not resize file");
+if (ftruncate(fd, total_size) != 0) {
+result = -errno;
+error_setg_errno(errp, -result, "Could not resize file");
+goto out_close;
+}
+
+if (prealloc == PREALLOC_MODE_FALLOC) {
+/* posix_fallocate() doesn't set errno. */
+result = -posix_fallocate(fd, 0, total_size);
+if (result != 0) {
+error_setg_errno(errp, -result,
+ "Could not preallocate data for the new file");
 }
-if (qemu_close(fd) != 0) {
-result = -errno;
-error_setg_errno(errp, -result, "Could not close the new file");
+} else if (prealloc == PREALLOC_MODE_FULL) {
+buf = g_malloc0(65536);
+int64_t num = 0, left = total_size;
+
+while (left > 0) {
+num = MIN(left, 65536);
+result = write(fd, buf, num);
+if (result < 0) {
+result = -errno;
+error_setg_errno(errp, -result,
+ "Could not write to the new file");
+g_free(buf);
+goto out_close;
+}
+left -= num;
 }
+fsync(fd);
+g_free(buf);
+} else if (prealloc != PREALLOC_MODE_OFF) {
+result = -EINVAL;
+error_setg(errp, "Unsupported preallocation mode: %s",
+   PreallocMode_lookup[prealloc]);
 }
+
+out_close:
+if (qemu_close(fd) != 0 && result == 0) {
+result = -errno;
+error_setg_errno(errp, -result, "Could not close the new file");
+}
+out:
 return result;
 }
 
@@ -1585,6 +1635,11 @@ static QemuOptsList raw_create_opts = {
 .type = QEMU_OPT_BOOL,
 .help = "Turn off copy-on-write (valid only on btrfs)"
 },
+{
+.name = BLOCK_OPT_PREALLOC,
+.type = QEMU_OPT_STRING,
+.help = "Preallocation mode (allowed values: off, falloc, full)"
+},
 { /* end of list */ }
 }

[Qemu-devel] [PATCH v14 5/5] qcow2: Add falloc and full preallocation option

2014-09-08 Thread Hu Tao

preallocation=falloc allocates disk space by posix_fallocate(),
preallocation=full allocates disk space by writing zeros to disk.
Both modes imply preallocation=metadata.

Signed-off-by: Hu Tao 
Reviewed-by: Max Reitz 
---
 block/qcow2.c  | 62 +++---
 qemu-doc.texi  |  8 +++---
 qemu-img.texi  |  8 +++---
 tests/qemu-iotests/082.out | 54 
 4 files changed, 90 insertions(+), 42 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 94d1225..375c6a6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1772,6 +1772,56 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
 Error *local_err = NULL;
 int ret;
 
+if (prealloc == PREALLOC_MODE_FULL || prealloc == PREALLOC_MODE_FALLOC) {
+int64_t meta_size = 0;
+uint64_t nreftablee, nrefblocke, nl1e, nl2e;
+int64_t aligned_total_size = align_offset(total_size, cluster_size);
+
+/* header: 1 cluster */
+meta_size += cluster_size;
+
+/* total size of L2 tables */
+nl2e = aligned_total_size / cluster_size;
+nl2e = align_offset(nl2e, cluster_size / sizeof(uint64_t));
+meta_size += nl2e * sizeof(uint64_t);
+
+/* total size of L1 tables */
+nl1e = nl2e * sizeof(uint64_t) / cluster_size;
+nl1e = align_offset(nl1e, cluster_size / sizeof(uint64_t));
+meta_size += nl1e * sizeof(uint64_t);
+
+/* total size of refcount blocks
+ *
+ * note: every host cluster is reference-counted, including metadata
+ * (even refcount blocks are recursively included).
+ * Let:
+ *   a = total_size (this is the guest disk size)
+ *   m = meta size not including refcount blocks and refcount tables
+ *   c = cluster size
+ *   y1 = number of refcount blocks entries
+ *   y2 = meta size including everything
+ * then,
+ *   y1 = (y2 + a)/c
+ *   y2 = y1 * sizeof(u16) + y1 * sizeof(u16) * sizeof(u64) / c + m
+ * we can get y1:
+ *   y1 = (a + m) / (c - sizeof(u16) - sizeof(u16) * sizeof(u64) / c)
+ */
+nrefblocke = (aligned_total_size + meta_size + cluster_size) /
+(cluster_size - sizeof(uint16_t) -
+ 1.0 * sizeof(uint16_t) * sizeof(uint64_t) / cluster_size);
+nrefblocke = align_offset(nrefblocke, cluster_size / sizeof(uint16_t));
+meta_size += nrefblocke * sizeof(uint16_t);
+
+/* total size of refcount tables */
+nreftablee = nrefblocke * sizeof(uint16_t) / cluster_size;
+nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
+meta_size += nreftablee * sizeof(uint64_t);
+
+qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
+aligned_total_size + meta_size);
+qemu_opt_set(opts, BLOCK_OPT_PREALLOC, PreallocMode_lookup[prealloc]);
+}
+
 ret = bdrv_create_file(filename, opts, &local_err);
 if (ret < 0) {
 error_propagate(errp, local_err);
@@ -1877,7 +1927,7 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
 }
 
 /* And if we're supposed to preallocate metadata, do that now */
-if (prealloc) {
+if (prealloc != PREALLOC_MODE_OFF) {
 BDRVQcowState *s = bs->opaque;
 qemu_co_mutex_lock(&s->lock);
 ret = preallocate(bs);
@@ -1958,13 +2008,6 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 flags |= BLOCK_FLAG_LAZY_REFCOUNTS;
 }
 
-if (prealloc && prealloc != PREALLOC_MODE_METADATA) {
-ret = -EINVAL;
-error_setg(errp, "Unsupported preallocate mode: %s",
-   PreallocMode_lookup[prealloc]);
-goto finish;
-}
-
 if (backing_file && prealloc) {
 error_setg(errp, "Backing file and preallocation cannot be used at "
"the same time");
@@ -2525,7 +2568,8 @@ static QemuOptsList qcow2_create_opts = {
 {
 .name = BLOCK_OPT_PREALLOC,
 .type = QEMU_OPT_STRING,
-.help = "Preallocation mode (allowed values: off, metadata)"
+.help = "Preallocation mode (allowed values: off, metadata, "
+"falloc, full)"
 },
 {
 .name = BLOCK_OPT_LAZY_REFCOUNTS,
diff --git a/qemu-doc.texi b/qemu-doc.texi
index 1f289d6..ef3be72 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -584,9 +584,11 @@ sizes can improve the image file size whereas larger 
cluster sizes generally
 provide better performance.
 
 @item preallocation
-Preallocation mode (allowed values: off, metadata). An image with preallocated
-metadata is initially larger but can improve performance when the image needs
-to grow.
+Preallocation mode (allowed values: @code{off}, @code{metadata}, @code{falloc},
+@code{full}). An image with preallocated metadata is initially larger but

[Qemu-devel] [PATCH v14 3/5] qapi: introduce PreallocMode and new PreallocModes full and falloc.

2014-09-08 Thread Hu Tao

This patch prepares for the subsequent patches.

Signed-off-by: Hu Tao 
Reviewed-by: Max Reitz 
Reviewed-by: Kevin Wolf 
---
 block/qcow2.c  | 23 +++
 qapi/block-core.json   | 17 +
 tests/qemu-iotests/049.out |  2 +-
 3 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index cf27c3f..94d1225 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -30,6 +30,7 @@
 #include "qemu/error-report.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/qmp/qbool.h"
+#include "qapi/util.h"
 #include "trace.h"
 #include "qemu/option_int.h"
 
@@ -1738,7 +1739,7 @@ static int preallocate(BlockDriverState *bs)
 
 static int qcow2_create2(const char *filename, int64_t total_size,
  const char *backing_file, const char *backing_format,
- int flags, size_t cluster_size, int prealloc,
+ int flags, size_t cluster_size, PreallocMode prealloc,
  QemuOpts *opts, int version,
  Error **errp)
 {
@@ -1915,7 +1916,7 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 uint64_t size = 0;
 int flags = 0;
 size_t cluster_size = DEFAULT_CLUSTER_SIZE;
-int prealloc = 0;
+PreallocMode prealloc;
 int version = 3;
 Error *local_err = NULL;
 int ret;
@@ -1931,12 +1932,11 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 cluster_size = qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE,
  DEFAULT_CLUSTER_SIZE);
 buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
-if (!buf || !strcmp(buf, "off")) {
-prealloc = 0;
-} else if (!strcmp(buf, "metadata")) {
-prealloc = 1;
-} else {
-error_setg(errp, "Invalid preallocation mode: '%s'", buf);
+prealloc = qapi_enum_parse(PreallocMode_lookup, buf,
+   PREALLOC_MODE_MAX, PREALLOC_MODE_OFF,
+   &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
 ret = -EINVAL;
 goto finish;
 }
@@ -1958,6 +1958,13 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 flags |= BLOCK_FLAG_LAZY_REFCOUNTS;
 }
 
+if (prealloc && prealloc != PREALLOC_MODE_METADATA) {
+ret = -EINVAL;
+error_setg(errp, "Unsupported preallocate mode: %s",
+   PreallocMode_lookup[prealloc]);
+goto finish;
+}
+
 if (backing_file && prealloc) {
 error_setg(errp, "Backing file and preallocation cannot be used at "
"the same time");
diff --git a/qapi/block-core.json b/qapi/block-core.json
index a685d02..a29dbe1 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1697,3 +1697,20 @@
 'len'   : 'int',
 'offset': 'int',
 'speed' : 'int' } }
+
+# @PreallocMode
+#
+# Preallocation mode of QEMU image file
+#
+# @off: no preallocation
+# @metadata: preallocate only for metadata
+# @falloc: like @full preallocation but allocate disk space by
+#  posix_fallocate() rather than writing zeros.
+# @full: preallocate all data by writing zeros to device to ensure disk
+#space is really available. @full preallocation also sets up
+#metadata correctly.
+#
+# Since 2.2
+##
+{ 'enum': 'PreallocMode',
+  'data': [ 'off', 'metadata', 'falloc', 'full' ] }
diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out
index 71ca44d..09ca0ae 100644
--- a/tests/qemu-iotests/049.out
+++ b/tests/qemu-iotests/049.out
@@ -179,7 +179,7 @@ qemu-img create -f qcow2 -o preallocation=metadata 
TEST_DIR/t.qcow2 64M
 Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off 
cluster_size=65536 preallocation='metadata' lazy_refcounts=off 
 
 qemu-img create -f qcow2 -o preallocation=1234 TEST_DIR/t.qcow2 64M
-qemu-img: TEST_DIR/t.qcow2: Invalid preallocation mode: '1234'
+qemu-img: TEST_DIR/t.qcow2: invalid parameter value: 1234
 Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off 
cluster_size=65536 preallocation='1234' lazy_refcounts=off 
 
 == Check encryption option ==
-- 
1.9.3

[Qemu-devel] [PATCH v14 2/5] block: don't convert file size to sector size

2014-09-08 Thread Hu Tao

and avoid converting it back later.

Signed-off-by: Hu Tao 
Reviewed-by: Max Reitz 
---
 block/gluster.c   |  9 -
 block/qcow.c  |  8 
 block/qcow2.c | 10 +-
 block/raw-posix.c | 12 ++--
 block/raw-win32.c |  6 +++---
 5 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 65c7a58..1eb3a8c 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -494,8 +494,8 @@ static int qemu_gluster_create(const char *filename,
 goto out;
 }
 
-total_size = DIV_ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
-  BDRV_SECTOR_SIZE);
+total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
 
 tmp = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
 if (!tmp || !strcmp(tmp, "off")) {
@@ -516,9 +516,8 @@ static int qemu_gluster_create(const char *filename,
 if (!fd) {
 ret = -errno;
 } else {
-if (!glfs_ftruncate(fd, total_size * BDRV_SECTOR_SIZE)) {
-if (prealloc && qemu_gluster_zerofill(fd, 0,
-total_size * BDRV_SECTOR_SIZE)) {
+if (!glfs_ftruncate(fd, total_size)) {
+if (prealloc && qemu_gluster_zerofill(fd, 0, total_size)) {
 ret = -errno;
 }
 } else {
diff --git a/block/qcow.c b/block/qcow.c
index 041af26..a87bd69 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -725,8 +725,8 @@ static int qcow_create(const char *filename, QemuOpts 
*opts, Error **errp)
 BlockDriverState *qcow_bs;
 
 /* Read out options */
-total_size = DIV_ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
-  BDRV_SECTOR_SIZE);
+total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
 backing_file = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FILE);
 if (qemu_opt_get_bool_del(opts, BLOCK_OPT_ENCRYPT, false)) {
 flags |= BLOCK_FLAG_ENCRYPT;
@@ -754,7 +754,7 @@ static int qcow_create(const char *filename, QemuOpts 
*opts, Error **errp)
 memset(&header, 0, sizeof(header));
 header.magic = cpu_to_be32(QCOW_MAGIC);
 header.version = cpu_to_be32(QCOW_VERSION);
-header.size = cpu_to_be64(total_size * 512);
+header.size = cpu_to_be64(total_size);
 header_size = sizeof(header);
 backing_filename_len = 0;
 if (backing_file) {
@@ -776,7 +776,7 @@ static int qcow_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 header_size = (header_size + 7) & ~7;
 shift = header.cluster_bits + header.l2_bits;
-l1_size = ((total_size * 512) + (1LL << shift) - 1) >> shift;
+l1_size = (total_size + (1LL << shift) - 1) >> shift;
 
 header.l1_table_offset = cpu_to_be64(header_size);
 if (flags & BLOCK_FLAG_ENCRYPT) {
diff --git a/block/qcow2.c b/block/qcow2.c
index c8050e5..cf27c3f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1859,7 +1859,7 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
 }
 
 /* Okay, now that we have a valid image, let's give it the right size */
-ret = bdrv_truncate(bs, total_size * BDRV_SECTOR_SIZE);
+ret = bdrv_truncate(bs, total_size);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not resize image");
 goto out;
@@ -1912,7 +1912,7 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 char *backing_file = NULL;
 char *backing_fmt = NULL;
 char *buf = NULL;
-uint64_t sectors = 0;
+uint64_t size = 0;
 int flags = 0;
 size_t cluster_size = DEFAULT_CLUSTER_SIZE;
 int prealloc = 0;
@@ -1921,8 +1921,8 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 int ret;
 
 /* Read out options */
-sectors = DIV_ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
-   BDRV_SECTOR_SIZE);
+size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+BDRV_SECTOR_SIZE);
 backing_file = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FILE);
 backing_fmt = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FMT);
 if (qemu_opt_get_bool_del(opts, BLOCK_OPT_ENCRYPT, false)) {
@@ -1972,7 +1972,7 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 goto finish;
 }
 
-ret = qcow2_create2(filename, sectors, backing_file, backing_fmt, flags,
+ret = qcow2_create2(filename, size, backing_file, backing_fmt, flags,
 cluster_size, prealloc, opts, version, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 9c22e3f..7208c05 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1369,8 +1369,8 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
 strstart(filename, "file:", &filename);

[Qemu-devel] [PATCH v14 1/5] block: round up file size to nearest sector

2014-09-08 Thread Hu Tao

Signed-off-by: Hu Tao 
Reviewed-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/archipelago.c  |  3 ++-
 block/cow.c  |  3 ++-
 block/gluster.c  |  4 +--
 block/iscsi.c|  4 +--
 block/nfs.c  |  3 ++-
 block/qcow.c |  3 ++-
 block/qcow2.c|  3 ++-
 block/qed.c  |  3 ++-
 block/raw-posix.c|  8 +++---
 block/raw-win32.c|  4 +--
 block/rbd.c  |  3 ++-
 block/sheepdog.c |  3 ++-
 block/ssh.c  |  3 ++-
 block/vdi.c  |  3 ++-
 block/vhdx.c |  3 ++-
 block/vmdk.c |  3 ++-
 block/vpc.c  |  3 ++-
 tests/qemu-iotests/104   | 57 
 tests/qemu-iotests/104.out   | 12 +
 tests/qemu-iotests/common.filter | 21 +++
 tests/qemu-iotests/group |  1 +
 21 files changed, 127 insertions(+), 23 deletions(-)
 create mode 100755 tests/qemu-iotests/104
 create mode 100644 tests/qemu-iotests/104.out

diff --git a/block/archipelago.c b/block/archipelago.c
index 22a7daa..06c51f9 100644
--- a/block/archipelago.c
+++ b/block/archipelago.c
@@ -708,7 +708,8 @@ static int qemu_archipelago_create(const char *filename,
 
 parse_filename_opts(filename, errp, &volname, &segment_name, &mport,
 &vport);
-total_size = qemu_opt_get_size_del(options, BLOCK_OPT_SIZE, 0);
+total_size = ROUND_UP(qemu_opt_get_size_del(options, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
 
 if (segment_name == NULL) {
 segment_name = g_strdup("archipelago");
diff --git a/block/cow.c b/block/cow.c
index 6ee4833..c3769fe 100644
--- a/block/cow.c
+++ b/block/cow.c
@@ -335,7 +335,8 @@ static int cow_create(const char *filename, QemuOpts *opts, 
Error **errp)
 BlockDriverState *cow_bs = NULL;
 
 /* Read out options */
-image_sectors = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) / 512;
+image_sectors = DIV_ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 
0),
+ BDRV_SECTOR_SIZE);
 image_filename = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FILE);
 
 ret = bdrv_create_file(filename, opts, &local_err);
diff --git a/block/gluster.c b/block/gluster.c
index 1912cf9..65c7a58 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -494,8 +494,8 @@ static int qemu_gluster_create(const char *filename,
 goto out;
 }
 
-total_size =
-qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) / BDRV_SECTOR_SIZE;
+total_size = DIV_ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
 
 tmp = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
 if (!tmp || !strcmp(tmp, "off")) {
diff --git a/block/iscsi.c b/block/iscsi.c
index 3e19202..84bcae8 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1531,8 +1531,8 @@ static int iscsi_create(const char *filename, QemuOpts 
*opts, Error **errp)
 bs = bdrv_new("", &error_abort);
 
 /* Read out options */
-total_size =
-qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) / BDRV_SECTOR_SIZE;
+total_size = DIV_ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
 bs->opaque = g_new0(struct IscsiLun, 1);
 iscsilun = bs->opaque;
 
diff --git a/block/nfs.c b/block/nfs.c
index 194f301..c76e368 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -418,7 +418,8 @@ static int nfs_file_create(const char *url, QemuOpts *opts, 
Error **errp)
 client->aio_context = qemu_get_aio_context();
 
 /* Read out options */
-total_size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
+total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
 
 ret = nfs_client_open(client, url, O_CREAT, errp);
 if (ret < 0) {
diff --git a/block/qcow.c b/block/qcow.c
index 67c237f..041af26 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -725,7 +725,8 @@ static int qcow_create(const char *filename, QemuOpts 
*opts, Error **errp)
 BlockDriverState *qcow_bs;
 
 /* Read out options */
-total_size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) / 512;
+total_size = DIV_ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
 backing_file = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FILE);
 if (qemu_opt_get_bool_del(opts, BLOCK_OPT_ENCRYPT, false)) {
 flags |= BLOCK_FLAG_ENCRYPT;
diff --git a/block/qcow2.c b/block/qcow2.c
index f9e045f..c8050e5 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1921,7 +1921,8 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 int ret;
 
 /* Read out options */
-sectors = qemu_op

[Qemu-devel] [PATCH v14 0/5] qcow2, raw: add preallocation=full and preallocation=falloc

2014-09-08 Thread Hu Tao

This series adds two preallocation mode to qcow2 and raw:

Option preallocation=full preallocates disk space for image by writing
zeros to disk, this ensures disk space in any cases.

Option preallocation=falloc preallocates disk space by calling
posix_fallocate(). This is faster than preallocation=full.

Note: there is a false positive reported by checkpatch.pl to patch 1.

changes to v13:

  - rebase (patch 3 in v13 is already in)
  - don't convert file size to sector size in hdev_create(), too (patch 2)
  - reintroduce preallocation=falloc. (patch 3)
  - split the implementation of preallocation=full in v13 into
preallocation=falloc and preallocation=full (patch 4)

changes to v12:

  - remove dependence on minimal_blob_size() (patch 6)
  - remove preallocation=falloc. (patch 4)
  - preallocation=full tries posix_fallocate() first then writing
zeros (patch 5)
  - round up file size for all formats (patch 1)
  - avoid converting file size for more formats (patch 2)

changes to v11:

 - fix test case 049 (patch 4)
 - unsigned nl2e -> uint64_t nl2e (patch 6)
 - use >> instead of / (patch 6)

changes to v10:

  - PreallocMode is moved from file qapi-schema.json to qapi/block-core.json
  - introdues preallocation=falloc, no changes to preallocation=metadata
  - using minimal_blob_size() to calculate metadata size for qcow2
  - indentation fix in file blockdev.c

changes to v9:

 - use ROUND_UP to do round up
 - split the round up into its own patch and add test case
 - new patch rename parse_enum_option to qapi_enum_parse and make it public
 - reuse qapi_enum_parse

changes to v8:

 - round up image file size to nearest sector size
 - dont' blindly lose error info
 - target for 2.1 rather than 2.0
 - and, rebase to latest git tree

changes to v5:

  - add `Since 2.0' to PreallocMode
  - apply total_size change to raw-win32.c as well

changes to v4:

  - fix wrong calculation of qcow2 metadata size in v4
  - remove raw_preallocate2()
  - better error out path in raw_create()
  - fix coding style

changes to v3:

  - remove bdrv_preallocate and make preallocation a
bdrv_create_file option
  - prealloc_mode -> PreallocMode and add it to QAPI
  - fix return value in raw_preallocate2

changes to v2:

  - Fix comments to v2 by Fam.
  - qcow2: first fallocate disk space, then allocate metadata. This avoids
the problem in v2 that bdrv_preallocate may clear all information in
metadata. This does not necessarily map all data clusters sequentially
but does keep information in metadata. Peter, is this acceptable?


Hu Tao (5):
  block: round up file size to nearest sector
  block: don't convert file size to sector size
  qapi: introduce PreallocMode and new PreallocModes full and falloc.
  raw-posix: Add falloc and full preallocation option
  qcow2: Add falloc and full preallocation option

 block/archipelago.c  |   3 +-
 block/cow.c  |   3 +-
 block/gluster.c  |   9 ++--
 block/iscsi.c|   4 +-
 block/nfs.c  |   3 +-
 block/qcow.c |   7 +--
 block/qcow2.c|  80 --
 block/qed.c  |   3 +-
 block/raw-posix.c| 103 ++-
 block/raw-win32.c|   6 +--
 block/rbd.c  |   3 +-
 block/sheepdog.c |   3 +-
 block/ssh.c  |   3 +-
 block/vdi.c  |   3 +-
 block/vhdx.c |   3 +-
 block/vmdk.c |   3 +-
 block/vpc.c  |   3 +-
 qapi/block-core.json |  17 +++
 qemu-doc.texi|  17 +--
 qemu-img.texi|  17 +--
 tests/qemu-iotests/049.out   |   2 +-
 tests/qemu-iotests/082.out   |  54 ++--
 tests/qemu-iotests/104   |  57 ++
 tests/qemu-iotests/104.out   |  12 +
 tests/qemu-iotests/common.filter |  21 
 tests/qemu-iotests/group |   1 +
 26 files changed, 344 insertions(+), 96 deletions(-)
 create mode 100755 tests/qemu-iotests/104
 create mode 100644 tests/qemu-iotests/104.out

-- 
1.9.3

Re: [Qemu-devel] [PATCH v3 03/47] Start documenting how postcopy works.

2014-09-08 Thread Hongyang Yang


在 09/09/2014 11:34 AM, Hongyang Yang 写道:

Hi

   I've read your documentation about Postcopy, this is interesting.
It comes to my mind that if COLO can gain some improvements from
Postcopy.
   The first thing I thought was that if we can use the back channel
that request dirty pages from source so that we do not need to manage
a ram snapshot of the source. That is, when entered a COLO checkpoint,
we just request the pages that dirtied on destination from source.
But I'm not sure it won't affect the performance, anyway, it may worth
a try.


Or we can reuse your kernel side patch, that generate the page fault when
destination access a page that was dirty. and then we load the page from
the ram snapshot we managed so that we do not need to flush the dirty page
when checkpoint. It may reduce the checkpoint duration time.



在 08/28/2014 11:03 PM, Dr. David Alan Gilbert (git) 写道:

From: "Dr. David Alan Gilbert" 

Signed-off-by: Dr. David Alan Gilbert 
---
  docs/migration.txt | 188 +
  1 file changed, 188 insertions(+)

diff --git a/docs/migration.txt b/docs/migration.txt
index 0492a45..7f0fdc4 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -294,3 +294,191 @@ save/send this state when we are in the middle of a pio
operation
  (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
  not enabled, the values on that fields are garbage and don't need to
  be sent.
+
+= Return path =
+
+In most migration scenarios there is only a single data path that runs
+from the source VM to the destination, typically along a single fd (although
+possibly with another fd or similar for some fast way of throwing pages 
across).
+
+However, some uses need two way communication; in particular the Postcopy
destination
+needs to be able to request pages on demand from the source.
+
+For these scenarios there is a 'return path' from the destination to the 
source;
+qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
+path.
+
+  Source side
+ Forward path - written by migration thread
+ Return path  - opened by main thread, read by return-path thread
+
+  Destination side
+ Forward path - read by main thread
+ Return path  - opened by main thread, written by main thread AND postcopy
+thread (protected by rp_mutex)
+
+= Postcopy =
+'Postcopy' migration is a way to deal with migrations that refuse to converge;
+its plus side is that there is an upper bound on the amount of migration 
traffic
+and time it takes, the down side is that during the postcopy phase, a failure 
of
+*either* side or the network connection causes the guest to be lost.
+
+In postcopy the destination CPUs are started before all the memory has been
+transferred, and accesses to pages that are yet to be transferred cause
+a fault that's translated by QEMU into a request to the source QEMU.
+
+Postcopy can be combined with precopy (i.e. normal migration) so that if 
precopy
+doesn't finish in a given time the switch is automatically made to precopy.
+
+=== Enabling postcopy ===
+
+To enable postcopy (prior to the start of migration):
+
+migrate_set_capability x-postcopy-ram on
+
+The migration will still start in precopy mode, however issuing:
+
+migrate_start_postcopy
+
+will now cause the transition from precopy to postcopy.
+It can be issued immediately after migration is started or any
+time later on.  Issuing it after the end of a migration is harmless.
+
+=== Postcopy device transfer ===
+
+Loading of device data may cause the device emulation to access guest RAM
+that may trigger faults that have to be resolved by the source, as such
+the migration stream has to be able to respond with page data *during* the
+device load, and hence the device data has to be read from the stream 
completely
+before the device load begins to free the stream up.  This is achieved by
+'packaging' the device data into a blob that's read in one go.
+
+Source behaviour
+
+Until postcopy is entered the migration stream is identical to normal postcopy,
+except for the addition of a 'postcopy advise' command at the beginning to
+let the destination know that postcopy might happen.  When postcopy starts
+the source sends the page discard data and then forms the 'package' containing:
+
+   Command: 'postcopy ram listen'
+   The device state
+  A series of sections, identical to the precopy streams device state 
stream
+  containing everything except postcopiable devices (i.e. RAM)
+   Command: 'postcopy ram run'
+
+The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
+contents are formatted in the same way as the main migration stream.
+
+Destination behaviour
+
+Initially the destination looks the same as precopy, with a single thread
+reading the migration stream; the 'postcopy advise' and 'discard' commands
+are processed to change the way RAM is managed, but don't affect the stream
+processing.
+
+

Re: [Qemu-devel] [PATCH v3 03/47] Start documenting how postcopy works.

2014-09-08 Thread Hongyang Yang




在 08/28/2014 11:03 PM, Dr. David Alan Gilbert (git) 写道:

From: "Dr. David Alan Gilbert" 

Signed-off-by: Dr. David Alan Gilbert 
---
  docs/migration.txt | 188 +
  1 file changed, 188 insertions(+)

diff --git a/docs/migration.txt b/docs/migration.txt
index 0492a45..7f0fdc4 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -294,3 +294,191 @@ save/send this state when we are in the middle of a pio 
operation
  (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
  not enabled, the values on that fields are garbage and don't need to
  be sent.
+
+= Return path =
+
+In most migration scenarios there is only a single data path that runs
+from the source VM to the destination, typically along a single fd (although
+possibly with another fd or similar for some fast way of throwing pages 
across).
+
+However, some uses need two way communication; in particular the Postcopy 
destination
+needs to be able to request pages on demand from the source.
+
+For these scenarios there is a 'return path' from the destination to the 
source;
+qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
+path.
+
+  Source side
+ Forward path - written by migration thread
+ Return path  - opened by main thread, read by return-path thread
+
+  Destination side
+ Forward path - read by main thread
+ Return path  - opened by main thread, written by main thread AND postcopy
+thread (protected by rp_mutex)
+
+= Postcopy =
+'Postcopy' migration is a way to deal with migrations that refuse to converge;
+its plus side is that there is an upper bound on the amount of migration 
traffic
+and time it takes, the down side is that during the postcopy phase, a failure 
of
+*either* side or the network connection causes the guest to be lost.
+
+In postcopy the destination CPUs are started before all the memory has been
+transferred, and accesses to pages that are yet to be transferred cause
+a fault that's translated by QEMU into a request to the source QEMU.
+
+Postcopy can be combined with precopy (i.e. normal migration) so that if 
precopy
+doesn't finish in a given time the switch is automatically made to precopy.


I think you mean "automatically made to postcopy" here?


+
+=== Enabling postcopy ===
+
+To enable postcopy (prior to the start of migration):
+
+migrate_set_capability x-postcopy-ram on
+
+The migration will still start in precopy mode, however issuing:
+
+migrate_start_postcopy
+
+will now cause the transition from precopy to postcopy.
+It can be issued immediately after migration is started or any
+time later on.  Issuing it after the end of a migration is harmless.
+
+=== Postcopy device transfer ===
+
+Loading of device data may cause the device emulation to access guest RAM
+that may trigger faults that have to be resolved by the source, as such
+the migration stream has to be able to respond with page data *during* the
+device load, and hence the device data has to be read from the stream 
completely
+before the device load begins to free the stream up.  This is achieved by
+'packaging' the device data into a blob that's read in one go.
+
+Source behaviour
+
+Until postcopy is entered the migration stream is identical to normal postcopy,
+except for the addition of a 'postcopy advise' command at the beginning to
+let the destination know that postcopy might happen.  When postcopy starts


A comma here?


+the source sends the page discard data and then forms the 'package' containing:
+
+   Command: 'postcopy ram listen'
+   The device state
+  A series of sections, identical to the precopy streams device state 
stream
+  containing everything except postcopiable devices (i.e. RAM)
+   Command: 'postcopy ram run'
+
+The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
+contents are formatted in the same way as the main migration stream.
+
+Destination behaviour
+
+Initially the destination looks the same as precopy, with a single thread
+reading the migration stream; the 'postcopy advise' and 'discard' commands
+are processed to change the way RAM is managed, but don't affect the stream
+processing.
+
+--
+1  2   3 4 5  6   7
+main -DISCARD-CMD_PACKAGED ( LISTEN  DEVICE DEVICE DEVICE RUN )
+thread |   |
+   | (page request)
+   |\___
+   v\
+listen thread: --- page -- page -- page -- page -- page --
+
+   a   bc
+--
+
+On receipt of CMD_PACKAGED (1)
+   All the data associated with the package - the ( ... ) section in the
+diagram - is rea

Re: [Qemu-devel] [PATCH v3 03/47] Start documenting how postcopy works.

2014-09-08 Thread Hongyang Yang


Hi

  I've read your documentation about Postcopy, this is interesting.
It comes to my mind that if COLO can gain some improvements from
Postcopy.
  The first thing I thought was that if we can use the back channel
that request dirty pages from source so that we do not need to manage
a ram snapshot of the source. That is, when entered a COLO checkpoint,
we just request the pages that dirtied on destination from source.
But I'm not sure it won't affect the performance, anyway, it may worth
a try.

在 08/28/2014 11:03 PM, Dr. David Alan Gilbert (git) 写道:

From: "Dr. David Alan Gilbert" 

Signed-off-by: Dr. David Alan Gilbert 
---
  docs/migration.txt | 188 +
  1 file changed, 188 insertions(+)

diff --git a/docs/migration.txt b/docs/migration.txt
index 0492a45..7f0fdc4 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -294,3 +294,191 @@ save/send this state when we are in the middle of a pio 
operation
  (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
  not enabled, the values on that fields are garbage and don't need to
  be sent.
+
+= Return path =
+
+In most migration scenarios there is only a single data path that runs
+from the source VM to the destination, typically along a single fd (although
+possibly with another fd or similar for some fast way of throwing pages 
across).
+
+However, some uses need two way communication; in particular the Postcopy 
destination
+needs to be able to request pages on demand from the source.
+
+For these scenarios there is a 'return path' from the destination to the 
source;
+qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
+path.
+
+  Source side
+ Forward path - written by migration thread
+ Return path  - opened by main thread, read by return-path thread
+
+  Destination side
+ Forward path - read by main thread
+ Return path  - opened by main thread, written by main thread AND postcopy
+thread (protected by rp_mutex)
+
+= Postcopy =
+'Postcopy' migration is a way to deal with migrations that refuse to converge;
+its plus side is that there is an upper bound on the amount of migration 
traffic
+and time it takes, the down side is that during the postcopy phase, a failure 
of
+*either* side or the network connection causes the guest to be lost.
+
+In postcopy the destination CPUs are started before all the memory has been
+transferred, and accesses to pages that are yet to be transferred cause
+a fault that's translated by QEMU into a request to the source QEMU.
+
+Postcopy can be combined with precopy (i.e. normal migration) so that if 
precopy
+doesn't finish in a given time the switch is automatically made to precopy.
+
+=== Enabling postcopy ===
+
+To enable postcopy (prior to the start of migration):
+
+migrate_set_capability x-postcopy-ram on
+
+The migration will still start in precopy mode, however issuing:
+
+migrate_start_postcopy
+
+will now cause the transition from precopy to postcopy.
+It can be issued immediately after migration is started or any
+time later on.  Issuing it after the end of a migration is harmless.
+
+=== Postcopy device transfer ===
+
+Loading of device data may cause the device emulation to access guest RAM
+that may trigger faults that have to be resolved by the source, as such
+the migration stream has to be able to respond with page data *during* the
+device load, and hence the device data has to be read from the stream 
completely
+before the device load begins to free the stream up.  This is achieved by
+'packaging' the device data into a blob that's read in one go.
+
+Source behaviour
+
+Until postcopy is entered the migration stream is identical to normal postcopy,
+except for the addition of a 'postcopy advise' command at the beginning to
+let the destination know that postcopy might happen.  When postcopy starts
+the source sends the page discard data and then forms the 'package' containing:
+
+   Command: 'postcopy ram listen'
+   The device state
+  A series of sections, identical to the precopy streams device state 
stream
+  containing everything except postcopiable devices (i.e. RAM)
+   Command: 'postcopy ram run'
+
+The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
+contents are formatted in the same way as the main migration stream.
+
+Destination behaviour
+
+Initially the destination looks the same as precopy, with a single thread
+reading the migration stream; the 'postcopy advise' and 'discard' commands
+are processed to change the way RAM is managed, but don't affect the stream
+processing.
+
+--
+1  2   3 4 5  6   7
+main -DISCARD-CMD_PACKAGED ( LISTEN  DEVICE DEVICE DEVICE RUN )
+thread |   |
+   | (page request)
+

Re: [Qemu-devel] [PATCH] vnc: add additional key up event before repeated key down

2014-09-08 Thread Chun Yan Liu



>>> On 9/6/2014 at 05:23 AM, in message
, Stefano
Stabellini  wrote: 
> On Fri, 5 Sep 2014, Chunyan Liu wrote: 
> > Using xen tools 'xl vncviewer' with tigervnc (default on SLE-12), 
> > found that: the display of the guest is unexpected while keep 
> > pressing a key. We expect the same character multiple times, but 
> > it prints only one time. This happens on a PV guest in text mode. 
> >  
> > After debugging, found that tigervnc sends repeated key down events 
> > in this case, to differentiate from user pressing the same key many 
> > times. Vnc server only prints the character when it finally receives 
> > key up event. 
>  
> Is this actually how a vnc client should behave? 
> How do the vnc client and server from realvnc behave in this regard 
> (they are the reference implementation)? 

VNC protocol doesn't specify how to handle key repetition. Tightvnc
sends key-down&key-up repeatedly, but some example like RealVNC for
Windows does the same thing - it sends only repeated key-down.

Generally the VNC keyboard handling gives lot of space for interpretation
and so the implementations differ.

>  
>  
> > To solve this issue, this patch tries to add additional key up event 
> > before the next repeated key down event (if the key is not a control 
> > key). 
> >  
> > Signed-off-by: Chunyan Liu  
> > --- 
> >  ui/vnc.c | 19 +++ 
> >  1 file changed, 19 insertions(+) 
> >  
> > diff --git a/ui/vnc.c b/ui/vnc.c 
> > index f8d9b7d..a265378 100644 
> > --- a/ui/vnc.c 
> > +++ b/ui/vnc.c 
> > @@ -1659,6 +1659,25 @@ static void do_key_event(VncState *vs, int down, int 
> >  
> keycode, int sym) 
> >  if (down) 
> >  vs->modifiers_state[keycode] ^= 1; 
> >  break; 
> > +default: 
> > +if (qemu_console_is_graphic(NULL)) { 
> > +/* record key 'down' info. Some client like tigervnc 
> > + * will send key down repeatedly if user pressing a 
> > + * a key for long time. In this case, we should add 
> > + * additional key up event before repeated key down, 
> > + * so that it can display the key multiple times. 
> > + */ 
> > +if (down) { 
> > +if (vs->modifiers_state[keycode]) { 
> > +/* add a key up event */ 
> > +do_key_event(vs, 0, keycode, sym); 
> > +} 
> > +vs->modifiers_state[keycode] = 1; 
> > +} else { 
> > +vs->modifiers_state[keycode] = 0; 
> > +} 
>  
> I am not 100% convinced that we have to fix this in QEMU, but if we do 
> then this doesn't look like the right fix: the switch statement that you 
> are modifying is for QEMU console switches.
> Also if I am not mistaken 
> modifiers_state is meant for control keys, not regular keys. 

Yes, you are right. Currently modifiers_state is used for control keys, and
key-down info is not recorded. I just want to reuse modifiers_state to
record the key-down info. Since control key and regular key have different
keycodes, that won't affect control key handling.

>  
> At the very least you could move the change below, near the call to 
> qemu_input_event_send_key_number. 

Could change there too, but need to check the keycode again: only regular key
needs the change, control key should not be changed. So I just move the change
ahead in the switch.

>  
>  
> > +} 
> > +break; 
> >  } 
> >   
> >  /* Turn off the lock state sync logic if the client support the led 
> > --  
> > 1.8.4.5 
> >  
>  
>

Re: [Qemu-devel] [PATCH v13 6/6] qcow2: Add full preallocation option

2014-09-08 Thread Hu Tao

On Thu, Sep 04, 2014 at 03:09:08PM +0200, Kevin Wolf wrote:
> Am 29.08.2014 um 10:33 hat Hu Tao geschrieben:
> > preallocation=full allocates disk space by fallocating the space if
> > posix_fallocate() is available, otherwise by writing zeros to disk to
> > ensure disk space in any cases.
> > 
> > Signed-off-by: Hu Tao 
> > ---
> >  block/qcow2.c  | 61 
> > +++---
> >  qemu-doc.texi  |  7 +++---
> >  qemu-img.texi  |  7 +++---
> >  tests/qemu-iotests/082.out | 54 
> >  4 files changed, 87 insertions(+), 42 deletions(-)
> 
> > +qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
> > +aligned_total_size + meta_size);
> > +qemu_opt_set(opts, BLOCK_OPT_PREALLOC, 
> > PreallocMode_lookup[prealloc]);
> > +}
> 
> This means that if used with a protocol that doesn't have a
> preallocation option, it gets silently ignored. I'm not completely
> decided yet whether that's a bug or a feature. :-)

We can add code to reject preallocation option if it is determined as a
bug later, but it seems not necessary currently.

Regards,
Hu

[Qemu-devel] [PATCH] Fix typos and misspellings in comments

2014-09-08 Thread zhanghailiang

Found by codespell:

occured -> occurred
formated -> formatted
Comander -> Commander
gaurantee -> guarantee

Signed-off-by: zhanghailiang 
---
 hw/ppc/spapr.c   | 2 +-
 hw/usb/quirks.h  | 2 +-
 libcacard/vcard_emul_nss.c   | 2 +-
 linux-headers/asm-powerpc/epapr_hcalls.h | 4 ++--
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2ab4460..bedef2f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -541,7 +541,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
 _FDT((fdt_property_cell(fdt, "rtas-error-log-max", RTAS_ERROR_LOG_MAX)));
 
 /*
- * According to PAPR, rtas ibm,os-term, does not gaurantee a return
+ * According to PAPR, rtas ibm,os-term, does not guarantee a return
  * back to the guest cpu.
  *
  * While an additional ibm,extended-os-term property indicates that
diff --git a/hw/usb/quirks.h b/hw/usb/quirks.h
index 8dc6065..2d5d98d 100644
--- a/hw/usb/quirks.h
+++ b/hw/usb/quirks.h
@@ -90,7 +90,7 @@ static const struct usb_device_id usbredir_raw_serial_ids[] = 
{
 { USB_DEVICE(0x10C4, 0x81E8) }, /* Zephyr Bioharness */
 { USB_DEVICE(0x10C4, 0x81F2) }, /* C1007 HF band RFID controller */
 { USB_DEVICE(0x10C4, 0x8218) }, /* Lipowsky Industrie Elektronik GmbH, 
HARP-1 */
-{ USB_DEVICE(0x10C4, 0x822B) }, /* Modem EDGE(GSM) Comander 2 */
+{ USB_DEVICE(0x10C4, 0x822B) }, /* Modem EDGE(GSM) Commander 2 */
 { USB_DEVICE(0x10C4, 0x826B) }, /* Cygnal Integrated Products, Inc., 
Fasttrax GPS demonstration module */
 { USB_DEVICE(0x10C4, 0x8293) }, /* Telegesis ETRX2USB */
 { USB_DEVICE(0x10C4, 0x82F9) }, /* Procyon AVS */
diff --git a/libcacard/vcard_emul_nss.c b/libcacard/vcard_emul_nss.c
index f1bba57..ad9530c 100644
--- a/libcacard/vcard_emul_nss.c
+++ b/libcacard/vcard_emul_nss.c
@@ -286,7 +286,7 @@ vcard_emul_rsa_op(VCard *card, VCardKey *key,
 }
 }
 if ((i < buffer_size) && (buffer[i] == 0)) {
-/* yes, we have a properly formated PKCS #1 signature */
+/* yes, we have a properly formatted PKCS #1 signature */
 /*
  * NOTE: even if we accidentally got an encrypt buffer, which
  * through shear luck started with 00, 01, ff, 00, it won't matter
diff --git a/linux-headers/asm-powerpc/epapr_hcalls.h 
b/linux-headers/asm-powerpc/epapr_hcalls.h
index 06f7247..33b3f89 100644
--- a/linux-headers/asm-powerpc/epapr_hcalls.h
+++ b/linux-headers/asm-powerpc/epapr_hcalls.h
@@ -78,7 +78,7 @@
 #define EV_SUCCESS 0
 #define EV_EPERM   1   /* Operation not permitted */
 #define EV_ENOENT  2   /*  Entry Not Found */
-#define EV_EIO 3   /* I/O error occured */
+#define EV_EIO 3   /* I/O error occurred */
 #define EV_EAGAIN  4   /* The operation had insufficient
 * resources to complete and should be
 * retried
@@ -89,7 +89,7 @@
 #define EV_ENODEV  7   /* No such device */
 #define EV_EINVAL  8   /* An argument supplied to the hcall
   was out of range or invalid */
-#define EV_INTERNAL9   /* An internal error occured */
+#define EV_INTERNAL9   /* An internal error occurred */
 #define EV_CONFIG  10  /* A configuration error was detected */
 #define EV_INVALID_STATE   11  /* The object is in an invalid state */
 #define EV_UNIMPLEMENTED   12  /* Unimplemented hypercall */
-- 
1.7.12.4

[Qemu-devel] [PATCH V2] net: don't use set/get_pointer() in set/get_netdev()

2014-09-08 Thread Jason Wang

Commit 1ceef9f27359cbe92ef124bf74de6f792e71f6fb (net: multiqueue
support) tries to use set_pointer() and get_pointer() to set and get
NICPeers which is not a pointer defined in DEFINE_PROP_NETDEV. This
trick works but result a unclean and fragile implementation (e.g
print_netdev and parse_netdev).

This patch solves this issue by not using set/get_pinter() and set and
get netdev directly in set_netdev() and get_netdev(). After this the
parse_netdev() and print_netdev() were no longer used and dropped from
the source.

Cc: Markus Armbruster 
Cc: Stefan Hajnoczi 
Cc: Peter Maydell 
Signed-off-by: Jason Wang 
---
Changes from V1:
- validate ncs pointer before accessing them, this fixes the qtest failure
  on arm.
---
 hw/core/qdev-properties-system.c | 71 ++--
 1 file changed, 39 insertions(+), 32 deletions(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index ae0900f..6939ea5 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -176,41 +176,67 @@ PropertyInfo qdev_prop_chr = {
 };
 
 /* --- netdev device --- */
+static void get_netdev(Object *obj, Visitor *v, void *opaque,
+   const char *name, Error **errp)
+{
+DeviceState *dev = DEVICE(obj);
+Property *prop = opaque;
+NICPeers *peers_ptr = qdev_get_prop_ptr(dev, prop);
+char *p = g_strdup(peers_ptr->ncs[0] ? peers_ptr->ncs[0]->name : "");
 
-static int parse_netdev(DeviceState *dev, const char *str, void **ptr)
+visit_type_str(v, &p, name, errp);
+g_free(p);
+}
+
+static void set_netdev(Object *obj, Visitor *v, void *opaque,
+   const char *name, Error **errp)
 {
-NICPeers *peers_ptr = (NICPeers *)ptr;
+DeviceState *dev = DEVICE(obj);
+Property *prop = opaque;
+NICPeers *peers_ptr = qdev_get_prop_ptr(dev, prop);
 NetClientState **ncs = peers_ptr->ncs;
 NetClientState *peers[MAX_QUEUE_NUM];
-int queues, i = 0;
-int ret;
+Error *local_err = NULL;
+int err, queues, i = 0;
+char *str;
+
+if (dev->realized) {
+qdev_prop_set_after_realize(dev, name, errp);
+return;
+}
+
+visit_type_str(v, &str, name, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
 
 queues = qemu_find_net_clients_except(str, peers,
   NET_CLIENT_OPTIONS_KIND_NIC,
   MAX_QUEUE_NUM);
 if (queues == 0) {
-ret = -ENOENT;
+err = -ENOENT;
 goto err;
 }
 
 if (queues > MAX_QUEUE_NUM) {
-ret = -E2BIG;
+err = -E2BIG;
 goto err;
 }
 
 for (i = 0; i < queues; i++) {
 if (peers[i] == NULL) {
-ret = -ENOENT;
+err = -ENOENT;
 goto err;
 }
 
 if (peers[i]->peer) {
-ret = -EEXIST;
+err = -EEXIST;
 goto err;
 }
 
 if (ncs[i]) {
-ret = -EINVAL;
+err = -EINVAL;
 goto err;
 }
 
@@ -219,31 +245,12 @@ static int parse_netdev(DeviceState *dev, const char 
*str, void **ptr)
 }
 
 peers_ptr->queues = queues;
-
-return 0;
+g_free(str);
+return;
 
 err:
-return ret;
-}
-
-static char *print_netdev(void *ptr)
-{
-NetClientState *netdev = ptr;
-const char *val = netdev->name ? netdev->name : "";
-
-return g_strdup(val);
-}
-
-static void get_netdev(Object *obj, Visitor *v, void *opaque,
-   const char *name, Error **errp)
-{
-get_pointer(obj, v, opaque, print_netdev, name, errp);
-}
-
-static void set_netdev(Object *obj, Visitor *v, void *opaque,
-   const char *name, Error **errp)
-{
-set_pointer(obj, v, opaque, parse_netdev, name, errp);
+error_set_from_qdev_prop_error(errp, err, dev, prop, str);
+g_free(str);
 }
 
 PropertyInfo qdev_prop_netdev = {
-- 
1.8.3.1

[Qemu-devel] ballooning not working on hotplugged pc-dimm

2014-09-08 Thread Alexandre DERUMIER

Hello,

I was playing with pc-dimm hotplug, and I notice that balloning is not working 
on
memory space of pc-dimm devices.

example:

qemu -m size=1024,slots=255,maxmem=15000M 

#free -m : 1024M
-> qmp balloon 512M
#free -m : 512M

-> hotplug pc-dimm 1G:

#free -m : 1512M


(This is the same behavior if qemu is started with pc-dimm devices)


qemu 2.1
Guest kernel : 3.12. 



Does it need a guest balloon module update ?

Regards,

Alexandre Derumier

Re: [Qemu-devel] [PATCH] net: don't use set/get_pointer() in set/get_netdev()

2014-09-08 Thread Jason Wang

On 09/05/2014 12:40 AM, Peter Maydell wrote:
> On 4 September 2014 17:21, Stefan Hajnoczi  wrote:
>> Unfortunately this patch breaks aarch64-softmmu qtests:
>> GTESTER check-qtest-aarch64
>> Broken pipe
>> GTester: last random seed: R02S6d8ab263ca56f8ae7a4b47bdf93fbc73
>>
>> Please take a look at what is causing this.
> Specifically, it breaks on all the ARM boards which have
> more than one built in ethernet device. Something in your
> patch is probably assuming there is only one (there's
> a somewhat suspicious "[0]" array reference, for instance).
>
> thanks
> -- PMM
>

The '[0]' dereference was introduced after multiqueue is supported.
Since there may be multiple pairs of peers for each nic.

The failure reason looks like the ncs[0] pointer should be validated
before trying to access them.

Will post V2.

Thanks

Re: [Qemu-devel] [PATCH] net: don't use set/get_pointer() in set/get_netdev()

2014-09-08 Thread Jason Wang

On 09/05/2014 12:21 AM, Stefan Hajnoczi wrote:
> Unfortunately this patch breaks aarch64-softmmu qtests:
> GTESTER check-qtest-aarch64
> Broken pipe
> GTester: last random seed: R02S6d8ab263ca56f8ae7a4b47bdf93fbc73
>
> Please take a look at what is causing this.
>
> Dropped from the net branch.
>
> Stefan
>

Ok, will post V2 later.

Thanks

Re: [Qemu-devel] [PATCH v2] qcow2: add update refcount table realization for update_refcount

2014-09-08 Thread Jun Li

On Fri, 09/05 12:21, Kevin Wolf wrote:
> Am 01.09.2014 um 12:52 hat Jun Li geschrieben:
> > When every item of refcount block is NULL, free refcount block and reset the
> > corresponding item of refcount table with NULL.
> > 
> > Signed-off-by: Jun Li 
> 
> The commit message should also describe why this is a relevant
> improvement for some use case. My gut feeling is that it complicates the
> code for a very minimal gain.

Hi Kevin,

  "Add update refcount table realization for update_refcount" is nesseary for
  qcow2 shrinking. I will submit v3 of "qcow2: Patch for shrinking qcow2 disk
  image". When check the code of update_refcount, I find it lacks of this patch.

Best Regards,
Jun Li

Re: [Qemu-devel] [Xen-devel] [RFC Patch v3 23/22] Introduce "xen-load-devices-state"

2014-09-08 Thread Wen Congyang

At 09/06/2014 05:57 AM, Stefano Stabellini Write:
> On Fri, 5 Sep 2014, Wen Congyang wrote:
>> introduce a "xen-load-devices-state" QAPI command that can be used to load
>> the state of all devices, but not the RAM or the block devices of the
>> VM.
> 
> Hello Wen,
> please CC qemu-devel too for QEMU patches.
> 
> Could you please explain why do you need this command?
> 
> Is the main issue that there are no QMP commands today to load the state
> of a VM? It looks like that for vm restore we are using the -incoming
> command line option, but there is no alternative over QMP.

We only have hmp commands savevm/loadvm, and qmp commands 
xen-save-devices-state.

We use this new command for COLO:
1. suspend both primay vm and secondary vm
2. sync the state
3. resume both primary vm and secondary vm

In such case, we need to update all devices's state in any time.

> 
> [...]
> 
> 
>> diff --git a/savevm.c b/savevm.c
>> index 22123be..c6aa502 100644
>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -863,6 +863,105 @@ out:
>>  return ret;
>>  }
>>  
>> +static int qemu_load_devices_state(QEMUFile *f)
>> +{
>> +uint8_t section_type;
>> +unsigned int v;
>> +int ret;
>> +
>> +if (qemu_savevm_state_blocked(NULL)) {
>> +return -EINVAL;
>> +}
>> +
>> +v = qemu_get_be32(f);
>> +if (v != QEMU_VM_FILE_MAGIC) {
>> +return -EINVAL;
>> +}
>> +
>> +v = qemu_get_be32(f);
>> +if (v == QEMU_VM_FILE_VERSION_COMPAT) {
>> +fprintf(stderr, "SaveVM v2 format is obsolete and don't work 
>> anymore\n");
>> +return -ENOTSUP;
>> +}
>> +if (v != QEMU_VM_FILE_VERSION) {
>> +return -ENOTSUP;
>> +}
>> +
>> +while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
>> +uint32_t instance_id, version_id, section_id;
>> +SaveStateEntry *se;
>> +char idstr[257];
>> +int len;
>> +
>> +switch (section_type) {
>> +case QEMU_VM_SECTION_FULL:
>> +/* Read section start */
>> +section_id = qemu_get_be32(f);
>> +len = qemu_get_byte(f);
>> +qemu_get_buffer(f, (uint8_t *)idstr, len);
>> +idstr[len] = 0;
>> +instance_id = qemu_get_be32(f);
>> +version_id = qemu_get_be32(f);
>> +
>> +/* Find savevm section */
>> +se = find_se(idstr, instance_id);
>> +if (se == NULL) {
>> +fprintf(stderr, "Unknown savevm section or instance '%s' 
>> %d\n",
>> +idstr, instance_id);
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +/* Validate version */
>> +if (version_id > se->version_id) {
>> +fprintf(stderr, "loadvm: unsupported version %d for '%s' 
>> v%d\n",
>> +version_id, idstr, se->version_id);
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +/* Validate if it is a device's state */
>> +if (se->is_ram) {
>> +fprintf(stderr, "loadvm: %s is not devices state\n", idstr);
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +ret = vmstate_load(f, se, version_id);
>> +if (ret < 0) {
>> +fprintf(stderr, "qemu: warning: error while loading state 
>> for instance 0x%x of device '%s'\n",
>> +instance_id, idstr);
>> +goto out;
>> +}
>> +break;
>> +case QEMU_VM_SECTION_START:
>> +case QEMU_VM_SECTION_PART:
>> +case QEMU_VM_SECTION_END:
>> +/*
>> + * The file is saved by the command xen-save-devices-state,
>> + * So it should not contain section start/part/end.
>> + */
>> +default:
>> +fprintf(stderr, "Unknown savevm section type %d\n", 
>> section_type);
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +}
>> +
>> +cpu_synchronize_all_post_init();
>> +
>> +ret = 0;
>> +
>> +out:
>> +if (ret == 0) {
>> +if (qemu_file_get_error(f)) {
>> +ret = -EIO;
>> +}
>> +}
>> +
>> +return ret;
>> +}
> 
> Assuming that the state file only contains device states, why don't you
> just call qemu_loadvm_state to implement the command?

Do you mean there is no need to check the file? If the file contains
some memory, it will cause unexpected problem.

Thanks
Wen Congyang

> 
> 
> 
>>  static BlockDriverState *find_vmstate_bs(void)
>>  {
>>  BlockDriverState *bs = NULL;
>> @@ -1027,6 +1126,33 @@ void qmp_xen_save_devices_state(const char *filename, 
>> Error **errp)
>>  }
>>  }
>>  
>> +void qmp_xen_load_devices_state(const char *filename, Error **errp)
>> +{
>> +QEMUFile *f;
>> +int saved_vm_running;
>> +int ret;
>> +
>> +saved_vm_running = runstate_is_running();
>> +vm_stop(RUN_STATE_RE

Re: [Qemu-devel] [PATCH] qemu-char: fix terminal crash when using "-monitor stdio -nographic"

2014-09-08 Thread Li Liu



On 2014/9/5 17:31, Gerd Hoffmann wrote:
> On Fr, 2014-09-05 at 11:04 +0200, Markus Armbruster wrote:
>> Li Liu  writes:
>>
>>> Ping, any more comments? Thanks.
>>
>> I'd like to hear Gerd's opinion (cc'ed).
>>
> But is having multiple character devices use the same terminal valid?
> 
> No (guess we should catch that case in stdio init).
> 
> Beside the tty initialization and cleanup you also have the problem that
> both users are racing for input.  Well, maybe not in the qemu case as it
> is the same process and it very well might be that it polls the two
> chardevs in a well defined order, so one of them gets all input and the
> other gets nothing.  With two processes reading from the terminal (try
> 'cat | less') it is actually random though.
> 
 I'm not sure. But I have found such comments in vl.c
 "According to documentation and historically, -nographic redirects
 serial port, parallel port and monitor to stdio"
> 
> In that case mux chardev is used (that is the piece which handles the
> input switching between serial and monitor via 'Ctrl-A c').  There is
> one stdio instance, and one mux instance, the mux is chained to stdio,
> and mux allows multiple backends to connect.
> 
> You can construct it on the command line this way:
> 
> qemu -nographic -nodefaults \
>-chardev stdio,mux=on,id=terminal \
>-serial chardev:terminal \
>-monitor chardev:terminal
> 
> [ serial is default, so no output here, unless you boot a guest
>   with serial console configured ]
> 
> [ Hit 'Ctrl-A h' now ]
> 
> C-a hprint this help
> C-a xexit emulator
> C-a ssave disk data back to file (if -snapshot)
> C-a ttoggle console timestamps
> C-a bsend break (magic sysrq)
> C-a cswitch between console and monitor
> C-a C-a  sends C-a
> 
> [ Hit 'Ctrl-A c' now ]
> 
> QEMU 2.1.50 monitor - type 'help' for more information
> (qemu) info chardev
> terminal: filename=mux
> terminal-base: filename=stdio
> (qemu) 
> 
> HTH,
>   Gerd
> 

Appreciate your detailed answer. Thank you very much.

Li.

> 
> 
> .
>

Re: [Qemu-devel] Supporting multiple CPU AddressSpaces and memory transaction attributes

2014-09-08 Thread Edgar E. Iglesias

On Mon, Sep 08, 2014 at 12:53:57PM +0100, Peter Maydell wrote:
> On 7 September 2014 02:47, Edgar E. Iglesias  wrote:
> > On Thu, Sep 04, 2014 at 06:47:58PM +0100, Peter Maydell wrote:
> >> We introduce the concept of memory transaction attributes,
> >> which are a guest-CPU specific bunch of bits (say, a
> >> uint32_t). We also allow the CPU to have more than one
> >> AddressSpace, and have the guest CPU code provide
> >> a function returning the AddressSpace to use for a
> >> given set of transaction attributes. For ARM we'd
> >> put all of (S/NS, mmu_idx, CPU number) into the attributes,
> >> use an AddressSpace each for S and NS, and use the
> >> S/NS bit in the attributes to select the AddressSpace
> >> The default is that we have one AddressSpace and always
> >> use that, ie the transaction attributes are ignored.
> >> (Maybe the function should return an integer index
> >> into a cpu->as[] array?)
> >
> > The new read_with_attrs (or another name) sounds good.
> > It allows us to incrementally change the codebase. It would be nice if the
> > arguments to that function be passed in a structure so we can
> > make future changes easier.
> >
> > Maybe it could even end up going this far:
> > void access(BusPayload *pay);
> >
> > With *pay having fields for attributes, data, slave error signaling etc.
> 
> Yes, picking an API that gives us expansion room
> for things like "let device signal an error response"
> would be a good idea.
> 
> > Im a little concerned about the guest CPU defining the attribute
> > format. Maybe we could start by making the attributes generic
> > (look the same for everyone).
> 
> I don't think this works at all -- our very first
> use case is ARM S/NS signalling, which is a very
> ARM specific concept. We could make it be partly
> generic and partly guest-defined, but really the
> attributes seem to me to be pretty much bus
> specific, and guest-CPU-specific is about the
> best proxy we have for that. I'd rather just have
> the core code pass these things around as an
> opaque pile of bits.

I'll try to clarify what worries me.

If we let the guest decide the bit format and ordering we'll end up with
something like following:

struct {
union {
struct {
unsigned int secure : 1;
unsigned int featureA : 1;
unsigned int featureB : 1;
} arm;
struct {
unsigned int featureC : 1;
unsigned int featureD : 1;
} mips;
struct {
unsigned int featureX : 1;
unsigned int featureY : 1;
} cris;
};
};

Once we start adding masters and slaves all writing and reading these bits
in different formats we will have trouble mixing and matching because the
interpretations will clash. If we add a type to the
attr object, we could disallow incompatible setups or maybe add
conversion routines along the path (this is what I meant with attribute
specialization and conversion below) but this becomes quite complex
to enforce.

What happens on real SoCs is that allthough various CPUs typically have their
native bus protocols, they can speak to all kinds of devices through
bus bridges. Typically bus specific features get translated or dropped.

If we instead describe the attributes like the following
(no union, i.e the attrs have globally allocated bits and look and mean
the same to everyone):

struct {
struct {
unsigned int master_id;
unsigned int featureA;
} generic;

struct {
unsigned int secure : 1;
unsigned int featureB : 1;
} axi;

struct {
unsigned int featureC : 1;
} ocp;

struct {
unsigned int featureX : 1;
unsigned int featureY : 1;
} smif;
};

As soon as a feature bit needs to be in multiple bus specific sub structs,
it becomes generic. As there will never be dups, the sub structs may not
make much sense, i.e everything is really generic. It's the highlevel
feature we describe, not the fact that it's on a specific bus protocol.

Anyway, masters that set axi.secure can talk to non-TZ OCP slaves without
enabling ocp.featureC accidentally. Features will naturally get dropped
along the buspath if they don't match between master and slave. E.g,
a slave that does not know the notion of a secure bit will never try to
read it. IMO this is a friendlier default behaviour to work with.

For cases where we need specific attr translations, this can be done through
modeling bridges. The IOMMU framework with attributes support would work for 
that
as a bridge would be a subset of an IOMMU (does attribute translation but no
address translation). I think this would become the exception rather than the
rule.

Cheers,
Edgar

Re: [Qemu-devel] [PATCH v4 0/2] trace: Trace control commands

2014-09-08 Thread Luiz Capitulino

On Mon, 08 Sep 2014 15:03:41 -0500
Lluís Vilanova  wrote:

> Lluís Vilanova writes:
> 
> > Adds QAPI/QMP commands to control tracing events, and reimplements some of 
> > the
> > related HMP commands on top.
> 
> Ping.

I need reviewers. Stefan and Eric are obvious candidates.

Re: [Qemu-devel] [PATCH v4 0/2] trace: Trace control commands

2014-09-08 Thread Lluís Vilanova

Lluís Vilanova writes:

> Adds QAPI/QMP commands to control tracing events, and reimplements some of the
> related HMP commands on top.

Ping.


Thanks,
  Lluis


> NOTE: The "trace-event-set-state" command uses a bool 'enable' argument 
> instead
>   of an enum 'state'. I'm still not sure if an enum is better than the two
>   separate booleans.

> Signed-off-by: Lluís Vilanova 
> ---

> Changes in v4:

> * Split QAPI/QMP and HMP changes into separate patches.
> * Add copyright information in QAPI file (Eric Blake).
> * Replace event state booleans with an enum (Eric Blake).
> * Document pattern type for argument "name" (Eric Blake).
> * Other documentation improvements (Markus Armbruster).
> * Change "keepgoing" to "ignore-unavailable" (Markus Armbruster).
> * Add examples in QMP command definition file (Eric Blake).
> * Propagate QMP errors to HMP (Markus Armbruster, Stefan Hajnoczi).
> * Remove trailing newlines when using error_setg (Stefan Hajnoczi).
> * Avoid multiple invocations of error_setg (Stefan Hajnoczi).
> * Various cosmetic changes (Markus Armbruster).

> Lluís Vilanova (2):
>   trace: [qmp] Add commands to query and control event tracing state
>   trace: [hmp] Reimplement "trace-event" and "info trace-events" using QMP


>  monitor.c   |   27 ++
>  qapi-schema.json|3 ++
>  qapi/trace.json |   65 
>  qmp-commands.hx |   35 
>  trace/Makefile.objs |1 +
>  trace/control.c |   13 -
>  trace/control.h |7 -
>  trace/qmp.c |   75 
> +++
>  8 files changed, 193 insertions(+), 33 deletions(-)
>  create mode 100644 qapi/trace.json
>  create mode 100644 trace/qmp.c


> To: qemu-devel@nongnu.org
> Cc: Michael Roth 
> Cc: Markus Armbruster 
> Cc: Stefan Hajnoczi 
> Cc: Luiz Capitulino 


-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

Re: [Qemu-devel] [PATCH] exec: file_ram_alloc(): print error when prealloc fails

2014-09-08 Thread Eric Blake

On 09/08/2014 12:08 PM, Luiz Capitulino wrote:
> If memory allocation fails when using the -mem-prealloc command-line
> option, QEMU exits without printing any error information to
> the user:
> 
>  # qemu [...] -m 1G -mem-prealloc -mem-path /dev/hugepages
>  # echo $?
>  1
> 
> This commit adds an error message, so that we print instead:
> 
>  # qemu [...] -m 1G -mem-prealloc -mem-path /dev/hugepages
>  qemu: unable to map backing store for hugepages: Cannot allocate memory
> 
> Signed-off-by: Luiz Capitulino 
> ---
>  exec.c | 1 +
>  1 file changed, 1 insertion(+)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH] exec: file_ram_alloc(): print error when prealloc fails

2014-09-08 Thread Luiz Capitulino

If memory allocation fails when using the -mem-prealloc command-line
option, QEMU exits without printing any error information to
the user:

 # qemu [...] -m 1G -mem-prealloc -mem-path /dev/hugepages
 # echo $?
 1

This commit adds an error message, so that we print instead:

 # qemu [...] -m 1G -mem-prealloc -mem-path /dev/hugepages
 qemu: unable to map backing store for hugepages: Cannot allocate memory

Signed-off-by: Luiz Capitulino 
---
 exec.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/exec.c b/exec.c
index 7dddcc8..cb73d16 100644
--- a/exec.c
+++ b/exec.c
@@ -1130,6 +1130,7 @@ static void *file_ram_alloc(RAMBlock *block,
 
 error:
 if (mem_prealloc) {
+error_report("%s\n", error_get_pretty(*errp));
 exit(1);
 }
 return NULL;
-- 
1.9.3

Re: [Qemu-devel] qcow2, lazy_refcounts and killing qemu

2014-09-08 Thread Max Reitz


On 08.09.2014 09:16, Markus Armbruster wrote:

"Richard W.M. Jones"  writes:


On Fri, Sep 05, 2014 at 04:39:51PM +0100, Stefan Hajnoczi wrote:

Did you try older QEMU versions?  I'm curious if this is something that
crept in later or is fundamentally broken in lazy_refcounts=on.

At your prompting, I've done a bit more investigation.

I was basing my observations on qemu 2.1.0.  However I tried my test
against qemu from git today and the bug has gone.  Good!

For my entertainment, I bisected the problem, and the commit which
*fixes* it is:

   commit 91af7014125895cc74141be6b60f3a3e882ed743
   Author: Max Reitz 
   Date:   Fri Jul 18 20:24:56 2014 +0200

 block: Add bdrv_refresh_filename()

I didn't believe this either, but I have checked the result manually
and I'm pretty sure that whatever this commit does, it does end up
fixing the lazy_refcounts problem as a side-effect.

Weird.  Maybe Max (cc'ed) has an idea.


No, not really. The only remotely related effect I can imagine is that 
this patch allows a BDS to remember when it has been opened with runtime 
options differing from the standard (e.g. forcing lazy_refcounts=on for 
a qcow2 image without that flag set), but it seems this is not the case 
here. Also, this should even be ignored in most cases.


Other than that, the main intention for this commit is to allow block 
drivers to reconstruct a filename based solely on the BDS, i.e. a 
filename which more or less recreates the same BDS when opened. The 
filename in the BDS will therefore no longer be necessarily the one 
originally used for opening it.


Max

[Qemu-devel] [PATCH] qdev: HotplugHandler: rename unplug callback to unplug_request

2014-09-08 Thread Igor Mammedov

'HotplugHandler.unplug' callback is currently used as async
call to issue unplug request from device that implements it.
Renaming 'unplug' callback to 'unplug_request' should help to
avoid confusion about what callback does and would allow to
introduce 'unplug' callback that would perform actual device
removal when hardware is ready for it.

Signed-off-by: Igor Mammedov 
---
Patch is prompted by reviewing https://patchwork.ozlabs.org/patch/383372/

Dedicated 'unplug' callback could be used by bus-less pc-dimm
device. It would allow to call HotplugHandler.unplug callback
from ACPI code when guest calls _EJ0 method and execute board
specific code (PCMachine) to unmap pc-dimm from guest's address
space and perform necessary cleanup.
---
 hw/acpi/piix4.c|  6 +++---
 hw/core/hotplug.c  | 10 +-
 hw/core/qdev.c |  3 ++-
 hw/isa/lpc_ich9.c  |  6 +++---
 hw/pci-bridge/pci_bridge_dev.c |  2 +-
 hw/pci/pcie.c  |  4 ++--
 hw/pci/pcie_port.c |  2 +-
 hw/pci/shpc.c  |  4 ++--
 include/hw/hotplug.h   | 12 ++--
 include/hw/pci/pcie.h  |  4 ++--
 include/hw/pci/shpc.h  |  4 ++--
 11 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index b72b34e..0bfa814 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -354,8 +354,8 @@ static void piix4_device_plug_cb(HotplugHandler 
*hotplug_dev,
 }
 }
 
-static void piix4_device_unplug_cb(HotplugHandler *hotplug_dev,
-   DeviceState *dev, Error **errp)
+static void piix4_device_unplug_request_cb(HotplugHandler *hotplug_dev,
+   DeviceState *dev, Error **errp)
 {
 PIIX4PMState *s = PIIX4_PM(hotplug_dev);
 
@@ -615,7 +615,7 @@ static void piix4_pm_class_init(ObjectClass *klass, void 
*data)
 dc->cannot_instantiate_with_device_add_yet = true;
 dc->hotpluggable = false;
 hc->plug = piix4_device_plug_cb;
-hc->unplug = piix4_device_unplug_cb;
+hc->unplug_request = piix4_device_unplug_request_cb;
 adevc->ospm_status = piix4_ospm_status;
 }
 
diff --git a/hw/core/hotplug.c b/hw/core/hotplug.c
index 5573d9d..2ec4736 100644
--- a/hw/core/hotplug.c
+++ b/hw/core/hotplug.c
@@ -23,14 +23,14 @@ void hotplug_handler_plug(HotplugHandler *plug_handler,
 }
 }
 
-void hotplug_handler_unplug(HotplugHandler *plug_handler,
-DeviceState *plugged_dev,
-Error **errp)
+void hotplug_handler_unplug_request(HotplugHandler *plug_handler,
+DeviceState *plugged_dev,
+Error **errp)
 {
 HotplugHandlerClass *hdc = HOTPLUG_HANDLER_GET_CLASS(plug_handler);
 
-if (hdc->unplug) {
-hdc->unplug(plug_handler, plugged_dev, errp);
+if (hdc->unplug_request) {
+hdc->unplug_request(plug_handler, plugged_dev, errp);
 }
 }
 
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index da1ba48..7458c9f 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -227,7 +227,8 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 qdev_hot_removed = true;
 
 if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
-hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev, errp);
+hotplug_handler_unplug_request(dev->parent_bus->hotplug_handler,
+   dev, errp);
 } else {
 assert(dc->unplug != NULL);
 if (dc->unplug(dev) < 0) { /* legacy handler */
diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index 177023b..530b074 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -607,8 +607,8 @@ static void ich9_device_plug_cb(HotplugHandler *hotplug_dev,
 ich9_pm_device_plug_cb(&lpc->pm, dev, errp);
 }
 
-static void ich9_device_unplug_cb(HotplugHandler *hotplug_dev,
-  DeviceState *dev, Error **errp)
+static void ich9_device_unplug_request_cb(HotplugHandler *hotplug_dev,
+  DeviceState *dev, Error **errp)
 {
 error_setg(errp, "acpi: device unplug request for not supported device"
" type: %s", object_get_typename(OBJECT(dev)));
@@ -676,7 +676,7 @@ static void ich9_lpc_class_init(ObjectClass *klass, void 
*data)
  */
 dc->cannot_instantiate_with_device_add_yet = true;
 hc->plug = ich9_device_plug_cb;
-hc->unplug = ich9_device_unplug_cb;
+hc->unplug_request = ich9_device_unplug_request_cb;
 adevc->ospm_status = ich9_pm_ospm_status;
 }
 
diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index 92799d0..252ea5e 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -150,7 +150,7 @@ static void pci_bridge_dev_class_init(ObjectClass *klass, 
void *data)
 dc->vmsd = &pci_bridge_dev_vmstate;
 set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
 hc->plug

Re: [Qemu-devel] [Bug 1363641] Re: Build of v2.1.0 fails on armv7l due to undeclared __NR_select

2014-09-08 Thread Dr. David Alan Gilbert

(cc'ing Michael Hines who owns and knows the RDMA code)

* Karl-Philipp Richter (krichter...@aol.de) wrote:
> ** Description changed:
> 
>   After `make clean` and `git clean -x -f -d` `git checkout v2.1.0 &&
>   configure --prefix=/home/user/prefix-qemu-2.1.0 && make` fails due to
>   missing declarations
>   
>   CCqemu-seccomp.o
>   qemu-seccomp.c:28:1: error: '__NR_select' undeclared here (not in a 
> function)
>   qemu-seccomp.c:36:1: error: '__NR_mmap' undeclared here (not in a 
> function)
>   qemu-seccomp.c:57:1: error: '__NR_getrlimit' undeclared here (not in a 
> function)
>   qemu-seccomp.c:96:1: error: '__NR_time' undeclared here (not in a 
> function)
>     GEN   qmp-marshal.c
>   qemu-seccomp.c:186:1: error: '__NR_alarm' undeclared here (not in a 
> function)
>   make: *** [qemu-seccomp.o] Error 1
>   
>   Same errors for master 8b3030114a449e66c68450acaac4b66f26d91416.
>   `configure`should not succeed for a failing build. `config.log` for
>   v2.1.0 and 8b303011... attached. The content is mostly compiler output
>   which I think is unusual for `config.log`, but see for yourself.
>   
>   I'm building on a debian 7.6 chroot on Synology DSM 5.0. `uname -a` says
>   `Linux diskstatation 3.2.40 #4493 SMP Thu Aug 21 21:43:02 CST 2014
>   armv7l GNU/Linux`.
> + 
> + After installing some of the missing header files (-> configure should
> + fail at the right point with a good error message), i.e. `apt-get
> + install liblzo2-dev libbsd-dev syslinux-common libhwloc-dev librdmacm-
> + dev libsnappy-dev libibverbs-dev valgrind linux-headers-3.2.0-4-common`
> + I'm getting
> + 
> +  CCmigration-rdma.o
> + migration-rdma.c: In function 'ram_chunk_start':
> + migration-rdma.c:523:12: error: cast to pointer from integer of 
> different size [-Werror=int-to-pointer-cast]

is that:
return (uint8_t *) (((uintptr_t) rdma_ram_block->local_host_addr)
+ (i << RDMA_REG_CHUNK_SHIFT));

244:uint8_t  *local_host_addr; /* local virtual address */

in which case I think the problem is the 'i' which is a uint64_t.

> + migration-rdma.c: In function '__qemu_rdma_add_block':
> + migration-rdma.c:556:49: error: cast to pointer from integer of 
> different size [-Werror=int-to-pointer-cast]
> + migration-rdma.c:557:49: error: cast to pointer from integer of 
> different size [-Werror=int-to-pointer-cast]

> + migration-rdma.c: In function '__qemu_rdma_delete_block':
> + migration-rdma.c:664:45: error: cast to pointer from integer of 
> different size [-Werror=int-to-pointer-cast]
> + migration-rdma.c:699:49: error: cast to pointer from integer of 
> different size [-Werror=int-to-pointer-cast]
> + migration-rdma.c: In function 'qemu_rdma_search_ram_block':
> + migration-rdma.c:1113:49: error: cast to pointer from integer of 
> different size [-Werror=int-to-pointer-cast]
> + migration-rdma.c: In function 'qemu_rdma_register_and_get_keys':
> + migration-rdma.c:1176:50: error: cast from pointer to integer of 
> different size [-Werror=pointer-to-int-cast]
> + migration-rdma.c:1177:29: error: cast from pointer to integer of 
> different size [-Werror=pointer-to-int-cast]
> + migration-rdma.c:1177:51: error: cast from pointer to integer of 
> different size [-Werror=pointer-to-int-cast]
> + migration-rdma.c:1178:29: error: cast from pointer to integer of 
> different size [-Werror=pointer-to-int-cast]
> + migration-rdma.c: In function 'qemu_rdma_post_send_control':
> + migration-rdma.c:1562:36: error: cast from pointer to integer of 
> different size [-Werror=pointer-to-int-cast]
> + migration-rdma.c: In function 'qemu_rdma_post_recv_control':
> + migration-rdma.c:1616:37: error: cast from pointer to integer of 
> different size [-Werror=pointer-to-int-cast]
> + migration-rdma.c: In function 'qemu_rdma_write_one':
> + migration-rdma.c:1864:16: error: cast from pointer to integer of 
> different size [-Werror=pointer-to-int-cast]
> + migration-rdma.c:1868:53: error: cast to pointer from integer of 
> different size [-Werror=int-to-pointer-cast]
> + migration-rdma.c:1922:52: error: cast to pointer from integer of 
> different size [-Werror=int-to-pointer-cast]
> + migration-rdma.c:1923:50: error: cast to pointer from integer of 
> different size [-Werror=int-to-pointer-cast]
> + migration-rdma.c:1977:49: error: cast to pointer from integer of 
> different size [-Werror=int-to-pointer-cast]
> + migration-rdma.c:1998:49: error: cast to pointer from integer of 
> different size [-Werror=int-to-pointer-cast]
> + migration-rdma.c:2010:58: error: cast to pointer from integer of 
> different size [-Werror=int-to-pointer-cast]
> + migration-rdma.c: In function 'qemu_rdma_registration_handle':
> + migration-rdma.c:3027:21: error: cast from pointer to integer of 
> different size [-Werror=pointer-to-int-cast]
> + migration-rdma.c:30

Re: [Qemu-devel] [PATCH v2 4/5] qcow2: Check L1/L2/reftable entries for alignment

2014-09-08 Thread Benoît Canet

The Monday 08 Sep 2014 à 19:47:31 (+0200), Max Reitz wrote :
> On 08.09.2014 16:40, Benoît Canet wrote:
> >The Friday 05 Sep 2014 à 16:07:18 (+0200), Max Reitz wrote :
> >>Offsets taken from the L1, L2 and refcount tables are generally assumed
> >>to be correctly aligned. However, this cannot be guaranteed if the image
> >>has been written to by something different than qemu, thus check all
> >>offsets taken from these tables for correct cluster alignment.
> >>
> >>Signed-off-by: Max Reitz 
> >>---
> >>  block/qcow2-cluster.c  | 43 ---
> >>  block/qcow2-refcount.c | 44 ++--
> >>  2 files changed, 82 insertions(+), 5 deletions(-)
> >>
> >>diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> >>index 735f687..f7dd8c0 100644
> >>--- a/block/qcow2-cluster.c
> >>+++ b/block/qcow2-cluster.c
> >>@@ -486,6 +486,13 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
> >>uint64_t offset,
> >>  goto out;
> >>  }
> >>+if (offset_into_cluster(s, l2_offset)) {
> >>+qcow2_signal_corruption(bs, true, -1, -1, "L2 table offset %#" 
> >>PRIx64
> >>+" unaligned (L1 index: %#" PRIx64 ")",
> >>+l2_offset, l1_index);
> >>+return -EIO;
> >This function mix return ret and goto out and there is more of the second.
> >Can we do ret = -EIO and goto out for consistency ?
> >bs->drv == NULL after qcow2_signal_corruption so we are not afraid of out
> >sides effects.
> 
> The "out" label here is for success; that's why I introduced the "fail"
> label in this series. I could make qcow2_cache_put() in the fail path
> optional and then use goto fail, though. But this would only increase the
> code size with no real benefit apparent to me (no code deduplication; and as
> far as I remember, we have many functions with fail labels which however use
> a plain "return" before cleaning up is needed).
> 
> (before this patch, there were two places using "goto out" in this function,
> both of which were "successes" (cluster found to be unallocated)); and two
> places using "return -errno", both of which were failures (the first one due
> to l2_load() failing and the second one due to a zero cluster found in a
> pre-v3 image))

Thanks for the explanation this make me think I should question and improve
the quality of my reviews.

Best regards

Benoît

> 
> Max
> 
> >>+}
> >>+
> >>  /* load the l2 table in memory */
> >>  ret = l2_load(bs, l2_offset, &l2_table);
> >>@@ -508,8 +515,11 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
> >>uint64_t offset,
> >>  break;
> >>  case QCOW2_CLUSTER_ZERO:
> >>  if (s->qcow_version < 3) {
> >>-qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> >>-return -EIO;
> >>+qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry 
> >>found"
> >>+" in pre-v3 image (L2 offset: %#" 
> >>PRIx64
> >>+", L2 index: %#x)", l2_offset, 
> >>l2_index);
> >>+ret = -EIO;
> >>+goto fail;
> >>  }
> >>  c = count_contiguous_clusters(nb_clusters, s->cluster_size,
> >>  &l2_table[l2_index], QCOW_OFLAG_ZERO);
> >>@@ -525,6 +535,14 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
> >>uint64_t offset,
> >>  c = count_contiguous_clusters(nb_clusters, s->cluster_size,
> >>  &l2_table[l2_index], QCOW_OFLAG_ZERO);
> >>  *cluster_offset &= L2E_OFFSET_MASK;
> >>+if (offset_into_cluster(s, *cluster_offset)) {
> >>+qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset 
> >>%#"
> >>+PRIx64 " unaligned (L2 offset: %#" 
> >>PRIx64
> >>+", L2 index: %#x)", *cluster_offset,
> >>+l2_offset, l2_index);
> >>+ret = -EIO;
> >>+goto fail;
> >>+}
> >>  break;
> >>  default:
> >>  abort();
> >>@@ -541,6 +559,10 @@ out:
> >>  *num = nb_available - index_in_cluster;
> >>  return ret;
> >>+
> >>+fail:
> >>+qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
> >>+return ret;
> >>  }
> >>  /*
> >>@@ -576,6 +598,12 @@ static int get_cluster_table(BlockDriverState *bs, 
> >>uint64_t offset,
> >>  assert(l1_index < s->l1_size);
> >>  l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
> >>+if (offset_into_cluster(s, l2_offset)) {
> >>+qcow2_signal_corruption(bs, true, -1, -1, "L2 table offset %#" 
> >>PRIx64
> >>+" unaligned (L1 index: %#" PRIx64 ")",
> >>+l2_offset, l1_index);
> >>+return -EIO;
> >>+}
> >>  /* seek the l2 table of the given l2 offset */
> >>@@ -948,6 +976,15 @@ static int handle_copied(BlockDriverState *bs, 
>

Re: [Qemu-devel] [PATCH v2 4/5] qcow2: Check L1/L2/reftable entries for alignment

2014-09-08 Thread Max Reitz


On 08.09.2014 16:40, Benoît Canet wrote:

The Friday 05 Sep 2014 à 16:07:18 (+0200), Max Reitz wrote :

Offsets taken from the L1, L2 and refcount tables are generally assumed
to be correctly aligned. However, this cannot be guaranteed if the image
has been written to by something different than qemu, thus check all
offsets taken from these tables for correct cluster alignment.

Signed-off-by: Max Reitz 
---
  block/qcow2-cluster.c  | 43 ---
  block/qcow2-refcount.c | 44 ++--
  2 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 735f687..f7dd8c0 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -486,6 +486,13 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
  goto out;
  }
  
+if (offset_into_cluster(s, l2_offset)) {

+qcow2_signal_corruption(bs, true, -1, -1, "L2 table offset %#" PRIx64
+" unaligned (L1 index: %#" PRIx64 ")",
+l2_offset, l1_index);
+return -EIO;

This function mix return ret and goto out and there is more of the second.
Can we do ret = -EIO and goto out for consistency ?
bs->drv == NULL after qcow2_signal_corruption so we are not afraid of out
sides effects.


The "out" label here is for success; that's why I introduced the "fail" 
label in this series. I could make qcow2_cache_put() in the fail path 
optional and then use goto fail, though. But this would only increase 
the code size with no real benefit apparent to me (no code 
deduplication; and as far as I remember, we have many functions with 
fail labels which however use a plain "return" before cleaning up is 
needed).


(before this patch, there were two places using "goto out" in this 
function, both of which were "successes" (cluster found to be 
unallocated)); and two places using "return -errno", both of which were 
failures (the first one due to l2_load() failing and the second one due 
to a zero cluster found in a pre-v3 image))


Max


+}
+
  /* load the l2 table in memory */
  
  ret = l2_load(bs, l2_offset, &l2_table);

@@ -508,8 +515,11 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
  break;
  case QCOW2_CLUSTER_ZERO:
  if (s->qcow_version < 3) {
-qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
-return -EIO;
+qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry 
found"
+" in pre-v3 image (L2 offset: %#" PRIx64
+", L2 index: %#x)", l2_offset, l2_index);
+ret = -EIO;
+goto fail;
  }
  c = count_contiguous_clusters(nb_clusters, s->cluster_size,
  &l2_table[l2_index], QCOW_OFLAG_ZERO);
@@ -525,6 +535,14 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
  c = count_contiguous_clusters(nb_clusters, s->cluster_size,
  &l2_table[l2_index], QCOW_OFLAG_ZERO);
  *cluster_offset &= L2E_OFFSET_MASK;
+if (offset_into_cluster(s, *cluster_offset)) {
+qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset %#"
+PRIx64 " unaligned (L2 offset: %#" PRIx64
+", L2 index: %#x)", *cluster_offset,
+l2_offset, l2_index);
+ret = -EIO;
+goto fail;
+}
  break;
  default:
  abort();
@@ -541,6 +559,10 @@ out:
  *num = nb_available - index_in_cluster;
  
  return ret;

+
+fail:
+qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
+return ret;
  }
  
  /*

@@ -576,6 +598,12 @@ static int get_cluster_table(BlockDriverState *bs, 
uint64_t offset,
  
  assert(l1_index < s->l1_size);

  l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
+if (offset_into_cluster(s, l2_offset)) {
+qcow2_signal_corruption(bs, true, -1, -1, "L2 table offset %#" PRIx64
+" unaligned (L1 index: %#" PRIx64 ")",
+l2_offset, l1_index);
+return -EIO;
+}
  
  /* seek the l2 table of the given l2 offset */
  
@@ -948,6 +976,15 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,

  bool offset_matches =
  (cluster_offset & L2E_OFFSET_MASK) == *host_offset;
  
+if (offset_into_cluster(s, cluster_offset & L2E_OFFSET_MASK)) {

+qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset "
+"%#llx unaligned (guest offset: %#" PRIx64
+")", cluster_offset & L2E_OFFSET_MASK,
+guest_offset);
+ret = -EIO;
+goto out;
+}
+

Re: [Qemu-devel] [PATCH v2 1/5] qapi/block: Add "fatal" to BLOCK_IMAGE_CORRUPTED

2014-09-08 Thread Max Reitz


On 08.09.2014 16:01, Benoît Canet wrote:

The Friday 05 Sep 2014 à 16:07:15 (+0200), Max Reitz wrote :

Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
reading from an image, they should generally not be. Nonetheless, even
an image only read from may of course be corrupted and this can be
detected during normal operation. In this case, a non-fatal event should
be emitted, but the image should not be marked corrupt (in accordance to
"fatal" set to false).

Signed-off-by: Max Reitz 
---
  block/qcow2-refcount.c | 1 +
  qapi/block-core.json   | 9 +++--
  2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 43665b8..0bd75d2 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1853,6 +1853,7 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, 
int ign, int64_t offset,
offset,
true,
size,
+  true,

What is this line ?


It's the new "fatal" field in BLOCK_IMAGE_CORRUPTED (missing context: 
qapi_event_send_block_image_corrupted(bdrv_get_device_name(bs), message, 
true, offset, true, size, +true, &error_abort)).


Max


&error_abort);
  g_free(message);
  
diff --git a/qapi/block-core.json b/qapi/block-core.json

index a685d02..d23bcc2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1554,7 +1554,7 @@
  ##
  # @BLOCK_IMAGE_CORRUPTED
  #
-# Emitted when a disk image is being marked corrupt
+# Emitted when a corruption has been detected in a disk image
  #
  # @device: device name
  #
@@ -1568,13 +1568,18 @@
  # @size: #optional, if the corruption resulted from an image access, this is
  #the access size
  #
+# fatal: if set, the image is marked corrupt and therefore unusable after this
+#event and must be repaired (Since 2.2; before, every
+#BLOCK_IMAGE_CORRUPTED event was fatal)
+#
  # Since: 1.7
  ##
  { 'event': 'BLOCK_IMAGE_CORRUPTED',
'data': { 'device' : 'str',
  'msg': 'str',
  '*offset': 'int',
-'*size'  : 'int' } }
+'*size'  : 'int',
+'fatal'  : 'bool' } }
  
  ##

  # @BLOCK_IO_ERROR
--
2.1.0

[Qemu-devel] [PATCH 0/4] Block-related miscellaneous cleanups

2014-09-08 Thread Markus Armbruster

Random crap I ran into while working on a bigger task.

Markus Armbruster (4):
  qemu-io: Clean up openfile() after commit 2e40134
  xen_disk: Plug memory leak on error path
  xen: Drop redundant bdrv_close() from pci_piix3_xen_ide_unplug()
  thread-pool: Drop unnecessary includes

 hw/block/xen_disk.c | 31 ++-
 hw/ide/piix.c   |  1 -
 include/block/thread-pool.h |  6 +-
 qemu-io.c   | 34 --
 thread-pool.c   |  1 -
 5 files changed, 27 insertions(+), 46 deletions(-)

-- 
1.9.3

[Qemu-devel] [PATCH 1/4] qemu-io: Clean up openfile() after commit 2e40134

2014-09-08 Thread Markus Armbruster

Commit 6db9560 split off the growable case so it can use
bdrv_file_open() instead of bdrv_open() then.  Growable BDSes become
anonymous.  Weird.

Commit 2e40134 folded bdrv_file_open() back into bdrv_open() with new
flag BDRV_O_PROTOCOL.  We still have two bdrv_open() calls, and
growable BDSes remain anonymous.

Circle back to before commit 6db9560: just one call, not anonymous.

Signed-off-by: Markus Armbruster 
---
 qemu-io.c | 34 --
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 33c96c4..d2ab694 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -58,30 +58,20 @@ static int openfile(char *name, int flags, int growable, 
QDict *opts)
 return 1;
 }
 
+qemuio_bs = bdrv_new("hda", &error_abort);
+
 if (growable) {
-if (bdrv_open(&qemuio_bs, name, NULL, opts, flags | BDRV_O_PROTOCOL,
-  NULL, &local_err))
-{
-fprintf(stderr, "%s: can't open%s%s: %s\n", progname,
-name ? " device " : "", name ?: "",
-error_get_pretty(local_err));
-error_free(local_err);
-return 1;
-}
-} else {
-qemuio_bs = bdrv_new("hda", &error_abort);
+flags |= BDRV_O_PROTOCOL;
+}
 
-if (bdrv_open(&qemuio_bs, name, NULL, opts, flags, NULL, &local_err)
-< 0)
-{
-fprintf(stderr, "%s: can't open%s%s: %s\n", progname,
-name ? " device " : "", name ?: "",
-error_get_pretty(local_err));
-error_free(local_err);
-bdrv_unref(qemuio_bs);
-qemuio_bs = NULL;
-return 1;
-}
+if (bdrv_open(&qemuio_bs, name, NULL, opts, flags, NULL, &local_err) < 0) {
+fprintf(stderr, "%s: can't open%s%s: %s\n", progname,
+name ? " device " : "", name ?: "",
+error_get_pretty(local_err));
+error_free(local_err);
+bdrv_unref(qemuio_bs);
+qemuio_bs = NULL;
+return 1;
 }
 
 return 0;
-- 
1.9.3

[Qemu-devel] [PATCH 3/4] xen: Drop redundant bdrv_close() from pci_piix3_xen_ide_unplug()

2014-09-08 Thread Markus Armbruster

drive_del() closes just fine.

Signed-off-by: Markus Armbruster 
---
 hw/ide/piix.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 59319eb..49e78a7 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -182,7 +182,6 @@ int pci_piix3_xen_ide_unplug(DeviceState *dev)
 if (ds) {
 bdrv_detach_dev(di->bdrv, ds);
 }
-bdrv_close(di->bdrv);
 pci_ide->bus[di->bus].ifs[di->unit].bs = NULL;
 drive_del(di);
 }
-- 
1.9.3

Re: [Qemu-devel] [PATCH] block: extend BLOCK_IO_ERROR event with nospace indicator

2014-09-08 Thread Luiz Capitulino

On Mon, 8 Sep 2014 17:33:18 +0200
Kevin Wolf  wrote:

> Am 08.09.2014 um 16:42 hat Luiz Capitulino geschrieben:
> > On Fri, 29 Aug 2014 16:07:27 -0400
> > Luiz Capitulino  wrote:
> > 
> > > Management software, such as RHEV's vdsm, want to be able to allocate
> > > disk space on demand. The basic use case is to start a VM with a small
> > > disk and then the disk is enlarged when QEMU hits a ENOSPC condition.
> > > 
> > > To this end, the management software has to be notified when QEMU
> > > encounters ENOSPC. The solution implemented by this commit is simple:
> > > it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true
> > > when QEMU is stopped due to ENOSPC.
> > > 
> > > Note that support for querying this event is already present in
> > > query-block by means of the 'io-status' key. Also, the new 'nospace'
> > > BLOCK_IO_ERROR field shares the same semantics with 'io-status',
> > > which basically means that werror= has to be set to either
> > > 'stop' or 'enospc' to enable 'nospace'.
> > > 
> > > Finally, this commit also updates the 'io-status' key doc in the
> > > schema with a list of supported device models.
> > > 
> > > Signed-off-by: Luiz Capitulino 
> > 
> > Kevin, are you going to take this via block layer tree?
> 
> Yes, thanks, I've applied it now.
> 
> What was our conclusion wrt the human-readable strerror() string for
> debugging? Didn't we want to add that as well?

I can do it on top of this patch. So, just adding a new field for this
is fine?

> 
> > > ---
> > > 
> > > Three important observations:
> > > 
> > >  1. We've talked with oVirt and OpenStack folks. oVirt folks say that
> > > this implementation is enough for their use-case. OpenStack don't
> > > need this feature
> > > 
> > >  2. While testing this with a raw image on a (smaller) ext2 file mounted
> > > via the loopback device, I get half "Invalid argument" I/O errors and
> > > half  "No space" errors". This means that half of the BLOCK_IO_ERROR
> > > events that are emitted for this test-case will have nospace=false
> > > and the other half nospace=true. I don't know why I'm getting those
> > > "Invalid argument" errors, can anyone of the block layer comment
> > > on this? I don't get that with a qcow2 image (I get nospace=true for
> > > all events)
> 
> Sounds familiar, but I never got around to debugging. Would probably be
> worth some digging where the EINVAL comes from.
> 
> > >  3. I think this should go via block tree
> > > 
> > >  block.c  | 22 ++
> > >  qapi/block-core.json |  8 +++-
> > >  2 files changed, 21 insertions(+), 9 deletions(-)
> 
> Kevin
>

Re: [Qemu-devel] [PATCH 1/1] loader: g_realloc(p, 0) frees and returns NULL, simplify

2014-09-08 Thread Markus Armbruster

"Michael S. Tsirkin"  writes:

> On Wed, Aug 20, 2014 at 08:38:11PM +0200, Markus Armbruster wrote:
>> Once upon a time, it was decided that qemu_realloc(ptr, 0) should
>> abort.  Switching to glib retired that bright idea.  A bit of code
>> that was added to cope with it (commit 3e372cf) is still around.  Bury
>> it.
>> 
>> See also commit 6528499.
>> 
>> Signed-off-by: Markus Armbruster 
>
> Acked-by: Michael S. Tsirkin 

Michael, can you take this through your tree?

[Qemu-devel] [PATCH 2/4] xen_disk: Plug memory leak on error path

2014-09-08 Thread Markus Armbruster

While there, streamline control flow a bit.

Signed-off-by: Markus Armbruster 
---
 hw/block/xen_disk.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index a221d0b..2dcef07 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -852,28 +852,25 @@ static int blk_connect(struct XenDevice *xendev)
 blkdev->dinfo = drive_get(IF_XEN, 0, index);
 if (!blkdev->dinfo) {
 Error *local_err = NULL;
+BlockDriver *drv;
+
 /* setup via xenbus -> create new block driver instance */
 xen_be_printf(&blkdev->xendev, 2, "create new bdrv (xenbus setup)\n");
-blkdev->bs = bdrv_new(blkdev->dev, &local_err);
-if (local_err) {
-blkdev->bs = NULL;
-}
-if (blkdev->bs) {
-BlockDriver *drv = bdrv_find_whitelisted_format(blkdev->fileproto,
-   readonly);
-if (bdrv_open(&blkdev->bs, blkdev->filename, NULL, NULL, qflags,
-  drv, &local_err) != 0)
-{
-xen_be_printf(&blkdev->xendev, 0, "error: %s\n",
-  error_get_pretty(local_err));
-error_free(local_err);
-bdrv_unref(blkdev->bs);
-blkdev->bs = NULL;
-}
-}
+blkdev->bs = bdrv_new(blkdev->dev, NULL);
 if (!blkdev->bs) {
 return -1;
 }
+
+drv = bdrv_find_whitelisted_format(blkdev->fileproto, readonly);
+if (bdrv_open(&blkdev->bs, blkdev->filename, NULL, NULL, qflags,
+  drv, &local_err) != 0) {
+xen_be_printf(&blkdev->xendev, 0, "error: %s\n",
+  error_get_pretty(local_err));
+error_free(local_err);
+bdrv_unref(blkdev->bs);
+blkdev->bs = NULL;
+return -1;
+}
 } else {
 /* setup via qemu cmdline -> already setup for us */
 xen_be_printf(&blkdev->xendev, 2, "get configured bdrv (cmdline 
setup)\n");
-- 
1.9.3

[Qemu-devel] [PATCH 4/4] thread-pool: Drop unnecessary includes

2014-09-08 Thread Markus Armbruster

Dragging block_int.h into a header is *not* nice.  Fortunately, this
is the only offender.

Signed-off-by: Markus Armbruster 
---
 include/block/thread-pool.h | 6 +-
 thread-pool.c   | 1 -
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/include/block/thread-pool.h b/include/block/thread-pool.h
index 32afcdd..4723752 100644
--- a/include/block/thread-pool.h
+++ b/include/block/thread-pool.h
@@ -18,11 +18,7 @@
 #ifndef QEMU_THREAD_POOL_H
 #define QEMU_THREAD_POOL_H 1
 
-#include "qemu-common.h"
-#include "qemu/queue.h"
-#include "qemu/thread.h"
-#include "block/coroutine.h"
-#include "block/block_int.h"
+#include "block/block.h"
 
 typedef int ThreadPoolFunc(void *opaque);
 
diff --git a/thread-pool.c b/thread-pool.c
index 23888dc..bc07d7a 100644
--- a/thread-pool.c
+++ b/thread-pool.c
@@ -20,7 +20,6 @@
 #include "qemu/osdep.h"
 #include "block/coroutine.h"
 #include "trace.h"
-#include "block/block_int.h"
 #include "block/thread-pool.h"
 #include "qemu/main-loop.h"
 
-- 
1.9.3

Re: [Qemu-devel] [PATCH] virtio-pci: enable bus master for old guests

2014-09-08 Thread Michael S. Tsirkin

On Mon, Sep 08, 2014 at 06:26:51PM +0200, Jan Kiszka wrote:
> On 2014-09-08 18:05, Michael S. Tsirkin wrote:
> > commit cc943c36faa192cd4b32af8fe5edb31894017d35
> > pci: Use bus master address space for delivering MSI/MSI-X messages
> > breaks virtio-net for rhel6.[56] x86 guests because they don't
> > enable bus mastering for virtio PCI devices
> > 
> > Old guests forgot to enable bus mastering, enable it
> > automatically on DRIVER_OK.
> > 
> > Note: we should either back out the original patch from
> > stable or apply this one on top.
> > 
> > Cc: qemu-sta...@nongnu.org
> > Reported-by: Greg Kurz 
> > Signed-off-by: Jan Kiszka 
> 
> I didn't signed off as I didn't write this patch ;). But you can add my
> reviewed-by if you like.
> 
> Jan

Should have been Cc :)

> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  hw/virtio/virtio-pci.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > index ddb5da1..af937d2 100644
> > --- a/hw/virtio/virtio-pci.c
> > +++ b/hw/virtio/virtio-pci.c
> > @@ -320,6 +320,8 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> > addr, uint32_t val)
> >  if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> >  !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
> >  proxy->flags |= VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
> > +
> > memory_region_set_enabled(&proxy->pci_dev.bus_master_enable_region,
> > +  true);
> >  }
> >  break;
> >  case VIRTIO_MSI_CONFIG_VECTOR:
> > 
> 
> -- 
> Siemens AG, Corporate Technology, CT RTC ITP SES-DE
> Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-08 Thread Paolo Bonzini

Il 08/09/2014 18:18, Peter Lieven ha scritto:
>> > When copying data, gparted will try using very large I/O sizes.  Of
>> > course if something breaks it will just use a smaller size, but it would
>> > make performance worse.
>> > 
>> > I tried now (with local storage, not virtual---but with such large block
>> > sizes it's disk bound anyway, one request can take 0.1 seconds to
>> > execute) and a 2 MB block size is 20% slower than 16 MB block size on
>> > your usual 3.5" rotational SATA disk.
>> > 
> can you share with what command exactly you ran these tests?
> 
> i tried myself and found that without multiwrite_merge i was not able to 
> create a request bigger than 0x sectors from inside linux.

On a different machine:

$ time dd if=/dev/zero of=test bs=16777216 count=30 oflag=direct
real0m13.497s
user0m0.001s
sys 0m0.541s

$ time dd if=/dev/zero of=test2 bs=1048576 count=480 oflag=direct
real0m15.835s
user0m0.005s
sys 0m0.770s

The bigger block size is 17% faster; for disk-to-disk copy:

$ time dd if=test of=test3 bs=16777216 count=30 iflag=direct oflag=direct
real0m26.075s
user0m0.001s
sys 0m0.678s

$ time dd if=test2 of=test4 bs=1048576 count=480 iflag=direct oflag=direct
real0m45.210s
user0m0.005s
sys 0m1.145s

The bigger block size is 73% faster.

Paolo

Re: [Qemu-devel] [PATCH] virtio-pci: enable bus master for old guests

2014-09-08 Thread Jan Kiszka

On 2014-09-08 18:05, Michael S. Tsirkin wrote:
> commit cc943c36faa192cd4b32af8fe5edb31894017d35
> pci: Use bus master address space for delivering MSI/MSI-X messages
> breaks virtio-net for rhel6.[56] x86 guests because they don't
> enable bus mastering for virtio PCI devices
> 
> Old guests forgot to enable bus mastering, enable it
> automatically on DRIVER_OK.
> 
> Note: we should either back out the original patch from
> stable or apply this one on top.
> 
> Cc: qemu-sta...@nongnu.org
> Reported-by: Greg Kurz 
> Signed-off-by: Jan Kiszka 

I didn't signed off as I didn't write this patch ;). But you can add my
reviewed-by if you like.

Jan

> Signed-off-by: Michael S. Tsirkin 
> ---
>  hw/virtio/virtio-pci.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index ddb5da1..af937d2 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -320,6 +320,8 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
>  !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
>  proxy->flags |= VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
> +
> memory_region_set_enabled(&proxy->pci_dev.bus_master_enable_region,
> +  true);
>  }
>  break;
>  case VIRTIO_MSI_CONFIG_VECTOR:
> 

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-08 Thread Peter Lieven



> Am 08.09.2014 um 17:27 schrieb Paolo Bonzini :
> 
> Il 08/09/2014 17:18, Peter Lieven ha scritto:
>>> 
>>> That's why we have splitting code for discard, and why we would have to
>>> add it for read/write too.
>> 
>> Why should a guest generate such big requests.
> 
> When copying data, gparted will try using very large I/O sizes.  Of
> course if something breaks it will just use a smaller size, but it would
> make performance worse.
> 
> I tried now (with local storage, not virtual---but with such large block
> sizes it's disk bound anyway, one request can take 0.1 seconds to
> execute) and a 2 MB block size is 20% slower than 16 MB block size on
> your usual 3.5" rotational SATA disk.
> 

can you share with what command exactly you ran these tests?

i tried myself and found that without multiwrite_merge i was not able to create 
a request bigger than 0x sectors from inside linux.

Peter

> Paolo
> 
>> Afaik the reported
>> limit for e.g. virtio-blk is 1024 sectors (reported through blockdev
>> --getmaxsect /dev/vda).
>> I think it was only the multiwrite_merge code causing trouble here.
> 
>

Re: [Qemu-devel] [PATCH v2] Support vhd type VHD_DIFFERENCING

2014-09-08 Thread Kevin Wolf

Am 08.09.2014 um 16:41 hat Xiaodong Gong geschrieben:
> From: Xiaodong Gong 
> 
> Now qemu only supports vhd type VHD_FIXED and VHD_DYNAMIC,
> so qemu can't read snapshot volume of vhd, and can't support
> other storage features of vhd file.
> 
> This patch add read parent information in function "vpc_open",
> read bitmap in "vpc_read", and change bitmap in "vpc_write".
> 
> Signed-off-by: Xiaodong Gong 

I have a few comments and questions on this patch. Please find my
comments inline.

>  block/vpc.c | 329 
> +++-
>  1 file changed, 261 insertions(+), 68 deletions(-)
> 
> diff --git a/block/vpc.c b/block/vpc.c
> index c024b4c..3ba0d57 100644
> --- a/block/vpc.c
> +++ b/block/vpc.c
> @@ -33,13 +33,18 @@
>  /**/
>  
>  #define HEADER_SIZE 512
> +#define DYNAMIC_HEADER_SIZE 1024
> +#define PARENT_LOCATOR_NUM 8
> +#define PARENT_PREFIX_LEN 7 /* such as file:// */
> +#define TBBATMAP_HEAD_SIZE 28
> +#define MACX 0x5863614d /* big endian */

Perhaps call it PLATFORM_MACX? At first I was wondering what this might
mean.

>  //#define CACHE
>  
>  enum vhd_type {
>  VHD_FIXED   = 2,
>  VHD_DYNAMIC = 3,
> -VHD_DIFFERENCING= 4,
> +VHD_DIFF= 4,
>  };
>  
>  // Seconds since Jan 1, 2000 0:00:00 (UTC)
> @@ -138,6 +143,15 @@ typedef struct BDRVVPCState {
>  Error *migration_blocker;
>  } BDRVVPCState;
>  
> +typedef struct vhd_tdbatmap_header {

This should be VHDTdBatmapHeader as well.

> +char magic[8]; /* always "tdbatmap" */
> +
> +uint64_t batmap_offset;
> +uint32_t batmap_size;
> +uint32_t batmap_version;
> +uint32_t checksum;
> +} QEMU_PACKED VHDTdBatmapHeader;
> +
>  static uint32_t vpc_checksum(uint8_t* buf, size_t size)
>  {
>  uint32_t res = 0;
> @@ -153,7 +167,7 @@ static uint32_t vpc_checksum(uint8_t* buf, size_t size)
>  static int vpc_probe(const uint8_t *buf, int buf_size, const char *filename)
>  {
>  if (buf_size >= 8 && !strncmp((char *)buf, "conectix", 8))
> - return 100;
> +return 100;
>  return 0;
>  }

I wouldn't touch this function at all when you're not changing any
functionality. But if you do touch it, please also add curly braces for
the if statement.

> @@ -164,11 +178,17 @@ static int vpc_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  int i;
>  VHDFooter *footer;
>  VHDDynDiskHeader *dyndisk_header;
> -uint8_t buf[HEADER_SIZE];
> +uint8_t buf[DYNAMIC_HEADER_SIZE];
> +uint8_t tdbatmap_header_buf[TBBATMAP_HEAD_SIZE];
>  uint32_t checksum;
>  uint64_t computed_size;
> -int disk_type = VHD_DYNAMIC;
> +uint32_t disk_type;
>  int ret;
> +VHDTdBatmapHeader *tdbatmap_header;
> +int parent_locator_offset = 0;
> +int64_t data_offset = 0;
> +int data_length = 0;
> +uint32_t platform;
>  
>  ret = bdrv_pread(bs->file, 0, s->footer_buf, HEADER_SIZE);
>  if (ret < 0) {
> @@ -176,6 +196,8 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
> int flags,
>  }
>  
>  footer = (VHDFooter *) s->footer_buf;
> +disk_type = be32_to_cpu(footer->type);
> +
>  if (strncmp(footer->creator, "conectix", 8)) {
>  int64_t offset = bdrv_getlength(bs->file);
>  if (offset < 0) {
> @@ -230,9 +252,9 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
> int flags,
>  goto fail;
>  }
>  
> -if (disk_type == VHD_DYNAMIC) {
> +if (disk_type == VHD_DYNAMIC || disk_type == VHD_DIFF) {
>  ret = bdrv_pread(bs->file, be64_to_cpu(footer->data_offset), buf,
> - HEADER_SIZE);
> + DYNAMIC_HEADER_SIZE);
>  if (ret < 0) {
>  goto fail;
>  }
> @@ -286,6 +308,56 @@ static int vpc_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  s->free_data_block_offset =
>  (s->bat_offset + (s->max_table_entries * 4) + 511) & ~511;
>  
> +/* Read tdbatmap header by offset */
> +ret = bdrv_pread(bs->file, s->free_data_block_offset,
> +tdbatmap_header_buf, TBBATMAP_HEAD_SIZE);
> +if (ret < 0) {
> +goto fail;
> +}

Where can I find information about this tdbatmap structure? I'm looking
at the VHD specification 1.0, but can't find it there.

Is this only relevant for differencing images or could it be an
independent patch?

> +tdbatmap_header = (VHDTdBatmapHeader *) tdbatmap_header_buf;
> +if (!strncmp(tdbatmap_header->magic, "tdbatmap", 8)) {
> +s->free_data_block_offset =
> +be32_to_cpu(tdbatmap_header->batmap_size) * 512
> ++ be64_to_cpu(tdbatmap_header->batmap_offset);
> +}
> +
> +/* Read backing file location from dyn header table */
> +if (dyndisk_header->parent_name[0] || 
> dyndisk_header->parent_name[1]) {

Probably nicer written as *(uint

[Qemu-devel] [PATCH] virtio-pci: enable bus master for old guests

2014-09-08 Thread Michael S. Tsirkin

commit cc943c36faa192cd4b32af8fe5edb31894017d35
pci: Use bus master address space for delivering MSI/MSI-X messages
breaks virtio-net for rhel6.[56] x86 guests because they don't
enable bus mastering for virtio PCI devices

Old guests forgot to enable bus mastering, enable it
automatically on DRIVER_OK.

Note: we should either back out the original patch from
stable or apply this one on top.

Cc: qemu-sta...@nongnu.org
Reported-by: Greg Kurz 
Signed-off-by: Jan Kiszka 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index ddb5da1..af937d2 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -320,6 +320,8 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
 !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
 proxy->flags |= VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
+memory_region_set_enabled(&proxy->pci_dev.bus_master_enable_region,
+  true);
 }
 break;
 case VIRTIO_MSI_CONFIG_VECTOR:
-- 
MST

Re: [Qemu-devel] [PATCH] Fix improper usage of cpu_to_be32 in vpc

2014-09-08 Thread Kevin Wolf

Am 08.09.2014 um 16:40 hat Xiaodong Gong geschrieben:
> From: Xiaodong Gong 
> 
> cpu_to_be32() is wrong since vhd_type is an enum constant
> (just a regular CPU-endian integer).
> 
> Signed-off-by: Xiaodong Gong 

Thanks, this one is perfect. :-) Applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH] block: extend BLOCK_IO_ERROR event with nospace indicator

2014-09-08 Thread Kevin Wolf

Am 08.09.2014 um 16:42 hat Luiz Capitulino geschrieben:
> On Fri, 29 Aug 2014 16:07:27 -0400
> Luiz Capitulino  wrote:
> 
> > Management software, such as RHEV's vdsm, want to be able to allocate
> > disk space on demand. The basic use case is to start a VM with a small
> > disk and then the disk is enlarged when QEMU hits a ENOSPC condition.
> > 
> > To this end, the management software has to be notified when QEMU
> > encounters ENOSPC. The solution implemented by this commit is simple:
> > it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true
> > when QEMU is stopped due to ENOSPC.
> > 
> > Note that support for querying this event is already present in
> > query-block by means of the 'io-status' key. Also, the new 'nospace'
> > BLOCK_IO_ERROR field shares the same semantics with 'io-status',
> > which basically means that werror= has to be set to either
> > 'stop' or 'enospc' to enable 'nospace'.
> > 
> > Finally, this commit also updates the 'io-status' key doc in the
> > schema with a list of supported device models.
> > 
> > Signed-off-by: Luiz Capitulino 
> 
> Kevin, are you going to take this via block layer tree?

Yes, thanks, I've applied it now.

What was our conclusion wrt the human-readable strerror() string for
debugging? Didn't we want to add that as well?

> > ---
> > 
> > Three important observations:
> > 
> >  1. We've talked with oVirt and OpenStack folks. oVirt folks say that
> > this implementation is enough for their use-case. OpenStack don't
> > need this feature
> > 
> >  2. While testing this with a raw image on a (smaller) ext2 file mounted
> > via the loopback device, I get half "Invalid argument" I/O errors and
> > half  "No space" errors". This means that half of the BLOCK_IO_ERROR
> > events that are emitted for this test-case will have nospace=false
> > and the other half nospace=true. I don't know why I'm getting those
> > "Invalid argument" errors, can anyone of the block layer comment
> > on this? I don't get that with a qcow2 image (I get nospace=true for
> > all events)

Sounds familiar, but I never got around to debugging. Would probably be
worth some digging where the EINVAL comes from.

> >  3. I think this should go via block tree
> > 
> >  block.c  | 22 ++
> >  qapi/block-core.json |  8 +++-
> >  2 files changed, 21 insertions(+), 9 deletions(-)

Kevin

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-08 Thread Paolo Bonzini

Il 08/09/2014 17:18, Peter Lieven ha scritto:
>>
>> That's why we have splitting code for discard, and why we would have to
>> add it for read/write too.
> 
> Why should a guest generate such big requests.

When copying data, gparted will try using very large I/O sizes.  Of
course if something breaks it will just use a smaller size, but it would
make performance worse.

I tried now (with local storage, not virtual---but with such large block
sizes it's disk bound anyway, one request can take 0.1 seconds to
execute) and a 2 MB block size is 20% slower than 16 MB block size on
your usual 3.5" rotational SATA disk.

Paolo

> Afaik the reported
> limit for e.g. virtio-blk is 1024 sectors (reported through blockdev
> --getmaxsect /dev/vda).
> I think it was only the multiwrite_merge code causing trouble here.

Re: [Qemu-devel] [PATCH v2 3/3] util: Add an utility infrastructure used to compute an average on a time slice

2014-09-08 Thread Benoît Canet

On Mon, Sep 08, 2014 at 05:09:38PM +0200, Paolo Bonzini wrote:
> Il 08/09/2014 16:49, Benoît Canet ha scritto:
> >> > - create two windows, with twice the suggested expiration period, and
> >> > return min/avg/max from the oldest window.  Example
> >> > 
> >> >t=0  |t=1  |t=2  |t=3  |t=4
> >> >wnd0: [0,1)  |wnd0: [1,3)  | |wnd0: [3,5)  |
> >> >wnd1: [0,2)  | |wnd1: [2,4)  | |
> >> > 
> >> > Values are returned from:
> >> > 
> >> >wnd0-|wnd1-|wnd0-|wnd1-|
> > 
> > This is neat.
> 
> Alternatively, you can make it probabilistically correct:
> 
> t=0|t=0.66   |t=1.33 |t=2  
> |t=2.66
>|wnd0: [0.66,2)   |   |wnd0: [2,3.33)   |
> wnd1: [0,0.66) | |wnd1: [1.33,2.66)  | |
> 
> Return from:
> 
> 
> wnd1---|wnd1-|wnd0---|wnd1-|wnd0
> 
> So you always have 2/3 seconds worth of data, and on average exactly 1 second
> worth of data.
> 
> The problem is the delay in getting data, which can be big for the minute-
> and hour-based statistics.  Suppose you have a spike that lasts 10 seconds,
> it might not show in the minute-based statistics for as much as 30 seconds
> after it ends (the window switches every 40 seconds).
> 
> For min/max you could return min(min0, min1) and max(max0, max1).  Only the
> average has this problem.
> 
> Exponential smoothing doesn't have this problem.  IIRC uptime uses that.

I am writing this so cloud end users can programatically get informations about
their vms disk statistics.

Cloud end users are known to use their cloud API to script the elasticity of 
their
architecture.
Some code will poll system statistics to decide if new instances must be 
launched
or existing instances must be pruned.
This means introducing a delay in the accounting code would slow down their
decisions.

min and max is also useful to know since it gives an idea of the deviation.

So I think the first method you suggested would be the best for a cloud vm.

Best regards

Benoît

> 
> Paolo

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-08 Thread Peter Lieven


On 08.09.2014 17:15, Paolo Bonzini wrote:

Il 08/09/2014 17:13, ronnie sahlberg ha scritto:

What I would like to see would also be to report these limitations to
the guest itself to prevent it from generating too large I/Os.

That's difficult because you don't want a backend change (e.g. from
local storage to iSCSI) to change the vital product data in the guest.


I hadn't that in mind



That's why we have splitting code for discard, and why we would have to
add it for read/write too.


Why should a guest generate such big requests. Afaik the reported
limit for e.g. virtio-blk is 1024 sectors (reported through blockdev 
--getmaxsect /dev/vda).
I think it was only the multiwrite_merge code causing trouble here.

Peter

Re: [Qemu-devel] [PATCH 1/2] tcg: Compress TCGLabelQemuLdst

2014-09-08 Thread Claudio Fontana

Reviewed-by: Claudio Fontana 

On 03.09.2014 18:46, Richard Henderson wrote:
> Use 1 32-bit word instead of 6.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/tcg-be-ldst.h | 19 ---
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/tcg/tcg-be-ldst.h b/tcg/tcg-be-ldst.h
> index 49b3de6..904eeda 100644
> --- a/tcg/tcg-be-ldst.h
> +++ b/tcg/tcg-be-ldst.h
> @@ -23,14 +23,19 @@
>  #ifdef CONFIG_SOFTMMU
>  #define TCG_MAX_QEMU_LDST   640
>  
> +QEMU_BUILD_BUG_ON(TCG_TARGET_NB_REGS > 32);
> +QEMU_BUILD_BUG_ON(NB_MMU_MODES > 8);
> +
>  typedef struct TCGLabelQemuLdst {
> -bool is_ld:1;   /* qemu_ld: true, qemu_st: false */
> -TCGMemOp opc:4;
> -TCGReg addrlo_reg;  /* reg index for low word of guest virtual addr 
> */
> -TCGReg addrhi_reg;  /* reg index for high word of guest virtual addr 
> */
> -TCGReg datalo_reg;  /* reg index for low word to be loaded or stored 
> */
> -TCGReg datahi_reg;  /* reg index for high word to be loaded or 
> stored */
> -int mem_index;  /* soft MMU memory index */
> +TCGMemOp opc : 4;
> +bool is_ld : 1; /* qemu_ld: true, qemu_st: false */
> +TCGReg addrlo_reg : 5;  /* reg index for low word of guest virtual addr 
> */
> +TCGReg addrhi_reg : 5;  /* reg index for high word of guest virtual addr 
> */
> +TCGReg datalo_reg : 5;  /* reg index for low word to be loaded or stored 
> */
> +TCGReg datahi_reg : 5;  /* reg index for high word to be loaded or 
> stored */
> +unsigned mem_index : 3; /* soft MMU memory index */
> +/* 4 bits unused in 32-bit word */
> +
>  tcg_insn_unit *raddr;   /* gen code addr of the next IR of qemu_ld/st IR 
> */
>  tcg_insn_unit *label_ptr[2]; /* label pointers to be updated */
>  } TCGLabelQemuLdst;
>

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-08 Thread Peter Lieven


On 08.09.2014 17:13, ronnie sahlberg wrote:

On Mon, Sep 8, 2014 at 7:42 AM, Paolo Bonzini  wrote:

Il 08/09/2014 16:35, Peter Lieven ha scritto:

messages. :)

So you would not throw an error msg here?

No, though a trace could be useful.

Is there a howto somewhere how to implement that?

Try commit 4ac4458076e1aaf3b01a6361154117df20e22215.


Whats your opinion changed the max_xfer_len to 0x regardsless
of use_16_for_rw in iSCSI?

If you implemented request splitting in the block layer, it would be
okay to force max_xfer_len to 0x.

I think it should be OK if that is a different series. This only
affects the iSCSI transport since it is the only transport so far to
record or report a max transfer length.
If a guest generates these big requests(i.e. not multiwrite) we
already fail them due to the READ10 limitations. We would just fail
the request at a different layer in the stack with these patches.


What I would like to see would also be to report these limitations to
the guest itself to prevent it from generating too large I/Os.
I am willing to do that part once the initial max_xfer_len ones are finished.


Yes, I also think this approach is straightforward. We should avoid
big requests at the beginning and not fix them at the end.
I had a look it shouldn't be too hard. There are default values for
virtio-scsi HD inquiry emulation which are set if no cmdline values
are specified. It should be possible to feed them from bs->bl if set.

Peter

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-08 Thread Paolo Bonzini

Il 08/09/2014 17:13, ronnie sahlberg ha scritto:
> 
> What I would like to see would also be to report these limitations to
> the guest itself to prevent it from generating too large I/Os.

That's difficult because you don't want a backend change (e.g. from
local storage to iSCSI) to change the vital product data in the guest.

That's why we have splitting code for discard, and why we would have to
add it for read/write too.

Paolo

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-08 Thread ronnie sahlberg

On Mon, Sep 8, 2014 at 7:42 AM, Paolo Bonzini  wrote:
> Il 08/09/2014 16:35, Peter Lieven ha scritto:
>
> messages. :)
 So you would not throw an error msg here?
>>> No, though a trace could be useful.
>>
>> Is there a howto somewhere how to implement that?
>
> Try commit 4ac4458076e1aaf3b01a6361154117df20e22215.
>
>> Whats your opinion changed the max_xfer_len to 0x regardsless
>> of use_16_for_rw in iSCSI?
>
> If you implemented request splitting in the block layer, it would be
> okay to force max_xfer_len to 0x.

I think it should be OK if that is a different series. This only
affects the iSCSI transport since it is the only transport so far to
record or report a max transfer length.
If a guest generates these big requests(i.e. not multiwrite) we
already fail them due to the READ10 limitations. We would just fail
the request at a different layer in the stack with these patches.

What I would like to see would also be to report these limitations to
the guest itself to prevent it from generating too large I/Os.
I am willing to do that part once the initial max_xfer_len ones are finished.

[Qemu-devel] [Bug 1366836] Re: Core2Duo and KVM may not boot Win8 properly on 3.x kernels

2014-09-08 Thread Erik Rull

** Description changed:

  When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a vanilla 
kernel 3.4.67 or 3.10.12 to run a Windows 8.0 guest, the guest freezes at 
Windows 8 boot without any error.
  When I dump the CPU registers via "info registers", nothing changes, that 
means the system really stalled. Same happens with QEMU 2.0.0 and QEMU 2.1.0.
  It stalls when the Windows logo is displayed and the balled circle starts 
rotating.
  
  But - when I run the very same guest using Kernel 2.6.32.12 and QEMU
  1.7.0 or 2.0.0 on the host side it works on the Core2Duo. Also the
  system above but just with an i3 or i5 CPU it works fine.
  
  I already disabled networking and USB for the guest and changed the graphics
  card - no effect. I assume that some mean bits and bytes have to be set up
  properly to get the thing running.
  
  Seems to be related to a kvm/processor incompatibility.
  
  Windows XP runs on all combinations without any issues. Windows 8.1
  guests have the same issues as Windows 8.0.
+ 
+ An example command line that does not boot Windows 8 is:
+ qemu-system-x86_64 -machine pc-i440fx-1.5,accel=kvm,kernel_irqchip=off 
-daemonize -cpu kvm32,+sep,+nx -nodefaults -vga std -readconfig 
/usr/X11R6/X11etc/ich9-ehci-uhci.cfg -device usb-host,bus=ehci.0,hostport=1.2 
-device usb-tablet -drive file=/dev/sdb,cache=writethrough,if=none,id=x -device 
ide-drive,drive=x -m 1024 -monitor telnet:127.0.0.1:5100,nowait,server -vnc :1 
-L /usr/X11R6/share/qemu -boot c -localtime -enable-kvm -no-shutdown
+ 
+ enabling the kernel_irqchip, removing the sep, disabling usb, changing
+ the machine type or changing the monitor type (SDL or VNC) has no
+ effect.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1366836

Title:
  Core2Duo and KVM may not boot Win8 properly on 3.x kernels

Status in QEMU:
  New

Bug description:
  When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a vanilla 
kernel 3.4.67 or 3.10.12 to run a Windows 8.0 guest, the guest freezes at 
Windows 8 boot without any error.
  When I dump the CPU registers via "info registers", nothing changes, that 
means the system really stalled. Same happens with QEMU 2.0.0 and QEMU 2.1.0.
  It stalls when the Windows logo is displayed and the balled circle starts 
rotating.

  But - when I run the very same guest using Kernel 2.6.32.12 and QEMU
  1.7.0 or 2.0.0 on the host side it works on the Core2Duo. Also the
  system above but just with an i3 or i5 CPU it works fine.

  I already disabled networking and USB for the guest and changed the graphics
  card - no effect. I assume that some mean bits and bytes have to be set up
  properly to get the thing running.

  Seems to be related to a kvm/processor incompatibility.

  Windows XP runs on all combinations without any issues. Windows 8.1
  guests have the same issues as Windows 8.0.

  An example command line that does not boot Windows 8 is:
  qemu-system-x86_64 -machine pc-i440fx-1.5,accel=kvm,kernel_irqchip=off 
-daemonize -cpu kvm32,+sep,+nx -nodefaults -vga std -readconfig 
/usr/X11R6/X11etc/ich9-ehci-uhci.cfg -device usb-host,bus=ehci.0,hostport=1.2 
-device usb-tablet -drive file=/dev/sdb,cache=writethrough,if=none,id=x -device 
ide-drive,drive=x -m 1024 -monitor telnet:127.0.0.1:5100,nowait,server -vnc :1 
-L /usr/X11R6/share/qemu -boot c -localtime -enable-kvm -no-shutdown

  enabling the kernel_irqchip, removing the sep, disabling usb, changing
  the machine type or changing the monitor type (SDL or VNC) has no
  effect.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1366836/+subscriptions

Re: [Qemu-devel] [PATCH v2 3/3] util: Add an utility infrastructure used to compute an average on a time slice

2014-09-08 Thread Paolo Bonzini

Il 08/09/2014 16:49, Benoît Canet ha scritto:
>> > - create two windows, with twice the suggested expiration period, and
>> > return min/avg/max from the oldest window.  Example
>> > 
>> >t=0  |t=1  |t=2  |t=3  |t=4
>> >wnd0: [0,1)  |wnd0: [1,3)  | |wnd0: [3,5)  |
>> >wnd1: [0,2)  | |wnd1: [2,4)  | |
>> > 
>> > Values are returned from:
>> > 
>> >wnd0-|wnd1-|wnd0-|wnd1-|
> 
> This is neat.

Alternatively, you can make it probabilistically correct:

t=0|t=0.66   |t=1.33 |t=2  
|t=2.66
   |wnd0: [0.66,2)   |   |wnd0: [2,3.33)   |
wnd1: [0,0.66) | |wnd1: [1.33,2.66)  | |

Return from:

wnd1---|wnd1-|wnd0---|wnd1-|wnd0

So you always have 2/3 seconds worth of data, and on average exactly 1 second
worth of data.

The problem is the delay in getting data, which can be big for the minute-
and hour-based statistics.  Suppose you have a spike that lasts 10 seconds,
it might not show in the minute-based statistics for as much as 30 seconds
after it ends (the window switches every 40 seconds).

For min/max you could return min(min0, min1) and max(max0, max1).  Only the
average has this problem.

Exponential smoothing doesn't have this problem.  IIRC uptime uses that.

Paolo

[Qemu-devel] [Bug 1366836] Re: Core2Duo and KVM may not boot Win8 properly on 3.x kernels

2014-09-08 Thread Erik Rull

** Description changed:

- When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a vanilla 
kernel
- 3.4.67 or 3.10.12 to run a Windows 8.0 guest, the guest freezes at Windows 8 
boot without any error.
- When I dump the CPU registers via "info registers", nothing changes, that 
means
- the system really stalled. Same happens with QEMU 2.0.0 and QEMU 2.1.0.
+ When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a vanilla 
kernel 3.4.67 or 3.10.12 to run a Windows 8.0 guest, the guest freezes at 
Windows 8 boot without any error.
+ When I dump the CPU registers via "info registers", nothing changes, that 
means the system really stalled. Same happens with QEMU 2.0.0 and QEMU 2.1.0.
+ It stalls when the Windows logo is displayed and the balled circle starts 
rotating.
  
- But - when I run the very same guest using Kernel 2.6.32.12 and QEMU 1.7.0 or 
2.0.0 on
- the host side it works on the Core2Duo. Also the system above but just with an
- i3 or i5 CPU it works fine.
+ But - when I run the very same guest using Kernel 2.6.32.12 and QEMU
+ 1.7.0 or 2.0.0 on the host side it works on the Core2Duo. Also the
+ system above but just with an i3 or i5 CPU it works fine.
  
  I already disabled networking and USB for the guest and changed the graphics
  card - no effect. I assume that some mean bits and bytes have to be set up
  properly to get the thing running.
  
  Seems to be related to a kvm/processor incompatibility.
+ 
+ Windows XP runs on all combinations without any issues. Windows 8.1
+ guests have the same issues as Windows 8.0.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1366836

Title:
  Core2Duo and KVM may not boot Win8 properly on 3.x kernels

Status in QEMU:
  New

Bug description:
  When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a vanilla 
kernel 3.4.67 or 3.10.12 to run a Windows 8.0 guest, the guest freezes at 
Windows 8 boot without any error.
  When I dump the CPU registers via "info registers", nothing changes, that 
means the system really stalled. Same happens with QEMU 2.0.0 and QEMU 2.1.0.
  It stalls when the Windows logo is displayed and the balled circle starts 
rotating.

  But - when I run the very same guest using Kernel 2.6.32.12 and QEMU
  1.7.0 or 2.0.0 on the host side it works on the Core2Duo. Also the
  system above but just with an i3 or i5 CPU it works fine.

  I already disabled networking and USB for the guest and changed the graphics
  card - no effect. I assume that some mean bits and bytes have to be set up
  properly to get the thing running.

  Seems to be related to a kvm/processor incompatibility.

  Windows XP runs on all combinations without any issues. Windows 8.1
  guests have the same issues as Windows 8.0.

  An example command line that does not boot Windows 8 is:
  qemu-system-x86_64 -machine pc-i440fx-1.5,accel=kvm,kernel_irqchip=off 
-daemonize -cpu kvm32,+sep,+nx -nodefaults -vga std -readconfig 
/usr/X11R6/X11etc/ich9-ehci-uhci.cfg -device usb-host,bus=ehci.0,hostport=1.2 
-device usb-tablet -drive file=/dev/sdb,cache=writethrough,if=none,id=x -device 
ide-drive,drive=x -m 1024 -monitor telnet:127.0.0.1:5100,nowait,server -vnc :1 
-L /usr/X11R6/share/qemu -boot c -localtime -enable-kvm -no-shutdown

  enabling the kernel_irqchip, removing the sep, disabling usb, changing
  the machine type or changing the monitor type (SDL or VNC) has no
  effect.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1366836/+subscriptions

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-08 Thread Peter Lieven


On 08.09.2014 16:42, Paolo Bonzini wrote:

Il 08/09/2014 16:35, Peter Lieven ha scritto:

messages. :)

So you would not throw an error msg here?

No, though a trace could be useful.

Is there a howto somewhere how to implement that?

Try commit 4ac4458076e1aaf3b01a6361154117df20e22215.


Thanks for the pointer.




Whats your opinion changed the max_xfer_len to 0x regardsless
of use_16_for_rw in iSCSI?

If you implemented request splitting in the block layer, it would be
okay to force max_xfer_len to 0x.


Unfortunately, I currently have no time for that. It will include some 
shuffling with
qiovs that has to be properly tested.

Regarding iSCSI: In fact currently the limit is 0x for all iSCSI Targets < 
2TB.
So I thought that its not obvious at all why a > 2TB target can handle bigger 
requests.

To the root cause of this patch multiwrite_merge I still have some thoughts:
 - why are we merging requests for raw (especially host devices and/or iSCSI?)
   The original patch from Kevin was to mitigate a QCOW2 performance regression.
   For iSCSI the qiov concats are destroying all the zerocopy efforts we made.
 - should we only merge requests within the same cluster?
 - why is there no multiread_merge?

Peter

Re: [Qemu-devel] [PATCH v2 2/3] timers: Move NANOSECONDS_PER_SECONDS to timer.h for future reuse

2014-09-08 Thread Eric Blake

On 09/08/2014 06:18 AM, Benoît Canet wrote:
> Signed-off-by: Benoît Canet 
> ---
>  include/qemu/throttle.h | 2 --
>  include/qemu/timer.h| 2 ++
>  2 files changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [Bug 1366836] [NEW] Core2Duo and KVM may not boot Win8 properly on 3.x kernels

2014-09-08 Thread Erik Rull

Public bug reported:

When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a vanilla kernel
3.4.67 or 3.10.12 to run a Windows 8.0 guest, the guest freezes at Windows 8 
boot without any error.
When I dump the CPU registers via "info registers", nothing changes, that means
the system really stalled. Same happens with QEMU 2.0.0 and QEMU 2.1.0.

But - when I run the very same guest using Kernel 2.6.32.12 and QEMU 1.7.0 or 
2.0.0 on
the host side it works on the Core2Duo. Also the system above but just with an
i3 or i5 CPU it works fine.

I already disabled networking and USB for the guest and changed the graphics
card - no effect. I assume that some mean bits and bytes have to be set up
properly to get the thing running.

Seems to be related to a kvm/processor incompatibility.

** Affects: qemu
 Importance: Undecided
 Status: New


** Tags: core2duo kvm

** Description changed:

  When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a vanilla 
kernel
  3.4.67 or 3.10.12 to run a Windows 8.0 guest, the guest freezes at Windows 8 
boot without any error.
  When I dump the CPU registers via "info registers", nothing changes, that 
means
  the system really stalled. Same happens with QEMU 2.0.0 and QEMU 2.1.0.
  
  But - when I run the very same guest using Kernel 2.6.32.12 and QEMU 1.7.0 or 
2.0.0 on
  the host side it works on the Core2Duo. Also the system above but just with an
  i3 or i5 CPU it works fine.
  
  I already disabled networking and USB for the guest and changed the graphics
  card - no effect. I assume that some mean bits and bytes have to be set up
  properly to get the thing running.
  
- Seems to be related to a kvm/progressor incompatibility.
+ Seems to be related to a kvm/procressor incompatibility.

** Description changed:

  When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a vanilla 
kernel
  3.4.67 or 3.10.12 to run a Windows 8.0 guest, the guest freezes at Windows 8 
boot without any error.
  When I dump the CPU registers via "info registers", nothing changes, that 
means
  the system really stalled. Same happens with QEMU 2.0.0 and QEMU 2.1.0.
  
  But - when I run the very same guest using Kernel 2.6.32.12 and QEMU 1.7.0 or 
2.0.0 on
  the host side it works on the Core2Duo. Also the system above but just with an
  i3 or i5 CPU it works fine.
  
  I already disabled networking and USB for the guest and changed the graphics
  card - no effect. I assume that some mean bits and bytes have to be set up
  properly to get the thing running.
  
- Seems to be related to a kvm/procressor incompatibility.
+ Seems to be related to a kvm/processor incompatibility.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1366836

Title:
  Core2Duo and KVM may not boot Win8 properly on 3.x kernels

Status in QEMU:
  New

Bug description:
  When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a vanilla 
kernel
  3.4.67 or 3.10.12 to run a Windows 8.0 guest, the guest freezes at Windows 8 
boot without any error.
  When I dump the CPU registers via "info registers", nothing changes, that 
means
  the system really stalled. Same happens with QEMU 2.0.0 and QEMU 2.1.0.

  But - when I run the very same guest using Kernel 2.6.32.12 and QEMU 1.7.0 or 
2.0.0 on
  the host side it works on the Core2Duo. Also the system above but just with an
  i3 or i5 CPU it works fine.

  I already disabled networking and USB for the guest and changed the graphics
  card - no effect. I assume that some mean bits and bytes have to be set up
  properly to get the thing running.

  Seems to be related to a kvm/processor incompatibility.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1366836/+subscriptions

Re: [Qemu-devel] [PATCH v2 3/3] util: Add an utility infrastructure used to compute an average on a time slice

2014-09-08 Thread Benoît Canet

The Monday 08 Sep 2014 à 16:29:26 (+0200), Paolo Bonzini wrote :
> Il 08/09/2014 14:18, Benoît Canet ha scritto:
> > The algorithm used was defined on the list while discussing the new IO 
> > accounting
> > overhaul.
> > See http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04954.html
> > 
> > Also the module takes care of computing minimal and maximal values over the 
> > time
> > slice duration.
> > 
> > Signed-off-by: Benoît Canet 
> 
> If you add
> 
> int64_t cpu_get_clock(void)
> {
> return my_clock_value;
> }
> 
> to the test, and use a QEMU_CLOCK_VIRTUAL-based average, you should be
> able to advance the clock directly in the test with no need for sleep()
> and with 100% deterministic results.
> 
> > 
> > +/* Check if the ta->periods seconds time slice has expired
> > + *
> > + * If the slice has expired the counters will be reseted
> > + *
> > + * @ta: the timed average structure used
> > + */
> > +static void timed_average_check_expiration(TimedAverage *ta)
> > +{
> > +int64_t now = qemu_clock_get_ns(ta->clock_type);
> > +
> > +/* if we are still in the period slice do nothing */
> > +if (now < ta->expiration) {
> > +return;
> > +}
> > +
> > +/* the slice has expired -> create a new slice */
> > +ta->min = UINT64_MAX;
> > +ta->sum = 0;
> > +ta->max = 0;
> > +ta->count = 0;
> > +timed_average_set_expiration(ta);
> > +}
> 
Thanks for reviewing.

> This can produce very noisy results if you invoke min/avg/max at the
> wrong time.  Some alternatives include:
> 
> - create two windows, with twice the suggested expiration period, and
> return min/avg/max from the oldest window.  Example
> 
>t=0  |t=1  |t=2  |t=3  |t=4
>wnd0: [0,1)  |wnd0: [1,3)  | |wnd0: [3,5)  |
>wnd1: [0,2)  | |wnd1: [2,4)  | |
> 
> Values are returned from:
> 
>wnd0-|wnd1-|wnd0-|wnd1-|


This is neat.
As I think that knowing the minimal and maximal latencies of a block device
would be handy I will implement this.

Best regards

Benoît

> 
> 
> - if you do not need min/max, you can use exponential smoothing, with a
> weighted factor that depends on the time since the last sample.
> http://www.drdobbs.com/tools/discontiguous-exponential-averaging/184410671
> -- for example, giving 90% weight to the last second.  Of course the
> exponential nature means that, in that case, 1-sqrt(10%)=68.3% weight is
> given to the last half second, 21.6% weight is given to the previous
> half second, and 10% to the entire previous history.  This cannot give
> min/max, but can give avg/stdev.
> 
> Paolo
>

Re: [Qemu-devel] [PATCH v2 1/3] throttle: Make NANOSECONDS_PER_SECOND an integer

2014-09-08 Thread Eric Blake

On 09/08/2014 06:18 AM, Benoît Canet wrote:
> We will want to reuse this define in the future by making it common to 
> multiples

s/multiples/multiple/

> QEMU modules.
> It would be safer that this define be an integer so we avoid stranges float

s/stranges/strange/

> rounding errors.
> Do this conversion.
> 
> Signed-off-by: Benoît Canet 
> ---
>  include/qemu/throttle.h | 2 +-
>  util/throttle.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
> index b890613..8f9e611 100644
> --- a/include/qemu/throttle.h
> +++ b/include/qemu/throttle.h
> @@ -27,7 +27,7 @@
>  #include "qemu-common.h"
>  #include "qemu/timer.h"
>  
> -#define NANOSECONDS_PER_SECOND  10.0
> +#define NANOSECONDS_PER_SECOND  10

This hunk is good.

>  
>  typedef enum {
>  THROTTLE_BPS_TOTAL,
> diff --git a/util/throttle.c b/util/throttle.c
> index f976ac7..af8445a 100644
> --- a/util/throttle.c
> +++ b/util/throttle.c
> @@ -34,7 +34,7 @@ void throttle_leak_bucket(LeakyBucket *bkt, int64_t 
> delta_ns)
>  double leak;
>  
>  /* compute how much to leak */
> -leak = (bkt->avg * (double) delta_ns) / NANOSECONDS_PER_SECOND;
> +leak = (bkt->avg * (double) delta_ns) / (double) NANOSECONDS_PER_SECOND;

This hunk is spurious.  With just your first hunk, it evaluates to the
following types via promotion rules:

(double * double) / int
double / int
double / double
double

so the explicit cast isn't changing anything.

>  
>  /* make the bucket leak */
>  bkt->level = MAX(bkt->level - leak, 0);
> @@ -70,7 +70,7 @@ static void throttle_do_leak(ThrottleState *ts, int64_t now)
>   */
>  static int64_t throttle_do_compute_wait(double limit, double extra)
>  {
> -double wait = extra * NANOSECONDS_PER_SECOND;
> +double wait = extra * (double) NANOSECONDS_PER_SECOND;

This hunk is also spurious.  Again, with just your first hunk, it
evaluates through the following promotion rules:

double * int
double * double
double

and the cast isn't changing anything.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-08 Thread Paolo Bonzini

Il 08/09/2014 16:35, Peter Lieven ha scritto:

 messages. :)
>>> So you would not throw an error msg here?
>> No, though a trace could be useful.
> 
> Is there a howto somewhere how to implement that?

Try commit 4ac4458076e1aaf3b01a6361154117df20e22215.

> Whats your opinion changed the max_xfer_len to 0x regardsless
> of use_16_for_rw in iSCSI?

If you implemented request splitting in the block layer, it would be
okay to force max_xfer_len to 0x.

Paolo

Re: [Qemu-devel] [PATCH] block: extend BLOCK_IO_ERROR event with nospace indicator

2014-09-08 Thread Luiz Capitulino

On Fri, 29 Aug 2014 16:07:27 -0400
Luiz Capitulino  wrote:

> Management software, such as RHEV's vdsm, want to be able to allocate
> disk space on demand. The basic use case is to start a VM with a small
> disk and then the disk is enlarged when QEMU hits a ENOSPC condition.
> 
> To this end, the management software has to be notified when QEMU
> encounters ENOSPC. The solution implemented by this commit is simple:
> it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true
> when QEMU is stopped due to ENOSPC.
> 
> Note that support for querying this event is already present in
> query-block by means of the 'io-status' key. Also, the new 'nospace'
> BLOCK_IO_ERROR field shares the same semantics with 'io-status',
> which basically means that werror= has to be set to either
> 'stop' or 'enospc' to enable 'nospace'.
> 
> Finally, this commit also updates the 'io-status' key doc in the
> schema with a list of supported device models.
> 
> Signed-off-by: Luiz Capitulino 

Kevin, are you going to take this via block layer tree?

> ---
> 
> Three important observations:
> 
>  1. We've talked with oVirt and OpenStack folks. oVirt folks say that
> this implementation is enough for their use-case. OpenStack don't
> need this feature
> 
>  2. While testing this with a raw image on a (smaller) ext2 file mounted
> via the loopback device, I get half "Invalid argument" I/O errors and
> half  "No space" errors". This means that half of the BLOCK_IO_ERROR
> events that are emitted for this test-case will have nospace=false
> and the other half nospace=true. I don't know why I'm getting those
> "Invalid argument" errors, can anyone of the block layer comment
> on this? I don't get that with a qcow2 image (I get nospace=true for
> all events)
> 
>  3. I think this should go via block tree
> 
>  block.c  | 22 ++
>  qapi/block-core.json |  8 +++-
>  2 files changed, 21 insertions(+), 9 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 1df13ac..b334e35 100644
> --- a/block.c
> +++ b/block.c
> @@ -3632,6 +3632,18 @@ BlockErrorAction 
> bdrv_get_error_action(BlockDriverState *bs, bool is_read, int e
>  }
>  }
>  
> +static void send_qmp_error_event(BlockDriverState *bs,
> + BlockErrorAction action,
> + bool is_read, int error)
> +{
> +BlockErrorAction ac;
> +
> +ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
> +qapi_event_send_block_io_error(bdrv_get_device_name(bs), ac, action,
> +   bdrv_iostatus_is_enabled(bs),
> +   error == ENOSPC, &error_abort);
> +}
> +
>  /* This is done by device models because, while the block layer knows
>   * about the error, it does not know whether an operation comes from
>   * the device or the block layer (from a job, for example).
> @@ -3657,16 +3669,10 @@ void bdrv_error_action(BlockDriverState *bs, 
> BlockErrorAction action,
>   * also ensures that the STOP/RESUME pair of events is emitted.
>   */
>  qemu_system_vmstop_request_prepare();
> -qapi_event_send_block_io_error(bdrv_get_device_name(bs),
> -   is_read ? IO_OPERATION_TYPE_READ :
> -   IO_OPERATION_TYPE_WRITE,
> -   action, &error_abort);
> +send_qmp_error_event(bs, action, is_read, error);
>  qemu_system_vmstop_request(RUN_STATE_IO_ERROR);
>  } else {
> -qapi_event_send_block_io_error(bdrv_get_device_name(bs),
> -   is_read ? IO_OPERATION_TYPE_READ :
> -   IO_OPERATION_TYPE_WRITE,
> -   action, &error_abort);
> +send_qmp_error_event(bs, action, is_read, error);
>  }
>  }
>  
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index fb74c56..567e0a6 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -336,6 +336,7 @@
>  #
>  # @io-status: #optional @BlockDeviceIoStatus. Only present if the device
>  # supports it and the VM is configured to stop on errors
> +# (supported device models: virtio-blk, ide, scsi-disk)
>  #
>  # @inserted: #optional @BlockDeviceInfo describing the device if media is
>  #present
> @@ -1569,6 +1570,11 @@
>  #
>  # @action: action that has been taken
>  #
> +# @nospace: #optional true if I/O error was caused due to a no-space
> +#   condition. This key is only present if query-block's
> +#   io-status is present, please see query-block documentation
> +#   for more information (since: 2.2)
> +#
>  # Note: If action is "stop", a STOP event will eventually follow the
>  # BLOCK_IO_ERROR event
>  #
> @@ -1576,7 +1582,7 @@
>  ##
>  { 'event': 'BLOCK_IO_ERROR',
>'data': { 'device': 'str', 'operati

[Qemu-devel] [PATCH v2] Support vhd type VHD_DIFFERENCING

2014-09-08 Thread Xiaodong Gong

From: Xiaodong Gong 

Now qemu only supports vhd type VHD_FIXED and VHD_DYNAMIC,
so qemu can't read snapshot volume of vhd, and can't support
other storage features of vhd file.

This patch add read parent information in function "vpc_open",
read bitmap in "vpc_read", and change bitmap in "vpc_write".

Signed-off-by: Xiaodong Gong 
---
 block/vpc.c | 329 +++-
 1 file changed, 261 insertions(+), 68 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index c024b4c..3ba0d57 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -33,13 +33,18 @@
 /**/
 
 #define HEADER_SIZE 512
+#define DYNAMIC_HEADER_SIZE 1024
+#define PARENT_LOCATOR_NUM 8
+#define PARENT_PREFIX_LEN 7 /* such as file:// */
+#define TBBATMAP_HEAD_SIZE 28
+#define MACX 0x5863614d /* big endian */
 
 //#define CACHE
 
 enum vhd_type {
 VHD_FIXED   = 2,
 VHD_DYNAMIC = 3,
-VHD_DIFFERENCING= 4,
+VHD_DIFF= 4,
 };
 
 // Seconds since Jan 1, 2000 0:00:00 (UTC)
@@ -138,6 +143,15 @@ typedef struct BDRVVPCState {
 Error *migration_blocker;
 } BDRVVPCState;
 
+typedef struct vhd_tdbatmap_header {
+char magic[8]; /* always "tdbatmap" */
+
+uint64_t batmap_offset;
+uint32_t batmap_size;
+uint32_t batmap_version;
+uint32_t checksum;
+} QEMU_PACKED VHDTdBatmapHeader;
+
 static uint32_t vpc_checksum(uint8_t* buf, size_t size)
 {
 uint32_t res = 0;
@@ -153,7 +167,7 @@ static uint32_t vpc_checksum(uint8_t* buf, size_t size)
 static int vpc_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
 if (buf_size >= 8 && !strncmp((char *)buf, "conectix", 8))
-   return 100;
+return 100;
 return 0;
 }
 
@@ -164,11 +178,17 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
 int i;
 VHDFooter *footer;
 VHDDynDiskHeader *dyndisk_header;
-uint8_t buf[HEADER_SIZE];
+uint8_t buf[DYNAMIC_HEADER_SIZE];
+uint8_t tdbatmap_header_buf[TBBATMAP_HEAD_SIZE];
 uint32_t checksum;
 uint64_t computed_size;
-int disk_type = VHD_DYNAMIC;
+uint32_t disk_type;
 int ret;
+VHDTdBatmapHeader *tdbatmap_header;
+int parent_locator_offset = 0;
+int64_t data_offset = 0;
+int data_length = 0;
+uint32_t platform;
 
 ret = bdrv_pread(bs->file, 0, s->footer_buf, HEADER_SIZE);
 if (ret < 0) {
@@ -176,6 +196,8 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 
 footer = (VHDFooter *) s->footer_buf;
+disk_type = be32_to_cpu(footer->type);
+
 if (strncmp(footer->creator, "conectix", 8)) {
 int64_t offset = bdrv_getlength(bs->file);
 if (offset < 0) {
@@ -230,9 +252,9 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
 goto fail;
 }
 
-if (disk_type == VHD_DYNAMIC) {
+if (disk_type == VHD_DYNAMIC || disk_type == VHD_DIFF) {
 ret = bdrv_pread(bs->file, be64_to_cpu(footer->data_offset), buf,
- HEADER_SIZE);
+ DYNAMIC_HEADER_SIZE);
 if (ret < 0) {
 goto fail;
 }
@@ -286,6 +308,56 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
 s->free_data_block_offset =
 (s->bat_offset + (s->max_table_entries * 4) + 511) & ~511;
 
+/* Read tdbatmap header by offset */
+ret = bdrv_pread(bs->file, s->free_data_block_offset,
+tdbatmap_header_buf, TBBATMAP_HEAD_SIZE);
+if (ret < 0) {
+goto fail;
+}
+
+tdbatmap_header = (VHDTdBatmapHeader *) tdbatmap_header_buf;
+if (!strncmp(tdbatmap_header->magic, "tdbatmap", 8)) {
+s->free_data_block_offset =
+be32_to_cpu(tdbatmap_header->batmap_size) * 512
++ be64_to_cpu(tdbatmap_header->batmap_offset);
+}
+
+/* Read backing file location from dyn header table */
+if (dyndisk_header->parent_name[0] || dyndisk_header->parent_name[1]) {
+for (i = 0; i < PARENT_LOCATOR_NUM; i++) {
+data_offset =
+be64_to_cpu(dyndisk_header->parent_locator[i].data_offset);
+data_length =
+be32_to_cpu(dyndisk_header->parent_locator[i].data_length);
+platform = dyndisk_header->parent_locator[i].platform;
+
+if (MACX == platform) {
+if (data_offset + PARENT_PREFIX_LEN >
+s->max_table_entries * s->block_size) {
+goto fail;
+}
+if (data_length - PARENT_PREFIX_LEN > 1024) {
+goto fail;
+}
+ret = bdrv_pread(bs->file, data_offset + PARENT_PREFIX_LEN,
+bs->backing_file, data_length - PARENT_PREFIX_LEN);
+if (ret < 0) {

Re: [Qemu-devel] [PATCH v2 4/5] qcow2: Check L1/L2/reftable entries for alignment

2014-09-08 Thread Benoît Canet

The Friday 05 Sep 2014 à 16:07:18 (+0200), Max Reitz wrote :
> Offsets taken from the L1, L2 and refcount tables are generally assumed
> to be correctly aligned. However, this cannot be guaranteed if the image
> has been written to by something different than qemu, thus check all
> offsets taken from these tables for correct cluster alignment.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/qcow2-cluster.c  | 43 ---
>  block/qcow2-refcount.c | 44 ++--
>  2 files changed, 82 insertions(+), 5 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 735f687..f7dd8c0 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -486,6 +486,13 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
> uint64_t offset,
>  goto out;
>  }
>  
> +if (offset_into_cluster(s, l2_offset)) {
> +qcow2_signal_corruption(bs, true, -1, -1, "L2 table offset %#" PRIx64
> +" unaligned (L1 index: %#" PRIx64 ")",
> +l2_offset, l1_index);
> +return -EIO;

This function mix return ret and goto out and there is more of the second.
Can we do ret = -EIO and goto out for consistency ?
bs->drv == NULL after qcow2_signal_corruption so we are not afraid of out
sides effects.

> +}
> +
>  /* load the l2 table in memory */
>  
>  ret = l2_load(bs, l2_offset, &l2_table);
> @@ -508,8 +515,11 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
> uint64_t offset,
>  break;
>  case QCOW2_CLUSTER_ZERO:
>  if (s->qcow_version < 3) {
> -qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> -return -EIO;
> +qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry 
> found"
> +" in pre-v3 image (L2 offset: %#" PRIx64
> +", L2 index: %#x)", l2_offset, l2_index);
> +ret = -EIO;
> +goto fail;
>  }
>  c = count_contiguous_clusters(nb_clusters, s->cluster_size,
>  &l2_table[l2_index], QCOW_OFLAG_ZERO);
> @@ -525,6 +535,14 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
> uint64_t offset,
>  c = count_contiguous_clusters(nb_clusters, s->cluster_size,
>  &l2_table[l2_index], QCOW_OFLAG_ZERO);
>  *cluster_offset &= L2E_OFFSET_MASK;
> +if (offset_into_cluster(s, *cluster_offset)) {
> +qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset 
> %#"
> +PRIx64 " unaligned (L2 offset: %#" PRIx64
> +", L2 index: %#x)", *cluster_offset,
> +l2_offset, l2_index);
> +ret = -EIO;
> +goto fail;
> +}
>  break;
>  default:
>  abort();
> @@ -541,6 +559,10 @@ out:
>  *num = nb_available - index_in_cluster;
>  
>  return ret;
> +
> +fail:
> +qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
> +return ret;
>  }
>  
>  /*
> @@ -576,6 +598,12 @@ static int get_cluster_table(BlockDriverState *bs, 
> uint64_t offset,
>  
>  assert(l1_index < s->l1_size);
>  l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
> +if (offset_into_cluster(s, l2_offset)) {
> +qcow2_signal_corruption(bs, true, -1, -1, "L2 table offset %#" PRIx64
> +" unaligned (L1 index: %#" PRIx64 ")",
> +l2_offset, l1_index);
> +return -EIO;
> +}
>  
>  /* seek the l2 table of the given l2 offset */
>  
> @@ -948,6 +976,15 @@ static int handle_copied(BlockDriverState *bs, uint64_t 
> guest_offset,
>  bool offset_matches =
>  (cluster_offset & L2E_OFFSET_MASK) == *host_offset;
>  
> +if (offset_into_cluster(s, cluster_offset & L2E_OFFSET_MASK)) {
> +qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset "
> +"%#llx unaligned (guest offset: %#" 
> PRIx64
> +")", cluster_offset & L2E_OFFSET_MASK,
> +guest_offset);
> +ret = -EIO;
> +goto out;
> +}
> +
>  if (*host_offset != 0 && !offset_matches) {
>  *bytes = 0;
>  ret = 0;
> @@ -979,7 +1016,7 @@ out:
>  
>  /* Only return a host offset if we actually made progress. Otherwise we
>   * would make requirements for handle_alloc() that it can't fulfill */
> -if (ret) {
> +if (ret > 0) {
>  *host_offset = (cluster_offset & L2E_OFFSET_MASK)
>   + offset_into_cluster(s, guest_offset);
>  }
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index b9d421e..2bcaaf9 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -108,6

[Qemu-devel] [PATCH] Fix improper usage of cpu_to_be32 in vpc

2014-09-08 Thread Xiaodong Gong

From: Xiaodong Gong 

cpu_to_be32() is wrong since vhd_type is an enum constant
(just a regular CPU-endian integer).

Signed-off-by: Xiaodong Gong 
---
 block/vpc.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 055efc4..c024b4c 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -489,7 +489,7 @@ static int vpc_get_info(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 BDRVVPCState *s = (BDRVVPCState *)bs->opaque;
 VHDFooter *footer = (VHDFooter *) s->footer_buf;
 
-if (cpu_to_be32(footer->type) != VHD_FIXED) {
+if (be32_to_cpu(footer->type) != VHD_FIXED) {
 bdi->cluster_size = s->block_size;
 }
 
@@ -506,7 +506,7 @@ static int vpc_read(BlockDriverState *bs, int64_t 
sector_num,
 int64_t sectors, sectors_per_block;
 VHDFooter *footer = (VHDFooter *) s->footer_buf;
 
-if (cpu_to_be32(footer->type) == VHD_FIXED) {
+if (be32_to_cpu(footer->type) == VHD_FIXED) {
 return bdrv_read(bs->file, sector_num, buf, nb_sectors);
 }
 while (nb_sectors > 0) {
@@ -555,7 +555,7 @@ static int vpc_write(BlockDriverState *bs, int64_t 
sector_num,
 int ret;
 VHDFooter *footer =  (VHDFooter *) s->footer_buf;
 
-if (cpu_to_be32(footer->type) == VHD_FIXED) {
+if (be32_to_cpu(footer->type) == VHD_FIXED) {
 return bdrv_write(bs->file, sector_num, buf, nb_sectors);
 }
 while (nb_sectors > 0) {
@@ -857,7 +857,7 @@ static int vpc_has_zero_init(BlockDriverState *bs)
 BDRVVPCState *s = bs->opaque;
 VHDFooter *footer =  (VHDFooter *) s->footer_buf;
 
-if (cpu_to_be32(footer->type) == VHD_FIXED) {
+if (be32_to_cpu(footer->type) == VHD_FIXED) {
 return bdrv_has_zero_init(bs->file);
 } else {
 return 1;
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-08 Thread Peter Lieven


On 08.09.2014 15:58, Paolo Bonzini wrote:

Il 08/09/2014 15:56, Peter Lieven ha scritto:

Look like you are changing the coroutine version.

Some hardware like virtio-blk uses the AIO version of read and writes.
What would happen if all the block drivers down the chain are AIO
enabled ?

The AIO version still goes through bdrv_co_do_readv/writev.

However, error_report is not something you can use for guest-accessible
error messages, unless you want your logs to fill up with error
messages. :)

So you would not throw an error msg here?

No, though a trace could be useful.


Is there a howto somewhere how to implement that?

Then I would send a v2 if there are no other complaints.

Whats your opinion changed the max_xfer_len to 0x regardsless
of use_16_for_rw in iSCSI?

Peter

Re: [Qemu-devel] [PATCH v2 3/3] util: Add an utility infrastructure used to compute an average on a time slice

2014-09-08 Thread Paolo Bonzini

Il 08/09/2014 14:18, Benoît Canet ha scritto:
> The algorithm used was defined on the list while discussing the new IO 
> accounting
> overhaul.
> See http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04954.html
> 
> Also the module takes care of computing minimal and maximal values over the 
> time
> slice duration.
> 
> Signed-off-by: Benoît Canet 

If you add

int64_t cpu_get_clock(void)
{
return my_clock_value;
}

to the test, and use a QEMU_CLOCK_VIRTUAL-based average, you should be
able to advance the clock directly in the test with no need for sleep()
and with 100% deterministic results.

> 
> +/* Check if the ta->periods seconds time slice has expired
> + *
> + * If the slice has expired the counters will be reseted
> + *
> + * @ta: the timed average structure used
> + */
> +static void timed_average_check_expiration(TimedAverage *ta)
> +{
> +int64_t now = qemu_clock_get_ns(ta->clock_type);
> +
> +/* if we are still in the period slice do nothing */
> +if (now < ta->expiration) {
> +return;
> +}
> +
> +/* the slice has expired -> create a new slice */
> +ta->min = UINT64_MAX;
> +ta->sum = 0;
> +ta->max = 0;
> +ta->count = 0;
> +timed_average_set_expiration(ta);
> +}

This can produce very noisy results if you invoke min/avg/max at the
wrong time.  Some alternatives include:

- create two windows, with twice the suggested expiration period, and
return min/avg/max from the oldest window.  Example

   t=0  |t=1  |t=2  |t=3  |t=4
   wnd0: [0,1)  |wnd0: [1,3)  | |wnd0: [3,5)  |
   wnd1: [0,2)  | |wnd1: [2,4)  | |

Values are returned from:

   wnd0-|wnd1-|wnd0-|wnd1-|


- if you do not need min/max, you can use exponential smoothing, with a
weighted factor that depends on the time since the last sample.
http://www.drdobbs.com/tools/discontiguous-exponential-averaging/184410671
-- for example, giving 90% weight to the last second.  Of course the
exponential nature means that, in that case, 1-sqrt(10%)=68.3% weight is
given to the last half second, 21.6% weight is given to the previous
half second, and 10% to the entire previous history.  This cannot give
min/max, but can give avg/stdev.

Paolo

Re: [Qemu-devel] [PATCH v2 3/5] qcow2: Use qcow2_signal_corruption() for overlaps

2014-09-08 Thread Benoît Canet

The Friday 05 Sep 2014 à 16:07:17 (+0200), Max Reitz wrote :
> Use the new function in case of a failed overlap check.
> 
> This changes output in case of corruption, so adapt iotest 060's
> reference output accordingly.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/qcow2-refcount.c | 24 +++-
>  tests/qemu-iotests/060.out | 10 +-
>  2 files changed, 8 insertions(+), 26 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 0bd75d2..b9d421e 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -26,8 +26,6 @@
>  #include "block/block_int.h"
>  #include "block/qcow2.h"
>  #include "qemu/range.h"
> -#include "qapi/qmp/types.h"
> -#include "qapi-event.h"
>  
>  static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
>  static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
> @@ -1838,27 +1836,11 @@ int qcow2_pre_write_overlap_check(BlockDriverState 
> *bs, int ign, int64_t offset,
>  return ret;
>  } else if (ret > 0) {
>  int metadata_ol_bitnr = ffs(ret) - 1;
> -char *message;
> -
>  assert(metadata_ol_bitnr < QCOW2_OL_MAX_BITNR);
>  
> -fprintf(stderr, "qcow2: Preventing invalid write on metadata 
> (overlaps "
> -"with %s); image marked as corrupt.\n",
> -metadata_ol_names[metadata_ol_bitnr]);
> -message = g_strdup_printf("Prevented %s overwrite",
> -metadata_ol_names[metadata_ol_bitnr]);
> -qapi_event_send_block_image_corrupted(bdrv_get_device_name(bs),
> -  message,
> -  true,
> -  offset,
> -  true,
> -  size,
> -  true,
> -  &error_abort);
> -g_free(message);
> -
> -qcow2_mark_corrupt(bs);
> -bs->drv = NULL; /* make BDS unusable */
> +qcow2_signal_corruption(bs, true, offset, size, "Preventing invalid "
> +"write on metadata (overlaps with %s)",
> +metadata_ol_names[metadata_ol_bitnr]);
>  return -EIO;
>  }
>  
> diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
> index c27c952..30806da 100644
> --- a/tests/qemu-iotests/060.out
> +++ b/tests/qemu-iotests/060.out
> @@ -8,7 +8,7 @@ ERROR cluster 3 refcount=1 reference=3
>  1 errors were found on the image.
>  Data may be corrupted, or further writes to the image may corrupt it.
>  incompatible_features 0x0
> -qcow2: Preventing invalid write on metadata (overlaps with active L1 table); 
> image marked as corrupt.
> +qcow2: Marking image as corrupt: Preventing invalid write on metadata 
> (overlaps with active L1 table); further corruption events will be suppressed
>  write failed: Input/output error
>  incompatible_features 0x2
>  qemu-io: can't open device TEST_DIR/t.IMGFMT: IMGFMT: Image is corrupt; 
> cannot be opened read/write
> @@ -24,7 +24,7 @@ ERROR cluster 2 refcount=1 reference=2
>  2 errors were found on the image.
>  Data may be corrupted, or further writes to the image may corrupt it.
>  incompatible_features 0x0
> -qcow2: Preventing invalid write on metadata (overlaps with refcount block); 
> image marked as corrupt.
> +qcow2: Marking image as corrupt: Preventing invalid write on metadata 
> (overlaps with refcount block); further corruption events will be suppressed
>  write failed: Input/output error
>  incompatible_features 0x2
>  Repairing refcount block 0 refcount=2
> @@ -56,7 +56,7 @@ Data may be corrupted, or further writes to the image may 
> corrupt it.
>  1 leaked clusters were found on the image.
>  This means waste of disk space, but no harm to data.
>  incompatible_features 0x0
> -qcow2: Preventing invalid write on metadata (overlaps with inactive L2 
> table); image marked as corrupt.
> +qcow2: Marking image as corrupt: Preventing invalid write on metadata 
> (overlaps with inactive L2 table); further corruption events will be 
> suppressed
>  write failed: Input/output error
>  incompatible_features 0x2
>  Repairing cluster 4 refcount=1 reference=2
> @@ -88,7 +88,7 @@ wrote 65536/65536 bytes at offset 536870912
>  64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  discard 65536/65536 bytes at offset 0
>  64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -qcow2: Preventing invalid write on metadata (overlaps with active L2 table); 
> image marked as corrupt.
> +qcow2: Marking image as corrupt: Preventing invalid write on metadata 
> (overlaps with active L2 table); further corruption events will be suppressed
>  blkdebug: Suspended request '0'
>  write failed: Input/output error
>  blkdebug: Resuming request '0'
> @@ -99,6 +99,6 @@ aio_writ

Re: [Qemu-devel] [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-08 Thread Alex Williamson

On Mon, 2014-09-08 at 11:20 +0200, Paolo Bonzini wrote:
> Il 06/09/2014 01:19, Alexander Graf ha scritto:
> >> > 1) interpretive execution of pci load/store instruction. If we use this 
> >> > function
> >> >pci access does not get intercepted (no SIE exit) but is handled via 
> >> > microcode.
> >> >To enable this we have to disable zpci device and enable it again 
> >> > with information
> >> >from the SIE control block.
> > Hrm. So how about you create a special vm ioctl for KVM that allows you
> > to attach a VFIO device fd into the KVM VM context? Then the default
> > would stay "accessible by mmap traps", but we could accelerate it with KVM.
> 
> There is already KVM_DEV_VFIO_GROUP_ADD and KVM_DEV_VFIO_GROUP_DEL.
> 
> Right now, they result in a call to kvm_arch_register_noncoherent_dma or
> kvm_arch_unregister_noncoherent_dma, but you can add more hooks.

Eric Auger is also working on a patch series to do IRQ forward control
on ARM via the kvm-vfio pseudo device, extending the interface to
register VFIO device fds.  Sounds like that may be a good path to follow
here too.  Thanks,

Alex

Re: [Qemu-devel] [PATCH v2 2/5] qcow2: Add qcow2_signal_corruption()

2014-09-08 Thread Benoît Canet

The Friday 05 Sep 2014 à 16:07:16 (+0200), Max Reitz wrote :
> Add a helper function for easily marking an image corrupt (on fatal
> corruptions) while outputting an informative message to stderr and via
> QAPI.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/qcow2.c | 48 
>  block/qcow2.h |  5 +
>  2 files changed, 53 insertions(+)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index f9e045f..fc4217b 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -30,6 +30,8 @@
>  #include "qemu/error-report.h"
>  #include "qapi/qmp/qerror.h"
>  #include "qapi/qmp/qbool.h"
> +#include "qapi/qmp/types.h"
> +#include "qapi-event.h"
>  #include "trace.h"
>  #include "qemu/option_int.h"
>  
> @@ -2478,6 +2480,52 @@ static int qcow2_amend_options(BlockDriverState *bs, 
> QemuOpts *opts)
>  return 0;
>  }
>  
> +/*
> + * If offset or size are negative, respectively, they will not be included in
> + * the BLOCK_IMAGE_CORRUPTED event emitted.
> + * fatal will be ignored for read-only BDS; corruptions found there will 
> always
> + * be considered non-fatal.
> + */
> +void qcow2_signal_corruption(BlockDriverState *bs, bool fatal, int64_t 
> offset,
> + int64_t size, const char *message_format, ...)
> +{
> +BDRVQcowState *s = bs->opaque;
> +char *message;
> +va_list ap;
> +
> +fatal = fatal && !bs->read_only;
> +
> +if (s->signaled_corruption &&
> +(!fatal || (s->incompatible_features & QCOW2_INCOMPAT_CORRUPT)))
> +{
> +return;
> +}
> +
> +va_start(ap, message_format);
> +message = g_strdup_vprintf(message_format, ap);
> +va_end(ap);
> +
> +if (fatal) {
> +fprintf(stderr, "qcow2: Marking image as corrupt: %s; further "
> +"corruption events will be suppressed\n", message);
> +} else {
> +fprintf(stderr, "qcow2: Image is corrupt: %s; further non-fatal "
> +"corruption events will be suppressed\n", message);
> +}
> +
> +qapi_event_send_block_image_corrupted(bdrv_get_device_name(bs), message,
> +  offset >= 0, offset, size >= 0, 
> size,
> +  fatal, &error_abort);
> +g_free(message);
> +
> +if (fatal) {
> +qcow2_mark_corrupt(bs);
> +bs->drv = NULL; /* make BDS unusable */
> +}
> +
> +s->signaled_corruption = true;
> +}
> +
>  static QemuOptsList qcow2_create_opts = {
>  .name = "qcow2-create-opts",
>  .head = QTAILQ_HEAD_INITIALIZER(qcow2_create_opts.head),
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 6aeb7ea..7b7b6a6 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -261,6 +261,7 @@ typedef struct BDRVQcowState {
>  bool discard_passthrough[QCOW2_DISCARD_MAX];
>  
>  int overlap_check; /* bitmask of Qcow2MetadataOverlap values */
> +bool signaled_corruption;
>  
>  uint64_t incompatible_features;
>  uint64_t compatible_features;
> @@ -477,6 +478,10 @@ int qcow2_mark_corrupt(BlockDriverState *bs);
>  int qcow2_mark_consistent(BlockDriverState *bs);
>  int qcow2_update_header(BlockDriverState *bs);
>  
> +void qcow2_signal_corruption(BlockDriverState *bs, bool fatal, int64_t 
> offset,
> + int64_t size, const char *message_format, ...)
> + GCC_FMT_ATTR(5, 6);
> +
>  /* qcow2-refcount.c functions */
>  int qcow2_refcount_init(BlockDriverState *bs);
>  void qcow2_refcount_close(BlockDriverState *bs);
> -- 
> 2.1.0
> 
> 
Reviewed-by: Benoît Canet

Re: [Qemu-devel] [PULL v2 00/24] Block patches

2014-09-08 Thread Peter Maydell

On 8 September 2014 11:51, Stefan Hajnoczi  wrote:
> v2:
>  * Dropped Fam's dataplane virtio header patches that break win32 build 
> [Peter]
>
> The following changes since commit f102f224556f292f55b6e25147741bb8c48c9451:
>
>   Merge remote-tracking branch 'remotes/afaerber/tags/qom-cpu-for-peter' into 
> staging (2014-09-05 16:03:56 +0100)
>
> are available in the git repository at:
>
>
>   git://github.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to 01ce352e62c3f86df6f4ad32c3ab9353e55af799:
>
>   ide: Add resize callback to ide/core (2014-09-08 11:12:44 +0100)
>

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH v2 1/5] qapi/block: Add "fatal" to BLOCK_IMAGE_CORRUPTED

2014-09-08 Thread Benoît Canet

The Friday 05 Sep 2014 à 16:07:15 (+0200), Max Reitz wrote :
> Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
> reading from an image, they should generally not be. Nonetheless, even
> an image only read from may of course be corrupted and this can be
> detected during normal operation. In this case, a non-fatal event should
> be emitted, but the image should not be marked corrupt (in accordance to
> "fatal" set to false).
> 
> Signed-off-by: Max Reitz 
> ---
>  block/qcow2-refcount.c | 1 +
>  qapi/block-core.json   | 9 +++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 43665b8..0bd75d2 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1853,6 +1853,7 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, 
> int ign, int64_t offset,
>offset,
>true,
>size,

> +  true,

What is this line ?

>&error_abort);
>  g_free(message);
>  
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index a685d02..d23bcc2 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1554,7 +1554,7 @@
>  ##
>  # @BLOCK_IMAGE_CORRUPTED
>  #
> -# Emitted when a disk image is being marked corrupt
> +# Emitted when a corruption has been detected in a disk image
>  #
>  # @device: device name
>  #
> @@ -1568,13 +1568,18 @@
>  # @size: #optional, if the corruption resulted from an image access, this is
>  #the access size
>  #
> +# fatal: if set, the image is marked corrupt and therefore unusable after 
> this
> +#event and must be repaired (Since 2.2; before, every
> +#BLOCK_IMAGE_CORRUPTED event was fatal)
> +#
>  # Since: 1.7
>  ##
>  { 'event': 'BLOCK_IMAGE_CORRUPTED',
>'data': { 'device' : 'str',
>  'msg': 'str',
>  '*offset': 'int',
> -'*size'  : 'int' } }
> +'*size'  : 'int',
> +'fatal'  : 'bool' } }
>  
>  ##
>  # @BLOCK_IO_ERROR
> -- 
> 2.1.0
> 
>

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-08 Thread Paolo Bonzini

Il 08/09/2014 15:56, Peter Lieven ha scritto:
>
>>> Look like you are changing the coroutine version.
>>>
>>> Some hardware like virtio-blk uses the AIO version of read and writes.
>>> What would happen if all the block drivers down the chain are AIO
>>> enabled ?
>> The AIO version still goes through bdrv_co_do_readv/writev.
>>
>> However, error_report is not something you can use for guest-accessible
>> error messages, unless you want your logs to fill up with error
>> messages. :)
> 
> So you would not throw an error msg here?

No, though a trace could be useful.

Paolo

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-08 Thread Peter Lieven


On 08.09.2014 15:49, Paolo Bonzini wrote:

Il 08/09/2014 15:44, Benoît Canet ha scritto:

+if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) 
{
+error_report("read of %d sectors at sector %ld exceeds device max"
+ " transfer length of %d sectors.", nb_sectors, sector_num,
+ bs->bl.max_transfer_length);
+return -EINVAL;
+}
+
  return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
   nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
  }
@@ -3507,6 +3514,13 @@ static int coroutine_fn 
bdrv_co_do_writev(BlockDriverState *bs,
  return -EINVAL;
  }
  
+if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) {

+error_report("write of %d sectors at sector %ld exceeds device max"
+ " transfer length of %d sectors.", nb_sectors, sector_num,
+ bs->bl.max_transfer_length);
+return -EINVAL;
+}
+
  return bdrv_co_do_pwritev(bs, sector_num << BDRV_SECTOR_BITS,
nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
  }
--
1.7.9.5



Look like you are changing the coroutine version.

Some hardware like virtio-blk uses the AIO version of read and writes.
What would happen if all the block drivers down the chain are AIO enabled ?

The AIO version still goes through bdrv_co_do_readv/writev.

However, error_report is not something you can use for guest-accessible
error messages, unless you want your logs to fill up with error messages. :)


So you would not throw an error msg here?

Peter

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-08 Thread Paolo Bonzini

Il 08/09/2014 15:44, Benoît Canet ha scritto:
>> > +if (bs->bl.max_transfer_length && nb_sectors > 
>> > bs->bl.max_transfer_length) {
>> > +error_report("read of %d sectors at sector %ld exceeds device max"
>> > + " transfer length of %d sectors.", nb_sectors, 
>> > sector_num,
>> > + bs->bl.max_transfer_length);
>> > +return -EINVAL;
>> > +}
>> > +
>> >  return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>> >   nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
>> >  }
>> > @@ -3507,6 +3514,13 @@ static int coroutine_fn 
>> > bdrv_co_do_writev(BlockDriverState *bs,
>> >  return -EINVAL;
>> >  }
>> >  
>> > +if (bs->bl.max_transfer_length && nb_sectors > 
>> > bs->bl.max_transfer_length) {
>> > +error_report("write of %d sectors at sector %ld exceeds device 
>> > max"
>> > + " transfer length of %d sectors.", nb_sectors, 
>> > sector_num,
>> > + bs->bl.max_transfer_length);
>> > +return -EINVAL;
>> > +}
>> > +
>> >  return bdrv_co_do_pwritev(bs, sector_num << BDRV_SECTOR_BITS,
>> >nb_sectors << BDRV_SECTOR_BITS, qiov, 
>> > flags);
>> >  }
>> > -- 
>> > 1.7.9.5
>> > 
>> > 
> Look like you are changing the coroutine version.
> 
> Some hardware like virtio-blk uses the AIO version of read and writes.
> What would happen if all the block drivers down the chain are AIO enabled ?

The AIO version still goes through bdrv_co_do_readv/writev.

However, error_report is not something you can use for guest-accessible
error messages, unless you want your logs to fill up with error messages. :)

Paolo

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-08 Thread Benoît Canet

The Friday 05 Sep 2014 à 18:51:26 (+0200), Peter Lieven wrote :
> Signed-off-by: Peter Lieven 
> ---
>  block.c |   14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 2c4a5de..fa4c34b 100644
> --- a/block.c
> +++ b/block.c
> @@ -3215,6 +3215,13 @@ static int coroutine_fn 
> bdrv_co_do_readv(BlockDriverState *bs,
>  return -EINVAL;
>  }
>  
> +if (bs->bl.max_transfer_length && nb_sectors > 
> bs->bl.max_transfer_length) {
> +error_report("read of %d sectors at sector %ld exceeds device max"
> + " transfer length of %d sectors.", nb_sectors, 
> sector_num,
> + bs->bl.max_transfer_length);
> +return -EINVAL;
> +}
> +
>  return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>   nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
>  }
> @@ -3507,6 +3514,13 @@ static int coroutine_fn 
> bdrv_co_do_writev(BlockDriverState *bs,
>  return -EINVAL;
>  }
>  
> +if (bs->bl.max_transfer_length && nb_sectors > 
> bs->bl.max_transfer_length) {
> +error_report("write of %d sectors at sector %ld exceeds device max"
> + " transfer length of %d sectors.", nb_sectors, 
> sector_num,
> + bs->bl.max_transfer_length);
> +return -EINVAL;
> +}
> +
>  return bdrv_co_do_pwritev(bs, sector_num << BDRV_SECTOR_BITS,
>nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
>  }
> -- 
> 1.7.9.5
> 
> 

Look like you are changing the coroutine version.

Some hardware like virtio-blk uses the AIO version of read and writes.
What would happen if all the block drivers down the chain are AIO enabled ?

Best regards

Benoît

Re: [Qemu-devel] [PULL 01/12] pci: Use bus master address space for delivering MSI/MSI-X messages

2014-09-08 Thread Michael S. Tsirkin

On Thu, Aug 14, 2014 at 06:08:44PM +0200, Michael S. Tsirkin wrote:
> From: Jan Kiszka 
> 
> The spec says (and real HW confirms this) that, if the bus master bit
> is 0, the device will not generate any PCI accesses. MSI and MSI-X
> messages fall among these, so we should use the corresponding address
> space to deliver them. This will prevent delivery if bus master support
> is disabled.
> 
> Cc: qemu-sta...@nongnu.org

This is reported to break old guests which incorrectly don't enable
bus mastering.

Worth fixing (virtio needs a work-around) but please drop the patch from 
qemu-stable.

Thanks!

> Signed-off-by: Jan Kiszka 
> Reviewed-by: Michael S. Tsirkin 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  hw/pci/msi.c  | 2 +-
>  hw/pci/msix.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/pci/msi.c b/hw/pci/msi.c
> index a4a3040..52d2313 100644
> --- a/hw/pci/msi.c
> +++ b/hw/pci/msi.c
> @@ -291,7 +291,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
> "notify vector 0x%x"
> " address: 0x%"PRIx64" data: 0x%"PRIx32"\n",
> vector, msg.address, msg.data);
> -stl_le_phys(&address_space_memory, msg.address, msg.data);
> +stl_le_phys(&dev->bus_master_as, msg.address, msg.data);
>  }
>  
>  /* Normally called by pci_default_write_config(). */
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 5c49bfc..20ae476 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -439,7 +439,7 @@ void msix_notify(PCIDevice *dev, unsigned vector)
>  
>  msg = msix_get_message(dev, vector);
>  
> -stl_le_phys(&address_space_memory, msg.address, msg.data);
> +stl_le_phys(&dev->bus_master_as, msg.address, msg.data);
>  }
>  
>  void msix_reset(PCIDevice *dev)
> -- 
> MST
>

1 2 >

1 - 100 of 180 matches

Mail list logo