date:20120522

Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

2012-05-22 Thread Alexey Kardashevskiy

On 22/05/12 15:52, Alexander Graf wrote:
 
 
 On 22.05.2012, at 05:44, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 On 22/05/12 13:21, Alexander Graf wrote:


 On 22.05.2012, at 04:02, Benjamin Herrenschmidt b...@kernel.crashing.org 
 wrote:

 On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
 Alexander,

 Is that any better? :)

 Alex (Graf that is), ping ?

 The original patch from Alexey was fine btw.

 VFIO will always call things with the existing capability offset so
 there's no real risk of doing the wrong thing or break the list or
 anything.

 IE. A small simple patch that addresses the problem :-)

 The new patch is a bit more robust I believe, I don't think we need to
 go too far to fix a problem we don't have. But we need a fix for the
 real issue and the simple patch does it neatly from what I can
 understand.

 Cheers,
 Ben.


 @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
 * in pci config space */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
   uint8_t offset, uint8_t size)
 {
 -uint8_t *config;
 +uint8_t *config, existing;

 Existing is a pointer to the target dev's config space, right?

 Yes.

int i, overlapping_cap;

 +existing = pci_find_capability(pdev, cap_id);
 +if (existing) {
 +if (offset  (existing != offset)) {
 +return -EEXIST;
 +}
 +for (i = existing; i  size; ++i) {

 So how does this possibly make sense?

 Although I do not expect VFIO to add capabilities (does not make sense), I 
 still want to double
 check that this space has not been tried to use by someone else.
 
 i is an int. existing is a uint8_t*.


It was there before me. This function already does a loop and this is how it 
was coded at the first place.


 +if (pdev-used[i]) {
 +return -EFAULT;
 +}
 +}
 +memset(pdev-used + offset, 0xFF, size);
 Why?

 Because I am marking the space this capability takes as used.
 
 But if it already existed (at the same offset), it should be set used 
 already, no? Unless size  existing size, in which case you might overwrite 
 data in the next chunk, no?


No, it does not exist for VFIO - VFIO read the config space from the host 
kernel first and then calls msi_init or msix_init or whatever_else_init 
depending on what it got from the host kernel. And these xxx_init() functions 
eventually call pci_add_capability().

Sure we can either implement own msi_init/msix_init (and may be others in the 
future) for VFIO (which would do all the same as other QEMU devices except 
touching the capabilities)  OR  hack msi_init/msix_init not to touch 
capabilities if they exist.



 +/* Make capability read-only by default */
 +memset(pdev-wmask + offset, 0, size);
 Why?

 Because the pci_add_capability() does it for a new capability by default.
 
 Hrm. So you're copying code? Can't you merge the overwrite and write cases?

I am trying to make it as a single chunk which is as small as possible.


If it helps, below is the same patch with extended context to see what is going 
on in that function.






hw/pci.c |   20 +++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 63a8219..7008a42 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool 
is_default_rom)
 ptr = memory_region_get_ram_ptr(pdev-rom);
 load_image(path, ptr);
 g_free(path);
 
 if (is_default_rom) {
 /* Only the default rom images will be patched (if needed). */
 pci_patch_ids(pdev, ptr, size);
 }
 
 qemu_put_ram_ptr(ptr);
 
 pci_register_bar(pdev, PCI_ROM_SLOT, 0, pdev-rom);
 
 return 0;
 }
 
 static void pci_del_option_rom(PCIDevice *pdev)
 {
 if (!pdev-has_rom)
 return;
 
 vmstate_unregister_ram(pdev-rom, pdev-qdev);
 memory_region_destroy(pdev-rom);
 pdev-has_rom = false;
 }
 
 /*
  * if !offset
  * Reserve space and add capability to the linked list in pci config space
  *
  * if offset = 0,
  * Find and reserve space and add capability to the linked list
  * in pci config space */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
uint8_t offset, uint8_t size)
 {
-uint8_t *config;
+uint8_t *config, existing;
 int i, overlapping_cap;
 
+existing = pci_find_capability(pdev, cap_id);
+if (existing) {
+if (offset  (existing != offset)) {
+return -EEXIST;
+}
+for (i = existing; i  size; ++i) {
+if (pdev-used[i]) {
+return -EFAULT;
+}
+}
+memset(pdev-used + offset, 0xFF, size);
+/* Make capability read-only by default */
+memset(pdev-wmask + offset, 0, size);
+/* Check capability by default */
+memset(pdev-cmask + offset, 0xFF, size);
+return existing;
+}
+
 if

Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

2012-05-22 Thread Alexander Graf



On 22.05.2012, at 08:11, Alexey Kardashevskiy a...@ozlabs.ru wrote:

 On 22/05/12 15:52, Alexander Graf wrote:
 
 
 On 22.05.2012, at 05:44, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 On 22/05/12 13:21, Alexander Graf wrote:
 
 
 On 22.05.2012, at 04:02, Benjamin Herrenschmidt b...@kernel.crashing.org 
 wrote:
 
 On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
 Alexander,
 
 Is that any better? :)
 
 Alex (Graf that is), ping ?
 
 The original patch from Alexey was fine btw.
 
 VFIO will always call things with the existing capability offset so
 there's no real risk of doing the wrong thing or break the list or
 anything.
 
 IE. A small simple patch that addresses the problem :-)
 
 The new patch is a bit more robust I believe, I don't think we need to
 go too far to fix a problem we don't have. But we need a fix for the
 real issue and the simple patch does it neatly from what I can
 understand.
 
 Cheers,
 Ben.
 
 
 @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
 * in pci config space */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
  uint8_t offset, uint8_t size)
 {
 -uint8_t *config;
 +uint8_t *config, existing;
 
 Existing is a pointer to the target dev's config space, right?
 
 Yes.
 
   int i, overlapping_cap;
 
 +existing = pci_find_capability(pdev, cap_id);
 +if (existing) {
 +if (offset  (existing != offset)) {
 +return -EEXIST;
 +}
 +for (i = existing; i  size; ++i) {
 
 So how does this possibly make sense?
 
 Although I do not expect VFIO to add capabilities (does not make sense), I 
 still want to double
 check that this space has not been tried to use by someone else.
 
 i is an int. existing is a uint8_t*.
 
 
 It was there before me. This function already does a loop and this is how it 
 was coded at the first place.

Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax...

 
 
 +if (pdev-used[i]) {
 +return -EFAULT;
 +}
 +}
 +memset(pdev-used + offset, 0xFF, size);
 Why?
 
 Because I am marking the space this capability takes as used.
 
 But if it already existed (at the same offset), it should be set used 
 already, no? Unless size  existing size, in which case you might overwrite 
 data in the next chunk, no?
 
 
 No, it does not exist for VFIO - VFIO read the config space from the host 
 kernel first and then calls msi_init or msix_init or whatever_else_init 
 depending on what it got from the host kernel. And these xxx_init() functions 
 eventually call pci_add_capability().

So why would the function that populates the config space initially not set the 
used flag correctly?

 
 Sure we can either implement own msi_init/msix_init (and may be others in the 
 future) for VFIO (which would do all the same as other QEMU devices except 
 touching the capabilities)  OR  hack msi_init/msix_init not to touch 
 capabilities if they exist.

No, calling the internal functions sounds fine to me. It's the step before that 
irritates me. VFIO shouldn't differ too much from an emulated device wrt its 
config space population really.

 
 
 
 +/* Make capability read-only by default */
 +memset(pdev-wmask + offset, 0, size);
 Why?
 
 Because the pci_add_capability() does it for a new capability by default.
 
 Hrm. So you're copying code? Can't you merge the overwrite and write cases?
 
 I am trying to make it as a single chunk which is as small as possible.

No, you're needlessly duplicating code which is a bad idea :). Please reuse as 
much of the existing function as possible, unless it really doesn't make sense.

 
 
 If it helps, below is the same patch with extended context to see what is 
 going on in that function.
 
 
 
 
 
 
 hw/pci.c |   20 +++-
 1 files changed, 19 insertions(+), 1 deletions(-)
 
 diff --git a/hw/pci.c b/hw/pci.c
 index 63a8219..7008a42 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool 
 is_default_rom)
 ptr = memory_region_get_ram_ptr(pdev-rom);
 load_image(path, ptr);
 g_free(path);
 
 if (is_default_rom) {
 /* Only the default rom images will be patched (if needed). */
 pci_patch_ids(pdev, ptr, size);
 }
 
 qemu_put_ram_ptr(ptr);
 
 pci_register_bar(pdev, PCI_ROM_SLOT, 0, pdev-rom);
 
 return 0;
 }
 
 static void pci_del_option_rom(PCIDevice *pdev)
 {
 if (!pdev-has_rom)
 return;
 
 vmstate_unregister_ram(pdev-rom, pdev-qdev);
 memory_region_destroy(pdev-rom);
 pdev-has_rom = false;
 }
 
 /*
  * if !offset
  * Reserve space and add capability to the linked list in pci config space
  *
  * if offset = 0,
  * Find and reserve space and add capability to the linked list
  * in pci config space */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
uint8_t offset, uint8_t size)
 {
 -

[Qemu-devel] [PATCH v2 10/13] pci: export pci_user functions for use by other drivers

2012-05-22 Thread Alex Williamson

VFIO PCI support will make use of these for user initiated
PCI config accesses.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---

 drivers/pci/access.c |6 --
 drivers/pci/pci.h|7 ---
 include/linux/pci.h  |8 
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 2a58164..ba91a7e 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -162,7 +162,8 @@ int pci_user_read_config_##size 
\
if (ret  0)\
ret = -EINVAL;  \
return ret; \
-}
+}  \
+EXPORT_SYMBOL_GPL(pci_user_read_config_##size);
 
 /* Returns 0 on success, negative values indicate error. */
 #define PCI_USER_WRITE_CONFIG(size,type)   \
@@ -181,7 +182,8 @@ int pci_user_write_config_##size
\
if (ret  0)\
ret = -EINVAL;  \
return ret; \
-}
+}  \
+EXPORT_SYMBOL_GPL(pci_user_write_config_##size);
 
 PCI_USER_READ_CONFIG(byte, u8)
 PCI_USER_READ_CONFIG(word, u16)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index e494347..f2dcc46 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -86,13 +86,6 @@ static inline bool pci_is_bridge(struct pci_dev *pci_dev)
return !!(pci_dev-subordinate);
 }
 
-extern int pci_user_read_config_byte(struct pci_dev *dev, int where, u8 *val);
-extern int pci_user_read_config_word(struct pci_dev *dev, int where, u16 *val);
-extern int pci_user_read_config_dword(struct pci_dev *dev, int where, u32 
*val);
-extern int pci_user_write_config_byte(struct pci_dev *dev, int where, u8 val);
-extern int pci_user_write_config_word(struct pci_dev *dev, int where, u16 val);
-extern int pci_user_write_config_dword(struct pci_dev *dev, int where, u32 
val);
-
 struct pci_vpd_ops {
ssize_t (*read)(struct pci_dev *dev, loff_t pos, size_t count, void 
*buf);
ssize_t (*write)(struct pci_dev *dev, loff_t pos, size_t count, const 
void *buf);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2559735..0cf57d5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -770,6 +770,14 @@ static inline int pci_write_config_dword(const struct 
pci_dev *dev, int where,
return pci_bus_write_config_dword(dev-bus, dev-devfn, where, val);
 }
 
+/* user-space driven config access */
+int pci_user_read_config_byte(struct pci_dev *dev, int where, u8 *val);
+int pci_user_read_config_word(struct pci_dev *dev, int where, u16 *val);
+int pci_user_read_config_dword(struct pci_dev *dev, int where, u32 *val);
+int pci_user_write_config_byte(struct pci_dev *dev, int where, u8 val);
+int pci_user_write_config_word(struct pci_dev *dev, int where, u16 val);
+int pci_user_write_config_dword(struct pci_dev *dev, int where, u32 val);
+
 int __must_check pci_enable_device(struct pci_dev *dev);
 int __must_check pci_enable_device_io(struct pci_dev *dev);
 int __must_check pci_enable_device_mem(struct pci_dev *dev);

Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

2012-05-22 Thread Alexander Graf



On 22.05.2012, at 08:11, Alexey Kardashevskiy a...@ozlabs.ru wrote:

 On 22/05/12 15:52, Alexander Graf wrote:
 
 
 On 22.05.2012, at 05:44, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 On 22/05/12 13:21, Alexander Graf wrote:
 
 
 On 22.05.2012, at 04:02, Benjamin Herrenschmidt b...@kernel.crashing.org 
 wrote:
 
 On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
 Alexander,
 
 Is that any better? :)
 
 Alex (Graf that is), ping ?
 
 The original patch from Alexey was fine btw.
 
 VFIO will always call things with the existing capability offset so
 there's no real risk of doing the wrong thing or break the list or
 anything.
 
 IE. A small simple patch that addresses the problem :-)
 
 The new patch is a bit more robust I believe, I don't think we need to
 go too far to fix a problem we don't have. But we need a fix for the
 real issue and the simple patch does it neatly from what I can
 understand.
 
 Cheers,
 Ben.
 
 
 @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
 * in pci config space */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
  uint8_t offset, uint8_t size)
 {
 -uint8_t *config;
 +uint8_t *config, existing;
 
 Existing is a pointer to the target dev's config space, right?
 
 Yes.
 
   int i, overlapping_cap;
 
 +existing = pci_find_capability(pdev, cap_id);
 +if (existing) {
 +if (offset  (existing != offset)) {
 +return -EEXIST;
 +}
 +for (i = existing; i  size; ++i) {
 
 So how does this possibly make sense?
 
 Although I do not expect VFIO to add capabilities (does not make sense), I 
 still want to double
 check that this space has not been tried to use by someone else.
 
 i is an int. existing is a uint8_t*.
 
 
 It was there before me. This function already does a loop and this is how it 
 was coded at the first place.

Also, while at it, please add some comments at least for the code you add that 
explain why you do the things you do :).

Alex

[Qemu-devel] [PATCH v2 07/13] vfio: VFIO core

2012-05-22 Thread Alex Williamson

VFIO is a secure user level driver for use with both virtual machines
and user level drivers.  VFIO makes use of IOMMU groups to ensure the
isolation of devices in use, allowing unprivileged user access.  It's
intended that VFIO will replace KVM device assignment and UIO drivers
(in cases where the target platform includes a sufficiently capable
IOMMU).

New in this version of VFIO is support for IOMMU groups managed
through the IOMMU core as well as a rework of the API, removing the
group merge interface.  We now go back to a model more similar to
original VFIO with UIOMMU support where the file descriptor obtained
from /dev/vfio/vfio allows access to the IOMMU, but only after a
group is added, avoiding the previous privilege issues with this type
of model.  IOMMU support is also now fully modular as IOMMUs have
vastly different interface requirements on different platforms.  VFIO
users are able to query and initialize the IOMMU model of their
choice.

Please see the follow-on Documentation commit for further description
and usage example.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Documentation/ioctl/ioctl-number.txt |1 
 MAINTAINERS  |8 
 drivers/Kconfig  |2 
 drivers/Makefile |1 
 drivers/vfio/Kconfig |8 
 drivers/vfio/Makefile|1 
 drivers/vfio/vfio.c  | 1406 ++
 include/linux/vfio.h |  366 +
 8 files changed, 1793 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vfio/Kconfig
 create mode 100644 drivers/vfio/Makefile
 create mode 100644 drivers/vfio/vfio.c
 create mode 100644 include/linux/vfio.h

diff --git a/Documentation/ioctl/ioctl-number.txt 
b/Documentation/ioctl/ioctl-number.txt
index e34b531..111e30a 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -88,6 +88,7 @@ Code  Seq#(hex)   Include FileComments
and kernel/power/user.c
 '8'all SNP8023 advanced NIC card
mailto:m...@solidum.com
+';'64-6F   linux/vfio.h
 '@'00-0F   linux/radeonfb.hconflict!
 '@'00-0F   drivers/video/aty/aty128fb.cconflict!
 'A'00-1F   linux/apm_bios.hconflict!
diff --git a/MAINTAINERS b/MAINTAINERS
index b362709..5aca4ff 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7229,6 +7229,14 @@ S:   Maintained
 F: Documentation/filesystems/vfat.txt
 F: fs/fat/
 
+VFIO DRIVER
+M: Alex Williamson alex.william...@redhat.com
+L: k...@vger.kernel.org
+S: Maintained
+F: Documentation/vfio.txt
+F: drivers/vfio/
+F: include/linux/vfio.h
+
 VIDEOBUF2 FRAMEWORK
 M: Pawel Osciak pa...@osciak.com
 M: Marek Szyprowski m.szyprow...@samsung.com
diff --git a/drivers/Kconfig b/drivers/Kconfig
index d236aef..46eb115 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -112,6 +112,8 @@ source drivers/auxdisplay/Kconfig
 
 source drivers/uio/Kconfig
 
+source drivers/vfio/Kconfig
+
 source drivers/vlynq/Kconfig
 
 source drivers/virtio/Kconfig
diff --git a/drivers/Makefile b/drivers/Makefile
index 95952c8..fe1880a 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_ATM) += atm/
 obj-$(CONFIG_FUSION)   += message/
 obj-y  += firewire/
 obj-$(CONFIG_UIO)  += uio/
+obj-$(CONFIG_VFIO) += vfio/
 obj-y  += cdrom/
 obj-y  += auxdisplay/
 obj-$(CONFIG_PCCARD)   += pcmcia/
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
new file mode 100644
index 000..9acb1e7
--- /dev/null
+++ b/drivers/vfio/Kconfig
@@ -0,0 +1,8 @@
+menuconfig VFIO
+   tristate VFIO Non-Privileged userspace driver framework
+   depends on IOMMU_API
+   help
+ VFIO provides a framework for secure userspace device drivers.
+ See Documentation/vfio.txt for more details.
+
+ If you don't know what to do here, say N.
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
new file mode 100644
index 000..7500a67
--- /dev/null
+++ b/drivers/vfio/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_VFIO) += vfio.o
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
new file mode 100644
index 000..6558eef
--- /dev/null
+++ b/drivers/vfio/vfio.c
@@ -0,0 +1,1406 @@
+/*
+ * VFIO core
+ *
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson alex.william...@redhat.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, p...@cisco.com
+ */
+
+#include linux/cdev.h

Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

2012-05-22 Thread Alexey Kardashevskiy

On 22/05/12 16:31, Alexander Graf wrote:
 
 
 On 22.05.2012, at 08:11, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 On 22/05/12 15:52, Alexander Graf wrote:


 On 22.05.2012, at 05:44, Alexey Kardashevskiy a...@ozlabs.ru wrote:

 On 22/05/12 13:21, Alexander Graf wrote:


 On 22.05.2012, at 04:02, Benjamin Herrenschmidt 
 b...@kernel.crashing.org wrote:

 On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
 Alexander,

 Is that any better? :)

 Alex (Graf that is), ping ?

 The original patch from Alexey was fine btw.

 VFIO will always call things with the existing capability offset so
 there's no real risk of doing the wrong thing or break the list or
 anything.

 IE. A small simple patch that addresses the problem :-)

 The new patch is a bit more robust I believe, I don't think we need to
 go too far to fix a problem we don't have. But we need a fix for the
 real issue and the simple patch does it neatly from what I can
 understand.

 Cheers,
 Ben.


 @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
 * in pci config space */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
  uint8_t offset, uint8_t size)
 {
 -uint8_t *config;
 +uint8_t *config, existing;

 Existing is a pointer to the target dev's config space, right?

 Yes.

   int i, overlapping_cap;

 +existing = pci_find_capability(pdev, cap_id);
 +if (existing) {
 +if (offset  (existing != offset)) {
 +return -EEXIST;
 +}
 +for (i = existing; i  size; ++i) {

 So how does this possibly make sense?

 Although I do not expect VFIO to add capabilities (does not make sense), I 
 still want to double
 check that this space has not been tried to use by someone else.

 i is an int. existing is a uint8_t*.


 It was there before me. This function already does a loop and this is how it 
 was coded at the first place.
 
 Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax...


Well it is still does not make much sense to have int i rather than uint8_t 
i :)


 +if (pdev-used[i]) {
 +return -EFAULT;
 +}
 +}
 +memset(pdev-used + offset, 0xFF, size);
 Why?

 Because I am marking the space this capability takes as used.

 But if it already existed (at the same offset), it should be set used 
 already, no? Unless size  existing size, in which case you might overwrite 
 data in the next chunk, no?


 No, it does not exist for VFIO - VFIO read the config space from the host 
 kernel first and then calls msi_init or msix_init or whatever_else_init 
 depending on what it got from the host kernel. And these xxx_init() 
 functions eventually call pci_add_capability().
 So why would the function that populates the config space initially not set 
 the used flag correctly?


This is internal kitchen of PCIDevice which I do not want to touch from 
anywhere but pci.c. And
there is no fixup_capability or something.


 Sure we can either implement own msi_init/msix_init (and may be others in 
 the future) for VFIO (which would do all the same as other QEMU devices 
 except touching the capabilities)  OR  hack msi_init/msix_init not to touch 
 capabilities if they exist.
 No, calling the internal functions sounds fine to me. It's the step before 
 that irritates me. VFIO shouldn't differ too much from an emulated device wrt 
 its config space population really.


The last thing we want for a VFIO device is changing its capabilities list.


 +/* Make capability read-only by default */
 +memset(pdev-wmask + offset, 0, size);
 Why?

 Because the pci_add_capability() does it for a new capability by default.

 Hrm. So you're copying code? Can't you merge the overwrite and write cases?

 I am trying to make it as a single chunk which is as small as possible.
 
 No, you're needlessly duplicating code which is a bad idea :). Please reuse 
 as much of the existing function as possible, unless it really doesn't make 
 sense.

I actually duplicated 4 (four) lines and did it just once. This is too little 
to be called
duplicating :) And I get very special case visually separated and easy to 
remove if we find a
better solution later.

But - no problemo, I'll rework it.


[no further comments]



 If it helps, below is the same patch with extended context to see what is 
 going on in that function.






 hw/pci.c |   20 +++-
 1 files changed, 19 insertions(+), 1 deletions(-)

 diff --git a/hw/pci.c b/hw/pci.c
 index 63a8219..7008a42 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool 
 is_default_rom)
 ptr = memory_region_get_ram_ptr(pdev-rom);
 load_image(path, ptr);
 g_free(path);

 if (is_default_rom) {
 /* Only the default rom images will be patched (if needed). */
 pci_patch_ids(pdev, ptr, size);
 }

 qemu_put_ram_ptr(ptr);

 pci_register_bar(pdev, PCI_ROM_SLOT, 0,

[Qemu-devel] [PATCH v2 04/13] pci: Add PCI DMA source ID quirk

2012-05-22 Thread Alex Williamson

DMA transactions are tagged with the source ID of the device making
the request.  Occasionally hardware screws this up and uses the
source ID of a different device (often the wrong function number of
a multifunction device).  A specific Ricoh multifunction device is
a prime example of this problem and included in this patch.  The
purpose of this function is that given a pci_dev, return the pci_dev
to use as the source ID for DMA.  When hardware works correctly,
this returns the input device.  For the components of the Ricoh
multifunction device, return the pci_dev for function 0.

This will be used by IOMMU drivers for determining the boundaries
of IOMMU groups as multiple devices using the same source ID must
be contained within the same group.  This can also be used by
existing streaming DMA paths for the same purpose.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 drivers/pci/quirks.c |   40 
 include/linux/pci.h  |5 +
 2 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 4bf7102..a2dd77f 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3109,3 +3109,43 @@ int pci_dev_specific_reset(struct pci_dev *dev, int 
probe)
 
return -ENOTTY;
 }
+
+static struct pci_dev *pci_func_0_dma_source(struct pci_dev *dev)
+{
+   return pci_get_slot(dev-bus, PCI_DEVFN(PCI_SLOT(dev-devfn), 0));
+}
+
+static const struct pci_dev_dma_source {
+   u16 vendor;
+   u16 device;
+   struct pci_dev *(*dma_source)(struct pci_dev *dev);
+} pci_dev_dma_source[] = {
+   /*
+* https://bugzilla.redhat.com/show_bug.cgi?id=605888
+*
+* Some Ricoh devices use the function 0 source ID for DMA on
+* other functions of a multifunction device.  The DMA devices
+* is therefore function 0, which will have implications of the
+* iommu grouping of these devices.
+*/
+   { PCI_VENDOR_ID_RICOH, 0xe822, pci_func_0_dma_source },
+   { PCI_VENDOR_ID_RICOH, 0xe230, pci_func_0_dma_source },
+   { PCI_VENDOR_ID_RICOH, 0xe832, pci_func_0_dma_source },
+   { PCI_VENDOR_ID_RICOH, 0xe832, pci_func_0_dma_source },
+   { 0 }
+};
+
+struct pci_dev *pci_dma_source(struct pci_dev *dev)
+{
+   const struct pci_dev_dma_source *i;
+
+   for (i = pci_dev_dma_source; i-dma_source; i++) {
+   if ((i-vendor == dev-vendor ||
+i-vendor == (u16)PCI_ANY_ID) 
+   (i-device == dev-device ||
+i-device == (u16)PCI_ANY_ID))
+   return i-dma_source(dev);
+   }
+
+   return dev;
+}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e444f5b..02dbfed 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1479,9 +1479,14 @@ enum pci_fixup_pass {
 
 #ifdef CONFIG_PCI_QUIRKS
 void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev);
+struct pci_dev *pci_dma_source(struct pci_dev *dev);
 #else
 static inline void pci_fixup_device(enum pci_fixup_pass pass,
struct pci_dev *dev) {}
+static inline struct pci_dev *pci_dma_source(struct pci_dev *dev)
+{
+   return dev;
+}
 #endif
 
 void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen);

[Qemu-devel] [PATCH v2 09/13] vfio: x86 IOMMU implementation

2012-05-22 Thread Alex Williamson

x86 is probably the wrong name for this VFIO IOMMU driver, but x86
is the primary target for it.  This driver support a very simple
usage model using the existing IOMMU API.  The IOMMU is expected to
support the full host address space with no special IOVA windows,
number of mappings restrictions, or unique processor target options.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Documentation/ioctl/ioctl-number.txt |2 
 drivers/vfio/Kconfig |6 
 drivers/vfio/Makefile|2 
 drivers/vfio/vfio.c  |7 
 drivers/vfio/vfio_iommu_x86.c|  743 ++
 include/linux/vfio.h |   52 ++
 6 files changed, 811 insertions(+), 1 deletions(-)
 create mode 100644 drivers/vfio/vfio_iommu_x86.c

diff --git a/Documentation/ioctl/ioctl-number.txt 
b/Documentation/ioctl/ioctl-number.txt
index 111e30a..9d1694e 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -88,7 +88,7 @@ Code  Seq#(hex)   Include FileComments
and kernel/power/user.c
 '8'all SNP8023 advanced NIC card
mailto:m...@solidum.com
-';'64-6F   linux/vfio.h
+';'64-72   linux/vfio.h
 '@'00-0F   linux/radeonfb.hconflict!
 '@'00-0F   drivers/video/aty/aty128fb.cconflict!
 'A'00-1F   linux/apm_bios.hconflict!
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 9acb1e7..bd88a30 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -1,6 +1,12 @@
+config VFIO_IOMMU_X86
+   tristate
+   depends on VFIO  X86
+   default n
+
 menuconfig VFIO
tristate VFIO Non-Privileged userspace driver framework
depends on IOMMU_API
+   select VFIO_IOMMU_X86 if X86
help
  VFIO provides a framework for secure userspace device drivers.
  See Documentation/vfio.txt for more details.
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 7500a67..1f1abee 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1 +1,3 @@
 obj-$(CONFIG_VFIO) += vfio.o
+obj-$(CONFIG_VFIO_IOMMU_X86) += vfio_iommu_x86.o
+obj-$(CONFIG_VFIO_PCI) += pci/
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 6558eef..89899a8 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1369,6 +1369,13 @@ static int __init vfio_init(void)
 
pr_info(DRIVER_DESC  version:  DRIVER_VERSION \n);
 
+   /*
+* Attempt to load known iommu-drivers.  This gives us a working
+* environment without the user needing to explicitly load iommu
+* drivers.
+*/
+   request_module_nowait(vfio_iommu_x86);
+
return 0;
 
 err_groups_cdev:
diff --git a/drivers/vfio/vfio_iommu_x86.c b/drivers/vfio/vfio_iommu_x86.c
new file mode 100644
index 000..a52391d
--- /dev/null
+++ b/drivers/vfio/vfio_iommu_x86.c
@@ -0,0 +1,743 @@
+/*
+ * VFIO: IOMMU DMA mapping support for x86 (Intel VT-d  AMD-Vi)
+ *
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson alex.william...@redhat.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, p...@cisco.com
+ */
+
+#include linux/compat.h
+#include linux/device.h
+#include linux/fs.h
+#include linux/iommu.h
+#include linux/module.h
+#include linux/mm.h
+#include linux/pci.h /* pci_bus_type */
+#include linux/sched.h
+#include linux/slab.h
+#include linux/uaccess.h
+#include linux/vfio.h
+#include linux/workqueue.h
+
+#define DRIVER_VERSION  0.2
+#define DRIVER_AUTHOR   Alex Williamson alex.william...@redhat.com
+#define DRIVER_DESC x86 IOMMU driver for VFIO
+
+static bool allow_unsafe_interrupts;
+module_param_named(allow_unsafe_interrupts,
+  allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(allow_unsafe_interrupts,
+Enable VFIO IOMMU support for on platforms without interrupt 
remapping support.);
+
+struct vfio_iommu {
+   struct iommu_domain *domain;
+   struct mutexlock;
+   struct list_headdma_list;
+   struct list_headgroup_list;
+   boolcache;
+};
+
+struct vfio_dma {
+   struct list_headnext;
+   dma_addr_t  iova;   /* Device address */
+   unsigned long   vaddr;  /* Process virtual addr */
+   longnpage;  /* Number of pages */
+   int prot;   /* IOMMU_READ/WRITE */
+};
+
+struct vfio_group {
+   struct iommu_group  *iommu_group;
+   struct list_headnext;
+};
+
+/*
+ * This

Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

2012-05-22 Thread Alexander Graf


On 22.05.2012, at 09:01, Alexey Kardashevskiy wrote:

 On 22/05/12 16:31, Alexander Graf wrote:
 
 
 On 22.05.2012, at 08:11, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 On 22/05/12 15:52, Alexander Graf wrote:
 
 
 On 22.05.2012, at 05:44, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 On 22/05/12 13:21, Alexander Graf wrote:
 
 
 On 22.05.2012, at 04:02, Benjamin Herrenschmidt 
 b...@kernel.crashing.org wrote:
 
 On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
 Alexander,
 
 Is that any better? :)
 
 Alex (Graf that is), ping ?
 
 The original patch from Alexey was fine btw.
 
 VFIO will always call things with the existing capability offset so
 there's no real risk of doing the wrong thing or break the list or
 anything.
 
 IE. A small simple patch that addresses the problem :-)
 
 The new patch is a bit more robust I believe, I don't think we need to
 go too far to fix a problem we don't have. But we need a fix for the
 real issue and the simple patch does it neatly from what I can
 understand.
 
 Cheers,
 Ben.
 
 
 @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
 * in pci config space */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
 uint8_t offset, uint8_t size)
 {
 -uint8_t *config;
 +uint8_t *config, existing;
 
 Existing is a pointer to the target dev's config space, right?
 
 Yes.
 
  int i, overlapping_cap;
 
 +existing = pci_find_capability(pdev, cap_id);
 +if (existing) {
 +if (offset  (existing != offset)) {
 +return -EEXIST;
 +}
 +for (i = existing; i  size; ++i) {
 
 So how does this possibly make sense?
 
 Although I do not expect VFIO to add capabilities (does not make sense), 
 I still want to double
 check that this space has not been tried to use by someone else.
 
 i is an int. existing is a uint8_t*.
 
 
 It was there before me. This function already does a loop and this is how 
 it was coded at the first place.
 
 Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax...
 
 
 Well it is still does not make much sense to have int i rather than 
 uint8_t i :)
 
 
 +if (pdev-used[i]) {
 +return -EFAULT;
 +}
 +}
 +memset(pdev-used + offset, 0xFF, size);
 Why?
 
 Because I am marking the space this capability takes as used.
 
 But if it already existed (at the same offset), it should be set used 
 already, no? Unless size  existing size, in which case you might 
 overwrite data in the next chunk, no?
 
 
 No, it does not exist for VFIO - VFIO read the config space from the host 
 kernel first and then calls msi_init or msix_init or whatever_else_init 
 depending on what it got from the host kernel. And these xxx_init() 
 functions eventually call pci_add_capability().
 So why would the function that populates the config space initially not set 
 the used flag correctly?
 
 
 This is internal kitchen of PCIDevice which I do not want to touch from 
 anywhere but pci.c. And
 there is no fixup_capability or something.

Hrm. Maybe we should have one? :) Or instead of populating the config space 
with the exact data from the host device, loop through the host device 
capabilities and populate them using this function as we go. That should 
maintain the offsets, but ensure that all internal flags are set, no?

 
 
 Sure we can either implement own msi_init/msix_init (and may be others in 
 the future) for VFIO (which would do all the same as other QEMU devices 
 except touching the capabilities)  OR  hack msi_init/msix_init not to touch 
 capabilities if they exist.
 No, calling the internal functions sounds fine to me. It's the step before 
 that irritates me. VFIO shouldn't differ too much from an emulated device 
 wrt its config space population really.
 
 
 The last thing we want for a VFIO device is changing its capabilities list.

Well - we want it to look the same. The population should go through the same 
methods as emulated devices have to go through, no? :)

 
 
 +/* Make capability read-only by default */
 +memset(pdev-wmask + offset, 0, size);
 Why?
 
 Because the pci_add_capability() does it for a new capability by default.
 
 Hrm. So you're copying code? Can't you merge the overwrite and write cases?
 
 I am trying to make it as a single chunk which is as small as possible.
 
 No, you're needlessly duplicating code which is a bad idea :). Please reuse 
 as much of the existing function as possible, unless it really doesn't make 
 sense.
 
 I actually duplicated 4 (four) lines and did it just once. This is too little 
 to be called
 duplicating :) And I get very special case visually separated and easy to 
 remove if we find a
 better solution later.

Yeah, but the special case really shouldn't be all that special - that's why I 
was irritated.

 But - no problemo, I'll rework it.

Thanks!

 [no further comments]

What about mine below? ;)


Alex

 
 
 
 If it helps, below is the

Re: [Qemu-devel] [PATCH] Add a memory barrier to DMA functions

2012-05-22 Thread Benjamin Herrenschmidt

On Tue, 2012-05-22 at 14:34 +1000, Benjamin Herrenschmidt wrote:
 The emulated devices can run simultaneously with the guest, so
 we need to be careful with ordering of load and stores done by
 them to the guest system memory, which need to be observed in
 the right order by the guest operating system.
 
 This adds a barrier call to the basic DMA read/write ops which
 is currently implemented as a smp_mb(), but could be later
 improved for more fine grained control of barriers.
 
 Additionally, a _relaxed() variant of the accessors is provided
 to easily convert devices who would be performance sensitive
 and negatively impacted by the change.
 
 Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org
 ---

(Note to Rusty: The number I told you on ST is wrong, see below)

So I tried to do some performance measurements with that patch using
netperf on an x86 laptop (x220 with core i7).

It's a bit tricky. For example, if I just create a tap interface,
give it a local IP on the laptop and a different IP on the guest,
(ie talking to a netserver on the host basically from the guest
via tap), the performance is pretty poor and the numbers seem
useless with and without the barrier.

So I did tests involving talking to a server on our gigabit network
instead.

The baseline is the laptop without kvm talking to the server. The
TCP_STREAM test results are:

(The * at the beginning of the lines is something I added to
 distinguish multi-line results on some tests)

MIGRATED TCP STREAM TEST
Recv   SendSend  
Socket Socket  Message  Elapsed  
Size   SizeSize Time Throughput  
bytes  bytes   bytessecs.10^6bits/sec  

* 87380  16384  1638410.02 933.02   
* 87380  16384  1638410.03 908.64   
* 87380  16384  1638410.03 926.78   
* 87380  16384  1638410.03 919.73   

It's a bit noisy, ideally I should do a point-to-point setup to
an otherwise idle machine, here I'm getting some general lab network
noise but it gives us a pretty good baseline to begin with.

I have not managed to get any sensible result out of UDP_STREAM in
that configuration for some reason (ie just the host laptop), ie
they look insane :

MIGRATED UDP STREAM TEST
Socket  Message  Elapsed  Messages
SizeSize Time Okay Errors   Throughput
bytes   bytessecs#  #   10^6bits/sec

*229376   65507   10.00  270468  014173.84
 126976   10.00  44  2.31
*229376   65507   10.00  266526  013967.32
 126976   10.00  41  2.15

So we don't have a good comparison baseline but we can still compare
KVM against itself with and without the barrier.

Now KVM. This is x86_64 running an ubuntu precise guest (I had the
ISO laying around) and using the default setup which appears to be
an emulated e1000. I've done some tests with slirp just to see how
bad it was and it's bad enough to be irrelevant. The numbers have
thus been done using a tap interface bridged to the host ethernet
(who happens to also be some kind of e1000).

For each test I've done 3 series of numbers:

 - Without the barrier added
 - With the barrier added to dma_memory_rw
 - With the barrier added to dma_memory_rw -and- dma_memory_map

First TCP_STREAM. The numbers are a bit noisy, I suspect somebody
was hammering the server machine while I was doing one of the tests,
but here's what I got when it appeared to have stabilized:

MIGRATED TCP STREAM TEST
Recv   SendSend  
Socket Socket  Message  Elapsed  
Size   SizeSize Time Throughput  
bytes  bytes   bytessecs.10^6bits/sec  

no barrier
* 87380 16384   1638410.01880.31
* 87380 16384   1638410.01876.73
* 87380 16384   1638410.01880.73
* 87380 16384   1638410.01878.63
barrier
* 87380 16384   1638410.01869.39
* 87380 16384   1638410.01864.99
* 87380 16384   1638410.01886.13
* 87380 16384   1638410.01872.90
barrier + map
* 87380 16384   1638410.01867.45
* 87380 16384   1638410.01868.51
* 87380 16384   1638410.01888.94
* 87380 16384   1638410.01888.19

As far as I can tell, it's all in the noise. I was about to concede a
small (1 % ?) loss to the barrier until I ran the last 2 tests and
then I stopped caring :-)

With UDP_STREAM, we get something like that:
MIGRATED UDP STREAM TEST
Socket  Message  Elapsed  Messages
SizeSize Time Okay Errors   Throughput
bytes   bytessecs#  #   10^6bits/sec

no barrier
*229376   65507   10.005208 0  272.92
 126976   10.005208272.92
*229376   65507   10.005447 0  285.44
 126976   10.005447285.44
*229376   65507   10.005119 0  268.22
 126976   10.005119268.22
barrier

[Qemu-devel] [PATCH v2 08/13] vfio: Add documentation

2012-05-22 Thread Alex Williamson

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Documentation/vfio.txt |  315 
 1 files changed, 315 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/vfio.txt

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
new file mode 100644
index 000..1240874
--- /dev/null
+++ b/Documentation/vfio.txt
@@ -0,0 +1,315 @@
+VFIO - Virtual Function I/O[1]
+---
+Many modern system now provide DMA and interrupt remapping facilities
+to help ensure I/O devices behave within the boundaries they've been
+allotted.  This includes x86 hardware with AMD-Vi and Intel VT-d,
+POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC
+systems such as Freescale PAMU.  The VFIO driver is an IOMMU/device
+agnostic framework for exposing direct device access to userspace, in
+a secure, IOMMU protected environment.  In other words, this allows
+safe[2], non-privileged, userspace drivers.
+
+Why do we want that?  Virtual machines often make use of direct device
+access (device assignment) when configured for the highest possible
+I/O performance.  From a device and host perspective, this simply
+turns the VM into a userspace driver, with the benefits of
+significantly reduced latency, higher bandwidth, and direct use of
+bare-metal device drivers[3].
+
+Some applications, particularly in the high performance computing
+field, also benefit from low-overhead, direct device access from
+userspace.  Examples include network adapters (often non-TCP/IP based)
+and compute accelerators.  Prior to VFIO, these drivers had to either
+go through the full development cycle to become proper upstream
+driver, be maintained out of tree, or make use of the UIO framework,
+which has no notion of IOMMU protection, limited interrupt support,
+and requires root privileges to access things like PCI configuration
+space.
+
+The VFIO driver framework intends to unify these, replacing both the
+KVM PCI specific device assignment code as well as provide a more
+secure, more featureful userspace driver environment than UIO.
+
+Groups, Devices, and IOMMUs
+---
+
+Devices are the main target of any I/O driver.  Devices typically
+create a programming interface made up of I/O access, interrupts,
+and DMA.  Without going into the details of each of these, DMA is
+by far the most critical aspect for maintaining a secure environment
+as allowing a device read-write access to system memory imposes the
+greatest risk to the overall system integrity.
+
+To help mitigate this risk, many modern IOMMUs now incorporate
+isolation properties into what was, in many cases, an interface only
+meant for translation (ie. solving the addressing problems of devices
+with limited address spaces).  With this, devices can now be isolated
+from each other and from arbitrary memory access, thus allowing
+things like secure direct assignment of devices into virtual machines.
+
+This isolation is not always at the granularity of a single device
+though.  Even when an IOMMU is capable of this, properties of devices,
+interconnects, and IOMMU topologies can each reduce this isolation.
+For instance, an individual device may be part of a larger multi-
+function enclosure.  While the IOMMU may be able to distinguish
+between devices within the enclosure, the enclosure may not require
+transactions between devices to reach the IOMMU.  Examples of this
+could be anything from a multi-function PCI device with backdoors
+between functions to a non-PCI-ACS (Access Control Services) capable
+bridge allowing redirection without reaching the IOMMU.  Topology
+can also play a factor in terms of hiding devices.  A PCIe-to-PCI
+bridge masks the devices behind it, making transaction appear as if
+from the bridge itself.  Obviously IOMMU design plays a major factor
+as well.
+
+Therefore, while for the most part an IOMMU may have device level
+granularity, any system is susceptible to reduced granularity.  The
+IOMMU API therefore supports a notion of IOMMU groups.  A group is
+a set of devices which is isolatable from all other devices in the
+system.  Groups are therefore the unit of ownership used by VFIO.
+
+While the group is the minimum granularity that must be used to
+ensure secure user access, it's not necessarily the preferred
+granularity.  In IOMMUs which make use of page tables, it may be
+possible to share a set of page tables between different groups,
+reducing the overhead both to the platform (reduced TLB thrashing,
+reduced duplicate page tables), and to the user (programming only
+a single set of translations).  For this reason, VFIO makes use of
+a container class, which may hold one or more groups.  A container
+is created by simply opening the /dev/vfio/vfio character device.
+
+On its own, the container provides little

[Qemu-devel] [PATCH v2 03/13] iommu: IOMMU groups for VT-d and AMD-Vi

2012-05-22 Thread Alex Williamson

Add back group support for AMD  Intel.  amd_iommu already tracks
devices and has init and uninit routines to manage groups.
intel-iommu does this on the fly, so we make use of the notifier
support built into iommu groups to create and remove groups.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 drivers/iommu/amd_iommu.c   |   28 +-
 drivers/iommu/intel-iommu.c |   46 +++
 2 files changed, 73 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 32c00cd..b7e5ddf 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -256,9 +256,11 @@ static bool check_device(struct device *dev)
 
 static int iommu_init_device(struct device *dev)
 {
-   struct pci_dev *pdev = to_pci_dev(dev);
+   struct pci_dev *dma_pdev, *pdev = to_pci_dev(dev);
struct iommu_dev_data *dev_data;
+   struct iommu_group *group;
u16 alias;
+   int ret;
 
if (dev-archdata.iommu)
return 0;
@@ -279,8 +281,30 @@ static int iommu_init_device(struct device *dev)
return -ENOTSUPP;
}
dev_data-alias_data = alias_data;
+
+   dma_pdev = pci_get_bus_and_slot(alias  8, alias  0xff);
+   } else
+   dma_pdev = pdev;
+
+   if (!pdev-is_virtfn  PCI_FUNC(pdev-devfn)  iommu_group_mf 
+   pdev-hdr_type == PCI_HEADER_TYPE_NORMAL)
+   dma_pdev = pci_get_slot(pdev-bus,
+   PCI_DEVFN(PCI_SLOT(pdev-devfn), 0));
+
+   group = iommu_group_get(dma_pdev-dev);
+   if (!group) {
+   group = iommu_group_alloc();
+   if (IS_ERR(group))
+   return PTR_ERR(group);
}
 
+   ret = iommu_group_add_device(group, dev);
+
+   iommu_group_put(group);
+
+   if (ret)
+   return ret;
+
if (pci_iommuv2_capable(pdev)) {
struct amd_iommu *iommu;
 
@@ -309,6 +333,8 @@ static void iommu_ignore_device(struct device *dev)
 
 static void iommu_uninit_device(struct device *dev)
 {
+   iommu_group_remove_device(dev);
+
/*
 * Nothing to do here - we keep dev_data around for unplugged devices
 * and reuse it when the device is re-plugged - not doing so would
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d4a0ff7..e63b33b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4087,6 +4087,50 @@ static int intel_iommu_domain_has_cap(struct 
iommu_domain *domain,
return 0;
 }
 
+static int intel_iommu_add_device(struct device *dev)
+{
+   struct pci_dev *pdev = to_pci_dev(dev);
+   struct pci_dev *bridge, *dma_pdev = pdev;
+   struct iommu_group *group;
+   int ret;
+
+   if (!device_to_iommu(pci_domain_nr(pdev-bus),
+pdev-bus-number, pdev-devfn))
+   return -ENODEV;
+
+   bridge = pci_find_upstream_pcie_bridge(pdev);
+   if (bridge) {
+   if (pci_is_pcie(bridge))
+   dma_pdev = pci_get_domain_bus_and_slot(
+   pci_domain_nr(pdev-bus),
+   bridge-subordinate-number, 0);
+   else
+   dma_pdev = bridge;
+   }
+
+   if (!pdev-is_virtfn  PCI_FUNC(pdev-devfn)  iommu_group_mf 
+   pdev-hdr_type == PCI_HEADER_TYPE_NORMAL)
+   dma_pdev = pci_get_slot(pdev-bus,
+   PCI_DEVFN(PCI_SLOT(pdev-devfn), 0));
+
+   group = iommu_group_get(dma_pdev-dev);
+   if (!group) {
+   group = iommu_group_alloc();
+   if (IS_ERR(group))
+   return PTR_ERR(group);
+   }
+
+   ret = iommu_group_add_device(group, dev);
+
+   iommu_group_put(group);
+   return ret;
+}
+
+static void intel_iommu_remove_device(struct device *dev)
+{
+   iommu_group_remove_device(dev);
+}
+
 static struct iommu_ops intel_iommu_ops = {
.domain_init= intel_iommu_domain_init,
.domain_destroy = intel_iommu_domain_destroy,
@@ -4096,6 +4140,8 @@ static struct iommu_ops intel_iommu_ops = {
.unmap  = intel_iommu_unmap,
.iova_to_phys   = intel_iommu_iova_to_phys,
.domain_has_cap = intel_iommu_domain_has_cap,
+   .add_device = intel_iommu_add_device,
+   .remove_device  = intel_iommu_remove_device,
.pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
 };

[Qemu-devel] [PATCH v2 05/13] pci: Add ACS validation utility

2012-05-22 Thread Alex Williamson

In a PCI environment, transactions aren't always required to reach
the root bus before being re-routed.  Intermediate switches between
an endpoint and the root bus can redirect DMA back downstream before
things like IOMMUs have a chance to intervene.  Legacy PCI is always
susceptible to this as it operates on a shared bus.  PCIe added a
new capability to describe and control this behavior, Access Control
Services, or ACS.  The utility function pci_acs_enabled() allows us
to test the ACS capabilities of an individual devices against a set
of flags while pci_acs_path_enabled() tests a complete path from
a given downstream device up to the specified upstream device.  We
also include the ability to add device specific tests as it's
likely we'll see devices that do no implement ACS, but want to
indicate support for various capabilities in this space.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 drivers/pci/pci.c|   76 ++
 drivers/pci/quirks.c |   29 +++
 include/linux/pci.h  |   10 ++-
 3 files changed, 114 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 111569c..ab6c2a6 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2359,6 +2359,82 @@ void pci_enable_acs(struct pci_dev *dev)
 }
 
 /**
+ * pci_acs_enable - test ACS against required flags for a given device
+ * @pdev: device to test
+ * @acs_flags: required PCI ACS flags
+ *
+ * Return true if the device supports the provided flags.  Automatically
+ * filters out flags that are not implemented on multifunction devices.
+ */
+bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
+{
+   int pos;
+   u16 ctrl;
+
+   if (pci_dev_specific_acs_enabled(pdev, acs_flags))
+   return true;
+
+   if (!pci_is_pcie(pdev))
+   return false;
+
+   if (pdev-pcie_type == PCI_EXP_TYPE_DOWNSTREAM ||
+   pdev-pcie_type == PCI_EXP_TYPE_ROOT_PORT) {
+   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
+   if (!pos)
+   return false;
+
+   pci_read_config_word(pdev, pos + PCI_ACS_CTRL, ctrl);
+   if ((ctrl  acs_flags) != acs_flags)
+   return false;
+   } else if (pdev-multifunction) {
+   /* Filter out flags not applicable to multifunction */
+   acs_flags = (PCI_ACS_RR | PCI_ACS_CR |
+ PCI_ACS_EC | PCI_ACS_DT);
+
+   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
+   if (!pos)
+   return false;
+
+   pci_read_config_word(pdev, pos + PCI_ACS_CTRL, ctrl);
+   if ((ctrl  acs_flags) != acs_flags)
+   return false;
+   }
+
+   return true;
+}
+EXPORT_SYMBOL_GPL(pci_acs_enabled);
+
+/**
+ * pci_acs_path_enable - test ACS flags from start to end in a hierarchy
+ * @start: starting downstream device
+ * @end: ending upstream device or NULL to search to the root bus
+ * @acs_flags: required flags
+ *
+ * Walk up a device tree from start to end testing PCI ACS support.  If
+ * any step along the way does not support the required flags, return false.
+ */
+bool pci_acs_path_enabled(struct pci_dev *start,
+ struct pci_dev *end, u16 acs_flags)
+{
+   struct pci_dev *pdev, *parent = start;
+
+   do {
+   pdev = parent;
+
+   if (!pci_acs_enabled(pdev, acs_flags))
+   return false;
+
+   if (pci_is_root_bus(pdev-bus))
+   return (end == NULL);
+
+   parent = pdev-bus-self;
+   } while (pdev != end);
+
+   return true;
+}
+EXPORT_SYMBOL_GPL(pci_acs_path_enabled);
+
+/**
  * pci_swizzle_interrupt_pin - swizzle INTx for device behind bridge
  * @dev: the PCI device
  * @pin: the INTx pin (1=INTA, 2=INTB, 3=INTD, 4=INTD)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index a2dd77f..4ed6aa6 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3149,3 +3149,32 @@ struct pci_dev *pci_dma_source(struct pci_dev *dev)
 
return dev;
 }
+
+static const struct pci_dev_acs_enabled {
+   u16 vendor;
+   u16 device;
+   bool (*acs_enabled)(struct pci_dev *dev, u16 acs_flags);
+} pci_dev_acs_enabled[] = {
+   { 0 }
+};
+
+bool pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags)
+{
+   const struct pci_dev_acs_enabled *i;
+
+   /*
+* Allow devices that do not expose standard PCI ACS capabilities
+* or control to indicate their support here.  Multi-function devices
+* which do not allow internal peer-to-peer between functions, but
+* do not implement PCI ACS may wish to return true here.
+*/
+   for (i = pci_dev_acs_enabled; i-acs_enabled; i++) {
+   if ((i-vendor == dev-vendor ||
+i-vendor ==

[Qemu-devel] [PATCH v2 01/13] driver core: Add iommu_group tracking to struct device

2012-05-22 Thread Alex Williamson

IOMMU groups allow IOMMU drivers to represent DMA visibility
and isolation of devices.  Multiple devices may be grouped
together for the purposes of DMA.  Placing a pointer on
struct device enable easy access for things like streaming
DMA programming and drivers like VFIO.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Acked-by: Greg Kroah-Hartman gre...@linuxfoundation.org
---

 include/linux/device.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 5ad17cc..13dd26b 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -35,6 +35,7 @@ struct subsys_private;
 struct bus_type;
 struct device_node;
 struct iommu_ops;
+struct iommu_group;
 
 struct bus_attribute {
struct attributeattr;
@@ -677,6 +678,7 @@ struct device {
const struct attribute_group **groups;  /* optional groups */
 
void(*release)(struct device *dev);
+   struct iommu_group  *iommu_group;
 };
 
 /* Get the wakeup routines, which depend on struct device */

Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

2012-05-22 Thread Benjamin Herrenschmidt

On Tue, 2012-05-22 at 09:13 +0200, Alexander Graf wrote:
 On 22.05.2012, at 09:01, Alexey Kardashevskiy wrote:

  This is internal kitchen of PCIDevice which I do not want to touch
 from anywhere but pci.c. And
  there is no fixup_capability or something.
 
 Hrm. Maybe we should have one? :) Or instead of populating the config
 space with the exact data from the host device,
 loop through the host device capabilities and populate them using this
 function as we go.
  That should maintain the offsets, but ensure that all internal flags
 are set, no?

That actually sounds reasonable, though it might be more efficient
to have something like pci_parse_config() or similar called once
with a pre-cooked config space.

Internally inside pci.c it can do that same loop you mention. The
advantage is that it can also do whatever else we might need in the
future. If for some reason, something wants to cache a cap pointer
it can be done there, whatever else that is normally initialized as
fields to generate the config space can be initialized by reading the
config space and then initializing the fields etc... from that one
function.

Cheers,
Ben.

[Qemu-devel] [PATCH v2 11/13] pci: Create common pcibios_err_to_errno

2012-05-22 Thread Alex Williamson

For returning errors out to non-PCI code.  Re-name xen's version.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---

 drivers/xen/xen-pciback/conf_space.c |6 +++---
 include/linux/pci.h  |   26 ++
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/xen-pciback/conf_space.c 
b/drivers/xen/xen-pciback/conf_space.c
index 30d7be0..46ae0f9 100644
--- a/drivers/xen/xen-pciback/conf_space.c
+++ b/drivers/xen/xen-pciback/conf_space.c
@@ -124,7 +124,7 @@ static inline u32 merge_value(u32 val, u32 new_val, u32 
new_val_mask,
return val;
 }
 
-static int pcibios_err_to_errno(int err)
+static int xen_pcibios_err_to_errno(int err)
 {
switch (err) {
case PCIBIOS_SUCCESSFUL:
@@ -202,7 +202,7 @@ out:
   pci_name(dev), size, offset, value);
 
*ret_val = value;
-   return pcibios_err_to_errno(err);
+   return xen_pcibios_err_to_errno(err);
 }
 
 int xen_pcibk_config_write(struct pci_dev *dev, int offset, int size, u32 
value)
@@ -290,7 +290,7 @@ int xen_pcibk_config_write(struct pci_dev *dev, int offset, 
int size, u32 value)
}
}
 
-   return pcibios_err_to_errno(err);
+   return xen_pcibios_err_to_errno(err);
 }
 
 void xen_pcibk_config_free_dyn_fields(struct pci_dev *dev)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0cf57d5..b0f79b3 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -467,6 +467,32 @@ static inline bool pci_dev_msi_enabled(struct pci_dev 
*pci_dev) { return false;
 #define PCIBIOS_SET_FAILED 0x88
 #define PCIBIOS_BUFFER_TOO_SMALL   0x89
 
+/*
+ * Translate above to generic errno for passing back through non-pci.
+ */
+static inline int pcibios_err_to_errno(int err)
+{
+   if (err = PCIBIOS_SUCCESSFUL)
+   return err; /* Assume already errno */
+
+   switch (err) {
+   case PCIBIOS_FUNC_NOT_SUPPORTED:
+   return -ENOENT;
+   case PCIBIOS_BAD_VENDOR_ID:
+   return -EINVAL;
+   case PCIBIOS_DEVICE_NOT_FOUND:
+   return -ENODEV;
+   case PCIBIOS_BAD_REGISTER_NUMBER:
+   return -EFAULT;
+   case PCIBIOS_SET_FAILED:
+   return -EIO;
+   case PCIBIOS_BUFFER_TOO_SMALL:
+   return -ENOSPC;
+   }
+
+   return -ENOTTY;
+}
+
 /* Low-level architecture-dependent routines */
 
 struct pci_ops {

[Qemu-devel] [PATCH v2 12/13] pci: Misc pci_reg additions

2012-05-22 Thread Alex Williamson

Fill in many missing definitions and add sizeof fields for many
sections allowing for more extensive config parsing.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 include/linux/pci_regs.h |  112 +-
 1 files changed, 100 insertions(+), 12 deletions(-)

diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index 4b608f5..379be84 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -26,6 +26,7 @@
  * Under PCI, each device has 256 bytes of configuration address space,
  * of which the first 64 bytes are standardized as follows:
  */
+#define PCI_STD_HEADER_SIZEOF  64
 #define PCI_VENDOR_ID  0x00/* 16 bits */
 #define PCI_DEVICE_ID  0x02/* 16 bits */
 #define PCI_COMMAND0x04/* 16 bits */
@@ -209,9 +210,12 @@
 #define  PCI_CAP_ID_SHPC   0x0C/* PCI Standard Hot-Plug Controller */
 #define  PCI_CAP_ID_SSVID  0x0D/* Bridge subsystem vendor/device ID */
 #define  PCI_CAP_ID_AGP3   0x0E/* AGP Target PCI-PCI bridge */
+#define  PCI_CAP_ID_SECDEV 0x0F/* Secure Device */
 #define  PCI_CAP_ID_EXP0x10/* PCI Express */
 #define  PCI_CAP_ID_MSIX   0x11/* MSI-X */
+#define  PCI_CAP_ID_SATA   0x12/* SATA Data/Index Conf. */
 #define  PCI_CAP_ID_AF 0x13/* PCI Advanced Features */
+#define  PCI_CAP_ID_MAXPCI_CAP_ID_AF
 #define PCI_CAP_LIST_NEXT  1   /* Next capability in the list */
 #define PCI_CAP_FLAGS  2   /* Capability defined flags (16 bits) */
 #define PCI_CAP_SIZEOF 4
@@ -276,6 +280,7 @@
 #define  PCI_VPD_ADDR_MASK 0x7fff  /* Address mask */
 #define  PCI_VPD_ADDR_F0x8000  /* Write 0, 1 indicates 
completion */
 #define PCI_VPD_DATA   4   /* 32-bits of data returned here */
+#define PCI_CAP_VPD_SIZEOF 8
 
 /* Slot Identification */
 
@@ -297,8 +302,10 @@
 #define PCI_MSI_ADDRESS_HI 8   /* Upper 32 bits (if 
PCI_MSI_FLAGS_64BIT set) */
 #define PCI_MSI_DATA_328   /* 16 bits of data for 32-bit 
devices */
 #define PCI_MSI_MASK_3212  /* Mask bits register for 
32-bit devices */
+#define PCI_MSI_PENDING_32 16  /* Pending intrs for 32-bit devices */
 #define PCI_MSI_DATA_6412  /* 16 bits of data for 64-bit 
devices */
 #define PCI_MSI_MASK_6416  /* Mask bits register for 
64-bit devices */
+#define PCI_MSI_PENDING_64 20  /* Pending intrs for 64-bit devices */
 
 /* MSI-X registers */
 #define PCI_MSIX_FLAGS 2
@@ -308,6 +315,7 @@
 #define PCI_MSIX_TABLE 4
 #define PCI_MSIX_PBA   8
 #define  PCI_MSIX_FLAGS_BIRMASK(7  0)
+#define PCI_CAP_MSIX_SIZEOF12  /* size of MSIX registers */
 
 /* MSI-X entry's format */
 #define PCI_MSIX_ENTRY_SIZE16
@@ -338,6 +346,7 @@
 #define  PCI_AF_CTRL_FLR   0x01
 #define PCI_AF_STATUS  5
 #define  PCI_AF_STATUS_TP  0x01
+#define PCI_CAP_AF_SIZEOF  6   /* size of AF registers */
 
 /* PCI-X registers */
 
@@ -374,6 +383,9 @@
 #define  PCI_X_STATUS_SPL_ERR  0x2000  /* Rcvd Split Completion Error 
Msg */
 #define  PCI_X_STATUS_266MHZ   0x4000  /* 266 MHz capable */
 #define  PCI_X_STATUS_533MHZ   0x8000  /* 533 MHz capable */
+#define PCI_X_ECC_CSR  8   /* ECC control and status */
+#define PCI_CAP_PCIX_SIZEOF_V0 8   /* size of registers for Version 0 */
+#define PCI_CAP_PCIX_SIZEOF_V1224  /* size for Version 1  2 */
 
 /* PCI Bridge Subsystem ID registers */
 
@@ -462,6 +474,7 @@
 #define  PCI_EXP_LNKSTA_DLLLA  0x2000  /* Data Link Layer Link Active */
 #define  PCI_EXP_LNKSTA_LBMS   0x4000  /* Link Bandwidth Management Status */
 #define  PCI_EXP_LNKSTA_LABS   0x8000  /* Link Autonomous Bandwidth Status */
+#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V1 20  /* v1 endpoints end here */
 #define PCI_EXP_SLTCAP 20  /* Slot Capabilities */
 #define  PCI_EXP_SLTCAP_ABP0x0001 /* Attention Button Present */
 #define  PCI_EXP_SLTCAP_PCP0x0002 /* Power Controller Present */
@@ -521,6 +534,7 @@
 #define  PCI_EXP_OBFF_MSGA_EN  0x2000  /* OBFF enable with Message type A */
 #define  PCI_EXP_OBFF_MSGB_EN  0x4000  /* OBFF enable with Message type B */
 #define  PCI_EXP_OBFF_WAKE_EN  0x6000  /* OBFF using WAKE# signaling */
+#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 44  /* v2 endpoints end here */
 #define PCI_EXP_LNKCTL248  /* Link Control 2 */
 #define PCI_EXP_SLTCTL256  /* Slot Control 2 */
 
@@ -529,23 +543,43 @@
 #define PCI_EXT_CAP_VER(header)((header  16)  0xf)
 #define PCI_EXT_CAP_NEXT(header)   ((header  20)  0xffc)
 
-#define PCI_EXT_CAP_ID_ERR 1
-#define PCI_EXT_CAP_ID_VC  2
-#define PCI_EXT_CAP_ID_DSN 3
-#define PCI_EXT_CAP_ID_PWR 4
-#define PCI_EXT_CAP_ID_VNDR11
-#define

[Qemu-devel] [PATCH v2 00/13] IOMMU Groups + VFIO

2012-05-22 Thread Alex Williamson

Version 2 incorporating acks and feedback from v1.  The PCI DMA quirk
and ACS check are reworked, sysfs iommu groups ABI Documentation
added as well as numerous other fixes, including patches from Alexey
Kardashevskiy towards supporting POWER usage of VFIO and IOMMU groups.

This series can be found here on top of 3.4:

git://github.com/awilliam/linux-vfio.git iommu-group-vfio-20120521

The Qemu tree has also been updated to Qemu 1.1 and can be found here:

git://github.com/awilliam/qemu-vfio.git iommu-group-vfio

I'd really like to make a push to get this in for 3.5, so let's talk
about how to do that across iommu, pci, and new driver.  Joerg, are
you sufficiently happy with the IOMMU group concept and code?  We'll
also need David Woodhouse buyin on the intel-iommu changes in patches
3  6.  Who needs to approve VFIO as a new driver, GregKH?  Bjorn,
I'd be happy to send the PCI changes as a series for you, but I
wonder if it makes sense to collect acks for them if you approve and
bundle them in with the associated code that needs them so you're
not left with unused code.  Let me know which you prefer.  If there
are better ways to do it, please let me know.  Thanks,

Alex

---

Alex Williamson (13):
  vfio: Add PCI device driver
  pci: Misc pci_reg additions
  pci: Create common pcibios_err_to_errno
  pci: export pci_user functions for use by other drivers
  vfio: x86 IOMMU implementation
  vfio: Add documentation
  vfio: VFIO core
  iommu: Make use of DMA quirking and ACS enabled check for groups
  pci: Add ACS validation utility
  pci: Add PCI DMA source ID quirk
  iommu: IOMMU groups for VT-d and AMD-Vi
  iommu: IOMMU Groups
  driver core: Add iommu_group tracking to struct device


 .../ABI/testing/sysfs-kernel-iommu_groups  |   14 
 Documentation/ioctl/ioctl-number.txt   |1 
 Documentation/vfio.txt |  315 
 MAINTAINERS|8 
 drivers/Kconfig|2 
 drivers/Makefile   |1 
 drivers/iommu/amd_iommu.c  |   67 +
 drivers/iommu/intel-iommu.c|   87 +
 drivers/iommu/iommu.c  |  578 +++-
 drivers/pci/access.c   |6 
 drivers/pci/pci.c  |   76 +
 drivers/pci/pci.h  |7 
 drivers/pci/quirks.c   |   69 +
 drivers/vfio/Kconfig   |   16 
 drivers/vfio/Makefile  |3 
 drivers/vfio/pci/Kconfig   |8 
 drivers/vfio/pci/Makefile  |4 
 drivers/vfio/pci/vfio_pci.c|  557 +++
 drivers/vfio/pci/vfio_pci_config.c | 1522 
 drivers/vfio/pci/vfio_pci_intrs.c  |  724 ++
 drivers/vfio/pci/vfio_pci_private.h|   91 +
 drivers/vfio/pci/vfio_pci_rdwr.c   |  269 
 drivers/vfio/vfio.c| 1413 +++
 drivers/vfio/vfio_iommu_x86.c  |  743 ++
 drivers/xen/xen-pciback/conf_space.c   |6 
 include/linux/device.h |2 
 include/linux/iommu.h  |  104 +
 include/linux/pci.h|   49 +
 include/linux/pci_regs.h   |  112 +
 include/linux/vfio.h   |  444 ++
 30 files changed, 7182 insertions(+), 116 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-iommu_groups
 create mode 100644 Documentation/vfio.txt
 create mode 100644 drivers/vfio/Kconfig
 create mode 100644 drivers/vfio/Makefile
 create mode 100644 drivers/vfio/pci/Kconfig
 create mode 100644 drivers/vfio/pci/Makefile
 create mode 100644 drivers/vfio/pci/vfio_pci.c
 create mode 100644 drivers/vfio/pci/vfio_pci_config.c
 create mode 100644 drivers/vfio/pci/vfio_pci_intrs.c
 create mode 100644 drivers/vfio/pci/vfio_pci_private.h
 create mode 100644 drivers/vfio/pci/vfio_pci_rdwr.c
 create mode 100644 drivers/vfio/vfio.c
 create mode 100644 drivers/vfio/vfio_iommu_x86.c
 create mode 100644 include/linux/vfio.h

[Qemu-devel] [PATCH v2 06/13] iommu: Make use of DMA quirking and ACS enabled check for groups

2012-05-22 Thread Alex Williamson

Incorporate DMA quirking and ACS checking into amd_iommu and
intel-iommu.  Note that IOMMU groups are not yet used for
streaming DMA, so this doesn't immediately solve the problems
with broken Ricoh devices.  This a very strict implementation of
ACS checking, which will often result in multifunction devices
being grouped together.  This is actually a good thing as we
generally have no reason to trust isolation between functions,
but I won't be surprised if we later add a boot option to relax
this if a user wants to opt-in to a less secure grouping.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 drivers/iommu/amd_iommu.c   |   18 ++
 drivers/iommu/intel-iommu.c |   18 ++
 2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index b7e5ddf..be72d6d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -254,6 +254,8 @@ static bool check_device(struct device *dev)
return true;
 }
 
+#define PCI_ACS_ENABLED(PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | 
PCI_ACS_UF)
+
 static int iommu_init_device(struct device *dev)
 {
struct pci_dev *dma_pdev, *pdev = to_pci_dev(dev);
@@ -291,6 +293,22 @@ static int iommu_init_device(struct device *dev)
dma_pdev = pci_get_slot(pdev-bus,
PCI_DEVFN(PCI_SLOT(pdev-devfn), 0));
 
+   dma_pdev = pci_dma_source(dma_pdev);
+
+   if (dma_pdev-multifunction 
+   !pci_acs_enabled(dma_pdev, PCI_ACS_ENABLED))
+   dma_pdev = pci_get_slot(dma_pdev-bus,
+   PCI_DEVFN(PCI_SLOT(dma_pdev-devfn),
+   0));
+
+   while (!pci_is_root_bus(dma_pdev-bus)) {
+   if (pci_acs_path_enabled(dma_pdev-bus-self,
+NULL, PCI_ACS_ENABLED))
+   break;
+
+   dma_pdev = dma_pdev-bus-self;
+   }
+
group = iommu_group_get(dma_pdev-dev);
if (!group) {
group = iommu_group_alloc();
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index e63b33b..cf2a650 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4087,6 +4087,8 @@ static int intel_iommu_domain_has_cap(struct iommu_domain 
*domain,
return 0;
 }
 
+#define PCI_ACS_ENABLED(PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | 
PCI_ACS_UF)
+
 static int intel_iommu_add_device(struct device *dev)
 {
struct pci_dev *pdev = to_pci_dev(dev);
@@ -4113,6 +4115,22 @@ static int intel_iommu_add_device(struct device *dev)
dma_pdev = pci_get_slot(pdev-bus,
PCI_DEVFN(PCI_SLOT(pdev-devfn), 0));
 
+   dma_pdev = pci_dma_source(dma_pdev);
+
+   if (dma_pdev-multifunction 
+   !pci_acs_enabled(dma_pdev, PCI_ACS_ENABLED))
+   dma_pdev = pci_get_slot(dma_pdev-bus,
+   PCI_DEVFN(PCI_SLOT(dma_pdev-devfn),
+   0));
+
+   while (!pci_is_root_bus(dma_pdev-bus)) {
+   if (pci_acs_path_enabled(dma_pdev-bus-self,
+NULL, PCI_ACS_ENABLED))
+   break;
+
+   dma_pdev = dma_pdev-bus-self;
+   }
+
group = iommu_group_get(dma_pdev-dev);
if (!group) {
group = iommu_group_alloc();

Re: [Qemu-devel] [PATCH next v2 00/74] QOM CPUState, part 3: CPU reset

2012-05-22 Thread Fabien Chouteau

On 05/22/2012 02:34 AM, Andreas Färber wrote:
 Am 21.05.2012 20:20, schrieb Blue Swirl:
 On Mon, May 21, 2012 at 9:09 AM, Andreas Färber afaer...@suse.de wrote:
 Am 14.05.2012 23:22, schrieb Blue Swirl:
 On Mon, May 14, 2012 at 8:59 PM, Andreas Färber afaer...@suse.de wrote:
 Am 14.05.2012 21:54, schrieb Blue Swirl:
 On Mon, May 14, 2012 at 4:15 PM, Andreas Färber afaer...@suse.de wrote:
 Am 10.05.2012 02:13, schrieb Andreas Färber:
 Andreas Färber (74):
 [...]
   target-sparc: Let cpu_sparc_init() return SPARCCPU
   sun4m: Use cpu_sparc_init() to obtain SPARCCPU
   sun4m: Pass SPARCCPU to {main,secondary}_cpu_reset()
   sun4u: Use cpu_sparc_init() to obtain SPARCCPU
   sun4u: Let cpu_devinit() return SPARCCPU
   sun4u: Store SPARCCPU in ResetData
   leon3: Use cpu_sparc_init() to obtain SPARCCPU
   leon3: Store SPARCCPU in ResetData
 [...]

 Forgot to mention the bsd-user patch.

   Kill off cpu_state_reset()

 Ping! Blue, can you ack please?

 Otherwise the patches look pretty safe (if it compiles, it works).

 Was I supposed to interpret that as an Acked-by? ;)

 No, but you may do so if it compiles. ;-)
 
 Joking, are we? My cover letter indicated pretty clearly on what
 platforms I had already compile-tested on, and that meant each patch.
 I had also run `make check` on each and the sparc test image for sun4m,
 HelenOS for sun4u and crible for leon3.
 
 Applied the remainder of the series to qom-next:
 http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/qom-next
 
 Tristan/Fabien, could you please submit a patch to document the Leon3
 machine in MAINTAINERS? No one got cc'ed on those two patches. Thanks.
 

OK, I'll send a patch.

-- 
Fabien Chouteau

[Qemu-devel] [PATCH v2 02/13] iommu: IOMMU Groups

2012-05-22 Thread Alex Williamson

IOMMU device groups are currently a rather vague associative notion
with assembly required by the user or user level driver provider to
do anything useful.  This patch intends to grow the IOMMU group concept
into something a bit more consumable.

To do this, we first create an object representing the group, struct
iommu_group.  This structure is allocated (iommu_group_alloc) and
filled (iommu_group_add_device) by the iommu driver.  The iommu driver
is free to add devices to the group using it's own set of policies.
This allows inclusion of devices based on physical hardware or topology
limitations of the platform, as well as soft requirements, such as
multi-function trust levels or peer-to-peer protection of the
interconnects.  Each device may only belong to a single iommu group,
which is linked from struct device.iommu_group.  IOMMU groups are
maintained using kobject reference counting, allowing for automatic
removal of empty, unreferenced groups.  It is the responsibility of
the iommu driver to remove devices from the group
(iommu_group_remove_device).

IOMMU groups also include a userspace representation in sysfs under
/sys/kernel/iommu_groups.  When allocated, each group is given a
dynamically assign ID (int).  The ID is managed by the core IOMMU group
code to support multiple heterogeneous iommu drivers, which could
potentially collide in group naming/numbering.  This also keeps group
IDs to small, easily managed values.  A directory is created under
/sys/kernel/iommu_groups for each group.  A further subdirectory named
devices contains links to each device within the group.  The iommu_group
file in the device's sysfs directory, which formerly contained a group
number when read, is now a link to the iommu group.  Example:

$ ls -l /sys/kernel/iommu_groups/26/devices/
total 0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 :00:1e.0 -
../../../../devices/pci:00/:00:1e.0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 :06:0d.0 -
../../../../devices/pci:00/:00:1e.0/:06:0d.0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 :06:0d.1 -
../../../../devices/pci:00/:00:1e.0/:06:0d.1

$ ls -l  /sys/kernel/iommu_groups/26/devices/*/iommu_group
[truncating perms/owner/timestamp]
/sys/kernel/iommu_groups/26/devices/:00:1e.0/iommu_group -
../../../kernel/iommu_groups/26
/sys/kernel/iommu_groups/26/devices/:06:0d.0/iommu_group -
../../../../kernel/iommu_groups/26
/sys/kernel/iommu_groups/26/devices/:06:0d.1/iommu_group -
../../../../kernel/iommu_groups/26

Groups also include several exported functions for use by user level
driver providers, for example VFIO.  These include:

iommu_group_get(): Acquires a reference to a group from a device
iommu_group_put(): Releases reference
iommu_group_for_each_dev(): Iterates over group devices using callback
iommu_group_[un]register_notifier(): Allows notification of device add
and remove operations relevant to the group
iommu_group_id(): Return the group number

This patch also extends the IOMMU API to allow attaching groups to
domains.  This is currently a simple wrapper for iterating through
devices within a group, but it's expected that the IOMMU API may
eventually make groups a more integral part of domains.

Groups intentionally do not try to manage group ownership.  A user
level driver provider must independently acquire ownership for each
device within a group before making use of the group as a whole.
This may change in the future if group usage becomes more pervasive
across both DMA and IOMMU ops.

Groups intentionally do not provide a mechanism for driver locking
or otherwise manipulating driver matching/probing of devices within
the group.  Such interfaces are generic to devices and beyond the
scope of IOMMU groups.  If implemented, user level providers have
ready access via iommu_group_for_each_dev and group notifiers.

iommu_device_group() is removed here as it has no users.  The
replacement is:

group = iommu_group_get(dev);
id = iommu_group_id(group);
iommu_group_put(group);

AMD-Vi  Intel VT-d support re-added in following patches.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 .../ABI/testing/sysfs-kernel-iommu_groups  |   14 
 drivers/iommu/amd_iommu.c  |   21 -
 drivers/iommu/intel-iommu.c|   49 --
 drivers/iommu/iommu.c  |  578 +++-
 include/linux/iommu.h  |  104 +++-
 5 files changed, 663 insertions(+), 103 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-iommu_groups

diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups 
b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
new file mode 100644
index 000..9b31556
--- /dev/null
+++

[Qemu-devel] [PATCH] Add MAINTAINERS entry for leon3

2012-05-22 Thread Fabien Chouteau


Signed-off-by: Fabien Chouteau chout...@adacore.com
---
 MAINTAINERS |6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b45f075..3d773d3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -387,6 +387,12 @@ M: Blue Swirl blauwir...@gmail.com
 S: Maintained
 F: hw/sun4u.c
 
+Leon3
+M: Fabien Chouteau chout...@adacore.com
+S: Maintained
+F: hw/leon3.c
+F: hw/grlib*
+
 S390 Machines
 -
 S390 Virtio
-- 
1.7.9.5

Re: [Qemu-devel] [RFC PATCH 0/4] block: file descriptor passing using -filefd and getfd_file

2012-05-22 Thread Kevin Wolf

Am 21.05.2012 22:19, schrieb Corey Bryant:
 libvirt's sVirt security driver provides SELinux MAC isolation for
 Qemu guest processes and their corresponding image files.  In other
 words, sVirt uses SELinux to prevent a QEMU process from opening
 files that do not belong to it.
 
 sVirt provides this support by labeling guests and resources with
 security labels that are stored in file system extended attributes.
 Some file systems, such as NFS, do not support the extended
 attribute security namespace, and therefore cannot support sVirt
 isolation.
 
 A solution to this problem is to provide fd passing support, where
 libvirt opens files and passes file descriptors to QEMU.  This,
 along with SELinux policy to prevent QEMU from opening files, can
 provide image file isolation for NFS files.
 
 This patch series adds the -filefd command-line option and the
 getfd_file monitor command.  This will enable libvirt to open a
 file and push the corresponding filename and file descriptor to
 QEMU.  When QEMU needs to open a file, it will first check if the
 file descriptor was passed by either of these methods before
 attempting to actually open the file.

I thought we decided to avoid making some file names magic, and instead
go for the obvious /dev/fd/42?

Kevin

Re: [Qemu-devel] [PATCH 1/3] envlist.c: handle strdup failure

2012-05-22 Thread Kevin Wolf

Am 21.05.2012 19:56, schrieb Blue Swirl:
 On Mon, May 21, 2012 at 10:25 AM, Jim Meyering j...@meyering.net wrote:
 Blue Swirl wrote:
 On Tue, May 15, 2012 at 1:04 PM,  j...@meyering.net wrote:
 From: Jim Meyering meyer...@redhat.com

 Without this, envlist_to_environ may silently fail to copy all
 strings into the destination buffer, and both callers would leak
 any env strings allocated after a failing strdup, because the
 freeing code stops at the first NULL pointer.

 Signed-off-by: Jim Meyering meyer...@redhat.com
 ---
  envlist.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

 diff --git a/envlist.c b/envlist.c
 index f2303cd..2bbd99c 100644
 --- a/envlist.c
 +++ b/envlist.c
 @@ -235,7 +235,14 @@ envlist_to_environ(const envlist_t *envlist, size_t 
 *count)

for (entry = envlist-el_entries.lh_first; entry != NULL;
entry = entry-ev_link.le_next) {
 -   *(penv++) = strdup(entry-ev_var);
 +   if ((*(penv++) = strdup(entry-ev_var)) == NULL) {
 +   char **e = env;
 +   while (e != penv) {
 +   free(*e++);
 +   }
 +   free(env);
 +   return NULL;
 +   }

 ERROR: code indent should never use tabs
 #82: FILE: envlist.c:238:
 +^I^Iif ((*(penv++) = strdup(entry-ev_var)) == NULL) {$

 That entire file is indented solely with TABs, so adding these new
 lines using spaces for indentation seems unjustified: the mix tends
 to make the code unreadable in some contexts (email quoting, for one).
 How about two patches: one to convert all leading TABs in envlist.c to
 spaces, and the next to make the above change, but indenting with spaces?

 ERROR: do not use assignment in if condition
 #82: FILE: envlist.c:238:
 +   if ((*(penv++) = strdup(entry-ev_var)) == NULL) {

 I agree with the sentiment, but found that the alternative was less
 readable and less maintainable, since I'd have to increment penv in
 two places (both in the if-block and after it) rather than in just one.
 However, I've just realized I can hoist the penv++ increment into the
 for-statement, in which case it's ok:

for (entry = envlist-el_entries.lh_first; entry != NULL;
 entry = entry-ev_link.le_next, penv++) {
*penv = strdup(entry-ev_var);
if (*penv == NULL) {
char **e = env;
while (e = penv) {
free(*e++);
}
free(env);
return NULL;
}
}

 Your move.  Which would you prefer?
  1) two patches: one replacing all leading TABs with equivalent spaces,
  then the above patch
  2) one patch, indented using TABs, in spite of the checkpatch failure
  3) one patch, indented using spaces, in spite of the consistency issue
 
 1) (or 3). Though for v1.1, maybe 3) is the smaller fix and later do 1) for 
 1.2.

A patch replacing tabs by spaces isn't really the kind of patches that
we would want to avoid during freeze. It's easy enough to check with git
diff -w that it doesn't change anything semantically.

Kevin

Re: [Qemu-devel] [PATCH] rbd: hook up cache options

2012-05-22 Thread Kevin Wolf

Am 17.05.2012 22:42, schrieb Josh Durgin:
 Writeback caching was added in Ceph 0.46, and writethrough will be in
 0.47. These are controlled by general config options, so there's no
 need to check for librbd version.
 
 Signed-off-by: Josh Durgin josh.dur...@inktank.com

Thanks, applied to the block-next branch for 1.2.

Kevin

Re: [Qemu-devel] [PATCH] Prevent disk data loss when closing qemu

2012-05-22 Thread Kevin Wolf

Am 16.05.2012 12:16, schrieb Pavel Dovgaluk:
 I use qemu under Windows and it has two windows when executes - console and 
 SDL ones.
 When I close SDL window main loop function terminates correctly, and when I 
 close 
 console window to terminate qemu then the code after main loop is not 
 executed.

Is there no way to catch this case and use the regular shutdown
mechanism there as well?

I'm not against your patch and it's probably 1.1 material, but there may
be more shutdown logic that we're missing if you close the console window.

Kevin

[Qemu-devel] [PATCH 0/2] envlist.c: handle strdup failure

2012-05-22 Thread Jim Meyering

From: Jim Meyering meyer...@redhat.com

Per discussion, let's switch envlist.c to indent with spaces,
and then make the fix:

Jim Meyering (2):
  envlist.c: convert many leading TABs to spaces via expand -i
  envlist.c: handle strdup failure

 envlist.c | 272 --
 1 file changed, 140 insertions(+), 132 deletions(-)

--
1.7.10.2.552.gaa3bb87

[Qemu-devel] [PATCH 1/2] envlist.c: convert many leading TABs to spaces via expand -i

2012-05-22 Thread Jim Meyering

From: Jim Meyering meyer...@redhat.com


Signed-off-by: Jim Meyering meyer...@redhat.com
---
 envlist.c | 256 +++---
 1 file changed, 128 insertions(+), 128 deletions(-)

diff --git a/envlist.c b/envlist.c
index f2303cd..1d98108 100644
--- a/envlist.c
+++ b/envlist.c
@@ -8,13 +8,13 @@
 #include envlist.h

 struct envlist_entry {
-   const char *ev_var; /* actual env value */
-   QLIST_ENTRY(envlist_entry) ev_link;
+const char *ev_var;/* actual env value */
+QLIST_ENTRY(envlist_entry) ev_link;
 };

 struct envlist {
-   QLIST_HEAD(, envlist_entry) el_entries; /* actual entries */
-   size_t el_count;/* number of entries */
+QLIST_HEAD(, envlist_entry) el_entries;/* actual entries */
+size_t el_count;   /* number of entries */
 };

 static int envlist_parse(envlist_t *envlist,
@@ -27,15 +27,15 @@ static int envlist_parse(envlist_t *envlist,
 envlist_t *
 envlist_create(void)
 {
-   envlist_t *envlist;
+envlist_t *envlist;

-   if ((envlist = malloc(sizeof (*envlist))) == NULL)
-   return (NULL);
+if ((envlist = malloc(sizeof (*envlist))) == NULL)
+return (NULL);

-   QLIST_INIT(envlist-el_entries);
-   envlist-el_count = 0;
+QLIST_INIT(envlist-el_entries);
+envlist-el_count = 0;

-   return (envlist);
+return (envlist);
 }

 /*
@@ -44,18 +44,18 @@ envlist_create(void)
 void
 envlist_free(envlist_t *envlist)
 {
-   struct envlist_entry *entry;
+struct envlist_entry *entry;

-   assert(envlist != NULL);
+assert(envlist != NULL);

-   while (envlist-el_entries.lh_first != NULL) {
-   entry = envlist-el_entries.lh_first;
-   QLIST_REMOVE(entry, ev_link);
+while (envlist-el_entries.lh_first != NULL) {
+entry = envlist-el_entries.lh_first;
+QLIST_REMOVE(entry, ev_link);

-   free((char *)entry-ev_var);
-   free(entry);
-   }
-   free(envlist);
+free((char *)entry-ev_var);
+free(entry);
+}
+free(envlist);
 }

 /*
@@ -72,7 +72,7 @@ envlist_free(envlist_t *envlist)
 int
 envlist_parse_set(envlist_t *envlist, const char *env)
 {
-   return (envlist_parse(envlist, env, envlist_setenv));
+return (envlist_parse(envlist, env, envlist_setenv));
 }

 /*
@@ -84,7 +84,7 @@ envlist_parse_set(envlist_t *envlist, const char *env)
 int
 envlist_parse_unset(envlist_t *envlist, const char *env)
 {
-   return (envlist_parse(envlist, env, envlist_unsetenv));
+return (envlist_parse(envlist, env, envlist_unsetenv));
 }

 /*
@@ -97,32 +97,32 @@ static int
 envlist_parse(envlist_t *envlist, const char *env,
 int (*callback)(envlist_t *, const char *))
 {
-   char *tmpenv, *envvar;
-   char *envsave = NULL;
-
-   assert(callback != NULL);
-
-   if ((envlist == NULL) || (env == NULL))
-   return (EINVAL);
-
-   /*
-* We need to make temporary copy of the env string
-* as strtok_r(3) modifies it while it tokenizes.
-*/
-   if ((tmpenv = strdup(env)) == NULL)
-   return (errno);
-
-   envvar = strtok_r(tmpenv, ,, envsave);
-   while (envvar != NULL) {
-   if ((*callback)(envlist, envvar) != 0) {
-   free(tmpenv);
-   return (errno);
-   }
-   envvar = strtok_r(NULL, ,, envsave);
-   }
-
-   free(tmpenv);
-   return (0);
+char *tmpenv, *envvar;
+char *envsave = NULL;
+
+assert(callback != NULL);
+
+if ((envlist == NULL) || (env == NULL))
+return (EINVAL);
+
+/*
+ * We need to make temporary copy of the env string
+ * as strtok_r(3) modifies it while it tokenizes.
+ */
+if ((tmpenv = strdup(env)) == NULL)
+return (errno);
+
+envvar = strtok_r(tmpenv, ,, envsave);
+while (envvar != NULL) {
+if ((*callback)(envlist, envvar) != 0) {
+free(tmpenv);
+return (errno);
+}
+envvar = strtok_r(NULL, ,, envsave);
+}
+
+free(tmpenv);
+return (0);
 }

 /*
@@ -134,46 +134,46 @@ envlist_parse(envlist_t *envlist, const char *env,
 int
 envlist_setenv(envlist_t *envlist, const char *env)
 {
-   struct envlist_entry *entry = NULL;
-   const char *eq_sign;
-   size_t envname_len;
-
-   if ((envlist == NULL) || (env == NULL))
-   return (EINVAL);
-
-   /* find out first equals sign in given env */
-   if ((eq_sign = strchr(env, '=')) == NULL)
-   return (EINVAL);
-   envname_len = eq_sign - env + 1;
-
-   /*
-* If there

[Qemu-devel] [PATCH 2/2] envlist.c: handle strdup failure

2012-05-22 Thread Jim Meyering

From: Jim Meyering meyer...@redhat.com

Without this, envlist_to_environ may silently fail to copy all
strings into the destination buffer, and both callers would leak
any env strings allocated after a failing strdup, because the
freeing code stops at the first NULL pointer.

Signed-off-by: Jim Meyering meyer...@redhat.com
---
 envlist.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/envlist.c b/envlist.c
index 1d98108..0dfb56c 100644
--- a/envlist.c
+++ b/envlist.c
@@ -234,8 +234,16 @@ envlist_to_environ(const envlist_t *envlist, size_t *count)
 return (NULL);

 for (entry = envlist-el_entries.lh_first; entry != NULL;
-entry = entry-ev_link.le_next) {
-*(penv++) = strdup(entry-ev_var);
+ entry = entry-ev_link.le_next, penv++) {
+*penv = strdup(entry-ev_var);
+if (*penv == NULL) {
+char **e = env;
+while (e = penv) {
+free(*e++);
+}
+free(env);
+return NULL;
+}
 }
 *penv = NULL; /* NULL terminate the list */

-- 
1.7.10.2.552.gaa3bb87

Re: [Qemu-devel] [PATCH 1/3] envlist.c: handle strdup failure

2012-05-22 Thread Jim Meyering

Kevin Wolf wrote:
 A patch replacing tabs by spaces isn't really the kind of patches that
 we would want to avoid during freeze. It's easy enough to check with git
 diff -w that it doesn't change anything semantically.

That makes sense, so I've posted two patches:

1) two patches: one replacing all leading TABs with equivalent spaces,
  then the above patch

Note however, that envlist.c must predate checkpatch.pl, since
that patch provokes numerous warnings about style issues:
(hmm... I've just noticed these first two, which seem to suggest
that even for comments I should remove the TABs.  Is it worth
a V2 to fix those two lines?  Or are they false positives, since
they're not really code indentation?  )

ERROR: code indent should never use tabs
#23: FILE: envlist.c:11:
+const char *ev_var;^I^I^I/* actual env value */$

ERROR: code indent should never use tabs
#30: FILE: envlist.c:16:
+QLIST_HEAD(, envlist_entry) el_entries;^I/* actual entries */$

ERROR: code indent should never use tabs
#31: FILE: envlist.c:17:
+size_t el_count;^I^I^I/* number of entries */$

WARNING: space prohibited between function name and open parenthesis '('
#44: FILE: envlist.c:32:
+if ((envlist = malloc(sizeof (*envlist))) == NULL)

ERROR: do not use assignment in if condition
#44: FILE: envlist.c:32:
+if ((envlist = malloc(sizeof (*envlist))) == NULL)

WARNING: braces {} are necessary for all arms of this statement
#44: FILE: envlist.c:32:
+if ((envlist = malloc(sizeof (*envlist))) == NULL)
[...]

ERROR: return is not a function, parentheses are not required
#45: FILE: envlist.c:33:
+return (NULL);

ERROR: return is not a function, parentheses are not required
#53: FILE: envlist.c:38:
+return (envlist);

ERROR: return is not a function, parentheses are not required
#90: FILE: envlist.c:75:
+return (envlist_parse(envlist, env, envlist_setenv));

ERROR: return is not a function, parentheses are not required
#99: FILE: envlist.c:87:
+return (envlist_parse(envlist, env, envlist_unsetenv));

WARNING: braces {} are necessary for all arms of this statement
#138: FILE: envlist.c:105:
+if ((envlist == NULL) || (env == NULL))
[...]

ERROR: return is not a function, parentheses are not required
#139: FILE: envlist.c:106:
+return (EINVAL);

ERROR: do not use assignment in if condition
#145: FILE: envlist.c:112:
+if ((tmpenv = strdup(env)) == NULL)

WARNING: braces {} are necessary for all arms of this statement
#145: FILE: envlist.c:112:
+if ((tmpenv = strdup(env)) == NULL)
[...]

ERROR: return is not a function, parentheses are not required
#146: FILE: envlist.c:113:
+return (errno);

ERROR: return is not a function, parentheses are not required
#152: FILE: envlist.c:119:
+return (errno);

ERROR: return is not a function, parentheses are not required
#158: FILE: envlist.c:125:
+return (0);

WARNING: braces {} are necessary for all arms of this statement
#210: FILE: envlist.c:141:
+if ((envlist == NULL) || (env == NULL))
[...]

ERROR: return is not a function, parentheses are not required
#211: FILE: envlist.c:142:
+return (EINVAL);

ERROR: do not use assignment in if condition
#214: FILE: envlist.c:145:
+if ((eq_sign = strchr(env, '=')) == NULL)

WARNING: braces {} are necessary for all arms of this statement
#214: FILE: envlist.c:145:
+if ((eq_sign = strchr(env, '=')) == NULL)
[...]

ERROR: return is not a function, parentheses are not required
#215: FILE: envlist.c:146:
+return (EINVAL);

WARNING: braces {} are necessary for all arms of this statement
#225: FILE: envlist.c:156:
+if (strncmp(entry-ev_var, env, envname_len) == 0)
[...]

WARNING: space prohibited between function name and open parenthesis '('
#237: FILE: envlist.c:168:
+if ((entry = malloc(sizeof (*entry))) == NULL)

ERROR: do not use assignment in if condition
#237: FILE: envlist.c:168:
+if ((entry = malloc(sizeof (*entry))) == NULL)

WARNING: braces {} are necessary for all arms of this statement
#237: FILE: envlist.c:168:
+if ((entry = malloc(sizeof (*entry))) == NULL)
[...]

ERROR: return is not a function, parentheses are not required
#238: FILE: envlist.c:169:
+return (errno);

ERROR: do not use assignment in if condition
#239: FILE: envlist.c:170:
+if ((entry-ev_var = strdup(env)) == NULL) {

ERROR: return is not a function, parentheses are not required
#241: FILE: envlist.c:172:
+return (errno);

ERROR: return is not a function, parentheses are not required
#245: FILE: envlist.c:176:
+return (0);

WARNING: braces {} are necessary for all arms of this statement
#284: FILE: envlist.c:189:
+if ((envlist == NULL) || (env == NULL))
[...]

ERROR: return is not a function, parentheses are not required
#285: FILE: envlist.c:190:
+

Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes

2012-05-22 Thread ronnie sahlberg

Hi Stefan

On Mon, May 21, 2012 at 11:20 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Mon, May 21, 2012 at 12:00 PM, ronnie sahlberg
 ronniesahlb...@gmail.com wrote:
 Yes, I use IDE since I boot from this LUN.


 I just managed to track it down to the IDE changes.

 It looks like basically this change triggered it :

 commit bef0fd5958120542f126f2dedbfce65d8839a94d
 Author: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 Date:   Thu Mar 29 10:31:30 2012 +0100
    ide: convert ide_sector_read() to asynchronous I/O


 Stefan, any ideas?   Ill continue trying to understand the IDE code
 tomorrow and maybe I make some progress.

 It sounds like aio for iSCSI is broken.  IDE submits bdrv_aio_readv()
 requests which iSCSI handles.  QEMU main loop should find that the
 iSCSI socket file descriptor becomes readable when the SCSI response
 arrives.  We can then report I/O completion back to the guest.

 However, it appears that completions are not triggering.  My guess is
 that once the OS is fully booted you don't see the problem because
 other events are pumping the main loop besides a 55ms timer.
 Therefore this problem probably applies across the board but it's most
 noticable when the BIOS/bootloader/early OS is executing.

I booted off a CDROM image and then had iscsi device as a disk.
From the cdrom booted guest I then did 'dd if=/dev/sda of=/dev/null
bs=512 iflag=direct'
but did not see these 55ms delays anymore.
I did however see a ~4ms delay .



 Have you added debugging into block/iscsi.c or libiscsi to watch when
 read handlers get called?  Using tcpdump or strace might also be
 useful to figure out when the reply becomes available and why we
 aren't registering a read handler function with the QEMU main loop.

I just did that, and I print a message and timeval to stdout for the
functions in block/iscsi.c
What seems to happen is that we call iscsi_aio_readv() which sets up a
write event  but this event is not triggered for another 55ms.
Since the descriptor is writeable at this point,  i would expected
that the write_event_callback would trigger almost immediately.


Below   iscsi_aio_readv   is printed when we enter this function.
This function prepares a CDB and sets it up for queueing,  but it does
not actually write it to the socket.
At the end of this function, we set up the events  by calling
iscsi_set_events() and we set it up for writeable.

iscsi_process_write() is when we enter the fd-is-writeable callback.

iscsi_aio_readv   1337676989.429822
iscsi_set_events   write_event?Y  1337676989.429834
iscsi_process_write   1337676989.484424
iscsi_set_events   write_event?N  1337676989.484507
iscsi_process_read1337676989.484628
iscsi_aio_read10_cb   1337676989.484647
iscsi_set_events   write_event?N  1337676989.484660
iscsi_readv_writev_bh_cb  1337676989.484665
iscsi_aio_readv   1337676989.484751


So  what happens seems to be that once we have set up the i/o  and
queued it,  and we set up the writeable event,
but this event is not triggered immediately! It takes another 55ms
before the event is triggered.


To me this looks like a bug in the eventsystem. The socket is always
writeable. shouldnt the callback be invoked almost immediately here ?


regards
ronnie sahlberg

Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes

2012-05-22 Thread ronnie sahlberg

Hi,

Now that I see what happens, I can easily workaround this in block/iscsi.c
by the patch below, but I dont know if this is the right thing to do.

It does appear that here,  when I use qemu_set_fd_handler() and add a
handler for writeble it takes 55ms before the event system notices
this and reacts.



diff --git a/block/iscsi.c b/block/iscsi.c
index d37c4ee..1ebff0f 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun)
 {
 struct iscsi_context *iscsi = iscsilun-iscsi;

+if (iscsi_which_events(iscsi)  POLLOUT) {
+iscsi_process_write(iscsilun);
+}
+
 qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read,
(iscsi_which_events(iscsi)  POLLOUT)
? iscsi_process_write : NULL,




On Tue, May 22, 2012 at 7:07 PM, ronnie sahlberg
ronniesahlb...@gmail.com wrote:
 Hi Stefan

 On Mon, May 21, 2012 at 11:20 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Mon, May 21, 2012 at 12:00 PM, ronnie sahlberg
 ronniesahlb...@gmail.com wrote:
 Yes, I use IDE since I boot from this LUN.


 I just managed to track it down to the IDE changes.

 It looks like basically this change triggered it :

 commit bef0fd5958120542f126f2dedbfce65d8839a94d
 Author: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 Date:   Thu Mar 29 10:31:30 2012 +0100
    ide: convert ide_sector_read() to asynchronous I/O


 Stefan, any ideas?   Ill continue trying to understand the IDE code
 tomorrow and maybe I make some progress.

 It sounds like aio for iSCSI is broken.  IDE submits bdrv_aio_readv()
 requests which iSCSI handles.  QEMU main loop should find that the
 iSCSI socket file descriptor becomes readable when the SCSI response
 arrives.  We can then report I/O completion back to the guest.

 However, it appears that completions are not triggering.  My guess is
 that once the OS is fully booted you don't see the problem because
 other events are pumping the main loop besides a 55ms timer.
 Therefore this problem probably applies across the board but it's most
 noticable when the BIOS/bootloader/early OS is executing.

 I booted off a CDROM image and then had iscsi device as a disk.
 From the cdrom booted guest I then did 'dd if=/dev/sda of=/dev/null
 bs=512 iflag=direct'
 but did not see these 55ms delays anymore.
 I did however see a ~4ms delay .



 Have you added debugging into block/iscsi.c or libiscsi to watch when
 read handlers get called?  Using tcpdump or strace might also be
 useful to figure out when the reply becomes available and why we
 aren't registering a read handler function with the QEMU main loop.

 I just did that, and I print a message and timeval to stdout for the
 functions in block/iscsi.c
 What seems to happen is that we call iscsi_aio_readv() which sets up a
 write event  but this event is not triggered for another 55ms.
 Since the descriptor is writeable at this point,  i would expected
 that the write_event_callback would trigger almost immediately.


 Below   iscsi_aio_readv   is printed when we enter this function.
 This function prepares a CDB and sets it up for queueing,  but it does
 not actually write it to the socket.
 At the end of this function, we set up the events  by calling
 iscsi_set_events() and we set it up for writeable.

 iscsi_process_write() is when we enter the fd-is-writeable callback.

 iscsi_aio_readv                   1337676989.429822
 iscsi_set_events   write_event?Y  1337676989.429834
 iscsi_process_write               1337676989.484424
 iscsi_set_events   write_event?N  1337676989.484507
 iscsi_process_read                1337676989.484628
 iscsi_aio_read10_cb               1337676989.484647
 iscsi_set_events   write_event?N  1337676989.484660
 iscsi_readv_writev_bh_cb          1337676989.484665
 iscsi_aio_readv                   1337676989.484751


 So  what happens seems to be that once we have set up the i/o  and
 queued it,  and we set up the writeable event,
 but this event is not triggered immediately! It takes another 55ms
 before the event is triggered.


 To me this looks like a bug in the eventsystem. The socket is always
 writeable. shouldnt the callback be invoked almost immediately here ?


 regards
 ronnie sahlberg

Re: [Qemu-devel] [PATCH] rbd: hook up cache options

2012-05-22 Thread Paolo Bonzini

Il 17/05/2012 22:42, Josh Durgin ha scritto:
 + * Fallback to more conservative semantics if setting cache
 + * options fails. Ignore errors from setting rbd_cache because the
 + * only possible error is that the option does not exist, and
 + * librbd defaults to no caching. If write through caching cannot
 + * be set up, fall back to no caching.
 + */
 +if (flags  BDRV_O_NOCACHE) {
 +rados_conf_set(s-cluster, rbd_cache, false);
 +} else {
 +rados_conf_set(s-cluster, rbd_cache, true);
 +if (!(flags  BDRV_O_CACHE_WB)) {
 +r = rados_conf_set(s-cluster, rbd_cache_max_dirty, 0);
 +if (r  0) {
 +rados_conf_set(s-cluster, rbd_cache, false);
 +}
 +}
 +}

Last time I looked at ceph, rbd_flush was not a full flush of the cache;
it only ensured that the pending requests were sent.  So my questions are:

1) has this changed?  does rbd_flush now flush dirty items when
rbd_cache_max_dirty  0?

2) should the usage of a cache be conditional on LIBRBD_VERSION_CODE =
LIBRBD_VERSION(0, 1, 1)?

Thanks,

Paolo

Re: [Qemu-devel] [PATCH] Prevent disk data loss when closing qemu

2012-05-22 Thread Paolo Bonzini

Il 22/05/2012 10:51, Kevin Wolf ha scritto:
 Am 16.05.2012 12:16, schrieb Pavel Dovgaluk:
 I use qemu under Windows and it has two windows when executes - console and 
 SDL ones.
 When I close SDL window main loop function terminates correctly, and when I 
 close 
 console window to terminate qemu then the code after main loop is not 
 executed.
 
 Is there no way to catch this case and use the regular shutdown
 mechanism there as well?
 
 I'm not against your patch and it's probably 1.1 material, but there may
 be more shutdown logic that we're missing if you close the console window.

Looks like qemu_ctrl_handler (in os-win32.c) needs to do something like
termsig_handler (in os-posix.c) instead of a plain exit.

Paolo

Re: [Qemu-devel] [RFC:kvm] export host NUMA info to guest make emulated device NUMA attr

2012-05-22 Thread Liu ping fan

On Sat, May 19, 2012 at 12:14 AM, Shirley Ma mashi...@us.ibm.com wrote:
 On Thu, 2012-05-17 at 17:20 +0800, Liu Ping Fan wrote:
 Currently, the guest can not know the NUMA info of the vcpu, which
 will
 result in performance drawback.

 This is the discovered and experiment by
         Shirley Ma x...@us.ibm.com
         Krishna Kumar krkum...@in.ibm.com
         Tom Lendacky t...@us.ibm.com
 Refer to -
 http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html
 we can see the big perfermance gap between NUMA aware and unaware.

 Enlightened by their discovery, I think, we can do more work -- that
 is to
 export NUMA info of host to guest.

 There three problems we've found:

 1. KVM doesn't support NUMA load balancer. Even there are no other
 workloads in the system, and the number of vcpus on the guest is smaller
 than the number of cpus per node, the vcpus could be scheduled on
 different nodes.

 Someone is working on in-kernel solution. Andrew Theurer has a working
 user-space NUMA aware VM balancer, it requires libvirt and cgroups
 (which is default for RHEL6 systems).

Interesting, and I found that sched/numa: Introduce
sys_numa_{t,m}bind() committed by Peter and Ingo may help.
But I think from the guest view, it can not tell whether the two vcpus
are on the same host node. For example,
vcpu-a in node-A is not vcpu-b in node-B, the guest lb will be more
expensive if it pull_task from vcpu-a and
choose vcpu-b to push.  And my idea is to export such info to guest,
still working on it.


 2. The host scheduler is not aware the relationship between guest vCPUs
 and vhost. So it's possible for host scheduler to schedule per-device
 vhost thread on the same cpu on which the vCPU kick a TX packet, or
 schecule vhost thread on different node than the vCPU for; For RX packet
 it's possible for vhost delivers RX packet on the vCPU running on
 different node too.

Yes. I notice this point in your original patch.

 3. per-device vhost thread is not scaled.

What about the scale-ability of per-vm * host_NUMA_NODE? When we make
advantage of multi-core,  we produce mulit vcpu threads for one VM.
So what about the emulated device? Is it acceptable to scale to take
advantage of host NUMA attr.  After all, how many nodes on which the
VM
can be run on are the user's control.  It is a balance of
scale-ability and performance.

 So the problems are in host scheduling and vhost thread scalability. I
 am not sure how much help from exposing NUMA info from host to guest.

 Have you tested these patched? How much performance gain here?

Sorry, not yet.  As you have mentioned, the vhost thread scalability
is a big problem. So I want to see others' opinion before going on.

Thanks and regards,
pingfan


 Thanks
 Shirley

 So here comes the idea:
 1. export host numa info through guest's sched domain to its scheduler
   Export vcpu's NUMA info to guest scheduler(I think mem NUMA problem
   has been handled by host).  So the guest's lb will consider the
 cost.
   I am still working on this, and my original idea is to export these
 info
   through static struct sched_domain_topology_level
 *sched_domain_topology
   to guest.

 2. Do a better emulation of virt mach exported to guest.
   In real world, the devices are limited by kinds of reasons to own
 the NUMA
   property. But as to Qemu, the device is emulated by thread, which
 inherit
   the NUMA attr in nature.  We can implement the device as components
 of many
   logic units, each of the unit is backed by a thread in different
 host node.
   Currently, I want to start the work on vhost. But I think, maybe in
   future, the iothread in Qemu can also has such attr.


 Forgive me, for the limited time, I can not have more better
 understand of
 vhost/virtio_net drivers. These patches are just draft, _FAR_, _FAR_
 from work.
 I will do more detail work for them in future.

 To easy the review, the following is the sum up of the 2nd point of
 the idea.
 As for the 1st point of the idea, it is not reflected in the patches.

 --spread/shrink the vhost_workers over the host nodes as demanded from
 Qemu.
   And we can consider each vhost_worker as an independent net logic
 device
   embeded in physical device vhost_net.  At the meanwhile, we spread
 vcpu
   threads over the host node.
   The vrings on guest are allocated PAGE_SIZE align separately, so
 they can
   will only be mapped into different host node, so vhost_worker in the
 same
   node can access it with the least cost. So does the vq on guest.

 --virtio_net driver will changes and talk with the logic device. And
 which
   logic device it will talk to is determined by on which vcpu it is
 scheduled.

 --the binding of vcpus and vhost_worker is implemented by:
   for call direction, vq-a in the node-A will have a dedicated irq-a.
 And
   we set the irq-a's affinity to vcpus in node-A.
   for kick direction, kick register-b trigger different eventfd-b
 which wake up
   vhost_worker-b.

Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes

2012-05-22 Thread Paolo Bonzini

Il 22/05/2012 11:15, ronnie sahlberg ha scritto:
 Hi,
 
 Now that I see what happens, I can easily workaround this in block/iscsi.c
 by the patch below, but I dont know if this is the right thing to do.
 
 It does appear that here,  when I use qemu_set_fd_handler() and add a
 handler for writeble it takes 55ms before the event system notices
 this and reacts.
 
 
 
 diff --git a/block/iscsi.c b/block/iscsi.c
 index d37c4ee..1ebff0f 100644
 --- a/block/iscsi.c
 +++ b/block/iscsi.c
 @@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun)
  {
  struct iscsi_context *iscsi = iscsilun-iscsi;
 
 +if (iscsi_which_events(iscsi)  POLLOUT) {
 +iscsi_process_write(iscsilun);
 +}
 +
  qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read,
 (iscsi_which_events(iscsi)  POLLOUT)
 ? iscsi_process_write : NULL,

Doh, now I remember.  Whenever you change the aio handlers you need to
call qemu_notify_event() afterwards, if the handler may fire right away.
 The alternative is your patch, which is correct.  Can you resend it
with SoB?

Paolo

Re: [Qemu-devel] [PATCH 1/3] envlist.c: handle strdup failure

2012-05-22 Thread Kevin Wolf

Am 22.05.2012 11:05, schrieb Jim Meyering:
 Kevin Wolf wrote:
 A patch replacing tabs by spaces isn't really the kind of patches that
 we would want to avoid during freeze. It's easy enough to check with git
 diff -w that it doesn't change anything semantically.
 
 That makes sense, so I've posted two patches:
 
 1) two patches: one replacing all leading TABs with equivalent spaces,
   then the above patch
 
 Note however, that envlist.c must predate checkpatch.pl, since
 that patch provokes numerous warnings about style issues:

Quite possible. Most files in qemu will still contain some violations.

 (hmm... I've just noticed these first two, which seem to suggest
 that even for comments I should remove the TABs.  Is it worth
 a V2 to fix those two lines?  Or are they false positives, since
 they're not really code indentation?  )

We don't use tabs at all where possible. There shouldn't be much more
exceptions than the Makefiles.

Kevin

Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes

2012-05-22 Thread ronnie sahlberg

On Tue, May 22, 2012 at 7:29 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 22/05/2012 11:15, ronnie sahlberg ha scritto:
 Hi,

 Now that I see what happens, I can easily workaround this in block/iscsi.c
 by the patch below, but I dont know if this is the right thing to do.

 It does appear that here,  when I use qemu_set_fd_handler() and add a
 handler for writeble it takes 55ms before the event system notices
 this and reacts.



 diff --git a/block/iscsi.c b/block/iscsi.c
 index d37c4ee..1ebff0f 100644
 --- a/block/iscsi.c
 +++ b/block/iscsi.c
 @@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun)
  {
      struct iscsi_context *iscsi = iscsilun-iscsi;

 +    if (iscsi_which_events(iscsi)  POLLOUT) {
 +        iscsi_process_write(iscsilun);
 +    }
 +
      qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read,
                             (iscsi_which_events(iscsi)  POLLOUT)
                             ? iscsi_process_write : NULL,

 Doh, now I remember.  Whenever you change the aio handlers you need to
 call qemu_notify_event() afterwards, if the handler may fire right away.

Thanks.  I just confirmed that qemu_notify_event() fixes the issue.
Ill send a patch that uses qemu_notify_event() and a comment why this is needed.


regards
ronnie sahlberg

[Qemu-devel] [PATCHv2 0/2] envlist.c: handle strdup failure

2012-05-22 Thread Jim Meyering

From: Jim Meyering meyer...@redhat.com

This is the same as v1, except that two lines of non-leading TABs
in envlist.c (indenting comments after code) have also been converted
to use equivalent spaces instead of TABs.

Jim Meyering (2):
  envlist.c: convert all TABs to equivalent spaces
  envlist.c: handle strdup failure

 envlist.c | 272 --
 1 file changed, 140 insertions(+), 132 deletions(-)

--
1.7.10.2.552.gaa3bb87

[Qemu-devel] [PATCHv2 2/2] envlist.c: handle strdup failure

2012-05-22 Thread Jim Meyering

From: Jim Meyering meyer...@redhat.com

Without this, envlist_to_environ may silently fail to copy all
strings into the destination buffer, and both callers would leak
any env strings allocated after a failing strdup, because the
freeing code stops at the first NULL pointer.

Signed-off-by: Jim Meyering meyer...@redhat.com
---
 envlist.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/envlist.c b/envlist.c
index be0addb..7532091 100644
--- a/envlist.c
+++ b/envlist.c
@@ -234,8 +234,16 @@ envlist_to_environ(const envlist_t *envlist, size_t *count)
 return (NULL);

 for (entry = envlist-el_entries.lh_first; entry != NULL;
-entry = entry-ev_link.le_next) {
-*(penv++) = strdup(entry-ev_var);
+ entry = entry-ev_link.le_next, penv++) {
+*penv = strdup(entry-ev_var);
+if (*penv == NULL) {
+char **e = env;
+while (e = penv) {
+free(*e++);
+}
+free(env);
+return NULL;
+}
 }
 *penv = NULL; /* NULL terminate the list */

-- 
1.7.10.2.552.gaa3bb87

[Qemu-devel] [PATCHv2 1/2] envlist.c: convert all TABs to equivalent spaces

2012-05-22 Thread Jim Meyering

From: Jim Meyering meyer...@redhat.com


Signed-off-by: Jim Meyering meyer...@redhat.com
---
 envlist.c | 256 +++---
 1 file changed, 128 insertions(+), 128 deletions(-)

diff --git a/envlist.c b/envlist.c
index f2303cd..be0addb 100644
--- a/envlist.c
+++ b/envlist.c
@@ -8,13 +8,13 @@
 #include envlist.h

 struct envlist_entry {
-   const char *ev_var; /* actual env value */
-   QLIST_ENTRY(envlist_entry) ev_link;
+const char *ev_var; /* actual env value */
+QLIST_ENTRY(envlist_entry) ev_link;
 };

 struct envlist {
-   QLIST_HEAD(, envlist_entry) el_entries; /* actual entries */
-   size_t el_count;/* number of entries */
+QLIST_HEAD(, envlist_entry) el_entries; /* actual entries */
+size_t el_count;/* number of entries */
 };

 static int envlist_parse(envlist_t *envlist,
@@ -27,15 +27,15 @@ static int envlist_parse(envlist_t *envlist,
 envlist_t *
 envlist_create(void)
 {
-   envlist_t *envlist;
+envlist_t *envlist;

-   if ((envlist = malloc(sizeof (*envlist))) == NULL)
-   return (NULL);
+if ((envlist = malloc(sizeof (*envlist))) == NULL)
+return (NULL);

-   QLIST_INIT(envlist-el_entries);
-   envlist-el_count = 0;
+QLIST_INIT(envlist-el_entries);
+envlist-el_count = 0;

-   return (envlist);
+return (envlist);
 }

 /*
@@ -44,18 +44,18 @@ envlist_create(void)
 void
 envlist_free(envlist_t *envlist)
 {
-   struct envlist_entry *entry;
+struct envlist_entry *entry;

-   assert(envlist != NULL);
+assert(envlist != NULL);

-   while (envlist-el_entries.lh_first != NULL) {
-   entry = envlist-el_entries.lh_first;
-   QLIST_REMOVE(entry, ev_link);
+while (envlist-el_entries.lh_first != NULL) {
+entry = envlist-el_entries.lh_first;
+QLIST_REMOVE(entry, ev_link);

-   free((char *)entry-ev_var);
-   free(entry);
-   }
-   free(envlist);
+free((char *)entry-ev_var);
+free(entry);
+}
+free(envlist);
 }

 /*
@@ -72,7 +72,7 @@ envlist_free(envlist_t *envlist)
 int
 envlist_parse_set(envlist_t *envlist, const char *env)
 {
-   return (envlist_parse(envlist, env, envlist_setenv));
+return (envlist_parse(envlist, env, envlist_setenv));
 }

 /*
@@ -84,7 +84,7 @@ envlist_parse_set(envlist_t *envlist, const char *env)
 int
 envlist_parse_unset(envlist_t *envlist, const char *env)
 {
-   return (envlist_parse(envlist, env, envlist_unsetenv));
+return (envlist_parse(envlist, env, envlist_unsetenv));
 }

 /*
@@ -97,32 +97,32 @@ static int
 envlist_parse(envlist_t *envlist, const char *env,
 int (*callback)(envlist_t *, const char *))
 {
-   char *tmpenv, *envvar;
-   char *envsave = NULL;
-
-   assert(callback != NULL);
-
-   if ((envlist == NULL) || (env == NULL))
-   return (EINVAL);
-
-   /*
-* We need to make temporary copy of the env string
-* as strtok_r(3) modifies it while it tokenizes.
-*/
-   if ((tmpenv = strdup(env)) == NULL)
-   return (errno);
-
-   envvar = strtok_r(tmpenv, ,, envsave);
-   while (envvar != NULL) {
-   if ((*callback)(envlist, envvar) != 0) {
-   free(tmpenv);
-   return (errno);
-   }
-   envvar = strtok_r(NULL, ,, envsave);
-   }
-
-   free(tmpenv);
-   return (0);
+char *tmpenv, *envvar;
+char *envsave = NULL;
+
+assert(callback != NULL);
+
+if ((envlist == NULL) || (env == NULL))
+return (EINVAL);
+
+/*
+ * We need to make temporary copy of the env string
+ * as strtok_r(3) modifies it while it tokenizes.
+ */
+if ((tmpenv = strdup(env)) == NULL)
+return (errno);
+
+envvar = strtok_r(tmpenv, ,, envsave);
+while (envvar != NULL) {
+if ((*callback)(envlist, envvar) != 0) {
+free(tmpenv);
+return (errno);
+}
+envvar = strtok_r(NULL, ,, envsave);
+}
+
+free(tmpenv);
+return (0);
 }

 /*
@@ -134,46 +134,46 @@ envlist_parse(envlist_t *envlist, const char *env,
 int
 envlist_setenv(envlist_t *envlist, const char *env)
 {
-   struct envlist_entry *entry = NULL;
-   const char *eq_sign;
-   size_t envname_len;
-
-   if ((envlist == NULL) || (env == NULL))
-   return (EINVAL);
-
-   /* find out first equals sign in given env */
-   if ((eq_sign = strchr(env, '=')) == NULL)
-   return (EINVAL);
-   envname_len = eq_sign - env + 1;
-
-   /*
-* If there already

Re: [Qemu-devel] [PATCH 1/3] envlist.c: handle strdup failure

2012-05-22 Thread Jim Meyering

Kevin Wolf wrote:

 Am 22.05.2012 11:05, schrieb Jim Meyering:
 Kevin Wolf wrote:
 A patch replacing tabs by spaces isn't really the kind of patches that
 we would want to avoid during freeze. It's easy enough to check with git
 diff -w that it doesn't change anything semantically.

 That makes sense, so I've posted two patches:

 1) two patches: one replacing all leading TABs with equivalent spaces,
   then the above patch

 Note however, that envlist.c must predate checkpatch.pl, since
 that patch provokes numerous warnings about style issues:

 Quite possible. Most files in qemu will still contain some violations.

 (hmm... I've just noticed these first two, which seem to suggest
 that even for comments I should remove the TABs.  Is it worth
 a V2 to fix those two lines?  Or are they false positives, since
 they're not really code indentation?  )

 We don't use tabs at all where possible. There shouldn't be much more
 exceptions than the Makefiles.

Sounds like a v2 would be welcome.
Here you go...

Re: [Qemu-devel] qemu-microblaze-system

2012-05-22 Thread Michael Trimarchi

Hi Peter

On 05/22/2012 07:45 AM, Peter Crosthwaite wrote:
 Hi Michael,
 
 The microblaze linux kernel is available at:
 
 http://wiki.xilinx.com/
 
 If you wish to do your own kernel config from scratch. There is some
 brief documentation there on how to do a bringup.
 
 The qemu port for mb maintained by PetaLogix and I notice you are
 using a student email domain, so you likely eligible for a free
 academic license of the petalogix tools which automate the board
 bringup/kernel config etc, (and it will include our version of QEMU
 which can boot a lot more FPGA-microblaze platforms than just the
 ml605 ans s3adsp):

Yes I know, but what is the difference with the mainline qemu?

I'm working for an European Project that use microblaze and multiple pVex cores.
I have already done a driver that can use the rVex throught microblaze but
we need to do some simulation on qemu

 
 www.petalogix.com
 
 I have a Xilinx XPS project that exactly matches the ML605 qemu
 platform that I can send you as well, and I can also send you raw
 kernel config if that helps?

Yes, this can definitely help. Can you send both?

Regards
Michael



 
 Peter
 
 On Mon, May 14, 2012 at 7:42 PM, Michael Trimarchi
 trimar...@gandalf.sssup.it wrote:
 Hi all,

 I would like to start using/testing the qemu microblaze system, but I don't 
 find information about linux kernel configuration that match the ml605 
 hardware design? Is there any howto?

 Michael

Re: [Qemu-devel] [PATCHv2 1/2] envlist.c: convert all TABs to equivalent spaces

2012-05-22 Thread Peter Maydell

On 22 May 2012 10:50, Jim Meyering j...@meyering.net wrote:
 From: Jim Meyering meyer...@redhat.com

If we're going to go to the effort of a complete reindent
patch we should actually reindent to the QEMU coding style
standard, which is four-space, not eight.

-- PMM

[Qemu-devel] [PATCH] ISCSI: We need to call qemu_notify_event() everytime we update which events we need to be notified for.

2012-05-22 Thread Ronnie Sahlberg

Otherwise, If we add an event for -is-writeable but the socket is already 
writeable there may be a short delay before the event callback is actually 
triggered.

Those delays would in particular hurt performance during BIOS boot and when the 
GRUB bootloader reads the kernel and initrd.

Signed-off-by: Ronnie Sahlberg ronniesahlb...@gmail.com
---
 block/iscsi.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index d37c4ee..f956824 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -109,6 +109,13 @@ iscsi_set_events(IscsiLun *iscsilun)
(iscsi_which_events(iscsi)  POLLOUT)
? iscsi_process_write : NULL,
iscsi_process_flush, iscsilun);
+
+/* If we just added the event for writeable we must call
+   and the socket is already writeable the callback might
+   not be invoked until after a short delay unless we call
+   qemu_notify_event().
+ */
+qemu_notify_event();
 }
 
 static void
-- 
1.7.3.1

Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes

2012-05-22 Thread ronnie sahlberg

None of the other drivers in block/*.c call qemu_notify_event()

Do you you think those should be audited and have this call added to
where required ?


regards
ronnie sahlberg

On Tue, May 22, 2012 at 7:48 PM, ronnie sahlberg
ronniesahlb...@gmail.com wrote:
 On Tue, May 22, 2012 at 7:29 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 22/05/2012 11:15, ronnie sahlberg ha scritto:
 Hi,

 Now that I see what happens, I can easily workaround this in block/iscsi.c
 by the patch below, but I dont know if this is the right thing to do.

 It does appear that here,  when I use qemu_set_fd_handler() and add a
 handler for writeble it takes 55ms before the event system notices
 this and reacts.



 diff --git a/block/iscsi.c b/block/iscsi.c
 index d37c4ee..1ebff0f 100644
 --- a/block/iscsi.c
 +++ b/block/iscsi.c
 @@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun)
  {
      struct iscsi_context *iscsi = iscsilun-iscsi;

 +    if (iscsi_which_events(iscsi)  POLLOUT) {
 +        iscsi_process_write(iscsilun);
 +    }
 +
      qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read,
                             (iscsi_which_events(iscsi)  POLLOUT)
                             ? iscsi_process_write : NULL,

 Doh, now I remember.  Whenever you change the aio handlers you need to
 call qemu_notify_event() afterwards, if the handler may fire right away.

 Thanks.  I just confirmed that qemu_notify_event() fixes the issue.
 Ill send a patch that uses qemu_notify_event() and a comment why this is 
 needed.


 regards
 ronnie sahlberg

Re: [Qemu-devel] Current differences between qemu --enable-kvm and qemu-kvm?

2012-05-22 Thread Stefan Hajnoczi

On Mon, May 21, 2012 at 5:54 PM, Erik Rull webmas...@rdsoftware.de wrote:
 is there a summary existing that shows up the rough or actual differences
 between qemu --enable-kvm and qemu-kvm? I tested both versions with the same
 compile and start options, the CPU performance results are identical, only
 the bootup time of my guest system with qemu-kvm seemed to be a bit faster
 (not measured, it just feeled so).

For production KVM instances I think it still makes sense to use
qemu-kvm packages from your distro or qemu-kvm upstream source.

Jan Kiszka has reduced the delta between qemu.git and qemu-kvm.git to
the point where I think the list of differences is rather small -
maybe PCI passthrough stuff, irqfd for vhost-net (which is now also
being upstreamed into qemu.git), and a few other things I don't know
of.

For development most patches should be against qemu.git unless they
have a dependency on qemu-kvm.git code.

Stefan

[Qemu-devel] [PATCH] ISCSI: Switch to using READ16/WRITE16 for I/O to the LUN.

2012-05-22 Thread Ronnie Sahlberg

This allows using LUNs bigger than 2TB.

Signed-off-by: Ronnie Sahlberg ronniesahlb...@gmail.com
---
 block/iscsi.c |  101 
 trace-events  |4 +-
 2 files changed, 74 insertions(+), 31 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index f956824..ed1ad7b 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -25,6 +25,7 @@
 #include config-host.h
 
 #include poll.h
+#include arpa/inet.h
 #include qemu-common.h
 #include qemu-error.h
 #include block_int.h
@@ -168,12 +169,12 @@ iscsi_readv_writev_bh_cb(void *p)
 
 
 static void
-iscsi_aio_write10_cb(struct iscsi_context *iscsi, int status,
+iscsi_aio_write16_cb(struct iscsi_context *iscsi, int status,
  void *command_data, void *opaque)
 {
 IscsiAIOCB *acb = opaque;
 
-trace_iscsi_aio_write10_cb(iscsi, status, acb, acb-canceled);
+trace_iscsi_aio_write16_cb(iscsi, status, acb, acb-canceled);
 
 g_free(acb-buf);
 
@@ -186,7 +187,7 @@ iscsi_aio_write10_cb(struct iscsi_context *iscsi, int 
status,
 
 acb-status = 0;
 if (status  0) {
-error_report(Failed to write10 data to iSCSI lun. %s,
+error_report(Failed to write16 data to iSCSI lun. %s,
  iscsi_get_error(iscsi));
 acb-status = -EIO;
 }
@@ -211,12 +212,9 @@ iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num,
 struct iscsi_context *iscsi = iscsilun-iscsi;
 IscsiAIOCB *acb;
 size_t size;
-int fua = 0;
-
-/* set FUA on writes when cache mode is write through */
-if (!(bs-open_flags  BDRV_O_CACHE_WB)) {
-fua = 1;
-}
+uint32_t num_sectors;
+uint64_t lba;
+struct iscsi_data data;
 
 acb = qemu_aio_get(iscsi_aio_pool, bs, cb, opaque);
 trace_iscsi_aio_writev(iscsi, sector_num, nb_sectors, opaque, acb);
@@ -226,18 +224,44 @@ iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num,
 
 acb-canceled   = 0;
 
-/* XXX we should pass the iovec to write10 to avoid the extra copy */
+/* XXX we should pass the iovec to write16 to avoid the extra copy */
 /* this will allow us to get rid of 'buf' completely */
 size = nb_sectors * BDRV_SECTOR_SIZE;
 acb-buf = g_malloc(size);
 qemu_iovec_to_buffer(acb-qiov, acb-buf);
-acb-task = iscsi_write10_task(iscsi, iscsilun-lun, acb-buf, size,
-  sector_qemu2lun(sector_num, iscsilun),
-  fua, 0, iscsilun-block_size,
-  iscsi_aio_write10_cb, acb);
+
+
+acb-task = malloc(sizeof(struct scsi_task));
 if (acb-task == NULL) {
-error_report(iSCSI: Failed to send write10 command. %s,
- iscsi_get_error(iscsi));
+error_report(iSCSI: Failed to allocate task for scsi WRITE16 
+ command. %s, iscsi_get_error(iscsi));
+qemu_aio_release(acb);
+return NULL;
+}
+memset(acb-task, 0, sizeof(struct scsi_task));
+
+acb-task-xfer_dir = SCSI_XFER_WRITE;
+acb-task-cdb_size = 16;
+acb-task-cdb[0] = 0x8a;
+if (!(bs-open_flags  BDRV_O_CACHE_WB)) {
+/* set FUA on writes when cache mode is write through */
+acb-task-cdb[1] |= 0x04;
+}
+lba = sector_qemu2lun(sector_num, iscsilun);
+*(uint32_t *)acb-task-cdb[2]  = htonl(lba  32);
+*(uint32_t *)acb-task-cdb[6]  = htonl(lba  0x);
+num_sectors = size / iscsilun-block_size;
+*(uint32_t *)acb-task-cdb[10] = htonl(num_sectors);
+acb-task-expxferlen = size;
+
+data.data = acb-buf;
+data.size = size;
+
+if (iscsi_scsi_command_async(iscsi, iscsilun-lun, acb-task,
+ iscsi_aio_write16_cb,
+ data,
+ acb) != 0) {
+scsi_free_scsi_task(acb-task);
 g_free(acb-buf);
 qemu_aio_release(acb);
 return NULL;
@@ -249,12 +273,12 @@ iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num,
 }
 
 static void
-iscsi_aio_read10_cb(struct iscsi_context *iscsi, int status,
+iscsi_aio_read16_cb(struct iscsi_context *iscsi, int status,
 void *command_data, void *opaque)
 {
 IscsiAIOCB *acb = opaque;
 
-trace_iscsi_aio_read10_cb(iscsi, status, acb, acb-canceled);
+trace_iscsi_aio_read16_cb(iscsi, status, acb, acb-canceled);
 
 if (acb-canceled != 0) {
 qemu_aio_release(acb);
@@ -265,7 +289,7 @@ iscsi_aio_read10_cb(struct iscsi_context *iscsi, int status,
 
 acb-status = 0;
 if (status != 0) {
-error_report(Failed to read10 data from iSCSI lun. %s,
+error_report(Failed to read16 data from iSCSI lun. %s,
  iscsi_get_error(iscsi));
 acb-status = -EIO;
 }
@@ -284,8 +308,10 @@ iscsi_aio_readv(BlockDriverState *bs, int64_t sector_num,
 IscsiLun *iscsilun = bs-opaque;
 struct iscsi_context *iscsi = iscsilun-iscsi;
 IscsiAIOCB *acb;
-size_t qemu_read_size, lun_read_size;

Re: [Qemu-devel] [PATCH v3 0/8] msi: Refactorings and reset fixes

2012-05-22 Thread Alexander Graf


On 11.05.2012, at 16:42, Jan Kiszka wrote:

 v3 is v2 - xhci changes as MSI is instable there anyway. So, only
 patches 1 and 2 are for 1.1/stable now.

Just saw these in my inbox as unanswered, so mst: ping? :)


Alex

[Qemu-devel] [PATCHv3 0/2] envlist.c: handle strdup failure

2012-05-22 Thread Jim Meyering

From: Jim Meyering meyer...@redhat.com

Same as v2, but now with TABs converted using expand --tabs=4.

Jim Meyering (2):
  envlist.c: convert each TAB(width-4) to equivalent spaces
  envlist.c: handle strdup failure

 envlist.c | 272 --
 1 file changed, 140 insertions(+), 132 deletions(-)

--
1.7.10.2.552.gaa3bb87

[Qemu-devel] [PATCHv3 1/2] envlist.c: convert each TAB(width-4) to equivalent spaces

2012-05-22 Thread Jim Meyering

From: Jim Meyering meyer...@redhat.com


Signed-off-by: Jim Meyering meyer...@redhat.com
---
 envlist.c | 256 +++---
 1 file changed, 128 insertions(+), 128 deletions(-)

diff --git a/envlist.c b/envlist.c
index f2303cd..e44889b 100644
--- a/envlist.c
+++ b/envlist.c
@@ -8,13 +8,13 @@
 #include envlist.h

 struct envlist_entry {
-   const char *ev_var; /* actual env value */
-   QLIST_ENTRY(envlist_entry) ev_link;
+const char *ev_var; /* actual env value */
+QLIST_ENTRY(envlist_entry) ev_link;
 };

 struct envlist {
-   QLIST_HEAD(, envlist_entry) el_entries; /* actual entries */
-   size_t el_count;/* number of entries */
+QLIST_HEAD(, envlist_entry) el_entries; /* actual entries */
+size_t el_count;/* number of entries */
 };

 static int envlist_parse(envlist_t *envlist,
@@ -27,15 +27,15 @@ static int envlist_parse(envlist_t *envlist,
 envlist_t *
 envlist_create(void)
 {
-   envlist_t *envlist;
+envlist_t *envlist;

-   if ((envlist = malloc(sizeof (*envlist))) == NULL)
-   return (NULL);
+if ((envlist = malloc(sizeof (*envlist))) == NULL)
+return (NULL);

-   QLIST_INIT(envlist-el_entries);
-   envlist-el_count = 0;
+QLIST_INIT(envlist-el_entries);
+envlist-el_count = 0;

-   return (envlist);
+return (envlist);
 }

 /*
@@ -44,18 +44,18 @@ envlist_create(void)
 void
 envlist_free(envlist_t *envlist)
 {
-   struct envlist_entry *entry;
+struct envlist_entry *entry;

-   assert(envlist != NULL);
+assert(envlist != NULL);

-   while (envlist-el_entries.lh_first != NULL) {
-   entry = envlist-el_entries.lh_first;
-   QLIST_REMOVE(entry, ev_link);
+while (envlist-el_entries.lh_first != NULL) {
+entry = envlist-el_entries.lh_first;
+QLIST_REMOVE(entry, ev_link);

-   free((char *)entry-ev_var);
-   free(entry);
-   }
-   free(envlist);
+free((char *)entry-ev_var);
+free(entry);
+}
+free(envlist);
 }

 /*
@@ -72,7 +72,7 @@ envlist_free(envlist_t *envlist)
 int
 envlist_parse_set(envlist_t *envlist, const char *env)
 {
-   return (envlist_parse(envlist, env, envlist_setenv));
+return (envlist_parse(envlist, env, envlist_setenv));
 }

 /*
@@ -84,7 +84,7 @@ envlist_parse_set(envlist_t *envlist, const char *env)
 int
 envlist_parse_unset(envlist_t *envlist, const char *env)
 {
-   return (envlist_parse(envlist, env, envlist_unsetenv));
+return (envlist_parse(envlist, env, envlist_unsetenv));
 }

 /*
@@ -97,32 +97,32 @@ static int
 envlist_parse(envlist_t *envlist, const char *env,
 int (*callback)(envlist_t *, const char *))
 {
-   char *tmpenv, *envvar;
-   char *envsave = NULL;
-
-   assert(callback != NULL);
-
-   if ((envlist == NULL) || (env == NULL))
-   return (EINVAL);
-
-   /*
-* We need to make temporary copy of the env string
-* as strtok_r(3) modifies it while it tokenizes.
-*/
-   if ((tmpenv = strdup(env)) == NULL)
-   return (errno);
-
-   envvar = strtok_r(tmpenv, ,, envsave);
-   while (envvar != NULL) {
-   if ((*callback)(envlist, envvar) != 0) {
-   free(tmpenv);
-   return (errno);
-   }
-   envvar = strtok_r(NULL, ,, envsave);
-   }
-
-   free(tmpenv);
-   return (0);
+char *tmpenv, *envvar;
+char *envsave = NULL;
+
+assert(callback != NULL);
+
+if ((envlist == NULL) || (env == NULL))
+return (EINVAL);
+
+/*
+ * We need to make temporary copy of the env string
+ * as strtok_r(3) modifies it while it tokenizes.
+ */
+if ((tmpenv = strdup(env)) == NULL)
+return (errno);
+
+envvar = strtok_r(tmpenv, ,, envsave);
+while (envvar != NULL) {
+if ((*callback)(envlist, envvar) != 0) {
+free(tmpenv);
+return (errno);
+}
+envvar = strtok_r(NULL, ,, envsave);
+}
+
+free(tmpenv);
+return (0);
 }

 /*
@@ -134,46 +134,46 @@ envlist_parse(envlist_t *envlist, const char *env,
 int
 envlist_setenv(envlist_t *envlist, const char *env)
 {
-   struct envlist_entry *entry = NULL;
-   const char *eq_sign;
-   size_t envname_len;
-
-   if ((envlist == NULL) || (env == NULL))
-   return (EINVAL);
-
-   /* find out first equals sign in given env */
-   if ((eq_sign = strchr(env, '=')) == NULL)
-   return (EINVAL);
-   envname_len = eq_sign - env + 1;
-
-   /*
-* If there already exists variable with given name
-* we remove and release it before allocating a whole
-* new entry.
-*/
-   for (entry = envlist-el_entries.lh_first; entry != NULL;
-   entry = entry-ev_link.le_next) {
-

Re: [Qemu-devel] [PATCH v3 3/8] ahci: Clean up reset functions

2012-05-22 Thread Alexander Graf


On 11.05.2012, at 16:42, Jan Kiszka wrote:

 Properly register reset functions via the device class.
 
 CC: Alexander Graf ag...@suse.de
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Looks good to me, but I'm no infrastructure expert :)

Reviewed-by: Alexander Graf ag...@suse.de


Alex

[Qemu-devel] [PATCHv2 2/9] tcg: __jit_debug_descriptor must not be static

2012-05-22 Thread Jim Meyering

From: Jim Meyering meyer...@redhat.com

Add comments so no one else will be tempted to reduce the scope
of this global variable.

Signed-off-by: Jim Meyering meyer...@redhat.com
---

 tcg/tcg.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index ab589c7..2793fa6 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2264,7 +2264,9 @@ void tcg_dump_info(FILE *f, fprintf_function cpu_fprintf)
(3) Call tcg_register_jit_int, with the constructed .debug_frame.
 */

-/* Begin GDB interface.  THE FOLLOWING MUST MATCH GDB DOCS.  */
+/* Begin GDB interface.  THE FOLLOWING MUST MATCH GDB DOCS:
+   http://sourceware.org/gdb/onlinedocs/gdb/Declarations.html
+*/
 typedef enum {
 JIT_NOACTION = 0,
 JIT_REGISTER_FN,
@@ -2291,8 +2293,10 @@ void __jit_debug_register_code(void)
 asm();
 }

-/* Must statically initialize the version, because GDB may check
-   the version before we can set it.  */
+/* We must initialize the version this way, because GDB may check
+   the version before we can set it.  This declaration must have
+   external scope.  If it were static, an aggressive compiler might
+   notice that we never read this symbol and remove it altogether. */
 struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };

 /* End GDB interface.  */
--
1.7.10.2.552.gaa3bb87

Re: [Qemu-devel] [PATCH 2/9] tcg: declare __jit_debug_descriptor to be static

2012-05-22 Thread Jim Meyering

Peter Maydell wrote:
 On 21 May 2012 21:10, Jim Meyering j...@meyering.net wrote:
 Peter Maydell wrote:
 On 21 May 2012 20:51, Jim Meyering j...@meyering.net wrote:
 From: Jim Meyering meyer...@redhat.com


 Signed-off-by: Jim Meyering meyer...@redhat.com
 ---
  tcg/tcg.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/tcg/tcg.c b/tcg/tcg.c
 index ab589c7..350fdad 100644
 --- a/tcg/tcg.c
 +++ b/tcg/tcg.c
 @@ -2293,7 +2293,7 @@ void __jit_debug_register_code(void)

  /* Must statically initialize the version, because GDB may check
    the version before we can set it.  */
 -struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };
 +static struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };

  /* End GDB interface.  */

 Nak. This symbol is global so that gdb can find it by fishing around
 in the executable's symbol table.

 Thanks for the quick feedback.

 How does the scope of the symbol affect whether gdb can find it?

 If you mark it 'static' the compiler can throw it away or completely
 rearrange it if it's feeling clever enough, I think.

 Anyway, we're following a prescribed interface here:
 http://sourceware.org/gdb/onlinedocs/gdb/Declarations.html

 and I don't think we should deviate from it. As the comment says,
 THE FOLLOWING MUST MATCH GDB DOCS..

 If declaring this variable static is not appropriate,
 then the comment saying that static initialization is desired
 should be changed.

 The comment means statically initialize this variable rather than
 doing it dynamically in some function at startup.

Thanks.  I've clarified the comments and posted a V2.

Re: [Qemu-devel] [PATCHv2 1/2] envlist.c: convert all TABs to equivalent spaces

2012-05-22 Thread Jim Meyering

Peter Maydell wrote:
 On 22 May 2012 10:50, Jim Meyering j...@meyering.net wrote:
 From: Jim Meyering meyer...@redhat.com

 If we're going to go to the effort of a complete reindent
 patch we should actually reindent to the QEMU coding style
 standard, which is four-space, not eight.

Good point.  V3 is on its way.

Re: [Qemu-devel] [PATCH v3 1/8] ahci: Fix reset of MSI function

2012-05-22 Thread Alexander Graf


On 11.05.2012, at 16:42, Jan Kiszka wrote:

 Call msi_reset on device reset as still required by the core.
 
 CC: Alexander Graf ag...@suse.de
 CC: qemu-sta...@nongnu.org
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Acked-by: Alexander Graf ag...@suse.de


Alex

Re: [Qemu-devel] Current differences between qemu --enable-kvm and qemu-kvm?

2012-05-22 Thread Jan Kiszka

On 2012-05-22 07:04, Stefan Hajnoczi wrote:
 On Mon, May 21, 2012 at 5:54 PM, Erik Rull webmas...@rdsoftware.de wrote:
 is there a summary existing that shows up the rough or actual differences
 between qemu --enable-kvm and qemu-kvm? I tested both versions with the same
 compile and start options, the CPU performance results are identical, only
 the bootup time of my guest system with qemu-kvm seemed to be a bit faster
 (not measured, it just feeled so).

Current upstream does not enable the in-kernel irqchip of KVM by
default. This should explain the difference in boot-up times. Try
-machine accel=kvm,kernel_irqchip=on. But the default will be on, just
like in qemu-kvm, once [1] is merged.

 
 For production KVM instances I think it still makes sense to use
 qemu-kvm packages from your distro or qemu-kvm upstream source.
 
 Jan Kiszka has reduced the delta between qemu.git and qemu-kvm.git to
 the point where I think the list of differences is rather small -
 maybe PCI passthrough stuff, irqfd for vhost-net (which is now also
 being upstreamed into qemu.git), and a few other things I don't know
 of.

Right, the list of differences is dramatically shrinking. As stated in
[2], soon only PCI passthrough and legacy interface dependencies on
qemu-kvm will be the remaining reasons to use it. If we are lucky, PCI
passthrough will also make it into upstream for QEMU 1.2, we are working
on this.

 
 For development most patches should be against qemu.git unless they
 have a dependency on qemu-kvm.git code.

Yes, unless you are working on the upstream merge itself, there is
practically no reason anymore to develop against qemu-kvm directly.

Jan

[1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/91171
[2] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/91026

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] [PATCH qom-next 0/5] target-i386: re-factor CPU creation/initialization to QOM

2012-05-22 Thread Igor Mammedov

Moving code related to CPU creation and initialization internal parts
from board level into apic and cpu objects will allow X86CPU to better
model QOM object life-cycle.
It will allow to create X86CPU as any other object by creating it with
object_new() then setting properties and then calling x86_cpu_realize()
to make it running. Later x86_cpu_realize() should become realize property.

git tree at: 
  https://github.com/imammedo/qemu/tree/x86-cpu-realize-v2

[Qemu-devel] [PATCH qom-next 2/5] target-i386: add cpu-model property to x86_cpu

2012-05-22 Thread Igor Mammedov

it's probably intermidiate step till cpu modeled as
sub-classes. After then we probably could drop it.

However it still could be used for overiding default
cpu subclasses definition, and probably renamed to
something like 'features'.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 cpu-defs.h   |2 +-
 hw/pc.c  |   10 --
 target-i386/cpu.c|   34 ++
 target-i386/helper.c |   16 
 4 files changed, 47 insertions(+), 15 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index f49e950..8f4623c 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -221,7 +221,7 @@ typedef struct CPUWatchpoint {
 struct QemuCond *halt_cond; \
 int thread_kicked;  \
 struct qemu_work_item *queued_work_first, *queued_work_last;\
-const char *cpu_model_str;  \
+char *cpu_model_str;\
 struct KVMState *kvm_state; \
 struct kvm_run *kvm_run;\
 int kvm_fd; \
diff --git a/hw/pc.c b/hw/pc.c
index 6ad1c0d..00d738d 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -939,7 +939,6 @@ static X86CPU *pc_new_cpu(const char *cpu_model)
 
 cpu = cpu_x86_init(cpu_model);
 if (cpu == NULL) {
-fprintf(stderr, Unable to find x86 CPU definition\n);
 exit(1);
 }
 env = cpu-env;
@@ -955,15 +954,6 @@ void pc_cpus_init(const char *cpu_model)
 {
 int i;
 
-/* init CPUs */
-if (cpu_model == NULL) {
-#ifdef TARGET_X86_64
-cpu_model = qemu64;
-#else
-cpu_model = qemu32;
-#endif
-}
-
 for(i = 0; i  smp_cpus; i++) {
 pc_new_cpu(cpu_model);
 }
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index c7c23bf..538892d 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1729,6 +1729,27 @@ static void mce_init(X86CPU *cpu)
 }
 }
 
+static char *x86_get_cpu_model(Object *obj, Error **errp)
+{
+X86CPU *cpu = X86_CPU(obj);
+CPUX86State *env = cpu-env;
+return g_strdup(env-cpu_model_str);
+}
+
+static void x86_set_cpu_model(Object *obj, const char *value, Error **errp)
+{
+X86CPU *cpu = X86_CPU(obj);
+CPUX86State *env = cpu-env;
+
+g_free((gpointer)env-cpu_model_str);
+env-cpu_model_str = g_strdup(value);
+
+if (cpu_x86_register(cpu, env-cpu_model_str)  0) {
+fprintf(stderr, Unable to find x86 CPU definition\n);
+error_set(errp, QERR_INVALID_PARAMETER_COMBINATION);
+}
+}
+
 void x86_cpu_realize(Object *obj, Error **errp)
 {
 X86CPU *cpu = X86_CPU(obj);
@@ -1769,7 +1790,20 @@ static void x86_cpu_initfn(Object *obj)
 x86_cpuid_get_tsc_freq,
 x86_cpuid_set_tsc_freq, NULL, NULL, NULL);
 
+object_property_add_str(obj, cpu-model,
+x86_get_cpu_model, x86_set_cpu_model, NULL);
+
 env-cpuid_apic_id = env-cpu_index;
+
+/* init various static tables used in TCG mode */
+if (tcg_enabled()  !inited) {
+inited = 1;
+optimize_flags_init();
+#ifndef CONFIG_USER_ONLY
+prev_debug_excp_handler =
+cpu_set_debug_excp_handler(breakpoint_handler);
+#endif
+}
 }
 
 static void x86_cpu_common_class_init(ObjectClass *oc, void *data)
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 94f95b7..6fc67a9 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1160,12 +1160,10 @@ int cpu_x86_get_descr_debug(CPUX86State *env, unsigned 
int selector,
 X86CPU *cpu_x86_init(const char *cpu_model)
 {
 X86CPU *cpu;
-CPUX86State *env;
+Error *errp = NULL;
 static int inited;
 
 cpu = X86_CPU(object_new(TYPE_X86_CPU));
-env = cpu-env;
-env-cpu_model_str = cpu_model;
 
 /* init various static tables used in TCG mode */
 if (tcg_enabled()  !inited) {
@@ -1176,7 +1174,17 @@ X86CPU *cpu_x86_init(const char *cpu_model)
 cpu_set_debug_excp_handler(breakpoint_handler);
 #endif
 }
-if (cpu_x86_register(cpu, cpu_model)  0) {
+
+if (cpu_model) {
+object_property_set_str(OBJECT(cpu), cpu_model, cpu-model, errp);
+} else {
+#ifdef TARGET_X86_64
+object_property_set_str(OBJECT(cpu), qemu64, cpu-model, errp);
+#else
+object_property_set_str(OBJECT(cpu), qemu32, cpu-model, errp);
+#endif
+}
+if (errp) {
 object_delete(OBJECT(cpu));
 return NULL;
 }
-- 
1.7.7.6

[Qemu-devel] [PATCH 02/16] qapi: introduce size type

2012-05-22 Thread Laszlo Ersek


Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 qapi/qapi-visit-core.h |4 
 qapi/qapi-visit-core.c |7 +++
 scripts/qapi.py|2 +-
 3 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/qapi/qapi-visit-core.h b/qapi/qapi-visit-core.h
index e850746..bab2fec 100644
--- a/qapi/qapi-visit-core.h
+++ b/qapi/qapi-visit-core.h
@@ -39,6 +39,9 @@ struct Visitor
   const char *kind, const char *name, Error **errp);
 
 void (*type_int)(Visitor *v, int64_t *obj, const char *name, Error **errp);
+/* visit_type_size() falls back to (*type_int)() if type_size is unset */
+void (*type_size)(Visitor *v, int64_t *obj, const char *name,
+  Error **errp);
 void (*type_bool)(Visitor *v, bool *obj, const char *name, Error **errp);
 void (*type_str)(Visitor *v, char **obj, const char *name, Error **errp);
 void (*type_number)(Visitor *v, double *obj, const char *name,
@@ -69,6 +72,7 @@ void visit_end_optional(Visitor *v, Error **errp);
 void visit_type_enum(Visitor *v, int *obj, const char *strings[],
  const char *kind, const char *name, Error **errp);
 void visit_type_int(Visitor *v, int64_t *obj, const char *name, Error **errp);
+void visit_type_size(Visitor *v, int64_t *obj, const char *name, Error **errp);
 void visit_type_bool(Visitor *v, bool *obj, const char *name, Error **errp);
 void visit_type_str(Visitor *v, char **obj, const char *name, Error **errp);
 void visit_type_number(Visitor *v, double *obj, const char *name, Error 
**errp);
diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
index df1ed5c..ea31cf5 100644
--- a/qapi/qapi-visit-core.c
+++ b/qapi/qapi-visit-core.c
@@ -95,6 +95,13 @@ void visit_type_int(Visitor *v, int64_t *obj, const char 
*name, Error **errp)
 }
 }
 
+void visit_type_size(Visitor *v, int64_t *obj, const char *name, Error **errp)
+{
+if (!error_is_set(errp)) {
+(v-type_size ? v-type_size : v-type_int)(v, obj, name, errp);
+}
+}
+
 void visit_type_bool(Visitor *v, bool *obj, const char *name, Error **errp)
 {
 if (!error_is_set(errp)) {
diff --git a/scripts/qapi.py b/scripts/qapi.py
index e062336..6aebb0f 100644
--- a/scripts/qapi.py
+++ b/scripts/qapi.py
@@ -157,7 +157,7 @@ def is_enum(name):
 def c_type(name):
 if name == 'str':
 return 'char *'
-elif name == 'int':
+elif name == 'int' or name == 'size':
 return 'int64_t'
 elif name == 'bool':
 return 'bool'
-- 
1.7.1

[Qemu-devel] [PATCH 14/16] convert net_init_tap() to NetClientOptions

2012-05-22 Thread Laszlo Ersek


Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 net/tap.h |2 +-
 net/tap-aix.c |2 +-
 net/tap-bsd.c |2 +-
 net/tap-haiku.c   |2 +-
 net/tap-linux.c   |9 +++-
 net/tap-solaris.c |2 +-
 net/tap-win32.c   |   11 +++--
 net/tap.c |  111 ++---
 8 files changed, 71 insertions(+), 70 deletions(-)

diff --git a/net/tap.h b/net/tap.h
index 44e31ce..f092129 100644
--- a/net/tap.h
+++ b/net/tap.h
@@ -47,7 +47,7 @@ void tap_using_vnet_hdr(VLANClientState *vc, int 
using_vnet_hdr);
 void tap_set_offload(VLANClientState *vc, int csum, int tso4, int tso6, int 
ecn, int ufo);
 void tap_set_vnet_hdr_len(VLANClientState *vc, int len);
 
-int tap_set_sndbuf(int fd, QemuOpts *opts);
+int tap_set_sndbuf(int fd, const NetdevTapOptions *tap);
 int tap_probe_vnet_hdr(int fd);
 int tap_probe_vnet_hdr_len(int fd, int len);
 int tap_probe_has_ufo(int fd);
diff --git a/net/tap-aix.c b/net/tap-aix.c
index e19aaba..f27c177 100644
--- a/net/tap-aix.c
+++ b/net/tap-aix.c
@@ -31,7 +31,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, 
int vnet_hdr_required
 return -1;
 }
 
-int tap_set_sndbuf(int fd, QemuOpts *opts)
+int tap_set_sndbuf(int fd, const NetdevTapOptions *tap)
 {
 return 0;
 }
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 4b6b3a4..9a4e2a1 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -123,7 +123,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, 
int vnet_hdr_required
 return fd;
 }
 
-int tap_set_sndbuf(int fd, QemuOpts *opts)
+int tap_set_sndbuf(int fd, const NetdevTapOptions *tap)
 {
 return 0;
 }
diff --git a/net/tap-haiku.c b/net/tap-haiku.c
index 91dda8e..34739d1 100644
--- a/net/tap-haiku.c
+++ b/net/tap-haiku.c
@@ -31,7 +31,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, 
int vnet_hdr_required
 return -1;
 }
 
-int tap_set_sndbuf(int fd, QemuOpts *opts)
+int tap_set_sndbuf(int fd, const NetdevTapOptions *tap)
 {
 return 0;
 }
diff --git a/net/tap-linux.c b/net/tap-linux.c
index 41d581b..c6521be 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -98,16 +98,19 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, 
int vnet_hdr_required
  */
 #define TAP_DEFAULT_SNDBUF 0
 
-int tap_set_sndbuf(int fd, QemuOpts *opts)
+int tap_set_sndbuf(int fd, const NetdevTapOptions *tap)
 {
 int sndbuf;
 
-sndbuf = qemu_opt_get_size(opts, sndbuf, TAP_DEFAULT_SNDBUF);
+sndbuf = !tap-has_sndbuf   ? TAP_DEFAULT_SNDBUF :
+ tap-sndbuf  INT_MAX  ? INT_MAX :
+ tap-sndbuf;
+
 if (!sndbuf) {
 sndbuf = INT_MAX;
 }
 
-if (ioctl(fd, TUNSETSNDBUF, sndbuf) == -1  qemu_opt_get(opts, 
sndbuf)) {
+if (ioctl(fd, TUNSETSNDBUF, sndbuf) == -1  tap-has_sndbuf) {
 error_report(TUNSETSNDBUF ioctl failed: %s, strerror(errno));
 return -1;
 }
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index cf76463..5d6ac42 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -197,7 +197,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, 
int vnet_hdr_required
 return fd;
 }
 
-int tap_set_sndbuf(int fd, QemuOpts *opts)
+int tap_set_sndbuf(int fd, const NetdevTapOptions *tap)
 {
 return 0;
 }
diff --git a/net/tap-win32.c b/net/tap-win32.c
index b738f45..b6099cd 100644
--- a/net/tap-win32.c
+++ b/net/tap-win32.c
@@ -699,19 +699,20 @@ static int tap_win32_init(VLANState *vlan, const char 
*model,
 return 0;
 }
 
-int net_init_tap(QemuOpts *opts, const NetClientOptions *new_opts,
+int net_init_tap(QemuOpts *old_opts, const NetClientOptions *opts,
  const char *name, VLANState *vlan)
 {
-const char *ifname;
+const NetdevTapOptions *tap;
 
-ifname = qemu_opt_get(opts, ifname);
+assert(opts-kind == NET_CLIENT_OPTIONS_KIND_TAP);
+tap = opts-tap;
 
-if (!ifname) {
+if (!tap-has_ifname) {
 error_report(tap: no interface name);
 return -1;
 }
 
-if (tap_win32_init(vlan, tap, name, ifname) == -1) {
+if (tap_win32_init(vlan, tap, name, tap-ifname) == -1) {
 return -1;
 }
 
diff --git a/net/tap.c b/net/tap.c
index 6d57d6c..7501eba 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -547,29 +547,32 @@ int net_init_bridge(QemuOpts *opts, const 
NetClientOptions *new_opts,
 return 0;
 }
 
-static int net_tap_init(QemuOpts *opts, int *vnet_hdr)
+static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
+const char *setup_script, char *ifname,
+size_t ifname_sz)
 {
 int fd, vnet_hdr_required;
-char ifname[128] = {0,};
-const char *setup_script;
 
-if (qemu_opt_get(opts, ifname)) {
-pstrcpy(ifname, sizeof(ifname), qemu_opt_get(opts, ifname));
+if (tap-has_ifname) {
+pstrcpy(ifname, ifname_sz, tap-ifname);
+} else {
+assert(ifname_sz  0);
+ifname[0] = '\0';
 }
 
-*vnet_hdr = qemu_opt_get_bool(opts,

Re: [Qemu-devel] [PATCH qom-next 3/5] pc: move apic_mapped initialization into common apic init code

2012-05-22 Thread Jan Kiszka

On 2012-05-22 07:35, Igor Mammedov wrote:
 Move from apic_init in pc.c  the code that belongs to apic_init_common
 and create/init apic in pc_new_cpu instead of separate func.
 
 Signed-off-by: Igor Mammedov imamm...@redhat.com
 ---
  hw/apic_common.c |   16 
  hw/msi.h |2 ++
  hw/pc.c  |   47 ---
  3 files changed, 26 insertions(+), 39 deletions(-)
 
 diff --git a/hw/apic_common.c b/hw/apic_common.c
 index 23d51e8..703931b 100644
 --- a/hw/apic_common.c
 +++ b/hw/apic_common.c
 @@ -21,6 +21,7 @@
  #include apic_internal.h
  #include trace.h
  #include kvm.h
 +#include msi.h
  
  static int apic_irq_delivered;
  bool apic_report_tpr_access;
 @@ -284,6 +285,7 @@ static int apic_init_common(SysBusDevice *dev)
  APICCommonClass *info;
  static DeviceState *vapic;
  static int apic_no;
 +static int apic_mapped;
  
  if (apic_no = MAX_APICS) {
  return -1;
 @@ -295,6 +297,20 @@ static int apic_init_common(SysBusDevice *dev)
  
  sysbus_init_mmio(dev, s-io_memory);
  
 +/* XXX: mapping more APICs at the same memory location */
 +if (apic_mapped == 0) {
 +/* NOTE: the APIC is directly connected to the CPU - it is not
 +   on the global memory bus. */
 +/* XXX: what if the base changes? */
 +sysbus_mmio_map(sysbus_from_qdev(s-busdev.qdev), 0, MSI_ADDR_BASE);
 +apic_mapped = 1;
 +}
 +
 +/* KVM does not support MSI yet. */
 +if (!kvm_irqchip_in_kernel()) {
 +msi_supported = true;
 +}
 +
  if (!vapic  s-vapic_control  VAPIC_ENABLE_MASK) {
  vapic = sysbus_create_simple(kvmvapic, -1, NULL);
  }
 diff --git a/hw/msi.h b/hw/msi.h
 index 3040bb0..abd52b6 100644
 --- a/hw/msi.h
 +++ b/hw/msi.h
 @@ -40,4 +40,6 @@ static inline bool msi_present(const PCIDevice *dev)
  return dev-cap_present  QEMU_PCI_CAP_MSI;
  }
  
 +#define MSI_ADDR_BASE 0xfee0
 +
  #endif /* QEMU_MSI_H */
 diff --git a/hw/pc.c b/hw/pc.c
 index 00d738d..0eb0b73 100644
 --- a/hw/pc.c
 +++ b/hw/pc.c
 @@ -879,44 +879,6 @@ DeviceState *cpu_get_current_apic(void)
  }
  }
  
 -static DeviceState *apic_init(void *env, uint8_t apic_id)
 -{
 -DeviceState *dev;
 -static int apic_mapped;
 -
 -if (kvm_irqchip_in_kernel()) {
 -dev = qdev_create(NULL, kvm-apic);
 -} else if (xen_enabled()) {
 -dev = qdev_create(NULL, xen-apic);
 -} else {
 -dev = qdev_create(NULL, apic);
 -}
 -
 -qdev_prop_set_uint8(dev, id, apic_id);
 -qdev_prop_set_ptr(dev, cpu_env, env);
 -qdev_init_nofail(dev);
 -
 -/* XXX: mapping more APICs at the same memory location */
 -if (apic_mapped == 0) {
 -/* NOTE: the APIC is directly connected to the CPU - it is not
 -   on the global memory bus. */
 -/* XXX: what if the base changes? */
 -sysbus_mmio_map(sysbus_from_qdev(dev), 0, MSI_ADDR_BASE);

While at it, you should drop MSI_ADDR_BASE definition from pc.c.

 -apic_mapped = 1;
 -}
 -
 -/* KVM does not support MSI yet. */
 -if (!kvm_irqchip_in_kernel()) {
 -msi_supported = true;
 -}
 -
 -if (xen_msi_support()) {
 -msi_supported = true;
 -}
 -
 -return dev;
 -}
 -

You are loosing some xen bits here. But this will collide with latest
kvm pull request
(http://thread.gmane.org/gmane.comp.emulators.kvm.devel/91171) anyway.
You may want to base on uq/master.

  void pc_acpi_smi_interrupt(void *opaque, int irq, int level)
  {
  CPUX86State *s = opaque;
 @@ -943,7 +905,14 @@ static X86CPU *pc_new_cpu(const char *cpu_model)
  }
  env = cpu-env;
  if ((env-cpuid_features  CPUID_APIC) || smp_cpus  1) {
 -env-apic_state = apic_init(env, env-cpuid_apic_id);
 +if (kvm_irqchip_in_kernel()) {
 +env-apic_state = qdev_create(NULL, kvm-apic);
 +} else {
 +env-apic_state = qdev_create(NULL, apic);
 +}
 +qdev_prop_set_uint8(env-apic_state, id, env-cpuid_apic_id);
 +qdev_prop_set_ptr(env-apic_state, cpu_env, env);
 +qdev_init_nofail(env-apic_state);
  }
  qemu_register_reset(pc_cpu_reset, cpu);
  pc_cpu_reset(cpu);

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH qom-next 3/5] pc: move apic_mapped initialization into common apic init code

2012-05-22 Thread Jan Kiszka

On 2012-05-22 07:35, Igor Mammedov wrote:
 Move from apic_init in pc.c  the code that belongs to apic_init_common
 and create/init apic in pc_new_cpu instead of separate func.
 
 Signed-off-by: Igor Mammedov imamm...@redhat.com
 ---
  hw/apic_common.c |   16 
  hw/msi.h |2 ++
  hw/pc.c  |   47 ---
  3 files changed, 26 insertions(+), 39 deletions(-)
 
 diff --git a/hw/apic_common.c b/hw/apic_common.c
 index 23d51e8..703931b 100644
 --- a/hw/apic_common.c
 +++ b/hw/apic_common.c
 @@ -21,6 +21,7 @@
  #include apic_internal.h
  #include trace.h
  #include kvm.h
 +#include msi.h
  
  static int apic_irq_delivered;
  bool apic_report_tpr_access;
 @@ -284,6 +285,7 @@ static int apic_init_common(SysBusDevice *dev)
  APICCommonClass *info;
  static DeviceState *vapic;
  static int apic_no;
 +static int apic_mapped;
  
  if (apic_no = MAX_APICS) {
  return -1;
 @@ -295,6 +297,20 @@ static int apic_init_common(SysBusDevice *dev)
  
  sysbus_init_mmio(dev, s-io_memory);
  
 +/* XXX: mapping more APICs at the same memory location */
 +if (apic_mapped == 0) {
 +/* NOTE: the APIC is directly connected to the CPU - it is not
 +   on the global memory bus. */
 +/* XXX: what if the base changes? */
 +sysbus_mmio_map(sysbus_from_qdev(s-busdev.qdev), 0, MSI_ADDR_BASE);
 +apic_mapped = 1;
 +}
 +
 +/* KVM does not support MSI yet. */
 +if (!kvm_irqchip_in_kernel()) {
 +msi_supported = true;
 +}
 +
  if (!vapic  s-vapic_control  VAPIC_ENABLE_MASK) {
  vapic = sysbus_create_simple(kvmvapic, -1, NULL);
  }
 diff --git a/hw/msi.h b/hw/msi.h
 index 3040bb0..abd52b6 100644
 --- a/hw/msi.h
 +++ b/hw/msi.h
 @@ -40,4 +40,6 @@ static inline bool msi_present(const PCIDevice *dev)
  return dev-cap_present  QEMU_PCI_CAP_MSI;
  }
  
 +#define MSI_ADDR_BASE 0xfee0
 +
  #endif /* QEMU_MSI_H */
 diff --git a/hw/pc.c b/hw/pc.c
 index 00d738d..0eb0b73 100644
 --- a/hw/pc.c
 +++ b/hw/pc.c
 @@ -879,44 +879,6 @@ DeviceState *cpu_get_current_apic(void)
  }
  }
  
 -static DeviceState *apic_init(void *env, uint8_t apic_id)
 -{
 -DeviceState *dev;
 -static int apic_mapped;
 -
 -if (kvm_irqchip_in_kernel()) {
 -dev = qdev_create(NULL, kvm-apic);
 -} else if (xen_enabled()) {
 -dev = qdev_create(NULL, xen-apic);
 -} else {
 -dev = qdev_create(NULL, apic);
 -}
 -
 -qdev_prop_set_uint8(dev, id, apic_id);
 -qdev_prop_set_ptr(dev, cpu_env, env);
 -qdev_init_nofail(dev);
 -
 -/* XXX: mapping more APICs at the same memory location */
 -if (apic_mapped == 0) {
 -/* NOTE: the APIC is directly connected to the CPU - it is not
 -   on the global memory bus. */
 -/* XXX: what if the base changes? */
 -sysbus_mmio_map(sysbus_from_qdev(dev), 0, MSI_ADDR_BASE);

While at it, you should drop MSI_ADDR_BASE definition from pc.c.

 -apic_mapped = 1;
 -}
 -
 -/* KVM does not support MSI yet. */
 -if (!kvm_irqchip_in_kernel()) {
 -msi_supported = true;
 -}
 -
 -if (xen_msi_support()) {
 -msi_supported = true;
 -}
 -
 -return dev;
 -}
 -

You are loosing some xen bits here. But this will collide with latest
kvm pull request
(http://thread.gmane.org/gmane.comp.emulators.kvm.devel/91171) anyway.
You may want to base on uq/master.

  void pc_acpi_smi_interrupt(void *opaque, int irq, int level)
  {
  CPUX86State *s = opaque;
 @@ -943,7 +905,14 @@ static X86CPU *pc_new_cpu(const char *cpu_model)
  }
  env = cpu-env;
  if ((env-cpuid_features  CPUID_APIC) || smp_cpus  1) {
 -env-apic_state = apic_init(env, env-cpuid_apic_id);
 +if (kvm_irqchip_in_kernel()) {
 +env-apic_state = qdev_create(NULL, kvm-apic);
 +} else {
 +env-apic_state = qdev_create(NULL, apic);
 +}
 +qdev_prop_set_uint8(env-apic_state, id, env-cpuid_apic_id);
 +qdev_prop_set_ptr(env-apic_state, cpu_env, env);
 +qdev_init_nofail(env-apic_state);
  }
  qemu_register_reset(pc_cpu_reset, cpu);
  pc_cpu_reset(cpu);

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH qom-next 4/5] target-i386: make initialize CPU in QOM way

2012-05-22 Thread Jan Kiszka

On 2012-05-22 07:35, Igor Mammedov wrote:
 Make CPU creation/initialization consistent with QOM object
 behavior in this, by moving tcg and apic initialization from board
 level into CPU's initfn/realize calls and cpu_model property setter.
 
 Which makes CPU object self-sufficient in respect of creation/initialization
 and matches a typical object creation sequence, i.e.:
   - create CPU instance
   - set properties
   - realize object - (x86_cpu_realize will be converted into realize
   property setter, when it is implemented)
 
 Signed-off-by: Igor Mammedov imamm...@redhat.com
 ---
  hw/pc.c  |   32 +-
  target-i386/cpu.c|   60 
 +-
  target-i386/helper.c |   39 
  3 files changed, 65 insertions(+), 66 deletions(-)
 
 diff --git a/hw/pc.c b/hw/pc.c
 index 0eb0b73..677f9e0 100644
 --- a/hw/pc.c
 +++ b/hw/pc.c
 @@ -894,37 +894,17 @@ static void pc_cpu_reset(void *opaque)
  cpu_reset(CPU(cpu));
  }
  
 -static X86CPU *pc_new_cpu(const char *cpu_model)
 -{
 -X86CPU *cpu;
 -CPUX86State *env;
 -
 -cpu = cpu_x86_init(cpu_model);
 -if (cpu == NULL) {
 -exit(1);
 -}
 -env = cpu-env;
 -if ((env-cpuid_features  CPUID_APIC) || smp_cpus  1) {
 -if (kvm_irqchip_in_kernel()) {
 -env-apic_state = qdev_create(NULL, kvm-apic);
 -} else {
 -env-apic_state = qdev_create(NULL, apic);
 -}
 -qdev_prop_set_uint8(env-apic_state, id, env-cpuid_apic_id);
 -qdev_prop_set_ptr(env-apic_state, cpu_env, env);
 -qdev_init_nofail(env-apic_state);
 -}
 -qemu_register_reset(pc_cpu_reset, cpu);
 -pc_cpu_reset(cpu);
 -return cpu;
 -}
 -
  void pc_cpus_init(const char *cpu_model)
  {
 +X86CPU *cpu;
  int i;
  
  for(i = 0; i  smp_cpus; i++) {
 -pc_new_cpu(cpu_model);
 +cpu = cpu_x86_init(cpu_model);
 +if (cpu == NULL) {
 +exit(1);
 +}
 +qemu_register_reset(pc_cpu_reset, cpu);
  }
  }
  
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 538892d..0e804ea 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -31,6 +31,9 @@
  
  #include hyperv.h
  
 +#include hw/qdev.h
 +#include sysemu.h
 +
  /* feature flags taken from Intel Processor Identification and the CPUID
   * Instruction and AMD's CPUID Specification.  In cases of disagreement
   * between feature naming conventions, aliases may be added.
 @@ -1747,21 +1750,76 @@ static void x86_set_cpu_model(Object *obj, const char 
 *value, Error **errp)
  if (cpu_x86_register(cpu, env-cpu_model_str)  0) {
  fprintf(stderr, Unable to find x86 CPU definition\n);
  error_set(errp, QERR_INVALID_PARAMETER_COMBINATION);
 +return;
 +}
 +
 +if (((env-cpuid_features  CPUID_APIC) || smp_cpus  1)) {
 +if (kvm_irqchip_in_kernel()) {
 +env-apic_state = qdev_create(NULL, kvm-apic);
 +} else {
 +env-apic_state = qdev_create(NULL, apic);
 +}
 +object_property_add_child(OBJECT(cpu), apic,
 +OBJECT(env-apic_state), NULL);
 +
 +qdev_prop_set_uint8(env-apic_state, id, env-cpuid_apic_id);
 +qdev_prop_set_ptr(env-apic_state, cpu_env, env);
 +}
 +}
 +
 +static CPUDebugExcpHandler *prev_debug_excp_handler;
 +
 +static void breakpoint_handler(CPUX86State *env)
 +{
 +CPUBreakpoint *bp;
 +
 +if (env-watchpoint_hit) {
 +if (env-watchpoint_hit-flags  BP_CPU) {
 +env-watchpoint_hit = NULL;
 +if (check_hw_breakpoints(env, 0)) {
 +raise_exception_env(EXCP01_DB, env);
 +} else {
 +cpu_resume_from_signal(env, NULL);
 +}
 +}
 +} else {
 +QTAILQ_FOREACH(bp, env-breakpoints, entry)
 +if (bp-pc == env-eip) {
 +if (bp-flags  BP_CPU) {
 +check_hw_breakpoints(env, 1);
 +raise_exception_env(EXCP01_DB, env);
 +}
 +break;
 +}
 +}
 +if (prev_debug_excp_handler) {
 +prev_debug_excp_handler(env);
  }
  }
  
  void x86_cpu_realize(Object *obj, Error **errp)
  {
  X86CPU *cpu = X86_CPU(obj);
 +CPUX86State *env = cpu-env;
 +
 +if (env-apic_state) {
 +if (qdev_init(env-apic_state)  0) {
 +error_set(errp, QERR_DEVICE_INIT_FAILED,
 +object_get_typename(OBJECT(env-apic_state)));
 +return;
 +}
 +}
  
  mce_init(cpu);
 -qemu_init_vcpu(cpu-env);
 +qemu_init_vcpu(env);
 +cpu_reset(CPU(cpu));
  }
  
  static void x86_cpu_initfn(Object *obj)
  {
  X86CPU *cpu = X86_CPU(obj);
  CPUX86State *env = cpu-env;
 +static int inited;
  
  cpu_exec_init(env);
  
 diff --git a/target-i386/helper.c b/target-i386/helper.c
 index 6fc67a9..443092e 100644
 ---

[Qemu-devel] [PATCH qom-next 3/5] pc: move apic_mapped initialization into common apic init code

2012-05-22 Thread Igor Mammedov

Move from apic_init in pc.c  the code that belongs to apic_init_common
and create/init apic in pc_new_cpu instead of separate func.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 hw/apic_common.c |   16 
 hw/msi.h |2 ++
 hw/pc.c  |   47 ---
 3 files changed, 26 insertions(+), 39 deletions(-)

diff --git a/hw/apic_common.c b/hw/apic_common.c
index 23d51e8..703931b 100644
--- a/hw/apic_common.c
+++ b/hw/apic_common.c
@@ -21,6 +21,7 @@
 #include apic_internal.h
 #include trace.h
 #include kvm.h
+#include msi.h
 
 static int apic_irq_delivered;
 bool apic_report_tpr_access;
@@ -284,6 +285,7 @@ static int apic_init_common(SysBusDevice *dev)
 APICCommonClass *info;
 static DeviceState *vapic;
 static int apic_no;
+static int apic_mapped;
 
 if (apic_no = MAX_APICS) {
 return -1;
@@ -295,6 +297,20 @@ static int apic_init_common(SysBusDevice *dev)
 
 sysbus_init_mmio(dev, s-io_memory);
 
+/* XXX: mapping more APICs at the same memory location */
+if (apic_mapped == 0) {
+/* NOTE: the APIC is directly connected to the CPU - it is not
+   on the global memory bus. */
+/* XXX: what if the base changes? */
+sysbus_mmio_map(sysbus_from_qdev(s-busdev.qdev), 0, MSI_ADDR_BASE);
+apic_mapped = 1;
+}
+
+/* KVM does not support MSI yet. */
+if (!kvm_irqchip_in_kernel()) {
+msi_supported = true;
+}
+
 if (!vapic  s-vapic_control  VAPIC_ENABLE_MASK) {
 vapic = sysbus_create_simple(kvmvapic, -1, NULL);
 }
diff --git a/hw/msi.h b/hw/msi.h
index 3040bb0..abd52b6 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -40,4 +40,6 @@ static inline bool msi_present(const PCIDevice *dev)
 return dev-cap_present  QEMU_PCI_CAP_MSI;
 }
 
+#define MSI_ADDR_BASE 0xfee0
+
 #endif /* QEMU_MSI_H */
diff --git a/hw/pc.c b/hw/pc.c
index 00d738d..0eb0b73 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -879,44 +879,6 @@ DeviceState *cpu_get_current_apic(void)
 }
 }
 
-static DeviceState *apic_init(void *env, uint8_t apic_id)
-{
-DeviceState *dev;
-static int apic_mapped;
-
-if (kvm_irqchip_in_kernel()) {
-dev = qdev_create(NULL, kvm-apic);
-} else if (xen_enabled()) {
-dev = qdev_create(NULL, xen-apic);
-} else {
-dev = qdev_create(NULL, apic);
-}
-
-qdev_prop_set_uint8(dev, id, apic_id);
-qdev_prop_set_ptr(dev, cpu_env, env);
-qdev_init_nofail(dev);
-
-/* XXX: mapping more APICs at the same memory location */
-if (apic_mapped == 0) {
-/* NOTE: the APIC is directly connected to the CPU - it is not
-   on the global memory bus. */
-/* XXX: what if the base changes? */
-sysbus_mmio_map(sysbus_from_qdev(dev), 0, MSI_ADDR_BASE);
-apic_mapped = 1;
-}
-
-/* KVM does not support MSI yet. */
-if (!kvm_irqchip_in_kernel()) {
-msi_supported = true;
-}
-
-if (xen_msi_support()) {
-msi_supported = true;
-}
-
-return dev;
-}
-
 void pc_acpi_smi_interrupt(void *opaque, int irq, int level)
 {
 CPUX86State *s = opaque;
@@ -943,7 +905,14 @@ static X86CPU *pc_new_cpu(const char *cpu_model)
 }
 env = cpu-env;
 if ((env-cpuid_features  CPUID_APIC) || smp_cpus  1) {
-env-apic_state = apic_init(env, env-cpuid_apic_id);
+if (kvm_irqchip_in_kernel()) {
+env-apic_state = qdev_create(NULL, kvm-apic);
+} else {
+env-apic_state = qdev_create(NULL, apic);
+}
+qdev_prop_set_uint8(env-apic_state, id, env-cpuid_apic_id);
+qdev_prop_set_ptr(env-apic_state, cpu_env, env);
+qdev_init_nofail(env-apic_state);
 }
 qemu_register_reset(pc_cpu_reset, cpu);
 pc_cpu_reset(cpu);
-- 
1.7.7.6

[Qemu-devel] [PATCH 06/16] qapi schema: add Netdev types

2012-05-22 Thread Laszlo Ersek

NetdevTapOptions::sndbuf and NetdevDumpOptions::len use the new size
type.

Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 qapi-schema.json |  275 ++
 1 files changed, 275 insertions(+), 0 deletions(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index a8b3803..fae2596 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1840,3 +1840,278 @@
 # Since: 0.14.0
 ##
 { 'command': 'netdev_del', 'data': {'id': 'str'} }
+
+##
+# @NetdevNoneOptions
+#
+# Use it alone to have zero network devices.
+#
+# Since 1.2
+##
+{ 'type': 'NetdevNoneOptions',
+  'data': { } }
+
+##
+# @NetLegacyNicOptions
+#
+# Create a new Network Interface Card.
+#
+# @netdev: #optional id of -netdev to connect to
+#
+# @macaddr: #optional MAC address
+#
+# @model: #optional device model (e1000, rtl8139, virtio etc.)
+#
+# @addr: #optional PCI device address
+#
+# @vectors: #optional number of MSI-x vectors, 0 to disable MSI-X
+#
+# Since 1.2
+##
+{ 'type': 'NetLegacyNicOptions',
+  'data': {
+'*netdev':  'str',
+'*macaddr': 'str',
+'*model':   'str',
+'*addr':'str',
+'*vectors': 'int' } }
+
+##
+# @String
+#
+# A fat type wrapping 'str', to be embedded in lists.
+#
+# Since 1.2
+##
+{ 'type': 'String',
+  'data': {
+'str': 'str' } }
+
+##
+# @NetdevUserOptions
+#
+# Use the user mode network stack which requires no administrator privilege to
+# run.
+#
+# @hostname: #optional client hostname reported by the builtin DHCP server
+#
+# @restrict: #optional isolate the guest from the host
+#
+# @ip: #optional legacy parameter, use net= instead
+#
+# @net: #optional IP address and optional netmask
+#
+# @host: #optional guest-visible address of the host
+#
+# @tftp: #optional root directory of the built-in TFTP server
+#
+# @bootfile: #optional BOOTP filename, for use with tftp=
+#
+# @dhcpstart: #optional the first of the 16 IPs the built-in DHCP server can
+# assign
+#
+# @dns: #optional guest-visible address of the virtual nameserver
+#
+# @smb: #optional root directory of the built-in SMB server
+#
+# @smbserver: #optional IP address of the built-in SMB server
+#
+# @hostfwd: #optional redirect incoming TCP or UDP host connections to guest
+#   endpoints
+#
+# @guestfwd: #optional forward guest TCP connections
+#
+# Since 1.2
+##
+{ 'type': 'NetdevUserOptions',
+  'data': {
+'*hostname':  'str',
+'*restrict':  'bool',
+'*ip':'str',
+'*net':   'str',
+'*host':  'str',
+'*tftp':  'str',
+'*bootfile':  'str',
+'*dhcpstart': 'str',
+'*dns':   'str',
+'*smb':   'str',
+'*smbserver': 'str',
+'*hostfwd':   ['String'],
+'*guestfwd':  ['String'] } }
+
+##
+# @NetdevTapOptions
+#
+# Connect the host TAP network interface name to the VLAN.
+#
+# @ifname: #optional interface name
+#
+# @fd: #optional file descriptor of an already opened tap
+#
+# @script: #optional script to initialize the interface
+#
+# @downscript: #optional script to shut down the interface
+#
+# @helper: #optional command to execute to configure bridge
+#
+# @sndbuf: #optional send buffer limit. Understands [TGMKkb] suffixes.
+#
+# @vnet_hdr: #optional enable the IFF_VNET_HDR flag on the tap interface
+#
+# @vhost: #optional enable vhost-net network accelerator
+#
+# @vhostfd: #optional file descriptor of an already opened vhost net device
+#
+# @vhostforce: #optional vhost on for non-MSIX virtio guests
+#
+# Since 1.2
+##
+{ 'type': 'NetdevTapOptions',
+  'data': {
+'*ifname': 'str',
+'*fd': 'str',
+'*script': 'str',
+'*downscript': 'str',
+'*helper': 'str',
+'*sndbuf': 'size',
+'*vnet_hdr':   'bool',
+'*vhost':  'bool',
+'*vhostfd':'str',
+'*vhostforce': 'bool' } }
+
+##
+# @NetdevSocketOptions
+#
+# Connect the VLAN to a remote VLAN in another QEMU virtual machine using a TCP
+# socket connection.
+#
+# @fd: #optional file descriptor of an already opened socket
+#
+# @listen: #optional port number, and optional hostname, to listen on
+#
+# @connect: #optional port number, and optional hostname, to connect to
+#
+# @mcast: #optional UDP multicast address and port number
+#
+# @localaddr: #optional source address and port for multicast and udp packets
+#
+# @udp: #optional UDP unicast address and port number
+#
+# Since 1.2
+##
+{ 'type': 'NetdevSocketOptions',
+  'data': {
+'*fd':'str',
+'*listen':'str',
+'*connect':   'str',
+'*mcast': 'str',
+'*localaddr': 'str',
+'*udp':   'str' } }
+
+##
+# @NetdevVdeOptions
+#
+# Connect the VLAN to a vde switch running on the host.
+#
+# @sock: #optional socket path
+#
+# @port: #optional port number
+#
+# @group: #optional group owner of socket
+#
+# @mode: #optional permissions for socket
+#
+# Since 1.2
+##
+{ 'type': 'NetdevVdeOptions',
+  'data': {
+'*sock':  'str',
+'*port':  'int',
+'*group': 'str',
+'*mode':

Re: [Qemu-devel] [PATCH qom-next 1/5] target-i386: move cpu halted decision into x86_cpu_reset

2012-05-22 Thread Peter Maydell

On 22 May 2012 11:35, Igor Mammedov imamm...@redhat.com wrote:
 From: Igor Mammedov niall...@gmail.com

 MP initialization protocol differs between cpu families, and for P6 and
 onward models it is up to CPU to decide if it will be BSP using this
 protocol, so try to model this. However there is no point in implementing
 MP initialization protocol in qemu. Thus first CPU is always marked as BSP.

This breaks compilation of the i386-linux-user target:
target-i386/cpu.c: In function ‘x86_cpu_reset’:
target-i386/cpu.c:1710: error: implicit declaration of function
‘apic_designate_bsp’
target-i386/cpu.c:1710: error: nested extern declaration of ‘apic_designate_bsp’
target-i386/cpu.c:1713: error: implicit declaration of function
‘cpu_get_apic_base’
target-i386/cpu.c:1713: error: nested extern declaration of ‘cpu_get_apic_base’

-- PMM

[Qemu-devel] [PATCH 1/2] fdc: floppy drive should be visible after start without media

2012-05-22 Thread Pavel Hrdina

If you start guest with floppy drive but without media inserted,
the guest now can see floppy drive.

Signed-off-by: Pavel Hrdina phrd...@redhat.com
---
 hw/pc.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 4d34a33..967c17a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -382,7 +382,7 @@ void pc_cmos_init(ram_addr_t ram_size, ram_addr_t 
above_4g_mem_size,
 if (floppy) {
 fdc_get_bs(fd, floppy);
 for (i = 0; i  2; i++) {
-if (fd[i]  bdrv_is_inserted(fd[i])) {
+if (fd[i]) {
 bdrv_get_floppy_geometry_hint(fd[i], nb_heads, max_track,
   last_sect, FDRIVE_DRV_NONE,
   fd_type[i], rate);
-- 
1.7.7.6

[Qemu-devel] [PATCH 0/2] fdc: fix media handling

2012-05-22 Thread Pavel Hrdina

This patch series fixes handling of FDC when you don't have media inserted.
Guest should see floppy drive if you start guest without media and should
detect that there is no media in drive.

Signed-off-by: Pavel Hrdina phrd...@redhat.com

Pavel Hrdina (2):
  fdc: floppy drive should be visible after start without media
  fdc: fix media detection

 hw/fdc.c |8 ++--
 hw/pc.c  |2 +-
 2 files changed, 3 insertions(+), 7 deletions(-)

-- 
1.7.7.6

[Qemu-devel] [PATCH 15/16] convert net_init_bridge() to NetClientOptions

2012-05-22 Thread Laszlo Ersek


Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 net/tap.c |   23 ---
 1 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 7501eba..fdaab2b 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -512,21 +512,22 @@ static int net_bridge_run_helper(const char *helper, 
const char *bridge)
 return -1;
 }
 
-int net_init_bridge(QemuOpts *opts, const NetClientOptions *new_opts,
+int net_init_bridge(QemuOpts *old_opts, const NetClientOptions *opts,
 const char *name, VLANState *vlan)
 {
+const NetdevBridgeOptions *bridge;
+const char *helper, *br;
+
 TAPState *s;
 int fd, vnet_hdr;
 
-if (!qemu_opt_get(opts, br)) {
-qemu_opt_set(opts, br, DEFAULT_BRIDGE_INTERFACE);
-}
-if (!qemu_opt_get(opts, helper)) {
-qemu_opt_set(opts, helper, DEFAULT_BRIDGE_HELPER);
-}
+assert(opts-kind == NET_CLIENT_OPTIONS_KIND_BRIDGE);
+bridge = opts-bridge;
+
+helper = bridge-has_helper ? bridge-helper : DEFAULT_BRIDGE_HELPER;
+br = bridge-has_br ? bridge-br : DEFAULT_BRIDGE_INTERFACE;
 
-fd = net_bridge_run_helper(qemu_opt_get(opts, helper),
-   qemu_opt_get(opts, br));
+fd = net_bridge_run_helper(helper, br);
 if (fd == -1) {
 return -1;
 }
@@ -541,8 +542,8 @@ int net_init_bridge(QemuOpts *opts, const NetClientOptions 
*new_opts,
 return -1;
 }
 
-snprintf(s-nc.info_str, sizeof(s-nc.info_str), helper=%s,br=%s,
- qemu_opt_get(opts, helper), qemu_opt_get(opts, br));
+snprintf(s-nc.info_str, sizeof(s-nc.info_str), helper=%s,br=%s, helper,
+ br);
 
 return 0;
 }
-- 
1.7.1

[Qemu-devel] [PATCH 11/16] convert net_init_slirp() to NetClientOptions

2012-05-22 Thread Laszlo Ersek


Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 net/slirp.c |   93 ---
 1 files changed, 25 insertions(+), 68 deletions(-)

diff --git a/net/slirp.c b/net/slirp.c
index 9b925b7..166304c 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -654,89 +654,46 @@ void do_info_usernet(Monitor *mon)
 }
 }
 
-static int net_init_slirp_configs(const char *name, const char *value, void 
*opaque)
+static void
+net_init_slirp_configs(const StringList *fwd, int flags)
 {
-struct slirp_config_str *config;
-
-if (strcmp(name, hostfwd) != 0  strcmp(name, guestfwd) != 0) {
-return 0;
-}
-
-config = g_malloc0(sizeof(*config));
+while (fwd) {
+struct slirp_config_str *config;
 
-pstrcpy(config-str, sizeof(config-str), value);
+config = g_malloc0(sizeof(*config));
+pstrcpy(config-str, sizeof(config-str), fwd-value-str);
+config-flags = flags;
+config-next = slirp_configs;
+slirp_configs = config;
 
-if (!strcmp(name, hostfwd)) {
-config-flags = SLIRP_CFG_HOSTFWD;
+fwd = fwd-next;
 }
-
-config-next = slirp_configs;
-slirp_configs = config;
-
-return 0;
 }
 
-int net_init_slirp(QemuOpts *opts, const NetClientOptions *new_opts,
+int net_init_slirp(QemuOpts *old_opts, const NetClientOptions *opts,
const char *name, VLANState *vlan)
 {
 struct slirp_config_str *config;
-const char *vhost;
-const char *vhostname;
-const char *vdhcp_start;
-const char *vnamesrv;
-const char *tftp_export;
-const char *bootfile;
-const char *smb_export;
-const char *vsmbsrv;
-const char *restrict_opt;
-char *vnet = NULL;
-int restricted = 0;
+char *vnet;
 int ret;
+const NetdevUserOptions *user;
 
-vhost   = qemu_opt_get(opts, host);
-vhostname   = qemu_opt_get(opts, hostname);
-vdhcp_start = qemu_opt_get(opts, dhcpstart);
-vnamesrv= qemu_opt_get(opts, dns);
-tftp_export = qemu_opt_get(opts, tftp);
-bootfile= qemu_opt_get(opts, bootfile);
-smb_export  = qemu_opt_get(opts, smb);
-vsmbsrv = qemu_opt_get(opts, smbserver);
-
-restrict_opt = qemu_opt_get(opts, restrict);
-if (restrict_opt) {
-if (!strcmp(restrict_opt, on) ||
-!strcmp(restrict_opt, yes) || !strcmp(restrict_opt, y)) {
-restricted = 1;
-} else if (strcmp(restrict_opt, off) 
-strcmp(restrict_opt, no)  strcmp(restrict_opt, n)) {
-error_report(invalid option: 'restrict=%s', restrict_opt);
-return -1;
-}
-}
-
-if (qemu_opt_get(opts, ip)) {
-const char *ip = qemu_opt_get(opts, ip);
-int l = strlen(ip) + strlen(/24) + 1;
+assert(opts-kind == NET_CLIENT_OPTIONS_KIND_USER);
+user = opts-user;
 
-vnet = g_malloc(l);
+vnet = user-has_net ? g_strdup(user-net) :
+   user-has_ip  ? g_strdup_printf(%s/24, user-ip) :
+   NULL;
 
-/* emulate legacy ip= parameter */
-pstrcpy(vnet, l, ip);
-pstrcat(vnet, l, /24);
-}
-
-if (qemu_opt_get(opts, net)) {
-if (vnet) {
-g_free(vnet);
-}
-vnet = g_strdup(qemu_opt_get(opts, net));
-}
+/* all optional fields are initialized to all bits zero */
 
-qemu_opt_foreach(opts, net_init_slirp_configs, NULL, 0);
+net_init_slirp_configs(user-hostfwd, SLIRP_CFG_HOSTFWD);
+net_init_slirp_configs(user-guestfwd, 0);
 
-ret = net_slirp_init(vlan, user, name, restricted, vnet, vhost,
- vhostname, tftp_export, bootfile, vdhcp_start,
- vnamesrv, smb_export, vsmbsrv);
+ret = net_slirp_init(vlan, user, name, user-restrict, vnet, user-host,
+ user-hostname, user-tftp, user-bootfile,
+ user-dhcpstart, user-dns, user-smb,
+ user-smbserver);
 
 while (slirp_configs) {
 config = slirp_configs;
-- 
1.7.1

[Qemu-devel] [PATCH 08/16] convert net_client_init() to OptsVisitor

2012-05-22 Thread Laszlo Ersek

The net_client_init() prototype is kept intact.

Based on is_netdev, the QemuOpts-rooted QemuOpt-list is parsed as a
Netdev or a NetLegacy. The original meat of net_client_init() is moved to
and simplified in net_client_init1():

Fields not common between -net and -netdev are clearly separated: (1)
since both Netdev::id and NetLegacy::name are mandatory now, getting the
name for the init functions is cleaner; (2) NetLegacy::vlan also
explicitly depends on -net (see below).

Verifying the type= option for -netdev can be turned into a switch.

Format validation with qemu_opts_validate() can be removed because the
visitor covers it. Relatedly, the net_client_types array is reduced to
an array of init functions that can be directly indexed by opts-kind.
(Help text is available in the schema JSON.)

The outermost negation in the condition around qemu_find_vlan() was
flattened, because it expresses the dependent code's requirements more
clearly.

VLAN lookup is avoided if there's no init function to pass the VLAN to.

Whenever the value of type=... is needed, we substitute
NetClientOptionsKind_lookup[kind].

The individual init functions are not converted yet, thus the original
QemuOpts instance is passed transparently.

Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 net/dump.h  |4 +-
 net/slirp.h |4 +-
 net/socket.h|4 +-
 net/tap.h   |7 +-
 net/vde.h   |4 +-
 net.c   |  428 --
 net/dump.c  |3 +-
 net/slirp.c |3 +-
 net/socket.c|3 +-
 net/tap-win32.c |3 +-
 net/tap.c   |6 +-
 net/vde.c   |3 +-
 12 files changed, 126 insertions(+), 346 deletions(-)

diff --git a/net/dump.h b/net/dump.h
index 2b5d9ba..85ac00b 100644
--- a/net/dump.h
+++ b/net/dump.h
@@ -26,7 +26,9 @@
 
 #include net.h
 #include qemu-common.h
+#include qapi-types.h
 
-int net_init_dump(QemuOpts *opts, const char *name, VLANState *vlan);
+int net_init_dump(QemuOpts *opts, const NetClientOptions *new_opts,
+  const char *name, VLANState *vlan);
 
 #endif /* QEMU_NET_DUMP_H */
diff --git a/net/slirp.h b/net/slirp.h
index 53fe95d..ef13a65 100644
--- a/net/slirp.h
+++ b/net/slirp.h
@@ -27,10 +27,12 @@
 #include qemu-common.h
 #include qdict.h
 #include qemu-option.h
+#include qapi-types.h
 
 #ifdef CONFIG_SLIRP
 
-int net_init_slirp(QemuOpts *opts, const char *name, VLANState *vlan);
+int net_init_slirp(QemuOpts *opts, const NetClientOptions *new_opts,
+   const char *name, VLANState *vlan);
 
 void net_slirp_hostfwd_add(Monitor *mon, const QDict *qdict);
 void net_slirp_hostfwd_remove(Monitor *mon, const QDict *qdict);
diff --git a/net/socket.h b/net/socket.h
index e1fe959..e44d26e 100644
--- a/net/socket.h
+++ b/net/socket.h
@@ -26,7 +26,9 @@
 
 #include net.h
 #include qemu-common.h
+#include qapi-types.h
 
-int net_init_socket(QemuOpts *opts, const char *name, VLANState *vlan);
+int net_init_socket(QemuOpts *opts, const NetClientOptions *new_opts,
+const char *name, VLANState *vlan);
 
 #endif /* QEMU_NET_SOCKET_H */
diff --git a/net/tap.h b/net/tap.h
index b2a9450..44e31ce 100644
--- a/net/tap.h
+++ b/net/tap.h
@@ -28,11 +28,13 @@
 
 #include qemu-common.h
 #include qemu-option.h
+#include qapi-types.h
 
 #define DEFAULT_NETWORK_SCRIPT /etc/qemu-ifup
 #define DEFAULT_NETWORK_DOWN_SCRIPT /etc/qemu-ifdown
 
-int net_init_tap(QemuOpts *opts, const char *name, VLANState *vlan);
+int net_init_tap(QemuOpts *opts, const NetClientOptions *new_opts,
+ const char *name, VLANState *vlan);
 
 int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int 
vnet_hdr_required);
 
@@ -57,6 +59,7 @@ int tap_get_fd(VLANClientState *vc);
 struct vhost_net;
 struct vhost_net *tap_get_vhost_net(VLANClientState *vc);
 
-int net_init_bridge(QemuOpts *opts, const char *name, VLANState *vlan);
+int net_init_bridge(QemuOpts *opts, const NetClientOptions *new_opts,
+const char *name, VLANState *vlan);
 
 #endif /* QEMU_NET_TAP_H */
diff --git a/net/vde.h b/net/vde.h
index 732e575..5fc17f9 100644
--- a/net/vde.h
+++ b/net/vde.h
@@ -26,10 +26,12 @@
 
 #include qemu-common.h
 #include qemu-option.h
+#include qapi-types.h
 
 #ifdef CONFIG_VDE
 
-int net_init_vde(QemuOpts *opts, const char *name, VLANState *vlan);
+int net_init_vde(QemuOpts *opts, const NetClientOptions *new_opts,
+ const char *name, VLANState *vlan);
 
 #endif /* CONFIG_VDE */
 
diff --git a/net.c b/net.c
index 9c8cd42..5ac5cf0 100644
--- a/net.c
+++ b/net.c
@@ -37,6 +37,9 @@
 #include qmp-commands.h
 #include hw/qdev.h
 #include iov.h
+#include qapi-visit.h
+#include qapi/opts-visitor.h
+#include qapi/qapi-dealloc-visitor.h
 
 /* Net bridge is currently not supported for W32. */
 #if !defined(_WIN32)
@@ -745,7 +748,8 @@ int net_handle_fd_param(Monitor *mon, const char *param)
 return fd;
 }
 
-static int net_init_nic(QemuOpts *opts, const char *name,

[Qemu-devel] [PATCH 2/2] fdc: fix media detection

2012-05-22 Thread Pavel Hrdina

We have to set up 'media_changed' after guest start so floppy driver
could detect that there is no media in drive. For this purpose we call
'fdctrl_change_cb' instead of 'fd_revalidate' in 'fdctrl_connect_drives'.
'fd_revalidate' is called inside 'fdctrl_change_cb'.

In 'fdctrl_handle_seek' we always set current track because we don't care
if there is media inserted or not.

Signed-off-by: Pavel Hrdina phrd...@redhat.com
---
 hw/fdc.c |8 ++--
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/hw/fdc.c b/hw/fdc.c
index cb4cd25..337b35a 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -1617,11 +1617,7 @@ static void fdctrl_handle_seek(FDCtrl *fdctrl, int 
direction)
 /* The seek command just sends step pulses to the drive and doesn't care if
  * there is a medium inserted of if it's banging the head against the 
drive.
  */
-if (fdctrl-fifo[2]  cur_drv-max_track) {
-cur_drv-track = cur_drv-max_track;
-} else {
-cur_drv-track = fdctrl-fifo[2];
-}
+cur_drv-track = fdctrl-fifo[2];
 /* Raise Interrupt */
 fdctrl_raise_irq(fdctrl, FD_SR0_SEEK);
 }
@@ -1878,7 +1874,7 @@ static int fdctrl_connect_drives(FDCtrl *fdctrl)
 }
 
 fd_init(drive);
-fd_revalidate(drive);
+fdctrl_change_cb(drive, 0);
 if (drive-bs) {
 bdrv_set_dev_ops(drive-bs, fdctrl_block_ops, drive);
 }
-- 
1.7.7.6

[Qemu-devel] [PATCH 04/16] qapi: introduce OptsVisitor

2012-05-22 Thread Laszlo Ersek

This visitor supports parsing

  -option [type=]discriminator[,optarg1=val1][,optarg2=val2][,...]

style QemuOpts objects into native C structures. After defining the type
tree in the qapi schema (see below), a root type traversal with this
visitor linked to the underlying QemuOpts object will build the native C
representation of the option.

The type tree in the schema, corresponding to an option with a
discriminator, must have the following structure:

  struct
scalar member for non-discriminated optarg 1 [*]
list for repeating non-discriminated optarg 2 [*]
  wrapper struct
single scalar member
union
  struct for discriminator case 1
scalar member for optarg 3 [*]
list for repeating optarg 4 [*]
  wrapper struct
single scalar member
scalar member for optarg 5 [*]
  struct for discriminator case 2
...

The type optarg name is fixed for the discriminator role. Its schema
representation is union of structures, and each discriminator value must
correspond to a member name in the union.

If the option takes no type descriminator, then the type subtree rooted
at the union must be absent from the schema (including the union itself).

Optarg values can be of scalar types str / bool / int / size.

Members marked with [*] may be defined as optional in the schema,
describing an optional optarg.

Repeating an optarg is supported; its schema representation must be list
of structure with single mandatory scalar member. If an optarg is not
described as repeating in the schema (ie. it is defined as a scalar field
instead of a list), its last occurrence will take effect. Ordering between
differently named optargs is not preserved.

A mandatory list (or an optional one which is reported to be available),
corresponding to a repeating optarg, has at least one element after
successful parsing.

Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 qapi/opts-visitor.h |   31 +
 qapi/opts-visitor.c |  363 +++
 Makefile.objs   |2 +-
 3 files changed, 395 insertions(+), 1 deletions(-)
 create mode 100644 qapi/opts-visitor.h
 create mode 100644 qapi/opts-visitor.c

diff --git a/qapi/opts-visitor.h b/qapi/opts-visitor.h
new file mode 100644
index 000..edd6d41
--- /dev/null
+++ b/qapi/opts-visitor.h
@@ -0,0 +1,31 @@
+/*
+ * Options Visitor
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Author: Laszlo Ersek ler...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef OPTS_VISITOR_H
+#define OPTS_VISITOR_H
+
+#include qapi-visit-core.h
+#include qemu-option.h
+
+typedef struct OptsVisitor OptsVisitor;
+
+/* Contrarily to qemu-option.c::parse_option_number(), OptsVisitor's int
+ * parser relies on strtoll() instead of strtoull(). Consequences:
+ * - string representations of negative numbers are rejected (parsed values
+ *   continue to be non-negative),
+ * - values above INT64_MAX or LLONG_MAX are rejected.
+ */
+OptsVisitor *opts_visitor_new(const QemuOpts *opts);
+void opts_visitor_cleanup(OptsVisitor *nv);
+Visitor *opts_get_visitor(OptsVisitor *nv);
+
+#endif
diff --git a/qapi/opts-visitor.c b/qapi/opts-visitor.c
new file mode 100644
index 000..a94c67e
--- /dev/null
+++ b/qapi/opts-visitor.c
@@ -0,0 +1,363 @@
+/*
+ * Options Visitor
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Author: Laszlo Ersek ler...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include opts-visitor.h
+#include qemu-queue.h
+#include qemu-option-internal.h
+#include qapi-visit-impl.h
+
+
+struct OptsVisitor
+{
+Visitor visitor;
+
+/* Ownership remains with opts_visitor_new()'s caller. */
+const QemuOpts *opts_root;
+
+unsigned depth;
+
+/* Non-null iff depth is positive. Each key is a QemuOpt name. Each value
+ * is a non-empty GQueue, enumerating all QemuOpt occurrences with that
+ * name. */
+GHashTable *unprocessed_opts;
+
+/* The list currently being traversed with opts_start_list() /
+ * opts_next_list(). The list must have a struct element type in the
+ * schema, with a single mandatory scalar member. */
+GQueue *repeated_opts;
+bool repeated_opts_first;
+};
+
+
+static void
+destroy_list(gpointer list)
+{
+  g_queue_free(list);
+}
+
+
+static void
+opts_start_struct(Visitor *v, void **obj, const char *kind,
+  const char *name, size_t size, Error **errp)
+{
+OptsVisitor *ov = DO_UPCAST(OptsVisitor, visitor, v);
+const QemuOpt *opt;
+
+*obj = g_malloc0(size);
+if (ov-depth++  0) {
+return;
+}
+
+ov-unprocessed_opts = g_hash_table_new_full(g_str_hash, g_str_equal,
+ NULL, destroy_list);
+QTAILQ_FOREACH(opt,

[Qemu-devel] [PATCH 00/16] introduce OptsVisitor, rebase -net/-netdev parsing

2012-05-22 Thread Laszlo Ersek

Inspired by [1], the first half of this series attempts to implement a new
visitor that should clean up defining and processing command line options.
For a more detailed description, please see [PATCH 04/16] qapi: introduce
OptsVisitor.

The second half converts -net/-netdev parsing to the new visitor.

The series depends on qapi: convert netdev_add  netdev_del v5, see [2].

Comments highly appreciated; please keep me CC'd. Thank you.

[1] http://lists.nongnu.org/archive/html/qemu-devel/2012-04/msg02512.html
[2] http://lists.nongnu.org/archive/html/qemu-devel/2012-05/msg02841.html

Laszlo Ersek (15):
  qapi: introduce size type
  expose QemuOpt and QemuOpts struct definitions to interested parties
  qapi: introduce OptsVisitor
  qapi schema: remove trailing whitespace
  qapi schema: add Netdev types
  hw, net: net_client_type - NetClientOptionsKind (qapi-generated)
  convert net_client_init() to OptsVisitor
  convert net_init_nic() to NetClientOptions
  convert net_init_dump() to NetClientOptions
  convert net_init_slirp() to NetClientOptions
  convert net_init_socket() to NetClientOptions
  convert net_init_vde() to NetClientOptions
  convert net_init_tap() to NetClientOptions
  convert net_init_bridge() to NetClientOptions
  remove unused QemuOpts parameter from net init functions

Paolo Bonzini (1):
  qapi: fix error propagation

 error.h|4 +-
 net.h  |   16 +--
 net/dump.h |5 +-
 net/slirp.h|5 +-
 net/socket.h   |5 +-
 net/tap.h  |   10 +-
 net/vde.h  |5 +-
 qapi/opts-visitor.h|   31 +++
 qapi/qapi-visit-core.h |4 +
 qemu-option-internal.h |   53 +
 error.c|4 +-
 hw/cadence_gem.c   |2 +-
 hw/dp8393x.c   |2 +-
 hw/e1000.c |2 +-
 hw/eepro100.c  |2 +-
 hw/etraxfs_eth.c   |2 +-
 hw/lan9118.c   |2 +-
 hw/lance.c |2 +-
 hw/mcf_fec.c   |2 +-
 hw/milkymist-minimac2.c|2 +-
 hw/mipsnet.c   |2 +-
 hw/musicpal.c  |2 +-
 hw/ne2000-isa.c|2 +-
 hw/ne2000.c|2 +-
 hw/opencores_eth.c |2 +-
 hw/pcnet-pci.c |2 +-
 hw/rtl8139.c   |2 +-
 hw/smc91c111.c |2 +-
 hw/spapr_llan.c|2 +-
 hw/stellaris_enet.c|2 +-
 hw/usb/dev-network.c   |2 +-
 hw/vhost_net.c |2 +-
 hw/virtio-net.c|   10 +-
 hw/xen_nic.c   |2 +-
 hw/xgmac.c |2 +-
 hw/xilinx_axienet.c|2 +-
 hw/xilinx_ethlite.c|2 +-
 net.c  |  493 +++-
 net/dump.c |   24 ++-
 net/slirp.c|   96 +++--
 net/socket.c   |  124 ---
 net/tap-aix.c  |2 +-
 net/tap-bsd.c  |2 +-
 net/tap-haiku.c|2 +-
 net/tap-linux.c|9 +-
 net/tap-solaris.c  |2 +-
 net/tap-win32.c|   14 +-
 net/tap.c  |  152 ++--
 net/vde.c  |   40 +++-
 qapi/opts-visitor.c|  363 +
 qapi/qapi-visit-core.c |   17 +-
 qemu-option.c  |   24 +--
 tests/test-qmp-input-visitor.c |   24 ++-
 Makefile.objs  |2 +-
 docs/qapi-code-gen.txt |2 +
 qapi-schema.json   |  285 +++-
 scripts/qapi-visit.py  |  129 ++-
 scripts/qapi.py|2 +-
 58 files changed, 1239 insertions(+), 771 deletions(-)
 create mode 100644 qapi/opts-visitor.h
 create mode 100644 qemu-option-internal.h
 create mode 100644 qapi/opts-visitor.c

[Qemu-devel] [PATCH qom-next 4/5] target-i386: make initialize CPU in QOM way

2012-05-22 Thread Igor Mammedov

Make CPU creation/initialization consistent with QOM object
behavior in this, by moving tcg and apic initialization from board
level into CPU's initfn/realize calls and cpu_model property setter.

Which makes CPU object self-sufficient in respect of creation/initialization
and matches a typical object creation sequence, i.e.:
  - create CPU instance
  - set properties
  - realize object - (x86_cpu_realize will be converted into realize
  property setter, when it is implemented)

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 hw/pc.c  |   32 +-
 target-i386/cpu.c|   60 +-
 target-i386/helper.c |   39 
 3 files changed, 65 insertions(+), 66 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 0eb0b73..677f9e0 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -894,37 +894,17 @@ static void pc_cpu_reset(void *opaque)
 cpu_reset(CPU(cpu));
 }
 
-static X86CPU *pc_new_cpu(const char *cpu_model)
-{
-X86CPU *cpu;
-CPUX86State *env;
-
-cpu = cpu_x86_init(cpu_model);
-if (cpu == NULL) {
-exit(1);
-}
-env = cpu-env;
-if ((env-cpuid_features  CPUID_APIC) || smp_cpus  1) {
-if (kvm_irqchip_in_kernel()) {
-env-apic_state = qdev_create(NULL, kvm-apic);
-} else {
-env-apic_state = qdev_create(NULL, apic);
-}
-qdev_prop_set_uint8(env-apic_state, id, env-cpuid_apic_id);
-qdev_prop_set_ptr(env-apic_state, cpu_env, env);
-qdev_init_nofail(env-apic_state);
-}
-qemu_register_reset(pc_cpu_reset, cpu);
-pc_cpu_reset(cpu);
-return cpu;
-}
-
 void pc_cpus_init(const char *cpu_model)
 {
+X86CPU *cpu;
 int i;
 
 for(i = 0; i  smp_cpus; i++) {
-pc_new_cpu(cpu_model);
+cpu = cpu_x86_init(cpu_model);
+if (cpu == NULL) {
+exit(1);
+}
+qemu_register_reset(pc_cpu_reset, cpu);
 }
 }
 
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 538892d..0e804ea 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -31,6 +31,9 @@
 
 #include hyperv.h
 
+#include hw/qdev.h
+#include sysemu.h
+
 /* feature flags taken from Intel Processor Identification and the CPUID
  * Instruction and AMD's CPUID Specification.  In cases of disagreement
  * between feature naming conventions, aliases may be added.
@@ -1747,21 +1750,76 @@ static void x86_set_cpu_model(Object *obj, const char 
*value, Error **errp)
 if (cpu_x86_register(cpu, env-cpu_model_str)  0) {
 fprintf(stderr, Unable to find x86 CPU definition\n);
 error_set(errp, QERR_INVALID_PARAMETER_COMBINATION);
+return;
+}
+
+if (((env-cpuid_features  CPUID_APIC) || smp_cpus  1)) {
+if (kvm_irqchip_in_kernel()) {
+env-apic_state = qdev_create(NULL, kvm-apic);
+} else {
+env-apic_state = qdev_create(NULL, apic);
+}
+object_property_add_child(OBJECT(cpu), apic,
+OBJECT(env-apic_state), NULL);
+
+qdev_prop_set_uint8(env-apic_state, id, env-cpuid_apic_id);
+qdev_prop_set_ptr(env-apic_state, cpu_env, env);
+}
+}
+
+static CPUDebugExcpHandler *prev_debug_excp_handler;
+
+static void breakpoint_handler(CPUX86State *env)
+{
+CPUBreakpoint *bp;
+
+if (env-watchpoint_hit) {
+if (env-watchpoint_hit-flags  BP_CPU) {
+env-watchpoint_hit = NULL;
+if (check_hw_breakpoints(env, 0)) {
+raise_exception_env(EXCP01_DB, env);
+} else {
+cpu_resume_from_signal(env, NULL);
+}
+}
+} else {
+QTAILQ_FOREACH(bp, env-breakpoints, entry)
+if (bp-pc == env-eip) {
+if (bp-flags  BP_CPU) {
+check_hw_breakpoints(env, 1);
+raise_exception_env(EXCP01_DB, env);
+}
+break;
+}
+}
+if (prev_debug_excp_handler) {
+prev_debug_excp_handler(env);
 }
 }
 
 void x86_cpu_realize(Object *obj, Error **errp)
 {
 X86CPU *cpu = X86_CPU(obj);
+CPUX86State *env = cpu-env;
+
+if (env-apic_state) {
+if (qdev_init(env-apic_state)  0) {
+error_set(errp, QERR_DEVICE_INIT_FAILED,
+object_get_typename(OBJECT(env-apic_state)));
+return;
+}
+}
 
 mce_init(cpu);
-qemu_init_vcpu(cpu-env);
+qemu_init_vcpu(env);
+cpu_reset(CPU(cpu));
 }
 
 static void x86_cpu_initfn(Object *obj)
 {
 X86CPU *cpu = X86_CPU(obj);
 CPUX86State *env = cpu-env;
+static int inited;
 
 cpu_exec_init(env);
 
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 6fc67a9..443092e 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -947,34 +947,6 @@ int check_hw_breakpoints(CPUX86State *env, int 
force_dr6_update)
 return hit_enabled;
 }
 
-static CPUDebugExcpHandler *prev_debug_excp_handler;

Re: [Qemu-devel] [PATCH] Add a memory barrier to DMA functions

2012-05-22 Thread Michael S. Tsirkin

On Tue, May 22, 2012 at 05:17:39PM +1000, Benjamin Herrenschmidt wrote:
 On Tue, 2012-05-22 at 14:34 +1000, Benjamin Herrenschmidt wrote:
  The emulated devices can run simultaneously with the guest, so
  we need to be careful with ordering of load and stores done by
  them to the guest system memory, which need to be observed in
  the right order by the guest operating system.
  
  This adds a barrier call to the basic DMA read/write ops which
  is currently implemented as a smp_mb(), but could be later
  improved for more fine grained control of barriers.
  
  Additionally, a _relaxed() variant of the accessors is provided
  to easily convert devices who would be performance sensitive
  and negatively impacted by the change.
  
  Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org
  ---
 
 (Note to Rusty: The number I told you on ST is wrong, see below)
 
 So I tried to do some performance measurements with that patch using
 netperf on an x86 laptop (x220 with core i7).
 
 It's a bit tricky. For example, if I just create a tap interface,
 give it a local IP on the laptop and a different IP on the guest,
 (ie talking to a netserver on the host basically from the guest
 via tap), the performance is pretty poor and the numbers seem
 useless with and without the barrier.
 
 So I did tests involving talking to a server on our gigabit network
 instead.
 
 The baseline is the laptop without kvm talking to the server. The
 TCP_STREAM test results are:

It's not a good test. The thing most affecting throughput results is how
much CPU does you guest get. So as a minumum you need to measure CPU
utilization on the host and divide by that.

-- 
MST

[Qemu-devel] [PATCH 05/16] qapi schema: remove trailing whitespace

2012-05-22 Thread Laszlo Ersek


Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 qapi-schema.json |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index bb1f806..a8b3803 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -315,7 +315,7 @@
 # @CPU: the index of the virtual CPU
 #
 # @current: this only exists for backwards compatible and should be ignored
-# 
+#
 # @halted: true if the virtual CPU is in the halt state.  Halt usually refers
 #  to a processor specific low power mode.
 #
@@ -658,7 +658,7 @@
 # @SpiceInfo
 #
 # Information about the SPICE session.
-# 
+#
 # @enabled: true if the SPICE server is enabled, false otherwise
 #
 # @host: #optional The hostname the SPICE server is bound to.  This depends on
@@ -1269,7 +1269,7 @@
 ##
 { 'command': 'human-monitor-command',
   'data': {'command-line': 'str', '*cpu-index': 'int'},
-  'returns': 'str' } 
+  'returns': 'str' }
 
 ##
 # @migrate_cancel
@@ -1430,7 +1430,7 @@
 # @password: the new password
 #
 # @connected: #optional how to handle existing clients when changing the
-#   password.  If nothing is specified, defaults to `keep' 
+#   password.  If nothing is specified, defaults to `keep'
 #   `fail' to fail the command if clients are connected
 #   `disconnect' to disconnect existing clients
 #   `keep' to maintain existing clients
@@ -1570,7 +1570,7 @@
 #  If the argument combination is invalid, InvalidParameterCombination
 #
 # Since: 1.1
-## 
+##
 { 'command': 'block_set_io_throttle',
   'data': { 'device': 'str', 'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
 'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int' } }
-- 
1.7.1

[Qemu-devel] [PATCH 10/16] convert net_init_dump() to NetClientOptions

2012-05-22 Thread Laszlo Ersek


Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 net/dump.c |   21 +
 1 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/dump.c b/net/dump.c
index 27e9528..a1de09e 100644
--- a/net/dump.c
+++ b/net/dump.c
@@ -144,22 +144,35 @@ static int net_dump_init(VLANState *vlan, const char 
*device,
 return 0;
 }
 
-int net_init_dump(QemuOpts *opts, const NetClientOptions *new_opts,
+int net_init_dump(QemuOpts *old_opts, const NetClientOptions *opts,
   const char *name, VLANState *vlan)
 {
 int len;
 const char *file;
 char def_file[128];
+const NetdevDumpOptions *dump;
+
+assert(opts-kind == NET_CLIENT_OPTIONS_KIND_DUMP);
+dump = opts-dump;
 
 assert(vlan);
 
-file = qemu_opt_get(opts, file);
-if (!file) {
+if (dump-has_file) {
+file = dump-file;
+} else {
 snprintf(def_file, sizeof(def_file), qemu-vlan%d.pcap, vlan-id);
 file = def_file;
 }
 
-len = qemu_opt_get_size(opts, len, 65536);
+if (dump-has_len) {
+if (dump-len  INT_MAX) {
+error_report(invalid length: %PRId64, dump-len);
+return -1;
+}
+len = dump-len;
+} else {
+len = 65536;
+}
 
 return net_dump_init(vlan, dump, name, file, len);
 }
-- 
1.7.1

Re: [Qemu-devel] [PATCH 3/3] deal with guest panicked event

2012-05-22 Thread Jan Kiszka

On 2012-05-21 03:50, Wen Congyang wrote:
 When the guest is panicked, it will write 0x1 to the port 0x505. So if
 qemu reads 0x1 from this port, we can do the folloing three things
 according to the parameter -onpanic:
 1. emit QEVENT_GUEST_PANICKED only
 2. emit QEVENT_GUEST_PANICKED and pause VM
 3. emit QEVENT_GUEST_PANICKED and quit VM
 
 Note: if we emit QEVENT_GUEST_PANICKED only, and the management
 application does not receive this event(the management may not
 run when the event is emitted), the management won't know the
 guest is panicked.
 
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 ---
  kvm-all.c|   84 
 ++
  kvm-stub.c   |9 ++
  kvm.h|3 ++
  monitor.c|3 ++
  monitor.h|1 +
  qapi-schema.json |6 +++-
  qemu-options.hx  |   14 +
  qmp.c|3 +-
  vl.c |   17 ++-
  9 files changed, 137 insertions(+), 3 deletions(-)
 
 diff --git a/kvm-all.c b/kvm-all.c
 index 9b73ccf..b5b0531 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -19,6 +19,7 @@
  #include stdarg.h
  
  #include linux/kvm.h
 +#include linux/kvm_para.h
  
  #include qemu-common.h
  #include qemu-barrier.h
 @@ -29,6 +30,8 @@
  #include bswap.h
  #include memory.h
  #include exec-memory.h
 +#include iorange.h
 +#include qemu-objects.h
  
  /* This check must be after config-host.h is included */
  #ifdef CONFIG_EVENTFD
 @@ -1707,3 +1710,84 @@ int kvm_on_sigbus(int code, void *addr)
  {
  return kvm_arch_on_sigbus(code, addr);
  }
 +
 +/* Possible values for action parameter. */
 +#define PANICKED_REPORT 1   /* emit QEVENT_GUEST_PANICKED only */
 +#define PANICKED_PAUSE  2   /* emit QEVENT_GUEST_PANICKED and pause VM */
 +#define PANICKED_QUIT   3   /* emit QEVENT_GUEST_PANICKED and quit VM */
 +
 +static int panicked_action = PANICKED_REPORT;
 +
 +static void kvm_pv_port_read(IORange *iorange, uint64_t offset, unsigned 
 width,
 + uint64_t *data)
 +{
 +*data = (1  KVM_PV_FEATURE_PANICKED);
 +}
 +
 +static void panicked_mon_event(const char *action)
 +{
 +QObject *data;
 +
 +data = qobject_from_jsonf({ 'action': %s }, action);
 +monitor_protocol_event(QEVENT_GUEST_PANICKED, data);
 +qobject_decref(data);
 +}
 +
 +static void panicked_perform_action(void)
 +{
 +switch(panicked_action) {
 +case PANICKED_REPORT:
 +panicked_mon_event(report);
 +break;
 +
 +case PANICKED_PAUSE:
 +panicked_mon_event(pause);
 +vm_stop(RUN_STATE_GUEST_PANICKED);
 +break;
 +
 +case PANICKED_QUIT:
 +panicked_mon_event(quit);
 +exit(0);
 +break;
 +}
 +}
 +
 +static void kvm_pv_port_write(IORange *iorange, uint64_t offset, unsigned 
 width,
 +  uint64_t data)
 +{
 +if (data == KVM_PV_PANICKED)
 +panicked_perform_action();
 +}
 +
 +static void kvm_pv_port_destructor(IORange *iorange)
 +{
 +g_free(iorange);
 +}
 +
 +static IORangeOps pv_io_range_ops = {
 +.read = kvm_pv_port_read,
 +.write = kvm_pv_port_write,
 +.destructor = kvm_pv_port_destructor,
 +};
 +
 +void kvm_pv_port_init(void)
 +{
 +IORange *pv_io_range = g_malloc(sizeof(IORange));
 +
 +iorange_init(pv_io_range, pv_io_range_ops, 0x505, 1);
 +ioport_register(pv_io_range);

Not sure if the discussion about the PV channel already settled, but if
PIO is the way to go, please model this as a proper QEMU device (e.g.
panic, implemented in hw/kvm/panic.c), not just an open-coded PIO range.

Jan

PS: checkpatch.pl...

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] [PATCH 07/16] hw, net: net_client_type - NetClientOptionsKind (qapi-generated)

2012-05-22 Thread Laszlo Ersek

NET_CLIENT_TYPE_ - NET_CLIENT_OPTIONS_KIND_

Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 net.h   |   16 +-
 hw/cadence_gem.c|2 +-
 hw/dp8393x.c|2 +-
 hw/e1000.c  |2 +-
 hw/eepro100.c   |2 +-
 hw/etraxfs_eth.c|2 +-
 hw/lan9118.c|2 +-
 hw/lance.c  |2 +-
 hw/mcf_fec.c|2 +-
 hw/milkymist-minimac2.c |2 +-
 hw/mipsnet.c|2 +-
 hw/musicpal.c   |2 +-
 hw/ne2000-isa.c |2 +-
 hw/ne2000.c |2 +-
 hw/opencores_eth.c  |2 +-
 hw/pcnet-pci.c  |2 +-
 hw/rtl8139.c|2 +-
 hw/smc91c111.c  |2 +-
 hw/spapr_llan.c |2 +-
 hw/stellaris_enet.c |2 +-
 hw/usb/dev-network.c|2 +-
 hw/vhost_net.c  |2 +-
 hw/virtio-net.c |   10 
 hw/xen_nic.c|2 +-
 hw/xgmac.c  |2 +-
 hw/xilinx_axienet.c |2 +-
 hw/xilinx_ethlite.c |2 +-
 net.c   |   50 +++---
 net/dump.c  |2 +-
 net/slirp.c |2 +-
 net/socket.c|4 +-
 net/tap-win32.c |2 +-
 net/tap.c   |   16 +++---
 net/vde.c   |2 +-
 34 files changed, 71 insertions(+), 83 deletions(-)

diff --git a/net.h b/net.h
index bdc2a06..b0b8c7a 100644
--- a/net.h
+++ b/net.h
@@ -7,6 +7,7 @@
 #include qemu-option.h
 #include net/queue.h
 #include vmstate.h
+#include qapi-types.h
 
 struct MACAddr {
 uint8_t a[6];
@@ -29,19 +30,6 @@ typedef struct NICConf {
 
 /* VLANs support */
 
-typedef enum {
-NET_CLIENT_TYPE_NONE,
-NET_CLIENT_TYPE_NIC,
-NET_CLIENT_TYPE_USER,
-NET_CLIENT_TYPE_TAP,
-NET_CLIENT_TYPE_SOCKET,
-NET_CLIENT_TYPE_VDE,
-NET_CLIENT_TYPE_DUMP,
-NET_CLIENT_TYPE_BRIDGE,
-
-NET_CLIENT_TYPE_MAX
-} net_client_type;
-
 typedef void (NetPoll)(VLANClientState *, bool enable);
 typedef int (NetCanReceive)(VLANClientState *);
 typedef ssize_t (NetReceive)(VLANClientState *, const uint8_t *, size_t);
@@ -50,7 +38,7 @@ typedef void (NetCleanup) (VLANClientState *);
 typedef void (LinkStatusChanged)(VLANClientState *);
 
 typedef struct NetClientInfo {
-net_client_type type;
+NetClientOptionsKind type;
 size_t size;
 NetReceive *receive;
 NetReceive *receive_raw;
diff --git a/hw/cadence_gem.c b/hw/cadence_gem.c
index e2140ae..f184620 100644
--- a/hw/cadence_gem.c
+++ b/hw/cadence_gem.c
@@ -1161,7 +1161,7 @@ static void gem_set_link(VLANClientState *nc)
 }
 
 static NetClientInfo net_gem_info = {
-.type = NET_CLIENT_TYPE_NIC,
+.type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
 .can_receive = gem_can_receive,
 .receive = gem_receive,
diff --git a/hw/dp8393x.c b/hw/dp8393x.c
index 017d074..756d630 100644
--- a/hw/dp8393x.c
+++ b/hw/dp8393x.c
@@ -872,7 +872,7 @@ static void nic_cleanup(VLANClientState *nc)
 }
 
 static NetClientInfo net_dp83932_info = {
-.type = NET_CLIENT_TYPE_NIC,
+.type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
 .can_receive = nic_can_receive,
 .receive = nic_receive,
diff --git a/hw/e1000.c b/hw/e1000.c
index 4573f13..ad24298 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -1206,7 +1206,7 @@ pci_e1000_uninit(PCIDevice *dev)
 }
 
 static NetClientInfo net_e1000_info = {
-.type = NET_CLIENT_TYPE_NIC,
+.type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
 .can_receive = e1000_can_receive,
 .receive = e1000_receive,
diff --git a/hw/eepro100.c b/hw/eepro100.c
index 6279ae3..f343685 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -1845,7 +1845,7 @@ static int pci_nic_uninit(PCIDevice *pci_dev)
 }
 
 static NetClientInfo net_eepro100_info = {
-.type = NET_CLIENT_TYPE_NIC,
+.type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
 .can_receive = nic_can_receive,
 .receive = nic_receive,
diff --git a/hw/etraxfs_eth.c b/hw/etraxfs_eth.c
index 16a0637..45fb40c 100644
--- a/hw/etraxfs_eth.c
+++ b/hw/etraxfs_eth.c
@@ -579,7 +579,7 @@ static void eth_cleanup(VLANClientState *nc)
 }
 
 static NetClientInfo net_etraxfs_info = {
-   .type = NET_CLIENT_TYPE_NIC,
+   .type = NET_CLIENT_OPTIONS_KIND_NIC,
.size = sizeof(NICState),
.can_receive = eth_can_receive,
.receive = eth_receive,
diff --git a/hw/lan9118.c b/hw/lan9118.c
index 7b4fe87..40fb765 100644
--- a/hw/lan9118.c
+++ b/hw/lan9118.c
@@ -1310,7 +1310,7 @@ static void lan9118_cleanup(VLANClientState *nc)
 }
 
 static NetClientInfo net_lan9118_info = {
-.type = NET_CLIENT_TYPE_NIC,
+.type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
 .can_receive = lan9118_can_receive,
 .receive = lan9118_receive,
diff --git a/hw/lance.c b/hw/lance.c
index ce3d46c..91c0e16 100644
--- a/hw/lance.c
+++ b/hw/lance.c
@@ -93,7 +93,7 @@

[Qemu-devel] [PATCH qom-next 1/5] target-i386: move cpu halted decision into x86_cpu_reset

2012-05-22 Thread Igor Mammedov

From: Igor Mammedov niall...@gmail.com

MP initialization protocol differs between cpu families, and for P6 and
onward models it is up to CPU to decide if it will be BSP using this
protocol, so try to model this. However there is no point in implementing
MP initialization protocol in qemu. Thus first CPU is always marked as BSP.

This patch:
 - moves decision to designate BSP from board into cpu, making cpu
self-sufficient in this regard. Later it will allow to cleanup hw/pc.c
and remove cpu_reset and wrappers from there.
 - stores flag that CPU is BSP in IA32_APIC_BASE to model behavior
described in Inted SDM vol 3a part 1 chapter 8.4.1
 - uses MSR_IA32_APICBASE_BSP flag in apic_base for checking if cpu is BSP

patch is based on Jan Kiszka's proposal:
http://thread.gmane.org/gmane.comp.emulators.qemu/100806

Signed-off-by: Igor Mammedov niall...@gmail.com
Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 hw/apic.h|2 +-
 hw/apic_common.c |   18 --
 hw/pc.c  |9 -
 target-i386/cpu.c|7 +++
 target-i386/helper.c |1 -
 target-i386/kvm.c|5 +++--
 6 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/hw/apic.h b/hw/apic.h
index 62179ce..d961ed4 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -20,9 +20,9 @@ void apic_init_reset(DeviceState *s);
 void apic_sipi(DeviceState *s);
 void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip,
TPRAccess access);
+void apic_designate_bsp(DeviceState *d);
 
 /* pc.c */
-int cpu_is_bsp(CPUX86State *env);
 DeviceState *cpu_get_current_apic(void);
 
 #endif
diff --git a/hw/apic_common.c b/hw/apic_common.c
index 60b8259..23d51e8 100644
--- a/hw/apic_common.c
+++ b/hw/apic_common.c
@@ -43,8 +43,8 @@ uint64_t cpu_get_apic_base(DeviceState *d)
 trace_cpu_get_apic_base((uint64_t)s-apicbase);
 return s-apicbase;
 } else {
-trace_cpu_get_apic_base(0);
-return 0;
+trace_cpu_get_apic_base(MSR_IA32_APICBASE_BSP);
+return MSR_IA32_APICBASE_BSP;
 }
 }
 
@@ -201,22 +201,28 @@ void apic_init_reset(DeviceState *d)
 s-timer_expiry = -1;
 }
 
+void apic_designate_bsp(DeviceState *d)
+{
+if (d) {
+APICCommonState *s = APIC_COMMON(d);
+s-apicbase |= MSR_IA32_APICBASE_BSP;
+}
+}
+
 static void apic_reset_common(DeviceState *d)
 {
 APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
 APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
-bool bsp;
 
-bsp = cpu_is_bsp(s-cpu_env);
 s-apicbase = 0xfee0 |
-(bsp ? MSR_IA32_APICBASE_BSP : 0) | MSR_IA32_APICBASE_ENABLE;
+(s-apicbase  MSR_IA32_APICBASE_BSP) | MSR_IA32_APICBASE_ENABLE;
 
 s-vapic_paddr = 0;
 info-vapic_base_update(s);
 
 apic_init_reset(d);
 
-if (bsp) {
+if (s-apicbase  MSR_IA32_APICBASE_BSP) {
 /*
  * LINT0 delivery mode on CPU #0 is set to ExtInt at initialization
  * time typically by BIOS, so PIC interrupt can be delivered to the
diff --git a/hw/pc.c b/hw/pc.c
index 4167782..6ad1c0d 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -870,12 +870,6 @@ void pc_init_ne2k_isa(ISABus *bus, NICInfo *nd)
 nb_ne2k++;
 }
 
-int cpu_is_bsp(CPUX86State *env)
-{
-/* We hard-wire the BSP to the first CPU. */
-return env-cpu_index == 0;
-}
-
 DeviceState *cpu_get_current_apic(void)
 {
 if (cpu_single_env) {
@@ -935,10 +929,7 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int 
level)
 static void pc_cpu_reset(void *opaque)
 {
 X86CPU *cpu = opaque;
-CPUX86State *env = cpu-env;
-
 cpu_reset(CPU(cpu));
-env-halted = !cpu_is_bsp(env);
 }
 
 static X86CPU *pc_new_cpu(const char *cpu_model)
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 89b4ac7..c7c23bf 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1704,6 +1704,13 @@ static void x86_cpu_reset(CPUState *s)
 env-dr[7] = DR7_FIXED_1;
 cpu_breakpoint_remove_all(env, BP_CPU);
 cpu_watchpoint_remove_all(env, BP_CPU);
+
+/* We hard-wire the BSP to the first CPU. */
+if (env-cpu_index == 0) {
+apic_designate_bsp(env-apic_state);
+}
+
+env-halted = !(cpu_get_apic_base(env-apic_state)  
MSR_IA32_APICBASE_BSP);
 }
 
 static void mce_init(X86CPU *cpu)
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 8df109f..94f95b7 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1197,7 +1197,6 @@ void do_cpu_init(X86CPU *cpu)
 env-interrupt_request = sipi;
 env-pat = pat;
 apic_init_reset(env-apic_state);
-env-halted = !cpu_is_bsp(env);
 }
 
 void do_cpu_sipi(X86CPU *cpu)
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 0d0d8f6..09621e5 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -583,8 +583,9 @@ void kvm_arch_reset_vcpu(CPUX86State *env)
 env-interrupt_injected = -1;
 env-xcr0 = 1;
 if (kvm_irqchip_in_kernel()) {
-env-mp_state = cpu_is_bsp(env) ?

[Qemu-devel] [PATCH 16/16] remove unused QemuOpts parameter from net init functions

2012-05-22 Thread Laszlo Ersek


Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 net/dump.h  |5 ++---
 net/slirp.h |5 ++---
 net/socket.h|5 ++---
 net/tap.h   |9 -
 net/vde.h   |5 ++---
 net.c   |   14 ++
 net/dump.c  |4 ++--
 net/slirp.c |4 ++--
 net/socket.c|4 ++--
 net/tap-win32.c |4 ++--
 net/tap.c   |8 
 net/vde.c   |4 ++--
 12 files changed, 32 insertions(+), 39 deletions(-)

diff --git a/net/dump.h b/net/dump.h
index 85ac00b..0fa2dd7 100644
--- a/net/dump.h
+++ b/net/dump.h
@@ -25,10 +25,9 @@
 #define QEMU_NET_DUMP_H
 
 #include net.h
-#include qemu-common.h
 #include qapi-types.h
 
-int net_init_dump(QemuOpts *opts, const NetClientOptions *new_opts,
-  const char *name, VLANState *vlan);
+int net_init_dump(const NetClientOptions *opts, const char *name,
+  VLANState *vlan);
 
 #endif /* QEMU_NET_DUMP_H */
diff --git a/net/slirp.h b/net/slirp.h
index ef13a65..fa7eacc 100644
--- a/net/slirp.h
+++ b/net/slirp.h
@@ -26,13 +26,12 @@
 
 #include qemu-common.h
 #include qdict.h
-#include qemu-option.h
 #include qapi-types.h
 
 #ifdef CONFIG_SLIRP
 
-int net_init_slirp(QemuOpts *opts, const NetClientOptions *new_opts,
-   const char *name, VLANState *vlan);
+int net_init_slirp(const NetClientOptions *opts, const char *name,
+   VLANState *vlan);
 
 void net_slirp_hostfwd_add(Monitor *mon, const QDict *qdict);
 void net_slirp_hostfwd_remove(Monitor *mon, const QDict *qdict);
diff --git a/net/socket.h b/net/socket.h
index e44d26e..c4809ad 100644
--- a/net/socket.h
+++ b/net/socket.h
@@ -25,10 +25,9 @@
 #define QEMU_NET_SOCKET_H
 
 #include net.h
-#include qemu-common.h
 #include qapi-types.h
 
-int net_init_socket(QemuOpts *opts, const NetClientOptions *new_opts,
-const char *name, VLANState *vlan);
+int net_init_socket(const NetClientOptions *opts, const char *name,
+VLANState *vlan);
 
 #endif /* QEMU_NET_SOCKET_H */
diff --git a/net/tap.h b/net/tap.h
index f092129..19dea58 100644
--- a/net/tap.h
+++ b/net/tap.h
@@ -27,14 +27,13 @@
 #define QEMU_NET_TAP_H
 
 #include qemu-common.h
-#include qemu-option.h
 #include qapi-types.h
 
 #define DEFAULT_NETWORK_SCRIPT /etc/qemu-ifup
 #define DEFAULT_NETWORK_DOWN_SCRIPT /etc/qemu-ifdown
 
-int net_init_tap(QemuOpts *opts, const NetClientOptions *new_opts,
- const char *name, VLANState *vlan);
+int net_init_tap(const NetClientOptions *opts, const char *name,
+ VLANState *vlan);
 
 int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int 
vnet_hdr_required);
 
@@ -59,7 +58,7 @@ int tap_get_fd(VLANClientState *vc);
 struct vhost_net;
 struct vhost_net *tap_get_vhost_net(VLANClientState *vc);
 
-int net_init_bridge(QemuOpts *opts, const NetClientOptions *new_opts,
-const char *name, VLANState *vlan);
+int net_init_bridge(const NetClientOptions *opts, const char *name,
+VLANState *vlan);
 
 #endif /* QEMU_NET_TAP_H */
diff --git a/net/vde.h b/net/vde.h
index 5fc17f9..ad502ef 100644
--- a/net/vde.h
+++ b/net/vde.h
@@ -25,13 +25,12 @@
 #define QEMU_NET_VDE_H
 
 #include qemu-common.h
-#include qemu-option.h
 #include qapi-types.h
 
 #ifdef CONFIG_VDE
 
-int net_init_vde(QemuOpts *opts, const NetClientOptions *new_opts,
- const char *name, VLANState *vlan);
+int net_init_vde(const NetClientOptions *opts, const char *name,
+ VLANState *vlan);
 
 #endif /* CONFIG_VDE */
 
diff --git a/net.c b/net.c
index bd2fd23..687992a 100644
--- a/net.c
+++ b/net.c
@@ -748,8 +748,8 @@ int net_handle_fd_param(Monitor *mon, const char *param)
 return fd;
 }
 
-static int net_init_nic(QemuOpts *old_opts, const NetClientOptions *opts,
-const char *name, VLANState *vlan)
+static int net_init_nic(const NetClientOptions *opts, const char *name,
+VLANState *vlan)
 {
 int idx;
 NICInfo *nd;
@@ -813,8 +813,7 @@ static int net_init_nic(QemuOpts *old_opts, const 
NetClientOptions *opts,
 
 
 static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
-QemuOpts *old_opts,
-const NetClientOptions *new_opts,
+const NetClientOptions *opts,
 const char *name,
 VLANState *vlan) = {
 [NET_CLIENT_OPTIONS_KIND_NIC]= net_init_nic,
@@ -833,8 +832,7 @@ static int (* const 
net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
 };
 
 
-static int net_client_init1(const void *object, int is_netdev,
-QemuOpts *old_opts, Error **errp)
+static int net_client_init1(const void *object, int is_netdev, Error **errp)
 {
 union {
 const Netdev*netdev;
@@ -884,7 +882,7 @@ static int net_client_init1(const void *object, int 
is_netdev,
 vlan = qemu_find_vlan(u.net-has_vlan ? u.net-vlan : 0, true);
 }
 
-if (net_client_init_fun[opts-kind](old_opts, opts,

Re: [Qemu-devel] [PATCH] Add a memory barrier to DMA functions

2012-05-22 Thread Benjamin Herrenschmidt

On Tue, 2012-05-22 at 14:14 +0300, Michael S. Tsirkin wrote:
  The baseline is the laptop without kvm talking to the server. The
  TCP_STREAM test results are:
 
 It's not a good test. The thing most affecting throughput results is
 how
 much CPU does you guest get. So as a minumum you need to measure CPU
 utilization on the host and divide by that. 

The simple fact that we don't reach the baseline while in qemu seems to
be a reasonably good indication that we tend to be CPU bound already so
it's not -that- relevant. It would be if we were saturating the network.

But yes, I can try to do more tests tomorrow, it would be nice if you
could contribute a proper test protocol (or even test on some machines)
since you seem to be familiar with those measurements (and I have a very
limited access to x86 gear ... basically just my laptop).

Cheers,
Ben.

[Qemu-devel] [PATCH 01/16] qapi: fix error propagation

2012-05-22 Thread Laszlo Ersek

From: Paolo Bonzini pbonz...@redhat.com

Don't overwrite / leak previously set errors.
Don't try to end a container that could not be started.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 error.h|4 +-
 error.c|4 +-
 qapi/qapi-visit-core.c |   10 +--
 tests/test-qmp-input-visitor.c |   24 +---
 docs/qapi-code-gen.txt |2 +
 scripts/qapi-visit.py  |  129 +++
 6 files changed, 102 insertions(+), 71 deletions(-)

diff --git a/error.h b/error.h
index 45ff6c1..6898f84 100644
--- a/error.h
+++ b/error.h
@@ -24,7 +24,7 @@ typedef struct Error Error;
 /**
  * Set an indirect pointer to an error given a printf-style format parameter.
  * Currently, qerror.h defines these error formats.  This function is not
- * meant to be used outside of QEMU.
+ * meant to be used outside of QEMU.  Errors after the first are discarded.
  */
 void error_set(Error **err, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
 
@@ -57,7 +57,7 @@ void error_set_field(Error *err, const char *field, const 
char *value);
 /**
  * Propagate an error to an indirect pointer to an error.  This function will
  * always transfer ownership of the error reference and handles the case where
- * dst_err is NULL correctly.
+ * dst_err is NULL correctly.  Errors after the first are discarded.
  */
 void error_propagate(Error **dst_err, Error *local_err);
 
diff --git a/error.c b/error.c
index a52b771..0177972 100644
--- a/error.c
+++ b/error.c
@@ -29,7 +29,7 @@ void error_set(Error **errp, const char *fmt, ...)
 Error *err;
 va_list ap;
 
-if (errp == NULL) {
+if (errp == NULL || *errp != NULL) {
 return;
 }
 
@@ -132,7 +132,7 @@ bool error_is_type(Error *err, const char *fmt)
 
 void error_propagate(Error **dst_err, Error *local_err)
 {
-if (dst_err) {
+if (dst_err  !*dst_err) {
 *dst_err = local_err;
 } else if (local_err) {
 error_free(local_err);
diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
index a4e088c..df1ed5c 100644
--- a/qapi/qapi-visit-core.c
+++ b/qapi/qapi-visit-core.c
@@ -39,9 +39,8 @@ void visit_start_struct(Visitor *v, void **obj, const char 
*kind,
 
 void visit_end_struct(Visitor *v, Error **errp)
 {
-if (!error_is_set(errp)) {
-v-end_struct(v, errp);
-}
+assert(!error_is_set(errp));
+v-end_struct(v, errp);
 }
 
 void visit_start_list(Visitor *v, const char *name, Error **errp)
@@ -62,9 +61,8 @@ GenericList *visit_next_list(Visitor *v, GenericList **list, 
Error **errp)
 
 void visit_end_list(Visitor *v, Error **errp)
 {
-if (!error_is_set(errp)) {
-v-end_list(v, errp);
-}
+assert(!error_is_set(errp));
+v-end_list(v, errp);
 }
 
 void visit_start_optional(Visitor *v, bool *present, const char *name,
diff --git a/tests/test-qmp-input-visitor.c b/tests/test-qmp-input-visitor.c
index c30fdc4..8f5a509 100644
--- a/tests/test-qmp-input-visitor.c
+++ b/tests/test-qmp-input-visitor.c
@@ -151,14 +151,22 @@ typedef struct TestStruct
 static void visit_type_TestStruct(Visitor *v, TestStruct **obj,
   const char *name, Error **errp)
 {
-visit_start_struct(v, (void **)obj, TestStruct, name, sizeof(TestStruct),
-   errp);
-
-visit_type_int(v, (*obj)-integer, integer, errp);
-visit_type_bool(v, (*obj)-boolean, boolean, errp);
-visit_type_str(v, (*obj)-string, string, errp);
-
-visit_end_struct(v, errp);
+Error *err = NULL;
+if (!error_is_set(errp)) {
+visit_start_struct(v, (void **)obj, TestStruct, name, 
sizeof(TestStruct),
+   err);
+if (!err) {
+visit_type_int(v, (*obj)-integer, integer, err);
+visit_type_bool(v, (*obj)-boolean, boolean, err);
+visit_type_str(v, (*obj)-string, string, err);
+
+/* Always call end_struct if start_struct succeeded.  */
+error_propagate(errp, err);
+err = NULL;
+visit_end_struct(v, err);
+}
+error_propagate(errp, err);
+}
 }
 
 static void test_visitor_in_struct(TestInputVisitorData *data,
diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt
index ad11767..cccb11e 100644
--- a/docs/qapi-code-gen.txt
+++ b/docs/qapi-code-gen.txt
@@ -220,6 +220,8 @@ Example:
 #endif
 mdroth@illuin:~/w/qemu2.git$
 
+(The actual structure of the visit_type_* functions is a bit more complex
+in order to propagate errors correctly and avoid leaking memory).
 
 === scripts/qapi-commands.py ===
 
diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py
index 8d4e94a..61cf586 100644
--- a/scripts/qapi-visit.py
+++ b/scripts/qapi-visit.py
@@ -17,14 +17,37 @@ import os
 import getopt
 import errno
 
-def generate_visit_struct_body(field_prefix, members):
-ret = 
+def generate_visit_struct_body(field_prefix, name,

[Qemu-devel] [PATCH 12/16] convert net_init_socket() to NetClientOptions

2012-05-22 Thread Laszlo Ersek

I reverse engineered the following permissions between the -socket
sub-options:

 fd  listen  connect  mcast  udp | localaddr
  fd x   .   ..  .   | .
  listen .   x   ..  .   | .
  connect.   .   x.  .   | .
  mcast  .   .   .x  .   | x
  udp.   .   ..  x   | x
  ---+
  localaddr  .   .   .x  x x

I transformed the code accordingly. The real fix would be to embed fd,
listen, connect, mcast and udp in a separate union. However
OptsVisitor's enum parser only supports the type=XXX QemuOpt instance as
union discriminator.

Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 net/socket.c |  119 +-
 1 files changed, 43 insertions(+), 76 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 563447d..e3cba20 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -586,101 +586,68 @@ static int net_socket_udp_init(VLANState *vlan,
 return 0;
 }
 
-int net_init_socket(QemuOpts *opts, const NetClientOptions *new_opts,
+int net_init_socket(QemuOpts *old_opts, const NetClientOptions *opts,
 const char *name, VLANState *vlan)
 {
-if (qemu_opt_get(opts, fd)) {
-int fd;
+const NetdevSocketOptions *sock;
 
-if (qemu_opt_get(opts, listen) ||
-qemu_opt_get(opts, connect) ||
-qemu_opt_get(opts, mcast) ||
-qemu_opt_get(opts, localaddr)) {
-error_report(listen=, connect=, mcast= and localaddr= is invalid 
with fd=);
-return -1;
-}
+assert(opts-kind == NET_CLIENT_OPTIONS_KIND_SOCKET);
+sock = opts-socket;
 
-fd = net_handle_fd_param(cur_mon, qemu_opt_get(opts, fd));
-if (fd == -1) {
-return -1;
-}
+if (sock-has_fd + sock-has_listen + sock-has_connect + sock-has_mcast +
+sock-has_udp != 1) {
+error_report(exactly one of fd=, listen=, connect=, mcast= or udp=
+  is required);
+return -1;
+}
 
-if (!net_socket_fd_init(vlan, socket, name, fd, 1)) {
-return -1;
-}
-} else if (qemu_opt_get(opts, listen)) {
-const char *listen;
-
-if (qemu_opt_get(opts, fd) ||
-qemu_opt_get(opts, connect) ||
-qemu_opt_get(opts, mcast) ||
-qemu_opt_get(opts, localaddr)) {
-error_report(fd=, connect=, mcast= and localaddr= is invalid with 
listen=);
-return -1;
-}
+if (sock-has_localaddr  !sock-has_mcast  !sock-has_udp) {
+error_report(localaddr= is only valid with mcast= or udp=);
+return -1;
+}
 
-listen = qemu_opt_get(opts, listen);
+if (sock-has_fd) {
+int fd;
 
-if (net_socket_listen_init(vlan, socket, name, listen) == -1) {
-return -1;
-}
-} else if (qemu_opt_get(opts, connect)) {
-const char *connect;
-
-if (qemu_opt_get(opts, fd) ||
-qemu_opt_get(opts, listen) ||
-qemu_opt_get(opts, mcast) ||
-qemu_opt_get(opts, localaddr)) {
-error_report(fd=, listen=, mcast= and localaddr= is invalid with 
connect=);
+fd = net_handle_fd_param(cur_mon, sock-fd);
+if (fd == -1 || !net_socket_fd_init(vlan, socket, name, fd, 1)) {
 return -1;
 }
+return 0;
+}
 
-connect = qemu_opt_get(opts, connect);
-
-if (net_socket_connect_init(vlan, socket, name, connect) == -1) {
+if (sock-has_listen) {
+if (net_socket_listen_init(vlan, socket, name, sock-listen) == -1) {
 return -1;
 }
-} else if (qemu_opt_get(opts, mcast)) {
-const char *mcast, *localaddr;
+return 0;
+}
 
-if (qemu_opt_get(opts, fd) ||
-qemu_opt_get(opts, connect) ||
-qemu_opt_get(opts, listen)) {
-error_report(fd=, connect= and listen= is invalid with mcast=);
+if (sock-has_connect) {
+if (net_socket_connect_init(vlan, socket, name, sock-connect) ==
+-1) {
 return -1;
 }
+return 0;
+}
 
-mcast = qemu_opt_get(opts, mcast);
-localaddr = qemu_opt_get(opts, localaddr);
-
-if (net_socket_mcast_init(vlan, socket, name, mcast, localaddr) == 
-1) {
-return -1;
-}
-} else if (qemu_opt_get(opts, udp)) {
-const char *udp, *localaddr;
-
-if (qemu_opt_get(opts, fd) ||
-qemu_opt_get(opts, connect) ||
-qemu_opt_get(opts, listen) ||
-qemu_opt_get(opts, mcast)) {
-error_report(fd=, connect=, listen=
-  and mcast= is invalid with udp=);
+if (sock-has_mcast) {
+/* if sock-localaddr is missing, it has been initialized to all bits
+ * zero */
+if

[Qemu-devel] [PATCHv3 2/2] envlist.c: handle strdup failure

2012-05-22 Thread Jim Meyering

From: Jim Meyering meyer...@redhat.com

Without this, envlist_to_environ may silently fail to copy all
strings into the destination buffer, and both callers would leak
any env strings allocated after a failing strdup, because the
freeing code stops at the first NULL pointer.

Signed-off-by: Jim Meyering meyer...@redhat.com
---
 envlist.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/envlist.c b/envlist.c
index e44889b..df5c723 100644
--- a/envlist.c
+++ b/envlist.c
@@ -234,8 +234,16 @@ envlist_to_environ(const envlist_t *envlist, size_t *count)
 return (NULL);

 for (entry = envlist-el_entries.lh_first; entry != NULL;
-entry = entry-ev_link.le_next) {
-*(penv++) = strdup(entry-ev_var);
+ entry = entry-ev_link.le_next, penv++) {
+*penv = strdup(entry-ev_var);
+if (*penv == NULL) {
+char **e = env;
+while (e = penv) {
+free(*e++);
+}
+free(env);
+return NULL;
+}
 }
 *penv = NULL; /* NULL terminate the list */

-- 
1.7.10.2.552.gaa3bb87

[Qemu-devel] [PATCH qom-next 5/5] target-i386: move reset callback to cpu.c

2012-05-22 Thread Igor Mammedov

Moving reset callback into cpu object from board level will allow
properly create object during run-time (hotplug).

When reset over QOM hierarchy is implemented, this reset callback
should be removed.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 hw/pc.c   |7 ---
 target-i386/cpu.c |8 
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 677f9e0..70dd0e6 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -888,12 +888,6 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int 
level)
 }
 }
 
-static void pc_cpu_reset(void *opaque)
-{
-X86CPU *cpu = opaque;
-cpu_reset(CPU(cpu));
-}
-
 void pc_cpus_init(const char *cpu_model)
 {
 X86CPU *cpu;
@@ -904,7 +898,6 @@ void pc_cpus_init(const char *cpu_model)
 if (cpu == NULL) {
 exit(1);
 }
-qemu_register_reset(pc_cpu_reset, cpu);
 }
 }
 
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 0e804ea..87f4f5a 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1716,6 +1716,13 @@ static void x86_cpu_reset(CPUState *s)
 env-halted = !(cpu_get_apic_base(env-apic_state)  
MSR_IA32_APICBASE_BSP);
 }
 
+/* TODO: remove me, when reset over QOM tree is implemented */
+static void x86_cpu_machine_reset_cb(void *opaque)
+{
+X86CPU *cpu = opaque;
+cpu_reset(CPU(cpu));
+}
+
 static void mce_init(X86CPU *cpu)
 {
 CPUX86State *cenv = cpu-env;
@@ -1812,6 +1819,7 @@ void x86_cpu_realize(Object *obj, Error **errp)
 
 mce_init(cpu);
 qemu_init_vcpu(env);
+qemu_register_reset(x86_cpu_machine_reset_cb, cpu);
 cpu_reset(CPU(cpu));
 }
 
-- 
1.7.7.6

[Qemu-devel] [PATCH 03/16] expose QemuOpt and QemuOpts struct definitions to interested parties

2012-05-22 Thread Laszlo Ersek

The only clients should be the existent qemu-option.c, and the upcoming
qapi/opts-visitor.c.

Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 qemu-option-internal.h |   53 
 qemu-option.c  |   24 +
 2 files changed, 54 insertions(+), 23 deletions(-)
 create mode 100644 qemu-option-internal.h

diff --git a/qemu-option-internal.h b/qemu-option-internal.h
new file mode 100644
index 000..19fdc1c
--- /dev/null
+++ b/qemu-option-internal.h
@@ -0,0 +1,53 @@
+/*
+ * Commandline option parsing functions
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2009 Kevin Wolf kw...@redhat.com
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the Software), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef QEMU_OPTIONS_INTERNAL_H
+#define QEMU_OPTIONS_INTERNAL_H
+
+#include qemu-option.h
+
+struct QemuOpt {
+const char   *name;
+const char   *str;
+
+const QemuOptDesc *desc;
+union {
+bool boolean;
+uint64_t uint;
+} value;
+
+QemuOpts *opts;
+QTAILQ_ENTRY(QemuOpt) next;
+};
+
+struct QemuOpts {
+char *id;
+QemuOptsList *list;
+Location loc;
+QTAILQ_HEAD(QemuOptHead, QemuOpt) head;
+QTAILQ_ENTRY(QemuOpts) next;
+};
+
+#endif
diff --git a/qemu-option.c b/qemu-option.c
index bb3886c..8334190 100644
--- a/qemu-option.c
+++ b/qemu-option.c
@@ -29,9 +29,9 @@
 #include qemu-common.h
 #include qemu-error.h
 #include qemu-objects.h
-#include qemu-option.h
 #include error.h
 #include qerror.h
+#include qemu-option-internal.h
 
 /*
  * Extracts the name of an option from the parameter string (p points at the
@@ -511,28 +511,6 @@ void print_option_help(QEMUOptionParameter *list)
 
 /* -- */
 
-struct QemuOpt {
-const char   *name;
-const char   *str;
-
-const QemuOptDesc *desc;
-union {
-bool boolean;
-uint64_t uint;
-} value;
-
-QemuOpts *opts;
-QTAILQ_ENTRY(QemuOpt) next;
-};
-
-struct QemuOpts {
-char *id;
-QemuOptsList *list;
-Location loc;
-QTAILQ_HEAD(QemuOptHead, QemuOpt) head;
-QTAILQ_ENTRY(QemuOpts) next;
-};
-
 static QemuOpt *qemu_opt_find(QemuOpts *opts, const char *name)
 {
 QemuOpt *opt;
-- 
1.7.1

[Qemu-devel] [PATCH 09/16] convert net_init_nic() to NetClientOptions

2012-05-22 Thread Laszlo Ersek


Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 net.c |   39 ++-
 1 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/net.c b/net.c
index 5ac5cf0..bd2fd23 100644
--- a/net.c
+++ b/net.c
@@ -748,12 +748,15 @@ int net_handle_fd_param(Monitor *mon, const char *param)
 return fd;
 }
 
-static int net_init_nic(QemuOpts *opts, const NetClientOptions *new_opts,
+static int net_init_nic(QemuOpts *old_opts, const NetClientOptions *opts,
 const char *name, VLANState *vlan)
 {
 int idx;
 NICInfo *nd;
-const char *netdev;
+const NetLegacyNicOptions *nic;
+
+assert(opts-kind == NET_CLIENT_OPTIONS_KIND_NIC);
+nic = opts-nic;
 
 idx = nic_get_free_idx();
 if (idx == -1 || nb_nics = MAX_NICS) {
@@ -765,10 +768,10 @@ static int net_init_nic(QemuOpts *opts, const 
NetClientOptions *new_opts,
 
 memset(nd, 0, sizeof(*nd));
 
-if ((netdev = qemu_opt_get(opts, netdev))) {
-nd-netdev = qemu_find_netdev(netdev);
+if (nic-has_netdev) {
+nd-netdev = qemu_find_netdev(nic-netdev);
 if (!nd-netdev) {
-error_report(netdev '%s' not found, netdev);
+error_report(netdev '%s' not found, nic-netdev);
 return -1;
 }
 } else {
@@ -778,26 +781,28 @@ static int net_init_nic(QemuOpts *opts, const 
NetClientOptions *new_opts,
 if (name) {
 nd-name = g_strdup(name);
 }
-if (qemu_opt_get(opts, model)) {
-nd-model = g_strdup(qemu_opt_get(opts, model));
+if (nic-has_model) {
+nd-model = g_strdup(nic-model);
 }
-if (qemu_opt_get(opts, addr)) {
-nd-devaddr = g_strdup(qemu_opt_get(opts, addr));
+if (nic-has_addr) {
+nd-devaddr = g_strdup(nic-addr);
 }
 
-if (qemu_opt_get(opts, macaddr) 
-net_parse_macaddr(nd-macaddr.a, qemu_opt_get(opts, macaddr))  0) {
+if (nic-has_macaddr 
+net_parse_macaddr(nd-macaddr.a, nic-macaddr)  0) {
 error_report(invalid syntax for ethernet address);
 return -1;
 }
 qemu_macaddr_default_if_unset(nd-macaddr);
 
-nd-nvectors = qemu_opt_get_number(opts, vectors,
-   DEV_NVECTORS_UNSPECIFIED);
-if (nd-nvectors != DEV_NVECTORS_UNSPECIFIED 
-(nd-nvectors  0 || nd-nvectors  0x7ff)) {
-error_report(invalid # of vectors: %d, nd-nvectors);
-return -1;
+if (nic-has_vectors) {
+if (nic-vectors  0x7ff) {
+error_report(invalid # of vectors: %PRId64, nic-vectors);
+return -1;
+}
+nd-nvectors = nic-vectors;
+} else {
+nd-nvectors = DEV_NVECTORS_UNSPECIFIED;
 }
 
 nd-used = 1;
-- 
1.7.1

Re: [Qemu-devel] [PATCH 2/2] fdc: fix media detection

2012-05-22 Thread Kevin Wolf

Am 22.05.2012 12:59, schrieb Pavel Hrdina:
 We have to set up 'media_changed' after guest start so floppy driver
 could detect that there is no media in drive. For this purpose we call
 'fdctrl_change_cb' instead of 'fd_revalidate' in 'fdctrl_connect_drives'.
 'fd_revalidate' is called inside 'fdctrl_change_cb'.
 
 In 'fdctrl_handle_seek' we always set current track because we don't care
 if there is media inserted or not.
 
 Signed-off-by: Pavel Hrdina phrd...@redhat.com

Can you please add a qtest case that shows the problems that you're
fixing in this series?

 diff --git a/hw/fdc.c b/hw/fdc.c
 index cb4cd25..337b35a 100644
 --- a/hw/fdc.c
 +++ b/hw/fdc.c
 @@ -1617,11 +1617,7 @@ static void fdctrl_handle_seek(FDCtrl *fdctrl, int 
 direction)
  /* The seek command just sends step pulses to the drive and doesn't care 
 if
   * there is a medium inserted of if it's banging the head against the 
 drive.
   */
 -if (fdctrl-fifo[2]  cur_drv-max_track) {
 -cur_drv-track = cur_drv-max_track;
 -} else {
 -cur_drv-track = fdctrl-fifo[2];
 -}
 +cur_drv-track = fdctrl-fifo[2];

Why is it okay to have cur_drv-track point outside the floppy? Won't it
mess up future calculations? Not all other places check it again
cur_drv-max_track.

Kevin

Re: [Qemu-devel] [RFC PATCH 0/4] block: file descriptor passing using -filefd and getfd_file

2012-05-22 Thread Eric Blake

On 05/22/2012 02:18 AM, Kevin Wolf wrote:

 This patch series adds the -filefd command-line option and the
 getfd_file monitor command.  This will enable libvirt to open a
 file and push the corresponding filename and file descriptor to
 QEMU.  When QEMU needs to open a file, it will first check if the
 file descriptor was passed by either of these methods before
 attempting to actually open the file.
 
 I thought we decided to avoid making some file names magic, and instead
 go for the obvious /dev/fd/42?

This doesn't make some file names magic, it makes all file names
magic.  In other words, _every_ call to open() first checks the
database for an existing fd for the same file name.

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] Add a memory barrier to DMA functions

2012-05-22 Thread Michael S. Tsirkin

On Tue, May 22, 2012 at 09:41:41PM +1000, Benjamin Herrenschmidt wrote:
 On Tue, 2012-05-22 at 14:14 +0300, Michael S. Tsirkin wrote:
   The baseline is the laptop without kvm talking to the server. The
   TCP_STREAM test results are:
  
  It's not a good test. The thing most affecting throughput results is
  how
  much CPU does you guest get. So as a minumum you need to measure CPU
  utilization on the host and divide by that. 
 
 The simple fact that we don't reach the baseline while in qemu seems to
 be a reasonably good indication that we tend to be CPU bound already so
 it's not -that- relevant. It would be if we were saturating the network.
 
 But yes, I can try to do more tests tomorrow, it would be nice if you
 could contribute a proper test protocol (or even test on some machines)
 since you seem to be familiar with those measurements (and I have a very
 limited access to x86 gear ... basically just my laptop).
 
 Cheers,
 Ben.

I have a deja vu. Amos sent perf results when you argued about
exactly the same issue in guest virtio. Delta was small but
measureable. At the moment I have no free time or free hardware
to redo the same work all over again. It's a well known fact that
actual memory barrier is slow on x86 CPUs. You can't see
it with network on your laptop? Write a microbenchmark.

-- 
MST

[Qemu-devel] [PATCH 13/16] convert net_init_vde() to NetClientOptions

2012-05-22 Thread Laszlo Ersek


Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 net/vde.c |   37 +
 1 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/net/vde.c b/net/vde.c
index 8e60f68..35e8113 100644
--- a/net/vde.c
+++ b/net/vde.c
@@ -110,20 +110,41 @@ static int net_vde_init(VLANState *vlan, const char 
*model,
 return 0;
 }
 
-int net_init_vde(QemuOpts *opts, const NetClientOptions *new_opts,
+int net_init_vde(QemuOpts *old_opts, const NetClientOptions *opts,
  const char *name, VLANState *vlan)
 {
-const char *sock;
-const char *group;
 int port, mode;
 
-sock  = qemu_opt_get(opts, sock);
-group = qemu_opt_get(opts, group);
+const NetdevVdeOptions *vde;
 
-port = qemu_opt_get_number(opts, port, 0);
-mode = qemu_opt_get_number(opts, mode, 0700);
+assert(opts-kind == NET_CLIENT_OPTIONS_KIND_VDE);
+vde = opts-vde;
 
-if (net_vde_init(vlan, vde, name, sock, port, group, mode) == -1) {
+if (vde-has_port) {
+if (vde-port  INT_MAX) {
+error_report(invalid port: %PRId64, vde-port);
+return -1;
+}
+port = vde-port;
+}
+else {
+port = 0;
+}
+
+if (vde-has_mode) {
+if (vde-mode  INT_MAX) {
+error_report(invalid mode: %PRId64, vde-mode);
+return -1;
+}
+mode = vde-mode;
+}
+else {
+mode = 0700;
+}
+
+/* missing optional values have been initialized to all bits zero */
+if (net_vde_init(vlan, vde, name, vde-sock, port, vde-group, mode) ==
+ -1) {
 return -1;
 }
 
-- 
1.7.1

Re: [Qemu-devel] [RFC PATCH 0/4] block: file descriptor passing using -filefd and getfd_file

2012-05-22 Thread Kevin Wolf

Am 22.05.2012 14:02, schrieb Eric Blake:
 On 05/22/2012 02:18 AM, Kevin Wolf wrote:
 
 This patch series adds the -filefd command-line option and the
 getfd_file monitor command.  This will enable libvirt to open a
 file and push the corresponding filename and file descriptor to
 QEMU.  When QEMU needs to open a file, it will first check if the
 file descriptor was passed by either of these methods before
 attempting to actually open the file.

 I thought we decided to avoid making some file names magic, and instead
 go for the obvious /dev/fd/42?
 
 This doesn't make some file names magic, it makes all file names
 magic.  In other words, _every_ call to open() first checks the
 database for an existing fd for the same file name.

Depends on your definition. You call every database lookup magic, I only
considered cases where the database actually contains something.

But no matter if some or all, there's magic and I dislike that.

Kevin

Re: [Qemu-devel] [PATCH] ISCSI: Switch to using READ16/WRITE16 for I/O to the LUN.

2012-05-22 Thread Paolo Bonzini

Il 22/05/2012 12:10, Ronnie Sahlberg ha scritto:
 This allows using LUNs bigger than 2TB.

Applied to scsi-next for 1.2, thanks.

Paolo

 Signed-off-by: Ronnie Sahlberg ronniesahlb...@gmail.com
 ---
  block/iscsi.c |  101 
  trace-events  |4 +-
  2 files changed, 74 insertions(+), 31 deletions(-)
 
 diff --git a/block/iscsi.c b/block/iscsi.c
 index f956824..ed1ad7b 100644
 --- a/block/iscsi.c
 +++ b/block/iscsi.c
 @@ -25,6 +25,7 @@
  #include config-host.h
  
  #include poll.h
 +#include arpa/inet.h
  #include qemu-common.h
  #include qemu-error.h
  #include block_int.h
 @@ -168,12 +169,12 @@ iscsi_readv_writev_bh_cb(void *p)
  
  
  static void
 -iscsi_aio_write10_cb(struct iscsi_context *iscsi, int status,
 +iscsi_aio_write16_cb(struct iscsi_context *iscsi, int status,
   void *command_data, void *opaque)
  {
  IscsiAIOCB *acb = opaque;
  
 -trace_iscsi_aio_write10_cb(iscsi, status, acb, acb-canceled);
 +trace_iscsi_aio_write16_cb(iscsi, status, acb, acb-canceled);
  
  g_free(acb-buf);
  
 @@ -186,7 +187,7 @@ iscsi_aio_write10_cb(struct iscsi_context *iscsi, int 
 status,
  
  acb-status = 0;
  if (status  0) {
 -error_report(Failed to write10 data to iSCSI lun. %s,
 +error_report(Failed to write16 data to iSCSI lun. %s,
   iscsi_get_error(iscsi));
  acb-status = -EIO;
  }
 @@ -211,12 +212,9 @@ iscsi_aio_writev(BlockDriverState *bs, int64_t 
 sector_num,
  struct iscsi_context *iscsi = iscsilun-iscsi;
  IscsiAIOCB *acb;
  size_t size;
 -int fua = 0;
 -
 -/* set FUA on writes when cache mode is write through */
 -if (!(bs-open_flags  BDRV_O_CACHE_WB)) {
 -fua = 1;
 -}
 +uint32_t num_sectors;
 +uint64_t lba;
 +struct iscsi_data data;
  
  acb = qemu_aio_get(iscsi_aio_pool, bs, cb, opaque);
  trace_iscsi_aio_writev(iscsi, sector_num, nb_sectors, opaque, acb);
 @@ -226,18 +224,44 @@ iscsi_aio_writev(BlockDriverState *bs, int64_t 
 sector_num,
  
  acb-canceled   = 0;
  
 -/* XXX we should pass the iovec to write10 to avoid the extra copy */
 +/* XXX we should pass the iovec to write16 to avoid the extra copy */
  /* this will allow us to get rid of 'buf' completely */
  size = nb_sectors * BDRV_SECTOR_SIZE;
  acb-buf = g_malloc(size);
  qemu_iovec_to_buffer(acb-qiov, acb-buf);
 -acb-task = iscsi_write10_task(iscsi, iscsilun-lun, acb-buf, size,
 -  sector_qemu2lun(sector_num, iscsilun),
 -  fua, 0, iscsilun-block_size,
 -  iscsi_aio_write10_cb, acb);
 +
 +
 +acb-task = malloc(sizeof(struct scsi_task));
  if (acb-task == NULL) {
 -error_report(iSCSI: Failed to send write10 command. %s,
 - iscsi_get_error(iscsi));
 +error_report(iSCSI: Failed to allocate task for scsi WRITE16 
 + command. %s, iscsi_get_error(iscsi));
 +qemu_aio_release(acb);
 +return NULL;
 +}
 +memset(acb-task, 0, sizeof(struct scsi_task));
 +
 +acb-task-xfer_dir = SCSI_XFER_WRITE;
 +acb-task-cdb_size = 16;
 +acb-task-cdb[0] = 0x8a;
 +if (!(bs-open_flags  BDRV_O_CACHE_WB)) {
 +/* set FUA on writes when cache mode is write through */
 +acb-task-cdb[1] |= 0x04;
 +}
 +lba = sector_qemu2lun(sector_num, iscsilun);
 +*(uint32_t *)acb-task-cdb[2]  = htonl(lba  32);
 +*(uint32_t *)acb-task-cdb[6]  = htonl(lba  0x);
 +num_sectors = size / iscsilun-block_size;
 +*(uint32_t *)acb-task-cdb[10] = htonl(num_sectors);
 +acb-task-expxferlen = size;
 +
 +data.data = acb-buf;
 +data.size = size;
 +
 +if (iscsi_scsi_command_async(iscsi, iscsilun-lun, acb-task,
 + iscsi_aio_write16_cb,
 + data,
 + acb) != 0) {
 +scsi_free_scsi_task(acb-task);
  g_free(acb-buf);
  qemu_aio_release(acb);
  return NULL;
 @@ -249,12 +273,12 @@ iscsi_aio_writev(BlockDriverState *bs, int64_t 
 sector_num,
  }
  
  static void
 -iscsi_aio_read10_cb(struct iscsi_context *iscsi, int status,
 +iscsi_aio_read16_cb(struct iscsi_context *iscsi, int status,
  void *command_data, void *opaque)
  {
  IscsiAIOCB *acb = opaque;
  
 -trace_iscsi_aio_read10_cb(iscsi, status, acb, acb-canceled);
 +trace_iscsi_aio_read16_cb(iscsi, status, acb, acb-canceled);
  
  if (acb-canceled != 0) {
  qemu_aio_release(acb);
 @@ -265,7 +289,7 @@ iscsi_aio_read10_cb(struct iscsi_context *iscsi, int 
 status,
  
  acb-status = 0;
  if (status != 0) {
 -error_report(Failed to read10 data from iSCSI lun. %s,
 +error_report(Failed to read16 data from iSCSI lun. %s,
   iscsi_get_error(iscsi));
  acb-status = -EIO;

Re: [Qemu-devel] [PATCH] ISCSI: We need to call qemu_notify_event() everytime we update which events we need to be notified for.

2012-05-22 Thread Paolo Bonzini

Il 22/05/2012 11:56, Ronnie Sahlberg ha scritto:
 Otherwise, If we add an event for -is-writeable but the socket is already 
 writeable there may be a short delay before the event callback is actually 
 triggered.
 
 Those delays would in particular hurt performance during BIOS boot and when 
 the GRUB bootloader reads the kernel and initrd.
 
 Signed-off-by: Ronnie Sahlberg ronniesahlb...@gmail.com
 ---
  block/iscsi.c |7 +++
  1 files changed, 7 insertions(+), 0 deletions(-)
 
 diff --git a/block/iscsi.c b/block/iscsi.c
 index d37c4ee..f956824 100644
 --- a/block/iscsi.c
 +++ b/block/iscsi.c
 @@ -109,6 +109,13 @@ iscsi_set_events(IscsiLun *iscsilun)
 (iscsi_which_events(iscsi)  POLLOUT)
 ? iscsi_process_write : NULL,
 iscsi_process_flush, iscsilun);
 +
 +/* If we just added the event for writeable we must call
 +   and the socket is already writeable the callback might
 +   not be invoked until after a short delay unless we call
 +   qemu_notify_event().
 + */
 +qemu_notify_event();
  }
  
  static void

Thanks, applied to SCSI branch for 1.1.

Paolo

[Qemu-devel] ARM QEMU/KVM and TrustZone

2012-05-22 Thread Peter Maydell

Historically for QEMU we haven't implemented TrustZone support even
though we claim to emulate CPUs that provide it. Instead we provide a
CPU which mostly looks like a variant of the real thing without the
TrustZone feature. We then bolt on a few extra cp15 registers (eg the
SCR) as a pragmatic move to get Linux guests to run. Now we're also
dealing with KVM on ARM I'd like to define things a bit more solidly
so KVM and TCG agree on what the CPU model they present is.

There are several possible environments we could provide
to a guest:
 (1) a CPU with full TrustZone support
 (2) a CPU without TrustZone at all
 (3) a TZ CPU running in NonSecure PL0/PL1
 (4) a TZ CPU running in Secure PL0/PL1

In some ways (1) is the purist solution -- emulate exactly what the
hardware does. However:

 * on TCG it would require a lot of work, including new functionality
   in core QEMU (to support having different CPU cores being able to
   see different views of memory, and having the S/NS attribute
   attached to memory transactions)

 * it isn't possible in KVM, because the ARM Virtualization Extensions
   don't allow you to fake the CPSR a guest sees, and so you can't
   make the guest believe it is in Monitor mode

Option (2) is architecturally sanctioned (ie TrustZone is an optional
feature, not mandatory), but it doesn't correspond to real CPUs, in
that the hardware Cortex-A8/A9/A15 always have TrustZone. So we're
modelling something that doesn't really exist.

Options (3) and (4) correspond to the environment an OS guest
typically actually uses on hardware. For ARM's devboards (versatile
express etc) Linux runs in the Secure world but it doesn't actually
use any of the TrustZone functionality, it's just a give me full
access to everything setup. For just about every other ARM system,
the boot rom or equivalent keeps Secure world to itself, and the OS
kernel runs in the NonSecure world. (This typically means that the
boot rom provides a set of board-specific entry points via the Secure
Monitor Call (SMC) instruction for doing operations like invalidate
whole L2 cache which require secure privileges.)

Proposal:

My suggestion is that we present the guest with a view that looks like
a sort of superset of (2) (3) and (4), ie sufficient that a guest
expecting any of those environments can run. In particular:

 * no cp15 registers have secure/nonsecure banking
 * there is only one memory space visible
 * secure-access-only permissions are not enforced
 * the handful of only-in-trustzone registers are implemented
   (eg VBAR, MVBAR)
 * we implement a fake monitor mode

The aim of the fake monitor mode is to allow us to provide fake
qemu-specific bootroms which implement whatever the board's SMC
interface is, without having to write specific KVM kernel code for
each board. So we don't have to run arbitrary secure-world guest code.
The rules are:
 * on an SMC instruction we enter the guest at the SMC vector
   as defined by the MVBAR (monitor vector base address register)
 * we actually run with the same access permissions as above
   (and under KVM if you look at CPSR.M it will tell you you're
   in Supervisor mode)
 * return from the SMC is via a standard exception return insn
 * we don't implement the separate memory space for the secure
   world. (This implies that you need to find space in the
   non-secure world's physical memory map for the bootrom shim;
   not a big deal I think since we already have a requirement
   for some space to put QEMU's arm_boot trivial bootloader.)

The code written for this fake monitor mode environment is likely to
be able to work OK if we ever implement full TrustZone support in TCG
QEMU.

Work required:

* Documentation: the general principles as listed above
* TCG: make sure we have implementations of all the TZ registers
* TCG: implement the SMC and fake-monitor-mode
  (I already have patches from Nokia in the qemu-linaro
  stack which can be cleaned up and used here)
* KVM: implement emulation of MVBAR
* KVM: set the config bit so SMC is trapped to the hypervisor
  and causes guest restart at the right entrypoint
* KVM: if there turns out to be anything that fake-monitor-mode
  needs to do that requires Hyp privilege we'd need a hypercall
  ABI, but I can't currently think of anything

I think that's basically a fairly small set of work to formalise
the approach we're already taking in practice, and make it a little
more flexible.

Opinions?

-- PMM

[Qemu-devel] [PATCH 1/2] scsi: declare vmstate_info_scsi_requests to be static

2012-05-22 Thread Paolo Bonzini

From: Jim Meyering meyer...@redhat.com

Signed-off-by: Jim Meyering meyer...@redhat.com
---
 hw/scsi-bus.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 8ab9bcd..f10f3ec 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -1561,7 +1561,7 @@ static int get_scsi_requests(QEMUFile *f, void *pv, 
size_t size)
 return 0;
 }
 
-const VMStateInfo vmstate_info_scsi_requests = {
+static const VMStateInfo vmstate_info_scsi_requests = {
 .name = scsi-requests,
 .get  = get_scsi_requests,
 .put  = put_scsi_requests,
-- 
1.7.10.1

[Qemu-devel] [PULL 1.1 0/2] SCSI patches for 1.1.0-rc3

2012-05-22 Thread Paolo Bonzini

The following changes since commit 76ee152a86d5f2533443ce4d2be6fe253cfb3c45:

  Update version to 1.1.0-rc2 (2012-05-14 17:56:50 -0500)

are available in the git repository at:

  git://github.com/bonzini/qemu.git scsi-next

for you to fetch changes up to e1a2d34f4abd8a117f5c5a25a5bb2e67d597de23:

  ISCSI: call qemu_notify_event() after updating events (2012-05-22 14:14:05 
+0200)


Jim Meyering (1):
  scsi: declare vmstate_info_scsi_requests to be static

Ronnie Sahlberg (1):
  ISCSI: call qemu_notify_event() after updating events

 block/iscsi.c |7 +++
 hw/scsi-bus.c |2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)
-- 
1.7.10.1

1 2 3 >

1 - 100 of 278 matches

Mail list logo