Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
On 22/05/12 15:52, Alexander Graf wrote: On 22.05.2012, at 05:44, Alexey Kardashevskiy a...@ozlabs.ru wrote: On 22/05/12 13:21, Alexander Graf wrote: On 22.05.2012, at 04:02, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: Alexander, Is that any better? :) Alex (Graf that is), ping ? The original patch from Alexey was fine btw. VFIO will always call things with the existing capability offset so there's no real risk of doing the wrong thing or break the list or anything. IE. A small simple patch that addresses the problem :-) The new patch is a bit more robust I believe, I don't think we need to go too far to fix a problem we don't have. But we need a fix for the real issue and the simple patch does it neatly from what I can understand. Cheers, Ben. @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) * in pci config space */ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t offset, uint8_t size) { -uint8_t *config; +uint8_t *config, existing; Existing is a pointer to the target dev's config space, right? Yes. int i, overlapping_cap; +existing = pci_find_capability(pdev, cap_id); +if (existing) { +if (offset (existing != offset)) { +return -EEXIST; +} +for (i = existing; i size; ++i) { So how does this possibly make sense? Although I do not expect VFIO to add capabilities (does not make sense), I still want to double check that this space has not been tried to use by someone else. i is an int. existing is a uint8_t*. It was there before me. This function already does a loop and this is how it was coded at the first place. +if (pdev-used[i]) { +return -EFAULT; +} +} +memset(pdev-used + offset, 0xFF, size); Why? Because I am marking the space this capability takes as used. But if it already existed (at the same offset), it should be set used already, no? Unless size existing size, in which case you might overwrite data in the next chunk, no? No, it does not exist for VFIO - VFIO read the config space from the host kernel first and then calls msi_init or msix_init or whatever_else_init depending on what it got from the host kernel. And these xxx_init() functions eventually call pci_add_capability(). Sure we can either implement own msi_init/msix_init (and may be others in the future) for VFIO (which would do all the same as other QEMU devices except touching the capabilities) OR hack msi_init/msix_init not to touch capabilities if they exist. +/* Make capability read-only by default */ +memset(pdev-wmask + offset, 0, size); Why? Because the pci_add_capability() does it for a new capability by default. Hrm. So you're copying code? Can't you merge the overwrite and write cases? I am trying to make it as a single chunk which is as small as possible. If it helps, below is the same patch with extended context to see what is going on in that function. hw/pci.c | 20 +++- 1 files changed, 19 insertions(+), 1 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 63a8219..7008a42 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom) ptr = memory_region_get_ram_ptr(pdev-rom); load_image(path, ptr); g_free(path); if (is_default_rom) { /* Only the default rom images will be patched (if needed). */ pci_patch_ids(pdev, ptr, size); } qemu_put_ram_ptr(ptr); pci_register_bar(pdev, PCI_ROM_SLOT, 0, pdev-rom); return 0; } static void pci_del_option_rom(PCIDevice *pdev) { if (!pdev-has_rom) return; vmstate_unregister_ram(pdev-rom, pdev-qdev); memory_region_destroy(pdev-rom); pdev-has_rom = false; } /* * if !offset * Reserve space and add capability to the linked list in pci config space * * if offset = 0, * Find and reserve space and add capability to the linked list * in pci config space */ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t offset, uint8_t size) { -uint8_t *config; +uint8_t *config, existing; int i, overlapping_cap; +existing = pci_find_capability(pdev, cap_id); +if (existing) { +if (offset (existing != offset)) { +return -EEXIST; +} +for (i = existing; i size; ++i) { +if (pdev-used[i]) { +return -EFAULT; +} +} +memset(pdev-used + offset, 0xFF, size); +/* Make capability read-only by default */ +memset(pdev-wmask + offset, 0, size); +/* Check capability by default */ +memset(pdev-cmask + offset, 0xFF, size); +return existing; +} + if
Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
On 22.05.2012, at 08:11, Alexey Kardashevskiy a...@ozlabs.ru wrote: On 22/05/12 15:52, Alexander Graf wrote: On 22.05.2012, at 05:44, Alexey Kardashevskiy a...@ozlabs.ru wrote: On 22/05/12 13:21, Alexander Graf wrote: On 22.05.2012, at 04:02, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: Alexander, Is that any better? :) Alex (Graf that is), ping ? The original patch from Alexey was fine btw. VFIO will always call things with the existing capability offset so there's no real risk of doing the wrong thing or break the list or anything. IE. A small simple patch that addresses the problem :-) The new patch is a bit more robust I believe, I don't think we need to go too far to fix a problem we don't have. But we need a fix for the real issue and the simple patch does it neatly from what I can understand. Cheers, Ben. @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) * in pci config space */ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t offset, uint8_t size) { -uint8_t *config; +uint8_t *config, existing; Existing is a pointer to the target dev's config space, right? Yes. int i, overlapping_cap; +existing = pci_find_capability(pdev, cap_id); +if (existing) { +if (offset (existing != offset)) { +return -EEXIST; +} +for (i = existing; i size; ++i) { So how does this possibly make sense? Although I do not expect VFIO to add capabilities (does not make sense), I still want to double check that this space has not been tried to use by someone else. i is an int. existing is a uint8_t*. It was there before me. This function already does a loop and this is how it was coded at the first place. Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax... +if (pdev-used[i]) { +return -EFAULT; +} +} +memset(pdev-used + offset, 0xFF, size); Why? Because I am marking the space this capability takes as used. But if it already existed (at the same offset), it should be set used already, no? Unless size existing size, in which case you might overwrite data in the next chunk, no? No, it does not exist for VFIO - VFIO read the config space from the host kernel first and then calls msi_init or msix_init or whatever_else_init depending on what it got from the host kernel. And these xxx_init() functions eventually call pci_add_capability(). So why would the function that populates the config space initially not set the used flag correctly? Sure we can either implement own msi_init/msix_init (and may be others in the future) for VFIO (which would do all the same as other QEMU devices except touching the capabilities) OR hack msi_init/msix_init not to touch capabilities if they exist. No, calling the internal functions sounds fine to me. It's the step before that irritates me. VFIO shouldn't differ too much from an emulated device wrt its config space population really. +/* Make capability read-only by default */ +memset(pdev-wmask + offset, 0, size); Why? Because the pci_add_capability() does it for a new capability by default. Hrm. So you're copying code? Can't you merge the overwrite and write cases? I am trying to make it as a single chunk which is as small as possible. No, you're needlessly duplicating code which is a bad idea :). Please reuse as much of the existing function as possible, unless it really doesn't make sense. If it helps, below is the same patch with extended context to see what is going on in that function. hw/pci.c | 20 +++- 1 files changed, 19 insertions(+), 1 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 63a8219..7008a42 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom) ptr = memory_region_get_ram_ptr(pdev-rom); load_image(path, ptr); g_free(path); if (is_default_rom) { /* Only the default rom images will be patched (if needed). */ pci_patch_ids(pdev, ptr, size); } qemu_put_ram_ptr(ptr); pci_register_bar(pdev, PCI_ROM_SLOT, 0, pdev-rom); return 0; } static void pci_del_option_rom(PCIDevice *pdev) { if (!pdev-has_rom) return; vmstate_unregister_ram(pdev-rom, pdev-qdev); memory_region_destroy(pdev-rom); pdev-has_rom = false; } /* * if !offset * Reserve space and add capability to the linked list in pci config space * * if offset = 0, * Find and reserve space and add capability to the linked list * in pci config space */ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t offset, uint8_t size) { -
[Qemu-devel] [PATCH v2 10/13] pci: export pci_user functions for use by other drivers
VFIO PCI support will make use of these for user initiated PCI config accesses. Signed-off-by: Alex Williamson alex.william...@redhat.com Acked-by: Bjorn Helgaas bhelg...@google.com --- drivers/pci/access.c |6 -- drivers/pci/pci.h|7 --- include/linux/pci.h |8 3 files changed, 12 insertions(+), 9 deletions(-) diff --git a/drivers/pci/access.c b/drivers/pci/access.c index 2a58164..ba91a7e 100644 --- a/drivers/pci/access.c +++ b/drivers/pci/access.c @@ -162,7 +162,8 @@ int pci_user_read_config_##size \ if (ret 0)\ ret = -EINVAL; \ return ret; \ -} +} \ +EXPORT_SYMBOL_GPL(pci_user_read_config_##size); /* Returns 0 on success, negative values indicate error. */ #define PCI_USER_WRITE_CONFIG(size,type) \ @@ -181,7 +182,8 @@ int pci_user_write_config_##size \ if (ret 0)\ ret = -EINVAL; \ return ret; \ -} +} \ +EXPORT_SYMBOL_GPL(pci_user_write_config_##size); PCI_USER_READ_CONFIG(byte, u8) PCI_USER_READ_CONFIG(word, u16) diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index e494347..f2dcc46 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -86,13 +86,6 @@ static inline bool pci_is_bridge(struct pci_dev *pci_dev) return !!(pci_dev-subordinate); } -extern int pci_user_read_config_byte(struct pci_dev *dev, int where, u8 *val); -extern int pci_user_read_config_word(struct pci_dev *dev, int where, u16 *val); -extern int pci_user_read_config_dword(struct pci_dev *dev, int where, u32 *val); -extern int pci_user_write_config_byte(struct pci_dev *dev, int where, u8 val); -extern int pci_user_write_config_word(struct pci_dev *dev, int where, u16 val); -extern int pci_user_write_config_dword(struct pci_dev *dev, int where, u32 val); - struct pci_vpd_ops { ssize_t (*read)(struct pci_dev *dev, loff_t pos, size_t count, void *buf); ssize_t (*write)(struct pci_dev *dev, loff_t pos, size_t count, const void *buf); diff --git a/include/linux/pci.h b/include/linux/pci.h index 2559735..0cf57d5 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -770,6 +770,14 @@ static inline int pci_write_config_dword(const struct pci_dev *dev, int where, return pci_bus_write_config_dword(dev-bus, dev-devfn, where, val); } +/* user-space driven config access */ +int pci_user_read_config_byte(struct pci_dev *dev, int where, u8 *val); +int pci_user_read_config_word(struct pci_dev *dev, int where, u16 *val); +int pci_user_read_config_dword(struct pci_dev *dev, int where, u32 *val); +int pci_user_write_config_byte(struct pci_dev *dev, int where, u8 val); +int pci_user_write_config_word(struct pci_dev *dev, int where, u16 val); +int pci_user_write_config_dword(struct pci_dev *dev, int where, u32 val); + int __must_check pci_enable_device(struct pci_dev *dev); int __must_check pci_enable_device_io(struct pci_dev *dev); int __must_check pci_enable_device_mem(struct pci_dev *dev);
Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
On 22.05.2012, at 08:11, Alexey Kardashevskiy a...@ozlabs.ru wrote: On 22/05/12 15:52, Alexander Graf wrote: On 22.05.2012, at 05:44, Alexey Kardashevskiy a...@ozlabs.ru wrote: On 22/05/12 13:21, Alexander Graf wrote: On 22.05.2012, at 04:02, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: Alexander, Is that any better? :) Alex (Graf that is), ping ? The original patch from Alexey was fine btw. VFIO will always call things with the existing capability offset so there's no real risk of doing the wrong thing or break the list or anything. IE. A small simple patch that addresses the problem :-) The new patch is a bit more robust I believe, I don't think we need to go too far to fix a problem we don't have. But we need a fix for the real issue and the simple patch does it neatly from what I can understand. Cheers, Ben. @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) * in pci config space */ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t offset, uint8_t size) { -uint8_t *config; +uint8_t *config, existing; Existing is a pointer to the target dev's config space, right? Yes. int i, overlapping_cap; +existing = pci_find_capability(pdev, cap_id); +if (existing) { +if (offset (existing != offset)) { +return -EEXIST; +} +for (i = existing; i size; ++i) { So how does this possibly make sense? Although I do not expect VFIO to add capabilities (does not make sense), I still want to double check that this space has not been tried to use by someone else. i is an int. existing is a uint8_t*. It was there before me. This function already does a loop and this is how it was coded at the first place. Also, while at it, please add some comments at least for the code you add that explain why you do the things you do :). Alex
[Qemu-devel] [PATCH v2 07/13] vfio: VFIO core
VFIO is a secure user level driver for use with both virtual machines and user level drivers. VFIO makes use of IOMMU groups to ensure the isolation of devices in use, allowing unprivileged user access. It's intended that VFIO will replace KVM device assignment and UIO drivers (in cases where the target platform includes a sufficiently capable IOMMU). New in this version of VFIO is support for IOMMU groups managed through the IOMMU core as well as a rework of the API, removing the group merge interface. We now go back to a model more similar to original VFIO with UIOMMU support where the file descriptor obtained from /dev/vfio/vfio allows access to the IOMMU, but only after a group is added, avoiding the previous privilege issues with this type of model. IOMMU support is also now fully modular as IOMMUs have vastly different interface requirements on different platforms. VFIO users are able to query and initialize the IOMMU model of their choice. Please see the follow-on Documentation commit for further description and usage example. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Documentation/ioctl/ioctl-number.txt |1 MAINTAINERS |8 drivers/Kconfig |2 drivers/Makefile |1 drivers/vfio/Kconfig |8 drivers/vfio/Makefile|1 drivers/vfio/vfio.c | 1406 ++ include/linux/vfio.h | 366 + 8 files changed, 1793 insertions(+), 0 deletions(-) create mode 100644 drivers/vfio/Kconfig create mode 100644 drivers/vfio/Makefile create mode 100644 drivers/vfio/vfio.c create mode 100644 include/linux/vfio.h diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index e34b531..111e30a 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt @@ -88,6 +88,7 @@ Code Seq#(hex) Include FileComments and kernel/power/user.c '8'all SNP8023 advanced NIC card mailto:m...@solidum.com +';'64-6F linux/vfio.h '@'00-0F linux/radeonfb.hconflict! '@'00-0F drivers/video/aty/aty128fb.cconflict! 'A'00-1F linux/apm_bios.hconflict! diff --git a/MAINTAINERS b/MAINTAINERS index b362709..5aca4ff 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7229,6 +7229,14 @@ S: Maintained F: Documentation/filesystems/vfat.txt F: fs/fat/ +VFIO DRIVER +M: Alex Williamson alex.william...@redhat.com +L: k...@vger.kernel.org +S: Maintained +F: Documentation/vfio.txt +F: drivers/vfio/ +F: include/linux/vfio.h + VIDEOBUF2 FRAMEWORK M: Pawel Osciak pa...@osciak.com M: Marek Szyprowski m.szyprow...@samsung.com diff --git a/drivers/Kconfig b/drivers/Kconfig index d236aef..46eb115 100644 --- a/drivers/Kconfig +++ b/drivers/Kconfig @@ -112,6 +112,8 @@ source drivers/auxdisplay/Kconfig source drivers/uio/Kconfig +source drivers/vfio/Kconfig + source drivers/vlynq/Kconfig source drivers/virtio/Kconfig diff --git a/drivers/Makefile b/drivers/Makefile index 95952c8..fe1880a 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -59,6 +59,7 @@ obj-$(CONFIG_ATM) += atm/ obj-$(CONFIG_FUSION) += message/ obj-y += firewire/ obj-$(CONFIG_UIO) += uio/ +obj-$(CONFIG_VFIO) += vfio/ obj-y += cdrom/ obj-y += auxdisplay/ obj-$(CONFIG_PCCARD) += pcmcia/ diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig new file mode 100644 index 000..9acb1e7 --- /dev/null +++ b/drivers/vfio/Kconfig @@ -0,0 +1,8 @@ +menuconfig VFIO + tristate VFIO Non-Privileged userspace driver framework + depends on IOMMU_API + help + VFIO provides a framework for secure userspace device drivers. + See Documentation/vfio.txt for more details. + + If you don't know what to do here, say N. diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile new file mode 100644 index 000..7500a67 --- /dev/null +++ b/drivers/vfio/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_VFIO) += vfio.o diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c new file mode 100644 index 000..6558eef --- /dev/null +++ b/drivers/vfio/vfio.c @@ -0,0 +1,1406 @@ +/* + * VFIO core + * + * Copyright (C) 2012 Red Hat, Inc. All rights reserved. + * Author: Alex Williamson alex.william...@redhat.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Derived from original vfio: + * Copyright 2010 Cisco Systems, Inc. All rights reserved. + * Author: Tom Lyon, p...@cisco.com + */ + +#include linux/cdev.h
Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
On 22/05/12 16:31, Alexander Graf wrote: On 22.05.2012, at 08:11, Alexey Kardashevskiy a...@ozlabs.ru wrote: On 22/05/12 15:52, Alexander Graf wrote: On 22.05.2012, at 05:44, Alexey Kardashevskiy a...@ozlabs.ru wrote: On 22/05/12 13:21, Alexander Graf wrote: On 22.05.2012, at 04:02, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: Alexander, Is that any better? :) Alex (Graf that is), ping ? The original patch from Alexey was fine btw. VFIO will always call things with the existing capability offset so there's no real risk of doing the wrong thing or break the list or anything. IE. A small simple patch that addresses the problem :-) The new patch is a bit more robust I believe, I don't think we need to go too far to fix a problem we don't have. But we need a fix for the real issue and the simple patch does it neatly from what I can understand. Cheers, Ben. @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) * in pci config space */ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t offset, uint8_t size) { -uint8_t *config; +uint8_t *config, existing; Existing is a pointer to the target dev's config space, right? Yes. int i, overlapping_cap; +existing = pci_find_capability(pdev, cap_id); +if (existing) { +if (offset (existing != offset)) { +return -EEXIST; +} +for (i = existing; i size; ++i) { So how does this possibly make sense? Although I do not expect VFIO to add capabilities (does not make sense), I still want to double check that this space has not been tried to use by someone else. i is an int. existing is a uint8_t*. It was there before me. This function already does a loop and this is how it was coded at the first place. Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax... Well it is still does not make much sense to have int i rather than uint8_t i :) +if (pdev-used[i]) { +return -EFAULT; +} +} +memset(pdev-used + offset, 0xFF, size); Why? Because I am marking the space this capability takes as used. But if it already existed (at the same offset), it should be set used already, no? Unless size existing size, in which case you might overwrite data in the next chunk, no? No, it does not exist for VFIO - VFIO read the config space from the host kernel first and then calls msi_init or msix_init or whatever_else_init depending on what it got from the host kernel. And these xxx_init() functions eventually call pci_add_capability(). So why would the function that populates the config space initially not set the used flag correctly? This is internal kitchen of PCIDevice which I do not want to touch from anywhere but pci.c. And there is no fixup_capability or something. Sure we can either implement own msi_init/msix_init (and may be others in the future) for VFIO (which would do all the same as other QEMU devices except touching the capabilities) OR hack msi_init/msix_init not to touch capabilities if they exist. No, calling the internal functions sounds fine to me. It's the step before that irritates me. VFIO shouldn't differ too much from an emulated device wrt its config space population really. The last thing we want for a VFIO device is changing its capabilities list. +/* Make capability read-only by default */ +memset(pdev-wmask + offset, 0, size); Why? Because the pci_add_capability() does it for a new capability by default. Hrm. So you're copying code? Can't you merge the overwrite and write cases? I am trying to make it as a single chunk which is as small as possible. No, you're needlessly duplicating code which is a bad idea :). Please reuse as much of the existing function as possible, unless it really doesn't make sense. I actually duplicated 4 (four) lines and did it just once. This is too little to be called duplicating :) And I get very special case visually separated and easy to remove if we find a better solution later. But - no problemo, I'll rework it. [no further comments] If it helps, below is the same patch with extended context to see what is going on in that function. hw/pci.c | 20 +++- 1 files changed, 19 insertions(+), 1 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 63a8219..7008a42 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom) ptr = memory_region_get_ram_ptr(pdev-rom); load_image(path, ptr); g_free(path); if (is_default_rom) { /* Only the default rom images will be patched (if needed). */ pci_patch_ids(pdev, ptr, size); } qemu_put_ram_ptr(ptr); pci_register_bar(pdev, PCI_ROM_SLOT, 0,
[Qemu-devel] [PATCH v2 04/13] pci: Add PCI DMA source ID quirk
DMA transactions are tagged with the source ID of the device making the request. Occasionally hardware screws this up and uses the source ID of a different device (often the wrong function number of a multifunction device). A specific Ricoh multifunction device is a prime example of this problem and included in this patch. The purpose of this function is that given a pci_dev, return the pci_dev to use as the source ID for DMA. When hardware works correctly, this returns the input device. For the components of the Ricoh multifunction device, return the pci_dev for function 0. This will be used by IOMMU drivers for determining the boundaries of IOMMU groups as multiple devices using the same source ID must be contained within the same group. This can also be used by existing streaming DMA paths for the same purpose. Signed-off-by: Alex Williamson alex.william...@redhat.com --- drivers/pci/quirks.c | 40 include/linux/pci.h |5 + 2 files changed, 45 insertions(+), 0 deletions(-) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 4bf7102..a2dd77f 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3109,3 +3109,43 @@ int pci_dev_specific_reset(struct pci_dev *dev, int probe) return -ENOTTY; } + +static struct pci_dev *pci_func_0_dma_source(struct pci_dev *dev) +{ + return pci_get_slot(dev-bus, PCI_DEVFN(PCI_SLOT(dev-devfn), 0)); +} + +static const struct pci_dev_dma_source { + u16 vendor; + u16 device; + struct pci_dev *(*dma_source)(struct pci_dev *dev); +} pci_dev_dma_source[] = { + /* +* https://bugzilla.redhat.com/show_bug.cgi?id=605888 +* +* Some Ricoh devices use the function 0 source ID for DMA on +* other functions of a multifunction device. The DMA devices +* is therefore function 0, which will have implications of the +* iommu grouping of these devices. +*/ + { PCI_VENDOR_ID_RICOH, 0xe822, pci_func_0_dma_source }, + { PCI_VENDOR_ID_RICOH, 0xe230, pci_func_0_dma_source }, + { PCI_VENDOR_ID_RICOH, 0xe832, pci_func_0_dma_source }, + { PCI_VENDOR_ID_RICOH, 0xe832, pci_func_0_dma_source }, + { 0 } +}; + +struct pci_dev *pci_dma_source(struct pci_dev *dev) +{ + const struct pci_dev_dma_source *i; + + for (i = pci_dev_dma_source; i-dma_source; i++) { + if ((i-vendor == dev-vendor || +i-vendor == (u16)PCI_ANY_ID) + (i-device == dev-device || +i-device == (u16)PCI_ANY_ID)) + return i-dma_source(dev); + } + + return dev; +} diff --git a/include/linux/pci.h b/include/linux/pci.h index e444f5b..02dbfed 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1479,9 +1479,14 @@ enum pci_fixup_pass { #ifdef CONFIG_PCI_QUIRKS void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev); +struct pci_dev *pci_dma_source(struct pci_dev *dev); #else static inline void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev) {} +static inline struct pci_dev *pci_dma_source(struct pci_dev *dev) +{ + return dev; +} #endif void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen);
[Qemu-devel] [PATCH v2 09/13] vfio: x86 IOMMU implementation
x86 is probably the wrong name for this VFIO IOMMU driver, but x86 is the primary target for it. This driver support a very simple usage model using the existing IOMMU API. The IOMMU is expected to support the full host address space with no special IOVA windows, number of mappings restrictions, or unique processor target options. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Documentation/ioctl/ioctl-number.txt |2 drivers/vfio/Kconfig |6 drivers/vfio/Makefile|2 drivers/vfio/vfio.c |7 drivers/vfio/vfio_iommu_x86.c| 743 ++ include/linux/vfio.h | 52 ++ 6 files changed, 811 insertions(+), 1 deletions(-) create mode 100644 drivers/vfio/vfio_iommu_x86.c diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index 111e30a..9d1694e 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt @@ -88,7 +88,7 @@ Code Seq#(hex) Include FileComments and kernel/power/user.c '8'all SNP8023 advanced NIC card mailto:m...@solidum.com -';'64-6F linux/vfio.h +';'64-72 linux/vfio.h '@'00-0F linux/radeonfb.hconflict! '@'00-0F drivers/video/aty/aty128fb.cconflict! 'A'00-1F linux/apm_bios.hconflict! diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index 9acb1e7..bd88a30 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -1,6 +1,12 @@ +config VFIO_IOMMU_X86 + tristate + depends on VFIO X86 + default n + menuconfig VFIO tristate VFIO Non-Privileged userspace driver framework depends on IOMMU_API + select VFIO_IOMMU_X86 if X86 help VFIO provides a framework for secure userspace device drivers. See Documentation/vfio.txt for more details. diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index 7500a67..1f1abee 100644 --- a/drivers/vfio/Makefile +++ b/drivers/vfio/Makefile @@ -1 +1,3 @@ obj-$(CONFIG_VFIO) += vfio.o +obj-$(CONFIG_VFIO_IOMMU_X86) += vfio_iommu_x86.o +obj-$(CONFIG_VFIO_PCI) += pci/ diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 6558eef..89899a8 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -1369,6 +1369,13 @@ static int __init vfio_init(void) pr_info(DRIVER_DESC version: DRIVER_VERSION \n); + /* +* Attempt to load known iommu-drivers. This gives us a working +* environment without the user needing to explicitly load iommu +* drivers. +*/ + request_module_nowait(vfio_iommu_x86); + return 0; err_groups_cdev: diff --git a/drivers/vfio/vfio_iommu_x86.c b/drivers/vfio/vfio_iommu_x86.c new file mode 100644 index 000..a52391d --- /dev/null +++ b/drivers/vfio/vfio_iommu_x86.c @@ -0,0 +1,743 @@ +/* + * VFIO: IOMMU DMA mapping support for x86 (Intel VT-d AMD-Vi) + * + * Copyright (C) 2012 Red Hat, Inc. All rights reserved. + * Author: Alex Williamson alex.william...@redhat.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Derived from original vfio: + * Copyright 2010 Cisco Systems, Inc. All rights reserved. + * Author: Tom Lyon, p...@cisco.com + */ + +#include linux/compat.h +#include linux/device.h +#include linux/fs.h +#include linux/iommu.h +#include linux/module.h +#include linux/mm.h +#include linux/pci.h /* pci_bus_type */ +#include linux/sched.h +#include linux/slab.h +#include linux/uaccess.h +#include linux/vfio.h +#include linux/workqueue.h + +#define DRIVER_VERSION 0.2 +#define DRIVER_AUTHOR Alex Williamson alex.william...@redhat.com +#define DRIVER_DESC x86 IOMMU driver for VFIO + +static bool allow_unsafe_interrupts; +module_param_named(allow_unsafe_interrupts, + allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(allow_unsafe_interrupts, +Enable VFIO IOMMU support for on platforms without interrupt remapping support.); + +struct vfio_iommu { + struct iommu_domain *domain; + struct mutexlock; + struct list_headdma_list; + struct list_headgroup_list; + boolcache; +}; + +struct vfio_dma { + struct list_headnext; + dma_addr_t iova; /* Device address */ + unsigned long vaddr; /* Process virtual addr */ + longnpage; /* Number of pages */ + int prot; /* IOMMU_READ/WRITE */ +}; + +struct vfio_group { + struct iommu_group *iommu_group; + struct list_headnext; +}; + +/* + * This
Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
On 22.05.2012, at 09:01, Alexey Kardashevskiy wrote: On 22/05/12 16:31, Alexander Graf wrote: On 22.05.2012, at 08:11, Alexey Kardashevskiy a...@ozlabs.ru wrote: On 22/05/12 15:52, Alexander Graf wrote: On 22.05.2012, at 05:44, Alexey Kardashevskiy a...@ozlabs.ru wrote: On 22/05/12 13:21, Alexander Graf wrote: On 22.05.2012, at 04:02, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: Alexander, Is that any better? :) Alex (Graf that is), ping ? The original patch from Alexey was fine btw. VFIO will always call things with the existing capability offset so there's no real risk of doing the wrong thing or break the list or anything. IE. A small simple patch that addresses the problem :-) The new patch is a bit more robust I believe, I don't think we need to go too far to fix a problem we don't have. But we need a fix for the real issue and the simple patch does it neatly from what I can understand. Cheers, Ben. @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) * in pci config space */ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t offset, uint8_t size) { -uint8_t *config; +uint8_t *config, existing; Existing is a pointer to the target dev's config space, right? Yes. int i, overlapping_cap; +existing = pci_find_capability(pdev, cap_id); +if (existing) { +if (offset (existing != offset)) { +return -EEXIST; +} +for (i = existing; i size; ++i) { So how does this possibly make sense? Although I do not expect VFIO to add capabilities (does not make sense), I still want to double check that this space has not been tried to use by someone else. i is an int. existing is a uint8_t*. It was there before me. This function already does a loop and this is how it was coded at the first place. Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax... Well it is still does not make much sense to have int i rather than uint8_t i :) +if (pdev-used[i]) { +return -EFAULT; +} +} +memset(pdev-used + offset, 0xFF, size); Why? Because I am marking the space this capability takes as used. But if it already existed (at the same offset), it should be set used already, no? Unless size existing size, in which case you might overwrite data in the next chunk, no? No, it does not exist for VFIO - VFIO read the config space from the host kernel first and then calls msi_init or msix_init or whatever_else_init depending on what it got from the host kernel. And these xxx_init() functions eventually call pci_add_capability(). So why would the function that populates the config space initially not set the used flag correctly? This is internal kitchen of PCIDevice which I do not want to touch from anywhere but pci.c. And there is no fixup_capability or something. Hrm. Maybe we should have one? :) Or instead of populating the config space with the exact data from the host device, loop through the host device capabilities and populate them using this function as we go. That should maintain the offsets, but ensure that all internal flags are set, no? Sure we can either implement own msi_init/msix_init (and may be others in the future) for VFIO (which would do all the same as other QEMU devices except touching the capabilities) OR hack msi_init/msix_init not to touch capabilities if they exist. No, calling the internal functions sounds fine to me. It's the step before that irritates me. VFIO shouldn't differ too much from an emulated device wrt its config space population really. The last thing we want for a VFIO device is changing its capabilities list. Well - we want it to look the same. The population should go through the same methods as emulated devices have to go through, no? :) +/* Make capability read-only by default */ +memset(pdev-wmask + offset, 0, size); Why? Because the pci_add_capability() does it for a new capability by default. Hrm. So you're copying code? Can't you merge the overwrite and write cases? I am trying to make it as a single chunk which is as small as possible. No, you're needlessly duplicating code which is a bad idea :). Please reuse as much of the existing function as possible, unless it really doesn't make sense. I actually duplicated 4 (four) lines and did it just once. This is too little to be called duplicating :) And I get very special case visually separated and easy to remove if we find a better solution later. Yeah, but the special case really shouldn't be all that special - that's why I was irritated. But - no problemo, I'll rework it. Thanks! [no further comments] What about mine below? ;) Alex If it helps, below is the
Re: [Qemu-devel] [PATCH] Add a memory barrier to DMA functions
On Tue, 2012-05-22 at 14:34 +1000, Benjamin Herrenschmidt wrote: The emulated devices can run simultaneously with the guest, so we need to be careful with ordering of load and stores done by them to the guest system memory, which need to be observed in the right order by the guest operating system. This adds a barrier call to the basic DMA read/write ops which is currently implemented as a smp_mb(), but could be later improved for more fine grained control of barriers. Additionally, a _relaxed() variant of the accessors is provided to easily convert devices who would be performance sensitive and negatively impacted by the change. Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- (Note to Rusty: The number I told you on ST is wrong, see below) So I tried to do some performance measurements with that patch using netperf on an x86 laptop (x220 with core i7). It's a bit tricky. For example, if I just create a tap interface, give it a local IP on the laptop and a different IP on the guest, (ie talking to a netserver on the host basically from the guest via tap), the performance is pretty poor and the numbers seem useless with and without the barrier. So I did tests involving talking to a server on our gigabit network instead. The baseline is the laptop without kvm talking to the server. The TCP_STREAM test results are: (The * at the beginning of the lines is something I added to distinguish multi-line results on some tests) MIGRATED TCP STREAM TEST Recv SendSend Socket Socket Message Elapsed Size SizeSize Time Throughput bytes bytes bytessecs.10^6bits/sec * 87380 16384 1638410.02 933.02 * 87380 16384 1638410.03 908.64 * 87380 16384 1638410.03 926.78 * 87380 16384 1638410.03 919.73 It's a bit noisy, ideally I should do a point-to-point setup to an otherwise idle machine, here I'm getting some general lab network noise but it gives us a pretty good baseline to begin with. I have not managed to get any sensible result out of UDP_STREAM in that configuration for some reason (ie just the host laptop), ie they look insane : MIGRATED UDP STREAM TEST Socket Message Elapsed Messages SizeSize Time Okay Errors Throughput bytes bytessecs# # 10^6bits/sec *229376 65507 10.00 270468 014173.84 126976 10.00 44 2.31 *229376 65507 10.00 266526 013967.32 126976 10.00 41 2.15 So we don't have a good comparison baseline but we can still compare KVM against itself with and without the barrier. Now KVM. This is x86_64 running an ubuntu precise guest (I had the ISO laying around) and using the default setup which appears to be an emulated e1000. I've done some tests with slirp just to see how bad it was and it's bad enough to be irrelevant. The numbers have thus been done using a tap interface bridged to the host ethernet (who happens to also be some kind of e1000). For each test I've done 3 series of numbers: - Without the barrier added - With the barrier added to dma_memory_rw - With the barrier added to dma_memory_rw -and- dma_memory_map First TCP_STREAM. The numbers are a bit noisy, I suspect somebody was hammering the server machine while I was doing one of the tests, but here's what I got when it appeared to have stabilized: MIGRATED TCP STREAM TEST Recv SendSend Socket Socket Message Elapsed Size SizeSize Time Throughput bytes bytes bytessecs.10^6bits/sec no barrier * 87380 16384 1638410.01880.31 * 87380 16384 1638410.01876.73 * 87380 16384 1638410.01880.73 * 87380 16384 1638410.01878.63 barrier * 87380 16384 1638410.01869.39 * 87380 16384 1638410.01864.99 * 87380 16384 1638410.01886.13 * 87380 16384 1638410.01872.90 barrier + map * 87380 16384 1638410.01867.45 * 87380 16384 1638410.01868.51 * 87380 16384 1638410.01888.94 * 87380 16384 1638410.01888.19 As far as I can tell, it's all in the noise. I was about to concede a small (1 % ?) loss to the barrier until I ran the last 2 tests and then I stopped caring :-) With UDP_STREAM, we get something like that: MIGRATED UDP STREAM TEST Socket Message Elapsed Messages SizeSize Time Okay Errors Throughput bytes bytessecs# # 10^6bits/sec no barrier *229376 65507 10.005208 0 272.92 126976 10.005208272.92 *229376 65507 10.005447 0 285.44 126976 10.005447285.44 *229376 65507 10.005119 0 268.22 126976 10.005119268.22 barrier
[Qemu-devel] [PATCH v2 08/13] vfio: Add documentation
Signed-off-by: Alex Williamson alex.william...@redhat.com --- Documentation/vfio.txt | 315 1 files changed, 315 insertions(+), 0 deletions(-) create mode 100644 Documentation/vfio.txt diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt new file mode 100644 index 000..1240874 --- /dev/null +++ b/Documentation/vfio.txt @@ -0,0 +1,315 @@ +VFIO - Virtual Function I/O[1] +--- +Many modern system now provide DMA and interrupt remapping facilities +to help ensure I/O devices behave within the boundaries they've been +allotted. This includes x86 hardware with AMD-Vi and Intel VT-d, +POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC +systems such as Freescale PAMU. The VFIO driver is an IOMMU/device +agnostic framework for exposing direct device access to userspace, in +a secure, IOMMU protected environment. In other words, this allows +safe[2], non-privileged, userspace drivers. + +Why do we want that? Virtual machines often make use of direct device +access (device assignment) when configured for the highest possible +I/O performance. From a device and host perspective, this simply +turns the VM into a userspace driver, with the benefits of +significantly reduced latency, higher bandwidth, and direct use of +bare-metal device drivers[3]. + +Some applications, particularly in the high performance computing +field, also benefit from low-overhead, direct device access from +userspace. Examples include network adapters (often non-TCP/IP based) +and compute accelerators. Prior to VFIO, these drivers had to either +go through the full development cycle to become proper upstream +driver, be maintained out of tree, or make use of the UIO framework, +which has no notion of IOMMU protection, limited interrupt support, +and requires root privileges to access things like PCI configuration +space. + +The VFIO driver framework intends to unify these, replacing both the +KVM PCI specific device assignment code as well as provide a more +secure, more featureful userspace driver environment than UIO. + +Groups, Devices, and IOMMUs +--- + +Devices are the main target of any I/O driver. Devices typically +create a programming interface made up of I/O access, interrupts, +and DMA. Without going into the details of each of these, DMA is +by far the most critical aspect for maintaining a secure environment +as allowing a device read-write access to system memory imposes the +greatest risk to the overall system integrity. + +To help mitigate this risk, many modern IOMMUs now incorporate +isolation properties into what was, in many cases, an interface only +meant for translation (ie. solving the addressing problems of devices +with limited address spaces). With this, devices can now be isolated +from each other and from arbitrary memory access, thus allowing +things like secure direct assignment of devices into virtual machines. + +This isolation is not always at the granularity of a single device +though. Even when an IOMMU is capable of this, properties of devices, +interconnects, and IOMMU topologies can each reduce this isolation. +For instance, an individual device may be part of a larger multi- +function enclosure. While the IOMMU may be able to distinguish +between devices within the enclosure, the enclosure may not require +transactions between devices to reach the IOMMU. Examples of this +could be anything from a multi-function PCI device with backdoors +between functions to a non-PCI-ACS (Access Control Services) capable +bridge allowing redirection without reaching the IOMMU. Topology +can also play a factor in terms of hiding devices. A PCIe-to-PCI +bridge masks the devices behind it, making transaction appear as if +from the bridge itself. Obviously IOMMU design plays a major factor +as well. + +Therefore, while for the most part an IOMMU may have device level +granularity, any system is susceptible to reduced granularity. The +IOMMU API therefore supports a notion of IOMMU groups. A group is +a set of devices which is isolatable from all other devices in the +system. Groups are therefore the unit of ownership used by VFIO. + +While the group is the minimum granularity that must be used to +ensure secure user access, it's not necessarily the preferred +granularity. In IOMMUs which make use of page tables, it may be +possible to share a set of page tables between different groups, +reducing the overhead both to the platform (reduced TLB thrashing, +reduced duplicate page tables), and to the user (programming only +a single set of translations). For this reason, VFIO makes use of +a container class, which may hold one or more groups. A container +is created by simply opening the /dev/vfio/vfio character device. + +On its own, the container provides little
[Qemu-devel] [PATCH v2 03/13] iommu: IOMMU groups for VT-d and AMD-Vi
Add back group support for AMD Intel. amd_iommu already tracks devices and has init and uninit routines to manage groups. intel-iommu does this on the fly, so we make use of the notifier support built into iommu groups to create and remove groups. Signed-off-by: Alex Williamson alex.william...@redhat.com --- drivers/iommu/amd_iommu.c | 28 +- drivers/iommu/intel-iommu.c | 46 +++ 2 files changed, 73 insertions(+), 1 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 32c00cd..b7e5ddf 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -256,9 +256,11 @@ static bool check_device(struct device *dev) static int iommu_init_device(struct device *dev) { - struct pci_dev *pdev = to_pci_dev(dev); + struct pci_dev *dma_pdev, *pdev = to_pci_dev(dev); struct iommu_dev_data *dev_data; + struct iommu_group *group; u16 alias; + int ret; if (dev-archdata.iommu) return 0; @@ -279,8 +281,30 @@ static int iommu_init_device(struct device *dev) return -ENOTSUPP; } dev_data-alias_data = alias_data; + + dma_pdev = pci_get_bus_and_slot(alias 8, alias 0xff); + } else + dma_pdev = pdev; + + if (!pdev-is_virtfn PCI_FUNC(pdev-devfn) iommu_group_mf + pdev-hdr_type == PCI_HEADER_TYPE_NORMAL) + dma_pdev = pci_get_slot(pdev-bus, + PCI_DEVFN(PCI_SLOT(pdev-devfn), 0)); + + group = iommu_group_get(dma_pdev-dev); + if (!group) { + group = iommu_group_alloc(); + if (IS_ERR(group)) + return PTR_ERR(group); } + ret = iommu_group_add_device(group, dev); + + iommu_group_put(group); + + if (ret) + return ret; + if (pci_iommuv2_capable(pdev)) { struct amd_iommu *iommu; @@ -309,6 +333,8 @@ static void iommu_ignore_device(struct device *dev) static void iommu_uninit_device(struct device *dev) { + iommu_group_remove_device(dev); + /* * Nothing to do here - we keep dev_data around for unplugged devices * and reuse it when the device is re-plugged - not doing so would diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index d4a0ff7..e63b33b 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -4087,6 +4087,50 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain, return 0; } +static int intel_iommu_add_device(struct device *dev) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct pci_dev *bridge, *dma_pdev = pdev; + struct iommu_group *group; + int ret; + + if (!device_to_iommu(pci_domain_nr(pdev-bus), +pdev-bus-number, pdev-devfn)) + return -ENODEV; + + bridge = pci_find_upstream_pcie_bridge(pdev); + if (bridge) { + if (pci_is_pcie(bridge)) + dma_pdev = pci_get_domain_bus_and_slot( + pci_domain_nr(pdev-bus), + bridge-subordinate-number, 0); + else + dma_pdev = bridge; + } + + if (!pdev-is_virtfn PCI_FUNC(pdev-devfn) iommu_group_mf + pdev-hdr_type == PCI_HEADER_TYPE_NORMAL) + dma_pdev = pci_get_slot(pdev-bus, + PCI_DEVFN(PCI_SLOT(pdev-devfn), 0)); + + group = iommu_group_get(dma_pdev-dev); + if (!group) { + group = iommu_group_alloc(); + if (IS_ERR(group)) + return PTR_ERR(group); + } + + ret = iommu_group_add_device(group, dev); + + iommu_group_put(group); + return ret; +} + +static void intel_iommu_remove_device(struct device *dev) +{ + iommu_group_remove_device(dev); +} + static struct iommu_ops intel_iommu_ops = { .domain_init= intel_iommu_domain_init, .domain_destroy = intel_iommu_domain_destroy, @@ -4096,6 +4140,8 @@ static struct iommu_ops intel_iommu_ops = { .unmap = intel_iommu_unmap, .iova_to_phys = intel_iommu_iova_to_phys, .domain_has_cap = intel_iommu_domain_has_cap, + .add_device = intel_iommu_add_device, + .remove_device = intel_iommu_remove_device, .pgsize_bitmap = INTEL_IOMMU_PGSIZES, };
[Qemu-devel] [PATCH v2 05/13] pci: Add ACS validation utility
In a PCI environment, transactions aren't always required to reach the root bus before being re-routed. Intermediate switches between an endpoint and the root bus can redirect DMA back downstream before things like IOMMUs have a chance to intervene. Legacy PCI is always susceptible to this as it operates on a shared bus. PCIe added a new capability to describe and control this behavior, Access Control Services, or ACS. The utility function pci_acs_enabled() allows us to test the ACS capabilities of an individual devices against a set of flags while pci_acs_path_enabled() tests a complete path from a given downstream device up to the specified upstream device. We also include the ability to add device specific tests as it's likely we'll see devices that do no implement ACS, but want to indicate support for various capabilities in this space. Signed-off-by: Alex Williamson alex.william...@redhat.com --- drivers/pci/pci.c| 76 ++ drivers/pci/quirks.c | 29 +++ include/linux/pci.h | 10 ++- 3 files changed, 114 insertions(+), 1 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 111569c..ab6c2a6 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2359,6 +2359,82 @@ void pci_enable_acs(struct pci_dev *dev) } /** + * pci_acs_enable - test ACS against required flags for a given device + * @pdev: device to test + * @acs_flags: required PCI ACS flags + * + * Return true if the device supports the provided flags. Automatically + * filters out flags that are not implemented on multifunction devices. + */ +bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags) +{ + int pos; + u16 ctrl; + + if (pci_dev_specific_acs_enabled(pdev, acs_flags)) + return true; + + if (!pci_is_pcie(pdev)) + return false; + + if (pdev-pcie_type == PCI_EXP_TYPE_DOWNSTREAM || + pdev-pcie_type == PCI_EXP_TYPE_ROOT_PORT) { + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS); + if (!pos) + return false; + + pci_read_config_word(pdev, pos + PCI_ACS_CTRL, ctrl); + if ((ctrl acs_flags) != acs_flags) + return false; + } else if (pdev-multifunction) { + /* Filter out flags not applicable to multifunction */ + acs_flags = (PCI_ACS_RR | PCI_ACS_CR | + PCI_ACS_EC | PCI_ACS_DT); + + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS); + if (!pos) + return false; + + pci_read_config_word(pdev, pos + PCI_ACS_CTRL, ctrl); + if ((ctrl acs_flags) != acs_flags) + return false; + } + + return true; +} +EXPORT_SYMBOL_GPL(pci_acs_enabled); + +/** + * pci_acs_path_enable - test ACS flags from start to end in a hierarchy + * @start: starting downstream device + * @end: ending upstream device or NULL to search to the root bus + * @acs_flags: required flags + * + * Walk up a device tree from start to end testing PCI ACS support. If + * any step along the way does not support the required flags, return false. + */ +bool pci_acs_path_enabled(struct pci_dev *start, + struct pci_dev *end, u16 acs_flags) +{ + struct pci_dev *pdev, *parent = start; + + do { + pdev = parent; + + if (!pci_acs_enabled(pdev, acs_flags)) + return false; + + if (pci_is_root_bus(pdev-bus)) + return (end == NULL); + + parent = pdev-bus-self; + } while (pdev != end); + + return true; +} +EXPORT_SYMBOL_GPL(pci_acs_path_enabled); + +/** * pci_swizzle_interrupt_pin - swizzle INTx for device behind bridge * @dev: the PCI device * @pin: the INTx pin (1=INTA, 2=INTB, 3=INTD, 4=INTD) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index a2dd77f..4ed6aa6 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3149,3 +3149,32 @@ struct pci_dev *pci_dma_source(struct pci_dev *dev) return dev; } + +static const struct pci_dev_acs_enabled { + u16 vendor; + u16 device; + bool (*acs_enabled)(struct pci_dev *dev, u16 acs_flags); +} pci_dev_acs_enabled[] = { + { 0 } +}; + +bool pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags) +{ + const struct pci_dev_acs_enabled *i; + + /* +* Allow devices that do not expose standard PCI ACS capabilities +* or control to indicate their support here. Multi-function devices +* which do not allow internal peer-to-peer between functions, but +* do not implement PCI ACS may wish to return true here. +*/ + for (i = pci_dev_acs_enabled; i-acs_enabled; i++) { + if ((i-vendor == dev-vendor || +i-vendor ==
[Qemu-devel] [PATCH v2 01/13] driver core: Add iommu_group tracking to struct device
IOMMU groups allow IOMMU drivers to represent DMA visibility and isolation of devices. Multiple devices may be grouped together for the purposes of DMA. Placing a pointer on struct device enable easy access for things like streaming DMA programming and drivers like VFIO. Signed-off-by: Alex Williamson alex.william...@redhat.com Acked-by: Greg Kroah-Hartman gre...@linuxfoundation.org --- include/linux/device.h |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/include/linux/device.h b/include/linux/device.h index 5ad17cc..13dd26b 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -35,6 +35,7 @@ struct subsys_private; struct bus_type; struct device_node; struct iommu_ops; +struct iommu_group; struct bus_attribute { struct attributeattr; @@ -677,6 +678,7 @@ struct device { const struct attribute_group **groups; /* optional groups */ void(*release)(struct device *dev); + struct iommu_group *iommu_group; }; /* Get the wakeup routines, which depend on struct device */
Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
On Tue, 2012-05-22 at 09:13 +0200, Alexander Graf wrote: On 22.05.2012, at 09:01, Alexey Kardashevskiy wrote: This is internal kitchen of PCIDevice which I do not want to touch from anywhere but pci.c. And there is no fixup_capability or something. Hrm. Maybe we should have one? :) Or instead of populating the config space with the exact data from the host device, loop through the host device capabilities and populate them using this function as we go. That should maintain the offsets, but ensure that all internal flags are set, no? That actually sounds reasonable, though it might be more efficient to have something like pci_parse_config() or similar called once with a pre-cooked config space. Internally inside pci.c it can do that same loop you mention. The advantage is that it can also do whatever else we might need in the future. If for some reason, something wants to cache a cap pointer it can be done there, whatever else that is normally initialized as fields to generate the config space can be initialized by reading the config space and then initializing the fields etc... from that one function. Cheers, Ben.
[Qemu-devel] [PATCH v2 11/13] pci: Create common pcibios_err_to_errno
For returning errors out to non-PCI code. Re-name xen's version. Signed-off-by: Alex Williamson alex.william...@redhat.com Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- drivers/xen/xen-pciback/conf_space.c |6 +++--- include/linux/pci.h | 26 ++ 2 files changed, 29 insertions(+), 3 deletions(-) diff --git a/drivers/xen/xen-pciback/conf_space.c b/drivers/xen/xen-pciback/conf_space.c index 30d7be0..46ae0f9 100644 --- a/drivers/xen/xen-pciback/conf_space.c +++ b/drivers/xen/xen-pciback/conf_space.c @@ -124,7 +124,7 @@ static inline u32 merge_value(u32 val, u32 new_val, u32 new_val_mask, return val; } -static int pcibios_err_to_errno(int err) +static int xen_pcibios_err_to_errno(int err) { switch (err) { case PCIBIOS_SUCCESSFUL: @@ -202,7 +202,7 @@ out: pci_name(dev), size, offset, value); *ret_val = value; - return pcibios_err_to_errno(err); + return xen_pcibios_err_to_errno(err); } int xen_pcibk_config_write(struct pci_dev *dev, int offset, int size, u32 value) @@ -290,7 +290,7 @@ int xen_pcibk_config_write(struct pci_dev *dev, int offset, int size, u32 value) } } - return pcibios_err_to_errno(err); + return xen_pcibios_err_to_errno(err); } void xen_pcibk_config_free_dyn_fields(struct pci_dev *dev) diff --git a/include/linux/pci.h b/include/linux/pci.h index 0cf57d5..b0f79b3 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -467,6 +467,32 @@ static inline bool pci_dev_msi_enabled(struct pci_dev *pci_dev) { return false; #define PCIBIOS_SET_FAILED 0x88 #define PCIBIOS_BUFFER_TOO_SMALL 0x89 +/* + * Translate above to generic errno for passing back through non-pci. + */ +static inline int pcibios_err_to_errno(int err) +{ + if (err = PCIBIOS_SUCCESSFUL) + return err; /* Assume already errno */ + + switch (err) { + case PCIBIOS_FUNC_NOT_SUPPORTED: + return -ENOENT; + case PCIBIOS_BAD_VENDOR_ID: + return -EINVAL; + case PCIBIOS_DEVICE_NOT_FOUND: + return -ENODEV; + case PCIBIOS_BAD_REGISTER_NUMBER: + return -EFAULT; + case PCIBIOS_SET_FAILED: + return -EIO; + case PCIBIOS_BUFFER_TOO_SMALL: + return -ENOSPC; + } + + return -ENOTTY; +} + /* Low-level architecture-dependent routines */ struct pci_ops {
[Qemu-devel] [PATCH v2 12/13] pci: Misc pci_reg additions
Fill in many missing definitions and add sizeof fields for many sections allowing for more extensive config parsing. Signed-off-by: Alex Williamson alex.william...@redhat.com --- include/linux/pci_regs.h | 112 +- 1 files changed, 100 insertions(+), 12 deletions(-) diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h index 4b608f5..379be84 100644 --- a/include/linux/pci_regs.h +++ b/include/linux/pci_regs.h @@ -26,6 +26,7 @@ * Under PCI, each device has 256 bytes of configuration address space, * of which the first 64 bytes are standardized as follows: */ +#define PCI_STD_HEADER_SIZEOF 64 #define PCI_VENDOR_ID 0x00/* 16 bits */ #define PCI_DEVICE_ID 0x02/* 16 bits */ #define PCI_COMMAND0x04/* 16 bits */ @@ -209,9 +210,12 @@ #define PCI_CAP_ID_SHPC 0x0C/* PCI Standard Hot-Plug Controller */ #define PCI_CAP_ID_SSVID 0x0D/* Bridge subsystem vendor/device ID */ #define PCI_CAP_ID_AGP3 0x0E/* AGP Target PCI-PCI bridge */ +#define PCI_CAP_ID_SECDEV 0x0F/* Secure Device */ #define PCI_CAP_ID_EXP0x10/* PCI Express */ #define PCI_CAP_ID_MSIX 0x11/* MSI-X */ +#define PCI_CAP_ID_SATA 0x12/* SATA Data/Index Conf. */ #define PCI_CAP_ID_AF 0x13/* PCI Advanced Features */ +#define PCI_CAP_ID_MAXPCI_CAP_ID_AF #define PCI_CAP_LIST_NEXT 1 /* Next capability in the list */ #define PCI_CAP_FLAGS 2 /* Capability defined flags (16 bits) */ #define PCI_CAP_SIZEOF 4 @@ -276,6 +280,7 @@ #define PCI_VPD_ADDR_MASK 0x7fff /* Address mask */ #define PCI_VPD_ADDR_F0x8000 /* Write 0, 1 indicates completion */ #define PCI_VPD_DATA 4 /* 32-bits of data returned here */ +#define PCI_CAP_VPD_SIZEOF 8 /* Slot Identification */ @@ -297,8 +302,10 @@ #define PCI_MSI_ADDRESS_HI 8 /* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */ #define PCI_MSI_DATA_328 /* 16 bits of data for 32-bit devices */ #define PCI_MSI_MASK_3212 /* Mask bits register for 32-bit devices */ +#define PCI_MSI_PENDING_32 16 /* Pending intrs for 32-bit devices */ #define PCI_MSI_DATA_6412 /* 16 bits of data for 64-bit devices */ #define PCI_MSI_MASK_6416 /* Mask bits register for 64-bit devices */ +#define PCI_MSI_PENDING_64 20 /* Pending intrs for 64-bit devices */ /* MSI-X registers */ #define PCI_MSIX_FLAGS 2 @@ -308,6 +315,7 @@ #define PCI_MSIX_TABLE 4 #define PCI_MSIX_PBA 8 #define PCI_MSIX_FLAGS_BIRMASK(7 0) +#define PCI_CAP_MSIX_SIZEOF12 /* size of MSIX registers */ /* MSI-X entry's format */ #define PCI_MSIX_ENTRY_SIZE16 @@ -338,6 +346,7 @@ #define PCI_AF_CTRL_FLR 0x01 #define PCI_AF_STATUS 5 #define PCI_AF_STATUS_TP 0x01 +#define PCI_CAP_AF_SIZEOF 6 /* size of AF registers */ /* PCI-X registers */ @@ -374,6 +383,9 @@ #define PCI_X_STATUS_SPL_ERR 0x2000 /* Rcvd Split Completion Error Msg */ #define PCI_X_STATUS_266MHZ 0x4000 /* 266 MHz capable */ #define PCI_X_STATUS_533MHZ 0x8000 /* 533 MHz capable */ +#define PCI_X_ECC_CSR 8 /* ECC control and status */ +#define PCI_CAP_PCIX_SIZEOF_V0 8 /* size of registers for Version 0 */ +#define PCI_CAP_PCIX_SIZEOF_V1224 /* size for Version 1 2 */ /* PCI Bridge Subsystem ID registers */ @@ -462,6 +474,7 @@ #define PCI_EXP_LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */ #define PCI_EXP_LNKSTA_LBMS 0x4000 /* Link Bandwidth Management Status */ #define PCI_EXP_LNKSTA_LABS 0x8000 /* Link Autonomous Bandwidth Status */ +#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V1 20 /* v1 endpoints end here */ #define PCI_EXP_SLTCAP 20 /* Slot Capabilities */ #define PCI_EXP_SLTCAP_ABP0x0001 /* Attention Button Present */ #define PCI_EXP_SLTCAP_PCP0x0002 /* Power Controller Present */ @@ -521,6 +534,7 @@ #define PCI_EXP_OBFF_MSGA_EN 0x2000 /* OBFF enable with Message type A */ #define PCI_EXP_OBFF_MSGB_EN 0x4000 /* OBFF enable with Message type B */ #define PCI_EXP_OBFF_WAKE_EN 0x6000 /* OBFF using WAKE# signaling */ +#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 44 /* v2 endpoints end here */ #define PCI_EXP_LNKCTL248 /* Link Control 2 */ #define PCI_EXP_SLTCTL256 /* Slot Control 2 */ @@ -529,23 +543,43 @@ #define PCI_EXT_CAP_VER(header)((header 16) 0xf) #define PCI_EXT_CAP_NEXT(header) ((header 20) 0xffc) -#define PCI_EXT_CAP_ID_ERR 1 -#define PCI_EXT_CAP_ID_VC 2 -#define PCI_EXT_CAP_ID_DSN 3 -#define PCI_EXT_CAP_ID_PWR 4 -#define PCI_EXT_CAP_ID_VNDR11 -#define
[Qemu-devel] [PATCH v2 00/13] IOMMU Groups + VFIO
Version 2 incorporating acks and feedback from v1. The PCI DMA quirk and ACS check are reworked, sysfs iommu groups ABI Documentation added as well as numerous other fixes, including patches from Alexey Kardashevskiy towards supporting POWER usage of VFIO and IOMMU groups. This series can be found here on top of 3.4: git://github.com/awilliam/linux-vfio.git iommu-group-vfio-20120521 The Qemu tree has also been updated to Qemu 1.1 and can be found here: git://github.com/awilliam/qemu-vfio.git iommu-group-vfio I'd really like to make a push to get this in for 3.5, so let's talk about how to do that across iommu, pci, and new driver. Joerg, are you sufficiently happy with the IOMMU group concept and code? We'll also need David Woodhouse buyin on the intel-iommu changes in patches 3 6. Who needs to approve VFIO as a new driver, GregKH? Bjorn, I'd be happy to send the PCI changes as a series for you, but I wonder if it makes sense to collect acks for them if you approve and bundle them in with the associated code that needs them so you're not left with unused code. Let me know which you prefer. If there are better ways to do it, please let me know. Thanks, Alex --- Alex Williamson (13): vfio: Add PCI device driver pci: Misc pci_reg additions pci: Create common pcibios_err_to_errno pci: export pci_user functions for use by other drivers vfio: x86 IOMMU implementation vfio: Add documentation vfio: VFIO core iommu: Make use of DMA quirking and ACS enabled check for groups pci: Add ACS validation utility pci: Add PCI DMA source ID quirk iommu: IOMMU groups for VT-d and AMD-Vi iommu: IOMMU Groups driver core: Add iommu_group tracking to struct device .../ABI/testing/sysfs-kernel-iommu_groups | 14 Documentation/ioctl/ioctl-number.txt |1 Documentation/vfio.txt | 315 MAINTAINERS|8 drivers/Kconfig|2 drivers/Makefile |1 drivers/iommu/amd_iommu.c | 67 + drivers/iommu/intel-iommu.c| 87 + drivers/iommu/iommu.c | 578 +++- drivers/pci/access.c |6 drivers/pci/pci.c | 76 + drivers/pci/pci.h |7 drivers/pci/quirks.c | 69 + drivers/vfio/Kconfig | 16 drivers/vfio/Makefile |3 drivers/vfio/pci/Kconfig |8 drivers/vfio/pci/Makefile |4 drivers/vfio/pci/vfio_pci.c| 557 +++ drivers/vfio/pci/vfio_pci_config.c | 1522 drivers/vfio/pci/vfio_pci_intrs.c | 724 ++ drivers/vfio/pci/vfio_pci_private.h| 91 + drivers/vfio/pci/vfio_pci_rdwr.c | 269 drivers/vfio/vfio.c| 1413 +++ drivers/vfio/vfio_iommu_x86.c | 743 ++ drivers/xen/xen-pciback/conf_space.c |6 include/linux/device.h |2 include/linux/iommu.h | 104 + include/linux/pci.h| 49 + include/linux/pci_regs.h | 112 + include/linux/vfio.h | 444 ++ 30 files changed, 7182 insertions(+), 116 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-kernel-iommu_groups create mode 100644 Documentation/vfio.txt create mode 100644 drivers/vfio/Kconfig create mode 100644 drivers/vfio/Makefile create mode 100644 drivers/vfio/pci/Kconfig create mode 100644 drivers/vfio/pci/Makefile create mode 100644 drivers/vfio/pci/vfio_pci.c create mode 100644 drivers/vfio/pci/vfio_pci_config.c create mode 100644 drivers/vfio/pci/vfio_pci_intrs.c create mode 100644 drivers/vfio/pci/vfio_pci_private.h create mode 100644 drivers/vfio/pci/vfio_pci_rdwr.c create mode 100644 drivers/vfio/vfio.c create mode 100644 drivers/vfio/vfio_iommu_x86.c create mode 100644 include/linux/vfio.h
[Qemu-devel] [PATCH v2 06/13] iommu: Make use of DMA quirking and ACS enabled check for groups
Incorporate DMA quirking and ACS checking into amd_iommu and intel-iommu. Note that IOMMU groups are not yet used for streaming DMA, so this doesn't immediately solve the problems with broken Ricoh devices. This a very strict implementation of ACS checking, which will often result in multifunction devices being grouped together. This is actually a good thing as we generally have no reason to trust isolation between functions, but I won't be surprised if we later add a boot option to relax this if a user wants to opt-in to a less secure grouping. Signed-off-by: Alex Williamson alex.william...@redhat.com --- drivers/iommu/amd_iommu.c | 18 ++ drivers/iommu/intel-iommu.c | 18 ++ 2 files changed, 36 insertions(+), 0 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index b7e5ddf..be72d6d 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -254,6 +254,8 @@ static bool check_device(struct device *dev) return true; } +#define PCI_ACS_ENABLED(PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF) + static int iommu_init_device(struct device *dev) { struct pci_dev *dma_pdev, *pdev = to_pci_dev(dev); @@ -291,6 +293,22 @@ static int iommu_init_device(struct device *dev) dma_pdev = pci_get_slot(pdev-bus, PCI_DEVFN(PCI_SLOT(pdev-devfn), 0)); + dma_pdev = pci_dma_source(dma_pdev); + + if (dma_pdev-multifunction + !pci_acs_enabled(dma_pdev, PCI_ACS_ENABLED)) + dma_pdev = pci_get_slot(dma_pdev-bus, + PCI_DEVFN(PCI_SLOT(dma_pdev-devfn), + 0)); + + while (!pci_is_root_bus(dma_pdev-bus)) { + if (pci_acs_path_enabled(dma_pdev-bus-self, +NULL, PCI_ACS_ENABLED)) + break; + + dma_pdev = dma_pdev-bus-self; + } + group = iommu_group_get(dma_pdev-dev); if (!group) { group = iommu_group_alloc(); diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index e63b33b..cf2a650 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -4087,6 +4087,8 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain, return 0; } +#define PCI_ACS_ENABLED(PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF) + static int intel_iommu_add_device(struct device *dev) { struct pci_dev *pdev = to_pci_dev(dev); @@ -4113,6 +4115,22 @@ static int intel_iommu_add_device(struct device *dev) dma_pdev = pci_get_slot(pdev-bus, PCI_DEVFN(PCI_SLOT(pdev-devfn), 0)); + dma_pdev = pci_dma_source(dma_pdev); + + if (dma_pdev-multifunction + !pci_acs_enabled(dma_pdev, PCI_ACS_ENABLED)) + dma_pdev = pci_get_slot(dma_pdev-bus, + PCI_DEVFN(PCI_SLOT(dma_pdev-devfn), + 0)); + + while (!pci_is_root_bus(dma_pdev-bus)) { + if (pci_acs_path_enabled(dma_pdev-bus-self, +NULL, PCI_ACS_ENABLED)) + break; + + dma_pdev = dma_pdev-bus-self; + } + group = iommu_group_get(dma_pdev-dev); if (!group) { group = iommu_group_alloc();
Re: [Qemu-devel] [PATCH next v2 00/74] QOM CPUState, part 3: CPU reset
On 05/22/2012 02:34 AM, Andreas Färber wrote: Am 21.05.2012 20:20, schrieb Blue Swirl: On Mon, May 21, 2012 at 9:09 AM, Andreas Färber afaer...@suse.de wrote: Am 14.05.2012 23:22, schrieb Blue Swirl: On Mon, May 14, 2012 at 8:59 PM, Andreas Färber afaer...@suse.de wrote: Am 14.05.2012 21:54, schrieb Blue Swirl: On Mon, May 14, 2012 at 4:15 PM, Andreas Färber afaer...@suse.de wrote: Am 10.05.2012 02:13, schrieb Andreas Färber: Andreas Färber (74): [...] target-sparc: Let cpu_sparc_init() return SPARCCPU sun4m: Use cpu_sparc_init() to obtain SPARCCPU sun4m: Pass SPARCCPU to {main,secondary}_cpu_reset() sun4u: Use cpu_sparc_init() to obtain SPARCCPU sun4u: Let cpu_devinit() return SPARCCPU sun4u: Store SPARCCPU in ResetData leon3: Use cpu_sparc_init() to obtain SPARCCPU leon3: Store SPARCCPU in ResetData [...] Forgot to mention the bsd-user patch. Kill off cpu_state_reset() Ping! Blue, can you ack please? Otherwise the patches look pretty safe (if it compiles, it works). Was I supposed to interpret that as an Acked-by? ;) No, but you may do so if it compiles. ;-) Joking, are we? My cover letter indicated pretty clearly on what platforms I had already compile-tested on, and that meant each patch. I had also run `make check` on each and the sparc test image for sun4m, HelenOS for sun4u and crible for leon3. Applied the remainder of the series to qom-next: http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/qom-next Tristan/Fabien, could you please submit a patch to document the Leon3 machine in MAINTAINERS? No one got cc'ed on those two patches. Thanks. OK, I'll send a patch. -- Fabien Chouteau
[Qemu-devel] [PATCH v2 02/13] iommu: IOMMU Groups
IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named devices contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 :00:1e.0 - ../../../../devices/pci:00/:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 :06:0d.0 - ../../../../devices/pci:00/:00:1e.0/:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 :06:0d.1 - ../../../../devices/pci:00/:00:1e.0/:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/:00:1e.0/iommu_group - ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/:06:0d.0/iommu_group - ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/:06:0d.1/iommu_group - ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson alex.william...@redhat.com --- .../ABI/testing/sysfs-kernel-iommu_groups | 14 drivers/iommu/amd_iommu.c | 21 - drivers/iommu/intel-iommu.c| 49 -- drivers/iommu/iommu.c | 578 +++- include/linux/iommu.h | 104 +++- 5 files changed, 663 insertions(+), 103 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-kernel-iommu_groups diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups b/Documentation/ABI/testing/sysfs-kernel-iommu_groups new file mode 100644 index 000..9b31556 --- /dev/null +++
[Qemu-devel] [PATCH] Add MAINTAINERS entry for leon3
Signed-off-by: Fabien Chouteau chout...@adacore.com --- MAINTAINERS |6 ++ 1 file changed, 6 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index b45f075..3d773d3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -387,6 +387,12 @@ M: Blue Swirl blauwir...@gmail.com S: Maintained F: hw/sun4u.c +Leon3 +M: Fabien Chouteau chout...@adacore.com +S: Maintained +F: hw/leon3.c +F: hw/grlib* + S390 Machines - S390 Virtio -- 1.7.9.5
Re: [Qemu-devel] [RFC PATCH 0/4] block: file descriptor passing using -filefd and getfd_file
Am 21.05.2012 22:19, schrieb Corey Bryant: libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it. sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation. A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files. This patch series adds the -filefd command-line option and the getfd_file monitor command. This will enable libvirt to open a file and push the corresponding filename and file descriptor to QEMU. When QEMU needs to open a file, it will first check if the file descriptor was passed by either of these methods before attempting to actually open the file. I thought we decided to avoid making some file names magic, and instead go for the obvious /dev/fd/42? Kevin
Re: [Qemu-devel] [PATCH 1/3] envlist.c: handle strdup failure
Am 21.05.2012 19:56, schrieb Blue Swirl: On Mon, May 21, 2012 at 10:25 AM, Jim Meyering j...@meyering.net wrote: Blue Swirl wrote: On Tue, May 15, 2012 at 1:04 PM, j...@meyering.net wrote: From: Jim Meyering meyer...@redhat.com Without this, envlist_to_environ may silently fail to copy all strings into the destination buffer, and both callers would leak any env strings allocated after a failing strdup, because the freeing code stops at the first NULL pointer. Signed-off-by: Jim Meyering meyer...@redhat.com --- envlist.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/envlist.c b/envlist.c index f2303cd..2bbd99c 100644 --- a/envlist.c +++ b/envlist.c @@ -235,7 +235,14 @@ envlist_to_environ(const envlist_t *envlist, size_t *count) for (entry = envlist-el_entries.lh_first; entry != NULL; entry = entry-ev_link.le_next) { - *(penv++) = strdup(entry-ev_var); + if ((*(penv++) = strdup(entry-ev_var)) == NULL) { + char **e = env; + while (e != penv) { + free(*e++); + } + free(env); + return NULL; + } ERROR: code indent should never use tabs #82: FILE: envlist.c:238: +^I^Iif ((*(penv++) = strdup(entry-ev_var)) == NULL) {$ That entire file is indented solely with TABs, so adding these new lines using spaces for indentation seems unjustified: the mix tends to make the code unreadable in some contexts (email quoting, for one). How about two patches: one to convert all leading TABs in envlist.c to spaces, and the next to make the above change, but indenting with spaces? ERROR: do not use assignment in if condition #82: FILE: envlist.c:238: + if ((*(penv++) = strdup(entry-ev_var)) == NULL) { I agree with the sentiment, but found that the alternative was less readable and less maintainable, since I'd have to increment penv in two places (both in the if-block and after it) rather than in just one. However, I've just realized I can hoist the penv++ increment into the for-statement, in which case it's ok: for (entry = envlist-el_entries.lh_first; entry != NULL; entry = entry-ev_link.le_next, penv++) { *penv = strdup(entry-ev_var); if (*penv == NULL) { char **e = env; while (e = penv) { free(*e++); } free(env); return NULL; } } Your move. Which would you prefer? 1) two patches: one replacing all leading TABs with equivalent spaces, then the above patch 2) one patch, indented using TABs, in spite of the checkpatch failure 3) one patch, indented using spaces, in spite of the consistency issue 1) (or 3). Though for v1.1, maybe 3) is the smaller fix and later do 1) for 1.2. A patch replacing tabs by spaces isn't really the kind of patches that we would want to avoid during freeze. It's easy enough to check with git diff -w that it doesn't change anything semantically. Kevin
Re: [Qemu-devel] [PATCH] rbd: hook up cache options
Am 17.05.2012 22:42, schrieb Josh Durgin: Writeback caching was added in Ceph 0.46, and writethrough will be in 0.47. These are controlled by general config options, so there's no need to check for librbd version. Signed-off-by: Josh Durgin josh.dur...@inktank.com Thanks, applied to the block-next branch for 1.2. Kevin
Re: [Qemu-devel] [PATCH] Prevent disk data loss when closing qemu
Am 16.05.2012 12:16, schrieb Pavel Dovgaluk: I use qemu under Windows and it has two windows when executes - console and SDL ones. When I close SDL window main loop function terminates correctly, and when I close console window to terminate qemu then the code after main loop is not executed. Is there no way to catch this case and use the regular shutdown mechanism there as well? I'm not against your patch and it's probably 1.1 material, but there may be more shutdown logic that we're missing if you close the console window. Kevin
[Qemu-devel] [PATCH 0/2] envlist.c: handle strdup failure
From: Jim Meyering meyer...@redhat.com Per discussion, let's switch envlist.c to indent with spaces, and then make the fix: Jim Meyering (2): envlist.c: convert many leading TABs to spaces via expand -i envlist.c: handle strdup failure envlist.c | 272 -- 1 file changed, 140 insertions(+), 132 deletions(-) -- 1.7.10.2.552.gaa3bb87
[Qemu-devel] [PATCH 1/2] envlist.c: convert many leading TABs to spaces via expand -i
From: Jim Meyering meyer...@redhat.com Signed-off-by: Jim Meyering meyer...@redhat.com --- envlist.c | 256 +++--- 1 file changed, 128 insertions(+), 128 deletions(-) diff --git a/envlist.c b/envlist.c index f2303cd..1d98108 100644 --- a/envlist.c +++ b/envlist.c @@ -8,13 +8,13 @@ #include envlist.h struct envlist_entry { - const char *ev_var; /* actual env value */ - QLIST_ENTRY(envlist_entry) ev_link; +const char *ev_var;/* actual env value */ +QLIST_ENTRY(envlist_entry) ev_link; }; struct envlist { - QLIST_HEAD(, envlist_entry) el_entries; /* actual entries */ - size_t el_count;/* number of entries */ +QLIST_HEAD(, envlist_entry) el_entries;/* actual entries */ +size_t el_count; /* number of entries */ }; static int envlist_parse(envlist_t *envlist, @@ -27,15 +27,15 @@ static int envlist_parse(envlist_t *envlist, envlist_t * envlist_create(void) { - envlist_t *envlist; +envlist_t *envlist; - if ((envlist = malloc(sizeof (*envlist))) == NULL) - return (NULL); +if ((envlist = malloc(sizeof (*envlist))) == NULL) +return (NULL); - QLIST_INIT(envlist-el_entries); - envlist-el_count = 0; +QLIST_INIT(envlist-el_entries); +envlist-el_count = 0; - return (envlist); +return (envlist); } /* @@ -44,18 +44,18 @@ envlist_create(void) void envlist_free(envlist_t *envlist) { - struct envlist_entry *entry; +struct envlist_entry *entry; - assert(envlist != NULL); +assert(envlist != NULL); - while (envlist-el_entries.lh_first != NULL) { - entry = envlist-el_entries.lh_first; - QLIST_REMOVE(entry, ev_link); +while (envlist-el_entries.lh_first != NULL) { +entry = envlist-el_entries.lh_first; +QLIST_REMOVE(entry, ev_link); - free((char *)entry-ev_var); - free(entry); - } - free(envlist); +free((char *)entry-ev_var); +free(entry); +} +free(envlist); } /* @@ -72,7 +72,7 @@ envlist_free(envlist_t *envlist) int envlist_parse_set(envlist_t *envlist, const char *env) { - return (envlist_parse(envlist, env, envlist_setenv)); +return (envlist_parse(envlist, env, envlist_setenv)); } /* @@ -84,7 +84,7 @@ envlist_parse_set(envlist_t *envlist, const char *env) int envlist_parse_unset(envlist_t *envlist, const char *env) { - return (envlist_parse(envlist, env, envlist_unsetenv)); +return (envlist_parse(envlist, env, envlist_unsetenv)); } /* @@ -97,32 +97,32 @@ static int envlist_parse(envlist_t *envlist, const char *env, int (*callback)(envlist_t *, const char *)) { - char *tmpenv, *envvar; - char *envsave = NULL; - - assert(callback != NULL); - - if ((envlist == NULL) || (env == NULL)) - return (EINVAL); - - /* -* We need to make temporary copy of the env string -* as strtok_r(3) modifies it while it tokenizes. -*/ - if ((tmpenv = strdup(env)) == NULL) - return (errno); - - envvar = strtok_r(tmpenv, ,, envsave); - while (envvar != NULL) { - if ((*callback)(envlist, envvar) != 0) { - free(tmpenv); - return (errno); - } - envvar = strtok_r(NULL, ,, envsave); - } - - free(tmpenv); - return (0); +char *tmpenv, *envvar; +char *envsave = NULL; + +assert(callback != NULL); + +if ((envlist == NULL) || (env == NULL)) +return (EINVAL); + +/* + * We need to make temporary copy of the env string + * as strtok_r(3) modifies it while it tokenizes. + */ +if ((tmpenv = strdup(env)) == NULL) +return (errno); + +envvar = strtok_r(tmpenv, ,, envsave); +while (envvar != NULL) { +if ((*callback)(envlist, envvar) != 0) { +free(tmpenv); +return (errno); +} +envvar = strtok_r(NULL, ,, envsave); +} + +free(tmpenv); +return (0); } /* @@ -134,46 +134,46 @@ envlist_parse(envlist_t *envlist, const char *env, int envlist_setenv(envlist_t *envlist, const char *env) { - struct envlist_entry *entry = NULL; - const char *eq_sign; - size_t envname_len; - - if ((envlist == NULL) || (env == NULL)) - return (EINVAL); - - /* find out first equals sign in given env */ - if ((eq_sign = strchr(env, '=')) == NULL) - return (EINVAL); - envname_len = eq_sign - env + 1; - - /* -* If there
[Qemu-devel] [PATCH 2/2] envlist.c: handle strdup failure
From: Jim Meyering meyer...@redhat.com Without this, envlist_to_environ may silently fail to copy all strings into the destination buffer, and both callers would leak any env strings allocated after a failing strdup, because the freeing code stops at the first NULL pointer. Signed-off-by: Jim Meyering meyer...@redhat.com --- envlist.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/envlist.c b/envlist.c index 1d98108..0dfb56c 100644 --- a/envlist.c +++ b/envlist.c @@ -234,8 +234,16 @@ envlist_to_environ(const envlist_t *envlist, size_t *count) return (NULL); for (entry = envlist-el_entries.lh_first; entry != NULL; -entry = entry-ev_link.le_next) { -*(penv++) = strdup(entry-ev_var); + entry = entry-ev_link.le_next, penv++) { +*penv = strdup(entry-ev_var); +if (*penv == NULL) { +char **e = env; +while (e = penv) { +free(*e++); +} +free(env); +return NULL; +} } *penv = NULL; /* NULL terminate the list */ -- 1.7.10.2.552.gaa3bb87
Re: [Qemu-devel] [PATCH 1/3] envlist.c: handle strdup failure
Kevin Wolf wrote: A patch replacing tabs by spaces isn't really the kind of patches that we would want to avoid during freeze. It's easy enough to check with git diff -w that it doesn't change anything semantically. That makes sense, so I've posted two patches: 1) two patches: one replacing all leading TABs with equivalent spaces, then the above patch Note however, that envlist.c must predate checkpatch.pl, since that patch provokes numerous warnings about style issues: (hmm... I've just noticed these first two, which seem to suggest that even for comments I should remove the TABs. Is it worth a V2 to fix those two lines? Or are they false positives, since they're not really code indentation? ) ERROR: code indent should never use tabs #23: FILE: envlist.c:11: +const char *ev_var;^I^I^I/* actual env value */$ ERROR: code indent should never use tabs #30: FILE: envlist.c:16: +QLIST_HEAD(, envlist_entry) el_entries;^I/* actual entries */$ ERROR: code indent should never use tabs #31: FILE: envlist.c:17: +size_t el_count;^I^I^I/* number of entries */$ WARNING: space prohibited between function name and open parenthesis '(' #44: FILE: envlist.c:32: +if ((envlist = malloc(sizeof (*envlist))) == NULL) ERROR: do not use assignment in if condition #44: FILE: envlist.c:32: +if ((envlist = malloc(sizeof (*envlist))) == NULL) WARNING: braces {} are necessary for all arms of this statement #44: FILE: envlist.c:32: +if ((envlist = malloc(sizeof (*envlist))) == NULL) [...] ERROR: return is not a function, parentheses are not required #45: FILE: envlist.c:33: +return (NULL); ERROR: return is not a function, parentheses are not required #53: FILE: envlist.c:38: +return (envlist); ERROR: return is not a function, parentheses are not required #90: FILE: envlist.c:75: +return (envlist_parse(envlist, env, envlist_setenv)); ERROR: return is not a function, parentheses are not required #99: FILE: envlist.c:87: +return (envlist_parse(envlist, env, envlist_unsetenv)); WARNING: braces {} are necessary for all arms of this statement #138: FILE: envlist.c:105: +if ((envlist == NULL) || (env == NULL)) [...] ERROR: return is not a function, parentheses are not required #139: FILE: envlist.c:106: +return (EINVAL); ERROR: do not use assignment in if condition #145: FILE: envlist.c:112: +if ((tmpenv = strdup(env)) == NULL) WARNING: braces {} are necessary for all arms of this statement #145: FILE: envlist.c:112: +if ((tmpenv = strdup(env)) == NULL) [...] ERROR: return is not a function, parentheses are not required #146: FILE: envlist.c:113: +return (errno); ERROR: return is not a function, parentheses are not required #152: FILE: envlist.c:119: +return (errno); ERROR: return is not a function, parentheses are not required #158: FILE: envlist.c:125: +return (0); WARNING: braces {} are necessary for all arms of this statement #210: FILE: envlist.c:141: +if ((envlist == NULL) || (env == NULL)) [...] ERROR: return is not a function, parentheses are not required #211: FILE: envlist.c:142: +return (EINVAL); ERROR: do not use assignment in if condition #214: FILE: envlist.c:145: +if ((eq_sign = strchr(env, '=')) == NULL) WARNING: braces {} are necessary for all arms of this statement #214: FILE: envlist.c:145: +if ((eq_sign = strchr(env, '=')) == NULL) [...] ERROR: return is not a function, parentheses are not required #215: FILE: envlist.c:146: +return (EINVAL); WARNING: braces {} are necessary for all arms of this statement #225: FILE: envlist.c:156: +if (strncmp(entry-ev_var, env, envname_len) == 0) [...] WARNING: space prohibited between function name and open parenthesis '(' #237: FILE: envlist.c:168: +if ((entry = malloc(sizeof (*entry))) == NULL) ERROR: do not use assignment in if condition #237: FILE: envlist.c:168: +if ((entry = malloc(sizeof (*entry))) == NULL) WARNING: braces {} are necessary for all arms of this statement #237: FILE: envlist.c:168: +if ((entry = malloc(sizeof (*entry))) == NULL) [...] ERROR: return is not a function, parentheses are not required #238: FILE: envlist.c:169: +return (errno); ERROR: do not use assignment in if condition #239: FILE: envlist.c:170: +if ((entry-ev_var = strdup(env)) == NULL) { ERROR: return is not a function, parentheses are not required #241: FILE: envlist.c:172: +return (errno); ERROR: return is not a function, parentheses are not required #245: FILE: envlist.c:176: +return (0); WARNING: braces {} are necessary for all arms of this statement #284: FILE: envlist.c:189: +if ((envlist == NULL) || (env == NULL)) [...] ERROR: return is not a function, parentheses are not required #285: FILE: envlist.c:190: +
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
Hi Stefan On Mon, May 21, 2012 at 11:20 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, May 21, 2012 at 12:00 PM, ronnie sahlberg ronniesahlb...@gmail.com wrote: Yes, I use IDE since I boot from this LUN. I just managed to track it down to the IDE changes. It looks like basically this change triggered it : commit bef0fd5958120542f126f2dedbfce65d8839a94d Author: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Date: Â Thu Mar 29 10:31:30 2012 +0100 Â Â ide: convert ide_sector_read() to asynchronous I/O Stefan, any ideas? Â Ill continue trying to understand the IDE code tomorrow and maybe I make some progress. It sounds like aio for iSCSI is broken. Â IDE submits bdrv_aio_readv() requests which iSCSI handles. Â QEMU main loop should find that the iSCSI socket file descriptor becomes readable when the SCSI response arrives. Â We can then report I/O completion back to the guest. However, it appears that completions are not triggering. Â My guess is that once the OS is fully booted you don't see the problem because other events are pumping the main loop besides a 55ms timer. Therefore this problem probably applies across the board but it's most noticable when the BIOS/bootloader/early OS is executing. I booted off a CDROM image and then had iscsi device as a disk. From the cdrom booted guest I then did 'dd if=/dev/sda of=/dev/null bs=512 iflag=direct' but did not see these 55ms delays anymore. I did however see a ~4ms delay . Have you added debugging into block/iscsi.c or libiscsi to watch when read handlers get called? Â Using tcpdump or strace might also be useful to figure out when the reply becomes available and why we aren't registering a read handler function with the QEMU main loop. I just did that, and I print a message and timeval to stdout for the functions in block/iscsi.c What seems to happen is that we call iscsi_aio_readv() which sets up a write event but this event is not triggered for another 55ms. Since the descriptor is writeable at this point, i would expected that the write_event_callback would trigger almost immediately. Below iscsi_aio_readv is printed when we enter this function. This function prepares a CDB and sets it up for queueing, but it does not actually write it to the socket. At the end of this function, we set up the events by calling iscsi_set_events() and we set it up for writeable. iscsi_process_write() is when we enter the fd-is-writeable callback. iscsi_aio_readv 1337676989.429822 iscsi_set_events write_event?Y 1337676989.429834 iscsi_process_write 1337676989.484424 iscsi_set_events write_event?N 1337676989.484507 iscsi_process_read1337676989.484628 iscsi_aio_read10_cb 1337676989.484647 iscsi_set_events write_event?N 1337676989.484660 iscsi_readv_writev_bh_cb 1337676989.484665 iscsi_aio_readv 1337676989.484751 So what happens seems to be that once we have set up the i/o and queued it, and we set up the writeable event, but this event is not triggered immediately! It takes another 55ms before the event is triggered. To me this looks like a bug in the eventsystem. The socket is always writeable. shouldnt the callback be invoked almost immediately here ? regards ronnie sahlberg
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
Hi, Now that I see what happens, I can easily workaround this in block/iscsi.c by the patch below, but I dont know if this is the right thing to do. It does appear that here, when I use qemu_set_fd_handler() and add a handler for writeble it takes 55ms before the event system notices this and reacts. diff --git a/block/iscsi.c b/block/iscsi.c index d37c4ee..1ebff0f 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun) { struct iscsi_context *iscsi = iscsilun-iscsi; +if (iscsi_which_events(iscsi) POLLOUT) { +iscsi_process_write(iscsilun); +} + qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read, (iscsi_which_events(iscsi) POLLOUT) ? iscsi_process_write : NULL, On Tue, May 22, 2012 at 7:07 PM, ronnie sahlberg ronniesahlb...@gmail.com wrote: Hi Stefan On Mon, May 21, 2012 at 11:20 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, May 21, 2012 at 12:00 PM, ronnie sahlberg ronniesahlb...@gmail.com wrote: Yes, I use IDE since I boot from this LUN. I just managed to track it down to the IDE changes. It looks like basically this change triggered it : commit bef0fd5958120542f126f2dedbfce65d8839a94d Author: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Date:  Thu Mar 29 10:31:30 2012 +0100   ide: convert ide_sector_read() to asynchronous I/O Stefan, any ideas?  Ill continue trying to understand the IDE code tomorrow and maybe I make some progress. It sounds like aio for iSCSI is broken.  IDE submits bdrv_aio_readv() requests which iSCSI handles.  QEMU main loop should find that the iSCSI socket file descriptor becomes readable when the SCSI response arrives.  We can then report I/O completion back to the guest. However, it appears that completions are not triggering.  My guess is that once the OS is fully booted you don't see the problem because other events are pumping the main loop besides a 55ms timer. Therefore this problem probably applies across the board but it's most noticable when the BIOS/bootloader/early OS is executing. I booted off a CDROM image and then had iscsi device as a disk. From the cdrom booted guest I then did 'dd if=/dev/sda of=/dev/null bs=512 iflag=direct' but did not see these 55ms delays anymore. I did however see a ~4ms delay . Have you added debugging into block/iscsi.c or libiscsi to watch when read handlers get called?  Using tcpdump or strace might also be useful to figure out when the reply becomes available and why we aren't registering a read handler function with the QEMU main loop. I just did that, and I print a message and timeval to stdout for the functions in block/iscsi.c What seems to happen is that we call iscsi_aio_readv() which sets up a write event  but this event is not triggered for another 55ms. Since the descriptor is writeable at this point,  i would expected that the write_event_callback would trigger almost immediately. Below  iscsi_aio_readv  is printed when we enter this function. This function prepares a CDB and sets it up for queueing,  but it does not actually write it to the socket. At the end of this function, we set up the events  by calling iscsi_set_events() and we set it up for writeable. iscsi_process_write() is when we enter the fd-is-writeable callback. iscsi_aio_readv          1337676989.429822 iscsi_set_events  write_event?Y  1337676989.429834 iscsi_process_write        1337676989.484424 iscsi_set_events  write_event?N  1337676989.484507 iscsi_process_read         1337676989.484628 iscsi_aio_read10_cb        1337676989.484647 iscsi_set_events  write_event?N  1337676989.484660 iscsi_readv_writev_bh_cb      1337676989.484665 iscsi_aio_readv          1337676989.484751 So  what happens seems to be that once we have set up the i/o  and queued it,  and we set up the writeable event, but this event is not triggered immediately! It takes another 55ms before the event is triggered. To me this looks like a bug in the eventsystem. The socket is always writeable. shouldnt the callback be invoked almost immediately here ? regards ronnie sahlberg
Re: [Qemu-devel] [PATCH] rbd: hook up cache options
Il 17/05/2012 22:42, Josh Durgin ha scritto: + * Fallback to more conservative semantics if setting cache + * options fails. Ignore errors from setting rbd_cache because the + * only possible error is that the option does not exist, and + * librbd defaults to no caching. If write through caching cannot + * be set up, fall back to no caching. + */ +if (flags BDRV_O_NOCACHE) { +rados_conf_set(s-cluster, rbd_cache, false); +} else { +rados_conf_set(s-cluster, rbd_cache, true); +if (!(flags BDRV_O_CACHE_WB)) { +r = rados_conf_set(s-cluster, rbd_cache_max_dirty, 0); +if (r 0) { +rados_conf_set(s-cluster, rbd_cache, false); +} +} +} Last time I looked at ceph, rbd_flush was not a full flush of the cache; it only ensured that the pending requests were sent. So my questions are: 1) has this changed? does rbd_flush now flush dirty items when rbd_cache_max_dirty 0? 2) should the usage of a cache be conditional on LIBRBD_VERSION_CODE = LIBRBD_VERSION(0, 1, 1)? Thanks, Paolo
Re: [Qemu-devel] [PATCH] Prevent disk data loss when closing qemu
Il 22/05/2012 10:51, Kevin Wolf ha scritto: Am 16.05.2012 12:16, schrieb Pavel Dovgaluk: I use qemu under Windows and it has two windows when executes - console and SDL ones. When I close SDL window main loop function terminates correctly, and when I close console window to terminate qemu then the code after main loop is not executed. Is there no way to catch this case and use the regular shutdown mechanism there as well? I'm not against your patch and it's probably 1.1 material, but there may be more shutdown logic that we're missing if you close the console window. Looks like qemu_ctrl_handler (in os-win32.c) needs to do something like termsig_handler (in os-posix.c) instead of a plain exit. Paolo
Re: [Qemu-devel] [RFC:kvm] export host NUMA info to guest make emulated device NUMA attr
On Sat, May 19, 2012 at 12:14 AM, Shirley Ma mashi...@us.ibm.com wrote: On Thu, 2012-05-17 at 17:20 +0800, Liu Ping Fan wrote: Currently, the guest can not know the NUMA info of the vcpu, which will result in performance drawback. This is the discovered and experiment by     Shirley Ma x...@us.ibm.com     Krishna Kumar krkum...@in.ibm.com     Tom Lendacky t...@us.ibm.com Refer to - http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html we can see the big perfermance gap between NUMA aware and unaware. Enlightened by their discovery, I think, we can do more work -- that is to export NUMA info of host to guest. There three problems we've found: 1. KVM doesn't support NUMA load balancer. Even there are no other workloads in the system, and the number of vcpus on the guest is smaller than the number of cpus per node, the vcpus could be scheduled on different nodes. Someone is working on in-kernel solution. Andrew Theurer has a working user-space NUMA aware VM balancer, it requires libvirt and cgroups (which is default for RHEL6 systems). Interesting, and I found that sched/numa: Introduce sys_numa_{t,m}bind() committed by Peter and Ingo may help. But I think from the guest view, it can not tell whether the two vcpus are on the same host node. For example, vcpu-a in node-A is not vcpu-b in node-B, the guest lb will be more expensive if it pull_task from vcpu-a and choose vcpu-b to push. And my idea is to export such info to guest, still working on it. 2. The host scheduler is not aware the relationship between guest vCPUs and vhost. So it's possible for host scheduler to schedule per-device vhost thread on the same cpu on which the vCPU kick a TX packet, or schecule vhost thread on different node than the vCPU for; For RX packet it's possible for vhost delivers RX packet on the vCPU running on different node too. Yes. I notice this point in your original patch. 3. per-device vhost thread is not scaled. What about the scale-ability of per-vm * host_NUMA_NODE? When we make advantage of multi-core, we produce mulit vcpu threads for one VM. So what about the emulated device? Is it acceptable to scale to take advantage of host NUMA attr. After all, how many nodes on which the VM can be run on are the user's control. It is a balance of scale-ability and performance. So the problems are in host scheduling and vhost thread scalability. I am not sure how much help from exposing NUMA info from host to guest. Have you tested these patched? How much performance gain here? Sorry, not yet. As you have mentioned, the vhost thread scalability is a big problem. So I want to see others' opinion before going on. Thanks and regards, pingfan Thanks Shirley So here comes the idea: 1. export host numa info through guest's sched domain to its scheduler  Export vcpu's NUMA info to guest scheduler(I think mem NUMA problem  has been handled by host).  So the guest's lb will consider the cost.  I am still working on this, and my original idea is to export these info  through static struct sched_domain_topology_level *sched_domain_topology  to guest. 2. Do a better emulation of virt mach exported to guest.  In real world, the devices are limited by kinds of reasons to own the NUMA  property. But as to Qemu, the device is emulated by thread, which inherit  the NUMA attr in nature.  We can implement the device as components of many  logic units, each of the unit is backed by a thread in different host node.  Currently, I want to start the work on vhost. But I think, maybe in  future, the iothread in Qemu can also has such attr. Forgive me, for the limited time, I can not have more better understand of vhost/virtio_net drivers. These patches are just draft, _FAR_, _FAR_ from work. I will do more detail work for them in future. To easy the review, the following is the sum up of the 2nd point of the idea. As for the 1st point of the idea, it is not reflected in the patches. --spread/shrink the vhost_workers over the host nodes as demanded from Qemu.  And we can consider each vhost_worker as an independent net logic device  embeded in physical device vhost_net.  At the meanwhile, we spread vcpu  threads over the host node.  The vrings on guest are allocated PAGE_SIZE align separately, so they can  will only be mapped into different host node, so vhost_worker in the same  node can access it with the least cost. So does the vq on guest. --virtio_net driver will changes and talk with the logic device. And which  logic device it will talk to is determined by on which vcpu it is scheduled. --the binding of vcpus and vhost_worker is implemented by:  for call direction, vq-a in the node-A will have a dedicated irq-a. And  we set the irq-a's affinity to vcpus in node-A.  for kick direction, kick register-b trigger different eventfd-b which wake up  vhost_worker-b.
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
Il 22/05/2012 11:15, ronnie sahlberg ha scritto: Hi, Now that I see what happens, I can easily workaround this in block/iscsi.c by the patch below, but I dont know if this is the right thing to do. It does appear that here, when I use qemu_set_fd_handler() and add a handler for writeble it takes 55ms before the event system notices this and reacts. diff --git a/block/iscsi.c b/block/iscsi.c index d37c4ee..1ebff0f 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun) { struct iscsi_context *iscsi = iscsilun-iscsi; +if (iscsi_which_events(iscsi) POLLOUT) { +iscsi_process_write(iscsilun); +} + qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read, (iscsi_which_events(iscsi) POLLOUT) ? iscsi_process_write : NULL, Doh, now I remember. Whenever you change the aio handlers you need to call qemu_notify_event() afterwards, if the handler may fire right away. The alternative is your patch, which is correct. Can you resend it with SoB? Paolo
Re: [Qemu-devel] [PATCH 1/3] envlist.c: handle strdup failure
Am 22.05.2012 11:05, schrieb Jim Meyering: Kevin Wolf wrote: A patch replacing tabs by spaces isn't really the kind of patches that we would want to avoid during freeze. It's easy enough to check with git diff -w that it doesn't change anything semantically. That makes sense, so I've posted two patches: 1) two patches: one replacing all leading TABs with equivalent spaces, then the above patch Note however, that envlist.c must predate checkpatch.pl, since that patch provokes numerous warnings about style issues: Quite possible. Most files in qemu will still contain some violations. (hmm... I've just noticed these first two, which seem to suggest that even for comments I should remove the TABs. Is it worth a V2 to fix those two lines? Or are they false positives, since they're not really code indentation? ) We don't use tabs at all where possible. There shouldn't be much more exceptions than the Makefiles. Kevin
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
On Tue, May 22, 2012 at 7:29 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 22/05/2012 11:15, ronnie sahlberg ha scritto: Hi, Now that I see what happens, I can easily workaround this in block/iscsi.c by the patch below, but I dont know if this is the right thing to do. It does appear that here, Â when I use qemu_set_fd_handler() and add a handler for writeble it takes 55ms before the event system notices this and reacts. diff --git a/block/iscsi.c b/block/iscsi.c index d37c4ee..1ebff0f 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun) Â { Â Â Â struct iscsi_context *iscsi = iscsilun-iscsi; + Â Â if (iscsi_which_events(iscsi) POLLOUT) { + Â Â Â Â iscsi_process_write(iscsilun); + Â Â } + Â Â Â qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read, Â Â Â Â Â Â Â Â Â Â Â Â Â Â (iscsi_which_events(iscsi) POLLOUT) Â Â Â Â Â Â Â Â Â Â Â Â Â Â ? iscsi_process_write : NULL, Doh, now I remember. Â Whenever you change the aio handlers you need to call qemu_notify_event() afterwards, if the handler may fire right away. Thanks. I just confirmed that qemu_notify_event() fixes the issue. Ill send a patch that uses qemu_notify_event() and a comment why this is needed. regards ronnie sahlberg
[Qemu-devel] [PATCHv2 0/2] envlist.c: handle strdup failure
From: Jim Meyering meyer...@redhat.com This is the same as v1, except that two lines of non-leading TABs in envlist.c (indenting comments after code) have also been converted to use equivalent spaces instead of TABs. Jim Meyering (2): envlist.c: convert all TABs to equivalent spaces envlist.c: handle strdup failure envlist.c | 272 -- 1 file changed, 140 insertions(+), 132 deletions(-) -- 1.7.10.2.552.gaa3bb87
[Qemu-devel] [PATCHv2 2/2] envlist.c: handle strdup failure
From: Jim Meyering meyer...@redhat.com Without this, envlist_to_environ may silently fail to copy all strings into the destination buffer, and both callers would leak any env strings allocated after a failing strdup, because the freeing code stops at the first NULL pointer. Signed-off-by: Jim Meyering meyer...@redhat.com --- envlist.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/envlist.c b/envlist.c index be0addb..7532091 100644 --- a/envlist.c +++ b/envlist.c @@ -234,8 +234,16 @@ envlist_to_environ(const envlist_t *envlist, size_t *count) return (NULL); for (entry = envlist-el_entries.lh_first; entry != NULL; -entry = entry-ev_link.le_next) { -*(penv++) = strdup(entry-ev_var); + entry = entry-ev_link.le_next, penv++) { +*penv = strdup(entry-ev_var); +if (*penv == NULL) { +char **e = env; +while (e = penv) { +free(*e++); +} +free(env); +return NULL; +} } *penv = NULL; /* NULL terminate the list */ -- 1.7.10.2.552.gaa3bb87
[Qemu-devel] [PATCHv2 1/2] envlist.c: convert all TABs to equivalent spaces
From: Jim Meyering meyer...@redhat.com Signed-off-by: Jim Meyering meyer...@redhat.com --- envlist.c | 256 +++--- 1 file changed, 128 insertions(+), 128 deletions(-) diff --git a/envlist.c b/envlist.c index f2303cd..be0addb 100644 --- a/envlist.c +++ b/envlist.c @@ -8,13 +8,13 @@ #include envlist.h struct envlist_entry { - const char *ev_var; /* actual env value */ - QLIST_ENTRY(envlist_entry) ev_link; +const char *ev_var; /* actual env value */ +QLIST_ENTRY(envlist_entry) ev_link; }; struct envlist { - QLIST_HEAD(, envlist_entry) el_entries; /* actual entries */ - size_t el_count;/* number of entries */ +QLIST_HEAD(, envlist_entry) el_entries; /* actual entries */ +size_t el_count;/* number of entries */ }; static int envlist_parse(envlist_t *envlist, @@ -27,15 +27,15 @@ static int envlist_parse(envlist_t *envlist, envlist_t * envlist_create(void) { - envlist_t *envlist; +envlist_t *envlist; - if ((envlist = malloc(sizeof (*envlist))) == NULL) - return (NULL); +if ((envlist = malloc(sizeof (*envlist))) == NULL) +return (NULL); - QLIST_INIT(envlist-el_entries); - envlist-el_count = 0; +QLIST_INIT(envlist-el_entries); +envlist-el_count = 0; - return (envlist); +return (envlist); } /* @@ -44,18 +44,18 @@ envlist_create(void) void envlist_free(envlist_t *envlist) { - struct envlist_entry *entry; +struct envlist_entry *entry; - assert(envlist != NULL); +assert(envlist != NULL); - while (envlist-el_entries.lh_first != NULL) { - entry = envlist-el_entries.lh_first; - QLIST_REMOVE(entry, ev_link); +while (envlist-el_entries.lh_first != NULL) { +entry = envlist-el_entries.lh_first; +QLIST_REMOVE(entry, ev_link); - free((char *)entry-ev_var); - free(entry); - } - free(envlist); +free((char *)entry-ev_var); +free(entry); +} +free(envlist); } /* @@ -72,7 +72,7 @@ envlist_free(envlist_t *envlist) int envlist_parse_set(envlist_t *envlist, const char *env) { - return (envlist_parse(envlist, env, envlist_setenv)); +return (envlist_parse(envlist, env, envlist_setenv)); } /* @@ -84,7 +84,7 @@ envlist_parse_set(envlist_t *envlist, const char *env) int envlist_parse_unset(envlist_t *envlist, const char *env) { - return (envlist_parse(envlist, env, envlist_unsetenv)); +return (envlist_parse(envlist, env, envlist_unsetenv)); } /* @@ -97,32 +97,32 @@ static int envlist_parse(envlist_t *envlist, const char *env, int (*callback)(envlist_t *, const char *)) { - char *tmpenv, *envvar; - char *envsave = NULL; - - assert(callback != NULL); - - if ((envlist == NULL) || (env == NULL)) - return (EINVAL); - - /* -* We need to make temporary copy of the env string -* as strtok_r(3) modifies it while it tokenizes. -*/ - if ((tmpenv = strdup(env)) == NULL) - return (errno); - - envvar = strtok_r(tmpenv, ,, envsave); - while (envvar != NULL) { - if ((*callback)(envlist, envvar) != 0) { - free(tmpenv); - return (errno); - } - envvar = strtok_r(NULL, ,, envsave); - } - - free(tmpenv); - return (0); +char *tmpenv, *envvar; +char *envsave = NULL; + +assert(callback != NULL); + +if ((envlist == NULL) || (env == NULL)) +return (EINVAL); + +/* + * We need to make temporary copy of the env string + * as strtok_r(3) modifies it while it tokenizes. + */ +if ((tmpenv = strdup(env)) == NULL) +return (errno); + +envvar = strtok_r(tmpenv, ,, envsave); +while (envvar != NULL) { +if ((*callback)(envlist, envvar) != 0) { +free(tmpenv); +return (errno); +} +envvar = strtok_r(NULL, ,, envsave); +} + +free(tmpenv); +return (0); } /* @@ -134,46 +134,46 @@ envlist_parse(envlist_t *envlist, const char *env, int envlist_setenv(envlist_t *envlist, const char *env) { - struct envlist_entry *entry = NULL; - const char *eq_sign; - size_t envname_len; - - if ((envlist == NULL) || (env == NULL)) - return (EINVAL); - - /* find out first equals sign in given env */ - if ((eq_sign = strchr(env, '=')) == NULL) - return (EINVAL); - envname_len = eq_sign - env + 1; - - /* -* If there already
Re: [Qemu-devel] [PATCH 1/3] envlist.c: handle strdup failure
Kevin Wolf wrote: Am 22.05.2012 11:05, schrieb Jim Meyering: Kevin Wolf wrote: A patch replacing tabs by spaces isn't really the kind of patches that we would want to avoid during freeze. It's easy enough to check with git diff -w that it doesn't change anything semantically. That makes sense, so I've posted two patches: 1) two patches: one replacing all leading TABs with equivalent spaces, then the above patch Note however, that envlist.c must predate checkpatch.pl, since that patch provokes numerous warnings about style issues: Quite possible. Most files in qemu will still contain some violations. (hmm... I've just noticed these first two, which seem to suggest that even for comments I should remove the TABs. Is it worth a V2 to fix those two lines? Or are they false positives, since they're not really code indentation? ) We don't use tabs at all where possible. There shouldn't be much more exceptions than the Makefiles. Sounds like a v2 would be welcome. Here you go...
Re: [Qemu-devel] qemu-microblaze-system
Hi Peter On 05/22/2012 07:45 AM, Peter Crosthwaite wrote: Hi Michael, The microblaze linux kernel is available at: http://wiki.xilinx.com/ If you wish to do your own kernel config from scratch. There is some brief documentation there on how to do a bringup. The qemu port for mb maintained by PetaLogix and I notice you are using a student email domain, so you likely eligible for a free academic license of the petalogix tools which automate the board bringup/kernel config etc, (and it will include our version of QEMU which can boot a lot more FPGA-microblaze platforms than just the ml605 ans s3adsp): Yes I know, but what is the difference with the mainline qemu? I'm working for an European Project that use microblaze and multiple pVex cores. I have already done a driver that can use the rVex throught microblaze but we need to do some simulation on qemu www.petalogix.com I have a Xilinx XPS project that exactly matches the ML605 qemu platform that I can send you as well, and I can also send you raw kernel config if that helps? Yes, this can definitely help. Can you send both? Regards Michael Peter On Mon, May 14, 2012 at 7:42 PM, Michael Trimarchi trimar...@gandalf.sssup.it wrote: Hi all, I would like to start using/testing the qemu microblaze system, but I don't find information about linux kernel configuration that match the ml605 hardware design? Is there any howto? Michael
Re: [Qemu-devel] [PATCHv2 1/2] envlist.c: convert all TABs to equivalent spaces
On 22 May 2012 10:50, Jim Meyering j...@meyering.net wrote: From: Jim Meyering meyer...@redhat.com If we're going to go to the effort of a complete reindent patch we should actually reindent to the QEMU coding style standard, which is four-space, not eight. -- PMM
[Qemu-devel] [PATCH] ISCSI: We need to call qemu_notify_event() everytime we update which events we need to be notified for.
Otherwise, If we add an event for -is-writeable but the socket is already writeable there may be a short delay before the event callback is actually triggered. Those delays would in particular hurt performance during BIOS boot and when the GRUB bootloader reads the kernel and initrd. Signed-off-by: Ronnie Sahlberg ronniesahlb...@gmail.com --- block/iscsi.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index d37c4ee..f956824 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -109,6 +109,13 @@ iscsi_set_events(IscsiLun *iscsilun) (iscsi_which_events(iscsi) POLLOUT) ? iscsi_process_write : NULL, iscsi_process_flush, iscsilun); + +/* If we just added the event for writeable we must call + and the socket is already writeable the callback might + not be invoked until after a short delay unless we call + qemu_notify_event(). + */ +qemu_notify_event(); } static void -- 1.7.3.1
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
None of the other drivers in block/*.c call qemu_notify_event() Do you you think those should be audited and have this call added to where required ? regards ronnie sahlberg On Tue, May 22, 2012 at 7:48 PM, ronnie sahlberg ronniesahlb...@gmail.com wrote: On Tue, May 22, 2012 at 7:29 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 22/05/2012 11:15, ronnie sahlberg ha scritto: Hi, Now that I see what happens, I can easily workaround this in block/iscsi.c by the patch below, but I dont know if this is the right thing to do. It does appear that here, Â when I use qemu_set_fd_handler() and add a handler for writeble it takes 55ms before the event system notices this and reacts. diff --git a/block/iscsi.c b/block/iscsi.c index d37c4ee..1ebff0f 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun) Â { Â Â Â struct iscsi_context *iscsi = iscsilun-iscsi; + Â Â if (iscsi_which_events(iscsi) POLLOUT) { + Â Â Â Â iscsi_process_write(iscsilun); + Â Â } + Â Â Â qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read, Â Â Â Â Â Â Â Â Â Â Â Â Â Â (iscsi_which_events(iscsi) POLLOUT) Â Â Â Â Â Â Â Â Â Â Â Â Â Â ? iscsi_process_write : NULL, Doh, now I remember. Â Whenever you change the aio handlers you need to call qemu_notify_event() afterwards, if the handler may fire right away. Thanks. Â I just confirmed that qemu_notify_event() fixes the issue. Ill send a patch that uses qemu_notify_event() and a comment why this is needed. regards ronnie sahlberg
Re: [Qemu-devel] Current differences between qemu --enable-kvm and qemu-kvm?
On Mon, May 21, 2012 at 5:54 PM, Erik Rull webmas...@rdsoftware.de wrote: is there a summary existing that shows up the rough or actual differences between qemu --enable-kvm and qemu-kvm? I tested both versions with the same compile and start options, the CPU performance results are identical, only the bootup time of my guest system with qemu-kvm seemed to be a bit faster (not measured, it just feeled so). For production KVM instances I think it still makes sense to use qemu-kvm packages from your distro or qemu-kvm upstream source. Jan Kiszka has reduced the delta between qemu.git and qemu-kvm.git to the point where I think the list of differences is rather small - maybe PCI passthrough stuff, irqfd for vhost-net (which is now also being upstreamed into qemu.git), and a few other things I don't know of. For development most patches should be against qemu.git unless they have a dependency on qemu-kvm.git code. Stefan
[Qemu-devel] [PATCH] ISCSI: Switch to using READ16/WRITE16 for I/O to the LUN.
This allows using LUNs bigger than 2TB. Signed-off-by: Ronnie Sahlberg ronniesahlb...@gmail.com --- block/iscsi.c | 101 trace-events |4 +- 2 files changed, 74 insertions(+), 31 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index f956824..ed1ad7b 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -25,6 +25,7 @@ #include config-host.h #include poll.h +#include arpa/inet.h #include qemu-common.h #include qemu-error.h #include block_int.h @@ -168,12 +169,12 @@ iscsi_readv_writev_bh_cb(void *p) static void -iscsi_aio_write10_cb(struct iscsi_context *iscsi, int status, +iscsi_aio_write16_cb(struct iscsi_context *iscsi, int status, void *command_data, void *opaque) { IscsiAIOCB *acb = opaque; -trace_iscsi_aio_write10_cb(iscsi, status, acb, acb-canceled); +trace_iscsi_aio_write16_cb(iscsi, status, acb, acb-canceled); g_free(acb-buf); @@ -186,7 +187,7 @@ iscsi_aio_write10_cb(struct iscsi_context *iscsi, int status, acb-status = 0; if (status 0) { -error_report(Failed to write10 data to iSCSI lun. %s, +error_report(Failed to write16 data to iSCSI lun. %s, iscsi_get_error(iscsi)); acb-status = -EIO; } @@ -211,12 +212,9 @@ iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num, struct iscsi_context *iscsi = iscsilun-iscsi; IscsiAIOCB *acb; size_t size; -int fua = 0; - -/* set FUA on writes when cache mode is write through */ -if (!(bs-open_flags BDRV_O_CACHE_WB)) { -fua = 1; -} +uint32_t num_sectors; +uint64_t lba; +struct iscsi_data data; acb = qemu_aio_get(iscsi_aio_pool, bs, cb, opaque); trace_iscsi_aio_writev(iscsi, sector_num, nb_sectors, opaque, acb); @@ -226,18 +224,44 @@ iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num, acb-canceled = 0; -/* XXX we should pass the iovec to write10 to avoid the extra copy */ +/* XXX we should pass the iovec to write16 to avoid the extra copy */ /* this will allow us to get rid of 'buf' completely */ size = nb_sectors * BDRV_SECTOR_SIZE; acb-buf = g_malloc(size); qemu_iovec_to_buffer(acb-qiov, acb-buf); -acb-task = iscsi_write10_task(iscsi, iscsilun-lun, acb-buf, size, - sector_qemu2lun(sector_num, iscsilun), - fua, 0, iscsilun-block_size, - iscsi_aio_write10_cb, acb); + + +acb-task = malloc(sizeof(struct scsi_task)); if (acb-task == NULL) { -error_report(iSCSI: Failed to send write10 command. %s, - iscsi_get_error(iscsi)); +error_report(iSCSI: Failed to allocate task for scsi WRITE16 + command. %s, iscsi_get_error(iscsi)); +qemu_aio_release(acb); +return NULL; +} +memset(acb-task, 0, sizeof(struct scsi_task)); + +acb-task-xfer_dir = SCSI_XFER_WRITE; +acb-task-cdb_size = 16; +acb-task-cdb[0] = 0x8a; +if (!(bs-open_flags BDRV_O_CACHE_WB)) { +/* set FUA on writes when cache mode is write through */ +acb-task-cdb[1] |= 0x04; +} +lba = sector_qemu2lun(sector_num, iscsilun); +*(uint32_t *)acb-task-cdb[2] = htonl(lba 32); +*(uint32_t *)acb-task-cdb[6] = htonl(lba 0x); +num_sectors = size / iscsilun-block_size; +*(uint32_t *)acb-task-cdb[10] = htonl(num_sectors); +acb-task-expxferlen = size; + +data.data = acb-buf; +data.size = size; + +if (iscsi_scsi_command_async(iscsi, iscsilun-lun, acb-task, + iscsi_aio_write16_cb, + data, + acb) != 0) { +scsi_free_scsi_task(acb-task); g_free(acb-buf); qemu_aio_release(acb); return NULL; @@ -249,12 +273,12 @@ iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num, } static void -iscsi_aio_read10_cb(struct iscsi_context *iscsi, int status, +iscsi_aio_read16_cb(struct iscsi_context *iscsi, int status, void *command_data, void *opaque) { IscsiAIOCB *acb = opaque; -trace_iscsi_aio_read10_cb(iscsi, status, acb, acb-canceled); +trace_iscsi_aio_read16_cb(iscsi, status, acb, acb-canceled); if (acb-canceled != 0) { qemu_aio_release(acb); @@ -265,7 +289,7 @@ iscsi_aio_read10_cb(struct iscsi_context *iscsi, int status, acb-status = 0; if (status != 0) { -error_report(Failed to read10 data from iSCSI lun. %s, +error_report(Failed to read16 data from iSCSI lun. %s, iscsi_get_error(iscsi)); acb-status = -EIO; } @@ -284,8 +308,10 @@ iscsi_aio_readv(BlockDriverState *bs, int64_t sector_num, IscsiLun *iscsilun = bs-opaque; struct iscsi_context *iscsi = iscsilun-iscsi; IscsiAIOCB *acb; -size_t qemu_read_size, lun_read_size;
Re: [Qemu-devel] [PATCH v3 0/8] msi: Refactorings and reset fixes
On 11.05.2012, at 16:42, Jan Kiszka wrote: v3 is v2 - xhci changes as MSI is instable there anyway. So, only patches 1 and 2 are for 1.1/stable now. Just saw these in my inbox as unanswered, so mst: ping? :) Alex
[Qemu-devel] [PATCHv3 0/2] envlist.c: handle strdup failure
From: Jim Meyering meyer...@redhat.com Same as v2, but now with TABs converted using expand --tabs=4. Jim Meyering (2): envlist.c: convert each TAB(width-4) to equivalent spaces envlist.c: handle strdup failure envlist.c | 272 -- 1 file changed, 140 insertions(+), 132 deletions(-) -- 1.7.10.2.552.gaa3bb87
[Qemu-devel] [PATCHv3 1/2] envlist.c: convert each TAB(width-4) to equivalent spaces
From: Jim Meyering meyer...@redhat.com Signed-off-by: Jim Meyering meyer...@redhat.com --- envlist.c | 256 +++--- 1 file changed, 128 insertions(+), 128 deletions(-) diff --git a/envlist.c b/envlist.c index f2303cd..e44889b 100644 --- a/envlist.c +++ b/envlist.c @@ -8,13 +8,13 @@ #include envlist.h struct envlist_entry { - const char *ev_var; /* actual env value */ - QLIST_ENTRY(envlist_entry) ev_link; +const char *ev_var; /* actual env value */ +QLIST_ENTRY(envlist_entry) ev_link; }; struct envlist { - QLIST_HEAD(, envlist_entry) el_entries; /* actual entries */ - size_t el_count;/* number of entries */ +QLIST_HEAD(, envlist_entry) el_entries; /* actual entries */ +size_t el_count;/* number of entries */ }; static int envlist_parse(envlist_t *envlist, @@ -27,15 +27,15 @@ static int envlist_parse(envlist_t *envlist, envlist_t * envlist_create(void) { - envlist_t *envlist; +envlist_t *envlist; - if ((envlist = malloc(sizeof (*envlist))) == NULL) - return (NULL); +if ((envlist = malloc(sizeof (*envlist))) == NULL) +return (NULL); - QLIST_INIT(envlist-el_entries); - envlist-el_count = 0; +QLIST_INIT(envlist-el_entries); +envlist-el_count = 0; - return (envlist); +return (envlist); } /* @@ -44,18 +44,18 @@ envlist_create(void) void envlist_free(envlist_t *envlist) { - struct envlist_entry *entry; +struct envlist_entry *entry; - assert(envlist != NULL); +assert(envlist != NULL); - while (envlist-el_entries.lh_first != NULL) { - entry = envlist-el_entries.lh_first; - QLIST_REMOVE(entry, ev_link); +while (envlist-el_entries.lh_first != NULL) { +entry = envlist-el_entries.lh_first; +QLIST_REMOVE(entry, ev_link); - free((char *)entry-ev_var); - free(entry); - } - free(envlist); +free((char *)entry-ev_var); +free(entry); +} +free(envlist); } /* @@ -72,7 +72,7 @@ envlist_free(envlist_t *envlist) int envlist_parse_set(envlist_t *envlist, const char *env) { - return (envlist_parse(envlist, env, envlist_setenv)); +return (envlist_parse(envlist, env, envlist_setenv)); } /* @@ -84,7 +84,7 @@ envlist_parse_set(envlist_t *envlist, const char *env) int envlist_parse_unset(envlist_t *envlist, const char *env) { - return (envlist_parse(envlist, env, envlist_unsetenv)); +return (envlist_parse(envlist, env, envlist_unsetenv)); } /* @@ -97,32 +97,32 @@ static int envlist_parse(envlist_t *envlist, const char *env, int (*callback)(envlist_t *, const char *)) { - char *tmpenv, *envvar; - char *envsave = NULL; - - assert(callback != NULL); - - if ((envlist == NULL) || (env == NULL)) - return (EINVAL); - - /* -* We need to make temporary copy of the env string -* as strtok_r(3) modifies it while it tokenizes. -*/ - if ((tmpenv = strdup(env)) == NULL) - return (errno); - - envvar = strtok_r(tmpenv, ,, envsave); - while (envvar != NULL) { - if ((*callback)(envlist, envvar) != 0) { - free(tmpenv); - return (errno); - } - envvar = strtok_r(NULL, ,, envsave); - } - - free(tmpenv); - return (0); +char *tmpenv, *envvar; +char *envsave = NULL; + +assert(callback != NULL); + +if ((envlist == NULL) || (env == NULL)) +return (EINVAL); + +/* + * We need to make temporary copy of the env string + * as strtok_r(3) modifies it while it tokenizes. + */ +if ((tmpenv = strdup(env)) == NULL) +return (errno); + +envvar = strtok_r(tmpenv, ,, envsave); +while (envvar != NULL) { +if ((*callback)(envlist, envvar) != 0) { +free(tmpenv); +return (errno); +} +envvar = strtok_r(NULL, ,, envsave); +} + +free(tmpenv); +return (0); } /* @@ -134,46 +134,46 @@ envlist_parse(envlist_t *envlist, const char *env, int envlist_setenv(envlist_t *envlist, const char *env) { - struct envlist_entry *entry = NULL; - const char *eq_sign; - size_t envname_len; - - if ((envlist == NULL) || (env == NULL)) - return (EINVAL); - - /* find out first equals sign in given env */ - if ((eq_sign = strchr(env, '=')) == NULL) - return (EINVAL); - envname_len = eq_sign - env + 1; - - /* -* If there already exists variable with given name -* we remove and release it before allocating a whole -* new entry. -*/ - for (entry = envlist-el_entries.lh_first; entry != NULL; - entry = entry-ev_link.le_next) { -
Re: [Qemu-devel] [PATCH v3 3/8] ahci: Clean up reset functions
On 11.05.2012, at 16:42, Jan Kiszka wrote: Properly register reset functions via the device class. CC: Alexander Graf ag...@suse.de Signed-off-by: Jan Kiszka jan.kis...@siemens.com Looks good to me, but I'm no infrastructure expert :) Reviewed-by: Alexander Graf ag...@suse.de Alex
[Qemu-devel] [PATCHv2 2/9] tcg: __jit_debug_descriptor must *not* be static
From: Jim Meyering meyer...@redhat.com Add comments so no one else will be tempted to reduce the scope of this global variable. Signed-off-by: Jim Meyering meyer...@redhat.com --- tcg/tcg.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/tcg/tcg.c b/tcg/tcg.c index ab589c7..2793fa6 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -2264,7 +2264,9 @@ void tcg_dump_info(FILE *f, fprintf_function cpu_fprintf) (3) Call tcg_register_jit_int, with the constructed .debug_frame. */ -/* Begin GDB interface. THE FOLLOWING MUST MATCH GDB DOCS. */ +/* Begin GDB interface. THE FOLLOWING MUST MATCH GDB DOCS: + http://sourceware.org/gdb/onlinedocs/gdb/Declarations.html +*/ typedef enum { JIT_NOACTION = 0, JIT_REGISTER_FN, @@ -2291,8 +2293,10 @@ void __jit_debug_register_code(void) asm(); } -/* Must statically initialize the version, because GDB may check - the version before we can set it. */ +/* We must initialize the version this way, because GDB may check + the version before we can set it. This declaration must have + external scope. If it were static, an aggressive compiler might + notice that we never read this symbol and remove it altogether. */ struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 }; /* End GDB interface. */ -- 1.7.10.2.552.gaa3bb87
Re: [Qemu-devel] [PATCH 2/9] tcg: declare __jit_debug_descriptor to be static
Peter Maydell wrote: On 21 May 2012 21:10, Jim Meyering j...@meyering.net wrote: Peter Maydell wrote: On 21 May 2012 20:51, Jim Meyering j...@meyering.net wrote: From: Jim Meyering meyer...@redhat.com Signed-off-by: Jim Meyering meyer...@redhat.com ---  tcg/tcg.c | 2 +-  1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcg/tcg.c b/tcg/tcg.c index ab589c7..350fdad 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -2293,7 +2293,7 @@ void __jit_debug_register_code(void)  /* Must statically initialize the version, because GDB may check   the version before we can set it.  */ -struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 }; +static struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };  /* End GDB interface.  */ Nak. This symbol is global so that gdb can find it by fishing around in the executable's symbol table. Thanks for the quick feedback. How does the scope of the symbol affect whether gdb can find it? If you mark it 'static' the compiler can throw it away or completely rearrange it if it's feeling clever enough, I think. Anyway, we're following a prescribed interface here: http://sourceware.org/gdb/onlinedocs/gdb/Declarations.html and I don't think we should deviate from it. As the comment says, THE FOLLOWING MUST MATCH GDB DOCS.. If declaring this variable static is not appropriate, then the comment saying that static initialization is desired should be changed. The comment means statically initialize this variable rather than doing it dynamically in some function at startup. Thanks. I've clarified the comments and posted a V2.
Re: [Qemu-devel] [PATCHv2 1/2] envlist.c: convert all TABs to equivalent spaces
Peter Maydell wrote: On 22 May 2012 10:50, Jim Meyering j...@meyering.net wrote: From: Jim Meyering meyer...@redhat.com If we're going to go to the effort of a complete reindent patch we should actually reindent to the QEMU coding style standard, which is four-space, not eight. Good point. V3 is on its way.
Re: [Qemu-devel] [PATCH v3 1/8] ahci: Fix reset of MSI function
On 11.05.2012, at 16:42, Jan Kiszka wrote: Call msi_reset on device reset as still required by the core. CC: Alexander Graf ag...@suse.de CC: qemu-sta...@nongnu.org Signed-off-by: Jan Kiszka jan.kis...@siemens.com Acked-by: Alexander Graf ag...@suse.de Alex
Re: [Qemu-devel] Current differences between qemu --enable-kvm and qemu-kvm?
On 2012-05-22 07:04, Stefan Hajnoczi wrote: On Mon, May 21, 2012 at 5:54 PM, Erik Rull webmas...@rdsoftware.de wrote: is there a summary existing that shows up the rough or actual differences between qemu --enable-kvm and qemu-kvm? I tested both versions with the same compile and start options, the CPU performance results are identical, only the bootup time of my guest system with qemu-kvm seemed to be a bit faster (not measured, it just feeled so). Current upstream does not enable the in-kernel irqchip of KVM by default. This should explain the difference in boot-up times. Try -machine accel=kvm,kernel_irqchip=on. But the default will be on, just like in qemu-kvm, once [1] is merged. For production KVM instances I think it still makes sense to use qemu-kvm packages from your distro or qemu-kvm upstream source. Jan Kiszka has reduced the delta between qemu.git and qemu-kvm.git to the point where I think the list of differences is rather small - maybe PCI passthrough stuff, irqfd for vhost-net (which is now also being upstreamed into qemu.git), and a few other things I don't know of. Right, the list of differences is dramatically shrinking. As stated in [2], soon only PCI passthrough and legacy interface dependencies on qemu-kvm will be the remaining reasons to use it. If we are lucky, PCI passthrough will also make it into upstream for QEMU 1.2, we are working on this. For development most patches should be against qemu.git unless they have a dependency on qemu-kvm.git code. Yes, unless you are working on the upstream merge itself, there is practically no reason anymore to develop against qemu-kvm directly. Jan [1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/91171 [2] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/91026 -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH qom-next 0/5] target-i386: re-factor CPU creation/initialization to QOM
Moving code related to CPU creation and initialization internal parts from board level into apic and cpu objects will allow X86CPU to better model QOM object life-cycle. It will allow to create X86CPU as any other object by creating it with object_new() then setting properties and then calling x86_cpu_realize() to make it running. Later x86_cpu_realize() should become realize property. git tree at: https://github.com/imammedo/qemu/tree/x86-cpu-realize-v2
[Qemu-devel] [PATCH qom-next 2/5] target-i386: add cpu-model property to x86_cpu
it's probably intermidiate step till cpu modeled as sub-classes. After then we probably could drop it. However it still could be used for overiding default cpu subclasses definition, and probably renamed to something like 'features'. Signed-off-by: Igor Mammedov imamm...@redhat.com --- cpu-defs.h |2 +- hw/pc.c | 10 -- target-i386/cpu.c| 34 ++ target-i386/helper.c | 16 4 files changed, 47 insertions(+), 15 deletions(-) diff --git a/cpu-defs.h b/cpu-defs.h index f49e950..8f4623c 100644 --- a/cpu-defs.h +++ b/cpu-defs.h @@ -221,7 +221,7 @@ typedef struct CPUWatchpoint { struct QemuCond *halt_cond; \ int thread_kicked; \ struct qemu_work_item *queued_work_first, *queued_work_last;\ -const char *cpu_model_str; \ +char *cpu_model_str;\ struct KVMState *kvm_state; \ struct kvm_run *kvm_run;\ int kvm_fd; \ diff --git a/hw/pc.c b/hw/pc.c index 6ad1c0d..00d738d 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -939,7 +939,6 @@ static X86CPU *pc_new_cpu(const char *cpu_model) cpu = cpu_x86_init(cpu_model); if (cpu == NULL) { -fprintf(stderr, Unable to find x86 CPU definition\n); exit(1); } env = cpu-env; @@ -955,15 +954,6 @@ void pc_cpus_init(const char *cpu_model) { int i; -/* init CPUs */ -if (cpu_model == NULL) { -#ifdef TARGET_X86_64 -cpu_model = qemu64; -#else -cpu_model = qemu32; -#endif -} - for(i = 0; i smp_cpus; i++) { pc_new_cpu(cpu_model); } diff --git a/target-i386/cpu.c b/target-i386/cpu.c index c7c23bf..538892d 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -1729,6 +1729,27 @@ static void mce_init(X86CPU *cpu) } } +static char *x86_get_cpu_model(Object *obj, Error **errp) +{ +X86CPU *cpu = X86_CPU(obj); +CPUX86State *env = cpu-env; +return g_strdup(env-cpu_model_str); +} + +static void x86_set_cpu_model(Object *obj, const char *value, Error **errp) +{ +X86CPU *cpu = X86_CPU(obj); +CPUX86State *env = cpu-env; + +g_free((gpointer)env-cpu_model_str); +env-cpu_model_str = g_strdup(value); + +if (cpu_x86_register(cpu, env-cpu_model_str) 0) { +fprintf(stderr, Unable to find x86 CPU definition\n); +error_set(errp, QERR_INVALID_PARAMETER_COMBINATION); +} +} + void x86_cpu_realize(Object *obj, Error **errp) { X86CPU *cpu = X86_CPU(obj); @@ -1769,7 +1790,20 @@ static void x86_cpu_initfn(Object *obj) x86_cpuid_get_tsc_freq, x86_cpuid_set_tsc_freq, NULL, NULL, NULL); +object_property_add_str(obj, cpu-model, +x86_get_cpu_model, x86_set_cpu_model, NULL); + env-cpuid_apic_id = env-cpu_index; + +/* init various static tables used in TCG mode */ +if (tcg_enabled() !inited) { +inited = 1; +optimize_flags_init(); +#ifndef CONFIG_USER_ONLY +prev_debug_excp_handler = +cpu_set_debug_excp_handler(breakpoint_handler); +#endif +} } static void x86_cpu_common_class_init(ObjectClass *oc, void *data) diff --git a/target-i386/helper.c b/target-i386/helper.c index 94f95b7..6fc67a9 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -1160,12 +1160,10 @@ int cpu_x86_get_descr_debug(CPUX86State *env, unsigned int selector, X86CPU *cpu_x86_init(const char *cpu_model) { X86CPU *cpu; -CPUX86State *env; +Error *errp = NULL; static int inited; cpu = X86_CPU(object_new(TYPE_X86_CPU)); -env = cpu-env; -env-cpu_model_str = cpu_model; /* init various static tables used in TCG mode */ if (tcg_enabled() !inited) { @@ -1176,7 +1174,17 @@ X86CPU *cpu_x86_init(const char *cpu_model) cpu_set_debug_excp_handler(breakpoint_handler); #endif } -if (cpu_x86_register(cpu, cpu_model) 0) { + +if (cpu_model) { +object_property_set_str(OBJECT(cpu), cpu_model, cpu-model, errp); +} else { +#ifdef TARGET_X86_64 +object_property_set_str(OBJECT(cpu), qemu64, cpu-model, errp); +#else +object_property_set_str(OBJECT(cpu), qemu32, cpu-model, errp); +#endif +} +if (errp) { object_delete(OBJECT(cpu)); return NULL; } -- 1.7.7.6
[Qemu-devel] [PATCH 02/16] qapi: introduce size type
Signed-off-by: Laszlo Ersek ler...@redhat.com --- qapi/qapi-visit-core.h |4 qapi/qapi-visit-core.c |7 +++ scripts/qapi.py|2 +- 3 files changed, 12 insertions(+), 1 deletions(-) diff --git a/qapi/qapi-visit-core.h b/qapi/qapi-visit-core.h index e850746..bab2fec 100644 --- a/qapi/qapi-visit-core.h +++ b/qapi/qapi-visit-core.h @@ -39,6 +39,9 @@ struct Visitor const char *kind, const char *name, Error **errp); void (*type_int)(Visitor *v, int64_t *obj, const char *name, Error **errp); +/* visit_type_size() falls back to (*type_int)() if type_size is unset */ +void (*type_size)(Visitor *v, int64_t *obj, const char *name, + Error **errp); void (*type_bool)(Visitor *v, bool *obj, const char *name, Error **errp); void (*type_str)(Visitor *v, char **obj, const char *name, Error **errp); void (*type_number)(Visitor *v, double *obj, const char *name, @@ -69,6 +72,7 @@ void visit_end_optional(Visitor *v, Error **errp); void visit_type_enum(Visitor *v, int *obj, const char *strings[], const char *kind, const char *name, Error **errp); void visit_type_int(Visitor *v, int64_t *obj, const char *name, Error **errp); +void visit_type_size(Visitor *v, int64_t *obj, const char *name, Error **errp); void visit_type_bool(Visitor *v, bool *obj, const char *name, Error **errp); void visit_type_str(Visitor *v, char **obj, const char *name, Error **errp); void visit_type_number(Visitor *v, double *obj, const char *name, Error **errp); diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c index df1ed5c..ea31cf5 100644 --- a/qapi/qapi-visit-core.c +++ b/qapi/qapi-visit-core.c @@ -95,6 +95,13 @@ void visit_type_int(Visitor *v, int64_t *obj, const char *name, Error **errp) } } +void visit_type_size(Visitor *v, int64_t *obj, const char *name, Error **errp) +{ +if (!error_is_set(errp)) { +(v-type_size ? v-type_size : v-type_int)(v, obj, name, errp); +} +} + void visit_type_bool(Visitor *v, bool *obj, const char *name, Error **errp) { if (!error_is_set(errp)) { diff --git a/scripts/qapi.py b/scripts/qapi.py index e062336..6aebb0f 100644 --- a/scripts/qapi.py +++ b/scripts/qapi.py @@ -157,7 +157,7 @@ def is_enum(name): def c_type(name): if name == 'str': return 'char *' -elif name == 'int': +elif name == 'int' or name == 'size': return 'int64_t' elif name == 'bool': return 'bool' -- 1.7.1
[Qemu-devel] [PATCH 14/16] convert net_init_tap() to NetClientOptions
Signed-off-by: Laszlo Ersek ler...@redhat.com --- net/tap.h |2 +- net/tap-aix.c |2 +- net/tap-bsd.c |2 +- net/tap-haiku.c |2 +- net/tap-linux.c |9 +++- net/tap-solaris.c |2 +- net/tap-win32.c | 11 +++-- net/tap.c | 111 ++--- 8 files changed, 71 insertions(+), 70 deletions(-) diff --git a/net/tap.h b/net/tap.h index 44e31ce..f092129 100644 --- a/net/tap.h +++ b/net/tap.h @@ -47,7 +47,7 @@ void tap_using_vnet_hdr(VLANClientState *vc, int using_vnet_hdr); void tap_set_offload(VLANClientState *vc, int csum, int tso4, int tso6, int ecn, int ufo); void tap_set_vnet_hdr_len(VLANClientState *vc, int len); -int tap_set_sndbuf(int fd, QemuOpts *opts); +int tap_set_sndbuf(int fd, const NetdevTapOptions *tap); int tap_probe_vnet_hdr(int fd); int tap_probe_vnet_hdr_len(int fd, int len); int tap_probe_has_ufo(int fd); diff --git a/net/tap-aix.c b/net/tap-aix.c index e19aaba..f27c177 100644 --- a/net/tap-aix.c +++ b/net/tap-aix.c @@ -31,7 +31,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required return -1; } -int tap_set_sndbuf(int fd, QemuOpts *opts) +int tap_set_sndbuf(int fd, const NetdevTapOptions *tap) { return 0; } diff --git a/net/tap-bsd.c b/net/tap-bsd.c index 4b6b3a4..9a4e2a1 100644 --- a/net/tap-bsd.c +++ b/net/tap-bsd.c @@ -123,7 +123,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required return fd; } -int tap_set_sndbuf(int fd, QemuOpts *opts) +int tap_set_sndbuf(int fd, const NetdevTapOptions *tap) { return 0; } diff --git a/net/tap-haiku.c b/net/tap-haiku.c index 91dda8e..34739d1 100644 --- a/net/tap-haiku.c +++ b/net/tap-haiku.c @@ -31,7 +31,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required return -1; } -int tap_set_sndbuf(int fd, QemuOpts *opts) +int tap_set_sndbuf(int fd, const NetdevTapOptions *tap) { return 0; } diff --git a/net/tap-linux.c b/net/tap-linux.c index 41d581b..c6521be 100644 --- a/net/tap-linux.c +++ b/net/tap-linux.c @@ -98,16 +98,19 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required */ #define TAP_DEFAULT_SNDBUF 0 -int tap_set_sndbuf(int fd, QemuOpts *opts) +int tap_set_sndbuf(int fd, const NetdevTapOptions *tap) { int sndbuf; -sndbuf = qemu_opt_get_size(opts, sndbuf, TAP_DEFAULT_SNDBUF); +sndbuf = !tap-has_sndbuf ? TAP_DEFAULT_SNDBUF : + tap-sndbuf INT_MAX ? INT_MAX : + tap-sndbuf; + if (!sndbuf) { sndbuf = INT_MAX; } -if (ioctl(fd, TUNSETSNDBUF, sndbuf) == -1 qemu_opt_get(opts, sndbuf)) { +if (ioctl(fd, TUNSETSNDBUF, sndbuf) == -1 tap-has_sndbuf) { error_report(TUNSETSNDBUF ioctl failed: %s, strerror(errno)); return -1; } diff --git a/net/tap-solaris.c b/net/tap-solaris.c index cf76463..5d6ac42 100644 --- a/net/tap-solaris.c +++ b/net/tap-solaris.c @@ -197,7 +197,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required return fd; } -int tap_set_sndbuf(int fd, QemuOpts *opts) +int tap_set_sndbuf(int fd, const NetdevTapOptions *tap) { return 0; } diff --git a/net/tap-win32.c b/net/tap-win32.c index b738f45..b6099cd 100644 --- a/net/tap-win32.c +++ b/net/tap-win32.c @@ -699,19 +699,20 @@ static int tap_win32_init(VLANState *vlan, const char *model, return 0; } -int net_init_tap(QemuOpts *opts, const NetClientOptions *new_opts, +int net_init_tap(QemuOpts *old_opts, const NetClientOptions *opts, const char *name, VLANState *vlan) { -const char *ifname; +const NetdevTapOptions *tap; -ifname = qemu_opt_get(opts, ifname); +assert(opts-kind == NET_CLIENT_OPTIONS_KIND_TAP); +tap = opts-tap; -if (!ifname) { +if (!tap-has_ifname) { error_report(tap: no interface name); return -1; } -if (tap_win32_init(vlan, tap, name, ifname) == -1) { +if (tap_win32_init(vlan, tap, name, tap-ifname) == -1) { return -1; } diff --git a/net/tap.c b/net/tap.c index 6d57d6c..7501eba 100644 --- a/net/tap.c +++ b/net/tap.c @@ -547,29 +547,32 @@ int net_init_bridge(QemuOpts *opts, const NetClientOptions *new_opts, return 0; } -static int net_tap_init(QemuOpts *opts, int *vnet_hdr) +static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr, +const char *setup_script, char *ifname, +size_t ifname_sz) { int fd, vnet_hdr_required; -char ifname[128] = {0,}; -const char *setup_script; -if (qemu_opt_get(opts, ifname)) { -pstrcpy(ifname, sizeof(ifname), qemu_opt_get(opts, ifname)); +if (tap-has_ifname) { +pstrcpy(ifname, ifname_sz, tap-ifname); +} else { +assert(ifname_sz 0); +ifname[0] = '\0'; } -*vnet_hdr = qemu_opt_get_bool(opts,
Re: [Qemu-devel] [PATCH qom-next 3/5] pc: move apic_mapped initialization into common apic init code
On 2012-05-22 07:35, Igor Mammedov wrote: Move from apic_init in pc.c the code that belongs to apic_init_common and create/init apic in pc_new_cpu instead of separate func. Signed-off-by: Igor Mammedov imamm...@redhat.com --- hw/apic_common.c | 16 hw/msi.h |2 ++ hw/pc.c | 47 --- 3 files changed, 26 insertions(+), 39 deletions(-) diff --git a/hw/apic_common.c b/hw/apic_common.c index 23d51e8..703931b 100644 --- a/hw/apic_common.c +++ b/hw/apic_common.c @@ -21,6 +21,7 @@ #include apic_internal.h #include trace.h #include kvm.h +#include msi.h static int apic_irq_delivered; bool apic_report_tpr_access; @@ -284,6 +285,7 @@ static int apic_init_common(SysBusDevice *dev) APICCommonClass *info; static DeviceState *vapic; static int apic_no; +static int apic_mapped; if (apic_no = MAX_APICS) { return -1; @@ -295,6 +297,20 @@ static int apic_init_common(SysBusDevice *dev) sysbus_init_mmio(dev, s-io_memory); +/* XXX: mapping more APICs at the same memory location */ +if (apic_mapped == 0) { +/* NOTE: the APIC is directly connected to the CPU - it is not + on the global memory bus. */ +/* XXX: what if the base changes? */ +sysbus_mmio_map(sysbus_from_qdev(s-busdev.qdev), 0, MSI_ADDR_BASE); +apic_mapped = 1; +} + +/* KVM does not support MSI yet. */ +if (!kvm_irqchip_in_kernel()) { +msi_supported = true; +} + if (!vapic s-vapic_control VAPIC_ENABLE_MASK) { vapic = sysbus_create_simple(kvmvapic, -1, NULL); } diff --git a/hw/msi.h b/hw/msi.h index 3040bb0..abd52b6 100644 --- a/hw/msi.h +++ b/hw/msi.h @@ -40,4 +40,6 @@ static inline bool msi_present(const PCIDevice *dev) return dev-cap_present QEMU_PCI_CAP_MSI; } +#define MSI_ADDR_BASE 0xfee0 + #endif /* QEMU_MSI_H */ diff --git a/hw/pc.c b/hw/pc.c index 00d738d..0eb0b73 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -879,44 +879,6 @@ DeviceState *cpu_get_current_apic(void) } } -static DeviceState *apic_init(void *env, uint8_t apic_id) -{ -DeviceState *dev; -static int apic_mapped; - -if (kvm_irqchip_in_kernel()) { -dev = qdev_create(NULL, kvm-apic); -} else if (xen_enabled()) { -dev = qdev_create(NULL, xen-apic); -} else { -dev = qdev_create(NULL, apic); -} - -qdev_prop_set_uint8(dev, id, apic_id); -qdev_prop_set_ptr(dev, cpu_env, env); -qdev_init_nofail(dev); - -/* XXX: mapping more APICs at the same memory location */ -if (apic_mapped == 0) { -/* NOTE: the APIC is directly connected to the CPU - it is not - on the global memory bus. */ -/* XXX: what if the base changes? */ -sysbus_mmio_map(sysbus_from_qdev(dev), 0, MSI_ADDR_BASE); While at it, you should drop MSI_ADDR_BASE definition from pc.c. -apic_mapped = 1; -} - -/* KVM does not support MSI yet. */ -if (!kvm_irqchip_in_kernel()) { -msi_supported = true; -} - -if (xen_msi_support()) { -msi_supported = true; -} - -return dev; -} - You are loosing some xen bits here. But this will collide with latest kvm pull request (http://thread.gmane.org/gmane.comp.emulators.kvm.devel/91171) anyway. You may want to base on uq/master. void pc_acpi_smi_interrupt(void *opaque, int irq, int level) { CPUX86State *s = opaque; @@ -943,7 +905,14 @@ static X86CPU *pc_new_cpu(const char *cpu_model) } env = cpu-env; if ((env-cpuid_features CPUID_APIC) || smp_cpus 1) { -env-apic_state = apic_init(env, env-cpuid_apic_id); +if (kvm_irqchip_in_kernel()) { +env-apic_state = qdev_create(NULL, kvm-apic); +} else { +env-apic_state = qdev_create(NULL, apic); +} +qdev_prop_set_uint8(env-apic_state, id, env-cpuid_apic_id); +qdev_prop_set_ptr(env-apic_state, cpu_env, env); +qdev_init_nofail(env-apic_state); } qemu_register_reset(pc_cpu_reset, cpu); pc_cpu_reset(cpu); Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH qom-next 3/5] pc: move apic_mapped initialization into common apic init code
On 2012-05-22 07:35, Igor Mammedov wrote: Move from apic_init in pc.c the code that belongs to apic_init_common and create/init apic in pc_new_cpu instead of separate func. Signed-off-by: Igor Mammedov imamm...@redhat.com --- hw/apic_common.c | 16 hw/msi.h |2 ++ hw/pc.c | 47 --- 3 files changed, 26 insertions(+), 39 deletions(-) diff --git a/hw/apic_common.c b/hw/apic_common.c index 23d51e8..703931b 100644 --- a/hw/apic_common.c +++ b/hw/apic_common.c @@ -21,6 +21,7 @@ #include apic_internal.h #include trace.h #include kvm.h +#include msi.h static int apic_irq_delivered; bool apic_report_tpr_access; @@ -284,6 +285,7 @@ static int apic_init_common(SysBusDevice *dev) APICCommonClass *info; static DeviceState *vapic; static int apic_no; +static int apic_mapped; if (apic_no = MAX_APICS) { return -1; @@ -295,6 +297,20 @@ static int apic_init_common(SysBusDevice *dev) sysbus_init_mmio(dev, s-io_memory); +/* XXX: mapping more APICs at the same memory location */ +if (apic_mapped == 0) { +/* NOTE: the APIC is directly connected to the CPU - it is not + on the global memory bus. */ +/* XXX: what if the base changes? */ +sysbus_mmio_map(sysbus_from_qdev(s-busdev.qdev), 0, MSI_ADDR_BASE); +apic_mapped = 1; +} + +/* KVM does not support MSI yet. */ +if (!kvm_irqchip_in_kernel()) { +msi_supported = true; +} + if (!vapic s-vapic_control VAPIC_ENABLE_MASK) { vapic = sysbus_create_simple(kvmvapic, -1, NULL); } diff --git a/hw/msi.h b/hw/msi.h index 3040bb0..abd52b6 100644 --- a/hw/msi.h +++ b/hw/msi.h @@ -40,4 +40,6 @@ static inline bool msi_present(const PCIDevice *dev) return dev-cap_present QEMU_PCI_CAP_MSI; } +#define MSI_ADDR_BASE 0xfee0 + #endif /* QEMU_MSI_H */ diff --git a/hw/pc.c b/hw/pc.c index 00d738d..0eb0b73 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -879,44 +879,6 @@ DeviceState *cpu_get_current_apic(void) } } -static DeviceState *apic_init(void *env, uint8_t apic_id) -{ -DeviceState *dev; -static int apic_mapped; - -if (kvm_irqchip_in_kernel()) { -dev = qdev_create(NULL, kvm-apic); -} else if (xen_enabled()) { -dev = qdev_create(NULL, xen-apic); -} else { -dev = qdev_create(NULL, apic); -} - -qdev_prop_set_uint8(dev, id, apic_id); -qdev_prop_set_ptr(dev, cpu_env, env); -qdev_init_nofail(dev); - -/* XXX: mapping more APICs at the same memory location */ -if (apic_mapped == 0) { -/* NOTE: the APIC is directly connected to the CPU - it is not - on the global memory bus. */ -/* XXX: what if the base changes? */ -sysbus_mmio_map(sysbus_from_qdev(dev), 0, MSI_ADDR_BASE); While at it, you should drop MSI_ADDR_BASE definition from pc.c. -apic_mapped = 1; -} - -/* KVM does not support MSI yet. */ -if (!kvm_irqchip_in_kernel()) { -msi_supported = true; -} - -if (xen_msi_support()) { -msi_supported = true; -} - -return dev; -} - You are loosing some xen bits here. But this will collide with latest kvm pull request (http://thread.gmane.org/gmane.comp.emulators.kvm.devel/91171) anyway. You may want to base on uq/master. void pc_acpi_smi_interrupt(void *opaque, int irq, int level) { CPUX86State *s = opaque; @@ -943,7 +905,14 @@ static X86CPU *pc_new_cpu(const char *cpu_model) } env = cpu-env; if ((env-cpuid_features CPUID_APIC) || smp_cpus 1) { -env-apic_state = apic_init(env, env-cpuid_apic_id); +if (kvm_irqchip_in_kernel()) { +env-apic_state = qdev_create(NULL, kvm-apic); +} else { +env-apic_state = qdev_create(NULL, apic); +} +qdev_prop_set_uint8(env-apic_state, id, env-cpuid_apic_id); +qdev_prop_set_ptr(env-apic_state, cpu_env, env); +qdev_init_nofail(env-apic_state); } qemu_register_reset(pc_cpu_reset, cpu); pc_cpu_reset(cpu); Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH qom-next 4/5] target-i386: make initialize CPU in QOM way
On 2012-05-22 07:35, Igor Mammedov wrote: Make CPU creation/initialization consistent with QOM object behavior in this, by moving tcg and apic initialization from board level into CPU's initfn/realize calls and cpu_model property setter. Which makes CPU object self-sufficient in respect of creation/initialization and matches a typical object creation sequence, i.e.: - create CPU instance - set properties - realize object - (x86_cpu_realize will be converted into realize property setter, when it is implemented) Signed-off-by: Igor Mammedov imamm...@redhat.com --- hw/pc.c | 32 +- target-i386/cpu.c| 60 +- target-i386/helper.c | 39 3 files changed, 65 insertions(+), 66 deletions(-) diff --git a/hw/pc.c b/hw/pc.c index 0eb0b73..677f9e0 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -894,37 +894,17 @@ static void pc_cpu_reset(void *opaque) cpu_reset(CPU(cpu)); } -static X86CPU *pc_new_cpu(const char *cpu_model) -{ -X86CPU *cpu; -CPUX86State *env; - -cpu = cpu_x86_init(cpu_model); -if (cpu == NULL) { -exit(1); -} -env = cpu-env; -if ((env-cpuid_features CPUID_APIC) || smp_cpus 1) { -if (kvm_irqchip_in_kernel()) { -env-apic_state = qdev_create(NULL, kvm-apic); -} else { -env-apic_state = qdev_create(NULL, apic); -} -qdev_prop_set_uint8(env-apic_state, id, env-cpuid_apic_id); -qdev_prop_set_ptr(env-apic_state, cpu_env, env); -qdev_init_nofail(env-apic_state); -} -qemu_register_reset(pc_cpu_reset, cpu); -pc_cpu_reset(cpu); -return cpu; -} - void pc_cpus_init(const char *cpu_model) { +X86CPU *cpu; int i; for(i = 0; i smp_cpus; i++) { -pc_new_cpu(cpu_model); +cpu = cpu_x86_init(cpu_model); +if (cpu == NULL) { +exit(1); +} +qemu_register_reset(pc_cpu_reset, cpu); } } diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 538892d..0e804ea 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -31,6 +31,9 @@ #include hyperv.h +#include hw/qdev.h +#include sysemu.h + /* feature flags taken from Intel Processor Identification and the CPUID * Instruction and AMD's CPUID Specification. In cases of disagreement * between feature naming conventions, aliases may be added. @@ -1747,21 +1750,76 @@ static void x86_set_cpu_model(Object *obj, const char *value, Error **errp) if (cpu_x86_register(cpu, env-cpu_model_str) 0) { fprintf(stderr, Unable to find x86 CPU definition\n); error_set(errp, QERR_INVALID_PARAMETER_COMBINATION); +return; +} + +if (((env-cpuid_features CPUID_APIC) || smp_cpus 1)) { +if (kvm_irqchip_in_kernel()) { +env-apic_state = qdev_create(NULL, kvm-apic); +} else { +env-apic_state = qdev_create(NULL, apic); +} +object_property_add_child(OBJECT(cpu), apic, +OBJECT(env-apic_state), NULL); + +qdev_prop_set_uint8(env-apic_state, id, env-cpuid_apic_id); +qdev_prop_set_ptr(env-apic_state, cpu_env, env); +} +} + +static CPUDebugExcpHandler *prev_debug_excp_handler; + +static void breakpoint_handler(CPUX86State *env) +{ +CPUBreakpoint *bp; + +if (env-watchpoint_hit) { +if (env-watchpoint_hit-flags BP_CPU) { +env-watchpoint_hit = NULL; +if (check_hw_breakpoints(env, 0)) { +raise_exception_env(EXCP01_DB, env); +} else { +cpu_resume_from_signal(env, NULL); +} +} +} else { +QTAILQ_FOREACH(bp, env-breakpoints, entry) +if (bp-pc == env-eip) { +if (bp-flags BP_CPU) { +check_hw_breakpoints(env, 1); +raise_exception_env(EXCP01_DB, env); +} +break; +} +} +if (prev_debug_excp_handler) { +prev_debug_excp_handler(env); } } void x86_cpu_realize(Object *obj, Error **errp) { X86CPU *cpu = X86_CPU(obj); +CPUX86State *env = cpu-env; + +if (env-apic_state) { +if (qdev_init(env-apic_state) 0) { +error_set(errp, QERR_DEVICE_INIT_FAILED, +object_get_typename(OBJECT(env-apic_state))); +return; +} +} mce_init(cpu); -qemu_init_vcpu(cpu-env); +qemu_init_vcpu(env); +cpu_reset(CPU(cpu)); } static void x86_cpu_initfn(Object *obj) { X86CPU *cpu = X86_CPU(obj); CPUX86State *env = cpu-env; +static int inited; cpu_exec_init(env); diff --git a/target-i386/helper.c b/target-i386/helper.c index 6fc67a9..443092e 100644 ---
[Qemu-devel] [PATCH qom-next 3/5] pc: move apic_mapped initialization into common apic init code
Move from apic_init in pc.c the code that belongs to apic_init_common and create/init apic in pc_new_cpu instead of separate func. Signed-off-by: Igor Mammedov imamm...@redhat.com --- hw/apic_common.c | 16 hw/msi.h |2 ++ hw/pc.c | 47 --- 3 files changed, 26 insertions(+), 39 deletions(-) diff --git a/hw/apic_common.c b/hw/apic_common.c index 23d51e8..703931b 100644 --- a/hw/apic_common.c +++ b/hw/apic_common.c @@ -21,6 +21,7 @@ #include apic_internal.h #include trace.h #include kvm.h +#include msi.h static int apic_irq_delivered; bool apic_report_tpr_access; @@ -284,6 +285,7 @@ static int apic_init_common(SysBusDevice *dev) APICCommonClass *info; static DeviceState *vapic; static int apic_no; +static int apic_mapped; if (apic_no = MAX_APICS) { return -1; @@ -295,6 +297,20 @@ static int apic_init_common(SysBusDevice *dev) sysbus_init_mmio(dev, s-io_memory); +/* XXX: mapping more APICs at the same memory location */ +if (apic_mapped == 0) { +/* NOTE: the APIC is directly connected to the CPU - it is not + on the global memory bus. */ +/* XXX: what if the base changes? */ +sysbus_mmio_map(sysbus_from_qdev(s-busdev.qdev), 0, MSI_ADDR_BASE); +apic_mapped = 1; +} + +/* KVM does not support MSI yet. */ +if (!kvm_irqchip_in_kernel()) { +msi_supported = true; +} + if (!vapic s-vapic_control VAPIC_ENABLE_MASK) { vapic = sysbus_create_simple(kvmvapic, -1, NULL); } diff --git a/hw/msi.h b/hw/msi.h index 3040bb0..abd52b6 100644 --- a/hw/msi.h +++ b/hw/msi.h @@ -40,4 +40,6 @@ static inline bool msi_present(const PCIDevice *dev) return dev-cap_present QEMU_PCI_CAP_MSI; } +#define MSI_ADDR_BASE 0xfee0 + #endif /* QEMU_MSI_H */ diff --git a/hw/pc.c b/hw/pc.c index 00d738d..0eb0b73 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -879,44 +879,6 @@ DeviceState *cpu_get_current_apic(void) } } -static DeviceState *apic_init(void *env, uint8_t apic_id) -{ -DeviceState *dev; -static int apic_mapped; - -if (kvm_irqchip_in_kernel()) { -dev = qdev_create(NULL, kvm-apic); -} else if (xen_enabled()) { -dev = qdev_create(NULL, xen-apic); -} else { -dev = qdev_create(NULL, apic); -} - -qdev_prop_set_uint8(dev, id, apic_id); -qdev_prop_set_ptr(dev, cpu_env, env); -qdev_init_nofail(dev); - -/* XXX: mapping more APICs at the same memory location */ -if (apic_mapped == 0) { -/* NOTE: the APIC is directly connected to the CPU - it is not - on the global memory bus. */ -/* XXX: what if the base changes? */ -sysbus_mmio_map(sysbus_from_qdev(dev), 0, MSI_ADDR_BASE); -apic_mapped = 1; -} - -/* KVM does not support MSI yet. */ -if (!kvm_irqchip_in_kernel()) { -msi_supported = true; -} - -if (xen_msi_support()) { -msi_supported = true; -} - -return dev; -} - void pc_acpi_smi_interrupt(void *opaque, int irq, int level) { CPUX86State *s = opaque; @@ -943,7 +905,14 @@ static X86CPU *pc_new_cpu(const char *cpu_model) } env = cpu-env; if ((env-cpuid_features CPUID_APIC) || smp_cpus 1) { -env-apic_state = apic_init(env, env-cpuid_apic_id); +if (kvm_irqchip_in_kernel()) { +env-apic_state = qdev_create(NULL, kvm-apic); +} else { +env-apic_state = qdev_create(NULL, apic); +} +qdev_prop_set_uint8(env-apic_state, id, env-cpuid_apic_id); +qdev_prop_set_ptr(env-apic_state, cpu_env, env); +qdev_init_nofail(env-apic_state); } qemu_register_reset(pc_cpu_reset, cpu); pc_cpu_reset(cpu); -- 1.7.7.6
[Qemu-devel] [PATCH 06/16] qapi schema: add Netdev types
NetdevTapOptions::sndbuf and NetdevDumpOptions::len use the new size type. Signed-off-by: Laszlo Ersek ler...@redhat.com --- qapi-schema.json | 275 ++ 1 files changed, 275 insertions(+), 0 deletions(-) diff --git a/qapi-schema.json b/qapi-schema.json index a8b3803..fae2596 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1840,3 +1840,278 @@ # Since: 0.14.0 ## { 'command': 'netdev_del', 'data': {'id': 'str'} } + +## +# @NetdevNoneOptions +# +# Use it alone to have zero network devices. +# +# Since 1.2 +## +{ 'type': 'NetdevNoneOptions', + 'data': { } } + +## +# @NetLegacyNicOptions +# +# Create a new Network Interface Card. +# +# @netdev: #optional id of -netdev to connect to +# +# @macaddr: #optional MAC address +# +# @model: #optional device model (e1000, rtl8139, virtio etc.) +# +# @addr: #optional PCI device address +# +# @vectors: #optional number of MSI-x vectors, 0 to disable MSI-X +# +# Since 1.2 +## +{ 'type': 'NetLegacyNicOptions', + 'data': { +'*netdev': 'str', +'*macaddr': 'str', +'*model': 'str', +'*addr':'str', +'*vectors': 'int' } } + +## +# @String +# +# A fat type wrapping 'str', to be embedded in lists. +# +# Since 1.2 +## +{ 'type': 'String', + 'data': { +'str': 'str' } } + +## +# @NetdevUserOptions +# +# Use the user mode network stack which requires no administrator privilege to +# run. +# +# @hostname: #optional client hostname reported by the builtin DHCP server +# +# @restrict: #optional isolate the guest from the host +# +# @ip: #optional legacy parameter, use net= instead +# +# @net: #optional IP address and optional netmask +# +# @host: #optional guest-visible address of the host +# +# @tftp: #optional root directory of the built-in TFTP server +# +# @bootfile: #optional BOOTP filename, for use with tftp= +# +# @dhcpstart: #optional the first of the 16 IPs the built-in DHCP server can +# assign +# +# @dns: #optional guest-visible address of the virtual nameserver +# +# @smb: #optional root directory of the built-in SMB server +# +# @smbserver: #optional IP address of the built-in SMB server +# +# @hostfwd: #optional redirect incoming TCP or UDP host connections to guest +# endpoints +# +# @guestfwd: #optional forward guest TCP connections +# +# Since 1.2 +## +{ 'type': 'NetdevUserOptions', + 'data': { +'*hostname': 'str', +'*restrict': 'bool', +'*ip':'str', +'*net': 'str', +'*host': 'str', +'*tftp': 'str', +'*bootfile': 'str', +'*dhcpstart': 'str', +'*dns': 'str', +'*smb': 'str', +'*smbserver': 'str', +'*hostfwd': ['String'], +'*guestfwd': ['String'] } } + +## +# @NetdevTapOptions +# +# Connect the host TAP network interface name to the VLAN. +# +# @ifname: #optional interface name +# +# @fd: #optional file descriptor of an already opened tap +# +# @script: #optional script to initialize the interface +# +# @downscript: #optional script to shut down the interface +# +# @helper: #optional command to execute to configure bridge +# +# @sndbuf: #optional send buffer limit. Understands [TGMKkb] suffixes. +# +# @vnet_hdr: #optional enable the IFF_VNET_HDR flag on the tap interface +# +# @vhost: #optional enable vhost-net network accelerator +# +# @vhostfd: #optional file descriptor of an already opened vhost net device +# +# @vhostforce: #optional vhost on for non-MSIX virtio guests +# +# Since 1.2 +## +{ 'type': 'NetdevTapOptions', + 'data': { +'*ifname': 'str', +'*fd': 'str', +'*script': 'str', +'*downscript': 'str', +'*helper': 'str', +'*sndbuf': 'size', +'*vnet_hdr': 'bool', +'*vhost': 'bool', +'*vhostfd':'str', +'*vhostforce': 'bool' } } + +## +# @NetdevSocketOptions +# +# Connect the VLAN to a remote VLAN in another QEMU virtual machine using a TCP +# socket connection. +# +# @fd: #optional file descriptor of an already opened socket +# +# @listen: #optional port number, and optional hostname, to listen on +# +# @connect: #optional port number, and optional hostname, to connect to +# +# @mcast: #optional UDP multicast address and port number +# +# @localaddr: #optional source address and port for multicast and udp packets +# +# @udp: #optional UDP unicast address and port number +# +# Since 1.2 +## +{ 'type': 'NetdevSocketOptions', + 'data': { +'*fd':'str', +'*listen':'str', +'*connect': 'str', +'*mcast': 'str', +'*localaddr': 'str', +'*udp': 'str' } } + +## +# @NetdevVdeOptions +# +# Connect the VLAN to a vde switch running on the host. +# +# @sock: #optional socket path +# +# @port: #optional port number +# +# @group: #optional group owner of socket +# +# @mode: #optional permissions for socket +# +# Since 1.2 +## +{ 'type': 'NetdevVdeOptions', + 'data': { +'*sock': 'str', +'*port': 'int', +'*group': 'str', +'*mode':
Re: [Qemu-devel] [PATCH qom-next 1/5] target-i386: move cpu halted decision into x86_cpu_reset
On 22 May 2012 11:35, Igor Mammedov imamm...@redhat.com wrote: From: Igor Mammedov niall...@gmail.com MP initialization protocol differs between cpu families, and for P6 and onward models it is up to CPU to decide if it will be BSP using this protocol, so try to model this. However there is no point in implementing MP initialization protocol in qemu. Thus first CPU is always marked as BSP. This breaks compilation of the i386-linux-user target: target-i386/cpu.c: In function ‘x86_cpu_reset’: target-i386/cpu.c:1710: error: implicit declaration of function ‘apic_designate_bsp’ target-i386/cpu.c:1710: error: nested extern declaration of ‘apic_designate_bsp’ target-i386/cpu.c:1713: error: implicit declaration of function ‘cpu_get_apic_base’ target-i386/cpu.c:1713: error: nested extern declaration of ‘cpu_get_apic_base’ -- PMM
[Qemu-devel] [PATCH 1/2] fdc: floppy drive should be visible after start without media
If you start guest with floppy drive but without media inserted, the guest now can see floppy drive. Signed-off-by: Pavel Hrdina phrd...@redhat.com --- hw/pc.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/hw/pc.c b/hw/pc.c index 4d34a33..967c17a 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -382,7 +382,7 @@ void pc_cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size, if (floppy) { fdc_get_bs(fd, floppy); for (i = 0; i 2; i++) { -if (fd[i] bdrv_is_inserted(fd[i])) { +if (fd[i]) { bdrv_get_floppy_geometry_hint(fd[i], nb_heads, max_track, last_sect, FDRIVE_DRV_NONE, fd_type[i], rate); -- 1.7.7.6
[Qemu-devel] [PATCH 0/2] fdc: fix media handling
This patch series fixes handling of FDC when you don't have media inserted. Guest should see floppy drive if you start guest without media and should detect that there is no media in drive. Signed-off-by: Pavel Hrdina phrd...@redhat.com Pavel Hrdina (2): fdc: floppy drive should be visible after start without media fdc: fix media detection hw/fdc.c |8 ++-- hw/pc.c |2 +- 2 files changed, 3 insertions(+), 7 deletions(-) -- 1.7.7.6
[Qemu-devel] [PATCH 15/16] convert net_init_bridge() to NetClientOptions
Signed-off-by: Laszlo Ersek ler...@redhat.com --- net/tap.c | 23 --- 1 files changed, 12 insertions(+), 11 deletions(-) diff --git a/net/tap.c b/net/tap.c index 7501eba..fdaab2b 100644 --- a/net/tap.c +++ b/net/tap.c @@ -512,21 +512,22 @@ static int net_bridge_run_helper(const char *helper, const char *bridge) return -1; } -int net_init_bridge(QemuOpts *opts, const NetClientOptions *new_opts, +int net_init_bridge(QemuOpts *old_opts, const NetClientOptions *opts, const char *name, VLANState *vlan) { +const NetdevBridgeOptions *bridge; +const char *helper, *br; + TAPState *s; int fd, vnet_hdr; -if (!qemu_opt_get(opts, br)) { -qemu_opt_set(opts, br, DEFAULT_BRIDGE_INTERFACE); -} -if (!qemu_opt_get(opts, helper)) { -qemu_opt_set(opts, helper, DEFAULT_BRIDGE_HELPER); -} +assert(opts-kind == NET_CLIENT_OPTIONS_KIND_BRIDGE); +bridge = opts-bridge; + +helper = bridge-has_helper ? bridge-helper : DEFAULT_BRIDGE_HELPER; +br = bridge-has_br ? bridge-br : DEFAULT_BRIDGE_INTERFACE; -fd = net_bridge_run_helper(qemu_opt_get(opts, helper), - qemu_opt_get(opts, br)); +fd = net_bridge_run_helper(helper, br); if (fd == -1) { return -1; } @@ -541,8 +542,8 @@ int net_init_bridge(QemuOpts *opts, const NetClientOptions *new_opts, return -1; } -snprintf(s-nc.info_str, sizeof(s-nc.info_str), helper=%s,br=%s, - qemu_opt_get(opts, helper), qemu_opt_get(opts, br)); +snprintf(s-nc.info_str, sizeof(s-nc.info_str), helper=%s,br=%s, helper, + br); return 0; } -- 1.7.1
[Qemu-devel] [PATCH 11/16] convert net_init_slirp() to NetClientOptions
Signed-off-by: Laszlo Ersek ler...@redhat.com --- net/slirp.c | 93 --- 1 files changed, 25 insertions(+), 68 deletions(-) diff --git a/net/slirp.c b/net/slirp.c index 9b925b7..166304c 100644 --- a/net/slirp.c +++ b/net/slirp.c @@ -654,89 +654,46 @@ void do_info_usernet(Monitor *mon) } } -static int net_init_slirp_configs(const char *name, const char *value, void *opaque) +static void +net_init_slirp_configs(const StringList *fwd, int flags) { -struct slirp_config_str *config; - -if (strcmp(name, hostfwd) != 0 strcmp(name, guestfwd) != 0) { -return 0; -} - -config = g_malloc0(sizeof(*config)); +while (fwd) { +struct slirp_config_str *config; -pstrcpy(config-str, sizeof(config-str), value); +config = g_malloc0(sizeof(*config)); +pstrcpy(config-str, sizeof(config-str), fwd-value-str); +config-flags = flags; +config-next = slirp_configs; +slirp_configs = config; -if (!strcmp(name, hostfwd)) { -config-flags = SLIRP_CFG_HOSTFWD; +fwd = fwd-next; } - -config-next = slirp_configs; -slirp_configs = config; - -return 0; } -int net_init_slirp(QemuOpts *opts, const NetClientOptions *new_opts, +int net_init_slirp(QemuOpts *old_opts, const NetClientOptions *opts, const char *name, VLANState *vlan) { struct slirp_config_str *config; -const char *vhost; -const char *vhostname; -const char *vdhcp_start; -const char *vnamesrv; -const char *tftp_export; -const char *bootfile; -const char *smb_export; -const char *vsmbsrv; -const char *restrict_opt; -char *vnet = NULL; -int restricted = 0; +char *vnet; int ret; +const NetdevUserOptions *user; -vhost = qemu_opt_get(opts, host); -vhostname = qemu_opt_get(opts, hostname); -vdhcp_start = qemu_opt_get(opts, dhcpstart); -vnamesrv= qemu_opt_get(opts, dns); -tftp_export = qemu_opt_get(opts, tftp); -bootfile= qemu_opt_get(opts, bootfile); -smb_export = qemu_opt_get(opts, smb); -vsmbsrv = qemu_opt_get(opts, smbserver); - -restrict_opt = qemu_opt_get(opts, restrict); -if (restrict_opt) { -if (!strcmp(restrict_opt, on) || -!strcmp(restrict_opt, yes) || !strcmp(restrict_opt, y)) { -restricted = 1; -} else if (strcmp(restrict_opt, off) -strcmp(restrict_opt, no) strcmp(restrict_opt, n)) { -error_report(invalid option: 'restrict=%s', restrict_opt); -return -1; -} -} - -if (qemu_opt_get(opts, ip)) { -const char *ip = qemu_opt_get(opts, ip); -int l = strlen(ip) + strlen(/24) + 1; +assert(opts-kind == NET_CLIENT_OPTIONS_KIND_USER); +user = opts-user; -vnet = g_malloc(l); +vnet = user-has_net ? g_strdup(user-net) : + user-has_ip ? g_strdup_printf(%s/24, user-ip) : + NULL; -/* emulate legacy ip= parameter */ -pstrcpy(vnet, l, ip); -pstrcat(vnet, l, /24); -} - -if (qemu_opt_get(opts, net)) { -if (vnet) { -g_free(vnet); -} -vnet = g_strdup(qemu_opt_get(opts, net)); -} +/* all optional fields are initialized to all bits zero */ -qemu_opt_foreach(opts, net_init_slirp_configs, NULL, 0); +net_init_slirp_configs(user-hostfwd, SLIRP_CFG_HOSTFWD); +net_init_slirp_configs(user-guestfwd, 0); -ret = net_slirp_init(vlan, user, name, restricted, vnet, vhost, - vhostname, tftp_export, bootfile, vdhcp_start, - vnamesrv, smb_export, vsmbsrv); +ret = net_slirp_init(vlan, user, name, user-restrict, vnet, user-host, + user-hostname, user-tftp, user-bootfile, + user-dhcpstart, user-dns, user-smb, + user-smbserver); while (slirp_configs) { config = slirp_configs; -- 1.7.1
[Qemu-devel] [PATCH 08/16] convert net_client_init() to OptsVisitor
The net_client_init() prototype is kept intact. Based on is_netdev, the QemuOpts-rooted QemuOpt-list is parsed as a Netdev or a NetLegacy. The original meat of net_client_init() is moved to and simplified in net_client_init1(): Fields not common between -net and -netdev are clearly separated: (1) since both Netdev::id and NetLegacy::name are mandatory now, getting the name for the init functions is cleaner; (2) NetLegacy::vlan also explicitly depends on -net (see below). Verifying the type= option for -netdev can be turned into a switch. Format validation with qemu_opts_validate() can be removed because the visitor covers it. Relatedly, the net_client_types array is reduced to an array of init functions that can be directly indexed by opts-kind. (Help text is available in the schema JSON.) The outermost negation in the condition around qemu_find_vlan() was flattened, because it expresses the dependent code's requirements more clearly. VLAN lookup is avoided if there's no init function to pass the VLAN to. Whenever the value of type=... is needed, we substitute NetClientOptionsKind_lookup[kind]. The individual init functions are not converted yet, thus the original QemuOpts instance is passed transparently. Signed-off-by: Laszlo Ersek ler...@redhat.com --- net/dump.h |4 +- net/slirp.h |4 +- net/socket.h|4 +- net/tap.h |7 +- net/vde.h |4 +- net.c | 428 -- net/dump.c |3 +- net/slirp.c |3 +- net/socket.c|3 +- net/tap-win32.c |3 +- net/tap.c |6 +- net/vde.c |3 +- 12 files changed, 126 insertions(+), 346 deletions(-) diff --git a/net/dump.h b/net/dump.h index 2b5d9ba..85ac00b 100644 --- a/net/dump.h +++ b/net/dump.h @@ -26,7 +26,9 @@ #include net.h #include qemu-common.h +#include qapi-types.h -int net_init_dump(QemuOpts *opts, const char *name, VLANState *vlan); +int net_init_dump(QemuOpts *opts, const NetClientOptions *new_opts, + const char *name, VLANState *vlan); #endif /* QEMU_NET_DUMP_H */ diff --git a/net/slirp.h b/net/slirp.h index 53fe95d..ef13a65 100644 --- a/net/slirp.h +++ b/net/slirp.h @@ -27,10 +27,12 @@ #include qemu-common.h #include qdict.h #include qemu-option.h +#include qapi-types.h #ifdef CONFIG_SLIRP -int net_init_slirp(QemuOpts *opts, const char *name, VLANState *vlan); +int net_init_slirp(QemuOpts *opts, const NetClientOptions *new_opts, + const char *name, VLANState *vlan); void net_slirp_hostfwd_add(Monitor *mon, const QDict *qdict); void net_slirp_hostfwd_remove(Monitor *mon, const QDict *qdict); diff --git a/net/socket.h b/net/socket.h index e1fe959..e44d26e 100644 --- a/net/socket.h +++ b/net/socket.h @@ -26,7 +26,9 @@ #include net.h #include qemu-common.h +#include qapi-types.h -int net_init_socket(QemuOpts *opts, const char *name, VLANState *vlan); +int net_init_socket(QemuOpts *opts, const NetClientOptions *new_opts, +const char *name, VLANState *vlan); #endif /* QEMU_NET_SOCKET_H */ diff --git a/net/tap.h b/net/tap.h index b2a9450..44e31ce 100644 --- a/net/tap.h +++ b/net/tap.h @@ -28,11 +28,13 @@ #include qemu-common.h #include qemu-option.h +#include qapi-types.h #define DEFAULT_NETWORK_SCRIPT /etc/qemu-ifup #define DEFAULT_NETWORK_DOWN_SCRIPT /etc/qemu-ifdown -int net_init_tap(QemuOpts *opts, const char *name, VLANState *vlan); +int net_init_tap(QemuOpts *opts, const NetClientOptions *new_opts, + const char *name, VLANState *vlan); int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required); @@ -57,6 +59,7 @@ int tap_get_fd(VLANClientState *vc); struct vhost_net; struct vhost_net *tap_get_vhost_net(VLANClientState *vc); -int net_init_bridge(QemuOpts *opts, const char *name, VLANState *vlan); +int net_init_bridge(QemuOpts *opts, const NetClientOptions *new_opts, +const char *name, VLANState *vlan); #endif /* QEMU_NET_TAP_H */ diff --git a/net/vde.h b/net/vde.h index 732e575..5fc17f9 100644 --- a/net/vde.h +++ b/net/vde.h @@ -26,10 +26,12 @@ #include qemu-common.h #include qemu-option.h +#include qapi-types.h #ifdef CONFIG_VDE -int net_init_vde(QemuOpts *opts, const char *name, VLANState *vlan); +int net_init_vde(QemuOpts *opts, const NetClientOptions *new_opts, + const char *name, VLANState *vlan); #endif /* CONFIG_VDE */ diff --git a/net.c b/net.c index 9c8cd42..5ac5cf0 100644 --- a/net.c +++ b/net.c @@ -37,6 +37,9 @@ #include qmp-commands.h #include hw/qdev.h #include iov.h +#include qapi-visit.h +#include qapi/opts-visitor.h +#include qapi/qapi-dealloc-visitor.h /* Net bridge is currently not supported for W32. */ #if !defined(_WIN32) @@ -745,7 +748,8 @@ int net_handle_fd_param(Monitor *mon, const char *param) return fd; } -static int net_init_nic(QemuOpts *opts, const char *name,
[Qemu-devel] [PATCH 2/2] fdc: fix media detection
We have to set up 'media_changed' after guest start so floppy driver could detect that there is no media in drive. For this purpose we call 'fdctrl_change_cb' instead of 'fd_revalidate' in 'fdctrl_connect_drives'. 'fd_revalidate' is called inside 'fdctrl_change_cb'. In 'fdctrl_handle_seek' we always set current track because we don't care if there is media inserted or not. Signed-off-by: Pavel Hrdina phrd...@redhat.com --- hw/fdc.c |8 ++-- 1 files changed, 2 insertions(+), 6 deletions(-) diff --git a/hw/fdc.c b/hw/fdc.c index cb4cd25..337b35a 100644 --- a/hw/fdc.c +++ b/hw/fdc.c @@ -1617,11 +1617,7 @@ static void fdctrl_handle_seek(FDCtrl *fdctrl, int direction) /* The seek command just sends step pulses to the drive and doesn't care if * there is a medium inserted of if it's banging the head against the drive. */ -if (fdctrl-fifo[2] cur_drv-max_track) { -cur_drv-track = cur_drv-max_track; -} else { -cur_drv-track = fdctrl-fifo[2]; -} +cur_drv-track = fdctrl-fifo[2]; /* Raise Interrupt */ fdctrl_raise_irq(fdctrl, FD_SR0_SEEK); } @@ -1878,7 +1874,7 @@ static int fdctrl_connect_drives(FDCtrl *fdctrl) } fd_init(drive); -fd_revalidate(drive); +fdctrl_change_cb(drive, 0); if (drive-bs) { bdrv_set_dev_ops(drive-bs, fdctrl_block_ops, drive); } -- 1.7.7.6
[Qemu-devel] [PATCH 04/16] qapi: introduce OptsVisitor
This visitor supports parsing -option [type=]discriminator[,optarg1=val1][,optarg2=val2][,...] style QemuOpts objects into native C structures. After defining the type tree in the qapi schema (see below), a root type traversal with this visitor linked to the underlying QemuOpts object will build the native C representation of the option. The type tree in the schema, corresponding to an option with a discriminator, must have the following structure: struct scalar member for non-discriminated optarg 1 [*] list for repeating non-discriminated optarg 2 [*] wrapper struct single scalar member union struct for discriminator case 1 scalar member for optarg 3 [*] list for repeating optarg 4 [*] wrapper struct single scalar member scalar member for optarg 5 [*] struct for discriminator case 2 ... The type optarg name is fixed for the discriminator role. Its schema representation is union of structures, and each discriminator value must correspond to a member name in the union. If the option takes no type descriminator, then the type subtree rooted at the union must be absent from the schema (including the union itself). Optarg values can be of scalar types str / bool / int / size. Members marked with [*] may be defined as optional in the schema, describing an optional optarg. Repeating an optarg is supported; its schema representation must be list of structure with single mandatory scalar member. If an optarg is not described as repeating in the schema (ie. it is defined as a scalar field instead of a list), its last occurrence will take effect. Ordering between differently named optargs is not preserved. A mandatory list (or an optional one which is reported to be available), corresponding to a repeating optarg, has at least one element after successful parsing. Signed-off-by: Laszlo Ersek ler...@redhat.com --- qapi/opts-visitor.h | 31 + qapi/opts-visitor.c | 363 +++ Makefile.objs |2 +- 3 files changed, 395 insertions(+), 1 deletions(-) create mode 100644 qapi/opts-visitor.h create mode 100644 qapi/opts-visitor.c diff --git a/qapi/opts-visitor.h b/qapi/opts-visitor.h new file mode 100644 index 000..edd6d41 --- /dev/null +++ b/qapi/opts-visitor.h @@ -0,0 +1,31 @@ +/* + * Options Visitor + * + * Copyright Red Hat, Inc. 2012 + * + * Author: Laszlo Ersek ler...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2.1 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +#ifndef OPTS_VISITOR_H +#define OPTS_VISITOR_H + +#include qapi-visit-core.h +#include qemu-option.h + +typedef struct OptsVisitor OptsVisitor; + +/* Contrarily to qemu-option.c::parse_option_number(), OptsVisitor's int + * parser relies on strtoll() instead of strtoull(). Consequences: + * - string representations of negative numbers are rejected (parsed values + * continue to be non-negative), + * - values above INT64_MAX or LLONG_MAX are rejected. + */ +OptsVisitor *opts_visitor_new(const QemuOpts *opts); +void opts_visitor_cleanup(OptsVisitor *nv); +Visitor *opts_get_visitor(OptsVisitor *nv); + +#endif diff --git a/qapi/opts-visitor.c b/qapi/opts-visitor.c new file mode 100644 index 000..a94c67e --- /dev/null +++ b/qapi/opts-visitor.c @@ -0,0 +1,363 @@ +/* + * Options Visitor + * + * Copyright Red Hat, Inc. 2012 + * + * Author: Laszlo Ersek ler...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2.1 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +#include opts-visitor.h +#include qemu-queue.h +#include qemu-option-internal.h +#include qapi-visit-impl.h + + +struct OptsVisitor +{ +Visitor visitor; + +/* Ownership remains with opts_visitor_new()'s caller. */ +const QemuOpts *opts_root; + +unsigned depth; + +/* Non-null iff depth is positive. Each key is a QemuOpt name. Each value + * is a non-empty GQueue, enumerating all QemuOpt occurrences with that + * name. */ +GHashTable *unprocessed_opts; + +/* The list currently being traversed with opts_start_list() / + * opts_next_list(). The list must have a struct element type in the + * schema, with a single mandatory scalar member. */ +GQueue *repeated_opts; +bool repeated_opts_first; +}; + + +static void +destroy_list(gpointer list) +{ + g_queue_free(list); +} + + +static void +opts_start_struct(Visitor *v, void **obj, const char *kind, + const char *name, size_t size, Error **errp) +{ +OptsVisitor *ov = DO_UPCAST(OptsVisitor, visitor, v); +const QemuOpt *opt; + +*obj = g_malloc0(size); +if (ov-depth++ 0) { +return; +} + +ov-unprocessed_opts = g_hash_table_new_full(g_str_hash, g_str_equal, + NULL, destroy_list); +QTAILQ_FOREACH(opt,
[Qemu-devel] [PATCH 00/16] introduce OptsVisitor, rebase -net/-netdev parsing
Inspired by [1], the first half of this series attempts to implement a new visitor that should clean up defining and processing command line options. For a more detailed description, please see [PATCH 04/16] qapi: introduce OptsVisitor. The second half converts -net/-netdev parsing to the new visitor. The series depends on qapi: convert netdev_add netdev_del v5, see [2]. Comments highly appreciated; please keep me CC'd. Thank you. [1] http://lists.nongnu.org/archive/html/qemu-devel/2012-04/msg02512.html [2] http://lists.nongnu.org/archive/html/qemu-devel/2012-05/msg02841.html Laszlo Ersek (15): qapi: introduce size type expose QemuOpt and QemuOpts struct definitions to interested parties qapi: introduce OptsVisitor qapi schema: remove trailing whitespace qapi schema: add Netdev types hw, net: net_client_type - NetClientOptionsKind (qapi-generated) convert net_client_init() to OptsVisitor convert net_init_nic() to NetClientOptions convert net_init_dump() to NetClientOptions convert net_init_slirp() to NetClientOptions convert net_init_socket() to NetClientOptions convert net_init_vde() to NetClientOptions convert net_init_tap() to NetClientOptions convert net_init_bridge() to NetClientOptions remove unused QemuOpts parameter from net init functions Paolo Bonzini (1): qapi: fix error propagation error.h|4 +- net.h | 16 +-- net/dump.h |5 +- net/slirp.h|5 +- net/socket.h |5 +- net/tap.h | 10 +- net/vde.h |5 +- qapi/opts-visitor.h| 31 +++ qapi/qapi-visit-core.h |4 + qemu-option-internal.h | 53 + error.c|4 +- hw/cadence_gem.c |2 +- hw/dp8393x.c |2 +- hw/e1000.c |2 +- hw/eepro100.c |2 +- hw/etraxfs_eth.c |2 +- hw/lan9118.c |2 +- hw/lance.c |2 +- hw/mcf_fec.c |2 +- hw/milkymist-minimac2.c|2 +- hw/mipsnet.c |2 +- hw/musicpal.c |2 +- hw/ne2000-isa.c|2 +- hw/ne2000.c|2 +- hw/opencores_eth.c |2 +- hw/pcnet-pci.c |2 +- hw/rtl8139.c |2 +- hw/smc91c111.c |2 +- hw/spapr_llan.c|2 +- hw/stellaris_enet.c|2 +- hw/usb/dev-network.c |2 +- hw/vhost_net.c |2 +- hw/virtio-net.c| 10 +- hw/xen_nic.c |2 +- hw/xgmac.c |2 +- hw/xilinx_axienet.c|2 +- hw/xilinx_ethlite.c|2 +- net.c | 493 +++- net/dump.c | 24 ++- net/slirp.c| 96 +++-- net/socket.c | 124 --- net/tap-aix.c |2 +- net/tap-bsd.c |2 +- net/tap-haiku.c|2 +- net/tap-linux.c|9 +- net/tap-solaris.c |2 +- net/tap-win32.c| 14 +- net/tap.c | 152 ++-- net/vde.c | 40 +++- qapi/opts-visitor.c| 363 + qapi/qapi-visit-core.c | 17 +- qemu-option.c | 24 +-- tests/test-qmp-input-visitor.c | 24 ++- Makefile.objs |2 +- docs/qapi-code-gen.txt |2 + qapi-schema.json | 285 +++- scripts/qapi-visit.py | 129 ++- scripts/qapi.py|2 +- 58 files changed, 1239 insertions(+), 771 deletions(-) create mode 100644 qapi/opts-visitor.h create mode 100644 qemu-option-internal.h create mode 100644 qapi/opts-visitor.c
[Qemu-devel] [PATCH qom-next 4/5] target-i386: make initialize CPU in QOM way
Make CPU creation/initialization consistent with QOM object behavior in this, by moving tcg and apic initialization from board level into CPU's initfn/realize calls and cpu_model property setter. Which makes CPU object self-sufficient in respect of creation/initialization and matches a typical object creation sequence, i.e.: - create CPU instance - set properties - realize object - (x86_cpu_realize will be converted into realize property setter, when it is implemented) Signed-off-by: Igor Mammedov imamm...@redhat.com --- hw/pc.c | 32 +- target-i386/cpu.c| 60 +- target-i386/helper.c | 39 3 files changed, 65 insertions(+), 66 deletions(-) diff --git a/hw/pc.c b/hw/pc.c index 0eb0b73..677f9e0 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -894,37 +894,17 @@ static void pc_cpu_reset(void *opaque) cpu_reset(CPU(cpu)); } -static X86CPU *pc_new_cpu(const char *cpu_model) -{ -X86CPU *cpu; -CPUX86State *env; - -cpu = cpu_x86_init(cpu_model); -if (cpu == NULL) { -exit(1); -} -env = cpu-env; -if ((env-cpuid_features CPUID_APIC) || smp_cpus 1) { -if (kvm_irqchip_in_kernel()) { -env-apic_state = qdev_create(NULL, kvm-apic); -} else { -env-apic_state = qdev_create(NULL, apic); -} -qdev_prop_set_uint8(env-apic_state, id, env-cpuid_apic_id); -qdev_prop_set_ptr(env-apic_state, cpu_env, env); -qdev_init_nofail(env-apic_state); -} -qemu_register_reset(pc_cpu_reset, cpu); -pc_cpu_reset(cpu); -return cpu; -} - void pc_cpus_init(const char *cpu_model) { +X86CPU *cpu; int i; for(i = 0; i smp_cpus; i++) { -pc_new_cpu(cpu_model); +cpu = cpu_x86_init(cpu_model); +if (cpu == NULL) { +exit(1); +} +qemu_register_reset(pc_cpu_reset, cpu); } } diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 538892d..0e804ea 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -31,6 +31,9 @@ #include hyperv.h +#include hw/qdev.h +#include sysemu.h + /* feature flags taken from Intel Processor Identification and the CPUID * Instruction and AMD's CPUID Specification. In cases of disagreement * between feature naming conventions, aliases may be added. @@ -1747,21 +1750,76 @@ static void x86_set_cpu_model(Object *obj, const char *value, Error **errp) if (cpu_x86_register(cpu, env-cpu_model_str) 0) { fprintf(stderr, Unable to find x86 CPU definition\n); error_set(errp, QERR_INVALID_PARAMETER_COMBINATION); +return; +} + +if (((env-cpuid_features CPUID_APIC) || smp_cpus 1)) { +if (kvm_irqchip_in_kernel()) { +env-apic_state = qdev_create(NULL, kvm-apic); +} else { +env-apic_state = qdev_create(NULL, apic); +} +object_property_add_child(OBJECT(cpu), apic, +OBJECT(env-apic_state), NULL); + +qdev_prop_set_uint8(env-apic_state, id, env-cpuid_apic_id); +qdev_prop_set_ptr(env-apic_state, cpu_env, env); +} +} + +static CPUDebugExcpHandler *prev_debug_excp_handler; + +static void breakpoint_handler(CPUX86State *env) +{ +CPUBreakpoint *bp; + +if (env-watchpoint_hit) { +if (env-watchpoint_hit-flags BP_CPU) { +env-watchpoint_hit = NULL; +if (check_hw_breakpoints(env, 0)) { +raise_exception_env(EXCP01_DB, env); +} else { +cpu_resume_from_signal(env, NULL); +} +} +} else { +QTAILQ_FOREACH(bp, env-breakpoints, entry) +if (bp-pc == env-eip) { +if (bp-flags BP_CPU) { +check_hw_breakpoints(env, 1); +raise_exception_env(EXCP01_DB, env); +} +break; +} +} +if (prev_debug_excp_handler) { +prev_debug_excp_handler(env); } } void x86_cpu_realize(Object *obj, Error **errp) { X86CPU *cpu = X86_CPU(obj); +CPUX86State *env = cpu-env; + +if (env-apic_state) { +if (qdev_init(env-apic_state) 0) { +error_set(errp, QERR_DEVICE_INIT_FAILED, +object_get_typename(OBJECT(env-apic_state))); +return; +} +} mce_init(cpu); -qemu_init_vcpu(cpu-env); +qemu_init_vcpu(env); +cpu_reset(CPU(cpu)); } static void x86_cpu_initfn(Object *obj) { X86CPU *cpu = X86_CPU(obj); CPUX86State *env = cpu-env; +static int inited; cpu_exec_init(env); diff --git a/target-i386/helper.c b/target-i386/helper.c index 6fc67a9..443092e 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -947,34 +947,6 @@ int check_hw_breakpoints(CPUX86State *env, int force_dr6_update) return hit_enabled; } -static CPUDebugExcpHandler *prev_debug_excp_handler;
Re: [Qemu-devel] [PATCH] Add a memory barrier to DMA functions
On Tue, May 22, 2012 at 05:17:39PM +1000, Benjamin Herrenschmidt wrote: On Tue, 2012-05-22 at 14:34 +1000, Benjamin Herrenschmidt wrote: The emulated devices can run simultaneously with the guest, so we need to be careful with ordering of load and stores done by them to the guest system memory, which need to be observed in the right order by the guest operating system. This adds a barrier call to the basic DMA read/write ops which is currently implemented as a smp_mb(), but could be later improved for more fine grained control of barriers. Additionally, a _relaxed() variant of the accessors is provided to easily convert devices who would be performance sensitive and negatively impacted by the change. Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- (Note to Rusty: The number I told you on ST is wrong, see below) So I tried to do some performance measurements with that patch using netperf on an x86 laptop (x220 with core i7). It's a bit tricky. For example, if I just create a tap interface, give it a local IP on the laptop and a different IP on the guest, (ie talking to a netserver on the host basically from the guest via tap), the performance is pretty poor and the numbers seem useless with and without the barrier. So I did tests involving talking to a server on our gigabit network instead. The baseline is the laptop without kvm talking to the server. The TCP_STREAM test results are: It's not a good test. The thing most affecting throughput results is how much CPU does you guest get. So as a minumum you need to measure CPU utilization on the host and divide by that. -- MST
[Qemu-devel] [PATCH 05/16] qapi schema: remove trailing whitespace
Signed-off-by: Laszlo Ersek ler...@redhat.com --- qapi-schema.json | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/qapi-schema.json b/qapi-schema.json index bb1f806..a8b3803 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -315,7 +315,7 @@ # @CPU: the index of the virtual CPU # # @current: this only exists for backwards compatible and should be ignored -# +# # @halted: true if the virtual CPU is in the halt state. Halt usually refers # to a processor specific low power mode. # @@ -658,7 +658,7 @@ # @SpiceInfo # # Information about the SPICE session. -# +# # @enabled: true if the SPICE server is enabled, false otherwise # # @host: #optional The hostname the SPICE server is bound to. This depends on @@ -1269,7 +1269,7 @@ ## { 'command': 'human-monitor-command', 'data': {'command-line': 'str', '*cpu-index': 'int'}, - 'returns': 'str' } + 'returns': 'str' } ## # @migrate_cancel @@ -1430,7 +1430,7 @@ # @password: the new password # # @connected: #optional how to handle existing clients when changing the -# password. If nothing is specified, defaults to `keep' +# password. If nothing is specified, defaults to `keep' # `fail' to fail the command if clients are connected # `disconnect' to disconnect existing clients # `keep' to maintain existing clients @@ -1570,7 +1570,7 @@ # If the argument combination is invalid, InvalidParameterCombination # # Since: 1.1 -## +## { 'command': 'block_set_io_throttle', 'data': { 'device': 'str', 'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int', 'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int' } } -- 1.7.1
[Qemu-devel] [PATCH 10/16] convert net_init_dump() to NetClientOptions
Signed-off-by: Laszlo Ersek ler...@redhat.com --- net/dump.c | 21 + 1 files changed, 17 insertions(+), 4 deletions(-) diff --git a/net/dump.c b/net/dump.c index 27e9528..a1de09e 100644 --- a/net/dump.c +++ b/net/dump.c @@ -144,22 +144,35 @@ static int net_dump_init(VLANState *vlan, const char *device, return 0; } -int net_init_dump(QemuOpts *opts, const NetClientOptions *new_opts, +int net_init_dump(QemuOpts *old_opts, const NetClientOptions *opts, const char *name, VLANState *vlan) { int len; const char *file; char def_file[128]; +const NetdevDumpOptions *dump; + +assert(opts-kind == NET_CLIENT_OPTIONS_KIND_DUMP); +dump = opts-dump; assert(vlan); -file = qemu_opt_get(opts, file); -if (!file) { +if (dump-has_file) { +file = dump-file; +} else { snprintf(def_file, sizeof(def_file), qemu-vlan%d.pcap, vlan-id); file = def_file; } -len = qemu_opt_get_size(opts, len, 65536); +if (dump-has_len) { +if (dump-len INT_MAX) { +error_report(invalid length: %PRId64, dump-len); +return -1; +} +len = dump-len; +} else { +len = 65536; +} return net_dump_init(vlan, dump, name, file, len); } -- 1.7.1
Re: [Qemu-devel] [PATCH 3/3] deal with guest panicked event
On 2012-05-21 03:50, Wen Congyang wrote: When the guest is panicked, it will write 0x1 to the port 0x505. So if qemu reads 0x1 from this port, we can do the folloing three things according to the parameter -onpanic: 1. emit QEVENT_GUEST_PANICKED only 2. emit QEVENT_GUEST_PANICKED and pause VM 3. emit QEVENT_GUEST_PANICKED and quit VM Note: if we emit QEVENT_GUEST_PANICKED only, and the management application does not receive this event(the management may not run when the event is emitted), the management won't know the guest is panicked. Signed-off-by: Wen Congyang we...@cn.fujitsu.com --- kvm-all.c| 84 ++ kvm-stub.c |9 ++ kvm.h|3 ++ monitor.c|3 ++ monitor.h|1 + qapi-schema.json |6 +++- qemu-options.hx | 14 + qmp.c|3 +- vl.c | 17 ++- 9 files changed, 137 insertions(+), 3 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 9b73ccf..b5b0531 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -19,6 +19,7 @@ #include stdarg.h #include linux/kvm.h +#include linux/kvm_para.h #include qemu-common.h #include qemu-barrier.h @@ -29,6 +30,8 @@ #include bswap.h #include memory.h #include exec-memory.h +#include iorange.h +#include qemu-objects.h /* This check must be after config-host.h is included */ #ifdef CONFIG_EVENTFD @@ -1707,3 +1710,84 @@ int kvm_on_sigbus(int code, void *addr) { return kvm_arch_on_sigbus(code, addr); } + +/* Possible values for action parameter. */ +#define PANICKED_REPORT 1 /* emit QEVENT_GUEST_PANICKED only */ +#define PANICKED_PAUSE 2 /* emit QEVENT_GUEST_PANICKED and pause VM */ +#define PANICKED_QUIT 3 /* emit QEVENT_GUEST_PANICKED and quit VM */ + +static int panicked_action = PANICKED_REPORT; + +static void kvm_pv_port_read(IORange *iorange, uint64_t offset, unsigned width, + uint64_t *data) +{ +*data = (1 KVM_PV_FEATURE_PANICKED); +} + +static void panicked_mon_event(const char *action) +{ +QObject *data; + +data = qobject_from_jsonf({ 'action': %s }, action); +monitor_protocol_event(QEVENT_GUEST_PANICKED, data); +qobject_decref(data); +} + +static void panicked_perform_action(void) +{ +switch(panicked_action) { +case PANICKED_REPORT: +panicked_mon_event(report); +break; + +case PANICKED_PAUSE: +panicked_mon_event(pause); +vm_stop(RUN_STATE_GUEST_PANICKED); +break; + +case PANICKED_QUIT: +panicked_mon_event(quit); +exit(0); +break; +} +} + +static void kvm_pv_port_write(IORange *iorange, uint64_t offset, unsigned width, + uint64_t data) +{ +if (data == KVM_PV_PANICKED) +panicked_perform_action(); +} + +static void kvm_pv_port_destructor(IORange *iorange) +{ +g_free(iorange); +} + +static IORangeOps pv_io_range_ops = { +.read = kvm_pv_port_read, +.write = kvm_pv_port_write, +.destructor = kvm_pv_port_destructor, +}; + +void kvm_pv_port_init(void) +{ +IORange *pv_io_range = g_malloc(sizeof(IORange)); + +iorange_init(pv_io_range, pv_io_range_ops, 0x505, 1); +ioport_register(pv_io_range); Not sure if the discussion about the PV channel already settled, but if PIO is the way to go, please model this as a proper QEMU device (e.g. panic, implemented in hw/kvm/panic.c), not just an open-coded PIO range. Jan PS: checkpatch.pl... -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH 07/16] hw, net: net_client_type - NetClientOptionsKind (qapi-generated)
NET_CLIENT_TYPE_ - NET_CLIENT_OPTIONS_KIND_ Signed-off-by: Laszlo Ersek ler...@redhat.com --- net.h | 16 +- hw/cadence_gem.c|2 +- hw/dp8393x.c|2 +- hw/e1000.c |2 +- hw/eepro100.c |2 +- hw/etraxfs_eth.c|2 +- hw/lan9118.c|2 +- hw/lance.c |2 +- hw/mcf_fec.c|2 +- hw/milkymist-minimac2.c |2 +- hw/mipsnet.c|2 +- hw/musicpal.c |2 +- hw/ne2000-isa.c |2 +- hw/ne2000.c |2 +- hw/opencores_eth.c |2 +- hw/pcnet-pci.c |2 +- hw/rtl8139.c|2 +- hw/smc91c111.c |2 +- hw/spapr_llan.c |2 +- hw/stellaris_enet.c |2 +- hw/usb/dev-network.c|2 +- hw/vhost_net.c |2 +- hw/virtio-net.c | 10 hw/xen_nic.c|2 +- hw/xgmac.c |2 +- hw/xilinx_axienet.c |2 +- hw/xilinx_ethlite.c |2 +- net.c | 50 +++--- net/dump.c |2 +- net/slirp.c |2 +- net/socket.c|4 +- net/tap-win32.c |2 +- net/tap.c | 16 +++--- net/vde.c |2 +- 34 files changed, 71 insertions(+), 83 deletions(-) diff --git a/net.h b/net.h index bdc2a06..b0b8c7a 100644 --- a/net.h +++ b/net.h @@ -7,6 +7,7 @@ #include qemu-option.h #include net/queue.h #include vmstate.h +#include qapi-types.h struct MACAddr { uint8_t a[6]; @@ -29,19 +30,6 @@ typedef struct NICConf { /* VLANs support */ -typedef enum { -NET_CLIENT_TYPE_NONE, -NET_CLIENT_TYPE_NIC, -NET_CLIENT_TYPE_USER, -NET_CLIENT_TYPE_TAP, -NET_CLIENT_TYPE_SOCKET, -NET_CLIENT_TYPE_VDE, -NET_CLIENT_TYPE_DUMP, -NET_CLIENT_TYPE_BRIDGE, - -NET_CLIENT_TYPE_MAX -} net_client_type; - typedef void (NetPoll)(VLANClientState *, bool enable); typedef int (NetCanReceive)(VLANClientState *); typedef ssize_t (NetReceive)(VLANClientState *, const uint8_t *, size_t); @@ -50,7 +38,7 @@ typedef void (NetCleanup) (VLANClientState *); typedef void (LinkStatusChanged)(VLANClientState *); typedef struct NetClientInfo { -net_client_type type; +NetClientOptionsKind type; size_t size; NetReceive *receive; NetReceive *receive_raw; diff --git a/hw/cadence_gem.c b/hw/cadence_gem.c index e2140ae..f184620 100644 --- a/hw/cadence_gem.c +++ b/hw/cadence_gem.c @@ -1161,7 +1161,7 @@ static void gem_set_link(VLANClientState *nc) } static NetClientInfo net_gem_info = { -.type = NET_CLIENT_TYPE_NIC, +.type = NET_CLIENT_OPTIONS_KIND_NIC, .size = sizeof(NICState), .can_receive = gem_can_receive, .receive = gem_receive, diff --git a/hw/dp8393x.c b/hw/dp8393x.c index 017d074..756d630 100644 --- a/hw/dp8393x.c +++ b/hw/dp8393x.c @@ -872,7 +872,7 @@ static void nic_cleanup(VLANClientState *nc) } static NetClientInfo net_dp83932_info = { -.type = NET_CLIENT_TYPE_NIC, +.type = NET_CLIENT_OPTIONS_KIND_NIC, .size = sizeof(NICState), .can_receive = nic_can_receive, .receive = nic_receive, diff --git a/hw/e1000.c b/hw/e1000.c index 4573f13..ad24298 100644 --- a/hw/e1000.c +++ b/hw/e1000.c @@ -1206,7 +1206,7 @@ pci_e1000_uninit(PCIDevice *dev) } static NetClientInfo net_e1000_info = { -.type = NET_CLIENT_TYPE_NIC, +.type = NET_CLIENT_OPTIONS_KIND_NIC, .size = sizeof(NICState), .can_receive = e1000_can_receive, .receive = e1000_receive, diff --git a/hw/eepro100.c b/hw/eepro100.c index 6279ae3..f343685 100644 --- a/hw/eepro100.c +++ b/hw/eepro100.c @@ -1845,7 +1845,7 @@ static int pci_nic_uninit(PCIDevice *pci_dev) } static NetClientInfo net_eepro100_info = { -.type = NET_CLIENT_TYPE_NIC, +.type = NET_CLIENT_OPTIONS_KIND_NIC, .size = sizeof(NICState), .can_receive = nic_can_receive, .receive = nic_receive, diff --git a/hw/etraxfs_eth.c b/hw/etraxfs_eth.c index 16a0637..45fb40c 100644 --- a/hw/etraxfs_eth.c +++ b/hw/etraxfs_eth.c @@ -579,7 +579,7 @@ static void eth_cleanup(VLANClientState *nc) } static NetClientInfo net_etraxfs_info = { - .type = NET_CLIENT_TYPE_NIC, + .type = NET_CLIENT_OPTIONS_KIND_NIC, .size = sizeof(NICState), .can_receive = eth_can_receive, .receive = eth_receive, diff --git a/hw/lan9118.c b/hw/lan9118.c index 7b4fe87..40fb765 100644 --- a/hw/lan9118.c +++ b/hw/lan9118.c @@ -1310,7 +1310,7 @@ static void lan9118_cleanup(VLANClientState *nc) } static NetClientInfo net_lan9118_info = { -.type = NET_CLIENT_TYPE_NIC, +.type = NET_CLIENT_OPTIONS_KIND_NIC, .size = sizeof(NICState), .can_receive = lan9118_can_receive, .receive = lan9118_receive, diff --git a/hw/lance.c b/hw/lance.c index ce3d46c..91c0e16 100644 --- a/hw/lance.c +++ b/hw/lance.c @@ -93,7 +93,7 @@
[Qemu-devel] [PATCH qom-next 1/5] target-i386: move cpu halted decision into x86_cpu_reset
From: Igor Mammedov niall...@gmail.com MP initialization protocol differs between cpu families, and for P6 and onward models it is up to CPU to decide if it will be BSP using this protocol, so try to model this. However there is no point in implementing MP initialization protocol in qemu. Thus first CPU is always marked as BSP. This patch: - moves decision to designate BSP from board into cpu, making cpu self-sufficient in this regard. Later it will allow to cleanup hw/pc.c and remove cpu_reset and wrappers from there. - stores flag that CPU is BSP in IA32_APIC_BASE to model behavior described in Inted SDM vol 3a part 1 chapter 8.4.1 - uses MSR_IA32_APICBASE_BSP flag in apic_base for checking if cpu is BSP patch is based on Jan Kiszka's proposal: http://thread.gmane.org/gmane.comp.emulators.qemu/100806 Signed-off-by: Igor Mammedov niall...@gmail.com Signed-off-by: Igor Mammedov imamm...@redhat.com --- hw/apic.h|2 +- hw/apic_common.c | 18 -- hw/pc.c |9 - target-i386/cpu.c|7 +++ target-i386/helper.c |1 - target-i386/kvm.c|5 +++-- 6 files changed, 23 insertions(+), 19 deletions(-) diff --git a/hw/apic.h b/hw/apic.h index 62179ce..d961ed4 100644 --- a/hw/apic.h +++ b/hw/apic.h @@ -20,9 +20,9 @@ void apic_init_reset(DeviceState *s); void apic_sipi(DeviceState *s); void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip, TPRAccess access); +void apic_designate_bsp(DeviceState *d); /* pc.c */ -int cpu_is_bsp(CPUX86State *env); DeviceState *cpu_get_current_apic(void); #endif diff --git a/hw/apic_common.c b/hw/apic_common.c index 60b8259..23d51e8 100644 --- a/hw/apic_common.c +++ b/hw/apic_common.c @@ -43,8 +43,8 @@ uint64_t cpu_get_apic_base(DeviceState *d) trace_cpu_get_apic_base((uint64_t)s-apicbase); return s-apicbase; } else { -trace_cpu_get_apic_base(0); -return 0; +trace_cpu_get_apic_base(MSR_IA32_APICBASE_BSP); +return MSR_IA32_APICBASE_BSP; } } @@ -201,22 +201,28 @@ void apic_init_reset(DeviceState *d) s-timer_expiry = -1; } +void apic_designate_bsp(DeviceState *d) +{ +if (d) { +APICCommonState *s = APIC_COMMON(d); +s-apicbase |= MSR_IA32_APICBASE_BSP; +} +} + static void apic_reset_common(DeviceState *d) { APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d); APICCommonClass *info = APIC_COMMON_GET_CLASS(s); -bool bsp; -bsp = cpu_is_bsp(s-cpu_env); s-apicbase = 0xfee0 | -(bsp ? MSR_IA32_APICBASE_BSP : 0) | MSR_IA32_APICBASE_ENABLE; +(s-apicbase MSR_IA32_APICBASE_BSP) | MSR_IA32_APICBASE_ENABLE; s-vapic_paddr = 0; info-vapic_base_update(s); apic_init_reset(d); -if (bsp) { +if (s-apicbase MSR_IA32_APICBASE_BSP) { /* * LINT0 delivery mode on CPU #0 is set to ExtInt at initialization * time typically by BIOS, so PIC interrupt can be delivered to the diff --git a/hw/pc.c b/hw/pc.c index 4167782..6ad1c0d 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -870,12 +870,6 @@ void pc_init_ne2k_isa(ISABus *bus, NICInfo *nd) nb_ne2k++; } -int cpu_is_bsp(CPUX86State *env) -{ -/* We hard-wire the BSP to the first CPU. */ -return env-cpu_index == 0; -} - DeviceState *cpu_get_current_apic(void) { if (cpu_single_env) { @@ -935,10 +929,7 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level) static void pc_cpu_reset(void *opaque) { X86CPU *cpu = opaque; -CPUX86State *env = cpu-env; - cpu_reset(CPU(cpu)); -env-halted = !cpu_is_bsp(env); } static X86CPU *pc_new_cpu(const char *cpu_model) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 89b4ac7..c7c23bf 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -1704,6 +1704,13 @@ static void x86_cpu_reset(CPUState *s) env-dr[7] = DR7_FIXED_1; cpu_breakpoint_remove_all(env, BP_CPU); cpu_watchpoint_remove_all(env, BP_CPU); + +/* We hard-wire the BSP to the first CPU. */ +if (env-cpu_index == 0) { +apic_designate_bsp(env-apic_state); +} + +env-halted = !(cpu_get_apic_base(env-apic_state) MSR_IA32_APICBASE_BSP); } static void mce_init(X86CPU *cpu) diff --git a/target-i386/helper.c b/target-i386/helper.c index 8df109f..94f95b7 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -1197,7 +1197,6 @@ void do_cpu_init(X86CPU *cpu) env-interrupt_request = sipi; env-pat = pat; apic_init_reset(env-apic_state); -env-halted = !cpu_is_bsp(env); } void do_cpu_sipi(X86CPU *cpu) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 0d0d8f6..09621e5 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -583,8 +583,9 @@ void kvm_arch_reset_vcpu(CPUX86State *env) env-interrupt_injected = -1; env-xcr0 = 1; if (kvm_irqchip_in_kernel()) { -env-mp_state = cpu_is_bsp(env) ?
[Qemu-devel] [PATCH 16/16] remove unused QemuOpts parameter from net init functions
Signed-off-by: Laszlo Ersek ler...@redhat.com --- net/dump.h |5 ++--- net/slirp.h |5 ++--- net/socket.h|5 ++--- net/tap.h |9 - net/vde.h |5 ++--- net.c | 14 ++ net/dump.c |4 ++-- net/slirp.c |4 ++-- net/socket.c|4 ++-- net/tap-win32.c |4 ++-- net/tap.c |8 net/vde.c |4 ++-- 12 files changed, 32 insertions(+), 39 deletions(-) diff --git a/net/dump.h b/net/dump.h index 85ac00b..0fa2dd7 100644 --- a/net/dump.h +++ b/net/dump.h @@ -25,10 +25,9 @@ #define QEMU_NET_DUMP_H #include net.h -#include qemu-common.h #include qapi-types.h -int net_init_dump(QemuOpts *opts, const NetClientOptions *new_opts, - const char *name, VLANState *vlan); +int net_init_dump(const NetClientOptions *opts, const char *name, + VLANState *vlan); #endif /* QEMU_NET_DUMP_H */ diff --git a/net/slirp.h b/net/slirp.h index ef13a65..fa7eacc 100644 --- a/net/slirp.h +++ b/net/slirp.h @@ -26,13 +26,12 @@ #include qemu-common.h #include qdict.h -#include qemu-option.h #include qapi-types.h #ifdef CONFIG_SLIRP -int net_init_slirp(QemuOpts *opts, const NetClientOptions *new_opts, - const char *name, VLANState *vlan); +int net_init_slirp(const NetClientOptions *opts, const char *name, + VLANState *vlan); void net_slirp_hostfwd_add(Monitor *mon, const QDict *qdict); void net_slirp_hostfwd_remove(Monitor *mon, const QDict *qdict); diff --git a/net/socket.h b/net/socket.h index e44d26e..c4809ad 100644 --- a/net/socket.h +++ b/net/socket.h @@ -25,10 +25,9 @@ #define QEMU_NET_SOCKET_H #include net.h -#include qemu-common.h #include qapi-types.h -int net_init_socket(QemuOpts *opts, const NetClientOptions *new_opts, -const char *name, VLANState *vlan); +int net_init_socket(const NetClientOptions *opts, const char *name, +VLANState *vlan); #endif /* QEMU_NET_SOCKET_H */ diff --git a/net/tap.h b/net/tap.h index f092129..19dea58 100644 --- a/net/tap.h +++ b/net/tap.h @@ -27,14 +27,13 @@ #define QEMU_NET_TAP_H #include qemu-common.h -#include qemu-option.h #include qapi-types.h #define DEFAULT_NETWORK_SCRIPT /etc/qemu-ifup #define DEFAULT_NETWORK_DOWN_SCRIPT /etc/qemu-ifdown -int net_init_tap(QemuOpts *opts, const NetClientOptions *new_opts, - const char *name, VLANState *vlan); +int net_init_tap(const NetClientOptions *opts, const char *name, + VLANState *vlan); int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required); @@ -59,7 +58,7 @@ int tap_get_fd(VLANClientState *vc); struct vhost_net; struct vhost_net *tap_get_vhost_net(VLANClientState *vc); -int net_init_bridge(QemuOpts *opts, const NetClientOptions *new_opts, -const char *name, VLANState *vlan); +int net_init_bridge(const NetClientOptions *opts, const char *name, +VLANState *vlan); #endif /* QEMU_NET_TAP_H */ diff --git a/net/vde.h b/net/vde.h index 5fc17f9..ad502ef 100644 --- a/net/vde.h +++ b/net/vde.h @@ -25,13 +25,12 @@ #define QEMU_NET_VDE_H #include qemu-common.h -#include qemu-option.h #include qapi-types.h #ifdef CONFIG_VDE -int net_init_vde(QemuOpts *opts, const NetClientOptions *new_opts, - const char *name, VLANState *vlan); +int net_init_vde(const NetClientOptions *opts, const char *name, + VLANState *vlan); #endif /* CONFIG_VDE */ diff --git a/net.c b/net.c index bd2fd23..687992a 100644 --- a/net.c +++ b/net.c @@ -748,8 +748,8 @@ int net_handle_fd_param(Monitor *mon, const char *param) return fd; } -static int net_init_nic(QemuOpts *old_opts, const NetClientOptions *opts, -const char *name, VLANState *vlan) +static int net_init_nic(const NetClientOptions *opts, const char *name, +VLANState *vlan) { int idx; NICInfo *nd; @@ -813,8 +813,7 @@ static int net_init_nic(QemuOpts *old_opts, const NetClientOptions *opts, static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])( -QemuOpts *old_opts, -const NetClientOptions *new_opts, +const NetClientOptions *opts, const char *name, VLANState *vlan) = { [NET_CLIENT_OPTIONS_KIND_NIC]= net_init_nic, @@ -833,8 +832,7 @@ static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])( }; -static int net_client_init1(const void *object, int is_netdev, -QemuOpts *old_opts, Error **errp) +static int net_client_init1(const void *object, int is_netdev, Error **errp) { union { const Netdev*netdev; @@ -884,7 +882,7 @@ static int net_client_init1(const void *object, int is_netdev, vlan = qemu_find_vlan(u.net-has_vlan ? u.net-vlan : 0, true); } -if (net_client_init_fun[opts-kind](old_opts, opts,
Re: [Qemu-devel] [PATCH] Add a memory barrier to DMA functions
On Tue, 2012-05-22 at 14:14 +0300, Michael S. Tsirkin wrote: The baseline is the laptop without kvm talking to the server. The TCP_STREAM test results are: It's not a good test. The thing most affecting throughput results is how much CPU does you guest get. So as a minumum you need to measure CPU utilization on the host and divide by that. The simple fact that we don't reach the baseline while in qemu seems to be a reasonably good indication that we tend to be CPU bound already so it's not -that- relevant. It would be if we were saturating the network. But yes, I can try to do more tests tomorrow, it would be nice if you could contribute a proper test protocol (or even test on some machines) since you seem to be familiar with those measurements (and I have a very limited access to x86 gear ... basically just my laptop). Cheers, Ben.
[Qemu-devel] [PATCH 01/16] qapi: fix error propagation
From: Paolo Bonzini pbonz...@redhat.com Don't overwrite / leak previously set errors. Don't try to end a container that could not be started. Signed-off-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Laszlo Ersek ler...@redhat.com --- error.h|4 +- error.c|4 +- qapi/qapi-visit-core.c | 10 +-- tests/test-qmp-input-visitor.c | 24 +--- docs/qapi-code-gen.txt |2 + scripts/qapi-visit.py | 129 +++ 6 files changed, 102 insertions(+), 71 deletions(-) diff --git a/error.h b/error.h index 45ff6c1..6898f84 100644 --- a/error.h +++ b/error.h @@ -24,7 +24,7 @@ typedef struct Error Error; /** * Set an indirect pointer to an error given a printf-style format parameter. * Currently, qerror.h defines these error formats. This function is not - * meant to be used outside of QEMU. + * meant to be used outside of QEMU. Errors after the first are discarded. */ void error_set(Error **err, const char *fmt, ...) GCC_FMT_ATTR(2, 3); @@ -57,7 +57,7 @@ void error_set_field(Error *err, const char *field, const char *value); /** * Propagate an error to an indirect pointer to an error. This function will * always transfer ownership of the error reference and handles the case where - * dst_err is NULL correctly. + * dst_err is NULL correctly. Errors after the first are discarded. */ void error_propagate(Error **dst_err, Error *local_err); diff --git a/error.c b/error.c index a52b771..0177972 100644 --- a/error.c +++ b/error.c @@ -29,7 +29,7 @@ void error_set(Error **errp, const char *fmt, ...) Error *err; va_list ap; -if (errp == NULL) { +if (errp == NULL || *errp != NULL) { return; } @@ -132,7 +132,7 @@ bool error_is_type(Error *err, const char *fmt) void error_propagate(Error **dst_err, Error *local_err) { -if (dst_err) { +if (dst_err !*dst_err) { *dst_err = local_err; } else if (local_err) { error_free(local_err); diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c index a4e088c..df1ed5c 100644 --- a/qapi/qapi-visit-core.c +++ b/qapi/qapi-visit-core.c @@ -39,9 +39,8 @@ void visit_start_struct(Visitor *v, void **obj, const char *kind, void visit_end_struct(Visitor *v, Error **errp) { -if (!error_is_set(errp)) { -v-end_struct(v, errp); -} +assert(!error_is_set(errp)); +v-end_struct(v, errp); } void visit_start_list(Visitor *v, const char *name, Error **errp) @@ -62,9 +61,8 @@ GenericList *visit_next_list(Visitor *v, GenericList **list, Error **errp) void visit_end_list(Visitor *v, Error **errp) { -if (!error_is_set(errp)) { -v-end_list(v, errp); -} +assert(!error_is_set(errp)); +v-end_list(v, errp); } void visit_start_optional(Visitor *v, bool *present, const char *name, diff --git a/tests/test-qmp-input-visitor.c b/tests/test-qmp-input-visitor.c index c30fdc4..8f5a509 100644 --- a/tests/test-qmp-input-visitor.c +++ b/tests/test-qmp-input-visitor.c @@ -151,14 +151,22 @@ typedef struct TestStruct static void visit_type_TestStruct(Visitor *v, TestStruct **obj, const char *name, Error **errp) { -visit_start_struct(v, (void **)obj, TestStruct, name, sizeof(TestStruct), - errp); - -visit_type_int(v, (*obj)-integer, integer, errp); -visit_type_bool(v, (*obj)-boolean, boolean, errp); -visit_type_str(v, (*obj)-string, string, errp); - -visit_end_struct(v, errp); +Error *err = NULL; +if (!error_is_set(errp)) { +visit_start_struct(v, (void **)obj, TestStruct, name, sizeof(TestStruct), + err); +if (!err) { +visit_type_int(v, (*obj)-integer, integer, err); +visit_type_bool(v, (*obj)-boolean, boolean, err); +visit_type_str(v, (*obj)-string, string, err); + +/* Always call end_struct if start_struct succeeded. */ +error_propagate(errp, err); +err = NULL; +visit_end_struct(v, err); +} +error_propagate(errp, err); +} } static void test_visitor_in_struct(TestInputVisitorData *data, diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt index ad11767..cccb11e 100644 --- a/docs/qapi-code-gen.txt +++ b/docs/qapi-code-gen.txt @@ -220,6 +220,8 @@ Example: #endif mdroth@illuin:~/w/qemu2.git$ +(The actual structure of the visit_type_* functions is a bit more complex +in order to propagate errors correctly and avoid leaking memory). === scripts/qapi-commands.py === diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py index 8d4e94a..61cf586 100644 --- a/scripts/qapi-visit.py +++ b/scripts/qapi-visit.py @@ -17,14 +17,37 @@ import os import getopt import errno -def generate_visit_struct_body(field_prefix, members): -ret = +def generate_visit_struct_body(field_prefix, name,
[Qemu-devel] [PATCH 12/16] convert net_init_socket() to NetClientOptions
I reverse engineered the following permissions between the -socket sub-options: fd listen connect mcast udp | localaddr fd x . .. . | . listen . x .. . | . connect. . x. . | . mcast . . .x . | x udp. . .. x | x ---+ localaddr . . .x x x I transformed the code accordingly. The real fix would be to embed fd, listen, connect, mcast and udp in a separate union. However OptsVisitor's enum parser only supports the type=XXX QemuOpt instance as union discriminator. Signed-off-by: Laszlo Ersek ler...@redhat.com --- net/socket.c | 119 +- 1 files changed, 43 insertions(+), 76 deletions(-) diff --git a/net/socket.c b/net/socket.c index 563447d..e3cba20 100644 --- a/net/socket.c +++ b/net/socket.c @@ -586,101 +586,68 @@ static int net_socket_udp_init(VLANState *vlan, return 0; } -int net_init_socket(QemuOpts *opts, const NetClientOptions *new_opts, +int net_init_socket(QemuOpts *old_opts, const NetClientOptions *opts, const char *name, VLANState *vlan) { -if (qemu_opt_get(opts, fd)) { -int fd; +const NetdevSocketOptions *sock; -if (qemu_opt_get(opts, listen) || -qemu_opt_get(opts, connect) || -qemu_opt_get(opts, mcast) || -qemu_opt_get(opts, localaddr)) { -error_report(listen=, connect=, mcast= and localaddr= is invalid with fd=); -return -1; -} +assert(opts-kind == NET_CLIENT_OPTIONS_KIND_SOCKET); +sock = opts-socket; -fd = net_handle_fd_param(cur_mon, qemu_opt_get(opts, fd)); -if (fd == -1) { -return -1; -} +if (sock-has_fd + sock-has_listen + sock-has_connect + sock-has_mcast + +sock-has_udp != 1) { +error_report(exactly one of fd=, listen=, connect=, mcast= or udp= + is required); +return -1; +} -if (!net_socket_fd_init(vlan, socket, name, fd, 1)) { -return -1; -} -} else if (qemu_opt_get(opts, listen)) { -const char *listen; - -if (qemu_opt_get(opts, fd) || -qemu_opt_get(opts, connect) || -qemu_opt_get(opts, mcast) || -qemu_opt_get(opts, localaddr)) { -error_report(fd=, connect=, mcast= and localaddr= is invalid with listen=); -return -1; -} +if (sock-has_localaddr !sock-has_mcast !sock-has_udp) { +error_report(localaddr= is only valid with mcast= or udp=); +return -1; +} -listen = qemu_opt_get(opts, listen); +if (sock-has_fd) { +int fd; -if (net_socket_listen_init(vlan, socket, name, listen) == -1) { -return -1; -} -} else if (qemu_opt_get(opts, connect)) { -const char *connect; - -if (qemu_opt_get(opts, fd) || -qemu_opt_get(opts, listen) || -qemu_opt_get(opts, mcast) || -qemu_opt_get(opts, localaddr)) { -error_report(fd=, listen=, mcast= and localaddr= is invalid with connect=); +fd = net_handle_fd_param(cur_mon, sock-fd); +if (fd == -1 || !net_socket_fd_init(vlan, socket, name, fd, 1)) { return -1; } +return 0; +} -connect = qemu_opt_get(opts, connect); - -if (net_socket_connect_init(vlan, socket, name, connect) == -1) { +if (sock-has_listen) { +if (net_socket_listen_init(vlan, socket, name, sock-listen) == -1) { return -1; } -} else if (qemu_opt_get(opts, mcast)) { -const char *mcast, *localaddr; +return 0; +} -if (qemu_opt_get(opts, fd) || -qemu_opt_get(opts, connect) || -qemu_opt_get(opts, listen)) { -error_report(fd=, connect= and listen= is invalid with mcast=); +if (sock-has_connect) { +if (net_socket_connect_init(vlan, socket, name, sock-connect) == +-1) { return -1; } +return 0; +} -mcast = qemu_opt_get(opts, mcast); -localaddr = qemu_opt_get(opts, localaddr); - -if (net_socket_mcast_init(vlan, socket, name, mcast, localaddr) == -1) { -return -1; -} -} else if (qemu_opt_get(opts, udp)) { -const char *udp, *localaddr; - -if (qemu_opt_get(opts, fd) || -qemu_opt_get(opts, connect) || -qemu_opt_get(opts, listen) || -qemu_opt_get(opts, mcast)) { -error_report(fd=, connect=, listen= - and mcast= is invalid with udp=); +if (sock-has_mcast) { +/* if sock-localaddr is missing, it has been initialized to all bits + * zero */ +if
[Qemu-devel] [PATCHv3 2/2] envlist.c: handle strdup failure
From: Jim Meyering meyer...@redhat.com Without this, envlist_to_environ may silently fail to copy all strings into the destination buffer, and both callers would leak any env strings allocated after a failing strdup, because the freeing code stops at the first NULL pointer. Signed-off-by: Jim Meyering meyer...@redhat.com --- envlist.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/envlist.c b/envlist.c index e44889b..df5c723 100644 --- a/envlist.c +++ b/envlist.c @@ -234,8 +234,16 @@ envlist_to_environ(const envlist_t *envlist, size_t *count) return (NULL); for (entry = envlist-el_entries.lh_first; entry != NULL; -entry = entry-ev_link.le_next) { -*(penv++) = strdup(entry-ev_var); + entry = entry-ev_link.le_next, penv++) { +*penv = strdup(entry-ev_var); +if (*penv == NULL) { +char **e = env; +while (e = penv) { +free(*e++); +} +free(env); +return NULL; +} } *penv = NULL; /* NULL terminate the list */ -- 1.7.10.2.552.gaa3bb87
[Qemu-devel] [PATCH qom-next 5/5] target-i386: move reset callback to cpu.c
Moving reset callback into cpu object from board level will allow properly create object during run-time (hotplug). When reset over QOM hierarchy is implemented, this reset callback should be removed. Signed-off-by: Igor Mammedov imamm...@redhat.com --- hw/pc.c |7 --- target-i386/cpu.c |8 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/hw/pc.c b/hw/pc.c index 677f9e0..70dd0e6 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -888,12 +888,6 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level) } } -static void pc_cpu_reset(void *opaque) -{ -X86CPU *cpu = opaque; -cpu_reset(CPU(cpu)); -} - void pc_cpus_init(const char *cpu_model) { X86CPU *cpu; @@ -904,7 +898,6 @@ void pc_cpus_init(const char *cpu_model) if (cpu == NULL) { exit(1); } -qemu_register_reset(pc_cpu_reset, cpu); } } diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 0e804ea..87f4f5a 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -1716,6 +1716,13 @@ static void x86_cpu_reset(CPUState *s) env-halted = !(cpu_get_apic_base(env-apic_state) MSR_IA32_APICBASE_BSP); } +/* TODO: remove me, when reset over QOM tree is implemented */ +static void x86_cpu_machine_reset_cb(void *opaque) +{ +X86CPU *cpu = opaque; +cpu_reset(CPU(cpu)); +} + static void mce_init(X86CPU *cpu) { CPUX86State *cenv = cpu-env; @@ -1812,6 +1819,7 @@ void x86_cpu_realize(Object *obj, Error **errp) mce_init(cpu); qemu_init_vcpu(env); +qemu_register_reset(x86_cpu_machine_reset_cb, cpu); cpu_reset(CPU(cpu)); } -- 1.7.7.6
[Qemu-devel] [PATCH 03/16] expose QemuOpt and QemuOpts struct definitions to interested parties
The only clients should be the existent qemu-option.c, and the upcoming qapi/opts-visitor.c. Signed-off-by: Laszlo Ersek ler...@redhat.com --- qemu-option-internal.h | 53 qemu-option.c | 24 + 2 files changed, 54 insertions(+), 23 deletions(-) create mode 100644 qemu-option-internal.h diff --git a/qemu-option-internal.h b/qemu-option-internal.h new file mode 100644 index 000..19fdc1c --- /dev/null +++ b/qemu-option-internal.h @@ -0,0 +1,53 @@ +/* + * Commandline option parsing functions + * + * Copyright (c) 2003-2008 Fabrice Bellard + * Copyright (c) 2009 Kevin Wolf kw...@redhat.com + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#ifndef QEMU_OPTIONS_INTERNAL_H +#define QEMU_OPTIONS_INTERNAL_H + +#include qemu-option.h + +struct QemuOpt { +const char *name; +const char *str; + +const QemuOptDesc *desc; +union { +bool boolean; +uint64_t uint; +} value; + +QemuOpts *opts; +QTAILQ_ENTRY(QemuOpt) next; +}; + +struct QemuOpts { +char *id; +QemuOptsList *list; +Location loc; +QTAILQ_HEAD(QemuOptHead, QemuOpt) head; +QTAILQ_ENTRY(QemuOpts) next; +}; + +#endif diff --git a/qemu-option.c b/qemu-option.c index bb3886c..8334190 100644 --- a/qemu-option.c +++ b/qemu-option.c @@ -29,9 +29,9 @@ #include qemu-common.h #include qemu-error.h #include qemu-objects.h -#include qemu-option.h #include error.h #include qerror.h +#include qemu-option-internal.h /* * Extracts the name of an option from the parameter string (p points at the @@ -511,28 +511,6 @@ void print_option_help(QEMUOptionParameter *list) /* -- */ -struct QemuOpt { -const char *name; -const char *str; - -const QemuOptDesc *desc; -union { -bool boolean; -uint64_t uint; -} value; - -QemuOpts *opts; -QTAILQ_ENTRY(QemuOpt) next; -}; - -struct QemuOpts { -char *id; -QemuOptsList *list; -Location loc; -QTAILQ_HEAD(QemuOptHead, QemuOpt) head; -QTAILQ_ENTRY(QemuOpts) next; -}; - static QemuOpt *qemu_opt_find(QemuOpts *opts, const char *name) { QemuOpt *opt; -- 1.7.1
[Qemu-devel] [PATCH 09/16] convert net_init_nic() to NetClientOptions
Signed-off-by: Laszlo Ersek ler...@redhat.com --- net.c | 39 ++- 1 files changed, 22 insertions(+), 17 deletions(-) diff --git a/net.c b/net.c index 5ac5cf0..bd2fd23 100644 --- a/net.c +++ b/net.c @@ -748,12 +748,15 @@ int net_handle_fd_param(Monitor *mon, const char *param) return fd; } -static int net_init_nic(QemuOpts *opts, const NetClientOptions *new_opts, +static int net_init_nic(QemuOpts *old_opts, const NetClientOptions *opts, const char *name, VLANState *vlan) { int idx; NICInfo *nd; -const char *netdev; +const NetLegacyNicOptions *nic; + +assert(opts-kind == NET_CLIENT_OPTIONS_KIND_NIC); +nic = opts-nic; idx = nic_get_free_idx(); if (idx == -1 || nb_nics = MAX_NICS) { @@ -765,10 +768,10 @@ static int net_init_nic(QemuOpts *opts, const NetClientOptions *new_opts, memset(nd, 0, sizeof(*nd)); -if ((netdev = qemu_opt_get(opts, netdev))) { -nd-netdev = qemu_find_netdev(netdev); +if (nic-has_netdev) { +nd-netdev = qemu_find_netdev(nic-netdev); if (!nd-netdev) { -error_report(netdev '%s' not found, netdev); +error_report(netdev '%s' not found, nic-netdev); return -1; } } else { @@ -778,26 +781,28 @@ static int net_init_nic(QemuOpts *opts, const NetClientOptions *new_opts, if (name) { nd-name = g_strdup(name); } -if (qemu_opt_get(opts, model)) { -nd-model = g_strdup(qemu_opt_get(opts, model)); +if (nic-has_model) { +nd-model = g_strdup(nic-model); } -if (qemu_opt_get(opts, addr)) { -nd-devaddr = g_strdup(qemu_opt_get(opts, addr)); +if (nic-has_addr) { +nd-devaddr = g_strdup(nic-addr); } -if (qemu_opt_get(opts, macaddr) -net_parse_macaddr(nd-macaddr.a, qemu_opt_get(opts, macaddr)) 0) { +if (nic-has_macaddr +net_parse_macaddr(nd-macaddr.a, nic-macaddr) 0) { error_report(invalid syntax for ethernet address); return -1; } qemu_macaddr_default_if_unset(nd-macaddr); -nd-nvectors = qemu_opt_get_number(opts, vectors, - DEV_NVECTORS_UNSPECIFIED); -if (nd-nvectors != DEV_NVECTORS_UNSPECIFIED -(nd-nvectors 0 || nd-nvectors 0x7ff)) { -error_report(invalid # of vectors: %d, nd-nvectors); -return -1; +if (nic-has_vectors) { +if (nic-vectors 0x7ff) { +error_report(invalid # of vectors: %PRId64, nic-vectors); +return -1; +} +nd-nvectors = nic-vectors; +} else { +nd-nvectors = DEV_NVECTORS_UNSPECIFIED; } nd-used = 1; -- 1.7.1
Re: [Qemu-devel] [PATCH 2/2] fdc: fix media detection
Am 22.05.2012 12:59, schrieb Pavel Hrdina: We have to set up 'media_changed' after guest start so floppy driver could detect that there is no media in drive. For this purpose we call 'fdctrl_change_cb' instead of 'fd_revalidate' in 'fdctrl_connect_drives'. 'fd_revalidate' is called inside 'fdctrl_change_cb'. In 'fdctrl_handle_seek' we always set current track because we don't care if there is media inserted or not. Signed-off-by: Pavel Hrdina phrd...@redhat.com Can you please add a qtest case that shows the problems that you're fixing in this series? diff --git a/hw/fdc.c b/hw/fdc.c index cb4cd25..337b35a 100644 --- a/hw/fdc.c +++ b/hw/fdc.c @@ -1617,11 +1617,7 @@ static void fdctrl_handle_seek(FDCtrl *fdctrl, int direction) /* The seek command just sends step pulses to the drive and doesn't care if * there is a medium inserted of if it's banging the head against the drive. */ -if (fdctrl-fifo[2] cur_drv-max_track) { -cur_drv-track = cur_drv-max_track; -} else { -cur_drv-track = fdctrl-fifo[2]; -} +cur_drv-track = fdctrl-fifo[2]; Why is it okay to have cur_drv-track point outside the floppy? Won't it mess up future calculations? Not all other places check it again cur_drv-max_track. Kevin
Re: [Qemu-devel] [RFC PATCH 0/4] block: file descriptor passing using -filefd and getfd_file
On 05/22/2012 02:18 AM, Kevin Wolf wrote: This patch series adds the -filefd command-line option and the getfd_file monitor command. This will enable libvirt to open a file and push the corresponding filename and file descriptor to QEMU. When QEMU needs to open a file, it will first check if the file descriptor was passed by either of these methods before attempting to actually open the file. I thought we decided to avoid making some file names magic, and instead go for the obvious /dev/fd/42? This doesn't make some file names magic, it makes all file names magic. In other words, _every_ call to open() first checks the database for an existing fd for the same file name. -- Eric Blake ebl...@redhat.com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH] Add a memory barrier to DMA functions
On Tue, May 22, 2012 at 09:41:41PM +1000, Benjamin Herrenschmidt wrote: On Tue, 2012-05-22 at 14:14 +0300, Michael S. Tsirkin wrote: The baseline is the laptop without kvm talking to the server. The TCP_STREAM test results are: It's not a good test. The thing most affecting throughput results is how much CPU does you guest get. So as a minumum you need to measure CPU utilization on the host and divide by that. The simple fact that we don't reach the baseline while in qemu seems to be a reasonably good indication that we tend to be CPU bound already so it's not -that- relevant. It would be if we were saturating the network. But yes, I can try to do more tests tomorrow, it would be nice if you could contribute a proper test protocol (or even test on some machines) since you seem to be familiar with those measurements (and I have a very limited access to x86 gear ... basically just my laptop). Cheers, Ben. I have a deja vu. Amos sent perf results when you argued about exactly the same issue in guest virtio. Delta was small but measureable. At the moment I have no free time or free hardware to redo the same work all over again. It's a well known fact that actual memory barrier is slow on x86 CPUs. You can't see it with network on your laptop? Write a microbenchmark. -- MST
[Qemu-devel] [PATCH 13/16] convert net_init_vde() to NetClientOptions
Signed-off-by: Laszlo Ersek ler...@redhat.com --- net/vde.c | 37 + 1 files changed, 29 insertions(+), 8 deletions(-) diff --git a/net/vde.c b/net/vde.c index 8e60f68..35e8113 100644 --- a/net/vde.c +++ b/net/vde.c @@ -110,20 +110,41 @@ static int net_vde_init(VLANState *vlan, const char *model, return 0; } -int net_init_vde(QemuOpts *opts, const NetClientOptions *new_opts, +int net_init_vde(QemuOpts *old_opts, const NetClientOptions *opts, const char *name, VLANState *vlan) { -const char *sock; -const char *group; int port, mode; -sock = qemu_opt_get(opts, sock); -group = qemu_opt_get(opts, group); +const NetdevVdeOptions *vde; -port = qemu_opt_get_number(opts, port, 0); -mode = qemu_opt_get_number(opts, mode, 0700); +assert(opts-kind == NET_CLIENT_OPTIONS_KIND_VDE); +vde = opts-vde; -if (net_vde_init(vlan, vde, name, sock, port, group, mode) == -1) { +if (vde-has_port) { +if (vde-port INT_MAX) { +error_report(invalid port: %PRId64, vde-port); +return -1; +} +port = vde-port; +} +else { +port = 0; +} + +if (vde-has_mode) { +if (vde-mode INT_MAX) { +error_report(invalid mode: %PRId64, vde-mode); +return -1; +} +mode = vde-mode; +} +else { +mode = 0700; +} + +/* missing optional values have been initialized to all bits zero */ +if (net_vde_init(vlan, vde, name, vde-sock, port, vde-group, mode) == + -1) { return -1; } -- 1.7.1
Re: [Qemu-devel] [RFC PATCH 0/4] block: file descriptor passing using -filefd and getfd_file
Am 22.05.2012 14:02, schrieb Eric Blake: On 05/22/2012 02:18 AM, Kevin Wolf wrote: This patch series adds the -filefd command-line option and the getfd_file monitor command. This will enable libvirt to open a file and push the corresponding filename and file descriptor to QEMU. When QEMU needs to open a file, it will first check if the file descriptor was passed by either of these methods before attempting to actually open the file. I thought we decided to avoid making some file names magic, and instead go for the obvious /dev/fd/42? This doesn't make some file names magic, it makes all file names magic. In other words, _every_ call to open() first checks the database for an existing fd for the same file name. Depends on your definition. You call every database lookup magic, I only considered cases where the database actually contains something. But no matter if some or all, there's magic and I dislike that. Kevin
Re: [Qemu-devel] [PATCH] ISCSI: Switch to using READ16/WRITE16 for I/O to the LUN.
Il 22/05/2012 12:10, Ronnie Sahlberg ha scritto: This allows using LUNs bigger than 2TB. Applied to scsi-next for 1.2, thanks. Paolo Signed-off-by: Ronnie Sahlberg ronniesahlb...@gmail.com --- block/iscsi.c | 101 trace-events |4 +- 2 files changed, 74 insertions(+), 31 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index f956824..ed1ad7b 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -25,6 +25,7 @@ #include config-host.h #include poll.h +#include arpa/inet.h #include qemu-common.h #include qemu-error.h #include block_int.h @@ -168,12 +169,12 @@ iscsi_readv_writev_bh_cb(void *p) static void -iscsi_aio_write10_cb(struct iscsi_context *iscsi, int status, +iscsi_aio_write16_cb(struct iscsi_context *iscsi, int status, void *command_data, void *opaque) { IscsiAIOCB *acb = opaque; -trace_iscsi_aio_write10_cb(iscsi, status, acb, acb-canceled); +trace_iscsi_aio_write16_cb(iscsi, status, acb, acb-canceled); g_free(acb-buf); @@ -186,7 +187,7 @@ iscsi_aio_write10_cb(struct iscsi_context *iscsi, int status, acb-status = 0; if (status 0) { -error_report(Failed to write10 data to iSCSI lun. %s, +error_report(Failed to write16 data to iSCSI lun. %s, iscsi_get_error(iscsi)); acb-status = -EIO; } @@ -211,12 +212,9 @@ iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num, struct iscsi_context *iscsi = iscsilun-iscsi; IscsiAIOCB *acb; size_t size; -int fua = 0; - -/* set FUA on writes when cache mode is write through */ -if (!(bs-open_flags BDRV_O_CACHE_WB)) { -fua = 1; -} +uint32_t num_sectors; +uint64_t lba; +struct iscsi_data data; acb = qemu_aio_get(iscsi_aio_pool, bs, cb, opaque); trace_iscsi_aio_writev(iscsi, sector_num, nb_sectors, opaque, acb); @@ -226,18 +224,44 @@ iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num, acb-canceled = 0; -/* XXX we should pass the iovec to write10 to avoid the extra copy */ +/* XXX we should pass the iovec to write16 to avoid the extra copy */ /* this will allow us to get rid of 'buf' completely */ size = nb_sectors * BDRV_SECTOR_SIZE; acb-buf = g_malloc(size); qemu_iovec_to_buffer(acb-qiov, acb-buf); -acb-task = iscsi_write10_task(iscsi, iscsilun-lun, acb-buf, size, - sector_qemu2lun(sector_num, iscsilun), - fua, 0, iscsilun-block_size, - iscsi_aio_write10_cb, acb); + + +acb-task = malloc(sizeof(struct scsi_task)); if (acb-task == NULL) { -error_report(iSCSI: Failed to send write10 command. %s, - iscsi_get_error(iscsi)); +error_report(iSCSI: Failed to allocate task for scsi WRITE16 + command. %s, iscsi_get_error(iscsi)); +qemu_aio_release(acb); +return NULL; +} +memset(acb-task, 0, sizeof(struct scsi_task)); + +acb-task-xfer_dir = SCSI_XFER_WRITE; +acb-task-cdb_size = 16; +acb-task-cdb[0] = 0x8a; +if (!(bs-open_flags BDRV_O_CACHE_WB)) { +/* set FUA on writes when cache mode is write through */ +acb-task-cdb[1] |= 0x04; +} +lba = sector_qemu2lun(sector_num, iscsilun); +*(uint32_t *)acb-task-cdb[2] = htonl(lba 32); +*(uint32_t *)acb-task-cdb[6] = htonl(lba 0x); +num_sectors = size / iscsilun-block_size; +*(uint32_t *)acb-task-cdb[10] = htonl(num_sectors); +acb-task-expxferlen = size; + +data.data = acb-buf; +data.size = size; + +if (iscsi_scsi_command_async(iscsi, iscsilun-lun, acb-task, + iscsi_aio_write16_cb, + data, + acb) != 0) { +scsi_free_scsi_task(acb-task); g_free(acb-buf); qemu_aio_release(acb); return NULL; @@ -249,12 +273,12 @@ iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num, } static void -iscsi_aio_read10_cb(struct iscsi_context *iscsi, int status, +iscsi_aio_read16_cb(struct iscsi_context *iscsi, int status, void *command_data, void *opaque) { IscsiAIOCB *acb = opaque; -trace_iscsi_aio_read10_cb(iscsi, status, acb, acb-canceled); +trace_iscsi_aio_read16_cb(iscsi, status, acb, acb-canceled); if (acb-canceled != 0) { qemu_aio_release(acb); @@ -265,7 +289,7 @@ iscsi_aio_read10_cb(struct iscsi_context *iscsi, int status, acb-status = 0; if (status != 0) { -error_report(Failed to read10 data from iSCSI lun. %s, +error_report(Failed to read16 data from iSCSI lun. %s, iscsi_get_error(iscsi)); acb-status = -EIO;
Re: [Qemu-devel] [PATCH] ISCSI: We need to call qemu_notify_event() everytime we update which events we need to be notified for.
Il 22/05/2012 11:56, Ronnie Sahlberg ha scritto: Otherwise, If we add an event for -is-writeable but the socket is already writeable there may be a short delay before the event callback is actually triggered. Those delays would in particular hurt performance during BIOS boot and when the GRUB bootloader reads the kernel and initrd. Signed-off-by: Ronnie Sahlberg ronniesahlb...@gmail.com --- block/iscsi.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index d37c4ee..f956824 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -109,6 +109,13 @@ iscsi_set_events(IscsiLun *iscsilun) (iscsi_which_events(iscsi) POLLOUT) ? iscsi_process_write : NULL, iscsi_process_flush, iscsilun); + +/* If we just added the event for writeable we must call + and the socket is already writeable the callback might + not be invoked until after a short delay unless we call + qemu_notify_event(). + */ +qemu_notify_event(); } static void Thanks, applied to SCSI branch for 1.1. Paolo
[Qemu-devel] ARM QEMU/KVM and TrustZone
Historically for QEMU we haven't implemented TrustZone support even though we claim to emulate CPUs that provide it. Instead we provide a CPU which mostly looks like a variant of the real thing without the TrustZone feature. We then bolt on a few extra cp15 registers (eg the SCR) as a pragmatic move to get Linux guests to run. Now we're also dealing with KVM on ARM I'd like to define things a bit more solidly so KVM and TCG agree on what the CPU model they present is. There are several possible environments we could provide to a guest: (1) a CPU with full TrustZone support (2) a CPU without TrustZone at all (3) a TZ CPU running in NonSecure PL0/PL1 (4) a TZ CPU running in Secure PL0/PL1 In some ways (1) is the purist solution -- emulate exactly what the hardware does. However: * on TCG it would require a lot of work, including new functionality in core QEMU (to support having different CPU cores being able to see different views of memory, and having the S/NS attribute attached to memory transactions) * it isn't possible in KVM, because the ARM Virtualization Extensions don't allow you to fake the CPSR a guest sees, and so you can't make the guest believe it is in Monitor mode Option (2) is architecturally sanctioned (ie TrustZone is an optional feature, not mandatory), but it doesn't correspond to real CPUs, in that the hardware Cortex-A8/A9/A15 always have TrustZone. So we're modelling something that doesn't really exist. Options (3) and (4) correspond to the environment an OS guest typically actually uses on hardware. For ARM's devboards (versatile express etc) Linux runs in the Secure world but it doesn't actually use any of the TrustZone functionality, it's just a give me full access to everything setup. For just about every other ARM system, the boot rom or equivalent keeps Secure world to itself, and the OS kernel runs in the NonSecure world. (This typically means that the boot rom provides a set of board-specific entry points via the Secure Monitor Call (SMC) instruction for doing operations like invalidate whole L2 cache which require secure privileges.) Proposal: My suggestion is that we present the guest with a view that looks like a sort of superset of (2) (3) and (4), ie sufficient that a guest expecting any of those environments can run. In particular: * no cp15 registers have secure/nonsecure banking * there is only one memory space visible * secure-access-only permissions are not enforced * the handful of only-in-trustzone registers are implemented (eg VBAR, MVBAR) * we implement a fake monitor mode The aim of the fake monitor mode is to allow us to provide fake qemu-specific bootroms which implement whatever the board's SMC interface is, without having to write specific KVM kernel code for each board. So we don't have to run arbitrary secure-world guest code. The rules are: * on an SMC instruction we enter the guest at the SMC vector as defined by the MVBAR (monitor vector base address register) * we actually run with the same access permissions as above (and under KVM if you look at CPSR.M it will tell you you're in Supervisor mode) * return from the SMC is via a standard exception return insn * we don't implement the separate memory space for the secure world. (This implies that you need to find space in the non-secure world's physical memory map for the bootrom shim; not a big deal I think since we already have a requirement for some space to put QEMU's arm_boot trivial bootloader.) The code written for this fake monitor mode environment is likely to be able to work OK if we ever implement full TrustZone support in TCG QEMU. Work required: * Documentation: the general principles as listed above * TCG: make sure we have implementations of all the TZ registers * TCG: implement the SMC and fake-monitor-mode (I already have patches from Nokia in the qemu-linaro stack which can be cleaned up and used here) * KVM: implement emulation of MVBAR * KVM: set the config bit so SMC is trapped to the hypervisor and causes guest restart at the right entrypoint * KVM: if there turns out to be anything that fake-monitor-mode needs to do that requires Hyp privilege we'd need a hypercall ABI, but I can't currently think of anything I think that's basically a fairly small set of work to formalise the approach we're already taking in practice, and make it a little more flexible. Opinions? -- PMM
[Qemu-devel] [PATCH 1/2] scsi: declare vmstate_info_scsi_requests to be static
From: Jim Meyering meyer...@redhat.com Signed-off-by: Jim Meyering meyer...@redhat.com --- hw/scsi-bus.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c index 8ab9bcd..f10f3ec 100644 --- a/hw/scsi-bus.c +++ b/hw/scsi-bus.c @@ -1561,7 +1561,7 @@ static int get_scsi_requests(QEMUFile *f, void *pv, size_t size) return 0; } -const VMStateInfo vmstate_info_scsi_requests = { +static const VMStateInfo vmstate_info_scsi_requests = { .name = scsi-requests, .get = get_scsi_requests, .put = put_scsi_requests, -- 1.7.10.1
[Qemu-devel] [PULL 1.1 0/2] SCSI patches for 1.1.0-rc3
The following changes since commit 76ee152a86d5f2533443ce4d2be6fe253cfb3c45: Update version to 1.1.0-rc2 (2012-05-14 17:56:50 -0500) are available in the git repository at: git://github.com/bonzini/qemu.git scsi-next for you to fetch changes up to e1a2d34f4abd8a117f5c5a25a5bb2e67d597de23: ISCSI: call qemu_notify_event() after updating events (2012-05-22 14:14:05 +0200) Jim Meyering (1): scsi: declare vmstate_info_scsi_requests to be static Ronnie Sahlberg (1): ISCSI: call qemu_notify_event() after updating events block/iscsi.c |7 +++ hw/scsi-bus.c |2 +- 2 files changed, 8 insertions(+), 1 deletion(-) -- 1.7.10.1