[PATCH] KVM: Reduce stack usage in kvm_arch_vcpu_ioctl()
From: Dave Hansen [EMAIL PROTECTED]

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
Signed-off-by: Avi Kivity [EMAIL PROTECTED]

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 87d4342..a6299e6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1542,13 +1542,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	struct kvm_vcpu *vcpu = filp->private_data;
 	void __user *argp = (void __user *)arg;
 	int r;
+	struct kvm_lapic_state *lapic = NULL;
 
 	switch (ioctl) {
 	case KVM_GET_LAPIC: {
-		struct kvm_lapic_state lapic;
+		lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
 
-		memset(&lapic, 0, sizeof lapic);
-		r = kvm_vcpu_ioctl_get_lapic(vcpu, &lapic);
+		r = -ENOMEM;
+		if (!lapic)
+			goto out;
+		r = kvm_vcpu_ioctl_get_lapic(vcpu, lapic);
 		if (r)
 			goto out;
 		r = -EFAULT;
@@ -1558,12 +1561,14 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		break;
 	}
 	case KVM_SET_LAPIC: {
-		struct kvm_lapic_state lapic;
-
+		lapic = kmalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
+		r = -ENOMEM;
+		if (!lapic)
+			goto out;
 		r = -EFAULT;
-		if (copy_from_user(&lapic, argp, sizeof lapic))
+		if (copy_from_user(lapic, argp, sizeof(struct kvm_lapic_state)))
 			goto out;
-		r = kvm_vcpu_ioctl_set_lapic(vcpu, &lapic);;
+		r = kvm_vcpu_ioctl_set_lapic(vcpu, lapic);
 		if (r)
 			goto out;
 		r = 0;
@@ -1661,6 +1666,8 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = -EINVAL;
 	}
 out:
+	if (lapic)
+		kfree(lapic);
 	return r;
 }
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
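The patch above moves a large structure off the kernel stack, allocates it on the heap, and frees it at a single exit label. The following is a minimal userspace sketch of the same pattern, not the kernel code itself: struct big_state and copy_state() are hypothetical names, and calloc()/free() stand in for kzalloc()/kfree().

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a large state block such as kvm_lapic_state. */
struct big_state {
	char regs[1024];
};

/*
 * Copy a large state block through a heap buffer instead of a stack
 * variable, releasing the buffer at one exit label.
 * Returns 0 on success or a negative errno value.
 */
int copy_state(struct big_state *dst, const struct big_state *src)
{
	struct big_state *tmp = NULL;
	int r;

	r = -EINVAL;
	if (!dst || !src)
		goto out;

	tmp = calloc(1, sizeof(*tmp));	/* userspace analogue of kzalloc() */
	r = -ENOMEM;
	if (!tmp)
		goto out;

	memcpy(tmp, src, sizeof(*tmp));
	memcpy(dst, tmp, sizeof(*dst));
	r = 0;
out:
	free(tmp);	/* free(NULL) is a no-op, so no NULL check is needed */
	return r;
}
```

As with free(), kfree(NULL) is a no-op, so the `if (lapic)` test before kfree() in the patch is harmless but not strictly required.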
SR-IOV: patches are available for Linux kernel
Greetings,

Patches to support the Single Root I/O Virtualization (SR-IOV) capability are available for the Linux 2.6 development tree. KVM and Xen support will follow soon!

---

The Single Root I/O Virtualization (SR-IOV) capability defined by PCI-SIG is intended to enable multiple system software instances to share PCI hardware resources. A PCI device that supports this capability can be extended to one Physical Function plus multiple Virtual Functions. The Physical Function, which can be considered the real PCI device, reflects the hardware instance and manages all physical resources. Virtual Functions are associated with a Physical Function and share physical resources with it. Software can control allocation of Virtual Functions via registers encapsulated in the capability.

The following patches add SR-IOV capability support to the Linux kernel. With these patches, a PCI device with the capability can be turned into multiple devices from the software perspective.

[PATCH 1/4] PCI: export pci_read_base and add pci_update_base
[PATCH 2/4] PCI: support ARI capability
[PATCH 3/4] PCI: support SR-IOV capability
[PATCH 4/4] PCI: document SR-IOV

The SR-IOV specification can be found at
http://www.pcisig.com/members/downloads/specifications/iov/sr-iov1.0_11Sep07.pdf

--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
SR-IOV: patches are available for Linux kernel [1/4]
[PATCH 1/4] PCI: export pci_read_base and add pci_update_base

Export pci_read_base; add pci_update_base for PCI BAR update.

Signed-off-by: Yu Zhao [EMAIL PROTECTED]
Signed-off-by: Eddie Dong [EMAIL PROTECTED]

---
 drivers/pci/probe.c     | 25
 drivers/pci/setup-res.c | 74 +++
 include/linux/pci.h     | 12
 3 files changed, 67 insertions(+), 44 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 7098dfb..d030996 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -181,13 +181,6 @@ static u64 pci_size(u64 base, u64 maxbas
 	return size;
 }
 
-enum pci_bar_type {
-	pci_bar_unknown,	/* Standard PCI BAR probe */
-	pci_bar_io,		/* An io port BAR */
-	pci_bar_mem32,		/* A 32-bit memory BAR */
-	pci_bar_mem64,		/* A 64-bit memory BAR */
-};
-
 static inline enum pci_bar_type decode_bar(struct resource *res, u32 bar)
 {
 	if ((bar & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
@@ -202,11 +195,18 @@ static inline enum pci_bar_type decode_b
 	return pci_bar_mem32;
 }
 
-/*
- * If the type is not unknown, we assume that the lowest bit is 'enable'.
+/**
+ * pci_read_base - read a PCI BAR
+ * @dev: PCI device
+ * @type: type of BAR
+ * @res: resource buffer to be filled in
+ * @pos: BAR position in config space
+ *
  * Returns 1 if the BAR was 64-bit and 0 if it was 32-bit.
+ *
+ * If the type is not unknown, we assume that the lowest bit is 'enable'.
  */
-static int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
+int pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 			struct resource *res, unsigned int pos)
 {
 	u32 l, sz, mask;
@@ -299,6 +299,7 @@ static int __pci_read_base(struct pci_de
 	res->flags = 0;
 	goto out;
 }
+EXPORT_SYMBOL_GPL(pci_read_base);
 
 static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom)
 {
@@ -307,7 +308,7 @@ static void pci_read_bases(struct pci_de
 	for (pos = 0; pos < howmany; pos++) {
 		struct resource *res = &dev->resource[pos];
 		reg = PCI_BASE_ADDRESS_0 + (pos << 2);
-		pos += __pci_read_base(dev, pci_bar_unknown, res, reg);
+		pos += pci_read_base(dev, pci_bar_unknown, res, reg);
 	}
 
 	if (rom) {
@@ -316,7 +317,7 @@ static void pci_read_bases(struct pci_de
 		res->flags = IORESOURCE_MEM | IORESOURCE_PREFETCH |
 				IORESOURCE_READONLY | IORESOURCE_CACHEABLE |
 				IORESOURCE_SIZEALIGN;
-		__pci_read_base(dev, pci_bar_mem32, res, rom);
+		pci_read_base(dev, pci_bar_mem32, res, rom);
 	}
 }
 
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 1a5fc83..059de16 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -26,11 +26,20 @@ #include <linux/slab.h>
 #include "pci.h"
 
 
-void pci_update_resource(struct pci_dev *dev, struct resource *res, int resno)
+/**
+ * pci_update_base - update a PCI BAR
+ * @dev: PCI device
+ * @type: type of BAR
+ * @res: resource info used to update BAR
+ * @pos: BAR position in config space
+ *
+ * If the type is not unknown, we assume that the lowest bit is 'enable'.
+ */
+void pci_update_base(struct pci_dev *dev, enum pci_bar_type type,
+			struct resource *res, unsigned int pos)
 {
 	struct pci_bus_region region;
 	u32 new, check, mask;
-	int reg;
 
 	/*
 	 * Ignore resources for unimplemented BARs and unused resource slots
@@ -49,56 +58,57 @@ void pci_update_resource(struct pci_dev
 	pcibios_resource_to_bus(dev, &region, res);
 
-	dev_dbg(&dev->dev, "BAR %d: got res [%#llx-%#llx] bus [%#llx-%#llx] "
-		"flags %#lx\n", resno,
-		 (unsigned long long)res->start,
-		 (unsigned long long)res->end,
-		 (unsigned long long)region.start,
-		 (unsigned long long)region.end,
-		 (unsigned long)res->flags);
+	dev_dbg(&dev->dev, "BAR at %d: got res [%#llx-%#llx] bus [%#llx-%#llx] "
+		"flags %#lx\n", pos, (unsigned long long)res->start,
+		(unsigned long long)res->end, (unsigned long long)region.start,
+		(unsigned long long)region.end, (unsigned long)res->flags);
 
 	new = region.start | (res->flags & PCI_REGION_FLAG_MASK);
+	if (type != pci_bar_unknown)
+		new |= PCI_ROM_ADDRESS_ENABLE;
+
 	if (res->flags & IORESOURCE_IO)
 		mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
 	else
 		mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
 
-	if (resno < 6) {
-		reg = PCI_BASE_ADDRESS_0 + 4 * resno;
-	} else if (resno == PCI_ROM_RESOURCE) {
-		if (!(res->flags & IORESOURCE_ROM_ENABLE))
-			return;
-		new |=
SR-IOV: patches are available for Linux kernel [2/4]
[PATCH 2/4] PCI: support ARI capability

Support Alternative Routing-ID Interpretation (ARI), which increases the number of functions that can be supported by a PCIe endpoint. SR-IOV depends on ARI.

The PCI-SIG ARI specification can be found at
http://www.pcisig.com/specifications/pciexpress/specifications/ECN-alt-rid-interpretation-070604.pdf

Signed-off-by: Yu Zhao [EMAIL PROTECTED]
Signed-off-by: Eddie Dong [EMAIL PROTECTED]

---
 drivers/pci/Kconfig      |  7 +
 drivers/pci/Makefile     |  2 +
 drivers/pci/ari.c        | 70 ++
 drivers/pci/probe.c      |  3 ++
 include/linux/pci.h      | 29 +++
 include/linux/pci_regs.h | 14 +
 6 files changed, 125 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index e1ca425..f43cc46 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -50,3 +50,10 @@ config HT_IRQ
 	   This allows native hypertransport devices to use interrupts.
 
 	   If unsure say Y.
+
+config PCI_ARI
+	bool "PCI ARI support"
+	depends on PCI
+	default n
+	help
+	  This enables PCI Alternative Routing-ID Interpretation.
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 7d63f8c..96f2767 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PCI_SYSCALL) += syscall.o
 ifeq ($(CONFIG_PCI_DEBUG),y)
 EXTRA_CFLAGS += -DDEBUG
 endif
+
+obj-$(CONFIG_PCI_ARI) += ari.o
diff --git a/drivers/pci/ari.c b/drivers/pci/ari.c
new file mode 100644
index 000..e6dae1d
--- /dev/null
+++ b/drivers/pci/ari.c
@@ -0,0 +1,70 @@
+/*
+ * drivers/pci/ari.c
+ *
+ * Copyright (C) 2008 Intel Corporation, Yu Zhao [EMAIL PROTECTED]
+ *
+ * PCI Express Alternative Routing-ID Interpretation capability support.
+ */
+
+#include <linux/pci.h>
+
+#include "pci.h"
+
+/**
+ * pci_ari_enable_fwd - enable ARI forwarding
+ * @dev: PCI device
+ */
+void pci_ari_enable_fwd(struct pci_dev *dev)
+{
+	int pos;
+	u32 cap;
+	u16 ctrl;
+
+	if (dev->pcie_type != PCI_EXP_TYPE_ROOT_PORT &&
+	    dev->pcie_type != PCI_EXP_TYPE_DOWNSTREAM)
+		return;
+
+	pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
+	if (!pos)
+		return;
+
+	pci_read_config_dword(dev, pos + PCI_EXP_DEVCAP2, &cap);
+
+	if (!(cap & PCI_EXP_DEVCAP2_ARI))
+		return;
+
+	dev->ari_enabled = 1;
+	dev_info(&dev->dev, "ARI forwarding enabled.\n");
+	pci_read_config_word(dev, pos + PCI_EXP_DEVCTL2, &ctrl);
+	if (ctrl & PCI_EXP_DEVCTL2_ARI)
+		return;
+
+	ctrl |= PCI_EXP_DEVCTL2_ARI;
+	pci_write_config_word(dev, pos + PCI_EXP_DEVCTL2, ctrl);
+}
+EXPORT_SYMBOL_GPL(pci_ari_enable_fwd);
+
+/**
+ * pci_ari_next_fn - find next function number
+ * @dev: PCI device
+ *
+ * Returns function number, and 0 if there are no higher numbered
+ * functions; returns negative on failure.
+ */
+int pci_ari_next_fn(struct pci_dev *dev)
+{
+	int pos;
+	u16 cap;
+
+	if (dev->pcie_type != PCI_EXP_TYPE_ENDPOINT)
+		return -EINVAL;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ARI);
+	if (!pos)
+		return -EIO;
+
+	pci_read_config_word(dev, pos + PCI_ARI_CAP, &cap);
+
+	return PCI_ARI_CAP_NFN(cap);
+}
+EXPORT_SYMBOL_GPL(pci_ari_next_fn);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index d030996..e8506fe 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1032,6 +1032,9 @@ int pci_scan_slot(struct pci_bus *bus, i
 	int func, nr = 0;
 	int scan_all_fns;
 
+	if (bus->self)
+		pci_ari_enable_fwd(bus->self);
+
 	scan_all_fns = pcibios_scan_all_fns(bus, devfn);
 
 	for (func = 0; func < 8; func++, devfn++) {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 9ea3a1d..110779a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -214,6 +214,7 @@ #endif
 	unsigned int	broken_parity_status:1;	/* Device generates false positive parity */
 	unsigned int	msi_enabled:1;
 	unsigned int	msix_enabled:1;
+	unsigned int	ari_enabled:1;		/* ARI forwarding on bridge */
 	unsigned int	is_managed:1;
 	unsigned int	is_pcie:1;
 	pci_dev_flags_t dev_flags;
@@ -1125,5 +1126,33 @@ static inline void pci_mmcfg_early_init(
 static inline void pci_mmcfg_late_init(void) { }
 #endif
 
+#ifdef CONFIG_PCI_ARI
+/**
+ * pci_ari_fwd_enabled - query ARI forwarding status
+ * @dev: PCI device
+ *
+ * Returns 1 if ARI forwarding is enabled, and 0 if it's not
+ * enabled; returns negative on failure.
+ */
+static inline int pci_ari_fwd_enabled(struct pci_dev *dev)
+{
+	return dev->ari_enabled;
+}
+extern void pci_ari_enable_fwd(struct pci_dev *dev);
+extern int pci_ari_next_fn(struct pci_dev *dev);
+#else
+static inline int pci_ari_fwd_enabled(struct pci_dev *dev)
+{
+	return -EIO;
+}
+static
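To illustrate why ARI raises the function limit, the sketch below contrasts the traditional split of the 8-bit devfn field (5-bit device number, 3-bit function number) with the ARI interpretation, where the whole field is the function number. The helper names are hypothetical and this is not the patch's code; the Next Function Number field is taken to be bits 15:8 of the ARI Capability register, which is my reading of the ARI ECN and should be checked against the specification.

```c
#include <stdint.h>

/* Traditional interpretation: 5-bit device number, 3-bit function number. */
unsigned int devfn_slot(uint8_t devfn)
{
	return (devfn >> 3) & 0x1f;
}

unsigned int devfn_func(uint8_t devfn)
{
	return devfn & 0x07;
}

/*
 * ARI interpretation: the device number is implicitly 0 and all 8 bits
 * form the function number, so one endpoint can expose up to 256 functions.
 */
unsigned int ari_func(uint8_t devfn)
{
	return devfn;
}

/*
 * Extract the Next Function Number from a raw ARI Capability register
 * value (assumed layout: bits 15:8). A result of 0 means there is no
 * higher numbered function.
 */
unsigned int ari_next_fn(uint16_t ari_cap)
{
	return (ari_cap >> 8) & 0xff;
}
```

For devfn 0x1a, the traditional view gives device 3, function 2, while the ARI view gives function 26 on device 0.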
SR-IOV: patches are available for Linux kernel [3/4]
[PATCH 3/4] PCI: support SR-IOV capability

Support SR-IOV capability. By default, this feature is not enabled and the Physical Function behaves as a normal PCIe device. After it's enabled, each Virtual Function's PCI configuration space can be accessed using its own Bus, Device and Function Number (Routing ID). Each Virtual Function also has PCI Memory Space, which is used to map its own register set.

Signed-off-by: Yu Zhao [EMAIL PROTECTED]
Signed-off-by: Eddie Dong [EMAIL PROTECTED]

---
 drivers/pci/Kconfig      |   9 +
 drivers/pci/Makefile     |   2
 drivers/pci/iov.c        | 608 ++
 drivers/pci/pci.h        |  13 +
 include/linux/pci.h      |  35 +++
 include/linux/pci_regs.h |  20 ++
 6 files changed, 687 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index f43cc46..5000c3c 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -57,3 +57,12 @@ config PCI_ARI
 	default n
 	help
 	  This enables PCI Alternative Routing-ID Interpretation.
+
+config PCI_IOV
+	bool "PCI SR-IOV support"
+	depends on PCI
+	select PCI_MSI
+	select PCI_ARI
+	default n
+	help
+	  This allows device drivers to enable Single Root I/O Virtualization.
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 96f2767..2dcefce 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -55,3 +55,5 @@ EXTRA_CFLAGS += -DDEBUG
 endif
 
 obj-$(CONFIG_PCI_ARI) += ari.o
+
+obj-$(CONFIG_PCI_IOV) += iov.o
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
new file mode 100644
index 000..0bd006f
--- /dev/null
+++ b/drivers/pci/iov.c
@@ -0,0 +1,608 @@
+/*
+ * drivers/pci/iov.c
+ *
+ * Copyright (C) 2008 Intel Corporation, Yu Zhao [EMAIL PROTECTED]
+ *
+ * PCI Express Single Root I/O Virtualization capability support.
+ */
+
+#include <linux/ctype.h>
+#include <linux/string.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
+#include <asm/page.h>
+
+#include "pci.h"
+
+#define VF_PARAM_LEN 128
+
+#define notify_phyfn(pf, event, arg) ({ \
+	pf->iov->notify ? pf->iov->notify(pf, event, arg) : 0; \
+})
+
+static int iov_set_nr_virtfn(struct pci_dev *, int);
+
+
+static inline void vfid_to_bdf(struct pci_dev *pf, int vfid, u8 *bus, u8 *devfn)
+{
+	u16 bdf;
+
+	bdf = (pf->bus->number << 8) + pf->devfn +
+	      pf->iov->offset + pf->iov->stride * vfid;
+	*bus = bdf >> 8;
+	*devfn = bdf & 0xff;
+}
+
+static inline int bdf_to_vfid(struct pci_dev *pf, u8 bus, u8 devfn)
+{
+	u16 bdf;
+	int vfid;
+
+	bdf = ((bus - pf->bus->number) << 8) + devfn -
+	      pf->devfn - pf->iov->offset;
+	vfid = bdf / pf->iov->stride;
+
+	if (bdf % pf->iov->stride || vfid >= pf->iov->nr_virtfn)
+		return -EINVAL;
+
+	return vfid;
+}
+
+static struct pci_dev *iov_alloc_virtfn(struct pci_dev *pf, int vfid)
+{
+	int i;
+	u8 bus, devfn;
+	unsigned long size;
+	struct pci_dev *vf;
+	struct pci_bus *pb;
+	struct resource *res;
+
+	vf = alloc_pci_dev();
+	if (!vf)
+		return NULL;
+
+	vfid_to_bdf(pf, vfid, &bus, &devfn);
+
+	/*
+	 * PF uses internal switch to route Type 1 configuration
+	 * transaction to VF when VF resides on a different bus.
+	 * So there is no explicit bridge device in this case.
+	 */
+	pb = pci_find_bus(0, bus);
+	if (!pb) {
+		pb = pci_create_bus(pf->dev.parent, bus,
+				    pf->bus->ops, pf->bus->sysdata);
+		if (!pb)
+			goto failed;
+	}
+
+	vf->bus = pb;
+	vf->sysdata = pb->sysdata;
+	vf->dev.parent = pf->dev.parent;
+	vf->dev.bus = pf->dev.bus;
+	vf->devfn = devfn;
+	vf->hdr_type = PCI_HEADER_TYPE_NORMAL;
+	vf->multifunction = 0;
+	vf->vendor = pf->vendor;
+	vf->device = pf->iov->device;
+	vf->cfg_size = 4096;
+	vf->error_state = pci_channel_io_normal;
+	vf->pcie_type = PCI_EXP_TYPE_ENDPOINT;
+	vf->dma_mask = 0x;
+
+	dev_set_name(&vf->dev, "%04x:%02x:%02x.%d", pci_domain_nr(pb), bus,
+		     PCI_SLOT(devfn), PCI_FUNC(devfn));
+
+	pci_read_config_byte(vf, PCI_REVISION_ID, &vf->revision);
+	vf->class = pf->class;
+	vf->current_state = PCI_UNKNOWN;
+	vf->irq = 0;
+
+	for (i = 0; i < PCI_IOV_NUM_BAR; i++) {
+		res = pf->iov->resource + i;
+		if (!res->parent)
+			continue;
+		size = resource_size(res) / pf->iov->nr_virtfn;
+		vf->resource[i].name = pci_name(vf);
+		vf->resource[i].parent = res->parent;
+		vf->resource[i].flags = res->flags;
+		vf->resource[i].start = res->start + size * vfid;
+		vf->resource[i].end = vf->resource[i].start + size - 1;
+	}
+
+	vf->subsystem_vendor = pf->subsystem_vendor;
+
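The vfid_to_bdf() and bdf_to_vfid() helpers in the patch above compute a Virtual Function's Routing ID from the Physical Function's Routing ID plus the First VF Offset and VF Stride. The following standalone sketch reproduces that arithmetic in userspace so it can be tried with concrete numbers. struct pf_info is a hypothetical, simplified stand-in for the kernel structures, and the explicit range checks are an addition of this sketch, not part of the patch.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the PF fields the patch uses. */
struct pf_info {
	uint8_t  bus;		/* PF bus number */
	uint8_t  devfn;		/* PF device/function number */
	uint16_t offset;	/* First VF Offset */
	uint16_t stride;	/* VF Stride */
	int      nr_virtfn;	/* number of enabled VFs */
};

/* Routing ID of VF n = PF Routing ID + offset + stride * n. */
void vfid_to_bdf(const struct pf_info *pf, int vfid,
		 uint8_t *bus, uint8_t *devfn)
{
	uint16_t rid = (uint16_t)(pf->bus * 256 + pf->devfn +
				  pf->offset + pf->stride * vfid);

	*bus = (uint8_t)(rid >> 8);
	*devfn = (uint8_t)(rid & 0xff);
}

/* Inverse mapping; returns -1 if the address is not one of the PF's VFs. */
int bdf_to_vfid(const struct pf_info *pf, uint8_t bus, uint8_t devfn)
{
	int delta = ((int)bus - pf->bus) * 256 + devfn -
		    pf->devfn - pf->offset;
	int vfid;

	if (delta < 0 || pf->stride == 0)
		return -1;

	vfid = delta / pf->stride;
	if (delta % pf->stride || vfid >= pf->nr_virtfn)
		return -1;

	return vfid;
}
```

With a PF at bus 3, devfn 0, offset 0x80 and stride 0x40, VF 2 lands at Routing ID 0x0400, that is bus 4, devfn 0. This is the case the patch comment describes, where a VF resides on a different bus than its PF.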
SR-IOV: patches are available for Linux kernel [4/4]
[PATCH 4/4] PCI: document SR-IOV

SR-IOV Documentation.

Signed-off-by: Yu Zhao [EMAIL PROTECTED]
Signed-off-by: Eddie Dong [EMAIL PROTECTED]

---
 Documentation/ABI/testing/sysfs-bus-pci |  13 ++
 Documentation/PCI/00-INDEX              |   2
 Documentation/PCI/pci-iov-howto.txt     | 170 +++
 3 files changed, 185 insertions(+), 0 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index ceddcff..9ada27b 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -9,3 +9,16 @@ Description:
 		that some devices may have malformatted data.  If the
 		underlying VPD has a writable section then the
 		corresponding section of this file will be writable.
+
+What:		/sys/bus/pci/devices/.../iov
+Date:		August 2008
+Contact:	Yu Zhao [EMAIL PROTECTED]
+Description:
+		This file will appear when SR-IOV capability is enabled
+		by the device driver if supported. It holds the number of
+		available Virtual Functions and the Bus, Device, Function
+		number and status of the Virtual Functions that belong
+		to this device (Physical Function). This file can be
+		written using the same format as what can be read out, to
+		change the number of available Virtual Functions and to
+		enable or disable a Virtual Function.
diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX
index 49f4394..8f8ee17 100644
--- a/Documentation/PCI/00-INDEX
+++ b/Documentation/PCI/00-INDEX
@@ -10,3 +10,5 @@ pci.txt
 	- info on the PCI subsystem for device driver authors
 pcieaer-howto.txt
 	- the PCI Express Advanced Error Reporting Driver Guide HOWTO
+pci-iov-howto.txt
+	- PCI Express Single Root I/O Virtualization HOWTO
diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt
new file mode 100644
index 000..2d7ae64
--- /dev/null
+++ b/Documentation/PCI/pci-iov-howto.txt
@@ -0,0 +1,170 @@
+		PCI Express Single Root I/O Virtualization HOWTO
+			Copyright (C) 2008 Intel Corporation
+			Yu Zhao [EMAIL PROTECTED]
+
+
+1. Overview
+
+1.1 What is SR-IOV
+
+SR-IOV is a PCI Express Extended Capability that makes one physical device
+appear as multiple virtual devices. The physical device is referred to as the
+Physical Function, while the virtual devices are referred to as Virtual
+Functions. Allocation of Virtual Functions can be dynamically controlled by
+the Physical Function via registers encapsulated in the capability. By
+default, this feature is not enabled and the Physical Function behaves as a
+traditional PCIe device. Once it's turned on, each Virtual Function's PCI
+configuration space can be accessed by its own Bus, Device and Function
+Number (Routing ID). Each Virtual Function also has PCI Memory Space, which
+is used to map its register set. The Virtual Function device driver operates
+on the register set so it can be functional and appear as a real PCI device.
+
+1.2 What is ARI
+
+Alternative Routing-ID Interpretation allows a PCI Express Endpoint to use
+its device number field as part of the function number. Traditionally, an
+Endpoint can only have 8 functions, and the device number of all Endpoints
+is zero. With ARI enabled, an Endpoint can have up to 256 functions. ARI is
+managed via an ARI Forwarding bit in the Device Capabilities 2 register of
+the PCI Express Capability on the Root Port or the Downstream Port and a new
+ARI Capability on the Endpoint.
+
+
+2. User Guide
+
+2.1 How can I manage SR-IOV
+
+SR-IOV can be managed by reading or writing /sys/bus/pci/devices/.../iov.
+Legal operations on this file include:
+ - Read: will get the number of available VFs and a list of them.
+ - Write: bb:dd.f={1|0} will enable or disable a VF.
+ - Write: NumVFs=N will change the number of available VFs.
+
+2.2 How can I use Virtual Functions
+
+Virtual Functions can be treated as hot-plugged PCI devices in the kernel,
+so they should be able to work in the same way as real PCI devices.
+NOTE: the Virtual Function device driver must be loaded to make it work.
+
+
+3. Developer Guide
+
+3.1 SR-IOV APIs
+
+To enable SR-IOV, the Physical Function device driver needs to call:
+	int pci_iov_enable(struct pci_dev *dev, int nvfs,
+				int (*cb)(struct pci_dev *, int, int))
+NOTE: this function sleeps 2 seconds waiting on hardware transaction
+completion according to the SR-IOV specification.
+
+To disable SR-IOV, the Physical Function device driver needs to call:
+	void pci_iov_disable(struct pci_dev *dev)
+NOTE: this function sleeps 1 second waiting on hardware transaction
+completion according to the SR-IOV specification.
+
+The following function can be used to query the maximum number of Virtual
+Functions that a Physical
Re: [et-mgmt-tools] Re: [libvirt] RE: [Qemu-devel] [ANNOUNCE] virt-mem tools version 0.2.8 released
On 11/08/2008, at 8:42 PM, Daniel P. Berrange wrote:

On Mon, Aug 11, 2008 at 09:18:19AM +0100, Richard W.M. Jones wrote:
On Mon, Aug 11, 2008 at 07:39:34PM +1200, james wrote:

This is what libvirt gives you (and lots more, eg. secure remote access to hypervisors, bindings to Perl & many other languages, etc.). Can you be more specific about what you couldn't do with libvirt?

I can give you such an example, although I confess it could be due to my lack of understanding of the libvirt config. I have tried and tried to use libvirt to configure VMs within KVM using scsi disk images, usually when tinkering/experimenting with RAID setups. It just will not take it. When I start a KVM based VM from the command line with the appropriate settings, I have no problems. This inability to use scsi within libvirt was extremely frustrating until I took the plunge and went to the kvm command line. However, if you can point out an example xml config for a VM using scsi disk images that works then that would be very cool.

Configuring SCSI disks with VMs in libvirt is no different to configuring any other kind of block based storage. The general description is here:

  http://libvirt.org/formatdomain.html#elementsDisks

Specifically though you'd want a disk section looking like

  <disk type='block'>
    <source file='/dev/sdf1'/>
    <target dev='sda' bus='scsi'/>
  </disk>

NB, there is no restriction on mapping to the target bus - ie a SCSI disk in the host can be mapped to an IDE disk in the guest, and vice versa. Also note that the 'dev' attribute on the target isn't a guaranteed device name in the guest - it is merely used for ordering of devices when spawning QEMU.

Now, the main fun you'll have is actually outside of libvirt - namely that on Linux SCSI disk names are not guaranteed stable across reboots. So rather than using /dev/sdf1 you may want to consider one of the udev created stable paths under the directories /dev/disk/by-{id,path,uuid}, or if you are using a multipath enabled SAN, then a name under /dev/multipath/

Regards, Daniel

To be clear, I have not tried to use block devices but 5 image files, as I am experimenting. I have tried to create VMs using raw image files under libvirt via the nice Virtual Machine Manager 0.5.3 and through hand crafted xml files. Neither method will register the images as scsi disks; both fail with an error. Getting the same disk images to be regarded as scsi disks via the kvm command line is fine or if I fall back to ide and limit. The ide test proves that the rest of the xml file is correct.

Trying to define a domain using virsh using a disk section defined as:

  <disk type='file' device='disk'>
    <source file='/home/someuser/scsi-test/scsi-disk1.img'/>
    <target dev='sda' bus='scsi'/>
  </disk>

fails with this error:

  libvir: QEMU error : Invalid harddisk device name: sda

Heck, if I have missed the wood for the trees and there is a simple correction to this definition that will make it work, I would be one happy camper.

Cheers.
--
James.
Re: [et-mgmt-tools] Re: [libvirt] RE: [Qemu-devel] [ANNOUNCE] virt-mem tools version 0.2.8 released
On 12/08/2008, at 10:30 PM, Daniel P. Berrange wrote:

> Oh, that means the version of libvirt you are using is too old - it
> predates us adding support for the -drive argument, which is required
> in order to use SCSI disks. This was added in libvirt 0.4.3
>
> Regards, Daniel

Ah - the penny drops. :-)

I'm running Ubuntu 8.04, which seems to be using libvirt 0.4.0. I'm not really fussed about doing source build/installs, so it looks like I will have to wait until Intrepid is out, which has libvirt 0.4.4 as part of it.

Thanks heaps for clearing that up for me, much appreciated.

--James.
Announcing: Open OVF project source code availability
Announcing the open-ovf project and source code availability.

Hi folks, we are announcing the availability of source code for the open-ovf project. OVF is a standard packaging format for virtual machines and software appliances.

The open-ovf project is seeking contributors and users to help establish OVF as a transparent and platform-neutral method for packaging virtual machine images. We anticipate being able to deploy a single OVF package to either Xen or KVM. Eventually we expect to expand that list to include VMware, Hyper-V, and other platforms. Getting to that point will require community contributions.

For a quick summary of OVF and the open source project, see the .pdf from the recent Xen summit located at:
http://www.xen.org/files/xensummitboston08/open-ovf-proposal.pdf

The open-ovf project is hosted at sourceforge:
http://open-ovf.wiki.sourceforge.net/

And source code is hosted in a git repository:
http://gitorious.org/projects/open-ovf/repos

Scott Moser [EMAIL PROTECTED] is the project maintainer. Please feel free to respond to this email if you have any questions!

Mike

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: [EMAIL PROTECTED] AIM: ncmikeday Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
Re: Announcing: Open OVF project source code availability
On Tue, Aug 12, 2008 at 10:34:33AM -0400, Mike D. Day wrote:
> Announcing the open-ovf project and source code availability.
>
> Hi folks, we are announcing the availability of source code for the
> open-ovf project.

Why was the Eclipse Public License chosen? This license is not compatible with the GPL[1], so no GPL licensed app can make use of this code :-(

For example, it makes it impossible to use this code to assist in supporting OVF in virt-manager, virt-install or virt-image.

[1] http://en.wikipedia.org/wiki/Eclipse_Public_License

    "The EPL 1.0 is not compatible with the GPL, and a work created by
     combining a work licensed under the GPL with a work licensed under
     the EPL cannot be lawfully distributed."

Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: Announcing: Open OVF project source code availability
On Tue, Aug 12, 2008 at 11:26:34AM -0400, Mike Day wrote:
> On 12/08/08 15:46 +0100, Daniel P. Berrange wrote:
> > Why was the Eclipse Public License chosen? This license is not
> > compatible with the GPL[1], so no GPL licensed app can make use of
> > this code :-(
> >
> > For example, it makes it impossible to use this code to assist in
> > supporting OVF in virt-manager, virt-install or virt-image.
> >
> > http://en.wikipedia.org/wiki/Eclipse_Public_License
> >
> > The EPL 1.0 is not compatible with the GPL, and a work created by
> > combining a work licensed under the GPL with a work licensed under
> > the EPL cannot be lawfully distributed.
>
> As a file format, the output from open-ovf can be used by any code of
> any license. That is, any code can consume an OVF file.

Yep, that's not what I'm concerned with.

> For runtime integration, we are planning on implementing an XML-RPC
> interface into and out of the ovf library, which will allow any other
> runtime code to interact with it.

It is the runtime integration I was referring to. The open-ovf tools are all python. So is virt-manage/install/image. If these tools were under a license that was compatible with usage from a GPL licensed app, then I could simply call into the OVF APIs. Building an XML-RPC interface just to call into functions for manipulating OVF files is rather overkill.

But perhaps you don't intend to maintain the python libraries as an official 'public' API for other apps to use?

Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: KVM Management : Public Bridge setup
Neat. Thanks
/Jd

--- On Mon, 8/11/08, Alberto Treviño [EMAIL PROTECTED] wrote:

> From: Alberto Treviño [EMAIL PROTECTED]
> Subject: Re: KVM Management : Public Bridge setup
> To: [EMAIL PROTECTED] [EMAIL PROTECTED]
> Cc: KVM List kvm@vger.kernel.org
> Date: Monday, August 11, 2008, 8:12 PM
>
> On Friday 08 August 2008 08:18:26 pm jd wrote:
> > The /etc/qemu-if script seems to be taking the interface as a
> > parameter; is there a way to pass the bridge name to the script
> > as well? Thus from the command line one can pass the bridge that
> > one wants to attach to. (One can potentially have one script for
> > each bridge and pass the script name to qemu... but it seems a bit
> > unnecessary.)
>
> I name my tap devices (using ifname=...) as follows:
>
>   tap.br0.05.1
>
> where
>
>   tap -- this is a tap device
>   br0 -- the name of the bridge
>   05  -- the ID of the VM (each of my VMs has a unique numeric ID)
>   1   -- the NIC # on the VM (1, 2, 3, or 4 depending on # of NICs)
>
> Then in my scripts I can extract the bridge name and use it. This
> also has the advantage of letting me know, just from the name, which
> bridge the device is hooked up to, which VM it belongs to, and which
> NIC it is inside the VM.
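The naming convention described above (tap.BRIDGE.VMID.NIC) can be taken apart mechanically. The original workflow does this inside qemu interface scripts, which are not shown in the thread; the following is only a minimal C sketch of the parsing step, with struct tap_name and parse_tap_name() as hypothetical names. It assumes the bridge name itself contains no dots.

```c
#include <stdio.h>

/* Hypothetical container for the fields encoded in the tap device name. */
struct tap_name {
	char bridge[16];	/* bridge name, e.g. "br0" */
	int  vm_id;		/* numeric VM ID, e.g. 5 */
	int  nic;		/* NIC number inside the VM, e.g. 1 */
};

/*
 * Parse names of the form "tap.<bridge>.<vm id>.<nic>".
 * Returns 0 on success and -1 if the name does not follow the scheme.
 */
int parse_tap_name(const char *ifname, struct tap_name *out)
{
	char tail;
	int n;

	/* The trailing %c only matches if unexpected characters follow. */
	n = sscanf(ifname, "tap.%15[^.].%d.%d%c",
		   out->bridge, &out->vm_id, &out->nic, &tail);

	return n == 3 ? 0 : -1;
}
```

Given "tap.br0.05.1", this yields bridge "br0", VM ID 5 and NIC 1, and a script could then attach the interface to that bridge. A name such as "eth0" is rejected.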
Re: [PATCH 08/12] kvm: qemu: Don't require all drivers to use virtio_net_hdr
Hi Avi,

Thanks for catching the build error in this one. Here's a new (yet uglier) version; the rest remain the same.

Cheers,
Mark.

Subject: [PATCH 08/11] kvm: qemu: Don't require all drivers to use virtio_net_hdr

The virtio-net driver is the only one which wishes to deal with virtio_net_hdr headers, so add a using_vnet_hdr flag to allow it to indicate this.

Ideally, we'd only enable IFF_VNET_HDR when we're using virtio-net, but qemu's various abstractions would make this very messy.

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio-net.c |  1 +
 qemu/net.h           |  1 +
 qemu/vl.c            | 87 +++--
 3 files changed, 78 insertions(+), 11 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 9b1298e..2298316 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -97,6 +97,7 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev)
     uint32_t features = (1 << VIRTIO_NET_F_MAC);
 
     if (tap_has_vnet_hdr(host)) {
+        tap_using_vnet_hdr(host, 1);
         features |= (1 << VIRTIO_NET_F_CSUM);
         features |= (1 << VIRTIO_NET_F_GUEST_CSUM);
         features |= (1 << VIRTIO_NET_F_GUEST_TSO4);
diff --git a/qemu/net.h b/qemu/net.h
index ae1a338..4891669 100644
--- a/qemu/net.h
+++ b/qemu/net.h
@@ -46,6 +46,7 @@ void qemu_handler_true(void *opaque);
 
 void do_info_network(void);
 int tap_has_vnet_hdr(void *opaque);
+void tap_using_vnet_hdr(void *opaque, int using_vnet_hdr);
 
 int net_client_init(const char *device, const char *opts);
 void net_client_uninit(NICInfo *nd);
diff --git a/qemu/vl.c b/qemu/vl.c
index f5aacf0..63e21f2 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4347,6 +4347,10 @@ int tap_has_vnet_hdr(void *opaque)
     return 0;
 }
 
+void tap_using_vnet_hdr(void *opaque, int using_vnet_hdr)
+{
+}
+
 #else /* !defined(_WIN32) */
 
 #ifndef IFF_VNET_HDR
@@ -4367,10 +4371,11 @@ typedef struct TAPState {
     char buf[TAP_BUFSIZE];
     int size;
     unsigned int has_vnet_hdr : 1;
+    unsigned int using_vnet_hdr : 1;
 } TAPState;
 
-static ssize_t tap_receive_iov(void *opaque, const struct iovec *iov,
-                               int iovcnt)
+static ssize_t tap_writev(void *opaque, const struct iovec *iov,
+                          int iovcnt)
 {
     TAPState *s = opaque;
     ssize_t len;
@@ -4382,17 +4387,51 @@ static ssize_t tap_receive_iov(void *opaque, const struct iovec *iov,
     return len;
 }
 
+static ssize_t tap_receive_iov(void *opaque, const struct iovec *iov,
+                               int iovcnt)
+{
+#ifdef IFF_VNET_HDR
+    TAPState *s = opaque;
+
+    if (s->has_vnet_hdr && !s->using_vnet_hdr) {
+        struct iovec *iov_copy;
+        struct virtio_net_hdr hdr = { 0, };
+
+        iov_copy = alloca(sizeof(struct iovec) * (iovcnt + 1));
+
+        iov_copy[0].iov_base = &hdr;
+        iov_copy[0].iov_len = sizeof(hdr);
+
+        memcpy(&iov_copy[1], iov, sizeof(struct iovec) * iovcnt);
+
+        return tap_writev(opaque, iov_copy, iovcnt + 1);
+    }
+#endif
+
+    return tap_writev(opaque, iov, iovcnt);
+}
+
 static void tap_receive(void *opaque, const uint8_t *buf, int size)
 {
+    struct iovec iov[2];
+    int i = 0;
+
+#ifdef IFF_VNET_HDR
     TAPState *s = opaque;
-    int ret;
-    for(;;) {
-        ret = write(s->fd, buf, size);
-        if (ret < 0 && (errno == EINTR || errno == EAGAIN)) {
-        } else {
-            break;
-        }
+    struct virtio_net_hdr hdr = { 0, };
+
+    if (s->has_vnet_hdr && !s->using_vnet_hdr) {
+        iov[i].iov_base = &hdr;
+        iov[i].iov_len = sizeof(hdr);
+        i++;
     }
+#endif
+
+    iov[i].iov_base = (char *) buf;
+    iov[i].iov_len = size;
+    i++;
+
+    tap_writev(opaque, iov, i);
 }
 
 static int tap_can_send(void *opaque)
@@ -4421,6 +4460,21 @@ static int tap_can_send(void *opaque)
     return can_receive;
 }
 
+static int tap_send_packet(TAPState *s)
+{
+    uint8_t *buf = s->buf;
+    int size = s->size;
+
+#ifdef IFF_VNET_HDR
+    if (s->has_vnet_hdr && !s->using_vnet_hdr) {
+        buf += sizeof(struct virtio_net_hdr);
+        size -= sizeof(struct virtio_net_hdr);
+    }
+#endif
+
+    return qemu_send_packet(s->vc, buf, size);
+}
+
 static void tap_send(void *opaque)
 {
     TAPState *s = opaque;
@@ -4430,7 +4484,7 @@ static void tap_send(void *opaque)
         int err;
 
         /* If noone can receive the packet, buffer it */
-        err = qemu_send_packet(s->vc, s->buf, s->size);
+        err = tap_send_packet(s);
         if (err == -EAGAIN)
             return;
     }
@@
-4454,7 +4508,7 @@ static void tap_send(void *opaque) int err; /* If noone can receive the packet, buffer it */ - err = qemu_send_packet(s-vc, s-buf, s-size); + err = tap_send_packet(s); if (err == -EAGAIN) break; } @@ -4469,6 +4523,17 @@ int tap_has_vnet_hdr(void *opaque) return s ? s-has_vnet_hdr : 0; } +void tap_using_vnet_hdr(void
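For readers following the tap_receive_iov() change above: the idea is to send a zeroed header ahead of the caller's buffers by building a new iovec array one element longer than the original. The sketch below shows that technique on its own. It is not the patch's code: `struct demo_hdr` and `writev_with_zero_hdr()` are hypothetical names, the header layout is made up, and malloc() is used where the patch uses alloca().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical stand-in for struct virtio_net_hdr; the layout is illustrative. */
struct demo_hdr {
    unsigned char flags;
    unsigned char gso_type;
    unsigned short hdr_len;
};

/*
 * Write iov[0..iovcnt) to fd, preceded by an all-zero demo_hdr.
 * Returns what writev() returns (bytes written, header included),
 * or -1 if the temporary iovec array cannot be allocated.
 */
static ssize_t writev_with_zero_hdr(int fd, const struct iovec *iov, int iovcnt)
{
    struct demo_hdr hdr;
    struct iovec *copy;
    ssize_t ret;

    assert(iovcnt >= 0);
    memset(&hdr, 0, sizeof(hdr));

    /* One extra slot at the front for the header. */
    copy = malloc(sizeof(*copy) * ((size_t)iovcnt + 1));
    if (!copy)
        return -1;

    copy[0].iov_base = &hdr;
    copy[0].iov_len = sizeof(hdr);
    if (iovcnt > 0)
        memcpy(&copy[1], iov, sizeof(*iov) * (size_t)iovcnt);

    ret = writev(fd, copy, iovcnt + 1);
    free(copy);
    return ret;
}
```

Because the header and payload go out in a single writev() call, the receiver sees them as one contiguous write; no payload bytes are copied, only the small iovec array.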
Re: [PATCH 0/12] virtio_net perf patches
On Mon, 2008-08-11 at 15:30 -0500, Anthony Liguori wrote:
> Mark McLoughlin wrote:
> > Hi Avi,
> >
> > Here's the set of patches that I think make sense to apply.
>
> We probably need to disable checksum offload on the RX side until we
> figure out what to do about the broken dhclient. That's going to hit a
> lot of users otherwise.

If I could reproduce this, I'd get right on it ... but I'm not seeing the issue here.

Wait, wait, wait. Bells are going off all of a sudden :-)

Yes, I've been through this before. See:

  https://bugzilla.redhat.com/231444

So, we've had this long-standing dhclient patch in Fedora:

  http://cvs.fedoraproject.org/viewcvs/*checkout*/rpms/dhcp/devel/dhcp-4.0.0-xen-checksum.patch

Herbert - any clue why this isn't upstream? That's quite surprising ...

Ah, I see Rusty moved this to linux-netdev without cc-ing:

  http://marc.info/?l=linux-netdev&m=121844837826895

Herbert wrote:

> One easy way of doing this is to hook up the rx checksum offload
> option in the guest with the tx offload option in the host. In other
> words, when rx offload is enabled in the guest we enable tx offload
> in the host, and disable it vice versa.

Are you basically just saying that guests with a broken dhclient should manually disable rx checksum offload with ethtool? And that the host should react by disabling tx offload on the tap interface?

Cheers,
Mark.
[PATCH 0/4] Fix cpu-emulation mode builds
This patchset fixes a few issues to keep the QEMU tree buildable in cpu-emulation mode.
[PATCH 4/4] Fix --disable-kvm build
This patch fixes a few locations where --disable-kvm was not enforced when applicable.

Signed-off-by: Philippe Gerum [EMAIL PROTECTED]
---
 qemu/gdbstub.c  |    2 +-
 qemu/hw/acpi.c  |    6 +-----
 qemu/hw/vga.c   |    2 ++
 qemu/qemu-kvm.h |    4 ++++
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/qemu/gdbstub.c b/qemu/gdbstub.c
index d828844..2af7830 100644
--- a/qemu/gdbstub.c
+++ b/qemu/gdbstub.c
@@ -33,8 +33,8 @@
 #include "qemu-char.h"
 #include "sysemu.h"
 #include "gdbstub.h"
-#include "qemu-kvm.h"
 #endif
+#include "qemu-kvm.h"
 
 #include "qemu_socket.h"
 #ifdef _WIN32
diff --git a/qemu/hw/acpi.c b/qemu/hw/acpi.c
index e3cd8d7..35bac86 100644
--- a/qemu/hw/acpi.c
+++ b/qemu/hw/acpi.c
@@ -23,10 +23,8 @@
 #include "sysemu.h"
 #include "i2c.h"
 #include "smbus.h"
-#ifdef USE_KVM
-#include "qemu-kvm.h"
-#endif
 #include <string.h>
+#include "qemu-kvm.h"
 
 //#define DEBUG
 
@@ -723,9 +721,7 @@ void qemu_system_cpu_hot_add(int cpu, int state)
             fprintf(stderr, "cpu %d creation failed\n", cpu);
             return;
         }
-#ifdef USE_KVM
         kvm_init_new_ap(cpu, env);
-#endif
     }
 
     qemu_set_irq(pm_state->irq, 1);
diff --git a/qemu/hw/vga.c b/qemu/hw/vga.c
index 95d6033..f5c472c 100644
--- a/qemu/hw/vga.c
+++ b/qemu/hw/vga.c
@@ -1981,6 +1981,7 @@ typedef struct PCIVGAState {
     VGAState vga_state;
 } PCIVGAState;
 
+#ifdef USE_KVM
 void vga_update_vram_mapping(VGAState *s, unsigned long vga_ram_begin,
                              unsigned long vga_ram_end)
 {
@@ -2010,6 +2011,7 @@ void vga_update_vram_mapping(VGAState *s, unsigned long vga_ram_begin,
         s->map_end = vga_ram_end;
     }
 }
+#endif
 
 static void vga_map(PCIDevice *pci_dev, int region_num,
                     uint32_t addr, uint32_t size, int type)
diff --git a/qemu/qemu-kvm.h b/qemu/qemu-kvm.h
index 7e28428..9ba81a3 100644
--- a/qemu/qemu-kvm.h
+++ b/qemu/qemu-kvm.h
@@ -114,6 +114,10 @@ extern kvm_context_t kvm_context;
 #define kvm_enabled() (0)
 #define qemu_kvm_irqchip_in_kernel() (0)
 #define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_cpu_env(cpu) ({ (void)cpu; NULL; })
+#define kvm_save_registers(cpu)    do { (void)cpu; } while(0)
+#define kvm_load_registers(env)    do { (void)env; } while(0)
+#define kvm_init_new_ap(cpu, env)  do { (void)cpu; (void)env; } while(0)
 #endif
 
 void kvm_mutex_unlock(void);
-- 
1.5.4.3
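The qemu-kvm.h hunk above relies on a common C idiom: when a feature is compiled out, its entry points are replaced by no-op macros so that call sites need no #ifdef of their own. A minimal sketch of the idiom follows. All names here (DEMO_USE_ACCEL, demo_accel_init_cpu, demo_hot_add) are hypothetical and only illustrate the pattern; they are not part of QEMU or KVM.

```c
#include <assert.h>
#include <stddef.h>

#ifdef DEMO_USE_ACCEL
/* Real implementation, provided elsewhere when the feature is built in. */
void demo_accel_init_cpu(int cpu, void *env);
#define demo_accel_enabled() (1)
#else
#define demo_accel_enabled() (0)
/*
 * No-op stub. The (void) casts still "use" the arguments, which avoids
 * unused-variable warnings at call sites, and the do { } while (0)
 * wrapper makes the macro behave as a single statement.
 */
#define demo_accel_init_cpu(cpu, env) do { (void)(cpu); (void)(env); } while (0)
#endif

/* The call site is identical in both configurations. */
static int demo_hot_add(int cpu, void *env)
{
    /* Unbraced if/else: compiles only because the stub is one statement. */
    if (cpu >= 0)
        demo_accel_init_cpu(cpu, env);
    else
        return -1;

    return demo_accel_enabled();
}
```

Built without DEMO_USE_ACCEL, demo_hot_add() returns 0 for a valid cpu and -1 for a negative one. A bare `{ ... }` block in place of do/while would break the unbraced if/else once the caller adds its semicolon.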
[PATCH 1/4] Use proper open call for init file
This patch fixes misspelled calls to qemu_fopen_file().

Signed-off-by: Philippe Gerum [EMAIL PROTECTED]
---
 qemu/hw/ds1225y.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu/hw/ds1225y.c b/qemu/hw/ds1225y.c
index 3b91b4f..b1d0284 100644
--- a/qemu/hw/ds1225y.c
+++ b/qemu/hw/ds1225y.c
@@ -171,13 +171,13 @@ void *ds1225y_init(target_phys_addr_t mem_base, const char *filename)
     s->protection = 7;
 
     /* Read current file */
-    file = qemu_fopen(filename, "rb");
+    file = qemu_fopen_file(filename, "rb");
     if (file) {
         /* Read nvram contents */
         qemu_get_buffer(file, s->contents, s->chip_size);
         qemu_fclose(file);
     }
-    s->file = qemu_fopen(filename, "wb");
+    s->file = qemu_fopen_file(filename, "wb");
     if (s->file) {
         /* Write back contents, as 'wb' mode cleaned the file */
         qemu_put_buffer(s->file, s->contents, s->chip_size);
-- 
1.5.4.3
Re: [PATCH 0/12] virtio_net perf patches
On Tue, Aug 12, 2008 at 07:12:28PM +0100, Mark McLoughlin wrote:
>
> Are you basically just saying that guests with a broken dhclient
> should manually disable rx checksum offload with ethtool? And that
> the host should react by disabling tx offload on the tap interface?

No, I'm saying that everybody should default to no checksum offload. Those that have working kernels + user-space can then enable it on boot-up.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[RFC 0/9] Memory registration rework
Hi folks,

The following series contains a proposal for our memory registration framework. This is by no means complete; it is only a first step. The goal of this first step is to remove the kvm-specific memory registration functions scattered all over the code, so we can make the merge with qemu easier.

Note that I'm putting kvm_cpu_register_phys_memory() _inside_ cpu_register_phys_memory(). To do that, we need to be resilient against the same region being registered multiple times, and should be able to interpret the flags embedded in phys_offset in a meaningful way. Although arguably with some bugs yet unknown, this series does exactly that.

For that to work, we have to be sure that we'll never reach a situation in which we register a piece of memory and, later on, register another region that contains it. Current code never does that, so we're fine. The opposite situation, namely registering a large piece of memory and then re-registering pieces of it, is perfectly valid.

In the to-be-merged version, if it ever exists, I intend to comment all those issues very well, to make the interface as predictable as possible.

There's another option for doing this, as Anthony pointed out in earlier private comments to me, which is scanning the already registered regions right before starting execution, and building our maps then. While this is valid, we can't avoid doing what I'm doing here, because some areas are manipulated _after_ the machine has started. For example, the pci region, in the hotplug case.

Note that this has not been tested on anything but x86.

Comments welcome.
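To make the registration rule above concrete, here is a small sketch of a region table where registering a range that is already fully covered by an earlier registration is a harmless no-op. This is an illustration under assumptions, not the series' code: the names (demo_region, demo_find_container, demo_register) and the fixed-size table are hypothetical, and address overflow is ignored for brevity.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_REGIONS 8

/* len == 0 marks a free table entry. */
struct demo_region {
    uint64_t start;
    uint64_t len;
};

static struct demo_region demo_regions[DEMO_MAX_REGIONS];

/* Index of a registered region that fully contains [start, start + len),
 * or -1 if there is none. */
static int demo_find_container(uint64_t start, uint64_t len)
{
    for (int i = 0; i < DEMO_MAX_REGIONS; i++) {
        const struct demo_region *r = &demo_regions[i];

        if (r->len && r->start <= start && start + len <= r->start + r->len)
            return i;
    }
    return -1;
}

/* Returns 1 if a new entry was added, 0 if the range was already covered
 * (re-registration of a region or of a piece of it), -1 if the table is full. */
static int demo_register(uint64_t start, uint64_t len)
{
    assert(len > 0);

    if (demo_find_container(start, len) != -1)
        return 0;

    for (int i = 0; i < DEMO_MAX_REGIONS; i++) {
        if (!demo_regions[i].len) {
            demo_regions[i].start = start;
            demo_regions[i].len = len;
            return 1;
        }
    }
    return -1;
}
```

With this rule, registering 0x1000..0x5000 and then 0x2000..0x3000 adds one entry only. The reverse order (small region first, enclosing region second) is the case the cover letter says must never happen, and this sketch makes no attempt to handle it: it would simply add a second, overlapping entry.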
[PATCH 8/9] coalesce mmio regions without an explicit call
Try to coalesce mmio regions inside kvm_cpu_register_physical_memory(). Coalescing is done if area has TLB_MMIO flags set, or anything greater than that.

The original explicit function turns into an empty function. This is to be bisection friendly. Direct calls are to be removed in a later commit.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 libkvm/libkvm.c |    4 ++++
 qemu/qemu-kvm.c |   13 +++----------
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index d62cb2a..c563bb6 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -1107,6 +1107,9 @@ int kvm_register_coalesced_mmio(kvm_context_t kvm, uint64_t addr, uint32_t size)
 			perror("kvm_register_coalesced_mmio_zone");
 			return -errno;
 		}
+#ifdef DEBUG_MEMREG
+		printf("Registered coalesced mmio region for %llx (%lx)\n", addr, size);
+#endif
 		return 0;
 	}
 #endif
@@ -1129,6 +1132,7 @@ int kvm_unregister_coalesced_mmio(kvm_context_t kvm, uint64_t addr, uint32_t siz
 			perror("kvm_unregister_coalesced_mmio_zone");
 			return -errno;
 		}
+		printf("Unregistered coalesced mmio region for %llx (%lx)\n", addr, size);
 		return 0;
 	}
 #endif
diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index 96b622b..29c5c1d 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -800,6 +800,7 @@ void kvm_cpu_register_physical_memory(target_phys_addr_t start_addr,
                 printf("No free mmio slots\n");
                 exit(1);
             }
+            kvm_register_coalesced_mmio(kvm_context, start_addr, size);
             return;
         }
         r = kvm_is_intersecting_mem(kvm_context, start_addr);
@@ -1035,13 +1036,5 @@ void kvm_mutex_lock(void)
     cpu_single_env = NULL;
 }
 
-int qemu_kvm_register_coalesced_mmio(target_phys_addr_t addr, unsigned int size)
-{
-    return kvm_register_coalesced_mmio(kvm_context, addr, size);
-}
-
-int qemu_kvm_unregister_coalesced_mmio(target_phys_addr_t addr,
-                                       unsigned int size)
-{
-    return kvm_unregister_coalesced_mmio(kvm_context, addr, size);
-}
+int qemu_kvm_register_coalesced_mmio(target_phys_addr_t addr, unsigned int size) {}
+int qemu_kvm_unregister_coalesced_mmio(target_phys_addr_t addr, unsigned int size) {}
-- 
1.5.5.1
[PATCH 3/9] do not use mem_hole anymore.
Signed-off-by: Glauber Costa [EMAIL PROTECTED] --- libkvm/libkvm.c | 78 --- qemu/qemu-kvm.c | 11 --- 2 files changed, 6 insertions(+), 83 deletions(-) diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c index 0bacb43..eb85445 100644 --- a/libkvm/libkvm.c +++ b/libkvm/libkvm.c @@ -504,84 +504,6 @@ int kvm_is_allocated_mem(kvm_context_t kvm, unsigned long phys_start, return 0; } -int kvm_create_mem_hole(kvm_context_t kvm, unsigned long phys_start, - unsigned long len) -{ -#ifdef KVM_CAP_USER_MEMORY - int slot; - int r; - struct kvm_userspace_memory_region rmslot; - struct kvm_userspace_memory_region newslot1; - struct kvm_userspace_memory_region newslot2; - - len = (len + PAGE_SIZE - 1) PAGE_MASK; - - slot = get_intersecting_slot(phys_start); - /* no need to create hole, as there is already hole */ - if (slot == -1) - return 0; - - memset(rmslot, 0, sizeof(struct kvm_userspace_memory_region)); - memset(newslot1, 0, sizeof(struct kvm_userspace_memory_region)); - memset(newslot2, 0, sizeof(struct kvm_userspace_memory_region)); - - rmslot.guest_phys_addr = slots[slot].phys_addr; - rmslot.slot = slot; - - newslot1.guest_phys_addr = slots[slot].phys_addr; - newslot1.memory_size = phys_start - slots[slot].phys_addr; - newslot1.slot = slot; - newslot1.userspace_addr = slots[slot].userspace_addr; - newslot1.flags = slots[slot].flags; - - newslot2.guest_phys_addr = newslot1.guest_phys_addr + - newslot1.memory_size + len; - newslot2.memory_size = slots[slot].phys_addr + - slots[slot].len - newslot2.guest_phys_addr; - newslot2.userspace_addr = newslot1.userspace_addr + - newslot1.memory_size; - newslot2.slot = get_free_slot(kvm); - newslot2.flags = newslot1.flags; - - r = ioctl(kvm-vm_fd, KVM_SET_USER_MEMORY_REGION, rmslot); - if (r == -1) { - fprintf(stderr, kvm_create_mem_hole: %s\n, strerror(errno)); - return -1; - } - free_slot(slot); - -#ifdef DEBUG_MEMREG - printf(%s, newslot1: gpa: %llx, size: %llx, uaddr: %llx, slot: %x, flags: %lx\n, - __func__, newslot1.guest_phys_addr, 
newslot1.memory_size, - newslot1.userspace_addr, newslot1.slot, newslot1.flags); -#endif - - r = ioctl(kvm-vm_fd, KVM_SET_USER_MEMORY_REGION, newslot1); - if (r == -1) { - fprintf(stderr, kvm_create_mem_hole: %s\n, strerror(errno)); - return -1; - } - register_slot(newslot1.slot, newslot1.guest_phys_addr, - newslot1.memory_size, 1, newslot1.userspace_addr, - newslot1.flags); - -#ifdef DEBUG_MEMREG - printf(%s, newslot2: gpa: %llx, size: %llx, uaddr: %llx, slot: %x, flags: %lx\n, - __func__, newslot2.guest_phys_addr, newslot2.memory_size, - newslot2.userspace_addr, newslot2.slot, newslot2.flags); -#endif - r = ioctl(kvm-vm_fd, KVM_SET_USER_MEMORY_REGION, newslot2); - if (r == -1) { - fprintf(stderr, kvm_create_mem_hole: %s\n, strerror(errno)); - return -1; - } - register_slot(newslot2.slot, newslot2.guest_phys_addr, - newslot2.memory_size, 1, newslot2.userspace_addr, - newslot2.flags); -#endif - return 0; -} - int kvm_register_userspace_phys_mem(kvm_context_t kvm, unsigned long phys_start, void *userspace_addr, unsigned long len, int log) diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c index f24c436..e6221f8 100644 --- a/qemu/qemu-kvm.c +++ b/qemu/qemu-kvm.c @@ -787,11 +787,12 @@ void kvm_cpu_register_physical_memory(target_phys_addr_t start_addr, if (r) return; r = kvm_is_intersecting_mem(kvm_context, start_addr); -if (r) -kvm_create_mem_hole(kvm_context, start_addr, size); -r = kvm_register_userspace_phys_mem(kvm_context, start_addr, -phys_ram_base + phys_offset, -size, 0); +if (r) { +printf(Ignoring intersecting memory %llx (%lx)\n, start_addr, size); +} else +r = kvm_register_userspace_phys_mem(kvm_context, start_addr, +phys_ram_base + phys_offset, +size, 0); if (r 0) { printf(kvm_cpu_register_physical_memory: failed\n); exit(1); -- 1.5.5.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 9/9] remove explicit calls to kvm_qemu_register_coalesced_mmio
It is now done automatically for IO regions inside kvm_cpu_register_physical_memory(). --- qemu/hw/cirrus_vga.c |2 -- qemu/hw/e1000.c | 12 qemu/hw/pci.c|3 --- qemu/hw/vga.c|4 4 files changed, 0 insertions(+), 21 deletions(-) diff --git a/qemu/hw/cirrus_vga.c b/qemu/hw/cirrus_vga.c index c7e8f2c..42bca4f 100644 --- a/qemu/hw/cirrus_vga.c +++ b/qemu/hw/cirrus_vga.c @@ -3291,8 +3291,6 @@ static void cirrus_init_common(CirrusVGAState * s, int device_id, int is_pci) cirrus_vga_mem_write, s); cpu_register_physical_memory(isa_mem_base + 0x000a, 0x2, vga_io_memory); -if (kvm_enabled()) -qemu_kvm_register_coalesced_mmio(isa_mem_base + 0x000a, 0x2); s-sr[0x06] = 0x0f; if (device_id == CIRRUS_ID_CLGD5446) { diff --git a/qemu/hw/e1000.c b/qemu/hw/e1000.c index 8d60ea6..ce3496b 100644 --- a/qemu/hw/e1000.c +++ b/qemu/hw/e1000.c @@ -951,18 +951,6 @@ e1000_mmio_map(PCIDevice *pci_dev, int region_num, d-mmio_base = addr; cpu_register_physical_memory(addr, PNPMMIO_SIZE, d-mmio_index); - -if (kvm_enabled()) { - int i; -uint32_t excluded_regs[] = { -E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS, -E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE -}; -qemu_kvm_register_coalesced_mmio(addr, excluded_regs[0]); -for (i = 0; excluded_regs[i] != PNPMMIO_SIZE; i++) -qemu_kvm_register_coalesced_mmio(addr + excluded_regs[i] + 4, - excluded_regs[i + 1] - excluded_regs[i] - 4); -} } static int diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c index 92683d1..f58c634 100644 --- a/qemu/hw/pci.c +++ b/qemu/hw/pci.c @@ -324,9 +324,6 @@ static void pci_update_mappings(PCIDevice *d) cpu_register_physical_memory(pci_to_cpu_addr(r-addr), r-size, IO_MEM_UNASSIGNED); -if (kvm_enabled()) -qemu_kvm_unregister_coalesced_mmio(r-addr, - r-size); } } r-addr = new_addr; diff --git a/qemu/hw/vga.c b/qemu/hw/vga.c index 95d6033..17b5a36 100644 --- a/qemu/hw/vga.c +++ b/qemu/hw/vga.c @@ -2257,8 +2257,6 @@ void vga_init(VGAState *s) vga_io_memory = cpu_register_io_memory(0, vga_mem_read, vga_mem_write, s); 
cpu_register_physical_memory(isa_mem_base + 0x000a, 0x2, vga_io_memory); -if (kvm_enabled()) -qemu_kvm_register_coalesced_mmio(isa_mem_base + 0x000a, 0x2); } /* Memory mapped interface */ @@ -2334,8 +2332,6 @@ static void vga_mm_init(VGAState *s, target_phys_addr_t vram_base, cpu_register_physical_memory(ctrl_base, 0x10, s_ioport_ctrl); s-bank_offset = 0; cpu_register_physical_memory(vram_base + 0x000a, 0x2, vga_io_memory); -if (kvm_enabled()) -qemu_kvm_register_coalesced_mmio(vram_base + 0x000a, 0x2); } int isa_vga_init(DisplayState *ds, uint8_t *vga_ram_base, -- 1.5.5.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/9] move kvm_cpu_register_memory_area into qemu's
Turn the explicit calls to kvm_cpu_register_memoy_area() an empty function. Provide a __kvm_cpu_register_memory_area() that is called from within cpu_register_memory_area(). To avoid registering mmio regions to the hypervisor, since we depend on them faulting, we keep track of what regions are mmio regions too. This is to be bisection friendly. Direct calls are to be removed in a later commit. Signed-off-by: Glauber Costa [EMAIL PROTECTED] --- libkvm/libkvm.c | 84 -- libkvm/libkvm.h |6 qemu/exec.c |3 ++ qemu/qemu-kvm.c | 22 ++ 4 files changed, 111 insertions(+), 4 deletions(-) diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c index c885dee..d62cb2a 100644 --- a/libkvm/libkvm.c +++ b/libkvm/libkvm.c @@ -65,14 +65,22 @@ struct slot_info { unsigned flags; }; +struct mmio_slot_info { +uint64_t phys_addr; +unsigned int len; +}; + struct slot_info slots[KVM_MAX_NUM_MEM_REGIONS]; +struct mmio_slot_info mmio_slots[KVM_MAX_NUM_MEM_REGIONS]; void init_slots(void) { int i; - for (i = 0; i KVM_MAX_NUM_MEM_REGIONS; ++i) + for (i = 0; i KVM_MAX_NUM_MEM_REGIONS; ++i) { slots[i].len = 0; + mmio_slots[i].len = 0; + } } int get_free_slot(kvm_context_t kvm) @@ -102,6 +110,16 @@ int get_free_slot(kvm_context_t kvm) return -1; } +int get_free_mmio_slot(kvm_context_t kvm) +{ + + unsigned int i; + for (i = 0; i KVM_MAX_NUM_MEM_REGIONS; ++i) + if (!mmio_slots[i].len) + return i; + return -1; +} + void register_slot(int slot, unsigned long phys_addr, unsigned long len, int user_alloc, unsigned long userspace_addr, unsigned flags) { @@ -153,14 +171,47 @@ int get_container_slot(uint64_t phys_addr, unsigned long size) return -1; } +int get_container_mmio_slot(kvm_context_t kvm, uint64_t phys_addr, unsigned long size) +{ + int i; + + for (i = 0; i KVM_MAX_NUM_MEM_REGIONS ; ++i) + if (mmio_slots[i].len mmio_slots[i].phys_addr = phys_addr + (mmio_slots[i].phys_addr + mmio_slots[i].len) = phys_addr + size) + return i; + return -1; +} + +int kvm_register_mmio_slot(kvm_context_t kvm, uint64_t 
phys_addr, unsigned int size) +{ + int slot = get_free_mmio_slot(kvm); + + if (slot == -1) + goto out; + +#ifdef DEBUG_MEMREG + printf(Registering mmio region %llx (%lx)\n, phys_addr, size); +#endif + mmio_slots[slot].phys_addr = phys_addr; + mmio_slots[slot].len = size; +out: + return slot; +} + int kvm_is_containing_region(kvm_context_t kvm, unsigned long phys_addr, unsigned long size) { int slot = get_container_slot(phys_addr, size); - if (slot == -1) - return 0; - return 1; + + if (slot != -1) + return 1; + slot = get_container_mmio_slot(kvm, phys_addr, size); + if (slot != -1) + return 1; + + return 0; } + /* * dirty pages logging control */ @@ -576,6 +627,31 @@ void kvm_destroy_phys_mem(kvm_context_t kvm, unsigned long phys_start, kvm_create_kernel_phys_mem(kvm, phys_start, 0, 0, 0); } +void kvm_unregister_memory_area(kvm_context_t kvm, uint64_t phys_addr, unsigned long size) +{ + + int slot = get_container_slot(phys_addr, size); + + if (slot != -1) { +#ifdef DEBUG_MEMREG + printf(Unregistering memory region %llx (%lx)\n, phys_addr, size); +#endif + kvm_destroy_phys_mem(kvm, phys_addr, size); + return; + } + + slot = get_container_mmio_slot(kvm, phys_addr, size); + if (slot != -1) { +#ifdef DEBUG_MEMREG + printf(Unregistering mmio region %llx (%lx)\n, phys_addr, size); +#endif + kvm_unregister_coalesced_mmio(kvm, phys_addr, size); + mmio_slots[slot].len = 0; + } + + return; +} + static int kvm_get_map(kvm_context_t kvm, int ioctl_num, int slot, void *buf) { int r; diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h index d762323..ceadc45 100644 --- a/libkvm/libkvm.h +++ b/libkvm/libkvm.h @@ -454,6 +454,10 @@ void *kvm_create_phys_mem(kvm_context_t, unsigned long phys_start, unsigned long len, int log, int writable); void kvm_destroy_phys_mem(kvm_context_t, unsigned long phys_start, unsigned long len); + +void kvm_unregister_memory_area(kvm_context_t, uint64_t phys_start, + unsigned long len); + int kvm_is_containing_region(kvm_context_t kvm, unsigned long 
phys_start, unsigned long size); int kvm_is_allocated_mem(kvm_context_t kvm, unsigned long phys_start, unsigned long len); @@ -467,6 +471,8 @@ int kvm_get_dirty_pages_range(kvm_context_t kvm, unsigned long phys_addr,
[PATCH 5/9] substitute is_allocated_mem with more general is_containing_region
Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 libkvm/libkvm.c |   34 +++++++++++++++++++++-------------
 libkvm/libkvm.h |    2 +-
 qemu/qemu-kvm.c |    2 +-
 3 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index 33f00b7..c885dee 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -140,6 +140,27 @@ int get_intersecting_slot(unsigned long phys_addr)
 	return -1;
 }
 
+/* Returns -1 if this slot is not totally contained on any other,
+ * and the number of the slot otherwise */
+int get_container_slot(uint64_t phys_addr, unsigned long size)
+{
+	int i;
+
+	for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS ; ++i)
+		if (slots[i].len && slots[i].phys_addr <= phys_addr &&
+		    (slots[i].phys_addr + slots[i].len) >= phys_addr + size)
+			return i;
+	return -1;
+}
+
+int kvm_is_containing_region(kvm_context_t kvm, unsigned long phys_addr, unsigned long size)
+{
+	int slot = get_container_slot(phys_addr, size);
+	if (slot == -1)
+		return 0;
+	return 1;
+}
+
 /*
  * dirty pages logging control
  */
@@ -491,19 +512,6 @@ int kvm_is_intersecting_mem(kvm_context_t kvm, unsigned long phys_start)
 	return get_intersecting_slot(phys_start) != -1;
 }
 
-int kvm_is_allocated_mem(kvm_context_t kvm, unsigned long phys_start,
-			 unsigned long len)
-{
-	int slot;
-
-	slot = get_slot(phys_start);
-	if (slot == -1)
-		return 0;
-	if (slots[slot].len == len)
-		return 1;
-	return 0;
-}
-
 int kvm_register_userspace_phys_mem(kvm_context_t kvm,
 			unsigned long phys_start, void *userspace_addr,
 			unsigned long len, int log)
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index 9f06fcc..d762323 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -454,7 +454,7 @@ void *kvm_create_phys_mem(kvm_context_t, unsigned long phys_start,
 			  unsigned long len, int log, int writable);
 void kvm_destroy_phys_mem(kvm_context_t, unsigned long phys_start,
 			  unsigned long len);
-int kvm_is_intersecting_mem(kvm_context_t kvm, unsigned long phys_start);
+int kvm_is_containing_region(kvm_context_t kvm, unsigned long phys_start, unsigned long size);
 int kvm_is_allocated_mem(kvm_context_t kvm, unsigned long phys_start,
 			 unsigned long len);
 int kvm_create_mem_hole(kvm_context_t kvm, unsigned long phys_start,
diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index e6221f8..bfbaacc 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -783,7 +783,7 @@ void kvm_cpu_register_physical_memory(target_phys_addr_t start_addr,
     r = kvm_check_extension(kvm_context, KVM_CAP_USER_MEMORY);
     if (r) {
         phys_offset &= ~IO_MEM_ROM;
-        r = kvm_is_allocated_mem(kvm_context, start_addr, size);
+        r = kvm_is_containing_region(kvm_context, start_addr, size);
         if (r)
             return;
         r = kvm_is_intersecting_mem(kvm_context, start_addr);
-- 
1.5.5.1
[PATCH] kvm: qemu: Add missing DEPLIBS in Makefile.target
From 50a27ca42a565579e78e3545ca097a65a88cbadd Mon Sep 17 00:00:00 2001
From: Sheng Yang [EMAIL PROTECTED]
Date: Wed, 13 Aug 2008 11:35:17 +0800
Subject: [PATCH] kvm: qemu: Add missing DEPLIBS in Makefile.target

It seems this flag went missing during a merge long ago. As a result, updating libkvm.a did not cause make to rebuild qemu, which was very puzzling.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 qemu/Makefile.target |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/qemu/Makefile.target b/qemu/Makefile.target
index f4df081..c09af57 100644
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -741,7 +741,7 @@ firmware.o: firmware.c
 	$(CC) $(HELPER_CFLAGS) $(CPPFLAGS) $(BASE_CFLAGS) -c -o $@ $<
 endif
 
-$(QEMU_PROG): $(OBJS) ../libqemu_common.a libqemu.a
+$(QEMU_PROG): $(OBJS) ../libqemu_common.a libqemu.a $(DEPLIBS)
 	$(CC) $(LDFLAGS) -o $@ $^ $(LIBS) $(SDL_LIBS) $(COCOA_LIBS) $(CURSES_LIBS) $(BRLAPI_LIBS) $(VDE_LIBS)
 
 endif # !CONFIG_USER_ONLY
-- 
1.5.4.5
Serial problems
Hi all,

Here at my work we develop our code for an embedded system in Linux, but the application we use to put firmware onto the physical device runs only under Windows. I am investigating the feasibility of running Windows as a virtual machine, rather than our current situation where every developer has two boxes at their desk.

The new boxes at work don't come with a serial port, so I am trying with a USB-to-serial converter and running kvm like:

  kvm -hda windows2.img -boot c -m 1000 -serial /dev/ttyUSB0 -smp 2 -usb -usbdevice tablet -full-screen -cdrom /dev/cdrom

I can do low-cpu tasks with the embedded device like reading the current configuration, but I can't do cpu-intensive tasks like loading a new firmware onto the device. I have sniffed the line to see what is being sent down the physical wires and I have logged inside Windows what the application was sending, and the two are almost the same until we meet an ASCII null. We log an ASCII null (0x00) as being sent, but on the other side of the virtual machine 0xFF is coming out.

We can run the application with Wine and download firmware but can't read the current configuration, so it's not the Linux USB-to-serial converter drivers.

So my question is, do you have any idea what's going wrong? I have tried kvm72 and the 2008-08-12 nightly snapshot. I am running an Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz with 2048MB RAM on Ubuntu Linux 8.04 (Hardy Heron).

If anyone can help, it would be greatly appreciated!

Thanks in advance,
Michael Malone