[Qemu-devel] Qemu doesn't implement SCSI READ DISC INFORMATION command (0x51) Qemu reports: SK=5h/ASC=20h/ACQ=00h
https://bugs.launchpad.net/qemu/+bug/612901

Just curious whether anyone has any information on this: perhaps it is fixed in a CVS or git branch of Qemu, or someone knows of a possible workaround; any information would be appreciated.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 07:17:30PM -0400, Kevin O'Connor wrote:
> On Wed, Aug 04, 2010 at 06:25:52PM +0300, Gleb Natapov wrote:
> > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote:
> > > There are better ways like using string I/O and optimizing the PIO
> > > path in the kernel.  That should cut down the 1s slow down with a
> > > 100MB initrd by a bit.  But honestly, shaving a couple hundred ms
> > > further off the initrd load is just not worth it using the current
> > > model.
> > >
> > The slow down is not 1s any more. String PIO emulation had many bugs
> > that were fixed in 2.6.35. I verified how much time it took to load 100M
> > via fw_cfg interface on older kernel and on 2.6.35. On older kernels on
> > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations
> > that were already committed make it 20s. I have some code prototype that
> > makes it 11s. I don't see how we can get below that, surely not back to
> > ~2-3sec.
>
> I guess this slowness is primarily for kvm.  I just ran some tests on
> the latest qemu (with TCG).  I pulled in a 400Meg file over fw_cfg
> using the SeaBIOS interface - it takes 9.8 seconds (pretty
> consistently).  Oddly, if I change SeaBIOS to use insb (string pio) it
> takes 11.5 seconds (again, pretty consistently).  These times were
> measured on the host - they don't include the extra time it takes qemu
> to start up (during which it reads the file into its memory).
>
Yes only KVM is affected, nothing has changed in qemu itself.

--
Gleb.
Re: [Qemu-devel] [PATCH 0/4] fix PowerPC 440 Bamboo platform emulation
On Wed, Aug 04, 2010 at 05:21:33PM -0700, Hollis Blanchard wrote:
> These patches get the PowerPC Bamboo platform working again. I've re-written
> two of the patches based on feedback from qemu-devel.
>
> Note that this platform still only works in conjunction with KVM, since the
> PowerPC 440 MMU is still not accurately emulated by TCG.

Is that the Book-E MMU? In case it is, I've got a couple of fairly ugly
patches somewhere that at least made it possible for me to boot linux on a
TCG emulated ppc 440. I'll see if I can dig them out and post them.

Cheers
Re: [Qemu-devel] [PATCH 4/4] ppc4xx: load Bamboo kernel, initrd, and fdt at fixed addresses
On Wed, Aug 04, 2010 at 05:21:37PM -0700, Hollis Blanchard wrote:
> We can't use the return value of load_uimage() for the kernel because it
> can't account for BSS size, and the PowerPC kernel does not relocate
> blobs before zeroing BSS.
>
> Instead, we now load at the fixed addresses chosen by u-boot (the normal
> firmware for the board).
>
> Signed-off-by: Hollis Blanchard

This looks good to me, thanks Hollis.

Acked-by: Edgar E. Iglesias

> ---
>  hw/ppc440_bamboo.c | 39 ++-
>  1 files changed, 18 insertions(+), 21 deletions(-)
>
> This fixes a critical bug in PowerPC 440 Bamboo board emulation.
>
> diff --git a/hw/ppc440_bamboo.c b/hw/ppc440_bamboo.c
> index d471d5d..34ddf45 100644
> --- a/hw/ppc440_bamboo.c
> +++ b/hw/ppc440_bamboo.c
> @@ -27,6 +27,11 @@
>
>  #define BINARY_DEVICE_TREE_FILE "bamboo.dtb"
>
> +/* from u-boot */
> +#define KERNEL_ADDR 0x100
> +#define FDT_ADDR 0x180
> +#define RAMDISK_ADDR 0x190
> +
>  static int bamboo_load_device_tree(target_phys_addr_t addr,
>                                     uint32_t ramsize,
>                                     target_phys_addr_t initrd_base,
> @@ -98,10 +103,8 @@ static void bamboo_init(ram_addr_t ram_size,
>      uint64_t elf_lowaddr;
>      target_phys_addr_t entry = 0;
>      target_phys_addr_t loadaddr = 0;
> -    target_long kernel_size = 0;
> -    target_ulong initrd_base = 0;
>      target_long initrd_size = 0;
> -    target_ulong dt_base = 0;
> +    int success;
>      int i;
>
>      /* Setup CPU. */
> @@ -118,15 +121,15 @@ static void bamboo_init(ram_addr_t ram_size,
>
>      /* Load kernel. */
>      if (kernel_filename) {
> -        kernel_size = load_uimage(kernel_filename, &entry, &loadaddr, NULL);
> -        if (kernel_size < 0) {
> -            kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
> -                                   &elf_lowaddr, NULL, 1, ELF_MACHINE, 0);
> +        success = load_uimage(kernel_filename, &entry, &loadaddr, NULL);
> +        if (success < 0) {
> +            success = load_elf(kernel_filename, NULL, NULL, &elf_entry,
> +                               &elf_lowaddr, NULL, 1, ELF_MACHINE, 0);
>              entry = elf_entry;
>              loadaddr = elf_lowaddr;
>          }
>          /* XXX try again as binary */
> -        if (kernel_size < 0) {
> +        if (success < 0) {
>              fprintf(stderr, "qemu: could not load kernel '%s'\n",
>                      kernel_filename);
>              exit(1);
> @@ -135,26 +138,20 @@ static void bamboo_init(ram_addr_t ram_size,
>
>      /* Load initrd. */
>      if (initrd_filename) {
> -        initrd_base = kernel_size + loadaddr;
> -        initrd_size = load_image_targphys(initrd_filename, initrd_base,
> -                                          ram_size - initrd_base);
> +        initrd_size = load_image_targphys(initrd_filename, RAMDISK_ADDR,
> +                                          ram_size - RAMDISK_ADDR);
>
>          if (initrd_size < 0) {
> -            fprintf(stderr, "qemu: could not load initial ram disk '%s'\n",
> -                    initrd_filename);
> +            fprintf(stderr, "qemu: could not load ram disk '%s' at %x\n",
> +                    initrd_filename, RAMDISK_ADDR);
>              exit(1);
>          }
>      }
>
>      /* If we're loading a kernel directly, we must load the device tree too. */
>      if (kernel_filename) {
> -        if (initrd_base)
> -            dt_base = initrd_base + initrd_size;
> -        else
> -            dt_base = kernel_size + loadaddr;
> -
> -        if (bamboo_load_device_tree(dt_base, ram_size,
> -                                    initrd_base, initrd_size, kernel_cmdline) < 0) {
> +        if (bamboo_load_device_tree(FDT_ADDR, ram_size, RAMDISK_ADDR,
> +                                    initrd_size, kernel_cmdline) < 0) {
>              fprintf(stderr, "couldn't load device tree\n");
>              exit(1);
>          }
> @@ -163,7 +160,7 @@ static void bamboo_init(ram_addr_t ram_size,
>
>      /* Set initial guest state. */
>      env->gpr[1] = (16<<20) - 8;
> -    env->gpr[3] = dt_base;
> +    env->gpr[3] = FDT_ADDR;
>      env->nip = entry;
>      /* XXX we currently depend on KVM to create some initial TLB entries. */
>  }
> --
> 1.7.2
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 11:06 PM, David S. Ahern wrote:
> On 08/04/10 11:34, Avi Kivity wrote:
>> And it's awesome for fast prototyping. Of course, once that fast
>> becomes dog slow, it's not useful anymore. For the Nth time, it's only
>> slow with 100MB initrds.
>
> 100MB is really not that large for an initrd. Consider the deployment of
> stateless nodes - something that virtualization allows the rapid
> deployment of. 1 kernel, 1 initrd with the various binaries to be run.
> Create nodes as needed by launching a shell command - be it for more
> capacity, isolation, etc. Why require an iso or disk wrapper for a
> binary blob that is all to be run out of memory?

It's inefficient. First qemu reads the initrd and stores it in memory
(where it is kept while the guest runs in case you migrate or reboot).
Then the guest copies it into temporary storage (where we currently have
the slowdown). Then the guest decompresses and extracts it to tmpfs
(initramfs model). Finally the guest runs init out of initrd, typically
using just a part of the 100MB+.

Whereas with a disk image, individual pages are copied to the guest on
demand without taking space in qemu. With cache=none, they don't even
affect host pagecache.

> The -append argument allows boot parameters to be specified at launch.
> That is a very powerful and simple design option.

Good point. You still have it with a small initrd that bootstraps a
larger image. Note -append probably works even without -kernel, it's
just that the guest isn't tooled to look at it.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
[Qemu-devel] [PATCH v2 2/8] qdev: export qdev_reset() for later use.
export qdev_reset() for later use.

Signed-off-by: Isaku Yamahata
---
 hw/qdev.c | 29 +
 hw/qdev.h |  1 +
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/hw/qdev.c b/hw/qdev.c
index e99c73f..322b315 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -256,13 +256,34 @@ DeviceState *qdev_device_add(QemuOpts *opts)
     return qdev;
 }
 
-static void qdev_reset(void *opaque)
+/*
+ * reset the device.
+ * Bring the device into initial known state (to some extent)
+ * on warm reset(system reset).
+ * Typically on system reset(or power-on reset), bus reset occurs on
+ * each bus which causes devices to reset.
+ * This reset doesn't include software reset which is triggered by
+ * issuing reset command. Those device reset would be implemented in a bus
+ * specific way.
+ *
+ * For example
+ * PCI: reset with RST# signal asserted. Not FLR of advanced feature capability
+ * PCIe: conventional reset. Not FLR.
+ * ATA: hardware reset with RESET- signal asserted. Not DEVICE RESET command.
+ * SCSI: hard reset with SCSI RST signal asserted.
+ *       Not bus device reset message.
+ */
+void qdev_reset(DeviceState *dev)
 {
-    DeviceState *dev = opaque;
     if (dev->info->reset)
         dev->info->reset(dev);
 }
 
+static void qdev_reset_fn(void *opaque)
+{
+    qdev_reset(opaque);
+}
+
 /* Initialize a device.  Device properties should be set before calling
    this function.  IRQs and MMIO regions should be connected/mapped after
    calling this function.
@@ -278,7 +299,7 @@ int qdev_init(DeviceState *dev)
         qdev_free(dev);
         return rc;
     }
-    qemu_register_reset(qdev_reset, dev);
+    qemu_register_reset(qdev_reset_fn, dev);
     if (dev->info->vmsd) {
         vmstate_register_with_alias_id(dev, -1, dev->info->vmsd, dev,
                                        dev->instance_id_alias,
@@ -350,7 +371,7 @@ void qdev_free(DeviceState *dev)
         if (dev->opts)
             qemu_opts_del(dev->opts);
     }
-    qemu_unregister_reset(qdev_reset, dev);
+    qemu_unregister_reset(qdev_reset_fn, dev);
     QLIST_REMOVE(dev, sibling);
     for (prop = dev->info->props; prop && prop->name; prop++) {
         if (prop->info->free) {
diff --git a/hw/qdev.h b/hw/qdev.h
index 678f8b7..10f6769 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -162,6 +162,7 @@ struct DeviceInfo {
 extern DeviceInfo *device_info_list;
 
 void qdev_register(DeviceInfo *info);
+void qdev_reset(DeviceState *dev);
 
 /* Register device properties.  */
 /* GPIO inputs also double as IRQ sinks.  */
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 4/8] pci: make pci_device_reset() aware of qdev.
Make pci_device_reset() distinguish qdev-ified devices from devices that
have not been converted yet. Later the two cases will be handled
differently.

Signed-off-by: Isaku Yamahata
---
 hw/pci.c | 35 +-
 hw/pci.h |  1 +
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 6a614d1..c48bb3e 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -130,8 +130,7 @@ static void pci_update_irq_status(PCIDevice *dev)
     }
 }
 
-/* Reset the device in response to RST# signal. */
-void pci_device_reset(PCIDevice *dev)
+void pci_device_reset_default(PCIDevice *dev)
 {
     int r;
 
@@ -159,6 +158,38 @@ void pci_device_reset(PCIDevice *dev)
     pci_update_mappings(dev);
 }
 
+/* Reset the device in response to RST# signal. */
+void pci_device_reset(PCIDevice *dev)
+{
+    if (!dev->qdev.info || !dev->qdev.info->reset) {
+        /* for not qdevified device or reset isn't implemented property.
+         * So take care of them in PCI generic layer.
+         */
+        pci_device_reset_default(dev);
+        return;
+    }
+
+    /*
+     * There are two paths to reset pci device. Each resets does partially.
+     * qemu_system_reset()
+     *  -> pci_device_reset() with bus
+     *     -> pci_device_reset_default() which resets pci common part.
+     *  -> DeviceState::reset: each device specific reset hanlder
+     *     which resets device specific part.
+     *
+     * TODO:
+     * It requires two execution paths to reset the device fully.
+     * It is confusing and prone to error. Each device should know all
+     * its states.
+     * So move this part to each device specific callback.
+     */
+
+    /* For now qdev_reset() is called directly by qemu_system_reset() */
+    /* qdev_reset(&dev->qdev); */
+
+    pci_device_reset_default(dev);
+}
+
 /*
  * Trigger pci bus reset under a given bus.
  * This functions emulates RST#.
  */
diff --git a/hw/pci.h b/hw/pci.h
index be05662..ce1feb4 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -210,6 +210,7 @@ PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min);
 
 void pci_bus_reset(PCIBus *bus);
 void pci_device_reset(PCIDevice *dev);
+void pci_device_reset_default(PCIDevice *dev);
 
 void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
                   void *irq_opaque, int nirq);
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 8/8] pci bridge: implement secondary bus reset.
implement secondary bus reset.

Signed-off-by: Isaku Yamahata
---
 hw/pci_bridge.c | 13 -
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
index ab7ed6e..37710e9 100644
--- a/hw/pci_bridge.c
+++ b/hw/pci_bridge.c
@@ -119,6 +119,9 @@ pcibus_t pci_bridge_get_limit(const PCIDevice *bridge, uint8_t type)
 void pci_bridge_write_config(PCIDevice *d,
                              uint32_t address, uint32_t val, int len)
 {
+    PCIBridge *s = container_of(d, PCIBridge, dev);
+    uint16_t bridge_control = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
+
     pci_default_write_config(d, address, val, len);
 
     if (/* io base/limit */
@@ -127,9 +130,17 @@ void pci_bridge_write_config(PCIDevice *d,
         /* memory base/limit, prefetchable base/limit and
            io base/limit upper 16 */
         ranges_overlap(address, len, PCI_MEMORY_BASE, 20)) {
-        PCIBridge *s = container_of(d, PCIBridge, dev);
         pci_bridge_update_mappings(&s->sec_bus);
     }
+
+    if (ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
+        uint16_t new = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
+        if (!(bridge_control & PCI_BRIDGE_CTL_BUS_RESET) &&
+            (new & PCI_BRIDGE_CTL_BUS_RESET)) {
+            /* 0 -> 1 */
+            pci_bus_reset(&s->sec_bus);
+        }
+    }
 }
 
 /* reset bridge specific configuration registers */
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 1/8] apb: fix typo.
fix typo.

Signed-off-by: Isaku Yamahata
---
 hw/apb_pci.c | 6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/apb_pci.c b/hw/apb_pci.c
index 10a5baa..c619112 100644
--- a/hw/apb_pci.c
+++ b/hw/apb_pci.c
@@ -362,7 +362,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
     /* APB secondary busses */
     pci_dev = pci_create_multifunction(d->bus, PCI_DEVFN(1, 0), true,
                                        "pbm-bridge");
-    br = DO_UPCAST(PCIBridge, dev, dev);
+    br = DO_UPCAST(PCIBridge, dev, pci_dev);
     pci_bridge_map_irq(br, "Advanced PCI Bus secondary bridge 1",
                        pci_apb_map_irq);
     qdev_init_nofail(&pci_dev->qdev);
@@ -370,7 +370,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
     pci_dev = pci_create_multifunction(d->bus, PCI_DEVFN(1, 1), true,
                                        "pbm-bridge");
-    br = DO_UPCAST(PCIBridge, dev, dev);
+    br = DO_UPCAST(PCIBridge, dev, pci_dev);
     pci_bridge_map_irq(br, "Advanced PCI Bus secondary bridge 2",
                        pci_apb_map_irq);
     qdev_init_nofail(&pci_dev->qdev);
@@ -462,7 +462,7 @@ static PCIDeviceInfo pbm_pci_bridge_info = {
     .qdev.name = "pbm-bridge",
     .qdev.size = sizeof(PCIBridge),
     .qdev.vmsd = &vmstate_pci_device,
-    .qdev.reset = pci_brdige_reset,
+    .qdev.reset = pci_bridge_reset,
     .init = apb_pci_bridge_initfn,
     .exit = pci_bridge_exitfn,
     .config_write = pci_bridge_write_config,
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 3/8] pci: export pci_bus_reset() and pci_device_reset() for later use.
export pci_bus_reset() and pci_device_reset() for later use, with a slight
function signature adjustment.

Signed-off-by: Isaku Yamahata
---
 hw/pci.c | 17 +
 hw/pci.h |  4
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 2dc1577..6a614d1 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -130,7 +130,8 @@ static void pci_update_irq_status(PCIDevice *dev)
     }
 }
 
-static void pci_device_reset(PCIDevice *dev)
+/* Reset the device in response to RST# signal. */
+void pci_device_reset(PCIDevice *dev)
 {
     int r;
 
@@ -158,9 +159,12 @@ static void pci_device_reset(PCIDevice *dev)
     pci_update_mappings(dev);
 }
 
-static void pci_bus_reset(void *opaque)
+/*
+ * Trigger pci bus reset under a given bus.
+ * This functions emulates RST#.
+ */
+void pci_bus_reset(PCIBus *bus)
 {
-    PCIBus *bus = opaque;
     int i;
 
     for (i = 0; i < bus->nirq; i++) {
@@ -173,6 +177,11 @@ static void pci_bus_reset(void *opaque)
     }
 }
 
+static void pci_bus_reset_fn(void *opaque)
+{
+    pci_bus_reset(opaque);
+}
+
 static void pci_host_bus_register(int domain, PCIBus *bus)
 {
     struct PCIHostBus *host;
@@ -227,7 +236,7 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
     pci_host_bus_register(0, bus); /* for now only pci domain 0 is supported */
     vmstate_register(NULL, -1, &vmstate_pcibus, bus);
-    qemu_register_reset(pci_bus_reset, bus);
+    qemu_register_reset(pci_bus_reset_fn, bus);
 }
 
 PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min)
diff --git a/hw/pci.h b/hw/pci.h
index c551f96..be05662 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -207,6 +207,10 @@ typedef int (*pci_hotplug_fn)(DeviceState *qdev, PCIDevice *pci_dev, int state);
 void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
                          const char *name, int devfn_min);
 PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min);
+
+void pci_bus_reset(PCIBus *bus);
+void pci_device_reset(PCIDevice *dev);
+
 void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
                   void *irq_opaque, int nirq);
 void pci_bus_hotplug(PCIBus *bus, pci_hotplug_fn hotplug, DeviceState *dev);
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 5/8] qdev: introduce bus reset callback and helper functions.
Introduce a bus reset callback to support bus reset at the qbus layer,
together with a function to trigger bus reset. A qdev reset callback is now
triggered by its parent qbus reset callback, and a qdev should in turn
trigger the reset callbacks of its child buses.

Signed-off-by: Isaku Yamahata
---
changes v1 -> v2
- eliminate qemu_register_reset() from qdev_create() as Gerd suggested.
- Inserted qdev_reset_default() as appropriate.
  This is required for qdev which has reset callback and child bus.
---
 hw/esp.c        |  2 ++
 hw/lsi53c895a.c |  1 +
 hw/qdev.c       | 42 +++---
 hw/qdev.h       |  7 +++
 4 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/hw/esp.c b/hw/esp.c
index 349052a..cafc257 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -423,6 +423,8 @@ static void esp_hard_reset(DeviceState *d)
 {
     ESPState *s = container_of(d, ESPState, busdev.qdev);
+    qdev_reset_default(d);
+
     memset(s->rregs, 0, ESP_REGS);
     memset(s->wregs, 0, ESP_REGS);
     s->rregs[ESP_TCHI] = TCHI_FAS100A; // Indicate fas100a
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index bd7b661..33a8eb2 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -2042,6 +2042,7 @@ static void lsi_scsi_reset(DeviceState *dev)
 {
     LSIState *s = DO_UPCAST(LSIState, dev.qdev, dev);
+    qdev_reset_default(dev);
     lsi_soft_reset(s);
 }
diff --git a/hw/qdev.c b/hw/qdev.c
index 322b315..8352f20 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -256,6 +256,14 @@ DeviceState *qdev_device_add(QemuOpts *opts)
     return qdev;
 }
 
+void qdev_reset_default(DeviceState *dev)
+{
+    BusState *bus;
+    QLIST_FOREACH(bus, &dev->child_bus, sibling) {
+        qbus_reset(bus);
+    }
+}
+
 /*
  * reset the device.
  * Bring the device into initial known state (to some extent)
@@ -275,8 +283,11 @@ DeviceState *qdev_device_add(QemuOpts *opts)
  */
 void qdev_reset(DeviceState *dev)
 {
-    if (dev->info->reset)
+    if (dev->info->reset) {
         dev->info->reset(dev);
+    } else {
+        qdev_reset_default(dev);
+    }
 }
 
 static void qdev_reset_fn(void *opaque)
@@ -299,7 +310,6 @@ int qdev_init(DeviceState *dev)
         qdev_free(dev);
         return rc;
     }
-    qemu_register_reset(qdev_reset_fn, dev);
     if (dev->info->vmsd) {
         vmstate_register_with_alias_id(dev, -1, dev->info->vmsd, dev,
                                        dev->instance_id_alias,
@@ -671,6 +681,29 @@ static BusState *qbus_find(const char *path)
     }
 }
 
+void qbus_reset_default(BusState *bus)
+{
+    DeviceState *dev;
+    QLIST_FOREACH(dev, &bus->children, sibling) {
+        qdev_reset(dev);
+    }
+}
+
+/* trigger bus reset */
+void qbus_reset(BusState *bus)
+{
+    if (bus->info->reset) {
+        bus->info->reset(bus);
+    } else {
+        qbus_reset_default(bus);
+    }
+}
+
+static void qbus_reset_fn(void *opaque)
+{
+    qbus_reset(opaque);
+}
+
 void qbus_create_inplace(BusState *bus, BusInfo *info,
                          DeviceState *parent, const char *name)
 {
@@ -705,7 +738,10 @@ void qbus_create_inplace(BusState *bus, BusInfo *info,
         QLIST_INSERT_HEAD(&parent->child_bus, bus, sibling);
         parent->num_child_bus++;
     }
-
+    if (!parent || !parent->info) {
+        /* parent device takes care of child bus reset */
+        qemu_register_reset(qbus_reset_fn, bus);
+    }
 }
 
 BusState *qbus_create(BusInfo *info, DeviceState *parent, const char *name)
diff --git a/hw/qdev.h b/hw/qdev.h
index 10f6769..af76f31 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -50,6 +50,7 @@ struct DeviceState {
 
 typedef void (*bus_dev_printfn)(Monitor *mon, DeviceState *dev, int indent);
 typedef char *(*bus_get_dev_path)(DeviceState *dev);
+typedef void (*bus_resetfn)(BusState *bus);
 
 struct BusInfo {
     const char *name;
@@ -57,6 +58,9 @@ struct BusInfo {
     bus_dev_printfn print_dev;
     bus_get_dev_path get_dev_path;
     Property *props;
+
+    /* bus reset callbacks */
+    bus_resetfn reset;
 };
 
 struct BusState {
@@ -163,6 +167,7 @@ extern DeviceInfo *device_info_list;
 void qdev_register(DeviceInfo *info);
 void qdev_reset(DeviceState *dev);
+void qdev_reset_default(DeviceState *dev);
 
 /* Register device properties.  */
 /* GPIO inputs also double as IRQ sinks.  */
@@ -179,6 +184,8 @@ void qbus_create_inplace(BusState *bus, BusInfo *info,
                          DeviceState *parent, const char *name);
 BusState *qbus_create(BusInfo *info, DeviceState *parent, const char *name);
 void qbus_free(BusState *bus);
+void qbus_reset(BusState *bus);
+void qbus_reset_default(BusState *bus);
 
 #define FROM_QBUS(type, dev) DO_UPCAST(type, qbus, dev)
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 7/8] pci: eliminate work around in pci_device_reset().
Eliminate the work around in pci_device_reset() by making each PCI reset
function call pci_device_reset_default() itself. Each device should know
how to reset itself; it shouldn't be done automatically by the generic PCI
layer. The PCI layer should just signal reset and let each device respond.

Signed-off-by: Isaku Yamahata
---
 hw/e1000.c      | 1 +
 hw/lsi53c895a.c | 2 ++
 hw/pci.c        | 6 --
 hw/pcnet.c      | 1 +
 hw/rtl8139.c    | 2 ++
 hw/virtio-pci.c | 1 +
 6 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 8d87492..0f303b0 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -1077,6 +1077,7 @@ static void e1000_reset(void *opaque)
     memmove(d->mac_reg, mac_reg_init, sizeof mac_reg_init);
     d->rxbuf_min_shift = 1;
     memset(&d->tx, 0, sizeof d->tx);
+    pci_device_reset_default(&d->dev);
 }
 
 static NetClientInfo net_e1000_info = {
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 33a8eb2..1e4ba10 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -358,6 +358,8 @@ static void lsi_soft_reset(LSIState *s)
         qemu_free(s->current);
         s->current = NULL;
     }
+
+    pci_device_reset_default(&s->dev);
 }
 
 static int lsi_dma_40bit(LSIState *s)
diff --git a/hw/pci.c b/hw/pci.c
index 731d367..54cb89b 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -171,13 +171,7 @@ void pci_device_reset(PCIDevice *dev)
         return;
     }
 
-    /*
-     * TODO:
-     * Each device should know all its states.
-     * So move this part to each device specific callback.
-     */
     qdev_reset(&dev->qdev);
-    pci_device_reset_default(dev);
 }
 
 /*
diff --git a/hw/pcnet.c b/hw/pcnet.c
index b52935a..e73e682 100644
--- a/hw/pcnet.c
+++ b/hw/pcnet.c
@@ -2023,6 +2023,7 @@ static void pci_reset(DeviceState *dev)
     PCIPCNetState *d = DO_UPCAST(PCIPCNetState, pci_dev.qdev, dev);
 
     pcnet_h_reset(&d->state);
+    pci_device_reset_default(&d->pci_dev);
 }
 
 static PCIDeviceInfo pcnet_info = {
diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index d92981d..1f35e5d 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -1260,6 +1260,8 @@ static void rtl8139_reset(DeviceState *d)
 
     /* reset tally counters */
     RTL8139TallyCounters_clear(&s->tally_counters);
+
+    pci_device_reset_default(&s->dev);
 }
 
 static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters)
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index c728fff..d9b97be 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -184,6 +184,7 @@ static void virtio_pci_reset(DeviceState *d)
     virtio_reset(proxy->vdev);
     msix_reset(&proxy->pci_dev);
     proxy->bugs = 0;
+    pci_device_reset_default(&proxy->pci_dev);
 }
 
 static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 6/8] pci: use qbus bus reset callback.
use qbus bus reset callback.

Signed-off-by: Isaku Yamahata
---
 hw/apb_pci.c    |  2 ++
 hw/pci.c        | 23 ++-
 hw/pci_bridge.c |  2 ++
 3 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/hw/apb_pci.c b/hw/apb_pci.c
index c619112..775063a 100644
--- a/hw/apb_pci.c
+++ b/hw/apb_pci.c
@@ -384,6 +384,8 @@ static void pci_pbm_reset(DeviceState *d)
     unsigned int i;
     APBState *s = container_of(d, APBState, busdev.qdev);
 
+    qdev_reset_default(d);
+
     for (i = 0; i < 8; i++) {
         s->pci_irq_map[i] &= PBM_PCI_IMR_MASK;
     }
diff --git a/hw/pci.c b/hw/pci.c
index c48bb3e..731d367 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -40,12 +40,14 @@
 static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent);
 static char *pcibus_get_dev_path(DeviceState *dev);
+static void pci_bus_reset_fn(BusState *qbus);
 
 struct BusInfo pci_bus_info = {
     .name = "PCI",
     .size = sizeof(PCIBus),
     .print_dev = pcibus_dev_print,
     .get_dev_path = pcibus_get_dev_path,
+    .reset = pci_bus_reset_fn,
     .props = (Property[]) {
         DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
         DEFINE_PROP_STRING("romfile", PCIDevice, romfile),
@@ -170,23 +172,11 @@ void pci_device_reset(PCIDevice *dev)
     }
 
     /*
-     * There are two paths to reset pci device. Each resets does partially.
-     * qemu_system_reset()
-     *  -> pci_device_reset() with bus
-     *     -> pci_device_reset_default() which resets pci common part.
-     *  -> DeviceState::reset: each device specific reset hanlder
-     *     which resets device specific part.
-     *
      * TODO:
-     * It requires two execution paths to reset the device fully.
-     * It is confusing and prone to error. Each device should know all
-     * its states.
+     * Each device should know all its states.
      * So move this part to each device specific callback.
      */
-
-    /* For now qdev_reset() is called directly by qemu_system_reset() */
-    /* qdev_reset(&dev->qdev); */
-
+    qdev_reset(&dev->qdev);
     pci_device_reset_default(dev);
 }
 
@@ -208,9 +198,9 @@ void pci_bus_reset(PCIBus *bus)
     }
 }
 
-static void pci_bus_reset_fn(void *opaque)
+static void pci_bus_reset_fn(BusState *qbus)
 {
-    pci_bus_reset(opaque);
+    pci_bus_reset(DO_UPCAST(PCIBus, qbus, qbus));
 }
 
 static void pci_host_bus_register(int domain, PCIBus *bus)
@@ -267,7 +257,6 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
     pci_host_bus_register(0, bus); /* for now only pci domain 0 is supported */
     vmstate_register(NULL, -1, &vmstate_pcibus, bus);
-    qemu_register_reset(pci_bus_reset_fn, bus);
 }
 
 PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min)
diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
index 198c3c7..ab7ed6e 100644
--- a/hw/pci_bridge.c
+++ b/hw/pci_bridge.c
@@ -158,6 +158,8 @@ void pci_bridge_reset_reg(PCIDevice *dev)
 void pci_bridge_reset(DeviceState *qdev)
 {
     PCIDevice *dev = DO_UPCAST(PCIDevice, qdev, qdev);
+    PCIBridge *br = DO_UPCAST(PCIBridge, dev, dev);
+    pci_bus_reset(&br->sec_bus);
 
     pci_bridge_reset_reg(dev);
 }
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 0/8] qbus reset callback and implement pci bus reset
Changes v1 -> v2:
- addressed personal feedback from Gerd.
- reset signals are triggered by the bus and propagated down into devices.
- Only 5/8 is modified; the other patches remain the same.

This patch series isn't for the 0.13 release; it is for the MST pci branch.
(git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu.git pci)

Patch description:
Introduce a bus reset notion at the qbus layer and implement PCI bus reset
with it. First the related code is cleaned up, then the bus reset callback
is introduced, and finally PCI bus reset is implemented on top of it.

The main motivation is to implement PCI bus reset, but I suppose the SCSI
bus and IDE bus could also take advantage of this patch series.

Isaku Yamahata (8):
  apb: fix typo.
  qdev: export qdev_reset() for later use.
  pci: export pci_bus_reset() and pci_device_reset() for later use.
  pci: make pci_device_reset() aware of qdev.
  qdev: introduce bus reset callback and helper functions.
  pci: use qbus bus reset callback.
  pci: eliminate work around in pci_device_reset().
  pci bridge: implement secondary bus reset.

 hw/apb_pci.c    |  8 --
 hw/e1000.c      |  1 +
 hw/esp.c        |  2 +
 hw/lsi53c895a.c |  3 ++
 hw/pci.c        | 31 +---
 hw/pci.h        |  5
 hw/pci_bridge.c | 15 +++-
 hw/pcnet.c      |  1 +
 hw/qdev.c       | 69 ++
 hw/qdev.h       |  8 ++
 hw/rtl8139.c    |  2 +
 hw/virtio-pci.c |  1 +
 12 files changed, 132 insertions(+), 14 deletions(-)
Re: [Qemu-devel] Re: [PATCH 2/3] pci/pci_host: pci host bus initialization clean up.
On Mon, Jul 26, 2010 at 02:33:30PM +0300, Michael S. Tsirkin wrote:
> > +/*
> > + * TODO: there remains some boards which doesn't use PCIHostState.
> > + * Enhance PCIHostState API and convert remaining boards.
>
> I think I remember this comment from Paul:
> On Tuesday 12 January 2010, Isaku Yamahata wrote:
> > > To use pci host framework, use PCIHostState instead of PCIBus in
> > > PCIVPBState.
> >
> > No.
> >
> > pci_host.[ch] provides very specific functionality, it is not a generic
> > PCI host device. Specifically it provides indirect access to PCI config
> > space via a memory mapped {address,data} pair. The versatile PCI host
> > exposes PCI config space directly, so should not be using this code.
> >
> > If you want a generic framework for PCI hosts then you need to use
> > something else. If nothing else, assuming that a PCI host bridge is
> > always a SysBus device is wrong.
>
> Still applies? No objection?

Paul, do you have any comment?
--
yamahata
[Qemu-devel] [Bug 613681] [NEW] implement true fullscreen
Public bug reported:

Please implement a fullscreen functionality (similar to the one found in vmware, where there is an autohide bar) that enables display of a 1920x1080 VM on a 1920x1080 host screen (for example) without resizing; currently the menubar prevents this. Thank you.

** Affects: qemu
   Importance: Undecided
   Status: New

** Tags: wishlist

--
implement true fullscreen
https://bugs.launchpad.net/bugs/613681
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.

Status in QEMU: New
[Qemu-devel] [PATCH 4/4] ppc4xx: load Bamboo kernel, initrd, and fdt at fixed addresses
We can't use the return value of load_uimage() for the kernel because it
can't account for BSS size, and the PowerPC kernel does not relocate
blobs before zeroing BSS.

Instead, we now load at the fixed addresses chosen by u-boot (the normal
firmware for the board).

Signed-off-by: Hollis Blanchard
---
 hw/ppc440_bamboo.c | 39 ++-
 1 files changed, 18 insertions(+), 21 deletions(-)

This fixes a critical bug in PowerPC 440 Bamboo board emulation.

diff --git a/hw/ppc440_bamboo.c b/hw/ppc440_bamboo.c
index d471d5d..34ddf45 100644
--- a/hw/ppc440_bamboo.c
+++ b/hw/ppc440_bamboo.c
@@ -27,6 +27,11 @@
 
 #define BINARY_DEVICE_TREE_FILE "bamboo.dtb"
 
+/* from u-boot */
+#define KERNEL_ADDR 0x100
+#define FDT_ADDR 0x180
+#define RAMDISK_ADDR 0x190
+
 static int bamboo_load_device_tree(target_phys_addr_t addr,
                                    uint32_t ramsize,
                                    target_phys_addr_t initrd_base,
@@ -98,10 +103,8 @@ static void bamboo_init(ram_addr_t ram_size,
     uint64_t elf_lowaddr;
     target_phys_addr_t entry = 0;
     target_phys_addr_t loadaddr = 0;
-    target_long kernel_size = 0;
-    target_ulong initrd_base = 0;
     target_long initrd_size = 0;
-    target_ulong dt_base = 0;
+    int success;
     int i;
 
     /* Setup CPU. */
@@ -118,15 +121,15 @@ static void bamboo_init(ram_addr_t ram_size,
 
     /* Load kernel. */
     if (kernel_filename) {
-        kernel_size = load_uimage(kernel_filename, &entry, &loadaddr, NULL);
-        if (kernel_size < 0) {
-            kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
-                                   &elf_lowaddr, NULL, 1, ELF_MACHINE, 0);
+        success = load_uimage(kernel_filename, &entry, &loadaddr, NULL);
+        if (success < 0) {
+            success = load_elf(kernel_filename, NULL, NULL, &elf_entry,
+                               &elf_lowaddr, NULL, 1, ELF_MACHINE, 0);
             entry = elf_entry;
             loadaddr = elf_lowaddr;
         }
         /* XXX try again as binary */
-        if (kernel_size < 0) {
+        if (success < 0) {
             fprintf(stderr, "qemu: could not load kernel '%s'\n",
                     kernel_filename);
             exit(1);
@@ -135,26 +138,20 @@ static void bamboo_init(ram_addr_t ram_size,
 
     /* Load initrd. */
     if (initrd_filename) {
-        initrd_base = kernel_size + loadaddr;
-        initrd_size = load_image_targphys(initrd_filename, initrd_base,
-                                          ram_size - initrd_base);
+        initrd_size = load_image_targphys(initrd_filename, RAMDISK_ADDR,
+                                          ram_size - RAMDISK_ADDR);
 
         if (initrd_size < 0) {
-            fprintf(stderr, "qemu: could not load initial ram disk '%s'\n",
-                    initrd_filename);
+            fprintf(stderr, "qemu: could not load ram disk '%s' at %x\n",
+                    initrd_filename, RAMDISK_ADDR);
             exit(1);
         }
     }
 
     /* If we're loading a kernel directly, we must load the device tree too. */
     if (kernel_filename) {
-        if (initrd_base)
-            dt_base = initrd_base + initrd_size;
-        else
-            dt_base = kernel_size + loadaddr;
-
-        if (bamboo_load_device_tree(dt_base, ram_size,
-                                    initrd_base, initrd_size, kernel_cmdline) < 0) {
+        if (bamboo_load_device_tree(FDT_ADDR, ram_size, RAMDISK_ADDR,
+                                    initrd_size, kernel_cmdline) < 0) {
             fprintf(stderr, "couldn't load device tree\n");
             exit(1);
         }
@@ -163,7 +160,7 @@ static void bamboo_init(ram_addr_t ram_size,
 
     /* Set initial guest state. */
     env->gpr[1] = (16<<20) - 8;
-    env->gpr[3] = dt_base;
+    env->gpr[3] = FDT_ADDR;
     env->nip = entry;
     /* XXX we currently depend on KVM to create some initial TLB entries. */
 }
-- 
1.7.2
[Qemu-devel] [PATCH 3/4] ppc4xx: don't unregister RAM at reset
The PowerPC 4xx SDRAM controller emulation unregisters RAM in its reset callback. However, qemu_system_reset() is now called at initialization time, so all RAM is unregistered before starting the guest (!). Signed-off-by: Hollis Blanchard --- hw/ppc4xx_devs.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) This fixes a critical bug in PowerPC 440 Bamboo board emulation. diff --git a/hw/ppc4xx_devs.c b/hw/ppc4xx_devs.c index be130c4..7f698b8 100644 --- a/hw/ppc4xx_devs.c +++ b/hw/ppc4xx_devs.c @@ -619,7 +619,6 @@ static void sdram_reset (void *opaque) /* We pre-initialize RAM banks */ sdram->status = 0x; sdram->cfg = 0x0080; -sdram_unmap_bcr(sdram); } void ppc4xx_sdram_init (CPUState *env, qemu_irq irq, int nbanks, -- 1.7.2
[Qemu-devel] [PATCH 2/4] ppc4xx: correct SDRAM controller warning message condition
The message "Truncating memory to %d MiB to fit SDRAM controller limits" should be displayed only when a user chooses an amount of RAM which can't be represented by the PPC 4xx SDRAM controller (e.g. 129MB, which would only be valid if the controller supports a bank size of 1MB). Signed-off-by: Hollis Blanchard --- hw/ppc4xx_devs.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/hw/ppc4xx_devs.c b/hw/ppc4xx_devs.c index b15db81..be130c4 100644 --- a/hw/ppc4xx_devs.c +++ b/hw/ppc4xx_devs.c @@ -684,7 +684,7 @@ ram_addr_t ppc4xx_sdram_adjust(ram_addr_t ram_size, int nr_banks, } ram_size -= size_left; -if (ram_size) +if (size_left) printf("Truncating memory to %d MiB to fit SDRAM controller limits.\n", (int)(ram_size >> 20)); -- 1.7.2
[Qemu-devel] [PATCH 1/4] Fix "make install" with a cross toolchain
We must be able to use a non-native strip executable, but not all versions of 'install' support the --strip-program option (e.g. OpenBSD). Accordingly, we can't use 'install -s', and we must run strip separately. Signed-off-by: Hollis Blanchard Cc: blauwir...@gmail.com --- Makefile.target |5 - configure |4 +++- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/Makefile.target b/Makefile.target index 8a9c427..00bf6f9 100644 --- a/Makefile.target +++ b/Makefile.target @@ -326,7 +326,10 @@ clean: install: all ifneq ($(PROGS),) - $(INSTALL) -m 755 $(STRIP_OPT) $(PROGS) "$(DESTDIR)$(bindir)" + $(INSTALL) -m 755 $(PROGS) "$(DESTDIR)$(bindir)" +ifneq ($(STRIP),) + $(STRIP) $(patsubst %,"$(DESTDIR)$(bindir)/%",$(PROGS)) +endif endif # Include automatically generated dependency files diff --git a/configure b/configure index a20371c..146dac0 100755 --- a/configure +++ b/configure @@ -80,6 +80,7 @@ make="make" install="install" objcopy="objcopy" ld="ld" +strip="strip" helper_cflags="" libs_softmmu="" libs_tools="" @@ -125,6 +126,7 @@ cc="${cross_prefix}${cc}" ar="${cross_prefix}${ar}" objcopy="${cross_prefix}${objcopy}" ld="${cross_prefix}${ld}" +strip="${cross_prefix}${strip}" # default flags for all hosts QEMU_CFLAGS="-fno-strict-aliasing $QEMU_CFLAGS" @@ -2227,7 +2229,7 @@ if test "$debug" = "yes" ; then echo "CONFIG_DEBUG_EXEC=y" >> $config_host_mak fi if test "$strip_opt" = "yes" ; then - echo "STRIP_OPT=-s" >> $config_host_mak + echo "STRIP=${strip}" >> $config_host_mak fi if test "$bigendian" = "yes" ; then echo "HOST_WORDS_BIGENDIAN=y" >> $config_host_mak -- 1.7.2
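The configure change above prefixes the strip tool the same way it already prefixes cc, ar, objcopy, and ld. A minimal sketch of that derivation (the cross prefix here is a hypothetical example):

```shell
# Sketch of how configure derives the cross tool name (simplified from
# the patch above); "powerpc-linux-gnu-" is a hypothetical prefix
# passed via --cross-prefix.
cross_prefix="powerpc-linux-gnu-"
strip="strip"
strip="${cross_prefix}${strip}"
echo "STRIP=${strip}"   # prints STRIP=powerpc-linux-gnu-strip
```

Makefile.target then runs `$(STRIP)` on the installed binaries as a separate step, instead of relying on `install -s`, which would invoke the host's native strip on cross-built executables.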
[Qemu-devel] [PATCH 0/4] fix PowerPC 440 Bamboo platform emulation
These patches get the PowerPC Bamboo platform working again. I've re-written two of the patches based on feedback from qemu-devel. Note that this platform still only works in conjunction with KVM, since the PowerPC 440 MMU is still not accurately emulated by TCG.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 06:25:52PM +0300, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > > There are better ways like using string I/O and optimizing the PIO > > path in the kernel. That should cut down the 1s slow down with a > > 100MB initrd by a bit. But honestly, shaving a couple hundred ms > > further off the initrd load is just not worth it using the current > > model. > > > The slow down is not 1s any more. String PIO emulation had many bugs > that were fixed in 2.6.35. I verified how much time it took to load 100M > via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > that was already committed make it 20s. I have some code prototype that > makes it 11s. I don't see how we can get below that, surely not back to > ~2-3sec. I guess this slowness is primarily for kvm. I just ran some tests on the latest qemu (with TCG). I pulled in a 400Meg file over fw_cfg using the SeaBIOS interface - it takes 9.8 seconds (pretty consistently). Oddly, if I change SeaBIOS to use insb (string pio) it takes 11.5 seconds (again, pretty consistently). These times were measured on the host - they don't include the extra time it takes qemu to start up (during which it reads the file into its memory). -Kevin
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 06:01:54PM +0300, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 09:50:55AM -0500, Anthony Liguori wrote: > > On 08/04/2010 09:38 AM, Gleb Natapov wrote: > > >ROM does not muck with the e820. It uses PMM to allocate memory and the > > >memory it gets is marked as reserved in e820 map. Every ROM is implemented differently - there's no way to really know what they'll do. > > PMM allocations are only valid during the init function's execution. > > It's intention is to enable the use of scratch memory to decompress > > or otherwise modify the ROM to shrink its size. > > > Hm, may be. I read seabios code differently, but may be I misread it. There is a PCIv3 extension to PMM which supports long term memory allocations. SeaBIOS does implement this. The base PMM spec though only supports memory allocations during the POST phase. -Kevin
[Qemu-devel] [RFC PATCH 2/4] AMD IOMMU emulation
This introduces emulation for the AMD IOMMU, described in "AMD I/O Virtualization Technology (IOMMU) Specification". Signed-off-by: Eduard - Gabriel Munteanu --- Makefile.target |2 + configure | 10 + hw/amd_iommu.c | 671 +++ hw/pc.c |4 + hw/pc.h |3 + hw/pci_ids.h|2 + hw/pci_regs.h |1 + 7 files changed, 693 insertions(+), 0 deletions(-) create mode 100644 hw/amd_iommu.c diff --git a/Makefile.target b/Makefile.target index 70a9c1b..86226a0 100644 --- a/Makefile.target +++ b/Makefile.target @@ -219,6 +219,8 @@ obj-i386-y += pcspk.o i8254.o obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o +obj-i386-$(CONFIG_AMD_IOMMU) += amd_iommu.o + # Hardware support obj-ia64-y += ide.o pckbd.o vga.o $(SOUND_HW) dma.o $(AUDIODRV) obj-ia64-y += fdc.o mc146818rtc.o serial.o i8259.o ipf.o diff --git a/configure b/configure index af50607..7448603 100755 --- a/configure +++ b/configure @@ -317,6 +317,7 @@ io_thread="no" mixemu="no" kvm_cap_pit="" kvm_cap_device_assignment="" +amd_iommu="no" kerneldir="" aix="no" blobs="yes" @@ -629,6 +630,8 @@ for opt do ;; --enable-kvm-device-assignment) kvm_cap_device_assignment="yes" ;; + --enable-amd-iommu-emul) amd_iommu="yes" + ;; --enable-profiler) profiler="yes" ;; --enable-cocoa) @@ -871,6 +874,8 @@ echo " --disable-kvm-pitdisable KVM pit support" echo " --enable-kvm-pit enable KVM pit support" echo " --disable-kvm-device-assignment disable KVM device assignment support" echo " --enable-kvm-device-assignment enable KVM device assignment support" +echo " --disable-amd-iommu-emul disable AMD IOMMU emulation" +echo " --enable-amd-iommu-emul enable AMD IOMMU emulation" echo " --disable-nptl disable usermode NPTL support" echo " --enable-nptlenable usermode NPTL support" echo " --enable-system enable all system emulation targets" @@ -2251,6 +2256,7 @@ echo "Install blobs $blobs" echo "KVM support $kvm" echo "KVM PIT support $kvm_cap_pit" echo "KVM device assig. 
$kvm_cap_device_assignment" +echo "AMD IOMMU emul. $amd_iommu" echo "fdt support $fdt" echo "preadv support$preadv" echo "fdatasync $fdatasync" @@ -2645,6 +2651,10 @@ case "$target_arch2" in x86_64) TARGET_BASE_ARCH=i386 target_phys_bits=64 +if test "$amd_iommu" = "yes"; then + echo "CONFIG_AMD_IOMMU=y" >> $config_target_mak + echo "CONFIG_PCI_IOMMU=y" >> $config_host_mak +fi ;; ia64) target_phys_bits=64 diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c new file mode 100644 index 000..ff9903e --- /dev/null +++ b/hw/amd_iommu.c @@ -0,0 +1,671 @@ +/* + * AMD IOMMU emulation + * + * Copyright (c) 2010 Eduard - Gabriel Munteanu + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. 
+ */ + +#include "pc.h" +#include "hw.h" +#include "pci.h" +#include "qlist.h" + +/* Capability registers */ +#define CAPAB_HEADER0x00 +#define CAPAB_REV_TYPE0x02 +#define CAPAB_FLAGS 0x03 +#define CAPAB_BAR_LOW 0x04 +#define CAPAB_BAR_HIGH 0x08 +#define CAPAB_RANGE 0x0C +#define CAPAB_MISC 0x10 + +#define CAPAB_SIZE 0x14 + +/* Capability header data */ +#define CAPAB_FLAG_IOTLBSUP (1 << 0) +#define CAPAB_FLAG_HTTUNNEL (1 << 1) +#define CAPAB_FLAG_NPCACHE (1 << 2) +#define CAPAB_INIT_REV (1 << 3) +#define CAPAB_INIT_TYPE 3 +#define CAPAB_INIT_REV_TYPE (CAPAB_REV | CAPAB_TYPE) +#define CAPAB_INIT_FLAGS(CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL) +#define CAPAB_INIT_MISC (64 << 15) | (48 << 8) +#define CAPAB_BAR_MASK ~((1UL << 14) - 1) + +/* MMIO registers */ +#define MMIO_DEVICE_TABLE 0x +#define MMIO_COMMAND_BASE 0x0008 +#define MMIO_EV
[Qemu-devel] [RFC PATCH 3/4] ide: use the PCI memory access interface
Emulated PCI IDE controllers now use the memory access interface. This also allows an emulated IOMMU to translate and check accesses. Map invalidation results in cancelling DMA transfers. Since the guest OS can't properly recover the DMA results in case the mapping is changed, this is a fairly good approximation. Signed-off-by: Eduard - Gabriel Munteanu --- dma-helpers.c | 37 +++-- dma.h | 21 - hw/ide/core.c | 15 --- hw/ide/internal.h | 39 +++ hw/ide/pci.c |7 +++ 5 files changed, 109 insertions(+), 10 deletions(-) diff --git a/dma-helpers.c b/dma-helpers.c index d4fc077..408fee3 100644 --- a/dma-helpers.c +++ b/dma-helpers.c @@ -10,12 +10,34 @@ #include "dma.h" #include "block_int.h" -void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint) +static void *qemu_sglist_default_map(void *opaque, + target_phys_addr_t addr, + target_phys_addr_t *len, + int is_write) +{ +return cpu_physical_memory_map(addr, len, is_write); +} + +static void qemu_sglist_default_unmap(void *opaque, + void *buffer, + target_phys_addr_t len, + int is_write, + target_phys_addr_t access_len) +{ +cpu_physical_memory_unmap(buffer, len, is_write, access_len); +} + +void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, + QEMUSGMapFunc *map, QEMUSGUnmapFunc *unmap, void *opaque) { qsg->sg = qemu_malloc(alloc_hint * sizeof(ScatterGatherEntry)); qsg->nsg = 0; qsg->nalloc = alloc_hint; qsg->size = 0; + +qsg->map = map ? map : (QEMUSGMapFunc *) qemu_sglist_default_map; +qsg->unmap = unmap ? 
unmap : (QEMUSGUnmapFunc *) qemu_sglist_default_unmap; +qsg->opaque = opaque; } void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base, @@ -79,6 +101,16 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs) } } +static void dma_bdrv_cancel(void *opaque) +{ +DMAAIOCB *dbs = opaque; + +bdrv_aio_cancel(dbs->acb); +dma_bdrv_unmap(dbs); +qemu_iovec_destroy(&dbs->iov); +qemu_aio_release(dbs); +} + static void dma_bdrv_cb(void *opaque, int ret) { DMAAIOCB *dbs = (DMAAIOCB *)opaque; @@ -100,7 +132,8 @@ static void dma_bdrv_cb(void *opaque, int ret) while (dbs->sg_cur_index < dbs->sg->nsg) { cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte; cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte; -mem = cpu_physical_memory_map(cur_addr, &cur_len, !dbs->is_write); +mem = dbs->sg->map(dbs->sg->opaque, dma_bdrv_cancel, dbs, + cur_addr, &cur_len, !dbs->is_write); if (!mem) break; qemu_iovec_add(&dbs->iov, mem, cur_len); diff --git a/dma.h b/dma.h index f3bb275..d48f35c 100644 --- a/dma.h +++ b/dma.h @@ -15,6 +15,19 @@ #include "hw/hw.h" #include "block.h" +typedef void QEMUSGInvalMapFunc(void *opaque); +typedef void *QEMUSGMapFunc(void *opaque, +QEMUSGInvalMapFunc *inval_cb, +void *inval_opaque, +target_phys_addr_t addr, +target_phys_addr_t *len, +int is_write); +typedef void QEMUSGUnmapFunc(void *opaque, + void *buffer, + target_phys_addr_t len, + int is_write, + target_phys_addr_t access_len); + typedef struct { target_phys_addr_t base; target_phys_addr_t len; @@ -25,9 +38,15 @@ typedef struct { int nsg; int nalloc; target_phys_addr_t size; + +QEMUSGMapFunc *map; +QEMUSGUnmapFunc *unmap; +void *opaque; } QEMUSGList; -void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint); +void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, + QEMUSGMapFunc *map, QEMUSGUnmapFunc *unmap, + void *opaque); void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base, target_phys_addr_t len); void qemu_sglist_destroy(QEMUSGList *qsg); diff --git a/hw/ide/core.c 
b/hw/ide/core.c index 0b3b7c2..c19013a 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -435,7 +435,8 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write) } prd; int l, len; -qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1); +qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1, + bm->map, bm->unmap, bm->opaque); s->io_buffer_size = 0; for(;;) { if (bm->cur_prd_len == 0) { @@ -443,7 +444,7 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write) if
[Qemu-devel] [RFC PATCH 4/4] rtl8139: use the PCI memory access interface
This allows the device to work properly with an emulated IOMMU. Signed-off-by: Eduard - Gabriel Munteanu --- hw/rtl8139.c | 99 - 1 files changed, 56 insertions(+), 43 deletions(-) diff --git a/hw/rtl8139.c b/hw/rtl8139.c index 72e2242..99d5f69 100644 --- a/hw/rtl8139.c +++ b/hw/rtl8139.c @@ -412,12 +412,6 @@ typedef struct RTL8139TallyCounters uint16_t TxUndrn; } RTL8139TallyCounters; -/* Clears all tally counters */ -static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters); - -/* Writes tally counters to specified physical memory address */ -static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* counters); - typedef struct RTL8139State { PCIDevice dev; uint8_t phys[8]; /* mac address */ @@ -496,6 +490,14 @@ typedef struct RTL8139State { } RTL8139State; +/* Clears all tally counters */ +static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters); + +/* Writes tally counters to specified physical memory address */ +static void +RTL8139TallyCounters_physical_memory_write(RTL8139State *s, + target_phys_addr_t tc_addr); + static void rtl8139_set_next_tctr_time(RTL8139State *s, int64_t current_time); static void prom9346_decode_command(EEprom9346 *eeprom, uint8_t command) @@ -746,6 +748,8 @@ static int rtl8139_cp_transmitter_enabled(RTL8139State *s) static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size) { +PCIDevice *dev = &s->dev; + if (s->RxBufAddr + size > s->RxBufferSize) { int wrapped = MOD2(s->RxBufAddr + size, s->RxBufferSize); @@ -757,15 +761,15 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size) if (size > wrapped) { -cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, - buf, size-wrapped ); +pci_memory_write(dev, s->RxBuf + s->RxBufAddr, + buf, size-wrapped); } /* reset buffer pointer */ s->RxBufAddr = 0; -cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, - buf + (size-wrapped), wrapped ); +pci_memory_write(dev, s->RxBuf + 
s->RxBufAddr, + buf + (size-wrapped), wrapped); s->RxBufAddr = wrapped; @@ -774,7 +778,7 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size) } /* non-wrapping path or overwrapping enabled */ -cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, buf, size ); +pci_memory_write(dev, s->RxBuf + s->RxBufAddr, buf, size); s->RxBufAddr += size; } @@ -814,6 +818,7 @@ static int rtl8139_can_receive(VLANClientState *nc) static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_t size_, int do_interrupt) { RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque; +PCIDevice *dev = &s->dev; int size = size_; uint32_t packet_header = 0; @@ -968,13 +973,13 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_ uint32_t val, rxdw0,rxdw1,rxbufLO,rxbufHI; -cpu_physical_memory_read(cplus_rx_ring_desc,(uint8_t *)&val, 4); +pci_memory_read(dev, cplus_rx_ring_desc,(uint8_t *)&val, 4); rxdw0 = le32_to_cpu(val); -cpu_physical_memory_read(cplus_rx_ring_desc+4, (uint8_t *)&val, 4); +pci_memory_read(dev, cplus_rx_ring_desc+4, (uint8_t *)&val, 4); rxdw1 = le32_to_cpu(val); -cpu_physical_memory_read(cplus_rx_ring_desc+8, (uint8_t *)&val, 4); +pci_memory_read(dev, cplus_rx_ring_desc+8, (uint8_t *)&val, 4); rxbufLO = le32_to_cpu(val); -cpu_physical_memory_read(cplus_rx_ring_desc+12, (uint8_t *)&val, 4); +pci_memory_read(dev, cplus_rx_ring_desc+12, (uint8_t *)&val, 4); rxbufHI = le32_to_cpu(val); DEBUG_PRINT(("RTL8139: +++ C+ mode RX descriptor %d %08x %08x %08x %08x\n", @@ -1019,7 +1024,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_ target_phys_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI); /* receive/copy to target memory */ -cpu_physical_memory_write( rx_addr, buf, size ); +pci_memory_write(dev, rx_addr, buf, size); if (s->CpCmd & CPlusRxChkSum) { @@ -1032,7 +1037,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_ #else val = 0; #endif 
-cpu_physical_memory_write( rx_addr+size, (uint8_t *)&val, 4); +pci_memory_write(dev, rx_addr + size, (uint8_t *)&val, 4); /* first segment of received packet flag */ #define CP_RX_STATUS_FS (1<<29) @@ -1081,9 +1086,9 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t
[Qemu-devel] [RFC PATCH 0/4] AMD IOMMU emulation 2nd version
Hi, I hope I solved the issues raised by Anthony and Paul. Please have a look and tell me what you think. However, don't merge it yet (in case you like it), I need to test and cleanup some pieces further. There are also some patches from the previous series I didn't include yet. Thanks, Eduard Eduard - Gabriel Munteanu (4): pci: memory access API and IOMMU support AMD IOMMU emulation ide: use the PCI memory access interface rtl8139: use the PCI memory access interface Makefile.target |2 + configure | 10 + dma-helpers.c | 37 +++- dma.h | 21 ++- hw/amd_iommu.c| 671 + hw/ide/core.c | 15 +- hw/ide/internal.h | 39 +++ hw/ide/pci.c |7 + hw/pc.c |4 + hw/pc.h |3 + hw/pci.c | 145 hw/pci.h | 130 +++ hw/pci_ids.h |2 + hw/pci_regs.h |1 + hw/rtl8139.c | 99 + qemu-common.h |1 + 16 files changed, 1134 insertions(+), 53 deletions(-) create mode 100644 hw/amd_iommu.c
[Qemu-devel] [RFC PATCH 1/4] pci: memory access API and IOMMU support
PCI devices should access memory through pci_memory_*() instead of cpu_physical_memory_*(). This also provides support for translation and access checking in case an IOMMU is emulated. Memory maps are treated as remote IOTLBs (that is, translation caches belonging to the IOMMU-aware device itself). Clients (devices) must provide callbacks for map invalidation in case these maps are persistent beyond the current I/O context, e.g. AIO DMA transfers. Signed-off-by: Eduard - Gabriel Munteanu --- hw/pci.c | 145 + hw/pci.h | 130 +++ qemu-common.h |1 + 3 files changed, 276 insertions(+), 0 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 6871728..ce2734b 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -58,6 +58,10 @@ struct PCIBus { Keep a count of the number of devices with raised IRQs. */ int nirq; int *irq_count; + +#ifdef CONFIG_PCI_IOMMU +PCIIOMMU *iommu; +#endif }; static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent); @@ -2029,6 +2033,147 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent) } } +#ifdef CONFIG_PCI_IOMMU + +void pci_register_iommu(PCIDevice *dev, PCIIOMMU *iommu) +{ +dev->bus->iommu = iommu; +} + +void pci_memory_rw(PCIDevice *dev, + pci_addr_t addr, + uint8_t *buf, + pci_addr_t len, + int is_write) +{ +int err, plen; +unsigned perms; +PCIIOMMU *iommu = dev->bus->iommu; +target_phys_addr_t paddr; + +if (!iommu || !iommu->translate) +return cpu_physical_memory_rw(addr, buf, len, is_write); + +perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ; + +while (len) { +err = iommu->translate(iommu, dev, addr, &paddr, &plen, perms); +if (err) +return; + +/* The translation might be valid for larger regions. 
*/ +if (plen > len) +plen = len; + +cpu_physical_memory_rw(paddr, buf, plen, is_write); + +len -= plen; +addr += plen; +buf += plen; +} +} + +void *pci_memory_map(PCIDevice *dev, + PCIInvalidateIOTLBFunc *cb, + void *opaque, + pci_addr_t addr, + target_phys_addr_t *len, + int is_write) +{ +int err, plen; +unsigned perms; +PCIIOMMU *iommu = dev->bus->iommu; +target_phys_addr_t paddr; + +if (!iommu || !iommu->translate) +return cpu_physical_memory_map(addr, len, is_write); + +perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ; + +plen = *len; +err = iommu->translate(iommu, dev, addr, &paddr, &plen, perms); +if (err) +return NULL; + +/* + * If this is true, the virtual region is contiguous, + * but the translated physical region isn't. We just + * clamp *len, much like cpu_physical_memory_map() does. + */ +if (plen < *len) +*len = plen; + +/* We treat maps as remote TLBs to cope with stuff like AIO. */ +if (cb && iommu->register_iotlb_invalidator) +iommu->register_iotlb_invalidator(iommu, dev, addr, cb, opaque); + +return cpu_physical_memory_map(paddr, len, is_write); +} + +void pci_memory_unmap(PCIDevice *dev, + void *buffer, + target_phys_addr_t len, + int is_write, + target_phys_addr_t access_len) +{ +cpu_physical_memory_unmap(buffer, len, is_write, access_len); +} + +#define DEFINE_PCI_LD(suffix, size) \ +uint##size##_t pci_ld##suffix(PCIDevice *dev, pci_addr_t addr)\ +{ \ +PCIIOMMU *iommu = dev->bus->iommu;\ +target_phys_addr_t paddr; \ +int plen, err;\ + \ +if (!iommu || !iommu->translate) \ +return ld##suffix##_phys(addr); \ + \ +err = iommu->translate(iommu, dev,\ + addr, &paddr, &plen, IOMMU_PERM_READ); \ +if (err || (plen < size / 8)) \ +return 0; \ + \ +return ld##suffix##_phys(paddr); \ +} + +#define DEFINE_PCI_ST(suffix, size) \ +void pci_st##suffix(PCIDevice *dev, pci_addr_t addr,
RE: [Qemu-devel] [PATCH] Added an option to set the VMDK adapter type
there sent from my Telstra NEXTG™ handset -Original Message- From: Kevin Wolf Sent: Wednesday, 4 August 2010 10:29 PM To: andrzej zaborowski Cc: Aaron Mason ; qemu-devel@nongnu.org Subject: Re: [Qemu-devel] [PATCH] Added an option to set the VMDK adapter type Am 04.08.2010 14:27, schrieb andrzej zaborowski: > Hi, > > On 4 August 2010 12:30, Kevin Wolf wrote: >> Am 04.08.2010 01:46, schrieb Aaron Mason: >>> +const char *real_filename, *temp_str, *adapterType = "ide"; > > Sorry to complain about style, but note that uppercase characters are > not used in variable names in Qemu (that I see). Whoops, missed that one when complaining about the other style problems. Yes, this should be adapter_type. Kevin
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 02:06:58PM -0600, David S. Ahern wrote: > > > On 08/04/10 11:34, Avi Kivity wrote: > > >> And it's awesome for fast prototyping. Of course, once that fast > >> becomes dog slow, it's not useful anymore. > > > > For the Nth time, it's only slow with 100MB initrds. > > 100MB is really not that large for an initrd. I'd just like to note that the libguestfs initrd is uncompressed. The reason for this is I found that the decompression code in Linux is really slow. I have to admit I didn't look into why this is. By not compressing it on the host and decompressing it on the guest, we saved a bunch of boot time (3-5 seconds IIRC). Anyway, comparing 115MB libguestfs initrd and other initrd sizes may not be a fair comparison, since almost every other initrd you will see will be compressed. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into Xen guests. http://et.redhat.com/~rjones/virt-p2v
[Qemu-devel] [Bug 613529] Re: qemu does not accept regular disk geometry
Seems to be the same issue as in http://qemu-forum.ipi.fi/viewtopic.php?f=4&t=5218 -- qemu does not accept regular disk geometry https://bugs.launchpad.net/bugs/613529 You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. Status in QEMU: New Bug description: Hi, I am currently hunting a strange bug in qemu/kvm: I am using an lvm logical volume as a virtual hard disk for a virtual machine. I use fdisk or parted to create a partition table and partitions, kpartx to generate the device entries for the partitions, then install linux on ext3/ext4 with grub or msdos filesystem with syslinux. But then, in most cases even the boot process fails or behaves strangely; sometimes even mounting the file system in the virtual machine fails. It seems as if there is a problem with the virtual disk geometry. The problem does not seem to occur if I reboot the host system after creating the partition table on the logical volume. I guess the linux kernel needs to learn the disk geometry by reboot. A blockdev --rereadpt does not work on lvm volumes. The first approach to test/fix the problem would be to pass the disk geometry to qemu/kvm with the -drive option. Unfortunately, qemu/kvm does not accept the default geometry with 255 heads and 63 sectors. It seems to limit the number of heads to 16, thus limiting the disk size.
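For reference, the geometry-passing approach the reporter describes would look roughly like this (illustrative only — the device path and sizes are made up, and whether the guest then boots depends on the geometry the partitions were actually created with):

```shell
# Hypothetical invocation: specify an explicit geometry that stays
# within the 16-head limit instead of the BIOS-style 255/63 one.
qemu-kvm -drive file=/dev/vg0/guest,if=ide,cyls=16383,heads=16,secs=63
```

A 255-head geometry is rejected, which is what the report is about.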
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/10 11:34, Avi Kivity wrote: >> And it's awesome for fast prototyping. Of course, once that fast >> becomes dog slow, it's not useful anymore. > > For the Nth time, it's only slow with 100MB initrds. 100MB is really not that large for an initrd. Consider the deployment of stateless nodes - something that virtualization allows the rapid deployment of. 1 kernel, 1 initrd with the various binaries to be run. Create nodes as needed by launching a shell command - be it for more capacity, isolation, etc. Why require an iso or disk wrapper for a binary blob that is all to be run out of memory? The -append argument allows boot parameters to be specified at launch. That is a very powerful and simple design option. David
[Qemu-devel] segfault due to missing qdev_create()?
I am able to run qemu with the following commandline: /usr/local/bin/qemu-system-ppcemb -enable-kvm -kernel uImage.bamboo -nographic -M bamboo ppc440-angstrom-linux.img However, when I try to use virtio instead, I get this segfault: /usr/local/bin/qemu-system-ppcemb -enable-kvm -kernel uImage.bamboo -drive file=ppc440-angstrom-linux.img,if=virtio -nographic -M bamboo #0 0x1009864c in qbus_find_recursive (bus=0x0, name=0x0, info=0x10287238) at /home/hollisb/work/qemu.git/hw/qdev.c:461 #1 0x10099cc4 in qdev_device_add (opts=0x108a07a0) at /home/hollisb/work/qemu.git/hw/qdev.c:229 #2 0x101a4220 in device_init_func (opts=, opaque=) at /home/hollisb/work/qemu.git/vl.c:1519 #3 0x1002baf8 in qemu_opts_foreach (list=, func=0x101a4204 , opaque=0x0, abort_on_failure=) at qemu-option.c:978 #4 0x101a68e0 in main (argc=, argv=, envp=) at /home/hollisb/work/qemu.git/vl.c:2890 This patch avoids the segfault, but just gives me this message: No 'PCI' bus found for device 'virtio-blk-pci' diff --git a/hw/qdev.c b/hw/qdev.c index e99c73f..8fe4f06 100644 --- a/hw/qdev.c +++ b/hw/qdev.c @@ -455,6 +455,9 @@ static BusState *qbus_find_recursive(BusState *bus, const ch BusState *child, *ret; int match = 1; + if (!bus) + return NULL; + if (name && (strcmp(bus->name, name) != 0)) { match = 0; } FWIW, hw/ppc4xx_pci.c is my PCI controller. Do I need to add some qdev magic to that file to make this work? -Hollis
Re: [Qemu-devel] qemu cp15 access
That patch did not fix my issue. My problem turned out to be due to TLS accesses to cp15 not being allowed by qemu in user mode, even though these are permitted in ARMv6 and above architectures (e.g. see http://infocenter.arm.com/help/topic/com.arm.doc.ddi0388f/CIHFGFGE.html). This was corrected by patch: http://patchwork.ozlabs.org/patch/43797/ which seems to be applied in trunk and will be released in 0.13.0 Thanks. On Wed, Jul 28, 2010 at 7:23 AM, Loïc Minier wrote: > On Mon, Jul 26, 2010, Raymes Khoury wrote: > > I am having the problem with qemu, as described in the post > > > http://old.nabble.com/-PATCH:-PR-target-42671--Use-Thumb1-GOT-address-loading-sequence-for--%09Thumb2-td27124497.html > > where > > accessing cp15 on ARM causes an error: > > See mid 1280086076-20649-1-git-send-email-loic.min...@linaro.org and > thread > > http://article.gmane.org/gmane.comp.emulators.qemu/77092 > > -- > Loïc Minier > >
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 09:16 PM, Anthony Liguori wrote: Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. You can't boot from 9p. As Alex said, you boot from a non-100MB initrd (or cdrom) and mount the 9pfs. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 20:16, Anthony Liguori wrote: > On 08/04/2010 01:13 PM, Alexander Graf wrote: >> On 04.08.2010, at 19:46, Richard W.M. Jones wrote: >> >> >>> On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: >>> This is basically my suggestion to libguestfs: instead of generating an initrd, generate a bootable cdrom, and boot from that. The result is faster and has a smaller memory footprint. Everyone wins. >>> We had some discussion of this upstream& decided to do this. It >>> should save the time it takes for the guest kernel to unpack the >>> initrd, so maybe another second off boot time, which could bring us >>> ever closer to the "golden" 5 second boot target. >>> >>> It's not trivial mind you, and won't happen straightaway. Part of it >>> is that it requires reworking the appliance builder (a matter of just >>> coding really). The less trivial part is that we have to 'hide' the >>> CD device throughout the publically available interfaces. Then of >>> course, a lot of testing. >>> >> Why not go with 9p? That would save off even more time, as you don't have to >> generate an iso. You could just copy all the relevant executables into tmpfs >> and boot from there using your kernel and a very small (pre-built) initrd. >> > > You can't boot from 9p. But you could still use -kernel and -initrd for that, no? Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 09:13 PM, Alexander Graf wrote: It's not trivial mind you, and won't happen straightaway. Part of it is that it requires reworking the appliance builder (a matter of just coding really). The less trivial part is that we have to 'hide' the CD device throughout the publically available interfaces. Then of course, a lot of testing. Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. Yes - and you don't need to copy, just hardlink if your /tmp and /usr are on the same filesystem. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 01:13 PM, Alexander Graf wrote: On 04.08.2010, at 19:46, Richard W.M. Jones wrote: On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: This is basically my suggestion to libguestfs: instead of generating an initrd, generate a bootable cdrom, and boot from that. The result is faster and has a smaller memory footprint. Everyone wins. We had some discussion of this upstream& decided to do this. It should save the time it takes for the guest kernel to unpack the initrd, so maybe another second off boot time, which could bring us ever closer to the "golden" 5 second boot target. It's not trivial mind you, and won't happen straightaway. Part of it is that it requires reworking the appliance builder (a matter of just coding really). The less trivial part is that we have to 'hide' the CD device throughout the publically available interfaces. Then of course, a lot of testing. Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. You can't boot from 9p. Regards, Anthony Liguori Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 19:46, Richard W.M. Jones wrote: > On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: >> This is basically my suggestion to libguestfs: instead of generating >> an initrd, generate a bootable cdrom, and boot from that. The >> result is faster and has a smaller memory footprint. Everyone wins. > > We had some discussion of this upstream & decided to do this. It > should save the time it takes for the guest kernel to unpack the > initrd, so maybe another second off boot time, which could bring us > ever closer to the "golden" 5 second boot target. > > It's not trivial mind you, and won't happen straightaway. Part of it > is that it requires reworking the appliance builder (a matter of just > coding really). The less trivial part is that we have to 'hide' the > CD device throughout the publically available interfaces. Then of > course, a lot of testing. Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 19:53, Anthony Liguori wrote: > On 08/04/2010 12:37 PM, Avi Kivity wrote: >> On 08/04/2010 08:27 PM, Anthony Liguori wrote: >>> On 08/04/2010 12:19 PM, Avi Kivity wrote: On 08/04/2010 08:01 PM, Paolo Bonzini wrote: > > That's another story and I totally agree here, but not reusing /dev/sd* > is not intrinsic in the design of virtio-blk (and one thing that Windows > gets right; everything is SCSI, period). > I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? >>> >>> Because that's what real hardware with only a few rare exceptions. >>> >> >> I thought that IDE was emulated as SCSI even when it wasn't. But I guess >> now with SATA you're right. > > IDE -> EIDE -> ATA -> SATA > > ATA can encapsulate SCSI commands via ATAPI which gives you the ability to > have ATA based CD-ROMs among other things. > > I don't believe that SATA actually uses SCSI commands for read/write > operations It doesn't. In fact, it's basically just a wrapper around the normal ATA commands - even for read/write. Plus some additional SATA only commands for parallel read/write. > but I think Linux exposes SATA drivers as SCSI anyway. Yup. That's what libata does. Even works with PATA drives. But this is a purely Linux internal thing. Alex
[Qemu-devel] [PATCH 2/3] savevm: Generate a name when run without one
When savevm is run without a name, the name stays blank and the snapshot is saved anyway. The new behavior is that when savevm is run without parameters, a name is generated automatically, so the snapshot is accessible to the user without needing the id when loadvm is run.

(qemu) savevm
(qemu) info snapshots
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         vm-20100728134640      978K 2010-07-28 13:46:40   00:00:08.603

We use a name with the format 'vm-YYYYMMDDHHMMSS'. This is a first step to hide the internal id, because I don't see a reason to expose this kind of internals to the user.

Signed-off-by: Miguel Di Ciurcio Filho
---
 savevm.c |   29 ++++++++++++++++++++---------
 1 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/savevm.c b/savevm.c
index 9291cfb..025bee6 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1799,8 +1799,10 @@ void do_savevm(Monitor *mon, const QDict *qdict)
     uint32_t vm_state_size;
 #ifdef _WIN32
     struct _timeb tb;
+    struct tm *ptm;
 #else
     struct timeval tv;
+    struct tm tm;
 #endif
     const char *name = qdict_get_try_str(qdict, "name");
 
@@ -1831,15 +1833,6 @@ void do_savevm(Monitor *mon, const QDict *qdict)
     vm_stop(0);
 
     memset(sn, 0, sizeof(*sn));
-    if (name) {
-        ret = bdrv_snapshot_find(bs, old_sn, name);
-        if (ret >= 0) {
-            pstrcpy(sn->name, sizeof(sn->name), old_sn->name);
-            pstrcpy(sn->id_str, sizeof(sn->id_str), old_sn->id_str);
-        } else {
-            pstrcpy(sn->name, sizeof(sn->name), name);
-        }
-    }
 
     /* fill auxiliary fields */
 #ifdef _WIN32
@@ -1853,6 +1846,24 @@ void do_savevm(Monitor *mon, const QDict *qdict)
 #endif
     sn->vm_clock_nsec = qemu_get_clock(vm_clock);
 
+    if (name) {
+        ret = bdrv_snapshot_find(bs, old_sn, name);
+        if (ret >= 0) {
+            pstrcpy(sn->name, sizeof(sn->name), old_sn->name);
+            pstrcpy(sn->id_str, sizeof(sn->id_str), old_sn->id_str);
+        } else {
+            pstrcpy(sn->name, sizeof(sn->name), name);
+        }
+    } else {
+#ifdef _WIN32
+        ptm = localtime(&tb.time);
+        strftime(sn->name, sizeof(sn->name), "vm-%Y%m%d%H%M%S", ptm);
+#else
+        localtime_r(&tv.tv_sec, &tm);
+        strftime(sn->name, sizeof(sn->name), "vm-%Y%m%d%H%M%S", &tm);
+#endif
+    }
+
     /* Delete old snapshots of the same name */
     if (name && del_existing_snapshots(mon, name) < 0) {
         goto the_end;
-- 
1.7.1
[Qemu-devel] [PATCH 0/3] snapshots: various updates
Hi there! This series introduces updates to the 'info snapshots' and 'savevm' commands.

Patch 1 summarizes the output of 'info snapshots' to show only fully available snapshots.

Patch 2 adds a default name to a snapshot in case the user did not provide one, using a template like vm-YYYYMMDDHHMMSS.

Patch 3 adds -f to the 'savevm' command, in case the user really wants to overwrite a snapshot.

More details in each patch.

Changelog from previous version:
- libvirt is not affected by the change in savevm
- Fixed some coding errors and did not rename variables

Regards, Miguel

Miguel Di Ciurcio Filho (3):
  monitor: make 'info snapshots' show only fully available snapshots
  savevm: Generate a name when run without one
  savevm: prevent snapshot overwriting

 qemu-monitor.hx |    7 ++--
 savevm.c        |   97 ++++++++++++++++++++++++++++++++-----------------
 2 files changed, 76 insertions(+), 28 deletions(-)
[Qemu-devel] [PATCH 1/3] monitor: make 'info snapshots' show only fully available snapshots
The output generated by 'info snapshots' shows only snapshots that exist on the block device that saves the VM state. This output can cause a user to erroneously try to load a snapshot that is not available on all block devices.

$ qemu-img snapshot -l xxtest.qcow2
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1                                 1.5M 2010-07-26 16:51:52   00:00:08.599
2                                 1.5M 2010-07-26 16:51:53   00:00:09.719
3                                 1.5M 2010-07-26 17:26:49   00:00:13.245
4                                 1.5M 2010-07-26 19:01:00   00:00:46.763

$ qemu-img snapshot -l xxtest2.qcow2
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
3                                    0 2010-07-26 17:26:49   00:00:13.245
4                                    0 2010-07-26 19:01:00   00:00:46.763

Current output:
$ qemu -hda xxtest.qcow2 -hdb xxtest2.qcow2 -monitor stdio -vnc :0
QEMU 0.12.4 monitor - type 'help' for more information
(qemu) info snapshots
Snapshot devices: ide0-hd0
Snapshot list (from ide0-hd0):
ID        TAG                 VM SIZE                DATE       VM CLOCK
1                                 1.5M 2010-07-26 16:51:52   00:00:08.599
2                                 1.5M 2010-07-26 16:51:53   00:00:09.719
3                                 1.5M 2010-07-26 17:26:49   00:00:13.245
4                                 1.5M 2010-07-26 19:01:00   00:00:46.763

Snapshots 1 and 2 do not exist on xxtest2.qcow2, but they are displayed anyway. This patch summarizes the output to only show fully available snapshots.
New output:
(qemu) info snapshots
ID        TAG                 VM SIZE                DATE       VM CLOCK
3                                 1.5M 2010-07-26 17:26:49   00:00:13.245
4                                 1.5M 2010-07-26 19:01:00   00:00:46.763

Signed-off-by: Miguel Di Ciurcio Filho
---
 savevm.c |   59 +++++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/savevm.c b/savevm.c
index 4c0e5d3..9291cfb 100644
--- a/savevm.c
+++ b/savevm.c
@@ -2004,8 +2004,10 @@ void do_delvm(Monitor *mon, const QDict *qdict)
 void do_info_snapshots(Monitor *mon)
 {
     BlockDriverState *bs, *bs1;
-    QEMUSnapshotInfo *sn_tab, *sn;
-    int nb_sns, i;
+    QEMUSnapshotInfo *sn_tab, *sn, s, *sn_info = &s;
+    int nb_sns, i, ret, available;
+    int total;
+    int *available_snapshots;
     char buf[256];
 
     bs = bdrv_snapshots();
@@ -2013,27 +2015,52 @@ void do_info_snapshots(Monitor *mon)
         monitor_printf(mon, "No available block device supports snapshots\n");
         return;
     }
-    monitor_printf(mon, "Snapshot devices:");
-    bs1 = NULL;
-    while ((bs1 = bdrv_next(bs1))) {
-        if (bdrv_can_snapshot(bs1)) {
-            if (bs == bs1)
-                monitor_printf(mon, " %s", bdrv_get_device_name(bs1));
-        }
-    }
-    monitor_printf(mon, "\n");
+
     nb_sns = bdrv_snapshot_list(bs, &sn_tab);
     if (nb_sns < 0) {
         monitor_printf(mon, "bdrv_snapshot_list: error %d\n", nb_sns);
         return;
     }
-    monitor_printf(mon, "Snapshot list (from %s):\n",
-                   bdrv_get_device_name(bs));
-    monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf), NULL));
-    for(i = 0; i < nb_sns; i++) {
+
+    if (nb_sns == 0) {
+        monitor_printf(mon, "There is no snapshot available.\n");
+        return;
+    }
+
+    available_snapshots = qemu_mallocz(sizeof(int) * nb_sns);
+    total = 0;
+    for (i = 0; i < nb_sns; i++) {
         sn = &sn_tab[i];
-        monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf), sn));
+        available = 1;
+        bs1 = NULL;
+
+        while ((bs1 = bdrv_next(bs1))) {
+            if (bdrv_can_snapshot(bs1) && bs1 != bs) {
+                ret = bdrv_snapshot_find(bs1, sn_info, sn->id_str);
+                if (ret < 0) {
+                    available = 0;
+                    break;
+                }
+            }
+        }
+
+        if (available) {
+            available_snapshots[total] = i;
+            total++;
+        }
     }
+
+    if (total > 0) {
+        monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf), NULL));
+        for (i = 0; i < total; i++) {
+            sn = &sn_tab[available_snapshots[i]];
+            monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf), sn));
+        }
+    } else {
+        monitor_printf(mon, "There is no suitable snapshot available\n");
+    }
+
     qemu_free(sn_tab);
+    qemu_free(available_snapshots);
+
 }
-- 
1.7.1
[Qemu-devel] [PATCH 3/3] savevm: prevent snapshot overwriting
When savevm is run using a previously saved snapshot id or name, it will delete the original and create a new one, using the same id and name, without telling the user what just happened. This behaviour is not good, IMHO. We add a '-f' parameter to savevm, to really force that to happen, in case the user really wants to.

New behavior:

(qemu) savevm snap1
An snapshot named 'snap1' already exists
(qemu) savevm -f snap1

In case '-f' is used we do better error reporting than before, and we don't reuse the previous id.

Note: This patch depends on "savevm: Generate a name when run without one"

Signed-off-by: Miguel Di Ciurcio Filho
---
 qemu-monitor.hx |    7 ++++---
 savevm.c        |   19 ++++++++++++++-----
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/qemu-monitor.hx b/qemu-monitor.hx
index 2af3de6..683ac73 100644
--- a/qemu-monitor.hx
+++ b/qemu-monitor.hx
@@ -275,9 +275,10 @@ ETEXI
 
     {
         .name       = "savevm",
-        .args_type  = "name:s?",
-        .params     = "[tag|id]",
-        .help       = "save a VM snapshot. If no tag or id are provided, a new snapshot is created",
+        .args_type  = "force:-f,name:s?",
+        .params     = "[-f] [tag|id]",
+        .help       = "save a VM snapshot. If no tag is provided, a new one is created"
+                      "\n\t\t\t -f to overwrite an snapshot if it already exists",
         .mhandler.cmd = do_savevm,
     },
 
diff --git a/savevm.c b/savevm.c
index 025bee6..f0a4b78 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1805,6 +1805,7 @@ void do_savevm(Monitor *mon, const QDict *qdict)
     struct tm tm;
 #endif
     const char *name = qdict_get_try_str(qdict, "name");
+    int force = qdict_get_try_bool(qdict, "force", 0);
 
     /* Verify if there is a device that doesn't support snapshots and is writable */
     bs = NULL;
@@ -1848,12 +1849,20 @@ void do_savevm(Monitor *mon, const QDict *qdict)
 
     if (name) {
         ret = bdrv_snapshot_find(bs, old_sn, name);
-        if (ret >= 0) {
-            pstrcpy(sn->name, sizeof(sn->name), old_sn->name);
-            pstrcpy(sn->id_str, sizeof(sn->id_str), old_sn->id_str);
-        } else {
-            pstrcpy(sn->name, sizeof(sn->name), name);
+        if (ret == 0) {
+            if (force) {
+                ret = del_existing_snapshots(mon, name);
+                if (ret < 0) {
+                    monitor_printf(mon, "Error deleting snapshot '%s', error: %d\n", name, ret);
+                    goto the_end;
+                }
+            } else {
+                monitor_printf(mon, "An snapshot named '%s' already exists\n", name);
+                goto the_end;
+            }
         }
+
+        pstrcpy(sn->name, sizeof(sn->name), name);
     } else {
 #ifdef _WIN32
         ptm = localtime(&tb.time);
-- 
1.7.1
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:46 PM, Richard W.M. Jones wrote: On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: This is basically my suggestion to libguestfs: instead of generating an initrd, generate a bootable cdrom, and boot from that. The result is faster and has a smaller memory footprint. Everyone wins. We had some discussion of this upstream& decided to do this. It should save the time it takes for the guest kernel to unpack the initrd, so maybe another second off boot time, which could bring us ever closer to the "golden" 5 second boot target. Great. IMO it's the right thing even if initrd took zero time. It's not trivial mind you, and won't happen straightaway. Part of it is that it requires reworking the appliance builder (a matter of just coding really). The less trivial part is that we have to 'hide' the CD device throughout the publically available interfaces. Then of course, a lot of testing. I will note that virt-install uses the -initrd interface for installing guests (large initrds too). And I've talked with a sysadmin who was using -kernel and -initrd for deploying VM hosting. In his case he did it so he could centralize kernel distribution / updates, and have the guests use /dev/vda == filesystem which made provisioning easy [for him -- I would have used libguestfs ...]. We still plan to improve pio speed. (note a few added seconds to guest install or bootup is not such a drag compared to the hit on an interactive tool like libguestfs). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 12:37 PM, Avi Kivity wrote: On 08/04/2010 08:27 PM, Anthony Liguori wrote: On 08/04/2010 12:19 PM, Avi Kivity wrote: On 08/04/2010 08:01 PM, Paolo Bonzini wrote: That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? Because that's what real hardware with only a few rare exceptions. I thought that IDE was emulated as SCSI even when it wasn't. But I guess now with SATA you're right. IDE -> EIDE -> ATA -> SATA ATA can encapsulate SCSI commands via ATAPI which gives you the ability to have ATA based CD-ROMs among other things. I don't believe that SATA actually uses SCSI commands for read/write operations but I think Linux exposes SATA drivers as SCSI anyway. Regards, Anthony Liguori
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 11:44:33AM -0500, Anthony Liguori wrote: > On 08/04/2010 11:36 AM, Avi Kivity wrote: > > On 08/04/2010 07:30 PM, Avi Kivity wrote: > >> On 08/04/2010 04:52 PM, Anthony Liguori wrote: > > > This is not like DMA event if done in chunks and chunks can be pretty > big. The code that dials with copying may temporary unmap some pci > devices to have more space there. > >>> > >>> > >>>That's a bit complicated because SeaBIOS is managing the PCI > >>>devices whereas the kernel code is running as an option rom. > >>>I don't know the BIOS PCI interfaces well so I don't know how > >>>doable this is. > >>> > >>>Maybe we're just being too fancy here. > >>> > >>>We could rewrite -kernel/-append/-initrd to just generate a > >>>floppy image in RAM, and just boot from floppy. > >> > >>How could this work? the RAM belongs to SeaBIOS immediately > >>after reset, it would just scribble over it. Or worse, not > >>scribble on it until some date in the future. > >> > >>-kernel data has to find its way to memory after the bios gives > >>control to some optionrom. An alternative would be to embed > >>knowledge of -kernel in seabios, but I don't think it's a good > >>one. > >> > > > >Oh, you meant host RAM, not guest RAM. Disregard. > > > >This is basically my suggestion to libguestfs: instead of > >generating an initrd, generate a bootable cdrom, and boot from > >that. The result is faster and has a smaller memory footprint. > >Everyone wins. > > Yeah, but we could also do that entirely in QEMU. If that's what we > suggest doing, there's no reason not to do it instead of the option > rom trickery that we do today. > > The option rom stuff has a number of short comings. Because we > hijack int19, extboot doesn't get to run. That means that if you > use -kernel to load a grub (the Ubuntu guys for their own absurd > reasons) then grub does not see extboot backed disks. The solution > for them is the same, generate a proper disk and boot from that > disk. 
> Extboot is not so relevant any more. -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 19:36, Anthony Liguori wrote: > On 08/04/2010 12:31 PM, Alexander Graf wrote: >> On 04.08.2010, at 19:26, Anthony Liguori wrote: >> >> >>> On 08/04/2010 11:45 AM, Alexander Graf wrote: >>> Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. >>> >>> Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is >>> exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used >>> by firmware and that normal guest drivers should not be written for id 6. >>> >> Why not make id 6 be a fw_cfg virtio interface? > > Because that's a ton more work and we need fw_cfg to be available before PCI > is. IOW, fw_cfg cannot be a PCI interface. in addition to fw_cfg. So you'd have the same contents be exposed using both interfaces. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:27 PM, Anthony Liguori wrote: On 08/04/2010 12:19 PM, Avi Kivity wrote: On 08/04/2010 08:01 PM, Paolo Bonzini wrote: That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? Because that's what real hardware with only a few rare exceptions. I thought that IDE was emulated as SCSI even when it wasn't. But I guess now with SATA you're right. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: > This is basically my suggestion to libguestfs: instead of generating > an initrd, generate a bootable cdrom, and boot from that. The > result is faster and has a smaller memory footprint. Everyone wins. We had some discussion of this upstream & decided to do this. It should save the time it takes for the guest kernel to unpack the initrd, so maybe another second off boot time, which could bring us ever closer to the "golden" 5 second boot target. It's not trivial mind you, and won't happen straightaway. Part of it is that it requires reworking the appliance builder (a matter of just coding really). The less trivial part is that we have to 'hide' the CD device throughout the publically available interfaces. Then of course, a lot of testing. I will note that virt-install uses the -initrd interface for installing guests (large initrds too). And I've talked with a sysadmin who was using -kernel and -initrd for deploying VM hosting. In his case he did it so he could centralize kernel distribution / updates, and have the guests use /dev/vda == filesystem which made provisioning easy [for him -- I would have used libguestfs ...]. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 12:31 PM, Alexander Graf wrote: On 04.08.2010, at 19:26, Anthony Liguori wrote: On 08/04/2010 11:45 AM, Alexander Graf wrote: Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used by firmware and that normal guest drivers should not be written for id 6. Why not make id 6 be a fw_cfg virtio interface? Because that's a ton more work and we need fw_cfg to be available before PCI is. IOW, fw_cfg cannot be a PCI interface. Regards, Anthony Liguori That way we'd stay 100% compatible to everything we have and also get a fast path for reading big chunks of data from fw_cfg. All we'd need is a command to set the 'file' we're in. Even better yet, why not use virtio-9p and expose all of fw_cfg as files? Then implement a simple virtio-9p client in SeaBIOS and maybe even get direct kernel/initrd boot from a real 9p system ;). Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:31 PM, Alexander Graf wrote: Even better yet, why not use virtio-9p and expose all of fw_cfg as files? Then implement a simple virtio-9p client in SeaBIOS and maybe even get direct kernel/initrd boot from a real 9p system ;). libguestfs could use 9pfs directly. That will be way faster and reduce the footprint dramatically (the guest will demand load only the pages it needs). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 19:14, Avi Kivity wrote: > On 08/04/2010 08:01 PM, Alexander Graf wrote: >> >>> 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) >>> A guest/host interface is not private. >> fw_cfg is as private as it gets with host/guest interfaces. It's about as >> close as CPU specific MSRs or SMC chips. >> > > Well, it isn't. Two external projects already use it. You can't change it > due to the needs to live migrate from older versions. You can always extend it. You can even break it with a new -M. > Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. >>> 3) don't use -kernel for 100MB or more. It's not the right tool. >> Why not? You're the one always ranting about caring about users. Now you get >> at least 3 users from the Qemu development community actually using a >> feature and you just claim it's wrong? Please, we've added way more useless >> features for worse reasons. >> > > It's not wrong in itself, but using it with supersized initrds is wrong. The > data is stored in qemu, host pagecache, and the guest, so three copies, it's > limited by guest RAM, has to be live migrated. Sure we could optimize it, > but it's better to spend our efforts on more mainstream users. It's only stored twice. The host pagecache copy is gone during the lifetime of the VM. Migration also doesn't make sense for most -kernel/-initrd use cases. And it's awesome for fast prototyping. Of course, once that fast becomes dog slow, it's not useful anymore. I bet within the time everybody spent on this thread we would have a working and stable DMA fw_cfg interface plus extra spare time for supporting breakage already. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:27 PM, Alexander Graf wrote: Well, it isn't. Two external projects already use it. You can't change it due to the needs to live migrate from older versions. You can always extend it. You can even break it with a new -M. Yes. But it's a pain to make sure it all works out. We're already suffering from this where we have no choice, why do it where we have a choice? It's not wrong in itself, but using it with supersized initrds is wrong. The data is stored in qemu, host pagecache, and the guest, so three copies, it's limited by guest RAM, has to be live migrated. Sure we could optimize it, but it's better to spend our efforts on more mainstream users. It's only stored twice. The host pagecache copy is gone during the lifetime of the VM. It has still evicted some other pagecache. Footprint is footprint. 300MB to cat some file in a guest. Migration also doesn't make sense for most -kernel/-initrd use cases. You're just inviting a bug report here. If we add a feature, let's make it work. And it's awesome for fast prototyping. Of course, once that fast becomes dog slow, it's not useful anymore. For the Nth time, it's only slow with 100MB initrds. I bet within the time everybody spent on this thread we would have a working and stable DMA fw_cfg interface plus extra spare time for supporting breakage already. The time would have been better spent improving kvm's pio or porting libguestfs to use a cdrom. I'm also hoping to get the point across that adding pv interfaces like crazy is not sustainable. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 19:26, Anthony Liguori wrote: > On 08/04/2010 11:45 AM, Alexander Graf wrote: >> Frankly, I partially agreed to your point when we were talking about 300ms >> vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We >> chose the wrong interface to transfer kernel+initrd data into the guest. >> >> Now the question is how to fix that. I would veto against anything normally >> guest-OS-visible. By occupying the floppy, you lose a floppy drive in the >> guest. By occupying a disk, you see an unwanted disk in the guest. > > > Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is > exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used > by firmware and that normal guest drivers should not be written for id 6. Why not make id 6 be a fw_cfg virtio interface? That way we'd stay 100% compatible to everything we have and also get a fast path for reading big chunks of data from fw_cfg. All we'd need is a command to set the 'file' we're in. Even better yet, why not use virtio-9p and expose all of fw_cfg as files? Then implement a simple virtio-9p client in SeaBIOS and maybe even get direct kernel/initrd boot from a real 9p system ;). Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 11:45 AM, Alexander Graf wrote: Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used by firmware and that normal guest drivers should not be written for id 6. Problem is now solved and everyone's happy. Now we can all go back to making slides for next week :-) Regards, Anthony Liguori By taking virtio-serial you see an unwanted virtio-serial line in the guest. fw_cfg is great because it's a private interface nobody else accesses. I see two alternatives out of this mess: 1) Speed up string PIO so we're actually fast again. 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 12:19 PM, Avi Kivity wrote: On 08/04/2010 08:01 PM, Paolo Bonzini wrote: That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? Because that's what real hardware does, with only a few rare exceptions. Regards, Anthony Liguori
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 19:19, Avi Kivity wrote: > On 08/04/2010 08:01 PM, Paolo Bonzini wrote: >> >> That's another story and I totally agree here, but not reusing /dev/sd* is >> not intrinsic in the design of virtio-blk (and one thing that Windows gets >> right; everything is SCSI, period). >> > > I don't really get why everything must be SCSI. Everything must support > read, write, a few other commands, and a large set of optional commands. But > why map them all to SCSI? What's the magic? Hence the reference to megasas. It implements its own read/write/few other commands and the whole stack of optional commands as SCSI. I think virtio-blk should be the same. SCSI simply because it's there, it's flexible and it's well defined. You get a working spec and a lot of working implementations. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:01 PM, Paolo Bonzini wrote: That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:01 PM, Alexander Graf wrote:
> 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways)
>
>> A guest/host interface is not private.
>
> fw_cfg is as private as it gets with host/guest interfaces. It's about as close as CPU specific MSRs or SMC chips.

Well, it isn't. Two external projects already use it. You can't change it due to the need to live-migrate from older versions.

> Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2.
>
>> 3) don't use -kernel for 100MB or more. It's not the right tool.
>
> Why not? You're the one always ranting about caring about users. Now you get at least 3 users from the Qemu development community actually using a feature and you just claim it's wrong? Please, we've added way more useless features for worse reasons.

It's not wrong in itself, but using it with supersized initrds is wrong. The data is stored in qemu, in the host pagecache, and in the guest, so three copies; it's limited by guest RAM and has to be live-migrated. Sure we could optimize it, but it's better to spend our efforts on more mainstream users. If you want to pull large amounts of data into the guest efficiently, use virtio-blk. That's what it's for.

-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 18:54, Avi Kivity wrote: > On 08/04/2010 07:45 PM, Alexander Graf wrote: >> >> I see two alternatives out of this mess: >> >> 1) Speed up string PIO so we're actually fast again. > > Certainly, the best option given that it needs no new interfaces, and > improves the most workloads. > >> 2) Using a different interface (that could also be DMA fw_cfg - remember, >> we're on a private interface anyways) > > A guest/host interface is not private. fw_cfg is as private as it gets with host/guest interfaces. It's about as close as CPU specific MSRs or SMC chips. > >> Admittedly 1 would also help in more cases than just booting with -kernel >> and -initrd, but if that won't get us to acceptable levels (and yes, 8 >> seconds for 100MB is unacceptable) I don't see any way around 2. > > 3) don't use -kernel for 100MB or more. It's not the right tool. Why not? You're the one always ranting about caring about users. Now you get at least 3 users from the Qemu development community actually using a feature and you just claim it's wrong? Please, we've added way more useless features for worse reasons. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 06:49 PM, Anthony Liguori wrote: Right, the only question is, do you inject your own bus or do you just reuse SCSI. On the surface, it seems like reusing SCSI has a significant number of advantages. For instance, without changing the guest's drivers, we can implement PV cdroms or PV tape drives. If you want multiple LUNs per virtio device SCSI is obviously a good choice, but you will need something more (like the config space Avi mentioned). My position is that getting this "something more" right is considerably harder than virtio-blk. Maybe it will be done some day, but I still think that not having virtio-scsi from day 1 was actually a good thing. Even if we can learn from xenbus and all that. What exactly would keep us from doing that with virtio-blk? I thought that supports scsi commands already. I think the toughest change would be making it appear as a scsi device within the guest. You could do that with virtio-blk but it would be a flag day, as reasonably configured guests would break. Having the virtio-blk device show up as /dev/vdX was a big mistake. It's been nothing but a giant PITA. There is an amazing amount of software that only looks at /dev/sd* and /dev/hd*. That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). Paolo
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 18:49, Anthony Liguori wrote: > On 08/04/2010 11:48 AM, Alexander Graf wrote: >> On 04.08.2010, at 18:46, Anthony Liguori wrote: >> >> >>> On 08/04/2010 11:44 AM, Avi Kivity wrote: >>> On 08/04/2010 03:53 PM, Anthony Liguori wrote: > So how do we enable support for more than 20 disks? I think a > virtio-scsi is inevitable.. > Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. >>> Right, the only question is, do you inject your own bus or do you just >>> reuse SCSI. On the surface, it seems like reusing SCSI has a significant >>> number of advantages. For instance, without changing the guest's drivers, >>> we can implement PV cdroms or PV tape drives. >>> >> What exactly would keep us from doing that with virtio-blk? I thought that >> supports scsi commands already. >> > > I think the toughest change would be making it appear as a scsi device within > the guest. You could do that with virtio-blk but it would be a flag day, as > reasonably configured guests would break. > > Having the virtio-blk device show up as /dev/vdX was a big mistake. It's been > nothing but a giant PITA. There is an amazing amount of software that only > looks at /dev/sd* and /dev/hd*. I completely agree and yes, we should move in that direction IMHO. I don't see why virtio-blk should be any different from megasas for example. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 11:48 AM, Alexander Graf wrote: On 04.08.2010, at 18:46, Anthony Liguori wrote: On 08/04/2010 11:44 AM, Avi Kivity wrote: On 08/04/2010 03:53 PM, Anthony Liguori wrote: So how do we enable support for more than 20 disks? I think a virtio-scsi is inevitable.. Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. Right, the only question is, do you inject your own bus or do you just reuse SCSI. On the surface, it seems like reusing SCSI has a significant number of advantages. For instance, without changing the guest's drivers, we can implement PV cdroms or PV tape drives. What exactly would keep us from doing that with virtio-blk? I thought that supports scsi commands already. I think the toughest change would be making it appear as a scsi device within the guest. You could do that with virtio-blk but it would be a flag day, as reasonably configured guests would break. Having the virtio-blk device show up as /dev/vdX was a big mistake. It's been nothing but a giant PITA. There is an amazing amount of software that only looks at /dev/sd* and /dev/hd*. Regards, Anthony Liguori Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 07:45 PM, Alexander Graf wrote: I see two alternatives out of this mess: 1) Speed up string PIO so we're actually fast again. Certainly, the best option given that it needs no new interfaces, and improves the most workloads. 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) A guest/host interface is not private. Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. 3) don't use -kernel for 100MB or more. It's not the right tool. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 07:08 PM, Gleb Natapov wrote: After applying the cache fix, nothing definite as far as I remember (I ran it last time almost two weeks ago, need to rerun). The code always goes through the emulator now and checks the direction flag to update SI/DI accordingly. The emulator is a big switch and it calls various callbacks that may also slow things down. We can have it set up a fast path, similar to how real hardware optimizes 'rep movs' to copy complete cachelines. The emulator does all the checks, sets up a callback to be called on completion or when an interrupt is made pending, and lets x86.c do all the work. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
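Avi's fast-path idea — validate once in the emulator, then move whole chunks instead of iterating per byte — can be illustrated with a toy pass counter. This is only a model of the chunking arithmetic (page-sized chunks assumed), not KVM's actual emulator code:

```c
#include <stdint.h>

/* Count "emulator passes" needed to move `count` bytes to `dst`.
 * Per-byte emulation takes one pass per byte; a fast path that
 * validates once per 4K page copies up to the page boundary per pass. */
static unsigned rep_move_passes(uint64_t dst, uint64_t count, int fast_path)
{
    unsigned passes = 0;
    while (count) {
        uint64_t n = 1;
        if (fast_path) {
            uint64_t to_page_end = 4096 - (dst & 4095);
            n = count < to_page_end ? count : to_page_end;
        }
        dst += n;
        count -= n;
        passes++;
    }
    return passes;
}
```

For a 100MB (104857600-byte) initrd this is the difference between roughly 100 million per-byte passes and 25600 page-sized ones, which is why the fast path matters far more than micro-optimizing the per-byte loop.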
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 18:46, Anthony Liguori wrote: > On 08/04/2010 11:44 AM, Avi Kivity wrote: >> On 08/04/2010 03:53 PM, Anthony Liguori wrote: >>> >>> So how do we enable support for more than 20 disks? I think a virtio-scsi >>> is inevitable.. >> >> Not only for large numbers of disks, also for JBOD performance. If you have >> one queue per disk you'll have low queue depths and high interrupt rates. >> Aggregating many spindles into a single queue is important for reducing >> overhead. > > Right, the only question is, do you inject your own bus or do you just reuse > SCSI. On the surface, it seems like reusing SCSI has a significant number of > advantages. For instance, without changing the guest's drivers, we can > implement PV cdroms or PV tape drives. What exactly would keep us from doing that with virtio-blk? I thought that supports scsi commands already. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 11:44 AM, Avi Kivity wrote: On 08/04/2010 03:53 PM, Anthony Liguori wrote: So how do we enable support for more than 20 disks? I think a virtio-scsi is inevitable.. Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. Right, the only question is, do you inject your own bus or do you just reuse SCSI. On the surface, it seems like reusing SCSI has a significant number of advantages. For instance, without changing the guest's drivers, we can implement PV cdroms or PV tape drives. It also supports SCSI-level pass-through, which is pretty nice for enabling things like NPIV. Regards, Anthony Liguori
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 07:44 PM, Anthony Liguori wrote: The option rom stuff has a number of shortcomings. Because we hijack int19, extboot doesn't get to run. That means that if you use -kernel to load a grub (as the Ubuntu guys do, for their own absurd reasons) then grub does not see extboot-backed disks. The solution for them is the same: generate a proper disk and boot from that disk. Let's print it out and hand out leaflets at the upcoming kvm forum. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 18:36, Avi Kivity wrote: > On 08/04/2010 07:30 PM, Avi Kivity wrote: >> On 08/04/2010 04:52 PM, Anthony Liguori wrote: > This is not like DMA even if done in chunks, and chunks can be pretty big. The code that deals with copying may temporarily unmap some pci devices to have more space there. >>> >>> >>> That's a bit complicated because SeaBIOS is managing the PCI devices >>> whereas the kernel code is running as an option rom. I don't know the BIOS >>> PCI interfaces well so I don't know how doable this is. >>> >>> Maybe we're just being too fancy here. >>> >>> We could rewrite -kernel/-append/-initrd to just generate a floppy image in >>> RAM, and just boot from floppy. >> >> How could this work? The RAM belongs to SeaBIOS immediately after reset, it >> would just scribble over it. Or worse, not scribble on it until some date >> in the future. >> >> -kernel data has to find its way to memory after the bios gives control to >> some optionrom. An alternative would be to embed knowledge of -kernel in >> seabios, but I don't think it's a good one. >> > > Oh, you meant host RAM, not guest RAM. Disregard. > > This is basically my suggestion to libguestfs: instead of generating an > initrd, generate a bootable cdrom, and boot from that. The result is faster > and has a smaller memory footprint. Everyone wins. Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. By taking virtio-serial you see an unwanted virtio-serial line in the guest. fw_cfg is great because it's a private interface nobody else accesses. 
I see two alternatives out of this mess: 1) Speed up string PIO so we're actually fast again. 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. Alex
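For reference, the fw_cfg interface under discussion is a two-port affair on x86: write a 16-bit selector to port 0x510, then read the selected item one byte at a time from port 0x511 — and every one of those byte reads is a trap to the hypervisor, which is exactly the per-byte emulation cost the thread is measuring. The sketch below stubs the port I/O with a fake in-memory device so it is self-contained and runnable; the selector value and blob contents are made up, and real firmware would of course issue actual inb/outw instructions.

```c
#include <stdint.h>
#include <stddef.h>

#define FW_CFG_PORT_CTL  0x510  /* selector: 16-bit write */
#define FW_CFG_PORT_DATA 0x511  /* data: sequential 8-bit reads */

/* Fake device state standing in for QEMU, so the sketch runs anywhere. */
static const uint8_t fake_item[] = "pretend-initrd-bytes";
static uint16_t fake_selector;
static size_t fake_offset;

static void outw(uint16_t port, uint16_t val)
{
    if (port == FW_CFG_PORT_CTL) {
        fake_selector = val;
        fake_offset = 0;  /* selecting an item rewinds its read position */
    }
}

static uint8_t inb(uint16_t port)
{
    if (port == FW_CFG_PORT_DATA && fake_offset < sizeof(fake_item))
        return fake_item[fake_offset++];
    return 0;
}

/* The per-byte read loop whose emulation cost the thread is about. */
static void fw_cfg_read(uint16_t selector, void *buf, size_t len)
{
    uint8_t *p = buf;
    outw(FW_CFG_PORT_CTL, selector);
    for (size_t i = 0; i < len; i++)
        p[i] = inb(FW_CFG_PORT_DATA);
}
```

Alternative 1 above speeds up this loop (or its `rep insb` form) inside KVM; alternative 2 replaces the loop with something like a DMA descriptor so the data never crosses the port at all.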
[Qemu-devel] [Bug 613529] [NEW] qemu does not accept regular disk geometry
Public bug reported: Hi, I am currently hunting a strange bug in qemu/kvm: I am using an lvm logical volume as a virtual hard disk for a virtual machine. I use fdisk or parted to create a partition table and partitions, kpartx to generate the device entries for the partitions, then install linux on ext3/ext4 with grub or an msdos filesystem with syslinux. But then, in most cases even the boot process fails or behaves strangely; sometimes even mounting the file system in the virtual machine fails. It seems as if there is a problem with the virtual disk geometry. The problem does not seem to occur if I reboot the host system after creating the partition table on the logical volume. I guess the linux kernel needs to learn the disk geometry by reboot. A blockdev --rereadpt does not work on lvm volumes. The first approach to test/fix the problem would be to pass the disk geometry to qemu/kvm with the -drive option. Unfortunately, qemu/kvm does not accept the default geometry with 255 heads and 63 sectors; it seems to limit the number of heads to 16, thus limiting the disk size. ** Affects: qemu Importance: Undecided Status: New -- qemu does not accept regular disk geometry https://bugs.launchpad.net/bugs/613529 You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. Status in QEMU: New
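The size ceiling behind the reporter's complaint is plain CHS arithmetic. Assuming the classic ATA translation limits (16 heads, 63 sectors/track, at most 16383 cylinders — the cylinder cap is an assumption stated here, not something from the bug report) versus the 255-head/63-sector geometry that fdisk defaults to:

```c
#include <stdint.h>

/* Capacity addressable by a cylinders/heads/sectors geometry,
 * with the usual 512-byte sectors. */
static uint64_t chs_bytes(uint64_t cyls, uint64_t heads, uint64_t secs)
{
    return cyls * heads * secs * 512;
}
```

With 16383 cylinders, 16 heads and 63 sectors this tops out at 8455200768 bytes (about 7.9 GB); a 255-head geometry packs the same capacity into far fewer cylinders (1024 × 255 × 63 × 512 ≈ 8.4 GB). The mismatch between the 255-head geometry written into the partition table on the host and the 16-head geometry qemu presents to the guest BIOS is a plausible source of the boot failures described above.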
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 11:36 AM, Avi Kivity wrote: On 08/04/2010 07:30 PM, Avi Kivity wrote: On 08/04/2010 04:52 PM, Anthony Liguori wrote: This is not like DMA even if done in chunks, and chunks can be pretty big. The code that deals with copying may temporarily unmap some pci devices to have more space there. That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. How could this work? The RAM belongs to SeaBIOS immediately after reset, it would just scribble over it. Or worse, not scribble on it until some date in the future. -kernel data has to find its way to memory after the bios gives control to some optionrom. An alternative would be to embed knowledge of -kernel in seabios, but I don't think it's a good one. Oh, you meant host RAM, not guest RAM. Disregard. This is basically my suggestion to libguestfs: instead of generating an initrd, generate a bootable cdrom, and boot from that. The result is faster and has a smaller memory footprint. Everyone wins. Yeah, but we could also do that entirely in QEMU. If that's what we suggest doing, there's no reason not to do it instead of the option rom trickery that we do today. The option rom stuff has a number of shortcomings. Because we hijack int19, extboot doesn't get to run. That means that if you use -kernel to load a grub (as the Ubuntu guys do, for their own absurd reasons) then grub does not see extboot-backed disks. The solution for them is the same: generate a proper disk and boot from that disk. Regards, Anthony Liguori
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 03:53 PM, Anthony Liguori wrote: So how do we enable support for more than 20 disks? I think a virtio-scsi is inevitable.. Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 11:30 AM, Avi Kivity wrote: On 08/04/2010 04:52 PM, Anthony Liguori wrote: This is not like DMA even if done in chunks, and chunks can be pretty big. The code that deals with copying may temporarily unmap some pci devices to have more space there. That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. How could this work? The RAM belongs to SeaBIOS immediately after reset, it would just scribble over it. Or worse, not scribble on it until some date in the future. I mean host RAM, not guest RAM. Regards, Anthony Liguori -kernel data has to find its way to memory after the bios gives control to some optionrom. An alternative would be to embed knowledge of -kernel in seabios, but I don't think it's a good one.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 07:30 PM, Avi Kivity wrote: On 08/04/2010 04:52 PM, Anthony Liguori wrote: This is not like DMA even if done in chunks, and chunks can be pretty big. The code that deals with copying may temporarily unmap some pci devices to have more space there. That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. How could this work? The RAM belongs to SeaBIOS immediately after reset, it would just scribble over it. Or worse, not scribble on it until some date in the future. -kernel data has to find its way to memory after the bios gives control to some optionrom. An alternative would be to embed knowledge of -kernel in seabios, but I don't think it's a good one. Oh, you meant host RAM, not guest RAM. Disregard. This is basically my suggestion to libguestfs: instead of generating an initrd, generate a bootable cdrom, and boot from that. The result is faster and has a smaller memory footprint. Everyone wins. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 05:39 PM, Anthony Liguori wrote: We could make -kernel an awful lot smarter but unless we've got someone just itching to write 16-bit option rom code, I think our best bet is to try to leverage a standard bootloader and expose a disk containing the kernel/initrd. A problem with that is that the booted kernel would see that disk and try to do something with it. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 04:52 PM, Anthony Liguori wrote: This is not like DMA even if done in chunks, and chunks can be pretty big. The code that deals with copying may temporarily unmap some pci devices to have more space there. That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. How could this work? The RAM belongs to SeaBIOS immediately after reset, it would just scribble over it. Or worse, not scribble on it until some date in the future. -kernel data has to find its way to memory after the bios gives control to some optionrom. An alternative would be to embed knowledge of -kernel in seabios, but I don't think it's a good one. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 04:24 PM, Richard W.M. Jones wrote: It's boot time, so you can just map it over some existing RAM surely? Linuxboot.bin can work out where to map it so it won't be in any memory either being used or the target for the copy. There's no such thing as boot time from the host's point of view. There are interfaces and they should work whatever the guest is doing right now. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 04:04 PM, Anthony Liguori wrote: On 08/04/2010 03:17 AM, Avi Kivity wrote: For playing games, there are three options: - existing fwcfg - fwcfg+dma - put roms in 4GB-2MB (or whatever we decide the flash size is) and have the BIOS copy them Existing fwcfg is the least amount of work and probably satisfactory for isapc. fwcfg+dma is IMO going off on a tangent. High memory flash is the most hardware-like solution, pretty easy from a qemu point of view but requires more work. The only trouble I see is that high memory isn't always available. If it's a 32-bit PC and you've exhausted RAM space, then you're only left with the PCI hole and it's not clear to me if you can really pull out 100MB of space there as an option ROM without breaking something. 100MB is out of the question, certainly. I'm talking about your isapc problem, not about a cdrom replacement. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 05:59:40PM +0200, Alexander Graf wrote: > > On 04.08.2010, at 17:48, Gleb Natapov wrote: > > > On Wed, Aug 04, 2010 at 05:31:12PM +0200, Alexander Graf wrote: > >> > >> On 04.08.2010, at 17:25, Gleb Natapov wrote: > >> > >>> On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > On 08/04/2010 09:51 AM, David S. Ahern wrote: > > > > On 08/03/10 12:43, Avi Kivity wrote: > >> libguestfs does not depend on an x86 architectural feature. > >> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We > >> should > >> discourage people from depending on this interface for production use. > > That is a feature of qemu - and an important one to me as well. Why > > should it be discouraged? You end up at the same place -- a running > > kernel and in-ram filesystem; why require going through a bootloader > > just because the hardware case needs it? > > It's smoke and mirrors. We're still providing a boot loader; it's > just a little tiny one that we've written solely for this purpose. > > And it works fine for production use. The question is whether we > ought to be aggressively optimizing it for large initrd sizes. To > be honest, after a lot of discussion of possibilities, I've come to > the conclusion that it's just not worth it. > > There are better ways like using string I/O and optimizing the PIO > path in the kernel. That should cut down the 1s slow down with a > 100MB initrd by a bit. But honestly, shaving a couple hundred ms > further off the initrd load is just not worth it using the current > model. > > >>> The slow down is not 1s any more. String PIO emulation had many bugs > >>> that were fixed in 2.6.35. I verified how much time it took to load 100M > >>> via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > >>> my machine it took ~2-3 seconds; on 2.6.35 it took 26s. Some optimizations > >>> that were already committed make it 20s. I have some code prototype that > >>> makes it 11s. 
I don't see how we can get below that, surely not back to > >>> ~2-3sec. > >> > >> What exactly is the reason for the slowdown? It can't be only boundary and > >> permission checks, right? > >> > >> > > The big part of the slowdown right now is that the write into memory is done > for each byte. It means for each byte we call kvm_write_guest() and > kvm_mmu_pte_write(). The second call is needed in case the memory the instruction > is trying to write to is shadowed. Previously we didn't check for > that at all. This can be mitigated by introducing a write cache to do > combined writes into the memory and unshadow the page if there is more > than one write into it. This optimization saves ~10secs. Currently string > Ok, so you tackled that bit already. > > emulation enters the guest from time to time to check if event injection is > needed, and reads from userspace are done in 1K chunks, not 4K like it was, > but when I made reads 4K and disabled guest reentry I haven't seen > any speed improvements worth talking about. > So what are we wasting those 10 seconds on then? Does perf tell you anything useful? > Not 10, but 7-8 seconds. After applying the cache fix, nothing definite as far as I remember (I ran it last time almost two weeks ago, need to rerun). The code always goes through the emulator now and checks the direction flag to update SI/DI accordingly. The emulator is a big switch and it calls various callbacks that may also slow things down. -- Gleb.
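Gleb's "write cache" mitigation — batching the per-byte kvm_write_guest() calls into one backend write per contiguous run — can be sketched as follows. This is a toy illustrating only the batching logic (a 4K buffer is assumed), not the actual KVM patch:

```c
#include <stdint.h>
#include <stddef.h>

/* Toy write-combining cache: per-byte stores accumulate and are only
 * flushed to the (expensive) backend when the run breaks or the buffer
 * fills.  `flushes` counts backend writes, i.e. what would otherwise be
 * one kvm_write_guest()/kvm_mmu_pte_write() pair per byte. */
struct wc_cache {
    uint64_t gpa;          /* guest physical address of the buffered run */
    size_t   len;          /* bytes buffered so far */
    uint8_t  buf[4096];
    unsigned flushes;
};

static void wc_flush(struct wc_cache *wc)
{
    if (wc->len) {
        /* real code would write buf[0..len) to the guest at wc->gpa here */
        wc->flushes++;
        wc->len = 0;
    }
}

static void wc_write_byte(struct wc_cache *wc, uint64_t gpa, uint8_t b)
{
    if (wc->len && gpa != wc->gpa + wc->len)
        wc_flush(wc);                  /* non-contiguous: start a new run */
    if (wc->len == sizeof(wc->buf))
        wc_flush(wc);                  /* buffer full */
    if (wc->len == 0)
        wc->gpa = gpa;
    wc->buf[wc->len++] = b;
}
```

For the forward-moving string copy in question, this turns one backend write per byte into one per 4K of contiguous data, which is the shape of the ~10s saving Gleb reports.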
[Qemu-devel] [Bug 586175] Re: Windows XP/2003 doesn't boot
** Changed in: debian Status: New => Fix Released -- Windows XP/2003 doesn't boot https://bugs.launchpad.net/bugs/586175 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: Incomplete Status in “qemu-kvm” package in Ubuntu: New Status in Debian GNU/Linux: Fix Released Status in Fedora: Unknown Bug description: Hello everyone, my qemu doesn't boot any Windows XP/2003 installations if I try to boot the image. If I boot the install cd first, its boot manager counts down and triggers the boot on its own. That's kinda stupid. I'm using libvirt, but even by a simple > qemu-kvm -drive file=image.img,media=disk,if=ide,boot=on it won't boot. Qemu hangs at the message "Booting from Hard Disk..." I'm using qemu-kvm-0.12.4 with SeaBIOS 0.5.1 on Gentoo (No-Multilib and AMD64). It's a server, that means I'm using VNC as the primary graphic output, but I don't think it should be an issue.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 17:48, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 05:31:12PM +0200, Alexander Graf wrote: >> >> On 04.08.2010, at 17:25, Gleb Natapov wrote: >> >>> On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: On 08/04/2010 09:51 AM, David S. Ahern wrote: > > On 08/03/10 12:43, Avi Kivity wrote: >> libguestfs does not depend on an x86 architectural feature. >> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should >> discourage people from depending on this interface for production use. > That is a feature of qemu - and an important one to me as well. Why > should it be discouraged? You end up at the same place -- a running > kernel and in-ram filesystem; why require going through a bootloader > just because the hardware case needs it? It's smoke and mirrors. We're still providing a boot loader; it's just a little tiny one that we've written solely for this purpose. And it works fine for production use. The question is whether we ought to be aggressively optimizing it for large initrd sizes. To be honest, after a lot of discussion of possibilities, I've come to the conclusion that it's just not worth it. There are better ways like using string I/O and optimizing the PIO path in the kernel. That should cut down the 1s slow down with a 100MB initrd by a bit. But honestly, shaving a couple hundred ms further off the initrd load is just not worth it using the current model. >>> The slow down is not 1s any more. String PIO emulation had many bugs >>> that were fixed in 2.6.35. I verified how much time it took to load 100M >>> via fw_cfg interface on older kernel and on 2.6.35. On older kernels on >>> my machine it took ~2-3 seconds; on 2.6.35 it took 26s. Some optimizations >>> that were already committed make it 20s. I have some code prototype that >>> makes it 11s. I don't see how we can get below that, surely not back to >>> ~2-3sec. >> >> What exactly is the reason for the slowdown? 
It can't be only boundary and >> permission checks, right? >> >> > The big part of the slowdown right now is that the write into memory is done > for each byte. It means for each byte we call kvm_write_guest() and > kvm_mmu_pte_write(). The second call is needed in case the memory the instruction > is trying to write to is shadowed. Previously we didn't check for > that at all. This can be mitigated by introducing a write cache to do > combined writes into the memory and unshadow the page if there is more > than one write into it. This optimization saves ~10secs. Currently string Ok, so you tackled that bit already. > emulation enters the guest from time to time to check if event injection is > needed, and reads from userspace are done in 1K chunks, not 4K like it was, > but when I made reads 4K and disabled guest reentry I haven't seen > any speed improvements worth talking about. So what are we wasting those 10 seconds on then? Does perf tell you anything useful? Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 05:31:12PM +0200, Alexander Graf wrote: > > On 04.08.2010, at 17:25, Gleb Natapov wrote: > > > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > >> On 08/04/2010 09:51 AM, David S. Ahern wrote: > >>> > >>> On 08/03/10 12:43, Avi Kivity wrote: > libguestfs does not depend on an x86 architectural feature. > qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should > discourage people from depending on this interface for production use. > >>> That is a feature of qemu - and an important one to me as well. Why > >>> should it be discouraged? You end up at the same place -- a running > >>> kernel and in-ram filesystem; why require going through a bootloader > >>> just because the hardware case needs it? > >> > >> It's smoke and mirrors. We're still providing a boot loader it's > >> just a little tiny one that we've written soley for this purpose. > >> > >> And it works fine for production use. The question is whether we > >> ought to be aggressively optimizing it for large initrd sizes. To > >> be honest, after a lot of discussion of possibilities, I've come to > >> the conclusion that it's just not worth it. > >> > >> There are better ways like using string I/O and optimizing the PIO > >> path in the kernel. That should cut down the 1s slow down with a > >> 100MB initrd by a bit. But honestly, shaving a couple hundred ms > >> further off the initrd load is just not worth it using the current > >> model. > >> > > The slow down is not 1s any more. String PIO emulation had many bugs > > that were fixed in 2.6.35. I verified how much time it took to load 100M > > via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > > that was already committed make it 20s. I have some code prototype that > > makes it 11s. I don't see how we can get below that, surely not back to > > ~2-3sec. > > What exactly is the reason for the slowdown? 
It can't be only boundary and > permission checks, right? > > The big part of the slowdown right now is that the write into memory is done for each byte. It means for each byte we call kvm_write_guest() and kvm_mmu_pte_write(). The second call is needed in case the memory the instruction is trying to write to is shadowed. Previously we didn't check for that at all. This can be mitigated by introducing a write cache, doing combined writes into memory, and unshadowing the page if there is more than one write into it. This optimization saves ~10secs. Currently string emulation enters the guest from time to time to check if event injection is needed, and the read from userspace is done in 1K chunks, not 4K like it was, but when I made reads 4K and disabled guest reentry I haven't seen any speed improvements worth talking about. -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 17:25, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: >> On 08/04/2010 09:51 AM, David S. Ahern wrote: >>> >>> On 08/03/10 12:43, Avi Kivity wrote: libguestfs does not depend on an x86 architectural feature. qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should discourage people from depending on this interface for production use. >>> That is a feature of qemu - and an important one to me as well. Why >>> should it be discouraged? You end up at the same place -- a running >>> kernel and in-ram filesystem; why require going through a bootloader >>> just because the hardware case needs it? >> >> It's smoke and mirrors. We're still providing a boot loader it's >> just a little tiny one that we've written soley for this purpose. >> >> And it works fine for production use. The question is whether we >> ought to be aggressively optimizing it for large initrd sizes. To >> be honest, after a lot of discussion of possibilities, I've come to >> the conclusion that it's just not worth it. >> >> There are better ways like using string I/O and optimizing the PIO >> path in the kernel. That should cut down the 1s slow down with a >> 100MB initrd by a bit. But honestly, shaving a couple hundred ms >> further off the initrd load is just not worth it using the current >> model. >> > The slow down is not 1s any more. String PIO emulation had many bugs > that were fixed in 2.6.35. I verified how much time it took to load 100M > via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > that was already committed make it 20s. I have some code prototype that > makes it 11s. I don't see how we can get below that, surely not back to > ~2-3sec. What exactly is the reason for the slowdown? It can't be only boundary and permission checks, right? Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > On 08/04/2010 09:51 AM, David S. Ahern wrote: > > > >On 08/03/10 12:43, Avi Kivity wrote: > >>libguestfs does not depend on an x86 architectural feature. > >>qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should > >>discourage people from depending on this interface for production use. > >That is a feature of qemu - and an important one to me as well. Why > >should it be discouraged? You end up at the same place -- a running > >kernel and in-ram filesystem; why require going through a bootloader > >just because the hardware case needs it? > > It's smoke and mirrors. We're still providing a boot loader; it's > just a little tiny one that we've written solely for this purpose. > > And it works fine for production use. The question is whether we > ought to be aggressively optimizing it for large initrd sizes. To > be honest, after a lot of discussion of possibilities, I've come to > the conclusion that it's just not worth it. > > There are better ways like using string I/O and optimizing the PIO > path in the kernel. That should cut down the 1s slowdown with a > 100MB initrd by a bit. But honestly, shaving a couple hundred ms > further off the initrd load is just not worth it using the current > model. > The slowdown is not 1s any more. String PIO emulation had many bugs that were fixed in 2.6.35. I verified how much time it took to load 100M via the fw_cfg interface on an older kernel and on 2.6.35. On older kernels on my machine it took ~2-3 seconds; on 2.6.35 it took 26s. Some optimizations that were already committed make it 20s. I have a code prototype that makes it 11s. I don't see how we can get below that, surely not back to ~2-3sec. > If this is important to someone, we ought to look at refactoring the > loader completely to be disk based, which is a higher performance > interface. 
> > Regards, > > Anthony Liguori > > >David -- Gleb.
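For reference, the fw_cfg interface being measured here is a selector/data port pair (0x510/0x511 on x86): the guest selects a blob, then pulls it out one byte (or one `insb` run) at a time. The sketch below simulates the device in plain C to show why the transfer cost scales with every single byte; the `FW_CFG_INITRD` selector value and the blob contents are made up for illustration (real guest code would use `outw` to 0x510 and `inb`/`insb` from 0x511):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FW_CFG_INITRD 0x0012          /* hypothetical selector value */

static const uint8_t initrd_blob[] = { 0x1f, 0x8b, 0x08, 0x00 }; /* fake data */

static struct {                        /* simulated device state */
    uint16_t selector;
    size_t   offset;
} fw_cfg;

static void fw_cfg_select(uint16_t key)  /* models outw to port 0x510 */
{
    fw_cfg.selector = key;
    fw_cfg.offset = 0;                   /* selecting rewinds the stream */
}

static uint8_t fw_cfg_read_byte(void)    /* models inb from port 0x511 */
{
    if (fw_cfg.selector == FW_CFG_INITRD && fw_cfg.offset < sizeof initrd_blob)
        return initrd_blob[fw_cfg.offset++];
    return 0;
}

/* guest-side loader: the per-byte loop whose cost the thread measures.
 * Every iteration is a trapped port access in the real system. */
static size_t fw_cfg_read(uint16_t key, uint8_t *dst, size_t len)
{
    fw_cfg_select(key);
    for (size_t i = 0; i < len; i++)
        dst[i] = fw_cfg_read_byte();
    return len;
}
```

A 100MB initrd therefore means on the order of 10^8 device accesses, which is why the emulation-path fixes in 2.6.35 changed the load time so dramatically.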
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 10:07:24AM -0500, Anthony Liguori wrote: > On 08/04/2010 10:01 AM, Gleb Natapov wrote: > > > >Hm, may be. I read seabios code differently, but may be I misread it. > > The BIOS Boot Specification spells it all out pretty clearly. > I have the spec. Isn't this enough to be an expert? Or do you mean I should read it too? > >>If a ROM needs memory after the init function, it needs to use the > >>traditional tricks to allocate long term memory and the most popular > >>one is modifying the e820 tables. > >> > >e820 has no in memory format, > > Indeed. > > >>See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. > >so this ugly code intercepts int15 and mangle result. OMG. How this can > >even work if more then two ROMs want to do that? > > You have to save the old handlers and invoke them. Where do you > save the old handlers? There's tricks you can do by trying to use > some unused vectors and also temporarily using the stack. > > But basically, yeah, I'm amazed every time I see a PC boot that it > all actually works :-) > Heh. -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 10:01 AM, Gleb Natapov wrote: Hm, may be. I read seabios code differently, but may be I misread it. The BIOS Boot Specification spells it all out pretty clearly. If a ROM needs memory after the init function, it needs to use the traditional tricks to allocate long term memory and the most popular one is modifying the e820 tables. e820 has no in memory format, Indeed. See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. so this ugly code intercepts int15 and mangle result. OMG. How this can even work if more then two ROMs want to do that? You have to save the old handlers and invoke them. Where do you save the old handlers? There's tricks you can do by trying to use some unused vectors and also temporarily using the stack. But basically, yeah, I'm amazed every time I see a PC boot that it all actually works :-) Regards, Anthony Liguori -- Gleb.
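The int15/e820 mangling being discussed amounts to: intercept the E820h call, and rewrite any usable-RAM entry that overlaps the ROM's long-term allocation so that window reads as reserved. A hypothetical C model of that rewrite step (the real gPXE code does this in 16-bit assembly inside the hooked interrupt handler):

```c
#include <assert.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2

struct e820_entry { uint64_t base, len; uint32_t type; };

/* Split entry 'e' against a reserved window [r_base, r_base + r_len).
 * Writes 1..3 entries to 'out' and returns the count: RAM below the
 * window, the window itself marked reserved, and RAM above it. */
static int e820_reserve(struct e820_entry e,
                        uint64_t r_base, uint64_t r_len,
                        struct e820_entry *out)
{
    uint64_t r_end = r_base + r_len, e_end = e.base + e.len;
    uint64_t lo, hi;
    int n = 0;

    if (e.type != E820_RAM || r_end <= e.base || e_end <= r_base) {
        out[n++] = e;                 /* no overlap: pass through */
        return n;
    }
    lo = r_base > e.base ? r_base : e.base;   /* overlap start */
    hi = r_end < e_end ? r_end : e_end;       /* overlap end   */
    if (e.base < r_base)              /* RAM left below the reservation */
        out[n++] = (struct e820_entry){ e.base, r_base - e.base, E820_RAM };
    out[n++] = (struct e820_entry){ lo, hi - lo, E820_RESERVED };
    if (r_end < e_end)                /* RAM left above the reservation */
        out[n++] = (struct e820_entry){ r_end, e_end - r_end, E820_RAM };
    return n;
}
```

Gleb's objection maps directly onto this: each ROM that hooks int15 applies its own transform on top of the previous handler's output, so the schemes have to compose correctly without any coordination.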
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 09:50:55AM -0500, Anthony Liguori wrote: > On 08/04/2010 09:38 AM, Gleb Natapov wrote: > >> > >>But even if it wasn't it can potentially create havoc. I think we > >>currently believe that the northbridge likely never forwards RAM > >>access to a device so this doesn't fit how hardware would work. > >> > >Good point. > > > >>More importantly, BIOSes and ROMs do very funny things with RAM. > >>It's not unusual for a ROM to muck with the e820 map to allocate RAM > >>for itself which means there's always the chance that we're going to > >>walk over RAM being used for something else. > >> > >ROM does not muck with the e820. It uses PMM to allocate memory and the > >memory it gets is marked as reserved in the e820 map. > > PMM allocations are only valid during the init function's execution. > Its intention is to enable the use of scratch memory to decompress > or otherwise modify the ROM to shrink its size. > Hm, maybe. I read the seabios code differently, but maybe I misread it. > If a ROM needs memory after the init function, it needs to use the > traditional tricks to allocate long term memory and the most popular > one is modifying the e820 tables. > e820 has no in-memory format, > See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. so this ugly code intercepts int15 and mangles the result. OMG. How can this even work if more than two ROMs want to do that? -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 09:51 AM, David S. Ahern wrote: On 08/03/10 12:43, Avi Kivity wrote: libguestfs does not depend on an x86 architectural feature. qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should discourage people from depending on this interface for production use. That is a feature of qemu - and an important one to me as well. Why should it be discouraged? You end up at the same place -- a running kernel and in-ram filesystem; why require going through a bootloader just because the hardware case needs it? It's smoke and mirrors. We're still providing a boot loader; it's just a little tiny one that we've written solely for this purpose. And it works fine for production use. The question is whether we ought to be aggressively optimizing it for large initrd sizes. To be honest, after a lot of discussion of possibilities, I've come to the conclusion that it's just not worth it. There are better ways like using string I/O and optimizing the PIO path in the kernel. That should cut down the 1s slowdown with a 100MB initrd by a bit. But honestly, shaving a couple hundred ms further off the initrd load is just not worth it using the current model. If this is important to someone, we ought to look at refactoring the loader completely to be disk based, which is a higher performance interface. Regards, Anthony Liguori David
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/03/10 12:43, Avi Kivity wrote: > libguestfs does not depend on an x86 architectural feature. > qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should > discourage people from depending on this interface for production use. That is a feature of qemu - and an important one to me as well. Why should it be discouraged? You end up at the same place -- a running kernel and in-ram filesystem; why require going through a bootloader just because the hardware case needs it? David
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 09:38 AM, Gleb Natapov wrote: But even if it wasn't it can potentially create havoc. I think we currently believe that the northbridge likely never forwards RAM access to a device so this doesn't fit how hardware would work. Good point. More importantly, BIOSes and ROMs do very funny things with RAM. It's not unusual for a ROM to muck with the e820 map to allocate RAM for itself which means there's always the chance that we're going to walk over RAM being used for something else. ROM does not muck with the e820. It uses PMM to allocate memory and the memory it gets is marked as reserved in the e820 map. PMM allocations are only valid during the init function's execution. Its intention is to enable the use of scratch memory to decompress or otherwise modify the ROM to shrink its size. If a ROM needs memory after the init function, it needs to use the traditional tricks to allocate long term memory and the most popular one is modifying the e820 tables. See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. Regards, Anthony Liguori -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 09:22 AM, Paolo Bonzini wrote: On 08/04/2010 04:00 PM, Gleb Natapov wrote: Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. May be. Can floppy be 100M? Well, in theory you can have 16384 bytes/sector, 256 tracks, 255 sectors, 2 heads... that makes 2^(14+8+8+1) = 2 GB. :) Not sure the BIOS would read such a beast, or SYSLINUX. By the way, if libguestfs insists for an initrd rather than a CDROM image, it could do something in between and make an ISO image with ISOLINUX and the required kernel/initrd pair. (By the way, a network installation image for a typical distribution has a 120M initrd, so it's not just libguestfs. It is very useful to pass the network installation images directly to qemu via -kernel/-initrd). We could make kernel an awful lot smarter but unless we've got someone just itching to write 16-bit option rom code, I think our best bet is to try to leverage a standard bootloader and expose a disk containing the kernel/initrd. Otherwise, we just stick with what we have and deal with the performance as is. Regards, Anthony Liguori Paolo
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 09:22:22AM -0500, Anthony Liguori wrote: > On 08/04/2010 08:26 AM, Gleb Natapov wrote: > >On Wed, Aug 04, 2010 at 02:24:08PM +0100, Richard W.M. Jones wrote: > >>On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > >>>On 08/04/2010 08:07 AM, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > >On 08/04/2010 03:17 AM, Avi Kivity wrote: > >>For playing games, there are three options: > >>- existing fwcfg > >>- fwcfg+dma > >>- put roms in 4GB-2MB (or whatever we decide the flash size is) > >>and have the BIOS copy them > >> > >>Existing fwcfg is the least amount of work and probably > >>satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >>High memory flash is the most hardware-like solution, pretty easy > >>from a qemu point of view but requires more work. > > > >The only trouble I see is that high memory isn't always available. > >If it's a 32-bit PC and you've exhausted RAM space, then you're only > >left with the PCI hole and it's not clear to me if you can really > >pull out 100mb of space there as an option ROM without breaking > >something. > > > We can map it on demand. Guest tells qemu to map rom "A" to address X by > writing into some io port. Guest copies rom. Guest tells qemu to unmap > it. Better then DMA interface IMHO. > >>>That's what I thought too, but in a 32-bit guest using ~3.5GB of > >>>RAM, where can you safely get 100MB of memory to full map the ROM? > >>>If you're going to map chunks at a time, you are basically doing > >>>DMA. > >>It's boot time, so you can just map it over some existing RAM surely? > >Not with current qemu. This is broken now. > > But even if it wasn't it can potentially create havoc. I think we > currently believe that the northbridge likely never forwards RAM > access to a device so this doesn't fit how hardware would work. > Good point. > More importantly, BIOSes and ROMs do very funny things with RAM. 
> It's not unusual for a ROM to muck with the e820 map to allocate RAM > for itself which means there's always the chance that we're going to > walk over RAM being used for something else. > ROM does not muck with the e820. It uses PMM to allocate memory and the memory it gets is marked as reserved in e820 map. -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 09:14:01AM -0500, Anthony Liguori wrote: > >Unmapping device and mapping it at the same place is easy. Enumerating > >pci devices from multiboot.bin looks like unneeded churn though. > > > >>Maybe we're just being too fancy here. > >> > >>We could rewrite -kernel/-append/-initrd to just generate a floppy > >>image in RAM, and just boot from floppy. > >> > >May be. Can floppy be 100M? > > No, I forgot just how small they are. R/O usb mass storage device? > CDROM? I'm beginning thing that loading such a large initrd through > fwcfg is simply a dead end. > Well, libguestfs can use CDROM by itself to begin with. -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 04:00 PM, Gleb Natapov wrote: Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. May be. Can floppy be 100M? Well, in theory you can have 16384 bytes/sector, 256 tracks, 255 sectors, 2 heads... that makes 2^(14+8+8+1) = 2 GB. :) Not sure the BIOS would read such a beast, or SYSLINUX. By the way, if libguestfs insists on an initrd rather than a CDROM image, it could do something in between and make an ISO image with ISOLINUX and the required kernel/initrd pair. (By the way, a network installation image for a typical distribution has a 120M initrd, so it's not just libguestfs. It is very useful to pass the network installation images directly to qemu via -kernel/-initrd). Paolo
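Paolo's geometry arithmetic, checked: 2^(14+8+8+1) rounds the 255 sectors/track up to 256 (CHS sector numbers run 1-255), so the exact ceiling with his figures is a hair under 2 GB, which is still far more than a 100MB initrd needs:

```c
#include <assert.h>
#include <stdint.h>

/* Maximum addressable size for a given CHS geometry:
 * bytes/sector x cylinders x sectors/track x heads. */
static uint64_t chs_max_bytes(uint64_t bytes_per_sector, uint64_t cylinders,
                              uint64_t sectors, uint64_t heads)
{
    return bytes_per_sector * cylinders * sectors * heads;
}
```

With Paolo's values (16384 bytes/sector, 256 tracks, 255 sectors, 2 heads) this gives 2,139,095,040 bytes, i.e. ~1.99 GB.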
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:26 AM, Gleb Natapov wrote: On Wed, Aug 04, 2010 at 02:24:08PM +0100, Richard W.M. Jones wrote: On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: On 08/04/2010 08:07 AM, Gleb Natapov wrote: On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: On 08/04/2010 03:17 AM, Avi Kivity wrote: For playing games, there are three options: - existing fwcfg - fwcfg+dma - put roms in 4GB-2MB (or whatever we decide the flash size is) and have the BIOS copy them Existing fwcfg is the least amount of work and probably satisfactory for isapc. fwcfg+dma is IMO going off a tangent. High memory flash is the most hardware-like solution, pretty easy >from a qemu point of view but requires more work. The only trouble I see is that high memory isn't always available. If it's a 32-bit PC and you've exhausted RAM space, then you're only left with the PCI hole and it's not clear to me if you can really pull out 100mb of space there as an option ROM without breaking something. We can map it on demand. Guest tells qemu to map rom "A" to address X by writing into some io port. Guest copies rom. Guest tells qemu to unmap it. Better then DMA interface IMHO. That's what I thought too, but in a 32-bit guest using ~3.5GB of RAM, where can you safely get 100MB of memory to full map the ROM? If you're going to map chunks at a time, you are basically doing DMA. It's boot time, so you can just map it over some existing RAM surely? Not with current qemu. This is broken now. But even if it wasn't it can potentially create havoc. I think we currently believe that the northbridge likely never forwards RAM access to a device so this doesn't fit how hardware would work. More importantly, BIOSes and ROMs do very funny things with RAM. It's not unusual for a ROM to muck with the e820 map to allocate RAM for itself which means there's always the chance that we're going to walk over RAM being used for something else. 
Regards, Anthony Liguori Linuxboot.bin can work out where to map it so it won't be in any memory either being used or the target for the copy. -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 09:00 AM, Gleb Natapov wrote: On Wed, Aug 04, 2010 at 08:52:44AM -0500, Anthony Liguori wrote: On 08/04/2010 08:34 AM, Gleb Natapov wrote: On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: On 08/04/2010 08:07 AM, Gleb Natapov wrote: On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: On 08/04/2010 03:17 AM, Avi Kivity wrote: For playing games, there are three options: - existing fwcfg - fwcfg+dma - put roms in 4GB-2MB (or whatever we decide the flash size is) and have the BIOS copy them Existing fwcfg is the least amount of work and probably satisfactory for isapc. fwcfg+dma is IMO going off a tangent. High memory flash is the most hardware-like solution, pretty easy >from a qemu point of view but requires more work. The only trouble I see is that high memory isn't always available. If it's a 32-bit PC and you've exhausted RAM space, then you're only left with the PCI hole and it's not clear to me if you can really pull out 100mb of space there as an option ROM without breaking something. We can map it on demand. Guest tells qemu to map rom "A" to address X by writing into some io port. Guest copies rom. Guest tells qemu to unmap it. Better then DMA interface IMHO. That's what I thought too, but in a 32-bit guest using ~3.5GB of RAM, where can you safely get 100MB of memory to full map the ROM? If you're going to map chunks at a time, you are basically doing DMA. This is not like DMA event if done in chunks and chunks can be pretty big. The code that dials with copying may temporary unmap some pci devices to have more space there. That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. Unmapping device and mapping it at the same place is easy. Enumerating pci devices from multiboot.bin looks like unneeded churn though. Maybe we're just being too fancy here. 
We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. May be. Can floppy be 100M? No, I forgot just how small they are. R/O USB mass storage device? CDROM? I'm beginning to think that loading such a large initrd through fwcfg is simply a dead end. Regards, Anthony Liguori -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 08:52:44AM -0500, Anthony Liguori wrote: > On 08/04/2010 08:34 AM, Gleb Natapov wrote: > >On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > >>On 08/04/2010 08:07 AM, Gleb Natapov wrote: > >>>On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > On 08/04/2010 03:17 AM, Avi Kivity wrote: > >For playing games, there are three options: > >- existing fwcfg > >- fwcfg+dma > >- put roms in 4GB-2MB (or whatever we decide the flash size is) > >and have the BIOS copy them > > > >Existing fwcfg is the least amount of work and probably > >satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >High memory flash is the most hardware-like solution, pretty easy > >from a qemu point of view but requires more work. > > The only trouble I see is that high memory isn't always available. > If it's a 32-bit PC and you've exhausted RAM space, then you're only > left with the PCI hole and it's not clear to me if you can really > pull out 100mb of space there as an option ROM without breaking > something. > > >>>We can map it on demand. Guest tells qemu to map rom "A" to address X by > >>>writing into some io port. Guest copies rom. Guest tells qemu to unmap > >>>it. Better then DMA interface IMHO. > >>That's what I thought too, but in a 32-bit guest using ~3.5GB of > >>RAM, where can you safely get 100MB of memory to full map the ROM? > >>If you're going to map chunks at a time, you are basically doing > >>DMA. > >> > >This is not like DMA event if done in chunks and chunks can be pretty > >big. The code that dials with copying may temporary unmap some pci > >devices to have more space there. > > That's a bit complicated because SeaBIOS is managing the PCI devices > whereas the kernel code is running as an option rom. I don't know > the BIOS PCI interfaces well so I don't know how doable this is. > Unmapping device and mapping it at the same place is easy. 
Enumerating PCI devices from multiboot.bin looks like unneeded churn though. > Maybe we're just being too fancy here. > > We could rewrite -kernel/-append/-initrd to just generate a floppy > image in RAM, and just boot from floppy. > Maybe. Can a floppy be 100M? -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:34 AM, Gleb Natapov wrote: On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: On 08/04/2010 08:07 AM, Gleb Natapov wrote: On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: On 08/04/2010 03:17 AM, Avi Kivity wrote: For playing games, there are three options: - existing fwcfg - fwcfg+dma - put roms in 4GB-2MB (or whatever we decide the flash size is) and have the BIOS copy them Existing fwcfg is the least amount of work and probably satisfactory for isapc. fwcfg+dma is IMO going off a tangent. High memory flash is the most hardware-like solution, pretty easy >from a qemu point of view but requires more work. The only trouble I see is that high memory isn't always available. If it's a 32-bit PC and you've exhausted RAM space, then you're only left with the PCI hole and it's not clear to me if you can really pull out 100mb of space there as an option ROM without breaking something. We can map it on demand. Guest tells qemu to map rom "A" to address X by writing into some io port. Guest copies rom. Guest tells qemu to unmap it. Better then DMA interface IMHO. That's what I thought too, but in a 32-bit guest using ~3.5GB of RAM, where can you safely get 100MB of memory to full map the ROM? If you're going to map chunks at a time, you are basically doing DMA. This is not like DMA event if done in chunks and chunks can be pretty big. The code that dials with copying may temporary unmap some pci devices to have more space there. That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. Regards, Anthony Liguori And what's the upper limit on ROM size that we impose? 100MB is already at the ridiculously large size. Agree. We have two solutions: 1. 
Avoid the problem 2. Fix the problem. Both are fine with me and I prefer 1, but if we are going with 2 I prefer something sane. -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > On 08/04/2010 08:07 AM, Gleb Natapov wrote: > >On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > >>On 08/04/2010 03:17 AM, Avi Kivity wrote: > >>>For playing games, there are three options: > >>>- existing fwcfg > >>>- fwcfg+dma > >>>- put roms in 4GB-2MB (or whatever we decide the flash size is) > >>>and have the BIOS copy them > >>> > >>>Existing fwcfg is the least amount of work and probably > >>>satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >>>High memory flash is the most hardware-like solution, pretty easy > >>>from a qemu point of view but requires more work. > >> > >>The only trouble I see is that high memory isn't always available. > >>If it's a 32-bit PC and you've exhausted RAM space, then you're only > >>left with the PCI hole and it's not clear to me if you can really > >>pull out 100MB of space there as an option ROM without breaking > >>something. > >> > >We can map it on demand. Guest tells qemu to map rom "A" to address X by > >writing into some io port. Guest copies rom. Guest tells qemu to unmap > >it. Better than DMA interface IMHO. > > That's what I thought too, but in a 32-bit guest using ~3.5GB of > RAM, where can you safely get 100MB of memory to full map the ROM? > If you're going to map chunks at a time, you are basically doing > DMA. > This is not like DMA even if done in chunks, and chunks can be pretty big. The code that deals with copying may temporarily unmap some PCI devices to have more space there. > And what's the upper limit on ROM size that we impose? 100MB is > already at the ridiculously large size. > Agree. We have two solutions: 1. Avoid the problem 2. Fix the problem. Both are fine with me and I prefer 1, but if we are going with 2 I prefer something sane. -- Gleb.
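Gleb's map-on-demand scheme can be sketched as a loop: ask the host to expose the next ROM chunk at a fixed window, copy it out, repeat. Everything below is hypothetical (the host side is a plain function standing in for an I/O port write, and the chunk size and window address are invented), but it shows the key property: the number of guest/host round trips scales with the chunk count, not the byte count:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 4096

static uint8_t rom_image[3 * CHUNK_SIZE + 123]; /* host-side ROM image     */
static uint8_t window[CHUNK_SIZE];              /* guest-visible map window */
static unsigned map_requests;                   /* guest->host round trips  */

/* Models "write chunk index to the control port": the host maps the
 * requested chunk of the ROM at the window and reports its size. */
static size_t host_map_chunk(size_t chunk)
{
    size_t off = chunk * CHUNK_SIZE;
    size_t n = off < sizeof rom_image ? sizeof rom_image - off : 0;
    if (n > CHUNK_SIZE)
        n = CHUNK_SIZE;
    memcpy(window, rom_image + off, n);
    map_requests++;
    return n;                                   /* bytes now visible */
}

/* Guest-side loader: one host round trip per chunk, not per byte. */
static size_t load_rom(uint8_t *dst)
{
    size_t total = 0, n;
    for (size_t chunk = 0; (n = host_map_chunk(chunk)) > 0; chunk++) {
        memcpy(dst + total, window, n);
        total += n;
    }
    return total;
}
```

Compared with the per-byte fw_cfg path, a 100MB ROM costs ~25,000 round trips at 4K chunks (and far fewer with bigger chunks) instead of ~10^8 port accesses, which is why Gleb prefers this over tuning the string-PIO path.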