[Qemu-devel] Qemu doesn't implement SCSI READ DISC INFORMATION command (0x51) Qemu reports: SK=5h/ASC=20h/ACQ=00h
https://bugs.launchpad.net/qemu/+bug/612901

Just curious whether anyone has any information on this: perhaps it is fixed in a CVS or git branch of Qemu, or someone knows of a possible workaround; any information would be appreciated.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 07:17:30PM -0400, Kevin O'Connor wrote:
> On Wed, Aug 04, 2010 at 06:25:52PM +0300, Gleb Natapov wrote:
> > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote:
> > > There are better ways like using string I/O and optimizing the PIO
> > > path in the kernel.  That should cut down the 1s slow down with a
> > > 100MB initrd by a bit.  But honestly, shaving a couple hundred ms
> > > further off the initrd load is just not worth it using the current
> > > model.
> > >
> > The slow down is not 1s any more. String PIO emulation had many bugs
> > that were fixed in 2.6.35. I verified how much time it took to load 100M
> > via fw_cfg interface on older kernel and on 2.6.35. On older kernels on
> > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations
> > that were already committed make it 20s. I have some code prototype that
> > makes it 11s. I don't see how we can get below that, surely not back to
> > ~2-3sec.
>
> I guess this slowness is primarily for kvm.  I just ran some tests on
> the latest qemu (with TCG).  I pulled in a 400Meg file over fw_cfg
> using the SeaBIOS interface - it takes 9.8 seconds (pretty
> consistently).  Oddly, if I change SeaBIOS to use insb (string pio) it
> takes 11.5 seconds (again, pretty consistently).  These times were
> measured on the host - they don't include the extra time it takes qemu
> to start up (during which it reads the file into its memory).
>
Yes only KVM is affected, nothing has changed in qemu itself.

--
Gleb.
Re: [Qemu-devel] [PATCH 0/4] fix PowerPC 440 Bamboo platform emulation
On Wed, Aug 04, 2010 at 05:21:33PM -0700, Hollis Blanchard wrote:
> These patches get the PowerPC Bamboo platform working again. I've re-written
> two of the patches based on feedback from qemu-devel.
>
> Note that this platform still only works in conjunction with KVM, since the
> PowerPC 440 MMU is still not accurately emulated by TCG.

Is that the Book-E MMU? In case it is, I've got a couple of fairly ugly
patches somewhere that at least made it possible for me to boot linux on a
TCG emulated ppc 440. I'll see if I can dig them out and post them.

Cheers
Re: [Qemu-devel] [PATCH 4/4] ppc4xx: load Bamboo kernel, initrd, and fdt at fixed addresses
On Wed, Aug 04, 2010 at 05:21:37PM -0700, Hollis Blanchard wrote:
> We can't use the return value of load_uimage() for the kernel because it
> can't account for BSS size, and the PowerPC kernel does not relocate
> blobs before zeroing BSS.
>
> Instead, we now load at the fixed addresses chosen by u-boot (the normal
> firmware for the board).
>
> Signed-off-by: Hollis Blanchard

This looks good to me, thanks Hollis.

Acked-by: Edgar E. Iglesias

> ---
>  hw/ppc440_bamboo.c | 39 ++-
>  1 files changed, 18 insertions(+), 21 deletions(-)
>
> This fixes a critical bug in PowerPC 440 Bamboo board emulation.
>
> diff --git a/hw/ppc440_bamboo.c b/hw/ppc440_bamboo.c
> index d471d5d..34ddf45 100644
> --- a/hw/ppc440_bamboo.c
> +++ b/hw/ppc440_bamboo.c
> @@ -27,6 +27,11 @@
>
>  #define BINARY_DEVICE_TREE_FILE "bamboo.dtb"
>
> +/* from u-boot */
> +#define KERNEL_ADDR 0x100
> +#define FDT_ADDR 0x180
> +#define RAMDISK_ADDR 0x190
> +
>  static int bamboo_load_device_tree(target_phys_addr_t addr,
>                                     uint32_t ramsize,
>                                     target_phys_addr_t initrd_base,
> @@ -98,10 +103,8 @@ static void bamboo_init(ram_addr_t ram_size,
>      uint64_t elf_lowaddr;
>      target_phys_addr_t entry = 0;
>      target_phys_addr_t loadaddr = 0;
> -    target_long kernel_size = 0;
> -    target_ulong initrd_base = 0;
>      target_long initrd_size = 0;
> -    target_ulong dt_base = 0;
> +    int success;
>      int i;
>
>      /* Setup CPU. */
> @@ -118,15 +121,15 @@ static void bamboo_init(ram_addr_t ram_size,
>
>      /* Load kernel. */
>      if (kernel_filename) {
> -        kernel_size = load_uimage(kernel_filename, &entry, &loadaddr, NULL);
> -        if (kernel_size < 0) {
> -            kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
> -                                   &elf_lowaddr, NULL, 1, ELF_MACHINE, 0);
> +        success = load_uimage(kernel_filename, &entry, &loadaddr, NULL);
> +        if (success < 0) {
> +            success = load_elf(kernel_filename, NULL, NULL, &elf_entry,
> +                               &elf_lowaddr, NULL, 1, ELF_MACHINE, 0);
>              entry = elf_entry;
>              loadaddr = elf_lowaddr;
>          }
>          /* XXX try again as binary */
> -        if (kernel_size < 0) {
> +        if (success < 0) {
>              fprintf(stderr, "qemu: could not load kernel '%s'\n",
>                      kernel_filename);
>              exit(1);
> @@ -135,26 +138,20 @@ static void bamboo_init(ram_addr_t ram_size,
>
>      /* Load initrd. */
>      if (initrd_filename) {
> -        initrd_base = kernel_size + loadaddr;
> -        initrd_size = load_image_targphys(initrd_filename, initrd_base,
> -                                          ram_size - initrd_base);
> +        initrd_size = load_image_targphys(initrd_filename, RAMDISK_ADDR,
> +                                          ram_size - RAMDISK_ADDR);
>
>          if (initrd_size < 0) {
> -            fprintf(stderr, "qemu: could not load initial ram disk '%s'\n",
> -                    initrd_filename);
> +            fprintf(stderr, "qemu: could not load ram disk '%s' at %x\n",
> +                    initrd_filename, RAMDISK_ADDR);
>              exit(1);
>          }
>      }
>
>      /* If we're loading a kernel directly, we must load the device tree too. */
>      if (kernel_filename) {
> -        if (initrd_base)
> -            dt_base = initrd_base + initrd_size;
> -        else
> -            dt_base = kernel_size + loadaddr;
> -
> -        if (bamboo_load_device_tree(dt_base, ram_size,
> -                                    initrd_base, initrd_size, kernel_cmdline) < 0) {
> +        if (bamboo_load_device_tree(FDT_ADDR, ram_size, RAMDISK_ADDR,
> +                                    initrd_size, kernel_cmdline) < 0) {
>              fprintf(stderr, "couldn't load device tree\n");
>              exit(1);
>          }
> @@ -163,7 +160,7 @@ static void bamboo_init(ram_addr_t ram_size,
>
>      /* Set initial guest state. */
>      env->gpr[1] = (16<<20) - 8;
> -    env->gpr[3] = dt_base;
> +    env->gpr[3] = FDT_ADDR;
>      env->nip = entry;
>      /* XXX we currently depend on KVM to create some initial TLB entries. */
>  }
> --
> 1.7.2
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 11:06 PM, David S. Ahern wrote:
> On 08/04/10 11:34, Avi Kivity wrote:
>> And it's awesome for fast prototyping. Of course, once that fast
>> becomes dog slow, it's not useful anymore. For the Nth time, it's only
>> slow with 100MB initrds.
>
> 100MB is really not that large for an initrd. Consider the deployment of
> stateless nodes - something that virtualization allows the rapid
> deployment of. 1 kernel, 1 initrd with the various binaries to be run.
> Create nodes as needed by launching a shell command - be it for more
> capacity, isolation, etc. Why require an iso or disk wrapper for a
> binary blob that is all to be run out of memory?

It's inefficient. First qemu reads the initrd and stores it in memory
(where it is kept while the guest runs in case you migrate or reboot).
Then the guest copies it into temporary storage (where we currently have
the slowdown). Then the guest decompresses and extracts it to tmpfs
(initramfs model). Finally the guest runs init out of initrd, typically
using just a part of the 100MB+.

Whereas with a disk image, individual pages are copied to the guest on
demand without taking space in qemu. With cache=none, they don't even
affect host pagecache.

> The -append argument allows boot parameters to be specified at launch.
> That is a very powerful and simple design option.

Good point. You still have it with a small initrd that bootstraps a
larger image. Note -append probably works even without -kernel, it's
just that the guest isn't tooled to look at it.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
[Qemu-devel] [PATCH v2 2/8] qdev: export qdev_reset() for later use.
export qdev_reset() for later use.

Signed-off-by: Isaku Yamahata
---
 hw/qdev.c | 29 +
 hw/qdev.h |  1 +
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/hw/qdev.c b/hw/qdev.c
index e99c73f..322b315 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -256,13 +256,34 @@ DeviceState *qdev_device_add(QemuOpts *opts)
     return qdev;
 }
 
-static void qdev_reset(void *opaque)
+/*
+ * reset the device.
+ * Bring the device into initial known state (to some extent)
+ * on warm reset(system reset).
+ * Typically on system reset(or power-on reset), bus reset occurs on
+ * each bus which causes devices to reset.
+ * This reset doesn't include software reset which is triggered by
+ * issuing reset command. Those device reset would be implemented in a bus
+ * specific way.
+ *
+ * For example
+ * PCI: reset with RST# signal asserted. Not FLR of advanced feature capability
+ * PCIe: conventional reset. Not FLR.
+ * ATA: hardware reset with RESET- signal asserted. Not DEVICE RESET command.
+ * SCSI: hard reset with SCSI RST signal asserted.
+ *       Not bus device reset message.
+ */
+void qdev_reset(DeviceState *dev)
 {
-    DeviceState *dev = opaque;
     if (dev->info->reset)
         dev->info->reset(dev);
 }
 
+static void qdev_reset_fn(void *opaque)
+{
+    qdev_reset(opaque);
+}
+
 /* Initialize a device.  Device properties should be set before calling
    this function.  IRQs and MMIO regions should be connected/mapped after
    calling this function.
@@ -278,7 +299,7 @@ int qdev_init(DeviceState *dev)
         qdev_free(dev);
         return rc;
     }
-    qemu_register_reset(qdev_reset, dev);
+    qemu_register_reset(qdev_reset_fn, dev);
     if (dev->info->vmsd) {
         vmstate_register_with_alias_id(dev, -1, dev->info->vmsd, dev,
                                        dev->instance_id_alias,
@@ -350,7 +371,7 @@ void qdev_free(DeviceState *dev)
         if (dev->opts)
             qemu_opts_del(dev->opts);
     }
-    qemu_unregister_reset(qdev_reset, dev);
+    qemu_unregister_reset(qdev_reset_fn, dev);
     QLIST_REMOVE(dev, sibling);
     for (prop = dev->info->props; prop && prop->name; prop++) {
         if (prop->info->free) {
diff --git a/hw/qdev.h b/hw/qdev.h
index 678f8b7..10f6769 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -162,6 +162,7 @@ struct DeviceInfo {
 extern DeviceInfo *device_info_list;
 
 void qdev_register(DeviceInfo *info);
+void qdev_reset(DeviceState *dev);
 
 /* Register device properties.  */
 /* GPIO inputs also double as IRQ sinks.  */
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 4/8] pci: make pci_device_reset() aware of qdev.
Make pci_device_reset() distinguish qdev-ified devices from devices that
have not been converted yet. Later the two cases will be handled
differently.

Signed-off-by: Isaku Yamahata
---
 hw/pci.c | 35 +-
 hw/pci.h |  1 +
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 6a614d1..c48bb3e 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -130,8 +130,7 @@ static void pci_update_irq_status(PCIDevice *dev)
     }
 }
 
-/* Reset the device in response to RST# signal. */
-void pci_device_reset(PCIDevice *dev)
+void pci_device_reset_default(PCIDevice *dev)
 {
     int r;
 
@@ -159,6 +158,38 @@ void pci_device_reset(PCIDevice *dev)
     pci_update_mappings(dev);
 }
 
+/* Reset the device in response to RST# signal. */
+void pci_device_reset(PCIDevice *dev)
+{
+    if (!dev->qdev.info || !dev->qdev.info->reset) {
+        /* for not qdevified device or reset isn't implemented property.
+         * So take care of them in PCI generic layer.
+         */
+        pci_device_reset_default(dev);
+        return;
+    }
+
+    /*
+     * There are two paths to reset pci device. Each resets does partially.
+     * qemu_system_reset()
+     *  -> pci_device_reset() with bus
+     *     -> pci_device_reset_default() which resets pci common part.
+     *  -> DeviceState::reset: each device specific reset hanlder
+     *     which resets device specific part.
+     *
+     * TODO:
+     * It requires two execution paths to reset the device fully.
+     * It is confusing and prone to error. Each device should know all
+     * its states.
+     * So move this part to each device specific callback.
+     */
+
+    /* For now qdev_reset() is called directly by qemu_system_reset() */
+    /* qdev_reset(&dev->qdev); */
+
+    pci_device_reset_default(dev);
+}
+
 /*
  * Trigger pci bus reset under a given bus.
  * This functions emulates RST#.
  */
diff --git a/hw/pci.h b/hw/pci.h
index be05662..ce1feb4 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -210,6 +210,7 @@ PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min);
 
 void pci_bus_reset(PCIBus *bus);
 void pci_device_reset(PCIDevice *dev);
+void pci_device_reset_default(PCIDevice *dev);
 
 void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
                   void *irq_opaque, int nirq);
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 8/8] pci bridge: implement secondary bus reset.
implement secondary bus reset.

Signed-off-by: Isaku Yamahata
---
 hw/pci_bridge.c | 13 -
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
index ab7ed6e..37710e9 100644
--- a/hw/pci_bridge.c
+++ b/hw/pci_bridge.c
@@ -119,6 +119,9 @@ pcibus_t pci_bridge_get_limit(const PCIDevice *bridge, uint8_t type)
 void pci_bridge_write_config(PCIDevice *d,
                              uint32_t address, uint32_t val, int len)
 {
+    PCIBridge *s = container_of(d, PCIBridge, dev);
+    uint16_t bridge_control = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
+
     pci_default_write_config(d, address, val, len);
 
     if (/* io base/limit */
@@ -127,9 +130,17 @@ void pci_bridge_write_config(PCIDevice *d,
         /* memory base/limit, prefetchable base/limit and
            io base/limit upper 16 */
         ranges_overlap(address, len, PCI_MEMORY_BASE, 20)) {
-        PCIBridge *s = container_of(d, PCIBridge, dev);
         pci_bridge_update_mappings(&s->sec_bus);
     }
+
+    if (ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
+        uint16_t new = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
+        if (!(bridge_control & PCI_BRIDGE_CTL_BUS_RESET) &&
+            (new & PCI_BRIDGE_CTL_BUS_RESET)) {
+            /* 0 -> 1 */
+            pci_bus_reset(&s->sec_bus);
+        }
+    }
 }
 
 /* reset bridge specific configuration registers */
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 1/8] apb: fix typo.
fix typo.

Signed-off-by: Isaku Yamahata
---
 hw/apb_pci.c | 6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/apb_pci.c b/hw/apb_pci.c
index 10a5baa..c619112 100644
--- a/hw/apb_pci.c
+++ b/hw/apb_pci.c
@@ -362,7 +362,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
     /* APB secondary busses */
     pci_dev = pci_create_multifunction(d->bus, PCI_DEVFN(1, 0), true,
                                        "pbm-bridge");
-    br = DO_UPCAST(PCIBridge, dev, dev);
+    br = DO_UPCAST(PCIBridge, dev, pci_dev);
     pci_bridge_map_irq(br, "Advanced PCI Bus secondary bridge 1",
                        pci_apb_map_irq);
     qdev_init_nofail(&pci_dev->qdev);
@@ -370,7 +370,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
     pci_dev = pci_create_multifunction(d->bus, PCI_DEVFN(1, 1), true,
                                        "pbm-bridge");
-    br = DO_UPCAST(PCIBridge, dev, dev);
+    br = DO_UPCAST(PCIBridge, dev, pci_dev);
     pci_bridge_map_irq(br, "Advanced PCI Bus secondary bridge 2",
                        pci_apb_map_irq);
     qdev_init_nofail(&pci_dev->qdev);
@@ -462,7 +462,7 @@ static PCIDeviceInfo pbm_pci_bridge_info = {
     .qdev.name = "pbm-bridge",
     .qdev.size = sizeof(PCIBridge),
     .qdev.vmsd = &vmstate_pci_device,
-    .qdev.reset = pci_brdige_reset,
+    .qdev.reset = pci_bridge_reset,
     .init = apb_pci_bridge_initfn,
     .exit = pci_bridge_exitfn,
     .config_write = pci_bridge_write_config,
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 3/8] pci: export pci_bus_reset() and pci_device_reset() for later use.
export pci_bus_reset() and pci_device_reset() for later use, with a slight
function signature adjustment.

Signed-off-by: Isaku Yamahata
---
 hw/pci.c | 17 +
 hw/pci.h |  4
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 2dc1577..6a614d1 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -130,7 +130,8 @@ static void pci_update_irq_status(PCIDevice *dev)
     }
 }
 
-static void pci_device_reset(PCIDevice *dev)
+/* Reset the device in response to RST# signal. */
+void pci_device_reset(PCIDevice *dev)
 {
     int r;
 
@@ -158,9 +159,12 @@ static void pci_device_reset(PCIDevice *dev)
     pci_update_mappings(dev);
 }
 
-static void pci_bus_reset(void *opaque)
+/*
+ * Trigger pci bus reset under a given bus.
+ * This functions emulates RST#.
+ */
+void pci_bus_reset(PCIBus *bus)
 {
-    PCIBus *bus = opaque;
     int i;
 
     for (i = 0; i < bus->nirq; i++) {
@@ -173,6 +177,11 @@ static void pci_bus_reset(void *opaque)
     }
 }
 
+static void pci_bus_reset_fn(void *opaque)
+{
+    pci_bus_reset(opaque);
+}
+
 static void pci_host_bus_register(int domain, PCIBus *bus)
 {
     struct PCIHostBus *host;
@@ -227,7 +236,7 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
     pci_host_bus_register(0, bus); /* for now only pci domain 0 is supported */
     vmstate_register(NULL, -1, &vmstate_pcibus, bus);
-    qemu_register_reset(pci_bus_reset, bus);
+    qemu_register_reset(pci_bus_reset_fn, bus);
 }
 
 PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min)
diff --git a/hw/pci.h b/hw/pci.h
index c551f96..be05662 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -207,6 +207,10 @@ typedef int (*pci_hotplug_fn)(DeviceState *qdev, PCIDevice *pci_dev, int state);
 void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
                          const char *name, int devfn_min);
 PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min);
+
+void pci_bus_reset(PCIBus *bus);
+void pci_device_reset(PCIDevice *dev);
+
 void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
                   void *irq_opaque, int nirq);
 void pci_bus_hotplug(PCIBus *bus, pci_hotplug_fn hotplug, DeviceState *dev);
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 5/8] qdev: introduce bus reset callback and helper functions.
Introduce a bus reset callback to support bus reset at the qbus layer,
together with a function to trigger bus reset. A qdev reset callback is now
triggered by its parent qbus reset callback, and a qdev should in turn
trigger the reset callbacks of its child buses.

Signed-off-by: Isaku Yamahata
---
changes v1 -> v2
- eliminate qemu_register_reset() from qdev_create() as Gerd suggested.
- Inserted qdev_reset_default() as appropriate.
  This is required for qdev which has reset callback and child bus.
---
 hw/esp.c        |  2 ++
 hw/lsi53c895a.c |  1 +
 hw/qdev.c       | 42 +++---
 hw/qdev.h       |  7 +++
 4 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/hw/esp.c b/hw/esp.c
index 349052a..cafc257 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -423,6 +423,8 @@ static void esp_hard_reset(DeviceState *d)
 {
     ESPState *s = container_of(d, ESPState, busdev.qdev);
+    qdev_reset_default(d);
+
     memset(s->rregs, 0, ESP_REGS);
     memset(s->wregs, 0, ESP_REGS);
     s->rregs[ESP_TCHI] = TCHI_FAS100A; // Indicate fas100a
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index bd7b661..33a8eb2 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -2042,6 +2042,7 @@ static void lsi_scsi_reset(DeviceState *dev)
 {
     LSIState *s = DO_UPCAST(LSIState, dev.qdev, dev);
+    qdev_reset_default(dev);
     lsi_soft_reset(s);
 }
diff --git a/hw/qdev.c b/hw/qdev.c
index 322b315..8352f20 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -256,6 +256,14 @@ DeviceState *qdev_device_add(QemuOpts *opts)
     return qdev;
 }
 
+void qdev_reset_default(DeviceState *dev)
+{
+    BusState *bus;
+    QLIST_FOREACH(bus, &dev->child_bus, sibling) {
+        qbus_reset(bus);
+    }
+}
+
 /*
  * reset the device.
  * Bring the device into initial known state (to some extent)
@@ -275,8 +283,11 @@ DeviceState *qdev_device_add(QemuOpts *opts)
  */
 void qdev_reset(DeviceState *dev)
 {
-    if (dev->info->reset)
+    if (dev->info->reset) {
         dev->info->reset(dev);
+    } else {
+        qdev_reset_default(dev);
+    }
 }
 
 static void qdev_reset_fn(void *opaque)
@@ -299,7 +310,6 @@ int qdev_init(DeviceState *dev)
         qdev_free(dev);
         return rc;
     }
-    qemu_register_reset(qdev_reset_fn, dev);
     if (dev->info->vmsd) {
         vmstate_register_with_alias_id(dev, -1, dev->info->vmsd, dev,
                                        dev->instance_id_alias,
@@ -671,6 +681,29 @@ static BusState *qbus_find(const char *path)
     }
 }
 
+void qbus_reset_default(BusState *bus)
+{
+    DeviceState *dev;
+    QLIST_FOREACH(dev, &bus->children, sibling) {
+        qdev_reset(dev);
+    }
+}
+
+/* trigger bus reset */
+void qbus_reset(BusState *bus)
+{
+    if (bus->info->reset) {
+        bus->info->reset(bus);
+    } else {
+        qbus_reset_default(bus);
+    }
+}
+
+static void qbus_reset_fn(void *opaque)
+{
+    qbus_reset(opaque);
+}
+
 void qbus_create_inplace(BusState *bus, BusInfo *info,
                          DeviceState *parent, const char *name)
 {
@@ -705,7 +738,10 @@ void qbus_create_inplace(BusState *bus, BusInfo *info,
         QLIST_INSERT_HEAD(&parent->child_bus, bus, sibling);
         parent->num_child_bus++;
     }
-
+    if (!parent || !parent->info) {
+        /* parent device takes care of child bus reset */
+        qemu_register_reset(qbus_reset_fn, bus);
+    }
 }
 
 BusState *qbus_create(BusInfo *info, DeviceState *parent, const char *name)
diff --git a/hw/qdev.h b/hw/qdev.h
index 10f6769..af76f31 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -50,6 +50,7 @@ struct DeviceState {
 
 typedef void (*bus_dev_printfn)(Monitor *mon, DeviceState *dev, int indent);
 typedef char *(*bus_get_dev_path)(DeviceState *dev);
+typedef void (*bus_resetfn)(BusState *bus);
 
 struct BusInfo {
     const char *name;
@@ -57,6 +58,9 @@ struct BusInfo {
     bus_dev_printfn print_dev;
     bus_get_dev_path get_dev_path;
     Property *props;
+
+    /* bus reset callbacks */
+    bus_resetfn reset;
 };
 
 struct BusState {
@@ -163,6 +167,7 @@ extern DeviceInfo *device_info_list;
 void qdev_register(DeviceInfo *info);
 void qdev_reset(DeviceState *dev);
+void qdev_reset_default(DeviceState *dev);
 
 /* Register device properties.  */
 /* GPIO inputs also double as IRQ sinks.  */
@@ -179,6 +184,8 @@ void qbus_create_inplace(BusState *bus, BusInfo *info,
                          DeviceState *parent, const char *name);
 BusState *qbus_create(BusInfo *info, DeviceState *parent, const char *name);
 void qbus_free(BusState *bus);
+void qbus_reset(BusState *bus);
+void qbus_reset_default(BusState *bus);
 
 #define FROM_QBUS(type, dev) DO_UPCAST(type, qbus, dev)
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 7/8] pci: eliminate work around in pci_device_reset().
Eliminate the work around in pci_device_reset() by making each PCI reset
function call pci_device_reset_default() itself. Each device should know
how to reset itself; it shouldn't be done automatically by the generic PCI
layer. The PCI layer should just signal reset and let each device respond.

Signed-off-by: Isaku Yamahata
---
 hw/e1000.c      | 1 +
 hw/lsi53c895a.c | 2 ++
 hw/pci.c        | 6 --
 hw/pcnet.c      | 1 +
 hw/rtl8139.c    | 2 ++
 hw/virtio-pci.c | 1 +
 6 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 8d87492..0f303b0 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -1077,6 +1077,7 @@ static void e1000_reset(void *opaque)
     memmove(d->mac_reg, mac_reg_init, sizeof mac_reg_init);
     d->rxbuf_min_shift = 1;
     memset(&d->tx, 0, sizeof d->tx);
+    pci_device_reset_default(&d->dev);
 }
 
 static NetClientInfo net_e1000_info = {
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 33a8eb2..1e4ba10 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -358,6 +358,8 @@ static void lsi_soft_reset(LSIState *s)
         qemu_free(s->current);
         s->current = NULL;
     }
+
+    pci_device_reset_default(&s->dev);
 }
 
 static int lsi_dma_40bit(LSIState *s)
diff --git a/hw/pci.c b/hw/pci.c
index 731d367..54cb89b 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -171,13 +171,7 @@ void pci_device_reset(PCIDevice *dev)
         return;
     }
 
-    /*
-     * TODO:
-     * Each device should know all its states.
-     * So move this part to each device specific callback.
-     */
     qdev_reset(&dev->qdev);
-    pci_device_reset_default(dev);
 }
 
 /*
diff --git a/hw/pcnet.c b/hw/pcnet.c
index b52935a..e73e682 100644
--- a/hw/pcnet.c
+++ b/hw/pcnet.c
@@ -2023,6 +2023,7 @@ static void pci_reset(DeviceState *dev)
     PCIPCNetState *d = DO_UPCAST(PCIPCNetState, pci_dev.qdev, dev);
 
     pcnet_h_reset(&d->state);
+    pci_device_reset_default(&d->pci_dev);
 }
 
 static PCIDeviceInfo pcnet_info = {
diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index d92981d..1f35e5d 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -1260,6 +1260,8 @@ static void rtl8139_reset(DeviceState *d)
 
     /* reset tally counters */
     RTL8139TallyCounters_clear(&s->tally_counters);
+
+    pci_device_reset_default(&s->dev);
 }
 
 static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters)
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index c728fff..d9b97be 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -184,6 +184,7 @@ static void virtio_pci_reset(DeviceState *d)
     virtio_reset(proxy->vdev);
     msix_reset(&proxy->pci_dev);
     proxy->bugs = 0;
+    pci_device_reset_default(&proxy->pci_dev);
 }
 
 static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 6/8] pci: use qbus bus reset callback.
use qbus bus reset callback.

Signed-off-by: Isaku Yamahata
---
 hw/apb_pci.c    |  2 ++
 hw/pci.c        | 23 ++-
 hw/pci_bridge.c |  2 ++
 3 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/hw/apb_pci.c b/hw/apb_pci.c
index c619112..775063a 100644
--- a/hw/apb_pci.c
+++ b/hw/apb_pci.c
@@ -384,6 +384,8 @@ static void pci_pbm_reset(DeviceState *d)
     unsigned int i;
     APBState *s = container_of(d, APBState, busdev.qdev);
 
+    qdev_reset_default(d);
+
     for (i = 0; i < 8; i++) {
         s->pci_irq_map[i] &= PBM_PCI_IMR_MASK;
     }
diff --git a/hw/pci.c b/hw/pci.c
index c48bb3e..731d367 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -40,12 +40,14 @@
 static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent);
 static char *pcibus_get_dev_path(DeviceState *dev);
+static void pci_bus_reset_fn(BusState *qbus);
 
 struct BusInfo pci_bus_info = {
     .name = "PCI",
     .size = sizeof(PCIBus),
     .print_dev = pcibus_dev_print,
     .get_dev_path = pcibus_get_dev_path,
+    .reset = pci_bus_reset_fn,
     .props = (Property[]) {
         DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
         DEFINE_PROP_STRING("romfile", PCIDevice, romfile),
@@ -170,23 +172,11 @@ void pci_device_reset(PCIDevice *dev)
     }
 
     /*
-     * There are two paths to reset pci device. Each resets does partially.
-     * qemu_system_reset()
-     *  -> pci_device_reset() with bus
-     *     -> pci_device_reset_default() which resets pci common part.
-     *  -> DeviceState::reset: each device specific reset hanlder
-     *     which resets device specific part.
-     *
      * TODO:
-     * It requires two execution paths to reset the device fully.
-     * It is confusing and prone to error. Each device should know all
-     * its states.
+     * Each device should know all its states.
      * So move this part to each device specific callback.
      */
-
-    /* For now qdev_reset() is called directly by qemu_system_reset() */
-    /* qdev_reset(&dev->qdev); */
-
+    qdev_reset(&dev->qdev);
     pci_device_reset_default(dev);
 }
 
@@ -208,9 +198,9 @@ void pci_bus_reset(PCIBus *bus)
     }
 }
 
-static void pci_bus_reset_fn(void *opaque)
+static void pci_bus_reset_fn(BusState *qbus)
 {
-    pci_bus_reset(opaque);
+    pci_bus_reset(DO_UPCAST(PCIBus, qbus, qbus));
 }
 
 static void pci_host_bus_register(int domain, PCIBus *bus)
@@ -267,7 +257,6 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
     pci_host_bus_register(0, bus); /* for now only pci domain 0 is supported */
     vmstate_register(NULL, -1, &vmstate_pcibus, bus);
-    qemu_register_reset(pci_bus_reset_fn, bus);
 }
 
 PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min)
diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
index 198c3c7..ab7ed6e 100644
--- a/hw/pci_bridge.c
+++ b/hw/pci_bridge.c
@@ -158,6 +158,8 @@ void pci_bridge_reset_reg(PCIDevice *dev)
 void pci_bridge_reset(DeviceState *qdev)
 {
     PCIDevice *dev = DO_UPCAST(PCIDevice, qdev, qdev);
+    PCIBridge *br = DO_UPCAST(PCIBridge, dev, dev);
+    pci_bus_reset(&br->sec_bus);
 
     pci_bridge_reset_reg(dev);
 }
-- 
1.7.1.1
[Qemu-devel] [PATCH v2 0/8] qbus reset callback and implement pci bus reset
Changes v1 -> v2:
- addressed personal feedback from Gerd.
- reset signals are triggered by the bus and propagated down into devices.
- Only 5/8 is modified; the other patches remain the same.

This patch series isn't for the 0.13 release; it is for the MST pci branch.
(git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu.git pci)

Patch description:
Introduce a bus reset notion at the qbus layer and implement PCI bus reset
with it. First the related code is cleaned up, then the bus reset callback
is introduced, and finally PCI bus reset is implemented on top of it.

The main motivation is to implement PCI bus reset, but I suppose the SCSI
bus and IDE bus could also take advantage of this patch series.

Isaku Yamahata (8):
  apb: fix typo.
  qdev: export qdev_reset() for later use.
  pci: export pci_bus_reset() and pci_device_reset() for later use.
  pci: make pci_device_reset() aware of qdev.
  qdev: introduce bus reset callback and helper functions.
  pci: use qbus bus reset callback.
  pci: eliminate work around in pci_device_reset().
  pci bridge: implement secondary bus reset.

 hw/apb_pci.c    |  8 --
 hw/e1000.c      |  1 +
 hw/esp.c        |  2 +
 hw/lsi53c895a.c |  3 ++
 hw/pci.c        | 31 +---
 hw/pci.h        |  5
 hw/pci_bridge.c | 15 +++-
 hw/pcnet.c      |  1 +
 hw/qdev.c       | 69 ++
 hw/qdev.h       |  8 ++
 hw/rtl8139.c    |  2 +
 hw/virtio-pci.c |  1 +
 12 files changed, 132 insertions(+), 14 deletions(-)
Re: [Qemu-devel] Re: [PATCH 2/3] pci/pci_host: pci host bus initialization clean up.
On Mon, Jul 26, 2010 at 02:33:30PM +0300, Michael S. Tsirkin wrote:
> > +/*
> > + * TODO: there remains some boards which doesn't use PCIHostState.
> > + * Enhance PCIHostState API and convert remaining boards.
>
> I think I remember this comment from Paul:
> On Tuesday 12 January 2010, Isaku Yamahata wrote:
> > > To use pci host framework, use PCIHostState instead of PCIBus in
> > > PCIVPBState.
> >
> > No.
> >
> > pci_host.[ch] provides very specific functionality, it is not a generic
> > PCI host device. Specifically it provides indirect access to PCI config
> > space via a memory mapped {address,data} pair. The versatile PCI host
> > exposes PCI config space directly, so should not be using this code.
> >
> > If you want a generic framework for PCI hosts then you need to use
> > something else. If nothing else, assuming that a PCI host bridge is
> > always a SysBus device is wrong.
>
> Still applies? No objection?

Paul, do you have any comment?
--
yamahata
[Qemu-devel] [Bug 613681] [NEW] implement true fullscreen
Public bug reported:

Please implement a fullscreen functionality (similar to the one found in vmware, where there is an autohide bar) that enables display of a 1920x1080 VM on a 1920x1080 host screen (for example) without resizing; currently the menubar prevents this. Thank you.

** Affects: qemu
   Importance: Undecided
   Status: New

** Tags: wishlist

--
implement true fullscreen
https://bugs.launchpad.net/bugs/613681
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.

Status in QEMU: New
[Qemu-devel] [PATCH 4/4] ppc4xx: load Bamboo kernel, initrd, and fdt at fixed addresses
We can't use the return value of load_uimage() for the kernel because it
can't account for BSS size, and the PowerPC kernel does not relocate
blobs before zeroing BSS.

Instead, we now load at the fixed addresses chosen by u-boot (the normal
firmware for the board).

Signed-off-by: Hollis Blanchard
---
 hw/ppc440_bamboo.c | 39 ++-
 1 files changed, 18 insertions(+), 21 deletions(-)

This fixes a critical bug in PowerPC 440 Bamboo board emulation.

diff --git a/hw/ppc440_bamboo.c b/hw/ppc440_bamboo.c
index d471d5d..34ddf45 100644
--- a/hw/ppc440_bamboo.c
+++ b/hw/ppc440_bamboo.c
@@ -27,6 +27,11 @@
 
 #define BINARY_DEVICE_TREE_FILE "bamboo.dtb"
 
+/* from u-boot */
+#define KERNEL_ADDR 0x100
+#define FDT_ADDR 0x180
+#define RAMDISK_ADDR 0x190
+
 static int bamboo_load_device_tree(target_phys_addr_t addr,
                                    uint32_t ramsize,
                                    target_phys_addr_t initrd_base,
@@ -98,10 +103,8 @@ static void bamboo_init(ram_addr_t ram_size,
     uint64_t elf_lowaddr;
     target_phys_addr_t entry = 0;
     target_phys_addr_t loadaddr = 0;
-    target_long kernel_size = 0;
-    target_ulong initrd_base = 0;
     target_long initrd_size = 0;
-    target_ulong dt_base = 0;
+    int success;
     int i;
 
     /* Setup CPU. */
@@ -118,15 +121,15 @@ static void bamboo_init(ram_addr_t ram_size,
 
     /* Load kernel. */
     if (kernel_filename) {
-        kernel_size = load_uimage(kernel_filename, &entry, &loadaddr, NULL);
-        if (kernel_size < 0) {
-            kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
-                                   &elf_lowaddr, NULL, 1, ELF_MACHINE, 0);
+        success = load_uimage(kernel_filename, &entry, &loadaddr, NULL);
+        if (success < 0) {
+            success = load_elf(kernel_filename, NULL, NULL, &elf_entry,
+                               &elf_lowaddr, NULL, 1, ELF_MACHINE, 0);
             entry = elf_entry;
             loadaddr = elf_lowaddr;
         }
         /* XXX try again as binary */
-        if (kernel_size < 0) {
+        if (success < 0) {
             fprintf(stderr, "qemu: could not load kernel '%s'\n",
                     kernel_filename);
             exit(1);
@@ -135,26 +138,20 @@ static void bamboo_init(ram_addr_t ram_size,
 
     /* Load initrd. */
     if (initrd_filename) {
-        initrd_base = kernel_size + loadaddr;
-        initrd_size = load_image_targphys(initrd_filename, initrd_base,
-                                          ram_size - initrd_base);
+        initrd_size = load_image_targphys(initrd_filename, RAMDISK_ADDR,
+                                          ram_size - RAMDISK_ADDR);
 
         if (initrd_size < 0) {
-            fprintf(stderr, "qemu: could not load initial ram disk '%s'\n",
-                    initrd_filename);
+            fprintf(stderr, "qemu: could not load ram disk '%s' at %x\n",
+                    initrd_filename, RAMDISK_ADDR);
             exit(1);
         }
     }
 
     /* If we're loading a kernel directly, we must load the device tree too. */
     if (kernel_filename) {
-        if (initrd_base)
-            dt_base = initrd_base + initrd_size;
-        else
-            dt_base = kernel_size + loadaddr;
-
-        if (bamboo_load_device_tree(dt_base, ram_size,
-                                    initrd_base, initrd_size, kernel_cmdline) < 0) {
+        if (bamboo_load_device_tree(FDT_ADDR, ram_size, RAMDISK_ADDR,
+                                    initrd_size, kernel_cmdline) < 0) {
             fprintf(stderr, "couldn't load device tree\n");
             exit(1);
         }
@@ -163,7 +160,7 @@ static void bamboo_init(ram_addr_t ram_size,
 
     /* Set initial guest state. */
     env->gpr[1] = (16<<20) - 8;
-    env->gpr[3] = dt_base;
+    env->gpr[3] = FDT_ADDR;
     env->nip = entry;
     /* XXX we currently depend on KVM to create some initial TLB entries. */
 }
-- 
1.7.2
[Qemu-devel] [PATCH 3/4] ppc4xx: don't unregister RAM at reset
The PowerPC 4xx SDRAM controller emulation unregisters RAM in its reset callback. However, qemu_system_reset() is now called at initialization time, so all RAM is unregistered before starting the guest (!). Signed-off-by: Hollis Blanchard --- hw/ppc4xx_devs.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) This fixes a critical bug in PowerPC 440 Bamboo board emulation. diff --git a/hw/ppc4xx_devs.c b/hw/ppc4xx_devs.c index be130c4..7f698b8 100644 --- a/hw/ppc4xx_devs.c +++ b/hw/ppc4xx_devs.c @@ -619,7 +619,6 @@ static void sdram_reset (void *opaque) /* We pre-initialize RAM banks */ sdram->status = 0x; sdram->cfg = 0x0080; -sdram_unmap_bcr(sdram); } void ppc4xx_sdram_init (CPUState *env, qemu_irq irq, int nbanks, -- 1.7.2
[Qemu-devel] [PATCH 2/4] ppc4xx: correct SDRAM controller warning message condition
The message "Truncating memory to %d MiB to fit SDRAM controller limits" should be displayed only when a user chooses an amount of RAM which can't be represented by the PPC 4xx SDRAM controller (e.g. 129MB, which would only be valid if the controller supports a bank size of 1MB). Signed-off-by: Hollis Blanchard --- hw/ppc4xx_devs.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/hw/ppc4xx_devs.c b/hw/ppc4xx_devs.c index b15db81..be130c4 100644 --- a/hw/ppc4xx_devs.c +++ b/hw/ppc4xx_devs.c @@ -684,7 +684,7 @@ ram_addr_t ppc4xx_sdram_adjust(ram_addr_t ram_size, int nr_banks, } ram_size -= size_left; -if (ram_size) +if (size_left) printf("Truncating memory to %d MiB to fit SDRAM controller limits.\n", (int)(ram_size >> 20)); -- 1.7.2
[Qemu-devel] [PATCH 1/4] Fix "make install" with a cross toolchain
We must be able to use a non-native strip executable, but not all versions of 'install' support the --strip-program option (e.g. OpenBSD). Accordingly, we can't use 'install -s', and we must run strip separately. Signed-off-by: Hollis Blanchard Cc: blauwir...@gmail.com --- Makefile.target |5 - configure |4 +++- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/Makefile.target b/Makefile.target index 8a9c427..00bf6f9 100644 --- a/Makefile.target +++ b/Makefile.target @@ -326,7 +326,10 @@ clean: install: all ifneq ($(PROGS),) - $(INSTALL) -m 755 $(STRIP_OPT) $(PROGS) "$(DESTDIR)$(bindir)" + $(INSTALL) -m 755 $(PROGS) "$(DESTDIR)$(bindir)" +ifneq ($(STRIP),) + $(STRIP) $(patsubst %,"$(DESTDIR)$(bindir)/%",$(PROGS)) +endif endif # Include automatically generated dependency files diff --git a/configure b/configure index a20371c..146dac0 100755 --- a/configure +++ b/configure @@ -80,6 +80,7 @@ make="make" install="install" objcopy="objcopy" ld="ld" +strip="strip" helper_cflags="" libs_softmmu="" libs_tools="" @@ -125,6 +126,7 @@ cc="${cross_prefix}${cc}" ar="${cross_prefix}${ar}" objcopy="${cross_prefix}${objcopy}" ld="${cross_prefix}${ld}" +strip="${cross_prefix}${strip}" # default flags for all hosts QEMU_CFLAGS="-fno-strict-aliasing $QEMU_CFLAGS" @@ -2227,7 +2229,7 @@ if test "$debug" = "yes" ; then echo "CONFIG_DEBUG_EXEC=y" >> $config_host_mak fi if test "$strip_opt" = "yes" ; then - echo "STRIP_OPT=-s" >> $config_host_mak + echo "STRIP=${strip}" >> $config_host_mak fi if test "$bigendian" = "yes" ; then echo "HOST_WORDS_BIGENDIAN=y" >> $config_host_mak -- 1.7.2
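The configure change above prefixes the strip tool the same way it already prefixes cc, ar, objcopy, and ld. A minimal sketch of that derivation (the cross prefix here is a hypothetical example):

```shell
# Sketch of how configure derives the cross tool name (simplified from
# the patch above); "powerpc-linux-gnu-" is a hypothetical prefix
# passed via --cross-prefix.
cross_prefix="powerpc-linux-gnu-"
strip="strip"
strip="${cross_prefix}${strip}"
echo "STRIP=${strip}"   # prints STRIP=powerpc-linux-gnu-strip
```

Makefile.target then runs `$(STRIP)` on the installed binaries as a separate step, instead of relying on `install -s`, which would invoke the host's native strip on cross-built executables.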
[Qemu-devel] [PATCH 0/4] fix PowerPC 440 Bamboo platform emulation
These patches get the PowerPC Bamboo platform working again. I've re-written two of the patches based on feedback from qemu-devel. Note that this platform still only works in conjunction with KVM, since the PowerPC 440 MMU is still not accurately emulated by TCG.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 06:25:52PM +0300, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > > There are better ways like using string I/O and optimizing the PIO > > path in the kernel. That should cut down the 1s slow down with a > > 100MB initrd by a bit. But honestly, shaving a couple hundred ms > > further off the initrd load is just not worth it using the current > > model. > > > The slow down is not 1s any more. String PIO emulation had many bugs > that were fixed in 2.6.35. I verified how much time it took to load 100M > via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > that was already committed make it 20s. I have some code prototype that > makes it 11s. I don't see how we can get below that, surely not back to > ~2-3sec. I guess this slowness is primarily for kvm. I just ran some tests on the latest qemu (with TCG). I pulled in a 400Meg file over fw_cfg using the SeaBIOS interface - it takes 9.8 seconds (pretty consistently). Oddly, if I change SeaBIOS to use insb (string pio) it takes 11.5 seconds (again, pretty consistently). These times were measured on the host - they don't include the extra time it takes qemu to start up (during which it reads the file into its memory). -Kevin
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 06:01:54PM +0300, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 09:50:55AM -0500, Anthony Liguori wrote: > > On 08/04/2010 09:38 AM, Gleb Natapov wrote: > > >ROM does not muck with the e820. It uses PMM to allocate memory and the > > >memory it gets is marked as reserved in e820 map. Every ROM is implemented differently - there's no way to really know what they'll do. > > PMM allocations are only valid during the init function's execution. > > It's intention is to enable the use of scratch memory to decompress > > or otherwise modify the ROM to shrink its size. > > > Hm, may be. I read seabios code differently, but may be I misread it. There is a PCIv3 extension to PMM which supports long term memory allocations. SeaBIOS does implement this. The base PMM spec though only supports memory allocations during the POST phase. -Kevin
[Qemu-devel] [RFC PATCH 2/4] AMD IOMMU emulation
This introduces emulation for the AMD IOMMU, described in "AMD I/O Virtualization Technology (IOMMU) Specification". Signed-off-by: Eduard - Gabriel Munteanu --- Makefile.target |2 + configure | 10 + hw/amd_iommu.c | 671 +++ hw/pc.c |4 + hw/pc.h |3 + hw/pci_ids.h|2 + hw/pci_regs.h |1 + 7 files changed, 693 insertions(+), 0 deletions(-) create mode 100644 hw/amd_iommu.c diff --git a/Makefile.target b/Makefile.target index 70a9c1b..86226a0 100644 --- a/Makefile.target +++ b/Makefile.target @@ -219,6 +219,8 @@ obj-i386-y += pcspk.o i8254.o obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o +obj-i386-$(CONFIG_AMD_IOMMU) += amd_iommu.o + # Hardware support obj-ia64-y += ide.o pckbd.o vga.o $(SOUND_HW) dma.o $(AUDIODRV) obj-ia64-y += fdc.o mc146818rtc.o serial.o i8259.o ipf.o diff --git a/configure b/configure index af50607..7448603 100755 --- a/configure +++ b/configure @@ -317,6 +317,7 @@ io_thread="no" mixemu="no" kvm_cap_pit="" kvm_cap_device_assignment="" +amd_iommu="no" kerneldir="" aix="no" blobs="yes" @@ -629,6 +630,8 @@ for opt do ;; --enable-kvm-device-assignment) kvm_cap_device_assignment="yes" ;; + --enable-amd-iommu-emul) amd_iommu="yes" + ;; --enable-profiler) profiler="yes" ;; --enable-cocoa) @@ -871,6 +874,8 @@ echo " --disable-kvm-pitdisable KVM pit support" echo " --enable-kvm-pit enable KVM pit support" echo " --disable-kvm-device-assignment disable KVM device assignment support" echo " --enable-kvm-device-assignment enable KVM device assignment support" +echo " --disable-amd-iommu-emul disable AMD IOMMU emulation" +echo " --enable-amd-iommu-emul enable AMD IOMMU emulation" echo " --disable-nptl disable usermode NPTL support" echo " --enable-nptlenable usermode NPTL support" echo " --enable-system enable all system emulation targets" @@ -2251,6 +2256,7 @@ echo "Install blobs $blobs" echo "KVM support $kvm" echo "KVM PIT support $kvm_cap_pit" echo "KVM device assig. 
$kvm_cap_device_assignment" +echo "AMD IOMMU emul. $amd_iommu" echo "fdt support $fdt" echo "preadv support$preadv" echo "fdatasync $fdatasync" @@ -2645,6 +2651,10 @@ case "$target_arch2" in x86_64) TARGET_BASE_ARCH=i386 target_phys_bits=64 +if test "$amd_iommu" = "yes"; then + echo "CONFIG_AMD_IOMMU=y" >> $config_target_mak + echo "CONFIG_PCI_IOMMU=y" >> $config_host_mak +fi ;; ia64) target_phys_bits=64 diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c new file mode 100644 index 000..ff9903e --- /dev/null +++ b/hw/amd_iommu.c @@ -0,0 +1,671 @@ +/* + * AMD IOMMU emulation + * + * Copyright (c) 2010 Eduard - Gabriel Munteanu + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. 
+ */ + +#include "pc.h" +#include "hw.h" +#include "pci.h" +#include "qlist.h" + +/* Capability registers */ +#define CAPAB_HEADER0x00 +#define CAPAB_REV_TYPE0x02 +#define CAPAB_FLAGS 0x03 +#define CAPAB_BAR_LOW 0x04 +#define CAPAB_BAR_HIGH 0x08 +#define CAPAB_RANGE 0x0C +#define CAPAB_MISC 0x10 + +#define CAPAB_SIZE 0x14 + +/* Capability header data */ +#define CAPAB_FLAG_IOTLBSUP (1 << 0) +#define CAPAB_FLAG_HTTUNNEL (1 << 1) +#define CAPAB_FLAG_NPCACHE (1 << 2) +#define CAPAB_INIT_REV (1 << 3) +#define CAPAB_INIT_TYPE 3 +#define CAPAB_INIT_REV_TYPE (CAPAB_REV | CAPAB_TYPE) +#define CAPAB_INIT_FLAGS(CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL) +#define CAPAB_INIT_MISC (64 << 15) | (48 << 8) +#define CAPAB_BAR_MASK ~((1UL << 14) - 1) + +/* MMIO registers */ +#define MMIO_DEVICE_TABLE 0x +#define MMIO_COMMAND_BASE 0x0008 +#define MMIO_EV
[Qemu-devel] [RFC PATCH 3/4] ide: use the PCI memory access interface
Emulated PCI IDE controllers now use the memory access interface. This also allows an emulated IOMMU to translate and check accesses. Map invalidation results in cancelling DMA transfers. Since the guest OS can't properly recover the DMA results in case the mapping is changed, this is a fairly good approximation. Signed-off-by: Eduard - Gabriel Munteanu --- dma-helpers.c | 37 +++-- dma.h | 21 - hw/ide/core.c | 15 --- hw/ide/internal.h | 39 +++ hw/ide/pci.c |7 +++ 5 files changed, 109 insertions(+), 10 deletions(-) diff --git a/dma-helpers.c b/dma-helpers.c index d4fc077..408fee3 100644 --- a/dma-helpers.c +++ b/dma-helpers.c @@ -10,12 +10,34 @@ #include "dma.h" #include "block_int.h" -void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint) +static void *qemu_sglist_default_map(void *opaque, + target_phys_addr_t addr, + target_phys_addr_t *len, + int is_write) +{ +return cpu_physical_memory_map(addr, len, is_write); +} + +static void qemu_sglist_default_unmap(void *opaque, + void *buffer, + target_phys_addr_t len, + int is_write, + target_phys_addr_t access_len) +{ +cpu_physical_memory_unmap(buffer, len, is_write, access_len); +} + +void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, + QEMUSGMapFunc *map, QEMUSGUnmapFunc *unmap, void *opaque) { qsg->sg = qemu_malloc(alloc_hint * sizeof(ScatterGatherEntry)); qsg->nsg = 0; qsg->nalloc = alloc_hint; qsg->size = 0; + +qsg->map = map ? map : (QEMUSGMapFunc *) qemu_sglist_default_map; +qsg->unmap = unmap ? 
unmap : (QEMUSGUnmapFunc *) qemu_sglist_default_unmap; +qsg->opaque = opaque; } void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base, @@ -79,6 +101,16 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs) } } +static void dma_bdrv_cancel(void *opaque) +{ +DMAAIOCB *dbs = opaque; + +bdrv_aio_cancel(dbs->acb); +dma_bdrv_unmap(dbs); +qemu_iovec_destroy(&dbs->iov); +qemu_aio_release(dbs); +} + static void dma_bdrv_cb(void *opaque, int ret) { DMAAIOCB *dbs = (DMAAIOCB *)opaque; @@ -100,7 +132,8 @@ static void dma_bdrv_cb(void *opaque, int ret) while (dbs->sg_cur_index < dbs->sg->nsg) { cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte; cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte; -mem = cpu_physical_memory_map(cur_addr, &cur_len, !dbs->is_write); +mem = dbs->sg->map(dbs->sg->opaque, dma_bdrv_cancel, dbs, + cur_addr, &cur_len, !dbs->is_write); if (!mem) break; qemu_iovec_add(&dbs->iov, mem, cur_len); diff --git a/dma.h b/dma.h index f3bb275..d48f35c 100644 --- a/dma.h +++ b/dma.h @@ -15,6 +15,19 @@ #include "hw/hw.h" #include "block.h" +typedef void QEMUSGInvalMapFunc(void *opaque); +typedef void *QEMUSGMapFunc(void *opaque, +QEMUSGInvalMapFunc *inval_cb, +void *inval_opaque, +target_phys_addr_t addr, +target_phys_addr_t *len, +int is_write); +typedef void QEMUSGUnmapFunc(void *opaque, + void *buffer, + target_phys_addr_t len, + int is_write, + target_phys_addr_t access_len); + typedef struct { target_phys_addr_t base; target_phys_addr_t len; @@ -25,9 +38,15 @@ typedef struct { int nsg; int nalloc; target_phys_addr_t size; + +QEMUSGMapFunc *map; +QEMUSGUnmapFunc *unmap; +void *opaque; } QEMUSGList; -void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint); +void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, + QEMUSGMapFunc *map, QEMUSGUnmapFunc *unmap, + void *opaque); void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base, target_phys_addr_t len); void qemu_sglist_destroy(QEMUSGList *qsg); diff --git a/hw/ide/core.c 
b/hw/ide/core.c index 0b3b7c2..c19013a 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -435,7 +435,8 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write) } prd; int l, len; -qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1); +qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1, + bm->map, bm->unmap, bm->opaque); s->io_buffer_size = 0; for(;;) { if (bm->cur_prd_len == 0) { @@ -443,7 +444,7 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write) if
[Qemu-devel] [RFC PATCH 4/4] rtl8139: use the PCI memory access interface
This allows the device to work properly with an emulated IOMMU. Signed-off-by: Eduard - Gabriel Munteanu --- hw/rtl8139.c | 99 - 1 files changed, 56 insertions(+), 43 deletions(-) diff --git a/hw/rtl8139.c b/hw/rtl8139.c index 72e2242..99d5f69 100644 --- a/hw/rtl8139.c +++ b/hw/rtl8139.c @@ -412,12 +412,6 @@ typedef struct RTL8139TallyCounters uint16_t TxUndrn; } RTL8139TallyCounters; -/* Clears all tally counters */ -static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters); - -/* Writes tally counters to specified physical memory address */ -static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* counters); - typedef struct RTL8139State { PCIDevice dev; uint8_t phys[8]; /* mac address */ @@ -496,6 +490,14 @@ typedef struct RTL8139State { } RTL8139State; +/* Clears all tally counters */ +static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters); + +/* Writes tally counters to specified physical memory address */ +static void +RTL8139TallyCounters_physical_memory_write(RTL8139State *s, + target_phys_addr_t tc_addr); + static void rtl8139_set_next_tctr_time(RTL8139State *s, int64_t current_time); static void prom9346_decode_command(EEprom9346 *eeprom, uint8_t command) @@ -746,6 +748,8 @@ static int rtl8139_cp_transmitter_enabled(RTL8139State *s) static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size) { +PCIDevice *dev = &s->dev; + if (s->RxBufAddr + size > s->RxBufferSize) { int wrapped = MOD2(s->RxBufAddr + size, s->RxBufferSize); @@ -757,15 +761,15 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size) if (size > wrapped) { -cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, - buf, size-wrapped ); +pci_memory_write(dev, s->RxBuf + s->RxBufAddr, + buf, size-wrapped); } /* reset buffer pointer */ s->RxBufAddr = 0; -cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, - buf + (size-wrapped), wrapped ); +pci_memory_write(dev, s->RxBuf + 
s->RxBufAddr, + buf + (size-wrapped), wrapped); s->RxBufAddr = wrapped; @@ -774,7 +778,7 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size) } /* non-wrapping path or overwrapping enabled */ -cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, buf, size ); +pci_memory_write(dev, s->RxBuf + s->RxBufAddr, buf, size); s->RxBufAddr += size; } @@ -814,6 +818,7 @@ static int rtl8139_can_receive(VLANClientState *nc) static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_t size_, int do_interrupt) { RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque; +PCIDevice *dev = &s->dev; int size = size_; uint32_t packet_header = 0; @@ -968,13 +973,13 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_ uint32_t val, rxdw0,rxdw1,rxbufLO,rxbufHI; -cpu_physical_memory_read(cplus_rx_ring_desc,(uint8_t *)&val, 4); +pci_memory_read(dev, cplus_rx_ring_desc,(uint8_t *)&val, 4); rxdw0 = le32_to_cpu(val); -cpu_physical_memory_read(cplus_rx_ring_desc+4, (uint8_t *)&val, 4); +pci_memory_read(dev, cplus_rx_ring_desc+4, (uint8_t *)&val, 4); rxdw1 = le32_to_cpu(val); -cpu_physical_memory_read(cplus_rx_ring_desc+8, (uint8_t *)&val, 4); +pci_memory_read(dev, cplus_rx_ring_desc+8, (uint8_t *)&val, 4); rxbufLO = le32_to_cpu(val); -cpu_physical_memory_read(cplus_rx_ring_desc+12, (uint8_t *)&val, 4); +pci_memory_read(dev, cplus_rx_ring_desc+12, (uint8_t *)&val, 4); rxbufHI = le32_to_cpu(val); DEBUG_PRINT(("RTL8139: +++ C+ mode RX descriptor %d %08x %08x %08x %08x\n", @@ -1019,7 +1024,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_ target_phys_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI); /* receive/copy to target memory */ -cpu_physical_memory_write( rx_addr, buf, size ); +pci_memory_write(dev, rx_addr, buf, size); if (s->CpCmd & CPlusRxChkSum) { @@ -1032,7 +1037,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_ #else val = 0; #endif 
-cpu_physical_memory_write( rx_addr+size, (uint8_t *)&val, 4); +pci_memory_write(dev, rx_addr + size, (uint8_t *)&val, 4); /* first segment of received packet flag */ #define CP_RX_STATUS_FS (1<<29) @@ -1081,9 +1086,9 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t
[Qemu-devel] [RFC PATCH 0/4] AMD IOMMU emulation 2nd version
Hi, I hope I solved the issues raised by Anthony and Paul. Please have a look and tell me what you think. However, don't merge it yet (in case you like it), I need to test and cleanup some pieces further. There are also some patches from the previous series I didn't include yet. Thanks, Eduard Eduard - Gabriel Munteanu (4): pci: memory access API and IOMMU support AMD IOMMU emulation ide: use the PCI memory access interface rtl8139: use the PCI memory access interface Makefile.target |2 + configure | 10 + dma-helpers.c | 37 +++- dma.h | 21 ++- hw/amd_iommu.c| 671 + hw/ide/core.c | 15 +- hw/ide/internal.h | 39 +++ hw/ide/pci.c |7 + hw/pc.c |4 + hw/pc.h |3 + hw/pci.c | 145 hw/pci.h | 130 +++ hw/pci_ids.h |2 + hw/pci_regs.h |1 + hw/rtl8139.c | 99 + qemu-common.h |1 + 16 files changed, 1134 insertions(+), 53 deletions(-) create mode 100644 hw/amd_iommu.c
[Qemu-devel] [RFC PATCH 1/4] pci: memory access API and IOMMU support
PCI devices should access memory through pci_memory_*() instead of cpu_physical_memory_*(). This also provides support for translation and access checking in case an IOMMU is emulated. Memory maps are treated as remote IOTLBs (that is, translation caches belonging to the IOMMU-aware device itself). Clients (devices) must provide callbacks for map invalidation in case these maps are persistent beyond the current I/O context, e.g. AIO DMA transfers. Signed-off-by: Eduard - Gabriel Munteanu --- hw/pci.c | 145 + hw/pci.h | 130 +++ qemu-common.h |1 + 3 files changed, 276 insertions(+), 0 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 6871728..ce2734b 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -58,6 +58,10 @@ struct PCIBus { Keep a count of the number of devices with raised IRQs. */ int nirq; int *irq_count; + +#ifdef CONFIG_PCI_IOMMU +PCIIOMMU *iommu; +#endif }; static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent); @@ -2029,6 +2033,147 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent) } } +#ifdef CONFIG_PCI_IOMMU + +void pci_register_iommu(PCIDevice *dev, PCIIOMMU *iommu) +{ +dev->bus->iommu = iommu; +} + +void pci_memory_rw(PCIDevice *dev, + pci_addr_t addr, + uint8_t *buf, + pci_addr_t len, + int is_write) +{ +int err, plen; +unsigned perms; +PCIIOMMU *iommu = dev->bus->iommu; +target_phys_addr_t paddr; + +if (!iommu || !iommu->translate) +return cpu_physical_memory_rw(addr, buf, len, is_write); + +perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ; + +while (len) { +err = iommu->translate(iommu, dev, addr, &paddr, &plen, perms); +if (err) +return; + +/* The translation might be valid for larger regions. 
*/ +if (plen > len) +plen = len; + +cpu_physical_memory_rw(paddr, buf, plen, is_write); + +len -= plen; +addr += plen; +buf += plen; +} +} + +void *pci_memory_map(PCIDevice *dev, + PCIInvalidateIOTLBFunc *cb, + void *opaque, + pci_addr_t addr, + target_phys_addr_t *len, + int is_write) +{ +int err, plen; +unsigned perms; +PCIIOMMU *iommu = dev->bus->iommu; +target_phys_addr_t paddr; + +if (!iommu || !iommu->translate) +return cpu_physical_memory_map(addr, len, is_write); + +perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ; + +plen = *len; +err = iommu->translate(iommu, dev, addr, &paddr, &plen, perms); +if (err) +return NULL; + +/* + * If this is true, the virtual region is contiguous, + * but the translated physical region isn't. We just + * clamp *len, much like cpu_physical_memory_map() does. + */ +if (plen < *len) +*len = plen; + +/* We treat maps as remote TLBs to cope with stuff like AIO. */ +if (cb && iommu->register_iotlb_invalidator) +iommu->register_iotlb_invalidator(iommu, dev, addr, cb, opaque); + +return cpu_physical_memory_map(paddr, len, is_write); +} + +void pci_memory_unmap(PCIDevice *dev, + void *buffer, + target_phys_addr_t len, + int is_write, + target_phys_addr_t access_len) +{ +cpu_physical_memory_unmap(buffer, len, is_write, access_len); +} + +#define DEFINE_PCI_LD(suffix, size) \ +uint##size##_t pci_ld##suffix(PCIDevice *dev, pci_addr_t addr)\ +{ \ +PCIIOMMU *iommu = dev->bus->iommu;\ +target_phys_addr_t paddr; \ +int plen, err;\ + \ +if (!iommu || !iommu->translate) \ +return ld##suffix##_phys(addr); \ + \ +err = iommu->translate(iommu, dev,\ + addr, &paddr, &plen, IOMMU_PERM_READ); \ +if (err || (plen < size / 8)) \ +return 0; \ + \ +return ld##suffix##_phys(paddr); \ +} + +#define DEFINE_PCI_ST(suffix, size) \ +void pci_st##suffix(PCIDevice *dev, pci_addr_t addr,
RE: [Qemu-devel] [PATCH] Added an option to set the VMDK adapter type
there sent from my Telstra NEXTG™ handset -Original Message- From: Kevin Wolf Sent: Wednesday, 4 August 2010 10:29 PM To: andrzej zaborowski Cc: Aaron Mason ; qemu-devel@nongnu.org Subject: Re: [Qemu-devel] [PATCH] Added an option to set the VMDK adapter type Am 04.08.2010 14:27, schrieb andrzej zaborowski: > Hi, > > On 4 August 2010 12:30, Kevin Wolf wrote: >> Am 04.08.2010 01:46, schrieb Aaron Mason: >>> +const char *real_filename, *temp_str, *adapterType = "ide"; > > Sorry to complain about style, but note that uppercase characters are > not used in variable names in Qemu (that I see). Whoops, missed that one when complaining about the other style problems. Yes, this should be adapter_type. Kevin
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 02:06:58PM -0600, David S. Ahern wrote: > > > On 08/04/10 11:34, Avi Kivity wrote: > > >> And it's awesome for fast prototyping. Of course, once that fast > >> becomes dog slow, it's not useful anymore. > > > > For the Nth time, it's only slow with 100MB initrds. > > 100MB is really not that large for an initrd. I'd just like to note that the libguestfs initrd is uncompressed. The reason for this is I found that the decompression code in Linux is really slow. I have to admit I didn't look into why this is. By not compressing it on the host and decompressing it on the guest, we saved a bunch of boot time (3-5 seconds IIRC). Anyway, comparing 115MB libguestfs initrd and other initrd sizes may not be a fair comparison, since almost every other initrd you will see will be compressed. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into Xen guests. http://et.redhat.com/~rjones/virt-p2v
[Qemu-devel] [Bug 613529] Re: qemu does not accept regular disk geometry
Seems to be the same issue as in http://qemu-forum.ipi.fi/viewtopic.php?f=4&t=5218 -- qemu does not accept regular disk geometry https://bugs.launchpad.net/bugs/613529 You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. Status in QEMU: New Bug description: Hi, I am currently hunting a strange bug in qemu/kvm: I am using an lvm logical volume as a virtual hard disk for a virtual machine. I use fdisk or parted to create a partition table and partitions, kpartx to generate the device entries for the partitions, then install linux on ext3/ext4 with grub or msdos filesystem with syslinux. But then, in most cases even the boot process fails or behaves strangely; sometimes even mounting the file system in the virtual machine fails. It seems as if there is a problem with the virtual disk geometry. The problem does not seem to occur if I reboot the host system after creating the partition table on the logical volume. I guess the linux kernel needs to learn the disk geometry by reboot. A blockdev --rereadpt does not work on lvm volumes. The first approach to test/fix the problem would be to pass the disk geometry to qemu/kvm with the -drive option. Unfortunately, qemu/kvm does not accept the default geometry with 255 heads and 63 sectors. It seems to limit the number of heads to 16, thus limiting the disk size.
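For reference, the geometry-passing approach the reporter describes would look roughly like this (illustrative only — the device path and sizes are made up, and whether the guest then boots depends on the geometry the partitions were actually created with):

```shell
# Hypothetical invocation: specify an explicit geometry that stays
# within the 16-head limit instead of the BIOS-style 255/63 one.
qemu-kvm -drive file=/dev/vg0/guest,if=ide,cyls=16383,heads=16,secs=63
```

A 255-head geometry is rejected, which is what the report is about.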
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/10 11:34, Avi Kivity wrote: >> And it's awesome for fast prototyping. Of course, once that fast >> becomes dog slow, it's not useful anymore. > > For the Nth time, it's only slow with 100MB initrds. 100MB is really not that large for an initrd. Consider the deployment of stateless nodes - something that virtualization allows the rapid deployment of. 1 kernel, 1 initrd with the various binaries to be run. Create nodes as needed by launching a shell command - be it for more capacity, isolation, etc. Why require an iso or disk wrapper for a binary blob that is all to be run out of memory? The -append argument allows boot parameters to be specified at launch. That is a very powerful and simple design option. David
[Qemu-devel] segfault due to missing qdev_create()?
I am able to run qemu with the following commandline: /usr/local/bin/qemu-system-ppcemb -enable-kvm -kernel uImage.bamboo -nographic -M bamboo ppc440-angstrom-linux.img However, when I try to use virtio instead, I get this segfault: /usr/local/bin/qemu-system-ppcemb -enable-kvm -kernel uImage.bamboo -drive file=ppc440-angstrom-linux.img,if=virtio -nographic -M bamboo #0 0x1009864c in qbus_find_recursive (bus=0x0, name=0x0, info=0x10287238) at /home/hollisb/work/qemu.git/hw/qdev.c:461 #1 0x10099cc4 in qdev_device_add (opts=0x108a07a0) at /home/hollisb/work/qemu.git/hw/qdev.c:229 #2 0x101a4220 in device_init_func (opts=, opaque=) at /home/hollisb/work/qemu.git/vl.c:1519 #3 0x1002baf8 in qemu_opts_foreach (list=, func=0x101a4204 , opaque=0x0, abort_on_failure=) at qemu-option.c:978 #4 0x101a68e0 in main (argc=, argv=, envp=) at /home/hollisb/work/qemu.git/vl.c:2890 This patch avoids the segfault, but just gives me this message: No 'PCI' bus found for device 'virtio-blk-pci' diff --git a/hw/qdev.c b/hw/qdev.c index e99c73f..8fe4f06 100644 --- a/hw/qdev.c +++ b/hw/qdev.c @@ -455,6 +455,9 @@ static BusState *qbus_find_recursive(BusState *bus, const ch BusState *child, *ret; int match = 1; + if (!bus) + return NULL; + if (name && (strcmp(bus->name, name) != 0)) { match = 0; } FWIW, hw/ppc4xx_pci.c is my PCI controller. Do I need to add some qdev magic to that file to make this work? -Hollis
Re: [Qemu-devel] qemu cp15 access
That patch did not fix my issue. My problem turned out to be due to TLS accesses to cp15 not being allowed by qemu in user mode, even though these are permitted in ARMv6 and above architectures (e.g. see http://infocenter.arm.com/help/topic/com.arm.doc.ddi0388f/CIHFGFGE.html). This was corrected by patch: http://patchwork.ozlabs.org/patch/43797/ which seems to be applied in trunk and will be released in 0.13.0 Thanks. On Wed, Jul 28, 2010 at 7:23 AM, Loïc Minier wrote: > On Mon, Jul 26, 2010, Raymes Khoury wrote: > > I am having the problem with qemu, as described in the post > > > http://old.nabble.com/-PATCH:-PR-target-42671--Use-Thumb1-GOT-address-loading-sequence-for--%09Thumb2-td27124497.html > > where > > accessing cp15 on ARM causes an error: > > See mid 1280086076-20649-1-git-send-email-loic.min...@linaro.org and > thread > > http://article.gmane.org/gmane.comp.emulators.qemu/77092 > > -- > Loïc Minier > >
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 09:16 PM, Anthony Liguori wrote: Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. You can't boot from 9p. As Alex said, you boot from a non-100MB initrd (or cdrom) and mount the 9pfs. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 20:16, Anthony Liguori wrote: > On 08/04/2010 01:13 PM, Alexander Graf wrote: >> On 04.08.2010, at 19:46, Richard W.M. Jones wrote: >> >> >>> On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: >>> This is basically my suggestion to libguestfs: instead of generating an initrd, generate a bootable cdrom, and boot from that. The result is faster and has a smaller memory footprint. Everyone wins. >>> We had some discussion of this upstream& decided to do this. It >>> should save the time it takes for the guest kernel to unpack the >>> initrd, so maybe another second off boot time, which could bring us >>> ever closer to the "golden" 5 second boot target. >>> >>> It's not trivial mind you, and won't happen straightaway. Part of it >>> is that it requires reworking the appliance builder (a matter of just >>> coding really). The less trivial part is that we have to 'hide' the >>> CD device throughout the publically available interfaces. Then of >>> course, a lot of testing. >>> >> Why not go with 9p? That would save off even more time, as you don't have to >> generate an iso. You could just copy all the relevant executables into tmpfs >> and boot from there using your kernel and a very small (pre-built) initrd. >> > > You can't boot from 9p. But you could still use -kernel and -initrd for that, no? Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 09:13 PM, Alexander Graf wrote: It's not trivial mind you, and won't happen straightaway. Part of it is that it requires reworking the appliance builder (a matter of just coding really). The less trivial part is that we have to 'hide' the CD device throughout the publically available interfaces. Then of course, a lot of testing. Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. Yes - and you don't need to copy, just hardlink if your /tmp and /usr are on the same filesystem. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 01:13 PM, Alexander Graf wrote: On 04.08.2010, at 19:46, Richard W.M. Jones wrote: On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: This is basically my suggestion to libguestfs: instead of generating an initrd, generate a bootable cdrom, and boot from that. The result is faster and has a smaller memory footprint. Everyone wins. We had some discussion of this upstream& decided to do this. It should save the time it takes for the guest kernel to unpack the initrd, so maybe another second off boot time, which could bring us ever closer to the "golden" 5 second boot target. It's not trivial mind you, and won't happen straightaway. Part of it is that it requires reworking the appliance builder (a matter of just coding really). The less trivial part is that we have to 'hide' the CD device throughout the publically available interfaces. Then of course, a lot of testing. Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. You can't boot from 9p. Regards, Anthony Liguori Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 19:46, Richard W.M. Jones wrote: > On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: >> This is basically my suggestion to libguestfs: instead of generating >> an initrd, generate a bootable cdrom, and boot from that. The >> result is faster and has a smaller memory footprint. Everyone wins. > > We had some discussion of this upstream & decided to do this. It > should save the time it takes for the guest kernel to unpack the > initrd, so maybe another second off boot time, which could bring us > ever closer to the "golden" 5 second boot target. > > It's not trivial mind you, and won't happen straightaway. Part of it > is that it requires reworking the appliance builder (a matter of just > coding really). The less trivial part is that we have to 'hide' the > CD device throughout the publically available interfaces. Then of > course, a lot of testing. Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 19:53, Anthony Liguori wrote: > On 08/04/2010 12:37 PM, Avi Kivity wrote: >> On 08/04/2010 08:27 PM, Anthony Liguori wrote: >>> On 08/04/2010 12:19 PM, Avi Kivity wrote: On 08/04/2010 08:01 PM, Paolo Bonzini wrote: > > That's another story and I totally agree here, but not reusing /dev/sd* > is not intrinsic in the design of virtio-blk (and one thing that Windows > gets right; everything is SCSI, period). > I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? >>> >>> Because that's what real hardware with only a few rare exceptions. >>> >> >> I thought that IDE was emulated as SCSI even when it wasn't. But I guess >> now with SATA you're right. > > IDE -> EIDE -> ATA -> SATA > > ATA can encapsulate SCSI commands via ATAPI which gives you the ability to > have ATA based CD-ROMs among other things. > > I don't believe that SATA actually uses SCSI commands for read/write > operations It doesn't. In fact, it's basically just a wrapper around the normal ATA commands - even for read/write. Plus some additional SATA only commands for parallel read/write. > but I think Linux exposes SATA drivers as SCSI anyway. Yup. That's what libata does. Even works with PATA drives. But this is a purely Linux internal thing. Alex
[Qemu-devel] [PATCH 2/3] savevm: Generate a name when run without one
When savevm is run without a name, the name stays blank and the snapshot is saved anyway. The new behavior is that when savevm is run without parameters, a name is generated automatically, so the snapshot is accessible to the user without needing the id when loadvm is run.

(qemu) savevm
(qemu) info snapshots
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         vm-20100728134640      978K 2010-07-28 13:46:40   00:00:08.603

We use a name with the format 'vm-YYYYMMDDHHMMSS'. This is a first step to hide the internal id, because I don't see a reason to expose this kind of internals to the user.

Signed-off-by: Miguel Di Ciurcio Filho
---
 savevm.c |   29 ++++++++++++++++++++---------
 1 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/savevm.c b/savevm.c
index 9291cfb..025bee6 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1799,8 +1799,10 @@ void do_savevm(Monitor *mon, const QDict *qdict)
     uint32_t vm_state_size;
 #ifdef _WIN32
     struct _timeb tb;
+    struct tm *ptm;
 #else
     struct timeval tv;
+    struct tm tm;
 #endif
     const char *name = qdict_get_try_str(qdict, "name");
 
@@ -1831,15 +1833,6 @@ void do_savevm(Monitor *mon, const QDict *qdict)
     vm_stop(0);
 
     memset(sn, 0, sizeof(*sn));
-    if (name) {
-        ret = bdrv_snapshot_find(bs, old_sn, name);
-        if (ret >= 0) {
-            pstrcpy(sn->name, sizeof(sn->name), old_sn->name);
-            pstrcpy(sn->id_str, sizeof(sn->id_str), old_sn->id_str);
-        } else {
-            pstrcpy(sn->name, sizeof(sn->name), name);
-        }
-    }
 
     /* fill auxiliary fields */
 #ifdef _WIN32
@@ -1853,6 +1846,24 @@ void do_savevm(Monitor *mon, const QDict *qdict)
 #endif
     sn->vm_clock_nsec = qemu_get_clock(vm_clock);
 
+    if (name) {
+        ret = bdrv_snapshot_find(bs, old_sn, name);
+        if (ret >= 0) {
+            pstrcpy(sn->name, sizeof(sn->name), old_sn->name);
+            pstrcpy(sn->id_str, sizeof(sn->id_str), old_sn->id_str);
+        } else {
+            pstrcpy(sn->name, sizeof(sn->name), name);
+        }
+    } else {
+#ifdef _WIN32
+        ptm = localtime(&tb.time);
+        strftime(sn->name, sizeof(sn->name), "vm-%Y%m%d%H%M%S", ptm);
+#else
+        localtime_r(&tv.tv_sec, &tm);
+        strftime(sn->name, sizeof(sn->name), "vm-%Y%m%d%H%M%S", &tm);
+#endif
+    }
+
     /* Delete old snapshots of the same name */
     if (name && del_existing_snapshots(mon, name) < 0) {
         goto the_end;
-- 
1.7.1
[Qemu-devel] [PATCH 0/3] snapshots: various updates
Hi there! This series introduces updates to the 'info snapshots' and 'savevm' commands.

Patch 1 summarizes the output of 'info snapshots' to show only fully available snapshots.

Patch 2 adds a default name to a snapshot in case the user did not provide one, using a template like vm-YYYYMMDDHHMMSS.

Patch 3 adds -f to the 'savevm' command, in case the user really wants to overwrite a snapshot.

More details in each patch.

Changelog from previous version:
- libvirt is not affected by the change in savevm
- Fixed some coding errors and did not rename variables

Regards, Miguel

Miguel Di Ciurcio Filho (3):
  monitor: make 'info snapshots' show only fully available snapshots
  savevm: Generate a name when run without one
  savevm: prevent snapshot overwriting

 qemu-monitor.hx |    7 ++--
 savevm.c        |   97 ++++++++++++++++++++++++++++++++-----------------
 2 files changed, 76 insertions(+), 28 deletions(-)
[Qemu-devel] [PATCH 1/3] monitor: make 'info snapshots' show only fully available snapshots
The output generated by 'info snapshots' shows only snapshots that exist on the block device that saves the VM state. This output can cause a user to erroneously try to load a snapshot that is not available on all block devices.

$ qemu-img snapshot -l xxtest.qcow2
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1                                 1.5M 2010-07-26 16:51:52   00:00:08.599
2                                 1.5M 2010-07-26 16:51:53   00:00:09.719
3                                 1.5M 2010-07-26 17:26:49   00:00:13.245
4                                 1.5M 2010-07-26 19:01:00   00:00:46.763

$ qemu-img snapshot -l xxtest2.qcow2
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
3                                    0 2010-07-26 17:26:49   00:00:13.245
4                                    0 2010-07-26 19:01:00   00:00:46.763

Current output:
$ qemu -hda xxtest.qcow2 -hdb xxtest2.qcow2 -monitor stdio -vnc :0
QEMU 0.12.4 monitor - type 'help' for more information
(qemu) info snapshots
Snapshot devices: ide0-hd0
Snapshot list (from ide0-hd0):
ID        TAG                 VM SIZE                DATE       VM CLOCK
1                                 1.5M 2010-07-26 16:51:52   00:00:08.599
2                                 1.5M 2010-07-26 16:51:53   00:00:09.719
3                                 1.5M 2010-07-26 17:26:49   00:00:13.245
4                                 1.5M 2010-07-26 19:01:00   00:00:46.763

Snapshots 1 and 2 do not exist on xxtest2.qcow2, but they are displayed anyway. This patch summarizes the output to only show fully available snapshots.
New output:
(qemu) info snapshots
ID        TAG                 VM SIZE                DATE       VM CLOCK
3                                 1.5M 2010-07-26 17:26:49   00:00:13.245
4                                 1.5M 2010-07-26 19:01:00   00:00:46.763

Signed-off-by: Miguel Di Ciurcio Filho
---
 savevm.c |   59 +++++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/savevm.c b/savevm.c
index 4c0e5d3..9291cfb 100644
--- a/savevm.c
+++ b/savevm.c
@@ -2004,8 +2004,10 @@ void do_delvm(Monitor *mon, const QDict *qdict)
 void do_info_snapshots(Monitor *mon)
 {
     BlockDriverState *bs, *bs1;
-    QEMUSnapshotInfo *sn_tab, *sn;
-    int nb_sns, i;
+    QEMUSnapshotInfo *sn_tab, *sn, s, *sn_info = &s;
+    int nb_sns, i, ret, available;
+    int total;
+    int *available_snapshots;
     char buf[256];
 
     bs = bdrv_snapshots();
@@ -2013,27 +2015,52 @@ void do_info_snapshots(Monitor *mon)
         monitor_printf(mon, "No available block device supports snapshots\n");
         return;
     }
-    monitor_printf(mon, "Snapshot devices:");
-    bs1 = NULL;
-    while ((bs1 = bdrv_next(bs1))) {
-        if (bdrv_can_snapshot(bs1)) {
-            if (bs == bs1)
-                monitor_printf(mon, " %s", bdrv_get_device_name(bs1));
-        }
-    }
-    monitor_printf(mon, "\n");
+
     nb_sns = bdrv_snapshot_list(bs, &sn_tab);
     if (nb_sns < 0) {
         monitor_printf(mon, "bdrv_snapshot_list: error %d\n", nb_sns);
         return;
     }
-    monitor_printf(mon, "Snapshot list (from %s):\n",
-                   bdrv_get_device_name(bs));
-    monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf), NULL));
-    for(i = 0; i < nb_sns; i++) {
+
+    if (nb_sns == 0) {
+        monitor_printf(mon, "There is no snapshot available.\n");
+        return;
+    }
+
+    available_snapshots = qemu_mallocz(sizeof(int) * nb_sns);
+    total = 0;
+    for (i = 0; i < nb_sns; i++) {
         sn = &sn_tab[i];
-        monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf), sn));
+        available = 1;
+        bs1 = NULL;
+
+        while ((bs1 = bdrv_next(bs1))) {
+            if (bdrv_can_snapshot(bs1) && bs1 != bs) {
+                ret = bdrv_snapshot_find(bs1, sn_info, sn->id_str);
+                if (ret < 0) {
+                    available = 0;
+                    break;
+                }
+            }
+        }
+
+        if (available) {
+            available_snapshots[total] = i;
+            total++;
+        }
     }
+
+    if (total > 0) {
+        monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf), NULL));
+        for (i = 0; i < total; i++) {
+            sn = &sn_tab[available_snapshots[i]];
+            monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf), sn));
+        }
+    } else {
+        monitor_printf(mon, "There is no suitable snapshot available\n");
+    }
+
     qemu_free(sn_tab);
+    qemu_free(available_snapshots);
+
 }
-- 
1.7.1
[Qemu-devel] [PATCH 3/3] savevm: prevent snapshot overwriting
When savevm is run using a previously saved snapshot id or name, it will delete the original and create a new one, using the same id and name, without telling the user what just happened. This behaviour is not good, IMHO. We add a '-f' parameter to savevm, to really force that to happen, in case the user really wants to.

New behavior:

(qemu) savevm snap1
An snapshot named 'snap1' already exists
(qemu) savevm -f snap1

In case '-f' is used we do better error reporting than before, and we don't reuse the previous id.

Note: This patch depends on "savevm: Generate a name when run without one"

Signed-off-by: Miguel Di Ciurcio Filho
---
 qemu-monitor.hx |    7 ++++---
 savevm.c        |   19 ++++++++++++++-----
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/qemu-monitor.hx b/qemu-monitor.hx
index 2af3de6..683ac73 100644
--- a/qemu-monitor.hx
+++ b/qemu-monitor.hx
@@ -275,9 +275,10 @@ ETEXI
 
     {
         .name       = "savevm",
-        .args_type  = "name:s?",
-        .params     = "[tag|id]",
-        .help       = "save a VM snapshot. If no tag or id are provided, a new snapshot is created",
+        .args_type  = "force:-f,name:s?",
+        .params     = "[-f] [tag|id]",
+        .help       = "save a VM snapshot. If no tag is provided, a new one is created"
+                      "\n\t\t\t -f to overwrite an snapshot if it already exists",
         .mhandler.cmd = do_savevm,
     },
 
diff --git a/savevm.c b/savevm.c
index 025bee6..f0a4b78 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1805,6 +1805,7 @@ void do_savevm(Monitor *mon, const QDict *qdict)
     struct tm tm;
 #endif
     const char *name = qdict_get_try_str(qdict, "name");
+    int force = qdict_get_try_bool(qdict, "force", 0);
 
     /* Verify if there is a device that doesn't support snapshots and is writable */
     bs = NULL;
@@ -1848,12 +1849,20 @@ void do_savevm(Monitor *mon, const QDict *qdict)
 
     if (name) {
         ret = bdrv_snapshot_find(bs, old_sn, name);
-        if (ret >= 0) {
-            pstrcpy(sn->name, sizeof(sn->name), old_sn->name);
-            pstrcpy(sn->id_str, sizeof(sn->id_str), old_sn->id_str);
-        } else {
-            pstrcpy(sn->name, sizeof(sn->name), name);
+        if (ret == 0) {
+            if (force) {
+                ret = del_existing_snapshots(mon, name);
+                if (ret < 0) {
+                    monitor_printf(mon, "Error deleting snapshot '%s', error: %d\n", name, ret);
+                    goto the_end;
+                }
+            } else {
+                monitor_printf(mon, "An snapshot named '%s' already exists\n", name);
+                goto the_end;
+            }
         }
+
+        pstrcpy(sn->name, sizeof(sn->name), name);
     } else {
 #ifdef _WIN32
         ptm = localtime(&tb.time);
-- 
1.7.1
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:46 PM, Richard W.M. Jones wrote: On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: This is basically my suggestion to libguestfs: instead of generating an initrd, generate a bootable cdrom, and boot from that. The result is faster and has a smaller memory footprint. Everyone wins. We had some discussion of this upstream& decided to do this. It should save the time it takes for the guest kernel to unpack the initrd, so maybe another second off boot time, which could bring us ever closer to the "golden" 5 second boot target. Great. IMO it's the right thing even if initrd took zero time. It's not trivial mind you, and won't happen straightaway. Part of it is that it requires reworking the appliance builder (a matter of just coding really). The less trivial part is that we have to 'hide' the CD device throughout the publically available interfaces. Then of course, a lot of testing. I will note that virt-install uses the -initrd interface for installing guests (large initrds too). And I've talked with a sysadmin who was using -kernel and -initrd for deploying VM hosting. In his case he did it so he could centralize kernel distribution / updates, and have the guests use /dev/vda == filesystem which made provisioning easy [for him -- I would have used libguestfs ...]. We still plan to improve pio speed. (note a few added seconds to guest install or bootup is not such a drag compared to the hit on an interactive tool like libguestfs). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 12:37 PM, Avi Kivity wrote: On 08/04/2010 08:27 PM, Anthony Liguori wrote: On 08/04/2010 12:19 PM, Avi Kivity wrote: On 08/04/2010 08:01 PM, Paolo Bonzini wrote: That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? Because that's what real hardware with only a few rare exceptions. I thought that IDE was emulated as SCSI even when it wasn't. But I guess now with SATA you're right. IDE -> EIDE -> ATA -> SATA ATA can encapsulate SCSI commands via ATAPI which gives you the ability to have ATA based CD-ROMs among other things. I don't believe that SATA actually uses SCSI commands for read/write operations but I think Linux exposes SATA drivers as SCSI anyway. Regards, Anthony Liguori
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 11:44:33AM -0500, Anthony Liguori wrote: > On 08/04/2010 11:36 AM, Avi Kivity wrote: > > On 08/04/2010 07:30 PM, Avi Kivity wrote: > >> On 08/04/2010 04:52 PM, Anthony Liguori wrote: > > > This is not like DMA event if done in chunks and chunks can be pretty > big. The code that dials with copying may temporary unmap some pci > devices to have more space there. > >>> > >>> > >>>That's a bit complicated because SeaBIOS is managing the PCI > >>>devices whereas the kernel code is running as an option rom. > >>>I don't know the BIOS PCI interfaces well so I don't know how > >>>doable this is. > >>> > >>>Maybe we're just being too fancy here. > >>> > >>>We could rewrite -kernel/-append/-initrd to just generate a > >>>floppy image in RAM, and just boot from floppy. > >> > >>How could this work? the RAM belongs to SeaBIOS immediately > >>after reset, it would just scribble over it. Or worse, not > >>scribble on it until some date in the future. > >> > >>-kernel data has to find its way to memory after the bios gives > >>control to some optionrom. An alternative would be to embed > >>knowledge of -kernel in seabios, but I don't think it's a good > >>one. > >> > > > >Oh, you meant host RAM, not guest RAM. Disregard. > > > >This is basically my suggestion to libguestfs: instead of > >generating an initrd, generate a bootable cdrom, and boot from > >that. The result is faster and has a smaller memory footprint. > >Everyone wins. > > Yeah, but we could also do that entirely in QEMU. If that's what we > suggest doing, there's no reason not to do it instead of the option > rom trickery that we do today. > > The option rom stuff has a number of short comings. Because we > hijack int19, extboot doesn't get to run. That means that if you > use -kernel to load a grub (the Ubuntu guys for their own absurd > reasons) then grub does not see extboot backed disks. The solution > for them is the same, generate a proper disk and boot from that > disk. 
> Extboot is not so relevant any more. -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 19:36, Anthony Liguori wrote: > On 08/04/2010 12:31 PM, Alexander Graf wrote: >> On 04.08.2010, at 19:26, Anthony Liguori wrote: >> >> >>> On 08/04/2010 11:45 AM, Alexander Graf wrote: >>> Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. >>> >>> Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is >>> exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used >>> by firmware and that normal guest drivers should not be written for id 6. >>> >> Why not make id 6 be a fw_cfg virtio interface? > > Because that's a ton more work and we need fw_cfg to be available before PCI > is. IOW, fw_cfg cannot be a PCI interface. in addition to fw_cfg. So you'd have the same contents be exposed using both interfaces. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:27 PM, Anthony Liguori wrote: On 08/04/2010 12:19 PM, Avi Kivity wrote: On 08/04/2010 08:01 PM, Paolo Bonzini wrote: That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? Because that's what real hardware with only a few rare exceptions. I thought that IDE was emulated as SCSI even when it wasn't. But I guess now with SATA you're right. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: > This is basically my suggestion to libguestfs: instead of generating > an initrd, generate a bootable cdrom, and boot from that. The > result is faster and has a smaller memory footprint. Everyone wins. We had some discussion of this upstream & decided to do this. It should save the time it takes for the guest kernel to unpack the initrd, so maybe another second off boot time, which could bring us ever closer to the "golden" 5 second boot target. It's not trivial mind you, and won't happen straightaway. Part of it is that it requires reworking the appliance builder (a matter of just coding really). The less trivial part is that we have to 'hide' the CD device throughout the publically available interfaces. Then of course, a lot of testing. I will note that virt-install uses the -initrd interface for installing guests (large initrds too). And I've talked with a sysadmin who was using -kernel and -initrd for deploying VM hosting. In his case he did it so he could centralize kernel distribution / updates, and have the guests use /dev/vda == filesystem which made provisioning easy [for him -- I would have used libguestfs ...]. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 12:31 PM, Alexander Graf wrote: On 04.08.2010, at 19:26, Anthony Liguori wrote: On 08/04/2010 11:45 AM, Alexander Graf wrote: Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used by firmware and that normal guest drivers should not be written for id 6. Why not make id 6 be a fw_cfg virtio interface? Because that's a ton more work and we need fw_cfg to be available before PCI is. IOW, fw_cfg cannot be a PCI interface. Regards, Anthony Liguori That way we'd stay 100% compatible to everything we have and also get a fast path for reading big chunks of data from fw_cfg. All we'd need is a command to set the 'file' we're in. Even better yet, why not use virtio-9p and expose all of fw_cfg as files? Then implement a simple virtio-9p client in SeaBIOS and maybe even get direct kernel/initrd boot from a real 9p system ;). Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:31 PM, Alexander Graf wrote: Even better yet, why not use virtio-9p and expose all of fw_cfg as files? Then implement a simple virtio-9p client in SeaBIOS and maybe even get direct kernel/initrd boot from a real 9p system ;). libguestfs could use 9pfs directly. That will be way faster and reduce the footprint dramatically (the guest will demand load only the pages it needs). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 19:14, Avi Kivity wrote: > On 08/04/2010 08:01 PM, Alexander Graf wrote: >> >>> 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) >>> A guest/host interface is not private. >> fw_cfg is as private as it gets with host/guest interfaces. It's about as >> close as CPU specific MSRs or SMC chips. >> > > Well, it isn't. Two external projects already use it. You can't change it > due to the needs to live migrate from older versions. You can always extend it. You can even break it with a new -M. > Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. >>> 3) don't use -kernel for 100MB or more. It's not the right tool. >> Why not? You're the one always ranting about caring about users. Now you get >> at least 3 users from the Qemu development community actually using a >> feature and you just claim it's wrong? Please, we've added way more useless >> features for worse reasons. >> > > It's not wrong in itself, but using it with supersized initrds is wrong. The > data is stored in qemu, host pagecache, and the guest, so three copies, it's > limited by guest RAM, has to be live migrated. Sure we could optimize it, > but it's better to spend our efforts on more mainstream users. It's only stored twice. The host pagecache copy is gone during the lifetime of the VM. Migration also doesn't make sense for most -kernel/-initrd use cases. And it's awesome for fast prototyping. Of course, once that fast becomes dog slow, it's not useful anymore. I bet within the time everybody spent on this thread we would have a working and stable DMA fw_cfg interface plus extra spare time for supporting breakage already. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:27 PM, Alexander Graf wrote: Well, it isn't. Two external projects already use it. You can't change it due to the needs to live migrate from older versions. You can always extend it. You can even break it with a new -M. Yes. But it's a pain to make sure it all works out. We're already suffering from this where we have no choice, why do it where we have a choice? It's not wrong in itself, but using it with supersized initrds is wrong. The data is stored in qemu, host pagecache, and the guest, so three copies, it's limited by guest RAM, has to be live migrated. Sure we could optimize it, but it's better to spend our efforts on more mainstream users. It's only stored twice. The host pagecache copy is gone during the lifetime of the VM. It has still evicted some other pagecache. Footprint is footprint. 300MB to cat some file in a guest. Migration also doesn't make sense for most -kernel/-initrd use cases. You're just inviting a bug report here. If we add a feature, let's make it work. And it's awesome for fast prototyping. Of course, once that fast becomes dog slow, it's not useful anymore. For the Nth time, it's only slow with 100MB initrds. I bet within the time everybody spent on this thread we would have a working and stable DMA fw_cfg interface plus extra spare time for supporting breakage already. The time would have been better spent improving kvm's pio or porting libguestfs to use a cdrom. I'm also hoping to get the point across that adding pv interfaces like crazy is not sustainable. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 19:26, Anthony Liguori wrote: > On 08/04/2010 11:45 AM, Alexander Graf wrote: >> Frankly, I partially agreed to your point when we were talking about 300ms >> vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We >> chose the wrong interface to transfer kernel+initrd data into the guest. >> >> Now the question is how to fix that. I would veto against anything normally >> guest-OS-visible. By occupying the floppy, you lose a floppy drive in the >> guest. By occupying a disk, you see an unwanted disk in the guest. > > > Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is > exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used > by firmware and that normal guest drivers should not be written for id 6. Why not make id 6 be a fw_cfg virtio interface? That way we'd stay 100% compatible to everything we have and also get a fast path for reading big chunks of data from fw_cfg. All we'd need is a command to set the 'file' we're in. Even better yet, why not use virtio-9p and expose all of fw_cfg as files? Then implement a simple virtio-9p client in SeaBIOS and maybe even get direct kernel/initrd boot from a real 9p system ;). Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 11:45 AM, Alexander Graf wrote: Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used by firmware and that normal guest drivers should not be written for id 6. Problem is now solved and everyone's happy. Now we can all go back to making slides for next week :-) Regards, Anthony Liguori By taking virtio-serial you see an unwanted virtio-serial line in the guest. fw_cfg is great because it's a private interface nobody else accesses. I see two alternatives out of this mess: 1) Speed up string PIO so we're actually fast again. 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 12:19 PM, Avi Kivity wrote: On 08/04/2010 08:01 PM, Paolo Bonzini wrote: That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? Because that's what real hardware does, with only a few rare exceptions. Regards, Anthony Liguori
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 19:19, Avi Kivity wrote: > On 08/04/2010 08:01 PM, Paolo Bonzini wrote: >> >> That's another story and I totally agree here, but not reusing /dev/sd* is >> not intrinsic in the design of virtio-blk (and one thing that Windows gets >> right; everything is SCSI, period). >> > > I don't really get why everything must be SCSI. Everything must support > read, write, a few other commands, and a large set of optional commands. But > why map them all to SCSI? What's the magic? Hence the reference to megasas. It implements its own read/write/few other commands and the whole stack of optional commands as SCSI. I think virtio-blk should be the same. SCSI simply because it's there, it's flexible and it's well defined. You get a working spec and a lot of working implementations. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:01 PM, Paolo Bonzini wrote: That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:01 PM, Alexander Graf wrote:
> 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways)
>
>> A guest/host interface is not private.
>
> fw_cfg is as private as it gets with host/guest interfaces. It's about as close as CPU specific MSRs or SMC chips.

Well, it isn't. Two external projects already use it. You can't change it due to the need to live-migrate from older versions.

> Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2.
>
>> 3) don't use -kernel for 100MB or more. It's not the right tool.
>
> Why not? You're the one always ranting about caring about users. Now you get at least 3 users from the Qemu development community actually using a feature and you just claim it's wrong? Please, we've added way more useless features for worse reasons.

It's not wrong in itself, but using it with supersized initrds is wrong. The data is stored in qemu, in the host pagecache, and in the guest, so three copies; it's limited by guest RAM and has to be live-migrated. Sure we could optimize it, but it's better to spend our efforts on more mainstream users. If you want to pull large amounts of data into the guest efficiently, use virtio-blk. That's what it's for.

-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 18:54, Avi Kivity wrote: > On 08/04/2010 07:45 PM, Alexander Graf wrote: >> >> I see two alternatives out of this mess: >> >> 1) Speed up string PIO so we're actually fast again. > > Certainly, the best option given that it needs no new interfaces, and > improves the most workloads. > >> 2) Using a different interface (that could also be DMA fw_cfg - remember, >> we're on a private interface anyways) > > A guest/host interface is not private. fw_cfg is as private as it gets with host/guest interfaces. It's about as close as CPU specific MSRs or SMC chips. > >> Admittedly 1 would also help in more cases than just booting with -kernel >> and -initrd, but if that won't get us to acceptable levels (and yes, 8 >> seconds for 100MB is unacceptable) I don't see any way around 2. > > 3) don't use -kernel for 100MB or more. It's not the right tool. Why not? You're the one always ranting about caring about users. Now you get at least 3 users from the Qemu development community actually using a feature and you just claim it's wrong? Please, we've added way more useless features for worse reasons. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 06:49 PM, Anthony Liguori wrote: Right, the only question is, do you inject your own bus or do you just reuse SCSI. On the surface, it seems like reusing SCSI has a significant number of advantages. For instance, without changing the guest's drivers, we can implement PV cdroms or PV tape drives. If you want multiple LUNs per virtio device SCSI is obviously a good choice, but you will need something more (like the config space Avi mentioned). My position is that getting this "something more" right is considerably harder than virtio-blk. Maybe it will be done some day, but I still think that not having virtio-scsi from day 1 was actually a good thing. Even if we can learn from xenbus and all that. What exactly would keep us from doing that with virtio-blk? I thought that supports scsi commands already. I think the toughest change would be making it appear as a scsi device within the guest. You could do that with virtio-blk but it would be a flag day, as reasonably configured guests would break. Having the virtio-blk device show up as /dev/vdX was a big mistake. It's been nothing but a giant PITA. There is an amazing amount of software that only looks at /dev/sd* and /dev/hd*. That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). Paolo
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 18:49, Anthony Liguori wrote: > On 08/04/2010 11:48 AM, Alexander Graf wrote: >> On 04.08.2010, at 18:46, Anthony Liguori wrote: >> >> >>> On 08/04/2010 11:44 AM, Avi Kivity wrote: >>> On 08/04/2010 03:53 PM, Anthony Liguori wrote: > So how do we enable support for more than 20 disks? I think a > virtio-scsi is inevitable.. > Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. >>> Right, the only question is, do you inject your own bus or do you just >>> reuse SCSI. On the surface, it seems like reusing SCSI has a significant >>> number of advantages. For instance, without changing the guest's drivers, >>> we can implement PV cdroms or PV tape drives. >>> >> What exactly would keep us from doing that with virtio-blk? I thought that >> supports scsi commands already. >> > > I think the toughest change would be making it appear as a scsi device within > the guest. You could do that with virtio-blk but it would be a flag day, as > reasonably configured guests would break. > > Having the virtio-blk device show up as /dev/vdX was a big mistake. It's been > nothing but a giant PITA. There is an amazing amount of software that only > looks at /dev/sd* and /dev/hd*. I completely agree and yes, we should move in that direction IMHO. I don't see why virtio-blk should be any different from megasas for example. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 11:48 AM, Alexander Graf wrote: On 04.08.2010, at 18:46, Anthony Liguori wrote: On 08/04/2010 11:44 AM, Avi Kivity wrote: On 08/04/2010 03:53 PM, Anthony Liguori wrote: So how do we enable support for more than 20 disks? I think a virtio-scsi is inevitable.. Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. Right, the only question is, do you inject your own bus or do you just reuse SCSI. On the surface, it seems like reusing SCSI has a significant number of advantages. For instance, without changing the guest's drivers, we can implement PV cdroms or PV tape drives. What exactly would keep us from doing that with virtio-blk? I thought that supports scsi commands already. I think the toughest change would be making it appear as a scsi device within the guest. You could do that with virtio-blk but it would be a flag day, as reasonably configured guests would break. Having the virtio-blk device show up as /dev/vdX was a big mistake. It's been nothing but a giant PITA. There is an amazing amount of software that only looks at /dev/sd* and /dev/hd*. Regards, Anthony Liguori Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 07:45 PM, Alexander Graf wrote: I see two alternatives out of this mess: 1) Speed up string PIO so we're actually fast again. Certainly, the best option given that it needs no new interfaces, and improves the most workloads. 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) A guest/host interface is not private. Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. 3) don't use -kernel for 100MB or more. It's not the right tool. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 07:08 PM, Gleb Natapov wrote: After applying the cache fix, nothing definite as far as I remember (I ran it last time almost two weeks ago, need to rerun). The code always goes through the emulator now and checks the direction flag to update SI/DI accordingly. The emulator is a big switch and it calls various callbacks that may also slow things down. We can have it set up a fast path, similar to how real hardware optimizes 'rep movs' to copy complete cachelines. The emulator does all the checks, sets up a callback to be called on completion or when an interrupt is made pending, and lets x86.c do all the work. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
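Avi's fast-path idea — validate once in the emulator, then move whole chunks instead of iterating per byte — can be illustrated with a toy pass counter. This is only a model of the chunking arithmetic (page-sized chunks assumed), not KVM's actual emulator code:

```c
#include <stdint.h>

/* Count "emulator passes" needed to move `count` bytes to `dst`.
 * Per-byte emulation takes one pass per byte; a fast path that
 * validates once per 4K page copies up to the page boundary per pass. */
static unsigned rep_move_passes(uint64_t dst, uint64_t count, int fast_path)
{
    unsigned passes = 0;
    while (count) {
        uint64_t n = 1;
        if (fast_path) {
            uint64_t to_page_end = 4096 - (dst & 4095);
            n = count < to_page_end ? count : to_page_end;
        }
        dst += n;
        count -= n;
        passes++;
    }
    return passes;
}
```

For a 100MB (104857600-byte) initrd this is the difference between roughly 100 million per-byte passes and 25600 page-sized ones, which is why the fast path matters far more than micro-optimizing the per-byte loop.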
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 18:46, Anthony Liguori wrote: > On 08/04/2010 11:44 AM, Avi Kivity wrote: >> On 08/04/2010 03:53 PM, Anthony Liguori wrote: >>> >>> So how do we enable support for more than 20 disks? I think a virtio-scsi >>> is inevitable.. >> >> Not only for large numbers of disks, also for JBOD performance. If you have >> one queue per disk you'll have low queue depths and high interrupt rates. >> Aggregating many spindles into a single queue is important for reducing >> overhead. > > Right, the only question is, do you inject your own bus or do you just reuse > SCSI. On the surface, it seems like reusing SCSI has a significant number of > advantages. For instance, without changing the guest's drivers, we can > implement PV cdroms or PV tape drives. What exactly would keep us from doing that with virtio-blk? I thought that supports scsi commands already. Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 11:44 AM, Avi Kivity wrote: On 08/04/2010 03:53 PM, Anthony Liguori wrote: So how do we enable support for more than 20 disks? I think a virtio-scsi is inevitable.. Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. Right, the only question is, do you inject your own bus or do you just reuse SCSI. On the surface, it seems like reusing SCSI has a significant number of advantages. For instance, without changing the guest's drivers, we can implement PV cdroms or PV tape drives. It also supports SCSI-level pass-through, which is pretty nice for enabling things like NPIV. Regards, Anthony Liguori
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 07:44 PM, Anthony Liguori wrote: The option rom stuff has a number of shortcomings. Because we hijack int19, extboot doesn't get to run. That means that if you use -kernel to load a grub (as the Ubuntu guys do, for their own absurd reasons) then grub does not see extboot-backed disks. The solution for them is the same: generate a proper disk and boot from that disk. Let's print it out and hand out leaflets at the upcoming kvm forum. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 18:36, Avi Kivity wrote: > On 08/04/2010 07:30 PM, Avi Kivity wrote: >> On 08/04/2010 04:52 PM, Anthony Liguori wrote: > This is not like DMA even if done in chunks, and chunks can be pretty big. The code that deals with copying may temporarily unmap some pci devices to have more space there. >>> >>> >>> That's a bit complicated because SeaBIOS is managing the PCI devices >>> whereas the kernel code is running as an option rom. I don't know the BIOS >>> PCI interfaces well so I don't know how doable this is. >>> >>> Maybe we're just being too fancy here. >>> >>> We could rewrite -kernel/-append/-initrd to just generate a floppy image in >>> RAM, and just boot from floppy. >> >> How could this work? The RAM belongs to SeaBIOS immediately after reset, it >> would just scribble over it. Or worse, not scribble on it until some date >> in the future. >> >> -kernel data has to find its way to memory after the bios gives control to >> some optionrom. An alternative would be to embed knowledge of -kernel in >> seabios, but I don't think it's a good one. >> > > Oh, you meant host RAM, not guest RAM. Disregard. > > This is basically my suggestion to libguestfs: instead of generating an > initrd, generate a bootable cdrom, and boot from that. The result is faster > and has a smaller memory footprint. Everyone wins. Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. By taking virtio-serial you see an unwanted virtio-serial line in the guest. fw_cfg is great because it's a private interface nobody else accesses. 
I see two alternatives out of this mess: 1) Speed up string PIO so we're actually fast again. 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. Alex
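For reference, the fw_cfg interface under discussion is a two-port affair on x86: write a 16-bit selector to port 0x510, then read the selected item one byte at a time from port 0x511 — and every one of those byte reads is a trap to the hypervisor, which is exactly the per-byte emulation cost the thread is measuring. The sketch below stubs the port I/O with a fake in-memory device so it is self-contained and runnable; the selector value and blob contents are made up, and real firmware would of course issue actual inb/outw instructions.

```c
#include <stdint.h>
#include <stddef.h>

#define FW_CFG_PORT_CTL  0x510  /* selector: 16-bit write */
#define FW_CFG_PORT_DATA 0x511  /* data: sequential 8-bit reads */

/* Fake device state standing in for QEMU, so the sketch runs anywhere. */
static const uint8_t fake_item[] = "pretend-initrd-bytes";
static uint16_t fake_selector;
static size_t fake_offset;

static void outw(uint16_t port, uint16_t val)
{
    if (port == FW_CFG_PORT_CTL) {
        fake_selector = val;
        fake_offset = 0;  /* selecting an item rewinds its read position */
    }
}

static uint8_t inb(uint16_t port)
{
    if (port == FW_CFG_PORT_DATA && fake_offset < sizeof(fake_item))
        return fake_item[fake_offset++];
    return 0;
}

/* The per-byte read loop whose emulation cost the thread is about. */
static void fw_cfg_read(uint16_t selector, void *buf, size_t len)
{
    uint8_t *p = buf;
    outw(FW_CFG_PORT_CTL, selector);
    for (size_t i = 0; i < len; i++)
        p[i] = inb(FW_CFG_PORT_DATA);
}
```

Alternative 1 above speeds up this loop (or its `rep insb` form) inside KVM; alternative 2 replaces the loop with something like a DMA descriptor so the data never crosses the port at all.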
[Qemu-devel] [Bug 613529] [NEW] qemu does not accept regular disk geometry
Public bug reported: Hi, I am currently hunting a strange bug in qemu/kvm: I am using an lvm logical volume as a virtual hard disk for a virtual machine. I use fdisk or parted to create a partition table and partitions, kpartx to generate the device entries for the partitions, then install linux on ext3/ext4 with grub or an msdos filesystem with syslinux. But then, in most cases even the boot process fails or behaves strangely; sometimes even mounting the file system in the virtual machine fails. It seems as if there is a problem with the virtual disk geometry. The problem does not seem to occur if I reboot the host system after creating the partition table on the logical volume. I guess the linux kernel needs to learn the disk geometry by reboot. A blockdev --rereadpt does not work on lvm volumes. The first approach to test/fix the problem would be to pass the disk geometry to qemu/kvm with the -drive option. Unfortunately, qemu/kvm does not accept the default geometry with 255 heads and 63 sectors; it seems to limit the number of heads to 16, thus limiting the disk size. ** Affects: qemu Importance: Undecided Status: New -- qemu does not accept regular disk geometry https://bugs.launchpad.net/bugs/613529 You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. Status in QEMU: New
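The size ceiling behind the reporter's complaint is plain CHS arithmetic. Assuming the classic ATA translation limits (16 heads, 63 sectors/track, at most 16383 cylinders — the cylinder cap is an assumption stated here, not something from the bug report) versus the 255-head/63-sector geometry that fdisk defaults to:

```c
#include <stdint.h>

/* Capacity addressable by a cylinders/heads/sectors geometry,
 * with the usual 512-byte sectors. */
static uint64_t chs_bytes(uint64_t cyls, uint64_t heads, uint64_t secs)
{
    return cyls * heads * secs * 512;
}
```

With 16383 cylinders, 16 heads and 63 sectors this tops out at 8455200768 bytes (about 7.9 GB); a 255-head geometry packs the same capacity into far fewer cylinders (1024 × 255 × 63 × 512 ≈ 8.4 GB). The mismatch between the 255-head geometry written into the partition table on the host and the 16-head geometry qemu presents to the guest BIOS is a plausible source of the boot failures described above.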
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 11:36 AM, Avi Kivity wrote: On 08/04/2010 07:30 PM, Avi Kivity wrote: On 08/04/2010 04:52 PM, Anthony Liguori wrote: This is not like DMA even if done in chunks, and chunks can be pretty big. The code that deals with copying may temporarily unmap some pci devices to have more space there. That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. How could this work? The RAM belongs to SeaBIOS immediately after reset, it would just scribble over it. Or worse, not scribble on it until some date in the future. -kernel data has to find its way to memory after the bios gives control to some optionrom. An alternative would be to embed knowledge of -kernel in seabios, but I don't think it's a good one. Oh, you meant host RAM, not guest RAM. Disregard. This is basically my suggestion to libguestfs: instead of generating an initrd, generate a bootable cdrom, and boot from that. The result is faster and has a smaller memory footprint. Everyone wins. Yeah, but we could also do that entirely in QEMU. If that's what we suggest doing, there's no reason not to do it instead of the option rom trickery that we do today. The option rom stuff has a number of shortcomings. Because we hijack int19, extboot doesn't get to run. That means that if you use -kernel to load a grub (as the Ubuntu guys do, for their own absurd reasons) then grub does not see extboot-backed disks. The solution for them is the same: generate a proper disk and boot from that disk. Regards, Anthony Liguori
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 03:53 PM, Anthony Liguori wrote: So how do we enable support for more than 20 disks? I think a virtio-scsi is inevitable.. Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 11:30 AM, Avi Kivity wrote: On 08/04/2010 04:52 PM, Anthony Liguori wrote: This is not like DMA even if done in chunks, and chunks can be pretty big. The code that deals with copying may temporarily unmap some pci devices to have more space there. That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. How could this work? The RAM belongs to SeaBIOS immediately after reset, it would just scribble over it. Or worse, not scribble on it until some date in the future. I mean host RAM, not guest RAM. Regards, Anthony Liguori -kernel data has to find its way to memory after the bios gives control to some optionrom. An alternative would be to embed knowledge of -kernel in seabios, but I don't think it's a good one.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 07:30 PM, Avi Kivity wrote: On 08/04/2010 04:52 PM, Anthony Liguori wrote: This is not like DMA even if done in chunks, and chunks can be pretty big. The code that deals with copying may temporarily unmap some pci devices to have more space there. That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. How could this work? The RAM belongs to SeaBIOS immediately after reset, it would just scribble over it. Or worse, not scribble on it until some date in the future. -kernel data has to find its way to memory after the bios gives control to some optionrom. An alternative would be to embed knowledge of -kernel in seabios, but I don't think it's a good one. Oh, you meant host RAM, not guest RAM. Disregard. This is basically my suggestion to libguestfs: instead of generating an initrd, generate a bootable cdrom, and boot from that. The result is faster and has a smaller memory footprint. Everyone wins. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 05:39 PM, Anthony Liguori wrote: We could make -kernel an awful lot smarter but unless we've got someone just itching to write 16-bit option rom code, I think our best bet is to try to leverage a standard bootloader and expose a disk containing the kernel/initrd. A problem with that is that the booted kernel would see that disk and try to do something with it. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 04:52 PM, Anthony Liguori wrote: This is not like DMA even if done in chunks, and chunks can be pretty big. The code that deals with copying may temporarily unmap some pci devices to have more space there. That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. How could this work? The RAM belongs to SeaBIOS immediately after reset, it would just scribble over it. Or worse, not scribble on it until some date in the future. -kernel data has to find its way to memory after the bios gives control to some optionrom. An alternative would be to embed knowledge of -kernel in seabios, but I don't think it's a good one. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 04:24 PM, Richard W.M. Jones wrote: It's boot time, so you can just map it over some existing RAM surely? Linuxboot.bin can work out where to map it so it won't be in any memory either being used or the target for the copy. There's no such thing as boot time from the host's point of view. There are interfaces and they should work whatever the guest is doing right now. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 04:04 PM, Anthony Liguori wrote: On 08/04/2010 03:17 AM, Avi Kivity wrote: For playing games, there are three options: - existing fwcfg - fwcfg+dma - put roms in 4GB-2MB (or whatever we decide the flash size is) and have the BIOS copy them Existing fwcfg is the least amount of work and probably satisfactory for isapc. fwcfg+dma is IMO going off on a tangent. High memory flash is the most hardware-like solution, pretty easy from a qemu point of view but requires more work. The only trouble I see is that high memory isn't always available. If it's a 32-bit PC and you've exhausted RAM space, then you're only left with the PCI hole and it's not clear to me if you can really pull out 100MB of space there as an option ROM without breaking something. 100MB is out of the question, certainly. I'm talking about your isapc problem, not about a cdrom replacement. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 05:59:40PM +0200, Alexander Graf wrote: > > On 04.08.2010, at 17:48, Gleb Natapov wrote: > > > On Wed, Aug 04, 2010 at 05:31:12PM +0200, Alexander Graf wrote: > >> > >> On 04.08.2010, at 17:25, Gleb Natapov wrote: > >> > >>> On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > On 08/04/2010 09:51 AM, David S. Ahern wrote: > > > > On 08/03/10 12:43, Avi Kivity wrote: > >> libguestfs does not depend on an x86 architectural feature. > >> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We > >> should > >> discourage people from depending on this interface for production use. > > That is a feature of qemu - and an important one to me as well. Why > > should it be discouraged? You end up at the same place -- a running > > kernel and in-ram filesystem; why require going through a bootloader > > just because the hardware case needs it? > > It's smoke and mirrors. We're still providing a boot loader; it's > just a little tiny one that we've written solely for this purpose. > > And it works fine for production use. The question is whether we > ought to be aggressively optimizing it for large initrd sizes. To > be honest, after a lot of discussion of possibilities, I've come to > the conclusion that it's just not worth it. > > There are better ways like using string I/O and optimizing the PIO > path in the kernel. That should cut down the 1s slow down with a > 100MB initrd by a bit. But honestly, shaving a couple hundred ms > further off the initrd load is just not worth it using the current > model. > > >>> The slow down is not 1s any more. String PIO emulation had many bugs > >>> that were fixed in 2.6.35. I verified how much time it took to load 100M > >>> via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > >>> my machine it took ~2-3 seconds; on 2.6.35 it took 26s. Some optimizations > >>> that were already committed make it 20s. I have some code prototype that > >>> makes it 11s. 
I don't see how we can get below that, surely not back to > >>> ~2-3sec. > >> > >> What exactly is the reason for the slowdown? It can't be only boundary and > >> permission checks, right? > >> > >> > > The big part of the slowdown right now is that the write into memory is done > for each byte. It means for each byte we call kvm_write_guest() and > kvm_mmu_pte_write(). The second call is needed in case the memory the instruction > is trying to write to is shadowed. Previously we didn't check for > that at all. This can be mitigated by introducing a write cache to do > combined writes into the memory and unshadow the page if there is more > than one write into it. This optimization saves ~10secs. Currently string > Ok, so you tackled that bit already. > > emulation enters the guest from time to time to check if event injection is > needed, and reads from userspace are done in 1K chunks, not 4K like it was, > but when I made reads 4K and disabled guest reentry I haven't seen > any speed improvements worth talking about. > So what are we wasting those 10 seconds on then? Does perf tell you anything useful? > Not 10, but 7-8 seconds. After applying the cache fix, nothing definite as far as I remember (I ran it last time almost two weeks ago, need to rerun). The code always goes through the emulator now and checks the direction flag to update SI/DI accordingly. The emulator is a big switch and it calls various callbacks that may also slow things down. -- Gleb.
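Gleb's "write cache" mitigation — batching the per-byte kvm_write_guest() calls into one backend write per contiguous run — can be sketched as follows. This is a toy illustrating only the batching logic (a 4K buffer is assumed), not the actual KVM patch:

```c
#include <stdint.h>
#include <stddef.h>

/* Toy write-combining cache: per-byte stores accumulate and are only
 * flushed to the (expensive) backend when the run breaks or the buffer
 * fills.  `flushes` counts backend writes, i.e. what would otherwise be
 * one kvm_write_guest()/kvm_mmu_pte_write() pair per byte. */
struct wc_cache {
    uint64_t gpa;          /* guest physical address of the buffered run */
    size_t   len;          /* bytes buffered so far */
    uint8_t  buf[4096];
    unsigned flushes;
};

static void wc_flush(struct wc_cache *wc)
{
    if (wc->len) {
        /* real code would write buf[0..len) to the guest at wc->gpa here */
        wc->flushes++;
        wc->len = 0;
    }
}

static void wc_write_byte(struct wc_cache *wc, uint64_t gpa, uint8_t b)
{
    if (wc->len && gpa != wc->gpa + wc->len)
        wc_flush(wc);                  /* non-contiguous: start a new run */
    if (wc->len == sizeof(wc->buf))
        wc_flush(wc);                  /* buffer full */
    if (wc->len == 0)
        wc->gpa = gpa;
    wc->buf[wc->len++] = b;
}
```

For the forward-moving string copy in question, this turns one backend write per byte into one per 4K of contiguous data, which is the shape of the ~10s saving Gleb reports.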
[Qemu-devel] [Bug 586175] Re: Windows XP/2003 doesn't boot
** Changed in: debian Status: New => Fix Released -- Windows XP/2003 doesn't boot https://bugs.launchpad.net/bugs/586175 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: Incomplete Status in “qemu-kvm” package in Ubuntu: New Status in Debian GNU/Linux: Fix Released Status in Fedora: Unknown Bug description: Hello everyone, my qemu doesn't boot any Windows XP/2003 installations if I try to boot the image. If I boot the install cd first, its boot manager counts down and triggers the boot on its own. That's kinda stupid. I'm using libvirt, but even by a simple > qemu-kvm -drive file=image.img,media=disk,if=ide,boot=on it won't boot. Qemu hangs at the message "Booting from Hard Disk..." I'm using qemu-kvm-0.12.4 with SeaBIOS 0.5.1 on Gentoo (No-Multilib and AMD64). It's a server, that means I'm using VNC as the primary graphic output, but I don't think it should be an issue.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 17:48, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 05:31:12PM +0200, Alexander Graf wrote: >> >> On 04.08.2010, at 17:25, Gleb Natapov wrote: >> >>> On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: On 08/04/2010 09:51 AM, David S. Ahern wrote: > > On 08/03/10 12:43, Avi Kivity wrote: >> libguestfs does not depend on an x86 architectural feature. >> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should >> discourage people from depending on this interface for production use. > That is a feature of qemu - and an important one to me as well. Why > should it be discouraged? You end up at the same place -- a running > kernel and in-ram filesystem; why require going through a bootloader > just because the hardware case needs it? It's smoke and mirrors. We're still providing a boot loader; it's just a little tiny one that we've written solely for this purpose. And it works fine for production use. The question is whether we ought to be aggressively optimizing it for large initrd sizes. To be honest, after a lot of discussion of possibilities, I've come to the conclusion that it's just not worth it. There are better ways like using string I/O and optimizing the PIO path in the kernel. That should cut down the 1s slow down with a 100MB initrd by a bit. But honestly, shaving a couple hundred ms further off the initrd load is just not worth it using the current model. >>> The slow down is not 1s any more. String PIO emulation had many bugs >>> that were fixed in 2.6.35. I verified how much time it took to load 100M >>> via fw_cfg interface on older kernel and on 2.6.35. On older kernels on >>> my machine it took ~2-3 seconds; on 2.6.35 it took 26s. Some optimizations >>> that were already committed make it 20s. I have some code prototype that >>> makes it 11s. I don't see how we can get below that, surely not back to >>> ~2-3sec. >> >> What exactly is the reason for the slowdown? 
It can't be only boundary and >> permission checks, right? >> >> > The big part of the slowdown right now is that the write into memory is done > for each byte. It means for each byte we call kvm_write_guest() and > kvm_mmu_pte_write(). The second call is needed in case the memory the instruction > is trying to write to is shadowed. Previously we didn't check for > that at all. This can be mitigated by introducing a write cache to do > combined writes into the memory and unshadow the page if there is more > than one write into it. This optimization saves ~10secs. Currently string Ok, so you tackled that bit already. > emulation enters the guest from time to time to check if event injection is > needed, and reads from userspace are done in 1K chunks, not 4K like it was, > but when I made reads 4K and disabled guest reentry I haven't seen > any speed improvements worth talking about. So what are we wasting those 10 seconds on then? Does perf tell you anything useful? Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 05:31:12PM +0200, Alexander Graf wrote: > > On 04.08.2010, at 17:25, Gleb Natapov wrote: > > > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > >> On 08/04/2010 09:51 AM, David S. Ahern wrote: > >>> > >>> On 08/03/10 12:43, Avi Kivity wrote: > libguestfs does not depend on an x86 architectural feature. > qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should > discourage people from depending on this interface for production use. > >>> That is a feature of qemu - and an important one to me as well. Why > >>> should it be discouraged? You end up at the same place -- a running > >>> kernel and in-ram filesystem; why require going through a bootloader > >>> just because the hardware case needs it? > >> > >> It's smoke and mirrors. We're still providing a boot loader it's > >> just a little tiny one that we've written soley for this purpose. > >> > >> And it works fine for production use. The question is whether we > >> ought to be aggressively optimizing it for large initrd sizes. To > >> be honest, after a lot of discussion of possibilities, I've come to > >> the conclusion that it's just not worth it. > >> > >> There are better ways like using string I/O and optimizing the PIO > >> path in the kernel. That should cut down the 1s slow down with a > >> 100MB initrd by a bit. But honestly, shaving a couple hundred ms > >> further off the initrd load is just not worth it using the current > >> model. > >> > > The slow down is not 1s any more. String PIO emulation had many bugs > > that were fixed in 2.6.35. I verified how much time it took to load 100M > > via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > > that was already committed make it 20s. I have some code prototype that > > makes it 11s. I don't see how we can get below that, surely not back to > > ~2-3sec. > > What exactly is the reason for the slowdown? 
It can't be only boundary and > permission checks, right? > > The big part of the slowdown right now is that the write into memory is done for each byte. It means for each byte we call kvm_write_guest() and kvm_mmu_pte_write(). The second call is needed in case the memory the instruction is trying to write to is shadowed. Previously we didn't check for that at all. This can be mitigated by introducing a write cache, doing combined writes into memory, and unshadowing the page if there is more than one write into it. This optimization saves ~10secs. Currently string emulation enters the guest from time to time to check if event injection is needed, and the read from userspace is done in 1K chunks, not 4K like it was, but when I made reads 4K and disabled guest reentry I haven't seen any speed improvements worth talking about. -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 04.08.2010, at 17:25, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: >> On 08/04/2010 09:51 AM, David S. Ahern wrote: >>> >>> On 08/03/10 12:43, Avi Kivity wrote: libguestfs does not depend on an x86 architectural feature. qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should discourage people from depending on this interface for production use. >>> That is a feature of qemu - and an important one to me as well. Why >>> should it be discouraged? You end up at the same place -- a running >>> kernel and in-ram filesystem; why require going through a bootloader >>> just because the hardware case needs it? >> >> It's smoke and mirrors. We're still providing a boot loader it's >> just a little tiny one that we've written soley for this purpose. >> >> And it works fine for production use. The question is whether we >> ought to be aggressively optimizing it for large initrd sizes. To >> be honest, after a lot of discussion of possibilities, I've come to >> the conclusion that it's just not worth it. >> >> There are better ways like using string I/O and optimizing the PIO >> path in the kernel. That should cut down the 1s slow down with a >> 100MB initrd by a bit. But honestly, shaving a couple hundred ms >> further off the initrd load is just not worth it using the current >> model. >> > The slow down is not 1s any more. String PIO emulation had many bugs > that were fixed in 2.6.35. I verified how much time it took to load 100M > via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > that was already committed make it 20s. I have some code prototype that > makes it 11s. I don't see how we can get below that, surely not back to > ~2-3sec. What exactly is the reason for the slowdown? It can't be only boundary and permission checks, right? Alex
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > On 08/04/2010 09:51 AM, David S. Ahern wrote: > > > >On 08/03/10 12:43, Avi Kivity wrote: > >>libguestfs does not depend on an x86 architectural feature. > >>qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should > >>discourage people from depending on this interface for production use. > >That is a feature of qemu - and an important one to me as well. Why > >should it be discouraged? You end up at the same place -- a running > >kernel and in-ram filesystem; why require going through a bootloader > >just because the hardware case needs it? > > It's smoke and mirrors. We're still providing a boot loader; it's > just a little tiny one that we've written solely for this purpose. > > And it works fine for production use. The question is whether we > ought to be aggressively optimizing it for large initrd sizes. To > be honest, after a lot of discussion of possibilities, I've come to > the conclusion that it's just not worth it. > > There are better ways like using string I/O and optimizing the PIO > path in the kernel. That should cut down the 1s slowdown with a > 100MB initrd by a bit. But honestly, shaving a couple hundred ms > further off the initrd load is just not worth it using the current > model. > The slowdown is not 1s any more. String PIO emulation had many bugs that were fixed in 2.6.35. I verified how much time it took to load 100M via the fw_cfg interface on an older kernel and on 2.6.35. On older kernels on my machine it took ~2-3 seconds; on 2.6.35 it took 26s. Some optimizations that were already committed make it 20s. I have a code prototype that makes it 11s. I don't see how we can get below that, surely not back to ~2-3sec. > If this is important to someone, we ought to look at refactoring the > loader completely to be disk based, which is a higher performance > interface. 
> > Regards, > > Anthony Liguori > > >David -- Gleb.
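For reference, the fw_cfg interface being measured here is a selector/data port pair (0x510/0x511 on x86): the guest selects a blob, then pulls it out one byte (or one `insb` run) at a time. The sketch below simulates the device in plain C to show why the transfer cost scales with every single byte; the `FW_CFG_INITRD` selector value and the blob contents are made up for illustration (real guest code would use `outw` to 0x510 and `inb`/`insb` from 0x511):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FW_CFG_INITRD 0x0012          /* hypothetical selector value */

static const uint8_t initrd_blob[] = { 0x1f, 0x8b, 0x08, 0x00 }; /* fake data */

static struct {                        /* simulated device state */
    uint16_t selector;
    size_t   offset;
} fw_cfg;

static void fw_cfg_select(uint16_t key)  /* models outw to port 0x510 */
{
    fw_cfg.selector = key;
    fw_cfg.offset = 0;                   /* selecting rewinds the stream */
}

static uint8_t fw_cfg_read_byte(void)    /* models inb from port 0x511 */
{
    if (fw_cfg.selector == FW_CFG_INITRD && fw_cfg.offset < sizeof initrd_blob)
        return initrd_blob[fw_cfg.offset++];
    return 0;
}

/* guest-side loader: the per-byte loop whose cost the thread measures.
 * Every iteration is a trapped port access in the real system. */
static size_t fw_cfg_read(uint16_t key, uint8_t *dst, size_t len)
{
    fw_cfg_select(key);
    for (size_t i = 0; i < len; i++)
        dst[i] = fw_cfg_read_byte();
    return len;
}
```

A 100MB initrd therefore means on the order of 10^8 device accesses, which is why the emulation-path fixes in 2.6.35 changed the load time so dramatically.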
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 10:07:24AM -0500, Anthony Liguori wrote: > On 08/04/2010 10:01 AM, Gleb Natapov wrote: > > > >Hm, may be. I read seabios code differently, but may be I misread it. > > The BIOS Boot Specification spells it all out pretty clearly. > I have the spec. Isn't this enough to be an expert? Or do you mean I should read it too? > >>If a ROM needs memory after the init function, it needs to use the > >>traditional tricks to allocate long term memory and the most popular > >>one is modifying the e820 tables. > >> > >e820 has no in memory format, > > Indeed. > > >>See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. > >so this ugly code intercepts int15 and mangle result. OMG. How this can > >even work if more then two ROMs want to do that? > > You have to save the old handlers and invoke them. Where do you > save the old handlers? There's tricks you can do by trying to use > some unused vectors and also temporarily using the stack. > > But basically, yeah, I'm amazed every time I see a PC boot that it > all actually works :-) > Heh. -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 10:01 AM, Gleb Natapov wrote: Hm, may be. I read seabios code differently, but may be I misread it. The BIOS Boot Specification spells it all out pretty clearly. If a ROM needs memory after the init function, it needs to use the traditional tricks to allocate long term memory and the most popular one is modifying the e820 tables. e820 has no in memory format, Indeed. See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. so this ugly code intercepts int15 and mangle result. OMG. How this can even work if more then two ROMs want to do that? You have to save the old handlers and invoke them. Where do you save the old handlers? There's tricks you can do by trying to use some unused vectors and also temporarily using the stack. But basically, yeah, I'm amazed every time I see a PC boot that it all actually works :-) Regards, Anthony Liguori -- Gleb.
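The int15/e820 mangling being discussed amounts to: intercept the E820h call, and rewrite any usable-RAM entry that overlaps the ROM's long-term allocation so that window reads as reserved. A hypothetical C model of that rewrite step (the real gPXE code does this in 16-bit assembly inside the hooked interrupt handler):

```c
#include <assert.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2

struct e820_entry { uint64_t base, len; uint32_t type; };

/* Split entry 'e' against a reserved window [r_base, r_base + r_len).
 * Writes 1..3 entries to 'out' and returns the count: RAM below the
 * window, the window itself marked reserved, and RAM above it. */
static int e820_reserve(struct e820_entry e,
                        uint64_t r_base, uint64_t r_len,
                        struct e820_entry *out)
{
    uint64_t r_end = r_base + r_len, e_end = e.base + e.len;
    uint64_t lo, hi;
    int n = 0;

    if (e.type != E820_RAM || r_end <= e.base || e_end <= r_base) {
        out[n++] = e;                 /* no overlap: pass through */
        return n;
    }
    lo = r_base > e.base ? r_base : e.base;   /* overlap start */
    hi = r_end < e_end ? r_end : e_end;       /* overlap end   */
    if (e.base < r_base)              /* RAM left below the reservation */
        out[n++] = (struct e820_entry){ e.base, r_base - e.base, E820_RAM };
    out[n++] = (struct e820_entry){ lo, hi - lo, E820_RESERVED };
    if (r_end < e_end)                /* RAM left above the reservation */
        out[n++] = (struct e820_entry){ r_end, e_end - r_end, E820_RAM };
    return n;
}
```

Gleb's objection maps directly onto this: each ROM that hooks int15 applies its own transform on top of the previous handler's output, so the schemes have to compose correctly without any coordination.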
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 09:50:55AM -0500, Anthony Liguori wrote: > On 08/04/2010 09:38 AM, Gleb Natapov wrote: > >> > >>But even if it wasn't it can potentially create havoc. I think we > >>currently believe that the northbridge likely never forwards RAM > >>access to a device so this doesn't fit how hardware would work. > >> > >Good point. > > > >>More importantly, BIOSes and ROMs do very funny things with RAM. > >>It's not unusual for a ROM to muck with the e820 map to allocate RAM > >>for itself which means there's always the chance that we're going to > >>walk over RAM being used for something else. > >> > >ROM does not muck with the e820. It uses PMM to allocate memory and the > >memory it gets is marked as reserved in the e820 map. > > PMM allocations are only valid during the init function's execution. > Its intention is to enable the use of scratch memory to decompress > or otherwise modify the ROM to shrink its size. > Hm, maybe. I read the seabios code differently, but maybe I misread it. > If a ROM needs memory after the init function, it needs to use the > traditional tricks to allocate long term memory and the most popular > one is modifying the e820 tables. > e820 has no in-memory format, > See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. so this ugly code intercepts int15 and mangles the result. OMG. How can this even work if more than two ROMs want to do that? -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 09:51 AM, David S. Ahern wrote: On 08/03/10 12:43, Avi Kivity wrote: libguestfs does not depend on an x86 architectural feature. qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should discourage people from depending on this interface for production use. That is a feature of qemu - and an important one to me as well. Why should it be discouraged? You end up at the same place -- a running kernel and in-ram filesystem; why require going through a bootloader just because the hardware case needs it? It's smoke and mirrors. We're still providing a boot loader; it's just a little tiny one that we've written solely for this purpose. And it works fine for production use. The question is whether we ought to be aggressively optimizing it for large initrd sizes. To be honest, after a lot of discussion of possibilities, I've come to the conclusion that it's just not worth it. There are better ways like using string I/O and optimizing the PIO path in the kernel. That should cut down the 1s slowdown with a 100MB initrd by a bit. But honestly, shaving a couple hundred ms further off the initrd load is just not worth it using the current model. If this is important to someone, we ought to look at refactoring the loader completely to be disk based, which is a higher performance interface. Regards, Anthony Liguori David
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/03/10 12:43, Avi Kivity wrote: > libguestfs does not depend on an x86 architectural feature. > qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should > discourage people from depending on this interface for production use. That is a feature of qemu - and an important one to me as well. Why should it be discouraged? You end up at the same place -- a running kernel and in-ram filesystem; why require going through a bootloader just because the hardware case needs it? David
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 09:38 AM, Gleb Natapov wrote: But even if it wasn't it can potentially create havoc. I think we currently believe that the northbridge likely never forwards RAM access to a device so this doesn't fit how hardware would work. Good point. More importantly, BIOSes and ROMs do very funny things with RAM. It's not unusual for a ROM to muck with the e820 map to allocate RAM for itself which means there's always the chance that we're going to walk over RAM being used for something else. ROM does not muck with the e820. It uses PMM to allocate memory and the memory it gets is marked as reserved in the e820 map. PMM allocations are only valid during the init function's execution. Its intention is to enable the use of scratch memory to decompress or otherwise modify the ROM to shrink its size. If a ROM needs memory after the init function, it needs to use the traditional tricks to allocate long term memory and the most popular one is modifying the e820 tables. See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. Regards, Anthony Liguori -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 09:22 AM, Paolo Bonzini wrote: On 08/04/2010 04:00 PM, Gleb Natapov wrote: Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. May be. Can floppy be 100M? Well, in theory you can have 16384 bytes/sector, 256 tracks, 255 sectors, 2 heads... that makes 2^(14+8+8+1) = 2 GB. :) Not sure the BIOS would read such a beast, or SYSLINUX. By the way, if libguestfs insists for an initrd rather than a CDROM image, it could do something in between and make an ISO image with ISOLINUX and the required kernel/initrd pair. (By the way, a network installation image for a typical distribution has a 120M initrd, so it's not just libguestfs. It is very useful to pass the network installation images directly to qemu via -kernel/-initrd). We could make kernel an awful lot smarter but unless we've got someone just itching to write 16-bit option rom code, I think our best bet is to try to leverage a standard bootloader and expose a disk containing the kernel/initrd. Otherwise, we just stick with what we have and deal with the performance as is. Regards, Anthony Liguori Paolo
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 09:22:22AM -0500, Anthony Liguori wrote: > On 08/04/2010 08:26 AM, Gleb Natapov wrote: > >On Wed, Aug 04, 2010 at 02:24:08PM +0100, Richard W.M. Jones wrote: > >>On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > >>>On 08/04/2010 08:07 AM, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > >On 08/04/2010 03:17 AM, Avi Kivity wrote: > >>For playing games, there are three options: > >>- existing fwcfg > >>- fwcfg+dma > >>- put roms in 4GB-2MB (or whatever we decide the flash size is) > >>and have the BIOS copy them > >> > >>Existing fwcfg is the least amount of work and probably > >>satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >>High memory flash is the most hardware-like solution, pretty easy > >>from a qemu point of view but requires more work. > > > >The only trouble I see is that high memory isn't always available. > >If it's a 32-bit PC and you've exhausted RAM space, then you're only > >left with the PCI hole and it's not clear to me if you can really > >pull out 100mb of space there as an option ROM without breaking > >something. > > > We can map it on demand. Guest tells qemu to map rom "A" to address X by > writing into some io port. Guest copies rom. Guest tells qemu to unmap > it. Better then DMA interface IMHO. > >>>That's what I thought too, but in a 32-bit guest using ~3.5GB of > >>>RAM, where can you safely get 100MB of memory to full map the ROM? > >>>If you're going to map chunks at a time, you are basically doing > >>>DMA. > >>It's boot time, so you can just map it over some existing RAM surely? > >Not with current qemu. This is broken now. > > But even if it wasn't it can potentially create havoc. I think we > currently believe that the northbridge likely never forwards RAM > access to a device so this doesn't fit how hardware would work. > Good point. > More importantly, BIOSes and ROMs do very funny things with RAM. 
> It's not unusual for a ROM to muck with the e820 map to allocate RAM > for itself which means there's always the chance that we're going to > walk over RAM being used for something else. > ROM does not muck with the e820. It uses PMM to allocate memory and the memory it gets is marked as reserved in e820 map. -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 09:14:01AM -0500, Anthony Liguori wrote: > >Unmapping device and mapping it at the same place is easy. Enumerating > >pci devices from multiboot.bin looks like unneeded churn though. > > > >>Maybe we're just being too fancy here. > >> > >>We could rewrite -kernel/-append/-initrd to just generate a floppy > >>image in RAM, and just boot from floppy. > >> > >May be. Can floppy be 100M? > > No, I forgot just how small they are. R/O usb mass storage device? > CDROM? I'm beginning thing that loading such a large initrd through > fwcfg is simply a dead end. > Well, libguestfs can use CDROM by itself to begin with. -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 04:00 PM, Gleb Natapov wrote: Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. May be. Can floppy be 100M? Well, in theory you can have 16384 bytes/sector, 256 tracks, 255 sectors, 2 heads... that makes 2^(14+8+8+1) = 2 GB. :) Not sure the BIOS would read such a beast, or SYSLINUX. By the way, if libguestfs insists on an initrd rather than a CDROM image, it could do something in between and make an ISO image with ISOLINUX and the required kernel/initrd pair. (By the way, a network installation image for a typical distribution has a 120M initrd, so it's not just libguestfs. It is very useful to pass the network installation images directly to qemu via -kernel/-initrd). Paolo
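Paolo's geometry arithmetic, checked: 2^(14+8+8+1) rounds the 255 sectors/track up to 256 (CHS sector numbers run 1-255), so the exact ceiling with his figures is a hair under 2 GB, which is still far more than a 100MB initrd needs:

```c
#include <assert.h>
#include <stdint.h>

/* Maximum addressable size for a given CHS geometry:
 * bytes/sector x cylinders x sectors/track x heads. */
static uint64_t chs_max_bytes(uint64_t bytes_per_sector, uint64_t cylinders,
                              uint64_t sectors, uint64_t heads)
{
    return bytes_per_sector * cylinders * sectors * heads;
}
```

With Paolo's values (16384 bytes/sector, 256 tracks, 255 sectors, 2 heads) this gives 2,139,095,040 bytes, i.e. ~1.99 GB.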
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:26 AM, Gleb Natapov wrote: On Wed, Aug 04, 2010 at 02:24:08PM +0100, Richard W.M. Jones wrote: On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: On 08/04/2010 08:07 AM, Gleb Natapov wrote: On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: On 08/04/2010 03:17 AM, Avi Kivity wrote: For playing games, there are three options: - existing fwcfg - fwcfg+dma - put roms in 4GB-2MB (or whatever we decide the flash size is) and have the BIOS copy them Existing fwcfg is the least amount of work and probably satisfactory for isapc. fwcfg+dma is IMO going off a tangent. High memory flash is the most hardware-like solution, pretty easy >from a qemu point of view but requires more work. The only trouble I see is that high memory isn't always available. If it's a 32-bit PC and you've exhausted RAM space, then you're only left with the PCI hole and it's not clear to me if you can really pull out 100mb of space there as an option ROM without breaking something. We can map it on demand. Guest tells qemu to map rom "A" to address X by writing into some io port. Guest copies rom. Guest tells qemu to unmap it. Better then DMA interface IMHO. That's what I thought too, but in a 32-bit guest using ~3.5GB of RAM, where can you safely get 100MB of memory to full map the ROM? If you're going to map chunks at a time, you are basically doing DMA. It's boot time, so you can just map it over some existing RAM surely? Not with current qemu. This is broken now. But even if it wasn't it can potentially create havoc. I think we currently believe that the northbridge likely never forwards RAM access to a device so this doesn't fit how hardware would work. More importantly, BIOSes and ROMs do very funny things with RAM. It's not unusual for a ROM to muck with the e820 map to allocate RAM for itself which means there's always the chance that we're going to walk over RAM being used for something else. 
Regards, Anthony Liguori Linuxboot.bin can work out where to map it so it won't be in any memory either being used or the target for the copy. -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 09:00 AM, Gleb Natapov wrote: On Wed, Aug 04, 2010 at 08:52:44AM -0500, Anthony Liguori wrote: On 08/04/2010 08:34 AM, Gleb Natapov wrote: On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: On 08/04/2010 08:07 AM, Gleb Natapov wrote: On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: On 08/04/2010 03:17 AM, Avi Kivity wrote: For playing games, there are three options: - existing fwcfg - fwcfg+dma - put roms in 4GB-2MB (or whatever we decide the flash size is) and have the BIOS copy them Existing fwcfg is the least amount of work and probably satisfactory for isapc. fwcfg+dma is IMO going off a tangent. High memory flash is the most hardware-like solution, pretty easy >from a qemu point of view but requires more work. The only trouble I see is that high memory isn't always available. If it's a 32-bit PC and you've exhausted RAM space, then you're only left with the PCI hole and it's not clear to me if you can really pull out 100mb of space there as an option ROM without breaking something. We can map it on demand. Guest tells qemu to map rom "A" to address X by writing into some io port. Guest copies rom. Guest tells qemu to unmap it. Better then DMA interface IMHO. That's what I thought too, but in a 32-bit guest using ~3.5GB of RAM, where can you safely get 100MB of memory to full map the ROM? If you're going to map chunks at a time, you are basically doing DMA. This is not like DMA event if done in chunks and chunks can be pretty big. The code that dials with copying may temporary unmap some pci devices to have more space there. That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. Unmapping device and mapping it at the same place is easy. Enumerating pci devices from multiboot.bin looks like unneeded churn though. Maybe we're just being too fancy here. 
We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. May be. Can floppy be 100M? No, I forgot just how small they are. R/O USB mass storage device? CDROM? I'm beginning to think that loading such a large initrd through fwcfg is simply a dead end. Regards, Anthony Liguori -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 08:52:44AM -0500, Anthony Liguori wrote: > On 08/04/2010 08:34 AM, Gleb Natapov wrote: > >On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > >>On 08/04/2010 08:07 AM, Gleb Natapov wrote: > >>>On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > On 08/04/2010 03:17 AM, Avi Kivity wrote: > >For playing games, there are three options: > >- existing fwcfg > >- fwcfg+dma > >- put roms in 4GB-2MB (or whatever we decide the flash size is) > >and have the BIOS copy them > > > >Existing fwcfg is the least amount of work and probably > >satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >High memory flash is the most hardware-like solution, pretty easy > >from a qemu point of view but requires more work. > > The only trouble I see is that high memory isn't always available. > If it's a 32-bit PC and you've exhausted RAM space, then you're only > left with the PCI hole and it's not clear to me if you can really > pull out 100mb of space there as an option ROM without breaking > something. > > >>>We can map it on demand. Guest tells qemu to map rom "A" to address X by > >>>writing into some io port. Guest copies rom. Guest tells qemu to unmap > >>>it. Better then DMA interface IMHO. > >>That's what I thought too, but in a 32-bit guest using ~3.5GB of > >>RAM, where can you safely get 100MB of memory to full map the ROM? > >>If you're going to map chunks at a time, you are basically doing > >>DMA. > >> > >This is not like DMA event if done in chunks and chunks can be pretty > >big. The code that dials with copying may temporary unmap some pci > >devices to have more space there. > > That's a bit complicated because SeaBIOS is managing the PCI devices > whereas the kernel code is running as an option rom. I don't know > the BIOS PCI interfaces well so I don't know how doable this is. > Unmapping device and mapping it at the same place is easy. 
Enumerating PCI devices from multiboot.bin looks like unneeded churn though. > Maybe we're just being too fancy here. > > We could rewrite -kernel/-append/-initrd to just generate a floppy > image in RAM, and just boot from floppy. > Maybe. Can a floppy be 100M? -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On 08/04/2010 08:34 AM, Gleb Natapov wrote: On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: On 08/04/2010 08:07 AM, Gleb Natapov wrote: On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: On 08/04/2010 03:17 AM, Avi Kivity wrote: For playing games, there are three options: - existing fwcfg - fwcfg+dma - put roms in 4GB-2MB (or whatever we decide the flash size is) and have the BIOS copy them Existing fwcfg is the least amount of work and probably satisfactory for isapc. fwcfg+dma is IMO going off a tangent. High memory flash is the most hardware-like solution, pretty easy >from a qemu point of view but requires more work. The only trouble I see is that high memory isn't always available. If it's a 32-bit PC and you've exhausted RAM space, then you're only left with the PCI hole and it's not clear to me if you can really pull out 100mb of space there as an option ROM without breaking something. We can map it on demand. Guest tells qemu to map rom "A" to address X by writing into some io port. Guest copies rom. Guest tells qemu to unmap it. Better then DMA interface IMHO. That's what I thought too, but in a 32-bit guest using ~3.5GB of RAM, where can you safely get 100MB of memory to full map the ROM? If you're going to map chunks at a time, you are basically doing DMA. This is not like DMA event if done in chunks and chunks can be pretty big. The code that dials with copying may temporary unmap some pci devices to have more space there. That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. Regards, Anthony Liguori And what's the upper limit on ROM size that we impose? 100MB is already at the ridiculously large size. Agree. We have two solutions: 1. 
Avoid the problem 2. Fix the problem. Both are fine with me and I prefer 1, but if we are going with 2 I prefer something sane. -- Gleb.
Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > On 08/04/2010 08:07 AM, Gleb Natapov wrote: > >On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > >>On 08/04/2010 03:17 AM, Avi Kivity wrote: > >>>For playing games, there are three options: > >>>- existing fwcfg > >>>- fwcfg+dma > >>>- put roms in 4GB-2MB (or whatever we decide the flash size is) > >>>and have the BIOS copy them > >>> > >>>Existing fwcfg is the least amount of work and probably > >>>satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >>>High memory flash is the most hardware-like solution, pretty easy > >>>from a qemu point of view but requires more work. > >> > >>The only trouble I see is that high memory isn't always available. > >>If it's a 32-bit PC and you've exhausted RAM space, then you're only > >>left with the PCI hole and it's not clear to me if you can really > >>pull out 100MB of space there as an option ROM without breaking > >>something. > >> > >We can map it on demand. Guest tells qemu to map rom "A" to address X by > >writing into some io port. Guest copies rom. Guest tells qemu to unmap > >it. Better than DMA interface IMHO. > > That's what I thought too, but in a 32-bit guest using ~3.5GB of > RAM, where can you safely get 100MB of memory to full map the ROM? > If you're going to map chunks at a time, you are basically doing > DMA. > This is not like DMA even if done in chunks, and chunks can be pretty big. The code that deals with copying may temporarily unmap some PCI devices to have more space there. > And what's the upper limit on ROM size that we impose? 100MB is > already at the ridiculously large size. > Agree. We have two solutions: 1. Avoid the problem 2. Fix the problem. Both are fine with me and I prefer 1, but if we are going with 2 I prefer something sane. -- Gleb.
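Gleb's map-on-demand scheme can be sketched as a loop: ask the host to expose the next ROM chunk at a fixed window, copy it out, repeat. Everything below is hypothetical (the host side is a plain function standing in for an I/O port write, and the chunk size and window address are invented), but it shows the key property: the number of guest/host round trips scales with the chunk count, not the byte count:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 4096

static uint8_t rom_image[3 * CHUNK_SIZE + 123]; /* host-side ROM image     */
static uint8_t window[CHUNK_SIZE];              /* guest-visible map window */
static unsigned map_requests;                   /* guest->host round trips  */

/* Models "write chunk index to the control port": the host maps the
 * requested chunk of the ROM at the window and reports its size. */
static size_t host_map_chunk(size_t chunk)
{
    size_t off = chunk * CHUNK_SIZE;
    size_t n = off < sizeof rom_image ? sizeof rom_image - off : 0;
    if (n > CHUNK_SIZE)
        n = CHUNK_SIZE;
    memcpy(window, rom_image + off, n);
    map_requests++;
    return n;                                   /* bytes now visible */
}

/* Guest-side loader: one host round trip per chunk, not per byte. */
static size_t load_rom(uint8_t *dst)
{
    size_t total = 0, n;
    for (size_t chunk = 0; (n = host_map_chunk(chunk)) > 0; chunk++) {
        memcpy(dst + total, window, n);
        total += n;
    }
    return total;
}
```

Compared with the per-byte fw_cfg path, a 100MB ROM costs ~25,000 round trips at 4K chunks (and far fewer with bigger chunks) instead of ~10^8 port accesses, which is why Gleb prefers this over tuning the string-PIO path.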