date:20100617

[Qemu-devel] [PATCH 00/10] pci: pci to pci bridge clean up and enhancement

2010-06-17 Thread Isaku Yamahata

This patch series cleans up the pci-to-pci bridge code by introducing
a pci bridge layer, and includes some bug fixes.
Although the pci bridge implementation would belong in pci.c,
I split it out into pci_bridge.c because pci.c is already big enough.

This might seem like over-engineering, but it is also preparation for
pci express root/upstream/downstream port emulators.
Those express ports are similar to, but different from, each other,
so the new pci bridge layer helps here.
Once this patch series is merged, the express ports patch will follow.

Isaku Yamahata (10):
  pci_bridge: split out pci bridge code into pci_bridge.c from pci.c
  qdev: export qdev_reset() for later use.
  pci: fix pci_bus_reset() with 64bit BAR and several clean ups.
  pci_bridge: introduce pci bridge layer.
  pci bridge: add helper function for ssvid capability.
  pci: eliminate work around in pci_device_reset().
  pci: fix pci domain registering.
  pci: remove PCIDeviceInfo::header_type
  pci: set PCI multi-function bit appropriately.
  pci: don't overwrite multi function bit in pci header type.

 Makefile.objs |2 +-
 hw/ac97.c |1 -
 hw/acpi_piix4.c   |1 -
 hw/apb_pci.c  |   43 -
 hw/dec_pci.c  |   31 ++---
 hw/e1000.c|1 +
 hw/grackle_pci.c  |1 -
 hw/ide/cmd646.c   |1 -
 hw/ide/piix.c |1 -
 hw/lsi53c895a.c   |2 +
 hw/macio.c|1 -
 hw/ne2000.c   |1 -
 hw/openpic.c  |1 -
 hw/pci.c  |  194 +++--
 hw/pci.h  |   22 +-
 hw/pci_bridge.c   |  188 +++
 hw/pci_bridge.h   |   71 +++
 hw/pcnet.c|2 +-
 hw/piix4.c|3 +-
 hw/piix_pci.c |5 +-
 hw/prep_pci.c |1 -
 hw/qdev.c |   13 +++-
 hw/qdev.h |1 +
 hw/rtl8139.c  |3 +-
 hw/sun4u.c|1 -
 hw/unin_pci.c |4 -
 hw/usb-uhci.c |1 -
 hw/vga-pci.c  |1 -
 hw/virtio-pci.c   |2 +-
 hw/vmware_vga.c   |1 -
 hw/wdt_i6300esb.c |1 -
 qemu-common.h |1 +
 32 files changed, 430 insertions(+), 172 deletions(-)
 create mode 100644 hw/pci_bridge.c
 create mode 100644 hw/pci_bridge.h

[Qemu-devel] [PATCH 05/10] pci bridge: add helper function for ssvid capability.

2010-06-17 Thread Isaku Yamahata

helper function to add ssvid capability.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hw/pci_bridge.c |   20 
 hw/pci_bridge.h |3 +++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
index 43c21d4..1397a11 100644
--- a/hw/pci_bridge.c
+++ b/hw/pci_bridge.c
@@ -29,6 +29,26 @@
 
 #include "pci_bridge.h"
 
+/* PCI bridge subsystem vendor ID helper functions */
+#define PCI_SSVID_SIZEOF        8
+#define PCI_SSVID_SVID          4
+#define PCI_SSVID_SSID          6
+
+int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
+                          uint16_t svid, uint16_t ssid)
+{
+    int pos;
+    pos = pci_add_capability_at_offset(dev, PCI_CAP_ID_SSVID,
+                                       offset, PCI_SSVID_SIZEOF);
+    if (pos < 0) {
+        return pos;
+    }
+
+    pci_set_word(dev->config + pos + PCI_SSVID_SVID, svid);
+    pci_set_word(dev->config + pos + PCI_SSVID_SSID, ssid);
+    return pos;
+}
+
 void pci_bridge_write_config(PCIDevice *d,
                              uint32_t address, uint32_t val, int len)
 {
diff --git a/hw/pci_bridge.h b/hw/pci_bridge.h
index 2747e7f..a1f160b 100644
--- a/hw/pci_bridge.h
+++ b/hw/pci_bridge.h
@@ -23,6 +23,9 @@
 
 #include "pci.h"
 
+int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
+                          uint16_t svid, uint16_t ssid);
+
 struct PCIBridge {
     PCIDevice dev;
 
-- 
1.6.6.1
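
As a standalone illustration of what pci_bridge_ssvid_init() above writes,
here is a minimal sketch of the 8-byte SSVID capability layout implied by
the offsets in the patch (SVID at +4, SSID at +6). It is not QEMU code:
ssvid_write(), set_word_le() and the flat cfg array are hypothetical
stand-ins, and the capability ID value 0x0D, the "next" pointer byte and
the 0x40 lower bound are assumptions taken from the PCI specifications
rather than from this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE      256   /* conventional PCI config space size */
#define CAP_ID_SSVID  0x0D  /* assumed capability ID for bridge SSVID */
#define SSVID_SIZEOF  8
#define SSVID_SVID    4     /* same offsets as PCI_SSVID_SVID/SSID above */
#define SSVID_SSID    6

/* Store a 16-bit value in little-endian order, as config space expects. */
static void set_word_le(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/*
 * Hypothetical stand-in for the helper: lay out the capability at
 * 'offset' and return the offset, or -1 if it does not fit behind the
 * standard 64-byte header.
 */
static int ssvid_write(uint8_t *cfg, uint8_t offset, uint8_t next,
                       uint16_t svid, uint16_t ssid)
{
    if (offset < 0x40 || offset + SSVID_SIZEOF > CFG_SIZE) {
        return -1;
    }
    memset(cfg + offset, 0, SSVID_SIZEOF);
    cfg[offset] = CAP_ID_SSVID;   /* byte 0: capability ID */
    cfg[offset + 1] = next;       /* byte 1: next capability pointer */
    set_word_le(cfg + offset + SSVID_SVID, svid);
    set_word_le(cfg + offset + SSVID_SSID, ssid);
    return offset;
}
```

Like the real helper, the sketch returns the capability position on
success and a negative value on failure, so a caller can propagate the
error unchanged.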

[Qemu-devel] [PATCH 08/10] pci: remove PCIDeviceInfo::header_type

2010-06-17 Thread Isaku Yamahata

replace PCIDeviceInfo::header_type with is_bridge
as suggested by Michael S. Tsirkin m...@redhat.com

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hw/apb_pci.c|2 +-
 hw/dec_pci.c|2 +-
 hw/pci.c|9 -
 hw/pci.h|8 ++--
 hw/pci_bridge.c |6 +-
 5 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/hw/apb_pci.c b/hw/apb_pci.c
index cb9051b..a1c17b9 100644
--- a/hw/apb_pci.c
+++ b/hw/apb_pci.c
@@ -440,7 +440,7 @@ static PCIDeviceInfo pbm_pci_host_info = {
     .qdev.name = "pbm",
     .qdev.size = sizeof(PCIDevice),
     .init  = pbm_pci_host_init,
-    .header_type  = PCI_HEADER_TYPE_BRIDGE,
+    .is_bridge = true,
 };
 
 static SysBusDeviceInfo pbm_host_info = {
diff --git a/hw/dec_pci.c b/hw/dec_pci.c
index 45b5c28..9311c6f 100644
--- a/hw/dec_pci.c
+++ b/hw/dec_pci.c
@@ -100,7 +100,7 @@ static PCIDeviceInfo dec_21154_pci_host_info = {
     .qdev.name = "dec-21154",
     .qdev.size = sizeof(PCIDevice),
     .init  = dec_21154_pci_host_init,
-    .header_type  = PCI_HEADER_TYPE_BRIDGE,
+    .is_bridge  = true,
 };
 
 static void dec_register_devices(void)
diff --git a/hw/pci.c b/hw/pci.c
index 162dcd4..5316aa5 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -630,7 +630,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
                                          const char *name, int devfn,
                                          PCIConfigReadFunc *config_read,
                                          PCIConfigWriteFunc *config_write,
-                                         uint8_t header_type)
+                                         bool is_bridge)
 {
     if (devfn < 0) {
         for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices);
@@ -652,13 +652,12 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
     pci_dev->irq_state = 0;
     pci_config_alloc(pci_dev);
 
-    header_type &= ~PCI_HEADER_TYPE_MULTI_FUNCTION;
-    if (header_type == PCI_HEADER_TYPE_NORMAL) {
+    if (!is_bridge) {
         pci_set_default_subsystem_id(pci_dev);
     }
     pci_init_cmask(pci_dev);
     pci_init_wmask(pci_dev);
-    if (header_type == PCI_HEADER_TYPE_BRIDGE) {
+    if (is_bridge) {
         pci_init_wmask_bridge(pci_dev);
     }
 
@@ -1575,7 +1574,7 @@ static int pci_qdev_init(DeviceState *qdev, DeviceInfo *base)
     devfn = pci_dev->devfn;
     pci_dev = do_pci_register_device(pci_dev, bus, base->name, devfn,
                                      info->config_read, info->config_write,
-                                     info->header_type);
+                                     info->is_bridge);
     if (pci_dev == NULL)
         return -1;
     rc = info->init(pci_dev);
diff --git a/hw/pci.h b/hw/pci.h
index 10a63e8..ef06b27 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -334,8 +334,12 @@ typedef struct {
     PCIConfigReadFunc *config_read;
     PCIConfigWriteFunc *config_write;
 
-    /* pci config header type */
-    uint8_t header_type;
+    /*
+     * pci-to-pci bridge or normal device.
+     * This doesn't mean pci host switch.
+     * When card bus bridge is supported, this would be enhanced.
+     */
+    int is_bridge;
 
     /* pcie stuff */
     int is_express;   /* is this device pci express? */
diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
index 1397a11..736a3db 100644
--- a/hw/pci_bridge.c
+++ b/hw/pci_bridge.c
@@ -113,6 +113,10 @@ static int pci_bridge_initfn(PCIDevice *dev)
     PCIBridgeInfo *info = DO_UPCAST(PCIBridgeInfo, pci, pci_info);
     int rc = 0;
 
+    dev->config[PCI_HEADER_TYPE] =
+        (dev->config[PCI_HEADER_TYPE] & PCI_HEADER_TYPE_MULTI_FUNCTION) |
+        PCI_HEADER_TYPE_BRIDGE;
+
     pci_set_word(dev->config + PCI_STATUS,
                  PCI_STATUS_66MHZ | PCI_STATUS_FAST_BACK);
     pci_config_set_class(dev->config, PCI_CLASS_BRIDGE_PCI);
@@ -153,7 +157,7 @@ void pci_bridge_qdev_register(PCIBridgeInfo *info)
     }
     info->pci.init = pci_bridge_initfn;
     info->pci.exit = pci_bridge_exitfn;
-    info->pci.header_type = PCI_HEADER_TYPE_BRIDGE;
+    info->pci.is_bridge = true;
     if (!info->pci.config_write) {
         info->pci.config_write = pci_bridge_write_config;
     }
-- 
1.6.6.1
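
The pci_bridge_initfn() hunk above sets the header layout to "bridge"
while keeping whatever multi-function bit is already present. A minimal
standalone sketch of that bit manipulation follows. header_type_for() is
a hypothetical helper, not part of the patch; the constant values (0x00
normal, 0x01 bridge, 0x80 multi-function) mirror the standard PCI header
type register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HDR_TYPE_NORMAL          0x00
#define HDR_TYPE_BRIDGE          0x01
#define HDR_TYPE_MULTI_FUNCTION  0x80  /* bit 7 of the header type byte */

/*
 * Hypothetical helper: choose the header layout from is_bridge and
 * preserve only the multi-function bit of the current register value.
 */
static uint8_t header_type_for(uint8_t current, bool is_bridge)
{
    uint8_t layout = is_bridge ? HDR_TYPE_BRIDGE : HDR_TYPE_NORMAL;
    return (uint8_t)((current & HDR_TYPE_MULTI_FUNCTION) | layout);
}
```

Keeping the layout decision in a boolean, as the patch does with
is_bridge, means the multi-function bit never has to be masked out
before comparing against a layout constant.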

[Qemu-devel] [PATCH 07/10] pci: fix pci domain registering.

2010-06-17 Thread Isaku Yamahata

Only pci host bus must be registered as root bus.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hw/pci.c  |8 ++--
 hw/pci.h  |1 +
 hw/piix_pci.c |1 +
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 5dee102..162dcd4 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -207,7 +207,7 @@ static void pci_bus_resetfn(void *opaque)
     pci_bus_reset(bus);
 }
 
-static void pci_host_bus_register(int domain, PCIBus *bus)
+void pci_host_bus_register(int domain, PCIBus *bus)
 {
     struct PCIHostBus *host;
     host = qemu_mallocz(sizeof(*host));
@@ -254,11 +254,7 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
 {
     qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
     bus->devfn_min = devfn_min;
-
-    /* host bridge */
     QLIST_INIT(&bus->child);
-    pci_host_bus_register(0, bus); /* for now only pci domain 0 is supported */
-
     vmstate_register(-1, &vmstate_pcibus, bus);
     qemu_register_reset(pci_bus_resetfn, bus);
 }
@@ -302,6 +298,7 @@ PCIBus *pci_register_bus(DeviceState *parent, const char *name,
     PCIBus *bus;
 
     bus = pci_bus_new(parent, name, devfn_min);
+    pci_host_bus_register(0, bus); /* for now only pci domain 0 is supported */
     pci_bus_irqs(bus, set_irq, map_irq, irq_opaque, nirq);
     return bus;
 }
@@ -317,7 +314,6 @@ PCIBus *pci_register_secondary_bus(PCIBus *parent,
     bus->map_irq = map_irq;
     bus->parent_dev = dev;
 
-    QLIST_INIT(&bus->child);
     QLIST_INSERT_HEAD(&parent->child, bus, sibling);
 
     return bus;
diff --git a/hw/pci.h b/hw/pci.h
index 2a2c8ef..10a63e8 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -227,6 +227,7 @@ PCIDevice *pci_nic_init_nofail(NICInfo *nd, const char *default_model,
                                const char *default_devaddr);
 int pci_bus_num(PCIBus *s);
 void pci_for_each_device(PCIBus *bus, int bus_num, void (*fn)(PCIBus *bus, PCIDevice *d));
+void pci_host_bus_register(int domain, PCIBus *bus);
 PCIBus *pci_find_root_bus(int domain);
 int pci_find_domain(const PCIBus *bus);
 PCIBus *pci_find_bus(PCIBus *bus, int bus_num);
diff --git a/hw/piix_pci.c b/hw/piix_pci.c
index d14d05e..16645cd 100644
--- a/hw/piix_pci.c
+++ b/hw/piix_pci.c
@@ -227,6 +227,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix3_devfn, qemu_irq *
     dev = qdev_create(NULL, "i440FX-pcihost");
     s = FROM_SYSBUS(I440FXState, sysbus_from_qdev(dev));
     b = pci_bus_new(&s->busdev.qdev, NULL, 0);
+    pci_host_bus_register(0, b); /* pci domain 0 */
     s->bus = b;
     qdev_init_nofail(dev);
 
-- 
1.6.6.1
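
The rule stated in the patch above ("only pci host bus must be registered
as root bus") can be modelled in a few lines. The following is a
simplified sketch, not the QEMU implementation: struct bus,
host_bus_register(), find_root_bus() and find_domain() are hypothetical
names, the fixed-size table replaces QEMU's list of host buses, and
rejecting a secondary bus with -1 is an assumption of this sketch (the
patch simply stops calling the registration for secondary buses).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DOMAINS 4

struct bus {
    struct bus *parent;   /* NULL for a root (host) bus */
};

/* One root bus per PCI domain; secondary buses never appear here. */
static struct bus *root_by_domain[MAX_DOMAINS];

/* Register a root bus for a domain; refuse buses that sit behind a bridge. */
static int host_bus_register(int domain, struct bus *bus)
{
    if (domain < 0 || domain >= MAX_DOMAINS || bus->parent != NULL) {
        return -1;
    }
    root_by_domain[domain] = bus;
    return 0;
}

static struct bus *find_root_bus(int domain)
{
    if (domain < 0 || domain >= MAX_DOMAINS) {
        return NULL;
    }
    return root_by_domain[domain];
}

/* Walk up to the root bus, then look up which domain it was registered in. */
static int find_domain(const struct bus *bus)
{
    while (bus->parent != NULL) {
        bus = bus->parent;
    }
    for (int d = 0; d < MAX_DOMAINS; d++) {
        if (root_by_domain[d] == bus) {
            return d;
        }
    }
    return -1;
}
```

With this split, creating a bus and registering it as a domain root are
separate steps, which is what lets the bridge code reuse the bus
constructor for secondary buses.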

[Qemu-devel] [PATCH 03/10] pci: fix pci_bus_reset() with 64bit BAR and several clean ups.

2010-06-17 Thread Isaku Yamahata

fix pci_device_reset() with 64bit BAR.
export pci_bus_reset(), pci_device_reset() and two helper functions
for later use. And several clean ups.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hw/pci.c |   44 
 hw/pci.h |5 +
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 9ba62eb..87f5e6c 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -144,28 +144,50 @@ static void pci_update_irq_status(PCIDevice *dev)
     }
 }
 
-static void pci_device_reset(PCIDevice *dev)
+void pci_device_reset_default(PCIDevice *dev)
 {
     int r;
 
     dev->irq_state = 0;
     pci_update_irq_status(dev);
-    dev->config[PCI_COMMAND] &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
-                                  PCI_COMMAND_MASTER);
+    pci_set_word(dev->config + PCI_COMMAND,
+                 pci_get_word(dev->config + PCI_COMMAND) &
+                 ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER));
     dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
     dev->config[PCI_INTERRUPT_LINE] = 0x0;
     for (r = 0; r < PCI_NUM_REGIONS; ++r) {
-        if (!dev->io_regions[r].size) {
+        PCIIORegion *region = &dev->io_regions[r];
+        if (!region->size) {
             continue;
         }
-        pci_set_long(dev->config + pci_bar(dev, r), dev->io_regions[r].type);
+
+        if (!(region->type & PCI_BASE_ADDRESS_SPACE_IO) &&
+            region->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+            pci_set_quad(dev->config + pci_bar(dev, r), region->type);
+        } else {
+            pci_set_long(dev->config + pci_bar(dev, r), region->type);
+        }
     }
     pci_update_mappings(dev);
 }
 
-static void pci_bus_reset(void *opaque)
+void pci_device_reset(PCIDevice *dev)
+{
+    if (!dev->qdev.info) {
+        /* for not qdevified device */
+        pci_device_reset_default(dev);
+        return;
+    }
+
+    qdev_reset(&dev->qdev);
+
+    /* TODO: make DeviceInfo::reset call
+       pci_device_reset_default() itself. */
+    pci_device_reset_default(dev);
+}
+
+void pci_bus_reset(PCIBus *bus)
 {
-    PCIBus *bus = opaque;
     int i;
 
     for (i = 0; i < bus->nirq; i++) {
@@ -178,6 +200,12 @@ static void pci_bus_reset(void *opaque)
     }
 }
 
+static void pci_bus_resetfn(void *opaque)
+{
+    PCIBus *bus = opaque;
+    pci_bus_reset(bus);
+}
+
 static void pci_host_bus_register(int domain, PCIBus *bus)
 {
     struct PCIHostBus *host;
@@ -231,7 +259,7 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
     pci_host_bus_register(0, bus); /* for now only pci domain 0 is supported */
 
     vmstate_register(-1, &vmstate_pcibus, bus);
-    qemu_register_reset(pci_bus_reset, bus);
+    qemu_register_reset(pci_bus_resetfn, bus);
 }
 
 PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min)
diff --git a/hw/pci.h b/hw/pci.h
index f6e2551..2a2c8ef 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -199,6 +199,11 @@ int pci_device_load(PCIDevice *s, QEMUFile *f);
 typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
 typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
 typedef int (*pci_hotplug_fn)(DeviceState *qdev, PCIDevice *pci_dev, int state);
+
+void pci_device_reset_default(PCIDevice *dev);
+void pci_device_reset(PCIDevice *dev);
+void pci_bus_reset(PCIBus *bus);
+
 void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
                          const char *name, int devfn_min);
 PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min);
-- 
1.6.6.1
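
The 64bit BAR fix above comes down to how many bytes a BAR occupies in
config space: a 64-bit memory BAR spans two consecutive 32-bit slots, so
resetting it must clear eight bytes rather than four. The sketch below
models only that decision. bar_reset(), bar_is_64bit() and set_le() are
hypothetical helpers, and the flat cfg array with BAR0 at offset 0x10
stands in for the device config space; the flag values 0x01 (I/O space)
and 0x04 (64-bit memory type) follow the standard BAR encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BAR_SPACE_IO     0x01  /* bit 0: I/O space BAR */
#define BAR_MEM_TYPE_64  0x04  /* bits 2:1 == 10b: 64-bit memory BAR */
#define BAR0_OFFSET      0x10  /* first BAR in a type 0 header */

/* Store the low 'nbytes' bytes of v in little-endian order. */
static void set_le(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Only memory BARs can be 64-bit; bit 2 means something else for I/O. */
static int bar_is_64bit(uint8_t type)
{
    return !(type & BAR_SPACE_IO) && (type & BAR_MEM_TYPE_64);
}

/* Reset BAR number r to its type bits, clearing the programmed address. */
static void bar_reset(uint8_t *cfg, int r, uint8_t type)
{
    uint8_t *bar = cfg + BAR0_OFFSET + 4 * r;
    set_le(bar, type, bar_is_64bit(type) ? 8 : 4);
}
```

Writing only four bytes for a 64-bit BAR would leave the upper half of a
previously programmed address in place, which is the situation the patch
avoids.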

[Qemu-devel] [PATCH 04/10] pci_bridge: introduce pci bridge layer.

2010-06-17 Thread Isaku Yamahata

introduce pci bridge layer.
export pci_bridge_write_config() for generic use.
support device reset and bus reset of bridge control.
convert apb bridge and dec p2p bridge to use new pci bridge layer.
save/restore is supported as a side effect.

This might be a bit of over-engineering, but it is also preparation
for pci express root/upstream/downstream ports.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hw/apb_pci.c|   38 +-
 hw/dec_pci.c|   28 +++---
 hw/pci_bridge.c |  146 +--
 hw/pci_bridge.h |   35 -
 qemu-common.h   |1 +
 5 files changed, 177 insertions(+), 71 deletions(-)

diff --git a/hw/apb_pci.c b/hw/apb_pci.c
index c11d9b5..cb9051b 100644
--- a/hw/apb_pci.c
+++ b/hw/apb_pci.c
@@ -31,6 +31,7 @@
 #include "pci_host.h"
 #include "pci_bridge.h"
 #include "rwhandler.h"
+#include "pci_bridge.h"
 #include "apb_pci.h"
 #include "sysemu.h"
 
@@ -294,9 +295,12 @@ static void pci_apb_set_irq(void *opaque, int irq_num, int level)
     }
 }
 
-static void apb_pci_bridge_init(PCIBus *b)
+static int apb_pci_bridge_init(PCIBridge *br)
 {
-    PCIDevice *dev = pci_bridge_get_device(b);
+    PCIDevice *dev = &br->dev;
+
+    pci_config_set_vendor_id(dev->config, PCI_VENDOR_ID_SUN);
+    pci_config_set_device_id(dev->config, PCI_DEVICE_ID_SUN_SIMBA);
 
     /*
      * command register:
@@ -316,6 +320,8 @@ static void apb_pci_bridge_init(PCIBus *b)
     pci_set_byte(dev->config + PCI_HEADER_TYPE,
                  pci_get_byte(dev->config + PCI_HEADER_TYPE) |
                  PCI_HEADER_TYPE_MULTI_FUNCTION);
+
+    return 0;
 }
 
 PCIBus *pci_apb_init(target_phys_addr_t special_base,
@@ -326,6 +332,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
     SysBusDevice *s;
     APBState *d;
     unsigned int i;
+    PCIBridge *br;
 
     /* Ultrasparc PBM main bus */
     dev = qdev_create(NULL, "pbm");
@@ -351,17 +358,13 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
     pci_create_simple(d->bus, 0, "pbm");
 
     /* APB secondary busses */
-    *bus2 = pci_bridge_init(d->bus, PCI_DEVFN(1, 0),
-                            PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA,
-                            pci_apb_map_irq,
-                            "Advanced PCI Bus secondary bridge 1");
-    apb_pci_bridge_init(*bus2);
-
-    *bus3 = pci_bridge_init(d->bus, PCI_DEVFN(1, 1),
-                            PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA,
-                            pci_apb_map_irq,
-                            "Advanced PCI Bus secondary bridge 2");
-    apb_pci_bridge_init(*bus3);
+    br = pci_bridge_create_simple(d->bus, PCI_DEVFN(1, 0), "pbm-bridge",
+                                  "Advanced PCI Bus secondary bridge 1");
+    *bus2 = pci_bridge_get_sec_bus(br);
+
+    br = pci_bridge_create_simple(d->bus, PCI_DEVFN(1, 1), "pbm-bridge",
+                                  "Advanced PCI Bus secondary bridge 2");
+    *bus3 = pci_bridge_get_sec_bus(br);
 
     return d->bus;
 }
@@ -446,10 +449,19 @@ static SysBusDeviceInfo pbm_host_info = {
     .qdev.reset = pci_pbm_reset,
     .init = pci_pbm_init_device,
 };
+
+static PCIBridgeInfo pbm_pci_bridge_info = {
+    .pci.qdev.name = "pbm-bridge",
+    .pci.qdev.vmsd = &vmstate_pci_device,
+    .init = apb_pci_bridge_init,
+    .map_irq = pci_apb_map_irq,
+};
+
 static void pbm_register_devices(void)
 {
     sysbus_register_withprop(&pbm_host_info);
     pci_qdev_register(&pbm_pci_host_info);
+    pci_bridge_qdev_register(&pbm_pci_bridge_info);
 }
 
 device_init(pbm_register_devices)
diff --git a/hw/dec_pci.c b/hw/dec_pci.c
index b2759dd..45b5c28 100644
--- a/hw/dec_pci.c
+++ b/hw/dec_pci.c
@@ -49,18 +49,27 @@ static int dec_map_irq(PCIDevice *pci_dev, int irq_num)
     return irq_num;
 }
 
-PCIBus *pci_dec_21154_init(PCIBus *parent_bus, int devfn)
+static int dec_21154_initfn(PCIBridge *br)
 {
-    DeviceState *dev;
-    PCIBus *ret;
+    pci_config_set_vendor_id(br->dev.config, PCI_VENDOR_ID_DEC);
+    pci_config_set_device_id(br->dev.config, PCI_DEVICE_ID_DEC_21154);
+    return 0;
+}
 
-    dev = qdev_create(NULL, "dec-21154");
-    qdev_init_nofail(dev);
-    ret = pci_bridge_init(parent_bus, devfn,
-                          PCI_VENDOR_ID_DEC, PCI_DEVICE_ID_DEC_21154,
-                          dec_map_irq, "DEC 21154 PCI-PCI bridge");
+static PCIBridgeInfo dec_21154_pci_bridge_info = {
+    .pci.qdev.name = "dec-21154-p2p-bridge",
+    .pci.qdev.desc = "DEC 21154 PCI-PCI bridge",
+    .pci.qdev.vmsd = &vmstate_pci_device,
+    .init = dec_21154_initfn,
+    .map_irq = dec_map_irq,
+};
 
-    return ret;
+PCIBus *pci_dec_21154_init(PCIBus *parent_bus, int devfn)
+{
+    PCIBridge *br;
+    br = pci_bridge_create_simple(parent_bus, devfn, "dec-21154-p2p-bridge",
+                                  "DEC 21154 PCI-PCI bridge");
+    return pci_bridge_get_sec_bus(br);
 }
 
 static int pci_dec_21154_init_device(SysBusDevice *dev)
@@ -99,6 +108,7 @@ static void dec_register_devices(void)

[Qemu-devel] [PATCH 01/10] pci_bridge: split out pci bridge code into pci_bridge.c from pci.c

2010-06-17 Thread Isaku Yamahata

Move pci bridge related code from pci.c into pci_bridge.c
for further enhancement. pci.c is big enough now, so split it out.

Some of the pci bridge functions stay in pci.c because
they access PCIBus members. Make the accessor functions non-static
and use them from pci_bridge.c.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 Makefile.objs   |2 +-
 hw/apb_pci.c|1 +
 hw/dec_pci.c|1 +
 hw/pci.c|  104 ++-
 hw/pci.h|8 +++-
 hw/pci_bridge.c |  112 +++
 hw/pci_bridge.h |   37 ++
 7 files changed, 170 insertions(+), 95 deletions(-)
 create mode 100644 hw/pci_bridge.c
 create mode 100644 hw/pci_bridge.h

diff --git a/Makefile.objs b/Makefile.objs
index 2bfb6d1..5c37e5c 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -139,7 +139,7 @@ user-obj-y += cutils.o cache-utils.o
 hw-obj-y =
 hw-obj-y += vl.o loader.o
 hw-obj-y += virtio.o virtio-console.o
-hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o
+hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o pci_bridge.o
 hw-obj-y += watchdog.o
 hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
 hw-obj-$(CONFIG_ECC) += ecc.o
diff --git a/hw/apb_pci.c b/hw/apb_pci.c
index 31c8d70..c11d9b5 100644
--- a/hw/apb_pci.c
+++ b/hw/apb_pci.c
@@ -29,6 +29,7 @@
 #include "sysbus.h"
 #include "pci.h"
 #include "pci_host.h"
+#include "pci_bridge.h"
 #include "rwhandler.h"
 #include "apb_pci.h"
 #include "sysemu.h"
diff --git a/hw/dec_pci.c b/hw/dec_pci.c
index 024c67c..b2759dd 100644
--- a/hw/dec_pci.c
+++ b/hw/dec_pci.c
@@ -27,6 +27,7 @@
 #include "sysbus.h"
 #include "pci.h"
 #include "pci_host.h"
+#include "pci_bridge.h"
 
 /* debug DEC */
 //#define DEBUG_DEC
diff --git a/hw/pci.c b/hw/pci.c
index 3777c1c..9ba62eb 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -277,24 +277,28 @@ PCIBus *pci_register_bus(DeviceState *parent, const char *name,
     return bus;
 }
 
-static void pci_register_secondary_bus(PCIBus *parent,
-                                       PCIBus *bus,
-                                       PCIDevice *dev,
-                                       pci_map_irq_fn map_irq,
-                                       const char *name)
+PCIBus *pci_register_secondary_bus(PCIBus *parent,
+                                   PCIDevice *dev,
+                                   pci_map_irq_fn map_irq,
+                                   const char *name)
 {
-    qbus_create_inplace(&bus->qbus, &pci_bus_info, &dev->qdev, name);
+    PCIBus *bus;
+    bus = pci_bus_new(&dev->qdev, name, 0);
+
     bus->map_irq = map_irq;
     bus->parent_dev = dev;
 
     QLIST_INIT(&bus->child);
     QLIST_INSERT_HEAD(&parent->child, bus, sibling);
+
+    return bus;
 }
 
-static void pci_unregister_secondary_bus(PCIBus *bus)
+void pci_unregister_secondary_bus(PCIBus *bus)
 {
     assert(QLIST_EMPTY(&bus->child));
     QLIST_REMOVE(bus, sibling);
+    qbus_free(&bus->qbus);
 }
 
 int pci_bus_num(PCIBus *s)
@@ -1466,20 +1470,12 @@ PCIDevice *pci_nic_init_nofail(NICInfo *nd, const char *default_model,
     return res;
 }
 
-typedef struct {
-    PCIDevice dev;
-    PCIBus bus;
-    uint32_t vid;
-    uint32_t did;
-} PCIBridge;
-
-
 static void pci_bridge_update_mappings_fn(PCIBus *b, PCIDevice *d)
 {
     pci_update_mappings(d);
 }
 
-static void pci_bridge_update_mappings(PCIBus *b)
+void pci_bridge_update_mappings(PCIBus *b)
 {
     PCIBus *child;
 
@@ -1490,21 +1486,6 @@ static void pci_bridge_update_mappings(PCIBus *b)
     }
 }
 
-static void pci_bridge_write_config(PCIDevice *d,
-                                    uint32_t address, uint32_t val, int len)
-{
-    pci_default_write_config(d, address, val, len);
-
-    if (/* io base/limit */
-        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
-
-        /* memory base/limit, prefetchable base/limit and
-           io base/limit upper 16 */
-        ranges_overlap(address, len, PCI_MEMORY_BASE, 20)) {
-        pci_bridge_update_mappings(d->bus);
-    }
-}
-
 PCIBus *pci_find_bus(PCIBus *bus, int bus_num)
 {
     PCIBus *sec;
@@ -1548,46 +1529,6 @@ PCIDevice *pci_find_device(PCIBus *bus, int bus_num, int slot, int function)
     return bus->devices[PCI_DEVFN(slot, function)];
 }
 
-static int pci_bridge_initfn(PCIDevice *dev)
-{
-    PCIBridge *s = DO_UPCAST(PCIBridge, dev, dev);
-
-    pci_config_set_vendor_id(s->dev.config, s->vid);
-    pci_config_set_device_id(s->dev.config, s->did);
-
-    pci_set_word(dev->config + PCI_STATUS,
-                 PCI_STATUS_66MHZ | PCI_STATUS_FAST_BACK);
-    pci_config_set_class(dev->config, PCI_CLASS_BRIDGE_PCI);
-    dev->config[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_BRIDGE;
-    pci_set_word(dev->config + PCI_SEC_STATUS,
-                 PCI_STATUS_66MHZ | PCI_STATUS_FAST_BACK);
-    return 0;
-}
-
-static int pci_bridge_exitfn(PCIDevice *pci_dev)
-{
-    PCIBridge *s = DO_UPCAST(PCIBridge, dev, pci_dev);
-    PCIBus *bus = &s->bus;
-    pci_unregister_secondary_bus(bus);
-    return 0;
-}
-
-PCIBus

[Qemu-devel] [PATCH 02/10] qdev: export qdev_reset() for later use.

2010-06-17 Thread Isaku Yamahata

export qdev_reset() for later use.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hw/qdev.c |   13 +
 hw/qdev.h |1 +
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/qdev.c b/hw/qdev.c
index 61f999c..378f842 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -256,13 +256,18 @@ DeviceState *qdev_device_add(QemuOpts *opts)
     return qdev;
 }
 
-static void qdev_reset(void *opaque)
+void qdev_reset(DeviceState *dev)
 {
-    DeviceState *dev = opaque;
     if (dev->info->reset)
         dev->info->reset(dev);
 }
 
+static void qdev_reset_fn(void *opaque)
+{
+    DeviceState *dev = opaque;
+    qdev_reset(dev);
+}
+
 /* Initialize a device.  Device properties should be set before calling
    this function.  IRQs and MMIO regions should be connected/mapped after
    calling this function.
@@ -278,7 +283,7 @@ int qdev_init(DeviceState *dev)
         qdev_free(dev);
         return rc;
     }
-    qemu_register_reset(qdev_reset, dev);
+    qemu_register_reset(qdev_reset_fn, dev);
     if (dev->info->vmsd) {
         vmstate_register_with_alias_id(-1, dev->info->vmsd, dev,
                                        dev->instance_id_alias,
@@ -348,7 +353,7 @@ void qdev_free(DeviceState *dev)
         if (dev->opts)
             qemu_opts_del(dev->opts);
     }
-    qemu_unregister_reset(qdev_reset, dev);
+    qemu_unregister_reset(qdev_reset_fn, dev);
     QLIST_REMOVE(dev, sibling);
     for (prop = dev->info->props; prop && prop->name; prop++) {
         if (prop->info->free) {
diff --git a/hw/qdev.h b/hw/qdev.h
index be5ad67..5fbdebf 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -113,6 +113,7 @@ typedef struct GlobalProperty {
 DeviceState *qdev_create(BusState *bus, const char *name);
 int qdev_device_help(QemuOpts *opts);
 DeviceState *qdev_device_add(QemuOpts *opts);
+void qdev_reset(DeviceState *dev);
 int qdev_init(DeviceState *dev) QEMU_WARN_UNUSED_RESULT;
 void qdev_init_nofail(DeviceState *dev);
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
-- 
1.6.6.1
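
The change above follows a common C pattern: expose a typed function for
direct callers and keep a small void * trampoline for the callback
registry. A minimal standalone sketch of the pattern follows. Everything
in it is hypothetical (struct device, register_reset(),
run_all_resets()); it only imitates the shape of qdev_reset() and
qdev_reset_fn(), not QEMU's actual reset machinery.

```c
#include <assert.h>
#include <stddef.h>

struct device {
    int reset_count;
};

/* Generic callback type, as a reset registry would expect it. */
typedef void reset_handler(void *opaque);

#define MAX_HANDLERS 8
static struct {
    reset_handler *fn;
    void *opaque;
} handlers[MAX_HANDLERS];
static size_t nhandlers;

static int register_reset(reset_handler *fn, void *opaque)
{
    if (nhandlers == MAX_HANDLERS) {
        return -1;
    }
    handlers[nhandlers].fn = fn;
    handlers[nhandlers].opaque = opaque;
    nhandlers++;
    return 0;
}

static void run_all_resets(void)
{
    for (size_t i = 0; i < nhandlers; i++) {
        handlers[i].fn(handlers[i].opaque);
    }
}

/* Typed entry point: other code can call this directly. */
static void device_reset(struct device *dev)
{
    dev->reset_count++;
}

/* Trampoline: adapts the typed function to the void * callback type. */
static void device_reset_fn(void *opaque)
{
    device_reset(opaque);
}
```

The typed function gives callers such as a bus-level reset compile-time
checking of the argument, while the registry keeps its generic
signature.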

[Qemu-devel] [PATCH 06/10] pci: eliminate work around in pci_device_reset().

2010-06-17 Thread Isaku Yamahata

Eliminate the workaround in pci_device_reset() by
making each pci reset function call pci_device_reset_default().
If a driver reset function isn't specified, set it to the pci default
reset function.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hw/e1000.c  |1 +
 hw/lsi53c895a.c |2 ++
 hw/pci.c|   12 
 hw/pcnet.c  |1 +
 hw/rtl8139.c|2 ++
 hw/virtio-pci.c |1 +
 6 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 0da65f9..448a743 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -1069,6 +1069,7 @@ static void e1000_reset(void *opaque)
     memmove(d->mac_reg, mac_reg_init, sizeof mac_reg_init);
     d->rxbuf_min_shift = 1;
     memset(&d->tx, 0, sizeof d->tx);
+    pci_device_reset_default(&d->dev);
 }
 
 static NetClientInfo net_e1000_info = {
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index f5a91ba..68723e3 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -358,6 +358,8 @@ static void lsi_soft_reset(LSIState *s)
         qemu_free(s->current);
         s->current = NULL;
     }
+
+    pci_device_reset_default(&s->dev);
 }
 
 static int lsi_dma_40bit(LSIState *s)
diff --git a/hw/pci.c b/hw/pci.c
index 87f5e6c..5dee102 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -171,6 +171,11 @@ void pci_device_reset_default(PCIDevice *dev)
     pci_update_mappings(dev);
 }
 
+static void pci_device_reset_default_fn(DeviceState *qdev)
+{
+    pci_device_reset_default(DO_UPCAST(PCIDevice, qdev, qdev));
+}
+
 void pci_device_reset(PCIDevice *dev)
 {
     if (!dev->qdev.info) {
@@ -180,10 +185,6 @@ void pci_device_reset(PCIDevice *dev)
     }
 
     qdev_reset(&dev->qdev);
-
-    /* TODO: make DeviceInfo::reset call
-       pci_device_reset_default() itself. */
-    pci_device_reset_default(dev);
 }
 
 void pci_bus_reset(PCIBus *bus)
@@ -1614,6 +1615,9 @@ void pci_qdev_register(PCIDeviceInfo *info)
     info->qdev.unplug = pci_unplug_device;
     info->qdev.exit = pci_unregister_device;
     info->qdev.bus_info = &pci_bus_info;
+    if (!info->qdev.reset) {
+        info->qdev.reset = pci_device_reset_default_fn;
+    }
     qdev_register(&info->qdev);
 }
 
diff --git a/hw/pcnet.c b/hw/pcnet.c
index 5e63eb5..c894d13 100644
--- a/hw/pcnet.c
+++ b/hw/pcnet.c
@@ -2036,6 +2036,7 @@ static void pci_reset(DeviceState *dev)
     PCIPCNetState *d = DO_UPCAST(PCIPCNetState, pci_dev.qdev, dev);
 
     pcnet_h_reset(&d->state);
+    pci_device_reset_default(&d->pci_dev);
 }
 
 static PCIDeviceInfo pcnet_info = {
diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index 72e2242..bfa7cde 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -1260,6 +1260,8 @@ static void rtl8139_reset(DeviceState *d)
 
     /* reset tally counters */
     RTL8139TallyCounters_clear(&s->tally_counters);
+
+    pci_device_reset_default(&s->dev);
 }
 
 static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters)
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index e101fa0..ea8ea6a 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -184,6 +184,7 @@ static void virtio_pci_reset(DeviceState *d)
     virtio_reset(proxy->vdev);
     msix_reset(&proxy->pci_dev);
     proxy->bugs = 0;
+    pci_device_reset_default(&proxy->pci_dev);
 }
 }
 
 static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
-- 
1.6.6.1
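
The convention introduced above has two halves: registration fills in a
default reset hook when a device model supplies none, and a device model
that does supply its own hook calls the default at the end. The sketch
below shows both halves on a toy device. All names (struct pdev,
pdev_register(), pdev_reset_default(), nic_reset()) are hypothetical and
the "command" field merely stands in for the config-space state that the
real default reset clears.

```c
#include <assert.h>
#include <stddef.h>

struct pdev;

struct pdev_info {
    const char *name;
    void (*reset)(struct pdev *dev);   /* may be NULL before registration */
};

struct pdev {
    const struct pdev_info *info;
    int command;       /* stands in for generic PCI state */
    int model_state;   /* stands in for device-specific state */
};

/* Generic part of the reset, shared by every device. */
static void pdev_reset_default(struct pdev *dev)
{
    dev->command = 0;
}

/* Registration installs the default when the model has no reset hook. */
static void pdev_register(struct pdev_info *info)
{
    if (!info->reset) {
        info->reset = pdev_reset_default;
    }
}

/* The bus-level reset only has to call the hook; no fallback is needed. */
static void pdev_reset(struct pdev *dev)
{
    dev->info->reset(dev);
}

/* A model with its own hook resets its state, then chains to the default. */
static void nic_reset(struct pdev *dev)
{
    dev->model_state = 0;
    pdev_reset_default(dev);
}
```

The cost of this design is that every custom reset hook must remember to
chain to the default, which is why the patch touches each device model
that already had one.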

[Qemu-devel] [PATCH 09/10] pci: set PCI multi-function bit appropriately.

2010-06-17 Thread Isaku Yamahata

set PCI multi-function bit appropriately.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp

---
changes v1 -> v2:
don't set header type register in configuration space.
---
 hw/pci.c |   25 +
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 5316aa5..ee391dc 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -607,6 +607,30 @@ static void pci_init_wmask_bridge(PCIDevice *d)
     pci_set_word(d->wmask + PCI_BRIDGE_CONTROL, 0x);
 }
 
+static void pci_init_multifunction(PCIBus *bus, PCIDevice *dev)
+{
+    uint8_t slot = PCI_SLOT(dev->devfn);
+    uint8_t func_max = 8;
+    uint8_t func;
+
+    for (func = 0; func < func_max; ++func) {
+        if (bus->devices[PCI_DEVFN(slot, func)]) {
+            break;
+        }
+    }
+    if (func == func_max) {
+        return;
+    }
+
+    for (func = 0; func < func_max; ++func) {
+        if (bus->devices[PCI_DEVFN(slot, func)]) {
+            bus->devices[PCI_DEVFN(slot, func)]->config[PCI_HEADER_TYPE] |=
+                PCI_HEADER_TYPE_MULTI_FUNCTION;
+        }
+    }
+    dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
+}
+
 static void pci_config_alloc(PCIDevice *pci_dev)
 {
     int config_size = pci_config_size(pci_dev);
@@ -660,6 +684,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
     if (is_bridge) {
         pci_init_wmask_bridge(pci_dev);
     }
+    pci_init_multifunction(bus, pci_dev);
 
     if (!config_read)
         config_read = pci_default_read_config;
-- 
1.6.6.1
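
The logic of pci_init_multifunction() above can be tried out in
isolation: when a new function is added to a slot that already holds at
least one function, the multi-function bit is set on every function in
that slot, including the new one. The following sketch is a simplified
model under stated assumptions: struct fn, struct mbus and
bus_add_function() are hypothetical, each device carries only its devfn
and header type byte, and the sketch also stores the device on the bus,
which the real code does in a separate step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* devfn packs a 5-bit slot number and a 3-bit function number. */
#define DEVFN(slot, func)   ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define SLOT(devfn)         (((devfn) >> 3) & 0x1f)
#define HDR_MULTI_FUNCTION  0x80

struct fn {
    uint8_t devfn;
    uint8_t header_type;
};

struct mbus {
    struct fn *devices[256];   /* indexed by devfn */
};

/* Add dev to the bus and mark its slot multi-function if it is shared. */
static void bus_add_function(struct mbus *bus, struct fn *dev)
{
    unsigned slot = SLOT(dev->devfn);
    int shared = 0;

    for (unsigned func = 0; func < 8; func++) {
        struct fn *other = bus->devices[DEVFN(slot, func)];
        if (other) {
            /* An existing function shares the slot: flag it. */
            other->header_type |= HDR_MULTI_FUNCTION;
            shared = 1;
        }
    }
    if (shared) {
        dev->header_type |= HDR_MULTI_FUNCTION;
    }
    bus->devices[dev->devfn] = dev;
}
```

A single-function device keeps a clear bit until a second function shows
up in its slot, at which point both are updated.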

[Qemu-devel] [PATCH 10/10] pci: don't overwrite multi function bit in pci header type.

2010-06-17 Thread Isaku Yamahata

Don't overwrite the pci header type.
Otherwise, the multi function bit which pci_init_multifunction() sets
appropriately is lost.
Anyway, PCI_HEADER_TYPE_NORMAL is zero, so it is unnecessary to zero
a register that is already zero-cleared.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp

---
changes v1 -> v2:
- set header type of bridge type in pci_bridge_initfn().
- dropped ugly hunk in apb_pci.c.
---
 hw/ac97.c |1 -
 hw/acpi_piix4.c   |1 -
 hw/apb_pci.c  |2 --
 hw/grackle_pci.c  |1 -
 hw/ide/cmd646.c   |1 -
 hw/ide/piix.c |1 -
 hw/macio.c|1 -
 hw/ne2000.c   |1 -
 hw/openpic.c  |1 -
 hw/pcnet.c|1 -
 hw/piix4.c|3 +--
 hw/piix_pci.c |4 +---
 hw/prep_pci.c |1 -
 hw/rtl8139.c  |1 -
 hw/sun4u.c|1 -
 hw/unin_pci.c |4 
 hw/usb-uhci.c |1 -
 hw/vga-pci.c  |1 -
 hw/virtio-pci.c   |1 -
 hw/vmware_vga.c   |1 -
 hw/wdt_i6300esb.c |1 -
 21 files changed, 2 insertions(+), 28 deletions(-)

diff --git a/hw/ac97.c b/hw/ac97.c
index 4319bc8..d71072d 100644
--- a/hw/ac97.c
+++ b/hw/ac97.c
@@ -1295,7 +1295,6 @@ static int ac97_initfn (PCIDevice *dev)
 c[PCI_REVISION_ID] = 0x01;  /* rid revision ro */
 c[PCI_CLASS_PROG] = 0x00;  /* pi programming interface ro */
 pci_config_set_class (c, PCI_CLASS_MULTIMEDIA_AUDIO); /* ro */
-c[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; /* headtyp header type ro */
 
 /* TODO set when bar is registered. no need to override. */
 /* nabmar native audio mixer base address rw */
diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 8d1a628..bfa1d9a 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -369,7 +369,6 @@ static int piix4_pm_initfn(PCIDevice *dev)
 pci_conf[0x08] = 0x03; // revision number
 pci_conf[0x09] = 0x00;
 pci_config_set_class(pci_conf, PCI_CLASS_BRIDGE_OTHER);
-pci_conf[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
 pci_conf[0x3d] = 0x01; // interrupt pin 1
 
 pci_conf[0x40] = 0x01; /* PM io base read only bit */
diff --git a/hw/apb_pci.c b/hw/apb_pci.c
index a1c17b9..3b8eda3 100644
--- a/hw/apb_pci.c
+++ b/hw/apb_pci.c
@@ -431,8 +431,6 @@ static int pbm_pci_host_init(PCIDevice *d)
  PCI_STATUS_FAST_BACK | PCI_STATUS_66MHZ |
  PCI_STATUS_DEVSEL_MEDIUM);
 pci_config_set_class(d-config, PCI_CLASS_BRIDGE_HOST);
-pci_set_byte(d-config + PCI_HEADER_TYPE,
- PCI_HEADER_TYPE_NORMAL);
 return 0;
 }
 
diff --git a/hw/grackle_pci.c b/hw/grackle_pci.c
index aa0c51b..b3a5f54 100644
--- a/hw/grackle_pci.c
+++ b/hw/grackle_pci.c
@@ -126,7 +126,6 @@ static int grackle_pci_host_init(PCIDevice *d)
 d-config[0x08] = 0x00; // revision
 d-config[0x09] = 0x01;
 pci_config_set_class(d-config, PCI_CLASS_BRIDGE_HOST);
-d-config[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
 return 0;
 }
 
diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index 559147f..756ee81 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -240,7 +240,6 @@ static int pci_cmd646_ide_initfn(PCIDevice *dev)
 pci_conf[PCI_CLASS_PROG] = 0x8f;
 
 pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_IDE);
-pci_conf[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
 
 pci_conf[0x51] = 0x04; // enable IDE0
 if (d-secondary) {
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index dad6e86..8817915 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -122,7 +122,6 @@ static int pci_piix_ide_initfn(PCIIDEState *d)
 
 pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
 pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_IDE);
-pci_conf[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
 
 qemu_register_reset(piix3_reset, d);
 
diff --git a/hw/macio.c b/hw/macio.c
index e92e82a..789ca55 100644
--- a/hw/macio.c
+++ b/hw/macio.c
@@ -110,7 +110,6 @@ void macio_init (PCIBus *bus, int device_id, int is_oldworld, int pic_mem_index,
 pci_config_set_vendor_id(d->config, PCI_VENDOR_ID_APPLE);
 pci_config_set_device_id(d->config, device_id);
 pci_config_set_class(d->config, PCI_CLASS_OTHERS << 8);
-d->config[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
 
 d->config[0x3d] = 0x01; // interrupt on pin 1
 
diff --git a/hw/ne2000.c b/hw/ne2000.c
index 78fe14f..126e7cf 100644
--- a/hw/ne2000.c
+++ b/hw/ne2000.c
@@ -723,7 +723,6 @@ static int pci_ne2000_init(PCIDevice *pci_dev)
 pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_REALTEK);
 pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_REALTEK_8029);
 pci_config_set_class(pci_conf, PCI_CLASS_NETWORK_ETHERNET);
-pci_conf[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
 /* TODO: RST# value should be 0. PCI spec 6.2.4 */
 pci_conf[PCI_INTERRUPT_PIN] = 1; // interrupt pin 0
 
diff --git a/hw/openpic.c b/hw/openpic.c
index ac21993..2bbf787 100644
--- a/hw/openpic.c
+++ b/hw/openpic.c
@@ -1194,7 +1194,6 @@ qemu_irq

[Qemu-devel] Re: [SeaBIOS] [PATCHv2] load hpet info for HPET ACPI table from qemu

2010-06-17 Thread Gleb Natapov

On Wed, Jun 16, 2010 at 08:55:05PM -0400, Kevin O'Connor wrote:
 On Tue, Jun 15, 2010 at 07:41:02AM +0300, Avi Kivity wrote:
  On 06/14/2010 09:25 PM, Kevin O'Connor wrote:
  This seems to be a philosophical distinction.  Lets go over the
  practical implications.
  In my experience, well-defined interfaces (philosophical
  distinctions) are more important in the long term than
  practicalities.
 
 I agree with the importance of clean interfaces.  However, I feel the
 current approach to acpi table handling between qemu and seabios is
 not ideal.  I'm proposing an interface which I believe is an
 improvement.
 
What is it? If you mean passing ACPI tables, then this is a huge step
backwards. Qemu started like that, and then Fabrice moved everything to
the BIOS, where it should be. Check these commits in qemu git:
2146e8389f267a5fb751106b0dfc6421808ccbd0
362ee297a6a977801758302c68b4ef5d1af76ab3
a0910fa4a5371c2cc70cb9f63ccc65b66decbb22
71b300df920a39db4368723765eed533f5fc209b

  If a table needs to refer to some other information which is in a
  table that is generated by seabios, we cannot generate this table
  from qemu.  That's much worse that reviewing and applying two
  patches.
 
 I understand.  Such tables would not make sense to generate in qemu.
 
So now we generate part of the tables in qemu and part in seabios. This
is not sane.

 I also understand and appreciate the desire for qemu to not touch
 guest memory.  This means rsdp and rsdt are the domain of seabios.
 The only other table that directly addresses the memory location of
 another table (that I know of) is fadt - which is also tied to
 seabios' smi handler - so this too is in seabios' domain.
 
You see, now you are starting to rationalize about which tables should
be generated by the BIOS and which should not. The fact is that ACPI was
designed on the assumption that firmware generates the tables, so how can
you be sure that you will not find flaws in your rationalization later on?

 I'm not aware of any dependencies to seabios in any of the other
 tables (eg, madt, srat, hpet, ssdt, dsdt).
With certain HW configs, firmware may need to configure the hpet to function
in legacy mode (replacing RTC/PIT), so the fact that seabios is unaware of
the hpet now doesn't mean it will stay that way.

 
  I'm not suggesting a radical rethink of fwcfg, but I fail to see the
  advantage in introducing the arbitrary struct hpet_fw_entry when
  there is a perfectly good, well defined, struct acpi_20_hpet that
  already exists.  This new arbitrary intermediate format just
  introduces make work for all of us.
  
  Choosing an existing format is fine.  But seabios blindly copying
  qemu provided data is wrong IMO.
 
 Okay.  For struct acpi_20_hpet, what transformations or checks do
 you think seabios needs to perform?
 
Let's concentrate on principles and not on hpet in particular. Just because
the proposed interface is similar to how the HPET ACPI table looks doesn't
mean we should build ACPI tables in qemu. I could have been lazy and passed
only a "has hpet" flag to seabios, but I tried to make the interface
general enough to cover future qemu enhancements.

But let's get back to principles. Qemu is not one platform but a very
dynamic system, and firmware is tightly coupled with the platform it runs
on. Since we do not want a separate firmware for each possible
configuration, the only solution is to make the firmware dynamically
adaptable to the platform it runs on. For that, every bit of information
about the underlying HW should be discoverable by the firmware. Not all HW was
designed to be discoverable by software, so we need to create a channel
between qemu and the firmware to pass information about otherwise undiscoverable
devices. Hpet is only one such device, so we need a way to pass the platform
device tree into the firmware. We can find a generic way to do this for all
devices, or we may introduce a separate interface for each device when
needed, like the proposed patch does.

--
Gleb.
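
As a rough illustration of the per-device channel argued for above, the
sketch below shows firmware checking an emulator-supplied device description
before building its own table from it, rather than copying it blindly. Every
name here (example_hpet_fw_entry, example_hpet_table,
example_build_hpet_table) and every field is an assumption made for the
example; the thread names struct hpet_fw_entry and struct acpi_20_hpet but
does not show their layout, and the page-alignment check is only a stand-in
for whatever validation real firmware would want.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one HPET block, as the emulator might pass it. */
struct example_hpet_fw_entry {
    uint32_t event_timer_block_id;
    uint64_t address;
    uint16_t min_tick;
    uint8_t page_prot;
};

/* Hypothetical, simplified firmware-side table; not the real ACPI layout. */
struct example_hpet_table {
    char signature[4];
    uint32_t timer_block_id;
    uint64_t base_address;
    uint16_t min_tick;
    uint8_t page_prot;
};

/*
 * Build the firmware's table from the emulator's entry.
 * Returns 0 on success, -1 if the entry is rejected.
 */
static int example_build_hpet_table(const struct example_hpet_fw_entry *e,
                                    struct example_hpet_table *t)
{
    assert(e != NULL && t != NULL);

    /* Illustrative sanity check: refuse unmapped or unaligned blocks. */
    if (e->address == 0 || (e->address & 0xfff) != 0) {
        return -1;
    }

    memset(t, 0, sizeof(*t));
    memcpy(t->signature, "HPET", 4);
    t->timer_block_id = e->event_timer_block_id;
    t->base_address = e->address;
    t->min_tick = e->min_tick;
    t->page_prot = e->page_prot;
    return 0;
}
```

The point of the split is that the emulator only describes the hardware,
while the decision about what ends up in the table stays with the firmware.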

[Qemu-devel] Re: [SeaBIOS] [PATCHv2] load hpet info for HPET ACPI table from qemu

2010-06-17 Thread Peter Stuge

Avi Kivity wrote:
 In general, ACPI code can work with memory or device registers that have 
 been initialized by the BIOS and depend on them.  It's possible to write 
 ACPI code that depends on preceding BIOS code.

It's also possible to write C code that makes extensive use of goto. :)

To be fair, ACPI bytecode for actual hardware may need to rely on the
firmware because of limitations in the hardware. But I think qemu is
one instance where any ACPI bytecode can be kept simple.


 I don't know if that's the case with our ACPI implementation.

Where is that code? Who has the ASL for qemu? Who wrote it? Is it big?


//Peter

Re: [Qemu-devel] [PATCH 02/10] qdev: export qdev_reset() for later use.

2010-06-17 Thread Markus Armbruster

Isaku Yamahata yamah...@valinux.co.jp writes:

 export qdev_reset() for later use.

 Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
 ---
  hw/qdev.c |   13 +
  hw/qdev.h |1 +
  2 files changed, 10 insertions(+), 4 deletions(-)

 diff --git a/hw/qdev.c b/hw/qdev.c
 index 61f999c..378f842 100644
 --- a/hw/qdev.c
 +++ b/hw/qdev.c
 @@ -256,13 +256,18 @@ DeviceState *qdev_device_add(QemuOpts *opts)
  return qdev;
  }
  
 -static void qdev_reset(void *opaque)
 +void qdev_reset(DeviceState *dev)
  {
 -DeviceState *dev = opaque;
  if (dev->info->reset)
  dev->info->reset(dev);
  }
  
 +static void qdev_reset_fn(void *opaque)
 +{
 +DeviceState *dev = opaque;
 +qdev_reset(dev);
 +}
 +

Nitpick: why the local variable?

[...]
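
The nitpick can be shown with a stand-alone sketch: in C a void pointer
converts implicitly to any object pointer type, so the wrapper can forward
opaque directly. DeviceState and DeviceInfo below are minimal stand-ins, not
the real qdev definitions, and example_reset with its counter is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DeviceState DeviceState;

/* Minimal stand-in for the qdev device info: only the reset hook. */
typedef struct DeviceInfo {
    void (*reset)(DeviceState *dev);
} DeviceInfo;

struct DeviceState {
    DeviceInfo *info;
    int reset_count;    /* hypothetical, used only to observe the call */
};

/* Exported reset entry point, as in the patch. */
void qdev_reset(DeviceState *dev)
{
    assert(dev != NULL);
    if (dev->info->reset) {
        dev->info->reset(dev);
    }
}

/* Callback with the void * signature; no local variable is needed. */
static void qdev_reset_fn(void *opaque)
{
    qdev_reset(opaque);
}

/* Hypothetical device reset hook. */
static void example_reset(DeviceState *dev)
{
    dev->reset_count++;
}
```

Whether to keep the local is a matter of taste; it documents the expected
type at the cost of one extra line.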

[Qemu-devel] Re: [PATCH] SeaBIOS: Fix bvprintf() to respect padding for hex printing.

2010-06-17 Thread Jes Sorensen

On 06/17/10 04:42, Kevin O'Connor wrote:
 On Mon, Jun 14, 2010 at 02:04:31PM +0200, jes.soren...@redhat.com wrote:
 From: Jes Sorensen jes.soren...@redhat.com

 Fix bvprintf to respect space padding when printing hex numbers
 and the caller specifies alignment without zero padding, eg. %2x
 as opposed to %02x
 
 I thought your patch would increase stack space in 16bit mode, but
 oddly it seems to actually reduce stack space (at least on gcc4.4.4).
 
 So, the patch looks good, but I think you missed the case where the
 length given is smaller than the actual number, and %p needs to use
 zero padding.  How about the below instead.

Hi Kevin,

DOH, you're right! Your patch looks good to me so
Signed-off-by: Jes Sorensen jes.soren...@redhat.com

Thanks for catching this.

Cheers,
Jes
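
For reference, the distinction under discussion is the one standard C printf
makes between a field width with and without the 0 flag. The sketch below
uses the C library's snprintf, not SeaBIOS's bvprintf, and the two helper
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format value as hex in a field of width 2, padded with spaces. */
static const char *example_hex_space_padded(char *buf, size_t len,
                                            unsigned value)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%2x", value);
    return buf;
}

/* Format value as hex in a field of width 2, padded with zeros. */
static const char *example_hex_zero_padded(char *buf, size_t len,
                                           unsigned value)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%02x", value);
    return buf;
}
```

With the value 0xa the first helper yields " a" and the second "0a". A value
wider than the field, such as 0x1234, is printed in full in both cases,
which is the situation Kevin points out for widths smaller than the number.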

Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Jan Kiszka

Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 06:00:56PM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 12:35:16PM +0300, Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 11:33:13AM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 09:57:35AM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 09:51:14AM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 09:03:01AM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 12:40:28AM +0200, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 There is no need to start with the special value for
 hpet_cfg.count.
 Either Seabios is aware of the new firmware interface and properly
 interprets the counter or it simply ignores it anyway.

 I want seabios to be able to distinguish between old qemu and new 
 one.
 I see now. But isn't it a good chance to introduce a proper generic
 interface for exploring supported fw-cfg keys?

 Having such an interface would be nice. Pity we haven't introduced it from
 the start. If we do it now, seabios will have to find out somehow that
 qemu supports such an interface. Chicken and egg ;)
 That is easy: Add a key that describes the highest supported key value
 (looks like this is monotonically increasing). Older qemu versions will
 return 0.

 That will not support holes in key space, and our key space is already
 sparse.
 Then add a service to obtain a bitmap of supported keys. If that bitmap
 is empty...

 Bitmap will be 2k long. We can add read capability to control port. To
 check if key is present you select it (write its value to control port)
 and then read control port back. If the value is non-zero the key is valid.
 But how to detect qemu that does not support that?
 Isn't there some key that was always there and will always be?

 FW_CFG_SIGNATURE

 So any ideas? Or did I misunderstand your hint? ;)
 I thought you found the answer yourself:

 Seabios could select FW_CFG_SIGNATURE and then perform a read-back on
 the control register. Older QEMUs will return -1, versions that support
 the read-back 0. Problem solved, no?

 AFAIK QEMU returns 0 if io read was done from non-used port or mmio
 address, but can we rely on this? If we can then problem solved, if
 we can't then no.

It works for IO-based fw-cfg, but not for MMIO-based. So the firmware
should probably pick a non-zero key for this check, e.g. FW_CFG_ID.

Jan
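
The probe discussed in this thread can be modelled in a few lines. This is a
sketch under stated assumptions, not the fw-cfg implementation: the device
model example_fw_cfg, the idea that a read-back returns the selected key, and
the key value used for FW_CFG_ID are all assumptions for illustration. The
unused_read field stands for whatever a read of an unimplemented register
yields on a given bus (-1 for port I/O, 0 for MMIO, per the messages above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_FW_CFG_SIGNATURE 0x00   /* assumed key value */
#define EXAMPLE_FW_CFG_ID        0x01   /* assumed key value */

/* Hypothetical model of the control register as seen by firmware. */
struct example_fw_cfg {
    bool has_readback;     /* emulator implements control-register reads */
    uint16_t unused_read;  /* value an unimplemented read returns */
    uint16_t selected;     /* last key written to the control register */
};

static void example_select(struct example_fw_cfg *dev, uint16_t key)
{
    dev->selected = key;
}

static uint16_t example_read_control(const struct example_fw_cfg *dev)
{
    return dev->has_readback ? dev->selected : dev->unused_read;
}

/* Select a key, read the control register back, compare with the key. */
static bool example_probe_readback(struct example_fw_cfg *dev, uint16_t key)
{
    assert(dev != NULL);
    example_select(dev, key);
    return example_read_control(dev) == key;
}
```

Probing with a zero key cannot tell a read-back-capable device from an old
MMIO one that returns 0 for unused addresses, which is why a non-zero key is
suggested.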




[Qemu-devel] [PATCH 2/4] qemu: Enable XSAVE related CPUID

2010-06-17 Thread Sheng Yang

We can support it in KVM now. The 0xd leaf is queried from KVM.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 target-i386/cpuid.c |   21 +
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index 99d1f44..ab6536b 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -1067,6 +1067,27 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 *ecx = 0;
 *edx = 0;
 break;
+case 0xD:
+/* Processor Extended State */
+if (!(env->cpuid_ext_features & CPUID_EXT_XSAVE)) {
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+break;
+}
+if (kvm_enabled()) {
+*eax = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EAX);
+*ebx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EBX);
+*ecx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_ECX);
+*edx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EDX);
+} else {
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+}
+break;
 case 0x80000000:
 *eax = env->cpuid_xlevel;
 *ebx = env->cpuid_vendor1;
-- 
1.7.0.1

[Qemu-devel] [PATCH 3/4] qemu: kvm: Enable XSAVE live migration support

2010-06-17 Thread Sheng Yang


Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 target-i386/cpu.h |5 ++
 target-i386/kvm.c |  134 +
 target-i386/machine.c |   20 +++
 3 files changed, 159 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 548ab80..75070d3 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -718,6 +718,11 @@ typedef struct CPUX86State {
 uint16_t fpus_vmstate;
 uint16_t fptag_vmstate;
 uint16_t fpregs_format_vmstate;
+
+uint64_t xstate_bv;
+XMMReg ymmh_regs[CPU_NB_REGS];
+
+uint64_t xcr0;
 } CPUX86State;
 
 CPUX86State *cpu_x86_init(const char *cpu_model);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index bb6a12f..90ff323 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -289,6 +289,8 @@ void kvm_arch_reset_vcpu(CPUState *env)
 } else {
 env->mp_state = KVM_MP_STATE_RUNNABLE;
 }
+/* Legal xcr0 for loading */
+env->xcr0 = 1;
 }
 
 static int kvm_has_msr_star(CPUState *env)
@@ -504,6 +506,57 @@ static int kvm_put_fpu(CPUState *env)
 return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
 }
 
+#ifdef KVM_CAP_XSAVE
+
+#define XSAVE_CWD_RIP     2
+#define XSAVE_CWD_RDP     4
+#define XSAVE_MXCSR       6
+#define XSAVE_ST_SPACE    8
+#define XSAVE_XMM_SPACE   40
+#define XSAVE_XSTATE_BV   128
+#define XSAVE_YMMH_SPACE  144
+
+static int kvm_put_xsave(CPUState *env)
+{
+int i;
+struct kvm_xsave* xsave;
+uint16_t cwd, swd, twd, fop;
+
+xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+memset(xsave, 0, sizeof(struct kvm_xsave));
+cwd = swd = twd = fop = 0;
+swd = env->fpus & ~(7 << 11);
+swd |= (env->fpstt & 7) << 11;
+cwd = env->fpuc;
+for (i = 0; i < 8; ++i)
+twd |= (!env->fptags[i]) << i;
+xsave->region[0] = (uint32_t)(swd << 16) + cwd;
+xsave->region[1] = (uint32_t)(fop << 16) + twd;
+memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
+sizeof env->fpregs);
+memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
+sizeof env->xmm_regs);
+xsave->region[XSAVE_MXCSR] = env->mxcsr;
+*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
+memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
+sizeof env->ymmh_regs);
+return kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
+}
+#endif
+
+#ifdef KVM_CAP_XCRS
+static int kvm_put_xcrs(CPUState *env)
+{
+struct kvm_xcrs xcrs;
+
+xcrs.nr_xcrs = 1;
+xcrs.flags = 0;
+xcrs.xcrs[0].xcr = 0;
+xcrs.xcrs[0].value = env->xcr0;
+return kvm_vcpu_ioctl(env, KVM_SET_XCRS, &xcrs);
+}
+#endif
+
 static int kvm_put_sregs(CPUState *env)
 {
 struct kvm_sregs sregs;
@@ -621,6 +674,59 @@ static int kvm_get_fpu(CPUState *env)
 return 0;
 }
 
+#ifdef KVM_CAP_XSAVE
+static int kvm_get_xsave(CPUState *env)
+{
+struct kvm_xsave* xsave;
+int ret, i;
+uint16_t cwd, swd, twd, fop;
+
+xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+ret = kvm_vcpu_ioctl(env, KVM_GET_XSAVE, xsave);
+if (ret < 0)
+return ret;
+
+cwd = (uint16_t)xsave->region[0];
+swd = (uint16_t)(xsave->region[0] >> 16);
+twd = (uint16_t)xsave->region[1];
+fop = (uint16_t)(xsave->region[1] >> 16);
+env->fpstt = (swd >> 11) & 7;
+env->fpus = swd;
+env->fpuc = cwd;
+for (i = 0; i < 8; ++i)
+env->fptags[i] = !((twd >> i) & 1);
+env->mxcsr = xsave->region[XSAVE_MXCSR];
+memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
+sizeof env->fpregs);
+memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
+sizeof env->xmm_regs);
+env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
+memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
+sizeof env->ymmh_regs);
+return 0;
+}
+#endif
+
+#ifdef KVM_CAP_XCRS
+static int kvm_get_xcrs(CPUState *env)
+{
+int i, ret;
+struct kvm_xcrs xcrs;
+
+ret = kvm_vcpu_ioctl(env, KVM_GET_XCRS, &xcrs);
+if (ret < 0)
+return ret;
+
+for (i = 0; i < xcrs.nr_xcrs; i++)
+/* Only support xcr0 now */
+if (xcrs.xcrs[0].xcr == 0) {
+env->xcr0 = xcrs.xcrs[0].value;
+break;
+}
+return 0;
+}
+#endif
+
 static int kvm_get_sregs(CPUState *env)
 {
 struct kvm_sregs sregs;
@@ -965,9 +1071,23 @@ int kvm_arch_put_registers(CPUState *env, int level)
 if (ret < 0)
 return ret;
 
+#ifdef KVM_CAP_XSAVE
+if (kvm_check_extension(env->kvm_state, KVM_CAP_XSAVE))
+ret = kvm_put_xsave(env);
+else
+ret = kvm_put_fpu(env);
+#else
 ret = kvm_put_fpu(env);
+#endif
+if (ret < 0)
+return ret;
+
+#ifdef KVM_CAP_XCRS
+if (kvm_check_extension(env->kvm_state, KVM_CAP_XCRS))
+ret = kvm_put_xcrs(env);
 if (ret < 0)
 return ret;
+#endif
 
 ret = kvm_put_sregs(env);
 if (ret < 0)
@@ -1009,9 +1129,23 @@ int kvm_arch_get_registers(CPUState *env)
 if (ret < 0)
 return ret;
 
+#ifdef KVM_CAP_XSAVE
+if
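
The word packing that the patch above performs for region[0] and region[1]
can be isolated in a small sketch. The struct and function names here are
hypothetical. The sketch widens each 16-bit value before shifting it, which
is a deliberate deviation from the patch: shifting a promoted uint16_t left
by 16 can overflow int. The byte-offset helper only spells out that the
XSAVE_* constants index 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four 16-bit legacy FPU words stored at the start of the area. */
struct example_fpu_words {
    uint16_t cwd;   /* control word */
    uint16_t swd;   /* status word */
    uint16_t twd;   /* abridged tag word */
    uint16_t fop;   /* last opcode */
};

/* Pack: low half of region[0] is cwd, high half is swd; same for twd/fop. */
static void example_pack(const struct example_fpu_words *w, uint32_t region[2])
{
    assert(w != NULL && region != NULL);
    region[0] = ((uint32_t)w->swd << 16) | w->cwd;
    region[1] = ((uint32_t)w->fop << 16) | w->twd;
}

/* Unpack: the inverse of example_pack(). */
static void example_unpack(const uint32_t region[2],
                           struct example_fpu_words *w)
{
    assert(w != NULL && region != NULL);
    w->cwd = (uint16_t)region[0];
    w->swd = (uint16_t)(region[0] >> 16);
    w->twd = (uint16_t)region[1];
    w->fop = (uint16_t)(region[1] >> 16);
}

/* Convert an index into a uint32_t array into a byte offset. */
static unsigned example_byte_offset(unsigned word_index)
{
    return word_index * (unsigned)sizeof(uint32_t);
}
```

With this helper, word index 128 (XSAVE_XSTATE_BV) is byte offset 512 and
word index 144 (XSAVE_YMMH_SPACE) is byte offset 576.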

[Qemu-devel] [PATCH 1/4] qemu: kvm: Extend kvm_arch_get_supported_cpuid() to support index

2010-06-17 Thread Sheng Yang

This will be used later for XSAVE-related CPUID.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 kvm.h |2 +-
 target-i386/kvm.c |   19 +++
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/kvm.h b/kvm.h
index a28e7aa..7975e87 100644
--- a/kvm.h
+++ b/kvm.h
@@ -145,7 +145,7 @@ bool kvm_arch_stop_on_emulation_error(CPUState *env);
 int kvm_check_extension(KVMState *s, unsigned int extension);
 
 uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
-  int reg);
+  uint32_t index, int reg);
 void kvm_cpu_synchronize_state(CPUState *env);
 void kvm_cpu_synchronize_post_reset(CPUState *env);
 void kvm_cpu_synchronize_post_init(CPUState *env);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5a088a7..bb6a12f 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -72,7 +72,8 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
 return cpuid;
 }
 
-uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, int reg)
+uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
+  uint32_t index, int reg)
 {
 struct kvm_cpuid2 *cpuid;
 int i, max;
@@ -89,7 +90,8 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, int reg)
 }
 
 for (i = 0; i < cpuid->nent; ++i) {
-if (cpuid->entries[i].function == function) {
+if (cpuid->entries[i].function == function &&
+cpuid->entries[i].index == index) {
 switch (reg) {
 case R_EAX:
 ret = cpuid->entries[i].eax;
@@ -111,7 +113,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, int reg)
 /* On Intel, kvm returns cpuid according to the Intel spec,
  * so add missing bits according to the AMD spec:
  */
-cpuid_1_edx = kvm_arch_get_supported_cpuid(env, 1, R_EDX);
+cpuid_1_edx = kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX);
 ret |= cpuid_1_edx & 0x183f7ff;
 break;
 }
@@ -127,7 +129,8 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, int reg)
 
 #else
 
-uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, int reg)
+uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
+  uint32_t index, int reg)
 {
 return -1U;
 }
@@ -179,16 +182,16 @@ int kvm_arch_init_vcpu(CPUState *env)
 
 env->mp_state = KVM_MP_STATE_RUNNABLE;
 
-env->cpuid_features &= kvm_arch_get_supported_cpuid(env, 1, R_EDX);
+env->cpuid_features &= kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX);
 
 i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR;
-env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(env, 1, R_ECX);
+env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(env, 1, 0, R_ECX);
 env->cpuid_ext_features |= i;
 
 env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(env, 0x80000001,
- R_EDX);
+ 0, R_EDX);
 env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(env, 0x80000001,
- R_ECX);
+ 0, R_ECX);
 
 cpuid_i = 0;
 
-- 
1.7.0.1
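
The reason for the extra index argument is that some CPUID functions, leaf
0xD among them, return different data per sub-leaf, so matching on the
function alone is ambiguous. The lookup can be sketched stand-alone as
follows; example_cpuid_entry and example_find_entry are hypothetical names,
not the kvm_cpuid2 structures used in the patch, and any register values
placed in such a table are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one supported-CPUID entry. */
struct example_cpuid_entry {
    uint32_t function;  /* CPUID leaf (value in EAX) */
    uint32_t index;     /* CPUID sub-leaf (value in ECX) */
    uint32_t eax, ebx, ecx, edx;
};

/*
 * Return the entry matching both function and index, or NULL if the
 * table has no such entry.
 */
static const struct example_cpuid_entry *
example_find_entry(const struct example_cpuid_entry *entries, size_t n,
                   uint32_t function, uint32_t index)
{
    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (entries[i].function == function && entries[i].index == index) {
            return &entries[i];
        }
    }
    return NULL;
}
```

Callers that query leaves without sub-leaves pass index 0, which is what the
updated call sites in the patch do.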

[Qemu-devel] [PATCH v4 0/4] XSAVE enabling in QEmu

2010-06-17 Thread Sheng Yang

Note that the first three patches apply to the uq/master branch of qemu-kvm, and
the last one applies to the qemu-kvm master branch. The last one will only apply
after the first three have been merged into the master branch.

[Qemu-devel] [PATCH 4/4] qemu-kvm: Enable XSAVE live migration support

2010-06-17 Thread Sheng Yang

Based on upstream xsave related fields.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 qemu-kvm-x86.c |   95 +++-
 qemu-kvm.c |   24 ++
 qemu-kvm.h |   28 
 3 files changed, 146 insertions(+), 1 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 3c33e64..dcef8b5 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -772,10 +772,26 @@ static void get_seg(SegmentCache *lhs, const struct kvm_segment *rhs)
 | (rhs->avl * DESC_AVL_MASK);
 }
 
+#ifdef KVM_CAP_XSAVE
+#define XSAVE_CWD_RIP     2
+#define XSAVE_CWD_RDP     4
+#define XSAVE_MXCSR       6
+#define XSAVE_ST_SPACE    8
+#define XSAVE_XMM_SPACE   40
+#define XSAVE_XSTATE_BV   128
+#define XSAVE_YMMH_SPACE  144
+#endif
+
 void kvm_arch_load_regs(CPUState *env, int level)
 {
 struct kvm_regs regs;
 struct kvm_fpu fpu;
+#ifdef KVM_CAP_XSAVE
+struct kvm_xsave* xsave;
+#endif
+#ifdef KVM_CAP_XCRS
+struct kvm_xcrs xcrs;
+#endif
 struct kvm_sregs sregs;
 struct kvm_msr_entry msrs[100];
 int rc, n, i;
@@ -806,16 +822,53 @@ void kvm_arch_load_regs(CPUState *env, int level)
 
 kvm_set_regs(env, &regs);
 
+#ifdef KVM_CAP_XSAVE
+if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
+uint16_t cwd, swd, twd, fop;
+
+xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+memset(xsave, 0, sizeof(struct kvm_xsave));
+cwd = swd = twd = fop = 0;
+swd = env->fpus & ~(7 << 11);
+swd |= (env->fpstt & 7) << 11;
+cwd = env->fpuc;
+for (i = 0; i < 8; ++i)
+twd |= (!env->fptags[i]) << i;
+xsave->region[0] = (uint32_t)(swd << 16) + cwd;
+xsave->region[1] = (uint32_t)(fop << 16) + twd;
+memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
+sizeof env->fpregs);
+memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
+sizeof env->xmm_regs);
+xsave->region[XSAVE_MXCSR] = env->mxcsr;
+*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
+memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
+sizeof env->ymmh_regs);
+kvm_set_xsave(env, xsave);
+#ifdef KVM_CAP_XCRS
+if (kvm_check_extension(kvm_state, KVM_CAP_XCRS)) {
+xcrs.nr_xcrs = 1;
+xcrs.flags = 0;
+xcrs.xcrs[0].xcr = 0;
+xcrs.xcrs[0].value = env->xcr0;
+kvm_set_xcrs(env, &xcrs);
+}
+#endif /* KVM_CAP_XCRS */
+} else {
+#endif /* KVM_CAP_XSAVE */
 memset(&fpu, 0, sizeof fpu);
 fpu.fsw = env->fpus & ~(7 << 11);
 fpu.fsw |= (env->fpstt & 7) << 11;
 fpu.fcw = env->fpuc;
 for (i = 0; i < 8; ++i)
-   fpu.ftwx |= (!env->fptags[i]) << i;
+fpu.ftwx |= (!env->fptags[i]) << i;
 memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
 memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
 fpu.mxcsr = env->mxcsr;
 kvm_set_fpu(env, &fpu);
+#ifdef KVM_CAP_XSAVE
+}
+#endif
 
 memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
 if (env->interrupt_injected >= 0) {
@@ -934,6 +987,12 @@ void kvm_arch_save_regs(CPUState *env)
 {
 struct kvm_regs regs;
 struct kvm_fpu fpu;
+#ifdef KVM_CAP_XSAVE
+struct kvm_xsave* xsave;
+#endif
+#ifdef KVM_CAP_XCRS
+struct kvm_xcrs xcrs;
+#endif
 struct kvm_sregs sregs;
 struct kvm_msr_entry msrs[100];
 uint32_t hflags;
@@ -965,6 +1024,37 @@ void kvm_arch_save_regs(CPUState *env)
 env->eflags = regs.rflags;
 env->eip = regs.rip;
 
+#ifdef KVM_CAP_XSAVE
+if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
+uint16_t cwd, swd, twd, fop;
+xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+kvm_get_xsave(env, xsave);
+cwd = (uint16_t)xsave->region[0];
+swd = (uint16_t)(xsave->region[0] >> 16);
+twd = (uint16_t)xsave->region[1];
+fop = (uint16_t)(xsave->region[1] >> 16);
+env->fpstt = (swd >> 11) & 7;
+env->fpus = swd;
+env->fpuc = cwd;
+for (i = 0; i < 8; ++i)
+env->fptags[i] = !((twd >> i) & 1);
+env->mxcsr = xsave->region[XSAVE_MXCSR];
+memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
+sizeof env->fpregs);
+memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
+sizeof env->xmm_regs);
+env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
+memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
+sizeof env->ymmh_regs);
+#ifdef KVM_CAP_XCRS
+if (kvm_check_extension(kvm_state, KVM_CAP_XCRS)) {
+kvm_get_xcrs(env, &xcrs);
+if (xcrs.xcrs[0].xcr == 0)
+env->xcr0 = xcrs.xcrs[0].value;
+}
+#endif
+} else {
+#endif
 kvm_get_fpu(env, &fpu);
 env->fpstt = (fpu.fsw >> 11) & 7;
 env->fpus = fpu.fsw;
@@ -974,6 +1064,9 @@ void kvm_arch_save_regs(CPUState *env)
 memcpy(env->fpregs, fpu.fpr, sizeof env->fpregs);
 memcpy(env->xmm_regs,

[Qemu-devel] Re: [PATCH v3] qemu: kvm: Enable XSAVE live migration support

2010-06-17 Thread Jan Kiszka

Sheng Yang wrote:
 On Thursday 17 June 2010 00:05:44 Marcelo Tosatti wrote:
 On Wed, Jun 16, 2010 at 05:48:46PM +0200, Jan Kiszka wrote:
 Marcelo Tosatti wrote:
 On Fri, Jun 11, 2010 at 12:36:49PM +0800, Sheng Yang wrote:
 Signed-off-by: Sheng Yang sh...@linux.intel.com
 ---

  qemu-kvm-x86.c|  109 -
  qemu-kvm.c|   24 +++
  qemu-kvm.h|   28 +
  target-i386/cpu.h |5 ++
  target-i386/kvm.c |2 +
  target-i386/machine.c |   20 +
  6 files changed, 169 insertions(+), 19 deletions(-)
 Applied, thanks.
 Oops, late remark: Why introduce this feature against qemu-kvm instead
 of upstream? Doesn't this just generate additional conversion work and
 the risk of divergence from upstream in the migration protocol?
 
 Hi Jan
 
 You're late... I hope you can raise such comments earlier next time so we can
 work together more efficiently.

This case is lost, and probably already was when you first posted.
But I hope we can raise awareness of the issue again this way.

 That's true. Sheng, can you add save/restore support to uq/master to
 avoid these problems?
 
 Yes, there is a divergence risk; I will send an upstream version as well.
 
 But I think that as long as qemu-kvm and qemu upstream use different LM paths, the
 duplicate code/work can't be avoided.

Probably. The vision is that one day you can write a KVM feature and
apply it to qemu-kvm as a staging tree for later unmodified merge into
qemu upstream. qemu-kvm[-arch].[ch] is still in our way, but it already
uses many bits from upstream. So I would recommend designing new
features against upstream first and then providing the few bits needed to
also make use of them in qemu-kvm once the latter has merged the required bits
(which may actually happen before upstream, but that doesn't matter).

Jan




[Qemu-devel] Re: [PATCH 3/4] qemu: kvm: Enable XSAVE live migration support

2010-06-17 Thread Jan Kiszka

Sheng Yang wrote:
 Signed-off-by: Sheng Yang sh...@linux.intel.com
 ---
  target-i386/cpu.h |5 ++
  target-i386/kvm.c |  134 
 +
  target-i386/machine.c |   20 +++
  3 files changed, 159 insertions(+), 0 deletions(-)
 
 diff --git a/target-i386/cpu.h b/target-i386/cpu.h
 index 548ab80..75070d3 100644
 --- a/target-i386/cpu.h
 +++ b/target-i386/cpu.h
 @@ -718,6 +718,11 @@ typedef struct CPUX86State {
  uint16_t fpus_vmstate;
  uint16_t fptag_vmstate;
  uint16_t fpregs_format_vmstate;
 +
 +uint64_t xstate_bv;
 +XMMReg ymmh_regs[CPU_NB_REGS];
 +
 +uint64_t xcr0;
  } CPUX86State;
  
  CPUX86State *cpu_x86_init(const char *cpu_model);
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index bb6a12f..90ff323 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -289,6 +289,8 @@ void kvm_arch_reset_vcpu(CPUState *env)
  } else {
  env->mp_state = KVM_MP_STATE_RUNNABLE;
  }
 +/* Legal xcr0 for loading */
 +env->xcr0 = 1;
  }
  
  static int kvm_has_msr_star(CPUState *env)
 @@ -504,6 +506,57 @@ static int kvm_put_fpu(CPUState *env)
  return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
  }
  
 +#ifdef KVM_CAP_XSAVE
 +
 +#define XSAVE_CWD_RIP     2
 +#define XSAVE_CWD_RDP     4
 +#define XSAVE_MXCSR       6
 +#define XSAVE_ST_SPACE    8
 +#define XSAVE_XMM_SPACE   40
 +#define XSAVE_XSTATE_BV   128
 +#define XSAVE_YMMH_SPACE  144
 +
 +static int kvm_put_xsave(CPUState *env)
 +{
 +int i;
 +struct kvm_xsave* xsave;
 +uint16_t cwd, swd, twd, fop;
 +
 +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
 +memset(xsave, 0, sizeof(struct kvm_xsave));
 +cwd = swd = twd = fop = 0;
 +swd = env->fpus & ~(7 << 11);
 +swd |= (env->fpstt & 7) << 11;
 +cwd = env->fpuc;
 +for (i = 0; i < 8; ++i)
 +twd |= (!env->fptags[i]) << i;
 +xsave->region[0] = (uint32_t)(swd << 16) + cwd;
 +xsave->region[1] = (uint32_t)(fop << 16) + twd;
 +memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
 +sizeof env->fpregs);
 +memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
 +sizeof env->xmm_regs);
 +xsave->region[XSAVE_MXCSR] = env->mxcsr;
 +*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
 +memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
 +sizeof env->ymmh_regs);
 +return kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
 +}
 +#endif
 +
 +#ifdef KVM_CAP_XCRS
 +static int kvm_put_xcrs(CPUState *env)
 +{
 +struct kvm_xcrs xcrs;
 +
 +xcrs.nr_xcrs = 1;
 +xcrs.flags = 0;
 +xcrs.xcrs[0].xcr = 0;
 +xcrs.xcrs[0].value = env->xcr0;
 +return kvm_vcpu_ioctl(env, KVM_SET_XCRS, &xcrs);
 +}
 +#endif
 +
  static int kvm_put_sregs(CPUState *env)
  {
  struct kvm_sregs sregs;
 @@ -621,6 +674,59 @@ static int kvm_get_fpu(CPUState *env)
  return 0;
  }
  
 +#ifdef KVM_CAP_XSAVE
 +static int kvm_get_xsave(CPUState *env)
 +{
 +struct kvm_xsave* xsave;
 +int ret, i;
 +uint16_t cwd, swd, twd, fop;
 +
 +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
 +ret = kvm_vcpu_ioctl(env, KVM_GET_XSAVE, xsave);
 +if (ret < 0)
 +return ret;
 +
 +cwd = (uint16_t)xsave->region[0];
 +swd = (uint16_t)(xsave->region[0] >> 16);
 +twd = (uint16_t)xsave->region[1];
 +fop = (uint16_t)(xsave->region[1] >> 16);
 +env->fpstt = (swd >> 11) & 7;
 +env->fpus = swd;
 +env->fpuc = cwd;
 +for (i = 0; i < 8; ++i)
 +env->fptags[i] = !((twd >> i) & 1);
 +env->mxcsr = xsave->region[XSAVE_MXCSR];
 +memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
 +sizeof env->fpregs);
 +memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
 +sizeof env->xmm_regs);
 +env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
 +memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
 +sizeof env->ymmh_regs);
 +return 0;
 +}
 +#endif
 +
 +#ifdef KVM_CAP_XCRS
 +static int kvm_get_xcrs(CPUState *env)
 +{
 +int i, ret;
 +struct kvm_xcrs xcrs;
 +
 +ret = kvm_vcpu_ioctl(env, KVM_GET_XCRS, &xcrs);
 +if (ret < 0)
 +return ret;
 +
 +for (i = 0; i < xcrs.nr_xcrs; i++)
 +/* Only support xcr0 now */
 +if (xcrs.xcrs[0].xcr == 0) {
 +env->xcr0 = xcrs.xcrs[0].value;
 +break;
 +}
 +return 0;
 +}
 +#endif
 +
  static int kvm_get_sregs(CPUState *env)
  {
  struct kvm_sregs sregs;
 @@ -965,9 +1071,23 @@ int kvm_arch_put_registers(CPUState *env, int level)
  if (ret < 0)
  return ret;
  
 +#ifdef KVM_CAP_XSAVE
 +if (kvm_check_extension(env->kvm_state, KVM_CAP_XSAVE))
 +ret = kvm_put_xsave(env);
 +else
 +ret = kvm_put_fpu(env);
 +#else
  ret = kvm_put_fpu(env);
 +#endif
 +if (ret < 0)
 +return ret;
 +
 +#ifdef KVM_CAP_XCRS
 +if (kvm_check_extension(env->kvm_state, KVM_CAP_XCRS))
 +ret = kvm_put_xcrs(env);
  if (ret < 0)
  return ret;
 +#endif

[Qemu-devel] Re: [PATCH 4/4] qemu-kvm: Enable XSAVE live migration support

2010-06-17 Thread Jan Kiszka

Sheng Yang wrote:
 Based on upstream xsave related fields.
 
 Signed-off-by: Sheng Yang sh...@linux.intel.com
 ---
  qemu-kvm-x86.c |   95 
 +++-
  qemu-kvm.c |   24 ++
  qemu-kvm.h |   28 
  3 files changed, 146 insertions(+), 1 deletions(-)
 
 diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
 index 3c33e64..dcef8b5 100644
 --- a/qemu-kvm-x86.c
 +++ b/qemu-kvm-x86.c
 @@ -772,10 +772,26 @@ static void get_seg(SegmentCache *lhs, const struct kvm_segment *rhs)
   | (rhs->avl * DESC_AVL_MASK);
  }
  
 +#ifdef KVM_CAP_XSAVE
 +#define XSAVE_CWD_RIP     2
 +#define XSAVE_CWD_RDP     4
 +#define XSAVE_MXCSR       6
 +#define XSAVE_ST_SPACE    8
 +#define XSAVE_XMM_SPACE   40
 +#define XSAVE_XSTATE_BV   128
 +#define XSAVE_YMMH_SPACE  144
 +#endif
 +
  void kvm_arch_load_regs(CPUState *env, int level)
  {
  struct kvm_regs regs;
  struct kvm_fpu fpu;
 +#ifdef KVM_CAP_XSAVE
 +struct kvm_xsave* xsave;
 +#endif
 +#ifdef KVM_CAP_XCRS
 +struct kvm_xcrs xcrs;
 +#endif
  struct kvm_sregs sregs;
  struct kvm_msr_entry msrs[100];
  int rc, n, i;
 @@ -806,16 +822,53 @@ void kvm_arch_load_regs(CPUState *env, int level)
  
  kvm_set_regs(env, &regs);
  
 +#ifdef KVM_CAP_XSAVE
 +if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
 +uint16_t cwd, swd, twd, fop;
 +
 +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
 +memset(xsave, 0, sizeof(struct kvm_xsave));
 +cwd = swd = twd = fop = 0;
 +swd = env->fpus & ~(7 << 11);
 +swd |= (env->fpstt & 7) << 11;
 +cwd = env->fpuc;
 +for (i = 0; i < 8; ++i)
 +twd |= (!env->fptags[i]) << i;
 +xsave->region[0] = (uint32_t)(swd << 16) + cwd;
 +xsave->region[1] = (uint32_t)(fop << 16) + twd;
 +memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
 +sizeof env->fpregs);
 +memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
 +sizeof env->xmm_regs);
 +xsave->region[XSAVE_MXCSR] = env->mxcsr;
 +*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
 +memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
 +sizeof env->ymmh_regs);
 +kvm_set_xsave(env, xsave);
 +#ifdef KVM_CAP_XCRS
 +if (kvm_check_extension(kvm_state, KVM_CAP_XCRS)) {
 +xcrs.nr_xcrs = 1;
 +xcrs.flags = 0;
 +xcrs.xcrs[0].xcr = 0;
 +xcrs.xcrs[0].value = env->xcr0;
 +kvm_set_xcrs(env, &xcrs);
 +}
 +#endif /* KVM_CAP_XCRS */
 +} else {
 +#endif /* KVM_CAP_XSAVE */

Why not reuse kvm_put/get_xsave as defined for upstream? There should
be enough examples for that pattern. The result will be a tiny qemu-kvm
patch.

Jan

  memset(&fpu, 0, sizeof fpu);
  fpu.fsw = env->fpus & ~(7 << 11);
  fpu.fsw |= (env->fpstt & 7) << 11;
  fpu.fcw = env->fpuc;
  for (i = 0; i < 8; ++i)
 - fpu.ftwx |= (!env->fptags[i]) << i;
 +fpu.ftwx |= (!env->fptags[i]) << i;
  memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
  memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
  fpu.mxcsr = env->mxcsr;
  kvm_set_fpu(env, &fpu);
 +#ifdef KVM_CAP_XSAVE
 +}
 +#endif
  
  memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
  if (env->interrupt_injected >= 0) {
 @@ -934,6 +987,12 @@ void kvm_arch_save_regs(CPUState *env)
  {
  struct kvm_regs regs;
  struct kvm_fpu fpu;
 +#ifdef KVM_CAP_XSAVE
 +struct kvm_xsave* xsave;
 +#endif
 +#ifdef KVM_CAP_XCRS
 +struct kvm_xcrs xcrs;
 +#endif
  struct kvm_sregs sregs;
  struct kvm_msr_entry msrs[100];
  uint32_t hflags;
 @@ -965,6 +1024,37 @@ void kvm_arch_save_regs(CPUState *env)
  env->eflags = regs.rflags;
  env->eip = regs.rip;
  
 +#ifdef KVM_CAP_XSAVE
 +if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
 +uint16_t cwd, swd, twd, fop;
 +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
 +kvm_get_xsave(env, xsave);
 +cwd = (uint16_t)xsave->region[0];
 +swd = (uint16_t)(xsave->region[0] >> 16);
 +twd = (uint16_t)xsave->region[1];
 +fop = (uint16_t)(xsave->region[1] >> 16);
 +env->fpstt = (swd >> 11) & 7;
 +env->fpus = swd;
 +env->fpuc = cwd;
 +for (i = 0; i < 8; ++i)
 +env->fptags[i] = !((twd >> i) & 1);
 +env->mxcsr = xsave->region[XSAVE_MXCSR];
 +memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
 +sizeof env->fpregs);
 +memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
 +sizeof env->xmm_regs);
 +env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
 +memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
 +sizeof env->ymmh_regs);
 +#ifdef KVM_CAP_XCRS
 +if (kvm_check_extension(kvm_state, KVM_CAP_XCRS)) {
 +kvm_get_xcrs(env, &xcrs);
 +if (xcrs.xcrs[0].xcr == 0)
 +

Re: [Qemu-devel] Q35 qemu repository?

2010-06-17 Thread Isaku Yamahata

Thanks for the patch.
Does vista boot with the patch eventually?

On Wed, Jun 16, 2010 at 06:33:15PM +0100, Matthew Garrett wrote:
 On Wed, Jun 16, 2010 at 04:42:10PM +0100, Matthew Garrett wrote:
 
  Thanks for this - however, Vista gives me an ACPI error on boot (stop 
  0x00a5, 0x000d, which indicates that there's a malformed or 
  undefined ACPI device). I don't suppose you have any idea what the 
  problem here may be? Linux boots without complaint.
 
 Fixed with the following patch. Any devices with duplicate _HIDs require 
 _UIDs.
 
 diff --git a/src/q35-acpi-dsdt.dsl b/src/q35-acpi-dsdt.dsl
 index ad05c7a..4697527 100644
 --- a/src/q35-acpi-dsdt.dsl
 +++ b/src/q35-acpi-dsdt.dsl
 @@ -45,6 +45,7 @@ DefinitionBlock (
  Device (DBG0)
  {
  Name(_HID, EISAID("PNP0C02"))
 +Name(_UID, 0)
  Name(_CRS, ResourceTemplate() {
  IO (Decode16, 0xb080, 0xb080, 0x00, 0x04)
  })
 @@ -71,6 +72,7 @@ DefinitionBlock (
  Device(HP0)
  {
  Name(_HID, EISAID("PNP0C02"))
 +Name(_UID, 0x01)
  Name(_CRS, ResourceTemplate() {
  IO (Decode16, 0xae00, 0xae00, 0x00, 0x0C)
  IO (Decode16, 0xae0c, 0xae0c, 0x00, 0x01)
 
 -- 
 Matthew Garrett | mj...@srcf.ucam.org
 

-- 
yamahata

Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Gleb Natapov

On Thu, Jun 17, 2010 at 09:17:51AM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
  On Wed, Jun 16, 2010 at 06:00:56PM +0200, Jan Kiszka wrote:
  Gleb Natapov wrote:
  On Wed, Jun 16, 2010 at 12:35:16PM +0300, Gleb Natapov wrote:
  On Wed, Jun 16, 2010 at 11:33:13AM +0200, Jan Kiszka wrote:
  Gleb Natapov wrote:
  On Wed, Jun 16, 2010 at 09:57:35AM +0200, Jan Kiszka wrote:
  Gleb Natapov wrote:
  On Wed, Jun 16, 2010 at 09:51:14AM +0200, Jan Kiszka wrote:
  Gleb Natapov wrote:
  On Wed, Jun 16, 2010 at 09:03:01AM +0200, Jan Kiszka wrote:
  Gleb Natapov wrote:
  On Wed, Jun 16, 2010 at 12:40:28AM +0200, Jan Kiszka wrote:
  From: Jan Kiszka jan.kis...@siemens.com
 
  There is no need starting with the special value for 
  hpet_cfg.count.
  Either Seabios is aware of the new firmware interface and 
  properly
  interprets the counter or it simply ignores it anyway.
 
  I want seabios to be able to distinguish between old qemu and 
  new one.
  I see now. But isn't it a good chance to introduce a proper 
  generic
  interface for exploring supported fw-cfg keys?
 
  Having such interface would be nice. Pity we haven't introduced it 
  from
  the start. If we do it now seabios will have to find out somehow 
  that
  qemu support such interface. Chicken and egg ;)
  That is easy: Add a key that describes the highest supported key 
  value
  (looks like this is monotonically increasing). Older qemu versions 
  will
  return 0.
 
  That will not support holes in key space, and our key space is 
  already
  sparse.
  Then add a service to obtain a bitmap of supported keys. If that 
  bitmap
  is empty...
 
  Bitmap will be 2k long. We can add read capability to control port. To
  check if key is present you select it (write its value to control port)
  and then read control port back. If values is non-zero the key is 
  valid.
  But how to detect qemu that does not support that?
  Isn't there some key that was always there and will always be?
 
  FW_CFG_SIGNATURE
 
  So any ideas? Or did I misunderstood your hint? ;)
  I thought you found the answer yourself:
 
  Seabios could select FW_CFG_SIGNATURE and then perform a read-back on
  the control register. Older QEMUs will return -1, versions that support
  the read-back 0. Problem solved, no?
 
  AFAIK QEMU returns 0 if io read was done from non-used port or mmio
  address, but can we rely on this? If we can then problem solved, if
  we can't then no.
 
 It works for IO-based fw-cfg, but not for MMIO-based. So the firmware
 should probably pick a non-zero key for this check, e.g. FW_CFG_ID.
 
Sorry, I lost you here. What works for IO-based fw-cfg, but not for
MMIO-based? Can you write pseudo logic showing how you think it
should all work?

--
Gleb.

Re: [Qemu-devel] Re: [PATCH 1/2] qemu-io: retry fgets() when errno is EINTR

2010-06-17 Thread Kevin Wolf

Am 16.06.2010 18:52, schrieb MORITA Kazutaka:
 At Wed, 16 Jun 2010 13:04:47 +0200,
 Kevin Wolf wrote:

 Am 15.06.2010 19:53, schrieb MORITA Kazutaka:
 posix-aio-compat sends a signal in aio operations, so we should
 consider that fgets() could be interrupted here.

 Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
 ---
  cmd.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)

 diff --git a/cmd.c b/cmd.c
 index 2336334..460df92 100644
 --- a/cmd.c
 +++ b/cmd.c
 @@ -272,7 +272,10 @@ fetchline(void)
 return NULL;
 printf("%s", get_prompt());
 fflush(stdout);
 +again:
 if (!fgets(line, MAXREADLINESZ, stdin)) {
 +   if (errno == EINTR)
 +   goto again;
 free(line);
 return NULL;
 }

 This looks like a loop replaced by goto (and braces are missing). What
 about this instead?

 do {
 ret = fgets(...)
 } while (ret == NULL && errno == EINTR)

 if (ret == NULL) {
fail
 }
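
A minimal standalone sketch of the loop suggested above, for illustration only. The helper name fgets_retry() is hypothetical and is not part of cmd.c; the sketch also clears the stream's error flag before retrying, which the pseudo code leaves open.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: repeat fgets() while it fails only because a
 * signal interrupted the underlying read (errno == EINTR). */
static char *fgets_retry(char *buf, int size, FILE *stream)
{
    char *ret;

    for (;;) {
        errno = 0;                  /* a stale EINTR must not cause a retry */
        ret = fgets(buf, size, stream);
        if (ret != NULL || errno != EINTR) {
            return ret;             /* success, EOF, or a real error */
        }
        clearerr(stream);           /* drop the error flag before retrying */
    }
}
```

A caller would treat a NULL return exactly as it treats a failed fgets() today, i.e. free the line buffer and give up.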

 
 I agree.
 
 However, it seems that my second patch has already solved the
 problem.  We now register these readline routines as an aio handler, so
 fgets() does not block and cannot return with EINTR.
 
 This patch looks no longer needed, sorry.

Good point. Thanks for having a look.

Kevin

[Qemu-devel] [PATCH 3/4] qemu: kvm: Enable XSAVE live migration support

2010-06-17 Thread Sheng Yang


Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 target-i386/cpu.h |7 ++-
 target-i386/kvm.c |  139 -
 target-i386/machine.c |   20 +++
 3 files changed, 163 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 548ab80..680eed1 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -718,6 +718,11 @@ typedef struct CPUX86State {
 uint16_t fpus_vmstate;
 uint16_t fptag_vmstate;
 uint16_t fpregs_format_vmstate;
+
+uint64_t xstate_bv;
+XMMReg ymmh_regs[CPU_NB_REGS];
+
+uint64_t xcr0;
 } CPUX86State;
 
 CPUX86State *cpu_x86_init(const char *cpu_model);
@@ -895,7 +900,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
 #define cpu_list_id x86_cpu_list
 #define cpudef_setup   x86_cpudef_setup
 
-#define CPU_SAVE_VERSION 11
+#define CPU_SAVE_VERSION 12
 
 /* MMU modes definitions */
 #define MMU_MODE0_SUFFIX _kernel
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index bb6a12f..e490c0a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -289,6 +289,8 @@ void kvm_arch_reset_vcpu(CPUState *env)
 } else {
 env->mp_state = KVM_MP_STATE_RUNNABLE;
 }
+/* Legal xcr0 for loading */
+env->xcr0 = 1;
 }
 
 static int kvm_has_msr_star(CPUState *env)
@@ -504,6 +506,68 @@ static int kvm_put_fpu(CPUState *env)
 return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
 }
 
+#ifdef KVM_CAP_XSAVE
+#define XSAVE_CWD_RIP 2
+#define XSAVE_CWD_RDP 4
+#define XSAVE_MXCSR   6
+#define XSAVE_ST_SPACE    8
+#define XSAVE_XMM_SPACE   40
+#define XSAVE_XSTATE_BV   128
+#define XSAVE_YMMH_SPACE  144
+#endif
+
+static int kvm_put_xsave(CPUState *env)
+{
+#ifdef KVM_CAP_XSAVE
+int i;
+struct kvm_xsave* xsave;
+uint16_t cwd, swd, twd, fop;
+
+if (!kvm_check_extension(env->kvm_state, KVM_CAP_XSAVE))
+return kvm_put_fpu(env);
+
+xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+memset(xsave, 0, sizeof(struct kvm_xsave));
+cwd = swd = twd = fop = 0;
+swd = env->fpus & ~(7 << 11);
+swd |= (env->fpstt & 7) << 11;
+cwd = env->fpuc;
+for (i = 0; i < 8; ++i)
+twd |= (!env->fptags[i]) << i;
+xsave->region[0] = (uint32_t)(swd << 16) + cwd;
+xsave->region[1] = (uint32_t)(fop << 16) + twd;
+memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
+sizeof env->fpregs);
+memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
+sizeof env->xmm_regs);
+xsave->region[XSAVE_MXCSR] = env->mxcsr;
+*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
+memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
+sizeof env->ymmh_regs);
+return kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
+#else
+return kvm_put_fpu(env);
+#endif
+}
+
+static int kvm_put_xcrs(CPUState *env)
+{
+#ifdef KVM_CAP_XCRS
+struct kvm_xcrs xcrs;
+
+if (!kvm_check_extension(env->kvm_state, KVM_CAP_XCRS))
+return 0;
+
+xcrs.nr_xcrs = 1;
+xcrs.flags = 0;
+xcrs.xcrs[0].xcr = 0;
+xcrs.xcrs[0].value = env->xcr0;
+return kvm_vcpu_ioctl(env, KVM_SET_XCRS, &xcrs);
+#else
+return 0;
+#endif
+}
+
 static int kvm_put_sregs(CPUState *env)
 {
 struct kvm_sregs sregs;
@@ -621,6 +685,69 @@ static int kvm_get_fpu(CPUState *env)
 return 0;
 }
 
+static int kvm_get_xsave(CPUState *env)
+{
+#ifdef KVM_CAP_XSAVE
+struct kvm_xsave* xsave;
+int ret, i;
+uint16_t cwd, swd, twd, fop;
+
+if (!kvm_check_extension(env->kvm_state, KVM_CAP_XSAVE))
+return kvm_get_fpu(env);
+
+xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+ret = kvm_vcpu_ioctl(env, KVM_GET_XSAVE, xsave);
+if (ret < 0)
+return ret;
+
+cwd = (uint16_t)xsave->region[0];
+swd = (uint16_t)(xsave->region[0] >> 16);
+twd = (uint16_t)xsave->region[1];
+fop = (uint16_t)(xsave->region[1] >> 16);
+env->fpstt = (swd >> 11) & 7;
+env->fpus = swd;
+env->fpuc = cwd;
+for (i = 0; i < 8; ++i)
+env->fptags[i] = !((twd >> i) & 1);
+env->mxcsr = xsave->region[XSAVE_MXCSR];
+memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
+sizeof env->fpregs);
+memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
+sizeof env->xmm_regs);
+env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
+memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
+sizeof env->ymmh_regs);
+return 0;
+#else
+return kvm_get_fpu(env);
+#endif
+}
+
+static int kvm_get_xcrs(CPUState *env)
+{
+#ifdef KVM_CAP_XCRS
+int i, ret;
+struct kvm_xcrs xcrs;
+
+if (!kvm_check_extension(env->kvm_state, KVM_CAP_XCRS))
+return 0;
+
+ret = kvm_vcpu_ioctl(env, KVM_GET_XCRS, &xcrs);
+if (ret < 0)
+return ret;
+
+for (i = 0; i < xcrs.nr_xcrs; i++)
+/* Only support xcr0 now */
+if (xcrs.xcrs[0].xcr == 0) {
+env->xcr0 = xcrs.xcrs[0].value;
+break;
+}
+return 0;
+#else
+return 0;
+#endif
+}
+
 static int
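
For readers following the bit manipulation in kvm_put_xsave()/kvm_get_xsave() above, here is a standalone sketch of how the control, status and tag words are packed into the first two 32-bit words of the save area and unpacked again. struct fpu_state and both function names are hypothetical stand-ins for the CPUX86State fields, not QEMU code. One deliberate difference from the patch: kvm_get_xsave() stores swd in env->fpus unchanged, while this sketch masks the TOP field back out so that the round trip is exact.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPUX86State fields used by the patch. */
struct fpu_state {
    uint16_t fpuc;       /* x87 control word */
    uint16_t fpus;       /* x87 status word, TOP field (bits 11..13) clear */
    unsigned fpstt;      /* top-of-stack index, 0..7 */
    uint8_t  fptags[8];  /* 1 = register empty, 0 = register valid */
};

/* Pack: region[0] = status:control, region[1] = opcode:abridged tag word. */
static void pack_fpu_header(const struct fpu_state *s, uint32_t region[2])
{
    uint16_t cwd = s->fpuc;
    uint16_t swd = (uint16_t)((s->fpus & ~(7u << 11)) | ((s->fpstt & 7u) << 11));
    uint16_t twd = 0;
    int i;

    for (i = 0; i < 8; ++i) {
        twd |= (uint16_t)((!s->fptags[i]) << i);   /* bit set = valid */
    }
    region[0] = ((uint32_t)swd << 16) | cwd;
    region[1] = twd;                               /* fop left at 0 */
}

/* Unpack: the inverse of pack_fpu_header(). */
static void unpack_fpu_header(const uint32_t region[2], struct fpu_state *s)
{
    uint16_t cwd = (uint16_t)region[0];
    uint16_t swd = (uint16_t)(region[0] >> 16);
    uint16_t twd = (uint16_t)region[1];
    int i;

    s->fpstt = (swd >> 11) & 7;
    s->fpus = (uint16_t)(swd & ~(7u << 11));       /* sketch-only masking */
    s->fpuc = cwd;
    for (i = 0; i < 8; ++i) {
        s->fptags[i] = !((twd >> i) & 1);
    }
}
```

With fpuc = 0x037f, fpus = 0x0041 and fpstt = 5, the first word comes out as 0x2841037f: the status word with TOP inserted in the upper half, the control word in the lower half.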

Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Jan Kiszka

Gleb Natapov wrote:
 On Thu, Jun 17, 2010 at 09:17:51AM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 06:00:56PM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 12:35:16PM +0300, Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 11:33:13AM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 09:57:35AM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 09:51:14AM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 09:03:01AM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 12:40:28AM +0200, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 There is no need starting with the special value for 
 hpet_cfg.count.
 Either Seabios is aware of the new firmware interface and 
 properly
 interprets the counter or it simply ignores it anyway.

 I want seabios to be able to distinguish between old qemu and 
 new one.
 I see now. But isn't it a good chance to introduce a proper 
 generic
 interface for exploring supported fw-cfg keys?

 Having such interface would be nice. Pity we haven't introduced it 
 from
 the start. If we do it now seabios will have to find out somehow 
 that
 qemu support such interface. Chicken and egg ;)
 That is easy: Add a key that describes the highest supported key 
 value
 (looks like this is monotonically increasing). Older qemu versions 
 will
 return 0.

 That will not support holes in key space, and our key space is 
 already
 sparse.
 Then add a service to obtain a bitmap of supported keys. If that 
 bitmap
 is empty...

 Bitmap will be 2k long. We can add read capability to control port. To
 check if key is present you select it (write its value to control port)
 and then read control port back. If values is non-zero the key is 
 valid.
 But how to detect qemu that does not support that?
 Isn't there some key that was always there and will always be?

 FW_CFG_SIGNATURE

 So any ideas? Or did I misunderstood your hint? ;)
 I thought you found the answer yourself:

 Seabios could select FW_CFG_SIGNATURE and then perform a read-back on
 the control register. Older QEMUs will return -1, versions that support
 the read-back 0. Problem solved, no?

 AFAIK QEMU returns 0 if io read was done from non-used port or mmio
 address, but can we rely on this? If we can then problem solved, if
 we can't then no.
 It works for IO-based fw-cfg, but not for MMIO-based. So the firmware
 should probably pick a non-zero key for this check, e.g. FW_CFG_ID.

 Sorry, I lost you here. What works for IO-based fw-cfg, but not for
 MMIO-based.

Reads from undefined IO ports return -1, reads from undefined MMIO
addresses return 0. So you need to select a key that is different from both.

 Can you write pseudo logic of how you think it
 all should work?

The firmware should do this:

write(CTL_BASE, FW_CFG_ID);
if (read(CTL_BASE) != FW_CFG_ID)
deal_with_old_qemu();
else
check_for_supported_keys();
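
To make the reasoning concrete, here is a small simulation of the probe. Everything in it is a sketch: struct fake_ctl, ctl_write(), ctl_read() and probe_readback() are hypothetical names, the three models only encode the behaviour described in this thread (old QEMU on an IO port reads as -1, old QEMU on MMIO reads as 0, new QEMU returns the selected key), and the key numbers 0x00 and 0x01 are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define FW_CFG_SIGNATURE 0x00   /* assumed key numbers */
#define FW_CFG_ID        0x01

/* What the firmware may find behind the control register. */
enum ctl_model {
    OLD_QEMU_IO,    /* no read-back; undefined IO port reads as -1 */
    OLD_QEMU_MMIO,  /* no read-back; undefined MMIO read gives 0 */
    NEW_QEMU        /* read-back returns the currently selected key */
};

struct fake_ctl {
    enum ctl_model model;
    uint16_t cur_entry;
};

static void ctl_write(struct fake_ctl *ctl, uint16_t key)
{
    ctl->cur_entry = key;
}

static uint16_t ctl_read(const struct fake_ctl *ctl)
{
    switch (ctl->model) {
    case OLD_QEMU_IO:
        return 0xffff;
    case OLD_QEMU_MMIO:
        return 0;
    default:
        return ctl->cur_entry;
    }
}

/* Select a key, read the control register back, compare with the key. */
static int probe_readback(struct fake_ctl *ctl, uint16_t key)
{
    ctl_write(ctl, key);
    return ctl_read(ctl) == key;
}
```

Probing with FW_CFG_ID is rejected by both old models, while probing with key 0 would be accepted by the old MMIO model, which is why a non-zero key is needed.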

Jan




Re: [Qemu-devel] Re: RFC v2: blockdev_add friends, brief rationale, QMP docs

2010-06-17 Thread Kevin Wolf

Am 16.06.2010 20:07, schrieb Anthony Liguori:
   But it requires that
 everything that -blockdev provides is accessible with -drive, too (or
 that we're okay with users hating us).

 
 I'm happy for -drive to die.  I think we should support -hda and 
 -blockdev. 

-hda is not sufficient for most users. It doesn't provide any options.
It doesn't even support virtio. If -drive is going to die (and we all
seem to agree on that), then -blockdev needs to be usable for users (and
so far you are the only one who disagrees).

 -blockdev should be optimized for config files, not single 
 argument input.  IOW:
 
 [blockdev blk2]
   format = raw
   file = /path/to/base.img
   cache = writeback
 
 [blockdev blk1]
format = qcow2
file = /path/to/leaf.img
cache=off
backing_dev = blk2
 
 [device disk1]
driver = ide-drive
blockdev = blk1
bus = 0
unit = 0

You don't specify the backing file of an image on the command line (or
in the configuration file). It's saved as part of the image. It's more
like this (for a simple raw image file):

[blockdev-protocol proto1]
   protocol = file
   file = /path/to/image.img

[blockdev blk1]
   format = raw
   cache=off
   protocol = proto1

[device disk1]
   driver = ide-drive
   blockdev = blk1
   bus = 0
   unit = 0

(This would be Markus' option 3, I think)

 Or:
 
 qemu -blockdev id=blk2,format=raw,file=/path/to/base.img,cache=writeback \
      -blockdev id=blk1,format=qcow2,file=/path/to/leaf.img,backing_dev=blk2 \
      -device ide-disk,blockdev=blk1,bus=0,unit=0
 
 Or:
 
 qemu -hda /path/to/leaf.img
 
 And if a user really feels they need to modify the defaults, they can do:
 
 qemu -hda /path/to/leaf.img -writeconfig myconf.cfg
 
 And edit from there.
 
 But honestly, I'm thoroughly confused about the distinction between
 protocol and format.  I had thought that protocols were a type of format
 and I'm not sure why we're making a distinction.
  
 Technically, they are mostly the same. Logically, they are not. You have
 one image format driver (raw, qcow2, ...) that accesses its image data
 through one or more stacked protocols (file, host_device, nbd, http, ...).

 In the past we've had quite some trouble because there was no clear
 distinction. raw and file was the same. If you had an image on a block
 device, you were asking for trouble.

 
 As Christoph mentions, we really don't have stacked protocols and I'm 
 not sure they make sense.

Right, if we go for Christoph's suggestion, we don't need stacked
protocols. We'll have stacked formats instead. I'm not sure if you like
this any better. ;-)

We do have stacking today. -hda blkdebug:test.cfg:foo.qcow2 is qcow2 on
blkdebug on file. We need to be able to represent this.

 I sure prefer the latter.  The brackets look like noise.  You need to
 understand protocol stacking for them to make any sense.

 Regarding confusion caused by mixing format and protocol options: yes,
 the brackets force you to distinguish between protocol options and
 other options.  But I doubt that'll reduce confusion here.  Either you
 understand protocols.  Then I doubt you need brackets to unconfuse
 you.  Or you don't understand protocols.  Then whether to put an
 option inside or outside the brackets is voodoo.


 If the above is necessary just to create a raw image, then we're doing
 something wrong in the block layer.  It should be possible to just say:

 -blockdev id=blk1,format=raw,file=fedora.img
  
 I think we all agree on this (although it contradicts what you said
 above, because file is a property of the protocol). The question is how
 to specify protocols explicitly.
 
 I think raw doesn't make very much sense then.  What's the point of it 
 if it's just a thin wrapper around a protocol?

That it can be wrapped around any protocol. It's just about separating
code for handling the content of an image and code for accessing the image.
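
The separation can be sketched in a few lines of C. This is a toy model, not the block layer API: struct protocol, struct raw_image, mem_pread() and counting_pread() are all hypothetical. The "raw" format only maps offsets 1:1 and never cares which protocol sits underneath; the counting protocol stands in for a pass-through layer such as blkdebug stacked on another protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A protocol only knows how to fetch bytes from some backing store. */
struct protocol {
    const char *name;
    int (*pread)(void *opaque, size_t off, void *buf, size_t len);
    void *opaque;
};

/* The raw format: image content is the protocol's bytes, offset 1:1. */
struct raw_image {
    struct protocol *proto;
};

static int raw_read(struct raw_image *img, size_t off, void *buf, size_t len)
{
    return img->proto->pread(img->proto->opaque, off, buf, len);
}

/* Toy protocol 1: a memory buffer standing in for "file" or "host_device". */
struct mem_backend {
    const unsigned char *data;
    size_t size;
};

static int mem_pread(void *opaque, size_t off, void *buf, size_t len)
{
    struct mem_backend *m = opaque;

    if (off > m->size || len > m->size - off) {
        return -1;                      /* read past the end */
    }
    memcpy(buf, m->data + off, len);
    return 0;
}

/* Toy protocol 2: pass-through layer stacked on another protocol. */
struct counting {
    struct protocol *below;
    int reads;
};

static int counting_pread(void *opaque, size_t off, void *buf, size_t len)
{
    struct counting *c = opaque;

    c->reads++;
    return c->below->pread(c->below->opaque, off, buf, len);
}
```

Swapping the protocol under a raw_image changes how the bytes are reached, never how they are interpreted, which is the layering argued for here.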

Ever tried something like qemu-img create -f raw /dev/something 10G?
You need the host_device protocol there, not the file protocol. When we
had raw == file this completely failed. And it's definitely reasonable
to expect that it works because the image format _is_ raw, it's just not
saved in a file.

Or the famous qcow2 images on block devices. Why did qemu guess the
format correctly when qcow2 was saved in a file, but not on a host
device? This was just inconsistent.

I've had more than one bug report about things like this which are
magically fixed when you do the layering right.

Kevin

[Qemu-devel] Re: [PATCH 4/4] qemu-kvm: Enable XSAVE live migration support

2010-06-17 Thread Sheng Yang

On Thursday 17 June 2010 15:41:43 Jan Kiszka wrote:
 Sheng Yang wrote:
  Based on upstream xsave related fields.
  
  Signed-off-by: Sheng Yang sh...@linux.intel.com
  ---
  
   qemu-kvm-x86.c |   95
   +++- qemu-kvm.c
   |   24 ++
   qemu-kvm.h |   28 
   3 files changed, 146 insertions(+), 1 deletions(-)
  
  diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
  index 3c33e64..dcef8b5 100644
  --- a/qemu-kvm-x86.c
  +++ b/qemu-kvm-x86.c
  @@ -772,10 +772,26 @@ static void get_seg(SegmentCache *lhs, const struct kvm_segment *rhs)
  
  | (rhs->avl * DESC_AVL_MASK);
   
   }
  
  +#ifdef KVM_CAP_XSAVE
  +#define XSAVE_CWD_RIP 2
  +#define XSAVE_CWD_RDP 4
  +#define XSAVE_MXCSR   6
  +#define XSAVE_ST_SPACE    8
  +#define XSAVE_XMM_SPACE   40
  +#define XSAVE_XSTATE_BV   128
  +#define XSAVE_YMMH_SPACE  144
  +#endif
  +
  
   void kvm_arch_load_regs(CPUState *env, int level)
   {
   
   struct kvm_regs regs;
   struct kvm_fpu fpu;
  
  +#ifdef KVM_CAP_XSAVE
  +struct kvm_xsave* xsave;
  +#endif
  +#ifdef KVM_CAP_XCRS
  +struct kvm_xcrs xcrs;
  +#endif
  
   struct kvm_sregs sregs;
   struct kvm_msr_entry msrs[100];
   int rc, n, i;
  
  @@ -806,16 +822,53 @@ void kvm_arch_load_regs(CPUState *env, int level)
  
   kvm_set_regs(env, &regs);
  
  +#ifdef KVM_CAP_XSAVE
  +if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
  +uint16_t cwd, swd, twd, fop;
  +
  +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
  +memset(xsave, 0, sizeof(struct kvm_xsave));
  +cwd = swd = twd = fop = 0;
  +swd = env->fpus & ~(7 << 11);
  +swd |= (env->fpstt & 7) << 11;
  +cwd = env->fpuc;
  +for (i = 0; i < 8; ++i)
  +twd |= (!env->fptags[i]) << i;
  +xsave->region[0] = (uint32_t)(swd << 16) + cwd;
  +xsave->region[1] = (uint32_t)(fop << 16) + twd;
  +memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
  +sizeof env->fpregs);
  +memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
  +sizeof env->xmm_regs);
  +xsave->region[XSAVE_MXCSR] = env->mxcsr;
  +*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
  +memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
  +sizeof env->ymmh_regs);
  +kvm_set_xsave(env, xsave);
  +#ifdef KVM_CAP_XCRS
  +if (kvm_check_extension(kvm_state, KVM_CAP_XCRS)) {
  +xcrs.nr_xcrs = 1;
  +xcrs.flags = 0;
  +xcrs.xcrs[0].xcr = 0;
  +xcrs.xcrs[0].value = env->xcr0;
  +kvm_set_xcrs(env, &xcrs);
  +}
  +#endif /* KVM_CAP_XCRS */
  +} else {
  +#endif /* KVM_CAP_XSAVE */
 
 Why not reusing kvm_put/get_xsave as defined for upstream? There should
 be enough examples for that pattern. The result will be a tiny qemu-kvm
 patch.

A lot of code in kvm_arch_load/save_regs() still duplicates code in kvm.c,
e.g. kvm_get/put_sregs and kvm_get/put_msrs. So I would like to wait for the merge.

--
regards
Yang, Sheng

 
 Jan
 
   memset(&fpu, 0, sizeof fpu);
   fpu.fsw = env->fpus & ~(7 << 11);
   fpu.fsw |= (env->fpstt & 7) << 11;
   fpu.fcw = env->fpuc;
   for (i = 0; i < 8; ++i)
  
  -   fpu.ftwx |= (!env->fptags[i]) << i;
  +fpu.ftwx |= (!env->fptags[i]) << i;
  
   memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
   memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
   fpu.mxcsr = env->mxcsr;
   kvm_set_fpu(env, &fpu);
  
  +#ifdef KVM_CAP_XSAVE
  +}
  +#endif
  
   memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
   if (env->interrupt_injected >= 0) {
  
  @@ -934,6 +987,12 @@ void kvm_arch_save_regs(CPUState *env)
  
   {
   
   struct kvm_regs regs;
   struct kvm_fpu fpu;
  
  +#ifdef KVM_CAP_XSAVE
  +struct kvm_xsave* xsave;
  +#endif
  +#ifdef KVM_CAP_XCRS
  +struct kvm_xcrs xcrs;
  +#endif
  
   struct kvm_sregs sregs;
   struct kvm_msr_entry msrs[100];
   uint32_t hflags;
  
  @@ -965,6 +1024,37 @@ void kvm_arch_save_regs(CPUState *env)
  
   env->eflags = regs.rflags;
   env->eip = regs.rip;
  
  +#ifdef KVM_CAP_XSAVE
  +if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
  +uint16_t cwd, swd, twd, fop;
  +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
  +kvm_get_xsave(env, xsave);
  +cwd = (uint16_t)xsave->region[0];
  +swd = (uint16_t)(xsave->region[0] >> 16);
  +twd = (uint16_t)xsave->region[1];
  +fop = (uint16_t)(xsave->region[1] >> 16);
  +env->fpstt = (swd >> 11) & 7;
  +env->fpus = swd;
  +env->fpuc = cwd;
  +for (i = 0; i < 8; ++i)
  +env->fptags[i] = !((twd >> i) & 1);
  +env->mxcsr = xsave->region[XSAVE_MXCSR];
  +memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
  +sizeof env->fpregs);
  +

Re: [Qemu-devel] Re: [PATCH] monitor: Really show snapshot information about all devices

2010-06-17 Thread Kevin Wolf

Am 16.06.2010 17:57, schrieb Chris Lalancette:
 On 06/16/10 - 05:32:58PM, Kevin Wolf wrote:
 Am 16.06.2010 17:22, schrieb Chris Lalancette:
 On 06/16/10 - 03:15:11PM, Kevin Wolf wrote:
 Am 16.06.2010 14:59, schrieb Miguel Di Ciurcio Filho:
 On Wed, Jun 16, 2010 at 9:40 AM, Kevin Wolf kw...@redhat.com wrote:

 If the human monitor was exactly what its name says, I'd happily apply
 this one (though I think it should be made clear from which image the VM
 state would be loaded). However, it isn't and I'm not sure if this
 wouldn't break libvirt. Dan, can you help?


 I didn't mention in the commit, but I've looked at libvirt's source
 and it is not using 'info snapshots' AFAIK.

 Anthony, Dan, are you okay with the change then?

 Right, exactly as Miguel said, libvirt doesn't use info snapshots at all
 at the moment.  One of the reasons we don't use it at present is precisely
 because it doesn't give us information about all disks in-use.

 The other reason that we can't use info snapshots is that we need to know
 parent information about snapshots. That is, if you take a sequence of
 snapshots:

 A -> B -> C

 And then you delete B, the disk changes from B will be merged automatically
 into C to keep C a valid snapshot.  However, there is currently no way to
 discover this parent/child relationship, so we can't use info snapshots
 for that reason as well.

 Well, there is no parent/child relation in qcow2, so exposing this is
 going to be really hard. We also don't really need it anywhere in qemu.
 What would libvirt use this information for?
 
 I keep being told this, and I don't really understand how this is.  I know
 when I was heavily playing with this, the scenario above worked; that is, the
 deletion of snapshot B maintained a valid C snapshot.  If nothing is tracking
 the parent/child relationship, how does this work?

Clusters are refcounted. When you save a snapshot, the refcount for all
clusters in the current state is increased. When you delete it, the
refcount is decreased, and a cluster is freed only once its refcount
reaches zero.

 As for how libvirt uses it, it's mostly to provide the ability for the user
 to keep track of a tree of snapshots.  So the user could do something like
 install their base OS, and take a snapshot S1.  Then they could install one set
 of applications, and take a snapshot S2.  Now they can go back to the base
 image, install a different set of applications, and take a snapshot S3.
 Now both S2 and S3 are children of S1, and libvirt wants to be able to
 represent this relationship.

qemu doesn't even remember which snapshot you have loaded. Basically you
have one L1 table for active cluster allocations and you have another
one for each snapshot. When you load a snapshot, it just copies the L1
table to the active one (and adjusts refcounts).

So technically the concept of a snapshot tree doesn't exist with
internal snapshots. It's something that management introduces.
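
A toy model of the mechanism described above may help; it is a sketch under heavy simplification, not qcow2 code. Each "L1 table" is reduced to a flat guest-to-host cluster map, and struct image, guest_write(), snapshot_create(), snapshot_delete() and snapshot_load() are hypothetical names. Note that no snapshot stores a parent anywhere: deleting one only drops references.

```c
#include <assert.h>
#include <string.h>

#define GUEST_CLUSTERS 4
#define HOST_CLUSTERS  16
#define MAX_SNAPSHOTS  4
#define UNALLOCATED    (-1)

struct table {
    int map[GUEST_CLUSTERS];            /* guest cluster -> host cluster */
};

struct image {
    int refcount[HOST_CLUSTERS];
    struct table active;                /* current cluster allocations */
    struct table snap[MAX_SNAPSHOTS];   /* one table per snapshot */
};

static void image_init(struct image *img)
{
    int i;

    memset(img, 0, sizeof *img);
    for (i = 0; i < GUEST_CLUSTERS; i++) {
        img->active.map[i] = UNALLOCATED;
    }
}

/* Add delta to the refcount of every cluster a table points at. */
static void ref_table(struct image *img, const struct table *t, int delta)
{
    int i;

    for (i = 0; i < GUEST_CLUSTERS; i++) {
        if (t->map[i] != UNALLOCATED) {
            img->refcount[t->map[i]] += delta;
        }
    }
}

/* Guest write: in place if the cluster is unshared, else copy-on-write. */
static int guest_write(struct image *img, int guest)
{
    int host = img->active.map[guest];
    int fresh;

    if (host != UNALLOCATED && img->refcount[host] == 1) {
        return host;
    }
    for (fresh = 0; fresh < HOST_CLUSTERS; fresh++) {
        if (img->refcount[fresh] == 0) {
            break;
        }
    }
    if (fresh == HOST_CLUSTERS) {
        return -1;                      /* image full */
    }
    img->refcount[fresh] = 1;
    if (host != UNALLOCATED) {
        img->refcount[host]--;          /* active state lets go of the old one */
    }
    img->active.map[guest] = fresh;
    return fresh;
}

static void snapshot_create(struct image *img, int slot)
{
    img->snap[slot] = img->active;
    ref_table(img, &img->snap[slot], +1);
}

static void snapshot_delete(struct image *img, int slot)
{
    ref_table(img, &img->snap[slot], -1);   /* refcount 0 means free */
}

/* Loading copies the snapshot's table over the active one. */
static void snapshot_load(struct image *img, int slot)
{
    ref_table(img, &img->snap[slot], +1);
    ref_table(img, &img->active, -1);
    img->active = img->snap[slot];
}
```

Taking snapshots A, B, C with writes in between and then deleting B leaves every cluster that C points at with a non-zero refcount, so C stays valid without any data being merged.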

Kevin

Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Gleb Natapov

On Thu, Jun 17, 2010 at 10:30:15AM +0200, Jan Kiszka wrote:
  Sorry, I lost you here. What works for IO-based fw-cfg, but not for
  MMIO-based.
 
 Undefined IO ports return -1, undefined (/wrt read access) MMIO 0. So
 you need to select a key that is different from both.
 
But can we rely on it? Is this defined somewhere, or does it just happen to
be the case in current qemu for the x86 arch?

  Can you write pseudo logic of how you think it
  all should work?
 
 The firmware should do this:
 
 write(CTL_BASE, FW_CFG_ID);
 if (read(CTL_BASE) != FW_CFG_ID)
   deal_with_old_qemu();
 else
   check_for_supported_keys();
 
Ah, I thought about read() returning 0/1, not key itself, so any key that
always existed would do.

--
Gleb.

Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Jan Kiszka

Gleb Natapov wrote:
 On Thu, Jun 17, 2010 at 10:30:15AM +0200, Jan Kiszka wrote:
 Sorry, I lost you here. What works for IO-based fw-cfg, but not for
 MMIO-based.
 Undefined IO ports return -1, undefined (/wrt read access) MMIO 0. So
 you need to select a key that is different from both.

 But can we rely on it? Is this defined somewhere or if it happens to be
 the case in current qemu for x86 arch.

For x86 with its port-based access, we are on the safe side as (pre-pnp)
device probing used to work this way. Can't tell for the other archs
that support fw-cfg.

 
 Can you write pseudo logic of how you think it
 all should work?
 The firmware should do this:

 write(CTL_BASE, FW_CFG_ID);
 if (read(CTL_BASE) != FW_CFG_ID)
  deal_with_old_qemu();
 else
  check_for_supported_keys();

 Ah, I thought about read() returning 0/1, not key itself, so any key that
 always existed would do.

Yes, read-back would mean returning FWCfgState::cur_entry. And that will
be -1 when an invalid key has been selected.

Jan




[Qemu-devel] Re: [PATCH V4 2/3] qemu: Generic task offloading framework: threadlets

2010-06-17 Thread Gautham R Shenoy

On Wed, Jun 16, 2010 at 02:34:16PM +0200, Paolo Bonzini wrote:
 +block-obj-y += qemu-thread.o
 +block-obj-y += async-work.o

 These should be (at least for now) block-obj-$(CONFIG_POSIX).

Right. Will fix that.

 +while (QTAILQ_EMPTY(&(queue->request_list)) &&
 +   (ret != ETIMEDOUT)) {
 +ret = qemu_cond_timedwait(&(queue->cond),
 +&(queue->lock), 10*10);
 +}

 Using qemu_cond_timedwait is a hack for not properly broadcasting the 
 condvar in flush_threadlet_queue.

I think Anthony answered this one.

 +if (QTAILQ_EMPTY(&(queue->request_list)))
 +goto check_exit;

 What's the reason for the goto?  {...} works just as well.

Yes {...} works.

Besides, this two step condition checking is broken and can
cause the threads to exit even in the presence of unprocessed
queued ThreadletWork items.
Will fix this in the v5 (hopefully there will be one :-))


 +/**
 + * flush_threadlet_queue: Wait till completion of all the submitted tasks
 + * @queue: Queue containing the tasks we're waiting on.
 + */
 +void flush_threadlet_queue(ThreadletQueue *queue)
 +{
 +qemu_mutex_lock(&queue->lock);
 +queue->exit = 1;
 +
 +qemu_barrier_init(&queue->barr, queue->cur_threads + 1);
 +qemu_mutex_unlock(&queue->lock);
 +
 +qemu_barrier_wait(&queue->barr);

 Can be implemented just as well with queue->cond and a loop waiting for 
 queue->cur_threads == 0.  This would remove the need to implement barriers 
 in qemu-threads (especially for Win32).  Anyway whoever will contribute 
 Win32 qemu-threads can do it, since it's not hard.

That was the other option I had considered before going for barriers,
for no particular reason. Now, considering that barriers are not
welcome, I will implement this method.
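
A rough sketch of that approach, written against plain pthreads rather than the qemu-thread wrappers so that it stands alone. struct queue, worker(), queue_start() and flush_queue() are hypothetical names, and the workers here do nothing but wait for the exit flag; a real worker would also drain the request list before leaving.

```c
#include <assert.h>
#include <pthread.h>

struct queue {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int cur_threads;    /* workers still alive */
    int exit;           /* set by flush_queue() */
};

static void *worker(void *arg)
{
    struct queue *q = arg;

    pthread_mutex_lock(&q->lock);
    while (!q->exit) {
        pthread_cond_wait(&q->cond, &q->lock);
    }
    q->cur_threads--;
    pthread_cond_broadcast(&q->cond);   /* let the flusher re-check */
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

static int queue_start(struct queue *q, int nthreads)
{
    int i;

    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->cur_threads = 0;
    q->exit = 0;
    for (i = 0; i < nthreads; i++) {
        pthread_t tid;

        pthread_mutex_lock(&q->lock);
        q->cur_threads++;               /* counted before the thread runs */
        pthread_mutex_unlock(&q->lock);
        if (pthread_create(&tid, NULL, worker, q) != 0) {
            pthread_mutex_lock(&q->lock);
            q->cur_threads--;
            pthread_mutex_unlock(&q->lock);
            return -1;
        }
        pthread_detach(tid);
    }
    return 0;
}

/* Ask all workers to leave and wait until the last one has gone. */
static void flush_queue(struct queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->exit = 1;
    pthread_cond_broadcast(&q->cond);   /* wake idle workers */
    while (q->cur_threads > 0) {
        pthread_cond_wait(&q->cond, &q->lock);
    }
    pthread_mutex_unlock(&q->lock);
}
```

The same condition variable serves both directions here because every waiter re-checks its own predicate in a loop; no barrier primitive is needed.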


 +int cancel_threadlet_common(ThreadletWork *work)
 +{
 +return cancel_threadlet(globalqueue, work);
 +}

 I would prefer *_threadlet to be the globalqueue function (and 
 flush_threadlets) and queue_*_threadlet to be the special-queue function. I 
 should have spoken earlier probably, but please consider this if there will 
 be a v5.

Sure, will do that.

 + * Generalization based on posix-aio emulation code.

 No need to specify these as long as the original authors are attributed 
 properly.

Ok!

 +static inline void threadlet_queue_init(ThreadletQueue *queue,
 +int max_threads, int min_threads)
 +{
 +queue->cur_threads  = 0;
 +queue->idle_threads = 0;
 +queue->exit = 0;
 +queue->max_threads  = max_threads;
 +queue->min_threads  = min_threads;
 +QTAILQ_INIT(&(queue->request_list));
 +QTAILQ_INIT(&(queue->threadlet_work_pool));
 +qemu_mutex_init(&(queue->lock));
 +qemu_cond_init(&(queue->cond));
 +}

 No need to make this inline.

Will fix this.

 +extern void threadlet_submit(ThreadletQueue *queue,
 +  ThreadletWork *work);
 +
 +extern void threadlet_submit_common(ThreadletWork *work);
 +
 +extern int cancel_threadlet(ThreadletQueue *queue, ThreadletWork *work);
 +extern int cancel_threadlet_common(ThreadletWork *work);
 +
 +
 +extern void flush_threadlet_queue(ThreadletQueue *queue);
 +extern void flush_common_threadlet_queue(void);

 Please make the position of the verb consistent (e.g. submit_threadlet).

Overlooked threadlet_submit() in the rename process. It has to be
submit_threadlet(). Will fix.

Thanks for the detailed review.
Regards
gautham.

 Paolo

[Qemu-devel] Re: [SeaBIOS] [PATCHv2] load hpet info for HPET ACPI table from qemu

2010-06-17 Thread Gleb Natapov

On Wed, Jun 16, 2010 at 09:22:09PM -0400, Kevin O'Connor wrote:
 On Tue, Jun 15, 2010 at 09:37:07AM +0300, Gleb Natapov wrote:
  On Mon, Jun 14, 2010 at 04:12:32PM -0400, Kevin O'Connor wrote:
   But.. in order to move to a newer ACPI spec, there would be qemu
   changes anyway.  (If nothing else, so that qemu can tell seabios if
   it's okay to use the new rev.)  At that point we're stuck changing
   both repos anyway - nothing gained, nothing lost.
  I don't see why qemu should care what ACPI rev Seabios uses.
 
 A change in ACPI rev would likely break old OSs.  Only the user would
In that case a new ACPI revision would never be adopted: no HW manufacturer
would risk being unable to run Windows XP on their HW. A real BIOS may have
a config option to choose which ACPI version to use, though. We can add this
too.

 know this, and so a method of propagating that info from qemu to
 seabios would be needed.  (However, it's much more likely that a new
 ACPI rev would require more data which qemu would then also need to
 pass to seabios.)
Why do you think so? But anyway, my position is that we need to pass
the maximum amount of information from qemu to the firmware. On real HW
the bios knows everything about the underlying hardware.


 
   I still think there is an opportunity to reduce the load on the bulk
   of acpi changes - most of these changes have no dependence on seabios
   at all.
  That depends on how you view seabios project. If you consider it to be
  legacy bios functionality provider only then I agree and we should move
  to coreboot model. If you consider it to be legacy bios + qemu firmware
  (like old BOCHS bios was) then by definition it's seabios job to
  describe underlying HW to an OS.
 
 I don't think this is that cut and dry.  A real machine just ships
 with these acpi tables compiled in.  This is what BOCHS bios did and
 it is what seabios did up until about 8 months ago.
That was because qemu was a stale project for a couple of years. Now the pace
of qemu development is very fast, so the same is required from the firmware
too. When qemu development started to accelerate, BOCHS bios was
essentially forked to allow for faster development.

 
 However, qemu isn't a simple machine emulator - it can emulate a whole
 class of x86 machines.  It's not practical to compile a seabios.bin
 file for every permutation of x86 machine that qemu can emulate.  So,
 we pass info from qemu to seabios so that it can support all the
 classes of hardware.  This isn't what real machines do, and it's not
 what bochs bios did.
BOCHS bios didn't do it because when qemu development accelerated we
switched to seabios. Otherwise I agree with the paragraph above. We just do
not agree on the form in which the information should be passed. You think we
should pass an HPET ACPI table (my guess is just because we already have a way
to pass ACPI tables to seabios) and I think this is an abuse of the ACPI spec.
The fw cfg interface was designed to be extensible, so why oppose adding things
to it? It is not as if building the HPET table in qemu would spare us from
patching seabios and coordinating changes. Seabios creates the HPET table
unconditionally now, and we will have to fix it to not do that if an HPET
table is passed from qemu (and for that seabios will have to inspect all tables
that it receives over the fw cfg interface, something it doesn't do now), and
it will have to detect old qemu somehow and create HPET unconditionally
to preserve the old behaviour on old qemus. In the end the change to seabios
will be bigger than the proposed patch.

 
 I do view SeaBIOS as primarily a legacy bios interface and a boot
 loader.  
This is a worrying statement for the qemu project.

  I also think it makes sense to handle qemu and kvm firmware
 needs - 
Good, but qemu's needs are growing at the pace of qemu development, and
that is fast these days.

  some initialization wants to be done from the guest and
 seabios is a good place to do that.
 
HW does not initialize itself. So the only sane place to do _all_
initialization is from guest.

 This hpet thing is really rather minor, but it has me puzzled.  The
 guest OS wants the info in ACPI form, and only qemu has the info.  I
 don't understand why there is a desire to pass the info in this new
 arbitrary form instead of passing it in the form that the OS wants it
 in.
Because the OS does not talk directly to qemu. It has a mediator in the form
of seabios. We have a spec that defines the interface between seabios and an OS
(the ACPI spec), and we define the interface between seabios and qemu ourselves.
Why intentionally blur this separation?

 
 A couple of emails back you stated you considered using the existing
 qemu_cfg_acpi_additional_tables() format but dismissed the idea.
 Maybe you could explain why you dismissed it and/or what the
 deficiencies of this mechanism are?
 
I dismissed it (very quickly) on the premise that this is a layering
violation. I saw that I would need to specify a value that qemu should have
nothing to do with in order to build the header, and that to support old qemu
with new seabios I need to add new fw

Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Gleb Natapov

On Thu, Jun 17, 2010 at 10:42:34AM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
  On Thu, Jun 17, 2010 at 10:30:15AM +0200, Jan Kiszka wrote:
  Sorry, I lost you here. What works for IO-based fw-cfg, but not for
  MMIO-based.
  Undefined IO ports return -1, undefined (/wrt read access) MMIO 0. So
  you need to select a key that is different from both.
 
  But can we rely on it? Is this defined somewhere or if it happens to be
  the case in current qemu for x86 arch.
 
 For x86 with its port-based access, we are on the safe side as (pre-pnp)
 device probing used to work this way. Can't tell for the other archs
 that support fw-cfg.
 
  
  Can you write pseudo logic of how you think it
  all should work?
  The firmware should do this:
 
  write(CTL_BASE, FW_CFG_ID);
  if (read(CTL_BASE) != FW_CFG_ID)
 deal_with_old_qemu();
  else
 check_for_supported_keys();
 
  Ah, I thought about read() returning 0/1, not key itself, so any key that
  always existed would do.
 
 Yes, read-back would mean returning FWCfgState::cur_entry. And that will
 be -1 when selected an invalid one.
 
Heh, actually I have a better idea: why not advance FW_CFG_ID to version 2?
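
For comparison, the two detection schemes discussed in this thread can be modelled roughly as below. This is a sketch against a simulated device, not the real fw-cfg code: DemoFwCfg, the demo_* functions and the selector value are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_KEY_ID        0x0001u  /* illustrative selector, differs from 0 and -1 */
#define DEMO_UNDEF_IO      0xffffu  /* undefined I/O port read: all ones */
#define DEMO_UNDEF_MMIO    0x0000u  /* undefined MMIO read: zero */

/* Simulated control register of the config device. */
typedef struct DemoFwCfg {
    int has_readback;         /* 0 models an old qemu without read-back */
    uint16_t undefined_read;  /* what a control-register read yields there */
    uint16_t cur_entry;       /* currently selected key */
    uint32_t id_value;        /* value stored under the ID key */
} DemoFwCfg;

void demo_select(DemoFwCfg *s, uint16_t key)
{
    s->cur_entry = key;
}

uint16_t demo_read_ctl(const DemoFwCfg *s)
{
    return s->has_readback ? s->cur_entry : s->undefined_read;
}

/* Scheme 1: select a key and check that the device echoes it back. */
int demo_probe_readback(DemoFwCfg *s)
{
    demo_select(s, DEMO_KEY_ID);
    return demo_read_ctl(s) == DEMO_KEY_ID;
}

/* Scheme 2: treat the value stored under the ID key as a version number. */
int demo_probe_version(const DemoFwCfg *s)
{
    return s->id_value >= 2;
}
```

The second scheme needs no assumption about what an undefined read returns, which is what makes it attractive for the MMIO-based variant.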

--
Gleb.

Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Jan Kiszka

Gleb Natapov wrote:
 On Thu, Jun 17, 2010 at 10:42:34AM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 On Thu, Jun 17, 2010 at 10:30:15AM +0200, Jan Kiszka wrote:
 Sorry, I lost you here. What works for IO-based fw-cfg, but not for
 MMIO-based.
 Undefined IO ports return -1, undefined (/wrt read access) MMIO 0. So
 you need to select a key that is different from both.

 But can we rely on it? Is this defined somewhere or if it happens to be
 the case in current qemu for x86 arch.
 For x86 with its port-based access, we are on the safe side as (pre-pnp)
 device probing used to work this way. Can't tell for the other archs
 that support fw-cfg.

 Can you write pseudo logic of how you think it
 all should work?
 The firmware should do this:

 write(CTL_BASE, FW_CFG_ID);
 if (read(CTL_BASE) != FW_CFG_ID)
deal_with_old_qemu();
 else
check_for_supported_keys();

 Ah, I thought about read() returning 0/1, not key itself, so any key that
 always existed would do.
 Yes, read-back would mean returning FWCfgState::cur_entry. And that will
 be -1 when selected an invalid one.

 Heh, actually I have better idea. Why not advance FW_CFG_ID to version 2.

If that is supposed to be a version number - yeah, good idea.

Jan




[Qemu-devel] Re: [PATCH 3/4] qemu: kvm: Enable XSAVE live migration support

2010-06-17 Thread Jan Kiszka

Sheng Yang wrote:
 Signed-off-by: Sheng Yang sh...@linux.intel.com
 ---
  target-i386/cpu.h |7 ++-
  target-i386/kvm.c |  139 
 -
  target-i386/machine.c |   20 +++
  3 files changed, 163 insertions(+), 3 deletions(-)
 
 diff --git a/target-i386/cpu.h b/target-i386/cpu.h
 index 548ab80..680eed1 100644
 --- a/target-i386/cpu.h
 +++ b/target-i386/cpu.h
 @@ -718,6 +718,11 @@ typedef struct CPUX86State {
  uint16_t fpus_vmstate;
  uint16_t fptag_vmstate;
  uint16_t fpregs_format_vmstate;
 +
 +uint64_t xstate_bv;
 +XMMReg ymmh_regs[CPU_NB_REGS];
 +
 +uint64_t xcr0;
  } CPUX86State;
  
  CPUX86State *cpu_x86_init(const char *cpu_model);
 @@ -895,7 +900,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
  #define cpu_list_id x86_cpu_list
  #define cpudef_setup x86_cpudef_setup
  
 -#define CPU_SAVE_VERSION 11
 +#define CPU_SAVE_VERSION 12
  
  /* MMU modes definitions */
  #define MMU_MODE0_SUFFIX _kernel
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index bb6a12f..e490c0a 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -289,6 +289,8 @@ void kvm_arch_reset_vcpu(CPUState *env)
      } else {
          env->mp_state = KVM_MP_STATE_RUNNABLE;
      }
 +    /* Legal xcr0 for loading */
 +    env->xcr0 = 1;
  }
  
  static int kvm_has_msr_star(CPUState *env)
 @@ -504,6 +506,68 @@ static int kvm_put_fpu(CPUState *env)
      return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
  }
  
 +#ifdef KVM_CAP_XSAVE
 +#define XSAVE_CWD_RIP     2
 +#define XSAVE_CWD_RDP     4
 +#define XSAVE_MXCSR       6
 +#define XSAVE_ST_SPACE    8
 +#define XSAVE_XMM_SPACE   40
 +#define XSAVE_XSTATE_BV   128
 +#define XSAVE_YMMH_SPACE  144
 +#endif
 +
 +static int kvm_put_xsave(CPUState *env)
 +{
 +#ifdef KVM_CAP_XSAVE
 +    int i;
 +    struct kvm_xsave* xsave;
 +    uint16_t cwd, swd, twd, fop;
 +
 +    if (kvm_check_extension(env->kvm_state, KVM_CAP_XSAVE))

That's still one syscall too much for this path (which will be a
fast-path for Kemari). Get that value during arch_init.
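
To illustrate the suggestion, here is a minimal sketch of querying a capability once at init time and reusing the cached answer on the hot path. The demo_* names and the call counter are hypothetical stand-ins; the real query would be an ioctl on the KVM file descriptor.

```c
#include <assert.h>

int demo_query_count;  /* counts the simulated syscalls */

/* Hypothetical stand-in for an expensive capability query. */
int demo_check_extension(int cap)
{
    demo_query_count++;
    return cap == 1;  /* pretend that only capability 1 is supported */
}

typedef struct DemoState {
    int has_xsave;  /* cached once during init */
} DemoState;

/* Init path: the only place that pays for the query. */
void demo_arch_init(DemoState *s)
{
    s->has_xsave = demo_check_extension(1);
}

/* Hot path: reads the cached flag, no query involved.
 * Returns 1 for the xsave path and 0 for the legacy FPU path. */
int demo_put_state(const DemoState *s)
{
    return s->has_xsave ? 1 : 0;
}
```

However often demo_put_state() runs, demo_query_count stays at one after demo_arch_init().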

 +        return kvm_put_fpu(env);
 +
 +    xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
 +    memset(xsave, 0, sizeof(struct kvm_xsave));
 +    cwd = swd = twd = fop = 0;
 +    swd = env->fpus & ~(7 << 11);
 +    swd |= (env->fpstt & 7) << 11;
 +    cwd = env->fpuc;
 +    for (i = 0; i < 8; ++i)
 +        twd |= (!env->fptags[i]) << i;
 +    xsave->region[0] = (uint32_t)(swd << 16) + cwd;
 +    xsave->region[1] = (uint32_t)(fop << 16) + twd;
 +    memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
 +            sizeof env->fpregs);
 +    memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
 +            sizeof env->xmm_regs);
 +    xsave->region[XSAVE_MXCSR] = env->mxcsr;
 +    *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
 +    memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
 +            sizeof env->ymmh_regs);
 +    return kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
 +#else
 +    return kvm_put_fpu(env);
 +#endif
 +}
 +
 +static int kvm_put_xcrs(CPUState *env)
 +{
 +#ifdef KVM_CAP_XCRS
 +    struct kvm_xcrs xcrs;
 +
 +    if (!kvm_check_extension(env->kvm_state, KVM_CAP_XCRS))
 +        return 0;
 +
 +    xcrs.nr_xcrs = 1;
 +    xcrs.flags = 0;
 +    xcrs.xcrs[0].xcr = 0;
 +    xcrs.xcrs[0].value = env->xcr0;
 +    return kvm_vcpu_ioctl(env, KVM_SET_XCRS, &xcrs);
 +#else
 +    return 0;
 +#endif
 +}
 +
  static int kvm_put_sregs(CPUState *env)
  {
  struct kvm_sregs sregs;
 @@ -621,6 +685,69 @@ static int kvm_get_fpu(CPUState *env)
  return 0;
  }
  
 +static int kvm_get_xsave(CPUState *env)
 +{
 +#ifdef KVM_CAP_XSAVE
 +    struct kvm_xsave* xsave;
 +    int ret, i;
 +    uint16_t cwd, swd, twd, fop;
 +
 +    if (!kvm_check_extension(env->kvm_state, KVM_CAP_XSAVE))
 +        return kvm_get_fpu(env);
 +
 +    xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
 +    ret = kvm_vcpu_ioctl(env, KVM_GET_XSAVE, xsave);
 +    if (ret < 0)
 +        return ret;
 +
 +    cwd = (uint16_t)xsave->region[0];
 +    swd = (uint16_t)(xsave->region[0] >> 16);
 +    twd = (uint16_t)xsave->region[1];
 +    fop = (uint16_t)(xsave->region[1] >> 16);
 +    env->fpstt = (swd >> 11) & 7;
 +    env->fpus = swd;
 +    env->fpuc = cwd;
 +    for (i = 0; i < 8; ++i)
 +        env->fptags[i] = !((twd >> i) & 1);
 +    env->mxcsr = xsave->region[XSAVE_MXCSR];
 +    memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
 +            sizeof env->fpregs);
 +    memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
 +            sizeof env->xmm_regs);
 +    env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
 +    memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
 +            sizeof env->ymmh_regs);
 +    return 0;
 +#else
 +    return kvm_get_fpu(env);
 +#endif
 +}
 +
 +static int kvm_get_xcrs(CPUState *env)
 +{
 +#ifdef KVM_CAP_XCRS
 +    int i, ret;
 +    struct kvm_xcrs xcrs;
 +
 +    if (!kvm_check_extension(env->kvm_state, KVM_CAP_XCRS))
 +        return 0;
 +
 +    ret = kvm_vcpu_ioctl(env, KVM_GET_XCRS,

Re: [Qemu-devel] Re: [PATCH V4 2/3] qemu: Generic task offloading framework: threadlets

2010-06-17 Thread Gautham R Shenoy

On Wed, Jun 16, 2010 at 10:20:36AM -0500, Anthony Liguori wrote:
 On 06/16/2010 09:52 AM, Paolo Bonzini wrote:
 BTW it's obviously okay with signaling the condition when a threadlet is 
 submitted.  But when something affects all queue's workers 
 (flush_threadlet_queue) you want a broadcast and using expiration as a 
 substitute is fishy.

 IMHO, there shouldn't be a need for flush_threadlet_queue.  It doesn't look 
 used in the aio conversion and if virtio-9p needs it, I suspect something 
 is wrong.

virtio-9p doesn't need it.

The API has been added for the vnc-server case, where a subsystem wants
to wait for the threads of its private queue to finish executing the
already queued tasks. It's the responsibility of the subsystem to make sure
that new tasks are not submitted during this interval.

I sought clarification regarding this earlier,
http://lists.gnu.org/archive/html/qemu-devel/2010-06/msg01382.html

But now I am beginning to doubt I understood the use-case correctly.

 Regards,

 Anthony Liguori
-- 
Thanks and Regards
gautham

Re: [Qemu-devel] Re: [PATCH V4 2/3] qemu: Generic task offloading framework: threadlets

2010-06-17 Thread Gautham R Shenoy

On Wed, Jun 16, 2010 at 06:06:35PM +0200, Corentin Chary wrote:
 On Wed, Jun 16, 2010 at 5:52 PM, Anthony Liguori
 aligu...@linux.vnet.ibm.com wrote:
  On 06/16/2010 10:47 AM, Corentin Chary wrote:
 
  I would need something like flush_threadlet_queue for the vnc server.
  I need it in
  vnc_disconnect(), vnc_dpy_resize() and vnc_dpy_cpy() so wait (and/or
  abort) current
  encoding jobs.
 
 
  I'm not sure threadlets are the right thing for the VNC server.  The VNC
  server wants one dedicated thread.  Threadlets are a thread pool.  You could
  potentially use one thread per client but I doubt it would be worth it.
 
  At any rate, flushing the full queue is overkill.  You want to wait for your
  specific thread to terminate and you want to block execution until that
  happens.  IOW, you want to join the thread.
 
 
 Oh right, I should have read the changelog more carefully, it's a
 global queue now ...

Well, the APIs that allow the subsystems to create their own private
queues are still retained. But having read what Anthony mentioned, I
doubt you would want to do that for a single helper thread :-)

 
 Thanks,
 -- 
 Corentin Chary
 http://xf.iksaif.net

-- 
Thanks and Regards
gautham

Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Gleb Natapov

On Thu, Jun 17, 2010 at 10:59:01AM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
  On Thu, Jun 17, 2010 at 10:42:34AM +0200, Jan Kiszka wrote:
  Gleb Natapov wrote:
  On Thu, Jun 17, 2010 at 10:30:15AM +0200, Jan Kiszka wrote:
  Sorry, I lost you here. What works for IO-based fw-cfg, but not for
  MMIO-based.
  Undefined IO ports return -1, undefined (/wrt read access) MMIO 0. So
  you need to select a key that is different from both.
 
  But can we rely on it? Is this defined somewhere or if it happens to be
  the case in current qemu for x86 arch.
  For x86 with its port-based access, we are on the safe side as (pre-pnp)
  device probing used to work this way. Can't tell for the other archs
  that support fw-cfg.
 
  Can you write pseudo logic of how you think it
  all should work?
  The firmware should do this:
 
  write(CTL_BASE, FW_CFG_ID);
  if (read(CTL_BASE) != FW_CFG_ID)
   deal_with_old_qemu();
  else
   check_for_supported_keys();
 
  Ah, I thought about read() returning 0/1, not key itself, so any key that
  always existed would do.
  Yes, read-back would mean returning FWCfgState::cur_entry. And that will
  be -1 when selected an invalid one.
 
  Heh, actually I have better idea. Why not advance FW_CFG_ID to version 2.
 
 If that is supposed to be a version number - yeah, good idea.
 
That was the idea behind it. I just forgot it exists.

--
Gleb.

[Qemu-devel] Re: [PATCH 09/10] pci: set PCI multi-function bit appropriately.

2010-06-17 Thread Michael S. Tsirkin

On Thu, Jun 17, 2010 at 03:15:51PM +0900, Isaku Yamahata wrote:
 set PCI multi-function bit appropriately.
 
 Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
 
 ---
 changes v1 - v2:
 don't set header type register in configuration space.
 ---
  hw/pci.c |   25 +
  1 files changed, 25 insertions(+), 0 deletions(-)
 
 diff --git a/hw/pci.c b/hw/pci.c
 index 5316aa5..ee391dc 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -607,6 +607,30 @@ static void pci_init_wmask_bridge(PCIDevice *d)
  pci_set_word(d-wmask + PCI_BRIDGE_CONTROL, 0x);
  }
  
 +static void pci_init_multifunction(PCIBus *bus, PCIDevice *dev)
 +{
 +    uint8_t slot = PCI_SLOT(dev->devfn);
 +    uint8_t func_max = 8;

enum or define would be better

 +    uint8_t func;

If I understand correctly what this does, it goes over
other functions of the same device, and sets the MULTI_FUNCTION bit
for them if there's more than one function.
Instead, why don't we just set PCI_HEADER_TYPE_MULTI_FUNCTION
in relevant devices?

 +
 +    for (func = 0; func < func_max; ++func) {
 +        if (bus->devices[PCI_DEVFN(slot, func)]) {
 +            break;
 +        }
 +    }
 +    if (func == func_max) {
 +        return;
 +    }
 +

The above only works if the function is called before
device is added to bus.
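
One way to avoid that ordering dependency is sketched below, under the assumption that the device is registered first and the slot is then counted. DemoBus, DemoDev and demo_register() are hypothetical; only the devfn encoding, the header type offset (0x0e) and the multi-function bit (0x80) follow the PCI configuration header layout.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PCI_DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define DEMO_PCI_SLOT(devfn)        (((devfn) >> 3) & 0x1f)
#define DEMO_HEADER_TYPE            0x0e  /* header type register offset */
#define DEMO_HEADER_TYPE_MULTI      0x80  /* multi-function bit */
#define DEMO_FUNC_MAX               8

typedef struct DemoDev {
    uint8_t devfn;
    uint8_t config[64];
} DemoDev;

typedef struct DemoBus {
    DemoDev *devices[256];
} DemoBus;

/* Register dev, then fix up the multi-function bit for its whole slot. */
void demo_register(DemoBus *bus, DemoDev *dev)
{
    uint8_t slot = DEMO_PCI_SLOT(dev->devfn);
    int func;
    int count = 0;

    bus->devices[dev->devfn] = dev;

    /* Count after registration, so dev itself is always included. */
    for (func = 0; func < DEMO_FUNC_MAX; func++) {
        if (bus->devices[DEMO_PCI_DEVFN(slot, func)]) {
            count++;
        }
    }
    if (count < 2) {
        return;  /* single function: leave the bit clear */
    }
    for (func = 0; func < DEMO_FUNC_MAX; func++) {
        DemoDev *d = bus->devices[DEMO_PCI_DEVFN(slot, func)];
        if (d) {
            d->config[DEMO_HEADER_TYPE] |= DEMO_HEADER_TYPE_MULTI;
        }
    }
}
```

Because the count includes the new device, no separate assignment for dev is needed after the loop.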

 +    for (func = 0; func < func_max; ++func) {
 +        if (bus->devices[PCI_DEVFN(slot, func)]) {
 +            bus->devices[PCI_DEVFN(slot, func)]->config[PCI_HEADER_TYPE] |=
 +                PCI_HEADER_TYPE_MULTI_FUNCTION;
 +        }
 +    }
 +    dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;

Isn't the bit set above already?

 +}
 +
  static void pci_config_alloc(PCIDevice *pci_dev)
  {
  int config_size = pci_config_size(pci_dev);
 @@ -660,6 +684,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
 *pci_dev, PCIBus *bus,
  if (is_bridge) {
  pci_init_wmask_bridge(pci_dev);
  }
 +pci_init_multifunction(bus, pci_dev);
  
  if (!config_read)
  config_read = pci_default_read_config;
 -- 
 1.6.6.1

[Qemu-devel] Re: [PATCH 10/10] pci: don't overwrite multi functio bit in pci header type.

2010-06-17 Thread Michael S. Tsirkin

On Thu, Jun 17, 2010 at 03:15:52PM +0900, Isaku Yamahata wrote:
 Don't overwrite pci header type.
 Otherwise, multi function bit which pci_init_header_type() sets
 appropriately is lost.
 Anyway PCI_HEADER_TYPE_NORMAL is zero, so it is unnecessary to zero
 which is already zero cleared.
 
 Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp

All this churn will need quite a bit of testing.
Please record what was tested in the commit message.
If we are doing this, shall we also clean up the other code that
sets registers to their default values?

 ---
 changes v1 - v2:
 - set header type of bridge type in pci_bridge_initfn().
 - dropped ugly hunk in apb_pci.c.
 ---
  hw/ac97.c |1 -
  hw/acpi_piix4.c   |1 -
  hw/apb_pci.c  |2 --
  hw/grackle_pci.c  |1 -
  hw/ide/cmd646.c   |1 -
  hw/ide/piix.c |1 -
  hw/macio.c|1 -
  hw/ne2000.c   |1 -
  hw/openpic.c  |1 -
  hw/pcnet.c|1 -
  hw/piix4.c|3 +--
  hw/piix_pci.c |4 +---
  hw/prep_pci.c |1 -
  hw/rtl8139.c  |1 -
  hw/sun4u.c|1 -
  hw/unin_pci.c |4 
  hw/usb-uhci.c |1 -
  hw/vga-pci.c  |1 -
  hw/virtio-pci.c   |1 -
  hw/vmware_vga.c   |1 -
  hw/wdt_i6300esb.c |1 -
  21 files changed, 2 insertions(+), 28 deletions(-)
 
 diff --git a/hw/ac97.c b/hw/ac97.c
 index 4319bc8..d71072d 100644
 --- a/hw/ac97.c
 +++ b/hw/ac97.c
 @@ -1295,7 +1295,6 @@ static int ac97_initfn (PCIDevice *dev)
  c[PCI_REVISION_ID] = 0x01;  /* rid revision ro */
  c[PCI_CLASS_PROG] = 0x00;  /* pi programming interface ro */
  pci_config_set_class (c, PCI_CLASS_MULTIMEDIA_AUDIO); /* ro */
 -c[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; /* headtyp header type ro */
  
  /* TODO set when bar is registered. no need to override. */
  /* nabmar native audio mixer base address rw */
 diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
 index 8d1a628..bfa1d9a 100644
 --- a/hw/acpi_piix4.c
 +++ b/hw/acpi_piix4.c
 @@ -369,7 +369,6 @@ static int piix4_pm_initfn(PCIDevice *dev)
  pci_conf[0x08] = 0x03; // revision number
  pci_conf[0x09] = 0x00;
  pci_config_set_class(pci_conf, PCI_CLASS_BRIDGE_OTHER);
 -pci_conf[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
  pci_conf[0x3d] = 0x01; // interrupt pin 1
  
  pci_conf[0x40] = 0x01; /* PM io base read only bit */
 diff --git a/hw/apb_pci.c b/hw/apb_pci.c
 index a1c17b9..3b8eda3 100644
 --- a/hw/apb_pci.c
 +++ b/hw/apb_pci.c
 @@ -431,8 +431,6 @@ static int pbm_pci_host_init(PCIDevice *d)
   PCI_STATUS_FAST_BACK | PCI_STATUS_66MHZ |
   PCI_STATUS_DEVSEL_MEDIUM);
      pci_config_set_class(d->config, PCI_CLASS_BRIDGE_HOST);
 -    pci_set_byte(d->config + PCI_HEADER_TYPE,
 -                 PCI_HEADER_TYPE_NORMAL);
  return 0;
  }
  
 diff --git a/hw/grackle_pci.c b/hw/grackle_pci.c
 index aa0c51b..b3a5f54 100644
 --- a/hw/grackle_pci.c
 +++ b/hw/grackle_pci.c
 @@ -126,7 +126,6 @@ static int grackle_pci_host_init(PCIDevice *d)
      d->config[0x08] = 0x00; // revision
      d->config[0x09] = 0x01;
      pci_config_set_class(d->config, PCI_CLASS_BRIDGE_HOST);
 -    d->config[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
  return 0;
  }
  
 diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
 index 559147f..756ee81 100644
 --- a/hw/ide/cmd646.c
 +++ b/hw/ide/cmd646.c
 @@ -240,7 +240,6 @@ static int pci_cmd646_ide_initfn(PCIDevice *dev)
  pci_conf[PCI_CLASS_PROG] = 0x8f;
  
  pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_IDE);
 -pci_conf[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
  
  pci_conf[0x51] = 0x04; // enable IDE0
  if (d-secondary) {
 diff --git a/hw/ide/piix.c b/hw/ide/piix.c
 index dad6e86..8817915 100644
 --- a/hw/ide/piix.c
 +++ b/hw/ide/piix.c
 @@ -122,7 +122,6 @@ static int pci_piix_ide_initfn(PCIIDEState *d)
  
  pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
  pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_IDE);
 -pci_conf[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
  
  qemu_register_reset(piix3_reset, d);
  
 diff --git a/hw/macio.c b/hw/macio.c
 index e92e82a..789ca55 100644
 --- a/hw/macio.c
 +++ b/hw/macio.c
 @@ -110,7 +110,6 @@ void macio_init (PCIBus *bus, int device_id, int 
 is_oldworld, int pic_mem_index,
      pci_config_set_vendor_id(d->config, PCI_VENDOR_ID_APPLE);
      pci_config_set_device_id(d->config, device_id);
      pci_config_set_class(d->config, PCI_CLASS_OTHERS << 8);
 -    d->config[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
  
      d->config[0x3d] = 0x01; // interrupt on pin 1
  
 diff --git a/hw/ne2000.c b/hw/ne2000.c
 index 78fe14f..126e7cf 100644
 --- a/hw/ne2000.c
 +++ b/hw/ne2000.c
 @@ -723,7 +723,6 @@ static int pci_ne2000_init(PCIDevice *pci_dev)
  pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_REALTEK);
  pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_REALTEK_8029);

[Qemu-devel] [PATCH 3/4] qemu: kvm: Enable XSAVE live migration support

2010-06-17 Thread Sheng Yang


Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 kvm-all.c |   21 +++
 kvm.h |2 +
 target-i386/cpu.h |7 ++-
 target-i386/kvm.c |  139 -
 target-i386/machine.c |   20 +++
 5 files changed, 186 insertions(+), 3 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 43704b8..343c06e 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -71,6 +71,7 @@ struct KVMState
 #endif
 int irqchip_in_kernel;
 int pit_in_kernel;
+int xsave, xcrs;
 };
 
 static KVMState *kvm_state;
@@ -685,6 +686,16 @@ int kvm_init(int smp_cpus)
     s->debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
 #endif
 
+    s->xsave = 0;
+#ifdef KVM_CAP_XSAVE
+    s->xsave = kvm_check_extension(s, KVM_CAP_XSAVE);
+#endif
+
+    s->xcrs = 0;
+#ifdef KVM_CAP_XCRS
+    s->xcrs = kvm_check_extension(s, KVM_CAP_XCRS);
+#endif
+
     ret = kvm_arch_init(s, smp_cpus);
     if (ret < 0)
         goto err;
@@ -1013,6 +1024,16 @@ int kvm_has_debugregs(void)
     return kvm_state->debugregs;
 }
 
+int kvm_has_xsave(void)
+{
+    return kvm_state->xsave;
+}
+
+int kvm_has_xcrs(void)
+{
+    return kvm_state->xcrs;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
 if (!kvm_has_sync_mmu()) {
diff --git a/kvm.h b/kvm.h
index 7975e87..50c4192 100644
--- a/kvm.h
+++ b/kvm.h
@@ -41,6 +41,8 @@ int kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
 int kvm_has_robust_singlestep(void);
 int kvm_has_debugregs(void);
+int kvm_has_xsave(void);
+int kvm_has_xcrs(void);
 
 #ifdef NEED_CPU_H
 int kvm_init_vcpu(CPUState *env);
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 548ab80..680eed1 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -718,6 +718,11 @@ typedef struct CPUX86State {
 uint16_t fpus_vmstate;
 uint16_t fptag_vmstate;
 uint16_t fpregs_format_vmstate;
+
+uint64_t xstate_bv;
+XMMReg ymmh_regs[CPU_NB_REGS];
+
+uint64_t xcr0;
 } CPUX86State;
 
 CPUX86State *cpu_x86_init(const char *cpu_model);
@@ -895,7 +900,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
 #define cpu_list_id x86_cpu_list
 #define cpudef_setup   x86_cpudef_setup
 
-#define CPU_SAVE_VERSION 11
+#define CPU_SAVE_VERSION 12
 
 /* MMU modes definitions */
 #define MMU_MODE0_SUFFIX _kernel
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index bb6a12f..db1f21d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -289,6 +289,8 @@ void kvm_arch_reset_vcpu(CPUState *env)
     } else {
         env->mp_state = KVM_MP_STATE_RUNNABLE;
     }
+    /* Legal xcr0 for loading */
+    env->xcr0 = 1;
 }
 
 static int kvm_has_msr_star(CPUState *env)
@@ -504,6 +506,68 @@ static int kvm_put_fpu(CPUState *env)
     return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
 }
 
+#ifdef KVM_CAP_XSAVE
+#define XSAVE_CWD_RIP     2
+#define XSAVE_CWD_RDP     4
+#define XSAVE_MXCSR       6
+#define XSAVE_ST_SPACE    8
+#define XSAVE_XMM_SPACE   40
+#define XSAVE_XSTATE_BV   128
+#define XSAVE_YMMH_SPACE  144
+#endif
+
+static int kvm_put_xsave(CPUState *env)
+{
+#ifdef KVM_CAP_XSAVE
+    int i;
+    struct kvm_xsave* xsave;
+    uint16_t cwd, swd, twd, fop;
+
+    if (!kvm_has_xsave())
+        return kvm_put_fpu(env);
+
+    xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+    memset(xsave, 0, sizeof(struct kvm_xsave));
+    cwd = swd = twd = fop = 0;
+    swd = env->fpus & ~(7 << 11);
+    swd |= (env->fpstt & 7) << 11;
+    cwd = env->fpuc;
+    for (i = 0; i < 8; ++i)
+        twd |= (!env->fptags[i]) << i;
+    xsave->region[0] = (uint32_t)(swd << 16) + cwd;
+    xsave->region[1] = (uint32_t)(fop << 16) + twd;
+    memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
+            sizeof env->fpregs);
+    memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
+            sizeof env->xmm_regs);
+    xsave->region[XSAVE_MXCSR] = env->mxcsr;
+    *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
+    memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
+            sizeof env->ymmh_regs);
+    return kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
+#else
+    return kvm_put_fpu(env);
+#endif
+}
+
+static int kvm_put_xcrs(CPUState *env)
+{
+#ifdef KVM_CAP_XCRS
+    struct kvm_xcrs xcrs;
+
+    if (!kvm_has_xcrs())
+        return 0;
+
+    xcrs.nr_xcrs = 1;
+    xcrs.flags = 0;
+    xcrs.xcrs[0].xcr = 0;
+    xcrs.xcrs[0].value = env->xcr0;
+    return kvm_vcpu_ioctl(env, KVM_SET_XCRS, &xcrs);
+#else
+    return 0;
+#endif
+}
+
 static int kvm_put_sregs(CPUState *env)
 {
 struct kvm_sregs sregs;
@@ -621,6 +685,69 @@ static int kvm_get_fpu(CPUState *env)
 return 0;
 }
 
+static int kvm_get_xsave(CPUState *env)
+{
+#ifdef KVM_CAP_XSAVE
+    struct kvm_xsave* xsave;
+    int ret, i;
+    uint16_t cwd, swd, twd, fop;
+
+    if (!kvm_has_xsave())
+        return kvm_get_fpu(env);
+
+    xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+    ret = kvm_vcpu_ioctl(env, KVM_GET_XSAVE, xsave);
+    if (ret < 0)
+        return ret;
+
+    cwd = (uint16_t)xsave->region[0];
+    swd =

[Qemu-devel] Re: [PATCH 04/10] pci_bridge: introduce pci bridge layer.

2010-06-17 Thread Michael S. Tsirkin

On Thu, Jun 17, 2010 at 03:15:46PM +0900, Isaku Yamahata wrote:
 introduce pci bridge layer.
 export pci_bridge_write_config() for generic use.
 support device reset and bus reset of bridge control.
 convert apb bridge and dec p2p bridge to use new pci bridge layer.
 save/restore is supported as a side effect.
 
 This might be a bit over engineering, but this is also preparation
 for pci express root/upstream/downstream port.
 
 Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp

Well, preparations are easier to judge with patches that use them.

 ---
  hw/apb_pci.c|   38 +-
  hw/dec_pci.c|   28 +++---
  hw/pci_bridge.c |  146 
 +--
  hw/pci_bridge.h |   35 -
  qemu-common.h   |1 +
  5 files changed, 177 insertions(+), 71 deletions(-)
 
 diff --git a/hw/apb_pci.c b/hw/apb_pci.c
 index c11d9b5..cb9051b 100644
 --- a/hw/apb_pci.c
 +++ b/hw/apb_pci.c
 @@ -31,6 +31,7 @@
  #include "pci_host.h"
  #include "pci_bridge.h"
  #include "rwhandler.h"
 +#include "pci_bridge.h"
  #include "apb_pci.h"
  #include "sysemu.h"
  
 @@ -294,9 +295,12 @@ static void pci_apb_set_irq(void *opaque, int irq_num, 
 int level)
  }
  }
  
 -static void apb_pci_bridge_init(PCIBus *b)
 +static int apb_pci_bridge_init(PCIBridge *br)
  {
 -    PCIDevice *dev = pci_bridge_get_device(b);
 +    PCIDevice *dev = &br->dev;
 +
 +    pci_config_set_vendor_id(dev->config, PCI_VENDOR_ID_SUN);
 +    pci_config_set_device_id(dev->config, PCI_DEVICE_ID_SUN_SIMBA);
  
  /*
   * command register:
 @@ -316,6 +320,8 @@ static void apb_pci_bridge_init(PCIBus *b)
      pci_set_byte(dev->config + PCI_HEADER_TYPE,
                   pci_get_byte(dev->config + PCI_HEADER_TYPE) |
                   PCI_HEADER_TYPE_MULTI_FUNCTION);
 +
 +    return 0;
  }
  
  PCIBus *pci_apb_init(target_phys_addr_t special_base,
 @@ -326,6 +332,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
  SysBusDevice *s;
  APBState *d;
  unsigned int i;
 +PCIBridge *br;
  
  /* Ultrasparc PBM main bus */
      dev = qdev_create(NULL, "pbm");
 @@ -351,17 +358,13 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
      pci_create_simple(d->bus, 0, "pbm");
  
      /* APB secondary busses */
 -    *bus2 = pci_bridge_init(d->bus, PCI_DEVFN(1, 0),
 -                            PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA,
 -                            pci_apb_map_irq,
 -                            "Advanced PCI Bus secondary bridge 1");
 -    apb_pci_bridge_init(*bus2);
 -
 -    *bus3 = pci_bridge_init(d->bus, PCI_DEVFN(1, 1),
 -                            PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA,
 -                            pci_apb_map_irq,
 -                            "Advanced PCI Bus secondary bridge 2");
 -    apb_pci_bridge_init(*bus3);
 +    br = pci_bridge_create_simple(d->bus, PCI_DEVFN(1, 0), "pbm-bridge",
 +                                  "Advanced PCI Bus secondary bridge 1");
 +    *bus2 = pci_bridge_get_sec_bus(br);
 +
 +    br = pci_bridge_create_simple(d->bus, PCI_DEVFN(1, 1), "pbm-bridge",
 +                                  "Advanced PCI Bus secondary bridge 2");
 +    *bus3 = pci_bridge_get_sec_bus(br);
  
      return d->bus;
  }
 @@ -446,10 +449,19 @@ static SysBusDeviceInfo pbm_host_info = {
  .qdev.reset = pci_pbm_reset,
  .init = pci_pbm_init_device,
  };
 +
 +static PCIBridgeInfo pbm_pci_bridge_info = {
 +    .pci.qdev.name = "pbm-bridge",
 +    .pci.qdev.vmsd = &vmstate_pci_device,
 +    .init = apb_pci_bridge_init,
 +    .map_irq = pci_apb_map_irq,
 +};
 +
  static void pbm_register_devices(void)
  {
      sysbus_register_withprop(&pbm_host_info);
      pci_qdev_register(&pbm_pci_host_info);
 +    pci_bridge_qdev_register(&pbm_pci_bridge_info);
  }
  
  device_init(pbm_register_devices)
 diff --git a/hw/dec_pci.c b/hw/dec_pci.c
 index b2759dd..45b5c28 100644
 --- a/hw/dec_pci.c
 +++ b/hw/dec_pci.c
 @@ -49,18 +49,27 @@ static int dec_map_irq(PCIDevice *pci_dev, int irq_num)
      return irq_num;
  }
  
 -PCIBus *pci_dec_21154_init(PCIBus *parent_bus, int devfn)
 +static int dec_21154_initfn(PCIBridge *br)
  {
 -    DeviceState *dev;
 -    PCIBus *ret;
 +    pci_config_set_vendor_id(br->dev.config, PCI_VENDOR_ID_DEC);
 +    pci_config_set_device_id(br->dev.config, PCI_DEVICE_ID_DEC_21154);
 +    return 0;
 +}
  
 -    dev = qdev_create(NULL, "dec-21154");
 -    qdev_init_nofail(dev);
 -    ret = pci_bridge_init(parent_bus, devfn,
 -                          PCI_VENDOR_ID_DEC, PCI_DEVICE_ID_DEC_21154,
 -                          dec_map_irq, "DEC 21154 PCI-PCI bridge");
 +static PCIBridgeInfo dec_21154_pci_bridge_info = {
 +    .pci.qdev.name = "dec-21154-p2p-bridge",
 +    .pci.qdev.desc = "DEC 21154 PCI-PCI bridge",
 +    .pci.qdev.vmsd = &vmstate_pci_device,
 +    .init = dec_21154_initfn,
 +    .map_irq = dec_map_irq,
 +};
  
 -    return ret;
 +PCIBus *pci_dec_21154_init(PCIBus *parent_bus, int devfn)
 +{
 +    PCIBridge *br;
 +br =

[Qemu-devel] [PATCH] qemu-kvm: Replace kvm_set/get_fpu() with upstream version.

2010-06-17 Thread Sheng Yang


Signed-off-by: Sheng Yang sh...@linux.intel.com
---

I will send out the XSAVE patch after the upstream ones have been merged, since
that patch would be affected by the merge.

 qemu-kvm-x86.c|   23 ++-
 qemu-kvm.c|   10 --
 qemu-kvm.h|   30 --
 target-i386/kvm.c |5 -
 4 files changed, 6 insertions(+), 62 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 3c33e64..49218ae 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -775,7 +775,6 @@ static void get_seg(SegmentCache *lhs, const struct kvm_segment *rhs)
 void kvm_arch_load_regs(CPUState *env, int level)
 {
     struct kvm_regs regs;
-    struct kvm_fpu fpu;
     struct kvm_sregs sregs;
     struct kvm_msr_entry msrs[100];
     int rc, n, i;
@@ -806,16 +805,7 @@ void kvm_arch_load_regs(CPUState *env, int level)
 
     kvm_set_regs(env, &regs);
 
-    memset(&fpu, 0, sizeof fpu);
-    fpu.fsw = env->fpus & ~(7 << 11);
-    fpu.fsw |= (env->fpstt & 7) << 11;
-    fpu.fcw = env->fpuc;
-    for (i = 0; i < 8; ++i)
-       fpu.ftwx |= (!env->fptags[i]) << i;
-    memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
-    memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
-    fpu.mxcsr = env->mxcsr;
-    kvm_set_fpu(env, &fpu);
+    kvm_put_fpu(env);
 
     memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
     if (env->interrupt_injected >= 0) {
@@ -933,7 +923,6 @@ void kvm_arch_load_regs(CPUState *env, int level)
 void kvm_arch_save_regs(CPUState *env)
 {
     struct kvm_regs regs;
-    struct kvm_fpu fpu;
     struct kvm_sregs sregs;
     struct kvm_msr_entry msrs[100];
     uint32_t hflags;
@@ -965,15 +954,7 @@ void kvm_arch_save_regs(CPUState *env)
     env->eflags = regs.rflags;
     env->eip = regs.rip;
 
-    kvm_get_fpu(env, &fpu);
-    env->fpstt = (fpu.fsw >> 11) & 7;
-    env->fpus = fpu.fsw;
-    env->fpuc = fpu.fcw;
-    for (i = 0; i < 8; ++i)
-       env->fptags[i] = !((fpu.ftwx >> i) & 1);
-    memcpy(env->fpregs, fpu.fpr, sizeof env->fpregs);
-    memcpy(env->xmm_regs, fpu.xmm, sizeof env->xmm_regs);
-    env->mxcsr = fpu.mxcsr;
+    kvm_get_fpu(env);
 
     kvm_get_sregs(env, &sregs);
 
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 96d458c..114cb5e 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -461,16 +461,6 @@ int kvm_set_regs(CPUState *env, struct kvm_regs *regs)
     return kvm_vcpu_ioctl(env, KVM_SET_REGS, regs);
 }
 
-int kvm_get_fpu(CPUState *env, struct kvm_fpu *fpu)
-{
-    return kvm_vcpu_ioctl(env, KVM_GET_FPU, fpu);
-}
-
-int kvm_set_fpu(CPUState *env, struct kvm_fpu *fpu)
-{
-    return kvm_vcpu_ioctl(env, KVM_SET_FPU, fpu);
-}
-
 int kvm_get_sregs(CPUState *env, struct kvm_sregs *sregs)
 {
     return kvm_vcpu_ioctl(env, KVM_GET_SREGS, sregs);
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 6f6c6d8..ebe7893 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -222,36 +222,6 @@ int kvm_get_regs(CPUState *env, struct kvm_regs *regs);
  * \return 0 on success
  */
 int kvm_set_regs(CPUState *env, struct kvm_regs *regs);
-/*!
- * \brief Read VCPU fpu registers
- *
- * This gets the FPU registers from the VCPU and outputs them
- * into a kvm_fpu structure
- *
- * \note This function returns a \b copy of the VCPUs registers.\n
- * If you wish to modify the VCPU FPU registers, you should call kvm_set_fpu()
- *
- * \param kvm Pointer to the current kvm_context
- * \param vcpu Which virtual CPU should get dumped
- * \param fpu Pointer to a kvm_fpu which will be populated with the VCPUs
- * fpu registers values
- * \return 0 on success
- */
-int kvm_get_fpu(CPUState *env, struct kvm_fpu *fpu);
-
-/*!
- * \brief Write VCPU fpu registers
- *
- * This sets the FPU registers on the VCPU from a kvm_fpu structure
- *
- * \note When this function returns, the fpu pointer and the data it points to
- * can be discarded
- * \param kvm Pointer to the current kvm_context
- * \param vcpu Which virtual CPU should get dumped
- * \param fpu Pointer to a kvm_fpu which holds the new vcpu fpu state
- * \return 0 on success
- */
-int kvm_set_fpu(CPUState *env, struct kvm_fpu *fpu);
 
 /*!
  * \brief Read VCPU system registers
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 9cb9cf4..9c13f62 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -488,6 +488,7 @@ static int kvm_getput_regs(CPUState *env, int set)
 
 return ret;
 }
+#endif /* KVM_UPSTREAM */
 
 static int kvm_put_fpu(CPUState *env)
 {
@@ -507,6 +508,7 @@ static int kvm_put_fpu(CPUState *env)
     return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
 }
 
+#ifdef KVM_UPSTREAM
 static int kvm_put_sregs(CPUState *env)
 {
 struct kvm_sregs sregs;
@@ -605,7 +607,7 @@ static int kvm_put_msrs(CPUState *env, int level)
     return kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data);
 
 }
-
+#endif /* KVM_UPSTREAM */
 
 static int kvm_get_fpu(CPUState *env)
 {
@@ -628,6 +630,7 @@ static int kvm_get_fpu(CPUState *env)
 return 0;
 }
 
+#ifdef KVM_UPSTREAM
 static int kvm_get_sregs(CPUState *env)
 {
 struct kvm_sregs sregs;
-- 
1.7.0.1
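
The open-coded conversion removed by this patch packs the x87 state into the layout KVM expects: the top-of-stack index goes into bits 11..13 of the status word, and the tag array is folded into an 8-bit "valid" bitmap. As a rough, self-contained sketch of just that packing (the struct names x87_split and x87_packed are hypothetical stand-ins for the relevant CPUState and struct kvm_fpu fields, not the real definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant CPUState fields. */
struct x87_split {
    uint16_t fpus;      /* status word; its TOP bits are ignored on packing */
    unsigned fpstt;     /* top-of-stack index, 0..7 */
    uint8_t fptags[8];  /* 1 = register empty, 0 = register valid */
};

/* Hypothetical stand-in for the fsw/ftwx fields of struct kvm_fpu. */
struct x87_packed {
    uint16_t fsw;
    uint8_t ftwx;
};

/* Same arithmetic as the removed kvm_arch_load_regs() code. */
static struct x87_packed x87_pack(const struct x87_split *s)
{
    struct x87_packed p = { 0, 0 };
    int i;

    assert(s->fpstt < 8);
    p.fsw = (uint16_t)(s->fpus & ~(7 << 11));          /* clear TOP field */
    p.fsw = (uint16_t)(p.fsw | ((s->fpstt & 7) << 11)); /* insert TOP */
    for (i = 0; i < 8; ++i) {
        /* bit i is set when register i holds a valid value */
        p.ftwx = (uint8_t)(p.ftwx | ((!s->fptags[i]) << i));
    }
    return p;
}

/* Same arithmetic as the removed kvm_arch_save_regs() code. */
static void x87_unpack(const struct x87_packed *p, struct x87_split *s)
{
    int i;

    s->fpstt = (p->fsw >> 11) & 7;
    s->fpus = p->fsw;
    for (i = 0; i < 8; ++i) {
        s->fptags[i] = !((p->ftwx >> i) & 1);
    }
}
```

The sketch leaves out fcw, mxcsr and the register arrays, which are copied unchanged.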

[Qemu-devel] Re: [PATCH 00/10] pci: pci to pci bridge clean up and enhancement

2010-06-17 Thread Michael S. Tsirkin

On Thu, Jun 17, 2010 at 03:15:42PM +0900, Isaku Yamahata wrote:
 This patch series cleans up pci to pci bridge layer by introducing
 pci bridge layer. and some bug fixes.
 Although pci bridge implementation would belong to pci.c,
 I split it out into pci_bridge.c because pci.c is already big enough.
 
 This might seem over engineering, but it's also a preparation for
 pci express root/upstream/downstream port emulators.
 Those express ports are similar, but different from each other.
 So new pci bridge layer helps here.
 Once this patch series is merged, the express ports patch will follow.

A separate patchset with just bugfixes and cleanups would be easier to
merge.


For example, forcing all devices to call pci_reset_default
in their reset routines does not look like a good cleanup:
the less boilerplate, the better IMO. New APIs seem undocumented
and it is not obvious what they do. For example, what does
qdev_reset do?  Adding more callbacks
also does not make me very happy, they are hard to follow
and to debug. Maybe it would be better to look at the bridge layer when
we see how it helps pci express. But it would be even better IMO to avoid
layers, and just export some common functions that can be reused,
without forcing all devices to go through them.


[Qemu-devel] Re: [PATCH V4 2/3] qemu: Generic task offloading framework: threadlets

2010-06-17 Thread Paolo Bonzini


+while (QTAILQ_EMPTY(&(queue->request_list)) &&
+       (ret != ETIMEDOUT)) {
+    ret = qemu_cond_timedwait(&(queue->cond),
+                              &(queue->lock), 10*10);
+}


Using qemu_cond_timedwait is a hack for not properly broadcasting the
condvar in flush_threadlet_queue.


I think Anthony answered this one.


I think he said that the code has been changed so I am right? :)


+/**
+ * flush_threadlet_queue: Wait till completion of all the submitted tasks
+ * @queue: Queue containing the tasks we're waiting on.
+ */
+void flush_threadlet_queue(ThreadletQueue *queue)
+{
+    qemu_mutex_lock(&queue->lock);
+    queue->exit = 1;
+
+    qemu_barrier_init(&queue->barr, queue->cur_threads + 1);
+    qemu_mutex_unlock(&queue->lock);
+
+    qemu_barrier_wait(&queue->barr);


Can be implemented just as well with queue->cond and a loop waiting for
queue->cur_threads == 0.  This would remove the need to implement barriers
in qemu-threads (especially for Win32).  Anyway whoever will contribute
Win32 qemu-threads can do it, since it's not hard.
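
A rough sketch of that condvar-based alternative, written against plain pthreads rather than the qemu-thread wrappers. The reduced ThreadletQueue below and the helper threadlet_worker_exit() are assumptions made for illustration; they are not taken from the posted series.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, reduced queue: only the fields the flush logic needs. */
typedef struct ThreadletQueue {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int cur_threads;   /* workers still running */
    int exit;          /* set to ask workers to finish */
} ThreadletQueue;

static void threadlet_queue_init(ThreadletQueue *queue)
{
    pthread_mutex_init(&queue->lock, NULL);
    pthread_cond_init(&queue->cond, NULL);
    queue->cur_threads = 0;
    queue->exit = 0;
}

/* Called by a worker as its last action, once it has seen queue->exit. */
static void threadlet_worker_exit(ThreadletQueue *queue)
{
    pthread_mutex_lock(&queue->lock);
    assert(queue->cur_threads > 0);
    queue->cur_threads--;
    /* Wake the flusher (and any idle workers) so they re-check state. */
    pthread_cond_broadcast(&queue->cond);
    pthread_mutex_unlock(&queue->lock);
}

/* Wait until every worker has gone away; no barrier is required. */
static void flush_threadlet_queue(ThreadletQueue *queue)
{
    pthread_mutex_lock(&queue->lock);
    queue->exit = 1;
    /* Wake idle workers blocked on the condvar so they notice exit. */
    pthread_cond_broadcast(&queue->cond);
    while (queue->cur_threads != 0) {
        pthread_cond_wait(&queue->cond, &queue->lock);
    }
    pthread_mutex_unlock(&queue->lock);
}
```

Because the flusher broadcasts and then waits on a predicate, workers no longer need a timed wait to discover that they should exit.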


That was the other option I had considered before going for barriers,
for no particular reason. Now, considering that barriers are not
welcome, I will implement this method.


I guess we decided flush isn't really useful at all.  Might as well 
leave it out of v5 and implement it later, so the barrier and 
complicated exit condition are now unnecessary.


Thanks,

Paolo

[Qemu-devel] Re: [PATCH 02/10] qdev: export qdev_reset() for later use.

2010-06-17 Thread Michael S. Tsirkin

On Thu, Jun 17, 2010 at 03:15:44PM +0900, Isaku Yamahata wrote:
 export qdev_reset() for later use.
 
 Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
 ---
  hw/qdev.c |   13 +
  hw/qdev.h |1 +
  2 files changed, 10 insertions(+), 4 deletions(-)
 
 diff --git a/hw/qdev.c b/hw/qdev.c
 index 61f999c..378f842 100644
 --- a/hw/qdev.c
 +++ b/hw/qdev.c
 @@ -256,13 +256,18 @@ DeviceState *qdev_device_add(QemuOpts *opts)
  return qdev;
  }
  
 -static void qdev_reset(void *opaque)
 +void qdev_reset(DeviceState *dev)

What does this API do? Yes, I see that it invokes
the reset callback internally. But what does it do
that the caller wants? After all, the callback
gets invoked on reset directly.


  {
 -    DeviceState *dev = opaque;
      if (dev->info->reset)
          dev->info->reset(dev);
  }
  
 +static void qdev_reset_fn(void *opaque)
 +{
 +    DeviceState *dev = opaque;
 +    qdev_reset(dev);
 +}
 +
  /* Initialize a device.  Device properties should be set before calling
 this function.  IRQs and MMIO regions should be connected/mapped after
 calling this function.
 @@ -278,7 +283,7 @@ int qdev_init(DeviceState *dev)
          qdev_free(dev);
          return rc;
      }
 -    qemu_register_reset(qdev_reset, dev);
 +    qemu_register_reset(qdev_reset_fn, dev);
      if (dev->info->vmsd) {
          vmstate_register_with_alias_id(-1, dev->info->vmsd, dev,
                                         dev->instance_id_alias,
 @@ -348,7 +353,7 @@ void qdev_free(DeviceState *dev)
          if (dev->opts)
              qemu_opts_del(dev->opts);
      }
 -    qemu_unregister_reset(qdev_reset, dev);
 +    qemu_unregister_reset(qdev_reset_fn, dev);
      QLIST_REMOVE(dev, sibling);
      for (prop = dev->info->props; prop && prop->name; prop++) {
          if (prop->info->free) {
 diff --git a/hw/qdev.h b/hw/qdev.h
 index be5ad67..5fbdebf 100644
 --- a/hw/qdev.h
 +++ b/hw/qdev.h
 @@ -113,6 +113,7 @@ typedef struct GlobalProperty {
  DeviceState *qdev_create(BusState *bus, const char *name);
  int qdev_device_help(QemuOpts *opts);
  DeviceState *qdev_device_add(QemuOpts *opts);
 +void qdev_reset(DeviceState *dev);
  int qdev_init(DeviceState *dev) QEMU_WARN_UNUSED_RESULT;
  void qdev_init_nofail(DeviceState *dev);
  void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
 -- 
 1.6.6.1
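
For readers following the question above, the shape of the change is a typed entry point plus a thin void * adapter for the reset registry. A reduced sketch, assuming hypothetical cut-down DeviceInfo/DeviceState types (the reset_count field and count_reset() exist only for illustration):

```c
#include <assert.h>
#include <stddef.h>

typedef struct DeviceState DeviceState;

/* Hypothetical, reduced DeviceInfo: only the reset hook. */
typedef struct DeviceInfo {
    void (*reset)(DeviceState *dev);
} DeviceInfo;

struct DeviceState {
    const DeviceInfo *info;
    int reset_count;   /* illustration only */
};

/* Typed entry point: callable directly, e.g. from a bus reset routine. */
void qdev_reset(DeviceState *dev)
{
    assert(dev != NULL && dev->info != NULL);
    if (dev->info->reset) {
        dev->info->reset(dev);
    }
}

/* Adapter with the void * signature a generic reset registry expects. */
static void qdev_reset_fn(void *opaque)
{
    DeviceState *dev = opaque;
    qdev_reset(dev);
}

/* Example reset hook used to observe the calls. */
static void count_reset(DeviceState *dev)
{
    dev->reset_count++;
}
```

Whether exporting the typed function is worthwhile is exactly what the review comment asks; the sketch only shows the mechanics.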

[Qemu-devel] Re: [PATCH 05/10] pci bridge: add helper function for ssvid capability.

2010-06-17 Thread Michael S. Tsirkin

On Thu, Jun 17, 2010 at 03:15:47PM +0900, Isaku Yamahata wrote:
 helper function to add ssvid capability.
 
 Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp

But .. this is unused?

 ---
  hw/pci_bridge.c |   20 
  hw/pci_bridge.h |3 +++
  2 files changed, 23 insertions(+), 0 deletions(-)
 
 diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
 index 43c21d4..1397a11 100644
 --- a/hw/pci_bridge.c
 +++ b/hw/pci_bridge.c
 @@ -29,6 +29,26 @@
  
  #include "pci_bridge.h"
  
 +/* PCI bridge subsystem vendor ID helper functions */
 +#define PCI_SSVID_SIZEOF        8
 +#define PCI_SSVID_SVID          4
 +#define PCI_SSVID_SSID          6
 +
 +int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
 +                          uint16_t svid, uint16_t ssid)
 +{
 +    int pos;
 +    pos = pci_add_capability_at_offset(dev, PCI_CAP_ID_SSVID,
 +                                       offset, PCI_SSVID_SIZEOF);
 +    if (pos < 0) {
 +        return pos;
 +    }
 +
 +    pci_set_word(dev->config + pos + PCI_SSVID_SVID, svid);
 +    pci_set_word(dev->config + pos + PCI_SSVID_SSID, ssid);
 +    return pos;
 +}
 +
  void pci_bridge_write_config(PCIDevice *d,
   uint32_t address, uint32_t val, int len)
  {
 diff --git a/hw/pci_bridge.h b/hw/pci_bridge.h
 index 2747e7f..a1f160b 100644
 --- a/hw/pci_bridge.h
 +++ b/hw/pci_bridge.h
 @@ -23,6 +23,9 @@
  
  #include "pci.h"
  
 +int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
 +  uint16_t svid, uint16_t ssid);
 +
  struct PCIBridge {
  PCIDevice dev;
  
 -- 
 1.6.6.1
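
To make the capability layout concrete, here is a standalone sketch that builds the same 8-byte SSVID capability in a plain byte array. The helpers add_capability_at(), set_word() and get_word() are hypothetical simplifications, not the QEMU functions: there is no overlap tracking, and the capability-list bit in the status register is not touched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE            256
#define PCI_CAPABILITY_LIST 0x34   /* head of the capability list */
#define PCI_CAP_ID_SSVID    0x0d
#define PCI_SSVID_SIZEOF    8
#define PCI_SSVID_SVID      4
#define PCI_SSVID_SSID      6

/* Config space is little endian. */
static void set_word(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static uint16_t get_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Link a new capability at the head of the list.
 * Returns its offset, or -1 if it does not fit. */
static int add_capability_at(uint8_t *cfg, uint8_t id,
                             uint8_t offset, uint8_t size)
{
    if (offset < 0x40 || offset + size > CFG_SIZE) {
        return -1;
    }
    memset(cfg + offset, 0, size);
    cfg[offset] = id;                            /* capability ID */
    cfg[offset + 1] = cfg[PCI_CAPABILITY_LIST];  /* next pointer */
    cfg[PCI_CAPABILITY_LIST] = offset;
    return offset;
}

static int ssvid_init(uint8_t *cfg, uint8_t offset,
                      uint16_t svid, uint16_t ssid)
{
    int pos;

    assert(cfg != NULL);
    pos = add_capability_at(cfg, PCI_CAP_ID_SSVID, offset, PCI_SSVID_SIZEOF);
    if (pos < 0) {
        return pos;
    }
    set_word(cfg + pos + PCI_SSVID_SVID, svid);
    set_word(cfg + pos + PCI_SSVID_SSID, ssid);
    return pos;
}
```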

Re: [Qemu-devel] RFC v3: blockdev_add friends, brief rationale, QMP docs

2010-06-17 Thread Stefan Hajnoczi

On Wed, Jun 16, 2010 at 6:27 PM, Markus Armbruster arm...@redhat.com wrote:
 blockdev_add
 

 Add host block device.

 Arguments:

 - "id": the host block device's ID, must be unique (json-string)
 - "format": image format (json-string, optional)
     - Possible values: "raw", "qcow2", ...

What is the default when unset?  (I expect we'll auto-detect the
format but this should be documented.)

 - "protocol": image access protocol (json-string, optional)
     - Possible values: "auto", "file", "nbd", ...

The semantics of "auto" are not documented here.

 Notes:

 (1) If argument protocol is missing, all other optional arguments must
    be missing as well.  This defines a block device with no media
    inserted.

Perhaps this is what "auto" means?

 (2) It's possible to list supported disk formats and protocols by
    running QEMU with arguments -blockdev_add \?.

Is there a query-block-driver command or something in QMP to
enumerate supported formats and protocols?  Not sure how useful this
would be to the management stack - blockdev_add will probably return
an error if an attempt is made to open an unsupported file.

 blockdev_del
 

 Remove a host block device.

 Arguments:

 - "id": the host block device's ID (json-string)

 Example:

 -> { "execute": "blockdev_del", "arguments": { "id": "blk1" } }
 <- { "return": {} }

What about an attached guest device?  Will this fail if the virtio-blk
PCI device is still present?  For SCSI I imagine we can usually just
remove the host block device.  For IDE there isn't hotplug support
AFAIK, what happens?

Stefan

[Qemu-devel] Re: [PATCH 5/5] linux fbdev display driver.

2010-06-17 Thread Gerd Hoffmann


  Hi,


+static void fbdev_free_displaysurface(DisplaySurface *surface)
+{
+    if (surface == NULL)
+        return;
+
+    if (surface->flags & QEMU_ALLOCATED_FLAG) {
+        qemu_free(surface->data);
+    }
+
+    surface->data = NULL;


This is pretty pointless ...


+    qemu_free(surface);


... as you free surface anyway ;)


@@ -910,7 +959,17 @@ void fbdev_display_init(DisplayState *ds, const char *device)
     dcl->dpy_update  = fbdev_update;
     dcl->dpy_resize  = fbdev_resize;
     dcl->dpy_refresh = fbdev_refresh;
+    dcl->dpy_setdata = fbdev_setdata;
     register_displaychangelistener(ds, dcl);
+
+    da = qemu_mallocz(sizeof (DisplayAllocator));
+    da->create_displaysurface = fbdev_create_displaysurface;
+    da->resize_displaysurface = fbdev_resize_displaysurface;
+    da->free_displaysurface = fbdev_free_displaysurface;
+
+    if (register_displayallocator(ds, da) == da) {
+        dpy_resize(ds);
+    }


You register the display allocator, but don't unregister in 
fbdev_display_uninit().


You are just lucky that fbdev_cleanup() forgets to unmap the framebuffer.

Apply the attached fix, start qemu with vnc, then do "change fbdev on"
and "change fbdev off" in the monitor and watch qemu segfault.


Also, after "change fbdev on" the guest screen isn't rendered correctly.

cheers,
  Gerd

From 685849ae48eaef7927b90e012fb6afb4494052d0 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann kra...@redhat.com
Date: Thu, 17 Jun 2010 12:32:53 +0200
Subject: [PATCH] fbdev: unmap framebuffer on cleanup

---
 fbdev.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fbdev.c b/fbdev.c
index 6623a4f..1a95ede 100644
--- a/fbdev.c
+++ b/fbdev.c
@@ -518,6 +518,10 @@ static void fbdev_cleanup(void)
     fprintf(stderr, "%s\n", __FUNCTION__);
 
     /* restore console */
+    if (fb_mem != NULL) {
+        munmap(fb_mem, fb_fix.smem_len+fb_mem_offset);
+        fb_mem = NULL;
+    }
     if (fb != -1) {
         if (ioctl(fb,FBIOPUT_VSCREENINFO, &fb_ovar) < 0)
             perror("ioctl FBIOPUT_VSCREENINFO");
-- 
1.6.5.2

[Qemu-devel] [Bug 595438] Re: KVM segmentation fault, using SCSI+writeback and linux 2.4 guest

2010-06-17 Thread Коренберг Марк

** Summary changed:

- segmentation  scsi writeback
+ KVM segmentation fault, using SCSI+writeback and linux 2.4 guest

-- 
KVM segmentation fault, using SCSI+writeback and linux 2.4 guest
https://bugs.launchpad.net/bugs/595438
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New

Bug description:
I use Ubuntu 10.04 (32-bit) with the standard kvm.
I have an E7600 @ 3.06GHz processor with VMX.

On this system I run:
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -M pc-0.12 -enable-kvm -m 256 -smp 1 -name 
spamsender -uuid b9cacd5e-08f7-41fd-78c8-89cec59af881 -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/spamsender.monitor,server,nowait 
-monitor chardev:monitor -boot d -drive 
file=/mnt/megadiff/cdiso_400_130.iso,if=ide,media=cdrom,index=2 -drive 
file=/home/mmarkk/spamsender2.img,if=scsi,index=0,format=qcow2,cache=writeback 
-net nic,macaddr=00:00:00:00:00:00,vlan=0,name=nic.0 -net tap,vlan=0,name=tap.0 
-chardev pty,id=serial0 -serial chardev:serial0 -parallel none -usb -vnc 
127.0.0.1:0 -vga cirrus

The .iso image contains a custom distro based on a 2.4 Linux kernel. During
the install process (while a .tar.gz is being unpacked), kvm dies with a
segmentation fault.

This happens ONLY when a SCSI virtual disk and writeback caching are used
together. writeback+ide and writethrough+scsi work OK.

I use qcow2. It seems that qcow does not have this problem.

The virtual machine goes down at a random point during the file copy,
apparently when the qcow2 file needs to grow.

[Qemu-devel] [Bug 595438] Re: segmentation scsi writeback

2010-06-17 Thread Коренберг Марк

The bug is 100% reproducible (on this machine and on another machine with
a different processor).

Core dump (bzip2) attached.


** Attachment added: core dump
   http://launchpadlibrarian.net/50482028/core.bz2

-- 
segmentation  scsi writeback
https://bugs.launchpad.net/bugs/595438
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New


[Qemu-devel] [Bug 595438] [NEW] segmentation scsi writeback

2010-06-17 Thread Коренберг Марк

Public bug reported:

I use Ubuntu 10.04 (32-bit) with the standard kvm.
I have an E7600 @ 3.06GHz processor with VMX.

On this system I run:
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -M pc-0.12 -enable-kvm -m 256 -smp 1 -name 
spamsender -uuid b9cacd5e-08f7-41fd-78c8-89cec59af881 -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/spamsender.monitor,server,nowait 
-monitor chardev:monitor -boot d -drive 
file=/mnt/megadiff/cdiso_400_130.iso,if=ide,media=cdrom,index=2 -drive 
file=/home/mmarkk/spamsender2.img,if=scsi,index=0,format=qcow2,cache=writeback 
-net nic,macaddr=00:00:00:00:00:00,vlan=0,name=nic.0 -net tap,vlan=0,name=tap.0 
-chardev pty,id=serial0 -serial chardev:serial0 -parallel none -usb -vnc 
127.0.0.1:0 -vga cirrus

The .iso image contains a custom distro based on a 2.4 Linux kernel.
During the install process (while a .tar.gz is being unpacked), kvm dies
with a segmentation fault.

This happens ONLY when a SCSI virtual disk and writeback caching are used
together. writeback+ide and writethrough+scsi work OK.

I use qcow2. It seems that qcow does not have this problem.

The virtual machine goes down at a random point during the file copy,
apparently when the qcow2 file needs to grow.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
segmentation  scsi writeback
https://bugs.launchpad.net/bugs/595438
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New


[Qemu-devel] [Bug 595438] Re: segmentation scsi writeback

2010-06-17 Thread Коренберг Марк

Please don't give me a hard time about the 'spamsender' machine name. I never
send spam; it's just our mail server :)

-- 
segmentation  scsi writeback
https://bugs.launchpad.net/bugs/595438
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New


[Qemu-devel] [PATCH 0/8] seabios: pci: multi pci bus support

2010-06-17 Thread Isaku Yamahata

This patch set allows seabios to initialize multiple pci buses and 64bit BARs.

Currently seabios is able to initialize only the pci root bus.
However, multi pci bus support is wanted because
  - more pci buses are wanted in qemu to provide more slots
  - pci express support is coming to qemu, which requires multiple pci buses.
The corresponding patches on the Qemu side are still under way, though.

Isaku Yamahata (8):
  seabios: pci: introduce foreachpci_in_bus() helper macro.
  seabios: pciinit: factor out pci bar region allocation logic.
  seabios: pciinit: make pci memory space assignment 64bit aware.
  seabios: pciinit: make pci bar assigner preferchable memory aware.
  seabios: pciinit: factor out bar offset calculation.
  seabios: pciinit: make bar offset calculation pci bridge aware.
  seabios: pciinit: pci bridge bus initialization.
  seabios: pciinit: initialize pci bridge filtering registers.

 src/pci.c |   30 ++
 src/pci.h |   11 ++
 src/pciinit.c |  310 
 3 files changed, 306 insertions(+), 45 deletions(-)

[Qemu-devel] [PATCH 7/8] seabios: pciinit: pci bridge bus initialization.

2010-06-17 Thread Isaku Yamahata

pci bridge bus initialization.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 src/pciinit.c |   70 +
 1 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 1c2c8c6..fe6848a 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -220,6 +220,74 @@ static void pci_bios_init_device(u16 bdf)
 }
 }
 
+static void
+pci_bios_init_bus_rec(int bus, u8 *pci_bus)
+{
+    int devfn, bdf;
+    u16 class;
+
+    dprintf(1, "PCI: %s bus = 0x%x\n", __func__, bus);
+
+    /* prevent accidental access to unintended devices */
+    foreachpci_in_bus(bus, devfn, bdf) {
+        class = pci_config_readw(bdf, PCI_CLASS_DEVICE);
+        if (class == PCI_CLASS_BRIDGE_PCI) {
+            pci_config_writeb(bdf, PCI_SECONDARY_BUS, 255);
+            pci_config_writeb(bdf, PCI_SUBORDINATE_BUS, 0);
+        }
+    }
+
+    foreachpci_in_bus(bus, devfn, bdf) {
+        class = pci_config_readw(bdf, PCI_CLASS_DEVICE);
+        if (class != PCI_CLASS_BRIDGE_PCI) {
+            continue;
+        }
+        dprintf(1, "PCI: %s bdf = 0x%x\n", __func__, bdf);
+
+        u8 pribus = pci_config_readb(bdf, PCI_PRIMARY_BUS);
+        if (pribus != bus) {
+            dprintf(1, "PCI: primary bus = 0x%x -> 0x%x\n", pribus, bus);
+            pci_config_writeb(bdf, PCI_PRIMARY_BUS, bus);
+        } else {
+            dprintf(1, "PCI: primary bus = 0x%x\n", pribus);
+        }
+
+        u8 secbus = pci_config_readb(bdf, PCI_SECONDARY_BUS);
+        (*pci_bus)++;
+        if (*pci_bus != secbus) {
+            dprintf(1, "PCI: secondary bus = 0x%x -> 0x%x\n",
+                    secbus, *pci_bus);
+            secbus = *pci_bus;
+            pci_config_writeb(bdf, PCI_SECONDARY_BUS, secbus);
+        } else {
+            dprintf(1, "PCI: secondary bus = 0x%x\n", secbus);
+        }
+
+        /* set to max for access to all subordinate buses.
+           later set it to accurate value */
+        u8 subbus = pci_config_readb(bdf, PCI_SUBORDINATE_BUS);
+        pci_config_writeb(bdf, PCI_SUBORDINATE_BUS, 255);
+
+        pci_bios_init_bus_rec(secbus, pci_bus);
+
+        if (subbus != *pci_bus) {
+            dprintf(1, "PCI: subordinate bus = 0x%x -> 0x%x\n",
+                    subbus, *pci_bus);
+            subbus = *pci_bus;
+        } else {
+            dprintf(1, "PCI: subordinate bus = 0x%x\n", subbus);
+        }
+        pci_config_writeb(bdf, PCI_SUBORDINATE_BUS, subbus);
+    }
+}
+
+static void
+pci_bios_init_bus(void)
+{
+    u8 pci_bus = 0;
+    pci_bios_init_bus_rec(0 /* host bus */, &pci_bus);
+}
+
 void
 pci_setup(void)
 {
@@ -235,6 +303,8 @@ pci_setup(void)
 /* pci_bios_mem_addr + some value */
 pci_bios_prefmem_addr = pci_bios_mem_addr + 0x0800;
 
+pci_bios_init_bus();
+
 int bdf, max;
 foreachpci(bdf, max) {
 pci_bios_init_bridges(bdf);
-- 
1.6.6.1
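
The numbering scheme in this patch is a depth-first walk: each bridge takes the next free bus number as its secondary bus, everything behind it is numbered recursively, and the last number handed out becomes its subordinate bus. A small simulation of that idea, assuming a hypothetical in-memory struct bridge in place of real config space accesses:

```c
#include <assert.h>

/* Hypothetical model of a bridge; real code reads/writes config space. */
struct bridge {
    int parent;       /* index of the upstream bridge, -1 = host bus */
    int primary;
    int secondary;
    int subordinate;
};

/* Number every bridge that sits directly below 'parent', whose
 * downstream bus is 'bus'. *last_bus is the highest number used so far. */
static void assign_bus_numbers(struct bridge *br, int n, int parent,
                               int bus, int *last_bus)
{
    int i;

    assert(br != 0 && last_bus != 0);
    for (i = 0; i < n; i++) {
        if (br[i].parent != parent) {
            continue;
        }
        br[i].primary = bus;
        br[i].secondary = ++(*last_bus);
        /* Depth first: children consume numbers before the next sibling. */
        assign_bus_numbers(br, n, i, br[i].secondary, last_bus);
        br[i].subordinate = *last_bus;
    }
}
```

With two bridges on the host bus and a third behind the first one, the first bridge ends up covering the bus range 1..2 and the second gets bus 3.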

[Qemu-devel] [PATCH 2/8] seabios: pciinit: factor out pci bar region allocation logic.

2010-06-17 Thread Isaku Yamahata

factor out pci bar region allocation logic.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 src/pciinit.c |   84 -
 1 files changed, 47 insertions(+), 37 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 0556ee2..488c77b 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -37,6 +37,50 @@ static void pci_set_io_region_addr(u16 bdf, int region_num, u32 addr)
     dprintf(1, "region %d: 0x%08x\n", region_num, addr);
 }
 
+static void pci_bios_allocate_region(u16 bdf, int region_num)
+{
+    u32 *paddr;
+    int ofs;
+    if (region_num == PCI_ROM_SLOT)
+        ofs = PCI_ROM_ADDRESS;
+    else
+        ofs = PCI_BASE_ADDRESS_0 + region_num * 4;
+
+    u32 old = pci_config_readl(bdf, ofs);
+    u32 mask;
+    if (region_num == PCI_ROM_SLOT) {
+        mask = PCI_ROM_ADDRESS_MASK;
+        pci_config_writel(bdf, ofs, mask);
+    } else {
+        if (old & PCI_BASE_ADDRESS_SPACE_IO)
+            mask = PCI_BASE_ADDRESS_IO_MASK;
+        else
+            mask = PCI_BASE_ADDRESS_MEM_MASK;
+        pci_config_writel(bdf, ofs, ~0);
+    }
+    u32 val = pci_config_readl(bdf, ofs);
+    pci_config_writel(bdf, ofs, old);
+
+    if (val != 0) {
+        u32 size = (~(val & mask)) + 1;
+        if (val & PCI_BASE_ADDRESS_SPACE_IO)
+            paddr = &pci_bios_io_addr;
+        else
+            paddr = &pci_bios_mem_addr;
+        *paddr = ALIGN(*paddr, size);
+        pci_set_io_region_addr(bdf, region_num, *paddr);
+        *paddr += size;
+    }
+}
+
+static void pci_bios_allocate_regions(u16 bdf)
+{
+    int i;
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        pci_bios_allocate_region(bdf, i);
+    }
+}
+
 /* return the global irq number corresponding to a given device irq
pin. We could also use the bus number to have a more precise
mapping. */
@@ -78,8 +122,7 @@ static void pci_bios_init_bridges(u16 bdf)
 static void pci_bios_init_device(u16 bdf)
 {
 int class;
-u32 *paddr;
-int i, pin, pic_irq, vendor_id, device_id;
+int pin, pic_irq, vendor_id, device_id;
 
 class = pci_config_readw(bdf, PCI_CLASS_DEVICE);
 vendor_id = pci_config_readw(bdf, PCI_VENDOR_ID);
@@ -94,7 +137,7 @@ static void pci_bios_init_device(u16 bdf)
 /* PIIX3/PIIX4 IDE */
 pci_config_writew(bdf, 0x40, 0x8000); // enable IDE0
 pci_config_writew(bdf, 0x42, 0x8000); // enable IDE1
-goto default_map;
+pci_bios_allocate_regions(bdf);
 } else {
 /* IDE: we map it as in ISA mode */
 pci_set_io_region_addr(bdf, 0, PORT_ATA1_CMD_BASE);
@@ -121,41 +164,8 @@ static void pci_bios_init_device(u16 bdf)
 }
 break;
     default:
-    default_map:
         /* default memory mappings */
-        for (i = 0; i < PCI_NUM_REGIONS; i++) {
-            int ofs;
-            if (i == PCI_ROM_SLOT)
-                ofs = PCI_ROM_ADDRESS;
-            else
-                ofs = PCI_BASE_ADDRESS_0 + i * 4;
-
-            u32 old = pci_config_readl(bdf, ofs);
-            u32 mask;
-            if (i == PCI_ROM_SLOT) {
-                mask = PCI_ROM_ADDRESS_MASK;
-                pci_config_writel(bdf, ofs, mask);
-            } else {
-                if (old & PCI_BASE_ADDRESS_SPACE_IO)
-                    mask = PCI_BASE_ADDRESS_IO_MASK;
-                else
-                    mask = PCI_BASE_ADDRESS_MEM_MASK;
-                pci_config_writel(bdf, ofs, ~0);
-            }
-            u32 val = pci_config_readl(bdf, ofs);
-            pci_config_writel(bdf, ofs, old);
-
-            if (val != 0) {
-                u32 size = (~(val & mask)) + 1;
-                if (val & PCI_BASE_ADDRESS_SPACE_IO)
-                    paddr = &pci_bios_io_addr;
-                else
-                    paddr = &pci_bios_mem_addr;
-                *paddr = ALIGN(*paddr, size);
-                pci_set_io_region_addr(bdf, i, *paddr);
-                *paddr += size;
-            }
-        }
+        pci_bios_allocate_regions(bdf);
         break;
 }
 
-- 
1.6.6.1

[Qemu-devel] Re: [PATCH 03/10] pci: fix pci_bus_reset() with 64bit BAR and several clean ups.

2010-06-17 Thread Michael S. Tsirkin

On Thu, Jun 17, 2010 at 03:15:45PM +0900, Isaku Yamahata wrote:
> fix pci_device_reset() with 64bit BAR.
> export pci_bus_reset(), pci_device_reset() and two helper functions
> for later use. And several clean ups.
>
> Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
> ---
>  hw/pci.c |   44 
>  hw/pci.h |5 +
>  2 files changed, 41 insertions(+), 8 deletions(-)
>
> diff --git a/hw/pci.c b/hw/pci.c
> index 9ba62eb..87f5e6c 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -144,28 +144,50 @@ static void pci_update_irq_status(PCIDevice *dev)
>  }
>  }
>
> -static void pci_device_reset(PCIDevice *dev)
> +void pci_device_reset_default(PCIDevice *dev)
>  {
>  int r;
>
>  dev->irq_state = 0;
>  pci_update_irq_status(dev);
> -dev->config[PCI_COMMAND] &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
> -  PCI_COMMAND_MASTER);
> +pci_set_word(dev->config + PCI_COMMAND,
> + pci_get_word(dev->config + PCI_COMMAND) &
> + ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER));
>  dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
>  dev->config[PCI_INTERRUPT_LINE] = 0x0;
>  for (r = 0; r < PCI_NUM_REGIONS; ++r) {
> -if (!dev->io_regions[r].size) {
> +PCIIORegion *region = &dev->io_regions[r];
> +if (!region->size) {
>  continue;
>  }
> -pci_set_long(dev->config + pci_bar(dev, r), dev->io_regions[r].type);
> +
> +if (!(region->type & PCI_BASE_ADDRESS_SPACE_IO) &&
> +region->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> +pci_set_quad(dev->config + pci_bar(dev, r), region->type);
> +} else {
> +pci_set_long(dev->config + pci_bar(dev, r), region->type);
> +}
>  }
>  pci_update_mappings(dev);
>  }
  

I applied the first hunk. Looking at it
made me notice that we don't clear interrupt disable
bit on reset, and we really should as it is read/write.
Rather than duplicating code, we should just use wmask.

I ended up with this:

commit b82d3876099c4f1fd009082f052e3bac7e3062e7
Author: Isaku Yamahata yamah...@valinux.co.jp
Date:   Thu Jun 17 15:15:45 2010 +0900

pci: fix pci_device_reset

Clear interrupt disable bit on reset, according to PCI spec.
Fix pci_device_reset() with 64bit BAR.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
Signed-off-by: Michael S. Tsirkin m...@redhat.com

diff --git a/hw/pci.c b/hw/pci.c
index 7787005..de33745 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -150,15 +150,24 @@ static void pci_device_reset(PCIDevice *dev)
 
 dev->irq_state = 0;
 pci_update_irq_status(dev);
-dev->config[PCI_COMMAND] &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
-  PCI_COMMAND_MASTER);
+/* Clear all writeable bits */
+pci_set_word(dev->config + PCI_COMMAND,
+ pci_get_word(dev->config + PCI_COMMAND) &
+ ~pci_get_word(dev->wmask + PCI_COMMAND));
 dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
 dev->config[PCI_INTERRUPT_LINE] = 0x0;
 for (r = 0; r < PCI_NUM_REGIONS; ++r) {
-if (!dev->io_regions[r].size) {
+PCIIORegion *region = &dev->io_regions[r];
+if (!region->size) {
 continue;
 }
-pci_set_long(dev->config + pci_bar(dev, r), dev->io_regions[r].type);
+
+if (!(region->type & PCI_BASE_ADDRESS_SPACE_IO) &&
+region->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+pci_set_quad(dev->config + pci_bar(dev, r), region->type);
+} else {
+pci_set_long(dev->config + pci_bar(dev, r), region->type);
+}
+}
 }
 pci_update_mappings(dev);
 }
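
The wmask idea from the reply above can be sketched outside of QEMU. This is only an illustration: the helper name `reset_writable_bits` is hypothetical, and the bit values are the standard PCI command register bits rather than anything taken from hw/pci.c.

```c
#include <stdint.h>

/* Standard PCI command register bits (from the PCI specification). */
#define CMD_IO            0x0001
#define CMD_MEMORY        0x0002
#define CMD_MASTER        0x0004
#define CMD_INTX_DISABLE  0x0400

/*
 * Hypothetical helper: value of a 16-bit config register after reset.
 * Every guest-writable bit (set in wmask) is cleared; read-only bits
 * keep whatever value the device model gave them.
 */
static uint16_t reset_writable_bits(uint16_t value, uint16_t wmask)
{
    return value & (uint16_t)~wmask;
}
```

With a hard-coded mask of IO | MEMORY | MASTER the interrupt disable bit survives a reset; with a write mask that also covers that bit, it is cleared without listing it explicitly.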

[Qemu-devel] [PATCH 1/8] seabios: pci: introduce foreachpci_in_bus() helper macro.

2010-06-17 Thread Isaku Yamahata

This patch introduces the foreachpci_in_bus() helper macro for
depth-first recursion. foreachpci() is for breadth-first recursion.
The macro will be used later to initialize PCI bridges,
which requires depth-first recursion.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 src/pci.c |   30 ++
 src/pci.h |   11 +++
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/src/pci.c b/src/pci.c
index 1ab3c2c..d418b4b 100644
--- a/src/pci.c
+++ b/src/pci.c
@@ -157,6 +157,36 @@ pci_find_vga(void)
 }
 }
 
+// Helper function for foreachpci_in_bus() macro - return next devfn
+int
+pci_next_in_bus(int bus, int devfn)
+{
+int bdf = pci_bus_devfn_to_bdf(bus, devfn);
+if (pci_bdf_to_fn(bdf) == 1
+ && (pci_config_readb(bdf-1, PCI_HEADER_TYPE) & 0x80) == 0)
+// Last found device wasn't a multi-function device - skip to
+// the next device.
+devfn += 7;
+
+for (;;) {
+if (devfn >= 0x100)
+return -1;
+
+bdf = pci_bus_devfn_to_bdf(bus, devfn);
+u16 v = pci_config_readw(bdf, PCI_VENDOR_ID);
+if (v != 0x0000 && v != 0xffff)
+// Device is present.
+break;
+
+if (pci_bdf_to_fn(bdf) == 0)
+devfn += 8;
+else
+devfn += 1;
+}
+
+return devfn;
+}
+
 // Search for a device with the specified vendor and device ids.
 int
 pci_find_device(u16 vendid, u16 devid)
diff --git a/src/pci.h b/src/pci.h
index 8a21c06..26bfd40 100644
--- a/src/pci.h
+++ b/src/pci.h
@@ -21,6 +21,9 @@ static inline u8 pci_bdf_to_fn(u16 bdf) {
 static inline u16 pci_to_bdf(int bus, int dev, int fn) {
 return (bus<<8) | (dev<<3) | fn;
 }
+static inline u16 pci_bus_devfn_to_bdf(int bus, u16 devfn) {
+return (bus << 8) | devfn;
+}
 
 static inline u32 pci_vd(u16 vendor, u16 device) {
 return (device << 16) | vendor;
@@ -50,6 +53,14 @@ int pci_next(int bdf, int *pmax);
  ; BDF >= 0 \
  ; BDF=pci_next(BDF+1, &MAX))
 
+int pci_next_in_bus(int bus, int devfn);
+#define foreachpci_in_bus(BUS, DEVFN, BDF)  \
+for (DEVFN = pci_next_in_bus(BUS, 0),   \
+ BDF = pci_bus_devfn_to_bdf(BUS, DEVFN) \
+ ; DEVFN >= 0   \
+ ; DEVFN = pci_next_in_bus(BUS, DEVFN + 1), \
+   BDF = pci_bus_devfn_to_bdf(BUS, DEVFN))
+
 // pirtable.c
 void create_pirtable(void);
 
-- 
1.6.6.1
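
To see how the per-bus scan in the patch above skips functions 1-7 of single-function devices, here is a minimal sketch against a toy in-memory bus. The `toy_bus` structure and `toy_next_in_bus` name are hypothetical stand-ins for real config space reads; the control flow mirrors pci_next_in_bus().

```c
#include <stdint.h>

/* Toy model of one bus: vendor ID and header type for each devfn (0..255). */
struct toy_bus {
    uint16_t vendor[256];
    uint8_t header_type[256];
};

/* Return the next present devfn at or after 'devfn', or -1 if none. */
static int toy_next_in_bus(const struct toy_bus *bus, int devfn)
{
    /* Coming from function 0 of a single-function device: skip fn 1..7. */
    if ((devfn & 7) == 1 && (bus->header_type[devfn - 1] & 0x80) == 0)
        devfn += 7;

    for (;;) {
        if (devfn >= 0x100)
            return -1;

        uint16_t v = bus->vendor[devfn];
        if (v != 0x0000 && v != 0xffff)
            return devfn;               /* device is present */

        /* No function 0 means no device in this slot at all. */
        devfn += (devfn & 7) == 0 ? 8 : 1;
    }
}
```

A caller would loop with `devfn = toy_next_in_bus(&bus, devfn + 1)` until -1 is returned, which is the shape the foreachpci_in_bus() macro expands to.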

[Qemu-devel] [PATCH 4/8] seabios: pciinit: make pci bar assigner preferchable memory aware.

2010-06-17 Thread Isaku Yamahata

Make the PCI BAR assigner prefetchable-memory aware.
This is needed for PCI bridge support because memory space and
prefetchable memory space are filtered independently, based on the
memory base/limit and prefetchable memory base/limit of the PCI bridge.
On bus 0 such a distinction isn't necessary, so keep the existing behavior
by checking for bus == 0.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 src/pciinit.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index b635e44..b6ab157 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -16,6 +16,7 @@
 
 static u32 pci_bios_io_addr;
 static u32 pci_bios_mem_addr;
+static u32 pci_bios_prefmem_addr;
 /* host irqs corresponding to PCI irqs A-D */
 static u8 pci_irqs[4] = {
 10, 10, 11, 11
@@ -70,6 +71,12 @@ static int pci_bios_allocate_region(u16 bdf, int region_num)
 u32 size = (~(val & mask)) + 1;
 if (val & PCI_BASE_ADDRESS_SPACE_IO)
 paddr = &pci_bios_io_addr;
+else if ((val & PCI_BASE_ADDRESS_MEM_PREFETCH) &&
+ /* keep behaviour on bus = 0 */
+ pci_bdf_to_bus(bdf) != 0 &&
+ /* If pci_bios_prefmem_addr == 0, keep old behaviour */
+ pci_bios_prefmem_addr != 0)
+paddr = &pci_bios_prefmem_addr;
 else
 paddr = &pci_bios_mem_addr;
 *paddr = ALIGN(*paddr, size);
@@ -221,6 +228,9 @@ pci_setup(void)
 pci_bios_io_addr = 0xc000;
 pci_bios_mem_addr = BUILD_PCIMEM_START;
 
+/* pci_bios_mem_addr + some value */
+pci_bios_prefmem_addr = pci_bios_mem_addr + 0x08000000;
+
 int bdf, max;
 foreachpci(bdf, max) {
 pci_bios_init_bridges(bdf);
-- 
1.6.6.1

[Qemu-devel] [PATCH 5/8] seabios: pciinit: factor out bar offset calculation.

2010-06-17 Thread Isaku Yamahata

This patch factors out bar offset calculation.
Later the calculation logic will be enhanced.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 src/pciinit.c |   20 ++--
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index b6ab157..6ba51f2 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -22,15 +22,19 @@ static u8 pci_irqs[4] = {
 10, 10, 11, 11
 };
 
+static u32 pci_bar(u16 bdf, int region_num)
+{
+if (region_num != PCI_ROM_SLOT) {
+return PCI_BASE_ADDRESS_0 + region_num * 4;
+}
+return PCI_ROM_ADDRESS;
+}
+
 static void pci_set_io_region_addr(u16 bdf, int region_num, u32 addr)
 {
 u32 ofs, old_addr;
 
-if (region_num == PCI_ROM_SLOT) {
-ofs = PCI_ROM_ADDRESS;
-} else {
-ofs = PCI_BASE_ADDRESS_0 + region_num * 4;
-}
+ofs = pci_bar(bdf, region_num);
 
 old_addr = pci_config_readl(bdf, ofs);
 
@@ -46,11 +50,7 @@ static void pci_set_io_region_addr(u16 bdf, int region_num, 
u32 addr)
 static int pci_bios_allocate_region(u16 bdf, int region_num)
 {
 u32 *paddr;
-int ofs;
-if (region_num == PCI_ROM_SLOT)
-ofs = PCI_ROM_ADDRESS;
-else
-ofs = PCI_BASE_ADDRESS_0 + region_num * 4;
+u32 ofs = pci_bar(bdf, region_num);
 
 u32 old = pci_config_readl(bdf, ofs);
 u32 mask;
-- 
1.6.6.1

[Qemu-devel] [PATCH 8/8] seabios: pciinit: initialize pci bridge filtering registers.

2010-06-17 Thread Isaku Yamahata

initialize pci bridge filtering registers.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 src/pciinit.c |  117 +++-
 1 files changed, 114 insertions(+), 3 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index fe6848a..f68a690 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -14,6 +14,8 @@
 #define PCI_ROM_SLOT 6
 #define PCI_NUM_REGIONS 7
 
+static void pci_bios_init_device_in_bus(int bus);
+
 static u32 pci_bios_io_addr;
 static u32 pci_bios_mem_addr;
 static u32 pci_bios_prefmem_addr;
@@ -145,6 +147,106 @@ static void pci_bios_init_bridges(u16 bdf)
 }
 }
 
+#define PCI_IO_ALIGN            4096
+#define PCI_IO_SHIFT            8
+#define PCI_MEMORY_ALIGN        (1UL << 20)
+#define PCI_MEMORY_SHIFT        16
+#define PCI_PREF_MEMORY_ALIGN   (1UL << 20)
+#define PCI_PREF_MEMORY_SHIFT   16
+
+static void pci_bios_init_device_bridge(u16 bdf)
+{
+u32 io_old;
+u32 mem_old;
+u32 prefmem_old;
+
+u32 io_base;
+u32 io_end;
+u32 mem_base;
+u32 mem_end;
+u32 prefmem_base;
+u32 prefmem_end;
+
+pci_bios_allocate_region(bdf, 0);
+pci_bios_allocate_region(bdf, 1);
+pci_bios_allocate_region(bdf, PCI_ROM_SLOT);
+
+io_old = pci_bios_io_addr;
+mem_old = pci_bios_mem_addr;
+prefmem_old = pci_bios_prefmem_addr;
+
+/* IO BASE is assumed to be 16 bit */
+pci_bios_io_addr = ALIGN(pci_bios_io_addr, PCI_IO_ALIGN);
+pci_bios_mem_addr = ALIGN(pci_bios_mem_addr, PCI_MEMORY_ALIGN);
+pci_bios_prefmem_addr =
+ALIGN(pci_bios_prefmem_addr, PCI_PREF_MEMORY_ALIGN);
+
+io_base = pci_bios_io_addr;
+mem_base = pci_bios_mem_addr;
+prefmem_base = pci_bios_prefmem_addr;
+
+u8 secbus = pci_config_readb(bdf, PCI_SECONDARY_BUS);
+if (secbus > 0) {
+pci_bios_init_device_in_bus(secbus);
+}
+
+pci_bios_io_addr = ALIGN(pci_bios_io_addr, PCI_IO_ALIGN);
+pci_bios_mem_addr = ALIGN(pci_bios_mem_addr, PCI_MEMORY_ALIGN);
+pci_bios_prefmem_addr =
+ALIGN(pci_bios_prefmem_addr, PCI_PREF_MEMORY_ALIGN);
+
+io_end = pci_bios_io_addr;
+if (io_end == io_base) {
+pci_bios_io_addr = io_old;
+io_base = 0xffff;
+io_end = 1;
+}
+pci_config_writeb(bdf, PCI_IO_BASE, io_base >> PCI_IO_SHIFT);
+pci_config_writew(bdf, PCI_IO_BASE_UPPER16, 0);
+pci_config_writeb(bdf, PCI_IO_LIMIT, (io_end - 1) >> PCI_IO_SHIFT);
+pci_config_writew(bdf, PCI_IO_LIMIT_UPPER16, 0);
+
+mem_end = pci_bios_mem_addr;
+if (mem_end == mem_base) {
+pci_bios_mem_addr = mem_old;
+mem_base = 0xffffffff;
+mem_end = 1;
+}
+pci_config_writew(bdf, PCI_MEMORY_BASE, mem_base >> PCI_MEMORY_SHIFT);
+pci_config_writew(bdf, PCI_MEMORY_LIMIT, (mem_end -1) >> PCI_MEMORY_SHIFT);
+
+prefmem_end = pci_bios_prefmem_addr;
+if (prefmem_end == prefmem_base) {
+pci_bios_prefmem_addr = prefmem_old;
+prefmem_base = 0xffffffff;
+prefmem_end = 1;
+}
+pci_config_writew(bdf, PCI_PREF_MEMORY_BASE,
+  prefmem_base >> PCI_PREF_MEMORY_SHIFT);
+pci_config_writew(bdf, PCI_PREF_MEMORY_LIMIT,
+  (prefmem_end - 1) >> PCI_PREF_MEMORY_SHIFT);
+pci_config_writel(bdf, PCI_PREF_BASE_UPPER32, 0);
+pci_config_writel(bdf, PCI_PREF_LIMIT_UPPER32, 0);
+
+dprintf(1, "PCI: br io   = [0x%x, 0x%x)\n", io_base, io_end);
+dprintf(1, "PCI: br mem  = [0x%x, 0x%x)\n", mem_base, mem_end);
+dprintf(1, "PCI: br pref = [0x%x, 0x%x)\n", prefmem_base, prefmem_end);
+
+u16 cmd = pci_config_readw(bdf, PCI_COMMAND);
+cmd &= ~PCI_COMMAND_IO;
+if (io_end > io_base) {
+cmd |= PCI_COMMAND_IO;
+}
+cmd &= ~PCI_COMMAND_MEMORY;
+if (mem_end > mem_base || prefmem_end > prefmem_base) {
+cmd |= PCI_COMMAND_MEMORY;
+}
+cmd |= PCI_COMMAND_MASTER;
+pci_config_writew(bdf, PCI_COMMAND, cmd);
+
+pci_config_maskw(bdf, PCI_BRIDGE_CONTROL, 0, PCI_BRIDGE_CTL_SERR);
+}
+
 static void pci_bios_init_device(u16 bdf)
 {
 int class;
@@ -189,6 +291,9 @@ static void pci_bios_init_device(u16 bdf)
 pci_set_io_region_addr(bdf, 0, 0x8080);
 }
 break;
+case PCI_CLASS_BRIDGE_PCI:
+pci_bios_init_device_bridge(bdf);
+break;
 default:
 /* default memory mappings */
 pci_bios_allocate_regions(bdf);
@@ -220,6 +325,14 @@ static void pci_bios_init_device(u16 bdf)
 }
 }
 
+static void pci_bios_init_device_in_bus(int bus)
+{
+int devfn, bdf;
+foreachpci_in_bus(bus, devfn, bdf) {
+pci_bios_init_device(bdf);
+}
+}
+
 static void
 pci_bios_init_bus_rec(int bus, u8 *pci_bus)
 {
@@ -309,7 +422,5 @@ pci_setup(void)
 foreachpci(bdf, max) {
 pci_bios_init_bridges(bdf);
 }
-foreachpci(bdf, max) {
-pci_bios_init_device(bdf);
-}
+pci_bios_init_device_in_bus(0 /* host bus */);
 }
-- 
1.6.6.1
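
The window arithmetic in the patch above can be tried in isolation. The following is a sketch, not SeaBIOS code: `align_up`, `mem_window` and `struct window_regs` are hypothetical names, and only the 16-bit memory base/limit pair is modelled (1 MiB granularity, address bits 31:16 stored in the register).

```c
#include <stdint.h>

#define MEM_ALIGN (1u << 20)   /* bridge memory windows are 1 MiB aligned */
#define MEM_SHIFT 16           /* registers hold address bits 31:16 */

struct window_regs {
    uint16_t base;
    uint16_t limit;
};

/* Round addr up to the next multiple of align (align is a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Encode the half-open window [start, end) as base/limit register values.
 * An empty window is encoded with base > limit, which closes the window.
 */
static struct window_regs mem_window(uint32_t start, uint32_t end)
{
    struct window_regs r;
    if (end == start) {
        start = 0xffffffffu;
        end = 1;
    }
    r.base = (uint16_t)(start >> MEM_SHIFT);
    r.limit = (uint16_t)((end - 1) >> MEM_SHIFT);
    return r;
}
```

For a 2 MiB window starting at 0xf0000000 this gives base 0xf000 and limit 0xf01f; a bridge with nothing behind it gets base 0xffff and limit 0x0000.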

[Qemu-devel] [PATCH 6/8] seabios: pciinit: make bar offset calculation pci bridge aware.

2010-06-17 Thread Isaku Yamahata

This patch makes pci bar offset calculation pci bridge aware.
The offset of pci bridge rom is different from normal device.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 src/pciinit.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 6ba51f2..1c2c8c6 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -27,7 +27,11 @@ static u32 pci_bar(u16 bdf, int region_num)
 if (region_num != PCI_ROM_SLOT) {
 return PCI_BASE_ADDRESS_0 + region_num * 4;
 }
-return PCI_ROM_ADDRESS;
+
+#define PCI_HEADER_TYPE_MULTI_FUNCTION 0x80
+u8 type = pci_config_readb(bdf, PCI_HEADER_TYPE);
+type &= ~PCI_HEADER_TYPE_MULTI_FUNCTION;
+return type == PCI_HEADER_TYPE_BRIDGE ? PCI_ROM_ADDRESS1 : PCI_ROM_ADDRESS;
 }
 
 static void pci_set_io_region_addr(u16 bdf, int region_num, u32 addr)
-- 
1.6.6.1
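
As a quick illustration of the offset rule this patch implements, the sketch below uses the standard config space offsets (BAR0 at 0x10, expansion ROM at 0x30 for a normal header and 0x38 for a bridge header). The function name `bar_offset` is hypothetical; the header type is passed in instead of being read from config space.

```c
#include <stdint.h>

#define ROM_SLOT            6      /* region index used for the expansion ROM */
#define HEADER_TYPE_MASK    0x7f   /* bit 7 is the multi-function flag */
#define HEADER_TYPE_BRIDGE  0x01

/* Config space offset of the register backing the given region. */
static uint32_t bar_offset(uint8_t header_type, int region)
{
    if (region != ROM_SLOT)
        return 0x10 + (uint32_t)region * 4;

    /* Type 1 (bridge) headers keep the ROM BAR at 0x38, not 0x30. */
    return (header_type & HEADER_TYPE_MASK) == HEADER_TYPE_BRIDGE ? 0x38 : 0x30;
}
```

Masking off bit 7 first matters: a multi-function bridge reports header type 0x81 and must still be treated as a bridge.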

[Qemu-devel] [PATCH 3/8] seabios: pciinit: make pci memory space assignment 64bit aware.

2010-06-17 Thread Isaku Yamahata

Make PCI memory space assignment 64-bit aware.
If a 64-bit memory BAR is found while assigning PCI memory space,
clear its upper 32 bits and skip to the next BAR.

This patch is preparation for q35 chipset initialization, which
has 64-bit BARs.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 src/pciinit.c |   19 +--
 1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 488c77b..b635e44 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -37,7 +37,12 @@ static void pci_set_io_region_addr(u16 bdf, int region_num, 
u32 addr)
 dprintf(1, "region %d: 0x%08x\n", region_num, addr);
 }
 
-static void pci_bios_allocate_region(u16 bdf, int region_num)
+/*
+ * return value
+ *  0: 32bit BAR
+ *  non 0: 64bit BAR
+ */
+static int pci_bios_allocate_region(u16 bdf, int region_num)
 {
 u32 *paddr;
 int ofs;
@@ -71,13 +76,23 @@ static void pci_bios_allocate_region(u16 bdf, int 
region_num)
 pci_set_io_region_addr(bdf, region_num, *paddr);
 *paddr += size;
 }
+
+int is_64bit = !(val & PCI_BASE_ADDRESS_SPACE_IO) &&
+(val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64;
+if (is_64bit) {
+pci_config_writel(bdf, ofs + 4, 0);
+}
+return is_64bit;
 }
 
 static void pci_bios_allocate_regions(u16 bdf)
 {
 int i;
 for (i = 0; i < PCI_NUM_REGIONS; i++) {
-pci_bios_allocate_region(bdf, i);
+int is_64bit = pci_bios_allocate_region(bdf, i);
+if (is_64bit){
+i++;
+}
 }
 }
 
-- 
1.6.6.1
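
The BAR sizing and 64-bit detection used in the patch above can be sketched as two small pure functions. The names `bar_size` and `bar_is_64bit` are hypothetical; the bit layout (bit 0 = I/O space, bits 2:1 = memory type, type 2 = 64-bit) is the standard PCI BAR encoding. The argument is the value read back after writing all ones to the BAR.

```c
#include <stdint.h>

#define BAR_SPACE_IO       0x01u
#define BAR_MEM_TYPE_MASK  0x06u
#define BAR_MEM_TYPE_64    0x04u

/* Size of the region decoded by a BAR, from its all-ones probe value. */
static uint32_t bar_size(uint32_t probe)
{
    /* Low bits carry flags, not address: 2 bits for I/O, 4 for memory. */
    uint32_t mask = (probe & BAR_SPACE_IO) ? ~0x3u : ~0xfu;
    return ~(probe & mask) + 1;
}

/* Non-zero if the BAR is a 64-bit memory BAR occupying two slots. */
static int bar_is_64bit(uint32_t bar)
{
    return !(bar & BAR_SPACE_IO) &&
           (bar & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
}
```

A caller iterating over BAR slots would advance by one extra slot whenever `bar_is_64bit()` is true, since the following slot holds the upper 32 address bits.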

[Qemu-devel] [PATCH] vmware_vga: fix reset value for command register

2010-06-17 Thread Michael S. Tsirkin

Make init value for this register match the spec.
BAR address is 0 at init, so enabling it
only works by chance.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

This patch is untested. Could someone who has vmware
guests please look at it?
Thanks!

 hw/vmware_vga.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/hw/vmware_vga.c b/hw/vmware_vga.c
index bf2a699..41c959b 100644
--- a/hw/vmware_vga.c
+++ b/hw/vmware_vga.c
@@ -1240,9 +1240,6 @@ static int pci_vmsvga_initfn(PCIDevice *dev)
 
 pci_config_set_vendor_id(s->card.config, PCI_VENDOR_ID_VMWARE);
 pci_config_set_device_id(s->card.config, SVGA_PCI_DEVICE_ID);
-s->card.config[PCI_COMMAND]= PCI_COMMAND_IO |
-  PCI_COMMAND_MEMORY |
-  PCI_COMMAND_MASTER; /* I/O + Memory */
 pci_config_set_class(s->card.config, PCI_CLASS_DISPLAY_VGA);
 s->card.config[PCI_CACHE_LINE_SIZE]= 0x08; /* Cache line size */
 s->card.config[PCI_LATENCY_TIMER] = 0x40;  /* Latency timer */
-- 
1.7.1.12.g42b7f

[Qemu-devel] [PATCH] pcnet: address TODOs

2010-06-17 Thread Michael S. Tsirkin

pcnet enables memory/io on init, which
does not make sense as BAR values are wrong.
Fix this, disabling BARs according to PCI spec.
Address other minor TODOs.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

The following untested patch brings pcnet in
compliance with the spec.
Could someone who's interested in pcnet look
at this patch please?


 hw/pcnet.c |   17 ++---
 1 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/hw/pcnet.c b/hw/pcnet.c
index 5e63eb5..b52935a 100644
--- a/hw/pcnet.c
+++ b/hw/pcnet.c
@@ -1981,26 +1981,14 @@ static int pci_pcnet_init(PCIDevice *pci_dev)
 
 pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_AMD);
 pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_AMD_LANCE);
-/* TODO: value should be 0 at RST# */
-pci_set_word(pci_conf + PCI_COMMAND,
- PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER);
 pci_set_word(pci_conf + PCI_STATUS,
  PCI_STATUS_FAST_BACK | PCI_STATUS_DEVSEL_MEDIUM);
 pci_conf[PCI_REVISION_ID] = 0x10;
-/* TODO: 0 is the default anyway, no need to set it. */
-pci_conf[PCI_CLASS_PROG] = 0x00;
 pci_config_set_class(pci_conf, PCI_CLASS_NETWORK_ETHERNET);
-pci_conf[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
-
-/* TODO: not necessary, is set when BAR is registered. */
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_0, PCI_BASE_ADDRESS_SPACE_IO);
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_0 + 4,
- PCI_BASE_ADDRESS_SPACE_MEMORY);
 
 pci_set_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID, 0x0);
 pci_set_word(pci_conf + PCI_SUBSYSTEM_ID, 0x0);
 
-/* TODO: value must be 0 at RST# */
 pci_conf[PCI_INTERRUPT_PIN] = 1; // interrupt pin 0
 pci_conf[PCI_MIN_GNT] = 0x06;
 pci_conf[PCI_MAX_LAT] = 0xff;
@@ -2009,11 +1997,10 @@ static int pci_pcnet_init(PCIDevice *pci_dev)
 s->mmio_index =
   cpu_register_io_memory(pcnet_mmio_read, pcnet_mmio_write, &d->state);
 
-/* TODO: use pci_dev, avoid cast below. */
-pci_register_bar((PCIDevice *)d, 0, PCNET_IOPORT_SIZE,
+pci_register_bar(pci_dev, 0, PCNET_IOPORT_SIZE,
PCI_BASE_ADDRESS_SPACE_IO, pcnet_ioport_map);
 
-pci_register_bar((PCIDevice *)d, 1, PCNET_PNPMMIO_SIZE,
+pci_register_bar(pci_dev, 1, PCNET_PNPMMIO_SIZE,
PCI_BASE_ADDRESS_SPACE_MEMORY, pcnet_mmio_map);
 
 s->irq = pci_dev->irq[0];
-- 
1.7.1.12.g42b7f

[Qemu-devel] Re: [PATCH 0/8] seabios: pci: multi pci bus support

2010-06-17 Thread Michael S. Tsirkin

On Thu, Jun 17, 2010 at 08:03:08PM +0900, Isaku Yamahata wrote:
> This patch set allows seabios to initialize multi pci bus and 64bit BAR.
>
> Currently seabios is able to initialize only pci root bus.
> However multi pci bus support is wanted because
>   - more pci bus is wanted in qemu for many slots
>   - pci express support is coming in qemu which requires multi pci bus.
> those patches on Qemu part are under way, though.

Not that I object, but - does it really require multi bus? Why?

> Isaku Yamahata (8):
>   seabios: pci: introduce foreachpci_in_bus() helper macro.
>   seabios: pciinit: factor out pci bar region allocation logic.
>   seabios: pciinit: make pci memory space assignment 64bit aware.
>   seabios: pciinit: make pci bar assigner preferchable memory aware.
>   seabios: pciinit: factor out bar offset calculation.
>   seabios: pciinit: make bar offset calculation pci bridge aware.
>   seabios: pciinit: pci bridge bus initialization.
>   seabios: pciinit: initialize pci bridge filtering registers.
>
>  src/pci.c |   30 ++
>  src/pci.h |   11 ++
>  src/pciinit.c |  310 
>  3 files changed, 306 insertions(+), 45 deletions(-)

[Qemu-devel] Re: [PATCH 00/10] pci: pci to pci bridge clean up and enhancement

2010-06-17 Thread Michael S. Tsirkin

On Thu, Jun 17, 2010 at 01:02:43PM +0300, Michael S. Tsirkin wrote:
> For example, forcing all devices to call pci_reset_default
> in their reset routines does not look like a good cleanup:
> the less boilerplate, the better IMO.

One thing that we need to address, is devices
which need to enable memory+master on init.
They should probably also enable this on reset.

One approach that was discussed several times
would be to call cleanup and then init again.
I expect this would be enough to get rid of reset
callbacks in most devices.

-- 
MST

[Qemu-devel] [RFC][PATCH 0/2] block: Add flush after metadata writes

2010-06-17 Thread Kevin Wolf

This addresses the data integrity problems described at
http://wiki.qemu.org/Features/Qcow2DataIntegrity#Metadata_update_ordering.2C_Part_2

These problems are the same for all image formats (except raw, which doesn't
have any metadata), so I'm going to add more patches for the other formats for
the real patch submission.

Kevin Wolf (2):
  block: Add bdrv_(p)write_sync
  qcow2: Use bdrv_(p)write_sync for metadata writes

 block.c|   37 +
 block.h|4 
 block/qcow2-cluster.c  |   16 
 block/qcow2-refcount.c |   18 +-
 block/qcow2-snapshot.c |   14 +++---
 block/qcow2.c  |   10 +-
 6 files changed, 70 insertions(+), 29 deletions(-)

[Qemu-devel] [RFC][PATCH 1/2] block: Add bdrv_(p)write_sync

2010-06-17 Thread Kevin Wolf

Add new functions that write and flush the written data to disk immediately.
This is what needs to be used for image format metadata to maintain integrity
for cache=... modes that don't use O_DSYNC. (Actually, we only need barriers,
and therefore the functions are defined as such, but flushes are what is
implemented in this patch; we can try to change that later.)

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c |   37 +
 block.h |4 
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 0765fbc..7b64c2d 100644
--- a/block.c
+++ b/block.c
@@ -1010,6 +1010,43 @@ int bdrv_pwrite(BlockDriverState *bs, int64_t offset,
 return count1;
 }
 
+/*
+ * Writes to the file and ensures that no writes are reordered across this
+ * request (acts as a barrier)
+ *
+ * Returns 0 on success, -errno in error cases.
+ */
+int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
+const void *buf, int count)
+{
+int ret;
+
+ret = bdrv_pwrite(bs, offset, buf, count);
+if (ret < 0) {
+return ret;
+}
+
+/* No flush needed for cache=writethrough, it uses O_DSYNC */
+if ((bs->open_flags & BDRV_O_CACHE_MASK) != 0) {
+bdrv_flush(bs);
+}
+
+return 0;
+}
+
+/*
+ * Writes to the file and ensures that no writes are reordered across this
+ * request (acts as a barrier)
+ *
+ * Returns 0 on success, -errno in error cases.
+ */
+int bdrv_write_sync(BlockDriverState *bs, int64_t sector_num,
+const uint8_t *buf, int nb_sectors)
+{
+return bdrv_pwrite_sync(bs, BDRV_SECTOR_SIZE * sector_num,
+buf, BDRV_SECTOR_SIZE * nb_sectors);
+}
+
 /**
  * Truncate file to 'offset' bytes (needed only for file protocols)
  */
diff --git a/block.h b/block.h
index 9df9b38..6a157f4 100644
--- a/block.h
+++ b/block.h
@@ -80,6 +80,10 @@ int bdrv_pread(BlockDriverState *bs, int64_t offset,
void *buf, int count);
 int bdrv_pwrite(BlockDriverState *bs, int64_t offset,
 const void *buf, int count);
+int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
+const void *buf, int count);
+int bdrv_write_sync(BlockDriverState *bs, int64_t sector_num,
+const uint8_t *buf, int nb_sectors);
 int bdrv_truncate(BlockDriverState *bs, int64_t offset);
 int64_t bdrv_getlength(BlockDriverState *bs);
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
-- 
1.6.6.1
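
The write-then-flush pattern described in this patch can be illustrated with plain POSIX calls. This is a sketch under assumptions, not QEMU code: `pwrite_sync` is a hypothetical name, it works on a file descriptor rather than a BlockDriverState, and it always flushes instead of checking the cache mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write 'count' bytes at 'offset' and flush them to stable storage
 * before returning. Returns 0 on success, -errno on failure.
 */
static int pwrite_sync(int fd, const void *buf, size_t count, off_t offset)
{
    const char *p = buf;

    while (count > 0) {
        ssize_t n = pwrite(fd, p, count, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry */
            return -errno;
        }
        if (n == 0)
            return -EIO;            /* no progress, give up */
        p += n;
        count -= (size_t)n;
        offset += n;
    }

    /* The flush is what keeps later writes from overtaking this one. */
    if (fsync(fd) < 0)
        return -errno;

    return 0;
}
```

Like the functions in the patch, this returns 0 on success rather than a byte count, so callers should test for a negative result instead of comparing against the requested length.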

[Qemu-devel] [RFC][PATCH 2/2] qcow2: Use bdrv_(p)write_sync for metadata writes

2010-06-17 Thread Kevin Wolf

Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block/qcow2-cluster.c  |   16 
 block/qcow2-refcount.c |   18 +-
 block/qcow2-snapshot.c |   14 +++---
 block/qcow2.c  |   10 +-
 4 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 5760ad6..05cf6c2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -64,7 +64,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
 BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_WRITE_TABLE);
 for(i = 0; i < s->l1_size; i++)
 new_l1_table[i] = cpu_to_be64(new_l1_table[i]);
-ret = bdrv_pwrite(bs->file, new_l1_table_offset, new_l1_table, new_l1_size2);
+ret = bdrv_pwrite_sync(bs->file, new_l1_table_offset, new_l1_table, new_l1_size2);
 if (ret != new_l1_size2)
 goto fail;
 for(i = 0; i < s->l1_size; i++)
@@ -74,7 +74,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
 BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ACTIVATE_TABLE);
 cpu_to_be32w((uint32_t*)data, new_l1_size);
 cpu_to_be64w((uint64_t*)(data + 4), new_l1_table_offset);
-ret = bdrv_pwrite(bs->file, offsetof(QCowHeader, l1_size), data,sizeof(data));
+ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size), data,sizeof(data));
 if (ret != sizeof(data)) {
 goto fail;
 }
@@ -207,7 +207,7 @@ static int write_l1_entry(BlockDriverState *bs, int l1_index)
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_L1_UPDATE);
-ret = bdrv_pwrite(bs->file, s->l1_table_offset + 8 * l1_start_index,
+ret = bdrv_pwrite_sync(bs->file, s->l1_table_offset + 8 * l1_start_index,
 buf, sizeof(buf));
 if (ret < 0) {
 return ret;
@@ -263,7 +263,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
 }
 /* write the l2 table to the file */
 BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
-ret = bdrv_pwrite(bs->file, l2_offset, l2_table,
+ret = bdrv_pwrite_sync(bs->file, l2_offset, l2_table,
 s->l2_size * sizeof(uint64_t));
 if (ret < 0) {
 goto fail;
@@ -413,8 +413,8 @@ static int copy_sectors(BlockDriverState *bs, uint64_t start_sect,
 &s->aes_encrypt_key);
 }
 BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
-ret = bdrv_write(bs->file, (cluster_offset >> 9) + n_start,
- s->cluster_data, n);
+ret = bdrv_write_sync(bs->file, (cluster_offset >> 9) + n_start,
+s->cluster_data, n);
 if (ret < 0)
 return ret;
 return 0;
@@ -631,7 +631,7 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 
 BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
 l2_table[l2_index] = cpu_to_be64(cluster_offset);
-if (bdrv_pwrite(bs->file,
+if (bdrv_pwrite_sync(bs->file,
 l2_offset + l2_index * sizeof(uint64_t),
 l2_table + l2_index,
 sizeof(uint64_t)) != sizeof(uint64_t))
@@ -655,7 +655,7 @@ static int write_l2_entries(BlockDriverState *bs, uint64_t *l2_table,
 int ret;
 
 BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
-ret = bdrv_pwrite(bs->file, l2_offset + start_offset,
+ret = bdrv_pwrite_sync(bs->file, l2_offset + start_offset,
 &l2_table[l2_start_index], len);
 if (ret < 0) {
 return ret;
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 41e1da9..540bf49 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -44,7 +44,7 @@ static int write_refcount_block(BlockDriverState *bs)
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE);
-if (bdrv_pwrite(bs->file, s->refcount_block_cache_offset,
+if (bdrv_pwrite_sync(bs->file, s->refcount_block_cache_offset,
 s->refcount_block_cache, size) != size)
 {
 return -EIO;
@@ -269,7 +269,7 @@ static int64_t alloc_refcount_block(BlockDriverState *bs, int64_t cluster_index)
 
 /* Now the new refcount block needs to be written to disk */
 BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
-ret = bdrv_pwrite(bs->file, new_block, s->refcount_block_cache,
+ret = bdrv_pwrite_sync(bs->file, new_block, s->refcount_block_cache,
 s->cluster_size);
 if (ret < 0) {
 goto fail_block;
@@ -279,7 +279,7 @@ static int64_t alloc_refcount_block(BlockDriverState *bs, int64_t cluster_index)
 if (refcount_table_index < s->refcount_table_size) {
 uint64_t data64 = cpu_to_be64(new_block);
 BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_HOOKUP);
-ret = bdrv_pwrite(bs->file,
+ret = bdrv_pwrite_sync(bs->file,
 s->refcount_table_offset + refcount_table_index * sizeof(uint64_t),
 &data64, sizeof(data64));
 if (ret < 0) {
@@ -359,7 +359,7 @@ static int64_t alloc_refcount_block(BlockDriverState *bs, 
int64_t cluster_index)
 
 /* Write refcount blocks to disk */

Re: [Qemu-devel] RFC v3: blockdev_add friends, brief rationale, QMP docs

2010-06-17 Thread Markus Armbruster

Stefan Hajnoczi stefa...@gmail.com writes:

> On Wed, Jun 16, 2010 at 6:27 PM, Markus Armbruster arm...@redhat.com wrote:
>> blockdev_add
>>
>> Add host block device.
>>
>> Arguments:
>>
>> - "id": the host block device's ID, must be unique (json-string)
>> - "format": image format (json-string, optional)
>>     - Possible values: "raw", "qcow2", ...
>
> What is the default when unset?  (I expect we'll auto-detect the
> format but this should be documented.)

For command line and human monitor, we definitely want a sensible
default.  I sketched one in section "Command line syntax".  I'll quote
it for your convenience a few lines down.

>> - "protocol": image access protocol (json-string, optional)
>>     - Possible values: "auto", "file", "nbd", ...
>
> The semantics of "auto" are not documented here.

Uh, that slipped in here.  It means "guess protocol from image file
type".

Again, for command line and human monitor, we definitely want a sensible
default, and I sketched one in section "Command line syntax".

We may want the same defaults in QMP, although more for consistency than
for usability.  But I didn't want to complicate the QMP section with all
that defaults business, so I moved discussion of defaults down to the
command line section.  Hope I didn't cause even more confusion that way.

Anyway, here's what I wrote on default format:

   * The default format is derived from the image file name: if it ends
 with .F, where F is a format name, that format is the default, else
 raw.

To let users ask for this explicitly, we could have pseudo-format
"auto".

We also need a pseudo-format "probe", which guesses the format from the
image contents.  Can't be made the default, because it's insecure.

On protocol "auto":

   * The default protocol depends on the image file type: if it is a
 special file, it defaults to the protocol appropriate for that special
 file (host_cdrom for CD-ROM, ...).  Else it defaults to file.

   This permits shortening the first two examples:

   -blockdev id=blk1,file=fedora.img

   -blockdev id=blk2,blkdebug=test.blkdebug,file=test.qcow2

And for completeness, let me quote the unshortened examples, too:

   -blockdev id=blk1,format=raw,protocol=file,file=fedora.img

   -blockdev id=blk2,format=qcow2,blkdebug=test.blkdebug,\
   protocol=file,file=test.qcow2

>> Notes:
>>
>> (1) If argument "protocol" is missing, all other optional arguments must
>>     be missing as well.  This defines a block device with no media
>>     inserted.
>
> Perhaps this is what "auto" means?
>
>> (2) It's possible to list supported disk formats and protocols by
>>     running QEMU with arguments "-blockdev_add \?".
>
> Is there a query-block-driver command or something in QMP to
> enumerate supported formats and protocols?  Not sure how useful this
> would be to the management stack - blockdev_add will probably return
> an error if an attempt is made to open an unsupported file.

QMP should be self-documenting: a client should be able to list
commands, their arguments, and possible argument values.  Listing
supported formats then becomes "list possible values of command
blockdev_add's argument format".

>> blockdev_del
>>
>> Remove a host block device.
>>
>> Arguments:
>>
>> - "id": the host block device's ID (json-string)
>>
>> Example:
>>
>> -> { "execute": "blockdev_del", "arguments": { "id": "blk1" } }
>> <- { "return": {} }
>
> What about an attached guest device?  Will this fail if the virtio-blk
> PCI device is still present?  For SCSI I imagine we can usually just
> remove the host block device.  For IDE there isn't hotplug support
> AFAIK, what happens?

Command fails.  You have to device_del the device first.  Which is only
possible if its bus supports hot-plug.

Thanks!

[Qemu-devel] Re: [Bug 595117] Re: qemu-nbd slow and missing writeback cache option

2010-06-17 Thread Stephane Chazelas

2010-06-16 20:36:00 -, Dustin Kirkland:
[...]
 Could you please send that patch to the qemu-devel@ mailing list?
 Thanks!
[...]

Hi Dustin, it looks like qemu-devel is subscribed to bugs in
there, so the bug report is on the list already.

Note that I still consider it as a bug because:
  - slow performance for no good reason
  - --nocache option is misleading
  - no fsync on -d which to my mind is a bug.

Cheers,
Stephane

-- 
qemu-nbd slow and missing writeback cache option
https://bugs.launchpad.net/bugs/595117
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: Invalid
Status in “qemu-kvm” package in Ubuntu: Incomplete

Bug description:
Binary package hint: qemu-kvm

dpkg -l | grep qemu
ii  kvm              1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9  dummy transitional pacakge from kvm to qemu-
ii  qemu             0.12.3+noroms-0ubuntu9                      dummy transitional pacakge from qemu to qemu
ii  qemu-common      0.12.3+noroms-0ubuntu9                      qemu common functionality (bios, documentati
ii  qemu-kvm         0.12.3+noroms-0ubuntu9                      Full virtualization on i386 and amd64 hardwa
ii  qemu-kvm-extras  0.12.3+noroms-0ubuntu9                      fast processor emulator binaries for non-x86
ii  qemu-launcher    1.7.4-1ubuntu2                              GTK+ front-end to QEMU computer emulator
ii  qemuctl          0.2-2                                       controlling GUI for qemu

lucid amd64.

qemu-nbd is a lot slower when writing to disk than say nbd-server.

It appears it is because by default the disk image it serves is opened with
O_SYNC. The --nocache option, unintuitively, makes matters a bit better because
it causes the image to be opened with O_DIRECT instead of O_SYNC.

The qemu code allows an image to be opened without any of those flags, but
unfortunately qemu-nbd doesn't have the option to do that (qemu doesn't allow
the image to be opened with both O_SYNC and O_DIRECT though).

The default of qemu-img (of using O_SYNC) is not very sensible because anyway, 
the client (the kernel) uses caches (write-back), (and qemu-nbd -d doesn't 
flush those by the way). So if for instance qemu-nbd is killed, regardless of 
whether qemu-nbd uses O_SYNC, O_DIRECT or not, the data in the image will not 
be consistent anyway, unless syncs are done by the client (like fsync on the 
nbd device or sync mount option), and with qemu-nbd's O_SYNC mode, those 
syncs will be extremely slow.

Attached is a patch that adds a --cache={off,none,writethrough,writeback} 
option to qemu-nbd.

--cache=off is the same as --nocache (that is use O_DIRECT), writethrough is 
using O_SYNC and is still the default so this patch doesn't change the 
functionality. writeback is none of those flags, so is the addition of this 
patch. The patch also does an fsync upon qemu-nbd -d to make sure data is 
flushed to the image before removing the nbd.
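
The mapping from cache mode to open flags described above can be sketched in C as follows. This is an illustration, not the attached patch: the function name and the IMG_O_* constants are made up here and merely stand in for O_DIRECT and O_SYNC, and treating "none" as a synonym for "off" is an assumption.

```c
#include <string.h>

/* Illustrative flag bits standing in for the real open(2) flags. */
enum {
    IMG_O_DIRECT = 1 << 0,  /* stands in for O_DIRECT */
    IMG_O_SYNC   = 1 << 1   /* stands in for O_SYNC */
};

/*
 * Map a --cache= argument to the flags used to open the image.
 * Returns 0 and stores the flags in *flags on success,
 * -1 for an unknown mode (in which case *flags is left untouched).
 */
int cache_mode_to_flags(const char *mode, int *flags)
{
    if (!strcmp(mode, "off") || !strcmp(mode, "none")) {
        *flags = IMG_O_DIRECT;      /* same as --nocache */
    } else if (!strcmp(mode, "writethrough")) {
        *flags = IMG_O_SYNC;        /* the existing default */
    } else if (!strcmp(mode, "writeback")) {
        *flags = 0;                 /* neither flag: host page cache */
    } else {
        return -1;
    }
    return 0;
}
```

The three branches never combine the two flags, which matches the remark that qemu does not allow O_SYNC and O_DIRECT together.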

Consider this test scenario:

dd bs=1M count=100 of=a < /dev/null
qemu-nbd --cache=x -c /dev/nbd0 a
cp /dev/zero /dev/nbd0
time perl -MIO::Handle -e 'STDOUT->sync or die$!' 1<> /dev/nbd0

With cache=writethrough (the default), it takes over 10 minutes to write those
100MB worth of zeroes. Running strace, we see the recvfroms and sendtos delayed
by each of the 1kb write(2)s to disk (10 to 30 ms per write).

With cache=off, it takes about 30 seconds.

With cache=writeback, it takes about 3 seconds, which is similar to the
performance you get with nbd-server.

Note that the cp command runs instantly as the data is buffered by the client 
(the kernel), and not sent to qemu-nbd until the fsync(2) is called.

[Qemu-devel] [PATCH v2] monitor: Really show snapshot information about all devices

2010-06-17 Thread Miguel Di Ciurcio Filho

The 'info snapshots' monitor command does not show snapshot information from all
available block devices.

Usage example:
$ qemu -hda disk1.qcow2 -hdb disk2.qcow2

(qemu) info snapshots
Snapshot devices: ide0-hd0
Snapshot list (from ide0-hd0):
ID        TAG                 VM SIZE                DATE       VM CLOCK
1                                1.5M 2010-05-26 21:51:02   00:00:03.263
2                                1.5M 2010-05-26 21:51:09   00:00:08.844
3                                1.5M 2010-05-26 21:51:24   00:00:23.274
4                                1.5M 2010-05-26 21:53:17   00:00:03.595

In the above case, disk2.qcow2 has snapshot information, but it is not being
shown. Only the first device is ever shown.

This patch updates the do_info_snapshots() function to correctly show snapshot
information about all available block devices.

New output:
(qemu) info snapshots
Snapshot list from ide0-hd0 (VM state image):
ID        TAG                 VM SIZE                DATE       VM CLOCK
1                                1.5M 2010-05-26 21:51:02   00:00:03.263
2                                1.5M 2010-05-26 21:51:09   00:00:08.844
3                                1.5M 2010-05-26 21:51:24   00:00:23.274
4                                1.5M 2010-05-26 21:53:17   00:00:03.595

Snapshot list from ide0-hd1:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1                                   0 2010-05-26 21:51:02   00:00:03.263
2                                   0 2010-05-26 21:51:09   00:00:08.844
3                                   0 2010-05-26 21:51:24   00:00:23.274
4                                   0 2010-05-26 21:53:17   00:00:03.595

changelog
-
v1 -> v2
- Added support to identify the device elected to save the VM's state.

Signed-off-by: Miguel Di Ciurcio Filho miguel.fi...@gmail.com
---
 savevm.c |   57 -
 1 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/savevm.c b/savevm.c
index 20354a8..5bc5fcd 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1858,37 +1858,44 @@ void do_delvm(Monitor *mon, const QDict *qdict)
 
 void do_info_snapshots(Monitor *mon)
 {
-    BlockDriverState *bs, *bs1;
-    QEMUSnapshotInfo *sn_tab, *sn;
+    BlockDriverState *bs, *bs_vm_state;
+    QEMUSnapshotInfo *sn_tab;
     int nb_sns, i;
     char buf[256];
 
-    bs = get_bs_snapshots();
-    if (!bs) {
+    bs_vm_state = get_bs_snapshots();
+    if (!bs_vm_state) {
         monitor_printf(mon, "No available block device supports snapshots\n");
         return;
     }
-    monitor_printf(mon, "Snapshot devices:");
-    bs1 = NULL;
-    while ((bs1 = bdrv_next(bs1))) {
-        if (bdrv_can_snapshot(bs1)) {
-            if (bs == bs1)
-                monitor_printf(mon, " %s", bdrv_get_device_name(bs1));
-        }
-    }
-    monitor_printf(mon, "\n");
 
-    nb_sns = bdrv_snapshot_list(bs, &sn_tab);
-    if (nb_sns < 0) {
-        monitor_printf(mon, "bdrv_snapshot_list: error %d\n", nb_sns);
-        return;
-    }
-    monitor_printf(mon, "Snapshot list (from %s):\n",
-                   bdrv_get_device_name(bs));
-    monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf), NULL));
-    for(i = 0; i < nb_sns; i++) {
-        sn = &sn_tab[i];
-        monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf), sn));
+    bs = NULL;
+    while ((bs = bdrv_next(bs))) {
+        if (bdrv_can_snapshot(bs)) {
+            monitor_printf(mon, "Snapshot list from %s",
+                           bdrv_get_device_name(bs));
+
+            if (bs == bs_vm_state) {
+                monitor_printf(mon, " (VM state image):\n");
+            } else {
+                monitor_printf(mon, ":\n");
+            }
+
+            monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf), NULL));
+
+            nb_sns = bdrv_snapshot_list(bs, &sn_tab);
+            if (nb_sns < 0) {
+                monitor_printf(mon, "bdrv_snapshot_list: error %d\n", nb_sns);
+                continue;
+            }
+
+            for (i = 0; i < nb_sns; i++) {
+                monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf),
+                               &sn_tab[i]));
+            }
+
+            qemu_free(sn_tab);
+            monitor_printf(mon, "\n");
+        }
     }
-    qemu_free(sn_tab);
 }
-- 
1.7.1

Re: [Qemu-devel] Re: RFC v2: blockdev_add friends, brief rationale, QMP docs

2010-06-17 Thread Anthony Liguori


On 06/17/2010 03:20 AM, Kevin Wolf wrote:

Am 16.06.2010 20:07, schrieb Anthony Liguori:
   

   But it requires that
everything that -blockdev provides is accessible with -drive, too (or
that we're okay with users hating us).

   

I'm happy for -drive to die.  I think we should support -hda and
-blockdev.
 

-hda is not sufficient for most users. It doesn't provide any options.
It doesn't even support virtio. If -drive is going to die (and we seem
to agree all on that), then -blockdev needs to be usable for users (and
it's only you who contradicts so far).
   


I've always thought we should have a -vda argument and an -sda argument 
specifically for specifying virtio and scsi disks.



-blockdev should be optimized for config files, not single
argument input.  IOW:

[blockdev blk2]
   format = raw
   file = /path/to/base.img
   cache = writeback

[blockdev blk1]
format = qcow2
file = /path/to/leaf.img
cache=off
backing_dev = blk2

[device disk1]
driver = ide-drive
blockdev = blk1
bus = 0
unit = 0
 

You don't specify the backing file of an image on the command line (or
in the configuration file).


But we really ought to allow it.  Backing files are implemented as part 
of the core block layer, not the actual block formats.  Today the block 
layer queries the block format for the name of the backing file but gets 
no additional options from the block format.  File isn't necessarily 
enough information to successfully open the backing device so why treat 
it specially?


I think we should keep the current ability to query the block format for 
a backing file name but we should also support hooking up the backing 
device without querying the block format at all.  It makes the model 
much more elegant IMHO because then we're just creating block devices 
and hooking them up.  All block devices are created equal more or less.
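
To make the "create block devices and hook them up" model concrete, here is a rough sketch in C. The struct and both functions are hypothetical and much simpler than QEMU's BlockDriverState; they only show a backing device being attached explicitly rather than discovered by querying the format.

```c
#include <stddef.h>

/* Hypothetical, simplified block device node; not QEMU's BlockDriverState. */
struct blockdev {
    const char *id;
    const char *format;
    const char *file;
    struct blockdev *backing;   /* NULL when there is no backing device */
};

/* Attach 'backing' below 'dev'; refuse if that would create a cycle. */
int blockdev_set_backing(struct blockdev *dev, struct blockdev *backing)
{
    struct blockdev *p;

    for (p = backing; p; p = p->backing) {
        if (p == dev) {
            return -1;
        }
    }
    dev->backing = backing;
    return 0;
}

/* Number of devices in the chain starting at 'dev', including 'dev' itself. */
int blockdev_chain_depth(const struct blockdev *dev)
{
    int depth = 0;

    for (; dev; dev = dev->backing) {
        depth++;
    }
    return depth;
}
```

In terms of the config file above, blk1 (qcow2, leaf.img) would get blk2 (raw, base.img) attached via blockdev_set_backing(), giving a chain of depth 2.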



  It's saved as part of the image. It's more
like this (for a simple raw image file):

[blockdev-protocol proto1]
protocol = file
file = /path/to/image.img

[blockdev blk1]
format = raw
cache=off
protocol = proto1

[device disk1]
driver = ide-drive
blockdev = blk1
bus = 0
unit = 0

(This would be Markus' option 3, I think)
   


I don't understand why we need two layers of abstraction here.  Why not 
just:


[blockdev proto1]
  protocol = file
  cache = off
  file = /path/to/image.img

Why does the cache option belong with raw and not with file and why
can't we just use file directly?

As Christoph mentions, we really don't have stacked protocols and I'm
not sure they make sense.
 

Right, if we go for Christoph's suggestion, we don't need stacked
protocols. We'll have stacked formats instead. I'm not sure if you like
this any better. ;-)

We do have stacking today. -hda blkdebug:test.cfg:foo.qcow2 is qcow2 on
blkdebug on file. We need to be able to represent this.
   


I think we need to do stacking in a device specific way.  When you look 
at something like vmdk, it should actually support multiple leafs since 
the format does support such a thing.  So what I'd suggest is:


[blockdev part1]
  format = raw
  file = image000.vmdk

[blockdev part2]
  format = raw
  file = image001.vmdk

[blockdev image]
  format = vmdk
  section0 = part1
  section1 = part2

Note, we'll need to support this sort of model in order to support a 
disk that creates an automatic partition table (which would be a pretty 
useful feature).  For blkdebug, it would look like:


[blockdev disk]
  format = qcow2
  file = foo.qcow2

[blockdev debug]
  format = blkdebug
  blockdev = disk


I think raw doesn't make very much sense then.  What's the point of it
if it's just a thin wrapper around a protocol?
 

That it can be wrapped around any protocol. It's just about separating
code for handling the content of an image and code for accessing the image.

Ever tried something like qemu-img create -f raw /dev/something 10G?
You need the host_device protocol there, not the file protocol. When we
had raw == file this completely failed. And it's definitely reasonable
to expect that it works because the image format _is_ raw, it's just not
saved in a file.
   


No, I don't actually think it's reasonable.  There's nothing meaningful
that command can do.  Also, I've never understood creating qcow2 images
on a physical device.  qcow2 needs to grow dynamically and physical
devices can't.


I understand that we need to support the latter use case but I don't 
think creating this layer of user-visible abstraction is the right thing 
to do.  This is an obscure use case and it shouldn't be the model that 
we force upon our users.



Or the famous qcow2 images on block devices. Why did qemu guess the
format correctly when qcow2 was saved in a file, but not on a host
device? This was just inconsistent.

I've had more than one bug report about things like this which are
magically fixed when you do the layering right.
   


Beyond qcow2

[Qemu-devel] Re: [PATCH] pcnet: address TODOs

2010-06-17 Thread Jan Kiszka

Michael S. Tsirkin wrote:
 pcnet enables memory/io on init, which
 does not make sense as BAR values are wrong.
 Fix this, disabling BARs according to PCI spec.
 Address other minor TODOs.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 The following untested patch brings pcnet in
 compliance with the spec.
 Could someone who's interested in pcnet look
 at this patch please?
 

At least our special guest still works with your patch applied.

Tested-by: Jan Kiszka jan.kis...@siemens.com

Jan

 
  hw/pcnet.c |   17 ++---
  1 files changed, 2 insertions(+), 15 deletions(-)
 
 diff --git a/hw/pcnet.c b/hw/pcnet.c
 index 5e63eb5..b52935a 100644
 --- a/hw/pcnet.c
 +++ b/hw/pcnet.c
 @@ -1981,26 +1981,14 @@ static int pci_pcnet_init(PCIDevice *pci_dev)
  
  pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_AMD);
  pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_AMD_LANCE);
 -/* TODO: value should be 0 at RST# */
 -pci_set_word(pci_conf + PCI_COMMAND,
 - PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER);
  pci_set_word(pci_conf + PCI_STATUS,
   PCI_STATUS_FAST_BACK | PCI_STATUS_DEVSEL_MEDIUM);
  pci_conf[PCI_REVISION_ID] = 0x10;
 -/* TODO: 0 is the default anyway, no need to set it. */
 -pci_conf[PCI_CLASS_PROG] = 0x00;
  pci_config_set_class(pci_conf, PCI_CLASS_NETWORK_ETHERNET);
 -pci_conf[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
 -
 -/* TODO: not necessary, is set when BAR is registered. */
 -pci_set_long(pci_conf + PCI_BASE_ADDRESS_0, PCI_BASE_ADDRESS_SPACE_IO);
 -pci_set_long(pci_conf + PCI_BASE_ADDRESS_0 + 4,
 - PCI_BASE_ADDRESS_SPACE_MEMORY);
  
  pci_set_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID, 0x0);
  pci_set_word(pci_conf + PCI_SUBSYSTEM_ID, 0x0);
  
 -/* TODO: value must be 0 at RST# */
  pci_conf[PCI_INTERRUPT_PIN] = 1; // interrupt pin 0
  pci_conf[PCI_MIN_GNT] = 0x06;
  pci_conf[PCI_MAX_LAT] = 0xff;
 @@ -2009,11 +1997,10 @@ static int pci_pcnet_init(PCIDevice *pci_dev)
  s-mmio_index =
cpu_register_io_memory(pcnet_mmio_read, pcnet_mmio_write, d-state);
  
 -/* TODO: use pci_dev, avoid cast below. */
 -pci_register_bar((PCIDevice *)d, 0, PCNET_IOPORT_SIZE,
 +pci_register_bar(pci_dev, 0, PCNET_IOPORT_SIZE,
 PCI_BASE_ADDRESS_SPACE_IO, pcnet_ioport_map);
  
 -pci_register_bar((PCIDevice *)d, 1, PCNET_PNPMMIO_SIZE,
 +pci_register_bar(pci_dev, 1, PCNET_PNPMMIO_SIZE,
 PCI_BASE_ADDRESS_SPACE_MEMORY, pcnet_mmio_map);
  
  s-irq = pci_dev-irq[0];

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] Re: [PATCH 3/4] qemu: kvm: Enable XSAVE live migration support

2010-06-17 Thread Jan Kiszka

Sheng Yang wrote:
 Signed-off-by: Sheng Yang sh...@linux.intel.com
 ---
  kvm-all.c |   21 +++
  kvm.h |2 +
  target-i386/cpu.h |7 ++-
  target-i386/kvm.c |  139 
 -
  target-i386/machine.c |   20 +++
  5 files changed, 186 insertions(+), 3 deletions(-)
 
 diff --git a/kvm-all.c b/kvm-all.c
 index 43704b8..343c06e 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -71,6 +71,7 @@ struct KVMState
  #endif
  int irqchip_in_kernel;
  int pit_in_kernel;
 +int xsave, xcrs;
  };
  
  static KVMState *kvm_state;
 @@ -685,6 +686,16 @@ int kvm_init(int smp_cpus)
  s-debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
  #endif
  
 +s-xsave = 0;
 +#ifdef KVM_CAP_XSAVE
 +s-xsave = kvm_check_extension(s, KVM_CAP_XSAVE);
 +#endif
 +
 +s-xcrs = 0;
 +#ifdef KVM_CAP_XCRS
 +s-xcrs = kvm_check_extension(s, KVM_CAP_XCRS);
 +#endif
 +
  ret = kvm_arch_init(s, smp_cpus);
  if (ret  0)
  goto err;
 @@ -1013,6 +1024,16 @@ int kvm_has_debugregs(void)
  return kvm_state-debugregs;
  }
  
 +int kvm_has_xsave(void)
 +{
 +return kvm_state-xsave;
 +}
 +
 +int kvm_has_xcrs(void)
 +{
 +return kvm_state-xcrs;
 +}
 +
  void kvm_setup_guest_memory(void *start, size_t size)
  {
  if (!kvm_has_sync_mmu()) {
 diff --git a/kvm.h b/kvm.h
 index 7975e87..50c4192 100644
 --- a/kvm.h
 +++ b/kvm.h
 @@ -41,6 +41,8 @@ int kvm_has_sync_mmu(void);
  int kvm_has_vcpu_events(void);
  int kvm_has_robust_singlestep(void);
  int kvm_has_debugregs(void);
 +int kvm_has_xsave(void);
 +int kvm_has_xcrs(void);
  
  #ifdef NEED_CPU_H
  int kvm_init_vcpu(CPUState *env);
 diff --git a/target-i386/cpu.h b/target-i386/cpu.h
 index 548ab80..680eed1 100644
 --- a/target-i386/cpu.h
 +++ b/target-i386/cpu.h
 @@ -718,6 +718,11 @@ typedef struct CPUX86State {
  uint16_t fpus_vmstate;
  uint16_t fptag_vmstate;
  uint16_t fpregs_format_vmstate;
 +
 +uint64_t xstate_bv;
 +XMMReg ymmh_regs[CPU_NB_REGS];
 +
 +uint64_t xcr0;
  } CPUX86State;
  
  CPUX86State *cpu_x86_init(const char *cpu_model);
 @@ -895,7 +900,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
  #define cpu_list_id x86_cpu_list
  #define cpudef_setup x86_cpudef_setup
  
 -#define CPU_SAVE_VERSION 11
 +#define CPU_SAVE_VERSION 12
  
  /* MMU modes definitions */
  #define MMU_MODE0_SUFFIX _kernel
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index bb6a12f..db1f21d 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -289,6 +289,8 @@ void kvm_arch_reset_vcpu(CPUState *env)
  } else {
  env-mp_state = KVM_MP_STATE_RUNNABLE;
  }
 +/* Legal xcr0 for loading */
 +env-xcr0 = 1;
  }
  
  static int kvm_has_msr_star(CPUState *env)
 @@ -504,6 +506,68 @@ static int kvm_put_fpu(CPUState *env)
  return kvm_vcpu_ioctl(env, KVM_SET_FPU, fpu);
  }
  
 +#ifdef KVM_CAP_XSAVE
 +#define XSAVE_CWD_RIP 2
 +#define XSAVE_CWD_RDP 4
 +#define XSAVE_MXCSR   6
 +#define XSAVE_ST_SPACE8
 +#define XSAVE_XMM_SPACE   40
 +#define XSAVE_XSTATE_BV   128
 +#define XSAVE_YMMH_SPACE  144
 +#endif
 +
 +static int kvm_put_xsave(CPUState *env)
 +{
 +#ifdef KVM_CAP_XSAVE
 +int i;
 +struct kvm_xsave* xsave;
 +uint16_t cwd, swd, twd, fop;
 +
 +if (!kvm_has_xsave())
 +return kvm_put_fpu(env);
 +
 +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
 +memset(xsave, 0, sizeof(struct kvm_xsave));
 +cwd = swd = twd = fop = 0;
 +swd = env-fpus  ~(7  11);
 +swd |= (env-fpstt  7)  11;
 +cwd = env-fpuc;
 +for (i = 0; i  8; ++i)
 +twd |= (!env-fptags[i])  i;
 +xsave-region[0] = (uint32_t)(swd  16) + cwd;
 +xsave-region[1] = (uint32_t)(fop  16) + twd;
 +memcpy(xsave-region[XSAVE_ST_SPACE], env-fpregs,
 +sizeof env-fpregs);
 +memcpy(xsave-region[XSAVE_XMM_SPACE], env-xmm_regs,
 +sizeof env-xmm_regs);
 +xsave-region[XSAVE_MXCSR] = env-mxcsr;
 +*(uint64_t *)xsave-region[XSAVE_XSTATE_BV] = env-xstate_bv;
 +memcpy(xsave-region[XSAVE_YMMH_SPACE], env-ymmh_regs,
 +sizeof env-ymmh_regs);
 +return kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
 +#else
 +return kvm_put_fpu(env);
 +#endif
 +}
 +
 +static int kvm_put_xcrs(CPUState *env)
 +{
 +#ifdef KVM_CAP_XCRS
 +struct kvm_xcrs xcrs;
 +
 +if (!kvm_has_xcrs())
 +return 0;
 +
 +xcrs.nr_xcrs = 1;
 +xcrs.flags = 0;
 +xcrs.xcrs[0].xcr = 0;
 +xcrs.xcrs[0].value = env-xcr0;
 +return kvm_vcpu_ioctl(env, KVM_SET_XCRS, xcrs);
 +#else
 +return 0;
 +#endif
 +}
 +
  static int kvm_put_sregs(CPUState *env)
  {
  struct kvm_sregs sregs;
 @@ -621,6 +685,69 @@ static int kvm_get_fpu(CPUState *env)
  return 0;
  }
  
 +static int kvm_get_xsave(CPUState *env)
 +{
 +#ifdef KVM_CAP_XSAVE
 +struct kvm_xsave* xsave;
 +int ret, i;
 +uint16_t cwd, swd, twd, fop;
 +
 +if (!kvm_has_xsave())
 +return kvm_get_fpu(env);

[Qemu-devel] Re: [PATCH] qemu-kvm: Replace kvm_set/get_fpu() with upstream version.

2010-06-17 Thread Jan Kiszka

Sheng Yang wrote:
 Signed-off-by: Sheng Yang sh...@linux.intel.com
 ---
 
 Would send out XSAVE patch after the upstream ones have been merged, since the
 patch would be affected by the merge.
 
  qemu-kvm-x86.c|   23 ++-
  qemu-kvm.c|   10 --
  qemu-kvm.h|   30 --
  target-i386/kvm.c |5 -
  4 files changed, 6 insertions(+), 62 deletions(-)
 
 diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
 index 3c33e64..49218ae 100644
 --- a/qemu-kvm-x86.c
 +++ b/qemu-kvm-x86.c
 @@ -775,7 +775,6 @@ static void get_seg(SegmentCache *lhs, const struct 
 kvm_segment *rhs)
  void kvm_arch_load_regs(CPUState *env, int level)
  {
  struct kvm_regs regs;
 -struct kvm_fpu fpu;
  struct kvm_sregs sregs;
  struct kvm_msr_entry msrs[100];
  int rc, n, i;
 @@ -806,16 +805,7 @@ void kvm_arch_load_regs(CPUState *env, int level)
  
  kvm_set_regs(env, regs);
  
 -memset(fpu, 0, sizeof fpu);
 -fpu.fsw = env-fpus  ~(7  11);
 -fpu.fsw |= (env-fpstt  7)  11;
 -fpu.fcw = env-fpuc;
 -for (i = 0; i  8; ++i)
 - fpu.ftwx |= (!env-fptags[i])  i;
 -memcpy(fpu.fpr, env-fpregs, sizeof env-fpregs);
 -memcpy(fpu.xmm, env-xmm_regs, sizeof env-xmm_regs);
 -fpu.mxcsr = env-mxcsr;
 -kvm_set_fpu(env, fpu);
 +kvm_put_fpu(env);
  
  memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
  if (env-interrupt_injected = 0) {
 @@ -933,7 +923,6 @@ void kvm_arch_load_regs(CPUState *env, int level)
  void kvm_arch_save_regs(CPUState *env)
  {
  struct kvm_regs regs;
 -struct kvm_fpu fpu;
  struct kvm_sregs sregs;
  struct kvm_msr_entry msrs[100];
  uint32_t hflags;
 @@ -965,15 +954,7 @@ void kvm_arch_save_regs(CPUState *env)
  env-eflags = regs.rflags;
  env-eip = regs.rip;
  
 -kvm_get_fpu(env, fpu);
 -env-fpstt = (fpu.fsw  11)  7;
 -env-fpus = fpu.fsw;
 -env-fpuc = fpu.fcw;
 -for (i = 0; i  8; ++i)
 - env-fptags[i] = !((fpu.ftwx  i)  1);
 -memcpy(env-fpregs, fpu.fpr, sizeof env-fpregs);
 -memcpy(env-xmm_regs, fpu.xmm, sizeof env-xmm_regs);
 -env-mxcsr = fpu.mxcsr;
 +kvm_get_fpu(env);
  
  kvm_get_sregs(env, sregs);
  
 diff --git a/qemu-kvm.c b/qemu-kvm.c
 index 96d458c..114cb5e 100644
 --- a/qemu-kvm.c
 +++ b/qemu-kvm.c
 @@ -461,16 +461,6 @@ int kvm_set_regs(CPUState *env, struct kvm_regs *regs)
  return kvm_vcpu_ioctl(env, KVM_SET_REGS, regs);
  }
  
 -int kvm_get_fpu(CPUState *env, struct kvm_fpu *fpu)
 -{
 -return kvm_vcpu_ioctl(env, KVM_GET_FPU, fpu);
 -}
 -
 -int kvm_set_fpu(CPUState *env, struct kvm_fpu *fpu)
 -{
 -return kvm_vcpu_ioctl(env, KVM_SET_FPU, fpu);
 -}
 -
  int kvm_get_sregs(CPUState *env, struct kvm_sregs *sregs)
  {
  return kvm_vcpu_ioctl(env, KVM_GET_SREGS, sregs);
 diff --git a/qemu-kvm.h b/qemu-kvm.h
 index 6f6c6d8..ebe7893 100644
 --- a/qemu-kvm.h
 +++ b/qemu-kvm.h
 @@ -222,36 +222,6 @@ int kvm_get_regs(CPUState *env, struct kvm_regs *regs);
   * \return 0 on success
   */
  int kvm_set_regs(CPUState *env, struct kvm_regs *regs);
 -/*!
 - * \brief Read VCPU fpu registers
 - *
 - * This gets the FPU registers from the VCPU and outputs them
 - * into a kvm_fpu structure
 - *
 - * \note This function returns a \b copy of the VCPUs registers.\n
 - * If you wish to modify the VCPU FPU registers, you should call 
 kvm_set_fpu()
 - *
 - * \param kvm Pointer to the current kvm_context
 - * \param vcpu Which virtual CPU should get dumped
 - * \param fpu Pointer to a kvm_fpu which will be populated with the VCPUs
 - * fpu registers values
 - * \return 0 on success
 - */
 -int kvm_get_fpu(CPUState *env, struct kvm_fpu *fpu);
 -
 -/*!
 - * \brief Write VCPU fpu registers
 - *
 - * This sets the FPU registers on the VCPU from a kvm_fpu structure
 - *
 - * \note When this function returns, the fpu pointer and the data it points 
 to
 - * can be discarded
 - * \param kvm Pointer to the current kvm_context
 - * \param vcpu Which virtual CPU should get dumped
 - * \param fpu Pointer to a kvm_fpu which holds the new vcpu fpu state
 - * \return 0 on success
 - */
 -int kvm_set_fpu(CPUState *env, struct kvm_fpu *fpu);
  
  /*!
   * \brief Read VCPU system registers
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 9cb9cf4..9c13f62 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -488,6 +488,7 @@ static int kvm_getput_regs(CPUState *env, int set)
  
  return ret;
  }
 +#endif /* KVM_UPSTREAM */
  
  static int kvm_put_fpu(CPUState *env)
  {
 @@ -507,6 +508,7 @@ static int kvm_put_fpu(CPUState *env)
  return kvm_vcpu_ioctl(env, KVM_SET_FPU, fpu);
  }
  
 +#ifdef KVM_UPSTREAM
  static int kvm_put_sregs(CPUState *env)
  {
  struct kvm_sregs sregs;
 @@ -605,7 +607,7 @@ static int kvm_put_msrs(CPUState *env, int level)
  return kvm_vcpu_ioctl(env, KVM_SET_MSRS, msr_data);
  
  }
 -
 +#endif /* KVM_UPSTREAM */
  
  static int kvm_get_fpu(CPUState *env)
  {
 @@ -628,6

Re: [Qemu-devel] Q35 qemu repository?

2010-06-17 Thread Matthew Garrett

On Thu, Jun 17, 2010 at 04:48:09PM +0900, Isaku Yamahata wrote:
 Thanks for the patch.
 Does vista boot with the patch eventually?

Vista boots, but is unable to allocate resources for the pcie root 
ports. I'm looking into that.

-- 
Matthew Garrett | mj...@srcf.ucam.org

Re: [Qemu-devel] Re: RFC v2: blockdev_add friends, brief rationale, QMP docs

2010-06-17 Thread Kevin Wolf

Am 17.06.2010 15:01, schrieb Anthony Liguori:
 On 06/17/2010 03:20 AM, Kevin Wolf wrote:
 Am 16.06.2010 20:07, schrieb Anthony Liguori:

But it requires that
 everything that -blockdev provides is accessible with -drive, too (or
 that we're okay with users hating us).


 I'm happy for -drive to die.  I think we should support -hda and
 -blockdev.
  
 -hda is not sufficient for most users. It doesn't provide any options.
 It doesn't even support virtio. If -drive is going to die (and we seem
 to agree all on that), then -blockdev needs to be usable for users (and
 it's only you who contradicts so far).

 
 I've always thought we should have a -vda argument and an -sda argument 
 specifically for specifying virtio and scsi disks.

It would at least fix the most obvious problem. However, it still
doesn't allow passing options.

 -blockdev should be optimized for config files, not single
 argument input.  IOW:

 [blockdev blk2]
format = raw
file = /path/to/base.img
cache = writeback

 [blockdev blk1]
 format = qcow2
 file = /path/to/leaf.img
 cache=off
 backing_dev = blk2

 [device disk1]
 driver = ide-drive
 blockdev = blk1
 bus = 0
 unit = 0
  
 You don't specify the backing file of an image on the command line (or
 in the configuration file).
 
 But we really ought to allow it.  Backing files are implemented as part 
 of the core block layer, not the actual block formats.  

The generic block layer knows the name of the backing file, so it can be
displayed in tools, but that's about it. Calling this the
implementation of backing files is daring.

I see no use case for specifying it on the command line. The only thing
you can achieve better with it is corrupting your image because you
specify the wrong/no backing file next time.

 Today the block 
 layer queries the block format for the name of the backing file but gets 
 no additional options from the block format.  File isn't necessarily 
 enough information to successfully open the backing device so why treat 
 it specially?
 
 I think we should keep the current ability to query the block format for 
 a backing file name but we should also support hooking up the backing 
 device without querying the block format at all.  It makes the model 
 much more elegant IMHO because then we're just creating block devices 
 and hooking them up.  All block devices are created equal more or less.
 
   It's saved as part of the image. It's more
 like this (for a simple raw image file):

 [blockdev-protocol proto1]
 protocol = file
 file = /path/to/image.img

 [blockdev blk1]
 format = raw
 cache=off
 protocol = proto1

 [device disk1]
 driver = ide-drive
 blockdev = blk1
 bus = 0
 unit = 0

 (This would be Markus' option 3, I think)

 
 I don't understand why we need two layers of abstraction here.  Why not 
 just:
 
 [blockdev proto1]
protocol = file
cache = off
file = /path/to/image.img
 
 Why does the cache option belong with raw and not with file and why 
 can't we just use file directly?

The cache option is shared along the chain, so it probably fits best in
the blockdev.

And we don't use file directly because it's wrong. Users say that their
image is in raw format, and they don't get why they should have to make
a difference between a raw image stored on a block device and one stored
in a file.

 As Christoph mentions, we really don't 
 have stacked protocols and I'm

The only question is if we call them stacked formats or stacked
protocols. One of them exists.

 not sure they make sense.
  
 Right, if we go for Christoph's suggestion, we don't need stacked
 protocols. We'll have stacked formats instead. I'm not sure if you like
 this any better. ;-)

 We do have stacking today. -hda blkdebug:test.cfg:foo.qcow2 is qcow2 on
 blkdebug on file. We need to be able to represent this.

 
 I think we need to do stacking in a device specific way.  When you look 
 at something like vmdk, it should actually support multiple leafs since 
 the format does support such a thing.  So what I'd suggest is:
 
 [blockdev part1]
format = raw
file = image000.vmdk
 
 [blockdev part2]
format = raw
file = image001.vmdk
 
 [blockdev image]
format = vmdk
section0 = part1
section1 = part2

Actually, I'd prefer to read that information from the VMDK file instead
of requiring the user to configure this manually...

 Note, we'll need to support this sort of model in order to support a 
 disk that creates an automatic partition table (which would be a pretty 
 useful feature). 

Sounds like a good example of a useful protocol.

Markus, I'm afraid we've found an equivalent to Avi's mirror. If not
even more complicated, because we'd need to accept any length for the
list of partitions - possibly an option that should take an array?

 For blkdebug, it would look like:
 
 [blockdev disk]
format = qcow2
file = foo.qcow2

[Qemu-devel] Re: [RFC][PATCH 2/2] qcow2: Use bdrv_(p)write_sync for metadata writes

2010-06-17 Thread Stefan Hajnoczi

On Thu, Jun 17, 2010 at 1:03 PM, Kevin Wolf kw...@redhat.com wrote:
 Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Any performance numbers?  This change is necessary for correctness but
I wonder what the performance impact is for users.
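
For readers wondering where the cost comes from: a synchronous metadata write is a write followed by a flush to stable storage. Below is a minimal POSIX sketch of that pattern, assuming Linux/glibc. The helper name pwrite_sync() is made up for this illustration and is not QEMU's bdrv_pwrite_sync() implementation.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes at offset off, then flush them to stable storage
 * before returning. Returns 0 on success, -errno on failure.
 */
int pwrite_sync(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted before writing; retry */
            }
            return -errno;
        }
        if (n == 0) {
            return -EIO;        /* no progress; avoid looping forever */
        }
        p += n;
        off += n;
        len -= (size_t)n;
    }
    /* The flush is what makes each metadata update expensive. */
    if (fdatasync(fd) < 0) {
        return -errno;
    }
    return 0;
}
```

Every call pays for one flush, so the impact depends on how many metadata updates a workload triggers.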

Stefan

[Qemu-devel] Re: [PATCH v3 0/5] Add QMP migration events

2010-06-17 Thread Luiz Capitulino

On Wed, 16 Jun 2010 21:10:04 +0200
Juan Quintela quint...@redhat.com wrote:

 Luiz Capitulino lcapitul...@redhat.com wrote:
  On Tue, 15 Jun 2010 17:24:59 +0200
  Juan Quintela quint...@redhat.com wrote:
 
  
I still don't see the need for MIGRATION_STARTED, it could be useful in
   the target but I'd like to understand the use case in more detail.
  
  At this point, if you are doing migration with tcp, and you are putting
  the wrong port on source (no path or any other error), you get no info
  at all of what is happening.
 
   Shouldn't the migrate command just return the expected error?
 
 No.  Suppose you are having trouble.  You try to find out what happens
 and launch things by hand.  And there is no way to know if anybody has
 connected to the destination machine.  Some notification that migration
 has started is _very_ useful, especially when there are
 networks/firewalls/... in the middle.

 [...]

 That is it.  But you continue telling that going to the old house and
 doing a info migrate is a good interface.

 I'm sorry? When did I ever claim such a thing?

 First point: all you describe is MIGRATION_CONNECTED, at the end of the day
it would do exactly what you want for MIGRATION_STARTED.

 The second, and most important point, is that we're trying not to make
things worse. Adding a number of events to circumvent a badly designed
command and having the wrong expectations (ie. help developer debugging)
is a clear recipe for disaster.

 Anyway, I think it doesn't matter anymore, as QMP is not going to be declared
stable for 0.13. In this case we'll have enough time to design the proper
interface.

 To add insult to injury, the problem is that libvirt people are not
 collaborative, and expect things that can't be done, are uncooperative,

 Again, I've never claimed that and I think you're taking this thread in
the wrong direction.

 
 
 Libvirt folks also do lots of things wrong, they are not perfect.  But
 in this case, it is qemu land that is being completely unreasonable.
 
 Later, Juan.

[Qemu-devel] Re: [PATCH 5/5] linux fbdev display driver.

2010-06-17 Thread Julian Pidancet

On 06/17/2010 11:43 AM, Gerd Hoffmann wrote:
Hi,
 
 You register the display allocator, but don't unregister in 
 fbdev_display_uninit().
 
 You are just lucky that fbdev_cleanup() forgets to unmap the framebuffer.
 
 Apply the attached fix, start qemu with vnc, then do change fbdev on 
 and change fbdev off in the monitor and watch qemu segfault.
 
 Also after change fbdev on the guest screen isn't rendered correctly.
 
 cheers,
Gerd
 

Hi,

Thanks for spotting these errors. Here is a respin of my patch to address your
concerns.
(The munmap call is included).

Cheers,

Julian

diff --git a/console.c b/console.c
index 698bc10..12ce215 100644
--- a/console.c
+++ b/console.c
@@ -1376,6 +1376,16 @@ DisplayAllocator *register_displayallocator(DisplayState *ds, DisplayAllocator *
     return ds->allocator;
 }
 
+void unregister_displayallocator(DisplayState *ds)
+{
+    if (ds->allocator != &default_allocator) {
+        ds->allocator->free_displaysurface(ds->surface);
+        ds->surface = defaultallocator_create_displaysurface(ds_get_width(ds),
+                                                             ds_get_height(ds));
+        ds->allocator = &default_allocator;
+    }
+}
+
 DisplayState *graphic_console_init(vga_hw_update_ptr update,
vga_hw_invalidate_ptr invalidate,
vga_hw_screen_dump_ptr screen_dump,
diff --git a/console.h b/console.h
index 124a22b..40bd927 100644
--- a/console.h
+++ b/console.h
@@ -192,6 +192,7 @@ PixelFormat qemu_different_endianness_pixelformat(int bpp);
 PixelFormat qemu_default_pixelformat(int bpp);
 
 DisplayAllocator *register_displayallocator(DisplayState *ds, DisplayAllocator *da);
+void unregister_displayallocator(DisplayState *ds);
 
 static inline DisplaySurface* qemu_create_displaysurface(DisplayState *ds, int width, int height)
 {
@@ -371,7 +372,7 @@ void sdl_display_init(DisplayState *ds, int full_screen, int no_frame);
 
 /* fbdev.c */
 void fbdev_display_init(DisplayState *ds, const char *device);
-void fbdev_display_uninit(void);
+void fbdev_display_uninit(DisplayState *ds);
 
 /* cocoa.m */
 void cocoa_display_init(DisplayState *ds, int full_screen);
diff --git a/fbdev.c b/fbdev.c
index 54f2381..8ea1838 100644
--- a/fbdev.c
+++ b/fbdev.c
@@ -67,13 +67,13 @@ static int fb_switch_state = FB_ACTIVE;
 
 /* qdev windup */
 static DisplayChangeListener  *dcl;
+static DisplayAllocator   *da;
 static QemuPfConv *conv;
 static PixelFormatfbpf;
-static intresize_screen;
-static intredraw_screen;
 static intcx, cy, cw, ch;
 static intdebug = 0;
 static Notifier   exit_notifier;
+uint8_t   *guest_surface;
 
 /* fwd decls */
 static int fbdev_activate_vt(int tty, int vtno, bool wait);
@@ -519,6 +519,10 @@ static void fbdev_cleanup(void)
 fprintf(stderr, "%s\n", __FUNCTION__);
 
 /* restore console */
+if (fb_mem != NULL) {
+munmap(fb_mem, fb_fix.smem_len + fb_mem_offset);
+fb_mem = NULL;
+}
 if (fb != -1) {
 if (ioctl(fb,FBIOPUT_VSCREENINFO, &fb_ovar) < 0)
 perror("ioctl FBIOPUT_VSCREENINFO");
@@ -786,10 +790,10 @@ static void fbdev_render(DisplayState *ds, int x, int y, int w, int h)
 uint8_t *src;
 int line;
 
-if (!conv)
+if (!conv || !guest_surface)
 return;
 
-src = ds_get_data(ds) + y * ds_get_linesize(ds)
+src = guest_surface + y * ds_get_linesize(ds)
 + x * ds_get_bytes_per_pixel(ds);
 dst = fb_mem + y * fb_fix.line_length
 + x * fbpf.bytes_per_pixel;
@@ -819,46 +823,50 @@ static void fbdev_update(DisplayState *ds, int x, int y, int w, int h)
 if (fb_switch_state != FB_ACTIVE)
 return;
 
-if (resize_screen) {
-if (debug)
-fprintf(stderr, "%s: handle resize\n", __FUNCTION__);
-resize_screen = 0;
-cx = 0; cy = 0;
-cw = ds_get_width(ds);
-ch = ds_get_height(ds);
-if (ds_get_width(ds) < fb_var.xres) {
-cx = (fb_var.xres - ds_get_width(ds)) / 2;
-}
-if (ds_get_height(ds) < fb_var.yres) {
-cy = (fb_var.yres - ds_get_height(ds)) / 2;
-}
+if (guest_surface != NULL) {
+fbdev_render(ds, x, y, w, h);
+}
+}
 
-if (conv) {
-qemu_pf_conv_put(conv);
-}
-conv = qemu_pf_conv_get(&fbpf, &ds->surface->pf);
-if (conv == NULL) {
-fprintf(stderr, "fbdev: unsupported PixelFormat conversion\n");
-}
+static void fbdev_setdata(DisplayState *ds)
+{
+if (conv) {
+qemu_pf_conv_put(conv);
 }
 
-if (redraw_screen) {
-if (debug)
-fprintf(stderr, "%s: handle redraw\n", __FUNCTION__);
-redraw_screen = 0;
-fbdev_cls();
-x = 0; y = 0; w = ds_get_width(ds); h = ds_get_height(ds);
+conv =

[Qemu-devel] Re: [PATCH v2] monitor: Really show snapshot information about all devices

2010-06-17 Thread Luiz Capitulino

On Thu, 17 Jun 2010 09:58:37 -0300
Miguel Di Ciurcio Filho miguel.fi...@gmail.com wrote:

 The 'info snapshots' monitor command does not show snapshot information from 
 all
 available block devices.
 
 Usage example:
 $ qemu -hda disk1.qcow2 -hdb disk2.qcow2
 
 (qemu) info snapshots
 Snapshot devices: ide0-hd0
 Snapshot list (from ide0-hd0):
 ID        TAG                 VM SIZE                DATE       VM CLOCK
 1                                1.5M 2010-05-26 21:51:02   00:00:03.263
 2                                1.5M 2010-05-26 21:51:09   00:00:08.844
 3                                1.5M 2010-05-26 21:51:24   00:00:23.274
 4                                1.5M 2010-05-26 21:53:17   00:00:03.595
 
 In the above case, disk2.qcow2 has snapshot information, but it is not being
 shown. Only the first device is always shown.
 
 This patch updates the do_info_snapshots() function to correctly show snapshot
 information about all available block devices.
 
 New output:
 (qemu) info snapshots
 Snapshot list from ide0-hd0 (VM state image):
 ID        TAG                 VM SIZE                DATE       VM CLOCK
 1                                1.5M 2010-05-26 21:51:02   00:00:03.263
 2                                1.5M 2010-05-26 21:51:09   00:00:08.844
 3                                1.5M 2010-05-26 21:51:24   00:00:23.274
 4                                1.5M 2010-05-26 21:53:17   00:00:03.595
 
 Snapshot list from ide0-hd1:
 ID        TAG                 VM SIZE                DATE       VM CLOCK
 1                                   0 2010-05-26 21:51:02   00:00:03.263
 2                                   0 2010-05-26 21:51:09   00:00:08.844
 3                                   0 2010-05-26 21:51:24   00:00:23.274
 4                                   0 2010-05-26 21:53:17   00:00:03.595

 I agree we need this info somewhere, but I'm wondering if this output won't
get users confused.

 Perhaps it would be perfect to have 'info snapshots -a', but the user Monitor
doesn't support passing options to info commands.

 Suggestions?

[Qemu-devel] Re: [CFR 0/10] QMP specification review

2010-06-17 Thread Luiz Capitulino

On Tue, 15 Jun 2010 11:30:20 -0500
Anthony Liguori aligu...@us.ibm.com wrote:

 This is the first set of commands as part of the QMP specification review.
 Please comment on the individual commands specifications and Stefan and I will
 try to fold the comments back into the command documentation.

 Very nice!

 A few comments regarding the process in general:

  1. How are the issues going to be addressed? I mean, are you or Stefan going
 to send fixes or should we create a TODO page in the wiki so that we can
 work on the feedback later?

  2. I think we should slow down, so that we give more time to reviewers and
 there's no reason to hurry IMHO, as we won't go stable in 0.13

  3. Avi and Daniel, please join the effort

[Qemu-devel] Re: [PATCH 2/3] Monitor command 'info trace'

2010-06-17 Thread Stefan Hajnoczi

On Wed, Jun 16, 2010 at 06:12:06PM +0530, Prerna Saxena wrote:
 diff --git a/simpletrace.c b/simpletrace.c
 index 2fec4d3..239ae3f 100644
 --- a/simpletrace.c
 +++ b/simpletrace.c
 @@ -62,3 +62,16 @@ void trace4(TraceEvent event, unsigned long x1, unsigned 
 long x2, unsigned long
  void trace5(TraceEvent event, unsigned long x1, unsigned long x2, unsigned 
 long x3, unsigned long x4, unsigned long x5) {
  trace(event, x1, x2, x3, x4, x5);
  }
 +
 +void do_info_trace(Monitor *mon)
 +{
 +unsigned int i, max_idx;
 +
 +max_idx = trace_idx ? trace_idx : TRACE_BUF_LEN;

trace_idx is always in the range [0, TRACE_BUF_LEN).  There is no need
to perform this test.

 +
 +for (i=0; i<max_idx ;i++) {

Whitespace: "i=0; i<max_idx ;i++" should be "i = 0; i < max_idx; i++", which
is the common style across QEMU.

 +monitor_printf(mon, "Event %ld : %ld %ld %ld %ld %ld\n",
 +  trace_buf[i].event, trace_buf[i].x1, trace_buf[i].x2,
 +trace_buf[i].x3, trace_buf[i].x4, trace_buf[i].x5);

Getting only numeric output is the limitation of a binary trace.  It
would probably be possible to pretty-print without much additional code
by using the format strings from the trace-events file.

I think the numeric dump is good for now though.  Hex is more compact
than decimal and would make pointers easier to spot.  Want to change
this?

 +}
 +}
 diff --git a/tracetool b/tracetool
 index 9ea9c08..2c73bab 100755
 --- a/tracetool
 +++ b/tracetool
 @@ -130,6 +130,7 @@ void trace2(TraceEvent event, unsigned long x1, unsigned 
 long x2);
  void trace3(TraceEvent event, unsigned long x1, unsigned long x2, unsigned 
 long x3);
  void trace4(TraceEvent event, unsigned long x1, unsigned long x2, unsigned 
 long x3, unsigned long x4);
  void trace5(TraceEvent event, unsigned long x1, unsigned long x2, unsigned 
 long x3, unsigned long x4, unsigned long x5);
 +void do_info_trace(Monitor *mon);
  EOF
 
  simple_event_num=0
 @@ -289,6 +290,7 @@ tracetoh()
  #define TRACE_H
 
  #include "qemu-common.h"
 +#include "monitor.h"

qemu-common.h forward-declares Monitor, I don't think you need
monitor.h.

Stefan
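
The two suggestions above (no range test on trace_idx, hex instead of decimal) could look roughly like the sketch below. It is standalone and does not use the Monitor API; TRACE_BUF_LEN is set to a small made-up value, format_trace_record() and dump_trace() are hypothetical names, and the assumption that only entries below trace_idx are pending is mine.

```c
#include <assert.h>
#include <stdio.h>

enum { TRACE_BUF_LEN = 8 };   /* small value for illustration only */

typedef struct {
    unsigned long event;
    unsigned long x1, x2, x3, x4, x5;
} TraceRecord;

static TraceRecord trace_buf[TRACE_BUF_LEN];
static unsigned int trace_idx;    /* always in [0, TRACE_BUF_LEN) */

/* Format one record with hex arguments; returns the snprintf() result. */
static int format_trace_record(char *out, size_t size, const TraceRecord *r)
{
    return snprintf(out, size, "Event %lu : %lx %lx %lx %lx %lx\n",
                    r->event, r->x1, r->x2, r->x3, r->x4, r->x5);
}

/* Dump the records at indices below trace_idx (assumed to be the pending ones). */
static void dump_trace(FILE *fp)
{
    char line[128];
    unsigned int i;

    assert(trace_idx < TRACE_BUF_LEN);
    for (i = 0; i < trace_idx; i++) {
        format_trace_record(line, sizeof(line), &trace_buf[i]);
        fputs(line, fp);
    }
}
```

Pointers such as 0x7f... stand out immediately in the hex form, which is the main argument for it.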

RE: [Qemu-devel] VLIW?

2010-06-17 Thread Gibbons, Scott

Yes, as a guest.

Thanks for the helpful suggestions.  We have a closed pipeline and code errors 
are caught by the assembler.  Delaying writeback is most likely what I'll be 
doing.

Another question I have is how to handle this multithreaded architecture.  This 
seems to be extraordinarily difficult as a dynamic translation problem and I'll 
probably defer it to later.  But, if anyone has any suggestions, I'd be glad to 
hear them.

Thanks,
--Scott

---
Qualcomm Inc. / Hexagon Tools
Austin, TX




-Original Message-
From: Richard Henderson [mailto:rth7...@gmail.com] On Behalf Of Richard 
Henderson
Sent: Wednesday, June 16, 2010 12:41 PM
To: Gibbons, Scott
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] VLIW?

On 06/15/2010 08:53 AM, Gibbons, Scott wrote:
 Has anyone done a port of QEMU to a VLIW architecture?  I'm interested
 in seeing what was done.

Do you mean as guest or host?  I presume guest.

There's not such a port in the main repository; I don't know
what might have been done privately.

It'll be a more difficult job if you have an open pipeline, but
even then I should think it could be done.  It really depends on
the exact specification of your cpu.

For instance, with a closed pipeline, I think all you would need
to track during translation are the output temporaries.  You would
translate each member instruction sequentially, but delay writeback
to the architectural register until the end of the vliw packet.

With an open pipeline, I imagine that you would model each exposed
architectural feature.  For instance, if a load insn places its
result onto a bus in the cycle following the issue of the load,
then you could model the bus with a TCG register and have the
translator be responsible for issuing moves between the TCG 
registers during appropriate cycles.

I imagine the difficulty increases (but not intractably) if you
want the translator to catch and signal user coding errors in the
vliw assembly.  Though usually that's a job that can be performed
statically by the assembler...


r~
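
The closed-pipeline idea above (translate members sequentially, delay writeback to the end of the packet) can be illustrated with a tiny interpreter-style sketch. This is not TCG code; the toy ISA, the Insn layout and exec_packet() are all hypothetical and only show the ordering of reads and writes.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_REGS = 8, MAX_SLOTS = 4 };

typedef enum { OP_ADD, OP_MOV } Opcode;   /* hypothetical toy ISA */

typedef struct {
    Opcode op;
    int rd, rs1, rs2;     /* destination and source register numbers */
} Insn;

/* Execute one packet: every slot reads the register file as it was before
 * the packet started; results are committed only after all slots have run. */
static void exec_packet(uint32_t regs[NUM_REGS], const Insn *packet, int n)
{
    uint32_t tmp[MAX_SLOTS];
    int i;

    assert(n >= 0 && n <= MAX_SLOTS);

    for (i = 0; i < n; i++) {
        const Insn *in = &packet[i];
        switch (in->op) {
        case OP_ADD:
            tmp[i] = regs[in->rs1] + regs[in->rs2];
            break;
        case OP_MOV:
            tmp[i] = regs[in->rs1];
            break;
        default:
            tmp[i] = 0;
            break;
        }
    }

    /* Delayed writeback at the end of the packet. */
    for (i = 0; i < n; i++) {
        regs[packet[i].rd] = tmp[i];
    }
}
```

A packet holding "mov r0, r1" and "mov r1, r0" swaps the two registers under this scheme, which sequential writeback would not do.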

[Qemu-devel] [PATCH] virtio-pci: fix bus master bug setting on load

2010-06-17 Thread Alex Williamson

The comment suggests we're checking for the driver in the ready
state and bus master disabled, but the code is checking that it's
not in the ready state.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Found-by: Amit Shah amit.s...@redhat.com
---

 hw/virtio-pci.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index e101fa0..7a86a81 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -155,7 +155,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile *f)
 
     /* Try to find out if the guest has bus master disabled, but is
        in ready state. Then we have a buggy guest OS. */
-    if (!(proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+    if ((proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
         !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
         proxy->bugs |= VIRTIO_PCI_BUG_BUS_MASTER;
     }

[Qemu-devel] Re: [PATCH] virtio-pci: fix bus master bug setting on load

2010-06-17 Thread Michael S. Tsirkin

On Thu, Jun 17, 2010 at 09:15:02AM -0600, Alex Williamson wrote:
 The comment suggests we're checking for the driver in the ready
 state and bus master disabled, but the code is checking that it's
 not in the ready state.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 Found-by: Amit Shah amit.s...@redhat.com

Acked-by: Michael S. Tsirkin m...@redhat.com

 ---
 
  hw/virtio-pci.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
 index e101fa0..7a86a81 100644
 --- a/hw/virtio-pci.c
 +++ b/hw/virtio-pci.c
 @@ -155,7 +155,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile *f)
  
      /* Try to find out if the guest has bus master disabled, but is
         in ready state. Then we have a buggy guest OS. */
 -    if (!(proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
 +    if ((proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
          !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
          proxy->bugs |= VIRTIO_PCI_BUG_BUS_MASTER;
      }

[Qemu-devel] Re: [PATCH] virtio-pci: fix bus master bug setting on load

2010-06-17 Thread Amit Shah

On (Thu) Jun 17 2010 [09:15:02], Alex Williamson wrote:
 The comment suggests we're checking for the driver in the ready
 state and bus master disabled, but the code is checking that it's
 not in the ready state.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 Found-by: Amit Shah amit.s...@redhat.com
 ---

Acked-by: Amit Shah amit.s...@redhat.com


Amit

Re: [Qemu-devel] VLIW?

2010-06-17 Thread Richard Henderson

On 06/17/2010 08:12 AM, Gibbons, Scott wrote:
 Another question I have is how to handle this multithreaded
 architecture.  This seems to be extraordinarily difficult as a
 dynamic translation problem and I'll probably defer it to later.
 But, if anyone has any suggestions, I'd be glad to hear them.

How is your threading different from other SMP systems?

In system mode, QEMU TCG is single-threaded and models SMP via
cooperative switching in between TCG translation blocks.  It's
not ideal, but it does solve quite a number of problems and is
at least functional.


r~

[Qemu-devel] Re: [PATCH] virtio-pci: fix bus master bug setting on load

2010-06-17 Thread Alexander Graf

Alex Williamson wrote:
 The comment suggests we're checking for the driver in the ready
 state and bus master disabled, but the code is checking that it's
 not in the ready state.

 Signed-off-by: Alex Williamson alex.william...@redhat.com
 Found-by: Amit Shah amit.s...@redhat.com
 ---

  hw/virtio-pci.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
 index e101fa0..7a86a81 100644
 --- a/hw/virtio-pci.c
 +++ b/hw/virtio-pci.c
 @@ -155,7 +155,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile *f)
  
      /* Try to find out if the guest has bus master disabled, but is
         in ready state. Then we have a buggy guest OS. */
 -    if (!(proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
 +    if ((proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
   

Phew - that's an evil one. Thanks for the catch!

Acked-by: Alexander Graf ag...@suse.de

Alex

[Qemu-devel] Re: [Bug 595117] Re: qemu-nbd slow and missing writeback cache option

2010-06-17 Thread Dustin Kirkland

Stephane-

I understand your plight.  However, according to the rules and
policies of the QEMU project, you must submit the patch on the
qemu-devel@ mailing list, in addition to (or instead of) in the bug
tracker.  It's not my project, not my policy.  I'm just trying to make
sure you get your patch in front of the right audience such that it
can be discussed and accepted.

-- 
qemu-nbd slow and missing writeback cache option
https://bugs.launchpad.net/bugs/595117
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: Invalid
Status in “qemu-kvm” package in Ubuntu: Incomplete

Bug description:
Binary package hint: qemu-kvm

dpkg -l | grep qemu
ii  kvm              1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9  dummy transitional pacakge from kvm to qemu-
ii  qemu             0.12.3+noroms-0ubuntu9                      dummy transitional pacakge from qemu to qemu
ii  qemu-common      0.12.3+noroms-0ubuntu9                      qemu common functionality (bios, documentati
ii  qemu-kvm         0.12.3+noroms-0ubuntu9                      Full virtualization on i386 and amd64 hardwa
ii  qemu-kvm-extras  0.12.3+noroms-0ubuntu9                      fast processor emulator binaries for non-x86
ii  qemu-launcher    1.7.4-1ubuntu2                              GTK+ front-end to QEMU computer emulator
ii  qemuctl          0.2-2                                       controlling GUI for qemu

lucid amd64.

qemu-nbd is a lot slower when writing to disk than say nbd-server.

It appears it is because by default the disk image it serves is open with 
O_SYNC. The --nocache option, unintuitively, makes matters a bit better because 
it causes the image to be open with O_DIRECT instead of O_SYNC.

The qemu code allows an image to be open without any of those flags, but 
unfortunately qemu-nbd doesn't have the option to do that (qemu doesn't allow 
the image to be open with both O_SYNC and O_DIRECT though).

The default of qemu-img (of using O_SYNC) is not very sensible because anyway, 
the client (the kernel) uses caches (write-back), (and qemu-nbd -d doesn't 
flush those by the way). So if for instance qemu-nbd is killed, regardless of 
whether qemu-nbd uses O_SYNC, O_DIRECT or not, the data in the image will not 
be consistent anyway, unless syncs are done by the client (like fsync on the 
nbd device or sync mount option), and with qemu-nbd's O_SYNC mode, those 
syncs will be extremely slow.

Attached is a patch that adds a --cache={off,none,writethrough,writeback} 
option to qemu-nbd.

--cache=off is the same as --nocache (that is use O_DIRECT), writethrough is 
using O_SYNC and is still the default so this patch doesn't change the 
functionality. writeback is none of those flags, so is the addition of this 
patch. The patch also does an fsync upon qemu-nbd -d to make sure data is 
flushed to the image before removing the nbd.

Consider this test scenario:

dd bs=1M count=100 of=a  /dev/null
qemu-nbd --cache=x -c /dev/nbd0 a
cp /dev/zero /dev/nbd0
time perl -MIO::Handle -e 'STDOUT-sync or die$!' 1 /dev/nbd0

With cache=writethrough (the default), it takes over 10 minutes to write those
100MB worth of zeroes. Running strace, we see the recvfrom and sendto calls
delayed by each 1 kB write(2) to disk (10 to 30 ms per write).

With cache=off, it takes about 30 seconds.

With cache=writeback, it takes about 3 seconds, which is similar to the 
performance you get with nbd-server

Note that the cp command runs instantly as the data is buffered by the client 
(the kernel), and not sent to qemu-nbd until the fsync(2) is called.
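
The mapping between the proposed --cache values and open(2) flags, as described in the report, can be sketched as follows. This is not the attached patch; cache_mode_to_open_flags() is a hypothetical helper, and the sketch assumes Linux, where O_DIRECT needs _GNU_SOURCE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for O_DIRECT on Linux */
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>

/* Hypothetical helper: map a --cache= value to the extra open(2) flags
 * described in the report. Returns -1 for an unknown mode. */
static int cache_mode_to_open_flags(const char *mode)
{
    assert(mode != NULL);

    if (strcmp(mode, "off") == 0 || strcmp(mode, "none") == 0) {
        return O_DIRECT;       /* bypass the host page cache */
    }
    if (strcmp(mode, "writethrough") == 0) {
        return O_SYNC;         /* every write waits for the disk */
    }
    if (strcmp(mode, "writeback") == 0) {
        return 0;              /* neither flag: host page cache buffers writes */
    }
    return -1;
}
```

The returned value would be OR-ed into the flags used to open the image; with writeback, durability then depends on an explicit fsync, such as the one the patch adds for qemu-nbd -d.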

[Qemu-devel] Re: [PATCH 3/3] Toggle tracepoint state

2010-06-17 Thread Stefan Hajnoczi

On Wed, Jun 16, 2010 at 06:14:35PM +0530, Prerna Saxena wrote:
 This patch adds support for dynamically enabling/disabling of tracepoints.
 This is done by internally maintaining each tracepoint's state, and 
 permitting logging of data from a tracepoint only if it is in an 
 'active' state.
 
 Monitor commands added :
 1) info tracepoints   : to view all available tracepoints and 
 their state.
 2) tracepoint NAME on|off : to enable/disable data logging from a 
 given tracepoint.
 Eg, tracepoint paio_submit off 
   disables logging of data when 
   paio_submit is hit.
 
 Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com
 ---
 
  monitor.c   |   16 ++
  qemu-monitor.hx |   18 
  simpletrace.c   |   63 
 +++
  tracetool   |   30 +++---
  vl.c|6 +
  5 files changed, 129 insertions(+), 4 deletions(-)
 
 
 diff --git a/monitor.c b/monitor.c
 index 8b60830..238bdf0 100644
 --- a/monitor.c
 +++ b/monitor.c
 @@ -548,6 +548,15 @@ static void do_commit(Monitor *mon, const QDict *qdict)
  }
  }
 
 +#ifdef CONFIG_SIMPLE_TRACE
 +static void do_change_tracepoint_state(Monitor *mon, const QDict *qdict)
 +{
 +const char *tp_name = qdict_get_str(qdict, "name");
 +bool new_state = qdict_get_bool(qdict, "option");
 +change_tracepoint_state(tp_name, new_state);
 +}
 +#endif
 +
  static void user_monitor_complete(void *opaque, QObject *ret_data)
  {
  MonitorCompletionData *data = (MonitorCompletionData *)opaque; 
 @@ -2791,6 +2800,13 @@ static const mon_cmd_t info_cmds[] = {
  .help   = "show current contents of trace buffer",
  .mhandler.info = do_info_trace,
  },
 +{
 +.name   = "tracepoints",
 +.args_type  = "",
 +.params = "",
 +.help   = "show available tracepoints & their state",
 +.mhandler.info = do_info_all_tracepoints,
 +},
  #endif
  {
  .name   = NULL,
 diff --git a/qemu-monitor.hx b/qemu-monitor.hx
 index 766c30f..8540b8f 100644
 --- a/qemu-monitor.hx
 +++ b/qemu-monitor.hx
 @@ -117,6 +117,8 @@ show device tree
  #ifdef CONFIG_SIMPLE_TRACE
  @item info trace
  show contents of trace buffer
 +@item info tracepoints
 +show available tracepoints and their state
  #endif
  @end table
  ETEXI
 @@ -225,6 +227,22 @@ STEXI
  @item logfile @var{filename}
  @findex logfile
  Output logs to @var{filename}.
 +#ifdef CONFIG_SIMPLE_TRACE
 +ETEXI
 +
 +{
 +.name   = "tracepoint",
 +.args_type  = "name:s,option:b",
 +.params = "name on|off",
 +.help   = "changes status of a specific tracepoint",
 +.mhandler.cmd = do_change_tracepoint_state,
 +},
 +
 +STEXI
 +@item tracepoint
 +@findex tracepoint
 +changes status of a tracepoint
 +#endif
  ETEXI
 
  {
 diff --git a/simpletrace.c b/simpletrace.c
 index 239ae3f..4221a8f 100644
 --- a/simpletrace.c
 +++ b/simpletrace.c
 @@ -3,6 +3,12 @@
  #include "trace.h"
 
  typedef struct {
 +char *tp_name;
 +bool state;
 +unsigned int hash;
 +} Tracepoint;

The tracing infrastructure avoids using the name 'tracepoint'.  It calls
them trace events.  I didn't deliberately choose that name, but was
unaware at the time that Linux tracing calls them tracepoints.  Given
that 'trace event' is currently used, it would be nice to remain
consistent/reduce confusion.

How about:
typedef struct {
const char *name;
bool enabled;
unsigned int hash;
} TraceEventState;

Or a nicer overall change might be to rename enum TraceEvent to
TraceEventID and Tracepoint to TraceEvent.

 +
 +typedef struct {
  unsigned long event;
  unsigned long x1;
  unsigned long x2;
 @@ -18,11 +24,29 @@ enum {
  static TraceRecord trace_buf[TRACE_BUF_LEN];
  static unsigned int trace_idx;
  static FILE *trace_fp;
 +static Tracepoint trace_list[NR_TRACEPOINTS];
 +
 +void init_tracepoint(const char *tname, TraceEvent tevent)
 +{
 +if (!tname || tevent > NR_TRACEPOINTS) {
 +return;
 +}

I'd drop this check because only trace.c should use init_tracepoint()
and you have ensured it uses correct arguments.  Just a coding style
suggestion; having redundant checks makes the code more verbose, may
lead the reader to assume that this function really is called with junk
arguments, and silently returning will not help make the issue visible.

 +trace_list[tevent].tp_name = (char*)qemu_malloc(strlen(tname)+1);
 +strncpy(trace_list[tevent].tp_name, tname, strlen(tname));

Or use qemu_strdup(), but we don't really need to allocate memory at all
here.  Just hold the const char* to a string literal since the trace
event is a static object that is built into the binary.

 +trace_list[tevent].hash = qemu_hash(tname);
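
Putting the review suggestions together (the TraceEventID/TraceEventState naming, no allocation, names held as pointers to string literals), a standalone sketch might look like this. The event list, trace_list contents and trace_event_set_state() are hypothetical, and the hash field is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical event IDs; paio_submit is taken from the patch description. */
typedef enum { TRACE_PAIO_SUBMIT, TRACE_QEMU_MALLOC, NR_TRACE_EVENTS } TraceEventID;

typedef struct {
    const char *name;   /* points at a string literal, never freed */
    bool enabled;
} TraceEventState;

/* Statically initialized, so no init function and no malloc are needed. */
static TraceEventState trace_list[NR_TRACE_EVENTS] = {
    [TRACE_PAIO_SUBMIT] = { "paio_submit", true },
    [TRACE_QEMU_MALLOC] = { "qemu_malloc", true },
};

/* Enable or disable an event by name; returns false if no such event. */
static bool trace_event_set_state(const char *name, bool enabled)
{
    int i;

    assert(name != NULL);
    for (i = 0; i < NR_TRACE_EVENTS; i++) {
        if (strcmp(trace_list[i].name, name) == 0) {
            trace_list[i].enabled = enabled;
            return true;
        }
    }
    return false;
}
```

Returning false for an unknown name lets the monitor command report the error instead of silently ignoring it.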

Re: [Qemu-devel] Re: [PATCH 5/5] linux fbdev display driver.

2010-06-17 Thread Julian Pidancet

On 06/17/2010 03:29 PM, Julian Pidancet wrote:
 
 Hi,
 
 Thanks for spotting these errors. Here is a respin of my patch to address your
 concerns.
 (The munmap call is included).
 
 Cheers,
 
 Julian
 

Oh, I actually tested the last patch only with the -nographic switch. There's 
still a segfault when starting qemu with vnc.
You can fix it by adding a call to dpy_resize(ds) after the dcl = NULL; line in 
fbdev_display_uninit().

For some reason, the display is extremely slow when using vnc and fbdev at the 
same time.

Julian

[Qemu-devel] [Bug 595117] Re: qemu-nbd slow and missing writeback cache option

2010-06-17 Thread Brian Murray

** Tags added: patch

-- 
qemu-nbd slow and missing writeback cache option
https://bugs.launchpad.net/bugs/595117

[Qemu-devel] Re: [PATCH] virtio-pci: fix bus master bug setting on load

2010-06-17 Thread Juan Quintela

Alex Williamson alex.william...@redhat.com wrote:
 The comment suggests we're checking for the driver in the ready
 state and bus master disabled, but the code is checking that it's
 not in the ready state.

 Signed-off-by: Alex Williamson alex.william...@redhat.com
 Found-by: Amit Shah amit.s...@redhat.com

Acked-by: Juan Quintela quint...@redhat.com

 ---

  hw/virtio-pci.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
 index e101fa0..7a86a81 100644
 --- a/hw/virtio-pci.c
 +++ b/hw/virtio-pci.c
 @@ -155,7 +155,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile *f)
  
      /* Try to find out if the guest has bus master disabled, but is
         in ready state. Then we have a buggy guest OS. */
 -    if (!(proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
 +    if ((proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
          !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
          proxy->bugs |= VIRTIO_PCI_BUG_BUS_MASTER;
      }
