date:20150203

[Qemu-devel] [PATCH v16 1/2] sPAPR: Implement EEH RTAS calls

2015-02-03 Thread Gavin Shan

QEMU does not yet emulate the EEH RTAS requests issued by the guest;
this patch implements them.

The patch defines the constants used by the EEH RTAS calls and adds
the callback sPAPRPHBClass::eeh_handler, which is used as follows:

  * RTAS calls are received in spapr_pci.c, where sanity checks are
    done.
  * RTAS handlers handle what they can. If there is something they
    cannot handle and the sPAPRPHBClass::eeh_handler callback is
    defined, it is called.
  * sPAPRPHBClass::eeh_handler is only implemented for VFIO now. It
    issues an ioctl() on the IOMMU container fd to complete the call.
    Error codes from that ioctl() are transferred back to the guest.
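A minimal, self-contained C model of the flow above may help when reading the handlers. It is a sketch under stated assumptions, not QEMU code: `struct phb`, `struct phb_class`, `rtas_set_option_sketch()` and `fake_vfio_eeh_handler()` are hypothetical stand-ins for sPAPRPHBState, sPAPRPHBClass, rtas_ibm_set_eeh_option() and the VFIO ioctl() path, and the `OUT_*` return-code values are assumed rather than taken from the PAPR spec.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the RTAS return codes; values are assumed. */
enum { OUT_SUCCESS = 0, OUT_HW_ERROR = -1, OUT_PARAM_ERROR = -3 };

struct phb;

/* Hypothetical model of sPAPRPHBClass: eeh_handler is optional. */
struct phb_class {
    int (*eeh_handler)(struct phb *phb, int option);
};

/* Hypothetical model of sPAPRPHBState. */
struct phb {
    const struct phb_class *cls;
};

/* Model of an RTAS handler: sanity-check the call first, then delegate
 * what it cannot do itself to the optional per-PHB callback. */
static int rtas_set_option_sketch(struct phb *phb, int nargs, int nret,
                                  int option)
{
    if (nargs != 4 || nret != 1) {
        return OUT_PARAM_ERROR;      /* malformed RTAS call */
    }
    if (phb == NULL || phb->cls->eeh_handler == NULL) {
        return OUT_PARAM_ERROR;      /* unknown PHB or no EEH backend */
    }
    if (phb->cls->eeh_handler(phb, option) < 0) {
        return OUT_HW_ERROR;         /* backend failure reported to guest */
    }
    return OUT_SUCCESS;
}

/* Fake backend standing in for the VFIO ioctl() path: it fails on
 * negative options so that both outcomes can be exercised. */
static int fake_vfio_eeh_handler(struct phb *phb, int option)
{
    (void)phb;
    return option < 0 ? -1 : 0;
}
```

As in the patch, a PHB without an eeh_handler (at this point, any non-VFIO PHB) gets a parameter error rather than a silent success, and any negative value from the backend is reported to the guest as a hardware error.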

[aik: defined RTAS tokens for EEH RTAS calls]
Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 hw/ppc/spapr_pci.c          | 310
 include/hw/pci-host/spapr.h |   7 +
 include/hw/ppc/spapr.h      |  43 +-
 3 files changed, 358 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 6deeb19..3fac5a9 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -406,6 +406,297 @@ static void rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu,
     rtas_st(rets, 2, 1);/* 0 == level; 1 == edge */
 }
 
+static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu,
+                                    sPAPREnvironment *spapr,
+                                    uint32_t token, uint32_t nargs,
+                                    target_ulong args, uint32_t nret,
+                                    target_ulong rets)
+{
+    sPAPRPHBState *sphb;
+    sPAPRPHBClass *spc;
+    uint32_t addr, option;
+    uint64_t buid;
+    int ret;
+
+    if ((nargs != 4) || (nret != 1)) {
+        goto param_error_exit;
+    }
+
+    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
+    addr = rtas_ld(args, 0);
+    option = rtas_ld(args, 3);
+
+    sphb = find_phb(spapr, buid);
+    if (!sphb) {
+        goto param_error_exit;
+    }
+
+    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+    if (!spc->eeh_handler) {
+        goto param_error_exit;
+    }
+
+    switch (option) {
+    case RTAS_EEH_ENABLE:
+        if (!find_dev(spapr, buid, addr)) {
+            goto param_error_exit;
+        }
+        break;
+    case RTAS_EEH_DISABLE:
+    case RTAS_EEH_THAW_IO:
+    case RTAS_EEH_THAW_DMA:
+        break;
+    default:
+        goto param_error_exit;
+    }
+
+    ret = spc->eeh_handler(sphb, RTAS_EEH_REQ_SET_OPTION, option);
+    if (ret < 0) {
+        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+        return;
+    }
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    return;
+
+param_error_exit:
+    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+}
+
+static void rtas_ibm_get_config_addr_info2(PowerPCCPU *cpu,
+                                           sPAPREnvironment *spapr,
+                                           uint32_t token, uint32_t nargs,
+                                           target_ulong args, uint32_t nret,
+                                           target_ulong rets)
+{
+    sPAPRPHBState *sphb;
+    sPAPRPHBClass *spc;
+    PCIDevice *pdev;
+    uint32_t addr, option;
+    uint64_t buid;
+
+    if ((nargs != 4) || (nret != 2)) {
+        goto param_error_exit;
+    }
+
+    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
+    sphb = find_phb(spapr, buid);
+    if (!sphb) {
+        goto param_error_exit;
+    }
+
+    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+    if (!spc->eeh_handler) {
+        goto param_error_exit;
+    }
+
+    addr = rtas_ld(args, 0);
+    option = rtas_ld(args, 3);
+    if (option != RTAS_GET_PE_ADDR && option != RTAS_GET_PE_MODE) {
+        goto param_error_exit;
+    }
+
+    pdev = find_dev(spapr, buid, addr);
+    if (!pdev) {
+        goto param_error_exit;
+    }
+
+    /*
+     * For now, we always have bus level PE whose address
+     * has format 00BBSS00. The guest OS might regard
+     * PE address 0 as invalid. We avoid that simply by
+     * extending it with one.
+     */
+    if (option == RTAS_GET_PE_ADDR) {
+        rtas_st(rets, 1, (pci_bus_num(pdev->bus) << 16) + 1);
+    } else {
+        rtas_st(rets, 1, RTAS_PE_MODE_SHARED);
+    }
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    return;
+
+param_error_exit:
+    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+}
+
+static void rtas_ibm_read_slot_reset_state2(PowerPCCPU *cpu,
+                                            sPAPREnvironment *spapr,
+                                            uint32_t token, uint32_t nargs,
+                                            target_ulong args, uint32_t nret,
+                                            target_ulong rets)
+{
+    sPAPRPHBState *sphb;
+    sPAPRPHBClass *spc;
+    uint64_t buid;
+    int ret;
+
+    if ((nargs != 3) || (nret != 4 && nret != 5)) {
+        goto param_error_exit;
+    }
+
+    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
+    sphb = find_phb(spapr, buid);
+    if (!sphb) {
+
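The handlers above pack and unpack a few values by hand: the 64-bit BUID arrives split across two 32-bit RTAS arguments (high word in argument 1, low word in argument 2), and ibm,get-config-addr-info2 returns a bus-level PE address of (bus << 16) + 1, that is, the bus number in the BB byte of 00BBSS00 with one added so that bus 0 does not yield PE address 0. The sketch below isolates that arithmetic; buid_from_args() and pe_addr_from_bus() are hypothetical helper names, not QEMU functions.

```c
#include <stdint.h>

/* Rebuild the 64-bit BUID from the two 32-bit RTAS argument words,
 * mirroring ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2). */
static uint64_t buid_from_args(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Bus-level PE address: bus number in bits 16..23 (the "BB" byte of
 * 00BBSS00), plus one so that bus 0 never produces PE address 0. */
static uint32_t pe_addr_from_bus(uint8_t bus)
{
    return ((uint32_t)bus << 16) + 1;
}
```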

[Qemu-devel] [PATCH v16 0/2] EEH Support for VFIO Devices

2015-02-03 Thread Gavin Shan

This series adds EEH support for VFIO PCI devices on the sPAPR platform.
It requires corresponding host kernel support, which was merged during the
3.17 merge window. The patchset has been rebased onto Alex Graf's QEMU
repository:

   git://github.com/agraf/qemu.git (branch: ppc-next)

The implementation notes are below. Please consider for merging!

* RTAS calls are received in spapr_pci.c, where sanity checks are done. RTAS
  handlers handle what they can. If there is something they cannot handle
  and the sPAPRPHBClass::eeh_handler callback is defined, it is called.
* sPAPRPHBClass::eeh_handler is only implemented for VFIO now. It issues an
  ioctl() on the IOMMU container fd to complete the call. Error codes from
  that ioctl() are transferred back to the guest.

Changelog
=========
v12 -> v13:
* Rebase to Alex Graf's QEMU repository (ppc-next branch).
* Drop the patch for header file (vfio.h) changes, which was merged
  to the QEMU repository by commit a9fd1654 ("linux-headers: update to 3.17-rc7").
* Retested on an Emulex adapter; EEH errors are recovered successfully.
v13 -> v14:
* Check that the sPAPRPHBState instance is valid before converting it to the
  corresponding class, as pointed out by Alex Graf.
v14 -> v15:
* Drop the unrelated patch making find_phb()/find_dev() public.
* Check the number of RTAS parameters before accessing the RTAS parameter
  buffer, for safety.
* Return a hardware error from the RTAS calls ibm,set-eeh-option and
  ibm,set-slot-reset in some cases, according to the PAPR spec.
v15 -> v16:
* Drop rtas_handle_eeh_request() and merge its logic into its callers so that
  more accurate return values can be returned for RTAS calls in the callers.
* Always return 1 (no error log) for the RTAS call ibm,slot-error-detail and
  correct wrong return values for other RTAS calls, following David Gibson's
  suggestions.
* Make the fall-through more obvious for the case of a negative return value
  from sPAPRPHBClass::eeh_handler().
* Clear the argument buffer passed to ioctl().
* Rename the sPAPRPHBClass variable from info to spc.

Gavin Shan (2):
  sPAPR: Implement EEH RTAS calls
  sPAPR: Implement sPAPRPHBClass::eeh_handler

 hw/ppc/spapr_pci.c          | 310
 hw/ppc/spapr_pci_vfio.c     |  58 +
 hw/vfio/common.c            |   1 +
 include/hw/pci-host/spapr.h |   7 +
 include/hw/ppc/spapr.h      |  43 +-
 5 files changed, 417 insertions(+), 2 deletions(-)

-- 
1.8.3.2

Re: [Qemu-devel] [PATCH 6/9] cosmetic changes preparing for the following patches

2015-02-03 Thread Fam Zheng

On Tue, 02/03 13:52, Paolo Bonzini wrote:
 From: Mike Day ncm...@ncultra.org
 
 Signed-off-by: Mike Day ncm...@ncultra.org
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  arch_init.c            |  5 +--
  exec.c                 | 84 +-
  include/exec/cpu-all.h |  1 +
  3 files changed, 57 insertions(+), 33 deletions(-)
 
 diff --git a/arch_init.c b/arch_init.c
 index 89c8fa4..b13f74b 100644
 --- a/arch_init.c
 +++ b/arch_init.c
 @@ -688,9 +688,9 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
  }
  }
  }
 +
  last_seen_block = block;
  last_offset = offset;
 -
  return bytes_sent;
  }
  
 @@ -1117,7 +1117,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
  ret = -EINVAL;
  break;
  }
 -
  ch = qemu_get_byte(f);
  ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
  break;
 @@ -1128,7 +1127,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
  ret = -EINVAL;
  break;
  }
 -
  qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
  break;
  case RAM_SAVE_FLAG_XBZRLE:
 @@ -1138,7 +1136,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
  ret = -EINVAL;
  break;
  }
 -
  if (load_xbzrle(f, addr, host) < 0) {
  error_report("Failed to decompress XBZRLE page at "
   RAM_ADDR_FMT, addr);
 diff --git a/exec.c b/exec.c
 index 05c5b44..8239370 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -1265,11 +1265,12 @@ static RAMBlock *find_ram_block(ram_addr_t addr)
  return NULL;
  }
  
 +/* Called with iothread lock held.  */
  void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
  {
 -RAMBlock *new_block = find_ram_block(addr);
 -RAMBlock *block;
 +RAMBlock *new_block, *block;
  
 +new_block = find_ram_block(addr);
  assert(new_block);
  assert(!new_block->idstr[0]);
  
 @@ -1282,7 +1283,6 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
  }
  pstrcat(new_block->idstr, sizeof(new_block->idstr), name);
  
 -/* This assumes the iothread lock is taken here too.  */
  qemu_mutex_lock_ramlist();
  QTAILQ_FOREACH(block, &ram_list.blocks, next) {
  if (block != new_block && !strcmp(block->idstr, new_block->idstr)) {
 @@ -1294,10 +1294,17 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
  qemu_mutex_unlock_ramlist();
  }
  
 +/* Called with iothread lock held.  */
  void qemu_ram_unset_idstr(ram_addr_t addr)
  {
 -RAMBlock *block = find_ram_block(addr);
 +RAMBlock *block;
  
 +/* FIXME: arch_init.c assumes that this is not called throughout
 + * migration.  Ignore the problem since hot-unplug during migration
 + * does not work anyway.
 + */
 +
 +block = find_ram_block(addr);
  if (block) {
  memset(block->idstr, 0, sizeof(block->idstr));
  }
 @@ -1585,7 +1592,6 @@ void qemu_ram_free(ram_addr_t addr)
  }
  }
  qemu_mutex_unlock_ramlist();
 -
  }
  
  #ifndef _WIN32
 @@ -1633,7 +1639,6 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
  memory_try_enable_merging(vaddr, length);
  qemu_ram_setup_dump(vaddr, length);
  }
 -return;

Other changes are equivalent, but not quite for this one. But I think it is
still correct, so:

Reviewed-by: Fam Zheng f...@redhat.com

  }
  }
  }
 @@ -1641,49 +1646,60 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
  
  int qemu_get_ram_fd(ram_addr_t addr)
  {
 -RAMBlock *block = qemu_get_ram_block(addr);
 +RAMBlock *block;
 +int fd;
  
 -return block->fd;
 +block = qemu_get_ram_block(addr);
 +fd = block->fd;
 +return fd;
  }
  
  void *qemu_get_ram_block_host_ptr(ram_addr_t addr)
  {
 -RAMBlock *block = qemu_get_ram_block(addr);
 +RAMBlock *block;
 +void *ptr;
  
 -return ramblock_ptr(block, 0);
 +block = qemu_get_ram_block(addr);
 +ptr = ramblock_ptr(block, 0);
 +return ptr;
  }
  
  /* Return a host pointer to ram allocated with qemu_ram_alloc.
 -   With the exception of the softmmu code in this file, this should
 -   only be used for local memory (e.g. video ram) that the device owns,
 -   and knows it isn't going to access beyond the end of the block.
 -
 -   It should not be used for general purpose DMA.
 -   Use cpu_physical_memory_map/cpu_physical_memory_rw instead.
 + * This should not be used for general purpose DMA.  Use address_space_map
 + * or address_space_rw instead. For local memory (e.g. video ram) that the
 + * device owns, use memory_region_get_ram_ptr.
   */
  void *qemu_get_ram_ptr(ram_addr_t addr)
  {
 -RAMBlock *block =

Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-03 Thread David Woodhouse

On Tue, 2015-02-03 at 16:19 -0800, David Miller wrote:
 From: David Woodhouse dw...@infradead.org
 Date: Mon, 02 Feb 2015 07:27:10 +

  I'm guessing you don't want to push the *whole* management of the TLS
  control connection *and* the UDP transport, and probing the latter with
  keepalives, into the kernel? I certainly don't :)

 Whilst Herbert Xu and I have discussed in the past supporting
 automatic SSL handling of socket data during socket writes in the
 kernel, doing TLS stuff would be a bit of a stretch :-)

Right. For the DTLS I was thinking we'd do the handshake in userspace
and then hand the UDP socket down. At that point it's basically the same
as ESP with the bytes in a slightly different place.

So I really am looking at an option for "here's a UDP socket to send
those tun packets out on, with this encryption setup" as the sanest
plan I can come up with.

-- 
dwmw2


[Qemu-devel] [PATCH v2 09/12] acpi, piix4: Add memory hot unplug support for piix4.

2015-02-03 Thread Zhu Guihua

From: Tang Chen tangc...@cn.fujitsu.com

Call memory unplug cb in piix4_device_unplug_cb().

Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/piix4.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 8bd9007..acd054e 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -377,8 +377,15 @@ static void piix4_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 static void piix4_device_unplug_cb(HotplugHandler *hotplug_dev,
                                    DeviceState *dev, Error **errp)
 {
-    error_setg(errp, "acpi: device unplug for not supported device"
-               " type: %s", object_get_typename(OBJECT(dev)));
+    PIIX4PMState *s = PIIX4_PM(hotplug_dev);
+
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        acpi_memory_unplug_cb(&s->ar, s->irq, &s->acpi_memory_hotplug,
+                              dev, errp);
+    } else {
+        error_setg(errp, "acpi: device unplug for not supported device"
+                   " type: %s", object_get_typename(OBJECT(dev)));
+    }
 }
 
 static void piix4_update_bus_hotplug(PCIBus *pci_bus, void *opaque)
-- 
1.9.3

[Qemu-devel] [PATCH v2 06/12] acpi, ich9: Add memory hot unplug request support for ich9.

2015-02-03 Thread Zhu Guihua

From: Tang Chen tangc...@cn.fujitsu.com

Call memory unplug request cb in ich9_pm_device_unplug_request_cb().

Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/ich9.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 5352e19..b85eed4 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -400,8 +400,14 @@ void ich9_pm_device_plug_cb(ICH9LPCPMRegs *pm, DeviceState *dev, Error **errp)
 void ich9_pm_device_unplug_request_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
                                       Error **errp)
 {
-    error_setg(errp, "acpi: device unplug request for not supported device"
-               " type: %s", object_get_typename(OBJECT(dev)));
+    if (pm->acpi_memory_hotplug.is_enabled &&
+        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        acpi_memory_unplug_request_cb(&pm->acpi_regs, pm->irq,
+                                      &pm->acpi_memory_hotplug, dev, errp);
+    } else {
+        error_setg(errp, "acpi: device unplug request for not supported device"
+                   " type: %s", object_get_typename(OBJECT(dev)));
+    }
 }
 
 void ich9_pm_device_unplug_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
-- 
1.9.3

[Qemu-devel] [PATCH v2 08/12] acpi, mem-hotplug: Add unplug cb for memory device.

2015-02-03 Thread Zhu Guihua

From: Tang Chen tangc...@cn.fujitsu.com

Reset the memory slot status and unparent the memory device.

Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/memory_hotplug.c | 34 ++
 hw/core/qdev.c   |  2 +-
 include/hw/acpi/memory_hotplug.h |  2 ++
 include/hw/qdev-core.h   |  1 +
 4 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index 3d3c1ec..3ae9629 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -1,6 +1,7 @@
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/pc-hotplug.h"
 #include "hw/mem/pc-dimm.h"
+#include "hw/i386/pc.h"
 #include "hw/boards.h"
 #include "trace.h"
 #include "qapi-event.h"
@@ -221,6 +222,39 @@ void acpi_memory_unplug_request_cb(ACPIREGS *ar, qemu_irq irq,
     acpi_send_gpe_event(ar, irq, ACPI_MEMORY_HOTPLUG_STATUS);
 }
 
+void acpi_memory_unplug_cb(ACPIREGS *ar, qemu_irq irq,
+                           MemHotplugState *mem_st,
+                           DeviceState *dev, Error **errp)
+{
+    MemStatus *mdev;
+    HotplugHandler *hotplug_dev;
+    PCMachineState *pcms;
+    PCDIMMDevice *dimm;
+    PCDIMMDeviceClass *ddc;
+    MemoryRegion *mr;
+
+    if (!mem_st->is_enabled) {
+        error_setg(errp, "memory hotplug is not supported");
+        return;
+    }
+
+    mdev = acpi_memory_slot_status(mem_st, dev, errp);
+    if (!mdev)
+        return;
+
+    mdev->is_enabled = false;
+    mdev->dimm = NULL;
+
+    hotplug_dev = qdev_get_hotplug_handler(dev);
+    pcms = PC_MACHINE(hotplug_dev);
+    dimm = PC_DIMM(dev);
+    ddc = PC_DIMM_GET_CLASS(dimm);
+    mr = ddc->get_memory_region(dimm);
+
+    memory_region_del_subregion(&pcms->hotplug_memory, mr);
+    vmstate_unregister_ram(mr, dev);
+}
+
 static const VMStateDescription vmstate_memhp_sts = {
     .name = "memory hotplug device state",
     .version_id = 1,
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 2eacac0..2f3d1df 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -273,7 +273,7 @@ void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
     dev->alias_required_for_version = required_for_version;
 }
 
-static HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev)
+HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev)
 {
 HotplugHandler *hotplug_ctrl = NULL;
 
diff --git a/include/hw/acpi/memory_hotplug.h b/include/hw/acpi/memory_hotplug.h
index c437a85..6b8d9f7 100644
--- a/include/hw/acpi/memory_hotplug.h
+++ b/include/hw/acpi/memory_hotplug.h
@@ -32,6 +32,8 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
 void acpi_memory_unplug_request_cb(ACPIREGS *ar, qemu_irq irq,
                                    MemHotplugState *mem_st,
                                    DeviceState *dev, Error **errp);
+void acpi_memory_unplug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
+                           DeviceState *dev, Error **errp);
 
 extern const VMStateDescription vmstate_memory_hotplug;
 #define VMSTATE_MEMORY_HOTPLUG(memhp, state) \
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 15a226f..03d6239 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -266,6 +266,7 @@ int qdev_init(DeviceState *dev) QEMU_WARN_UNUSED_RESULT;
 void qdev_init_nofail(DeviceState *dev);
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
  int required_for_version);
+HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev);
 void qdev_unplug(DeviceState *dev, Error **errp);
 void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp);
-- 
1.9.3

[Qemu-devel] [PATCH v2 04/12] acpi, mem-hotplug: Add unplug request cb for memory device.

2015-02-03 Thread Zhu Guihua

From: Tang Chen tangc...@cn.fujitsu.com

Memory hot unplug is an asynchronous, two-step procedure.
When the unplug operation happens, the unplug request cb is called first.
When the guest OS has finished handling the unplug, the unplug cb is
called to do the real removal of the device.

This patch adds the unplug request cb for memory devices. It adds a new
bool member named is_removing to MemStatus, indicating that the memory
slot is being removed. acpi_memory_unplug_request_cb() sets it to true
and sends an SCI to the guest.
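The two-step protocol can be modelled as a small state machine on one slot. This is a simplified sketch, not the QEMU implementation: `struct slot_status`, unplug_request() and unplug_finish() are hypothetical stand-ins for MemStatus, acpi_memory_unplug_request_cb() and the later unplug cb, the SCI is reduced to a counter, and the check that a request is pending before finishing is an assumption added for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of MemStatus for a single memory slot. */
struct slot_status {
    void *dimm;          /* device occupying the slot, NULL when empty */
    bool is_enabled;
    bool is_removing;
};

/* Step 1 (request): mark the slot as being removed and notify the guest.
 * Returns false if the slot holds no device. */
static bool unplug_request(struct slot_status *s, int *sci_count)
{
    if (s->dimm == NULL) {
        return false;
    }
    s->is_removing = true;
    (*sci_count)++;      /* stands in for acpi_send_gpe_event() */
    return true;
}

/* Step 2 (unplug): runs after the guest OS has handled the eject; only
 * now is the slot state actually torn down. */
static bool unplug_finish(struct slot_status *s)
{
    if (!s->is_removing) {
        return false;    /* assumption for the sketch: no pending request */
    }
    s->is_enabled = false;
    s->is_removing = false;
    s->dimm = NULL;
    return true;
}
```

Splitting the work this way keeps the device in place until the guest has released the memory, which is why the request step only flips a flag and raises an interrupt.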

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/memory_hotplug.c | 16 
 include/hw/acpi/memory_hotplug.h |  4 
 2 files changed, 20 insertions(+)

diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index f30d8f9..3d3c1ec 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -205,6 +205,22 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
     acpi_send_gpe_event(ar, irq, ACPI_MEMORY_HOTPLUG_STATUS);
 }
 
+void acpi_memory_unplug_request_cb(ACPIREGS *ar, qemu_irq irq,
+                                   MemHotplugState *mem_st,
+                                   DeviceState *dev, Error **errp)
+{
+    MemStatus *mdev;
+
+    mdev = acpi_memory_slot_status(mem_st, dev, errp);
+    if (!mdev)
+        return;
+
+    mdev->is_removing = true;
+
+    /* Do ACPI magic */
+    acpi_send_gpe_event(ar, irq, ACPI_MEMORY_HOTPLUG_STATUS);
+}
+
 static const VMStateDescription vmstate_memhp_sts = {
     .name = "memory hotplug device state",
     .version_id = 1,
diff --git a/include/hw/acpi/memory_hotplug.h b/include/hw/acpi/memory_hotplug.h
index 7bbf8a0..c437a85 100644
--- a/include/hw/acpi/memory_hotplug.h
+++ b/include/hw/acpi/memory_hotplug.h
@@ -11,6 +11,7 @@ typedef struct MemStatus {
     DeviceState *dimm;
     bool is_enabled;
     bool is_inserting;
+    bool is_removing;
     uint32_t ost_event;
     uint32_t ost_status;
 } MemStatus;
@@ -28,6 +29,9 @@ void acpi_memory_hotplug_init(MemoryRegion *as, Object *owner,
 
 void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
                          DeviceState *dev, Error **errp);
+void acpi_memory_unplug_request_cb(ACPIREGS *ar, qemu_irq irq,
+                                   MemHotplugState *mem_st,
+                                   DeviceState *dev, Error **errp);
 
 extern const VMStateDescription vmstate_memory_hotplug;
 #define VMSTATE_MEMORY_HOTPLUG(memhp, state) \
-- 
1.9.3

Re: [Qemu-devel] [PATCH v4] sheepdog: selectable object size support

2015-02-03 Thread Teruaki Ishizaki


(2015/02/02 15:52), Liu Yuan wrote:

On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:

Previously, qemu block driver of sheepdog used hard-coded VDI object size.
This patch enables users to handle block_size_shift value for
calculating VDI object size.

When you start qemu, you don't need to specify additional command option.

But when you create the VDI which doesn't have default object size
with qemu-img command, you specify block_size_shift option.

If you want to create a VDI of 8MB (1 << 23) object size,
you need to specify following command option.

  # qemu-img create -o block_size_shift=23 sheepdog:test1 100M


Is it possible to make this option more user friendly? such as

  $ qemu-img create -o object_size=8M sheepdog:test 1G


At first, I thought that object_size would be user friendly.
However, Sheepdog already stores block_size_shift in the inode
layout, and it plays the same role as object_size.

Not every 'object_size' value can be expressed as a
'block_size_shift'. On the other hand, every 'block_size_shift'
maps exactly to an 'object_size'.

I think the existing layout shouldn't be changed lightly, and it
seems difficult for users to specify an object_size value that maps
exactly to a 'block_size_shift'.

Thanks,
Teruaki Ishizaki
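The asymmetry described above comes from the object size always being 1 << block_size_shift: every shift maps to exactly one object size, but only power-of-two sizes map back to a shift (8 MiB gives 23, while 12 MiB has no matching shift). A minimal sketch of the two conversions, using hypothetical helper names rather than the sheepdog driver's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Every block_size_shift maps to exactly one object size.
 * The caller must ensure shift < 64. */
static uint64_t object_size_from_shift(unsigned shift)
{
    return UINT64_C(1) << shift;
}

/* The reverse only works for powers of two; returns false otherwise. */
static bool shift_from_object_size(uint64_t size, unsigned *shift)
{
    unsigned s = 0;

    if (size == 0 || (size & (size - 1)) != 0) {
        return false;    /* zero, or not a power of two */
    }
    while ((UINT64_C(1) << s) != size) {
        s++;
    }
    *shift = s;
    return true;
}
```

An object_size option would therefore need to reject (or round) values for which shift_from_object_size() fails, which is the usability problem raised in the reply.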

Re: [Qemu-devel] [PATCH 3/9] exec: RCUify AddressSpaceDispatch

2015-02-03 Thread Fam Zheng

On Tue, 02/03 13:52, Paolo Bonzini wrote:
 Note that even after this patch, most callers of address_space_*
 functions must still be under the big QEMU lock, otherwise the memory
 region returned by address_space_translate can disappear as soon as
 address_space_translate returns.  This will be fixed in the next part
 of this series.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  cpu-exec.c  | 25 -
  cpus.c  |  2 +-
  cputlb.c|  8 ++--
  exec.c  | 34 ++
  hw/i386/intel_iommu.c   |  3 +++
  hw/pci-host/apb.c   |  1 +
  hw/ppc/spapr_iommu.c|  1 +
  include/exec/exec-all.h |  1 +
  8 files changed, 63 insertions(+), 12 deletions(-)
 
 diff --git a/cpu-exec.c b/cpu-exec.c
 index 98f968d..adb939a 100644
 --- a/cpu-exec.c
 +++ b/cpu-exec.c
 @@ -26,6 +26,7 @@
  #include "qemu/timer.h"
  #include "exec/address-spaces.h"
  #include "exec/memory-internal.h"
 +#include "qemu/rcu.h"
  
  /* -icount align implementation. */
  
 @@ -146,8 +147,27 @@ void cpu_resume_from_signal(CPUState *cpu, void *puc)
  
  void cpu_reload_memory_map(CPUState *cpu)
  {
 +AddressSpaceDispatch *d;
 +
 +if (qemu_in_vcpu_thread()) {
 +/* Do not let the guest prolong the critical section as much as it
 + * as it desires.
 + *
 + * Currently, this is prevented by the I/O thread's periodinc kicking
 + * of the VCPU thread (iothread_requesting_mutex, qemu_cpu_kick_thread)
 + * but this will go away once TCG's execution moves out of the global
 + * mutex.
 + *
 + * This pair matches cpu_exec's rcu_read_lock()/rcu_read_unlock(), which
 + * only protects cpu->as->dispatch.  Since we reload it below, we can
 + * split the critical section.
 + */
 +rcu_read_unlock();
 +rcu_read_lock();
 +}
 +
  /* The CPU and TLB are protected by the iothread lock.  */
 -AddressSpaceDispatch *d = cpu->as->dispatch;
 +d = atomic_rcu_read(&cpu->as->dispatch);
  cpu->memory_dispatch = d;
  tlb_flush(cpu, 1);
  }
 @@ -362,6 +382,8 @@ int cpu_exec(CPUArchState *env)
   * an instruction scheduling constraint on modern architectures.  */
  smp_mb();
  
 +rcu_read_lock();
 +
  if (unlikely(exit_request)) {
  cpu->exit_request = 1;
  }
 @@ -564,6 +586,7 @@ int cpu_exec(CPUArchState *env)
  } /* for(;;) */
  
  cc->cpu_exec_exit(cpu);
 +rcu_read_unlock();
  
  /* fail safe : never use current_cpu outside cpu_exec() */
  current_cpu = NULL;
 diff --git a/cpus.c b/cpus.c
 index 0cdd1d7..b826fac 100644
 --- a/cpus.c
 +++ b/cpus.c
 @@ -1104,7 +1104,7 @@ bool qemu_cpu_is_self(CPUState *cpu)
  return qemu_thread_is_self(cpu->thread);
  }
  
 -static bool qemu_in_vcpu_thread(void)
 +bool qemu_in_vcpu_thread(void)
  {
  return current_cpu && qemu_cpu_is_self(current_cpu);
  }
 diff --git a/cputlb.c b/cputlb.c
 index f92db5e..38f2151 100644
 --- a/cputlb.c
 +++ b/cputlb.c
 @@ -243,8 +243,12 @@ static void tlb_add_large_page(CPUArchState *env, target_ulong vaddr,
  }
  
  /* Add a new TLB entry. At most one entry for a given virtual address
 -   is permitted. Only a single TARGET_PAGE_SIZE region is mapped, the
 -   supplied size is only used by tlb_flush_page.  */
 + * is permitted. Only a single TARGET_PAGE_SIZE region is mapped, the
 + * supplied size is only used by tlb_flush_page.
 + *
 + * Called from TCG-generated code, which is under an RCU read-side
 + * critical section.
 + */
  void tlb_set_page(CPUState *cpu, target_ulong vaddr,
hwaddr paddr, int prot,
int mmu_idx, target_ulong size)
 diff --git a/exec.c b/exec.c
 index 1854c95..a423def 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -115,6 +115,8 @@ struct PhysPageEntry {
  typedef PhysPageEntry Node[P_L2_SIZE];
  
  typedef struct PhysPageMap {
 +struct rcu_head rcu;
 +
  unsigned sections_nb;
  unsigned sections_nb_alloc;
  unsigned nodes_nb;
 @@ -124,6 +126,8 @@ typedef struct PhysPageMap {
  } PhysPageMap;
  
  struct AddressSpaceDispatch {
 +struct rcu_head rcu;
 +
  /* This is a multi-level map on the physical address space.
   * The bottom level has pointers to MemoryRegionSections.
   */
 @@ -315,6 +319,7 @@ bool memory_region_is_unassigned(MemoryRegion *mr)
   mr != &io_mem_watch;
  }
  
 +/* Called from RCU critical section */
  static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch *d,
  hwaddr addr,
  bool resolve_subpage)
 @@ -330,6 +335,7 @@ static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch *d,
  return section;
  }
  
 +/* Called from RCU critical section */
  static MemoryRegionSection *
  address_space_translate_internal(AddressSpaceDispatch *d, hwaddr

[Qemu-devel] [PATCH v2 05/12] acpi, piix4: Add memory hot unplug request support for piix4.

2015-02-03 Thread Zhu Guihua

From: Hu Tao hu...@cn.fujitsu.com

Call memory unplug request cb in piix4_device_unplug_request_cb().

Signed-off-by: Hu Tao hu...@cn.fujitsu.com
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/piix4.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 14d40a0..8bd9007 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -361,7 +361,11 @@ static void piix4_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 {
     PIIX4PMState *s = PIIX4_PM(hotplug_dev);
 
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+    if (s->acpi_memory_hotplug.is_enabled &&
+        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        acpi_memory_unplug_request_cb(&s->ar, s->irq, &s->acpi_memory_hotplug,
+                                      dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
         acpi_pcihp_device_unplug_cb(&s->ar, s->irq, &s->acpi_pci_hotplug, dev,
                                     errp);
     } else {
-- 
1.9.3

[Qemu-devel] [PATCH v2 07/12] pc-dimm: Add memory hot unplug request support for pc-dimm.

2015-02-03 Thread Zhu Guihua

From: Tang Chen tangc...@cn.fujitsu.com

Implement memory unplug request cb for pc-dimm, and call it in
pc_machine_device_unplug_request_cb().

Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/i386/pc.c | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 850b6b5..ddc0190 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1641,6 +1641,26 @@ out:
     error_propagate(errp, local_err);
 }
 
+static void pc_dimm_unplug_request(HotplugHandler *hotplug_dev,
+                                   DeviceState *dev, Error **errp)
+{
+    HotplugHandlerClass *hhc;
+    Error *local_err = NULL;
+    PCMachineState *pcms = PC_MACHINE(hotplug_dev);
+
+    if (!pcms->acpi_dev) {
+        error_setg(&local_err,
+                   "memory hotplug is not enabled: missing acpi device");
+        goto out;
+    }
+
+    hhc = HOTPLUG_HANDLER_GET_CLASS(pcms->acpi_dev);
+    hhc->unplug_request(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err);
+
+out:
+    error_propagate(errp, local_err);
+}
+
 static void pc_cpu_plug(HotplugHandler *hotplug_dev,
                         DeviceState *dev, Error **errp)
 {
@@ -1683,8 +1703,12 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
 static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
                                                 DeviceState *dev, Error **errp)
 {
-    error_setg(errp, "acpi: device unplug request for not supported device"
-               " type: %s", object_get_typename(OBJECT(dev)));
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        pc_dimm_unplug_request(hotplug_dev, dev, errp);
+    } else {
+        error_setg(errp, "acpi: device unplug request for not supported device"
+                   " type: %s", object_get_typename(OBJECT(dev)));
+    }
 }
 
 static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
-- 
1.9.3

Re: [Qemu-devel] [PATCH 4/9] rcu: introduce RCU-enabled QLIST

2015-02-03 Thread Fam Zheng

On Tue, 02/03 13:52, Paolo Bonzini wrote:
 From: Mike Day ncm...@ncultra.org
 
 Add RCU-enabled variants on the existing bsd DQ facility. Each
 operation has the same interface as the existing (non-RCU)
 version. Also, each operation is implemented as macro.
 
 Using the RCU-enabled QLIST, existing QLIST users will be able to
 convert to RCU without using a different list interface.
 
 Signed-off-by: Mike Day ncm...@ncultra.org
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  hw/9pfs/virtio-9p-synth.c |   2 +-
  include/qemu/queue.h  |  11 --
  include/qemu/rcu_queue.h  | 134 
  tests/Makefile|   5 +-
  tests/test-rcu-list.c | 306 ++
  5 files changed, 445 insertions(+), 13 deletions(-)
  create mode 100644 include/qemu/rcu_queue.h
  create mode 100644 tests/test-rcu-list.c
 
 diff --git a/hw/9pfs/virtio-9p-synth.c b/hw/9pfs/virtio-9p-synth.c
 index e75aa87..a0ab9a8 100644
 --- a/hw/9pfs/virtio-9p-synth.c
 +++ b/hw/9pfs/virtio-9p-synth.c
 @@ -18,7 +18,7 @@
  #include "fsdev/qemu-fsdev.h"
  #include "virtio-9p-synth.h"
  #include "qemu/rcu.h"
 -
 +#include "qemu/rcu_queue.h"
  #include <sys/stat.h>
  
  /* Root node for synth file system */
 diff --git a/include/qemu/queue.h b/include/qemu/queue.h
 index c602797..8094150 100644
 --- a/include/qemu/queue.h
 +++ b/include/qemu/queue.h
 @@ -139,17 +139,6 @@ struct { \
  (elm)->field.le_prev = &(head)->lh_first;   \
  } while (/*CONSTCOND*/0)
  
 -#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {\
 -(elm)->field.le_prev = &(head)->lh_first;   \
 -(elm)->field.le_next = (head)->lh_first;\
 -smp_wmb(); /* fill elm before linking it */ \
 -if ((head)->lh_first != NULL)  {\
 -(head)->lh_first->field.le_prev = &(elm)->field.le_next;\
 -}   \
 -(head)->lh_first = (elm);   \
 -smp_wmb();  \
 -} while (/* CONSTCOND*/0)
 -
  #define QLIST_REMOVE(elm, field) do {   \
  if ((elm)->field.le_next != NULL)   \
  (elm)->field.le_next->field.le_prev =   \
 diff --git a/include/qemu/rcu_queue.h b/include/qemu/rcu_queue.h
 new file mode 100644
 index 000..3aca7a5
 --- /dev/null
 +++ b/include/qemu/rcu_queue.h
 @@ -0,0 +1,134 @@
 +#ifndef QEMU_RCU_QUEUE_H
 +#define QEMU_RCU_QUEUE_H
 +
 +/*
 + * rcu_queue.h
 + *
 + * RCU-friendly versions of the queue.h primitives.
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2.1 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, write to the Free Software
 + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 + *
 + * Copyright (c) 2013 Mike D. Day, IBM Corporation.
 + *
 + * IBM's contributions to this file may be relicensed under LGPLv2 or later.
 + */
 +
 +#include "qemu/queue.h"
 +#include "qemu/atomic.h"
 +
 +#ifdef __cplusplus
 +extern "C" {
 +#endif
 +
 +
 +/*
 + * List access methods.
 + */
 +#define QLIST_EMPTY_RCU(head) (atomic_rcu_read(&(head)->lh_first) == NULL)
 +#define QLIST_FIRST_RCU(head) (atomic_rcu_read(&(head)->lh_first))
 +#define QLIST_NEXT_RCU(elm, field) (atomic_rcu_read(&(elm)->field.le_next))
 +
 +/*
 + * List functions.
 + */
 +
 +
 +/*
 + *  The difference between atomic_read/set and atomic_rcu_read/set
 + *  is in the including of a read/write memory barrier to the volatile
 + *  access. atomic_rcu_* macros include the memory barrier, the
 + *  plain atomic macros do not. Therefore, it should be correct to
 + *  issue a series of reads or writes to the same element using only
 + *  the atomic_* macro, until the last read or write, which should be
 + *  atomic_rcu_* to introduce a read or write memory barrier as
 + *  appropriate.
 + */
 +
 +/* Upon publication of the listelm-next value, list readers
 + * will see the new node when following next pointers from
 + * antecedent nodes, but may not see the new node when following
 + * prev pointers from subsequent nodes until after the RCU grace
 + * period expires.
 + * see linux/include/rculist.h
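The QLIST_INSERT_HEAD_RCU() macro removed from queue.h above relies on one ordering rule: fill in the new element completely, then publish it behind a write barrier so that readers never observe a half-initialized node. As a rough sketch of the same publish/subscribe pattern in portable C11 (not QEMU's rcu_queue.h API; `node`, `list`, `rcu_insert_head()` and `rcu_sum()` are hypothetical names, the list is singly linked for brevity, and writers are assumed to be serialized by an external lock), a release store plays the role of smp_wmb() plus the head update, and an acquire load stands in for atomic_rcu_read():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical singly linked list; QEMU's QLIST is doubly linked. */
struct node {
    int value;
    struct node *_Atomic next;
};

struct list {
    struct node *_Atomic first;
};

/* Writer side (assumed serialized by a lock held by the caller). */
static void rcu_insert_head(struct list *head, struct node *elm)
{
    assert(elm != NULL);
    /* Fill the new element completely before it becomes reachable. */
    atomic_store_explicit(&elm->next,
                          atomic_load_explicit(&head->first,
                                               memory_order_relaxed),
                          memory_order_relaxed);
    /* Release store publishes elm: a reader that sees the new head pointer
     * also sees elm->value and elm->next (the job smp_wmb() does). */
    atomic_store_explicit(&head->first, elm, memory_order_release);
}

/* Reader side: acquire loads stand in for atomic_rcu_read(). */
static int rcu_sum(struct list *head)
{
    int sum = 0;
    for (struct node *n = atomic_load_explicit(&head->first,
                                               memory_order_acquire);
         n != NULL;
         n = atomic_load_explicit(&n->next, memory_order_acquire)) {
        sum += n->value;
    }
    return sum;
}
```

Acquire is a conservative choice here; atomic_rcu_read() itself is documented as a consume-style read, which most compilers implement as acquire anyway.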

Re: [Qemu-devel] [PATCH v2 0/1] dataplane vs. endianness

2015-02-03 Thread David Gibson

On Tue, Jan 27, 2015 at 04:15:23PM +1100, David Gibson wrote:
 On Mon, Jan 26, 2015 at 05:26:41PM +0100, Cornelia Huck wrote:
  Stefan:
  
  Here's v2 of my endianness patch for dataplane, with the extraneous
  vdev argument dropped from get_desc().
  
  I originally planned to send my virtio-1 patchset as well, but I haven't
  found the time for it; therefore, I think this should be applied
  independently.
  
  David: I take it your r-b still holds?
 
 Yes.  I also retested this version and it still works fine.
 
 Tested-by: David Gibson da...@gibson.dropbear.id.au

Any word on getting this merged?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



[Qemu-devel] [PATCH v2 10/12] acpi, ich9: Add memory hot unplug support for ich9.

2015-02-03 Thread Zhu Guihua

From: Tang Chen tangc...@cn.fujitsu.com

Call memory unplug cb in ich9_pm_device_unplug_cb().

Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/ich9.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index b85eed4..3a8d712 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -413,8 +413,14 @@ void ich9_pm_device_unplug_request_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
 void ich9_pm_device_unplug_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
   Error **errp)
 {
-error_setg(errp, "acpi: device unplug for not supported device"
-" type: %s", object_get_typename(OBJECT(dev)));
+if (pm->acpi_memory_hotplug.is_enabled &&
+object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+acpi_memory_unplug_cb(&pm->acpi_regs, pm->irq,
+  &pm->acpi_memory_hotplug, dev, errp);
+} else {
+error_setg(errp, "acpi: device unplug for not supported device"
+" type: %s", object_get_typename(OBJECT(dev)));
+}
 }
 
 void ich9_pm_ospm_status(AcpiDeviceIf *adev, ACPIOSTInfoList ***list)
-- 
1.9.3

[Qemu-devel] [PATCH v2 12/12] acpi: Add hardware implementation for memory hot unplug.

2015-02-03 Thread Zhu Guihua

From: Tang Chen tangc...@cn.fujitsu.com

This patch adds a new bit to the memory hotplug IO port indicating that
_EJ0 has been evaluated by the guest OS, and calls the pc-dimm unplug cb
to do the real removal.

Signed-off-by: Hu Tao hu...@cn.fujitsu.com
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 docs/specs/acpi_mem_hotplug.txt   |  9 +++--
 hw/acpi/memory_hotplug.c  | 25 ++---
 hw/i386/acpi-dsdt-mem-hotplug.dsl | 11 ++-
 hw/i386/ssdt-mem.dsl  |  5 +
 include/hw/acpi/pc-hotplug.h  |  2 ++
 5 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/docs/specs/acpi_mem_hotplug.txt b/docs/specs/acpi_mem_hotplug.txt
index 1290994..9805f1a 100644
--- a/docs/specs/acpi_mem_hotplug.txt
+++ b/docs/specs/acpi_mem_hotplug.txt
@@ -19,7 +19,9 @@ Memory hot-plug interface (IO port 0xa00-0xa17, 1-4 byte access):
   1: Device insert event, used to distinguish device for which
  no device check event to OSPM was issued.
  It's valid only when bit 1 is set.
-  2-7: reserved and should be ignored by OSPM
+  2: Device remove event, used to distinguish device for which
+ no device check event to OSPM was issued.
+  3-7: reserved and should be ignored by OSPM
   [0x15-0x17] reserved
 
   write access:
@@ -35,7 +37,10 @@ Memory hot-plug interface (IO port 0xa00-0xa17, 1-4 byte access):
   1: if set to 1 clears device insert event, set by OSPM
  after it has emitted device check event for the
  selected memory device
-  2-7: reserved, OSPM must clear them before writing to register
+  2: if set to 1 clears device remove event, set by OSPM
+ after it has emitted device check event for the
+ selected memory device
+  3-7: reserved, OSPM must clear them before writing to register
 
 Selecting memory device slot beyond present range has no effect on platform:
- write accesses to memory hot-plug registers not documented above are
diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index 3ae9629..a6fc3b3 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -3,6 +3,7 @@
 #include "hw/mem/pc-dimm.h"
 #include "hw/i386/pc.h"
 #include "hw/boards.h"
+#include "hw/qdev-core.h"
 #include "trace.h"
 #include "qapi-event.h"
 
@@ -76,6 +77,7 @@ static uint64_t acpi_memory_hotplug_read(void *opaque, hwaddr addr,
 case 0x14: /* pack and return is_* fields */
 val |= mdev->is_enabled   ? 1 : 0;
 val |= mdev->is_inserting ? 2 : 0;
+val |= mdev->is_removing  ? 4 : 0;
 trace_mhp_acpi_read_flags(mem_st->selector, val);
 break;
 default:
@@ -91,6 +93,8 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data,
 MemHotplugState *mem_st = opaque;
 MemStatus *mdev;
 ACPIOSTInfo *info;
+DeviceState *dev = NULL;
+HotplugHandler *hotplug_ctrl = NULL;
 
 if (!mem_st->dev_count) {
 return;
@@ -122,21 +126,36 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data,
 mdev = &mem_st->devs[mem_st->selector];
 mdev->ost_status = data;
 trace_mhp_acpi_write_ost_status(mem_st->selector, mdev->ost_status);
-/* TODO: implement memory removal on guest signal */
 
 info = acpi_memory_device_status(mem_st->selector, mdev);
 qapi_event_send_acpi_device_ost(info, &error_abort);
 qapi_free_ACPIOSTInfo(info);
 break;
-case 0x14:
+case 0x14: /* set is_* fields */
 mdev = &mem_st->devs[mem_st->selector];
+
 if (data & 2) { /* clear insert event */
 mdev->is_inserting  = false;
 trace_mhp_acpi_clear_insert_evt(mem_st->selector);
+} else if (data & 4) { /* request removal of device */
+mdev->is_removing = false;
+trace_mhp_acpi_clear_remove_evt(mem_st->selector);
+/*
+ * QEmu memory hot unplug is an asynchronous procedure. QEmu first
+ * calls the pc-dimm unplug request cb to send an SCI to the guest. When
+ * the guest OS has finished handling the SCI, it evaluates ACPI _EJ0, and
+ * QEmu calls the pc-dimm unplug cb to remove the memory device.
+ */
+dev = DEVICE(mdev->dimm);
+hotplug_ctrl = qdev_get_hotplug_handler(dev);
+/* Call pc-dimm unplug cb. */
+hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
 }
+
+break;
+default:
 break;
 }
-
 }
 static const MemoryRegionOps acpi_memory_hotplug_ops = {
 .read = acpi_memory_hotplug_read,
diff --git a/hw/i386/acpi-dsdt-mem-hotplug.dsl b/hw/i386/acpi-dsdt-mem-hotplug.dsl
index 2a36c47..b53bf77 100644
--- a/hw/i386/acpi-dsdt-mem-hotplug.dsl
+++ b/hw/i386/acpi-dsdt-mem-hotplug.dsl
@@ -50,6 +50,7 @@
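Per the updated acpi_mem_hotplug.txt and the read/write handlers in this patch, offset 0x14 packs three per-slot flags on read (bit 0 enabled, bit 1 insert event, bit 2 remove event); on write, bit 1 clears the insert event, while bit 2 clears the remove event and triggers the real removal. A minimal standalone model of that encoding might look like the following; the struct `slot_status` and its `ejected` flag (standing in for the hotplug_handler_unplug() call) are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the per-slot flags exposed at offset 0x14. */
struct slot_status {
    bool is_enabled;
    bool is_inserting;
    bool is_removing;
    bool ejected; /* set when the model runs the removal path */
};

/* Read side: pack the flags into bits 0, 1 and 2. */
static uint32_t slot_flags_read(const struct slot_status *s)
{
    uint32_t val = 0;

    val |= s->is_enabled   ? 1 : 0;
    val |= s->is_inserting ? 2 : 0;
    val |= s->is_removing  ? 4 : 0;
    return val;
}

/* Write side: bit 1 clears the insert event; bit 2 clears the remove
 * event and starts the real removal. */
static void slot_flags_write(struct slot_status *s, uint32_t data)
{
    if (data & 2) {
        s->is_inserting = false;
    } else if (data & 4) {
        s->is_removing = false;
        s->ejected = true;
    }
}
```

Because the handler uses `if (data & 2) ... else if (data & 4)`, a single write with both bits set only clears the insert event; the model keeps that behavior.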

[Qemu-devel] [PATCH v2 02/12] acpi, mem-hotplug: Add acpi_memory_slot_status() to get MemStatus.

2015-02-03 Thread Zhu Guihua

From: Tang Chen tangc...@cn.fujitsu.com

Add a new helper named acpi_memory_slot_status() to obtain the status of
a single memory slot. This is done because the procedure will be used by
other functions in upcoming patches.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/memory_hotplug.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index c6580da..ddbe01b 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -163,29 +163,40 @@ void acpi_memory_hotplug_init(MemoryRegion *as, Object *owner,
 memory_region_add_subregion(as, ACPI_MEMORY_HOTPLUG_BASE, &state->io);
 }
 
-void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
- DeviceState *dev, Error **errp)
+static MemStatus *
+acpi_memory_slot_status(MemHotplugState *mem_st,
+DeviceState *dev, Error **errp)
 {
-MemStatus *mdev;
 Error *local_err = NULL;
 int slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP,
    &local_err);
 
 if (local_err) {
 error_propagate(errp, local_err);
-return;
+return NULL;
 }
 
 if (slot >= mem_st->dev_count) {
 char *dev_path = object_get_canonical_path(OBJECT(dev));
-error_setg(errp, "acpi_memory_plug_cb: "
+error_setg(errp, "acpi_memory_get_slot_status_descriptor: "
 "device [%s] returned invalid memory slot[%d]",
-dev_path, slot);
+   dev_path, slot);
 g_free(dev_path);
-return;
+return NULL;
 }
 
-mdev = &mem_st->devs[slot];
+return &mem_st->devs[slot];
+}
+
+void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
+ DeviceState *dev, Error **errp)
+{
+MemStatus *mdev;
+
+mdev = acpi_memory_slot_status(mem_st, dev, errp);
+if (!mdev)
+return;
+
 mdev->dimm = dev;
 mdev->is_enabled = true;
 mdev->is_inserting = true;
-- 
1.9.3
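The refactoring above follows a common C pattern: a lookup helper that validates the index, reports a descriptive error, and returns NULL so that each caller can bail out early. A self-contained sketch, using hypothetical types and an `snprintf()` buffer in place of QEMU's `Error **` plumbing (and, unlike the patch, also rejecting negative slots):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for MemStatus and MemHotplugState. */
struct mem_status { int dimm_id; };

struct mem_state {
    struct mem_status *devs;
    int dev_count;
};

/* Return the descriptor for a slot, or NULL with a message in err. */
static struct mem_status *slot_status(struct mem_state *st, int slot,
                                      char *err, size_t errlen)
{
    assert(st != NULL);
    if (slot < 0 || slot >= st->dev_count) {
        snprintf(err, errlen, "invalid memory slot[%d]", slot);
        return NULL;
    }
    return &st->devs[slot];
}
```

A caller then mirrors the new acpi_memory_plug_cb(): look up the slot, return immediately on NULL, and only touch the descriptor after the check.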

[Qemu-devel] [PATCH v2 00/12] QEmu memory hot unplug support

2015-02-03 Thread Zhu Guihua

Memory hot unplug is an asynchronous, two-step procedure.
When the unplug operation happens, the unplug request cb is called first.
When the guest OS has finished handling the unplug, the unplug cb is
called to do the real removal of the device.
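The two-step flow can be modeled as a small state machine. This is only an illustrative sketch with hypothetical names (`struct dimm`, `unplug_request_cb()`, `unplug_cb()`); in the series, the first step raises an SCI through the ACPI hotplug code and the second is triggered when the guest writes the eject bit after evaluating _EJ0:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states of one DIMM during hot unplug. */
enum unplug_state { PLUGGED, REMOVE_REQUESTED, REMOVED };

struct dimm {
    enum unplug_state state;
    bool sci_pending; /* stands in for the SCI raised towards the guest */
};

/* Step 1: the unplug request cb only asks the guest to release the memory. */
static void unplug_request_cb(struct dimm *d)
{
    if (d->state == PLUGGED) {
        d->state = REMOVE_REQUESTED;
        d->sci_pending = true;
    }
}

/* Step 2: the unplug cb does the real removal, but only after a request
 * was made and the guest has handled the SCI. */
static bool unplug_cb(struct dimm *d)
{
    if (d->state != REMOVE_REQUESTED || d->sci_pending) {
        return false;
    }
    d->state = REMOVED;
    return true;
}
```

Splitting the callbacks this way means the device never disappears while the guest may still be using the memory.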

This series depends on the following patchset.
[PATCH v2 0/5] Common unplug and unplug request cb for memory and CPU hot-unplug
https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg03929.html

v2:
- add a generic ACPI helper to send GPE events
- unparent the object in the PC_MACHINE unplug cb
- update the description in acpi_mem_hotplug.txt
- combine the last two patches of the previous version
- clean up external state in acpi_memory_unplug_cb

Hu Tao (1):
  acpi, piix4: Add memory hot unplug request support for piix4.

Tang Chen (11):
  acpi, mem-hotplug: Use PC_DIMM_SLOT_PROP in acpi_memory_plug_cb().
  acpi, mem-hotplug: Add acpi_memory_slot_status() to get MemStatus.
  acpi, mem-hotplug: Add acpi_memory_hotplug_sci() to rise sci for
memory hotplug.
  acpi, mem-hotplug: Add unplug request cb for memory device.
  acpi, ich9: Add memory hot unplug request support for ich9.
  pc-dimm: Add memory hot unplug request support for pc-dimm.
  acpi, mem-hotplug: Add unplug cb for memory device.
  acpi, piix4: Add memory hot unplug support for piix4.
  acpi, ich9: Add memory hot unplug support for ich9.
  pc-dimm: Add memory hot unplug support for pc-dimm.
  acpi: Add hardware implementation for memory hot unplug.

 docs/specs/acpi_mem_hotplug.txt   |   9 +++-
 hw/acpi/core.c|   7 +++
 hw/acpi/ich9.c|  20 +--
 hw/acpi/memory_hotplug.c  | 111 --
 hw/acpi/piix4.c   |  17 --
 hw/core/qdev.c|   2 +-
 hw/i386/acpi-dsdt-mem-hotplug.dsl |  11 +++-
 hw/i386/pc.c  |  48 +++--
 hw/i386/ssdt-mem.dsl  |   5 ++
 include/hw/acpi/acpi.h|   3 ++
 include/hw/acpi/memory_hotplug.h  |   6 +++
 include/hw/acpi/pc-hotplug.h  |   2 +
 include/hw/qdev-core.h|   1 +
 13 files changed, 211 insertions(+), 31 deletions(-)

-- 
1.9.3

[Qemu-devel] [PATCH v16 2/2] sPAPR: Implement sPAPRPHBClass::eeh_handler

2015-02-03 Thread Gavin Shan

The patch implements sPAPRPHBClass::eeh_handler so that the
EEH RTAS requests can be routed to VFIO for further handling.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 hw/ppc/spapr_pci_vfio.c | 58 +
 hw/vfio/common.c|  1 +
 2 files changed, 59 insertions(+)

diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index 144912b..b76c660 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -71,6 +71,63 @@ static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
 spapr_tce_get_iommu(tcet));
 }
 
+static int spapr_phb_vfio_eeh_handler(sPAPRPHBState *sphb, int req, int opt)
+{
+sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
+struct vfio_eeh_pe_op op;
+int cmd;
+
+memset(&op, 0, sizeof(op));
+op.argsz = sizeof(op);
+switch (req) {
+case RTAS_EEH_REQ_SET_OPTION:
+switch (opt) {
+case RTAS_EEH_DISABLE:
+cmd = VFIO_EEH_PE_DISABLE;
+break;
+case RTAS_EEH_ENABLE:
+cmd = VFIO_EEH_PE_ENABLE;
+break;
+case RTAS_EEH_THAW_IO:
+cmd = VFIO_EEH_PE_UNFREEZE_IO;
+break;
+case RTAS_EEH_THAW_DMA:
+cmd = VFIO_EEH_PE_UNFREEZE_DMA;
+break;
+default:
+return -EINVAL;
+}
+break;
+case RTAS_EEH_REQ_GET_STATE:
+cmd = VFIO_EEH_PE_GET_STATE;
+break;
+case RTAS_EEH_REQ_RESET:
+switch (opt) {
+case RTAS_SLOT_RESET_DEACTIVATE:
+cmd = VFIO_EEH_PE_RESET_DEACTIVATE;
+break;
+case RTAS_SLOT_RESET_HOT:
+cmd = VFIO_EEH_PE_RESET_HOT;
+break;
+case RTAS_SLOT_RESET_FUNDAMENTAL:
+cmd = VFIO_EEH_PE_RESET_FUNDAMENTAL;
+break;
+default:
+return -EINVAL;
+}
+break;
+case RTAS_EEH_REQ_CONFIGURE:
+cmd = VFIO_EEH_PE_CONFIGURE;
+break;
+default:
+ return -EINVAL;
+}
+
+op.op = cmd;
+return vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
+VFIO_EEH_PE_OP, &op);
+}
+
 static void spapr_phb_vfio_reset(DeviceState *qdev)
 {
 /* Do nothing */
@@ -84,6 +141,7 @@ static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
 dc->props = spapr_phb_vfio_properties;
 dc->reset = spapr_phb_vfio_reset;
 spc->finish_realize = spapr_phb_vfio_finish_realize;
+spc->eeh_handler = spapr_phb_vfio_eeh_handler;
 }
 
 static const TypeInfo spapr_phb_vfio_info = {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index cf483ff..8a10c8b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -948,6 +948,7 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
 switch (req) {
 case VFIO_CHECK_EXTENSION:
 case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
+case VFIO_EEH_PE_OP:
 break;
 default:
 /* Return an error on unknown requests */
-- 
1.8.3.2

Re: [Qemu-devel] [PATCH 2/9] exec: make iotlb RCU-friendly

2015-02-03 Thread Fam Zheng

On Tue, 02/03 13:52, Paolo Bonzini wrote:
 After the previous patch, TLBs will be flushed on every change to
 the memory mapping.  This patch augments that with synchronization
 of the MemoryRegionSections referred to in the iotlb array.
 
 With this change, it is guaranteed that iotlb_to_region will access
 the correct memory map, even once the TLB will be accessed outside
 the BQL.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  cpu-exec.c  |  6 +-
  cputlb.c|  5 ++---
  exec.c  | 13 -
  include/exec/cputlb.h   |  2 +-
  include/exec/exec-all.h |  3 ++-
  include/qom/cpu.h   |  1 +
  softmmu_template.h  |  4 ++--
  7 files changed, 21 insertions(+), 13 deletions(-)
 
 diff --git a/cpu-exec.c b/cpu-exec.c
 index 78fe382..98f968d 100644
 --- a/cpu-exec.c
 +++ b/cpu-exec.c
 @@ -24,6 +24,8 @@
  #include "qemu/atomic.h"
  #include "sysemu/qtest.h"
  #include "qemu/timer.h"
 +#include "exec/address-spaces.h"
 +#include "exec/memory-internal.h"
  
  /* -icount align implementation. */
  
 @@ -144,7 +146,9 @@ void cpu_resume_from_signal(CPUState *cpu, void *puc)
  
  void cpu_reload_memory_map(CPUState *cpu)
  {
 -/* The TLB is protected by the iothread lock.  */
 +/* The CPU and TLB are protected by the iothread lock.  */
 +AddressSpaceDispatch *d = cpu->as->dispatch;
 +cpu->memory_dispatch = d;
  tlb_flush(cpu, 1);
  }
  #endif
 diff --git a/cputlb.c b/cputlb.c
 index 3b271d4..f92db5e 100644
 --- a/cputlb.c
 +++ b/cputlb.c
 @@ -265,8 +265,7 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
  }
  
  sz = size;
 -section = address_space_translate_for_iotlb(cpu->as, paddr,
 -&xlat, &sz);
 +section = address_space_translate_for_iotlb(cpu, paddr, &xlat, &sz);
  assert(sz >= TARGET_PAGE_SIZE);
  
  #if defined(DEBUG_TLB)
 @@ -347,7 +346,7 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
  cpu_ldub_code(env1, addr);
  }
  pd = env1->iotlb[mmu_idx][page_index] & ~TARGET_PAGE_MASK;
 -mr = iotlb_to_region(cpu->as, pd);
 +mr = iotlb_to_region(cpu, pd);
  if (memory_region_is_unassigned(mr)) {
  CPUClass *cc = CPU_GET_CLASS(cpu);
  
 diff --git a/exec.c b/exec.c
 index 5a75909..1854c95 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -401,11 +401,12 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
  }
  
  MemoryRegionSection *
 -address_space_translate_for_iotlb(AddressSpace *as, hwaddr addr, hwaddr *xlat,
 -  hwaddr *plen)
 +address_space_translate_for_iotlb(CPUState *cpu, hwaddr addr,
 +  hwaddr *xlat, hwaddr *plen)
  {
  MemoryRegionSection *section;
 -section = address_space_translate_internal(as->dispatch, addr, xlat, plen, false);
 +section = address_space_translate_internal(cpu->memory_dispatch,
 +   addr, xlat, plen, false);
  
  assert(!section->mr->iommu_ops);
  return section;
 @@ -1961,9 +1962,11 @@ static uint16_t dummy_section(PhysPageMap *map, AddressSpace *as,
  return phys_section_add(map, &section);
  }
  
 -MemoryRegion *iotlb_to_region(AddressSpace *as, hwaddr index)
 +MemoryRegion *iotlb_to_region(CPUState *cpu, hwaddr index)
  {
 -return as->dispatch->map.sections[index & ~TARGET_PAGE_MASK].mr;
 +MemoryRegionSection *sections = cpu->memory_dispatch->map.sections;
 +
 +return sections[index & ~TARGET_PAGE_MASK].mr;
  }
  
  static void io_mem_init(void)
 diff --git a/include/exec/cputlb.h b/include/exec/cputlb.h
 index b8ecd6f..e0da9d7 100644
 --- a/include/exec/cputlb.h
 +++ b/include/exec/cputlb.h
 @@ -34,7 +34,7 @@ extern int tlb_flush_count;
  void tb_flush_jmp_cache(CPUState *cpu, target_ulong addr);
  
  MemoryRegionSection *
 -address_space_translate_for_iotlb(AddressSpace *as, hwaddr addr, hwaddr *xlat,
 +address_space_translate_for_iotlb(CPUState *cpu, hwaddr addr, hwaddr *xlat,
hwaddr *plen);
  hwaddr memory_region_section_get_iotlb(CPUState *cpu,
 MemoryRegionSection *section,
 diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
 index 1b30813..bb3fd37 100644
 --- a/include/exec/exec-all.h
 +++ b/include/exec/exec-all.h
 @@ -338,7 +338,8 @@ extern uintptr_t tci_tb_ptr;
  
  void phys_mem_set_alloc(void *(*alloc)(size_t, uint64_t *align));
  
 -struct MemoryRegion *iotlb_to_region(AddressSpace *as, hwaddr index);
 +struct MemoryRegion *iotlb_to_region(CPUState *cpu,
 + hwaddr index);
  bool io_mem_read(struct MemoryRegion *mr, hwaddr addr,
   uint64_t *pvalue, unsigned size);
  bool io_mem_write(struct MemoryRegion *mr, hwaddr addr,
 diff --git a/include/qom/cpu.h b/include/qom/cpu.h
 index 2098f1c..48fd6fb 100644
 --- a/include/qom/cpu.h
 +++ b/include/qom/cpu.h
 @@ -256,6 +256,7 @@
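Why tie iotlb_to_region() to the CPU's cached dispatch? As the `index & ~TARGET_PAGE_MASK` expression suggests, the sub-page bits of an iotlb entry carry an index into a `sections` array, and that index is only meaningful relative to the array it was taken from. Looking it up through cpu->memory_dispatch, the snapshot installed by cpu_reload_memory_map(), keeps entry and array consistent. A hedged sketch of the packing, assuming 4 KiB pages (TARGET_PAGE_BITS varies by target) and hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB target pages. */
#define HYP_PAGE_BITS 12
#define HYP_PAGE_SIZE ((uint64_t)1 << HYP_PAGE_BITS)
#define HYP_PAGE_MASK (~(HYP_PAGE_SIZE - 1))

/* Combine a page-aligned offset with a section index that fits in the
 * sub-page bits. */
static uint64_t iotlb_pack(uint64_t page_aligned_xlat, unsigned section_index)
{
    assert((page_aligned_xlat & ~HYP_PAGE_MASK) == 0);
    assert(section_index < HYP_PAGE_SIZE);
    return page_aligned_xlat | section_index;
}

/* Same extraction as iotlb_to_region(): keep only the sub-page bits. */
static unsigned iotlb_section_index(uint64_t iotlb)
{
    return (unsigned)(iotlb & ~HYP_PAGE_MASK);
}
```

If the sections array were swapped between filling the TLB and decoding the entry, the same small integer could name a different MemoryRegion, which is the mismatch the patch is guarding against.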

Re: [Qemu-devel] [v4 12/13] migration: Add command to set migration parameter

2015-02-03 Thread Eric Blake

On 02/03/2015 06:26 PM, Li, Liang Z wrote:

 Hmm - do we really need two parameters here?  Remember, compress
 threads is used only on the source, and decompress threads is used only on
 the destination.  Having a single parameter, 'threads', which is set to
 compression threads on source and decompression threads on destination,
 and which need not be equal between the two machines, should still work,
 right?

 
 Yes, it works. Using one parameter instead of two reduces the QMP
 command count. The side effect of using the same thread count for
 compression and decompression is a little waste if the user just wants
 to use the default settings: decompression is usually about 4 times
 faster than compression, so using more decompression threads than
 needed wastes some RAM for the per-thread data structures, about 4K
 bytes per thread. Is that acceptable?

The default setting is no compression.  The user already has to
configure things on both sides to get compression, so it is not a burden
to ask them to configure thread count on both sides correctly.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




[Qemu-devel] [PATCH v2 03/12] acpi, mem-hotplug: Add acpi_memory_hotplug_sci() to rise sci for memory hotplug.

2015-02-03 Thread Zhu Guihua

From: Tang Chen tangc...@cn.fujitsu.com

Add a new API named acpi_send_gpe_event() to set a GPE status bit and
update the SCI, and use it to send the memory hotplug SCI. This is done
because the procedure will be used by other functions in upcoming patches.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/core.c   | 7 +++
 hw/acpi/memory_hotplug.c | 6 ++
 include/hw/acpi/acpi.h   | 3 +++
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/acpi/core.c b/hw/acpi/core.c
index 51913d6..98ca994 100644
--- a/hw/acpi/core.c
+++ b/hw/acpi/core.c
@@ -666,6 +666,13 @@ uint32_t acpi_gpe_ioport_readb(ACPIREGS *ar, uint32_t addr)
 return val;
 }
 
+void acpi_send_gpe_event(ACPIREGS *ar, qemu_irq irq,
+ unsigned int hotplug_status)
+{
+ar->gpe.sts[0] |= hotplug_status;
+acpi_update_sci(ar, irq);
+}
+
 void acpi_update_sci(ACPIREGS *regs, qemu_irq irq)
 {
 int sci_level, pm1a_sts;
diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index ddbe01b..f30d8f9 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -201,10 +201,8 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
 mdev->is_enabled = true;
 mdev->is_inserting = true;
 
-/* do ACPI magic */
-ar->gpe.sts[0] |= ACPI_MEMORY_HOTPLUG_STATUS;
-acpi_update_sci(ar, irq);
-return;
+/* Do ACPI magic */
+acpi_send_gpe_event(ar, irq, ACPI_MEMORY_HOTPLUG_STATUS);
 }
 
 static const VMStateDescription vmstate_memhp_sts = {
diff --git a/include/hw/acpi/acpi.h b/include/hw/acpi/acpi.h
index 1f678b4..7a0a209 100644
--- a/include/hw/acpi/acpi.h
+++ b/include/hw/acpi/acpi.h
@@ -172,6 +172,9 @@ void acpi_gpe_reset(ACPIREGS *ar);
 void acpi_gpe_ioport_writeb(ACPIREGS *ar, uint32_t addr, uint32_t val);
 uint32_t acpi_gpe_ioport_readb(ACPIREGS *ar, uint32_t addr);
 
+void acpi_send_gpe_event(ACPIREGS *ar, qemu_irq irq,
+ unsigned int hotplug_status);
+
 void acpi_update_sci(ACPIREGS *acpi_regs, qemu_irq irq);
 
 /* acpi.c */
-- 
1.9.3
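acpi_send_gpe_event() latches a status bit and lets acpi_update_sci() recompute the interrupt level. A simplified model, assuming only the first GPE status/enable byte and ignoring the PM1 event bits that the real acpi_update_sci() also considers (the struct and the bit values are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified GPE block: first status/enable byte only. */
struct gpe_regs {
    uint8_t sts0;
    uint8_t en0;
    bool sci_line; /* stands in for the qemu_irq level */
};

/* The SCI is asserted while any enabled GPE status bit is set. */
static void update_sci(struct gpe_regs *r)
{
    r->sci_line = (r->sts0 & r->en0) != 0;
}

/* Latch the status bit, then recompute the SCI level. */
static void send_gpe_event(struct gpe_regs *r, uint8_t status_bit)
{
    r->sts0 |= status_bit;
    update_sci(r);
}
```

The status bit stays latched even when its enable bit is clear, so the SCI fires later if the guest enables that GPE.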

[Qemu-devel] [PATCH v2 01/12] acpi, mem-hotplug: Use PC_DIMM_SLOT_PROP in acpi_memory_plug_cb().

2015-02-03 Thread Zhu Guihua

From: Tang Chen tangc...@cn.fujitsu.com

Replace the string "slot" in acpi_memory_plug_cb() with the macro PC_DIMM_SLOT_PROP.

Reviewed-by: Igor Mammedov imamm...@redhat.com
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/memory_hotplug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index ed39241..c6580da 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -168,7 +168,8 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
 {
 MemStatus *mdev;
 Error *local_err = NULL;
-int slot = object_property_get_int(OBJECT(dev), "slot", &local_err);
+int slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP,
+   &local_err);
 
 if (local_err) {
 error_propagate(errp, local_err);
-- 
1.9.3

Re: [Qemu-devel] [PATCH 7/9] rcu: prod call_rcu thread when calling synchronize_rcu

2015-02-03 Thread Fam Zheng

On Tue, 02/03 13:52, Paolo Bonzini wrote:
 call_rcu operates on the principle that either there is a steady stream of
 incoming RCU callbacks, or it is not worthwhile to wake up and process the
 few that are there.
 
 This however makes it hard to assert in testcases that all RCU callbacks
 are processed.  To avoid this, make call_rcu also process callbacks if there
 is a steady stream of synchronize_rcu calls.
 
 This avoids deadlocks in the upcoming test-rcu-list unit test, which waits
 for call_rcu to reclaim all nodes that it allocates.  Especially with very
 high load on the host, call_rcu decided to wait for a few more callbacks
 to pile up, but the test was done and was not going to produce more.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  util/rcu.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)
 
 diff --git a/util/rcu.c b/util/rcu.c
 index c9c3e6e..aa9f639 100644
 --- a/util/rcu.c
 +++ b/util/rcu.c
 @@ -48,6 +48,9 @@ unsigned long rcu_gp_ctr = RCU_GP_LOCKED;
  QemuEvent rcu_gp_event;
  static QemuMutex rcu_gp_lock;
  
 +static int rcu_call_count;
 +static QemuEvent rcu_call_ready_event;
 +
  /*
   * Check whether a quiescent state was crossed between the beginning of
   * update_counter_and_wait and now.
 @@ -149,6 +152,9 @@ void synchronize_rcu(void)
  }
  
  qemu_mutex_unlock(&rcu_gp_lock);
 +if (atomic_read(&rcu_call_count)) {
 +qemu_event_set(&rcu_call_ready_event);
 +}
  }
  
  
 @@ -159,8 +165,6 @@ void synchronize_rcu(void)
   */
  static struct rcu_head dummy;
  static struct rcu_head *head = &dummy, **tail = &dummy.next;
 -static int rcu_call_count;
 -static QemuEvent rcu_call_ready_event;
  
  static void enqueue(struct rcu_head *node)
  {
 -- 
 1.8.3.1
 
 

Reviewed-by: Fam Zheng f...@redhat.com
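The change boils down to "after a grace period, wake the reclaimer if anything is queued". A minimal sketch with POSIX threads, where `call_count`, `event_set()` and `event_wait()` are hypothetical stand-ins for rcu_call_count and QemuEvent (unlike QemuEvent, which has a separate qemu_event_reset(), this event resets itself inside `event_wait()` for brevity):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int call_count;   /* queued callbacks, like rcu_call_count */
static pthread_mutex_t ev_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ev_cond = PTHREAD_COND_INITIALIZER;
static bool ev_set;             /* protected by ev_lock */

static void event_set(void)
{
    pthread_mutex_lock(&ev_lock);
    ev_set = true;
    pthread_cond_broadcast(&ev_cond);
    pthread_mutex_unlock(&ev_lock);
}

/* Reclaimer side: block until prodded, then consume the wakeup. */
static void event_wait(void)
{
    pthread_mutex_lock(&ev_lock);
    while (!ev_set) {
        pthread_cond_wait(&ev_cond, &ev_lock);
    }
    ev_set = false;
    pthread_mutex_unlock(&ev_lock);
}

/* Tail of a synchronize_rcu()-like function: a grace period just elapsed,
 * so queued callbacks can run now instead of waiting for more to pile up. */
static void after_grace_period(void)
{
    if (atomic_load(&call_count) > 0) {
        event_set();
    }
}
```

Checking the counter before setting the event avoids waking the reclaimer when nothing is queued, which matches the `if (atomic_read(&rcu_call_count))` guard in the patch.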

[Qemu-devel] [PATCH v2 11/12] pc-dimm: Add memory hot unplug support for pc-dimm.

2015-02-03 Thread Zhu Guihua

From: Tang Chen tangc...@cn.fujitsu.com

Implement the unplug cb for pc-dimm. It removes the corresponding
memory region and unregisters vmstate. Finally, it calls the memory
unplug cb to reset the memory status and does the unparenting.

Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/i386/pc.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index ddc0190..4c03ee5 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1661,6 +1661,17 @@ out:
 error_propagate(errp, local_err);
 }
 
+static void pc_dimm_unplug(HotplugHandler *hotplug_dev,
+   DeviceState *dev, Error **errp)
+{
+PCMachineState *pcms = PC_MACHINE(hotplug_dev);
+HotplugHandlerClass *hhc;
+Error *local_err = NULL;
+
+hhc = HOTPLUG_HANDLER_GET_CLASS(pcms->acpi_dev);
+hhc->unplug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err);
+}
+
 static void pc_cpu_plug(HotplugHandler *hotplug_dev,
 DeviceState *dev, Error **errp)
 {
@@ -1714,8 +1725,13 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
 DeviceState *dev, Error **errp)
 {
-error_setg(errp, "acpi: device unplug for not supported device"
-" type: %s", object_get_typename(OBJECT(dev)));
+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+pc_dimm_unplug(hotplug_dev, dev, errp);
+object_unparent(OBJECT(dev));
+} else {
+error_setg(errp, "acpi: device unplug for not supported device"
+" type: %s", object_get_typename(OBJECT(dev)));
+}
 }
 
 static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
-- 
1.9.3

Re: [Qemu-devel] [PATCH RFC 1/1] KVM: s390: Add MEMOP ioctl for reading/writing guest memory

2015-02-03 Thread Thomas Huth

On Tue, 03 Feb 2015 16:22:32 +0100
Paolo Bonzini pbonz...@redhat.com wrote:

 
 
 On 03/02/2015 16:16, Thomas Huth wrote:
  Actually, I'd prefer to keep the "virtual" in the defines for the type
  of operation below: When it comes to s390 storage keys, we likely might
  need some calls for reading and writing to physical memory, too. Then
  we could simply extend this ioctl instead of inventing a new one.
 
 Can you explain why it is necessary to read/write physical addresses
 from user space?  In the case of QEMU, I'm worried that you would have
 to invent your own memory read/write APIs that are different from
 everything else.
 
 On real s390 zPCI, does bus-master DMA update storage keys?

Ah, I was not thinking about bus-mastering/DMA here: AFAIK there are
some CPU instructions that access a parameter block in physical memory,
for example the SCLP instruction (see hw/s390x/sclp.c) - it's already
doing a cpu_physical_memory_read and ..._write for the parameters.
However, I haven't checked yet whether it is also supposed to touch
the storage keys, so if not, we also might be fine without the ioctls
for reading/writing to physical memory.

  Not really true, as you don't check it.  So "It is not used by KVM with
  the currently defined set of flags" is a better explanation.
  
  ok ... and maybe add "should be set to zero"?
 
 If you don't check it, it is misleading to document this.

True... so I'll omit that.

 Thomas

Re: [Qemu-devel] [PATCH] hw/arm/virt: explain device-to-transport mapping in create_virtio_devices()

2015-02-03 Thread Peter Maydell

On 30 January 2015 at 04:34, Laszlo Ersek ler...@redhat.com wrote:
 Peter,

 On 01/30/15 05:31, Laszlo Ersek wrote:
 Signed-off-by: Laszlo Ersek ler...@redhat.com
 ---
  hw/arm/virt.c | 32 
  1 file changed, 28 insertions(+), 4 deletions(-)

 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index 2353440..091e5ee 100644
 --- a/hw/arm/virt.c
 +++ b/hw/arm/virt.c
 @@ -441,10 +441,27 @@ static void create_virtio_devices(const VirtBoardInfo *vbi, qemu_irq *pic)
  int i;
  hwaddr size = vbi->memmap[VIRT_MMIO].size;

 -/* Note that we have to create the transports in forwards order
 - * so that command line devices are inserted lowest address first,
 - * and then add dtb nodes in reverse order so that they appear in
 - * the finished device tree lowest address first.
 +/* We create the transports in forwards order. Since qbus_realize()
 + * prepends (not appends) new child buses, the incrementing loop below will
 + * create a list of virtio-mmio buses with decreasing base addresses.
 + *
 + * When a -device option is processed from the command line,
 + * qbus_find_recursive() picks the next free virtio-mmio bus in forwards
 + * order. The upshot is that -device options in increasing command line
 + * order are mapped to virtio-mmio buses with decreasing base addresses.
 + *
 + * When this code was originally written, that arrangement ensured that the
 + * guest Linux kernel would give the lowest name (/dev/vda, eth0, etc) to
 + * the first -device on the command line. (The end-to-end order is a
 + * function of this loop, qbus_realize(), qbus_find_recursive(), and the
 + * guest kernel's name-to-address assignment strategy.)
 + *
 + * Meanwhile, the kernel's traversal seems to have been reserved; see eg.

 can you please s/reserved/reversed/?

 Result of over-editing, sorry.

Sure, no problem. I also suggest I add this para:
 *
 * In any case, the kernel makes no guarantee about the stability of
 * enumeration order of virtio devices (as demonstrated by it changing
 * between kernel versions). For reliable and stable identification
 * of disks users must use UUIDs or similar mechanisms.

-- PMM
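The ordering argument in the comment (forward creation, prepending in qbus_realize(), first-free lookup in qbus_find_recursive()) can be checked with a toy linked list. This is a hypothetical model, not QEMU's qbus code; `struct bus`, `prepend()` and `find_free()` are made-up names and the base addresses are arbitrary:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transport: a base address and at most one device. */
struct bus {
    unsigned long base;
    int device;        /* 0 = free, otherwise a device id */
    struct bus *next;
};

/* Like qbus_realize() as described above: new child buses are prepended. */
static struct bus *prepend(struct bus *head, struct bus *b)
{
    b->next = head;
    return b;
}

/* Like qbus_find_recursive() as described above: first free bus in list order. */
static struct bus *find_free(struct bus *head)
{
    for (struct bus *b = head; b != NULL; b = b->next) {
        if (b->device == 0) {
            return b;
        }
    }
    return NULL;
}
```

With three buses at 0x1000, 0x1200 and 0x1400 created in that order, the first two lookups land on 0x1400 and then 0x1200, i.e. devices in command-line order get decreasing base addresses.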

Re: [Qemu-devel] [PULL 0/9] s390x bugfixes and cleanups

2015-02-03 Thread Christian Borntraeger

Am 03.02.2015 um 14:45 schrieb Peter Maydell:
 On 3 February 2015 at 13:08, Cornelia Huck cornelia.h...@de.ibm.com wrote:
 The following changes since commit 16017c48547960539fcadb1f91d252124f442482:

   softfloat: Clarify license status (2015-01-29 16:45:45 +)

 are available in the git repository at:

   git://github.com/cohuck/qemu tags/s390x-20150203

 for you to fetch changes up to 553ce81c31e49d834b1bf635ab486695a4694333:

   pc-bios/s390-ccw: update binary (2015-02-03 13:42:40 +0100)

 
 Some bugfixes and cleanups for s390x, both in the new pci code and
 in old code.

 
 
 I'm a bit sad my fix-clang-warnings-in-s390 code patches didn't
 make it in to this pull, because I think they're the only
 remaining obstacle to my enabling warnings-are-errors in that
 build config...


These fixes are tcg code, so Alex or Richard should take these.

Christian

[Qemu-devel] [PATCH 1/9] exec: introduce cpu_reload_memory_map

2015-02-03 Thread Paolo Bonzini

For now this is a simple TLB flush.  This can change later for two
reasons:

1) an AddressSpaceDispatch will be cached in the CPUState object

2) it will not be possible to do tlb_flush once the TCG-generated code
runs outside the BQL.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 cpu-exec.c  | 6 ++
 exec.c  | 2 +-
 include/exec/exec-all.h | 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index fa506e6..78fe382 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -141,6 +141,12 @@ void cpu_resume_from_signal(CPUState *cpu, void *puc)
 cpu->exception_index = -1;
 siglongjmp(cpu->jmp_env, 1);
 }
+
+void cpu_reload_memory_map(CPUState *cpu)
+{
+/* The TLB is protected by the iothread lock.  */
+tlb_flush(cpu, 1);
+}
 #endif
 
 /* Execute a TB, and fix up the CPU state afterwards if necessary */
diff --git a/exec.c b/exec.c
index 6b79ad1..5a75909 100644
--- a/exec.c
+++ b/exec.c
@@ -2026,7 +2026,7 @@ static void tcg_commit(MemoryListener *listener)
 if (cpu->tcg_as_listener != listener) {
 continue;
 }
-tlb_flush(cpu, 1);
+cpu_reload_memory_map(cpu);
 }
 }
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 6a15448..1b30813 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -96,6 +96,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end,
   int is_cpu_write_access);
 #if !defined(CONFIG_USER_ONLY)
+void cpu_reload_memory_map(CPUState *cpu);
 void tcg_cpu_address_space_init(CPUState *cpu, AddressSpace *as);
 /* cputlb.c */
 void tlb_flush_page(CPUState *cpu, target_ulong addr);
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH RFC 1/1] KVM: s390: Add MEMOP ioctl for reading/writing guest memory

2015-02-03 Thread Paolo Bonzini



On 03/02/2015 13:11, Thomas Huth wrote:
 On s390, we've got to make sure to hold the IPTE lock while accessing
 virtual memory. So let's add an ioctl for reading and writing virtual
 memory to provide this feature for userspace, too.
 
 Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com
 Reviewed-by: Dominik Dingel din...@linux.vnet.ibm.com
 Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
 ---
  Documentation/virtual/kvm/api.txt |   44 +
  arch/s390/kvm/gaccess.c   |   22 +
  arch/s390/kvm/gaccess.h   |2 +
  arch/s390/kvm/kvm-s390.c  |   63 +
  include/uapi/linux/kvm.h  |   21 
  5 files changed, 152 insertions(+), 0 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
 index b112efc..bf44b53 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2716,6 +2716,50 @@ The fields in each entry are defined as follows:
 eax, ebx, ecx, edx: the values returned by the cpuid instruction for
   this function/index combination
  
 +4.89 KVM_GUEST_MEM_OP
 +
 +Capability: KVM_CAP_MEM_OP

Put "virtual" somewhere in the ioctl name and capability?

 +Architectures: s390
 +Type: vcpu ioctl
 +Parameters: struct kvm_guest_mem_op (in)
 +Returns: = 0 on success,
 +  < 0 on generic error (e.g. -EFAULT or -ENOMEM),
 +  > 0 if an exception occurred while walking the page tables
 +
 +Read or write data from/to the virtual memory of a VCPU.
 +
 +Parameters are specified via the following structure:
 +
 +struct kvm_guest_mem_op {
 + __u64 gaddr;/* the guest address */
 + __u64 flags;/* arch specific flags */
 + __u32 size; /* amount of bytes */
 + __u32 op;   /* type of operation */
 + __u64 buf;  /* buffer in userspace */
 + __u8 reserved[32];  /* should be set to 0 */
 +};
 +
 +The type of operation is specified in the op field, either KVM_MEMOP_VIRTREAD
 +for reading from memory, KVM_MEMOP_VIRTWRITE for writing to memory, or
 +KVM_MEMOP_CHECKVIRTREAD or KVM_MEMOP_CHECKVIRTWRITE to check whether the

Better:

#define KVM_MEMOP_READ   0
#define KVM_MEMOP_WRITE  1

and in the flags field:

#define KVM_MEMOP_F_CHECK_ONLY (1 << 1)

 +corresponding memory access would create an access exception (without
 +changing the data in the memory at the destination). In case an access
 +exception occurred while walking the MMU tables of the guest, the ioctl
 +returns a positive error number to indicate the type of exception. The
 +exception is raised directly at the corresponding VCPU if the bit
 +KVM_MEMOP_F_INJECT_EXC is set in the flags field.

KVM_MEMOP_F_INJECT_EXCEPTION.

 +The logical (virtual) start address of the memory region has to be specified
 +in the gaddr field, and the length of the region in the size field.
 +buf is the buffer supplied by the userspace application where the read data
 +should be written to for KVM_MEMOP_VIRTREAD, or where the data that should
 +be written is stored for a KVM_MEMOP_VIRTWRITE. buf can be NULL for both
 +CHECK operations.

buf is unused and can be NULL for both CHECK operations.

 +The reserved field is meant for future extensions. It must currently be
 +set to 0.

Not really true, as you don't check it.  So "It is not used by KVM with
the currently defined set of flags" is a better explanation.

Paolo

 +
  5. The kvm_run structure
  
  
 diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
 index 8a1be90..d912362 100644
 --- a/arch/s390/kvm/gaccess.c
 +++ b/arch/s390/kvm/gaccess.c
 @@ -697,6 +697,28 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva,
  }
  
  /**
 + * check_gva_range - test a range of guest virtual addresses for accessibility
 + */
 +int check_gva_range(struct kvm_vcpu *vcpu, unsigned long gva,
 + unsigned long length, int is_write)
 +{
 + unsigned long gpa;
 + unsigned long currlen;
 + int rc = 0;
 +
 + ipte_lock(vcpu);
 + while (length > 0 && !rc) {
 + currlen = min(length, PAGE_SIZE - (gva % PAGE_SIZE));
 + rc = guest_translate_address(vcpu, gva, &gpa, is_write);
 + gva += currlen;
 + length -= currlen;
 + }
 + ipte_unlock(vcpu);
 +
 + return rc;
 +}
 +
 +/**
   * kvm_s390_check_low_addr_protection - check for low-address protection
   * @ga: Guest address
   *
 diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
 index 0149cf1..268beb7 100644
 --- a/arch/s390/kvm/gaccess.h
 +++ b/arch/s390/kvm/gaccess.h
 @@ -157,6 +157,8 @@ int read_guest_lc(struct kvm_vcpu *vcpu, unsigned long gra, void *data,
  
  int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva,
   unsigned long *gpa, int write);
 +int check_gva_range(struct kvm_vcpu *vcpu, unsigned long
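A note on the check_gva_range() loop quoted above: each iteration handles one chunk that ends at the next page boundary or at the end of the range, so every chunk after the first starts page-aligned and gets its own address translation. The chunking arithmetic can be exercised on its own. This is a sketch assuming a 4096-byte `PAGE_SIZE`; `count_page_chunks()` is a hypothetical helper that records chunk start addresses instead of calling guest_translate_address(), and it does not model the early exit on `rc`.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL  /* assumed page size for this sketch */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/* Split [gva, gva + length) into chunks that never cross a page
 * boundary, mirroring the loop in check_gva_range(). Stores the start
 * of each chunk in starts[] and returns the number of chunks. */
static size_t count_page_chunks(unsigned long gva, unsigned long length,
                                unsigned long *starts, size_t max)
{
    size_t n = 0;

    while (length > 0) {
        /* Bytes left until the end of the current page, capped by length. */
        unsigned long currlen = min_ul(length, PAGE_SIZE - (gva % PAGE_SIZE));

        assert(n < max);
        starts[n++] = gva;  /* the real code translates gva here */
        gva += currlen;
        length -= currlen;
    }
    return n;
}
```

For example, a 32-byte range starting at 0x1ff0 straddles a page boundary and is split into two 16-byte chunks starting at 0x1ff0 and 0x2000.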

Re: [Qemu-devel] [PULL 0/2] OpenRISC patch queue for 2.3

2015-02-03 Thread Sebastian Macke


Hi Peter,

unfortunately you are right.

The correct line is this:

 /* invalidate lock */
-env->cpu_lock_addr = -1;
+env->lock_addr = -1;

I am sorry. It was most likely the last line I added, but I forgot that
I had already disabled system emulation.

Therefore my make process didn't complain.
Should I send an updated patch, or can you do a hot-fix?

Sebastian


On 2/3/2015 11:40 AM, Peter Maydell wrote:

On 3 February 2015 at 02:19, Jia Liu pro...@gmail.com wrote:

Hi Anthony,

This is my OpenRISC patch queue for 2.3. It has been well tested; please pull.

...it can't have been very well tested, because it doesn't
compile:

target-openrisc/interrupt.c: In function ‘openrisc_cpu_do_interrupt’:
target-openrisc/interrupt.c:58:8: error: ‘CPUOpenRISCState’ has no member named ‘cpu_lock_addr’

thanks
-- PMM

Re: [Qemu-devel] [PULL 0/2] OpenRISC patch queue for 2.3

2015-02-03 Thread Peter Maydell

On 3 February 2015 at 13:04, Sebastian Macke sebast...@macke.de wrote:
 Hi Peter,

 unfortunately you are right.

 The correct line is this:

  /* invalidate lock */
 -env->cpu_lock_addr = -1;
 +env->lock_addr = -1;

 I am sorry. It was most likely the last line I added, but I forgot that
 I had already disabled system emulation.
 Therefore my make process didn't complain.
 Should I send an updated patch, or can you do a hot-fix?

You should send an updated patch, and then Jia needs to re-test
and send a new pull request.

Somebody ought to be testing these instructions in system
emulation mode as well as linux-user...

thanks
-- PMM

Re: [Qemu-devel] [PULL 0/9] s390x bugfixes and cleanups

2015-02-03 Thread Peter Maydell

On 3 February 2015 at 13:08, Cornelia Huck cornelia.h...@de.ibm.com wrote:
 The following changes since commit 16017c48547960539fcadb1f91d252124f442482:

   softfloat: Clarify license status (2015-01-29 16:45:45 +)

 are available in the git repository at:

   git://github.com/cohuck/qemu tags/s390x-20150203

 for you to fetch changes up to 553ce81c31e49d834b1bf635ab486695a4694333:

   pc-bios/s390-ccw: update binary (2015-02-03 13:42:40 +0100)

 
 Some bugfixes and cleanups for s390x, both in the new pci code and
 in old code.

 

I'm a bit sad my fix-clang-warnings-in-s390 code patches didn't
make it in to this pull, because I think they're the only
remaining obstacle to my enabling warnings-are-errors in that
build config...

-- PMM

Re: [Qemu-devel] [PATCH] vfio: free dynamically-allocated data in instance_finalize

2015-02-03 Thread Alex Williamson

On Tue, 2015-02-03 at 13:48 +0100, Paolo Bonzini wrote:
 In order to enable out-of-BQL address space lookup, destruction of
 devices needs to be split in two phases.
 
 Unrealize is the first phase; once it completes, no new accesses will
 be started, but memory accesses that are already pending can still
 complete.
 
 The second part is freeing the device, which only happens once all memory
 accesses are complete.  At this point the reference count has dropped to
 zero and an RCU grace period must have completed (because the RCU-protected
 FlatViews hold a reference to the device via memory_region_ref).  This is
 when instance_finalize is called.
 
 Freeing data belongs in an instance_finalize callback, because the
 dynamically allocated memory can still be used after unrealize by the
 pending memory accesses.
 
 In the case of VFIO, the unrealize callback is too early to munmap the
 BARs.  The munmap must be delayed until memory accesses are complete.
 To do this, split vfio_unmap_bars in two.  The removal step, now called
 vfio_unregister_bars, remains in vfio_exitfn.  The reclamation step
 is vfio_unmap_bars and is moved to the instance_finalize callback.
 
 Similarly, quirk MemoryRegions have to be removed during
 vfio_unregister_bars, but freeing the data structure must be delayed
 to vfio_unmap_bars.
 
 Cc: Alex Williamson alex.william...@redhat.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
   This patch is part of the third installment of the RCU work.
   Sending it out separately for Alex to review it.
 
  hw/vfio/pci.c |   78 +-
  1 file changed, 68 insertions(+), 10 deletions(-)

Looks good to me.  I don't see any external dependencies, so do you want
me to pull this in through my branch?  Thanks,

Alex

 diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
 index 014a92c..69d4a33 100644
 --- a/hw/vfio/pci.c
 +++ b/hw/vfio/pci.c
 @@ -1997,12 +1997,23 @@ static void vfio_vga_quirk_setup(VFIOPCIDevice *vdev)
  
  static void vfio_vga_quirk_teardown(VFIOPCIDevice *vdev)
  {
 +VFIOQuirk *quirk;
 +int i;
 +
 +for (i = 0; i < ARRAY_SIZE(vdev->vga.region); i++) {
 +QLIST_FOREACH(quirk, &vdev->vga.region[i].quirks, next) {
 +memory_region_del_subregion(&vdev->vga.region[i].mem, &quirk->mem);
 +}
 +}
 +}
 +
 +static void vfio_vga_quirk_free(VFIOPCIDevice *vdev)
 +{
  int i;
  
  for (i = 0; i < ARRAY_SIZE(vdev->vga.region); i++) {
  while (!QLIST_EMPTY(&vdev->vga.region[i].quirks)) {
  VFIOQuirk *quirk = QLIST_FIRST(&vdev->vga.region[i].quirks);
 -memory_region_del_subregion(&vdev->vga.region[i].mem, &quirk->mem);
  object_unparent(OBJECT(&quirk->mem));
  QLIST_REMOVE(quirk, next);
  g_free(quirk);
 @@ -2023,10 +2034,19 @@ static void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr)
  static void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
  {
  VFIOBAR *bar = &vdev->bars[nr];
 +VFIOQuirk *quirk;
 +
 +QLIST_FOREACH(quirk, &bar->quirks, next) {
 +memory_region_del_subregion(&bar->region.mem, &quirk->mem);
 +}
 +}
 +
 +static void vfio_bar_quirk_free(VFIOPCIDevice *vdev, int nr)
 +{
 +VFIOBAR *bar = &vdev->bars[nr];
  
  while (!QLIST_EMPTY(&bar->quirks)) {
  VFIOQuirk *quirk = QLIST_FIRST(&bar->quirks);
 -memory_region_del_subregion(&bar->region.mem, &quirk->mem);
  object_unparent(OBJECT(&quirk->mem));
  QLIST_REMOVE(quirk, next);
  g_free(quirk);
 @@ -2282,7 +2302,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
  }
  }
  
 -static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
 +static void vfio_unregister_bar(VFIOPCIDevice *vdev, int nr)
  {
  VFIOBAR *bar = &vdev->bars[nr];
  
 @@ -2293,10 +2313,25 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
  vfio_bar_quirk_teardown(vdev, nr);
  
  memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem);
 -munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem));
  
  if (vdev->msix && vdev->msix->table_bar == nr) {
  memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem);
 +}
 +}
 +
 +static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
 +{
 +VFIOBAR *bar = &vdev->bars[nr];
 +
 +if (!bar->region.size) {
 +return;
 +}
 +
 +vfio_bar_quirk_free(vdev, nr);
 +
 +munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem));
 +
 +if (vdev->msix && vdev->msix->table_bar == nr) {
  munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem));
  }
  }
 @@ -2413,6 +2448,19 @@ static void vfio_unmap_bars(VFIOPCIDevice *vdev)
  }
  
  if (vdev->has_vga) {
 +vfio_vga_quirk_free(vdev);
 +}
 +}
 +
 +static void vfio_unregister_bars(VFIOPCIDevice *vdev)
 +{
 +int i;
 +
 +for (i = 0; i < PCI_ROM_SLOT; i++) {
 +vfio_unregister_bar(vdev, i);
 +}
 +
 +if
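The two-phase teardown described in the commit message (unregister at unrealize time so no new accesses start, free only when the last reference goes away) can be modelled in a few lines. This is a simplified, single-threaded sketch: `Device`, `dev_new()`, `dev_ref()`, `dev_unref()`, `dev_unregister()` and `dev_finalize()` are hypothetical names, a plain counter stands in for QOM reference counting, and there is no real RCU. It only shows why freeing the BAR has to wait for the final unref.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a device with a mapped BAR. */
typedef struct {
    int refcount;     /* owner's reference plus in-flight users */
    bool registered;  /* visible to new memory accesses? */
    char *bar;        /* stands in for the mmap()ed BAR */
} Device;

static int finalized;  /* number of dev_finalize() calls, for testing */

static Device *dev_new(void)
{
    Device *d = calloc(1, sizeof(*d));

    assert(d != NULL);
    d->refcount = 1;  /* the owner's reference */
    d->registered = true;
    d->bar = malloc(64);
    assert(d->bar != NULL);
    return d;
}

/* Phase 1 (like unrealize): stop new accesses, keep the data alive. */
static void dev_unregister(Device *d)
{
    d->registered = false;
}

/* Phase 2 (like instance_finalize): only now release the BAR. */
static void dev_finalize(Device *d)
{
    free(d->bar);
    free(d);
    finalized++;
}

/* A pending access holds a reference while it runs. */
static void dev_ref(Device *d)
{
    d->refcount++;
}

static void dev_unref(Device *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        dev_finalize(d);
    }
}
```

In this model, an access that took its reference before dev_unregister() can still touch `bar` after the owner drops its own reference; the memory is released only when that access calls dev_unref().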

Re: [Qemu-devel] [PATCH] block: introduce BDRV_REQUEST_MAX_SECTORS

2015-02-03 Thread Denis V. Lunev


On 03/02/15 17:30, Peter Lieven wrote:

Am 03.02.2015 um 14:29 schrieb Denis V. Lunev:

On 03/02/15 15:12, Peter Lieven wrote:

we check and adjust request sizes at several places with
sometimes inconsistent checks or default values:
  INT_MAX
  INT_MAX >> BDRV_SECTOR_BITS
  UINT_MAX >> BDRV_SECTOR_BITS
  SIZE_MAX >> BDRV_SECTOR_BITS

This patch introduces a macro for the maximum number of sectors allowed
per request and uses it in several places.

Signed-off-by: Peter Lieven p...@kamp.de
---
  block.c   | 19 ---
  hw/block/virtio-blk.c |  4 ++--
  include/block/block.h |  3 +++
  3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index 8272ef9..4e58b35 100644
--- a/block.c
+++ b/block.c
@@ -2671,7 +2671,7 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
  static int bdrv_check_request(BlockDriverState *bs, int64_t sector_num,
int nb_sectors)
  {
-if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
+if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
  return -EIO;
  }
@@ -2758,7 +2758,7 @@ static int bdrv_rw_co(BlockDriverState *bs, int64_t sector_num, uint8_t *buf,
  .iov_len = nb_sectors * BDRV_SECTOR_SIZE,
  };
-if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
+if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
  return -EINVAL;
  }
@@ -2826,13 +2826,10 @@ int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags)
  }
for (;;) {
-nb_sectors = target_sectors - sector_num;
+nb_sectors = MIN(target_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS);
  if (nb_sectors <= 0) {
  return 0;
  }
-if (nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
-nb_sectors = INT_MAX / BDRV_SECTOR_SIZE;
-}
  ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n);
  if (ret < 0) {
  error_report("error getting block status at sector %" PRId64 ": %s",
@@ -3167,7 +3164,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
  BdrvRequestFlags flags)
  {
-if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
+if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
  return -EINVAL;
  }
@@ -3202,8 +3199,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
  struct iovec iov = {0};
  int ret = 0;
-int max_write_zeroes = bs->bl.max_write_zeroes ?
-   bs->bl.max_write_zeroes : INT_MAX;
+int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_write_zeroes,
+ BDRV_REQUEST_MAX_SECTORS);
 while (nb_sectors > 0 && !ret) {
  int num = nb_sectors;
@@ -3458,7 +3455,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
  BdrvRequestFlags flags)
  {
-if (nb_sectors < 0 || nb_sectors > (INT_MAX >> BDRV_SECTOR_BITS)) {
+if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
  return -EINVAL;
  }
@@ -5120,7 +5117,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
  return 0;
  }
-max_discard = bs->bl.max_discard ? bs->bl.max_discard : INT_MAX;
+max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);

  while (nb_sectors > 0) {
  int ret;
  int num = nb_sectors;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 8c51a29..1a8a176 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -381,7 +381,7 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
  }
 max_xfer_len = blk_get_max_transfer_length(mrb->reqs[0]->dev->blk);
-max_xfer_len = MIN_NON_ZERO(max_xfer_len, INT_MAX);
+max_xfer_len = MIN_NON_ZERO(max_xfer_len, BDRV_REQUEST_MAX_SECTORS);

 qsort(mrb->reqs, mrb->num_reqs, sizeof(*mrb->reqs),
 multireq_compare);
@@ -447,7 +447,7 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
  uint64_t nb_sectors = size >> BDRV_SECTOR_BITS;
  uint64_t total_sectors;
-if (nb_sectors > INT_MAX) {
+if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
  return false;
  }
  if (sector & dev->sector_mask) {
diff --git a/include/block/block.h b/include/block/block.h
index 3082d2b..25a6d62 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -83,6 +83,9 @@ typedef enum {
  #define BDRV_SECTOR_SIZE   (1ULL << BDRV_SECTOR_BITS)
  #define BDRV_SECTOR_MASK   ~(BDRV_SECTOR_SIZE - 1)

+#define BDRV_REQUEST_MAX_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, \
+ INT_MAX >> BDRV_SECTOR_BITS)
+
  /*
   * Allocation status flags
   * BDRV_BLOCK_DATA: data is read from bs->file or another file

Reviewed-by: Denis V. Lunev
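As a worked check of the new limit: with 512-byte sectors (BDRV_SECTOR_BITS = 9), INT_MAX >> 9 is 4194303, while SIZE_MAX >> 9 is 8388607 on 32-bit hosts and far larger on 64-bit hosts, so the INT_MAX term always wins and the macro evaluates to 4194303 sectors, i.e. 2147483136 bytes, just under 2 GiB. The sketch below re-derives that value and shows the clamping pattern used in bdrv_make_zero(). MIN and MIN_NON_ZERO are local re-definitions for this example (MIN_NON_ZERO treats 0 as "no limit"), not copies of QEMU's headers, and `next_chunk()` is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define BDRV_SECTOR_BITS 9
#define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS)

/* Local re-definitions for this sketch. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
/* Smaller of a and b, where 0 means "no limit". */
#define MIN_NON_ZERO(a, b) ((a) == 0 ? (b) : (b) == 0 ? (a) : MIN(a, b))

#define BDRV_REQUEST_MAX_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, \
                                     INT_MAX >> BDRV_SECTOR_BITS)

/* Sectors to handle in the next iteration of a loop like the one in
 * bdrv_make_zero(): never more than the per-request limit, and 0 once
 * the target has been reached. */
static int64_t next_chunk(int64_t target_sectors, int64_t sector_num)
{
    int64_t remaining = target_sectors - sector_num;

    return MIN(remaining, (int64_t)BDRV_REQUEST_MAX_SECTORS);
}
```

The same MIN_NON_ZERO() form is what lets a driver limit such as bl.max_discard of 0 fall back to the global cap rather than to zero.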

Re: [Qemu-devel] [PULL 0/9] s390x bugfixes and cleanups

2015-02-03 Thread Peter Maydell

On 3 February 2015 at 13:08, Cornelia Huck cornelia.h...@de.ibm.com wrote:
 The following changes since commit 16017c48547960539fcadb1f91d252124f442482:

   softfloat: Clarify license status (2015-01-29 16:45:45 +)

 are available in the git repository at:

   git://github.com/cohuck/qemu tags/s390x-20150203

 for you to fetch changes up to 553ce81c31e49d834b1bf635ab486695a4694333:

   pc-bios/s390-ccw: update binary (2015-02-03 13:42:40 +0100)

 
 Some bugfixes and cleanups for s390x, both in the new pci code and
 in old code.

 

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH 0/4] block: Drop BDS.filename

2015-02-03 Thread Kevin Wolf

Am 03.02.2015 um 14:48 hat Max Reitz geschrieben:
 On 2015-02-03 at 04:32, Kevin Wolf wrote:
 Am 24.09.2014 um 21:48 hat Max Reitz geschrieben:
 The BDS filename field is generally only used when opening disk images
 or emitting error or warning messages, the only exception to this rule
 is the map command of qemu-img. However, using exact_filename there
 instead should not be a problem. Therefore, we can drop the filename
 field from the BlockDriverState and use a function instead which builds
 the filename from scratch when called.
 
 This is slower than reading a static char array but the problem of that
 static array is that it may become obsolete due to changes in any
 BlockDriverState or in the BDS graph. Using a function which rebuilds
 the filename every time it is called resolves this problem.
 
 The disadvantage of worse performance is negligible, on the other hand.
 After patch 2 of this series, which replaces some queries of
 BDS.filename by reads from somewhere else (mostly BDS.exact_filename),
 the filename field is only used when a disk image is opened or some
 message should be emitted, both of which cases do not suffer from the
 performance hit.
 Surprisingly (or not), this one needs rebasing.
 
 Well...
 
 I tried it and it doesn't look too hard, but it's a little bit more than
 what I'm comfortable with doing while applying a series.
 
 I admire your courage, but I'm not sure whether this series is ready
 for being applied at all. First we (or I) will have to look into how
 users like libvirt which identify a BDS based on the filename can
 break from applying this series.

Well, I haven't reviewed it, so I can't tell. It didn't have a
(Self-)NACK and it's still on your list of to-be-merged patches, so I
took a look.  You're talking about courage - but I just wasn't
courageous enough yet to attack your larger series... ;-)

Kevin

[Qemu-devel] Looking for Outreachy sponsors for QEMU, libvirt, and KVM internships (was Outreach Program for Women)

2015-02-03 Thread Stefan Hajnoczi

Outreach Program for Women is being renamed to Outreachy.  The new website
is: http://outreachy.org/

What is Outreachy?
Outreachy helps people from underrepresented groups join the open
source community through a 12-week full-time paid internship.

The format is similar to Google Summer of Code.  Instead of funding
university students, the focus is on funding women (cis and trans),
trans men, and genderqueer people.

Last year QEMU participated with one intern, Maria, who developed a
qcow2 image format fuzzer to find input validation bugs in QEMU's qcow2
block driver.

GNOME, the Linux kernel community, and other projects have also been
participating successfully for years.

What is the level of sponsorship?
Sponsorship is $6,500 per intern.  Sponsors can choose their mentor if desired,
otherwise we have experienced mentors who can participate.

If your company wants to be active in growing the open source
community, this is a great way to engage without administering your
own internship program!

Dates:
 * Funding commitment: Monday, February 16
 * Participating orgs announced: February 17
 * Application deadline for interns: March 24
 * Internship dates: May 25 to August 25

Sponsors are listed for recognition on the Outreachy website and can
promote job openings.

How do QEMU, libvirt, and KVM participate?
We try to participate in both Outreachy and Google Summer of Code each
year.  QEMU acts as an umbrella organization for libvirt and KVM.  We
have experienced mentors and are able to add new mentors who are active
contributors to QEMU, libvirt, or KVM.

Full info for organizations:
https://wiki.gnome.org/Outreachy/Admin/InfoForOrgs

Please let me know if you have any questions.

Stefan

[Qemu-devel] [PATCH v3 8/8] tcg: Remove unused opcodes

2015-02-03 Thread Richard Henderson

We no longer need INDEX_op_end to terminate the list, nor do we
need 5 forms of nop, since we just remove the TCGOp instead.

Reviewed-by: Bastian Koppelmann kbast...@mail.uni-paderborn.de
Signed-off-by: Richard Henderson r...@twiddle.net
---
 tcg/tcg-opc.h |  9 -
 tcg/tcg.c |  7 ++-
 tci.c | 13 -
 3 files changed, 2 insertions(+), 27 deletions(-)

diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 042d442..42d0cfe 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -27,15 +27,6 @@
  */
 
 /* predefined ops */
-DEF(end, 0, 0, 0, TCG_OPF_NOT_PRESENT) /* must be kept first */
-DEF(nop, 0, 0, 0, TCG_OPF_NOT_PRESENT)
-DEF(nop1, 0, 0, 1, TCG_OPF_NOT_PRESENT)
-DEF(nop2, 0, 0, 2, TCG_OPF_NOT_PRESENT)
-DEF(nop3, 0, 0, 3, TCG_OPF_NOT_PRESENT)
-
-/* variable number of parameters */
-DEF(nopn, 0, 0, 1, TCG_OPF_NOT_PRESENT)
-
 DEF(discard, 1, 0, 0, TCG_OPF_NOT_PRESENT)
 DEF(set_label, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_NOT_PRESENT)
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 4115e8b..3841e99 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1260,7 +1260,7 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
 s->gen_first_op_idx = next;
 }
 
-*op = (TCGOp){ .opc = INDEX_op_nop, .next = -1, .prev = -1 };
+memset(op, -1, sizeof(*op));
 
 #ifdef CONFIG_PROFILER
 s->del_op_count++;
@@ -1385,8 +1385,6 @@ static void tcg_liveness_analysis(TCGContext *s)
 }
 break;
 case INDEX_op_debug_insn_start:
-case INDEX_op_nop:
-case INDEX_op_end:
 break;
 case INDEX_op_discard:
 /* mark the temporary as dead */
@@ -2244,7 +2242,7 @@ void tcg_dump_op_count(FILE *f, fprintf_function cpu_fprintf)
 {
 int i;
 
-for(i = INDEX_op_end; i < NB_OPS; i++) {
+for (i = 0; i < NB_OPS; i++) {
 cpu_fprintf(f, "%s %" PRId64 "\n", tcg_op_defs[i].name,
 tcg_table_op_count[i]);
 }
@@ -2328,7 +2326,6 @@ static inline int tcg_gen_code_common(TCGContext *s,
 tcg_reg_alloc_movi(s, args, dead_args, sync_args);
 break;
 case INDEX_op_debug_insn_start:
-case INDEX_op_nop:
 break;
 case INDEX_op_discard:
 temp_dead(s, args[0]);
diff --git a/tci.c b/tci.c
index 4711ee4..28292b3 100644
--- a/tci.c
+++ b/tci.c
@@ -506,19 +506,6 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
 tb_ptr += 2;
 
 switch (opc) {
-case INDEX_op_end:
-case INDEX_op_nop:
-break;
-case INDEX_op_nop1:
-case INDEX_op_nop2:
-case INDEX_op_nop3:
-case INDEX_op_nopn:
-case INDEX_op_discard:
-TODO();
-break;
-case INDEX_op_set_label:
-TODO();
-break;
 case INDEX_op_call:
 t0 = tci_read_ri(tb_ptr);
 #if TCG_TARGET_REG_BITS == 32
-- 
2.1.0

[Qemu-devel] [PATCH v3 5/8] tcg: Put opcodes in a linked list

2015-02-03 Thread Richard Henderson

The previous setup required ops and args to be completely sequential,
and was error prone when it came to both iteration and optimization.

Signed-off-by: Richard Henderson r...@twiddle.net
---
 include/exec/gen-icount.h |  22 ++-
 tcg/optimize.c| 286 ++-
 tcg/tcg-op.c  | 190 ---
 tcg/tcg.c | 376 +++---
 tcg/tcg.h |  58 ---
 5 files changed, 431 insertions(+), 501 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index a37a61d..6e5b012 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -11,8 +11,8 @@ static int exitreq_label;
 
 static inline void gen_tb_start(TranslationBlock *tb)
 {
-TCGv_i32 count;
-TCGv_i32 flag;
+TCGv_i32 count, flag, imm;
+int i;
 
 exitreq_label = gen_new_label();
 flag = tcg_temp_new_i32();
@@ -21,16 +21,25 @@ static inline void gen_tb_start(TranslationBlock *tb)
 tcg_gen_brcondi_i32(TCG_COND_NE, flag, 0, exitreq_label);
 tcg_temp_free_i32(flag);
 
-if (!(tb->cflags & CF_USE_ICOUNT))
+if (!(tb->cflags & CF_USE_ICOUNT)) {
 return;
+}
 
 icount_label = gen_new_label();
 count = tcg_temp_local_new_i32();
 tcg_gen_ld_i32(count, cpu_env,
-ENV_OFFSET + offsetof(CPUState, icount_decr.u32));
+
+imm = tcg_temp_new_i32();
+tcg_gen_movi_i32(imm, 0xdeadbeef);
+
 /* This is a horrid hack to allow fixing up the value later.  */
-icount_arg = tcg_ctx.gen_opparam_ptr + 1;
-tcg_gen_subi_i32(count, count, 0xdeadbeef);
+i = tcg_ctx.gen_last_op_idx;
+i = tcg_ctx.gen_op_buf[i].args;
+icount_arg = &tcg_ctx.gen_opparam_buf[i + 1];
+
+tcg_gen_sub_i32(count, count, imm);
+tcg_temp_free_i32(imm);
 
 tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, icount_label);
 tcg_gen_st16_i32(count, cpu_env,
@@ -49,7 +58,8 @@ static void gen_tb_end(TranslationBlock *tb, int num_insns)
 tcg_gen_exit_tb((uintptr_t)tb + TB_EXIT_ICOUNT_EXPIRED);
 }
 
-*tcg_ctx.gen_opc_ptr = INDEX_op_end;
+/* Terminate the linked list.  */
+tcg_ctx.gen_op_buf[tcg_ctx.gen_last_op_idx].next = -1;
 }
 
 static inline void gen_io_start(void)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 34ae3c2..f2b8acf 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -162,13 +162,13 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
 return false;
 }
 
-static void tcg_opt_gen_mov(TCGContext *s, int op_index, TCGArg *gen_args,
+static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
 TCGOpcode old_op, TCGArg dst, TCGArg src)
 {
 TCGOpcode new_op = op_to_mov(old_op);
 tcg_target_ulong mask;
 
-s->gen_opc_buf[op_index] = new_op;
+op->opc = new_op;
 
 reset_temp(dst);
 mask = temps[src].mask;
@@ -193,17 +193,17 @@ static void tcg_opt_gen_mov(TCGContext *s, int op_index, TCGArg *gen_args,
 temps[src].next_copy = dst;
 }
 
-gen_args[0] = dst;
-gen_args[1] = src;
+args[0] = dst;
+args[1] = src;
 }
 
-static void tcg_opt_gen_movi(TCGContext *s, int op_index, TCGArg *gen_args,
+static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
  TCGOpcode old_op, TCGArg dst, TCGArg val)
 {
 TCGOpcode new_op = op_to_movi(old_op);
 tcg_target_ulong mask;
 
-s->gen_opc_buf[op_index] = new_op;
+op->opc = new_op;
 
 reset_temp(dst);
 temps[dst].state = TCG_TEMP_CONST;
@@ -215,8 +215,8 @@ static void tcg_opt_gen_movi(TCGContext *s, int op_index, TCGArg *gen_args,
 }
 temps[dst].mask = mask;
 
-gen_args[0] = dst;
-gen_args[1] = val;
+args[0] = dst;
+args[1] = val;
 }
 
 static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
@@ -533,11 +533,9 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 }
 
 /* Propagate constants and copies, fold constant expressions. */
-static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
-TCGArg *args, TCGOpDef *tcg_op_defs)
+static void tcg_constant_folding(TCGContext *s)
 {
-int nb_ops, op_index, nb_temps, nb_globals;
-TCGArg *gen_args;
+int oi, oi_next, nb_temps, nb_globals;
 
 /* Array VALS has an element for each temp.
If this temp holds a constant then its value is kept in VALS' element.
@@ -548,24 +546,23 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
 nb_globals = s->nb_globals;
 reset_all_temps(nb_temps);
 
-nb_ops = tcg_opc_ptr - s->gen_opc_buf;
-gen_args = args;
-for (op_index = 0; op_index < nb_ops; op_index++) {
-TCGOpcode op = s->gen_opc_buf[op_index];
-const TCGOpDef *def = &tcg_op_defs[op];
+for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
 tcg_target_ulong mask, partmask, affected;
-int nb_oargs, nb_iargs, nb_args, i;
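The core idea of this patch is to keep ops in an array but chain them through `prev`/`next` indices, so an op can be dropped by relinking its neighbours instead of being rewritten as a nop. A minimal sketch of that idea follows. `Op`, `OpList` and the `oplist_*()` functions are simplified hypothetical stand-ins rather than QEMU's TCGOp/TCGContext; -1 marks the ends of the list, and a removed slot is poisoned with memset(op, -1, ...) in the same way tcg_op_remove() does in patch 8/8 of this series.

```c
#include <assert.h>
#include <string.h>

#define MAX_OPS 16

/* Simplified stand-in for TCGOp: an opcode plus list links. */
typedef struct {
    int opc;
    int prev;  /* index of previous op, or -1 */
    int next;  /* index of next op, or -1 */
} Op;

typedef struct {
    Op buf[MAX_OPS];
    int nb_ops;  /* slots used in buf */
    int first;   /* index of first live op, or -1 */
    int last;    /* index of last live op, or -1 */
} OpList;

static void oplist_init(OpList *s)
{
    s->nb_ops = 0;
    s->first = s->last = -1;
}

/* Append an op at the tail of the list; returns its index. */
static int oplist_emit(OpList *s, int opc)
{
    int oi = s->nb_ops++;

    assert(oi < MAX_OPS);
    s->buf[oi] = (Op){ .opc = opc, .prev = s->last, .next = -1 };
    if (s->last >= 0) {
        s->buf[s->last].next = oi;
    } else {
        s->first = oi;
    }
    s->last = oi;
    return oi;
}

/* Unlink op oi; no other entry in buf moves. */
static void oplist_remove(OpList *s, int oi)
{
    Op *op = &s->buf[oi];

    if (op->prev >= 0) {
        s->buf[op->prev].next = op->next;
    } else {
        s->first = op->next;
    }
    if (op->next >= 0) {
        s->buf[op->next].prev = op->prev;
    } else {
        s->last = op->prev;
    }
    memset(op, -1, sizeof(*op));  /* poison the dead slot */
}

/* Walk the live ops in order, copying their opcodes into out[]. */
static int oplist_collect(const OpList *s, int *out)
{
    int n = 0;

    for (int oi = s->first; oi >= 0; oi = s->buf[oi].next) {
        out[n++] = s->buf[oi].opc;
    }
    return n;
}
```

Iteration follows the same `for (oi = first; oi >= 0; oi = next)` shape as the new tcg_constant_folding() loop, which is what makes removal during optimization cheap.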

[Qemu-devel] [PATCH v3 2/8] tcg: Reduce ifdefs in tcg-op.c

2015-02-03 Thread Richard Henderson

Almost completely eliminates the ifdefs in this file, improving
confidence in the lesser used 32-bit builds.

Reviewed-by: Bastian Koppelmann kbast...@mail.uni-paderborn.de
Signed-off-by: Richard Henderson r...@twiddle.net
---
 tcg/tcg-op.c | 449 +++
 1 file changed, 207 insertions(+), 242 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index a6fd0a6..5305f1d 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -25,6 +25,15 @@
 #include "tcg.h"
 #include "tcg-op.h"
 
+/* Reduce the number of ifdefs below.  This assumes that all uses of
+   TCGV_HIGH and TCGV_LOW are properly protected by a conditional that
+   the compiler can eliminate.  */
+#if TCG_TARGET_REG_BITS == 64
+extern TCGv_i32 TCGV_LOW_link_error(TCGv_i64);
+extern TCGv_i32 TCGV_HIGH_link_error(TCGv_i64);
+#define TCGV_LOW  TCGV_LOW_link_error
+#define TCGV_HIGH TCGV_HIGH_link_error
+#endif
 
 void tcg_gen_op0(TCGContext *ctx, TCGOpcode opc)
 {
@@ -901,11 +910,14 @@ void tcg_gen_subi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 
 void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
 {
-#if TCG_TARGET_REG_BITS == 32
-tcg_gen_andi_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2);
-tcg_gen_andi_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 >> 32);
-#else
 TCGv_i64 t0;
+
+if (TCG_TARGET_REG_BITS == 32) {
+tcg_gen_andi_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2);
+tcg_gen_andi_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 >> 32);
+return;
+}
+
 /* Some cases can be optimized here.  */
 switch (arg2) {
 case 0:
@@ -937,15 +949,15 @@ void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
 t0 = tcg_const_i64(arg2);
 tcg_gen_and_i64(ret, arg1, t0);
 tcg_temp_free_i64(t0);
-#endif
 }
 
 void tcg_gen_ori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 {
-#if TCG_TARGET_REG_BITS == 32
-tcg_gen_ori_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2);
-tcg_gen_ori_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 >> 32);
-#else
+if (TCG_TARGET_REG_BITS == 32) {
+tcg_gen_ori_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2);
+tcg_gen_ori_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 >> 32);
+return;
+}
 /* Some cases can be optimized here.  */
 if (arg2 == -1) {
 tcg_gen_movi_i64(ret, -1);
@@ -956,15 +968,15 @@ void tcg_gen_ori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 tcg_gen_or_i64(ret, arg1, t0);
 tcg_temp_free_i64(t0);
 }
-#endif
 }
 
 void tcg_gen_xori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 {
-#if TCG_TARGET_REG_BITS == 32
-tcg_gen_xori_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2);
-tcg_gen_xori_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 >> 32);
-#else
+if (TCG_TARGET_REG_BITS == 32) {
+tcg_gen_xori_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2);
+tcg_gen_xori_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 >> 32);
+return;
+}
 /* Some cases can be optimized here.  */
 if (arg2 == 0) {
 tcg_gen_mov_i64(ret, arg1);
@@ -976,10 +988,8 @@ void tcg_gen_xori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 tcg_gen_xor_i64(ret, arg1, t0);
 tcg_temp_free_i64(t0);
 }
-#endif
 }
 
-#if TCG_TARGET_REG_BITS == 32
 static inline void tcg_gen_shifti_i64(TCGv_i64 ret, TCGv_i64 arg1,
   unsigned c, bool right, bool arith)
 {
@@ -1031,23 +1041,10 @@ static inline void tcg_gen_shifti_i64(TCGv_i64 ret, TCGv_i64 arg1,
 
 void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
 {
-tcg_gen_shifti_i64(ret, arg1, arg2, 0, 0);
-}
-
-void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
-{
-tcg_gen_shifti_i64(ret, arg1, arg2, 1, 0);
-}
-
-void tcg_gen_sari_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
-{
-tcg_gen_shifti_i64(ret, arg1, arg2, 1, 1);
-}
-#else /* TCG_TARGET_REG_SIZE == 64 */
-void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
-{
 tcg_debug_assert(arg2 < 64);
-if (arg2 == 0) {
+if (TCG_TARGET_REG_BITS == 32) {
+tcg_gen_shifti_i64(ret, arg1, arg2, 0, 0);
+} else if (arg2 == 0) {
 tcg_gen_mov_i64(ret, arg1);
 } else {
 TCGv_i64 t0 = tcg_const_i64(arg2);
@@ -1059,7 +1056,9 @@ void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
 void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
 {
 tcg_debug_assert(arg2 < 64);
-if (arg2 == 0) {
+if (TCG_TARGET_REG_BITS == 32) {
+tcg_gen_shifti_i64(ret, arg1, arg2, 1, 0);
+} else if (arg2 == 0) {
 tcg_gen_mov_i64(ret, arg1);
 } else {
 TCGv_i64 t0 = tcg_const_i64(arg2);
@@ -1071,7 +1070,9 @@ void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
 void tcg_gen_sari_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
 {
 tcg_debug_assert(arg2 < 64);
-if (arg2 == 0) {
+if (TCG_TARGET_REG_BITS == 32) {
+tcg_gen_shifti_i64(ret, arg1, arg2, 1, 1);
+} else if (arg2 == 0) {

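On 32-bit hosts the patch still lowers tcg_gen_andi_i64(), tcg_gen_ori_i64() and tcg_gen_xori_i64() into two 32-bit ops: the low half gets the immediate truncated to 32 bits and the high half gets `arg2 >> 32`. The plain-C model below is only a sanity check of that split, not TCG code; `Pair64` and the `pair_*` helpers are made-up names standing in for the TCGV_LOW/TCGV_HIGH halves.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model: a 64-bit value held as two 32-bit halves, the way a
   32-bit host keeps a TCGv_i64 in TCGV_LOW/TCGV_HIGH. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} Pair64;

static Pair64 pair_from_u64(uint64_t v)
{
    Pair64 p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t pair_to_u64(Pair64 p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Low half uses the low 32 bits of the immediate, high half uses imm >> 32. */
static Pair64 pair_andi(Pair64 a, uint64_t imm)
{
    Pair64 r = { a.lo & (uint32_t)imm, a.hi & (uint32_t)(imm >> 32) };
    return r;
}

static Pair64 pair_ori(Pair64 a, uint64_t imm)
{
    Pair64 r = { a.lo | (uint32_t)imm, a.hi | (uint32_t)(imm >> 32) };
    return r;
}

static Pair64 pair_xori(Pair64 a, uint64_t imm)
{
    Pair64 r = { a.lo ^ (uint32_t)imm, a.hi ^ (uint32_t)(imm >> 32) };
    return r;
}
```

The split is exact because AND, OR and XOR act on each bit independently, so nothing crosses the 32-bit boundary. Shifts do move bits between the halves, which is why their 32-bit path goes through tcg_gen_shifti_i64() instead.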
Re: [Qemu-devel] [PATCH v5 07/10] qmp: add rocker device support

2015-02-03 Thread Eric Blake

On 01/22/2015 01:03 AM, sfel...@gmail.com wrote:
 From: Scott Feldman sfel...@gmail.com
 
 Add QMP/HMP support for rocker devices.  This is mostly for debugging purposes
 to see inside the device's tables and port configurations.  Some examples:
 

QMP interface review:

 +++ b/qapi-schema.json
 @@ -3523,3 +3523,6 @@
  # Since: 2.1
  ##
  { 'command': 'rtc-reset-reinjection' }
 +
 +# Rocker ethernet network switch
 +{ 'include': 'qapi/rocker.json' }
 diff --git a/qapi/rocker.json b/qapi/rocker.json
 new file mode 100644
 index 000..326c6c7
 --- /dev/null
 +++ b/qapi/rocker.json
 @@ -0,0 +1,259 @@
 +##
 +# @Rocker:
 +#
 +# Rocker switch information.
 +#
 +# @name: switch name
 +#
 +# @id: switch ID
 +#
 +# @ports: number of front-panel ports
 +##

Missing a 'Since: 2.3' designation.

 +{ 'type': 'RockerSwitch',
 +  'data': { 'name': 'str', 'id': 'uint64', 'ports': 'uint32' } }
 +
 +##
 +# @rocker:
 +#
 +# Return rocker switch information.
 +#
 +# Returns: @Rocker information
 +#
 +# Since: 2.3
 +##
 +{ 'command': 'rocker',
 +  'data': { 'name': 'str' },
 +  'returns': 'RockerSwitch' }

Should this command be named 'query-rocker', as it is used for queries?
 Should the 'name' argument be optional, and the output be an array (all
rocker devices, rather than just a given rocker name lookup)?

 +
 +##
 +# @RockerPortDuplex:
 +#
 +# An enumeration of port duplex states.
 +#
 +# @half: half duplex
 +#
 +# @full: full duplex
 +##

Missing a 'Since: 2.3' designation.

 +{ 'enum': 'RockerPortDuplex', 'data': [ 'half', 'full' ] }
 +
 +##
 +# @RockerPortAutoneg:
 +#
 +# An enumeration of port autoneg states.
 +#
 +# @off: autoneg is off
 +#
 +# @on: autoneg is on
 +##

Missing a 'Since: 2.3' designation.

 +{ 'enum': 'RockerPortAutoneg', 'data': [ 'off', 'on' ] }
 +
 +##
 +# @RockerPort:
 +#
 +# Rocker switch port information.
 +#
 +# @name: port name
 +#
 +# @enabled: port is enabled for I/O
 +#
 +# @link-up: physical link is UP on port
 +#
 +# @speed: port link speed in Mbps
 +#
 +# @duplex: port link duplex
 +#
 +# @autoneg: port link autoneg
 +##

Missing a 'Since: 2.3' designation.

 +{ 'type': 'RockerPort',
 +  'data': { 'name': 'str', 'enabled': 'bool', 'link-up': 'bool',
 +'speed': 'uint32', 'duplex': 'RockerPortDuplex',
 +'autoneg': 'RockerPortAutoneg' } }
 +
 +##
 +# @rocker-ports:
 +#
 +# Return rocker switch information.
 +#
 +# Returns: @Rocker information
 +#
 +# Since: 2.3
 +##
 +{ 'command': 'rocker-ports',

Should this be named 'query-rocker-ports'?  Should the port information
be returned as part of the more generic 'rocker' command rather than
having to do a two-stage query (what are my rocker devices, then for
each device what are the ports)?

 +  'data': { 'name': 'str' },
 +  'returns': ['RockerPort'] }
 +
 +##
 +# @RockerOfDpaFlowKey:
 +#
 +# Rocker switch OF-DPA flow key
 +#
 +# @priority: key priority, 0 being lowest priority
 +#
 +# @tbl-id: flow table ID
 +#
 +# @in-pport: physical input port
 +#
 +# @tunnel-id: tunnel ID
 +#
 +# @vlan-id: VLAN ID
 +#
 +# @eth-type: Ethernet header type
 +#
 +# @eth-src: Ethernet header source MAC address
 +#
 +# @eth-dst: Ethernet header destination MAC address
 +#
 +# @ip-proto: IP Header protocol field
 +#
 +# @ip-tos: IP header TOS field
 +#
 +# @ip-dst: IP header destination address
 +##

Missing a 'Since: 2.3' designation.

 +{ 'type': 'RockerOfDpaFlowKey',
 +  'data' : { 'priority': 'uint32', 'tbl-id': 'uint32', '*in-pport': 'uint32',
 + '*tunnel-id': 'uint32', '*vlan-id': 'uint16',
 + '*eth-type': 'uint16', '*eth-src': 'str', '*eth-dst': 'str',
 + '*ip-proto': 'uint8', '*ip-tos': 'uint8', '*ip-dst': 'str' } }

Missing '#optional' tags on the various optional fields.  Why are
certain fields optional?  Does it mean they have a default value, or
that they don't make sense in some configurations?  The docs could be
more clear on that.

 +
 +##
 +# @RockerOfDpaFlowMask:
 +#
 +# Rocker switch OF-DPA flow mask
 +#
 +# @in-pport: physical input port
 +#
 +# @tunnel-id: tunnel ID
 +#
 +# @vlan-id: VLAN ID
 +#
 +# @eth-src: Ethernet header source MAC address
 +#
 +# @eth-dst: Ethernet header destination MAC address
 +#
 +# @ip-proto: IP Header protocol field
 +#
 +# @ip-tos: IP header TOS field
 +##

Missing a 'Since: 2.3' designation.

 +{ 'type': 'RockerOfDpaFlowMask',
 +  'data' : { '*in-pport': 'uint32', '*tunnel-id': 'uint32',
 + '*vlan-id': 'uint16', '*eth-src': 'str', '*eth-dst': 'str',
 + '*ip-proto': 'uint8', '*ip-tos': 'uint8' } }

Again, missing #optional tags in the docs, as well as what it means when
a field is omitted.

 +
 +##
 +# @RockerOfDpaFlowAction:
 +#
 +# Rocker switch OF-DPA flow action
 +#
 +# @goto-tbl: next table ID
 +#
 +# @group-id: group ID
 +#
 +# @tunnel-lport: tunnel logical port ID
 +#
 +# @vlan-id: VLAN ID
 +#
 +# @new-vlan-id: new VLAN ID
 +#
 +# @out-pport: physical output port
 +##

Missing a 'Since: 2.3' designation.

 +{ 'type':

Re: [Qemu-devel] [PATCH RFC 1/1] KVM: s390: Add MEMOP ioctl for reading/writing guest memory

2015-02-03 Thread Thomas Huth

On Tue, 03 Feb 2015 14:04:43 +0100
Paolo Bonzini pbonz...@redhat.com wrote:

 On 03/02/2015 13:11, Thomas Huth wrote:
  On s390, we've got to make sure to hold the IPTE lock while accessing
  virtual memory. So let's add an ioctl for reading and writing virtual
  memory to provide this feature for userspace, too.
  
  Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com
  Reviewed-by: Dominik Dingel din...@linux.vnet.ibm.com
  Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
  ---
   Documentation/virtual/kvm/api.txt |   44 +
   arch/s390/kvm/gaccess.c   |   22 +
   arch/s390/kvm/gaccess.h   |2 +
   arch/s390/kvm/kvm-s390.c  |   63 
  +
   include/uapi/linux/kvm.h  |   21 
   5 files changed, 152 insertions(+), 0 deletions(-)
  
  diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
  index b112efc..bf44b53 100644
  --- a/Documentation/virtual/kvm/api.txt
  +++ b/Documentation/virtual/kvm/api.txt
  @@ -2716,6 +2716,50 @@ The fields in each entry are defined as follows:
  eax, ebx, ecx, edx: the values returned by the cpuid instruction for
this function/index combination
   
  +4.89 KVM_GUEST_MEM_OP
  +
  +Capability: KVM_CAP_MEM_OP
 
 Put "virtual" somewhere in the ioctl name and capability?

Actually, I'd prefer to keep the "virtual" in the defines for the type
of operation below: When it comes to s390 storage keys, we likely might
need some calls for reading and writing to physical memory, too. Then
we could simply extend this ioctl instead of inventing a new one.

  +Architectures: s390
  +Type: vcpu ioctl
  +Parameters: struct kvm_guest_mem_op (in)
  +Returns: = 0 on success,
  +         < 0 on generic error (e.g. -EFAULT or -ENOMEM),
  +         > 0 if an exception occurred while walking the page tables
  +
  +Read or write data from/to the virtual memory of a VCPU.
  +
  +Parameters are specified via the following structure:
  +
  +struct kvm_guest_mem_op {
  +   __u64 gaddr;/* the guest address */
  +   __u64 flags;/* arch specific flags */
  +   __u32 size; /* amount of bytes */
  +   __u32 op;   /* type of operation */
  +   __u64 buf;  /* buffer in userspace */
  +   __u8 reserved[32];  /* should be set to 0 */
  +};
  +
  +The type of operation is specified in the op field, either KVM_MEMOP_VIRTREAD
  +for reading from memory, KVM_MEMOP_VIRTWRITE for writing to memory, or
  +KVM_MEMOP_CHECKVIRTREAD or KVM_MEMOP_CHECKVIRTWRITE to check whether the
 
 Better:
 
 #define KVM_MEMOP_READ   0
 #define KVM_MEMOP_WRITE  1
 
 and in the flags field:
 
 #define KVM_MEMOP_F_CHECK_ONLY (1 << 1)

Ok, a flag for the check operations is fine for me, too.

...
  +The logical (virtual) start address of the memory region has to be specified
  +in the gaddr field, and the length of the region in the size field.
  +buf is the buffer supplied by the userspace application where the read data
  +should be written to for KVM_MEMOP_VIRTREAD, or where the data that should
  +be written is stored for a KVM_MEMOP_VIRTWRITE. buf can be NULL for both
  +CHECK operations.
 
 buf is unused and can be NULL for both CHECK operations.
 
  +The reserved field is meant for future extensions. It must currently be
  +set to 0.
 
 Not really true, as you don't check it.  So "It is not used by KVM with
 the currently defined set of flags" is a better explanation.

ok ... and maybe add "should be set to zero"?

 Paolo

Thanks for the review!

 Thomas
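A sketch of how userspace might fill in the request, assuming the struct layout from the RFC and the encoding Paolo proposes (KVM_MEMOP_READ/KVM_MEMOP_WRITE in op, KVM_MEMOP_F_CHECK_ONLY in flags). Neither is in a released header, so both are declared locally here; memop_prepare() and memop_classify() are hypothetical helpers, and the ioctl() call itself is omitted because its request number is not part of the quoted text.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the layout proposed in the RFC (not an installed header). */
struct kvm_guest_mem_op {
    uint64_t gaddr;        /* the guest (logical) address */
    uint64_t flags;        /* arch specific flags */
    uint32_t size;         /* amount of bytes */
    uint32_t op;           /* type of operation */
    uint64_t buf;          /* buffer in userspace */
    uint8_t reserved[32];  /* should be set to 0 */
};

/* Encoding suggested in review: two ops plus a check-only flag. */
#define KVM_MEMOP_READ          0
#define KVM_MEMOP_WRITE         1
#define KVM_MEMOP_F_CHECK_ONLY  (1ULL << 1)

/* Hypothetical helper: zero the whole request (including reserved[]),
   then fill it in. buf is unused and may be NULL for check-only calls. */
static void memop_prepare(struct kvm_guest_mem_op *mop, uint32_t op,
                          uint64_t gaddr, void *buf, uint32_t size,
                          int check_only)
{
    memset(mop, 0, sizeof(*mop));
    mop->gaddr = gaddr;
    mop->size = size;
    mop->op = op;
    mop->flags = check_only ? KVM_MEMOP_F_CHECK_ONLY : 0;
    mop->buf = (uint64_t)(uintptr_t)buf;
}

enum memop_status { MEMOP_OK, MEMOP_ERROR, MEMOP_GUEST_EXCEPTION };

/* Interpret a return value per the RFC text: 0 = success, negative =
   generic error, positive = exception while walking the page tables.
   (Through ioctl(2), the negative case shows up as -1 with errno set.) */
static enum memop_status memop_classify(long ret)
{
    if (ret == 0) {
        return MEMOP_OK;
    }
    return ret < 0 ? MEMOP_ERROR : MEMOP_GUEST_EXCEPTION;
}
```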

[Qemu-devel] [PATCH v3 0/8] Linked list for tcg ops

2015-02-03 Thread Richard Henderson

Currently tcg ops are simply placed in a buffer in order, which is
fine until we want to actually do something with the opcode stream,
such as optimize it.  Note the horrible things like call opcodes
needing their argument count both prefixed and postfixed so that we
can iterate across the call either forward or backward.

While I'm changing this, I also move quite a lot of tcg-op.h out of
line.  There is very little benefit to having most of them be inline,
since their arguments are extracted from the guest instructions being
translated, and thus their values are not really predictable.

I chose a cutoff of one function call.  If a tcg-op.h function consists
of a single function call, inline it, otherwise move it out of line.

This also removes a bit of boilerplate from each target.

I haven't been able to measure a performance difference with this
patch set.  I wouldn't really expect any, as the complexity level
remains the same.  I simply find the linked list significantly more
maintainable.
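A minimal sketch of the idea, assuming ops stay in a fixed array but are ordered by prev/next indices, with -1 marking either end (the later patches in this series use index links of this kind; `Op`, `OpList` and the helpers here are illustrative names, not TCG's TCGOp). Walking in either direction just follows one link per op, so nothing like a duplicated argument count is needed to step backward over a call.

```c
#include <assert.h>
#include <stddef.h>

#define OPLIST_MAX 16  /* hypothetical capacity */

/* Illustrative op: an opcode plus array indices of its neighbours. */
typedef struct {
    int opc;
    int prev;
    int next;
} Op;

typedef struct {
    Op buf[OPLIST_MAX];
    int first;  /* index of the first op, or -1 */
    int last;   /* index of the last op, or -1 */
    int used;   /* next free array slot */
} OpList;

static void oplist_init(OpList *l)
{
    l->first = l->last = -1;
    l->used = 0;
}

/* Append at the logical end of the list. */
static int oplist_append(OpList *l, int opc)
{
    int i = l->used++;

    assert(i < OPLIST_MAX);
    l->buf[i] = (Op){ .opc = opc, .prev = l->last, .next = -1 };
    if (l->last >= 0) {
        l->buf[l->last].next = i;
    } else {
        l->first = i;
    }
    l->last = i;
    return i;
}

/* Walk forward or backward, copying opcodes into out[]; returns the count. */
static size_t oplist_collect(const OpList *l, int backward,
                             int *out, size_t max)
{
    size_t n = 0;

    for (int i = backward ? l->last : l->first;
         i >= 0 && n < max;
         i = backward ? l->buf[i].prev : l->buf[i].next) {
        out[n++] = l->buf[i].opc;
    }
    return n;
}
```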

Changes v2->v3:
  * Parameter order bug affecting 32-bit hosts fixed (thanks Peter).


r~


Richard Henderson (8):
  tcg: Move some opcode generation functions out of line
  tcg: Reduce ifdefs in tcg-op.c
  tcg: Move emit of INDEX_op_end into gen_tb_end
  tcg: Introduce tcg_op_buf_count and tcg_op_buf_full
  tcg: Put opcodes in a linked list
  tcg: Remove opcodes instead of noping them out
  tcg: Implement insert_op_before
  tcg: Remove unused opcodes

 Makefile.target   |2 +-
 include/exec/gen-icount.h |   22 +-
 target-alpha/translate.c  |   16 +-
 target-arm/translate-a64.c|   10 +-
 target-arm/translate.c|   10 +-
 target-cris/translate.c   |   15 +-
 target-i386/translate.c   |   11 +-
 target-lm32/translate.c   |   16 +-
 target-m68k/translate.c   |   10 +-
 target-microblaze/translate.c |   22 +-
 target-mips/translate.c   |   10 +-
 target-moxie/translate.c  |   10 +-
 target-openrisc/translate.c   |   15 +-
 target-ppc/translate.c|   11 +-
 target-s390x/translate.c  |   11 +-
 target-sh4/translate.c|   10 +-
 target-sparc/translate.c  |   10 +-
 target-tricore/translate.c|5 +-
 target-unicore32/translate.c  |   10 +-
 target-xtensa/translate.c |8 +-
 tcg/optimize.c|  307 +++--
 tcg/tcg-op.c  | 1934 
 tcg/tcg-op.h  | 2487 ++---
 tcg/tcg-opc.h |9 -
 tcg/tcg.c |  532 +++--
 tcg/tcg.h |   72 +-
 tci.c |   13 -
 27 files changed, 2751 insertions(+), 2837 deletions(-)
 create mode 100644 tcg/tcg-op.c

-- 
2.1.0

[Qemu-devel] [PATCH v3 3/8] tcg: Move emit of INDEX_op_end into gen_tb_end

2015-02-03 Thread Richard Henderson

Reviewed-by: Bastian Koppelmann kbast...@mail.uni-paderborn.de
Signed-off-by: Richard Henderson r...@twiddle.net
---
 include/exec/gen-icount.h | 2 ++
 target-alpha/translate.c  | 2 +-
 target-arm/translate-a64.c| 1 -
 target-arm/translate.c| 1 -
 target-cris/translate.c   | 2 +-
 target-i386/translate.c   | 2 +-
 target-lm32/translate.c   | 2 +-
 target-m68k/translate.c   | 1 -
 target-microblaze/translate.c | 2 +-
 target-mips/translate.c   | 2 +-
 target-moxie/translate.c  | 2 +-
 target-openrisc/translate.c   | 2 +-
 target-ppc/translate.c| 2 +-
 target-s390x/translate.c  | 2 +-
 target-sh4/translate.c| 2 +-
 target-sparc/translate.c  | 2 +-
 target-tricore/translate.c| 1 -
 target-unicore32/translate.c  | 1 -
 target-xtensa/translate.c | 1 -
 19 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 221aad0..a37a61d 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -48,6 +48,8 @@ static void gen_tb_end(TranslationBlock *tb, int num_insns)
 gen_set_label(icount_label);
 tcg_gen_exit_tb((uintptr_t)tb + TB_EXIT_ICOUNT_EXPIRED);
 }
+
+*tcg_ctx.gen_opc_ptr = INDEX_op_end;
 }
 
 static inline void gen_io_start(void)
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index f888367..aa04c60 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2912,7 +2912,7 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu,
 }
 
 gen_tb_end(tb, num_insns);
-*tcg_ctx.gen_opc_ptr = INDEX_op_end;
+
 if (search_pc) {
 j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf;
 lj++;
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 80d2359..10e09bc 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -11090,7 +11090,6 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu,
 
 done_generating:
 gen_tb_end(tb, num_insns);
-*tcg_ctx.gen_opc_ptr = INDEX_op_end;
 
 #ifdef DEBUG_DISAS
 if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)) {
diff --git a/target-arm/translate.c b/target-arm/translate.c
index bdfcdf1..4b30698 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -11330,7 +11330,6 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
 
 done_generating:
 gen_tb_end(tb, num_insns);
-*tcg_ctx.gen_opc_ptr = INDEX_op_end;
 
 #ifdef DEBUG_DISAS
 if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)) {
diff --git a/target-cris/translate.c b/target-cris/translate.c
index b675ed0..b5a792c 100644
--- a/target-cris/translate.c
+++ b/target-cris/translate.c
@@ -3344,7 +3344,7 @@ gen_intermediate_code_internal(CRISCPU *cpu, TranslationBlock *tb,
 }
 }
 gen_tb_end(tb, num_insns);
-*tcg_ctx.gen_opc_ptr = INDEX_op_end;
+
 if (search_pc) {
 j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf;
 lj++;
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 9ebdf4b..e2e21e4 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -8077,7 +8077,7 @@ static inline void gen_intermediate_code_internal(X86CPU *cpu,
 gen_io_end();
 done_generating:
 gen_tb_end(tb, num_insns);
-*tcg_ctx.gen_opc_ptr = INDEX_op_end;
+
 /* we don't forget to fill the last values */
 if (search_pc) {
 j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf;
diff --git a/target-lm32/translate.c b/target-lm32/translate.c
index a7579dc..cd09293 100644
--- a/target-lm32/translate.c
+++ b/target-lm32/translate.c
@@ -1158,7 +1158,7 @@ void gen_intermediate_code_internal(LM32CPU *cpu,
 }
 
 gen_tb_end(tb, num_insns);
-*tcg_ctx.gen_opc_ptr = INDEX_op_end;
+
 if (search_pc) {
 j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf;
 lj++;
diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 47edc7a..7e98a17 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -3075,7 +3075,6 @@ gen_intermediate_code_internal(M68kCPU *cpu, TranslationBlock *tb,
 }
 }
 gen_tb_end(tb, num_insns);
-*tcg_ctx.gen_opc_ptr = INDEX_op_end;
 
 #ifdef DEBUG_DISAS
 if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)) {
diff --git a/target-microblaze/translate.c b/target-microblaze/translate.c
index 69ce4df..437a069 100644
--- a/target-microblaze/translate.c
+++ b/target-microblaze/translate.c
@@ -1846,7 +1846,7 @@ gen_intermediate_code_internal(MicroBlazeCPU *cpu, TranslationBlock *tb,
 }
 }
 gen_tb_end(tb, num_insns);
-*tcg_ctx.gen_opc_ptr = INDEX_op_end;
+
 if (search_pc) {
 j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf;
 lj++;
diff --git a/target-mips/translate.c b/target-mips/translate.c
index e9d86b2..70b5b45 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -19240,7 +19240,7 @@ gen_intermediate_code_internal(MIPSCPU *cpu, TranslationBlock *tb,
 }

[Qemu-devel] [PATCH v3 7/8] tcg: Implement insert_op_before

2015-02-03 Thread Richard Henderson

Rather than reserving space in the op stream for optimization,
let the optimizer add ops as necessary.
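The linking steps in insert_op_before() can be modelled on a plain index-linked array: the new op takes the next free slot at the end of the buffer, but is spliced into the list just before old_op, so physical order and logical order diverge. This sketch uses made-up `Op`/`OpList` types and leaves out the parameter-buffer reservation (`nargs`) that the real function also performs.

```c
#include <assert.h>

#define MAX_OPS 16  /* hypothetical capacity */

typedef struct {
    int opc;
    int prev;   /* index of the previous op, or -1 */
    int next;   /* index of the next op, or -1 */
} Op;

typedef struct {
    Op buf[MAX_OPS];
    int first, last, used;
} OpList;

static void init_list(OpList *l)
{
    l->first = l->last = -1;
    l->used = 0;
}

static int append_op(OpList *l, int opc)
{
    int i = l->used++;

    assert(i < MAX_OPS);
    l->buf[i] = (Op){ .opc = opc, .prev = l->last, .next = -1 };
    if (l->last >= 0) {
        l->buf[l->last].next = i;
    } else {
        l->first = i;
    }
    l->last = i;
    return i;
}

/* Same linking as insert_op_before(): allocate at the end of the array,
   link logically before 'old'. The tail index never changes here. */
static int insert_before(OpList *l, int old, int opc)
{
    int oi = l->used++;
    int prev = l->buf[old].prev;

    assert(oi < MAX_OPS);
    l->buf[oi] = (Op){ .opc = opc, .prev = prev, .next = old };
    if (prev >= 0) {
        l->buf[prev].next = oi;
    } else {
        l->first = oi;
    }
    l->buf[old].prev = oi;
    return oi;
}
```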

Signed-off-by: Richard Henderson r...@twiddle.net
---
 tcg/optimize.c | 57 +++--
 tcg/tcg-op.c   | 21 -
 tcg/tcg-op.h   |  1 -
 3 files changed, 35 insertions(+), 44 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 973fbb4..067917c 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -67,6 +67,37 @@ static void reset_temp(TCGArg temp)
 temps[temp].mask = -1;
 }
 
+static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
+TCGOpcode opc, int nargs)
+{
+int oi = s->gen_next_op_idx;
+int pi = s->gen_next_parm_idx;
+int prev = old_op->prev;
+int next = old_op - s->gen_op_buf;
+TCGOp *new_op;
+
+tcg_debug_assert(oi < OPC_BUF_SIZE);
+tcg_debug_assert(pi + nargs <= OPPARAM_BUF_SIZE);
+s->gen_next_op_idx = oi + 1;
+s->gen_next_parm_idx = pi + nargs;
+
+new_op = &s->gen_op_buf[oi];
+*new_op = (TCGOp){
+.opc = opc,
+.args = pi,
+.prev = prev,
+.next = next
+};
+if (prev >= 0) {
+s->gen_op_buf[prev].next = oi;
+} else {
+s->gen_first_op_idx = oi;
+}
+old_op->prev = oi;
+
+return new_op;
+}
+
 /* Reset all temporaries, given that there are NB_TEMPS of them.  */
 static void reset_all_temps(int nb_temps)
 {
@@ -1108,8 +1139,8 @@ static void tcg_constant_folding(TCGContext *s)
 uint64_t a = ((uint64_t)ah << 32) | al;
 uint64_t b = ((uint64_t)bh << 32) | bl;
 TCGArg rl, rh;
-TCGOp *op2;
-TCGArg *args2;
+TCGOp *op2 = insert_op_before(s, op, INDEX_op_movi_i32, 2);
+TCGArg *args2 = &s->gen_opparam_buf[op2->args];
 
 if (opc == INDEX_op_add2_i32) {
 a += b;
@@ -1117,15 +1148,6 @@ static void tcg_constant_folding(TCGContext *s)
 a -= b;
 }
 
-/* We emit the extra nop when we emit the add2/sub2.  */
-op2 = &s->gen_op_buf[oi_next];
-assert(op2->opc == INDEX_op_nop);
-
-/* But we still have to allocate args for the op.  */
-op2->args = s->gen_next_parm_idx;
-s->gen_next_parm_idx += 2;
-args2 = &s->gen_opparam_buf[op2->args];
-
 rl = args[0];
 rh = args[1];
 tcg_opt_gen_movi(s, op, args, opc, rl, (uint32_t)a);
@@ -1144,17 +1166,8 @@ static void tcg_constant_folding(TCGContext *s)
 uint32_t b = temps[args[3]].val;
 uint64_t r = (uint64_t)a * b;
 TCGArg rl, rh;
-TCGOp *op2;
-TCGArg *args2;
-
-/* We emit the extra nop when we emit the mulu2.  */
-op2 = &s->gen_op_buf[oi_next];
-assert(op2->opc == INDEX_op_nop);
-
-/* But we still have to allocate args for the op.  */
-op2->args = s->gen_next_parm_idx;
-s->gen_next_parm_idx += 2;
-args2 = &s->gen_opparam_buf[op2->args];
+TCGOp *op2 = insert_op_before(s, op, INDEX_op_movi_i32, 2);
+TCGArg *args2 = &s->gen_opparam_buf[op2->args];
 
 rl = args[0];
 rh = args[1];
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index cbaa15c..afa351d 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -57,11 +57,6 @@ static void tcg_emit_op(TCGContext *ctx, TCGOpcode opc, int args)
 };
 }
 
-void tcg_gen_op0(TCGContext *ctx, TCGOpcode opc)
-{
-tcg_emit_op(ctx, opc, -1);
-}
-
 void tcg_gen_op1(TCGContext *ctx, TCGOpcode opc, TCGArg a1)
 {
 int pi = ctx->gen_next_parm_idx;
@@ -571,8 +566,6 @@ void tcg_gen_add2_i32(TCGv_i32 rl, TCGv_i32 rh, TCGv_i32 al,
 {
 if (TCG_TARGET_HAS_add2_i32) {
 tcg_gen_op6_i32(INDEX_op_add2_i32, rl, rh, al, ah, bl, bh);
-/* Allow the optimizer room to replace add2 with two moves.  */
-tcg_gen_op0(tcg_ctx, INDEX_op_nop);
 } else {
 TCGv_i64 t0 = tcg_temp_new_i64();
 TCGv_i64 t1 = tcg_temp_new_i64();
@@ -590,8 +583,6 @@ void tcg_gen_sub2_i32(TCGv_i32 rl, TCGv_i32 rh, TCGv_i32 al,
 {
 if (TCG_TARGET_HAS_sub2_i32) {
 tcg_gen_op6_i32(INDEX_op_sub2_i32, rl, rh, al, ah, bl, bh);
-/* Allow the optimizer room to replace sub2 with two moves.  */
-tcg_gen_op0(tcg_ctx, INDEX_op_nop);
 } else {
 TCGv_i64 t0 = tcg_temp_new_i64();
 TCGv_i64 t1 = tcg_temp_new_i64();
@@ -608,8 +599,6 @@ void tcg_gen_mulu2_i32(TCGv_i32 rl, TCGv_i32 rh, TCGv_i32 arg1, TCGv_i32 arg2)
 {
 if (TCG_TARGET_HAS_mulu2_i32) {
 tcg_gen_op4_i32(INDEX_op_mulu2_i32, rl, rh, arg1, arg2);
-/* Allow the optimizer room to replace mulu2 with two moves.  */
-tcg_gen_op0(tcg_ctx, INDEX_op_nop);
 } else if

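The constants that the optimizer writes into the pair of movi_i32 ops (one of them created with insert_op_before()) come from ordinary 64-bit arithmetic on the 32-bit halves, as in the tcg_constant_folding() hunks of this patch: recombine ah:al and bh:bl, add or subtract (or multiply for mulu2), then split the result again. fold_addsub2() and fold_mulu2() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* add2_i32/sub2_i32 with constant inputs: recombine the halves, operate in
   64 bits so carries and borrows cross the boundary, then split. */
static void fold_addsub2(int is_add, uint32_t al, uint32_t ah,
                         uint32_t bl, uint32_t bh,
                         uint32_t *rl, uint32_t *rh)
{
    uint64_t a = ((uint64_t)ah << 32) | al;
    uint64_t b = ((uint64_t)bh << 32) | bl;

    a = is_add ? a + b : a - b;
    *rl = (uint32_t)a;
    *rh = (uint32_t)(a >> 32);
}

/* mulu2_i32 with constant inputs: the full 64-bit unsigned product. */
static void fold_mulu2(uint32_t a, uint32_t b, uint32_t *rl, uint32_t *rh)
{
    uint64_t r = (uint64_t)a * b;

    *rl = (uint32_t)r;
    *rh = (uint32_t)(r >> 32);
}
```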
[Qemu-devel] QEMU crash on PCI passthrough

2015-02-03 Thread Krzysztof Katowicz-Kowalewski

Hello qemu-devel list,

I have a problem with PCI passthrough of my second gfx card (Nvidia GTX
760). I use Gentoo with gentoo-hardened kernel sources, and I have SELinux
enabled (for the purpose of these tests I've added svirt_t and virtd_t to
the permissive types, so it shouldn't cause any problems). I use the
following versions of applications:

app-emulation/qemu-2.2.0 (but I've also tested app-emulation/qemu-2.1.2-r2)
app-emulation/libvirt-1.2.10-r4
app-emulation/virt-manager-1.1.0

Running the guest without PCI passthrough works fine so far, but when I
pass my PCI device to the guest I end up with the following message:

__QUOTE_BEGIN_CANARY__
Error starting domain: internal error: Process exited while reading console
log output: char device redirected to /dev/pts/3 (label charserial0)
qemu: hardware error: pci read failed, ret = 0 errno = 0

CPU #0:
EAX= EBX= ECX= EDX=0663
ESI= EDI= EBP= ESP=
EIP=fff0 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   9300
CS =f000   9b00
SS =   9300
DS =   9300
FS =   9300
GS =   9300
LDT=   8200
TR =   8b00
GDT=  
IDT=  
CR0=6010 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=
FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80
FPR0=  FPR1= 
FPR2=  FPR3=0
__QUOTE_END_CANARY__

In /var/log/libvirtd/libvirtd.log there is:
2015-02-03 02:12:27.135+: 3073: error : qemuProcessReadLogOutput:1719 :
internal error: Process exited while reading console log output: char
device redirected to /dev/pts/3 (label charserial0)
qemu: hardware error: pci read failed, ret = 0 errno = 0

__QUOTE_BEGIN_CANARY__ ... __QUOTE_END_CANARY__

And in /var/log/libvirtd/qemu/guest.log there is:
2015-02-03 02:12:27.025+: starting up
LC_ALL=C
PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
HOME=/ USER=root QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name
debian7 -S -machine pc-i440fx-2.1,accel=kvm -m 1024 -smp
1,sockets=1,cores=1,threads=1 -uuid
a00b499e-ad90-4470-8b65-18e17e55dca4 -no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/debian7.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc
base=utc,driftfix=slew -no-hpet -no-shutdown -boot c -usb -drive
file=/var/lib/libvirt/filesystems/debian.qcow2,if=none,id=drive-virtio-disk0,format=qcow2
-device
virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0
-drive
file=/var/lib/libvirt/images/debian-7.8.0-amd64-netinst.iso,if=none,media=cdrom,
id=drive-ide0-0-0,readonly=on,format=raw -device
ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -chardev
pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
-device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus -device
pci-assign,host=06:00.0,id=hostdev0,bus=pci.0,addr=0x5 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
char device redirected to /dev/pts/3 (label charserial0)
qemu: hardware error: pci read failed, ret = 0 errno = 0

__QUOTE_BEGIN_CANARY__ ... __QUOTE_END_CANARY__

I should add that I had to turn off the integrated sound device, because
the earlier error was:
2015-02-03 01:47:40.189+: 3070: error : virPCIDeviceReset:985 :
internal error: Unable to reset PCI device 0000:06:00.0: internal error:
Active 0000:06:00.1 devices on bus with 0000:06:00.0, not doing bus reset
After running: echo 1 >
/sys/devices/pci0000:00/0000:00:15.0/0000:06:00.1/remove this problem
disappears, and I'm stuck on the one above.

Any help? Any workaround? Is it a known issue? Could I provide any
additional info to help you diagnose the problem?

Thanks, Chris

Re: [Qemu-devel] [PATCH RFC 1/1] KVM: s390: Add MEMOP ioctl for reading/writing guest memory

2015-02-03 Thread Paolo Bonzini



On 03/02/2015 16:16, Thomas Huth wrote:
 Actually, I'd prefer to keep the "virtual" in the defines for the type
 of operation below: When it comes to s390 storage keys, we likely might
 need some calls for reading and writing to physical memory, too. Then
 we could simply extend this ioctl instead of inventing a new one.

Can you explain why it is necessary to read/write physical addresses
from user space?  In the case of QEMU, I'm worried that you would have
to invent your own memory read/write APIs that are different from
everything else.

On real s390 zPCI, does bus-master DMA update storage keys?

 Not really true, as you don't check it.  So "It is not used by KVM with
 the currently defined set of flags" is a better explanation.
 
 ok ... and maybe add "should be set to zero"?

If you don't check it, it is misleading to document this.

Paolo

[Qemu-devel] [PATCH v3 6/8] tcg: Remove opcodes instead of noping them out

2015-02-03 Thread Richard Henderson

With the linked list scheme we need not leave nops in the stream
that we need to process later.

Reviewed-by: Bastian Koppelmann kbast...@mail.uni-paderborn.de
Signed-off-by: Richard Henderson r...@twiddle.net
---
 tcg/optimize.c | 14 +++---
 tcg/tcg.c  | 28 
 tcg/tcg.h  |  1 +
 3 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index f2b8acf..973fbb4 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -758,7 +758,7 @@ static void tcg_constant_folding(TCGContext *s)
 break;
 do_mov3:
 if (temps_are_copies(args[0], args[1])) {
-op->opc = INDEX_op_nop;
+tcg_op_remove(s, op);
 } else {
 tcg_opt_gen_mov(s, op, args, opc, args[0], args[1]);
 }
@@ -916,7 +916,7 @@ static void tcg_constant_folding(TCGContext *s)
 if (affected == 0) {
 assert(nb_oargs == 1);
 if (temps_are_copies(args[0], args[1])) {
-op->opc = INDEX_op_nop;
+tcg_op_remove(s, op);
 } else if (temps[args[1]].state != TCG_TEMP_CONST) {
 tcg_opt_gen_mov(s, op, args, opc, args[0], args[1]);
 } else {
@@ -948,7 +948,7 @@ static void tcg_constant_folding(TCGContext *s)
 CASE_OP_32_64(and):
 if (temps_are_copies(args[1], args[2])) {
 if (temps_are_copies(args[0], args[1])) {
-op->opc = INDEX_op_nop;
+tcg_op_remove(s, op);
 } else {
 tcg_opt_gen_mov(s, op, args, opc, args[0], args[1]);
 }
@@ -979,7 +979,7 @@ static void tcg_constant_folding(TCGContext *s)
 switch (opc) {
 CASE_OP_32_64(mov):
 if (temps_are_copies(args[0], args[1])) {
-op->opc = INDEX_op_nop;
+tcg_op_remove(s, op);
 break;
 }
 if (temps[args[1]].state != TCG_TEMP_CONST) {
@@ -1074,7 +1074,7 @@ static void tcg_constant_folding(TCGContext *s)
 op->opc = INDEX_op_br;
 args[0] = args[3];
 } else {
-op->opc = INDEX_op_nop;
+tcg_op_remove(s, op);
 }
 break;
 }
@@ -1084,7 +1084,7 @@ static void tcg_constant_folding(TCGContext *s)
 tmp = do_constant_folding_cond(opc, args[1], args[2], args[5]);
 if (tmp != 2) {
 if (temps_are_copies(args[0], args[4-tmp])) {
-op->opc = INDEX_op_nop;
+tcg_op_remove(s, op);
 } else if (temps[args[4-tmp]].state == TCG_TEMP_CONST) {
 tcg_opt_gen_movi(s, op, args, opc,
  args[0], temps[args[4-tmp]].val);
@@ -1177,7 +1177,7 @@ static void tcg_constant_folding(TCGContext *s)
 args[0] = args[5];
 } else {
 do_brcond_false:
-op->opc = INDEX_op_nop;
+tcg_op_remove(s, op);
 }
 } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
 && temps[args[2]].state == TCG_TEMP_CONST
diff --git a/tcg/tcg.c b/tcg/tcg.c
index ee041b9..4115e8b 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1244,6 +1244,29 @@ void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs)
 #endif
 }
 
+void tcg_op_remove(TCGContext *s, TCGOp *op)
+{
+int next = op->next;
+int prev = op->prev;
+
+if (next >= 0) {
+s->gen_op_buf[next].prev = prev;
+} else {
+s->gen_last_op_idx = prev;
+}
+if (prev >= 0) {
+s->gen_op_buf[prev].next = next;
+} else {
+s->gen_first_op_idx = next;
+}
+
+*op = (TCGOp){ .opc = INDEX_op_nop, .next = -1, .prev = -1 };
+
+#ifdef CONFIG_PROFILER
+s->del_op_count++;
+#endif
+}
+
 #ifdef USE_LIVENESS_ANALYSIS
 /* liveness analysis: end of function: all temps are dead, and globals
should be in memory. */
@@ -1466,10 +1489,7 @@ static void tcg_liveness_analysis(TCGContext *s)
 }
 }
 do_remove:
-op->opc = INDEX_op_nop;
-#ifdef CONFIG_PROFILER
-s->del_op_count++;
-#endif
+tcg_op_remove(s, op);
 } else {
 do_not_remove:
 /* output args are dead */
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 596e30a..f941965 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -743,6 +743,7 @@ void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs);
 void tcg_gen_callN(TCGContext *s, void *func,
TCGArg ret, int nargs, TCGArg *args);
 
+void tcg_op_remove(TCGContext *s, TCGOp *op);
 void tcg_optimize(TCGContext *s);
 
 /* only used for debugging purposes */
-- 
2.1.0

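tcg_op_remove() is a standard doubly linked unlink over array indices: patch the neighbours' links, or the list head/tail index when the op sits at an end, then reset the slot. The model below follows the same steps with hypothetical `Op`/`OpList` types; `OP_NOP` stands in for INDEX_op_nop, and the test checks that the head and tail indices stay right when the first or last op is removed.

```c
#include <assert.h>

#define MAX_OPS 16       /* hypothetical capacity */
enum { OP_NOP = 0 };     /* stand-in for INDEX_op_nop */

typedef struct {
    int opc;
    int prev;   /* index of the previous op, or -1 */
    int next;   /* index of the next op, or -1 */
} Op;

typedef struct {
    Op buf[MAX_OPS];
    int first, last, used;
} OpList;

static void init_list(OpList *l)
{
    l->first = l->last = -1;
    l->used = 0;
}

static int append_op(OpList *l, int opc)
{
    int i = l->used++;

    assert(i < MAX_OPS);
    l->buf[i] = (Op){ .opc = opc, .prev = l->last, .next = -1 };
    if (l->last >= 0) {
        l->buf[l->last].next = i;
    } else {
        l->first = i;
    }
    l->last = i;
    return i;
}

/* Same steps as tcg_op_remove(): splice out, then leave a detached nop. */
static void remove_op(OpList *l, int i)
{
    int next = l->buf[i].next;
    int prev = l->buf[i].prev;

    if (next >= 0) {
        l->buf[next].prev = prev;
    } else {
        l->last = prev;
    }
    if (prev >= 0) {
        l->buf[prev].next = next;
    } else {
        l->first = next;
    }
    l->buf[i] = (Op){ .opc = OP_NOP, .next = -1, .prev = -1 };
}
```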
[Qemu-devel] [PATCH v3 4/8] tcg: Introduce tcg_op_buf_count and tcg_op_buf_full

2015-02-03 Thread Richard Henderson

The method by which we count the number of ops emitted
is going to change.  Abstract that away into some inlines.
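In the translators, the op count feeds the search_pc bookkeeping, which records which op index starts each guest instruction (gen_opc_instr_start/gen_opc_pc in the Alpha hunks below). This sketch models that loop with a plain counter; `SearchPC`, op_buf_count() and op_buf_full() are hypothetical stand-ins, since the actual tcg_op_buf_count()/tcg_op_buf_full() definitions in tcg/tcg.h are not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OPS 64  /* hypothetical stand-in for the op buffer capacity */

/* instr_start[i] is 1 when op i is the first op of a guest instruction,
   and pc[i] is that instruction's guest PC. */
typedef struct {
    int nb_ops;                        /* ops emitted so far */
    uint8_t instr_start[MAX_OPS + 1];
    uint64_t pc[MAX_OPS + 1];
} SearchPC;

/* Callers only ask "how many?" and "is it full?", not how it is counted. */
static inline int op_buf_count(const SearchPC *s)
{
    return s->nb_ops;
}

static inline int op_buf_full(const SearchPC *s)
{
    return s->nb_ops >= MAX_OPS;
}

/* Before each guest instruction: clear entries for ops emitted since the
   last recorded start, then record the new start. Returns the new lj. */
static int record_insn_start(SearchPC *s, int lj, uint64_t pc)
{
    int j = op_buf_count(s);

    if (lj < j) {
        lj++;
        while (lj < j) {
            s->instr_start[lj++] = 0;
        }
    }
    s->pc[lj] = pc;
    s->instr_start[lj] = 1;
    return lj;
}

/* After the TB ends: clear the entries for the trailing ops. */
static void finish_search_pc(SearchPC *s, int lj)
{
    int j = op_buf_count(s);

    lj++;
    while (lj <= j) {
        s->instr_start[lj++] = 0;
    }
}
```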

Reviewed-by: Bastian Koppelmann kbast...@mail.uni-paderborn.de
Signed-off-by: Richard Henderson r...@twiddle.net
---
 target-alpha/translate.c  | 14 +++---
 target-arm/translate-a64.c|  9 +++--
 target-arm/translate.c|  9 +++--
 target-cris/translate.c   | 13 +
 target-i386/translate.c   |  9 +++--
 target-lm32/translate.c   | 14 +-
 target-m68k/translate.c   |  9 +++--
 target-microblaze/translate.c | 20 
 target-mips/translate.c   |  8 +++-
 target-moxie/translate.c  |  8 +++-
 target-openrisc/translate.c   | 13 +
 target-ppc/translate.c|  9 +++--
 target-s390x/translate.c  |  9 +++--
 target-sh4/translate.c|  8 +++-
 target-sparc/translate.c  |  8 +++-
 target-tricore/translate.c|  4 +---
 target-unicore32/translate.c  |  9 +++--
 target-xtensa/translate.c |  7 +++
 tcg/tcg.h | 12 
 19 files changed, 79 insertions(+), 113 deletions(-)

diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index aa04c60..9c77d46 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2790,7 +2790,6 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu,
 target_ulong pc_start;
 target_ulong pc_mask;
 uint32_t insn;
-uint16_t *gen_opc_end;
 CPUBreakpoint *bp;
 int j, lj = -1;
 ExitStatus ret;
@@ -2798,7 +2797,6 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu,
 int max_insns;
 
 pc_start = tb->pc;
-gen_opc_end = tcg_ctx.gen_opc_buf + OPC_MAX_SIZE;
 
 ctx.tb = tb;
 ctx.pc = pc_start;
@@ -2839,11 +2837,12 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu,
 }
 }
 if (search_pc) {
-j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf;
+j = tcg_op_buf_count();
 if (lj < j) {
 lj++;
-while (lj < j)
+while (lj < j) {
 tcg_ctx.gen_opc_instr_start[lj++] = 0;
+}
 }
 tcg_ctx.gen_opc_pc[lj] = ctx.pc;
 tcg_ctx.gen_opc_instr_start[lj] = 1;
@@ -2881,7 +2880,7 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu,
or exhaust instruction count, stop generation.  */
 if (ret == NO_EXIT
 && ((ctx.pc & pc_mask) == 0
-|| tcg_ctx.gen_opc_ptr >= gen_opc_end
+|| tcg_op_buf_full()
 || num_insns >= max_insns
 || singlestep
 || ctx.singlestep_enabled)) {
@@ -2914,10 +2913,11 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu,
 gen_tb_end(tb, num_insns);
 
 if (search_pc) {
-j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf;
+j = tcg_op_buf_count();
 lj++;
-while (lj <= j)
+while (lj <= j) {
 tcg_ctx.gen_opc_instr_start[lj++] = 0;
+}
 } else {
 tb->size = ctx.pc - pc_start;
 tb->icount = num_insns;
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 10e09bc..a85ca5d 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -10899,7 +10899,6 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu,
 CPUARMState *env = &cpu->env;
 DisasContext dc1, *dc = &dc1;
 CPUBreakpoint *bp;
-uint16_t *gen_opc_end;
 int j, lj;
 target_ulong pc_start;
 target_ulong next_page_start;
@@ -10910,8 +10909,6 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu,
 
 dc->tb = tb;
 
-gen_opc_end = tcg_ctx.gen_opc_buf + OPC_MAX_SIZE;
-
 dc->is_jmp = DISAS_NEXT;
 dc->pc = pc_start;
 dc->singlestep_enabled = cs->singlestep_enabled;
@@ -10980,7 +10977,7 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu,
 }
 
 if (search_pc) {
-j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf;
+j = tcg_op_buf_count();
 if (lj < j) {
 lj++;
 while (lj < j) {
@@ -11030,7 +11027,7 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu,
  * ensures prefetch aborts occur at the right place.
  */
 num_insns++;
-} while (!dc->is_jmp && tcg_ctx.gen_opc_ptr < gen_opc_end &&
+} while (!dc->is_jmp && !tcg_op_buf_full() &&
  !cs->singlestep_enabled &&
  !singlestep &&
  !dc->ss_active &&
@@ -11101,7 +11098,7 @@ done_generating:
 }
 #endif
 if (search_pc) {
-j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf;
+j = tcg_op_buf_count();
 lj++;
 while (lj <= j) {
 tcg_ctx.gen_opc_instr_start[lj++] = 0;
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 4b30698..24658f6 100644
--- a/target-arm/translate.c
+++
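The bodies of the two new inlines live in the tcg/tcg.h hunk, which this excerpt does not include. The sketch below is one plausible version. It assumes the helpers simply wrap the `gen_opc_ptr - gen_opc_buf` arithmetic and the `gen_opc_ptr >= gen_opc_end` test that the translators used before this patch. The trimmed tcg_ctx layout, the OPC_MAX_SIZE value and the helpers op_buf_reset() and emit_op() are illustrative assumptions, not QEMU's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative capacity only; QEMU's real OPC_MAX_SIZE differs. */
#define OPC_MAX_SIZE 16

/* Hypothetical, trimmed-down stand-in for the tcg_ctx fields that the
 * translators used to touch directly. */
static struct {
    uint16_t gen_opc_buf[OPC_MAX_SIZE];
    uint16_t *gen_opc_ptr;          /* next free slot in gen_opc_buf */
} tcg_ctx;

/* Number of ops emitted so far (was: gen_opc_ptr - gen_opc_buf). */
static inline int tcg_op_buf_count(void)
{
    return (int)(tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf);
}

/* True once no more ops fit (was: gen_opc_ptr >= gen_opc_end). */
static inline bool tcg_op_buf_full(void)
{
    return tcg_op_buf_count() >= OPC_MAX_SIZE;
}

/* Hypothetical helpers, used only to exercise the two inlines. */
static void op_buf_reset(void)
{
    tcg_ctx.gen_opc_ptr = tcg_ctx.gen_opc_buf;
}

static void emit_op(uint16_t opc)
{
    assert(!tcg_op_buf_full());
    *tcg_ctx.gen_opc_ptr++ = opc;
}
```

With these helpers, a translator loop only tests `!tcg_op_buf_full()` and never sees how ops are stored. That is why the commit message says the counting method can then change without touching every target again.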

Re: [Qemu-devel] [PATCH] block: introduce BDRV_REQUEST_MAX_SECTORS

2015-02-03 Thread Peter Lieven


On 03.02.2015 at 14:29, Denis V. Lunev wrote:

On 03/02/15 15:12, Peter Lieven wrote:

we check and adjust request sizes at several places with
sometimes inconsistent checks or default values:
  INT_MAX
  INT_MAX >> BDRV_SECTOR_BITS
  UINT_MAX >> BDRV_SECTOR_BITS
  SIZE_MAX >> BDRV_SECTOR_BITS

This patch introduces a macro for the maximum number of sectors allowed
per request and uses it in several places.

Signed-off-by: Peter Lieven p...@kamp.de
---
  block.c   | 19 ---
  hw/block/virtio-blk.c |  4 ++--
  include/block/block.h |  3 +++
  3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index 8272ef9..4e58b35 100644
--- a/block.c
+++ b/block.c
@@ -2671,7 +2671,7 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
  static int bdrv_check_request(BlockDriverState *bs, int64_t sector_num,
int nb_sectors)
  {
-if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
+if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
  return -EIO;
  }
  @@ -2758,7 +2758,7 @@ static int bdrv_rw_co(BlockDriverState *bs, int64_t sector_num, uint8_t *buf,
  .iov_len = nb_sectors * BDRV_SECTOR_SIZE,
  };
  -if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
+if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
  return -EINVAL;
  }
  @@ -2826,13 +2826,10 @@ int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags)
  }
for (;;) {
-nb_sectors = target_sectors - sector_num;
+nb_sectors = MIN(target_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS);
  if (nb_sectors <= 0) {
  return 0;
  }
-if (nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
-nb_sectors = INT_MAX / BDRV_SECTOR_SIZE;
-}
  ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n);
  if (ret < 0) {
  error_report("error getting block status at sector %" PRId64 ": %s",
@@ -3167,7 +3164,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
  BdrvRequestFlags flags)
  {
-if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
+if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
  return -EINVAL;
  }
  @@ -3202,8 +3199,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
  struct iovec iov = {0};
  int ret = 0;
  -int max_write_zeroes = bs->bl.max_write_zeroes ?
-   bs->bl.max_write_zeroes : INT_MAX;
+int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_write_zeroes,
+ BDRV_REQUEST_MAX_SECTORS);
while (nb_sectors > 0 && !ret) {
  int num = nb_sectors;
@@ -3458,7 +3455,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
  BdrvRequestFlags flags)
  {
-if (nb_sectors < 0 || nb_sectors > (INT_MAX >> BDRV_SECTOR_BITS)) {
+if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
  return -EINVAL;
  }
  @@ -5120,7 +5117,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
  return 0;
  }
  -max_discard = bs->bl.max_discard ? bs->bl.max_discard : INT_MAX;
+max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
  while (nb_sectors > 0) {
  int ret;
  int num = nb_sectors;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 8c51a29..1a8a176 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -381,7 +381,7 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
  }
max_xfer_len = blk_get_max_transfer_length(mrb->reqs[0]->dev->blk);
-max_xfer_len = MIN_NON_ZERO(max_xfer_len, INT_MAX);
+max_xfer_len = MIN_NON_ZERO(max_xfer_len, BDRV_REQUEST_MAX_SECTORS);
qsort(mrb->reqs, mrb->num_reqs, sizeof(*mrb->reqs),
multireq_compare);
@@ -447,7 +447,7 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
  uint64_t nb_sectors = size >> BDRV_SECTOR_BITS;
  uint64_t total_sectors;
  -if (nb_sectors > INT_MAX) {
+if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
  return false;
  }
  if (sector & dev->sector_mask) {
diff --git a/include/block/block.h b/include/block/block.h
index 3082d2b..25a6d62 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -83,6 +83,9 @@ typedef enum {
  #define BDRV_SECTOR_SIZE   (1ULL << BDRV_SECTOR_BITS)
  #define BDRV_SECTOR_MASK   ~(BDRV_SECTOR_SIZE - 1)
  +#define BDRV_REQUEST_MAX_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, \
+ INT_MAX >> BDRV_SECTOR_BITS)
+
  /*
   * Allocation status flags
   * BDRV_BLOCK_DATA: data is read from bs->file or another file

Reviewed-by: Denis V. Lunev d...@openvz.org
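The new limit can be made concrete. With BDRV_SECTOR_BITS = 9 (512-byte sectors) and a 32-bit int, INT_MAX >> 9 is 4194303. SIZE_MAX >> 9 is 8388607 on a 32-bit host and much larger on a 64-bit one. The MIN() therefore always selects 4194303 sectors, or 2147483136 bytes, which still fits in an int. The standalone sketch below approximates the patch; it is not QEMU code. MIN() is defined locally instead of coming from QEMU's headers, and count_requests() is a hypothetical helper that applies the MIN()-clamped loop from the patched bdrv_make_zero().

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Mirrors include/block/block.h after the patch; MIN() is local here. */
#define BDRV_SECTOR_BITS   9
#define BDRV_SECTOR_SIZE   (1ULL << BDRV_SECTOR_BITS)
#define MIN(a, b)          ((a) < (b) ? (a) : (b))
#define BDRV_REQUEST_MAX_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, \
                                     INT_MAX >> BDRV_SECTOR_BITS)

/* Hypothetical helper: walk [sector_num, sector_num + total) in requests
 * of at most BDRV_REQUEST_MAX_SECTORS, clamping with MIN() like the
 * patched bdrv_make_zero() loop.  Returns how many requests were needed;
 * a real caller would issue one I/O per iteration. */
static int count_requests(int64_t sector_num, int64_t total)
{
    const int64_t target_sectors = sector_num + total;
    int requests = 0;

    for (;;) {
        int64_t nb_sectors = MIN(target_sectors - sector_num,
                                 (int64_t)BDRV_REQUEST_MAX_SECTORS);
        if (nb_sectors <= 0) {
            return requests;
        }
        /* The byte count of every request fits in an int. */
        assert(nb_sectors * (int64_t)BDRV_SECTOR_SIZE <= INT_MAX);
        sector_num += nb_sectors;
        requests++;
    }
}
```

A request one sector over the limit is split into two. For example, `count_requests(100, BDRV_REQUEST_MAX_SECTORS + 1)` needs two iterations.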

On the other hand the limitation to INT_MAX for a request

Re: [Qemu-devel] [PATCH v2 0/5] vhost-scsi: support to assign boot order

2015-02-03 Thread Gonglei

On 2015/1/29 15:08, Gonglei (Arei) wrote:

 From: Gonglei arei.gong...@huawei.com
 
 QEMU doesn't provide a bootindex property for the vhost-scsi device,
 so at present we cannot assign a boot order to it. However, some
 clients/users need this in certain scenarios. This series implements
 it on the QEMU side.
 
 Because QEMU only accepts a wwpn argument for vhost-scsi, we
 cannot assign a tpgt. That is, the tpg is transparent to QEMU: QEMU
 doesn't know which tpg can boot, and for a given wwpn the vhost-scsi
 driver module doesn't know either.
 
 For now, we assume that only the first tpg can boot, and add a
 boot_tpgt property that defaults to 0. Users can pass a different
 valid value on the QEMU command line.
 

Ping...

 v2 -> v1: (Thanks to Paolo's suggestion)
  - change the qdev_get_own_fw_dev_path_from_handler call in
    get_boot_devices_list, and convert non-NULL suffixes to
    implementations of FWPathProvider in Patch 1. (Paolo)
  - add a boot_tpgt property for vhost-scsi in Patch 4. (Paolo)
  - remove the ioctl call in Patch 4, because the kernel
    patch hasn't been accepted yet.
 
 kernel patch:
 [PATCH] vhost-scsi: introduce an ioctl to get the minimum tpgt
 http://news.gmane.org/gmane.comp.emulators.kvm.devel
 

 Gonglei (5):
   qdev: support to get a device firmware path directly
   vhost-scsi: add bootindex property
   vhost-scsi: realize the TYPE_FW_PATH_PROVIDER interface
   vhost-scsi: add a property for booting
   vhost-scsi: set the bootable value of channel/target/lun
 
  bootdevice.c| 31 +--
  hw/core/qdev.c  |  7 +++
  hw/scsi/vhost-scsi.c| 35 +++
  hw/virtio/virtio-pci.c  |  2 ++
  include/hw/qdev-core.h  |  1 +
  include/hw/virtio/vhost-scsi.h  |  5 +
  include/hw/virtio/virtio-scsi.h |  1 +
  7 files changed, 68 insertions(+), 14 deletions(-)

Re: [Qemu-devel] [Qemu-ppc] [RFC] pseries: Enable in-kernel H_LOGICAL_CI_{LOAD, STORE} implementations

2015-02-03 Thread Nikunj A Dadhania

David Gibson da...@gibson.dropbear.id.au writes:

 qemu currently implements the hypercalls H_LOGICAL_CI_LOAD and
 H_LOGICAL_CI_STORE as PAPR extensions.  These are used by the SLOF firmware
 for IO, because performing cache inhibited MMIO accesses with the MMU off
 (real mode) is very awkward on POWER.

 This approach breaks when SLOF needs to access IO devices implemented
 within KVM instead of in qemu.  The simplest example would be virtio-blk
 using an iothread, because the iothread / dataplane mechanism relies on
 an in-kernel implementation of the virtio queue notification MMIO.

 To fix this, an in-kernel implementation of these hypercalls has been made;
 however, the hypercalls still need to be enabled from qemu.  This patch
 performs the necessary calls to do so.

 Signed-off-by: David Gibson da...@gibson.dropbear.id.au

Reviewed-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com

Re: [Qemu-devel] [PATCH 1/3] nbd: Drop BDS backpointer

2015-02-03 Thread Paolo Bonzini



On 02/02/2015 22:40, Max Reitz wrote:
 Before this patch, the opaque pointer in an NBD BDS points to a
 BDRVNBDState, which contains an NbdClientSession object, which in turn
 contains a pointer to the BDS. This pointer may become invalid due to
 bdrv_swap(), so drop it, and instead pass the BDS directly to the
 nbd-client.c functions which then retrieve the NbdClientSession object
 from there.

Looks good, but please change function names from nbd_client_session_foo
to nbd_client_foo or even just nbd_foo if they do not take an
NbdClientSession* as the first parameter.

Thanks,

Paolo

 Signed-off-by: Max Reitz mre...@redhat.com
 ---
  block/nbd-client.c | 95 
 --
  block/nbd-client.h | 20 ++--
  block/nbd.c| 37 -
  3 files changed, 73 insertions(+), 79 deletions(-)
 
 diff --git a/block/nbd-client.c b/block/nbd-client.c
 index 28bfb62..4ede714 100644
 --- a/block/nbd-client.c
 +++ b/block/nbd-client.c
 @@ -43,20 +43,23 @@ static void nbd_recv_coroutines_enter_all(NbdClientSession *s)
  }
  }
  
 -static void nbd_teardown_connection(NbdClientSession *client)
 +static void nbd_teardown_connection(BlockDriverState *bs)
  {
 +NbdClientSession *client = nbd_get_client_session(bs);
 +
  /* finish any pending coroutines */
  shutdown(client->sock, 2);
  nbd_recv_coroutines_enter_all(client);
  
 -nbd_client_session_detach_aio_context(client);
 +nbd_client_session_detach_aio_context(bs);
  closesocket(client->sock);
  client->sock = -1;
  }
  
  static void nbd_reply_ready(void *opaque)
  {
 -NbdClientSession *s = opaque;
 +BlockDriverState *bs = opaque;
 +NbdClientSession *s = nbd_get_client_session(bs);
  uint64_t i;
  int ret;
  
 @@ -89,28 +92,29 @@ static void nbd_reply_ready(void *opaque)
  }
  
  fail:
 -nbd_teardown_connection(s);
 +nbd_teardown_connection(bs);
  }
  
  static void nbd_restart_write(void *opaque)
  {
 -NbdClientSession *s = opaque;
 +BlockDriverState *bs = opaque;
  
 -qemu_coroutine_enter(s->send_coroutine, NULL);
 +qemu_coroutine_enter(nbd_get_client_session(bs)->send_coroutine, NULL);
  }
  
 -static int nbd_co_send_request(NbdClientSession *s,
 -struct nbd_request *request,
 -QEMUIOVector *qiov, int offset)
 +static int nbd_co_send_request(BlockDriverState *bs,
 +   struct nbd_request *request,
 +   QEMUIOVector *qiov, int offset)
  {
 +NbdClientSession *s = nbd_get_client_session(bs);
  AioContext *aio_context;
  int rc, ret;
  
  qemu_co_mutex_lock(&s->send_mutex);
  s->send_coroutine = qemu_coroutine_self();
 -aio_context = bdrv_get_aio_context(s->bs);
 +aio_context = bdrv_get_aio_context(bs);
  aio_set_fd_handler(aio_context, s->sock,
 -   nbd_reply_ready, nbd_restart_write, s);
 +   nbd_reply_ready, nbd_restart_write, bs);
  if (qiov) {
  if (!s->is_unix) {
  socket_set_cork(s->sock, 1);
 @@ -129,7 +133,7 @@ static int nbd_co_send_request(NbdClientSession *s,
  } else {
  rc = nbd_send_request(s->sock, request);
  }
 -aio_set_fd_handler(aio_context, s->sock, nbd_reply_ready, NULL, s);
 +aio_set_fd_handler(aio_context, s->sock, nbd_reply_ready, NULL, bs);
  s->send_coroutine = NULL;
  qemu_co_mutex_unlock(&s->send_mutex);
  return rc;
 @@ -195,10 +199,11 @@ static void nbd_coroutine_end(NbdClientSession *s,
  }
  }
  
 -static int nbd_co_readv_1(NbdClientSession *client, int64_t sector_num,
 +static int nbd_co_readv_1(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov,
int offset)
  {
 +NbdClientSession *client = nbd_get_client_session(bs);
  struct nbd_request request = { .type = NBD_CMD_READ };
  struct nbd_reply reply;
  ssize_t ret;
 @@ -207,7 +212,7 @@ static int nbd_co_readv_1(NbdClientSession *client, int64_t sector_num,
  request.len = nb_sectors * 512;
  
  nbd_coroutine_start(client, &request);
 -ret = nbd_co_send_request(client, &request, NULL, 0);
 +ret = nbd_co_send_request(bs, &request, NULL, 0);
  if (ret < 0) {
  reply.error = -ret;
  } else {
 @@ -218,15 +223,16 @@ static int nbd_co_readv_1(NbdClientSession *client, int64_t sector_num,
  
  }
  
 -static int nbd_co_writev_1(NbdClientSession *client, int64_t sector_num,
 +static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num,
 int nb_sectors, QEMUIOVector *qiov,
 int offset)
  {
 +NbdClientSession *client = nbd_get_client_session(bs);
  struct nbd_request request = { .type = NBD_CMD_WRITE };
  struct nbd_reply reply;
  ssize_t ret;
  
 -if (!bdrv_enable_write_cache(client->bs) &&
 +if (!bdrv_enable_write_cache(bs) &&
  (client->nbdflags

Re: [Qemu-devel] [PATCH 3/3] iotests: Add test for drive-mirror with NBD target

2015-02-03 Thread Paolo Bonzini



On 02/02/2015 22:40, Max Reitz wrote:
 When the drive-mirror block job is completed, it will call bdrv_swap()
 on the source and the target BDS; this should obviously not result in a
 segmentation fault.
 
 Signed-off-by: Max Reitz mre...@redhat.com
 ---
  tests/qemu-iotests/094 | 81 
 ++
  tests/qemu-iotests/094.out | 11 +++
  tests/qemu-iotests/group   |  1 +
  3 files changed, 93 insertions(+)
  create mode 100755 tests/qemu-iotests/094
  create mode 100644 tests/qemu-iotests/094.out
 
 diff --git a/tests/qemu-iotests/094 b/tests/qemu-iotests/094
 new file mode 100755
 index 000..27a2be2
 --- /dev/null
 +++ b/tests/qemu-iotests/094
 @@ -0,0 +1,81 @@
 +#!/bin/bash
 +#
 +# Test case for drive-mirror to NBD (especially bdrv_swap() on NBD BDS)
 +#
 +# Copyright (C) 2015 Red Hat, Inc.
 +#
 +# This program is free software; you can redistribute it and/or modify
 +# it under the terms of the GNU General Public License as published by
 +# the Free Software Foundation; either version 2 of the License, or
 +# (at your option) any later version.
 +#
 +# This program is distributed in the hope that it will be useful,
 +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 +# GNU General Public License for more details.
 +#
 +# You should have received a copy of the GNU General Public License
 +# along with this program.  If not, see http://www.gnu.org/licenses/.
 +#
 +
 +# creator
 +owner=mre...@redhat.com
 +
 +seq=$(basename $0)
 +echo QA output created by $seq
 +
 +here=$PWD
 +tmp=/tmp/$$
 +status=1 # failure is the default!
 +
 +trap "exit \$status" 0 1 2 3 15
 +
 +# get standard environment, filters and checks
 +. ./common.rc
 +. ./common.filter
 +. ./common.qemu
 +
 +_supported_fmt generic
 +_supported_proto nbd
 +_supported_os Linux
 +_unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat"
 +
 +_make_test_img 64M
 +$QEMU_IMG create -f $IMGFMT "$TEST_DIR/source.$IMGFMT" 64M | _filter_img_create
 +
 +_launch_qemu -drive if=none,id=src,file="$TEST_DIR/source.$IMGFMT",format=raw \
 +             -nodefaults
 +
 +_send_qemu_cmd $QEMU_HANDLE \
 +    "{'execute': 'qmp_capabilities'}" \
 +    'return'
 +
 +# 'format': 'nbd' is not actually correct, but this is probably the only way
 +# to test bdrv_swap() on an NBD BDS
 +_send_qemu_cmd $QEMU_HANDLE \
 +    "{'execute': 'drive-mirror',
 +      'arguments': {'device': 'src',
 +                    'target': '$TEST_IMG',
 +                    'format': 'nbd',
 +                    'sync':'full',
 +                    'mode':'existing'}}" \
 +    'BLOCK_JOB_READY'
 +
 +_send_qemu_cmd $QEMU_HANDLE \
 +    "{'execute': 'block-job-complete',
 +      'arguments': {'device': 'src'}}" \
 +    'BLOCK_JOB_COMPLETE'
 +
 +_send_qemu_cmd $QEMU_HANDLE \
 +    "{'execute': 'quit'}" \
 +    'return'
 +
 +wait=1 _cleanup_qemu
 +
 +_cleanup_test_img
 +rm -f $TEST_DIR/source.$IMGFMT
 +
 +# success, all done
 +echo '*** done'
 +rm -f $seq.full
 +status=0
 diff --git a/tests/qemu-iotests/094.out b/tests/qemu-iotests/094.out
 new file mode 100644
 index 000..b66dc07
 --- /dev/null
 +++ b/tests/qemu-iotests/094.out
 @@ -0,0 +1,11 @@
 +QA output created by 094
 +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 +Formatting 'TEST_DIR/source.IMGFMT', fmt=IMGFMT size=67108864
 +{"return": {}}
 +{"return": {}}
 +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 67108864, "offset": 67108864, "speed": 0, "type": "mirror"}}
 +{"return": {}}
 +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 67108864, "offset": 67108864, "speed": 0, "type": "mirror"}}
 +{"return": {}}
 +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN"}
 +*** done
 diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
 index 4b2b93b..6e2447a 100644
 --- a/tests/qemu-iotests/group
 +++ b/tests/qemu-iotests/group
 @@ -99,6 +99,7 @@
  090 rw auto quick
  091 rw auto
  092 rw auto quick
 +094 rw auto quick
  095 rw auto quick
  097 rw auto backing
  098 rw auto backing quick
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Re: [Qemu-devel] [PATCH v2 0/5] Common unplug and unplug request cb for memory and CPU hot-unplug

2015-02-03 Thread Zhi Yong Wu

Hi,

If you could push the patchset to a branch on GitHub, it would be
convenient for others to do some testing.

On Wed, Jan 28, 2015 at 3:45 PM, Zhu Guihua zhugh.f...@cn.fujitsu.com wrote:
 Memory and CPU hot unplug are both asynchronous procedures.
 When the unplug operation happens, the unplug request callback is called
 first. When the guest OS has finished handling the unplug, the unplug
 callback is called to do the real removal of the device.

 Both need unplug and unplug request callbacks for the PC machine, piix4
 and ich9. So this patchset introduces these common functions as part 1;
 memory and CPU hot-unplug will follow as parts 2 and 3.

 This patch-set is based on QEMU 2.2

 v2:
 - Commit messages changes

 Tang Chen (5):
   acpi, pc: Add hotunplug request cb for pc machine.
   acpi, ich9: Add hotunplug request cb for ich9.
   acpi, pc: Add unplug cb for pc machine.
   acpi, ich9: Add unplug cb for ich9.
   acpi, piix4: Add unplug cb for piix4.

  hw/acpi/ich9.c | 14 ++
  hw/acpi/piix4.c|  8 
  hw/i386/pc.c   | 16 
  hw/isa/lpc_ich9.c  | 14 --
  include/hw/acpi/ich9.h |  4 
  5 files changed, 54 insertions(+), 2 deletions(-)

 --
 1.9.3





-- 
Regards,

Zhi Yong Wu

Re: [Qemu-devel] [PATCH 13/19] libqos/ahci: add ahci command size setters

2015-02-03 Thread Paolo Bonzini



On 02/02/2015 22:09, John Snow wrote:
 
 In this case, only the command header had a utility written for it to
 flip the bits for me. This is part of the FIS, instead, which has no
 explicit flip-on-write mechanism inside of commit.
 
 So, it's correct, but not terribly consistent.
 
 I can write a fis write helper to make this more internally consistent
 about when we handle it for the user and when we don't.

Please do. :)

Paolo

Re: [Qemu-devel] [PATCH v2 1/2] configure: Default to enable module build

2015-02-03 Thread Paolo Bonzini



On 03/02/2015 02:29, Fam Zheng wrote:
  Peter reported that module linking fails on ARM host:
LINK  block/curl.so
  /usr/bin/ld: block/curl.o: relocation R_ARM_THM_MOVW_ABS_NC against
  `__stack_chk_guard' can not be used when making a shared object;
  recompile with -fPIC
  block/curl.o: could not read symbols: Bad value
  collect2: error: ld returned 1 exit status

 I don't see how -fPIC is missed in ARM host :( Does the below patch fix this?

I haven't yet tested on ARM host, hope to do so some time this week.

Paolo

Re: [Qemu-devel] [PATCH 2/3] iotests: Add wait functionality to _cleanup_qemu

2015-02-03 Thread Paolo Bonzini



On 02/02/2015 22:40, Max Reitz wrote:
 The qemu process does not always need to be killed, just waiting for it
 can be fine, too. This introduces a way to do so.
 
 Signed-off-by: Max Reitz mre...@redhat.com
 ---
  tests/qemu-iotests/common.qemu | 12 +++-
  1 file changed, 11 insertions(+), 1 deletion(-)
 
 diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
 index 8e618b5..4e1996c 100644
 --- a/tests/qemu-iotests/common.qemu
 +++ b/tests/qemu-iotests/common.qemu
 @@ -187,13 +187,23 @@ function _launch_qemu()
  
  
  # Silenty kills the QEMU process
 +#
 +# If $wait is set to anything other than the empty string, the process will not
 +# be killed but only waited for, and any output will be forwarded to stdout. If
 +# $wait is empty, the process will be killed and all output will be suppressed.
  function _cleanup_qemu()
  {
  # QEMU_PID[], QEMU_IN[], QEMU_OUT[] all use same indices
  for i in ${!QEMU_OUT[@]}
  do
 -kill -KILL ${QEMU_PID[$i]} 2>/dev/null
 +if [ -z "${wait}" ]; then
 +kill -KILL ${QEMU_PID[$i]} 2>/dev/null
 +fi
  wait ${QEMU_PID[$i]} 2>/dev/null # silent kill
 +if [ -n "${wait}" ]; then
 +cat <&${QEMU_OUT[$i]} | _filter_testdir | _filter_qemu \
 +  | _filter_qemu_io | _filter_qmp
 +fi
  rm -f ${QEMU_FIFO_IN}_${i} ${QEMU_FIFO_OUT}_${i}
  eval "exec ${QEMU_IN[$i]}<&-"   # close file descriptors
  eval "exec ${QEMU_OUT[$i]}<&-"
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Re: [Qemu-devel] [PATCH v2 00/11] cpu: add i386 cpu hot remove support

2015-02-03 Thread Zhi Yong Wu

Hi,

Could you push the patchset to a branch on GitHub? It would be convenient
for others to do some testing.

On Wed, Jan 14, 2015 at 3:44 PM, Zhu Guihua zhugh.f...@cn.fujitsu.com wrote:
 This series is based on chen fan's previous i386 cpu hot remove patchset:
 https://lists.nongnu.org/archive/html/qemu-devel/2013-12/msg04266.html

 This is done by implementing the standard ACPI method _EJ0 in the ACPI
 table: after the guest OS removes an online vCPU, the firmware writes the
 removal bitmap to QEMU, so QEMU knows which vCPU to notify to exit.
 Meanwhile, introduce 'device_del' support for removing the vCPU from
 QEMU itself.

 The whole work is based on the new hot plug/unplug framework: the unplug
 request callback does the pre-check and sends the request, and the unplug
 callback does the removal handling.

 This series depends on tangchen's common hot plug/unplug enhancement patchset.
 [RESEND PATCH v1 0/5] Common unplug and unplug request cb for memory and CPU hot-unplug
 https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg00429.html

 This is the second half of the previous series:
 [RFC V2 00/10] cpu: add device_add foo-x86_64-cpu and i386 cpu hot remove support
 https://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04779.html

 If you want to test the series, you need to apply the 'device_add foo-x86_64-cpu'
 patchset first:
 [PATCH v3 0/7] cpu: add device_add foo-x86_64-cpu support
 https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg01552.html

 ---
 Changelog since v1:
  -rebase on the latest version.
  -delete patch "i386/cpu: add instance finalize callback", and put it into patchset
   [PATCH v3 0/6] cpu: add device_add foo-x86_64-cpu support.

 Changelog since RFC:
  -split the i386 cpu hot remove into a separate thread.
  -replaced apic_no with apic_id, so does the related stuff to make it
   work with arbitrary CPU hotadd.
  -add the icc_device_unrealize callback to handle apic unrealize.
  -rework on the new hot plug/unplug platform.
 ---

 Chen Fan (2):
   x86: add x86_cpu_unrealizefn() for cpu apic remove
   cpu hotplug: implement function cpu_status_write() for vcpu ejection

 Gu Zheng (5):
   acpi/cpu: add cpu hot unplug request callback function
   acpi/piix4: add cpu hot unplug callback support
   acpi/ich9: add cpu hot unplug support
   pc: add cpu hot unplug callback support
   cpus: reclaim allocated vCPU objects

 Zhu Guihua (4):
   acpi/piix4: add cpu hot unplug request callback support
   acpi/ich9: add cpu hot unplug request callback support
   pc: add cpu hot unplug request callback support
   acpi/cpu: add cpu hot unplug callback function

  cpus.c| 44 
  hw/acpi/cpu_hotplug.c | 88 
 ---
  hw/acpi/ich9.c| 17 ++--
  hw/acpi/piix4.c   | 12 +-
  hw/core/qdev.c|  2 +-
  hw/cpu/icc_bus.c  | 11 +
  hw/i386/acpi-dsdt-cpu-hotplug.dsl |  6 ++-
  hw/i386/kvm/apic.c|  8 
  hw/i386/pc.c  | 62 +--
  hw/intc/apic.c| 10 +
  hw/intc/apic_common.c | 21 ++
  include/hw/acpi/cpu_hotplug.h |  8 
  include/hw/cpu/icc_bus.h  |  1 +
  include/hw/i386/apic_internal.h   |  1 +
  include/hw/qdev-core.h|  1 +
  include/qom/cpu.h |  9 
  include/sysemu/kvm.h  |  1 +
  kvm-all.c | 57 -
  target-i386/cpu.c | 46 
  19 files changed, 378 insertions(+), 27 deletions(-)

 --
 1.9.3





-- 
Regards,

Zhi Yong Wu
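The bitmap handshake described in this cover letter can be sketched as follows. The sketch assumes a 64-bit presence mask indexed by vCPU and a guest write in which a set bit requests ejection. That encoding and the function handle_eject_bitmap() are assumptions made for illustration. The series' actual cpu_status_write() and its register layout may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: bit i set means the vCPU with index i is present. */
static uint64_t cpus_present;

/* Hypothetical handler for the guest's write of an eject bitmap
 * (bit set = "this vCPU may be removed").  Returns the vCPUs that
 * should now be told to exit; bits for absent vCPUs are ignored. */
static uint64_t handle_eject_bitmap(uint64_t eject_bitmap)
{
    uint64_t to_remove = eject_bitmap & cpus_present;

    assert((to_remove & ~cpus_present) == 0);
    cpus_present &= ~to_remove;
    return to_remove;
}
```

Masking against the presence bitmap means that a stray or repeated write from the guest cannot trigger removal of a vCPU that does not exist.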

Re: [Qemu-devel] [question] the patch which affect performance of virtio-scsi

2015-02-03 Thread Paolo Bonzini



On 03/02/2015 03:31, Wangting (Kathy) wrote:
 Hi Paolo,
 
 Recently I tested IO performance with virtio-scsi, and found that the patch
 "virtio-scsi: add support for the any_layout feature" affects IO performance
 for a 4KB sequential read workload with iodepth 32.
 
 Why were cdb and sense removed from the VirtIOSCSICmdReq and
 VirtIOSCSICmdResp structs?

Because I could not find any other way to implement ANY_LAYOUT, and
virtio 1.0 requires that feature.

The performance however was already improved in commit faf1e1f
(virtio-scsi: Optimize virtio_scsi_init_req, 2014-09-16).

Paolo

 How do you assess the performance impact of these changes?
 Although the latest version of qemu can recover performance by merging
 reads, I think the effect of this patch cannot be ignored.

Re: [Qemu-devel] [RESEND PATCH v1 00/13] QEmu memory hot unplug support.

2015-02-03 Thread Zhi Yong Wu

Hi,

Could you push the patchset to a branch on GitHub? It would be convenient
for others to do some testing.

On Thu, Jan 8, 2015 at 9:06 AM, Tang Chen tangc...@cn.fujitsu.com wrote:
 Memory hot unplug is an asynchronous procedure.
 When the unplug operation happens, the unplug request callback is called
 first. When the guest OS has finished handling the unplug, the unplug
 callback is called to do the real removal of the device.

 This patch-set is based on QEMU 2.2

 This series depends on the following patchset.
 [PATCH] Common unplug and unplug request cb for memory and CPU hot-unplug.
 https://www.mail-archive.com/qemu-devel@nongnu.org/msg272745.html

 Hu Tao (2):
   acpi, piix4: Add memory hot unplug request support for piix4.
   pc, acpi bios: Add memory hot unplug interface.

 Tang Chen (11):
   acpi, mem-hotplug: Use PC_DIMM_SLOT_PROP in acpi_memory_plug_cb().
   acpi, mem-hotplug: Add acpi_memory_get_slot_status_descriptor() to get
 MemStatus.
   acpi, mem-hotplug: Add acpi_memory_hotplug_sci() to rise sci for
 memory hotplug.
   acpi, mem-hotplug: Add unplug request cb for memory device.
   acpi, ich9: Add memory hot unplug request support for ich9.
   pc-dimm: Add memory hot unplug request support for pc-dimm.
   acpi, mem-hotplug: Add unplug cb for memory device.
   acpi, piix4: Add memory hot unplug support for piix4.
   acpi, ich9: Add memory hot unplug support for ich9.
   pc-dimm: Add memory hot unplug support for pc-dimm.
   acpi: Add hardware implementation for memory hot unplug.

  docs/specs/acpi_mem_hotplug.txt   |  8 +++-
  hw/acpi/ich9.c| 20 ++--
  hw/acpi/memory_hotplug.c  | 97 
 ---
  hw/acpi/piix4.c   | 18 ++--
  hw/core/qdev.c|  2 +-
  hw/i386/acpi-dsdt-mem-hotplug.dsl | 11 -
  hw/i386/pc.c  | 53 +++--
  hw/i386/ssdt-mem.dsl  |  5 ++
  include/hw/acpi/memory_hotplug.h  |  6 +++
  include/hw/acpi/pc-hotplug.h  |  2 +
  include/hw/qdev-core.h|  1 +
  11 files changed, 192 insertions(+), 31 deletions(-)

 --
 1.8.4.2





-- 
Regards,

Zhi Yong Wu

Re: [Qemu-devel] [question] the patch which affect performance of virtio-scsi

2015-02-03 Thread Paolo Bonzini

On 03/02/2015 03:56, Wangting (Kathy) wrote:
 Sorry, I found that the patch "virtio-scsi: Optimize virtio_scsi_init_req"
 solves this problem.

Great that you could confirm that. :)

 By the way, can you tell me the reason for the change to cdb and sense?

cdb and sense are variable-size items.  ANY_LAYOUT support changed
VirtIOSCSIReq: instead of having a pointer to the request, it copies the
request from guest memory into VirtIOSCSIReq.  This is required because
the request might not be contiguous in guest memory.  And because the
request and response headers (e.g. VirtIOSCSICmdReq and
VirtIOSCSICmdResp) are included by value in VirtIOSCSIReq, the
variable-sized fields have to be treated specially.

Only one of them can remain in VirtIOSCSIReq, because you cannot have a
flexible array member (e.g. uint8_t sense[];) in the middle of a struct.

cdb is always used, so it is chosen for the variable-sized part of
VirtIOSCSIReq: cdb was simply moved from VirtIOSCSICmdReq to VirtIOSCSIReq.

Instead, requests that complete with sense data are not a fast path.
Hence sense is retrieved from the SCSIRequest, and
virtio_scsi_command_complete copies it into the guest buffer via
scsi_req_get_sense + qemu_iovec_from_buf.

Paolo
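The layout constraint Paolo describes can be shown in a reduced example. The names CmdReq, CmdResp and Req, and every field other than cdb, are simplified stand-ins rather than the real VirtIOSCSICmdReq, VirtIOSCSICmdResp and VirtIOSCSIReq definitions. ISO C allows only one flexible array member, as the last member of a struct. It also forbids embedding a struct that ends in one inside another struct. Once both headers are held by value, only the outer request can carry a variable-sized tail.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified headers; only the placement of the
 * variable-sized data matters here. */
typedef struct {
    uint8_t lun[8];
    uint64_t tag;
    /* cdb[] no longer lives here */
} CmdReq;

typedef struct {
    uint32_t sense_len;
    uint8_t status;
    /* sense[] no longer lives here; it comes from the SCSIRequest */
} CmdResp;

typedef struct {
    CmdReq req;      /* copied by value from guest memory */
    CmdResp resp;    /* filled in by value, copied back on completion */
    uint8_t cdb[];   /* the single flexible array member, last */
} Req;

/* One zeroed allocation holds the fixed part plus cdb_size CDB bytes. */
static Req *req_alloc(size_t cdb_size)
{
    return calloc(1, sizeof(Req) + cdb_size);
}
```

Because every command has a CDB, keeping cdb in the tail costs nothing on the fast path. Sense data, which only some completions have, can be fetched from the SCSI layer when needed.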

Re: [Qemu-devel] [PATCH 2/2] bootdevice: add check in restore_boot_order()

2015-02-03 Thread Gonglei

On 2015/2/3 15:49, Markus Armbruster wrote:

 You're right.  pc.c's set_boot_dev() fails when its boot order argument
 is invalid.
 
 The boot order interface is crap, because it makes detecting
 configuration errors early hard.  Two solutions:
 
 A. It may be hard, but not too hard for the determined
 
1. If once is given, register reset handler to restore boot order.
 
2. Pass the normal boot order to machine creation.  Should fail when
the normal boot order is invalid.
 
3. If once is given, set it with qemu_boot_set().  Fails when the
once boot order is invalid.
 
4. Start the machine.
 
5. On reset, the reset handler calls qemu_boot_set() to restore boot
order.  Should never fail.
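The five steps above can be sketched with a toy boot-order model. Everything in it is hypothetical and not QEMU's bootdevice.c API: the validity rule (letters a..p, no repeats), the single reset-handler slot and all function names. What matters is that every step that can fail runs before the machine starts. The reset handler only re-applies an order that was already validated.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model; none of these names are QEMU APIs. */
static char boot_order[17];            /* at most 16 distinct devices + NUL */
static void (*reset_handler)(void *);  /* one registered reset handler */
static void *reset_opaque;

/* Validate, then set. An invalid order leaves boot_order untouched. */
static bool boot_set(const char *order)
{
    unsigned seen = 0;
    size_t n = 0;

    for (const char *p = order; *p; p++, n++) {
        if (*p < 'a' || *p > 'p' || (seen & (1u << (*p - 'a')))) {
            return false;              /* bad letter or duplicate */
        }
        seen |= 1u << (*p - 'a');
    }
    memcpy(boot_order, order, n + 1);  /* n <= 16, so this fits */
    return true;
}

/* Step 5: restore the normal order on the first reset, then unregister. */
static void restore_boot_order(void *opaque)
{
    char *normal = opaque;

    boot_set(normal);                  /* validated in step 2: cannot fail */
    reset_handler = NULL;
    reset_opaque = NULL;
    free(normal);
}

/* Steps 1-4; returning false means "report the error and exit". */
static bool machine_setup(const char *normal, const char *once)
{
    if (once) {                                   /* step 1 */
        char *copy = malloc(strlen(normal) + 1);
        if (!copy) {
            return false;
        }
        strcpy(copy, normal);
        reset_handler = restore_boot_order;
        reset_opaque = copy;
    }
    if (!boot_set(normal)) {                      /* step 2 */
        return false;
    }
    if (once && !boot_set(once)) {                /* step 3 */
        return false;
    }
    return true;                                  /* step 4: start */
}

static void system_reset(void)
{
    if (reset_handler) {
        reset_handler(reset_opaque);
    }
}
```

In this sketch, a false return from machine_setup() stands for main() reporting the error and exiting. The string copied in step 1 is therefore left for process exit to reclaim.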
 

What about the below patch?

diff --git a/vl.c b/vl.c
index 983259b..7d37191 100644
--- a/vl.c
+++ b/vl.c
@@ -126,6 +126,7 @@ int main(int argc, char **argv)

 static const char *data_dir[16];
 static int data_dir_idx;
+const char *once = NULL;
 const char *bios_name = NULL;
 enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB;
 DisplayType display_type = DT_DEFAULT;
@@ -4046,7 +4047,7 @@ int main(int argc, char **argv, char **envp)
 opts = qemu_opts_find(qemu_find_opts("boot-opts"), NULL);
 if (opts) {
 char *normal_boot_order;
-const char *order, *once;
+const char *order;
 Error *local_err = NULL;

 order = qemu_opt_get(opts, "order");
@@ -4067,7 +4068,6 @@ int main(int argc, char **argv, char **envp)
 exit(1);
 }
 normal_boot_order = g_strdup(boot_order);
-boot_order = once;
 qemu_register_reset(restore_boot_order, normal_boot_order);
 }

@@ -4246,6 +4246,15 @@ int main(int argc, char **argv, char **envp)

 net_check_clients();

+if (once) {
+Error *local_err = NULL;
+qemu_boot_set(once, &local_err);
+if (local_err) {
+error_report("%s", error_get_pretty(local_err));
+exit(1);
+}
+}
+

Regards,
-Gonglei

 B. Fix the crappy interface
 
Separate parameter validation from the actual action.  Only
validation may fail.  Validate before starting the guest.
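One way to read option B, sketched with hypothetical names: a validator that may fail and says why, and an applier that returns nothing because its input was checked up front. The a..p / no-duplicates rule and the error strings are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical state standing in for the machine's boot order. */
static char applied_order[17];

/* Validation only: reports the problem and changes no state. */
static bool boot_order_validate(const char *order, char *err, size_t errlen)
{
    unsigned seen = 0;

    for (const char *p = order; *p; p++) {
        if (*p < 'a' || *p > 'p') {
            snprintf(err, errlen, "Invalid boot device '%c'", *p);
            return false;
        }
        if (seen & (1u << (*p - 'a'))) {
            snprintf(err, errlen, "Boot device '%c' was given twice", *p);
            return false;
        }
        seen |= 1u << (*p - 'a');
    }
    return true;
}

/* Action only: the caller guarantees `order` already passed validation. */
static void boot_order_apply(const char *order)
{
    snprintf(applied_order, sizeof(applied_order), "%s", order);
}
```

Under this split, main() would call boot_order_validate() on both the normal order and the "once" order before starting the guest. The reset path would call only boot_order_apply().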
 
 * validate_bootdevices() fails

   Should never happen, because we've called it in main() already,
   treating failure as fatal error.

 Yes.



 * boot_set_handler is null

   MachineClass method init() may set this.  main() could *easily* test
   whether it did!  If it didn't, and -boot once is given, error out.
   Similar checks exist already, e.g. drive_check_orphaned(),
   net_check_clients().  They only warn, but that's detail.

 I agree, just need to report the error message.

 Regards,
 -Gonglei

Re: [Qemu-devel] [PATCH 17/19] qtest/ahci: Add a macro bootup routine

2015-02-03 Thread Paolo Bonzini



On 02/02/2015 22:12, John Snow wrote:
 
 It comes in handy later for testing migration so I don't have to do a
 lot of boilerplate for each instance, though it is just a convenience
 subroutine with no logic of its own.
 
 I like to cut down on boilerplate as much as possible to expose the
 logic of the test as much as possible.
 
 Have a suggestion for a better name, or are you very adamant about
 culling it?

I'm adamant about culling it because I don't have a suggestion for a
better name.

In the long run, I think we should just have a qos_boot function that
does everything including PCI scanning, mapping BARs and initializing
devices.  But we're of course very far from that.

Paolo

Re: [Qemu-devel] [PATCH 0/4] nbd: iotest fixes and error message improvement

2015-02-03 Thread Kevin Wolf

On 27.01.2015 at 03:02, Max Reitz wrote:
 This series is a follow-up to my previous patch "iotests: Specify format
 for qemu-nbd" and as such relies on it.
 
 The first three patches of this series fix the qemu-iotests so they once
 again pass when using NBD.
 
 The fourth patch of this series improves NBD's error message for
 establishing connections, especially if the server's and the client's
 NBD versions differ (which, until now, was simply "Bad magic received").

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [RFC PATCH v8 11/21] replay: recording and replaying clock ticks

2015-02-03 Thread Pavel Dovgaluk

 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 On 03/02/2015 11:51, Pavel Dovgaluk wrote:
  From: Paolo Bonzini [mailto:pbonz...@redhat.com]
  On 22/01/2015 09:52, Pavel Dovgalyuk wrote:
  Clock ticks are considered as the sources of non-deterministic data for
  virtual machine. This patch implements saving the clock values when they
  are acquired (virtual, host clock, rdtsc, and some other timers).
  When replaying the execution, the corresponding values are read from the log
  and transferred to the module that wants to read them.
  Such a design requires the clock polling to be synchronized. Sometimes
  it is not, e.g. when timeouts for timer lists are checked. In this case
  we use a cached value of the clock, passing it to the client code.

  Signed-off-by: Pavel Dovgalyuk pavel.dovga...@ispras.ru
  ---
   cpus.c   |3 +-
   include/qemu/timer.h |   10 +
   qemu-timer.c |7 ++--
   replay/Makefile.objs |1 +
   replay/replay-internal.h |   13 +++
   replay/replay-time.c |   84 ++
   replay/replay.h  |   25 ++
   stubs/replay.c   |9 +
   8 files changed, 147 insertions(+), 5 deletions(-)
   create mode 100755 replay/replay-time.c

  diff --git a/cpus.c b/cpus.c
  index 8787277..01d89aa 100644
  --- a/cpus.c
  +++ b/cpus.c
  @@ -353,7 +353,8 @@ static void icount_warp_rt(void *opaque)

   seqlock_write_lock(timers_state.vm_clock_seqlock);
   if (runstate_is_running()) {
  -int64_t clock = cpu_get_clock_locked();
  +int64_t clock = REPLAY_CLOCK(REPLAY_CLOCK_VIRTUAL_RT,
  + cpu_get_clock_locked());
   int64_t warp_delta;

   warp_delta = clock - vm_clock_warp_start;
  diff --git a/include/qemu/timer.h b/include/qemu/timer.h
  index 0666920..0c2472c 100644
  --- a/include/qemu/timer.h
  +++ b/include/qemu/timer.h
  @@ -4,6 +4,7 @@
   #include "qemu/typedefs.h"
   #include "qemu-common.h"
   #include "qemu/notify.h"
  +#include "replay/replay.h"

   /* timers */

  @@ -760,6 +761,8 @@ int64_t cpu_icount_to_ns(int64_t icount);
   /***/
   /* host CPU ticks (if available) */

  +#define cpu_get_real_ticks cpu_get_real_ticks_impl
  +
   #if defined(_ARCH_PPC)

   static inline int64_t cpu_get_real_ticks(void)
  @@ -913,6 +916,13 @@ static inline int64_t cpu_get_real_ticks (void)
   }
   #endif

  +#undef cpu_get_real_ticks
  +
  +static inline int64_t cpu_get_real_ticks(void)

  cpu_get_real_ticks should never be used.  Please instead wrap
  cpu_get_ticks() with REPLAY_CLOCK.

  I don't quite understand this comment.
  Do you mean that I should move REPLAY_CLOCK to the cpu_get_real_ticks usages instead of its implementation?

 Only to the cpu_get_ticks usage.  The others are okay.

cpu_get_ticks does not call cpu_get_real_ticks in icount mode, but the other
functions can. So should we put REPLAY_CLOCK into those functions instead?
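For readers following the REPLAY_CLOCK discussion, here is a minimal sketch of the record/replay clock-wrapping idea. It assumes an in-memory array in place of the replay log file. All names (SKETCH_REPLAY_CLOCK, sketch_save_clock, sketch_read_clock) are hypothetical and do not follow the patch's REPLAY_CLOCK(kind, value) signature. The sketch only shows where such a wrapper sits: around a call site whose value must be reproduced. In replay mode the wrapped expression is not evaluated at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record/replay state; not the replay/replay.h API. */
typedef enum { SKETCH_MODE_NONE, SKETCH_MODE_RECORD, SKETCH_MODE_PLAY } SketchMode;

#define SKETCH_LOG_MAX 64
static SketchMode sketch_mode = SKETCH_MODE_NONE;
static int64_t sketch_log[SKETCH_LOG_MAX];    /* stands in for the log file */
static size_t sketch_log_len, sketch_log_pos;

/* Record mode: append the real clock value to the log and pass it through. */
static int64_t sketch_save_clock(int64_t value)
{
    if (sketch_log_len < SKETCH_LOG_MAX) {
        sketch_log[sketch_log_len++] = value;
    }
    return value;
}

/* Play mode: return the next logged value (0 if the log is exhausted;
 * a real implementation would report an error instead). */
static int64_t sketch_read_clock(void)
{
    return sketch_log_pos < sketch_log_len ? sketch_log[sketch_log_pos++] : 0;
}

/* Wrap a clock read. In play mode `expr` is never evaluated, so the real,
 * non-deterministic clock is not consulted. */
#define SKETCH_REPLAY_CLOCK(expr)                                  \
    (sketch_mode == SKETCH_MODE_PLAY ? sketch_read_clock()         \
     : sketch_mode == SKETCH_MODE_RECORD ? sketch_save_clock(expr) \
     : (expr))
```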

  +/*! Reads next clock value from the file.
  +If clock kind read from the file is different from the parameter,
  +the value is not used.
  +If the parameter is -1, the clock value is read to the cache anyway. */

  In what case could the clock kind not match?

  It was used in the full version, which had to skip clock values in the log
  while loading the VM state.

 So can it be removed for now?

I think it can.

Pavel Dovgalyuk

Re: [Qemu-devel] [PATCH 1/3] softfloat: Expand out the STATUS_PARAM macro

2015-02-03 Thread Peter Maydell

On 2 February 2015 at 21:37, Richard Henderson r...@twiddle.net wrote:
 On 02/02/2015 12:31 PM, Peter Maydell wrote:
 -void float_raise( int8 flags STATUS_PARAM )
 +void float_raise(int8 flags , float_status *status)

 Extra space before comma.

Thanks, fixed. I don't propose to send out a respin just for that.

-- PMM

Re: [Qemu-devel] [PATCH v2 0/5] vhost-scsi: support to assign boot order

2015-02-03 Thread Gonglei

On 2015/2/3 19:11, Paolo Bonzini wrote:

 
 
 On 03/02/2015 09:55, Gonglei wrote:
 On 2015/1/29 15:08, Gonglei (Arei) wrote:

 From: Gonglei arei.gong...@huawei.com

 Qemu doesn't provide a bootindex property for the vhost-scsi device,
 so we cannot currently assign a boot order to it. But
 some clients/users need that in some scenarios.
 This patch implements it on the Qemu side.

 Because Qemu only accepts a wwpn argument for vhost-scsi, we
 cannot assign a tpgt. That is, the tpg is transparent to Qemu: Qemu
 doesn't know which tpg can boot, and neither does the vhost-scsi
 driver module for an assigned wwpn.
 
 At present, we assume that only the first tpg can boot, and add
 a boot_tpgt property that defaults to 0. Of course, people can
 pass a valid value on the Qemu command line.


 Ping...
 
 Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Thanks :)

Regards,
-Gonglei

Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard

2015-02-03 Thread Peter Lieven

On 03.02.2015 at 08:31, Denis V. Lunev wrote:
 On 02/02/15 23:46, Denis V. Lunev wrote:
 On 02/02/15 23:40, Peter Lieven wrote:
 On 02.02.2015 at 21:09, Denis V. Lunev wrote:
 qemu_gluster_co_discard calculates size to discard as follows
  size_t size = nb_sectors * BDRV_SECTOR_SIZE;
  ret = glfs_discard_async(s->fd, offset, size, gluster_finish_aiocb, acb);

 glfs_discard_async is declared as follows:
int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent,
glfs_io_cbk fn, void *data) __THROW
 This is problematic on i686 as sizeof(size_t) == 4.

 Set bl_max_discard to SIZE_MAX >> BDRV_SECTOR_BITS to avoid overflow
 on i386.

 Signed-off-by: Denis V. Lunev d...@openvz.org
 CC: Kevin Wolf kw...@redhat.com
 CC: Peter Lieven p...@kamp.de
 ---
   block/gluster.c | 9 +
   1 file changed, 9 insertions(+)

 diff --git a/block/gluster.c b/block/gluster.c
 index 1eb3a8c..8a8c153 100644
 --- a/block/gluster.c
 +++ b/block/gluster.c
 @@ -622,6 +622,11 @@ out:
   return ret;
   }
   +static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
 +{
 +bs->bl.max_discard = MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX);
 +}
 +
 Looking at the gluster code, bl.max_transfer_length should have the same
 limit, but that's a different patch.
 ha, the same applies to nbd code too.

 I'll do this stuff tomorrow and also I think that some
 audit in other drivers could reveal something interesting.

 Den
 ok. The situation is well rotten here on i686.

 The problem comes from the fact that QEMUIOVector
 and iovec uses size_t as length. All API calls use
 this abstraction. Thus all conversion operations
 from nr_sectors to size could bang at any moment.

 Putting dirty hands here is problematic from my point
 of view. Should we really care about this? 32bit
 applications are becoming old good history of IT...

The host has to be 32bit to be in trouble. And at least if we have KVM the host
has to support long mode.

I have it on my todo list to add generic code for honouring bl.max_transfer_length
in block.c. We could change the default maximum for bl.max_transfer_length
from INT_MAX to SIZE_MAX >> BDRV_SECTOR_BITS.
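The 32-bit overflow can be made concrete with a small sketch. It assumes 512-byte sectors (BDRV_SECTOR_BITS == 9) and models a 32-bit size_t with uint32_t, so the result is the same on any host. The helper names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_SECTOR_BITS 9                    /* 512-byte sectors */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Byte count as a 32-bit size_t would hold it: wraps modulo 2^32. */
static uint32_t bytes_on_32bit_host(uint32_t nb_sectors)
{
    return nb_sectors << SKETCH_SECTOR_BITS;
}

/* Largest sector count whose byte size fits both a 32-bit size_t and an
 * int, mirroring MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX). */
static uint32_t max_sectors_32bit(void)
{
    return SKETCH_MIN(UINT32_MAX >> SKETCH_SECTOR_BITS, (uint32_t)INT32_MAX);
}
```

With these definitions, the limit is 8388607 sectors, which maps to 0xFFFFFE00 bytes. One more sector (8388608) wraps to 0 bytes, which is the silent truncation the max_discard limit is meant to prevent.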

Peter

[Qemu-devel] [PATCH] libcacard: stop linking against every single 3rd party library

2015-02-03 Thread Daniel P. Berrange

Building QEMU results in a libcacard.so that links against
practically the entire world:

linux-vdso.so.1 =>  (0x7fff71e99000)
libssl3.so => /usr/lib64/libssl3.so (0x7f49f94b6000)
libsmime3.so => /usr/lib64/libsmime3.so (0x7f49f928e000)
libnss3.so => /usr/lib64/libnss3.so (0x7f49f8f67000)
libnssutil3.so => /usr/lib64/libnssutil3.so (0x7f49f8d3b000)
libplds4.so => /usr/lib64/libplds4.so (0x7f49f8b36000)
libplc4.so => /usr/lib64/libplc4.so (0x7f49f8931000)
libnspr4.so => /usr/lib64/libnspr4.so (0x7f49f86f2000)
libdl.so.2 => /usr/lib64/libdl.so.2 (0x7f49f84ed000)
libm.so.6 => /usr/lib64/libm.so.6 (0x7f49f81e5000)
libgthread-2.0.so.0 => /usr/lib64/libgthread-2.0.so.0 (0x7f49f7fe3000)
librt.so.1 => /usr/lib64/librt.so.1 (0x7f49f7dda000)
libz.so.1 => /usr/lib64/libz.so.1 (0x7f49f7bc4000)
libcap-ng.so.0 => /usr/lib64/libcap-ng.so.0 (0x7f49f79be000)
libuuid.so.1 => /usr/lib64/libuuid.so.1 (0x7f49f77b8000)
libgnutls.so.28 => /usr/lib64/libgnutls.so.28 (0x7f49f749a000)
libSDL-1.2.so.0 => /usr/lib64/libSDL-1.2.so.0 (0x7f49f71fd000)
libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x7f49f6fe)
libvte.so.9 => /usr/lib64/libvte.so.9 (0x7f49f6d3f000)
libXext.so.6 => /usr/lib64/libXext.so.6 (0x7f49f6b2d000)
libgtk-x11-2.0.so.0 => /usr/lib64/libgtk-x11-2.0.so.0 (0x7f49f64a)
libgdk-x11-2.0.so.0 => /usr/lib64/libgdk-x11-2.0.so.0 (0x7f49f61de000)
libpangocairo-1.0.so.0 => /usr/lib64/libpangocairo-1.0.so.0 (0x7f49f5fd1000)
libatk-1.0.so.0 => /usr/lib64/libatk-1.0.so.0 (0x7f49f5daa000)
libcairo.so.2 => /usr/lib64/libcairo.so.2 (0x7f49f5a9d000)
libgdk_pixbuf-2.0.so.0 => /usr/lib64/libgdk_pixbuf-2.0.so.0 (0x7f49f5878000)
libgio-2.0.so.0 => /usr/lib64/libgio-2.0.so.0 (0x7f49f550)
libpangoft2-1.0.so.0 => /usr/lib64/libpangoft2-1.0.so.0 (0x7f49f52eb000)
libpango-1.0.so.0 => /usr/lib64/libpango-1.0.so.0 (0x7f49f50a)
libgobject-2.0.so.0 => /usr/lib64/libgobject-2.0.so.0 (0x7f49f4e4e000)
libglib-2.0.so.0 => /usr/lib64/libglib-2.0.so.0 (0x7f49f4b15000)
libfontconfig.so.1 => /usr/lib64/libfontconfig.so.1 (0x7f49f48d6000)
libfreetype.so.6 => /usr/lib64/libfreetype.so.6 (0x7f49f462b000)
libX11.so.6 => /usr/lib64/libX11.so.6 (0x7f49f42e8000)
libxenstore.so.3.0 => /usr/lib64/libxenstore.so.3.0 (0x7f49f40de000)
libxenctrl.so.4.4 => /usr/lib64/libxenctrl.so.4.4 (0x7f49f3eb6000)
libxenguest.so.4.4 => /usr/lib64/libxenguest.so.4.4 (0x7f49f3c8b000)
libseccomp.so.2 => /usr/lib64/libseccomp.so.2 (0x7f49f3a74000)
librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x7f49f385d000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x7f49f364a000)
libutil.so.1 => /usr/lib64/libutil.so.1 (0x7f49f3447000)
libc.so.6 => /usr/lib64/libc.so.6 (0x7f49f3089000)
/lib64/ld-linux-x86-64.so.2 (0x7f49f9902000)
libp11-kit.so.0 => /usr/lib64/libp11-kit.so.0 (0x7f49f2e23000)
libtspi.so.1 => /usr/lib64/libtspi.so.1 (0x7f49f2bb2000)
libtasn1.so.6 => /usr/lib64/libtasn1.so.6 (0x7f49f299f000)
libnettle.so.4 => /usr/lib64/libnettle.so.4 (0x7f49f276d000)
libhogweed.so.2 => /usr/lib64/libhogweed.so.2 (0x7f49f2545000)
libgmp.so.10 => /usr/lib64/libgmp.so.10 (0x7f49f22cd000)
libncurses.so.5 => /usr/lib64/libncurses.so.5 (0x7f49f20a5000)
libtinfo.so.5 => /usr/lib64/libtinfo.so.5 (0x7f49f1e7a000)
libgmodule-2.0.so.0 => /usr/lib64/libgmodule-2.0.so.0 (0x7f49f1c76000)
libXfixes.so.3 => /usr/lib64/libXfixes.so.3 (0x7f49f1a6f000)
libXrender.so.1 => /usr/lib64/libXrender.so.1 (0x7f49f1865000)
libXinerama.so.1 => /usr/lib64/libXinerama.so.1 (0x7f49f1662000)
libXi.so.6 => /usr/lib64/libXi.so.6 (0x7f49f1452000)
libXrandr.so.2 => /usr/lib64/libXrandr.so.2 (0x7f49f1247000)
libXcursor.so.1 => /usr/lib64/libXcursor.so.1 (0x7f49f103c000)
libXcomposite.so.1 => /usr/lib64/libXcomposite.so.1 (0x7f49f0e39000)
libXdamage.so.1 => /usr/lib64/libXdamage.so.1 (0x7f49f0c35000)
libharfbuzz.so.0 => /usr/lib64/libharfbuzz.so.0 (0x7f49f09dd000)
libpixman-1.so.0 => /usr/lib64/libpixman-1.so.0 (0x7f49f072f000)
libEGL.so.1 => /usr/lib64/libEGL.so.1 (0x7f49f0505000)
libpng16.so.16 => /usr/lib64/libpng16.so.16 (0x7f49f02d2000)
libxcb-shm.so.0 => /usr/lib64/libxcb-shm.so.0 (0x7f49f00cd000)
libxcb-render.so.0 => /usr/lib64/libxcb-render.so.0 (0x7f49efec3000)
libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x7f49efca1000)
libGL.so.1 => /usr/lib64/libGL.so.1 (0x7f49efa06000)

Re: [Qemu-devel] [PATCH v2 00/11] target-arm: handle mmu_idx/translation regimes properly

2015-02-03 Thread Peter Maydell

On 29 January 2015 at 18:55, Peter Maydell peter.mayd...@linaro.org wrote:
 This patchseries fixes up our somewhat broken handling of mmu_idx values:
  * implement the full set of 7 mmu_idxes we need for supporting EL2 and EL3
  * pass the mmu_idx in the TB flags rather than EL or a priv flag,
so we can generate code with the correct kind of access
  * identify the correct mmu_idx to use for AT/ATS system insns
  * pass mmu_idx into get_phys_addr() and use it within that family
of functions as an indication of which translation regime to do
a v-to-p lookup for, instead of relying on an is_user flag plus the
current CPU state
  * some minor indent stuff on the end

 It does not contain:
  * complete support for EL2 or 64-bit EL3; in some places I have added
the code where it was obvious and easy; in others I have just left
TODO marker comments
  * the 'tlb_flush_for_mmuidx' functionality I proposed in a previous mail;
I preferred to get the semantics right in this patchset first before
improving the efficiency later

I'm planning to put this series into my next target-arm pull,
sometime tail end of the week.

-- PMM

Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard

2015-02-03 Thread Kevin Wolf

On 03.02.2015 at 12:30, Peter Lieven wrote:
 On 03.02.2015 at 08:31, Denis V. Lunev wrote:
  On 02/02/15 23:46, Denis V. Lunev wrote:
  On 02/02/15 23:40, Peter Lieven wrote:
  On 02.02.2015 at 21:09, Denis V. Lunev wrote:
  qemu_gluster_co_discard calculates size to discard as follows
   size_t size = nb_sectors * BDRV_SECTOR_SIZE;
   ret = glfs_discard_async(s->fd, offset, size, gluster_finish_aiocb, acb);
 
  glfs_discard_async is declared as follows:
 int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent,
 glfs_io_cbk fn, void *data) __THROW
  This is problematic on i686 as sizeof(size_t) == 4.
 
  Set bl_max_discard to SIZE_MAX >> BDRV_SECTOR_BITS to avoid overflow
  on i386.
 
  Signed-off-by: Denis V. Lunev d...@openvz.org
  CC: Kevin Wolf kw...@redhat.com
  CC: Peter Lieven p...@kamp.de
  ---
block/gluster.c | 9 +
1 file changed, 9 insertions(+)
 
  diff --git a/block/gluster.c b/block/gluster.c
  index 1eb3a8c..8a8c153 100644
  --- a/block/gluster.c
  +++ b/block/gluster.c
  @@ -622,6 +622,11 @@ out:
return ret;
}
+static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
  +{
  +bs->bl.max_discard = MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX);
  +}
  +
  Looking at the gluster code, bl.max_transfer_length should have the same
  limit, but that's a different patch.
  ha, the same applies to nbd code too.
 
  I'll do this stuff tomorrow and also I think that some
  audit in other drivers could reveal something interesting.
 
  Den
  ok. The situation is well rotten here on i686.
 
  The problem comes from the fact that QEMUIOVector
  and iovec uses size_t as length. All API calls use
  this abstraction. Thus all conversion operations
  from nr_sectors to size could bang at any moment.
 
  Putting dirty hands here is problematic from my point
  of view. Should we really care about this? 32bit
  applications are becoming old good history of IT...
 
 The host has to be 32bit to be in trouble. And at least if we have KVM the host
 has to support long mode.
 
 I have it on my todo list to add generic code for honouring bl.max_transfer_length
 in block.c. We could change the default maximum for bl.max_transfer_length
 from INT_MAX to SIZE_MAX >> BDRV_SECTOR_BITS.

So the conclusion is that we'll apply this series as it is and you'll
take care of the rest later?

Kevin

Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard

2015-02-03 Thread Peter Lieven

On 03.02.2015 at 12:37, Kevin Wolf wrote:
 On 03.02.2015 at 12:30, Peter Lieven wrote:
 On 03.02.2015 at 08:31, Denis V. Lunev wrote:
 On 02/02/15 23:46, Denis V. Lunev wrote:
 On 02/02/15 23:40, Peter Lieven wrote:
 On 02.02.2015 at 21:09, Denis V. Lunev wrote:
 qemu_gluster_co_discard calculates size to discard as follows
  size_t size = nb_sectors * BDRV_SECTOR_SIZE;
  ret = glfs_discard_async(s->fd, offset, size, gluster_finish_aiocb, acb);

 glfs_discard_async is declared as follows:
int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent,
glfs_io_cbk fn, void *data) __THROW
 This is problematic on i686 as sizeof(size_t) == 4.

 Set bl_max_discard to SIZE_MAX >> BDRV_SECTOR_BITS to avoid overflow
 on i386.

 Signed-off-by: Denis V. Lunev d...@openvz.org
 CC: Kevin Wolf kw...@redhat.com
 CC: Peter Lieven p...@kamp.de
 ---
   block/gluster.c | 9 +
   1 file changed, 9 insertions(+)

 diff --git a/block/gluster.c b/block/gluster.c
 index 1eb3a8c..8a8c153 100644
 --- a/block/gluster.c
 +++ b/block/gluster.c
 @@ -622,6 +622,11 @@ out:
   return ret;
   }
   +static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
 +{
 +bs->bl.max_discard = MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX);
 +}
 +
 Looking at the gluster code, bl.max_transfer_length should have the same
 limit, but that's a different patch.
 ha, the same applies to nbd code too.

 I'll do this stuff tomorrow and also I think that some
 audit in other drivers could reveal something interesting.

 Den
 ok. The situation is well rotten here on i686.

 The problem comes from the fact that QEMUIOVector
 and iovec uses size_t as length. All API calls use
 this abstraction. Thus all conversion operations
 from nr_sectors to size could bang at any moment.

 Putting dirty hands here is problematic from my point
 of view. Should we really care about this? 32bit
 applications are becoming old good history of IT...
 The host has to be 32bit to be in trouble. And at least if we have KVM the host
 has to support long mode.

 I have it on my todo list to add generic code for honouring bl.max_transfer_length
 in block.c. We could change the default maximum for bl.max_transfer_length
 from INT_MAX to SIZE_MAX >> BDRV_SECTOR_BITS.
 So the conclusion is that we'll apply this series as it is and you'll
 take care of the rest later?

Yes, and actually we need a macro like

#define BDRV_MAX_REQUEST_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX)

as the limit for everything, because bdrv_check_byte_request already has a
size_t argument.
So we could already create an overflow in bdrv_check_request when we convert
nb_sectors to size_t.

I will create a patch to catch at least this overflow shortly.
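Here is a sketch of the kind of guard described here, assuming a limit shaped like the proposed BDRV_MAX_REQUEST_SECTORS. The idea is to reject a sector count before converting it to a byte count in size_t. SKETCH_MAX_REQUEST_SECTORS and sketch_check_request() are hypothetical names, not block.c functions.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SECTOR_BITS 9                    /* 512-byte sectors */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))
/* Same shape as the proposed BDRV_MAX_REQUEST_SECTORS. */
#define SKETCH_MAX_REQUEST_SECTORS \
    SKETCH_MIN(SIZE_MAX >> SKETCH_SECTOR_BITS, INT_MAX)

/* Reject out-of-range counts first, so the shift below cannot overflow
 * size_t (on a 32-bit host the limit is 8388607 sectors). */
static bool sketch_check_request(int64_t nb_sectors, size_t *bytes)
{
    if (nb_sectors < 0 ||
        (uint64_t)nb_sectors > (uint64_t)SKETCH_MAX_REQUEST_SECTORS) {
        return false;
    }
    *bytes = (size_t)nb_sectors << SKETCH_SECTOR_BITS;
    return true;
}
```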

Peter

[Qemu-devel] [PATCH v2 12/19] libqos/ahci: add ahci command functions

2015-02-03 Thread John Snow

This patch adds the AHCICommand structure, and a set of functions to
operate on the structure.

ahci_command_create - Initialize and create a new AHCICommand in memory
ahci_command_free - Destroy this object.
ahci_command_set_buffer - Set where the guest memory DMA buffer is.
ahci_command_commit - Write this command to the AHCI HBA.
ahci_command_issue - Issue the committed command synchronously.
ahci_command_issue_async - Issue the committed command asynchronously.
ahci_command_wait - Wait for an asynchronous command to finish.
ahci_command_slot - Get the number of the command slot we committed to.

Helpers:
size_to_prdtl   - Calculate the required minimum PRDTL size from
  a buffer size.
ahci_command_find   - Given an ATA command mnemonic, look it up in the
  properties table to obtain info about the command.
command_header_init - Initialize the command header with sane values.
command_table_init  - Initialize the command table with sane values.
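Here is a sketch of what the two sizing helpers compute, under an assumed AHCI layout. The command table has a 0x80-byte header (command FIS, ATAPI command, reserved area) followed by 16-byte PRD entries, and it must be 128-byte aligned. Each PRD describes at most 4 MiB. The function names are hypothetical and are not the libqos implementations.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CMD_TBL_HDR   0x80u              /* CFIS + ACMD + reserved */
#define SKETCH_PRD_SIZE      16u                /* bytes per PRD entry */
#define SKETCH_PRD_MAX_BYTES (4u * 1024 * 1024) /* 22-bit, 0-based DBC */

/* Minimum PRDTL (number of PRD entries) needed to cover `bytes`. */
static unsigned sketch_size_to_prdtl(unsigned bytes, unsigned bytes_per_prd)
{
    assert(bytes_per_prd > 0 && bytes_per_prd <= SKETCH_PRD_MAX_BYTES);
    return (bytes + bytes_per_prd - 1) / bytes_per_prd;   /* round up */
}

/* Command table size for `prdtl` entries, rounded up to a 0x80 multiple
 * so that the table stays 128-byte aligned. */
static unsigned sketch_cmd_tbl_size(unsigned prdtl)
{
    return (SKETCH_CMD_TBL_HDR + prdtl * SKETCH_PRD_SIZE + 0x7Fu) & ~0x7Fu;
}
```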

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c   |  73 +--
 tests/libqos/ahci.c | 202 
 tests/libqos/ahci.h |  15 
 3 files changed, 234 insertions(+), 56 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 658956d..0834020 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -657,30 +657,28 @@ static void ahci_test_port_spec(AHCIQState *ahci, uint8_t port)
  */
 static void ahci_test_identify(AHCIQState *ahci)
 {
-RegH2DFIS fis;
-AHCICommandHeader cmd;
-PRD prd;
 uint32_t data_ptr;
 uint16_t buff[256];
 unsigned i;
 int rc;
+AHCICommand *cmd;
 uint8_t cx;
-uint64_t table;
 
 g_assert(ahci != NULL);
 
 /* We need to:
- * (1) Create a Command Table Buffer and update the Command List Slot #0
- * to point to this buffer.
- * (2) Construct an FIS host-to-device command structure, and write it to
+ * (1) Create a data buffer for the IDENTIFY response to be sent to,
+ * (2) Create a Command Table Buffer
+ * (3) Construct an FIS host-to-device command structure, and write it to
  * the top of the command table buffer.
- * (3) Create a data buffer for the IDENTIFY response to be sent to
  * (4) Create a Physical Region Descriptor that points to the data buffer,
  * and write it to the bottom (offset 0x80) of the command table.
- * (5) Now, PxCLB points to the command list, command 0 points to
+ * (5) Obtain a Command List slot, and update this header to point to
+ * the Command Table we built above.
+ * (6) Now, PxCLB points to the command list, command 0 points to
  * our table, and our table contains an FIS instruction and a
  * PRD that points to our rx buffer.
- * (6) We inform the HBA via PxCI that there is a command ready in slot #0.
+ * (7) We inform the HBA via PxCI that there is a command ready in slot #0.
  */
 
 /* Pick the first implemented and running port */
@@ -690,61 +688,24 @@ static void ahci_test_identify(AHCIQState *ahci)
 /* Clear out the FIS Receive area and any pending interrupts. */
 ahci_port_clear(ahci, i);
 
-/* Create a Command Table buffer. 0x80 is the smallest with a PRDTL of 0. */
-/* We need at least one PRD, so round up to the nearest 0x80 multiple. */
-table = ahci_alloc(ahci, CMD_TBL_SIZ(1));
-g_assert(table);
-ASSERT_BIT_CLEAR(table, 0x7F);
-
-/* Create a data buffer ... where we will dump the IDENTIFY data to. */
+/* Create a data buffer where we will dump the IDENTIFY data to. */
 data_ptr = ahci_alloc(ahci, 512);
 g_assert(data_ptr);
 
-/* pick a command slot (should be 0!) */
-cx = ahci_pick_cmd(ahci, i);
-
-/* Construct our Command Header (set_command_header handles endianness.) */
-memset(&cmd, 0x00, sizeof(cmd));
-cmd.flags = 5; /* reg_h2d_fis is 5 double-words long */
-cmd.flags |= CMDH_CLR_BSY; /* clear PxTFD.STS.BSY when done */
-cmd.prdtl = 1; /* One PRD table entry. */
-cmd.prdbc = 0;
-cmd.ctba = table;
-
-/* Construct our PRD, noting that DBC is 0-indexed. */
-prd.dba = cpu_to_le64(data_ptr);
-prd.res = 0;
-/* 511+1 bytes, request DPS interrupt */
-prd.dbc = cpu_to_le32(511 | 0x80000000);
-
-/* Construct our Command FIS, Based on http://wiki.osdev.org/AHCI */
-memset(&fis, 0x00, sizeof(fis));
-fis.fis_type = REG_H2D_FIS;  /* Register Host-to-Device FIS */
-fis.command = CMD_IDENTIFY;
-fis.device = 0;
-fis.flags = REG_H2D_FIS_CMD; /* Indicate this is a command FIS */
-
-/* We've committed nothing yet, no interrupts should be posted yet. */
-g_assert_cmphex(ahci_px_rreg(ahci, i, AHCI_PX_IS), ==, 0);
-
-/* Commit the Command FIS to the Command Table */
-ahci_write_fis(ahci, &fis, table);
-
-/* Commit the PRD entry to the Command Table */
-

Re: [Qemu-devel] [PATCH 4/8] guest agent: add guest-pipe-open

2015-02-03 Thread Eric Blake

On 12/31/2014 06:06 AM, Denis V. Lunev wrote:
 From: Simon Zolin szo...@parallels.com
 
 Creates a FIFO pair that can be used with existing file read/write
 interfaces to communicate with processes spawned via the forthcoming
 guest-file-exec interface.
 
 Signed-off-by: Simon Zolin szo...@parallels.com
 Acked-by: Roman Kagan rka...@parallels.com
 Signed-off-by: Denis V. Lunev d...@openvz.org
 CC: Michael Roth mdr...@linux.vnet.ibm.com
 ---

 +++ b/qga/qapi-schema.json
 @@ -212,12 +212,33 @@
'returns': 'int' }
  
  ##
 +# @guest-pipe-open
 +#
 +# Open a pipe in the guest to associate with a qga-spawned process
 +# for communication.
 +#
 +# Returns: Guest file handle on success, as per guest-file-open. This
 +# handle is useable with the same interfaces as a handle returned by

s/useable/usable/

 +# guest-file-open.
 +#
 +# Since: 2.3
 +##
 +{ 'command': 'guest-pipe-open',
 +  'data':{ 'mode': 'str' },
 +  'returns': 'int' }

I'm not a fan of returning a bare 'int' - it is not extensible.  Better
is returning a dictionary, such as 'returns': { 'fd': 'int' }.  That
way, if we ever find a reason to return multiple pieces of information,
we just return a larger dictionary.

Yeah, I know guest-pipe-open breaks the rules here, and so consistency
may be an argument in favor of also breaking the rules.

I don't like 'mode' encoded as a raw string.  Make it an enum type (as
in { 'enum':'PipeMode', 'data':['read', 'write']} ... 'mode':'PipeMode')
or even a bool (as in 'read':'bool')

This only returns ONE end of a pipe (good for when the host is piping
data into the child, or when the child is piping data into the host).
But isn't your goal to also make it possible to string together multiple
child processes where the output of one is the input of the other (no
host involvement)?  How would you wire that up?

 +
 +##
  # @guest-file-close:
  #
  # Close an open file in the guest
  #
  # @handle: filehandle returned by guest-file-open
  #
 +# Please note that closing the write side of a pipe will block until the read
 +# side is closed.  If you passed the read-side of the pipe to a qga-spawned
 +# process, make sure the process has exited before attempting to close the
 +# write side.

How does one pass the read side of a pipe to a spawned child?  Can you
design the spawning API so that close cannot deadlock?

 +#
  # Returns: Nothing on success.
  #
  # Since: 0.15.0
 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 3/8] guest agent: guest-file-open: refactoring

2015-02-03 Thread Eric Blake

On 12/31/2014 06:06 AM, Denis V. Lunev wrote:
 From: Simon Zolin szo...@parallels.com
 
 Moved the code that sets non-blocking flag on fd into a separate function.
 
 Signed-off-by: Simon Zolin szo...@parallels.com
 Acked-by: Roman Kagan rka...@parallels.com
 Signed-off-by: Denis V. Lunev d...@openvz.org
 CC: Michael Roth mdr...@linux.vnet.ibm.com
 ---
  qga/commands-posix.c | 31 +++
  1 file changed, 23 insertions(+), 8 deletions(-)
 
 diff --git a/qga/commands-posix.c b/qga/commands-posix.c
 index f6f3e3c..fd746db 100644
 --- a/qga/commands-posix.c
 +++ b/qga/commands-posix.c
 @@ -376,13 +376,33 @@ safe_open_or_create(const char *path, const char *mode, Error **errp)
  return NULL;
  }
  
 +static int guest_file_toggle_flags(int fd, long flags, bool set, Error **err)
 +{

Why is 'flags' a long?

 +int ret, old_flags;
 +
 +old_flags = fcntl(fd, F_GETFL);
 +if (old_flags == -1) {
 +error_set_errno(err, errno, QERR_QGA_COMMAND_FAILED,
 +"failed to fetch filehandle flags");
 +return -1;
 +}
 +
 +ret = fcntl(fd, F_SETFL, set ? (old_flags | flags) : (old_flags & ~flags));

Bug. 'int | long' is a long, but on 64-bit platforms, passing a 'long'
as the var-arg third argument of fcntl where the interface expects 'int'
is liable to corrupt things depending on endianness.  You MUST pass an
'int' for F_SETFL.
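Here is a sketch of the helper with `flags` declared as int, so the variadic third argument to fcntl(F_SETFL) is an int, as Eric says the interface requires. The name toggle_fd_flags() and the convention of returning -1 with errno set by fcntl are assumptions. The patch itself reports errors through an Error **.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical variant of guest_file_toggle_flags(): `flags` is an int,
 * so F_SETFL receives an int through fcntl's variadic argument. */
static int toggle_fd_flags(int fd, int flags, bool set)
{
    int old_flags = fcntl(fd, F_GETFL);
    if (old_flags == -1) {
        return -1;                          /* errno set by fcntl */
    }

    int new_flags = set ? (old_flags | flags) : (old_flags & ~flags);
    if (new_flags == old_flags) {
        return 0;                           /* nothing to change */
    }
    return fcntl(fd, F_SETFL, new_flags) == -1 ? -1 : 0;
}
```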

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH 4/8] guest agent: add guest-pipe-open

2015-02-03 Thread Eric Blake

On 02/03/2015 02:57 PM, Eric Blake wrote:

 +# Returns: Guest file handle on success, as per guest-file-open. This
 +# handle is useable with the same interfaces as a handle returned by
 

 +  'returns': 'int' }
 
 I'm not a fan of returning a bare 'int' - it is not extensible.  Better
 is returning a dictionary, such as 'returns': { 'fd': 'int' }.  That
 way, if we ever find a reason to return multiple pieces of information,
 we just return a larger dictionary.
 
 Yeah, I know guest-pipe-open breaks the rules here, and so consistency
 may be an argument in favor of also breaking the rules.

I meant to say 'guest-file-open' breaks the rules, and that you are
proposing that 'guest-pipe-open' be consistent with 'guest-file-open'.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] RFC: Proposal to add QEMU Guest Environment Variables

2015-02-03 Thread Michael Roth

Quoting Gabriel L. Somlo (2015-02-03 15:38:59)
 On Tue, Feb 03, 2015 at 02:11:12PM -0600, Michael Roth wrote:
  
  This does seem like useful functionality, but I think I'd like to know
  more about the actual use-cases being looked at.
 
 The proposed functionality is mostly equivalent to that offered by
 GuestInfo variables. So yes, initial activation scripts :)
 
  Is this mostly about executing initial activation scripts? Because after
  that point, a key-value store can be managed through the
  guest-file-read/write interfaces for anything on the guest-side that's
  interested in these variables.
  
  Even activation could be done using this approach, where the
  scripts start QGA and wait for the host to coordinate the initial creation
  of the file containing those variables, then setting a file marker that
  allows activation to proceed. And if that seems wonky, I'm fairly sure you
  could script the creation of the initial key-value store prior to starting
  the guest using libguestfs:
  
http://libguestfs.org/
 
 Specifically, I'm trying to port to QEMU a simulation/training setup
 where multiple VMs are started from the same base image, and guestinfo
 environment variables help each instance determine its personality.
 
 Editing the disk image is not feasible, since the idea is to share the
 base disk image across multiple VMs. And needing to connect to each VM

Well, I assume by sharing a base image you mean using a template image
as the backing image for a COW image allocated for each guest prior
to activation? As long as the editing is done against the COW image rather
than the backing image it should work. Maybe it's not ideal, but it's
feasible.

I hadn't really considered the SMBIOS approach though. That might be
more straightforward to get the initial store to the guest.

 after having started it, wait for it to bring up the QGA, then get it
 to accept environment variables, that's precisely the wonkiness I'm
 trying to avoid :)

Understandable :)

 
 I can certainly start small and implement read-only, host-guest startup
 time values (the smbios type11 strings plus a way to read them via a
 guest-side binary associated with a guest-tools package), and we can
 decide whether we want to support set-env operations and exporting
 set-env and get-env via the agent at a later stage. That functionality
 is available with GuestInfo variables, but the system I'm trying to port
 to QEMU doesn't require it as far as I can tell.

Seems like a reasonable start to me.
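As a rough illustration of the guest-side half of that plan, a small helper could pull the strings out of one raw SMBIOS type 11 (OEM Strings) structure. This is a sketch, not part of any posted patch: the function name is hypothetical, and it assumes the guest tool has already obtained the raw structure bytes (a formatted area of at least 5 bytes with the string count at offset 4, followed by a NUL-terminated string set).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: extract the strings of one raw SMBIOS type 11
 * (OEM Strings) structure. Layout assumed from the SMBIOS spec:
 *   byte 0: type (11), byte 1: length of the formatted area,
 *   bytes 2-3: handle, byte 4: number of strings,
 * followed by NUL-terminated strings and a final extra NUL.
 * Stores pointers to at most 'max' strings in 'strs'.
 * Returns the number of strings found, or -1 if the buffer is malformed.
 */
static int smbios_oem_strings(const uint8_t *buf, size_t len,
                              const char **strs, int max)
{
    if (len < 5 || buf[0] != 11 || buf[1] < 5 || (size_t)buf[1] > len) {
        return -1;
    }

    size_t off = buf[1];      /* strings start right after the formatted area */
    int count = buf[4];
    int n = 0;

    while (n < count && off < len && buf[off] != '\0') {
        const uint8_t *nul = memchr(buf + off, '\0', len - off);
        if (!nul) {
            return -1;        /* string runs off the end of the buffer */
        }
        if (n < max) {
            strs[n] = (const char *)(buf + off);
        }
        n++;
        off = (size_t)(nul - buf) + 1;
    }
    return n;
}
```

How the strings are then interpreted (for example as KEY=VALUE pairs) would be up to the guest-tools binary; that convention is an assumption here, not something SMBIOS defines.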

 
 Thanks much,
 --Gabriel
 
  
  I think we'd need a very strong argument to bake what seems to be
  high-level guest management tasks into QEMU. If that can be
  avoided with some automated image modifications beforehand that seems
  to me the more reasonable approach. Libvirt could ostensibly even
  handle the task of writing those XML strings into the image's
  key-value store to make management easier, but I suspect even that is
  a bit too low in the stack for this level of management.

Re: [Qemu-devel] RFC: Proposal to add QEMU Guest Environment Variables

2015-02-03 Thread Gabriel L. Somlo

On Wed, Feb 04, 2015 at 12:49:22AM +0300, Denis V. Lunev wrote:
 On 04/02/15 00:38, Gabriel L. Somlo wrote:
 On Tue, Feb 03, 2015 at 02:11:12PM -0600, Michael Roth wrote:
 
 This does seem like useful functionality, but I think I'd like to know
 more about the actual use-cases being looked at.
 
 The proposed functionality is mostly equivalent to that offered by
 GuestInfo variables. So yes, initial activation scripts :)
 
 Is this mostly about executing initial activation scripts? Because after
 that point, a key-value store can be managed through the
 guest-file-read/write interfaces for anything on the guest-side that's
 interested in these variables.
 
 Even activation could be done using this approach, where the
 scripts start QGA and wait for the host to coordinate the initial creation
 of the file containing those variables, then setting a file marker that
 allows activation to proceed. And if that seems wonky, I'm fairly sure you
 could script the creation of the initial key-value store prior to starting
 the guest using libguestfs:
 
http://libguestfs.org/
 
 Specifically, I'm trying to port to QEMU a simulation/training setup
 where multiple VMs are started from the same base image, and guestinfo
 environment variables help each instance determine its personality.
 
 Editing the disk image is not feasible, since the idea is to share the
 base disk image across multiple VMs. And needing to connect to each VM
 after having started it, wait for it to bring up the QGA, then get it
 to accept environment variables, that's precisely the wonkiness I'm
 trying to avoid :)
 
 I can certainly start small and implement read-only, host-guest startup
 time values (the smbios type11 strings plus a way to read them via a
 guest-side binary associated with a guest-tools package), and we can
 decide whether we want to support set-env operations and exporting
 set-env and get-env via the agent at a later stage. That functionality
 is available with GuestInfo variables, but the system I'm trying to port
 to QEMU doesn't require it as far as I can tell.
 
 
 guest-exec with the ability to pass an environment could solve your
 problem even without read/write. Boot the guest, wait for the guest agent
 to start up, then start whatever you need through the agent with the
 desired environment.

I'm trying as hard as I can to avoid the bit where I have to wait for
the guest agent to start up, connect, run a bunch of stuff on the guest
through the agent... :)

The application I'm trying to port has a bunch of VM templates with
some guestinfo/environment variables in them, many of them sharing
the same disk image (vmdk) file. We start them all, and that's it.
Fire and forget, no further fuss. If a VM hangs during startup, that's
the application USER's problem, they can restart it, or whatever.

With your suggestion, I'd have to write a bunch of additional logic
to monitor each starting VM, connect to it, handle errors and exceptions
(what if the QGA doesn't start, what if it takes a long time to start, etc.)
Now dealing with a failed boot is suddenly the application's problem,
since I'm sitting there waiting to connect to the agent, and have to
do something if the agent isn't coming up.

That's currently not necessary when using that other hypervisor from
which I'm trying to migrate, so QEMU is at a bit of a disadvantage. I
think there's value in making it easy to port stuff over without imposing
a major redesign of the application being ported...

Thanks,
--Gabriel

 
 this is a quote from the patchset being discussed at the moment.
   [PATCH v2 0/8]  qemu: guest agent: implement guest-exec command for Linux
 
 +##
 +# @guest-exec:
 +#
 +# Execute a command in the guest
 +#
 +# @path: path or executable name to execute
 +# @params: #optional parameter list to pass to executable
 +# @env: #optional environment variables to pass to executable
 +# @handle_stdin: #optional handle to associate with process' stdin.
 +# @handle_stdout: #optional handle to associate with process' stdout
 +# @handle_stderr: #optional handle to associate with process' stderr
 +#
 +# Returns: PID on success.
 +#
 +# Since: 2.3
 +##
 +{ 'command': 'guest-exec',
 +  'data':{ 'path': 'str', '*params': ['str'], '*env': ['str'],
 +   '*handle_stdin': 'int', '*handle_stdout': 'int',
 +   '*handle_stderr': 'int' },
 +  'returns': 'int' }
 
 
 Thanks much,
 --Gabriel
 
 
 I think we'd need a very strong argument to bake what seems to be
 high-level guest management tasks into QEMU. If that can be
 avoided with some automated image modifications beforehand that seems
 to me the more reasonable approach. Libvirt could ostensibly even
 handle the task of writing those XML strings into the image's
 key-value store to make management easier, but I suspect even that is
 a bit too low in the stack for this level of management.

Re: [Qemu-devel] [PATCH v2] qga: add guest-set-admin-password command

2015-02-03 Thread Eric Blake

On 01/12/2015 08:58 AM, Daniel P. Berrange wrote:
 Add a new 'guest-set-admin-password' command for changing the
 root/administrator password. This command is needed to allow
 OpenStack to support its API for changing the admin password
 on a running guest.
 
 Accepts either the raw password string:
 
 $ virsh -c qemu:///system  qemu-agent-command f21x86_64 \
'{ "execute": "guest-set-admin-password", "arguments":
  { "crypted": false, "password": "12345678" } }'
   {"return":{}}
 
 Or a pre-encrypted string (recommended)
 
 $ virsh -c qemu:///system  qemu-agent-command f21x86_64 \
'{ "execute": "guest-set-admin-password", "arguments":
  { "crypted": true, "password":
 "$6$T9O/j/aGPrE...sniprQoRN4F0.GG0MPjNUNyml." } }'
 
 NB windows support is desirable, but not implemented in this
 patch.
 
 Signed-off-by: Daniel P. Berrange berra...@redhat.com
 ---
  qga/commands-posix.c | 90 
 
  qga/commands-win32.c |  6 
  qga/qapi-schema.json | 19 +++
  3 files changed, 115 insertions(+)
 

 +++ b/qga/qapi-schema.json
 @@ -738,3 +738,22 @@
  ##
  { 'command': 'guest-get-fsinfo',
'returns': ['GuestFilesystemInfo'] }
 +
 +##
 +# @guest-set-admin-password
 +#
 +# @crypted: true if password is already crypt()d, false if raw
 +# @password: the new password entry
 +#
 +# If the @crypted flag is true, it is the callers responsibility

s/callers/caller's/

 +# to ensure the correct crypt() encryption scheme is used. This
 +# command does not attempt to interpret or report on the encryption
 +# scheme. Refer to the documentation of the guest operating system
 +# in question to determine what is supported.
 +#
 +# Returns: Nothing on success.
 +#
 +# Since 2.3
 +##
 +{ 'command': 'guest-set-admin-password',
 +  'data': { 'crypted': 'bool', 'password': 'str' } }
 

Normally, 'password':'str' means we are passing UTF8 JSON.  But what if
the desired password is NOT valid UTF8, but still valid to the end user
(for example, a user that intentionally wants a Latin1 encoded password
that uses 8-bit characters)?  In other interfaces, we've allowed an enum
that specifies whether a raw data string is 'utf8' or 'base64' encoded;
should we have such a parameter here?
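To make the concern concrete: a client could test whether the password bytes form valid UTF-8 and, if they do not, fall back to a base64 encoding. A minimal validator sketch follows; the function name is illustrative, and pairing it with a 'utf8'/'base64' choice is an assumption about how such a parameter might be used, not something this patch defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if s[0..len) is well-formed UTF-8. */
static bool utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;          /* continuation bytes expected */
        uint32_t cp;           /* decoded code point */

        if (c < 0x80) {
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1;
            cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2;
            cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3;
            cp = c & 0x07;
        } else {
            return false;      /* stray continuation byte or invalid lead byte */
        }
        if (len - i <= extra) {
            return false;      /* sequence truncated */
        }
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) {
                return false;
            }
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;
        }
        i += extra + 1;
    }
    return true;
}
```

A Latin-1 password such as the single byte 0xE9 fails this check, which is exactly the case that cannot be carried in a JSON 'str' without some other encoding.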


Re: [Qemu-devel] [v4 11/13] migration: Add interface to control compression

2015-02-03 Thread Eric Blake

On 02/02/2015 04:05 AM, Liang Li wrote:
 The multiple compression threads can be turned on/off through
 qmp and hmp interface before doing live migration.
 
 Signed-off-by: Liang Li liang.z...@intel.com
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 Reviewed-by: Dr.David Alan Gilbert dgilb...@redhat.com
 ---
  migration/migration.c | 7 +--
  qapi-schema.json  | 7 ++-
  2 files changed, 11 insertions(+), 3 deletions(-)

Reviewed-by: Eric Blake ebl...@redhat.com


Re: [Qemu-devel] [RFC 05/10] extract TBContext from TCGContext.

2015-02-03 Thread Richard Henderson

On 01/29/2015 07:44 AM, Peter Maydell wrote:
 On 16 January 2015 at 17:19,  fred.kon...@greensocs.com wrote:
 From: KONRAD Frederic fred.kon...@greensocs.com

 In order to have one TCGContext per thread and a single TBContext we have to
 extract TBContext from TCGContext.
 
 This seems a bit odd. It's not clear to me what the advantages
 are of having one TCGContext per thread but only a single
 TBContext (as opposed to either (1) having a single TCGContext
 and TBContext with locks protecting against multiple threads
 generating code at once, or (2) having each thread have its
 own TCGContext and TBContext and completely independent codegen).
 
 Maybe it would help if you sketched out your design in a little
 more detail in the cover letter, with emphasis on which data
 structures are going to be per-thread and which are going to
 be shared (and if so how shared).
 
 (Long term we would want to be able to have multiple
 TBContexts to support heterogenous systems where CPUs
 might be different architectures or have different views
 of physical memory...)

Seconded.


r~

Re: [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu.

2015-02-03 Thread Paolo Bonzini



On 03/02/2015 17:17, Richard Henderson wrote:
  @@ -759,7 +760,9 @@ static void page_flush_tb_1(int level, void **lp)
   PageDesc *pd = *lp;
   
   for (i = 0; i < V_L2_SIZE; ++i) {
  -pd[i].first_tb = NULL;
  +for (j = 0; j < MAX_CPUS; j++) {
  +pd[i].first_tb[j] = NULL;
  +}
   invalidate_page_bitmap(pd + i);
   }
   } else {
 Surely you've got to do some locking somewhere in order to be able to modify
 another thread's cpu tb list.

But that's probably not even necessary.  page_flush_tb_1 is called from
tb_flush, which in turn is only called in very special circumstances.

It should be possible to have something like the kernel's stop_machine
that does the following:

1) schedule a callback on all TCG CPU threads

2) wait for all CPUs to have reached that callback

3) do tb_flush on all CPUs, while it knows they are not holding any lock

4) release all TCG CPU threads

With one TCG thread, just use qemu_bh_new (hidden behind a suitable API
of course!).  Once you have multiple TCG CPU threads, loop on all CPUs
with the same run_on_cpu function that KVM uses.
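The four steps above can be modelled with POSIX barriers. This is only a sketch under stated assumptions, not QEMU code: the thread count is hypothetical, the per-vCPU callback is started directly instead of being queued with run_on_cpu, and an integer counter stands in for tb_flush().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_VCPU_THREADS 4   /* hypothetical number of TCG vCPU threads */

static pthread_barrier_t all_parked;    /* step 2: every vCPU reached the callback */
static pthread_barrier_t all_released;  /* step 4: coordinator finished flushing */
static int flush_count;                 /* stands in for the real tb_flush() work */

/* Steps 1, 2 and 4 from a vCPU's point of view: the scheduled callback parks
 * the thread (which must not hold any lock here) until it is released. */
static void *vcpu_stop_callback(void *opaque)
{
    (void)opaque;
    pthread_barrier_wait(&all_parked);
    pthread_barrier_wait(&all_released);
    return NULL;
}

/* Coordinator: wait for all vCPUs, flush while none of them runs, release. */
static int stop_machine_flush(void)
{
    pthread_t tid[NR_VCPU_THREADS];

    flush_count = 0;
    pthread_barrier_init(&all_parked, NULL, NR_VCPU_THREADS + 1);
    pthread_barrier_init(&all_released, NULL, NR_VCPU_THREADS + 1);

    /* Step 1: in this model, "scheduling the callback" means starting it. */
    for (int i = 0; i < NR_VCPU_THREADS; i++) {
        pthread_create(&tid[i], NULL, vcpu_stop_callback, NULL);
    }

    pthread_barrier_wait(&all_parked);    /* step 2: all vCPUs are parked */
    flush_count++;                        /* step 3: safe to flush now */
    pthread_barrier_wait(&all_released);  /* step 4: let them run again */

    for (int i = 0; i < NR_VCPU_THREADS; i++) {
        pthread_join(tid[i], NULL);
    }
    pthread_barrier_destroy(&all_parked);
    pthread_barrier_destroy(&all_released);
    return flush_count;
}
```

The coordinator counts itself as one barrier participant, which is why both barriers are initialised with NR_VCPU_THREADS + 1.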

Paolo

Re: [Qemu-devel] [RFC PATCH v2 09/11] hw/arm/virt-acpi-build: Generate XSDT table

2015-02-03 Thread Laszlo Ersek

On 02/03/15 17:19, Igor Mammedov wrote:
 On Thu, 29 Jan 2015 16:37:11 +0800
 Shannon Zhao zhaoshengl...@huawei.com wrote:
 
 XSDT points to other tables except FACS & DSDT.
 Is there any reason to use XSDT instead of RSDT?
 If ACPI tables are below 4Gb which probably would
 be the case then RSDT could be used just fine and
 we could share more code between x86 and ARM.
 
 Laszlo,
 Do you know if OVMF allocates memory below 4G address range?

Yes, it does.

https://github.com/tianocore/edk2/blob/master/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c#L162

RSDT should suffice.
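To illustrate why RSDT is enough when the tables end up below 4 GiB: RSDT entries are 32-bit physical addresses, while XSDT entries are 64-bit. The helper below is hypothetical and is not QEMU's RSDT builder (in QEMU the final addresses are only known once the firmware processes the linker/loader script); it only shows the fit check that decides between the two formats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: narrow 64-bit table addresses into 32-bit RSDT
 * entries. Returns false (and writes nothing) if any table lies at or
 * above 4 GiB, in which case an XSDT would be required.
 */
static bool fill_rsdt_entries(const uint64_t *table_addrs, size_t n,
                              uint32_t *rsdt_entries)
{
    for (size_t i = 0; i < n; i++) {
        if (table_addrs[i] > UINT32_MAX) {
            return false;
        }
    }
    for (size_t i = 0; i < n; i++) {
        rsdt_entries[i] = (uint32_t)table_addrs[i];
    }
    return true;
}
```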

Thanks,
Laszlo

Re: [Qemu-devel] [PATCH 0/7] MIPS: IEEE 754-2008 features support

2015-02-03 Thread Thomas Schwinge

Hi!

On Fri, 30 Jan 2015 13:47:17 +, Maciej W. Rozycki ma...@linux-mips.org 
wrote:
 On Fri, 30 Jan 2015, Peter Maydell wrote:
 
This patch series comprises changes to QEMU, both the MIPS backend and
   generic SoftFloat support code, to support IEEE 754-2008 features
   introduced to revision 3.50 of the MIPS Architecture as follows.
  
  Just to let you know that:
  (1) the softfloat relicensing has hit master, so this patchset isn't
  blocked by anything now

\o/

  (2) I would like to see a definite "we are happy to license this
  patchset under the SoftFloat-2a license" for these changes, because
  they were submitted before we applied the relicensing, and therefore
  the "changes after $DATE will be -2a license unless otherwise stated"
  note in the sourcecode can't be assumed to apply to them.
 
  Thanks for the heads-up!  At this stage however someone at Mentor will 
 have to make such a statement on behalf of the company as I am no longer 
 there and as far as this patch set is concerned I am merely a member of 
 the public who can just make technical comments as anyone can, including 
 you.
 
  I think Thomas, being the writer of the majority of code comprising these 
 patches

Too bad that Git doesn't allow for listing several authors.  ;-)

 is now in the best position to make such a statement happen.  
 Thomas -- will you be able to take it from here?  Thanks!

It is fine to license these changes under the SoftFloat-2a license.


Regards,
 Thomas



[Qemu-devel] balloon vs postcopy migrate

2015-02-03 Thread Dr. David Alan Gilbert

Hi,
  Andrea pointed out there is a risk that a guest inflating its
balloon during a postcopy migrate could cause us problems, and
I wanted to see what the best way of avoiding the problem was.

Guests inflating their balloon cause an madvise(MADV_DONTNEED) on
the host, marking pages as not present; that will potentially trigger
a userfault, which we are using in postcopy to detect pages that need
to be fetched from the source.

In theory, at the moment guests *should* only ask for a balloon
inflation if they've been asked to do so by the host; however there
are no guards for that, and it's been suggested giving the
guest more freedom might be a good idea anyway.

My alternatives seem to be:
   1) Stop servicing the message queue from the guest so
 that we just don't notice the inflate messages until
 afterwards.  (Easy for Qemu, not sure how the guests
 will like an unserviced queue).

   2) I could keep servicing the queue and ignore the messages
 (Easy for everyone, not very nice in actual used memory -
  does it cause any long term problems other than that?)

   3) I could keep servicing the queue but put the messages
 in a list somewhere and replay them after the migration has finished.
 (That list sounds bounded only in a very large way?)
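Option 3 could look roughly like the following bounded queue. The cap, the names, and the decision to degrade to option 2 once the cap is reached are all assumptions for illustration; this is not existing virtio-balloon code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEFERRED_INFLATE_MAX 1024   /* hypothetical cap on buffered requests */

typedef struct {
    uint64_t pfn[DEFERRED_INFLATE_MAX];  /* guest pages the guest asked to give up */
    size_t count;
    bool overflowed;                     /* some requests were dropped (option 2) */
} DeferredInflate;

static void deferred_inflate_init(DeferredInflate *q)
{
    q->count = 0;
    q->overflowed = false;
}

/* Called while postcopy is running, instead of discarding the page. */
static void deferred_inflate_push(DeferredInflate *q, uint64_t pfn)
{
    if (q->count < DEFERRED_INFLATE_MAX) {
        q->pfn[q->count++] = pfn;
    } else {
        q->overflowed = true;   /* bound reached: fall back to ignoring */
    }
}

/*
 * Called after migration completes: copies up to 'max' queued pages,
 * oldest first, into 'out' so the caller can discard them (for example
 * with madvise(MADV_DONTNEED)). Returns how many were copied.
 */
static size_t deferred_inflate_drain(DeferredInflate *q, uint64_t *out,
                                     size_t max)
{
    size_t n = q->count < max ? q->count : max;

    memcpy(out, q->pfn, n * sizeof(q->pfn[0]));
    memmove(q->pfn, q->pfn + n, (q->count - n) * sizeof(q->pfn[0]));
    q->count -= n;
    return n;
}
```

The fixed cap answers the "bounded only in a very large way" worry at the cost of losing some inflate requests, which then behave exactly as in option 2.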

Thoughts?

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC PATCH v2 09/11] hw/arm/virt-acpi-build: Generate XSDT table

2015-02-03 Thread Igor Mammedov

On Thu, 29 Jan 2015 16:37:11 +0800
Shannon Zhao zhaoshengl...@huawei.com wrote:

 XSDT points to other tables except FACS & DSDT.
Is there any reason to use XSDT instead of RSDT?
If ACPI tables are below 4Gb which probably would
be the case then RSDT could be used just fine and
we could share more code between x86 and ARM.

Laszlo,
Do you know if OVMF allocates memory below 4G address range?

 
 Signed-off-by: Shannon Zhao zhaoshengl...@huawei.com
 ---
  hw/arm/virt-acpi-build.c|   32 +++-
  include/hw/acpi/acpi-defs.h |9 +
  2 files changed, 40 insertions(+), 1 deletions(-)
 
 diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
 index ac0a864..2a2b2ab 100644
 --- a/hw/arm/virt-acpi-build.c
 +++ b/hw/arm/virt-acpi-build.c
 @@ -176,6 +176,32 @@ static void acpi_dsdt_add_virtio(AcpiAml *scope, const hwaddr *mmio_addrs,
  }
  }
  
 +
 +/* XSDT */
 +static void
 +build_xsdt(GArray *table_data, GArray *linker, GArray *table_offsets)
 +{
 +AcpiXsdtDescriptor *xsdt;
 +size_t xsdt_len;
 +int i;
 +
 +xsdt_len = sizeof(*xsdt) + sizeof(uint64_t) * table_offsets->len;
 +xsdt = acpi_data_push(table_data, xsdt_len);
 +memcpy(xsdt->table_offset_entry, table_offsets->data,
 +   sizeof(uint64_t) * table_offsets->len);
 +for (i = 0; i < table_offsets->len; ++i) {
 +/* xsdt->table_offset_entry to be filled by Guest linker */
 +bios_linker_loader_add_pointer(linker,
 +   ACPI_BUILD_TABLE_FILE,
 +   ACPI_BUILD_TABLE_FILE,
 +   table_data, &xsdt->table_offset_entry[i],
 +   sizeof(uint64_t));
 +}
 +build_header(linker, table_data, (void *)xsdt, "XSDT",
 + ACPI_BUILD_APPNAME6, ACPI_BUILD_APPNAME4,
 + xsdt_len, 1);
 +}
 +
  /* GTDT */
  static void
  build_gtdt(GArray *table_data, GArray *linker, VirtGuestInfo *guest_info)
 @@ -311,7 +337,7 @@ static
  void virt_acpi_build(VirtGuestInfo *guest_info, AcpiBuildTables *tables)
  {
  GArray *table_offsets;
 -unsigned dsdt;
 +unsigned dsdt, xsdt;
  VirtAcpiCpuInfo cpuinfo;
  
  virt_acpi_get_cpu_info(&cpuinfo);
 @@ -346,6 +372,10 @@ void virt_acpi_build(VirtGuestInfo *guest_info, AcpiBuildTables *tables)
  acpi_add_table(table_offsets, tables->table_data.buf);
  build_gtdt(tables->table_data.buf, tables->linker, guest_info);
  
 +/* XSDT is pointed to by RSDP */
 +xsdt = tables->table_data.buf->len;
 +build_xsdt(tables->table_data.buf, tables->linker, table_offsets);
 +
  /* Cleanup memory that's no longer used. */
  g_array_free(table_offsets, true);
  }
 diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
 index ee40a5e..47c8c41 100644
 --- a/include/hw/acpi/acpi-defs.h
 +++ b/include/hw/acpi/acpi-defs.h
 @@ -88,6 +88,15 @@ struct AcpiTableHeader /* ACPI common table header */
  typedef struct AcpiTableHeader AcpiTableHeader;
  
  /*
 + * Extended System Description Table (XSDT)
 + */
 +struct AcpiXsdtDescriptor {
 +ACPI_TABLE_HEADER_DEF
 +uint64_t table_offset_entry[1]; /* Array of pointers to ACPI tables */
 +} QEMU_PACKED;
 +typedef struct AcpiXsdtDescriptor AcpiXsdtDescriptor;
 +
 +/*
   * ACPI Fixed ACPI Description Table (FADT)
   */
  #define ACPI_FADT_COMMON_DEF /* FADT common definition */ \

Re: [Qemu-devel] [PATCH v2 3/7] softfloat: Convert `*_default_nan' variables into inline functions

2015-02-03 Thread Richard Henderson

On 01/30/2015 08:02 AM, Maciej W. Rozycki wrote:
  Hmm, so perhaps my idea for a later improvement:
 
   Eventually we might want to move the new inline functions into a
  separate header to be included from softfloat.h instead of softfloat.c,
  but let's make changes one step at a time.
 will actually have to be made right away.  I suspect GCC is more liberal 
 here due to its convoluted extern/static/inline semantics history.  
 Sigh...

GCC 5 is moving to -std=gnu11 as default, and so will have the same problem.
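One portable shape for the header move Maciej mentions is a `static inline` definition, which behaves the same under gnu89 and C99/gnu11 inline semantics. The sketch below uses a hypothetical name rather than the softfloat API, and the bit pattern is just a commonly used quiet NaN; the actual default NaN differs between targets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Header-style definition (hypothetical name). 'static inline' gives every
 * translation unit its own internal copy. A plain 'inline' definition in a
 * header would, under C99/gnu11 rules, also need an 'extern' declaration
 * in exactly one .c file, or linking can fail whenever a call is not
 * inlined; under gnu89 rules it would instead emit an external definition
 * in every file that includes the header.
 */
static inline uint32_t example_float32_default_nan(void)
{
    return 0x7FC00000u;   /* a common quiet-NaN bit pattern; targets differ */
}
```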


r~

Re: [Qemu-devel] [PATCH v5 02/10] virtio-net: use qemu_mac_strdup_printf

2015-02-03 Thread Eric Blake

On 01/22/2015 01:03 AM, sfel...@gmail.com wrote:
 From: Scott Feldman sfel...@gmail.com
 
 Signed-off-by: Scott Feldman sfel...@gmail.com
 ---
  hw/net/virtio-net.c |   12 +++-
  1 file changed, 3 insertions(+), 9 deletions(-)

You could merge this with 1/10 without any dire consequences.  But
whether you merge or keep as two patches, feel free to add this on the
respin:

Reviewed-by: Eric Blake ebl...@redhat.com


Re: [Qemu-devel] [PATCH] vfio: free dynamically-allocated data in instance_finalize

2015-02-03 Thread Paolo Bonzini



On 03/02/2015 16:20, Alex Williamson wrote:
 On Tue, 2015-02-03 at 13:48 +0100, Paolo Bonzini wrote:
 In order to enable out-of-BQL address space lookup, destruction of
 devices needs to be split in two phases.

 Unrealize is the first phase; once it completes, no new accesses will
 be started, but there may still be pending memory accesses that can
 still be completed.

 The second part is freeing the device, which only happens once all memory
 accesses are complete.  At this point the reference count has dropped to
 zero and an RCU grace period must have completed (because the RCU-protected
 FlatViews hold a reference to the device via memory_region_ref).  This is
 when instance_finalize is called.

 Freeing data belongs in an instance_finalize callback, because the
 dynamically allocated memory can still be used after unrealize by the
 pending memory accesses.

 In the case of VFIO, the unrealize callback is too early to munmap the
 BARs.  The munmap must be delayed until memory accesses are complete.
 To do this, split vfio_unmap_bars in two.  The removal step, now called
 vfio_unregister_bars, remains in vfio_exitfn.  The reclamation step
 is vfio_unmap_bars and is moved to the instance_finalize callback.

 Similarly, quirk MemoryRegions have to be removed during
 vfio_unregister_bars, but freeing the data structure must be delayed
 to vfio_unmap_bars.
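The ordering constraint described above can be modelled in a few lines of C. This is a deliberately simplified, single-threaded sketch: the struct and function names are hypothetical, a plain counter replaces QOM references, and "the last in-flight access completing" stands in for the RCU grace period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    int refcount;       /* owner reference + one per in-flight access */
    bool realized;      /* cleared by unrealize: no new accesses may start */
    bool finalized;     /* set once the data below has been freed */
    int *bar_data;      /* stands in for the mmap'd BARs / quirk state */
} ModelDev;

/* Phase 2 (instance_finalize analogue): only now is reclamation safe. */
static void model_finalize(ModelDev *d)
{
    free(d->bar_data);
    d->bar_data = NULL;
    d->finalized = true;
}

static void model_unref(ModelDev *d)
{
    if (--d->refcount == 0) {
        model_finalize(d);
    }
}

/* An access takes a reference, much as FlatViews do via memory_region_ref(). */
static bool model_access_begin(ModelDev *d)
{
    if (!d->realized) {
        return false;
    }
    d->refcount++;
    return true;
}

/* Phase 1 (unrealize analogue): stop new accesses and drop the owner's
 * reference, but free nothing that a pending access might still use. */
static void model_unrealize(ModelDev *d)
{
    d->realized = false;
    model_unref(d);
}
```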

 Cc: Alex Williamson alex.william...@redhat.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  This patch is part of the third installment of the RCU work.
  Sending it out separately for Alex to review it.

  hw/vfio/pci.c |   78 
 +-
  1 file changed, 68 insertions(+), 10 deletions(-)
 
 Looks good to me.  I don't see any external dependencies, so do you want
 me to pull this in through my branch?  Thanks,

Yes, please.

Paolo

Re: [Qemu-devel] [PATCH v5 03/10] rocker: add register programming guide

2015-02-03 Thread Eric Blake

On 01/22/2015 01:03 AM, sfel...@gmail.com wrote:
 From: Scott Feldman sfel...@gmail.com
 
 This is the register programming guide for the Rocker device.  It's intended
 for driver writers and device writers.  It covers the device's PCI space,
 the register set, DMA interface, and interrupts.
 

In addition to typos already pointed out by Stefan,

 +
 +Writing BASE_ADDR or SIZE will reset HEAD and TAIL to zero.  HEAD cannot be
 +written passed TAIL.  To do so would wrap the ring.  An empty ring is when 
 HEAD

s/passed/past/
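A small model of the HEAD/TAIL rule may help. The guide's own definition of an empty ring is cut off in the quoted text, so the convention used here (the driver advances HEAD, the device advances TAIL, the ring is empty when HEAD == TAIL, and HEAD therefore stops one slot short of TAIL) is an assumption, as are all names; this is not the Rocker device model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t size;   /* number of descriptor slots */
    uint32_t head;   /* next slot the driver will fill */
    uint32_t tail;   /* next slot the device will consume */
} Ring;

/* Mirrors "writing BASE_ADDR or SIZE will reset HEAD and TAIL to zero". */
static void ring_set_size(Ring *r, uint32_t size)
{
    r->size = size;
    r->head = 0;
    r->tail = 0;
}

static bool ring_empty(const Ring *r)
{
    return r->head == r->tail;
}

/* Refuses to move HEAD onto TAIL: that would make a full ring look empty. */
static bool ring_post(Ring *r)
{
    uint32_t next = (r->head + 1) % r->size;

    if (next == r->tail) {
        return false;   /* ring full */
    }
    r->head = next;
    return true;
}
```

Under this convention a ring of N slots holds at most N - 1 outstanding descriptors.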

 +
 +To support forward- and backward-compatibility, descriptor and completion
 +payloads are specified in TLV format.  Fields are packed with Type=field name,
 +Length=field length, and Value=field value.  Software will ignore unknown fields
 +filled in by the switch.  Likewise, the switch will ignore unknown fields
 +filled in by software.

Is ignoring unknown fields always the wisest action?  If the unknown
fields are supposed to have an impact according to the writer, but get
ignored by the reader, then the two can get out of sync with what they
assume the other end is doing.
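To make this trade-off concrete, here is a sketch of a TLV walker that follows the guide's skip-unknown rule but counts what it skipped, so the caller can at least notice fields the other side may have expected it to act on. The field layout (2-byte little-endian type, 2-byte little-endian value length, then the value), the type number, and all names are assumptions for illustration, not the Rocker wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TLV_TYPE_PORT = 1 };   /* hypothetical field type the reader knows */

typedef struct {
    int port;                 /* value of the PORT field, or -1 if absent */
    unsigned unknown;         /* fields skipped because their type is unknown */
} TlvResult;

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is malformed. */
static int tlv_walk(const uint8_t *buf, size_t len, TlvResult *res)
{
    size_t off = 0;

    res->port = -1;
    res->unknown = 0;
    while (off < len) {
        if (len - off < 4) {
            return -1;                  /* truncated header */
        }
        uint16_t type = get_le16(buf + off);
        uint16_t vlen = get_le16(buf + off + 2);
        const uint8_t *val = buf + off + 4;

        if (len - off - 4 < vlen) {
            return -1;                  /* value runs past the buffer */
        }
        if (type == TLV_TYPE_PORT) {
            if (vlen != 1) {
                return -1;
            }
            res->port = val[0];
        } else {
            res->unknown++;             /* ignored as specified, but counted */
        }
        off += 4 + (size_t)vlen;
    }
    return 0;
}
```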


 +MSI-X vectors used for descriptor ring completions use a credit mechanism for
 +efficient device, PCIe bus, OS and driver operations.  Each descriptor ring has
 +a credit count which represent the number of outstanding descriptors to be

s/represent/represents/


 +
 + portmapping
 + ---
 + 0   CPU port (for packets to/from host CPU)
 + 1-62front-panel physical ports
 + 63  loopback port
 + 64-0x   RSVD
 + 0x0001-0x0001   logical tunnel ports
 +0x0002-0x   RSVD

Alignment looks off.


 +Port Settings
 +-
 +
 +Links status for all front-panel ports is available via PORT_PHYS_LINK_STATUS:

s/Links/Link/

 +
 + DESC_COMP_ERR   reason
 + 
 + 0   OK
 + -ROCKER_ENXIO   address or data read err on desc buf
 + -ROCKER_ENOMEM  no memory for internal staging desc buf
 + -ROCKER_EMSGSIZE Rx descriptor buffer wasn't big enough to contain
 + pactet data TLV and other TLVs.

s/pactet/packet/


Re: [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu.

2015-02-03 Thread Richard Henderson

On 01/16/2015 09:19 AM, fred.kon...@greensocs.com wrote:
 @@ -759,7 +760,9 @@ static void page_flush_tb_1(int level, void **lp)
  PageDesc *pd = *lp;
  
  for (i = 0; i < V_L2_SIZE; ++i) {
 -pd[i].first_tb = NULL;
 +for (j = 0; j < MAX_CPUS; j++) {
 +pd[i].first_tb[j] = NULL;
 +}
  invalidate_page_bitmap(pd + i);
  }
  } else {

Surely you've got to do some locking somewhere in order to be able to modify
another thread's cpu tb list.

I realize that we do have to solve this problem for x86, but for most other
targets we ought, in principle, to be able to avoid it, which simply requires
that we not treat icache flushes as nops.

When the kernel has modified a page, it will also have notified the other
CPUs of that, like so:

if (smp_call_function(ipi_flush_icache_page, mm, 1)) {

We ought to be able to leverage this to avoid some locking at the qemu level.


r~

Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard

2015-02-03 Thread Denis V. Lunev


On 03/02/15 14:47, Peter Lieven wrote:

On 03.02.2015 at 12:37, Kevin Wolf wrote:

On 03.02.2015 at 12:30, Peter Lieven wrote:

On 03.02.2015 at 08:31, Denis V. Lunev wrote:

On 02/02/15 23:46, Denis V. Lunev wrote:

On 02/02/15 23:40, Peter Lieven wrote:

On 02.02.2015 at 21:09, Denis V. Lunev wrote:

qemu_gluster_co_discard calculates size to discard as follows
  size_t size = nb_sectors * BDRV_SECTOR_SIZE;
  ret = glfs_discard_async(s->fd, offset, size, gluster_finish_aiocb, acb);

glfs_discard_async is declared as follows:
int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent,
glfs_io_cbk fn, void *data) __THROW
This is problematic on i686 as sizeof(size_t) == 4.

Set bl.max_discard to SIZE_MAX >> BDRV_SECTOR_BITS to avoid overflow
on i386.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Kevin Wolf kw...@redhat.com
CC: Peter Lieven p...@kamp.de
---
   block/gluster.c | 9 +
   1 file changed, 9 insertions(+)

diff --git a/block/gluster.c b/block/gluster.c
index 1eb3a8c..8a8c153 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -622,6 +622,11 @@ out:
   return ret;
   }
   +static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+bs->bl.max_discard = MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX);
+}
+

Looking at the gluster code, bl.max_transfer_length should have the same limit,
but that's a different patch.

ha, the same applies to nbd code too.

I'll do this stuff tomorrow and also I think that some
audit in other drivers could reveal something interesting.

Den

ok. The situation is well rotten here on i686.

The problem comes from the fact that QEMUIOVector
and iovec use size_t as the length. All API calls use
this abstraction. Thus any conversion from nr_sectors
to size could overflow at any moment.
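The truncation is easy to reproduce without a 32-bit host by modelling size_t as uint32_t. The constants below are modelled on BDRV_SECTOR_BITS and BDRV_SECTOR_SIZE but redefined locally, and the function is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS 9                        /* modelled on BDRV_SECTOR_BITS */
#define SECTOR_SIZE (1ULL << SECTOR_BITS)    /* modelled on BDRV_SECTOR_SIZE */

/*
 * Models 'size_t size = nb_sectors * BDRV_SECTOR_SIZE;' on a host where
 * size_t is 32 bits wide: the 64-bit product is silently truncated.
 * Only meaningful for non-negative nb_sectors.
 */
static uint32_t request_bytes_32bit(int nb_sectors)
{
    return (uint32_t)((uint64_t)nb_sectors * SECTOR_SIZE);
}
```

With 512-byte sectors, 8388607 sectors (4 GiB minus 512 bytes) still fit, but 8388608 sectors are exactly 2^32 bytes and wrap to a 0-byte request.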

Getting our hands dirty here is problematic from my point
of view. Should we really care about this? 32-bit
applications are becoming good old IT history...

The host has to be 32bit to be in trouble. And at least if we have KVM the host
has to support long mode.

I have it on my todo list to add generic code for honouring bl.max_transfer_length
in block.c. We could change the default maximum from INT_MAX to
SIZE_MAX >> BDRV_SECTOR_BITS for bl.max_transfer_length.

So the conclusion is that we'll apply this series as it is and you'll
take care of the rest later?

Yes, and actually we need a macro like

#define BDRV_MAX_REQUEST_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX)

as the limit for everything, because bdrv_check_byte_request already takes a
size_t argument. So we could already create an overflow in bdrv_check_request
when we convert nb_sectors to size_t.
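A guard built on a limit of that shape might look like the sketch below. To make both host widths testable in one program, the host's SIZE_MAX is passed in as a parameter; the function names are hypothetical and this is not the patch Peter describes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_BITS 9   /* modelled on BDRV_SECTOR_BITS */

/* Same shape as MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX). */
static uint64_t max_request_sectors(uint64_t size_max)
{
    uint64_t by_size = size_max >> SECTOR_BITS;

    return by_size < (uint64_t)INT_MAX ? by_size : (uint64_t)INT_MAX;
}

/* Hypothetical check before converting nb_sectors to a byte count. */
static bool request_fits(int64_t nb_sectors, uint64_t size_max)
{
    return nb_sectors >= 0 &&
           (uint64_t)nb_sectors <= max_request_sectors(size_max);
}
```

On a 32-bit host the size_t term dominates (8388607 sectors); on a 64-bit host INT_MAX does.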

I will create a patch to catch at least this overflow shortly.

Peter


I like this macro :)

I vote to move MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX) into generic code
on discard/write_zero paths immediately and drop this exact patch.

Patch 2 of this set would be better with an additional
+bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;

I'll wait for Peter's patch and respin on top of it to avoid unnecessary
commits.


Den

[Qemu-devel] [PATCH v2 0/3] bootdevice: change the boot order validation logic

2015-02-03 Thread arei.gonglei

From: Gonglei arei.gong...@huawei.com

The reset logic can be handled by either the machine reset or the boot
handler, so patch 1 stops returning an error when the boot handler
callback is not set.

Patch 2 validates the boot order argument before the VM starts
running.

Patch 3 passes &error_abort instead of NULL.

v2 -> v1:
 - add patch 2 suggested by Markus.
 - rework patch 3. (Markus)
 - add R-by in patch 1.

Gonglei (3):
  bootdevice: remove the check about boot_set_handler
  bootdevice: check boot order argument validation before vm running
  bootdevice: add check in restore_boot_order()

 bootdevice.c | 12 
 vl.c | 13 +++--
 2 files changed, 15 insertions(+), 10 deletions(-)

-- 
1.7.12.4

[Qemu-devel] [PATCH v2 2/3] bootdevice: check boot order argument validation before vm running

2015-02-03 Thread arei.gonglei

From: Gonglei arei.gong...@huawei.com

Only one of the -boot options 'once' and 'order' takes effect at a
time, so initial startup processing validates only one of them. And
pc.c's set_boot_dev() fails when its boot order argument is invalid.
This patch fixes the problem as follows:

 1. If once is given, register reset handler to restore boot order.

 2. Pass the normal boot order to machine creation.  Should fail when
   the normal boot order is invalid.

 3. If once is given, set it with qemu_boot_set().  Fails when the
   once boot order is invalid.

 4. Start the machine.

 5. On reset, the reset handler calls qemu_boot_set() to restore boot
   order.  Should never fail.
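The five steps above can be modelled in a few lines. This is a hypothetical, self-contained sketch rather than the vl.c change itself: boot_set() stands in for qemu_boot_set(), and the validity rule (letters 'a' to 'p', each used at most once) is an assumption, since the real check is machine-specific (for example pc.c's set_boot_dev()).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BOOT_ORDER_MAX 16

static char current_order[BOOT_ORDER_MAX + 1];
static char normal_order[BOOT_ORDER_MAX + 1];
static bool restore_on_reset;

/* Assumed rule: drive letters 'a'..'p', no repeats. */
static bool boot_order_valid(const char *order)
{
    bool seen[BOOT_ORDER_MAX] = { false };

    if (strlen(order) > BOOT_ORDER_MAX) {
        return false;
    }
    for (const char *p = order; *p; p++) {
        if (*p < 'a' || *p > 'p' || seen[*p - 'a']) {
            return false;
        }
        seen[*p - 'a'] = true;
    }
    return true;
}

/* Stand-in for qemu_boot_set(): only applies a valid order. */
static bool boot_set(const char *order)
{
    if (!boot_order_valid(order)) {
        return false;
    }
    strcpy(current_order, order);
    return true;
}

/* Steps 1-4. Returns 0, or -1/-2 if the normal/once order is invalid. */
static int machine_start(const char *order, const char *once)
{
    if (once) {                          /* step 1: arrange the restore */
        strncpy(normal_order, order, BOOT_ORDER_MAX);
        normal_order[BOOT_ORDER_MAX] = '\0';
        restore_on_reset = true;
    }
    if (!boot_set(order)) {              /* step 2: validate normal order */
        return -1;
    }
    if (once && !boot_set(once)) {       /* step 3: validate and apply once */
        return -2;
    }
    return 0;                            /* step 4: the machine would run */
}

/* Step 5: the reset handler; the normal order was validated in step 2,
 * so restoring it must succeed (the analogue of passing &error_abort). */
static void machine_reset(void)
{
    if (restore_on_reset) {
        bool ok = boot_set(normal_order);
        assert(ok);
        (void)ok;
        restore_on_reset = false;
    }
}
```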

Suggested-by: Markus Armbruster arm...@redhat.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 vl.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index 983259b..0d90d98 100644
--- a/vl.c
+++ b/vl.c
@@ -2734,6 +2734,7 @@ int main(int argc, char **argv, char **envp)
 const char *initrd_filename;
 const char *kernel_filename, *kernel_cmdline;
 const char *boot_order;
+const char *once = NULL;
 DisplayState *ds;
 int cyls, heads, secs, translation;
 QemuOpts *hda_opts = NULL, *opts, *machine_opts, *icount_opts = NULL;
@@ -4046,7 +4047,7 @@ int main(int argc, char **argv, char **envp)
 opts = qemu_opts_find(qemu_find_opts("boot-opts"), NULL);
 if (opts) {
 char *normal_boot_order;
-const char *order, *once;
+const char *order;
 Error *local_err = NULL;
 
 order = qemu_opt_get(opts, "order");
@@ -4067,7 +4068,6 @@ int main(int argc, char **argv, char **envp)
 exit(1);
 }
 normal_boot_order = g_strdup(boot_order);
-boot_order = once;
 qemu_register_reset(restore_boot_order, normal_boot_order);
 }
 
@@ -4246,6 +4246,15 @@ int main(int argc, char **argv, char **envp)
 
     net_check_clients();
 
+    if (once) {
+        Error *local_err = NULL;
+        qemu_boot_set(once, &local_err);
+        if (local_err) {
+            error_report("%s", error_get_pretty(local_err));
+            exit(1);
+        }
+    }
+
     ds = init_displaystate();
 
     /* init local displays */
-- 
1.7.12.4

[Qemu-devel] [PATCH v2 3/3] bootdevice: add check in restore_boot_order()

2015-02-03 Thread arei.gonglei

From: Gonglei arei.gong...@huawei.com

qemu_boot_set() can't fail in restore_boot_order(), so simply
assert that it doesn't fail by passing &error_abort.
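The idea behind passing &error_abort can be illustrated with a simplified model of an error out-parameter convention. This is not QEMU's implementation: struct error, error_abort_sentinel, set_error() and might_fail() are hypothetical names. The caller picks the policy: NULL ignores the error, the address of a local pointer collects it, and the address of the sentinel turns any error into an immediate abort().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error object. */
struct error {
    const char *msg;
};

/* Hypothetical sentinel: passing its address means "abort on error". */
static struct error *error_abort_sentinel;

/* Report an error according to the policy selected by the caller. */
static void set_error(struct error **errp, const char *msg)
{
    if (errp == &error_abort_sentinel) {
        fprintf(stderr, "unexpected error: %s\n", msg);
        abort();                 /* the "can't fail" assumption was wrong */
    }
    if (errp) {
        assert(*errp == NULL);   /* never overwrite an earlier error */
        *errp = malloc(sizeof(**errp));
        if (!*errp) {
            abort();
        }
        (*errp)->msg = msg;
    }
    /* errp == NULL: the caller chose to ignore the error */
}

/* Hypothetical operation that fails when asked to. */
static int might_fail(int fail, struct error **errp)
{
    if (fail) {
        set_error(errp, "operation failed");
        return -1;
    }
    return 0;
}
```

Compared with passing NULL, the sentinel does not change behavior when the call succeeds, but a violated "can't fail" assumption is reported loudly instead of being silently ignored.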

Signed-off-by: Gonglei arei.gong...@huawei.com
---
 bootdevice.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bootdevice.c b/bootdevice.c
index 52d3f9e..d3d4277 100644
--- a/bootdevice.c
+++ b/bootdevice.c
@@ -101,7 +101,7 @@ void restore_boot_order(void *opaque)
         return;
     }
 
-    qemu_boot_set(normal_boot_order, NULL);
+    qemu_boot_set(normal_boot_order, &error_abort);
 
     qemu_unregister_reset(restore_boot_order, normal_boot_order);
     g_free(normal_boot_order);
-- 
1.7.12.4

[Qemu-devel] [PATCH] vfio: free dynamically-allocated data in instance_finalize

2015-02-03 Thread Paolo Bonzini

In order to enable out-of-BQL address space lookup, destruction of
devices needs to be split in two phases.

Unrealize is the first phase; once it completes, no new accesses will
be started, but memory accesses that are already pending may still
complete afterwards.

The second phase is freeing the device, which only happens once all memory
accesses are complete.  At this point the reference count has dropped to
zero, which means an RCU grace period must have completed (because the
RCU-protected FlatViews hold a reference to the device via
memory_region_ref).  This is when instance_finalize is called.

Freeing data belongs in an instance_finalize callback, because the
dynamically allocated memory can still be used after unrealize by the
pending memory accesses.
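The two phases can be modeled with a reference count: unrealize only stops new accesses, while the device data is freed by a finalize step that runs when the last reference (here, a pending access) is dropped. This is a single-threaded sketch under stated assumptions; struct dev, dev_unrealize(), dev_ref()/dev_unref() and dev_finalize() are hypothetical names, and RCU grace periods are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct dev {
    int refcount;
    bool realized;       /* new accesses are allowed only while set */
    char *buffer;        /* dynamically allocated device data */
};

static int finalize_calls;   /* counts finalizations, for illustration */

static struct dev *dev_new(void)
{
    struct dev *d = calloc(1, sizeof(*d));
    assert(d);
    d->refcount = 1;         /* the owner's reference */
    d->realized = true;
    d->buffer = malloc(64);
    assert(d->buffer);
    return d;
}

/* Phase 2: runs only when nobody can touch the device anymore. */
static void dev_finalize(struct dev *d)
{
    free(d->buffer);
    free(d);
    finalize_calls++;
}

static void dev_ref(struct dev *d)
{
    d->refcount++;
}

static void dev_unref(struct dev *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        dev_finalize(d);
    }
}

/* A new access starts only on a realized device and holds a reference. */
static bool dev_begin_access(struct dev *d)
{
    if (!d->realized) {
        return false;
    }
    dev_ref(d);
    return true;
}

/* Phase 1: stop new accesses; in-flight ones keep their references. */
static void dev_unrealize(struct dev *d)
{
    d->realized = false;
}
```

Freeing the buffer in dev_unrealize() instead would leave the in-flight access pointing at freed memory, which is exactly why the data belongs in the finalize step.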

In the case of VFIO, the unrealize callback is too early to munmap the
BARs.  The munmap must be delayed until memory accesses are complete.
To do this, split vfio_unmap_bars in two.  The removal step, now called
vfio_unregister_bars, remains in vfio_exitfn.  The reclamation step
is vfio_unmap_bars and is moved to the instance_finalize callback.

Similarly, quirk MemoryRegions have to be removed during
vfio_unregister_bars, but freeing the data structure must be delayed
to vfio_unmap_bars.
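The split between removal and reclamation for quirks can be sketched on a plain linked list. The structures and functions below are hypothetical and only mirror the roles of vfio_bar_quirk_teardown() (unlink from the memory map, keep the memory) and vfio_bar_quirk_free() (release the memory later); a `mapped` flag stands in for "is a subregion of the BAR".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a BAR's quirk list, not the VFIO structures. */
struct quirk {
    bool mapped;             /* models "is a subregion of the BAR" */
    struct quirk *next;
};

struct bar {
    struct quirk *quirks;
};

static void bar_add_quirk(struct bar *bar)
{
    struct quirk *q = calloc(1, sizeof(*q));
    assert(q);
    q->mapped = true;
    q->next = bar->quirks;
    bar->quirks = q;
}

/* Removal (unrealize time): hide quirks from new lookups, keep memory. */
static void bar_quirk_teardown(struct bar *bar)
{
    for (struct quirk *q = bar->quirks; q; q = q->next) {
        q->mapped = false;
    }
}

/* Reclamation (finalize time): free the list; returns the number freed. */
static int bar_quirk_free(struct bar *bar)
{
    int n = 0;

    while (bar->quirks) {
        struct quirk *q = bar->quirks;
        bar->quirks = q->next;
        free(q);
        n++;
    }
    return n;
}
```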

Cc: Alex Williamson alex.william...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
This patch is part of the third installment of the RCU work.
Sending it out separately so that Alex can review it.

 hw/vfio/pci.c | 78 +-
 1 file changed, 68 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 014a92c..69d4a33 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1997,12 +1997,23 @@ static void vfio_vga_quirk_setup(VFIOPCIDevice *vdev)
 
 static void vfio_vga_quirk_teardown(VFIOPCIDevice *vdev)
 {
+    VFIOQuirk *quirk;
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(vdev->vga.region); i++) {
+        QLIST_FOREACH(quirk, &vdev->vga.region[i].quirks, next) {
+            memory_region_del_subregion(&vdev->vga.region[i].mem, &quirk->mem);
+        }
+    }
+}
+
+static void vfio_vga_quirk_free(VFIOPCIDevice *vdev)
+{
     int i;
 
     for (i = 0; i < ARRAY_SIZE(vdev->vga.region); i++) {
         while (!QLIST_EMPTY(&vdev->vga.region[i].quirks)) {
             VFIOQuirk *quirk = QLIST_FIRST(&vdev->vga.region[i].quirks);
-            memory_region_del_subregion(&vdev->vga.region[i].mem, &quirk->mem);
             object_unparent(OBJECT(&quirk->mem));
             QLIST_REMOVE(quirk, next);
             g_free(quirk);
@@ -2023,10 +2034,19 @@ static void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr)
 static void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
+    VFIOQuirk *quirk;
+
+    QLIST_FOREACH(quirk, &bar->quirks, next) {
+        memory_region_del_subregion(&bar->region.mem, &quirk->mem);
+    }
+}
+
+static void vfio_bar_quirk_free(VFIOPCIDevice *vdev, int nr)
+{
+    VFIOBAR *bar = &vdev->bars[nr];
 
     while (!QLIST_EMPTY(&bar->quirks)) {
         VFIOQuirk *quirk = QLIST_FIRST(&bar->quirks);
-        memory_region_del_subregion(&bar->region.mem, &quirk->mem);
         object_unparent(OBJECT(&quirk->mem));
         QLIST_REMOVE(quirk, next);
         g_free(quirk);
@@ -2282,7 +2302,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
     }
 }
 
-static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
+static void vfio_unregister_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
 
@@ -2293,10 +2313,25 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
     vfio_bar_quirk_teardown(vdev, nr);
 
     memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem);
-    munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem));
 
     if (vdev->msix && vdev->msix->table_bar == nr) {
         memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem);
+    }
+}
+
+static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
+{
+    VFIOBAR *bar = &vdev->bars[nr];
+
+    if (!bar->region.size) {
+        return;
+    }
+
+    vfio_bar_quirk_free(vdev, nr);
+
+    munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem));
+
+    if (vdev->msix && vdev->msix->table_bar == nr) {
         munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem));
     }
 }
@@ -2413,6 +2448,19 @@ static void vfio_unmap_bars(VFIOPCIDevice *vdev)
     }
 
     if (vdev->has_vga) {
+        vfio_vga_quirk_free(vdev);
+    }
+}
+
+static void vfio_unregister_bars(VFIOPCIDevice *vdev)
+{
+    int i;
+
+    for (i = 0; i < PCI_ROM_SLOT; i++) {
+        vfio_unregister_bar(vdev, i);
+    }
+
+    if (vdev->has_vga) {
         vfio_vga_quirk_teardown(vdev);
         pci_unregister_vga(&vdev->pdev);
     }
@@ -3324,6 +3372,7 @@ static int vfio_initfn(PCIDevice *pdev)
 out_teardown:
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
     vfio_teardown_msi(vdev);
+    vfio_unregister_bars(vdev);

Re: [Qemu-devel] [PATCH RFC 0/1] KVM: ioctl for reading/writing guest memory

2015-02-03 Thread Christian Borntraeger

On 03.02.2015 at 13:59, Paolo Bonzini wrote:
>
> On 03/02/2015 13:11, Thomas Huth wrote:
>> The userspace (QEMU) then can simply call this ioctl when it wants
>> to read or write from/to virtual guest memory. The kernel then takes
>> the IPTE-lock, walks the MMU table of the guest to find out the
>> physical address that corresponds to the virtual address, copies
>> the requested number of bytes from the userspace buffer to guest
>> memory or the other way round, and finally frees the IPTE-lock again.
>>
>> Does that sound like a viable solution (IMHO it does ;-))? Or should
>> I maybe try to pursue another approach?
>
> It looks feasible to me as well.

Yes, we discussed this internally a lot and things are really tricky. The
IPTE lock could be exported to userspace, but we might also need to handle
storage keys (and key protection) in an atomic fashion, so this really
looks like the only safe way.
I guess we will give it some more testing, but to me it looks like a good
candidate for kvm/next after 3.20-rc1.
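The access sequence described in the quoted proposal (take the lock, walk the guest's translation table, copy, release the lock) can be modeled in userspace with a toy one-level page table. This is purely illustrative and is not the proposed KVM interface: the page size, the page table layout, the ipte_locked flag (a stand-in for the real IPTE lock in a single-threaded model) and write_guest_virt() are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 256u
#define TOY_NR_PAGES  4u

/* Toy guest: physical memory plus a one-level page table. */
static uint8_t guest_phys[TOY_NR_PAGES * TOY_PAGE_SIZE];
static int page_table[TOY_NR_PAGES];   /* virtual page -> physical page, -1 = unmapped */
static bool ipte_locked;               /* stand-in for the IPTE lock */

/* Translate one guest virtual address; returns false if unmapped. */
static bool translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpage = vaddr / TOY_PAGE_SIZE;

    if (vpage >= TOY_NR_PAGES || page_table[vpage] < 0) {
        return false;
    }
    *paddr = (uint32_t)page_table[vpage] * TOY_PAGE_SIZE + vaddr % TOY_PAGE_SIZE;
    return true;
}

/* Copy len bytes from a user buffer to guest virtual memory. */
static int write_guest_virt(uint32_t vaddr, const void *buf, size_t len)
{
    const uint8_t *src = buf;
    int ret = 0;

    assert(!ipte_locked);
    ipte_locked = true;                            /* 1. take the lock */
    for (size_t i = 0; i < len; i++) {
        uint32_t pa;

        if (!translate(vaddr + (uint32_t)i, &pa)) { /* 2. walk the table */
            ret = -1;    /* unmapped: stop; bytes before it stay written */
            break;
        }
        guest_phys[pa] = src[i];                   /* 3. copy */
    }
    ipte_locked = false;                           /* 4. release the lock */
    return ret;
}
```

Doing the walk and the copy inside one locked section is what keeps the translation from changing underneath the copy; the storage-key checks mentioned above are not modeled.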


Christian

[Qemu-devel] [PULL 6/9] s390x/kvm: unknown DIAGNOSE code should give a specification exception

2015-02-03 Thread Cornelia Huck

From: Christian Borntraeger borntrae...@de.ibm.com

As described in CP Programming Services, an unimplemented DIAGNOSE
function should result in a specification exception. Today we give the
guest an operation exception instead.
Since both exception types are suppressing, and Linux as a guest does not
care about the type of program check in its exception table handler
as long as both types have the same kind of error handling (nullifying,
terminating, suppressing, etc.), this went unnoticed.
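The behavioral change can be illustrated with a toy dispatcher that returns the program interruption code to inject: operation exception (0x0001) before the patch, specification exception (0x0006) after it. The codes are those defined by the z/Architecture Principles of Operation; the DIAGNOSE function codes and dispatcher functions are hypothetical and not the s390x KVM code.

```c
#include <assert.h>
#include <stdint.h>

/* Program interruption codes (z/Architecture Principles of Operation). */
#define PGM_OPERATION     0x0001
#define PGM_SPECIFICATION 0x0006

/* Hypothetical DIAGNOSE function codes implemented by this toy. */
#define TOY_DIAG_FOO 0x0100
#define TOY_DIAG_BAR 0x0200

/* Returns 0 if the function was handled, otherwise the program
 * interruption code to inject into the guest. */
static int diag_dispatch_old(uint16_t func_code)
{
    switch (func_code) {
    case TOY_DIAG_FOO:
    case TOY_DIAG_BAR:
        return 0;
    default:
        return PGM_OPERATION;        /* behavior before the patch */
    }
}

static int diag_dispatch_new(uint16_t func_code)
{
    switch (func_code) {
    case TOY_DIAG_FOO:
    case TOY_DIAG_BAR:
        return 0;
    default:
        return PGM_SPECIFICATION;    /* behavior after the patch */
    }
}
```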

Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 target-s390x/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 6bf2719..6f2d5b4 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -1091,7 +1091,7 @@ static int handle_diag(S390CPU *cpu, struct kvm_run *run, uint32_t ipb)
         break;
     default:
         DPRINTF("KVM: unknown DIAG: 0x%x\n", func_code);
-        r = -1;
+        enter_pgmcheck(cpu, PGM_SPECIFICATION);
         break;
     }
 
-- 
1.7.9.5
