date:20160831

Re: [Qemu-devel] [PATCH v3 kernel 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration

2016-08-31 Thread Li, Liang Z

> Subject: Re: [PATCH v3 kernel 0/7] Extend virtio-balloon for fast 
> (de)inflating
> & fast live migration
> 
> 2016-08-08 14:35 GMT+08:00 Liang Li :
> > This patch set contains two parts of changes to the virtio-balloon.
> >
> > One is the change for speeding up the inflating & deflating process,
> > the main idea of this optimization is to use bitmap to send the page
> > information to host instead of the PFNs, to reduce the overhead of
> > virtio data transmission, address translation and madvise(). This can
> > help to improve the performance by about 85%.
> >
> > Another change is for speeding up live migration. By skipping process
> > guest's free pages in the first round of data copy, to reduce needless
> > data processing, this can help to save quite a lot of CPU cycles and
> > network bandwidth. We put guest's free page information in bitmap and
> > send it to host with the virt queue of virtio-balloon. For an idle 8GB
> > guest, this can help to shorten the total live migration time from
> > 2Sec to about 500ms in the 10Gbps network environment.
> 
> I just read the slides of this feature for recent kvm forum, the cloud
> providers more care about live migration downtime to avoid customers'
> perception than total time, however, this feature will increase downtime
> when acquire the benefit of reducing total time, maybe it will be more
> acceptable if there is no downside for downtime.
> 
> Regards,
> Wanpeng Li

In theory, there is no factor that will increase the downtime. There is no 
additional operation
and no more data copy during the stop and copy stage. But in the test, the 
downtime increases
and this can be reproduced. I think the busy network line maybe the reason for 
this. With this
 optimization, a huge amount of data is written to the socket in a shorter 
time, so some of the write
operation may need to wait. Without this optimization, zero page checking takes 
more time,
the network is not so busy.

If the guest is not an idle one, I think the gap of the downtime will not so 
obvious.  Anyway, the
downtime is still less than the  max_down_time set by the user.

Thanks!
Liang

Re: [Qemu-devel] [PATCH] Fix memory leak in ide_register_restart_cb()

2016-08-31 Thread Ashijeet Acharya

I am still waiting for review on this one.

On Tue, Aug 16, 2016 at 10:40 PM, Ashijeet Acharya
 wrote:
> Fix a memory leak in ide_register_restart_cb() in hw/ide/core.c and add 
> idebus_unrealize() in hw/ide/qdev.c to have calls to 
> qemu_del_vm_change_state_handler() to deal with the dangling change state 
> handler during hot-unplugging ide devices which might lead to a crash.
>
> Signed-off-by: Ashijeet Acharya 
> ---
>  hw/ide/core.c |  2 +-
>  hw/ide/qdev.c | 14 ++
>  include/hw/ide/internal.h |  1 +
>  3 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 45b6df1..eecbb47 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -2582,7 +2582,7 @@ static void ide_restart_cb(void *opaque, int running, 
> RunState state)
>  void ide_register_restart_cb(IDEBus *bus)
>  {
>  if (bus->dma->ops->restart_dma) {
> -qemu_add_vm_change_state_handler(ide_restart_cb, bus);
> +bus->vmstate = qemu_add_vm_change_state_handler(ide_restart_cb, bus);
>  }
>  }
>
> diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
> index 67c76bf..6f75f77 100644
> --- a/hw/ide/qdev.c
> +++ b/hw/ide/qdev.c
> @@ -31,6 +31,7 @@
>  /* - */
>
>  static char *idebus_get_fw_dev_path(DeviceState *dev);
> +static void idebus_unrealize(DeviceState *qdev, Error **errp);
>
>  static Property ide_props[] = {
>  DEFINE_PROP_UINT32("unit", IDEDevice, unit, -1),
> @@ -345,6 +346,7 @@ static void ide_device_class_init(ObjectClass *klass, 
> void *data)
>  k->init = ide_qdev_init;
>  set_bit(DEVICE_CATEGORY_STORAGE, k->categories);
>  k->bus_type = TYPE_IDE_BUS;
> +k->unrealize = idebus_unrealize;
>  k->props = ide_props;
>  }
>
> @@ -368,3 +370,15 @@ static void ide_register_types(void)
>  }
>
>  type_init(ide_register_types)
> +
> +static void idebus_unrealize(DeviceState *qdev, Error **errp)
> +{
> +IDEBus *bus = DO_UPCAST(IDEBus, qbus, qdev->parent_bus);
> +
> +if (bus->dma->ops->restart_dma) {
> +if (bus->vmstate) {
> +qemu_del_vm_change_state_handler(bus->vmstate);
> +}
> +}
> +}
>
> diff --git a/include/hw/ide/internal.h b/include/hw/ide/internal.h
> index 7824bc3..2103261 100644
> --- a/include/hw/ide/internal.h
> +++ b/include/hw/ide/internal.h
> @@ -480,6 +480,7 @@ struct IDEBus {
>  uint8_t retry_unit;
>  int64_t retry_sector_num;
>  uint32_t retry_nsector;
> +VMChangeStateEntry *vmstate;
>  };
>
>  #define TYPE_IDE_DEVICE "ide-device"
> --
> 2.6.2
>

Re: [Qemu-devel] [PATCH v2 0/9] SMMUv3 Emulation support

2016-08-31 Thread Prem Mallappa

Oops, my mistake, copy-paste from different part.
I'll correct it in next spin.

Eric, you are most welcome to review though :)



On Thu, Sep 1, 2016 at 3:14 AM, Auger Eric  wrote:

> Hi Prem,
> On 22/08/2016 18:17, Prem Mallappa wrote:
> > v1 -> v2:
> >   - Adopted review comments from Eric Auger
> Although I am really interested in your series, those comments are not
> mine and credit should be given to somebody else (Edgar?)
>
> I will do my utmost to review it too ;-)
>
> Thanks
>
> Eric
> >   - Make SMMU_DPRINTF to internally call qemu_log
> >   (since translation requests are too many, we need
> control
> >on the type of log we want)
> >   - SMMUTransCfg modified to suite simplicity
> >   - Change RegInfo to uint64 register array
> >   - Code cleanup
> >   - Test cleanups
> >   - Reshuffled patches
> >
> > RFC -> v1:
> >   - As per SMMUv3 spec 16.0 (only is_ste_consistant() is noticeable)
> >   - Reworked register access/update logic
> >   - Factored out translation code for
> >   - single point bug fix
> >   - sharing/removal in future
> >   - (optional) Unit tests added, with PCI test device
> >   - S1 with 4k/64k, S1+S2 with 4k/64k
> >   - (S1 or S2) only can be verified by Linux 4.7 driver
> >   - (optional) Priliminary ACPI support
> >
> > RFC:
> >   - Implements SMMUv3 spec 11.0
> >   - Supported for PCIe devices,
> >   - Command Queue and Event Queue supported
> >   - LPAE only, S1 is supported and Tested, S2 not tested
> >   - BE mode Translation not supported
> >   - IRQ support (legacy, no MSI)
> >   - Tested with DPDK and e1000
> >
> > Patch 1: Add new log type for IOMMU transactions
> >
> > Patch 2: Adds support in virt.c to create both SMMUv3 device and dts
> entries
> >
> > Patch 2: Adds SMMUv3 model to QEMU
> >   Multiple files, big ones, translate functionality is split across
> to
> >   accomodate SMMUv2 model, and to remove when common translation
> feature
> >   (if) becomes available.
> >
> > Patch 3: Adds SMMU build support
> >
> > Patch 4: Some devicetree function to add support for SMMU's multiple
> interrupt
> >assignment with names
> >
> > << optional patches >>
> > Optional patches are posted for completeness or for those who wants to
> test.
> >
> > Patch 5: A simple PCI device which does DMA from 'src' to 'dst' given
> >src_addr, dst_addr and size, and is used by unit test, uses
> >pci_dma_read and pci_dma_write in a crude way but serves the
> purpose.
> >
> > Patch 6: Current libqos PCI helpers are x86 only, this addes a generic
> interface
> >
> > Patch 7: Unit tests for SMMU,
> >   - initializes SMMU device
> >   - initializes Test device
> >   - allocates page tables 1:1 mapping va == pa
> >   - allocates STE/CD accordingly for S1, S2, S1+S2
> >   - initiates DMA via PCI test device
> >   - verifies transfered data
> >
> > Patch 8: Added ACPI IORT tables, was needed for internal project
> purpose, but
> >posting here for anyone looking for testing ACPI on ARM platforms.
> >(P.S: Linux side IORT patches are WIP)
> >
> > Repo:
> > https://github.com/pmallappa/qemu/tree/upstream/smmuv3/v2
> >
> > To Test:
> > $ make tests/smmuv3-test
> > $ QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64
> tests/smmuv3-test
> > << expect lot of prints >>
> >
> > Any comments welcome..
> >
> > Cheers
> > /Prem
> >
> > Prem Mallappa (9):
> >   log: Add new IOMMU type
> >   devicetree: Added new APIs to make use of more fdt functions
> >   hw: arm: SMMUv3 emulation model
> >   hw: arm: Added SMMUv3 files for build
> >   hw: arm: Add SMMUv3 to virt platform, create DTS accordingly
> >   [optional] hw: misc: added testdev for smmu
> >   [optional] tests: libqos: generic pci probing helpers
> >   [optional] tests: SMMUv3 unit tests
> >   [optional] arm: smmu-v3: ACPI IORT initial support
> >
> >  default-configs/aarch64-softmmu.mak |1 +
> >  device_tree.c   |   35 +
> >  hw/arm/Makefile.objs|1 +
> >  hw/arm/smmu-common.c|  152 
> >  hw/arm/smmu-common.h|  141 
> >  hw/arm/smmu-v3.c| 1369
> +++
> >  hw/arm/smmuv3-internal.h|  432 +++
> >  hw/arm/virt-acpi-build.c|   43 ++
> >  hw/arm/virt.c   |   62 ++
> >  hw/misc/Makefile.objs   |2 +-
> >  hw/misc/pci-testdev-smmu.c  |  239 ++
> >  hw/misc/pci-testdev-smmu.h  |   22 +
> >  hw/vfio/common.c|2 +-
> >  include/hw/acpi/acpi-defs.h |   84 +++
> >  include/hw/arm/smmu.h   |   33 +
> >  include/hw/arm/virt.h   |2 +
> >

Re: [Qemu-devel] [PATCH V12 07/10] colo-compare: add TCP, UDP, ICMP packet comparison

2016-08-31 Thread Zhang Chen




On 08/31/2016 05:33 PM, Jason Wang wrote:



On 2016年08月17日 16:10, Zhang Chen wrote:

We add TCP,UDP,ICMP packet comparison to replace
IP packet comparison. This can increase the
accuracy of the package comparison.
Less checkpoint more efficiency.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
  net/colo-compare.c | 152 
+++--

  trace-events   |   4 ++
  2 files changed, 152 insertions(+), 4 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index b90cf1f..0daefd9 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -18,6 +18,7 @@
  #include "qapi/qmp/qerror.h"
  #include "qapi/error.h"
  #include "net/net.h"
+#include "net/eth.h"
  #include "net/vhost_net.h"
  #include "qom/object_interfaces.h"
  #include "qemu/iov.h"
@@ -179,9 +180,136 @@ static int colo_packet_compare(Packet *ppkt, 
Packet *spkt)

  }
  }
  -static int colo_packet_compare_all(Packet *spkt, Packet *ppkt)
+/*
+ * called from the compare thread on the primary
+ * for compare tcp packet
+ * compare_tcp copied from Dr. David Alan Gilbert's branch
+ */
+static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
+{
+struct tcphdr *ptcp, *stcp;
+int res;
+char *sdebug, *ddebug;
+
+trace_colo_compare_main("compare tcp");
+if (ppkt->size != spkt->size) {
+if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
+trace_colo_compare_main("pkt size not same");
+}
+return -1;
+}
+
+ptcp = (struct tcphdr *)ppkt->transport_header;
+stcp = (struct tcphdr *)spkt->transport_header;
+
+if (ptcp->th_seq != stcp->th_seq) {
+if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
+trace_colo_compare_main("pkt tcp seq not same");
+}
+return -1;
+}
+
+/*
+ * The 'identification' field in the IP header is *very* random
+ * it almost never matches.  Fudge this by ignoring differences in
+ * unfragmented packets; they'll normally sort themselves out if 
different

+ * anyway, and it should recover at the TCP level.
+ * An alternative would be to get both the primary and secondary 
to rewrite

+ * somehow; but that would need some sync traffic to sync the state
+ */
+if (ntohs(ppkt->ip->ip_off) & IP_DF) {
+spkt->ip->ip_id = ppkt->ip->ip_id;
+/* and the sum will be different if the IDs were different */
+spkt->ip->ip_sum = ppkt->ip->ip_sum;
+}
+
+res = memcmp(ppkt->data + ETH_HLEN, spkt->data + ETH_HLEN,
+(spkt->size - ETH_HLEN));


This may work but I worry about whether or not tagged packet can work 
here. Looks like parse_packet_early() can recognize vlan tag, but 
fill_connection_key() can not. This looks can result queuing wrong 
packets into wrong connection.


Currently COLO proxy can't support vlan, we will add this feature in the 
future.





+
+if (res != 0 && 
trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {

+sdebug = strdup(inet_ntoa(ppkt->ip->ip_src));
+ddebug = strdup(inet_ntoa(ppkt->ip->ip_dst));
+fprintf(stderr, "%s: src/dst: %s/%s p: seq/ack=%u/%u"
+" s: seq/ack=%u/%u res=%d flags=%x/%x\n", __func__,
+   sdebug, ddebug,
+   ntohl(ptcp->th_seq), ntohl(ptcp->th_ack),
+   ntohl(stcp->th_seq), ntohl(stcp->th_ack),
+   res, ptcp->th_flags, stcp->th_flags);


I tend not mix using debug logs with tracepoints.


OK, I will change trace_colo_compare_tcp_miscompare() to fprintf() here.

Thanks
Zhang Chen




+
+trace_colo_compare_tcp_miscompare("Primary len", ppkt->size);
+qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", 
ppkt->size);

+trace_colo_compare_tcp_miscompare("Secondary len", spkt->size);
+qemu_hexdump((char *)spkt->data, stderr, "colo-compare", 
spkt->size);

+
+g_free(sdebug);
+g_free(ddebug);
+}
+
+return res;
+}
+
+/*
+ * called from the compare thread on the primary
+ * for compare udp packet
+ */
+static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt)
+{
+int ret;
+
+trace_colo_compare_main("compare udp");
+ret = colo_packet_compare(ppkt, spkt);
+
+if (ret) {
+trace_colo_compare_udp_miscompare("primary pkt size", 
ppkt->size);
+qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", 
ppkt->size);
+trace_colo_compare_udp_miscompare("Secondary pkt size", 
spkt->size);
+qemu_hexdump((char *)spkt->data, stderr, "colo-compare", 
spkt->size);

+}
+
+return ret;
+}
+
+/*
+ * called from the compare thread on the primary
+ * for compare icmp packet
+ */
+static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt)
+{
+int network_length;
+
+trace_colo_compare_main("compare icmp");
+network_length =

Re: [Qemu-devel] [PATCH v2] scsi: check page count while initialising descriptor rings

2016-08-31 Thread P J P

  Hello Dmitry,

+-- On Wed, 31 Aug 2016, Dmitry Fleytman wrote --+
| > -if ((ri->reqRingNumPages > PVSCSI_SETUP_RINGS_MAX_NUM_PAGES)
| > -|| (ri->cmpRingNumPages > PVSCSI_SETUP_RINGS_MAX_NUM_PAGES)) {
| > -return -1;
| > -}
| 
| Hello Prasad,
| 
| Why did you decide to move this logic out of pvscsi_ring_init_data()?
| Why not just amend existing “if" as you did in v1 of this patch?

  'ri->reqRingNumPages' and 'ri->cmpRingNumPages' values are also used in 
routine 'pvscsi_dbg_dump_tx_rings_config' before 'pvscsi_ring_init_data' call. 
if they were to have arbitrary values, this loop could run longer leading to 
OOB memory access.

for (i = 0; i < rc->reqRingNumPages; i++) { 
trace_pvscsi_tx_rings_ppn("Request Ring", rc->reqRingPPNs[i]);  
}

Moving above logic to 'pvscsi_on_cmd_setup_rings' helps both functions.

Thank you.
--
Prasad J Pandit / Red Hat Product Security Team
47AF CE69 3A90 54AA 9045 1053 DD13 3D32 FE5B 041F

Re: [Qemu-devel] [PATCH V12 06/10] colo-compare: introduce packet comparison thread

2016-08-31 Thread Zhang Chen




On 08/31/2016 05:13 PM, Jason Wang wrote:



On 2016年08月17日 16:10, Zhang Chen wrote:

If primary packet is same with secondary packet,
we will send primary packet and drop secondary
packet, otherwise notify COLO frame to do checkpoint.
If primary packet comes but secondary packet does not,
after REGULAR_PACKET_CHECK_MS milliseconds we set
the primary packet as old_packet,then do a checkpoint.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
  net/colo-compare.c | 216 
+

  net/colo.c |   1 +
  net/colo.h |   3 +
  trace-events   |   2 +
  4 files changed, 222 insertions(+)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index bab215b..b90cf1f 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -36,6 +36,8 @@
#define COMPARE_READ_LEN_MAX NET_BUFSIZE
  #define MAX_QUEUE_SIZE 1024
+/* TODO: Should be configurable */
+#define REGULAR_PACKET_CHECK_MS 3000
/*
+ CompareState ++
@@ -79,6 +81,10 @@ typedef struct CompareState {
  GQueue conn_list;
  /* hashtable to save connection */
  GHashTable *connection_track_table;
+/* compare thread, a thread for each NIC */
+QemuThread thread;
+/* Timer used on the primary to find packets that are never 
matched */

+QEMUTimer *timer;
  } CompareState;
typedef struct CompareClass {
@@ -152,6 +158,113 @@ static int packet_enqueue(CompareState *s, int 
mode)

  return 0;
  }
  +/*
+ * The IP packets sent by primary and secondary
+ * will be compared in here
+ * TODO support ip fragment, Out-Of-Order
+ * return:0  means packet same
+ *> 0 || < 0 means packet different
+ */
+static int colo_packet_compare(Packet *ppkt, Packet *spkt)
+{
+trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src),
+ inet_ntoa(ppkt->ip->ip_dst), spkt->size,
+ inet_ntoa(spkt->ip->ip_src),
+ inet_ntoa(spkt->ip->ip_dst));
+
+if (ppkt->size == spkt->size) {
+return memcmp(ppkt->data, spkt->data, spkt->size);
+} else {
+return -1;
+}
+}
+
+static int colo_packet_compare_all(Packet *spkt, Packet *ppkt)
+{
+trace_colo_compare_main("compare all");
+return colo_packet_compare(ppkt, spkt);
+}
+
+static void colo_old_packet_check_one(void *opaque_packet,
+  void *opaque_found)
+{
+int64_t now;
+bool *found_old = (bool *)opaque_found;
+Packet *ppkt = (Packet *)opaque_packet;
+
+if (*found_old) {
+/* Someone found an old packet earlier in the queue */
+return;
+}
+
+now = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+if ((now - ppkt->creation_ms) > REGULAR_PACKET_CHECK_MS) {
+ trace_colo_old_packet_check_found(ppkt->creation_ms);
+*found_old = true;
+}
+}
+
+static void colo_old_packet_check_one_conn(void *opaque,
+   void *user_data)
+{
+bool found_old = false;
+Connection *conn = opaque;
+
+g_queue_foreach(>primary_list, colo_old_packet_check_one,
+_old);


As I mentioned in last version, can we avoid iterating all packets by 
using g_queue_find_custom() here?


OK~~ I got it.




+if (found_old) {
+/* do checkpoint will flush old packet */
+/* TODO: colo_notify_checkpoint();*/
+}
+}
+
+/*
+ * Look for old packets that the secondary hasn't matched,
+ * if we have some then we have to checkpoint to wake
+ * the secondary up.
+ */
+static void colo_old_packet_check(void *opaque)
+{
+CompareState *s = opaque;
+
+g_queue_foreach(>conn_list, colo_old_packet_check_one_conn, 
NULL);

+}
+
+/*
+ * called from the compare thread on the primary
+ * for compare connection
+ */
+static void colo_compare_connection(void *opaque, void *user_data)
+{
+CompareState *s = user_data;
+Connection *conn = opaque;
+Packet *pkt = NULL;
+GList *result = NULL;
+int ret;
+
+while (!g_queue_is_empty(>primary_list) &&
+   !g_queue_is_empty(>secondary_list)) {
+pkt = g_queue_pop_tail(>primary_list);
+result = g_queue_find_custom(>secondary_list,
+  pkt, 
(GCompareFunc)colo_packet_compare_all);

+
+if (result) {
+ret = compare_chr_send(s->chr_out, pkt->data, pkt->size);
+if (ret < 0) {
+error_report("colo_send_primary_packet failed");
+}
+trace_colo_compare_main("packet same and release packet");
+g_queue_remove(>secondary_list, result->data);
+packet_destroy(pkt, NULL);
+} else {


Better add a comment to explain the case when secondary packet comes a 
little bit late here.


OK~~ I will add comments in next version.




+ trace_colo_compare_main("packet different");
+g_queue_push_tail(>primary_list, pkt);
+/* TODO:

Re: [Qemu-devel] [PATCH v3 kernel 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration

2016-08-31 Thread Wanpeng Li

2016-08-08 14:35 GMT+08:00 Liang Li :
> This patch set contains two parts of changes to the virtio-balloon.
>
> One is the change for speeding up the inflating & deflating process,
> the main idea of this optimization is to use bitmap to send the page
> information to host instead of the PFNs, to reduce the overhead of
> virtio data transmission, address translation and madvise(). This can
> help to improve the performance by about 85%.
>
> Another change is for speeding up live migration. By skipping process
> guest's free pages in the first round of data copy, to reduce needless
> data processing, this can help to save quite a lot of CPU cycles and
> network bandwidth. We put guest's free page information in bitmap and
> send it to host with the virt queue of virtio-balloon. For an idle 8GB
> guest, this can help to shorten the total live migration time from 2Sec
> to about 500ms in the 10Gbps network environment.

I just read the slides of this feature for recent kvm forum, the cloud
providers more care about live migration downtime to avoid customers'
perception than total time, however, this feature will increase
downtime when acquire the benefit of reducing total time, maybe it
will be more acceptable if there is no downside for downtime.

Regards,
Wanpeng Li

Re: [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-08-31 Thread Tian, Kevin

> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, August 31, 2016 11:49 PM
> 
> > >
> > > IGD doesn't have such peer-to-peer resource setup requirement. So
> > > it's sufficient to create/destroy a mdev instance in a single action on
> > > IGD. However I'd expect we still keep the "start/stop" interface (
> > > maybe not exposed as sysfs node, instead being a VFIO API), as
> > > required to support future live migration usage. We've made prototype
> > > working for KVMGT today.
> 
> Great!
> 

btw here is a link to KVMGT live migration demo:

https://www.youtube.com/watch?v=y2SkU5JODIY

Thanks
Kevin

Re: [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-08-31 Thread Tian, Kevin

> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, August 31, 2016 11:49 PM
> 
> On Wed, 31 Aug 2016 15:04:13 +0800
> Jike Song  wrote:
> 
> > On 08/31/2016 02:12 PM, Tian, Kevin wrote:
> > >> From: Alex Williamson [mailto:alex.william...@redhat.com]
> > >> Sent: Wednesday, August 31, 2016 12:17 AM
> > >>
> > >> Hi folks,
> > >>
> > >> At KVM Forum we had a BoF session primarily around the mediated device
> > >> sysfs interface.  I'd like to share what I think we agreed on and the
> > >> "problem areas" that still need some work so we can get the thoughts
> > >> and ideas from those who weren't able to attend.
> > >>
> > >> DanPB expressed some concern about the mdev_supported_types sysfs
> > >> interface, which exposes a flat csv file with fields like "type",
> > >> "number of instance", "vendor string", and then a bunch of type
> > >> specific fields like "framebuffer size", "resolution", "frame rate
> > >> limit", etc.  This is not entirely machine parsing friendly and sort of
> > >> abuses the sysfs concept of one value per file.  Example output taken
> > >> from Neo's libvirt RFC:
> > >>
> > >> cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
> > >> # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
> > >> framebuffer,
> > >> max_resolution
> > >> 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
> > >> 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
> > >> 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
> > >> 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
> > >> 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
> > >> 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
> > >> 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
> > >> 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
> > >>
> > >> The create/destroy then looks like this:
> > >>
> > >> echo "$mdev_UUID:vendor_specific_argument_list" >
> > >>  /sys/bus/pci/devices/.../mdev_create
> > >>
> > >> echo "$mdev_UUID:vendor_specific_argument_list" >
> > >>  /sys/bus/pci/devices/.../mdev_destroy
> > >>
> > >> "vendor_specific_argument_list" is nebulous.
> > >>
> > >> So the idea to fix this is to explode this into a directory structure,
> > >> something like:
> > >>
> > >> ├── mdev_destroy
> > >> └── mdev_supported_types
> > >> ├── 11
> > >> │   ├── create
> > >> │   ├── description
> > >> │   └── max_instances
> > >> ├── 12
> > >> │   ├── create
> > >> │   ├── description
> > >> │   └── max_instances
> > >> └── 13
> > >> ├── create
> > >> ├── description
> > >> └── max_instances
> > >>
> > >> Note that I'm only exposing the minimal attributes here for simplicity,
> > >> the other attributes would be included in separate files and we would
> > >> require vendors to create standard attributes for common device classes.
> > >
> > > I like this idea. All standard attributes are reflected into this 
> > > hierarchy.
> > > In the meantime, can we still allow optional vendor string in create
> > > interface? libvirt doesn't need to know the meaning, but allows upper
> > > layer to do some vendor specific tweak if necessary.
> > >
> >
> > Not sure whether this can done within MDEV framework (attrs provided by
> > vendor driver of course), or must be within the vendor driver.
> 
> The purpose of the sub-directories is that libvirt doesn't need to pass
> arbitrary, vendor strings to the create function, the attributes of the
> mdev device created are defined by the attributes in the sysfs
> directory where the create is done.  The user only provides a uuid for
> the device.  Arbitrary vendor parameters are a barrier, libvirt may not
> need to know the meaning, but would need to know when to apply them,
> which is just as bad.  Ultimately we want libvirt to be able to
> interact with sysfs without having an vendor specific knowledge.

Understand. Today Intel doesn't have such vendor specific parameter
requirement when creating a mdev instance (assuming type definition
is enough to cover our existing parameters).

Just think about future extensibility. Say if a new parameter (say
a QoS parameter like weight or cap) must be statically set before 
created mdev instance starts to work, due to device limitation, such
parameter needs to be exposed as a new attribute under the specific 
mdev instance, e.g.:
/sys/bus/pci/devices//mdev/weight

Then libvirt needs to make sure it's set before open() the instance.

If such flow is acceptable, it should remove necessity of vendor specific
parameter at the create, because any such requirement should be 
converted into sysfs node, if applicable to all vendors, then libvirt
can do asynchronous configurations before starting the instance.

> 
> > >>
> > >> For vGPUs like NVIDIA where we

Re: [Qemu-devel] [PATCH for 2.8 10/11] Revert "intel_iommu: Throw hw_error on notify_started"

2016-08-31 Thread Peter Xu

On Wed, Aug 31, 2016 at 08:43:42PM -0600, Alex Williamson wrote:
> > > >>This reverts commit 3cb3b1549f5401dc3a5e1d073e34063dc274136f. Vhost
> > > >>device IOTLB API will get notified and send invalidation request to
> > > >>vhost through this notifier.  
> > > >AFAICT this series does not address the original problem for which
> > > >commit 3cb3b1549f54 was added.  We've only addressed the very narrow
> > > >use case of a device iotlb firing the iommu notifier therefore this
> > > >change is a regression versus 2.7 since it allows invalid
> > > >configurations with a physical iommu which will never receive the
> > > >necessary notifies from intel-iommu emulation to work properly.  Thanks,
> > > >
> > > >Alex  
> > > 
> > > Looking at vfio, it cares about map but vhost only cares about IOTLB
> > > invalidation. Then I think we probably need another kind of notifier in 
> > > this
> > > case to avoid this.  
> > 
> > Shall we leverage IOMMUTLBEntry.perm == IOMMU_NONE as a sign for
> > invalidation? If so, we can use the same IOTLB interface as before.
> > IMHO these two interfaces are not conflicting?
> > 
> > Alex,
> > 
> > Do you mean we should still disallow user from passing through devices
> > while Intel IOMMU enabled? If so, not sure whether patch below can
> > solve the issue.
> > 
> > It seems that we need a "name" for either IOMMU notifier
> > provider/consumer, and we should not allow (provider==Intel &&
> > consumer==VFIO) happen. In the following case, I added a name for
> > provider, and VFIO checks it.
> 
> Absolutely not, intel-iommu emulation is simply incomplete, the IOMMU
> notifier is never called for mappings.  There's a whole aspect of
> iommu notifiers that intel-iommu simply hasn't bothered to implement.
> Don't punish vfio for actually making use of the interface as it was
> intended to be used.  AFAICT you're implementing the unmap/invalidation
> half, without the actual mapping half of the interface.  It's broken
> and incompatible with any iommu notifiers that expect to see both
> sides.  Thanks,

Yeah I think I got your point. Thanks for the explanation.

Now I agree with Jason that we may need another notifier mechanism.

-- peterx

Re: [Qemu-devel] [PATCH COLO-Frame v19 00/22] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-08-31 Thread no-reply

Hi,

Your series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Subject: [Qemu-devel] [PATCH COLO-Frame v19 00/22] COarse-grain 
LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Type: series
Message-id: 1472700265-16760-1-git-send-email-zhang.zhanghaili...@huawei.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
make J=8 docker-test-quick@centos6
make J=8 docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
42c65a0 configure: Support enable/disable COLO feature
4e337ab docs: Add documentation for COLO feature
e54092d COLO: Add block replication into colo process
d58788e COLO: Update the global runstate after going into colo state
d16cb12 COLO: Handle shutdown command for VM in COLO state
f310213 COLO: Don't do failover while loading VM's state
fb716cc COLO: Shutdown related socket fd while do failover
6a50027 COLO: Implement failover work for secondary VM
ecc5fd3 COLO: Implement the process of failover for primary VM
ac596c7 COLO: Introduce state to record failover process
4da078e COLO: Add 'x-colo-lost-heartbeat' command to trigger failover
4b1c6e6 COLO: Synchronize PVM's state to SVM periodically
c29d4f3 COLO: Add checkpoint-delay parameter for migrate-set-parameters
672e55a COLO: Load VMState into QIOChannelBuffer before restore it
ec8130b COLO: Send PVM state to secondary side when do checkpoint
ca88227 COLO: Add a new RunState RUN_STATE_COLO
a3c6044 COLO: Introduce checkpointing protocol
4772fcd COLO: Establish a new communicating path for COLO
05e2b8c migration: Switch to COLO process after finishing loadvm
b970b41 migration: Enter into COLO mode after migration if COLO is enabled
4c33192 COLO: migrate COLO related info to secondary node
d596fff migration: Introduce capability 'x-colo' to migration

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf'
  BUILD centos6
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPY RUNNER
  RUN test-quick in centos6
No C++ compiler available; disabling C++ specific optional code
Install prefix/tmp/qemu-test/src/tests/docker/install
BIOS directory/tmp/qemu-test/src/tests/docker/install/share/qemu
binary directory  /tmp/qemu-test/src/tests/docker/install/bin
library directory /tmp/qemu-test/src/tests/docker/install/lib
module directory  /tmp/qemu-test/src/tests/docker/install/lib/qemu
libexec directory /tmp/qemu-test/src/tests/docker/install/libexec
include directory /tmp/qemu-test/src/tests/docker/install/include
config directory  /tmp/qemu-test/src/tests/docker/install/etc
local state directory   /tmp/qemu-test/src/tests/docker/install/var
Manual directory  /tmp/qemu-test/src/tests/docker/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path   /tmp/qemu-test/src
C compilercc
Host C compiler   cc
C++ compiler  
Objective-C compiler cc
ARFLAGS   rv
CFLAGS-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -pthread 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -g 
QEMU_CFLAGS   -I/usr/include/pixman-1-fPIE -DPIE -m64 -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common  -Wendif-labels -Wmissing-include-dirs 
-Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wold-style-declaration -Wold-style-definition 
-Wtype-limits -fstack-protector-all
LDFLAGS   -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make  make
install   install
pythonpython -B
smbd  /usr/sbin/smbd
module supportno
host CPU  x86_64
host big endian   no
target list   x86_64-softmmu aarch64-softmmu
tcg debug enabled no
gprof enabled no
sparse enabledno
strip binariesyes
profiler  no
static build  no
pixmansystem
SDL support   yes (1.2.14)
GTK support   no 
GTK GL supportno
VTE support   no 
TLS priority  NORMAL
GNUTLS supportno
GNUTLS rndno
libgcrypt no
libgcrypt kdf no
nettleno 
nettle kdfno
libtasn1  no
curses supportno
virgl support no
curl support  no
mingw32 support   no
Audio drivers oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS supportno
VNC support   yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
xen support   no
brlapi supportno
bluez  supportno
Documentation no
PIE   yes
vde support   no
netmap supportno
Linux AIO support no
ATTR/XATTR support yes
Install blobs yes
KVM support   yes
COLO support  yes
RDMA support

Re: [Qemu-devel] [PATCH COLO-Frame v18 01/34] configure: Add parameter for configure to enable/disable COLO support

2016-08-31 Thread Hailiang Zhang


Hi Amit，

On 2016/8/26 5:45, Amit Shah wrote:

On (Wed) 03 Aug 2016 [20:25:39], zhanghailiang wrote:

configure --enable-colo/--disable-colo to switch COLO
support on/off.
COLO support is On by default.


Can you please make this the last patch in the series - so we get the
code in before we add in the config option?  Better for bisection, but
also for a logical flow.



I have moved it to the end of the series, please see the new version.

Thanks.
Hailiang


Thanks,

 Amit

.

[Qemu-devel] [PATCH COLO-Frame v19 03/22] migration: Enter into COLO mode after migration if COLO is enabled

2016-08-31 Thread zhanghailiang

Add a new migration state: MIGRATION_STATUS_COLO. Migration source side
enters this state after the first live migration successfully finished
if COLO is enabled by command 'migrate_set_capability x-colo on'.

We reuse migration thread, so the process of checkpointing will be handled
in migration thread.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Dr. David Alan Gilbert 
---
v19:
- fix title to make it more exact.
v11:
- Rebase to master
- Add Reviewed-by tag
v10:
- Simplify process by dropping colo thread and reusing migration thread.
 (Dave's suggestion)
---
 include/migration/colo.h |  3 +++
 migration/colo.c | 31 +++
 migration/migration.c| 32 
 migration/trace-events   |  3 +++
 qapi-schema.json |  4 +++-
 stubs/migration-colo.c   |  9 +
 6 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 1c899a0..bf84b99 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -19,4 +19,7 @@
 bool colo_supported(void);
 void colo_info_init(void);
 
+void migrate_start_colo_process(MigrationState *s);
+bool migration_in_colo_state(void);
+
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index d215057..fd3ceeb 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -11,9 +11,40 @@
  */
 
 #include "qemu/osdep.h"
+#include "sysemu/sysemu.h"
 #include "migration/colo.h"
+#include "trace.h"
 
 bool colo_supported(void)
 {
 return false;
 }
+
+bool migration_in_colo_state(void)
+{
+MigrationState *s = migrate_get_current();
+
+return (s->state == MIGRATION_STATUS_COLO);
+}
+
+static void colo_process_checkpoint(MigrationState *s)
+{
+qemu_mutex_lock_iothread();
+vm_start();
+qemu_mutex_unlock_iothread();
+trace_colo_vm_state_change("stop", "run");
+
+/* TODO: COLO checkpoint savevm loop */
+
+migrate_set_state(>state, MIGRATION_STATUS_COLO,
+  MIGRATION_STATUS_COMPLETED);
+}
+
+void migrate_start_colo_process(MigrationState *s)
+{
+qemu_mutex_unlock_iothread();
+migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
+  MIGRATION_STATUS_COLO);
+colo_process_checkpoint(s);
+qemu_mutex_lock_iothread();
+}
diff --git a/migration/migration.c b/migration/migration.c
index 17f0f75..4a5bdb9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -690,6 +690,10 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
 get_xbzrle_cache_stats(info);
 break;
+case MIGRATION_STATUS_COLO:
+info->has_status = true;
+/* TODO: display COLO specific information (checkpoint info etc.) */
+break;
 case MIGRATION_STATUS_COMPLETED:
 get_xbzrle_cache_stats(info);
 
@@ -1094,7 +1098,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 params.shared = has_inc && inc;
 
 if (migration_is_setup_or_active(s->state) ||
-s->state == MIGRATION_STATUS_CANCELLING) {
+s->state == MIGRATION_STATUS_CANCELLING ||
+s->state == MIGRATION_STATUS_COLO) {
 error_setg(errp, QERR_MIGRATION_ACTIVE);
 return;
 }
@@ -1686,8 +1691,11 @@ static void migration_completion(MigrationState *s, int 
current_active_state,
 goto fail_invalidate;
 }
 
-migrate_set_state(>state, current_active_state,
-  MIGRATION_STATUS_COMPLETED);
+if (!migrate_colo_enabled()) {
+migrate_set_state(>state, current_active_state,
+  MIGRATION_STATUS_COMPLETED);
+}
+
 return;
 
 fail_invalidate:
@@ -1732,6 +1740,7 @@ static void *migration_thread(void *opaque)
 bool entered_postcopy = false;
 /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
 enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
+bool enable_colo = migrate_colo_enabled();
 
 rcu_register_thread();
 
@@ -1840,7 +1849,13 @@ static void *migration_thread(void *opaque)
 end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 
 qemu_mutex_lock_iothread();
-qemu_savevm_state_cleanup();
+/*
+ * The resource has been allocated by migration will be reused in COLO
+ * process, so don't release them.
+ */
+if (!enable_colo) {
+qemu_savevm_state_cleanup();
+}
 if (s->state == MIGRATION_STATUS_COMPLETED) {
 uint64_t transferred_bytes = qemu_ftell(s->to_dst_file);
 s->total_time = end_time - s->total_time;
@@ -1853,6 +1868,15 @@ static void *migration_thread(void *opaque)
 }
 runstate_set(RUN_STATE_POSTMIGRATE);
 } else {
+if (s->state == MIGRATION_STATUS_ACTIVE && enable_colo) {
+migrate_start_colo_process(s);
+qemu_savevm_state_cleanup();
+/*
+

[Qemu-devel] [PATCH COLO-Frame v19 09/22] COLO: Load VMState into QIOChannelBuffer before restore it

2016-08-31 Thread zhanghailiang

We should not destroy the state of SVM (Secondary VM) until we receive
the complete data of PVM's state, in case the primary fails in the process
of sending the state, so we cache the VM's state in secondary side before
load it into SVM.

Besides, we should call qemu_system_reset() before load VM state,
which can ensure the data is intact.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Dr. David Alan Gilbert 
Cc: Dr. David Alan Gilbert 
---
v19:
- fix title and comments
v17:
- Replace the old buffer API with the new channel buffer API.
v16:
- Rename colo_get_cmd_value() to colo_receive_mesage_value();
v13:
- Fix the define of colo_get_cmd_value() to use 'Error **errp' instead of
  return value.
v12:
- Use the new helper colo_get_cmd_value() instead of colo_ctl_get()
---
 migration/colo.c | 67 ++--
 1 file changed, 65 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index d8ac34d..9a98caa 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -115,6 +115,28 @@ static void colo_receive_check_message(QEMUFile *f, 
COLOMessage expect_msg,
 }
 }
 
+static uint64_t colo_receive_message_value(QEMUFile *f, uint32_t expect_msg,
+   Error **errp)
+{
+Error *local_err = NULL;
+uint64_t value;
+int ret;
+
+colo_receive_check_message(f, expect_msg, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+return 0;
+}
+
+value = qemu_get_be64(f);
+ret = qemu_file_get_error(f);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to get value for COLO message: 
%s",
+ COLOMessage_lookup[expect_msg]);
+}
+return value;
+}
+
 static int colo_do_checkpoint_transaction(MigrationState *s,
   QIOChannelBuffer *bioc,
   QEMUFile *fb)
@@ -286,6 +308,10 @@ static void colo_wait_handle_message(QEMUFile *f, int 
*checkpoint_request,
 void *colo_process_incoming_thread(void *opaque)
 {
 MigrationIncomingState *mis = opaque;
+QEMUFile *fb = NULL;
+QIOChannelBuffer *bioc = NULL; /* Cache incoming device state */
+uint64_t total_size;
+uint64_t value;
 Error *local_err = NULL;
 
 migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
@@ -303,6 +329,10 @@ void *colo_process_incoming_thread(void *opaque)
  */
 qemu_file_set_blocking(mis->from_src_file, true);
 
+bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
+fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
+object_unref(OBJECT(bioc));
+
 colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
   _err);
 if (local_err) {
@@ -330,7 +360,29 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
-/* TODO: read migration data into colo buffer */
+value = colo_receive_message_value(mis->from_src_file,
+ COLO_MESSAGE_VMSTATE_SIZE, _err);
+if (local_err) {
+goto out;
+}
+
+/*
+ * Read VM device state data into channel buffer,
+ * It's better to re-use the memory allocated.
+ * Here we need to handle the channel buffer directly.
+ */
+if (value > bioc->capacity) {
+bioc->capacity = value;
+bioc->data = g_realloc(bioc->data, bioc->capacity);
+}
+total_size = qemu_get_buffer(mis->from_src_file, bioc->data, value);
+if (total_size != value) {
+error_report("Got %lu VMState data, less than expected %lu",
+ total_size, value);
+goto out;
+}
+bioc->usage = total_size;
+qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL);
 
 colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_RECEIVED,
  _err);
@@ -338,7 +390,14 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
-/* TODO: load vm state */
+qemu_mutex_lock_iothread();
+qemu_system_reset(VMRESET_SILENT);
+if (qemu_loadvm_state(fb) < 0) {
+error_report("COLO: loadvm failed");
+qemu_mutex_unlock_iothread();
+goto out;
+}
+qemu_mutex_unlock_iothread();
 
 colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
  _err);
@@ -353,6 +412,10 @@ out:
 error_report_err(local_err);
 }
 
+if (fb) {
+qemu_fclose(fb);
+}
+
 if (mis->to_src_file) {
 qemu_fclose(mis->to_src_file);
 }
-- 
1.8.3.1

[Qemu-devel] [PATCH COLO-Frame v19 16/22] COLO: Shutdown related socket fd while do failover

2016-08-31 Thread zhanghailiang

If the net connection between primary host and secondary host
is broken while COLO/COLO incoming thread is blocked in read()/write()
socket fd.
It will be a long time to detect this error until connection is timeout.

Here we shutdown all the related socket file descriptors to wake up the
blocking operation in failover BH. Besides, we should close the corresponding
file descriptors after failvoer BH shutdown them, or there will be an error.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
Cc: Dr. David Alan Gilbert 
---
v19:
- fix the title
v17:
- Rename colo_sem to colo_exit_sem.
v13:
- Add Reviewed-by tag
- Use semaphore to notify colo/colo incoming loop that
  failover work is finished.
v12:
- Shutdown both QEMUFile's fd though they may use the
  same fd. (Dave's suggestion)
v11:
- Only shutdown fd for once
---
 include/migration/migration.h |  3 +++
 migration/colo.c  | 43 +++
 2 files changed, 46 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index f4b215a..9406218 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -113,6 +113,7 @@ struct MigrationIncomingState {
 QemuThread colo_incoming_thread;
 /* The coroutine we should enter (back) after failover */
 Coroutine *migration_incoming_co;
+QemuSemaphore colo_incoming_sem;
 
 /* See savevm.c */
 LoadStateEntry_Head loadvm_handlers;
@@ -183,6 +184,8 @@ struct MigrationState
 QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) 
src_page_requests;
 /* The RAMBlock used in the last src_page_request */
 RAMBlock *last_req_rb;
+/* The semaphore is used to notify COLO thread that failover is finished */
+QemuSemaphore colo_exit_sem;
 
 /* The last error that occurred */
 Error *error;
diff --git a/migration/colo.c b/migration/colo.c
index f1fb2ef..fc89438 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -59,6 +59,18 @@ static void secondary_vm_do_failover(void)
 /* recover runstate to normal migration finish state */
 autostart = true;
 }
+/*
+ * Make sure COLO incoming thread not block in recv or send,
+ * If mis->from_src_file and mis->to_src_file use the same fd,
+ * The second shutdown() will return -1, we ignore this value,
+ * It is harmless.
+ */
+if (mis->from_src_file) {
+qemu_file_shutdown(mis->from_src_file);
+}
+if (mis->to_src_file) {
+qemu_file_shutdown(mis->to_src_file);
+}
 
 old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
FAILOVER_STATUS_COMPLETED);
@@ -67,6 +79,8 @@ static void secondary_vm_do_failover(void)
  "secondary VM", old_state);
 return;
 }
+/* Notify COLO incoming thread that failover work is finished */
+qemu_sem_post(>colo_incoming_sem);
 /* For Secondary VM, jump to incoming co */
 if (mis->migration_incoming_co) {
 qemu_coroutine_enter(mis->migration_incoming_co);
@@ -81,6 +95,18 @@ static void primary_vm_do_failover(void)
 migrate_set_state(>state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
 
+/*
+ * Wake up COLO thread which may blocked in recv() or send(),
+ * The s->rp_state.from_dst_file and s->to_dst_file may use the
+ * same fd, but we still shutdown the fd for twice, it is harmless.
+ */
+if (s->to_dst_file) {
+qemu_file_shutdown(s->to_dst_file);
+}
+if (s->rp_state.from_dst_file) {
+qemu_file_shutdown(s->rp_state.from_dst_file);
+}
+
 old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
FAILOVER_STATUS_COMPLETED);
 if (old_state != FAILOVER_STATUS_HANDLING) {
@@ -88,6 +114,8 @@ static void primary_vm_do_failover(void)
  old_state);
 return;
 }
+/* Notify COLO thread that failover work is finished */
+qemu_sem_post(>colo_exit_sem);
 }
 
 void colo_do_failover(MigrationState *s)
@@ -362,6 +390,14 @@ out:
 
 qemu_fclose(fb);
 
+/* Hope this not to be too long to wait here */
+qemu_sem_wait(>colo_exit_sem);
+qemu_sem_destroy(>colo_exit_sem);
+/*
+ * Must be called after failover BH is completed,
+ * Or the failover BH may shutdown the wrong fd that
+ * re-used by other threads after we release here.
+ */
 if (s->rp_state.from_dst_file) {
 qemu_fclose(s->rp_state.from_dst_file);
 }
@@ -370,6 +406,7 @@ out:
 void migrate_start_colo_process(MigrationState *s)
 {
 qemu_mutex_unlock_iothread();
+qemu_sem_init(>colo_exit_sem, 0);
 migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_COLO);
 colo_process_checkpoint(s);
@@ -408,6 +445,8 @@ void

[Qemu-devel] [PATCH COLO-Frame v19 01/22] migration: Introduce capability 'x-colo' to migration

2016-08-31 Thread zhanghailiang

We add helper function colo_supported() to indicate whether
colo is supported or not, with which we use to control whether or not
showing 'x-colo' string to users, they can use qmp command
'query-migrate-capabilities' or hmp command 'info migrate_capabilities'
to learn if colo is supported.

Cc: Juan Quintela 
Cc: Amit Shah 
Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Eric Blake 
---
v16:
- fix compile broken due to missing osdep.h
v14:
- Fix the date of Copyright to 2016
v10:
- Rename capability 'colo' to experimental 'x-colo' (Eric's suggestion).
- Rename migrate_enable_colo() to migrate_colo_enabled() (Eric's suggestion).
---
 include/migration/colo.h  | 20 
 include/migration/migration.h |  1 +
 migration/Makefile.objs   |  1 +
 migration/colo.c  | 19 +++
 migration/migration.c | 18 ++
 qapi-schema.json  |  6 +-
 qmp-commands.hx   |  2 +-
 stubs/Makefile.objs   |  1 +
 stubs/migration-colo.c| 19 +++
 9 files changed, 85 insertions(+), 2 deletions(-)
 create mode 100644 include/migration/colo.h
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

diff --git a/include/migration/colo.h b/include/migration/colo.h
new file mode 100644
index 000..59a632a
--- /dev/null
+++ b/include/migration/colo.h
@@ -0,0 +1,20 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_COLO_H
+#define QEMU_COLO_H
+
+#include "qemu-common.h"
+
+bool colo_supported(void);
+
+#endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3c96623..5effc05 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -301,6 +301,7 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t 
*dst, int dlen);
 
 int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
+bool migrate_colo_enabled(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 30ad945..cff96f0 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,5 +1,6 @@
 common-obj-y += migration.o socket.o fd.o exec.o
 common-obj-y += tls.o
+common-obj-$(CONFIG_COLO) += colo.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o
 common-obj-y += qemu-file-channel.o
diff --git a/migration/colo.c b/migration/colo.c
new file mode 100644
index 000..d215057
--- /dev/null
+++ b/migration/colo.c
@@ -0,0 +1,19 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "migration/colo.h"
+
+bool colo_supported(void)
+{
+return false;
+}
diff --git a/migration/migration.c b/migration/migration.c
index 955d5ee..17f0f75 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -36,6 +36,7 @@
 #include "exec/address-spaces.h"
 #include "io/channel-buffer.h"
 #include "io/channel-tls.h"
+#include "migration/colo.h"
 
 #define MAX_THROTTLE  (32 << 20)  /* Migration transfer speed throttling */
 
@@ -537,6 +538,9 @@ MigrationCapabilityStatusList 
*qmp_query_migrate_capabilities(Error **errp)
 
 caps = NULL; /* silence compiler warning */
 for (i = 0; i < MIGRATION_CAPABILITY__MAX; i++) {
+if (i == MIGRATION_CAPABILITY_X_COLO && !colo_supported()) {
+continue;
+}
 if (head == NULL) {
 head = g_malloc0(sizeof(*caps));
 caps = head;
@@ -728,6 +732,14 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 }
 
 for (cap = params; cap; cap = cap->next) {
+if (cap->value->capability == MIGRATION_CAPABILITY_X_COLO) {
+if (!colo_supported()) {
+error_setg(errp, "COLO is not currently supported, please"
+ " configure with --enable-colo option in order to"
+ " support COLO feature");
+continue;
+}
+}

Re: [Qemu-devel] [PATCH for 2.8 11/11] vhost_net: device IOTLB support

2016-08-31 Thread Peter Xu

On Tue, Aug 30, 2016 at 11:06:59AM +0800, Jason Wang wrote:
> This patches implements Device IOTLB support for vhost kernel. This is
> done through:
> 
> 1) switch to use dma helpers when map/unmap vrings from vhost codes
> 2) kernel support for Device IOTLB API:
> 
> - allow vhost-net to query the IOMMU IOTLB entry through eventfd
> - enable the ability for qemu to update a specified mapping of vhost
> - through ioctl.
> - enable the ability to invalidate a specified range of iova for the
>   device IOTLB of vhost through ioctl. In x86/intel_iommu case this is
>   triggered through iommu memory region notifier from device IOTLB
>   invalidation descriptor processing routine.
> 
> With all the above, kernel vhost_net can co-operate with IOMMU.
> 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 
> ---
>  hw/virtio/vhost-backend.c | 104 ++
>  hw/virtio/vhost.c | 149 
> --
>  include/hw/virtio/vhost-backend.h |  14 
>  include/hw/virtio/vhost.h |   4 +
>  include/hw/virtio/virtio-access.h |  44 ++-
>  net/tap.c |   1 +
>  6 files changed, 291 insertions(+), 25 deletions(-)
> 
> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> index 7681f15..a5754f3 100644
> --- a/hw/virtio/vhost-backend.c
> +++ b/hw/virtio/vhost-backend.c
> @@ -172,6 +172,107 @@ static int vhost_kernel_get_vq_index(struct vhost_dev 
> *dev, int idx)
>  return idx - dev->vq_index;
>  }
>  
> +static void vhost_kernel_iotlb_read(void *opaque)
> +{
> +struct vhost_dev *dev = opaque;
> +struct vhost_msg msg;
> +ssize_t len;
> +
> +while((len = read((uintptr_t)dev->opaque, , sizeof msg)) > 0) {
> +struct vhost_iotlb_msg *imsg = 
> +if (len < sizeof msg) {
> +error_report("Wrong vhost message len: %d", (int)len);
> +break;
> +}
> +if (msg.type != VHOST_IOTLB_MSG) {
> +error_report("Unknown vhost iotlb message type");
> +break;
> +}
> +switch (imsg->type) {
> +case VHOST_IOTLB_MISS:
> +vhost_device_iotlb_miss(dev, imsg->iova,
> +imsg->perm != VHOST_ACCESS_RO);
> +break;
> +case VHOST_IOTLB_UPDATE:
> +case VHOST_IOTLB_INVALIDATE:
> +error_report("Unexpected IOTLB message type");
> +break;
> +case VHOST_IOTLB_ACCESS_FAIL:
> +/* FIXME: report device iotlb error */
> +break;
> +default:
> +break;
> +}
> +}
> +}
> +
> +static int vhost_kernel_update_device_iotlb(struct vhost_dev *dev,
> +uint64_t iova, uint64_t uaddr,
> +uint64_t len,
> +IOMMUAccessFlags perm)
> +{
> +struct vhost_msg msg = {
> +.type = VHOST_IOTLB_MSG,
> +.iotlb = {
> +.iova = iova,
> +.uaddr = uaddr,
> +.size = len,
> +.type = VHOST_IOTLB_UPDATE,
> +}
> +};
> +
> +switch (perm) {
> +case IOMMU_RO:
> +msg.iotlb.perm = VHOST_ACCESS_RO;
> +break;
> +case IOMMU_WO:
> +msg.iotlb.perm = VHOST_ACCESS_WO;
> +break;
> +case IOMMU_RW:
> +msg.iotlb.perm = VHOST_ACCESS_RW;
> +break;
> +default:
> +g_assert_not_reached();
> +}
> +
> +if (write((uintptr_t)dev->opaque, , sizeof msg) != sizeof msg) {
> +error_report("Fail to update device iotlb");
> +return -EFAULT;
> +}
> +
> +return 0;
> +}
> +
> +static int vhost_kernel_invalidate_device_iotlb(struct vhost_dev *dev,
> +uint64_t iova, uint64_t len)
> +{
> +struct vhost_msg msg = {
> +.type = VHOST_IOTLB_MSG,
> +.iotlb = {
> +.iova = iova,
> +.size = len,
> +.type = VHOST_IOTLB_INVALIDATE,
> +}
> +};
> +
> +if (write((uintptr_t)dev->opaque, , sizeof msg) != sizeof msg) {
> +error_report("Fail to invalidate device iotlb");
> +return -EFAULT;
> +}
> +
> +return 0;
> +}
> +
> +static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
> +   int enabled)
> +{
> +if (enabled)
> +qemu_set_fd_handler((uintptr_t)dev->opaque,
> +vhost_kernel_iotlb_read, NULL, dev);
> +else
> +qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL);
> +}
> +
>  static const VhostOps kernel_ops = {
>  .backend_type = VHOST_BACKEND_TYPE_KERNEL,
>  .vhost_backend_init = vhost_kernel_init,
> @@ -197,6 +298,9 @@ static const VhostOps kernel_ops = {
>  .vhost_set_owner = vhost_kernel_set_owner,
>  .vhost_reset_device =

[Qemu-devel] [PATCH COLO-Frame v19 08/22] COLO: Send PVM state to secondary side when do checkpoint

2016-08-31 Thread zhanghailiang

VM checkpointing is to synchronize the state of PVM to SVM, just
like migration does, we re-use save helpers to achieve migrating
PVM's state to Secondary side.

COLO need to cache the data of VM's state in the secondary side before
synchronize it to SVM. COLO need the size of the data to determine
how much data should be read in the secondary side.
So here, we can get the size of the data by saving it into I/O channel
before send it to the secondary side.

Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
Cc: Dr. David Alan Gilbert 
---
v19:
- fix title and comment.
v17:
- Rebase to master, use the new channel-buffer API
v16:
- Rename colo_put_cmd_value() to colo_send_message_value()
v13:
- Refactor colo_put_cmd_value() to use 'Error **errp' to indicate success
  or failure.
v12:
- Replace the old colo_ctl_get() with the new helper function 
colo_put_cmd_value()
v11:
- Add Reviewed-by tag
---
 migration/colo.c | 83 ++--
 migration/ram.c  | 37 ++---
 2 files changed, 102 insertions(+), 18 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index bf32d63..d8ac34d 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -13,10 +13,13 @@
 #include "qemu/osdep.h"
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
+#include "io/channel-buffer.h"
 #include "trace.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 
+#define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
+
 bool colo_supported(void)
 {
 return false;
@@ -55,6 +58,27 @@ static void colo_send_message(QEMUFile *f, COLOMessage msg,
 trace_colo_send_message(COLOMessage_lookup[msg]);
 }
 
+static void colo_send_message_value(QEMUFile *f, COLOMessage msg,
+uint64_t value, Error **errp)
+{
+Error *local_err = NULL;
+int ret;
+
+colo_send_message(f, msg, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+qemu_put_be64(f, value);
+qemu_fflush(f);
+
+ret = qemu_file_get_error(f);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to send value for message:%s",
+ COLOMessage_lookup[msg]);
+}
+}
+
 static COLOMessage colo_receive_message(QEMUFile *f, Error **errp)
 {
 COLOMessage msg;
@@ -91,9 +115,12 @@ static void colo_receive_check_message(QEMUFile *f, 
COLOMessage expect_msg,
 }
 }
 
-static int colo_do_checkpoint_transaction(MigrationState *s)
+static int colo_do_checkpoint_transaction(MigrationState *s,
+  QIOChannelBuffer *bioc,
+  QEMUFile *fb)
 {
 Error *local_err = NULL;
+int ret = -1;
 
 colo_send_message(s->to_dst_file, COLO_MESSAGE_CHECKPOINT_REQUEST,
   _err);
@@ -106,15 +133,46 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s)
 if (local_err) {
 goto out;
 }
+/* Reset channel-buffer directly */
+qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL);
+bioc->usage = 0;
+
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+qemu_mutex_unlock_iothread();
+trace_colo_vm_state_change("run", "stop");
 
-/* TODO: suspend and save vm state to colo buffer */
+/* Disable block migration */
+s->params.blk = 0;
+s->params.shared = 0;
+qemu_savevm_state_header(fb);
+qemu_savevm_state_begin(fb, >params);
+qemu_mutex_lock_iothread();
+qemu_savevm_state_complete_precopy(fb, false);
+qemu_mutex_unlock_iothread();
+
+qemu_fflush(fb);
 
 colo_send_message(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, _err);
 if (local_err) {
 goto out;
 }
+/*
+ * We need the size of the VMstate data in Secondary side,
+ * With which we can decide how much data should be read.
+ */
+colo_send_message_value(s->to_dst_file, COLO_MESSAGE_VMSTATE_SIZE,
+bioc->usage, _err);
+if (local_err) {
+goto out;
+}
 
-/* TODO: send vmstate to Secondary */
+qemu_put_buffer(s->to_dst_file, bioc->data, bioc->usage);
+qemu_fflush(s->to_dst_file);
+ret = qemu_file_get_error(s->to_dst_file);
+if (ret < 0) {
+goto out;
+}
 
 colo_receive_check_message(s->rp_state.from_dst_file,
COLO_MESSAGE_VMSTATE_RECEIVED, _err);
@@ -128,18 +186,24 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s)
 goto out;
 }
 
-/* TODO: resume Primary */
+ret = 0;
+
+qemu_mutex_lock_iothread();
+vm_start();
+qemu_mutex_unlock_iothread();
+trace_colo_vm_state_change("stop", "run");
 
-return 0;
 out:
 if (local_err) {
 error_report_err(local_err);
 }
-return

[Qemu-devel] [PATCH COLO-Frame v19 12/22] COLO: Add 'x-colo-lost-heartbeat' command to trigger failover

2016-08-31 Thread zhanghailiang

We leave users to choose whatever heartbeat solution they want,
if the heartbeat is lost, or other errors they detect, they can use
experimental command 'x_colo_lost_heartbeat' to tell COLO to do failover,
COLO will do operations accordingly.

For example, if the command is sent to the PVM, the Primary side will
exit COLO mode and take over operation. If sent to the Secondary, the
secondary will run failover work, then take over server operation to
become the new Primary.

Cc: Luiz Capitulino 
Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v19:
- Fix title and comment
v16:
- Fix compile broken due to missing osdep.h
v13:
- Add Reviewed-by tag
v11:
- Add more comments for x-colo-lost-heartbeat command (Eric's suggestion)
- Return 'enum' instead of 'int' for get_colo_mode() (Eric's suggestion)
v10:
- Rename command colo_lost_hearbeat to experimental 'x_colo_lost_heartbeat'
---
 hmp-commands.hx  | 15 +++
 hmp.c|  8 
 hmp.h|  1 +
 include/migration/colo.h |  3 +++
 include/migration/failover.h | 20 
 migration/Makefile.objs  |  2 +-
 migration/colo-comm.c| 11 +++
 migration/colo-failover.c| 42 ++
 migration/colo.c |  1 +
 qapi-schema.json | 29 +
 qmp-commands.hx  | 19 +++
 stubs/migration-colo.c   |  8 
 12 files changed, 158 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-failover.c

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 848efee..c2f1ab0 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1040,6 +1040,21 @@ migration (or once already in postcopy).
 ETEXI
 
 {
+.name   = "x_colo_lost_heartbeat",
+.args_type  = "",
+.params = "",
+.help   = "Tell COLO that heartbeat is lost,\n\t\t\t"
+  "a failover or takeover is needed.",
+.mhandler.cmd = hmp_x_colo_lost_heartbeat,
+},
+
+STEXI
+@item x_colo_lost_heartbeat
+@findex x_colo_lost_heartbeat
+Tell COLO that heartbeat is lost, a failover or takeover is needed.
+ETEXI
+
+{
 .name   = "client_migrate_info",
 .args_type  = 
"protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
 .params = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index 38b4a51..16c5fa1 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1354,6 +1354,14 @@ void hmp_migrate_start_postcopy(Monitor *mon, const 
QDict *qdict)
 hmp_handle_error(mon, );
 }
 
+void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
+{
+Error *err = NULL;
+
+qmp_x_colo_lost_heartbeat();
+hmp_handle_error(mon, );
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
 const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index 0876ec0..0457c8a 100644
--- a/hmp.h
+++ b/hmp.h
@@ -70,6 +70,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
 void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
 void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
+void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/include/migration/colo.h b/include/migration/colo.h
index b40676c..e9ac2c3 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -17,6 +17,7 @@
 #include "migration/migration.h"
 #include "qemu/coroutine_int.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 
 bool colo_supported(void);
 void colo_info_init(void);
@@ -29,4 +30,6 @@ bool migration_incoming_enable_colo(void);
 void migration_incoming_exit_colo(void);
 void *colo_process_incoming_thread(void *opaque);
 bool migration_incoming_in_colo_state(void);
+
+COLOMode get_colo_mode(void);
 #endif
diff --git a/include/migration/failover.h b/include/migration/failover.h
new file mode 100644
index 000..3274735
--- /dev/null
+++ b/include/migration/failover.h
@@ -0,0 +1,20 @@
+/*
+ *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ *  (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+

[Qemu-devel] [PATCH COLO-Frame v19 13/22] COLO: Introduce state to record failover process

2016-08-31 Thread zhanghailiang

When handling failover, COLO processes differently according to
the different stage of failover process, here we introduce a global
atomic variable to record the status of failover.

We add four failover status to indicate the different stage of failover process.
You should use the helpers to get and set the value.

Signed-off-by: zhanghailiang 
Reviewed-by: Dr. David Alan Gilbert 
---
v19:
- fix comments
v11:
- fix several typos found by Dave
- Add Reviewed-by tag
---
 include/migration/failover.h | 10 ++
 migration/colo-failover.c| 37 +
 migration/colo.c |  4 
 migration/trace-events   |  1 +
 4 files changed, 52 insertions(+)

diff --git a/include/migration/failover.h b/include/migration/failover.h
index 3274735..fe71bb4 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -15,6 +15,16 @@
 
 #include "qemu-common.h"
 
+typedef enum COLOFailoverStatus {
+FAILOVER_STATUS_NONE = 0,
+FAILOVER_STATUS_REQUEST = 1, /* Request but not handled */
+FAILOVER_STATUS_HANDLING = 2, /* In the process of handling failover */
+FAILOVER_STATUS_COMPLETED = 3, /* Finish the failover process */
+} COLOFailoverStatus;
+
+void failover_init_state(void);
+int failover_set_state(int old_state, int new_state);
+int failover_get_state(void);
 void failover_request_active(Error **errp);
 
 #endif
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index e31fc10..82196b2 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -15,22 +15,59 @@
 #include "migration/failover.h"
 #include "qmp-commands.h"
 #include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
+#include "trace.h"
 
 static QEMUBH *failover_bh;
+static COLOFailoverStatus failover_state;
 
 static void colo_failover_bh(void *opaque)
 {
+int old_state;
+
 qemu_bh_delete(failover_bh);
 failover_bh = NULL;
+old_state = failover_set_state(FAILOVER_STATUS_REQUEST,
+   FAILOVER_STATUS_HANDLING);
+if (old_state != FAILOVER_STATUS_REQUEST) {
+error_report("Unknown error for failover, old_state = %d", old_state);
+return;
+}
 /* TODO: Do failover work */
 }
 
 void failover_request_active(Error **errp)
 {
+   if (failover_set_state(FAILOVER_STATUS_NONE, FAILOVER_STATUS_REQUEST)
+ != FAILOVER_STATUS_NONE) {
+error_setg(errp, "COLO failover is already actived");
+return;
+}
 failover_bh = qemu_bh_new(colo_failover_bh, NULL);
 qemu_bh_schedule(failover_bh);
 }
 
+void failover_init_state(void)
+{
+failover_state = FAILOVER_STATUS_NONE;
+}
+
+int failover_set_state(int old_state, int new_state)
+{
+int old;
+
+old = atomic_cmpxchg(_state, old_state, new_state);
+if (old == old_state) {
+trace_colo_failover_set_state(new_state);
+}
+return old;
+}
+
+int failover_get_state(void)
+{
+return atomic_read(_state);
+}
+
 void qmp_x_colo_lost_heartbeat(Error **errp)
 {
 if (get_colo_mode() == COLO_MODE_UNKNOWN) {
diff --git a/migration/colo.c b/migration/colo.c
index 31b3029..b94972c 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -232,6 +232,8 @@ static void colo_process_checkpoint(MigrationState *s)
 Error *local_err = NULL;
 int ret;
 
+failover_init_state();
+
 s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
 if (!s->rp_state.from_dst_file) {
 error_report("Open QEMUFile from_dst_file failed");
@@ -330,6 +332,8 @@ void *colo_process_incoming_thread(void *opaque)
 migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_COLO);
 
+failover_init_state();
+
 mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
 if (!mis->to_src_file) {
 error_report("COLO incoming thread: Open QEMUFile to_src_file failed");
diff --git a/migration/trace-events b/migration/trace-events
index f374c8c..d7b0438 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -212,3 +212,4 @@ migration_tls_incoming_handshake_complete(void) ""
 colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
 colo_send_message(const char *msg) "Send '%s' message"
 colo_receive_message(const char *msg) "Receive '%s' message"
+colo_failover_set_state(int new_state) "new state %d"
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH COLO-Frame v18 02/34] migration: Introduce capability 'x-colo' to migration

2016-08-31 Thread Hailiang Zhang


On 2016/8/26 5:47, Amit Shah wrote:

On (Wed) 03 Aug 2016 [20:25:40], zhanghailiang wrote:

We add helper function colo_supported() to indicate whether
colo is supported or not, with which we use to control whether or not
showing 'x-colo' string to users, they can use qmp command
'query-migrate-capabilities' or hmp command 'info migrate_capabilities'
to learn if colo is supported.

Cc: Juan Quintela 
Cc: Amit Shah 
Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Eric Blake 




+#include "qemu/osdep.h"
+#include "migration/colo.h"
+
+bool colo_supported(void)
+{
+return true;
+}


Can you start with this disabled, and returning true when all the
pieces are done (ie close to the patches when the functionality is
actually useful later in the series?).



Yes, i have done as your advise, please see the new version, thanks.


.

[Qemu-devel] [PATCH COLO-Frame v19 11/22] COLO: Synchronize PVM's state to SVM periodically

2016-08-31 Thread zhanghailiang

Do checkpoint periodically, the default interval is 200ms.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v12:
- Add Reviewed-by tag
v11:
- Fix wrong sleep time for checkpoint period. (Dave's comment)
---
 migration/colo.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 9a98caa..4a70e1d 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -11,6 +11,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/timer.h"
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "io/channel-buffer.h"
@@ -226,6 +227,7 @@ static void colo_process_checkpoint(MigrationState *s)
 {
 QIOChannelBuffer *bioc;
 QEMUFile *fb = NULL;
+int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
 Error *local_err = NULL;
 int ret;
 
@@ -254,10 +256,20 @@ static void colo_process_checkpoint(MigrationState *s)
 trace_colo_vm_state_change("stop", "run");
 
 while (s->state == MIGRATION_STATUS_COLO) {
+current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+if (current_time - checkpoint_time <
+s->parameters.x_checkpoint_delay) {
+int64_t delay_ms;
+
+delay_ms = s->parameters.x_checkpoint_delay -
+   (current_time - checkpoint_time);
+g_usleep(delay_ms * 1000);
+}
 ret = colo_do_checkpoint_transaction(s, bioc, fb);
 if (ret < 0) {
 goto out;
 }
+checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
 }
 
 out:
-- 
1.8.3.1

[Qemu-devel] [PATCH COLO-Frame v19 20/22] COLO: Add block replication into colo process

2016-08-31 Thread zhanghailiang

Make sure master start block replication after slave's block
replication started.

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Li Zhijian 
Cc: Stefan Hajnoczi 
Cc: Kevin Wolf 
Cc: Max Reitz 
---
 migration/colo.c  | 52 +++
 migration/migration.c |  6 +-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/migration/colo.c b/migration/colo.c
index b6f3cb0..ee20703 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -19,6 +19,9 @@
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "migration/failover.h"
+#include "qapi-event.h"
+#include "block/block.h"
+#include "replication.h"
 
 static bool vmstate_loading;
 
@@ -52,6 +55,7 @@ static void secondary_vm_do_failover(void)
 {
 int old_state;
 MigrationIncomingState *mis = migration_incoming_get_current();
+Error *local_err = NULL;
 
 /* Can not do failover during the process of VM's loading VMstate, Or
  * it will break the secondary VM.
@@ -69,6 +73,11 @@ static void secondary_vm_do_failover(void)
 migrate_set_state(>state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
 
+replication_stop_all(true, _err);
+if (local_err) {
+error_report_err(local_err);
+}
+
 if (!autostart) {
 error_report("\"-S\" qemu option will be ignored in secondary side");
 /* recover runstate to normal migration finish state */
@@ -106,6 +115,7 @@ static void primary_vm_do_failover(void)
 {
 MigrationState *s = migrate_get_current();
 int old_state;
+Error *local_err = NULL;
 
 migrate_set_state(>state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
@@ -129,6 +139,12 @@ static void primary_vm_do_failover(void)
  old_state);
 return;
 }
+
+replication_stop_all(true, _err);
+if (local_err) {
+error_report_err(local_err);
+}
+
 /* Notify COLO thread that failover work is finished */
 qemu_sem_post(>colo_exit_sem);
 }
@@ -288,6 +304,15 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 s->params.shared = 0;
 qemu_savevm_state_header(fb);
 qemu_savevm_state_begin(fb, >params);
+
+/* We call this API although this may do nothing on primary side. */
+qemu_mutex_lock_iothread();
+replication_do_checkpoint_all(_err);
+qemu_mutex_unlock_iothread();
+if (local_err) {
+goto out;
+}
+
 qemu_mutex_lock_iothread();
 qemu_savevm_state_complete_precopy(fb, false);
 qemu_mutex_unlock_iothread();
@@ -386,6 +411,12 @@ static void colo_process_checkpoint(MigrationState *s)
 object_unref(OBJECT(bioc));
 
 qemu_mutex_lock_iothread();
+replication_start_all(REPLICATION_MODE_PRIMARY, _err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+
 vm_start();
 qemu_mutex_unlock_iothread();
 trace_colo_vm_state_change("stop", "run");
@@ -468,6 +499,7 @@ static void colo_wait_handle_message(QEMUFile *f, int 
*checkpoint_request,
 case COLO_MESSAGE_GUEST_SHUTDOWN:
 qemu_mutex_lock_iothread();
 vm_stop_force_state(RUN_STATE_COLO);
+replication_stop_all(false, NULL);
 qemu_system_shutdown_request_core();
 qemu_mutex_unlock_iothread();
 /*
@@ -514,6 +546,14 @@ void *colo_process_incoming_thread(void *opaque)
 fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
 object_unref(OBJECT(bioc));
 
+qemu_mutex_lock_iothread();
+bdrv_invalidate_cache_all(_err);
+replication_start_all(REPLICATION_MODE_SECONDARY, _err);
+qemu_mutex_unlock_iothread();
+if (local_err) {
+goto out;
+}
+
 colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
   _err);
 if (local_err) {
@@ -585,6 +625,18 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+replication_get_error_all(_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+/* discard colo disk buffer */
+replication_do_checkpoint_all(_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+
 vmstate_loading = false;
 qemu_mutex_unlock_iothread();
 
diff --git a/migration/migration.c b/migration/migration.c
index db618db..9a904b8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1679,7 +1679,11 @@ static void migration_completion(MigrationState *s, int 
current_active_state,
 
 if (!ret) {
 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
-if (ret >= 0) {
+/*
+ * Don't mark the image with BDRV_O_INACTIVE flag if
+ * we

[Qemu-devel] [PATCH COLO-Frame v19 06/22] COLO: Introduce checkpointing protocol

2016-08-31 Thread zhanghailiang

We need communications protocol of user-defined to control
the checkpointing process.

The new checkpointing request is started by Primary VM,
and the interactive process like below:

Checkpoint synchronizing points:

   Primary   Secondary
initial work
'checkpoint-ready'< @

'checkpoint-request'  @ >
Suspend (Only in hybrid mode)
'checkpoint-reply'< @
  Suspend state
'vmstate-send'@ >
  Send stateReceive state
'vmstate-received'< @
  Release packets   Load state
'vmstate-load'< @
  ResumeResume (Only in hybrid mode)

  Start Comparing (Only in hybrid mode)
NOTE:
 1) '@' who sends the message
 2) Every sync-point is synchronized by two sides with only
one handshake(single direction) for low-latency.
If more strict synchronization is required, a opposite direction
sync-point should be added.
 3) Since sync-points are single direction, the remote side may
go forward a lot when this side just receives the sync-point.
 4) For now, we only support 'periodic' checkpoint, for which
   the Secondary VM is not running, later we will support 'hybrid' mode.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Cc: Eric Blake 
Cc: Markus Armbruster 
Cc: Dr. David Alan Gilbert 
Reviewed-by: Dr. David Alan Gilbert 
---
v19:
- fix title and comments
v16:
- Rename 'colo_put/get[_check]_cmd()' to 'colo_send/receive[_check]_message()'
v14:
- Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
- Add Reviewd-by tag
v13:
- Refactor colo command related helper functions, use 'Error **errp' parameter
  instead of return value to indicate success or failure.
- Fix some other comments from Markus.

v12:
- Rename colo_ctl_put() to colo_put_cmd()
- Rename colo_ctl_get() to colo_get_check_cmd() and drop
  the third parameter
- Rename colo_ctl_get_cmd() to colo_get_cmd()
- Remove useless 'invalid' member for COLOcommand enum.
v11:
- Add missing 'checkpoint-ready' communication in comment.
- Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
- Fix trace for colo_ctl_get() to trace command and value both
v10:
- Rename enum COLOCmd to COLOCommand (Eric's suggestion).
- Remove unused 'ram-steal'
---
 migration/colo.c   | 200 -
 migration/trace-events |   2 +
 qapi-schema.json   |  25 +++
 3 files changed, 225 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 7c5769b..bf32d63 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -15,6 +15,7 @@
 #include "migration/colo.h"
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "qapi/error.h"
 
 bool colo_supported(void)
 {
@@ -35,22 +36,146 @@ bool migration_incoming_in_colo_state(void)
 return mis && (mis->state == MIGRATION_STATUS_COLO);
 }
 
+static void colo_send_message(QEMUFile *f, COLOMessage msg,
+  Error **errp)
+{
+int ret;
+
+if (msg >= COLO_MESSAGE__MAX) {
+error_setg(errp, "%s: Invalid message", __func__);
+return;
+}
+qemu_put_be32(f, msg);
+qemu_fflush(f);
+
+ret = qemu_file_get_error(f);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Can't send COLO message");
+}
+trace_colo_send_message(COLOMessage_lookup[msg]);
+}
+
+static COLOMessage colo_receive_message(QEMUFile *f, Error **errp)
+{
+COLOMessage msg;
+int ret;
+
+msg = qemu_get_be32(f);
+ret = qemu_file_get_error(f);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Can't receive COLO message");
+return msg;
+}
+if (msg >= COLO_MESSAGE__MAX) {
+error_setg(errp, "%s: Invalid message", __func__);
+return msg;
+}
+trace_colo_receive_message(COLOMessage_lookup[msg]);
+return msg;
+}
+
+static void colo_receive_check_message(QEMUFile *f, COLOMessage expect_msg,
+   Error **errp)
+{
+COLOMessage msg;
+Error *local_err = NULL;
+
+msg = colo_receive_message(f, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+if (msg != expect_msg) {
+error_setg(errp, "Unexpected COLO message %d, expected %d",
+  msg, expect_msg);
+}
+}
+
+static int colo_do_checkpoint_transaction(MigrationState *s)
+{
+Error *local_err = NULL;
+
+colo_send_message(s->to_dst_file, COLO_MESSAGE_CHECKPOINT_REQUEST,
+

[Qemu-devel] [PATCH COLO-Frame v19 07/22] COLO: Add a new RunState RUN_STATE_COLO

2016-08-31 Thread zhanghailiang

Guest will enter this state when paused to save/restore VM state
under COLO checkpoint.

Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Eric Blake 
---
 qapi-schema.json | 5 -
 vl.c | 8 
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index f2657a4..0ff1a63 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -154,12 +154,15 @@
 # @watchdog: the watchdog action is configured to pause and has been triggered
 #
 # @guest-panicked: guest has been panicked as a result of guest OS panic
+#
+# @colo: guest is paused to save/restore VM state under colo checkpoint (since
+# 2.8)
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
 'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
 'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
-'guest-panicked' ] }
+'guest-panicked', 'colo' ] }
 
 ##
 # @StatusInfo:
diff --git a/vl.c b/vl.c
index 2408982..72927b8 100644
--- a/vl.c
+++ b/vl.c
@@ -574,6 +574,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_INMIGRATE, RUN_STATE_FINISH_MIGRATE },
 { RUN_STATE_INMIGRATE, RUN_STATE_PRELAUNCH },
 { RUN_STATE_INMIGRATE, RUN_STATE_POSTMIGRATE },
+{ RUN_STATE_INMIGRATE, RUN_STATE_COLO },
 
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
@@ -586,6 +587,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
 { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
 { RUN_STATE_PAUSED, RUN_STATE_PRELAUNCH },
+{ RUN_STATE_PAUSED, RUN_STATE_COLO},
 
 { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
@@ -598,10 +600,13 @@ static const RunStateTransition 
runstate_transitions_def[] = {
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PRELAUNCH },
+{ RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
 
 { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
 { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
 
+{ RUN_STATE_COLO, RUN_STATE_RUNNING },
+
 { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
 { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
 { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
@@ -612,6 +617,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
 { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
 { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
+{ RUN_STATE_RUNNING, RUN_STATE_COLO},
 
 { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
 
@@ -624,10 +630,12 @@ static const RunStateTransition 
runstate_transitions_def[] = {
 { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
 { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
 { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
+{ RUN_STATE_SUSPENDED, RUN_STATE_COLO},
 
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
 { RUN_STATE_WATCHDOG, RUN_STATE_PRELAUNCH },
+{ RUN_STATE_WATCHDOG, RUN_STATE_COLO},
 
 { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
 { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
-- 
1.8.3.1

[Qemu-devel] [PATCH COLO-Frame v19 21/22] docs: Add documentation for COLO feature

2016-08-31 Thread zhanghailiang

Introduce the design of COLO, and how to test it.

Signed-off-by: zhanghailiang 
---
 docs/COLO-FT.txt | 190 +++
 1 file changed, 190 insertions(+)
 create mode 100644 docs/COLO-FT.txt

diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
new file mode 100644
index 000..f1ba580
--- /dev/null
+++ b/docs/COLO-FT.txt
@@ -0,0 +1,190 @@
+COarse-grained LOck-stepping Virtual Machines for Non-stop Service
+
+Copyright (c) 2016 Intel Corporation
+Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+Copyright (c) 2016 Fujitsu, Corp.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+This document gives an overview of COLO's design and how to use it.
+
+== Background ==
+Virtual machine (VM) replication is a well known technique for providing
+application-agnostic software-implemented hardware fault tolerance
+"non-stop service".
+
+COLO (COarse-grained LOck-stepping) is a high availability solution.
+Both primary VM (PVM) and secondary VM (SVM) run in parallel. They receive the
+same request from client, and generate response in parallel too.
+If the response packets from PVM and SVM are identical, they are released
+immediately. Otherwise, a VM checkpoint (on demand) is conducted.
+
+== Architecture ==
+
+The architecture of COLO is shown in the bellow diagram.
+It consists of a pair of networked physical nodes:
+The primary node running the PVM, and the secondary node running the SVM
+to maintain a valid replica of the PVM.
+PVM and SVM execute in parallel and generate output of response packets for
+client requests according to the application semantics.
+
+The incoming packets from the client or external network are received by the
+primary node, and then forwarded to the secondary node, so that Both the PVM
+and the SVM are stimulated with the same requests.
+
+COLO receives the outbound packets from both the PVM and SVM and compares them
+before allowing the output to be sent to clients.
+
+The SVM is qualified as a valid replica of the PVM, as long as it generates
+identical responses to all client requests. Once the differences in the outputs
+are detected between the PVM and SVM, COLO withholds transmission of the
+outbound packets until it has successfully synchronized the PVM state to the 
SVM.
+
+   Primary Node
Secondary Node
+ ++  +---+   ++  
++
+ ||  |   HeartBeat   |<->|   HeartBeat|  | 
   |
+ | Primary VM |  +---|---+   +---|+  
|Secondary VM|
+ ||  |   |   | 
   |
+ ||  +---|---+   +---|+  | 
   |
+ ||  |QEMU   +---v+  |   |QEMU  +v---+|  | 
   |
+ ||  |   |Failover|  |   |  |Failover||  | 
   |
+ ||  |   ++  |   |  ++|  | 
   |
+ ||  |   +---+   |   |   +---+|  | 
   |
+ ||  |   | VM Checkpoint |-->| VM Checkpoint ||  | 
   |
+ ||  |   +---+   |   |   +---+|  | 
   |
+ ||  |   |   ||  | 
   |
+ 
|Requests<---^-->Requests|
+ |Responses--\ /--|--\  
/Responses|
+ ||  |   | |  |  |   |   |  | |  | 
   |
+ ||  | +---+ | |  |  |   |   |  |  ++ |  | 
   |
+ ||  | | COLO disk | | |  |  |   |   |  |  | COLO disk  | |  | 
   |
+ ||  | |   Manager |-|-|--|--|--|->| Manager| |  | 
   |
+ ||  | +|--+ | |  |  |   |   |  |  +---|+ |  | 
   |
+ ||  |  || |  |  |   |   |  |  |  |  | 
   |
+ ++  +--||-|--|--+   +---|--|--|--+  
++
+|| |  |  |  |  |
+ +-+| +--v-v--|--+   +---|--v---+  |
+-+
+ |  VM Monitor || |  COLO Proxy  |   |COLO Proxy|  || 
VM Monitor  |
+ | || |(compare packet)  |   | (adjust sequence)|  ||  
   |
+ +-+| +--|^--+   +--+  |
+-+
+|||

[Qemu-devel] [PATCH COLO-Frame v19 15/22] COLO: Implement failover work for secondary VM

2016-08-31 Thread zhanghailiang

If users require SVM to takeover work, COLO incoming thread should
exit from loop while failover BH helps backing to migration incoming
coroutine.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v12:
- Improve error message that suggested by Dave
- Add Reviewed-by tag
---
 migration/colo.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 8d6f585..f1fb2ef 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -46,6 +46,33 @@ static bool colo_runstate_is_stopped(void)
 return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
 }
 
+static void secondary_vm_do_failover(void)
+{
+int old_state;
+MigrationIncomingState *mis = migration_incoming_get_current();
+
+migrate_set_state(>state, MIGRATION_STATUS_COLO,
+  MIGRATION_STATUS_COMPLETED);
+
+if (!autostart) {
+error_report("\"-S\" qemu option will be ignored in secondary side");
+/* recover runstate to normal migration finish state */
+autostart = true;
+}
+
+old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+   FAILOVER_STATUS_COMPLETED);
+if (old_state != FAILOVER_STATUS_HANDLING) {
+error_report("Incorrect state (%d) while doing failover for "
+ "secondary VM", old_state);
+return;
+}
+/* For Secondary VM, jump to incoming co */
+if (mis->migration_incoming_co) {
+qemu_coroutine_enter(mis->migration_incoming_co);
+}
+}
+
 static void primary_vm_do_failover(void)
 {
 MigrationState *s = migrate_get_current();
@@ -72,6 +99,8 @@ void colo_do_failover(MigrationState *s)
 
 if (get_colo_mode() == COLO_MODE_PRIMARY) {
 primary_vm_do_failover();
+} else {
+secondary_vm_do_failover();
 }
 }
 
@@ -414,6 +443,11 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 assert(request);
+if (failover_request_is_active()) {
+error_report("failover request");
+goto out;
+}
+
 /* FIXME: This is unnecessary for periodic checkpoint mode */
 colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
  _err);
-- 
1.8.3.1

[Qemu-devel] [PATCH COLO-Frame v19 17/22] COLO: Don't do failover while loading VM's state

2016-08-31 Thread zhanghailiang

We should not do failover work while the main thread is loading
VM's state. Otherwise it will destroy the consistent of VM's memory and
device state.

Here we add a new failover status 'RELAUNCH' which means we should
relaunch the process of failover.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v19:
- fix title
v14:
- Move the place of 'vmstate_loading = false;'.
v13:
- Add Reviewed-by tag
---
 include/migration/failover.h |  2 ++
 migration/colo.c | 25 +
 2 files changed, 27 insertions(+)

diff --git a/include/migration/failover.h b/include/migration/failover.h
index c4bd81e..99b0d58 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -20,6 +20,8 @@ typedef enum COLOFailoverStatus {
 FAILOVER_STATUS_REQUEST = 1, /* Request but not handled */
 FAILOVER_STATUS_HANDLING = 2, /* In the process of handling failover */
 FAILOVER_STATUS_COMPLETED = 3, /* Finish the failover process */
+/* Optional, Relaunch the failover process, again 'NONE' -> 'COMPLETED' */
+FAILOVER_STATUS_RELAUNCH = 4,
 } COLOFailoverStatus;
 
 void failover_init_state(void);
diff --git a/migration/colo.c b/migration/colo.c
index fc89438..69d1948 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -20,6 +20,8 @@
 #include "qapi/error.h"
 #include "migration/failover.h"
 
+static bool vmstate_loading;
+
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
 bool colo_supported(void)
@@ -51,6 +53,19 @@ static void secondary_vm_do_failover(void)
 int old_state;
 MigrationIncomingState *mis = migration_incoming_get_current();
 
+/* Can not do failover during the process of VM's loading VMstate, Or
+ * it will break the secondary VM.
+ */
+if (vmstate_loading) {
+old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+   FAILOVER_STATUS_RELAUNCH);
+if (old_state != FAILOVER_STATUS_HANDLING) {
+error_report("Unknown error while do failover for secondary VM,"
+ "old_state: %d", old_state);
+}
+return;
+}
+
 migrate_set_state(>state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
 
@@ -532,13 +547,22 @@ void *colo_process_incoming_thread(void *opaque)
 
 qemu_mutex_lock_iothread();
 qemu_system_reset(VMRESET_SILENT);
+vmstate_loading = true;
 if (qemu_loadvm_state(fb) < 0) {
 error_report("COLO: loadvm failed");
 qemu_mutex_unlock_iothread();
 goto out;
 }
+
+vmstate_loading = false;
 qemu_mutex_unlock_iothread();
 
+if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
+failover_set_state(FAILOVER_STATUS_RELAUNCH, FAILOVER_STATUS_NONE);
+failover_request_active(NULL);
+goto out;
+}
+
 colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
  _err);
 if (local_err) {
@@ -547,6 +571,7 @@ void *colo_process_incoming_thread(void *opaque)
 }
 
 out:
+vmstate_loading = false;
 /* Throw the unreported error message after exited from loop */
 if (local_err) {
 error_report_err(local_err);
-- 
1.8.3.1

[Qemu-devel] [PATCH COLO-Frame v19 14/22] COLO: Implement the process of failover for primary VM

2016-08-31 Thread zhanghailiang

For primary side, if COLO gets failover request from users.
To be exact, gets 'x_colo_lost_heartbeat' command.
COLO thread will exit the loop while the failover BH does the
cleanup work and resumes VM.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v13:
- Add Reviewed-by tag
v12:
- Fix error report and remove unnecessary check in
  primary_vm_do_failover() (Dave's suggestion)
v11:
- Don't call migration_end() in primary_vm_do_failover(),
 The cleanup work will be done in migration_thread().
- Remove vm_start() in primary_vm_do_failover() which also been
  done in migraiton_thread()
v10:
- Call migration_end() in primary_vm_do_failover()
---
 include/migration/colo.h |  3 +++
 include/migration/failover.h |  1 +
 migration/colo-failover.c|  7 +-
 migration/colo.c | 54 ++--
 4 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index e9ac2c3..e32eef4 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -32,4 +32,7 @@ void *colo_process_incoming_thread(void *opaque);
 bool migration_incoming_in_colo_state(void);
 
 COLOMode get_colo_mode(void);
+
+/* failover */
+void colo_do_failover(MigrationState *s);
 #endif
diff --git a/include/migration/failover.h b/include/migration/failover.h
index fe71bb4..c4bd81e 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -26,5 +26,6 @@ void failover_init_state(void);
 int failover_set_state(int old_state, int new_state);
 int failover_get_state(void);
 void failover_request_active(Error **errp);
+bool failover_request_is_active(void);
 
 #endif
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index 82196b2..607a294 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -33,7 +33,7 @@ static void colo_failover_bh(void *opaque)
 error_report("Unknown error for failover, old_state = %d", old_state);
 return;
 }
-/* TODO: Do failover work */
+colo_do_failover(NULL);
 }
 
 void failover_request_active(Error **errp)
@@ -68,6 +68,11 @@ int failover_get_state(void)
 return atomic_read(_state);
 }
 
+bool failover_request_is_active(void)
+{
+return failover_get_state() != FAILOVER_STATUS_NONE;
+}
+
 void qmp_x_colo_lost_heartbeat(Error **errp)
 {
 if (get_colo_mode() == COLO_MODE_UNKNOWN) {
diff --git a/migration/colo.c b/migration/colo.c
index b94972c..8d6f585 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -41,6 +41,40 @@ bool migration_incoming_in_colo_state(void)
 return mis && (mis->state == MIGRATION_STATUS_COLO);
 }
 
+static bool colo_runstate_is_stopped(void)
+{
+return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
+}
+
+static void primary_vm_do_failover(void)
+{
+MigrationState *s = migrate_get_current();
+int old_state;
+
+migrate_set_state(>state, MIGRATION_STATUS_COLO,
+  MIGRATION_STATUS_COMPLETED);
+
+old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+   FAILOVER_STATUS_COMPLETED);
+if (old_state != FAILOVER_STATUS_HANDLING) {
+error_report("Incorrect state (%d) while doing failover for Primary 
VM",
+ old_state);
+return;
+}
+}
+
+void colo_do_failover(MigrationState *s)
+{
+/* Make sure VM stopped while failover happened. */
+if (!colo_runstate_is_stopped()) {
+vm_stop_force_state(RUN_STATE_COLO);
+}
+
+if (get_colo_mode() == COLO_MODE_PRIMARY) {
+primary_vm_do_failover();
+}
+}
+
 static void colo_send_message(QEMUFile *f, COLOMessage msg,
   Error **errp)
 {
@@ -162,9 +196,20 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 bioc->usage = 0;
 
 qemu_mutex_lock_iothread();
+if (failover_request_is_active()) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
 vm_stop_force_state(RUN_STATE_COLO);
 qemu_mutex_unlock_iothread();
 trace_colo_vm_state_change("run", "stop");
+/*
+ * Failover request bh could be called after vm_stop_force_state(),
+ * So we need check failover_request_is_active() again.
+ */
+if (failover_request_is_active()) {
+goto out;
+}
 
 /* Disable block migration */
 s->params.blk = 0;
@@ -259,6 +304,11 @@ static void colo_process_checkpoint(MigrationState *s)
 trace_colo_vm_state_change("stop", "run");
 
 while (s->state == MIGRATION_STATUS_COLO) {
+if (failover_request_is_active()) {
+error_report("failover request");
+goto out;
+}
+
 current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
 if (current_time - checkpoint_time <
 s->parameters.x_checkpoint_delay) {
@@ -280,9 +330,9 @@ out:
 if

[Qemu-devel] [PATCH COLO-Frame v19 22/22] configure: Support enable/disable COLO feature

2016-08-31 Thread zhanghailiang

configure --enable-colo/--disable-colo to switch COLO
support on/off.
COLO feature is enabled by default.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Dr. David Alan Gilbert 
---
v19:
- fix colo_supported() to return true
v11:
- Turn COLO on in default (Eric's suggestion)
---
 configure| 11 +++
 migration/colo.c |  2 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index e7aa73c..4aea244 100755
--- a/configure
+++ b/configure
@@ -230,6 +230,7 @@ xfs=""
 vhost_net="no"
 vhost_scsi="no"
 kvm="no"
+colo="yes"
 rdma=""
 gprof="no"
 debug_tcg="no"
@@ -919,6 +920,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-colo) colo="no"
+  ;;
+  --enable-colo) colo="yes"
+  ;;
   --disable-tcg-interpreter) tcg_interpreter="no"
   ;;
   --enable-tcg-interpreter) tcg_interpreter="yes"
@@ -1356,6 +1361,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   fdt fdt device tree
   bluez   bluez stack connectivity
   kvm KVM acceleration support
+  coloCOarse-grain LOck-stepping VM for Non-stop Service
   rdmaRDMA-based migration support
   uuiduuid support
   vde support for vde network
@@ -4866,6 +4872,7 @@ echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs $blobs"
 echo "KVM support   $kvm"
+echo "COLO support  $colo"
 echo "RDMA support  $rdma"
 echo "TCG interpreter   $tcg_interpreter"
 echo "fdt support   $fdt"
@@ -5477,6 +5484,10 @@ if have_backend "ftrace"; then
 fi
 echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
 
+if test "$colo" = "yes"; then
+  echo "CONFIG_COLO=y" >> $config_host_mak
+fi
+
 if test "$rdma" = "yes" ; then
   echo "CONFIG_RDMA=y" >> $config_host_mak
 fi
diff --git a/migration/colo.c b/migration/colo.c
index ee20703..6ecd584 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -29,7 +29,7 @@ static bool vmstate_loading;
 
 bool colo_supported(void)
 {
-return false;
+return true;
 }
 
 bool migration_in_colo_state(void)
-- 
1.8.3.1

[Qemu-devel] [PATCH COLO-Frame v19 19/22] COLO: Update the global runstate after going into colo state

2016-08-31 Thread zhanghailiang

If we start qemu with -S, the runstate will change from 'prelaunch' to 'running'
after going into colo state.
So it is necessary to update the global runstate after going into colo state.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/colo.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 0a4cd80..b6f3cb0 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -390,6 +390,11 @@ static void colo_process_checkpoint(MigrationState *s)
 qemu_mutex_unlock_iothread();
 trace_colo_vm_state_change("stop", "run");
 
+ret = global_state_store();
+if (ret < 0) {
+goto out;
+}
+
 while (s->state == MIGRATION_STATUS_COLO) {
 if (failover_request_is_active()) {
 error_report("failover request");
-- 
1.8.3.1

[Qemu-devel] [PATCH COLO-Frame v19 00/22] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-08-31 Thread zhanghailiang

This is the 19th version of COLO frame series.

According to the suggestion of Juan and Amit,
I dropped parts of the optimization patches to make it easier for review.

Besides, I discarded the network related patches since the development of
COLO proxy goes well, It is very likely to be merged in QEMU 2.8,
Please see [PATCH V12 02/10] colo-compare: introduce colo compare initialization
for more information. The original network related patches in this series are
to buffer the network packets till a checkpointing be successfully achieved, 
just like Remus does in XEN, but these patches will be reverted in COLO compare
series. So here, it is unnecessary to add them.

You can still test this series like before. Or refer to docs/COLO-FT.txt. 

It is based on 'Block replication' series which has been merged in Stefan's 
branch https://github.com/stefanha/qemu/commits/block-next.

The complete codes can be found from the link:
https://github.com/coloft/qemu/commits/colo-v4.1-periodic-mode

Please review. Thanks ;)

TODO:
1. Checkpoint based on proxy in qemu
2. The capability of continuous FT
3. Optimize the VM's downtime during checkpoint

v19:
 - Add documentation about COLO (patch 21)
 - Dropped network related patches
 - Fix parts of patches' title and comments 

zhanghailiang (22):
  migration: Introduce capability 'x-colo' to migration
  COLO: migrate COLO related info to secondary node
  migration: Enter into COLO mode after migration if COLO is enabled
  migration: Switch to COLO process after finishing loadvm
  COLO: Establish a new communicating path for COLO
  COLO: Introduce checkpointing protocol
  COLO: Add a new RunState RUN_STATE_COLO
  COLO: Send PVM state to secondary side when do checkpoint
  COLO: Load VMState into QIOChannelBuffer before restore it
  COLO: Add checkpoint-delay parameter for migrate-set-parameters
  COLO: Synchronize PVM's state to SVM periodically
  COLO: Add 'x-colo-lost-heartbeat' command to trigger failover
  COLO: Introduce state to record failover process
  COLO: Implement the process of failover for primary VM
  COLO: Implement failover work for secondary VM
  COLO: Shutdown related socket fd while do failover
  COLO: Don't do failover while loading VM's state
  COLO: Handle shutdown command for VM in COLO state
  COLO: Update the global runstate after going into colo state
  COLO: Add block replication into colo process
  docs: Add documentation for COLO feature
  configure: Support enable/disable COLO feature

 configure |  11 +
 docs/COLO-FT.txt  | 190 
 hmp-commands.hx   |  15 +
 hmp.c |  15 +
 hmp.h |   1 +
 include/migration/colo.h  |  40 +++
 include/migration/failover.h  |  33 ++
 include/migration/migration.h |  11 +
 include/sysemu/sysemu.h   |   3 +
 migration/Makefile.objs   |   2 +
 migration/colo-comm.c |  72 +
 migration/colo-failover.c |  84 +
 migration/colo.c  | 694 ++
 migration/migration.c |  86 +-
 migration/ram.c   |  37 ++-
 migration/trace-events|   6 +
 qapi-schema.json  |  88 +-
 qmp-commands.hx   |  24 +-
 stubs/Makefile.objs   |   1 +
 stubs/migration-colo.c|  51 
 vl.c  |  30 +-
 21 files changed, 1468 insertions(+), 26 deletions(-)
 create mode 100644 docs/COLO-FT.txt
 create mode 100644 include/migration/colo.h
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-comm.c
 create mode 100644 migration/colo-failover.c
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

-- 
1.8.3.1

[Qemu-devel] [PATCH COLO-Frame v19 10/22] COLO: Add checkpoint-delay parameter for migrate-set-parameters

2016-08-31 Thread zhanghailiang

Add checkpoint-delay parameter for migrate-set-parameters, so that
we can control the checkpoint frequency when COLO is in periodic mode.

Cc: Luiz Capitulino 
Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v12:
- Change checkpoint-delay to x-checkpoint-delay (Dave's suggestion)
- Add Reviewed-by tag
v11:
- Move this patch ahead of the patch where uses 'checkpoint_delay'
 (Dave's suggestion)
v10:
- Fix related qmp command
---
 hmp.c |  7 +++
 migration/migration.c | 18 ++
 qapi-schema.json  | 17 ++---
 qmp-commands.hx   |  3 ++-
 4 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/hmp.c b/hmp.c
index cc2056e..38b4a51 100644
--- a/hmp.c
+++ b/hmp.c
@@ -304,6 +304,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, " %s: '%s'",
 MigrationParameter_lookup[MIGRATION_PARAMETER_TLS_HOSTNAME],
 params->tls_hostname ? : "");
+monitor_printf(mon, " %s: %" PRId64,
+MigrationParameter_lookup[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY],
+params->x_checkpoint_delay);
 monitor_printf(mon, "\n");
 }
 
@@ -1261,6 +1264,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 bool has_tls_creds = false;
 bool has_tls_hostname = false;
 bool use_int_value = false;
+bool has_x_checkpoint_delay = false;
 int i;
 
 for (i = 0; i < MIGRATION_PARAMETER__MAX; i++) {
@@ -1284,6 +1288,8 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 break;
 case MIGRATION_PARAMETER_CPU_THROTTLE_INCREMENT:
 has_cpu_throttle_increment = true;
+case MIGRATION_PARAMETER_X_CHECKPOINT_DELAY:
+has_x_checkpoint_delay = true;
 break;
 case MIGRATION_PARAMETER_TLS_CREDS:
 has_tls_creds = true;
@@ -1308,6 +1314,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
has_cpu_throttle_increment, valueint,
has_tls_creds, valuestr,
has_tls_hostname, valuestr,
+   has_x_checkpoint_delay, valueint,
);
 break;
 }
diff --git a/migration/migration.c b/migration/migration.c
index 34b34b8..db618db 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -59,6 +59,11 @@
 /* Migration XBZRLE default cache size */
 #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
 
+/* The delay time (in ms) between two COLO checkpoints
+ * Note: Please change this default value to 1 when we support hybrid mode.
+ */
+#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY 200
+
 static NotifierList migration_state_notifiers =
 NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
 
@@ -90,6 +95,7 @@ MigrationState *migrate_get_current(void)
 .decompress_threads = DEFAULT_MIGRATE_DECOMPRESS_THREAD_COUNT,
 .cpu_throttle_initial = DEFAULT_MIGRATE_CPU_THROTTLE_INITIAL,
 .cpu_throttle_increment = DEFAULT_MIGRATE_CPU_THROTTLE_INCREMENT,
+.x_checkpoint_delay = DEFAULT_MIGRATE_X_CHECKPOINT_DELAY,
 },
 };
 
@@ -582,6 +588,7 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 params->cpu_throttle_increment = s->parameters.cpu_throttle_increment;
 params->tls_creds = g_strdup(s->parameters.tls_creds);
 params->tls_hostname = g_strdup(s->parameters.tls_hostname);
+params->x_checkpoint_delay = s->parameters.x_checkpoint_delay;
 
 return params;
 }
@@ -801,6 +808,8 @@ void qmp_migrate_set_parameters(bool has_compress_level,
 const char *tls_creds,
 bool has_tls_hostname,
 const char *tls_hostname,
+bool has_x_checkpoint_delay,
+int64_t x_checkpoint_delay,
 Error **errp)
 {
 MigrationState *s = migrate_get_current();
@@ -836,6 +845,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
"cpu_throttle_increment",
"an integer in the range of 1 to 99");
 }
+if (has_x_checkpoint_delay && (x_checkpoint_delay < 0)) {
+error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+"x_checkpoint_delay",
+"is invalid, it should be positive");
+}
 
 if (has_compress_level) {
 s->parameters.compress_level = compress_level;
@@ -860,6 +874,10 @@ void qmp_migrate_set_parameters(bool has_compress_level,

[Qemu-devel] [PATCH COLO-Frame v19 04/22] migration: Switch to COLO process after finishing loadvm

2016-08-31 Thread zhanghailiang

Switch from normal migration loadvm process into COLO checkpoint process if
COLO mode is enabled.

We add three new members to struct MigrationIncomingState,
'have_colo_incoming_thread' and 'colo_incoming_thread' record the COLO
related thread for secondary VM, 'migration_incoming_co' records the
original migration incoming coroutine.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v19:
- Fix the title and comments
v12:
- Add Reviewed-by tag
v11:
- We moved the place of bdrv_invalidate_cache_all(), but done the deleting work
  in other patch. Fix it.
- Add documentation for colo in 'MigrationStatus' (Eric's review comment)
v10:
- fix a bug about fd leak which is found by Dave.
---
 include/migration/colo.h  |  7 +++
 include/migration/migration.h |  7 +++
 migration/colo-comm.c | 10 ++
 migration/colo.c  | 21 +
 migration/migration.c | 12 
 stubs/migration-colo.c| 10 ++
 6 files changed, 67 insertions(+)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index bf84b99..b40676c 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -15,6 +15,8 @@
 
 #include "qemu-common.h"
 #include "migration/migration.h"
+#include "qemu/coroutine_int.h"
+#include "qemu/thread.h"
 
 bool colo_supported(void);
 void colo_info_init(void);
@@ -22,4 +24,9 @@ void colo_info_init(void);
 void migrate_start_colo_process(MigrationState *s);
 bool migration_in_colo_state(void);
 
+/* loadvm */
+bool migration_incoming_enable_colo(void);
+void migration_incoming_exit_colo(void);
+void *colo_process_incoming_thread(void *opaque);
+bool migration_incoming_in_colo_state(void);
 #endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 5effc05..f4b215a 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -21,6 +21,7 @@
 #include "migration/vmstate.h"
 #include "qapi-types.h"
 #include "exec/cpu-common.h"
+#include "qemu/coroutine_int.h"
 
 #define QEMU_VM_FILE_MAGIC   0x5145564d
 #define QEMU_VM_FILE_VERSION_COMPAT  0x0002
@@ -107,6 +108,12 @@ struct MigrationIncomingState {
 QEMUBH *bh;
 
 int state;
+
+bool have_colo_incoming_thread;
+QemuThread colo_incoming_thread;
+/* The coroutine we should enter (back) after failover */
+Coroutine *migration_incoming_co;
+
 /* See savevm.c */
 LoadStateEntry_Head loadvm_handlers;
 };
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index a2d5185..3af9333 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -49,3 +49,13 @@ void colo_info_init(void)
 {
 vmstate_register(NULL, 0, _state, _info);
 }
+
+bool migration_incoming_enable_colo(void)
+{
+return colo_info.colo_requested;
+}
+
+void migration_incoming_exit_colo(void)
+{
+colo_info.colo_requested = 0;
+}
diff --git a/migration/colo.c b/migration/colo.c
index fd3ceeb..968cd51 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -27,6 +27,13 @@ bool migration_in_colo_state(void)
 return (s->state == MIGRATION_STATUS_COLO);
 }
 
+bool migration_incoming_in_colo_state(void)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+
+return mis && (mis->state == MIGRATION_STATUS_COLO);
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
 qemu_mutex_lock_iothread();
@@ -48,3 +55,17 @@ void migrate_start_colo_process(MigrationState *s)
 colo_process_checkpoint(s);
 qemu_mutex_lock_iothread();
 }
+
+void *colo_process_incoming_thread(void *opaque)
+{
+MigrationIncomingState *mis = opaque;
+
+migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
+  MIGRATION_STATUS_COLO);
+
+/* TODO: COLO checkpoint restore loop */
+
+migration_incoming_exit_colo();
+
+return NULL;
+}
diff --git a/migration/migration.c b/migration/migration.c
index 4a5bdb9..34b34b8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -402,6 +402,18 @@ static void process_incoming_migration_co(void *opaque)
 /* Else if something went wrong then just fall out of the normal exit 
*/
 }
 
+/* we get COLO info, and know if we are in COLO mode */
+if (!ret && migration_incoming_enable_colo()) {
+mis->migration_incoming_co = qemu_coroutine_self();
+qemu_thread_create(>colo_incoming_thread, "COLO incoming",
+ colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
+mis->have_colo_incoming_thread = true;
+qemu_coroutine_yield();
+
+/* Wait checkpoint incoming thread exit before free resource */
+qemu_thread_join(>colo_incoming_thread);
+}
+
 qemu_fclose(f);
 free_xbzrle_decoded_buf();
 
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 0c8eef4..7b72395 100644
--- a/stubs/migration-colo.c
+++

[Qemu-devel] [PATCH COLO-Frame v19 05/22] COLO: Establish a new communicating path for COLO

2016-08-31 Thread zhanghailiang

This new communication path will be used for returning messages
from Secondary side to Primary side.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v13:
- Remove useless error report
v12:
- Add Reviewed-by tag
v11:
- Rebase master to use qemu_file_get_return_path() for opening return path
v10:
- fix the the error log (Dave's suggestion).
---
 migration/colo.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 968cd51..7c5769b 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -14,6 +14,7 @@
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
+#include "qemu/error-report.h"
 
 bool colo_supported(void)
 {
@@ -36,6 +37,12 @@ bool migration_incoming_in_colo_state(void)
 
 static void colo_process_checkpoint(MigrationState *s)
 {
+s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
+if (!s->rp_state.from_dst_file) {
+error_report("Open QEMUFile from_dst_file failed");
+goto out;
+}
+
 qemu_mutex_lock_iothread();
 vm_start();
 qemu_mutex_unlock_iothread();
@@ -43,8 +50,13 @@ static void colo_process_checkpoint(MigrationState *s)
 
 /* TODO: COLO checkpoint savevm loop */
 
+out:
 migrate_set_state(>state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
+
+if (s->rp_state.from_dst_file) {
+qemu_fclose(s->rp_state.from_dst_file);
+}
 }
 
 void migrate_start_colo_process(MigrationState *s)
@@ -63,8 +75,24 @@ void *colo_process_incoming_thread(void *opaque)
 migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_COLO);
 
+mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
+if (!mis->to_src_file) {
+error_report("COLO incoming thread: Open QEMUFile to_src_file failed");
+goto out;
+}
+/*
+ * Note: We set the fd to unblocked in migration incoming coroutine,
+ * But here we are in the COLO incoming thread, so it is ok to set the
+ * fd back to blocked.
+ */
+qemu_file_set_blocking(mis->from_src_file, true);
+
 /* TODO: COLO checkpoint restore loop */
 
+out:
+if (mis->to_src_file) {
+qemu_fclose(mis->to_src_file);
+}
 migration_incoming_exit_colo();
 
 return NULL;
-- 
1.8.3.1

[Qemu-devel] [PATCH COLO-Frame v19 02/22] COLO: migrate COLO related info to secondary node

2016-08-31 Thread zhanghailiang

We can determine whether or not VM in destination should go into COLO mode
by referring to the info that was migrated.

We skip this section if COLO is not enabled (i.e.
migrate_set_capability colo off), so that, It doesn't break compatibility
with migration no matter whether users configure the --enable-colo/disable-colo
on the source/destination side or not;

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Dr. David Alan Gilbert 
---
v19:
- fix title and comment
v16:
- Fix compile broken due to missing osdep.h
v14:
- Adjust the place of calling colo_info_init()
v11:
- Add Reviewed-by tag
v10:
- Use VMSTATE_BOOL instead of VMSTATE_UNIT32 for 'colo_requested' (Dave's 
suggestion)
---
 include/migration/colo.h |  2 ++
 migration/Makefile.objs  |  1 +
 migration/colo-comm.c| 51 
 vl.c |  3 +++
 4 files changed, 57 insertions(+)
 create mode 100644 migration/colo-comm.c

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 59a632a..1c899a0 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -14,7 +14,9 @@
 #define QEMU_COLO_H
 
 #include "qemu-common.h"
+#include "migration/migration.h"
 
 bool colo_supported(void);
+void colo_info_init(void);
 
 #endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index cff96f0..4bbe9ab 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,6 +1,7 @@
 common-obj-y += migration.o socket.o fd.o exec.o
 common-obj-y += tls.o
 common-obj-$(CONFIG_COLO) += colo.o
+common-obj-y += colo-comm.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o
 common-obj-y += qemu-file-channel.o
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
new file mode 100644
index 000..a2d5185
--- /dev/null
+++ b/migration/colo-comm.c
@@ -0,0 +1,51 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include "trace.h"
+
+typedef struct {
+ bool colo_requested;
+} COLOInfo;
+
+static COLOInfo colo_info;
+
+static void colo_info_pre_save(void *opaque)
+{
+COLOInfo *s = opaque;
+
+s->colo_requested = migrate_colo_enabled();
+}
+
+static bool colo_info_need(void *opaque)
+{
+   return migrate_colo_enabled();
+}
+
+static const VMStateDescription colo_state = {
+.name = "COLOState",
+.version_id = 1,
+.minimum_version_id = 1,
+.pre_save = colo_info_pre_save,
+.needed = colo_info_need,
+.fields = (VMStateField[]) {
+VMSTATE_BOOL(colo_requested, COLOInfo),
+VMSTATE_END_OF_LIST()
+},
+};
+
+void colo_info_init(void)
+{
+vmstate_register(NULL, 0, _state, _info);
+}
diff --git a/vl.c b/vl.c
index b3c80d5..2408982 100644
--- a/vl.c
+++ b/vl.c
@@ -89,6 +89,7 @@ int main(int argc, char **argv)
 #include "audio/audio.h"
 #include "migration/migration.h"
 #include "sysemu/cpus.h"
+#include "migration/colo.h"
 #include "sysemu/kvm.h"
 #include "qapi/qmp/qjson.h"
 #include "qemu/option.h"
@@ -4355,6 +4356,8 @@ int main(int argc, char **argv, char **envp)
 #endif
 }
 
+colo_info_init();
+
 if (net_init_clients() < 0) {
 exit(1);
 }
-- 
1.8.3.1

[Qemu-devel] [PATCH COLO-Frame v19 18/22] COLO: Handle shutdown command for VM in COLO state

2016-08-31 Thread zhanghailiang

If VM is in COLO FT state, we should do some extra work before
normal shutdown process. SVM will ignore the shutdown command if
this command is issued directly to it.

COLO will send the shutdown command to Secondary side if it gets
shutdown request from user.

Cc: Paolo Bonzini 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v19:
- fix title and comment
v15:
- Go on the shutdown process even some error happened
  while sent 'SHUTDOWN' message to SVM.
- Add Reviewed-by tag
v14:
- Remove 'colo_shutdown' variable, use colo_shutdown_request directly
v13:
- Move COLO shutdown related codes to colo.c file (Dave's suggestion)
---
 include/migration/colo.h |  2 ++
 include/sysemu/sysemu.h  |  3 +++
 migration/colo.c | 47 +--
 qapi-schema.json |  4 +++-
 stubs/migration-colo.c   |  5 +
 vl.c | 19 ---
 6 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index e32eef4..b16c642 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -35,4 +35,6 @@ COLOMode get_colo_mode(void);
 
 /* failover */
 void colo_do_failover(MigrationState *s);
+
+bool colo_handle_shutdown(void);
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index ee7c760..1497c8b 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -51,6 +51,8 @@ typedef enum WakeupReason {
 QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;
 
+extern int colo_shutdown_requested;
+
 void qemu_system_reset_request(void);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
@@ -58,6 +60,7 @@ void qemu_system_wakeup_request(WakeupReason reason);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
 void qemu_system_shutdown_request(void);
+void qemu_system_shutdown_request_core(void);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_system_debug_request(void);
diff --git a/migration/colo.c b/migration/colo.c
index 69d1948..0a4cd80 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -327,6 +327,21 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 goto out;
 }
 
+if (colo_shutdown_requested) {
+colo_send_message(s->to_dst_file, COLO_MESSAGE_GUEST_SHUTDOWN,
+  _err);
+if (local_err) {
+error_free(local_err);
+/* Go on the shutdown process and throw the error message */
+error_report("Failed to send shutdown message to SVM");
+}
+qemu_fflush(s->to_dst_file);
+colo_shutdown_requested = 0;
+qemu_system_shutdown_request_core();
+/* Fix me: Just let the colo thread exit ? */
+qemu_thread_exit(0);
+}
+
 ret = 0;
 
 qemu_mutex_lock_iothread();
@@ -382,8 +397,9 @@ static void colo_process_checkpoint(MigrationState *s)
 }
 
 current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-if (current_time - checkpoint_time <
-s->parameters.x_checkpoint_delay) {
+if ((current_time - checkpoint_time <
+s->parameters.x_checkpoint_delay) &&
+!colo_shutdown_requested) {
 int64_t delay_ms;
 
 delay_ms = s->parameters.x_checkpoint_delay -
@@ -444,6 +460,16 @@ static void colo_wait_handle_message(QEMUFile *f, int 
*checkpoint_request,
 case COLO_MESSAGE_CHECKPOINT_REQUEST:
 *checkpoint_request = 1;
 break;
+case COLO_MESSAGE_GUEST_SHUTDOWN:
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+qemu_system_shutdown_request_core();
+qemu_mutex_unlock_iothread();
+/*
+ * The main thread will be exit and terminate the whole
+ * process, do need some cleanup ?
+ */
+qemu_thread_exit(0);
 default:
 *checkpoint_request = 0;
 error_setg(errp, "Got unknown COLO message: %d", msg);
@@ -592,3 +618,20 @@ out:
 
 return NULL;
 }
+
+bool colo_handle_shutdown(void)
+{
+/*
+ * If VM is in COLO-FT mode, we need do some significant work before
+ * respond to the shutdown request. Besides, Secondary VM will ignore
+ * the shutdown request from users.
+ */
+if (migration_incoming_in_colo_state()) {
+return true;
+}
+if (migration_in_colo_state()) {
+colo_shutdown_requested = 1;
+return true;
+}
+return false;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index ee7131d..d65b0d3 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -811,12 +811,14 @@
 #
 # @vmstate-loaded: VM's state has been loaded by SVM.
 #
+#

Re: [Qemu-devel] [PATCH for 2.8 10/11] Revert "intel_iommu: Throw hw_error on notify_started"

2016-08-31 Thread Alex Williamson

[cc +dgibson]

On Thu, 1 Sep 2016 10:29:29 +0800
Peter Xu  wrote:

> On Wed, Aug 31, 2016 at 10:45:37AM +0800, Jason Wang wrote:
> > 
> > 
> > On 2016年08月30日 11:37, Alex Williamson wrote:  
> > >On Tue, 30 Aug 2016 11:06:58 +0800
> > >Jason Wang  wrote:
> > >  
> > >>From: Peter Xu 
> > >>
> > >>This reverts commit 3cb3b1549f5401dc3a5e1d073e34063dc274136f. Vhost
> > >>device IOTLB API will get notified and send invalidation request to
> > >>vhost through this notifier.  
> > >AFAICT this series does not address the original problem for which
> > >commit 3cb3b1549f54 was added.  We've only addressed the very narrow
> > >use case of a device iotlb firing the iommu notifier therefore this
> > >change is a regression versus 2.7 since it allows invalid
> > >configurations with a physical iommu which will never receive the
> > >necessary notifies from intel-iommu emulation to work properly.  Thanks,
> > >
> > >Alex  
> > 
> > Looking at vfio, it cares about map but vhost only cares about IOTLB
> > invalidation. Then I think we probably need another kind of notifier in this
> > case to avoid this.  
> 
> Shall we leverage IOMMUTLBEntry.perm == IOMMU_NONE as a sign for
> invalidation? If so, we can use the same IOTLB interface as before.
> IMHO these two interfaces are not conflicting?
> 
> Alex,
> 
> Do you mean we should still disallow user from passing through devices
> while Intel IOMMU enabled? If so, not sure whether patch below can
> solve the issue.
> 
> It seems that we need a "name" for either IOMMU notifier
> provider/consumer, and we should not allow (provider==Intel &&
> consumer==VFIO) happen. In the following case, I added a name for
> provider, and VFIO checks it.

Absolutely not, intel-iommu emulation is simply incomplete, the IOMMU
notifier is never called for mappings.  There's a whole aspect of
iommu notifiers that intel-iommu simply hasn't bothered to implement.
Don't punish vfio for actually making use of the interface as it was
intended to be used.  AFAICT you're implementing the unmap/invalidation
half, without the actual mapping half of the interface.  It's broken
and incompatible with any iommu notifiers that expect to see both
sides.  Thanks,

Alex


> 8<--
> 
> diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
> index 883db13..936c2e6 100644
> --- a/hw/alpha/typhoon.c
> +++ b/hw/alpha/typhoon.c
> @@ -725,6 +725,7 @@ static IOMMUTLBEntry typhoon_translate_iommu(MemoryRegion 
> *iommu, hwaddr addr,
>  }
> 
>  static const MemoryRegionIOMMUOps typhoon_iommu_ops = {
> +.iommu_type = "typhoon",
>  .translate = typhoon_translate_iommu,
>  };
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 28c31a2..f5e3875 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2347,6 +2347,7 @@ static void vtd_init(IntelIOMMUState *s)
>  memset(s->w1cmask, 0, DMAR_REG_SIZE);
>  memset(s->womask, 0, DMAR_REG_SIZE);
> 
> +s->iommu_ops.iommu_type = "intel";
>  s->iommu_ops.translate = vtd_iommu_translate;
>  s->iommu_ops.notify_started = vtd_iommu_notify_started;
>  s->root = 0;
> diff --git a/hw/pci-host/apb.c b/hw/pci-host/apb.c
> index 653e711..9cfbb73 100644
> --- a/hw/pci-host/apb.c
> +++ b/hw/pci-host/apb.c
> @@ -323,6 +323,7 @@ static IOMMUTLBEntry pbm_translate_iommu(MemoryRegion 
> *iommu, hwaddr addr,
>  }
> 
>  static MemoryRegionIOMMUOps pbm_iommu_ops = {
> +.iommu_type = "pbm",
>  .translate = pbm_translate_iommu,
>  };
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 6bc4d4d..e3e8739 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -244,6 +244,7 @@ static const VMStateDescription vmstate_spapr_tce_table = 
> {
>  };
> 
>  static MemoryRegionIOMMUOps spapr_iommu_ops = {
> +.iommu_type = "spapr",
>  .translate = spapr_tce_translate_iommu,
>  .get_min_page_size = spapr_tce_get_min_page_size,
>  .notify_started = spapr_tce_notify_started,
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 9c1c04e..4414462 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -443,6 +443,7 @@ static IOMMUTLBEntry s390_translate_iommu(MemoryRegion 
> *iommu, hwaddr addr,
>  }
> 
>  static const MemoryRegionIOMMUOps s390_iommu_ops = {
> +.iommu_type = "s390",
>  .translate = s390_translate_iommu,
>  };
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index b313e7c..317e08b 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -441,6 +441,11 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>  if (memory_region_is_iommu(section->mr)) {
>  VFIOGuestIOMMU *giommu;
> 
> +if (!strcmp(memory_region_iommu_type(section->mr), "intel")) {
> +error_report("Device passthrough cannot work with Intel IOMMU");
> +exit(1);
> +}
> +
>  trace_vfio_listener_region_add_iommu(iova,

Re: [Qemu-devel] [PATCH for 2.8 10/11] Revert "intel_iommu: Throw hw_error on notify_started"

2016-08-31 Thread Peter Xu

On Wed, Aug 31, 2016 at 10:45:37AM +0800, Jason Wang wrote:
> 
> 
> On 2016年08月30日 11:37, Alex Williamson wrote:
> >On Tue, 30 Aug 2016 11:06:58 +0800
> >Jason Wang  wrote:
> >
> >>From: Peter Xu 
> >>
> >>This reverts commit 3cb3b1549f5401dc3a5e1d073e34063dc274136f. Vhost
> >>device IOTLB API will get notified and send invalidation request to
> >>vhost through this notifier.
> >AFAICT this series does not address the original problem for which
> >commit 3cb3b1549f54 was added.  We've only addressed the very narrow
> >use case of a device iotlb firing the iommu notifier therefore this
> >change is a regression versus 2.7 since it allows invalid
> >configurations with a physical iommu which will never receive the
> >necessary notifies from intel-iommu emulation to work properly.  Thanks,
> >
> >Alex
> 
> Looking at vfio, it cares about map but vhost only cares about IOTLB
> invalidation. Then I think we probably need another kind of notifier in this
> case to avoid this.

Shall we leverage IOMMUTLBEntry.perm == IOMMU_NONE as a sign for
invalidation? If so, we can use the same IOTLB interface as before.
IMHO these two interfaces are not conflicting?

Alex,

Do you mean we should still disallow user from passing through devices
while Intel IOMMU enabled? If so, not sure whether patch below can
solve the issue.

It seems that we need a "name" for either IOMMU notifier
provider/consumer, and we should not allow (provider==Intel &&
consumer==VFIO) happen. In the following case, I added a name for
provider, and VFIO checks it.

8<--

diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
index 883db13..936c2e6 100644
--- a/hw/alpha/typhoon.c
+++ b/hw/alpha/typhoon.c
@@ -725,6 +725,7 @@ static IOMMUTLBEntry typhoon_translate_iommu(MemoryRegion 
*iommu, hwaddr addr,
 }

 static const MemoryRegionIOMMUOps typhoon_iommu_ops = {
+.iommu_type = "typhoon",
 .translate = typhoon_translate_iommu,
 };

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 28c31a2..f5e3875 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2347,6 +2347,7 @@ static void vtd_init(IntelIOMMUState *s)
 memset(s->w1cmask, 0, DMAR_REG_SIZE);
 memset(s->womask, 0, DMAR_REG_SIZE);

+s->iommu_ops.iommu_type = "intel";
 s->iommu_ops.translate = vtd_iommu_translate;
 s->iommu_ops.notify_started = vtd_iommu_notify_started;
 s->root = 0;
diff --git a/hw/pci-host/apb.c b/hw/pci-host/apb.c
index 653e711..9cfbb73 100644
--- a/hw/pci-host/apb.c
+++ b/hw/pci-host/apb.c
@@ -323,6 +323,7 @@ static IOMMUTLBEntry pbm_translate_iommu(MemoryRegion 
*iommu, hwaddr addr,
 }

 static MemoryRegionIOMMUOps pbm_iommu_ops = {
+.iommu_type = "pbm",
 .translate = pbm_translate_iommu,
 };

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 6bc4d4d..e3e8739 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -244,6 +244,7 @@ static const VMStateDescription vmstate_spapr_tce_table = {
 };

 static MemoryRegionIOMMUOps spapr_iommu_ops = {
+.iommu_type = "spapr",
 .translate = spapr_tce_translate_iommu,
 .get_min_page_size = spapr_tce_get_min_page_size,
 .notify_started = spapr_tce_notify_started,
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 9c1c04e..4414462 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -443,6 +443,7 @@ static IOMMUTLBEntry s390_translate_iommu(MemoryRegion 
*iommu, hwaddr addr,
 }

 static const MemoryRegionIOMMUOps s390_iommu_ops = {
+.iommu_type = "s390",
 .translate = s390_translate_iommu,
 };

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b313e7c..317e08b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -441,6 +441,11 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 if (memory_region_is_iommu(section->mr)) {
 VFIOGuestIOMMU *giommu;

+if (!strcmp(memory_region_iommu_type(section->mr), "intel")) {
+error_report("Device passthrough cannot work with Intel IOMMU");
+exit(1);
+}
+
 trace_vfio_listener_region_add_iommu(iova, end);
 /*
  * FIXME: For VFIO iommu types which have KVM acceleration to
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3e4d416..f012f77 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -149,6 +149,8 @@ struct MemoryRegionOps {
 typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;

 struct MemoryRegionIOMMUOps {
+/* Type of IOMMU */
+const char *iommu_type;
 /* Return a TLB entry that contains a given address. */
 IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool 
is_write);
 /* Returns minimum supported page size */
@@ -593,6 +595,21 @@ static inline bool memory_region_is_iommu(MemoryRegion *mr)
 return mr->iommu_ops;
 }

+/**
+ * memory_region_iommu_type: return type of IOMMU
+ *
+ * Returns type of IOMMU, empty string ("") if not a IOMMU

Re: [Qemu-devel] [PATCH v9 05/11] vfio: add check host bus reset is support or not

2016-08-31 Thread Alex Williamson

On Wed, 31 Aug 2016 13:56:20 -0600
Alex Williamson  wrote:

> On Tue, 19 Jul 2016 15:38:23 +0800
> Zhou Jie  wrote:
> 
> > From: Chen Fan 
> > 
> > When assigning a vfio device with AER enabled, we must check whether
> > the device supports a host bus reset (ie. hot reset) as this may be
> > used by the guest OS in order to recover the device from an AER
> > error.  QEMU must therefore have the ability to perform a physical
> > host bus reset using the existing vfio APIs in response to a virtual
> > bus reset in the VM.  A physical bus reset affects all of the devices
> > on the host bus, therefore we place a few simplifying configuration
> > restriction on the VM:
> > 
> >  - All physical devices affected by a bus reset must be assigned to
> >the VM with AER enabled on each and be configured on the same
> >virtual bus in the VM.
> > 
> >  - No devices unaffected by the bus reset, be they physical, emulated,
> >or paravirtual may be configured on the same virtual bus as a
> >device supporting AER signaling through vfio.
> > 
> > In other words users wishing to enable AER on a multifunction device
> > need to assign all functions of the device to the same virtual bus
> > and enable AER support for each device.  The easiest way to
> > accomplish this is to identity map the physical functions to virtual
> > functions with multifunction enabled on the virtual device.  
> 
> Why am I able to start the following VM with aer=on for the vfio-pci
> devices?
> 
> # lspci -tv
> -[:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
>+-01.0  Device 1234:
>+-1c.0-[01]--
>+-1d.0-[02]--+-01.0  Intel Corporation 82576 Gigabit Network 
> Connection
>|\-01.1  Intel Corporation 82576 Gigabit Network 
> Connection
>...
> 
> # lspci -vvv -s 1d.0
> 00:1d.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge (prog-if 00 [Normal 
> decode])
> 
> The devices are behind a PCIe-to-PCI bridge, so shouldn't specifying
> aer=on for the vfio-pci devices cause a configuration error?
> 
> commandline:
> 
> /home/alwillia/local/bin/qemu-system-x86_64 -name 
> guest=rhel7-q35,debug-threads=on -S -object 
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-11-rhel7-q35/master-key.aes
>  -machine pc-q35-2.7,accel=kvm,usb=off,vmport=off -cpu IvyBridge -m 8192 
> -realtime mlock=off -smp 6,sockets=1,cores=6,threads=1 -uuid 
> b20b28b4-9304-4e11-9ffa-0367aeb44afb -no-user-config -nodefaults -chardev 
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-11-rhel7-q35/monitor.sock,server,nowait
>  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
> -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global 
> ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device 
> i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e -device 
> pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 -device 
> pci-bridge,chassis_nr=3,id=pci.3,bus=pcie.0,addr=0x1d -device 
> ioh3420,port=0xe0,chassis=4,id=pci.4,bus=pcie.0,addr=0x1c -device 
> ich9-usb-ehci1,id=usb,bus=pci!
 .2,addr=0x3.0x7 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.2,multifunction=on,addr=0x3 
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.2,addr=0x3.0x1 
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.2,addr=0x3.0x2 
-drive 
file=/dev/rhel/rhel7-q35,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
 -device 
virtio-blk-pci,scsi=off,bus=pci.2,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:50:ec:0d,bus=pci.2,addr=0x1 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 127.0.0.1:0 -device 
VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 -device 
intel-hda,id=sound0,bus=pci.2,addr=0x2 -device 
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device 
vfio-pci,aer=on,host=07:00.0,id=hostdev0,bus=pci.3,multifunction=on,addr=0x1 
-device vfio-pci,!
 aer=on,host=07:00.1,id=hostdev1,bus=pci.3,addr=0x1.0x1 -msg timestamp=on
> 

I had to move to a different system where I could actually inject an
aer error and created a config similar to above but with the 82576
ports downstream of the ioh3420 root port.  When I inject a malformed
TLP uncorrectable error, my RHEL7.2 guest does this:

[   35.995645] pcieport :00:1c.0: AER: Multiple Uncorrected (Fatal) error 
received: id=0200
[   35.998483] igb :02:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), 
type=Unaccessible, id=0200(Unregistered Agent ID)
[   36.001965] igb :02:00.0 enp2s0f0: PCIe link lost, device now detached
[   36.015092] igb :02:00.1 enp2s0f1: PCIe link lost, device now detached
[   39.133185] igb :02:00.0:

[Qemu-devel] [PATCH] doc/rcu: fix typo

2016-08-31 Thread Cao jin

Signed-off-by: Cao jin 
---
 docs/rcu.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/rcu.txt b/docs/rcu.txt
index 2f70954..a70b72c 100644
--- a/docs/rcu.txt
+++ b/docs/rcu.txt
@@ -37,7 +37,7 @@ do not matter; as soon as all previous critical sections have 
finished,
 there cannot be any readers who hold references to the data structure,
 and these can now be safely reclaimed (e.g., freed or unref'ed).
 
-Here is a picutre:
+Here is a picture:
 
 thread 1  thread 2  thread 3
 ------
-- 
2.1.0

Re: [Qemu-devel] [PATCH v2] intel_iommu: add "eim" property

2016-08-31 Thread Peter Xu

On Fri, Aug 12, 2016 at 05:41:27PM +0800, Peter Xu wrote:
> Adding one extra property for intel-iommu device to decide whether we
> should support EIM bit for IR.
> 
> Now we are throwing high 24 bits of dest_id away directly. This will
> cause interrupt issues with guests that:
> 
> - enabled x2apic with cluster mode
> - have more than 8 vcpus (so dest_id[31:8] might be nonzero)
> 
> Let's make xapic the default one, and for the brave people who would
> like to try EIM and know the side effects, we can do it by explicitly
> enabling EIM using:
> 
>   -device intel-iommu,intremap=on,eim=on
> 
> Even after we have x2apic support, it'll still be good if we can provide
> a way to switch xapic/x2apic from QEMU side for e.g. debugging purpose,
> which is an alternative for tuning guest kernel boot parameters.
> 
> We can switch the default to "on" after x2apic fully supported.
> 
> Signed-off-by: Peter Xu 

Ping.

I'd really appreciate if someone can help have a look on this patch,
and to merge it if possible. Since current x2apic is broken. We should
not allow broken system boot, at least not by default.

Thanks,

-- peterx

Re: [Qemu-devel] [PATCH for 2.8 06/11] intel_iommu: support device iotlb descriptor

2016-08-31 Thread Peter Xu

On Wed, Aug 31, 2016 at 10:54:36AM +0800, Jason Wang wrote:
> >>  static void x86_iommu_instance_init(Object *o)
> >>  {
> >>  X86IOMMUState *s = X86_IOMMU_DEVICE(o);
> >>@@ -108,6 +120,11 @@ static void x86_iommu_instance_init(Object *o)
> >>  s->intr_supported = false;
> >>  object_property_add_bool(o, "intremap", x86_iommu_intremap_prop_get,
> >>   x86_iommu_intremap_prop_set, NULL);
> >>+s->dt_supported = false;
> >>+object_property_add_bool(o, "device_iotlb",
> >>+ x86_iommu_device_iotlb_prop_get,
> >>+ x86_iommu_device_iotlb_prop_set,
> >>+ NULL);
> >Nit 1: use "device-iotlb" instead of "device_iotlb"?
> 
> Yes.
> 
> >Nit 2: use Property bit (like vtd_properties)?
> 
> Not sure, I thought this may be reused by AMD IOMMU but maybe I was wrong.

I mean to create another Property for x86-iommus. :)

Anyway both work for me, and actually "intremap" property is doing it
that way as well...

-- peterx

Re: [Qemu-devel] [PATCH 1/3] virtio: Basic implementation of virtio pstore driver

2016-08-31 Thread Namhyung Kim

Hi Michael,

On Wed, Aug 31, 2016 at 05:54:04PM +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 31, 2016 at 05:08:00PM +0900, Namhyung Kim wrote:
> > The virtio pstore driver provides interface to the pstore subsystem so
> > that the guest kernel's log/dump message can be saved on the host
> > machine.  Users can access the log file directly on the host, or on the
> > guest at the next boot using pstore filesystem.  It currently deals with
> > kernel log (printk) buffer only, but we can extend it to have other
> > information (like ftrace dump) later.
> > 
> > It supports legacy PCI device using single order-2 page buffer.  It uses
> > two virtqueues - one for (sync) read and another for (async) write.
> > Since it cannot wait for write finished, it supports up to 128
> > concurrent IO.  The buffer size is configurable now.
> > 
> > Cc: Paolo Bonzini 
> > Cc: Radim Krčmář 
> > Cc: "Michael S. Tsirkin" 
> > Cc: Anthony Liguori 
> > Cc: Anton Vorontsov 
> > Cc: Colin Cross 
> > Cc: Kees Cook 
> > Cc: Tony Luck 
> > Cc: Steven Rostedt 
> > Cc: Ingo Molnar 
> > Cc: Minchan Kim 
> > Cc: Will Deacon 
> > Cc: k...@vger.kernel.org
> > Cc: qemu-devel@nongnu.org
> > Cc: virtualizat...@lists.linux-foundation.org
> > Cc: virtio-...@lists.oasis-open.org
> > Signed-off-by: Namhyung Kim 
> > ---

[SNIP]
> > +#define TYPE_TABLE_ENTRY(_entry)   \
> > +   { PSTORE_TYPE_##_entry, VIRTIO_PSTORE_TYPE_##_entry }
> > +
> > +struct type_table {
> > +   int pstore;
> > +   u16 virtio;
> > +} type_table[] = {
> > +   TYPE_TABLE_ENTRY(DMESG),
> > +};
> > +
> > +#undef TYPE_TABLE_ENTRY
> > +
> > +
> > +static u16 to_virtio_type(struct virtio_pstore *vps, enum pstore_type_id 
> > type)
> > +{
> > +   unsigned int i;
> > +
> > +   for (i = 0; i < ARRAY_SIZE(type_table); i++) {
> > +   if (type == type_table[i].pstore)
> > +   return cpu_to_virtio16(vps->vdev, type_table[i].virtio);
> 
> Does this pass sparse checks? If yes I'm surprised - this clearly
> returns a virtio16 type.

Ah, didn't run sparse.  Will change it to return a __le16 type
(according to your comment below).

> 
> 
> > +   }
> > +
> > +   return cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_UNKNOWN);
> > +}
> > +
> > +static enum pstore_type_id from_virtio_type(struct virtio_pstore *vps, u16 
> > type)

This one should be '__le16 type' as well.


> > +{
> > +   unsigned int i;
> > +
> > +   for (i = 0; i < ARRAY_SIZE(type_table); i++) {
> > +   if (virtio16_to_cpu(vps->vdev, type) == type_table[i].virtio)
> > +   return type_table[i].pstore;
> > +   }
> > +
> > +   return PSTORE_TYPE_UNKNOWN;
> > +}
> > +

[SNIP]
> > +
> > +struct virtio_pstore_req {
> > +   __virtio16  cmd;
> > +   __virtio16  type;
> > +   __virtio32  flags;
> > +   __virtio64  id;
> > +   __virtio32  count;
> > +   __virtio32  reserved;
> > +};
> > +
> > +struct virtio_pstore_res {
> > +   __virtio16  cmd;
> > +   __virtio16  type;
> > +   __virtio32  ret;
> > +};
> 
> Is there a reason to support legacy endian-ness?
> If not, you can just use __le formats.

I just didn't know what's the preferred type.  Will change!

Thanks,
Namhyung

> 
> 
> > +struct virtio_pstore_fileinfo {
> > +   __virtio64  id;
> > +   __virtio32  count;
> > +   __virtio16  type;
> > +   __virtio16  unused;
> > +   __virtio32  flags;
> > +   __virtio32  len;
> > +   __virtio64  time_sec;
> > +   __virtio32  time_nsec;
> > +   __virtio32  reserved;
> > +};
> > +
> > +struct virtio_pstore_config {
> > +   __virtio32  bufsize;
> > +};
> > +
> > +#endif /* _LINUX_VIRTIO_PSTORE_H */
> > -- 
> > 2.9.3

[Qemu-devel] [PATCH] aio: Remove spurious smp_read_barrier_depends()

2016-08-31 Thread Pranith Kumar

smp_read_barrier_depends() should be used only if you are reading
dependent pointers which are shared. Here 'bh' is a local variable and
dereferencing it will always be ordered after loading 'bh', i.e.,
bh->next will always be ordered after fetching bh.

This patch removes the barrier and adds a comment why storing
'bh->next' is necessary.

Signed-off-by: Pranith Kumar 
---
 async.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/async.c b/async.c
index 3bca9b0..6b691aa 100644
--- a/async.c
+++ b/async.c
@@ -77,8 +77,7 @@ int aio_bh_poll(AioContext *ctx)
 
 ret = 0;
 for (bh = ctx->first_bh; bh; bh = next) {
-/* Make sure that fetching bh happens before accessing its members */
-smp_read_barrier_depends();
+/* store bh->next since bh can be freed in aio_bh_call() */
 next = bh->next;
 /* The atomic_xchg is paired with the one in qemu_bh_schedule.  The
  * implicit memory barrier ensures that the callback sees all writes
-- 
2.9.3

Re: [Qemu-devel] [kvm-unit-tests PATCH v3 03/10] arm/arm64: smp: support more than 8 cpus

2016-08-31 Thread Auger Eric

Hi Drew,

On 30/08/2016 16:28, Auger Eric wrote:
> Hi Drew,
> 
> Proper commit message?
> ... also selects the vgic model corresponding to the host
>> Reviewed-by: Alex Bennée 
>> Signed-off-by: Andrew Jones 
>> ---
>>  arm/run   | 19 ---
>>  arm/selftest.c|  5 -
>>  lib/arm/asm/processor.h   |  9 +++--
>>  lib/arm/asm/setup.h   |  4 ++--
>>  lib/arm/setup.c   | 12 +++-
>>  lib/arm64/asm/processor.h |  9 +++--
>>  6 files changed, 43 insertions(+), 15 deletions(-)
>>
>> diff --git a/arm/run b/arm/run
>> index a2f35ef6a7e63..2d0698619606e 100755
>> --- a/arm/run
>> +++ b/arm/run
>> @@ -31,13 +31,6 @@ if [ -z "$ACCEL" ]; then
>>  fi
>>  fi
>>  
>> -if [ "$HOST" = "aarch64" ] && [ "$ACCEL" = "kvm" ]; then
>> -processor="host"
>> -if [ "$ARCH" = "arm" ]; then
>> -processor+=",aarch64=off"
>> -fi
>> -fi
>> -
>>  qemu="${QEMU:-qemu-system-$ARCH_NAME}"
>>  qpath=$(which $qemu 2>/dev/null)
>>  
>> @@ -53,6 +46,18 @@ fi
>>  
>>  M='-machine virt'
>>  
>> +if [ "$ACCEL" = "kvm" ]; then
>> +if $qemu $M,\? 2>&1 | grep gic-version > /dev/null; then
>> +M+=',gic-version=host'
>> +fi
>> +if [ "$HOST" = "aarch64" ]; then
>> +processor="host"
>> +if [ "$ARCH" = "arm" ]; then
>> +processor+=",aarch64=off"
>> +fi
>> +fi
>> +fi
>> +
>>  if ! $qemu $M -device '?' 2>&1 | grep virtconsole > /dev/null; then
>>  echo "$qpath doesn't support virtio-console for chr-testdev. Exiting."
>>  exit 2
>> diff --git a/arm/selftest.c b/arm/selftest.c
>> index 196164f5313de..2f117f795d2dc 100644
>> --- a/arm/selftest.c
>> +++ b/arm/selftest.c
>> @@ -312,9 +312,10 @@ static bool psci_check(void)
>>  static cpumask_t smp_reported;
>>  static void cpu_report(void)
>>  {
>> +unsigned long mpidr = get_mpidr();
>>  int cpu = smp_processor_id();
>>  
>> -report("CPU%d online", true, cpu);
>> +report("CPU(%3d) mpidr=%lx", mpidr_to_cpu(mpidr) == cpu, cpu, mpidr);
>>  cpumask_set_cpu(cpu, _reported);
>>  halt();
>>  }
>> @@ -343,6 +344,7 @@ int main(int argc, char **argv)
>>  
>>  } else if (strcmp(argv[1], "smp") == 0) {
>>  
>> +unsigned long mpidr = get_mpidr();
>>  int cpu;
>>  
>>  report("PSCI version", psci_check());
>> @@ -353,6 +355,7 @@ int main(int argc, char **argv)
>>  smp_boot_secondary(cpu, cpu_report);
>>  }
>>  
>> +report("CPU(%3d) mpidr=%lx", mpidr_to_cpu(mpidr) == 0, 0, 
>> mpidr);
>>  cpumask_set_cpu(0, _reported);
>>  while (!cpumask_full(_reported))
>>  cpu_relax();
>> diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
>> index f25e7eee3666c..d2048f5f5f7e6 100644
>> --- a/lib/arm/asm/processor.h
>> +++ b/lib/arm/asm/processor.h
>> @@ -40,8 +40,13 @@ static inline unsigned int get_mpidr(void)
>>  return mpidr;
>>  }
>>  
>> -/* Only support Aff0 for now, up to 4 cpus */
>> -#define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff))
>> +#define MPIDR_HWID_BITMASK 0xff
>> +extern int mpidr_to_cpu(unsigned long mpidr);
>> +
>> +#define MPIDR_LEVEL_SHIFT(level) \
>> +(((1 << level) >> 1) << 3)
> can't we have level << 3?
Forget this, see below
>> +#define MPIDR_AFFINITY_LEVEL(mpidr, level) \
>> +((mpidr >> MPIDR_LEVEL_SHIFT(level)) & 0xff)
>>  
>>  extern void start_usr(void (*func)(void *arg), void *arg, unsigned long 
>> sp_usr);
>>  extern bool is_user(void);
>> diff --git a/lib/arm/asm/setup.h b/lib/arm/asm/setup.h
>> index cb8fdbd38dd5d..c501c6ddd8657 100644
>> --- a/lib/arm/asm/setup.h
>> +++ b/lib/arm/asm/setup.h
>> @@ -10,8 +10,8 @@
>>  #include 
>>  #include 
>>  
>> -#define NR_CPUS 8
>> -extern u32 cpus[NR_CPUS];
>> +#define NR_CPUS 255
> 256?
>> +extern u64 cpus[NR_CPUS];
> maybe worth commenting the semantic of cpus[i]?
>>  extern int nr_cpus;
> what about MAX_CPUS instead of NR_CPUS?
>>  
>>  #define NR_MEM_REGIONS  8
>> diff --git a/lib/arm/setup.c b/lib/arm/setup.c
>> index 7e7b39f11dde1..b6e2d5815e723 100644
>> --- a/lib/arm/setup.c
>> +++ b/lib/arm/setup.c
>> @@ -24,12 +24,22 @@ extern unsigned long stacktop;
>>  extern void io_init(void);
>>  extern void setup_args_progname(const char *args);
>>  
>> -u32 cpus[NR_CPUS] = { [0 ... NR_CPUS-1] = (~0U) };
>> +u64 cpus[NR_CPUS] = { [0 ... NR_CPUS-1] = (~0U) };
>>  int nr_cpus;
>>  
>>  struct mem_region mem_regions[NR_MEM_REGIONS];
>>  phys_addr_t __phys_offset, __phys_end;
>>  
>> +int mpidr_to_cpu(unsigned long mpidr)
>> +{
>> +int i;
>> +
>> +for (i = 0; i < nr_cpus; ++i)
>> +if (cpus[i] == (mpidr & MPIDR_HWID_BITMASK))
>> +return i;
>> +return -1;
>> +}
>> +
>>  static void cpu_set(int fdtnode __unused, u32 regval, void *info __unused)
>>  {
>>  int cpu =

Re: [Qemu-devel] [PATCH] vl: Delay initialization of memory backends

2016-08-31 Thread no-reply

Hi,

Your series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Subject: [Qemu-devel] [PATCH] vl: Delay initialization of memory backends
Type: series
Message-id: 1472674630-18886-1-git-send-email-ehabk...@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
make J=8 docker-test-quick@centos6
make J=8 docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
58d23fa vl: Delay initialization of memory backends

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf'
  BUILD centos6
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPY RUNNER
  RUN test-quick in centos6
No C++ compiler available; disabling C++ specific optional code
Install prefix/tmp/qemu-test/src/tests/docker/install
BIOS directory/tmp/qemu-test/src/tests/docker/install/share/qemu
binary directory  /tmp/qemu-test/src/tests/docker/install/bin
library directory /tmp/qemu-test/src/tests/docker/install/lib
module directory  /tmp/qemu-test/src/tests/docker/install/lib/qemu
libexec directory /tmp/qemu-test/src/tests/docker/install/libexec
include directory /tmp/qemu-test/src/tests/docker/install/include
config directory  /tmp/qemu-test/src/tests/docker/install/etc
local state directory   /tmp/qemu-test/src/tests/docker/install/var
Manual directory  /tmp/qemu-test/src/tests/docker/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path   /tmp/qemu-test/src
C compilercc
Host C compiler   cc
C++ compiler  
Objective-C compiler cc
ARFLAGS   rv
CFLAGS-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -pthread 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -g 
QEMU_CFLAGS   -I/usr/include/pixman-1-fPIE -DPIE -m64 -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common  -Wendif-labels -Wmissing-include-dirs 
-Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wold-style-declaration -Wold-style-definition 
-Wtype-limits -fstack-protector-all
LDFLAGS   -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make  make
install   install
pythonpython -B
smbd  /usr/sbin/smbd
module supportno
host CPU  x86_64
host big endian   no
target list   x86_64-softmmu aarch64-softmmu
tcg debug enabled no
gprof enabled no
sparse enabledno
strip binariesyes
profiler  no
static build  no
pixmansystem
SDL support   yes (1.2.14)
GTK support   no 
GTK GL supportno
VTE support   no 
TLS priority  NORMAL
GNUTLS supportno
GNUTLS rndno
libgcrypt no
libgcrypt kdf no
nettleno 
nettle kdfno
libtasn1  no
curses supportno
virgl support no
curl support  no
mingw32 support   no
Audio drivers oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS supportno
VNC support   yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
xen support   no
brlapi supportno
bluez  supportno
Documentation no
PIE   yes
vde support   no
netmap supportno
Linux AIO support no
ATTR/XATTR support yes
Install blobs yes
KVM support   yes
RDMA support  no
TCG interpreter   no
fdt support   yes
preadv supportyes
fdatasync yes
madvise   yes
posix_madvise yes
uuid support  no
libcap-ng support no
vhost-net support yes
vhost-scsi support yes
Trace backendslog
spice support no 
rbd support   no
xfsctl supportno
smartcard support no
libusbno
usb net redir no
OpenGL supportno
OpenGL dmabufsno
libiscsi support  no
libnfs supportno
build guest agent yes
QGA VSS support   no
QGA w32 disk info no
QGA MSI support   no
seccomp support   no
coroutine backend ucontext
coroutine poolyes
GlusterFS support no
Archipelago support no
gcov  gcov
gcov enabled  no
TPM support   yes
libssh2 support   no
TPM passthrough   yes
QOM debugging yes
vhdx  no
lzo support   no
snappy supportno
bzip2 support no
NUMA host support no
tcmalloc support  no
jemalloc support  no
avx2 optimization no
  GEN   x86_64-softmmu/config-devices.mak.tmp
  GEN   aarch64-softmmu/config-devices.mak.tmp
  GEN   config-host.h
  GEN   qemu-options.def
  GEN   qmp-commands.h
  GEN   qapi-types.h
  GEN   qapi-visit.h
  GEN   qapi-event.h
  GEN   qmp-introspect.h
  GEN   x86_64-softmmu/config-devices.mak
  GEN   aarch64-softmmu/config-devices.mak
  GEN   tests/test-qapi-types.h
  GEN   tests/test-qapi-visit.h

Re: [Qemu-devel] [PATCH v2 0/9] SMMUv3 Emulation support

2016-08-31 Thread Auger Eric

Hi Prem,
On 22/08/2016 18:17, Prem Mallappa wrote:
> v1 -> v2:
>   - Adopted review comments from Eric Auger
Although I am really interested in your series, those comments are not
mine and credit should be given to somebody else (Edgar?)

I will do my utmost to review it too ;-)

Thanks

Eric
>   - Make SMMU_DPRINTF to internally call qemu_log
>   (since translation requests are too many, we need control
>on the type of log we want)
>   - SMMUTransCfg modified to suite simplicity
>   - Change RegInfo to uint64 register array
>   - Code cleanup
>   - Test cleanups
>   - Reshuffled patches
> 
> RFC -> v1:
>   - As per SMMUv3 spec 16.0 (only is_ste_consistant() is noticeable)
>   - Reworked register access/update logic
>   - Factored out translation code for
>   - single point bug fix
>   - sharing/removal in future
>   - (optional) Unit tests added, with PCI test device
>   - S1 with 4k/64k, S1+S2 with 4k/64k
>   - (S1 or S2) only can be verified by Linux 4.7 driver
>   - (optional) Priliminary ACPI support
> 
> RFC:
>   - Implements SMMUv3 spec 11.0
>   - Supported for PCIe devices, 
>   - Command Queue and Event Queue supported
>   - LPAE only, S1 is supported and Tested, S2 not tested
>   - BE mode Translation not supported
>   - IRQ support (legacy, no MSI)
>   - Tested with DPDK and e1000 
> 
> Patch 1: Add new log type for IOMMU transactions
> 
> Patch 2: Adds support in virt.c to create both SMMUv3 device and dts entries
> 
> Patch 2: Adds SMMUv3 model to QEMU
>   Multiple files, big ones, translate functionality is split across to
>   accomodate SMMUv2 model, and to remove when common translation feature
>   (if) becomes available.
> 
> Patch 3: Adds SMMU build support
> 
> Patch 4: Some devicetree function to add support for SMMU's multiple interrupt
>assignment with names
> 
> << optional patches >>
> Optional patches are posted for completeness or for those who wants to test.
> 
> Patch 5: A simple PCI device which does DMA from 'src' to 'dst' given
>src_addr, dst_addr and size, and is used by unit test, uses
>pci_dma_read and pci_dma_write in a crude way but serves the purpose.
> 
> Patch 6: Current libqos PCI helpers are x86 only, this addes a generic 
> interface
> 
> Patch 7: Unit tests for SMMU, 
>   - initializes SMMU device 
>   - initializes Test device
>   - allocates page tables 1:1 mapping va == pa
>   - allocates STE/CD accordingly for S1, S2, S1+S2
>   - initiates DMA via PCI test device
>   - verifies transfered data
> 
> Patch 8: Added ACPI IORT tables, was needed for internal project purpose, but 
>posting here for anyone looking for testing ACPI on ARM platforms.
>(P.S: Linux side IORT patches are WIP)
> 
> Repo:
> https://github.com/pmallappa/qemu/tree/upstream/smmuv3/v2
> 
> To Test:
> $ make tests/smmuv3-test
> $ QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64 tests/smmuv3-test
> << expect lot of prints >>
> 
> Any comments welcome..
> 
> Cheers
> /Prem
> 
> Prem Mallappa (9):
>   log: Add new IOMMU type
>   devicetree: Added new APIs to make use of more fdt functions
>   hw: arm: SMMUv3 emulation model
>   hw: arm: Added SMMUv3 files for build
>   hw: arm: Add SMMUv3 to virt platform, create DTS accordingly
>   [optional] hw: misc: added testdev for smmu
>   [optional] tests: libqos: generic pci probing helpers
>   [optional] tests: SMMUv3 unit tests
>   [optional] arm: smmu-v3: ACPI IORT initial support
> 
>  default-configs/aarch64-softmmu.mak |1 +
>  device_tree.c   |   35 +
>  hw/arm/Makefile.objs|1 +
>  hw/arm/smmu-common.c|  152 
>  hw/arm/smmu-common.h|  141 
>  hw/arm/smmu-v3.c| 1369 
> +++
>  hw/arm/smmuv3-internal.h|  432 +++
>  hw/arm/virt-acpi-build.c|   43 ++
>  hw/arm/virt.c   |   62 ++
>  hw/misc/Makefile.objs   |2 +-
>  hw/misc/pci-testdev-smmu.c  |  239 ++
>  hw/misc/pci-testdev-smmu.h  |   22 +
>  hw/vfio/common.c|2 +-
>  include/hw/acpi/acpi-defs.h |   84 +++
>  include/hw/arm/smmu.h   |   33 +
>  include/hw/arm/virt.h   |2 +
>  include/qemu/log.h  |1 +
>  include/sysemu/device_tree.h|   18 +
>  tests/Makefile.include  |4 +
>  tests/libqos/pci-generic.c  |  197 +
>  tests/libqos/pci-generic.h  |   58 ++
>  tests/smmuv3-test.c |  952 
>  util/log.c  |2 +
>  23 files changed, 3850 insertions(+), 2 deletions(-)
>  create mode

Re: [Qemu-devel] [PATCH v9 10/11] vfio: Add waiting for host aer error progress

2016-08-31 Thread Michael S. Tsirkin

On Wed, Aug 31, 2016 at 02:13:09PM -0600, Alex Williamson wrote:
> On Tue, 19 Jul 2016 15:38:28 +0800
> Zhou Jie  wrote:
> 
> > From: Chen Fan 
> > 
> > For supporting aer recovery, host and guest would run the same aer
> > recovery code, that would do the secondary bus reset if the error
> > is fatal, the aer recovery process:
> >   1. error_detected
> >   2. reset_link (if fatal)
> >   3. slot_reset/mmio_enabled
> >   4. resume
> > 
> > It indicates that host will do secondary bus reset to reset
> > the physical devices under bus in step 2, that would cause
> > devices in D3 status in a short time. But in qemu, we register
> > an error detected handler, that would be invoked as host broadcasts
> > the error-detected event in step 1, in order to avoid guest do
> > reset_link when host do reset_link simultaneously. it may cause
> > fatal error. we poll the vfio_device_info to assure host reset
> > completely.
> > In qemu, the aer recovery process:
> >   1. Detect support for aer error progress
> >  If host vfio driver does not support for aer error progress,
> >  directly fail to boot up VM as with aer enabled.
> >   2. Immediately notify the VM on error detected.
> >   3. Wait for host aer error progress
> >  Poll the vfio_device_info, If it is still in aer error progress after
> >  some timeout, we would abort the guest directed bus reset
> >  altogether and unplug of the device to prevent it from further
> >  interacting with the VM.
> >   4. Reset bus.
> > 
> > Signed-off-by: Chen Fan 
> > Signed-off-by: Zhou Jie 
> > ---
> >  hw/vfio/pci.c  | 51 
> > +-
> >  hw/vfio/pci.h  |  1 +
> >  linux-headers/linux/vfio.h |  4 
> >  3 files changed, 55 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index 0e42786..777245c 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -35,6 +35,12 @@
> >  
> >  #define MSIX_CAP_LENGTH 12
> >  
> > +/*
> > + * Timeout for waiting host aer error process, it is 3 seconds.
> > + * For hardware bus reset 3 seconds will be enough.
> > + */
> > +#define PCI_AER_PROCESS_TIMEOUT 300
> 
> Why is 3 seconds "enough"?  What considerations went into determining
> this that would need to be re-evaluated if we ever want to change it?
> 24 hours is enough, but why was 3 seconds chosen over 24 hours?  Why
> would 2 seconds be a worse choice?  1?

And just to clarify, the answer belongs in a code comment
and possibly commit log, not just in an email response.

> > +
> >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> >  
> > @@ -1913,6 +1919,14 @@ static void vfio_check_hot_bus_reset(VFIOPCIDevice 
> > *vdev, Error **errp)
> >  VFIOGroup *group;
> >  int ret, i, devfn, range_limit;
> >  
> > +if (!(vdev->vbasedev.flags & VFIO_DEVICE_FLAGS_AERPROCESS)) {
> > +error_setg(errp, "vfio: Cannot enable AER for device %s,"
> > +   " host vfio driver does not support for"
> > +   " aer error progress",
> > +   vdev->vbasedev.name);
> > +return;
> > +}
> > +
> >  ret = vfio_get_hot_reset_info(vdev, );
> >  if (ret) {
> >  error_setg(errp, "vfio: Cannot enable AER for device %s,"
> > @@ -2679,6 +2693,11 @@ static void vfio_err_notifier_handler(void *opaque)
> >  msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
> >   PCI_ERR_ROOT_CMD_NONFATAL_EN;
> >  
> > +if (isfatal) {
> > +PCIDevice *dev_0 = pci_get_function_0(dev);
> > +VFIOPCIDevice *vdev_0 = DO_UPCAST(VFIOPCIDevice, pdev, dev_0);
> > +vdev_0->pci_aer_error_signaled = true;
> > +}
> >  pcie_aer_msg(dev, );
> >  return;
> >  }
> > @@ -3163,6 +3182,19 @@ static void vfio_exitfn(PCIDevice *pdev)
> >  vfio_bars_exit(vdev);
> >  }
> >  
> > +static int vfio_aer_error_is_in_process(VFIOPCIDevice *vdev)
> > +{
> > +struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
> > +int ret;
> > +
> > +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, _info);
> > +if (ret) {
> > +error_report("vfio: error getting device info: %m");
> > +return ret;
> > +}
> > +return dev_info.flags & VFIO_DEVICE_FLAGS_INAERPROCESS ? 1 : 0;
> > +}
> > +
> >  static void vfio_pci_reset(DeviceState *dev)
> >  {
> >  PCIDevice *pdev = DO_UPCAST(PCIDevice, qdev, dev);
> > @@ -3176,7 +3208,24 @@ static void vfio_pci_reset(DeviceState *dev)
> >  if ((pci_get_word(br->config + PCI_BRIDGE_CONTROL) &
> >   PCI_BRIDGE_CTL_BUS_RESET)) {
> >  if (pci_get_function_0(pdev) == pdev) {
> > -vfio_pci_hot_reset(vdev,

[Qemu-devel] [PATCH] vl: Delay initialization of memory backends

2016-08-31 Thread Eduardo Habkost

Initialization of memory backends may take a while when
prealloc=yes is used, depending on their size. Initializing
memory backends before chardevs may delay the creation of monitor
sockets, and trigger timeouts on management software that waits
until the monitor socket is created by QEMU.  See, for example,
the bug report at:
https://bugzilla.redhat.com/show_bug.cgi?id=1371211

This patch fixes the problem by adding "memory-backend-*" classes
to the delayed-initialization list.

I believe a more appropriate fix would be creating objects and
chardevs in the same ordering specified on the command-line, but
this patch should fix the bug until we figure out a better way.

Signed-off-by: Eduardo Habkost 
---
 vl.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/vl.c b/vl.c
index b3c80d5..7764032 100644
--- a/vl.c
+++ b/vl.c
@@ -2810,6 +2810,15 @@ static bool object_create_initial(const char *type)
 return false;
 }
 
+/* Initialization of memory backends may delay chardev
+ * initialization for too long, and trigger timeouts on
+ * software that waits for a monitor socket to be created
+ * (e.g. libvirt).
+ */
+if (g_str_has_prefix(type, "memory-backend-")) {
+return false;
+}
+
 return true;
 }
 
-- 
2.7.4

Re: [Qemu-devel] [PATCH v9 10/11] vfio: Add waiting for host aer error progress

2016-08-31 Thread Alex Williamson

On Tue, 19 Jul 2016 15:38:28 +0800
Zhou Jie  wrote:

> From: Chen Fan 
> 
> For supporting aer recovery, host and guest would run the same aer
> recovery code, that would do the secondary bus reset if the error
> is fatal, the aer recovery process:
>   1. error_detected
>   2. reset_link (if fatal)
>   3. slot_reset/mmio_enabled
>   4. resume
> 
> It indicates that host will do secondary bus reset to reset
> the physical devices under bus in step 2, that would cause
> devices in D3 status in a short time. But in qemu, we register
> an error detected handler, that would be invoked as host broadcasts
> the error-detected event in step 1, in order to avoid guest do
> reset_link when host do reset_link simultaneously. it may cause
> fatal error. we poll the vfio_device_info to assure host reset
> completely.
> In qemu, the aer recovery process:
>   1. Detect support for aer error progress
>  If host vfio driver does not support for aer error progress,
>  directly fail to boot up VM as with aer enabled.
>   2. Immediately notify the VM on error detected.
>   3. Wait for host aer error progress
>  Poll the vfio_device_info, If it is still in aer error progress after
>  some timeout, we would abort the guest directed bus reset
>  altogether and unplug of the device to prevent it from further
>  interacting with the VM.
>   4. Reset bus.
> 
> Signed-off-by: Chen Fan 
> Signed-off-by: Zhou Jie 
> ---
>  hw/vfio/pci.c  | 51 
> +-
>  hw/vfio/pci.h  |  1 +
>  linux-headers/linux/vfio.h |  4 
>  3 files changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 0e42786..777245c 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -35,6 +35,12 @@
>  
>  #define MSIX_CAP_LENGTH 12
>  
> +/*
> + * Timeout for waiting host aer error process, it is 3 seconds.
> + * For hardware bus reset 3 seconds will be enough.
> + */
> +#define PCI_AER_PROCESS_TIMEOUT 300

Why is 3 seconds "enough"?  What considerations went into determining
this that would need to be re-evaluated if we ever want to change it?
24 hours is enough, but why was 3 seconds chosen over 24 hours?  Why
would 2 seconds be a worse choice?  1?

> +
>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>  
> @@ -1913,6 +1919,14 @@ static void vfio_check_hot_bus_reset(VFIOPCIDevice 
> *vdev, Error **errp)
>  VFIOGroup *group;
>  int ret, i, devfn, range_limit;
>  
> +if (!(vdev->vbasedev.flags & VFIO_DEVICE_FLAGS_AERPROCESS)) {
> +error_setg(errp, "vfio: Cannot enable AER for device %s,"
> +   " host vfio driver does not support for"
> +   " aer error progress",
> +   vdev->vbasedev.name);
> +return;
> +}
> +
>  ret = vfio_get_hot_reset_info(vdev, );
>  if (ret) {
>  error_setg(errp, "vfio: Cannot enable AER for device %s,"
> @@ -2679,6 +2693,11 @@ static void vfio_err_notifier_handler(void *opaque)
>  msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
>   PCI_ERR_ROOT_CMD_NONFATAL_EN;
>  
> +if (isfatal) {
> +PCIDevice *dev_0 = pci_get_function_0(dev);
> +VFIOPCIDevice *vdev_0 = DO_UPCAST(VFIOPCIDevice, pdev, dev_0);
> +vdev_0->pci_aer_error_signaled = true;
> +}
>  pcie_aer_msg(dev, );
>  return;
>  }
> @@ -3163,6 +3182,19 @@ static void vfio_exitfn(PCIDevice *pdev)
>  vfio_bars_exit(vdev);
>  }
>  
> +static int vfio_aer_error_is_in_process(VFIOPCIDevice *vdev)
> +{
> +struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
> +int ret;
> +
> +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, _info);
> +if (ret) {
> +error_report("vfio: error getting device info: %m");
> +return ret;
> +}
> +return dev_info.flags & VFIO_DEVICE_FLAGS_INAERPROCESS ? 1 : 0;
> +}
> +
>  static void vfio_pci_reset(DeviceState *dev)
>  {
>  PCIDevice *pdev = DO_UPCAST(PCIDevice, qdev, dev);
> @@ -3176,7 +3208,24 @@ static void vfio_pci_reset(DeviceState *dev)
>  if ((pci_get_word(br->config + PCI_BRIDGE_CONTROL) &
>   PCI_BRIDGE_CTL_BUS_RESET)) {
>  if (pci_get_function_0(pdev) == pdev) {
> -vfio_pci_hot_reset(vdev, vdev->single_depend_dev);
> +if (!vdev->pci_aer_error_signaled) {
> +vfio_pci_hot_reset(vdev, vdev->single_depend_dev);
> +} else {
> +int i;
> +for (i = 0; i < 1000; i++) {
> +if (!vfio_aer_error_is_in_process(vdev)) {
> +break;
> +}
> +

Re: [Qemu-devel] [PATCH v2] sh4: fix broken link to documentation

2016-08-31 Thread Aurelien Jarno

On 2016-08-31 18:31, Reda Sallahi wrote:
> The page that was previously linked in the source code and the README file is
> no longer available so it now returns a 404 error message.
> 
> This puts a previous snapshot from archive.org instead.
> 
> Signed-off-by: Reda Sallahi 
> ---
> Changes from v1:
> * Add the 'https://' part to the link in hw/sh4/shix.c.
> 
>  hw/sh4/shix.c | 2 +-
>  target-sh4/README.sh4 | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Thanks for the new version.

Acked-by: Aurelien Jarno 

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v9 05/11] vfio: add check host bus reset is support or not

2016-08-31 Thread Alex Williamson

On Tue, 19 Jul 2016 15:38:23 +0800
Zhou Jie  wrote:

> From: Chen Fan 
> 
> When assigning a vfio device with AER enabled, we must check whether
> the device supports a host bus reset (ie. hot reset) as this may be
> used by the guest OS in order to recover the device from an AER
> error.  QEMU must therefore have the ability to perform a physical
> host bus reset using the existing vfio APIs in response to a virtual
> bus reset in the VM.  A physical bus reset affects all of the devices
> on the host bus, therefore we place a few simplifying configuration
> restriction on the VM:
> 
>  - All physical devices affected by a bus reset must be assigned to
>the VM with AER enabled on each and be configured on the same
>virtual bus in the VM.
> 
>  - No devices unaffected by the bus reset, be they physical, emulated,
>or paravirtual may be configured on the same virtual bus as a
>device supporting AER signaling through vfio.
> 
> In other words users wishing to enable AER on a multifunction device
> need to assign all functions of the device to the same virtual bus
> and enable AER support for each device.  The easiest way to
> accomplish this is to identity map the physical functions to virtual
> functions with multifunction enabled on the virtual device.

Why am I able to start the following VM with aer=on for the vfio-pci
devices?

# lspci -tv
-[:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
   +-01.0  Device 1234:
   +-1c.0-[01]--
   +-1d.0-[02]--+-01.0  Intel Corporation 82576 Gigabit Network 
Connection
   |\-01.1  Intel Corporation 82576 Gigabit Network 
Connection
   ...

# lspci -vvv -s 1d.0
00:1d.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge (prog-if 00 [Normal 
decode])

The devices are behind a PCIe-to-PCI bridge, so shouldn't specifying
aer=on for the vfio-pci devices cause a configuration error?

commandline:

/home/alwillia/local/bin/qemu-system-x86_64 -name 
guest=rhel7-q35,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-11-rhel7-q35/master-key.aes
 -machine pc-q35-2.7,accel=kvm,usb=off,vmport=off -cpu IvyBridge -m 8192 
-realtime mlock=off -smp 6,sockets=1,cores=6,threads=1 -uuid 
b20b28b4-9304-4e11-9ffa-0367aeb44afb -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-11-rhel7-q35/monitor.sock,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
-global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global 
ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device 
i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e -device 
pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 -device 
pci-bridge,chassis_nr=3,id=pci.3,bus=pcie.0,addr=0x1d -device 
ioh3420,port=0xe0,chassis=4,id=pci.4,bus=pcie.0,addr=0x1c -device 
ich9-usb-ehci1,id=usb,bus=pci.2!
 ,addr=0x3.0x7 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.2,multifunction=on,addr=0x3 
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.2,addr=0x3.0x1 
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.2,addr=0x3.0x2 
-drive 
file=/dev/rhel/rhel7-q35,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
 -device 
virtio-blk-pci,scsi=off,bus=pci.2,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:50:ec:0d,bus=pci.2,addr=0x1 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 127.0.0.1:0 -device 
VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 -device 
intel-hda,id=sound0,bus=pci.2,addr=0x2 -device 
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device 
vfio-pci,aer=on,host=07:00.0,id=hostdev0,bus=pci.3,multifunction=on,addr=0x1 
-device vfio-pci,ae!
 r=on,host=07:00.1,id=hostdev1,bus=pci.3,addr=0x1.0x1 -msg timestamp=on

Thanks,
Alex

> Signed-off-by: Chen Fan 
> ---
>  hw/vfio/pci.c | 278 
> +-
>  hw/vfio/pci.h |   1 +
>  2 files changed, 256 insertions(+), 23 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 21fd801..242c1e4 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1693,6 +1693,42 @@ static void vfio_check_af_flr(VFIOPCIDevice *vdev, 
> uint8_t pos)
>  }
>  }
>  
> +static int vfio_pci_name_to_addr(const char *name, PCIHostDeviceAddress 
> *addr)
> +{
> +if (strlen(name) != 12 ||
> +sscanf(name, "%04x:%02x:%02x.%1x", >domain,
> +   >bus, >slot, >function) != 4) {
> +return -EINVAL;
> +}
> +
> +return 0;
> +}
> +
> +static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
> +{
> +PCIHostDeviceAddress tmp;
> +
> +

Re: [Qemu-devel] [PATCH v9 03/11] vfio: add aer support for vfio device

2016-08-31 Thread Alex Williamson

On Tue, 19 Jul 2016 15:38:21 +0800
Zhou Jie  wrote:

> From: Chen Fan 
> 
> Calling pcie_aer_init to initilize aer related registers for
> vfio device, then reload physical related registers to expose
> device capability.
> 
> Signed-off-by: Chen Fan 
> ---
>  hw/vfio/pci.c | 75 
> ++-
>  hw/vfio/pci.h |  3 +++
>  2 files changed, 77 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 6a6160b..11c895c 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1854,6 +1854,66 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, 
> uint8_t pos)
>  return 0;
>  }
>  
> +static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
> +  int pos, uint16_t size)
> +{
> +PCIDevice *pdev = >pdev;
> +PCIDevice *dev_iter;
> +uint8_t type;
> +uint32_t errcap;
> +
> +if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
> +pcie_add_capability(pdev, PCI_EXT_CAP_ID_ERR,
> +cap_ver, pos, size);
> +return 0;
> +}
> +
> +dev_iter = pci_bridge_get_device(pdev->bus);
> +if (!dev_iter) {
> +goto error;
> +}
> +
> +while (dev_iter) {
> +if (!pci_is_express(dev_iter)) {
> +goto error;
> +}
> +
> +type = pcie_cap_get_type(dev_iter);
> +if ((type != PCI_EXP_TYPE_ROOT_PORT &&
> + type != PCI_EXP_TYPE_UPSTREAM &&
> + type != PCI_EXP_TYPE_DOWNSTREAM)) {
> +goto error;
> +}
> +
> +if (!dev_iter->exp.aer_cap) {
> +goto error;
> +}
> +
> +dev_iter = pci_bridge_get_device(dev_iter->bus);
> +}
> +
> +errcap = vfio_pci_read_config(pdev, pos + PCI_ERR_CAP, 4);
> +/*
> + * The ability to record multiple headers is depending on
> + * the state of the Multiple Header Recording Capable bit and
> + * enabled by the Multiple Header Recording Enable bit.
> + */
> +if ((errcap & PCI_ERR_CAP_MHRC) &&
> +(errcap & PCI_ERR_CAP_MHRE)) {
> +pdev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_DEFAULT;
> +} else {
> +pdev->exp.aer_log.log_max = 0;
> +}
> +
> +pcie_cap_deverr_init(pdev);
> +return pcie_aer_init(pdev, pos, size);

pcie_aer_init() adds a v2 AER capability regardless of the version of
the AER capability on the device.  Is this expected?  v2 defines a lot
more bits in various registers than v1, so are we simply hoping that
devices have reserved bits as zero like they're supposed to?  It's a
bit strange that for an Intel 82576 NIC I get a v1 AER capability w/o
aer=on and a v2 with.  Thanks,

Alex

> +
> +error:
> +error_report("vfio: Unable to enable AER for device %s, parent bus "
> + "does not support AER signaling", vdev->vbasedev.name);
> +return -1;
> +}
> +
>  static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
>  {
>  PCIDevice *pdev = >pdev;
> @@ -1861,6 +1921,7 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
>  uint16_t cap_id, next, size;
>  uint8_t cap_ver;
>  uint8_t *config;
> +int ret = 0;
>  
>  /* Only add extended caps if we have them and the guest can see them */
>  if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) ||
> @@ -1914,6 +1975,9 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
> PCI_EXT_CAP_NEXT_MASK);
>  
>  switch (cap_id) {
> +case PCI_EXT_CAP_ID_ERR:
> +ret = vfio_setup_aer(vdev, cap_ver, next, size);
> +break;
>  case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
>  trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, 
> next);
>  break;
> @@ -1921,6 +1985,9 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
>  pcie_add_capability(pdev, cap_id, cap_ver, next, size);
>  }
>  
> +if (ret) {
> +goto out;
> +}
>  }
>  
>  /* Cleanup chain head ID if necessary */
> @@ -1928,8 +1995,9 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
>  pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
>  }
>  
> +out:
>  g_free(config);
> -return 0;
> +return ret;
>  }
>  
>  static int vfio_add_capabilities(VFIOPCIDevice *vdev)
> @@ -2698,6 +2766,11 @@ static int vfio_initfn(PCIDevice *pdev)
>  goto out_teardown;
>  }
>  
> +if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
> +!pdev->exp.aer_cap) {
> +goto out_teardown;
> +}
> +
>  if (vdev->vga) {
>  vfio_vga_quirk_setup(vdev);
>  }
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 7d482d9..5483044 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -15,6 +15,7 @@
>  #include "qemu-common.h"
>  #include "exec/memory.h"
>  #include "hw/pci/pci.h"
> +#include

[Qemu-devel] [Bug 670769] Re: CDROM size not updated when changing image files

2016-08-31 Thread John Snow

** Changed in: qemu
 Assignee: (unassigned) => John Snow (jnsnow)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/670769

Title:
  CDROM size not updated when changing image files

Status in QEMU:
  New

Bug description:
  I'm using qemu 13.0 with a plain Linux kernel using the ata_piix driver as 
the guest, and an initrd that starts a shell. When changing the image in the 
monitor and reading from the CDROM in the guest, the size is not updated. I'm 
using LInux 2.6.32.24
  as the host and I've tested 2.6.32.24, 2.6.35, and 2.6.36 as guests.  Both 
host and guest are 64-bit. Here is the command used to start the guest using 
the initrd:

  ./x86_64-softmmu/qemu-system-x86_64 -cdrom /spare2/cd1.img -kernel
  /sources/linux-2.6.32.24-test/arch/x86/boot/bzImage -initrd
  /spare2/initrd.img -append 'root=/dev/ram0 rw' -cpu core2duo

  Additional info on this bug can be found here: 
http://marc.info/?l=kvm=128746013906820=2. Note: this is how I discovered 
  the bug, using 32-bit Slackware install CDs.

  I'm attaching the initrd I used in my tests: I created two different-sized 
fake CDROM images by dd'ing from /dev/zero. In my tests,
  cd1.img is smaller that cd2.img. In the monitor I executed 'change ide1-cd0 
/spare2/cd2.img' to load the new image. I checked 
  the size by cat'ing /sys/block/sr0/size in the guest after reading the CDROM. 
Reading the CDROM was done by typing
  'dd if=/dev/sr0 of=/dev/null bs=512 count=3'

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/670769/+subscriptions

[Qemu-devel] [Bug 1070762] Re: savevm fails with inserted CD, "Device '%s' is writable but does not support snapshots."

2016-08-31 Thread John Snow

** Changed in: qemu
 Assignee: (unassigned) => John Snow (jnsnow)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1070762

Title:
  savevm fails with inserted CD, "Device '%s' is writable but does not
  support  snapshots."

Status in QEMU:
  New

Bug description:
  Hi,

  yesterday unfortunately a customer reported a failed snapshot of his
  VM. Going through the logfile I discovered:

  "Device 'ide1-cd0' is writable but does not support snapshots"

  this is with qemu-1.2.0 and 1.0.1 at least...

  Why writeable?
  Even if I specify "-drive ...,readonly=on,snapshot=off" to qemu the 
monitor-command sees the CD-ROM-device as being writeable?!

  Somewhere I saw a "hint" for blockdev.c:
  === snip ===

  --- /tmp/blockdev.c   2012-10-24 11:37:10.0 +0200
  +++ blockdev.c2012-10-24 11:37:17.0 +0200
  @@ -551,6 +551,7 @@
   case IF_XEN:
   case IF_NONE:
   dinfo->media_cd = media == MEDIA_CDROM;
  + dinfo->bdrv->read_only = 1;
   break;
   case IF_SD:
   case IF_FLOPPY:

  === snap ===

  after installing with this small patch applied it works, so insert CD, savevm 
 succeeds.
  This should be fixed at all correct places, and the tags 
"readonly=on,snapshot=off" should do it, too. Or even just work after 
specifying a drive being a CD-rom should do the trick ;-)

  Another "bad habit" is, that the ISO/DVD-file has to be writeable to
  be changed?

  Thnx for attention and regards,

  Oliver.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1070762/+subscriptions

[Qemu-devel] [Bug 588691] Re: QEMU is not correctly detecting host CDs

2016-08-31 Thread John Snow

** Changed in: qemu
 Assignee: Natalia Portillo (claunia) => John Snow (jnsnow)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/588691

Title:
  QEMU is not correctly detecting host CDs

Status in QEMU:
  In Progress

Bug description:
  QEMU's block layer contains code for detecting and using ioctls when
  real CD-ROM host devices are attached.

  This detection is not working in some host OSes while bad implemented
  on anothers.

  E.g., in Linux host qemu -cdrom /dev/sr0 is not detecting it as a CD-ROM
  E.g., in Mac OS X host qemu asks the kernel to enumerate optical devices and 
the compares it to the constant string "/dev/cdrom". This is useless, that 
enumeration is just enough, and "/dev/cdrom" will NEVER exist in Mac OS X 
unless manually created by the user.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/588691/+subscriptions

[Qemu-devel] [Bug 1219234] Re: -device ide-hd will assign bus with with no free units

2016-08-31 Thread John Snow

** Changed in: qemu
 Assignee: (unassigned) => John Snow (jnsnow)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1219234

Title:
  -device ide-hd will assign bus with with no free units

Status in QEMU:
  New

Bug description:
  Originally filed here:
  https://bugzilla.redhat.com/show_bug.cgi?id=1000118

  ./x86_64-softmmu/qemu-system-x86_64 -device ahci -drive 
id=aa,file=/tmp/foo,if=none -drive id=bb,file=/tmp/foo,if=none -device 
ide-hd,drive=aa -device ide-hd,drive=bb
  qemu-system-x86_64: -device ide-hd,drive=bb: Can't create IDE unit 1, bus 
supports only 1 units
  qemu-system-x86_64: -device ide-hd,drive=bb: Device initialization failed.
  qemu-system-x86_64: -device ide-hd,drive=bb: Device 'ide-hd' could not be 
initialized

  If a bus isn't specified for -device ide-hd, it just uses the first
  bus it finds, not taking into account if that bus was already assigned
  for another device. So users are forced to do -device ide-hd,bus=ide.0
  -device ide-hd,bus=ide.1, etc.

  This isn't specific to -device ahci, but it's worse there since there
  isn't any -drive if=IDE or -hda convenience option, which both seem to
  get the logic correct.

  I know -device is the 'build it yourself' approach so I understand if
  this is WONTFIX.

  This is affects qemu.git as of today (8-31-2013)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1219234/+subscriptions

[Qemu-devel] [Bug 1368204] Re: WinME isn't able to detect QEMU's cdrom drive and other hard drives automatically

2016-08-31 Thread John Snow

** Changed in: qemu
 Assignee: (unassigned) => John Snow (jnsnow)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1368204

Title:
  WinME isn't able to detect QEMU's cdrom drive and other hard drives
  automatically

Status in QEMU:
  New

Bug description:
  On a fresh installation of Windows Millennium (WinME) in qemu, Windows
  Me isn't able to find the CD-ROM drive or additional hard drives other
  than -hda at first place.

  Only if i add manually an IDE controller driver in Windows ME's device 
manager, the CD-ROM inserted in QEMU is found.
  Thus an IDE controller isn't found automatically either.

  This shouldn't be the case. On normal real hardware, Windows ME would
  find at least one IDE or SCSI controller.

  The command line that was used is the following:
  sudo /usr/bin/qemu-system-i386 -hda WinME_QEMU.img -cdrom drivers.iso -boot c 
-no-acpi -no-hpet -soundhw sb16 -net nic -cpu pentium3 -m 256 -vga cirrus   

  qemu's version is:
  qemu-system-i386 --version

  
  QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.3), Copyright (c) 
2003-2008 Fabrice Bellard

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1368204/+subscriptions

[Qemu-devel] [Bug 786208] Re: Missing checks for non-existent device in ide_exec_cmd

2016-08-31 Thread John Snow

** Changed in: qemu
 Assignee: (unassigned) => John Snow (jnsnow)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/786208

Title:
  Missing checks for non-existent device in ide_exec_cmd

Status in QEMU:
  New

Bug description:
  Several calls in the ide_exec_cmd handler are missing checks for
  (!s->bs) or similar, resulting in NULL pointer dereferences, divide-
  by-zero, or possibly other badness if the guest performs operations on
  a non-existent IDE master.

  For example, the WIN_READ_NATIVE_MAX command does a 'ide_set_sector(s,
  s->nb_sectors - 1);', which does 'cyl = sector_num / (s->heads *
  s->sectors);', which will fail with a divide-by-zero if heads =
  sectors = 0.

  And WIN_MULTREAD also does not check for s->bs, but does a
  'ide_sector_read(s);', which will do 'bdrv_read(s->bs, sector_num,
  s->io_buffer, n);' on a NULL s->bs, leading to a segfault.

  I do not *believe* that a malicious guest can do anything more than
  cause a crash with these bugs.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/786208/+subscriptions

[Qemu-devel] [Bug 1414466] Re: -net user, hostfwd=... is not working(qemu-system-aarch64)

2016-08-31 Thread Orzech

Also happens on Ubuntu 16.04.1 64-bit with QEMU 1:2.5+dfsg-5ubuntu10.4.
I have the following settings added to instance xml config:



  


  

It looks like forwarding does not happen at all. When I try to connect
to guest instance, I get exactly the same results regardless of whether
sshd is running in that instance or not.

** Changed in: qemu
   Status: New => Confirmed

** Tags added: qemu trusty ubuntu xenial

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1414466

Title:
  -net user,hostfwd=... is not working(qemu-system-aarch64)

Status in QEMU:
  Confirmed

Bug description:
  QEMU version: git a46b3aaf6bb038d4f6f192a84df204f10929e75c

   /opt/qemu.git/bin/qemu-system-aarch64 --version
  QEMU emulator version 2.2.50, Copyright (c) 2003-2008 Fabrice Bellard

  Hosts:
  ovs - host machine (Ubuntu 14.04.1, x86_64)
  debian8-arm64 - guest 

  Guest start:
  user@ovs:~$ /opt/qemu.git/bin/qemu-system-aarch64 -machine virt -cpu 
cortex-a57 -nographic -smp 1 -m 512 -kernel vmlinuz-run -initrd initrd-run.img 
-append "root=/dev/sda2 console=ttyAMA0" -global virtio-blk-device.scsi=off 
-device virtio-scsi-device,id=scsi -drive 
file=debian8-arm64.img,id=rootimg,cache=unsafe,if=none -device 
scsi-hd,drive=rootimg -netdev user,id=unet -device 
virtio-net-device,netdev=unet -net user,hostfwd=tcp:127.0.0.1:1122-:22

  root@debian8-arm64:~# netstat -ntplu | grep ssh
  tcp0  0 0.0.0.0:22  0.0.0.0:*   LISTEN
  410/sshd
  tcp6   0  0 :::22   :::*LISTEN
  410/sshd   

  (no firewall in guest vm)

  user@ovs:~$ netstat -ntplu | grep 1122
  tcp0  0 127.0.0.1:1122  0.0.0.0:*   LISTEN
  18722/qemu-system-a

  user@ovs:~$ time ssh user@127.0.0.1 -p 1122
  ssh_exchange_identification: read: Connection reset by peer

  real  1m29.341s
  user  0m0.005s
  sys   0m0.000s

  Inside guest vm sshd works fine:
  root@debian8-arm64:~# ssh user@127.0.0.1 -p 22
  user@127.0.0.1's password: 
  
  user@debian8-arm64:~$ exit
  logout
  Connection to 127.0.0.1 closed.

  root@debian8-arm64:~# ssh user@10.0.2.15 -p 22
  user@10.0.2.15's password: 
  ...
  user@debian8-arm64:~$ exit
  logout
  Connection to 10.0.2.15 closed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1414466/+subscriptions

Re: [Qemu-devel] [PATCH 0/2] virtio: fix VirtQueue->inuse field

2016-08-31 Thread Denis V. Lunev

On 08/30/2016 10:54 PM, Stefan Hajnoczi wrote:
> On Mon, Aug 22, 2016 at 10:00 AM, Denis V. Lunev
>  wrote:
>> On 08/15/2016 08:54 AM, Stefan Hajnoczi wrote:
>>> The VirtQueue->inuse field is not always updated correctly.  These patches
>>> fix
>>> it.
>>>
>>> Originally this series was called "virtio-balloon: fix stats vq migration"
>>> but
>>> Ladi Prosek posted a nicer fix called "balloon: Fix failure of updating
>>> guest
>>> memory status".  I dropped the virtio-balloon patches.
>>>
>>> Changes from previous series:
>>>   * Missing comma in error formatting [Fam]
>>>   * virtio_descard() -> virtio_discard() [Michael]
>>>   * Multi-line comment style [Cornelia]
>>>
>>> Stefan Hajnoczi (2):
>>>virtio: recalculate vq->inuse after migration
>>>virtio: decrement vq->inuse in virtqueue_discard()
>>>
>>>   hw/virtio/virtio.c | 16 
>>>   1 file changed, 16 insertions(+)
>>>
>> these patches break 'make check' with the following:
>>
>> GTESTER check-qtest-x86_64
>> Warning: path not on HugeTLBFS: /tmp/vhost-test-hRYeTb
>> Warning: path not on HugeTLBFS: /tmp/vhost-test-hRYeTb
>> Warning: path not on HugeTLBFS: /tmp/vhost-test-hRYeTb
>> qemu-system-x86_64: VQ 1 size 0x100 < last_avail_idx 0x0 - used_idx 0x1
>> qemu-system-x86_64: error while loading state for instance 0x0 of device
>> ':00:03.0/virtio-net'
>> qemu-system-x86_64: load of migration failed: Operation not permitted
>> Broken pipe
>> qemu-system-x86_64: Failed to read msg header. Read 0 instead of 12.
>> Original request 11.
>> GTester: last random seed: R02S122f07a3fc35cfd5b0204e3eb45c61e6
>> qemu-system-x86_64: Failed to read msg header. Read 0 instead of 12.
>> Original request 11.
>> Warning: path not on HugeTLBFS: /tmp/vhost-test-60WtDz
>> blkdebug: Suspended request 'A'
>> blkdebug: Resuming request 'A'
>> main-loop: WARNING: I/O thread spun for 1000 iterations
>> main-loop: WARNING: I/O thread spun for 1000 iterations
>> /home/den/src/git/qemu/tests/Makefile:400: recipe for target
>> 'check-qtest-x86_64' failed
>> make: *** [check-qtest-x86_64] Error 1
>> iris ~/src/git/qemu $
>>
>> Sorry, if I have missed the fix in the list.
> I hit this issue when backporting to a QEMU 2.3-based source tree.
> This doesn't happen in qemu.git/master.
>
> To save anyone doing a backport a lot of time:
>
> If you enable vhost-user-test in a QEMU 2.3-based source tree with the
> migration test case, make sure you also backport
> 8c56c1a592b5092d91da8d8943c1d6462a6f ("memory: emulate
> ioeventfd").
>
> Stefan
and, in addition to this, 4eae2a657d1ff5ada56eb9b4966eae0eff333b0b is
also necessary.
Without it scenario with suspend/resume after guest reboot fails.

Thank you Roma for this finding.

Den

[Qemu-devel] [PATCH v2 5/7] ppc/pnv: add a PnvCore object

2016-08-31 Thread Cédric Le Goater

This is largy inspired by sPAPRCPUCore with some simplification, no
hotplug for instance. But the differences are small and the objects
could possibly be merged.

A set of PnvCore objects is added to the PnvChip and the device
tree is populated looping on these cores.

Real HW cpu ids are now generated depending on the chip cpu model, the
chip id and a core mask. This id is stored in CPUState->cpu_index and
in PnvCore->core_id and it is used to populate the device tree.

Signed-off-by: Cédric Le Goater 
---

 Changes since v1:

 - changed name to PnvCore
 - changed PnvChip core array type to a 'PnvCore *cores'
 - introduced real cpu hw ids using a core mask from the chip
 - reworked powernv_create_core_node() which populates the device tree
 - added missing "ibm,pa-features" property 
 - smp_cpus representing threads, used smp_cores instead to create the
   cores in the chip.
 - removed the use of ppc_get_vcpu_dt_id() 
 - added "POWER8E" and "POWER8NVL" cpu models to exercice the
   PnvChipClass

 hw/ppc/Makefile.objs  |   2 +-
 hw/ppc/pnv.c  | 204 ++
 hw/ppc/pnv_core.c | 170 ++
 include/hw/ppc/pnv.h  |   7 ++
 include/hw/ppc/pnv_core.h |  47 +++
 5 files changed, 429 insertions(+), 1 deletion(-)
 create mode 100644 hw/ppc/pnv_core.c
 create mode 100644 include/hw/ppc/pnv_core.h

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index f580e5c41413..08c213c40684 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -6,7 +6,7 @@ obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o 
spapr_rtas.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
 obj-$(CONFIG_PSERIES) += spapr_cpu_core.o
 # IBM PowerNV
-obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o
+obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o pnv_core.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index b6efb5e3ef07..daf9f459ab0e 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -35,6 +35,7 @@
 #include "hw/ppc/fdt.h"
 #include "hw/ppc/ppc.h"
 #include "hw/ppc/pnv.h"
+#include "hw/ppc/pnv_core.h"
 #include "hw/loader.h"
 #include "exec/address-spaces.h"
 #include "qemu/cutils.h"
@@ -98,6 +99,136 @@ static int powernv_populate_memory(void *fdt)
 return 0;
 }
 
+/*
+ * The PowerNV cores (and threads) need to use real HW ids and not an
+ * incremental index like it has been done on other platforms. This HW
+ * id is called a PIR and is used in the device tree, in the XSCOM
+ * communication to address cores, in the interrupt servers.
+ */
+static void powernv_create_core_node(PnvCore *pc, void *fdt,
+ int cpus_offset, int chip_id)
+{
+CPUCore *core = CPU_CORE(pc);
+CPUState *cs = CPU(DEVICE(pc->threads));
+DeviceClass *dc = DEVICE_GET_CLASS(cs);
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+int smt_threads = ppc_get_compat_smt_threads(cpu);
+CPUPPCState *env = >env;
+PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs);
+uint32_t servers_prop[smt_threads];
+uint32_t gservers_prop[smt_threads * 2];
+int i;
+uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
+   0x, 0x};
+uint32_t tbfreq = PNV_TIMEBASE_FREQ;
+uint32_t cpufreq = 10;
+uint32_t page_sizes_prop[64];
+size_t page_sizes_prop_size;
+const uint8_t pa_features[] = { 24, 0,
+0xf6, 0x3f, 0xc7, 0xc0, 0x80, 0xf0,
+0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
+0x00, 0x00, 0x00, 0x00, 0x80, 0x00,
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00 };
+int offset;
+char *nodename;
+
+nodename = g_strdup_printf("%s@%x", dc->fw_name, core->core_id);
+offset = fdt_add_subnode(fdt, cpus_offset, nodename);
+_FDT(offset);
+g_free(nodename);
+
+_FDT((fdt_setprop_cell(fdt, offset, "ibm,chip-id", chip_id)));
+
+_FDT((fdt_setprop_cell(fdt, offset, "reg", core->core_id)));
+_FDT((fdt_setprop_cell(fdt, offset, "ibm,pir", core->core_id)));
+_FDT((fdt_setprop_string(fdt, offset, "device_type", "cpu")));
+
+_FDT((fdt_setprop_cell(fdt, offset, "cpu-version", env->spr[SPR_PVR])));
+_FDT((fdt_setprop_cell(fdt, offset, "d-cache-block-size",
+env->dcache_line_size)));
+_FDT((fdt_setprop_cell(fdt, offset, "d-cache-line-size",
+env->dcache_line_size)));
+_FDT((fdt_setprop_cell(fdt, offset, "i-cache-block-size",
+env->icache_line_size)));
+_FDT((fdt_setprop_cell(fdt, offset, "i-cache-line-size",
+env->icache_line_size)));
+
+if (pcc->l1_dcache_size) {
+_FDT((fdt_setprop_cell(fdt, offset, "d-cache-size",
+

[Qemu-devel] [PATCH v2 7/7] monitor: fix crash for platforms without a CPU 0

2016-08-31 Thread Cédric Le Goater

On PowerNV, CPU ids start at 0x8 or 0x20, we don't have a CPU 0
anymore. So let's use the first_cpu index to initialize the monitor.

Signed-off-by: Cédric Le Goater 
---

 So that you can dump the cpu list with the monitor :

(qemu) info cpus
* CPU #8: nip=0x0010 thread_id=7742
  CPU #16: nip=0x0010 thread_id=7740
  CPU #24: nip=0x0010 thread_id=7740
  CPU #32: nip=0x0010 thread_id=7740
  CPU #40: nip=0x0010 thread_id=7740
  CPU #48: nip=0x0010 thread_id=7740
  CPU #72: nip=0x0010 thread_id=7740
  CPU #80: nip=0x0010 thread_id=7740
  CPU #136: nip=0x0010 thread_id=7740
  CPU #144: nip=0x0010 thread_id=7740
  CPU #152: nip=0x0010 thread_id=7740
  CPU #160: nip=0x0010 thread_id=7740
  CPU #168: nip=0x0010 thread_id=7740
  CPU #176: nip=0x0010 thread_id=7740
  CPU #200: nip=0x0010 thread_id=7740
  CPU #208: nip=0x0010 thread_id=7740

 monitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monitor.c b/monitor.c
index e9009de09a6c..19b8ec14f40e 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1027,7 +1027,7 @@ int monitor_set_cpu(int cpu_index)
 CPUState *mon_get_cpu(void)
 {
 if (!cur_mon->mon_cpu) {
-monitor_set_cpu(0);
+monitor_set_cpu(first_cpu->cpu_index);
 }
 cpu_synchronize_state(cur_mon->mon_cpu);
 return cur_mon->mon_cpu;
-- 
2.7.4

Re: [Qemu-devel] DAX can not work on virtual nvdimm device

2016-08-31 Thread Ross Zwisler

On Wed, Aug 31, 2016 at 04:44:47PM +0800, Xiao Guangrong wrote:
> On 08/31/2016 01:09 AM, Dan Williams wrote:
> > 
> > Can you post your exact reproduction steps?  This test is not failing for 
> > me.
> > 
> 
> Sure.
> 
> 1. make the guest kernel based on your tree, the top commit is
>10d7902fa0e82b (dax: unmap/truncate on device shutdown) and
>the config file can be found in this thread.
> 
> 2. add guest kernel command line: memmap=6G!10G
> 
> 3: start the guest:
>x86_64-softmmu/qemu-system-x86_64 -machine pc,nvdimm --enable-kvm \
>-smp 16 -m 32G,maxmem=100G,slots=100 /other/VMs/centos6.img -monitor stdio
> 
> 4: in guest:
>mkfs.ext4 /dev/pmem0
>mount -o dax /dev/pmem0  /mnt/pmem/
>echo > /mnt/pmem/xxx
>./mmap /mnt/pmem/xxx
>./read /mnt/pmem/xxx
> 
>   The source code of mmap and read has been attached in this mail.
> 
>   Hopefully, you can detect the error triggered by read test.
> 
> Thanks!

I'm still unable to reproduce this issue.

I'm using a version of QEMU that I compiled at this commit:

bfc766d (HEAD, tag: v2.6.0) Update version for v2.6.0 release

Here are the options I used for the compile:

./configure --prefix=/home/rzwisler/qemu --target-list=x86_64-softmmu
--enable-kvm --enable-spice --enable-libusb --enable-usb-redir

I used the kernel commit and kernel config you provided.  The mmap is set up
the same, as are the QEMU command line parameters.  

With all this, the tests you provided give the following output:

# ./mmap /mnt/pmem/xxx
mmap test on /mnt/pmem/xxx.
Try to write 0x7f160072d000 for 1000 size.
Write Done.
Try to read 0x7f160072d000 for 1000 size.
Read Done.
End: 1000.
Try to fread fd=3 size 1000 sizeof(buf) 1.
Fread Done.

# ./read /mnt/pmem/xxx
test on /mnt/pmem/xxx.

 Good Read.

I'm not sure what else to look at.  What do you see in /proc/cpuinfo?  Perhaps
our virtual machine CPUs are advertising different features, and we are going
down different code paths?

Here are my cpuinfo flags in my guest:

flags   : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl eagerfpu pni cx16
x2apic hypervisor lahf_lm

Another thing to do would be to run your test on bare metal on the same
machine and see if you get different results.

Thanks,
- Ross

[Qemu-devel] [PATCH v2 4/7] ppc/pnv: add a core mask to PnvChip

2016-08-31 Thread Cédric Le Goater

This will be used to build real HW ids for the cores and enforce some
limits on the available cores per chip.

Signed-off-by: Cédric Le Goater 
---
 hw/ppc/pnv.c | 27 +++
 include/hw/ppc/pnv.h |  2 ++
 2 files changed, 29 insertions(+)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index a6e7f66b2c0a..b6efb5e3ef07 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -236,6 +236,27 @@ static void ppc_powernv_init(MachineState *machine)
 g_free(chip_typename);
 }
 
+/* Allowed core identifiers on a POWER8 Processor Chip :
+ *
+ * 
+ *  EX1  - Venice only
+ *  EX2  - Venice only
+ *  EX3  - Venice only
+ *  EX4
+ *  EX5
+ *  EX6
+ *  
+ *  EX9  - Venice only
+ *  EX10 - Venice only
+ *  EX11 - Venice only
+ *  EX12
+ *  EX13
+ *  EX14
+ * 
+ */
+#define POWER8E_CORE_MASK  (~0x8f8f)
+#define POWER8_CORE_MASK   (~0x8181)
+
 static void pnv_chip_power8nvl_realize(PnvChip *chip, Error **errp)
 {
 ;
@@ -250,6 +271,8 @@ static void pnv_chip_power8nvl_class_init(ObjectClass 
*klass, void *data)
 k->cpu_model = "POWER8NVL";
 k->chip_type = PNV_CHIP_P8NVL;
 k->chip_f000f = 0x120d30498000ull;
+k->cores_max = 12;
+k->cores_mask = POWER8_CORE_MASK;
 dc->desc = "PowerNV Chip POWER8NVL";
 }
 
@@ -274,6 +297,8 @@ static void pnv_chip_power8_class_init(ObjectClass *klass, 
void *data)
 k->cpu_model = "POWER8";
 k->chip_type = PNV_CHIP_P8;
 k->chip_f000f = 0x220ea0498000ull;
+k->cores_max = 12;
+k->cores_mask = POWER8_CORE_MASK;
 dc->desc = "PowerNV Chip POWER8";
 }
 
@@ -298,6 +323,8 @@ static void pnv_chip_power8e_class_init(ObjectClass *klass, 
void *data)
 k->cpu_model = "POWER8E";
 k->chip_type = PNV_CHIP_P8E;
 k->chip_f000f = 0x221ef0498000ull;
+k->cores_max = 6;
+k->cores_mask = POWER8E_CORE_MASK;
 dc->desc = "PowerNV Chip POWER8E";
 }
 
diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index bc6e1f80096b..987bc70245a7 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -49,6 +49,8 @@ typedef struct PnvChipClass {
 /*< private >*/
 SysBusDeviceClass parent_class;
 /*< public >*/
+uint32_t   cores_max;
+uint32_t   cores_mask;
 const char *cpu_model;
 PnvChipType  chip_type;
 uint64_t chip_f000f;
-- 
2.7.4

[Qemu-devel] [PATCH v2 1/7] ppc/pnv: add skeleton PowerNV platform

2016-08-31 Thread Cédric Le Goater

From: Benjamin Herrenschmidt 

The goal is to emulate a PowerNV system at the level of the skiboot
firmware, which loads the OS and provides some runtime services. Power
Systems have a lower firmware (HostBoot) that does low level system
initialization, like DRAM training. This is beyond the scope of what
qemu will address in a PowerNV guest.

No devices yet, not even an interrupt controller. Just to get started,
some RAM to load the skiboot firmware, the kernel and initrd. The
device tree is fully created in the machine reset op.

Signed-off-by: Benjamin Herrenschmidt 
[clg: - updated for qemu-2.7
  - replaced fprintf by error_report
  - used a common definition of _FDT macro
  - removed VMStateDescription as migration is not yet supported
  - added IBM Copyright statements
  - reworked kernel_filename handling
  - merged PnvSystem and sPowerNVMachineState
  - removed PHANDLE_XICP
  - added ppc_create_page_sizes_prop helper
  - removed nmi support
  - removed kvm support
  - updated powernv machine to version 2.8
  - removed chips and cpus, They will be provided in another patches
  - added a machine reset routine to initialize the device tree (also)
  - french has a squelette and english a skeleton.
  - improved commit log.
  - reworked prototypes parameters
  - added a check on the ram size (thanks to Michael Ellerman)
  - fixed chip-id cell
  - changed MAX_CPUS to 2048
  - simplified memory node creation to one node only
  - removed machine version
  - rewrote the device tree creation with the fdt "rw" routines
  - s/sPowerNVMachineState/PnvMachineState/
  - etc.
]
Signed-off-by: Cédric Le Goater 
---

 Changes since v1:

 - changed MAX_CPUS to 2048
 - simplified memory node creation to one node only
 - removed machine version 
 - rewrote the device tree creation with the fdt "rw" routines
 - s/sPowerNVMachineState/PnvMachineState/
 - block_default_type is back to IF_IDE because of the AHCI device

 default-configs/ppc64-softmmu.mak |   1 +
 hw/ppc/Makefile.objs  |   2 +
 hw/ppc/pnv.c  | 244 ++
 include/hw/ppc/pnv.h  |  37 ++
 4 files changed, 284 insertions(+)
 create mode 100644 hw/ppc/pnv.c
 create mode 100644 include/hw/ppc/pnv.h

diff --git a/default-configs/ppc64-softmmu.mak 
b/default-configs/ppc64-softmmu.mak
index c4be59f638ed..516a6e25aba3 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -40,6 +40,7 @@ CONFIG_I8259=y
 CONFIG_XILINX=y
 CONFIG_XILINX_ETHLITE=y
 CONFIG_PSERIES=y
+CONFIG_POWERNV=y
 CONFIG_PREP=y
 CONFIG_MAC=y
 CONFIG_E500=y
diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index 99a0d4e581bf..8105db7d5600 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -5,6 +5,8 @@ obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
 obj-$(CONFIG_PSERIES) += spapr_cpu_core.o
+# IBM PowerNV
+obj-$(CONFIG_POWERNV) += pnv.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
new file mode 100644
index ..70413e3c5740
--- /dev/null
+++ b/hw/ppc/pnv.c
@@ -0,0 +1,244 @@
+/*
+ * QEMU PowerPC PowerNV model
+ *
+ * Copyright (c) 2004-2007 Fabrice Bellard
+ * Copyright (c) 2007 Jocelyn Mayer
+ * Copyright (c) 2010 David Gibson, IBM Corporation.
+ * Copyright (c) 2014-2016 BenH, IBM Corporation.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ */
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/numa.h"
+#include "hw/hw.h"
+#include "target-ppc/cpu.h"
+#include "qemu/log.h"
+#include

[Qemu-devel] [PATCH v2 6/7] ppc/pnv: add a XScomDevice to PnvCore

2016-08-31 Thread Cédric Le Goater

Now that we are using real HW ids for the cores in PowerNV chips, we
can route the XSCOM accesses to them. We just need to attach a
XScomDevice to each core with the associated ranges in the XSCOM
address space.

To start with, let's install the DTS (Digital Thermal Sensor) handlers
which are easy to handle.

Signed-off-by: Cédric Le Goater 
---
 hw/ppc/pnv.c  |  9 +++
 hw/ppc/pnv_core.c | 67 +++
 include/hw/ppc/pnv_core.h | 13 +
 3 files changed, 89 insertions(+)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index daf9f459ab0e..a31568415192 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -527,6 +527,7 @@ static void pnv_chip_realize(DeviceState *dev, Error **errp)
 for (i = 0, core_hwid = 0; (core_hwid < sizeof(chip->cores_mask) * 8)
  && (i < chip->num_cores); core_hwid++) {
 PnvCore *pnv_core = >cores[i];
+DeviceState *qdev;
 
 if (!(chip->cores_mask & (1 << core_hwid))) {
 continue;
@@ -542,6 +543,14 @@ static void pnv_chip_realize(DeviceState *dev, Error 
**errp)
  _fatal);
 object_unref(OBJECT(pnv_core));
 i++;
+
+/* Attach the core to its XSCOM bus */
+qdev = qdev_create(>xscom->bus, TYPE_PNV_CORE_XSCOM);
+qdev_prop_set_uint32(qdev, "core-pir",
+ P8_PIR(chip->chip_id, core_hwid));
+qdev_init_nofail(qdev);
+
+pnv_core->xd = PNV_CORE_XSCOM(qdev);
 }
 g_free(typename);
 
diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index 825aea1194a1..feba374740dc 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -18,7 +18,9 @@
  */
 #include "qemu/osdep.h"
 #include "sysemu/sysemu.h"
+#include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "qemu/log.h"
 #include "target-ppc/cpu.h"
 #include "hw/ppc/ppc.h"
 #include "hw/ppc/pnv.h"
@@ -144,10 +146,75 @@ static const TypeInfo pnv_core_info = {
 .abstract   = true,
 };
 
+
+#define DTS_RESULT0 0x5
+#define DTS_RESULT1 0x50001
+
+static bool pnv_core_xscom_read(XScomDevice *dev, uint32_t range,
+   uint32_t offset, uint64_t *out_val)
+{
+switch (offset) {
+case DTS_RESULT0:
+*out_val = 0x26f024f023full;
+break;
+case DTS_RESULT1:
+*out_val = 0x24full;
+break;
+default:
+qemu_log_mask(LOG_GUEST_ERROR, "Warning: reading reg=0x%08x", offset);
+}
+
+   return true;
+}
+
+static bool pnv_core_xscom_write(XScomDevice *dev, uint32_t range,
+uint32_t offset, uint64_t val)
+{
+qemu_log_mask(LOG_GUEST_ERROR, "Warning: writing to reg=0x%08x", offset);
+return true;
+}
+
+#define EX_XSCOM_BASE 0x1000
+#define EX_XSCOM_SIZE 0x10
+
+static void pnv_core_xscom_realize(DeviceState *dev, Error **errp)
+{
+XScomDevice *xd = XSCOM_DEVICE(dev);
+PnvCoreXScom *pnv_xd = PNV_CORE_XSCOM(dev);
+
+xd->ranges[0].addr = EX_XSCOM_BASE | P8_PIR2COREID(pnv_xd->core_pir) << 24;
+xd->ranges[0].size = EX_XSCOM_SIZE;
+}
+
+static Property pnv_core_xscom_properties[] = {
+DEFINE_PROP_UINT32("core-pir", PnvCoreXScom, core_pir, 0),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void pnv_core_xscom_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+XScomDeviceClass *xdc = XSCOM_DEVICE_CLASS(klass);
+
+xdc->read = pnv_core_xscom_read;
+xdc->write = pnv_core_xscom_write;
+
+dc->realize = pnv_core_xscom_realize;
+dc->props = pnv_core_xscom_properties;
+}
+
+static const TypeInfo pnv_core_xscom_type_info = {
+.name  = TYPE_PNV_CORE_XSCOM,
+.parent= TYPE_XSCOM_DEVICE,
+.instance_size = sizeof(PnvCoreXScom),
+.class_init= pnv_core_xscom_class_init,
+};
+
 static void pnv_core_register_types(void)
 {
 int i ;
 
+type_register_static(_core_xscom_type_info);
 type_register_static(_core_info);
 for (i = 0; i < ARRAY_SIZE(pnv_core_models); ++i) {
 TypeInfo ti = {
diff --git a/include/hw/ppc/pnv_core.h b/include/hw/ppc/pnv_core.h
index 832c8756afaa..72936ccfd22f 100644
--- a/include/hw/ppc/pnv_core.h
+++ b/include/hw/ppc/pnv_core.h
@@ -20,6 +20,18 @@
 #define _PPC_PNV_CORE_H
 
 #include "hw/cpu/core.h"
+#include "hw/ppc/pnv_xscom.h"
+
+#define TYPE_PNV_CORE_XSCOM "powernv-cpu-core-xscom"
+#define PNV_CORE_XSCOM(obj) \
+ OBJECT_CHECK(PnvCoreXScom, (obj), TYPE_PNV_CORE_XSCOM)
+
+typedef struct PnvCoreXScom {
+XScomDevice xd;
+uint32_t core_pir;
+} PnvCoreXScom;
+
+#define P8_PIR2COREID(pir) (((pir) >> 3) & 0xf)
 
 #define TYPE_PNV_CORE "powernv-cpu-core"
 #define PNV_CORE(obj) \
@@ -35,6 +47,7 @@ typedef struct PnvCore {
 
 /*< public >*/
 void *threads;
+PnvCoreXScom *xd;
 } PnvCore;
 
 typedef struct PnvCoreClass {
-- 
2.7.4

[Qemu-devel] [PATCH v2 0/7] ppc/pnv: add a minimal platform

2016-08-31 Thread Cédric Le Goater

Hello,

Here is a new version to address the comments from v1 plus a couple of
improvements, the most important being :

 - PnvChip now has PnvChipClass depending on the cpu model
 - the device tree uses the fdt "rw" routines
 - the XSCOM bus makes its first appearance.
 - the cores now use real HW ids ! 'cpu_dt_id' is dead, long live
   'cpu_index' 

The patchset is organised the same way, the initial patch provides a
minimal platform with some RAM to load ROMs, firmware, kernel,
initrd. The device tree is built with what is available at reset time.

Then, comes the PnvChip object acting as a container for other devices
required to run a system. First of these is XSCOM, the sideband bus
which gives controls to all the units in the POWER8 chip and then the
cores.

Last is a little fix to dump the cpus from the monitor.

The PowerNV platform provides just enough support to be run under
qemu, so that you can check the qom tree, dump the device tree from
ram, show the cpus, etc. It still lacks quite a few controllers to be
useful.

The next major task is XICS as it does not support real HW ids for the
cpus. There are some initial patches and hacks for that in my dev
branch. If you feel adventurous, you can give it a try here :

   https://github.com/legoater/qemu/commits/powernv-ipmi-2.8

Just add on the command line :

 -smp cores=8

Thanks,

C. 

Benjamin Herrenschmidt (2):
  ppc/pnv: add skeleton PowerNV platform
  ppc/pnv: Add XSCOM infrastructure

Cédric Le Goater (5):
  ppc/pnv: add a PnvChip object
  ppc/pnv: add a core mask to PnvChip
  ppc/pnv: add a PnvCore object
  ppc/pnv: add a XScomDevice to PnvCore
  monitor: fix crash for platforms without a CPU 0

 default-configs/ppc64-softmmu.mak |   1 +
 hw/ppc/Makefile.objs  |   2 +
 hw/ppc/pnv.c  | 649 ++
 hw/ppc/pnv_core.c | 237 ++
 hw/ppc/pnv_xscom.c| 408 
 include/hw/ppc/pnv.h  | 119 +++
 include/hw/ppc/pnv_core.h |  60 
 include/hw/ppc/pnv_xscom.h|  75 +
 monitor.c |   2 +-
 9 files changed, 1552 insertions(+), 1 deletion(-)
 create mode 100644 hw/ppc/pnv.c
 create mode 100644 hw/ppc/pnv_core.c
 create mode 100644 hw/ppc/pnv_xscom.c
 create mode 100644 include/hw/ppc/pnv.h
 create mode 100644 include/hw/ppc/pnv_core.h
 create mode 100644 include/hw/ppc/pnv_xscom.h

-- 
2.7.4

Re: [Qemu-devel] [PATCH 0/6] hypertrace: Lightweight guest-to-QEMU trace channel

2016-08-31 Thread Stefan Hajnoczi

On Mon, Aug 29, 2016 at 08:46:02PM +0200, Lluís Vilanova wrote:
> >> Also, I'm still not sure how to interact with QEMU's monitor interface from
> >> within the probe code (probes execute in kernel mode, including "guru mode"
> >> code).
> 
> > When SystemTap is used the QEMU monitor interface does nothing.
> 
> That's not what I've experienced. I was able to use a stap script to change 
> the
> tracing state of events:
> 
>#!/usr/bin/env stap
> 
>%{
>#include 
>%}
> 
>function event:long(cpu:long, addr:long, info:long)
>%{
>char *argv[4] = {"/bin/sh", "-c", "echo 'trace-event * off' | telnet 
> localhost 1234", NULL};
>call_usermodehelper(argv[0], argv, NULL, UMH_WAIT_EXEC);
>STAP_RETURN(0);
>%}
> 
>probe begin {
>printf("hello\n")
>}
>probe 
> process("./install/vanilla/bin/qemu-system-i386").mark("guest_mem_before_exec")
>{
>printf("%x %d %d\n", $arg1, $arg2, $arg3)
>event($arg1, $arg2, $arg3)
>exit()
>}
> 
> The only caveat is that you must pass the "-g" argument to stap.
> 
> Also, for some reason the printf in the probe always prints zeros, no matter
> what the actual event receives (I've debugged QEMU down to the call to the
> auto-generated stap functions). Could this be an error in systemtap?

It's strange that arguments do not have valid values.  Debugging the
stap functions is the next step if you want to figure out what happened.
I've never had this issue before so maybe something with Debian
SystemTap userspace probes is broken.

Stefan


signature.asc
Description: PGP signature

[Qemu-devel] [PATCH v2 3/7] ppc/pnv: Add XSCOM infrastructure

2016-08-31 Thread Cédric Le Goater

From: Benjamin Herrenschmidt 

XSCOM is an interface to a sideband bus provided by the POWER8 chip
pervasive unit, which gives access to a number of facilities in the
chip that are needed by the OPAL firmware and to a lesser extent,
Linux. This is among others how the PCI Host bridges get configured
at boot or how the LPC bus is accessed.

This provides a simple bus and device type for devices sitting on
XSCOM along with some facilities to optionally generate corresponding
device-tree nodes

Signed-off-by: Benjamin Herrenschmidt 
[clg: updated for qemu-2.7
  ported on new sPowerNVMachineState which was merged with PnvSystem
  removed TRACE_XSCOM
  fixed checkpatch errors
  replaced assert with error_setg in xscom_realize()
  reworked xscom_create
  introduced the use of the chip_class for chip model contants
  ]
Signed-off-by: Cédric Le Goater 
---

 They were some discussions on whether we should use a qemu
 address_space instead of the xscom ranges defined in this patch. 
 I gave it try, it is possible but it brings extra unnecessary calls
 and complexity. I think the current solution is better.

 hw/ppc/Makefile.objs   |   2 +-
 hw/ppc/pnv.c   |  11 ++
 hw/ppc/pnv_xscom.c | 408 +
 include/hw/ppc/pnv.h   |   2 +
 include/hw/ppc/pnv_xscom.h |  75 +
 5 files changed, 497 insertions(+), 1 deletion(-)
 create mode 100644 hw/ppc/pnv_xscom.c
 create mode 100644 include/hw/ppc/pnv_xscom.h

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index 8105db7d5600..f580e5c41413 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -6,7 +6,7 @@ obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o 
spapr_rtas.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
 obj-$(CONFIG_PSERIES) += spapr_cpu_core.o
 # IBM PowerNV
-obj-$(CONFIG_POWERNV) += pnv.o
+obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 06051268e200..a6e7f66b2c0a 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -39,6 +39,8 @@
 #include "exec/address-spaces.h"
 #include "qemu/cutils.h"
 
+#include "hw/ppc/pnv_xscom.h"
+
 #include 
 
 #define FDT_ADDR0x0100
@@ -103,6 +105,7 @@ static void *powernv_create_fdt(PnvMachineState *pnv,
 char *buf;
 const char plat_compat[] = "qemu,powernv\0ibm,powernv";
 int off;
+int i;
 
 fdt = g_malloc0(FDT_MAX_SIZE);
 _FDT((fdt_create_empty_tree(fdt, FDT_MAX_SIZE)));
@@ -142,6 +145,11 @@ static void *powernv_create_fdt(PnvMachineState *pnv,
 /* Memory */
 powernv_populate_memory(fdt);
 
+/* Populate XSCOM for each chip */
+for (i = 0; i < pnv->num_chips; i++) {
+xscom_populate_fdt(pnv->chips[i]->xscom, fdt, 0);
+}
+
 return fdt;
 }
 
@@ -305,6 +313,9 @@ static void pnv_chip_realize(DeviceState *dev, Error **errp)
 PnvChip *chip = PNV_CHIP(dev);
 PnvChipClass *pcc = PNV_CHIP_GET_CLASS(chip);
 
+/* Set up XSCOM bus */
+chip->xscom = xscom_create(chip);
+
 pcc->realize(chip, errp);
 }
 
diff --git a/hw/ppc/pnv_xscom.c b/hw/ppc/pnv_xscom.c
new file mode 100644
index ..7ed3804f4b3a
--- /dev/null
+++ b/hw/ppc/pnv_xscom.c
@@ -0,0 +1,408 @@
+
+/*
+ * QEMU PowerNV XSCOM bus definitions
+ *
+ * Copyright (c) 2010 David Gibson, IBM Corporation 
+ * Based on the s390 virtio bus code:
+ * Copyright (c) 2009 Alexander Graf 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+/* TODO: Add some infrastructure for "random stuff" and FIRs that
+ * various units might want to deal with without creating actual
+ * XSCOM devices.
+ *
+ * For example, HB LPC XSCOM in the PIBAM
+ */
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/hw.h"
+#include "sysemu/sysemu.h"
+#include "hw/boards.h"
+#include "monitor/monitor.h"
+#include "hw/loader.h"
+#include "elf.h"
+#include "hw/sysbus.h"
+#include "sysemu/kvm.h"
+#include "sysemu/device_tree.h"
+#include "hw/ppc/fdt.h"
+
+#include "hw/ppc/pnv_xscom.h"
+
+#include 
+
+#define TYPE_XSCOM "xscom"
+#define XSCOM(obj) OBJECT_CHECK(XScomState, (obj), TYPE_XSCOM)
+
+#define XSCOM_SIZE

[Qemu-devel] [PATCH v2 2/7] ppc/pnv: add a PnvChip object

2016-08-31 Thread Cédric Le Goater

This is is an abstraction of a POWER8 chip which is a set of cores
plus other 'units', like the pervasive unit, the interrupt controller,
the memory controller, the on-chip microcontroller, etc. The whole can
be seen as a socket. It depends on a cpu model and its characteristics,
max cores, specific init are defined in a PnvChipClass.

We start with an near empty PnvChip with only a few cpu constants
which we will grow in the subsequent patches with the controllers
required to run the system.

Signed-off-by: Cédric Le Goater 
---

 Changes since v1:
 
 - introduced a PnvChipClass depending on the cpu model. It also
   provides some chip constants used by devices, like the cpu model hw
   id (f000f), a enum type (not sure this is useful yet), a custom
   realize ops for customization.
 - the num-chips property can be configured on the command line.
 
 Maybe this object deserves its own file hw/ppc/pnv_chip.c ? 

 hw/ppc/pnv.c | 154 +++
 include/hw/ppc/pnv.h |  71 
 2 files changed, 225 insertions(+)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 70413e3c5740..06051268e200 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -168,6 +168,8 @@ static void ppc_powernv_init(MachineState *machine)
 char *fw_filename;
 long fw_size;
 long kernel_size;
+int i;
+char *chip_typename;
 
 /* allocate RAM */
 if (ram_size < (1 * G_BYTE)) {
@@ -212,6 +214,153 @@ static void ppc_powernv_init(MachineState *machine)
 exit(1);
 }
 }
+
+/* Create the processor chips */
+chip_typename = g_strdup_printf(TYPE_PNV_CHIP "-%s", machine->cpu_model);
+
+pnv->chips = g_new0(PnvChip *, pnv->num_chips);
+for (i = 0; i < pnv->num_chips; i++) {
+Object *chip = object_new(chip_typename);
+object_property_set_int(chip, CHIP_HWID(i), "chip-id", _abort);
+object_property_set_bool(chip, true, "realized", _abort);
+pnv->chips[i] = PNV_CHIP(chip);
+}
+g_free(chip_typename);
+}
+
+static void pnv_chip_power8nvl_realize(PnvChip *chip, Error **errp)
+{
+;
+}
+
+static void pnv_chip_power8nvl_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+PnvChipClass *k = PNV_CHIP_CLASS(klass);
+
+k->realize = pnv_chip_power8nvl_realize;
+k->cpu_model = "POWER8NVL";
+k->chip_type = PNV_CHIP_P8NVL;
+k->chip_f000f = 0x120d30498000ull;
+dc->desc = "PowerNV Chip POWER8NVL";
+}
+
+static const TypeInfo pnv_chip_power8nvl_info = {
+.name  = TYPE_PNV_CHIP_POWER8NVL,
+.parent= TYPE_PNV_CHIP,
+.instance_size = sizeof(PnvChipPower8NVL),
+.class_init= pnv_chip_power8nvl_class_init,
+};
+
+static void pnv_chip_power8_realize(PnvChip *chip, Error **errp)
+{
+;
+}
+
+static void pnv_chip_power8_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+PnvChipClass *k = PNV_CHIP_CLASS(klass);
+
+k->realize = pnv_chip_power8_realize;
+k->cpu_model = "POWER8";
+k->chip_type = PNV_CHIP_P8;
+k->chip_f000f = 0x220ea0498000ull;
+dc->desc = "PowerNV Chip POWER8";
+}
+
+static const TypeInfo pnv_chip_power8_info = {
+.name  = TYPE_PNV_CHIP_POWER8,
+.parent= TYPE_PNV_CHIP,
+.instance_size = sizeof(PnvChipPower8),
+.class_init= pnv_chip_power8_class_init,
+};
+
+static void pnv_chip_power8e_realize(PnvChip *chip, Error **errp)
+{
+;
+}
+
+static void pnv_chip_power8e_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+PnvChipClass *k = PNV_CHIP_CLASS(klass);
+
+k->realize = pnv_chip_power8e_realize;
+k->cpu_model = "POWER8E";
+k->chip_type = PNV_CHIP_P8E;
+k->chip_f000f = 0x221ef0498000ull;
+dc->desc = "PowerNV Chip POWER8E";
+}
+
+static const TypeInfo pnv_chip_power8e_info = {
+.name  = TYPE_PNV_CHIP_POWER8E,
+.parent= TYPE_PNV_CHIP,
+.instance_size = sizeof(PnvChipPower8e),
+.class_init= pnv_chip_power8e_class_init,
+};
+
+static void pnv_chip_realize(DeviceState *dev, Error **errp)
+{
+PnvChip *chip = PNV_CHIP(dev);
+PnvChipClass *pcc = PNV_CHIP_GET_CLASS(chip);
+
+pcc->realize(chip, errp);
+}
+
+static Property pnv_chip_properties[] = {
+DEFINE_PROP_UINT32("chip-id", PnvChip, chip_id, 0),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void pnv_chip_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->realize = pnv_chip_realize;
+dc->props = pnv_chip_properties;
+dc->desc = "PowerNV Chip";
+ }
+
+static const TypeInfo pnv_chip_info = {
+.name  = TYPE_PNV_CHIP,
+.parent= TYPE_SYS_BUS_DEVICE,
+.class_init= pnv_chip_class_init,
+.class_size= sizeof(PnvChipClass),
+.abstract  = true,
+};
+
+static char *pnv_get_num_chips(Object *obj, Error **errp)
+{
+return

[Qemu-devel] [v3 5/6] hw/acpi: report IOAPIC on IVRS

2016-08-31 Thread David Kiarie

Report IOAPIC via IVRS which effectively allows linux AMD-Vi
driver to enable interrupt remapping

Signed-off-by: David Kiarie 
---
 hw/i386/acpi-build.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 49bd183..c2559ff 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2615,6 +2615,8 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
  *   Refer to Spec - Table 95:IVHD Device Entry Type Codes(4-byte)
  */
 build_append_int_noprefix(table_data, 0x001, 4);
+/* IOAPIC represented as an 8-byte entry. Spec v2.62 Tables 97 */
+build_append_int_noprefix(table_data, 0x0100a000cf48, 8);
 
 build_header(linker, table_data, (void *)(table_data->data + iommu_start),
  "IVRS", table_data->len - iommu_start, 1, NULL, NULL);
-- 
2.1.4

[Qemu-devel] [v3 4/6] hw/iommu: AMD IOMMU interrupt remapping

2016-08-31 Thread David Kiarie

Introduce AMD IOMMU interrupt remapping and hook it onto
the existing interrupt remapping infrastructure

Signed-off-by: David Kiarie 
---
 hw/i386/amd_iommu.c | 241 +++-
 hw/i386/amd_iommu.h |   4 +-
 hw/intc/ioapic.c|   9 +-
 3 files changed, 249 insertions(+), 5 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 226fea5..54519bb 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -18,8 +18,10 @@
  * with this program; if not, see .
  *
  * Cache implementation inspired by hw/i386/intel_iommu.c
+ *
  */
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 #include "hw/i386/amd_iommu.h"
 #include "trace.h"
 
@@ -255,6 +257,31 @@ typedef struct QEMU_PACKED {
 uint32_t reserved_5:16;
 } CMDCompletePPR;
 
+typedef union IRTE {
+struct {
+#ifdef HOST_WORDS_BIGENDIAN
+uint32_t destination:8;
+uint32_t rsvd_1:1;
+uint32_t dm:1;
+uint32_t rq_eoi:1;
+uint32_t int_type:3;
+uint32_t no_fault:1;
+uint32_t valid:1;
+#else
+uint32_t valid:1;
+uint32_t no_fault:1;
+uint32_t int_type:3;
+uint32_t rq_eoi:1;
+uint32_t dm:1;
+uint32_t rsvd_1:1;
+uint32_t destination:8;
+#endif
+uint32_t vector:8;
+uint32_t rsvd_2:8;
+} bits;
+uint32_t data;
+} IRTE;
+
 /* configure MMIO registers at startup/reset */
 static void amdvi_set_quad(AMDVIState *s, hwaddr addr, uint64_t val,
uint64_t romask, uint64_t w1cmask)
@@ -641,6 +668,11 @@ static void amdvi_inval_inttable(AMDVIState *s, 
CMDInvalIntrTable *inval)
 amdvi_log_illegalcom_error(s, inval->type, s->cmdbuf + s->cmdbuf_head);
 return;
 }
+
+if (s->ir_cache) {
+x86_iommu_iec_notify_all(X86_IOMMU_DEVICE(s), true, 0, 0);
+}
+
 trace_amdvi_intr_inval();
 }
 
@@ -1203,6 +1235,197 @@ static IOMMUTLBEntry amdvi_translate(MemoryRegion 
*iommu, hwaddr addr,
 return ret;
 }
 
+static inline int amdvi_ir_handle_non_vectored(MSIMessage *src,
+   MSIMessage *dst, uint8_t bitpos,
+   uint64_t dte)
+{
+if ((dte & (1UL << bitpos))) {
+/* passing interrupt enabled */
+memcpy(dst, src, sizeof(*dst));
+} else {
+/* should be target aborted */
+return -AMDVI_TARGET_ABORT;
+}
+return 0;
+}
+
+static int amdvi_remap_ir_intctl(uint64_t dte, IRTE irte,
+ MSIMessage *src, MSIMessage *dst)
+{
+int ret = 0;
+
+switch ((dte >> AMDVI_DTE_INTCTL_RSHIFT) & 3UL) {
+case AMDVI_INTCTL_PASS:
+/* pass */
+memcpy(dst, src, sizeof(*dst));
+break;
+case AMDVI_INTCTL_REMAP:
+/* remap */
+if (irte.bits.valid) {
+/* LOCAL APIC address */
+dst->address = AMDVI_LOCAL_APIC_ADDR;
+/* destination mode */
+dst->address |= ((uint64_t)irte.bits.dm) <<
+AMDVI_MSI_ADDR_DM_RSHIFT;
+/* RH */
+dst->address |= ((uint64_t)irte.bits.rq_eoi) <<
+AMDVI_MSI_ADDR_RH_RSHIFT;
+/* Destination ID */
+dst->address |= ((uint64_t)irte.bits.destination) <<
+AMDVI_MSI_ADDR_DEST_RSHIFT;
+/* construct data - vector */
+dst->data |= irte.bits.vector;
+/* Interrupt type */
+dst->data |= ((uint64_t)irte.bits.int_type) <<
+ AMDVI_MSI_DATA_DM_RSHIFT;
+} else  {
+ret = -AMDVI_TARGET_ABORT;
+}
+break;
+case AMDVI_INTCTL_ABORT:
+case AMDVI_INTCTL_RSVD:
+ret = -AMDVI_TARGET_ABORT;
+}
+return ret;
+}
+
+static int amdvi_irte_get(AMDVIState *s, MSIMessage *src, IRTE *irte,
+  uint64_t *dte, uint16_t devid)
+{
+uint64_t irte_root, offset = devid * AMDVI_DEVTAB_ENTRY_SIZE,
+ ir_table_size;
+
+irte_root = dte[2] & AMDVI_IRTEROOT_MASK;
+offset = (src->data & AMDVI_IRTE_INDEX_MASK) << 2;
+ir_table_size = 1UL << (dte[2] & AMDVI_IR_TABLE_SIZE_MASK);
+/* enforce IR table size */
+if (offset > (ir_table_size * AMDVI_DEFAULT_IRTE_SIZE)) {
+trace_amdvi_invalid_irte_entry(offset, ir_table_size);
+return -AMDVI_TARGET_ABORT;
+}
+/* read IRTE */
+if (dma_memory_read(_space_memory, irte_root + offset,
+irte, sizeof(*irte))) {
+trace_amdvi_irte_get_fail(irte_root, offset);
+return -AMDVI_DEV_TAB_HW;
+}
+return 0;
+}
+
+static int amdvi_int_remap(X86IOMMUState *iommu, MSIMessage *src,
+   MSIMessage *dst, uint16_t sid)
+{
+trace_amdvi_ir_request(src->data, src->address, sid);
+
+AMDVIState *s = AMD_IOMMU_DEVICE(iommu);
+int ret = 0;
+uint64_t

[Qemu-devel] [v3 1/6] hw/msi: Allow platform devices to use explicit SID

2016-08-31 Thread David Kiarie

When using IOMMU platform devices like IOAPIC are required to make
interrupt remapping requests using explicit SID.We affiliate an MSI
route with a requester ID and a PCI device if present which ensures
that platform devices can call IOMMU interrupt remapping code with
explicit SID while maintaining compatility with the original code
which mainly dealt with PCI devices.

Signed-off-by: David Kiarie 
---
 hw/i386/intel_iommu.c |  3 +++
 hw/i386/kvm/pci-assign.c  | 12 
 hw/intc/ioapic.c  | 25 +
 hw/misc/ivshmem.c |  6 --
 hw/vfio/pci.c |  6 --
 hw/virtio/virtio-pci.c|  7 +--
 include/hw/i386/ioapic_internal.h |  1 +
 include/hw/i386/x86-iommu.h   |  1 +
 include/sysemu/kvm.h  | 25 ++---
 kvm-all.c | 10 ++
 kvm-stub.c|  5 +++--
 qemu-version.h|  1 +
 target-i386/kvm.c | 15 +--
 13 files changed, 80 insertions(+), 37 deletions(-)
 create mode 100644 qemu-version.h

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d6e02c8..496d836 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2466,6 +2466,9 @@ static void vtd_realize(DeviceState *dev, Error **errp)
 vtd_init(s);
 sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
 pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
+/* IOMMU expected IOAPIC SID */
+x86_iommu->ioapic_bdf = PCI_BUILD_BDF(Q35_PSEUDO_DEVFN_IOAPIC,
+Q35_PSEUDO_DEVFN_IOAPIC);
 /* Pseudo address space under root PCI bus. */
 pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
 
diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index 8238fbc..3f26be1 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -976,7 +976,8 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
 if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) {
 int virq;
 
-virq = kvm_irqchip_add_msi_route(kvm_state, 0, pci_dev);
+virq = kvm_irqchip_add_msi_route(kvm_state, 0, pci_dev,
+ pci_requester_id(pci_dev));
 if (virq < 0) {
 perror("assigned_dev_update_msi: kvm_irqchip_add_msi_route");
 return;
@@ -1014,7 +1015,8 @@ static void assigned_dev_update_msi_msg(PCIDevice 
*pci_dev)
 }
 
 kvm_irqchip_update_msi_route(kvm_state, assigned_dev->msi_virq[0],
- msi_get_message(pci_dev, 0), pci_dev);
+ msi_get_message(pci_dev, 0), pci_dev,
+ pci_requester_id(pci_dev));
 kvm_irqchip_commit_routes(kvm_state);
 }
 
@@ -1078,7 +1080,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 continue;
 }
 
-r = kvm_irqchip_add_msi_route(kvm_state, i, pci_dev);
+r = kvm_irqchip_add_msi_route(kvm_state, i, pci_dev,
+  pci_requester_id(pci_dev));
 if (r < 0) {
 return r;
 }
@@ -1599,7 +1602,8 @@ static void assigned_dev_msix_mmio_write(void *opaque, 
hwaddr addr,
 
 ret = kvm_irqchip_update_msi_route(kvm_state,
adev->msi_virq[i], msg,
-   pdev);
+   pdev,
+   pci_requester_id(pdev));
 if (ret) {
 error_report("Error updating irq routing entry (%d)", ret);
 }
diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 31791b0..b8b2f33 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -95,9 +95,17 @@ static void ioapic_entry_parse(uint64_t entry, struct 
ioapic_entry_info *info)
 (info->delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
 }
 
-static void ioapic_service(IOAPICCommonState *s)
+static void ioapic_as_write(IOAPICCommonState *s, uint32_t data, uint64_t addr)
 {
 AddressSpace *ioapic_as = PC_MACHINE(qdev_get_machine())->ioapic_as;
+MemTxAttrs attrs;
+
+attrs.requester_id = s->devid;
+address_space_stl_le(ioapic_as, addr, data, attrs, NULL);
+}
+
+static void ioapic_service(IOAPICCommonState *s)
+{
 struct ioapic_entry_info info;
 uint8_t i;
 uint32_t mask;
@@ -141,7 +149,7 @@ static void ioapic_service(IOAPICCommonState *s)
  * the IOAPIC message into a MSI one, and its
  * address space will decide whether we need a
  * translation. */
-stl_le_phys(ioapic_as, info.addr, info.data);
+ioapic_as_write(s, info.data, info.addr);
 }
 }
 }
@@ -197,7 +205,7 @@ static void

[Qemu-devel] [v3 0/6] AMD IOMMU

2016-08-31 Thread David Kiarie

Hello all,

Changes since V2
  -formating fixes.
  -fixed an issue where the right IOAPIC id was not being correctly set when 
using kernel_irqchip=off

The following patchset implements AMD-Vi interrupt remapping logic and hooks it 
onto existing IR infrastucture.

I have bundled this patchset together with the "Explicit SID for 
IOAPIC"."Explicit SID for IOAPIC" functions to 
affiliate MSI routes with a requester ID and a PCI device if present which 
enables platform devices like IOAPIC to
make interrupt requests using an explicit SID as required by both VT-d and 
AMD-Vi.

David Kiarie (6):
  hw/msi: Allow platform devices to use explicit SID
  hw/i386: enforce SID verification
  hw/iommu: Prepare for AMD IOMMU interrupt remapping
  hw/iommu: AMD IOMMU interrupt remapping
  hw/acpi: report IOAPIC on IVRS
  hw/iommu: share common code between IOMMUs

 hw/i386/acpi-build.c  |   2 +
 hw/i386/amd_iommu.c   | 241 +-
 hw/i386/amd_iommu.h   |  82 +
 hw/i386/intel_iommu.c |  89 +++---
 hw/i386/kvm/pci-assign.c  |  12 +-
 hw/i386/trace-events  |   7 ++
 hw/i386/x86-iommu.c   |   8 ++
 hw/intc/ioapic.c  |  30 -
 hw/misc/ivshmem.c |   6 +-
 hw/vfio/pci.c |   6 +-
 hw/virtio/virtio-pci.c|   7 +-
 include/hw/i386/ioapic_internal.h |   1 +
 include/hw/i386/x86-iommu.h   |   1 +
 include/sysemu/kvm.h  |  25 ++--
 kvm-all.c |  10 +-
 kvm-stub.c|   5 +-
 qemu-version.h|   1 +
 target-i386/kvm.c |  15 ++-
 18 files changed, 462 insertions(+), 86 deletions(-)
 create mode 100644 qemu-version.h

-- 
2.1.4

[Qemu-devel] [v3 6/6] hw/iommu: share common code between IOMMUs

2016-08-31 Thread David Kiarie

Enabling interrupt remapping with kernel_irqchip=on should result
in an error for both VT-d and AMD-Vi

Signed-off-by: David Kiarie 
---
 hw/i386/intel_iommu.c | 9 -
 hw/i386/x86-iommu.c   | 8 
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index e4bad6a..bf86dcc 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -30,7 +30,6 @@
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
 #include "hw/pci-host/q35.h"
-#include "sysemu/kvm.h"
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -2472,14 +2471,6 @@ static void vtd_realize(DeviceState *dev, Error **errp)
 Q35_PSEUDO_DEVFN_IOAPIC);
 /* Pseudo address space under root PCI bus. */
 pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
-
-/* Currently Intel IOMMU IR only support "kernel-irqchip={off|split}" */
-if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() &&
-!kvm_irqchip_is_split()) {
-error_report("Intel Interrupt Remapping cannot work with "
- "kernel-irqchip=on, please use 'split|off'.");
-exit(1);
-}
 }
 
 static void vtd_class_init(ObjectClass *klass, void *data)
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 2278af7..66510f7 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -21,6 +21,7 @@
 #include "hw/sysbus.h"
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
+#include "sysemu/kvm.h"
 #include "qemu/error-report.h"
 #include "trace.h"
 
@@ -84,6 +85,13 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
 if (x86_class->realize) {
 x86_class->realize(dev, errp);
 }
+/* Currently IOMMU IR only support "kernel-irqchip={off|split}" */
+if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() &&
+!kvm_irqchip_is_split()) {
+error_report("Interrupt Remapping cannot work with "
+ "kernel-irqchip=on, please use 'split|off'.");
+exit(1);
+}
 
 x86_iommu_set_default(X86_IOMMU_DEVICE(dev));
 }
-- 
2.1.4

[Qemu-devel] [PATCH v2] sh4: fix broken link to documentation

2016-08-31 Thread Reda Sallahi

The page that was previously linked in the source code and the README file is
no longer available so it now returns a 404 error message.

This puts a previous snapshot from archive.org instead.

Signed-off-by: Reda Sallahi 
---
Changes from v1:
* Add the 'https://' part to the link in hw/sh4/shix.c.

 hw/sh4/shix.c | 2 +-
 target-sh4/README.sh4 | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/sh4/shix.c b/hw/sh4/shix.c
index ccc9e75..14d4007 100644
--- a/hw/sh4/shix.c
+++ b/hw/sh4/shix.c
@@ -23,7 +23,7 @@
  */
 /*
Shix 2.0 board by Alexis Polti, described at
-   http://perso.enst.fr/~polti/realisations/shix20/
+   
https://web.archive.org/web/20070917001736/perso.enst.fr/~polti/realisations/shix20
 
More information in target-sh4/README.sh4
 */
diff --git a/target-sh4/README.sh4 b/target-sh4/README.sh4
index e578830..ece0464 100644
--- a/target-sh4/README.sh4
+++ b/target-sh4/README.sh4
@@ -25,7 +25,7 @@ Goals
 
 The primary model being worked on is the soft MMU target to be able to
 emulate the Shix 2.0 board by Alexis Polti, described at
-http://perso.enst.fr/~polti/realisations/shix20/
+https://web.archive.org/web/20070917001736/http://perso.enst.fr/~polti/realisations/shix20/
 
 Ultimately, qemu will be coupled with a system C or a verilog
 simulator to simulate the whole board functionalities.
-- 
2.9.3

[Qemu-devel] [v3 2/6] hw/i386: enforce SID verification

2016-08-31 Thread David Kiarie

Platform device are now able to make interrupt request with
explicit SIDs hence we can safely expect triggered AddressSpace ID
to match the requesting ID

Signed-off-by: David Kiarie 
---
 hw/i386/intel_iommu.c | 77 ++-
 1 file changed, 39 insertions(+), 38 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 496d836..e4bad6a 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2043,43 +2043,41 @@ static int vtd_irte_get(IntelIOMMUState *iommu, 
uint16_t index,
 return -VTD_FR_IR_IRTE_RSVD;
 }
 
-if (sid != X86_IOMMU_SID_INVALID) {
-/* Validate IRTE SID */
-source_id = le32_to_cpu(entry->irte.source_id);
-switch (entry->irte.sid_vtype) {
-case VTD_SVT_NONE:
-VTD_DPRINTF(IR, "No SID validation for IRTE index %d", index);
-break;
-
-case VTD_SVT_ALL:
-mask = vtd_svt_mask[entry->irte.sid_q];
-if ((source_id & mask) != (sid & mask)) {
-VTD_DPRINTF(GENERAL, "SID validation for IRTE index "
-"%d failed (reqid 0x%04x sid 0x%04x)", index,
-sid, source_id);
-return -VTD_FR_IR_SID_ERR;
-}
-break;
+/* Validate IRTE SID */
+source_id = le32_to_cpu(entry->irte.source_id);
+switch (entry->irte.sid_vtype) {
+case VTD_SVT_NONE:
+VTD_DPRINTF(IR, "No SID validation for IRTE index %d", index);
+break;
 
-case VTD_SVT_BUS:
-bus_max = source_id >> 8;
-bus_min = source_id & 0xff;
-bus = sid >> 8;
-if (bus > bus_max || bus < bus_min) {
-VTD_DPRINTF(GENERAL, "SID validation for IRTE index %d "
-"failed (bus %d outside %d-%d)", index, bus,
-bus_min, bus_max);
-return -VTD_FR_IR_SID_ERR;
-}
-break;
+case VTD_SVT_ALL:
+mask = vtd_svt_mask[entry->irte.sid_q];
+if ((source_id & mask) != (sid & mask)) {
+VTD_DPRINTF(GENERAL, "SID validation for IRTE index "
+"%d failed (reqid 0x%04x sid 0x%04x)", index,
+sid, source_id);
+return -VTD_FR_IR_SID_ERR;
+}
+break;
 
-default:
-VTD_DPRINTF(GENERAL, "Invalid SVT bits (0x%x) in IRTE index "
-"%d", entry->irte.sid_vtype, index);
-/* Take this as verification failure. */
+case VTD_SVT_BUS:
+bus_max = source_id >> 8;
+bus_min = source_id & 0xff;
+bus = sid >> 8;
+if (bus > bus_max || bus < bus_min) {
+VTD_DPRINTF(GENERAL, "SID validation for IRTE index %d "
+"failed (bus %d outside %d-%d)", index, bus,
+bus_min, bus_max);
 return -VTD_FR_IR_SID_ERR;
-break;
 }
+break;
+
+default:
+VTD_DPRINTF(GENERAL, "Invalid SVT bits (0x%x) in IRTE index "
+"%d", entry->irte.sid_vtype, index);
+/* Take this as verification failure. */
+return -VTD_FR_IR_SID_ERR;
+break;
 }
 
 return 0;
@@ -2252,14 +2250,17 @@ static MemTxResult vtd_mem_ir_write(void *opaque, 
hwaddr addr,
 {
 int ret = 0;
 MSIMessage from = {}, to = {};
-uint16_t sid = X86_IOMMU_SID_INVALID;
+VTDAddressSpace *as = opaque;
+uint16_t sid = PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
 
 from.address = (uint64_t) addr + VTD_INTERRUPT_ADDR_FIRST;
 from.data = (uint32_t) value;
 
-if (!attrs.unspecified) {
-/* We have explicit Source ID */
-sid = attrs.requester_id;
+if (attrs.requester_id != sid) {
+VTD_DPRINTF(GENERAL, "int remap request for sid 0x%04x"
+" requester_id 0x%04x couldn't be verified",
+sid, attrs.requester_id);
+return MEMTX_ERROR;
 }
 
 ret = vtd_interrupt_remap_msi(opaque, , , sid);
@@ -2325,7 +2326,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, 
PCIBus *bus, int devfn)
 memory_region_init_iommu(_dev_as->iommu, OBJECT(s),
  >iommu_ops, "intel_iommu", UINT64_MAX);
 memory_region_init_io(_dev_as->iommu_ir, OBJECT(s),
-  _mem_ir_ops, s, "intel_iommu_ir",
+  _mem_ir_ops, vtd_dev_as, "intel_iommu_ir",
   VTD_INTERRUPT_ADDR_SIZE);
 memory_region_add_subregion(_dev_as->iommu, 
VTD_INTERRUPT_ADDR_FIRST,
 _dev_as->iommu_ir);
-- 
2.1.4

[Qemu-devel] [V17 2/4] hw/i386/trace-events: Add AMD IOMMU trace events

2016-08-31 Thread David Kiarie

Signed-off-by: David Kiarie 
---
 hw/i386/trace-events | 29 +
 1 file changed, 29 insertions(+)

diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 7735e46..60bdf6a 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -13,3 +13,32 @@ mhp_pc_dimm_assigned_address(uint64_t addr) "0x%"PRIx64
 
 # hw/i386/x86-iommu.c
 x86_iommu_iec_notify(bool global, uint32_t index, uint32_t mask) "Notify IEC 
invalidation: global=%d index=%" PRIu32 " mask=%" PRIu32
+
+# hw/i386/amd_iommu.c
+amdvi_evntlog_fail(uint64_t addr, uint32_t head) "error: fail to write at addr 
0x%"PRIx64 " +  offset 0x%"PRIx32
+amdvi_cache_update(uint16_t domid, uint32_t bus, uint32_t slot, uint32_t func, 
uint64_t gpa, uint64_t txaddr) " update iotlb domid 0x%"PRIx16" devid: 
%02x:%02x.%x gpa 0x%"PRIx64 " hpa 0x%"PRIx64
+amdvi_completion_wait_fail(uint64_t addr) "error: fail to write at address 
0x%"PRIx64
+amdvi_mmio_write(const char *reg, uint64_t addr, unsigned size, uint64_t val, 
unsigned long offset) "%s write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64 ", 
offset 0x%"PRIx64
+amdvi_mmio_read(const char *reg, uint64_t addr, unsigned size, uint64_t 
offset) "%s read addr 0x%"PRIx64", size %d offset 0x%"PRIx64
+amdvi_command_error(uint64_t status) "error: Executing commands with command 
buffer disabled 0x%"PRIx64
+amdvi_command_read_fail(uint64_t addr, uint32_t head) "error: fail to access 
memory at 0x%"PRIx64" + 0x%"PRIu32
+amdvi_command_exec(uint32_t head, uint32_t tail, uint64_t buf) "command buffer 
head at 0x%"PRIx32 " command buffer tail at 0x%"PRIx32" command buffer base at 
0x%" PRIx64
+amdvi_unhandled_command(uint8_t type) "unhandled command %d"
+amdvi_intr_inval(void) "Interrupt table invalidated"
+amdvi_iotlb_inval(void) "IOTLB pages invalidated"
+amdvi_prefetch_pages(void) "Pre-fetch of AMD-Vi pages requested"
+amdvi_pages_inval(uint16_t domid) "AMD-Vi pages for domain 0x%"PRIx16 " 
invalidated"
+amdvi_all_inval(void) "Invalidation of all AMD-Vi cache requested "
+amdvi_ppr_exec(void) "Execution of PPR queue requested "
+amdvi_devtab_inval(uint16_t bus, uint16_t slot, uint16_t func) "device table 
entry for devid: %02x:%02x.%x invalidated"
+amdvi_completion_wait(uint64_t addr, uint64_t data) "completion wait requested 
with store address 0x%"PRIx64" and store data 0x%"PRIx64
+amdvi_control_status(uint64_t val) "MMIO_STATUS state 0x%"PRIx64
+amdvi_iotlb_reset(void) "IOTLB exceed size limit - reset "
+amdvi_completion_wait_exec(uint64_t addr, uint64_t data) "completion wait 
requested with store address 0x%"PRIx64" and store data 0x%"PRIx64
+amdvi_dte_get_fail(uint64_t addr, uint32_t offset) "error: failed to access 
Device Entry devtab 0x%"PRIx64" offset 0x%"PRIx32
+amdvi_invalid_dte(uint64_t addr) "PTE entry at 0x%"PRIx64" is invalid "
+amdvi_get_pte_hwerror(uint64_t addr) "hardware error eccessing PTE at addr 
0x%"PRIx64
+amdvi_mode_invalid(unsigned level, uint64_t addr)"error: translation level 
0x%"PRIu8" translating addr 0x%"PRIx64
+amdvi_page_fault(uint64_t addr) "error: page fault accessing guest physical 
address 0x%"PRIx64
+amdvi_iotlb_hit(uint16_t bus, uint16_t slot, uint16_t func, uint64_t addr, 
uint64_t txaddr) "hit iotlb devid %02x:%02x.%x gpa 0x%"PRIx64 " hpa 0x%"PRIx64
+amdvi_translation_result(uint16_t bus, uint16_t slot, uint16_t func, uint64_t 
addr, uint64_t txaddr) "devid: %02x:%02x.%x gpa 0x%"PRIx64 " hpa 0x%"PRIx64
-- 
2.1.4

[Qemu-devel] [v3 3/6] hw/iommu: Prepare for AMD IOMMU interrupt remapping

2016-08-31 Thread David Kiarie

Introduce macros and trace events for use in AMD IOMMU
interrupt remapping

Signed-off-by: David Kiarie 
---
 hw/i386/amd_iommu.h  | 80 
 hw/i386/trace-events |  7 +
 2 files changed, 87 insertions(+)

diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 884926e..5c4a13b 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -177,6 +177,68 @@
 #define AMDVI_IOTLB_MAX_SIZE 1024
 #define AMDVI_DEVID_SHIFT36
 
+/* interrupt types */
+#define AMDVI_MT_FIXED  0x0
+#define AMDVI_MT_ARBIT  0x1
+#define AMDVI_MT_SMI0x2
+#define AMDVI_MT_NMI0x3
+#define AMDVI_MT_INIT   0x4
+#define AMDVI_MT_EXTINT 0x6
+#define AMDVI_MT_LINT1  0xb
+#define AMDVI_MT_LINT0  0xe
+
+/* MSI interrupt type mask */
+#define AMDVI_IR_TYPE_MASK 0x300
+
+/* interrupt destination mode */
+#define AMDVI_IRDEST_MODE_MASK 0x2
+
+/* select MSI data 10:0 bits */
+#define AMDVI_IRTE_INDEX_MASK 0x7ff
+
+/* bits determining whether specific interrupts should be passed
+ * split DTE into 64-bit chunks
+ */
+#define AMDVI_DTE_INTPASS_LSHIFT   56
+#define AMDVI_DTE_EINTPASS_LSHIFT  57
+#define AMDVI_DTE_NMIPASS_LSHIFT   58
+#define AMDVI_DTE_INTCTL_RSHIFT60
+#define AMDVI_DTE_LINT0PASS_LSHIFT 62
+#define AMDVI_DTE_LINT1PASS_LSHIFT 63
+
+/* INTCTL expected values */
+#define AMDVI_INTCTL_ABORT  0x0
+#define AMDVI_INTCTL_PASS   0x1
+#define AMDVI_INTCTL_REMAP  0x2
+#define AMDVI_INTCTL_RSVD   0x3
+
+/* interrupt data valid */
+#define AMDVI_IR_VALID  (1UL << 0)
+
+/* interrupt root table mask */
+#define AMDVI_IRTEROOT_MASK 0xc0
+
+/* default IRTE size */
+#define AMDVI_DEFAULT_IRTE_SIZE 0x4
+
+#define AMDVI_IR_TABLE_SIZE_MASK 0xfe
+
+/* offsets into MSI data */
+#define AMDVI_MSI_DATA_DM_RSHIFT   0x8
+#define AMDVI_MSI_DATA_LEVEL_RSHIFT0xe
+#define AMDVI_MSI_DATA_TRM_RSHIFT  0xf
+
+/* offsets into MSI address */
+#define AMDVI_MSI_ADDR_DM_RSHIFT   0x2
+#define AMDVI_MSI_ADDR_RH_RSHIFT   0x3
+#define AMDVI_MSI_ADDR_DEST_RSHIFT 0xc
+
+#define AMDVI_BUS_NUM  0x0
+/* AMD-Vi specific IOAPIC Device function */
+#define AMDVI_DEVFN_IOAPIC 0xa0
+
+#define AMDVI_LOCAL_APIC_ADDR 0xfee0
+
 /* extended feature support */
 #define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
 AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \
@@ -214,6 +276,24 @@
 #define AMDVI_INT_ADDR_FIRST 0xfee0
 #define AMDVI_INT_ADDR_LAST  0xfeef
 
+#define AMDVI_INT_ADDR_SIZE ((AMDVI_INT_ADDR_LAST - \
+AMDVI_INT_ADDR_FIRST) + 1)
+/* AMD IOMMU errors */
+#define AMDVI_ILLEG_DEV_TAB  0x1
+#define AMDVI_IOPF_  0x2
+#define AMDVI_DEV_TAB_HW 0x3
+#define AMDVI_PAGE_TAB_HW0x4
+#define AMDVI_ILLEG_COM  0x5
+#define AMDVI_COM_HW 0x6
+#define AMDVI_IOTLB_TIMEOUT  0x7
+#define AMDVI_INVAL_DEV_REQ  0x8
+#define AMDVI_INVAL_PPR_REQ  0x9
+#define AMDVI_EVT_COUNT_ZERO 0xa
+
+/* represent target and master aborts error state */
+#define AMDVI_TARGET_ABORT 0xb
+#define AMDVI_MASTER_ABORT 0xc
+
 #define TYPE_AMD_IOMMU_DEVICE "amd-iommu"
 #define AMD_IOMMU_DEVICE(obj)\
 OBJECT_CHECK(AMDVIState, (obj), TYPE_AMD_IOMMU_DEVICE)
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 60bdf6a..344c2f6 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -42,3 +42,10 @@ amdvi_mode_invalid(unsigned level, uint64_t addr)"error: 
translation level 0x%"P
 amdvi_page_fault(uint64_t addr) "error: page fault accessing guest physical 
address 0x%"PRIx64
 amdvi_iotlb_hit(uint16_t bus, uint16_t slot, uint16_t func, uint64_t addr, 
uint64_t txaddr) "hit iotlb devid %02x:%02x.%x gpa 0x%"PRIx64 " hpa 0x%"PRIx64
 amdvi_translation_result(uint16_t bus, uint16_t slot, uint16_t func, uint64_t 
addr, uint64_t txaddr) "devid: %02x:%02x.%x gpa 0x%"PRIx64 " hpa 0x%"PRIx64
+amdvi_irte_get_fail(uint64_t addr, uint64_t offset) "couldn't access device 
table entry 0x%"PRIx64" + offset 0x%"PRIx64
+amdvi_invalid_irte_entry(uint16_t devid, uint64_t offset) "devid %x requested 
IRTE offset 0x%"PRIx64" Outside IR table range"
+amdvi_ir_request(uint32_t data, uint64_t addr, uint16_t sid) "IR request data 
0x%"PRIx32" address 0x%"PRIx64" SID %x"
+amdvi_ir_remap(uint32_t data, uint64_t addr, uint16_t sid) "IR remap data 
0x%"PRIx32" address 0x%"PRIx64" SID %x"
+amdvi_ir_target_abort(uint32_t data, uint64_t addr, uint16_t sid) "IR target 
abort data 0x%"PRIx32" address 0x%"PRIx64" SID %x"
+amdvi_ir_write_fail(uint64_t addr, uint32_t data) "fail to write to addr 
0x%"PRIx64 " value 0x%"PRIx32
+amdvi_ir_read_fail(uint64_t addr) " fail to read from addr 0x%"PRIx64
-- 
2.1.4

[Qemu-devel] [PATCH] i8257: Make device "i8257" unavailable with -device

2016-08-31 Thread Markus Armbruster

The ISA DMA controller needs to be wired up to the ISA bus by
isa_bus_dma() to actually work.

Signed-off-by: Markus Armbruster 
---
 hw/dma/i8257.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/dma/i8257.c b/hw/dma/i8257.c
index f345c54..f90df1d 100644
--- a/hw/dma/i8257.c
+++ b/hw/dma/i8257.c
@@ -598,6 +598,8 @@ static void i8257_class_init(ObjectClass *klass, void *data)
 idc->release_DREQ = i8257_dma_release_DREQ;
 idc->schedule = i8257_dma_schedule;
 idc->register_channel = i8257_dma_register_channel;
+/* Reason: needs to be wired up by isa_bus_dma() to work */
+dc->cannot_instantiate_with_device_add_yet = true;
 }
 
 static const TypeInfo i8257_info = {
-- 
2.5.5

[Qemu-devel] [V17 3/4] hw/i386: Introduce AMD IOMMU

2016-08-31 Thread David Kiarie

Add AMD IOMMU emulaton to Qemu in addition to Intel IOMMU.
The IOMMU does basic translation, error checking and has a
minimal IOTLB implementation. This IOMMU bypassed the need
for target aborts by responding with IOMMU_NONE access rights
and exempts the region 0xfee0-0xfeef from translation
as it is the q35 interrupt region.

We advertise features that are not yet implemented to please
the Linux IOMMU driver.

IOTLB aims at implementing commands on real IOMMUs which is
essential for debugging and may not offer any performance
benefits

Signed-off-by: David Kiarie 
---
 hw/i386/Makefile.objs |1 +
 hw/i386/amd_iommu.c   | 1381 +
 hw/i386/amd_iommu.h   |  289 +++
 3 files changed, 1671 insertions(+)
 create mode 100644 hw/i386/amd_iommu.c
 create mode 100644 hw/i386/amd_iommu.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 90e94ff..909ead6 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -3,6 +3,7 @@ obj-y += multiboot.o
 obj-y += pc.o pc_piix.o pc_q35.o
 obj-y += pc_sysfw.o
 obj-y += x86-iommu.o intel_iommu.o
+obj-y += amd_iommu.o
 obj-$(CONFIG_XEN) += ../xenpv/ xen/
 
 obj-y += kvmvapic.o
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
new file mode 100644
index 000..e6a4c58
--- /dev/null
+++ b/hw/i386/amd_iommu.c
@@ -0,0 +1,1381 @@
+/*
+ * QEMU emulation of AMD IOMMU (AMD-Vi)
+ *
+ * Copyright (C) 2011 Eduard - Gabriel Munteanu
+ * Copyright (C) 2015 David Kiarie, 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ *
+ * Cache implementation inspired by hw/i386/intel_iommu.c
+ */
+#include "qemu/osdep.h"
+#include "hw/i386/amd_iommu.h"
+#include "trace.h"
+
+/* used AMD-Vi MMIO registers */
+const char *amdvi_mmio_low[] = {
+"AMDVI_MMIO_DEVTAB_BASE",
+"AMDVI_MMIO_CMDBUF_BASE",
+"AMDVI_MMIO_EVTLOG_BASE",
+"AMDVI_MMIO_CONTROL",
+"AMDVI_MMIO_EXCL_BASE",
+"AMDVI_MMIO_EXCL_LIMIT",
+"AMDVI_MMIO_EXT_FEATURES",
+"AMDVI_MMIO_PPR_BASE",
+"UNHANDLED"
+};
+const char *amdvi_mmio_high[] = {
+"AMDVI_MMIO_COMMAND_HEAD",
+"AMDVI_MMIO_COMMAND_TAIL",
+"AMDVI_MMIO_EVTLOG_HEAD",
+"AMDVI_MMIO_EVTLOG_TAIL",
+"AMDVI_MMIO_STATUS",
+"AMDVI_MMIO_PPR_HEAD",
+"AMDVI_MMIO_PPR_TAIL",
+"UNHANDLED"
+};
+typedef struct AMDVIAddressSpace {
+uint8_t bus_num;/* bus number   */
+uint8_t devfn;  /* device function  */
+AMDVIState *iommu_state;/* AMDVI - one per machine  */
+MemoryRegion iommu; /* Device's address translation region  */
+MemoryRegion iommu_ir;  /* Device's interrupt remapping region  */
+AddressSpace as;/* device's corresponding address space */
+} AMDVIAddressSpace;
+
+/* AMDVI cache entry */
+typedef struct AMDVIIOTLBEntry {
+uint16_t domid; /* assigned domain id  */
+uint16_t devid; /* device owning entry */
+uint64_t perms; /* access permissions  */
+uint64_t translated_addr;   /* translated address  */
+uint64_t page_mask; /* physical page size  */
+} AMDVIIOTLBEntry;
+
+/* serialize IOMMU command processing */
+typedef struct QEMU_PACKED {
+#ifdef HOST_WORDS_BIGENDIAN
+uint64_t type:4;   /* command type   */
+uint64_t reserved:8;
+uint64_t store_addr:49;/* addr to write  */
+uint64_t completion_flush:1;   /* allow more executions  */
+uint64_t completion_int:1; /* set MMIOWAITINT*/
+uint64_t completion_store:1;   /* write data to address  */
+#else
+uint64_t completion_store:1;
+uint64_t completion_int:1;
+uint64_t completion_flush:1;
+uint64_t store_addr:49;
+uint64_t reserved:8;
+uint64_t type:4;
+#endif /* __BIG_ENDIAN_BITFIELD */
+uint64_t store_data;   /* data to write  */
+} CMDCompletionWait;
+
+/* invalidate internal caches for devid */
+typedef struct QEMU_PACKED {
+#ifdef HOST_WORDS_BIGENDIAN
+uint64_t devid:16; /* device to invalidate   */
+uint64_t reserved_1:44;
+uint64_t type:4;   /* command type   */
+#else
+uint64_t devid:16;
+uint64_t reserved_1:44;
+uint64_t type:4;
+#endif /* __BIG_ENDIAN_BITFIELD */
+uint64_t

[Qemu-devel] [V17 1/4] hw/pci: Prepare for AMD IOMMU

2016-08-31 Thread David Kiarie

Introduce PCI macros from for use by AMD IOMMU

Signed-off-by: David Kiarie 
---
 include/hw/pci/pci.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 929ec2f..5ff92de 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -11,11 +11,13 @@
 #include "hw/pci/pcie.h"
 
 /* PCI bus */
-
 #define PCI_DEVFN(slot, func)   slot) & 0x1f) << 3) | ((func) & 0x07))
+#define PCI_BUS_NUM(x)  (((x) >> 8) & 0xff)
 #define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
 #define PCI_FUNC(devfn) ((devfn) & 0x07)
 #define PCI_BUILD_BDF(bus, devfn) ((bus << 8) | (devfn))
+#define PCI_BUS_MAX 256
+#define PCI_DEVFN_MAX   256
 #define PCI_SLOT_MAX32
 #define PCI_FUNC_MAX8
 
-- 
2.1.4

[Qemu-devel] [V17 0/4] AMD IOMMU

2016-08-31 Thread David Kiarie

Hi all,

This patchset adds basic AMD IOMMU emulation support to Qemu. 

Changes since v16 - this is mainly supposed to come as a ping :-)
   -minor endian-ness fixes

Changes since v15
   -Endian-ness issue fix
   -cleaned up unused macros
   -removed guest frame number(gfn) from cache entry

Changes since v14
   -MMIO register reading/write bug fix [Peter]
   -Endian-ness issue fix[Peter]
   -Bitfields layouts in IOMMU commands fix[Peter]
   -IVRS changed IVHD device entry from type 3 to 1 to save a few bytes
   -coding style issues, comment grammer and other miscellaneous fixes.

Changes since v13
   -Added an error to make AMD IOMMU incompatible with device assignment.[Alex]
   -Converted AMD IOMMU into a composite PCI and System Bus device. This helps 
with:
  -We can now inherit from X86 IOMMU base class(which is implemented as a 
System Bus device).
  -We can now reserve MMIO region for IOMMU without a BAR register and 
without a hack.

Changes since v12

   -Coding style fixes [Jan, Michael]
   -Error logging fix to avoid using a macro[Jan]
   -moved some PCI macros to PCI header[Jan]
   -Use a lookup table for MMIO register names when tracing[Jan]

Changes since V11
   -AMD IOMMU is not started with -device amd-iommu (with a dependency on 
Marcel's patches).
   -IOMMU commands are represented using bitfields which is less error prone 
and more readable[Peter]
   -Changed from debug fprintfs to tracing[Jan]

Changes since V10
 
   -Support for huge pages including some obscure AMD IOMMU feature that allows 
default page size override[Jan].
   -Fixed an issue with generation of interrupts. We noted that AMD IOMMU has 
BusMaster- and is therefore not able to generate interrupts like any other PCI 
device. We have resulted in writing directly to system address but this could 
be fixed by some patches which have not been merged yet.

Changes since v9

   -amd_iommu prefixes have been renamed to a shorter 'amdvi' both in the macros
and in the functions/code. The register macros have not been moved to the 
implementation file since almost the macros there are basically macros and 
I 
reckoned renaming them should suffice.
   -taken care of byte order in the use of 'dma_memory_read'[Michael]
   -Taken care of invalid DTE entries to ensure no DMA unless a device is 
configured to allow it.
   -An issue with the emulate IOMMU defaulting to AMD_IOMMU has been 
fixed[Marcel]
   
You can test[1] this patches by starting with parameters 
qemu-system-x86_64 -M -device amd-iommu -m 2G -enable-kvm -smp 4 -cpu host 
-hda file.img -soundhw ac97 
emulating whatever devices you want.

Not passing any command line parameters to linux should be enough to test this 
patches since the devices are basically
passes-through but to the 'host' (l1 guest). You can still go ahead pass 
command line parameter 'iommu=pt iommu=1'
and try to pass a device to L2 guest. This can also done without passing any 
iommu related parameters to the kernel. 

David Kiarie (4):
  hw/pci: Prepare for AMD IOMMU
  hw/i386/trace-events: Add AMD IOMMU trace events
  hw/i386: Introduce AMD IOMMU
  hw/i386: AMD IOMMU IVRS table

 hw/acpi/aml-build.c |2 +-
 hw/i386/Makefile.objs   |1 +
 hw/i386/acpi-build.c|   76 ++-
 hw/i386/amd_iommu.c | 1383 +++
 hw/i386/amd_iommu.h |  289 +
 hw/i386/intel_iommu.c   |1 +
 hw/i386/trace-events|   29 +
 hw/i386/x86-iommu.c |6 +
 include/hw/acpi/aml-build.h |1 +
 include/hw/i386/x86-iommu.h |   12 +
 include/hw/pci/pci.h|4 +-
 11 files changed, 1793 insertions(+), 11 deletions(-)
 create mode 100644 hw/i386/amd_iommu.c
 create mode 100644 hw/i386/amd_iommu.h

-- 
2.1.4

Re: [Qemu-devel] [virtio-comment] [PATCH] * Vhost-pci RFC v2 *

2016-08-31 Thread Stefan Hajnoczi

On Tue, Aug 30, 2016 at 10:08:01AM +, Wang, Wei W wrote:
> On Monday, August 29, 2016 11:25 PM, Stefan Hajnoczi wrote:
> > To: Wang, Wei W 
> > Cc: k...@vger.kernel.org; qemu-devel@nongnu.org; virtio- 
> > comm...@lists.oasis-open.org; m...@redhat.com; pbonz...@redhat.com
> > Subject: Re: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
> > 
> > On Mon, Jun 27, 2016 at 02:01:24AM +, Wang, Wei W wrote:
> > > On Sun 6/19/2016 10:14 PM, Wei Wang wrote:
> > > > This RFC proposes a design of vhost-pci, which is a new virtio device 
> > > > type.
> > > > The vhost-pci device is used for inter-VM communication.
> > > >
> > > > Changes in v2:
> > > > 1. changed the vhost-pci driver to use a controlq to send 
> > > > acknowledgement
> > > >messages to the vhost-pci server rather than writing to the device
> > > >configuration space;
> > > >
> > > > 2. re-organized all the data structures and the description 
> > > > layout;
> > > >
> > > > 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message,
> > which
> > > > is redundant;
> > > >
> > > > 4. added a message sequence number to the msg info structure to 
> > > > identify socket
> > > >messages, and the socket message exchange does not need to be 
> > > > blocking;
> > > >
> > > > 5. changed to used uuid to identify each VM rather than using the 
> > > > QEMU
> > process
> > > >id
> > > >
> > >
> > > One more point should be added is that the server needs to send 
> > > periodic socket messages to check if the driver VM is still alive. I 
> > > will add this message support in next version.  (*v2-AR1*)
> > 
> > Either the driver VM could go down or the device VM (server) could go 
> > down.  In both cases there must be a way to handle the situation.
> > 
> > If the server VM goes down it should be possible for the driver VM to 
> > resume either via hotplug of a new device or through messages 
> > reinitializing the dead device when the server VM restarts.
> 
> I got feedbacks from people that the name of device VM and driver VM are 
> difficult to remember. Can we use client (or frontend) VM and server (or 
> backend) VM in the discussion? I think that would sound more straightforward 
> :)

We discussed this in a previous email thread.

Device and driver are the terms used by the virtio spec.  Anyone dealing
with vhost-pci design must be familiar with the virtio spec.

I don't see how using the terminology consistently can be confusing,
unless these people haven't looked at the virtio spec.  In that case
they have no business with working on vhost-pci because virtio is a
prerequisite :).

Stefan


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH] sh4: fix broken link to documentation

2016-08-31 Thread Aurelien Jarno

On 2016-08-31 17:54, Reda Sallahi wrote:
> The page that was previously linked in the source code and the README file is
> no longer available so it now returns a 404 error message.
> 
> This puts a previous snapshot from archive.org instead.
> 
> Signed-off-by: Reda Sallahi 
> ---
>  hw/sh4/shix.c | 2 +-
>  target-sh4/README.sh4 | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/sh4/shix.c b/hw/sh4/shix.c
> index ccc9e75..2ac79fc 100644
> --- a/hw/sh4/shix.c
> +++ b/hw/sh4/shix.c
> @@ -23,7 +23,7 @@
>   */
>  /*
> Shix 2.0 board by Alexis Polti, described at
> -   http://perso.enst.fr/~polti/realisations/shix20/
> +   
> web.archive.org/web/20070917001736/perso.enst.fr/~polti/realisations/shix20

Thanks for the patch. Maybe put the http:// in front. It will go over
the 80 characters limits, but I think that's fine in such a case.

I guess this will be merged through the trivial queue, that seems the
best to me.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

[Qemu-devel] [PATCH] sh4: fix broken link to documentation

2016-08-31 Thread Reda Sallahi

The page that was previously linked in the source code and the README file is
no longer available so it now returns a 404 error message.

This puts a previous snapshot from archive.org instead.

Signed-off-by: Reda Sallahi 
---
 hw/sh4/shix.c | 2 +-
 target-sh4/README.sh4 | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/sh4/shix.c b/hw/sh4/shix.c
index ccc9e75..2ac79fc 100644
--- a/hw/sh4/shix.c
+++ b/hw/sh4/shix.c
@@ -23,7 +23,7 @@
  */
 /*
Shix 2.0 board by Alexis Polti, described at
-   http://perso.enst.fr/~polti/realisations/shix20/
+   web.archive.org/web/20070917001736/perso.enst.fr/~polti/realisations/shix20
 
More information in target-sh4/README.sh4
 */
diff --git a/target-sh4/README.sh4 b/target-sh4/README.sh4
index e578830..ece0464 100644
--- a/target-sh4/README.sh4
+++ b/target-sh4/README.sh4
@@ -25,7 +25,7 @@ Goals
 
 The primary model being worked on is the soft MMU target to be able to
 emulate the Shix 2.0 board by Alexis Polti, described at
-http://perso.enst.fr/~polti/realisations/shix20/
+https://web.archive.org/web/20070917001736/http://perso.enst.fr/~polti/realisations/shix20/
 
 Ultimately, qemu will be coupled with a system C or a verilog
 simulator to simulate the whole board functionalities.
-- 
2.9.3

Re: [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-08-31 Thread Alex Williamson

On Wed, 31 Aug 2016 15:04:13 +0800
Jike Song  wrote:

> On 08/31/2016 02:12 PM, Tian, Kevin wrote:
> >> From: Alex Williamson [mailto:alex.william...@redhat.com]
> >> Sent: Wednesday, August 31, 2016 12:17 AM
> >>
> >> Hi folks,
> >>
> >> At KVM Forum we had a BoF session primarily around the mediated device
> >> sysfs interface.  I'd like to share what I think we agreed on and the
> >> "problem areas" that still need some work so we can get the thoughts
> >> and ideas from those who weren't able to attend.
> >>
> >> DanPB expressed some concern about the mdev_supported_types sysfs
> >> interface, which exposes a flat csv file with fields like "type",
> >> "number of instance", "vendor string", and then a bunch of type
> >> specific fields like "framebuffer size", "resolution", "frame rate
> >> limit", etc.  This is not entirely machine parsing friendly and sort of
> >> abuses the sysfs concept of one value per file.  Example output taken
> >> from Neo's libvirt RFC:
> >>
> >> cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
> >> # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
> >> framebuffer,
> >> max_resolution
> >> 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
> >> 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
> >> 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
> >> 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
> >> 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
> >> 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
> >> 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
> >> 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
> >>
> >> The create/destroy then looks like this:
> >>
> >> echo "$mdev_UUID:vendor_specific_argument_list" >
> >>/sys/bus/pci/devices/.../mdev_create
> >>
> >> echo "$mdev_UUID:vendor_specific_argument_list" >
> >>/sys/bus/pci/devices/.../mdev_destroy
> >>
> >> "vendor_specific_argument_list" is nebulous.
> >>
> >> So the idea to fix this is to explode this into a directory structure,
> >> something like:
> >>
> >> ├── mdev_destroy
> >> └── mdev_supported_types
> >> ├── 11
> >> │   ├── create
> >> │   ├── description
> >> │   └── max_instances
> >> ├── 12
> >> │   ├── create
> >> │   ├── description
> >> │   └── max_instances
> >> └── 13
> >> ├── create
> >> ├── description
> >> └── max_instances
> >>
> >> Note that I'm only exposing the minimal attributes here for simplicity,
> >> the other attributes would be included in separate files and we would
> >> require vendors to create standard attributes for common device classes.  
> > 
> > I like this idea. All standard attributes are reflected into this hierarchy.
> > In the meantime, can we still allow optional vendor string in create 
> > interface? libvirt doesn't need to know the meaning, but allows upper
> > layer to do some vendor specific tweak if necessary.
> >   
> 
> Not sure whether this can done within MDEV framework (attrs provided by
> vendor driver of course), or must be within the vendor driver.

The purpose of the sub-directories is that libvirt doesn't need to pass
arbitrary, vendor strings to the create function, the attributes of the
mdev device created are defined by the attributes in the sysfs
directory where the create is done.  The user only provides a uuid for
the device.  Arbitrary vendor parameters are a barrier, libvirt may not
need to know the meaning, but would need to know when to apply them,
which is just as bad.  Ultimately we want libvirt to be able to
interact with sysfs without having an vendor specific knowledge.

> >>
> >> For vGPUs like NVIDIA where we don't support multiple types
> >> concurrently, this directory structure would update as mdev devices are
> >> created, removing no longer available types.  I carried forward  
> > 
> > or keep the type with max_instances cleared to ZERO.
> >  
> 
> +1 :)

Possible yes, but why would the vendor driver report types that the
user cannot create?  It just seems like superfluous information (well,
except for the use I discover below).

> >> max_instances here, but perhaps we really want to copy SR-IOV and
> >> report a max and current allocation.  Creation and deletion is  
> > 
> > right, cur/max_instances look reasonable.
> >   
> >> simplified as we can simply "echo $UUID > create" per type.  I don't
> >> understand why destroy had a parameter list, so here I imagine we can
> >> simply do the same... in fact, I'd actually rather see a "remove" sysfs
> >> entry under each mdev device, so we remove it at the device rather than
> >> in some central location (any objections?).  
> > 
> > OK to me.   
> 
> IIUC, "destroy" has a parameter list is only because the previous
> $VM_UUID + instnace implementation. It should be safe to

[Qemu-devel] QEMU and/or GDB position opening at AdaCore

2016-08-31 Thread Fabien Chouteau

Hello QEMU folks,

AdaCore [1] is opening a QEMU and/or GDB engineer position. You guessed
it, we are looking for someone familiar with low-level programming,
assembly, CPU architectures, etc. On the QEMU side we work on the ARM,
PPC, SPARC and x86 architectures in "full" system emulation only.  Prior
experience with debugger or compiler development is a plus. The location
would be Paris (France) (or maybe New-York (USA)).

I don't have the complete job description yet, but please contact me if
you are interested.

Regards,

[1] http://www.adacore.com/

[Qemu-devel] Implementation of BusLogic SCSI host adapter (BT-958)

2016-08-31 Thread Денис Дмитриев

Hi,
I'm trying to implement a buslogic scsi adapter(BT-958) for qemu. I
realized the driver interaction with the ports through which the driver can
write / read commands and parameters for the adapter. The driver was able
to make an adapter sample procedure. The problem is that I do not
understand how to establish communication between the driver and adapter
for transferring mailboxes. I got the address of the mailboxes from the
driver. Then I calculated the start address of the desired mailbox (the
same address to which a driver recorded a mailbox). After that, I try to
read the data at this address using pci_dma_read function but no buffer
data after reading.


uint64_t buslogicReadOutgoingMailbox(BuslogicState *s, BUSLOGICTASKSTATE
*TaskState)
{
uint64_tGCMailbox;
Mailbox24   Mbx24;
Mbx24.uCmdState = 0;
PCIDevice *pci_dev = PCI_DEVICE(s);
if (s->fMbxIs24Bit)
{
//try to calculate mailbox address
GCMailbox = s->GCPhysAddrMailboxOutgoingBase +
(s->uMailboxOutgoingPositionCurrent * sizeof(Mailbox24));
//try to read mailbox
pci_dma_read(pci_dev, GCMailbox, , sizeof(Mailbox24));
//after that i  have empty buffer
TaskState->MailboxGuest.u32PhysAddrCCB=
ADDR_TO_U32(Mbx24.aPhysAddrCCB);
TaskState->MailboxGuest.u.out.uActionCode = Mbx24.uCmdState;
}
else
{
GCMailbox = s->GCPhysAddrMailboxOutgoingBase +
(s->uMailboxOutgoingPositionCurrent * sizeof(Mailbox32));
pci_dma_read(pci_dev, GCMailbox, >MailboxGuest,
sizeof(Mailbox32));
}
return GCMailbox;
}

Re: [Qemu-devel] [PATCH 1/3] virtio: Basic implementation of virtio pstore driver

2016-08-31 Thread Michael S. Tsirkin

On Wed, Aug 31, 2016 at 05:08:00PM +0900, Namhyung Kim wrote:
> The virtio pstore driver provides interface to the pstore subsystem so
> that the guest kernel's log/dump message can be saved on the host
> machine.  Users can access the log file directly on the host, or on the
> guest at the next boot using pstore filesystem.  It currently deals with
> kernel log (printk) buffer only, but we can extend it to have other
> information (like ftrace dump) later.
> 
> It supports legacy PCI device using single order-2 page buffer.  It uses
> two virtqueues - one for (sync) read and another for (async) write.
> Since it cannot wait for write finished, it supports up to 128
> concurrent IO.  The buffer size is configurable now.
> 
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Cc: "Michael S. Tsirkin" 
> Cc: Anthony Liguori 
> Cc: Anton Vorontsov 
> Cc: Colin Cross 
> Cc: Kees Cook 
> Cc: Tony Luck 
> Cc: Steven Rostedt 
> Cc: Ingo Molnar 
> Cc: Minchan Kim 
> Cc: Will Deacon 
> Cc: k...@vger.kernel.org
> Cc: qemu-devel@nongnu.org
> Cc: virtualizat...@lists.linux-foundation.org
> Cc: virtio-...@lists.oasis-open.org
> Signed-off-by: Namhyung Kim 
> ---
>  drivers/virtio/Kconfig |  10 +
>  drivers/virtio/Makefile|   1 +
>  drivers/virtio/virtio_pstore.c | 417 
> +
>  include/uapi/linux/Kbuild  |   1 +
>  include/uapi/linux/virtio_ids.h|   1 +
>  include/uapi/linux/virtio_pstore.h |  74 +++
>  6 files changed, 504 insertions(+)
>  create mode 100644 drivers/virtio/virtio_pstore.c
>  create mode 100644 include/uapi/linux/virtio_pstore.h
> 
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 77590320d44c..8f0e6c796c12 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -58,6 +58,16 @@ config VIRTIO_INPUT
>  
>If unsure, say M.
>  
> +config VIRTIO_PSTORE
> + tristate "Virtio pstore driver"
> + depends on VIRTIO
> + depends on PSTORE
> + ---help---
> +  This driver supports virtio pstore devices to save/restore
> +  panic and oops messages on the host.
> +
> +  If unsure, say M.
> +
>   config VIRTIO_MMIO
>   tristate "Platform bus driver for memory mapped virtio devices"
>   depends on HAS_IOMEM && HAS_DMA
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 41e30e3dc842..bee68cb26d48 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -5,3 +5,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PSTORE) += virtio_pstore.o
> diff --git a/drivers/virtio/virtio_pstore.c b/drivers/virtio/virtio_pstore.c
> new file mode 100644
> index ..ec41f0d2f0b7
> --- /dev/null
> +++ b/drivers/virtio/virtio_pstore.c
> @@ -0,0 +1,417 @@
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define VIRT_PSTORE_ORDER2
> +#define VIRT_PSTORE_BUFSIZE  (4096 << VIRT_PSTORE_ORDER)
> +#define VIRT_PSTORE_NR_REQ   128
> +
> +struct virtio_pstore {
> + struct virtio_device*vdev;
> + struct virtqueue*vq[2];
> + struct pstore_info   pstore;
> + struct virtio_pstore_req req[VIRT_PSTORE_NR_REQ];
> + struct virtio_pstore_res res[VIRT_PSTORE_NR_REQ];
> + unsigned int req_id;
> +
> + /* Waiting for host to ack */
> + wait_queue_head_t   acked;
> + int failed;
> +};
> +
> +#define TYPE_TABLE_ENTRY(_entry) \
> + { PSTORE_TYPE_##_entry, VIRTIO_PSTORE_TYPE_##_entry }
> +
> +struct type_table {
> + int pstore;
> + u16 virtio;
> +} type_table[] = {
> + TYPE_TABLE_ENTRY(DMESG),
> +};
> +
> +#undef TYPE_TABLE_ENTRY
> +
> +
> +static u16 to_virtio_type(struct virtio_pstore *vps, enum pstore_type_id 
> type)
> +{
> + unsigned int i;
> +
> + for (i = 0; i < ARRAY_SIZE(type_table); i++) {
> + if (type == type_table[i].pstore)
> + return cpu_to_virtio16(vps->vdev, type_table[i].virtio);

Does this pass sparse checks? If yes I'm surprised - this clearly
returns a virtio16 type.


> + }
> +
> + return cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_UNKNOWN);
> +}
> +
> +static enum pstore_type_id from_virtio_type(struct virtio_pstore *vps, u16 
> type)
> +{
> + unsigned int i;
> +
> + for (i = 0; i < ARRAY_SIZE(type_table); i++) {
> + if (virtio16_to_cpu(vps->vdev, type) == type_table[i].virtio)
> +

Re: [Qemu-devel] [PATCH V7 0/6] coroutine: mmap stack memory and stack size

2016-08-31 Thread Kevin Wolf

Am 23.08.2016 um 16:21 hat Peter Lieven geschrieben:
> I decided to split this from the rest of the Qemu RSS usage series as
> it contains the more or less non contentious patches.
> 
> I omitted the MAP_GROWSDOWN flag in mmap as we are not 100% sure which
> side effects it has.
> 
> I kept the guard page which is now nicely makes the stacks visible in
> smaps. The old version of the relevent patch lacked the MAP_FIXED flag
> in the second call to mmap.
> 
> The last patch which reduces the stack size of coroutines to 64kB
> may be omitted if its found to risky.

Thanks, applied to block-next.

Kevin

Re: [Qemu-devel] [RFC] target-i386: present virtual L3 cache info for vcpus

2016-08-31 Thread Eduardo Habkost

On Wed, Aug 31, 2016 at 08:59:41AM +0800, Longpeng (Mike) wrote:
[...]
> >> -/* No L3 cache: */
> >> -#define L3_SIZE_KB 0 /* disabled */
> >> -#define L3_ASSOCIATIVITY   0 /* disabled */
> >> -#define L3_LINES_PER_TAG   0 /* disabled */
> >> -#define L3_LINE_SIZE   0 /* disabled */
> >> +/* Level 3 unified cache: */
> >> +#define L3_LINE_SIZE  64
> >> +#define L3_ASSOCIATIVITY  24
> >> +#define L3_SETS 8192
> >> +#define L3_PARTITIONS  1
> >> +#define L3_DESCRIPTOR CPUID_2_L3_12MB_24WAY_64B
> >> +/*FIXME: CPUID leaf 0x8006 is inconsistent with leaves 2 & 4 */
> > 
> > Why are you intentionally introducing a bug?
> 
> Please forgive my foolish, I will fix it.
> 
> By the way, There are the same legacy bugs in L1/L2 codes, is there need to 
> fix
> them ?

We want to fix them, but fixing them also requires a mechanism to
configure cache sizes (so we keep compatibility on old
machine-types).

-- 
Eduardo

[Qemu-devel] Make file problem?

2016-08-31 Thread Programmingkid

When I tried to make a change to the file hw/misc/macio/cuda.c, the make 
command would not detect the changes. I would have to delete the cuda.o file to 
make the changes actually compile.

Re: [Qemu-devel] [PATCH v5] virtio-pci: error out when both legacy and modern modes are disabled

2016-08-31 Thread Greg Kurz

Michael,

I realize this patch fell through the cracks while I was away... the various
reviews seemed to indicate there was a consensus to have this in 2.7 though.

Do you have an opinion whether QEMU needs this or not ?

Cc'ing Peter in case it is acceptable to apply the patch this late.

Cheers.

--
Greg

On Fri, 22 Jul 2016 16:05:29 +0200
Greg Kurz  wrote:

> From: Greg Kurz 
> 
> Without presuming if we got there because of a user mistake or some
> more subtle bug in the tooling, it really does not make sense to
> implement a non-functional device.
> 
> Signed-off-by: Greg Kurz 
> Reviewed-by: Marcel Apfelbaum 
> Reviewed-by: Cornelia Huck 
> Signed-off-by: Greg Kurz 
> ---
> v5: - changed wording as suggested by Connie
> - added Connies R-b tag
> ---
>  hw/virtio/virtio-pci.c |8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 755f9218b77d..8714123d61fd 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1842,6 +1842,14 @@ static void virtio_pci_dc_realize(DeviceState *qdev, 
> Error **errp)
>  VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
>  PCIDevice *pci_dev = >pci_dev;
>  
> +if (!(virtio_pci_modern(proxy) || virtio_pci_legacy(proxy))) {
> +error_setg(errp, "device cannot work as neither modern nor legacy 
> mode"
> +   " is enabled");
> +error_append_hint(errp, "Set either disable-modern or disable-legacy"
> +  " to off\n");
> +return;
> +}
> +
>  if (!(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_PCIE) &&
>  virtio_pci_modern(proxy)) {
>  pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
> 
>

Re: [Qemu-devel] [PATCH] qemu-iotests: Log QMP traffic in debug mode

2016-08-31 Thread Kevin Wolf

Am 23.08.2016 um 20:59 hat Eric Blake geschrieben:
> On 08/23/2016 09:46 AM, Kevin Wolf wrote:
> > Python tests are already annoying enough to debug. With QMP traffic
> > available it's a little bit easier at least.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  tests/qemu-iotests/iotests.py | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> > index 03bccdd..3329bc1 100644
> > --- a/tests/qemu-iotests/iotests.py
> > +++ b/tests/qemu-iotests/iotests.py
> > @@ -50,6 +50,7 @@ cachemode = os.environ.get('CACHEMODE')
> >  qemu_default_machine = os.environ.get('QEMU_DEFAULT_MACHINE')
> >  
> >  socket_scm_helper = os.environ.get('SOCKET_SCM_HELPER', 
> > 'socket_scm_helper')
> > +debug = False
> >  
> >  def qemu_img(*args):
> >  '''Run qemu-img and return the exit code'''
> > @@ -134,6 +135,8 @@ class VM(qtest.QEMUQtestMachine):
> >  def __init__(self):
> >  super(VM, self).__init__(qemu_prog, qemu_opts, test_dir=test_dir,
> >   socket_scm_helper=socket_scm_helper)
> > +if debug:
> > +self._debug = True
> >  self._num_drives = 0
> >  
> 
> So we already had plumbing for debug...
> 
> >  def add_device(self, opts):
> > @@ -323,6 +326,8 @@ def verify_quorum():
> >  def main(supported_fmts=[], supported_oses=['linux']):
> >  '''Run tests'''
> >  
> > +global debug
> 
> ...but just needed to turn it on?

Yes, the parent class already implements the logging. I guess this was
done in the context of some other scripts (maybe QMP shell?), so we just
have to reuse what's already there.

Kevin


pgpCcv02vuV1P.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH v2] scsi: check page count while initialising descriptor rings

2016-08-31 Thread Dmitry Fleytman


> On 31 Aug 2016, at 09:49 AM, P J P  wrote:
> 
> From: Prasad J Pandit 
> 
> Vmware Paravirtual SCSI emulation uses command descriptors to
> process SCSI commands. These descriptors come with their ring
> buffers. A guest could set the page count for these rings to
> an arbitrary value, leading to infinite loop or OOB access.
> Add check to avoid it.
> 
> Reported-by: Tom Victor 
> Signed-off-by: Prasad J Pandit 
> ---
> hw/scsi/vmw_pvscsi.c | 17 +
> 1 file changed, 9 insertions(+), 8 deletions(-)
> 
> Update:
>Moved Request and Confirm rings page count check to the parent
> function -> pvscsi_on_cmd_setup_rings().
> 
> diff --git a/hw/scsi/vmw_pvscsi.c b/hw/scsi/vmw_pvscsi.c
> index 5116f4a..79aa88c 100644
> --- a/hw/scsi/vmw_pvscsi.c
> +++ b/hw/scsi/vmw_pvscsi.c
> @@ -160,10 +160,6 @@ pvscsi_ring_init_data(PVSCSIRingInfo *m, 
> PVSCSICmdDescSetupRings *ri)
> uint32_t req_ring_size, cmp_ring_size;
> m->rs_pa = ri->ringsStatePPN << VMW_PAGE_SHIFT;
> 
> -if ((ri->reqRingNumPages > PVSCSI_SETUP_RINGS_MAX_NUM_PAGES)
> -|| (ri->cmpRingNumPages > PVSCSI_SETUP_RINGS_MAX_NUM_PAGES)) {
> -return -1;
> -}

Hello Prasad,

Why did you decide to move this logic out of pvscsi_ring_init_data()?
Why not just amend existing “if" as you did in v1 of this patch?

~Dmitry

> req_ring_size = ri->reqRingNumPages * PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE;
> cmp_ring_size = ri->cmpRingNumPages * PVSCSI_MAX_NUM_CMP_ENTRIES_PER_PAGE;
> txr_len_log2 = pvscsi_log2(req_ring_size - 1);
> @@ -746,7 +742,7 @@ pvscsi_dbg_dump_tx_rings_config(PVSCSICmdDescSetupRings 
> *rc)
> 
> trace_pvscsi_tx_rings_num_pages("Confirm Ring", rc->cmpRingNumPages);
> for (i = 0; i < rc->cmpRingNumPages; i++) {
> -trace_pvscsi_tx_rings_ppn("Confirm Ring", rc->reqRingPPNs[i]);
> +trace_pvscsi_tx_rings_ppn("Confirm Ring", rc->cmpRingPPNs[i]);
> }
> }
> 
> @@ -777,12 +773,17 @@ pvscsi_on_cmd_setup_rings(PVSCSIState *s)
> PVSCSICmdDescSetupRings *rc =
> (PVSCSICmdDescSetupRings *) s->curr_cmd_data;
> 
> +if (!rc->reqRingNumPages
> +|| rc->reqRingNumPages > PVSCSI_SETUP_RINGS_MAX_NUM_PAGES
> +|| !rc->cmpRingNumPages
> +|| rc->cmpRingNumPages > PVSCSI_SETUP_RINGS_MAX_NUM_PAGES) {
> +return PVSCSI_COMMAND_PROCESSING_FAILED;
> +}
> +
> trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_SETUP_RINGS");
> 
> pvscsi_dbg_dump_tx_rings_config(rc);
> -if (pvscsi_ring_init_data(>rings, rc) < 0) {
> -return PVSCSI_COMMAND_PROCESSING_FAILED;
> -}
> +pvscsi_ring_init_data(>rings, rc);
> 
> s->rings_info_valid = TRUE;
> return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
> -- 
> 2.5.5
>

Re: [Qemu-devel] [PATCH v2 2/2] e1000: fix buliding complaint

2016-08-31 Thread Dmitry Fleytman

Reviewed-by: Dmitry Fleytman 

> On 30 Aug 2016, at 07:10 AM, Gonglei  wrote:
> 
> hw/net/e1000e_core.c:56: warning: e1000e_set_interrupt_cause declared inline 
> after being called
> hw/net/e1000e_core.c:56: warning: previous declaration of 
> e1000e_set_interrupt_cause was here
> 
> Signed-off-by: Gonglei 
> ---
> hw/net/e1000e_core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
> index badb1fe..825e169 100644
> --- a/hw/net/e1000e_core.c
> +++ b/hw/net/e1000e_core.c
> @@ -2168,7 +2168,7 @@ e1000e_update_interrupt_state(E1000ECore *core)
> }
> }
> 
> -static inline void
> +static void
> e1000e_set_interrupt_cause(E1000ECore *core, uint32_t val)
> {
> trace_e1000e_irq_set_cause_entry(val, core->mac[ICR]);
> -- 
> 1.7.12.4
> 
>

Re: [Qemu-devel] [PATCH] * Vhost-pci RFC v2 *

2016-08-31 Thread Marc-André Lureau

Hi

On Sun, Jun 19, 2016 at 10:19 AM Wei Wang  wrote:

> This RFC proposes a design of vhost-pci, which is a new virtio device type.
> The vhost-pci device is used for inter-VM communication.
>
>
Before I send a more complete review of the spec, I have a few overall
questions:

- this patch is for the virtio spec? Why not patch the spec directly (
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/trunk/) I
expect several rfc iterations, so perhaps it's easier as plain text file
for now (as a qemu patch to doc/specs). btw, I would limit the audience at
qemu-devel for now.
- I think the virtio spec should limit itself to the hw device description,
and virtioq messages. Not the backend implementation (the ipc details,
client/server etc).
- If it could be made not pci-specific, a better name for the device could
be simply "driver": the driver of a virtio device. Or the "slave" in
vhost-user terminology - consumer of virtq. I think you prefer to call it
"backend" in general, but I find it more confusing.
- regarding the socket protocol, why not reuse vhost-user? it seems to me
it supports most of what you need and more (like interrupt, migrations,
protocol features, start/stop queues). Some of the extensions, like uuid,
could be beneficial to vhost-user too.
- Why is it required or beneficial to support multiple "frontend" devices
over the same "vhost-pci" device? It could simplify things if it was a
single device. If necessary, that could also be interesting as a vhost-user
extension.
- no interrupt support, I suppose you mainly looked at poll-based net
devices
- when do you expect to share a wip/rfc implementation?

thanks

Changes in v2:
> 1. changed the vhost-pci driver to use a controlq to send acknowledgement
>messages to the vhost-pci server rather than writing to the device
>configuration space;
>
> 2. re-organized all the data structures and the description layout;
>
> 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message, which is
> redundant;
>
> 4. added a message sequence number to the msg info structure to identify
> socket
>messages, and the socket message exchange does not need to be blocking;
>
> 5. changed to used uuid to identify each VM rather than using the QEMU
> process
>id
>
> Wei Wang (1):
>   Vhost-pci RFC v2: a new virtio device for inter-VM communication
>
>  vhost-pci.patch | 341
> 
>  1 file changed, 341 insertions(+)
>  create mode 100755 vhost-pci.patch
>
> --
> 1.8.3.1
>
>
> --
Marc-André Lureau

1 2 >

1 - 100 of 132 matches

Mail list logo