date:20160131

Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-01-31 Thread Alex Williamson

On Sat, 2016-01-30 at 01:18 +, Kay, Allen M wrote:
> 
> > -----Original Message-----
> > From: iGVT-g [mailto:igvt-g-boun...@lists.01.org] On Behalf Of Alex
> > Williamson
> > Sent: Friday, January 29, 2016 10:00 AM
> > To: Gerd Hoffmann
> > Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo Habkost;
> > Stefano Stabellini; qemu-devel@nongnu.org; Cao jin; vfio-
> > us...@redhat.com
> > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> > tweaks
> > 
> > Do guest drivers depend on IGD appearing at 00:02.0?  I'm currently testing
> > for any Intel VGA device, but I wonder if I should only be enabling anything
> > opregion if it also appears at a specific address.
> > 
> 
> No.  Both Windows and Linux IGD driver should work at any PCI slot.  We have 
> seen 0:5.0 in the guest and the driver works.

Thanks Allen.  Another question, when I boot a VM with an assigned HD
P4000 GPU, my console streams with IOMMU faults, like:

DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 

All of these fall within the host RMRR range for the device:

DMAR: Setting RMRR:
DMAR: Setting identity map for device :00:02.0 [0x9f80 - 0xaf9f]

A while back, we excluded devices using RMRRs from participating in
IOMMU API domains because they may continue to DMA to these reserved
regions after assignment, possibly corrupting VM memory
(c875d2c1b808).  Intel later decided this exclusion shouldn't apply to
graphics devices (18436afdc11a).  Don't the above IOMMU faults reveal
that exactly the problem we're trying to prevent by general exclusion of
RMRR encumbered devices from the IOMMU API is actually occurring?  If I
were to have VM memory within the RMRR address range, I wouldn't be
seeing these faults, I'd be having the GPU corrupt my VM memory.

David notes in the latter commit above:

"We should be able to successfully assign graphics devices to guests
too, as long as the initial handling of stolen memory is reconfigured
appropriately."

What code is supposed to be doing that reconfiguration when a device is
assigned?  Clearly we don't have it yet, making assignment of these
devices very unsafe.  It seems like vfio or IOMMU code in the kernel
needs device specific code to clear these settings to make it safe for
userspace, then perhaps VM BIOS support to reallocate.  Is there any
consistency across IGD revisions for doing this?  Is there a spec?
Thanks,

Alex

Re: [Qemu-devel] [RFC v2 0/10] Support Receive-Segment-Offload(RSC) for WHQL test of Window guest

2016-01-31 Thread Michael S. Tsirkin

On Mon, Feb 01, 2016 at 02:13:19AM +0800, w...@redhat.com wrote:
> From: Wei Xu 
> 
> Patch v2 adds a detailed commit log.
> 
> This series supports the WHQL test for Windows guests. The feature also
> benefits other guests, working as a kernel 'gro'-like feature with a
> userspace implementation.
> Feature information:
>   http://msdn.microsoft.com/en-us/library/windows/hardware/jj853324
> 
> Both IPv4 and IPv6 are supported. Although userspace virtio is slower than
> vhost-net, this feature gives userspace virtio a performance improvement of
> about 30-40 percent; this is achieved by turning the feature on and
> disabling 'tso' on the corresponding tap interface.
> 
> Test steps:
> Although this feature is mainly used for Windows guests, I used Linux guests
> to help test it. To keep things simple, I tested the patch in 3 steps as I
> moved on:
> 1. A TCP socket client/server pair running on 2 Linux guests, so that I can
>    control the traffic and debug the code as I want.
> 2. Netperf on Linux guests to test the throughput.
> 3. The WHQL test with 2 Windows guests.
> 
> Current status:
> IPv4 passes all of the above tests.
> IPv6 has only passed test steps 1 and 2 described above. In the WHQL test the
> virtio NIC cannot receive any packets, although debugging on the host side
> shows that all the packets have been pushed to the vring. After replacing the
> Windows guest with a Linux guest and adding 10 extra packets before sending
> the real packet, tcpdump running on the guest captured only 6 packets. I have
> not found the root cause yet and will continue working on this.
> 
> Note:
> A 'MessageDevice' NIC chosen as 'Realtek' sometimes panics the system during
> setup; this can be worked around by replacing it with an 'e1000' NIC.

Either memory corruption or an unrelated bug.
Try with valgrind?

> Pending issues & Todo list:
> 1. Dup ack count not added in the virtio_net_hdr, but WHQL test case passes,
> looks like a bug in test case.

Maybe that's ok - as long as packets are not forwarded.

> 2. Missing a Feature Bit

Do we need a new bit? Maybe for ack coalescing only ...

> 3. Missing a few tcp/ip handling
> ECN change.
> TCP window scale.
> 
> Wei Xu (10):
>   virtio-net rsc: Data structure, 'Segment', 'Chain' and 'Status'
>   virtio-net rsc: Initilize & Cleanup
>   virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4
>   virtio-net rsc: Detailed IPv4 and General TCP data coalescing
>   virtio-net rsc: Create timer to drain the packets from the cache pool
>   virtio-net rsc: IPv4 checksum
>   virtio-net rsc: Checking TCP flag and drain specific connection
> packets
>   virtio-net rsc: Sanity check & More bypass cases check
>   virtio-net rsc: Add IPv6 support
>   virtio-net rsc: Add Receive Segment Coalesce statistics
> 
>  hw/net/virtio-net.c| 626 
> -
>  include/hw/virtio/virtio-net.h |   1 +
>  include/hw/virtio/virtio.h |  65 +
>  3 files changed, 691 insertions(+), 1 deletion(-)
> 
> -- 
> 2.4.0

[Qemu-devel] [PATCH 1/3] ppc: fix timebase adjustment during migration

2016-01-31 Thread Mark Cave-Ayland

ns_diff is already clamped to a minimum of 0 to prevent the timebase going
backwards during migration due to misaligned clocks. Following on from this
migration_duration_tb is also subject to the same constraint; hence the
expression MIN(0, migration_duration_tb) always evaluates to 0 and so no
timebase adjustment ever takes place.
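
The effect can be reproduced in isolation. The following is a minimal sketch, not the QEMU code itself: the function names are hypothetical and the MIN macro is assumed to have the usual ternary definition. It only shows that, once the duration is known to be non-negative, MIN(0, duration) contributes nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the conventional definition of MIN. */
#define MIN(a, b) (((a) < (b)) ? (a) : (b))

/* Old expression: duration_tb is already >= 0, so MIN(0, duration_tb) is 0. */
static int64_t guest_tb_old(int64_t remote_tb, int64_t duration_tb)
{
    assert(duration_tb >= 0);   /* guaranteed by the earlier clamping */
    return remote_tb + MIN(0, duration_tb);
}

/* Fixed expression: the clamped duration is added as-is. */
static int64_t guest_tb_new(int64_t remote_tb, int64_t duration_tb)
{
    assert(duration_tb >= 0);
    return remote_tb + duration_tb;
}
```

With a remote timebase of 1000 and a migration duration of 250 ticks, the old form stays at 1000 while the fixed form advances to 1250.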

Signed-off-by: Mark Cave-Ayland 
---
 hw/ppc/ppc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index ce90b09..19f4570 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -877,7 +877,7 @@ static int timebase_post_load(void *opaque, int version_id)
 migration_duration_ns = MIN(NANOSECONDS_PER_SECOND, ns_diff);
 migration_duration_tb = muldiv64(migration_duration_ns, freq,
  NANOSECONDS_PER_SECOND);
-guest_tb = tb_remote->guest_timebase + MIN(0, migration_duration_tb);
+guest_tb = tb_remote->guest_timebase + migration_duration_tb;
 
 tb_off_adj = guest_tb - cpu_get_host_ticks();
 
-- 
1.7.10.4

[Qemu-devel] [PATCH 3/3] ppc: include timebase in migration stream for g3beige/mac99 machines

2016-01-31 Thread Mark Cave-Ayland

Signed-off-by: Mark Cave-Ayland 
---
 hw/ppc/mac_newworld.c |4 
 hw/ppc/mac_oldworld.c |4 
 2 files changed, 8 insertions(+)

diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index f95086b..3283f1d 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -179,6 +179,7 @@ static void ppc_core99_init(MachineState *machine)
 int *token = g_new(int, 1);
 hwaddr nvram_addr = 0xFFF04000;
 uint64_t tbfreq;
+PPCTimebase *tb;
 
 linux_boot = (kernel_filename != NULL);
 
@@ -201,6 +202,9 @@ static void ppc_core99_init(MachineState *machine)
 /* Set time-base frequency to 100 Mhz */
 cpu_ppc_tb_init(env, TBFREQ);
 qemu_register_reset(ppc_core99_reset, cpu);
+
+tb = g_malloc0(sizeof(PPCTimebase));
+vmstate_register(NULL, -1, _ppc_timebase, tb);
 }
 
 /* allocate RAM */
diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index 8984398..45e410b 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -104,6 +104,7 @@ static void ppc_heathrow_init(MachineState *machine)
 DriveInfo *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
 void *fw_cfg;
 uint64_t tbfreq;
+PPCTimebase *tb;
 
 linux_boot = (kernel_filename != NULL);
 
@@ -121,6 +122,9 @@ static void ppc_heathrow_init(MachineState *machine)
 /* Set time-base frequency to 16.6 Mhz */
 cpu_ppc_tb_init(env,  TBFREQ);
 qemu_register_reset(ppc_heathrow_reset, cpu);
+
+tb = g_malloc0(sizeof(PPCTimebase));
+vmstate_register(NULL, -1, _ppc_timebase, tb);
 }
 
 /* allocate RAM */
-- 
1.7.10.4

[Qemu-devel] [PATCH 0/3] ppc: add timebase migration support to Mac machines

2016-01-31 Thread Mark Cave-Ayland

This patchset allows migration of the PPC timebase for g3beige/mac99
machines under TCG on non-PPC hosts.

The majority of the work is in patch 2: here the existing migration code is
split into PPC and non-PPC host codepaths (where the previous behaviour is
preserved). In effect, non-PPC hosts use QEMU's emulated timebase routines
which are based upon the guest virtual clock, but it is still possible to
migrate guests in the same manner.

Finally patch 3 enables the inclusion of the timebase in the migration stream
for both Old World and New World Macs.

Unfortunately I have no ability to test this on KVM-enabled hardware, however
it should preserve the existing behaviour, barring the bugfix in patch 1.

Signed-off-by: Mark Cave-Ayland 

Mark Cave-Ayland (3):
  ppc: fix timebase adjustment during migration
  ppc: add support for timebase migration on non-PPC hosts
  ppc: include timebase in migration stream for g3beige/mac99 machines

 hw/ppc/mac_newworld.c |4 
 hw/ppc/mac_oldworld.c |4 
 hw/ppc/ppc.c  |   35 ---
 3 files changed, 36 insertions(+), 7 deletions(-)

-- 
1.7.10.4

[Qemu-devel] [PATCH 2/3] ppc: add support for timebase migration on non-PPC hosts

2016-01-31 Thread Mark Cave-Ayland

This patch provides support for migration of the PPC guest timebase on non-PPC
host architectures (i.e. those using QEMU's virtual emulated timebase).
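
As an illustration of the software timebase path, the sketch below converts a virtual clock value in nanoseconds into timebase ticks without overflowing 64 bits. It is a standalone approximation under stated assumptions: muldiv64_sketch and ns_to_tb are hypothetical names, and the split-multiply approach is one common way to compute a * b / c with a wider intermediate, not necessarily what QEMU's own muldiv64() does on every host.

```c
#include <assert.h>
#include <stdint.h>

/* Compute a * b / c using a 96-bit intermediate built from 32-bit halves. */
static uint64_t muldiv64_sketch(uint64_t a, uint32_t b, uint32_t c)
{
    assert(c != 0);
    uint64_t lo = (a & 0xffffffffu) * b;        /* low half times b */
    uint64_t hi = (a >> 32) * b + (lo >> 32);   /* high half plus carry */
    uint64_t res_hi = hi / c;
    uint64_t res_lo = (((hi % c) << 32) | (lo & 0xffffffffu)) / c;
    return (res_hi << 32) | res_lo;
}

/* Convert nanoseconds of virtual time to timebase ticks at freq Hz. */
static uint64_t ns_to_tb(uint64_t ns, uint32_t freq)
{
    return muldiv64_sketch(ns, freq, 1000000000u);
}
```

For example, at a 100 MHz timebase 10 ns corresponds to one tick, and 1000 seconds of virtual time still converts correctly even though ns * freq no longer fits in 64 bits.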

Signed-off-by: Mark Cave-Ayland 
---
 hw/ppc/ppc.c |   33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 19f4570..9b80c1d 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -832,6 +832,15 @@ static void cpu_ppc_set_tb_clk (void *opaque, uint32_t freq)
 cpu_ppc_store_purr(cpu, 0xULL);
 }
 
+static int host_cpu_is_ppc(void)
+{
+#if defined(_ARCH_PPC)
+return -1;
+#else
+return 0;
+#endif
+}
+
 static void timebase_pre_save(void *opaque)
 {
 PPCTimebase *tb = opaque;
@@ -844,11 +853,16 @@ static void timebase_pre_save(void *opaque)
 }
 
 tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
-/*
- * tb_offset is only expected to be changed by migration so
- * there is no need to update it from KVM here
- */
-tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
+
+if (host_cpu_is_ppc()) {
+/*
+ * tb_offset is only expected to be changed by migration so
+ * there is no need to update it from KVM here
+ */
+tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
+} else {
+tb->guest_timebase = cpu_ppc_load_tbl(_ppc_cpu->env);
+}
 }
 
 static int timebase_post_load(void *opaque, int version_id)
@@ -879,7 +893,14 @@ static int timebase_post_load(void *opaque, int version_id)
  NANOSECONDS_PER_SECOND);
 guest_tb = tb_remote->guest_timebase + migration_duration_tb;
 
-tb_off_adj = guest_tb - cpu_get_host_ticks();
+if (host_cpu_is_ppc()) {
+/* Hardware timebase */
+tb_off_adj = guest_tb - cpu_get_host_ticks();
+} else {
+/* Software timebase */
+tb_off_adj = guest_tb - muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
+ freq, get_ticks_per_sec());
+}
 
 tb_off = first_ppc_cpu->env.tb_env->tb_offset;
 trace_ppc_tb_adjust(tb_off, tb_off_adj, tb_off_adj - tb_off,
-- 
1.7.10.4

[Qemu-devel] [RFC Patch v2 07/10] virtio-net rsc: Checking TCP flag and drain specific connection packets

2016-01-31 Thread wexu

From: Wei Xu 

There are two typical ways to handle a TCP control flag: bypass and
finalize. Bypass means the packet is sent out directly. Finalize means
the packet is also bypassed, but only after searching the pool for cached
packets of the same connection and sending all of them out; this avoids
out-of-order data.

All 'SYN' packets are bypassed, since a 'SYN' always begins a new
connection. Other flags such as 'FIN/RST' trigger a finalization, because
they normally appear when a connection is about to be closed.
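
The classification described above can be sketched on its own. This is a simplified illustration and not the patch code: the enum names and classify_tcp_flags are hypothetical, the flag values are the standard TCP header bits, and the input is assumed to point at the two offset/flags bytes (bytes 12 and 13) of the TCP header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard TCP flag bits in the low byte of the offset/flags field. */
enum { SK_TH_FIN = 0x01, SK_TH_SYN = 0x02, SK_TH_RST = 0x04, SK_TH_URG = 0x20 };

/* Hypothetical verdicts mirroring the bypass/finalize wording above. */
enum sk_rsc_action { SK_RSC_CANDIDATE = 0, SK_RSC_BYPASS, SK_RSC_FINAL };

/* p points at the two offset/flags bytes of the TCP header (network order). */
static enum sk_rsc_action classify_tcp_flags(const uint8_t *p)
{
    assert(p != NULL);
    uint16_t flags = (uint16_t)(((p[0] << 8) | p[1]) & 0x3F);

    if (flags & SK_TH_SYN) {
        return SK_RSC_BYPASS;   /* new connection: send out directly */
    }
    if (flags & (SK_TH_FIN | SK_TH_URG | SK_TH_RST)) {
        return SK_RSC_FINAL;    /* drain cached packets first, then send */
    }
    return SK_RSC_CANDIDATE;    /* plain data or ACK: may be coalesced */
}
```

Reading the field byte by byte avoids any dependency on host endianness, which is why the sketch does not need a byte-swap helper.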

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 66 +
 1 file changed, 66 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 88fc4f8..b0987d0 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -41,6 +41,12 @@
 
 #define VIRTIO_HEADER   12/* Virtio net header size */
 #define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
+
+#define IP4_ADDR_OFFSET (IP_OFFSET + 12)/* ipv4 address start */
+#define TCP4_OFFSET (IP_OFFSET + sizeof(struct ip_header)) /* tcp4 header */
+#define TCP4_PORT_OFFSET TCP4_OFFSET/* tcp4 port offset */
+#define IP4_ADDR_SIZE   8   /* ipv4 saddr + daddr */
+#define TCP_PORT_SIZE   4   /* sport + dport */
 #define TCP_WINDOW  65535
 
 /* IPv4 max payload, 16 bits in the header */
@@ -1850,6 +1856,27 @@ static int32_t virtio_net_rsc_try_coalesce4(NetRscChain *chain,
 o_data, _ip->ip_len, MAX_IP4_PAYLOAD);
 }
 
+
+/* Pakcets with 'SYN' should bypass, other flag should be sent after drain
+ * to prevent out of order */
+static int virtio_net_rsc_parse_tcp_ctrl(uint8_t *ip, uint16_t offset)
+{
+uint16_t tcp_flag;
+struct tcp_header *tcp;
+
+tcp = (struct tcp_header *)(ip + offset);
+tcp_flag = htons(tcp->th_offset_flags) & 0x3F;
+if (tcp_flag & TH_SYN) {
+return RSC_BYPASS;
+}
+
+if (tcp_flag & (TH_FIN | TH_URG | TH_RST)) {
+return RSC_FINAL;
+}
+
+return 0;
+}
+
 static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
 const uint8_t *buf, size_t size, VirtioNetCoalesce *coalesce)
 {
@@ -1895,12 +1922,51 @@ static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
 return virtio_net_rsc_cache_buf(chain, nc, buf, size);
 }
 
+/* Drain a connection data, this is to avoid out of order segments */
+static size_t virtio_net_rsc_drain_one(NetRscChain *chain, NetClientState *nc,
+const uint8_t *buf, size_t size, uint16_t ip_start,
+uint16_t ip_size, uint16_t tcp_port, uint16_t port_size)
+{
+NetRscSeg *seg, *nseg;
+
+QTAILQ_FOREACH_SAFE(seg, >buffers, next, nseg) {
+if (memcmp(buf + ip_start, seg->buf + ip_start, ip_size)
+|| memcmp(buf + tcp_port, seg->buf + tcp_port, port_size)) {
+continue;
+}
+if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
+virtio_net_rsc_ipv4_checksum(seg);
+}
+
+virtio_net_do_receive(seg->nc, seg->buf, seg->size);
+
+QTAILQ_REMOVE(>buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);
+break;
+}
+
+return virtio_net_do_receive(nc, buf, size);
+}
 static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
   const uint8_t *buf, size_t size)
 {
+int32_t ret;
+struct ip_header *ip;
 NetRscChain *chain;
 
 chain = (NetRscChain *)opq;
+ip = (struct ip_header *)(buf + IP_OFFSET);
+
+ret = virtio_net_rsc_parse_tcp_ctrl((uint8_t *)ip,
+(0xF & ip->ip_ver_len) << 2);
+if (RSC_BYPASS == ret) {
+return virtio_net_do_receive(nc, buf, size);
+} else if (RSC_FINAL == ret) {
+return virtio_net_rsc_drain_one(chain, nc, buf, size, IP4_ADDR_OFFSET,
+IP4_ADDR_SIZE, TCP4_PORT_OFFSET, TCP_PORT_SIZE);
+}
+
 return virtio_net_rsc_callback(chain, nc, buf, size,
virtio_net_rsc_try_coalesce4);
 }
-- 
2.4.0

[Qemu-devel] [RFC Patch v2 05/10] virtio-net rsc: Create timer to drain the packets from the cache pool

2016-01-31 Thread wexu

From: Wei Xu 

The timer is only triggered if the packet pool is not empty, and it
drains all the cached packets; this reduces the delay seen by the
upper-layer protocol stack.

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 4f77fbe..93df0d5 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -48,12 +48,17 @@
 
 #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
 
+/* Purge coalesced packets timer interval */
+#define RSC_TIMER_INTERVAL  50
+
 /* Global statistics */
 static uint32_t rsc_chain_no_mem;
 
 /* Switcher to enable/disable rsc */
 static bool virtio_net_rsc_bypass;
 
+static uint32_t rsc_timeout = RSC_TIMER_INTERVAL;
+
 /* Coalesce callback for ipv4/6 */
 typedef int32_t (VirtioNetCoalesce) (NetRscChain *chain, NetRscSeg *seg,
  const uint8_t *buf, size_t size);
@@ -1625,6 +1630,35 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
 return 0;
 }
 
+static void virtio_net_rsc_purge(void *opq)
+{
+int ret = 0;
+NetRscChain *chain = (NetRscChain *)opq;
+NetRscSeg *seg, *rn;
+
+QTAILQ_FOREACH_SAFE(seg, >buffers, next, rn) {
+if (!qemu_can_send_packet(seg->nc)) {
+/* Should quit or continue? not sure if one or some
+* of the queues fail would happen, try continue here */
+continue;
+}
+
+ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
+QTAILQ_REMOVE(>buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);
+
+if (ret == 0) {
+/* Try next queue */
+continue;
+}
+}
+
+if (!QTAILQ_EMPTY(>buffers)) {
+timer_mod(chain->drain_timer,
+  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + rsc_timeout);
+}
+}
 
 static void virtio_net_rsc_cleanup(VirtIONet *n)
 {
@@ -1810,6 +1844,8 @@ static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
 if (!virtio_net_rsc_cache_buf(chain, nc, buf, size)) {
 return 0;
 } else {
+timer_mod(chain->drain_timer,
+  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + rsc_timeout);
 return size;
 }
 }
@@ -1877,6 +1913,8 @@ static NetRscChain *virtio_net_rsc_lookup_chain(NetClientState *nc,
 }
 
 chain->proto = proto;
+chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+  virtio_net_rsc_purge, chain);
 chain->do_receive = virtio_net_rsc_receive4;
 
 QTAILQ_INIT(>buffers);
-- 
2.4.0

[Qemu-devel] [RFC Patch v2 09/10] virtio-net rsc: Add IPv6 support

2016-01-31 Thread wexu

From: Wei Xu 

A few more things are needed to support this:
1. Corresponding chain lookup
2. Coalescing callback for the protocol chain
3. Filter & Sanity Check.

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 104 +++-
 1 file changed, 102 insertions(+), 2 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9b44762..c9f6bfc 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -46,12 +46,19 @@
 #define TCP4_OFFSET (IP_OFFSET + sizeof(struct ip_header)) /* tcp4 header */
 #define TCP4_PORT_OFFSET TCP4_OFFSET/* tcp4 port offset */
 #define IP4_ADDR_SIZE   8   /* ipv4 saddr + daddr */
+
+#define IP6_ADDR_OFFSET (IP_OFFSET + 8) /* ipv6 address start */
+#define TCP6_OFFSET (IP_OFFSET + sizeof(struct ip6_header)) /* tcp6 header */
+#define TCP6_PORT_OFFSET TCP6_OFFSET/* tcp6 port offset */
+#define IP6_ADDR_SIZE   32  /* ipv6 saddr + daddr */
 #define TCP_PORT_SIZE   4   /* sport + dport */
 #define TCP_WINDOW  65535
 
 /* IPv4 max payload, 16 bits in the header */
 #define MAX_IP4_PAYLOAD  (65535 - sizeof(struct ip_header))
 
+/* ip6 max payload, payload in ipv6 don't include the  header */
+#define MAX_IP6_PAYLOAD  65535
 #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
 
 /* Purge coalesced packets timer interval */
@@ -1856,6 +1863,42 @@ static int32_t virtio_net_rsc_try_coalesce4(NetRscChain *chain,
 o_data, _ip->ip_len, MAX_IP4_PAYLOAD);
 }
 
+static int32_t virtio_net_rsc_try_coalesce6(NetRscChain *chain,
+NetRscSeg *seg, const uint8_t *buf, size_t size)
+{
+uint16_t o_ip_len, n_ip_len;/* len in ip header field */
+uint16_t n_tcp_len, o_tcp_len;  /* tcp header len */
+uint16_t o_data, n_data;/* payload without virtio/eth/ip/tcp */
+struct ip6_header *n_ip, *o_ip;
+struct tcp_header *n_tcp, *o_tcp;
+
+n_ip = (struct ip6_header *)(buf + IP_OFFSET);
+n_ip_len = htons(n_ip->ip6_ctlun.ip6_un1.ip6_un1_plen);
+n_tcp = (struct tcp_header *)(((uint8_t *)n_ip)\
++ sizeof(struct ip6_header));
+n_tcp_len = (htons(n_tcp->th_offset_flags) & 0xF000) >> 10;
+n_data = n_ip_len - n_tcp_len;
+
+o_ip = (struct ip6_header *)(seg->buf + IP_OFFSET);
+o_ip_len = htons(o_ip->ip6_ctlun.ip6_un1.ip6_un1_plen);
+o_tcp = (struct tcp_header *)(((uint8_t *)o_ip)\
++ sizeof(struct ip6_header));
+o_tcp_len = (htons(o_tcp->th_offset_flags) & 0xF000) >> 10;
+o_data = o_ip_len - o_tcp_len;
+
+if (memcmp(_ip->ip6_src, _ip->ip6_src, sizeof(struct in6_address))
+|| memcmp(_ip->ip6_dst, _ip->ip6_dst, sizeof(struct in6_address))
+|| (n_tcp->th_sport ^ o_tcp->th_sport)
+|| (n_tcp->th_dport ^ o_tcp->th_dport)) {
+return RSC_NO_MATCH;
+}
+
+/* There is a difference between payload lenght in ipv4 and v6,
+   ip header is excluded in ipv6 */
+return virtio_net_rsc_coalesce_tcp(chain, seg, buf,
+   n_tcp, n_tcp_len, n_data, o_tcp, o_tcp_len, o_data,
+   _ip->ip6_ctlun.ip6_un1.ip6_un1_plen, MAX_IP6_PAYLOAD);
+}
 
 /* Pakcets with 'SYN' should bypass, other flag should be sent after drain
  * to prevent out of order */
@@ -2015,6 +2058,59 @@ static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
virtio_net_rsc_try_coalesce4);
 }
 
+static int32_t virtio_net_rsc_filter6(NetRscChain *chain, struct ip6_header *ip,
+  const uint8_t *buf, size_t size)
+{
+uint16_t ip_len;
+
+if (size < (TCP6_OFFSET + sizeof(tcp_header))) {
+return RSC_BYPASS;
+}
+
+if (0x6 != (0xF & ip->ip6_ctlun.ip6_un1.ip6_un1_flow)) {
+return RSC_BYPASS;
+}
+
+/* Both option and protocol is checked in this */
+if (ip->ip6_ctlun.ip6_un1.ip6_un1_nxt != IPPROTO_TCP) {
+return RSC_BYPASS;
+}
+
+/* Sanity check */
+ip_len = htons(ip->ip6_ctlun.ip6_un1.ip6_un1_plen);
+if (ip_len < sizeof(struct tcp_header)
+|| ip_len > (size - TCP6_OFFSET)) {
+return RSC_BYPASS;
+}
+
+return 0;
+}
+
+static size_t virtio_net_rsc_receive6(void *opq, NetClientState* nc,
+  const uint8_t *buf, size_t size)
+{
+int32_t ret;
+NetRscChain *chain;
+struct ip6_header *ip;
+
+chain = (NetRscChain *)opq;
+ip = (struct ip6_header *)(buf + IP_OFFSET);
+if (RSC_WANT != virtio_net_rsc_filter6(chain, ip, buf, size)) {
+return virtio_net_do_receive(nc, buf, size);
+}
+
+ret = virtio_net_rsc_parse_tcp_ctrl((uint8_t *)ip, sizeof(*ip));
+if (RSC_BYPASS == ret) {
+return virtio_net_do_receive(nc, buf, size);
+} else if (RSC_FINAL == ret) {
+return

[Qemu-devel] [RFC Patch v2 08/10] virtio-net rsc: Sanity check & More bypass cases check

2016-01-31 Thread wexu

From: Wei Xu 

Check more general exception cases:
1. Incorrect version in the IP header
2. IP options and IP fragments
3. Not a TCP packet
4. Sanity size check to prevent buffer overflow attacks.
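
The four checks can be sketched as one standalone predicate over the raw IPv4 header bytes. This is an illustration under stated assumptions rather than the patch code: ipv4_is_rsc_candidate is a hypothetical name, ip is assumed to point at the start of the IPv4 header, and avail is the number of bytes available from that point. Note one deliberate substitution: the sketch rejects fragments by testing the MF bit and the fragment offset, whereas the patch tests the DF bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the packet looks like an unfragmented, option-free
 * IPv4/TCP packet whose length fields are consistent with the buffer. */
static bool ipv4_is_rsc_candidate(const uint8_t *ip, size_t avail)
{
    assert(ip != NULL);

    if (avail < 20 + 20) {              /* minimal IPv4 + TCP headers */
        return false;
    }
    if ((ip[0] >> 4) != 4) {            /* 1. version must be 4 */
        return false;
    }
    if ((ip[0] & 0x0F) != 5) {          /* 2a. IHL of 5 means no options */
        return false;
    }
    uint16_t frag = (uint16_t)((ip[6] << 8) | ip[7]);
    if (frag & 0x3FFF) {                /* 2b. MF set or non-zero offset */
        return false;
    }
    if (ip[9] != 6) {                   /* 3. protocol must be TCP */
        return false;
    }
    uint16_t total = (uint16_t)((ip[2] << 8) | ip[3]);
    if (total < 40 || total > avail) {  /* 4. length sanity check */
        return false;
    }
    return true;
}
```

Any packet that fails the predicate would simply be delivered without coalescing, which matches the bypass behaviour described in the list.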

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b0987d0..9b44762 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1948,6 +1948,46 @@ static size_t virtio_net_rsc_drain_one(NetRscChain *chain, NetClientState *nc,
 
 return virtio_net_do_receive(nc, buf, size);
 }
+
+static int32_t virtio_net_rsc_filter4(NetRscChain *chain, struct ip_header *ip,
+  const uint8_t *buf, size_t size)
+{
+uint16_t ip_len;
+
+if (size < (TCP4_OFFSET + sizeof(tcp_header))) {
+return RSC_BYPASS;
+}
+
+/* Not an ipv4 one */
+if (0x4 != ((0xF0 & ip->ip_ver_len) >> 4)) {
+return RSC_BYPASS;
+}
+
+/* Don't handle packets with ip option */
+if (5 != (0xF & ip->ip_ver_len)) {
+return RSC_BYPASS;
+}
+
+/* Don't handle packets with ip fragment */
+if (!(htons(ip->ip_off) & IP_DF)) {
+return RSC_BYPASS;
+}
+
+if (ip->ip_p != IPPROTO_TCP) {
+return RSC_BYPASS;
+}
+
+/* Sanity check */
+ip_len = htons(ip->ip_len);
+if (ip_len < (sizeof(struct ip_header) + sizeof(struct tcp_header))
+|| ip_len > (size - IP_OFFSET)) {
+return RSC_BYPASS;
+}
+
+return RSC_WANT;
+}
+
+
 static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
   const uint8_t *buf, size_t size)
 {
@@ -1958,6 +1998,10 @@ static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
 chain = (NetRscChain *)opq;
 ip = (struct ip_header *)(buf + IP_OFFSET);
 
+if (RSC_WANT != virtio_net_rsc_filter4(chain, ip, buf, size)) {
+return virtio_net_do_receive(nc, buf, size);
+}
+
 ret = virtio_net_rsc_parse_tcp_ctrl((uint8_t *)ip,
 (0xF & ip->ip_ver_len) << 2);
 if (RSC_BYPASS == ret) {
-- 
2.4.0

[Qemu-devel] [RFC Patch v2 06/10] virtio-net rsc: IPv4 checksum

2016-01-31 Thread wexu

From: Wei Xu 

If a field in the IPv4 header is modified, then the checksum
has to be recalculated before the packet is sent out.
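
For reference, the recalculation is the standard one's-complement Internet checksum over the header. The sketch below is self-contained and does not use QEMU's net_checksum helpers; ipv4_header_checksum is a hypothetical name, and the caller is assumed to have zeroed the checksum field (bytes 10 and 11) beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of 16-bit big-endian words, folded and inverted.
 * The result is the value to store big-endian in the checksum field. */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    assert(hdr != NULL);
    assert(len % 2 == 0);       /* IPv4 headers are a multiple of 4 bytes */

    uint32_t sum = 0;
    for (size_t i = 0; i < len; i += 2) {
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    }
    while (sum >> 16) {         /* fold the carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    return (uint16_t)~sum;
}
```

A useful property for debugging: running the same function over a header that already contains a correct checksum yields 0.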

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 93df0d5..88fc4f8 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1630,6 +1630,18 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
 return 0;
 }
 
+static void virtio_net_rsc_ipv4_checksum(NetRscSeg *seg)
+{
+uint32_t sum;
+struct ip_header *ip;
+
+ip = (struct ip_header *)(seg->buf + IP_OFFSET);
+
+ip->ip_sum = 0;
+sum = net_checksum_add_cont(sizeof(struct ip_header), (uint8_t *)ip, 0);
+ip->ip_sum = cpu_to_be16(net_checksum_finish(sum));
+}
+
 static void virtio_net_rsc_purge(void *opq)
 {
 int ret = 0;
@@ -1643,6 +1655,10 @@ static void virtio_net_rsc_purge(void *opq)
 continue;
 }
 
+if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
+virtio_net_rsc_ipv4_checksum(seg);
+}
+
 ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
 QTAILQ_REMOVE(>buffers, seg, next);
 g_free(seg->buf);
@@ -1853,6 +1869,9 @@ static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
 QTAILQ_FOREACH_SAFE(seg, >buffers, next, nseg) {
 ret = coalesce(chain, seg, buf, size);
 if (RSC_FINAL == ret) {
+if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
+virtio_net_rsc_ipv4_checksum(seg);
+}
 ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
 QTAILQ_REMOVE(>buffers, seg, next);
 g_free(seg->buf);
-- 
2.4.0

[Qemu-devel] [RFC Patch v2 04/10] virtio-net rsc: Detailed IPv4 and General TCP data coalescing

2016-01-31 Thread wexu

From: Wei Xu 

Since this feature also needs to support IPv6, and the IPv4 and IPv6
headers have some protocol-specific differences, the interface is kept
general.

The IPv4/6 code should set up both the new and old IP/TCP headers before
invoking TCP coalescing, and should also pass in the real payload size.

The main TCP handler covers TCP window updates, the duplicated ACK check,
and the actual data coalescing if the new segment passes the filter for
invalid packets and is identified as an expected one.

An expected segment means:
1. The segment is within the current window and its sequence number is
   the expected one.
2. The ACK of the segment is in the valid window.
3. If the ACK in the segment is a duplicate, the duplicate count must be
   less than 2; per the spec, this lets the upper-layer TCP start
   retransmission.
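
The sequence-number part of rule 1 can be sketched as follows. This is a simplified illustration, not the patch code: struct sk_cached_seg, the verdict enum and check_seq are hypothetical names, and the 65535-byte window is taken from the TCP_WINDOW value used in this series. The point it demonstrates is that unsigned subtraction keeps the comparison correct across sequence-number wraparound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the segment already held in the cache. */
struct sk_cached_seg {
    uint32_t seq;       /* sequence number of the cached segment */
    uint16_t data_len;  /* TCP payload bytes already cached */
};

enum sk_seq_verdict { SK_SEQ_IN_ORDER, SK_SEQ_SAME, SK_SEQ_OUT_OF_ORDER };

static enum sk_seq_verdict check_seq(const struct sk_cached_seg *old,
                                     uint32_t new_seq)
{
    assert(old != NULL);
    uint32_t delta = new_seq - old->seq;    /* modulo 2^32, wrap-safe */

    if (delta > 65535) {
        return SK_SEQ_OUT_OF_ORDER;         /* outside window or behind */
    }
    if (delta == 0) {
        return SK_SEQ_SAME;                 /* ACK or window-update handling */
    }
    if (delta != old->data_len) {
        return SK_SEQ_OUT_OF_ORDER;         /* gap or overlap in the data */
    }
    return SK_SEQ_IN_ORDER;                 /* directly follows cached data */
}
```

Only the in-order verdict would lead to the new payload being appended to the cached segment; the same-sequence case is where the duplicate ACK and window-update rules apply.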

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 127 ++--
 1 file changed, 124 insertions(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index cfbac6d..4f77fbe 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -41,6 +41,10 @@
 
 #define VIRTIO_HEADER   12/* Virtio net header size */
 #define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
+#define TCP_WINDOW  65535
+
+/* IPv4 max payload, 16 bits in the header */
+#define MAX_IP4_PAYLOAD  (65535 - sizeof(struct ip_header))
 
 #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
 
@@ -1670,13 +1674,130 @@ out:
 return 0;
 }
 
+static int32_t virtio_net_rsc_handle_ack(NetRscChain *chain, NetRscSeg *seg,
+ const uint8_t *buf, struct tcp_header *n_tcp,
+ struct tcp_header *o_tcp)
+{
+uint32_t nack, oack;
+uint16_t nwin, owin;
+
+nack = htonl(n_tcp->th_ack);
+nwin = htons(n_tcp->th_win);
+oack = htonl(o_tcp->th_ack);
+owin = htons(o_tcp->th_win);
+
+if ((nack - oack) >= TCP_WINDOW) {
+return RSC_FINAL;
+} else if (nack == oack) {
+/* duplicated ack or window probe */
+if (nwin == owin) {
+/* duplicated ack, add dup ack count due to whql test up to 1 */
+
+if (seg->dup_ack_count == 0) {
+seg->dup_ack_count++;
+return RSC_COALESCE;
+} else {
+/* Spec says should send it directly */
+return RSC_FINAL;
+}
+} else {
+/* Coalesce window update */
+o_tcp->th_win = n_tcp->th_win;
+return RSC_COALESCE;
+}
+} else {
+/* pure ack, update ack */
+o_tcp->th_ack = n_tcp->th_ack;
+return RSC_COALESCE;
+}
+}
+
+static int32_t virtio_net_rsc_coalesce_tcp(NetRscChain *chain, NetRscSeg *seg,
+   const uint8_t *buf, struct tcp_header *n_tcp, uint16_t n_tcp_len,
+   uint16_t n_data, struct tcp_header *o_tcp, uint16_t o_tcp_len,
+   uint16_t o_data, uint16_t *p_ip_len, uint16_t max_data)
+{
+void *data;
+uint16_t o_ip_len;
+uint32_t nseq, oseq;
+
+o_ip_len = htons(*p_ip_len);
+nseq = htonl(n_tcp->th_seq);
+oseq = htonl(o_tcp->th_seq);
+
+/* Ignore packet with more/larger tcp options */
+if (n_tcp_len > o_tcp_len) {
+return RSC_FINAL;
+}
+
+/* out of order or retransmitted. */
+if ((nseq - oseq) > TCP_WINDOW) {
+return RSC_FINAL;
+}
+
+data = ((uint8_t *)n_tcp) + n_tcp_len;
+if (nseq == oseq) {
+if ((0 == o_data) && n_data) {
+/* From no payload to payload, normal case, not a dup ack or etc */
+goto coalesce;
+} else {
+return virtio_net_rsc_handle_ack(chain, seg, buf, n_tcp, o_tcp);
+}
+} else if ((nseq - oseq) != o_data) {
+/* Not a consistent packet, out of order */
+return RSC_FINAL;
+} else {
+coalesce:
+if ((o_ip_len + n_data) > max_data) {
+return RSC_FINAL;
+}
+
+/* Here comes the right data, the payload lengh in v4/v6 is different,
+   so use the field value to update */
+*p_ip_len = htons(o_ip_len + n_data); /* Update new data len */
+o_tcp->th_offset_flags = n_tcp->th_offset_flags; /* Bring 'PUSH' big */
+o_tcp->th_ack = n_tcp->th_ack;
+o_tcp->th_win = n_tcp->th_win;
+
+memmove(seg->buf + seg->size, data, n_data);
+seg->size += n_data;
+return RSC_COALESCE;
+}
+}
 
 static int32_t virtio_net_rsc_try_coalesce4(NetRscChain *chain,
NetRscSeg *seg, const uint8_t *buf, size_t size)
 {
-/* This real part of this function will be introduced in next patch, just
-*  return a 'final' to feed the compilation. */
-return RSC_FINAL;
+uint16_t o_ip_len, n_ip_len;/* len in ip header field */
+uint16_t n_ip_hdrlen, o_ip_hdrlen;  /* ipv4 header len */
+uint16_t n_tcp_len, o_tcp_len;  /* tcp

[Qemu-devel] [RFC Patch v2 02/10] virtio-net rsc: Initilize & Cleanup

2016-01-31 Thread wexu

From: Wei Xu 

The chain list is initialized when the device is realized, and chain
entries are inserted dynamically according to the protocol type of the
network traffic.

All the buffered packets and chains are destroyed when the device is
unrealized.

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c| 22 ++
 include/hw/virtio/virtio-net.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index a877614..4e9458e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1603,6 +1603,26 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
 return 0;
 }
 
+
+static void virtio_net_rsc_cleanup(VirtIONet *n)
+{
+NetRscChain *chain, *rn_chain;
+NetRscSeg *seg, *rn_seg;
+
+QTAILQ_FOREACH_SAFE(chain, >rsc_chains, next, rn_chain) {
+QTAILQ_FOREACH_SAFE(seg, >buffers, next, rn_seg) {
+QTAILQ_REMOVE(>buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);
+
+timer_del(chain->drain_timer);
+timer_free(chain->drain_timer);
+QTAILQ_REMOVE(>rsc_chains, chain, next);
+g_free(chain);
+}
+}
+}
+
 static NetClientInfo net_virtio_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
@@ -1732,6 +1752,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 nc = qemu_get_queue(n->nic);
 nc->rxfilter_notify_enabled = 1;
 
+QTAILQ_INIT(>rsc_chains);
 n->qdev = dev;
 register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
 virtio_net_save, virtio_net_load, n);
@@ -1766,6 +1787,7 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
 g_free(n->vqs);
 qemu_del_nic(n->nic);
 virtio_cleanup(vdev);
+virtio_net_rsc_cleanup(n);
 }
 
 static void virtio_net_instance_init(Object *obj)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index f3cc25f..6ce8b93 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -59,6 +59,7 @@ typedef struct VirtIONet {
 VirtIONetQueue *vqs;
 VirtQueue *ctrl_vq;
 NICState *nic;
+QTAILQ_HEAD(, NetRscChain) rsc_chains;
 uint32_t tx_timeout;
 int32_t tx_burst;
 uint32_t has_vnet_hdr;
-- 
2.4.0

[Qemu-devel] [RFC v2 0/10] Support Receive-Segment-Offload(RSC) for WHQL test of Window guest

2016-01-31 Thread wexu

From: Wei Xu 

v2 adds detailed commit logs.

This series adds support for the WHQL test for Windows guests. The feature
also benefits other guests, since it works like the kernel's 'gro' feature
but is implemented in userspace.
Feature information:
  http://msdn.microsoft.com/en-us/library/windows/hardware/jj853324

Both IPv4 and IPv6 are supported. Although userspace virtio is still slower
than vhost-net, userspace virtio throughput improves by about 30-40 percent
when this feature is turned on and 'tso' is disabled on the corresponding
tap interface.

Test steps:
Although this feature is mainly intended for Windows guests, I used Linux
guests to help test it. To keep things simple, I tested the patch in 3 steps
as I went along:
1. A TCP socket client/server pair running on 2 Linux guests, so that I can
control the traffic and debug the code as needed.
2. Netperf on a Linux guest to test the throughput.
3. The WHQL test with 2 Windows guests.

Current status:
IPv4 passes all of the above tests.
IPv6 has only passed test steps 1 and 2 described above. In the WHQL test the
virtio nic cannot receive any packets, although debugging on the host side
shows that all the packets have been pushed to the vring. After replacing the
Windows guest with a Linux guest and adding 10 extra packets before sending
the real packet, tcpdump running on the guest captures only 6 packets. I have
not found the root cause yet and will continue working on this.

Note:
A 'MessageDevice' nic chosen as 'Realtek' sometimes panics the system during
setup; this can be worked around by replacing it with an 'e1000' nic.

Pending issues & Todo list:
1. The dup ack count is not added to the virtio_net_hdr, yet the WHQL test
case passes; this looks like a bug in the test case.
2. A feature bit is missing.
3. Some TCP/IP handling is missing:
   ECN change.
   TCP window scale.

Wei Xu (10):
  virtio-net rsc: Data structure, 'Segment', 'Chain' and 'Status'
  virtio-net rsc: Initilize & Cleanup
  virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4
  virtio-net rsc: Detailed IPv4 and General TCP data coalescing
  virtio-net rsc: Create timer to drain the packets from the cache pool
  virtio-net rsc: IPv4 checksum
  virtio-net rsc: Checking TCP flag and drain specific connection
packets
  virtio-net rsc: Sanity check & More bypass cases check
  virtio-net rsc: Add IPv6 support
  virtio-net rsc: Add Receive Segment Coalesce statistics

 hw/net/virtio-net.c| 626 -
 include/hw/virtio/virtio-net.h |   1 +
 include/hw/virtio/virtio.h |  65 +
 3 files changed, 691 insertions(+), 1 deletion(-)

-- 
2.4.0

[Qemu-devel] [RFC Patch v2 03/10] virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4

2016-01-31 Thread wexu

From: Wei Xu 

When a packet arrives, a corresponding chain is selected or created, or the
packet is bypassed if it is not an IPv4 packet.

The callback in the chain is invoked to do the actual coalescing.

Since coalescing is based on the TCP connection, a packet is cached if there
is no previous data within the same connection.

The framework of IPv4 is also introduced.

This patch depends on patch 2918cf2 (Detailed IPv4 and General TCP data
coalescing)
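
The cache-or-coalesce flow described above can be sketched in isolation as
follows. This is a rough sketch under assumptions, not the code in this
patch: the names Seg, Chain, toy_coalesce and rsc_receive are hypothetical,
an integer 'conn' stands in for the TCP connection lookup, MAX_SEG_SKETCH is
an arbitrary limit, and delivery to the guest is modelled by a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SEG_SKETCH 1000   /* arbitrary size limit, for illustration only */

typedef enum { COALESCED, FINAL, NO_MATCH } Status;   /* hypothetical subset */

typedef struct Seg {
    struct Seg *next;
    int conn;        /* stand-in for the TCP connection identity */
    size_t size;     /* bytes accumulated so far */
} Seg;

typedef struct Chain {
    Seg *buffers;
    int delivered;   /* packets handed on, for observation only */
} Chain;

typedef Status (Coalesce)(Seg *seg, int conn, size_t size);

/* Toy coalescer: merge while the segment stays under the limit. */
static Status toy_coalesce(Seg *seg, int conn, size_t size)
{
    if (seg->conn != conn) {
        return NO_MATCH;
    }
    if (seg->size + size > MAX_SEG_SKETCH) {
        return FINAL;
    }
    seg->size += size;
    return COALESCED;
}

/* Cache a packet as a new segment; returns 0 on allocation failure. */
static size_t cache_seg(Chain *c, int conn, size_t size)
{
    Seg *s = calloc(1, sizeof(*s));

    if (!s) {
        return 0;
    }
    s->conn = conn;
    s->size = size;
    s->next = c->buffers;
    c->buffers = s;
    return size;
}

static size_t rsc_receive(Chain *c, int conn, size_t size, Coalesce *coalesce)
{
    Seg **pp;

    assert(coalesce != NULL);
    for (pp = &c->buffers; *pp; pp = &(*pp)->next) {
        Seg *seg = *pp;
        Status st = coalesce(seg, conn, size);

        if (st == FINAL) {
            /* Flush the cached segment, then the current packet. */
            *pp = seg->next;
            free(seg);
            c->delivered += 2;
            return size;
        } else if (st == COALESCED) {
            return size;
        }
        /* NO_MATCH: try the next cached segment. */
    }
    /* Empty list or no matching connection: cache the packet. */
    return cache_seg(c, conn, size);
}
```

An empty list needs no special case here, because the loop body never runs
and the packet falls through to cache_seg().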

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 173 +++-
 1 file changed, 172 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 4e9458e..cfbac6d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -14,10 +14,12 @@
 #include "qemu/iov.h"
 #include "hw/virtio/virtio.h"
 #include "net/net.h"
+#include "net/eth.h"
 #include "net/checksum.h"
 #include "net/tap.h"
 #include "qemu/error-report.h"
 #include "qemu/timer.h"
+#include "qemu/sockets.h"
 #include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "hw/virtio/virtio-bus.h"
@@ -37,6 +39,21 @@
 #define endof(container, field) \
 (offsetof(container, field) + sizeof(((container *)0)->field))
 
+#define VIRTIO_HEADER   12/* Virtio net header size */
+#define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
+
+#define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
+
+/* Global statistics */
+static uint32_t rsc_chain_no_mem;
+
+/* Switcher to enable/disable rsc */
+static bool virtio_net_rsc_bypass;
+
+/* Coalesce callback for ipv4/6 */
+typedef int32_t (VirtioNetCoalesce) (NetRscChain *chain, NetRscSeg *seg,
+ const uint8_t *buf, size_t size);
+
 typedef struct VirtIOFeature {
 uint32_t flags;
 size_t end;
@@ -1019,7 +1036,8 @@ static int receive_filter(VirtIONet *n, const uint8_t 
*buf, int size)
 return 0;
 }
 
-static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, 
size_t size)
+static ssize_t virtio_net_do_receive(NetClientState *nc,
+  const uint8_t *buf, size_t size)
 {
 VirtIONet *n = qemu_get_nic_opaque(nc);
 VirtIONetQueue *q = virtio_net_get_subqueue(nc);
@@ -1623,6 +1641,159 @@ static void virtio_net_rsc_cleanup(VirtIONet *n)
 }
 }
 
+static int virtio_net_rsc_cache_buf(NetRscChain *chain, NetClientState *nc,
+const uint8_t *buf, size_t size)
+{
+NetRscSeg *seg;
+
+seg = g_malloc(sizeof(NetRscSeg));
+if (!seg) {
+return 0;
+}
+
+seg->buf = g_malloc(MAX_VIRTIO_IP_PAYLOAD);
+if (!seg->buf) {
+goto out;
+}
+
+memmove(seg->buf, buf, size);
+seg->size = size;
+seg->dup_ack_count = 0;
+seg->is_coalesced = 0;
+seg->nc = nc;
+
+QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
+return size;
+
+out:
+g_free(seg);
+return 0;
+}
+
+
+static int32_t virtio_net_rsc_try_coalesce4(NetRscChain *chain,
+   NetRscSeg *seg, const uint8_t *buf, size_t size)
+{
+/* This real part of this function will be introduced in next patch, just
+*  return a 'final' to feed the compilation. */
+return RSC_FINAL;
+}
+
+static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
+const uint8_t *buf, size_t size, VirtioNetCoalesce *coalesce)
+{
+int ret;
+NetRscSeg *seg, *nseg;
+
+if (QTAILQ_EMPTY(&chain->buffers)) {
+if (!virtio_net_rsc_cache_buf(chain, nc, buf, size)) {
+return 0;
+} else {
+return size;
+}
+}
+
+QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
+ret = coalesce(chain, seg, buf, size);
+if (RSC_FINAL == ret) {
+ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
+QTAILQ_REMOVE(&chain->buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);
+if (ret == 0) {
+/* Send failed */
+return 0;
+}
+
+/* Send current packet */
+return virtio_net_do_receive(nc, buf, size);
+} else if (RSC_NO_MATCH == ret) {
+continue;
+} else {
+/* Coalesced, mark coalesced flag to tell calc cksum for ipv4 */
+seg->is_coalesced = 1;
+return size;
+}
+}
+
+return virtio_net_rsc_cache_buf(chain, nc, buf, size);
+}
+
+static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
+  const uint8_t *buf, size_t size)
+{
+NetRscChain *chain;
+
+chain = (NetRscChain *)opq;
+return virtio_net_rsc_callback(chain, nc, buf, size,
+   virtio_net_rsc_try_coalesce4);
+}
+
+static NetRscChain *virtio_net_rsc_lookup_chain(NetClientState *nc,
+uint16_t proto)
+{
+VirtIONet *n;

[Qemu-devel] [RFC Patch v2 01/10] virtio-net rsc: Data structure, 'Segment', 'Chain' and 'Status'

2016-01-31 Thread wexu

From: Wei Xu 

A Segment holds the coalesced packets of a connection.

Status indicates the outcome of a coalescing attempt, such as whether a
packet was bypassed or coalesced.

Chains store the segments of the different protocols in a VirtIONet
instance.

Each chain uses a timer to help purge the buffered/coalesced packets.

Signed-off-by: Wei Xu 
---
 include/hw/virtio/virtio.h | 32 
 1 file changed, 32 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 205fadf..1383220 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -127,6 +127,38 @@ typedef struct VirtioDeviceClass {
 int (*load)(VirtIODevice *vdev, QEMUFile *f, int version_id);
 } VirtioDeviceClass;
 
+/* Coalesced packets type & status */
+typedef enum {
+RSC_COALESCE,   /* Data been coalesced */
+RSC_FINAL,  /* Will terminate current connection */
+RSC_NO_MATCH,   /* No matched in the buffer pool */
+RSC_BYPASS, /* Packet to be bypass, not tcp, tcp ctrl, etc */
+RSC_WANT/* Data want to be coalesced */
+} COALESCE_STATUS;
+
+/* Coalesced segmant */
+typedef struct NetRscSeg {
+QTAILQ_ENTRY(NetRscSeg) next;
+void *buf;
+size_t size;
+uint32_t dup_ack_count;
+bool is_coalesced;  /* need recal ipv4 header checksum, mark here */
+NetClientState *nc;
+} NetRscSeg;
+
+/* Receive callback for ipv4/6 */
+typedef size_t (VirtioNetReceive) (void *,
+   NetClientState *, const uint8_t *, size_t);
+
+/* Chain is divided by protocol(ipv4/v6) and NetClientInfo */
+typedef struct NetRscChain {
+QTAILQ_ENTRY(NetRscChain) next;
+uint16_t proto;
+VirtioNetReceive *do_receive;
+QEMUTimer *drain_timer;
+QTAILQ_HEAD(, NetRscSeg) buffers;
+} NetRscChain;
+
 void virtio_instance_init_common(Object *proxy_obj, void *data,
  size_t vdev_size, const char *vdev_name);
 
-- 
2.4.0

Re: [Qemu-devel] [RFC Patch v2 03/10] virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4

2016-01-31 Thread Michael S. Tsirkin

On Mon, Feb 01, 2016 at 02:13:22AM +0800, w...@redhat.com wrote:
> From: Wei Xu 
> 
> Upon a packet is arriving, a corresponding chain will be selected or created,
> or be bypassed if it's not an IPv4 packets.
> 
> The callback in the chain will be invoked to call the real coalescing.
> 
> Since the coalescing is based on the TCP connection, so the packets will be
> cached if there is no previous data within the same connection.
> 
> The framework of IPv4 is also introduced.
> 
> This patch depends on patch 2918cf2 (Detailed IPv4 and General TCP data
> coalescing)
> 
> Signed-off-by: Wei Xu 
> ---
>  hw/net/virtio-net.c | 173 
> +++-
>  1 file changed, 172 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 4e9458e..cfbac6d 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -14,10 +14,12 @@
>  #include "qemu/iov.h"
>  #include "hw/virtio/virtio.h"
>  #include "net/net.h"
> +#include "net/eth.h"
>  #include "net/checksum.h"
>  #include "net/tap.h"
>  #include "qemu/error-report.h"
>  #include "qemu/timer.h"
> +#include "qemu/sockets.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "net/vhost_net.h"
>  #include "hw/virtio/virtio-bus.h"
> @@ -37,6 +39,21 @@
>  #define endof(container, field) \
>  (offsetof(container, field) + sizeof(((container *)0)->field))
>  
> +#define VIRTIO_HEADER   12/* Virtio net header size */
> +#define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
> +
> +#define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
> +
> +/* Global statistics */
> +static uint32_t rsc_chain_no_mem;
> +
> +/* Switcher to enable/disable rsc */
> +static bool virtio_net_rsc_bypass;
> +
> +/* Coalesce callback for ipv4/6 */
> +typedef int32_t (VirtioNetCoalesce) (NetRscChain *chain, NetRscSeg *seg,
> + const uint8_t *buf, size_t size);
> +

Since there are only 2 cases, it's probably better to just
open-code if (v4) -> coalesce4 else if v6 -> coalesce6
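
For illustration, an open-coded dispatch along those lines might look like
the sketch below. The function and stub names are hypothetical, the stubs
only report which path was taken, and the EtherType values are the standard
ones for IPv4 and IPv6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IPV4_SKETCH 0x0800   /* standard IPv4 EtherType */
#define ETHERTYPE_IPV6_SKETCH 0x86DD   /* standard IPv6 EtherType */

/* Hypothetical stubs standing in for the real coalescing functions. */
static int coalesce4_stub(const uint8_t *buf, size_t size)
{
    (void)buf;
    (void)size;
    return 4;
}

static int coalesce6_stub(const uint8_t *buf, size_t size)
{
    (void)buf;
    (void)size;
    return 6;
}

/* Branch on the protocol directly instead of passing a callback.
 * Returns -1 for anything that should bypass coalescing. */
static int rsc_try_coalesce(uint16_t proto, const uint8_t *buf, size_t size)
{
    assert(buf != NULL);
    if (proto == ETHERTYPE_IPV4_SKETCH) {
        return coalesce4_stub(buf, size);
    } else if (proto == ETHERTYPE_IPV6_SKETCH) {
        return coalesce6_stub(buf, size);
    }
    return -1;
}
```

With only two protocols, the branch keeps the call targets visible at the
call site and avoids the function-pointer typedef.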

>  typedef struct VirtIOFeature {
>  uint32_t flags;
>  size_t end;
> @@ -1019,7 +1036,8 @@ static int receive_filter(VirtIONet *n, const uint8_t 
> *buf, int size)
>  return 0;
>  }
>  
> -static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, 
> size_t size)
> +static ssize_t virtio_net_do_receive(NetClientState *nc,
> +  const uint8_t *buf, size_t size)
>  {
>  VirtIONet *n = qemu_get_nic_opaque(nc);
>  VirtIONetQueue *q = virtio_net_get_subqueue(nc);
> @@ -1623,6 +1641,159 @@ static void virtio_net_rsc_cleanup(VirtIONet *n)
>  }
>  }
>  
> +static int virtio_net_rsc_cache_buf(NetRscChain *chain, NetClientState *nc,
> +const uint8_t *buf, size_t size)
> +{
> +NetRscSeg *seg;
> +
> +seg = g_malloc(sizeof(NetRscSeg));
> +if (!seg) {
> +return 0;
> +}
> +
> +seg->buf = g_malloc(MAX_VIRTIO_IP_PAYLOAD);
> +if (!seg->buf) {
> +goto out;
> +}
> +
> +memmove(seg->buf, buf, size);
> +seg->size = size;
> +seg->dup_ack_count = 0;
> +seg->is_coalesced = 0;
> +seg->nc = nc;
> +
> +QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
> +return size;
> +
> +out:
> +g_free(seg);
> +return 0;
> +}
> +
> +
> +static int32_t virtio_net_rsc_try_coalesce4(NetRscChain *chain,
> +   NetRscSeg *seg, const uint8_t *buf, size_t size)
> +{
> +/* This real part of this function will be introduced in next patch, just
> +*  return a 'final' to feed the compilation. */
> +return RSC_FINAL;
> +}
> +
> +static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
> +const uint8_t *buf, size_t size, VirtioNetCoalesce *coalesce)
> +{
> +int ret;
> +NetRscSeg *seg, *nseg;
> +
> +if (QTAILQ_EMPTY(&chain->buffers)) {
> +if (!virtio_net_rsc_cache_buf(chain, nc, buf, size)) {
> +return 0;
> +} else {
> +return size;
> +}
> +}
> +
> +QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
> +ret = coalesce(chain, seg, buf, size);
> +if (RSC_FINAL == ret) {
> +ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
> +QTAILQ_REMOVE(&chain->buffers, seg, next);
> +g_free(seg->buf);
> +g_free(seg);
> +if (ret == 0) {
> +/* Send failed */
> +return 0;
> +}
> +
> +/* Send current packet */
> +return virtio_net_do_receive(nc, buf, size);
> +} else if (RSC_NO_MATCH == ret) {
> +continue;
> +} else {
> +/* Coalesced, mark coalesced flag to tell calc cksum for ipv4 */
> +seg->is_coalesced = 1;
> +return size;
> +}
> +}
> +
> +return virtio_net_rsc_cache_buf(chain, nc, buf, size);
> +}
> +

Re: [Qemu-devel] [PATCH 3/3] ppc: include timebase in migration stream for g3beige/mac99 machines

2016-01-31 Thread Mark Cave-Ayland

On 31/01/16 19:58, Peter Maydell wrote:

> On 31 January 2016 at 19:19, Mark Cave-Ayland
>  wrote:
>> Signed-off-by: Mark Cave-Ayland 
>> ---
>>  hw/ppc/mac_newworld.c |4 
>>  hw/ppc/mac_oldworld.c |4 
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
>> index f95086b..3283f1d 100644
>> --- a/hw/ppc/mac_newworld.c
>> +++ b/hw/ppc/mac_newworld.c
>> @@ -179,6 +179,7 @@ static void ppc_core99_init(MachineState *machine)
>>  int *token = g_new(int, 1);
>>  hwaddr nvram_addr = 0xFFF04000;
>>  uint64_t tbfreq;
>> +PPCTimebase *tb;
>>
>>  linux_boot = (kernel_filename != NULL);
>>
>> @@ -201,6 +202,9 @@ static void ppc_core99_init(MachineState *machine)
>>  /* Set time-base frequency to 100 Mhz */
>>  cpu_ppc_tb_init(env, TBFREQ);
>>  qemu_register_reset(ppc_core99_reset, cpu);
>> +
>> +tb = g_malloc0(sizeof(PPCTimebase));
>> +vmstate_register(NULL, -1, &vmstate_ppc_timebase, tb);
> 
> Is there no way to avoid the vmstate_register here (ie to
> tie the migration data to an actual device or CPU object) ?

Not that I know of, although I shamelessly borrowed this part
from similar code in spapr, which has this comment:

/* FIXME: Should register things through the MachineState's qdev
 * interface, this is a legacy from the sPAPREnvironment structure
 * which predated MachineState but had a similar function */

Is this something that is now possible?


ATB,

Mark.

Re: [Qemu-devel] [PULL 00/39] ppc-for-2.6 queue 20160129

2016-01-31 Thread David Gibson

On Sat, Jan 30, 2016 at 11:29:43PM +1100, David Gibson wrote:
> On Fri, Jan 29, 2016 at 02:48:23PM +, Peter Maydell wrote:
> > On 29 January 2016 at 05:06, David Gibson  
> > wrote:
> > > The following changes since commit 
> > > 357e81c7e880f868833edf9f53cce1f3b09ea8ec:
> > >
> > >   Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20160128' into 
> > > staging (2016-01-28 11:46:34 +)
> > >
> > > are available in the git repository at:
> > >
> > >   git://github.com/dgibson/qemu.git tags/ppc-for-2.6-20160129
> > >
> > > for you to fetch changes up to 1699679e699276c0538008f6ca74cd04e6c68b42:
> > >
> > >   target-ppc: Make every FPSCR_ macro have a corresponding FP_ macro 
> > > (2016-01-29 14:01:52 +1100)
> > >
> > > 
> > > ppc patch queue for 2016-01-29
> > >
> > > Currently accumulated patches for target-ppc, pseries machine type and
> > > related devices.
> > >   * Cleanup of error handling code in spapr
> > >   * A number of fixes for Macintosh devices for the benefit of MacOS 9 
> > > and X
> > >   * Remove some abuses of the RTAS memory access functions in spapr
> > >   * Fixes for the gdbstub (and monitor debug) for VMX and VSX extensions.
> > >   * Fix pseries machine hotplug memory under TCG
> > >   * Clean up and extend handling of multiple page sizes with 64-bit hash 
> > > MMUs
> > >
> > 
> > Hi. Unfortunately this generates errors when built with clang:
> > 
> > /home/petmay01/linaro/qemu-for-merges/target-ppc/mmu_helper.c:660:20:
> > error: unused function 'ppc4xx_tlb_invalidate_virt'
> > [-Werror,-Wunused-function]
> > static inline void ppc4xx_tlb_invalidate_virt(CPUPPCState *env,
> >^
> > 1 error generated.
> > 
> > The function does appear from a quick grep to be entirely unused...
> > 
> > (GCC doesn't complain about this because it doesn't warn about unused
> > static inline functions in a .c file, but clang does.)
> 
> Dammit.  Sorry.
> 
> Now.. why didn't travis pick that up :/.

Turns out the answer is because the test for support of #pragma GCC
diagnostic turns off -Werror on clang builds for me.  So I wonder
what's different about your setup that -Werror is working with clang.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] Strange monitor/stdout issue on qemu-system-sparc/qemu-system-ppc

2016-01-31 Thread Paolo Bonzini



On 31/01/2016 18:54, Peter Maydell wrote:
> On 31 January 2016 at 17:19, Paolo Bonzini  wrote:
>> On 31/01/2016 16:54, Mark Cave-Ayland wrote:
>>> I also notice that with the above commit I lose cycling through history
>>> in the GTK monitor - even with the multiple echo, instead of the up/down
>>> arrow keys cycling through the history instead I see the codes ^[[B and
>>> ^[[A being output to the window instead.
>>
>> That is probably me.  The echo feature was introduced for QMP, but in
>> theory it should have been limited to that.  I'll check it, thanks.
> 
> I've also seen echo, but only intermittently...

That smells like uninitialized memory or something like that.

Actually I'm fairly sure I tested "-monitor vc" at least, so perhaps
it's an interaction between the echo feature and "qemu-char: add logfile
facility to all chardev backends".  Anyway I'll look at it.

Paolo

Re: [Qemu-devel] [RFC Patch v2 02/10] virtio-net rsc: Initilize & Cleanup

2016-01-31 Thread Michael S. Tsirkin

On Mon, Feb 01, 2016 at 02:13:21AM +0800, w...@redhat.com wrote:
> From: Wei Xu 
> 
> The chain list is initialized when the device is getting realized,
> and the entry of the chain will be inserted dynamically according
> to protocol type of the network traffic.
> 
> All the buffered packets and chain will be destroyed when the
> device is going to be unrealized.
> 
> Signed-off-by: Wei Xu 

What happens during migration?

> ---
>  hw/net/virtio-net.c| 22 ++
>  include/hw/virtio/virtio-net.h |  1 +
>  2 files changed, 23 insertions(+)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index a877614..4e9458e 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1603,6 +1603,26 @@ static int virtio_net_load_device(VirtIODevice *vdev, 
> QEMUFile *f,
>  return 0;
>  }
>  
> +
> +static void virtio_net_rsc_cleanup(VirtIONet *n)
> +{
> +NetRscChain *chain, *rn_chain;
> +NetRscSeg *seg, *rn_seg;
> +
> +QTAILQ_FOREACH_SAFE(chain, &n->rsc_chains, next, rn_chain) {
> +QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn_seg) {
> +QTAILQ_REMOVE(&chain->buffers, seg, next);
> +g_free(seg->buf);
> +g_free(seg);
> +
> +timer_del(chain->drain_timer);
> +timer_free(chain->drain_timer);
> +QTAILQ_REMOVE(&n->rsc_chains, chain, next);
> +g_free(chain);
> +}
> +}
> +}
> +
>  static NetClientInfo net_virtio_info = {
>  .type = NET_CLIENT_OPTIONS_KIND_NIC,
>  .size = sizeof(NICState),
> @@ -1732,6 +1752,7 @@ static void virtio_net_device_realize(DeviceState *dev, 
> Error **errp)
>  nc = qemu_get_queue(n->nic);
>  nc->rxfilter_notify_enabled = 1;
>  
> +QTAILQ_INIT(&n->rsc_chains);
>  n->qdev = dev;
>  register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
>  virtio_net_save, virtio_net_load, n);
> @@ -1766,6 +1787,7 @@ static void virtio_net_device_unrealize(DeviceState 
> *dev, Error **errp)
>  g_free(n->vqs);
>  qemu_del_nic(n->nic);
>  virtio_cleanup(vdev);
> +virtio_net_rsc_cleanup(n);
>  }
>  
>  static void virtio_net_instance_init(Object *obj)
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index f3cc25f..6ce8b93 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -59,6 +59,7 @@ typedef struct VirtIONet {
>  VirtIONetQueue *vqs;
>  VirtQueue *ctrl_vq;
>  NICState *nic;
> +QTAILQ_HEAD(, NetRscChain) rsc_chains;
>  uint32_t tx_timeout;
>  int32_t tx_burst;
>  uint32_t has_vnet_hdr;
> -- 
> 2.4.0

Re: [Qemu-devel] [PATCH v2 2/2] target-ppc: mcrfs should always update FEX/VX and only clear exception bits

2016-01-31 Thread James Clarke

> On 31 Jan 2016, at 23:50, David Gibson  wrote:
> On Fri, Jan 29, 2016 at 06:40:21PM +, James Clarke wrote:
>> Here is the description of the mcrfs instruction from the PowerPC 
>> Architecture
>> Book, Version 2.02, Book I: PowerPC User Instruction Set Architecture
>> (http://www.ibm.com/developerworks/systems/library/es-archguide-v2.html), 
>> found
>> on page 120:
>> 
>>The contents of FPSCR field BFA are copied to Condition Register field BF.
>>All exception bits copied are set to 0 in the FPSCR. If the FX bit is
>>copied, it is set to 0 in the FPSCR.
>> 
>>Special Registers Altered:
>>CR field BF
>>FX OX(if BFA=0)
>>UX ZX XX VXSNAN  (if BFA=1)
>>VXISI VXIDI VXZDZ VXIMZ  (if BFA=2)
>>VXVC (if BFA=3)
>>VXSOFT VXSQRT VXCVI  (if BFA=5)
>> 
>> However, currently every bit in FPSCR field BFA is set to 0, including ones 
>> not
>> on that list.
>> 
>> This can be seen in the following simple C program:
>> 
>>#include <fenv.h>
>>#include <stdio.h>
>> 
>>int main(int argc, char **argv) {
>>int ret;
>>ret = fegetround();
>>printf("Current rounding: %d\n", ret);
>>ret = fesetround(FE_UPWARD);
>>printf("Setting to FE_UPWARD (%d): %d\n", FE_UPWARD, ret);
>>ret = fegetround();
>>printf("Current rounding: %d\n", ret);
>>ret = fegetround();
>>printf("Current rounding: %d\n", ret);
>>return 0;
>>}
>> 
>> which gave the output (before this commit):
>> 
>>Current rounding: 0
>>Setting to FE_UPWARD (2): 0
>>Current rounding: 2
>>Current rounding: 0
>> 
>> instead of (after this commit):
>> 
>>Current rounding: 0
>>Setting to FE_UPWARD (2): 0
>>Current rounding: 2
>>Current rounding: 2
>> 
>> The relevant disassembly is in fegetround(), which, on my system, is:
>> 
>>__GI___fegetround:
>><+0>:   mcrfs  cr7, cr7
>><+4>:   mfcr   r3
>><+8>:   clrldi r3, r3, 62
>><+12>:  blr
>> 
>> What happens is that, the first time fegetround() is called, FPSCR field 7 is
>> retrieved. However, because of the bug in mcrfs, the entirety of field 7 is 
>> set
>> to 0, which includes the rounding mode.
>> 
>> There are other issues this will fix, such as condition flags not persisting
>> when they should if read, and if you were to read a specific field with some
>> exception bits set, but no others were set in the entire register, then the
>> bits would be cleared correctly, but FEX/VX would not be updated to 0 as they
>> should be.
>> 
>> Signed-off-by: James Clarke 
> 
> Thanks for the fixup.  It actually looks like helper_store_fpscr()
> should really take a target_ulong instead of u64 and have the (single)
> caller which wants to pass a 64 do the truncate.  But that can be a
> cleanup for another day.
> 
> Applied to ppc-for-2.6.

Great, thanks. I agree it seems odd, especially given the argument is cast to
target_ulong, but that’s a more invasive change.

> 
>> ---
>> target-ppc/cpu.h   |  6 ++
>> target-ppc/translate.c | 21 +
>> 2 files changed, 23 insertions(+), 4 deletions(-)
>> 
>> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
>> index 3a967b7..d811bc9 100644
>> --- a/target-ppc/cpu.h
>> +++ b/target-ppc/cpu.h
>> @@ -718,6 +718,12 @@ enum {
>> #define FP_RN1   (1ull << FPSCR_RN1)
>> #define FP_RN(1ull << FPSCR_RN)
>> 
>> +/* the exception bits which can be cleared by mcrfs - includes FX */
>> +#define FP_EX_CLEAR_BITS (FP_FX | FP_OX | FP_UX | FP_ZX | \
>> +  FP_XX | FP_VXSNAN | FP_VXISI  | FP_VXIDI  | \
>> +  FP_VXZDZ  | FP_VXIMZ  | FP_VXVC   | FP_VXSOFT | \
>> +  FP_VXSQRT | FP_VXCVI)
>> +
>> /*/
>> /* Vector status and control register */
>> #define VSCR_NJ  16 /* Vector non-java */
>> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
>> index 4be7eaa..ca10bd1 100644
>> --- a/target-ppc/translate.c
>> +++ b/target-ppc/translate.c
>> @@ -2500,18 +2500,31 @@ static void gen_fmrgow(DisasContext *ctx)
>> static void gen_mcrfs(DisasContext *ctx)
>> {
>> TCGv tmp = tcg_temp_new();
>> +TCGv_i32 tmask;
>> +TCGv_i64 tnew_fpscr = tcg_temp_new_i64();
>> int bfa;
>> +int nibble;
>> +int shift;
>> 
>> if (unlikely(!ctx->fpu_enabled)) {
>> gen_exception(ctx, POWERPC_EXCP_FPU);
>> return;
>> }
>> -bfa = 4 * (7 - crfS(ctx->opcode));
>> -tcg_gen_shri_tl(tmp, cpu_fpscr, bfa);
>> +bfa = crfS(ctx->opcode);
>> +nibble = 7 - bfa;
>> +shift = 4 * nibble;
>> +tcg_gen_shri_tl(tmp, cpu_fpscr, shift);
>> tcg_gen_trunc_tl_i32(cpu_crf[crfD(ctx->opcode)], tmp);
>> -tcg_temp_free(tmp);
>>

Re: [Qemu-devel] Strange monitor/stdout issue on qemu-system-sparc/qemu-system-ppc

2016-01-31 Thread Paolo Bonzini



On 31/01/2016 16:54, Mark Cave-Ayland wrote:
> Aha! A quick test here shows that the patch fixes the serial port
> appearing on stdout and entering the monitor, but I still see the
> multiple echo problem in the GTK GUI.
> 
> I also notice that with the above commit I lose cycling through history
> in the GTK monitor - even with the multiple echo, instead of the up/down
> arrow keys cycling through the history instead I see the codes ^[[B and
> ^[[A being output to the window instead.

That is probably me.  The echo feature was introduced for QMP, but in
theory it should have been limited to that.  I'll check it, thanks.

Paolo

Re: [Qemu-devel] [PATCH v2 2/2] target-ppc: mcrfs should always update FEX/VX and only clear exception bits

2016-01-31 Thread David Gibson

On Fri, Jan 29, 2016 at 06:40:21PM +, James Clarke wrote:
> Here is the description of the mcrfs instruction from the PowerPC Architecture
> Book, Version 2.02, Book I: PowerPC User Instruction Set Architecture
> (http://www.ibm.com/developerworks/systems/library/es-archguide-v2.html), 
> found
> on page 120:
> 
> The contents of FPSCR field BFA are copied to Condition Register field BF.
> All exception bits copied are set to 0 in the FPSCR. If the FX bit is
> copied, it is set to 0 in the FPSCR.
> 
> Special Registers Altered:
> CR field BF
> FX OX(if BFA=0)
> UX ZX XX VXSNAN  (if BFA=1)
> VXISI VXIDI VXZDZ VXIMZ  (if BFA=2)
> VXVC (if BFA=3)
> VXSOFT VXSQRT VXCVI  (if BFA=5)
> 
> However, currently every bit in FPSCR field BFA is set to 0, including ones 
> not
> on that list.
> 
> This can be seen in the following simple C program:
> 
> #include <fenv.h>
> #include <stdio.h>
> 
> int main(int argc, char **argv) {
> int ret;
> ret = fegetround();
> printf("Current rounding: %d\n", ret);
> ret = fesetround(FE_UPWARD);
> printf("Setting to FE_UPWARD (%d): %d\n", FE_UPWARD, ret);
> ret = fegetround();
> printf("Current rounding: %d\n", ret);
> ret = fegetround();
> printf("Current rounding: %d\n", ret);
> return 0;
> }
> 
> which gave the output (before this commit):
> 
> Current rounding: 0
> Setting to FE_UPWARD (2): 0
> Current rounding: 2
> Current rounding: 0
> 
> instead of (after this commit):
> 
> Current rounding: 0
> Setting to FE_UPWARD (2): 0
> Current rounding: 2
> Current rounding: 2
> 
> The relevant disassembly is in fegetround(), which, on my system, is:
> 
> __GI___fegetround:
> <+0>:   mcrfs  cr7, cr7
> <+4>:   mfcr   r3
> <+8>:   clrldi r3, r3, 62
> <+12>:  blr
> 
> What happens is that, the first time fegetround() is called, FPSCR field 7 is
> retrieved. However, because of the bug in mcrfs, the entirety of field 7 is 
> set
> to 0, which includes the rounding mode.
> 
> There are other issues this will fix, such as condition flags not persisting
> when they should if read, and if you were to read a specific field with some
> exception bits set, but no others were set in the entire register, then the
> bits would be cleared correctly, but FEX/VX would not be updated to 0 as they
> should be.
> 
> Signed-off-by: James Clarke 

Thanks for the fixup.  It actually looks like helper_store_fpscr()
should really take a target_ulong instead of u64 and have the (single)
caller which wants to pass a 64 do the truncate.  But that can be a
cleanup for another day.

Applied to ppc-for-2.6.

> ---
>  target-ppc/cpu.h   |  6 ++
>  target-ppc/translate.c | 21 +
>  2 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 3a967b7..d811bc9 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -718,6 +718,12 @@ enum {
>  #define FP_RN1   (1ull << FPSCR_RN1)
>  #define FP_RN(1ull << FPSCR_RN)
>  
> +/* the exception bits which can be cleared by mcrfs - includes FX */
> +#define FP_EX_CLEAR_BITS (FP_FX | FP_OX | FP_UX | FP_ZX | \
> +  FP_XX | FP_VXSNAN | FP_VXISI  | FP_VXIDI  | \
> +  FP_VXZDZ  | FP_VXIMZ  | FP_VXVC   | FP_VXSOFT | \
> +  FP_VXSQRT | FP_VXCVI)
> +
>  
> /*/
>  /* Vector status and control register */
>  #define VSCR_NJ  16 /* Vector non-java */
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 4be7eaa..ca10bd1 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -2500,18 +2500,31 @@ static void gen_fmrgow(DisasContext *ctx)
>  static void gen_mcrfs(DisasContext *ctx)
>  {
>  TCGv tmp = tcg_temp_new();
> +TCGv_i32 tmask;
> +TCGv_i64 tnew_fpscr = tcg_temp_new_i64();
>  int bfa;
> +int nibble;
> +int shift;
>  
>  if (unlikely(!ctx->fpu_enabled)) {
>  gen_exception(ctx, POWERPC_EXCP_FPU);
>  return;
>  }
> -bfa = 4 * (7 - crfS(ctx->opcode));
> -tcg_gen_shri_tl(tmp, cpu_fpscr, bfa);
> +bfa = crfS(ctx->opcode);
> +nibble = 7 - bfa;
> +shift = 4 * nibble;
> +tcg_gen_shri_tl(tmp, cpu_fpscr, shift);
>  tcg_gen_trunc_tl_i32(cpu_crf[crfD(ctx->opcode)], tmp);
> -tcg_temp_free(tmp);
>  tcg_gen_andi_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfD(ctx->opcode)], 
> 0xf);
> -tcg_gen_andi_tl(cpu_fpscr, cpu_fpscr, ~(0xF << bfa));
> +tcg_temp_free(tmp);
> +tcg_gen_extu_tl_i64(tnew_fpscr, cpu_fpscr);
> +/* Only the exception bits (including FX) should be cleared if read */
> +

[Qemu-devel] [RFC Patch v2 10/10] virtio-net rsc: Add Receive Segment Coalesce statistics

2016-01-31 Thread wexu

From: Wei Xu 

Add statistics to log what happened during the process.

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c| 49 +++---
 include/hw/virtio/virtio.h | 33 +++
 2 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index c9f6bfc..ab08b96 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -66,6 +66,7 @@
 
 /* Global statistics */
 static uint32_t rsc_chain_no_mem;
+static uint64_t virtio_net_received;
 
 /* Switcher to enable/disable rsc */
 static bool virtio_net_rsc_bypass;
@@ -1679,10 +1680,12 @@ static void virtio_net_rsc_purge(void *opq)
 
 if (ret == 0) {
 /* Try next queue */
+chain->stat.purge_failed++;
 continue;
 }
 }
 
+chain->stat.timer++;
 if (!QTAILQ_EMPTY(&chain->buffers)) {
 timer_mod(chain->drain_timer,
   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + rsc_timeout);
@@ -1715,6 +1718,7 @@ static int virtio_net_rsc_cache_buf(NetRscChain *chain, 
NetClientState *nc,
 
 seg = g_malloc(sizeof(NetRscSeg));
 if (!seg) {
+chain->stat.no_buf++;
 return 0;
 }
 
@@ -1730,9 +1734,11 @@ static int virtio_net_rsc_cache_buf(NetRscChain *chain, 
NetClientState *nc,
 seg->nc = nc;
 
 QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
+chain->stat.cache++;
 return size;
 
 out:
+chain->stat.no_buf++;
 g_free(seg);
 return 0;
 }
@@ -1750,27 +1756,33 @@ static int32_t virtio_net_rsc_handle_ack(NetRscChain 
*chain, NetRscSeg *seg,
 owin = htons(o_tcp->th_win);
 
 if ((nack - oack) >= TCP_WINDOW) {
+chain->stat.ack_out_of_win++;
 return RSC_FINAL;
 } else if (nack == oack) {
 /* duplicated ack or window probe */
 if (nwin == owin) {
 /* duplicated ack, add dup ack count due to whql test up to 1 */
+chain->stat.dup_ack++;
 
 if (seg->dup_ack_count == 0) {
 seg->dup_ack_count++;
+chain->stat.dup_ack1++;
 return RSC_COALESCE;
 } else {
 /* Spec says should send it directly */
+chain->stat.dup_ack2++;
 return RSC_FINAL;
 }
 } else {
 /* Coalesce window update */
 o_tcp->th_win = n_tcp->th_win;
+chain->stat.win_update++;
 return RSC_COALESCE;
 }
 } else {
 /* pure ack, update ack */
 o_tcp->th_ack = n_tcp->th_ack;
+chain->stat.pure_ack++;
 return RSC_COALESCE;
 }
 }
@@ -1788,13 +1800,20 @@ static int32_t virtio_net_rsc_coalesce_tcp(NetRscChain 
*chain, NetRscSeg *seg,
 nseq = htonl(n_tcp->th_seq);
 oseq = htonl(o_tcp->th_seq);
 
+if (n_tcp_len > sizeof(struct tcp_header)) {
+/* Log this only for debugging observation */
+chain->stat.tcp_option++;
+}
+
 /* Ignore packet with more/larger tcp options */
 if (n_tcp_len > o_tcp_len) {
+chain->stat.tcp_larger_option++;
 return RSC_FINAL;
 }
 
 /* out of order or retransmitted. */
 if ((nseq - oseq) > TCP_WINDOW) {
+chain->stat.data_out_of_win++;
 return RSC_FINAL;
 }
 
@@ -1802,16 +1821,19 @@ static int32_t virtio_net_rsc_coalesce_tcp(NetRscChain 
*chain, NetRscSeg *seg,
 if (nseq == oseq) {
 if ((0 == o_data) && n_data) {
 /* From no payload to payload, normal case, not a dup ack or etc */
+chain->stat.data_after_pure_ack++;
 goto coalesce;
 } else {
 return virtio_net_rsc_handle_ack(chain, seg, buf, n_tcp, o_tcp);
 }
 } else if ((nseq - oseq) != o_data) {
 /* Not a consistent packet, out of order */
+chain->stat.data_out_of_order++;
 return RSC_FINAL;
 } else {
 coalesce:
 if ((o_ip_len + n_data) > max_data) {
+chain->stat.over_size++;
 return RSC_FINAL;
 }
 
@@ -1824,6 +1846,7 @@ coalesce:
 
 memmove(seg->buf + seg->size, data, n_data);
 seg->size += n_data;
+chain->stat.coalesced++;
 return RSC_COALESCE;
 }
 }
@@ -1855,6 +1878,7 @@ static int32_t virtio_net_rsc_try_coalesce4(NetRscChain 
*chain,
 if ((n_ip->ip_src ^ o_ip->ip_src) || (n_ip->ip_dst ^ o_ip->ip_dst)
 || (n_tcp->th_sport ^ o_tcp->th_sport)
 || (n_tcp->th_dport ^ o_tcp->th_dport)) {
+chain->stat.no_match++;
 return RSC_NO_MATCH;
 }
 
@@ -1890,6 +1914,7 @@ static int32_t virtio_net_rsc_try_coalesce6(NetRscChain 
*chain,
 || memcmp(&n_ip->ip6_dst, &o_ip->ip6_dst, sizeof(struct in6_address))
 || (n_tcp->th_sport ^ o_tcp->th_sport)
 || (n_tcp->th_dport ^ o_tcp->th_dport)) {
+chain->stat.no_match++;
 return RSC_NO_MATCH;
 }
 
@@ -1927,6 +1952,7 @@ static size_t
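
The out-of-window checks counted above rely on unsigned wraparound of 32-bit TCP sequence numbers. A minimal sketch of that comparison follows; the window constant is an assumed value (the patch does not show the definition of TCP_WINDOW), and the function name is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed window size for the sketch; not taken from the patch. */
#define SKETCH_TCP_WINDOW 65535u

/* True when nseq is not within SKETCH_TCP_WINDOW bytes ahead of oseq.
 * The subtraction is done modulo 2^32, so a sequence number that
 * wrapped past zero still compares correctly, and a retransmission
 * (nseq behind oseq) shows up as a very large difference. */
static bool seq_out_of_window(uint32_t nseq, uint32_t oseq)
{
    return (uint32_t)(nseq - oseq) > SKETCH_TCP_WINDOW;
}
```

A packet at sequence 100 following one at 0xffffff00 is 356 bytes ahead after wraparound, so it is in the window; a packet at 1000 following one at 2000 is treated as out of window.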

Re: [Qemu-devel] [PATCH 3/3] ppc: include timebase in migration stream for g3beige/mac99 machines

2016-01-31 Thread Peter Maydell

On 31 January 2016 at 19:19, Mark Cave-Ayland wrote:
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/ppc/mac_newworld.c |4 
>  hw/ppc/mac_oldworld.c |4 
>  2 files changed, 8 insertions(+)
>
> diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
> index f95086b..3283f1d 100644
> --- a/hw/ppc/mac_newworld.c
> +++ b/hw/ppc/mac_newworld.c
> @@ -179,6 +179,7 @@ static void ppc_core99_init(MachineState *machine)
>  int *token = g_new(int, 1);
>  hwaddr nvram_addr = 0xFFF04000;
>  uint64_t tbfreq;
> +PPCTimebase *tb;
>
>  linux_boot = (kernel_filename != NULL);
>
> @@ -201,6 +202,9 @@ static void ppc_core99_init(MachineState *machine)
>  /* Set time-base frequency to 100 Mhz */
>  cpu_ppc_tb_init(env, TBFREQ);
>  qemu_register_reset(ppc_core99_reset, cpu);
> +
> +tb = g_malloc0(sizeof(PPCTimebase));
> +vmstate_register(NULL, -1, &vmstate_ppc_timebase, tb);

Is there no way to avoid the vmstate_register here (ie to
tie the migration data to an actual device or CPU object) ?

thanks
-- PMM

Re: [Qemu-devel] Strange monitor/stdout issue on qemu-system-sparc/qemu-system-ppc

2016-01-31 Thread Peter Maydell

On 31 January 2016 at 17:19, Paolo Bonzini  wrote:
> On 31/01/2016 16:54, Mark Cave-Ayland wrote:
>> I also notice that with the above commit I lose cycling through history
>> in the GTK monitor - even with the multiple echo, the up/down arrow keys
>> no longer cycle through the history; instead I see the codes ^[[B and
>> ^[[A being output to the window.
>
> That is probably me.  The echo feature was introduced for QMP, but in
> theory it should have been limited to that.  I'll check it, thanks.

I've also seen echo, but only intermittently...

thanks
-- PMM

Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block replication

2016-01-31 Thread Wen Congyang

On 01/29/2016 11:46 PM, Stefan Hajnoczi wrote:
> On Fri, Jan 29, 2016 at 11:13:42AM +0800, Changlong Xie wrote:
>> On 01/28/2016 11:15 PM, Stefan Hajnoczi wrote:
>>> On Thu, Jan 28, 2016 at 09:13:24AM +0800, Wen Congyang wrote:
>>>> On 01/27/2016 10:46 PM, Stefan Hajnoczi wrote:
>>>>> On Wed, Jan 13, 2016 at 05:18:31PM +0800, Changlong Xie wrote:
>>> I'm concerned that the bdrv_drain_all() in vm_stop() can take a long
>>> time if the disk is slow/failing.  bdrv_drain_all() blocks until all
>>> in-flight I/O requests have completed.  What does the Primary do if the
>>> Secondary becomes unresponsive?
>>
>> Actually, we know about this problem, but currently there seems to be no
>> better way to resolve it. Do you have any ideas?
> 
> Is it possible to hold the checkpoint information and acknowledge the
> checkpoint right away, without waiting for bdrv_drain_all() or any
> Secondary guest activity to complete?

There is no way to know that the secondary has become unresponsive.

> 
> I think this really means falling back to microcheckpointing until the
> Secondary guest can checkpoint.  Instead of a blocking vm_stop() we
> would prevent vcpus from running and when the last pending I/O finishes
> the Secondary could apply the last checkpoint.  This approach does not
> block QEMU (the monitor, etc).
> 

If the secondary host becomes unresponsive, we cannot do
microcheckpointing. We should fail over in this case.

Thanks
Wen Congyang

Re: [Qemu-devel] [PATCH 1/3] ppc: fix timebase adjustment during migration

2016-01-31 Thread David Gibson

On Sun, Jan 31, 2016 at 07:19:34PM +, Mark Cave-Ayland wrote:
> ns_diff is already clamped to a minimum of 0 to prevent the timebase going
> backwards during migration due to misaligned clocks. Following on from this
> migration_duration_tb is also subject to the same constraint; hence the
> expression MIN(0, migration_duration_tb) always evaluates to 0 and so no
> timebase adjustment ever takes place.
> 
> Signed-off-by: Mark Cave-Ayland 

So, there are actually two problems here, which could be expressed a
bit more clearly in the commit message.

First, this clamping is redundant, because of the earlier clamp on
ns_diff.  Well.. probably.. I do wonder if we could get an overflow
anywhere giving us a negative number again.

More importantly, though, this is supposed to be a clamp below, which
needs a MAX.  MIN is Just Plain Wrong.
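
To make the MIN/MAX point concrete, here is a small sketch. It assumes the adjustment is held in a signed 64-bit value; the macro and function names are invented for illustration and are not the ones used in hw/ppc/ppc.c.

```c
#include <stdint.h>

#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* What the old code effectively computed: never greater than 0, so a
 * non-negative adjustment is always thrown away. */
static int64_t clamp_with_min(int64_t duration_tb)
{
    return SKETCH_MIN(0, duration_tb);
}

/* A clamp from below: negative values become 0, others pass through. */
static int64_t clamp_with_max(int64_t duration_tb)
{
    return SKETCH_MAX(0, duration_tb);
}
```

For an input of 500, clamp_with_min() gives 0 and clamp_with_max() gives 500; for -3, clamp_with_max() gives 0. The patch drops the clamp entirely, which yields the same result as the MAX form whenever the input is already non-negative.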

> ---
>  hw/ppc/ppc.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index ce90b09..19f4570 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -877,7 +877,7 @@ static int timebase_post_load(void *opaque, int 
> version_id)
>  migration_duration_ns = MIN(NANOSECONDS_PER_SECOND, ns_diff);
>  migration_duration_tb = muldiv64(migration_duration_ns, freq,
>   NANOSECONDS_PER_SECOND);
> -guest_tb = tb_remote->guest_timebase + MIN(0, migration_duration_tb);
> +guest_tb = tb_remote->guest_timebase + migration_duration_tb;
>  
>  tb_off_adj = guest_tb - cpu_get_host_ticks();
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



[Qemu-devel] [PULL 00/40] ppc-for-2.6 queue 20160201

2016-01-31 Thread David Gibson

The following changes since commit 0430891ce162b986c6e02a7729a942ecd2a32ca4:

  hw: Clean up includes (2016-01-29 15:07:25 +)

are available in the git repository at:

  git://github.com/dgibson/qemu.git tags/ppc-for-2.6-20160201

for you to fetch changes up to d1277156b5d3df6d75d138a7eec6ff80934cdcec:

  target-ppc: mcrfs should always update FEX/VX and only clear exception bits (2016-02-01 13:27:01 +1100)


I hope I've managed to finally iron out the problems in this series.
I've fixed the clang build problem from the 20160129 request and
checked build on a 32-bit host.  I've also added the mcrfs fix on top.



ppc patch queue for 2016-02-01

Currently accumulated patches for target-ppc, pseries machine type and
related devices.
  * Cleanup of error handling code in spapr
  * A number of fixes for Macintosh devices for the benefit of MacOS 9 and X
  * Remove some abuses of the RTAS memory access functions in spapr
  * Fixes for the gdbstub (and monitor debug) for VMX and VSX extensions.
  * Fix pseries machine hotplug memory under TCG
  * Clean up and extend handling of multiple page sizes with 64-bit hash MMUs
  * Fix to the TCG implementation of mcrfs


Alyssa Milburn (1):
  cuda.c: return error for unknown commands

Anton Blanchard (1):
  target-ppc: gdbstub: Add VSX support

Benjamin Herrenschmidt (1):
  target-ppc: Use sensible POWER8/POWER8E versions

Bharata B Rao (1):
  spapr: Don't create ibm,dynamic-reconfiguration-memory w/o DR LMBs

David Gibson (22):
  spapr: Small fixes to rtas_ibm_get_system_parameter, remove rtas_st_buffer
  spapr: Remove rtas_st_buffer_direct()
  spapr: Remove abuse of rtas_ld() in h_client_architecture_support
  ppc: Clean up error handling in ppc_set_compat()
  pseries: Clean up error handling of spapr_cpu_init()
  pseries: Clean up error handling in spapr_validate_node_memory()
  pseries: Clean up error handling in spapr_vga_init()
  pseries: Clean up error handling in spapr_rtas_register()
  pseries: Clean up error handling in xics_system_init()
  pseries: Clean up error reporting in ppc_spapr_init()
  pseries: Clean up error reporting in htab migration functions
  pseries: Allow TCG h_enter to work with hotplugged memory
  target-ppc: Remove unused kvmppc_read_segment_page_sizes() stub
  target-ppc: Convert mmu-hash{32,64}.[ch] from CPUPPCState to PowerPCCPU
  target-ppc: Rework ppc_store_slb
  target-ppc: Rework SLB page size lookup
  target-ppc: Use actual page size encodings from HPTE
  target-ppc: Remove unused mmu models from ppc_tlb_invalidate_one
  target-ppc: Split 44x tlbiva from ppc_tlb_invalidate_one()
  target-ppc: Add new TLB invalidate by HPTE call for hash64 MMUs
  target-ppc: Helper to determine page size information from hpte alone
  target-ppc: Allow more page sizes for POWER7 & POWER8 in TCG

Greg Kurz (6):
  target-ppc: kvm: fix floating point registers sync on little-endian hosts
  target-ppc: rename and export maybe_bswap_register()
  target-ppc: gdbstub: fix float registers for little-endian guests
  target-ppc: gdbstub: introduce avr_need_swap()
  target-ppc: gdbstub: fix altivec registers for little-endian guests
  target-ppc: gdbstub: fix spe registers for little-endian guests

James Clarke (2):
  target-ppc: Make every FPSCR_ macro have a corresponding FP_ macro
  target-ppc: mcrfs should always update FEX/VX and only clear exception bits

Mark Cave-Ayland (5):
  target-ppc: use cpu_write_xer() helper in cpu_post_load
  macio: use the existing IDEDMA aiocb to hold the active DMA aiocb
  macio: add dma_active to VMStateDescription
  mac_dbdma: add DBDMA controller state to VMStateDescription
  cuda: add missing fields to VMStateDescription

Programmingkid (1):
  uninorth.c: add support for UniNorth kMacRISCPCIAddressSelect (0x48) register

 configure   |   6 +-
 gdb-xml/power-vsx.xml   |  44 
 hw/ide/macio.c  |  23 ++--
 hw/misc/macio/cuda.c|  12 +-
 hw/misc/macio/mac_dbdma.c   |  40 ++-
 hw/pci-host/uninorth.c  |   9 ++
 hw/ppc/mac.h|   1 -
 hw/ppc/spapr.c  | 112 ++
 hw/ppc/spapr_hcall.c| 145 +---
 hw/ppc/spapr_rtas.c |  50 
 include/hw/ppc/spapr.h  |  36 ++
 target-ppc/cpu-models.c |  12 +-
 target-ppc/cpu-models.h |   4 +-
 target-ppc/cpu.h|  41 +--
 target-ppc/gdbstub.c|  10 +-
 target-ppc/helper.h |   1 +
 target-ppc/kvm.c|  14 ++-
 target-ppc/kvm_ppc.h|   5 -
 target-ppc/machine.c|  22 +++-
 target-ppc/mmu-hash32.c |  68 ++-
 target-ppc/mmu-hash32.h |  30 ++---
 target-ppc/mmu-hash64.c | 270

[Qemu-devel] [PULL 06/40] cuda: add missing fields to VMStateDescription

2016-01-31 Thread David Gibson

From: Mark Cave-Ayland 

Include some fields missed from the previous VMState conversion to the
migration stream, as well as the new SR_INT delay timer.

Signed-off-by: Mark Cave-Ayland 
Signed-off-by: David Gibson 
---
 hw/misc/macio/cuda.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/misc/macio/cuda.c b/hw/misc/macio/cuda.c
index 8d450cf..0bd90e8 100644
--- a/hw/misc/macio/cuda.c
+++ b/hw/misc/macio/cuda.c
@@ -705,15 +705,17 @@ static const VMStateDescription vmstate_cuda_timer = {
 
 static const VMStateDescription vmstate_cuda = {
 .name = "cuda",
-.version_id = 2,
-.minimum_version_id = 2,
+.version_id = 3,
+.minimum_version_id = 3,
 .fields = (VMStateField[]) {
 VMSTATE_UINT8(a, CUDAState),
 VMSTATE_UINT8(b, CUDAState),
+VMSTATE_UINT8(last_b, CUDAState),
 VMSTATE_UINT8(dira, CUDAState),
 VMSTATE_UINT8(dirb, CUDAState),
 VMSTATE_UINT8(sr, CUDAState),
 VMSTATE_UINT8(acr, CUDAState),
+VMSTATE_UINT8(last_acr, CUDAState),
 VMSTATE_UINT8(pcr, CUDAState),
 VMSTATE_UINT8(ifr, CUDAState),
 VMSTATE_UINT8(ier, CUDAState),
@@ -728,6 +730,7 @@ static const VMStateDescription vmstate_cuda = {
 VMSTATE_STRUCT_ARRAY(timers, CUDAState, 2, 1,
  vmstate_cuda_timer, CUDATimer),
 VMSTATE_TIMER_PTR(adb_poll_timer, CUDAState),
+VMSTATE_TIMER_PTR(sr_delay_timer, CUDAState),
 VMSTATE_END_OF_LIST()
 }
 };
-- 
2.5.0

[Qemu-devel] [PULL 03/40] macio: use the existing IDEDMA aiocb to hold the active DMA aiocb

2016-01-31 Thread David Gibson

From: Mark Cave-Ayland 

Currently the aiocb is held within MACIOIDEState, however the IDE core code
assumes that the current active DMA aiocb is held in aiocb in a few places,
e.g. ide_bus_reset() and ide_reset().

Switch over to using IDEDMA aiocb to store the aiocb for the current active
DMA request so that bus resets and restarts are handled correctly. As a
consequence we can now use ide_set_inactive() rather than handling its
functionality ourselves.

Signed-off-by: Mark Cave-Ayland 
Reviewed-by: John Snow 
Signed-off-by: David Gibson 
---
 hw/ide/macio.c | 20 
 hw/ppc/mac.h   |  1 -
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index 336784b..bfdc377 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -120,8 +120,8 @@ static void pmac_dma_read(BlockBackend *blk,
 MACIO_DPRINTF("--- Block read transfer - sector_num: %" PRIx64 "  "
   "nsector: %x\n", (offset >> 9), (bytes >> 9));
 
-m->aiocb = blk_aio_readv(blk, (offset >> 9), &io->iov, (bytes >> 9),
- cb, io);
+s->bus->dma->aiocb = blk_aio_readv(blk, (offset >> 9), &io->iov,
+ (bytes >> 9), cb, io);
 }
 
 static void pmac_dma_write(BlockBackend *blk,
@@ -205,8 +205,8 @@ static void pmac_dma_write(BlockBackend *blk,
 MACIO_DPRINTF("--- Block write transfer - sector_num: %" PRIx64 "  "
   "nsector: %x\n", (offset >> 9), (bytes >> 9));
 
-m->aiocb = blk_aio_writev(blk, (offset >> 9), &io->iov, (bytes >> 9),
-  cb, io);
+s->bus->dma->aiocb = blk_aio_writev(blk, (offset >> 9), &io->iov,
+ (bytes >> 9), cb, io);
 }
 
 static void pmac_dma_trim(BlockBackend *blk,
@@ -232,8 +232,8 @@ static void pmac_dma_trim(BlockBackend *blk,
 s->io_buffer_index += io->len;
 io->len = 0;
 
-m->aiocb = ide_issue_trim(blk, (offset >> 9), &io->iov, (bytes >> 9),
-  cb, io);
+s->bus->dma->aiocb = ide_issue_trim(blk, (offset >> 9), &io->iov,
+ (bytes >> 9), cb, io);
 }
 
 static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
@@ -292,6 +292,8 @@ done:
 } else {
 block_acct_done(blk_get_stats(s->blk), &s->acct);
 }
+
+ide_set_inactive(s, false);
 io->dma_end(opaque);
 }
 
@@ -306,7 +308,6 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
 
 if (ret < 0) {
 MACIO_DPRINTF("DMA error: %d\n", ret);
-m->aiocb = NULL;
 ide_dma_error(s);
 goto done;
 }
@@ -357,6 +358,8 @@ done:
 block_acct_done(blk_get_stats(s->blk), &s->acct);
 }
 }
+
+ide_set_inactive(s, false);
 io->dma_end(opaque);
 }
 
@@ -394,8 +397,9 @@ static void pmac_ide_transfer(DBDMA_io *io)
 static void pmac_ide_flush(DBDMA_io *io)
 {
 MACIOIDEState *m = io->opaque;
+IDEState *s = idebus_active_if(&m->bus);
 
-if (m->aiocb) {
+if (s->bus->dma->aiocb) {
 blk_drain_all();
 }
 }
diff --git a/hw/ppc/mac.h b/hw/ppc/mac.h
index e375ed2..ecf7792 100644
--- a/hw/ppc/mac.h
+++ b/hw/ppc/mac.h
@@ -134,7 +134,6 @@ typedef struct MACIOIDEState {
 
 MemoryRegion mem;
 IDEBus bus;
-BlockAIOCB *aiocb;
 IDEDMA dma;
 void *dbdma;
 bool dma_active;
-- 
2.5.0

[Qemu-devel] [PULL 14/40] pseries: Clean up error handling in spapr_vga_init()

2016-01-31 Thread David Gibson

Use error_setg() to return an error rather than an explicit exit().
Previously it was an exit(0) instead of a non-zero exit code, which was
simply a bug.  Also improve the error message.

While we're at it change the type of spapr_vga_init() to bool since that's
how we're using it anyway.

Signed-off-by: David Gibson 
Reviewed-by: Thomas Huth 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Markus Armbruster 
---
 hw/ppc/spapr.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4e6ee6d..3f90e50 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1246,7 +1246,7 @@ static void spapr_rtc_create(sPAPRMachineState *spapr)
 }
 
 /* Returns whether we want to use VGA or not */
-static int spapr_vga_init(PCIBus *pci_bus)
+static bool spapr_vga_init(PCIBus *pci_bus, Error **errp)
 {
 switch (vga_interface_type) {
 case VGA_NONE:
@@ -1257,9 +1257,9 @@ static int spapr_vga_init(PCIBus *pci_bus)
 case VGA_VIRTIO:
 return pci_vga_init(pci_bus) != NULL;
 default:
-fprintf(stderr, "This vga model is not supported,"
-"currently it only supports -vga std\n");
-exit(0);
+error_setg(errp,
+   "Unsupported VGA mode, only -vga std or -vga virtio is supported");
+return false;
 }
 }
 
@@ -1934,7 +1934,7 @@ static void ppc_spapr_init(MachineState *machine)
 }
 
 /* Graphics */
-if (spapr_vga_init(phb->bus)) {
+if (spapr_vga_init(phb->bus, &error_fatal)) {
 spapr->has_graphics = true;
 machine->usb |= defaults_enabled() && !machine->usb_disabled;
 }
-- 
2.5.0
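
The pattern this patch moves to (report through an error out-parameter and let the caller decide, rather than calling exit() deep inside a helper) can be sketched without QEMU's Error API. Everything below is a toy stand-in: the enum, the function name and the use of a plain string pointer in place of Error ** are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the VGA interface selection. */
enum sketch_vga {
    SKETCH_VGA_NONE,
    SKETCH_VGA_STD,
    SKETCH_VGA_VIRTIO,
    SKETCH_VGA_OTHER
};

/* Returns whether a VGA device should be used. On an unsupported mode,
 * stores a message through errp (when errp is non-NULL) and returns
 * false instead of terminating the process. */
static bool sketch_vga_init(enum sketch_vga mode, const char **errp)
{
    switch (mode) {
    case SKETCH_VGA_NONE:
        return false;
    case SKETCH_VGA_STD:
    case SKETCH_VGA_VIRTIO:
        return true;
    default:
        if (errp) {
            *errp = "Unsupported VGA mode";
        }
        return false;
    }
}
```

A caller can tell "no VGA requested" apart from "bad VGA mode" by checking whether the error slot was filled, which is what the bool-plus-errp signature in the patch allows.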

[Qemu-devel] [PULL 05/40] mac_dbdma: add DBDMA controller state to VMStateDescription

2016-01-31 Thread David Gibson

From: Mark Cave-Ayland 

Make sure that we include the DBDMA controller state in the migration
stream.

Signed-off-by: Mark Cave-Ayland 
Signed-off-by: David Gibson 
---
 hw/misc/macio/mac_dbdma.c | 40 
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/hw/misc/macio/mac_dbdma.c b/hw/misc/macio/mac_dbdma.c
index c6d5b96..d81dea7 100644
--- a/hw/misc/macio/mac_dbdma.c
+++ b/hw/misc/macio/mac_dbdma.c
@@ -713,20 +713,52 @@ static const MemoryRegionOps dbdma_ops = {
 },
 };
 
-static const VMStateDescription vmstate_dbdma_channel = {
-.name = "dbdma_channel",
+static const VMStateDescription vmstate_dbdma_io = {
+.name = "dbdma_io",
+.version_id = 0,
+.minimum_version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64(addr, struct DBDMA_io),
+VMSTATE_INT32(len, struct DBDMA_io),
+VMSTATE_INT32(is_last, struct DBDMA_io),
+VMSTATE_INT32(is_dma_out, struct DBDMA_io),
+VMSTATE_BOOL(processing, struct DBDMA_io),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static const VMStateDescription vmstate_dbdma_cmd = {
+.name = "dbdma_cmd",
 .version_id = 0,
 .minimum_version_id = 0,
 .fields = (VMStateField[]) {
+VMSTATE_UINT16(req_count, dbdma_cmd),
+VMSTATE_UINT16(command, dbdma_cmd),
+VMSTATE_UINT32(phy_addr, dbdma_cmd),
+VMSTATE_UINT32(cmd_dep, dbdma_cmd),
+VMSTATE_UINT16(res_count, dbdma_cmd),
+VMSTATE_UINT16(xfer_status, dbdma_cmd),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static const VMStateDescription vmstate_dbdma_channel = {
+.name = "dbdma_channel",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
 VMSTATE_UINT32_ARRAY(regs, struct DBDMA_channel, DBDMA_REGS),
+VMSTATE_STRUCT(io, struct DBDMA_channel, 0, vmstate_dbdma_io, DBDMA_io),
+VMSTATE_STRUCT(current, struct DBDMA_channel, 0, vmstate_dbdma_cmd,
+   dbdma_cmd),
 VMSTATE_END_OF_LIST()
 }
 };
 
 static const VMStateDescription vmstate_dbdma = {
 .name = "dbdma",
-.version_id = 2,
-.minimum_version_id = 2,
+.version_id = 3,
+.minimum_version_id = 3,
 .fields = (VMStateField[]) {
 VMSTATE_STRUCT_ARRAY(channels, DBDMAState, DBDMA_CHANNELS, 1,
  vmstate_dbdma_channel, DBDMA_channel),
-- 
2.5.0

[Qemu-devel] [PULL 28/40] uninorth.c: add support for UniNorth kMacRISCPCIAddressSelect (0x48) register

2016-01-31 Thread David Gibson

From: Programmingkid 

Darwin/OS X use the undocumented kMacRISCPCIAddressSelect (0x48) to
configure PCI memory space size for mac99 machines. Without this
register, warnings similar to below are emitted to the console during boot:

AppleMacRiscPCI: bad range 2(8000:0100)
AppleMacRiscPCI: bad range 2(8100:1000)
AppleMacRiscPCI: bad range 2(8108:0008)

Based upon the algorithm in Darwin's AppleMacRiscPCI.cpp driver, set the
kMacRISCPCIAddressSelect register so that Darwin considers the PCI
memory space to be at 0x8000 (size 0x1000) which matches that
currently used by QEMU and OpenBIOS.

Signed-off-by: John Arbuckle 
Tested-by: Mark Cave-Ayland 
[commit message and comment revised as suggested by Mark Cave-Ayland]
Signed-off-by: David Gibson 
---
 hw/pci-host/uninorth.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/pci-host/uninorth.c b/hw/pci-host/uninorth.c
index 778f8e6..40a2e3e 100644
--- a/hw/pci-host/uninorth.c
+++ b/hw/pci-host/uninorth.c
@@ -331,6 +331,15 @@ static void unin_agp_pci_host_realize(PCIDevice *d, Error 
**errp)
 d->config[0x0C] = 0x08; // cache_line_size
 d->config[0x0D] = 0x10; // latency_timer
 //d->config[0x34] = 0x80; // capabilities_pointer
+/*
+ * Set kMacRISCPCIAddressSelect (0x48) register to indicate PCI
+ * memory space with base 0x8000, size 0x1000 for Apple's
+ * AppleMacRiscPCI driver
+ */
+d->config[0x48] = 0x0;
+d->config[0x49] = 0x0;
+d->config[0x4a] = 0x0;
+d->config[0x4b] = 0x1;
 }
 
 static void u3_agp_pci_host_realize(PCIDevice *d, Error **errp)
-- 
2.5.0
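
PCI configuration space is little-endian, so the four byte stores in the hunk above together form one 32-bit register value. The sketch below only shows that byte-to-dword assembly; the helper name is made up, and how Darwin's driver decodes the value is not covered here.

```c
#include <stdint.h>

/* Assemble a little-endian 32-bit value from four consecutive
 * configuration-space bytes starting at offset. Hypothetical helper,
 * not a QEMU function. */
static uint32_t config_read_le32(const uint8_t *config, unsigned offset)
{
    return (uint32_t)config[offset]
         | ((uint32_t)config[offset + 1] << 8)
         | ((uint32_t)config[offset + 2] << 16)
         | ((uint32_t)config[offset + 3] << 24);
}
```

With bytes 0x48..0x4a set to 0 and byte 0x4b set to 1, as in the patch, the register reads back as 0x01000000.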

Re: [Qemu-devel] [PATCH v3] blockjob: Fix hang in block_job_finish_sync

2016-01-31 Thread Fam Zheng

On Fri, 01/29 11:31, Stefan Hajnoczi wrote:
> On Fri, Jan 29, 2016 at 10:19:49AM +0800, Fam Zheng wrote:
> > @@ -402,6 +407,10 @@ typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
> >   * AioContext acquired.  Block jobs must call bdrv_unref(), bdrv_close(), and
> >   * anything that uses bdrv_drain_all() in the main loop.
> >   *
> > + * The job->deferred_to_main_loop flag will be set. Caller must clear it once
> > + * the deferred work is done and the block job coroutine continues, unless it's
> > + * completing immediately.
> > + *
> > + *
> 
> It's not necessary to expose job->deferred_to_main_loop to the user.
> Just clear it:
> 
> static void block_job_defer_to_main_loop_bh(void *opaque)
> {
> BlockJobDeferToMainLoopData *data = opaque;
> AioContext *aio_context;
> 
> qemu_bh_delete(data->bh);
> 
> /* Prevent race with block_job_defer_to_main_loop() */
> aio_context_acquire(data->aio_context);
> 
> /* Fetch BDS AioContext again, in case it has changed */
> aio_context = bdrv_get_aio_context(data->job->bs);
> aio_context_acquire(aio_context);
> 
> data->fn(data->job, data->opaque);
> job->deferred_to_main_loop = false;  /* <- HERE */

Maybe move one line above in case data->fn() does another
block_job_defer_to_main_loop()?

Fam

> 
> aio_context_release(aio_context);
> 
> aio_context_release(data->aio_context);
> 
> g_free(data);
> }
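
The ordering question raised above can be shown with a toy model. The struct and function names below are hypothetical and stand in for the block job and the bottom half; only the flag handling is modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a block job; only the flag matters here. */
struct sketch_job {
    bool deferred_to_main_loop;
};

/* Stand-in for block_job_defer_to_main_loop(): marks the job deferred. */
static void sketch_defer(struct sketch_job *job)
{
    job->deferred_to_main_loop = true;
}

/* A deferred callback that schedules another deferral. */
static void sketch_fn_redefers(struct sketch_job *job)
{
    sketch_defer(job);
}

/* Clear the flag after the callback: a nested deferral is lost. */
static bool sketch_bh_clear_after(struct sketch_job *job,
                                  void (*fn)(struct sketch_job *))
{
    fn(job);
    job->deferred_to_main_loop = false;
    return job->deferred_to_main_loop;
}

/* Clear the flag before the callback: a nested deferral survives. */
static bool sketch_bh_clear_before(struct sketch_job *job,
                                   void (*fn)(struct sketch_job *))
{
    job->deferred_to_main_loop = false;
    fn(job);
    return job->deferred_to_main_loop;
}
```

With a callback that defers again, the clear-after variant ends with the flag false even though new work is pending, while the clear-before variant ends with it true.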

[Qemu-devel] [PULL 18/40] pseries: Clean up error reporting in htab migration functions

2016-01-31 Thread David Gibson

The functions for migrating the hash page table on pseries machine type
(htab_save_setup() and htab_load()) can report some errors with an
explicit fprintf() before returning an appropriate error code.  Change some
of these to use error_report() instead. htab_save_setup() is omitted for
now to avoid conflicts with some other in-progress work.

Signed-off-by: David Gibson 
Reviewed-by: Thomas Huth 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Markus Armbruster 
---
 hw/ppc/spapr.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c05ddfb..5bd8fd3 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1534,7 +1534,7 @@ static int htab_load(QEMUFile *f, void *opaque, int 
version_id)
 int fd = -1;
 
 if (version_id < 1 || version_id > 1) {
-fprintf(stderr, "htab_load() bad version\n");
+error_report("htab_load() bad version");
 return -EINVAL;
 }
 
@@ -1555,8 +1555,8 @@ static int htab_load(QEMUFile *f, void *opaque, int 
version_id)
 
 fd = kvmppc_get_htab_fd(true);
 if (fd < 0) {
-fprintf(stderr, "Unable to open fd to restore KVM hash table: %s\n",
-strerror(errno));
+error_report("Unable to open fd to restore KVM hash table: %s",
+ strerror(errno));
 }
 }
 
@@ -1576,9 +1576,9 @@ static int htab_load(QEMUFile *f, void *opaque, int 
version_id)
 if ((index + n_valid + n_invalid) >
 (HTAB_SIZE(spapr) / HASH_PTE_SIZE_64)) {
 /* Bad index in stream */
-fprintf(stderr, "htab_load() bad index %d (%hd+%hd entries) "
-"in htab stream (htab_shift=%d)\n", index, n_valid, n_invalid,
-spapr->htab_shift);
+error_report(
+"htab_load() bad index %d (%hd+%hd entries) in htab stream (htab_shift=%d)",
+index, n_valid, n_invalid, spapr->htab_shift);
 return -EINVAL;
 }
 
-- 
2.5.0

[Qemu-devel] [PULL 33/40] target-ppc: Use actual page size encodings from HPTE

2016-01-31 Thread David Gibson

At present the 64-bit hash MMU code uses information from the SLB to
determine the page size of a translation.  We do need that information to
correctly look up the hash table.  However the MMU also allows a
possibly larger page size to be encoded into the HPTE itself, which is used
to populate the TLB.  At present qemu doesn't check that, and so doesn't
support the MPSS "Multiple Page Size per Segment" feature.

This makes a start on allowing this, by adding an hpte_page_shift()
function which looks up the page size of an HPTE.  We use this to validate
page size encodings on faults, and populate the qemu TLB with larger
page sizes when appropriate.

Signed-off-by: David Gibson 
Acked-by: Benjamin Herrenschmidt 
Reviewed-by: Alexander Graf 
---
 target-ppc/mmu-hash64.c | 63 ++---
 1 file changed, 60 insertions(+), 3 deletions(-)

diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index 9ad02f3..f4c25b7 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -22,6 +22,7 @@
 #include "exec/helper-proto.h"
 #include "qemu/error-report.h"
 #include "sysemu/kvm.h"
+#include "qemu/error-report.h"
 #include "kvm_ppc.h"
 #include "mmu-hash64.h"
 
@@ -475,12 +476,50 @@ static hwaddr ppc_hash64_htab_lookup(PowerPCCPU *cpu,
 return pte_offset;
 }
 
+static unsigned hpte_page_shift(const struct ppc_one_seg_page_size *sps,
+uint64_t pte0, uint64_t pte1)
+{
+int i;
+
+if (!(pte0 & HPTE64_V_LARGE)) {
+if (sps->page_shift != 12) {
+/* 4kiB page in a non 4kiB segment */
+return 0;
+}
+/* Normal 4kiB page */
+return 12;
+}
+
+for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
+const struct ppc_one_page_size *ps = &sps->enc[i];
+uint64_t mask;
+
+if (!ps->page_shift) {
+break;
+}
+
+if (ps->page_shift == 12) {
+/* L bit is set so this can't be a 4kiB page */
+continue;
+}
+
+mask = ((1ULL << ps->page_shift) - 1) & HPTE64_R_RPN;
+
+if ((pte1 & mask) == (ps->pte_enc << HPTE64_R_RPN_SHIFT)) {
+return ps->page_shift;
+}
+}
+
+return 0; /* Bad page size encoding */
+}
+
 int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, target_ulong eaddr,
 int rwx, int mmu_idx)
 {
 CPUState *cs = CPU(cpu);
 CPUPPCState *env = &cpu->env;
 ppc_slb_t *slb;
+unsigned apshift;
 hwaddr pte_offset;
 ppc_hash_pte64_t pte;
 int pp_prot, amr_prot, prot;
@@ -544,6 +583,18 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, 
target_ulong eaddr,
 qemu_log_mask(CPU_LOG_MMU,
 "found PTE at offset %08" HWADDR_PRIx "\n", pte_offset);
 
+/* Validate page size encoding */
+apshift = hpte_page_shift(slb->sps, pte.pte0, pte.pte1);
+if (!apshift) {
+error_report("Bad page size encoding in HPTE 0x%"PRIx64" - 0x%"PRIx64
+ " @ 0x%"HWADDR_PRIx, pte.pte0, pte.pte1, pte_offset);
+/* Not entirely sure what the right action here, but machine
+ * check seems reasonable */
+cs->exception_index = POWERPC_EXCP_MCHECK;
+env->error_code = 0;
+return 1;
+}
+
 /* 5. Check access permissions */
 
 pp_prot = ppc_hash64_pte_prot(cpu, slb, pte);
@@ -596,10 +647,10 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, 
target_ulong eaddr,
 
 /* 7. Determine the real address from the PTE */
 
-raddr = deposit64(pte.pte1 & HPTE64_R_RPN, 0, slb->sps->page_shift, eaddr);
+raddr = deposit64(pte.pte1 & HPTE64_R_RPN, 0, apshift, eaddr);
 
 tlb_set_page(cs, eaddr & TARGET_PAGE_MASK, raddr & TARGET_PAGE_MASK,
- prot, mmu_idx, TARGET_PAGE_SIZE);
+ prot, mmu_idx, 1ULL << apshift);
 
 return 0;
 }
@@ -610,6 +661,7 @@ hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, 
target_ulong addr)
 ppc_slb_t *slb;
 hwaddr pte_offset;
 ppc_hash_pte64_t pte;
+unsigned apshift;
 
 if (msr_dr == 0) {
 /* In real mode the top 4 effective address bits are ignored */
@@ -626,7 +678,12 @@ hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, 
target_ulong addr)
 return -1;
 }
 
-return deposit64(pte.pte1 & HPTE64_R_RPN, 0, slb->sps->page_shift, addr)
+apshift = hpte_page_shift(slb->sps, pte.pte0, pte.pte1);
+if (!apshift) {
+return -1;
+}
+
+return deposit64(pte.pte1 & HPTE64_R_RPN, 0, apshift, addr)
 & TARGET_PAGE_MASK;
 }
 
-- 
2.5.0
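
The real-address step in the patch above replaces the low apshift bits of the masked PTE word with the low bits of the effective address. A minimal sketch of that bit deposit follows; it is a simplified stand-in for QEMU's deposit64() with the start bit fixed at 0, and the function name is invented for illustration.

```c
#include <stdint.h>

/* Replace the low `shift` bits of base with the low `shift` bits of
 * value. Simplified stand-in for a deposit at bit 0; shift is expected
 * to be in the range 1..64. */
static uint64_t deposit_low_bits(uint64_t base, unsigned shift, uint64_t value)
{
    uint64_t mask = (shift >= 64) ? ~0ULL : ((1ULL << shift) - 1);

    return (base & ~mask) | (value & mask);
}
```

With a 4 KiB page (shift 12), a page frame at 0x12345000 and an effective address ending in 0xabc give 0x12345abc. With a 16 MiB page (shift 24), the low 24 bits come from the effective address instead, which is why using the HPTE's actual page shift rather than the segment's base page shift changes the result.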

[Qemu-devel] [PULL 35/40] target-ppc: Split 44x tlbiva from ppc_tlb_invalidate_one()

2016-01-31 Thread David Gibson

Currently the tlbiva instruction (used on 44x chips) and the tlbie
instruction (used on hash MMU chips) are both handled via
ppc_tlb_invalidate_one().  This is silly, because they're invoked from
different places, and do different things.

Clean this up by separating out the tlbiva instruction into its own
handling.  In fact the implementation is only a stub anyway.

Signed-off-by: David Gibson 
Reviewed-by: Laurent Vivier 
Acked-by: Benjamin Herrenschmidt 
Reviewed-by: Alexander Graf 
---
 target-ppc/helper.h |  1 +
 target-ppc/mmu_helper.c | 14 ++
 target-ppc/translate.c  |  2 +-
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 869be15..e5a8f7b 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -544,6 +544,7 @@ DEF_HELPER_2(74xx_tlbd, void, env, tl)
 DEF_HELPER_2(74xx_tlbi, void, env, tl)
 DEF_HELPER_FLAGS_1(tlbia, TCG_CALL_NO_RWG, void, env)
 DEF_HELPER_FLAGS_2(tlbie, TCG_CALL_NO_RWG, void, env, tl)
+DEF_HELPER_FLAGS_2(tlbiva, TCG_CALL_NO_RWG, void, env, tl)
 #if defined(TARGET_PPC64)
 DEF_HELPER_FLAGS_3(store_slb, TCG_CALL_NO_RWG, void, env, tl, tl)
 DEF_HELPER_2(load_slb_esid, tl, env, tl)
diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c
index 4343cb2..de4e286 100644
--- a/target-ppc/mmu_helper.c
+++ b/target-ppc/mmu_helper.c
@@ -1946,10 +1946,6 @@ void ppc_tlb_invalidate_one(CPUPPCState *env, 
target_ulong addr)
 ppc6xx_tlb_invalidate_virt(env, addr, 1);
 }
 break;
-case POWERPC_MMU_BOOKE:
-/* XXX: TODO */
-cpu_abort(CPU(cpu), "BookE MMU model is not implemented\n");
-break;
 case POWERPC_MMU_32B:
 case POWERPC_MMU_601:
 /* tlbie invalidate TLBs for all segments */
@@ -2091,6 +2087,16 @@ void helper_tlbie(CPUPPCState *env, target_ulong addr)
 ppc_tlb_invalidate_one(env, addr);
 }
 
+void helper_tlbiva(CPUPPCState *env, target_ulong addr)
+{
+PowerPCCPU *cpu = ppc_env_get_cpu(env);
+
+/* tlbiva instruction only exists on BookE */
+assert(env->mmu_model == POWERPC_MMU_BOOKE);
+/* XXX: TODO */
+cpu_abort(CPU(cpu), "BookE MMU model is not implemented\n");
+}
+
 /* Software driven TLBs management */
 /* PowerPC 602/603 software TLB load instructions helpers */
 static void do_6xx_tlb(CPUPPCState *env, target_ulong new_EPN, int is_code)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 3beeb45..0219d38 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -5905,7 +5905,7 @@ static void gen_tlbiva(DisasContext *ctx)
 }
 t0 = tcg_temp_new();
 gen_addr_reg_index(ctx, t0);
-gen_helper_tlbie(cpu_env, cpu_gpr[rB(ctx->opcode)]);
+gen_helper_tlbiva(cpu_env, cpu_gpr[rB(ctx->opcode)]);
 tcg_temp_free(t0);
 #endif
 }
-- 
2.5.0

Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-01-31 Thread Hailiang Zhang


On 2016/2/1 11:14, Jason Wang wrote:



On 01/27/2016 04:29 PM, zhanghailiang wrote:

We add a new helper function netdev_add_filter(), this function
can help adding a filter object to a netdev.
Besides, we add a is_default member for struct NetFilterState
to indicate whether the filter is default or not.

Signed-off-by: zhanghailiang 
---
v2:
  -Re-implement netdev_add_filter() by re-using object_create()
   (Jason's suggestion)
---
  include/net/filter.h |  7 +
  net/filter.c | 80 
  2 files changed, 87 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index af3c53c..ee1c024 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -55,6 +55,7 @@ struct NetFilterState {
  char *netdev_id;
  NetClientState *netdev;
  NetFilterDirection direction;
+bool is_default;
  bool enabled;
  QTAILQ_ENTRY(NetFilterState) next;
  };
@@ -74,4 +75,10 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
  int iovcnt,
  void *opaque);

+void netdev_add_filter(const char *netdev_id,
+   const char *filter_type,
+   const char *id,
+   bool is_default,
+   Error **errp);
+
  #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter.c b/net/filter.c
index d08a2be..dc7aa9b 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -214,6 +214,86 @@ static void netfilter_complete(UserCreatable *uc, Error 
**errp)
  QTAILQ_INSERT_TAIL(&nf->netdev->filters, nf, next);
  }

+QemuOptsList qemu_filter_opts = {
+.name = "default-filter",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_filter_opts.head),
+.desc = {
+{
+.name = "qom-type",
+.type = QEMU_OPT_STRING,
+},{
+.name = "id",
+.type = QEMU_OPT_STRING,
+},{
+.name = "netdev",
+.type = QEMU_OPT_STRING,
+},{
+.name = "status",
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
+static void filter_set_default_flag(const char *id,
+bool is_default,
+Error **errp)
+{
+Object *obj, *container;
+NetFilterState *nf;
+
+container = object_get_objects_root();
+obj = object_resolve_path_component(container, id);
+if (!obj) {
+error_setg(errp, "object id not found");
+return;
+}
+nf = NETFILTER(obj);
+nf->is_default = is_default;
+}
+
+void netdev_add_filter(const char *netdev_id,
+   const char *filter_type,
+   const char *id,
+   bool is_default,
+   Error **errp)
+{
+NetClientState *nc = qemu_find_netdev(netdev_id);
+char *optarg;
+QemuOpts *opts = NULL;
+Error *err = NULL;
+
+/* FIXME: Not support multiple queues */
+if (!nc || nc->queue_index > 1) {
+return;
+}
+/* Not support vhost-net */
+if (get_vhost_net(nc)) {
+return;
+}
+
+optarg = g_strdup_printf("qom-type=%s,id=%s,netdev=%s,status=%s",
+filter_type, id, netdev_id, is_default ? "disable" : "enable"


Instead of this, I wonder maybe it's better to:

- store the default filter property into a pointer to string


Do you mean passing a string parameter that stores the filter property, instead of
assembling it in this helper?


- colo code may change the pointer to "filter-buffer,status=disable"




Then there's no need for a lot of the code above:
- no need for an "is_default" parameter in netdev_add_filter, which does not
scale considering we may want to have more properties in the future
- no need for hacks like "qemu_filter_opts"


Yes, we can use qemu_find_opts("object") instead of it.


- no need to have a special flag like "is_default"



But we have to distinguish the default filter from the common
filters; should we use the name (id) to distinguish it?

Thanks,
Hailiang


Thoughts?


+opts = qemu_opts_parse_noisily(&qemu_filter_opts,
+   optarg, false);
+if (!opts) {
+error_report("Failed to parse param '%s'", optarg);
+exit(1);
+}
+g_free(optarg);
+if (object_create(NULL, opts, &err) < 0) {
+error_report("Failed to create object");
+goto out_clean;
+}
+filter_set_default_flag(id, is_default, &err);
+
+out_clean:
+qemu_opts_del(opts);
+if (err) {
+error_propagate(errp, err);
+}
+}
+
  static void netfilter_finalize(Object *obj)
  {
  NetFilterState *nf = NETFILTER(obj);




Re: [Qemu-devel] [RFC Patch v2 07/10] virtio-net rsc: Checking TCP flag and drain specific connection packets

2016-01-31 Thread Jason Wang



On 02/01/2016 02:13 AM, w...@redhat.com wrote:
> From: Wei Xu 
>
> Normally it includes 2 typical way to handle a TCP control flag, bypass
> and finalize, bypass means should be sent out directly, and finalize
> means the packets should also be bypassed, and this should be done
> after searching for the same connection packets in the pool and sending
> all of them out, this is to avoid out of data.
>
> All the 'SYN' packets will be bypassed since this always begin a new'
> connection, other flag such 'FIN/RST' will trigger a finalization, because
> this normally happens upon a connection is going to be closed.
>
> Signed-off-by: Wei Xu 
> ---
>  hw/net/virtio-net.c | 66 
> +
>  1 file changed, 66 insertions(+)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 88fc4f8..b0987d0 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -41,6 +41,12 @@
>  
>  #define VIRTIO_HEADER   12/* Virtio net header size */
>  #define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
> +
> +#define IP4_ADDR_OFFSET (IP_OFFSET + 12)/* ipv4 address start */
> +#define TCP4_OFFSET (IP_OFFSET + sizeof(struct ip_header)) /* tcp4 header */
> +#define TCP4_PORT_OFFSET TCP4_OFFSET/* tcp4 port offset */
> +#define IP4_ADDR_SIZE   8   /* ipv4 saddr + daddr */
> +#define TCP_PORT_SIZE   4   /* sport + dport */
>  #define TCP_WINDOW  65535
>  
>  /* IPv4 max payload, 16 bits in the header */
> @@ -1850,6 +1856,27 @@ static int32_t 
> virtio_net_rsc_try_coalesce4(NetRscChain *chain,
>  o_data, &o_ip->ip_len, MAX_IP4_PAYLOAD);
>  }
>  
> +
> +/* Pakcets with 'SYN' should bypass, other flag should be sent after drain
> + * to prevent out of order */
> +static int virtio_net_rsc_parse_tcp_ctrl(uint8_t *ip, uint16_t offset)
> +{
> +uint16_t tcp_flag;
> +struct tcp_header *tcp;
> +
> +tcp = (struct tcp_header *)(ip + offset);
> +tcp_flag = htons(tcp->th_offset_flags) & 0x3F;
> +if (tcp_flag & TH_SYN) {
> +return RSC_BYPASS;
> +}
> +
> +if (tcp_flag & (TH_FIN | TH_URG | TH_RST)) {
> +return RSC_FINAL;
> +}
> +
> +return 0;
> +}

To avoid breaking bisection, this needs to be squashed into the previous
patches for a complete implementation of TCP coalescing.

> +
>  static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
>  const uint8_t *buf, size_t size, VirtioNetCoalesce *coalesce)
>  {
> @@ -1895,12 +1922,51 @@ static size_t virtio_net_rsc_callback(NetRscChain 
> *chain, NetClientState *nc,
>  return virtio_net_rsc_cache_buf(chain, nc, buf, size);
>  }
>  
> +/* Drain a connection data, this is to avoid out of order segments */
> +static size_t virtio_net_rsc_drain_one(NetRscChain *chain, NetClientState 
> *nc,
> +const uint8_t *buf, size_t size, uint16_t ip_start,
> +uint16_t ip_size, uint16_t tcp_port, uint16_t port_size)
> +{
> +NetRscSeg *seg, *nseg;
> +
> +QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
> +if (memcmp(buf + ip_start, seg->buf + ip_start, ip_size)
> +|| memcmp(buf + tcp_port, seg->buf + tcp_port, port_size)) {

Do you really mean "||" here?

> +continue;
> +}
> +if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
> +virtio_net_rsc_ipv4_checksum(seg);
> +}
> +
> +virtio_net_do_receive(seg->nc, seg->buf, seg->size);
> +
> +QTAILQ_REMOVE(&chain->buffers, seg, next);
> +g_free(seg->buf);
> +g_free(seg);

The above three or four lines look duplicated two or three
times in the code of the previous patch. Consider adding a new helper.
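
For illustration only, a minimal sketch of what such a helper could look like. It is not part of the posted series: Seg, Chain, chain_cache_buf() and chain_free_seg() are hypothetical stand-ins for NetRscSeg, NetRscChain and the QTAILQ/g_free calls, written with a hand-rolled list so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for NetRscSeg and NetRscChain. */
typedef struct Seg {
    struct Seg *prev, *next;
    uint8_t *buf;
    size_t size;
} Seg;

typedef struct Chain {
    Seg *head, *tail;
} Chain;

/* Copy buf into a new segment and append it to the chain (NULL on OOM). */
static Seg *chain_cache_buf(Chain *chain, const uint8_t *buf, size_t size)
{
    Seg *seg = calloc(1, sizeof(*seg));

    if (!seg) {
        return NULL;
    }
    seg->buf = malloc(size ? size : 1);
    if (!seg->buf) {
        free(seg);
        return NULL;
    }
    memcpy(seg->buf, buf, size);
    seg->size = size;
    seg->prev = chain->tail;
    if (chain->tail) {
        chain->tail->next = seg;
    } else {
        chain->head = seg;
    }
    chain->tail = seg;
    return seg;
}

/* Unlink seg from the chain and release it: the repeated remove/free steps. */
static void chain_free_seg(Chain *chain, Seg *seg)
{
    assert(chain && seg);
    if (seg->prev) {
        seg->prev->next = seg->next;
    } else {
        chain->head = seg->next;
    }
    if (seg->next) {
        seg->next->prev = seg->prev;
    } else {
        chain->tail = seg->prev;
    }
    free(seg->buf);
    free(seg);
}
```

Each call site would then deliver the segment and call the one helper, instead of repeating the remove and the two frees.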

> +break;
> +}
> +
> +return virtio_net_do_receive(nc, buf, size);
> +}
>  static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
>const uint8_t *buf, size_t size)
>  {
> +int32_t ret;
> +struct ip_header *ip;
>  NetRscChain *chain;
>  
>  chain = (NetRscChain *)opq;
> +ip = (struct ip_header *)(buf + IP_OFFSET);
> +
> +ret = virtio_net_rsc_parse_tcp_ctrl((uint8_t *)ip,
> +(0xF & ip->ip_ver_len) << 2);

This looks like a layer violation. I think it should be done in
virtio_net_rsc_coalesce_tcp().

> +if (RSC_BYPASS == ret) {
> +return virtio_net_do_receive(nc, buf, size);
> +} else if (RSC_FINAL == ret) {
> +return virtio_net_rsc_drain_one(chain, nc, buf, size, 
> IP4_ADDR_OFFSET,
> +IP4_ADDR_SIZE, TCP4_PORT_OFFSET, 
> TCP_PORT_SIZE);

It's better for virtio_net_rsc_drain_one() itself to check the ip proto
and switch to use v4 or v6 offset/size, instead of passing a long
parameter list of OFFSET/SIZE macros.
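
A rough sketch of that idea, under stated assumptions: the function and type names below are hypothetical, the numeric offsets simply mirror the macros in the patch (12-byte virtio header, 14-byte Ethernet header, 20-byte IPv4 header without options, 40-byte IPv6 header without extension headers), and the protocol constants are redefined locally so the block stands alone.

```c
#include <assert.h>
#include <stdint.h>

#define RSC_ETH_P_IP    0x0800   /* same value as ETH_P_IP */
#define RSC_ETH_P_IPV6  0x86DD   /* same value as ETH_P_IPV6 */

#define RSC_IP_OFFSET   (12 + 14) /* virtio header + Ethernet header */

/* Where the connection key (addresses and ports) lives inside a buffer. */
typedef struct RscFlowKey {
    uint16_t addr_off;   /* offset of saddr, daddr follows directly */
    uint16_t addr_size;  /* saddr + daddr */
    uint16_t port_off;   /* offset of sport, dport follows directly */
    uint16_t port_size;  /* sport + dport */
} RscFlowKey;

/* Fill in the key layout for the chain's protocol; -1 if unsupported. */
static int rsc_flow_key_layout(uint16_t proto, RscFlowKey *key)
{
    assert(key);
    switch (proto) {
    case RSC_ETH_P_IP:
        key->addr_off = RSC_IP_OFFSET + 12;
        key->addr_size = 8;
        key->port_off = RSC_IP_OFFSET + 20;
        key->port_size = 4;
        return 0;
    case RSC_ETH_P_IPV6:
        key->addr_off = RSC_IP_OFFSET + 8;
        key->addr_size = 32;
        key->port_off = RSC_IP_OFFSET + 40;
        key->port_size = 4;
        return 0;
    default:
        return -1;
    }
}
```

The drain function would then take only chain, nc, buf and size, and look up the layout from chain->proto itself.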

> +}
> +
>  return virtio_net_rsc_callback(chain, nc,

Re: [Qemu-devel] [RFC Patch v2 09/10] virtio-net rsc: Add IPv6 support

2016-01-31 Thread Jason Wang



On 02/01/2016 02:13 AM, w...@redhat.com wrote:
> From: Wei Xu 
>
> A few more stuffs should be included to support this
> 1. Corresponding chain lookup
> 2. Coalescing callback for the protocol chain
> 3. Filter & Sanity Check.
>
> Signed-off-by: Wei Xu 
> ---
>  hw/net/virtio-net.c | 104 
> +++-
>  1 file changed, 102 insertions(+), 2 deletions(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 9b44762..c9f6bfc 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -46,12 +46,19 @@
>  #define TCP4_OFFSET (IP_OFFSET + sizeof(struct ip_header)) /* tcp4 header */
>  #define TCP4_PORT_OFFSET TCP4_OFFSET/* tcp4 port offset */
>  #define IP4_ADDR_SIZE   8   /* ipv4 saddr + daddr */
> +
> +#define IP6_ADDR_OFFSET (IP_OFFSET + 8) /* ipv6 address start */
> +#define TCP6_OFFSET (IP_OFFSET + sizeof(struct ip6_header)) /* tcp6 header */
> +#define TCP6_PORT_OFFSET TCP6_OFFSET/* tcp6 port offset */
> +#define IP6_ADDR_SIZE   32  /* ipv6 saddr + daddr */
>  #define TCP_PORT_SIZE   4   /* sport + dport */
>  #define TCP_WINDOW  65535
>  
>  /* IPv4 max payload, 16 bits in the header */
>  #define MAX_IP4_PAYLOAD  (65535 - sizeof(struct ip_header))
>  
> +/* ip6 max payload, payload in ipv6 don't include the  header */
> +#define MAX_IP6_PAYLOAD  65535
>  #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
>  
>  /* Purge coalesced packets timer interval */
> @@ -1856,6 +1863,42 @@ static int32_t 
> virtio_net_rsc_try_coalesce4(NetRscChain *chain,
>  o_data, &o_ip->ip_len, MAX_IP4_PAYLOAD);
>  }
>  
> +static int32_t virtio_net_rsc_try_coalesce6(NetRscChain *chain,
> +NetRscSeg *seg, const uint8_t *buf, size_t size)
> +{
> +uint16_t o_ip_len, n_ip_len;/* len in ip header field */
> +uint16_t n_tcp_len, o_tcp_len;  /* tcp header len */
> +uint16_t o_data, n_data;/* payload without virtio/eth/ip/tcp */
> +struct ip6_header *n_ip, *o_ip;
> +struct tcp_header *n_tcp, *o_tcp;
> +
> +n_ip = (struct ip6_header *)(buf + IP_OFFSET);
> +n_ip_len = htons(n_ip->ip6_ctlun.ip6_un1.ip6_un1_plen);
> +n_tcp = (struct tcp_header *)(((uint8_t *)n_ip)\
> ++ sizeof(struct ip6_header));
> +n_tcp_len = (htons(n_tcp->th_offset_flags) & 0xF000) >> 10;
> +n_data = n_ip_len - n_tcp_len;
> +
> +o_ip = (struct ip6_header *)(seg->buf + IP_OFFSET);
> +o_ip_len = htons(o_ip->ip6_ctlun.ip6_un1.ip6_un1_plen);
> +o_tcp = (struct tcp_header *)(((uint8_t *)o_ip)\
> ++ sizeof(struct ip6_header));
> +o_tcp_len = (htons(o_tcp->th_offset_flags) & 0xF000) >> 10;
> +o_data = o_ip_len - o_tcp_len;

As I replied in previous mails, this needs a helper, or just store
pointers to both ip and tcp in seg.
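
As a sketch of the helper variant (hypothetical names, not from the series): the data offset is the top four bits of the 16-bit offset/flags word at byte 12 of the TCP header, counted in 32-bit words, which is why the patch shifts by 10 rather than 12. The l4_len argument is assumed to be the number of bytes following the IP header, i.e. the total length minus the IP header length for IPv4, or the payload length field for IPv6 when no extension headers are present.

```c
#include <assert.h>
#include <stdint.h>

#define RSC_TCP_MIN_HDR 20  /* TCP header without options */

typedef struct RscTcpInfo {
    uint16_t hdr_len;   /* TCP header length in bytes, options included */
    uint16_t flags;     /* low six flag bits (URG..FIN) */
    uint16_t data_len;  /* payload bytes after the TCP header */
} RscTcpInfo;

/* Parse the fields both coalescing paths need; -1 on a malformed header. */
static int rsc_parse_tcp(const uint8_t *tcp, uint16_t l4_len, RscTcpInfo *info)
{
    uint16_t off_flags;

    assert(tcp && info);
    if (l4_len < RSC_TCP_MIN_HDR) {
        return -1;
    }
    /* Read the big-endian offset/flags word without relying on alignment. */
    off_flags = (uint16_t)((tcp[12] << 8) | tcp[13]);
    info->hdr_len = (off_flags & 0xF000) >> 10;  /* 32-bit words * 4 */
    info->flags = off_flags & 0x3F;
    if (info->hdr_len < RSC_TCP_MIN_HDR || info->hdr_len > l4_len) {
        return -1;
    }
    info->data_len = l4_len - info->hdr_len;
    return 0;
}
```

Calling this once for the new buffer and once for the cached segment would replace the two duplicated blocks in each try_coalesce function.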

> +
> +if (memcmp(&n_ip->ip6_src, &o_ip->ip6_src, sizeof(struct in6_address))
> +|| memcmp(&n_ip->ip6_dst, &o_ip->ip6_dst, sizeof(struct in6_address))
> +|| (n_tcp->th_sport ^ o_tcp->th_sport)
> +|| (n_tcp->th_dport ^ o_tcp->th_dport)) {
> +return RSC_NO_MATCH;
> +}

And if you still want to handle coalescing in a layered style, better
to defer the port check to the TCP function.

> +
> +/* There is a difference between payload lenght in ipv4 and v6,
> +   ip header is excluded in ipv6 */
> +return virtio_net_rsc_coalesce_tcp(chain, seg, buf,
> +   n_tcp, n_tcp_len, n_data, o_tcp, o_tcp_len, o_data,
> +   &o_ip->ip6_ctlun.ip6_un1.ip6_un1_plen, 
> MAX_IP6_PAYLOAD);
> +}
>  
>  /* Pakcets with 'SYN' should bypass, other flag should be sent after drain
>   * to prevent out of order */
> @@ -2015,6 +2058,59 @@ static size_t virtio_net_rsc_receive4(void *opq, 
> NetClientState* nc,
> virtio_net_rsc_try_coalesce4);
>  }
>  
> +static int32_t virtio_net_rsc_filter6(NetRscChain *chain, struct ip6_header 
> *ip,
> +  const uint8_t *buf, size_t size)
> +{
> +uint16_t ip_len;
> +
> +if (size < (TCP6_OFFSET + sizeof(tcp_header))) {
> +return RSC_BYPASS;
> +}
> +
> +if (0x6 != (0xF & ip->ip6_ctlun.ip6_un1.ip6_un1_flow)) {
> +return RSC_BYPASS;
> +}
> +
> +/* Both option and protocol is checked in this */
> +if (ip->ip6_ctlun.ip6_un1.ip6_un1_nxt != IPPROTO_TCP) {
> +return RSC_BYPASS;
> +}
> +
> +/* Sanity check */
> +ip_len = htons(ip->ip6_ctlun.ip6_un1.ip6_un1_plen);
> +if (ip_len < sizeof(struct tcp_header)
> +|| ip_len > (size - TCP6_OFFSET)) {
> +return RSC_BYPASS;
> +}
> +
> +return 0;

RSC_WANT?

> +}
> +
> +static size_t virtio_net_rsc_receive6(void *opq, NetClientState* nc,
> +  const uint8_t *buf,

Re: [Qemu-devel] [RFC Patch v2 10/10] virtio-net rsc: Add Receive Segment Coalesce statistics

2016-01-31 Thread Jason Wang

On 02/01/2016 02:13 AM, w...@redhat.com wrote:
> From: Wei Xu 
>
> Add statistics to log what happened during the process.
>
> Signed-off-by: Wei Xu 
> ---
>  hw/net/virtio-net.c| 49 
> +++---
>  include/hw/virtio/virtio.h | 33 +++
>  2 files changed, 79 insertions(+), 3 deletions(-)

Statistics are good, but they need a way to be reported to either the end-user
(ethtool?) or the developer (log, trace or other things).

[Qemu-devel] [PATCH v2] ES1370: QOMify

2016-01-31 Thread Cao jin

Signed-off-by: Cao jin 
---
v1 missed "pci_create_simple" modification.

 hw/audio/es1370.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/audio/es1370.c b/hw/audio/es1370.c
index 592578b..f26fea3 100644
--- a/hw/audio/es1370.c
+++ b/hw/audio/es1370.c
@@ -288,6 +288,10 @@ struct chan_bits {
uint32_t *old_freq, uint32_t *new_freq);
 };
 
+#define TYPE_ES1370 "ES1370"
+#define ES1370(obj) \
+OBJECT_CHECK(ES1370State, (obj), TYPE_ES1370)
+
 static void es1370_dac1_calc_freq (ES1370State *s, uint32_t ctl,
uint32_t *old_freq, uint32_t *new_freq);
 static void es1370_dac2_and_adc_calc_freq (ES1370State *s, uint32_t ctl,
@@ -1013,7 +1017,7 @@ static void es1370_on_reset (void *opaque)
 
 static void es1370_realize(PCIDevice *dev, Error **errp)
 {
-ES1370State *s = DO_UPCAST (ES1370State, dev, dev);
+ES1370State *s = ES1370(dev);
 uint8_t *c = s->dev.config;
 
 c[PCI_STATUS + 1] = PCI_STATUS_DEVSEL_SLOW >> 8;
@@ -1038,7 +1042,7 @@ static void es1370_realize(PCIDevice *dev, Error **errp)
 
 static int es1370_init (PCIBus *bus)
 {
-pci_create_simple (bus, -1, "ES1370");
+pci_create_simple (bus, -1, TYPE_ES1370);
 return 0;
 }
 
@@ -1059,7 +1063,7 @@ static void es1370_class_init (ObjectClass *klass, void 
*data)
 }
 
 static const TypeInfo es1370_info = {
-.name  = "ES1370",
+.name  = TYPE_ES1370,
 .parent= TYPE_PCI_DEVICE,
 .instance_size = sizeof (ES1370State),
 .class_init= es1370_class_init,
-- 
2.1.0

Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-01-31 Thread Hailiang Zhang


On 2016/2/1 15:46, Jason Wang wrote:



On 02/01/2016 02:13 PM, Hailiang Zhang wrote:

On 2016/2/1 11:14, Jason Wang wrote:



On 01/27/2016 04:29 PM, zhanghailiang wrote:

We add a new helper function netdev_add_filter(), this function
can help adding a filter object to a netdev.
Besides, we add a is_default member for struct NetFilterState
to indicate whether the filter is default or not.

Signed-off-by: zhanghailiang 
---
v2:
   -Re-implement netdev_add_filter() by re-using object_create()
(Jason's suggestion)
---
   include/net/filter.h |  7 +
   net/filter.c | 80

   2 files changed, 87 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index af3c53c..ee1c024 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -55,6 +55,7 @@ struct NetFilterState {
   char *netdev_id;
   NetClientState *netdev;
   NetFilterDirection direction;
+bool is_default;
   bool enabled;
   QTAILQ_ENTRY(NetFilterState) next;
   };
@@ -74,4 +75,10 @@ ssize_t
qemu_netfilter_pass_to_next(NetClientState *sender,
   int iovcnt,
   void *opaque);

+void netdev_add_filter(const char *netdev_id,
+   const char *filter_type,
+   const char *id,
+   bool is_default,
+   Error **errp);
+
   #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter.c b/net/filter.c
index d08a2be..dc7aa9b 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -214,6 +214,86 @@ static void netfilter_complete(UserCreatable
*uc, Error **errp)
   QTAILQ_INSERT_TAIL(&nf->netdev->filters, nf, next);
   }

+QemuOptsList qemu_filter_opts = {
+.name = "default-filter",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_filter_opts.head),
+.desc = {
+{
+.name = "qom-type",
+.type = QEMU_OPT_STRING,
+},{
+.name = "id",
+.type = QEMU_OPT_STRING,
+},{
+.name = "netdev",
+.type = QEMU_OPT_STRING,
+},{
+.name = "status",
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
+static void filter_set_default_flag(const char *id,
+bool is_default,
+Error **errp)
+{
+Object *obj, *container;
+NetFilterState *nf;
+
+container = object_get_objects_root();
+obj = object_resolve_path_component(container, id);
+if (!obj) {
+error_setg(errp, "object id not found");
+return;
+}
+nf = NETFILTER(obj);
+nf->is_default = is_default;
+}
+
+void netdev_add_filter(const char *netdev_id,
+   const char *filter_type,
+   const char *id,
+   bool is_default,
+   Error **errp)
+{
+NetClientState *nc = qemu_find_netdev(netdev_id);
+char *optarg;
+QemuOpts *opts = NULL;
+Error *err = NULL;
+
+/* FIXME: Not support multiple queues */
+if (!nc || nc->queue_index > 1) {
+return;
+}
+/* Not support vhost-net */
+if (get_vhost_net(nc)) {
+return;
+}
+
+optarg = g_strdup_printf("qom-type=%s,id=%s,netdev=%s,status=%s",
+filter_type, id, netdev_id, is_default ? "disable" :
"enable"


Instead of this, I wonder maybe it's better to:

- store the default filter property into a pointer to string


Do you mean passing a string parameter that stores the filter property,
instead of assembling it in this helper?


Yes. E.g. just a global string which could be changed by any subsystem.
E.g. colo may change it to "filter-buffer,interval=0,status=disable". But
filter ids need to be generated automatically.



Got it. Then we don't need the global default_netfilter_type[] in patch 5,
and can just use this global string instead?




- colo code may change the pointer to "filter-buffer,status=disable"




Then there's no need for a lot of the code above:
- no need for an "is_default" parameter in netdev_add_filter, which does not
scale considering we may want to have more properties in the future
- no need for hacks like "qemu_filter_opts"


Yes, we can use qemu_find_opts("object") instead of it.


- no need to have a special flag like "is_default"



But we have to distinguish the default filter from the common
filters; should we use the name (id) to distinguish it?


What's the reason that you want to distinguish default filters from others?



The default filters will be used by COLO or MC (in COLO, we will use them
to control packet buffering/releasing).
For COLO, we don't want to control (use) other filters that were added by users.

Thanks,
Hailiang


Thanks



Thanks,
Hailiang


Thoughts?


+opts = qemu_opts_parse_noisily(&qemu_filter_opts,
+   optarg, false);
+if (!opts) {
+

[Qemu-devel] [PATCH 1/2] Emulated CCID card: QOMify

2016-01-31 Thread Cao jin

Signed-off-by: Cao jin 
---
 hw/usb/ccid-card-emulated.c | 20 +---
 hw/usb/ccid.h   |  4 
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/hw/usb/ccid-card-emulated.c b/hw/usb/ccid-card-emulated.c
index 869a63c..0b05260 100644
--- a/hw/usb/ccid-card-emulated.c
+++ b/hw/usb/ccid-card-emulated.c
@@ -42,8 +42,6 @@ do {\
 } \
 } while (0)
 
-#define EMULATED_DEV_NAME "ccid-card-emulated"
-
 #define BACKEND_NSS_EMULATED_NAME "nss-emulated"
 #define BACKEND_CERTIFICATES_NAME "certificates"
 
@@ -133,7 +131,7 @@ struct EmulatedState {
 static void emulated_apdu_from_guest(CCIDCardState *base,
 const uint8_t *apdu, uint32_t len)
 {
-EmulatedState *card = DO_UPCAST(EmulatedState, base, base);
+EmulatedState *card = EMULATED_CCID_CARD(base);
 EmulEvent *event = (EmulEvent *)g_malloc(sizeof(EmulEvent) + len);
 
 assert(event);
@@ -150,7 +148,7 @@ static void emulated_apdu_from_guest(CCIDCardState *base,
 
 static const uint8_t *emulated_get_atr(CCIDCardState *base, uint32_t *len)
 {
-EmulatedState *card = DO_UPCAST(EmulatedState, base, base);
+EmulatedState *card = EMULATED_CCID_CARD(base);
 
 *len = card->atr_length;
 return card->atr;
@@ -478,7 +476,7 @@ static uint32_t parse_enumeration(char *str,
 
 static int emulated_initfn(CCIDCardState *base)
 {
-EmulatedState *card = DO_UPCAST(EmulatedState, base, base);
+EmulatedState *card = EMULATED_CCID_CARD(base);
 VCardEmulError ret;
 const EnumTable *ptable;
 
@@ -514,26 +512,26 @@ static int emulated_initfn(CCIDCardState *base)
 ret = emulated_initialize_vcard_from_certificates(card);
 } else {
 printf("%s: you must provide all three certs for"
-   " certificates backend\n", EMULATED_DEV_NAME);
+   " certificates backend\n", TYPE_EMULATED_CCID);
 return -1;
 }
 } else {
 if (card->backend != BACKEND_NSS_EMULATED) {
 printf("%s: bad backend specified. The options are:\n%s (default),"
-" %s.\n", EMULATED_DEV_NAME, BACKEND_NSS_EMULATED_NAME,
+" %s.\n", TYPE_EMULATED_CCID, BACKEND_NSS_EMULATED_NAME,
 BACKEND_CERTIFICATES_NAME);
 return -1;
 }
 if (card->cert1 != NULL || card->cert2 != NULL || card->cert3 != NULL) 
{
 printf("%s: unexpected cert parameters to nss emulated backend\n",
-   EMULATED_DEV_NAME);
+   TYPE_EMULATED_CCID);
 return -1;
 }
 /* default to mirroring the local hardware readers */
 ret = wrap_vcard_emul_init(NULL);
 }
 if (ret != VCARD_EMUL_OK) {
-printf("%s: failed to initialize vcard\n", EMULATED_DEV_NAME);
+printf("%s: failed to initialize vcard\n", TYPE_EMULATED_CCID);
 return -1;
 }
 qemu_thread_create(&card->event_thread_id, "ccid/event", event_thread,
@@ -545,7 +543,7 @@ static int emulated_initfn(CCIDCardState *base)
 
 static int emulated_exitfn(CCIDCardState *base)
 {
-EmulatedState *card = DO_UPCAST(EmulatedState, base, base);
+EmulatedState *card = EMULATED_CCID_CARD(base);
 VEvent *vevent = vevent_new(VEVENT_LAST, NULL, NULL);
 
 vevent_queue_vevent(vevent); /* stop vevent thread */
@@ -588,7 +586,7 @@ static void emulated_class_initfn(ObjectClass *klass, void 
*data)
 }
 
 static const TypeInfo emulated_card_info = {
-.name  = EMULATED_DEV_NAME,
+.name  = TYPE_EMULATED_CCID,
 .parent= TYPE_CCID_CARD,
 .instance_size = sizeof(EmulatedState),
 .class_init= emulated_class_initfn,
diff --git a/hw/usb/ccid.h b/hw/usb/ccid.h
index 9334da8..315257e 100644
--- a/hw/usb/ccid.h
+++ b/hw/usb/ccid.h
@@ -23,6 +23,10 @@ typedef struct CCIDCardInfo CCIDCardInfo;
 #define CCID_CARD_GET_CLASS(obj) \
  OBJECT_GET_CLASS(CCIDCardClass, (obj), TYPE_CCID_CARD)
 
+#define TYPE_EMULATED_CCID "ccid-card-emulated"
+#define EMULATED_CCID_CARD(obj) \
+ OBJECT_CHECK(EmulatedState, (obj), TYPE_EMULATED_CCID)
+
 /*
  * callbacks to be used by the CCID device (hw/usb-ccid.c) to call
  * into the smartcard device (hw/ccid-card-*.c)
-- 
2.1.0

[Qemu-devel] [PATCH 0/2] CCID QOMify

2016-01-31 Thread Cao jin

As each commit says

Cao jin (2):
  Emulated CCID card: QOMify
  Passthru CCID card: QOMify

 hw/usb/ccid-card-emulated.c | 20 +---
 hw/usb/ccid-card-passthru.c | 10 --
 hw/usb/ccid.h   |  8 
 3 files changed, 21 insertions(+), 17 deletions(-)

-- 
2.1.0

Re: [Qemu-devel] [PATCH RFC v2 2/5] vl: Make object_create() public

2016-01-31 Thread Hailiang Zhang


On 2016/2/1 15:27, Jason Wang wrote:



On 02/01/2016 02:19 PM, Hailiang Zhang wrote:

On 2016/2/1 11:05, Jason Wang wrote:



On 01/27/2016 04:29 PM, zhanghailiang wrote:

Make the helper object_create() public and fix its first
parameter to accept NULL value.


This doesn't look very nice. Maybe passing a new predicate func for the
sanity check is better.



OK, but is it better here to check whether the predicate func is NULL?

Thanks,
Hailiang


Not sure, but if you stick with checking against NULL, it needs a separate patch.



OK, I will drop this unnecessary check and add a sanity check in the next version,
thanks.

Re: [Qemu-devel] [RFC Patch v2 03/10] virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4

2016-01-31 Thread Jason Wang



On 02/01/2016 02:13 AM, w...@redhat.com wrote:
> From: Wei Xu 
>
> Upon a packet is arriving, a corresponding chain will be selected or created,
> or be bypassed if it's not an IPv4 packets.
>
> The callback in the chain will be invoked to call the real coalescing.
>
> Since the coalescing is based on the TCP connection, so the packets will be
> cached if there is no previous data within the same connection.
>
> The framework of IPv4 is also introduced.
>
> This patch depends on patch 2918cf2 (Detailed IPv4 and General TCP data
> coalescing)

Then it looks like the order needs to be changed?

>
> Signed-off-by: Wei Xu 
> ---
>  hw/net/virtio-net.c | 173 
> +++-
>  1 file changed, 172 insertions(+), 1 deletion(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 4e9458e..cfbac6d 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -14,10 +14,12 @@
>  #include "qemu/iov.h"
>  #include "hw/virtio/virtio.h"
>  #include "net/net.h"
> +#include "net/eth.h"
>  #include "net/checksum.h"
>  #include "net/tap.h"
>  #include "qemu/error-report.h"
>  #include "qemu/timer.h"
> +#include "qemu/sockets.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "net/vhost_net.h"
>  #include "hw/virtio/virtio-bus.h"
> @@ -37,6 +39,21 @@
>  #define endof(container, field) \
>  (offsetof(container, field) + sizeof(((container *)0)->field))
>  
> +#define VIRTIO_HEADER   12/* Virtio net header size */

This looks wrong if mrg_rxbuf (VIRTIO_NET_F_MRG_RXBUF) is off.
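
To make the concern concrete, a small sketch with hypothetical names: in the legacy layout the virtio-net header is 10 bytes, and the 16-bit num_buffers field that brings it to 12 is only present when VIRTIO_NET_F_MRG_RXBUF is negotiated (or always, for VIRTIO 1.0 devices). The IP offset therefore has to be derived from the negotiated header length rather than from a constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RSC_ETH_HDR_LEN 14  /* untagged Ethernet header */

/* Guest-visible virtio-net header length for the negotiated features. */
static size_t rsc_guest_hdr_len(bool mrg_rxbuf, bool version_1)
{
    /* 10 bytes legacy; 12 once num_buffers is part of the header. */
    return (mrg_rxbuf || version_1) ? 12 : 10;
}

/* Offset of the IP header inside a received buffer. */
static size_t rsc_ip_offset(size_t guest_hdr_len)
{
    assert(guest_hdr_len == 10 || guest_hdr_len == 12);
    return guest_hdr_len + RSC_ETH_HDR_LEN;
}
```

With mrg_rxbuf off on a legacy device this gives an IP offset of 24, not the 26 that the fixed VIRTIO_HEADER of 12 assumes.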

> +#define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
> +
> +#define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
> +
> +/* Global statistics */
> +static uint32_t rsc_chain_no_mem;

This is meaningless, see below comments.

> +
> +/* Switcher to enable/disable rsc */
> +static bool virtio_net_rsc_bypass;
> +
> +/* Coalesce callback for ipv4/6 */
> +typedef int32_t (VirtioNetCoalesce) (NetRscChain *chain, NetRscSeg *seg,
> + const uint8_t *buf, size_t size);
> +
>  typedef struct VirtIOFeature {
>  uint32_t flags;
>  size_t end;
> @@ -1019,7 +1036,8 @@ static int receive_filter(VirtIONet *n, const uint8_t 
> *buf, int size)
>  return 0;
>  }
>  
> -static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, 
> size_t size)
> +static ssize_t virtio_net_do_receive(NetClientState *nc,
> +  const uint8_t *buf, size_t size)
>  {
>  VirtIONet *n = qemu_get_nic_opaque(nc);
>  VirtIONetQueue *q = virtio_net_get_subqueue(nc);
> @@ -1623,6 +1641,159 @@ static void virtio_net_rsc_cleanup(VirtIONet *n)
>  }
>  }
>  
> +static int virtio_net_rsc_cache_buf(NetRscChain *chain, NetClientState *nc,
> +const uint8_t *buf, size_t size)
> +{
> +NetRscSeg *seg;
> +
> +seg = g_malloc(sizeof(NetRscSeg));
> +if (!seg) {
> +return 0;
> +}

g_malloc() can't fail, no need to check if it succeeded.

> +
> +seg->buf = g_malloc(MAX_VIRTIO_IP_PAYLOAD);
> +if (!seg->buf) {
> +goto out;
> +}
> +
> +memmove(seg->buf, buf, size);
> +seg->size = size;
> +seg->dup_ack_count = 0;
> +seg->is_coalesced = 0;
> +seg->nc = nc;
> +
> +QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
> +return size;
> +
> +out:
> +g_free(seg);
> +return 0;
> +}
> +
> +
> +static int32_t virtio_net_rsc_try_coalesce4(NetRscChain *chain,
> +   NetRscSeg *seg, const uint8_t *buf, size_t size)
> +{
> +/* This real part of this function will be introduced in next patch, just
> +*  return a 'final' to feed the compilation. */
> +return RSC_FINAL;
> +}
> +
> +static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
> +const uint8_t *buf, size_t size, VirtioNetCoalesce *coalesce)
> +{

Looks like this function was called directly, so "callback" suffix is
not accurate.

> +int ret;
> +NetRscSeg *seg, *nseg;
> +
> +if (QTAILQ_EMPTY(&chain->buffers)) {
> +if (!virtio_net_rsc_cache_buf(chain, nc, buf, size)) {
> +return 0;
> +} else {
> +return size;
> +}
> +}
> +
> +QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
> +ret = coalesce(chain, seg, buf, size);
> +if (RSC_FINAL == ret) {

Let's use "ret == RSC_FINAL" for a coding style consistent with the rest
of the qemu code.

> +ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
> +QTAILQ_REMOVE(&chain->buffers, seg, next);
> +g_free(seg->buf);
> +g_free(seg);
> +if (ret == 0) {
> +/* Send failed */
> +return 0;
> +}
> +
> +/* Send current packet */
> +return virtio_net_do_receive(nc, buf, size);
> +} else if (RSC_NO_MATCH == ret) {
> +continue;
> +}

Re: [Qemu-devel] [RFC Patch v2 04/10] virtio-net rsc: Detailed IPv4 and General TCP data coalescing

2016-01-31 Thread Jason Wang



On 02/01/2016 02:13 AM, w...@redhat.com wrote:
> From: Wei Xu 
>
> Since this feature also needs to support IPv6, and there are
> some protocol specific differences difference for IPv4/6 in the header,
> so try to make the interface to be general.
>
> IPv4/6 should set up both the new and old IP/TCP header before invoking
> TCP coalescing, and should also tell the real payload.
>
> The main handler of TCP includes TCP window update, duplicated ACK check
> and the real data coalescing if the new segment passed invalid filter
> and is identified as an expected one.
>
> An expected segment means:
> 1. Segment is within current window and the sequence is the expected one.
> 2. ACK of the segment is in the valid window.
> 3. If the ACK in the segment is a duplicated one, then it must less than 2,
>this is to notify upper layer TCP starting retransmission due to the spec.
>
> Signed-off-by: Wei Xu 
> ---
>  hw/net/virtio-net.c | 127 
> ++--
>  1 file changed, 124 insertions(+), 3 deletions(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index cfbac6d..4f77fbe 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -41,6 +41,10 @@
>  
>  #define VIRTIO_HEADER   12/* Virtio net header size */
>  #define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
> +#define TCP_WINDOW  65535

The name is confusing, how about TCP_MAX_WINDOW_SIZE ?

> +
> +/* IPv4 max payload, 16 bits in the header */
> +#define MAX_IP4_PAYLOAD  (65535 - sizeof(struct ip_header))
>  
>  #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
>  
> @@ -1670,13 +1674,130 @@ out:
>  return 0;
>  }
>  
> +static int32_t virtio_net_rsc_handle_ack(NetRscChain *chain, NetRscSeg *seg,
> + const uint8_t *buf, struct tcp_header *n_tcp,
> + struct tcp_header *o_tcp)
> +{
> +uint32_t nack, oack;
> +uint16_t nwin, owin;
> +
> +nack = htonl(n_tcp->th_ack);
> +nwin = htons(n_tcp->th_win);
> +oack = htonl(o_tcp->th_ack);
> +owin = htons(o_tcp->th_win);
> +
> +if ((nack - oack) >= TCP_WINDOW) {
> +return RSC_FINAL;
> +} else if (nack == oack) {
> +/* duplicated ack or window probe */
> +if (nwin == owin) {
> +/* duplicated ack, add dup ack count due to whql test up to 1 */
> +
> +if (seg->dup_ack_count == 0) {
> +seg->dup_ack_count++;
> +return RSC_COALESCE;
> +} else {
> +/* Spec says should send it directly */
> +return RSC_FINAL;
> +}
> +} else {
> +/* Coalesce window update */

Do we need to flush this immediately, considering it is a window update?

> +o_tcp->th_win = n_tcp->th_win;
> +return RSC_COALESCE;
> +}
> +} else {

What if nack < oack here?
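
On the nack/oack ordering question, a minimal sketch, assuming the same 32-bit unsigned arithmetic the patch uses, of how a test like `(nack - oack) >= TCP_WINDOW` behaves when the new ACK is behind the cached one. The names `ack_delta_class`, `enum ack_class` and `TCP_MAX_WINDOW_SIZE` are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; same value as TCP_WINDOW in the patch. */
#define TCP_MAX_WINDOW_SIZE 65535u

enum ack_class { ACK_DUPLICATE, ACK_ADVANCE, ACK_OUT_OF_WINDOW };

/*
 * Classify a new ACK number against the cached one using modulo-2^32
 * subtraction. A new ACK that is behind the old one wraps around to a
 * very large unsigned difference, so it lands in ACK_OUT_OF_WINDOW.
 */
static enum ack_class ack_delta_class(uint32_t nack, uint32_t oack)
{
    uint32_t delta = nack - oack;   /* well-defined unsigned wraparound */

    if (delta >= TCP_MAX_WINDOW_SIZE) {
        return ACK_OUT_OF_WINDOW;
    }
    return delta == 0 ? ACK_DUPLICATE : ACK_ADVANCE;
}

/* Small self-check showing the three outcomes. */
static void ack_delta_self_check(void)
{
    assert(ack_delta_class(1000u, 1000u) == ACK_DUPLICATE);
    assert(ack_delta_class(1100u, 1000u) == ACK_ADVANCE);
    /* New ACK one behind the old one: delta wraps to 0xFFFFFFFF. */
    assert(ack_delta_class(999u, 1000u) == ACK_OUT_OF_WINDOW);
    /* Legitimate advance across the 2^32 boundary still works. */
    assert(ack_delta_class(5u, 0xFFFFFFF0u) == ACK_ADVANCE);
}
```

Under this assumption an older ACK never reaches the final "pure ack" branch, because the wrapped difference is caught by the first comparison. Whether that outcome is the desired one is still a question for the patch author.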

> +/* pure ack, update ack */
> +o_tcp->th_ack = n_tcp->th_ack;
> +return RSC_COALESCE;
> +}
> +}
> +
> +static int32_t virtio_net_rsc_coalesce_tcp(NetRscChain *chain, NetRscSeg *seg,
> +   const uint8_t *buf, struct tcp_header *n_tcp, uint16_t n_tcp_len,
> +   uint16_t n_data, struct tcp_header *o_tcp, uint16_t o_tcp_len,
> +   uint16_t o_data, uint16_t *p_ip_len, uint16_t max_data)
> +{
> +void *data;
> +uint16_t o_ip_len;
> +uint32_t nseq, oseq;
> +
> +o_ip_len = htons(*p_ip_len);
> +nseq = htonl(n_tcp->th_seq);
> +oseq = htonl(o_tcp->th_seq);
> +

Need to do the TCP header check here. And it looks like we also need to check more:

- Flags
- Data offset
- URG pointer

> +/* Ignore packet with more/larger tcp options */
> +if (n_tcp_len > o_tcp_len) {

What if n_tcp_len < o_tcp_len ?

> +return RSC_FINAL;
> +}
> +
> +/* out of order or retransmitted. */
> +if ((nseq - oseq) > TCP_WINDOW) {
> +return RSC_FINAL;
> +}
> +
> +data = ((uint8_t *)n_tcp) + n_tcp_len;
> +if (nseq == oseq) {
> +if ((0 == o_data) && n_data) {
> +/* From no payload to payload, normal case, not a dup ack or etc */
> +goto coalesce;
> +} else {
> +return virtio_net_rsc_handle_ack(chain, seg, buf, n_tcp, o_tcp);
> +}
> +} else if ((nseq - oseq) != o_data) {
> +/* Not a consistent packet, out of order */
> +return RSC_FINAL;
> +} else {
> +coalesce:
> +if ((o_ip_len + n_data) > max_data) {
> +return RSC_FINAL;
> +}
> +
> +/* Here comes the right data, the payload length in v4/v6 is different,
> +   so use the field value to update */
> +*p_ip_len = htons(o_ip_len + n_data); /* Update new data len */
> +o_tcp->th_offset_flags = n_tcp->th_offset_flags; /* Bring 'PUSH' bit */

Is it correct? How about URG pointer?

> +o_tcp->th_ack =

Re: [Qemu-devel] [PATCH RFC v2 2/5] vl: Make object_create() public

2016-01-31 Thread Hailiang Zhang


On 2016/2/1 11:05, Jason Wang wrote:



On 01/27/2016 04:29 PM, zhanghailiang wrote:

Make the helper object_create() public and fix its first
parameter to accept NULL value.


This doesn't look very nice. Maybe passing a new predicate func for the sanity check
would be better.



OK, but isn't it better here to check whether the predicate func is NULL?

Thanks,
Hailiang



Signed-off-by: zhanghailiang 
Cc: Paolo Bonzini 
---
v2:
  - New patch
---
  include/qemu-common.h | 2 ++
  vl.c  | 4 ++--
  2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index 22b010c..52cf4fd 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -500,4 +500,6 @@ int parse_debug_env(const char *name, int max, int initial);
  const char *qemu_ether_ntoa(const MACAddr *mac);
  void page_size_init(void);

+int object_create(void *opaque, QemuOpts *opts, Error **errp);
+
  #endif
diff --git a/vl.c b/vl.c
index f043009..b21335e 100644
--- a/vl.c
+++ b/vl.c
@@ -2819,7 +2819,7 @@ static bool object_create_delayed(const char *type)
  }


-static int object_create(void *opaque, QemuOpts *opts, Error **errp)
+int object_create(void *opaque, QemuOpts *opts, Error **errp)
  {
  Error *err = NULL;
  char *type = NULL;
@@ -2842,7 +2842,7 @@ static int object_create(void *opaque, QemuOpts *opts, Error **errp)
  if (err) {
  goto out;
  }
-if (!type_predicate(type)) {
+if (type_predicate && !type_predicate(type)) {
  goto out;
  }





[Qemu-devel] [PATCH 2/2] Passthru CCID card: QOMify

2016-01-31 Thread Cao jin

Signed-off-by: Cao jin 
---
 hw/usb/ccid-card-passthru.c | 10 --
 hw/usb/ccid.h   |  4 
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/usb/ccid-card-passthru.c b/hw/usb/ccid-card-passthru.c
index 9f49c05..cbb7c80 100644
--- a/hw/usb/ccid-card-passthru.c
+++ b/hw/usb/ccid-card-passthru.c
@@ -38,8 +38,6 @@ static const uint8_t DEFAULT_ATR[] = {
  0x13, 0x08
 };
 
-
-#define PASSTHRU_DEV_NAME "ccid-card-passthru"
 #define VSCARD_IN_SIZE 65536
 
 /* maximum size of ATR - from 7816-3 */
@@ -316,7 +314,7 @@ static void ccid_card_vscard_event(void *opaque, int event)
 static void passthru_apdu_from_guest(
 CCIDCardState *base, const uint8_t *apdu, uint32_t len)
 {
-PassthruState *card = DO_UPCAST(PassthruState, base, base);
+PassthruState *card = PASSTHRU_CCID_CARD(base);
 
 if (!card->cs) {
 printf("ccid-passthru: no chardev, discarding apdu length %d\n", len);
@@ -327,7 +325,7 @@ static void passthru_apdu_from_guest(
 
 static const uint8_t *passthru_get_atr(CCIDCardState *base, uint32_t *len)
 {
-PassthruState *card = DO_UPCAST(PassthruState, base, base);
+PassthruState *card = PASSTHRU_CCID_CARD(base);
 
 *len = card->atr_length;
 return card->atr;
@@ -335,7 +333,7 @@ static const uint8_t *passthru_get_atr(CCIDCardState *base, uint32_t *len)
 
 static int passthru_initfn(CCIDCardState *base)
 {
-PassthruState *card = DO_UPCAST(PassthruState, base, base);
+PassthruState *card = PASSTHRU_CCID_CARD(base);
 
 card->vscard_in_pos = 0;
 card->vscard_in_hdr = 0;
@@ -399,7 +397,7 @@ static void passthru_class_initfn(ObjectClass *klass, void *data)
 }
 
 static const TypeInfo passthru_card_info = {
-.name  = PASSTHRU_DEV_NAME,
+.name  = TYPE_CCID_PASSTHRU,
 .parent= TYPE_CCID_CARD,
 .instance_size = sizeof(PassthruState),
 .class_init= passthru_class_initfn,
diff --git a/hw/usb/ccid.h b/hw/usb/ccid.h
index 315257e..7a3c3f4 100644
--- a/hw/usb/ccid.h
+++ b/hw/usb/ccid.h
@@ -27,6 +27,10 @@ typedef struct CCIDCardInfo CCIDCardInfo;
 #define EMULATED_CCID_CARD(obj) \
  OBJECT_CHECK(EmulatedState, (obj), TYPE_EMULATED_CCID)
 
+#define TYPE_CCID_PASSTHRU "ccid-card-passthru"
+#define PASSTHRU_CCID_CARD(obj) \
+ OBJECT_CHECK(PassthruState, (obj), TYPE_CCID_PASSTHRU)
+
 /*
  * callbacks to be used by the CCID device (hw/usb-ccid.c) to call
  * into the smartcard device (hw/ccid-card-*.c)
-- 
2.1.0

Re: [Qemu-devel] [PATCH RFC v2 2/5] vl: Make object_create() public

2016-01-31 Thread Jason Wang



On 02/01/2016 02:19 PM, Hailiang Zhang wrote:
> On 2016/2/1 11:05, Jason Wang wrote:
>>
>>
>> On 01/27/2016 04:29 PM, zhanghailiang wrote:
>>> Make the helper object_create() public and fix its first
>>> parameter to accept NULL value.
>>
>> This doesn't look very nice. Maybe passing a new predicate func for the sanity check
>> would be better.
>>
>
> OK, but isn't it better here to check whether the predicate func is NULL?
>
> Thanks,
> Hailiang

Not sure, but if you stick with the check against NULL, it needs a separate patch.
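
The two options discussed here can be sketched side by side. This is only an illustration under assumed names (`type_predicate_fn`, `type_accepted_nullable`, `accept_any_type`, `reject_delayed_type` are all hypothetical); it is not the object_create() code from vl.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef bool (*type_predicate_fn)(const char *type);

/* Option A: tolerate a NULL predicate, in the spirit of the v2 patch. */
static bool type_accepted_nullable(type_predicate_fn pred, const char *type)
{
    return pred == NULL || pred(type);
}

/* Option B: callers always pass a predicate; this one accepts everything. */
static bool accept_any_type(const char *type)
{
    (void)type;
    return true;
}

/* A restrictive predicate with a made-up rule, for comparison. */
static bool reject_delayed_type(const char *type)
{
    return strcmp(type, "delayed-type") != 0;
}

/* Small self-check: both options accept a type when no filtering is wanted. */
static void predicate_self_check(void)
{
    assert(type_accepted_nullable(NULL, "filter-buffer"));
    assert(type_accepted_nullable(accept_any_type, "filter-buffer"));
    assert(!type_accepted_nullable(reject_delayed_type, "delayed-type"));
}
```

Option B keeps the callee free of special cases at the cost of one trivial helper; option A changes the callee's contract, which is presumably why it would deserve its own patch.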

Re: [Qemu-devel] [RFC Patch v2 02/10] virtio-net rsc: Initilize & Cleanup

2016-01-31 Thread Wei Xu




On 02/01/2016 11:32 AM, Jason Wang wrote:


On 02/01/2016 02:13 AM, w...@redhat.com wrote:

From: Wei Xu 

The chain list is initialized when the device is getting realized,
and the entry of the chain will be inserted dynamically according
to protocol type of the network traffic.

All the buffered packets and chain will be destroyed when the
device is going to be unrealized.

Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c| 22 ++
  include/hw/virtio/virtio-net.h |  1 +
  2 files changed, 23 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index a877614..4e9458e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1603,6 +1603,26 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
  return 0;
  }
  
+

+static void virtio_net_rsc_cleanup(VirtIONet *n)
+{
+NetRscChain *chain, *rn_chain;
+NetRscSeg *seg, *rn_seg;
+
+QTAILQ_FOREACH_SAFE(chain, &n->rsc_chains, next, rn_chain) {
+QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn_seg) {
+QTAILQ_REMOVE(&chain->buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);
+
+timer_del(chain->drain_timer);
+timer_free(chain->drain_timer);
+QTAILQ_REMOVE(&n->rsc_chains, chain, next);
+g_free(chain);
+}

This is suspicious. It looks like the chain removal should be in the outer loop.

Oops, my bad, thanks Jason.
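
A minimal sketch of the loop structure being asked for, using plain singly linked lists instead of QEMU's QTAILQ macros so that it stands alone. `struct seg`, `struct chain`, `chain_add`, `seg_add` and `cleanup_all` are simplified, hypothetical stand-ins for the patch's types and functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for NetRscSeg and NetRscChain (hypothetical layout). */
struct seg {
    struct seg *next;
    unsigned char *buf;
};

struct chain {
    struct chain *next;
    struct seg *segs;
};

/* Prepend an empty chain to the list; returns NULL on allocation failure. */
static struct chain *chain_add(struct chain **head)
{
    struct chain *c = calloc(1, sizeof(*c));

    if (c) {
        c->next = *head;
        *head = c;
    }
    return c;
}

/* Prepend a segment with its own buffer; returns 0 on success, -1 on failure. */
static int seg_add(struct chain *c, size_t buf_len)
{
    struct seg *s = calloc(1, sizeof(*s));

    if (!s) {
        return -1;
    }
    s->buf = malloc(buf_len);
    if (!s->buf) {
        free(s);
        return -1;
    }
    s->next = c->segs;
    c->segs = s;
    return 0;
}

/* Free every segment of every chain; returns the number of chains freed. */
static size_t cleanup_all(struct chain **head)
{
    size_t freed_chains = 0;
    struct chain *chain = *head;

    while (chain) {
        struct chain *next_chain = chain->next;  /* save before freeing */
        struct seg *seg = chain->segs;

        while (seg) {                            /* inner loop: segments only */
            struct seg *next_seg = seg->next;

            free(seg->buf);
            free(seg);
            seg = next_seg;
        }

        /* Outer loop: per-chain teardown (timer removal would go here). */
        free(chain);
        freed_chains++;
        chain = next_chain;
    }
    *head = NULL;
    return freed_chains;
}
```

With the per-chain teardown in the outer loop, a chain is released exactly once, and a chain that holds no buffered segments is still released.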




+}
+}
+
  static NetClientInfo net_virtio_info = {
  .type = NET_CLIENT_OPTIONS_KIND_NIC,
  .size = sizeof(NICState),
@@ -1732,6 +1752,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
  nc = qemu_get_queue(n->nic);
  nc->rxfilter_notify_enabled = 1;
  
+QTAILQ_INIT(&n->rsc_chains);

  n->qdev = dev;
  register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
  virtio_net_save, virtio_net_load, n);
@@ -1766,6 +1787,7 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
  g_free(n->vqs);
  qemu_del_nic(n->nic);
  virtio_cleanup(vdev);
+virtio_net_rsc_cleanup(n);
  }
  
  static void virtio_net_instance_init(Object *obj)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index f3cc25f..6ce8b93 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -59,6 +59,7 @@ typedef struct VirtIONet {
  VirtIONetQueue *vqs;
  VirtQueue *ctrl_vq;
  NICState *nic;
+QTAILQ_HEAD(, NetRscChain) rsc_chains;
  uint32_t tx_timeout;
  int32_t tx_burst;
  uint32_t has_vnet_hdr;

Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-01-31 Thread Jason Wang



On 02/01/2016 02:13 PM, Hailiang Zhang wrote:
> On 2016/2/1 11:14, Jason Wang wrote:
>>
>>
>> On 01/27/2016 04:29 PM, zhanghailiang wrote:
>>> We add a new helper function netdev_add_filter(), this function
>>> can help adding a filter object to a netdev.
>>> Besides, we add a is_default member for struct NetFilterState
>>> to indicate whether the filter is default or not.
>>>
>>> Signed-off-by: zhanghailiang 
>>> ---
>>> v2:
>>>   -Re-implement netdev_add_filter() by re-using object_create()
>>>(Jason's suggestion)
>>> ---
>>>   include/net/filter.h |  7 +
>>>   net/filter.c | 80
>>> 
>>>   2 files changed, 87 insertions(+)
>>>
>>> diff --git a/include/net/filter.h b/include/net/filter.h
>>> index af3c53c..ee1c024 100644
>>> --- a/include/net/filter.h
>>> +++ b/include/net/filter.h
>>> @@ -55,6 +55,7 @@ struct NetFilterState {
>>>   char *netdev_id;
>>>   NetClientState *netdev;
>>>   NetFilterDirection direction;
>>> +bool is_default;
>>>   bool enabled;
>>>   QTAILQ_ENTRY(NetFilterState) next;
>>>   };
>>> @@ -74,4 +75,10 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
>>>   int iovcnt,
>>>   void *opaque);
>>>
>>> +void netdev_add_filter(const char *netdev_id,
>>> +   const char *filter_type,
>>> +   const char *id,
>>> +   bool is_default,
>>> +   Error **errp);
>>> +
>>>   #endif /* QEMU_NET_FILTER_H */
>>> diff --git a/net/filter.c b/net/filter.c
>>> index d08a2be..dc7aa9b 100644
>>> --- a/net/filter.c
>>> +++ b/net/filter.c
>>> @@ -214,6 +214,86 @@ static void netfilter_complete(UserCreatable *uc, Error **errp)
>>>   QTAILQ_INSERT_TAIL(&nf->netdev->filters, nf, next);
>>>   }
>>>
>>> +QemuOptsList qemu_filter_opts = {
>>> +.name = "default-filter",
>>> +.head = QTAILQ_HEAD_INITIALIZER(qemu_filter_opts.head),
>>> +.desc = {
>>> +{
>>> +.name = "qom-type",
>>> +.type = QEMU_OPT_STRING,
>>> +},{
>>> +.name = "id",
>>> +.type = QEMU_OPT_STRING,
>>> +},{
>>> +.name = "netdev",
>>> +.type = QEMU_OPT_STRING,
>>> +},{
>>> +.name = "status",
>>> +.type = QEMU_OPT_STRING,
>>> +},
>>> +{ /* end of list */ }
>>> +},
>>> +};
>>> +
>>> +static void filter_set_default_flag(const char *id,
>>> +bool is_default,
>>> +Error **errp)
>>> +{
>>> +Object *obj, *container;
>>> +NetFilterState *nf;
>>> +
>>> +container = object_get_objects_root();
>>> +obj = object_resolve_path_component(container, id);
>>> +if (!obj) {
>>> +error_setg(errp, "object id not found");
>>> +return;
>>> +}
>>> +nf = NETFILTER(obj);
>>> +nf->is_default = is_default;
>>> +}
>>> +
>>> +void netdev_add_filter(const char *netdev_id,
>>> +   const char *filter_type,
>>> +   const char *id,
>>> +   bool is_default,
>>> +   Error **errp)
>>> +{
>>> +NetClientState *nc = qemu_find_netdev(netdev_id);
>>> +char *optarg;
>>> +QemuOpts *opts = NULL;
>>> +Error *err = NULL;
>>> +
>>> +/* FIXME: Not support multiple queues */
>>> +if (!nc || nc->queue_index > 1) {
>>> +return;
>>> +}
>>> +/* Not support vhost-net */
>>> +if (get_vhost_net(nc)) {
>>> +return;
>>> +}
>>> +
>>> +optarg = g_strdup_printf("qom-type=%s,id=%s,netdev=%s,status=%s",
>>> +filter_type, id, netdev_id, is_default ? "disable" : "enable"
>>
>> Instead of this, I wonder maybe it's better to:
>>
>> - store the default filter property into a pointer to string
>
> Do you mean passing a string parameter that stores the filter property
> instead of assembling it in this helper?

Yes, e.g. just a global string which could be changed by any subsystem.
For example, COLO may change it to "filter-buffer,interval=0,status=disable". But
filter ids need to be generated automatically.
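
A rough sketch of that idea, assuming the default properties live in a caller-supplied string and the id comes from a counter. The function name `build_default_filter_opts`, the `default-filter%u` id scheme and the exact option layout are assumptions made for illustration, not existing QEMU interfaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append an automatically generated id and the netdev to a default
 * filter property string. Returns 0 on success, -1 if 'out' is too
 * small; the id counter only advances on success.
 */
static int build_default_filter_opts(char *out, size_t out_len,
                                     const char *props,
                                     const char *netdev_id,
                                     unsigned *next_id)
{
    int n = snprintf(out, out_len, "%s,id=default-filter%u,netdev=%s",
                     props, *next_id, netdev_id);

    if (n < 0 || (size_t)n >= out_len) {
        return -1;
    }
    (*next_id)++;
    return 0;
}

/* Small self-check using the property string quoted in this mail. */
static void default_filter_opts_self_check(void)
{
    char buf[128];
    unsigned id = 0;

    assert(build_default_filter_opts(buf, sizeof(buf),
               "filter-buffer,interval=0,status=disable", "hn0", &id) == 0);
    assert(strcmp(buf, "filter-buffer,interval=0,status=disable,"
                       "id=default-filter0,netdev=hn0") == 0);
    assert(id == 1);
}
```

The resulting string could then be handed to whatever option parser ends up being used; no "is_default" parameter is needed because the caller owns the property string.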

>
>> - colo code may change the pointer to "filter-buffer,status=disable"
>>
>
>> Then there's no need for much of the code above:
>> - no need for an "is_default" parameter in netdev_add_filter, which does not
>> scale considering we may want more properties in the future
>> - no need for hacks like "qemu_filter_opts"
>
> Yes, we can use qemu_find_opts("object") instead of it.
>
>> - no need to have a special flag like "is_default"
>>
>
> But we have to distinguish the default filter from ordinary
> filters. Should we use the name (id) to distinguish it?

What's the reason that you want to distinguish default filters from others?

Thanks

>
> Thanks,
> Hailiang
>
>> Thoughts?
>>
>>> +opts =

Re: [Qemu-devel] [RFC Patch v2 05/10] virtio-net rsc: Create timer to drain the packets from the cache pool

2016-01-31 Thread Jason Wang



On 02/01/2016 02:13 AM, w...@redhat.com wrote:
> From: Wei Xu 
>
> The timer is only triggered if the packet pool is not empty,
> and it drains all the cached packets; this reduces the
> delay to the upper-layer protocol stack.
>
> Signed-off-by: Wei Xu 
> ---
>  hw/net/virtio-net.c | 38 ++
>  1 file changed, 38 insertions(+)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 4f77fbe..93df0d5 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -48,12 +48,17 @@
>  
>  #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
>  
> +/* Purge coalesced packets timer interval */
> +#define RSC_TIMER_INTERVAL  50

Any rationale for choosing this as the default value? Do we need a property for
the user to change this?

> +
>  /* Global statistics */
>  static uint32_t rsc_chain_no_mem;
>  
>  /* Switcher to enable/disable rsc */
>  static bool virtio_net_rsc_bypass;
>  
> +static uint32_t rsc_timeout = RSC_TIMER_INTERVAL;
> +
>  /* Coalesce callback for ipv4/6 */
>  typedef int32_t (VirtioNetCoalesce) (NetRscChain *chain, NetRscSeg *seg,
>   const uint8_t *buf, size_t size);
> @@ -1625,6 +1630,35 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>  return 0;
>  }
>  
> +static void virtio_net_rsc_purge(void *opq)
> +{
> +int ret = 0;
> +NetRscChain *chain = (NetRscChain *)opq;
> +NetRscSeg *seg, *rn;
> +
> +QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) {
> +if (!qemu_can_send_packet(seg->nc)) {
> +/* Should we quit or continue? Not sure whether one or more
> +* of the queues could fail; try to continue here */

This looks wrong: qemu_can_send_packet() is meant for nc's peer, not nc
itself.

> +continue;
> +}
> +
> +ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
> +QTAILQ_REMOVE(&chain->buffers, seg, next);
> +g_free(seg->buf);
> +g_free(seg);
> +
> +if (ret == 0) {
> +/* Try next queue */

Try next seg?

> +continue;
> +}

Why is the above needed?

> +}
> +
> +if (!QTAILQ_EMPTY(&chain->buffers)) {
> +timer_mod(chain->drain_timer,
> +  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + rsc_timeout);

We need to stop/start the timer on VM stop/start to save CPU.

> +}
> +}
>  
>  static void virtio_net_rsc_cleanup(VirtIONet *n)
>  {
> @@ -1810,6 +1844,8 @@ static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
>  if (!virtio_net_rsc_cache_buf(chain, nc, buf, size)) {
>  return 0;
>  } else {
> +timer_mod(chain->drain_timer,
> +  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + rsc_timeout);
>  return size;
>  }
>  }
> @@ -1877,6 +1913,8 @@ static NetRscChain *virtio_net_rsc_lookup_chain(NetClientState *nc,
>  }
>  
>  chain->proto = proto;
> +chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
> +  virtio_net_rsc_purge, chain);
>  chain->do_receive = virtio_net_rsc_receive4;
>  
>  QTAILQ_INIT(&chain->buffers);

Re: [Qemu-devel] [PATCH V2] net/traffic-mirror:Add traffic-mirror

2016-01-31 Thread Li Zhijian




On 02/01/2016 10:57 AM, Jason Wang wrote:



On 01/29/2016 09:38 AM, Li Zhijian wrote:



On 01/28/2016 01:44 PM, Jason Wang wrote:



On 01/27/2016 10:40 AM, Zhang Chen wrote:

From: ZhangChen 

Traffic-mirror is a netfilter plugin.
It gives qemu the ability to copy and mirror the guest's
network packets. The packets are output to a chardev.

usage:

-netdev tap,id=hn0
-chardev socket,id=mirror0,host=ip_primary,port=X,server,nowait
-traffic-mirror,id=m0,netdev=hn0,queue=tx/rx/all,outdev=mirror0

Signed-off-by: ZhangChen 
Signed-off-by: Wen Congyang 
Reviewed-by: Yang Hongyang 


Thanks for the patch. Several questions:

- I'm curious how the patch was tested. A simple setup, e.g.:

-netdev tap,id=hn0 -device virtio-net-pci,netdev=hn0 -chardev
socket,id=c0,host=localhost,port=,server,nowait -object
traffic-mirror,netdev=hn0,outdev=c0,id=f0 -netdev
socket,id=s0,connect=127.0.0.1: -device e1000,netdev=s0



A strange thing about "host=localhost": the connection is refused on SUSE 11.3 but
succeeds on Ubuntu 15.10 when I launch qemu with the command line above.
I tried launching qemu on three physical machines running SUSE 11.3, and the
connection failed on all of them. But when I specified "host=127.0.0.1", the connection was OK.

I have confirmed that:
- "localhost" resolves to 127.0.0.1 when I "ping localhost" on SUSE
- "telnet localhost " works on SUSE



does not work for me.

Hi, Jason

I just tested the mirror using the command line above, and it doesn't work for me either.
I am looking into it, and it seems to be caused by the -net socket
problem that
I once posted a patch to fix (see below):
[Qemu-devel] [PATCH] report a error message if -net socket can not
connect to server
https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg00758.html


Will have a look at this.



After applying this patch, the qemu monitor shows the following message:
(qemu) qemu-system-x86_64: net socket is not connected Connection refused


There may be two issues. Have you tried starting the mirror on one VM and then
using the socket backend to connect to it from another VM?


Yes, if I connect to the mirror on VM1 using the socket backend from another VM2, the
connection is established successfully. But in the VM2 guest, I can't capture any packets
using 'tcpdump'. That's because in the current version of the code, the mirror is not
compatible with the socket backend; we will fix it in the next version.


Best regards.
Li Zhijian






Thanks
Li Zhijian








[Qemu-devel] [PATCH] ES1370: QOMify

2016-01-31 Thread Cao jin

Signed-off-by: Cao jin 
---
 hw/audio/es1370.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/audio/es1370.c b/hw/audio/es1370.c
index 592578b..089dd0e 100644
--- a/hw/audio/es1370.c
+++ b/hw/audio/es1370.c
@@ -288,6 +288,10 @@ struct chan_bits {
uint32_t *old_freq, uint32_t *new_freq);
 };
 
+#define TYPE_ES1370 "ES1370"
+#define ES1370(obj) \
+OBJECT_CHECK(ES1370State, (obj), TYPE_ES1370)
+
 static void es1370_dac1_calc_freq (ES1370State *s, uint32_t ctl,
uint32_t *old_freq, uint32_t *new_freq);
 static void es1370_dac2_and_adc_calc_freq (ES1370State *s, uint32_t ctl,
@@ -1013,7 +1017,7 @@ static void es1370_on_reset (void *opaque)
 
 static void es1370_realize(PCIDevice *dev, Error **errp)
 {
-ES1370State *s = DO_UPCAST (ES1370State, dev, dev);
+ES1370State *s = ES1370(dev);
 uint8_t *c = s->dev.config;
 
 c[PCI_STATUS + 1] = PCI_STATUS_DEVSEL_SLOW >> 8;
@@ -1059,7 +1063,7 @@ static void es1370_class_init (ObjectClass *klass, void *data)
 }
 
 static const TypeInfo es1370_info = {
-.name  = "ES1370",
+.name  = TYPE_ES1370,
 .parent= TYPE_PCI_DEVICE,
 .instance_size = sizeof (ES1370State),
 .class_init= es1370_class_init,
-- 
2.1.0

Re: [Qemu-devel] [RFC Patch v2 06/10] virtio-net rsc: IPv4 checksum

2016-01-31 Thread Jason Wang



On 02/01/2016 02:13 AM, w...@redhat.com wrote:
> From: Wei Xu 
>
> If a field in the IPv4 header is modified, then the checksum
> has to be recalculated before sending it out.

This in fact breaks bisection. I think you need to either squash this into the
previous patch or introduce virtio_net_rsc_ipv4_checksum() as a helper
before the patch for IPv4 coalescing.
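
For readers following along, a standalone sketch of what such a helper has to compute: the RFC 1071 ones'-complement checksum over the IPv4 header with the checksum field zeroed first. It works on raw bytes and does not use QEMU's net_checksum_* helpers; the function names `ipv4_header_checksum` and `ipv4_refresh_checksum` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement sum of 16-bit big-endian words over 'len' bytes. */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    }
    if (i < len) {                      /* odd trailing byte, padded with 0 */
        sum += (uint32_t)hdr[i] << 8;
    }
    while (sum >> 16) {                 /* fold carries back into 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)~sum;
}

/* Zero the checksum field (bytes 10 and 11), recompute, store big-endian. */
static void ipv4_refresh_checksum(uint8_t *hdr, size_t hdr_len)
{
    uint16_t csum;

    hdr[10] = 0;
    hdr[11] = 0;
    csum = ipv4_header_checksum(hdr, hdr_len);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xFFu);
}

/* Self-check: after a length change and refresh, the header sums to zero. */
static void ipv4_checksum_self_check(void)
{
    uint8_t hdr[20] = {
        0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00,
        0x40, 0x11, 0x00, 0x00, 0xc0, 0xa8, 0x00, 0x01,
        0xc0, 0xa8, 0x00, 0xc7
    };

    hdr[2] = 0x05;                      /* pretend coalescing grew ip_len */
    hdr[3] = 0xdc;
    ipv4_refresh_checksum(hdr, sizeof(hdr));
    assert(ipv4_header_checksum(hdr, sizeof(hdr)) == 0);
}
```

Any change to ip_len after coalescing invalidates the stored checksum, which is why the refresh has to happen before the segment is handed to the guest.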

>
> Signed-off-by: Wei Xu 
> ---
>  hw/net/virtio-net.c | 19 +++
>  1 file changed, 19 insertions(+)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 93df0d5..88fc4f8 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1630,6 +1630,18 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>  return 0;
>  }
>  
> +static void virtio_net_rsc_ipv4_checksum(NetRscSeg *seg)
> +{
> +uint32_t sum;
> +struct ip_header *ip;
> +
> +ip = (struct ip_header *)(seg->buf + IP_OFFSET);
> +
> +ip->ip_sum = 0;
> +sum = net_checksum_add_cont(sizeof(struct ip_header), (uint8_t *)ip, 0);
> +ip->ip_sum = cpu_to_be16(net_checksum_finish(sum));
> +}
> +
>  static void virtio_net_rsc_purge(void *opq)
>  {
>  int ret = 0;
> @@ -1643,6 +1655,10 @@ static void virtio_net_rsc_purge(void *opq)
>  continue;
>  }
>  
> +if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
> +virtio_net_rsc_ipv4_checksum(seg);
> +}
> +
>  ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
>  QTAILQ_REMOVE(&chain->buffers, seg, next);
>  g_free(seg->buf);
> @@ -1853,6 +1869,9 @@ static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
>  QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
>  ret = coalesce(chain, seg, buf, size);
>  if (RSC_FINAL == ret) {
> +if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
> +virtio_net_rsc_ipv4_checksum(seg);
> +}
>  ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
>  QTAILQ_REMOVE(&chain->buffers, seg, next);
>  g_free(seg->buf);

Re: [Qemu-devel] [RFC Patch v2 08/10] virtio-net rsc: Sanity check & More bypass cases check

2016-01-31 Thread Jason Wang



On 02/01/2016 02:13 AM, w...@redhat.com wrote:
> From: Wei Xu 
>
> More general exception case checks:
> 1. Incorrect version in IP header
> 2. IP options & IP fragment
> 3. Non-TCP packets
> 4. Sanity size check to prevent buffer overflow attack.
>
> Signed-off-by: Wei Xu 

Let's squash this into the previous patches too, for better bisectability
and a complete implementation.

> ---
>  hw/net/virtio-net.c | 44 
>  1 file changed, 44 insertions(+)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index b0987d0..9b44762 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1948,6 +1948,46 @@ static size_t virtio_net_rsc_drain_one(NetRscChain *chain, NetClientState *nc,
>  
>  return virtio_net_do_receive(nc, buf, size);
>  }
> +
> +static int32_t virtio_net_rsc_filter4(NetRscChain *chain, struct ip_header *ip,
> +  const uint8_t *buf, size_t size)

This function checks the IP header, so it needs to be renamed to something like
"virtio_net_rsc_ipv4_filter()"

> +{
> +uint16_t ip_len;
> +
> +if (size < (TCP4_OFFSET + sizeof(tcp_header))) {
> +return RSC_BYPASS;
> +}
> +
> +/* Not an ipv4 one */
> +if (0x4 != ((0xF0 & ip->ip_ver_len) >> 4)) {

Let's not use magic values like 0x4 here.
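
One way the version and header-length tests could read with named constants, as a self-contained sketch. The macro and function names below (`IPV4_VERSION`, `IPV4_MIN_HDR_WORDS`, `IP_VER`, `IP_HDR_WORDS`, `ipv4_plain_header`) are made up for this example and are not taken from QEMU headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the values hard-coded in the patch. */
#define IPV4_VERSION          4   /* value of the version nibble for IPv4 */
#define IPV4_MIN_HDR_WORDS    5   /* 5 x 32-bit words = 20 bytes, no options */

/* The first header byte packs version (high nibble) and IHL (low nibble). */
#define IP_VER(ver_len)       (((ver_len) >> 4) & 0x0F)
#define IP_HDR_WORDS(ver_len) ((ver_len) & 0x0F)

/* True for an IPv4 header that carries no IP options. */
static bool ipv4_plain_header(uint8_t ver_len)
{
    return IP_VER(ver_len) == IPV4_VERSION &&
           IP_HDR_WORDS(ver_len) == IPV4_MIN_HDR_WORDS;
}

/* Small self-check covering the two bypass cases from the patch. */
static void ipv4_plain_header_self_check(void)
{
    assert(ipv4_plain_header(0x45));    /* IPv4, 20-byte header */
    assert(!ipv4_plain_header(0x46));   /* IPv4 with options */
    assert(!ipv4_plain_header(0x65));   /* version nibble is not 4 */
}
```

Both bypass conditions then read as statements about the protocol rather than as bit patterns.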

> +return RSC_BYPASS;
> +}
> +
> +/* Don't handle packets with ip option */
> +if (5 != (0xF & ip->ip_ver_len)) {
> +return RSC_BYPASS;
> +}
> +
> +/* Don't handle packets with ip fragment */
> +if (!(htons(ip->ip_off) & IP_DF)) {
> +return RSC_BYPASS;
> +}
> +
> +if (ip->ip_p != IPPROTO_TCP) {
> +return RSC_BYPASS;
> +}
> +
> +/* Sanity check */
> +ip_len = htons(ip->ip_len);
> +if (ip_len < (sizeof(struct ip_header) + sizeof(struct tcp_header))
> +|| ip_len > (size - IP_OFFSET)) {
> +return RSC_BYPASS;
> +}
> +
> +return RSC_WANT;
> +}
> +
> +
>  static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
>const uint8_t *buf, size_t size)
>  {
> @@ -1958,6 +1998,10 @@ static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
>  chain = (NetRscChain *)opq;
>  ip = (struct ip_header *)(buf + IP_OFFSET);
>  
> +if (RSC_WANT != virtio_net_rsc_filter4(chain, ip, buf, size)) {
> +return virtio_net_do_receive(nc, buf, size);
> +}
> +
>  ret = virtio_net_rsc_parse_tcp_ctrl((uint8_t *)ip,
>  (0xF & ip->ip_ver_len) << 2);
>  if (RSC_BYPASS == ret) {

Re: [Qemu-devel] [PATCH 3/3] ppc: include timebase in migration stream for g3beige/mac99 machines

2016-01-31 Thread David Gibson

On Sun, Jan 31, 2016 at 08:10:08PM +, Mark Cave-Ayland wrote:
> On 31/01/16 19:58, Peter Maydell wrote:
> 
> > On 31 January 2016 at 19:19, Mark Cave-Ayland
> >  wrote:
> >> Signed-off-by: Mark Cave-Ayland 
> >> ---
> >>  hw/ppc/mac_newworld.c |4 
> >>  hw/ppc/mac_oldworld.c |4 
> >>  2 files changed, 8 insertions(+)
> >>
> >> diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
> >> index f95086b..3283f1d 100644
> >> --- a/hw/ppc/mac_newworld.c
> >> +++ b/hw/ppc/mac_newworld.c
> >> @@ -179,6 +179,7 @@ static void ppc_core99_init(MachineState *machine)
> >>  int *token = g_new(int, 1);
> >>  hwaddr nvram_addr = 0xFFF04000;
> >>  uint64_t tbfreq;
> >> +PPCTimebase *tb;
> >>
> >>  linux_boot = (kernel_filename != NULL);
> >>
> >> @@ -201,6 +202,9 @@ static void ppc_core99_init(MachineState *machine)
> >>  /* Set time-base frequency to 100 Mhz */
> >>  cpu_ppc_tb_init(env, TBFREQ);
> >>  qemu_register_reset(ppc_core99_reset, cpu);
> >> +
> >> +tb = g_malloc0(sizeof(PPCTimebase));
> >> +vmstate_register(NULL, -1, &vmstate_ppc_timebase, tb);
> > 
> > Is there no way to avoid the vmstate_register here (ie to
> > tie the migration data to an actual device or CPU object) ?
> 
> Not exactly that I know of - although I shamelessly borrowed this part
> from similar code in spapr which has this comment:
> 
> /* FIXME: Should register things through the MachineState's qdev
>  * interface, this is a legacy from the sPAPREnvironment structure
>  * which predated MachineState but had a similar function */
> 
> Is this something that is now possible?

Well, it's certainly possible to do better than this.  You want to
make a vmstate_g3beige and vmstate_mac99 which contain all the machine
level things to migrate for these machines, similar to vmstate_spapr.
They will be attached to the MachineState object.

That will at least mean that if more things need to get added to
migration for these machines, then additional vmstate_register() calls
won't be needed.

I'm not sure if there's a better way to register a vmstate for a
machine type.  I thought there was, but I couldn't spot it in a quick
look.

Peter,

I believe this does need to be attached to the machine, not to the
cpu, even though the cpu would seem to make more sense on a first
look.  The reason is that attaching it to the cpu means it will be
transferred separately for each cpu, and unless we're super-careful
about timing the destination cpus could end up with slightly different
values.  That would be bad, because ppc has a pretty strong
requirement that the timebases be synchronized across all cpus in an
smp system.  The means of initially accomplishing that vary by
platform - usually there's some board level register to freeze /
resume all the timebases - but however it's been done, we don't want
to mess it up on migration.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


Re: [Qemu-devel] [PATCH 2/3] ppc: add support for timebase migration on non-PPC hosts

2016-01-31 Thread David Gibson

On Sun, Jan 31, 2016 at 07:19:35PM +, Mark Cave-Ayland wrote:
> This patch provides support for migration of the PPC guest timebase on non-PPC
> host architectures (i.e those using QEMU's virtual emulated timebase).
> 
> Signed-off-by: Mark Cave-Ayland 

We shouldn't need an explicit test for a ppc host.  Instead we should
never be touching any host-dependent ticks values, only using host
side interfaces which work in realtime units like ns.

Worse, the ppc host variants here will still be wrong if the host has
a different timebase frequency to the guest, which will always be true
for a g3beige (16MHz) on a modern ppc host (512 MHz).


> ---
>  hw/ppc/ppc.c |   33 +++--
>  1 file changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index 19f4570..9b80c1d 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -832,6 +832,15 @@ static void cpu_ppc_set_tb_clk (void *opaque, uint32_t freq)
>  cpu_ppc_store_purr(cpu, 0xULL);
>  }
>  
> +static int host_cpu_is_ppc(void)
> +{
> +#if defined(_ARCH_PPC)
> +return -1;
> +#else
> +return 0;
> +#endif
> +}
> +
>  static void timebase_pre_save(void *opaque)
>  {
>  PPCTimebase *tb = opaque;
> @@ -844,11 +853,16 @@ static void timebase_pre_save(void *opaque)
>  }
>  
>  tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
> -/*
> - * tb_offset is only expected to be changed by migration so
> - * there is no need to update it from KVM here
> - */
> -tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
> +
> +if (host_cpu_is_ppc()) {
> +/*
> + * tb_offset is only expected to be changed by migration so
> + * there is no need to update it from KVM here
> + */
> +tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
> +} else {
> +tb->guest_timebase = cpu_ppc_load_tbl(&first_ppc_cpu->env);
> +}
>  }
>  
>  static int timebase_post_load(void *opaque, int version_id)
> @@ -879,7 +893,14 @@ static int timebase_post_load(void *opaque, int version_id)
>   NANOSECONDS_PER_SECOND);
>  guest_tb = tb_remote->guest_timebase + migration_duration_tb;
>  
> -tb_off_adj = guest_tb - cpu_get_host_ticks();
> +if (host_cpu_is_ppc()) {
> +/* Hardware timebase */
> +tb_off_adj = guest_tb - cpu_get_host_ticks();
> +} else {
> +/* Software timebase */
> +tb_off_adj = guest_tb - muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
> + freq, get_ticks_per_sec());
> +}
>  
>  tb_off = first_ppc_cpu->env.tb_env->tb_offset;
>  trace_ppc_tb_adjust(tb_off, tb_off_adj, tb_off_adj - tb_off,

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] Migrating decrementer

2016-01-31 Thread David Gibson

On Tue, Jan 26, 2016 at 10:31:19PM +, Mark Cave-Ayland wrote:
> On 25/01/16 11:10, David Gibson wrote:
> 
> > Um.. so the migration duration is a complete red herring, regardless
> > of the units.
> > 
> > Remember, we only ever compute the guest timebase value at the moment
> > the guest requests it - actually maintaining a current timebase value
> > makes sense in hardware, but would be nuts in software.
> > 
> > The timebase is a function of real, wall-clock time, and the migration
> > destination has a notion of wall-clock time without reference to the
> > source.
> > 
> > So what you need to transmit for migration is enough information to
> > compute the guest timebase from real-time - essentially just an offset
> > between real-time and the timebase.
> > 
> > The guest can potentially observe the migration duration as a jump in
> > timebase values, but qemu doesn't need to do any calculations with it.
> 
> Thanks for more pointers - I think I'm slowly getting there. My current
> thoughts are that the basic migration algorithm is doing the right thing
> in that it works out the number of host ticks different between source
> and destination.

Sorry, I've taken a while to reply to this.  I realised the tb
migration didn't work the way I thought it did, so I've had to get my
head around what's actually going on.

I had thought that it transferred only meta-information telling the
destination how to calculate the timebase, without actually working
out the timebase value at any particular moment.

In fact, what it sends is basically the tuple of (timebase, realtime)
at the point of sending the migration stream.  The destination then
uses that to work out how to compute the timebase from realtime there.
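
As a rough sketch of that scheme: the names below are hypothetical and this is not the code in hw/ppc/ppc.c. It assumes both hosts expose a nanosecond wall clock and that the guest timebase runs at a fixed frequency.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Hypothetical migration record: the guest timebase sampled together
 * with the source host's wall-clock time in nanoseconds. */
struct tb_sample {
    uint64_t guest_timebase;
    int64_t host_ns;
};

/* Convert nanoseconds to timebase ticks.  Whole seconds and the
 * remainder are scaled separately so the product stays within 64 bits
 * for any frequency below 18 GHz. */
static uint64_t ns_to_tb(uint64_t ns, uint64_t tb_freq)
{
    assert(tb_freq > 0 && tb_freq < 18000000000ULL);
    return (ns / NS_PER_SEC) * tb_freq +
           (ns % NS_PER_SEC) * tb_freq / NS_PER_SEC;
}

/* Destination side: derive an offset such that
 * timebase(t) == ns_to_tb(t) + offset for any later wall-clock time t. */
static int64_t tb_offset_from_sample(const struct tb_sample *s,
                                     uint64_t tb_freq)
{
    assert(s->host_ns >= 0);
    return (int64_t)(s->guest_timebase -
                     ns_to_tb((uint64_t)s->host_ns, tb_freq));
}

/* Timebase value at wall-clock time now_ns, given the stored offset. */
static uint64_t tb_at(int64_t now_ns, int64_t offset, uint64_t tb_freq)
{
    assert(now_ns >= 0);
    return ns_to_tb((uint64_t)now_ns, tb_freq) + (uint64_t)offset;
}
```

With this shape the destination never needs an explicit "migration duration": evaluating tb_at() one second after the sample simply yields a value tb_freq ticks larger.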

I'm not convinced this is a great approach, but it should basically
work.  However, as you've seen there are also some Just Plain Bugs in
the logic for this.

> I have a slight query with this section of code though:
> 
> migration_duration_tb = muldiv64(migration_duration_ns, freq,
>  NANOSECONDS_PER_SECOND);
> 
> This is not technically correct on TCG x86 since the timebase is the x86
> TSC which is running somewhere in the GHz range, compared to freq which
> is hard-coded to 16MHz.

Um.. what?  AFAICT that line doesn't have any reference to the TSC
speed.  Just ns and the (guest) tb.  Also 16MHz is only for the
oldworld Macs - modern ppc cpus have the TB frequency architected as
512MHz.
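
For illustration, a stand-alone version of that conversion. The muldiv64() below is re-implemented for this sketch with a 32-bit split so the intermediate product cannot overflow; it is not copied from qemu, and duration_ns_to_tb() is a made-up wrapper name. The only inputs are a nanosecond count and the guest timebase frequency.

```c
#include <assert.h>
#include <stdint.h>

#define NANOSECONDS_PER_SECOND 1000000000u

/* Compute a * b / c without overflowing the intermediate product.
 * The result is truncated if it does not fit in 64 bits. */
static uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
{
    uint64_t rl, rh, hi, lo;

    assert(c != 0);
    rl = (a & 0xffffffffu) * b;        /* low half of a, times b */
    rh = (a >> 32) * b + (rl >> 32);   /* high half, plus carry from rl */
    hi = rh / c;
    lo = (((rh % c) << 32) | (rl & 0xffffffffu)) / c;
    return (hi << 32) | lo;
}

/* Hypothetical wrapper: nanoseconds to guest timebase ticks. */
static uint64_t duration_ns_to_tb(uint64_t ns, uint32_t tb_freq)
{
    return muldiv64(ns, tb_freq, NANOSECONDS_PER_SECOND);
}
```

One second comes out as 16,000,000 ticks at 16MHz and 512,000,000 ticks at 512MHz; no host tick rate appears anywhere in the calculation.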

> However this doesn't seem to matter because the
> timebase adjustment is limited to a maximum of 1s. Why should this be if
> the timebase is supposed to be free running as you mentioned in a
> previous email?

AFAICT, what it's doing here is assuming that if the migration
duration is >1s (or appears to be >1s) then it's because the host
clocks are out of sync and so just capping the elapsed tb time at 1s.

That's just wrong, IMO.  1s is a long downtime for a live migration,
but it's not impossible, and it will happen nearly always in the
scenario you've discussed of manually loading the migration stream
from a file.

But more to the point, trying to maintain correctness of the timebase
when the hosts are out of sync is basically futile.  There's no other
reference we can use, so all we can achieve is getting a different
wrong value from what we'd get by blindly trusting the host clock.

We do need to constrain the tb from going backwards, because that will
cause chaos on the guest, but otherwise we should just trust the host
clock and ditch that 1s clamp.  If the hosts are out of sync, then
guest time will jump, but that was always going to happen.
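
A minimal sketch of that policy, with a hypothetical helper name and assuming both timestamps are nanosecond wall-clock readings: a negative difference is treated as zero so the timebase cannot go backwards, and there is deliberately no upper limit.

```c
#include <assert.h>
#include <stdint.h>

/* Wall-clock time elapsed between saving on the source and loading on
 * the destination.  If the destination clock reads earlier than the
 * source clock, the hosts disagree; report zero rather than a negative
 * duration.  Large durations are passed through unclamped. */
static uint64_t migration_elapsed_ns(int64_t src_host_ns, int64_t dst_host_ns)
{
    assert(src_host_ns >= 0 && dst_host_ns >= 0);
    if (dst_host_ns < src_host_ns) {
        return 0;
    }
    return (uint64_t)(dst_host_ns - src_host_ns);
}
```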

> AFAICT the main problem on TCG x86 is that post-migration the timebase
> calculated by cpu_ppc_get_tb() is incorrect:
> 
> uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset)
> {
> /* TB time in tb periods */
> return muldiv64(vmclk, tb_env->tb_freq, get_ticks_per_sec()) +
> tb_offset;
> }

So the problem here is that get_ticks_per_sec() (which always returns
1,000,000,000) is not talking about the same ticks as
cpu_get_host_ticks().  That may not have been true when this code was
written.

> For a typical savevm/loadvm pair I see something like this:
> 
> savevm:
> 
> tb->guest_timebase = 26281306490558
> qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) = 7040725511
> 
> loadvm:
> 
> cpu_get_host_ticks() = 26289847005259
> tb_off_adj = -8540514701
> qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) = 7040725511
> cpu_ppc_get_tb() = -15785159386
> 
> But as cpu_ppc_get_tb() uses QEMU_CLOCK_VIRTUAL for vmclk we end up with
> a negative number for the timebase since the virtual clock is dwarfed by
> the number of TSC ticks calculated for tb_off_adj. This will work on a
> PPC host though since cpu_get_host_ticks() is also derived from the
> timebase.

Yeah, we shouldn't be using cpu_get_host_ticks() at all - or anything
else which depends on a host frequency.  We should only be using qemu

[Qemu-devel] [PULL 04/40] macio: add dma_active to VMStateDescription

2016-01-31 Thread David Gibson

From: Mark Cave-Ayland 

Make sure that we include the value of dma_active in the migration stream.

Signed-off-by: Mark Cave-Ayland 
Acked-by: John Snow 
Signed-off-by: David Gibson 
---
 hw/ide/macio.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index bfdc377..1725e5b 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -517,11 +517,12 @@ static const MemoryRegionOps pmac_ide_ops = {
 
 static const VMStateDescription vmstate_pmac = {
 .name = "ide",
-.version_id = 3,
+.version_id = 4,
 .minimum_version_id = 0,
 .fields = (VMStateField[]) {
 VMSTATE_IDE_BUS(bus, MACIOIDEState),
 VMSTATE_IDE_DRIVES(bus.ifs, MACIOIDEState),
+VMSTATE_BOOL(dma_active, MACIOIDEState),
 VMSTATE_END_OF_LIST()
 }
 };
-- 
2.5.0

[Qemu-devel] [PULL 02/40] target-ppc: use cpu_write_xer() helper in cpu_post_load

2016-01-31 Thread David Gibson

From: Mark Cave-Ayland 

Otherwise some internal xer variables fail to get set post-migration.

Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Alexey Kardashevskiy 
Signed-off-by: David Gibson 
---
 target-ppc/machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index 8e30b7a..8cabc77 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -169,7 +169,7 @@ static int cpu_post_load(void *opaque, int version_id)
 env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value;
 env->lr = env->spr[SPR_LR];
 env->ctr = env->spr[SPR_CTR];
-env->xer = env->spr[SPR_XER];
+cpu_write_xer(env, env->spr[SPR_XER]);
 #if defined(TARGET_PPC64)
 env->cfar = env->spr[SPR_CFAR];
 #endif
-- 
2.5.0

[Qemu-devel] [PULL 08/40] spapr: Remove rtas_st_buffer_direct()

2016-01-31 Thread David Gibson

rtas_st_buffer_direct() is a not particularly useful wrapper around
cpu_physical_memory_write().  All the callers are in
rtas_ibm_configure_connector, where it's better handled by a local helper.

Signed-off-by: David Gibson 
Reviewed-by: Alexey Kardashevskiy 
---
 hw/ppc/spapr_rtas.c| 17 ++---
 include/hw/ppc/spapr.h |  8 
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index ab11b32..19e903d 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -506,6 +506,13 @@ out:
 #define CC_VAL_DATA_OFFSET ((CC_IDX_PROP_DATA_OFFSET + 1) * 4)
 #define CC_WA_LEN 4096
 
+static void configure_connector_st(target_ulong addr, target_ulong offset,
+   const void *buf, size_t len)
+{
+cpu_physical_memory_write(ppc64_phys_to_real(addr + offset),
+  buf, MIN(len, CC_WA_LEN - offset));
+}
+
 static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
  sPAPRMachineState *spapr,
  uint32_t token, uint32_t nargs,
@@ -571,8 +578,7 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
 /* provide the name of the next OF node */
 wa_offset = CC_VAL_DATA_OFFSET;
 rtas_st(wa_addr, CC_IDX_NODE_NAME_OFFSET, wa_offset);
-rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
-  (uint8_t *)name, strlen(name) + 1);
+configure_connector_st(wa_addr, wa_offset, name, strlen(name) + 1);
 resp = SPAPR_DR_CC_RESPONSE_NEXT_CHILD;
 break;
 case FDT_END_NODE:
@@ -597,8 +603,7 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
 /* provide the name of the next OF property */
 wa_offset = CC_VAL_DATA_OFFSET;
 rtas_st(wa_addr, CC_IDX_PROP_NAME_OFFSET, wa_offset);
-rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
-  (uint8_t *)name, strlen(name) + 1);
+configure_connector_st(wa_addr, wa_offset, name, strlen(name) + 1);
 
 /* provide the length and value of the OF property. data gets
  * placed immediately after NULL terminator of the OF property's
@@ -607,9 +612,7 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
 wa_offset += strlen(name) + 1,
 rtas_st(wa_addr, CC_IDX_PROP_LEN, prop_len);
 rtas_st(wa_addr, CC_IDX_PROP_DATA_OFFSET, wa_offset);
-rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
-  (uint8_t *)((struct fdt_property *)prop)->data,
-  prop_len);
+configure_connector_st(wa_addr, wa_offset, prop->data, prop_len);
 resp = SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY;
 break;
 case FDT_END:
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 1e10fc9..1f9e722 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -506,14 +506,6 @@ static inline void rtas_st(target_ulong phys, int n, uint32_t val)
 stl_be_phys(&address_space_memory, ppc64_phys_to_real(phys + 4*n), val);
 }
 
-static inline void rtas_st_buffer_direct(target_ulong phys,
- target_ulong phys_len,
- uint8_t *buffer, uint16_t buffer_len)
-{
-cpu_physical_memory_write(ppc64_phys_to_real(phys), buffer,
-  MIN(buffer_len, phys_len));
-}
-
 typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPRMachineState *sm,
   uint32_t token,
   uint32_t nargs, target_ulong args,
-- 
2.5.0
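
The bounded write that the new helper performs, MIN(len, CC_WA_LEN - offset), can be sketched against an ordinary byte buffer. This is an illustration only: wa_store() and WA_LEN are made-up names, memcpy() stands in for cpu_physical_memory_write(), and the assert on offset is an addition of this sketch rather than something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WA_LEN 4096   /* same size as CC_WA_LEN in the patch */

/* Copy at most as many bytes as still fit between offset and the end
 * of the work area.  Returns the number of bytes actually written. */
static size_t wa_store(uint8_t *wa, size_t offset, const void *buf,
                       size_t len)
{
    size_t room, n;

    /* With unsigned arithmetic, WA_LEN - offset would wrap around if
     * offset were past the end, so reject that case up front. */
    assert(offset <= WA_LEN);
    room = WA_LEN - offset;
    n = len < room ? len : room;
    memcpy(wa + offset, buf, n);
    return n;
}
```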

[Qemu-devel] [PULL 01/40] target-ppc: Use sensible POWER8/POWER8E versions

2016-01-31 Thread David Gibson

From: Benjamin Herrenschmidt 

We never released anything older than POWER8 DD2.0 and POWER8E DD2.1,
so let's use these versions.  Otherwise some firmware or Linux code
might fail to use some HW features that were non-functional in earlier
internal-only spins of the chip.

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: David Gibson 
---
 target-ppc/cpu-models.c | 12 ++--
 target-ppc/cpu-models.h |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
index 884e31d..ed005d7 100644
--- a/target-ppc/cpu-models.c
+++ b/target-ppc/cpu-models.c
@@ -1139,10 +1139,10 @@
 "POWER7 v2.3")
 POWERPC_DEF("POWER7+_v2.1",  CPU_POWERPC_POWER7P_v21,POWER7,
 "POWER7+ v2.1")
-POWERPC_DEF("POWER8E_v1.0",  CPU_POWERPC_POWER8E_v10,POWER8,
-"POWER8E v1.0")
-POWERPC_DEF("POWER8_v1.0",   CPU_POWERPC_POWER8_v10, POWER8,
-"POWER8 v1.0")
+POWERPC_DEF("POWER8E_v2.1",  CPU_POWERPC_POWER8E_v21,POWER8,
+"POWER8E v2.1")
+POWERPC_DEF("POWER8_v2.0",   CPU_POWERPC_POWER8_v20, POWER8,
+"POWER8 v2.0")
 POWERPC_DEF("970_v2.2",  CPU_POWERPC_970_v22,970,
 "PowerPC 970 v2.2")
 POWERPC_DEF("970fx_v1.0",CPU_POWERPC_970FX_v10,  970,
@@ -1390,8 +1390,8 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
 { "POWER5gs", "POWER5+_v2.1" },
 { "POWER7", "POWER7_v2.3" },
 { "POWER7+", "POWER7+_v2.1" },
-{ "POWER8E", "POWER8E_v1.0" },
-{ "POWER8", "POWER8_v1.0" },
+{ "POWER8E", "POWER8E_v2.1" },
+{ "POWER8", "POWER8_v2.0" },
 { "970", "970_v2.2" },
 { "970fx", "970fx_v3.1" },
 { "970mp", "970mp_v1.1" },
diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
index 9d80e72..2992427 100644
--- a/target-ppc/cpu-models.h
+++ b/target-ppc/cpu-models.h
@@ -557,9 +557,9 @@ enum {
 CPU_POWERPC_POWER7P_BASE   = 0x004A,
 CPU_POWERPC_POWER7P_v21= 0x004A0201,
 CPU_POWERPC_POWER8E_BASE   = 0x004B,
-CPU_POWERPC_POWER8E_v10= 0x004B0100,
+CPU_POWERPC_POWER8E_v21= 0x004B0201,
 CPU_POWERPC_POWER8_BASE= 0x004D,
-CPU_POWERPC_POWER8_v10 = 0x004D0100,
+CPU_POWERPC_POWER8_v20 = 0x004D0200,
 CPU_POWERPC_970_v22= 0x00390202,
 CPU_POWERPC_970FX_v10  = 0x00391100,
 CPU_POWERPC_970FX_v20  = 0x003C0200,
-- 
2.5.0
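
As a reading aid for the PVR constants above, here is a small decoder. The field layout is an assumption inferred from the values in the diff: the upper 16 bits match the *_BASE family constants, and for the POWER7+/POWER8 entries the low bytes read as major and minor revision. Other entries in the table (for example 970FX v1.0 as 0x00391100) do not follow the same revision encoding, and the names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct pvr_fields {
    uint16_t type;    /* processor family, upper 16 bits */
    uint8_t major;    /* major revision (assumed: bits 8..11) */
    uint8_t minor;    /* minor revision (assumed: bits 0..7) */
};

/* Split a PVR value using the layout assumed above. */
static struct pvr_fields pvr_decode(uint32_t pvr)
{
    struct pvr_fields f;

    f.type = (uint16_t)(pvr >> 16);
    f.major = (uint8_t)((pvr >> 8) & 0xf);
    f.minor = (uint8_t)(pvr & 0xff);
    return f;
}

/* Does pvr belong to the family identified by a 16-bit base value
 * such as CPU_POWERPC_POWER8_BASE (0x004D)? */
static int pvr_is_family(uint32_t pvr, uint32_t base)
{
    assert((base >> 16) == 0);
    return (pvr >> 16) == base;
}
```

Under that reading, 0x004B0201 decodes as family 0x004B revision 2.1 and 0x004D0200 as family 0x004D revision 2.0, matching the new model names.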

[Qemu-devel] [PULL 30/40] target-ppc: Convert mmu-hash{32,64}.[ch] from CPUPPCState to PowerPCCPU

2016-01-31 Thread David Gibson

Like a lot of places, these files include a mixture of functions taking
both the older CPUPPCState *env and the newer PowerPCCPU *cpu.  Move a step
closer to cleaning this up by standardizing on PowerPCCPU, except for the
helper_* functions which are called with the CPUPPCState * from tcg.

Callers and some related functions are updated as well; the boundaries of
what's changed here are a bit arbitrary.

Signed-off-by: David Gibson 
Reviewed-by: Laurent Vivier 
Reviewed-by: Alexander Graf 
---
 hw/ppc/spapr_hcall.c| 31 ++-
 target-ppc/kvm.c|  2 +-
 target-ppc/mmu-hash32.c | 68 +++--
 target-ppc/mmu-hash32.h | 30 ++-
 target-ppc/mmu-hash64.c | 80 +
 target-ppc/mmu-hash64.h | 21 ++---
 target-ppc/mmu_helper.c | 13 
 7 files changed, 136 insertions(+), 109 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 093d426..a53bd2f 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -161,7 +161,7 @@ static target_ulong h_enter(PowerPCCPU *cpu, sPAPRMachineState *spapr,
 pte_index &= ~7ULL;
 token = ppc_hash64_start_access(cpu, pte_index);
 for (; index < 8; index++) {
-if ((ppc_hash64_load_hpte0(env, token, index) & HPTE64_V_VALID) == 0) {
+if (!(ppc_hash64_load_hpte0(cpu, token, index) & HPTE64_V_VALID)) {
 break;
 }
 }
@@ -171,14 +171,14 @@ static target_ulong h_enter(PowerPCCPU *cpu, sPAPRMachineState *spapr,
 }
 } else {
 token = ppc_hash64_start_access(cpu, pte_index);
-if (ppc_hash64_load_hpte0(env, token, 0) & HPTE64_V_VALID) {
+if (ppc_hash64_load_hpte0(cpu, token, 0) & HPTE64_V_VALID) {
 ppc_hash64_stop_access(token);
 return H_PTEG_FULL;
 }
 ppc_hash64_stop_access(token);
 }
 
-ppc_hash64_store_hpte(env, pte_index + index,
+ppc_hash64_store_hpte(cpu, pte_index + index,
   pteh | HPTE64_V_HPTE_DIRTY, ptel);
 
 args[0] = pte_index + index;
@@ -192,11 +192,12 @@ typedef enum {
 REMOVE_HW = 3,
 } RemoveResult;
 
-static RemoveResult remove_hpte(CPUPPCState *env, target_ulong ptex,
+static RemoveResult remove_hpte(PowerPCCPU *cpu, target_ulong ptex,
 target_ulong avpn,
 target_ulong flags,
 target_ulong *vp, target_ulong *rp)
 {
+CPUPPCState *env = &cpu->env;
 uint64_t token;
 target_ulong v, r, rb;
 
@@ -204,9 +205,9 @@ static RemoveResult remove_hpte(CPUPPCState *env, target_ulong ptex,
 return REMOVE_PARM;
 }
 
-token = ppc_hash64_start_access(ppc_env_get_cpu(env), ptex);
-v = ppc_hash64_load_hpte0(env, token, 0);
-r = ppc_hash64_load_hpte1(env, token, 0);
+token = ppc_hash64_start_access(cpu, ptex);
+v = ppc_hash64_load_hpte0(cpu, token, 0);
+r = ppc_hash64_load_hpte1(cpu, token, 0);
 ppc_hash64_stop_access(token);
 
 if ((v & HPTE64_V_VALID) == 0 ||
@@ -216,7 +217,7 @@ static RemoveResult remove_hpte(CPUPPCState *env, target_ulong ptex,
 }
 *vp = v;
 *rp = r;
-ppc_hash64_store_hpte(env, ptex, HPTE64_V_HPTE_DIRTY, 0);
+ppc_hash64_store_hpte(cpu, ptex, HPTE64_V_HPTE_DIRTY, 0);
 rb = compute_tlbie_rb(v, r, ptex);
 ppc_tlb_invalidate_one(env, rb);
 return REMOVE_SUCCESS;
@@ -225,13 +226,12 @@ static RemoveResult remove_hpte(CPUPPCState *env, target_ulong ptex,
 static target_ulong h_remove(PowerPCCPU *cpu, sPAPRMachineState *spapr,
  target_ulong opcode, target_ulong *args)
 {
-CPUPPCState *env = &cpu->env;
 target_ulong flags = args[0];
 target_ulong pte_index = args[1];
 target_ulong avpn = args[2];
 RemoveResult ret;
 
-ret = remove_hpte(env, pte_index, avpn, flags,
+ret = remove_hpte(cpu, pte_index, avpn, flags,
   &args[0], &args[1]);
 
 switch (ret) {
@@ -272,7 +272,6 @@ static target_ulong h_remove(PowerPCCPU *cpu, sPAPRMachineState *spapr,
 static target_ulong h_bulk_remove(PowerPCCPU *cpu, sPAPRMachineState *spapr,
   target_ulong opcode, target_ulong *args)
 {
-CPUPPCState *env = &cpu->env;
 int i;
 
 for (i = 0; i < H_BULK_REMOVE_MAX_BATCH; i++) {
@@ -294,7 +293,7 @@ static target_ulong h_bulk_remove(PowerPCCPU *cpu, sPAPRMachineState *spapr,
 return H_PARAMETER;
 }
 
-ret = remove_hpte(env, *tsh & H_BULK_REMOVE_PTEX, tsl,
+ret = remove_hpte(cpu, *tsh & H_BULK_REMOVE_PTEX, tsl,
   (*tsh & H_BULK_REMOVE_FLAGS) >> 26,
   , );
 
@@ -331,8 +330,8 @@ static target_ulong h_protect(PowerPCCPU *cpu, sPAPRMachineState *spapr,
 }
 
 token = ppc_hash64_start_access(cpu, pte_index);
-v =
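
The conversion relies on the env being embedded in the cpu object, so that either pointer can be derived from the other (the diff uses ppc_env_get_cpu(env) in one direction and the address of the env member in the other). A toy sketch of that relationship, with invented struct and function names:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for CPUPPCState and PowerPCCPU. */
struct toy_env {
    int dummy;
};

struct toy_cpu {
    int index;
    struct toy_env env;   /* env lives inside the cpu object */
};

/* cpu -> env: just take the address of the embedded member. */
static struct toy_env *toy_cpu_env(struct toy_cpu *cpu)
{
    assert(cpu != NULL);
    return &cpu->env;
}

/* env -> cpu: step back by the member's offset (container_of style). */
static struct toy_cpu *toy_env_get_cpu(struct toy_env *env)
{
    assert(env != NULL);
    return (struct toy_cpu *)((char *)env - offsetof(struct toy_cpu, env));
}
```

Taking the cpu pointer as the parameter means a function needs at most the cheap member access, and only the TCG-facing helper_* entry points have to do the reverse step.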

[Qemu-devel] [PULL 13/40] pseries: Clean up error handling in spapr_validate_node_memory()

2016-01-31 Thread David Gibson

Use error_setg() and return an error, rather than using an explicit exit().

Also improve messages, and be more explicit about which constraint failed.

Signed-off-by: David Gibson 
Reviewed-by: Bharata B Rao 
Reviewed-by: Thomas Huth 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Markus Armbruster 
---
 hw/ppc/spapr.c | 37 ++---
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 61653ae..4e6ee6d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1699,27 +1699,34 @@ static void spapr_create_lmb_dr_connectors(sPAPRMachineState *spapr)
  * to SPAPR_MEMORY_BLOCK_SIZE(256MB), then refuse to start the guest
  * since we can't support such unaligned sizes with DRCONF_MEMORY.
  */
-static void spapr_validate_node_memory(MachineState *machine)
+static void spapr_validate_node_memory(MachineState *machine, Error **errp)
 {
 int i;
 
-if (machine->maxram_size % SPAPR_MEMORY_BLOCK_SIZE ||
-machine->ram_size % SPAPR_MEMORY_BLOCK_SIZE) {
-error_report("Can't support memory configuration where RAM size "
- "0x" RAM_ADDR_FMT " or maxmem size "
- "0x" RAM_ADDR_FMT " isn't aligned to %llu MB",
- machine->ram_size, machine->maxram_size,
- SPAPR_MEMORY_BLOCK_SIZE/M_BYTE);
-exit(EXIT_FAILURE);
+if (machine->ram_size % SPAPR_MEMORY_BLOCK_SIZE) {
+error_setg(errp, "Memory size 0x" RAM_ADDR_FMT
+   " is not aligned to %llu MiB",
+   machine->ram_size,
+   SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
+return;
+}
+
+if (machine->maxram_size % SPAPR_MEMORY_BLOCK_SIZE) {
+error_setg(errp, "Maximum memory size 0x" RAM_ADDR_FMT
+   " is not aligned to %llu MiB",
+   machine->maxram_size,
+   SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
+return;
 }
 
 for (i = 0; i < nb_numa_nodes; i++) {
 if (numa_info[i].node_mem % SPAPR_MEMORY_BLOCK_SIZE) {
-error_report("Can't support memory configuration where memory size"
- " %" PRIx64 " of node %d isn't aligned to %llu MB",
- numa_info[i].node_mem, i,
- SPAPR_MEMORY_BLOCK_SIZE/M_BYTE);
-exit(EXIT_FAILURE);
+error_setg(errp,
+   "Node %d memory size 0x%" PRIx64
+   " is not aligned to %llu MiB",
+   i, numa_info[i].node_mem,
+   SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
+return;
 }
 }
 }
@@ -1809,7 +1816,7 @@ static void ppc_spapr_init(MachineState *machine)
   XICS_IRQS);
 
 if (smc->dr_lmb_enabled) {
-spapr_validate_node_memory(machine);
+spapr_validate_node_memory(machine, &error_fatal);
 }
 
 /* init CPUs */
-- 
2.5.0
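
The shape of this change, reporting through an error out-parameter and returning instead of exiting, can be shown with a self-contained mock. ToyError and the toy_* functions are stand-ins invented for this sketch, not qemu's Error API; the 256 MiB block size is taken from the comment in the hunk.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCK_SIZE (256ULL << 20)   /* 256 MiB */

/* Minimal stand-in for an error object. */
typedef struct ToyError {
    char msg[128];
} ToyError;

/* Store a formatted message in *errp, if the caller supplied one. */
static void toy_error_set(ToyError **errp, const char *what, uint64_t size)
{
    if (!errp) {
        return;                 /* caller is not interested */
    }
    assert(*errp == NULL);      /* never overwrite an earlier error */
    *errp = malloc(sizeof(**errp));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg),
             "%s 0x%llx is not aligned to %llu MiB", what,
             (unsigned long long)size,
             (unsigned long long)(BLOCK_SIZE >> 20));
}

/* Check each constraint separately so the message names the one that
 * failed; the caller decides whether the error is fatal. */
static void toy_validate_memory(uint64_t ram_size, uint64_t maxram_size,
                                ToyError **errp)
{
    if (ram_size % BLOCK_SIZE) {
        toy_error_set(errp, "Memory size", ram_size);
        return;
    }
    if (maxram_size % BLOCK_SIZE) {
        toy_error_set(errp, "Maximum memory size", maxram_size);
        return;
    }
}
```

A caller that wants the old behaviour can print the message and exit; one that wants to recover can inspect the error and continue.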

[Qemu-devel] [PULL 39/40] target-ppc: Make every FPSCR_ macro have a corresponding FP_ macro

2016-01-31 Thread David Gibson

From: James Clarke 

Signed-off-by: James Clarke 
Signed-off-by: David Gibson 
---
 target-ppc/cpu.h | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 0820390..f300c86 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -687,24 +687,37 @@ enum {
 
 #define FP_FX  (1ull << FPSCR_FX)
 #define FP_FEX (1ull << FPSCR_FEX)
+#define FP_VX  (1ull << FPSCR_VX)
 #define FP_OX  (1ull << FPSCR_OX)
-#define FP_OE  (1ull << FPSCR_OE)
 #define FP_UX  (1ull << FPSCR_UX)
-#define FP_UE  (1ull << FPSCR_UE)
-#define FP_XX  (1ull << FPSCR_XX)
-#define FP_XE  (1ull << FPSCR_XE)
 #define FP_ZX  (1ull << FPSCR_ZX)
-#define FP_ZE  (1ull << FPSCR_ZE)
-#define FP_VX  (1ull << FPSCR_VX)
+#define FP_XX  (1ull << FPSCR_XX)
 #define FP_VXSNAN  (1ull << FPSCR_VXSNAN)
 #define FP_VXISI   (1ull << FPSCR_VXISI)
-#define FP_VXIMZ   (1ull << FPSCR_VXIMZ)
-#define FP_VXZDZ   (1ull << FPSCR_VXZDZ)
 #define FP_VXIDI   (1ull << FPSCR_VXIDI)
+#define FP_VXZDZ   (1ull << FPSCR_VXZDZ)
+#define FP_VXIMZ   (1ull << FPSCR_VXIMZ)
 #define FP_VXVC (1ull << FPSCR_VXVC)
+#define FP_FR  (1ull << FPSCR_FR)
+#define FP_FI  (1ull << FPSCR_FI)
+#define FP_C   (1ull << FPSCR_C)
+#define FP_FL  (1ull << FPSCR_FL)
+#define FP_FG  (1ull << FPSCR_FG)
+#define FP_FE  (1ull << FPSCR_FE)
+#define FP_FU  (1ull << FPSCR_FU)
+#define FP_FPCC (FP_FL | FP_FG | FP_FE | FP_FU)
+#define FP_FPRF (FP_C  | FP_FL | FP_FG | FP_FE | FP_FU)
+#define FP_VXSOFT  (1ull << FPSCR_VXSOFT)
+#define FP_VXSQRT  (1ull << FPSCR_VXSQRT)
 #define FP_VXCVI   (1ull << FPSCR_VXCVI)
 #define FP_VE  (1ull << FPSCR_VE)
-#define FP_FI  (1ull << FPSCR_FI)
+#define FP_OE  (1ull << FPSCR_OE)
+#define FP_UE  (1ull << FPSCR_UE)
+#define FP_ZE  (1ull << FPSCR_ZE)
+#define FP_XE  (1ull << FPSCR_XE)
+#define FP_NI  (1ull << FPSCR_NI)
+#define FP_RN1 (1ull << FPSCR_RN1)
+#define FP_RN  (1ull << FPSCR_RN)
 
 /*/
 /* Vector status and control register */
-- 
2.5.0
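
A short sketch of how the bit-number and mask macros relate, using the FPRF field as the example. The FPSCR_* bit numbers below (LSB = 0) are assumed to match target-ppc/cpu.h, and the two accessor functions are hypothetical additions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit numbers of the FPRF field members (assumed values, LSB = 0). */
#define FPSCR_C  16
#define FPSCR_FL 15
#define FPSCR_FG 14
#define FPSCR_FE 13
#define FPSCR_FU 12

/* One mask per bit number, plus composite masks for the fields. */
#define FP_C    (1ull << FPSCR_C)
#define FP_FL   (1ull << FPSCR_FL)
#define FP_FG   (1ull << FPSCR_FG)
#define FP_FE   (1ull << FPSCR_FE)
#define FP_FU   (1ull << FPSCR_FU)
#define FP_FPCC (FP_FL | FP_FG | FP_FE | FP_FU)
#define FP_FPRF (FP_C | FP_FPCC)

/* Extract the 5-bit FPRF field from an FPSCR value. */
static unsigned fpscr_get_fprf(uint64_t fpscr)
{
    return (unsigned)((fpscr & FP_FPRF) >> FPSCR_FU);
}

/* Return fpscr with the FPRF field replaced by fprf. */
static uint64_t fpscr_set_fprf(uint64_t fpscr, unsigned fprf)
{
    assert(fprf < 32);
    return (fpscr & ~FP_FPRF) | ((uint64_t)fprf << FPSCR_FU);
}
```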

[Qemu-devel] [PULL 24/40] target-ppc: gdbstub: fix spe registers for little-endian guests

2016-01-31 Thread David Gibson

From: Greg Kurz 

Let's reuse the ppc_maybe_bswap_register() helper, like we already do
with the general registers.

Signed-off-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 target-ppc/translate_init.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 1174141..fce68f3 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8848,6 +8848,7 @@ static int gdb_get_spe_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 if (n < 32) {
 #if defined(TARGET_PPC64)
 stl_p(mem_buf, env->gpr[n] >> 32);
+ppc_maybe_bswap_register(env, mem_buf, 4);
 #else
 stl_p(mem_buf, env->gprh[n]);
 #endif
@@ -8855,10 +8856,12 @@ static int gdb_get_spe_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 }
 if (n == 32) {
 stq_p(mem_buf, env->spe_acc);
+ppc_maybe_bswap_register(env, mem_buf, 8);
 return 8;
 }
 if (n == 33) {
 stl_p(mem_buf, env->spe_fscr);
+ppc_maybe_bswap_register(env, mem_buf, 4);
 return 4;
 }
 return 0;
@@ -8869,7 +8872,11 @@ static int gdb_set_spe_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 if (n < 32) {
 #if defined(TARGET_PPC64)
 target_ulong lo = (uint32_t)env->gpr[n];
-target_ulong hi = (target_ulong)ldl_p(mem_buf) << 32;
+target_ulong hi;
+
+ppc_maybe_bswap_register(env, mem_buf, 4);
+
+hi = (target_ulong)ldl_p(mem_buf) << 32;
 env->gpr[n] = lo | hi;
 #else
 env->gprh[n] = ldl_p(mem_buf);
@@ -8877,10 +8884,12 @@ static int gdb_set_spe_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 return 4;
 }
 if (n == 32) {
+ppc_maybe_bswap_register(env, mem_buf, 8);
 env->spe_acc = ldq_p(mem_buf);
 return 8;
 }
 if (n == 33) {
+ppc_maybe_bswap_register(env, mem_buf, 4);
 env->spe_fscr = ldl_p(mem_buf);
 return 4;
 }
-- 
2.5.0
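
The body of ppc_maybe_bswap_register() is not part of this patch. The sketch below shows the behaviour its name and call sites suggest, under the assumption that it reverses the buffer in place only when the guest is running little-endian; toy_maybe_bswap() and its guest_le parameter are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse len bytes in place when the guest is little-endian;
 * leave the buffer untouched otherwise. */
static void toy_maybe_bswap(bool guest_le, uint8_t *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);
    assert(len == 4 || len == 8);
    if (!guest_le) {
        return;
    }
    for (i = 0; i < len / 2; i++) {
        uint8_t tmp = buf[i];
        buf[i] = buf[len - 1 - i];
        buf[len - 1 - i] = tmp;
    }
}
```

On the get path the swap follows the store into mem_buf; on the set path it precedes the load, so in both directions the register value itself stays in the same internal form.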

Re: [Qemu-devel] [PATCH v2 0/3] qemu-nbd.texi formatting, grammar and completeness fixes

2016-01-31 Thread Paolo Bonzini



On 31/01/2016 14:25, Sitsofe Wheeler wrote:
>> > Thanks, queued.  Will send a pull request some time next week.
> Just checking - did this one get lost? Nothing's popped up in the QEMU
> git repos yet...

Hmm, yes.  Thanks for telling me.

Paolo

[Qemu-devel] [PULL 09/40] spapr: Remove abuse of rtas_ld() in h_client_architecture_support

2016-01-31 Thread David Gibson

h_client_architecture_support() uses rtas_ld() for general purpose memory
access, despite the fact that it's not an RTAS routine at all and rtas_ld
makes things more awkward.

Clean this up by replacing rtas_ld() calls with appropriate ldXX_phys()
calls.

Signed-off-by: David Gibson 
Reviewed-by: Alexey Kardashevskiy 
---
 hw/ppc/spapr_hcall.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 51083cd..fdd7fea 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -862,7 +862,8 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
   target_ulong opcode,
   target_ulong *args)
 {
-target_ulong list = args[0], ov_table;
+target_ulong list = ppc64_phys_to_real(args[0]);
+target_ulong ov_table, ov5;
 PowerPCCPUClass *pcc_ = POWERPC_CPU_GET_CLASS(cpu_);
 CPUState *cs;
 bool cpu_match = false, cpu_update = true, memory_update = false;
@@ -876,9 +877,9 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
 for (counter = 0; counter < 512; ++counter) {
 uint32_t pvr, pvr_mask;
 
-pvr_mask = rtas_ld(list, 0);
+pvr_mask = ldl_be_phys(&address_space_memory, list);
 list += 4;
-pvr = rtas_ld(list, 0);
+pvr = ldl_be_phys(&address_space_memory, list);
 list += 4;
 
 trace_spapr_cas_pvr_try(pvr);
@@ -949,14 +950,13 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
 /* For the future use: here @ov_table points to the first option vector */
 ov_table = list;
 
-list = cas_get_option_vector(5, ov_table);
-if (!list) {
+ov5 = cas_get_option_vector(5, ov_table);
+if (!ov5) {
 return H_SUCCESS;
 }
 
 /* @list now points to OV 5 */
-list += 2;
-ov5_byte2 = rtas_ld(list, 0) >> 24;
+ov5_byte2 = ldub_phys(&address_space_memory, ov5 + 2);
 if (ov5_byte2 & OV5_DRCONF_MEMORY) {
 memory_update = true;
 }
-- 
2.5.0
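
The last hunk replaces "load a big-endian 32-bit word and shift right by 24" with a single byte load. The two are equivalent for the byte of interest, as this sketch over a plain byte array shows; the helper names are made up, and the array stands in for guest memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load a big-endian 32-bit word from a byte buffer. */
static uint32_t load_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Old style: fetch a whole word at addr and keep its top byte. */
static uint8_t byte_via_word(const uint8_t *mem, size_t addr)
{
    return (uint8_t)(load_be32(mem + addr) >> 24);
}

/* New style: fetch only the byte that is wanted. */
static uint8_t byte_direct(const uint8_t *mem, size_t addr)
{
    assert(mem != NULL);
    return mem[addr];
}
```

The word-based version also touches the three bytes after the one it needs, which is one more reason the byte load reads more naturally here.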

[Qemu-devel] [PULL 17/40] pseries: Clean up error reporting in ppc_spapr_init()

2016-01-31 Thread David Gibson

This function includes a number of explicit fprintf()s for errors.
Change these to use error_report() instead.

Also replace the single exit(EXIT_FAILURE) with an explicit exit(1), since
the latter is the more usual idiom in qemu by a large margin.

Signed-off-by: David Gibson 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Markus Armbruster 
---
 hw/ppc/spapr.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 1281e07..c05ddfb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1789,8 +1789,8 @@ static void ppc_spapr_init(MachineState *machine)
 }
 
 if (spapr->rma_size > node0_size) {
-fprintf(stderr, "Error: Numa node 0 has to span the RMA (%#08"HWADDR_PRIx")\n",
-spapr->rma_size);
+error_report("Numa node 0 has to span the RMA (%#08"HWADDR_PRIx")",
+ spapr->rma_size);
 exit(1);
 }
 
@@ -1856,10 +1856,10 @@ static void ppc_spapr_init(MachineState *machine)
 ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
 
 if (machine->ram_slots > SPAPR_MAX_RAM_SLOTS) {
-error_report("Specified number of memory slots %" PRIu64
- " exceeds max supported %d",
+error_report("Specified number of memory slots %"
+ PRIu64" exceeds max supported %d",
  machine->ram_slots, SPAPR_MAX_RAM_SLOTS);
-exit(EXIT_FAILURE);
+exit(1);
 }
 
 spapr->hotplug_memory.base = ROUND_UP(machine->ram_size,
@@ -1955,8 +1955,9 @@ static void ppc_spapr_init(MachineState *machine)
 }
 
 if (spapr->rma_size < (MIN_RMA_SLOF << 20)) {
-fprintf(stderr, "qemu: pSeries SLOF firmware requires >= "
-"%ldM guest RMA (Real Mode Area memory)\n", MIN_RMA_SLOF);
+error_report(
+"pSeries SLOF firmware requires >= %ldM guest RMA (Real Mode Area memory)",
+MIN_RMA_SLOF);
 exit(1);
 }
 
@@ -1972,8 +1973,8 @@ static void ppc_spapr_init(MachineState *machine)
 kernel_le = kernel_size > 0;
 }
 if (kernel_size < 0) {
-fprintf(stderr, "qemu: error loading %s: %s\n",
-kernel_filename, load_elf_strerror(kernel_size));
+error_report("error loading %s: %s",
+ kernel_filename, load_elf_strerror(kernel_size));
 exit(1);
 }
 
@@ -1986,8 +1987,8 @@ static void ppc_spapr_init(MachineState *machine)
 initrd_size = load_image_targphys(initrd_filename, initrd_base,
   load_limit - initrd_base);
 if (initrd_size < 0) {
-fprintf(stderr, "qemu: could not load initial ram disk '%s'\n",
-initrd_filename);
+error_report("could not load initial ram disk '%s'",
+ initrd_filename);
 exit(1);
 }
 } else {
-- 
2.5.0

[Qemu-devel] [PULL 10/40] spapr: Don't create ibm,dynamic-reconfiguration-memory w/o DR LMBs

2016-01-31 Thread David Gibson

From: Bharata B Rao 

If the guest doesn't have any dynamically reconfigurable (DR) logical memory
blocks (LMBs), then we shouldn't create the ibm,dynamic-reconfiguration-memory
device tree node.

Signed-off-by: Bharata B Rao 
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 08da895..0ac6368 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -764,6 +764,13 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
 int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
 
 /*
+ * Don't create the node if there are no DR LMBs.
+ */
+if (!nr_lmbs) {
+return 0;
+}
+
+/*
  * Allocate enough buffer size to fit in ibm,dynamic-memory
  * or ibm,associativity-lookup-arrays
  */
@@ -869,7 +876,7 @@ int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
 _FDT((spapr_fixup_cpu_dt(fdt, spapr)));
 }
 
-/* Generate memory nodes or ibm,dynamic-reconfiguration-memory node */
+/* Generate ibm,dynamic-reconfiguration-memory node if required */
 if (memory_update && smc->dr_lmb_enabled) {
 _FDT((spapr_populate_drconf_memory(spapr, fdt)));
 }
-- 
2.5.0

[Qemu-devel] [PULL 23/40] target-ppc: gdbstub: fix altivec registers for little-endian guests

2016-01-31 Thread David Gibson

From: Greg Kurz 

Altivec registers are 128 bits wide. They are stored in memory as two
64-bit values that must be byteswapped when the guest is little-endian.
Let's reuse the ppc_maybe_bswap_register() helper for this.

We also need to fix the ordering of the 64-bit elements according to
the target endianness, for both system and user mode.

Signed-off-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 target-ppc/translate_init.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 0d6d115..1174141 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8753,9 +8753,9 @@ static void dump_ppc_insns (CPUPPCState *env)
 static bool avr_need_swap(CPUPPCState *env)
 {
 #ifdef HOST_WORDS_BIGENDIAN
-return false;
+return msr_le;
 #else
-return true;
+return !msr_le;
 #endif
 }
 
@@ -8799,14 +8799,18 @@ static int gdb_get_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 stq_p(mem_buf, env->avr[n].u64[1]);
 stq_p(mem_buf+8, env->avr[n].u64[0]);
 }
+ppc_maybe_bswap_register(env, mem_buf, 8);
+ppc_maybe_bswap_register(env, mem_buf + 8, 8);
 return 16;
 }
 if (n == 32) {
 stl_p(mem_buf, env->vscr);
+ppc_maybe_bswap_register(env, mem_buf, 4);
 return 4;
 }
 if (n == 33) {
 stl_p(mem_buf, (uint32_t)env->spr[SPR_VRSAVE]);
+ppc_maybe_bswap_register(env, mem_buf, 4);
 return 4;
 }
 return 0;
@@ -8815,6 +8819,8 @@ static int gdb_get_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 static int gdb_set_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
 if (n < 32) {
+ppc_maybe_bswap_register(env, mem_buf, 8);
+ppc_maybe_bswap_register(env, mem_buf + 8, 8);
 if (!avr_need_swap(env)) {
 env->avr[n].u64[0] = ldq_p(mem_buf);
 env->avr[n].u64[1] = ldq_p(mem_buf+8);
@@ -8825,10 +8831,12 @@ static int gdb_set_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 return 16;
 }
 if (n == 32) {
+ppc_maybe_bswap_register(env, mem_buf, 4);
 env->vscr = ldl_p(mem_buf);
 return 4;
 }
 if (n == 33) {
+ppc_maybe_bswap_register(env, mem_buf, 4);
 env->spr[SPR_VRSAVE] = (target_ulong)ldl_p(mem_buf);
 return 4;
 }
-- 
2.5.0
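
The new avr_need_swap() logic amounts to "swap the two 64-bit halves when host and guest byte order differ in the way the diff describes". The sketch below restates it with the host endianness and the guest's MSR[LE] state passed in as plain booleans so it can be exercised on any machine; the toy_* names and parameters are invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same truth table as the patched avr_need_swap(): on a big-endian
 * host swap for little-endian guests, on a little-endian host swap
 * for big-endian guests. */
static bool toy_avr_need_swap(bool host_big_endian, bool guest_le)
{
    return host_big_endian ? guest_le : !guest_le;
}

/* Order the two 64-bit halves of a vector register for the gdb buffer. */
static void toy_avr_to_gdb(const uint64_t u64[2], bool need_swap,
                           uint64_t out[2])
{
    assert(u64 != NULL && out != NULL);
    out[0] = u64[need_swap ? 1 : 0];
    out[1] = u64[need_swap ? 0 : 1];
}
```

The byte order inside each 64-bit half is a separate concern, handled in the patch by the two ppc_maybe_bswap_register() calls on mem_buf and mem_buf + 8.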

[Qemu-devel] [PULL 29/40] target-ppc: Remove unused kvmppc_read_segment_page_sizes() stub

2016-01-31 Thread David Gibson

This stub function is in the !KVM ifdef in target-ppc/kvm_ppc.h.  However
no such function exists on the KVM side, or is ever used.

I think this originally referenced a function which read host page size
information from /proc, for which we now use the KVM GET_SMMU_INFO extension
instead.

In any case, it has no function now, so remove it.

Signed-off-by: David Gibson 
Reviewed-by: Thomas Huth 
Reviewed-by: Laurent Vivier 
Reviewed-by: Alexander Graf 
---
 target-ppc/kvm_ppc.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 5e1333d..62406ce 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -98,11 +98,6 @@ static inline int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, int buf_l
 return -1;
 }
 
-static inline int kvmppc_read_segment_page_sizes(uint32_t *prop, int maxcells)
-{
-return -1;
-}
-
 static inline int kvmppc_set_interrupt(PowerPCCPU *cpu, int irq, int level)
 {
 return -1;
-- 
2.5.0

[Qemu-devel] [PULL 22/40] target-ppc: gdbstub: introduce avr_need_swap()

2016-01-31 Thread David Gibson

From: Greg Kurz 

This helper will be used to support Altivec registers in little-endian guests.
This patch does not change functionality.

Note: I had to put the helper some lines away from the gdb_*_avr_reg()
routines to get a more readable patch.
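
To illustrate what the helper decides, here is a small standalone sketch, not the QEMU code: vec128, vec128_to_buf() and vec128_from_buf() are made-up names, the swap argument plays the role of avr_need_swap(), and plain memcpy() stands in for stq_p()/ldq_p().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative 128-bit register made of two 64-bit halves. */
typedef struct {
    uint64_t u64[2];
} vec128;

/* Copy the register into a 16-byte buffer, exchanging the halves if asked. */
static void vec128_to_buf(uint8_t *buf, const vec128 *v, bool swap)
{
    memcpy(buf, &v->u64[swap ? 1 : 0], 8);
    memcpy(buf + 8, &v->u64[swap ? 0 : 1], 8);
}

/* Inverse of vec128_to_buf(): the same `swap` value restores the register. */
static void vec128_from_buf(vec128 *v, const uint8_t *buf, bool swap)
{
    memcpy(&v->u64[swap ? 1 : 0], buf, 8);
    memcpy(&v->u64[swap ? 0 : 1], buf + 8, 8);
}
```

Keeping the decision in one predicate means the get and set routines cannot disagree about the order of the halves.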

Signed-off-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 target-ppc/translate_init.c | 37 +++--
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 031c71e..0d6d115 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8750,6 +8750,15 @@ static void dump_ppc_insns (CPUPPCState *env)
 }
 #endif
 
+static bool avr_need_swap(CPUPPCState *env)
+{
+#ifdef HOST_WORDS_BIGENDIAN
+return false;
+#else
+return true;
+#endif
+}
+
 static int gdb_get_float_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
 if (n < 32) {
@@ -8783,13 +8792,13 @@ static int gdb_set_float_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 static int gdb_get_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
 if (n < 32) {
-#ifdef HOST_WORDS_BIGENDIAN
-stq_p(mem_buf, env->avr[n].u64[0]);
-stq_p(mem_buf+8, env->avr[n].u64[1]);
-#else
-stq_p(mem_buf, env->avr[n].u64[1]);
-stq_p(mem_buf+8, env->avr[n].u64[0]);
-#endif
+if (!avr_need_swap(env)) {
+stq_p(mem_buf, env->avr[n].u64[0]);
+stq_p(mem_buf+8, env->avr[n].u64[1]);
+} else {
+stq_p(mem_buf, env->avr[n].u64[1]);
+stq_p(mem_buf+8, env->avr[n].u64[0]);
+}
 return 16;
 }
 if (n == 32) {
@@ -8806,13 +8815,13 @@ static int gdb_get_avr_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 static int gdb_set_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
 if (n < 32) {
-#ifdef HOST_WORDS_BIGENDIAN
-env->avr[n].u64[0] = ldq_p(mem_buf);
-env->avr[n].u64[1] = ldq_p(mem_buf+8);
-#else
-env->avr[n].u64[1] = ldq_p(mem_buf);
-env->avr[n].u64[0] = ldq_p(mem_buf+8);
-#endif
+if (!avr_need_swap(env)) {
+env->avr[n].u64[0] = ldq_p(mem_buf);
+env->avr[n].u64[1] = ldq_p(mem_buf+8);
+} else {
+env->avr[n].u64[1] = ldq_p(mem_buf);
+env->avr[n].u64[0] = ldq_p(mem_buf+8);
+}
 return 16;
 }
 if (n == 32) {
-- 
2.5.0

[Qemu-devel] [PULL 37/40] target-ppc: Helper to determine page size information from hpte alone

2016-01-31 Thread David Gibson

h_enter() in the spapr code needs to know the page size of the HPTE it's
about to insert.  Unlike other paths that do this, it doesn't have access
to the SLB, so at the moment it determines this with some open-coded
tests which assume POWER7 or POWER8 page size encodings.

To make this more flexible add ppc_hash64_hpte_page_shift_noslb() to
determine both the "base" page size per segment, and the individual
effective page size from an HPTE alone.

This means that the spapr code should now be able to handle any page size
listed in the env->sps table.
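
A simplified model of the table walk, for illustration only: the struct and function names below are invented, the table is a toy, and the encoding is assumed to be already extracted from the HPTE (the real helper decodes it from the pte0/pte1 bits via hpte_page_shift()).

```c
#include <stddef.h>

#define TOY_MAX_SIZES 4

struct toy_enc {
    unsigned page_shift;   /* actual page size, 0 terminates the list */
    unsigned pte_enc;      /* encoding found in the HPTE */
};

struct toy_seg_size {
    unsigned page_shift;   /* base page size of the segment, 0 terminates */
    struct toy_enc enc[TOY_MAX_SIZES];
};

/* Toy table: a 64K segment (64K or 16M pages) and a 16M segment. */
static const struct toy_seg_size toy_table[TOY_MAX_SIZES] = {
    { 16, { { 16, 0x1 }, { 24, 0x8 } } },
    { 24, { { 24, 0x0 } } },
};

/* Return the actual page shift, and the segment shift via *seg_shift. */
static unsigned toy_lookup_page_shift(const struct toy_seg_size *tbl,
                                      unsigned enc, unsigned *seg_shift)
{
    size_t i, j;

    for (i = 0; i < TOY_MAX_SIZES && tbl[i].page_shift; i++) {
        for (j = 0; j < TOY_MAX_SIZES && tbl[i].enc[j].page_shift; j++) {
            if (tbl[i].enc[j].pte_enc == enc) {
                *seg_shift = tbl[i].page_shift;
                return tbl[i].enc[j].page_shift;
            }
        }
    }
    *seg_shift = 0;
    return 0; /* bad page size encoding */
}
```

As with the real table, the first match wins, so the encodings have to be chosen such that no two entries are ambiguous.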

Signed-off-by: David Gibson 
Acked-by: Benjamin Herrenschmidt 
Reviewed-by: Alexander Graf 
---
 hw/ppc/spapr_hcall.c| 25 ++---
 target-ppc/mmu-hash64.c | 35 +++
 target-ppc/mmu-hash64.h |  3 +++
 3 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 0a8378c..12f8c33 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -73,31 +73,18 @@ static target_ulong h_enter(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 target_ulong pte_index = args[1];
 target_ulong pteh = args[2];
 target_ulong ptel = args[3];
-target_ulong page_shift = 12;
+unsigned apshift, spshift;
 target_ulong raddr;
 target_ulong index;
 uint64_t token;
 
-/* only handle 4k and 16M pages for now */
-if (pteh & HPTE64_V_LARGE) {
-#if 0 /* We don't support 64k pages yet */
-if ((ptel & 0xf000) == 0x1000) {
-/* 64k page */
-} else
-#endif
-if ((ptel & 0xff000) == 0) {
-/* 16M page */
-page_shift = 24;
-/* lowest AVA bit must be 0 for 16M pages */
-if (pteh & 0x80) {
-return H_PARAMETER;
-}
-} else {
-return H_PARAMETER;
-}
+apshift = ppc_hash64_hpte_page_shift_noslb(cpu, pteh, ptel, &spshift);
+if (!apshift) {
+/* Bad page size encoding */
+return H_PARAMETER;
 }
 
-raddr = (ptel & HPTE64_R_RPN) & ~((1ULL << page_shift) - 1);
+raddr = (ptel & HPTE64_R_RPN) & ~((1ULL << apshift) - 1);
 
 if (is_ram_address(spapr, raddr)) {
 /* Regular RAM - should have WIMG=0010 */
diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index 565a0f4..6d110ee 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -513,6 +513,41 @@ static unsigned hpte_page_shift(const struct 
ppc_one_seg_page_size *sps,
 return 0; /* Bad page size encoding */
 }
 
+unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU *cpu,
+  uint64_t pte0, uint64_t pte1,
+  unsigned *seg_page_shift)
+{
+CPUPPCState *env = &cpu->env;
+int i;
+
+if (!(pte0 & HPTE64_V_LARGE)) {
+*seg_page_shift = 12;
+return 12;
+}
+
+/*
+ * The encodings in env->sps need to be carefully chosen so that
+ * this gives an unambiguous result.
+ */
+for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
+const struct ppc_one_seg_page_size *sps = &env->sps.sps[i];
+unsigned shift;
+
+if (!sps->page_shift) {
+break;
+}
+
+shift = hpte_page_shift(sps, pte0, pte1);
+if (shift) {
+*seg_page_shift = sps->page_shift;
+return shift;
+}
+}
+
+*seg_page_shift = 0;
+return 0;
+}
+
 int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, target_ulong eaddr,
 int rwx, int mmu_idx)
 {
diff --git a/target-ppc/mmu-hash64.h b/target-ppc/mmu-hash64.h
index 293a951..34cf975 100644
--- a/target-ppc/mmu-hash64.h
+++ b/target-ppc/mmu-hash64.h
@@ -16,6 +16,9 @@ void ppc_hash64_store_hpte(PowerPCCPU *cpu, target_ulong 
index,
 void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu,
target_ulong pte_index,
target_ulong pte0, target_ulong pte1);
+unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU *cpu,
+  uint64_t pte0, uint64_t pte1,
+  unsigned *seg_page_shift);
 #endif
 
 /*
-- 
2.5.0

Re: [Qemu-devel] [PATCH RFC v2 2/5] vl: Make object_create() public

2016-01-31 Thread Jason Wang



On 01/27/2016 04:29 PM, zhanghailiang wrote:
> Make the helper object_create() public and fix its first
> parameter to accept NULL value.

This doesn't look very nice. Maybe it would be better to pass a new
predicate function for the sanity check.
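
One possible reading of this suggestion, as a standalone sketch: callers always pass an explicit predicate, so the create function never has to special-case NULL. All names here (type_predicate_fn, accept_any_type, only_filters, create_object) are hypothetical and are not the vl.c API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef bool (*type_predicate_fn)(const char *type);

/* Predicate for callers that do not want any filtering. */
static bool accept_any_type(const char *type)
{
    (void)type;
    return true;
}

/* Predicate for callers that only want to create filter objects. */
static bool only_filters(const char *type)
{
    return strncmp(type, "filter-", 7) == 0;
}

/* Returns 0 if the object would be created, -1 if the predicate rejects it. */
static int create_object(type_predicate_fn pred, const char *type)
{
    if (!pred(type)) {
        return -1;
    }
    return 0;
}
```

With this shape the net code would pass something like only_filters instead of NULL, which also documents what it expects to create.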

>
> Signed-off-by: zhanghailiang 
> Cc: Paolo Bonzini 
> ---
> v2:
>  - New patch
> ---
>  include/qemu-common.h | 2 ++
>  vl.c  | 4 ++--
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index 22b010c..52cf4fd 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -500,4 +500,6 @@ int parse_debug_env(const char *name, int max, int 
> initial);
>  const char *qemu_ether_ntoa(const MACAddr *mac);
>  void page_size_init(void);
>  
> +int object_create(void *opaque, QemuOpts *opts, Error **errp);
> +
>  #endif
> diff --git a/vl.c b/vl.c
> index f043009..b21335e 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2819,7 +2819,7 @@ static bool object_create_delayed(const char *type)
>  }
>  
>  
> -static int object_create(void *opaque, QemuOpts *opts, Error **errp)
> +int object_create(void *opaque, QemuOpts *opts, Error **errp)
>  {
>  Error *err = NULL;
>  char *type = NULL;
> @@ -2842,7 +2842,7 @@ static int object_create(void *opaque, QemuOpts *opts, 
> Error **errp)
>  if (err) {
>  goto out;
>  }
> -if (!type_predicate(type)) {
> +if (type_predicate && !type_predicate(type)) {
>  goto out;
>  }
>

Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-01-31 Thread Jason Wang



On 01/27/2016 04:29 PM, zhanghailiang wrote:
> We add a new helper function netdev_add_filter(), this function
> can help adding a filter object to a netdev.
> Besides, we add a is_default member for struct NetFilterState
> to indicate whether the filter is default or not.
>
> Signed-off-by: zhanghailiang 
> ---
> v2:
>  -Re-implement netdev_add_filter() by re-using object_create()
>   (Jason's suggestion)
> ---
>  include/net/filter.h |  7 +
>  net/filter.c | 80 
> 
>  2 files changed, 87 insertions(+)
>
> diff --git a/include/net/filter.h b/include/net/filter.h
> index af3c53c..ee1c024 100644
> --- a/include/net/filter.h
> +++ b/include/net/filter.h
> @@ -55,6 +55,7 @@ struct NetFilterState {
>  char *netdev_id;
>  NetClientState *netdev;
>  NetFilterDirection direction;
> +bool is_default;
>  bool enabled;
>  QTAILQ_ENTRY(NetFilterState) next;
>  };
> @@ -74,4 +75,10 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
>  int iovcnt,
>  void *opaque);
>  
> +void netdev_add_filter(const char *netdev_id,
> +   const char *filter_type,
> +   const char *id,
> +   bool is_default,
> +   Error **errp);
> +
>  #endif /* QEMU_NET_FILTER_H */
> diff --git a/net/filter.c b/net/filter.c
> index d08a2be..dc7aa9b 100644
> --- a/net/filter.c
> +++ b/net/filter.c
> @@ -214,6 +214,86 @@ static void netfilter_complete(UserCreatable *uc, Error 
> **errp)
>  QTAILQ_INSERT_TAIL(&nf->netdev->filters, nf, next);
>  }
>  
> +QemuOptsList qemu_filter_opts = {
> +.name = "default-filter",
> +.head = QTAILQ_HEAD_INITIALIZER(qemu_filter_opts.head),
> +.desc = {
> +{
> +.name = "qom-type",
> +.type = QEMU_OPT_STRING,
> +},{
> +.name = "id",
> +.type = QEMU_OPT_STRING,
> +},{
> +.name = "netdev",
> +.type = QEMU_OPT_STRING,
> +},{
> +.name = "status",
> +.type = QEMU_OPT_STRING,
> +},
> +{ /* end of list */ }
> +},
> +};
> +
> +static void filter_set_default_flag(const char *id,
> +bool is_default,
> +Error **errp)
> +{
> +Object *obj, *container;
> +NetFilterState *nf;
> +
> +container = object_get_objects_root();
> +obj = object_resolve_path_component(container, id);
> +if (!obj) {
> +error_setg(errp, "object id not found");
> +return;
> +}
> +nf = NETFILTER(obj);
> +nf->is_default = is_default;
> +}
> +
> +void netdev_add_filter(const char *netdev_id,
> +   const char *filter_type,
> +   const char *id,
> +   bool is_default,
> +   Error **errp)
> +{
> +NetClientState *nc = qemu_find_netdev(netdev_id);
> +char *optarg;
> +QemuOpts *opts = NULL;
> +Error *err = NULL;
> +
> +/* FIXME: Not support multiple queues */
> +if (!nc || nc->queue_index > 1) {
> +return;
> +}
> +/* Not support vhost-net */
> +if (get_vhost_net(nc)) {
> +return;
> +}
> +
> +optarg = g_strdup_printf("qom-type=%s,id=%s,netdev=%s,status=%s",
> +filter_type, id, netdev_id, is_default ? "disable" : "enable"

Instead of this, I wonder whether it would be better to:

- store the default filter property in a pointer to a string
- let the colo code change the pointer to "filter-buffer,status=disable"

Then there's no need for much of the code above:
- no need for an "is_default" parameter in netdev_add_filter, which does not
scale considering we may want to have more properties in the future
- no need for hacks like "qemu_filter_opts"
- no need for a special flag like "is_default"

Thoughts?
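
A rough sketch of what I mean, with made-up names (default_filter_props and format_default_filter are not existing symbols): the default filter is just a string that other code may repoint, and the option string is built from it.

```c
#include <stdio.h>

/* Hypothetical global holding the default filter's "type[,prop=value...]". */
static const char *default_filter_props = "filter-buffer";

/* Build the option string for one netdev's default filter. */
static int format_default_filter(char *out, size_t len,
                                 const char *id, const char *netdev_id)
{
    return snprintf(out, len, "qom-type=%s,id=%s,netdev=%s",
                    default_filter_props, id, netdev_id);
}
```

The colo code would then only need to assign "filter-buffer,status=disable" to the pointer, and any further property can be added the same way without touching the function signature.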

> +opts = qemu_opts_parse_noisily(&qemu_filter_opts,
> +   optarg, false);
> +if (!opts) {
> +error_report("Failed to parse param '%s'", optarg);
> +exit(1);
> +}
> +g_free(optarg);
> +if (object_create(NULL, opts, &err) < 0) {
> +error_report("Failed to create object");
> +goto out_clean;
> +}
> +filter_set_default_flag(id, is_default, &err);
> +
> +out_clean:
> +qemu_opts_del(opts);
> +if (err) {
> +error_propagate(errp, err);
> +}
> +}
> +
>  static void netfilter_finalize(Object *obj)
>  {
>  NetFilterState *nf = NETFILTER(obj);

Re: [Qemu-devel] [RFC Patch v2 03/10] virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4

2016-01-31 Thread Wei Xu


On 02/01/2016 02:50 AM, Michael S. Tsirkin wrote:

On Mon, Feb 01, 2016 at 02:13:22AM +0800, w...@redhat.com wrote:

From: Wei Xu 

When a packet arrives, a corresponding chain is selected or created;
non-IPv4 packets bypass the chain.

The callback in the chain is invoked to do the actual coalescing.

Since coalescing is based on the TCP connection, a packet is cached if
there is no previous data within the same connection.

The framework for IPv4 is also introduced.

This patch depends on patch 2918cf2 (Detailed IPv4 and General TCP data
coalescing)

Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c | 173 +++-
  1 file changed, 172 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 4e9458e..cfbac6d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -14,10 +14,12 @@
  #include "qemu/iov.h"
  #include "hw/virtio/virtio.h"
  #include "net/net.h"
+#include "net/eth.h"
  #include "net/checksum.h"
  #include "net/tap.h"
  #include "qemu/error-report.h"
  #include "qemu/timer.h"
+#include "qemu/sockets.h"
  #include "hw/virtio/virtio-net.h"
  #include "net/vhost_net.h"
  #include "hw/virtio/virtio-bus.h"
@@ -37,6 +39,21 @@
  #define endof(container, field) \
  (offsetof(container, field) + sizeof(((container *)0)->field))
  
+#define VIRTIO_HEADER   12/* Virtio net header size */

+#define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
+
+#define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
+
+/* Global statistics */
+static uint32_t rsc_chain_no_mem;
+
+/* Switcher to enable/disable rsc */
+static bool virtio_net_rsc_bypass;
+
+/* Coalesce callback for ipv4/6 */
+typedef int32_t (VirtioNetCoalesce) (NetRscChain *chain, NetRscSeg *seg,
+ const uint8_t *buf, size_t size);
+

Since there are only 2 cases, it's probably better to just
open-code if (v4) -> coalesce4 else if v6 -> coalesce6

OK, thanks mst.

Wei
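
For reference, the open-coded dispatch suggested above could look roughly like the sketch below. Everything in it is illustrative: the toy_ names are invented, the two coalesce functions are placeholders that only return a marker value, and the protocol argument is assumed to be the EtherType already read from the frame.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_P_IPV4 0x0800
#define TOY_ETH_P_IPV6 0x86dd

/* Placeholder for the IPv4 coalescing path. */
static int toy_coalesce4(const uint8_t *buf, size_t size)
{
    (void)buf;
    (void)size;
    return 4;
}

/* Placeholder for the IPv6 coalescing path. */
static int toy_coalesce6(const uint8_t *buf, size_t size)
{
    (void)buf;
    (void)size;
    return 6;
}

/* Direct branch on the protocol instead of a callback stored in the chain. */
static int toy_try_coalesce(uint16_t proto, const uint8_t *buf, size_t size)
{
    if (proto == TOY_ETH_P_IPV4) {
        return toy_coalesce4(buf, size);
    } else if (proto == TOY_ETH_P_IPV6) {
        return toy_coalesce6(buf, size);
    }
    return 0; /* neither v4 nor v6: bypass */
}
```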



  typedef struct VirtIOFeature {
  uint32_t flags;
  size_t end;
@@ -1019,7 +1036,8 @@ static int receive_filter(VirtIONet *n, const uint8_t 
*buf, int size)
  return 0;
  }
  
-static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)

+static ssize_t virtio_net_do_receive(NetClientState *nc,
+  const uint8_t *buf, size_t size)
  {
  VirtIONet *n = qemu_get_nic_opaque(nc);
  VirtIONetQueue *q = virtio_net_get_subqueue(nc);
@@ -1623,6 +1641,159 @@ static void virtio_net_rsc_cleanup(VirtIONet *n)
  }
  }
  
+static int virtio_net_rsc_cache_buf(NetRscChain *chain, NetClientState *nc,

+const uint8_t *buf, size_t size)
+{
+NetRscSeg *seg;
+
+seg = g_malloc(sizeof(NetRscSeg));
+if (!seg) {
+return 0;
+}
+
+seg->buf = g_malloc(MAX_VIRTIO_IP_PAYLOAD);
+if (!seg->buf) {
+goto out;
+}
+
+memmove(seg->buf, buf, size);
+seg->size = size;
+seg->dup_ack_count = 0;
+seg->is_coalesced = 0;
+seg->nc = nc;
+
+QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
+return size;
+
+out:
+g_free(seg);
+return 0;
+}
+
+
+static int32_t virtio_net_rsc_try_coalesce4(NetRscChain *chain,
+   NetRscSeg *seg, const uint8_t *buf, size_t size)
+{
+/* This real part of this function will be introduced in next patch, just
+*  return a 'final' to feed the compilation. */
+return RSC_FINAL;
+}
+
+static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
+const uint8_t *buf, size_t size, VirtioNetCoalesce *coalesce)
+{
+int ret;
+NetRscSeg *seg, *nseg;
+
+if (QTAILQ_EMPTY(&chain->buffers)) {
+if (!virtio_net_rsc_cache_buf(chain, nc, buf, size)) {
+return 0;
+} else {
+return size;
+}
+}
+
+QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
+ret = coalesce(chain, seg, buf, size);
+if (RSC_FINAL == ret) {
+ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
+QTAILQ_REMOVE(&chain->buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);
+if (ret == 0) {
+/* Send failed */
+return 0;
+}
+
+/* Send current packet */
+return virtio_net_do_receive(nc, buf, size);
+} else if (RSC_NO_MATCH == ret) {
+continue;
+} else {
+/* Coalesced, mark coalesced flag to tell calc cksum for ipv4 */
+seg->is_coalesced = 1;
+return size;
+}
+}
+
+return virtio_net_rsc_cache_buf(chain, nc, buf, size);
+}
+
+static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
+  const uint8_t *buf, size_t size)
+{
+NetRscChain *chain;
+
+chain = (NetRscChain

[Qemu-devel] CPU hotplug

2016-01-31 Thread David Gibson

Hi,

It seems to me we're getting rather bogged down in how to proceed with
an improved CPU hotplug (and hot unplug) interface, both generically
and for ppc in particular.

So here's a somewhat more concrete suggestion of a way forward, to see
if we can get some consensus.

The biggest difficulty I think we're grappling with is that device-add
is actually *not* a great interface to cpu hotplug.  Or rather, it's
not great as the _only_ interface: in order to represent the many
different constraints on how cpus can be plugged on various platforms,
it's natural to use a hierarchy of cpu core / socket / package types
specific to the specific platform or real-world cpu package being
modeled.  However, for the normal case of a regular homogeneous (and at
least slightly para-virtualized) server, that interface is nasty for
management layers because they have to know the right type to
instantiate.

To address this, I'm proposing this two layer interface:

Layer 1: Low-level, device-add based

* a new, generic cpu-package QOM type represents a group of 1 or
  more cpu threads which can be hotplugged as a unit
* cpu-package is abstract and can't be instantiated directly
* archs and/or individual platforms have specific subtypes of
  cpu-package which can be instantiated
* for platforms attempting to be faithful representations of real
  hardware these subtypes would match the specific characteristics
  of the real hardware devices.  In addition to the cpu threads,
  they may have other on chip devices as sub-objects.
* for platforms which are paravirtual - or which have existing
  firmware abstractions for cpu cores/sockets/packages/whatever -
  these could be more abstract, but would still be tied to that
  platform's constraints
* Depending on the platform the cpu-package object could have
  further internal structure (e.g. a package object representing a
  socket contains package objects representing each core, which in
  turn contain cpu objects for each thread)
* Some crazy platform that has multiple daughterboards each with
  several multi-chip-modules each with several chips, each
  with several cores each with several threads could represent
  that too.

What would be common to all the cpu-package subtypes is:
* A boolean "present" attribute ("realized" might already be
  suitable, but I'm not certain)
* A generic means of determining the number of cpu threads in the
  package, and enumerating those
* A generic means of determining if the package is hotpluggable or
  not
* They'd get listed in a standard place in the QOM tree
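
To make the common interface a bit more concrete, here is a plain-C sketch rather than real QOM code. The struct layout, the field names and cpu_package_count_threads() are all assumptions for illustration; a real implementation would use QOM properties and child links.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 8

typedef struct CpuPackage {
    bool present;                 /* the generic "present" attribute */
    bool hotpluggable;            /* can this package be (un)plugged? */
    unsigned own_threads;         /* cpu threads directly in this package */
    size_t nr_children;           /* nested packages, e.g. cores in a socket */
    struct CpuPackage *children[MAX_CHILDREN];
} CpuPackage;

/* Generic enumeration: count threads in a package and everything below it. */
static unsigned cpu_package_count_threads(const CpuPackage *pkg)
{
    unsigned n = pkg->own_threads;
    size_t i;

    for (i = 0; i < pkg->nr_children; i++) {
        n += cpu_package_count_threads(pkg->children[i]);
    }
    return n;
}
```

A socket containing two 4-thread cores would report 8 threads here, and a management layer would not need to know anything about the socket/core structure to get that number.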

This interface is suitable if you want complete control over
constructing the system, including weird cases like heterogeneous
machines (either totally different cpu types, or just different
numbers of threads in different packages).

The intention is that these objects would never look at the global cpu
type or sockets/cores/threads numbers.  The next level up would
instead configure the packages to match those for the common case.

Layer 2: Higher-level

* not all machine types need support this model, but I'd expect
  all future versions of machine types designed for production use
  to do so
* machine types don't construct cpu objects directly
* instead they create enough cpu-package objects - of a subtype
  suitable for this machine - to provide maxcpus threads
* the machine type would set the "present" bit on enough of the
  cpu packages to provide the base number of cpu threads

Management layers can then manage hotplug without knowing platform
specifics by using qmp to toggle the "present" bit on packages.
Platforms that allow thread-level pluggability can expose a package
for every thread, those that allow core-level expose a package per
core, those that have even less granularity expose a package at
whatever grouping they can do hotplug on.
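
As a sketch of the layer 2 behaviour (again plain C with invented names, not a proposal for the actual code): the machine creates enough packages to cover maxcpus, marks enough of them present to cover the base cpu count, and hotplug is then just a matter of flipping "present" on another package.

```c
#include <stdbool.h>

#define TOY_MAX_PACKAGES 64

typedef struct {
    bool present;
    unsigned threads;
} ToyPackage;

/*
 * Create packages covering max_cpus threads and mark enough of them
 * present to cover smp_cpus.  Returns the number of packages, 0 on error.
 */
static unsigned toy_machine_init(ToyPackage *pkgs, unsigned threads_per_pkg,
                                 unsigned smp_cpus, unsigned max_cpus)
{
    unsigned nr, base, i;

    if (threads_per_pkg == 0) {
        return 0;
    }
    nr = (max_cpus + threads_per_pkg - 1) / threads_per_pkg;
    base = (smp_cpus + threads_per_pkg - 1) / threads_per_pkg;
    if (nr > TOY_MAX_PACKAGES) {
        return 0;
    }
    for (i = 0; i < nr; i++) {
        pkgs[i].threads = threads_per_pkg;
        pkgs[i].present = i < base;
    }
    return nr;
}

/* Number of cpu threads currently plugged in. */
static unsigned toy_present_threads(const ToyPackage *pkgs, unsigned nr)
{
    unsigned i, n = 0;

    for (i = 0; i < nr; i++) {
        if (pkgs[i].present) {
            n += pkgs[i].threads;
        }
    }
    return n;
}
```

With 8 threads per package, 16 base cpus and maxcpus of 32, this gives four packages with two present; setting present on a third takes the guest to 24 threads.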

Examples:

For use with pc (or q35 or whatever) machine type, I'd expect a
cpu-package subtype called, say "acpi-thread" which represents a
single thread in the ACPI sense.  Toggling those would trigger ACPI
hotplug events as cpu_add does now.

For use with pseries, I'd expect a "papr-core" cpu-package subtype,
which represents a single (paravirtual) core.  Toggling present on
this would trigger the PAPR hotplug events.  A property would control
the number of threads in the core (only settable before enabling
present).

For use with the powernv machine type (once ready for merge) I'd
expect "POWER8-package" type which represents a POWER8 chip / module
as close to the real hardware as we can get.  It would have a fixed
number of cores and threads within it as per the real hardware, and
would also include xscoms and other per-module logic.

From here to there:

A suggested order of implementation to get there without too much risk
of breaking things.

  1. Fix bugs with creation /

Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints

2016-01-31 Thread Wen Congyang

On 01/29/2016 06:47 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>> On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
>>> * Wen Congyang (we...@cn.fujitsu.com) wrote:
 On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
> Hi,
>   I've got a block error if I kill the secondary.
>
> Start both primary & secondary
> kill -9 secondary qemu
> x_colo_lost_heartbeat on primary
>
> The guest sees a block error and the ext4 root switches to read-only.
>
> I gdb'd the primary with a breakpoint on quorum_report_bad; see
> backtrace below.
> (This is based on colo-v2.4-periodic-mode of the framework
> code with the block and network proxy merged in; so it could be my
> merging but I don't think so ?)
>
>
> (gdb) where
> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, 
> acb=0x7f2946cb3910, acb=0x7f2946cb3910)
> at /root/colo/jan-2016/qemu/block/quorum.c:222
> #1  0x7f2943b23058 in quorum_aio_cb (opaque=<optimized out>, 
> ret=<optimized out>)
> at /root/colo/jan-2016/qemu/block/quorum.c:315
> #2  0x7f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at 
> /root/colo/jan-2016/qemu/block/io.c:2122
> #3  0x7f2943ae777d in aio_bh_call (bh=<optimized out>) at 
> /root/colo/jan-2016/qemu/async.c:64
> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at 
> /root/colo/jan-2016/qemu/async.c:92
> #5  0x7f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at 
> /root/colo/jan-2016/qemu/aio-posix.c:305
> #6  0x7f2943ae756e in aio_ctx_dispatch (source=<optimized out>, 
> callback=<optimized out>, 
> user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
> #7  0x7f293b84a79a in g_main_context_dispatch () from 
> /lib64/libglib-2.0.so.0
> #8  0x7f2943af3a00 in glib_pollfds_poll () at 
> /root/colo/jan-2016/qemu/main-loop.c:211
> #9  os_host_main_loop_wait (timeout=<optimized out>) at 
> /root/colo/jan-2016/qemu/main-loop.c:256
> #10 main_loop_wait (nonblocking=<optimized out>) at 
> /root/colo/jan-2016/qemu/main-loop.c:504
> #11 0x7f29438529ee in main_loop () at 
> /root/colo/jan-2016/qemu/vl.c:1945
> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
>
> (gdb) p s->num_children
> $1 = 2
> (gdb) p acb->success_count
> $2 = 0
> (gdb) p acb->is_read
> $5 = false

 Sorry for the late reply.
>>>
>>> No problem.
>>>
 What is the value of acb->count?
>>>
>>> (gdb) p acb->count
>>> $1 = 1
>>
>> Note, the count is 1, not 2. Writing to children.0 is in flight. If writing 
>> to children.0 succeeds,
>> the guest doesn't see this error.
 If secondary host is down, you should remove quorum's children.1. 
 Otherwise, you will get
 I/O error event.
>>>
>>> Is that safe?  If the secondary fails, do you always have time to issue the 
>>> command to
>>> remove the children.1  before the guest sees the error?
>>
>> We will write to two children, and expect that writing to children.0 will 
>> succeed. If so,
>> the guest doesn't see this error. You just get the I/O error event.
> 
> I think children.0 is the disk, and that should be OK - so only the 
> children.1/replication should
> be failing - so in that case why do I see the error?

I don't know; I will check the code.

> The 'node0' in the backtrace above is the name of the replication, so it does 
> look like the error
> is coming from the replication.

No, the backtrace just shows an I/O error event being reported to the 
management application.

> 
>>> Anyway, I tried removing children.1 but it segfaults now, I guess the 
>>> replication is unhappy:
>>>
>>> (qemu) x_block_change colo-disk0 -d children.1
>>> (qemu) x_colo_lost_heartbeat 
>>
>> Hmm, you should not remove the child before failover. I will check how to 
>> avoid it in the code.
> 
>  But you said 'If secondary host is down, you should remove quorum's 
> children.1' - is that not
> what you meant?

Yes, you should execute 'x_colo_lost_heartbeat' first, and then execute 
'x_block_change ... -d ...'.

> 
>>> 12973 Segmentation fault  (core dumped) 
>>> ./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S -boot 
>>> c -m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name debug-threads=on 
>>> -trace events=trace-file -device virtio-rng-pci $block_param $net_param
>>>
>>> #0  0x7f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, 
>>> failover=true, errp=0x7fff6a5c3420)
>>> at /root/colo/jan-2016/qemu/block.c:4426
>>>
>>> (gdb) p drv
>>> $1 = (BlockDriver *) 0x5d2a
>>>
>>>   it looks like the whole of bs is bogus.
>>>
>>> #1  0x7f0a398d87f6 in quorum_stop_replication (bs=<optimized out>, 
>>> failover=<optimized out>, 
>>> errp=<optimized out>) at /root/colo/jan-2016/qemu/block/quorum.c:1213
>>>
>>> (gdb) p s->replication_index
>>> $3 = 1
>>>
>>> I guess quorum_del_child needs to stop replication before it removes the 
>>> child?
>>
>> Yes, but in the newest version, quorum doesn't know

[Qemu-devel] [PULL 25/40] target-ppc: gdbstub: Add VSX support

2016-01-31 Thread David Gibson

From: Anton Blanchard 

Add the XML and functions to get and set VSX registers.

Signed-off-by: Anton Blanchard 
(fixed little-endian guests)
Signed-off-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 configure   |  6 +++---
 gdb-xml/power-vsx.xml   | 44 
 target-ppc/translate_init.c | 24 
 3 files changed, 71 insertions(+), 3 deletions(-)
 create mode 100644 gdb-xml/power-vsx.xml

diff --git a/configure b/configure
index 3506e44..297bfc7 100755
--- a/configure
+++ b/configure
@@ -5702,20 +5702,20 @@ case "$target_name" in
   ppc64)
 TARGET_BASE_ARCH=ppc
 TARGET_ABI_DIR=ppc
-gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml 
power-spe.xml"
+gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml 
power-spe.xml power-vsx.xml"
   ;;
   ppc64le)
 TARGET_ARCH=ppc64
 TARGET_BASE_ARCH=ppc
 TARGET_ABI_DIR=ppc
-gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml 
power-spe.xml"
+gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml 
power-spe.xml power-vsx.xml"
   ;;
   ppc64abi32)
 TARGET_ARCH=ppc64
 TARGET_BASE_ARCH=ppc
 TARGET_ABI_DIR=ppc
 echo "TARGET_ABI32=y" >> $config_target_mak
-gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml 
power-spe.xml"
+gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml 
power-spe.xml power-vsx.xml"
   ;;
   sh4|sh4eb)
 TARGET_ARCH=sh4
diff --git a/gdb-xml/power-vsx.xml b/gdb-xml/power-vsx.xml
new file mode 100644
index 000..fd290e9
--- /dev/null
+++ b/gdb-xml/power-vsx.xml
@@ -0,0 +1,44 @@
+[XML body of gdb-xml/power-vsx.xml (the VSX register description, 44 lines) stripped by the list archive]
+
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index fce68f3..4d71a5d 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8896,6 +8896,26 @@ static int gdb_set_spe_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 return 0;
 }
 
+static int gdb_get_vsx_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
+{
+if (n < 32) {
+stq_p(mem_buf, env->vsr[n]);
+ppc_maybe_bswap_register(env, mem_buf, 8);
+return 8;
+}
+return 0;
+}
+
+static int gdb_set_vsx_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
+{
+if (n < 32) {
+ppc_maybe_bswap_register(env, mem_buf, 8);
+env->vsr[n] = ldq_p(mem_buf);
+return 8;
+}
+return 0;
+}
+
 static int ppc_fixup_cpu(PowerPCCPU *cpu)
 {
 CPUPPCState *env = &cpu->env;
@@ -9001,6 +9021,10 @@ static void ppc_cpu_realizefn(DeviceState *dev, Error 
**errp)
 gdb_register_coprocessor(cs, gdb_get_spe_reg, gdb_set_spe_reg,
  34, "power-spe.xml", 0);
 }
+if (pcc->insns_flags2 & PPC2_VSX) {
+gdb_register_coprocessor(cs, gdb_get_vsx_reg, gdb_set_vsx_reg,
+ 32, "power-vsx.xml", 0);
+}
 
 qemu_init_vcpu(cs);
 
-- 
2.5.0

[Qemu-devel] [PULL 38/40] target-ppc: Allow more page sizes for POWER7 & POWER8 in TCG

2016-01-31 Thread David Gibson

Now that the TCG and spapr code has been extended to allow (semi-)
arbitrary page encodings in the CPU's 'sps' table, we can add the many
page sizes supported by real POWER7 and POWER8 hardware that we previously
didn't support in TCG.
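
For readers less familiar with the shift notation used in the table below: a page_shift of n means a page of 2^n bytes, so 12, 16, 24 and 34 correspond to 4K, 64K, 16M and 16G. A tiny sketch of that arithmetic, with helper names chosen here for illustration (the masking mirrors the raddr computation in h_enter):

```c
#include <stdint.h>

/* Size in bytes of a page with the given shift. */
static uint64_t page_size_bytes(unsigned shift)
{
    return 1ULL << shift;
}

/* Round an address down to the start of its page. */
static uint64_t page_align_down(uint64_t addr, unsigned shift)
{
    return addr & ~((1ULL << shift) - 1);
}
```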

Signed-off-by: David Gibson 
Acked-by: Benjamin Herrenschmidt 
Reviewed-by: Alexander Graf 
---
 target-ppc/mmu-hash64.h |  2 ++
 target-ppc/translate_init.c | 32 
 2 files changed, 34 insertions(+)

diff --git a/target-ppc/mmu-hash64.h b/target-ppc/mmu-hash64.h
index 34cf975..ab0f86b 100644
--- a/target-ppc/mmu-hash64.h
+++ b/target-ppc/mmu-hash64.h
@@ -48,6 +48,8 @@ unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU *cpu,
 #define SLB_VSID_LLP_MASK   (SLB_VSID_L | SLB_VSID_LP)
 #define SLB_VSID_4K 0xULL
 #define SLB_VSID_64K0x0110ULL
+#define SLB_VSID_16M0x0100ULL
+#define SLB_VSID_16G0x0120ULL
 
 /*
  * Hash page table definitions
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 4d71a5d..cdd18ac 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8105,6 +8105,36 @@ static Property powerpc_servercpu_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+#ifdef CONFIG_SOFTMMU
+static const struct ppc_segment_page_sizes POWER7_POWER8_sps = {
+.sps = {
+{
+.page_shift = 12, /* 4K */
+.slb_enc = 0,
+.enc = { { .page_shift = 12, .pte_enc = 0 },
+ { .page_shift = 16, .pte_enc = 0x7 },
+ { .page_shift = 24, .pte_enc = 0x38 }, },
+},
+{
+.page_shift = 16, /* 64K */
+.slb_enc = SLB_VSID_64K,
+.enc = { { .page_shift = 16, .pte_enc = 0x1 },
+ { .page_shift = 24, .pte_enc = 0x8 }, },
+},
+{
+.page_shift = 24, /* 16M */
+.slb_enc = SLB_VSID_16M,
+.enc = { { .page_shift = 24, .pte_enc = 0 }, },
+},
+{
+.page_shift = 34, /* 16G */
+.slb_enc = SLB_VSID_16G,
+.enc = { { .page_shift = 34, .pte_enc = 0x3 }, },
+},
+}
+};
+#endif /* CONFIG_SOFTMMU */
+
 static void init_proc_POWER7 (CPUPPCState *env)
 {
 init_proc_book3s_64(env, BOOK3S_CPU_POWER7);
@@ -8168,6 +8198,7 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
 pcc->mmu_model = POWERPC_MMU_2_06;
 #if defined(CONFIG_SOFTMMU)
 pcc->handle_mmu_fault = ppc_hash64_handle_mmu_fault;
+pcc->sps = &POWER7_POWER8_sps;
 #endif
 pcc->excp_model = POWERPC_EXCP_POWER7;
 pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
@@ -8248,6 +8279,7 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
 pcc->mmu_model = POWERPC_MMU_2_07;
 #if defined(CONFIG_SOFTMMU)
 pcc->handle_mmu_fault = ppc_hash64_handle_mmu_fault;
+pcc->sps = &POWER7_POWER8_sps;
 #endif
 pcc->excp_model = POWERPC_EXCP_POWER7;
 pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
-- 
2.5.0

[Qemu-devel] [PULL 36/40] target-ppc: Add new TLB invalidate by HPTE call for hash64 MMUs

2016-01-31 Thread David Gibson

When HPTEs are removed or modified by hypercalls on spapr, we need to
invalidate the relevant pages in the qemu TLB.

Currently we do that by doing some complicated calculations to work out the
right encoding for the tlbie instruction, then passing that to
ppc_tlb_invalidate_one()... which totally ignores the argument and flushes
the whole tlb.

Avoid that by adding a new flush-by-hpte helper in mmu-hash64.c.
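
The difference between a whole-TLB flush and a targeted one can be shown with a toy model. This is not how the qemu TLB or the new helper is implemented; the ToyTlbEntry type and the toy_tlb_* functions are invented purely to illustrate the two kinds of invalidation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TLB_SIZE 4

typedef struct {
    bool valid;
    uint64_t vpage;   /* virtual page number cached by this entry */
} ToyTlbEntry;

/* Drop every cached translation. */
static void toy_tlb_flush_all(ToyTlbEntry *tlb)
{
    size_t i;

    for (i = 0; i < TOY_TLB_SIZE; i++) {
        tlb[i].valid = false;
    }
}

/* Drop only the translations for one virtual page. */
static void toy_tlb_flush_page(ToyTlbEntry *tlb, uint64_t vpage)
{
    size_t i;

    for (i = 0; i < TOY_TLB_SIZE; i++) {
        if (tlb[i].valid && tlb[i].vpage == vpage) {
            tlb[i].valid = false;
        }
    }
}

/* Count the entries that are still usable. */
static size_t toy_tlb_count_valid(const ToyTlbEntry *tlb)
{
    size_t i, n = 0;

    for (i = 0; i < TOY_TLB_SIZE; i++) {
        n += tlb[i].valid ? 1 : 0;
    }
    return n;
}
```

Having a per-HPTE entry point leaves room for the targeted variant later, whatever the helper does internally today.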

Signed-off-by: David Gibson 
Acked-by: Benjamin Herrenschmidt 
Reviewed-by: Alexander Graf 
---
 hw/ppc/spapr_hcall.c| 46 --
 target-ppc/mmu-hash64.c | 12 
 target-ppc/mmu-hash64.h |  3 +++
 3 files changed, 19 insertions(+), 42 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index a53bd2f..0a8378c 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -38,42 +38,6 @@ static void set_spr(CPUState *cs, int spr, target_ulong 
value,
 run_on_cpu(cs, do_spr_sync, );
 }
 
-static target_ulong compute_tlbie_rb(target_ulong v, target_ulong r,
- target_ulong pte_index)
-{
-target_ulong rb, va_low;
-
-rb = (v & ~0x7fULL) << 16; /* AVA field */
-va_low = pte_index >> 3;
-if (v & HPTE64_V_SECONDARY) {
-va_low = ~va_low;
-}
-/* xor vsid from AVA */
-if (!(v & HPTE64_V_1TB_SEG)) {
-va_low ^= v >> 12;
-} else {
-va_low ^= v >> 24;
-}
-va_low &= 0x7ff;
-if (v & HPTE64_V_LARGE) {
-rb |= 1; /* L field */
-#if 0 /* Disable that P7 specific bit for now */
-if (r & 0xff000) {
-/* non-16MB large page, must be 64k */
-/* (masks depend on page size) */
-rb |= 0x1000;/* page encoding in LP field */
-rb |= (va_low & 0x7f) << 16; /* 7b of VA in AVA/LP field */
-rb |= (va_low & 0xfe);   /* AVAL field */
-}
-#endif
-} else {
-/* 4kB page */
-rb |= (va_low & 0x7ff) << 12;   /* remaining 11b of AVA */
-}
-rb |= (v >> 54) & 0x300;/* B field */
-return rb;
-}
-
 static inline bool valid_pte_index(CPUPPCState *env, target_ulong pte_index)
 {
 /*
@@ -199,7 +163,7 @@ static RemoveResult remove_hpte(PowerPCCPU *cpu, 
target_ulong ptex,
 {
 CPUPPCState *env = &cpu->env;
 uint64_t token;
-target_ulong v, r, rb;
+target_ulong v, r;
 
 if (!valid_pte_index(env, ptex)) {
 return REMOVE_PARM;
@@ -218,8 +182,7 @@ static RemoveResult remove_hpte(PowerPCCPU *cpu, 
target_ulong ptex,
 *vp = v;
 *rp = r;
 ppc_hash64_store_hpte(cpu, ptex, HPTE64_V_HPTE_DIRTY, 0);
-rb = compute_tlbie_rb(v, r, ptex);
-ppc_tlb_invalidate_one(env, rb);
+ppc_hash64_tlb_flush_hpte(cpu, ptex, v, r);
 return REMOVE_SUCCESS;
 }
 
@@ -323,7 +286,7 @@ static target_ulong h_protect(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 target_ulong pte_index = args[1];
 target_ulong avpn = args[2];
 uint64_t token;
-target_ulong v, r, rb;
+target_ulong v, r;
 
 if (!valid_pte_index(env, pte_index)) {
 return H_PARAMETER;
@@ -344,10 +307,9 @@ static target_ulong h_protect(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 r |= (flags << 55) & HPTE64_R_PP0;
 r |= (flags << 48) & HPTE64_R_KEY_HI;
 r |= flags & (HPTE64_R_PP | HPTE64_R_N | HPTE64_R_KEY_LO);
-rb = compute_tlbie_rb(v, r, pte_index);
 ppc_hash64_store_hpte(cpu, pte_index,
   (v & ~HPTE64_V_VALID) | HPTE64_V_HPTE_DIRTY, 0);
-ppc_tlb_invalidate_one(env, rb);
+ppc_hash64_tlb_flush_hpte(cpu, pte_index, v, r);
 /* Don't need a memory barrier, due to qemu's global lock */
 ppc_hash64_store_hpte(cpu, pte_index, v | HPTE64_V_HPTE_DIRTY, r);
 return H_SUCCESS;
diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index f4c25b7..565a0f4 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -708,3 +708,15 @@ void ppc_hash64_store_hpte(PowerPCCPU *cpu,
  env->htab_base + pte_index + HASH_PTE_SIZE_64 / 2, pte1);
 }
 }
+
+void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu,
+   target_ulong pte_index,
+   target_ulong pte0, target_ulong pte1)
+{
+/*
+ * XXX: given the fact that there are too many segments to
+ * invalidate, and we still don't have a tlb_flush_mask(env, n,
+ * mask) in QEMU, we just invalidate all TLBs
+ */
+tlb_flush(CPU(cpu), 1);
+}
diff --git a/target-ppc/mmu-hash64.h b/target-ppc/mmu-hash64.h
index 24fd2c4..293a951 100644
--- a/target-ppc/mmu-hash64.h
+++ b/target-ppc/mmu-hash64.h
@@ -13,6 +13,9 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, target_ulong 
address, int rw,
 int mmu_idx);
 void ppc_hash64_store_hpte(PowerPCCPU *cpu, target_ulong index,
target_ulong

[Qemu-devel] [PULL 11/40] ppc: Clean up error handling in ppc_set_compat()

2016-01-31 Thread David Gibson

Currently, ppc_set_compat() returns -1 for errors, and also (unconditionally)
reports an error message.  The caller in h_client_architecture_support()
may then report it again using an outdated fprintf().

Clean this up by using the modern error reporting mechanisms.  Also add
strerror(errno) to the error message.
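
The pattern described above can be sketched outside QEMU as follows.  This
is only a minimal illustration: SketchError, sketch_error_setg_errno(),
sketch_backend_set_compat() and sketch_set_compat() are hypothetical
stand-ins, not QEMU's real Error API.  The callee reports failure only
through errp, and the caller decides whether to print, abort or ignore.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an error object; not QEMU's real Error type. */
typedef struct SketchError {
    char msg[160];
} SketchError;

/* Store "text: strerror(os_errno)" in *errp, if the caller asked for it. */
static void sketch_error_setg_errno(SketchError **errp, int os_errno,
                                    const char *text)
{
    SketchError *err;

    if (!errp) {
        return;                 /* caller chose to ignore errors */
    }
    assert(*errp == NULL);      /* never overwrite a pending error */
    err = calloc(1, sizeof(*err));
    if (!err) {
        abort();
    }
    snprintf(err->msg, sizeof(err->msg), "%s: %s", text, strerror(os_errno));
    *errp = err;
}

/* Hypothetical backend call: returns 0 or a negative errno value. */
static int sketch_backend_set_compat(unsigned version)
{
    return version == 0 ? -EINVAL : 0;
}

/* Callee: prints nothing itself, reports failure only through errp. */
static void sketch_set_compat(unsigned version, SketchError **errp)
{
    int ret = sketch_backend_set_compat(version);

    if (ret < 0) {
        sketch_error_setg_errno(errp, -ret,
                                "Unable to set CPU compatibility mode");
    }
}
```

A caller that wants to handle the failure passes the address of a NULL
SketchError pointer and checks it afterwards, so the message is reported
exactly once.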

Signed-off-by: David Gibson 
Reviewed-by: Thomas Huth 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Markus Armbruster 
---
 hw/ppc/spapr.c  |  4 +---
 hw/ppc/spapr_hcall.c| 10 +-
 target-ppc/cpu.h|  2 +-
 target-ppc/translate_init.c | 13 +++--
 4 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0ac6368..8862d18 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1643,9 +1643,7 @@ static void spapr_cpu_init(sPAPRMachineState *spapr, 
PowerPCCPU *cpu)
 }
 
 if (cpu->max_compat) {
-if (ppc_set_compat(cpu, cpu->max_compat) < 0) {
-exit(1);
-}
+ppc_set_compat(cpu, cpu->max_compat, &error_fatal);
 }
 
 xics_cpu_setup(spapr->icp, cpu);
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index fdd7fea..655c433 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -838,7 +838,7 @@ static target_ulong cas_get_option_vector(int vector, 
target_ulong table)
 typedef struct {
 PowerPCCPU *cpu;
 uint32_t cpu_version;
-int ret;
+Error *err;
 } SetCompatState;
 
 static void do_set_compat(void *arg)
@@ -846,7 +846,7 @@ static void do_set_compat(void *arg)
 SetCompatState *s = arg;
 
 cpu_synchronize_state(CPU(s->cpu));
-s->ret = ppc_set_compat(s->cpu, s->cpu_version);
+ppc_set_compat(s->cpu, s->cpu_version, &s->err);
 }
 
 #define get_compat_level(cpuver) ( \
@@ -931,13 +931,13 @@ static target_ulong 
h_client_architecture_support(PowerPCCPU *cpu_,
 SetCompatState s = {
 .cpu = POWERPC_CPU(cs),
 .cpu_version = cpu_version,
-.ret = 0
+.err = NULL,
 };
 
 run_on_cpu(cs, do_set_compat, &s);
 
-if (s.ret < 0) {
-fprintf(stderr, "Unable to set compatibility mode\n");
+if (s.err) {
+error_report_err(s.err);
 return H_HARDWARE;
 }
 }
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 9706000..b3b89e6 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -1210,7 +1210,7 @@ void ppc_store_msr (CPUPPCState *env, target_ulong value);
 
 void ppc_cpu_list (FILE *f, fprintf_function cpu_fprintf);
 int ppc_get_compat_smt_threads(PowerPCCPU *cpu);
-int ppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version);
+void ppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version, Error **errp);
 
 /* Time-base and decrementer management */
 #ifndef NO_CPU_IO_DEFS
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 76d5da1..78c2811 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -9185,7 +9185,7 @@ int ppc_get_compat_smt_threads(PowerPCCPU *cpu)
 return ret;
 }
 
-int ppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version)
+void ppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version, Error **errp)
 {
 int ret = 0;
 CPUPPCState *env = &cpu->env;
@@ -9207,12 +9207,13 @@ int ppc_set_compat(PowerPCCPU *cpu, uint32_t 
cpu_version)
 break;
 }
 
-if (kvm_enabled() && kvmppc_set_compat(cpu, cpu->cpu_version) < 0) {
-error_report("Unable to set compatibility mode in KVM");
-ret = -1;
+if (kvm_enabled()) {
+ret = kvmppc_set_compat(cpu, cpu->cpu_version);
+if (ret < 0) {
+error_setg_errno(errp, -ret,
+ "Unable to set CPU compatibility mode in KVM");
+}
 }
-
-return ret;
 }
 
 static gint ppc_cpu_compare_class_pvr(gconstpointer a, gconstpointer b)
-- 
2.5.0

Re: [Qemu-devel] [RFC Patch v2 02/10] virtio-net rsc: Initilize & Cleanup

2016-01-31 Thread Wei Xu


On 02/01/2016 02:47 AM, Michael S. Tsirkin wrote:

On Mon, Feb 01, 2016 at 02:13:21AM +0800, w...@redhat.com wrote:

From: Wei Xu 

The chain list is initialized when the device is getting realized,
and the entry of the chain will be inserted dynamically according
to protocol type of the network traffic.

All the buffered packets and chain will be destroyed when the
device is going to be unrealized.

Signed-off-by: Wei Xu 

What happens during migration?

I haven't considered migration yet; I will check it out. Thanks, Michael.

Wei



---
  hw/net/virtio-net.c| 22 ++
  include/hw/virtio/virtio-net.h |  1 +
  2 files changed, 23 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index a877614..4e9458e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1603,6 +1603,26 @@ static int virtio_net_load_device(VirtIODevice *vdev, 
QEMUFile *f,
  return 0;
  }
  
+

+static void virtio_net_rsc_cleanup(VirtIONet *n)
+{
+NetRscChain *chain, *rn_chain;
+NetRscSeg *seg, *rn_seg;
+
+QTAILQ_FOREACH_SAFE(chain, &n->rsc_chains, next, rn_chain) {
+QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn_seg) {
+QTAILQ_REMOVE(&chain->buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);
+
+timer_del(chain->drain_timer);
+timer_free(chain->drain_timer);
+QTAILQ_REMOVE(&n->rsc_chains, chain, next);
+g_free(chain);
+}
+}
+}
+
  static NetClientInfo net_virtio_info = {
  .type = NET_CLIENT_OPTIONS_KIND_NIC,
  .size = sizeof(NICState),
@@ -1732,6 +1752,7 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
  nc = qemu_get_queue(n->nic);
  nc->rxfilter_notify_enabled = 1;
  
+QTAILQ_INIT(&n->rsc_chains);

  n->qdev = dev;
  register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
  virtio_net_save, virtio_net_load, n);
@@ -1766,6 +1787,7 @@ static void virtio_net_device_unrealize(DeviceState *dev, 
Error **errp)
  g_free(n->vqs);
  qemu_del_nic(n->nic);
  virtio_cleanup(vdev);
+virtio_net_rsc_cleanup(n);
  }
  
  static void virtio_net_instance_init(Object *obj)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index f3cc25f..6ce8b93 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -59,6 +59,7 @@ typedef struct VirtIONet {
  VirtIONetQueue *vqs;
  VirtQueue *ctrl_vq;
  NICState *nic;
+QTAILQ_HEAD(, NetRscChain) rsc_chains;
  uint32_t tx_timeout;
  int32_t tx_burst;
  uint32_t has_vnet_hdr;
--
2.4.0

Re: [Qemu-devel] [RFC v2 0/10] Support Receive-Segment-Offload(RSC) for WHQL test of Window guest

2016-01-31 Thread Jason Wang



On 02/01/2016 02:13 AM, w...@redhat.com wrote:
> From: Wei Xu 
>
> Patch v2 add detailed commit log.
>
> This patch is to support the WHQL test for Windows guests; the feature also
> benefits other guests, working as a kernel 'gro'-like feature with a userspace
> implementation.
> Feature information:
>   http://msdn.microsoft.com/en-us/library/windows/hardware/jj853324
>
> Both IPv4 and IPv6 are supported. Although userspace virtio is slower than
> vhost-net, this feature gives userspace virtio a performance improvement of
> about 30-40 percent; this is done by turning the feature on and disabling
> 'tso' on the corresponding tap interface.

Maybe you can share the numbers with us?

>
> Test steps:
> Although this feature is mainly used for Windows guests, I used Linux guests
> to help test the feature. To keep things simple, I tested the patch in 3 steps
> as I moved on.
> 1. A TCP socket client/server pair running on 2 Linux guests, so that I can
> control the traffic and debug the code as I want.
> 2. Netperf on a Linux guest to test the throughput.
> 3. WHQL test with 2 Windows guests.
>
> Current status:
> IPv4 passes all the above tests.
> IPv6 has only passed test steps 1 and 2 described above. The virtio nic cannot
> receive any packets in the WHQL test; debugging on the host side shows all the
> packets have been pushed to the vring. After replacing it with a Linux guest, I
> added 10 extra packets before sending out the real packet, and tcpdump running
> on the guest captured only 6 packets. I haven't found the root cause yet and
> will continue working on this.

Maybe you can try dropmonitor [1] in both host and guest to find the
reason of packet dropping.

[1] ./perf script net_dropmonitor
>
> Note:
> A 'MessageDevice' nic chosen as 'Realtek' will sometimes panic the system
> during setup; this can be worked around by replacing it with an 'e1000' nic.
>
> Pending issues & Todo list:
> 1. Dup ack count not added in the virtio_net_hdr, but WHQL test case passes,
> looks like a bug in test case.
> 2. Missing a Feature Bit
> 3. Missing a few tcp/ip handling
> ECN change.
> TCP window scale.
>
> Wei Xu (10):
>   virtio-net rsc: Data structure, 'Segment', 'Chain' and 'Status'
>   virtio-net rsc: Initilize & Cleanup
>   virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4
>   virtio-net rsc: Detailed IPv4 and General TCP data coalescing
>   virtio-net rsc: Create timer to drain the packets from the cache pool
>   virtio-net rsc: IPv4 checksum
>   virtio-net rsc: Checking TCP flag and drain specific connection
> packets
>   virtio-net rsc: Sanity check & More bypass cases check
>   virtio-net rsc: Add IPv6 support
>   virtio-net rsc: Add Receive Segment Coalesce statistics
>
>  hw/net/virtio-net.c| 626 
> -
>  include/hw/virtio/virtio-net.h |   1 +
>  include/hw/virtio/virtio.h |  65 +
>  3 files changed, 691 insertions(+), 1 deletion(-)
>

[Qemu-devel] [PULL 31/40] target-ppc: Rework ppc_store_slb

2016-01-31 Thread David Gibson

ppc_store_slb updates the SLB for PPC cpus with 64-bit hash MMUs.
Currently it takes two parameters, which contain values encoded as the
register arguments to the slbmte instruction, one register contains the
ESID portion of the SLBE and also the slot number, the other contains the
VSID portion of the SLBE.

We're shortly going to want to do some SLB updates from other code where
it is more convenient to supply the slot number and ESID separately, so
rework this function and its callers to work this way.

As a bonus, this slightly simplifies the emulation of segment registers for
when running a 32-bit OS on a 64-bit CPU.
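
The split performed by the reworked callers can be sketched in isolation.
This is a minimal illustration assuming, as in the hunks below, that the
slot number occupies the low 12 bits of the slbmte RB operand; the names
SketchRb, sketch_split_rb() and sketch_join_rb() are hypothetical and do
not exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: the low 12 bits of RB hold the SLB slot number. */
#define SKETCH_SLOT_MASK 0xfffULL

typedef struct {
    uint64_t slot;  /* index into the SLB */
    uint64_t esid;  /* ESID portion, including the valid bit */
} SketchRb;

/* Split an slbmte-style RB value into separate slot and ESID arguments. */
static SketchRb sketch_split_rb(uint64_t rb)
{
    SketchRb out;

    out.slot = rb & SKETCH_SLOT_MASK;
    out.esid = rb & ~SKETCH_SLOT_MASK;
    return out;
}

/* Inverse operation, for callers that still need the register encoding. */
static uint64_t sketch_join_rb(uint64_t slot, uint64_t esid)
{
    assert((slot & ~SKETCH_SLOT_MASK) == 0);  /* slot must fit in 12 bits */
    assert((esid & SKETCH_SLOT_MASK) == 0);   /* esid must not overlap it */
    return esid | slot;
}
```

Code that already knows the slot and ESID separately can then skip the
packing step altogether, which is the convenience the commit message is
after.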

Signed-off-by: David Gibson 
Reviewed-by: Laurent Vivier 
Acked-by: Benjamin Herrenschmidt 
Reviewed-by: Alexander Graf 
---
 target-ppc/kvm.c|  2 +-
 target-ppc/mmu-hash64.c | 24 +---
 target-ppc/mmu-hash64.h |  3 ++-
 target-ppc/mmu_helper.c | 14 +-
 4 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 3e61fcd..70ca296 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1205,7 +1205,7 @@ int kvm_arch_get_registers(CPUState *cs)
  * Only restore valid entries
  */
 if (rb & SLB_ESID_V) {
-ppc_store_slb(cpu, rb, rs);
+ppc_store_slb(cpu, rb & 0xfff, rb & ~0xfffULL, rs);
 }
 }
 #endif
diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index 8648408..788725c 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -136,28 +136,30 @@ void helper_slbie(CPUPPCState *env, target_ulong addr)
 }
 }
 
-int ppc_store_slb(PowerPCCPU *cpu, target_ulong rb, target_ulong rs)
+int ppc_store_slb(PowerPCCPU *cpu, target_ulong slot,
+  target_ulong esid, target_ulong vsid)
 {
 CPUPPCState *env = &cpu->env;
-int slot = rb & 0xfff;
 ppc_slb_t *slb = &env->slb[slot];
 
-if (rb & (0x1000 - env->slb_nr)) {
-return -1; /* Reserved bits set or slot too high */
+if (slot >= env->slb_nr) {
+return -1; /* Bad slot number */
+}
+if (esid & ~(SLB_ESID_ESID | SLB_ESID_V)) {
+return -1; /* Reserved bits set */
 }
-if (rs & (SLB_VSID_B & ~SLB_VSID_B_1T)) {
+if (vsid & (SLB_VSID_B & ~SLB_VSID_B_1T)) {
 return -1; /* Bad segment size */
 }
-if ((rs & SLB_VSID_B) && !(env->mmu_model & POWERPC_MMU_1TSEG)) {
+if ((vsid & SLB_VSID_B) && !(env->mmu_model & POWERPC_MMU_1TSEG)) {
 return -1; /* 1T segment on MMU that doesn't support it */
 }
 
-/* Mask out the slot number as we store the entry */
-slb->esid = rb & (SLB_ESID_ESID | SLB_ESID_V);
-slb->vsid = rs;
+slb->esid = esid;
+slb->vsid = vsid;
 
 LOG_SLB("%s: %d " TARGET_FMT_lx " - " TARGET_FMT_lx " => %016" PRIx64
-" %016" PRIx64 "\n", __func__, slot, rb, rs,
+" %016" PRIx64 "\n", __func__, slot, esid, vsid,
 slb->esid, slb->vsid);
 
 return 0;
@@ -197,7 +199,7 @@ void helper_store_slb(CPUPPCState *env, target_ulong rb, 
target_ulong rs)
 {
 PowerPCCPU *cpu = ppc_env_get_cpu(env);
 
-if (ppc_store_slb(cpu, rb, rs) < 0) {
+if (ppc_store_slb(cpu, rb & 0xfff, rb & ~0xfffULL, rs) < 0) {
 helper_raise_exception_err(env, POWERPC_EXCP_PROGRAM,
POWERPC_EXCP_INVAL);
 }
diff --git a/target-ppc/mmu-hash64.h b/target-ppc/mmu-hash64.h
index 6e3de7e..24fd2c4 100644
--- a/target-ppc/mmu-hash64.h
+++ b/target-ppc/mmu-hash64.h
@@ -6,7 +6,8 @@
 #ifdef TARGET_PPC64
 void ppc_hash64_check_page_sizes(PowerPCCPU *cpu, Error **errp);
 void dump_slb(FILE *f, fprintf_function cpu_fprintf, PowerPCCPU *cpu);
-int ppc_store_slb(PowerPCCPU *cpu, target_ulong rb, target_ulong rs);
+int ppc_store_slb(PowerPCCPU *cpu, target_ulong slot,
+  target_ulong esid, target_ulong vsid);
 hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, target_ulong addr);
 int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, target_ulong address, int rw,
 int mmu_idx);
diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c
index 2446bba..7277889 100644
--- a/target-ppc/mmu_helper.c
+++ b/target-ppc/mmu_helper.c
@@ -2089,21 +2089,17 @@ void helper_store_sr(CPUPPCState *env, target_ulong 
srnum, target_ulong value)
 (int)srnum, value, env->sr[srnum]);
 #if defined(TARGET_PPC64)
 if (env->mmu_model & POWERPC_MMU_64) {
-uint64_t rb = 0, rs = 0;
+uint64_t esid, vsid;
 
 /* ESID = srnum */
-rb |= ((uint32_t)srnum & 0xf) << 28;
-/* Set the valid bit */
-rb |= SLB_ESID_V;
-/* Index = ESID */
-rb |= (uint32_t)srnum;
+esid = ((uint64_t)(srnum & 0xf) << 28) | SLB_ESID_V;
 
 /* VSID = VSID */
-rs |= (value & 0xfff) << 12;
+vsid = (value & 0xfff)

[Qemu-devel] [PULL 16/40] pseries: Clean up error handling in xics_system_init()

2016-01-31 Thread David Gibson

Use the error handling infrastructure to pass an error out from
try_create_xics() instead of assuming &error_abort - the caller is in a
better position to decide on error handling policy.

Also change the error handling from an &error_abort to &error_fatal, since
this occurs during the initial machine construction and could be triggered
by bad configuration rather than a program error.

Signed-off-by: David Gibson 
Reviewed-by: Thomas Huth 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Markus Armbruster 
---
 hw/ppc/spapr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3f90e50..1281e07 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -112,7 +112,7 @@ static XICSState *try_create_xics(const char *type, int 
nr_servers,
 }
 
 static XICSState *xics_system_init(MachineState *machine,
-   int nr_servers, int nr_irqs)
+   int nr_servers, int nr_irqs, Error **errp)
 {
 XICSState *icp = NULL;
 
@@ -131,7 +131,7 @@ static XICSState *xics_system_init(MachineState *machine,
 }
 
 if (!icp) {
-icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs, &error_abort);
+icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs, errp);
 }
 
 return icp;
@@ -1813,7 +1813,7 @@ static void ppc_spapr_init(MachineState *machine)
 spapr->icp = xics_system_init(machine,
   DIV_ROUND_UP(max_cpus * kvmppc_smt_threads(),
smp_threads),
-  XICS_IRQS);
+  XICS_IRQS, &error_fatal);
 
 if (smc->dr_lmb_enabled) {
 spapr_validate_node_memory(machine, &error_fatal);
-- 
2.5.0

[Qemu-devel] [PULL 20/40] target-ppc: rename and export maybe_bswap_register()

2016-01-31 Thread David Gibson

From: Greg Kurz 

This helper will be used to support FP, Altivec and VSX registers when
the guest is little-endian.
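
What such a helper does can be sketched independently of QEMU.  This is a
minimal illustration, assuming the buffer was filled in big-endian order
and must be reversed when the guest currently runs little-endian; the
name sketch_maybe_bswap() and its boolean parameter are hypothetical, and
the real helper derives the guest mode from the CPU state instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bytes of a register image in place, but only when the
 * guest is little-endian; a big-endian guest keeps the buffer as is. */
static void sketch_maybe_bswap(bool guest_is_le, uint8_t *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);
    if (!guest_is_le) {
        return;
    }
    for (i = 0; i < len / 2; i++) {
        uint8_t tmp = buf[i];

        buf[i] = buf[len - 1 - i];
        buf[len - 1 - i] = tmp;
    }
}
```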

Signed-off-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 target-ppc/cpu.h |  1 +
 target-ppc/gdbstub.c | 10 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index b3b89e6..2bc96b4 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -2355,4 +2355,5 @@ int ppc_get_vcpu_dt_id(PowerPCCPU *cpu);
  */
 PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id);
 
+void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len);
 #endif /* !defined (__CPU_PPC_H__) */
diff --git a/target-ppc/gdbstub.c b/target-ppc/gdbstub.c
index ef4be23..569c380 100644
--- a/target-ppc/gdbstub.c
+++ b/target-ppc/gdbstub.c
@@ -88,7 +88,7 @@ static int ppc_gdb_register_len(int n)
the proper ordering for the binary, and cannot be changed.
For system mode, TARGET_WORDS_BIGENDIAN is always set, and we must check
the current mode of the chip to see if we're running in little-endian.  */
-static void maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len)
+void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len)
 {
 #ifndef CONFIG_USER_ONLY
 if (!msr_le) {
@@ -158,7 +158,7 @@ int ppc_cpu_gdb_read_register(CPUState *cs, uint8_t 
*mem_buf, int n)
 break;
 }
 }
-maybe_bswap_register(env, mem_buf, r);
+ppc_maybe_bswap_register(env, mem_buf, r);
 return r;
 }
 
@@ -214,7 +214,7 @@ int ppc_cpu_gdb_read_register_apple(CPUState *cs, uint8_t 
*mem_buf, int n)
 break;
 }
 }
-maybe_bswap_register(env, mem_buf, r);
+ppc_maybe_bswap_register(env, mem_buf, r);
 return r;
 }
 
@@ -227,7 +227,7 @@ int ppc_cpu_gdb_write_register(CPUState *cs, uint8_t 
*mem_buf, int n)
 if (!r) {
 return r;
 }
-maybe_bswap_register(env, mem_buf, r);
+ppc_maybe_bswap_register(env, mem_buf, r);
 if (n < 32) {
 /* gprs */
 env->gpr[n] = ldtul_p(mem_buf);
@@ -277,7 +277,7 @@ int ppc_cpu_gdb_write_register_apple(CPUState *cs, uint8_t 
*mem_buf, int n)
 if (!r) {
 return r;
 }
-maybe_bswap_register(env, mem_buf, r);
+ppc_maybe_bswap_register(env, mem_buf, r);
 if (n < 32) {
 /* gprs */
 env->gpr[n] = ldq_p(mem_buf);
-- 
2.5.0

[Qemu-devel] [PULL 34/40] target-ppc: Remove unused mmu models from ppc_tlb_invalidate_one

2016-01-31 Thread David Gibson

ppc_tlb_invalidate_one() has a big switch handling many different MMU
types.  However, most of those branches can never be reached:

It is called from 3 places: from remove_hpte() and h_protect() in
spapr_hcall.c (which always has a 64-bit hash MMU type), and from
helper_tlbie() in mmu_helper.c.

Calls to helper_tlbie() are generated from gen_tlbie, gen_tlbiel and
gen_tlbiva.  The first two are only used with the PPC_MEM_TLBIE flag,
set only with 32-bit or 64-bit hash MMU models, and gen_tlbiva() is
used only on 440 and 460 models with the BookE mmu model.

This means the exhaustive list of MMU types which may call
ppc_tlb_invalidate_one() is: POWERPC_MMU_SOFT_6xx, POWERPC_MMU_601,
POWERPC_MMU_32B, POWERPC_MMU_SOFT_74xx, POWERPC_MMU_64B, POWERPC_MMU_2_03,
POWERPC_MMU_2_06, POWERPC_MMU_2_07 and POWERPC_MMU_BOOKE.

Clean up by removing logic for all other MMU types from
ppc_tlb_invalidate_one().

This means that ppc4xx_tlb_invalidate_virt() now has no callers, or rather,
makes it obvious that it has no callers.  So, we remove that function as
well.

Signed-off-by: David Gibson 
---
 target-ppc/mmu_helper.c | 46 ++
 1 file changed, 2 insertions(+), 44 deletions(-)

diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c
index 7277889..4343cb2 100644
--- a/target-ppc/mmu_helper.c
+++ b/target-ppc/mmu_helper.c
@@ -658,32 +658,6 @@ static inline void ppc4xx_tlb_invalidate_all(CPUPPCState 
*env)
 tlb_flush(CPU(cpu), 1);
 }
 
-static inline void ppc4xx_tlb_invalidate_virt(CPUPPCState *env,
-  target_ulong eaddr, uint32_t pid)
-{
-#if !defined(FLUSH_ALL_TLBS)
-CPUState *cs = CPU(ppc_env_get_cpu(env));
-ppcemb_tlb_t *tlb;
-hwaddr raddr;
-target_ulong page, end;
-int i;
-
-for (i = 0; i < env->nb_tlb; i++) {
-tlb = &env->tlb.tlbe[i];
-if (ppcemb_tlb_check(env, tlb, &raddr, eaddr, pid, 0, i) == 0) {
-end = tlb->EPN + tlb->size;
-for (page = tlb->EPN; page < end; page += TARGET_PAGE_SIZE) {
-tlb_flush_page(cs, page);
-}
-tlb->prot &= ~PAGE_VALID;
-break;
-}
-}
-#else
-ppc4xx_tlb_invalidate_all(env);
-#endif
-}
-
 static int mmu40x_get_physical_address(CPUPPCState *env, mmu_ctx_t *ctx,
target_ulong address, int rw,
int access_type)
@@ -1972,25 +1946,10 @@ void ppc_tlb_invalidate_one(CPUPPCState *env, 
target_ulong addr)
 ppc6xx_tlb_invalidate_virt(env, addr, 1);
 }
 break;
-case POWERPC_MMU_SOFT_4xx:
-case POWERPC_MMU_SOFT_4xx_Z:
-ppc4xx_tlb_invalidate_virt(env, addr, env->spr[SPR_40x_PID]);
-break;
-case POWERPC_MMU_REAL:
-cpu_abort(CPU(cpu), "No TLB for PowerPC 4xx in real mode\n");
-break;
-case POWERPC_MMU_MPC8xx:
-/* XXX: TODO */
-cpu_abort(CPU(cpu), "MPC8xx MMU model is not implemented\n");
-break;
 case POWERPC_MMU_BOOKE:
 /* XXX: TODO */
 cpu_abort(CPU(cpu), "BookE MMU model is not implemented\n");
 break;
-case POWERPC_MMU_BOOKE206:
-/* XXX: TODO */
-cpu_abort(CPU(cpu), "BookE 2.06 MMU model is not implemented\n");
-break;
 case POWERPC_MMU_32B:
 case POWERPC_MMU_601:
 /* tlbie invalidate TLBs for all segments */
@@ -2032,9 +1991,8 @@ void ppc_tlb_invalidate_one(CPUPPCState *env, 
target_ulong addr)
 break;
 #endif /* defined(TARGET_PPC64) */
 default:
-/* XXX: TODO */
-cpu_abort(CPU(cpu), "Unknown MMU model\n");
-break;
+/* Should never reach here with other MMU models */
+assert(0);
 }
 #else
 ppc_tlb_invalidate_all(env);
-- 
2.5.0

[Qemu-devel] [PULL 21/40] target-ppc: gdbstub: fix float registers for little-endian guests

2016-01-31 Thread David Gibson

From: Greg Kurz 

Let's reuse the ppc_maybe_bswap_register() helper, like we already do
with the general registers.

Signed-off-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 target-ppc/translate_init.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 78c2811..031c71e 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8754,10 +8754,12 @@ static int gdb_get_float_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 {
 if (n < 32) {
 stfq_p(mem_buf, env->fpr[n]);
+ppc_maybe_bswap_register(env, mem_buf, 8);
 return 8;
 }
 if (n == 32) {
 stl_p(mem_buf, env->fpscr);
+ppc_maybe_bswap_register(env, mem_buf, 4);
 return 4;
 }
 return 0;
@@ -8766,10 +8768,12 @@ static int gdb_get_float_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 static int gdb_set_float_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
 if (n < 32) {
+ppc_maybe_bswap_register(env, mem_buf, 8);
 env->fpr[n] = ldfq_p(mem_buf);
 return 8;
 }
 if (n == 32) {
+ppc_maybe_bswap_register(env, mem_buf, 4);
 helper_store_fpscr(env, ldl_p(mem_buf), 0x);
 return 4;
 }
-- 
2.5.0

[Qemu-devel] [PULL 40/40] target-ppc: mcrfs should always update FEX/VX and only clear exception bits

2016-01-31 Thread David Gibson

From: James Clarke 

Here is the description of the mcrfs instruction from the PowerPC Architecture
Book, Version 2.02, Book I: PowerPC User Instruction Set Architecture
(http://www.ibm.com/developerworks/systems/library/es-archguide-v2.html), found
on page 120:

The contents of FPSCR field BFA are copied to Condition Register field BF.
All exception bits copied are set to 0 in the FPSCR. If the FX bit is
copied, it is set to 0 in the FPSCR.

Special Registers Altered:
CR field BF
FX OX                    (if BFA=0)
UX ZX XX VXSNAN          (if BFA=1)
VXISI VXIDI VXZDZ VXIMZ  (if BFA=2)
VXVC                     (if BFA=3)
VXSOFT VXSQRT VXCVI      (if BFA=5)

However, currently every bit in FPSCR field BFA is set to 0, including ones not
on that list.

This can be seen in the following simple C program:

#include <fenv.h>
#include <stdio.h>

int main(int argc, char **argv) {
int ret;
ret = fegetround();
printf("Current rounding: %d\n", ret);
ret = fesetround(FE_UPWARD);
printf("Setting to FE_UPWARD (%d): %d\n", FE_UPWARD, ret);
ret = fegetround();
printf("Current rounding: %d\n", ret);
ret = fegetround();
printf("Current rounding: %d\n", ret);
return 0;
}

which gave the output (before this commit):

Current rounding: 0
Setting to FE_UPWARD (2): 0
Current rounding: 2
Current rounding: 0

instead of (after this commit):

Current rounding: 0
Setting to FE_UPWARD (2): 0
Current rounding: 2
Current rounding: 2

The relevant disassembly is in fegetround(), which, on my system, is:

__GI___fegetround:
<+0>:   mcrfs  cr7, cr7
<+4>:   mfcr   r3
<+8>:   clrldi r3, r3, 62
<+12>:  blr

What happens is that, the first time fegetround() is called, FPSCR field 7 is
retrieved. However, because of the bug in mcrfs, the entirety of field 7 is set
to 0, which includes the rounding mode.

There are other issues this will fix, such as condition flags not persisting
when they should if read, and if you were to read a specific field with some
exception bits set, but no others were set in the entire register, then the
bits would be cleared correctly, but FEX/VX would not be updated to 0 as they
should be.

Signed-off-by: James Clarke 
Signed-off-by: David Gibson 
---
 target-ppc/cpu.h   |  6 ++
 target-ppc/translate.c | 21 +
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index f300c86..892f4dc 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -719,6 +719,12 @@ enum {
 #define FP_RN1 (1ull << FPSCR_RN1)
 #define FP_RN  (1ull << FPSCR_RN)
 
+/* the exception bits which can be cleared by mcrfs - includes FX */
+#define FP_EX_CLEAR_BITS (FP_FX | FP_OX | FP_UX | FP_ZX | \
+  FP_XX | FP_VXSNAN | FP_VXISI  | FP_VXIDI  | \
+  FP_VXZDZ  | FP_VXIMZ  | FP_VXVC   | FP_VXSOFT | \
+  FP_VXSQRT | FP_VXCVI)
+
 /*/
 /* Vector status and control register */
 #define VSCR_NJ 16 /* Vector non-java */
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 0219d38..7db3145 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -2501,18 +2501,31 @@ static void gen_fmrgow(DisasContext *ctx)
 static void gen_mcrfs(DisasContext *ctx)
 {
 TCGv tmp = tcg_temp_new();
+TCGv_i32 tmask;
+TCGv_i64 tnew_fpscr = tcg_temp_new_i64();
 int bfa;
+int nibble;
+int shift;
 
 if (unlikely(!ctx->fpu_enabled)) {
 gen_exception(ctx, POWERPC_EXCP_FPU);
 return;
 }
-bfa = 4 * (7 - crfS(ctx->opcode));
-tcg_gen_shri_tl(tmp, cpu_fpscr, bfa);
+bfa = crfS(ctx->opcode);
+nibble = 7 - bfa;
+shift = 4 * nibble;
+tcg_gen_shri_tl(tmp, cpu_fpscr, shift);
 tcg_gen_trunc_tl_i32(cpu_crf[crfD(ctx->opcode)], tmp);
-tcg_temp_free(tmp);
 tcg_gen_andi_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfD(ctx->opcode)], 
0xf);
-tcg_gen_andi_tl(cpu_fpscr, cpu_fpscr, ~(0xF << bfa));
+tcg_temp_free(tmp);
+tcg_gen_extu_tl_i64(tnew_fpscr, cpu_fpscr);
+/* Only the exception bits (including FX) should be cleared if read */
+tcg_gen_andi_i64(tnew_fpscr, tnew_fpscr, ~((0xF << shift) & 
FP_EX_CLEAR_BITS));
+/* FEX and VX need to be updated, so don't set fpscr directly */
+tmask = tcg_const_i32(1 << nibble);
+gen_helper_store_fpscr(cpu_env, tnew_fpscr, tmask);
+tcg_temp_free_i32(tmask);
+tcg_temp_free_i64(tnew_fpscr);
 }
 
 /* mffs */
-- 
2.5.0

[Qemu-devel] [PULL 32/40] target-ppc: Rework SLB page size lookup

2016-01-31 Thread David Gibson

Currently, the ppc_hash64_page_shift() function looks up a page size based
on information in an SLB entry.  It open codes the bit translation for
existing CPUs, however different CPU models can have different SLB
encodings.  We already store those in the 'sps' table in CPUPPCState, but
we don't currently enforce that that actually matches the logic in
ppc_hash64_page_shift.

This patch reworks lookup of page size from SLB in several ways:
  * ppc_store_slb() will now fail (triggering an illegal instruction
exception) if given a bad SLB page size encoding
  * On success ppc_store_slb() stores a pointer to the relevant entry in
the page size table in the SLB entry.  This is looked up directly from
the published table of page size encodings, so can't get out of sync.
  * ppc_hash64_htab_lookup() and others now use this precached page size
information rather than decoding the SLB values
  * Now that callers have easy access to the page_shift,
ppc_hash64_pte_raddr() amounts to just a deposit64(), so remove it and
have the callers use deposit64() directly.
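
The table lookup described in the list above can be sketched on its own.
This is a minimal illustration: SketchSegPageSize, sketch_sps,
sketch_lookup_sps() and the mask and encoding values are assumptions made
for the example, not the tables QEMU actually publishes for any CPU model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mask covering the page size (L and LP) bits of the VSID word. */
#define SK_LLP_MASK 0x130ULL

typedef struct {
    unsigned page_shift;  /* log2 of the page size; 0 ends the table */
    uint64_t slb_enc;     /* encoding expected under SK_LLP_MASK */
} SketchSegPageSize;

/* Illustrative table only: 4kB, 64kB and 16MB entries, then a terminator. */
static const SketchSegPageSize sketch_sps[] = {
    { 12, 0x000 },
    { 16, 0x110 },
    { 24, 0x100 },
    { 0, 0 },
};

/* Return the table entry matching the page size bits of vsid, or NULL
 * when the encoding is not one the table supports. */
static const SketchSegPageSize *
sketch_lookup_sps(const SketchSegPageSize *table, size_t max_entries,
                  uint64_t vsid)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < max_entries; i++) {
        if (!table[i].page_shift) {
            break;              /* reached the terminator */
        }
        if ((vsid & SK_LLP_MASK) == table[i].slb_enc) {
            return &table[i];
        }
    }
    return NULL;
}
```

Caching the returned pointer alongside the entry means later lookups can
read page_shift directly instead of decoding the VSID bits again, and a
NULL result gives the store path a natural place to fail.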

Signed-off-by: David Gibson 
Acked-by: Benjamin Herrenschmidt 
Reviewed-by: Alexander Graf 
---
 target-ppc/cpu.h|  1 +
 target-ppc/machine.c| 20 +
 target-ppc/mmu-hash64.c | 74 +++--
 3 files changed, 56 insertions(+), 39 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 2bc96b4..0820390 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -419,6 +419,7 @@ typedef struct ppc_slb_t ppc_slb_t;
 struct ppc_slb_t {
 uint64_t esid;
 uint64_t vsid;
+const struct ppc_one_seg_page_size *sps;
 };
 
 #define MAX_SLB_ENTRIES 64
diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index 8cabc77..692121e 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -3,6 +3,7 @@
 #include "hw/boards.h"
 #include "sysemu/kvm.h"
 #include "helper_regs.h"
+#include "mmu-hash64.h"
 
 static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
 {
@@ -353,11 +354,30 @@ static bool slb_needed(void *opaque)
 return (cpu->env.mmu_model & POWERPC_MMU_64);
 }
 
+static int slb_post_load(void *opaque, int version_id)
+{
+PowerPCCPU *cpu = opaque;
+CPUPPCState *env = &cpu->env;
+int i;
+
+/* We've pulled in the raw esid and vsid values from the migration
+ * stream, but we need to recompute the page size pointers */
+for (i = 0; i < env->slb_nr; i++) {
+if (ppc_store_slb(cpu, i, env->slb[i].esid, env->slb[i].vsid) < 0) {
+/* Migration source had bad values in its SLB */
+return -1;
+}
+}
+
+return 0;
+}
+
 static const VMStateDescription vmstate_slb = {
 .name = "cpu/slb",
 .version_id = 1,
 .minimum_version_id = 1,
 .needed = slb_needed,
+.post_load = slb_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_INT32_EQUAL(env.slb_nr, PowerPCCPU),
 VMSTATE_SLB_ARRAY(env.slb, PowerPCCPU, MAX_SLB_ENTRIES),
diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index 788725c..9ad02f3 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "exec/helper-proto.h"
+#include "qemu/error-report.h"
 #include "sysemu/kvm.h"
 #include "kvm_ppc.h"
 #include "mmu-hash64.h"
@@ -141,6 +142,8 @@ int ppc_store_slb(PowerPCCPU *cpu, target_ulong slot,
 {
 CPUPPCState *env = &cpu->env;
 ppc_slb_t *slb = &env->slb[slot];
+const struct ppc_one_seg_page_size *sps = NULL;
+int i;
 
 if (slot >= env->slb_nr) {
 return -1; /* Bad slot number */
@@ -155,8 +158,29 @@ int ppc_store_slb(PowerPCCPU *cpu, target_ulong slot,
 return -1; /* 1T segment on MMU that doesn't support it */
 }
 
+for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
+const struct ppc_one_seg_page_size *sps1 = &env->sps.sps[i];
+
+if (!sps1->page_shift) {
+break;
+}
+
+if ((vsid & SLB_VSID_LLP_MASK) == sps1->slb_enc) {
+sps = sps1;
+break;
+}
+}
+
+if (!sps) {
+error_report("Bad page size encoding in SLB store: slot "TARGET_FMT_lu
+ " esid 0x"TARGET_FMT_lx" vsid 0x"TARGET_FMT_lx,
+ slot, esid, vsid);
+return -1;
+}
+
 slb->esid = esid;
 slb->vsid = vsid;
+slb->sps = sps;
 
 LOG_SLB("%s: %d " TARGET_FMT_lx " - " TARGET_FMT_lx " => %016" PRIx64
 " %016" PRIx64 "\n", __func__, slot, esid, vsid,
@@ -395,24 +419,6 @@ static hwaddr ppc_hash64_pteg_search(PowerPCCPU *cpu, hwaddr hash,
 return -1;
 }
 
-static uint64_t ppc_hash64_page_shift(ppc_slb_t *slb)
-{
-uint64_t epnshift;
-
-/* Page size according to the SLB, which we use to generate the
- * EPN for hash table lookup..  When we implement more recent MMU
- * extensions this might

Re: [Qemu-devel] [PATCH] dimm: Correct type of MemoryHotplugState->base

2016-01-31 Thread David Gibson

On Fri, Jan 22, 2016 at 03:32:52PM +0100, Igor Mammedov wrote:
> On Fri, 22 Jan 2016 15:21:05 +0100
> Paolo Bonzini  wrote:
> 
> > On 22/01/2016 11:02, Igor Mammedov wrote:
> > > On Thu, 21 Jan 2016 12:37:51 +1100
> > > David Gibson  wrote:
> > >   
> > > >> The 'base' field of MemoryHotplugState is ram_addr_t, which indicates
> > > >> that it exists in the abstract address space of RAM regions.
> > > >>
> > > >> However, the actual usage of this field indicates that it is a concrete
> > > >> physical address (it's passed as an offset to
> > > >> memory_region_add_subregion, for example).
> > >>
> > >> So, correct its type to 'hwaddr'.
> > >>
> > >> Signed-off-by: David Gibson 
> > >> ---
> > >>  include/hw/mem/pc-dimm.h | 2 +-
> > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > >>
> > >> diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
> > >> index d83bf30..218dfb0 100644
> > >> --- a/include/hw/mem/pc-dimm.h
> > >> +++ b/include/hw/mem/pc-dimm.h
> > >> @@ -77,7 +77,7 @@ typedef struct PCDIMMDeviceClass {
> > >>   * @mr: hotplug memory address space container
> > >>   */
> > >>  typedef struct MemoryHotplugState {
> > >> -ram_addr_t base;
> > >> +hwaddr base;
> > >>  MemoryRegion mr;
> > >>  } MemoryHotplugState;
> > >>
> > > 
> > > I agree with this fix but that's not the only place where
> > > ram_addr_t needs to be replaced with hwaddr.
> > > > For example, the type of the MachineState.[max]ram_size fields needs
> > > > to be changed as well, because QEMU builds without CONFIG_XEN_BACKEND
> > > > on 32-bit hosts are broken: ram_addr_t is 32 bits there, while some
> > > > targets assume it is 64 bits wide and use it as such.
> > 
> > But on a 32-bit system without CONFIG_XEN_BACKEND you cannot allocate
> > more than 4G anyway, so the choice of ram_addr_t is understandable in
> > that case.
> QEMU build will probably fail with above config but if it succeeds
> then maxmem will be silently truncated.
> 
> > 
> > On the other hand, on a 32-bit system without CONFIG_XEN_BACKEND you
> > definitely can place 128M of hot plugged memory between say 4096MB and
> > 4224MB.
> True.
> 
> Anyway for this patch
> Reviewed-by: Igor Mammedov 
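
To make the truncation concern above concrete, here is a small sketch. The names `narrow_addr_t` and `wide_addr_t` are hypothetical stand-ins for a 32-bit ram_addr_t and a 64-bit hwaddr; they are not the QEMU typedefs, and the helpers exist only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t narrow_addr_t;   /* stand-in for a 32-bit ram_addr_t */
typedef uint64_t wide_addr_t;     /* stand-in for a 64-bit hwaddr */

#define MiB (1024ULL * 1024ULL)

/* Storing into the narrow type silently drops the upper 32 bits. */
static narrow_addr_t store_narrow(wide_addr_t addr)
{
    return (narrow_addr_t)addr;
}

/* True if the address survives a round trip through the narrow type. */
static bool fits_narrow(wide_addr_t addr)
{
    return (wide_addr_t)store_narrow(addr) == addr;
}
```

With these definitions, a hotplug region based at 4096 MiB does not fit the narrow type and its base would wrap to 0, which is the kind of silent truncation discussed above.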

Who needs to take this patch?  I'm not sure if I need to do anything
further to push it forwards.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [RFC Patch v2 02/10] virtio-net rsc: Initilize & Cleanup

2016-01-31 Thread Jason Wang



On 02/01/2016 02:13 AM, w...@redhat.com wrote:
> From: Wei Xu 
>
> The chain list is initialized when the device is getting realized,
> and the entry of the chain will be inserted dynamically according
> to protocol type of the network traffic.
>
> All the buffered packets and chain will be destroyed when the
> device is going to be unrealized.
>
> Signed-off-by: Wei Xu 
> ---
>  hw/net/virtio-net.c| 22 ++
>  include/hw/virtio/virtio-net.h |  1 +
>  2 files changed, 23 insertions(+)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index a877614..4e9458e 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1603,6 +1603,26 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>  return 0;
>  }
>  
> +
> +static void virtio_net_rsc_cleanup(VirtIONet *n)
> +{
> +NetRscChain *chain, *rn_chain;
> +NetRscSeg *seg, *rn_seg;
> +
> +QTAILQ_FOREACH_SAFE(chain, &n->rsc_chains, next, rn_chain) {
> +QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn_seg) {
> +QTAILQ_REMOVE(&chain->buffers, seg, next);
> +g_free(seg->buf);
> +g_free(seg);
> +
> +timer_del(chain->drain_timer);
> +timer_free(chain->drain_timer);
> +QTAILQ_REMOVE(&n->rsc_chains, chain, next);
> +g_free(chain);
> +}

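This looks suspicious: the chain removal (timer teardown, QTAILQ_REMOVE and
g_free(chain)) sits inside the inner per-segment loop, so it runs once per
buffered segment and never for a chain with no segments. It should be in the
outer loop.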
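
A minimal standalone sketch of the loop structure I have in mind. It uses the BSD TAILQ macros from <sys/queue.h> rather than QEMU's QTAILQ, and the `seg`/`chain` types and the `add_chain()` helper are hypothetical stand-ins, so treat it as an illustration of the nesting only:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for the buffered segment and chain types. */
struct seg {
    void *buf;
    TAILQ_ENTRY(seg) next;
};

struct chain {
    TAILQ_HEAD(, seg) buffers;
    TAILQ_ENTRY(chain) next;
};

TAILQ_HEAD(chain_list, chain);

/* Free every chain; returns the number of chains released. */
static int cleanup_chains(struct chain_list *chains)
{
    struct chain *c;
    struct seg *s;
    int freed = 0;

    while ((c = TAILQ_FIRST(chains)) != NULL) {
        /* Inner loop: release only the segments buffered on this chain. */
        while ((s = TAILQ_FIRST(&c->buffers)) != NULL) {
            TAILQ_REMOVE(&c->buffers, s, next);
            free(s->buf);
            free(s);
        }
        /* Outer loop: the chain itself goes away exactly once
         * (a real version would also delete and free its timer here). */
        TAILQ_REMOVE(chains, c, next);
        free(c);
        freed++;
    }
    return freed;
}

/* Helper for the example: append a chain holding nsegs segments. */
static void add_chain(struct chain_list *chains, int nsegs)
{
    struct chain *c = calloc(1, sizeof(*c));
    assert(c != NULL);
    TAILQ_INIT(&c->buffers);
    for (int i = 0; i < nsegs; i++) {
        struct seg *s = calloc(1, sizeof(*s));
        assert(s != NULL);
        s->buf = malloc(16);
        TAILQ_INSERT_TAIL(&c->buffers, s, next);
    }
    TAILQ_INSERT_TAIL(chains, c, next);
}
```

With this nesting, a chain that has no buffered segments is still freed, and no chain is freed while its segment list is still being walked.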

> +}
> +}
> +
>  static NetClientInfo net_virtio_info = {
>  .type = NET_CLIENT_OPTIONS_KIND_NIC,
>  .size = sizeof(NICState),
> @@ -1732,6 +1752,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>  nc = qemu_get_queue(n->nic);
>  nc->rxfilter_notify_enabled = 1;
>  
> +QTAILQ_INIT(&n->rsc_chains);
>  n->qdev = dev;
>  register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
>  virtio_net_save, virtio_net_load, n);
> @@ -1766,6 +1787,7 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
>  g_free(n->vqs);
>  qemu_del_nic(n->nic);
>  virtio_cleanup(vdev);
> +virtio_net_rsc_cleanup(n);
>  }
>  
>  static void virtio_net_instance_init(Object *obj)
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index f3cc25f..6ce8b93 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -59,6 +59,7 @@ typedef struct VirtIONet {
>  VirtIONetQueue *vqs;
>  VirtQueue *ctrl_vq;
>  NICState *nic;
> +QTAILQ_HEAD(, NetRscChain) rsc_chains;
>  uint32_t tx_timeout;
>  int32_t tx_burst;
>  uint32_t has_vnet_hdr;

Re: [Qemu-devel] [RFC v2 0/10] Support Receive-Segment-Offload(RSC) for WHQL test of Window guest

2016-01-31 Thread Wei Xu


On 02/01/2016 11:23 AM, Jason Wang wrote:


On 02/01/2016 02:13 AM, w...@redhat.com wrote:

From: Wei Xu 

Patch v2 adds a detailed commit log.

This patch is to support the WHQL test for Windows guests. The feature also
benefits other guests, as it works like the kernel's 'gro' feature with a
userspace implementation.
Feature information:
   http://msdn.microsoft.com/en-us/library/windows/hardware/jj853324

Both IPv4 and IPv6 are supported. Although userspace virtio is slower than
vhost-net, turning this feature on and disabling 'tso' on the corresponding
tap interface improves userspace virtio performance by about 30-40 percent.

Maybe you can share the numbers with us?

Sure, will paste them in.



Test steps:
Although this feature is mainly used for Windows guests, I used Linux guests to
help test the feature. To keep things simple, I tested the patch in 3 steps as
I moved on.
1. With a TCP socket client/server pair running on 2 Linux guests, so I can
control the traffic and debug the code as I want.
2. Netperf on a Linux guest to test the throughput.
3. WHQL test with 2 Windows guests.

Current status:
IPv4 passes all the above tests.
IPv6 only passes test steps 1 and 2 described above. The virtio NIC cannot
receive any packets in the WHQL test, although debugging on the host side shows
that all the packets have been pushed to the vring. After replacing the guest
with a Linux guest and adding 10 extra packets before sending out the real
packet, tcpdump running on the guest captures only 6 packets. I have not found
the root cause yet and will continue working on this.

Maybe you can try dropmonitor [1] in both host and guest to find the
reason of packet dropping.

[1] ./perf script net_dropmonitor

OK, thanks Jason.

Wei

Note:
Choosing a 'Realtek' NIC as the 'MessageDevice' sometimes panics the system
during setup; this can be worked around by replacing it with an 'e1000' NIC.

Pending issues & Todo list:
1. The dup ack count is not added to the virtio_net_hdr, but the WHQL test case
passes; this looks like a bug in the test case.
2. A feature bit is missing.
3. Some TCP/IP handling is missing:
 ECN change.
 TCP window scale.

Wei Xu (10):
   virtio-net rsc: Data structure, 'Segment', 'Chain' and 'Status'
   virtio-net rsc: Initilize & Cleanup
   virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4
   virtio-net rsc: Detailed IPv4 and General TCP data coalescing
   virtio-net rsc: Create timer to drain the packets from the cache pool
   virtio-net rsc: IPv4 checksum
   virtio-net rsc: Checking TCP flag and drain specific connection
 packets
   virtio-net rsc: Sanity check & More bypass cases check
   virtio-net rsc: Add IPv6 support
   virtio-net rsc: Add Receive Segment Coalesce statistics

  hw/net/virtio-net.c| 626 -
  include/hw/virtio/virtio-net.h |   1 +
  include/hw/virtio/virtio.h |  65 +
  3 files changed, 691 insertions(+), 1 deletion(-)
