Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/5] ppc: Improve PCR bit selection in ppc_set_compat()

2016-06-07 Thread David Gibson
On Tue, Jun 07, 2016 at 05:39:39PM +0200, Thomas Huth wrote:
> When using an older PowerISA level, all the upper compatibility
> bits have to be enabled, too. For example when we want to run
> something in PowerISA 2.05 compatibility mode on POWER8, the bit
> for 2.06 has to be set beside the bit for 2.05.
> Additionally, to make sure that we do not set bits that are not
> supported by the host, we apply a mask with the known-to-be-good
> bits here, too.
> 
> Signed-off-by: Thomas Huth 

So, this breaks compile on 32-bit targets, because the spr values are
only 32-bit there, and the PCR constants exceed that.  But
ppc_set_compat() is only actually used on 64-bit machines, so I've
added a change to #if it out for 32-bit targets.
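
As a rough illustration of the bit selection described in the quoted commit
message, here is a minimal sketch. It is not the QEMU code: the EX_-prefixed
names and the bit positions are placeholders chosen for this example, and
host_mask stands in for the host's pcr_mask. The two high "disable" bits also
show why a 32-bit SPR value cannot hold the result.

```c
#include <stdint.h>

/* Placeholder bit positions for illustration only; the real PCR_*
 * constants are defined in target-ppc/cpu.h and may differ. */
#define EX_PCR_COMPAT_2_05 (1ull << 3)
#define EX_PCR_COMPAT_2_06 (1ull << 2)
#define EX_PCR_COMPAT_2_07 (1ull << 1)
#define EX_PCR_VSX_DIS     (1ull << 62)
#define EX_PCR_TM_DIS      (1ull << 61)

/* Hypothetical stand-in for the CPU_POWERPC_LOGICAL_* values. */
enum ex_compat_level { EX_COMPAT_NONE, EX_COMPAT_2_05, EX_COMPAT_2_06 };

/* An older level sets its own bit plus the bits of every newer level,
 * then the result is masked with what the host is known to support. */
static uint64_t ex_pcr_for_compat(enum ex_compat_level level,
                                  uint64_t host_mask)
{
    uint64_t pcr;

    switch (level) {
    case EX_COMPAT_2_05:
        pcr = EX_PCR_TM_DIS | EX_PCR_VSX_DIS | EX_PCR_COMPAT_2_07 |
              EX_PCR_COMPAT_2_06 | EX_PCR_COMPAT_2_05;
        break;
    case EX_COMPAT_2_06:
        pcr = EX_PCR_TM_DIS | EX_PCR_COMPAT_2_07 | EX_PCR_COMPAT_2_06;
        break;
    default:
        pcr = 0;
        break;
    }
    return pcr & host_mask;
}
```

With a host mask that only knows the 2.05 and 2.06 bits, the 2.06 level
reduces to the single 2.06 bit; with an all-ones mask the 2.05 level yields a
value above 0xffffffff.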

> ---
>  target-ppc/translate_init.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index fa09183..ee2bc14 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -9519,24 +9519,29 @@ void ppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version, Error **errp)
>  {
>  int ret = 0;
>  CPUPPCState *env = &cpu->env;
> +PowerPCCPUClass *host_pcc;
>  
>  cpu->cpu_version = cpu_version;
>  
>  switch (cpu_version) {
>  case CPU_POWERPC_LOGICAL_2_05:
> -env->spr[SPR_PCR] = PCR_COMPAT_2_05;
> +env->spr[SPR_PCR] = PCR_TM_DIS | PCR_VSX_DIS | PCR_COMPAT_2_07 |
> +PCR_COMPAT_2_06 | PCR_COMPAT_2_05;
>  break;
>  case CPU_POWERPC_LOGICAL_2_06:
> -env->spr[SPR_PCR] = PCR_COMPAT_2_06;
> -break;
>  case CPU_POWERPC_LOGICAL_2_06_PLUS:
> -env->spr[SPR_PCR] = PCR_COMPAT_2_06;
> +env->spr[SPR_PCR] = PCR_TM_DIS | PCR_COMPAT_2_07 | PCR_COMPAT_2_06;
>  break;
>  default:
>  env->spr[SPR_PCR] = 0;
>  break;
>  }
>  
> +host_pcc = kvm_ppc_get_host_cpu_class();
> +if (host_pcc) {
> +env->spr[SPR_PCR] &= host_pcc->pcr_mask;
> +}
> +
>  if (kvm_enabled()) {
>  ret = kvmppc_set_compat(cpu, cpu->cpu_version);
>  if (ret < 0) {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH] hw/timer: Add value matching support to aspeed_timer

2016-06-07 Thread Andrew Jeffery
On Mon, 2016-06-06 at 15:01 +0100, Peter Maydell wrote:
> On 27 May 2016 at 06:08, Andrew Jeffery  wrote:
> > 
> > Value matching allows Linux to boot with CONFIG_NO_HZ_IDLE=y on the
> > palmetto-bmc machine. Two match registers are provided for each timer.
> > 
> > Signed-off-by: Andrew Jeffery 
> > ---
> > 
> > The change pulls out ptimer in favour of the regular timer infrastructure.
> > As a consequence it implements the conversions between ticks and time,
> > which feels a little tedious. Any comments there would be appreciated.
> So what would you need from ptimer to be able to implement value
> matching with it; or is ptimer just too far away from what this
> timer device needs to make that worthwhile ?

I gave expanding the ptimer API some (quick) consideration. It feels
like it might be a departure from "simple" and depending on your view a
departure from "periodic"; though an interrupt for a given match value
is at least periodic with respect to itself. In this case the hardware
supports two match values, so if we add something to the ptimer API it
would need to support an arbitrary number of values
(ptimer_{add,del}_match(...)?). In this case the hardware counts down,
but just as we're doing currently we can fix the values to give the
right behaviour.
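
To make the tick/time conversion and the down-counting match logic concrete,
here is a minimal sketch. It is an assumption-laden model, not the
aspeed_timer code or a proposed ptimer API: the ex_* names are hypothetical,
the counter is assumed to count down to zero, and a match value only fires
when it lies strictly below the current count.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a tick count to nanoseconds for a timer running at freq_hz.
 * The 64-bit intermediate cannot overflow for 32-bit tick counts. */
static uint64_t ex_ticks_to_ns(uint32_t ticks, uint32_t freq_hz)
{
    return (uint64_t)ticks * 1000000000ull / freq_hz;
}

/* For a down-counting timer currently at 'now', return the number of
 * ticks until the next event: either the nearest match value strictly
 * below 'now', or expiry at zero if no such match value exists. */
static uint32_t ex_ticks_to_next_event(uint32_t now, const uint32_t *match,
                                       size_t n_match)
{
    uint32_t best = now; /* ticks until the counter reaches zero */
    size_t i;

    for (i = 0; i < n_match; i++) {
        if (match[i] < now && now - match[i] < best) {
            best = now - match[i];
        }
    }
    return best;
}
```

A device model built this way would re-arm a regular QEMU timer for
ex_ticks_to_ns(ex_ticks_to_next_event(...), freq) after every event, which is
the bookkeeping that feels tedious compared with ptimer.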

I guess the final thought was that you queried me on the #include
"qemu/main-loop.h" in the original patch, and that moving away from
ptimer would eliminate it.

If we come up with an acceptable match value API for ptimer I can
implement it and resend.

Cheers,

Andrew



[Qemu-devel] [PATCH 3/3] record/replay: add network support

2016-06-07 Thread Pavel Dovgalyuk
This patch adds support of recording and replaying network packets in
icount rr mode.

Record and replay for network interactions is performed with the network filter.
Each backend must have its own instance of the replay filter as follows:
 -netdev user,id=net1 -device rtl8139,netdev=net1
 -object filter-replay,id=replay,netdev=net1

Replay network filter is used to record and replay network packets. While
recording the virtual machine this filter puts all packets coming from
the outer world into the log. In replay mode packets from the log are
injected into the network device. All interactions with network backend
in replay mode are disabled.
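
The record and replay behaviour described above can be sketched as a small
self-contained model. This is only an illustration under stated assumptions:
the ex_* names, the fixed-size in-memory log and the return conventions are
hypothetical and do not reflect the actual filter-replay.c or replay-net.c
code.

```c
#include <stddef.h>
#include <string.h>

#define EX_MAX_PACKETS 16
#define EX_MAX_LEN     64

enum ex_replay_mode { EX_MODE_NONE, EX_MODE_RECORD, EX_MODE_PLAY };

/* Hypothetical in-memory stand-in for the replay log. */
struct ex_packet_log {
    unsigned char data[EX_MAX_PACKETS][EX_MAX_LEN];
    size_t len[EX_MAX_PACKETS];
    size_t count; /* packets recorded so far */
    size_t next;  /* next packet to inject during replay */
};

/* Called for each packet arriving from the backend.
 * Returns 1 if the packet is passed on to the device, 0 if dropped. */
static int ex_filter_receive(enum ex_replay_mode mode,
                             struct ex_packet_log *log,
                             const void *pkt, size_t len)
{
    switch (mode) {
    case EX_MODE_RECORD:
        /* Save the packet, then let it through unchanged. */
        if (log->count < EX_MAX_PACKETS && len <= EX_MAX_LEN) {
            memcpy(log->data[log->count], pkt, len);
            log->len[log->count] = len;
            log->count++;
        }
        return 1;
    case EX_MODE_PLAY:
        /* Backend traffic is ignored; only the log feeds the device. */
        return 0;
    default:
        return 1;
    }
}

/* In replay mode, fetch the next logged packet for injection.
 * Returns its length, or 0 if the log is exhausted or 'out' is too small. */
static size_t ex_replay_next(struct ex_packet_log *log, void *out, size_t cap)
{
    size_t len;

    if (log->next >= log->count) {
        return 0;
    }
    len = log->len[log->next];
    if (len > cap) {
        return 0;
    }
    memcpy(out, log->data[log->next], len);
    log->next++;
    return len;
}
```

The point of the split is that the guest sees the same packet sequence in both
modes, while the live backend has no influence during replay.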

Signed-off-by: Pavel Dovgalyuk 
---
 docs/replay.txt  |   14 ++
 include/sysemu/replay.h  |   12 +
 net/Makefile.objs|1 
 net/filter-replay.c  |   90 ++
 replay/Makefile.objs |1 
 replay/replay-events.c   |   11 +
 replay/replay-internal.h |   10 
 replay/replay-net.c  |  110 ++
 replay/replay.c  |2 -
 vl.c |3 +
 10 files changed, 252 insertions(+), 2 deletions(-)
 create mode 100644 net/filter-replay.c
 create mode 100644 replay/replay-net.c

diff --git a/docs/replay.txt b/docs/replay.txt
index 779c6c0..347b2ff 100644
--- a/docs/replay.txt
+++ b/docs/replay.txt
@@ -195,3 +195,17 @@ Queue is flushed at checkpoints and information about processed requests
 is recorded to the log. In replay phase the queue is matched with
 events read from the log. Therefore block devices requests are processed
 deterministically.
+
+Network devices
+---
+
+Record and replay for network interactions is performed with the network filter.
+Each backend must have its own instance of the replay filter as follows:
+ -netdev user,id=net1 -device rtl8139,netdev=net1
+ -object filter-replay,id=replay,netdev=net1
+
+Replay network filter is used to record and replay network packets. While
+recording the virtual machine this filter puts all packets coming from
+the outer world into the log. In replay mode packets from the log are
+injected into the network device. All interactions with network backend
+in replay mode are disabled.
diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index 52430d3..fa61aae 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -39,6 +39,8 @@ enum ReplayCheckpoint {
 };
 typedef enum ReplayCheckpoint ReplayCheckpoint;
 
+typedef struct ReplayNetState ReplayNetState;
+
 extern ReplayMode replay_mode;
 
 /* Replay process control functions */
@@ -135,4 +137,14 @@ void replay_char_read_all_save_error(int res);
 /*! Writes character read_all execution result into the replay log. */
 void replay_char_read_all_save_buf(uint8_t *buf, int offset);
 
+/* Network */
+
+/*! Registers replay network filter attached to some backend. */
+ReplayNetState *replay_register_net(NetFilterState *nfs);
+/*! Unregisters replay network filter. */
+void replay_unregister_net(ReplayNetState *rns);
+/*! Called to write network packet to the replay log. */
+void replay_net_packet_event(ReplayNetState *rns, unsigned flags,
+ const struct iovec *iov, int iovcnt);
+
 #endif
diff --git a/net/Makefile.objs b/net/Makefile.objs
index b7c22fd..f787ba4 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -16,3 +16,4 @@ common-obj-$(CONFIG_NETMAP) += netmap.o
 common-obj-y += filter.o
 common-obj-y += filter-buffer.o
 common-obj-y += filter-mirror.o
+common-obj-y += filter-replay.o
diff --git a/net/filter-replay.c b/net/filter-replay.c
new file mode 100644
index 0000000..7d93dc9
--- /dev/null
+++ b/net/filter-replay.c
@@ -0,0 +1,90 @@
+/*
+ * filter-replay.c
+ *
+ * Copyright (c) 2010-2016 Institute for System Programming
+ * of the Russian Academy of Sciences.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "clients.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "qemu/iov.h"
+#include "qemu/log.h"
+#include "qemu/timer.h"
+#include "qapi/visitor.h"
+#include "net/filter.h"
+#include "sysemu/replay.h"
+
+#define TYPE_FILTER_REPLAY "filter-replay"
+
+#define FILTER_REPLAY(obj) \
+OBJECT_CHECK(NetFilterReplayState, (obj), TYPE_FILTER_REPLAY)
+
+struct NetFilterReplayState {
+NetFilterState nfs;
+ReplayNetState *rns;
+};
+typedef struct NetFilterReplayState NetFilterReplayState;
+
+static ssize_t filter_replay_receive_iov(NetFilterState *nf, NetClientState *sndr,
+ unsigned flags, const struct iovec *iov,
+ int iovcnt, NetPacketSent *sent_cb)
+{
+NetFilterReplayState *nfrs = FILTER_REPLAY(nf);
+switch (replay_mode) {
+case 

Re: [Qemu-devel] [V11 1/4] hw/i386: Introduce AMD IOMMU

2016-06-07 Thread Jan Kiszka
On 2016-06-07 22:36, Alex Williamson wrote:
> On Sun, 22 May 2016 13:21:51 +0300
> David Kiarie  wrote:
> 
>> Add AMD IOMMU emulation to Qemu in addition to Intel IOMMU
>> The IOMMU does basic translation, error checking and has a
>> minimal IOTLB implementation. This IOMMU bypasses the need
>> for target aborts by responding with IOMMU_NONE access rights
>> and exempts the region 0xfee00000-0xfeefffff from translation
>> as it is the q35 interrupt region. We also advertise features
>> that are not yet implemented to please the Linux IOMMU driver.
>>
>> IOTLB aims at implementing commands on real IOMMUs which is
>> essential for debugging and may not offer any performance
>> benefits
>>
>> Signed-off-by: David Kiarie 
>> ---
>>  hw/i386/Makefile.objs |1 +
>>  hw/i386/amd_iommu.c   | 1401 
>> +
>>  hw/i386/amd_iommu.h   |  340 
>>  include/hw/pci/pci.h  |2 +
>>  4 files changed, 1744 insertions(+)
>>  create mode 100644 hw/i386/amd_iommu.c
>>  create mode 100644 hw/i386/amd_iommu.h
> 
> I don't see any callouts to memory_region_notify_iommu() here, so this
> won't yet support assigned devices.  Do you have any plans to add that
> support?  Thanks,

One after the other: correct emulation of all key features is the
primary goal, adding support for assigned devices a bonus. However, it's
not a simple one as we will probably need shadow page tables. So GSoC is
likely not long enough for this.
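
For readers following along, the translation behaviour in the quoted patch
description (a minimal IOTLB, an untranslated interrupt region, and "no
access" instead of a target abort) can be sketched roughly as below. Everything
here is an assumption made for illustration: the ex_* names, the 4 KiB page
size and the 0xfee00000-0xfeefffff bounds are not taken from amd_iommu.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed bounds of the interrupt region that is exempt from translation. */
#define EX_INT_REGION_START 0xfee00000ull
#define EX_INT_REGION_END   0xfeefffffull
#define EX_PAGE_SHIFT       12

enum ex_perm { EX_PERM_NONE = 0, EX_PERM_RO = 1, EX_PERM_WO = 2, EX_PERM_RW = 3 };

struct ex_iotlb_entry {
    uint64_t iova_page; /* I/O virtual page number */
    uint64_t phys_page; /* translated page number */
    enum ex_perm perm;
    int valid;
};

struct ex_translation {
    uint64_t addr;
    enum ex_perm perm;
};

static struct ex_translation ex_translate(const struct ex_iotlb_entry *tlb,
                                          size_t n, uint64_t iova)
{
    struct ex_translation res = { 0, EX_PERM_NONE };
    size_t i;

    /* The interrupt region is passed through unchanged. */
    if (iova >= EX_INT_REGION_START && iova <= EX_INT_REGION_END) {
        res.addr = iova;
        res.perm = EX_PERM_RW;
        return res;
    }

    /* Linear IOTLB lookup; a real implementation would walk the page
     * tables on a miss, which this sketch omits. */
    for (i = 0; i < n; i++) {
        if (tlb[i].valid && tlb[i].iova_page == (iova >> EX_PAGE_SHIFT)) {
            res.addr = (tlb[i].phys_page << EX_PAGE_SHIFT) |
                       (iova & ((1ull << EX_PAGE_SHIFT) - 1));
            res.perm = tlb[i].perm;
            return res;
        }
    }

    /* No mapping: report "no access" rather than raising an abort. */
    return res;
}
```

Supporting assigned devices would additionally require notifying listeners
whenever such a mapping changes, which is the part the question above is about.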

Jan






[Qemu-devel] [PATCH 1/3] target-ppc: exceptions handling in icount mode

2016-06-07 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

This patch fixes exception handling in PowerPC.
Instructions can generate several types of exceptions.
When an exception is generated, it breaks the execution of the current
translation block. The current implementation of exception handling does not
correctly restore icount for the instruction which caused the exception. In
most cases icount is decreased by a value equal to the size of the TB.
This patch passes a pointer to the translation block internals to the
exception handler, which allows the icount value to be restored correctly.
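
A simplified model of the accounting problem may help. It is a sketch under
stated assumptions, not QEMU's implementation: the ex_* names are
hypothetical, the whole TB is assumed to be charged on entry, and the
faulting instruction itself is assumed not to count as executed.

```c
#include <stdint.h>

struct ex_cpu {
    int64_t icount_budget; /* instructions left before the next deadline */
};

/* On TB entry the whole block is charged up front. */
static void ex_tb_enter(struct ex_cpu *cpu, int tb_insns)
{
    cpu->icount_budget -= tb_insns;
}

/* Leaving the TB early at instruction 'faulting_index' (0-based): give
 * back the instructions that were charged but never executed. Without
 * this step the budget stays reduced by the full TB size. */
static void ex_exit_with_restore(struct ex_cpu *cpu, int tb_insns,
                                 int faulting_index)
{
    cpu->icount_budget += tb_insns - faulting_index;
}
```

For a 10-instruction TB that faults at index 3, a budget of 100 drops to 90 on
entry and is corrected to 97 on exit; skipping the restore leaves it at 90,
which is the drift described above.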

Signed-off-by: Pavel Dovgalyuk 
---
 target-ppc/cpu.h |3 +
 target-ppc/excp_helper.c |   38 ++--
 target-ppc/fpu_helper.c  |  192 ++
 target-ppc/helper.h  |1 
 target-ppc/mem_helper.c  |6 +
 target-ppc/misc_helper.c |8 +-
 target-ppc/mmu-hash64.c  |   12 +--
 target-ppc/mmu_helper.c  |   18 ++--
 target-ppc/timebase_helper.c |   21 ++---
 target-ppc/translate.c   |   84 +-
 10 files changed, 169 insertions(+), 214 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 98a24a5..4d7319a 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -2391,4 +2391,7 @@ int ppc_get_vcpu_dt_id(PowerPCCPU *cpu);
 PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id);
 
 void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len);
+void raise_exception_err(CPUPPCState *env, uint32_t exception,
+ uint32_t error_code, uintptr_t pc);
+
 #endif /* !defined (__CPU_PPC_H__) */
diff --git a/target-ppc/excp_helper.c b/target-ppc/excp_helper.c
index a37009e..ec006fa 100644
--- a/target-ppc/excp_helper.c
+++ b/target-ppc/excp_helper.c
@@ -887,8 +887,8 @@ static void cpu_dump_rfi(target_ulong RA, target_ulong msr)
 /*/
 /* Exceptions processing helpers */
 
-void helper_raise_exception_err(CPUPPCState *env, uint32_t exception,
-uint32_t error_code)
+void raise_exception_err(CPUPPCState *env, uint32_t exception,
+ uint32_t error_code, uintptr_t pc)
 {
 CPUState *cs = CPU(ppc_env_get_cpu(env));
 
@@ -897,15 +897,32 @@ void helper_raise_exception_err(CPUPPCState *env, uint32_t exception,
 #endif
 cs->exception_index = exception;
 env->error_code = error_code;
-cpu_loop_exit(cs);
+cpu_loop_exit_restore(cs, pc);
+}
+
+void helper_raise_exception_err(CPUPPCState *env, uint32_t exception,
+uint32_t error_code)
+{
+raise_exception_err(env, exception, error_code, GETPC());
+}
+
+void helper_raise_exception_end(CPUPPCState *env, uint32_t exception,
+uint32_t error_code)
+{
+raise_exception_err(env, exception, error_code, 0);
 }
 
 void helper_raise_exception(CPUPPCState *env, uint32_t exception)
 {
-helper_raise_exception_err(env, exception, 0);
+raise_exception_err(env, exception, 0, GETPC());
 }
 
 #if !defined(CONFIG_USER_ONLY)
+static void raise_exception(CPUPPCState *env, uint32_t exception, uintptr_t pc)
+{
+raise_exception_err(env, exception, 0, pc);
+}
+
 void helper_store_msr(CPUPPCState *env, target_ulong val)
 {
 CPUState *cs;
@@ -914,7 +931,8 @@ void helper_store_msr(CPUPPCState *env, target_ulong val)
 if (val != 0) {
 cs = CPU(ppc_env_get_cpu(env));
 cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
-helper_raise_exception(env, val);
+/* nip is updated by generated code */
+raise_exception(env, val, 0);
 }
 }
 
@@ -1015,8 +1033,9 @@ void helper_tw(CPUPPCState *env, target_ulong arg1, target_ulong arg2,
   ((int32_t)arg1 == (int32_t)arg2 && (flags & 0x04)) ||
   ((uint32_t)arg1 < (uint32_t)arg2 && (flags & 0x02)) ||
   ((uint32_t)arg1 > (uint32_t)arg2 && (flags & 0x01))))) {
-helper_raise_exception_err(env, POWERPC_EXCP_PROGRAM,
-   POWERPC_EXCP_TRAP);
+/* nip is updated in TB */
+raise_exception_err(env, POWERPC_EXCP_PROGRAM,
+POWERPC_EXCP_TRAP, 0);
 }
 }
 
@@ -1029,8 +1048,9 @@ void helper_td(CPUPPCState *env, target_ulong arg1, target_ulong arg2,
   ((int64_t)arg1 == (int64_t)arg2 && (flags & 0x04)) ||
   ((uint64_t)arg1 < (uint64_t)arg2 && (flags & 0x02)) ||
   ((uint64_t)arg1 > (uint64_t)arg2 && (flags & 0x01))))) {
-helper_raise_exception_err(env, POWERPC_EXCP_PROGRAM,
-   POWERPC_EXCP_TRAP);
+/* nip is updated in TB */
+raise_exception_err(env, POWERPC_EXCP_PROGRAM,
+POWERPC_EXCP_TRAP, 0);
 }
 }
 #endif
diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index b67ebca..a02bc63 100644
--- 

[Qemu-devel] [PATCH 2/3] replay: allow replay stopping and restarting

2016-06-07 Thread Pavel Dovgalyuk
This patch fixes a bug with stopping and restarting replay
through monitor.

Signed-off-by: Pavel Dovgalyuk 
---
 block/blkreplay.c|   18 +-
 cpus.c   |1 +
 include/sysemu/replay.h  |2 ++
 replay/replay-internal.h |2 --
 vl.c |1 +
 5 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/block/blkreplay.c b/block/blkreplay.c
index 42f1813..438170c 100644
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -70,6 +70,14 @@ static void blkreplay_bh_cb(void *opaque)
 g_free(req);
 }
 
+static uint64_t blkreplay_next_id(void)
+{
+if (replay_events_enabled()) {
+return request_id++;
+}
+return 0;
+}
+
 static void block_request_create(uint64_t reqid, BlockDriverState *bs,
  Coroutine *co)
 {
@@ -84,7 +92,7 @@ static void block_request_create(uint64_t reqid, BlockDriverState *bs,
 static int coroutine_fn blkreplay_co_readv(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
 {
-uint64_t reqid = request_id++;
+uint64_t reqid = blkreplay_next_id();
 int ret = bdrv_co_readv(bs->file->bs, sector_num, nb_sectors, qiov);
 block_request_create(reqid, bs, qemu_coroutine_self());
 qemu_coroutine_yield();
@@ -95,7 +103,7 @@ static int coroutine_fn blkreplay_co_readv(BlockDriverState *bs,
 static int coroutine_fn blkreplay_co_writev(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
 {
-uint64_t reqid = request_id++;
+uint64_t reqid = blkreplay_next_id();
 int ret = bdrv_co_writev(bs->file->bs, sector_num, nb_sectors, qiov);
 block_request_create(reqid, bs, qemu_coroutine_self());
 qemu_coroutine_yield();
@@ -106,7 +114,7 @@ static int coroutine_fn blkreplay_co_writev(BlockDriverState *bs,
 static int coroutine_fn blkreplay_co_write_zeroes(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
 {
-uint64_t reqid = request_id++;
+uint64_t reqid = blkreplay_next_id();
 int ret = bdrv_co_write_zeroes(bs->file->bs, sector_num, nb_sectors, flags);
 block_request_create(reqid, bs, qemu_coroutine_self());
 qemu_coroutine_yield();
@@ -117,7 +125,7 @@ static int coroutine_fn blkreplay_co_write_zeroes(BlockDriverState *bs,
 static int coroutine_fn blkreplay_co_discard(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors)
 {
-uint64_t reqid = request_id++;
+uint64_t reqid = blkreplay_next_id();
 int ret = bdrv_co_discard(bs->file->bs, sector_num, nb_sectors);
 block_request_create(reqid, bs, qemu_coroutine_self());
 qemu_coroutine_yield();
@@ -127,7 +135,7 @@ static int coroutine_fn blkreplay_co_discard(BlockDriverState *bs,
 
 static int coroutine_fn blkreplay_co_flush(BlockDriverState *bs)
 {
-uint64_t reqid = request_id++;
+uint64_t reqid = blkreplay_next_id();
 int ret = bdrv_co_flush(bs->file->bs);
 block_request_create(reqid, bs, qemu_coroutine_self());
 qemu_coroutine_yield();
diff --git a/cpus.c b/cpus.c
index 326742f..34f951f 100644
--- a/cpus.c
+++ b/cpus.c
@@ -742,6 +742,7 @@ static int do_vm_stop(RunState state)
 runstate_set(state);
 vm_state_notify(0, state);
 qapi_event_send_stop(&error_abort);
+replay_disable_events();
 }
 
 bdrv_drain_all();
diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index 0a88393..52430d3 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -105,6 +105,8 @@ bool replay_checkpoint(ReplayCheckpoint checkpoint);
 
 /*! Disables storing events in the queue */
 void replay_disable_events(void);
+/*! Enables storing events in the queue */
+void replay_enable_events(void);
 /*! Returns true when saving events is enabled */
 bool replay_events_enabled(void);
 /*! Adds bottom half event to the queue */
diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index efbf14c..310c4b7 100644
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -119,8 +119,6 @@ void replay_read_next_clock(unsigned int kind);
 void replay_init_events(void);
 /*! Clears internal data structures for events handling */
 void replay_finish_events(void);
-/*! Enables storing events in the queue */
-void replay_enable_events(void);
 /*! Flushes events queue */
 void replay_flush_events(void);
 /*! Clears events list before loading new VM state */
diff --git a/vl.c b/vl.c
index 2f74fe8..b361ca8 100644
--- a/vl.c
+++ b/vl.c
@@ -765,6 +765,7 @@ void vm_start(void)
 if (runstate_is_running()) {
 qapi_event_send_stop(&error_abort);
 } else {
+replay_enable_events();
 cpu_enable_ticks();
 runstate_set(RUN_STATE_RUNNING);
 vm_state_notify(1, RUN_STATE_RUNNING);
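
The effect of the blkreplay change in this patch can be modelled in a few
lines. This is a sketch for illustration only: the ex_* names and the global
state are hypothetical stand-ins for replay_enable_events(),
replay_disable_events() and the request_id counter.

```c
#include <stdbool.h>
#include <stdint.h>

static bool ex_events_enabled;
static uint64_t ex_request_id;

/* Stand-ins for what vm_start() and do_vm_stop() toggle in the patch. */
static void ex_vm_start(void) { ex_events_enabled = true; }
static void ex_vm_stop(void)  { ex_events_enabled = false; }

/* Request IDs only advance while events are being stored, so requests
 * issued while the VM is stopped do not shift the IDs expected by the
 * replay log. */
static uint64_t ex_next_id(void)
{
    if (ex_events_enabled) {
        return ex_request_id++;
    }
    return 0;
}
```

After a stop/start cycle the sequence continues where it left off, regardless
of how many requests were issued in between.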




[Qemu-devel] [PATCH 0/3] icount/replay additions

2016-06-07 Thread Pavel Dovgalyuk
This set of patches includes fixes and additions for icount and
record/replay implementation:
 - Fixing icount processing on exceptions in PPC
 - Enabling VM start/stop in replay mode
 - Adding network record/replay

---

Pavel Dovgalyuk (3):
  target-ppc: exceptions handling in icount mode
  replay: allow replay stopping and restarting
  record/replay: add network support


 block/blkreplay.c|   18 +++-
 cpus.c   |1 
 docs/replay.txt  |   14 +++
 include/sysemu/replay.h  |   14 +++
 net/Makefile.objs|1 
 net/filter-replay.c  |   90 
 replay/Makefile.objs |1 
 replay/replay-events.c   |   11 ++
 replay/replay-internal.h |   12 ++-
 replay/replay-net.c  |  110 
 replay/replay.c  |2 
 target-ppc/cpu.h |3 +
 target-ppc/excp_helper.c |   38 ++--
 target-ppc/fpu_helper.c  |  192 ++
 target-ppc/helper.h  |1 
 target-ppc/mem_helper.c  |6 +
 target-ppc/misc_helper.c |8 +-
 target-ppc/mmu-hash64.c  |   12 +--
 target-ppc/mmu_helper.c  |   18 ++--
 target-ppc/timebase_helper.c |   21 ++---
 target-ppc/translate.c   |   84 +-
 vl.c |4 +
 22 files changed, 438 insertions(+), 223 deletions(-)
 create mode 100644 net/filter-replay.c
 create mode 100644 replay/replay-net.c

-- 
Pavel Dovgalyuk



Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-07 Thread Alex Williamson
On Wed, 8 Jun 2016 11:18:42 +0800
Dong Jia  wrote:

> On Tue, 7 Jun 2016 19:39:21 -0600
> Alex Williamson  wrote:
> 
> > On Wed, 8 Jun 2016 01:18:42 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Wednesday, June 08, 2016 6:42 AM
> > > > 
> > > > On Tue, 7 Jun 2016 03:03:32 +
> > > > "Tian, Kevin"  wrote:
> > > > 
> > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > Sent: Tuesday, June 07, 2016 3:31 AM
> > > > > >
> > > > > > On Mon, 6 Jun 2016 10:44:25 -0700
> > > > > > Neo Jia  wrote:
> > > > > >
> > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:
> > > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > > > > > Neo Jia  wrote:
> > > > > > > >
> > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > > > > > This intends to handle an intercepted channel I/O instruction. 
> > > > > > > > It
> > > > > > > > basically need to do the following thing:
> > > > > > >
> > > > > > > May I ask how and when QEMU knows that he needs to issue such 
> > > > > > > VFIO ioctl at
> > > > > > > first place?
> > > > > >
> > > > > > Yep, this is my question as well.  It sounds a bit like there's an
> > > > > > emulated device in QEMU that's trying to tell the mediated device 
> > > > > > when
> > > > > > to start an operation when we probably should be passing through
> > > > > > whatever i/o operations indicate that status directly to the 
> > > > > > mediated
> > > > > > device. Thanks,
> > > > > >
> > > > > > Alex
> > > > >
> > > > > Below is copied from Dong's earlier post which said clear that
> > > > > a guest cmd submission will trigger the whole flow:
> > > > >
> > > > > 
> > > > > Explanation:
> > > > > Q1-Q4: Qemu side process.
> > > > > K1-K6: Kernel side process.
> > > > >
> > > > > Q1. Intercept a ssch instruction.
> > > > > Q2. Translate the guest ccw program to a user space ccw program
> > > > > (u_ccwchain).
> > > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> > > > > K1. Copy from u_ccwchain to kernel (k_ccwchain).
> > > > > K2. Translate the user space ccw program to a kernel space ccw
> > > > > program, which becomes runnable for a real device.
> > > > > K3. With the necessary information contained in the orb passed in
> > > > > by Qemu, issue the k_ccwchain to the device, and wait event q
> > > > > for the I/O result.
> > > > > K4. Interrupt handler gets the I/O result, and wakes up the wait 
> > > > > q.
> > > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to
> > > > > update the user space irb.
> > > > > K6. Copy irb and scsw back to user space.
> > > > > Q4. Update the irb for the guest.
> > > > > 
> > > > 
> > > > Right, but this was the pre-mediated device approach, now we no longer
> > > > need step Q2 so we really only need Q1 and therefore Q3 to exist in
> > > > QEMU if those are operations that are not visible to the mediated
> > > > device; which they very well might be, since it's described as an
> > > > instruction rather than an i/o operation.  It's not terrible if that's
> > > > the case, vfio-pci has its own ioctl for doing a hot reset.
> Dear Alex, Kevin and Neo,
> 
> 'ssch' is a privileged I/O instruction, which should be finally issued
> to the dedicated subchannel of the physical device.
> 
> BTW, I did remove step Q2 with all of the user-space translation code,
> according to your comments in another thread.
> 
> > > 
> > >   
> > > > 
> > > > > My understanding is that such thing belongs to how device is mediated
> > > > > (so device driver specific), instead of something to be abstracted in
> > > > > VFIO which manages resource but doesn't care how resource is used.
> > > > >
> > > > > Actually we have same requirement in vGPU case, that a guest driver
> > > > > needs submit GPU commands through some MMIO register. vGPU device
> > > > > model will intercept the submission request (in its own way), do its
> > > > > necessary scan/audit to ensure correctness/security, and then submit
> > > > > to physical GPU through vendor specific interface.
> > > > >
> > > > > No difference with channel I/O here.
> > > > 
> > > > Well, if the GPU command is submitted through an MMIO register, is that
> > > > MMIO register part of the mediated device?  If so, could the mediated
> > > > device recognize the command and do the scan/audit itself?  QEMU must
> > > > not be the point at which mediation occurs for security purposes, QEMU
> > > > is userspace and userspace is not to be trusted.  I'm still open to
> > > > ioctls where it makes sense, as above, we have PCI specific ioctls and
> > > > already, but we need to evaluate each one, why it needs to exist, and
> > > > whether we can skip it if the 

Re: [Qemu-devel] [PATCH 00/17] some ARM platform QOM'ify work

2016-06-07 Thread xiaoqiang zhao



On 2016-06-08 05:32, Mark Cave-Ayland wrote:

On 07/06/16 11:34, xiaoqiang zhao wrote:


This patch series QOM'ifies ARM platform related devices.
Where possible, we drop the sysbus init function and use
instance_init and the DeviceClass::realize function instead.
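
The split between the two hooks can be pictured with a toy model. It is a
sketch under stated assumptions and deliberately avoids the real QOM API: all
ex_* names are hypothetical, and a plain string pointer stands in for QEMU's
Error object.

```c
#include <stddef.h>
#include <string.h>

struct ex_device {
    int mmio_ready;    /* set up unconditionally at instance init */
    int realized;      /* set once realize has succeeded */
    unsigned clock_hz; /* property configured between init and realize */
};

struct ex_device_class {
    void (*instance_init)(struct ex_device *dev);
    int (*realize)(struct ex_device *dev, const char **errp);
};

/* instance_init: setup that cannot fail and needs no property values. */
static void ex_gpio_instance_init(struct ex_device *dev)
{
    dev->mmio_ready = 1;
}

/* realize: work that depends on properties and may report an error. */
static int ex_gpio_realize(struct ex_device *dev, const char **errp)
{
    if (dev->clock_hz == 0) {
        *errp = "clock-hz property not set";
        return -1;
    }
    dev->realized = 1;
    return 0;
}

/* Object lifecycle: init, then property assignment, then realize. */
static int ex_device_create(const struct ex_device_class *klass,
                            struct ex_device *dev, unsigned clock_hz,
                            const char **errp)
{
    memset(dev, 0, sizeof(*dev));
    klass->instance_init(dev);
    dev->clock_hz = clock_hz;
    return klass->realize(dev, errp);
}
```

A legacy sysbus init function mixes both phases into one call that can only
return an error code; splitting them lets errors be reported properly from
realize while keeping the infallible setup in instance_init.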

xiaoqiang zhao (17):
   hw/i2c: QOM'ify bitbang_i2c.c
   hw/i2c: QOM'ify exynos4210_i2c.c
   hw/i2c: QOM'ify omap_i2c.c
   hw/i2c: QOM'ify versatile_i2c.c
   hw/gpio: QOM'ify mpc8xxx.c
   hw/gpio: QOM'ify omap_gpio.c
   hw/gpio: QOM'ify pl061.c
   hw/gpio: QOM'ify zaurus.c
   hw/misc: QOM'ify arm_l2x0.c
   hw/misc: QOM'ify eccmemctl.c
   hw/misc: QOM'ify exynos4210_pmu.c
   hw/misc: QOM'ify mst_fpga.c
   hw/misc: QOM'ify slavio_misc.c
   hw/dma: QOM'ify pxa2xx_dma.c
   hw/dma: QOM'ify sparc32_dma.c
   hw/dma: QOM'ify sun4m_iommu.c
   hw/sd: QOM'ify pl181.c

  hw/dma/pxa2xx_dma.c  | 38 +-
  hw/dma/sparc32_dma.c | 25 
  hw/dma/sun4m_iommu.c | 12 --
  hw/gpio/mpc8xxx.c| 20 +---
  hw/gpio/omap_gpio.c  | 61 
  hw/gpio/pl061.c  | 24 +++
  hw/gpio/zaurus.c | 14 +--
  hw/i2c/bitbang_i2c.c | 14 +--
  hw/i2c/exynos4210_i2c.c  | 13 +--
  hw/i2c/omap_i2c.c| 44 --
  hw/i2c/versatile_i2c.c   | 19 +--
  hw/misc/arm_l2x0.c   | 11 -
  hw/misc/eccmemctl.c  | 25 +---
  hw/misc/exynos4210_pmu.c | 11 -
  hw/misc/mst_fpga.c   | 13 +--
  hw/misc/slavio_misc.c| 43 ++
  hw/sd/pl181.c| 26 +
  17 files changed, 207 insertions(+), 206 deletions(-)

Patches 15 and 16 for sparc32_dma and sun4m_iommu are actually sun4m
SPARC rather than ARM devices, so while I don't mind if these go through
someone else's tree, please ensure that you also test
qemu-system-sparc thoroughly with these patches.


ATB,

Mark.


I have tested with the following command line:

qemu-system-sparc -hda /home/hitmoon/debian_etch_sparc_small.qcow2

The default machine is SS-5 and the guest is Debian; everything seems to be OK.




Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-07 Thread Neo Jia
On Wed, Jun 08, 2016 at 11:18:42AM +0800, Dong Jia wrote:
> On Tue, 7 Jun 2016 19:39:21 -0600
> Alex Williamson  wrote:
> 
> > On Wed, 8 Jun 2016 01:18:42 +
> > "Tian, Kevin"  wrote:
> > 
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Wednesday, June 08, 2016 6:42 AM
> > > > 
> > > > On Tue, 7 Jun 2016 03:03:32 +
> > > > "Tian, Kevin"  wrote:
> > > >   
> > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > Sent: Tuesday, June 07, 2016 3:31 AM
> > > > > >
> > > > > > On Mon, 6 Jun 2016 10:44:25 -0700
> > > > > > Neo Jia  wrote:
> > > > > >  
> > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:  
> > > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > > > > > Neo Jia  wrote:
> > > > > > > >
> > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > > > > > This intends to handle an intercepted channel I/O instruction. 
> > > > > > > > It
> > > > > > > > basically need to do the following thing:  
> > > > > > >
> > > > > > > May I ask how and when QEMU knows that he needs to issue such 
> > > > > > > VFIO ioctl at
> > > > > > > first place?  
> > > > > >
> > > > > > Yep, this is my question as well.  It sounds a bit like there's an
> > > > > > emulated device in QEMU that's trying to tell the mediated device 
> > > > > > when
> > > > > > to start an operation when we probably should be passing through
> > > > > > whatever i/o operations indicate that status directly to the 
> > > > > > mediated
> > > > > > device. Thanks,
> > > > > >
> > > > > > Alex  
> > > > >
> > > > > Below is copied from Dong's earlier post which said clear that
> > > > > a guest cmd submission will trigger the whole flow:
> > > > >
> > > > > 
> > > > > Explanation:
> > > > > Q1-Q4: Qemu side process.
> > > > > K1-K6: Kernel side process.
> > > > >
> > > > > Q1. Intercept a ssch instruction.
> > > > > Q2. Translate the guest ccw program to a user space ccw program
> > > > > (u_ccwchain).
> > > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> > > > > K1. Copy from u_ccwchain to kernel (k_ccwchain).
> > > > > K2. Translate the user space ccw program to a kernel space ccw
> > > > > program, which becomes runnable for a real device.
> > > > > K3. With the necessary information contained in the orb passed in
> > > > > by Qemu, issue the k_ccwchain to the device, and wait event q
> > > > > for the I/O result.
> > > > > K4. Interrupt handler gets the I/O result, and wakes up the wait 
> > > > > q.
> > > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to
> > > > > update the user space irb.
> > > > > K6. Copy irb and scsw back to user space.
> > > > > Q4. Update the irb for the guest.
> > > > >   
> > > > 
> > > > Right, but this was the pre-mediated device approach, now we no longer
> > > > need step Q2 so we really only need Q1 and therefore Q3 to exist in
> > > > QEMU if those are operations that are not visible to the mediated
> > > > device; which they very well might be, since it's described as an
> > > > instruction rather than an i/o operation.  It's not terrible if that's
> > > > the case, vfio-pci has its own ioctl for doing a hot reset.  
> Dear Alex, Kevin and Neo,
> 
> 'ssch' is a privileged I/O instruction, which should be finally issued
> to the dedicated subchannel of the physical device.
> 
> BTW, I did remove step Q2 with all of the user-space translation code,
> according to your comments in another thread.
> 
> > > 
> > > 
> > > >   
> > > > > My understanding is that such thing belongs to how device is mediated
> > > > > (so device driver specific), instead of something to be abstracted in
> > > > > VFIO which manages resource but doesn't care how resource is used.
> > > > >
> > > > > Actually we have same requirement in vGPU case, that a guest driver
> > > > > needs submit GPU commands through some MMIO register. vGPU device
> > > > > model will intercept the submission request (in its own way), do its
> > > > > necessary scan/audit to ensure correctness/security, and then submit
> > > > > to physical GPU through vendor specific interface.
> > > > >
> > > > > No difference with channel I/O here.  
> > > > 
> > > > Well, if the GPU command is submitted through an MMIO register, is that
> > > > MMIO register part of the mediated device?  If so, could the mediated
> > > > device recognize the command and do the scan/audit itself?  QEMU must
> > > > not be the point at which mediation occurs for security purposes, QEMU
> > > > is userspace and userspace is not to be trusted.  I'm still open to
> > > > ioctls where it makes sense, as above, we have PCI specific ioctls and
> > > > already, but we need to evaluate each one, why it needs to exist, and
> > > > whether we can skip it if the mediated device can trigger the action on
> > > > 

Re: [Qemu-devel] [PATCH v6 14/22] mirror: Disable image locking on target backing chain

2016-06-07 Thread Fam Zheng
On Mon, 06/06 17:03, Max Reitz wrote:
> On 03.06.2016 10:49, Fam Zheng wrote:
> > In sync=none the backing image of s->target is s->common.bs, which could
> > be exclusively locked, the image locking wouldn't work here.
> > 
> > Later we can update completion code to lock it after the replaced node
> > has dropped its lock.
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  blockdev.c | 10 --
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> Without having reviewed the patch in the new context, when basing your
> series on v2 of my "block/mirror: Fix target backing BDS", all iotests
> pass even without this patch.
> 
> Without my series, test 041 fails.
> 
> So I have reason to hope that I was actually able to make this patch
> superfluous.

Great, then I can drop this patch from v7 later.

Fam



Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-07 Thread Dong Jia
On Tue, 7 Jun 2016 19:39:21 -0600
Alex Williamson  wrote:

> On Wed, 8 Jun 2016 01:18:42 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Wednesday, June 08, 2016 6:42 AM
> > > 
> > > On Tue, 7 Jun 2016 03:03:32 +
> > > "Tian, Kevin"  wrote:
> > >   
> > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > Sent: Tuesday, June 07, 2016 3:31 AM
> > > > >
> > > > > On Mon, 6 Jun 2016 10:44:25 -0700
> > > > > Neo Jia  wrote:
> > > > >  
> > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:  
> > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > > > > Neo Jia  wrote:
> > > > > > >
> > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > > > > This intends to handle an intercepted channel I/O instruction. It
> > > > > > > basically need to do the following thing:  
> > > > > >
> > > > > > May I ask how and when QEMU knows that he needs to issue such VFIO 
> > > > > > ioctl at
> > > > > > first place?  
> > > > >
> > > > > Yep, this is my question as well.  It sounds a bit like there's an
> > > > > emulated device in QEMU that's trying to tell the mediated device when
> > > > > to start an operation when we probably should be passing through
> > > > > whatever i/o operations indicate that status directly to the mediated
> > > > > device. Thanks,
> > > > >
> > > > > Alex  
> > > >
> > > > Below is copied from Dong's earlier post which said clear that
> > > > a guest cmd submission will trigger the whole flow:
> > > >
> > > > 
> > > > Explanation:
> > > > Q1-Q4: Qemu side process.
> > > > K1-K6: Kernel side process.
> > > >
> > > > Q1. Intercept a ssch instruction.
> > > > Q2. Translate the guest ccw program to a user space ccw program
> > > > (u_ccwchain).
> > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> > > > K1. Copy from u_ccwchain to kernel (k_ccwchain).
> > > > K2. Translate the user space ccw program to a kernel space ccw
> > > > program, which becomes runnable for a real device.
> > > > K3. With the necessary information contained in the orb passed in
> > > > by Qemu, issue the k_ccwchain to the device, and wait event q
> > > > for the I/O result.
> > > > K4. Interrupt handler gets the I/O result, and wakes up the wait q.
> > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to
> > > > update the user space irb.
> > > > K6. Copy irb and scsw back to user space.
> > > > Q4. Update the irb for the guest.
> > > >   
> > > 
> > > Right, but this was the pre-mediated device approach, now we no longer
> > > need step Q2 so we really only need Q1 and therefore Q3 to exist in
> > > QEMU if those are operations that are not visible to the mediated
> > > device; which they very well might be, since it's described as an
> > > instruction rather than an i/o operation.  It's not terrible if that's
> > > the case, vfio-pci has its own ioctl for doing a hot reset.  
Dear Alex, Kevin and Neo,

'ssch' is a privileged I/O instruction, which must ultimately be issued
to the dedicated subchannel of the physical device.

BTW, I did remove step Q2 with all of the user-space translation code,
according to your comments in another thread.
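
With Q2 gone, the remaining flow can be sketched roughly as follows. This is
only an illustration of the data flow (Q1, Q3, Q4 in QEMU, with a stub
standing in for K1-K6); the struct layouts, the function names and the
SKETCH_STATUS_DONE value are all hypothetical and are not the proposed
ioctl ABI.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the s390 ORB and IRB. */
struct sketch_orb { uint32_t intparm; uint32_t cpa; }; /* cpa: guest channel program address */
struct sketch_irb { uint32_t intparm; uint8_t status; };

#define SKETCH_STATUS_DONE 1 /* made-up "I/O finished" marker */

/* Hypothetical argument of VFIO_DEVICE_CCW_CMD_REQUEST: orb in, irb out. */
struct sketch_cmd_request {
    struct sketch_orb orb;
    struct sketch_irb irb;
};

/* Stub for the kernel side (K1-K6). The real code would translate the
 * guest channel program, issue ssch on the physical subchannel and wait
 * for the interrupt; all of that is elided here. */
static int sketch_kernel_cmd_request(struct sketch_cmd_request *req)
{
    if (req->orb.cpa == 0) {
        return -1;                         /* no channel program to run */
    }
    req->irb.intparm = req->orb.intparm;   /* K5: result goes into the irb */
    req->irb.status = SKETCH_STATUS_DONE;
    return 0;
}

/* QEMU side. Q1 (the ssch intercept) has already happened when this is
 * called; there is no user-space translation step (Q2) any more. */
static int sketch_qemu_handle_ssch(const struct sketch_orb *guest_orb,
                                   struct sketch_irb *guest_irb)
{
    struct sketch_cmd_request req;

    memset(&req, 0, sizeof(req));
    req.orb = *guest_orb;                  /* Q3: hand the guest orb to the kernel */
    if (sketch_kernel_cmd_request(&req) != 0) {
        return -1;
    }
    *guest_irb = req.irb;                  /* Q4: update the irb for the guest */
    return 0;
}
```

The point of the sketch is that QEMU only forwards the request and copies
the result back; the privileged instruction itself is issued on the kernel
side.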

> > 
> > 
> > >   
> > > > My understanding is that such thing belongs to how device is mediated
> > > > (so device driver specific), instead of something to be abstracted in
> > > > VFIO which manages resource but doesn't care how resource is used.
> > > >
> > > > Actually we have same requirement in vGPU case, that a guest driver
> > > > needs submit GPU commands through some MMIO register. vGPU device
> > > > model will intercept the submission request (in its own way), do its
> > > > necessary scan/audit to ensure correctness/security, and then submit
> > > > to physical GPU through vendor specific interface.
> > > >
> > > > No difference with channel I/O here.  
> > > 
> > > Well, if the GPU command is submitted through an MMIO register, is that
> > > MMIO register part of the mediated device?  If so, could the mediated
> > > device recognize the command and do the scan/audit itself?  QEMU must
> > > not be the point at which mediation occurs for security purposes, QEMU
> > > is userspace and userspace is not to be trusted.  I'm still open to
> > > ioctls where it makes sense, as above, we have PCI specific ioctls and
> > > already, but we need to evaluate each one, why it needs to exist, and
> > > whether we can skip it if the mediated device can trigger the action on
> > > its own.  After all, that's why we're using the vfio api, so we can
> > > re-use much of the existing infrastructure, especially for a vGPU that
> > > exposes itself as a PCI device.  Thanks,
> > >   
> > 
> > My point is that a guest submission on vGPU is just a normal trapped 
> > register write, which is 

Re: [Qemu-devel] [PATCH v2 3/3] q35: allow dynamic sysbus

2016-06-07 Thread Peter Xu
On Thu, Jun 02, 2016 at 11:15:55PM +0300, Marcel Apfelbaum wrote:
> Allow adding sysbus devices with -device on Q35.
> 
> At first Q35 will support only intel-iommu to be added this way,
> however the command line will support all sysbus devices.
> 
> Mark with 'cannot_instantiate_with_device_add_yet' the ones
> causing immediate problems (e.g. crashes).

What happens if we do dynamic device_add for IOMMU? Guessing we should
not allow that as well?

-- peterx



Re: [Qemu-devel] [PATCH v6 04/22] block: Introduce image file locking

2016-06-07 Thread Fam Zheng
On Tue, 06/07 21:51, Jason Dillaman wrote:
> On Fri, Jun 3, 2016 at 4:48 AM, Fam Zheng  wrote:
> > +typedef enum {
> > +/* The values are ordered so that lower number implies higher 
> > restriction.
> > + * Starting from 1 to make 0 an invalid value.
> > + * */
> > +BDRV_LOCKF_EXCLUSIVE = 1,
> > +BDRV_LOCKF_SHARED,
> > +BDRV_LOCKF_UNLOCK,
> > +} BdrvLockfCmd;
> > +
> 
> We started to talk about new APIs in librbd to support this feature
> where we don't need to worry about admin action should QEMU crash
> while holding the lock.
> 
> Any chance for separating the UNLOCK enum into the exclusive vs shared
> case? We could do some magic in the rbd block driver to guess how it
> was locked but it seems like it would be cleaner (at least for us) to
> explicitly call out what type of unlock you are requesting since it
> will involve different API methods.

This should be possible, but I'm not sure I fully understand the rationale
behind it. The server side, which implements the lock and keeps track of its
state, should already have the lock type information, so why is it necessary
for the client to be explicit? From an interface point of view it doesn't
sound necessary to me at all. Can you elaborate on the API methods that need
this?
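
For reference, a minimal sketch of what remembering the lock type in the
driver would involve if a single UNLOCK command is kept. The enum is the one
from the patch; SketchLockState, sketch_lockf() and the two counters (which
stand in for calling two different unlock APIs) are hypothetical and not
librbd or block layer code.

```c
/* Lock commands as defined in the patch under discussion. */
typedef enum {
    BDRV_LOCKF_EXCLUSIVE = 1,
    BDRV_LOCKF_SHARED,
    BDRV_LOCKF_UNLOCK,
} BdrvLockfCmd;

/* Hypothetical per-image driver state: remembers how the image is locked. */
typedef struct {
    BdrvLockfCmd held;       /* 0 means "not locked" (0 is an invalid command) */
    int exclusive_unlocks;   /* stand-in for the exclusive unlock API call */
    int shared_unlocks;      /* stand-in for the shared unlock API call */
} SketchLockState;

/* Returns 0 on success, -1 if the request does not match the current state. */
static int sketch_lockf(SketchLockState *s, BdrvLockfCmd cmd)
{
    switch (cmd) {
    case BDRV_LOCKF_EXCLUSIVE:
    case BDRV_LOCKF_SHARED:
        if (s->held) {
            return -1;               /* already locked */
        }
        s->held = cmd;               /* remember the type for the later unlock */
        return 0;
    case BDRV_LOCKF_UNLOCK:
        if (s->held == BDRV_LOCKF_EXCLUSIVE) {
            s->exclusive_unlocks++;
        } else if (s->held == BDRV_LOCKF_SHARED) {
            s->shared_unlocks++;
        } else {
            return -1;               /* nothing to unlock */
        }
        s->held = 0;
        return 0;
    }
    return -1;
}
```

Splitting UNLOCK into two commands would move the "held" bookkeeping from
the driver to the caller; the dispatch itself stays the same.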

Fam



Re: [Qemu-devel] [PATCH v3] spapr: Ensure all LMBs are represented in ibm, dynamic-memory

2016-06-07 Thread Bharata B Rao
On Tue, Jun 07, 2016 at 06:37:28PM -0500, Michael Roth wrote:
> Quoting Bharata B Rao (2016-06-07 00:19:03)
> > Memory hotplug can fail for some combinations of RAM and maxmem when
> > DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
> > on maximum addressable memory returned by guest and this value is currently
> > being calculated wrongly by the guest kernel routine memory_hotplug_max().
> > While there is an attempt to fix the guest kernel, this patch works
> > around the problem within QEMU itself.
> > 
> > memory_hotplug_max() routine in the guest kernel arrives at max
> > addressable memory by multiplying lmb-size with the lmb-count obtained
> > from ibm,dynamic-memory property. There are two assumptions here:
> > 
> > - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
> >   where only hot-pluggable LMBs are present in this property.
> > - The memory area comprising of RAM and hotplug region is contiguous: This
> >   needn't be true always for PowerKVM as there can be gap between
> >   boot time RAM and hotplug region.
> > 
> > To work around this guest kernel bug, ensure that ibm,dynamic-memory
> > has information about all the LMBs (RMA, boot-time LMBs, future
> > hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
> > hotpluggable region).
> > 
> > RMA is represented separately by memory@0 node. Hence mark RMA LMBs
> > and also the LMBs for the gap b/n RAM and hotpluggable region as
> > reserved so that these LMBs are not recounted/counted by guest.
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> > Changes in v3:
> > 
> > - Not touching spapr_create_lmb_dr_connectors() so that we continue
> >   to have DRC objects for only hotpluggable LMBs.
> > - Simplified the logic of creating dynamic-memory node based on comments
> >   from Michael Roth and David Gibson.
> > 
> > v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html
> > 
> >  hw/ppc/spapr.c | 51 
> > --
> >  include/hw/ppc/spapr.h |  5 +++--
> >  2 files changed, 36 insertions(+), 20 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 0636642..9d1d43d 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -762,14 +762,17 @@ static int 
> > spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> >  int ret, i, offset;
> >  uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> >  uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> > -uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size;
> > +uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
> > +uint32_t nr_lmbs = (spapr->hotplug_memory.base +
> > > +   memory_region_size(&spapr->hotplug_memory.mr)) /
> > +   lmb_size;
> >  uint32_t *int_buf, *cur_index, buf_len;
> >  int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> > 
> >  /*
> > - * Don't create the node if there are no DR LMBs.
> > + * Don't create the node if there is no hotpluggable memory
> >   */
> > -if (!nr_lmbs) {
> > +if (machine->ram_size == machine->maxram_size) {
> >  return 0;
> >  }
> > 
> > @@ -805,24 +808,36 @@ static int 
> > spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> >  for (i = 0; i < nr_lmbs; i++) {
> >  sPAPRDRConnector *drc;
> >  sPAPRDRConnectorClass *drck;
> 
> Since these ^ are only used if (i >= hotplug_lmb_start), it might be
> clearer to move them there now.

Yes.

> 
> > -uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;;
> > +uint64_t addr = i * lmb_size;
> >  uint32_t *dynamic_memory = cur_index;
> > 
> > -drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > -   addr/lmb_size);
> > -g_assert(drc);
> > -drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > -
> > -dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > -dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > -dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > -dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > -dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > -if (addr < machine->ram_size ||
> > -memory_region_present(get_system_memory(), addr)) {
> > -dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > +if (i >= hotplug_lmb_start) {
> > +drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > +   addr / lmb_size);
> 
> Could just be i

Hmm, I thought I had covered all such occurrences :(

> 
> > +g_assert(drc);
> > +drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +
> > +dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > +dynamic_memory[1] = 

[Qemu-devel] [PATCH] hw/arm: virt uart fix

2016-06-07 Thread xiaoqiang zhao
commit f0d1d2c115dffc1fbaf954d0b449db05c5eb79b1
("hw/char: QOM'ify pl011 model") breaks the qemu-system-arm virt machine
if the option '-machine secure=on' is provided.

The function create_uart is called twice, so make the CharDriverState
pointer a parameter of create_uart instead of hardcoding it.

Signed-off-by: xiaoqiang zhao 
---
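Note (not part of the commit message): the failure mode can be modelled
with the small sketch below. It assumes that a character backend may be
claimed by only one device, which is what the "it's in use" error suggests.
SketchChardev and both functions are hypothetical stand-ins and not QEMU
code; the chr argument must not be NULL.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a character backend that tracks its owner. */
typedef struct {
    const char *label;
    bool in_use;
} SketchChardev;

/* Models the property setter: a backend can be claimed only once. */
static bool sketch_claim_chardev(SketchChardev *chr)
{
    if (chr->in_use) {
        return false;        /* "Property ... can't take value ..., it's in use" */
    }
    chr->in_use = true;
    return true;
}

/* Models create_uart() after this patch: the backend is a parameter. */
static bool sketch_create_uart(SketchChardev *chr)
{
    return sketch_claim_chardev(chr);
}
```

Calling sketch_create_uart() twice with the same backend fails on the second
call, which corresponds to the old hardcoded serial_hds[0]; passing a
different backend for the secure UART succeeds.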
 hw/arm/virt.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 8e46137..73113cf 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -525,7 +525,7 @@ static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic, 
int type, bool secure)
 }
 
 static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic, int uart,
-MemoryRegion *mem)
+MemoryRegion *mem, CharDriverState *chr)
 {
 char *nodename;
 hwaddr base = vbi->memmap[uart].base;
@@ -536,7 +536,7 @@ static void create_uart(const VirtBoardInfo *vbi, qemu_irq 
*pic, int uart,
 DeviceState *dev = qdev_create(NULL, "pl011");
 SysBusDevice *s = SYS_BUS_DEVICE(dev);
 
-qdev_prop_set_chr(dev, "chardev", serial_hds[0]);
+qdev_prop_set_chr(dev, "chardev", chr);
 qdev_init_nofail(dev);
 memory_region_add_subregion(mem, base,
 sysbus_mmio_get_region(s, 0));
@@ -1259,11 +1259,11 @@ static void machvirt_init(MachineState *machine)
 
 create_gic(vbi, pic, gic_version, vms->secure);
 
-create_uart(vbi, pic, VIRT_UART, sysmem);
+create_uart(vbi, pic, VIRT_UART, sysmem, serial_hds[0]);
 
 if (vms->secure) {
 create_secure_ram(vbi, secure_sysmem);
-create_uart(vbi, pic, VIRT_SECURE_UART, secure_sysmem);
+create_uart(vbi, pic, VIRT_SECURE_UART, secure_sysmem, serial_hds[1]);
 }
 
 create_rtc(vbi, pic);
-- 
2.1.4





Re: [Qemu-devel] [PATCH 2/2] spapr: Better handling of ibm, pa-features TM bit

2016-06-07 Thread David Gibson
On Tue, Jun 07, 2016 at 10:32:10PM +1000, Anton Blanchard wrote:
> From: Anton Blanchard 
> 
> There are a few issues with our handling of the ibm,pa-features
> TM bit:
> 
> - We don't support transactional memory in PR KVM, so don't tell
>   the OS that we do.
> 
> - In full emulation we have a minimal implementation of TM that always
>   fails, so for performance reasons lets not tell the OS that we
>   support it either.
> 
> - In HV KVM mode, we should mirror the host TM enabled state by
>   looking at the AT_HWCAP2 bit.
> 
> Signed-off-by: Anton Blanchard 

So, we certainly need a change like this.  I'm not entirely happy with
the current implementation though.

> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0636642..c403fbb 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -620,7 +620,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void 
> *fdt, int offset,
>  0xf6, 0x1f, 0xc7, 0xc0, 0x80, 0xf0,
>  0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
>  0x00, 0x00, 0x00, 0x00, 0x80, 0x00,
> -0x80, 0x00, 0x80, 0x00, 0x80, 0x00 };
> +0x80, 0x00, 0x80, 0x00, 0x00, 0x00 };
>  uint8_t *pa_features;
>  size_t pa_size;
>  
> @@ -697,6 +697,19 @@ static void spapr_populate_cpu_dt(CPUState *cs, void 
> *fdt, int offset,
>  } else /* env->mmu_model == POWERPC_MMU_2_07 */ {
>  pa_features = pa_features_207;
>  pa_size = sizeof(pa_features_207);
> +
> +#ifdef CONFIG_KVM
> +/* Only enable TM in HV KVM mode */
> +if (kvm_enabled() &&
> +!kvm_vm_check_extension(cs->kvm_state, KVM_CAP_PPC_GET_PVINFO)) {
> +unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2);
> +
> +/* Guest should inherit host TM enabled bit */
> +if (hwcap2 & PPC_FEATURE2_HAS_HTM) {
> +pa_features[24] |= 0x80;
> +}
> +}
> +#endif

So first, I think this stanza wants to move into target-ppc/kvm.c -
maybe a kvm_filter_pa_features() call or something.

Second, although using PVINFO to determine if we have HV KVM is a
standard trick, we don't want to use it as our first option.  We
really want to introduce an actual KVM CAP flag for TM support, then
fall back to checking PVINFO if we can't use that.

I wonder if we actually want to just blanket disable TM in one patch -
since it doesn't work at all with PR KVM, and "works" only in the most
rules-lawyering and useless way on TCG.  Then re-enable it on HV KVM
in a second patch.
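
To illustrate the ordering I have in mind, here is a rough sketch. It is not
real QEMU code: SketchHost, the tm_cap field (standing in for a dedicated
KVM capability that does not exist yet) and both function names are
hypothetical, and the HTM bit value is the one from the Linux powerpc
headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PPC_FEATURE2_HAS_HTM 0x40000000 /* AT_HWCAP2 bit, see patch 1/2 */

/* Hypothetical summary of what we know about the host. */
typedef struct {
    bool kvm_enabled;          /* false means TCG */
    int tm_cap;                /* dedicated TM cap: 1 yes, 0 no, -1 not available */
    bool has_pvinfo;           /* KVM_CAP_PPC_GET_PVINFO present, i.e. PR KVM */
    unsigned long host_hwcap2; /* host AT_HWCAP2 value */
} SketchHost;

static bool sketch_tm_available(const SketchHost *h)
{
    if (!h->kvm_enabled) {
        return false;                  /* TCG: TM always fails, don't advertise */
    }
    if (h->tm_cap >= 0) {
        return h->tm_cap != 0;         /* first choice: explicit capability */
    }
    if (h->has_pvinfo) {
        return false;                  /* fallback: PVINFO means PR KVM, no TM */
    }
    return (h->host_hwcap2 & PPC_FEATURE2_HAS_HTM) != 0; /* HV: mirror the host */
}

/* Starts from pa_features with the TM bit clear and sets it only if allowed. */
static void sketch_filter_pa_features(uint8_t *pa_features, size_t len,
                                      const SketchHost *h)
{
    if (len > 24 && sketch_tm_available(h)) {
        pa_features[24] |= 0x80;
    }
}
```

With this split, the "blanket disable" patch is just the table change, and
the second patch adds the filter.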

>  }
>  if (env->ci_large_pages) {
>  pa_features[3] |= 0x20;
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH 1/2] Add PowerPC AT_HWCAP2 definitions

2016-06-07 Thread David Gibson
On Tue, Jun 07, 2016 at 10:28:42PM +1000, Anton Blanchard wrote:
> From: Anton Blanchard 
> 
> We need the PPC_FEATURE2_HAS_HTM bit in a subsequent patch, so
> add the PowerPC AT_HWCAP2 definitions.
> 
> Signed-off-by: Anton Blanchard 

Applied to ppc-for-2.7.

Paolo or Peter: since this is a change to PPC specific bits, it seems
reasonable to go through my tree although it's technically a generic
header.  If someone wants to drop an explicit Ack, that wouldn't hurt
of course.
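
For anyone following along, this is roughly how the new bit gets consumed.
The sketch uses the C library's getauxval() directly rather than QEMU's
qemu_getauxval() wrapper, and the helper names are made up for illustration;
the bit value is the one from the Linux powerpc headers.

```c
#include <stdbool.h>
#include <sys/auxv.h> /* getauxval(), AT_HWCAP2 (Linux with a recent glibc) */

/* On powerpc the libc headers may already provide this definition. */
#ifndef PPC_FEATURE2_HAS_HTM
#define PPC_FEATURE2_HAS_HTM 0x40000000
#endif

/* Pure helper: does this AT_HWCAP2 value advertise transactional memory? */
static bool sketch_hwcap2_has_htm(unsigned long hwcap2)
{
    return (hwcap2 & PPC_FEATURE2_HAS_HTM) != 0;
}

/* Queries the running process's auxiliary vector. The result is only
 * meaningful on a PowerPC host; other architectures use AT_HWCAP2 bits
 * with different meanings. */
static bool sketch_host_has_htm(void)
{
    return sketch_hwcap2_has_htm(getauxval(AT_HWCAP2));
}
```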

> ---
> 
> diff --git a/include/elf.h b/include/elf.h
> index 28d448b..8533b2a 100644
> --- a/include/elf.h
> +++ b/include/elf.h
> @@ -477,6 +477,19 @@ typedef struct {
>  #define PPC_FEATURE_TRUE_LE 0x00000002
>  #define PPC_FEATURE_PPC_LE  0x00000001
>  
> +/* Bits present in AT_HWCAP2 for PowerPC.  */
> +
> +#define PPC_FEATURE2_ARCH_2_07      0x80000000
> +#define PPC_FEATURE2_HAS_HTM        0x40000000
> +#define PPC_FEATURE2_HAS_DSCR       0x20000000
> +#define PPC_FEATURE2_HAS_EBB        0x10000000
> +#define PPC_FEATURE2_HAS_ISEL       0x08000000
> +#define PPC_FEATURE2_HAS_TAR        0x04000000
> +#define PPC_FEATURE2_HAS_VEC_CRYPTO 0x02000000
> +#define PPC_FEATURE2_HTM_NOSC       0x01000000
> +#define PPC_FEATURE2_ARCH_3_00      0x00800000
> +#define PPC_FEATURE2_HAS_IEEE128    0x00400000
> +
>  /* Bits present in AT_HWCAP for Sparc.  */
>  
>  #define HWCAP_SPARC_FLUSH   0x00000001
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [RFC] Allow AMD IOMMU to have both SysBusDevice and PCIDevice properties.

2016-06-07 Thread Peter Xu
On Tue, Jun 07, 2016 at 10:32:34PM +0300, David Kiarie wrote:
> On Tue, Jun 7, 2016 at 10:12 PM, Eduardo Habkost  wrote:
> > Hi,
> 
> Hello,
> 
> >
> > I didn't review the amd_iommu.c code, but there seems to be some
> > unrelated changes in the patch:
> 
> Thanks for looking at this but I actually wanted someone to look at
> the amd_iommu.c. I mentioned in annotation that there are some
> unrelated changes because this work is based on code that has not been
> merged yet. I specifically sent this to have a review in amd_iommu.c
> not the details but the design. I have patchset that implements AMD
> IOMMU (translation only) which is implemented as a PCI device. It is
> however not possible to work on interrupt remapping without converting
> AMD IOMMU from a PCI device to a SysBusDevice. This device (AMD IOMMU),
> the one in this patch, unlike in previous patches, creates two devices:
> a PCI device and a SysBusDevice, which I am not sure is acceptable.

I would suggest that you generate another patch that contains only the
changes you made related to adding the PCI device for AMD IOMMU,
explain a bit about what this work is based on (e.g., IMHO it could be
based on your v11 AMD patchset and several other patches like Intel
IOMMU IR; just mention them in the cover letter), then mark it as an
RFC.

-- peterx



Re: [Qemu-devel] [Qemu-discuss] -serial option broken in master?

2016-06-07 Thread xiaoqiang zhao



On 2016-06-08 05:24, Peter Maydell wrote:

On 7 June 2016 at 15:47, Jérôme Forissier  wrote:

Hi,

I just noticed this error [1] (QEMU master branch):

../qemu/arm-softmmu/qemu-system-arm -nographic -monitor none -machine
virt -machine secure=on -cpu cortex-a15 -m 1057 -serial stdio -serial
file:serial1.log -bios
/home/travis/optee_repo/build/../out/bios-qemu/bios.bin
Unexpected error in parse_chr() at hw/core/qdev-properties-system.c:149:
qemu-system-arm: Property 'pl011.chardev' can't take value 'serial0',
it's in use

FYI, revert commits e5fabad7ccfd ("char: get rid of
qemu_char_get_next_serial") and f0d1d2c115df ("hw/char: QOM'ify pl011
model"), and the problem disappears.

Should I use a different syntax?

No, it's a bug that we broke this somehow. Xiaoqiang, could you
have a look at this, please?

thanks
-- PMM

Hi Peter,
 The bug has been found. In virt.c, create_uart is called twice, and
each call uses the same hardcoded serial_hds[0] entry.

 I will send a patch to fix this later.




Re: [Qemu-devel] [PATCH v2 10/22] hw/intc/arm_gicv3: Implement functions to identify next pending irq

2016-06-07 Thread Shannon Zhao


On 2016/5/26 22:55, Peter Maydell wrote:
> Implement the GICv3 logic to recalculate the highest priority pending
> interrupt for each CPU after some part of the GIC state has changed.
> We avoid unnecessary full recalculation where possible.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/intc/arm_gicv3.c| 293 
> +
>  hw/intc/arm_gicv3_common.c |   9 ++
>  hw/intc/gicv3_internal.h   | 121 +++
>  include/hw/intc/arm_gicv3_common.h |  18 +++
>  4 files changed, 441 insertions(+)
> 
> diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
> index 96e0d2f..7c4bee6 100644
> --- a/hw/intc/arm_gicv3.c
> +++ b/hw/intc/arm_gicv3.c
> @@ -21,6 +21,287 @@
>  #include "hw/intc/arm_gicv3.h"
>  #include "gicv3_internal.h"
>  
> +static bool irqbetter(GICv3CPUState *cs, int irq, uint8_t prio)
> +{
> +/* Return true if this IRQ at this priority should take
> + * precedence over the current recorded highest priority
> + * pending interrupt for this CPU. We also return true if
> + * the current recorded highest priority pending interrupt
> + * is the same as this one (a property which the calling code
> + * relies on).
> + */
> +if (prio < cs->hppi.prio) {
> +return true;
> +}
> +/* If multiple pending interrupts have the same priority then it is an
> + * IMPDEF choice which of them to signal to the CPU. We choose to
> + * signal the one with the lowest interrupt number.
> + */
> +if (prio == cs->hppi.prio && irq <= cs->hppi.irq) {
> +return true;
> +}
> +return false;
> +}
> +
> +static uint32_t gicd_int_pending(GICv3State *s, int irq)
> +{
> +/* Recalculate which redistributor interrupts are actually pending
s/redistributor/distributor/

> + * in the group of 32 interrupts starting at irq (which should be a 
> multiple
> + * of 32), and return a 32-bit integer which has a bit set for each
> + * interrupt that is eligible to be signaled to the CPU interface.
> + *
> + * An interrupt is pending if:
> + *  + the PENDING latch is set OR it is level triggered and the input is 
> 1
> + *  + its ENABLE bit is set
> + *  + the GICD enable bit for its group is set
> + * Conveniently we can bulk-calculate this with bitwise operations.
> + */
> +uint32_t pend, grpmask;
> +uint32_t pending = *gic_bmp_ptr32(s->pending, irq);
> +uint32_t edge_trigger = *gic_bmp_ptr32(s->edge_trigger, irq);
> +uint32_t level = *gic_bmp_ptr32(s->level, irq);
> +uint32_t group = *gic_bmp_ptr32(s->group, irq);
> +uint32_t grpmod = *gic_bmp_ptr32(s->grpmod, irq);
> +uint32_t enable = *gic_bmp_ptr32(s->enabled, irq);
> +
> +pend = pending | (~edge_trigger & level);
> +pend &= enable;
> +
> +if (s->gicd_ctlr & GICD_CTLR_DS) {
> +grpmod = 0;
> +}
> +
> +grpmask = 0;
> +if (s->gicd_ctlr & GICD_CTLR_EN_GRP1NS) {
> +grpmask |= group;
> +}
> +if (s->gicd_ctlr & GICD_CTLR_EN_GRP1S) {
> +grpmask |= (~group & grpmod);
> +}
> +if (s->gicd_ctlr & GICD_CTLR_EN_GRP0) {
> +grpmask |= (~group & ~grpmod);
> +}
> +pend &= grpmask;
> +
> +return pend;
> +}
> +
> +static uint32_t gicr_int_pending(GICv3CPUState *cs)
> +{
> +/* Recalculate which redistributor interrupts are actually pending,
> + * and return a 32-bit integer which has a bit set for each interrupt
> + * that is eligible to be signaled to the CPU interface.
> + *
> + * An interrupt is pending if:
> + *  + the PENDING latch is set OR it is level triggered and the input is 
> 1
> + *  + its ENABLE bit is set
> + *  + the GICD enable bit for its group is set
> + * Conveniently we can bulk-calculate this with bitwise operations.
> + */
> +uint32_t pend, grpmask, grpmod;
> +
> +pend = cs->gicr_ipendr0 | (~cs->edge_trigger & cs->level);
> +pend &= cs->gicr_ienabler0;
> +
> +if (cs->gic->gicd_ctlr & GICD_CTLR_DS) {
> +grpmod = 0;
> +} else {
> +grpmod = cs->gicr_igrpmodr0;
> +}
> +
> +grpmask = 0;
> +if (cs->gic->gicd_ctlr & GICD_CTLR_EN_GRP1NS) {
> +grpmask |= cs->gicr_igroupr0;
> +}
> +if (cs->gic->gicd_ctlr & GICD_CTLR_EN_GRP1S) {
> +grpmask |= (~cs->gicr_igroupr0 & grpmod);
> +}
> +if (cs->gic->gicd_ctlr & GICD_CTLR_EN_GRP0) {
> +grpmask |= (~cs->gicr_igroupr0 & ~grpmod);
> +}
> +pend &= grpmask;
> +
> +return pend;
> +}
> +
> +/* Update the interrupt status after state in a redistributor
> + * or CPU interface has changed, but don't tell the CPU i/f.
> + */
> +static void gicv3_redist_update_noirqset(GICv3CPUState *cs)
> +{
> +/* Find the highest priority pending interrupt among the
> + * redistributor interrupts (SGIs and PPIs).
> + */
> +bool seenbetter = false;
> +uint8_t prio;
> +int i;
> +uint32_t pend;
> +
> 

Re: [Qemu-devel] [PATCH v3 2/2] target-i386: add migration support for Intel LMCE

2016-06-07 Thread Haozhong Zhang
On 06/07/16 17:18, Eduardo Habkost wrote:
> On Fri, Jun 03, 2016 at 02:09:44PM +0800, Haozhong Zhang wrote:
> > LMCE is disabled by default, but a cpu option 'lmce=on/off' is provided
> > to enable/disable it. Migration is only allowed between VCPUs with the
> > same lmce option.
> > 
> > Signed-off-by: Haozhong Zhang 
> > ---
> > Cc: "Michael S. Tsirkin" 
> > Cc: Paolo Bonzini 
> > Cc: Richard Henderson 
> > Cc: Eduardo Habkost 
> > Cc: Boris Petkov 
> > Cc: Tony Luck 
> > Cc: Andi Kleen 
> > Cc: Ashok Raj 
> > ---
> >  include/hw/i386/pc.h  |  7 ++-
> >  target-i386/cpu.c |  1 +
> >  target-i386/cpu.h |  5 +
> >  target-i386/machine.c | 24 
> >  4 files changed, 36 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> > index ca23609..058eef9 100644
> > --- a/include/hw/i386/pc.h
> > +++ b/include/hw/i386/pc.h
> > @@ -357,7 +357,12 @@ int e820_get_num_entries(void);
> >  bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
> >  
> >  #define PC_COMPAT_2_6 \
> > -HW_COMPAT_2_6
> > +HW_COMPAT_2_6 \
> > +{\
> > +.driver   = TYPE_X86_CPU,\
> > +.property = "lmce",\
> > +.value= "off",\
> > +},
> 
> You don't need this if lmce is disabled by default.
>

Oh yes, I'll remove it in the next version.

> >  
> >  #define PC_COMPAT_2_5 \
> >  PC_COMPAT_2_6 \
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > index 9b4dbab..c69cc17 100644
> > --- a/target-i386/cpu.c
> > +++ b/target-i386/cpu.c
> > @@ -3232,6 +3232,7 @@ static Property x86_cpu_properties[] = {
> >  DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
> >  DEFINE_PROP_UINT32("xlevel2", X86CPU, env.cpuid_xlevel2, 0),
> >  DEFINE_PROP_STRING("hv-vendor-id", X86CPU, hyperv_vendor_id),
> > +DEFINE_PROP_BOOL("lmce", X86CPU, enable_lmce, false),
> 
> Maybe this belong to patch 1/2?
>

I think it's better to not allow users to enable LMCE until we fix the
migration in patch 2, so I didn't put it in patch 1.

> >  DEFINE_PROP_END_OF_LIST()
> >  };
> >  
> > diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> > index 2d411ba..b512fd6 100644
> > --- a/target-i386/cpu.h
> > +++ b/target-i386/cpu.h
> > @@ -1182,6 +1182,11 @@ struct X86CPU {
> >   */
> >  bool enable_pmu;
> >  
> > +/* Enable LMCE support which is set via cpu option 'lmce=on/off'. LMCE 
> > is
> > + * disabled by default to avoid breaking the migration between QEMU 
> > with
> > + * different LMCE support. Only migrating between QEMU with the same 
> > LMCE
> > + * support is allowed.
> > + */
> >  bool enable_lmce;
> >  
> >  /* in order to simplify APIC support, we leave this pointer to the
> > diff --git a/target-i386/machine.c b/target-i386/machine.c
> > index cb9adf2..b55d376 100644
> > --- a/target-i386/machine.c
> > +++ b/target-i386/machine.c
> > @@ -347,6 +347,11 @@ static int cpu_post_load(void *opaque, int version_id)
> >  return -EINVAL;
> >  }
> >  
> > +if (!cpu->enable_lmce && (env->mcg_cap & MCG_LMCE_P)) {
> > +error_report("LMCE not enabled");
> > +return -EINVAL;
> > +}
> 
> Nice. But the error message could be clearer, to indicate that it
> is about command-line configuration not being the same on both
> sides. What about something like:
>   config mismatch: VCPU has LMCE enabled, but "lmce" option is disabled
>

Yes, yours is much clearer. I'll change it in the next version.

Thanks,
Haozhong

> > +
> >  /*
> >   * Real mode guest segments register DPL should be zero.
> >   * Older KVM version were setting it wrongly.
> > @@ -896,6 +901,24 @@ static const VMStateDescription vmstate_tsc_khz = {
> >  }
> >  };
> >  
> > +static bool mcg_ext_ctl_needed(void *opaque)
> > +{
> > +X86CPU *cpu = opaque;
> > > +CPUX86State *env = &cpu->env;
> > +return cpu->enable_lmce && env->mcg_ext_ctl;
> > +}
> > +
> > +static const VMStateDescription vmstate_mcg_ext_ctl = {
> > +.name = "cpu/mcg_ext_ctl",
> > +.version_id = 1,
> > +.minimum_version_id = 1,
> > +.needed = mcg_ext_ctl_needed,
> > +.fields = (VMStateField[]) {
> > +VMSTATE_UINT64(env.mcg_ext_ctl, X86CPU),
> > +VMSTATE_END_OF_LIST()
> > +}
> > +};
> > +
> >  VMStateDescription vmstate_x86_cpu = {
> >  .name = "cpu",
> >  .version_id = 12,
> > @@ -1022,6 +1045,7 @@ VMStateDescription vmstate_x86_cpu = {
> >  #ifdef TARGET_X86_64
> > >  &vmstate_pkru,
> > >  #endif
> > > +&vmstate_mcg_ext_ctl,
> >  NULL
> >  }
> >  };
> > -- 
> > 2.8.3
> > 
> 
> -- 
> Eduardo



Re: [Qemu-devel] [PATCH v6 04/22] block: Introduce image file locking

2016-06-07 Thread Jason Dillaman
On Fri, Jun 3, 2016 at 4:48 AM, Fam Zheng  wrote:
> +typedef enum {
> +/* The values are ordered so that lower number implies higher 
> restriction.
> + * Starting from 1 to make 0 an invalid value.
> + * */
> +BDRV_LOCKF_EXCLUSIVE = 1,
> +BDRV_LOCKF_SHARED,
> +BDRV_LOCKF_UNLOCK,
> +} BdrvLockfCmd;
> +

We started to talk about new APIs in librbd to support this feature
where we don't need to worry about admin action should QEMU crash
while holding the lock.

Any chance for separating the UNLOCK enum into the exclusive vs shared
case? We could do some magic in the rbd block driver to guess how it
was locked but it seems like it would be cleaner (at least for us) to
explicitly call out what type of unlock you are requesting since it
will involve different API methods.

-- 
Jason



Re: [Qemu-devel] [PATCH v3 1/2] target-i386: KVM: add basic Intel LMCE support

2016-06-07 Thread Haozhong Zhang
On 06/07/16 17:10, Eduardo Habkost wrote:
> On Fri, Jun 03, 2016 at 02:09:43PM +0800, Haozhong Zhang wrote:
> [...]
> > +
> > +if (cpu->enable_lmce) {
> > +if (lmce_supported()) {
> > +cenv->mcg_cap |= MCG_LMCE_P;
> > +cenv->msr_ia32_feature_control |=
> > +MSR_IA32_FEATURE_CONTROL_LMCE |
> > +MSR_IA32_FEATURE_CONTROL_LOCKED;
> > +} else {
> > +error_report("Warning: KVM unavailable or not support 
> > LMCE, "
> > + "LMCE disabled");
> > +cpu->enable_lmce = false;
> 
> Please don't do that. If the user explicitly asked for LMCE, you
> should refuse to start if the host doesn't have the required
> capabilities.
>

OK, I'll change it in the next version.
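
Roughly, the init path would then behave like the sketch below. This is only
a model of the control flow: sketch_init_lmce() and its parameters are
made-up names, and the real code would set the MCG_LMCE_P and feature
control bits where the sketch sets *lmce_active.

```c
#include <stdbool.h>
#include <stdio.h>

/* Returns 0 on success and -1 if the configuration cannot be honoured.
 * enable_lmce models the 'lmce=on/off' cpu option, host_supports_lmce
 * models the result of lmce_supported(). */
static int sketch_init_lmce(bool enable_lmce, bool host_supports_lmce,
                            bool *lmce_active)
{
    *lmce_active = false;
    if (!enable_lmce) {
        return 0;                /* default: LMCE stays off */
    }
    if (!host_supports_lmce) {
        /* The user asked for LMCE explicitly: refuse to start instead of
         * silently disabling it. */
        fprintf(stderr, "lmce=on requested, but the host does not support LMCE\n");
        return -1;
    }
    *lmce_active = true;         /* here the real code would set the LMCE bits */
    return 0;
}
```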

Thanks,
Haozhong

> 
> > +}
> > +}
> > +
> >  cenv->mcg_ctl = ~(uint64_t)0;
> >  for (bank = 0; bank < MCE_BANKS_DEF; bank++) {
> >  cenv->mce_banks[bank * 4] = ~(uint64_t)0;
> [...]
> 
> -- 
> Eduardo
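
The behaviour requested in the review (fail instead of silently
disabling the feature) could look roughly like the sketch below. It is
not the next version of the patch: CpuConfig and init_lmce() are
hypothetical stand-ins for cpu->enable_lmce and the code around
lmce_supported(), and the host capability is passed in as a plain flag.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant CPU properties. */
typedef struct {
    bool enable_lmce;   /* the user explicitly asked for LMCE */
    bool lmce_active;   /* LMCE was actually set up */
} CpuConfig;

/* Returns 0 on success, -1 if a requested feature is unavailable.
 * The caller is expected to abort startup on -1. */
static int init_lmce(CpuConfig *cpu, bool host_supports_lmce)
{
    if (!cpu->enable_lmce) {
        return 0;               /* nothing requested, nothing to do */
    }
    if (!host_supports_lmce) {
        /* Do not clear enable_lmce and carry on; report and fail. */
        fprintf(stderr, "LMCE requested but not supported by the host\n");
        return -1;
    }
    cpu->lmce_active = true;
    return 0;
}
```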



Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-07 Thread Alex Williamson
On Wed, 8 Jun 2016 01:18:42 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Wednesday, June 08, 2016 6:42 AM
> > 
> > On Tue, 7 Jun 2016 03:03:32 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Tuesday, June 07, 2016 3:31 AM
> > > >
> > > > On Mon, 6 Jun 2016 10:44:25 -0700
> > > > Neo Jia  wrote:
> > > >  
> > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:  
> > > > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > > > Neo Jia  wrote:
> > > > > >
> > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > > > This intends to handle an intercepted channel I/O instruction. It
> > > > > > basically need to do the following thing:  
> > > > >
> > > > > > May I ask how and when QEMU knows that he needs to issue such VFIO
> > > > > > ioctl at first place?
> > > >
> > > > Yep, this is my question as well.  It sounds a bit like there's an
> > > > emulated device in QEMU that's trying to tell the mediated device when
> > > > to start an operation when we probably should be passing through
> > > > whatever i/o operations indicate that status directly to the mediated
> > > > device. Thanks,
> > > >
> > > > Alex  
> > >
> > > Below is copied from Dong's earlier post which said clear that
> > > a guest cmd submission will trigger the whole flow:
> > >
> > > 
> > > Explanation:
> > > Q1-Q4: Qemu side process.
> > > K1-K6: Kernel side process.
> > >
> > > Q1. Intercept a ssch instruction.
> > > Q2. Translate the guest ccw program to a user space ccw program
> > > (u_ccwchain).
> > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> > > K1. Copy from u_ccwchain to kernel (k_ccwchain).
> > > K2. Translate the user space ccw program to a kernel space ccw
> > > program, which becomes runnable for a real device.
> > > K3. With the necessary information contained in the orb passed in
> > > by Qemu, issue the k_ccwchain to the device, and wait event q
> > > for the I/O result.
> > > K4. Interrupt handler gets the I/O result, and wakes up the wait q.
> > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to
> > > update the user space irb.
> > > K6. Copy irb and scsw back to user space.
> > > Q4. Update the irb for the guest.
> > >   
> > 
> > Right, but this was the pre-mediated device approach, now we no longer
> > need step Q2 so we really only need Q1 and therefore Q3 to exist in
> > QEMU if those are operations that are not visible to the mediated
> > device; which they very well might be, since it's described as an
> > instruction rather than an i/o operation.  It's not terrible if that's
> > the case, vfio-pci has its own ioctl for doing a hot reset.  
> 
> 
> 
> >   
> > > My understanding is that such thing belongs to how device is mediated
> > > (so device driver specific), instead of something to be abstracted in
> > > VFIO which manages resource but doesn't care how resource is used.
> > >
> > > Actually we have same requirement in vGPU case, that a guest driver
> > > needs submit GPU commands through some MMIO register. vGPU device
> > > model will intercept the submission request (in its own way), do its
> > > necessary scan/audit to ensure correctness/security, and then submit
> > > to physical GPU through vendor specific interface.
> > >
> > > No difference with channel I/O here.  
> > 
> > Well, if the GPU command is submitted through an MMIO register, is that
> > MMIO register part of the mediated device?  If so, could the mediated
> > device recognize the command and do the scan/audit itself?  QEMU must
> > not be the point at which mediation occurs for security purposes, QEMU
> > is userspace and userspace is not to be trusted.  I'm still open to
> > ioctls where it makes sense, as above, we have PCI specific ioctls and
> > already, but we need to evaluate each one, why it needs to exist, and
> > whether we can skip it if the mediated device can trigger the action on
> > its own.  After all, that's why we're using the vfio api, so we can
> > re-use much of the existing infrastructure, especially for a vGPU that
> > exposes itself as a PCI device.  Thanks,
> >   
> 
> My point is that a guest submission on vGPU is just a normal trapped 
> register write, which is forwarded from Qemu to VFIO through pwrite 
> interface and then hit mediated vGPU device. The mediated device
> will recognize this register write as a submission request and then do
> necessary scan (looks we are saying same thing) and then submit to
> physical device driver. If loading ccw cmds on channel i/o are also 
> through some I/O registers, it can be implemented same way w/o
> introducing new ioctl. The r/w handler of mediated device can figure
> out whether it's a ccw submission or not. But my understanding might 
> be wrong here.

I 

Re: [Qemu-devel] [PATCH 4/5] ppc: Improve PCR bit selection in ppc_set_compat()

2016-06-07 Thread David Gibson
On Tue, Jun 07, 2016 at 05:39:39PM +0200, Thomas Huth wrote:
> When using an older PowerISA level, all the upper compatibility
> bits have to be enabled, too. For example when we want to run
> something in PowerISA 2.05 compatibility mode on POWER8, the bit
> for 2.06 has to be set beside the bit for 2.05.
> Additionally, to make sure that we do not set bits that are not
> supported by the host, we apply a mask with the known-to-be-good
> bits here, too.
> 
> Signed-off-by: Thomas Huth 

This one confused me a bit until I realised that, roughly speaking,
bits in the PCR turn features off, rather than turning features on.
Does that sound correct?
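
A small sketch of that reading, under stated assumptions: the bit
positions below are illustrative placeholders rather than the values
from target-ppc/cpu.h, and pcr_for_level() is a hypothetical helper
that mirrors the switch in the patch. Each set bit removes something,
so an older level needs its own bit plus the bits of every newer level,
and the host mask then drops bits the host does not know about.

```c
#include <stdint.h>

/* Illustrative bit positions only, NOT the real PCR layout. */
#define PCR_COMPAT_2_05 (1ull << 1)
#define PCR_COMPAT_2_06 (1ull << 2)
#define PCR_COMPAT_2_07 (1ull << 3)
#define PCR_TM_DIS      (1ull << 61)
#define PCR_VSX_DIS     (1ull << 62)

enum compat_level { COMPAT_NONE, COMPAT_2_05, COMPAT_2_06 };

/* Build the PCR value for a compatibility level, then keep only the
 * bits that the host CPU supports. */
static uint64_t pcr_for_level(enum compat_level level, uint64_t host_pcr_mask)
{
    uint64_t pcr;

    switch (level) {
    case COMPAT_2_05:
        /* Turn off everything introduced after 2.05. */
        pcr = PCR_TM_DIS | PCR_VSX_DIS | PCR_COMPAT_2_07 |
              PCR_COMPAT_2_06 | PCR_COMPAT_2_05;
        break;
    case COMPAT_2_06:
        pcr = PCR_TM_DIS | PCR_COMPAT_2_07 | PCR_COMPAT_2_06;
        break;
    default:
        pcr = 0;                /* raw mode: nothing disabled */
        break;
    }
    return pcr & host_pcr_mask;
}
```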

> ---
>  target-ppc/translate_init.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index fa09183..ee2bc14 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -9519,24 +9519,29 @@ void ppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version, Error **errp)
>  {
>  int ret = 0;
>  CPUPPCState *env = &cpu->env;
> +PowerPCCPUClass *host_pcc;
>  
>  cpu->cpu_version = cpu_version;
>  
>  switch (cpu_version) {
>  case CPU_POWERPC_LOGICAL_2_05:
> -env->spr[SPR_PCR] = PCR_COMPAT_2_05;
> +env->spr[SPR_PCR] = PCR_TM_DIS | PCR_VSX_DIS | PCR_COMPAT_2_07 |
> +PCR_COMPAT_2_06 | PCR_COMPAT_2_05;
>  break;
>  case CPU_POWERPC_LOGICAL_2_06:
> -env->spr[SPR_PCR] = PCR_COMPAT_2_06;
> -break;
>  case CPU_POWERPC_LOGICAL_2_06_PLUS:
> -env->spr[SPR_PCR] = PCR_COMPAT_2_06;
> +env->spr[SPR_PCR] = PCR_TM_DIS | PCR_COMPAT_2_07 | PCR_COMPAT_2_06;
>  break;
>  default:
>  env->spr[SPR_PCR] = 0;
>  break;
>  }
>  
> +host_pcc = kvm_ppc_get_host_cpu_class();
> +if (host_pcc) {
> +env->spr[SPR_PCR] &= host_pcc->pcr_mask;
> +}
> +
>  if (kvm_enabled()) {
>  ret = kvmppc_set_compat(cpu, cpu->cpu_version);
>  if (ret < 0) {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH 0/5] ppc: Improve sPAPR CPU compatibility mode settings

2016-06-07 Thread David Gibson
On Tue, Jun 07, 2016 at 05:39:35PM +0200, Thomas Huth wrote:
> If a guest currently only requests PowerISA 2.07 compatibility mode with
> the "ibm,client-architecture-support" firmware call, but does not
> specify a matching real PVR for the host CPU on a POWER8 host,
> it ends up in POWER7 / PowerISA 2.06 compatibility mode since QEMU
> does not support 2.07 compatibility mode yet. This currently happens
> when running a Linux guest on a POWER8NVL host, since Linux guests
> do not use the PVR for these CPUs for the "ibm,client-architecture-
> support" call yet (but I submitted a patch for the kernel to fix this
> issue there last week, so the support should soon be there, too).
> 
> Anyway, QEMU should also support a proper 2.07 compatibility mode
> if the host CPU can do it. So this patch series introduces such a
> mode and does some clean-ups and other fixes along the way (e.g.
> it splits the ambiguous pcr_mask setting into two variables, one
> for defining the valid bits in the PCR register, and one for
> storing the valid ISA levels).
> 
> Thomas Huth (5):
>   ppc/spapr: Refactor h_client_architecture_support() CPU parsing code
>   ppc: Split pcr_mask settings into supported bits and the register mask
>   ppc: Provide function to get CPU class of the host CPU
>   ppc: Improve PCR bit selection in ppc_set_compat()
>   ppc: Add PowerISA 2.07 compatibility mode
> 
>  hw/ppc/spapr_hcall.c| 63 
> +++--
>  target-ppc/cpu-qom.h|  3 ++-
>  target-ppc/cpu.h|  1 +
>  target-ppc/kvm.c| 19 ++
>  target-ppc/kvm_ppc.h|  7 +
>  target-ppc/translate_init.c | 22 +++-
>  6 files changed, 78 insertions(+), 37 deletions(-)

Applied to ppc-for-2.7, thanks.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




[Qemu-devel] [PATCH v20 10/10] support replication driver in blockdev-add

2016-06-07 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Reviewed-by: Eric Blake 
---
 qapi/block-core.json | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index e56cdf4..b9f9839 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -248,6 +248,7 @@
 #   2.3: 'host_floppy' deprecated
 #   2.5: 'host_floppy' dropped
 #   2.6: 'luks' added
+#   2.7: 'replication' added
 #
 # @backing_file: #optional the name of the backing file (for copy-on-write)
 #
@@ -1632,6 +1633,7 @@
 # Drivers that are supported in block device operations.
 #
 # @host_device, @host_cdrom: Since 2.1
+# @replication: Since 2.7
 #
 # Since: 2.0
 ##
@@ -1639,8 +1641,8 @@
   'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
 'dmg', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
 'http', 'https', 'luks', 'null-aio', 'null-co', 'parallels',
-'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp', 'vdi', 'vhdx',
-'vmdk', 'vpc', 'vvfat' ] }
+'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'replication', 'tftp',
+'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
 
 ##
 # @BlockdevOptionsFile
@@ -2045,6 +2047,19 @@
 { 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
 
 ##
+# @BlockdevOptionsReplication
+#
+# Driver specific block device options for replication
+#
+# @mode: the replication mode
+#
+# Since: 2.7
+##
+{ 'struct': 'BlockdevOptionsReplication',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'mode': 'ReplicationMode'  } }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.  Many options are available for all
@@ -2125,6 +2140,7 @@
   'quorum': 'BlockdevOptionsQuorum',
   'raw':'BlockdevOptionsGenericFormat',
 # TODO rbd: Wait for structured options
+  'replication':'BlockdevOptionsReplication',
 # TODO sheepdog: Wait for structured options
 # TODO ssh: Should take InetSocketAddress for 'host'?
   'tftp':   'BlockdevOptionsFile',
-- 
1.9.3






Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-07 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, June 08, 2016 6:42 AM
> 
> On Tue, 7 Jun 2016 03:03:32 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Tuesday, June 07, 2016 3:31 AM
> > >
> > > On Mon, 6 Jun 2016 10:44:25 -0700
> > > Neo Jia  wrote:
> > >
> > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:
> > > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > > Neo Jia  wrote:
> > > > >
> > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > > This intends to handle an intercepted channel I/O instruction. It
> > > > > basically need to do the following thing:
> > > >
> > > > > May I ask how and when QEMU knows that he needs to issue such VFIO
> > > > > ioctl at first place?
> > >
> > > Yep, this is my question as well.  It sounds a bit like there's an
> > > emulated device in QEMU that's trying to tell the mediated device when
> > > to start an operation when we probably should be passing through
> > > whatever i/o operations indicate that status directly to the mediated
> > > device. Thanks,
> > >
> > > Alex
> >
> > Below is copied from Dong's earlier post which said clear that
> > a guest cmd submission will trigger the whole flow:
> >
> > 
> > Explanation:
> > Q1-Q4: Qemu side process.
> > K1-K6: Kernel side process.
> >
> > Q1. Intercept a ssch instruction.
> > Q2. Translate the guest ccw program to a user space ccw program
> > (u_ccwchain).
> > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> > K1. Copy from u_ccwchain to kernel (k_ccwchain).
> > K2. Translate the user space ccw program to a kernel space ccw
> > program, which becomes runnable for a real device.
> > K3. With the necessary information contained in the orb passed in
> > by Qemu, issue the k_ccwchain to the device, and wait event q
> > for the I/O result.
> > K4. Interrupt handler gets the I/O result, and wakes up the wait q.
> > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to
> > update the user space irb.
> > K6. Copy irb and scsw back to user space.
> > Q4. Update the irb for the guest.
> > 
> 
> Right, but this was the pre-mediated device approach, now we no longer
> need step Q2 so we really only need Q1 and therefore Q3 to exist in
> QEMU if those are operations that are not visible to the mediated
> device; which they very well might be, since it's described as an
> instruction rather than an i/o operation.  It's not terrible if that's
> the case, vfio-pci has its own ioctl for doing a hot reset.



> 
> > My understanding is that such thing belongs to how device is mediated
> > (so device driver specific), instead of something to be abstracted in
> > VFIO which manages resource but doesn't care how resource is used.
> >
> > Actually we have same requirement in vGPU case, that a guest driver
> > needs submit GPU commands through some MMIO register. vGPU device
> > model will intercept the submission request (in its own way), do its
> > necessary scan/audit to ensure correctness/security, and then submit
> > to physical GPU through vendor specific interface.
> >
> > No difference with channel I/O here.
> 
> Well, if the GPU command is submitted through an MMIO register, is that
> MMIO register part of the mediated device?  If so, could the mediated
> device recognize the command and do the scan/audit itself?  QEMU must
> not be the point at which mediation occurs for security purposes, QEMU
> is userspace and userspace is not to be trusted.  I'm still open to
> ioctls where it makes sense, as above, we have PCI specific ioctls and
> already, but we need to evaluate each one, why it needs to exist, and
> whether we can skip it if the mediated device can trigger the action on
> its own.  After all, that's why we're using the vfio api, so we can
> re-use much of the existing infrastructure, especially for a vGPU that
> exposes itself as a PCI device.  Thanks,
> 

My point is that a guest submission on vGPU is just a normal trapped 
register write, which is forwarded from Qemu to VFIO through pwrite 
interface and then hit mediated vGPU device. The mediated device
will recognize this register write as a submission request and then do
necessary scan (looks we are saying same thing) and then submit to
physical device driver. If loading ccw cmds on channel i/o are also 
through some I/O registers, it can be implemented same way w/o
introducing new ioctl. The r/w handler of mediated device can figure
out whether it's a ccw submission or not. But my understanding might 
be wrong here.
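
A rough sketch of that idea, assuming a hypothetical mediated device
whose register file contains one "submit" register. REG_SUBMIT,
MdevState, cmd_is_valid() and mdev_write32() are made-up names for
illustration, not part of the proposed mdev or VFIO interfaces:

```c
#include <stdbool.h>
#include <stdint.h>

#define REG_COUNT  16       /* hypothetical register file: 16 x 32 bit */
#define REG_SUBMIT 0x10u    /* hypothetical offset of the submit register */

typedef struct {
    uint32_t regs[REG_COUNT];
    unsigned submissions;   /* commands forwarded to the physical driver */
    uint32_t last_cmd;
} MdevState;

/* Placeholder for the scan/audit step; here it only rejects commands
 * with the top bit set. */
static bool cmd_is_valid(uint32_t cmd)
{
    return (cmd & 0x80000000u) == 0;
}

/* Write handler reached through the ordinary region write path.
 * Returns 0 on success, -1 on a bad offset or a rejected command. */
static int mdev_write32(MdevState *s, uint32_t offset, uint32_t value)
{
    if (offset % 4 != 0 || offset / 4 >= REG_COUNT) {
        return -1;
    }
    if (offset == REG_SUBMIT) {
        /* The device itself recognises the submission; no extra ioctl. */
        if (!cmd_is_valid(value)) {
            return -1;
        }
        s->last_cmd = value;
        s->submissions++;
        return 0;
    }
    s->regs[offset / 4] = value;    /* plain register write */
    return 0;
}
```

Whether channel I/O can be expressed as such a register write is
exactly the open question in this thread.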

Thanks
Kevin



[Qemu-devel] [PATCH v20 06/10] auto complete active commit

2016-06-07 Thread Changlong Xie
From: Wen Congyang 

Auto complete mirror job in background to prevent from
blocking synchronously

Signed-off-by: Wen Congyang 
Signed-off-by: Changlong Xie 
---
 block/mirror.c| 13 +
 blockdev.c|  2 +-
 include/block/block_int.h |  3 ++-
 qemu-img.c|  2 +-
 4 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 80fd3c7..40fad19 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -805,7 +805,8 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
  BlockCompletionFunc *cb,
  void *opaque, Error **errp,
  const BlockJobDriver *driver,
- bool is_none_mode, BlockDriverState *base)
+ bool is_none_mode, BlockDriverState *base,
+ bool auto_complete)
 {
 MirrorBlockJob *s;
 
@@ -840,6 +841,9 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
 s->granularity = granularity;
 s->buf_size = ROUND_UP(buf_size, granularity);
 s->unmap = unmap;
+if (auto_complete) {
+s->should_complete = true;
+}
 
 s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
 if (!s->dirty_bitmap) {
@@ -877,14 +881,15 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
 mirror_start_job(bs, target, replaces,
  speed, granularity, buf_size,
  on_source_error, on_target_error, unmap, cb, opaque, errp,
- &mirror_job_driver, is_none_mode, base);
+ &mirror_job_driver, is_none_mode, base, false);
 }
 
 void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
  int64_t speed,
  BlockdevOnError on_error,
  BlockCompletionFunc *cb,
- void *opaque, Error **errp)
+ void *opaque, Error **errp,
+ bool auto_complete)
 {
 int64_t length, base_length;
 int orig_base_flags;
@@ -924,7 +929,7 @@ void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
 
 mirror_start_job(bs, base, NULL, speed, 0, 0,
  on_error, on_error, false, cb, opaque, &local_err,
- &commit_active_job_driver, false, base);
+ &commit_active_job_driver, false, base, auto_complete);
 if (local_err) {
 error_propagate(errp, local_err);
 goto error_restore_flags;
diff --git a/blockdev.c b/blockdev.c
index 717785e..734bfb0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3155,7 +3155,7 @@ void qmp_block_commit(const char *device,
 goto out;
 }
 commit_active_start(bs, base_bs, speed, on_error, block_job_cb,
-bs, &local_err);
+bs, &local_err, false);
 } else {
 commit_start(bs, base_bs, top_bs, speed, on_error, block_job_cb, bs,
  has_backing_file ? backing_file : NULL, &local_err);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 30a9717..89b66e8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -653,13 +653,14 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
  * @cb: Completion function for the job.
  * @opaque: Opaque pointer value passed to @cb.
  * @errp: Error object.
+ * @auto_complete: Auto complete the job.
  *
  */
 void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
  int64_t speed,
  BlockdevOnError on_error,
  BlockCompletionFunc *cb,
- void *opaque, Error **errp);
+ void *opaque, Error **errp, bool auto_complete);
 /*
  * mirror_start:
  * @bs: Block device to operate on.
diff --git a/qemu-img.c b/qemu-img.c
index 4b56ad3..e6a480f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -911,7 +911,7 @@ static int img_commit(int argc, char **argv)
 };
 
 commit_active_start(bs, base_bs, 0, BLOCKDEV_ON_ERROR_REPORT,
-common_block_job_cb, , &local_err);
+common_block_job_cb, , &local_err, false);
 if (local_err) {
 goto done;
 }
-- 
1.9.3






[Qemu-devel] [PATCH v20 07/10] Introduce new APIs to do replication operation

2016-06-07 Thread Changlong Xie
Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 Makefile.objs|   1 +
 qapi/block-core.json |  13 
 replication.c| 105 ++
 replication.h| 176 +++
 4 files changed, 295 insertions(+)
 create mode 100644 replication.c
 create mode 100644 replication.h

diff --git a/Makefile.objs b/Makefile.objs
index da49b71..f77f6b0 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -15,6 +15,7 @@ block-obj-$(CONFIG_POSIX) += aio-posix.o
 block-obj-$(CONFIG_WIN32) += aio-win32.o
 block-obj-y += block/
 block-obj-y += qemu-io-cmds.o
+block-obj-y += replication.o
 
 block-obj-m = block/
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 98a20d2..e56cdf4 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2032,6 +2032,19 @@
 '*read-pattern': 'QuorumReadPattern' } }
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.7
+##
+{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.  Many options are available for all
diff --git a/replication.c b/replication.c
new file mode 100644
index 0000000..03f4a2b
--- /dev/null
+++ b/replication.c
@@ -0,0 +1,105 @@
+/*
+ * Replication filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Changlong Xie 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "replication.h"
+
+static QLIST_HEAD(, ReplicationState) replication_states;
+
+ReplicationState *replication_new(void *opaque, ReplicationOps *ops)
+{
+ReplicationState *rs;
+
+assert(ops != NULL);
+rs = g_new0(ReplicationState, 1);
+rs->opaque = opaque;
+rs->ops = ops;
+QLIST_INSERT_HEAD(&replication_states, rs, node);
+
+return rs;
+}
+
+void replication_remove(ReplicationState *rs)
+{
+if (rs) {
+QLIST_REMOVE(rs, node);
+g_free(rs);
+}
+}
+
+/*
+ * The caller of the function MUST make sure vm stopped
+ */
+void replication_start_all(ReplicationMode mode, Error **errp)
+{
+ReplicationState *rs, *next;
+Error *local_err = NULL;
+
+QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+if (rs->ops && rs->ops->start) {
+rs->ops->start(rs, mode, &local_err);
+}
+if (local_err) {
+   error_propagate(errp, local_err);
+   return;
+}
+}
+}
+
+void replication_do_checkpoint_all(Error **errp)
+{
+ReplicationState *rs, *next;
+Error *local_err = NULL;
+
+QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+if (rs->ops && rs->ops->checkpoint) {
+rs->ops->checkpoint(rs, &local_err);
+}
+if (local_err) {
+   error_propagate(errp, local_err);
+   return;
+}
+}
+}
+
+void replication_get_error_all(Error **errp)
+{
+ReplicationState *rs, *next;
+Error *local_err = NULL;
+
+QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+if (rs->ops && rs->ops->get_error) {
+rs->ops->get_error(rs, &local_err);
+}
+if (local_err) {
+   error_propagate(errp, local_err);
+   return;
+}
+}
+}
+
+void replication_stop_all(bool failover, Error **errp)
+{
+ReplicationState *rs, *next;
+Error *local_err = NULL;
+
+QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+if (rs->ops && rs->ops->stop) {
+rs->ops->stop(rs, failover, &local_err);
+}
+if (local_err) {
+   error_propagate(errp, local_err);
+   return;
+}
+}
+}
diff --git a/replication.h b/replication.h
new file mode 100644
index 0000000..d9db696
--- /dev/null
+++ b/replication.h
@@ -0,0 +1,176 @@
+/*
+ * Replication filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Changlong Xie 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef REPLICATION_H
+#define REPLICATION_H
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "sysemu/sysemu.h"
+
+typedef struct ReplicationOps ReplicationOps;
+typedef struct ReplicationState ReplicationState;
+
+/**
+ * SECTION:replication.h
+ * @title:Base Replication System
+ * @short_description: interfaces for handling replication
+ *
+ * The 

[Qemu-devel] [PATCH v20 08/10] Implement new driver for block replication

2016-06-07 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 block/Makefile.objs |   1 +
 block/replication.c | 657 
 2 files changed, 658 insertions(+)
 create mode 100644 block/replication.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index fbfe647..5e28b45 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -23,6 +23,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o dirty-bitmap.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
+block-obj-y += replication.o
 
 block-obj-y += crypto.o
 
diff --git a/block/replication.c b/block/replication.c
new file mode 100644
index 0000000..1dabb5d
--- /dev/null
+++ b/block/replication.c
@@ -0,0 +1,657 @@
+/*
+ * Replication Block filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Wen Congyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "block/nbd.h"
+#include "block/blockjob.h"
+#include "block/block_int.h"
+#include "block/block_backup.h"
+#include "sysemu/block-backend.h"
+#include "qapi/error.h"
+#include "replication.h"
+
+typedef struct BDRVReplicationState {
+ReplicationMode mode;
+int replication_state;
+BdrvChild *active_disk;
+BdrvChild *hidden_disk;
+BdrvChild *secondary_disk;
+char *top_id;
+ReplicationState *rs;
+Error *blocker;
+int orig_hidden_flags;
+int orig_secondary_flags;
+int error;
+} BDRVReplicationState;
+
+enum {
+BLOCK_REPLICATION_NONE, /* block replication is not started */
+BLOCK_REPLICATION_RUNNING,  /* block replication is running */
+BLOCK_REPLICATION_FAILOVER, /* failover is running in background */
+BLOCK_REPLICATION_FAILOVER_FAILED,  /* failover failed */
+BLOCK_REPLICATION_DONE, /* block replication is done */
+};
+
+static void replication_start(ReplicationState *rs, ReplicationMode mode,
+  Error **errp);
+static void replication_do_checkpoint(ReplicationState *rs, Error **errp);
+static void replication_get_error(ReplicationState *rs, Error **errp);
+static void replication_stop(ReplicationState *rs, bool failover,
+ Error **errp);
+
+#define REPLICATION_MODE"mode"
+#define REPLICATION_TOP_ID  "top-id"
+static QemuOptsList replication_runtime_opts = {
+.name = "replication",
+.head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
+.desc = {
+{
+.name = REPLICATION_MODE,
+.type = QEMU_OPT_STRING,
+},
+{
+.name = REPLICATION_TOP_ID,
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
+static ReplicationOps replication_ops = {
+.start = replication_start,
+.checkpoint = replication_do_checkpoint,
+.get_error = replication_get_error,
+.stop = replication_stop,
+};
+
+static int replication_open(BlockDriverState *bs, QDict *options,
+int flags, Error **errp)
+{
+int ret;
+BDRVReplicationState *s = bs->opaque;
+Error *local_err = NULL;
+QemuOpts *opts = NULL;
+const char *mode;
+const char *top_id;
+
+ret = -EINVAL;
+opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &local_err);
+if (local_err) {
+goto fail;
+}
+
+mode = qemu_opt_get(opts, REPLICATION_MODE);
+if (!mode) {
+error_setg(&local_err, "Missing the option mode");
+goto fail;
+}
+
+if (!strcmp(mode, "primary")) {
+s->mode = REPLICATION_MODE_PRIMARY;
+} else if (!strcmp(mode, "secondary")) {
+s->mode = REPLICATION_MODE_SECONDARY;
+top_id = qemu_opt_get(opts, REPLICATION_TOP_ID);
+s->top_id = g_strdup(top_id);
+if (!s->top_id) {
+error_setg(&local_err, "Missing the option top-id");
+goto fail;
+}
+} else {
+error_setg(&local_err,
+   "The option mode's value should be primary or secondary");
+goto fail;
+}
+
+s->rs = replication_new(bs, &replication_ops);
+
+ret = 0;
+
+fail:
+qemu_opts_del(opts);
+error_propagate(errp, local_err);
+
+return ret;
+}
+
+static void replication_close(BlockDriverState *bs)
+{
+BDRVReplicationState *s = bs->opaque;
+
+if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
+replication_stop(s->rs, false, NULL);
+}
+
+if (s->mode == REPLICATION_MODE_SECONDARY) {
+

[Qemu-devel] [PATCH v20 09/10] tests: add unit test case for replication

2016-06-07 Thread Changlong Xie
Signed-off-by: Wen Congyang 
Signed-off-by: Changlong Xie 
---
 tests/.gitignore |   1 +
 tests/Makefile   |   4 +
 tests/test-replication.c | 555 +++
 3 files changed, 560 insertions(+)
 create mode 100644 tests/test-replication.c

diff --git a/tests/.gitignore b/tests/.gitignore
index a06a8ba..d22ab06 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -58,6 +58,7 @@ test-qmp-introspect.[ch]
 test-qmp-marshal.c
 test-qmp-output-visitor
 test-rcu-list
+test-replication
 test-rfifolock
 test-string-input-visitor
 test-string-output-visitor
diff --git a/tests/Makefile b/tests/Makefile
index a3e20e3..901b8e4 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -103,6 +103,7 @@ check-unit-y += tests/test-crypto-xts$(EXESUF)
 check-unit-y += tests/test-crypto-block$(EXESUF)
 gcov-files-test-logging-y = tests/test-logging.c
 check-unit-y += tests/test-logging$(EXESUF)
+check-unit-y += tests/test-replication$(EXESUF)
 
 check-block-$(CONFIG_POSIX) += tests/qemu-iotests-quick.sh
 
@@ -451,6 +452,9 @@ tests/test-base64$(EXESUF): tests/test-base64.o \
 
 tests/test-logging$(EXESUF): tests/test-logging.o $(test-util-obj-y)
 
+tests/test-replication$(EXESUF): tests/test-replication.o $(test-util-obj-y) \
+   $(test-block-obj-y)
+
 tests/test-qapi-types.c tests/test-qapi-types.h :\
 $(SRC_PATH)/tests/qapi-schema/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-types.py \
diff --git a/tests/test-replication.c b/tests/test-replication.c
new file mode 100644
index 0000000..b5bb2eb
--- /dev/null
+++ b/tests/test-replication.c
@@ -0,0 +1,555 @@
+/*
+ * Block replication tests
+ *
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Author: Changlong Xie 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "qapi/error.h"
+#include "replication.h"
+#include "block/block_int.h"
+#include "sysemu/block-backend.h"
+
+#define IMG_SIZE (64 * 1024 * 1024)
+
+/* primary */
+static char p_local_disk[] = "/tmp/p_local_disk.XXXXXX";
+
+/* secondary */
+#define S_ID "secondary-id"
+#define S_LOCAL_DISK_ID "secondary-local-disk-id"
+static char s_local_disk[] = "/tmp/s_local_disk.XXXXXX";
+static char s_active_disk[] = "/tmp/s_active_disk.XXXXXX";
+static char s_hidden_disk[] = "/tmp/s_hidden_disk.XXXXXX";
+
+/* FIXME: steal from blockdev.c */
+QemuOptsList qemu_drive_opts = {
+.name = "drive",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_drive_opts.head),
+.desc = {
+{ /* end of list */ }
+},
+};
+
+static void io_read(BlockDriverState *bs, long pattern, int64_t pattern_offset,
+int64_t pattern_count, int64_t offset, int64_t count,
+bool expect_failed)
+{
+char *buf;
+void *cmp_buf = NULL;
+int ret;
+
+/* alloc pattern buffer */
+if (pattern) {
+cmp_buf = g_malloc(pattern_count);
+memset(cmp_buf, pattern, pattern_count);
+}
+
+/* alloc read buffer */
+buf = qemu_blockalign(bs, count);
+memset(buf, 0xab, count);
+
+/* do read */
+ret = bdrv_read(bs, offset >> 9, (uint8_t *)buf, count >> 9);
+
+/* assert and compare buf */
+if (expect_failed) {
+g_assert(ret < 0);
+} else {
+g_assert(ret >= 0);
+if (pattern) {
+g_assert(memcmp(buf + pattern_offset, cmp_buf, pattern_count) <= 0);
+}
+}
+
+g_free(cmp_buf);
+qemu_vfree(buf);
+}
+
+static void io_write(BlockDriverState *bs, long pattern, int64_t offset,
+ int64_t count, bool expect_failed)
+{
+void *pattern_buf = NULL;
+int ret;
+
+/* alloc pattern buffer */
+if (pattern) {
+pattern_buf = qemu_blockalign(bs, count);
+memset(pattern_buf, pattern, count);
+}
+
+/* do write */
+if (pattern) {
+ret = bdrv_write(bs, offset >> 9, (uint8_t *)pattern_buf, count >> 9);
+} else {
+ret = bdrv_write_zeroes(bs, offset >> 9, count >> 9, 0);
+}
+
+/* assert */
+if (expect_failed) {
+g_assert(ret < 0);
+} else {
+g_assert(ret >= 0);
+}
+
+qemu_vfree(pattern_buf);
+}
+
+/*
+ * Create a uniquely-named empty temporary file.
+ */
+static void make_temp(char *template)
+{
+int fd;
+
+fd = mkstemp(template);
+g_assert(fd >= 0);
+close(fd);
+}
+
+
+static void prepare_imgs(void)
+{
+Error *local_err = NULL;
+
+make_temp(p_local_disk);
+make_temp(s_local_disk);
+make_temp(s_active_disk);
+make_temp(s_hidden_disk);
+
+/* Primary */
+bdrv_img_create(p_local_disk, "qcow2", NULL, NULL, NULL, IMG_SIZE,
+BDRV_O_RDWR, &local_err, true);
+g_assert(!local_err);
+
+/* Secondary */
+bdrv_img_create(s_local_disk, 

[Qemu-devel] [PATCH v20 03/10] Backup: export interfaces for extra serialization

2016-06-07 Thread Changlong Xie
Normal backup(sync='none') workflow:
step 1. NBD performs I/O write from client to server
   qcow2_co_writev
bdrv_co_writev
 ...
   bdrv_aligned_pwritev
notifier_with_return_list_notify -> backup_do_cow
 bdrv_driver_pwritev // write new contents

step 2. drive-backup sync=none
   backup_do_cow
   {
wait_for_overlapping_requests
cow_request_begin
for(; start < end; start++) {
bdrv_co_readv_no_serialising //read old contents from Secondary disk
bdrv_co_writev // write old contents to hidden-disk
}
cow_request_end
   }

step 3. Then roll back to "step 1" to write new contents to Secondary disk.

And for replication, we must make sure that we only read the old contents from
Secondary disk in order to keep contents consistent.

1) Replication workflow of Secondary
                                                          virtio-blk
                                                               ^
------->   1 NBD                                               |
   ||      server                                        3 replication
   ||        ^                                                 ^
   ||        |          backing                 backing        |
   ||  Secondary disk 6<-------- hidden-disk 5 <-------- active-disk 4
   ||        |                         ^
   ||        '-------------------------'
   ||           drive-backup sync=none 2

Hence, we need these interfaces to implement coarse-grained serialization
between COW of Secondary disk and the read operation of replication.

Example code showing how to use them:

#include "block/block_backup.h"

static coroutine_fn int xxx_co_readv()
{
    CowRequest req;
    BlockJob *job = secondary_disk->bs->job;
    int ret;

    if (job) {
        backup_wait_for_overlapping_requests(job, start, end);
        backup_cow_request_begin(&req, job, start, end);
        ret = bdrv_co_readv();
        backup_cow_request_end(&req);
        goto out;
    }
    ret = bdrv_co_readv();
out:
    return ret;
}
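
The exported helpers take a sector span and turn it into a cluster range
before waiting on or registering a request. A minimal standalone sketch of
that arithmetic follows. ClusterRange, sectors_to_clusters() and
ranges_overlap() are hypothetical names for illustration only, and the
overlap test is an assumption about how overlapping requests are detected,
not a copy of the code in block/backup.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round-up integer division, in the style of QEMU's DIV_ROUND_UP. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Half-open cluster range [start, end). Hypothetical type for this sketch. */
typedef struct ClusterRange {
    int64_t start;
    int64_t end;
} ClusterRange;

/* Convert a sector span into the cluster range that covers it. */
static ClusterRange sectors_to_clusters(int64_t sector_num, int nb_sectors,
                                        int64_t sectors_per_cluster)
{
    ClusterRange r;

    assert(sectors_per_cluster > 0 && nb_sectors >= 0);
    r.start = sector_num / sectors_per_cluster;
    r.end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
    return r;
}

/* Assumed overlap rule: each range starts before the other one ends. */
static bool ranges_overlap(ClusterRange a, ClusterRange b)
{
    return a.start < b.end && b.start < a.end;
}
```

With the default 64 KiB backup cluster and 512-byte sectors there are 128
sectors per cluster, so a read of 60 sectors starting at sector 100 touches
clusters 0 and 1 and would have to wait for any COW request covering either.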

Signed-off-by: Changlong Xie 
Signed-off-by: Wen Congyang 
---
 block/backup.c   | 41 ++---
 include/block/block_backup.h | 14 ++
 2 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index baf3936..b8e1c44 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -28,13 +28,6 @@
 #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
 #define SLICE_TIME 1ULL /* ns */
 
-typedef struct CowRequest {
-int64_t start;
-int64_t end;
-QLIST_ENTRY(CowRequest) list;
-CoQueue wait_queue; /* coroutines blocked on this request */
-} CowRequest;
-
 typedef struct BackupBlockJob {
 BlockJob common;
 BlockBackend *target;
@@ -264,6 +257,40 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
 bitmap_zero(backup_job->done_bitmap, len);
 }
 
+void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
+  int nb_sectors)
+{
+BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
+int64_t start, end;
+
+assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
+
+start = sector_num / sectors_per_cluster;
+end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+wait_for_overlapping_requests(backup_job, start, end);
+}
+
+void backup_cow_request_begin(CowRequest *req, BlockJob *job,
+  int64_t sector_num,
+  int nb_sectors)
+{
+BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
+int64_t start, end;
+
+assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
+
+start = sector_num / sectors_per_cluster;
+end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+cow_request_begin(req, backup_job, start, end);
+}
+
+void backup_cow_request_end(CowRequest *req)
+{
+cow_request_end(req);
+}
+
 static const BlockJobDriver backup_job_driver = {
 .instance_size  = sizeof(BackupBlockJob),
 .job_type   = BLOCK_JOB_TYPE_BACKUP,
diff --git a/include/block/block_backup.h b/include/block/block_backup.h
index 3753bcb..e0e7ce6 100644
--- a/include/block/block_backup.h
+++ b/include/block/block_backup.h
@@ -1,3 +1,17 @@
 #include "block/block_int.h"
 
+typedef struct CowRequest {
+int64_t start;
+int64_t end;
+QLIST_ENTRY(CowRequest) list;
+CoQueue wait_queue; /* coroutines blocked on this request */
+} CowRequest;
+
+void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
+  int nb_sectors);
+void backup_cow_request_begin(CowRequest *req, BlockJob *job,
+  

[Qemu-devel] [PATCH v20 04/10] Link backup into block core

2016-06-07 Thread Changlong Xie
From: Wen Congyang 

Some programs that add a dependency on it will use
the block layer directly.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Jeff Cody 
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 44a5416..fbfe647 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,12 +22,12 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o dirty-bitmap.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 block-obj-y += crypto.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags := $(LIBISCSI_CFLAGS)
 iscsi.o-libs   := $(LIBISCSI_LIBS)
-- 
1.9.3






[Qemu-devel] [PATCH v20 00/10] Block replication for continuous checkpoints

2016-06-07 Thread Changlong Xie
Block replication is a very important feature which is used for
continuous checkpoints (for example: COLO).

You can get the detailed information about block replication from here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

You can get the patch here:
https://github.com/Pating/qemu/tree/changlox/block-replication-v20

You can get the patch with framework here:
https://github.com/Pating/qemu/tree/changlox/colo_framework_v19

TODO:
1. Continuous block replication. It will be started after basic functions
   are accepted.

Change Log:
V20:
1. Rebase to the latest code
2. Address comments from Stefan
p8:
1. error_setg() with an error message when check_top_bs() fails.
2. remove bdrv_ref(s->hidden_disk->bs) since commit 5c438bc6
3. use block_job_cancel_sync() before active commit
p9: 
1. fix uninitialized 'pattern_buf'
2. introduce mkstemp(3) to fix unique filenames
3. use qemu_vfree() for qemu_blockalign() memory
4. add missing replication_start_all()
5. remove useless pattern for io_write()
V19:
1. Rebase to v2.6.0
2. Address comments from stefan
p3: a new patch that export interfaces for extra serialization
p8: 
1. call replication_stop() before freeing s->top_id
2. check top_bs
3. reopen file readonly in error return paths
4. enable extra serialization between read and COW
p9: try to handle SIGABRT
V18:
p6: add local_err in all replication callbacks to prevent "errp == NULL"
p7: add missing qemu_iovec_destroy(xxx)
V17:
1. Rebase to the latest code
p2: refactor backup_do_checkpoint addressed comments from Jeff Cody
p4: fix bugs in "drive_add buddy xxx" hmp commands
p6: add "since: 2.7"
p7: fix bug in replication_close(), add missing "qapi/error.h", add
test-replication
p8: add "since: 2.7"
V16:
1. Rebase to the newest codes
2. Address comments from Stefan & hailiang
p3: we don't need this patch now
p4: add "top-id" parameters for secondary
p6: fix NULL pointer in replication callbacks, remove unnecessary typedefs,
add doc comments that explain the semantics of Replication
p7: Refactor AioContext for thread safety, remove unnecessary get_top_bs()
*Note*: I'm working on replication testcase now, will send out in V17
V15:
1. Rebase to the newest codes
2. Fix typos and coding style, addressing Eric's comments
3. Address Stefan's comments
   1) Make backup_do_checkpoint public, drop the changes on BlockJobDriver
   2) Update the message and description for [PATCH 4/9]
   3) Make replication_(start/stop/do_checkpoint)_all as global interfaces
   4) Introduce AioContext lock to protect start/stop/do_checkpoint callbacks
   5) Use BdrvChild instead of holding on to BlockDriverState * pointers
4. Clear BDRV_O_INACTIVE for hidden disk's open_flags since commit 09e0c771  
5. Introduce replication_get_error_all to check replication status
6. Remove useless discard interface
V14:
1. Implement auto complete active commit
2. Implement active commit block job for replication.c
3. Address the comments from Stefan, add replication-specific API and data
   structure, also remove old block layer APIs
V13:
1. Rebase to the newest codes
2. Remove redundant macros and semicolon in replication.c
3. Fix typos in block-replication.txt
V12:
1. Rebase to the newest codes
2. Use backing reference to replace 'allow-write-backing-file'
V11:
1. Reopen the backing file when starting block replication if it is not
   opened in R/W mode
2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
   when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and
   its backing chain.
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
   reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing reference option according to Stefan Hajnoczi's
   suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
   if there are too many I/O requests.
V4:
1. Introduce a new driver replication to avoid touching nbd and qcow2.
V3:
1. use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target uses the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu(use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Changlong Xie (3):
  Backup: export interfaces for extra serialization
  Introduce new APIs to do replication operation
  tests: add unit test case for replication

Wen Congyang (7):
  unblock backup operations in backing file
  Backup: clear all bitmap when doing block checkpoint
  Link backup into 

[Qemu-devel] [PATCH v20 01/10] unblock backup operations in backing file

2016-06-07 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: Changlong Xie 
---
 block.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/block.c b/block.c
index 736432f..dcf63f4 100644
--- a/block.c
+++ b/block.c
@@ -1310,6 +1310,23 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
 /* Otherwise we won't be able to commit due to check in bdrv_commit */
 bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
 bs->backing_blocker);
+/*
+ * We do backup in 3 ways:
+ * 1. drive backup
+ *The target bs is new opened, and the source is top BDS
+ * 2. blockdev backup
+ *Both the source and the target are top BDSes.
+ * 3. internal backup(used for block replication)
+ *Both the source and the target are backing file
+ *
+ * In case 1 and 2, neither the source nor the target is the backing file.
+ * In case 3, we will block the top BDS, so there is only one block job
+ * for the top BDS and its backing chain.
+ */
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+bs->backing_blocker);
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
+bs->backing_blocker);
 out:
 bdrv_refresh_limits(bs, NULL);
 }
-- 
1.9.3






[Qemu-devel] [PATCH v20 02/10] Backup: clear all bitmap when doing block checkpoint

2016-06-07 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 block/backup.c   | 18 ++
 include/block/block_backup.h |  3 +++
 2 files changed, 21 insertions(+)
 create mode 100644 include/block/block_backup.h

diff --git a/block/backup.c b/block/backup.c
index feeb9f8..baf3936 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -17,6 +17,7 @@
 #include "block/block.h"
 #include "block/block_int.h"
 #include "block/blockjob.h"
+#include "block/block_backup.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/ratelimit.h"
@@ -246,6 +247,23 @@ static void backup_abort(BlockJob *job)
 }
 }
 
+void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+int64_t len;
+
+assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
+
+if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+error_setg(errp, "The backup job only supports block checkpoint in"
+   " sync=none mode");
+return;
+}
+
+len = DIV_ROUND_UP(backup_job->common.len, backup_job->cluster_size);
+bitmap_zero(backup_job->done_bitmap, len);
+}
+
 static const BlockJobDriver backup_job_driver = {
 .instance_size  = sizeof(BackupBlockJob),
 .job_type   = BLOCK_JOB_TYPE_BACKUP,
diff --git a/include/block/block_backup.h b/include/block/block_backup.h
new file mode 100644
index 000..3753bcb
--- /dev/null
+++ b/include/block/block_backup.h
@@ -0,0 +1,3 @@
+#include "block/block_int.h"
+
+void backup_do_checkpoint(BlockJob *job, Error **errp);
-- 
1.9.3
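
For readers following the checkpoint logic, here is a minimal standalone
sketch of what the last two lines of backup_do_checkpoint() amount to:
compute how many clusters cover the job length, then clear that many bits.
cluster_count() and toy_bitmap_zero() are hypothetical stand-ins written for
this note; they are not QEMU's bitmap API.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round-up integer division, in the style of QEMU's DIV_ROUND_UP. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Number of clusters needed to cover len bytes. */
static int64_t cluster_count(int64_t len, int64_t cluster_size)
{
    assert(len >= 0 && cluster_size > 0);
    return DIV_ROUND_UP(len, cluster_size);
}

/* Clear the first nbits bits of map, rounding up to whole words. */
static void toy_bitmap_zero(unsigned long *map, int64_t nbits)
{
    size_t words = DIV_ROUND_UP((size_t)nbits, BITS_PER_WORD);

    memset(map, 0, words * sizeof(unsigned long));
}
```

A 64 MiB image with 64 KiB clusters needs 1024 bits; clearing them all makes
every cluster eligible for copy-on-write again after the checkpoint.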






[Qemu-devel] [PATCH v20 05/10] docs: block replication's description

2016-06-07 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 docs/block-replication.txt | 239 +
 1 file changed, 239 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 000..c5fc18b
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,239 @@
+Block replication
+
+Copyright Fujitsu, Corp. 2016
+Copyright (c) 2016 Intel Corporation
+Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Block replication is used for continuous checkpoints. It is designed
+for COLO (COarse-grain LOck-stepping) where the Secondary VM is running.
+It can also be applied for FT/HA (Fault-tolerance/High Assurance) scenario,
+where the Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoints. The VM state of the Primary and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort during a vmstate checkpoint, the disk modification operations of
+the Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
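+         +----------------------+            +------------------------+
+         |Primary Write Requests|            |Secondary Write Requests|
+         +----------------------+            +------------------------+
+                   |                                      |
+                   |                                     (4)
+                   |                                      V
+                   |                               /-------------\
+                   |      Copy and Forward         |             |
+                   |---------(1)----------+        | Disk Buffer |
+                   |                      |        |             |
+                   |                     (3)       \-------------/
+                   |                 speculative      ^
+                   |                write through    (2)
+                   |                      |           |
+                   V                      V           |
+            +--------------+           +----------------+
+            | Primary Disk |           | Secondary Disk |
+            +--------------+           +----------------+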
+
+1) Primary write requests will be copied and forwarded to Secondary
+   QEMU.
+2) Before Primary write requests are written to Secondary disk, the
+   original sector content will be read from Secondary disk and
+   buffered in the Disk buffer, but it will not overwrite the existing
+   sector content (it could be from either "Secondary Write Requests" or
+   previous COW of "Primary Write Requests") in the Disk buffer.
+3) Primary write requests will be written to Secondary disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+
+         virtio-blk       ||
+             ^            ||                            .----------
+             |            ||                            | Secondary
+        1 Quorum          ||                            '----------
+         /      \         ||
+        /        \        ||
+   Primary    2 filter
+     disk         ^                                                              virtio-blk
+                  |                                                                   ^
+                3 NBD  ------->  3 NBD                                                |
+                client    ||     server                                           2 filter
+                          ||        ^                                                 ^
+--------.                 ||        |                                                 |
+Primary |                 ||  Secondary disk <--------- hidden-disk 5 <--------- active-disk 4
+--------'                 ||        |         backing         ^        backing
+                          ||        |                         |
+                          ||        |
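
The four workflow steps in the document above can be sketched as a toy
in-memory model. This is only an illustration of the buffering rules as
described there (old content is saved once, Secondary writes always win in
the buffer, and the buffer is dropped at a checkpoint). ToyReplica and every
function name below are hypothetical and do not correspond to QEMU's
replication driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NB_SECTORS 8

/* Toy model: one int per sector, plus a per-sector "buffered" flag. */
typedef struct ToyReplica {
    int secondary_disk[NB_SECTORS];
    int disk_buffer[NB_SECTORS];
    bool buffered[NB_SECTORS];
} ToyReplica;

/* Steps 2 and 3: save the old content once, then write through. */
static void primary_write(ToyReplica *r, int sector, int value)
{
    assert(sector >= 0 && sector < NB_SECTORS);
    if (!r->buffered[sector]) {
        /* Do not overwrite content already held in the buffer. */
        r->disk_buffer[sector] = r->secondary_disk[sector];
        r->buffered[sector] = true;
    }
    r->secondary_disk[sector] = value;
}

/* Step 4: Secondary writes go to the buffer and overwrite what is there. */
static void secondary_write(ToyReplica *r, int sector, int value)
{
    assert(sector >= 0 && sector < NB_SECTORS);
    r->disk_buffer[sector] = value;
    r->buffered[sector] = true;
}

/* The Secondary VM sees buffered content first, else the disk. */
static int secondary_read(const ToyReplica *r, int sector)
{
    assert(sector >= 0 && sector < NB_SECTORS);
    return r->buffered[sector] ? r->disk_buffer[sector]
                               : r->secondary_disk[sector];
}

/* At a checkpoint the buffered modifications are dropped. */
static void checkpoint(ToyReplica *r)
{
    memset(r->buffered, 0, sizeof(r->buffered));
}
```

In this model the Secondary VM keeps seeing its own view of a sector between
checkpoints even while forwarded Primary writes land on the Secondary disk,
and after checkpoint() both sides agree again.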

Re: [Qemu-devel] [PATCH COLO-Frame v17 00/34] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-06-07 Thread Hailiang Zhang

On 2016/6/7 20:06, Dr. David Alan Gilbert wrote:

* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:

This is the 17th version of COLO FT feature.

This is only the COLO frame part; you can get the whole code from GitHub:
https://github.com/coloft/qemu/commits/colo-v3.0-periodic-mode

Migration now uses the new QIOChannel API. This only affects COLO's
patches 9 and 12, which previously used the old qsb buffer and have been
updated to the new API. The changes involved are small.

Patch status:
Unreviewed: patch 32 ~ 35
Updated: patch 9, 12, 32


You've posted patches 1..34 - does 35 really exist?



Sorry, I made a mistake; there is no patch 35.
Compared with the last series, we dropped one patch related to the qsb buffer.


Thanks.
Hailiang


Dave



Cc: Stefan Hajnoczi 
Cc: Jeff Cody 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Juan Quintela 
Cc: Amit Shah 
Cc: Dr. David Alan Gilbert 
Cc: Jason Wang 

PS: This series has been in the community for a long time. It depends on
Changlong's block-replication series, which has been blocked for a long
time; we really need help reviewing both series.  Thanks.

TODO:
1. Checkpoint based on proxy in qemu
2. The capability of continuous FT
3. Optimize the VM's downtime during checkpoint

v17:
  - Rebase master to use the new QIOChannel API, only affect patch 9 and 12
  - Reorganize some ugly comments
  - Rename colo_sem to colo_exit_sem (patch 21)

v16:
  - Fix broken compile due to missing osdep.h
  - Add reviewed-by tag for patch 27, 28, 29
  - Rename the message send/receive helper function (patch 7, 13)
  - Simplify the codes by using some notifier helpers in QEMU (patch 32)
  - Remove the useless check in colo_add_buffer_filter() (patch 33)
  - Remove the previous patch 36, 37 which export filter_buffer_flush()
to release the buffered packets, we simplify it by stopping buffer
filter while doing checkpoint, which will flush the buffered packets
by default. (patch 34)
v15:
  - Continue the shutdown process if an error is encountered while sending
    the shutdown message to SVM. (patch 24)
  - Rename qemu_need_skip_netfilter to qemu_netfilter_can_skip and Remove
some useless comment. (patch 31, Jason)
  - Call object_new_with_props() directly to add filter in
colo_add_buffer_filter. (patch 34, Jason)
  - Re-implement colo_set_filter_status() based on COLOBufferFilters
list. (patch 35)
  - Re-implement colo_flush_filter_packets() based on COLOBufferFilters
list. (patch 37)
v14:
  - Re-implement the network processing based on netfilter (Jason Wang)
  - Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
  - Split two new patches (patch 27/28) from patch 29
  - Fix some other comments from Dave and Markus.

v13:
  - Refactor colo_*_cmd helper functions to use 'Error **errp' parameter
   instead of return value to indicate success or failure. (patch 10)
  - Remove the optional error message for COLO_EXIT event. (patch 25)
  - Use semaphore to notify colo/colo incoming loop that failover work is
finished. (patch 26)
  - Move COLO shutdown related codes to colo.c file. (patch 28)
  - Fix memory leak bug for colo incoming loop. (new patch 31)
  - Re-use some existing helper functions to realize the process of
saving/loading ram and device. (patch 32)
  - Fix some other comments from Dave and Markus.


zhanghailiang (34):
   configure: Add parameter for configure to enable/disable COLO support
   migration: Introduce capability 'x-colo' to migration
   COLO: migrate colo related info to secondary node
   migration: Integrate COLO checkpoint process into migration
   migration: Integrate COLO checkpoint process into loadvm
   COLO/migration: Create a new communication path from destination to
 source
   COLO: Implement COLO checkpoint protocol
   COLO: Add a new RunState RUN_STATE_COLO
   COLO: Save PVM state to secondary side when do checkpoint
   COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
   ram/COLO: Record the dirty pages that SVM received
   COLO: Load VMState into buffer before restore it
   COLO: Flush PVM's cached RAM into SVM's memory
   COLO: Add checkpoint-delay parameter for migrate-set-parameters
   COLO: Synchronize PVM's state to SVM periodically
   COLO failover: Introduce a new command to trigger a failover
   COLO failover: Introduce state to record failover process
   COLO: Implement failover work for Primary VM
   COLO: Implement failover work for Secondary VM
   qmp event: Add COLO_EXIT event to notify users while exited from COLO
   COLO failover: Shutdown related socket fd when do failover
   COLO failover: Don't do failover during loading VM's state
   COLO: Process shutdown command for VM in COLO state
   COLO: Update the global runstate after going into colo state
   savevm: Introduce two helper functions for save/find 

Re: [Qemu-devel] [PATCH 1/5] ppc/spapr: Refactor h_client_architecture_support() CPU parsing code

2016-06-07 Thread Michael Roth
Quoting Thomas Huth (2016-06-07 10:39:36)
> The h_client_architecture_support() function has become quite big
> and nested already. So factor out the code that takes care of the
> sPAPR compatibility PVRs (which will be modified by the following
> patches).
> 
> Signed-off-by: Thomas Huth 

Restructuring looks sane.

Reviewed-by: Michael Roth 

> ---
>  hw/ppc/spapr_hcall.c | 61 +++-
>  1 file changed, 36 insertions(+), 25 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 9a3f4ec..bb8f4de 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -922,6 +922,39 @@ static void do_set_compat(void *arg)
>  ((cpuver) == CPU_POWERPC_LOGICAL_2_06_PLUS) ? 2061 : \
>  ((cpuver) == CPU_POWERPC_LOGICAL_2_07) ? 2070 : 0)
> 
> +static void cas_handle_compat_cpu(PowerPCCPUClass *pcc, uint32_t pvr,
> +  unsigned max_lvl, unsigned *compat_lvl,
> +  unsigned *cpu_version)
> +{
> +unsigned lvl = get_compat_level(pvr);
> +bool is205, is206;
> +
> +if (!lvl) {
> +return;
> +}
> +
> +/* If it is a logical PVR, try to determine the highest level */
> +is205 = (pcc->pcr_mask & PCR_COMPAT_2_05) &&
> +(lvl == get_compat_level(CPU_POWERPC_LOGICAL_2_05));
> +is206 = (pcc->pcr_mask & PCR_COMPAT_2_06) &&
> +((lvl == get_compat_level(CPU_POWERPC_LOGICAL_2_06)) ||
> + (lvl == get_compat_level(CPU_POWERPC_LOGICAL_2_06_PLUS)));
> +
> +if (is205 || is206) {
> +if (!max_lvl) {
> +/* User did not set the level, choose the highest */
> +if (*compat_lvl <= lvl) {
> +*compat_lvl = lvl;
> +*cpu_version = pvr;
> +}
> +} else if (max_lvl >= lvl) {
> +/* User chose the level, don't set higher than this */
> +*compat_lvl = lvl;
> +*cpu_version = pvr;
> +}
> +}
> +}
> +
>  #define OV5_DRCONF_MEMORY 0x20
> 
>  static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
> @@ -931,7 +964,7 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu_,
>  {
>  target_ulong list = ppc64_phys_to_real(args[0]);
>  target_ulong ov_table, ov5;
> -PowerPCCPUClass *pcc_ = POWERPC_CPU_GET_CLASS(cpu_);
> +PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu_);
>  CPUState *cs;
>  bool cpu_match = false, cpu_update = true, memory_update = false;
>  unsigned old_cpu_version = cpu_->cpu_version;
> @@ -958,29 +991,7 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu_,
>  cpu_match = true;
>  cpu_version = cpu_->cpu_version;
>  } else if (!cpu_match) {
> -/* If it is a logical PVR, try to determine the highest level */
> -unsigned lvl = get_compat_level(pvr);
> -if (lvl) {
> -bool is205 = (pcc_->pcr_mask & PCR_COMPAT_2_05) &&
> - (lvl == get_compat_level(CPU_POWERPC_LOGICAL_2_05));
> -bool is206 = (pcc_->pcr_mask & PCR_COMPAT_2_06) &&
> -((lvl == get_compat_level(CPU_POWERPC_LOGICAL_2_06)) ||
> -(lvl == 
> get_compat_level(CPU_POWERPC_LOGICAL_2_06_PLUS)));
> -
> -if (is205 || is206) {
> -if (!max_lvl) {
> -/* User did not set the level, choose the highest */
> -if (compat_lvl <= lvl) {
> -compat_lvl = lvl;
> -cpu_version = pvr;
> -}
> -} else if (max_lvl >= lvl) {
> -/* User chose the level, don't set higher than this 
> */
> -compat_lvl = lvl;
> -cpu_version = pvr;
> -}
> -}
> -}
> +cas_handle_compat_cpu(pcc, pvr, max_lvl, &compat_lvl,
> +  &cpu_version);
>  }
>  /* Terminator record */
>  if (~pvr_mask & pvr) {
> @@ -990,7 +1001,7 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu_,
> 
>  /* Parsing finished */
>  trace_spapr_cas_pvr(cpu_->cpu_version, cpu_match,
> -cpu_version, pcc_->pcr_mask);
> +cpu_version, pcc->pcr_mask);
> 
>  /* Update CPUs */
>  if (old_cpu_version != cpu_version) {
> -- 
> 1.8.3.1
> 
> 
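
For readers who want to trace the selection rule in the factored-out
function, here is a standalone sketch that mirrors its branches. The logical
PVR values and the 2050/2060 levels are assumptions made for illustration
(the quoted hunk only shows the 2061 and 2070 mappings), and the two bool
parameters are hypothetical stand-ins for the pcr_mask checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Logical PVR values; assumed here for illustration. */
#define LOGICAL_2_05      0x0F000002u
#define LOGICAL_2_06      0x0F000003u
#define LOGICAL_2_06_PLUS 0x0F100003u
#define LOGICAL_2_07      0x0F000004u

/* Map a logical PVR to a comparable level, 0 if it is not logical. */
static unsigned get_compat_level(uint32_t pvr)
{
    switch (pvr) {
    case LOGICAL_2_05:      return 2050;
    case LOGICAL_2_06:      return 2060;
    case LOGICAL_2_06_PLUS: return 2061;
    case LOGICAL_2_07:      return 2070;
    default:                return 0;
    }
}

/* Mirror of the branches in cas_handle_compat_cpu() as quoted above. */
static void pick_compat(bool host_2_05, bool host_2_06, uint32_t pvr,
                        unsigned max_lvl, unsigned *compat_lvl,
                        uint32_t *cpu_version)
{
    unsigned lvl = get_compat_level(pvr);
    bool is205, is206;

    assert(compat_lvl && cpu_version);
    if (!lvl) {
        return;
    }

    is205 = host_2_05 && lvl == 2050;
    is206 = host_2_06 && (lvl == 2060 || lvl == 2061);
    if (!(is205 || is206)) {
        return;
    }

    if (!max_lvl) {
        /* User did not set the level, choose the highest */
        if (*compat_lvl <= lvl) {
            *compat_lvl = lvl;
            *cpu_version = pvr;
        }
    } else if (max_lvl >= lvl) {
        /* User chose the level, don't set higher than this */
        *compat_lvl = lvl;
        *cpu_version = pvr;
    }
}
```

Calling pick_compat() once per PVR entry in the guest's list reproduces the
loop body: without a user limit the highest supported level wins, and with a
limit any entry above it is ignored.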




Re: [Qemu-devel] [PATCH 3/5] ppc: Provide function to get CPU class of the host CPU

2016-06-07 Thread Michael Roth
Quoting Thomas Huth (2016-06-07 10:39:38)
> When running with KVM, we might be interested in some details
> of the host CPU class, too, so provide a function to get the
> corresponding CPU class.
> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Michael Roth 

> ---
>  target-ppc/kvm.c | 19 ++-
>  target-ppc/kvm_ppc.h |  7 +++
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 24d6032..6c15361 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -2329,6 +2329,19 @@ static PowerPCCPUClass 
> *ppc_cpu_get_family_class(PowerPCCPUClass *pcc)
>  return POWERPC_CPU_CLASS(oc);
>  }
> 
> +PowerPCCPUClass *kvm_ppc_get_host_cpu_class(void)
> +{
> +uint32_t host_pvr = mfpvr();
> +PowerPCCPUClass *pvr_pcc;
> +
> +pvr_pcc = ppc_cpu_class_by_pvr(host_pvr);
> +if (pvr_pcc == NULL) {
> +pvr_pcc = ppc_cpu_class_by_pvr_mask(host_pvr);
> +}
> +
> +return pvr_pcc;
> +}
> +
>  static int kvm_ppc_register_host_cpu_type(void)
>  {
>  TypeInfo type_info = {
> @@ -2336,14 +2349,10 @@ static int kvm_ppc_register_host_cpu_type(void)
>  .instance_init = kvmppc_host_cpu_initfn,
>  .class_init = kvmppc_host_cpu_class_init,
>  };
> -uint32_t host_pvr = mfpvr();
>  PowerPCCPUClass *pvr_pcc;
>  DeviceClass *dc;
> 
> -pvr_pcc = ppc_cpu_class_by_pvr(host_pvr);
> -if (pvr_pcc == NULL) {
> -pvr_pcc = ppc_cpu_class_by_pvr_mask(host_pvr);
> -}
> +pvr_pcc = kvm_ppc_get_host_cpu_class();
>  if (pvr_pcc == NULL) {
>  return -1;
>  }
> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
> index 3b2090e..20bfb59 100644
> --- a/target-ppc/kvm_ppc.h
> +++ b/target-ppc/kvm_ppc.h
> @@ -56,6 +56,7 @@ void kvmppc_hash64_write_pte(CPUPPCState *env, target_ulong 
> pte_index,
>  bool kvmppc_has_cap_fixup_hcalls(void);
>  int kvmppc_enable_hwrng(void);
>  int kvmppc_put_books_sregs(PowerPCCPU *cpu);
> +PowerPCCPUClass *kvm_ppc_get_host_cpu_class(void);
> 
>  #else
> 
> @@ -252,6 +253,12 @@ static inline int kvmppc_put_books_sregs(PowerPCCPU *cpu)
>  {
>  abort();
>  }
> +
> +static inline PowerPCCPUClass *kvm_ppc_get_host_cpu_class(void)
> +{
> +return NULL;
> +}
> +
>  #endif
> 
>  #ifndef CONFIG_KVM
> -- 
> 1.8.3.1
> 
> 
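
The lookup order in the new helper (exact PVR first, masked match as a
fallback) can be sketched on its own. The table, the ToyCpuClass type and
all function names here are hypothetical, the PVR values are only
illustrative, and the host PVR is passed in as a parameter instead of being
read with mfpvr().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CPU class entry. */
typedef struct ToyCpuClass {
    uint32_t pvr;
    uint32_t pvr_mask;
    const char *name;
} ToyCpuClass;

/* Illustrative table; not QEMU's real CPU model list. */
static const ToyCpuClass toy_classes[] = {
    { 0x003F0201u, 0xFFFF0000u, "POWER7 v2.1" },
    { 0x004D0200u, 0xFFFF0000u, "POWER8 v2.0" },
};
#define TOY_NB_CLASSES (sizeof(toy_classes) / sizeof(toy_classes[0]))

/* First pass: the PVR must match exactly. */
static const ToyCpuClass *toy_class_by_pvr(uint32_t pvr)
{
    for (size_t i = 0; i < TOY_NB_CLASSES; i++) {
        if (toy_classes[i].pvr == pvr) {
            return &toy_classes[i];
        }
    }
    return NULL;
}

/* Second pass: compare only the bits selected by the entry's mask. */
static const ToyCpuClass *toy_class_by_pvr_mask(uint32_t pvr)
{
    for (size_t i = 0; i < TOY_NB_CLASSES; i++) {
        uint32_t mask = toy_classes[i].pvr_mask;

        if ((toy_classes[i].pvr & mask) == (pvr & mask)) {
            return &toy_classes[i];
        }
    }
    return NULL;
}

/* Same shape as kvm_ppc_get_host_cpu_class(): exact, then masked. */
static const ToyCpuClass *toy_get_host_cpu_class(uint32_t host_pvr)
{
    const ToyCpuClass *c = toy_class_by_pvr(host_pvr);

    if (c == NULL) {
        c = toy_class_by_pvr_mask(host_pvr);
    }
    return c;
}
```

As in the patch, callers have to handle a NULL result, which is also what the
non-KVM stub returns.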




Re: [Qemu-devel] [PATCH 2/5] ppc: Split pcr_mask settings into supported bits and the register mask

2016-06-07 Thread Michael Roth
Quoting Thomas Huth (2016-06-07 10:39:37)
> The current pcr_mask values are ambiguous: Should these be the mask
> that defines valid bits in the PCR register? Or should these rather
> indicate which compatibility levels are possible? Anyway, POWER6 and
> POWER7 should certainly not use the same values here. So let's
> introduce an additional variable "pcr_supported" here which is
> used to indicate the valid compatibility levels, and use pcr_mask
> to signal the valid bits in the PCR register.
> 
> Signed-off-by: Thomas Huth 
> ---
>  hw/ppc/spapr_hcall.c| 4 ++--
>  target-ppc/cpu-qom.h| 3 ++-
>  target-ppc/cpu.h| 1 +
>  target-ppc/translate_init.c | 6 --
>  4 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index bb8f4de..cc16249 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -934,9 +934,9 @@ static void cas_handle_compat_cpu(PowerPCCPUClass *pcc, 
> uint32_t pvr,
>  }
> 
>  /* If it is a logical PVR, try to determine the highest level */
> -is205 = (pcc->pcr_mask & PCR_COMPAT_2_05) &&
> +is205 = (pcc->pcr_supported & PCR_COMPAT_2_05) &&
>  (lvl == get_compat_level(CPU_POWERPC_LOGICAL_2_05));
> -is206 = (pcc->pcr_mask & PCR_COMPAT_2_06) &&
> +is206 = (pcc->pcr_supported & PCR_COMPAT_2_06) &&
>  ((lvl == get_compat_level(CPU_POWERPC_LOGICAL_2_06)) ||
>   (lvl == get_compat_level(CPU_POWERPC_LOGICAL_2_06_PLUS)));
> 
> diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
> index 07358aa..969ecdf 100644
> --- a/target-ppc/cpu-qom.h
> +++ b/target-ppc/cpu-qom.h
> @@ -165,7 +165,8 @@ typedef struct PowerPCCPUClass {
> 
>  uint32_t pvr;
>  bool (*pvr_match)(struct PowerPCCPUClass *pcc, uint32_t pvr);
> -uint64_t pcr_mask;
> +uint64_t pcr_mask;  /* Available bits in PCR register */
> +uint64_t pcr_supported; /* Bits for supported PowerISA versions */
>  uint32_t svr;
>  uint64_t insns_flags;
>  uint64_t insns_flags2;
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index c2962d7..c00a3b5 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -2202,6 +2202,7 @@ enum {
>  enum {
>  PCR_COMPAT_2_05 = 1ull << (63-62),
>  PCR_COMPAT_2_06 = 1ull << (63-61),
> +PCR_COMPAT_2_07 = 1ull << (63-60),

This gets introduced somewhat subtly here, maybe move it to patch 5?

>  PCR_VEC_DIS = 1ull << (63-0), /* Vec. disable (bit NA since POWER8) */
>  PCR_VSX_DIS = 1ull << (63-1), /* VSX disable (bit NA since POWER8) */
>  PCR_TM_DIS  = 1ull << (63-2), /* Trans. memory disable (POWER8) */
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index a1db500..fa09183 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -8365,7 +8365,8 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
>  dc->desc = "POWER7";
>  dc->props = powerpc_servercpu_properties;
>  pcc->pvr_match = ppc_pvr_match_power7;
> -pcc->pcr_mask = PCR_COMPAT_2_05 | PCR_COMPAT_2_06;
> +pcc->pcr_mask = PCR_VEC_DIS | PCR_VSX_DIS | PCR_COMPAT_2_05;
> +pcc->pcr_supported = PCR_COMPAT_2_06 | PCR_COMPAT_2_05;
>  pcc->init_proc = init_proc_POWER7;
>  pcc->check_pow = check_pow_nocheck;
>  pcc->insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB |
> @@ -8445,7 +8446,8 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
>  dc->desc = "POWER8";
>  dc->props = powerpc_servercpu_properties;
>  pcc->pvr_match = ppc_pvr_match_power8;
> -pcc->pcr_mask = PCR_COMPAT_2_05 | PCR_COMPAT_2_06;
> +pcc->pcr_mask = PCR_TM_DIS | PCR_COMPAT_2_06 | PCR_COMPAT_2_05;
> +pcc->pcr_supported = PCR_COMPAT_2_07 | PCR_COMPAT_2_06 | PCR_COMPAT_2_05;
>  pcc->init_proc = init_proc_POWER8;
>  pcc->check_pow = check_pow_nocheck;
>  pcc->insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB |
> -- 
> 1.8.3.1
> 
> 




Re: [Qemu-devel] [PATCH] docker: Don't use eval trick on Makefile

2016-06-07 Thread Fam Zheng
On Tue, 06/07 15:30, Peter Maydell wrote:
> On 7 June 2016 at 15:00, Peter Maydell  wrote:
> > On 7 June 2016 at 04:24, Fam Zheng  wrote:
> >> On Mon, 06/06 12:53, Eduardo Habkost wrote:
> >>> The eval trick for defining DOCKER_SRC_COPY doesn't do anything
> >>> useful, as DOCKER_SRC_COPY is immediately expanded just after it
> >>> is defined, and CUR_TIME is already defined using ":=". Simply
> >>> define it using ":=" so it is evaluated only once.
> >>>
> >>> The eval trick was also triggering a weird error on Travis builds:
> >>>   qemu/tests/docker/Makefile.include:34: *** unterminated variable reference.  Stop.
> >>>
> >>> The issue is not easily reproducible (maybe it's a bug in some
> >>> versions of Make), but it is avoided by removing the eval trick.
> 
> > Hi; I'd like to apply this direct to master, because one of my build
> > test machines hits this error intermittently, and so without the fix
> > I can't reliably process any other pull requests.
> 
> Now applied, thanks.

No problem, sorry for the trouble!

Fam



Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data

2016-06-07 Thread Emilio G. Cota
On Tue, Jun 07, 2016 at 18:56:48 +0300, Sergey Fedorov wrote:
> On 07/06/16 04:05, Emilio G. Cota wrote:
> > On Sat, May 28, 2016 at 21:15:06 +0300, Sergey Fedorov wrote:
> >> On 25/05/16 04:13, Emilio G. Cota wrote:
> >>> diff --git a/util/qdist.c b/util/qdist.c
> >>> new file mode 100644
> >>> index 000..3343640
> >>> --- /dev/null
> >>> +++ b/util/qdist.c
> >>> @@ -0,0 +1,386 @@
> >> (snip)
> >>> +
> >>> +void qdist_add(struct qdist *dist, double x, long count)
> >>> +{
> >>> +struct qdist_entry *entry = NULL;
> >>> +
> >>> +if (dist->entries) {
> >>> +struct qdist_entry e;
> >>> +
> >>> +e.x = x;
> >>> +entry = bsearch(&e, dist->entries, dist->n, sizeof(e), qdist_cmp);
> >>> +}
> >>> +
> >>> +if (entry) {
> >>> +entry->count += count;
> >>> +return;
> >>> +}
> >>> +
> >>> +dist->entries = g_realloc(dist->entries,
> >>> +  sizeof(*dist->entries) * (dist->n + 1));
> >> Repeated doubling?
> > Can you please elaborate?
> 
> I mean dynamic array with a growth factor of 2
> [https://en.wikipedia.org/wiki/Dynamic_array].

Changed to:

diff --git a/include/qemu/qdist.h b/include/qemu/qdist.h
index 6d8b701..f30050c 100644
--- a/include/qemu/qdist.h
+++ b/include/qemu/qdist.h
@@ -29,6 +29,7 @@ struct qdist_entry {
 struct qdist {
 struct qdist_entry *entries;
 size_t n;
+size_t size;
 };
 
 #define QDIST_PR_BORDER BIT(0)
diff --git a/util/qdist.c b/util/qdist.c
index dc9dbd1..3b54354 100644
--- a/util/qdist.c
+++ b/util/qdist.c
@@ -16,6 +16,7 @@
 void qdist_init(struct qdist *dist)
 {
 dist->entries = NULL;
+dist->size = 0;
 dist->n = 0;
 }
 
@@ -58,8 +59,11 @@ void qdist_add(struct qdist *dist, double x, long count)
 return;
 }
 
-dist->entries = g_realloc(dist->entries,
-  sizeof(*dist->entries) * (dist->n + 1));
+if (unlikely(dist->n == dist->size)) {
+dist->size = dist->size ? dist->size * 2 : 1;
+dist->entries = g_realloc(dist->entries,
+  sizeof(*dist->entries) * (dist->size));
+}
 dist->n++;
 entry = &dist->entries[dist->n - 1];
 entry->x = x;
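
For reference, the growth policy in the hunk above can be sketched in isolation as follows. This is only an illustration under stated assumptions: it uses plain `realloc()` with an explicit failure return instead of `g_realloc()` (which aborts on failure), and the `ex_` names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct ex_entry {
    double x;
    long count;
};

struct ex_vec {
    struct ex_entry *entries;
    size_t n;    /* entries in use */
    size_t size; /* entries allocated */
};

/* Append one entry, doubling the capacity whenever the array is full.
 * Returns 0 on success, -1 if the allocation fails. */
static int ex_vec_push(struct ex_vec *v, double x, long count)
{
    if (v->n == v->size) {
        size_t new_size = v->size ? v->size * 2 : 1;
        struct ex_entry *p = realloc(v->entries, new_size * sizeof(*p));

        if (!p) {
            return -1;
        }
        v->entries = p;
        v->size = new_size;
    }
    v->entries[v->n].x = x;
    v->entries[v->n].count = count;
    v->n++;
    return 0;
}
```

With doubling, appending n entries triggers only about log2(n) reallocations, so the amortized cost per append is constant, at the price of up to 2x over-allocation.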


> >> (snip)
> >>> +static char *qdist_pr_internal(const struct qdist *dist)
> >>> +{
> >>> +double min, max, step;
> >>> +GString *s = g_string_new("");
> >>> +size_t i;
> >>> +
> >>> +/* if only one entry, its printout will be either full or empty */
> >>> +if (dist->n == 1) {
> >>> +if (dist->entries[0].count) {
> >>> +g_string_append_unichar(s, qdist_blocks[QDIST_NR_BLOCK_CODES - 1]);
> >>> +} else {
> >>> +g_string_append_c(s, ' ');
> >>> +}
> >>> +goto out;
> >>> +}
> >>> +
> >>> +/* get min and max counts */
> >>> +min = dist->entries[0].count;
> >>> +max = min;
> >>> +for (i = 0; i < dist->n; i++) {
> >>> +struct qdist_entry *e = &dist->entries[i];
> >>> +
> >>> +if (e->count < min) {
> >>> +min = e->count;
> >>> +}
> >>> +if (e->count > max) {
> >>> +max = e->count;
> >>> +}
> >>> +}
> >>> +
> >>> +/* floor((count - min) * step) will give us the block index */
> >>> +step = (QDIST_NR_BLOCK_CODES - 1) / (max - min);
> >>> +
> >>> +for (i = 0; i < dist->n; i++) {
> >>> +struct qdist_entry *e = &dist->entries[i];
> >>> +int index;
> >>> +
> >>> +/* make an exception with 0; instead of using block[0], print a space */
> >>> +if (e->count) {
> >>> +index = (int)((e->count - min) * step);
> >> So "e->count == min" gives us one eighth block instead of just space?
> > Yes, only 0 can print a space.
> 
> So our scale is not linear. I think some users might get confused by this.

That's correct. I think special-casing 0 makes sense though, since
it increases the signal-to-noise ratio of the histogram. For example:

1) 0 as ' ':
TB hash occupancy   31.84% avg chain occ. Histogram: [0,10)%|▆ █  ▅▁▃▁▁|[90,100]%
TB hash avg chain   1.015 buckets. Histogram: 1|█▁▁|3

2) 0 as '1/8':
TB hash occupancy   32.07% avg chain occ. Histogram: [0,10)%|▆▁█▁▁▅▁▃▁▁|[90,100]%
TB hash avg chain   1.015 buckets. Histogram: 1|▇▁▁|3

I think in these examples most users would be less confused by 1) than by 2).
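
To make the non-linear scale concrete, here is a rough standalone sketch of the mapping discussed above: a count of 0 prints a space, and every other count is scaled into a block index. It assumes 8 block glyphs (1/8 to 8/8), and the names `EX_NR_BLOCKS` and `ex_block_index()` are made up for illustration; it is not the code from the patch.

```c
#define EX_NR_BLOCKS 8 /* assumed number of block glyphs, 1/8 .. 8/8 */

/* Returns -1 when a space should be printed (count == 0), otherwise an
 * index in [0, EX_NR_BLOCKS - 1] selecting the block glyph. */
static int ex_block_index(long count, long min, long max)
{
    double step;

    if (count == 0) {
        return -1;
    }
    if (max == min) {
        /* all counts equal: print a full block */
        return EX_NR_BLOCKS - 1;
    }
    /* floor((count - min) * step) gives the block index */
    step = (double)(EX_NR_BLOCKS - 1) / (double)(max - min);
    return (int)((double)(count - min) * step);
}
```

Note that a non-zero count equal to min still maps to index 0 (the 1/8 block), which is the behaviour Sergey pointed out.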

(snip)
> >>> +to->n = from->n;
> >>> +memcpy(to->entries, from->entries, sizeof(*to->entries) * to->n);
> >>> +return;
> >>> +}
> >>> +
> >>> + rebin:
> 
> By the way, here's a space before the 'rebin' label.

Yes, I always do this.
It prevents diff from mistaking the label for a function definition,
and thus wrongly using the label as context. See:
  https://lkml.org/lkml/2010/6/16/312


> >>> +j_min = 0;
> >>> +for (i = 0; i < n; i++) {
> >>> +double x;
> >>> +double left, right;
> >>> +
> >>> +left = xmin + i * step;
> >>> +right = xmin + (i + 1) * step;
> >>> 

Re: [Qemu-devel] [PATCH v3] spapr: Ensure all LMBs are represented in ibm, dynamic-memory

2016-06-07 Thread Michael Roth
Quoting Bharata B Rao (2016-06-07 00:19:03)
> Memory hotplug can fail for some combinations of RAM and maxmem when
> DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
> on maximum addressable memory returned by guest and this value is currently
> being calculated wrongly by the guest kernel routine memory_hotplug_max().
> While there is an attempt to fix the guest kernel, this patch works
> around the problem within QEMU itself.
> 
> memory_hotplug_max() routine in the guest kernel arrives at max
> addressable memory by multiplying lmb-size with the lmb-count obtained
> from ibm,dynamic-memory property. There are two assumptions here:
> 
> - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
>   where only hot-pluggable LMBs are present in this property.
> - The memory area comprising of RAM and hotplug region is contiguous: This
>   needn't be true always for PowerKVM as there can be gap between
>   boot time RAM and hotplug region.
> 
> To work around this guest kernel bug, ensure that ibm,dynamic-memory
> has information about all the LMBs (RMA, boot-time LMBs, future
> hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
> hotpluggable region).
> 
> RMA is represented separately by memory@0 node. Hence mark RMA LMBs
> and also the LMBs for the gap b/n RAM and hotpluggable region as
> reserved so that these LMBs are not recounted/counted by guest.
> 
> Signed-off-by: Bharata B Rao 
> ---
> Changes in v3:
> 
> - Not touching spapr_create_lmb_dr_connectors() so that we continue
>   to have DRC objects for only hotpluggable LMBs.
> - Simplified the logic of creating dynamic-memory node based on comments
>   from Michael Roth and David Gibson.
> 
> v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html
> 
>  hw/ppc/spapr.c | 51 
> --
>  include/hw/ppc/spapr.h |  5 +++--
>  2 files changed, 36 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0636642..9d1d43d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -762,14 +762,17 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
>  int ret, i, offset;
>  uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
>  uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> -uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size;
> +uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
> +uint32_t nr_lmbs = (spapr->hotplug_memory.base +
> +   memory_region_size(&spapr->hotplug_memory.mr)) /
> +   lmb_size;
>  uint32_t *int_buf, *cur_index, buf_len;
>  int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> 
>  /*
> - * Don't create the node if there are no DR LMBs.
> + * Don't create the node if there is no hotpluggable memory
>   */
> -if (!nr_lmbs) {
> +if (machine->ram_size == machine->maxram_size) {
>  return 0;
>  }
> 
> @@ -805,24 +808,36 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
>  for (i = 0; i < nr_lmbs; i++) {
>  sPAPRDRConnector *drc;
>  sPAPRDRConnectorClass *drck;

Since these ^ are only used if (i >= hotplug_lmb_start), it might be
clearer to move them there now.

> -uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;;
> +uint64_t addr = i * lmb_size;
>  uint32_t *dynamic_memory = cur_index;
> 
> -drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> -   addr/lmb_size);
> -g_assert(drc);
> -drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> -
> -dynamic_memory[0] = cpu_to_be32(addr >> 32);
> -dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> -dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> -dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> -dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> -if (addr < machine->ram_size ||
> -memory_region_present(get_system_memory(), addr)) {
> -dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> +if (i >= hotplug_lmb_start) {
> +drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +   addr / lmb_size);

Could just be i

> +g_assert(drc);
> +drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +dynamic_memory[0] = cpu_to_be32(addr >> 32);
> +dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> +dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> +dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> +dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> +if (memory_region_present(get_system_memory(), addr)) {
> +

Re: [Qemu-devel] [PATCH v4 04/28] qapi: Add parameter to visit_end_*

2016-06-07 Thread Eric Blake
On 06/01/2016 09:36 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> Rather than making the dealloc visitor track a stack of pointers
>> remembered during visit_start_* in order to free them during
>> visit_end_*, it's a lot easier to just make all callers pass the
>> same pointer to visit_end_*.  The generated code has access to the
>> same pointer, while all other users are doing virtual walks and
>> can pass NULL.  The dealloc visitor is then greatly simplified.
>>
>> All three visit_end_*() functions intentionally take a void**,
>> even though the visit_start_*() functions differ between void**,
>> GenericList**, and GenericAlternate**.  This is done for several
>> reasons: when doing a virtual walk, passing NULL doesn't care
>> what the type is, but when doing a generated walk, we already
>> have to cast the caller's specific FOO* to call visit_start,
>> while using void** lets us use visit_end without a cast. Also,
>> an upcoming patch will add a clone visitor that wants to use
>> the same implementation for all three visit_end callbacks,
>> which is made easier if all three share the same signature.
>>
>> Signed-off-by: Eric Blake 
> [...]
>> diff --git a/qapi/qmp-input-visitor.c b/qapi/qmp-input-visitor.c
>> index aea90a1..84f32fc 100644
>> --- a/qapi/qmp-input-visitor.c
>> +++ b/qapi/qmp-input-visitor.c
>> @@ -145,7 +145,7 @@ static void qmp_input_check_struct(Visitor *v, Error **errp)
>>  }
>>  }
>>
>> -static void qmp_input_pop(Visitor *v)
>> +static void qmp_input_pop(Visitor *v, void **obj)
>>  {
>>  QmpInputVisitor *qiv = to_qiv(v);
>>  StackObject *tos = &qiv->stack[qiv->nb_stack - 1];
> 
> You could assert @obj matches tos->obj.  Same for the other visitors
> that still need a stack.  Adding a stack to the ones that don't just for
> the assertion seems excessive, though.

At this point, only the QMP visitors track a stack (the OptsVisitor does
not, and we just got rid of the dealloc visitor stack); but since the
string visitors only support a top-level visit with no struct or nested
list, those can also support an assert. That makes 4 of the 6 visitors
at this stage in the series; and only 4/8 when the clone and json
visitors are added.  I'll go ahead and add it in, though.
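
As a rough picture of the assertion being discussed, here is a simplified standalone sketch of a visitor-like stack whose pop checks that the caller passes back the pointer recorded at push time. The `ex_` names and the fixed-size stack are assumptions made for illustration; the real visitors take a `void **` and allow NULL for virtual walks, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

#define EX_STACK_MAX 16 /* arbitrary depth for the sketch */

struct ex_visitor {
    void *stack[EX_STACK_MAX];
    size_t nb_stack;
};

/* Remember the object pointer when a struct/list visit starts. */
static void ex_push(struct ex_visitor *v, void *obj)
{
    assert(v->nb_stack < EX_STACK_MAX);
    v->stack[v->nb_stack++] = obj;
}

/* When the visit ends, the caller hands back the same pointer; assert
 * that it matches the top of the stack before popping. */
static void ex_pop(struct ex_visitor *v, void *obj)
{
    assert(v->nb_stack > 0);
    assert(v->stack[v->nb_stack - 1] == obj);
    v->nb_stack--;
}
```

A mismatched start/end pair then trips the assertion immediately instead of silently corrupting the walk.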

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data

2016-06-07 Thread Emilio G. Cota
On Tue, Jun 07, 2016 at 17:06:16 +0300, Sergey Fedorov wrote:
> On 07/06/16 02:40, Emilio G. Cota wrote:
> > On Fri, Jun 03, 2016 at 20:46:07 +0300, Sergey Fedorov wrote:
> >> Maybe something like
> >> https://en.wikipedia.org/wiki/Kahan_summation_algorithm could help?
> > That algorithm is overkill for what we're doing. Pairwise summation
> > should suffice:
> >
> > diff --git a/util/qdist.c b/util/qdist.c
> > index 3343640..909bd2b 100644
> > --- a/util/qdist.c
> > +++ b/util/qdist.c
> > @@ -367,20 +367,34 @@ unsigned long qdist_sample_count(const struct qdist *dist)
> >  return count;
> >  }
> >  
> > +static double qdist_pairwise_avg(const struct qdist *dist, size_t index,
> > + size_t n, unsigned long count)
> > +{
> > +if (n <= 2) {
> 
> We would like to amortize the overhead of the recursion by making the
> cut-off sufficiently large.

Yes, this was just for showing what it looked like.

We can use 128 here like JuliaLang does:
  https://github.com/JuliaLang/julia/blob/d98f2c0dcd/base/arraymath.jl#L366
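
For illustration, pairwise summation with such a cut-off looks roughly like the following standalone sketch. It sums a plain array of doubles rather than computing the weighted average over qdist entries, and `EX_PAIRWISE_CUTOFF` and `ex_pairwise_sum()` are hypothetical names, so treat it as a picture of the technique and not as the final patch.

```c
#include <stddef.h>

#define EX_PAIRWISE_CUTOFF 128 /* below this, fall back to a plain loop */

/* Sum a[0..n-1] by recursively splitting the range in half. Rounding
 * error grows roughly with log(n) instead of n, and the cut-off
 * amortizes the recursion overhead. */
static double ex_pairwise_sum(const double *a, size_t n)
{
    size_t half;

    if (n <= EX_PAIRWISE_CUTOFF) {
        double sum = 0.0;
        size_t i;

        for (i = 0; i < n; i++) {
            sum += a[i];
        }
        return sum;
    }
    half = n / 2;
    return ex_pairwise_sum(a, half) + ex_pairwise_sum(a + half, n - half);
}
```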

(snip)
> Otherwise looks good.

Thanks!

Emilio



Re: [Qemu-devel] [Qemu-discuss] -serial option broken in master?

2016-06-07 Thread xiaoqiang zhao


> On 8 June 2016, at 05:24, Peter Maydell wrote:
> 
>> On 7 June 2016 at 15:47, Jérôme Forissier wrote:
>> Hi,
>> 
>> I just noticed this error [1] (QEMU master branch):
>> 
>> ../qemu/arm-softmmu/qemu-system-arm -nographic -monitor none -machine
>> virt -machine secure=on -cpu cortex-a15 -m 1057 -serial stdio -serial
>> file:serial1.log -bios
>> /home/travis/optee_repo/build/../out/bios-qemu/bios.bin
>> Unexpected error in parse_chr() at hw/core/qdev-properties-system.c:149:
>> qemu-system-arm: Property 'pl011.chardev' can't take value 'serial0',
>> it's in use
>> 
>> FYI, revert commits e5fabad7ccfd ("char: get rid of
>> qemu_char_get_next_serial") and f0d1d2c115df ("hw/char: QOM'ify pl011
>> model"), and the problem disappears.
>> 
>> Should I use a different syntax?
> 
> No, it's a bug that we broke this somehow. Xiaoqiang, could you
> have a look at this, please?
> 
> thanks
> -- PMM
> 
Okay!




Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-07 Thread Alex Williamson
On Tue, 7 Jun 2016 03:03:32 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Tuesday, June 07, 2016 3:31 AM
> > 
> > On Mon, 6 Jun 2016 10:44:25 -0700
> > Neo Jia  wrote:
> >   
> > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:  
> > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > Neo Jia  wrote:
> > > >
> > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > > This intends to handle an intercepted channel I/O instruction. It
> > > > > basically needs to do the following:
> > >
> > > > May I ask how and when QEMU knows that it needs to issue such a VFIO
> > > > ioctl in the first place?
> > 
> > Yep, this is my question as well.  It sounds a bit like there's an
> > emulated device in QEMU that's trying to tell the mediated device when
> > to start an operation when we probably should be passing through
> > whatever i/o operations indicate that status directly to the mediated
> > device. Thanks,
> > 
> > Alex  
> 
> Below is copied from Dong's earlier post which said clearly that
> a guest cmd submission will trigger the whole flow:
> 
> 
> Explanation:
> Q1-Q4: Qemu side process.
> K1-K6: Kernel side process.
> 
> Q1. Intercept a ssch instruction.
> Q2. Translate the guest ccw program to a user space ccw program
> (u_ccwchain).
> Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> K1. Copy from u_ccwchain to kernel (k_ccwchain).
> K2. Translate the user space ccw program to a kernel space ccw
> program, which becomes runnable for a real device.
> K3. With the necessary information contained in the orb passed in
> by Qemu, issue the k_ccwchain to the device, and wait event q
> for the I/O result.
> K4. Interrupt handler gets the I/O result, and wakes up the wait q.
> K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to
> update the user space irb.
> K6. Copy irb and scsw back to user space.
> Q4. Update the irb for the guest.
> 

Right, but this was the pre-mediated device approach, now we no longer
need step Q2 so we really only need Q1 and therefore Q3 to exist in
QEMU if those are operations that are not visible to the mediated
device; which they very well might be, since it's described as an
instruction rather than an i/o operation.  It's not terrible if that's
the case, vfio-pci has its own ioctl for doing a hot reset.
 
> My understanding is that such thing belongs to how device is mediated
> (so device driver specific), instead of something to be abstracted in 
> VFIO which manages resource but doesn't care how resource is used.
> 
> Actually we have the same requirement in the vGPU case: a guest driver
> needs to submit GPU commands through some MMIO register. The vGPU device
> model will intercept the submission request (in its own way), do its
> necessary scan/audit to ensure correctness/security, and then submit
> to the physical GPU through a vendor specific interface.
> 
> No difference with channel I/O here.

Well, if the GPU command is submitted through an MMIO register, is that
MMIO register part of the mediated device?  If so, could the mediated
device recognize the command and do the scan/audit itself?  QEMU must
not be the point at which mediation occurs for security purposes, QEMU
is userspace and userspace is not to be trusted.  I'm still open to
ioctls where it makes sense, as above, we have PCI specific ioctls
already, but we need to evaluate each one, why it needs to exist, and
whether we can skip it if the mediated device can trigger the action on
its own.  After all, that's why we're using the vfio api, so we can
re-use much of the existing infrastructure, especially for a vGPU that
exposes itself as a PCI device.  Thanks,

Alex



Re: [Qemu-devel] [PATCH v3 1/3] IOMMU: add VTD_CAP_CM to vIOMMU capability exposed to guest

2016-06-07 Thread Huang, Kai



On 6/8/2016 6:46 AM, Alex Williamson wrote:

On Tue, 7 Jun 2016 17:21:06 +1200
"Huang, Kai"  wrote:


On 6/7/2016 3:58 PM, Alex Williamson wrote:

On Tue, 7 Jun 2016 11:20:32 +0800
Peter Xu  wrote:


On Mon, Jun 06, 2016 at 11:02:11AM -0600, Alex Williamson wrote:

On Mon, 6 Jun 2016 21:43:17 +0800
Peter Xu  wrote:


On Mon, Jun 06, 2016 at 07:11:41AM -0600, Alex Williamson wrote:

On Mon, 6 Jun 2016 13:04:07 +0800
Peter Xu  wrote:

[...]

Besides the fact that there might be guests that do not support
CM=1, are there performance considerations? When the user's
configuration does not require the CM capability (e.g., a generic VM
configuration, without VFIO), shall we allow the user to disable the CM
bit so that we can have better IOMMU performance (avoiding extra and
useless invalidations)?


With Alexey's proposed patch to have callback ops when the iommu
notifier list adds its first entry and removes its last, any of the
additional overhead to generate notifies when nobody is listening can
be avoided.  These same callbacks would be the ones that need to
generate a hw_error if a notifier is added while running in CM=0.


Not familiar with Alexey's patch


https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg00079.html


Thanks for the pointer. :)




, but is that for VFIO only?


vfio is currently the only user of the iommu notifier, but the
interface is generic, which is how it should (must) be.


Yes.




I mean, if
we configured CMbit=1, guest kernel will send invalidation request
every time it creates new entries (context entries, or iotlb
entries). Even without VFIO notifiers, guest need to trap into QEMU
and process the invalidation requests. This is avoidable if we are not
using VFIO devices at all (so no need to maintain any mappings),
right?


CM=1 only defines that not-present and invalid entries can be cached;
any change to an existing entry requires an invalidation regardless of
CM.  What you're looking for sounds more like ECAP.C:


Yes, but I guess what I was talking about is CM bit but not ECAP.C.
When we clear/replace one context entry, guest kernel will definitely
send one context entry invalidation to QEMU:

static void domain_context_clear_one(struct intel_iommu *iommu, u8 bus, u8 devfn)
{
    if (!iommu)
        return;

    clear_context_table(iommu, bus, devfn);
    iommu->flush.flush_context(iommu, 0, 0, 0,
                               DMA_CCMD_GLOBAL_INVL);
    iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
}

... While if we are creating a new one (like attaching a new VFIO
device?), it's an optional behavior depending on whether CM bit is
set:

static int domain_context_mapping_one(struct dmar_domain *domain,
                                      struct intel_iommu *iommu,
                                      u8 bus, u8 devfn)
{
    ...
    /*
     * It's a non-present to present mapping. If hardware doesn't cache
     * non-present entry we only need to flush the write-buffer. If the
     * _does_ cache non-present entries, then it does so in the special
     * domain #0, which we have to flush:
     */
    if (cap_caching_mode(iommu->cap)) {
        iommu->flush.flush_context(iommu, 0,
                                   (((u16)bus) << 8) | devfn,
                                   DMA_CCMD_MASK_NOBIT,
                                   DMA_CCMD_DEVICE_INVL);
        iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
    } else {
        iommu_flush_write_buffer(iommu);
    }
    ...
}

We send these invalidations only if cap_caching_mode() is set (which is
bit 7, the CM bit). What I meant is that we should allow the user to
specify the CM bit, so that when we are not using VFIO devices we can
skip the above flush_context() and flush_iotlb() calls. So, besides the
fact that some guests do not support the CM bit (like Jailhouse),
performance might be another reason to let users set the CM bit
themselves.


I'm dubious of this, IOMMU drivers are already aware that hardware
flushes are expensive and do batching to optimize it.  The queued
invalidation mechanism itself is meant to allow asynchronous
invalidations.  QEMU invalidating a virtual IOMMU might very well be
faster than hardware.


Batching doesn't mean we can eliminate the IOTLB flush for mappings
that go from non-present to present when CM=1, whereas with CM=0 those
IOTLB flushes are not necessary, as the code above shows. Therefore,
generally speaking, CM=0 should perform better than CM=1, even
for QEMU's vIOMMU.

In my understanding the purpose of exposing CM=1 is to force guest do
IOTLB flush for each mapping change (including from non-present to
present) so Qemu is able to emulate each mapping change from guest
(correct me if I 

Re: [Qemu-devel] [PATCH 02/10] target-i386: cpu: move features logic that requires CPUState to realize time

2016-06-07 Thread Eduardo Habkost
On Tue, Jun 07, 2016 at 03:07:33PM -0600, Eric Blake wrote:
> On 06/07/2016 02:25 PM, Eduardo Habkost wrote:
> 
> > [...]
> >> +/* TODO: convert plus_features & minus_features static vars
> >> + * to global properties, once broken host_features is fixed
> >> + */
> > 
> > I will rewrite this to:
> > 
> > /*TODO: cpu->host_features inclurrectly overwrites features
> 
> Was that supposed to be "incorrectly" or "currently"?

Wow, that's a weird typo. I wrote "incorrectly" inclurrectly.

-- 
Eduardo



Re: [Qemu-devel] [PATCH] MAINTAINERS: add Artyom Tarasenko as SPARC maintainer

2016-06-07 Thread Mark Cave-Ayland
On 07/06/16 23:04, Mark Cave-Ayland wrote:

> Artyom has been working on QEMU's SPARC emulation for several years, providing
> initial support for Solaris under qemu-system-sparc and more recently bugfixes
> for qemu-system-sparc64 and TCG patch reviews. As work progresses on improving
> emulation for sun4u machines and beyond, Artyom has agreed to take on
> co-maintainership of SPARC with a focus on 64-bit architecture.
> ---
>  MAINTAINERS |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index df990a8..081bf20 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -191,6 +191,7 @@ F: disas/sh4.c
>  SPARC
>  M: Blue Swirl 
>  M: Mark Cave-Ayland 
> +M: Artyom Tarasenko 
>  S: Maintained
>  F: target-sparc/
>  F: hw/sparc/
> 

Obviously this has my implicit SoB which I'll add to v2 once an Ack has
been received.


ATB,

Mark.




[Qemu-devel] [PATCH] MAINTAINERS: add Artyom Tarasenko as SPARC maintainer

2016-06-07 Thread Mark Cave-Ayland
Artyom has been working on QEMU's SPARC emulation for several years, providing
initial support for Solaris under qemu-system-sparc and more recently bugfixes
for qemu-system-sparc64 and TCG patch reviews. As work progresses on improving
emulation for sun4u machines and beyond, Artyom has agreed to take on
co-maintainership of SPARC with a focus on 64-bit architecture.
---
 MAINTAINERS |1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index df990a8..081bf20 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -191,6 +191,7 @@ F: disas/sh4.c
 SPARC
 M: Blue Swirl 
 M: Mark Cave-Ayland 
+M: Artyom Tarasenko 
 S: Maintained
 F: target-sparc/
 F: hw/sparc/
-- 
1.7.10.4




Re: [Qemu-devel] [PATCH 00/17] some ARM platform QOM'ify work

2016-06-07 Thread Peter Maydell
On 7 June 2016 at 22:32, Mark Cave-Ayland  wrote:
> On 07/06/16 11:34, xiaoqiang zhao wrote:
>
>> This patch series QOM'ify ARM platform related devices.
>> Where we drop the sysbus init function if possible and use
>> instance_init and DeviceClass::realize function.
>>
>> xiaoqiang zhao (17):
>>   hw/i2c: QOM'ify bitbang_i2c.c
>>   hw/i2c: QOM'ify exynos4210_i2c.c
>>   hw/i2c: QOM'ify omap_i2c.c
>>   hw/i2c: QOM'ify versatile_i2c.c
>>   hw/gpio: QOM'ify mpc8xxx.c
>>   hw/gpio: QOM'ify omap_gpio.c
>>   hw/gpio: QOM'ify pl061.c
>>   hw/gpio: QOM'ify zaurus.c
>>   hw/misc: QOM'ify arm_l2x0.c
>>   hw/misc: QOM'ify eccmemctl.c
>>   hw/misc: QOM'ify exynos4210_pmu.c
>>   hw/misc: QOM'ify mst_fpga.c
>>   hw/misc: QOM'ify slavio_misc.c
>>   hw/dma: QOM'ify pxa2xx_dma.c
>>   hw/dma: QOM'ify sparc32_dma.c
>>   hw/dma: QOM'ify sun4m_iommu.c
>>   hw/sd: QOM'ify pl181.c
>>
>>  hw/dma/pxa2xx_dma.c  | 38 +-
>>  hw/dma/sparc32_dma.c | 25 
>>  hw/dma/sun4m_iommu.c | 12 --
>>  hw/gpio/mpc8xxx.c| 20 +---
>>  hw/gpio/omap_gpio.c  | 61 
>> 
>>  hw/gpio/pl061.c  | 24 +++
>>  hw/gpio/zaurus.c | 14 +--
>>  hw/i2c/bitbang_i2c.c | 14 +--
>>  hw/i2c/exynos4210_i2c.c  | 13 +--
>>  hw/i2c/omap_i2c.c| 44 --
>>  hw/i2c/versatile_i2c.c   | 19 +--
>>  hw/misc/arm_l2x0.c   | 11 -
>>  hw/misc/eccmemctl.c  | 25 +---
>>  hw/misc/exynos4210_pmu.c | 11 -
>>  hw/misc/mst_fpga.c   | 13 +--
>>  hw/misc/slavio_misc.c| 43 ++
>>  hw/sd/pl181.c| 26 +
>>  17 files changed, 207 insertions(+), 206 deletions(-)
>
> Patches 16 and 17 for sparc32_dma and sun4m_iommu are actually sun4m
> SPARC rather than ARM devices, so while I don't mind if these go through
> someone else's tree then please ensure that you also test
> qemu-system-sparc thoroughly with these patches.

I don't have a good set of sparc test images, so probably
better if you take those. I think eccmemctl.c is sparc too.
mpc8xxx.c is PPC.

thanks
-- PMM



Re: [Qemu-devel] [PATCH V9 0/9] Xilinx DisplayPort.

2016-06-07 Thread Alistair Francis
On Tue, Jun 7, 2016 at 1:30 PM,   wrote:
> From: KONRAD Frederic 

Hey Peter,

These are all reviewed by Xilinx, this is ready to merge from our point of view.

Thanks,

Alistair

>
> This is the 9th version of this patch-set of the implementation of the Xilinx
> DisplayPort and DPDMA.
>
> This 9th version fixes some minor issues.
>
> The fourth patch introduces an AUX bus needed by the DP to read the DPCD.
> It's also possible to connect an I2C device on it to do I2C through AUX
> commands. The drivers require I2C broadcast write to be modeled as well which
> seemed to be missing currently upstream.
>
> The tree can be cloned at:
> g...@git.greensocs.com:fkonrad/xilinx_dp.git branch xilinx_dp_v9_release
>
> Details of the DPDMA part:
>  * DPDMA is implemented as a QEMU SYSBUS device.
>  * Interrupts are implemented except the axi error and fifo.
>
> Details of the XILINX-DP:
>  * DP is also implemented as a QEMU SYSBUS. Multiple memory regions are used
>    to avoid having a single big region as there are holes in the DP memory map.
>  * An aux-bus has been implemented, it creates a memory map for aux slaves and
>    has an i2c bus (which is already implemented in QEMU).
>  * The normal programmable i2c clock and controller implementation is missing
>    from the QEMU tree so the easiest way for us was to implement a dummy-clk
>    driver in the kernel. It's a clock which does nothing but fakes a clock
>    such that the DPDMA driver works. The patch will be sent separately.
>  * The graphic plane works on channel 3, video on channel 0 and audio on
>    channels 4 and 5.
>
> Thanks,
> Fred
>
> V8 -> V9 changes:
>   * globally:
> * Rebased on current master (6ed5546fa7bf12c5b87ef76bafb86e1d77ed6e85).
>   * aux:
> * Coding style fix.
>
> V7 -> V8 changes:
>   * globally:
> * Rebased on current master (e854d0cf7847e70f5ed5dad5820fc1bbeda6f29e).
> * include qemu/osdep.h.
>   * xlnx-dp:
> * Coding style fix.
> * Drop xlnx_dp_aux_get/set.
>
> V6 -> V7 changes:
>   * globally:
> * Rebased on current master (0430891ce162b986c6e02a7729a942ecd2a32ca4).
> * Pick Peter's patch-set and rebase it on broadcast patch.
>   * xlnx-dp:
> * Print some unimplemented debug trace instead of aborting.
>   * zynq-mp:
> * Set realized before map the device.
> * Coding style fix.
>   * aux:
> * Factorize i2c access with i2c_send_recv.
>
> V5 -> V6 changes:
>   * globally:
> * Rebased on current master (38a762fec63fd5c035aae29ba9a77d357e21e4a7).
> * Fix some coding style issues.
>
> V4 -> V5 changes:
>   * aux:
> * Move the header include/hw => include/hw/misc
>   * dpcd:
> * Move the header hw/display => include/hw/display
>   * i2c-ddc:
> * Move the header hw/i2c => include/hw/i2c
>   * xlnx-dpdma:
> * Move the header hw/dma => include/hw/dma
> * Fix some styles issues.
>   * xlnx-dp:
> * Move the header hw/display => include/hw/display
>   * globally:
> * Rebased on current master (c49d3411faae8ffaab8f7e5db47405a008411c10).
>
> V3 -> V4 changes:
>   * xlnx_dpdma:
> * Initialize operation_finished during reset.
> * Add a function to trigger a VSYNC interrupt from the xlnx_dp.
>   * xlnx_dp:
> * Fix the default pixman format for video buffer.
> * Remove unused buffer.
>   * dpcd:
> * Add the missing DPCD_LANE_X_STATUS.
> * Set status field for all ports to avoid driver error.
> * Use 4 lines by default.
> * Use guest error in case of an outbound access.
>   * i2c broadcast:
> * Use a list of devices instead of relying on broadcast field to remove
>   duped code.
>   * other:
> * rebased on current master (774ee4772b6838b78741ea52d4bf26b8922244c5)
>
> V2 -> V3 changes:
>   * dpcd:
> * Add a CONFIG_DPCD.
>   * i2c-ddc:
> * Fill in VMSD.
>   * aux:
> * Remove address field.
> * Add a CONFIG_AUX.
>   * dpdma:
> * Fill in VMSD.
> * Some coding style changes.
>   * dp:
> * Fill in VMSD.
> * Coding style changes.
>
> V1 -> V2 changes:
>   * xlnx-zynqmp:
> * Remove the dummy object_property_add_child(..).
>   * dpcd:
> * Compile only when the ZYNQMP platform is compiled.
> * Use qemu_log instead of printf.
> * Compile test debug traces.
> * Remove the unused current_reg.
> * Remove the blank realize.
> * Use dpcd_ prefixes instead of aux_ prefixes.
> * Add a reset callback.
> * Add the VMSD.
> * Add size constraint in the MemoryRegionOps structure instead of asserting.
> * Style fixes.
>   * aux:
> * Compile only when the ZYNQMP platform is compiled.
> * Remove the class init and the class for aux-slave.
>   * dpdma:
> * Compile only when the ZYNQMP platform is compiled.
> * Unify per channel macro in one, simplify the switch case.
> * Use extractXX.
> * Make DPDMA_GBL an or'ed register.
>   * dp:
> * Compile only when the ZYNQMP platform is 

Re: [Qemu-devel] [PATCH 00/17] some ARM platform QOM'ify work

2016-06-07 Thread Mark Cave-Ayland
On 07/06/16 11:34, xiaoqiang zhao wrote:

> This patch series QOM'ifies ARM platform related devices:
> we drop the sysbus init function where possible and use
> instance_init and the DeviceClass::realize function instead.
> 
> xiaoqiang zhao (17):
>   hw/i2c: QOM'ify bitbang_i2c.c
>   hw/i2c: QOM'ify exynos4210_i2c.c
>   hw/i2c: QOM'ify omap_i2c.c
>   hw/i2c: QOM'ify versatile_i2c.c
>   hw/gpio: QOM'ify mpc8xxx.c
>   hw/gpio: QOM'ify omap_gpio.c
>   hw/gpio: QOM'ify pl061.c
>   hw/gpio: QOM'ify zaurus.c
>   hw/misc: QOM'ify arm_l2x0.c
>   hw/misc: QOM'ify eccmemctl.c
>   hw/misc: QOM'ify exynos4210_pmu.c
>   hw/misc: QOM'ify mst_fpga.c
>   hw/misc: QOM'ify slavio_misc.c
>   hw/dma: QOM'ify pxa2xx_dma.c
>   hw/dma: QOM'ify sparc32_dma.c
>   hw/dma: QOM'ify sun4m_iommu.c
>   hw/sd: QOM'ify pl181.c
> 
>  hw/dma/pxa2xx_dma.c  | 38 +-
>  hw/dma/sparc32_dma.c | 25 
>  hw/dma/sun4m_iommu.c | 12 --
>  hw/gpio/mpc8xxx.c| 20 +---
>  hw/gpio/omap_gpio.c  | 61 
> 
>  hw/gpio/pl061.c  | 24 +++
>  hw/gpio/zaurus.c | 14 +--
>  hw/i2c/bitbang_i2c.c | 14 +--
>  hw/i2c/exynos4210_i2c.c  | 13 +--
>  hw/i2c/omap_i2c.c| 44 --
>  hw/i2c/versatile_i2c.c   | 19 +--
>  hw/misc/arm_l2x0.c   | 11 -
>  hw/misc/eccmemctl.c  | 25 +---
>  hw/misc/exynos4210_pmu.c | 11 -
>  hw/misc/mst_fpga.c   | 13 +--
>  hw/misc/slavio_misc.c| 43 ++
>  hw/sd/pl181.c| 26 +
>  17 files changed, 207 insertions(+), 206 deletions(-)

Patches 15 and 16 for sparc32_dma and sun4m_iommu are actually sun4m
SPARC rather than ARM devices. I don't mind if these go through
someone else's tree, but please ensure that you also test
qemu-system-sparc thoroughly with these patches.


ATB,

Mark.




Re: [Qemu-devel] [PATCH 18/18] linux-user: Special-case ERESTARTSYS in target_strerror()

2016-06-07 Thread Peter Maydell
On 7 June 2016 at 20:53, Laurent Vivier  wrote:
>
>
> Le 06/06/2016 à 20:58, Peter Maydell a écrit :
>> Since TARGET_ERESTARTSYS and TARGET_ESIGRETURN are internal-to-QEMU
>> error numbers, handle them specially in target_strerror(), to avoid
>> confusing strace output like:
>>
>> 9521 rt_sigreturn(14,8,274886297808,8,0,268435456) = -1 errno=513 (Unknown error 513)
>>
>> Signed-off-by: Peter Maydell 
>> ---
>>  linux-user/syscall.c | 7 +++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
>> index bcee02d..782d475 100644
>> --- a/linux-user/syscall.c
>> +++ b/linux-user/syscall.c
>> @@ -619,6 +619,13 @@ static inline int is_error(abi_long ret)
>>
>>  const char *target_strerror(int err)
>>  {
>> +if (err == TARGET_ERESTARTSYS) {
>> +return "To be restarted";
>> +}
>> +if (err == TARGET_QEMU_ESIGRETURN) {
>> +return "Successful exit from sigreturn";
>> +}
>> +
>>  if ((err >= ERRNO_TABLE_SIZE) || (err < 0)) {
>>  return NULL;
>>  }
>
> This is not the aim of this patch, but target_to_host_errno() now has
> these checks; perhaps we can remove this while we are here...

I think that would break the callers, which assume they can
pass in any number as a potential errno, and get
back NULL if it wasn't actually an errno. If we passed
them through to target_to_host_errno() it would pass
them on unchanged and the host strerror() would generate
a string "Unknown errno 134134234" or whatever.
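A minimal sketch of the contract described above: internal codes get a fixed string, anything outside the errno table yields NULL, and only plausible errnos reach the host strerror(). The SKETCH_* names and their values are assumptions for illustration, not QEMU's actual definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical values for illustration; the real constants live in QEMU. */
#define SKETCH_ERESTARTSYS      512
#define SKETCH_ESIGRETURN       513
#define SKETCH_ERRNO_TABLE_SIZE 1200

/* Returns a description, or NULL when err is not a plausible errno. */
const char *sketch_strerror(int err)
{
    if (err == SKETCH_ERESTARTSYS) {
        return "To be restarted";
    }
    if (err == SKETCH_ESIGRETURN) {
        return "Successful exit from sigreturn";
    }
    if (err < 0 || err >= SKETCH_ERRNO_TABLE_SIZE) {
        /* Callers rely on NULL to mean "this was not an errno". */
        return NULL;
    }
    return strerror(err);
}
```

Dropping the range check would hand a value like 134134234 straight to the host strerror(), and the caller could no longer tell it apart from a real errno.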

thanks
-- PMM



Re: [Qemu-devel] [Qemu-discuss] -serial option broken in master?

2016-06-07 Thread Peter Maydell
On 7 June 2016 at 15:47, Jérôme Forissier  wrote:
> Hi,
>
> I just noticed this error [1] (QEMU master branch):
>
> ../qemu/arm-softmmu/qemu-system-arm -nographic -monitor none -machine
> virt -machine secure=on -cpu cortex-a15 -m 1057 -serial stdio -serial
> file:serial1.log -bios
> /home/travis/optee_repo/build/../out/bios-qemu/bios.bin
> Unexpected error in parse_chr() at hw/core/qdev-properties-system.c:149:
> qemu-system-arm: Property 'pl011.chardev' can't take value 'serial0',
> it's in use
>
> FYI, if I revert commits e5fabad7ccfd ("char: get rid of
> qemu_char_get_next_serial") and f0d1d2c115df ("hw/char: QOM'ify pl011
> model"), the problem disappears.
>
> Should I use a different syntax?

No, it's a bug that we broke this somehow. Xiaoqiang, could you
have a look at this, please?

thanks
-- PMM



Re: [Qemu-devel] [PATCH 14/18] linux-user: Use __get_user() and __put_user() to handle structs in do_fcntl()

2016-06-07 Thread Peter Maydell
On 7 June 2016 at 21:41, Laurent Vivier  wrote:
>
> Le 06/06/2016 à 20:58, Peter Maydell a écrit :
>> Use __get_user() and __put_user() to handle reading and writing the
>> guest structures in do_fcntl(). This has two benefits:
>>  * avoids possible errors due to misaligned guest pointers
>>  * correctly sign extends signed fields (like l_start in struct flock)
>>which might be different sizes between guest and host
>>
>> To do this we abstract out into copy_from/to_user functions. We
>> also standardize on always using host flock64 and the F_GETLK64
>> etc flock commands, as this means we always have 64 bit offsets
>> whether the host is 64-bit or 32-bit and we don't need to support
>> conversion to both host struct flock and struct flock64.
>>
>> In passing we fix errors in converting l_type from the host to
>> the target (where we were doing a byteswap of the host value
>> before trying to do the convert-bitmasks operation rather than
>> afterwards, and inexplicably shifting right by 1).
>
> I think the ">> 1" is coming from:
>
> 43f238d Support fcntl F_GETLK64, F_SETLK64, F_SETLKW64
>
> to convert arm to x86, and should have been removed then in:
>
> 2ba7f73 alpha-linux-user: Translate fcntl l_type
>
> So yes, the ">> 1" is wrong. I don't understand how it can work.

Thanks for tracking down where it came from. I suspect it
just didn't work and nobody noticed, because:
 * there's not much use of big-on-little-endian
 * a lot of the time the bug is just going to downgrade an
   exclusive lock to a shared lock, and you won't notice if
   there isn't actually any contention on the lock...
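To make the second point concrete, here is a sketch of what a stray ">> 1" does to the lock type. It assumes the generic Linux values (F_RDLCK = 0, F_WRLCK = 1, F_UNLCK = 2, as on x86 and ARM; some architectures differ) and ignores the bitmask-table lookup, so it only isolates the effect of the shift. The SK_* names and the function are hypothetical.

```c
/* Generic Linux lock-type values, hard-coded here as an assumption. */
enum { SK_F_RDLCK = 0, SK_F_WRLCK = 1, SK_F_UNLCK = 2 };

/* Mimics only the stray shift; the conversion table is left out. */
int sketch_buggy_l_type(int l_type)
{
    return l_type >> 1;
}
```

Under these assumptions an exclusive lock request (1) comes out as a shared lock (0), which goes unnoticed unless two processes actually contend for the lock; an unlock (2) would come out as a write lock (1).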

>> Signed-off-by: Peter Maydell 
>> ---
>>  linux-user/syscall.c | 280 
>> +--
>>  1 file changed, 157 insertions(+), 123 deletions(-)
>>
>> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
>> index 4cf67c8..f3a487e 100644
>> --- a/linux-user/syscall.c
>> +++ b/linux-user/syscall.c
>> @@ -4894,11 +4894,11 @@ static int target_to_host_fcntl_cmd(int cmd)
>>   case TARGET_F_SETFL:
>>  return cmd;
>>  case TARGET_F_GETLK:
>> - return F_GETLK;
>> - case TARGET_F_SETLK:
>> - return F_SETLK;
>> - case TARGET_F_SETLKW:
>> - return F_SETLKW;
>> +return F_GETLK64;
>> +case TARGET_F_SETLK:
>> +return F_SETLK64;
>> +case TARGET_F_SETLKW:
>> +return F_SETLKW64;
>>   case TARGET_F_GETOWN:
>>   return F_GETOWN;
>>   case TARGET_F_SETOWN:
>
> I see no reason to have this in this patch.

The idea is that we want to use only one host flock struct,
which means it must be the one which supports 64-bit offsets.
On a 32-bit host, that's the flock64 struct, which must be
used with the F_GETLK64 fcntl, not F_GETLK.
On a 64-bit host, the system headers define that F_GETLK64
and F_GETLK are identical (and that the flock64 struct is flock),
so instead of having to specialcase 64-bit hosts, we can just
say "use the F_*64 constants and struct flock64 everywhere".

If we didn't have this hunk of the patch then on a 32-bit
host the code below would go wrong, because when we did
a guest F_GETLK we'd end up passing a (host) struct flock64
to the 32-bit F_GETLK.
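For comparison, a standalone program can get the same "one struct with 64-bit offsets everywhere" effect by a different route: defining _FILE_OFFSET_BITS=64, which makes the C library map struct flock and F_GETLK onto their 64-bit variants on 32-bit hosts. This is not what the patch does (the patch names flock64 and F_GETLK64 explicitly); it is only a sketch of the underlying idea, and sketch_query_lock_type() is a hypothetical helper.

```c
#define _FILE_OFFSET_BITS 64   /* must come before any system header */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask the kernel whether a write lock on the first byte of fd would
 * conflict with an existing lock. Returns the resulting l_type
 * (F_UNLCK when nothing conflicts) or -1 on error.
 */
int sketch_query_lock_type(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;

    /* The struct layout and the command must agree on the offset width. */
    if (fcntl(fd, F_GETLK, &fl) < 0) {
        return -1;
    }
    return fl.l_type;
}
```

The failure mode described above is exactly a mismatch between the two: a struct with 64-bit offsets passed to a command that expects the 32-bit layout.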

>>  case TARGET_F_GETLK:
>> -if (!lock_user_struct(VERIFY_READ, target_fl, arg, 1))
>> +if (copy_from_user_flock(, arg)) {
>>  return -TARGET_EFAULT;
>
> why do you ignore the exact value returned by copy_from_user_flock()?
> You should return this value instead of guessing it.

Yeah, I was being lazy and not wanting to have an extra 'ret'
variable floating around. I'll fix this.

>> -fl64.l_type =
>> -   target_to_host_bitmask(tswap16(target_fl64->l_type), flock_tbl) >> 1;
>
> The ">> 1" disappears...

...and it's correct that it disappears, right?

thanks
-- PMM



Re: [Qemu-devel] [PATCH v2 18/19] linux-user: Avoid possible misalignment in host_to_target_siginfo()

2016-06-07 Thread Peter Maydell
On 7 June 2016 at 20:36, Laurent Vivier  wrote:
>
>
> Le 27/05/2016 à 16:52, Peter Maydell a écrit :
>> host_to_target_siginfo() is implemented by a combination of
>> host_to_target_siginfo_noswap() followed by tswap_siginfo().
>> The first of these two functions assumes that the target_siginfo_t
>> it is writing to is correctly aligned, but the pointer passed
>> into host_to_target_siginfo() is directly from the guest and
>> might be misaligned. Use a local variable to avoid this problem.
>> (tswap_siginfo() does now correctly handle a misaligned destination.)
>
> You mean the pointer from the guest might not be correctly aligned for
> the guest?

Might not be correctly aligned for the host (for that matter
it might not be correctly aligned for the guest,
if the guest is being malicious or buggy, but it's the
host alignment we care about.)
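The general pattern can be sketched as follows: build the structure in a local variable, which the compiler aligns for the host, and only then copy it to the possibly misaligned destination. This sketch uses memcpy() for the final copy, whereas the patch relies on tswap_siginfo() coping with a misaligned destination; sketch_siginfo and sketch_write_siginfo() are hypothetical stand-ins.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for target_siginfo_t. */
typedef struct {
    int32_t signo;
    int32_t err;
    int32_t code;
} sketch_siginfo;

/*
 * Fill an aligned local first, then copy it bytewise, so no field is
 * ever stored through a pointer that may be misaligned for the host.
 */
void sketch_write_siginfo(void *guest_dst, int32_t signo, int32_t code)
{
    sketch_siginfo tmp;

    tmp.signo = signo;
    tmp.err = 0;
    tmp.code = code;
    memcpy(guest_dst, &tmp, sizeof(tmp));
}
```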

thanks
-- PMM



Re: [Qemu-devel] [PATCH 02/10] target-i386: cpu: move features logic that requires CPUState to realize time

2016-06-07 Thread Eric Blake
On 06/07/2016 02:25 PM, Eduardo Habkost wrote:

> [...]
>> +/* TODO: convert plus_features & minus_features static vars
>> + * to global properties, once broken host_features is fixed
>> + */
> 
> I will rewrite this to:
> 
> /*TODO: cpu->host_features inclurrectly overwrites features

Was that supposed to be "incorrectly" or "currently"?

>  * set using "feat=on|off". Once we fix this, we can convert
>  * plus_features & minus_features to global properties
>  * inside x86_cpu_parse_featurestr() too.
>  */


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH v2 17/19] linux-user: Use both si_code and si_signo when converting siginfo_t

2016-06-07 Thread Peter Maydell
On 7 June 2016 at 20:22, Laurent Vivier  wrote:
> Where does QEMU_SI_TIMER come from?
> It is not used elsewhere.

It's the enum constant that goes with "we use the
.sifields.timer fields of the union". At the moment we
don't have any cases which cause us to think we should
use those (and we didn't before this patch either), so
the case in this case statement is purely for completeness.
(I suspect the _timer fields are wrong anyway, since they're
pretty much dead code.)

The awkward thing about SI_TIMER is that because glibc
can call rt_sigqueueinfo() with a si_code of SI_TIMER[*] we
have no way to tell "this is a SI_TIMER signal from
the kernel with valid .timer fields" from "this is a
SI_TIMER from rt_sigqueueinfo with valid .rt fields".
So we assume it's always the latter.

[*] for instance, see thread_expire_timer() in
http://osxr.org:8080/glibc/source/nptl/sysdeps/pthread/timer_routines.c

thanks
-- PMM



[Qemu-devel] [Bug 1502613] Re: [Feature Request] Battery Status / Virtual Battery

2016-06-07 Thread Naftaly Avadiaev
I'm trying to add a virtual battery to QEMU. More specifically, if the
HOST is running on battery power [laptop] I want to pass this knowledge
to the GUEST.

I have looked at the ACPI folder within the QEMU source code, but was
unable to find the specific place where I can add this functionality.

Can someone provide me with a general roadmap of what I should do and
where I should start? I suspect that changing the QEMU source code will
not be enough and I will also have to implement a driver for the GUEST.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1502613

Title:
  [Feature Request] Battery Status / Virtual Battery

Status in QEMU:
  New

Bug description:
  When using virtualization heavily on notebooks, virtual machines
  do not realize that they're running on a notebook device, causing high
  power consumption because they're not switching into an optimized
  "laptop mode". This leads to the circumstance that they are trying to
  do things like defragmentation / virus scan / etc. while the host is
  still running on batteries.

  So it would be great if QEMU / KVM would have support for emulating
  "Virtual Batteries" to guests causing them to enable power-saving
  options like disabling specific services / devices / file operations
  automatically by OS.

  Optionally a great feature would be to set virtual battery's status
  manually. For example: Current charge rate / charging / discharging /
  ...




[Qemu-devel] [PULL 1/1] tests: start a /qga/guest-exec test

2016-06-07 Thread Michael Roth
From: Marc-André Lureau 

Test a few guest-exec guest agent commands, added in qemu 2.5.

Signed-off-by: Marc-André Lureau 
Signed-off-by: Michael Roth 
---
 tests/test-qga.c | 81 
 1 file changed, 81 insertions(+)

diff --git a/tests/test-qga.c b/tests/test-qga.c
index 9c9039f..251b201 100644
--- a/tests/test-qga.c
+++ b/tests/test-qga.c
@@ -822,6 +822,84 @@ static void test_qga_fsfreeze_and_thaw(gconstpointer fix)
 QDECREF(ret);
 }
 
+static void test_qga_guest_exec(gconstpointer fix)
+{
+const TestFixture *fixture = fix;
+QDict *ret, *val;
+const gchar *out;
+guchar *decoded;
+int64_t pid, now, exitcode;
+gsize len;
+bool exited;
+
+/* exec 'echo foo bar' */
+ret = qmp_fd(fixture->fd, "{'execute': 'guest-exec', 'arguments': {"
+ " 'path': '/bin/echo', 'arg': [ '-n', '\" test_str \"' ],"
+ " 'capture-output': true } }");
+g_assert_nonnull(ret);
+qmp_assert_no_error(ret);
+val = qdict_get_qdict(ret, "return");
+pid = qdict_get_int(val, "pid");
+g_assert_cmpint(pid, >, 0);
+QDECREF(ret);
+
+/* wait for completion */
+now = g_get_monotonic_time();
+do {
+ret = qmp_fd(fixture->fd, "{'execute': 'guest-exec-status',"
+ " 'arguments': { 'pid': %" PRId64 "  } }", pid);
+g_assert_nonnull(ret);
+val = qdict_get_qdict(ret, "return");
+exited = qdict_get_bool(val, "exited");
+if (!exited) {
+QDECREF(ret);
+}
+} while (!exited &&
+ g_get_monotonic_time() < now + 5 * G_TIME_SPAN_SECOND);
+g_assert(exited);
+
+/* check stdout */
+exitcode = qdict_get_int(val, "exitcode");
+g_assert_cmpint(exitcode, ==, 0);
+out = qdict_get_str(val, "out-data");
+decoded = g_base64_decode(out, );
+g_assert_cmpint(len, ==, 12);
+g_assert_cmpstr((char *)decoded, ==, "\" test_str \"");
+g_free(decoded);
+QDECREF(ret);
+}
+
+static void test_qga_guest_exec_invalid(gconstpointer fix)
+{
+const TestFixture *fixture = fix;
+QDict *ret, *error;
+const gchar *class, *desc;
+
+/* invalid command */
+ret = qmp_fd(fixture->fd, "{'execute': 'guest-exec', 'arguments': {"
+ " 'path': '/bin/invalid-cmd42' } }");
+g_assert_nonnull(ret);
+error = qdict_get_qdict(ret, "error");
+g_assert_nonnull(error);
+class = qdict_get_str(error, "class");
+desc = qdict_get_str(error, "desc");
+g_assert_cmpstr(class, ==, "GenericError");
+g_assert_cmpint(strlen(desc), >, 0);
+QDECREF(ret);
+
+/* invalid pid */
+ret = qmp_fd(fixture->fd, "{'execute': 'guest-exec-status',"
+ " 'arguments': { 'pid': 0 } }");
+g_assert_nonnull(ret);
+error = qdict_get_qdict(ret, "error");
+g_assert_nonnull(error);
+class = qdict_get_str(error, "class");
+desc = qdict_get_str(error, "desc");
+g_assert_cmpstr(class, ==, "GenericError");
+g_assert_cmpint(strlen(desc), >, 0);
+QDECREF(ret);
+}
+
 int main(int argc, char **argv)
 {
 TestFixture fix;
@@ -852,6 +930,9 @@ int main(int argc, char **argv)
 
 g_test_add_data_func("/qga/blacklist", NULL, test_qga_blacklist);
 g_test_add_data_func("/qga/config", NULL, test_qga_config);
+g_test_add_data_func("/qga/guest-exec", , test_qga_guest_exec);
+g_test_add_data_func("/qga/guest-exec-invalid", ,
+ test_qga_guest_exec_invalid);
 
 if (g_getenv("QGA_TEST_SIDE_EFFECTING")) {
 g_test_add_data_func("/qga/fsfreeze-and-thaw", ,
-- 
1.9.1




[Qemu-devel] [PULL 0/1] qemu-ga patch queue

2016-06-07 Thread Michael Roth
The following changes since commit 6ed5546fa7bf12c5b87ef76bafb86e1d77ed6e85:

  Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2016-06-07' into staging (2016-06-07 16:34:45 +0100)

are available in the git repository at:


  git://github.com/mdroth/qemu.git tags/qga-pull-2016-07-07-tag

for you to fetch changes up to 3dab9fa1ac8fdfebbfbc5142ba42d89d96a6b5f4:

  tests: start a /qga/guest-exec test (2016-06-07 11:25:06 -0500)


qemu-ga patch queue

* add unit tests for guest-exec command set


Marc-André Lureau (1):
  tests: start a /qga/guest-exec test

 tests/test-qga.c | 81 
+
 1 file changed, 81 insertions(+)




Re: [Qemu-devel] [PATCHv2] tests: start a /qga/guest-exec test

2016-06-07 Thread Michael Roth
Quoting marcandre.lur...@redhat.com (2016-06-03 07:27:50)
> From: Marc-André Lureau 
> 
> Test a few guest-exec guest agent commands, added in qemu 2.5.
> 
> Signed-off-by: Marc-André Lureau 

Thanks, applied to qga tree:

  https://github.com/mdroth/qemu/tree/qga

> ---
>  tests/test-qga.c | 81 
> 
>  1 file changed, 81 insertions(+)
> 
> diff --git a/tests/test-qga.c b/tests/test-qga.c
> index 72a89de..10b29eb 100644
> --- a/tests/test-qga.c
> +++ b/tests/test-qga.c
> @@ -823,6 +823,84 @@ static void test_qga_fsfreeze_and_thaw(gconstpointer fix)
>  QDECREF(ret);
>  }
> 
> +static void test_qga_guest_exec(gconstpointer fix)
> +{
> +const TestFixture *fixture = fix;
> +QDict *ret, *val;
> +const gchar *out;
> +guchar *decoded;
> +int64_t pid, now, exitcode;
> +gsize len;
> +bool exited;
> +
> +/* exec 'echo foo bar' */
> +ret = qmp_fd(fixture->fd, "{'execute': 'guest-exec', 'arguments': {"
> + " 'path': '/bin/echo', 'arg': [ '-n', '\" test_str \"' ],"
> + " 'capture-output': true } }");
> +g_assert_nonnull(ret);
> +qmp_assert_no_error(ret);
> +val = qdict_get_qdict(ret, "return");
> +pid = qdict_get_int(val, "pid");
> +g_assert_cmpint(pid, >, 0);
> +QDECREF(ret);
> +
> +/* wait for completion */
> +now = g_get_monotonic_time();
> +do {
> +ret = qmp_fd(fixture->fd, "{'execute': 'guest-exec-status',"
> + " 'arguments': { 'pid': %" PRId64 "  } }", pid);
> +g_assert_nonnull(ret);
> +val = qdict_get_qdict(ret, "return");
> +exited = qdict_get_bool(val, "exited");
> +if (!exited) {
> +QDECREF(ret);
> +}
> +} while (!exited &&
> + g_get_monotonic_time() < now + 5 * G_TIME_SPAN_SECOND);
> +g_assert(exited);
> +
> +/* check stdout */
> +exitcode = qdict_get_int(val, "exitcode");
> +g_assert_cmpint(exitcode, ==, 0);
> +out = qdict_get_str(val, "out-data");
> +decoded = g_base64_decode(out, );
> +g_assert_cmpint(len, ==, 12);
> +g_assert_cmpstr((char *)decoded, ==, "\" test_str \"");
> +g_free(decoded);
> +QDECREF(ret);
> +}
> +
> +static void test_qga_guest_exec_invalid(gconstpointer fix)
> +{
> +const TestFixture *fixture = fix;
> +QDict *ret, *error;
> +const gchar *class, *desc;
> +
> +/* invalid command */
> +ret = qmp_fd(fixture->fd, "{'execute': 'guest-exec', 'arguments': {"
> + " 'path': '/bin/invalid-cmd42' } }");
> +g_assert_nonnull(ret);
> +error = qdict_get_qdict(ret, "error");
> +g_assert_nonnull(error);
> +class = qdict_get_str(error, "class");
> +desc = qdict_get_str(error, "desc");
> +g_assert_cmpstr(class, ==, "GenericError");
> +g_assert_cmpint(strlen(desc), >, 0);
> +QDECREF(ret);
> +
> +/* invalid pid */
> +ret = qmp_fd(fixture->fd, "{'execute': 'guest-exec-status',"
> + " 'arguments': { 'pid': 0 } }");
> +g_assert_nonnull(ret);
> +error = qdict_get_qdict(ret, "error");
> +g_assert_nonnull(error);
> +class = qdict_get_str(error, "class");
> +desc = qdict_get_str(error, "desc");
> +g_assert_cmpstr(class, ==, "GenericError");
> +g_assert_cmpint(strlen(desc), >, 0);
> +QDECREF(ret);
> +}
> +
>  int main(int argc, char **argv)
>  {
>  TestFixture fix;
> @@ -853,6 +931,9 @@ int main(int argc, char **argv)
> 
>  g_test_add_data_func("/qga/blacklist", NULL, test_qga_blacklist);
>  g_test_add_data_func("/qga/config", NULL, test_qga_config);
> +g_test_add_data_func("/qga/guest-exec", , test_qga_guest_exec);
> +g_test_add_data_func("/qga/guest-exec-invalid", ,
> + test_qga_guest_exec_invalid);
> 
>  if (g_getenv("QGA_TEST_SIDE_EFFECTING")) {
>  g_test_add_data_func("/qga/fsfreeze-and-thaw", ,
> -- 
> 2.7.4
> 




Re: [Qemu-devel] [PATCH 14/18] linux-user: Use __get_user() and __put_user() to handle structs in do_fcntl()

2016-06-07 Thread Laurent Vivier


Le 06/06/2016 à 20:58, Peter Maydell a écrit :
> Use __get_user() and __put_user() to handle reading and writing the
> guest structures in do_fcntl(). This has two benefits:
>  * avoids possible errors due to misaligned guest pointers
>  * correctly sign extends signed fields (like l_start in struct flock)
>which might be different sizes between guest and host
> 
> To do this we abstract out into copy_from/to_user functions. We
> also standardize on always using host flock64 and the F_GETLK64
> etc flock commands, as this means we always have 64 bit offsets
> whether the host is 64-bit or 32-bit and we don't need to support
> conversion to both host struct flock and struct flock64.
> 
> In passing we fix errors in converting l_type from the host to
> the target (where we were doing a byteswap of the host value
> before trying to do the convert-bitmasks operation rather than
> afterwards, and inexplicably shifting right by 1).

I think the ">> 1" is coming from:

43f238d Support fcntl F_GETLK64, F_SETLK64, F_SETLKW64

to convert arm to x86, and should have been removed then in:

2ba7f73 alpha-linux-user: Translate fcntl l_type

So yes, the ">> 1" is wrong. I don't understand how it can work.

> 
> Signed-off-by: Peter Maydell 
> ---
>  linux-user/syscall.c | 280 
> +--
>  1 file changed, 157 insertions(+), 123 deletions(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 4cf67c8..f3a487e 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -4894,11 +4894,11 @@ static int target_to_host_fcntl_cmd(int cmd)
>   case TARGET_F_SETFL:
>  return cmd;
>  case TARGET_F_GETLK:
> - return F_GETLK;
> - case TARGET_F_SETLK:
> - return F_SETLK;
> - case TARGET_F_SETLKW:
> - return F_SETLKW;
> +return F_GETLK64;
> +case TARGET_F_SETLK:
> +return F_SETLK64;
> +case TARGET_F_SETLKW:
> +return F_SETLKW64;
>   case TARGET_F_GETOWN:
>   return F_GETOWN;
>   case TARGET_F_SETOWN:

I see no reason to have this in this patch.

> @@ -4949,12 +4949,134 @@ static const bitmask_transtbl flock_tbl[] = {
>  { 0, 0, 0, 0 }
>  };
>  
> -static abi_long do_fcntl(int fd, int cmd, abi_ulong arg)
> +static inline abi_long copy_from_user_flock(struct flock64 *fl,
> +abi_ulong target_flock_addr)
>  {
> -struct flock fl;
>  struct target_flock *target_fl;
> +short l_type;
> +
> +if (!lock_user_struct(VERIFY_READ, target_fl, target_flock_addr, 1)) {
> +return -TARGET_EFAULT;
> +}
> +
> +__get_user(l_type, _fl->l_type);
> +fl->l_type = target_to_host_bitmask(l_type, flock_tbl);
> +__get_user(fl->l_whence, _fl->l_whence);
> +__get_user(fl->l_start, _fl->l_start);
> +__get_user(fl->l_len, _fl->l_len);
> +__get_user(fl->l_pid, _fl->l_pid);
> +unlock_user_struct(target_fl, target_flock_addr, 0);
> +return 0;
> +}
> +
> +static inline abi_long copy_to_user_flock(abi_ulong target_flock_addr,
> +  const struct flock64 *fl)
> +{
> +struct target_flock *target_fl;
> +short l_type;
> +
> +if (!lock_user_struct(VERIFY_WRITE, target_fl, target_flock_addr, 0)) {
> +return -TARGET_EFAULT;
> +}
> +
> +l_type = host_to_target_bitmask(fl->l_type, flock_tbl);
> +__put_user(l_type, _fl->l_type);
> +__put_user(fl->l_whence, _fl->l_whence);
> +__put_user(fl->l_start, _fl->l_start);
> +__put_user(fl->l_len, _fl->l_len);
> +__put_user(fl->l_pid, _fl->l_pid);
> +unlock_user_struct(target_fl, target_flock_addr, 1);
> +return 0;
> +}
> +
> +typedef abi_long from_flock64_fn(struct flock64 *fl, abi_ulong target_addr);
> +typedef abi_long to_flock64_fn(abi_ulong target_addr, const struct flock64 *fl);
> +
> +#ifdef TARGET_ARM
> +static inline abi_long copy_from_user_eabi_flock64(struct flock64 *fl,
> +   abi_ulong target_flock_addr)
> +{
> +struct target_eabi_flock64 *target_fl;
> +short l_type;
> +
> +if (!lock_user_struct(VERIFY_READ, target_fl, target_flock_addr, 1)) {
> +return -TARGET_EFAULT;
> +}
> +
> +__get_user(l_type, _fl->l_type);
> +fl->l_type = target_to_host_bitmask(l_type, flock_tbl);
> +__get_user(fl->l_whence, _fl->l_whence);
> +__get_user(fl->l_start, _fl->l_start);
> +__get_user(fl->l_len, _fl->l_len);
> +__get_user(fl->l_pid, _fl->l_pid);
> +unlock_user_struct(target_fl, target_flock_addr, 0);
> +return 0;
> +}
> +
> +static inline abi_long copy_to_user_eabi_flock64(abi_ulong target_flock_addr,
> + const struct flock64 *fl)
> +{
> +struct target_eabi_flock64 *target_fl;
> +short l_type;
> +
> +if (!lock_user_struct(VERIFY_WRITE, target_fl, 

Re: [Qemu-devel] [V11 1/4] hw/i386: Introduce AMD IOMMU

2016-06-07 Thread Alex Williamson
On Sun, 22 May 2016 13:21:51 +0300
David Kiarie  wrote:

> Add AMD IOMMU emulation to QEMU in addition to Intel IOMMU.
> The IOMMU does basic translation, error checking and has a
> minimal IOTLB implementation. This IOMMU bypasses the need
> for target aborts by responding with IOMMU_NONE access rights
> and exempts the region 0xfee0-0xfeef from translation
> as it is the q35 interrupt region. We also advertise features
> that are not yet implemented to please the Linux IOMMU driver.
> 
> IOTLB aims at implementing commands on real IOMMUs, which is
> essential for debugging and may not offer any performance
> benefits.
> 
> Signed-off-by: David Kiarie 
> ---
>  hw/i386/Makefile.objs |1 +
>  hw/i386/amd_iommu.c   | 1401 
> +
>  hw/i386/amd_iommu.h   |  340 
>  include/hw/pci/pci.h  |2 +
>  4 files changed, 1744 insertions(+)
>  create mode 100644 hw/i386/amd_iommu.c
>  create mode 100644 hw/i386/amd_iommu.h

I don't see any callouts to memory_region_notify_iommu() here, so this
won't yet support assigned devices.  Do you have any plans to add that
support?  Thanks,

Alex



Re: [Qemu-devel] [PATCH 05/10] target-i386: cpu: consolidate calls of object_property_parse() in x86_cpu_parse_featurestr

2016-06-07 Thread Eduardo Habkost
On Mon, Jun 06, 2016 at 05:16:47PM +0200, Igor Mammedov wrote:
> From: Eduardo Habkost 
> 
> Signed-off-by: Eduardo Habkost 
> Reviewed-by: Igor Mammedov 
> Signed-off-by: Igor Mammedov 

Reviewed-by: Eduardo Habkost 

A small suggestion in case we are going to need a new version of
this series, below:

> ---
> v1:
>  - fix error handling of +-feat, Igor Mammedov 
>  - rebase on top of
> "target-i386: Remove xlevel & hv-spinlocks option fixups"
> ---
>  target-i386/cpu.c | 74 
> ---
>  1 file changed, 49 insertions(+), 25 deletions(-)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 349b971..f791a06 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -1957,43 +1957,67 @@ static void x86_cpu_parse_featurestr(CPUState *cs, char *features,
>  char *featurestr; /* Single 'key=value" string being parsed */
>  Error *local_err = NULL;
>  
> -featurestr = features ? strtok(features, ",") : NULL;
> +if (!features) {
> +return;
> +}
>  
> -while (featurestr) {
> -char *val;
> +for (featurestr = strtok(features, ",");
> + featurestr;
> + featurestr = strtok(NULL, ",")) {
> +const char *name;
> +const char *val = NULL;
> +char *eq = NULL;
> +
> +/* Compatibility syntax: */
>  if (featurestr[0] == '+') {
>  add_flagname_to_bitmaps(featurestr + 1, plus_features, 
> _err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +return;
> +}
> +continue;

If you add an error_propagate() call to the end of the function,
this can be shortened to:

if (local_err) {
break;
}

Or maybe the loop could be simply written as:

for (featurestr = strtok(features, ",");
 featurestr && !local_err;
 featurestr = strtok(NULL, ","))

and we could avoid all the if (local_err) checks inside the loop
body.
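A self-contained sketch of that second loop shape, with a single propagation point after the loop. Error handling is reduced to a string pointer rather than QEMU's Error API, and sketch_apply_feature() and sketch_parse_features() are hypothetical helpers, not functions from target-i386/cpu.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-feature handler: rejects a bare "+" or "-". */
static bool sketch_apply_feature(const char *feat, const char **err)
{
    if ((feat[0] == '+' || feat[0] == '-') && feat[1] == '\0') {
        *err = "empty feature name";
        return false;
    }
    return true;
}

/*
 * Returns the number of features applied. The loop condition stops at
 * the first error, so the body needs no early returns, and the error
 * is handed to the caller exactly once, after the loop.
 */
int sketch_parse_features(char *features, const char **errp)
{
    const char *local_err = NULL;
    int applied = 0;
    char *f;

    if (!features) {
        return 0;
    }

    for (f = strtok(features, ",");
         f && !local_err;
         f = strtok(NULL, ",")) {
        if (sketch_apply_feature(f, &local_err)) {
            applied++;
        }
    }

    if (local_err) {
        *errp = local_err;   /* single propagation point */
    }
    return applied;
}
```

One side effect of this shape: strtok() is called once more after a failure before the condition is checked, which is harmless here because the token is discarded.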


>  } else if (featurestr[0] == '-') {
>  add_flagname_to_bitmaps(featurestr + 1, minus_features, 
> _err);
> -} else if ((val = strchr(featurestr, '='))) {
> -*val = 0; val++;
> -feat2prop(featurestr);
> -if (!strcmp(featurestr, "tsc-freq")) {
> -int64_t tsc_freq;
> -char *err;
> -char num[32];
> -
> -tsc_freq = qemu_strtosz_suffix_unit(val, ,
> -   QEMU_STRTOSZ_DEFSUFFIX_B, 
> 1000);
> -if (tsc_freq < 0 || *err) {
> -error_setg(errp, "bad numerical value %s", val);
> -return;
> -}
> -snprintf(num, sizeof(num), "%" PRId64, tsc_freq);
> -object_property_parse(OBJECT(cpu), num, "tsc-frequency",
> -  _err);
> -} else {
> -object_property_parse(OBJECT(cpu), val, featurestr, 
> _err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +return;
>  }
> +continue;
> +}
> +
> +eq = strchr(featurestr, '=');
> +if (eq) {
> +*eq++ = 0;
> +val = eq;
>  } else {
> -feat2prop(featurestr);
> -object_property_parse(OBJECT(cpu), "on", featurestr, _err);
> +val = "on";
> +}
> +
> +feat2prop(featurestr);
> +name = featurestr;
> +
> +/* Special case: */
> +if (!strcmp(name, "tsc-freq")) {
> +int64_t tsc_freq;
> +char *err;
> +char num[32];
> +
> +tsc_freq = qemu_strtosz_suffix_unit(val, ,
> +   QEMU_STRTOSZ_DEFSUFFIX_B, 1000);
> +if (tsc_freq < 0 || *err) {
> +error_setg(errp, "bad numerical value %s", val);
> +return;
> +}
> +snprintf(num, sizeof(num), "%" PRId64, tsc_freq);
> +val = num;
> +name = "tsc-frequency";
>  }
> +
> +object_property_parse(OBJECT(cpu), val, name, _err);
>  if (local_err) {
>  error_propagate(errp, local_err);
>  return;
>  }
> -featurestr = strtok(NULL, ",");
>  }
>  }
>  
> -- 
> 1.8.3.1
> 

-- 
Eduardo



[Qemu-devel] [PATCH V9 9/9] arm: xlnx-zynqmp: Add xlnx-dp and xlnx-dpdma

2016-06-07 Thread fred . konrad
From: KONRAD Frederic 

This adds the DP and the DPDMA to the Zynq MP platform.

Signed-off-by: KONRAD Frederic 
Reviewed-by: Peter Crosthwaite 
Reviewed-by: Alistair Francis 
Tested-By: Hyun Kwon 
---
 hw/arm/xlnx-zynqmp.c | 32 +++-
 include/hw/arm/xlnx-zynqmp.h |  4 
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index 308d677..23c7199 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -38,6 +38,12 @@
 #define SATA_ADDR   0xFD0C
 #define SATA_NUM_PORTS  2
 
+#define DP_ADDR 0xfd4a
+#define DP_IRQ  113
+
+#define DPDMA_ADDR  0xfd4c
+#define DPDMA_IRQ   116
+
 static const uint64_t gem_addr[XLNX_ZYNQMP_NUM_GEMS] = {
 0xFF0B, 0xFF0C, 0xFF0D, 0xFF0E,
 };
@@ -165,6 +171,12 @@ static void xlnx_zynqmp_init(Object *obj)
   TYPE_XILINX_SPIPS);
 qdev_set_parent_bus(DEVICE(>spi[i]), sysbus_get_default());
 }
+
+object_initialize(>dp, sizeof(s->dp), TYPE_XLNX_DP);
+qdev_set_parent_bus(DEVICE(>dp), sysbus_get_default());
+
+object_initialize(>dpdma, sizeof(s->dpdma), TYPE_XLNX_DPDMA);
+qdev_set_parent_bus(DEVICE(>dpdma), sysbus_get_default());
 }
 
 static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
@@ -388,8 +400,26 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
 object_property_add_alias(OBJECT(s), bus_name,
   OBJECT(>spi[i]), "spi0",
   _abort);
-   g_free(bus_name);
+g_free(bus_name);
+}
+
+object_property_set_bool(OBJECT(&s->dp), true, "realized", &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->dp), 0, DP_ADDR);
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->dp), 0, gic_spi[DP_IRQ]);
+
+object_property_set_bool(OBJECT(&s->dpdma), true, "realized", &err);
+if (err) {
+error_propagate(errp, err);
+return;
 }
+object_property_set_link(OBJECT(&s->dp), OBJECT(&s->dpdma), "dpdma",
+ &error_abort);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->dpdma), 0, DPDMA_ADDR);
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->dpdma), 0, gic_spi[DPDMA_IRQ]);
 }
 
 static Property xlnx_zynqmp_props[] = {
diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
index 68f6eb0..c2931bf 100644
--- a/include/hw/arm/xlnx-zynqmp.h
+++ b/include/hw/arm/xlnx-zynqmp.h
@@ -26,6 +26,8 @@
 #include "hw/ide/ahci.h"
 #include "hw/sd/sdhci.h"
 #include "hw/ssi/xilinx_spips.h"
+#include "hw/dma/xlnx_dpdma.h"
+#include "hw/display/xlnx_dp.h"
 
 #define TYPE_XLNX_ZYNQMP "xlnx,zynqmp"
 #define XLNX_ZYNQMP(obj) OBJECT_CHECK(XlnxZynqMPState, (obj), \
@@ -81,6 +83,8 @@ typedef struct XlnxZynqMPState {
 SysbusAHCIState sata;
 SDHCIState sdhci[XLNX_ZYNQMP_NUM_SDHCI];
 XilinxSPIPS spi[XLNX_ZYNQMP_NUM_SPIS];
+XlnxDPState dp;
+XlnxDPDMAState dpdma;
 
 char *boot_cpu;
 ARMCPU *boot_cpu_ptr;
-- 
1.8.3.1




[Qemu-devel] [PATCH V9 0/9] Xilinx DisplayPort.

2016-06-07 Thread fred . konrad
From: KONRAD Frederic 

This is the 9th version of the patch-set implementing the Xilinx
DisplayPort and DPDMA.

This 9th version fixes some minor issues.

The fourth patch introduces an AUX bus needed by the DP to read the DPCD.
It's also possible to connect an I2C device to it to do I2C through AUX
commands. The driver requires I2C broadcast writes to be modeled as well, which
seemed to be missing upstream.

The tree can be cloned at:
g...@git.greensocs.com:fkonrad/xilinx_dp.git branch xilinx_dp_v9_release

Details of the DPDMA part:
 * DPDMA is implemented as a QEMU SYSBUS device.
 * Interrupts are implemented except the axi error and fifo.

Details of the XILINX-DP:
 * DP is also implemented as a QEMU SYSBUS device. Multiple memory regions are
   used to avoid having a single big region, as there are holes in the DP
   memory map.
 * An aux-bus has been implemented; it creates a memory map for aux slaves and
   has an i2c bus (which is already implemented in QEMU).
 * The normal programmable i2c clock and controller implementation is missing
   from the QEMU tree, so the easiest way for us was to implement a dummy-clk
   driver in the kernel. It's a clock which does nothing but fake a clock such
   that the DPDMA driver works. The patch will be sent separately.
 * The graphics plane works on channel 3, video on channel 0 and audio on
   channels 4 and 5.

Thanks,
Fred

V8 -> V9 changes:
  * globally:
    * Rebased on current master (6ed5546fa7bf12c5b87ef76bafb86e1d77ed6e85).
  * aux:
    * Coding style fix.

V7 -> V8 changes:
  * globally:
    * Rebased on current master (e854d0cf7847e70f5ed5dad5820fc1bbeda6f29e).
    * include qemu/osdep.h.
  * xlnx-dp:
    * Coding style fix.
    * Drop xlnx_dp_aux_get/set.

V6 -> V7 changes:
  * globally:
    * Rebased on current master (0430891ce162b986c6e02a7729a942ecd2a32ca4).
    * Pick Peter's patch-set and rebase it on broadcast patch.
  * xlnx-dp:
    * Print some unimplemented debug trace instead of aborting.
  * zynq-mp:
    * Set realized before mapping the device.
    * Coding style fix.
  * aux:
    * Factorize i2c access with i2c_send_recv.

V5 -> V6 changes:
  * globally:
    * Rebased on current master (38a762fec63fd5c035aae29ba9a77d357e21e4a7).
    * Fix some coding style issues.

V4 -> V5 changes:
  * aux:
    * Move the header include/hw => include/hw/misc
  * dpcd:
    * Move the header hw/display => include/hw/display
  * i2c-ddc:
    * Move the header hw/i2c => include/hw/i2c
  * xlnx-dpdma:
    * Move the header hw/dma => include/hw/dma
    * Fix some style issues.
  * xlnx-dp:
    * Move the header hw/display => include/hw/display
  * globally:
    * Rebased on current master (c49d3411faae8ffaab8f7e5db47405a008411c10).

V3 -> V4 changes:
  * xlnx_dpdma:
    * Initialize operation_finished during reset.
    * Add a function to trigger a VSYNC interrupt from the xlnx_dp.
  * xlnx_dp:
    * Fix the default pixman format for video buffer.
    * Remove unused buffer.
  * dpcd:
    * Add the missing DPCD_LANE_X_STATUS.
    * Set status field for all ports to avoid driver error.
    * Use 4 lines by default.
    * Use guest error in case of an outbound access.
  * i2c broadcast:
    * Use a list of devices instead of relying on broadcast field to remove duped
      code.
  * other:
    * rebased on current master (774ee4772b6838b78741ea52d4bf26b8922244c5)

V2 -> V3 changes:
  * dpcd:
    * Add a CONFIG_DPCD.
  * i2c-ddc:
    * Fill in VMSD.
  * aux:
    * Remove address field.
    * Add a CONFIG_AUX.
  * dpdma:
    * Fill in VMSD.
    * Some coding style changes.
  * dp:
    * Fill in VMSD.
    * Coding style changes.

V1 -> V2 changes:
  * xlnx-zynqmp:
    * Remove the dummy object_property_add_child(..).
  * dpcd:
    * Compile only when the ZYNQMP platform is compiled.
    * Use qemu_log instead of printf.
    * Compile test debug traces.
    * Remove the unused current_reg.
    * Remove the blank realize.
    * Use dpcd_ prefixes instead of aux_ prefixes.
    * Add a reset callback.
    * Add the VMSD.
    * Add size constraint in the MemoryRegionOps structure instead of asserting.
    * Style fixes.
  * aux:
    * Compile only when the ZYNQMP platform is compiled.
    * Remove the class init and the class for aux-slave.
  * dpdma:
    * Compile only when the ZYNQMP platform is compiled.
    * Unify per channel macro in one, simplify the switch case.
    * Use extractXX.
    * Make DPDMA_GBL an or'ed register.
  * dp:
    * Compile only when the ZYNQMP platform is compiled.
    * Don't look at the audio channel count.
    * Use a third pixman plane when we do blending.
  * other:
    * Drop the useless "console: add qemu_alloc_display_format." patch as
      suggested by Gerd.
    * Rebase on current master (f3e3b083d4c266ea864ae3c83da49d4086857679).

KONRAD Frederic (7):
  i2cbus: remove unused dev field
  i2c: implement broadcast write
  introduce aux-bus
  introduce dpcd module
  introduce xlnx-dpdma
  

[Qemu-devel] [PATCH V9 7/9] introduce xlnx-dpdma

2016-06-07 Thread fred . konrad
From: KONRAD Frederic 

This is the implementation of the DPDMA.

Signed-off-by: KONRAD Frederic 
Reviewed-by: Alistair Francis 
Tested-By: Hyun Kwon 
---
 hw/dma/Makefile.objs|   1 +
 hw/dma/xlnx_dpdma.c | 794 
 include/hw/dma/xlnx_dpdma.h |  85 +
 3 files changed, 880 insertions(+)
 create mode 100644 hw/dma/xlnx_dpdma.c
 create mode 100644 include/hw/dma/xlnx_dpdma.h

diff --git a/hw/dma/Makefile.objs b/hw/dma/Makefile.objs
index a1abbcf..8b0823e 100644
--- a/hw/dma/Makefile.objs
+++ b/hw/dma/Makefile.objs
@@ -8,6 +8,7 @@ common-obj-$(CONFIG_XILINX_AXI) += xilinx_axidma.o
 common-obj-$(CONFIG_ETRAXFS) += etraxfs_dma.o
 common-obj-$(CONFIG_STP2000) += sparc32_dma.o
 common-obj-$(CONFIG_SUN4M) += sun4m_iommu.o
+obj-$(CONFIG_XLNX_ZYNQMP) += xlnx_dpdma.o
 
 obj-$(CONFIG_OMAP) += omap_dma.o soc_dma.o
 obj-$(CONFIG_PXA2XX) += pxa2xx_dma.o
diff --git a/hw/dma/xlnx_dpdma.c b/hw/dma/xlnx_dpdma.c
new file mode 100644
index 000..97a5da7
--- /dev/null
+++ b/hw/dma/xlnx_dpdma.c
@@ -0,0 +1,794 @@
+/*
+ * xlnx_dpdma.c
+ *
+ *  Copyright (C) 2015 : GreenSocs Ltd
+ *  http://www.greensocs.com/ , email: i...@greensocs.com
+ *
+ *  Developed by :
+ *  Frederic Konrad   
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/dma/xlnx_dpdma.h"
+
+#ifndef DEBUG_DPDMA
+#define DEBUG_DPDMA 0
+#endif
+
+#define DPRINTF(fmt, ...) do { \
+if (DEBUG_DPDMA) { \
+qemu_log("xlnx_dpdma: " fmt , ## __VA_ARGS__); \
+} \
+} while (0);
+
+/*
+ * Registers offset for DPDMA.
+ */
+#define DPDMA_ERR_CTRL(0x)
+#define DPDMA_ISR (0x0004 >> 2)
+#define DPDMA_IMR (0x0008 >> 2)
+#define DPDMA_IEN (0x000C >> 2)
+#define DPDMA_IDS (0x0010 >> 2)
+#define DPDMA_EISR(0x0014 >> 2)
+#define DPDMA_EIMR(0x0018 >> 2)
+#define DPDMA_EIEN(0x001C >> 2)
+#define DPDMA_EIDS(0x0020 >> 2)
+#define DPDMA_CNTL(0x0100 >> 2)
+
+#define DPDMA_GBL (0x0104 >> 2)
+#define DPDMA_GBL_TRG_CH(n)   (1 << n)
+#define DPDMA_GBL_RTRG_CH(n)  (1 << 6 << n)
+
+#define DPDMA_ALC0_CNTL   (0x0108 >> 2)
+#define DPDMA_ALC0_STATUS (0x010C >> 2)
+#define DPDMA_ALC0_MAX(0x0110 >> 2)
+#define DPDMA_ALC0_MIN(0x0114 >> 2)
+#define DPDMA_ALC0_ACC(0x0118 >> 2)
+#define DPDMA_ALC0_ACC_TRAN   (0x011C >> 2)
+#define DPDMA_ALC1_CNTL   (0x0120 >> 2)
+#define DPDMA_ALC1_STATUS (0x0124 >> 2)
+#define DPDMA_ALC1_MAX(0x0128 >> 2)
+#define DPDMA_ALC1_MIN(0x012C >> 2)
+#define DPDMA_ALC1_ACC(0x0130 >> 2)
+#define DPDMA_ALC1_ACC_TRAN   (0x0134 >> 2)
+
+#define DPDMA_DSCR_STRT_ADDRE_CH(n)   ((0x0200 + n * 0x100) >> 2)
+#define DPDMA_DSCR_STRT_ADDR_CH(n)((0x0204 + n * 0x100) >> 2)
+#define DPDMA_DSCR_NEXT_ADDRE_CH(n)   ((0x0208 + n * 0x100) >> 2)
+#define DPDMA_DSCR_NEXT_ADDR_CH(n)((0x020C + n * 0x100) >> 2)
+#define DPDMA_PYLD_CUR_ADDRE_CH(n)((0x0210 + n * 0x100) >> 2)
+#define DPDMA_PYLD_CUR_ADDR_CH(n) ((0x0214 + n * 0x100) >> 2)
+
+#define DPDMA_CNTL_CH(n)  ((0x0218 + n * 0x100) >> 2)
+#define DPDMA_CNTL_CH_EN  (1)
+#define DPDMA_CNTL_CH_PAUSED  (1 << 1)
+
+#define DPDMA_STATUS_CH(n)((0x021C + n * 0x100) >> 2)
+#define DPDMA_STATUS_BURST_TYPE   (1 << 4)
+#define DPDMA_STATUS_MODE (1 << 5)
+#define DPDMA_STATUS_EN_CRC   (1 << 6)
+#define 

Re: [Qemu-devel] [PATCH] Make password based authentication the default for VNC

2016-06-07 Thread Gerd Hoffmann
  Hi,

> Agreed. The target of this patch is however not people who know that
> they want security, but rather people who don't know it :-). Ie.
> people who just run things with their default settings and stop as
> soon as it seems to work, without consideration for security.

I have my doubts this is going to work.  The wikis of this world will
start to include the ",insecure", pretty much like they include
",disable-ticketing" for -spice today.  And people will cut+paste that.

Flipping defaults often breaks things, and this really doesn't look like
a good reason to take that risk.

cheers,
  Gerd




[Qemu-devel] [PATCH V9 2/9] i2c: implement broadcast write

2016-06-07 Thread fred . konrad
From: KONRAD Frederic 

This writes to every slave when the I2C bus gets a write to address 0.
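
The behaviour described above can be sketched outside of QEMU as follows.
This is only a minimal illustration, assuming a flat array of slaves; the
ToySlave type, TOY_BROADCAST_ADDR and toy_i2c_write() are hypothetical names
and are not part of hw/i2c/core.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an I2C slave; this is not QEMU's I2CSlave. */
typedef struct {
    uint8_t address;
    uint8_t last_byte;   /* last byte received by this slave */
    unsigned writes;     /* number of bytes received so far */
} ToySlave;

#define TOY_BROADCAST_ADDR 0x00

/*
 * Deliver one byte to the first slave matching 'address', or to every
 * slave when 'address' is the broadcast address.
 * Returns the number of slaves that received the byte.
 */
static size_t toy_i2c_write(ToySlave *slaves, size_t n,
                            uint8_t address, uint8_t data)
{
    size_t hit = 0;

    assert(slaves != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (address == TOY_BROADCAST_ADDR || slaves[i].address == address) {
            slaves[i].last_byte = data;
            slaves[i].writes++;
            hit++;
            if (address != TOY_BROADCAST_ADDR) {
                break;   /* unicast: stop at the first match */
            }
        }
    }
    return hit;
}
```

A return value of 0 plays the role of the "address is not valid" case; in the
patch itself the selected devices are kept in a list on the bus rather than
being looked up for every byte.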

Signed-off-by: KONRAD Frederic 
Reviewed-by: Alistair Francis 
Reviewed-by: Peter Crosthwaite 
Tested-By: Hyun Kwon 
---
 hw/i2c/core.c | 129 ++
 1 file changed, 75 insertions(+), 54 deletions(-)

diff --git a/hw/i2c/core.c b/hw/i2c/core.c
index 013ff68..a3921d9 100644
--- a/hw/i2c/core.c
+++ b/hw/i2c/core.c
@@ -10,11 +10,19 @@
 #include "qemu/osdep.h"
 #include "hw/i2c/i2c.h"
 
+typedef struct I2CNode I2CNode;
+
+struct I2CNode {
+I2CSlave *elt;
+QLIST_ENTRY(I2CNode) next;
+};
+
 struct I2CBus
 {
 BusState qbus;
-I2CSlave *current_dev;
+QLIST_HEAD(, I2CNode) current_devs;
 uint8_t saved_address;
+bool broadcast;
 };
 
 static Property i2c_props[] = {
@@ -35,17 +43,12 @@ static void i2c_bus_pre_save(void *opaque)
 {
 I2CBus *bus = opaque;
 
-bus->saved_address = bus->current_dev ? bus->current_dev->address : -1;
-}
-
-static int i2c_bus_post_load(void *opaque, int version_id)
-{
-I2CBus *bus = opaque;
-
-/* The bus is loaded before attached devices, so load and save the
-   current device id.  Devices will check themselves as loaded.  */
-bus->current_dev = NULL;
-return 0;
+bus->saved_address = -1;
+if (!QLIST_EMPTY(&bus->current_devs)) {
+if (!bus->broadcast) {
+bus->saved_address = QLIST_FIRST(&bus->current_devs)->elt->address;
+}
+}
 }
 
 static const VMStateDescription vmstate_i2c_bus = {
@@ -53,9 +56,9 @@ static const VMStateDescription vmstate_i2c_bus = {
 .version_id = 1,
 .minimum_version_id = 1,
 .pre_save = i2c_bus_pre_save,
-.post_load = i2c_bus_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT8(saved_address, I2CBus),
+VMSTATE_BOOL(broadcast, I2CBus),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -78,7 +81,7 @@ void i2c_set_slave_address(I2CSlave *dev, uint8_t address)
 /* Return nonzero if bus is busy.  */
 int i2c_bus_busy(I2CBus *bus)
 {
-return bus->current_dev != NULL;
+return !QLIST_EMPTY(&bus->current_devs);
 }
 
 /* Returns non-zero if the address is not valid.  */
@@ -86,95 +89,109 @@ int i2c_bus_busy(I2CBus *bus)
 int i2c_start_transfer(I2CBus *bus, uint8_t address, int recv)
 {
 BusChild *kid;
-I2CSlave *slave = NULL;
 I2CSlaveClass *sc;
+I2CNode *node;
+
+if (address == 0x00) {
+/*
+ * This is a broadcast, the current_devs will be all the devices of the
+ * bus.
+ */
+bus->broadcast = true;
+}
 
 QTAILQ_FOREACH(kid, &bus->qbus.children, sibling) {
 DeviceState *qdev = kid->child;
 I2CSlave *candidate = I2C_SLAVE(qdev);
-if (candidate->address == address) {
-slave = candidate;
-break;
+if ((candidate->address == address) || (bus->broadcast)) {
+node = g_malloc(sizeof(struct I2CNode));
+node->elt = candidate;
+QLIST_INSERT_HEAD(&bus->current_devs, node, next);
+if (!bus->broadcast) {
+break;
+}
 }
 }
 
-if (!slave) {
+if (QLIST_EMPTY(&bus->current_devs)) {
 return 1;
 }
 
-sc = I2C_SLAVE_GET_CLASS(slave);
-/* If the bus is already busy, assume this is a repeated
-   start condition.  */
-bus->current_dev = slave;
-if (sc->event) {
-sc->event(slave, recv ? I2C_START_RECV : I2C_START_SEND);
+QLIST_FOREACH(node, &bus->current_devs, next) {
+sc = I2C_SLAVE_GET_CLASS(node->elt);
+/* If the bus is already busy, assume this is a repeated
+   start condition.  */
+if (sc->event) {
+sc->event(node->elt, recv ? I2C_START_RECV : I2C_START_SEND);
+}
 }
 return 0;
 }
 
 void i2c_end_transfer(I2CBus *bus)
 {
-I2CSlave *dev = bus->current_dev;
 I2CSlaveClass *sc;
+I2CNode *node;
 
-if (!dev) {
+if (QLIST_EMPTY(&bus->current_devs)) {
 return;
 }
 
-sc = I2C_SLAVE_GET_CLASS(dev);
-if (sc->event) {
-sc->event(dev, I2C_FINISH);
+QLIST_FOREACH(node, &bus->current_devs, next) {
+sc = I2C_SLAVE_GET_CLASS(node->elt);
+if (sc->event) {
+sc->event(node->elt, I2C_FINISH);
+}
+QLIST_REMOVE(node, next);
+g_free(node);
 }
-
-bus->current_dev = NULL;
+bus->broadcast = false;
 }
 
 int i2c_send(I2CBus *bus, uint8_t data)
 {
-I2CSlave *dev = bus->current_dev;
 I2CSlaveClass *sc;
+I2CNode *node;
+int ret = -1;
 
-if (!dev) {
-return -1;
-}
-
-sc = I2C_SLAVE_GET_CLASS(dev);
-if (sc->send) {
-return sc->send(dev, data);
+QLIST_FOREACH(node, &bus->current_devs, next) {
+sc = I2C_SLAVE_GET_CLASS(node->elt);
+if (sc->send) {
+ret 

[Qemu-devel] [PATCH V9 8/9] introduce xlnx-dp

2016-06-07 Thread fred . konrad
From: KONRAD Frederic 

This is the implementation of the DisplayPort.
It has an aux-bus to access dpcd and edid.

The graphics plane is connected to channel 3.
The video plane is connected to channel 0.
The audio streams are connected to channels 4 and 5.
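
The channel assignment above can be written down as a small lookup. This is
just a sketch of the mapping stated in this message; the ToyRole enum and
toy_dpdma_channel_role() are hypothetical names, and channels this message
does not describe are reported as TOY_ROLE_OTHER.

```c
#include <assert.h>

/* Hypothetical roles; only the mapping stated in the message is encoded. */
typedef enum {
    TOY_ROLE_VIDEO,
    TOY_ROLE_GRAPHICS,
    TOY_ROLE_AUDIO,
    TOY_ROLE_OTHER       /* not described in this message */
} ToyRole;

/* Map a DPDMA channel number to the stream it carries. */
static ToyRole toy_dpdma_channel_role(unsigned channel)
{
    switch (channel) {
    case 0:
        return TOY_ROLE_VIDEO;
    case 3:
        return TOY_ROLE_GRAPHICS;
    case 4:
    case 5:
        return TOY_ROLE_AUDIO;
    default:
        return TOY_ROLE_OTHER;
    }
}
```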

Signed-off-by: KONRAD Frederic 
Tested-By: Hyun Kwon 
Reviewed-by: Alistair Francis 
---
 hw/display/Makefile.objs |1 +
 hw/display/xlnx_dp.c | 1336 ++
 include/hw/display/xlnx_dp.h |  109 
 3 files changed, 1446 insertions(+)
 create mode 100644 hw/display/xlnx_dp.c
 create mode 100644 include/hw/display/xlnx_dp.h

diff --git a/hw/display/Makefile.objs b/hw/display/Makefile.objs
index ddf3275..063889b 100644
--- a/hw/display/Makefile.objs
+++ b/hw/display/Makefile.objs
@@ -44,3 +44,4 @@ virtio-gpu.o-libs += $(VIRGL_LIBS)
 virtio-gpu-3d.o-cflags := $(VIRGL_CFLAGS)
 virtio-gpu-3d.o-libs += $(VIRGL_LIBS)
 obj-$(CONFIG_DPCD) += dpcd.o
+obj-$(CONFIG_XLNX_ZYNQMP) += xlnx_dp.o
diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
new file mode 100644
index 000..552955f
--- /dev/null
+++ b/hw/display/xlnx_dp.c
@@ -0,0 +1,1336 @@
+/*
+ * xlnx_dp.c
+ *
+ *  Copyright (C) 2015 : GreenSocs Ltd
+ *  http://www.greensocs.com/ , email: i...@greensocs.com
+ *
+ *  Developed by :
+ *  Frederic Konrad   
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option)any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/display/xlnx_dp.h"
+
+#ifndef DEBUG_DP
+#define DEBUG_DP 0
+#endif
+
+#define DPRINTF(fmt, ...) do { \
+if (DEBUG_DP) { \
+qemu_log("xlnx_dp: " fmt , ## __VA_ARGS__); \
+} \
+} while (0);
+
+/*
+ * Register offset for DP.
+ */
+#define DP_LINK_BW_SET  (0x >> 2)
+#define DP_LANE_COUNT_SET   (0x0004 >> 2)
+#define DP_ENHANCED_FRAME_EN(0x0008 >> 2)
+#define DP_TRAINING_PATTERN_SET (0x000C >> 2)
+#define DP_LINK_QUAL_PATTERN_SET(0x0010 >> 2)
+#define DP_SCRAMBLING_DISABLE   (0x0014 >> 2)
+#define DP_DOWNSPREAD_CTRL  (0x0018 >> 2)
+#define DP_SOFTWARE_RESET   (0x001C >> 2)
+#define DP_TRANSMITTER_ENABLE   (0x0080 >> 2)
+#define DP_MAIN_STREAM_ENABLE   (0x0084 >> 2)
+#define DP_FORCE_SCRAMBLER_RESET(0x00C0 >> 2)
+#define DP_VERSION_REGISTER (0x00F8 >> 2)
+#define DP_CORE_ID  (0x00FC >> 2)
+
+#define DP_AUX_COMMAND_REGISTER (0x0100 >> 2)
+#define AUX_ADDR_ONLY_MASK  (0x1000)
+#define AUX_COMMAND_MASK(0x0F00)
+#define AUX_COMMAND_SHIFT   (8)
+#define AUX_COMMAND_NBYTES  (0x000F)
+
+#define DP_AUX_WRITE_FIFO   (0x0104 >> 2)
+#define DP_AUX_ADDRESS  (0x0108 >> 2)
+#define DP_AUX_CLOCK_DIVIDER(0x010C >> 2)
+#define DP_TX_USER_FIFO_OVERFLOW(0x0110 >> 2)
+#define DP_INTERRUPT_SIGNAL_STATE   (0x0130 >> 2)
+#define DP_AUX_REPLY_DATA   (0x0134 >> 2)
+#define DP_AUX_REPLY_CODE   (0x0138 >> 2)
+#define DP_AUX_REPLY_COUNT  (0x013C >> 2)
+#define DP_REPLY_DATA_COUNT (0x0148 >> 2)
+#define DP_REPLY_STATUS (0x014C >> 2)
+#define DP_HPD_DURATION (0x0150 >> 2)
+#define DP_MAIN_STREAM_HTOTAL   (0x0180 >> 2)
+#define DP_MAIN_STREAM_VTOTAL   (0x0184 >> 2)
+#define DP_MAIN_STREAM_POLARITY (0x0188 >> 2)
+#define DP_MAIN_STREAM_HSWIDTH  (0x018C >> 2)
+#define DP_MAIN_STREAM_VSWIDTH  (0x0190 >> 2)
+#define DP_MAIN_STREAM_HRES (0x0194 >> 2)
+#define DP_MAIN_STREAM_VRES (0x0198 >> 2)
+#define DP_MAIN_STREAM_HSTART   (0x019C >> 2)
+#define DP_MAIN_STREAM_VSTART   (0x01A0 >> 2)
+#define DP_MAIN_STREAM_MISC0(0x01A4 >> 2)
+#define DP_MAIN_STREAM_MISC1(0x01A8 >> 2)
+#define 

[Qemu-devel] [PATCH V9 3/9] i2c: Factor out send() and recv() common logic

2016-06-07 Thread fred . konrad
From: Peter Crosthwaite 

Most of the control flow logic between send and recv (error checking
etc) is the same. Factor this out into a common send_recv() API.
This is then usable by clients, where the control logic for send
and receive differs only by a boolean. E.g.

if (send)
   i2c_send(...);
else
   i2c_recv(...);

becomes:

i2c_send_recv(... , send);
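
The same shape can be shown with a self-contained toy, assuming a device
that holds a single byte. ToyDev, toy_send_recv(), toy_send() and toy_recv()
are hypothetical names used for illustration only; the real functions operate
on an I2CBus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-register device; not a QEMU type. */
typedef struct {
    uint8_t reg;
    bool present;
} ToyDev;

/*
 * Common path: the error checking is shared, and the direction is chosen
 * by a boolean. Returns 0 on success and -1 on error. On receive, the
 * byte is stored through 'data'.
 */
static int toy_send_recv(ToyDev *dev, uint8_t *data, bool send)
{
    assert(data != NULL);
    if (dev == NULL || !dev->present) {
        return -1;
    }
    if (send) {
        dev->reg = *data;
    } else {
        *data = dev->reg;
    }
    return 0;
}

/* Thin wrappers keep the old two-function interface. */
static int toy_send(ToyDev *dev, uint8_t data)
{
    return toy_send_recv(dev, &data, true);
}

static int toy_recv(ToyDev *dev)
{
    uint8_t data;
    int ret = toy_send_recv(dev, &data, false);

    return ret < 0 ? ret : data;
}
```

As in the patch, the receive wrapper folds the byte back into the return
value so that existing callers see no change.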

Signed-off-by: Peter Crosthwaite 
Changes from FK:
  * Rebased on master.
  * Rebased on my i2c broadcast patch.
Signed-off-by: KONRAD Frederic 
Reviewed-by: Alistair Francis 
---
 hw/i2c/core.c| 48 
 include/hw/i2c/i2c.h |  1 +
 2 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/hw/i2c/core.c b/hw/i2c/core.c
index a3921d9..a49138f 100644
--- a/hw/i2c/core.c
+++ b/hw/i2c/core.c
@@ -148,34 +148,50 @@ void i2c_end_transfer(I2CBus *bus)
 bus->broadcast = false;
 }
 
-int i2c_send(I2CBus *bus, uint8_t data)
+int i2c_send_recv(I2CBus *bus, uint8_t *data, bool send)
 {
 I2CSlaveClass *sc;
 I2CNode *node;
 int ret = -1;
 
-QLIST_FOREACH(node, &bus->current_devs, next) {
-sc = I2C_SLAVE_GET_CLASS(node->elt);
-if (sc->send) {
-ret |= sc->send(node->elt, data);
+if (send) {
+QLIST_FOREACH(node, &bus->current_devs, next) {
+sc = I2C_SLAVE_GET_CLASS(node->elt);
+if (sc->send) {
+ret |= sc->send(node->elt, *data);
+}
+}
+return ret;
+} else {
+if ((QLIST_EMPTY(&bus->current_devs)) || (bus->broadcast)) {
+return -1;
 }
+
+sc = I2C_SLAVE_GET_CLASS(QLIST_FIRST(&bus->current_devs)->elt);
+if (sc->recv) {
+ret = sc->recv(QLIST_FIRST(&bus->current_devs)->elt);
+if (ret < 0) {
+return ret;
+} else {
+*data = ret;
+return 0;
+}
+}
+return -1;
 }
-return ret;
 }
 
-int i2c_recv(I2CBus *bus)
+int i2c_send(I2CBus *bus, uint8_t data)
 {
-I2CSlaveClass *sc;
+return i2c_send_recv(bus, &data, true);
+}
 
-if ((QLIST_EMPTY(&bus->current_devs)) || (bus->broadcast)) {
-return -1;
-}
+int i2c_recv(I2CBus *bus)
+{
+uint8_t data;
+int ret = i2c_send_recv(bus, &data, false);
 
-sc = I2C_SLAVE_GET_CLASS(QLIST_FIRST(&bus->current_devs)->elt);
-if (sc->recv) {
-return sc->recv(QLIST_FIRST(&bus->current_devs)->elt);
-}
-return -1;
+return ret < 0 ? ret : data;
 }
 
 void i2c_nack(I2CBus *bus)
diff --git a/include/hw/i2c/i2c.h b/include/hw/i2c/i2c.h
index 4986ebc..c4085aa 100644
--- a/include/hw/i2c/i2c.h
+++ b/include/hw/i2c/i2c.h
@@ -56,6 +56,7 @@ int i2c_bus_busy(I2CBus *bus);
 int i2c_start_transfer(I2CBus *bus, uint8_t address, int recv);
 void i2c_end_transfer(I2CBus *bus);
 void i2c_nack(I2CBus *bus);
+int i2c_send_recv(I2CBus *bus, uint8_t *data, bool send);
 int i2c_send(I2CBus *bus, uint8_t data);
 int i2c_recv(I2CBus *bus);
 
-- 
1.8.3.1




[Qemu-devel] [PATCH V9 5/9] introduce dpcd module

2016-06-07 Thread fred . konrad
From: KONRAD Frederic 

This introduces the dpcd module.
It sits on an aux-bus and can be accessed by the driver to get the lane speed, etc.
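
A rough, standalone sketch of the access pattern follows, assuming a byte
array with a bounded readable area as in the patch below. ToyDPCD and the
toy_dpcd_*() helpers are hypothetical names; the real device goes through
MemoryRegionOps and logs a guest error on out-of-range accesses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same bound as DPCD_READABLE_AREA in the patch; the rest is illustrative. */
#define TOY_DPCD_READABLE_AREA 0x600

typedef struct {
    uint8_t info[TOY_DPCD_READABLE_AREA];
} ToyDPCD;

/* Clear every capability byte, as a reset would do before filling them in. */
static void toy_dpcd_reset(ToyDPCD *d)
{
    assert(d != NULL);
    memset(d->info, 0, sizeof(d->info));
}

/* Reads outside the readable area return 0. */
static uint8_t toy_dpcd_read(const ToyDPCD *d, uint32_t offset)
{
    return offset < TOY_DPCD_READABLE_AREA ? d->info[offset] : 0;
}

/* Writes outside the readable area are ignored. */
static void toy_dpcd_write(ToyDPCD *d, uint32_t offset, uint8_t value)
{
    if (offset < TOY_DPCD_READABLE_AREA) {
        d->info[offset] = value;
    }
}
```

A driver model would read single bytes such as the maximum link rate or the
lane count from offsets inside this area.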

Signed-off-by: KONRAD Frederic 
Reviewed-by: Alistair Francis 
Reviewed-by: Peter Crosthwaite 
Tested-By: Hyun Kwon 
---
 default-configs/aarch64-softmmu.mak |   1 +
 hw/display/Makefile.objs|   1 +
 hw/display/dpcd.c   | 173 
 include/hw/display/dpcd.h   | 105 ++
 4 files changed, 280 insertions(+)
 create mode 100644 hw/display/dpcd.c
 create mode 100644 include/hw/display/dpcd.h

diff --git a/default-configs/aarch64-softmmu.mak b/default-configs/aarch64-softmmu.mak
index d3a2665..87165b7 100644
--- a/default-configs/aarch64-softmmu.mak
+++ b/default-configs/aarch64-softmmu.mak
@@ -4,4 +4,5 @@
 include arm-softmmu.mak
 
 CONFIG_AUX=y
+CONFIG_DPCD=y
 CONFIG_XLNX_ZYNQMP=y
diff --git a/hw/display/Makefile.objs b/hw/display/Makefile.objs
index d99780e..ddf3275 100644
--- a/hw/display/Makefile.objs
+++ b/hw/display/Makefile.objs
@@ -43,3 +43,4 @@ virtio-gpu.o-cflags := $(VIRGL_CFLAGS)
 virtio-gpu.o-libs += $(VIRGL_LIBS)
 virtio-gpu-3d.o-cflags := $(VIRGL_CFLAGS)
 virtio-gpu-3d.o-libs += $(VIRGL_LIBS)
+obj-$(CONFIG_DPCD) += dpcd.o
diff --git a/hw/display/dpcd.c b/hw/display/dpcd.c
new file mode 100644
index 000..5a36855
--- /dev/null
+++ b/hw/display/dpcd.c
@@ -0,0 +1,173 @@
+/*
+ * dpcd.c
+ *
+ *  Copyright (C) 2015 : GreenSocs Ltd
+ *  http://www.greensocs.com/ , email: i...@greensocs.com
+ *
+ *  Developed by :
+ *  Frederic Konrad   
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option)any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ *
+ */
+
+/*
+ * This is a simple AUX slave which emulates a connected screen.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/misc/aux.h"
+#include "hw/display/dpcd.h"
+
+#ifndef DEBUG_DPCD
+#define DEBUG_DPCD 0
+#endif
+
+#define DPRINTF(fmt, ...) do { \
+if (DEBUG_DPCD) { \
+qemu_log("dpcd: " fmt, ## __VA_ARGS__); \
+} \
+} while (0);
+
+#define DPCD_READABLE_AREA  0x600
+
+struct DPCDState {
+/*< private >*/
+AUXSlave parent_obj;
+
+/*< public >*/
+/*
+ * The DPCD is 0x7 length but read as 0 after offset 0x5FF.
+ */
+uint8_t dpcd_info[DPCD_READABLE_AREA];
+
+MemoryRegion iomem;
+};
+
+static uint64_t dpcd_read(void *opaque, hwaddr offset, unsigned size)
+{
+uint8_t ret;
+DPCDState *e = DPCD(opaque);
+
+if (offset < DPCD_READABLE_AREA) {
+ret = e->dpcd_info[offset];
+} else {
+qemu_log_mask(LOG_GUEST_ERROR, "dpcd: Bad offset 0x%" HWADDR_PRIX "\n",
+   offset);
+ret = 0;
+}
+
+DPRINTF("read 0x%" PRIX8 " @0x%" HWADDR_PRIX "\n", ret, offset);
+return ret;
+}
+
+static void dpcd_write(void *opaque, hwaddr offset, uint64_t value,
+   unsigned size)
+{
+DPCDState *e = DPCD(opaque);
+
+DPRINTF("write 0x%" PRIX8 " @0x%" HWADDR_PRIX "\n", (uint8_t)value, offset);
+
+if (offset < DPCD_READABLE_AREA) {
+e->dpcd_info[offset] = value;
+} else {
+qemu_log_mask(LOG_GUEST_ERROR, "dpcd: Bad offset 0x%" HWADDR_PRIX "\n",
+   offset);
+}
+}
+
+static const MemoryRegionOps aux_ops = {
+.read = dpcd_read,
+.write = dpcd_write,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 1,
+},
+.impl = {
+.min_access_size = 1,
+.max_access_size = 1,
+},
+};
+
+static void dpcd_reset(DeviceState *dev)
+{
+DPCDState *s = DPCD(dev);
+
+memset(&(s->dpcd_info), 0, sizeof(s->dpcd_info));
+
+s->dpcd_info[DPCD_REVISION] = DPCD_REV_1_0;
+s->dpcd_info[DPCD_MAX_LINK_RATE] = DPCD_5_4GBPS;
+s->dpcd_info[DPCD_MAX_LANE_COUNT] = DPCD_FOUR_LANES;
+s->dpcd_info[DPCD_RECEIVE_PORT0_CAP_0] = DPCD_EDID_PRESENT;
+/* buffer size */
+s->dpcd_info[DPCD_RECEIVE_PORT0_CAP_1] = 0xFF;
+
+s->dpcd_info[DPCD_LANE0_1_STATUS] = 

[Qemu-devel] [PATCH V9 1/9] i2cbus: remove unused dev field

2016-06-07 Thread fred . konrad
From: KONRAD Frederic 

The dev field in i2cbus is not used.
So just drop it.

Signed-off-by: KONRAD Frederic 
Reviewed-by: Alistair Francis 
Reviewed-by: Peter Crosthwaite 
Tested-By: Hyun Kwon 
---
 hw/i2c/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/i2c/core.c b/hw/i2c/core.c
index ba22104..013ff68 100644
--- a/hw/i2c/core.c
+++ b/hw/i2c/core.c
@@ -14,7 +14,6 @@ struct I2CBus
 {
 BusState qbus;
 I2CSlave *current_dev;
-I2CSlave *dev;
 uint8_t saved_address;
 };
 
-- 
1.8.3.1




[Qemu-devel] [PATCH V9 6/9] hw/i2c-ddc.c: Implement DDC I2C slave

2016-06-07 Thread fred . konrad
From: Peter Maydell 

Implement an I2C slave which implements DDC and returns the
EDID data for an attached monitor.
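
One detail of the EDID block that is easy to get wrong is its final byte: a
128-byte EDID base block is valid only if all of its bytes sum to 0 modulo
256. The sketch below shows that rule in isolation; toy_edid_checksum() and
toy_edid_is_valid() are hypothetical helper names, not functions from this
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_EDID_SIZE 128

/* Compute the value byte 127 must hold so the block sums to 0 mod 256. */
static uint8_t toy_edid_checksum(const uint8_t *blob)
{
    uint8_t sum = 0;

    assert(blob != NULL);
    for (size_t i = 0; i < TOY_EDID_SIZE - 1; i++) {
        sum = (uint8_t)(sum + blob[i]);
    }
    return (uint8_t)(0u - sum);
}

/* A block is consistent when all 128 bytes sum to 0 mod 256. */
static bool toy_edid_is_valid(const uint8_t *blob)
{
    uint8_t sum = 0;

    assert(blob != NULL);
    for (size_t i = 0; i < TOY_EDID_SIZE; i++) {
        sum = (uint8_t)(sum + blob[i]);
    }
    return sum == 0;
}
```

The checksum has to be recomputed whenever any of the first 127 bytes
changes.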

Signed-off-by: Peter Maydell 

  - Rebased on the current master.
  - Modified for QOM.

Signed-off-by: KONRAD Frederic 
Reviewed-by: Alistair Francis 
Tested-By: Hyun Kwon 
---
 default-configs/aarch64-softmmu.mak |   1 +
 hw/i2c/Makefile.objs|   1 +
 hw/i2c/i2c-ddc.c| 307 
 include/hw/i2c/i2c-ddc.h|  38 +
 4 files changed, 347 insertions(+)
 create mode 100644 hw/i2c/i2c-ddc.c
 create mode 100644 include/hw/i2c/i2c-ddc.h

diff --git a/default-configs/aarch64-softmmu.mak b/default-configs/aarch64-softmmu.mak
index 87165b7..2449483 100644
--- a/default-configs/aarch64-softmmu.mak
+++ b/default-configs/aarch64-softmmu.mak
@@ -4,5 +4,6 @@
 include arm-softmmu.mak
 
 CONFIG_AUX=y
+CONFIG_DDC=y
 CONFIG_DPCD=y
 CONFIG_XLNX_ZYNQMP=y
diff --git a/hw/i2c/Makefile.objs b/hw/i2c/Makefile.objs
index 1fd54ed..a081b8e 100644
--- a/hw/i2c/Makefile.objs
+++ b/hw/i2c/Makefile.objs
@@ -1,4 +1,5 @@
 common-obj-y += core.o smbus.o smbus_eeprom.o
+common-obj-$(CONFIG_DDC) += i2c-ddc.o
 common-obj-$(CONFIG_VERSATILE_I2C) += versatile_i2c.o
 common-obj-$(CONFIG_ACPI_X86) += smbus_ich9.o
 common-obj-$(CONFIG_APM) += pm_smbus.o
diff --git a/hw/i2c/i2c-ddc.c b/hw/i2c/i2c-ddc.c
new file mode 100644
index 000..02cd374
--- /dev/null
+++ b/hw/i2c/i2c-ddc.c
@@ -0,0 +1,307 @@
+/* A simple I2C slave for returning monitor EDID data via DDC.
+ *
+ * Copyright (c) 2011 Linaro Limited
+ * Written by Peter Maydell
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 as
+ *  published by the Free Software Foundation.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License along
+ *  with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/i2c/i2c.h"
+#include "hw/i2c/i2c-ddc.h"
+
+#ifndef DEBUG_I2CDDC
+#define DEBUG_I2CDDC 0
+#endif
+
+#define DPRINTF(fmt, ...) do { \
+if (DEBUG_I2CDDC) { \
+qemu_log("i2c-ddc: " fmt , ## __VA_ARGS__); \
+} \
+} while (0);
+
+/* Structure defining a monitor's characteristics in a
+ * readable format: this should be passed to build_edid_blob()
+ * to convert it into the 128 byte binary EDID blob.
+ * Not all bits of the EDID are customisable here.
+ */
+struct EDIDData {
+char manuf_id[3]; /* three upper case letters */
+uint16_t product_id;
+uint32_t serial_no;
+uint8_t manuf_week;
+int manuf_year;
+uint8_t h_cm;
+uint8_t v_cm;
+uint8_t gamma;
+char monitor_name[14];
+char serial_no_string[14];
+/* Range limits */
+uint8_t vmin; /* Hz */
+uint8_t vmax; /* Hz */
+uint8_t hmin; /* kHz */
+uint8_t hmax; /* kHz */
+uint8_t pixclock; /* MHz / 10 */
+uint8_t timing_data[18];
+};
+
+typedef struct EDIDData EDIDData;
+
+/* EDID data for a simple LCD monitor */
+static const EDIDData lcd_edid = {
+/* The manuf_id ought really to be an assigned EISA ID */
+.manuf_id = "QMU",
+.product_id = 0,
+.serial_no = 1,
+.manuf_week = 1,
+.manuf_year = 2011,
+.h_cm = 40,
+.v_cm = 30,
+.gamma = 0x78,
+.monitor_name = "QEMU monitor",
+.serial_no_string = "1",
+.vmin = 40,
+.vmax = 120,
+.hmin = 30,
+.hmax = 100,
+.pixclock = 18,
+.timing_data = {
+/* Borrowed from a 21" LCD */
+0x48, 0x3f, 0x40, 0x30, 0x62, 0xb0, 0x32, 0x40, 0x40,
+0xc0, 0x13, 0x00, 0x98, 0x32, 0x11, 0x00, 0x00, 0x1e
+}
+};
+
+static uint8_t manuf_char_to_int(char c)
+{
+return (c - 'A') & 0x1f;
+}
+
+static void write_ascii_descriptor_block(uint8_t *descblob, uint8_t blocktype,
+ const char *string)
+{
+/* Write an EDID Descriptor Block of the "ascii string" type */
+int i;
+descblob[0] = descblob[1] = descblob[2] = descblob[4] = 0;
+descblob[3] = blocktype;
+/* The rest is 13 bytes of ASCII; if less, then the rest must
+ * be filled with newline then spaces
+ */
+for (i = 5; i < 19; i++) {
+descblob[i] = string[i - 5];
+if (!descblob[i]) {
+break;
+}
+}
+if (i < 19) {
+descblob[i++] = 

[Qemu-devel] [PATCH V9 4/9] introduce aux-bus

2016-06-07 Thread fred . konrad
From: KONRAD Frederic 

This introduces a new bus: aux-bus.

It contains an address space for aux slave devices and a bridge to an I2C bus
for I2C through AUX transactions.
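
How a request ends up on one side or the other can be sketched from the
4-bit AUX command field: bit 3 selects a native AUX transaction, bit 2 is
the I2C "middle of transaction" flag, and the low two bits give the
direction. The names below (ToyRoute, toy_aux_route() and friends) are
hypothetical and only illustrate the decoding; they are not the API added by
this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_AUX_NATIVE_BIT 0x8   /* set: native AUX, clear: I2C over AUX */
#define TOY_AUX_I2C_MOT    0x4   /* I2C only: keep the transaction open */
#define TOY_AUX_DIR_MASK   0x3   /* 0: write, 1: read */

typedef enum {
    TOY_ROUTE_I2C,      /* forwarded to the bridged I2C bus */
    TOY_ROUTE_NATIVE    /* served from the AUX address space */
} ToyRoute;

static ToyRoute toy_aux_route(uint8_t cmd)
{
    return (cmd & TOY_AUX_NATIVE_BIT) ? TOY_ROUTE_NATIVE : TOY_ROUTE_I2C;
}

static bool toy_aux_is_read(uint8_t cmd)
{
    return (cmd & TOY_AUX_DIR_MASK) == 0x1;
}

/* Only meaningful for I2C-over-AUX requests. */
static bool toy_aux_keeps_i2c_open(uint8_t cmd)
{
    assert(toy_aux_route(cmd) == TOY_ROUTE_I2C);
    return (cmd & TOY_AUX_I2C_MOT) != 0;
}
```

For example, 0x9 decodes as a native read and 0x5 as an I2C read with the
transaction left open.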

Signed-off-by: KONRAD Frederic 
Tested-By: Hyun Kwon 
Reviewed-by: Alistair Francis 
---
 default-configs/aarch64-softmmu.mak |   1 +
 hw/misc/Makefile.objs   |   1 +
 hw/misc/aux.c   | 292 
 include/hw/misc/aux.h   | 128 
 4 files changed, 422 insertions(+)
 create mode 100644 hw/misc/aux.c
 create mode 100644 include/hw/misc/aux.h

diff --git a/default-configs/aarch64-softmmu.mak b/default-configs/aarch64-softmmu.mak
index 96dd994..d3a2665 100644
--- a/default-configs/aarch64-softmmu.mak
+++ b/default-configs/aarch64-softmmu.mak
@@ -3,4 +3,5 @@
 # We support all the 32 bit boards so need all their config
 include arm-softmmu.mak
 
+CONFIG_AUX=y
 CONFIG_XLNX_ZYNQMP=y
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index bc0dd2c..ffb49c1 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -51,3 +51,4 @@ obj-$(CONFIG_MIPS_ITU) += mips_itu.o
 obj-$(CONFIG_PVPANIC) += pvpanic.o
 obj-$(CONFIG_EDU) += edu.o
 obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
+obj-$(CONFIG_AUX) += aux.o
diff --git a/hw/misc/aux.c b/hw/misc/aux.c
new file mode 100644
index 000..25d7712
--- /dev/null
+++ b/hw/misc/aux.c
@@ -0,0 +1,292 @@
+/*
+ * aux.c
+ *
+ *  Copyright 2015 : GreenSocs Ltd
+ *  http://www.greensocs.com/ , email: i...@greensocs.com
+ *
+ *  Developed by :
+ *  Frederic Konrad   
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option)any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ *
+ */
+
+/*
+ * This is an implementation of the AUX bus for VESA Display Port v1.1a.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/misc/aux.h"
+#include "hw/i2c/i2c.h"
+#include "monitor/monitor.h"
+
+#ifndef DEBUG_AUX
+#define DEBUG_AUX 0
+#endif
+
+#define DPRINTF(fmt, ...) do { \
+if (DEBUG_AUX) { \
+qemu_log("aux: " fmt , ## __VA_ARGS__); \
+} \
+} while (0);
+
+#define TYPE_AUXTOI2C "aux-to-i2c-bridge"
+#define AUXTOI2C(obj) OBJECT_CHECK(AUXTOI2CState, (obj), TYPE_AUXTOI2C)
+
+static void aux_slave_dev_print(Monitor *mon, DeviceState *dev, int indent);
+static inline I2CBus *aux_bridge_get_i2c_bus(AUXTOI2CState *bridge);
+
+/* aux-bus implementation (internal not public) */
+static void aux_bus_class_init(ObjectClass *klass, void *data)
+{
+BusClass *k = BUS_CLASS(klass);
+
+/* AUXSlave has an MMIO so we need to change the way we print information
+ * in monitor.
+ */
+k->print_dev = aux_slave_dev_print;
+}
+
+AUXBus *aux_init_bus(DeviceState *parent, const char *name)
+{
+AUXBus *bus;
+
+bus = AUX_BUS(qbus_create(TYPE_AUX_BUS, parent, name));
+bus->bridge = AUXTOI2C(qdev_create(BUS(bus), TYPE_AUXTOI2C));
+
+/* Memory related. */
+bus->aux_io = g_malloc(sizeof(*bus->aux_io));
+memory_region_init(bus->aux_io, OBJECT(bus), "aux-io", (1 << 20));
+address_space_init(&bus->aux_addr_space, bus->aux_io, "aux-io");
+return bus;
+}
+
+static void aux_bus_map_device(AUXBus *bus, AUXSlave *dev, hwaddr addr)
+{
+memory_region_add_subregion(bus->aux_io, addr, dev->mmio);
+}
+
+static bool aux_bus_is_bridge(AUXBus *bus, DeviceState *dev)
+{
+return (dev == DEVICE(bus->bridge));
+}
+
+I2CBus *aux_get_i2c_bus(AUXBus *bus)
+{
+return aux_bridge_get_i2c_bus(bus->bridge);
+}
+
+AUXReply aux_request(AUXBus *bus, AUXCommand cmd, uint32_t address,
+  uint8_t len, uint8_t *data)
+{
+AUXReply ret = AUX_NACK;
+I2CBus *i2c_bus = aux_get_i2c_bus(bus);
+size_t i;
+bool is_write = false;
+
+DPRINTF("request at address 0x%" PRIX32 ", command %u, len %u\n", address,
+cmd, len);
+
+switch (cmd) {
+/*
+ * Forward the request on the AUX bus..
+ */
+case WRITE_AUX:
+case READ_AUX:
+is_write = cmd == READ_AUX ? false : true;
+for (i = 0; i < len; i++) {
+if 

Re: [Qemu-devel] [PATCH 04/10] target-i386: cpu: use cpu_generic_init() in cpu_x86_init()

2016-06-07 Thread Eduardo Habkost
On Mon, Jun 06, 2016 at 05:16:46PM +0200, Igor Mammedov wrote:
> now cpu_x86_init() does nothing more or less
> than duplicating cpu_generic_init() logic.
> So simplify it by using cpu_generic_init().
> 
> Signed-off-by: Igor Mammedov 
> Reviewed-by: Eduardo Habkost 

Applied to x86-next. Thanks.

-- 
Eduardo



Re: [Qemu-devel] [PATCH 03/10] target-i386: cpu: move xcc->kvm_required check to realize time

2016-06-07 Thread Eduardo Habkost
On Mon, Jun 06, 2016 at 05:16:45PM +0200, Igor Mammedov wrote:
> It will allow dropping the custom cpu_x86_init() and using
> cpu_generic_init() instead, reducing cpu_x86_create()
> to a simple 3-liner.
> 
> Signed-off-by: Igor Mammedov 
> Eduardo Habkost 

Applied to x86-next. Thanks.

-- 
Eduardo



Re: [Qemu-devel] [PATCH 02/10] target-i386: cpu: move features logic that requires CPUState to realize time

2016-06-07 Thread Eduardo Habkost
On Mon, Jun 06, 2016 at 05:16:44PM +0200, Igor Mammedov wrote:
> Making x86_cpu_parse_featurestr() a pure converter of the
> legacy feature string into global properties requires
> it to be called before a CPU instance is created, so the
> parser shouldn't modify CPUState directly or access
> it at all. Hence move the current hack that directly pokes
> into CPUState, to set/unset +-feats, from the parser to
> the CPU's realize method.
> 
> Signed-off-by: Igor Mammedov 

Reviewed-by: Eduardo Habkost 

I will just edit a comment below, when applying, for clarity:

[...]
> +/* TODO: convert plus_features & minus_features static vars
> + * to global properties, once broken host_features is fixed
> + */

I will rewrite this to:

/* TODO: cpu->host_features incorrectly overwrites features
 * set using "feat=on|off". Once we fix this, we can convert
 * plus_features & minus_features to global properties
 * inside x86_cpu_parse_featurestr() too.
 */

> +if (cpu->host_features) {
> +for (w = 0; w < FEATURE_WORDS; w++) {
> +env->features[w] =
> +x86_cpu_get_supported_feature_word(w, cpu->migratable);
> +}
> +}
> +
> +for (w = 0; w < FEATURE_WORDS; w++) {
> +cpu->env.features[w] |= plus_features[w];
> +cpu->env.features[w] &= ~minus_features[w];
> +}
> +
>  if (env->features[FEAT_7_0_EBX] && env->cpuid_level < 7) {
>  env->cpuid_level = 7;
>  }
> -- 
> 1.8.3.1
> 

-- 
Eduardo



Re: [Qemu-devel] [PATCH 03/10] target-i386: cpu: move xcc->kvm_required check to realize time

2016-06-07 Thread Eduardo Habkost
On Mon, Jun 06, 2016 at 05:16:45PM +0200, Igor Mammedov wrote:
> It will allow dropping the custom cpu_x86_init() and using
> cpu_generic_init() instead, reducing cpu_x86_create()
> to a simple 3-liner.
> 
> Signed-off-by: Igor Mammedov 
> Eduardo Habkost 

The "Reviewed-by: " prefix is missing, but this can be fixed when
applying the patch.

-- 
Eduardo



Re: [Qemu-devel] [PATCH V8 4/9] introduce aux-bus

2016-06-07 Thread Alistair Francis
On Tue, Jun 7, 2016 at 12:02 AM, KONRAD Frederic
 wrote:
>
>
> Le 06/06/2016 à 20:41, Alistair Francis a écrit :
>>
>> On Mon, Jun 6, 2016 at 7:21 AM,   wrote:
>>>
>>> From: KONRAD Frederic 
>>>
>>> This introduces a new bus: aux-bus.
>>>
>>> It contains an address space for aux slave devices and a bridge to an
>>> I2C bus
>>> for I2C through AUX transactions.
>>>
>>> Signed-off-by: KONRAD Frederic 
>>> Tested-By: Hyun Kwon 
>>> ---
>>>  default-configs/aarch64-softmmu.mak |   1 +
>>>  hw/misc/Makefile.objs   |   1 +
>>>  hw/misc/aux.c   | 297
>>> 
>>>  include/hw/misc/aux.h   | 125 +++
>>>  4 files changed, 424 insertions(+)
>>>  create mode 100644 hw/misc/aux.c
>>>  create mode 100644 include/hw/misc/aux.h
>>>
>>> diff --git a/default-configs/aarch64-softmmu.mak
>>> b/default-configs/aarch64-softmmu.mak
>>> index 96dd994..d3a2665 100644
>>> --- a/default-configs/aarch64-softmmu.mak
>>> +++ b/default-configs/aarch64-softmmu.mak
>>> @@ -3,4 +3,5 @@
>>>  # We support all the 32 bit boards so need all their config
>>>  include arm-softmmu.mak
>>>
>>> +CONFIG_AUX=y
>>>  CONFIG_XLNX_ZYNQMP=y
>>> diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
>>> index bc0dd2c..ffb49c1 100644
>>> --- a/hw/misc/Makefile.objs
>>> +++ b/hw/misc/Makefile.objs
>>> @@ -51,3 +51,4 @@ obj-$(CONFIG_MIPS_ITU) += mips_itu.o
>>>  obj-$(CONFIG_PVPANIC) += pvpanic.o
>>>  obj-$(CONFIG_EDU) += edu.o
>>>  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
>>> +obj-$(CONFIG_AUX) += aux.o
>>> diff --git a/hw/misc/aux.c b/hw/misc/aux.c
>>> new file mode 100644
>>> index 000..6605224
>>> --- /dev/null
>>> +++ b/hw/misc/aux.c
>>> @@ -0,0 +1,297 @@
>>> +/*
>>> + * aux.c
>>> + *
>>> + *  Copyright 2015 : GreenSocs Ltd
>>> + *  http://www.greensocs.com/ , email: i...@greensocs.com
>>> + *
>>> + *  Developed by :
>>> + *  Frederic Konrad   
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation, either version 2 of the License, or
>>> + * (at your option)any later version.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> along
>>> + * with this program; if not, see .
>>> + *
>>> + */
>>> +
>>> +/*
>>> + * This is an implementation of the AUX bus for VESA Display Port v1.1a.
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qemu/log.h"
>>> +#include "hw/misc/aux.h"
>>> +#include "hw/i2c/i2c.h"
>>> +#include "monitor/monitor.h"
>>> +
>>> +#ifndef DEBUG_AUX
>>> +#define DEBUG_AUX 0
>>> +#endif
>>> +
>>> +#define DPRINTF(fmt, ...) do { \
>>> +if (DEBUG_AUX) { \
>>> +qemu_log("aux: " fmt , ## __VA_ARGS__); \
>>> +} \
>>> +} while (0);
>>> +
>>> +#define TYPE_AUXTOI2C "aux-to-i2c-bridge"
>>> +#define AUXTOI2C(obj) OBJECT_CHECK(AUXTOI2CState, (obj), TYPE_AUXTOI2C)
>>> +
>>> +#define TYPE_AUX_BUS "aux-bus"
>>> +#define AUX_BUS(obj) OBJECT_CHECK(AUXBus, (obj), TYPE_AUX_BUS)
>>
>>
>> This should be in the header file where the struct is.
>
>
> Ok
>
>>
>>> +
>>> +static void aux_slave_dev_print(Monitor *mon, DeviceState *dev, int
>>> indent);
>>> +static inline I2CBus *aux_bridge_get_i2c_bus(AUXTOI2CState *bridge);
>>> +
>>> +/* aux-bus implementation (internal not public) */
>>> +static void aux_bus_class_init(ObjectClass *klass, void *data)
>>> +{
>>> +BusClass *k = BUS_CLASS(klass);
>>> +
>>> +/* AUXSlave has an MMIO so we need to change the way we print
>>> information
>>> + * in monitor.
>>> + */
>>> +k->print_dev = aux_slave_dev_print;
>>> +}
>>> +
>>> +AUXBus *aux_init_bus(DeviceState *parent, const char *name)
>>> +{
>>> +AUXBus *bus;
>>> +
>>> +bus = AUX_BUS(qbus_create(TYPE_AUX_BUS, parent, name));
>>> +bus->bridge = AUXTOI2C(qdev_create(BUS(bus), TYPE_AUXTOI2C));
>>> +
>>> +/* Memory related. */
>>> +bus->aux_io = g_malloc(sizeof(*bus->aux_io));
>>> +memory_region_init(bus->aux_io, OBJECT(bus), "aux-io", (1 << 20));
>>> +address_space_init(&bus->aux_addr_space, bus->aux_io, "aux-io");
>>> +return bus;
>>> +}
>>> +
>>> +static void aux_bus_map_device(AUXBus *bus, AUXSlave *dev, hwaddr addr)
>>> +{
>>> +memory_region_add_subregion(bus->aux_io, addr, dev->mmio);
>>> +}
>>> +
>>> +static bool aux_bus_is_bridge(AUXBus *bus, DeviceState *dev)
>>> +{
>>> +return (dev == DEVICE(bus->bridge));
>>> +}
>>> +
>>> +I2CBus 

Re: [Qemu-devel] [PATCH v3 2/2] target-i386: add migration support for Intel LMCE

2016-06-07 Thread Eduardo Habkost
On Fri, Jun 03, 2016 at 02:09:44PM +0800, Haozhong Zhang wrote:
> LMCE is disabled by default, but a cpu option 'lmce=on/off' is provided
> to enable/disable it. Migration is only allowed between VCPUs with the
> same lmce option.
> 
> Signed-off-by: Haozhong Zhang 
> ---
> Cc: "Michael S. Tsirkin" 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> Cc: Boris Petkov 
> Cc: Tony Luck 
> Cc: Andi Kleen 
> Cc: Ashok Raj 
> ---
>  include/hw/i386/pc.h  |  7 ++-
>  target-i386/cpu.c |  1 +
>  target-i386/cpu.h |  5 +
>  target-i386/machine.c | 24 
>  4 files changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index ca23609..058eef9 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -357,7 +357,12 @@ int e820_get_num_entries(void);
>  bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
>  
>  #define PC_COMPAT_2_6 \
> -HW_COMPAT_2_6
> +HW_COMPAT_2_6 \
> +{\
> +.driver   = TYPE_X86_CPU,\
> +.property = "lmce",\
> +.value= "off",\
> +},

You don't need this if lmce is disabled by default.

>  
>  #define PC_COMPAT_2_5 \
>  PC_COMPAT_2_6 \
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 9b4dbab..c69cc17 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -3232,6 +3232,7 @@ static Property x86_cpu_properties[] = {
>  DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
>  DEFINE_PROP_UINT32("xlevel2", X86CPU, env.cpuid_xlevel2, 0),
>  DEFINE_PROP_STRING("hv-vendor-id", X86CPU, hyperv_vendor_id),
> +DEFINE_PROP_BOOL("lmce", X86CPU, enable_lmce, false),

Maybe this belongs in patch 1/2?

>  DEFINE_PROP_END_OF_LIST()
>  };
>  
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 2d411ba..b512fd6 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -1182,6 +1182,11 @@ struct X86CPU {
>   */
>  bool enable_pmu;
>  
> +/* Enable LMCE support which is set via cpu option 'lmce=on/off'. LMCE is
> + * disabled by default to avoid breaking the migration between QEMU with
> + * different LMCE support. Only migrating between QEMU with the same LMCE
> + * support is allowed.
> + */
>  bool enable_lmce;
>  
>  /* in order to simplify APIC support, we leave this pointer to the
> diff --git a/target-i386/machine.c b/target-i386/machine.c
> index cb9adf2..b55d376 100644
> --- a/target-i386/machine.c
> +++ b/target-i386/machine.c
> @@ -347,6 +347,11 @@ static int cpu_post_load(void *opaque, int version_id)
>  return -EINVAL;
>  }
>  
> +if (!cpu->enable_lmce && (env->mcg_cap & MCG_LMCE_P)) {
> +error_report("LMCE not enabled");
> +return -EINVAL;
> +}

Nice. But the error message could be clearer, to indicate that it
is about command-line configuration not being the same on both
sides. What about something like:
  config mismatch: VCPU has LMCE enabled, but "lmce" option is disabled

> +
>  /*
>   * Real mode guest segments register DPL should be zero.
>   * Older KVM version were setting it wrongly.
> @@ -896,6 +901,24 @@ static const VMStateDescription vmstate_tsc_khz = {
>  }
>  };
>  
> +static bool mcg_ext_ctl_needed(void *opaque)
> +{
> +X86CPU *cpu = opaque;
> +CPUX86State *env = &cpu->env;
> +return cpu->enable_lmce && env->mcg_ext_ctl;
> +}
> +
> +static const VMStateDescription vmstate_mcg_ext_ctl = {
> +.name = "cpu/mcg_ext_ctl",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = mcg_ext_ctl_needed,
> +.fields = (VMStateField[]) {
> +VMSTATE_UINT64(env.mcg_ext_ctl, X86CPU),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
>  VMStateDescription vmstate_x86_cpu = {
>  .name = "cpu",
>  .version_id = 12,
> @@ -1022,6 +1045,7 @@ VMStateDescription vmstate_x86_cpu = {
>  #ifdef TARGET_X86_64
>  &vmstate_pkru,
>  #endif
> +&vmstate_mcg_ext_ctl,
>  NULL
>  }
>  };
> -- 
> 2.8.3
> 

-- 
Eduardo



Re: [Qemu-devel] [PATCH v3 1/2] target-i386: KVM: add basic Intel LMCE support

2016-06-07 Thread Eduardo Habkost
On Fri, Jun 03, 2016 at 02:09:43PM +0800, Haozhong Zhang wrote:
[...]
> +
> +if (cpu->enable_lmce) {
> +if (lmce_supported()) {
> +cenv->mcg_cap |= MCG_LMCE_P;
> +cenv->msr_ia32_feature_control |=
> +MSR_IA32_FEATURE_CONTROL_LMCE |
> +MSR_IA32_FEATURE_CONTROL_LOCKED;
> +} else {
> +error_report("Warning: KVM unavailable or not support LMCE, "
> + "LMCE disabled");
> +cpu->enable_lmce = false;

Please don't do that. If the user explicitly asked for LMCE, you
should refuse to start if the host doesn't have the required
capabilities.


> +}
> +}
> +
>  cenv->mcg_ctl = ~(uint64_t)0;
>  for (bank = 0; bank < MCE_BANKS_DEF; bank++) {
>  cenv->mce_banks[bank * 4] = ~(uint64_t)0;
[...]

-- 
Eduardo
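
The request above (fail instead of silently turning LMCE off) can be sketched roughly as follows. This is a standalone illustration, not the code from the patch: struct sketch_vcpu, sketch_setup_lmce() and the host_has_lmce parameter are hypothetical stand-ins, and the MCG_LMCE_P bit position (bit 27 of MCG_CAP) is stated here as an assumption.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed bit position of the LMCE-present flag in MCG_CAP. */
#define MCG_LMCE_P (1ULL << 27)

/* Hypothetical stand-in for the VCPU state touched by the patch. */
struct sketch_vcpu {
    bool enable_lmce;            /* set from the "lmce=on|off" CPU option */
    unsigned long long mcg_cap;  /* machine-check capability bits */
};

/*
 * Returns 0 on success and -1 if the user explicitly asked for LMCE
 * but the host cannot provide it. The caller is expected to refuse to
 * start on -1; the option itself is left untouched in that case.
 */
static int sketch_setup_lmce(struct sketch_vcpu *cpu, bool host_has_lmce)
{
    if (!cpu->enable_lmce) {
        return 0;                /* nothing requested, nothing to do */
    }
    if (!host_has_lmce) {
        fprintf(stderr, "lmce=on requested, but the host lacks LMCE support\n");
        return -1;
    }
    cpu->mcg_cap |= MCG_LMCE_P;
    return 0;
}
```

The point of returning an error rather than clearing enable_lmce is that the guest-visible configuration never differs from what was asked for on the command line.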



Re: [Qemu-devel] [PATCH 15/18] linux-user: Correct signedness of target_flock l_start and l_len fields

2016-06-07 Thread Laurent Vivier


Le 06/06/2016 à 20:58, Peter Maydell a écrit :
> The l_start and l_len fields in the various target_flock structures are
> supposed to be '__kernel_off_t' or '__kernel_loff_t', which means they
> should be signed, not unsigned. Correcting the structure definitions means
> that __get_user() and __put_user() will correctly sign extend them if
> the guest is using 32 bit offsets and the host is using 64 bit offsets.
> 
> This fixes failures in the LTP 'fcntl14' tests where it checks that
> negative seek offsets work correctly.
> 
> We reindent the structures to drop hard tabs since we're touching 40%
> of the fields anyway.
> 
> Signed-off-by: Peter Maydell 
> ---
>  linux-user/syscall_defs.h | 34 +-
>  1 file changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
> index 124754f..8a801e0 100644
> --- a/linux-user/syscall_defs.h
> +++ b/linux-user/syscall_defs.h
> @@ -2289,34 +2289,34 @@ struct target_statfs64 {
>  #endif
>  
>  struct target_flock {
> - short l_type;
> - short l_whence;
> - abi_ulong l_start;
> - abi_ulong l_len;
> - int l_pid;
> +short l_type;
> +short l_whence;
> +abi_long l_start;
> +abi_long l_len;
> +int l_pid;
>  };
>  
>  struct target_flock64 {
> - short  l_type;
> - short  l_whence;
> +short  l_type;
> +short  l_whence;
>  #if defined(TARGET_PPC) || defined(TARGET_X86_64) || defined(TARGET_MIPS) \
>  || defined(TARGET_SPARC) || defined(TARGET_HPPA) \
>  || defined(TARGET_MICROBLAZE) || defined(TARGET_TILEGX)
> -int __pad;
> +int __pad;
>  #endif
> - unsigned long long l_start;
> - unsigned long long l_len;
> - int  l_pid;
> +long long l_start;
> +long long l_len;

To be correct, they should be abi_llong.

> +int  l_pid;
>  } QEMU_PACKED;
>  
>  #ifdef TARGET_ARM
>  struct target_eabi_flock64 {
> - short  l_type;
> - short  l_whence;
> -int __pad;
> - unsigned long long l_start;
> - unsigned long long l_len;
> - int  l_pid;
> +short  l_type;
> +short  l_whence;
> +int __pad;
> +long long l_start;
> +long long l_len;

abi_llong

> +int  l_pid;
>  } QEMU_PACKED;
>  #endif
>  
> 
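
The sign-extension point in the commit message can be illustrated with a small standalone sketch. The helper names below are hypothetical and this is not how __get_user() is implemented; it only shows why the signedness of the 32-bit field type matters once the value is widened to a 64-bit host offset.

```c
#include <stdint.h>

/*
 * guest_bits is the raw 32-bit l_start/l_len value as stored by a
 * 32-bit guest. Treating the field as unsigned zero-extends it.
 */
static int64_t sketch_widen_unsigned(uint32_t guest_bits)
{
    return (int64_t)guest_bits;
}

/*
 * Treating the field as signed sign-extends it, so a negative guest
 * offset stays negative on a 64-bit host. (The uint32_t to int32_t
 * conversion assumes the usual two's complement behaviour.)
 */
static int64_t sketch_widen_signed(uint32_t guest_bits)
{
    return (int64_t)(int32_t)guest_bits;
}
```

With a guest offset of -10 (bit pattern 0xFFFFFFF6), the unsigned path produces a large positive offset while the signed path keeps -10, which is the behaviour a negative-offset test depends on.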



Re: [Qemu-devel] [Qemu-block] [PATCH] Report error when opening device with locked tray

2016-06-07 Thread John Snow


On 06/07/2016 06:28 AM, Kevin Wolf wrote:
> Am 06.06.2016 um 21:40 hat Colin Lord geschrieben:
>> This commit causes qmp_blockdev_change_medium to report an error if an
>> attempt is made to open a device with a locked tray.
> 
> The old behaviour is that the command seemingly succeeds, but the medium
> isn't actually changed. Correct?
> 

Close. Old "change" command also fails, but with a confusing error.

> Should this be mentioned in the commit message? You just describe what
> you change, but not why.
> 

Old behavior:

- Change uses qmp_blockdev_open_tray, which "succeeds."
- Change then tries to use qmp_x_blockdev_remove_medium, but receives
potentially confusing error "Tray is locked."
- Moments later, the tray is likely now open.

New behavior:

- Change uses do_open_tray, which returns -EINPROGRESS.
- Change can propagate this error upwards without attempting to remove
the medium.
- User gets "Device  is locked and force was not specified, wait
for tray to open and try again" error.

Why: "The new error tries to inform the user that there is an action
pending and that the command, if run again, may succeed."

>> Signed-off-by: Colin Lord 
>> This is based off my previous patch regarding the do_open_tray function
>> (currently at v3). Probably should have been submitted as a patch set
>> but I wasn't thinking that far ahead when I submitted the first patch.
>> ---
>>  blockdev.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> Yes, would probably have made sense as a series, but as long as it's
> only two patches, it's not really a problem.
> 
> Please make sure to put such comments below the "---" line, though, i.e.
> comments that make sense for the review, but not as part of the commit
> log. Then git-am automatically removes that part from the commit message
> while applying the patch. I did it manually for this one now.
> 
> Kevin
> 

-- 
—js
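
The new behavior described above can be sketched as below. This is a simplified standalone model, not the blockdev.c code: struct sketch_drive and both functions are hypothetical, the eject request sent to the guest is omitted, and the error text is paraphrased.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a removable-media drive. */
struct sketch_drive {
    bool tray_locked;
    bool tray_open;
    int medium;
};

/* Models do_open_tray(): a locked tray without force is "in progress". */
static int sketch_open_tray(struct sketch_drive *d, bool force)
{
    if (d->tray_locked && !force) {
        return -EINPROGRESS;
    }
    d->tray_open = true;
    return 0;
}

/*
 * Models the change-medium path: the -EINPROGRESS result is reported
 * to the caller right away, and no attempt is made to remove the
 * medium from a tray that has not opened yet.
 */
static int sketch_change_medium(struct sketch_drive *d, int new_medium,
                                bool force, const char **errmsg)
{
    int rc = sketch_open_tray(d, force);

    *errmsg = NULL;
    if (rc == -EINPROGRESS) {
        *errmsg = "device is locked and force was not specified; "
                  "wait for the tray to open and try again";
        return rc;
    }
    d->medium = new_medium;
    d->tray_open = false;
    return 0;
}
```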



Re: [Qemu-devel] [PATCH 17/18] linux-user: Make target_strerror() return 'const char *'

2016-06-07 Thread Laurent Vivier


Le 06/06/2016 à 20:58, Peter Maydell a écrit :
> Make target_strerror() return 'const char *' rather than just 'char *';
> this will allow us to return constant strings from it for some special
> cases.
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Laurent Vivier 

> ---
>  linux-user/qemu.h| 2 +-
>  linux-user/strace.c  | 4 ++--
>  linux-user/syscall.c | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/linux-user/qemu.h b/linux-user/qemu.h
> index 6bd7b32..56f29c3 100644
> --- a/linux-user/qemu.h
> +++ b/linux-user/qemu.h
> @@ -195,7 +195,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
>  void gemu_log(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
>  extern THREAD CPUState *thread_cpu;
>  void cpu_loop(CPUArchState *env);
> -char *target_strerror(int err);
> +const char *target_strerror(int err);
>  int get_osversion(void);
>  void init_qemu_uname_release(void);
>  void fork_start(void);
> diff --git a/linux-user/strace.c b/linux-user/strace.c
> index 0810c85..c5980a1 100644
> --- a/linux-user/strace.c
> +++ b/linux-user/strace.c
> @@ -281,7 +281,7 @@ print_ipc(const struct syscallname *name,
>  static void
>  print_syscall_ret_addr(const struct syscallname *name, abi_long ret)
>  {
> -char *errstr = NULL;
> +const char *errstr = NULL;
>  
>  if (ret < 0) {
>  errstr = target_strerror(-ret);
> @@ -1594,7 +1594,7 @@ void
>  print_syscall_ret(int num, abi_long ret)
>  {
>  int i;
> -char *errstr = NULL;
> +const char *errstr = NULL;
>  
>  for(i=0;i<nsyscalls;i++)
>  if( scnames[i].nr == num ) {
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 249d246..bcee02d 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -617,7 +617,7 @@ static inline int is_error(abi_long ret)
>  return (abi_ulong)ret >= (abi_ulong)(-4096);
>  }
>  
> -char *target_strerror(int err)
> +const char *target_strerror(int err)
>  {
>  if ((err >= ERRNO_TABLE_SIZE) || (err < 0)) {
>  return NULL;
> 



Re: [Qemu-devel] [PATCH 18/18] linux-user: Special-case ERESTARTSYS in target_strerror()

2016-06-07 Thread Laurent Vivier


Le 06/06/2016 à 20:58, Peter Maydell a écrit :
> Since TARGET_ERESTARTSYS and TARGET_ESIGRETURN are internal-to-QEMU
> error numbers, handle them specially in target_strerror(), to avoid
> confusing strace output like:
> 
> 9521 rt_sigreturn(14,8,274886297808,8,0,268435456) = -1 errno=513 (Unknown 
> error 513)
> 
> Signed-off-by: Peter Maydell 
> ---
>  linux-user/syscall.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index bcee02d..782d475 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -619,6 +619,13 @@ static inline int is_error(abi_long ret)
>  
>  const char *target_strerror(int err)
>  {
> +if (err == TARGET_ERESTARTSYS) {
> +return "To be restarted";
> +}
> +if (err == TARGET_QEMU_ESIGRETURN) {
> +return "Successful exit from sigreturn";
> +}
> +
>  if ((err >= ERRNO_TABLE_SIZE) || (err < 0)) {
>  return NULL;
>  }

This is not the aim of this patch, but target_to_host_errno() now has
these checks, so perhaps we can remove the range check here while we are at it...

Laurent
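
For reference, the special-casing in this patch, together with the 'const char *' return type from 17/18, can be modelled in a standalone way like this. The SKETCH_* values are assumptions (513 is taken from the strace output quoted above, 512 and the table size are guesses for illustration), and the real function also converts the target errno to a host errno before calling strerror(), which is skipped here.

```c
#include <string.h>

/* Assumed values, for illustration only. */
#define SKETCH_ERESTARTSYS      512
#define SKETCH_ESIGRETURN       513
#define SKETCH_ERRNO_TABLE_SIZE 1200

/*
 * Returning 'const char *' is what makes it legal to hand back string
 * literals for the internal error numbers.
 */
static const char *sketch_strerror(int err)
{
    if (err == SKETCH_ERESTARTSYS) {
        return "To be restarted";
    }
    if (err == SKETCH_ESIGRETURN) {
        return "Successful exit from sigreturn";
    }
    if (err >= SKETCH_ERRNO_TABLE_SIZE || err < 0) {
        return NULL;
    }
    return strerror(err);
}
```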



Re: [Qemu-devel] [PATCH v2 19/19] linux-user: Avoid possible misalignment in target_to_host_siginfo()

2016-06-07 Thread Laurent Vivier


Le 27/05/2016 à 16:52, Peter Maydell a écrit :
> Reimplement target_to_host_siginfo() to use __get_user(), which
> handles possibly misaligned source guest structures correctly.
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Laurent Vivier 

> ---
>  linux-user/signal.c | 19 ---
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/linux-user/signal.c b/linux-user/signal.c
> index 7e2a80f..8417da7 100644
> --- a/linux-user/signal.c
> +++ b/linux-user/signal.c
> @@ -409,13 +409,18 @@ void host_to_target_siginfo(target_siginfo_t *tinfo, 
> const siginfo_t *info)
>  /* XXX: find a solution for 64 bit (additional malloced data is needed) */
>  void target_to_host_siginfo(siginfo_t *info, const target_siginfo_t *tinfo)
>  {
> -info->si_signo = tswap32(tinfo->si_signo);
> -info->si_errno = tswap32(tinfo->si_errno);
> -info->si_code = tswap32(tinfo->si_code);
> -info->si_pid = tswap32(tinfo->_sifields._rt._pid);
> -info->si_uid = tswap32(tinfo->_sifields._rt._uid);
> -info->si_value.sival_ptr =
> -(void *)(long)tswapal(tinfo->_sifields._rt._sigval.sival_ptr);
> +/* This conversion is used only for the rt_sigqueueinfo syscall,
> + * and so we know that the _rt fields are the valid ones.
> + */
> +abi_ulong sival_ptr;
> +
> +__get_user(info->si_signo, &tinfo->si_signo);
> +__get_user(info->si_errno, &tinfo->si_errno);
> +__get_user(info->si_code, &tinfo->si_code);
> +__get_user(info->si_pid, &tinfo->_sifields._rt._pid);
> +__get_user(info->si_uid, &tinfo->_sifields._rt._uid);
> +__get_user(sival_ptr, &tinfo->_sifields._rt._sigval.sival_ptr);
> +info->si_value.sival_ptr = (void *)(long)sival_ptr;
>  }
>  
>  static int fatal_signal (int sig)
> 



Re: [Qemu-devel] [PATCH v2 18/19] linux-user: Avoid possible misalignment in host_to_target_siginfo()

2016-06-07 Thread Laurent Vivier


Le 27/05/2016 à 16:52, Peter Maydell a écrit :
> host_to_target_siginfo() is implemented by a combination of
> host_to_target_siginfo_noswap() followed by tswap_siginfo().
> The first of these two functions assumes that the target_siginfo_t
> it is writing to is correctly aligned, but the pointer passed
> into host_to_target_siginfo() is directly from the guest and
> might be misaligned. Use a local variable to avoid this problem.
> (tswap_siginfo() does now correctly handle a misaligned destination.)

You mean the pointer from the guest can not be correctly aligned for the
guest?

Laurent
> 
> Signed-off-by: Peter Maydell 
> ---
>  linux-user/signal.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/linux-user/signal.c b/linux-user/signal.c
> index 8ea0cbf..7e2a80f 100644
> --- a/linux-user/signal.c
> +++ b/linux-user/signal.c
> @@ -400,8 +400,9 @@ static void tswap_siginfo(target_siginfo_t *tinfo,
>  
>  void host_to_target_siginfo(target_siginfo_t *tinfo, const siginfo_t *info)
>  {
> -host_to_target_siginfo_noswap(tinfo, info);
> -tswap_siginfo(tinfo, tinfo);
> +target_siginfo_t tgt_tmp;
> +host_to_target_siginfo_noswap(&tgt_tmp, info);
> +tswap_siginfo(tinfo, &tgt_tmp);
>  }
>  
>  /* XXX: we support only POSIX RT signals are used. */
> 
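
The "use a local variable" approach from the commit message can be sketched generically as below. The struct and function names are hypothetical, and memcpy() is swapped in for the final copy where the patch relies on tswap_siginfo() (which, per the commit message, now handles a misaligned destination).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a target-side structure. */
struct sketch_record {
    uint32_t signo;
    uint32_t code;
    uint64_t value;
};

/* Writes fields through a typed pointer, so r must be aligned. */
static void sketch_fill(struct sketch_record *r, uint32_t signo,
                        uint32_t code, uint64_t value)
{
    r->signo = signo;
    r->code = code;
    r->value = value;
}

/*
 * dst comes from the guest and may have any alignment. Fill an
 * aligned local first, then copy it out bytewise so no misaligned
 * typed store is ever performed on dst.
 */
static void sketch_write_unaligned(void *dst, uint32_t signo,
                                   uint32_t code, uint64_t value)
{
    struct sketch_record tmp;

    sketch_fill(&tmp, signo, code, value);
    memcpy(dst, &tmp, sizeof(tmp));
}
```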



Re: [Qemu-devel] [RFC] Allow AMD IOMMU to have both SysBusDevice and PCIDevice properties.

2016-06-07 Thread David Kiarie
On Tue, Jun 7, 2016 at 10:12 PM, Eduardo Habkost  wrote:
> Hi,

Hello,

>
> I didn't review the amd_iommu.c code, but there seems to be some
> unrelated changes in the patch:

Thanks for looking at this, but I actually wanted someone to look at
amd_iommu.c. I mentioned in the annotation that there are some
unrelated changes because this work is based on code that has not been
merged yet. I specifically sent this to get a review of amd_iommu.c:
not the details, but the design. I have a patchset that implements AMD
IOMMU (translation only) as a PCI device. It is, however, not possible
to work on interrupt remapping without converting AMD IOMMU from a PCI
device to a SysBusDevice. The device in this patch (AMD IOMMU), unlike
the one in previous patches, creates two devices: a PCI device and a
SysBusDevice, which I am not sure is acceptable.

>
> On Sun, Jun 05, 2016 at 07:54:33PM +0300, David Kiarie wrote:
>> Signed-off-by: David Kiarie 
>> ---
>>  hw/acpi/aml-build.c |2 +-
>>  hw/i386/amd_iommu.c | 1471 
>> +++
>>  hw/i386/amd_iommu.h |  348 ++
>>  hw/i386/kvm/pci-assign.c|2 +-
>>  hw/i386/pc_q35.c|1 +
>>  include/hw/acpi/acpi-defs.h |   13 +
>>  include/hw/acpi/aml-build.h |1 +
>>  include/hw/pci/pci.h|   10 +-
>>  qemu-options.hx |7 +-
>>  util/qemu-config.c  |8 +-
>>  10 files changed, 1853 insertions(+), 10 deletions(-)
>>  create mode 100644 hw/i386/amd_iommu.c
>>  create mode 100644 hw/i386/amd_iommu.h
>>
>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> index cedb74e..8d4bd01 100644
>> --- a/hw/acpi/aml-build.c
>> +++ b/hw/acpi/aml-build.c
>> @@ -227,7 +227,7 @@ static void build_extop_package(GArray *package, uint8_t 
>> op)
>>  build_prepend_byte(package, 0x5B); /* ExtOpPrefix */
>>  }
>>
>> -static void build_append_int_noprefix(GArray *table, uint64_t value, int 
>> size)
>> +void build_append_int_noprefix(GArray *table, uint64_t value, int size)
>
> Why this change?
>
>>  {
>>  int i;
>>
> [...]
>> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
>> index 04aae89..431eaed 100644
>> --- a/hw/i386/pc_q35.c
>> +++ b/hw/i386/pc_q35.c
>> @@ -281,6 +281,7 @@ static void pc_q35_machine_options(MachineClass *m)
>>  m->default_machine_opts = "firmware=bios-256k.bin";
>>  m->default_display = "std";
>>  m->no_floppy = 1;
>> +m->has_dynamic_sysbus = true;
>
> Why is this needed? Is it possible to do this change before
> adding the iommu code?  Can this be done in a separate patch that
> documents why it should be changed and why it is safe to set it
> to true?
>
>>  }
>>
> [...]
>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> index a30808b..ef0e8a6 100644
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -11,10 +11,11 @@
>>  #include "hw/pci/pcie.h"
>>
>>  /* PCI bus */
>> -
>> +#define PCI_DEVID(bus, devfn)   ((((uint16_t)(bus)) << 8) | (devfn))
>>  #define PCI_DEVFN(slot, func)   ((((slot) & 0x1f) << 3) | ((func) & 0x07))
>>  #define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
>>  #define PCI_FUNC(devfn) ((devfn) & 0x07)
>> +#define PCI_BUILD_BDF(bus, devfn) ((bus << 8) | (devfn))
>
> Missing parenthesis around (bus).
>
>>  #define PCI_SLOT_MAX 32
>>  #define PCI_FUNC_MAX 8
>>
>> @@ -328,7 +329,6 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>  int pci_add_capability2(PCIDevice *pdev, uint8_t cap_id,
>> uint8_t offset, uint8_t size,
>> Error **errp);
>> -
>
> Unrelated whitespace change.
>
>>  void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t 
>> cap_size);
>>
>>  uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
>> @@ -692,11 +692,13 @@ static inline uint32_t pci_config_size(const PCIDevice 
>> *d)
>>  return pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE : 
>> PCI_CONFIG_SPACE_SIZE;
>>  }
>>
>> -static inline uint16_t pci_requester_id(PCIDevice *dev)
>> +static inline uint16_t pci_get_bdf(PCIDevice *dev)
>>  {
>> -return (pci_bus_num(dev->bus) << 8) | dev->devfn;
>> +return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
>>  }
>>
>> +uint16_t pci_requester_id(PCIDevice *dev);
>> +
>
> Why is pci_requester_id() being kept? Where's its implementation?
>
> pci_requester_id() is still being used at:
>
>   hw/pci/msi.c:attrs.requester_id = pci_requester_id(dev);
>   hw/pci/pcie_aer.c:err.source_id = pci_requester_id(dev);
>
>
>>  /* DMA access functions */
>>  static inline AddressSpace *pci_get_address_space(PCIDevice *dev)
>>  {
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 9f33361..0aec287 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -38,7 +38,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>>  "kvm_shadow_mem=size of KVM shadow MMU\n"
>>  "   

Re: [Qemu-devel] [PATCH v2 17/19] linux-user: Use both si_code and si_signo when converting siginfo_t

2016-06-07 Thread Laurent Vivier


Le 27/05/2016 à 16:51, Peter Maydell a écrit :
> The siginfo_t struct includes a union. The correct way to identify
> which fields of the union are relevant is complicated, because we
> have to use a combination of the si_code and si_signo to figure out
> which of the union's members are valid.  (Within the host kernel it
> is always possible to tell, but the kernel carefully avoids giving
> userspace the high 16 bits of si_code, so we don't have the
> information to do this the easy way...) We therefore make our best
> guess, bearing in mind that a guest can spoof most of the si_codes
> via rt_sigqueueinfo() if it likes.  Once we have made our guess, we
> record it in the top 16 bits of the si_code, so that tswap_siginfo()
> later can use it.  tswap_siginfo() then strips these top bits out
> before writing si_code to the guest (sign-extending the lower bits).
> 
> This fixes a bug where fields were sometimes wrong; in particular
> the LTP kill10 test went into an infinite loop because its signal
> handler got a si_pid value of 0 rather than the pid of the sending
> process.
> 
> As part of this change, we switch to using __put_user() in the
> tswap_siginfo code which writes out the byteswapped values to
> the target memory, in case the target memory pointer is not
> sufficiently aligned for the host CPU's requirements.
> 
> Signed-off-by: Peter Maydell 
> ---
>  linux-user/signal.c   | 165 
> --
>  linux-user/syscall_defs.h |  15 +
>  2 files changed, 131 insertions(+), 49 deletions(-)
> 
> diff --git a/linux-user/signal.c b/linux-user/signal.c
> index b21d6bf..8ea0cbf 100644
> --- a/linux-user/signal.c
> +++ b/linux-user/signal.c
> @@ -17,6 +17,7 @@
>   *  along with this program; if not, see .
>   */
>  #include "qemu/osdep.h"
> +#include "qemu/bitops.h"
>  #include 
>  #include 
>  
> @@ -274,70 +275,129 @@ static inline void 
> host_to_target_siginfo_noswap(target_siginfo_t *tinfo,
>   const siginfo_t *info)
>  {
>  int sig = host_to_target_signal(info->si_signo);
> +int si_code = info->si_code;
> +int si_type;
>  tinfo->si_signo = sig;
>  tinfo->si_errno = 0;
>  tinfo->si_code = info->si_code;
>  
> -if (sig == TARGET_SIGILL || sig == TARGET_SIGFPE || sig == TARGET_SIGSEGV
> -|| sig == TARGET_SIGBUS || sig == TARGET_SIGTRAP) {
> -/* Should never come here, but who knows. The information for
> -   the target is irrelevant.  */
> -tinfo->_sifields._sigfault._addr = 0;
> -} else if (sig == TARGET_SIGIO) {
> -tinfo->_sifields._sigpoll._band = info->si_band;
> -tinfo->_sifields._sigpoll._fd = info->si_fd;
> -} else if (sig == TARGET_SIGCHLD) {
> -tinfo->_sifields._sigchld._pid = info->si_pid;
> -tinfo->_sifields._sigchld._uid = info->si_uid;
> -tinfo->_sifields._sigchld._status
> +/* This is awkward, because we have to use a combination of
> + * the si_code and si_signo to figure out which of the union's
> + * members are valid. (Within the host kernel it is always possible
> + * to tell, but the kernel carefully avoids giving userspace the
> + * high 16 bits of si_code, so we don't have the information to
> + * do this the easy way...) We therefore make our best guess,
> + * bearing in mind that a guest can spoof most of the si_codes
> + * via rt_sigqueueinfo() if it likes.
> + *
> + * Once we have made our guess, we record it in the top 16 bits of
> + * the si_code, so that tswap_siginfo() later can use it.
> + * tswap_siginfo() will strip these top bits out before writing
> + * si_code to the guest (sign-extending the lower bits).
> + */
> +
> +switch (si_code) {
> +case SI_USER:
> +case SI_TKILL:
> +case SI_KERNEL:
> +/* Sent via kill(), tkill() or tgkill(), or direct from the kernel.
> + * These are the only unspoofable si_code values.
> + */
> +tinfo->_sifields._kill._pid = info->si_pid;
> +tinfo->_sifields._kill._uid = info->si_uid;
> +si_type = QEMU_SI_KILL;
> +break;
> +default:
> +/* Everything else is spoofable. Make best guess based on signal */
> +switch (sig) {
> +case TARGET_SIGCHLD:
> +tinfo->_sifields._sigchld._pid = info->si_pid;
> +tinfo->_sifields._sigchld._uid = info->si_uid;
> +tinfo->_sifields._sigchld._status
>  = host_to_target_waitstatus(info->si_status);
> -tinfo->_sifields._sigchld._utime = info->si_utime;
> -tinfo->_sifields._sigchld._stime = info->si_stime;
> -} else if (sig >= TARGET_SIGRTMIN) {
> -tinfo->_sifields._rt._pid = info->si_pid;
> -tinfo->_sifields._rt._uid = info->si_uid;
> -/* XXX: potential problem if 64 bit */
> -

Re: [Qemu-devel] QEMU 2.7 release schedule?

2016-06-07 Thread Michael Roth
Quoting Peter Maydell (2016-06-07 13:45:06)
> On 7 June 2016 at 18:29, Michael Roth  wrote:
> > I think it is actually a bit shorter of a window this time. The last few
> > releases had around 2.5 to 3 months between n-1 release and hard freeze / rc0
> > for n+1, but the proposed date would be just around 2 months.
> 
> Yeah, it's a bit short because the late-breaking CVEs meant we
> didn't release 2.6 until about two weeks later than we planned.
> 
> If we want to have 2.5 months between n-1 and rc0, that would be
> something like
>  softfreeze 5 july
>  hardfreeze/rc0 26 july
>  rc1 2 august
>  rc2 9 august
>  rc3 16 august
>  release 22 august (before kvm forum) if we're lucky, or
>   30 august if we're not (more likely)
> 
> [these dates are all +2 weeks on the previous suggestion.]
> 
> > Being in late RC during KVM Forum also sounds like it could
> > be productive, but I'm not sure I'd want to be in that position
> > if I was Peter...
> 
> From my POV the rc3-to-rc4 stage is not that much work, but
> it's hard to predict who might be the person with the last-minute
> required fix (which is usually why we end up with about a week
> of slip over the theoretical schedule). The tree is not supposed
> to change at that point. I can do the rc/release cutting mechanics
> remotely (assuming no disasters like stolen laptops etc); how about
> your part with the tarballs? Otherwise we can just do it either
> before or after the conference depending on how it goes.

Same for me, should be able to kick off everything remotely.

> 
> But I think the real problem with a schedule which expects a
> release at the tail end of August is that we then only have
> three and a half months left til mid-December which is in
> practice the latest we want to do a release given holidays.
> So we can only avoid the short dev period this time round by
> having a short one next time instead.
> 
> Maybe we could have +1 week rather than +0 or +2 (so softfreeze
> 28 June, rc0 19 July, release 16 August), as you suggest. That's
> currently feeling like the best compromise to me.

Yah, I think I agree. We'll have to make up the 2 weeks lost at some
point, but spreading it out avoids us finding ourselves in a similar
situation next release. It also seems like there may be more work
being targeted for 2.8.

> 
> thanks
> -- PMM
> 




Re: [Qemu-devel] [RFC] Allow AMD IOMMU to have both SysBusDevice and PCIDevice properties.

2016-06-07 Thread Eduardo Habkost
Hi,

I didn't review the amd_iommu.c code, but there seem to be some
unrelated changes in the patch:

On Sun, Jun 05, 2016 at 07:54:33PM +0300, David Kiarie wrote:
> Signed-off-by: David Kiarie 
> ---
>  hw/acpi/aml-build.c |2 +-
>  hw/i386/amd_iommu.c | 1471 
> +++
>  hw/i386/amd_iommu.h |  348 ++
>  hw/i386/kvm/pci-assign.c|2 +-
>  hw/i386/pc_q35.c|1 +
>  include/hw/acpi/acpi-defs.h |   13 +
>  include/hw/acpi/aml-build.h |1 +
>  include/hw/pci/pci.h|   10 +-
>  qemu-options.hx |7 +-
>  util/qemu-config.c  |8 +-
>  10 files changed, 1853 insertions(+), 10 deletions(-)
>  create mode 100644 hw/i386/amd_iommu.c
>  create mode 100644 hw/i386/amd_iommu.h
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index cedb74e..8d4bd01 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -227,7 +227,7 @@ static void build_extop_package(GArray *package, uint8_t 
> op)
>  build_prepend_byte(package, 0x5B); /* ExtOpPrefix */
>  }
>  
> -static void build_append_int_noprefix(GArray *table, uint64_t value, int 
> size)
> +void build_append_int_noprefix(GArray *table, uint64_t value, int size)

Why this change?

>  {
>  int i;
>  
[...]
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 04aae89..431eaed 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -281,6 +281,7 @@ static void pc_q35_machine_options(MachineClass *m)
>  m->default_machine_opts = "firmware=bios-256k.bin";
>  m->default_display = "std";
>  m->no_floppy = 1;
> +m->has_dynamic_sysbus = true;

Why is this needed? Is it possible to do this change before
adding the iommu code?  Can this be done in a separate patch that
documents why it should be changed and why it is safe to set it
to true?

>  }
>  
[...]
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index a30808b..ef0e8a6 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -11,10 +11,11 @@
>  #include "hw/pci/pcie.h"
>  
>  /* PCI bus */
> -
> +#define PCI_DEVID(bus, devfn)   ((((uint16_t)(bus)) << 8) | (devfn))
>  #define PCI_DEVFN(slot, func)   ((((slot) & 0x1f) << 3) | ((func) & 0x07))
>  #define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
>  #define PCI_FUNC(devfn) ((devfn) & 0x07)
> +#define PCI_BUILD_BDF(bus, devfn) ((bus << 8) | (devfn))

Missing parentheses around bus: this should be (((bus) << 8) | (devfn)).
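
To illustrate why this matters, here is a small sketch, assuming a caller passes an expression whose operator binds less tightly than <<. The macro and function names (BDF_UNSAFE, BDF_SAFE, bdf_unsafe, bdf_safe) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the posted macro: bus is not parenthesized. */
#define BDF_UNSAFE(bus, devfn) ((bus << 8) | (devfn))
/* Hypothetical corrected form with every argument parenthesized. */
#define BDF_SAFE(bus, devfn)   (((bus) << 8) | (devfn))

static uint16_t bdf_unsafe(uint32_t raw_bus, uint8_t devfn)
{
    /* Expands to ((raw_bus & 0xff << 8) | (devfn)); since << binds
     * tighter than &, the bus term becomes raw_bus & 0xff00. */
    return (uint16_t)BDF_UNSAFE(raw_bus & 0xff, devfn);
}

static uint16_t bdf_safe(uint32_t raw_bus, uint8_t devfn)
{
    /* Expands to (((raw_bus & 0xff) << 8) | (devfn)) as intended. */
    return (uint16_t)BDF_SAFE(raw_bus & 0xff, devfn);
}
```

With raw_bus = 0x01 and devfn = 0x18, bdf_safe() yields 0x0118 while bdf_unsafe() yields 0x0018, silently dropping the bus number. The existing caller passes a plain function call, so it is not affected today, but the macro is public and other callers may not be so careful.
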

>  #define PCI_SLOT_MAX    32
>  #define PCI_FUNC_MAX    8
>  
> @@ -328,7 +329,6 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>  int pci_add_capability2(PCIDevice *pdev, uint8_t cap_id,
> uint8_t offset, uint8_t size,
> Error **errp);
> -

Unrelated whitespace change.

>  void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t 
> cap_size);
>  
>  uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
> @@ -692,11 +692,13 @@ static inline uint32_t pci_config_size(const PCIDevice 
> *d)
>  return pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE : 
> PCI_CONFIG_SPACE_SIZE;
>  }
>  
> -static inline uint16_t pci_requester_id(PCIDevice *dev)
> +static inline uint16_t pci_get_bdf(PCIDevice *dev)
>  {
> -return (pci_bus_num(dev->bus) << 8) | dev->devfn;
> +return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
>  }
>  
> +uint16_t pci_requester_id(PCIDevice *dev);
> +

Why is pci_requester_id() being kept? Where's its implementation?

pci_requester_id() is still being used at:

  hw/pci/msi.c:attrs.requester_id = pci_requester_id(dev);
  hw/pci/pcie_aer.c:err.source_id = pci_requester_id(dev);


>  /* DMA access functions */
>  static inline AddressSpace *pci_get_address_space(PCIDevice *dev)
>  {
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 9f33361..0aec287 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -38,7 +38,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>  "kvm_shadow_mem=size of KVM shadow MMU\n"
>  "dump-guest-core=on|off include guest memory in a core 
> dump (default=on)\n"
>  "mem-merge=on|off controls memory merge support 
> (default: on)\n"
> -"iommu=on|off controls emulated Intel IOMMU (VT-d) 
> support (default=off)\n"
> +"iommu=on|off controls emulated IOMMU support(default: 
> off)\n"
> +"x-iommu-type=amd|intel overrides emulated IOMMU to AMD 
> IOMMU (default: intel)\n"

Where is the new x-iommu-type option being used?

>  "igd-passthru=on|off controls IGD GFX passthrough 
> support (default=off)\n"
>  "aes-key-wrap=on|off controls support for AES key 
> wrapping (default=on)\n"
>  "dea-key-wrap=on|off controls support for DEA key 
> wrapping (default=on)\n"
> @@ -74,7 +75,9 @@ Enables or disables memory merge support. 

Re: [Qemu-devel] [PATCH v3 1/3] IOMMU: add VTD_CAP_CM to vIOMMU capability exposed to guest

2016-06-07 Thread Alex Williamson
On Tue, 7 Jun 2016 17:21:06 +1200
"Huang, Kai"  wrote:

> On 6/7/2016 3:58 PM, Alex Williamson wrote:
> > On Tue, 7 Jun 2016 11:20:32 +0800
> > Peter Xu  wrote:
> >  
> >> On Mon, Jun 06, 2016 at 11:02:11AM -0600, Alex Williamson wrote:  
> >>> On Mon, 6 Jun 2016 21:43:17 +0800
> >>> Peter Xu  wrote:
> >>>  
> >>>> On Mon, Jun 06, 2016 at 07:11:41AM -0600, Alex Williamson wrote:
> >>>>> On Mon, 6 Jun 2016 13:04:07 +0800
> >>>>> Peter Xu  wrote:
> >>>> [...]
> >>>>>> Besides the reason that there might be guests that do not support
> >>>>>> CM=1, will there be performance considerations? When user's
> >>>>>> configuration does not require CM capability (e.g., generic VM
> >>>>>> configuration, without VFIO), shall we allow user to disable the CM
> >>>>>> bit so that we can have better IOMMU performance (avoid extra and
> >>>>>> useless invalidations)?
> >>>>>
> >>>>> With Alexey's proposed patch to have callback ops when the iommu
> >>>>> notifier list adds its first entry and removes its last, any of the
> >>>>> additional overhead to generate notifies when nobody is listening can
> >>>>> be avoided.  These same callbacks would be the ones that need to
> >>>>> generate a hw_error if a notifier is added while running in CM=0.
> >>>>
> >>>> Not familiar with Alexey's patch
> >>>
> >>> https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg00079.html  
> >>
> >> Thanks for the pointer. :)
> >>  
> >>>  
> >>>> , but is that for VFIO only?
> >>>
> >>> vfio is currently the only user of the iommu notifier, but the
> >>> interface is generic, which is how it should (must) be.  
> >>
> >> Yes.
> >>  
> >>>  
> >>>> I mean, if
> >>>> we configured CMbit=1, guest kernel will send invalidation request
> >>>> every time it creates new entries (context entries, or iotlb
> >>>> entries). Even without VFIO notifiers, guest need to trap into QEMU
> >>>> and process the invalidation requests. This is avoidable if we are not
> >>>> using VFIO devices at all (so no need to maintain any mappings),
> >>>> right?
> >>>
> >>> CM=1 only defines that not-present and invalid entries can be cached,
> >>> any changes to existing entries requires an invalidation regardless of
> >>> CM.  What you're looking for sounds more like ECAP.C:  
> >>
> >> Yes, but I guess what I was talking about is CM bit but not ECAP.C.
> >> When we clear/replace one context entry, guest kernel will definitely
> >> send one context entry invalidation to QEMU:
> >>
> >> static void domain_context_clear_one(struct intel_iommu *iommu, u8 bus, u8 
> >> devfn)
> >> {
> >>if (!iommu)
> >>return;
> >>
> >>clear_context_table(iommu, bus, devfn);
> >>iommu->flush.flush_context(iommu, 0, 0, 0,
> >>   DMA_CCMD_GLOBAL_INVL);
> >>iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
> >> }
> >>
> >> ... While if we are creating a new one (like attaching a new VFIO
> >> device?), it's an optional behavior depending on whether CM bit is
> >> set:
> >>
> >> static int domain_context_mapping_one(struct dmar_domain *domain,
> >>  struct intel_iommu *iommu,
> >>  u8 bus, u8 devfn)
> >> {
> >> ...
> >>/*
> >> * It's a non-present to present mapping. If hardware doesn't cache
> >> * non-present entry we only need to flush the write-buffer. If the
> >> * _does_ cache non-present entries, then it does so in the special
> >> * domain #0, which we have to flush:
> >> */
> >>if (cap_caching_mode(iommu->cap)) {
> >>iommu->flush.flush_context(iommu, 0,
> >>   (((u16)bus) << 8) | devfn,
> >>   DMA_CCMD_MASK_NOBIT,
> >>   DMA_CCMD_DEVICE_INVL);
> >>iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
> >>} else {
> >>iommu_flush_write_buffer(iommu);
> >>}
> >> ...
> >> }
> >>
> >> Only if cap_caching_mode() is set (which is bit 7, the CM bit), we
> >> will send these invalidations. What I meant is that we should allow
> >> the user to specify the CM bit, so that when we are not using VFIO
> >> devices, we can skip the above flush_context() and flush_iotlb()
> >> etc... So, besides the fact that some guests do not support the
> >> CM bit (like Jailhouse), performance might be another reason
> >> to allow users to specify the CM bit themselves.
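
A minimal sketch of the decision being discussed, assuming the CM bit is bit 7 of the capability register as stated above. All names here (SKETCH_CAP_CM_BIT, sketch_caching_mode, sketch_on_new_mapping) are hypothetical and are not the kernel's or QEMU's identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit position of Caching Mode in the capability register (per the text above). */
#define SKETCH_CAP_CM_BIT 7

/* What a guest driver does after a non-present to present mapping change. */
enum sketch_action {
    SKETCH_FLUSH_WRITE_BUFFER, /* CM=0: no invalidation sent for the new entry */
    SKETCH_INVALIDATE_CACHES   /* CM=1: context and IOTLB invalidations sent */
};

static bool sketch_caching_mode(uint64_t cap)
{
    return (cap >> SKETCH_CAP_CM_BIT) & 1;
}

static enum sketch_action sketch_on_new_mapping(uint64_t cap)
{
    return sketch_caching_mode(cap) ? SKETCH_INVALIDATE_CACHES
                                    : SKETCH_FLUSH_WRITE_BUFFER;
}
```

Only the CM=1 branch produces invalidation requests that the emulated IOMMU has to process for newly created entries, which is the extra cost the quoted paragraph wants to make optional.
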
> >
> > I'm dubious of this, IOMMU drivers are already aware that hardware
> > flushes are expensive and do batching to optimize it.  The queued
> > invalidation mechanism itself is meant to allow asynchronous
> > invalidations.  QEMU invalidating a virtual IOMMU might very well be
> > faster than hardware.  
> 
> Doing batching doesn't mean we can eliminate the IOTLB flush for mappings 
> 

Re: [Qemu-devel] QEMU 2.7 release schedule?

2016-06-07 Thread Peter Maydell
On 7 June 2016 at 18:29, Michael Roth  wrote:
> I think it is actually a bit shorter of a window this time. The last few
> releases had around 2.5 to 3 months between n-1 release and hard freeze / rc0
> for n+1, but the proposed date would be just around 2 months.

Yeah, it's a bit short because the late-breaking CVEs meant we
didn't release 2.6 until about two weeks later than we planned.

If we want to have 2.5 months between n-1 and rc0, that would be
something like
 softfreeze 5 july
 hardfreeze/rc0 26 july
 rc1 2 august
 rc2 9 august
 rc3 16 august
 release 22 august (before kvm forum) if we're lucky, or
  30 august if we're not (more likely)

[these dates are all +2 weeks on the previous suggestion.]

> Being in late RC during KVM Forum also sounds like it could
> be productive, but I'm not sure I'd want to be in that position
> if I was Peter...

From my POV the rc3-to-rc4 stage is not that much work, but
it's hard to predict who might be the person with the last-minute
required fix (which is usually why we end up with about a week
of slip over the theoretical schedule). The tree is not supposed
to change at that point. I can do the rc/release cutting mechanics
remotely (assuming no disasters like stolen laptops etc); how about
your part with the tarballs? Otherwise we can just do it either
before or after the conference depending on how it goes.

But I think the real problem with a schedule which expects a
release at the tail end of August is that we then only have
three and a half months left til mid-December which is in
practice the latest we want to do a release given holidays.
So we can only avoid the short dev period this time round by
having a short one next time instead.

Maybe we could have +1 week rather than +0 or +2 (so softfreeze
28 June, rc0 19 July, release 16 August), as you suggest. That's
currently feeling like the best compromise to me.

thanks
-- PMM


