Re: [Qemu-devel] [PATCH v3 11/11] tcg-mips: Adjust condition functions for mips64

2016-11-28 Thread Richard Henderson

On 11/27/2016 11:42 PM, Jin Guojie wrote:

By reading Richard's and Aurelien's comments, I realize now that the best way
to solve this problem is not to add ext32s in brcond_32i, but to fix the
helper function. In other words, the register value should be 32-bit
sign-extended where it's being *created*, not where it's being *utilized*.
Maybe I can do this ext32s after helper_le_ld_name() returns, to ensure
V0 is sign-extended.


It's not necessarily V0 that needs to be extended, but the destination register 
(S1 in this case).  So perhaps


 tcg_out_opc_br(s, OPC_BEQ, TCG_REG_ZERO, TCG_REG_ZERO);
 /* delay slot */
-tcg_out_mov(s, TCG_TYPE_REG, v0, TCG_REG_V0);
+if (TCG_TARGET_REG_BITS == 64 && l->type == TCG_TYPE_I32) {
+tcg_out_opc_sa(s, OPC_SLL, v0, TCG_REG_V0, 0);
+} else {
+tcg_out_opc_reg(s, OPC_OR, v0, TCG_REG_V0, TCG_REG_ZERO);
+}

so that we always sign-extend 32-bit loads.
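
For reference, a minimal sketch of why SLL with a shift amount of 0 acts as a
32-bit sign-extension on MIPS64 (this models the architectural semantics of
the instruction, not QEMU code):

#include <stdint.h>

/* MIPS64 SLL rd, rt, 0 operates on the low 32 bits of rt and
 * sign-extends the 32-bit result into the 64-bit destination,
 * so it behaves like an ext32s of the source register. */
static inline int64_t mips64_sll0(int64_t rs)
{
    return (int64_t)(int32_t)(uint32_t)rs;
}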


r~



Re: [Qemu-devel] [PATCH v2 1/1] block/vmdk: Fix the endian problem of buf_len and lba

2016-11-28 Thread Hao QingFeng



On 2016-11-28 15:56, liujing wrote:

Hi QingFeng,


I just have a question: does the marker->data
need conversion as well?

I've no idea, it just suddenly occurred to me.

Nope, the data is of type char * for the compressed data stream, so there is
no endian issue.

thanks.

Jing

On 11/26/2016 01:46 PM, QingFeng Hao wrote:

The problem was triggered by qemu-iotests case 055. It failed when it
was comparing the compressed vmdk image with original test.img.

The cause is that buf_len in vmdk_write_extent wasn't converted to
little-endian before it was stored to disk. But later vmdk_read_extent
read it and converted it from little-endian to cpu endian.
If the cpu is big-endian, like s390, the problem occurs and
the data length read by vmdk_read_extent becomes invalid.
The fix is to add the conversion in vmdk_write_extent; meanwhile,
repair the endianness problem of the lba field, which shall also be converted
to little-endian before storing to disk.

Cc: qemu-sta...@nongnu.org
Signed-off-by: QingFeng Hao 
Signed-off-by: Jing Liu 
Signed-off-by: Kevin Wolf 
---
  block/vmdk.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index a11c27a..26e5f95 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1354,8 +1354,8 @@ static int vmdk_write_extent(VmdkExtent 
*extent, int64_t cluster_offset,

  goto out;
  }

-data->lba = offset >> BDRV_SECTOR_BITS;
-data->size = buf_len;
+data->lba = cpu_to_le64(offset >> BDRV_SECTOR_BITS);
+data->size = cpu_to_le32(buf_len);

  n_bytes = buf_len + sizeof(VmdkGrainMarker);
  iov = (struct iovec) {




--
QingFeng Hao(Robin)
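
A minimal sketch of the symmetry this fix restores. The struct below is a
simplified stand-in, not the real VmdkGrainMarker layout; cpu_to_le64(),
cpu_to_le32() and le32_to_cpu() are QEMU's byte-swap helpers from
"qemu/bswap.h":

#include "qemu/osdep.h"
#include "qemu/bswap.h"

/* The on-disk format is little-endian, so the writer must byte-swap on
 * big-endian hosts exactly the way the reader un-swaps. */
struct marker {
    uint64_t lba;
    uint32_t size;
};

static void marker_store(struct marker *m, uint64_t lba, uint32_t buf_len)
{
    m->lba  = cpu_to_le64(lba);
    m->size = cpu_to_le32(buf_len);
}

static uint32_t marker_load_size(const struct marker *m)
{
    return le32_to_cpu(m->size);   /* mirrors vmdk_read_extent() */
}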




Re: [Qemu-devel] [kvm-unit-tests PATCH v7 01/11] run_tests: allow forcing of acceleration mode

2016-11-28 Thread Andrew Jones
On Thu, Nov 24, 2016 at 04:10:23PM +, Alex Bennée wrote:
> While tests can be pegged to tcg it is useful to override this from time
> to time, especially when testing correctness on real systems.
> ---
>  run_tests.sh | 8 ++--
>  scripts/runtime.bash | 4 
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/run_tests.sh b/run_tests.sh
> index 254129d..b88c36f 100755
> --- a/run_tests.sh
> +++ b/run_tests.sh
> @@ -13,9 +13,10 @@ function usage()
>  {
>  cat <<EOF
> -Usage: $0 [-g group] [-h] [-v]
> +Usage: $0 [-g group] [-a accel] [-h] [-v]
>  
>  -g: Only execute tests in the given group
> +-a: Force acceleration mode (tcg/kvm)
>  -h: Output this help text
>  -v: Enables verbose mode
>  
> @@ -28,11 +29,14 @@ EOF
>  RUNTIME_arch_run="./$TEST_DIR/run"
>  source scripts/runtime.bash
>  
> -while getopts "g:hv" opt; do
> +while getopts "g:a:hv" opt; do
>  case $opt in
>  g)
>  only_group=$OPTARG
>  ;;
> +a)
> +force_accel=$OPTARG
> +;;
>  h)
>  usage
>  exit
> diff --git a/scripts/runtime.bash b/scripts/runtime.bash
> index 11a40a9..578cf32 100644
> --- a/scripts/runtime.bash
> +++ b/scripts/runtime.bash
> @@ -75,6 +75,10 @@ function run()
>  return;
>  fi
>  
> +if [ -n "$force_accel" ]; then
> +accel=$force_accel
> +fi
> +
>  if [ -n "$arch" ] && [ "$arch" != "$ARCH" ]; then
>  echo "`SKIP` $1 ($arch only)"
>  return 2
> -- 
> 2.10.1

We can already do 'ACCEL=tcg ./run_tests.sh' to force, for example, tcg.
Additionally, you can add any env you want to the config.mak after running
configure,

 echo ACCEL=tcg >> config.mak

If you still prefer a cmdline parameter, then I'd suggest a boolean
instead, with the default being KVM. So the param would be '-tcg', or
something.

Thanks,
drew



[Qemu-devel] [PATCH] block/mirror: Fix passing wrong argument to trace_mirror_yield

2016-11-28 Thread Yang Wei
mirror_yield is defined in block/trace-events as follows:
mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight)
so we should swap arguments 2 and 4 when invoking it.

Signed-off-by: Yang Wei 
---
 block/mirror.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 301ba92..2846a2e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -577,7 +577,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 }
 
 if (s->in_flight >= MAX_IN_FLIGHT) {
-trace_mirror_yield(s, s->in_flight, s->buf_free_count, -1);
+trace_mirror_yield(s,  -1, s->buf_free_count, s->in_flight);
 mirror_wait_for_io(s);
 continue;
 }
@@ -730,7 +730,7 @@ static void coroutine_fn mirror_run(void *opaque)
 s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
 if (s->in_flight >= MAX_IN_FLIGHT || s->buf_free_count == 0 ||
 (cnt == 0 && s->in_flight > 0)) {
-trace_mirror_yield(s, s->in_flight, s->buf_free_count, cnt);
+trace_mirror_yield(s, cnt, s->buf_free_count, s->in_flight);
 mirror_wait_for_io(s);
 continue;
 } else if (cnt != 0) {
-- 
2.10.2




Re: [Qemu-devel] [kvm-unit-tests PATCH v7 02/11] run_tests: allow disabling of timeouts

2016-11-28 Thread Andrew Jones
On Thu, Nov 24, 2016 at 04:10:24PM +, Alex Bennée wrote:
> Certainly during development of the tests and MTTCG there are times when
> the timeout just gets in the way.
> 
> Signed-off-by: Alex Bennée 
> ---
>  run_tests.sh | 8 ++--
>  scripts/runtime.bash | 4 
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/run_tests.sh b/run_tests.sh
> index b88c36f..4f2e5cb 100755
> --- a/run_tests.sh
> +++ b/run_tests.sh
> @@ -13,10 +13,11 @@ function usage()
>  {
> >  cat <<EOF
> -Usage: $0 [-g group] [-a accel] [-h] [-v]
> +Usage: $0 [-g group] [-a accel] [-t] [-h] [-v]
>  
>  -g: Only execute tests in the given group
>  -a: Force acceleration mode (tcg/kvm)
> +-t: disable timeouts
>  -h: Output this help text
>  -v: Enables verbose mode
>  
> @@ -29,7 +30,7 @@ EOF
>  RUNTIME_arch_run="./$TEST_DIR/run"
>  source scripts/runtime.bash
>  
> -while getopts "g:a:hv" opt; do
> +while getopts "g:a:thv" opt; do
>  case $opt in
>  g)
>  only_group=$OPTARG
> @@ -37,6 +38,9 @@ while getopts "g:a:hv" opt; do
>  a)
>  force_accel=$OPTARG
>  ;;
> +t)
> +no_timeout="yes"
> +;;
>  h)
>  usage
>  exit
> diff --git a/scripts/runtime.bash b/scripts/runtime.bash
> index 578cf32..968ff6d 100644
> --- a/scripts/runtime.bash
> +++ b/scripts/runtime.bash
> @@ -79,6 +79,10 @@ function run()
>  accel=$force_accel
>  fi
>  
> +if [ "$no_timeout" = "yes" ]; then
> +timeout=""
> +fi
> +
>  if [ -n "$arch" ] && [ "$arch" != "$ARCH" ]; then
>  echo "`SKIP` $1 ($arch only)"
>  return 2
> -- 
> 2.10.1
>

A timeout value of zero disables the timeout. So you just need to run
 TIMEOUT=0 ./run_tests.sh, or add it to config.mak.

drew



Re: [Qemu-devel] [PATCH v3 for-2.9 0/3] q35: add negotiable broadcast SMI

2016-11-28 Thread Gerd Hoffmann
  Hi,

> If I understand correctly, one argument against the current state of
> writeable fw_cfg, captured in
> , is
> that callbacks on write are not supported. Apparently, QEMU code that
> uses the data written by the guest is supposed to just read that data,
> not to expect a notification about it.
> 
> I'm unsure how this can work for actual negotiation, where the guest
> usually does a read/write/read cycle, and expects some kind of change
> between steps #2 and #3. I don't see how that can be implemented in QEMU
> without write callbacks (i.e. how QEMU can confirm or reject the
> negotiation attempt).

Do you actually need negotiation?  I think you only need to know
whether broadcast-smi is supported, and the presence of the
etc/broadcast-smi (or however we name that) fw_cfg file indicates that.
If it is there, just write true/false to it to enable/disable, and qemu
checks the field each time an SMI is raised.
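
A rough sketch of that idea, with made-up names (the fw_cfg file just backs a
guest-writable byte that the SMI-raising path consults; this is illustration
only, not the actual ICH9/Q35 code):

static uint8_t broadcast_smi_enabled;   /* backs "etc/broadcast-smi" */

static void raise_smi(CPUState *trigger_cpu)
{
    if (broadcast_smi_enabled) {
        CPUState *cs;

        /* Deliver the SMI to every vCPU, not just the one that triggered it. */
        CPU_FOREACH(cs) {
            cpu_interrupt(cs, CPU_INTERRUPT_SMI);
        }
    } else {
        cpu_interrupt(trigger_cpu, CPU_INTERRUPT_SMI);
    }
}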

cheers,
  Gerd




Re: [Qemu-devel] [PATCH v2 4/5] slirp: VMStatify socket level

2016-11-28 Thread Dr. David Alan Gilbert
* Samuel Thibault (samuel.thiba...@gnu.org) wrote:
> Samuel Thibault, on Sun 27 Nov 2016 16:13:46 +0100, wrote:
> > Dr. David Alan Gilbert (git), on Wed 23 Nov 2016 18:52:57 +, wrote:
> > > +static const VMStateDescription vmstate_slirp_socket_addr = {
> > > +.name = "slirp-socket-addr",
> > > +.version_id = 4,
> > > +.fields = (VMStateField[]) {
> > > +VMSTATE_UINT16(ss.ss_family, union slirp_sockaddr),
> > > +VMSTATE_UINT32_TEST(sin.sin_addr.s_addr, union slirp_sockaddr,
> > > +slirp_family_inet),
> > > +VMSTATE_UINT16_TEST(sin.sin_port, union slirp_sockaddr,
> > > +slirp_family_inet),
> > > +VMSTATE_END_OF_LIST()
> > > +}
> > > +};
> > 
> > How will we be able to add the IPv6 case here?
> 
> Reading again your previous post, it seemed it'd be in
> slirp_family_inet, but I don't immediately see how.
> 
> Applying your patch for 2.9 would thus make porting the code to IPv6
> more difficult than how it is now, so I'm quite reluctant :)
> 
> Could you perhaps simply add the IPv6 case in your patch series already?
> It shouldn't be much work for you who actually know how the VMSTATE
> machinery is supposed to work (I guess the amount of people who care
> about slirp *and* know about VMSTATE is extremely small), and a proof of
> concept for the portability to non-ipv4 address spaces.

The number of people who care about slirp, IPv6, VMState is even smaller :-)
Hmm, I don't really know IPv6 but I'm thinking this code will become something 
like
the following (says he not knowing whether a scope-id or a flowinfo
is something that needs migrating) (untested):


static const VMStateDescription vmstate_slirp_socket_addr = {
.name = "slirp-socket-addr",
.version_id = 4,
.fields = (VMStateField[]) {
VMSTATE_UINT16(ss.ss_family, union slirp_sockaddr),
VMSTATE_UINT32_TEST(sin.sin_addr.s_addr, union slirp_sockaddr,
slirp_family_inet),
VMSTATE_UINT16_TEST(sin.sin_port, union slirp_sockaddr,
slirp_family_inet),

VMSTATE_BUFFER_TEST(sin6.sin6_addr, union slirp_sockaddr,
slirp_family_inet6),
VMSTATE_UINT16_TEST(sin6.sin6_port, union slirp_sockaddr,
slirp_family_inet6),
VMSTATE_UINT32_TEST(sin6.sin6_flowinfo, union slirp_sockaddr,
slirp_family_inet6),
VMSTATE_UINT32_TEST(sin6.sin6_scope_id, union slirp_sockaddr,
slirp_family_inet6),

VMSTATE_END_OF_LIST()
}
};

So to me that looks pretty clean, we need to add another slirp_family_inet6
test function, but then we just add the extra fields for the IPv6 stuff.
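
For what it's worth, a sketch of that extra test function, assuming it mirrors
the existing slirp_family_inet helper and takes the union above as its opaque:

static bool slirp_family_inet6(void *opaque, int version_id)
{
    union slirp_sockaddr *ssa = opaque;

    /* Only send/expect the sin6 fields for IPv6 sockets. */
    return ssa->ss.ss_family == AF_INET6;
}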

Can you suggest an IPv6 command line for testing that ?

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [kvm-unit-tests PATCH v7 03/11] run_tests: allow passing of options to QEMU

2016-11-28 Thread Andrew Jones
On Thu, Nov 24, 2016 at 04:10:25PM +, Alex Bennée wrote:
> This introduces the option -o for passing options directly to QEMU,
> which is useful. In my case I'm using it to toggle MTTCG on and off:
> 
>   ./run_tests.sh -t -o "-tcg mttcg=on"
> 
> Signed-off-by: Alex Bennée 
> ---
>  run_tests.sh   | 10 +++---
>  scripts/functions.bash | 13 +++--
>  2 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/run_tests.sh b/run_tests.sh
> index 4f2e5cb..05cc7fb 100755
> --- a/run_tests.sh
> +++ b/run_tests.sh
> @@ -13,10 +13,11 @@ function usage()
>  {
> >  cat <<EOF
> -Usage: $0 [-g group] [-a accel] [-t] [-h] [-v]
> +Usage: $0 [-g group] [-a accel] [-o qemu_opts] [-t] [-h] [-v]
>  
>  -g: Only execute tests in the given group
>  -a: Force acceleration mode (tcg/kvm)
> +-o: additional options for QEMU command line
>  -t: disable timeouts
>  -h: Output this help text
>  -v: Enables verbose mode
> @@ -30,7 +31,7 @@ EOF
>  RUNTIME_arch_run="./$TEST_DIR/run"
>  source scripts/runtime.bash
>  
> -while getopts "g:a:thv" opt; do
> +while getopts "g:a:o:thv" opt; do
>  case $opt in
>  g)
>  only_group=$OPTARG
> @@ -38,6 +39,9 @@ while getopts "g:a:thv" opt; do
>  a)
>  force_accel=$OPTARG
>  ;;
> +o)
> +extra_opts=$OPTARG
> +;;
>  t)
>  no_timeout="yes"
>  ;;
> @@ -67,4 +71,4 @@ RUNTIME_log_stdout () {
>  config=$TEST_DIR/unittests.cfg
>  rm -f test.log
>  printf "BUILD_HEAD=$(cat build-head)\n\n" > test.log
> -for_each_unittest $config run
> +for_each_unittest $config run "$extra_opts"
> diff --git a/scripts/functions.bash b/scripts/functions.bash
> index ee9143c..d38a69e 100644
> --- a/scripts/functions.bash
> +++ b/scripts/functions.bash
> @@ -2,11 +2,12 @@
>  function for_each_unittest()
>  {
>   local unittests="$1"
> - local cmd="$2"
> - local testname
> +local cmd="$2"
> +local extra_opts=$3
> +local testname

We use tabs in this file. Not sure why cmd and testname got
changed too...

>   local smp
>   local kernel
> - local opts
> +local opts=$extra_opts
>   local groups
>   local arch
>   local check
> @@ -21,7 +22,7 @@ function for_each_unittest()
>   testname=${BASH_REMATCH[1]}
>   smp=1
>   kernel=""
> - opts=""
> +opts=$extra_opts
>   groups=""
>   arch=""
>   check=""
> @@ -32,7 +33,7 @@ function for_each_unittest()
>   elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then
>   smp=${BASH_REMATCH[1]}
>   elif [[ $line =~ ^extra_params\ *=\ *(.*)$ ]]; then
> - opts=${BASH_REMATCH[1]}
> +opts="$opts ${BASH_REMATCH[1]}"
>   elif [[ $line =~ ^groups\ *=\ *(.*)$ ]]; then
>   groups=${BASH_REMATCH[1]}
>   elif [[ $line =~ ^arch\ *=\ *(.*)$ ]]; then
> @@ -45,6 +46,6 @@ function for_each_unittest()
>   timeout=${BASH_REMATCH[1]}
>   fi
>   done
> - "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" 
> "$accel" "$timeout"
> +"$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" 
> "$check" "$accel" "$timeout"
>   exec {fd}<&-
>  }
> -- 
> 2.10.1
> 
>

This is a pretty good idea, but I think I might like the extra options
to be given like this instead

  ./run_tests.sh [run_tests.sh options] -- [qemu options]

Thanks,
drew 



Re: [Qemu-devel] [PATCH] {disas, slirp}: Replace min/max with MIN/MAX macros

2016-11-28 Thread Yuval Shaia
On Mon, Nov 28, 2016 at 08:39:21AM +0100, Markus Armbruster wrote:
> The "{disas, slirp}: " prefix is unusual.  Better: "disas, slirp: ".
> But I'd instead split the patch into the slirp part, where you really
> replace stuff, and the disas part, where you merely drop an unused macro
> definition.

Thanks,
Accepting your first suggestion, as I just now realized that this
macro is actually in use (disas/m68k.c lines 4732 and 4796).
(Can't explain how I missed that in v0.)



Re: [Qemu-devel] [kvm-unit-tests PATCH v7 04/11] libcflat: add PRI(dux)32 format types

2016-11-28 Thread Andrew Jones
On Thu, Nov 24, 2016 at 04:10:26PM +, Alex Bennée wrote:
> So we can have portable formatting of uint32_t types.
> 
> Signed-off-by: Alex Bennée 
> ---
>  lib/libcflat.h | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/lib/libcflat.h b/lib/libcflat.h
> index bdcc561..6dab5be 100644
> --- a/lib/libcflat.h
> +++ b/lib/libcflat.h
> @@ -55,12 +55,17 @@ typedef _Bool bool;
>  #define true  1
>  
>  #if __SIZEOF_LONG__ == 8
> +#  define __PRI32_PREFIX
>  #  define __PRI64_PREFIX "l"
>  #  define __PRIPTR_PREFIX"l"
>  #else
> +#  define __PRI32_PREFIX"l"

But a 32-bit value is an 'int' and an 'int' shouldn't ever
require an 'l'. Why was this necessary?

>  #  define __PRI64_PREFIX "ll"
>  #  define __PRIPTR_PREFIX
>  #endif
> +#define PRId32  __PRI32_PREFIX   "d"
> +#define PRIu32  __PRI32_PREFIX   "u"
> +#define PRIx32  __PRI32_PREFIX   "x"
>  #define PRId64  __PRI64_PREFIX   "d"
>  #define PRIu64  __PRI64_PREFIX   "u"
>  #define PRIx64  __PRI64_PREFIX   "x"
> -- 
> 2.10.1
> 
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
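
To illustrate the point above about the empty 32-bit prefix (assuming the
usual ILP32/LP64 data models, where uint32_t is plain 'unsigned int' on both
32- and 64-bit targets), a minimal standalone sketch:

#include <stdio.h>
#include <stdint.h>

/* Stand-in for the libcflat definitions under discussion: an empty
 * 32-bit prefix is enough; an "l" would mismatch the int-sized
 * argument and trigger -Wformat warnings. */
#define __PRI32_PREFIX
#define PRIx32 __PRI32_PREFIX "x"

int main(void)
{
    uint32_t v = 0xdeadbeef;

    printf("v = %" PRIx32 "\n", v);   /* prints "v = deadbeef" */
    return 0;
}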



[Qemu-devel] [PATCH v1] slirp, disas: Replace min/max with MIN/MAX macros

2016-11-28 Thread Yuval Shaia
Signed-off-by: Yuval Shaia 
---
v0 -> v1:
* Remove unneeded "include" as suggested by Fam Zheng
* Change commit message's prefix as suggested by Markus Armbruster
* Utilize MIN macro in two extra places in file disas/m68k.c
---
 disas/m68k.c   |  8 ++--
 slirp/dhcpv6.c |  2 +-
 slirp/ip6_icmp.c   |  2 +-
 slirp/slirp.c  |  2 +-
 slirp/slirp.h  |  5 -
 slirp/tcp_input.c  | 14 +++---
 slirp/tcp_output.c |  6 +++---
 slirp/tcp_timer.c  |  2 +-
 slirp/tcpip.h  |  2 +-
 9 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/disas/m68k.c b/disas/m68k.c
index 8e7c3f7..81e0fb4 100644
--- a/disas/m68k.c
+++ b/disas/m68k.c
@@ -4698,10 +4698,6 @@ get_field (const unsigned char *data, enum 
floatformat_byteorders order,
   return result;
 }
 
-#ifndef min
-#define min(a, b) ((a) < (b) ? (a) : (b))
-#endif
-
 /* Convert from FMT to a double.
FROM is the address of the extended float.
Store the double in *TO.  */
@@ -4733,7 +4729,7 @@ floatformat_to_double (const struct floatformat *fmt,
   nan = 0;
   while (mant_bits_left > 0)
{
- mant_bits = min (mant_bits_left, 32);
+ mant_bits = MIN(mant_bits_left, 32);
 
  if (get_field (ufrom, fmt->byteorder, fmt->totalsize,
 mant_off, mant_bits) != 0)
@@ -4793,7 +4789,7 @@ floatformat_to_double (const struct floatformat *fmt,
 
   while (mant_bits_left > 0)
 {
-  mant_bits = min (mant_bits_left, 32);
+  mant_bits = MIN(mant_bits_left, 32);
 
   mant = get_field (ufrom, fmt->byteorder, fmt->totalsize,
 mant_off, mant_bits);
diff --git a/slirp/dhcpv6.c b/slirp/dhcpv6.c
index 02c51c7..d266611 100644
--- a/slirp/dhcpv6.c
+++ b/slirp/dhcpv6.c
@@ -168,7 +168,7 @@ static void dhcpv6_info_request(Slirp *slirp, struct 
sockaddr_in6 *srcsas,
 sa[0], sa[1], sa[2], sa[3], sa[4], sa[5], sa[6], sa[7],
 sa[8], sa[9], sa[10], sa[11], sa[12], sa[13], sa[14],
 sa[15], slirp->bootp_filename);
-slen = min(slen, smaxlen);
+slen = MIN(slen, smaxlen);
 *resp++ = slen >> 8;/* option-len high byte */
 *resp++ = slen; /* option-len low byte */
 resp += slen;
diff --git a/slirp/ip6_icmp.c b/slirp/ip6_icmp.c
index 6d18e28..298a48d 100644
--- a/slirp/ip6_icmp.c
+++ b/slirp/ip6_icmp.c
@@ -95,7 +95,7 @@ void icmp6_send_error(struct mbuf *m, uint8_t type, uint8_t 
code)
 #endif
 
 rip->ip_nh = IPPROTO_ICMPV6;
-const int error_data_len = min(m->m_len,
+const int error_data_len = MIN(m->m_len,
 IF_MTU - (sizeof(struct ip6) + ICMP6_ERROR_MINLEN));
 rip->ip_pl = htons(ICMP6_ERROR_MINLEN + error_data_len);
 t->m_len = sizeof(struct ip6) + ntohs(rip->ip_pl);
diff --git a/slirp/slirp.c b/slirp/slirp.c
index 6e2b4e5..60539de 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -774,7 +774,7 @@ void slirp_pollfds_poll(GArray *pollfds, int select_error)
 static void arp_input(Slirp *slirp, const uint8_t *pkt, int pkt_len)
 {
 struct slirp_arphdr *ah = (struct slirp_arphdr *)(pkt + ETH_HLEN);
-uint8_t arp_reply[max(ETH_HLEN + sizeof(struct slirp_arphdr), 64)];
+uint8_t arp_reply[MAX(ETH_HLEN + sizeof(struct slirp_arphdr), 64)];
 struct ethhdr *reh = (struct ethhdr *)arp_reply;
 struct slirp_arphdr *rah = (struct slirp_arphdr *)(arp_reply + ETH_HLEN);
 int ar_op;
diff --git a/slirp/slirp.h b/slirp/slirp.h
index a1f3139..3877f66 100644
--- a/slirp/slirp.h
+++ b/slirp/slirp.h
@@ -292,9 +292,4 @@ int tcp_emu(struct socket *, struct mbuf *);
 int tcp_ctl(struct socket *);
 struct tcpcb *tcp_drop(struct tcpcb *tp, int err);
 
-#ifndef _WIN32
-#define min(x,y) ((x) < (y) ? (x) : (y))
-#define max(x,y) ((x) > (y) ? (x) : (y))
-#endif
-
 #endif
diff --git a/slirp/tcp_input.c b/slirp/tcp_input.c
index c5063a9..42c7f2f 100644
--- a/slirp/tcp_input.c
+++ b/slirp/tcp_input.c
@@ -596,7 +596,7 @@ findso:
   win = sbspace(&so->so_rcv);
  if (win < 0)
win = 0;
- tp->rcv_wnd = max(win, (int)(tp->rcv_adv - tp->rcv_nxt));
+ tp->rcv_wnd = MAX(win, (int)(tp->rcv_adv - tp->rcv_nxt));
}
 
switch (tp->t_state) {
@@ -1065,7 +1065,7 @@ trimthenstep6:
else if (++tp->t_dupacks == TCPREXMTTHRESH) {
tcp_seq onxt = tp->snd_nxt;
u_int win =
-   min(tp->snd_wnd, tp->snd_cwnd) / 2 /
+   MIN(tp->snd_wnd, tp->snd_cwnd) / 2 /
tp->t_maxseg;
 
if (win < 2)
@@ -1138,7 +1138,7 @@ trimthenstep6:
 
  if (cw > tp->snd_ssthresh)
incr = incr * incr / cw;
- tp->snd_cwnd = min(cw + incr, TCP_MAXWIN

[Qemu-devel] [Bug 1641861] Re: fail to correctly emulate FPSCR register on arm

2016-11-28 Thread Peter Maydell
Hi. The v8 ARM ARM defines these bits of the FPSCR as "RES0". The
glossary definition of "RES0" says that for bits in a RW register it is
an implementation choice whether the bits should be "hardwired to 0" (ie
writes are ignored) or whether the bit can be written and read back (but
has no effect on behaviour). QEMU has gone for the "can be written and
read back" option.

(Previous versions of the architecture like v7 required implementations
to provide the "hardwired to 0" behaviour. In any case correctly
behaving guest code should never write 1s to these bits.)

This is a specific example of a general situation: QEMU doesn't really
pay very much attention to the edge cases of behaviour in IMPDEF or
UNPREDICTABLE cases, especially where they vary between architecture
versions, and we don't try to enforce "unimplemented always RAZ/RAO"
bits in registers. So while it might be nice to have these bits RAZ0
it's really very low priority for us -- I'd happily review and accept a
patch to do this, but am unlikely to write one myself.
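
For anyone picking this up, the "hardwired to 0" option essentially amounts to
masking the offending bits on every write; a minimal sketch, with the mask
taken from the bits named in this report rather than from QEMU's actual FPSCR
helper:

#include <stdint.h>

/* RES0 bits per the report: [19], [14:13], [6:5]. */
#define FPSCR_RES0_MASK ((1u << 19) | (3u << 13) | (3u << 5))

/* Clear the RES0 bits of a value about to be written by VMSR. */
static inline uint32_t fpscr_masked_write(uint32_t new_val)
{
    return new_val & ~FPSCR_RES0_MASK;
}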

-- 
https://bugs.launchpad.net/bugs/1641861

Title:
  fail to correctly emulate FPSCR register on arm

Status in QEMU:
  New

Bug description:
  Hi all, we systematically tested the QEMU implementation for emulating
  arm user mode programs. We found that QEMU incorrectly emulates the
  FPSCR register. The following is the proof-of-concept code:

  /*** Beginning of the bug: arm.c **/

  int printf(const char *format, ...);
  unsigned char i0[0x10];
  unsigned char o[0x10];
  int main() {
  int k = 0;
  asm("mov r2, %0\n"
  "ldr r0, [r2]\n"::"r"((char *)(i0)));;
  asm("vmsr fpscr, r0");
  asm("mov r2, %0\n"
  "vmrs r4, fpscr\n"
  "str r4, [r2]\n"::"r"((char *)(o)));;
  for (k = 0; k < 0x10; k++)
  printf("%02x", o[0x10 - 1 - k]);
  printf("\n");
  }
  unsigned char i0[0x10] = {0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, 
0x28, 0x1c, 0xc7, 0x01, 0x00, 0x00, 0x00, 0x00};

  /*** End fo the bug **/

  When the program is compiled into an arm binary and run on a
  real arm machine, and then run in qemu, we get the following results:

  $ arm-linux-gnueabihf-gcc arm.c -o arm -static
  $ ./arm
  fff7009f
  $ qemu-arm arm
  

  According to the ARM manual, bits [19, 14:13, 6:5] of FPSCR should be
  reserved as zero. However, arm qemu fails to keep these bits zero:
  they can actually be modified in QEMU.

  QEMU version is 2.7.0. The operating system is Linux 3.13.0. x86_64.

  Thanks!




Re: [Qemu-devel] [kvm-unit-tests PATCH v7 06/11] arm/Makefile.common: force -fno-pic

2016-11-28 Thread Andrew Jones
On Thu, Nov 24, 2016 at 04:10:28PM +, Alex Bennée wrote:
> As distro compilers move towards defaults for build hardening for things
> like ASLR we need to force -fno-pic. Failure to do so can lead to weird
> relocation problems when we build our "flat" binaries.
> 
> Signed-off-by: Alex Bennée 
> ---
>  arm/Makefile.common | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arm/Makefile.common b/arm/Makefile.common
> index 52f7440..cca0d9c 100644
> --- a/arm/Makefile.common
> +++ b/arm/Makefile.common
> @@ -21,6 +21,7 @@ phys_base = $(LOADADDR)
>  
>  CFLAGS += -std=gnu99
>  CFLAGS += -ffreestanding
> +CFLAGS += -fno-pic
>  CFLAGS += -Wextra
>  CFLAGS += -O2
>  CFLAGS += -I lib -I lib/libfdt
> -- 
> 2.10.1
> 
>

Applied to arm/next

https://github.com/rhdrjones/kvm-unit-tests/commits/arm/next

Thanks,
drew



Re: [Qemu-devel] GPIO input?

2016-11-28 Thread Peter Maydell
On 17 November 2016 at 12:41, Liviu Ionescu  wrote:
> I added graphical buttons to GNU ARM Eclipse QEMU, and I can already trigger 
> actions when a button is pushed/released, so currently I can reset the 
> STM32F4DISCOVERY board clicking the reset button on the board picture.
>
> Now I want to implement the user button, which on this board is connected to 
> a GPIO pin.
>
>
> Can you suggest an existing board that somehow implements a GPIO
> input, so I can use a similar mechanism?

Stellaris has a bunch of things with GPIO inputs (eg the ADC
has a GPIO input which we wire up to the GPTM device). Basically
the device implement some input GPIO lines, and then the board
code can call qdev_get_gpio_in() to get the qemu_irq so it can
signal it when appropriate (or wire it up to some other device
which signals it). You might prefer to use the named-gpios
rather than the old-style numbered GPIOs; either will work.
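
A rough sketch of that pattern (the device type, register and pin layout below
are invented for illustration, not taken from a real QEMU board):

#include "qemu/osdep.h"
#include "qemu/bitops.h"
#include "hw/qdev-core.h"
#include "hw/irq.h"

/* Hypothetical GPIO controller state; only what the sketch needs. */
typedef struct MyGPIOState {
    DeviceState parent_obj;
    uint32_t idr;                         /* input data register */
} MyGPIOState;

/* Called when one of the named input lines changes level; for
 * qdev_init_gpio_in_named() the opaque is the device itself. */
static void my_gpio_set_pin(void *opaque, int line, int level)
{
    MyGPIOState *s = opaque;

    s->idr = deposit32(s->idr, line, 1, !!level);
    /* ...raise an interrupt here if the pin is configured for one... */
}

static void my_gpio_instance_init(Object *obj)
{
    /* 16 named input lines, "pin[0..15]" */
    qdev_init_gpio_in_named(DEVICE(obj), my_gpio_set_pin, "pin", 16);
}

/* Board/GUI side: press or release the user button wired to pin 0. */
static void board_set_user_button(DeviceState *gpio_dev, bool pressed)
{
    qemu_irq pin = qdev_get_gpio_in_named(gpio_dev, "pin", 0);

    qemu_set_irq(pin, pressed ? 1 : 0);
}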

thanks
-- PMM



Re: [Qemu-devel] [PATCH v3 for-2.9 0/3] q35: add negotiable broadcast SMI

2016-11-28 Thread Paolo Bonzini


On 25/11/2016 15:10, Igor Mammedov wrote:
> On Fri, 25 Nov 2016 03:55:29 -0500 (EST)
> Paolo Bonzini  wrote:
>>> if 0x3 were covered by SMRR range, then OSPM wouldn't be able to
>>> place its own code there and there wouldn't be any need in side interfaces
>>> to put and get CPU in/from some undefined by spec state (parked).
>>>
>>> But above would imply a large block allocated at 0x3 to fit all
>>> possible CPUs+1, not sure if it's doable (maybe edk2 wouldn't have
>>> big issues with reserving large block in lowmem).  
>>
>> If you mean that the default SMRR range would include 0x3 for a
>> hotplugged
>> CPU, that would work but it is a significant departure from real hardware.
>> I'd rather avoid that.
> 
> But that's guest side only solution to guest problem, that won't require
> any assistance from QEMU/KVM.

No, I don't think it can be guest-only.  The initial value of SMRRs is
undefined, and SMRRs are per processor.  The newly-hotplugged CPU has no
SMRRs defined when it is started up.

> Baremetal would also benefit from it as it won't need to implement unpark
> logic anymore. It should also work for existing HW that has unpark logic.
> 
> Do you have any pointers to how hardware does unparking now?

The firmware tells the BMC to do it.  I don't know what the firmware-BMC
interface looks like.

>>> It looks like we need only SMM accessible guest/host interface to make
>>> CPU unparking secure or cover default SMBASE by SMRR.  
>>
>> Yes, unparking would be accessible only from SMM if the unparking feature
>> is negotiated.
> 
> I suppose we could check in MMIO handler that all CPUs are in SMM mode
> before allowing unparking or ignore command if they are not.
> 
> For compat reasons unpark should be optin feature (i.e. firmware should
> explicitly enable it to avoid breaking existing configurations (SeaBIOS))

Yes, of course---that's why I'm bringing up in the context of this series.

Paolo



Re: [Qemu-devel] [kvm-unit-tests PATCH v7 07/11] arm/tlbflush-code: Add TLB flush during code execution test

2016-11-28 Thread Andrew Jones
On Thu, Nov 24, 2016 at 04:10:29PM +, Alex Bennée wrote:
> This adds a fairly brain dead torture test for TLB flushes intended for
> stressing the MTTCG QEMU build. It takes the usual -smp option for
> multiple CPUs.
> 
> By default CPU0 will do a TLBIALL flush after each cycle. You can
> pass options via -append to control additional aspects of the test:
> 
>   - "page" flush each page in turn (one per function)
>   - "self" do the flush after each computation cycle
>   - "verbose" report progress on each computation cycle
> 
> Signed-off-by: Alex Bennée 
> CC: Mark Rutland 
> 
> ---
> v2
>   - rename to tlbflush-test
>   - made makefile changes cleaner
>   - added self/other flush mode
>   - create specific prefix
>   - whitespace fixes
> v3
>   - using new SMP framework for test running
> v4
>   - merge in the unitests.cfg
> v5
>   - max out at -smp 4
>   - printf fmtfix
> v7
>   - rename to tlbflush-code
>   - int -> bool flags
> ---
>  arm/Makefile.common |   2 +
>  arm/tlbflush-code.c | 212 
> 
>  arm/unittests.cfg   |  24 ++
>  3 files changed, 238 insertions(+)
>  create mode 100644 arm/tlbflush-code.c
> 
> diff --git a/arm/Makefile.common b/arm/Makefile.common
> index cca0d9c..de99a6e 100644
> --- a/arm/Makefile.common
> +++ b/arm/Makefile.common
> @@ -13,6 +13,7 @@ tests-common  = $(TEST_DIR)/selftest.flat
>  tests-common += $(TEST_DIR)/spinlock-test.flat
>  tests-common += $(TEST_DIR)/pci-test.flat
>  tests-common += $(TEST_DIR)/gic.flat
> +tests-common += $(TEST_DIR)/tlbflush-code.flat
>  
>  all: test_cases
>  
> @@ -81,3 +82,4 @@ generated_files = $(asm-offsets)
>  test_cases: $(generated_files) $(tests-common) $(tests)
>  
>  $(TEST_DIR)/selftest.o $(cstart.o): $(asm-offsets)
> +$(TEST_DIR)/tlbflush-code.elf: $(cstart.o) $(TEST_DIR)/tlbflush-code.o

This should no longer be necessary.

> diff --git a/arm/tlbflush-code.c b/arm/tlbflush-code.c
> new file mode 100644
> index 000..cb5cdc2
> --- /dev/null
> +++ b/arm/tlbflush-code.c
> @@ -0,0 +1,212 @@
> +/*
> + * TLB Flush Race Tests
> + *
> + * These tests are designed to test for incorrect TLB flush semantics
> + * under emulation. The initial CPU will set all the others working on a
> + * computation task and will then trigger TLB flushes across the
> + * system. It doesn't actually need to re-map anything but the flushes
> + * themselves will trigger QEMU's TCG self-modifying code detection
> + * which will invalidate any generated code causing re-translation.
> + * Eventually the code buffer will fill and a general tb_flush() will
> + * be triggered.
> + *
> + * Copyright (C) 2016, Linaro, Alex Bennée 
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define SEQ_LENGTH 10
> +#define SEQ_HASH 0x7cd707fe
> +
> +static cpumask_t smp_test_complete;
> +static int flush_count = 100;
> +static bool flush_self;
> +static bool flush_page;
> +static bool flush_verbose;
> +
> +/*
> + * Work functions
> + *
> + * These work functions need to be:
> + *
> + *  - page aligned, so we can flush one function at a time
> + *  - have branches, so QEMU TCG generates multiple basic blocks
> + *  - call across pages, so we exercise the TCG basic block slow path
> + */
> +
> +/* Adler32 */
> +__attribute__((aligned(PAGE_SIZE))) uint32_t hash_array(const void *buf,
> + size_t buflen)

I think I'd prefer

__attribute__((aligned(PAGE_SIZE)))
uint32_t hash_array(const void *buf, size_t buflen)

to handle the long line

> +{
> + const uint8_t *data = (uint8_t *) buf;
> + uint32_t s1 = 1;
> + uint32_t s2 = 0;
> +
> + for (size_t n = 0; n < buflen; n++) {
> + s1 = (s1 + data[n]) % 65521;
> + s2 = (s2 + s1) % 65521;
> + }
> + return (s2 << 16) | s1;
> +}
> +
> +__attribute__((aligned(PAGE_SIZE))) void create_fib_sequence(int length,
> + unsigned int *array)
> +{
> + int i;
> +
> + /* first two values */
> + array[0] = 0;
> + array[1] = 1;
> + for (i=2; i<length; i++) {
> + array[i] = array[i-2] + array[i-1];
> + }

please don't use {} for one-liners. Try running the kernel's check_patch
on your patches. Applies many places below

> +}
> +
> +__attribute__((aligned(PAGE_SIZE))) unsigned long long factorial(unsigned 
> int n)

long line

> +{
> + unsigned int i;
> + unsigned long long fac = 1;
> + for (i=1; i<=n; i++)
> + {
> + fac = fac * i;
> + }
> + return fac;
> +}
> +
> +__attribute__((aligned(PAGE_SIZE))) void factorial_array
> +(unsigned int n, unsigned int *input, unsigned long long *output)
> +{
> + unsigned int i;
> + for (i=0; i<n; i++) {
> + output[i] = factorial(input[i]);
> + }
> +}
> +
> +__attribute__((aligned(PAGE_SIZE))) unsigned int do_computation(void)
> +{
> +

Re: [Qemu-devel] [RfC PATCH 0/3] edk2: add efi firmware builds

2016-11-28 Thread Gerd Hoffmann
  Hi,

> Should we think about our policy for distributing & shipping ROMS
> more generally ?  Most distros will actively strip out the ROMs that
> we ship in the QEMU tar.gz releases, and rebuild them from pristine
> source in order to ensure they're fully complying with licensing
> requirements wrt "full & corresponding source".

Or they don't even use the bundled rom sources and build from upstream
tarball instead.

> So should we consider actually shipping 2 tar.gz files for QEMU
> releases. One minimal qemu-x.y.z.tar.gz that only contains pristine
> QEMU source with no pre-built artifacts, and a second qemu-full-x.y.z.tar.gz
> that contains the QEMU source, plus an arbitrary number of pre-built
> blobs.

Makes sense.

> In terms of GIT, we could likewise make the binary ROMS live in
> a submodule, so we don't bloat the main GIT repo, but I don't
> think that's so critical

Adding some submodule (roms/prebuilt?) would also make the management
easier.

Basically the full tarball would be qemu plus submodules, and the
minimal tarball would be qemu without submodules.  You can simply use
"git archive" to create the later, which is a bit problematic today as
the tarball will include blobs without sources.

cheers,
  Gerd




Re: [Qemu-devel] [kvm-unit-tests PATCH v7 08/11] arm/tlbflush-data: Add TLB flush during data writes test

2016-11-28 Thread Andrew Jones
On Thu, Nov 24, 2016 at 04:10:30PM +, Alex Bennée wrote:
> This test is the cousin of the tlbflush-code test. Instead of flushing
> running code it re-maps virtual addresses while a buffer is being filled
> up. It then audits the results checking for writes that have ended up in
> the wrong place.
> 
> While tlbflush-code exercises QEMU's translation invalidation logic this
> test stresses the SoftMMU cputlb code and ensures it is semantically
> correct.
> 
> The test optionally takes two parameters for debugging:
> 
>cycles   - change the default number of test iterations
>page - flush pages individually instead of all
> 
> Signed-off-by: Alex Bennée 
> CC: Mark Rutland 
> ---
>  arm/Makefile.common |   2 +
>  arm/tlbflush-data.c | 401 
> 
>  arm/unittests.cfg   |  12 ++
>  3 files changed, 415 insertions(+)
>  create mode 100644 arm/tlbflush-data.c
> 
> diff --git a/arm/Makefile.common b/arm/Makefile.common
> index de99a6e..528166d 100644
> --- a/arm/Makefile.common
> +++ b/arm/Makefile.common
> @@ -14,6 +14,7 @@ tests-common += $(TEST_DIR)/spinlock-test.flat
>  tests-common += $(TEST_DIR)/pci-test.flat
>  tests-common += $(TEST_DIR)/gic.flat
>  tests-common += $(TEST_DIR)/tlbflush-code.flat
> +tests-common += $(TEST_DIR)/tlbflush-data.flat
>  
>  all: test_cases
>  
> @@ -83,3 +84,4 @@ test_cases: $(generated_files) $(tests-common) $(tests)
>  
>  $(TEST_DIR)/selftest.o $(cstart.o): $(asm-offsets)
>  $(TEST_DIR)/tlbflush-code.elf: $(cstart.o) $(TEST_DIR)/tlbflush-code.o
> +$(TEST_DIR)/tlbflush-data.elf: $(cstart.o) $(TEST_DIR)/tlbflush-data.o

This isn't necessary

> diff --git a/arm/tlbflush-data.c b/arm/tlbflush-data.c
> new file mode 100644
> index 000..7920179
> --- /dev/null
> +++ b/arm/tlbflush-data.c
> @@ -0,0 +1,401 @@
> +/*
> + * TLB Flush Race Tests
> + *
> + * These tests are designed to test for incorrect TLB flush semantics
> + * under emulation. The initial CPU will set all the others working on
> + * writing to a set of pages. It will then re-map one of the pages
> + * back and forth while recording the timestamps of when each page was
> + * active. The test fails if a write was detected on a page after the
> + * tlbflush switching to a new page should have completed.
> + *
> + * Copyright (C) 2016, Linaro, Alex Bennée 
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define NR_TIMESTAMPS((PAGE_SIZE/sizeof(u64)) << 2)
> +#define NR_AUDIT_RECORDS 16384
> +#define NR_DYNAMIC_PAGES 3
> +#define MAX_CPUS 8
> +
> +#define MIN(a, b)((a) < (b) ? (a) : (b))

Peter Xu is bringing MIN to libcflat with his edu series.

> +
> +typedef struct {
> + u64 timestamps[NR_TIMESTAMPS];
> +} write_buffer;
> +
> +typedef struct {
> + write_buffer*newbuf;
> + u64 time_before_flush;
> + u64 time_after_flush;
> +} audit_rec_t;
> +
> +typedef struct {
> + audit_rec_t records[NR_AUDIT_RECORDS];
> +} audit_buffer;
> +
> +typedef struct {
> + write_buffer*stable_pages;
> + write_buffer*dynamic_pages[NR_DYNAMIC_PAGES];
> + audit_buffer*audit;
> + unsigned intflush_count;
> +} test_data_t;
> +
> +static test_data_t test_data[MAX_CPUS];
> +
> +static cpumask_t ready;
> +static cpumask_t complete;
> +
> +static bool test_complete;
> +static bool flush_verbose;
> +static bool flush_by_page;
> +static int test_cycles=3;
> +static int secondary_cpus;
> +
> +static write_buffer * alloc_test_pages(void)
> +{
> + write_buffer *pg;
> + pg = calloc(NR_TIMESTAMPS, sizeof(u64));
> + return pg;
> +}
> +
> +static void setup_pages_for_cpu(int cpu)
> +{
> + unsigned int i;
> +
> + test_data[cpu].stable_pages = alloc_test_pages();
> +
> + for (i=0; i<NR_DYNAMIC_PAGES; i++) {
> + test_data[cpu].dynamic_pages[i] = alloc_test_pages();
> + }
> +
> + test_data[cpu].audit = calloc(NR_AUDIT_RECORDS, sizeof(audit_rec_t));
> +}
> +
> +static audit_rec_t * get_audit_record(audit_buffer *buf, unsigned int record)
> +{
> + return &buf->records[record];
> +}
> +
> +/* Sync on a given cpumask */
> +static void wait_on(int cpu, cpumask_t *mask)
> +{

Why take 'cpu' as a parameter? Just use smp_processor_id().

> + cpumask_set_cpu(cpu, mask);
> + while (!cpumask_full(mask))
> + cpu_relax();
> +}
> +
> +static uint64_t sync_start(void)
> +{
> + const uint64_t gate_mask = ~0x7ff;
> + uint64_t gate, now;
> + gate = get_cntvct() & gate_mask;
> + do {
> + now = get_cntvct();
> + } while ((now & gate_mask) == gate);

I'm not really sure what this function is doing. Trying to
get synchronized timestamps between cpus?

> +
> + return now;
> +}
> +
> +static void do_page_writes(void)
> +{
> + unsigned int i, runs = 0;
> + int cpu = smp_

Re: [Qemu-devel] QEMU soundcards vulnerable to jack retasking?

2016-11-28 Thread Dr. David Alan Gilbert
* ban...@openmailbox.org (ban...@openmailbox.org) wrote:
> Recent security research shows that soundcards support surreptitiously
> switching line-out jacks into line-in by modifying the software stack. The
> way modern speakers and headphones are designed makes them readily usable as
> microphones. The Intel High Definition (HD) Audio standards which all modern
> consumer soundcards are based mandates this stupidity.
> 
> https://arxiv.org/ftp/arxiv/papers/1611/1611.07350.pdf
> 
> Does anyone know if QEMU's emulated sound devices follow this standard? If
> yes then a malicious guest that can modify the virt sound hardware can turn
> PC speakers into surveillance devices even if the microphone is disabled on
> the host. The only solution is completely denying untrusted VMs access to a
> virtual sound device.

I think it's reasonably isolated; the emulated audio controller ends up using
normal pulseaudio/alsa etc to talk to your host's audio system - so I don't
think it should be able to screw around with low level settings of the codecs.

Dave

> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] GPIO input?

2016-11-28 Thread Liviu Ionescu

> On 28 Nov 2016, at 11:34, Peter Maydell  wrote:
> 
> You might prefer to use the named-gpios ...


I already did so, thank you.

Liviu





Re: [Qemu-devel] [PATCH v3 for-2.9 0/3] q35: add negotiable broadcast SMI

2016-11-28 Thread Laszlo Ersek
On 11/28/16 10:01, Gerd Hoffmann wrote:
>   Hi,
> 
>> If I understand correctly, one argument against the current state of
>> writeable fw_cfg, captured in
>> , is
>> that callbacks on write are not supported. Apparently, QEMU code that
>> uses the data written by the guest is supposed to just read that data,
>> not to expect a notification about it.
>>
>> I'm unsure how this can work for actual negotiation, where the guest
>> usually does a read/write/read cycle, and expects some kind of change
>> between steps #2 and #3. I don't see how that can be implemented in QEMU
>> without write callbacks (i.e. how QEMU can confirm or reject the
>> negotiation attempt).
> 
> Do you actually need negotiation?  I think you only need to know
> whenever broadcast-smi is supported, and the presence of the
> etc/broadcast-smi (or however we name that) fw_cfg file indicates that.

Michael suggested to use negotiation like virtio does (where the host
can reject invalid combinations of requested features):

http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg03077.html
http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg03218.html

I thought negotiation was overkill:

http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg03123.html

but I figured I'd just stick with the recommendation.

> If it is there just write true/false to it to enable/disable, and qemu
> checks the field each time a smi is raised.

If we agree there's no need for negotiation, I'm game.

Thanks
Laszlo




Re: [Qemu-devel] [PATCH v7 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support

2016-11-28 Thread Ketan Nilangekar


On 11/25/16, 5:05 PM, "Stefan Hajnoczi"  wrote:

On Fri, Nov 25, 2016 at 08:27:26AM +, Ketan Nilangekar wrote:
> On 11/24/16, 9:38 PM, "Stefan Hajnoczi"  wrote:
> On Thu, Nov 24, 2016 at 11:31:14AM +, Ketan Nilangekar wrote:
> > On 11/24/16, 4:41 PM, "Stefan Hajnoczi"  wrote:
> > On Thu, Nov 24, 2016 at 05:44:37AM +, Ketan Nilangekar 
wrote:
> > > On 11/24/16, 4:07 AM, "Paolo Bonzini"  
wrote:
> > > >On 23/11/2016 23:09, ashish mittal wrote:
> > > >> On the topic of protocol security -
> > > >> 
> > > >> Would it be enough for the first patch to implement only
> > > >> authentication and not encryption?
> > > >
> > > >Yes, of course.  However, as we introduce more and more 
QEMU-specific
> > > >characteristics to a protocol that is already QEMU-specific 
(it doesn't
> > > >do failover, etc.), I am still not sure of the actual 
benefit of using
> > > >libqnio versus having an NBD server or FUSE driver.
> > > >
> > > >You have already mentioned performance, but the design has 
changed so
> > > >much that I think one of the two things has to change: 
either failover
> > > >moves back to QEMU and there is no (closed source) 
translator running on
> > > >the node, or the translator needs to speak a well-known and
> > > >already-supported protocol.
> > > 
> > > IMO design has not changed. Implementation has changed 
significantly. I would propose that we keep resiliency/failover code out of 
QEMU driver and implement it entirely in libqnio as planned in a subsequent 
revision. The VxHS server does not need to understand/handle failover at all. 
> > > 
> > > Today libqnio gives us significantly better performance than 
any NBD/FUSE implementation. We know because we have prototyped with both. 
Significant improvements to libqnio are also in the pipeline which will use 
cross memory attach calls to further boost performance. Ofcourse a big reason 
for the performance is also the HyperScale storage backend but we believe this 
method of IO tapping/redirecting can be leveraged by other solutions as well.
> > 
> > By "cross memory attach" do you mean
> > process_vm_readv(2)/process_vm_writev(2)?
> >   
> > Ketan> Yes.
> >   
> > That puts us back to square one in terms of security.  You have
> > (untrusted) QEMU + (untrusted) libqnio directly accessing the 
memory of
> > another process on the same machine.  That process is therefore 
also
> > untrusted and may only process data for one guest so that 
guests stay
> > isolated from each other.
> > 
> > Ketan> Understood but this will be no worse than the current 
network based communication between qnio and vxhs server. And although we have 
questions around QEMU trust/vulnerability issues, we are looking to implement 
basic authentication scheme between libqnio and vxhs server.
> 
> This is incorrect.
> 
> Cross memory attach is equivalent to ptrace(2) (i.e. debugger) access.
> It means process A reads/writes directly from/to process B memory.  
Both
> processes must have the same uid/gid.  There is no trust boundary
> between them.
> 
> Ketan> Not if vxhs server is running as root and initiating the cross mem 
attach. Which is also why we are proposing a basic authentication mechanism 
between qemu-vxhs. But anyway the cross memory attach is for a near future 
implementation. 
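
(For reference, "cross memory attach" here is the process_vm_readv(2) /
process_vm_writev(2) syscall pair; a minimal sketch of a read is below. The
kernel applies the same PTRACE_MODE_ATTACH permission check a debugger would,
which is the trust-boundary issue being discussed.)

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>

/* Copy 'len' bytes from 'remote_addr' in process 'pid' into 'buf'.
 * Succeeds only if the caller could also ptrace(2) the target
 * (same uid/gid, or CAP_SYS_PTRACE / running as root). */
static int read_remote(pid_t pid, void *remote_addr, void *buf, size_t len)
{
    struct iovec local  = { .iov_base = buf,         .iov_len = len };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = len };
    ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);

    return n == (ssize_t)len ? 0 : -1;
}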
> 
> Network communication does not require both processes to have the same
> uid/gid.  If you want multiple QEMU processes talking to a single 
server
> there must be a trust boundary between client and server.  The server
> can validate the input from the client and reject undesired 
operations.
> 
> Ketan> This is what we are trying to propose. With the addition of 
authentication between qemu-vxhs server, we should be able to achieve this. 
Question is, would that be acceptable?
> 
> Hope this makes sense now.
> 
> Two architectures that implement the QEMU trust model correctly are:
> 
> 1. Cross memory attach: each QEMU process has a dedicated vxhs server
>process to prevent guests from attacking each other.  This is 
where I
>said you might as well put the code inside QEMU since there is no
>isolation anyway.  From what you've said it sounds like the vxhs
>server needs a host-wide view and is responsible for all guests
>running on the host, so I guess we have to rule out this
>architecture.
> 
> 2. Network comm

Re: [Qemu-devel] [kvm-unit-tests PATCH v7 09/11] arm/locking-tests: add comprehensive locking test

2016-11-28 Thread Andrew Jones
On Thu, Nov 24, 2016 at 04:10:31PM +, Alex Bennée wrote:
> This test has been written mainly to stress multi-threaded TCG behaviour
> but will demonstrate failure by default on real hardware. The test takes
> the following parameters:
> 
>   - "lock" use GCC's locking semantics
>   - "atomic" use GCC's __atomic primitives
>   - "wfelock" use WaitForEvent sleep
>   - "excl" use load/store exclusive semantics
> 
> Also two more options allow the test to be tweaked
> 
>   - "noshuffle" disables the memory shuffling
>   - "count=%ld" set your own per-CPU increment count
> 
> Signed-off-by: Alex Bennée 
> 
> ---
> v2
>   - Don't use thumb style strexeq stuff
>   - Add atomic and wfelock tests
>   - Add count/noshuffle test controls
>   - Move barrier tests to separate test file
> v4
>   - fix up unitests.cfg to use correct test name
>   - move into "locking" group, remove barrier tests
>   - use a table to add tests, mark which are expected to work
>   - correctly report XFAIL
> v5
>   - max out at -smp 4 in unittest.cfg
> v7
>   - make test control flags bools
>   - default the count to 10 (so it doesn't timeout)
> ---
>  arm/Makefile.common |   2 +
>  arm/locking-test.c  | 302 
> 
>  arm/unittests.cfg   |  34 ++
>  3 files changed, 338 insertions(+)
>  create mode 100644 arm/locking-test.c
> 
> diff --git a/arm/Makefile.common b/arm/Makefile.common
> index 528166d..eb4cfdf 100644
> --- a/arm/Makefile.common
> +++ b/arm/Makefile.common
> @@ -15,6 +15,7 @@ tests-common += $(TEST_DIR)/pci-test.flat
>  tests-common += $(TEST_DIR)/gic.flat
>  tests-common += $(TEST_DIR)/tlbflush-code.flat
>  tests-common += $(TEST_DIR)/tlbflush-data.flat
> +tests-common += $(TEST_DIR)/locking-test.flat
>  
>  all: test_cases
>  
> @@ -85,3 +86,4 @@ test_cases: $(generated_files) $(tests-common) $(tests)
>  $(TEST_DIR)/selftest.o $(cstart.o): $(asm-offsets)
>  $(TEST_DIR)/tlbflush-code.elf: $(cstart.o) $(TEST_DIR)/tlbflush-code.o
>  $(TEST_DIR)/tlbflush-data.elf: $(cstart.o) $(TEST_DIR)/tlbflush-data.o
> +$(TEST_DIR)/locking-test.elf: $(cstart.o) $(TEST_DIR)/locking-test.o

Instead of adding a new test file, please extend the one we already have,
which iirc was the first MTTCG test, arm/spinlock-test.c. If you don't
like the naming or code in spinlock-test.c, then feel free to change it,
delete it. It's currently not getting run by arm/unittests.cfg, and it's
not getting maintained.

> diff --git a/arm/locking-test.c b/arm/locking-test.c
> new file mode 100644
> index 000..f10c61b
> --- /dev/null
> +++ b/arm/locking-test.c
> @@ -0,0 +1,302 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#define MAX_CPUS 8
> +
> +/* Test definition structure
> + *
> + * A simple structure that describes the test name, expected pass and
> + * increment function.
> + */
> +
> +/* Function pointers for test */
> +typedef void (*inc_fn)(int cpu);
> +
> +typedef struct {
> + const char *test_name;
> + bool  should_pass;
> + inc_fn main_fn;
> +} test_descr_t;
> +
> +/* How many increments to do */
> +static int increment_count = 100;
> +static bool do_shuffle = true;
> +
> +/* Shared value all the tests attempt to safely increment using
> + * various forms of atomic locking and exclusive behaviour.
> + */
> +static unsigned int shared_value;
> +
> +/* PAGE_SIZE * uint32_t means we span several pages */
> +__attribute__((aligned(PAGE_SIZE))) static uint32_t memory_array[PAGE_SIZE];
> +
> +/* We use the alignment of the following to ensure accesses to locking
> + * and synchronisation primitives don't interfere with the page of the
> + * shared value
> + */
> +__attribute__((aligned(PAGE_SIZE))) static unsigned int 
> per_cpu_value[MAX_CPUS];
> +__attribute__((aligned(PAGE_SIZE))) static cpumask_t smp_test_complete;
> +__attribute__((aligned(PAGE_SIZE))) struct isaac_ctx prng_context[MAX_CPUS];
> +
> +/* Some of the approaches use a global lock to prevent contention. */
> +static int global_lock;
> +
> +/* In any SMP setting this *should* fail due to cores stepping on
> + * each other updating the shared variable
> + */
> +static void increment_shared(int cpu)
> +{
> + (void)cpu;
> +
> + shared_value++;
> +}
> +
> +/* GCC __sync primitives are deprecated in favour of __atomic */
> +static void increment_shared_with_lock(int cpu)
> +{
> + (void)cpu;
> +
> + while (__sync_lock_test_and_set(&global_lock, 1));
> + shared_value++;
> + __sync_lock_release(&global_lock);
> +}
> +
> +/* In practice even __ATOMIC_RELAXED uses ARM's ldxr/stxr exclusive
> + * semantics */
> +static void increment_shared_with_atomic(int cpu)
> +{
> + (void)cpu;
> +
> + __atomic_add_fetch(&shared_value, 1, __ATOMIC_SEQ_CST);
> +}
> +
> +
> +/*
> + * Load/store exclusive with WFE (wait-for-event)
> + *
> + * See ARMv8 ARM examples:
> + *   Use of Wait For Event (WFE) and Send Event (SEV) with locks
> + */
> +
> +static void increm

Re: [Qemu-devel] QEMU soundcards vulnerable to jack retasking?

2016-11-28 Thread Gerd Hoffmann
On Fr, 2016-11-25 at 21:25 +0100, ban...@openmailbox.org wrote:
> Recent security research shows that soundcards support surreptitiously 
> switching line-out jacks into line-in by modifying the software stack. 
> The way modern speakers and headphones are designed makes them readily 
> usable as microphones. The Intel High Definition (HD) Audio standards 
> which all modern consumer soundcards are based mandates this stupidity.
> 
> https://arxiv.org/ftp/arxiv/papers/1611/1611.07350.pdf
> 
> Does anyone know if QEMU's emulated sound devices follow this standard? 

No.  Output line is output only and input line is input only, period.

There are three qemu hda codecs available:

  hda-output
only playback (line-out).  No way the guest can record anything.

  hda-duplex
record (line-in) and playback (line-out).  Use that if you need
sound recording in the guest.

  hda-micro
record (micro) and playback (speaker).  Same as hda-duplex, except
that the record and playback channels are tagged differently.  Use
that if your guests app is picky and refuses to record from line-in.

Where the data for the record channel comes from is subject to host
configuration.  On a typical linux system it'll probably being
pulseaudio, either directly or indirectly (via spice).

cheers,
  Gerd




Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases

2016-11-28 Thread Andrew Jones
On Thu, Nov 24, 2016 at 04:10:22PM +, Alex Bennée wrote:
> Hi,
> 
> Looking at my records it seems as though it has been a while since I
> last posted these tests. As I'm hoping to get the final bits of MTTCG
> merged upstream on the next QEMU development cycle I've been re-basing
> these and getting them cleaned up for merging.
> 
> Some of the patches might be worth taking now if the maintainers are
> happy to do so (run_test tweaks, libcflat updates?). The others could
> do with more serious review. I've CC'd some of the ARM guys to look
> over the tlbflush/barrier tests so they can cast their expert eyes
> over them ;-)
> 
> There are two additions to the series.
> 
> The tcg-test is a general torture test aimed at QEMU's TCG execution
> model. It stresses the cpu execution loop through the use of
> cross-page and computed jumps. It can also add IRQ's and self-modifying
> code to the mix.
> 
> The tlbflush-data test is a new one; the old tlbflush test is renamed
> tlbflush-code to better indicate the code path it exercises. The
> code test exercises the translation invalidation pathways in QEMU; the data
> test exercises the SoftMMU's TLBs and explicitly checks that tlbflush
> completion semantics are correct.
> 
> The tlbflush-data test passes most of the time on real hardware but
> definitely showed the problem with deferred TLB flushes running under
> MTTCG QEMU. I've looked at some of the failure cases on real hardware
> and it did look like a timestamp appeared on a page that shouldn't
> have been accessible at the time - I don't know if this is a real
> silicon bug or my misreading of the semantics so I'd appreciate
> a comment from the experts.
> 
> The code needs to be applied on top of Drew's latest ARM GIC patches
> or you can grab my tree from:
> 
>   https://github.com/stsquad/kvm-unit-tests/tree/mttcg/current-tests-v7

Thanks Alex,

I've skimmed over everything looking at it from a framework/style
perspective. I didn't dig in trying to understand the tests though.
One general comment: I see many tests introduce MAX_CPUS 8. Why do
that? Why not allow all cpus by using NR_CPUS for the array sizes?

Thanks,
drew

> 
> Cheers,
> 
> Alex.
> 
> Alex Bennée (11):
>   run_tests: allow forcing of acceleration mode
>   run_tests: allow disabling of timeouts
>   run_tests: allow passing of options to QEMU
>   libcflat: add PRI(dux)32 format types
>   lib: add isaac prng library from CCAN
>   arm/Makefile.common: force -fno-pic
>   arm/tlbflush-code: Add TLB flush during code execution test
>   arm/tlbflush-data: Add TLB flush during data writes test
>   arm/locking-tests: add comprehensive locking test
>   arm/barrier-litmus-tests: add simple mp and sal litmus tests
>   arm/tcg-test: some basic TCG exercising tests
> 
>  Makefile  |   2 +
>  arm/Makefile.arm  |   2 +
>  arm/Makefile.arm64|   2 +
>  arm/Makefile.common   |  11 ++
>  arm/barrier-litmus-test.c | 437 
> ++
>  arm/locking-test.c| 302 
>  arm/tcg-test-asm.S| 170 ++
>  arm/tcg-test-asm64.S  | 169 ++
>  arm/tcg-test.c| 337 +++
>  arm/tlbflush-code.c   | 212 ++
>  arm/tlbflush-data.c   | 401 ++
>  arm/unittests.cfg | 190 
>  lib/arm/asm/barrier.h |  63 ++-
>  lib/arm64/asm/barrier.h   |  50 ++
>  lib/libcflat.h|   5 +
>  lib/prng.c| 162 +
>  lib/prng.h|  82 +
>  run_tests.sh  |  18 +-
>  scripts/functions.bash|  13 +-
>  scripts/runtime.bash  |   8 +
>  20 files changed, 2626 insertions(+), 10 deletions(-)
>  create mode 100644 arm/barrier-litmus-test.c
>  create mode 100644 arm/locking-test.c
>  create mode 100644 arm/tcg-test-asm.S
>  create mode 100644 arm/tcg-test-asm64.S
>  create mode 100644 arm/tcg-test.c
>  create mode 100644 arm/tlbflush-code.c
>  create mode 100644 arm/tlbflush-data.c
>  create mode 100644 lib/prng.c
>  create mode 100644 lib/prng.h
> 
> -- 
> 2.10.1
> 
> 



Re: [Qemu-devel] QEMU soundcards vulnerable to jack retasking?

2016-11-28 Thread Daniel P. Berrange
On Mon, Nov 28, 2016 at 10:19:16AM +, Dr. David Alan Gilbert wrote:
> * ban...@openmailbox.org (ban...@openmailbox.org) wrote:
> > Recent security research shows that soundcards support surreptitiously
> > switching line-out jacks into line-in by modifying the software stack. The
> > way modern speakers and headphones are designed makes them readily usable as
> > microphones. The Intel High Definition (HD) Audio standards which all modern
> > consumer soundcards are based mandates this stupidity.
> > 
> > https://arxiv.org/ftp/arxiv/papers/1611/1611.07350.pdf
> > 
> > Does anyone know if QEMU's emulated sound devices follow this standard? If
> > yes then a malicious guest that can modify the virt sound hardware can turn
> > PC speakers into surveillance devices even if the microphone is disabled on
> > the host. The only solution is completely denying untrusted VMs access to a
> > virtual sound device.
> 
> I think it's reasonably isolated; the emulated audio controller ends up using
> normal pulseaudio/alsa etc to talk to your host's audio system - so I don't
> think it should be able to screw around with low level settings of the codecs.

Further, the admin has to make an explicit decision to allow the guest to
have any audio access at all, by explicitly configuring a virtual sound
card.  Some sound card models QEMU supports (ac97, es1370) are always
full duplex, meaning they always allow the guest to do both audio input
and output. Other models (intel-hda) can be set up in either full-duplex
or half-duplex mode, at the host administrator's discretion. In the
half-duplex mode, my reading of the code indicates that it is impossible to
reach the code paths for audio input, so no matter what the guest tries
to do to the virtual sound card, it seems it will only ever be able to
output audio, never get audio input.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-devel] [PATCH] target-arm: Add VBAR support to ARM1176 CPUs

2016-11-28 Thread Peter Maydell
On 24 November 2016 at 14:29, Cédric Le Goater  wrote:
> On 09/05/2016 04:39 PM, Peter Maydell wrote:
>> We implement VBAR in v7-without-EL3 even though architecturally
>> it should only exist in v7-with-EL3 because we have some
>> legacy board models which we implement as without-EL3 but
>> where the guest (Linux) assumes it's running on a with-EL3
>> CPU in NS mode. I'd rather we stick to the architectural
>> definition for 1176 if we can rather than expanding the
>> "things we do which aren't architectural" set if there's
>> no legacy config that requires it. (If it is necessary
>> to define it for 1176 always then that's OK, I'd just
>> prefer not to unless we know we have to.)
>
> According to the ARM spec, 1176 is a v6 with a couple of
> extensions, among which are v6k and TrustZone. So it has Secure
> and Non-Secure VBAR and MVBAR registers.
>
> Looking at :
>
>   https://en.wikipedia.org/wiki/List_of_ARM_microarchitectures
>
> May be, we should instead introduce a new ARMv6Z architecture
> for the 1176 cpu which would represent a ARMv6 + TrustZone ?
>
>
> But I am a bit confused with this EL3 feature. 1176 does not
> have EL3 to my understanding (I am beyond my ARM skills there).

EL3 == TrustZone's Monitor mode level of privilege. In
ARMv8 the privilege level model was tidied up a bit so it's
more cleanly arranged in levels EL0 (user) EL1 (kernel)
EL2 (hypervisor) EL3 (secure monitor), and the old 32-bit
setup can be expressed in terms of that (User being EL0,
the various Sys/Fiq/Irq/etc modes being EL1, Hyp at EL2
and Mon at EL3). For QEMU we implemented TrustZone support
after the v8 spec was published, so we had the opportunity
to do it in a way that fit nicely with 64-bit TZ support.
So ARM_FEATURE_EL3 is our "does this CPU have TrustZone"
flag, in the same way that ARM_FEATURE_EL2 is "does it have
the Virtualization extensions".

We also support 1176-without-TrustZone, which doesn't
exist in real hardware but which we have for backwards
compatibility with code that runs on our 1176 boards
which we implemented before we had TZ support in the
CPU implementation.

What I'm suggesting is that we should provide VBAR
only in the 1176-with-TZ CPUs, not in the 1176-no-TZ
CPUs, unless we must provide it for the latter as a
back-compat requirement.

> So why is it part of the feature list in arm1176_initfn ?
> Is that a way to define the MVBAR and SCR regs ?

The CPU object defaults to TZ-enabled, and then the
board model can disable it. (See for instance the
code in hw/arm/realview.c which sets the has_el3 property
on the CPU object to false.)
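
In rough outline the board-side pattern looks something like this (a
sketch, not the literal realview.c code; assume cpu_oc came from
cpu_class_by_name()):

#include "qom/object.h"
#include "qapi/error.h"

/* Sketch: create a CPU object with the security extensions (EL3/TZ)
 * disabled, if this CPU type exposes the "has_el3" property. */
static Object *create_cpu_without_tz(ObjectClass *cpu_oc)
{
    Object *cpuobj = object_new(object_class_get_name(cpu_oc));

    if (object_property_find(cpuobj, "has_el3", NULL)) {
        object_property_set_bool(cpuobj, false, "has_el3", &error_fatal);
    }
    object_property_set_bool(cpuobj, true, "realized", &error_fatal);
    return cpuobj;
}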

thanks
-- PMM



Re: [Qemu-devel] [PATCH v2 4/5] slirp: VMStatify socket level

2016-11-28 Thread Samuel Thibault
Hello,

Dr. David Alan Gilbert, on Mon 28 Nov 2016 09:08:16 +, wrote:
> Hmm, I don't really know IPv6 but I'm thinking this code will become 
> something like
> the following (says he not knowing whether a scope-id or a flowinfo
> is something that needs migrating) (untested):
> 
> 
> static const VMStateDescription vmstate_slirp_socket_addr = {
> .name = "slirp-socket-addr",
> .version_id = 4,
> .fields = (VMStateField[]) {
> VMSTATE_UINT16(ss.ss_family, union slirp_sockaddr),
> VMSTATE_UINT32_TEST(sin.sin_addr.s_addr, union slirp_sockaddr,
> slirp_family_inet),
> VMSTATE_UINT16_TEST(sin.sin_port, union slirp_sockaddr,
> slirp_family_inet),
> 
> VMSTATE_BUFFER_TEST(sin6.sin6_addr, union slirp_sockaddr,
> slirp_family_inet6),
> VMSTATE_UINT16_TEST(sin6.sin6_port, union slirp_sockaddr,
> slirp_family_inet6),
> VMSTATE_UINT32_TEST(sin6.sin6_flowinfo, union slirp_sockaddr,
> slirp_family_inet6),
> VMSTATE_UINT32_TEST(sin6.sin6_scope_id, union slirp_sockaddr,
> slirp_family_inet6),
> 
> VMSTATE_END_OF_LIST()
> }
> };

Ok, that looks good :)

> So to me that looks pretty clean, we need to add another slirp_family_inet6
> test function, but then we just add the extra fields for the IPv6 stuff.

Yes, now I see.

> Can you suggest an IPv6 command line for testing that ?

Well, it doesn't exist yet, that's the problem. And applying your patch
would have made it even harder to add, so that's why I was afraid.

I would say that your patch should contain these IPv6 lines, but as
comments since they are untested, for the person who will at some point
want to implement the IPv6 case here.

Samuel



Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases

2016-11-28 Thread Andrew Jones
On Thu, Nov 24, 2016 at 04:10:22PM +, Alex Bennée wrote:
> Hi,
> 
> Looking at my records it seems as though it has been a while since I
> last posted these tests. As I'm hoping to get the final bits of MTTCG
> merged upstream on the next QEMU development cycle I've been re-basing
> these and getting them cleaned up for merging.
> 
> Some of the patches might be worth taking now if the maintainers are
> happy to do so (run_test tweaks, libcflat updates?). The others could
> do with more serious review. I've CC'd some of the ARM guys to look
> over the tlbflush/barrier tests so they can cast their expert eyes
> over them ;-)
> 
> There are two additions to the series.
> 
> The tcg-test is a general torture test aimed at QEMU's TCG execution
> model. It stresses the cpu execution loop through the use of
> cross-page and computed jumps. It can also add IRQ's and self-modifying
> code to the mix.
> 
> The tlbflush-data test is a new one, the old tlbflush test is renamed
> tlbflush-code to better indicate the code path it exercises. The
> code test exercises the translation invalidation pathways in QEMU; the data
> test exercises the SoftMMU's TLBs and explicitly checks that tlbflush
> completion semantics are correct.
> 
> The tlbflush-data test passes most of the time on real hardware but
> definitely showed the problem with deferred TLB flushes running under
> MTTCG QEMU. I've looked at some of the failure cases on real hardware
> and it did look like a timestamp appeared on a page that shouldn't
> have been accessible at the time - I don't know if this is a real
> silicon bug or my misreading of the semantics so I'd appreciate
> a comment from the experts.

One other thought. I'm not sure how best to approach a bunch of TCG-only
tests getting integrated. I'm thinking it might be nice to give them
their own subdir under the arch dir, e.g. arm/tcg. That subdir would
have its own unittests.cfg file too. Otherwise when we run on KVM we'll
have a load of "SKIP: requires TCG" type messages...

We'll want to add a run_tests.sh option to pass the name of the subdir,
'-d tcg'. When the subdir name is 'tcg' ACCEL could automatically be
switched to 'tcg' as well.

Thanks,
drew



Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases

2016-11-28 Thread Alex Bennée

Andrew Jones  writes:

> On Thu, Nov 24, 2016 at 04:10:22PM +, Alex Bennée wrote:
>> Hi,
>>
>> Looking at my records it seems as though it has been a while since I
>> last posted these tests. As I'm hoping to get the final bits of MTTCG
>> merged upstream on the next QEMU development cycle I've been re-basing
>> these and getting them cleaned up for merging.
>>
>> Some of the patches might be worth taking now if the maintainers are
>> happy to do so (run_test tweaks, libcflat updates?). The others could
>> do with more serious review. I've CC'd some of the ARM guys to look
>> over the tlbflush/barrier tests so they can cast their expert eyes
>> over them ;-)
>>
>> There are two additions to the series.
>>
>> The tcg-test is a general torture test aimed at QEMU's TCG execution
>> model. It stresses the cpu execution loop through the use of
>> cross-page and computed jumps. It can also add IRQ's and self-modifying
>> code to the mix.
>>
>> The tlbflush-data test is a new one, the old tlbflush test is renamed
>> tlbflush-code to better indicate the code path it exercises. The
>> code test exercises the translation invalidation pathways in QEMU; the data
>> test exercises the SoftMMU's TLBs and explicitly checks that tlbflush
>> completion semantics are correct.
>>
>> The tlbflush-data test passes most of the time on real hardware but
>> definitely showed the problem with deferred TLB flushes running under
>> MTTCG QEMU. I've looked at some of the failure cases on real hardware
>> and it did look like a timestamp appeared on a page that shouldn't
>> have been accessible at the time - I don't know if this is a real
>> silicon bug or my misreading of the semantics so I'd appreciate
>> a comment from the experts.
>>
>> The code needs to be applied on top of Drew's latest ARM GIC patches
>> or you can grab my tree from:
>>
>>   https://github.com/stsquad/kvm-unit-tests/tree/mttcg/current-tests-v7
>
> Thanks Alex,
>
> I've skimmed over everything looking at it from a framework/style
> perspective. I didn't dig in trying to understand the tests though.
> One general comment, I see many tests introduce MAX_CPUS 8. Why do
> that? Why not allow all cpus by using NR_CPUS for the array sizes?

Yeah - I can fix those. I wonder what the maximum is with GIC V3?
>
> Thanks,
> drew
>
>>
>> Cheers,
>>
>> Alex.
>>
>> Alex Bennée (11):
>>   run_tests: allow forcing of acceleration mode
>>   run_tests: allow disabling of timeouts
>>   run_tests: allow passing of options to QEMU
>>   libcflat: add PRI(dux)32 format types
>>   lib: add isaac prng library from CCAN
>>   arm/Makefile.common: force -fno-pic
>>   arm/tlbflush-code: Add TLB flush during code execution test
>>   arm/tlbflush-data: Add TLB flush during data writes test
>>   arm/locking-tests: add comprehensive locking test
>>   arm/barrier-litmus-tests: add simple mp and sal litmus tests
>>   arm/tcg-test: some basic TCG exercising tests
>>
>>  Makefile  |   2 +
>>  arm/Makefile.arm  |   2 +
>>  arm/Makefile.arm64|   2 +
>>  arm/Makefile.common   |  11 ++
>>  arm/barrier-litmus-test.c | 437 
>> ++
>>  arm/locking-test.c| 302 
>>  arm/tcg-test-asm.S| 170 ++
>>  arm/tcg-test-asm64.S  | 169 ++
>>  arm/tcg-test.c| 337 +++
>>  arm/tlbflush-code.c   | 212 ++
>>  arm/tlbflush-data.c   | 401 ++
>>  arm/unittests.cfg | 190 
>>  lib/arm/asm/barrier.h |  63 ++-
>>  lib/arm64/asm/barrier.h   |  50 ++
>>  lib/libcflat.h|   5 +
>>  lib/prng.c| 162 +
>>  lib/prng.h|  82 +
>>  run_tests.sh  |  18 +-
>>  scripts/functions.bash|  13 +-
>>  scripts/runtime.bash  |   8 +
>>  20 files changed, 2626 insertions(+), 10 deletions(-)
>>  create mode 100644 arm/barrier-litmus-test.c
>>  create mode 100644 arm/locking-test.c
>>  create mode 100644 arm/tcg-test-asm.S
>>  create mode 100644 arm/tcg-test-asm64.S
>>  create mode 100644 arm/tcg-test.c
>>  create mode 100644 arm/tlbflush-code.c
>>  create mode 100644 arm/tlbflush-data.c
>>  create mode 100644 lib/prng.c
>>  create mode 100644 lib/prng.h
>>
>> --
>> 2.10.1
>>
>>


--
Alex Bennée



Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases

2016-11-28 Thread Peter Maydell
On 28 November 2016 at 11:12, Alex Bennée  wrote:
>
> Andrew Jones  writes:
>> I've skimmed over everything looking at it from a framework/style
>> perspective. I didn't dig in trying to understand the tests though.
>> One general comment, I see many tests introduce MAX_CPUS 8. Why do
>> that? Why not allow all cpus by using NR_CPUS for the array sizes?
>
> Yeah - I can fix those. I wonder what the maximum is with GIC V3?

So large that you don't want to hardcode it as an array size...

thanks
-- PMM



Re: [Qemu-devel] [Nbd] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS extension

2016-11-28 Thread Stefan Hajnoczi
On Sun, Nov 27, 2016 at 08:17:14PM +0100, Wouter Verhelst wrote:
> Quickly: the reason I haven't merged this yet is twofold:
> - I wasn't thrilled with the proposal at the time. It felt a bit
>   hackish, and bolted onto NBD so you could use it, but without defining
>   everything in the NBD protocol. "We're reading some data, but it's not
>   about you". That didn't feel right
>
> - There were a number of questions still unanswered (you're answering a
>   few below, so that's good).
> 
> For clarity, I have no objection whatsoever to adding more commands if
> they're useful, but I would prefer that they're also useful with NBD on
> its own, i.e., without requiring an initiation or correlation of some
> state through another protocol or network connection or whatever. If
> that's needed, that feels like I didn't do my job properly, if you get
> my point.

The out-of-band operations you are referring to are for dirty bitmap
management.  (The goal is to read out blocks that changed since the last
backup.)

The client does not access the live disk, instead it accesses a
read-only snapshot and the dirty information (so that it can copy out
only blocks that were written).  The client is allowed to read blocks
that are not dirty too.

If you want to implement the whole incremental backup workflow in NBD
then the client would first have to connect to the live disk, set up
dirty tracking, create a snapshot export, and then connect to that
snapshot.

That sounds like a big feature set and I'd argue it's for the control
plane (storage API) and not the data plane (NBD).  There were
discussions about transferring the dirty information via the control
plane but it seems more appropriate to do it in the data plane since it is
block-level information.

I'm arguing that the NBD protocol doesn't need to support the
incremental backup workflow since it's a complex control plane concept.

Being able to read dirty information via NBD is useful for other block
backup applications, not just QEMU.  It could be used for syncing LVM
volumes across machines, for example, if someone implements an NBD+LVM
server.

Another issue with adding control plane operations is that you need to
begin considering privilege separation.  Should all NBD clients be able
to initiate snapshots, dirty tracking, etc or is some kind of access
control required to limit certain commands?  Not all clients require the
same privileges and so they shouldn't have access to the same set of
operations.

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [kvm-unit-tests PATCH v7 03/11] run_tests: allow passing of options to QEMU

2016-11-28 Thread Alex Bennée

Andrew Jones  writes:

> On Thu, Nov 24, 2016 at 04:10:25PM +, Alex Bennée wrote:
>> This introduces a the option -o for passing of options directly to QEMU
>> which is useful. In my case I'm using it to toggle MTTCG on an off:
>>
>>   ./run_tests.sh -t -o "-tcg mttcg=on"
>>
>> Signed-off-by: Alex Bennée 
>> ---
>>  run_tests.sh   | 10 +++---
>>  scripts/functions.bash | 13 +++--
>>  2 files changed, 14 insertions(+), 9 deletions(-)
>>
>> diff --git a/run_tests.sh b/run_tests.sh
>> index 4f2e5cb..05cc7fb 100755
>> --- a/run_tests.sh
>> +++ b/run_tests.sh
>> @@ -13,10 +13,11 @@ function usage()
>>  {
>>  cat <<EOF
>> -Usage: $0 [-g group] [-a accel] [-t] [-h] [-v]
>> +Usage: $0 [-g group] [-a accel] [-o qemu_opts] [-t] [-h] [-v]
>>
>>  -g: Only execute tests in the given group
>>  -a: Force acceleration mode (tcg/kvm)
>> +-o: additional options for QEMU command line
>>  -t: disable timeouts
>>  -h: Output this help text
>>  -v: Enables verbose mode
>> @@ -30,7 +31,7 @@ EOF
>>  RUNTIME_arch_run="./$TEST_DIR/run"
>>  source scripts/runtime.bash
>>
>> -while getopts "g:a:thv" opt; do
>> +while getopts "g:a:o:thv" opt; do
>>  case $opt in
>>  g)
>>  only_group=$OPTARG
>> @@ -38,6 +39,9 @@ while getopts "g:a:thv" opt; do
>>  a)
>>  force_accel=$OPTARG
>>  ;;
>> +o)
>> +extra_opts=$OPTARG
>> +;;
>>  t)
>>  no_timeout="yes"
>>  ;;
>> @@ -67,4 +71,4 @@ RUNTIME_log_stdout () {
>>  config=$TEST_DIR/unittests.cfg
>>  rm -f test.log
>>  printf "BUILD_HEAD=$(cat build-head)\n\n" > test.log
>> -for_each_unittest $config run
>> +for_each_unittest $config run "$extra_opts"
>> diff --git a/scripts/functions.bash b/scripts/functions.bash
>> index ee9143c..d38a69e 100644
>> --- a/scripts/functions.bash
>> +++ b/scripts/functions.bash
>> @@ -2,11 +2,12 @@
>>  function for_each_unittest()
>>  {
>>  local unittests="$1"
>> -local cmd="$2"
>> -local testname
>> +local cmd="$2"
>> +local extra_opts=$3
>> +local testname
>
> We use tabs in this file. Not sure why cmd and testname got
> changed too...
>
>>  local smp
>>  local kernel
>> -local opts
>> +local opts=$extra_opts
>>  local groups
>>  local arch
>>  local check
>> @@ -21,7 +22,7 @@ function for_each_unittest()
>>  testname=${BASH_REMATCH[1]}
>>  smp=1
>>  kernel=""
>> -opts=""
>> +opts=$extra_opts
>>  groups=""
>>  arch=""
>>  check=""
>> @@ -32,7 +33,7 @@ function for_each_unittest()
>>  elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then
>>  smp=${BASH_REMATCH[1]}
>>  elif [[ $line =~ ^extra_params\ *=\ *(.*)$ ]]; then
>> -opts=${BASH_REMATCH[1]}
>> +opts="$opts ${BASH_REMATCH[1]}"
>>  elif [[ $line =~ ^groups\ *=\ *(.*)$ ]]; then
>>  groups=${BASH_REMATCH[1]}
>>  elif [[ $line =~ ^arch\ *=\ *(.*)$ ]]; then
>> @@ -45,6 +46,6 @@ function for_each_unittest()
>>  timeout=${BASH_REMATCH[1]}
>>  fi
>>  done
>> -"$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" 
>> "$accel" "$timeout"
>> +"$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" 
>> "$check" "$accel" "$timeout"
>>  exec {fd}<&-
>>  }
>> --
>> 2.10.1
>>
>>
>
> This is a pretty good idea, but I think I might like the extra options
> to be given like this instead
>
>   ./run_tests.sh [run_tests.sh options] -- [qemu options]
>
> Thanks,
> drew

That sounds like a better way, I'll fix that.

--
Alex Bennée



Re: [Qemu-devel] [PATCH v3 for-2.9 0/3] q35: add negotiable broadcast SMI

2016-11-28 Thread Igor Mammedov
On Mon, 28 Nov 2016 10:41:42 +0100
Paolo Bonzini  wrote:

> On 25/11/2016 15:10, Igor Mammedov wrote:
> > On Fri, 25 Nov 2016 03:55:29 -0500 (EST)
> > Paolo Bonzini  wrote:  
> >>> if 0x3 were covered by SMRR range, then OSPM wouldn't able to
> >>> place its own code there and there wouldn't be any need in side interfaces
> >>> to put and get CPU in/from some undefined by spec state (parked).
> >>>
> >>> But above would imply a large block allocated at 0x3 to fit all
> >>> possible CPUs+1, not sure if it's doable (maybe edk2 wouldn't have
> >>> big issues with reserving large block in lowmem).
> >>
> >> If you mean that the default SMRR range would include 0x3 for an 
> >> hotplugged
> >> CPU, that would work but it is a significant departure from real hardware.
> >> I'd rather avoid that.  
> > 
> > But that's guest side only solution to guest problem, that won't require
> > any assistance from QEMU/KVM.  
> 
> No, I don't think it can be guest-only.  The initial value of SMRRs is
> undefined, and SMRRs are per processor.  The newly-hotplugged CPU has no
> SMRRs defined when it is started up.
Does it matter?
If 0x3 is protected by SMRRs on existing CPUs so OS can't tamper with it
and SMI is injected on hotplug
(i.e. hotplugged CPU arrives with SMI pin active, we can do this without
inventing PV solution to park/unpark CPU).

Then even if OS sends INIT/SIPI with code to hijack SMRAM that code
won't get a chance to run unrestricted as pending SMI will be processed
first and SMRAM will be relocated by firmware to per CPU area along with
configuring the hotplugged CPU's SMRR before malicious SIPI is even executed.


> > Baremetal would also benefit from it as it won't need to implement unpark
> > logic anymore. it should also work for existing HW that has unpark logic.
> > 
> > Do you have any pointers to how hardware does unparking now?  
> 
> The firmware tells the BMC to do it.  I don't know what the firmware-BMC
> interface looks like.
> 
> >>> It looks like we need only SMM accessible guest/host interface to make
> >>> CPU unparking secure or cover default SMBASE by SMRR.
> >>
> >> Yes, unparking would be accessible only from SMM if the unparking feature
> >> is negotiated.  
> > 
> > I suppose we could check in MMIO handler that all CPUs are in SMM mode
> > before allowing unparking or ignore command if they are not.
> > 
> > For compat reasons unpark should be optin feature (i.e. firmware should
> > explicitly enable it to avoid breaking existing configurations (SeaBIOS))  
> 
> Yes, of course---that's why I'm bringing up in the context of this series.
> 
> Paolo




Re: [Qemu-devel] [PATCH v1 1/1] generic-loader: file: Only set a PC if a CPU is specified

2016-11-28 Thread Peter Maydell
On 16 November 2016 at 22:02, Alistair Francis
 wrote:
> On Fri, Nov 11, 2016 at 8:06 PM, Edgar E. Iglesias
>  wrote:
>> On Fri, Nov 11, 2016 at 06:51:20PM -0800, Alistair Francis wrote:
>>> This patch fixes the generic-loader file loading to only set the program
>>> counter if a CPU is specified. This follows what is written in the
>>> documentation and was always part of the original intention.
>>
>> Reviewed-by: Edgar E. Iglesias 
>
> Peter can this go through your queue?
>
> I think it should make it into 2.8 as it matches the documentation and
> this will be the first release with the device loader.

Yes, I think this counts as a bugfix -- applied to target-arm.next.

-- PMM



Re: [Qemu-devel] [PATCH v2 1/1] cadence_uart: Check baud rate generator and divider values on migration

2016-11-28 Thread Peter Maydell
On 8 November 2016 at 00:34, Alistair Francis
 wrote:
> The Cadence UART device emulator calculates speed by dividing the
> baud rate by a 'baud rate generator' & 'baud rate divider' value.
> The device specification defines these register values to be
> non-zero and within certain limits. Checks were recently added when
> writing to these registers but not when restoring from migration.
>
> This patch adds checks when restoring from migration to avoid divide by
> zero errors.
>
> Reported-by: Huawei PSIRT 
> Signed-off-by: Alistair Francis 
> ---
> V2:
>  - Abort the migration if the data is invalid
>
>  hw/char/cadence_uart.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/char/cadence_uart.c b/hw/char/cadence_uart.c
> index def34cd..9568ac6 100644
> --- a/hw/char/cadence_uart.c
> +++ b/hw/char/cadence_uart.c
> @@ -487,6 +487,13 @@ static int cadence_uart_post_load(void *opaque, int 
> version_id)
>  {
>  CadenceUARTState *s = opaque;
>
> +/* Ensure these two aren't invalid numbers */
> +if (s->r[R_BRGR] <= 1 || s->r[R_BRGR] & 0xFFFF ||
> +s->r[R_BDIV] <= 3 || s->r[R_BDIV] & 0xFF) {
> +/* Value is invalid, abort */
> +return 1;
> +}

The "s->r[R_BRGR] & 0x" and "s->r[R_BDIV] & 0xFF" checks
look wrong -- it's ok for the low bits to be set, it's only
if high bits are set that we want to abort the migration.
Missing '~'s ? (I'm surprised this bug doesn't cause migration
to fail every time.)
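
Something like this is presumably what was intended (a sketch; I'm
assuming, as I read the spec, that BRGR is a 16-bit field and BDIV an
8-bit one), keeping your minimum-value checks and only rejecting values
with reserved high bits set:

    /* Sketch with the missing '~'s added */
    if (s->r[R_BRGR] <= 1 || s->r[R_BRGR] & ~0xFFFF ||
        s->r[R_BDIV] <= 3 || s->r[R_BDIV] & ~0xFF) {
        /* Value is invalid, abort */
        return 1;
    }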

thanks
-- PMM



Re: [Qemu-devel] [Qemu-arm] [PATCH] arm: Create /chosen and /memory devicetree nodes if necessary

2016-11-28 Thread Peter Maydell
On 17 November 2016 at 01:30, Guenter Roeck  wrote:
> While customary, the /chosen and /memory devicetree nodes do not have to
> exist. Create if necessary. Also create the /memory/device_type property
> if needed.
>
> Signed-off-by: Guenter Roeck 
> ---
> The problem is seen with the latest version of the Linux kernel in
> linux-next (next-20161116), where many of the /chosen and /memory nodes
> in arm devicetree files have been removed. This results in a kernel hang
> (sabrelite) or qemu abort (imx25-pdk).

The lack of a memory node sounds to me like a bug in the Linux device
trees. The 0.1 specification at http://www.devicetree.org/specifications/
says on page 21 "the following nodes shall be present at the root of all
devicetrees: [...] At least one memory node".

There's no harm in QEMU being robust against bogus input, though,
especially if the dtbs are common in the wild, so this patch is
worth having. (And 'chosen' is definitely optional.)

I've applied this patch to target-arm.next.

thanks
-- PMM



Re: [Qemu-devel] [PATCH v3 for-2.9 0/3] q35: add negotiable broadcast SMI

2016-11-28 Thread Paolo Bonzini


On 28/11/2016 12:24, Igor Mammedov wrote:
> On Mon, 28 Nov 2016 10:41:42 +0100
> Paolo Bonzini  wrote:
> 
>> On 25/11/2016 15:10, Igor Mammedov wrote:
>>> On Fri, 25 Nov 2016 03:55:29 -0500 (EST)
>>> Paolo Bonzini  wrote:  
> if 0x3 were covered by SMRR range, then OSPM wouldn't able to
> place its own code there and there wouldn't be any need in side interfaces
> to put and get CPU in/from some undefined by spec state (parked).
>
> But above would imply a large block allocated at 0x3 to fit all
> possible CPUs+1, not sure if it's doable (maybe edk2 wouldn't have
> big issues with reserving large block in lowmem).

 If you mean that the default SMRR range would include 0x3 for an 
 hotplugged
 CPU, that would work but it is a significant departure from real hardware.
 I'd rather avoid that.  
>>>
>>> But that's guest side only solution to guest problem, that won't require
>>> any assistance from QEMU/KVM.  
>>
>> No, I don't think it can be guest-only.  The initial value of SMRRs is
>> undefined, and SMRRs are per processor.  The newly-hotplugged CPU has no
>> SMRRs defined when it is started up.
> 
> Does it matter?
> If 0x3 is protected by SMRRs on existing CPUs so OS can't tamper with it
> and SMI is injected on hotplug
> (i.e. hotplugged CPU arrives with SMI pin active, we can do this without
> inventing PV solution to park/unpark CPU).

If you mean placing TSEG at 0x3 no, that won't work.  You need much
more memory than that.

If you need placing SMRRs there because KVM doesn't need SMRRs to
overlap TSEG, there are problems:

0) KVM doesn't implement SMRRs yet anyway. :)

1) while it's true that we don't need SMRRs, on Haswell and newer
processors SMRRs also provide the range of physical addresses from which
you can execute in SMM ("SMM Code Access Check").  We do not
implement that, but it's planned.

2) I would still not bet on not having other vulnerabilities hidden.
For example, can the OS try to have two hotplugs run the initial SMM in
parallel?

3) It's really ugly and only works for virt.  I wouldn't even call the
alternative PV.  This thing is not part of the hardware specs; as long
as ACPI can describe it, it's not PV, it's firmware.

Paolo



Re: [Qemu-devel] [PATCH v3 for-2.9 0/3] q35: add negotiable broadcast SMI

2016-11-28 Thread Paolo Bonzini


On 28/11/2016 11:22, Laszlo Ersek wrote:
> Michael suggested to use negotiation like virtio does (where the host
> can reject invalid combinations of requested features):
> 
> http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg03077.html
> http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg03218.html
> 
> I thought negotiation was overkill:
> 
> http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg03123.html
> 
> but I figured I'd just stick with the recommendation.

I think negotiation is useful.  By doing things like "prevent
broadcast_SMI && !parked_VCPUs if max_cpus > smp_cpus", it lets you
commit your OVMF patch immediately for example.
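
For illustration only (the feature bit names and numbers below are
invented, not taken from the series), the kind of check negotiation
makes possible is just a few lines:

#include "sysemu/sysemu.h"      /* smp_cpus, max_cpus */

/* Illustrative sketch: with negotiation the host can refuse a feature
 * combination that cannot work on the current configuration. */
static bool smi_features_ok(uint64_t requested)
{
    bool broadcast_smi = requested & (1ULL << 0); /* "broadcast SMI" */
    bool parked_vcpus  = requested & (1ULL << 1); /* hypothetical "park hotplugged VCPUs" */

    if (broadcast_smi && !parked_vcpus && max_cpus > smp_cpus) {
        return false;   /* hotpluggable CPUs, but no way to park them */
    }
    return true;
}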

Paolo

>> > If it is there just write true/false to it to enable/disable, and qemu
>> > checks the field each time a smi is raised.
> If we agree there's no need for negotiation, I'm game.



Re: [Qemu-devel] [PATCH v6 3/4] hw/intc/arm_gicv3_kvm: Save and Restore ICC_SRE_EL1 register

2016-11-28 Thread Peter Maydell
On 23 November 2016 at 12:39,   wrote:
> From: Vijaya Kumar K 
>
> Save and Restore ICC_SRE_EL1 register. ICC_SRE_EL1 register
> value is used by kernel to check if SRE bit is set or not.
>
> Signed-off-by: Vijaya Kumar K 
> ---
>  hw/intc/arm_gicv3_kvm.c| 4 
>  include/hw/intc/arm_gicv3_common.h | 1 +
>  2 files changed, 5 insertions(+)
>
> diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
> index a317fbf..77af32d 100644
> --- a/hw/intc/arm_gicv3_kvm.c
> +++ b/hw/intc/arm_gicv3_kvm.c
> @@ -65,6 +65,8 @@
>  KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 12, 3)
>  #define ICC_CTLR_EL1\
>  KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 12, 4)
> +#define ICC_SRE_EL1 \
> +KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 12, 5)
>  #define ICC_IGRPEN0_EL1 \
>  KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 12, 6)
>  #define ICC_IGRPEN1_EL1 \
> @@ -404,6 +406,7 @@ static void kvm_arm_gicv3_put(GICv3State *s)
>  GICv3CPUState *c = &s->cpu[ncpu];
>  int num_pri_bits;
>
> +kvm_gicc_access(s, ICC_SRE_EL1, ncpu, &c->icc_sre_el1, true);
>  kvm_gicc_access(s, ICC_CTLR_EL1, ncpu,
>  &c->icc_ctlr_el1[GICV3_NS], true);
>  kvm_gicc_access(s, ICC_IGRPEN0_EL1, ncpu,
> @@ -557,6 +560,7 @@ static void kvm_arm_gicv3_get(GICv3State *s)
>  GICv3CPUState *c = &s->cpu[ncpu];
>  int num_pri_bits;
>
> +kvm_gicc_access(s, ICC_SRE_EL1, ncpu, &c->icc_sre_el1, false);
>  kvm_gicc_access(s, ICC_CTLR_EL1, ncpu,
>  &c->icc_ctlr_el1[GICV3_NS], false);
>  kvm_gicc_access(s, ICC_IGRPEN0_EL1, ncpu,
> diff --git a/include/hw/intc/arm_gicv3_common.h 
> b/include/hw/intc/arm_gicv3_common.h
> index 341a311..183c7f8 100644
> --- a/include/hw/intc/arm_gicv3_common.h
> +++ b/include/hw/intc/arm_gicv3_common.h
> @@ -166,6 +166,7 @@ struct GICv3CPUState {
>  uint8_t gicr_ipriorityr[GIC_INTERNAL];
>
>  /* CPU interface */
> +uint64_t icc_sre_el1;
>  uint64_t icc_ctlr_el1[2];
>  uint64_t icc_pmr_el1;
>  uint64_t icc_bpr[3];

If you're adding a new structure field you must also update
the vmstate struct in arm_gicv3_common.c.

thanks
-- PMM



Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases

2016-11-28 Thread Andrew Jones
On Mon, Nov 28, 2016 at 11:14:48AM +, Peter Maydell wrote:
> On 28 November 2016 at 11:12, Alex Bennée  wrote:
> >
> > Andrew Jones  writes:
> >> I've skimmed over everything looking at it from a framework/style
> >> perspective. I didn't dig in trying to understand the tests though.
> >> One general comment, I see many tests introduce MAX_CPUS 8. Why do
> >> that? Why not allow all cpus by using NR_CPUS for the array sizes?
> >
> > Yeah - I can fix those. I wonder what the maximum is with GIC V3?
> 
> So large that you don't want to hardcode it as an array size...

255 with the gic series, not yet merged. Even if you have a dozen arrays
with that as the size, then unless your array element size is huge (on
the order of multiple pages), it probably doesn't matter. Using the
default memory allocation of a QEMU guest, 128 MB, unit tests have plenty
of memory at their disposal. The framework itself only allocates a handful
of pages.

Of course the framework also supports dynamic memory allocation, so you
could do malloc(nr_cpus * element_size), to avoid excess.
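
For example (a sketch; this assumes the lib's calloc() and the nr_cpus
global the framework sets up during init):

#include <libcflat.h>
#include <alloc.h>
#include <asm/setup.h>          /* nr_cpus */

/* Size the per-CPU result array at run time instead of a hard-coded
 * MAX_CPUS. */
static uint64_t *cpu_results;

static void alloc_per_cpu_results(void)
{
    cpu_results = calloc(nr_cpus, sizeof(*cpu_results));
    assert(cpu_results);
}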

Thanks,
drew



Re: [Qemu-devel] [PATCH v6 2/4] hw/intc/arm_gicv3_kvm: Implement get/put functions

2016-11-28 Thread Peter Maydell
On 23 November 2016 at 12:39,   wrote:
> From: Vijaya Kumar K 
>
> This actually implements pre_save and post_load methods for in-kernel
> vGICv3.
>
> Signed-off-by: Pavel Fedin 
> Signed-off-by: Peter Maydell 
> [PMM:
>  * use decimal, not 0bnnn
>  * fixed typo in names of ICC_APR0R_EL1 and ICC_AP1R_EL1
>  * completely rearranged the get and put functions to read and write
>the state in a natural order, rather than mixing distributor and
>redistributor state together]
> Signed-off-by: Vijaya Kumar K 
> [Vijay:
>  * Update macro KVM_VGIC_ATTR
>  * Use 32 bit access for gicd and gicr
>  * GICD_IROUTER, GICD_TYPER, GICR_PROPBASER and GICR_PENDBASER reg
>access  are changed from 64-bit to 32-bit access
>  * Add ICC_SRE_EL1 save and restore
>  * Dropped translate_fn mechanism and coded functions to handle
>save and restore of edge_trigger and priority
>  * Number of APnR register saved/restored based on number of
>priority bits supported]
> ---
> ---
>  hw/intc/arm_gicv3_kvm.c  | 559 
> ++-
>  hw/intc/gicv3_internal.h |   1 +
>  2 files changed, 549 insertions(+), 11 deletions(-)

Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [Qemu-devel] [PATCH v6 3/4] hw/intc/arm_gicv3_kvm: Save and Restore ICC_SRE_EL1 register

2016-11-28 Thread Peter Maydell
On 28 November 2016 at 11:54, Peter Maydell  wrote:
> On 23 November 2016 at 12:39,   wrote:
>> From: Vijaya Kumar K 
>>
>> Save and Restore ICC_SRE_EL1 register. ICC_SRE_EL1 register
>> value is used by kernel to check if SRE bit is set or not.
>>
>> Signed-off-by: Vijaya Kumar K 
>> ---
>>  hw/intc/arm_gicv3_kvm.c| 4 
>>  include/hw/intc/arm_gicv3_common.h | 1 +
>>  2 files changed, 5 insertions(+)
>>
>> diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
>> index a317fbf..77af32d 100644
>> --- a/hw/intc/arm_gicv3_kvm.c
>> +++ b/hw/intc/arm_gicv3_kvm.c
>> @@ -65,6 +65,8 @@
>>  KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 12, 3)
>>  #define ICC_CTLR_EL1\
>>  KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 12, 4)
>> +#define ICC_SRE_EL1 \
>> +KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 12, 5)
>>  #define ICC_IGRPEN0_EL1 \
>>  KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 12, 6)
>>  #define ICC_IGRPEN1_EL1 \
>> @@ -404,6 +406,7 @@ static void kvm_arm_gicv3_put(GICv3State *s)
>>  GICv3CPUState *c = &s->cpu[ncpu];
>>  int num_pri_bits;
>>
>> +kvm_gicc_access(s, ICC_SRE_EL1, ncpu, &c->icc_sre_el1, true);
>>  kvm_gicc_access(s, ICC_CTLR_EL1, ncpu,
>>  &c->icc_ctlr_el1[GICV3_NS], true);
>>  kvm_gicc_access(s, ICC_IGRPEN0_EL1, ncpu,
>> @@ -557,6 +560,7 @@ static void kvm_arm_gicv3_get(GICv3State *s)
>>  GICv3CPUState *c = &s->cpu[ncpu];
>>  int num_pri_bits;
>>
>> +kvm_gicc_access(s, ICC_SRE_EL1, ncpu, &c->icc_sre_el1, false);
>>  kvm_gicc_access(s, ICC_CTLR_EL1, ncpu,
>>  &c->icc_ctlr_el1[GICV3_NS], false);
>>  kvm_gicc_access(s, ICC_IGRPEN0_EL1, ncpu,
>> diff --git a/include/hw/intc/arm_gicv3_common.h 
>> b/include/hw/intc/arm_gicv3_common.h
>> index 341a311..183c7f8 100644
>> --- a/include/hw/intc/arm_gicv3_common.h
>> +++ b/include/hw/intc/arm_gicv3_common.h
>> @@ -166,6 +166,7 @@ struct GICv3CPUState {
>>  uint8_t gicr_ipriorityr[GIC_INTERNAL];
>>
>>  /* CPU interface */
>> +uint64_t icc_sre_el1;
>>  uint64_t icc_ctlr_el1[2];
>>  uint64_t icc_pmr_el1;
>>  uint64_t icc_bpr[3];
>
> If you're adding a new structure field you must also update
> the vmstate struct in arm_gicv3_common.c.

...and if you put the patch which adds the new struct field
to GICv3CPUState and the vmstate first, then you can avoid
bumping the version numbers in vmstate because migration will
have always been disabled. So it's better to do it that way
round.
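
Concretely, something along these lines in arm_gicv3_common.c (a
sketch; field list abbreviated, names as in the existing
vmstate_gicv3_cpu description):

static const VMStateDescription vmstate_gicv3_cpu = {
    .name = "arm_gicv3_cpu",
    .version_id = 1,
    .minimum_version_id = 1,
    .fields = (VMStateField[]) {
        /* ... existing GICv3CPUState fields ... */
        VMSTATE_UINT64(icc_sre_el1, GICv3CPUState),
        VMSTATE_END_OF_LIST()
    }
};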

thanks
-- PMM



Re: [Qemu-devel] [PATCH] include: Add roundup_pow_of_two helper function

2016-11-28 Thread Marcel Apfelbaum

On 11/28/2016 09:18 AM, Yuval Shaia wrote:

Move private implementation of the function to osdep.h


Hi Yuval,

In my opinion we need to use the function in at least
two places in order to promote it to a global utility.

You are welcome to try to find another place needing it.

Thanks,
Marcel



Signed-off-by: Yuval Shaia 
---
 hw/pci/shpc.c| 12 +---
 include/qemu/osdep.h | 10 ++
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index 3dcd472..4b8982d 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -122,16 +122,6 @@
 #define SHPC_PCI_TO_IDX(pci_slot) ((pci_slot) - 1)
 #define SHPC_IDX_TO_PHYSICAL(slot) ((slot) + 1)

-static int roundup_pow_of_two(int x)
-{
-x |= (x >> 1);
-x |= (x >> 2);
-x |= (x >> 4);
-x |= (x >> 8);
-x |= (x >> 16);
-return x + 1;
-}
-
 static uint16_t shpc_get_status(SHPCDevice *shpc, int slot, uint16_t msk)
 {
 uint8_t *status = shpc->config + SHPC_SLOT_STATUS(slot);
@@ -654,7 +644,7 @@ int shpc_init(PCIDevice *d, PCIBus *sec_bus, MemoryRegion 
*bar, unsigned offset)

 int shpc_bar_size(PCIDevice *d)
 {
-return roundup_pow_of_two(SHPC_SLOT_REG(SHPC_MAX_SLOTS));
+return round_up_pow_of_two(SHPC_SLOT_REG(SHPC_MAX_SLOTS));
 }

 void shpc_cleanup(PCIDevice *d, MemoryRegion *bar)
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 689f253..74d5c30 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -194,6 +194,16 @@ extern int daemon(int, int);
 #define ROUND_UP(n,d) (((n) + (d) - 1) & -(d))
 #endif

+static inline int round_up_pow_of_two(int x)
+{
+   x |= (x >> 1);
+   x |= (x >> 2);
+   x |= (x >> 4);
+   x |= (x >> 8);
+   x |= (x >> 16);
+   return x + 1;
+}
+
 #ifndef DIV_ROUND_UP
 #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
 #endif






[Qemu-devel] [PATCH v3] crypto: add virtio-crypto driver

2016-11-28 Thread Gonglei
This patch introduces virtio-crypto driver for Linux Kernel.

The virtio crypto device is a virtual cryptography device
as well as a kind of virtual hardware accelerator for
virtual machines. The encryption and decryption requests
are placed in the data queue and are ultimately handled by
the backend crypto accelerators. The second queue is the
control queue used to create or destroy sessions for
symmetric algorithms and will control some advanced features
in the future. The virtio crypto device provides the following
crypto services: CIPHER, MAC, HASH, and AEAD.

For more information about virtio-crypto device, please see:
  http://qemu-project.org/Features/VirtioCrypto

CC: Michael S. Tsirkin 
CC: Cornelia Huck 
CC: Stefan Hajnoczi 
CC: Herbert Xu 
CC: Halil Pasic 
CC: David S. Miller 
CC: Zeng Xin 
Signed-off-by: Gonglei 
---
 MAINTAINERS  |   9 +
 drivers/crypto/Kconfig   |   2 +
 drivers/crypto/Makefile  |   1 +
 drivers/crypto/virtio/Kconfig|  10 +
 drivers/crypto/virtio/Makefile   |   5 +
 drivers/crypto/virtio/virtio_crypto.c| 451 +++
 drivers/crypto/virtio/virtio_crypto_algs.c   | 525 +++
 drivers/crypto/virtio/virtio_crypto_common.h | 124 +++
 drivers/crypto/virtio/virtio_crypto_mgr.c| 258 +
 include/uapi/linux/Kbuild|   1 +
 include/uapi/linux/virtio_crypto.h   | 450 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 12 files changed, 1837 insertions(+)
 create mode 100644 drivers/crypto/virtio/Kconfig
 create mode 100644 drivers/crypto/virtio/Makefile
 create mode 100644 drivers/crypto/virtio/virtio_crypto.c
 create mode 100644 drivers/crypto/virtio/virtio_crypto_algs.c
 create mode 100644 drivers/crypto/virtio/virtio_crypto_common.h
 create mode 100644 drivers/crypto/virtio/virtio_crypto_mgr.c
 create mode 100644 include/uapi/linux/virtio_crypto.h

diff --git a/MAINTAINERS b/MAINTAINERS
index ad9b965..cccaaf0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12810,6 +12810,7 @@ F:  drivers/net/virtio_net.c
 F: drivers/block/virtio_blk.c
 F: include/linux/virtio_*.h
 F: include/uapi/linux/virtio_*.h
+F: drivers/crypto/virtio/
 
 VIRTIO DRIVERS FOR S390
 M: Christian Borntraeger 
@@ -12846,6 +12847,14 @@ S: Maintained
 F: drivers/virtio/virtio_input.c
 F: include/uapi/linux/virtio_input.h
 
+VIRTIO CRYPTO DRIVER
+M:  Gonglei 
+L:  virtualizat...@lists.linux-foundation.org
+L:  linux-cry...@vger.kernel.org
+S:  Maintained
+F:  drivers/crypto/virtio/
+F:  include/uapi/linux/virtio_crypto.h
+
 VIA RHINE NETWORK DRIVER
 S: Orphan
 F: drivers/net/ethernet/via/via-rhine.c
diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 4d2b81f..7956478 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -555,4 +555,6 @@ config CRYPTO_DEV_ROCKCHIP
 
 source "drivers/crypto/chelsio/Kconfig"
 
+source "drivers/crypto/virtio/Kconfig"
+
 endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index ad7250f..bc53cb8 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -32,3 +32,4 @@ obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
 obj-$(CONFIG_CRYPTO_DEV_SUN4I_SS) += sunxi-ss/
 obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP) += rockchip/
 obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chelsio/
+obj-$(CONFIG_CRYPTO_DEV_VIRTIO) += virtio/
diff --git a/drivers/crypto/virtio/Kconfig b/drivers/crypto/virtio/Kconfig
new file mode 100644
index 000..ceae88c
--- /dev/null
+++ b/drivers/crypto/virtio/Kconfig
@@ -0,0 +1,10 @@
+config CRYPTO_DEV_VIRTIO
+   tristate "VirtIO crypto driver"
+   depends on VIRTIO
+select CRYPTO_AEAD
+select CRYPTO_AUTHENC
+select CRYPTO_BLKCIPHER
+   default m
+   help
+ This driver provides support for virtio crypto device. If you
+ choose 'M' here, this module will be called virtio-crypto.
diff --git a/drivers/crypto/virtio/Makefile b/drivers/crypto/virtio/Makefile
new file mode 100644
index 000..a316e92
--- /dev/null
+++ b/drivers/crypto/virtio/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_CRYPTO_DEV_VIRTIO) += virtio-crypto.o
+virtio-crypto-objs := \
+   virtio_crypto_algs.o \
+   virtio_crypto_mgr.o \
+   virtio_crypto.o
diff --git a/drivers/crypto/virtio/virtio_crypto.c 
b/drivers/crypto/virtio/virtio_crypto.c
new file mode 100644
index 000..a896f4d
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_crypto.c
@@ -0,0 +1,451 @@
+ /* Driver for Virtio crypto device.
+  *
+  * Copyright 2016 HUAWEI TECHNOLOGIES CO., LTD.
+  *
+  * This program is free software; you can redistribute it and/or modify
+  * it under the terms of the GNU General Public License as published by
+  * the Free Software Foundation; either version 2 of the License, or
+  * (at your option) any later version.
+  *
+  * This program is distributed in the hope that it will be

[Qemu-devel] [PATCH v3] virtio-crypto: add Linux driver

2016-11-28 Thread Gonglei
v3:
 - set cpu affinity when data queues are not equal to the number of online 
cpus. [Michael]
 - add TODO comments for cpu hotplug (changing the relationship of binding 
virtqueue and cpu)
 - use __u32/64 in the config space since the virtio->get() doesn't support 
byte-swap yet. [Michael]
 - drop the whole patch 1 of v2 because of the above reason.
 - add VERSION_1 check at the beginning of virtcrypto_probe()
 - s/-1/EPERM/g in virtcrypto_update_status(), don't change err to EFAULT then. 
[Michael]
 - add reset operation before deleting the virtqueues. [Michael]
 - drop an unnecessary spin_lock call in virtcrypto_freeze(), avoiding a
possible deadlock. [Michael]
 - redefine parameter alg's type in order to use a cast for it. [Michael]
 - pad all structures to have the same size in one union, and add a member to
   show the union's size in virtio_crypto.h. [Michael]
 - update the MAINTAINERS file to add virtio-crypto stuff to Michael's entry so that
   the corresponding patches can be CC'ed to you too, because the virtio-crypto
   driver doesn't live in the drivers/virtio directory. 

The virtio crypto device is a virtual cryptography device
as well as a kind of virtual hardware accelerator for
virtual machines. The encryption and decryption requests
are placed in the data queue and are ultimately handled by
the backend crypto accelerators. The second queue is the
control queue used to create or destroy sessions for
symmetric algorithms and will control some advanced features
in the future. The virtio crypto device provides the following
crypto services: CIPHER, MAC, HASH, and AEAD.

For more information about virtio-crypto device, please see:
  http://qemu-project.org/Features/VirtioCrypto

For better reviewing:

The patch mainly includes five files:
 1) virtio_crypto.h is the header file for virtio-crypto device,
which is based on the virtio-crypto specification. 
 2) virtio_crypto.c is the entry of the driver module,
which is similar to other virtio devices, such as virtio-net,
virtio-input etc. 
 3) virtio_crypto_mgr.c is used to manage the virtio
crypto devices in the system. We support up to 32 virtio-crypto
devices currently. I use a global list to store the virtio crypto
devices, an approach which follows the Intel QAT driver. Meanwhile, the file
includes the functions of add/del/search/start/stop for virtio
crypto devices.
 4) virtio_crypto_common.h is a private header file for virtio
crypto driver, and includes structure definitions and function declarations.
 5) virtio_crypto_algs.c is the implementation of the algorithms based on the
Linux Crypto Framework, which can register different crypto algorithms.
Currently only AES-CBC is supported.
The Crypto guys can mainly focus on this file; a minimal registration sketch
follows below. 
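
The sketch below is only meant as orientation for reviewers -- it is not
the patch code, and the real driver also fills in cra_ctxsize,
cra_init/cra_exit and the cra_u.ablkcipher callbacks that talk to the
data queue -- but it shows the general shape of how virtio_crypto_algs.c
registers AES-CBC with the crypto framework:

#include <linux/module.h>
#include <linux/crypto.h>
#include <crypto/aes.h>

/* Sketch only: general shape of an ablkcipher registration. */
static struct crypto_alg virtio_crypto_aes_cbc_sketch = {
        .cra_name        = "cbc(aes)",
        .cra_driver_name = "virtio_crypto_aes_cbc",  /* illustrative name */
        .cra_priority    = 150,
        .cra_flags       = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
        .cra_blocksize   = AES_BLOCK_SIZE,
        .cra_type        = &crypto_ablkcipher_type,
        .cra_module      = THIS_MODULE,
};

static int virtio_crypto_algs_register_sketch(void)
{
        return crypto_register_alg(&virtio_crypto_aes_cbc_sketch);
}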


v2:
 - stop doing DMA from the stack, CONFIG_VMAP_STACK=y [Salvatore]
 - convert __virtio32/64 to __le32/64 in virtio_crypto.h
 - remove VIRTIO_CRYPTO_S_STARTED based on the lastest virtio crypto spec.
 - introduces the little edian functions for VIRTIO_1 devices in patch 1.


Gonglei (1):
  crypto: add virtio-crypto driver

 MAINTAINERS  |   9 +
 drivers/crypto/Kconfig   |   2 +
 drivers/crypto/Makefile  |   1 +
 drivers/crypto/virtio/Kconfig|  10 +
 drivers/crypto/virtio/Makefile   |   5 +
 drivers/crypto/virtio/virtio_crypto.c| 451 +++
 drivers/crypto/virtio/virtio_crypto_algs.c   | 525 +++
 drivers/crypto/virtio/virtio_crypto_common.h | 124 +++
 drivers/crypto/virtio/virtio_crypto_mgr.c| 258 +
 include/uapi/linux/Kbuild|   1 +
 include/uapi/linux/virtio_crypto.h   | 450 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 12 files changed, 1837 insertions(+)
 create mode 100644 drivers/crypto/virtio/Kconfig
 create mode 100644 drivers/crypto/virtio/Makefile
 create mode 100644 drivers/crypto/virtio/virtio_crypto.c
 create mode 100644 drivers/crypto/virtio/virtio_crypto_algs.c
 create mode 100644 drivers/crypto/virtio/virtio_crypto_common.h
 create mode 100644 drivers/crypto/virtio/virtio_crypto_mgr.c
 create mode 100644 include/uapi/linux/virtio_crypto.h

-- 
1.8.3.1





Re: [Qemu-devel] [qemu patch V3 0/2] improve kvmclock difference on migration

2016-11-28 Thread Marcelo Tosatti
On Mon, Nov 21, 2016 at 08:50:02AM -0200, Marcelo Tosatti wrote:
> See individual patches for details. This patchset depends on the kernel patch
> "[PATCH] kvm: kvmclock: let KVM_GET_CLOCK return whether the master
> clock is in use" from Paolo.

Ping?




[Qemu-devel] [Bug 1563152] Re: general protection fault running VirtualBox in KVM guest

2016-11-28 Thread Robie Basak
** Tags added: needs-upstream-report

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1563152

Title:
  general protection fault running VirtualBox in KVM guest

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Confirmed

Bug description:
  I'm trying to run nested VMs using qemu-kvm on the physical host and 
VirtualBox on the guest host:
    * physical host: Ubuntu 14.04 running Linux 4.2.0, qemu-kvm 2.0.0
    * guest host: Ubuntu 16.04 beta 2 running Linux 4.4.0, VirtualBox 5.0.16

  When I try to start up a VirtualBox VM in the guest host, I get a
  general protection fault (see below for dmesg output).  According to
  https://www.virtualbox.org/ticket/14965 this is caused by a bug in
  QEMU/KVM:

  The problem in more detail:  As written above, VirtualBox tries to
  read the MSR 0x9B (IA32_SMM_MONITOR_CTL).  This is an
  architectural MSR which is present if CPUID.01 / ECX bit 5 or bit
  6 are set (VMX or SMX).  As KVM has nested virtualization enabled
  and therefore pretends to support VT-x, this MSR must be
  accessible and reading from this MSR must not raise a
  #GP.  KVM/QEmu does not behave like real hardware in this case.

  dmesg output:

  SUPR0GipMap: fGetGipCpu=0x3
  general protection fault:  [#1] SMP
  Modules linked in: pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) 
vboxdrv(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables ppdev kvm_intel kvm 
irqbypass snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core 
snd_hwdep snd_pcm snd_timer i2c_piix4 snd input_leds soundcore joydev 
8250_fintek mac_hid serio_raw pvpanic parport_pc parport ib_iser rdma_cm iw_cm 
ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 
multipath linear crct10dif_pclmul crc32_pclmul qxl ttm drm_kms_helper 
syscopyarea sysfillrect aesni_intel sysimgblt fb_sys_fops aes_x86_64 lrw 
gf128mul glue_helper ablk_helper cryptd psmouse floppy drm pata_acpi
  CPU: 0 PID: 31507 Comm: EMT Tainted: G   OE   4.4.0-15-generic 
#31-Ubuntu
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  task: 880034c0a580 ti: 880002e0 task.ti: 880002e0
  RIP: 0010:[]  [] 0xc067e506
  RSP: 0018:880002e03d70  EFLAGS: 00010206
  RAX: 06f0 RBX: ffdb RCX: 009b
  RDX:  RSI: 880002e03d00 RDI: 880002e03cc8
  RBP: 880002e03d90 R08: 0004 R09: 06f0
  R10: 49656e69 R11: 0f8bfbff R12: 0020
  R13:  R14: c957407c R15: c0645260
  FS:  7f89b8f6b700() GS:88007fc0() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 7f89b8d1 CR3: 35ae1000 CR4: 06f0
  Stack:
      
   880002e03db0 c0693e93 c9574010 880035aae550
   880002e03e30 c060a3e7 880002e03e10 0282
  Call Trace:
   [] ? supdrvIOCtl+0x2de7/0x3250 [vboxdrv]
   [] ? VBoxDrvLinuxIOCtl_5_0_16+0x150/0x250 [vboxdrv]
   [] ? do_vfs_ioctl+0x29f/0x490
   [] ? __do_page_fault+0x1b4/0x400
   [] ? SyS_ioctl+0x79/0x90
   [] ? entry_SYSCALL_64_fastpath+0x16/0x71
  Code: 88 e4 fc ff ff b9 3a 00 00 00 0f 32 48 c1 e2 20 89 c0 48 09 d0 48 89 05 
f9 db 0e 00 0f 20 e0 b9 9b 00 00 00 48 89 05 d2 db 0e 00 <0f> 32 48 c1 e2 20 89 
c0 b9 80 00 00 c0 48 09 d0 48 89 05 cb db
  RIP  [] 0xc067e506
   RSP 
  ---[ end trace b3284b6520f49e0d ]---

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1563152/+subscriptions



Re: [Qemu-devel] [PATCH v2 4/4] spec/vhost-user: add VHOST_USER_PROTOCOL_F_VERSATILE_SLAVE

2016-11-28 Thread Marc-André Lureau
Hi

On Thu, Nov 24, 2016 at 7:20 AM Wei Wang  wrote:

> The VHOST_USER_PROTOCOL_F_VERSATILE_SLAVE protocol feature indicates
> that the slave side implementation supports different types of devices.
> The master tells the slave what type of device to create by sending
> the VHOST_USER_SET_DEV_INFO message.
>
> Signed-off-by: Wei Wang 
> ---
>  docs/specs/vhost-user.txt | 21 -
>  1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> index fdc99ea..da1314d 100644
> --- a/docs/specs/vhost-user.txt
> +++ b/docs/specs/vhost-user.txt
> @@ -264,11 +264,12 @@ restarted.
>  Protocol features
>  -
>
> -#define VHOST_USER_PROTOCOL_F_MQ 0
> -#define VHOST_USER_PROTOCOL_F_LOG_SHMFD  1
> -#define VHOST_USER_PROTOCOL_F_RARP   2
> -#define VHOST_USER_PROTOCOL_F_REPLY_ACK  3
> -#define VHOST_USER_PROTOCOL_F_VHOST_PCI  4
> +#define VHOST_USER_PROTOCOL_F_MQ   0
> +#define VHOST_USER_PROTOCOL_F_LOG_SHMFD1
> +#define VHOST_USER_PROTOCOL_F_RARP 2
> +#define VHOST_USER_PROTOCOL_F_REPLY_ACK3
> +#define VHOST_USER_PROTOCOL_F_VHOST_PCI4
> +#define VHOST_USER_PROTOCOL_F_VERSATILE_SLAVE  5
>

 I would rather name it after the message,
VHOST_USER_PROTOCOL_F_SET_DEV_INFO

>
>  Message types
>  -
> @@ -514,6 +515,16 @@ Message types
>#define VHOST_USER_SET_PEER_CONNECTION_F_CREATE2
>#define VHOST_USER_SET_PEER_CONNECTION_F_DESTROY   3
>
> + * VHOST_USER_SET_DEV_INFO
> +
> +  Id: 21
> +  Equivalent ioctl: N/A
> +  Master payload: u64
>
+
> +  The master sends the device type info to the slave.
>

What is the meaning of this payload? Is it the virtio device id? better be
explicit about it.

If it is the case, I would name the message "VHOST_USER_SET_DEVICE_ID".



> +  This request should be sent only when
> VHOST_USER_PROTOCOL_F_VERSATILE_SLAVE
> +  has been negotiated.
> +
>  VHOST_USER_PROTOCOL_F_REPLY_ACK:
>  ---
>  The original vhost-user specification only demands replies for certain
> --
> 2.7.4
>
> --
Marc-André Lureau


Re: [Qemu-devel] [PATCH 2/2] virtio-gpu: call cleanup mapping function in resource destroy

2016-11-28 Thread Marc-André Lureau
Hi

On Thu, Nov 24, 2016 at 4:30 PM Li Qiang  wrote:

> If the guest destroys the resource before detaching the backing storage, the 'iov'
> and 'addrs' fields in the resource are not freed, thus leading to a memory
> leak. This patch avoids this.
>
>

That looks correct to me.

Reviewed-by: Marc-André Lureau 


Please squash patch 1 (no reason for it to be split).




> Signed-off-by: Li Qiang 
> ---
>  hw/display/virtio-gpu.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
> index cc725a6..98dadf2 100644
> --- a/hw/display/virtio-gpu.c
> +++ b/hw/display/virtio-gpu.c
> @@ -360,6 +360,7 @@ static void virtio_gpu_resource_destroy(VirtIOGPU *g,
>  struct virtio_gpu_simple_resource
> *res)
>  {
>  pixman_image_unref(res->image);
> +virtio_gpu_cleanup_mapping(res);
>  QTAILQ_REMOVE(&g->reslist, res, next);
>  g_free(res);
>  }
> --
> 1.8.3.1
>
> --
Marc-André Lureau


Re: [Qemu-devel] [PATCH v6 4/4] hw/intc/arm_gicv3_kvm: Reset GICv3 cpu interface registers

2016-11-28 Thread Peter Maydell
On 23 November 2016 at 12:39,   wrote:
> From: Vijaya Kumar K 
>
> Reset CPU interface registers of GICv3 when CPU is reset.
> For this, object interface is used, which is called from
> arm_cpu_reset function.
>
> Signed-off-by: Vijaya Kumar K 

This approach doesn't handle the SMP case correctly --
when a CPU is reset then the CPU interface for that CPU
(and only that CPU) should be reset. Your code will
reset every CPU interface every time any CPU is reset.

I think it would be better to use the same approach that
the arm_gicv3_cpuif.c code uses to arrange for cpu i/f
registers to be reset, perhaps by moving the appropriate
parts of that code into the common source file.

Having the reset state depend implicitly on the kernel's
internal state (as you have here for the ICC_CTLR_EL1
state) is something I'm a bit unsure about -- what goes
wrong if you don't do that?

thanks
-- PMM



Re: [Qemu-devel] [PATCH v3] crypto: add virtio-crypto driver

2016-11-28 Thread Cornelia Huck
On Mon, 28 Nov 2016 20:08:23 +0800
Gonglei  wrote:

> +static int virtcrypto_update_status(struct virtio_crypto *vcrypto)
> +{
> + u32 status;
> + int err;
> +
> + virtio_cread(vcrypto->vdev,
> + struct virtio_crypto_config, status, &status);
> +
> + /* Ignore unknown (future) status bits */
> + status &= VIRTIO_CRYPTO_S_HW_READY;

I'm wondering what the driver really should do if it encounters unknown
status bits.

I'd expect that new status bits are guarded by a feature bit and that
the device should not set status bits if the respective feature bit has
not been negotiated. Therefore, unknown status bits would be a host
error and the driver should consider the device to be broken.
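
I.e., just as a sketch of what I mean (assuming we would want to go the
"device is broken" route rather than masking the bits off):

        /* in virtcrypto_update_status(), instead of silently ignoring
         * unknown bits */
        if (status & ~VIRTIO_CRYPTO_S_HW_READY) {
                dev_err(&vcrypto->vdev->dev,
                        "Unknown status bits: 0x%x\n",
                        status & ~VIRTIO_CRYPTO_S_HW_READY);
                virtio_break_device(vcrypto->vdev);
                return -EPERM;
        }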

Thoughts?

> +
> + if (vcrypto->status == status)
> + return 0;
> +
> + vcrypto->status = status;
> +
> + if (vcrypto->status & VIRTIO_CRYPTO_S_HW_READY) {
> + err = virtcrypto_dev_start(vcrypto);
> + if (err) {
> + dev_err(&vcrypto->vdev->dev,
> + "Failed to start virtio crypto device.\n");
> + virtcrypto_dev_stop(vcrypto);
> + return -EPERM;
> + }
> + dev_info(&vcrypto->vdev->dev, "Accelerator is ready\n");
> + } else {
> + virtcrypto_dev_stop(vcrypto);
> + dev_info(&vcrypto->vdev->dev, "Accelerator is not ready\n");
> + }
> +
> + return 0;
> +}
> +




Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases

2016-11-28 Thread Peter Maydell
On 28 November 2016 at 11:58, Andrew Jones  wrote:
> On Mon, Nov 28, 2016 at 11:14:48AM +, Peter Maydell wrote:
>> On 28 November 2016 at 11:12, Alex Bennée  wrote:
>> >
>> > Andrew Jones  writes:
>> >> I've skimmed over everything looking at it from a framework/style
>> >> perspective. I didn't dig in trying to understand the tests though.
>> >> One general comment, I see many tests introduce MAX_CPUS 8. Why do
>> >> that? Why not allow all cpus by using NR_CPUS for the array sizes?
>> >
>> > Yeah - I can fix those. I wonder what the maximum is with GIC V3?
>>
>> So large that you don't want to hardcode it as an array size...
>
> 255 with the gic series, not yet merged.

I was talking about the architectural GICv3 limit, which is larger
than that by many orders of magnitude. For QEMU it looks like
MAX_CPUMASK_BITS is now 288 rather than 255.

thanks
-- PMM



[Qemu-devel] [PATCH 1/2] migration/pcspk: Add a property to state if pcspk is migrated

2016-11-28 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Allow us to turn migration of pcspk off for compatibility.

Signed-off-by: Dr. David Alan Gilbert 
---
 hw/audio/pcspk.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/audio/pcspk.c b/hw/audio/pcspk.c
index 984534b..7980022 100644
--- a/hw/audio/pcspk.c
+++ b/hw/audio/pcspk.c
@@ -54,6 +54,7 @@ typedef struct {
 unsigned int play_pos;
 uint8_t data_on;
 uint8_t dummy_refresh_clock;
+bool migrate;
 } PCSpkState;
 
 static const char *s_spk = "pcspk";
@@ -187,11 +188,19 @@ static void pcspk_realizefn(DeviceState *dev, Error 
**errp)
 pcspk_state = s;
 }
 
+static bool migrate_needed(void *opaque)
+{
+PCSpkState *s = opaque;
+
+return s->migrate;
+}
+
 static const VMStateDescription vmstate_spk = {
 .name = "pcspk",
 .version_id = 1,
 .minimum_version_id = 1,
 .minimum_version_id_old = 1,
+.needed = migrate_needed,
 .fields  = (VMStateField[]) {
 VMSTATE_UINT8(data_on, PCSpkState),
 VMSTATE_UINT8(dummy_refresh_clock, PCSpkState),
@@ -201,6 +210,7 @@ static const VMStateDescription vmstate_spk = {
 
 static Property pcspk_properties[] = {
 DEFINE_PROP_UINT32("iobase", PCSpkState, iobase,  -1),
+DEFINE_PROP_BOOL("migrate", PCSpkState, migrate,  true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.9.3




[Qemu-devel] [PATCH 0/2] pcspk migration compatibility

2016-11-28 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Hi,
  39c88f56 added VMState for pcspk but turned it on for
all machine types, this breaks backwards compatibility
to older machine types.

  If this is too late for 2.8 then I suggest we take the 1st
of these two patches, which just makes it a property to flip
for those of us who need it.  Adding the 2nd patch after 2.8
might cause problems for people using 2.8 with 2.7 machine
type.

Dave

Dr. David Alan Gilbert (2):
  migration/pcspk: Add a property to state if pcspk is migrated
  migration/pcspk: Turn migration of pcspk off for 2.7 and older

 hw/audio/pcspk.c | 10 ++
 include/hw/i386/pc.h |  5 +
 2 files changed, 15 insertions(+)

-- 
2.9.3




Re: [Qemu-devel] [virtio-comment] Re: [PATCH v2 4/4] spec/vhost-user: add VHOST_USER_PROTOCOL_F_VERSATILE_SLAVE

2016-11-28 Thread Wei Wang

On 11/28/2016 08:41 PM, Marc-André Lureau wrote:

Hi

On Thu, Nov 24, 2016 at 7:20 AM Wei Wang wrote:


The VHOST_USER_PROTOCOL_F_VERSATILE_SLAVE protocol feature indicates
that the slave side implementation supports different types of
devices.
The master tells the slave what type of device to create by sending
the VHOST_USER_SET_DEV_INFO message.

Signed-off-by: Wei Wang <wei.w.w...@intel.com>
---
 docs/specs/vhost-user.txt | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index fdc99ea..da1314d 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -264,11 +264,12 @@ restarted.
 Protocol features
 -

-#define VHOST_USER_PROTOCOL_F_MQ 0
-#define VHOST_USER_PROTOCOL_F_LOG_SHMFD  1
-#define VHOST_USER_PROTOCOL_F_RARP   2
-#define VHOST_USER_PROTOCOL_F_REPLY_ACK  3
-#define VHOST_USER_PROTOCOL_F_VHOST_PCI  4
+#define VHOST_USER_PROTOCOL_F_MQ   0
+#define VHOST_USER_PROTOCOL_F_LOG_SHMFD1
+#define VHOST_USER_PROTOCOL_F_RARP 2
+#define VHOST_USER_PROTOCOL_F_REPLY_ACK3
+#define VHOST_USER_PROTOCOL_F_VHOST_PCI4
+#define VHOST_USER_PROTOCOL_F_VERSATILE_SLAVE  5


 I would rather name it after the message, 
VHOST_USER_PROTOCOL_F_SET_DEV_INFO


OK, I will take this suggestion. Thanks. <*v2-AR1*>



 Message types
 -
@@ -514,6 +515,16 @@ Message types
   #define VHOST_USER_SET_PEER_CONNECTION_F_CREATE 2
   #define VHOST_USER_SET_PEER_CONNECTION_F_DESTROY  3

+ * VHOST_USER_SET_DEV_INFO
+
+  Id: 21
+  Equivalent ioctl: N/A
+  Master payload: u64

+
+  The master sends the device type info to the slave.


What is the meaning of this payload? Is it the virtio device id?
Better to be explicit about it.


If it is the case, I would name the message "VHOST_USER_SET_DEVICE_ID".


Yes, currently we only have the virtio device id as the payload. I was 
thinking that in the future we would have more info about the 
device. But no problem, I will change to use "VHOST_USER_SET_DEVICE_ID" 
for now. <*v2-AR2*>
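
For reference, a sketch of how the spec entry might read after the rename
(derived from the hunk quoted above; the exact wording is an assumption):

 * VHOST_USER_SET_DEVICE_ID

   Id: 21
   Equivalent ioctl: N/A
   Master payload: u64

   The master sends the virtio device id of the device that the slave
   should create.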



Best,
Wei



[Qemu-devel] [PATCH 2/2] migration/pcspk: Turn migration of pcspk off for 2.7 and older

2016-11-28 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

To keep backwards migration compatibility allow us to turn pcspk
migration off.

Signed-off-by: Dr. David Alan Gilbert 
---
 include/hw/i386/pc.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 67a1a9e..4b74130 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -395,6 +395,11 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
 .driver   = "Opteron_G3" "-" TYPE_X86_CPU,\
 .property = "stepping",\
 .value= "1",\
+},\
+{\
+.driver   = "isa-pcspk",\
+.property = "migrate",\
+.value= "off",\
 },
 
 #define PC_COMPAT_2_6 \
-- 
2.9.3




Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration

2016-11-28 Thread Paolo Bonzini


On 17/11/2016 13:16, Marcelo Tosatti wrote:
> What QEMU wants is to use KVM_GET_CLOCK at pre_save independently
> of whether masterclock is enabled or not... it just depends
> on KVM_GET_CLOCK being correct for the masterclock case
> (108b249c453dd7132599ab6dc7e435a7036c193f).
> 
> So a "reliable KVM_GET_CLOCK" (that does not timebackward
> when masterclock is enabled) is much simpler to userspace
> than "whether masterclock is enabled or not".
> 
> If you have a reason why that should not be the case,
> let me know.

I still cannot understand this case.

If the source has masterclock _disabled_, shouldn't it read kvmclock 
from memory?  But that's not what your patch does.  To be clear, what
I mean is:

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 653b0b4..6f6e2dc 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -97,6 +97,7 @@ static uint64_t kvm_get_clock(void)
 fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
 abort();
 }
+s->src_use_reliable_get_clock = data.flags & KVM_CLOCK_TSC_STABLE;
 return data.clock;
 }
 
@@ -110,34 +111,19 @@ static void kvmclock_vm_state_change(void *opaque, int 
running,
 
 if (running) {
 struct kvm_clock_data data = {};
-uint64_t pvclock_via_mem = 0;
 
-/* local (running VM) restore */
-if (s->clock_valid) {
-/*
- * if host does not support reliable KVM_GET_CLOCK,
- * read kvmclock value from memory
- */
-if (!kvm_has_adjust_clock_stable()) {
-pvclock_via_mem = kvmclock_current_nsec(s);
-}
-/* migration/savevm/init restore */
-} else {
-/*
- * use s->clock in case machine uses reliable
- * get clock and source host supported
- * reliable get clock
- */
-if (!s->src_use_reliable_get_clock) {
-pvclock_via_mem = kvmclock_current_nsec(s);
+/*
+ * if last KVM_GET_CLOCK did not retrieve a reliable value,
+ * we can't rely on the saved clock value.  Just discard it and
+ * read kvmclock value from memory
+ */
+if (!s->src_use_reliable_get_clock) {
+uint64_t pvclock_via_mem = kvmclock_current_nsec(s);
+if (pvclock_via_mem) {
+s->clock = pvclock_via_mem;
 }
 }
 
-/* We can't rely on the saved clock value, just discard it */
-if (pvclock_via_mem) {
-s->clock = pvclock_via_mem;
-}
-
 s->clock_valid = false;
 
 data.clock = s->clock;
@@ -181,16 +168,6 @@ static void kvmclock_realize(DeviceState *dev, Error 
**errp)
 {
 KVMClockState *s = KVM_CLOCK(dev);
 
-/*
- * On machine types that support reliable KVM_GET_CLOCK,
- * if host kernel does provide reliable KVM_GET_CLOCK,
- * set src_use_reliable_get_clock=true so that destination
- * avoids reading kvmclock from memory.
- */
-if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
-s->src_use_reliable_get_clock = true;
-}
-
 qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
 }
 



Re: [Qemu-devel] [PATCH v4] target-ppc: add vextu[bhw][lr]x instructions

2016-11-28 Thread Richard Henderson
On 11/27/2016 11:56 PM, Nikunj A Dadhania wrote:
> From: Avinesh Kumar 
> 
> vextublx: Vector Extract Unsigned Byte Left
> vextuhlx: Vector Extract Unsigned Halfword Left
> vextuwlx: Vector Extract Unsigned Word Left
> vextubrx: Vector Extract Unsigned Byte Right-Indexed VX-form
> vextuhrx: Vector Extract Unsigned  Halfword Right-Indexed VX-form
> vextuwrx: Vector Extract Unsigned Word Right-Indexed VX-form
> 
> Signed-off-by: Avinesh Kumar 
> Signed-off-by: Hariharan T.S. 
> [ implement using int128_rshift ]
> Signed-off-by: Nikunj A Dadhania 
> ---
> 
> v3:
> * Add the missing int128_getlo in the routine
> ---
>  target-ppc/cpu.h|  2 ++
>  target-ppc/helper.h |  6 ++
>  target-ppc/int_helper.c | 36 
>  target-ppc/translate/vmx-impl.inc.c | 23 +++
>  target-ppc/translate/vmx-ops.inc.c  |  8 ++--
>  5 files changed, 73 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson 


r~



[Qemu-devel] [PULL 2/2] arm: Create /chosen and /memory devicetree nodes if necessary

2016-11-28 Thread Peter Maydell
From: Guenter Roeck 

While customary, the /chosen and /memory devicetree nodes do not have to
exist. Create if necessary. Also create the /memory/device_type property
if needed.

Signed-off-by: Guenter Roeck 
Message-id: 1479346221-18474-1-git-send-email-li...@roeck-us.net
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 hw/arm/boot.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 942416d..ff621e4 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -9,6 +9,7 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include 
 #include "hw/hw.h"
 #include "hw/arm/arm.h"
 #include "hw/arm/linux-boot-if.h"
@@ -486,6 +487,17 @@ static int load_dtb(hwaddr addr, const struct 
arm_boot_info *binfo,
 g_free(nodename);
 }
 } else {
+Error *err = NULL;
+
+rc = fdt_path_offset(fdt, "/memory");
+if (rc < 0) {
+qemu_fdt_add_subnode(fdt, "/memory");
+}
+
+if (!qemu_fdt_getprop(fdt, "/memory", "device_type", NULL, &err)) {
+qemu_fdt_setprop_string(fdt, "/memory", "device_type", "memory");
+}
+
 rc = qemu_fdt_setprop_sized_cells(fdt, "/memory", "reg",
   acells, binfo->loader_start,
   scells, binfo->ram_size);
@@ -495,6 +507,11 @@ static int load_dtb(hwaddr addr, const struct 
arm_boot_info *binfo,
 }
 }
 
+rc = fdt_path_offset(fdt, "/chosen");
+if (rc < 0) {
+qemu_fdt_add_subnode(fdt, "/chosen");
+}
+
 if (binfo->kernel_cmdline && *binfo->kernel_cmdline) {
 rc = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs",
  binfo->kernel_cmdline);
-- 
2.7.4




[Qemu-devel] [PULL 0/2] target-arm queue

2016-11-28 Thread Peter Maydell
target-arm queue for 2.8 rc2: just two bugfixes.

thanks
-- PMM

The following changes since commit 00227fefd2059464cd2f59aed29944874c630e2f:

  Update version for v2.8.0-rc1 release (2016-11-22 22:29:08 +)

are available in the git repository at:

  git://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20161128

for you to fetch changes up to b77257d7bae26a0fca6a90af88d54ee2c45f5b61:

  arm: Create /chosen and /memory devicetree nodes if necessary (2016-11-28 
11:32:34 +)


target-arm queue:
 * hw/arm/boot: fix crash handling device trees with no /chosen
   or /memory nodes
 * generic-loader: only set PC if a CPU is specified


Alistair Francis (1):
  generic-loader: file: Only set a PC if a CPU is specified

Guenter Roeck (1):
  arm: Create /chosen and /memory devicetree nodes if necessary

 hw/arm/boot.c| 17 +
 hw/core/generic-loader.c |  7 ++-
 2 files changed, 23 insertions(+), 1 deletion(-)



[Qemu-devel] [PULL 1/2] generic-loader: file: Only set a PC if a CPU is specified

2016-11-28 Thread Peter Maydell
From: Alistair Francis 

This patch fixes the generic-loader file loading to only set the program
counter if a CPU is specified. This follows what is written in the
documentation and was always part of the original intention.

Signed-off-by: Alistair Francis 
Reviewed-by: Edgar E. Iglesias 
Message-id: 
537bf4d08be7acf7a89b590cff69e19db7f0a6cd.1478908712.git.alistair.fran...@xilinx.com
Signed-off-by: Peter Maydell 
---
 hw/core/generic-loader.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c
index 79ab6df..208f549 100644
--- a/hw/core/generic-loader.c
+++ b/hw/core/generic-loader.c
@@ -93,7 +93,12 @@ static void generic_loader_realize(DeviceState *dev, Error 
**errp)
"image");
 return;
 }
-s->set_pc = true;
+/* The user specified a file, only set the PC if they also specified
+ * a CPU to use.
+ */
+if (s->cpu_num != CPU_NONE) {
+s->set_pc = true;
+}
 } else if (s->addr) {
 /* User is setting the PC */
 if (s->data || s->data_len || s->data_be) {
-- 
2.7.4




Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases

2016-11-28 Thread Andrew Jones
On Mon, Nov 28, 2016 at 01:30:54PM +, Peter Maydell wrote:
> On 28 November 2016 at 11:58, Andrew Jones  wrote:
> > On Mon, Nov 28, 2016 at 11:14:48AM +, Peter Maydell wrote:
> >> On 28 November 2016 at 11:12, Alex Bennée  wrote:
> >> >
> >> > Andrew Jones  writes:
> >> >> I've skimmed over everything looking at it from a framework/style
> >> >> perspective. I didn't dig in trying to understand the tests though.
> >> >> One general comment, I see many tests introduce MAX_CPUS 8. Why do
> >> >> that? Why not allow all cpus by using NR_CPUS for the array sizes?
> >> >
> >> > Yeah - I can fix those. I wonder what the maximum is with GIC V3?
> >>
> >> So large that you don't want to hardcode it as an array size...
> >
> > 255 with the gic series, not yet merged.
> 
> I was talking about the architectural GICv3 limit, which is larger
> than that by many orders of magnitude. For QEMU it looks like
> MAX_CPUMASK_BITS is now 288 rather than 255.

Ah, yeah. So far we haven't considered testing limits beyond what
KVM supports, VGIC_V3_MAX_CPUS=255. However with TCG, and some
patience, we could attempt to test bigger limits. In that case,
though, we'll want to recompile kvm-unit-tests with a larger NR_CPUS
and run a specific unit test.

mach-virt still has 255 as well, mc->max_cpus = 255, so we'd have
to bump that too if we want to experiment.

Thanks,
drew



Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases

2016-11-28 Thread Andrew Jones
On Mon, Nov 28, 2016 at 03:04:45PM +0100, Andrew Jones wrote:
> On Mon, Nov 28, 2016 at 01:30:54PM +, Peter Maydell wrote:
> > On 28 November 2016 at 11:58, Andrew Jones  wrote:
> > > On Mon, Nov 28, 2016 at 11:14:48AM +, Peter Maydell wrote:
> > >> On 28 November 2016 at 11:12, Alex Bennée  wrote:
> > >> >
> > >> > Andrew Jones  writes:
> > >> >> I've skimmed over everything looking at it from a framework/style
> > >> >> perspective. I didn't dig in trying to understand the tests though.
> > >> >> One general comment, I see many tests introduce MAX_CPUS 8. Why do
> > >> >> that? Why not allow all cpus by using NR_CPUS for the array sizes?
> > >> >
> > >> > Yeah - I can fix those. I wonder what the maximum is with GIC V3?
> > >>
> > >> So large that you don't want to hardcode it as an array size...
> > >
> > > 255 with the gic series, not yet merged.
> > 
> > I was talking about the architectural GICv3 limit, which is larger
> > than that by many orders of magnitude. For QEMU it looks like
> > MAX_CPUMASK_BITS is now 288 rather than 255.
> 
> Ah, yeah. So far we haven't considered testing limits beyond what
> KVM supports, VGIC_V3_MAX_CPUS=255. However with TCG, and some
> patience, we could attempt to test bigger limits. In that case,
> though, we'll want to recompile kvm-unit-tests with a larger NR_CPUS
> and run a specific unit test.
> 
> mach-virt still has 255 as well, mc->max_cpus = 255, so we'd have
> to bump that too if we want to experiment.

Er... actually mach-virt is 123, as we only allocate 123 redistributors.

drew



Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases

2016-11-28 Thread Peter Maydell
On 28 November 2016 at 14:07, Andrew Jones  wrote:
> Er... actually mach-virt is 123, as we only allocate 123 redistributors.

Oh yes, I'd forgotten about that limit. We'd need to add
a KVM API for allocating redistributors in non-contiguous
bits of memory if we wanted to raise that.

thanks
-- PMM



Re: [Qemu-devel] [PATCH v2] pci-assign: sync MSI/MSI-X cap and table with PCIDevice

2016-11-28 Thread Paolo Bonzini


On 25/11/2016 18:05, Michael S. Tsirkin wrote:
> On Fri, Nov 25, 2016 at 10:55:22AM +0800, Peter Xu wrote:
>> Since commit e1d4fb2d ("kvm-irqchip: x86: add msi route notify fn"),
>> kvm_irqchip_add_msi_route() starts to use pci_get_msi_message() to fetch
>> MSI info. This requires that we setup MSI related fields in PCIDevice.
>> For most devices, that won't be a problem, as long as we are using
>> general interfaces like msi_init()/msix_init().
>>
>> However, for pci-assign devices, MSI/MSI-X is treated differently - PCI
>> assign devices maintain their own MSI table and cap information in the
>> AssignedDevice struct; however, that's not synced up with PCIDevice's
>> fields. That leads to pci_get_msi_message() failing to find the correct
>> MSI capability, even with a NULL msix_table.
>>
>> A quick fix is to sync up the two places: both the capability bits and
>> table address for MSI/MSI-X.
>>
>> Reported-by: Changlimin 
>> Tested-by: Changlimin 
>> Cc: qemu-sta...@nongnu.org
>> Fixes: e1d4fb2d ("kvm-irqchip: x86: add msi route notify fn")
>> Signed-off-by: Peter Xu 
> 
> 
> Reviewed-by: Michael S. Tsirkin 
> 
> Paolo, want to pick this up?

Yes.

Paolo


> 
>> ---
>> Do we still support pci-assign?
>>
>> v2:
>> - add (uint8_t *) for msix_table assignment [Limin]
>> ---
>>  hw/i386/kvm/pci-assign.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
>> index 8238fbc..87dcbdd 100644
>> --- a/hw/i386/kvm/pci-assign.c
>> +++ b/hw/i386/kvm/pci-assign.c
>> @@ -1251,6 +1251,7 @@ static int assigned_device_pci_cap_init(PCIDevice 
>> *pci_dev, Error **errp)
>>  error_propagate(errp, local_err);
>>  return -ENOTSUP;
>>  }
>> +dev->dev.cap_present |= QEMU_PCI_CAP_MSI;
>>  dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
>>  /* Only 32-bit/no-mask currently supported */
>>  ret = pci_add_capability2(pci_dev, PCI_CAP_ID_MSI, pos, 10,
>> @@ -1285,6 +1286,7 @@ static int assigned_device_pci_cap_init(PCIDevice 
>> *pci_dev, Error **errp)
>>  error_propagate(errp, local_err);
>>  return -ENOTSUP;
>>  }
>> +dev->dev.cap_present |= QEMU_PCI_CAP_MSIX;
>>  dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX;
>>  ret = pci_add_capability2(pci_dev, PCI_CAP_ID_MSIX, pos, 12,
>>&local_err);
>> @@ -1648,6 +1650,7 @@ static void 
>> assigned_dev_register_msix_mmio(AssignedDevice *dev, Error **errp)
>>  dev->msix_table = NULL;
>>  return;
>>  }
>> +dev->dev.msix_table = (uint8_t *)dev->msix_table;
>>  
>>  assigned_dev_msix_reset(dev);
>>  
>> @@ -1665,6 +1668,7 @@ static void 
>> assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
>>  error_report("error unmapping msix_table! %s", strerror(errno));
>>  }
>>  dev->msix_table = NULL;
>> +dev->dev.msix_table = NULL;
>>  }
>>  
>>  static const VMStateDescription vmstate_assigned_device = {
>> -- 
>> 2.7.4
>>



Re: [Qemu-devel] [qemu patch V3 2/2] kvmclock: reduce kvmclock difference on migration

2016-11-28 Thread Eduardo Habkost
Sorry for not noticing the following issues on the previous
reviews. I was only paying attention to the vmstate and machine
code:

On Mon, Nov 21, 2016 at 08:50:04AM -0200, Marcelo Tosatti wrote:
> Check for KVM_CAP_ADJUST_CLOCK capability KVM_CLOCK_TSC_STABLE, which
> indicates that KVM_GET_CLOCK returns a value as seen by the guest at
> that moment.
> 
> For new machine types, use this value rather than reading 
> from guest memory.
> 
> This reduces kvmclock difference on migration from 5s to 0.1s
> (when max_downtime == 5s).
> 
> Signed-off-by: Marcelo Tosatti 
> 
> ---
>  hw/i386/kvm/clock.c|  107 
> ++---
>  include/hw/i386/pc.h   |5 ++
>  target-i386/kvm.c  |7 +++
>  target-i386/kvm_i386.h |1 
>  4 files changed, 106 insertions(+), 14 deletions(-)
> 
> v2: 
> - improve variable names (Juan)
> - consolidate code on kvm_get_clock function (Paolo)
> - return mach_use_reliable_get_clock from needed function (Paolo)
> v3: 
> - simplify check for src_use_reliable_get_clock (Eduardo)
> - move src_use_reliable_get_clock initialization to realize (Eduardo)
> 
> 
> Index: qemu-mig-advance-clock/hw/i386/kvm/clock.c
> ===
> --- qemu-mig-advance-clock.orig/hw/i386/kvm/clock.c   2016-11-17 
> 15:07:11.220632761 -0200
> +++ qemu-mig-advance-clock/hw/i386/kvm/clock.c2016-11-17 
> 15:11:51.372048640 -0200
> @@ -36,6 +36,12 @@
>  
>  uint64_t clock;
>  bool clock_valid;
> +
> +/* whether machine supports reliable KVM_GET_CLOCK */
> +bool mach_use_reliable_get_clock;
> +
> +/* whether source host supported reliable KVM_GET_CLOCK */
> +bool src_use_reliable_get_clock;
>  } KVMClockState;
>  
>  struct pvclock_vcpu_time_info {
> @@ -81,6 +87,19 @@
>  return nsec + time.system_time;
>  }
>  
> +static uint64_t kvm_get_clock(void)
> +{
> +struct kvm_clock_data data;
> +int ret;
> +
> +ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> +if (ret < 0) {
> +fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> +abort();
> +}
> +return data.clock;
> +}
> +
>  static void kvmclock_vm_state_change(void *opaque, int running,
>   RunState state)
>  {
> @@ -91,15 +110,36 @@

Can you please use "diff -p" on your patches?

>  
>  if (running) {
>  struct kvm_clock_data data = {};
> -uint64_t time_at_migration = kvmclock_current_nsec(s);
> +uint64_t pvclock_via_mem = 0;
>  
> -s->clock_valid = false;
> +/* local (running VM) restore */
> +if (s->clock_valid) {
> +/*
> + * if host does not support reliable KVM_GET_CLOCK,
> + * read kvmclock value from memory
> + */
> +if (!kvm_has_adjust_clock_stable()) {

Why not use src_use_reliable_get_clock here, too? (Maybe rename
it to simply clock_is_reliable?)

You can change kvm_get_clock() to kvm_get_clock(KVMClockState*),
and make it update both s->clock and s->clock_is_reliable. With
this, s->clock and s->clock_is_reliable will be always in sync,
and have a single code path for both local restore and migration:

/*
 * If host that set s->clock did not support reliable
 * KVM_GET_CLOCK, read kvmclock value from memory
 */
if (!s->clock_is_reliable) {
pvclock_via_mem = kvmclock_current_nsec(s);
}
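
For illustration, a minimal sketch of the refactored helper this implies (the
signature and the clock_is_reliable field follow the suggestion above; the
rest mirrors the quoted patch and is an assumption, not the actual code):

static void kvm_get_clock(KVMClockState *s)
{
    struct kvm_clock_data data;
    int ret;

    ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
    if (ret < 0) {
        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
        abort();
    }
    s->clock = data.clock;
    /* Remember whether this value was taken with a stable TSC, so the
     * restore path above knows if it can trust s->clock. */
    s->clock_is_reliable = !!(data.flags & KVM_CLOCK_TSC_STABLE);
}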

Updating the table from my previous message, the result would be:

        +-----------------------------+-----------------------------+
        | kvm_has_adj_clock_stable()  |      clock_is_reliable      |
machine |     src      |     dst      |     src      |   dst (*)    |
--------+--------------+--------------+--------------+--------------+
pc-2.8  | false        | false        | false        | false        |
pc-2.8  | false        | true         | false        | false        |
pc-2.8  | true         | false        | true         | true         |
pc-2.8  | true         | true         | true         | true         |
--------+--------------+--------------+--------------+--------------+
pc-2.7  | false        | (any)        | false        | false        |
pc-2.7  | true         | (any)        | true (**)    | false        |
--------+--------------+--------------+--------------+--------------+

(*) More precisely: in the first kvmclock_vm_state_change() call
just after migration. It will get updated the next time
s->clock is set.

(**) On purpose, so the code I suggest above would work for local
 restore, too.


> +pvclock_via_mem = kvmclock_current_nsec(s);
> +}
> +/* migration/savevm/init restore */
> +} else {
> +/*
> + * use s->clock in case machine uses reliable
> + * get clock and source host supported
> + * reliable get clock
> + */
> +if (!s->src_use_reliable_get_clock) {
> +pvclock_via_mem = kvmclock_current_nsec(s);
> +}
> +}
>  
> -   

Re: [Qemu-devel] [PATCH v7 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support

2016-11-28 Thread Stefan Hajnoczi
On Mon, Nov 28, 2016 at 10:23:41AM +, Ketan Nilangekar wrote:
> 
> 
> On 11/25/16, 5:05 PM, "Stefan Hajnoczi"  wrote:
> 
> On Fri, Nov 25, 2016 at 08:27:26AM +, Ketan Nilangekar wrote:
> > On 11/24/16, 9:38 PM, "Stefan Hajnoczi"  wrote:
> > On Thu, Nov 24, 2016 at 11:31:14AM +, Ketan Nilangekar wrote:
> > > On 11/24/16, 4:41 PM, "Stefan Hajnoczi"  
> wrote:
> > > On Thu, Nov 24, 2016 at 05:44:37AM +, Ketan Nilangekar 
> wrote:
> > > > On 11/24/16, 4:07 AM, "Paolo Bonzini"  
> wrote:
> > > > >On 23/11/2016 23:09, ashish mittal wrote:
> > > > >> On the topic of protocol security -
> > > > >> 
> > > > >> Would it be enough for the first patch to implement only
> > > > >> authentication and not encryption?
> > > > >
> > > > >Yes, of course.  However, as we introduce more and more 
> QEMU-specific
> > > > >characteristics to a protocol that is already 
> QEMU-specific (it doesn't
> > > > >do failover, etc.), I am still not sure of the actual 
> benefit of using
> > > > >libqnio versus having an NBD server or FUSE driver.
> > > > >
> > > > >You have already mentioned performance, but the design has 
> changed so
> > > > >much that I think one of the two things has to change: 
> either failover
> > > > >moves back to QEMU and there is no (closed source) 
> translator running on
> > > > >the node, or the translator needs to speak a well-known and
> > > > >already-supported protocol.
> > > > 
> > > > IMO design has not changed. Implementation has changed 
> significantly. I would propose that we keep resiliency/failover code out of 
> QEMU driver and implement it entirely in libqnio as planned in a subsequent 
> revision. The VxHS server does not need to understand/handle failover at all. 
> > > > 
> > > > Today libqnio gives us significantly better performance 
> than any NBD/FUSE implementation. We know because we have prototyped with 
> both. Significant improvements to libqnio are also in the pipeline which will 
> use cross memory attach calls to further boost performance. Of course a big 
> reason for the performance is also the HyperScale storage backend but we 
> believe this method of IO tapping/redirecting can be leveraged by other 
> solutions as well.
> > > 
> > > By "cross memory attach" do you mean
> > > process_vm_readv(2)/process_vm_writev(2)?
> > >   
> > > Ketan> Yes.
> > >   
> > > That puts us back to square one in terms of security.  You 
> have
> > > (untrusted) QEMU + (untrusted) libqnio directly accessing the 
> memory of
> > > another process on the same machine.  That process is 
> therefore also
> > > untrusted and may only process data for one guest so that 
> guests stay
> > > isolated from each other.
> > > 
> > > Ketan> Understood but this will be no worse than the current 
> network based communication between qnio and vxhs server. And although we 
> have questions around QEMU trust/vulnerability issues, we are looking to 
> implement basic authentication scheme between libqnio and vxhs server.
> > 
> > This is incorrect.
> > 
> > Cross memory attach is equivalent to ptrace(2) (i.e. debugger) 
> access.
> > It means process A reads/writes directly from/to process B memory.  
> Both
> > processes must have the same uid/gid.  There is no trust boundary
> > between them.
> > 
> > Ketan> Not if vxhs server is running as root and initiating the cross 
> mem attach. Which is also why we are proposing a basic authentication 
> mechanism between qemu-vxhs. But anyway the cross memory attach is for a near 
> future implementation. 
> > 
> > Network communication does not require both processes to have the 
> same
> > uid/gid.  If you want multiple QEMU processes talking to a single 
> server
> > there must be a trust boundary between client and server.  The 
> server
> > can validate the input from the client and reject undesired 
> operations.
> > 
> > Ketan> This is what we are trying to propose. With the addition of 
> authentication between qemu-vxhs server, we should be able to achieve this. 
> Question is, would that be acceptable?
> > 
> > Hope this makes sense now.
> > 
> > Two architectures that implement the QEMU trust model correctly are:
> > 
> > 1. Cross memory attach: each QEMU process has a dedicated vxhs 
> server
> >process to prevent guests from attacking each other.  This is 
> where I
> >said you might as well put the code inside QEMU since there is no
>

Re: [Qemu-devel] [PATCH v3 10/11] tcg-mips: Adjust qemu_ld/st for mips64

2016-11-28 Thread Richard Henderson
On 11/27/2016 10:59 PM, Jin Guojie wrote:
> In Richard's v2 patch (shown as below), the compilation on mips64 host is 
> disabled.
> 
> -#define LO_OFF(MIPS_BE * 4)
> -#define HI_OFF(4 - LO_OFF)
> +#if TCG_TARGET_REG_BITS == 32
> +# define LO_OFF  (MIPS_BE * 4)
> +# define HI_OFF  (4 - LO_OFF)
> +#else
> +extern int link_error(void);
> +# define LO_OFF  link_error()
> +# define HI_OFF  link_error()
> +#endif
> 
> When I compiled this patch on Loongson as mips64el, a link error occurred:
> 
>   LINKi386-softmmu/qemu-system-i386
> tcg/tcg.o: In function `tcg_out_tlb_load':
> tcg-target.inc.c:1252: undefined reference to `link_error'

The intent of the link_error is to ensure that all paths that lead to the use
of the symbol are eliminated by the compiler.  Therefore all uses must be
protected by some trivially eliminated condition like

  TCG_TARGET_REG_BITS < TARGET_LONG_BITS

which can only be true for 32-bit host and 64-bit guest.

Within the code that I have here, I only see two uses of LO/HI_OFF within
tcg_out_tlb_load, both of which are protected by this condition.

 /* Load the (low half) tlb comparator.  Mask the page bits, keeping the
alignment bits to compare against.  */
 if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
 tcg_out_ld(s, TCG_TYPE_I32, TCG_TMP0, TCG_REG_A0, cmp_off + LO_OFF);
 tcg_out_movi(s, TCG_TYPE_I32, TCG_TMP1, mask);
...
 /* Load and test the high half tlb comparator.  */
 if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
 /* delay slot */
 tcg_out_ld(s, TCG_TYPE_I32, TCG_TMP0, TCG_REG_A0, cmp_off + HI_OFF);

The line numbers don't match up with what you quote, so I'm curious exactly
which use is generating the error.


> Frankly speaking, I didn't take n32 into consideration until you pointed
> that out. I feel that n32 is rarely used in the current and future market.
> I cannot even find an n32 debian distribution in Aurelien's qemu image
> collection. How about leaving n32 as a TODO feature?

You're probably right.  Especially in the case of qemu, we really want the host
to have a 64-bit virtual address space within which we can store the guest
memory.  The 2G available to N32 is only good for really small guests.

In this case, however, it seems just as easy to get it right, since we can test
this code with N64.


r~



Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration

2016-11-28 Thread Eduardo Habkost
On Mon, Nov 28, 2016 at 02:47:18PM +0100, Paolo Bonzini wrote:
> 
> 
> On 17/11/2016 13:16, Marcelo Tosatti wrote:
> > What QEMU wants is to use KVM_GET_CLOCK at pre_save independently
> > of whether masterclock is enabled or not... it just depends
> > on KVM_GET_CLOCK being correct for the masterclock case
> > (108b249c453dd7132599ab6dc7e435a7036c193f).
> > 
> > So a "reliable KVM_GET_CLOCK" (that does not timebackward
> > when masterclock is enabled) is much simpler to userspace
> > than "whether masterclock is enabled or not".
> > 
> > If you have a reason why that should not be the case,
> > let me know.
> 
> I still cannot understand this case.
> 
> If the source has masterclock _disabled_, shouldn't it read kvmclock 
> from memory?  But that's not what your patch does.  To be clear, what
> I mean is:
> 
> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index 653b0b4..6f6e2dc 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -97,6 +97,7 @@ static uint64_t kvm_get_clock(void)
>  fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
>  abort();
>  }
> +s->src_use_reliable_get_clock = data.flags & KVM_CLOCK_TSC_STABLE;

I still don't understand the reasoning behind
kvm_has_adjust_clock_stable() vs (flags & KVM_CLOCK_TSC_STABLE),
but in either case, updating src_use_reliable_get_clock inside
kvm_get_clock() looks like the right thing to do.

>  return data.clock;
>  }
>  
> @@ -110,34 +111,19 @@ static void kvmclock_vm_state_change(void *opaque, int 
> running,
>  
>  if (running) {
>  struct kvm_clock_data data = {};
> -uint64_t pvclock_via_mem = 0;
>  
> -/* local (running VM) restore */
> -if (s->clock_valid) {
> -/*
> - * if host does not support reliable KVM_GET_CLOCK,
> - * read kvmclock value from memory
> - */
> -if (!kvm_has_adjust_clock_stable()) {
> -pvclock_via_mem = kvmclock_current_nsec(s);
> -}
> -/* migration/savevm/init restore */
> -} else {
> -/*
> - * use s->clock in case machine uses reliable
> - * get clock and source host supported
> - * reliable get clock
> - */
> -if (!s->src_use_reliable_get_clock) {
> -pvclock_via_mem = kvmclock_current_nsec(s);
> +/*
> + * if last KVM_GET_CLOCK did not retrieve a reliable value,
> + * we can't rely on the saved clock value.  Just discard it and
> + * read kvmclock value from memory
> + */
> +if (!s->src_use_reliable_get_clock) {
> +uint64_t pvclock_via_mem = kvmclock_current_nsec(s);
> +if (pvclock_via_mem) {
> +s->clock = pvclock_via_mem;
>  }
>  }

This is equivalent to the logic I suggested on my reply to v3.
Much simpler.

>  
> -/* We can't rely on the saved clock value, just discard it */
> -if (pvclock_via_mem) {
> -s->clock = pvclock_via_mem;
> -}
> -
>  s->clock_valid = false;
>  
>  data.clock = s->clock;
> @@ -181,16 +168,6 @@ static void kvmclock_realize(DeviceState *dev, Error 
> **errp)
>  {
>  KVMClockState *s = KVM_CLOCK(dev);
>  
> -/*
> - * On machine types that support reliable KVM_GET_CLOCK,
> - * if host kernel does provide reliable KVM_GET_CLOCK,
> - * set src_use_reliable_get_clock=true so that destination
> - * avoids reading kvmclock from memory.
> - */
> -if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
> -s->src_use_reliable_get_clock = true;
> -}
> -
>  qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
>  }
>  

-- 
Eduardo



Re: [Qemu-devel] [PATCH] target-m68k: Implement bfffo

2016-11-28 Thread Richard Henderson
On 11/27/2016 11:53 AM, Laurent Vivier wrote:
>>  tcg_gen_andi_i32(tmp, tmp, 31);
>> >  mask = tcg_const_i32(0x7fffu);
>> >  tcg_gen_shr_i32(mask, mask, tmp);
>> > +if (!TCGV_IS_UNUSED(tlen)) {
>> > +tcg_gen_mov_i32(tlen, tmp);
> We must add 1 here, otherwise the value stays between 0 and 31, not 1
> and 32:
> 
> +tcg_gen_addi_i32(tlen, tmp, 1);
> 

Yep, good catch.


r~



Re: [Qemu-devel] [PATCH v2 3/5] target-m68k: Inline shifts

2016-11-28 Thread Richard Henderson
On 11/27/2016 09:53 AM, Laurent Vivier wrote:
>> > +TCGv t0 = tcg_temp_new();
>> > +tcg_gen_sari_i32(QREG_CC_V, reg, bits - 1);
>> > +tcg_gen_sari_i32(t0, t0, bits - count);
> t0 is used uninitialized; I think we should have here:
> 
> tcg_gen_sari_i32(t0, reg, bits - count - 1);
> 
> moreover we must use "bits - count - 1" to use also the current most
> significant bit to compute V flag.
> 

Yep.


r~



Re: [Qemu-devel] [PATCH for-2.8 0/4] Allow 'cache-clean-interval' in Linux only

2016-11-28 Thread Kevin Wolf
Am 25.11.2016 um 12:27 hat Alberto Garcia geschrieben:
> Hi all,
> 
> The cache-clean-interval setting of qcow2 frees the memory of the L2
> cache tables that haven't been used after a certain interval of time.
> 
> QEMU uses madvise() with MADV_DONTNEED for this. After that call, the
> data in the specified cache tables is discarded by the kernel. The
> problem with this behavior is that it is Linux-specific. madvise()
> itself is not a standard system call and while other implementations
> (e.g. FreeBSD) also have MADV_DONTNEED, they don't share the same
> semantics.
> 
> POSIX defines posix_madvise(), which has POSIX_MADV_DONTNEED, and
> that's what QEMU uses in systems that don't implement madvise().
> However POSIX_MADV_DONTNEED also has different semantics and cannot be
> used for our purposes. As a matter of fact, in glibc it is a no-op:
> 
> https://github.molgen.mpg.de/git-mirror/glibc/blob/glibc-2.23/sysdeps/unix/sysv/linux/posix_madvise.c
> 
> So while this all is mentioned in the QEMU documentation, there's
> nothing preventing users of other systems from trying to use this
> feature. In non-Linux systems it is worse than a no-op: it invalidates
> perfectly valid cache tables for no reason without freeing their
> memory.
> 
> This series makes Linux a hard requirement for cache-clean-interval
> and prints an error message in other systems.
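
For illustration, a minimal sketch of the kind of guard this describes (the
option-parsing context, error message and return value are assumptions, not
the actual series):

    cache_clean_interval = qemu_opt_get_number(opts, "cache-clean-interval", 0);
#ifndef CONFIG_LINUX
    if (cache_clean_interval != 0) {
        /* Only Linux's madvise(MADV_DONTNEED) actually frees the memory;
         * posix_madvise(POSIX_MADV_DONTNEED) is merely a hint. */
        error_setg(errp, "cache-clean-interval is not supported on this host");
        return -EINVAL;
    }
#endif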

Thanks, applied to the block branch.

Kevin



Re: [Qemu-devel] [PATCH v2 3/5] target-m68k: Inline shifts

2016-11-28 Thread Richard Henderson
On 11/27/2016 11:30 AM, Laurent Vivier wrote:
> There is another bug on this one.
> 
> Le 09/11/2016 à 14:46, Richard Henderson a écrit :
>> diff --git a/target-m68k/translate.c b/target-m68k/translate.c
>> index 4f224d7..1b3765f 100644
>> --- a/target-m68k/translate.c
>> +++ b/target-m68k/translate.c
>> +static inline void shift_reg(DisasContext *s, uint16_t insn, int opsize)
> ...
>> +/* M68000 sets V if the most significant bit is changed at
>> + * any time during the shift operation.  Do this via creating
>> + * an extension of the sign bit, comparing, and discarding
>> + * the bits below the sign bit.  I.e.
>> + * int64_t s = (intN_t)reg;
>> + * int64_t t = (int64_t)(intN_t)reg << count;
>> + * V = ((s ^ t) & (-1 << (bits - 1))) != 0
>> + */
>> +if (!logical && m68k_feature(s->env, M68K_FEATURE_M68000)) {
>> +/* Sign extend the input to 64 bits; re-do the shift.  */
>> +tcg_gen_ext_i32_i64(t64, reg);
>> +tcg_gen_shl_i64(s64, t64, s64);
>> +/* Clear all bits that are unchanged.  */
>> +tcg_gen_xor_i64(t64, t64, s64);
>> +/* Ignore the bits below the sign bit.  */
>> +tcg_gen_andi_i64(t64, t64, -1ULL << (bits - 1));
>> +/* If any bits remain set, we have overflow.  */
>> +tcg_gen_setcondi_i64(TCG_COND_NE, t64, t64, 0);
>> +tcg_gen_extrl_i64_i32(QREG_CC_V, t64);
>> +tcg_gen_neg_i32(QREG_CC_V, QREG_CC_V);
> 
> if s64 is greater than 32, we lose all the bits needed to compute the V
> flag. I think we can just add this to fix the problem:
> 
>  if (!logical && m68k_feature(s->env, M68K_FEATURE_M68000)) {
> +TCGv_i64 tt = tcg_const_i64(32);
> +/* if shift is greater than 32, use 32 */
> +tcg_gen_movcond_i64(TCG_COND_GT, s64, s64, tt, tt, s64);
> +tcg_temp_free_i64(tt);
>  /* Sign extend the input to 64 bits; re-do the shift.  */
>  tcg_gen_ext_i32_i64(t64, reg);
>  tcg_gen_shl_i64(s64, t64, s64);

Hmm.  I guess the test case is input like 8 << 63, where the sign bit is clear,
the input is non-zero, and we shift out all of the set bits.

I can't think of anything cleaner than your movcond solution.


r~



Re: [Qemu-devel] [PATCH v2 3/5] target-m68k: Inline shifts

2016-11-28 Thread Richard Henderson
On 11/27/2016 11:35 AM, Laurent Vivier wrote:
>> > +tcg_gen_extr_i64_i32(QREG_CC_N, QREG_CC_C, t64);
> This does not extract correctly the C flag when the opsize is word or byte.
> I think we should use a shift instead:
> 
> -tcg_gen_extr_i64_i32(QREG_CC_N, QREG_CC_C, t64);
> -
> -/* Note that C=0 if shift count is 0, and we get that for free.  */
> +if (opsize == OS_LONG) {
> +tcg_gen_extr_i64_i32(QREG_CC_N, QREG_CC_C, t64);
> +/* Note that C=0 if shift count is 0, and we get that for
> free.  */
> +} else {
> +TCGv zero = tcg_const_i32(0);
> +tcg_gen_extrl_i64_i32(QREG_CC_N, t64);
> +tcg_gen_shri_i64(t64, t64, bits);
> +tcg_gen_extrl_i64_i32(QREG_CC_C, t64);
> +tcg_gen_movcond_i32(TCG_COND_EQ, QREG_CC_C,
> +s32, zero, zero, QREG_CC_C);
> +tcg_temp_free(zero);
> +}
> 
> Do you have a better idea?

  if (opsize == OS_LONG) {
tcg_gen_extr_i64_i32(QREG_CC_N, QREG_CC_C, t64);
  } else {
tcg_gen_extrl_i64_i32(QREG_CC_N, t64);
tcg_gen_shri_i32(QREG_CC_C, QREG_CC_N, bits);
  }

Since we zero-extend the input from bits, it's still true that a zero shift
gets C=0 for free.


r~



Re: [Qemu-devel] [PATCH v2 4/5] target-m68k: Implement bitfield ops for registers

2016-11-28 Thread Richard Henderson
On 11/27/2016 11:46 AM, Laurent Vivier wrote:
>> +uint32_t maski = -2U << (len - 1);
>> > +uint32_t roti = (ofs + len) & 31;
>> > +tcg_gen_andi_i32(tmp, src, maski);
> should be:
> 
>tcg_gen_andi_i32(tmp, src, ~maski);
> 
> Is it correct?

Yes.


r~



[Qemu-devel] [Bug 1645287] [NEW] Option "split" does not available for kernel_irqchip flag in qemu-system-x86_64

2016-11-28 Thread Po-Hsu Lin
Public bug reported:

On releases prior to Yakkety, the "split" option is not available for
kernel_irqchip flag in qemu-system-x86_64.

Yakkety:
kernel_irqchip=on|off|split controls accelerated irqchip support (default=off)


Xenial:
kernel_irqchip=on|off controls accelerated irqchip support

Trusty:
kernel_irqchip=on|off controls accelerated irqchip support

Precise:
kernel_irqchip=on|off controls accelerated irqchip support

It will be great to have this option, as we will need this for some kvm-
unit-tests for SRU
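
For reference, the kind of invocation this option would enable (the machine
and accelerator options shown are only illustrative):

  qemu-system-x86_64 -machine pc,accel=kvm,kernel_irqchip=split ...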

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1645287

Title:
  Option "split" does not available for kernel_irqchip flag in qemu-
  system-x86_64

Status in QEMU:
  New

Bug description:
  On releases prior to Yakkety, the "split" option is not available for
  kernel_irqchip flag in qemu-system-x86_64.

  Yakkety:
  kernel_irqchip=on|off|split controls accelerated irqchip support (default=off)

  
  Xenial:
  kernel_irqchip=on|off controls accelerated irqchip support

  Trusty:
  kernel_irqchip=on|off controls accelerated irqchip support

  Precise:
  kernel_irqchip=on|off controls accelerated irqchip support

  It will be great to have this option, as we will need this for some
  kvm-unit-tests for SRU

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1645287/+subscriptions



Re: [Qemu-devel] Linux kernel polling for QEMU

2016-11-28 Thread Eliezer Tamir
+ Eric, Willem

On 24/11/2016 17:12, Stefan Hajnoczi wrote:
> I looked through the socket SO_BUSY_POLL and blk_mq poll support in
> recent Linux kernels with an eye towards integrating the ongoing QEMU
> polling work.  The main missing feature is eventfd polling support which
> I describe below.
...
> State of polling in Linux
> -
> SO_BUSY_POLL causes recvmsg(2), select(2), and poll(2) family system
> calls to spin awaiting new receive packets.  From what I can tell epoll
> is not supported so that system call will sleep without polling.

At the time I sent out an RFC for epoll() SO_BUSY_POLL support.
https://lkml.org/lkml/2013/8/21/192

In hindsight I think the way I tracked sockets was over-complicated.
What I would do today would be to extend the API to allow the user
to tell epoll which socket/queue combinations are interesting.

I would love to collaborate on this with you, though I must confess that
my resources at the moment are limited and the setup I used for testing
no longer exists.

-Eliezer



Re: [Qemu-devel] [PATCH 0/2] pcspk migration compatibility

2016-11-28 Thread Dr. David Alan Gilbert
* Paolo Bonzini (pbonz...@redhat.com) wrote:
> 
> 
> On 28/11/2016 14:31, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > Hi,
> >   39c88f56 added VMState for pcspk but turned it on for
> > all machine types, this breaks backwards compatibility
> > to older machine types.
> > 
> >   If this is too late for 2.8 then I suggest we take the 1st
> > of these two patches, which just makes it a property to flip
> > for those of us who need it.  Adding the 2nd patch after 2.8
> > might cause problems for people using 2.8 with 2.7 machine
> > type.
> > 
> > Dave
> > 
> > Dr. David Alan Gilbert (2):
> >   migration/pcspk: Add a property to state if pcspk is migrated
> >   migration/pcspk: Turn migration of pcspk off for 2.7 and older
> > 
> >  hw/audio/pcspk.c | 10 ++
> >  include/hw/i386/pc.h |  5 +
> >  2 files changed, 15 insertions(+)
> > 
> 
> Wow, I didn't know optional sections existed.  We could have used it for
> hw/char/parallel.c, but that was added only a couple months before
> optional sections.

I'd forgotten that they existed and was about to write them to
solve this problem, and then found the code I apparently reviewed
a year or so ago :-)

> Queued for 2.8.

Thanks.

Dave

> Paolo
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-devel] [Bug 1645355] [NEW] x86: singlestepping through SYSCALL instruction causes exception in kernelspace

2016-11-28 Thread Rudolf Marek
Public bug reported:

Hi,

The bug was originally reported in [1] and [2]. There is a problem
inside QEMU with single-stepping from userspace up to a SYSCALL instruction.
The OS has the TF bit set in FMASK, therefore there should be no
single-stepping exception when transitioning to kernel mode. But inside
QEMU there is one (even though TF is cleared, i.e. FMASK seems to be
applied). See below for further details.

The reproducer is available at [2].

Here is the original text with some minor clarifications:

It seems that there is something wrong with QEMU with respect to handling
single-stepping and the AMD64 syscall instruction.

The AMD "syscall" instruction will clear defined flag in the FMASK MSR.
Normally the TF flag is set there, so the first instruction when kernel
is entered after syscall won't cause single step exception in the
kernel.
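
For context, the architectural behaviour being relied on, in simplified
pseudo-C (a sketch, not QEMU code):

    /* AMD64 SYSCALL entry, simplified */
    RCX     = RIP;          /* return address                     */
    R11     = RFLAGS;       /* flags saved for SYSRET             */
    RFLAGS &= ~FMASK;       /* TF set in FMASK => TF cleared here */
    RIP     = LSTAR;        /* kernel entry point                 */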

The observed scenario is an unhandled single-step fault in the kernel, a
host reboot, or a QEMU crash.

One possible way to reproduce it is to single-step through any userspace
function which executes a "syscall" instruction. After the syscall is entered,
QEMU will trigger a single-stepping exception in the kernel despite TF being
set in the FMASK MSR. Real HW behaves correctly and does not trigger this
exception.


What is interesting is that I was not able to trigger it if I just enabled TF 
and did the syscall instruction; perhaps it is important for this bug that TF 
was set for the previous few instructions.


I stumbled on this problem while working with our custom OS. However, after 
some googling I found out that the NetBSD guys (CCed) are having a very 
similar problem, and I asked them to prepare an ISO image where the problem 
ends with a QEMU SIGSEGV or a host reboot. 

Thanks
Rudolf

[1] https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg02289.html
[2] http://gnats.netbsd.org/49603

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1645355

Title:
  x86: singlestepping through SYSCALL instruction causes exception in
  kernelspace

Status in QEMU:
  New

Bug description:
  Hi,

  The bug was originally reported in [1] and [2]. There is a problem
  inside QEMU with single-stepping from userspace up to a SYSCALL
  instruction. The OS has the TF bit set in FMASK, therefore there
  should be no single-stepping exception when transitioning to kernel
  mode. But inside QEMU there is one (even though TF is cleared, i.e.
  FMASK seems to be applied). See below for further details.

  The reproducer is available at [2].

  Here is the original text with some minor clarifications:

  It seems that there is something wrong with QEMU with respect to
  handling single-stepping and the AMD64 syscall instruction.

  The AMD "syscall" instruction clears the flags that are set in the
  FMASK MSR. Normally the TF flag is set there, so the first instruction
  executed after the kernel is entered via syscall won't cause a
  single-step exception in the kernel.

  The observed scenario is an unhandled single-step fault in the kernel,
  a host reboot, or a QEMU crash.

  One possible way to reproduce it is to single-step through any
  userspace function which executes a "syscall" instruction. After the
  syscall is entered, QEMU will trigger a single-stepping exception in
  the kernel despite TF being set in the FMASK MSR. Real HW behaves
  correctly and does not trigger this exception.

  
  What is interesting is that I was not able to trigger it if I just enabled TF 
  and did the syscall instruction; perhaps it is important for this bug that TF 
  was set for the previous few instructions.

  
  I stumbled on this problem while working with our custom OS. However, after 
  some googling I found out that the NetBSD guys (CCed) are having a very 
  similar problem, and I asked them to prepare an ISO image where the problem 
  ends with a QEMU SIGSEGV or a host reboot. 

  Thanks
  Rudolf

  [1] https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg02289.html
  [2] http://gnats.netbsd.org/49603

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1645355/+subscriptions



Re: [Qemu-devel] [PATCH 0/2] pcspk migration compatibility

2016-11-28 Thread Paolo Bonzini


On 28/11/2016 14:31, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Hi,
>   39c88f56 added VMState for pcspk but turned it on for
> all machine types, this breaks backwards compatibility
> to older machine types.
> 
>   If this is too late for 2.8 then I suggest we take the 1st
> of these two patches, which just makes it a property to flip
> for those of us who need it.  Adding the 2nd patch after 2.8
> might cause problems for people using 2.8 with 2.7 machine
> type.
> 
> Dave
> 
> Dr. David Alan Gilbert (2):
>   migration/pcspk: Add a property to state if pcspk is migrated
>   migration/pcspk: Turn migration of pcspk off for 2.7 and older
> 
>  hw/audio/pcspk.c | 10 ++
>  include/hw/i386/pc.h |  5 +
>  2 files changed, 15 insertions(+)
> 

Wow, I didn't know optional sections existed.  We could have used it for
hw/char/parallel.c, but that was added only a couple months before
optional sections.

Queued for 2.8.

Paolo



Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration

2016-11-28 Thread Paolo Bonzini


On 28/11/2016 15:28, Eduardo Habkost wrote:
> > +s->src_use_reliable_get_clock = data.flags & KVM_CLOCK_TSC_STABLE;
> 
> I still don't understand the reasoning behind
> kvm_has_adjust_clock_stable() vs (flags & KVM_CLOCK_TSC_STABLE),
> but on either case, updating src_use_reliable_get_clock inside
> kvm_get_clock() looks like the right thing to do.

There are three possibilities: the kernel tells you the clock is stable,
the kernel tells you the clock is unstable, or the kernel is too old and
doesn't tell you anything.  Then:

kvm_has_adjust_clock_stable() == true:
if the clock is stable, KVM_CLOCK_TSC_STABLE will be set in "flags"
if the clock is unstable, KVM_CLOCK_TSC_STABLE will be unset

kvm_has_adjust_clock_stable() == false:
you cannot know if the clock is stable
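
In code form, roughly (a sketch; names other than those above are illustrative):

static bool kvm_clock_is_reliable(struct kvm_clock_data *data)
{
    if (kvm_has_adjust_clock_stable()) {
        /* New enough kernel: stability is reported in data->flags. */
        return !!(data->flags & KVM_CLOCK_TSC_STABLE);
    }
    /* Too old a kernel: no way to know, treat the value as unreliable. */
    return false;
}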

Paolo



Re: [Qemu-devel] [PATCH] target-arm: Add VBAR support to ARM1176 CPUs

2016-11-28 Thread Cédric Le Goater
On 11/28/2016 11:40 AM, Peter Maydell wrote:
> On 24 November 2016 at 14:29, Cédric Le Goater  wrote:
>> On 09/05/2016 04:39 PM, Peter Maydell wrote:
>>> We implement VBAR in v7-without-EL3 even though architecturally
>>> it should only exist in v7-with-EL3 because we have some
>>> legacy board models which we implement as without-EL3 but
>>> where the guest (Linux) assumes it's running on a with-EL3
>>> CPU in NS mode. I'd rather we stick to the architectural
>>> definition for 1176 if we can rather than expanding the
>>> "things we do which aren't architectural" set if there's
>>> no legacy config that requires it. (If it is necessary
>>> to define it for 1176 always then that's OK, I'd just
>>> prefer not to unless we know we have to.)
>>
>> According to the ARM spec, 1176 is a v6 with a couple of
>> extensions, among which v6k and TrustZone. So it has Secure
>> and Non-Secure VBAR and MVBAR registers.
>>
>> Looking at :
>>
>>   https://en.wikipedia.org/wiki/List_of_ARM_microarchitectures
>>
>> May be, we should instead introduce a new ARMv6Z architecture
>> for the 1176 cpu which would represent a ARMv6 + TrustZone ?
>>
>>
>> But I am a bit confused with this EL3 feature. 1176 does not
>> have EL3 to my understanding (I am beyond my ARM skills there).
> 
> EL3 == TrustZone's Monitor mode level of privilege. In
> ARMv8 the privilege level model was tidied up a bit so it's
> more cleanly arranged in levels EL0 (user) EL1 (kernel)
> EL2 (hypervisor) EL3 (secure monitor), and the old 32-bit
> setup can be expressed in terms of that (User being EL0,
> the various Sys/Fiq/Irq/etc modes being EL1, Hyp at EL2
> and Mon at EL3). For QEMU we implemented TrustZone support
> after the v8 spec was published, so we had the opportunity
> to do it in a way that fit nicely with 64-bit TZ support.
> So ARM_FEATURE_EL3 is our "does this CPU have TrustZone"
> flag, in the same way that ARM_FEATURE_EL2 is "does it have
> the Virtualization extensions".
>
> We also support 1176-without-TrustZone, which doesn't
> exist in real hardware but which we have for backwards
> compatibility with code that runs on our 1176 boards
> which we implemented before we had TZ support in the
> CPU implementation.

Thanks for the explanation. It is clearer now.

> What I'm suggesting is that we shoulld provide VBAR
> only in the 1176-with-TZ CPUs, not in the 1176-no-TZ
> CPUs, unless we must provide it for the latter as a
> back-compat requirement.

I don't think we need it for the back-compat requirement. 

But I need to rework the patch to keep VBAR in any case 
for the v7-without-EL3 CPUs and remove it for the 1176 CPUs 
when 'has_el3' is not set.

Thanks,

C. 

>> So why is it part of the feature list in arm1176_initfn ?
>> Is that a way to define the MVBAR and SCR regs ?
> 
> The CPU object defaults to TZ-enabled, and then the
> board model can disable it. (See for instance the
> code in hw/arm/realview.c which sets the has_el3 property
> on the CPU object to false.)
> 
> thanks
> -- PMM
> 




Re: [Qemu-devel] [PATCH] rules.mak: Use -r instead of -Wl, -r to fix building when PIE is default

2016-11-28 Thread Paolo Bonzini


On 27/11/2016 17:28, Adrian Bunk wrote:
> Building qemu fails in distributions where gcc enables PIE
> by default (e.g. Debian unstable) with:
> /usr/bin/ld: -r and -pie may not be used together
> 
> -r and -pie cannot be used together in the linker,
> and position independent is already relocatable.
> 
> Use -r instead of -Wl,-r to avoid gcc passing -r to the
> linker when PIE is enabled.
> 
> Signed-off-by: Adrian Bunk 

I think this is a bug in the linker.  If the linker is producing
relocatable objects by default, it has no reason to refuse -r.  Have you
tried asking the binutils folks about it too?

But the patch would probably remove the need for LD_REL_FLAGS, based on
a quick look at the GCC source code, so I guess it's fine.

Paolo

> ---
>  rules.mak | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/rules.mak b/rules.mak
> index 0333ae3..545ebd9 100644
> --- a/rules.mak
> +++ b/rules.mak
> @@ -93,7 +93,7 @@ module-common.o: CFLAGS += $(DSO_OBJ_CFLAGS)
>   $(if $(findstring /,$@),$(call quiet-command,cp $@ $(subst 
> /,-,$@),"CP","$(subst /,-,$@)"))
>  
>  
> -LD_REL := $(CC) -nostdlib -Wl,-r $(LD_REL_FLAGS)
> +LD_REL := $(CC) -nostdlib -r $(LD_REL_FLAGS)
>  
>  %.mo:
>   $(call quiet-command,$(LD_REL) -o $@ $^,"LD","$(TARGET_DIR)$@")
> 



Re: [Qemu-devel] [PATCH] rules.mak: Use -r instead of -Wl, -r to fix building when PIE is default

2016-11-28 Thread Adrian Bunk
On Mon, Nov 28, 2016 at 04:05:33PM +0100, Paolo Bonzini wrote:
> 
> 
> On 27/11/2016 17:28, Adrian Bunk wrote:
> > Building qemu fails in distributions where gcc enables PIE
> > by default (e.g. Debian unstable) with:
> > /usr/bin/ld: -r and -pie may not be used together
> > 
> > -r and -pie cannot be used together in the linker,
> > and position independent is already relocatable.
> > 
> > Use -r instead of -Wl,-r to avoid gcc passing -r to the
> > linker when PIE is enabled.
> > 
> > Signed-off-by: Adrian Bunk 
> 
> I think this is a bug in the linker.  If the linker is producing
> relocatable objects by default, it has no reason to refuse -r.  Have you
> tried asking the binutils folks about it too?

The linker knows nothing about this default, gcc is passing -pie
to the linker.

>...
> Paolo
>...

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed




Re: [Qemu-devel] [dpdk-dev] dpdk/vpp and cross-version migration for vhost

2016-11-28 Thread Maxime Coquelin



On 11/24/2016 04:24 PM, Kavanagh, Mark B wrote:


On 11/24/2016 12:47 PM, Maxime Coquelin wrote:



On 11/24/2016 01:33 PM, Yuanhan Liu wrote:

On Thu, Nov 24, 2016 at 09:30:49AM +, Kevin Traynor wrote:

On 11/24/2016 06:31 AM, Yuanhan Liu wrote:

On Tue, Nov 22, 2016 at 04:53:05PM +0200, Michael S. Tsirkin wrote:

You keep assuming that you have the VM started first and
figure out things afterwards, but this does not work.

Think about a cluster of machines. You want to start a VM in
a way that will ensure compatibility with all hosts
in a cluster.


I see. I was considering more the case where the dst
host (including the qemu and dpdk combo) is given, and
then determining whether the migration will be successful
or not.

And you are saying that we need to know which host would
be a good candidate before starting the migration. In such
a case, we indeed need some input from both the qemu and
vhost-user backend.

For DPDK, I think it could be simple: just as you said, it
could be either a tiny script, or even a macro defined in
the source code file (which we extend every time we add a
new feature) that libvirt can read. Or something
else.


There's the issue of APIs that tweak features as Maxime
suggested.


Yes, it's a good point.


Maybe the only thing to do is to deprecate it,


Looks like so.


but I feel some way for the application to pass info into
the guest might be beneficial.


The two APIs are just for tweaking the feature bits DPDK supports
before any device gets connected. It's another way to disable some features
(the other obvious way is through QEMU command lines).

IMO, it's a bit handy only in a case like this: we have a bunch of VMs.
Instead of disabling something through qemu one by one, we could disable it
once in DPDK.

But I doubt the usefulness of it. It's only used in DPDK's vhost
example, after all. It's not used in the vhost PMD, nor is it used in OVS.


rte_vhost_feature_disable() is currently used in OVS,
lib/netdev-dpdk.c

Hmmm. I must have checked very old code ...


netdev_dpdk_vhost_class_init(void)
{
static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;

/* This function can be called for different classes.  The
initialization
 * needs to be done only once */
if (ovsthread_once_start(&once)) {
rte_vhost_driver_callback_register(&virtio_net_device_ops);
rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4
  | 1ULL << VIRTIO_NET_F_HOST_TSO6
  | 1ULL << VIRTIO_NET_F_CSUM);

I saw the commit that introduced this change, but it gives no reason why
it was added.


I'm also interested to know the reason.


I can't remember offhand; I've added Mark K and Michal W, who should be able
to shed some light on it.


DPDK v16.04 added support for vHost User TSO; as such, by default, TSO is 
advertised to guest devices as an available feature during feature negotiation 
with QEMU.
However, while the vHost user backend sets up the majority of the mbuf fields 
that are required for TSO, there is still a reliance on the associated DPDK 
application (i.e. in this case OvS-DPDK) to set the remaining flags and/or 
offsets. Since OvS-DPDK doesn't currently provide that functionality, it is 
necessary to explicitly disable TSO; otherwise, undefined behaviour will ensue.
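
For context, the missing application-side work is roughly the per-packet mbuf
setup for TSO, along these lines (illustrative only; ip_hdr_len, tcp_hdr_len
and mtu stand in for values the application would compute per packet):

    m->ol_flags |= PKT_TX_TCP_SEG | PKT_TX_IPV4 |
                   PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
    m->l2_len    = sizeof(struct ether_hdr);  /* plus VLAN if present */
    m->l3_len    = ip_hdr_len;
    m->l4_len    = tcp_hdr_len;
    m->tso_segsz = mtu - m->l3_len - m->l4_len;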

Thanks Mark for the clarification.

In this case, maybe we could add a DPDK build option to disable vhost's
TSO support, which would be selected for OVS packages?

Does that sound reasonable?
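
Roughly, it would just compile the TSO/CSUM bits out of the advertised feature
set (a sketch only; the config name is made up, and this assumes the advertised
bits are kept in one place in the vhost library):

    /* hypothetical build-time knob, e.g. CONFIG_RTE_LIBRTE_VHOST_TSO=n */
    #ifndef RTE_LIBRTE_VHOST_TSO
        features &= ~((1ULL << VIRTIO_NET_F_HOST_TSO4) |
                      (1ULL << VIRTIO_NET_F_HOST_TSO6) |
                      (1ULL << VIRTIO_NET_F_CSUM));
    #endif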

Cheers,
Maxime




In any case, I think this is something that can/should be managed by
the management tool, which should disable it via command-line parameters.

Kevin, do you agree?


I think it's best to find out the reason first, because if there's no reason to
disable it in the code, then there's no need to debate!



Cheers,
Maxime






Re: [Qemu-devel] Linux kernel polling for QEMU

2016-11-28 Thread Stefan Hajnoczi
On Mon, Nov 28, 2016 at 11:31:43AM +0200, Eliezer Tamir wrote:
> + Eric, Willem
> 
> On 24/11/2016 17:12, Stefan Hajnoczi wrote:
> > I looked through the socket SO_BUSY_POLL and blk_mq poll support in
> > recent Linux kernels with an eye towards integrating the ongoing QEMU
> > polling work.  The main missing feature is eventfd polling support which
> > I describe below.
> ...
> > State of polling in Linux
> > -
> > SO_BUSY_POLL causes recvmsg(2), select(2), and poll(2) family system
> > calls to spin awaiting new receive packets.  From what I can tell epoll
> > is not supported so that system call will sleep without polling.
> 
> At the time I sent out an RFC for epoll() SO_BUSY_POLL support.
> https://lkml.org/lkml/2013/8/21/192
> 
> In hindsight I think the way I tracked sockets was over-complicated.
> What I would do today would be to extend the API to allow the user
> to tell epoll which socket/queue combinations are interesting.
> 
> I would love to collaborate on this with you, though I must confess that
> my resources at the moment are limited and the setup I used for testing
> no longer exists.

Thanks for sharing the link.  I'll let you know before embarking on an
effort to make epoll support busy_loop.

At the moment I'm still evaluating whether the good results we've gotten
from polling in QEMU userspace are preserved when polling is shifted to
the kernel.

FWIW I've prototyped ioctl(EVENTFD_SET_POLL_INFO) but haven't had a
chance to test it yet:
https://github.com/stefanha/linux/commit/133e8f1da8eb5364cd5c5f7162decbc79175cd13

Stefan




Re: [Qemu-devel] [PATCH] rules.mak: Use -r instead of -Wl,-r to fix building when PIE is default

2016-11-28 Thread Paolo Bonzini


On 28/11/2016 16:24, Adrian Bunk wrote:
> On Mon, Nov 28, 2016 at 04:05:33PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 27/11/2016 17:28, Adrian Bunk wrote:
>>> Building qemu fails in distributions where gcc enables PIE
>>> by default (e.g. Debian unstable) with:
>>> /usr/bin/ld: -r and -pie may not be used together
>>>
>>> -r and -pie cannot be used together in the linker,
>>> and position independent is already relocatable.
>>>
>>> Use -r instead of -Wl,-r to avoid gcc passing -r to the
>>> linker when PIE is enabled.
>>>
>>> Signed-off-by: Adrian Bunk 
>>
>> I think this is a bug in the linker.  If the linker is producing
>> relocatable objects by default, it has no reason to refuse -r.  Have you
>> tried asking the binutils folks about it too?
> 
> The linker knows nothing about this default, gcc is passing -pie
> to the linker.

The linker is receiving "-r -pie".  It can satisfy the requirement of
producing a relocatable object by discarding the "-r", but it doesn't.
That'd be a linker bug.

But in fact ELF makes PIE ET_DYN and relocatable ET_REL.  That would
make the linker error the right thing, but then I don't understand what
you mean by "position independent is already relocatable".

Paolo



Re: [Qemu-devel] Linux kernel polling for QEMU

2016-11-28 Thread Paolo Bonzini


On 28/11/2016 16:29, Stefan Hajnoczi wrote:
> Thanks for sharing the link.  I'll let you know before embarking on an
> effort to make epoll support busy_loop.
> 
> At the moment I'm still evaluating whether the good results we've gotten
> from polling in QEMU userspace are preserved when polling is shifted to
> the kernel.
> 
> FWIW I've prototyped ioctl(EVENTFD_SET_POLL_INFO) but haven't had a
> chance to test it yet:
> https://github.com/stefanha/linux/commit/133e8f1da8eb5364cd5c5f7162decbc79175cd13

This would add a system call every time the main loop processes a vring,
wouldn't it?

Paolo



Re: [Qemu-devel] [PATCH] rules.mak: Use -r instead of -Wl,-r to fix building when PIE is default

2016-11-28 Thread Paolo Bonzini


On 28/11/2016 16:38, Paolo Bonzini wrote:
> 
> 
> On 28/11/2016 16:24, Adrian Bunk wrote:
>> On Mon, Nov 28, 2016 at 04:05:33PM +0100, Paolo Bonzini wrote:
>>>
>>>
>>> On 27/11/2016 17:28, Adrian Bunk wrote:
 Building qemu fails in distributions where gcc enables PIE
 by default (e.g. Debian unstable) with:
 /usr/bin/ld: -r and -pie may not be used together

 -r and -pie cannot be used together in the linker,
 and position independent is already relocatable.

 Use -r instead of -Wl,-r to avoid gcc passing -r to the
 linker when PIE is enabled.

 Signed-off-by: Adrian Bunk 
>>>
>>> I think this is a bug in the linker.  If the linker is producing
>>> relocatable objects by default, it has no reason to refuse -r.  Have you
>>> tried asking the binutils folks about it too?
>>
>> The linker knows nothing about this default, gcc is passing -pie
>> to the linker.
> 
> The linker is receiving "-r -pie".  It can satisfy the requirement of
> producing a relocatable object by discarding the "-r", but it doesn't.
> That'd be a linker bug.
> 
> But in fact ELF makes PIE ET_DYN and relocatable ET_REL.  That would
> make the linker error the right thing, but then I don't understand what
> you mean by "position independent is already relocatable".

Aha, I looked at GCC source code and this is incorrect: "Use -r instead
of -Wl,-r to avoid gcc passing -r to the linker when PIE is enabled".
When GCC sees -r (as opposed to -Wl,-r) it does not pass -pie to the linker.
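
In other words, on a PIE-by-default toolchain the two spellings should drive
the link differently, roughly:

  $ gcc -nostdlib -Wl,-r -o foo.mo foo.o
  /usr/bin/ld: -r and -pie may not be used together   # driver still appends -pie
  $ gcc -nostdlib -r -o foo.mo foo.o                  # driver drops -pie on seeing -r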

Paolo




Re: [Qemu-devel] [PATCH] block/mirror: Fix passing wrong argument to trace_mirror_yield

2016-11-28 Thread Fam Zheng
On Mon, 11/28 16:58, Yang Wei wrote:
> mirror_yield is defined in block/trace-events as follows:
> mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight)
> so we should exchange arguments 2 and 4 when invoking it.
> 
> Signed-off-by: Yang Wei 
> ---
>  block/mirror.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 301ba92..2846a2e 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -577,7 +577,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob 
> *s)
>  }
>  
>  if (s->in_flight >= MAX_IN_FLIGHT) {
> -trace_mirror_yield(s, s->in_flight, s->buf_free_count, -1);
> +trace_mirror_yield(s,  -1, s->buf_free_count, s->in_flight);
>  mirror_wait_for_io(s);
>  continue;
>  }
> @@ -730,7 +730,7 @@ static void coroutine_fn mirror_run(void *opaque)
>  s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
>  if (s->in_flight >= MAX_IN_FLIGHT || s->buf_free_count == 0 ||
>  (cnt == 0 && s->in_flight > 0)) {
> -trace_mirror_yield(s, s->in_flight, s->buf_free_count, cnt);
> +trace_mirror_yield(s, cnt, s->buf_free_count, s->in_flight);
>  mirror_wait_for_io(s);
>  continue;
>  } else if (cnt != 0) {
> -- 
> 2.10.2
> 
> 

Reviewed-by: Fam Zheng 



[Qemu-devel] [PATCH v7 0/5] IOMMU: intel_iommu support map and unmap notifications

2016-11-28 Thread Aviv B.D
From: "Aviv Ben-David" 

* Advertise the Cache Mode capability in the iommu cap register.
  This capability is controlled by the "cache-mode" property of the intel-iommu device.
  To enable this option call QEMU with "-device intel-iommu,cache-mode=true".

* On page cache invalidation in the intel vIOMMU, check if the domain belongs to
  a registered notifier, and notify accordingly.

Currently this patch set still doesn't enable VFIO device support with a vIOMMU
present. Current problems:
* vfio_iommu_map_notify is not aware of which memory range belongs to a specific
  VFIOGuestIOMMU.
* intel_iommu's replay op is not implemented yet (may come in a different patch
  set).
  The replay function is required for hotplugging a vfio device and for moving
  devices between existing domains.

Changes from v1 to v2:
* remove assumption that the cache do not clears
* fix lockup on high load.

Changes from v2 to v3:
* remove debug leftovers
* split into separate commits
* change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL 
  to suppress error propagating to guest.

Changes from v3 to v4:
* Add property to intel_iommu device to control the CM capability, 
  default to False.
* Use s->iommu_ops.notify_flag_changed to register notifiers.

Changes from v4 to v4 RESEND:
* Fix coding style issues pointed out by the checkpatch.pl script.

Changes from v4 to v5:
* Reduce the number of changes in patch 2 and make flags real bitfield.
* Revert deleted debug prints.
* Fix memory leak in patch 3.

Changes from v5 to v6:
* fix prototype of iommu_translate function for more IOMMU types.
* VFIO will be notified only on the difference, without unmap 
  before change to maps.

Changes from v6 to v7:
* Add replay op to iommu_ops, with current behavior as default implementation
  (Patch 4).
* Add stub to disable VFIO with intel_iommu support (Patch 5).
* Cosmetic changes to other patches.

Aviv Ben-David (5):
  IOMMU: add option to enable VTD_CAP_CM to vIOMMU capability exposed to
guest
  IOMMU: change iommu_op->translate's is_write to flags, add support to
NO_FAIL flag mode
  IOMMU: enable intel_iommu map and unmap notifiers
  IOMMU: add specific replay function with default implementation
  IOMMU: add specific null implementation of iommu_replay to intel_iommu

 exec.c |   3 +-
 hw/alpha/typhoon.c |   2 +-
 hw/i386/amd_iommu.c|   4 +-
 hw/i386/intel_iommu.c  | 165 ++---
 hw/i386/intel_iommu_internal.h |   3 +
 hw/pci-host/apb.c  |   2 +-
 hw/ppc/spapr_iommu.c   |   2 +-
 hw/s390x/s390-pci-bus.c|   2 +-
 include/exec/memory.h  |  10 ++-
 include/hw/i386/intel_iommu.h  |  11 +++
 memory.c   |  11 ++-
 11 files changed, 177 insertions(+), 38 deletions(-)

-- 
1.9.1




[Qemu-devel] [PATCH v7 5/5] IOMMU: add specific null implementation of iommu_replay to intel_iommu

2016-11-28 Thread Aviv B.D
From: "Aviv Ben-David" 

Currently this implementation prevents VFIO from working together with
intel_iommu.

Signed-off-by: Aviv Ben-David 
---
 hw/i386/intel_iommu.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d872969..0787714 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2453,6 +2453,12 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, 
PCIBus *bus, int devfn)
 return vtd_dev_as;
 }
 
+static void vtd_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
+ bool is_write){
+error_report("VFIO use with intel_iommu is currently not supported.");
+exit(1);
+}
+
 /* Do the initialization. It will also be called when reset, so pay
  * attention when adding new initialization stuff.
  */
@@ -2467,6 +2473,7 @@ static void vtd_init(IntelIOMMUState *s)
 
 s->iommu_ops.translate = vtd_iommu_translate;
 s->iommu_ops.notify_flag_changed = vtd_iommu_notify_flag_changed;
+s->iommu_ops.replay = vtd_iommu_replay;
 s->root = 0;
 s->root_extended = false;
 s->dmar_enabled = false;
-- 
1.9.1




[Qemu-devel] [PATCH v7 1/5] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capability exposed to guest

2016-11-28 Thread Aviv B.D
From: "Aviv Ben-David" 

This capability asks the guest to invalidate cache before each map operation.
We can use this invalidation to trap map operations in the hypervisor.

Signed-off-by: Aviv Ben-David 
---
 hw/i386/intel_iommu.c  | 5 +
 hw/i386/intel_iommu_internal.h | 1 +
 include/hw/i386/intel_iommu.h  | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 1b706ad..2cf07cd 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2017,6 +2017,7 @@ static Property vtd_properties[] = {
 DEFINE_PROP_ON_OFF_AUTO("eim", IntelIOMMUState, intr_eim,
 ON_OFF_AUTO_AUTO),
 DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
+DEFINE_PROP_BOOL("cache-mode", IntelIOMMUState, cache_mode_enabled, FALSE),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -2391,6 +2392,10 @@ static void vtd_init(IntelIOMMUState *s)
 assert(s->intr_eim != ON_OFF_AUTO_AUTO);
 }
 
+if (s->cache_mode_enabled) {
+s->cap |= VTD_CAP_CM;
+}
+
 vtd_reset_context_cache(s);
 vtd_reset_iotlb(s);
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 11abfa2..00cbe0d 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -201,6 +201,7 @@
 #define VTD_CAP_MAMV(VTD_MAMV << 48)
 #define VTD_CAP_PSI (1ULL << 39)
 #define VTD_CAP_SLLPS   ((1ULL << 34) | (1ULL << 35))
+#define VTD_CAP_CM  (1ULL << 7)
 
 /* Supported Adjusted Guest Address Widths */
 #define VTD_CAP_SAGAW_SHIFT 8
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 405c9d1..749eef9 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -257,6 +257,8 @@ struct IntelIOMMUState {
 uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
 uint32_t version;
 
+bool cache_mode_enabled;/* RO - is cap CM enabled? */
+
 dma_addr_t root;/* Current root table pointer */
 bool root_extended; /* Type of root table (extended or not) */
 bool dmar_enabled;  /* Set if DMA remapping is enabled */
-- 
1.9.1




[Qemu-devel] [PATCH v7 4/5] IOMMU: add specific replay function with default implementation

2016-11-28 Thread Aviv B.D
From: "Aviv Ben-David" 

The default implementation scans the address space and tries to find a
translation for each page, if one exists.

This callback enables an efficient implementation for intel_iommu and other
subsystems with a large address space.

Signed-off-by: Aviv Ben-David 
---
 include/exec/memory.h | 4 
 memory.c  | 8 
 2 files changed, 12 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2d7ee54..a8b3701 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -180,6 +180,10 @@ struct MemoryRegionIOMMUOps {
 void (*notify_flag_changed)(MemoryRegion *iommu,
 IOMMUNotifierFlag old_flags,
 IOMMUNotifierFlag new_flags);
+/* Called when region has been moved between iommu domains */
+void (*replay)(MemoryRegion *mr,
+ IOMMUNotifier *n,
+ bool is_write);
 };
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
diff --git a/memory.c b/memory.c
index 9b88638..562d540 100644
--- a/memory.c
+++ b/memory.c
@@ -1624,6 +1624,14 @@ void memory_region_iommu_replay(MemoryRegion *mr, 
IOMMUNotifier *n,
 hwaddr addr, granularity;
 IOMMUTLBEntry iotlb;
 
+/* If there is specific implementation, use it */
+assert(memory_region_is_iommu(mr));
+if (mr->iommu_ops && mr->iommu_ops->replay) {
+return mr->iommu_ops->replay(mr, n, is_write);
+}
+
+/* No specific implementation for this iommu, fall back to default
+ * implementation */
 granularity = memory_region_iommu_get_min_page_size(mr);
 
 for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
-- 
1.9.1




[Qemu-devel] [PATCH v7 2/5] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode

2016-11-28 Thread Aviv B.D
From: "Aviv Ben-David" 

Support translation attempts without reporting an error to the guest on
translation failure.

Signed-off-by: Aviv Ben-David 
---
 exec.c  |  3 ++-
 hw/alpha/typhoon.c  |  2 +-
 hw/i386/amd_iommu.c |  4 ++--
 hw/i386/intel_iommu.c   | 59 +++--
 hw/pci-host/apb.c   |  2 +-
 hw/ppc/spapr_iommu.c|  2 +-
 hw/s390x/s390-pci-bus.c |  2 +-
 include/exec/memory.h   |  6 +++--
 memory.c|  3 ++-
 9 files changed, 52 insertions(+), 31 deletions(-)

diff --git a/exec.c b/exec.c
index 3d867f1..126fb31 100644
--- a/exec.c
+++ b/exec.c
@@ -466,7 +466,8 @@ MemoryRegion *address_space_translate(AddressSpace *as, 
hwaddr addr,
 break;
 }
 
-iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+iotlb = mr->iommu_ops->translate(mr, addr,
+ is_write ? IOMMU_WO : IOMMU_RO);
 addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
 | (addr & iotlb.addr_mask));
 *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
index 883db13..a2840ac 100644
--- a/hw/alpha/typhoon.c
+++ b/hw/alpha/typhoon.c
@@ -664,7 +664,7 @@ static bool window_translate(TyphoonWindow *win, hwaddr 
addr,
 /* TODO: A translation failure here ought to set PCI error codes on the
Pchip and generate a machine check interrupt.  */
 static IOMMUTLBEntry typhoon_translate_iommu(MemoryRegion *iommu, hwaddr addr,
- bool is_write)
+ IOMMUAccessFlags flags)
 {
 TyphoonPchip *pchip = container_of(iommu, TyphoonPchip, iommu);
 IOMMUTLBEntry ret;
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 47b79d9..1f0d76b 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -988,7 +988,7 @@ static inline bool amdvi_is_interrupt_addr(hwaddr addr)
 }
 
 static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
- bool is_write)
+ IOMMUAccessFlags flags)
 {
 AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
 AMDVIState *s = as->iommu_state;
@@ -1017,7 +1017,7 @@ static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, 
hwaddr addr,
 return ret;
 }
 
-amdvi_do_translate(as, addr, is_write, &ret);
+amdvi_do_translate(as, addr, flags & IOMMU_WO, &ret);
 trace_amdvi_translation_result(as->bus_num, PCI_SLOT(as->devfn),
 PCI_FUNC(as->devfn), addr, ret.translated_addr);
 return ret;
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 2cf07cd..05973b9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -631,7 +631,8 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t 
level)
 /* Given the @gpa, get relevant @slptep. @slpte_level will be the last level
  * of the translation, can be used for deciding the size of large page.
  */
-static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
+static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa,
+IOMMUAccessFlags flags,
 uint64_t *slptep, uint32_t *slpte_level,
 bool *reads, bool *writes)
 {
@@ -640,7 +641,7 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t 
gpa, bool is_write,
 uint32_t offset;
 uint64_t slpte;
 uint32_t ce_agaw = vtd_get_agaw_from_context_entry(ce);
-uint64_t access_right_check;
+uint64_t access_right_check = 0;
 
 /* Check if @gpa is above 2^X-1, where X is the minimum of MGAW in CAP_REG
  * and AW in context-entry.
@@ -651,7 +652,15 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t 
gpa, bool is_write,
 }
 
 /* FIXME: what is the Atomics request here? */
-access_right_check = is_write ? VTD_SL_W : VTD_SL_R;
+if (flags & IOMMU_WO) {
+access_right_check |= VTD_SL_W;
+}
+if (flags & IOMMU_RO) {
+access_right_check |= VTD_SL_R;
+}
+if (flags & IOMMU_NO_FAIL) {
+access_right_check |= VTD_SL_R | VTD_SL_W;
+}
 
 while (true) {
 offset = vtd_gpa_level_offset(gpa, level);
@@ -673,8 +682,8 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t 
gpa, bool is_write,
 if (!(slpte & access_right_check)) {
 VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
 "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
-(is_write ? "write" : "read"), gpa, slpte);
-return is_write ? -VTD_FR_WRITE : -VTD_FR_READ;
+(flags & IOMMU_WO ? "write" : "read"), gpa, slpte);
+return (flags & IOMMU_WO) ? -VTD_FR_WRITE : -VTD_FR_READ;
 }
 if (vtd_slpte_nonzero_rsvd(slpte, level)) {
 VTD_DPRINTF(GENERAL, "error: non-zero reserved fi

Re: [Qemu-devel] sane char device writes?

2016-11-28 Thread Michal Suchánek
Hello,

On Fri, 25 Nov 2016 17:16:39 +0100
Paolo Bonzini  wrote:

> On 24/11/2016 08:51, Thomas Huth wrote:
> > > So for this to work an extra buffer would have to be stored in
> > > gtk.c somewhere, and possibly similar timer trick used as in
> > > console.c
> > > 
> > > Any ideas how to do this without introducing too much insanity?
> > >
> > > Presumably using a GTK timer for repeating gd_vc_in the handler
> > > would run in the same GTK UI thread as the "commit" signal
> > > handler and excessive locking would not be required.
> > > 
> > > The data passed to gd_vc_in is presumably freed when it ends so it
> > > would have to be copied somewhere. It's quite possible to create a
> > > static list in gd_vc_in or some extra field in VirtualConsole.  
> >
> > Not sure how the best solution should really look like, but Paolo
> > suggested something here:
> > 
> > http://lists.gnu.org/archive/html/qemu-devel/2016-11/msg0.html
> > 
> > ... so I'm putting him on CC: ... maybe he's got some spare minutes
> > to elaborate on his idea.  
> 
> My idea looks very much like Michal's.  I hadn't gone very much beyond
> the "you need a buffer" step, but anyway you don't need a timer---you
> can just record a chr_accept_input callback in gd_vc_handler.  It will
> be called when the front-end is ready to get more characters.
> 
Where will the characters come from, though? 

This might solve the console ui hack, since it has a pipe buffer it can
peek and read in pieces, but the gtk ui gets a whole paste buffer in an
event callback and has to handle it in full before it returns from the
gtk event callback. Unless it stores the data that does not fit into
the serial fifo somewhere, it has to drop it on the floor or block.

Thanks

Michal



[Qemu-devel] [PATCH v7 3/5] IOMMU: enable intel_iommu map and unmap notifiers

2016-11-28 Thread Aviv B.D
From: "Aviv Ben-David" 

Add a list of registered vtd_as's to the intel iommu state, to save
iterating over each PCI device when searching for the corresponding domain.

Signed-off-by: Aviv Ben-David 
---
 hw/i386/intel_iommu.c  | 94 ++
 hw/i386/intel_iommu_internal.h |  2 +
 include/hw/i386/intel_iommu.h  |  9 
 3 files changed, 98 insertions(+), 7 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 05973b9..d872969 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -679,7 +679,7 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t 
gpa,
 }
 *reads = (*reads) && (slpte & VTD_SL_R);
 *writes = (*writes) && (slpte & VTD_SL_W);
-if (!(slpte & access_right_check)) {
+if (!(slpte & access_right_check) && !(flags & IOMMU_NO_FAIL)) {
 VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
 "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
 (flags & IOMMU_WO ? "write" : "read"), gpa, slpte);
@@ -978,6 +978,23 @@ static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState 
*s, uint8_t bus_num)
 return vtd_bus;
 }
 
+static int vtd_get_did_dev(IntelIOMMUState *s, uint8_t bus_num, uint8_t devfn,
+   uint16_t *domain_id)
+{
+VTDContextEntry ce;
+int ret_fr;
+
+assert(domain_id);
+
+ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
+if (ret_fr) {
+return -1;
+}
+
+*domain_id =  VTD_CONTEXT_ENTRY_DID(ce.hi);
+return 0;
+}
+
 /* Do a context-cache device-selective invalidation.
  * @func_mask: FM field after shifting
  */
@@ -1064,6 +1081,45 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState 
*s, uint16_t domain_id)
 &domain_id);
 }
 
+static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
+   uint16_t domain_id, hwaddr addr,
+   uint8_t am)
+{
+IntelIOMMUNotifierNode *node;
+
+QLIST_FOREACH(node, &(s->notifiers_list), next) {
+VTDAddressSpace *vtd_as = node->vtd_as;
+uint16_t vfio_domain_id;
+int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus), vtd_as->devfn,
+  &vfio_domain_id);
+
+if (!ret && domain_id == vfio_domain_id) {
+hwaddr original_addr = addr;
+
+while (addr < original_addr + (1 << am) * VTD_PAGE_SIZE) {
+IOMMUTLBEntry entry = s->iommu_ops.translate(
+ &node->vtd_as->iommu,
+ addr,
+ IOMMU_NO_FAIL);
+
+if (entry.perm == IOMMU_NONE &&
+node->notifier_flag & IOMMU_NOTIFIER_UNMAP) {
+entry.target_as = &address_space_memory;
+entry.iova = addr & VTD_PAGE_MASK_4K;
+entry.translated_addr = 0;
+entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT);
+memory_region_notify_iommu(&node->vtd_as->iommu, entry);
+addr += VTD_PAGE_SIZE;
+} else if (node->notifier_flag & IOMMU_NOTIFIER_MAP) {
+memory_region_notify_iommu(&node->vtd_as->iommu, 
entry);
+addr += entry.addr_mask + 1;
+}
+}
+}
+}
+}
+
 static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
   hwaddr addr, uint8_t am)
 {
@@ -1074,6 +1130,8 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, 
uint16_t domain_id,
 info.addr = addr;
 info.mask = ~((1 << am) - 1);
 g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
+
+vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
 }
 
 /* Flush IOTLB
@@ -1999,15 +2057,37 @@ static void vtd_iommu_notify_flag_changed(MemoryRegion 
*iommu,
   IOMMUNotifierFlag new)
 {
 VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
+IntelIOMMUState *s = vtd_as->iommu_state;
+IntelIOMMUNotifierNode *node = NULL;
+IntelIOMMUNotifierNode *next_node = NULL;
 
-if (new & IOMMU_NOTIFIER_MAP) {
-error_report("Device at bus %s addr %02x.%d requires iommu "
- "notifier which is currently not supported by "
- "intel-iommu emulation",
- vtd_as->bus->qbus.name, PCI_SLOT(vtd_as->devfn),
- PCI_FUNC(vtd_as->devfn));
+if (!s->cache_mode_enabled && new & IOMMU_NOTIFIER_MAP) {
+error_report("We need to set cache_mode=1 for intel-iommu to enable "
+ "device assignment with IOMMU protection.");
 exit(1);
 }
+
+/* Add new ndoe if no map

Re: [Qemu-devel] sane char device writes?

2016-11-28 Thread Paolo Bonzini


On 28/11/2016 16:53, Michal Suchánek wrote:
>> > 
>> > My idea looks very much like Michal's.  I hadn't gone very much beyond
>> > the "you need a buffer" step, but anyway you don't need a timer---you
>> > can just record a chr_accept_input callback in gd_vc_handler.  It will
>> > be called when the front-end is ready to get more characters.
> 
> Where will the characters come from, though? 

From ui/gtk.c's own buffer.

> This might solve the console ui hack, since it has a pipe buffer it can
> peek and read in pieces, but the gtk ui gets a whole paste buffer in an
> event callback and has to handle it in full before it returns from the
> gtk event callback. Unless it stores the data that does not fit into
> the serial fifo somewhere, it has to drop it on the floor or block.

Yes, gtk needs to have a buffer of its own (probably it should
dynamically allocate it too, so that it can grow and shrink and the
limit on the size can be higher).
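
Something like this, very roughly (a sketch only; the function names and the
single static buffer are illustrative, not the actual ui/gtk.c code):

    /* Paste data that did not fit in the chardev's fifo is parked here
     * (per virtual console in a real implementation). */
    static uint8_t *pending;
    static size_t pending_len, pending_pos;

    static void gd_vc_send_pending(VirtualConsole *vc)
    {
        while (pending_pos < pending_len) {
            int can = qemu_chr_be_can_write(vc->vte.chr);
            if (can <= 0) {
                return;            /* resume from chr_accept_input */
            }
            int n = MIN(can, (int)(pending_len - pending_pos));
            qemu_chr_be_write(vc->vte.chr, pending + pending_pos, n);
            pending_pos += n;
        }
        g_free(pending);
        pending = NULL;
        pending_len = pending_pos = 0;
    }

    /* registered as the chardev's chr_accept_input callback in gd_vc_handler */
    static void gd_vc_accept_input(struct CharDriverState *chr)
    {
        gd_vc_send_pending(chr->opaque);
    }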

Paolo



Re: [Qemu-devel] [PATCH v6 4/4] hw/intc/arm_gicv3_kvm: Reset GICv3 cpu interface registers

2016-11-28 Thread Vijay Kilari
On Mon, Nov 28, 2016 at 6:31 PM, Peter Maydell  wrote:
> On 23 November 2016 at 12:39,   wrote:
>> From: Vijaya Kumar K 
>>
>> Reset CPU interface registers of GICv3 when CPU is reset.
>> For this, object interface is used, which is called from
>> arm_cpu_reset function.
>>
>> Signed-off-by: Vijaya Kumar K 
>
> This approach doesn't handle the SMP case correctly --
> when a CPU is reset then the CPU interface for that CPU
> (and only that CPU) should be reset. Your code will
> reset every CPU interface every time any CPU is reset.

Isn't arm_cpu_reset called when a particular CPU is reset?
Is it called for all CPUs?
Or is object_child_foreach_recursive() resetting the CPU interfaces of
all CPUs?

>
> I think it would be better to use the same approach that
> the arm_gicv3_cpuif.c code uses to arrange for cpu i/f
> registers to be reset, perhaps by moving the appropriate
> parts of that code into the common source file.
>
> Having the reset state depend implicitly on the kernel's
> internal state (as you have here for the ICC_CTLR_EL1
> state) is something I'm a bit unsure about -- what goes
> wrong if you don't do that?

During VM boot, kvm_arm_gicv3_reset() writes all
the GIC registers with their reset values. The kernel does not allow writing
ICC_CTLR_EL1 with zeros because it validates the value against the
hardware-supported one. Similarly for SRE_EL1.

>
> thanks
> -- PMM


