Re: [Qemu-devel] [PULL 00/10] collected tcg patches

2015-10-22 Thread Peter Maydell
On 22 October 2015 at 12:02, Peter Maydell  wrote:
> Hi. I'm going to hold off on processing this pull for a few days
> in the hope that the gcc compile farm's ppc64be box is working
> again...

...now applied, thanks. (The folks behind the gcc cfarm did a very
fast job of getting the packages I requested installed, so
thanks to them as well.)

-- PMM



Re: [Qemu-devel] [PATCH] target-tilegx: Implement prefetch instructions in pipe y2

2015-10-22 Thread Richard Henderson

On 10/20/2015 05:26 AM, Chen Gang wrote:

From 14fe2a651b3f5729f1d402dfcd6eb5f7da0f42b1 Mon Sep 17 00:00:00 2001

From: Chen Gang 
Date: Tue, 20 Oct 2015 23:19:02 +0800
Subject: [PATCH] target-tilegx: Implement prefetch instructions in pipe y2

Originally, tilegx qemu only implemented prefetch instructions in pipe x1
and did not implement them in pipe y2.

Signed-off-by: Chen Gang 


Applied.


r~



[Qemu-devel] [PULL v3 1/4] crypto: allow use of nettle/gcrypt to be selected explicitly

2015-10-22 Thread Daniel P. Berrange
Currently the choice of whether to use nettle or gcrypt is
made based on what gnutls is linked to. There are times
when it is desirable to be able to force build against a
specific library. For example, if testing changes to QEMU's
crypto code all 3 possible backends need to be checked
regardless of what the local gnutls uses.

It is also desirable to be able to enable nettle/gcrypt
for cipher/hash algorithms, without enabling gnutls
for TLS support.

This gives two new configure flags, which allow the
following possibilities

Automatically determine nettle vs gcrypt from what
gnutls links to (recommended to minimize number of
crypto libraries linked to)

 ./configure

Automatically determine nettle vs gcrypt based on
which is installed

 ./configure --disable-gnutls

Force use of nettle

 ./configure --enable-nettle

Force use of gcrypt

 ./configure --enable-gcrypt

Force use of built-in AES & crippled-DES

 ./configure --disable-nettle --disable-gcrypt

Signed-off-by: Daniel P. Berrange 
---
 configure   | 100 +---
 crypto/cipher.c |   8 ++---
 crypto/init.c   |  26 +++
 3 files changed, 105 insertions(+), 29 deletions(-)

diff --git a/configure b/configure
index 211bc6e..94f38a3 100755
--- a/configure
+++ b/configure
@@ -331,6 +331,8 @@ gtkabi=""
 gtk_gl="no"
 gnutls=""
 gnutls_hash=""
+nettle=""
+gcrypt=""
 vte=""
 virglrenderer=""
 tpm="yes"
@@ -1114,6 +1116,14 @@ for opt do
   ;;
   --enable-gnutls) gnutls="yes"
   ;;
+  --disable-nettle) nettle="no"
+  ;;
+  --enable-nettle) nettle="yes"
+  ;;
+  --disable-gcrypt) gcrypt="no"
+  ;;
+  --enable-gcrypt) gcrypt="yes"
+  ;;
   --enable-rdma) rdma="yes"
   ;;
   --disable-rdma) rdma="no"
@@ -1324,6 +1334,8 @@ disabled with --disable-FEATURE, default is enabled if available:
   sparse  sparse checker
 
   gnutls  GNUTLS cryptography support
+  nettle  nettle cryptography support
+  gcrypt  libgcrypt cryptography support
   sdl SDL UI
   --with-sdlabi select preferred SDL ABI 1.2 or 2.0
   gtk gtk UI
@@ -2254,20 +2266,76 @@ else
 gnutls_hash="no"
 fi
 
-if test "$gnutls_gcrypt" != "no"; then
-if has "libgcrypt-config"; then
+
+# If user didn't give a --disable/enable-gcrypt flag,
+# then mark as disabled if user requested nettle
+# explicitly, or if gnutls links to nettle
+if test -z "$gcrypt"
+then
+if test "$nettle" = "yes" || test "$gnutls_nettle" = "yes"
+then
+gcrypt="no"
+fi
+fi
+
+# If user didn't give a --disable/enable-nettle flag,
+# then mark as disabled if user requested gcrypt
+# explicitly, or if gnutls links to gcrypt
+if test -z "$nettle"
+then
+if test "$gcrypt" = "yes" || test "$gnutls_gcrypt" = "yes"
+then
+nettle="no"
+fi
+fi
+
+has_libgcrypt_config() {
+if ! has "libgcrypt-config"
+then
+   return 1
+fi
+
+if test -n "$cross_prefix"
+then
+   host=`libgcrypt-config --host`
+   if test "$host-" != $cross_prefix
+   then
+   return 1
+   fi
+fi
+
+return 0
+}
+
+if test "$gcrypt" != "no"; then
+if has_libgcrypt_config; then
 gcrypt_cflags=`libgcrypt-config --cflags`
 gcrypt_libs=`libgcrypt-config --libs`
+# Debian has removed -lgpg-error from libgcrypt-config
+# as it "spreads unnecessary dependencies" which in
+# turn breaks static builds...
+if test "$static" = "yes"
+then
+gcrypt_libs="$gcrypt_libs -lgpg-error"
+fi
 libs_softmmu="$gcrypt_libs $libs_softmmu"
 libs_tools="$gcrypt_libs $libs_tools"
 QEMU_CFLAGS="$QEMU_CFLAGS $gcrypt_cflags"
+gcrypt="yes"
+if test -z "$nettle"; then
+   nettle="no"
+fi
 else
-feature_not_found "gcrypt" "Install gcrypt devel"
+if test "$gcrypt" = "yes"; then
+feature_not_found "gcrypt" "Install gcrypt devel"
+else
+gcrypt="no"
+fi
 fi
 fi
 
 
-if test "$gnutls_nettle" != "no"; then
+if test "$nettle" != "no"; then
 if $pkg_config --exists "nettle"; then
 nettle_cflags=`$pkg_config --cflags nettle`
 nettle_libs=`$pkg_config --libs nettle`
@@ -2275,11 +2343,21 @@ if test "$gnutls_nettle" != "no"; then
 libs_softmmu="$nettle_libs $libs_softmmu"
 libs_tools="$nettle_libs $libs_tools"
 QEMU_CFLAGS="$QEMU_CFLAGS $nettle_cflags"
+nettle="yes"
 else
-feature_not_found "nettle" "Install nettle devel"
+if test "$nettle" = "yes"; then
+feature_not_found "nettle" "Install nettle devel"
+else
+nettle="no"
+fi
 fi
 fi
 
+if test "$gcrypt" = "yes" && test "$nettle" = "yes"
+then
+error_exit "Only one of gcrypt & nettle can be enabled"
+fi
+
 ##
 # libtasn1 - only for the TLS creds/session test suite
 
@@ -4621,8 

[Qemu-devel] [PULL v3 0/4] Misc fixes for crypto code module

2015-10-22 Thread Daniel P. Berrange
The following changes since commit ca3e40e233e87f7b29442311736a82da01c0df7b:

  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging (2015-10-22 12:41:44 +0100)

are available in the git repository at:

  https://github.com/berrange/qemu.git tags/qcrypto-fixes-pull-20151022-2

for you to fetch changes up to 90246037760a2a1d64da67782200b690de24cc49:

  configure: avoid polluting global CFLAGS with tasn1 flags (2015-10-22 19:03:08 +0100)


Merge qcrypto-fixes 2015/10/22



Daniel P. Berrange (4):
  crypto: allow use of nettle/gcrypt to be selected explicitly
  crypto: don't let builtin aes crash if no IV is provided
  crypto: add sanity checking of plaintext/ciphertext length
  configure: avoid polluting global CFLAGS with tasn1 flags

 configure  | 111 +
 crypto/cipher-builtin.c|  29 
 crypto/cipher-gcrypt.c |  61 ++---
 crypto/cipher-nettle.c |  28 
 crypto/cipher.c|   8 ++--
 crypto/init.c  |  26 +--
 tests/Makefile |  10 +++-
 tests/test-crypto-cipher.c |  80 
 8 files changed, 282 insertions(+), 71 deletions(-)

-- 
2.4.3




[Qemu-devel] [PULL] vhost-user: fix up rhel6 build

2015-10-22 Thread Michael S. Tsirkin
Build on RHEL6 fails:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42875

Apparently, members of unnamed unions can't be set with C99 named field initializers there.
Let's just name the payload union field.
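
For illustration only, here is a minimal sketch of the pattern the patch moves to. DemoMsg and demo_make_u64 are hypothetical, simplified stand-ins and not the real VhostUserMsg; the point is just that once the union has a name, a nested designator such as ".payload.u64" is plain C99 and no longer depends on how the compiler treats anonymous members.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for VhostUserMsg (not the QEMU type). */
typedef struct DemoMsg {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
    union {
        uint64_t u64;
        uint8_t bytes[8];
    } payload;              /* named union: ".payload.u64" is plain C99 */
} DemoMsg;

/* Build a message carrying a single 64-bit value. */
static DemoMsg demo_make_u64(uint32_t request, uint64_t value)
{
    DemoMsg msg = {
        .request = request,
        .flags = 0x1,
        .payload.u64 = value,            /* nested designator into the union */
        .size = sizeof(msg.payload.u64), /* sizeof does not evaluate msg */
    };
    return msg;
}
```

The cost of naming the union is that every access has to spell out the extra ".payload" level, which is why the diff below touches so many lines.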

Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/vhost-user.c | 48 
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 78442ba..0aa8e0d 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -89,7 +89,7 @@ typedef struct VhostUserMsg {
 struct vhost_vring_state state;
 struct vhost_vring_addr addr;
 VhostUserMemory memory;
-};
+} payload;
 } QEMU_PACKED VhostUserMsg;
 
 static VhostUserMsg m __attribute__ ((unused));
@@ -200,8 +200,8 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
 VhostUserMsg msg = {
 .request = VHOST_USER_SET_LOG_BASE,
 .flags = VHOST_USER_VERSION,
-.u64 = base,
-.size = sizeof(m.u64),
+.payload.u64 = base,
+.size = sizeof(m.payload.u64),
 };
 
 if (shmfd && log->fd != -1) {
@@ -247,17 +247,17 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
 &ram_addr);
 fd = qemu_get_ram_fd(ram_addr);
 if (fd > 0) {
-msg.memory.regions[fd_num].userspace_addr = reg->userspace_addr;
-msg.memory.regions[fd_num].memory_size  = reg->memory_size;
-msg.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr;
-msg.memory.regions[fd_num].mmap_offset = reg->userspace_addr -
+msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr;
+msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
+msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr;
+msg.payload.memory.regions[fd_num].mmap_offset = reg->userspace_addr -
 (uintptr_t) qemu_get_ram_block_host_ptr(ram_addr);
 assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
 fds[fd_num++] = fd;
 }
 }
 
-msg.memory.nregions = fd_num;
+msg.payload.memory.nregions = fd_num;
 
 if (!fd_num) {
 error_report("Failed initializing vhost-user memory map, "
@@ -265,8 +265,8 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
 return -1;
 }
 
-msg.size = sizeof(m.memory.nregions);
-msg.size += sizeof(m.memory.padding);
+msg.size = sizeof(m.payload.memory.nregions);
+msg.size += sizeof(m.payload.memory.padding);
 msg.size += fd_num * sizeof(VhostUserMemoryRegion);
 
 vhost_user_write(dev, &msg, fds, fd_num);
@@ -280,7 +280,7 @@ static int vhost_user_set_vring_addr(struct vhost_dev *dev,
 VhostUserMsg msg = {
 .request = VHOST_USER_SET_VRING_ADDR,
 .flags = VHOST_USER_VERSION,
-.addr = *addr,
+.payload.addr = *addr,
 .size = sizeof(*addr),
 };
 
@@ -303,7 +303,7 @@ static int vhost_set_vring(struct vhost_dev *dev,
 VhostUserMsg msg = {
 .request = request,
 .flags = VHOST_USER_VERSION,
-.state = *ring,
+.payload.state = *ring,
 .size = sizeof(*ring),
 };
 
@@ -345,7 +345,7 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
 VhostUserMsg msg = {
 .request = VHOST_USER_GET_VRING_BASE,
 .flags = VHOST_USER_VERSION,
-.state = *ring,
+.payload.state = *ring,
 .size = sizeof(*ring),
 };
 
@@ -361,12 +361,12 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
 return -1;
 }
 
-if (msg.size != sizeof(m.state)) {
+if (msg.size != sizeof(m.payload.state)) {
 error_report("Received bad msg size.");
 return -1;
 }
 
-*ring = msg.state;
+*ring = msg.payload.state;
 
 return 0;
 }
@@ -380,14 +380,14 @@ static int vhost_set_vring_file(struct vhost_dev *dev,
 VhostUserMsg msg = {
 .request = request,
 .flags = VHOST_USER_VERSION,
-.u64 = file->index & VHOST_USER_VRING_IDX_MASK,
-.size = sizeof(m.u64),
+.payload.u64 = file->index & VHOST_USER_VRING_IDX_MASK,
+.size = sizeof(m.payload.u64),
 };
 
 if (ioeventfd_enabled() && file->fd > 0) {
 fds[fd_num++] = file->fd;
 } else {
-msg.u64 |= VHOST_USER_VRING_NOFD_MASK;
+msg.payload.u64 |= VHOST_USER_VRING_NOFD_MASK;
 }
 
 vhost_user_write(dev, &msg, fds, fd_num);
@@ -412,8 +412,8 @@ static int vhost_user_set_u64(struct vhost_dev *dev, int request, uint64_t u64)
 VhostUserMsg msg = {
 .request = request,
 .flags = VHOST_USER_VERSION,
-.u64 = u64,
-.size = sizeof(m.u64),
+.payload.u64 = u64,
+.size = sizeof(m.payload.u64),
 };
 
 vhost_user_write(dev, &msg, NULL, 0);
@@ -456,12 +456,12 @@ static int vhost_user_get_u64(struct vhost_dev 

Re: [Qemu-devel] [PATCH v2 3/3] target-i386: load the migrated vcpu's TSC rate

2015-10-22 Thread Eduardo Habkost
On Tue, Oct 20, 2015 at 03:22:54PM +0800, Haozhong Zhang wrote:
> Set vcpu's TSC rate to the migrated value (if any). If KVM supports TSC
> scaling, guest programs will observe TSC increasing at the migrated rate
> rather than the host TSC rate.
> 
> The loading is controlled by a new cpu option 'load-tsc-freq'. If it is
> present, then the loading will be enabled and the migrated vcpu's TSC
> rate will override the value specified by the cpu option
> 'tsc-freq'. Otherwise, the loading will be disabled.

Why do we need an option? Why can't we enable loading unconditionally?

> 
> The setting of vcpu's TSC rate in this patch duplicates the code in
> kvm_arch_init_vcpu(), so we remove the latter one.
> 
> Signed-off-by: Haozhong Zhang 
> ---
>  target-i386/cpu.c |  1 +
>  target-i386/cpu.h |  1 +
>  target-i386/kvm.c | 28 +++-
>  3 files changed, 21 insertions(+), 9 deletions(-)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index b6bb457..763ba4b 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -3144,6 +3144,7 @@ static Property x86_cpu_properties[] = {
>  DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
>  DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
>  DEFINE_PROP_BOOL("save-tsc-freq", X86CPU, env.save_tsc_khz, true),
> +DEFINE_PROP_BOOL("load-tsc-freq", X86CPU, env.load_tsc_khz, false),
>  DEFINE_PROP_UINT32("level", X86CPU, env.cpuid_level, 0),
>  DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
>  DEFINE_PROP_UINT32("xlevel2", X86CPU, env.cpuid_xlevel2, 0),
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index ba1a289..353f5fb 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -968,6 +968,7 @@ typedef struct CPUX86State {
>  int64_t tsc_khz;
>  int64_t tsc_khz_incoming;
>  bool save_tsc_khz;
> +bool load_tsc_khz;
>  void *kvm_xsave_buf;
>  
>  uint64_t mcg_cap;
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 698524a..34616f5 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -743,15 +743,6 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  return r;
>  }
>  
> -r = kvm_check_extension(cs->kvm_state, KVM_CAP_TSC_CONTROL);
> -if (r && env->tsc_khz) {
> -r = kvm_vcpu_ioctl(cs, KVM_SET_TSC_KHZ, env->tsc_khz);
> -if (r < 0) {
> -fprintf(stderr, "KVM_SET_TSC_KHZ failed\n");
> -return r;
> -}
> -}
> -
>  if (kvm_has_xsave()) {
>  env->kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
>  }
> @@ -2223,6 +2214,25 @@ static int kvm_setup_tsc_khz(X86CPU *cpu, int level)
>  return 0;
>  
>  /*
> + * If the cpu option 'load-tsc-freq' is present, the vcpu's TSC rate in the
> + * migrated state will be used and overrides the user-specified vcpu's
> + * TSC rate (if any).
> + */
> +if (runstate_check(RUN_STATE_INMIGRATE) &&
> +env->load_tsc_khz && env->tsc_khz_incoming) {
> +env->tsc_khz = env->tsc_khz_incoming;
> +}

Please don't make the results of the function depend on global QEMU
runstate, as it makes it harder to reason about it, and easy to
introduce subtle bugs if we change initialization order. Can't we just
ensure tsc_khz gets set to the right value before the function is
called, inside the code that loads migration data?
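
A rough sketch of that idea, with made-up names: DemoCPUState, demo_post_load and demo_setup_tsc_khz are hypothetical stand-ins rather than QEMU functions, and the field names are only borrowed from the patch. The migration load path picks the effective value once, so the setup function just consumes tsc_khz and never looks at the runstate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down CPU state; field names borrowed from the patch. */
typedef struct DemoCPUState {
    int64_t tsc_khz;          /* value the setup code will actually use */
    int64_t tsc_khz_incoming; /* value found in the migration stream, 0 if none */
    bool load_tsc_khz;        /* the proposed 'load-tsc-freq' option */
} DemoCPUState;

/* Hypothetical post-load hook: runs once, after migration data is loaded. */
static int demo_post_load(DemoCPUState *env)
{
    if (env->load_tsc_khz && env->tsc_khz_incoming) {
        env->tsc_khz = env->tsc_khz_incoming;
    }
    return 0;
}

/* The setup step depends only on its input, not on global run state. */
static int64_t demo_setup_tsc_khz(const DemoCPUState *env)
{
    return env->tsc_khz;
}
```

With this split, changing the initialization order can only affect when demo_post_load() runs, not what the setup function computes from a given state.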

> +
> +r = kvm_check_extension(cs->kvm_state, KVM_CAP_TSC_CONTROL);
> +if (r && env->tsc_khz) {
> +r = kvm_vcpu_ioctl(cs, KVM_SET_TSC_KHZ, env->tsc_khz);
> +if (r < 0) {
> +fprintf(stderr, "KVM_SET_TSC_KHZ failed\n");
> +return r;
> +}
> +}

So, the final result here does not depend only on the configuration, but also
on host capabilities. That means nobody can possibly know if the
tsc-freq option really works, until they enable it, run the VM, and
check the results from inside the VM. Not a good idea.

(This doesn't apply just to the new code, the existing code is already
broken this way.)

-- 
Eduardo



[Qemu-devel] [PULL 32/38] vhost-user-test: add live-migration test

2015-10-22 Thread Michael S. Tsirkin
From: Marc-André Lureau 

This test checks that the log fd is given to the migration source, and
marks dirty pages during migration.

Signed-off-by: Marc-André Lureau 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Tested-by: Thibaut Collet 
---
 tests/vhost-user-test.c | 171 +++-
 1 file changed, 169 insertions(+), 2 deletions(-)

diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
index 791d849..ef22e3e 100644
--- a/tests/vhost-user-test.c
+++ b/tests/vhost-user-test.c
@@ -12,6 +12,7 @@
 
 #include "libqtest.h"
 #include "qemu/option.h"
+#include "qemu/range.h"
 #include "sysemu/char.h"
 #include "sysemu/sysemu.h"
 
@@ -47,6 +48,9 @@
 #define VHOST_MEMORY_MAX_NREGIONS8
 
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
+#define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1
+
+#define VHOST_LOG_PAGE 0x1000
 
 typedef enum VhostUserRequest {
 VHOST_USER_NONE = 0,
@@ -117,6 +121,7 @@ typedef struct TestServer {
 VhostUserMemory memory;
 GMutex data_mutex;
 GCond data_cond;
+int log_fd;
 } TestServer;
 
 #if !GLIB_CHECK_VERSION(2, 32, 0)
@@ -238,7 +243,8 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
 /* send back features to qemu */
 msg.flags |= VHOST_USER_REPLY_MASK;
 msg.size = sizeof(m.u64);
-msg.u64 = 0x1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
+msg.u64 = 0x1ULL << VHOST_F_LOG_ALL |
+0x1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
 p = (uint8_t *) &msg;
 qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
 break;
@@ -252,7 +258,7 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
 /* send back features to qemu */
 msg.flags |= VHOST_USER_REPLY_MASK;
 msg.size = sizeof(m.u64);
-msg.u64 = 0;
+msg.u64 = 1 << VHOST_USER_PROTOCOL_F_LOG_SHMFD;
 p = (uint8_t *) &msg;
 qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
 break;
@@ -286,6 +292,21 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
  */
 qemu_set_nonblock(fd);
 break;
+
+case VHOST_USER_SET_LOG_BASE:
+if (s->log_fd != -1) {
+close(s->log_fd);
+s->log_fd = -1;
+}
+qemu_chr_fe_get_msgfds(chr, &s->log_fd, 1);
+msg.flags |= VHOST_USER_REPLY_MASK;
+msg.size = 0;
+p = (uint8_t *) &msg;
+qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE);
+
+g_cond_signal(&s->data_cond);
+break;
+
 default:
 break;
 }
@@ -337,6 +358,8 @@ static TestServer *test_server_new(const gchar *name)
 g_mutex_init(&server->data_mutex);
 g_cond_init(&server->data_cond);
 
+server->log_fd = -1;
+
 return server;
 }
 
@@ -358,12 +381,155 @@ static void test_server_free(TestServer *server)
 close(server->fds[i]);
 }
 
+if (server->log_fd != -1) {
+close(server->log_fd);
+}
+
 unlink(server->socket_path);
 g_free(server->socket_path);
 
+
+g_free(server->chr_name);
 g_free(server);
 }
 
+static void wait_for_log_fd(TestServer *s)
+{
+gint64 end_time;
+
+g_mutex_lock(&s->data_mutex);
+end_time = g_get_monotonic_time() + 5 * G_TIME_SPAN_SECOND;
+while (s->log_fd == -1) {
+if (!g_cond_wait_until(&s->data_cond, &s->data_mutex, end_time)) {
+/* timeout has passed */
+g_assert(s->log_fd != -1);
+break;
+}
+}
+
+g_mutex_unlock(&s->data_mutex);
+}
+
+static void write_guest_mem(TestServer *s, uint32 seed)
+{
+uint32_t *guest_mem;
+int i, j;
+size_t size;
+
+wait_for_fds(s);
+
+/* iterate all regions */
+for (i = 0; i < s->fds_num; i++) {
+
+/* We'll write only the region starting at 0x0 */
+if (s->memory.regions[i].guest_phys_addr != 0x0) {
+continue;
+}
+
+g_assert_cmpint(s->memory.regions[i].memory_size, >, 1024);
+
+size = s->memory.regions[i].memory_size +
+s->memory.regions[i].mmap_offset;
+
+guest_mem = mmap(0, size, PROT_READ | PROT_WRITE,
+ MAP_SHARED, s->fds[i], 0);
+
+g_assert(guest_mem != MAP_FAILED);
+guest_mem += (s->memory.regions[i].mmap_offset / sizeof(*guest_mem));
+
+for (j = 0; j < 256; j++) {
+guest_mem[j] = seed + j;
+}
+
+munmap(guest_mem, s->memory.regions[i].memory_size);
+break;
+}
+}
+
+static guint64 get_log_size(TestServer *s)
+{
+guint64 log_size = 0;
+int i;
+
+for (i = 0; i < s->memory.nregions; ++i) {
+VhostUserMemoryRegion *reg = &s->memory.regions[i];
+guint64 last = range_get_last(reg->guest_phys_addr,
+   reg->memory_size);
+log_size = MAX(log_size, last / (8 * VHOST_LOG_PAGE) + 1);

Re: [Qemu-devel] [Qemu-block] [PATCH v5 01/12] aio: Add "is_external" flag for event handlers

2015-10-22 Thread Jeff Cody
On Wed, Oct 21, 2015 at 10:06:38AM +0800, Fam Zheng wrote:
> All callers pass in false, and the real external ones will switch to
> true in coming patches.
> 
> Signed-off-by: Fam Zheng 

Just a comment, but not necessarily a request to change:

We have a lot of functions that take bool true/false arguments, and it
can sometimes make it difficult to read the code.  I wonder if an enum
(e.g. AIO_EXTERNAL, AIO_INTERNAL) would be more descriptive and easier
to read for the aio_set_* functions.
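
To make the suggestion concrete, here is a small sketch. AioHandlerKind, DemoHandler and the demo_* functions are hypothetical and not part of this series; only the AIO_EXTERNAL/AIO_INTERNAL names come from the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum, named after the AIO_EXTERNAL/AIO_INTERNAL idea above. */
typedef enum AioHandlerKind {
    AIO_INTERNAL = 0,
    AIO_EXTERNAL = 1,
} AioHandlerKind;

/* Simplified stand-in for the handler record (not the real AioHandler). */
typedef struct DemoHandler {
    int fd;
    AioHandlerKind kind;
} DemoHandler;

/* A call site now reads demo_set_fd_handler(&h, fd, AIO_EXTERNAL)
 * instead of passing a bare true/false. */
static void demo_set_fd_handler(DemoHandler *h, int fd, AioHandlerKind kind)
{
    h->fd = fd;
    h->kind = kind;
}

static bool demo_is_external(const DemoHandler *h)
{
    return h->kind == AIO_EXTERNAL;
}
```

The trade-off is one more type to maintain, in exchange for call sites that document themselves.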

Either way:

Reviewed-by: Jeff Cody 

> ---
>  aio-posix.c |  6 -
>  aio-win32.c |  5 
>  async.c |  3 ++-
>  block/curl.c| 14 +-
>  block/iscsi.c   |  9 +++
>  block/linux-aio.c   |  5 ++--
>  block/nbd-client.c  | 10 ---
>  block/nfs.c | 17 +---
>  block/sheepdog.c| 38 ++-
>  block/ssh.c |  5 ++--
>  block/win32-aio.c   |  5 ++--
>  hw/block/dataplane/virtio-blk.c |  6 +++--
>  hw/scsi/virtio-scsi-dataplane.c | 24 +++--
>  include/block/aio.h |  2 ++
>  iohandler.c |  3 ++-
>  nbd.c   |  4 ++-
>  tests/test-aio.c| 58 
> +++--
>  17 files changed, 130 insertions(+), 84 deletions(-)
> 
> diff --git a/aio-posix.c b/aio-posix.c
> index d477033..f0f9122 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -25,6 +25,7 @@ struct AioHandler
>  IOHandler *io_write;
>  int deleted;
>  void *opaque;
> +bool is_external;
>  QLIST_ENTRY(AioHandler) node;
>  };
>  
> @@ -43,6 +44,7 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
>  
>  void aio_set_fd_handler(AioContext *ctx,
>  int fd,
> +bool is_external,
>  IOHandler *io_read,
>  IOHandler *io_write,
>  void *opaque)
> @@ -82,6 +84,7 @@ void aio_set_fd_handler(AioContext *ctx,
>  node->io_read = io_read;
>  node->io_write = io_write;
>  node->opaque = opaque;
> +node->is_external = is_external;
>  
>  node->pfd.events = (io_read ? G_IO_IN | G_IO_HUP | G_IO_ERR : 0);
>  node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
> @@ -92,10 +95,11 @@ void aio_set_fd_handler(AioContext *ctx,
>  
>  void aio_set_event_notifier(AioContext *ctx,
>  EventNotifier *notifier,
> +bool is_external,
>  EventNotifierHandler *io_read)
>  {
>  aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
> -   (IOHandler *)io_read, NULL, notifier);
> +   is_external, (IOHandler *)io_read, NULL, notifier);
>  }
>  
>  bool aio_prepare(AioContext *ctx)
> diff --git a/aio-win32.c b/aio-win32.c
> index 50a6867..3110d85 100644
> --- a/aio-win32.c
> +++ b/aio-win32.c
> @@ -28,11 +28,13 @@ struct AioHandler {
>  GPollFD pfd;
>  int deleted;
>  void *opaque;
> +bool is_external;
>  QLIST_ENTRY(AioHandler) node;
>  };
>  
>  void aio_set_fd_handler(AioContext *ctx,
>  int fd,
> +bool is_external,
>  IOHandler *io_read,
>  IOHandler *io_write,
>  void *opaque)
> @@ -86,6 +88,7 @@ void aio_set_fd_handler(AioContext *ctx,
>  node->opaque = opaque;
>  node->io_read = io_read;
>  node->io_write = io_write;
> +node->is_external = is_external;
>  
>  event = event_notifier_get_handle(&ctx->notifier);
>  WSAEventSelect(node->pfd.fd, event,
> @@ -98,6 +101,7 @@ void aio_set_fd_handler(AioContext *ctx,
>  
>  void aio_set_event_notifier(AioContext *ctx,
>  EventNotifier *e,
> +bool is_external,
>  EventNotifierHandler *io_notify)
>  {
>  AioHandler *node;
> @@ -133,6 +137,7 @@ void aio_set_event_notifier(AioContext *ctx,
>  node->e = e;
>  node->pfd.fd = (uintptr_t)event_notifier_get_handle(e);
>  node->pfd.events = G_IO_IN;
> +node->is_external = is_external;
>  QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
>  
>  g_source_add_poll(&ctx->source, &node->pfd);
> diff --git a/async.c b/async.c
> index efce14b..bdc64a3 100644
> --- a/async.c
> +++ b/async.c
> @@ -247,7 +247,7 @@ aio_ctx_finalize(GSource *source)
>  }
>  qemu_mutex_unlock(&ctx->bh_lock);
>  
> -aio_set_event_notifier(ctx, &ctx->notifier, NULL);
> +aio_set_event_notifier(ctx, &ctx->notifier, false, NULL);
>  event_notifier_cleanup(&ctx->notifier);
>  rfifolock_destroy(&ctx->lock);
>  

[Qemu-devel] [RFC Patch 10/12] IXGBEVF: Add lock to protect tx/rx ring operation

2015-10-22 Thread Lan Tianyu
Ring shifting while restoring the VF function may race with normal
ring operations (transmitting/receiving packets). This patch adds tx/rx
locks to protect ring-related data.

Signed-off-by: Lan Tianyu 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  |  2 ++
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 28 ---
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 6eab402e..3a748c8 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -448,6 +448,8 @@ struct ixgbevf_adapter {
 
spinlock_t mbx_lock;
unsigned long last_reset;
+   spinlock_t mg_rx_lock;
+   spinlock_t mg_tx_lock;
 };
 
 enum ixbgevf_state_t {
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 15ec361..04b6ce7 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -227,8 +227,10 @@ static u64 ixgbevf_get_tx_completed(struct ixgbevf_ring *ring)
 
 int ixgbevf_tx_ring_shift(struct ixgbevf_ring *r, u32 head)
 {
+   struct ixgbevf_adapter *adapter = netdev_priv(r->netdev);
struct ixgbevf_tx_buffer *tx_buffer = NULL;
static union ixgbevf_desc *tx_desc = NULL;
+   unsigned long flags;
 
tx_buffer = vmalloc(sizeof(struct ixgbevf_tx_buffer) * (r->count));
if (!tx_buffer)
@@ -238,6 +240,7 @@ int ixgbevf_tx_ring_shift(struct ixgbevf_ring *r, u32 head)
if (!tx_desc)
return -ENOMEM;
 
+   spin_lock_irqsave(&adapter->mg_tx_lock, flags);
memcpy(tx_desc, r->desc, sizeof(union ixgbevf_desc) * r->count);
memcpy(r->desc, &tx_desc[head], sizeof(union ixgbevf_desc) * (r->count - head));
memcpy(&r->desc[r->count - head], tx_desc, sizeof(union ixgbevf_desc) * head);
@@ -256,6 +259,8 @@ int ixgbevf_tx_ring_shift(struct ixgbevf_ring *r, u32 head)
else
r->next_to_use += (r->count - head);
 
+   spin_unlock_irqrestore(&adapter->mg_tx_lock, flags);
+
vfree(tx_buffer);
vfree(tx_desc);
return 0;
@@ -263,8 +268,10 @@ int ixgbevf_tx_ring_shift(struct ixgbevf_ring *r, u32 head)
 
 int ixgbevf_rx_ring_shift(struct ixgbevf_ring *r, u32 head)
 {
+   struct ixgbevf_adapter *adapter = netdev_priv(r->netdev);
struct ixgbevf_rx_buffer *rx_buffer = NULL;
static union ixgbevf_desc *rx_desc = NULL;
+   unsigned long flags;
 
rx_buffer = vmalloc(sizeof(struct ixgbevf_rx_buffer) * (r->count));
if (!rx_buffer)
@@ -274,6 +281,7 @@ int ixgbevf_rx_ring_shift(struct ixgbevf_ring *r, u32 head)
if (!rx_desc)
return -ENOMEM;
 
+   spin_lock_irqsave(&adapter->mg_rx_lock, flags);
memcpy(rx_desc, r->desc, sizeof(union ixgbevf_desc) * (r->count));
memcpy(r->desc, &rx_desc[head], sizeof(union ixgbevf_desc) * (r->count - head));
memcpy(&r->desc[r->count - head], rx_desc, sizeof(union ixgbevf_desc) * head);
@@ -291,6 +299,7 @@ int ixgbevf_rx_ring_shift(struct ixgbevf_ring *r, u32 head)
r->next_to_use -= head;
else
r->next_to_use += (r->count - head);
+   spin_unlock_irqrestore(&adapter->mg_rx_lock, flags);
 
vfree(rx_buffer);
vfree(rx_desc);
@@ -377,6 +386,8 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
if (test_bit(__IXGBEVF_DOWN, &adapter->state))
return true;
 
+   spin_lock(&adapter->mg_tx_lock);
+   i = tx_ring->next_to_clean;
tx_buffer = &tx_ring->tx_buffer_info[i];
tx_desc = IXGBEVF_TX_DESC(tx_ring, i);
i -= tx_ring->count;
@@ -471,6 +482,8 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
q_vector->tx.total_bytes += total_bytes;
q_vector->tx.total_packets += total_packets;
 
+   spin_unlock(&adapter->mg_tx_lock);
+
if (check_for_tx_hang(tx_ring) && ixgbevf_check_tx_hang(tx_ring)) {
struct ixgbe_hw *hw = &adapter->hw;
union ixgbe_adv_tx_desc *eop_desc;
@@ -999,10 +1012,12 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
struct ixgbevf_ring *rx_ring,
int budget)
 {
+   struct ixgbevf_adapter *adapter = netdev_priv(rx_ring->netdev);
unsigned int total_rx_bytes = 0, total_rx_packets = 0;
u16 cleaned_count = ixgbevf_desc_unused(rx_ring);
struct sk_buff *skb = rx_ring->skb;
 
+   spin_lock(&adapter->mg_rx_lock);
while (likely(total_rx_packets < budget)) {
union ixgbe_adv_rx_desc *rx_desc;
 
@@ -1078,6 +1093,7 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
q_vector->rx.total_packets += total_rx_packets;
q_vector->rx.total_bytes += total_rx_bytes;
 
+   

[Qemu-devel] [PATCH 23/40] slirp: Fix non blocking connect for w32

2015-10-22 Thread Michael Roth
From: Stefan Weil 

Signed-off-by: Stefan Weil 
(cherry picked from commit a246a01631f90230374c2b8ffce608232e2aa654)
Signed-off-by: Michael Roth 
---
 slirp/tcp_input.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/slirp/tcp_input.c b/slirp/tcp_input.c
index f946db8..00a77b4 100644
--- a/slirp/tcp_input.c
+++ b/slirp/tcp_input.c
@@ -584,7 +584,13 @@ findso:
goto cont_input;
  }
 
- if((tcp_fconnect(so) == -1) && (errno != EINPROGRESS) && (errno != EWOULDBLOCK)) {
+  if ((tcp_fconnect(so) == -1) &&
+#if defined(_WIN32)
+  socket_error() != WSAEWOULDBLOCK
+#else
+  (errno != EINPROGRESS) && (errno != EWOULDBLOCK)
+#endif
+  ) {
u_char code=ICMP_UNREACH_NET;
DEBUG_MISC((dfd, " tcp fconnect errno = %d-%s\n",
errno,strerror(errno)));
-- 
1.9.1




[Qemu-devel] [PATCH 38/40] util/qemu-config: fix missing machine command line options

2015-10-22 Thread Michael Roth
From: Tony Krowiak 

Commit 0a7cf217 ("util/qemu-config: fix regression of
qmp_query_command_line_options") aimed to restore parsing of global
machine options, but missed two: "aes-key-wrap" and
"dea-key-wrap" (which were present in the initial version of that
patch). Let's add them to the machine_opts again.

Fixes: 0a7cf217 ("util/qemu-config: fix regression of
  qmp_query_command_line_options")
CC: Marcel Apfelbaum 
CC: qemu-sta...@nongnu.org
Signed-off-by: Tony Krowiak 
Reviewed-by: Marcel Apfelbaum 
Tested-by: Christian Borntraeger 
Message-Id: <1444664181-28023-1-git-send-email-akrow...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 

(cherry picked from commit 5bcfa0c543b42a560673cafd3b5225900ef617e1)
Signed-off-by: Michael Roth 
---
 util/qemu-config.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/util/qemu-config.c b/util/qemu-config.c
index 5fcfd0e..687fd34 100644
--- a/util/qemu-config.c
+++ b/util/qemu-config.c
@@ -219,6 +219,14 @@ static QemuOptsList machine_opts = {
 .name = "suppress-vmdesc",
 .type = QEMU_OPT_BOOL,
 .help = "Set on to disable self-describing migration",
+},{
+.name = "aes-key-wrap",
+.type = QEMU_OPT_BOOL,
+.help = "enable/disable AES key wrapping using the CPACF wrapping key",
+},{
+.name = "dea-key-wrap",
+.type = QEMU_OPT_BOOL,
+.help = "enable/disable DEA key wrapping using the CPACF wrapping key",
 },
 { /* End of list */ }
 }
-- 
1.9.1




[Qemu-devel] [RFC Patch 11/12] IXGBEVF: Migrate VF statistic data

2015-10-22 Thread Lan Tianyu
VF statistic regs are read-only and can't be migrated by writing the
values back directly.

Currently, the statistic data the driver returns to user space is not
equal to the value of the statistic regs. The VF driver records the value
of the statistic regs as base data when the net interface is brought up or
opened, calculates how much the regs increased during the last period of
online service, and adds that to the saved_reset data. When user space
collects statistic data, the VF driver returns the result of
"current - base + saved_reset", where "current" is the reg value at that
point.

Restoring the net function after migration is just like bringing the net
interface up or opening it. Call the existing functions to update the base
and saved_reset data so that statistic data stays continuous across
migration.
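
As a worked illustration of the "current - base + saved_reset" bookkeeping, here is a small sketch. DemoCounter, demo_report and demo_rebase are hypothetical names and not the driver's functions, and the sketch uses 64-bit values and ignores register wrap-around for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one statistic register. */
typedef struct DemoCounter {
    uint64_t base;        /* register value when the interface came up */
    uint64_t saved_reset; /* total accumulated over earlier online periods */
} DemoCounter;

/* Value reported to user space: current - base + saved_reset. */
static uint64_t demo_report(const DemoCounter *c, uint64_t current)
{
    return current - c->base + c->saved_reset;
}

/* Fold the finished period into saved_reset, then rebase on the value the
 * register shows afterwards (assumed here to restart after migration). */
static void demo_rebase(DemoCounter *c, uint64_t last_current, uint64_t new_reg)
{
    c->saved_reset += last_current - c->base;
    c->base = new_reg;
}
```

For example, with base 100 and a register reading of 150 before migration, user space sees 50; after rebasing on a register that restarts at 0 and then counts 10 more events, user space sees 60, so the reported counter never goes backwards.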

Signed-off-by: Lan Tianyu 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 04b6ce7..d22160f 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3005,6 +3005,7 @@ int ixgbevf_live_mg(struct ixgbevf_adapter *adapter)
return 0;
 
del_timer_sync(&adapter->service_timer);
+   ixgbevf_update_stats(adapter);
pr_info("migration start\n");
migration_status = MIGRATION_IN_PROGRESS; 
 
@@ -3017,6 +3018,8 @@ int ixgbevf_live_mg(struct ixgbevf_adapter *adapter)
return 1;
 
ixgbevf_restore_state(adapter);
+   ixgbevf_save_reset_stats(adapter);
+   ixgbevf_init_last_counter_stats(adapter);
migration_status = MIGRATION_COMPLETED;
pr_info("migration end\n");
return 0;
-- 
1.8.4.rc0.1.g8f6a3e5.dirty




Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi

2015-10-22 Thread Valerio Aimale

On 10/21/15 4:54 AM, Markus Armbruster wrote:

Valerio Aimale  writes:


On 10/19/15 1:52 AM, Markus Armbruster wrote:

Valerio Aimale  writes:


On 10/16/15 2:15 AM, Markus Armbruster wrote:

vale...@aimale.com writes:


All-

I've produced a patch for the current QEMU HEAD, for libvmi to
introspect QEMU/KVM VMs.

Libvmi has patches for the old qemu-kvm fork, inside its source tree:
https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch

This patch adds an hmp and a qmp command, "pmemaccess". When the
command is invoked with a string argument (a filename), it will open
a UNIX socket and spawn a listening thread.

The client writes binary commands to the socket, in the form of a C
structure:

struct request {
    uint8_t type;       // 0 quit, 1 read, 2 write, ... rest reserved
    uint64_t address;   // address to read from OR write to
    uint64_t length;    // number of bytes to read OR write
};

The client receives as a response either (length+1) bytes, if it is a
read operation, or 1 byte if it is a write operation.

The last byte of a read operation response indicates success (1
success, 0 failure). The single byte returned for a write operation
indicates the same (1 success, 0 failure).

So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
garbage followed by the "it failed" byte?

Markus, that appears to be the case. However, I did not write the
communication protocol between libvmi and qemu. I'm assuming that the
person who wrote the protocol did not want to bother with
overcomplicating things.

https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c

I'm thinking he assumed reads would be small in size and the price of
reading garbage was less than the price of writing a more complicated
protocol. I can see his point, confronted with the same problem, I
might have done the same.

All right, the interface is designed for *small* memory blocks then.

Makes me wonder why he needs a separate binary protocol on a separate
socket.  Small blocks could be done just fine in QMP.

The problem is speed. If one's analyzing the memory space of a running
process (physical and paged), libvmi will make a large number of small
and mid-sized reads. If one uses xp, or pmemsave, the overhead is
quite significant. xp has overhead due to encoding, and pmemsave has
overhead due to file open/write (server), file open/read/close/unlink
(client).

Others have gone through the problem before me. It appears that
pmemsave and xp are significantly slower than reading memory using a
socket via pmemaccess.

That they're slower isn't surprising, but I'd expect the cost of
encoding a small block to be insignificant compared to the cost of the
network roundtrips.

As block size increases, the space overhead of encoding will eventually
bite.  But for that usage, the binary protocol appears ill-suited,
unless the client can pretty reliably avoid read failure.  I haven't
examined its failure modes, yet.


The following data is not mine, but it shows the time, in
milliseconds, required to resolve the content of a paged memory
address via socket (pmemaccess), pmemsave and xp:

http://cl.ly/image/322a3s0h1V05

Again, I did not produce those data points, they come from an old
libvmi thread.

90ms is a very long time.  What exactly was measured?

That is a fair question to ask. Unfortunately, I extracted that data plot
from an old thread in some libvmi mailing list. I do not have the data and
code that produced it. Sifting through the thread, I can see the code was
never published. I will take it upon myself to produce code that compares
timing - in a fair fashion - of libvmi doing an atomic operation and a
larger-scale operation (like listing running processes) via gdb,
pmemaccess/socket, pmemsave, xp, and hopefully, a version of xp that
returns byte streams of memory regions base64 or base85 encoded in json
strings. I'll publish results and code.

However, given workload and life happening, it will be some time before I
complete that task.



I think it might be conceivable that there could be a QMP command that
returns the content of an arbitrarily sized memory region as a base64
or a base85 json string. It would still have both time (due to
encoding/decoding) and space (base64 adds 33% and base85 would add 25%)
overhead, plus json encoding/decoding overhead. It might still be the
case that a socket would outperform such a command as well,
speed-wise. I don't think it would be any faster than xp.
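
To put numbers on the space overhead, here is a small sketch that computes encoded sizes. It assumes the usual encodings: base64 emits 4 characters per 3 input bytes and base85 emits 5 characters per 4 input bytes. The function names are illustrative, and the base85 figure is an upper bound because some variants shorten the final partial group.

```c
#include <assert.h>
#include <stdint.h>

/* Padded base64: every group of up to 3 bytes becomes 4 characters. */
static uint64_t base64_encoded_len(uint64_t nbytes)
{
    return (nbytes + 2) / 3 * 4;
}

/* base85: every group of up to 4 bytes becomes 5 characters
 * (upper bound; the last partial group may be shorter in practice). */
static uint64_t base85_encoded_len(uint64_t nbytes)
{
    return (nbytes + 3) / 4 * 5;
}

/* Overhead in percent relative to the raw size. */
static unsigned overhead_percent(uint64_t raw, uint64_t encoded)
{
    assert(raw > 0 && encoded >= raw);
    return (unsigned)((encoded - raw) * 100 / raw);
}
```

For a 1 MiB read this gives 1398104 characters for base64 and 1310720 for base85, before any JSON quoting or framing is added on top.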

A special-purpose binary protocol over a dedicated socket will always do
less than a QMP solution (ignoring foolishness like transmitting crap on
read error the client is then expected to throw away).  The question is
whether the difference in work translates to a worthwhile difference in
performance.

The larger question is actually whether we have an existing interface
that can serve the libvmi's needs.  We've discussed monitor commands
like xp, 

[Qemu-devel] [PATCH v6 05/12] block: Introduce "drained begin/end" API

2015-10-22 Thread Fam Zheng
The semantics is that after bdrv_drained_begin(bs), bs will not get new external
requests until the matching bdrv_drained_end(bs).

Signed-off-by: Fam Zheng 
---
 block/io.c| 17 +
 include/block/block.h | 19 +++
 include/block/block_int.h |  2 ++
 3 files changed, 38 insertions(+)

diff --git a/block/io.c b/block/io.c
index 2fd7a1d..5ac6256 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2624,3 +2624,20 @@ void bdrv_flush_io_queue(BlockDriverState *bs)
 }
 bdrv_start_throttled_reqs(bs);
 }
+
+void bdrv_drained_begin(BlockDriverState *bs)
+{
+    if (!bs->quiesce_counter++) {
+        aio_disable_external(bdrv_get_aio_context(bs));
+    }
+    bdrv_drain(bs);
+}
+
+void bdrv_drained_end(BlockDriverState *bs)
+{
+    assert(bs->quiesce_counter > 0);
+    if (--bs->quiesce_counter > 0) {
+        return;
+    }
+    aio_enable_external(bdrv_get_aio_context(bs));
+}
diff --git a/include/block/block.h b/include/block/block.h
index 28d903c..5d722a7 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -610,4 +610,23 @@ void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
 void bdrv_flush_io_queue(BlockDriverState *bs);
 
+/**
+ * bdrv_drained_begin:
+ *
+ * Begin a quiesced section for exclusive access to the BDS, by disabling
+ * external request sources including NBD server and device model. Note that
+ * this doesn't block timers or coroutines from submitting more requests, which
+ * means block_job_pause is still necessary.
+ *
+ * This function can be recursive.
+ */
+void bdrv_drained_begin(BlockDriverState *bs);
+
+/**
+ * bdrv_drained_end:
+ *
+ * End a quiescent section started by bdrv_drained_begin().
+ */
+void bdrv_drained_end(BlockDriverState *bs);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index e472a03..e317b14 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -448,6 +448,8 @@ struct BlockDriverState {
 /* threshold limit for writes, in bytes. "High water mark". */
 uint64_t write_threshold_offset;
 NotifierWithReturn write_threshold_notifier;
+
+int quiesce_counter;
 };
 
 struct BlockBackendRootState {
-- 
2.4.3
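
The nesting behaviour of the begin/end pair can be illustrated with a standalone model. This is a sketch, not QEMU code: MockBDS and the mock_* functions are hypothetical stand-ins, with a boolean replacing aio_disable_external()/aio_enable_external() and a call counter replacing bdrv_drain().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for BlockDriverState plus its AioContext. */
typedef struct {
    int quiesce_counter;     /* nesting depth of drained sections */
    bool external_disabled;  /* models the AioContext "external" switch */
    int drain_calls;         /* how many times a drain was requested */
} MockBDS;

static void mock_drained_begin(MockBDS *bs)
{
    /* Only the outermost begin disables external request sources. */
    if (!bs->quiesce_counter++) {
        bs->external_disabled = true;
    }
    /* Every begin still drains requests that are already in flight. */
    bs->drain_calls++;
}

static void mock_drained_end(MockBDS *bs)
{
    assert(bs->quiesce_counter > 0);
    /* Inner ends only decrement; the outermost end re-enables sources. */
    if (--bs->quiesce_counter > 0) {
        return;
    }
    bs->external_disabled = false;
}
```

The counter is what makes the section safe to nest: external sources stay disabled until the last matching end call.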




Re: [Qemu-devel] [Qemu-block] [PATCH] block/nfs: add support for setting debug level

2015-10-22 Thread Peter Lieven

Am 22.09.2015 um 08:13 schrieb Peter Lieven:

Am 25.06.2015 um 15:18 schrieb Stefan Hajnoczi:

On Tue, Jun 23, 2015 at 10:12:15AM +0200, Peter Lieven wrote:

Upcoming libnfs versions will support logging debug messages. Add
support for it in qemu through a URL parameter.

Signed-off-by: Peter Lieven 
---
  block/nfs.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index ca9e24e..f7388a3 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -329,6 +329,10 @@ static int64_t nfs_client_open(NFSClient *client, const char *filename,
  } else if (!strcmp(qp->p[i].name, "readahead")) {
  nfs_set_readahead(client->context, val);
  #endif
+#ifdef LIBNFS_FEATURE_DEBUG
+} else if (!strcmp(qp->p[i].name, "debug")) {
+nfs_set_debug(client->context, val);
+#endif
  } else {
  error_setg(errp, "Unknown NFS parameter name: %s",
 qp->p[i].name);

Untrusted users may be able to set these options since they are encoded
in the URI.  I'm imagining a hosting or cloud scenario like OpenStack.

A verbose debug level spams stderr and could consume a lot of disk
space.

(The uid and gid options are probably okay since the NFS server cannot
trust the uid/gid coming from QEMU anyway.)

I think we can merge this patch for QEMU 2.4 but I'd like to have a
discussion about the security risk of encoding libnfs options in the
URI.

CCed Eric Blake in case libvirt is affected.

Has anyone thought about this and what are the rules?


As I haven't had time to work further on the best way to add options for NFS
(and other protocols), would it be feasible to allow passing debug as a URL
parameter, but limit the maximum debug level to limit a possible security
impact (flooding logs)?

If a higher debug level is needed it can be set via device specific options
as soon as there is a common scheme for them.


Any objections?

Peter
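
The proposed limit could look roughly like the sketch below. It is only an illustration: clamp_debug_level() and the cap passed to it are hypothetical, and the actual maximum level would have to be agreed on.

```c
#include <assert.h>

/* Clamp a debug level taken from the URL to the range [0, max_level].
 * Both the helper and the chosen cap are illustrative assumptions. */
static long clamp_debug_level(long requested, long max_level)
{
    assert(max_level >= 0);
    if (requested < 0) {
        return 0;
    }
    if (requested > max_level) {
        return max_level;
    }
    return requested;
}
```

The "debug" branch of the parameter loop would then pass the clamped value to nfs_set_debug() instead of the raw one, so an untrusted URL cannot request an arbitrarily verbose level.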




Re: [Qemu-devel] [PATCH v9 3/3] block/gluster: add support for multiple gluster servers

2015-10-22 Thread Peter Krempa
On Wed, Oct 21, 2015 at 19:04:11 +0530, Prasanna Kumar Kalever wrote:

...

> ---
>  block/gluster.c  | 420 
> +--
>  qapi/block-core.json |  62 +++-
>  2 files changed, 433 insertions(+), 49 deletions(-)
> 
> diff --git a/block/gluster.c b/block/gluster.c
> index ededda2..62b6656 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c

...

> +
>  
>  static void qemu_gluster_gconf_free(GlusterConf *gconf)
>  {
>  if (gconf) {
> -g_free(gconf->host);
>  g_free(gconf->volume);
>  g_free(gconf->path);
> -g_free(gconf->transport);
> +if (gconf->gsconf) {
> +g_free(gconf->gsconf[0].host);
> +g_free(gconf->gsconf[0].transport);
> +g_free(gconf->gsconf);

Looks like this leaks the second and any further server config structs.

> +}
>  g_free(gconf);
>  }
>  }

Peter
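
One way to avoid the leak is to walk every server entry before freeing the array. The sketch below is self-contained and uses plain malloc()/free() rather than glib; the Conf and ServerConf types and the num_servers field are hypothetical, since the quoted hunk does not show how the patch stores the server count.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified versions of the config structures. */
typedef struct {
    char *host;
    char *transport;
} ServerConf;

typedef struct {
    char *volume;
    char *path;
    ServerConf *servers;
    size_t num_servers;   /* assumed field: number of entries in servers[] */
} Conf;

static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, s, n);
    return copy;
}

/* Build a config with num_servers placeholder entries. */
static Conf *conf_new(size_t num_servers)
{
    Conf *conf = calloc(1, sizeof(*conf));
    assert(conf != NULL);
    conf->volume = dup_string("vol");
    conf->path = dup_string("/image");
    conf->num_servers = num_servers;
    if (num_servers > 0) {
        conf->servers = calloc(num_servers, sizeof(*conf->servers));
        assert(conf->servers != NULL);
    }
    for (size_t i = 0; i < num_servers; i++) {
        conf->servers[i].host = dup_string("host");
        conf->servers[i].transport = dup_string("tcp");
    }
    return conf;
}

/* Free every server entry, not just servers[0].
 * Returns the number of server entries released. */
static size_t conf_free(Conf *conf)
{
    size_t freed = 0;

    if (!conf) {
        return 0;
    }
    for (size_t i = 0; i < conf->num_servers; i++) {
        free(conf->servers[i].host);
        free(conf->servers[i].transport);
        freed++;
    }
    free(conf->servers);
    free(conf->volume);
    free(conf->path);
    free(conf);
    return freed;
}
```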



signature.asc
Description: Digital signature


Re: [Qemu-devel] [PATCH 0/6] e1000: Various fixes and registers' implementation

2015-10-22 Thread Jason Wang


On 10/21/2015 09:32 PM, Leonid Bloch wrote:
> Hi Jason, thanks again for reviewing,
>
> On Tue, Oct 20, 2015 at 9:37 AM, Jason Wang wrote:
> >
> >
> > On 10/18/2015 03:53 PM, Leonid Bloch wrote:
> >> This series fixes several issues with incorrect packet/octet counting in
> >> e1000's Statistic registers, fixes a bug in the packet address filtering
> >> procedure, and implements many MAC registers that were absent before.
> >> Additionally, some cosmetic changes are made.
> >>
> >> Leonid Bloch (6):
> >>   e1000: Cosmetic and alignment fixes
> >>   e1000: Trivial implementation of various MAC registers
> >>   e1000: Fixing the received/transmitted packets' counters
> >>   e1000: Fixing the received/transmitted octets' counters
> >>   e1000: Fixing the packet address filtering procedure
> >>   e1000: Implementing various counters
> >>
> >>  hw/net/e1000.c  | 313
> ++--
> >>  hw/net/e1000_regs.h |   8 +-
> >>  2 files changed, 236 insertions(+), 85 deletions(-)
> >>
> >
> > Looks good to me overall, just a few comments in individual patches.
> >
> > A question here, is there any real user/OS that tries to use those
> > registers? If not, maintaining them seems a burden, and it's a little
> > hard to test them unless unit tests are implemented for those
> > registers. And I'd like to know the test status of this series. At least
> > both Windows and Linux guests need to be tested.
> >
> > Thanks
>
> While we did not encounter any actual drivers that malfunction because
> of the lack of these registers, implementing them makes the device
> closer to Intel's specs, and reduces the chances that some OSes,
> currently or in the future, may misbehave because of the missing
> registers. The implementation of these additional registers seems like a
> natural continuation of this series, the main purpose of which is to
> fix several bugs in this device.
>
> As for testing, it was performed, obviously, in several different
> scenarios with Linux (Fedora 22) + Windows (2012R2) guests. No
> regressions (and no statistically significant deviations) were found.
> Please find representative results (TCP, 1 stream) for both Linux and
> Windows guests below:
>
> Fedora 22 guest -- receive
> [ASCII plot garbled in the archive: throughput in Gb/s against buffer
> size, 32B to 64KB; Mean-old -- A, Mean-new -- B]
>
> Fedora 22 guest -- transmit
> [ASCII plot garbled in the archive and cut off: throughput in Gb/s]

Re: [Qemu-devel] [PATCH qemu v4] monitor/target-ppc: Define target_get_monitor_def

2015-10-22 Thread Alexey Kardashevskiy

On 10/02/2015 04:16 PM, Alexey Kardashevskiy wrote:

At the moment get_monitor_def() returns only registers from statically
defined monitor_defs array. However there are a lot of BOOK3S SPRs
which are not in the list and cannot be printed from the monitor.

This adds a new target platform hook - target_get_monitor_def().
The hook is called if a register was not found in the static
array returned by the target_monitor_defs() hook.

The hook is only defined for POWERPC, it returns registered
SPRs and fails on unregistered ones providing the user with information
on what is actually supported on the running CPU. The register value is
saved as uint64_t as it is the biggest supported register size;
target_ulong cannot be used because of the stub - it is in a "common"
code and cannot include "cpu.h", etc; this is also why the hook prototype
is redefined in the stub instead of being included from some header.

This replaces static descriptors for GPRs, FPRs, SRs with a helper which
looks for a value in a corresponding array in the CPUPPCState.
The immediate effect is that all 32 SRs can be printed now (instead of 16);
later this can be reused for VSX or TM registers.

While we are here, this adds "cr" as a synonym of "ccr".

Signed-off-by: Alexey Kardashevskiy 
---

Does it make sense to split it into two patches?



Ping?




---
Changes:
v4:
* rebased on the current upstream which moved MonitorDef to target-ppc
* reverted the change for registers other than GPR/FPR/SPR, such as
FPSCR/PC/MSR/...
* removed CPU hook and made it a stub

v3:
* removed the check for endptr as strtoul() always initializes it
* check if there is any number after r/f/sr and fail if none
* added tolower()/strncasecmp() to support both r/f/sr and R/F/SR

v2:
* handles r**, f**, sr** if their numbers  were parsed completely and correctly
* added "cr" as synonym of "ccr"
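
The r/f/sr name handling listed in the changelog can be sketched as follows. This is not the patch's code: parse_reg_index() is a hypothetical helper that shows one way to accept a case-insensitive prefix, require a number after it, and reject trailing characters or out-of-range indexes. It relies on POSIX strncasecmp().

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Parse names such as "r5", "F31" or "sr12".
 * Returns 0 and stores the register number in *index on success,
 * -1 if the prefix does not match, no number follows the prefix,
 * extra characters trail the number, or the number is >= limit. */
static int parse_reg_index(const char *name, const char *prefix,
                           unsigned long limit, unsigned long *index)
{
    size_t plen = strlen(prefix);
    char *end;
    unsigned long n;

    assert(name != NULL && index != NULL);

    if (strncasecmp(name, prefix, plen) != 0) {
        return -1;
    }
    /* Fail if there is no digit right after the prefix; this also
     * rejects signs and whitespace that strtoul() would accept. */
    if (!isdigit((unsigned char)name[plen])) {
        return -1;
    }
    n = strtoul(name + plen, &end, 10);
    /* The number must be parsed completely and be within range. */
    if (*end != '\0' || n >= limit) {
        return -1;
    }
    *index = n;
    return 0;
}
```

A hook built on such a helper could try each prefix in turn and then read the value from the matching array in the CPU state, falling back to an error for anything it does not recognize.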
---
  include/monitor/hmp-target.h   |   1 +
  monitor.c  |  10 +-
  stubs/Makefile.objs|   1 +
  stubs/target-get-monitor-def.c |  31 ++
  target-ppc/cpu-qom.h   |   2 +
  target-ppc/monitor.c   | 219 -
  6 files changed, 105 insertions(+), 159 deletions(-)
  create mode 100644 stubs/target-get-monitor-def.c

diff --git a/include/monitor/hmp-target.h b/include/monitor/hmp-target.h
index 213566c..bc2c9c0 100644
--- a/include/monitor/hmp-target.h
+++ b/include/monitor/hmp-target.h
@@ -35,6 +35,7 @@ struct MonitorDef {
  };

  const MonitorDef *target_monitor_defs(void);
+int target_get_monitor_def(CPUState *cs, const char *name, uint64_t *pval);

  CPUArchState *mon_get_cpu_env(void);
  CPUState *mon_get_cpu(void);
diff --git a/monitor.c b/monitor.c
index 4f1ba2f..83e126a 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2104,6 +2104,8 @@ static int get_monitor_def(target_long *pval, const char *name)
  {
  const MonitorDef *md = target_monitor_defs();
  void *ptr;
+uint64_t tmp = 0;
+int ret;

  if (md == NULL) {
  return -1;
@@ -2131,7 +2133,13 @@ static int get_monitor_def(target_long *pval, const char *name)
  return 0;
  }
  }
-return -1;
+
+ret = target_get_monitor_def(mon_get_cpu(), name, &tmp);
+if (!ret) {
+*pval = (target_long) tmp;
+}
+
+return ret;
  }

  static void next(void)
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 85e4e81..91bcdbb 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -39,3 +39,4 @@ stub-obj-y += cpus.o
  stub-obj-y += kvm.o
  stub-obj-y += qmp_pc_dimm_device_list.o
  stub-obj-y += target-monitor-defs.o
+stub-obj-y += target-get-monitor-def.o
diff --git a/stubs/target-get-monitor-def.c b/stubs/target-get-monitor-def.c
new file mode 100644
index 000..711a9ae
--- /dev/null
+++ b/stubs/target-get-monitor-def.c
@@ -0,0 +1,31 @@
+/*
+ *  Stub for target_get_monitor_def.
+ *
+ *  Copyright IBM Corp., 2015
+ *
+ *  Author: Alexey Kardashevskiy 
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License,
+ *  or (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "stdint.h"
+
+typedef struct CPUState CPUState;
+
+int target_get_monitor_def(CPUState *cs, const char *name, uint64_t *pval);
+
+int target_get_monitor_def(CPUState *cs, const char *name, uint64_t *pval)
+{
+return -1;
+}
diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
index 6967a80..bc20504 100644

Re: [Qemu-devel] [PATCH RFC V5 8/9] target-arm/cpu64 GICv3 system instructions support

2015-10-22 Thread Pavel Fedin
 Hello!

> -Original Message-
> From: Shlomo Pongratz [mailto:shlomopongr...@gmail.com]
> Sent: Tuesday, October 20, 2015 8:22 PM
> To: qemu-devel@nongnu.org
> Cc: p.fe...@samsung.com; peter.mayd...@linaro.org; eric.au...@linaro.org;
> shannon.z...@linaro.org; imamm...@redhat.com; ash...@broadcom.com; Shlomo 
> Pongratz
> Subject: [PATCH RFC V5 8/9] target-arm/cpu64 GICv3 system instructions support
> 
> From: Shlomo Pongratz 
> 
> Add system instructions used by the Linux (kernel) GICv3
> device driver
> 
> Signed-off-by: Shlomo Pongratz 
> ---
>  target-arm/cpu-qom.h |   1 +
>  target-arm/cpu.h |  12 ++
>  target-arm/cpu64.c   | 118 
> +++
>  3 files changed, 131 insertions(+)
> 
> diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
> index 25fb1ce..6a50433 100644
> --- a/target-arm/cpu-qom.h
> +++ b/target-arm/cpu-qom.h
> @@ -220,6 +220,7 @@ hwaddr arm_cpu_get_phys_page_debug(CPUState *cpu, vaddr 
> addr);
> 
>  int arm_cpu_gdb_read_register(CPUState *cpu, uint8_t *buf, int reg);
>  int arm_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
> +void aarch64_registers_with_opaque_set(Object *obj, void *opaque);
> 
>  /* Callback functions for the generic timer's timers. */
>  void arm_gt_ptimer_cb(void *opaque);
> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> index 3daa7f5..d561313 100644
> --- a/target-arm/cpu.h
> +++ b/target-arm/cpu.h
> @@ -1034,6 +1034,18 @@ void armv7m_nvic_set_pending(void *opaque, int irq);
>  int armv7m_nvic_acknowledge_irq(void *opaque);
>  void armv7m_nvic_complete_irq(void *opaque, int irq);
> 
> +void armv8_gicv3_set_sgi(void *opaque, int cpuindex, uint64_t value);
> +uint64_t armv8_gicv3_acknowledge_irq(void *opaque, int cpuindex,
> +  MemTxAttrs attrs);
> +void armv8_gicv3_complete_irq(void *opaque, int cpuindex, int irq,
> +  MemTxAttrs attrs);
> +uint64_t armv8_gicv3_get_priority_mask(void *opaque, int cpuindex);
> +void armv8_gicv3_set_priority_mask(void *opaque, int cpuindex, uint32_t 
> mask);
> +uint64_t armv8_gicv3_get_sre(void *opaque);
> +void armv8_gicv3_set_sre(void *opaque, uint64_t sre);
> +uint64_t armv8_gicv3_get_igrpen1(void *opaque, int cpuindex);
> +void armv8_gicv3_set_igrpen1(void *opaque, int cpuindex, uint64_t igrpen1);
> +
>  /* Interface for defining coprocessor registers.
>   * Registers are defined in tables of arm_cp_reginfo structs
>   * which are passed to define_arm_cp_regs().
> diff --git a/target-arm/cpu64.c b/target-arm/cpu64.c
> index 63c8b1c..4224779 100644
> --- a/target-arm/cpu64.c
> +++ b/target-arm/cpu64.c
> @@ -45,6 +45,115 @@ static uint64_t a57_a53_l2ctlr_read(CPUARMState *env, 
> const ARMCPRegInfo
> *ri)
>  }
>  #endif
> 
> +#ifndef CONFIG_USER_ONLY
> +static void sgi_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t 
> value)
> +{
> +CPUState *cpu = ENV_GET_CPU(env);
> +armv8_gicv3_set_sgi(ri->opaque, cpu->cpu_index, value);
> +}
> +
> +static uint64_t iar_read(CPUARMState *env, const ARMCPRegInfo *ri)
> +{
> +uint64_t value;
> +MemTxAttrs attrs;
> +CPUState *cpu = ENV_GET_CPU(env);
> +attrs.secure = arm_is_secure_below_el3(env) ? 1 : 0;
> +value = armv8_gicv3_acknowledge_irq(ri->opaque, cpu->cpu_index, attrs);
> +return value;
> +}
> +
> +static void sre_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t 
> value)
> +{
> +armv8_gicv3_set_sre(ri->opaque, value);
> +}
> +
> +static uint64_t sre_read(CPUARMState *env, const ARMCPRegInfo *ri)
> +{
> +uint64_t value;
> +value = armv8_gicv3_get_sre(ri->opaque);
> +return value;
> +}
> +
> +static void eoir_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t 
> value)
> +{
> +MemTxAttrs attrs;
> +CPUState *cpu = ENV_GET_CPU(env);
> +attrs.secure = arm_is_secure_below_el3(env) ? 1 : 0;
> +armv8_gicv3_complete_irq(ri->opaque, cpu->cpu_index, value, attrs);
> +}
> +
> +static uint64_t pmr_read(CPUARMState *env, const ARMCPRegInfo *ri)
> +{
> +uint64_t value;
> +CPUState *cpu = ENV_GET_CPU(env);
> +value = armv8_gicv3_get_priority_mask(ri->opaque, cpu->cpu_index);
> +return value;
> +}
> +
> +static void pmr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t 
> value)
> +{
> +CPUState *cpu = ENV_GET_CPU(env);
> +armv8_gicv3_set_priority_mask(ri->opaque, cpu->cpu_index, value);
> +}
> +
> +static uint64_t igrpen1_read(CPUARMState *env, const ARMCPRegInfo *ri)
> +{
> +uint64_t value;
> +CPUState *cpu = ENV_GET_CPU(env);
> +value = armv8_gicv3_get_igrpen1(ri->opaque, cpu->cpu_index);
> +return value;
> +}
> +
> +static void igrpen1_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t 
> value)
> +{
> +CPUState *cpu = ENV_GET_CPU(env);
> +armv8_gicv3_set_igrpen1(ri->opaque, cpu->cpu_index, value);
> +}
> +#endif
> +
> +static const ARMCPRegInfo 

[Qemu-devel] [PATCH v3 04/21] util: Infrastructure for computing recent averages

2015-10-22 Thread Alberto Garcia
This module computes the average of a set of values within a time
window, keeping also track of the minimum and maximum values.

In order to produce more accurate results it works internally by
creating two time windows of the same period, offset by half of
that period. Values are accounted on both windows and the data is
always returned from the oldest one.
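
The two-window scheme described above can be sketched in a standalone form. This is a simplified illustration, not the code in util/timed-average.c: the names are hypothetical, the current time is passed in explicitly instead of being read from a QEMU clock, and if both windows expire at once they restart with the same expiration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t min, max, sum, count;
    int64_t expiration;          /* end of this window */
} Window;

typedef struct {
    int64_t period;
    Window windows[2];           /* same period, offset by period / 2 */
    unsigned current;            /* index of the oldest window */
} TimedAvg;

static void window_reset(Window *w)
{
    w->min = UINT64_MAX;
    w->max = 0;
    w->sum = 0;
    w->count = 0;
}

static void ta_init(TimedAvg *ta, int64_t period, int64_t now)
{
    assert(period > 0);
    ta->period = period;
    ta->current = 0;
    window_reset(&ta->windows[0]);
    window_reset(&ta->windows[1]);
    ta->windows[0].expiration = now + period / 2;
    ta->windows[1].expiration = now + period;
}

/* Restart expired windows, then point 'current' at the oldest one. */
static void ta_check_expirations(TimedAvg *ta, int64_t now)
{
    for (int i = 0; i < 2; i++) {
        Window *w = &ta->windows[i];
        if (w->expiration <= now) {
            window_reset(w);
            w->expiration = now + ta->period;
        }
    }
    ta->current =
        ta->windows[0].expiration < ta->windows[1].expiration ? 0 : 1;
}

/* Account a value on both windows. */
static void ta_account(TimedAvg *ta, uint64_t value, int64_t now)
{
    ta_check_expirations(ta, now);
    for (int i = 0; i < 2; i++) {
        Window *w = &ta->windows[i];
        w->sum += value;
        w->count++;
        if (value < w->min) {
            w->min = value;
        }
        if (value > w->max) {
            w->max = value;
        }
    }
}

/* Results always come from the oldest window. */
static uint64_t ta_avg(TimedAvg *ta, int64_t now)
{
    ta_check_expirations(ta, now);
    const Window *w = &ta->windows[ta->current];
    return w->count ? w->sum / w->count : 0;
}

static uint64_t ta_min(TimedAvg *ta, int64_t now)
{
    ta_check_expirations(ta, now);
    const Window *w = &ta->windows[ta->current];
    return w->min < UINT64_MAX ? w->min : 0;
}

static uint64_t ta_max(TimedAvg *ta, int64_t now)
{
    ta_check_expirations(ta, now);
    return ta->windows[ta->current].max;
}
```

Because the window that answers queries has always been collecting data for at least half a period, a restart of one window never makes the reported average drop to an empty sample set while values keep arriving.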

Signed-off-by: Alberto Garcia 
---
 include/qemu/timed-average.h |  63 +
 tests/Makefile   |   4 +
 tests/test-timed-average.c   |  90 +++
 util/Makefile.objs   |   1 +
 util/timed-average.c | 210 +++
 5 files changed, 368 insertions(+)
 create mode 100644 include/qemu/timed-average.h
 create mode 100644 tests/test-timed-average.c
 create mode 100644 util/timed-average.c

diff --git a/include/qemu/timed-average.h b/include/qemu/timed-average.h
new file mode 100644
index 000..f1cdddc
--- /dev/null
+++ b/include/qemu/timed-average.h
@@ -0,0 +1,63 @@
+/*
+ * QEMU timed average computation
+ *
+ * Copyright (C) Nodalink, EURL. 2014
+ * Copyright (C) Igalia, S.L. 2015
+ *
+ * Authors:
+ *   Benoît Canet 
+ *   Alberto Garcia 
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) version 3 or any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef TIMED_AVERAGE_H
+#define TIMED_AVERAGE_H
+
+#include <stdint.h>
+
+#include "qemu/timer.h"
+
+typedef struct TimedAverageWindow TimedAverageWindow;
+typedef struct TimedAverage TimedAverage;
+
+/* All fields of both structures are private */
+
+struct TimedAverageWindow {
+uint64_t  min; /* minimum value accounted in the window */
+uint64_t  max; /* maximum value accounted in the window */
+uint64_t  sum; /* sum of all values */
+uint64_t  count;   /* number of values */
+int64_t   expiration;  /* the end of the current window in ns */
+};
+
+struct TimedAverage {
+uint64_t   period; /* period in nanoseconds */
+TimedAverageWindow windows[2]; /* two overlapping windows with
+* an offset of period / 2 between them */
+unsigned   current;/* the current window index: it's also the
+* oldest window index */
+QEMUClockType  clock_type; /* the clock used */
+};
+
+void timed_average_init(TimedAverage *ta, QEMUClockType clock_type,
+uint64_t period);
+
+void timed_average_account(TimedAverage *ta, uint64_t value);
+
+uint64_t timed_average_min(TimedAverage *ta);
+uint64_t timed_average_avg(TimedAverage *ta);
+uint64_t timed_average_max(TimedAverage *ta);
+
+#endif
diff --git a/tests/Makefile b/tests/Makefile
index 0531b30..0c12112 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -81,6 +81,7 @@ check-unit-y += tests/test-crypto-cipher$(EXESUF)
 check-unit-$(CONFIG_GNUTLS) += tests/test-crypto-tlscredsx509$(EXESUF)
 check-unit-$(CONFIG_GNUTLS) += tests/test-crypto-tlssession$(EXESUF)
 check-unit-$(CONFIG_LINUX) += tests/test-qga$(EXESUF)
+check-unit-y += tests/test-timed-average$(EXESUF)
 
 check-block-$(CONFIG_POSIX) += tests/qemu-iotests-quick.sh
 
@@ -403,6 +404,9 @@ tests/test-vmstate$(EXESUF): tests/test-vmstate.o \
migration/vmstate.o migration/qemu-file.o migration/qemu-file-buf.o \
 migration/qemu-file-unix.o qjson.o \
$(test-qom-obj-y)
+tests/test-timed-average$(EXESUF): tests/test-timed-average.o qemu-timer.o \
+   libqemuutil.a stubs/clock-warp.o stubs/cpu-get-icount.o \
+   stubs/notify-event.o
 
 tests/test-qapi-types.c tests/test-qapi-types.h :\
 $(SRC_PATH)/tests/qapi-schema/qapi-schema-test.json 
$(SRC_PATH)/scripts/qapi-types.py $(qapi-py)
diff --git a/tests/test-timed-average.c b/tests/test-timed-average.c
new file mode 100644
index 000..a049799
--- /dev/null
+++ b/tests/test-timed-average.c
@@ -0,0 +1,90 @@
+/*
+ * Timed average computation tests
+ *
+ * Copyright Nodalink, EURL. 2014
+ *
+ * Authors:
+ *  Benoît Canet 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ */
+
+#include 
+#include 
+
+#include "qemu/timed-average.h"
+
+/* This is the clock for QEMU_CLOCK_VIRTUAL */
+static int64_t 

[Qemu-devel] [PATCH v3 06/21] block: Add statistics for failed and invalid I/O operations

2015-10-22 Thread Alberto Garcia
This patch adds the block_acct_failed() and block_acct_invalid()
functions to allow keeping track of failed and invalid I/O operations.

The number of failed and invalid operations is exposed in
BlockDeviceStats.

We don't keep track of the time spent on invalid operations because
they are cancelled immediately when they are started.
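
The difference between the three outcomes can be shown with a small standalone model. It is a sketch under assumptions: the Acct* names are hypothetical, the completion path is reduced to an operation count plus elapsed time, and the current time is passed in explicitly rather than read from a QEMU clock.

```c
#include <assert.h>
#include <stdint.h>

enum AcctType { ACCT_READ, ACCT_WRITE, ACCT_FLUSH, ACCT_MAX };

typedef struct {
    uint64_t nr_ops[ACCT_MAX];
    uint64_t failed_ops[ACCT_MAX];
    uint64_t invalid_ops[ACCT_MAX];
    uint64_t total_time_ns[ACCT_MAX];
    int64_t last_access_time_ns;
} AcctStats;

typedef struct {
    int64_t start_time_ns;
    enum AcctType type;
} AcctCookie;

/* Remember when the request started and what kind it is. */
static void acct_start(AcctCookie *cookie, enum AcctType type, int64_t now_ns)
{
    cookie->type = type;
    cookie->start_time_ns = now_ns;
}

/* Successful completion: count the operation and the time it took. */
static void acct_done(AcctStats *stats, const AcctCookie *cookie,
                      int64_t now_ns)
{
    assert(cookie->type < ACCT_MAX);
    stats->nr_ops[cookie->type]++;
    stats->total_time_ns[cookie->type] += now_ns - cookie->start_time_ns;
    stats->last_access_time_ns = now_ns;
}

/* Failed completion: I/O was attempted, so the time is still counted. */
static void acct_failed(AcctStats *stats, const AcctCookie *cookie,
                        int64_t now_ns)
{
    assert(cookie->type < ACCT_MAX);
    stats->failed_ops[cookie->type]++;
    stats->total_time_ns[cookie->type] += now_ns - cookie->start_time_ns;
    stats->last_access_time_ns = now_ns;
}

/* Invalid request: rejected at submission, so there is no cookie
 * and no time to add. */
static void acct_invalid(AcctStats *stats, enum AcctType type, int64_t now_ns)
{
    assert(type < ACCT_MAX);
    stats->invalid_ops[type]++;
    stats->last_access_time_ns = now_ns;
}
```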

Signed-off-by: Alberto Garcia 
---
 block/accounting.c | 23 +++
 block/qapi.c   | 10 ++
 include/block/accounting.h |  4 
 qapi/block-core.json   | 23 ++-
 qmp-commands.hx| 12 
 5 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/block/accounting.c b/block/accounting.c
index d427fa8..49a9444 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -51,6 +51,29 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie)
 stats->last_access_time_ns = time_ns;
 }
 
+void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie)
+{
+int64_t time_ns = qemu_clock_get_ns(clock_type);
+
+assert(cookie->type < BLOCK_MAX_IOTYPE);
+
+stats->failed_ops[cookie->type]++;
+stats->total_time_ns[cookie->type] += time_ns - cookie->start_time_ns;
+stats->last_access_time_ns = time_ns;
+}
+
+void block_acct_invalid(BlockAcctStats *stats, enum BlockAcctType type)
+{
+assert(type < BLOCK_MAX_IOTYPE);
+
+/* block_acct_done() and block_acct_failed() update
+ * total_time_ns[], but this one does not. The reason is that
+ * invalid requests are accounted during their submission,
+ * therefore there's no actual I/O involved. */
+
+stats->invalid_ops[type]++;
+stats->last_access_time_ns = qemu_clock_get_ns(clock_type);
+}
 
 void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
   int num_requests)
diff --git a/block/qapi.c b/block/qapi.c
index 539c2e3..84d8412 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -351,6 +351,16 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs,
 s->stats->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
 s->stats->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
 s->stats->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
+
+s->stats->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ];
+s->stats->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE];
+s->stats->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH];
+
+s->stats->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ];
+s->stats->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE];
+s->stats->invalid_flush_operations =
+stats->invalid_ops[BLOCK_ACCT_FLUSH];
+
 s->stats->rd_merged = stats->merged[BLOCK_ACCT_READ];
 s->stats->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
 s->stats->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
diff --git a/include/block/accounting.h b/include/block/accounting.h
index 4b2b999..b50e3cc 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -38,6 +38,8 @@ enum BlockAcctType {
 typedef struct BlockAcctStats {
 uint64_t nr_bytes[BLOCK_MAX_IOTYPE];
 uint64_t nr_ops[BLOCK_MAX_IOTYPE];
+uint64_t invalid_ops[BLOCK_MAX_IOTYPE];
+uint64_t failed_ops[BLOCK_MAX_IOTYPE];
 uint64_t total_time_ns[BLOCK_MAX_IOTYPE];
 uint64_t merged[BLOCK_MAX_IOTYPE];
 int64_t last_access_time_ns;
@@ -52,6 +54,8 @@ typedef struct BlockAcctCookie {
 void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie,
   int64_t bytes, enum BlockAcctType type);
 void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie);
+void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie);
+void block_acct_invalid(BlockAcctStats *stats, enum BlockAcctType type);
 void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
int num_requests);
 int64_t block_acct_idle_time_ns(BlockAcctStats *stats);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 69c3e1f..1e9b9a6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -452,6 +452,24 @@
 #nanoseconds. If the field is absent it means that
 #there haven't been any operations yet (Since 2.5).
 #
+# @failed_rd_operations: The number of failed read operations
+#performed by the device (Since 2.5)
+#
+# @failed_wr_operations: The number of failed write operations
+#performed by the device (Since 2.5)
+#
+# @failed_flush_operations: The number of failed flush operations
+#   performed by the device (Since 2.5)
+#
+# @invalid_rd_operations: The number of invalid read operations
+#  performed by the device (Since 2.5)
+#
+# @invalid_wr_operations: The number of invalid write operations
+#

[Qemu-devel] [PATCH v3 01/21] xen_disk: Account for flush operations

2015-10-22 Thread Alberto Garcia
Currently both BLKIF_OP_WRITE and BLKIF_OP_FLUSH_DISKCACHE are being
accounted as write operations.

Signed-off-by: Alberto Garcia 
Reviewed-by: Stefan Hajnoczi 
---
 hw/block/xen_disk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 1bbc111..4869518 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -576,7 +576,9 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
 }
 
 block_acct_start(blk_get_stats(blkdev->blk), &ioreq->acct,
- ioreq->v.size, BLOCK_ACCT_WRITE);
+ ioreq->v.size,
+ ioreq->req.operation == BLKIF_OP_WRITE ?
+ BLOCK_ACCT_WRITE : BLOCK_ACCT_FLUSH);
 ioreq->aio_inflight++;
 blk_aio_writev(blkdev->blk, ioreq->start / BLOCK_SIZE,
 &ioreq->v, ioreq->v.size / BLOCK_SIZE,
-- 
2.6.1




[Qemu-devel] [PATCH v3 02/21] ide: Account for write operations correctly

2015-10-22 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
Reviewed-by: Stefan Hajnoczi 
---
 hw/ide/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 317406d..b559f1b 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -895,7 +895,7 @@ static void ide_sector_write(IDEState *s)
 qemu_iovec_init_external(&s->qiov, &s->iov, 1);
 
 block_acct_start(blk_get_stats(s->blk), &s->acct,
- n * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+ n * BDRV_SECTOR_SIZE, BLOCK_ACCT_WRITE);
 s->pio_aiocb = blk_aio_writev(s->blk, sector_num, &s->qiov, n,
   ide_sector_write_cb, s);
 }
-- 
2.6.1




[Qemu-devel] [PATCH v3 17/21] atapi: Account for failed and invalid operations

2015-10-22 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 hw/ide/atapi.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 747f466..cf0b78e 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -108,27 +108,30 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
 static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
 {
 int ret;
+block_acct_start(blk_get_stats(s->blk), &s->acct,
+ 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 
 switch(sector_size) {
 case 2048:
-block_acct_start(blk_get_stats(s->blk), &s->acct,
- 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
-block_acct_done(blk_get_stats(s->blk), &s->acct);
 break;
 case 2352:
-block_acct_start(blk_get_stats(s->blk), &s->acct,
- 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
-block_acct_done(blk_get_stats(s->blk), &s->acct);
-if (ret < 0)
-return ret;
-cd_data_to_raw(buf, lba);
+if (ret >= 0) {
+cd_data_to_raw(buf, lba);
+}
 break;
 default:
-ret = -EIO;
-break;
+block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_READ);
+return -EIO;
 }
+
+if (ret < 0) {
+block_acct_failed(blk_get_stats(s->blk), &s->acct);
+} else {
+block_acct_done(blk_get_stats(s->blk), &s->acct);
+}
+
 return ret;
 }
 
@@ -357,7 +360,11 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
 return;
 
 eot:
-block_acct_done(blk_get_stats(s->blk), &s->acct);
+if (ret < 0) {
+block_acct_failed(blk_get_stats(s->blk), &s->acct);
+} else {
+block_acct_done(blk_get_stats(s->blk), &s->acct);
+}
 ide_set_inactive(s, false);
 }
 
-- 
2.6.1
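
For readers following the accounting changes in this series, here is a minimal, self-contained sketch of the pattern the atapi patch moves to: start accounting once, then finish with exactly one of "done" or "failed", and count malformed requests as "invalid". The types and function names below (AcctStats, acct_start, read_sector and so on) are hypothetical stand-ins, not QEMU's real API, and unlike the patch this sketch validates the request before starting the cookie.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the block accounting types. */
enum AcctType { ACCT_READ, ACCT_WRITE, ACCT_FLUSH, ACCT_MAX };

typedef struct {
    uint64_t nr_bytes[ACCT_MAX];
    uint64_t nr_ops[ACCT_MAX];
    uint64_t failed_ops[ACCT_MAX];
    uint64_t invalid_ops[ACCT_MAX];
} AcctStats;

typedef struct {
    int64_t bytes;
    enum AcctType type;
} AcctCookie;

/* Remember what is being accounted until the request completes. */
static void acct_start(AcctCookie *cookie, int64_t bytes, enum AcctType type)
{
    cookie->bytes = bytes;
    cookie->type = type;
}

/* Successful completion: bytes and operation count go up. */
static void acct_done(AcctStats *stats, const AcctCookie *cookie)
{
    stats->nr_bytes[cookie->type] += (uint64_t)cookie->bytes;
    stats->nr_ops[cookie->type]++;
}

/* The request reached the backend but came back with an error. */
static void acct_failed(AcctStats *stats, const AcctCookie *cookie)
{
    stats->failed_ops[cookie->type]++;
}

/* The request was rejected before any I/O was attempted. */
static void acct_invalid(AcctStats *stats, enum AcctType type)
{
    stats->invalid_ops[type]++;
}

/*
 * Simulated sector read. backend_ret plays the role of the backend's
 * return value (0 on success, negative errno on failure).
 */
static int read_sector(AcctStats *stats, int sector_size, int backend_ret)
{
    AcctCookie cookie;

    if (sector_size != 2048 && sector_size != 2352) {
        acct_invalid(stats, ACCT_READ);
        return -EIO;
    }

    acct_start(&cookie, 2048, ACCT_READ);
    if (backend_ret < 0) {
        acct_failed(stats, &cookie);
    } else {
        acct_done(stats, &cookie);
    }
    return backend_ret;
}
```

The point of the single exit path is that every started cookie ends in exactly one counter, so failed and successful operations can never be double counted.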




[Qemu-devel] [PATCH 1/2] iscsi: Translate scsi sense into error code

2015-10-22 Thread Fam Zheng
Previously we returned -EIO blindly when anything went wrong. Add a helper
function to parse sense fields and try to make the return code more
meaningful.

Signed-off-by: Fam Zheng 
---
 block/iscsi.c | 56 +++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 93f1ee4..f3e20ae 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -84,6 +84,7 @@ typedef struct IscsiTask {
 IscsiLun *iscsilun;
 QEMUTimer retry_timer;
 bool force_next_flush;
+int err_code;
 } IscsiTask;
 
 typedef struct IscsiAIOCB {
@@ -182,6 +183,58 @@ static inline unsigned exp_random(double mean)
 #define QEMU_SCSI_STATUS_TIMEOUTSCSI_STATUS_TIMEOUT
 #endif
 
+static int iscsi_translate_sense(struct scsi_sense *sense)
+{
+int ret = 0;
+
+switch (sense->key) {
+case SCSI_SENSE_NO_SENSE:
+return 0;
+break;
+case SCSI_SENSE_NOT_READY:
+return -EBUSY;
+break;
+case SCSI_SENSE_DATA_PROTECTION:
+return -EACCES;
+break;
+case SCSI_SENSE_COMMAND_ABORTED:
+return -ECANCELED;
+break;
+case SCSI_SENSE_ILLEGAL_REQUEST:
+/* Parse ASCQ */
+break;
+default:
+return -EIO;
+break;
+}
+switch (sense->ascq) {
+case SCSI_SENSE_ASCQ_PARAMETER_LIST_LENGTH_ERROR:
+case SCSI_SENSE_ASCQ_INVALID_OPERATION_CODE:
+case SCSI_SENSE_ASCQ_INVALID_FIELD_IN_CDB:
+case SCSI_SENSE_ASCQ_INVALID_FIELD_IN_PARAMETER_LIST:
+ret = -EINVAL;
+break;
+case SCSI_SENSE_ASCQ_LBA_OUT_OF_RANGE:
+ret = -ERANGE;
+break;
+case SCSI_SENSE_ASCQ_LOGICAL_UNIT_NOT_SUPPORTED:
+ret = -ENOTSUP;
+break;
+case SCSI_SENSE_ASCQ_WRITE_PROTECTED:
+ret = -EACCES;
+break;
+case SCSI_SENSE_ASCQ_MEDIUM_NOT_PRESENT:
+case SCSI_SENSE_ASCQ_MEDIUM_NOT_PRESENT_TRAY_CLOSED:
+case SCSI_SENSE_ASCQ_MEDIUM_NOT_PRESENT_TRAY_OPEN:
+ret = -ENOMEDIUM;
+break;
+default:
+ret = -EIO;
+break;
+}
+return ret;
+}
+
 static void
 iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
 void *command_data, void *opaque)
@@ -226,6 +279,7 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
 return;
 }
 }
+iTask->err_code = iscsi_translate_sense(&task->sense);
 error_report("iSCSI Failure: %s", iscsi_get_error(iscsi));
 } else {
 iTask->iscsilun->force_next_flush |= iTask->force_next_flush;
@@ -455,7 +509,7 @@ retry:
 }
 
 if (iTask.status != SCSI_STATUS_GOOD) {
-return -EIO;
+return iTask.err_code;
 }
 
 iscsi_allocationmap_set(iscsilun, sector_num, nb_sectors);
-- 
2.4.3
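
As an illustration of the translation idea in this patch, the following standalone sketch maps a sense key plus a combined ASC/ASCQ code to a negative errno. It is not the libiscsi or QEMU code: the enum names and struct are hypothetical, the numeric values are the standard SCSI codes as I recall them, and the mapping folds in the two review suggestions made elsewhere in this thread (EPERM rather than EACCES, and ENOSPC directly for out-of-range LBAs).

```c
#include <errno.h>

/* Hypothetical names; values follow the usual SCSI sense key numbering. */
enum sense_key {
    KEY_NO_SENSE        = 0x0,
    KEY_NOT_READY       = 0x2,
    KEY_ILLEGAL_REQUEST = 0x5,
    KEY_DATA_PROTECT    = 0x7,
    KEY_ABORTED_COMMAND = 0xb
};

/* Combined (ASC << 8) | ASCQ codes; only a small illustrative subset. */
enum sense_ascq {
    ASCQ_LBA_OUT_OF_RANGE     = 0x2100,
    ASCQ_INVALID_FIELD_IN_CDB = 0x2400,
    ASCQ_WRITE_PROTECTED      = 0x2700
};

struct sense {
    int key;
    int ascq;
};

/* Translate sense data into 0 or a negative errno value. */
static int translate_sense(const struct sense *s)
{
    switch (s->key) {
    case KEY_NO_SENSE:
        return 0;
    case KEY_NOT_READY:
        return -EBUSY;
    case KEY_DATA_PROTECT:
        return -EPERM;
    case KEY_ABORTED_COMMAND:
        return -ECANCELED;
    case KEY_ILLEGAL_REQUEST:
        break;              /* refine using the ASC/ASCQ code below */
    default:
        return -EIO;
    }

    switch (s->ascq) {
    case ASCQ_INVALID_FIELD_IN_CDB:
        return -EINVAL;
    case ASCQ_LBA_OUT_OF_RANGE:
        return -ENOSPC;
    case ASCQ_WRITE_PROTECTED:
        return -EPERM;
    default:
        return -EIO;
    }
}
```

Returning directly from each case avoids the unreachable break statements and the extra ret variable, which may be worth considering for a respin.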




Re: [Qemu-devel] [PATCH 0/2] Fix werror=enospc for qcow2 on iscsi

2015-10-22 Thread Paolo Bonzini


On 22/10/2015 10:17, Fam Zheng wrote:
> When qcow2 is created on iscsi target with a virtual size greater than
> physical capacity of the LUN, over time it's possible that guest fills too
> much data and at that point, new clusters in qcow2 will be allocated beyond
> the end of disk.
> 
> werror=enospc is useful for that purpose to allocate more data for the guest,
> except in this case, unlike a host file system, iscsi returns -EIO instead of
> -ENOSPC, which makes it hard to detect and report proper error.
> 
> Fix this by improving iscsi error handling code to return meaningful error
> codes (-ERANGE here), then further translate it to -ENOSPC in qcow2.

FWIW, Linux uses ENOSPC if it detects out of range LBAs:

if (iocb->ki_pos >= size)
return -ENOSPC;

so I think it's okay to convert LBA_OUT_OF_RANGE to ENOSPC directly and
avoid patch 2.

Paolo



Re: [Qemu-devel] [PATCH v6 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV)

2015-10-22 Thread Marcel Apfelbaum

On 10/22/2015 11:51 AM, Dotan Barak wrote:

I have minor comments, to use the new helper functions.


Hi Dotan,

I think it is a good idea; however, we can do it as a simple patch on top
if no other issues are found.

Thanks,
Marcel



Thanks
Dotan


-Original Message-
From: Knut Omang [mailto:knut.om...@oracle.com]
Sent: Thursday, October 22, 2015 11:02 AM
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini ; Richard Henderson
; Eduardo Habkost ; Michael S.
Tsirkin ; Alex Williamson ;
Marcel Apfelbaum ; Jan Kiszka ;
Gonglei (Arei) ; Dotan Barak
; Richard W.M. Jones ; Knut
Omang 
Subject: [PATCH v6 2/4] pcie: Add support for Single Root I/O Virtualization
(SR/IOV)

This patch provides the building blocks for creating an SR/IOV PCIe Extended
Capability header and register/unregister SR/IOV Virtual Functions.

Signed-off-by: Knut Omang 
---
  hw/pci/Makefile.objs|   2 +-
  hw/pci/pci.c|  95 +++
  hw/pci/pcie.c   |   2 +-
  hw/pci/pcie_sriov.c | 277

  include/hw/pci/pci.h|  11 +-
  include/hw/pci/pcie.h   |   6 +
  include/hw/pci/pcie_sriov.h |  67 +++
  include/qemu/typedefs.h |   2 +
  trace-events|   5 +
  9 files changed, 441 insertions(+), 26 deletions(-)
  create mode 100644 hw/pci/pcie_sriov.c
  create mode 100644 include/hw/pci/pcie_sriov.h

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index b095cfe..3a6cce3 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -771,6 +782,15 @@ static void pci_init_multifunction(PCIBus *bus,
PCIDevice *dev, Error **errp)
  dev->config[PCI_HEADER_TYPE] |=
PCI_HEADER_TYPE_MULTI_FUNCTION;
  }

+/* With SR/IOV and ARI, a device at function 0 need not be a multifunction
+ * device, as it may just be a VF that ended up with function 0 in
+ * the legacy PCI interpretation. Avoid failing in such cases:
+ */
+if (pci_is_vf(dev) &&
+dev->exp.sriov_vf.pf->cap_present & QEMU_PCI_CAP_MULTIFUNCTION)

Use pcie_sriov_get_pf()


{
+return;
+}
+
  /*
   * multifunction bit is interpreted in two ways as follows.
   *   - all functions must set the bit to 1.




@@ -1060,11 +1081,44 @@ pcibus_t pci_get_bar_addr(PCIDevice *pci_dev,
int region_num)
  return pci_dev->io_regions[region_num].addr;
  }

-static pcibus_t pci_bar_address(PCIDevice *d,
-   int reg, uint8_t type, pcibus_t size)
+
+static pcibus_t pci_config_get_bar_addr(PCIDevice *d, int reg,
+uint8_t type, pcibus_t size) {
+pcibus_t new_addr;
+if (!pci_is_vf(d)) {
+int bar = pci_bar(d, reg);
+if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+new_addr = pci_get_quad(d->config + bar);
+} else {
+new_addr = pci_get_long(d->config + bar);
+}
+} else {
+PCIDevice *pf = d->exp.sriov_vf.pf;

Use pcie_sriov_get_pf()



+uint16_t sriov_cap = pf->exp.sriov_cap;
+int bar = sriov_cap + PCI_SRIOV_BAR + reg * 4;
+uint16_t vf_offset = pci_get_word(pf->config + sriov_cap +
PCI_SRIOV_VF_OFFSET);
+uint16_t vf_stride = pci_get_word(pf->config + sriov_cap +
PCI_SRIOV_VF_STRIDE);
+uint32_t vf_num = (d->devfn - (pf->devfn + vf_offset)) /
+vf_stride;

Use pcie_sriov_vf_number()




b/hw/pci/pcie_sriov.c new file mode 100644 index 000..756bdde
--- /dev/null
+++ b/hw/pci/pcie_sriov.c
@@ -0,0 +1,277 @@
+/*
+ * pcie_sriov.c:
+ *
+ * Implementation of SR/IOV emulation support.
+ *
+ * Copyright (c) 2015 Knut Omang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/pci/pci.h"
+#include "hw/pci/pcie.h"
+#include "hw/pci/pci_bus.h"
+#include "qemu/error-report.h"
+#include "qemu/range.h"
+#include "trace.h"
+
+#define SRIOV_ID(dev) \
+(dev)->name, PCI_SLOT((dev)->devfn), PCI_FUNC((dev)->devfn)
+
+static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name,
+uint16_t vf_num);
+static void unregister_vfs(PCIDevice *dev);
+
+void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+const char *vfname, uint16_t vf_dev_id,
+uint16_t init_vfs, uint16_t total_vfs,
+uint16_t vf_offset, uint16_t vf_stride) {
+uint8_t *cfg = dev->config + offset;
+uint8_t *wmask;
+
+pcie_add_capability(dev, PCI_EXT_CAP_ID_SRIOV, 1,
+offset, PCI_EXT_CAP_SRIOV_SIZEOF);
+dev->exp.sriov_cap = offset;
+dev->exp.sriov_pf.num_vfs = 0;
+dev->exp.sriov_pf.vfname = g_strdup(vfname);
+

[Qemu-devel] [PATCH v4 0/] Begin to disentangle libxenctrl and provide some stable libraries

2015-10-22 Thread Ian Campbell
In <1431963008.4944.80.ca...@citrix.com> I proposed stabilising some
parts of the libxenctrl API/ABI by disaggregating into separate
libraries.

This is v4 of that set of series against:
xen
qemu-xen
qemu-xen-traditional
mini-os

NB: Samuel+minios-devel will only get the mini-os side and Stefano+qemu
-devel the qemu-xen side.

The code for all repos can be found in:

git://xenbits.xen.org/people/ianc/libxenctrl-split/xen.git  v4
git://xenbits.xen.org/people/ianc/libxenctrl-split/qemu-xen.git v4
git://xenbits.xen.org/people/ianc/libxenctrl-split/qemu-xen-traditional.git v4
git://xenbits.xen.org/people/ianc/libxenctrl-split/mini-os.git  v4

The tip of the xen.git branch contains an extra patch hacking Config.mk
to point to all the others above, which should get the correct things for
the HEAD of the branch, but not further back in time.

The new libraries here are:

 * libxentoollog: Common logging infrastructure
 * libxenevtchn: Userspace access to evtchns (via /dev/xen/evtchn etc)
 * libxengnttab: Userspace access to grant tables (via /dev/xen/gnt??? etc)
 * libxencall: Making hypercalls (i.e. the IOCTL_PRIVCMD_HYPERCALL type
   functionality)
 * libxenforeignmemory: Privileged mappings of foreign memory
   (IOCTL_PRIVCMD_MMAP et al)

The first three were actually pretty distinct within libxenctrl already and
have not changed in quite some time.

Although the other two are somewhat new they are based on top of long
standing stable ioctls, which gives me some confidence.

Nonetheless I would appreciate extra review of at least the interface
headers of all of these with a particular eye to the suitability of these
interfaces being maintained in an ABI (_B_, not _P_) stable way going
forward.

Still to come would be libraries for specific out of tree purposes
(device model, kexec), which would mean adding new libraries at the same
level as libxc I think, rather than underneath, i.e. also using the
libraries split out here, but hopefully not libxenctrl itself.

The new libraries use linker version-scripts so that future ABI changes
can hopefully be made in a compatible way.

Since last time I have:

 * Addressed various review comments:
    * Addressed feedback from Stefano on the qemu-xen series (and this
      version now goes to qemu-devel too)
    * Switched the foreign mapping interfaces to use size_t for the number
      of pages.
    * Fixed the callers of xenforeignmemory_unmap (should have been pages,
      but everywhere was passing bytes like the previous munmap case)
    * HACK patch in xen.git now updates Config.mk instead of .config

The whole thing has been build tested on Linux (incl stubdoms), and on
FreeBSD. I have runtime tested older versions on Linux but my test boxes
are currently in some netherworld having been moved to a different colo.

Neither NetBSD nor Solaris have been tested at all. It's certainly not
impossible that I've not got the #includes in the new files quite right.

http://xenbits.xen.org/people/ianc/libxenctrl-split/v4.html is the document
I've been using to try and track what I'm doing. It may not be all that
useful. The history of it is in the v4-with-doc branch of the xen.git
linked to above.

Ian.




[Qemu-devel] [PATCH v6 07/12] block: Add "drained begin/end" for transactional backup

2015-10-22 Thread Fam Zheng
This ensures the atomicity of the transaction by avoiding processing of
external requests such as those from ioeventfd.

Move the assignment to state->bs up right after bdrv_drained_begin, so
that we can use it in the clean callback. The abort callback will still
check bs->job and state->job, so it's OK.

Signed-off-by: Fam Zheng 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
---
 blockdev.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index e4a5eb4..0a7848b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1684,9 +1684,16 @@ static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
 return;
 }
 
+if (!blk_is_available(blk)) {
+error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, backup->device);
+return;
+}
+
 /* AioContext is released in .clean() */
 state->aio_context = blk_get_aio_context(blk);
 aio_context_acquire(state->aio_context);
+bdrv_drained_begin(blk_bs(blk));
+state->bs = blk_bs(blk);
 
 qmp_drive_backup(backup->device, backup->target,
  backup->has_format, backup->format,
@@ -1702,7 +1709,6 @@ static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
 return;
 }
 
-state->bs = blk_bs(blk);
 state->job = state->bs->job;
 }
 
@@ -1722,6 +1728,7 @@ static void drive_backup_clean(BlkTransactionState *common)
 DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
 
 if (state->aio_context) {
+bdrv_drained_end(state->bs);
 aio_context_release(state->aio_context);
 }
 }
-- 
2.4.3




[Qemu-devel] [PATCH v6 00/12] block: Protect nested event loop with bdrv_drained_begin and bdrv_drained_end

2015-10-22 Thread Fam Zheng
v6: Add Kevin's rev-by in patches 1-3, 6-8, 10, 12.
Add Jeff's rev-by in patches 1, 2, 6-8, 10.
04: Fix spelling and wording in comments. [Jeff]
Add assert at decrement. [Jeff]
05: Fix bad rebase. [Jeff]
09: Let blk_is_available come first. [Jeff, Kevin]
11: Rewrite bdrv_qed_drain. [Jeff]

v5: Rebase onto Kevin's block tree.

v4: Rebase onto master to fix the "bdrv_move_feature_fields" issue.

v3: Call bdrv_drain unconditionally in bdrv_drained_begin.
Document the internal I/O implications between bdrv_drained_begin and end.

The nested aio_poll() calls in the block layer have a bug: new r/w requests
from ioeventfds and nbd exports are processed during them, which might break
the caller's semantics (qmp_transaction) or even pointers (bdrv_reopen).

Fam Zheng (12):
  aio: Add "is_external" flag for event handlers
  nbd: Mark fd handlers client type as "external"
  dataplane: Mark host notifiers' client type as "external"
  aio: introduce aio_{disable,enable}_external
  block: Introduce "drained begin/end" API
  block: Add "drained begin/end" for transactional external snapshot
  block: Add "drained begin/end" for transactional backup
  block: Add "drained begin/end" for transactional blockdev-backup
  block: Add "drained begin/end" for internal snapshot
  block: Introduce BlockDriver.bdrv_drain callback
  qed: Implement .bdrv_drain
  tests: Add test case for aio_disable_external

 aio-posix.c |  9 -
 aio-win32.c |  8 +++-
 async.c |  3 +-
 block/curl.c| 14 ---
 block/io.c  | 23 +++-
 block/iscsi.c   |  9 ++---
 block/linux-aio.c   |  5 ++-
 block/nbd-client.c  | 10 +++--
 block/nfs.c | 17 -
 block/qed.c | 15 
 block/sheepdog.c| 38 ---
 block/ssh.c |  5 ++-
 block/win32-aio.c   |  5 ++-
 blockdev.c  | 38 ---
 hw/block/dataplane/virtio-blk.c |  5 ++-
 hw/scsi/virtio-scsi-dataplane.c | 22 +++
 include/block/aio.h | 40 
 include/block/block.h   | 24 
 include/block/block_int.h   |  8 
 iohandler.c |  3 +-
 nbd.c   |  4 +-
 tests/test-aio.c| 82 -
 22 files changed, 294 insertions(+), 93 deletions(-)

-- 
2.4.3
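
To make the idea behind the series concrete, here is a small sketch of a nestable "drained" section that suppresses handlers marked as external while it is active. This is only a model under assumptions: EventCtx, Handler, drained_begin, drained_end and dispatch are hypothetical names, and the real series additionally drains in-flight requests, which is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event context: counts how many drained sections are open. */
typedef struct {
    int external_disable_cnt;
} EventCtx;

/* Hypothetical handler: external ones come from guests or network clients. */
typedef struct {
    bool is_external;
    int fired;
} Handler;

/* Enter a drained section; sections may nest. */
static void drained_begin(EventCtx *ctx)
{
    ctx->external_disable_cnt++;
}

/* Leave a drained section; must pair with an earlier drained_begin(). */
static void drained_end(EventCtx *ctx)
{
    assert(ctx->external_disable_cnt > 0);
    ctx->external_disable_cnt--;
}

/* Run a handler unless it is external and a drained section is open. */
static bool dispatch(EventCtx *ctx, Handler *h)
{
    if (h->is_external && ctx->external_disable_cnt > 0) {
        return false;
    }
    h->fired++;
    return true;
}
```

A counter rather than a boolean is what lets a transaction open a drained section while an inner helper opens and closes its own, without re-enabling external requests too early.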




Re: [Qemu-devel] [PATCH COLO-Frame v9 09/32] COLO: Implement colo checkpoint protocol

2015-10-22 Thread zhanghailiang

On 2015/10/21 20:17, Eric Blake wrote:

On 09/02/2015 02:22 AM, zhanghailiang wrote:

We need a user-defined communication protocol to control the checkpoint
process.

A new checkpoint request is started by the Primary VM, and the interaction
proceeds as follows:
Checkpoint synchronizing points,

                       Primary                 Secondary
'checkpoint-request'   @ --------------------->
                                               Suspend (In hybrid mode)
'checkpoint-reply'     <---------------------- @
                       Suspend state
'vmstate-send'         @ --------------------->
                       Send state              Receive state
'vmstate-received'     <---------------------- @
                       Release packets         Load state
'vmstate-load'         <---------------------- @
                       Resume                  Resume (In hybrid mode)

                       Start Comparing (In hybrid mode)
NOTE:
  1) '@' marks who sends the message
  2) Every sync-point is synchronized by the two sides with only
     one handshake (single direction) for low latency.
     If stricter synchronization is required, an opposite-direction
     sync-point should be added.
  3) Since sync-points are single direction, the remote side may
     go forward a lot when this side just receives the sync-point.
  4) For now, we only support 'periodic' checkpoints, for which
     the Secondary VM is not running; later we will support 'hybrid' mode.

Signed-off-by: zhanghailiang 
Signed-off-by: Yang Hongyang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
---
  migration/colo.c | 192 ++-
  qapi-schema.json |  26 
  trace-events |   3 +-
  3 files changed, 218 insertions(+), 3 deletions(-)


Just a qapi review:



+++ b/qapi-schema.json
@@ -664,6 +664,32 @@
  '*tls-port': 'int', '*cert-subject': 'str' } }

  ##
+# @COLOCmd


Any reason this can't be COLOCommand?  We tend to avoid abbreviations in
the public interface, although arguably type names are not ABI.



No special reason, will rename it in next version. :)


+#
+# The colo command
+#
+# @invalid: unknown command
+#
+# @checkpoint-ready: SVM is ready for checkpointing
+#
+# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
+#
+# @checkpoint-reply: SVM gets PVM's checkpoint request
+#
+# @vmstate-send: VM's state will be sent by PVM.
+#
+# @vmstate-received: VM's state has been received by SVM
+#
+# @vmstate-loaded: VM's state has been loaded by SVM


7 documentation strings...


+#
+# Since: 2.5
+##
+{ 'enum': 'COLOCmd',
+  'data': [ 'invalid', 'checkpoint-ready', 'checkpoint-request',
+'checkpoint-reply', 'vmstate-send', 'vmstate-size',
+'vmstate-received', 'vmstate-loaded', 'guest-shutdown',
+'ram-steal'] }


...10 enum values.  Missing vmstate-size, guest-shutdown, ram-steal.



Yes, this is a mistake. These three values shouldn't be added in this patch,
since we don't refer to them here; they should appear in the later
corresponding patch. I will fix it in the next version.

Thanks,
zhanghailiang





Re: [Qemu-devel] DO_UPCAST confusion

2015-10-22 Thread Gerd Hoffmann
  Hi,

> > state = DO_UPCAST(InternalSnapshotState, common, common);
> 
> I much prefer the name container_of() (which is a bit more obvious that
> it is finding the container or derived type that embeds the parent
> type), but if we have to keep the ugly name, could we at least clean up
> the comment to make sense, and fix the name to be DO_DOWNCAST to match
> what it is actually doing?

We don't have to keep it.  DO_UPCAST is there for historical reasons,
was added with qdev.  It used to be used a lot more, and the move from
qdev to QOM already killed a lot of it.  But so far nobody went out
cleaning up the remaining places.

Feel free to go ahead and zap it.  But if you touch all the places
anyway please I'd much prefer a conversion to container_of() right away
instead of renaming it to something else.

cheers,
  Gerd
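
For anyone unfamiliar with the idiom being discussed, a generic container_of() can be sketched as below. The Common and BackupState structs are hypothetical examples, not the blockdev.c types, and the macro is the plain offsetof-based form without the type checking that production versions usually add.

```c
#include <stddef.h>

/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical "parent" type embedded in a larger state struct. */
typedef struct Common {
    int kind;
} Common;

/* Hypothetical derived type; the embedded member is deliberately not first. */
typedef struct BackupState {
    int job_id;
    Common common;
} BackupState;

/* Go from the embedded member back to the struct that contains it. */
static BackupState *backup_from_common(Common *c)
{
    return container_of(c, BackupState, common);
}
```

The name describes the direction of the conversion directly (member to container), which is the readability argument made above.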





[Qemu-devel] [PATCH v3 10/21] block: New option to define the intervals for collecting I/O statistics

2015-10-22 Thread Alberto Garcia
The BlockAcctStats structure contains a list of BlockAcctTimedStats.
Each one of these collects statistics about the minimum, maximum and
average latencies of all I/O operations in a certain interval of time.

This patch adds a new "stats-intervals" option that allows defining
these intervals.

Signed-off-by: Alberto Garcia 
---
 blockdev.c   | 37 +
 qapi/block-core.json |  4 
 2 files changed, 41 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 94635b5..c316c0a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -468,6 +468,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
 int bdrv_flags = 0;
 int on_read_error, on_write_error;
 bool account_invalid, account_failed;
+const char *stats_intervals;
 BlockBackend *blk;
 BlockDriverState *bs;
 ThrottleConfig cfg;
@@ -507,6 +508,8 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
 account_invalid = qemu_opt_get_bool(opts, "stats-account-invalid", true);
 account_failed = qemu_opt_get_bool(opts, "stats-account-failed", true);
 
+stats_intervals = qemu_opt_get(opts, "stats-intervals");
+
 extract_common_blockdev_options(opts, &bdrv_flags, &throttling_group, &cfg,
 &detect_zeroes, &error);
 if (error) {
@@ -605,6 +608,35 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
 }
 
 block_acct_init(blk_get_stats(blk), account_invalid, account_failed);
+
+if (stats_intervals) {
+char **intervals = g_strsplit(stats_intervals, ":", 0);
+unsigned i;
+
+if (*stats_intervals == '\0') {
+error_setg(&error, "stats-intervals can't have an empty value");
+}
+
+for (i = 0; !error && intervals[i] != NULL; i++) {
+unsigned long long val;
+if (parse_uint_full(intervals[i], &val, 10) == 0 &&
+val > 0 && val <= UINT_MAX) {
+block_acct_add_interval(blk_get_stats(blk), val);
+} else {
+error_setg(&error, "Invalid interval length: '%s'",
+   intervals[i]);
+}
+}
+
+g_strfreev(intervals);
+
+if (error) {
+error_propagate(errp, error);
+blk_unref(blk);
+blk = NULL;
+goto err_no_bs_opts;
+}
+}
 }
 
 blk_set_on_error(blk, on_read_error, on_write_error);
@@ -3535,6 +3567,11 @@ QemuOptsList qemu_common_drive_opts = {
 .type = QEMU_OPT_BOOL,
 .help = "whether to account for failed I/O operations "
 "in the statistics",
+},{
+.name = "stats-intervals",
+.type = QEMU_OPT_STRING,
+.help = "colon-separated list of intervals "
+"for collecting I/O statistics, in seconds",
 },
 { /* end of list */ }
 },
diff --git a/qapi/block-core.json b/qapi/block-core.json
index e32b523..2c1600b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1503,6 +1503,9 @@
 # @stats-account-failed: #optional whether to include failed
 # operations when computing latency and last
 # access statistics (default: true) (Since 2.5)
+# @stats-intervals: #optional colon-separated list of intervals for
+#   collecting I/O statistics, in seconds (default: none)
+#   (Since 2.5)
 # @detect-zeroes: #optional detect and optimize zero writes (Since 2.1)
 # (default: off)
 #
@@ -1520,6 +1523,7 @@
 '*read-only': 'bool',
 '*stats-account-invalid': 'bool',
 '*stats-account-failed': 'bool',
+'*stats-intervals': 'str',
 '*detect-zeroes': 'BlockdevDetectZeroesOptions' } }
 
 ##
-- 
2.6.1
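
The validation rules this patch applies to "stats-intervals" (non-empty value, colon-separated, each entry a positive decimal integer no larger than UINT_MAX) can be sketched without glib as follows. parse_intervals() is a hypothetical helper written for illustration; the patch itself uses g_strsplit() and parse_uint_full().

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a spec such as "10:60:3600" into out[].
 * Returns the number of intervals parsed, or -1 if the spec is empty,
 * contains an empty or non-numeric field, a zero, a value above UINT_MAX,
 * or more than max_out entries.
 */
static int parse_intervals(const char *spec, unsigned *out, int max_out)
{
    const char *p = spec;
    int n = 0;

    for (;;) {
        char *end;
        unsigned long long val;

        /* strtoull() would accept whitespace and signs, so reject early. */
        if (*p < '0' || *p > '9') {
            return -1;
        }

        errno = 0;
        val = strtoull(p, &end, 10);
        if (errno != 0 || val == 0 || val > UINT_MAX) {
            return -1;
        }
        if (n == max_out) {
            return -1;
        }
        out[n++] = (unsigned)val;

        if (*end == '\0') {
            return n;       /* consumed the whole string */
        }
        if (*end != ':') {
            return -1;      /* trailing garbage after the number */
        }
        p = end + 1;        /* move past the separator */
    }
}
```

With this helper, "10:60" yields two intervals, while "", "10:", "0" and "10:abc" are all rejected, which matches the error cases the patch reports.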




[Qemu-devel] [PATCH 2/2] qcow2: Translate -ERANGE to -ENOSPC

2015-10-22 Thread Fam Zheng
This will make the default werror (=enospc) work better when qcow2 is
created on top of iscsi or other block devices.

Signed-off-by: Fam Zheng 
---
 block/qcow2.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index bacc4f2..8edf0fe 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1586,6 +1586,12 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
  cur_nr_sectors, &hd_qiov);
 qemu_co_mutex_lock(&s->lock);
 if (ret < 0) {
+if (ret == -ERANGE) {
+/* Out of range access means we're already allocating clusters
+ * beyond end of disk, fix the error code to support
+ * werror=enospc. */
+ret = -ENOSPC;
+}
 goto fail;
 }
 
-- 
2.4.3




[Qemu-devel] [PATCH v3 13/21] iotests: Add test for the block device statistics

2015-10-22 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 tests/qemu-iotests/136 | 349 +
 tests/qemu-iotests/136.out |   5 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 355 insertions(+)
 create mode 100644 tests/qemu-iotests/136
 create mode 100644 tests/qemu-iotests/136.out

diff --git a/tests/qemu-iotests/136 b/tests/qemu-iotests/136
new file mode 100644
index 000..f574d83
--- /dev/null
+++ b/tests/qemu-iotests/136
@@ -0,0 +1,349 @@
+#!/usr/bin/env python
+#
+# Tests for block device statistics
+#
+# Copyright (C) 2015 Igalia, S.L.
+# Author: Alberto Garcia 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import iotests
+import os
+
+interval_length = 10
+nsec_per_sec = 1000000000
+op_latency = nsec_per_sec / 1000 # See qtest_latency_ns in accounting.c
+bad_sector = 8192
+bad_offset = bad_sector * 512
+blkdebug_file = os.path.join(iotests.test_dir, 'blkdebug.conf')
+
+class BlockDeviceStatsTestCase(iotests.QMPTestCase):
+test_img = "null-aio://"
+total_rd_bytes = 0
+total_rd_ops = 0
+total_wr_bytes = 0
+total_wr_ops = 0
+total_wr_merged = 0
+total_flush_ops = 0
+failed_rd_ops = 0
+failed_wr_ops = 0
+invalid_rd_ops = 0
+invalid_wr_ops = 0
+wr_highest_offset = 0
+account_invalid = False
+account_failed = False
+
+def blockstats(self, device):
+result = self.vm.qmp("query-blockstats")
+for r in result['return']:
+if r['device'] == device:
+return r['stats']
+raise Exception("Device not found for blockstats: %s" % device)
+
+def create_blkdebug_file(self):
+file = open(blkdebug_file, 'w')
+file.write('''
+[inject-error]
+event = "read_aio"
+errno = "5"
+sector = "%d"
+
+[inject-error]
+event = "write_aio"
+errno = "5"
+sector = "%d"
+''' % (bad_sector, bad_sector))
+file.close()
+
+def setUp(self):
+drive_args = []
+drive_args.append("stats-intervals=%d" % interval_length)
+drive_args.append("stats-account-invalid=%s" %
+  (self.account_invalid and "on" or "off"))
+drive_args.append("stats-account-failed=%s" %
+  (self.account_failed and "on" or "off"))
+self.create_blkdebug_file()
+self.vm = iotests.VM().add_drive('blkdebug:%s:%s ' %
+ (blkdebug_file, self.test_img),
+ ','.join(drive_args))
+self.vm.launch()
+# Set an initial value for the clock
+self.vm.qtest("clock_step %d" % nsec_per_sec)
+
+def tearDown(self):
+self.vm.shutdown()
+os.remove(blkdebug_file)
+
+def accounted_ops(self, read = False, write = False, flush = False):
+ops = 0
+if write:
+ops += self.total_wr_ops
+if self.account_failed:
+ops += self.failed_wr_ops
+if self.account_invalid:
+ops += self.invalid_wr_ops
+if read:
+ops += self.total_rd_ops
+if self.account_failed:
+ops += self.failed_rd_ops
+if self.account_invalid:
+ops += self.invalid_rd_ops
+if flush:
+ops += self.total_flush_ops
+return ops
+
+def accounted_latency(self, read = False, write = False, flush = False):
+latency = 0
+if write:
+latency += self.total_wr_ops * op_latency
+if self.account_failed:
+latency += self.failed_wr_ops * op_latency
+if read:
+latency += self.total_rd_ops * op_latency
+if self.account_failed:
+latency += self.failed_rd_ops * op_latency
+if flush:
+latency += self.total_flush_ops * op_latency
+return latency
+
+def check_values(self):
+stats = self.blockstats('drive0')
+
+# Check that the totals match with what we have calculated
+self.assertEqual(self.total_rd_bytes, stats['rd_bytes'])
+self.assertEqual(self.total_wr_bytes, stats['wr_bytes'])
+self.assertEqual(self.total_rd_ops, stats['rd_operations'])
+self.assertEqual(self.total_wr_ops, stats['wr_operations'])
+self.assertEqual(self.total_flush_ops, stats['flush_operations'])
+

Re: [Qemu-devel] [PATCH 1/2] iscsi: Translate scsi sense into error code

2015-10-22 Thread Paolo Bonzini


On 22/10/2015 10:31, Peter Lieven wrote:
> 
> +switch (sense->key) {
> +case SCSI_SENSE_NO_SENSE:
> +return 0;
> +break;
> +case SCSI_SENSE_NOT_READY:
> +return -EBUSY;
> +break;
> +case SCSI_SENSE_DATA_PROTECTION:
> +return -EACCES;

Probably EPERM, not EACCES.

> +break;
> +case SCSI_SENSE_COMMAND_ABORTED:
> +return -ECANCELED;
> +break;
> +case SCSI_SENSE_ILLEGAL_REQUEST:
> +/* Parse ASCQ */
> +break;
> +default:
> +return -EIO;
> +break;
> +}
> +switch (sense->ascq) {
> +case SCSI_SENSE_ASCQ_PARAMETER_LIST_LENGTH_ERROR:
> +case SCSI_SENSE_ASCQ_INVALID_OPERATION_CODE:
> +case SCSI_SENSE_ASCQ_INVALID_FIELD_IN_CDB:
> +case SCSI_SENSE_ASCQ_INVALID_FIELD_IN_PARAMETER_LIST:
> +ret = -EINVAL;
> +break;
> +case SCSI_SENSE_ASCQ_LBA_OUT_OF_RANGE:
> +ret = -ERANGE;
> +break;
> +case SCSI_SENSE_ASCQ_LOGICAL_UNIT_NOT_SUPPORTED:
> +ret = -ENOTSUP;
> +break;
> +case SCSI_SENSE_ASCQ_WRITE_PROTECTED:
> +ret = -EACCES;

Same here.

Paolo

> +break;
> +case SCSI_SENSE_ASCQ_MEDIUM_NOT_PRESENT:
> +case SCSI_SENSE_ASCQ_MEDIUM_NOT_PRESENT_TRAY_CLOSED:
> +case SCSI_SENSE_ASCQ_MEDIUM_NOT_PRESENT_TRAY_OPEN:
> +ret = -ENOMEDIUM;
> +break;
> +default:
> +ret = -EIO;
> +break;
> +}



[Qemu-devel] [RFC Patch 04/12] IXGBE: Add ixgbe_ping_vf() to notify a specified VF via mailbox msg.

2015-10-22 Thread Lan Tianyu
This patch adds ixgbe_ping_vf() to notify a specified VF. When the
migration status changes, the VF needs to be notified of the change.
The VF driver will check the migration status when it gets a mailbox msg.

Signed-off-by: Lan Tianyu 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 19 ---
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h |  1 +
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 89671eb..e247d67 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -1318,18 +1318,23 @@ void ixgbe_disable_tx_rx(struct ixgbe_adapter *adapter)
IXGBE_WRITE_REG(hw, IXGBE_VFRE(1), 0);
 }
 
-void ixgbe_ping_all_vfs(struct ixgbe_adapter *adapter)
+void ixgbe_ping_vf(struct ixgbe_adapter *adapter, int vfn)
 {
struct ixgbe_hw *hw = &adapter->hw;
u32 ping;
+
+   ping = IXGBE_PF_CONTROL_MSG;
+   if (adapter->vfinfo[vfn].clear_to_send)
+   ping |= IXGBE_VT_MSGTYPE_CTS;
+   ixgbe_write_mbx(hw, &ping, 1, vfn);
+}
+
+void ixgbe_ping_all_vfs(struct ixgbe_adapter *adapter)
+{
int i;
 
-   for (i = 0 ; i < adapter->num_vfs; i++) {
-   ping = IXGBE_PF_CONTROL_MSG;
-   if (adapter->vfinfo[i].clear_to_send)
-   ping |= IXGBE_VT_MSGTYPE_CTS;
-   ixgbe_write_mbx(hw, &ping, 1, i);
-   }
+   for (i = 0 ; i < adapter->num_vfs; i++)
+   ixgbe_ping_vf(adapter, i);
 }
 
 int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
index 2c197e6..143e2fd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
@@ -41,6 +41,7 @@ void ixgbe_msg_task(struct ixgbe_adapter *adapter);
 int ixgbe_vf_configuration(struct pci_dev *pdev, unsigned int event_mask);
 void ixgbe_disable_tx_rx(struct ixgbe_adapter *adapter);
 void ixgbe_ping_all_vfs(struct ixgbe_adapter *adapter);
+void ixgbe_ping_vf(struct ixgbe_adapter *adapter, int vfn);
 int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int queue, u8 *mac);
 int ixgbe_ndo_set_vf_vlan(struct net_device *netdev, int queue, u16 vlan,
   u8 qos);
-- 
1.8.4.rc0.1.g8f6a3e5.dirty




[Qemu-devel] [PATCH 1/5] slirp: closesocket must be called to close sockets on windows

2015-10-22 Thread Mark Pizzolato
Signed-off-by: Mark Pizzolato 
---
 slirp/slirp.c  | 2 +-
 slirp/socket.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/slirp/slirp.c b/slirp/slirp.c
index 35f819a..d18faa8 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -846,7 +846,7 @@ int slirp_remove_hostfwd(Slirp *slirp, int is_udp, struct 
in_addr host_addr,
 getsockname(so->s, (struct sockaddr *)&addr, &addr_len) == 0 &&
 addr.sin_addr.s_addr == host_addr.s_addr &&
 addr.sin_port == port) {
-close(so->s);
+closesocket(so->s);
 sofree(so);
 return 0;
 }
diff --git a/slirp/socket.c b/slirp/socket.c
index 37ac5cf..4a20e08 100644
--- a/slirp/socket.c
+++ b/slirp/socket.c
@@ -632,8 +632,9 @@ tcp_listen(Slirp *slirp, uint32_t haddr, u_int hport, 
uint32_t laddr,
(listen(s,1) < 0)) {
int tmperrno = errno; /* Don't clobber the real reason we 
failed */
 
-   close(s);
+   closesocket(s);
sofree(so);
+   fprintf (stderr, "Socket Error %d", tmperrno);
/* Restore the real errno */
 #ifdef _WIN32
WSASetLastError(tmperrno);
-- 
1.9.5.msysgit.0
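
The point of the patch, that a socket handle on Windows has to be released with closesocket() rather than close(), can be illustrated with a small portable wrapper. This is a sketch under assumptions: `sketch_socket_t` and `sketch_close_socket` are hypothetical names and are not part of slirp or QEMU.

```c
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET sketch_socket_t;     /* Winsock handles are not file descriptors */
#else
#include <sys/socket.h>
#include <unistd.h>
typedef int sketch_socket_t;        /* POSIX sockets are plain file descriptors */
#endif

/* Release a socket with the call that is correct for the host platform. */
static int sketch_close_socket(sketch_socket_t s)
{
#ifdef _WIN32
    return closesocket(s);
#else
    return close(s);
#endif
}
```

Routing every socket teardown through one helper like this keeps the platform distinction in a single place instead of at each call site.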





Re: [Qemu-devel] [PATCH] hw/arm/virt: Fix address in PCIe device tree node's unit name

2015-10-22 Thread Peter Maydell
On 21 October 2015 at 21:43, Alexander Gordeev  wrote:
> PCIe device tree unit name is pcie@1000 - which denotes
> IO space base address. However, the corresponding node's
> "reg" property points to PCI configuration space base address
> 0x3f00.
>
> Set the unit name to pcie@3f00 which is not only correct,
> but also conforms to Open Firmware (IEEE 1275).

Nothing should actually care about the address in the
nodename, though, right -- it's just for human readability
and debugging (and guests will be looking at the regs
etc properties of the node to figure out where it is)?
Or have I misunderstood this and there's an actual visible
consequence to this bug?

thanks
-- PMM



Re: [Qemu-devel] [PULL 10/10] cpu-exec: Add "nochain" debug flag

2015-10-22 Thread Edgar E. Iglesias
On Wed, Oct 21, 2015 at 11:42:59AM -1000, Richard Henderson wrote:
> Respect it to avoid linking TBs together.
> 

Reviewed-by: Edgar E. Iglesias 


> Reviewed-by: Peter Maydell 
> Signed-off-by: Richard Henderson 
> ---
>  cpu-exec.c | 3 ++-
>  include/qemu/log.h | 1 +
>  qemu-log.c | 3 +++
>  3 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 8fd56a6..7eef083 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -477,7 +477,8 @@ int cpu_exec(CPUState *cpu)
>  /* see if we can patch the calling TB. When the TB
> spans two pages, we cannot safely do a direct
> jump. */
> -if (next_tb != 0 && tb->page_addr[1] == -1) {
> +if (next_tb != 0 && tb->page_addr[1] == -1
> +&& !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>  tb_add_jump((TranslationBlock *)(next_tb & 
> ~TB_EXIT_MASK),
>  next_tb & TB_EXIT_MASK, tb);
>  }
> diff --git a/include/qemu/log.h b/include/qemu/log.h
> index f880e66..7de4500 100644
> --- a/include/qemu/log.h
> +++ b/include/qemu/log.h
> @@ -41,6 +41,7 @@ static inline bool qemu_log_enabled(void)
>  #define LOG_UNIMP  (1 << 10)
>  #define LOG_GUEST_ERROR(1 << 11)
>  #define CPU_LOG_MMU(1 << 12)
> +#define CPU_LOG_TB_NOCHAIN (1 << 13)
>  
>  /* Returns true if a bit is set in the current loglevel mask
>   */
> diff --git a/qemu-log.c b/qemu-log.c
> index 13f3813..efd07c8 100644
> --- a/qemu-log.c
> +++ b/qemu-log.c
> @@ -119,6 +119,9 @@ const QEMULogItem qemu_log_items[] = {
>  { LOG_GUEST_ERROR, "guest_errors",
>"log when the guest OS does something invalid (eg accessing a\n"
>"non-existent register)" },
> +{ CPU_LOG_TB_NOCHAIN, "nochain",
> +  "do not chain compiled TBs so that \"exec\" and \"cpu\" show\n"
> +  "complete traces" },
>  { 0, NULL, NULL },
>  };
>  
> -- 
> 2.4.3
> 
> 
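
The condition added in the cpu-exec.c hunk amounts to one more bit test before two translation blocks are linked. A minimal sketch of that decision follows; `sketch_loglevel`, `sketch_loglevel_mask` and `sketch_may_chain` are hypothetical stand-ins, and only the bit position (1 << 13) is taken from the quoted patch.

```c
#include <stdbool.h>

/* Same bit position as CPU_LOG_TB_NOCHAIN in the quoted patch. */
#define SKETCH_LOG_TB_NOCHAIN (1 << 13)

/* Stand-in for the global log mask. */
static int sketch_loglevel;

/* True if any bit of mask is set in the current log mask. */
static bool sketch_loglevel_mask(int mask)
{
    return (sketch_loglevel & mask) != 0;
}

/* Chain only if there is a previous TB, the new TB stays within one page,
 * and the "nochain" debug flag is not set. */
static bool sketch_may_chain(bool have_prev_tb, bool spans_two_pages)
{
    return have_prev_tb && !spans_two_pages
           && !sketch_loglevel_mask(SKETCH_LOG_TB_NOCHAIN);
}
```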



Re: [Qemu-devel] [RFC] transactions: add transaction-wide property

2015-10-22 Thread Stefan Hajnoczi
Thanks for summarizing the discussion!

If you are taking over Fam's series, please squash in your patches to
make review easier.

Maybe the names can be improved:

"allow-partial" is not self-explanatory.

"sync-cancel" is misleading since successful completion is affected too,
not just failure/cancel (jobs wait for each other before reporting
successful completion).

How about "transactional-completion" or "group-completion": "none"/"all"?

Stefan



[Qemu-devel] [PATCH v6 10/12] block: Introduce BlockDriver.bdrv_drain callback

2015-10-22 Thread Fam Zheng
Drivers can have internal request sources that generate IO, like the
need_check_timer in QED. Since we want quiesced periods that contain
nested event loops in block layer, we need to have a way to disable such
event sources.

Block drivers must implement the "bdrv_drain" callback if it has any
internal sources that can generate I/O activity, like a timer or a
worker thread (even in a library) that can schedule QEMUBH in an
asynchronous callback.

Update the comments of bdrv_drain and bdrv_drained_begin accordingly.

Signed-off-by: Fam Zheng 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
---
 block/io.c| 6 +-
 include/block/block.h | 9 +++--
 include/block/block_int.h | 6 ++
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index 5ac6256..999df63 100644
--- a/block/io.c
+++ b/block/io.c
@@ -235,7 +235,8 @@ bool bdrv_requests_pending(BlockDriverState *bs)
 }
 
 /*
- * Wait for pending requests to complete on a single BlockDriverState subtree
+ * Wait for pending requests to complete on a single BlockDriverState subtree,
+ * and suspend block driver's internal I/O until next request arrives.
  *
  * Note that unlike bdrv_drain_all(), the caller must hold the BlockDriverState
  * AioContext.
@@ -248,6 +249,9 @@ void bdrv_drain(BlockDriverState *bs)
 {
 bool busy = true;
 
+if (bs->drv && bs->drv->bdrv_drain) {
+bs->drv->bdrv_drain(bs);
+}
 while (busy) {
 /* Keep iterating */
  bdrv_flush_io_queue(bs);
diff --git a/include/block/block.h b/include/block/block.h
index 5d722a7..cf459d6 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -615,8 +615,13 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
  *
  * Begin a quiesced section for exclusive access to the BDS, by disabling
  * external request sources including NBD server and device model. Note that
- * this doesn't block timers or coroutines from submitting more requests, which
- * means block_job_pause is still necessary.
+ * this doesn't prevent timers or coroutines from submitting more requests,
+ * which means block_job_pause is still necessary.
+ *
+ * If new I/O requests are submitted after bdrv_drained_begin is called before
+ * bdrv_drained_end, more internal I/O might be going on after the request has
+ * been completed. If you don't want this, you have to issue another bdrv_drain
+ * or use a nested bdrv_drained_begin/end section.
  *
  * This function can be recursive.
  */
diff --git a/include/block/block_int.h b/include/block/block_int.h
index e317b14..73eba05 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -288,6 +288,12 @@ struct BlockDriver {
  */
 int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
 
+/**
+ * Drain and stop any internal sources of requests in the driver, and
+ * remain so until next I/O callback (e.g. bdrv_co_writev) is called.
+ */
+void (*bdrv_drain)(BlockDriverState *bs);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
2.4.3
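
As an illustration of the callback contract described in the commit message, a driver with an internal timer might stop that timer from its drain hook before the generic code waits for pending requests. The sketch below uses hypothetical `sketch_*` types and a counter in place of the real event loop; it is not QEMU's BlockDriver API.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_bds;

/* Hypothetical driver table with an optional drain hook. */
struct sketch_driver {
    void (*drain)(struct sketch_bds *bs);
};

/* Hypothetical per-device state. */
struct sketch_bds {
    const struct sketch_driver *drv;
    bool timer_armed;   /* internal request source, e.g. a periodic check */
    int pending;        /* requests still in flight */
};

/* Driver hook: stop the internal source so it cannot submit new I/O. */
static void sketch_timer_driver_drain(struct sketch_bds *bs)
{
    bs->timer_armed = false;
}

static const struct sketch_driver sketch_timer_driver = {
    .drain = sketch_timer_driver_drain,
};

/* Generic drain: quiesce the driver first, then wait for pending requests. */
static void sketch_drain(struct sketch_bds *bs)
{
    if (bs->drv && bs->drv->drain) {
        bs->drv->drain(bs);
    }
    while (bs->pending > 0) {
        bs->pending--;  /* stand-in for running the event loop once */
    }
}
```

Calling the hook before the wait loop mirrors the ordering in the block/io.c hunk: if the internal source were still active, the loop could keep finding new work.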




[Qemu-devel] [PATCH v6 04/12] aio: introduce aio_{disable, enable}_external

2015-10-22 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 aio-posix.c |  3 ++-
 aio-win32.c |  3 ++-
 include/block/aio.h | 38 ++
 3 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index f0f9122..0467f23 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -261,7 +261,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 /* fill pollfds */
 QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-if (!node->deleted && node->pfd.events) {
+if (!node->deleted && node->pfd.events
+&& aio_node_check(ctx, node->is_external)) {
 add_pollfd(node);
 }
 }
diff --git a/aio-win32.c b/aio-win32.c
index 3110d85..43c4c79 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -309,7 +309,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
 /* fill fd sets */
 count = 0;
 QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-if (!node->deleted && node->io_notify) {
+if (!node->deleted && node->io_notify
+&& aio_node_check(ctx, node->is_external)) {
 events[count++] = event_notifier_get_handle(node->e);
 }
 }
diff --git a/include/block/aio.h b/include/block/aio.h
index 12f1141..bcc7d43 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -122,6 +122,8 @@ struct AioContext {
 
 /* TimerLists for calling timers - one per clock type */
 QEMUTimerListGroup tlg;
+
+int external_disable_cnt;
 };
 
 /**
@@ -375,4 +377,40 @@ static inline void aio_timer_init(AioContext *ctx,
  */
 int64_t aio_compute_timeout(AioContext *ctx);
 
+/**
+ * aio_disable_external:
+ * @ctx: the aio context
+ *
+ * Disable the further processing of external clients.
+ */
+static inline void aio_disable_external(AioContext *ctx)
+{
+atomic_inc(&ctx->external_disable_cnt);
+}
+
+/**
+ * aio_enable_external:
+ * @ctx: the aio context
+ *
+ * Enable the processing of external clients.
+ */
+static inline void aio_enable_external(AioContext *ctx)
+{
+assert(ctx->external_disable_cnt > 0);
+atomic_dec(&ctx->external_disable_cnt);
+}
+
+/**
+ * aio_node_check:
+ * @ctx: the aio context
+ * @is_external: Whether or not the checked node is an external event source.
+ *
+ * Check if the node's is_external flag is okay to be polled by the ctx at this
+ * moment. True means green light.
+ */
+static inline bool aio_node_check(AioContext *ctx, bool is_external)
+{
+return !is_external || !atomic_read(&ctx->external_disable_cnt);
+}
+
 #endif
-- 
2.4.3
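
The disable/enable pair in this patch is a nesting counter: external handlers are polled only while the counter is zero. A standalone sketch using C11 atomics is shown below; the `sketch_*` names are hypothetical, and `<stdatomic.h>` is used here in place of QEMU's own atomic helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical context holding only the nesting counter. */
struct sketch_ctx {
    atomic_int external_disable_cnt;
};

/* Stop polling external clients; calls may nest. */
static void sketch_disable_external(struct sketch_ctx *ctx)
{
    atomic_fetch_add(&ctx->external_disable_cnt, 1);
}

/* Undo one sketch_disable_external() call. */
static void sketch_enable_external(struct sketch_ctx *ctx)
{
    int old = atomic_fetch_sub(&ctx->external_disable_cnt, 1);
    assert(old > 0);    /* every enable must pair with an earlier disable */
}

/* Internal nodes are always pollable; external ones only at count zero. */
static bool sketch_node_check(struct sketch_ctx *ctx, bool is_external)
{
    return !is_external || atomic_load(&ctx->external_disable_cnt) == 0;
}
```

A counter rather than a boolean lets nested quiesced sections overlap without the inner section re-enabling external clients too early.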




[Qemu-devel] [PATCH v6 03/12] dataplane: Mark host notifiers' client type as "external"

2015-10-22 Thread Fam Zheng
They will be excluded by type in the nested event loops in block layer,
so that unwanted events won't be processed there.

Signed-off-by: Fam Zheng 
Reviewed-by: Kevin Wolf 
---
 hw/block/dataplane/virtio-blk.c |  5 ++---
 hw/scsi/virtio-scsi-dataplane.c | 18 --
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index f8716bc..c42ddeb 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -283,7 +283,7 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
 
 /* Get this show started by hooking up our callbacks */
 aio_context_acquire(s->ctx);
-aio_set_event_notifier(s->ctx, &s->host_notifier, false,
+aio_set_event_notifier(s->ctx, &s->host_notifier, true,
handle_notify);
 aio_context_release(s->ctx);
 return;
@@ -320,8 +320,7 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
 aio_context_acquire(s->ctx);
 
 /* Stop notifications for new requests from guest */
-aio_set_event_notifier(s->ctx, &s->host_notifier, false,
-   NULL);
+aio_set_event_notifier(s->ctx, &s->host_notifier, true, NULL);
 
 /* Drain and switch bs back to the QEMU main loop */
 blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 21f51df..0d8d71e 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -60,8 +60,7 @@ static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s,
 r = g_new(VirtIOSCSIVring, 1);
 r->host_notifier = *virtio_queue_get_host_notifier(vq);
 r->guest_notifier = *virtio_queue_get_guest_notifier(vq);
-aio_set_event_notifier(s->ctx, &r->host_notifier, false,
-   handler);
+aio_set_event_notifier(s->ctx, &r->host_notifier, true, handler);
 
 r->parent = s;
 
@@ -72,8 +71,7 @@ static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s,
 return r;
 
 fail_vring:
-aio_set_event_notifier(s->ctx, &r->host_notifier, false,
-   NULL);
+aio_set_event_notifier(s->ctx, &r->host_notifier, true, NULL);
 k->set_host_notifier(qbus->parent, n, false);
 g_free(r);
 return NULL;
@@ -165,16 +163,16 @@ static void virtio_scsi_clear_aio(VirtIOSCSI *s)
 
 if (s->ctrl_vring) {
 aio_set_event_notifier(s->ctx, &s->ctrl_vring->host_notifier,
-   false, NULL);
+   true, NULL);
 }
 if (s->event_vring) {
 aio_set_event_notifier(s->ctx, &s->event_vring->host_notifier,
-   false, NULL);
+   true, NULL);
 }
 if (s->cmd_vrings) {
 for (i = 0; i < vs->conf.num_queues && s->cmd_vrings[i]; i++) {
 aio_set_event_notifier(s->ctx, &s->cmd_vrings[i]->host_notifier,
-   false, NULL);
+   true, NULL);
 }
 }
 }
@@ -296,12 +294,12 @@ void virtio_scsi_dataplane_stop(VirtIOSCSI *s)
 aio_context_acquire(s->ctx);
 
 aio_set_event_notifier(s->ctx, &s->ctrl_vring->host_notifier,
-   false, NULL);
+   true, NULL);
 aio_set_event_notifier(s->ctx, &s->event_vring->host_notifier,
-   false, NULL);
+   true, NULL);
 for (i = 0; i < vs->conf.num_queues; i++) {
 aio_set_event_notifier(s->ctx, &s->cmd_vrings[i]->host_notifier,
-   false, NULL);
+   true, NULL);
 }
 
 blk_drain_all(); /* ensure there are no in-flight requests */
-- 
2.4.3




[Qemu-devel] PING: [PATCH] Add mp-affinity property for ARM CPU class

2015-10-22 Thread Pavel Fedin
 Hello!

> Nothing wrong with this patch, but I'd rather add it as part of
> the series which actually uses it. (I see you have it in one of
> your GICv3 patchsets.)

 Just a small PING, Shlomo asked for this in 
http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg04884.html
 I told him that he can also include it in his GIC-500 software emulation 
patchset, but finally it's up to you.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia





Re: [Qemu-devel] [PATCH 2/6] e1000: Trivial implementation of various MAC registers

2015-10-22 Thread Jason Wang


On 10/21/2015 05:13 PM, Leonid Bloch wrote:
> Hi Jason, thanks for the review!
>
> On Tue, Oct 20, 2015 at 8:40 AM, Jason Wang  wrote:
>>
>>
>> On 10/18/2015 03:53 PM, Leonid Bloch wrote:
>>> These registers appear in Intel's specs, but were not implemented.
>>> These registers are now implemented trivially, i.e. they are initiated
>>> with zero values, and if they are RW, they can be written or read by the
>>> driver, or read only if they are R (essentially retaining their zero
>>> values). For these registers no other procedures are performed.
>>>
>>> The registers implemented here are:
>>>
>>> Transmit:
>>> RW: AIT
>>>
>>> Management:
>>> RW: WUC WUS IPAV*   IP6AT*  IP4AT*  FFLT*   WUPM*   FFMT*   FFVT*
>> My version of DSM (Revision) said WUS is read only.
> This seems to be a typo in the specs. We also have the specs where it
> is said that WUS's read only, but exactly two lines below it - writing
> to it is mentioned. Additionally, in the specs for newer Intel's
> devices, where the offset and the functionality of WUS are exactly the
> same, it is written that WUS is RW.

Ok.

>>> Diagnostic:
>>> RW: RDFH    RDFT    RDFHS   RDFTS   RDFPC   PBM*
>> For those diagnostic register, isn't it better to warn the incomplete
>> emulation instead of trying to give all zero values silently? I suspect
>> this can make diagnostic software think the device is malfunction?
> That's a good point. What do you think about keeping the zero values,
> but printing out a warning (via DBGOUT) on each read/write attempt?

This is fine for me.

>>> Statistic:
>>> RW: FCRUC   TDFH    TDFT    TDFHS   TDFTS   TDFPC
>> TDFH    TDFT    TDFHS   TDFTS   TDFPC should be Diagnostic?
> Yes, they are. Thanks, I'll reword.
>>> R:  RNBC    TSCTFC  MGTPRC  MGTPDC  MGTPTC  RFC RJC SCC ECOL
>>> LATECOL MCC COLC    DC  TNCRS   SEC CEXTERR RLEC    XONRXC
>>> XONTXC  XOFFRXC XOFFTXC
>>>
>>> Signed-off-by: Leonid Bloch 
>>> Signed-off-by: Dmitry Fleytman 
>>> ---
>>>  hw/net/e1000.c  | 52 
>>> +---
>>>  hw/net/e1000_regs.h |  6 ++
>>>  2 files changed, 55 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>>> index 6d57663..6f754ac 100644
>>> --- a/hw/net/e1000.c
>>> +++ b/hw/net/e1000.c
>>> @@ -168,7 +168,17 @@ enum {
>>>  defreg(TPR), defreg(TPT), defreg(TXDCTL),  defreg(WUFC),
>>>  defreg(RA),  defreg(MTA), defreg(CRCERRS), defreg(VFTA),
>>>  defreg(VET), defreg(RDTR),defreg(RADV),defreg(TADV),
>>> -defreg(ITR),
>>> +defreg(ITR), defreg(FCRUC),   defreg(TDFH),defreg(TDFT),
>>> +defreg(TDFHS),   defreg(TDFTS),   defreg(TDFPC),   defreg(RDFH),
>>> +defreg(RDFT),defreg(RDFHS),   defreg(RDFTS),   defreg(RDFPC),
>>> +defreg(IPAV),defreg(WUC), defreg(WUS), defreg(AIT),
>>> +defreg(IP6AT),   defreg(IP4AT),   defreg(FFLT),defreg(FFMT),
>>> +defreg(FFVT),defreg(WUPM),defreg(PBM), defreg(SCC),
>>> +defreg(ECOL),defreg(MCC), defreg(LATECOL), defreg(COLC),
>>> +defreg(DC),  defreg(TNCRS),   defreg(SEC), defreg(CEXTERR),
>>> +defreg(RLEC),defreg(XONRXC),  defreg(XONTXC),  defreg(XOFFRXC),
>>> +defreg(XOFFTXC), defreg(RFC), defreg(RJC), defreg(RNBC),
>>> +defreg(TSCTFC),  defreg(MGTPRC),  defreg(MGTPDC),  defreg(MGTPTC)
>>>  };
>>>
>>>  static void
>>> @@ -1114,6 +1124,18 @@ mac_readreg(E1000State *s, int index)
>>>  }
>>>
>>>  static uint32_t
>>> +mac_low11_read(E1000State *s, int index)
>>> +{
>>> +return s->mac_reg[index] & 0x7ff;
>>> +}
>>> +
>>> +static uint32_t
>>> +mac_low13_read(E1000State *s, int index)
>>> +{
>>> +return s->mac_reg[index] & 0x1fff;
>>> +}
>>> +
>>> +static uint32_t
>>>  mac_icr_read(E1000State *s, int index)
>>>  {
>>>  uint32_t ret = s->mac_reg[ICR];
>>> @@ -1215,16 +1237,31 @@ static uint32_t (*macreg_readops[])(E1000State *, 
>>> int) = {
>>>  getreg(RDH),  getreg(RDT),  getreg(VET),  getreg(ICS),
>>>  getreg(TDBAL),getreg(TDBAH),getreg(RDBAH),getreg(RDBAL),
>>>  getreg(TDLEN),getreg(RDLEN),getreg(RDTR), getreg(RADV),
>>> -getreg(TADV), getreg(ITR),
>>> +getreg(TADV), getreg(ITR),  getreg(FCRUC),getreg(IPAV),
>>> +getreg(WUC),  getreg(WUS),  getreg(AIT),  getreg(SCC),
>> For AIT should we use low16_read() ?
> Contrary to registers where lowXX_read() is used, for AIT the specs
> say that the higher bits should be written with 0b, and not "read as
> 0b". That's my reasoning for that. What do you think?

I think it's better to test this behavior on real card.

>> And low4_read() for FFMT?
> Why? The specs say nothing about the reserved bits there...

If I read the spec (Revision 3.7 13.6.10) correctly, only low 4 bits
were used for byte mask.

[...]
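
The lowXX_read() helpers under discussion all follow one pattern: return the stored register value with the reserved upper bits masked off. A generic sketch is given below; `sketch_low_bits_read` is a hypothetical helper, whereas the patch itself defines separate fixed-width functions such as mac_low11_read() and mac_low13_read().

```c
#include <stdint.h>

/* Return only the low nbits of value; the reserved upper bits read as zero.
 * Widths of 32 or more return the value unchanged. */
static uint32_t sketch_low_bits_read(uint32_t value, unsigned nbits)
{
    if (nbits >= 32) {
        return value;
    }
    return value & ((UINT32_C(1) << nbits) - 1);
}
```

With this shape, the low4_read() suggested for FFMT would correspond to a width of 4, and a low16_read() for AIT to a width of 16, should testing on real hardware confirm that the upper bits read as zero.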



Re: [Qemu-devel] [PATCH 4/6] e1000: Fixing the received/transmitted octets' counters

2015-10-22 Thread Jason Wang


On 10/21/2015 08:20 PM, Leonid Bloch wrote:
> On Tue, Oct 20, 2015 at 9:16 AM, Jason Wang  wrote:
>> >
>> >
>> > On 10/18/2015 03:53 PM, Leonid Bloch wrote:
>>> >> Previously, the lower parts of these counters (TORL, TOTL) were
>>> >> resetting after reaching their maximal values, and since the continuation
>>> >> of counting in the higher parts (TORH, TOTH) was triggered by an
>>> >> overflow event of the lower parts, the count was not correct.
>>> >>
>>> >> Additionally, TORH and TOTH were counting the corresponding frames, and
>>> >> not the octets, as they supposed to do.
>>> >>
>>> >> Additionally, these 64-bit registers did not stick at their maximal
>>> >> values when (and if) they reached them.
>>> >>
>>> >> This fix resolves all the issues mentioned above, and makes the octet
>>> >> counters behave according to Intel's specs.
>>> >>
>>> >> Signed-off-by: Leonid Bloch 
>>> >> Signed-off-by: Dmitry Fleytman 
>>> >> ---
>>> >>  hw/net/e1000.c | 34 ++
>>> >>  1 file changed, 26 insertions(+), 8 deletions(-)
>>> >>
>>> >> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>>> >> index 5530285..7f977b6 100644
>>> >> --- a/hw/net/e1000.c
>>> >> +++ b/hw/net/e1000.c
>>> >> @@ -583,6 +583,28 @@ inc_reg_if_not_full(E1000State *s, int index)
>>> >>  }
>>> >>  }
>>> >>
>>> >> +static void
>>> >> +grow_8reg_if_not_full(E1000State *s, int index, int size)
>>> >> +{
>>> >> +uint32_t lo = s->mac_reg[index];
>>> >> +uint32_t hi = s->mac_reg[index+1];
>>> >> +
>>> >> +if (lo == 0xffffffff) {
>>> >> +if ((hi += size) > s->mac_reg[index+1]) {
>>> >> +s->mac_reg[index+1] = hi;
>>> >> +} else if (s->mac_reg[index+1] != 0xffffffff) {
>>> >> +s->mac_reg[index+1] = 0xffffffff;
>>> >> +}
>>> >> +} else {
>>> >> +if (((lo += size) < s->mac_reg[index])
>>> >> +&& (s->mac_reg[index] = 0xffffffff)) {  /* setting low to 
>>> >> full */
>>> >> +s->mac_reg[index+1] += ++lo;
>>> >> +} else {
>>> >> +s->mac_reg[index] = lo;
>>> >> +}
>>> >> +}
>>> >> +}
>> >
>> > How about something easier:
>> >
>> > uint64_t sum = s->mac_reg[index] | (uint64_t)s->mac_reg[index+1] <<32;
>> > if (sum + size < sum) {
>> > sum = 0x;
>> > } else {
>> > sum += size;
>> > }
>> > s->max_reg[index] = sum;
>> > s->max_reg[index+1] = sum >> 32;
> Yes, that is better! Few small changes:
>
> uint64_t sum = s->mac_reg[index] | (uint64_t)s->mac_reg[index+1] << 32;
>
> if (sum + size < sum) {
> sum = ~0;
> } else {
> sum += size;
> }
> s->mac_reg[index] = sum;
> s->mac_reg[index+1] = sum >> 32;
>
>> >

Looks good to me.
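
For reference, the agreed approach (combine the low and high halves into one 64-bit value, add with saturation, split again) can be written as a standalone function. This sketch takes a plain register array instead of E1000State and an unsigned size; both are assumptions made only so that the example compiles by itself.

```c
#include <stdint.h>

/* Add size to the 64-bit counter stored as regs[index] (low half) and
 * regs[index + 1] (high half), sticking at the maximum value on overflow. */
static void sketch_grow_8reg_if_not_full(uint32_t *regs, int index,
                                         uint32_t size)
{
    uint64_t sum = regs[index] | (uint64_t)regs[index + 1] << 32;

    if (sum + size < sum) {
        sum = UINT64_MAX;   /* unsigned wrap-around detected: saturate */
    } else {
        sum += size;
    }
    regs[index] = (uint32_t)sum;
    regs[index + 1] = (uint32_t)(sum >> 32);
}
```

The overflow test relies on unsigned arithmetic wrapping modulo 2^64, so `sum + size < sum` is true exactly when the addition does not fit.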



[Qemu-devel] [PATCH v3 15/21] virtio-blk: Account for failed and invalid operations

2015-10-22 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 hw/block/virtio-blk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 8beb26b..aea9728 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -76,7 +76,7 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, 
int error,
 s->rq = req;
 } else if (action == BLOCK_ERROR_ACTION_REPORT) {
 virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
-block_acct_done(blk_get_stats(s->blk), &req->acct);
+block_acct_failed(blk_get_stats(s->blk), &req->acct);
 virtio_blk_free_request(req);
 }
 
@@ -536,6 +536,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, 
MultiReqBuffer *mrb)
 if (!virtio_blk_sect_range_ok(req->dev, req->sector_num,
   req->qiov.size)) {
 virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
+block_acct_invalid(blk_get_stats(req->dev->blk),
+   is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
 virtio_blk_free_request(req);
 return;
 }
-- 
2.6.1




[Qemu-devel] [PATCH v3 11/21] qemu-io: Account for failed, invalid and flush operations

2015-10-22 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 qemu-io-cmds.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 6e5d1e4..0cac623 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1364,6 +1364,7 @@ static void aio_write_done(void *opaque, int ret)
 
 if (ret < 0) {
 printf("aio_write failed: %s\n", strerror(-ret));
+block_acct_failed(blk_get_stats(ctx->blk), &ctx->acct);
 goto out;
 }
 
@@ -1392,6 +1393,7 @@ static void aio_read_done(void *opaque, int ret)
 
 if (ret < 0) {
 printf("readv failed: %s\n", strerror(-ret));
+block_acct_failed(blk_get_stats(ctx->blk), &ctx->acct);
 goto out;
 }
 
@@ -1505,6 +1507,7 @@ static int aio_read_f(BlockBackend *blk, int argc, char 
**argv)
 if (ctx->offset & 0x1ff) {
 printf("offset %" PRId64 " is not sector aligned\n",
ctx->offset);
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ);
 g_free(ctx);
 return 0;
 }
@@ -1512,6 +1515,7 @@ static int aio_read_f(BlockBackend *blk, int argc, char 
**argv)
 nr_iov = argc - optind;
 ctx->buf = create_iovec(blk, &ctx->qiov, &argv[optind], nr_iov, 0xab);
 if (ctx->buf == NULL) {
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ);
 g_free(ctx);
 return 0;
 }
@@ -1600,6 +1604,7 @@ static int aio_write_f(BlockBackend *blk, int argc, char 
**argv)
 if (ctx->offset & 0x1ff) {
 printf("offset %" PRId64 " is not sector aligned\n",
ctx->offset);
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE);
 g_free(ctx);
 return 0;
 }
@@ -1607,6 +1612,7 @@ static int aio_write_f(BlockBackend *blk, int argc, char 
**argv)
 nr_iov = argc - optind;
 ctx->buf = create_iovec(blk, &ctx->qiov, &argv[optind], nr_iov, pattern);
 if (ctx->buf == NULL) {
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE);
 g_free(ctx);
 return 0;
 }
@@ -1621,7 +1627,10 @@ static int aio_write_f(BlockBackend *blk, int argc, char 
**argv)
 
 static int aio_flush_f(BlockBackend *blk, int argc, char **argv)
 {
+BlockAcctCookie cookie;
+block_acct_start(blk_get_stats(blk), &cookie, 0, BLOCK_ACCT_FLUSH);
 blk_drain_all();
+block_acct_done(blk_get_stats(blk), &cookie);
 return 0;
 }
 
-- 
2.6.1




Re: [Qemu-devel] [PATCH v6 0/4] pcie: Add support for Single Root I/O Virtualization

2015-10-22 Thread Knut Omang
Michael,

I just realized that this now went out without 

Reviewed-by: Marcel Apfelbaum 

to patches 2-4, 

Sorry about that - can you add it for me?

Thanks,
Knut

On Thu, 2015-10-22 at 10:01 +0200, Knut Omang wrote:
> This patch set implements generic support for SR/IOV as an extension
> to the
> core PCIe functionality, similar to the way other capabilities such
> as AER
> is implemented.
> 
> There is no implementation of any device that provides
> SR/IOV support included, but I have implemented a test
> example which can be found together with this patch set here:
> 
>   git://github.com/knuto/qemu.git sriov_patches_v6
> 
> Testing with the example device was documented here:
> 
>   
> http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg05110.html
> 
> Changes since v5:
>   - Fix reset logic that got broken in v5. Reset logic is now equal
> to
> that of v4 except that two ambiguous initialization statements
> (introduced during rebase) have been removed
>   - From private feedback, added observer functions for SR/IOV values
> in pcie_sriov.h. To ease access to the vf number, the SR/IOV VF
> device
> struct extension now caches this value.
> 
> Changes since v4:
>   - Mostly based on feedback in Marcel Apfelbaum's review:
>   - The patch with changes to pci_regs.h got eliminated by rebase
>   - Added some documentation as an additional patch
>   - Some trivial fixes moved to separate patch
>   - Modified code to use error and trace functions instead of printfs
> 
> Changes since v3:
>   - Reworked 'pci: Update pci_regs header' to merge kernel version
> improvements
> with the current qemu version instead of copying from the kernel
> version.
> 
> Changes since v2:
>   - Rebased onto 090d0bfd
>   - Un-qdev'ified - avoids issues when resetting NUM_VFS
>   - Fixed handling of vf_offset/vf_stride
> 
> Changes since v1:
>   - Rebased on top of latest master, eliminating prereqs.
>   - Implement proper support for VF_STRIDE, VF_OFFSET and SUP_PGSIZE
> Time better spent fixing it than explaining what the previous
> limitations were.
> - Added new first patch to fix pci bug related to this
>   - Split out patch to pci_default_config_write to a separate patch 2
> to highlight bug fix.
>   - Refactored out logic into new source files
> hw/pci/pcie_sriov.c include/hw/pci/pcie_sriov.h
> similar to pcie_aer.c/h.
>   - Rename functions and introduce structs to better separate
> pf and vf functionality.
>   - Replaced is_vf member with pci_is_vf() function abstraction
>   - Fix numerous syntax, whitespace and comment issues
> according to Michael's review.
>   - Fix memory leaks.
>   - Removed igb example device - a rebased version available
> on github instead.
> 
> Knut Omang (4):
>   pci: Make use of the devfn property when registering new devices
>   pcie: Add support for Single Root I/O Virtualization (SR/IOV)
>   pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
>   pcie: A few minor fixes (type+code simplify)
> 
>  docs/pcie_sriov.txt | 115 ++
>  hw/pci/Makefile.objs|   2 +-
>  hw/pci/pci.c|  97 
>  hw/pci/pcie.c   |   9 +-
>  hw/pci/pcie_sriov.c | 277
> 
>  include/hw/pci/pci.h|  11 +-
>  include/hw/pci/pcie.h   |   6 +
>  include/hw/pci/pcie_sriov.h |  67 +++
>  include/qemu/typedefs.h |   2 +
>  trace-events|   5 +
>  10 files changed, 561 insertions(+), 30 deletions(-)
>  create mode 100644 docs/pcie_sriov.txt
>  create mode 100644 hw/pci/pcie_sriov.c
>  create mode 100644 include/hw/pci/pcie_sriov.h
> 
> --
> 2.4.3



Re: [Qemu-devel] [PATCH] throttle: Remove throttle_group_lock/unlock()

2015-10-22 Thread Kevin Wolf
Am 21.10.2015 um 20:36 hat Alberto Garcia geschrieben:
> The group throttling code was always meant to handle its locking
> internally. However, bdrv_swap() was touching the ThrottleGroup
> structure directly and therefore needed an API for that.
> 
> Now that bdrv_swap() no longer exists there's no need for the
> throttle_group_lock() API anymore.
> 
> Signed-off-by: Alberto Garcia 

Thanks, applied to the block branch.

Kevin



[Qemu-devel] [PATCH qemu 1/2] ppc: Add mmu_model defines for arch 2.03 and 2.07

2015-10-22 Thread Alexey Kardashevskiy
From: Benjamin Herrenschmidt 

This removes unused POWERPC_MMU_2_06a/POWERPC_MMU_2_06d.

This replaces POWERPC_MMU_64B with POWERPC_MMU_2_03 for POWER5+ to be
more explicit about the version of the PowerISA supported.

This defines POWERPC_MMU_2_07 and uses it for the POWER8 CPU family.
This will not have an immediate effect now but it will in the following
patch.

This should cause no behavioural change.

Signed-off-by: Benjamin Herrenschmidt 
[aik: rebased, changed commit log]
Signed-off-by: Alexey Kardashevskiy 
---
 target-ppc/cpu.h| 10 +-
 target-ppc/kvm.c|  8 +---
 target-ppc/mmu_helper.c | 16 
 target-ppc/translate.c  |  4 ++--
 target-ppc/translate_init.c |  4 ++--
 5 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 98ce5a7..69d8cf6 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -117,14 +117,14 @@ enum powerpc_mmu_t {
 #define POWERPC_MMU_AMR  0x0004
 /* 64 bits PowerPC MMU */
 POWERPC_MMU_64B= POWERPC_MMU_64 | 0x0001,
+/* Architecture 2.03 and later (has LPCR) */
+POWERPC_MMU_2_03   = POWERPC_MMU_64 | 0x0002,
 /* Architecture 2.06 variant   */
 POWERPC_MMU_2_06   = POWERPC_MMU_64 | POWERPC_MMU_1TSEG
  | POWERPC_MMU_AMR | 0x0003,
-/* Architecture 2.06 "degraded" (no 1T segments)   */
-POWERPC_MMU_2_06a  = POWERPC_MMU_64 | POWERPC_MMU_AMR
- | 0x0003,
-/* Architecture 2.06 "degraded" (no 1T segments or AMR)*/
-POWERPC_MMU_2_06d  = POWERPC_MMU_64 | 0x0003,
+/* Architecture 2.07 variant   */
+POWERPC_MMU_2_07   = POWERPC_MMU_64 | POWERPC_MMU_1TSEG
+ | POWERPC_MMU_AMR | 0x0004,
 #endif /* defined(TARGET_PPC64) */
 };
 
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 38aa927..7671ae7 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -259,7 +259,8 @@ static void kvm_get_fallback_smmu_info(PowerPCCPU *cpu,
 info->flags |= KVM_PPC_1T_SEGMENTS;
 }
 
-if (env->mmu_model == POWERPC_MMU_2_06) {
+if (env->mmu_model == POWERPC_MMU_2_06 ||
+env->mmu_model == POWERPC_MMU_2_07) {
 info->slb_size = 32;
 } else {
 info->slb_size = 64;
@@ -272,8 +273,9 @@ static void kvm_get_fallback_smmu_info(PowerPCCPU *cpu,
 info->sps[i].enc[0].pte_enc = 0;
 i++;
 
-/* 64K on MMU 2.06 */
-if (env->mmu_model == POWERPC_MMU_2_06) {
+/* 64K on MMU 2.06 and later */
+if (env->mmu_model == POWERPC_MMU_2_06 ||
+env->mmu_model == POWERPC_MMU_2_07) {
 info->sps[i].page_shift = 16;
 info->sps[i].slb_enc = 0x110;
 info->sps[i].enc[0].page_shift = 16;
diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c
index 527c6ad..e52d0e5 100644
--- a/target-ppc/mmu_helper.c
+++ b/target-ppc/mmu_helper.c
@@ -1293,9 +1293,9 @@ void dump_mmu(FILE *f, fprintf_function cpu_fprintf, 
CPUPPCState *env)
 break;
 #if defined(TARGET_PPC64)
 case POWERPC_MMU_64B:
+case POWERPC_MMU_2_03:
 case POWERPC_MMU_2_06:
-case POWERPC_MMU_2_06a:
-case POWERPC_MMU_2_06d:
+case POWERPC_MMU_2_07:
 dump_slb(f, cpu_fprintf, env);
 break;
 #endif
@@ -1433,9 +1433,9 @@ hwaddr ppc_cpu_get_phys_page_debug(CPUState *cs, vaddr 
addr)
 switch (env->mmu_model) {
 #if defined(TARGET_PPC64)
 case POWERPC_MMU_64B:
+case POWERPC_MMU_2_03:
 case POWERPC_MMU_2_06:
-case POWERPC_MMU_2_06a:
-case POWERPC_MMU_2_06d:
+case POWERPC_MMU_2_07:
 return ppc_hash64_get_phys_page_debug(env, addr);
 #endif
 
@@ -1937,9 +1937,9 @@ void ppc_tlb_invalidate_all(CPUPPCState *env)
 case POWERPC_MMU_601:
 #if defined(TARGET_PPC64)
 case POWERPC_MMU_64B:
+case POWERPC_MMU_2_03:
 case POWERPC_MMU_2_06:
-case POWERPC_MMU_2_06a:
-case POWERPC_MMU_2_06d:
+case POWERPC_MMU_2_07:
 #endif /* defined(TARGET_PPC64) */
 tlb_flush(CPU(cpu), 1);
 break;
@@ -2011,9 +2011,9 @@ void ppc_tlb_invalidate_one(CPUPPCState *env, 
target_ulong addr)
 break;
 #if defined(TARGET_PPC64)
 case POWERPC_MMU_64B:
+case POWERPC_MMU_2_03:
 case POWERPC_MMU_2_06:
-case POWERPC_MMU_2_06a:
-case POWERPC_MMU_2_06d:
+case POWERPC_MMU_2_07:
 /* tlbie invalidate TLBs for all segments */
 /* XXX: given the fact that there are too many segments to invalidate,
  *  and we still don't have a tlb_flush_mask(env, n, mask) in QEMU,
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index c2bc1a7..453509a 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -11327,9 +11327,9 

[Qemu-devel] [PATCH v6 4/4] pcie: A few minor fixes (type+code simplify)

2015-10-22 Thread Knut Omang
- Fix comment typo in pcie_cap_slot_write_config
- Simplify code in pcie_cap_slot_hot_unplug_request_cb.

Signed-off-by: Knut Omang 
---
 hw/pci/pcie.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 774b9ed..ba49c0f 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -265,10 +265,11 @@ void pcie_cap_slot_hot_unplug_request_cb(HotplugHandler 
*hotplug_dev,
  DeviceState *dev, Error **errp)
 {
 uint8_t *exp_cap;
+PCIDevice *pdev = PCI_DEVICE(hotplug_dev);
 
-pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, &exp_cap, errp);
+pcie_cap_slot_hotplug_common(pdev, dev, &exp_cap, errp);
 
-pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
+pcie_cap_slot_push_attention_button(pdev);
 }
 
 /* pci express slot for pci express root/downstream port
@@ -408,7 +409,7 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
 }
 
 /*
- * If the slot is polulated, power indicator is off and power
+ * If the slot is populated, power indicator is off and power
  * controller is off, it is safe to detach the devices.
  */
 if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) &&
-- 
2.4.3




[Qemu-devel] [PATCH v6 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV)

2015-10-22 Thread Knut Omang
This patch provides the building blocks for creating an SR/IOV
PCIe Extended Capability header and register/unregister
SR/IOV Virtual Functions.

Signed-off-by: Knut Omang 
---
 hw/pci/Makefile.objs|   2 +-
 hw/pci/pci.c|  95 +++
 hw/pci/pcie.c   |   2 +-
 hw/pci/pcie_sriov.c | 277 
 include/hw/pci/pci.h|  11 +-
 include/hw/pci/pcie.h   |   6 +
 include/hw/pci/pcie_sriov.h |  67 +++
 include/qemu/typedefs.h |   2 +
 trace-events|   5 +
 9 files changed, 441 insertions(+), 26 deletions(-)
 create mode 100644 hw/pci/pcie_sriov.c
 create mode 100644 include/hw/pci/pcie_sriov.h

diff --git a/hw/pci/Makefile.objs b/hw/pci/Makefile.objs
index 9f905e6..2226980 100644
--- a/hw/pci/Makefile.objs
+++ b/hw/pci/Makefile.objs
@@ -3,7 +3,7 @@ common-obj-$(CONFIG_PCI) += msix.o msi.o
 common-obj-$(CONFIG_PCI) += shpc.o
 common-obj-$(CONFIG_PCI) += slotid_cap.o
 common-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
-common-obj-$(CONFIG_PCI) += pcie.o pcie_aer.o pcie_port.o
+common-obj-$(CONFIG_PCI) += pcie.o pcie_aer.o pcie_port.o pcie_sriov.o
 
 common-obj-$(call lnot,$(CONFIG_PCI)) += pci-stub.o
 common-obj-$(CONFIG_ALL) += pci-stub.o
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index b095cfe..3a6cce3 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -153,6 +153,9 @@ int pci_bar(PCIDevice *d, int reg)
 {
 uint8_t type;
 
+/* PCIe virtual functions do not have their own BARs */
+assert(!pci_is_vf(d));
+
 if (reg != PCI_ROM_SLOT)
 return PCI_BASE_ADDRESS_0 + reg * 4;
 
@@ -211,22 +214,13 @@ void pci_device_deassert_intx(PCIDevice *dev)
 }
 }
 
-static void pci_do_device_reset(PCIDevice *dev)
+static void pci_reset_regions(PCIDevice *dev)
 {
 int r;
+if (pci_is_vf(dev)) {
+return;
+}
 
-pci_device_deassert_intx(dev);
-assert(dev->irq_state == 0);
-
-/* Clear all writable bits */
-pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
- pci_get_word(dev->wmask + PCI_COMMAND) |
- pci_get_word(dev->w1cmask + PCI_COMMAND));
-pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
- pci_get_word(dev->wmask + PCI_STATUS) |
- pci_get_word(dev->w1cmask + PCI_STATUS));
-dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
-dev->config[PCI_INTERRUPT_LINE] = 0x0;
 for (r = 0; r < PCI_NUM_REGIONS; ++r) {
 PCIIORegion *region = &dev->io_regions[r];
 if (!region->size) {
@@ -240,6 +234,23 @@ static void pci_do_device_reset(PCIDevice *dev)
 pci_set_long(dev->config + pci_bar(dev, r), region->type);
 }
 }
+}
+
+static void pci_do_device_reset(PCIDevice *dev)
+{
+pci_device_deassert_intx(dev);
+assert(dev->irq_state == 0);
+
+/* Clear all writable bits */
+pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
+ pci_get_word(dev->wmask + PCI_COMMAND) |
+ pci_get_word(dev->w1cmask + PCI_COMMAND));
+pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
+ pci_get_word(dev->wmask + PCI_STATUS) |
+ pci_get_word(dev->w1cmask + PCI_STATUS));
+dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
+dev->config[PCI_INTERRUPT_LINE] = 0x0;
+pci_reset_regions(dev);
 pci_update_mappings(dev);
 
 msi_reset(dev);
@@ -771,6 +782,15 @@ static void pci_init_multifunction(PCIBus *bus, PCIDevice 
*dev, Error **errp)
 dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
 }
 
+/* With SR/IOV and ARI, a device at function 0 need not be a multifunction
+ * device, as it may just be a VF that ended up with function 0 in
+ * the legacy PCI interpretation. Avoid failing in such cases:
+ */
+if (pci_is_vf(dev) &&
+dev->exp.sriov_vf.pf->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
+return;
+}
+
 /*
  * multifunction bit is interpreted in two ways as follows.
  *   - all functions must set the bit to 1.
@@ -962,6 +982,7 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
 uint64_t wmask;
 pcibus_t size = memory_region_size(memory);
 
+assert(!pci_is_vf(pci_dev)); /* VFs must use pcie_sriov_vf_register_bar */
 assert(region_num >= 0);
 assert(region_num < PCI_NUM_REGIONS);
 if (size & (size-1)) {
@@ -1060,11 +1081,44 @@ pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int 
region_num)
 return pci_dev->io_regions[region_num].addr;
 }
 
-static pcibus_t pci_bar_address(PCIDevice *d,
-   int reg, uint8_t type, pcibus_t size)
+
+static pcibus_t pci_config_get_bar_addr(PCIDevice *d, int reg,
+uint8_t type, pcibus_t size)
+{
+pcibus_t new_addr;
+if (!pci_is_vf(d)) {

Re: [Qemu-devel] [PATCH] pc: allow raising low memory via max-ram-below-4g option

2015-10-22 Thread Igor Mammedov
On Thu, 22 Oct 2015 08:49:00 +0200
Gerd Hoffmann  wrote:

> This patch extends the functionality of the max-ram-below-4g option
> to also allow increasing lowmem.  While at it, also rework the
> lowmem calculation logic and add a longish comment describing how it
> works and what the compatibility constraints are.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/i386/pc.c  |  2 +-
>  hw/i386/pc_piix.c | 61 
> +++
>  2 files changed, 40 insertions(+), 23 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index b25a872..7d0b5f7 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1888,7 +1888,7 @@ static void pc_machine_initfn(Object *obj)
>  pc_machine_get_hotplug_memory_region_size,
>  NULL, NULL, NULL, &error_abort);
>  
> -pcms->max_ram_below_4g = 1ULL << 32; /* 4G */
> +pcms->max_ram_below_4g = 0xe0000000; /* 3.5G */
>  object_property_add(obj, PC_MACHINE_MAX_RAM_BELOW_4G, "size",
>  pc_machine_get_max_ram_below_4g,
>  pc_machine_set_max_ram_below_4g,
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index a91cc3d..d628166 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -100,29 +100,46 @@ static void pc_init1(MachineState *machine,
>  PcGuestInfo *guest_info;
>  ram_addr_t lowmem;
>  
> -/* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory).
> - * If it doesn't, we need to split it in chunks below and above 4G.
> - * In any case, try to make sure that guest addresses aligned at
> - * 1G boundaries get mapped to host addresses aligned at 1G boundaries.
> - * For old machine types, use whatever split we used historically to 
> avoid
> - * breaking migration.
> +/*
> + * Calculate ram split, for memory below and above 4G.  It's a bit
> + * complicated for backward compatibility reasons ...
> + *
> + *  - Traditional split is 3.5G (lowmem = 0xe0000000).  This is the
> + *default value for max_ram_below_4g now.
> + *
> + *  - Then, to gigabyte align the memory, we move the split to 3G
> + *(lowmem = 0xc0000000).  But only in case we have to split in
> + *the first place, i.e. ram_size is larger than (traditional)
> + *lowmem.  And for new machine types (gigabyte_align = true)
> + *only, for live migration compatibility reasons.
> + *
> + *  - Next the max-ram-below-4g option was added, which allowed to
> + *reduce lowmem to a smaller value, to allow a larger PCI I/O
> + *window below 4G.  qemu doesn't enforce gigabyte alignment here,
> + *but prints a warning.
> + *
> + *  - Finally max-ram-below-4g got updated to also allow raising lowmem,
> + *so legacy non-PAE guests can get as much memory as possible in
> + *the 32bit address space below 4G.
> + *
> + * Examples:
> + *qemu -M pc-1.7 -m 4G(old default)-> 3584M low,  512M high
> + *qemu -M pc -m 4G(new default)-> 3072M low, 1024M high
> + *qemu -M pc,max-ram-below-4g=2G -M 4G -> 2048M low, 2048M high
 ^^^ should be -m

> + *qemu -M pc,max-ram-below-4g=4G -M 3968M  -> 3968M low (=4G-128M)
 ^^^ the same here

>   */
> -if (machine->ram_size >= 0xe0000000) {
> -lowmem = gigabyte_align ? 0xc0000000 : 0xe0000000;
> -} else {
> -lowmem = 0xe0000000;
> -}
> -
> -/* Handle the machine opt max-ram-below-4g.  It is basically doing
> - * min(qemu limit, user limit).
> - */
> -if (lowmem > pcms->max_ram_below_4g) {
> -lowmem = pcms->max_ram_below_4g;
> -if (machine->ram_size - lowmem > lowmem &&
> -lowmem & ((1ULL << 30) - 1)) {
> -error_report("Warning: Large machine and 
> max_ram_below_4g(%"PRIu64
> - ") not a multiple of 1G; possible bad performance.",
> - pcms->max_ram_below_4g);
> +lowmem = pcms->max_ram_below_4g;
> +if (machine->ram_size >= pcms->max_ram_below_4g) {
> +if (gigabyte_align) {
> +if (lowmem > 0xc0000000) {
> +lowmem = 0xc0000000;
> +}
> +if (lowmem & ((1ULL << 30) - 1)) {
> +error_report("Warning: Large machine and max_ram_below_4g "
> + "(%" PRIu64 ") not a multiple of 1G; "
> + "possible bad performance.",
> + pcms->max_ram_below_4g);
> +}
>  }
>  }
>  
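To make the examples in the quoted comment easy to check, here is a small stand-alone sketch of the new split calculation. It mirrors the lowmem logic from the hunk above but leaves out the alignment warning. The final step that derives the below/above sizes from lowmem is my assumption about the surrounding code in pc_init1, and `RamSplit`/`split_ram` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)

typedef struct {
    uint64_t below_4g;  /* RAM mapped below 4G */
    uint64_t above_4g;  /* RAM mapped above 4G */
} RamSplit;

static RamSplit split_ram(uint64_t ram_size, uint64_t max_ram_below_4g,
                          bool gigabyte_align)
{
    uint64_t lowmem = max_ram_below_4g;
    RamSplit s;

    assert(max_ram_below_4g > 0 && max_ram_below_4g <= 4 * GiB);

    /* Only when a split is needed, and only for new machine types,
     * cap lowmem at 3G (0xc0000000) to keep gigabyte alignment. */
    if (ram_size >= max_ram_below_4g && gigabyte_align && lowmem > 3 * GiB) {
        lowmem = 3 * GiB;
    }

    /* Assumed follow-up step: everything beyond lowmem goes above 4G. */
    if (ram_size >= lowmem) {
        s.below_4g = lowmem;
        s.above_4g = ram_size - lowmem;
    } else {
        s.below_4g = ram_size;
        s.above_4g = 0;
    }
    return s;
}
```

With the default of 3.5G and gigabyte_align off, `split_ram(4 * GiB, 3584 * MiB, false)` gives 3584M low and 512M high, matching the first example; the other three examples work out the same way.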




Re: [Qemu-devel] [PATCH v9 06/17] qapi-visit: Remove redundant functions for flat union base

2015-10-22 Thread Markus Armbruster
Eric Blake  writes:

> On 10/21/2015 11:36 AM, Markus Armbruster wrote:
>> Eric Blake  writes:
>> 
>>> The code for visiting the base class of a child struct created
>>> visit_type_Base_fields() which covers all fields of Base; while
>>> the code for visiting the base class of a flat union created
>>> visit_type_Union_fields() covering all fields of the base
>>> except the discriminator.  But if the base class were to always
>>> visit all its fields, then we wouldn't need a separate visit of
>>> the discriminator for a union.  Not only is consistently
>>> visiting all fields easier to understand, it lets us share code.
>>>
>>> Now that gen_visit_struct_fields() can potentially collect more
>>> than one function into 'ret', a regular expression searching for
>>> whether a label was used may hit a false positive within the
>>> body of the first function.  But using a regex was overkill,
>>> since we can easily determine when we jumped to a label.
>>>
>>> Signed-off-by: Eric Blake 
>>>
>
>>> +++ b/scripts/qapi-visit.py
>>> @@ -90,7 +90,7 @@ static void visit_type_%(c_name)s_fields(Visitor *v, 
>>> %(c_name)s **obj, Error **e
>>>
>>>  ret += gen_visit_fields(members, prefix='(*obj)->')
>>>
>>> -if re.search('^ *goto out;', ret, re.MULTILINE):
>>> +if base or members:
>> 
>> What if we have an empty base and no members?  Empty base is a
>> pathological case, admittedly.
>
> I'm going to filter the re.search cleanups into its own prereq patch.
> But yes, it will need to care for empty base and no members (hmm, I
> really ought to add positive tests to qapi-schema-test for an empty
> inherited struct, to make sure I'm getting it right - even if we don't
> want that patch in the final series).

Don't take my reluctance to take some positive tests as general
opposition towards positive tests!  Positive tests for corner cases like
"empty base" are valuable.

>> Diff is confusing (not your fault).  Let me compare code before and
>> after real slow.
>
> I also plan for v10 to include a diff of the generated code in the
> commit message, if that will help make the change more obvious.
>
>> 
>> = Before =
>> 
>>   def gen_visit_union(name, base, variants):
>>   ret = ''
>> 
>> 0. base is None if and only if the union is simple.
>> 
>> 1. If it's a flat union, generate its visit_type_NAME_fields().  This
>
> where NAME is the union name.
>
>> function visits the union's non-variant members *except* the
>> discriminator.  Since a simple union has no non-variant members other
>> than the discriminator, generate it only for flat unions.
>> 
>>   if base:
>>   members = [m for m in base.members if m != variants.tag_member]
>>   ret += gen_visit_struct_fields(name, None, members)
>> 
>> 2. Generate the visit_type_implicit_FOO() we're going to need.
>> 
>>   for var in variants.variants:
>>   # Ugly special case for simple union TODO get rid of it
>>   if not var.simple_union_type():
>>   ret += gen_visit_implicit_struct(var.type)
>
> Could be made slightly simpler by generating these while we iterate over
> cases (but we'd have to be careful to generate into multiple variables,
> and then concat together at the end, since we can't generate one
> function in the body of the other).

I doubt it would be an improvement.  The loop is trivial, so
de-duplicating it doesn't have much value.  Having generator code
arranged in the same order as the generated code *does* have value.

>> 3. Generate its visit_type_NAME().
>> 
>
>> 
>> 3.a. If it's a flat union, generate the call of
>> visit_type_NAME_fields().  Not necessary for simple unions, see 1.
>
> Again, important to note that this was visit_type_UNION_fields().
>
>> 3.b. Generate visit of discriminator.
>> 
>
>> 
>> 3.c. Generate visit of the active variant.
>> 
>
>> = After =
>> 
>>   def gen_visit_union(name, base, variants):
>>   ret = ''
>> 
>> 0. base is None if and only if the union is simple.
>> 
>> 1. If it's a flat union, generate its visit_type_NAME_fields().  This
>> function visits the union's non-variant members *including* the
>> discriminator.  However, we generate it only for flat unions.  Simple
>> unions have no non-variant members other than the discriminator.
>> 
>>   if base:
>>   ret += gen_visit_struct_fields(base.name, base.base,
>>  base.local_members)
>
> Note that this creates visit_type_BASE_fields() (a different function).

Missed this detail, thanks.  The old visit_type_UNION_fields() is
visit_type_BASE_fields() less the tag visit.  Reusing
visit_type_BASE_fields() instead behaves as I described above, so my
analysis remains valid regardless.

visit_type_BASE_fields() should be generated when gen_visit_struct()
processes BASE.  Here, we should only generate a forward declaration, if
necessary.
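To illustrate the before/after difference at the level of the generated C, here is a rough sketch with a toy visitor that only records which fields it sees. The `Visitor` type, `visit_field()`, the field names and the order of fields inside the base are all made up for illustration; the real generated code uses QEMU's visitor API and looks different.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Visitor: it only logs field names. */
typedef struct {
    char log[64];
} Visitor;

static void visit_field(Visitor *v, const char *name)
{
    /* Room for the name, the ';' separator and the terminating NUL. */
    assert(strlen(v->log) + strlen(name) + 2 <= sizeof(v->log));
    strcat(v->log, name);
    strcat(v->log, ";");
}

/* Before: the per-union helper skips the discriminator ("type"). */
static void visit_type_Union_fields(Visitor *v)
{
    visit_field(v, "common");
}

static void visit_union_before(Visitor *v)
{
    visit_type_Union_fields(v);
    visit_field(v, "type");      /* separate discriminator visit */
    visit_field(v, "variant");
}

/* After: the base helper visits every base field, tag included. */
static void visit_type_Base_fields(Visitor *v)
{
    visit_field(v, "type");
    visit_field(v, "common");
}

static void visit_union_after(Visitor *v)
{
    visit_type_Base_fields(v);
    visit_field(v, "variant");
}

/* Convenience wrappers returning the visit order as a string. */
static const char *order_before(void)
{
    static Visitor v;
    v.log[0] = '\0';
    visit_union_before(&v);
    return v.log;
}

static const char *order_after(void)
{
    static Visitor v;
    v.log[0] = '\0';
    visit_union_after(&v);
    return v.log;
}
```

Both variants visit the same set of fields once each; what changes is that the union no longer needs its own helper, since the base's helper already covers the tag.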

>> 
>> 2. Generate the visit_type_implicit_FOO() we're going to need.
>> 
>>  

Re: [Qemu-devel] [PATCH RFC V5 8/9] target-arm/cpu64 GICv3 system instructions support

2015-10-22 Thread Pavel Fedin
 Hello!

> I've implemented the registers accessed by the Linux driver in 
> drivers/irqchip/irq-gic-v3.c
> If this register is used only with KVM, e.g. virt/kvm/arm/vgic-v3.c, then it is 
> out of my mandate.

 It has nothing to do with KVM. EFI is firmware that originated at Intel but 
has now been adopted by the ARM64 architecture too. You can also run it under 
qemu if you want to build a kind of "full" machine. It writes a value to BPR1, 
a register that the Linux kernel indeed ignores.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [Qemu-devel] [PATCH 0/2] Fix werror=enospc for qcow2 on iscsi

2015-10-22 Thread Peter Lieven

Am 22.10.2015 um 10:17 schrieb Fam Zheng:

When a qcow2 image is created on an iSCSI target with a virtual size greater
than the physical capacity of the LUN, the guest may over time write more data
than the LUN can hold; at that point, new qcow2 clusters will be allocated
beyond the end of the disk.


Why would you want to put a QCOW2 on a fixed size block device?

Anyway, I like the error code translation.

Peter




[Qemu-devel] [PATCH] migration: Add state records for migration incoming

2015-10-22 Thread zhanghailiang
On the migration destination, we sometimes need to know the current state,
and it is also useful for tracing the incoming migration process.

Here we add a new member 'state' to MigrationIncomingState, and use
migrate_set_state() to modify its value. To make that possible, we change
the first parameter of migrate_set_state() to a pointer to the state field
and make the function public.

Signed-off-by: zhanghailiang 
Reviewed-by: Dr. David Alan Gilbert 
---
Hi,

This is picked from the COLO frame series; Dave suggested that I
submit it by itself.

I tweaked the commit message a little and kept Dave's Reviewed-by tag.

Thanks.
---
 include/migration/migration.h |  3 +++
 migration/migration.c | 43 +++
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8334621..4435dee 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -50,6 +50,7 @@ typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
 struct MigrationIncomingState {
 QEMUFile *file;
 
+int state;
 /* See savevm.c */
 LoadStateEntry_Head loadvm_handlers;
 };
@@ -82,6 +83,8 @@ struct MigrationState
 int64_t dirty_sync_count;
 };
 
+void migrate_set_state(int *state, int old_state, int new_state);
+
 void process_incoming_migration(QEMUFile *f);
 
 void qemu_start_incoming_migration(const char *uri, Error **errp);
diff --git a/migration/migration.c b/migration/migration.c
index b092f38..85ac850 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -96,6 +96,7 @@ MigrationIncomingState 
*migration_incoming_state_new(QEMUFile* f)
 {
 mis_current = g_new0(MigrationIncomingState, 1);
 mis_current->file = f;
+mis_current->state = MIGRATION_STATUS_NONE;
 QLIST_INIT(&mis_current->loadvm_handlers);
 
 return mis_current;
@@ -277,11 +278,13 @@ void qemu_start_incoming_migration(const char *uri, Error 
**errp)
 static void process_incoming_migration_co(void *opaque)
 {
 QEMUFile *f = opaque;
+MigrationIncomingState *mis;
 Error *local_err = NULL;
 int ret;
 
-migration_incoming_state_new(f);
-migrate_generate_event(MIGRATION_STATUS_ACTIVE);
+mis = migration_incoming_state_new(f);
+migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
+  MIGRATION_STATUS_ACTIVE);
 ret = qemu_loadvm_state(f);
 
 qemu_fclose(f);
@@ -289,7 +292,8 @@ static void process_incoming_migration_co(void *opaque)
 migration_incoming_state_destroy();
 
 if (ret < 0) {
-migrate_generate_event(MIGRATION_STATUS_FAILED);
+migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+  MIGRATION_STATUS_FAILED);
 error_report("load of migration failed: %s", strerror(-ret));
 migrate_decompress_threads_join();
 exit(EXIT_FAILURE);
@@ -298,7 +302,8 @@ static void process_incoming_migration_co(void *opaque)
 /* Make sure all file formats flush their mutable metadata */
 bdrv_invalidate_cache_all(&local_err);
 if (local_err) {
-migrate_generate_event(MIGRATION_STATUS_FAILED);
+migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+  MIGRATION_STATUS_FAILED);
 error_report_err(local_err);
 migrate_decompress_threads_join();
 exit(EXIT_FAILURE);
@@ -330,7 +335,8 @@ static void process_incoming_migration_co(void *opaque)
  * observer sees this event they might start to prod at the VM assuming
  * it's ready to use.
  */
-migrate_generate_event(MIGRATION_STATUS_COMPLETED);
+migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+  MIGRATION_STATUS_COMPLETED);
 }
 
 void process_incoming_migration(QEMUFile *f)
@@ -585,9 +591,9 @@ void qmp_migrate_set_parameters(bool has_compress_level,
 
 /* shared migration helpers */
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+void migrate_set_state(int *state, int old_state, int new_state)
 {
-if (atomic_cmpxchg(&s->state, old_state, new_state) == old_state) {
+if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
 trace_migrate_set_state(new_state);
 migrate_generate_event(new_state);
 }
@@ -616,7 +622,7 @@ static void migrate_fd_cleanup(void *opaque)
 if (s->state != MIGRATION_STATUS_COMPLETED) {
 qemu_savevm_state_cancel();
 if (s->state == MIGRATION_STATUS_CANCELLING) {
-migrate_set_state(s, MIGRATION_STATUS_CANCELLING,
+migrate_set_state(&s->state, MIGRATION_STATUS_CANCELLING,
   MIGRATION_STATUS_CANCELLED);
 }
 }
@@ -628,7 +634,8 @@ void migrate_fd_error(MigrationState *s)
 {
 trace_migrate_fd_error();
 assert(s->file == NULL);
-migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_FAILED);
+migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+  MIGRATION_STATUS_FAILED);
 

[Qemu-devel] [PULL 23/38] net: add trace_vhost_user_event

2015-10-22 Thread Michael S. Tsirkin
From: Marc-André Lureau 

Replace error_report() and use tracing instead. It's not an error to get
a connection or a disconnection, so silence this and trace it instead.

Signed-off-by: Marc-André Lureau 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Tested-by: Thibaut Collet 
---
 net/vhost-user.c | 4 ++--
 trace-events | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 8f354eb..9b38431 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -15,6 +15,7 @@
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
 #include "qmp-commands.h"
+#include "trace.h"
 
 typedef struct VhostUserState {
 NetClientState nc;
@@ -148,18 +149,17 @@ static void net_vhost_user_event(void *opaque, int event)
   NET_CLIENT_OPTIONS_KIND_NIC,
   MAX_QUEUE_NUM);
 s = DO_UPCAST(VhostUserState, nc, ncs[0]);
+trace_vhost_user_event(s->chr->label, event);
 switch (event) {
 case CHR_EVENT_OPENED:
 if (vhost_user_start(queues, ncs) < 0) {
 exit(1);
 }
 qmp_set_link(name, true, );
-error_report("chardev \"%s\" went up", s->chr->label);
 break;
 case CHR_EVENT_CLOSED:
 qmp_set_link(name, true, );
 vhost_user_stop(queues, ncs);
-error_report("chardev \"%s\" went down", s->chr->label);
 break;
 }
 
diff --git a/trace-events b/trace-events
index a0ddc6b..f237c7f 100644
--- a/trace-events
+++ b/trace-events
@@ -1705,3 +1705,6 @@ qcrypto_tls_creds_x509_load_cert_list(void *creds, const 
char *file) "TLS creds
 
 # crypto/tlssession.c
 qcrypto_tls_session_new(void *session, void *creds, const char *hostname, 
const char *aclname, int endpoint) "TLS session new session=%p creds=%p 
hostname=%s aclname=%s endpoint=%d"
+
+# net/vhost-user.c
+vhost_user_event(const char *chr, int event) "chr: %s got event: %d"
-- 
MST




[Qemu-devel] [PATCH 5/5] slirp: Fix signed/unsigned comparison and variable truncation warnings

2015-10-22 Thread Mark Pizzolato
Some warnings affect potentially wrapping sequence numbers.  Careful
analysis of intent and consequences is necessary.
 - Variable type changes where appropriate
 - Explicit casts where appropriate

Signed-off-by: Mark Pizzolato 
---
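For reviewers less familiar with the two classes of warning addressed here, the following sketch shows the patterns in isolation. `option_too_long()` follows the shape of the bootp.c change; `seq_lt()` shows the usual wrap-safe way to order 32-bit sequence numbers, which is why blindly changing types around such comparisons can alter behaviour. Both function names are hypothetical, and the sketch assumes the common two's-complement conversion from uint32_t to int32_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * In a mixed int/size_t comparison the int operand is converted to
 * size_t, so a negative value would compare as a huge one.  Making the
 * conversion explicit documents that val is known to be non-negative.
 */
static bool option_too_long(int val, size_t spaceleft)
{
    assert(val >= 0);  /* a negative length would be a caller bug */
    return (size_t)val + 1 > spaceleft;
}

/*
 * Wrap-safe "a is before b" for 32-bit sequence numbers: subtract in
 * unsigned arithmetic, then look at the sign of the difference.
 */
static bool seq_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}
```

A plain `a < b` on the raw values would report 0xfffffff0 as after 0x10, whereas `seq_lt()` treats it as 32 steps before it, which is the intended meaning once the counter has wrapped.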
 slirp/bootp.c  |  2 +-
 slirp/dnssearch.c  |  6 +++---
 slirp/sbuf.c   |  6 +++---
 slirp/slirp.c  |  2 +-
 slirp/socket.c |  4 ++--
 slirp/socket.h |  2 +-
 slirp/tcp.h|  2 +-
 slirp/tcp_input.c  |  4 ++--
 slirp/tcp_output.c | 12 ++--
 9 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/slirp/bootp.c b/slirp/bootp.c
index 27a4032..79a5c80 100644
--- a/slirp/bootp.c
+++ b/slirp/bootp.c
@@ -284,7 +284,7 @@ static void bootp_reply(Slirp *slirp, const struct bootp_t 
*bp)
 if (slirp->vdnssearch) {
 size_t spaceleft = sizeof(rbp->bp_vend) - (q - rbp->bp_vend);
 val = slirp->vdnssearch_len;
-if (val + 1 > spaceleft) {
+if ((size_t)val + 1 > spaceleft) {
 g_warning("DHCP packet size exceeded, "
 "omitting domain-search option.");
 } else {
diff --git a/slirp/dnssearch.c b/slirp/dnssearch.c
index 4c9064e..dfe38be 100644
--- a/slirp/dnssearch.c
+++ b/slirp/dnssearch.c
@@ -135,7 +135,7 @@ static void domain_mklabels(CompactDomain *cd, const char 
*input)
 if ((len == 0 && cur_chr == '.') || len >= 64) {
 goto fail;
 }
-*len_marker = len;
+*len_marker = (uint8_t)len;
 
 output++;
 len_marker = output;
@@ -222,7 +222,7 @@ static size_t domain_compactify(CompactDomain *domains, 
size_t n)
 if (moff < 0x3FFFu) {
 cd->len -= cd->common_octets - 2;
 cd->labels[cd->len - 1] = moff & 0xFFu;
-cd->labels[cd->len - 2] = 0xC0u | (moff >> 8);
+cd->labels[cd->len - 2] = (uint8_t)(0xC0u | (moff >> 8));
 }
 }
 
@@ -301,7 +301,7 @@ int translate_dnssearch(Slirp *s, const char **names)
 size_t len = bsrc_end - bsrc_start;
 memmove(result + bdst_start, result + bsrc_start, len);
 result[bdst_start - 2] = RFC3397_OPT_DOMAIN_SEARCH;
-result[bdst_start - 1] = len;
+result[bdst_start - 1] = (uint8_t)len;
 bsrc_end = bsrc_start;
 bsrc_start -= MAX_OPT_LEN;
 bdst_start -= MAX_OPT_LEN + OPT_HEADER_LEN;
diff --git a/slirp/sbuf.c b/slirp/sbuf.c
index 08ec2b4..b14f7d6 100644
--- a/slirp/sbuf.c
+++ b/slirp/sbuf.c
@@ -19,13 +19,13 @@ sbfree(struct sbuf *sb)
 void
 sbdrop(struct sbuf *sb, int num)
 {
-int limit = sb->sb_datalen / 2;
+u_int limit = sb->sb_datalen / 2;
 
/*
 * We can only drop how much we have
 * This should never succeed
 */
-   if(num > sb->sb_cc)
+   if((u_int)num > sb->sb_cc)
num = sb->sb_cc;
sb->sb_cc -= num;
sb->sb_rptr += num;
@@ -173,7 +173,7 @@ sbcopy(struct sbuf *sb, int off, int len, char *to)
from -= sb->sb_datalen;
 
if (from < sb->sb_wptr) {
-   if (len > sb->sb_cc) len = sb->sb_cc;
+   if ((u_int)len > sb->sb_cc) len = sb->sb_cc;
memcpy(to,from,len);
} else {
/* re-use off */
diff --git a/slirp/slirp.c b/slirp/slirp.c
index 05bb7e0..c597eb9 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -464,7 +464,7 @@ void slirp_pollfds_poll(GArray *pollfds, int select_error)
 return;
 }
 
-curtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+curtime = (u_int)qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 
 QTAILQ_FOREACH(slirp, &slirp_instances, entry) {
 /*
diff --git a/slirp/socket.c b/slirp/socket.c
index 92c9bac..62cb6de 100644
--- a/slirp/socket.c
+++ b/slirp/socket.c
@@ -208,7 +208,7 @@ soread(struct socket *so)
return nn;
 }
 
-int soreadbuf(struct socket *so, const char *buf, int size)
+int soreadbuf(struct socket *so, const char *buf, size_t size)
 {
 int n, nn, copy = size;
 struct sbuf *sb = &so->so_snd;
@@ -468,7 +468,7 @@ sorecvfrom(struct socket *so)
  udp_detach(so);
} else {/* A "normal" UDP packet */
  struct mbuf *m;
-  int len;
+  u_int len;
 #ifdef _WIN32
   unsigned long n;
 #else
diff --git a/slirp/socket.h b/slirp/socket.h
index 57e0407..822b044 100644
--- a/slirp/socket.h
+++ b/slirp/socket.h
@@ -92,6 +92,6 @@ void soisfconnected(register struct socket *);
 void sofwdrain(struct socket *);
 struct iovec; /* For win32 */
 size_t sopreprbuf(struct socket *so, struct iovec *iov, int *np);
-int soreadbuf(struct socket *so, const char *buf, int size);
+int soreadbuf(struct socket *so, const char *buf, size_t size);
 
 #endif /* _SOCKET_H_ */
diff --git a/slirp/tcp.h b/slirp/tcp.h
index 2e2b403..4c791e1 100644
--- a/slirp/tcp.h
+++ b/slirp/tcp.h
@@ -108,7 +108,7 @@ struct tcphdr {
 #define  

Re: [Qemu-devel] Functions to intercept Disk IO information?

2015-10-22 Thread QuQ Edsel
Thank you for the reply!
But when I insert a printf in dma_buf_rw, recompile, and boot the VM,
no messages are printed, even when I open, create, or edit a .txt or .jpeg
file.
If dma_buf_rw were called, it should print messages in the monitor...
On the other hand, the dma_blk_io call in ide_dma_cb does fire when files
are manipulated.
Would that be the right point to detect disk I/O and to intercept the
information I need?
Sorry, I really lack knowledge in this field...
Does the PIO you mentioned mean NIC I/O?

Edsel

2015-10-22 9:22 GMT+08:00 Fam Zheng :

> On Thu, 10/22 01:13, QuQ Edsel wrote:
> > Hi,
> > My friends and I were assigned a task to find out a point to insert a
> > callback function to intercept Disk IO activities such as read/write a
> .txt
> > file. Our final goal is to generate a report for target process/file 's
> > Disk IO activities. We have QEMU 2.3 with KVM enabled. We have been
> looking
> > for such a point for long...but not so capable of such a task.
> > People who had this project last year done so with TEMU 1.0 (probably
> QEMU
> > 0.9), and the implementation point they had is dma_buf_rw(). The
> > information in their report shows pid, timestamps, disk sector, buffer
> size
> > and write/read for a target file (I am not even sure if such information
> is
> > meaningful or useful)
> > Currently I have tried to printf in functions such as dma_buf_rw /
> > dma_blk_io / bdrv_aio_readv...etc. to see if they print out message when
> I
> > open/edit/save a .txt or .jpeg file. The first one just don't print at
> all,
> > and the second and third one print a lot after booting up the guest
> > I can see that the dma_blk_io function call in ide_dma_cb (core.c) may be
> > related because it prints as I have activities on files. However it also
> > prints sometime when I am not doing any thing... so I not that certain
> > about it. (and I don't know if write/read activities invoke such
> > function..)
>
> Yes, dma_buf_rw should handle all your I/O unless you're doing PIO. I
> think the
> other activities you see when you're not doing anything is from guest
> system's
> background tasks.
>
> Thanks,
> Fam
>
> >
> > Is there a correct /better point to intercept disk IO information
> > ?(especially for activities such as read/write a .file)
> > Or what should I do to clearly get the needed information from Disk IO
> > functions?
> > I would be so grateful to have the information.
> > Thank you.
> >
> > The guest environment I have is 64bit Windows 7 with qcow2 image (not
> sure
> > if relative)
> >
>


[Qemu-devel] [PATCH v6 01/12] aio: Add "is_external" flag for event handlers

2015-10-22 Thread Fam Zheng
All callers pass in false, and the real external ones will switch to
true in coming patches.

Signed-off-by: Fam Zheng 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
---
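The mechanical part of this patch, storing one extra flag per registered handler, can be pictured with the simplified sketch below. It is not QEMU's AioContext: the `Sketch*` types, the fixed-size table and `sketch_count_external()` are hypothetical and exist only to show the flag being recorded at registration time. What later patches do with the flag is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void SketchIOHandler(void *opaque);

typedef struct {
    int fd;
    bool is_external;          /* the new per-handler flag */
    SketchIOHandler *io_read;
    void *opaque;
} SketchHandler;

#define SKETCH_MAX_HANDLERS 8

typedef struct {
    SketchHandler handlers[SKETCH_MAX_HANDLERS];
    size_t count;
} SketchCtx;

/* Example context; static storage starts out zeroed (no handlers). */
static SketchCtx sketch_ctx;

/* Register a handler and remember whether its source is external. */
static bool sketch_set_fd_handler(SketchCtx *ctx, int fd, bool is_external,
                                  SketchIOHandler *io_read, void *opaque)
{
    SketchHandler *h;

    assert(ctx != NULL);
    if (ctx->count == SKETCH_MAX_HANDLERS) {
        return false;          /* table full */
    }
    h = &ctx->handlers[ctx->count++];
    h->fd = fd;
    h->is_external = is_external;
    h->io_read = io_read;
    h->opaque = opaque;
    return true;
}

/* Count the handlers that were registered as external. */
static size_t sketch_count_external(const SketchCtx *ctx)
{
    size_t i, n = 0;

    for (i = 0; i < ctx->count; i++) {
        if (ctx->handlers[i].is_external) {
            n++;
        }
    }
    return n;
}
```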
 aio-posix.c |  6 -
 aio-win32.c |  5 
 async.c |  3 ++-
 block/curl.c| 14 +-
 block/iscsi.c   |  9 +++
 block/linux-aio.c   |  5 ++--
 block/nbd-client.c  | 10 ---
 block/nfs.c | 17 +---
 block/sheepdog.c| 38 ++-
 block/ssh.c |  5 ++--
 block/win32-aio.c   |  5 ++--
 hw/block/dataplane/virtio-blk.c |  6 +++--
 hw/scsi/virtio-scsi-dataplane.c | 24 +++--
 include/block/aio.h |  2 ++
 iohandler.c |  3 ++-
 nbd.c   |  4 ++-
 tests/test-aio.c| 58 +++--
 17 files changed, 130 insertions(+), 84 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index d477033..f0f9122 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -25,6 +25,7 @@ struct AioHandler
 IOHandler *io_write;
 int deleted;
 void *opaque;
+bool is_external;
 QLIST_ENTRY(AioHandler) node;
 };
 
@@ -43,6 +44,7 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
 
 void aio_set_fd_handler(AioContext *ctx,
 int fd,
+bool is_external,
 IOHandler *io_read,
 IOHandler *io_write,
 void *opaque)
@@ -82,6 +84,7 @@ void aio_set_fd_handler(AioContext *ctx,
 node->io_read = io_read;
 node->io_write = io_write;
 node->opaque = opaque;
+node->is_external = is_external;
 
 node->pfd.events = (io_read ? G_IO_IN | G_IO_HUP | G_IO_ERR : 0);
 node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
@@ -92,10 +95,11 @@ void aio_set_fd_handler(AioContext *ctx,
 
 void aio_set_event_notifier(AioContext *ctx,
 EventNotifier *notifier,
+bool is_external,
 EventNotifierHandler *io_read)
 {
 aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
-   (IOHandler *)io_read, NULL, notifier);
+   is_external, (IOHandler *)io_read, NULL, notifier);
 }
 
 bool aio_prepare(AioContext *ctx)
diff --git a/aio-win32.c b/aio-win32.c
index 50a6867..3110d85 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -28,11 +28,13 @@ struct AioHandler {
 GPollFD pfd;
 int deleted;
 void *opaque;
+bool is_external;
 QLIST_ENTRY(AioHandler) node;
 };
 
 void aio_set_fd_handler(AioContext *ctx,
 int fd,
+bool is_external,
 IOHandler *io_read,
 IOHandler *io_write,
 void *opaque)
@@ -86,6 +88,7 @@ void aio_set_fd_handler(AioContext *ctx,
 node->opaque = opaque;
 node->io_read = io_read;
 node->io_write = io_write;
+node->is_external = is_external;
 
 event = event_notifier_get_handle(&ctx->notifier);
 WSAEventSelect(node->pfd.fd, event,
@@ -98,6 +101,7 @@ void aio_set_fd_handler(AioContext *ctx,
 
 void aio_set_event_notifier(AioContext *ctx,
 EventNotifier *e,
+bool is_external,
 EventNotifierHandler *io_notify)
 {
 AioHandler *node;
@@ -133,6 +137,7 @@ void aio_set_event_notifier(AioContext *ctx,
 node->e = e;
 node->pfd.fd = (uintptr_t)event_notifier_get_handle(e);
 node->pfd.events = G_IO_IN;
+node->is_external = is_external;
 QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
 
 g_source_add_poll(&ctx->source, &node->pfd);
diff --git a/async.c b/async.c
index efce14b..bdc64a3 100644
--- a/async.c
+++ b/async.c
@@ -247,7 +247,7 @@ aio_ctx_finalize(GSource *source)
 }
 qemu_mutex_unlock(&ctx->bh_lock);
 
-aio_set_event_notifier(ctx, &ctx->notifier, NULL);
+aio_set_event_notifier(ctx, &ctx->notifier, false, NULL);
 event_notifier_cleanup(&ctx->notifier);
 rfifolock_destroy(&ctx->lock);
 qemu_mutex_destroy(&ctx->bh_lock);
@@ -329,6 +329,7 @@ AioContext *aio_context_new(Error **errp)
 }
 g_source_set_can_recurse(&ctx->source, true);
 aio_set_event_notifier(ctx, &ctx->notifier,
+   false,
(EventNotifierHandler *)
event_notifier_dummy_cb);
 ctx->thread_pool = NULL;
diff --git a/block/curl.c b/block/curl.c
index 032cc8a..8994182 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -154,18 +154,20 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
 DPRINTF("CURL (AIO): Sock action %d 

[Qemu-devel] [PATCH v2 0/8] i.MX: Standardize debug code

2015-10-22 Thread Jean-Christophe Dubois
We fix all i.MX driver files to use the same type of debug code.

The goal is to have debug code always compiled during build.

We standardize all debug output on the following format:

[QOM_TYPE_NAME]reporting_function: debug message

We also replace IPRINTF with qemu_log_mask(). The qemu_log_mask() output
follows the same format as the debug output above.
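
For illustration only, a minimal sketch of an always-compiled debug macro
that produces the format above. The names DEBUG_IMX_DEMO, TYPE_IMX_DEMO and
imx_demo_format() are hypothetical and are not taken from the patches; the
real macros live in each driver file.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names, for illustration only.
 * Build with -DDEBUG_IMX_DEMO=1 to enable the output. */
#ifndef DEBUG_IMX_DEMO
#define DEBUG_IMX_DEMO 0
#endif
#define TYPE_IMX_DEMO "imx.demo"

/* Write "[QOM_TYPE_NAME]reporting_function: message" into buf. */
int imx_demo_format(char *buf, size_t len, const char *type,
                    const char *func, const char *fmt, ...)
{
    va_list ap;
    int n, m;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "[%s]%s: ", type, func);
    if (n < 0 || (size_t)n >= len) {
        return n;               /* error, or the prefix was truncated */
    }
    va_start(ap, fmt);
    m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
    va_end(ap);
    return m < 0 ? m : n + m;
}

/*
 * The plain "if" keeps the call visible to the compiler even when the
 * flag is 0, so format strings and arguments are always type-checked.
 */
#define DPRINTF(...)                                                   \
    do {                                                               \
        if (DEBUG_IMX_DEMO) {                                          \
            char line_[256];                                           \
            imx_demo_format(line_, sizeof(line_), TYPE_IMX_DEMO,       \
                            __func__, __VA_ARGS__);                    \
            fputs(line_, stderr);                                      \
        }                                                              \
    } while (0)
```

With the flag left at 0 the compiler can drop the branch, but a mistake in
a DPRINTF() argument list still breaks the build, which is the point of
compiling the debug code unconditionally.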

Jean-Christophe Dubois (8):
  i.MX: Standardize i.MX serial debug.
  i.MX: Standardize i.MX GPIO debug
  i.MX: Standardize i.MX I2C debug
  i.MX: Standardize i.MX AVIC debug
  i.MX: Standardize i.MX CCM debug
  i.MX: Standardize i.MX FEC debug
  i.MX: Standardize i.MX EPIT debug
  i.MX: Standardize i.MX GPT debug

 hw/char/imx_serial.c | 57 +++--
 hw/gpio/imx_gpio.c   | 19 ---
 hw/i2c/imx_i2c.c | 43 +-
 hw/intc/imx_avic.c   | 44 ++-
 hw/misc/imx_ccm.c| 34 +--
 hw/net/imx_fec.c | 66 ++--
 hw/timer/imx_epit.c  | 37 -
 hw/timer/imx_gpt.c   | 46 +++-
 8 files changed, 168 insertions(+), 178 deletions(-)

-- 
2.1.4




[Qemu-devel] [PATCH v6 02/12] nbd: Mark fd handlers client type as "external"

2015-10-22 Thread Fam Zheng
This lets us distinguish them from internally used fds, and so avoid
handling unwanted events in nested aio polls.

Signed-off-by: Fam Zheng 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
---
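
A rough model of the idea, not the QEMU implementation: handlers carry an
is_external flag, and a per-context counter masks the external ones during
a nested poll. All Demo* names and the external_disable_cnt field are
assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for AioHandler/AioContext; not the QEMU structs. */
typedef struct DemoHandler {
    bool is_external;           /* true for fds driven by guests/clients */
    bool ready;                 /* pretend poll() reported activity */
    int dispatched;             /* how many times the callback ran */
} DemoHandler;

typedef struct DemoContext {
    int external_disable_cnt;   /* > 0 while external handlers are masked */
    DemoHandler *handlers;
    size_t nb_handlers;
} DemoContext;

void demo_disable_external(DemoContext *ctx)
{
    ctx->external_disable_cnt++;
}

void demo_enable_external(DemoContext *ctx)
{
    assert(ctx->external_disable_cnt > 0);
    ctx->external_disable_cnt--;
}

/* Dispatch ready handlers; external ones stay pending while masked. */
bool demo_poll(DemoContext *ctx)
{
    bool progress = false;

    for (size_t i = 0; i < ctx->nb_handlers; i++) {
        DemoHandler *h = &ctx->handlers[i];

        if (!h->ready) {
            continue;
        }
        if (h->is_external && ctx->external_disable_cnt > 0) {
            continue;           /* left ready, handled after re-enable */
        }
        h->ready = false;
        h->dispatched++;
        progress = true;
    }
    return progress;
}
```

In this model an NBD client socket would be registered with is_external set,
so a nested poll that runs with the counter raised only services internal
fds.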
 nbd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/nbd.c b/nbd.c
index fbc66be..dab1ebb 100644
--- a/nbd.c
+++ b/nbd.c
@@ -1446,7 +1446,7 @@ static void nbd_set_handlers(NBDClient *client)
 {
 if (client->exp && client->exp->ctx) {
 aio_set_fd_handler(client->exp->ctx, client->sock,
-   false,
+   true,
client->can_read ? nbd_read : NULL,
client->send_coroutine ? nbd_restart_write : NULL,
client);
@@ -1457,7 +1457,7 @@ static void nbd_unset_handlers(NBDClient *client)
 {
 if (client->exp && client->exp->ctx) {
 aio_set_fd_handler(client->exp->ctx, client->sock,
-   false, NULL, NULL, NULL);
+   true, NULL, NULL, NULL);
 }
 }
 
-- 
2.4.3




[Qemu-devel] [PATCH v6 08/12] block: Add "drained begin/end" for transactional blockdev-backup

2015-10-22 Thread Fam Zheng
Similar to the previous patch, make sure that external events are not
dispatched during transaction operations.

Signed-off-by: Fam Zheng 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
---
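
A simplified sketch of the pairing this patch sets up, using hypothetical
Demo* types rather than the real BlockdevBackupState: the device is recorded
and drained in prepare before anything that can fail, so that clean can
always undo it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoDevice {
    int quiesce_counter;        /* > 0 while external requests are held off */
} DemoDevice;

typedef struct DemoTxnState {
    DemoDevice *dev;            /* set before any step that can fail */
    bool context_held;
} DemoTxnState;

void demo_drained_begin(DemoDevice *dev)
{
    dev->quiesce_counter++;
}

void demo_drained_end(DemoDevice *dev)
{
    assert(dev->quiesce_counter > 0);
    dev->quiesce_counter--;
}

/* Returns false if starting the job failed; clean still runs afterwards. */
bool demo_prepare(DemoTxnState *s, DemoDevice *dev, bool start_ok)
{
    s->context_held = true;
    s->dev = dev;
    demo_drained_begin(dev);
    return start_ok;
}

/* Safe to call whether or not prepare got as far as draining. */
void demo_clean(DemoTxnState *s)
{
    if (s->context_held) {
        demo_drained_end(s->dev);
        s->context_held = false;
    }
}
```

This is why the diff moves the state->bs assignment up: clean needs a valid
device pointer even when the backup job itself failed to start.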
 blockdev.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 0a7848b..52f44b2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1756,6 +1756,11 @@ static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
 return;
 }
 
+if (!blk_is_available(blk)) {
+error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, backup->device);
+return;
+}
+
 target = blk_by_name(backup->target);
 if (!target) {
 error_setg(errp, "Device '%s' not found", backup->target);
@@ -1770,6 +1775,8 @@ static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
 return;
 }
 aio_context_acquire(state->aio_context);
+state->bs = blk_bs(blk);
+bdrv_drained_begin(state->bs);
 
 qmp_blockdev_backup(backup->device, backup->target,
 backup->sync,
@@ -1782,7 +1789,6 @@ static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
 return;
 }
 
-state->bs = blk_bs(blk);
 state->job = state->bs->job;
 }
 
@@ -1802,6 +1808,7 @@ static void blockdev_backup_clean(BlkTransactionState *common)
 BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
 
 if (state->aio_context) {
+bdrv_drained_end(state->bs);
 aio_context_release(state->aio_context);
 }
 }
-- 
2.4.3




Re: [Qemu-devel] Functions to intercept Disk IO information?

2015-10-22 Thread Fam Zheng
On Thu, 10/22 14:18, QuQ Edsel wrote:
> Thank you for the reply!
> But when I insert a printf message in dma_buf_rw, compile and boot up the VM,
> no messages are printed even when I open/create/edit a .txt or .jpeg
> file.
> If dma_buf_rw is called, it is supposed to print messages in the monitor...
> On the other hand, the dma_blk_io call in ide_dma_cb did happen when files
> are manipulated.

Yes, that was my mistake: I meant dma_blk_io, sorry for the confusion. :)
dma_buf_rw is only used by some device types, while dma_blk_io is used more
widely.
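
As a rough sketch of what a hook at that point could record (the struct and
function below are hypothetical and not part of QEMU), you could fill in one
record per request and format it for your report:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_SECTOR_SIZE 512    /* assumption: 512-byte sectors */

/* Hypothetical record; one per intercepted request. */
typedef struct DemoIoRecord {
    int64_t timestamp_ns;       /* when the request was submitted */
    int64_t sector;             /* first sector of the request */
    uint32_t nb_sectors;        /* request length in sectors */
    bool is_write;              /* direction of the transfer */
} DemoIoRecord;

/* Format one record as a single text line; returns the line length. */
int demo_format_io_record(char *buf, size_t len, const DemoIoRecord *rec)
{
    assert(buf != NULL && rec != NULL);
    return snprintf(buf, len,
                    "%" PRId64 " %s sector=%" PRId64 " bytes=%" PRIu64 "\n",
                    rec->timestamp_ns,
                    rec->is_write ? "write" : "read",
                    rec->sector,
                    (uint64_t)rec->nb_sectors * DEMO_SECTOR_SIZE);
}
```

Note that at this level QEMU only sees sectors and sizes. The guest's file
names and process ids are not visible here, so mapping a sector back to a
file has to be done separately.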

> Would it be the point to detect disk I/O and to intercept the information I
> need?
> Sorry, I really lack knowledge in this field...
> Does the PIO you mentioned mean NIC I/O?

PIO is programmed I/O: the guest CPU transfers the data through I/O ports
itself, so DMA is not used.

Fam

> 
> Edsel
> 
> 2015-10-22 9:22 GMT+08:00 Fam Zheng :
> 
> > On Thu, 10/22 01:13, QuQ Edsel wrote:
> > > Hi,
> > > My friends and I were assigned a task to find a point to insert a
> > > callback function to intercept disk I/O activities such as reading or
> > > writing a .txt file. Our final goal is to generate a report of a target
> > > process's or file's disk I/O activities. We have QEMU 2.3 with KVM
> > > enabled. We have been looking for such a point for a long time, but we
> > > are not really capable of such a task.
> > > People who had this project last year did so with TEMU 1.0 (probably
> > > QEMU 0.9), and the implementation point they had is dma_buf_rw(). The
> > > information in their report shows pid, timestamps, disk sector, buffer
> > > size and write/read for a target file (I am not even sure if such
> > > information is meaningful or useful).
> > > Currently I have tried to printf in functions such as dma_buf_rw /
> > > dma_blk_io / bdrv_aio_readv, etc. to see if they print out a message
> > > when I open/edit/save a .txt or .jpeg file. The first one just doesn't
> > > print at all, and the second and third ones print a lot after booting
> > > up the guest.
> > > I can see that the dma_blk_io function call in ide_dma_cb (core.c) may be
> > > related because it prints as I have activities on files. However, it also
> > > prints sometimes when I am not doing anything, so I am not that certain
> > > about it (and I don't know if write/read activities invoke such a
> > > function).
> >
> > Yes, dma_buf_rw should handle all your I/O unless you're doing PIO. I
> > think the other activity you see when you're not doing anything is from
> > the guest system's background tasks.
> >
> > Thanks,
> > Fam
> >
> > >
> > > Is there a correct/better point to intercept disk I/O information
> > > (especially for activities such as reading/writing a file)?
> > > Or what should I do to clearly get the needed information from the disk
> > > I/O functions?
> > > I would be so grateful to have the information.
> > > Thank you.
> > >
> > > The guest environment I have is 64-bit Windows 7 with a qcow2 image (not
> > > sure if relevant).
> > >
> >



Re: [Qemu-devel] [PATCH 00/40] Patch Round-up for stable 2.4.1, freeze on 2015-10-29

2015-10-22 Thread Markus Armbruster
I'm afraid

2d0583f qmp: Fix device-list-properties not to crash for abstract device
2874c65 qdev: Protect device-list-properties against broken devices
55b4efb Revert "qdev: Use qdev_get_device_class() for -device ,help"

unmask a bunch of device model bugs, so you need to pick their fixes,
too:

ac98fa8 update-linux-headers: Rename SW_MAX to SW_MAX_
c6047e9 virtio-input: Fix device introspection on non-Linux hosts
2e2b8eb memory: allow destroying a non-empty MemoryRegion
81e0ab4 hw: do not pass NULL to memory_region_init from instance_init
c710440 macio: move DBDMA_init from instance_init to realize

To check everything's sane, you can pick

e253c28 tests: Fix how qom-test is run
5fb48d9 libqtest: New hmp() & friends
2d1abb8 device-introspect-test: New, covering device introspection

and run make check.

I apologize for not communicating this better in the commit messages.



[Qemu-devel] [PATCH v3 07/21] block: Allow configuring whether to account failed and invalid ops

2015-10-22 Thread Alberto Garcia
This patch adds two options, "stats-account-invalid" and
"stats-account-failed", that decide whether invalid and failed I/O
operations are taken into account when collecting statistics for
latency and last access time.

Signed-off-by: Alberto Garcia 
Reviewed-by: Stefan Hajnoczi 
---
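
To make the intended behaviour concrete, here is a reduced model of the
failed-operation path (the DemoStats type is made up for this note and has
a single I/O type): the failure counter always moves, while latency and last
access time only move when the flag is set. Both options default to true in
blockdev_init().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for BlockAcctStats, with one I/O type only. */
typedef struct DemoStats {
    bool account_failed;            /* mirrors "stats-account-failed" */
    uint64_t failed_ops;
    uint64_t total_time_ns;
    int64_t last_access_time_ns;
} DemoStats;

/* Account one failed request that started at start_ns and ended at now_ns. */
void demo_acct_failed(DemoStats *s, int64_t start_ns, int64_t now_ns)
{
    assert(s != NULL && now_ns >= start_ns);

    s->failed_ops++;                /* always counted */
    if (s->account_failed) {
        s->total_time_ns += (uint64_t)(now_ns - start_ns);
        s->last_access_time_ns = now_ns;
    }
}
```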
 block/accounting.c | 24 +++-
 block/qapi.c   |  3 +++
 blockdev.c | 16 
 include/block/accounting.h |  5 +
 qapi/block-core.json   | 17 -
 qmp-commands.hx| 25 -
 6 files changed, 79 insertions(+), 11 deletions(-)

diff --git a/block/accounting.c b/block/accounting.c
index 49a9444..923aeaf 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -28,6 +28,13 @@
 
 static QEMUClockType clock_type = QEMU_CLOCK_REALTIME;
 
+void block_acct_init(BlockAcctStats *stats, bool account_invalid,
+ bool account_failed)
+{
+stats->account_invalid = account_invalid;
+stats->account_failed = account_failed;
+}
+
 void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie,
   int64_t bytes, enum BlockAcctType type)
 {
@@ -53,13 +60,17 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie)
 
 void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie)
 {
-int64_t time_ns = qemu_clock_get_ns(clock_type);
-
 assert(cookie->type < BLOCK_MAX_IOTYPE);
 
 stats->failed_ops[cookie->type]++;
-stats->total_time_ns[cookie->type] += time_ns - cookie->start_time_ns;
-stats->last_access_time_ns = time_ns;
+
+if (stats->account_failed) {
+int64_t time_ns = qemu_clock_get_ns(clock_type);
+int64_t latency_ns = time_ns - cookie->start_time_ns;
+
+stats->total_time_ns[cookie->type] += latency_ns;
+stats->last_access_time_ns = time_ns;
+}
 }
 
 void block_acct_invalid(BlockAcctStats *stats, enum BlockAcctType type)
@@ -72,7 +83,10 @@ void block_acct_invalid(BlockAcctStats *stats, enum BlockAcctType type)
  * therefore there's no actual I/O involved. */
 
 stats->invalid_ops[type]++;
-stats->last_access_time_ns = qemu_clock_get_ns(clock_type);
+
+if (stats->account_invalid) {
+stats->last_access_time_ns = qemu_clock_get_ns(clock_type);
+}
 }
 
 void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
diff --git a/block/qapi.c b/block/qapi.c
index 84d8412..56c8139 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -372,6 +372,9 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs,
 if (s->stats->has_idle_time_ns) {
 s->stats->idle_time_ns = block_acct_idle_time_ns(stats);
 }
+
+s->stats->account_invalid = stats->account_invalid;
+s->stats->account_failed = stats->account_failed;
 }
 
 s->stats->wr_highest_offset = bs->wr_highest_offset;
diff --git a/blockdev.c b/blockdev.c
index b79b0a6..94635b5 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -467,6 +467,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
 const char *buf;
 int bdrv_flags = 0;
 int on_read_error, on_write_error;
+bool account_invalid, account_failed;
 BlockBackend *blk;
 BlockDriverState *bs;
 ThrottleConfig cfg;
@@ -503,6 +504,9 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
 /* extract parameters */
 snapshot = qemu_opt_get_bool(opts, "snapshot", 0);
 
+account_invalid = qemu_opt_get_bool(opts, "stats-account-invalid", true);
+account_failed = qemu_opt_get_bool(opts, "stats-account-failed", true);
+
 extract_common_blockdev_options(opts, &bdrv_flags, &throttling_group, &cfg,
 &detect_zeroes, &error);
 if (error) {
@@ -599,6 +603,8 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
 if (bdrv_key_required(bs)) {
 autostart = 0;
 }
+
+block_acct_init(blk_get_stats(blk), account_invalid, account_failed);
 }
 
 blk_set_on_error(blk, on_read_error, on_write_error);
@@ -3519,6 +3525,16 @@ QemuOptsList qemu_common_drive_opts = {
 .name = "detect-zeroes",
 .type = QEMU_OPT_STRING,
 .help = "try to optimize zero writes (off, on, unmap)",
+},{
+.name = "stats-account-invalid",
+.type = QEMU_OPT_BOOL,
+.help = "whether to account for invalid I/O operations "
+"in the statistics",
+},{
+.name = "stats-account-failed",
+.type = QEMU_OPT_BOOL,
+.help = "whether to account for failed I/O operations "
+"in the statistics",
 },
 { /* end of list */ }
 },
diff --git a/include/block/accounting.h b/include/block/accounting.h
index b50e3cc..0d9b076 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -25,6 +25,7 @@
 

[Qemu-devel] [PATCH v3 03/21] block: define 'clock_type' for the accounting code

2015-10-22 Thread Alberto Garcia
Its value is still QEMU_CLOCK_REALTIME, but having it in a variable will
allow us to change its value easily in the future when running in qtest
mode.

Signed-off-by: Alberto Garcia 
Reviewed-by: Stefan Hajnoczi 
---
 block/accounting.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/accounting.c b/block/accounting.c
index a423560..6f4c0f1 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -26,13 +26,15 @@
 #include "block/block_int.h"
 #include "qemu/timer.h"
 
+static QEMUClockType clock_type = QEMU_CLOCK_REALTIME;
+
 void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie,
   int64_t bytes, enum BlockAcctType type)
 {
 assert(type < BLOCK_MAX_IOTYPE);
 
 cookie->bytes = bytes;
-cookie->start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+cookie->start_time_ns = qemu_clock_get_ns(clock_type);
 cookie->type = type;
 }
 
@@ -43,7 +45,7 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie)
 stats->nr_bytes[cookie->type] += cookie->bytes;
 stats->nr_ops[cookie->type]++;
 stats->total_time_ns[cookie->type] +=
-qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - cookie->start_time_ns;
+qemu_clock_get_ns(clock_type) - cookie->start_time_ns;
 }
 
 
-- 
2.6.1




[Qemu-devel] [PATCH v3 05/21] block: Add idle_time_ns to BlockDeviceStats

2015-10-22 Thread Alberto Garcia
This patch adds the new field 'idle_time_ns' to the BlockDeviceStats
structure, indicating the time that has passed since the previous I/O
operation.

It also adds the block_acct_idle_time_ns() call, to ensure that all
references to the clock type used for accounting are in the same
place. This will later allow us to use a different clock for iotests.

Signed-off-by: Alberto Garcia 
---
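
A small sketch of the optional-field semantics, with the clock passed in as
a parameter so the result is reproducible. The DemoAcct type and the
function name are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct DemoAcct {
    int64_t last_access_time_ns;    /* 0 until the first I/O completes */
} DemoAcct;

/*
 * Returns false when there has been no I/O yet (the field would be absent
 * from the stats); otherwise stores the idle time in *idle_ns.
 */
bool demo_idle_time_ns(const DemoAcct *a, int64_t now_ns, int64_t *idle_ns)
{
    assert(a != NULL && idle_ns != NULL);

    if (a->last_access_time_ns <= 0) {
        return false;
    }
    *idle_ns = now_ns - a->last_access_time_ns;
    return true;
}
```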
 block/accounting.c | 12 ++--
 block/qapi.c   |  5 +
 hmp.c  |  4 +++-
 include/block/accounting.h |  2 ++
 qapi/block-core.json   |  6 +-
 qmp-commands.hx| 10 --
 6 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/block/accounting.c b/block/accounting.c
index 6f4c0f1..d427fa8 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -40,12 +40,15 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie,
 
 void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie)
 {
+int64_t time_ns = qemu_clock_get_ns(clock_type);
+int64_t latency_ns = time_ns - cookie->start_time_ns;
+
 assert(cookie->type < BLOCK_MAX_IOTYPE);
 
 stats->nr_bytes[cookie->type] += cookie->bytes;
 stats->nr_ops[cookie->type]++;
-stats->total_time_ns[cookie->type] +=
-qemu_clock_get_ns(clock_type) - cookie->start_time_ns;
+stats->total_time_ns[cookie->type] += latency_ns;
+stats->last_access_time_ns = time_ns;
 }
 
 
@@ -55,3 +58,8 @@ void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
 assert(type < BLOCK_MAX_IOTYPE);
 stats->merged[type] += num_requests;
 }
+
+int64_t block_acct_idle_time_ns(BlockAcctStats *stats)
+{
+return qemu_clock_get_ns(clock_type) - stats->last_access_time_ns;
+}
diff --git a/block/qapi.c b/block/qapi.c
index ec0f513..539c2e3 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -357,6 +357,11 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs,
 s->stats->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
 s->stats->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
 s->stats->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
+
+s->stats->has_idle_time_ns = stats->last_access_time_ns > 0;
+if (s->stats->has_idle_time_ns) {
+s->stats->idle_time_ns = block_acct_idle_time_ns(stats);
+}
 }
 
 s->stats->wr_highest_offset = bs->wr_highest_offset;
diff --git a/hmp.c b/hmp.c
index 28caa7d..8ee473e 100644
--- a/hmp.c
+++ b/hmp.c
@@ -522,6 +522,7 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict)
" flush_total_time_ns=%" PRId64
" rd_merged=%" PRId64
" wr_merged=%" PRId64
+   " idle_time_ns=%" PRId64
"\n",
stats->value->stats->rd_bytes,
stats->value->stats->wr_bytes,
@@ -532,7 +533,8 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict)
stats->value->stats->rd_total_time_ns,
stats->value->stats->flush_total_time_ns,
stats->value->stats->rd_merged,
-   stats->value->stats->wr_merged);
+   stats->value->stats->wr_merged,
+   stats->value->stats->idle_time_ns);
 }
 
 qapi_free_BlockStatsList(stats_list);
diff --git a/include/block/accounting.h b/include/block/accounting.h
index 66637cd..4b2b999 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -40,6 +40,7 @@ typedef struct BlockAcctStats {
 uint64_t nr_ops[BLOCK_MAX_IOTYPE];
 uint64_t total_time_ns[BLOCK_MAX_IOTYPE];
 uint64_t merged[BLOCK_MAX_IOTYPE];
+int64_t last_access_time_ns;
 } BlockAcctStats;
 
 typedef struct BlockAcctCookie {
@@ -53,5 +54,6 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie,
 void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie);
 void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
int num_requests);
+int64_t block_acct_idle_time_ns(BlockAcctStats *stats);
 
 #endif
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5f12af7..69c3e1f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -448,6 +448,10 @@
 # @wr_merged: Number of write requests that have been merged into another
 # request (Since 2.3).
 #
+# @idle_time_ns: #optional Time since the last I/O operation, in
+#nanoseconds. If the field is absent it means that
+#there haven't been any operations yet (Since 2.5).
+#
 # Since: 0.14.0
 ##
 { 'struct': 'BlockDeviceStats',
@@ -455,7 +459,7 @@
'wr_operations': 'int', 'flush_operations': 'int',
'flush_total_time_ns': 'int', 'wr_total_time_ns': 'int',
'rd_total_time_ns': 'int', 'wr_highest_offset': 'int',
-   

[Qemu-devel] [PATCH v3 12/21] block: Use QEMU_CLOCK_VIRTUAL for the accounting code in qtest mode

2015-10-22 Thread Alberto Garcia
This patch switches to QEMU_CLOCK_VIRTUAL for the accounting code in
qtest mode, and makes the latency of the operation constant. This way we
can perform tests on the accounting code with reproducible results.

Signed-off-by: Alberto Garcia 
---
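
For reference, NANOSECONDS_PER_SECOND / 1000 is 1,000,000 ns, so every
operation is accounted as taking exactly 1 ms in qtest mode. A minimal
sketch of that selection, using demo names rather than the QEMU ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NANOSECONDS_PER_SECOND 1000000000LL

/* Fixed latency reported in test mode: 1 ms. */
static const int64_t demo_qtest_latency_ns =
    DEMO_NANOSECONDS_PER_SECOND / 1000;

/* Measured latency normally, a constant when running under the test. */
int64_t demo_latency_ns(bool test_mode, int64_t start_ns, int64_t now_ns)
{
    assert(now_ns >= start_ns);
    return test_mode ? demo_qtest_latency_ns : now_ns - start_ns;
}
```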
 block/accounting.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/block/accounting.c b/block/accounting.c
index a941931..05a5c5f 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -25,14 +25,20 @@
 #include "block/accounting.h"
 #include "block/block_int.h"
 #include "qemu/timer.h"
+#include "sysemu/qtest.h"
 
 static QEMUClockType clock_type = QEMU_CLOCK_REALTIME;
+static const int qtest_latency_ns = NANOSECONDS_PER_SECOND / 1000;
 
 void block_acct_init(BlockAcctStats *stats, bool account_invalid,
  bool account_failed)
 {
 stats->account_invalid = account_invalid;
 stats->account_failed = account_failed;
+
+if (qtest_enabled()) {
+clock_type = QEMU_CLOCK_VIRTUAL;
+}
 }
 
 void block_acct_cleanup(BlockAcctStats *stats)
@@ -84,6 +90,10 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie)
 int64_t time_ns = qemu_clock_get_ns(clock_type);
 int64_t latency_ns = time_ns - cookie->start_time_ns;
 
+if (qtest_enabled()) {
+latency_ns = qtest_latency_ns;
+}
+
 assert(cookie->type < BLOCK_MAX_IOTYPE);
 
 stats->nr_bytes[cookie->type] += cookie->bytes;
@@ -107,6 +117,10 @@ void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie)
 int64_t time_ns = qemu_clock_get_ns(clock_type);
 int64_t latency_ns = time_ns - cookie->start_time_ns;
 
+if (qtest_enabled()) {
+latency_ns = qtest_latency_ns;
+}
+
 stats->total_time_ns[cookie->type] += latency_ns;
 stats->last_access_time_ns = time_ns;
 
-- 
2.6.1




[Qemu-devel] [PULL 06/10] tcg/mips: Add use_mips32r6_instructions definition

2015-10-22 Thread Richard Henderson
From: James Hogan 

Add definition use_mips32r6_instructions to the MIPS TCG backend which
is constant 1 when built for MIPS release 6. This will be used to decide
between pre-R6 and R6 instruction encodings.

Reviewed-by: Aurelien Jarno 
Signed-off-by: James Hogan 
Signed-off-by: Richard Henderson 
Message-Id: <1443788657-14537-4-git-send-email-james.ho...@imgtec.com>
---
 tcg/mips/tcg-target.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index f5ba52c..e579c10 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -96,6 +96,13 @@ extern bool use_mips32_instructions;
 extern bool use_mips32r2_instructions;
 #endif
 
+/* MIPS32R6 instruction set detection */
+#if defined(__mips_isa_rev) && (__mips_isa_rev >= 6)
+#define use_mips32r6_instructions  1
+#else
+#define use_mips32r6_instructions  0
+#endif
+
 /* optional instructions */
 #define TCG_TARGET_HAS_div_i32  1
 #define TCG_TARGET_HAS_rem_i32  1
-- 
2.4.3




Re: [Qemu-devel] [PATCH v9 06/17] qapi-visit: Remove redundant functions for flat union base

2015-10-22 Thread Markus Armbruster
Eric Blake  writes:

> The code for visiting the base class of a child struct created
> visit_type_Base_fields() which covers all fields of Base; while
> the code for visiting the base class of a flat union created
> visit_type_Union_fields() covering all fields of the base
> except the discriminator.  But if the base class were to always
> visit all its fields, then we wouldn't need a separate visit of
> the discriminator for a union.  Not only is consistently
> visiting all fields easier to understand, it lets us share code.
>
> Now that gen_visit_struct_fields() can potentially collect more
> than one function into 'ret', a regular expression searching for
> whether a label was used may hit a false positive within the
> body of the first function.  But using a regex was overkill,
> since we can easily determine when we jumped to a label.
>
> Signed-off-by: Eric Blake 
>
> ---
> v9: (no v6-8): hoist from v5 35/46; rebase to master; fix indentation
> botch in gen_visit_union(); polish commit message
> ---
>  scripts/qapi-visit.py | 35 +--
>  1 file changed, 17 insertions(+), 18 deletions(-)
>
> diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py
> index 8aae8da..91bf350 100644
> --- a/scripts/qapi-visit.py
> +++ b/scripts/qapi-visit.py
> @@ -90,7 +90,7 @@ static void visit_type_%(c_name)s_fields(Visitor *v, %(c_name)s **obj, Error **e
>
>  ret += gen_visit_fields(members, prefix='(*obj)->')
>
> -if re.search('^ *goto out;', ret, re.MULTILINE):
> +if base or members:

What if we have an empty base and no members?  Empty base is a
pathological case, admittedly.

>  ret += mcgen('''
>
>  out:
> @@ -221,8 +221,8 @@ def gen_visit_union(name, base, variants):
>  ret = ''
>
>  if base:
> -members = [m for m in base.members if m != variants.tag_member]
> -ret += gen_visit_struct_fields(name, None, members)
> +ret += gen_visit_struct_fields(base.name, base.base,
> +   base.local_members)
>
>  for var in variants.variants:
>  # Ugly special case for simple union TODO get rid of it
> @@ -247,31 +247,30 @@ void visit_type_%(c_name)s(Visitor *v, %(c_name)s **obj, const char *name, Error
>
>  if base:
>  ret += mcgen('''
> -visit_type_%(c_name)s_fields(v, obj, &err);
> +visit_type_%(c_name)s_fields(v, (%(c_name)s **)obj, &err);
>  ''',
> - c_name=c_name(name))
> -ret += gen_err_check(label='out_obj')
> -
> -tag_key = variants.tag_member.name
> -if not variants.tag_name:
> -# we pointlessly use a different key for simple unions
> -tag_key = 'type'
> -ret += mcgen('''
> + c_name=base.c_name())
> +else:
> +ret += mcgen('''
>  visit_type_%(c_type)s(v, &(*obj)->%(c_name)s, "%(name)s", &err);
> -if (err) {
> -goto out_obj;
> -}
> +''',
> + c_type=variants.tag_member.type.c_name(),
> + # TODO ugly special case for simple union
> + # Use same tag name in C as on the wire to get rid of
> + # it, then: c_name=c_name(variants.tag_member.name)
> + c_name='kind',
> + name=variants.tag_member.name)
> +ret += gen_err_check(label='out_obj')
> +ret += mcgen('''
>  if (!visit_start_union(v, !!(*obj)->data, &err) || err) {
>  goto out_obj;
>  }
>  switch ((*obj)->%(c_name)s) {
>  ''',
> - c_type=variants.tag_member.type.c_name(),
>   # TODO ugly special case for simple union
>   # Use same tag name in C as on the wire to get rid of
>   # it, then: c_name=c_name(variants.tag_member.name)
> - c_name=c_name(variants.tag_name or 'kind'),
> - name=tag_key)
> + c_name=c_name(variants.tag_name or 'kind'))
>
>  for var in variants.variants:
>  # TODO ugly special case for simple union

Diff is confusing (not your fault).  Let me compare code before and
after real slow.

= Before =

  def gen_visit_union(name, base, variants):
  ret = ''

0. base is None if and only if the union is simple.

1. If it's a flat union, generate its visit_type_NAME_fields().  This
function visits the union's non-variant members *except* the
discriminator.  Since a simple union has no non-variant members other
than the discriminator, generate it only for flat unions.

  if base:
  members = [m for m in base.members if m != variants.tag_member]
  ret += gen_visit_struct_fields(name, None, members)

2. Generate the visit_type_implicit_FOO() we're going to need.

  for var in variants.variants:
  # Ugly special case for simple union TODO get rid of it
  if not var.simple_union_type():
  ret += gen_visit_implicit_struct(var.type)

3. Generate its 

[Qemu-devel] [PATCH v6 12/12] tests: Add test case for aio_disable_external

2015-10-22 Thread Fam Zheng
Signed-off-by: Fam Zheng 
Reviewed-by: Kevin Wolf 
---
 tests/test-aio.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/tests/test-aio.c b/tests/test-aio.c
index 03cd45d..1623803 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -374,6 +374,29 @@ static void test_flush_event_notifier(void)
 event_notifier_cleanup(&data.e);
 }
 
+static void test_aio_external_client(void)
+{
+    int i, j;
+
+    for (i = 1; i < 3; i++) {
+        EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true };
+        event_notifier_init(&data.e, false);
+        aio_set_event_notifier(ctx, &data.e, true, event_ready_cb);
+        event_notifier_set(&data.e);
+        for (j = 0; j < i; j++) {
+            aio_disable_external(ctx);
+        }
+        for (j = 0; j < i; j++) {
+            assert(!aio_poll(ctx, false));
+            assert(event_notifier_test_and_clear(&data.e));
+            event_notifier_set(&data.e);
+            aio_enable_external(ctx);
+        }
+        assert(aio_poll(ctx, false));
+        event_notifier_cleanup(&data.e);
+    }
+}
+
 static void test_wait_event_notifier_noflush(void)
 {
 EventNotifierTestData data = { .n = 0 };
@@ -832,6 +855,7 @@ int main(int argc, char **argv)
 g_test_add_func("/aio/event/wait",  test_wait_event_notifier);
 g_test_add_func("/aio/event/wait/no-flush-cb",  test_wait_event_notifier_noflush);
 g_test_add_func("/aio/event/flush", test_flush_event_notifier);
+g_test_add_func("/aio/external-client", test_aio_external_client);
 g_test_add_func("/aio/timer/schedule",  test_timer_schedule);
 
 g_test_add_func("/aio-gsource/flush",   test_source_flush);
-- 
2.4.3




[Qemu-devel] [PATCH v3 21/21] block: Update copyright of the accounting code

2015-10-22 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 block/accounting.c | 1 +
 include/block/accounting.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/block/accounting.c b/block/accounting.c
index 05a5c5f..185025e 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -2,6 +2,7 @@
  * QEMU System Emulator block accounting
  *
  * Copyright (c) 2011 Christoph Hellwig
+ * Copyright (c) 2015 Igalia, S.L.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff --git a/include/block/accounting.h b/include/block/accounting.h
index f41ddde..0215a4a 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -2,6 +2,7 @@
  * QEMU System Emulator block accounting
  *
  * Copyright (c) 2011 Christoph Hellwig
+ * Copyright (c) 2015 Igalia, S.L.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
-- 
2.6.1




[Qemu-devel] [PATCH v3 09/21] block: Add average I/O queue depth to BlockDeviceTimedStats

2015-10-22 Thread Alberto Garcia
This patch adds two new fields to BlockDeviceTimedStats that track the
average number of pending read and write requests for a block device.

The values are calculated for the period of time defined for that
interval.

Signed-off-by: Alberto Garcia 
---
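
The calculation is the sum of the request latencies in the interval divided
by the time elapsed in that interval. As a worked example, four requests of
5 ms each inside a 10 ms window give 20 ms / 10 ms = 2.0, i.e. two requests
in flight on average. A standalone sketch follows; the DemoTimedAverage type
and the zero-elapsed guard are assumptions of this note, not part of the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for what timed_average_sum() reports. */
typedef struct DemoTimedAverage {
    uint64_t latency_sum_ns;    /* sum of latencies in the window */
    uint64_t elapsed_ns;        /* time covered by the window */
} DemoTimedAverage;

/* Average number of pending requests over the window. */
double demo_queue_depth(const DemoTimedAverage *ta)
{
    assert(ta != NULL);

    if (ta->elapsed_ns == 0) {
        return 0.0;             /* assumption: avoid dividing by zero */
    }
    return (double)ta->latency_sum_ns / (double)ta->elapsed_ns;
}
```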
 block/accounting.c   | 12 
 block/qapi.c |  5 +
 include/block/accounting.h   |  2 ++
 include/qemu/timed-average.h |  1 +
 qapi/block-core.json |  9 -
 qmp-commands.hx  |  6 ++
 util/timed-average.c | 17 +
 7 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/block/accounting.c b/block/accounting.c
index 61de8ce..a941931 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -143,3 +143,15 @@ int64_t block_acct_idle_time_ns(BlockAcctStats *stats)
 {
 return qemu_clock_get_ns(clock_type) - stats->last_access_time_ns;
 }
+
+double block_acct_queue_depth(BlockAcctTimedStats *stats,
+  enum BlockAcctType type)
+{
+uint64_t sum, elapsed;
+
+assert(type < BLOCK_MAX_IOTYPE);
+
+sum = timed_average_sum(&stats->latency[type], &elapsed);
+
+return (double) sum / elapsed;
+}
diff --git a/block/qapi.c b/block/qapi.c
index 4baf6e1..99d5303 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -402,6 +402,11 @@ static BlockStats *bdrv_query_stats(const BlockDriverState 
*bs,
 dev_stats->min_flush_latency_ns = timed_average_min(fl);
 dev_stats->max_flush_latency_ns = timed_average_max(fl);
 dev_stats->avg_flush_latency_ns = timed_average_avg(fl);
+
+dev_stats->avg_rd_queue_depth =
+block_acct_queue_depth(ts, BLOCK_ACCT_READ);
+dev_stats->avg_wr_queue_depth =
+block_acct_queue_depth(ts, BLOCK_ACCT_WRITE);
 }
 }
 
diff --git a/include/block/accounting.h b/include/block/accounting.h
index 09605bb..f41ddde 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -78,5 +78,7 @@ void block_acct_invalid(BlockAcctStats *stats, enum 
BlockAcctType type);
 void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
int num_requests);
 int64_t block_acct_idle_time_ns(BlockAcctStats *stats);
+double block_acct_queue_depth(BlockAcctTimedStats *stats,
+  enum BlockAcctType type);
 
 #endif
diff --git a/include/qemu/timed-average.h b/include/qemu/timed-average.h
index f1cdddc..364bf88 100644
--- a/include/qemu/timed-average.h
+++ b/include/qemu/timed-average.h
@@ -59,5 +59,6 @@ void timed_average_account(TimedAverage *ta, uint64_t value);
 uint64_t timed_average_min(TimedAverage *ta);
 uint64_t timed_average_avg(TimedAverage *ta);
 uint64_t timed_average_max(TimedAverage *ta);
+uint64_t timed_average_sum(TimedAverage *ta, uint64_t *elapsed);
 
 #endif
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 741f7e6..e32b523 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -450,6 +450,12 @@
 # @avg_flush_latency_ns: Average latency of flush operations in the
 #defined interval, in nanoseconds.
 #
+# @avg_rd_queue_depth: Average number of pending read operations
+#  in the defined interval.
+#
+# @avg_wr_queue_depth: Average number of pending write operations
+#  in the defined interval.
+#
 # Since: 2.5
 ##
 
@@ -458,7 +464,8 @@
 'max_rd_latency_ns': 'int', 'avg_rd_latency_ns': 'int',
 'min_wr_latency_ns': 'int', 'max_wr_latency_ns': 'int',
 'avg_wr_latency_ns': 'int', 'min_flush_latency_ns': 'int',
-'max_flush_latency_ns': 'int', 'avg_flush_latency_ns': 'int' } }
+'max_flush_latency_ns': 'int', 'avg_flush_latency_ns': 'int',
+'avg_rd_queue_depth': 'number', 'avg_wr_queue_depth': 'number' } }
 
 ##
 # @BlockDeviceStats:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 9f1d2ab..526a317 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2578,6 +2578,12 @@ Each json-object contain the following:
 - "avg_flush_latency_ns": average latency of flush operations
   in the defined interval, in
   nanoseconds (json-int)
+- "avg_rd_queue_depth": average number of pending read
+operations in the defined interval
+(json-number)
+- "avg_wr_queue_depth": average number of pending write
+operations in the defined interval
+(json-number).
 - "parent": Contains recursively the statistics of the underlying
 protocol (e.g. the host file for a qcow2 image). If there is
 no underlying protocol, this field is omitted
diff --git a/util/timed-average.c b/util/timed-average.c
index 98a1170..70926ef 100644
--- a/util/timed-average.c
+++ b/util/timed-average.c
@@ -208,3 

Re: [Qemu-devel] [PATCH RFC V5 8/9] target-arm/cpu64 GICv3 system instructions support

2015-10-22 Thread Shlomo Pongratz
Hi

On Thursday, October 22, 2015, Pavel Fedin  wrote:

>  Hello!
>
> > -Original Message-
> > From: Shlomo Pongratz [mailto:shlomopongr...@gmail.com ]
> > Sent: Tuesday, October 20, 2015 8:22 PM
> > To: qemu-devel@nongnu.org 
> > Cc: p.fe...@samsung.com ; peter.mayd...@linaro.org
> ; eric.au...@linaro.org ;
> > shannon.z...@linaro.org ; imamm...@redhat.com
> ; ash...@broadcom.com ; Shlomo Pongratz
> > Subject: [PATCH RFC V5 8/9] target-arm/cpu64 GICv3 system instructions
> support
> >
> > From: Shlomo Pongratz >
> >
> > Add system instructions used by the Linux (kernel) GICv3
> > device driver
> >
> > Signed-off-by: Shlomo Pongratz  >
> > ---
> >  target-arm/cpu-qom.h |   1 +
> >  target-arm/cpu.h |  12 ++
> >  target-arm/cpu64.c   | 118
> +++
> >  3 files changed, 131 insertions(+)
> >
> > diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
> > index 25fb1ce..6a50433 100644
> > --- a/target-arm/cpu-qom.h
> > +++ b/target-arm/cpu-qom.h
> > @@ -220,6 +220,7 @@ hwaddr arm_cpu_get_phys_page_debug(CPUState *cpu,
> vaddr addr);
> >
> >  int arm_cpu_gdb_read_register(CPUState *cpu, uint8_t *buf, int reg);
> >  int arm_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
> > +void aarch64_registers_with_opaque_set(Object *obj, void *opaque);
> >
> >  /* Callback functions for the generic timer's timers. */
> >  void arm_gt_ptimer_cb(void *opaque);
> > diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> > index 3daa7f5..d561313 100644
> > --- a/target-arm/cpu.h
> > +++ b/target-arm/cpu.h
> > @@ -1034,6 +1034,18 @@ void armv7m_nvic_set_pending(void *opaque, int
> irq);
> >  int armv7m_nvic_acknowledge_irq(void *opaque);
> >  void armv7m_nvic_complete_irq(void *opaque, int irq);
> >
> > +void armv8_gicv3_set_sgi(void *opaque, int cpuindex, uint64_t value);
> > +uint64_t armv8_gicv3_acknowledge_irq(void *opaque, int cpuindex,
> > +  MemTxAttrs attrs);
> > +void armv8_gicv3_complete_irq(void *opaque, int cpuindex, int irq,
> > +  MemTxAttrs attrs);
> > +uint64_t armv8_gicv3_get_priority_mask(void *opaque, int cpuindex);
> > +void armv8_gicv3_set_priority_mask(void *opaque, int cpuindex, uint32_t
> mask);
> > +uint64_t armv8_gicv3_get_sre(void *opaque);
> > +void armv8_gicv3_set_sre(void *opaque, uint64_t sre);
> > +uint64_t armv8_gicv3_get_igrpen1(void *opaque, int cpuindex);
> > +void armv8_gicv3_set_igrpen1(void *opaque, int cpuindex, uint64_t
> igrpen1);
> > +
> >  /* Interface for defining coprocessor registers.
> >   * Registers are defined in tables of arm_cp_reginfo structs
> >   * which are passed to define_arm_cp_regs().
> > diff --git a/target-arm/cpu64.c b/target-arm/cpu64.c
> > index 63c8b1c..4224779 100644
> > --- a/target-arm/cpu64.c
> > +++ b/target-arm/cpu64.c
> > @@ -45,6 +45,115 @@ static uint64_t a57_a53_l2ctlr_read(CPUARMState
> *env, const ARMCPRegInfo
> > *ri)
> >  }
> >  #endif
> >
> > +#ifndef CONFIG_USER_ONLY
> > +static void sgi_write(CPUARMState *env, const ARMCPRegInfo *ri,
> uint64_t value)
> > +{
> > +CPUState *cpu = ENV_GET_CPU(env);
> > +armv8_gicv3_set_sgi(ri->opaque, cpu->cpu_index, value);
> > +}
> > +
> > +static uint64_t iar_read(CPUARMState *env, const ARMCPRegInfo *ri)
> > +{
> > +uint64_t value;
> > +MemTxAttrs attrs;;
> > +CPUState *cpu = ENV_GET_CPU(env);
> > +attrs.secure = arm_is_secure_below_el3(env) ? 1 : 0;
> > +value = armv8_gicv3_acknowledge_irq(ri->opaque, cpu->cpu_index,
> attrs);
> > +return value;
> > +}
> > +
> > +static void sre_write(CPUARMState *env, const ARMCPRegInfo *ri,
> uint64_t value)
> > +{
> > +armv8_gicv3_set_sre(ri->opaque, value);
> > +}
> > +
> > +static uint64_t sre_read(CPUARMState *env, const ARMCPRegInfo *ri)
> > +{
> > +uint64_t value;
> > +value = armv8_gicv3_get_sre(ri->opaque);
> > +return value;
> > +}
> > +
> > +static void eoir_write(CPUARMState *env, const ARMCPRegInfo *ri,
> uint64_t value)
> > +{
> > +MemTxAttrs attrs;
> > +CPUState *cpu = ENV_GET_CPU(env);
> > +attrs.secure = arm_is_secure_below_el3(env) ? 1 : 0;
> > +armv8_gicv3_complete_irq(ri->opaque, cpu->cpu_index, value, attrs);
> > +}
> > +
> > +static uint64_t pmr_read(CPUARMState *env, const ARMCPRegInfo *ri)
> > +{
> > +uint64_t value;
> > +CPUState *cpu = ENV_GET_CPU(env);
> > +value = armv8_gicv3_get_priority_mask(ri->opaque, cpu->cpu_index);
> > +return value;
> > +}
> > +
> > +static void pmr_write(CPUARMState *env, const ARMCPRegInfo *ri,
> uint64_t value)
> > +{
> > +CPUState *cpu = ENV_GET_CPU(env);
> > +armv8_gicv3_set_priority_mask(ri->opaque, cpu->cpu_index, value);
> > +}
> > +
> > +static uint64_t igrpen1_read(CPUARMState *env, 

Re: [Qemu-devel] [PATCH RFC V5 8/9] target-arm/cpu64 GICv3 system instructions support

2015-10-22 Thread Shlomo Pongratz
Hi

On Thursday, October 22, 2015, Pavel Fedin  wrote:

>  Hello!
>
> > I've implemented the registers accessed by Linux driver in
> drivers/irqchip/irq-gic-v3.c
> > If this register is used only with KVM e.g. virt/kvm/arm/vgic-v3.c than
> it is out of my mandate.
>
>  It has nothing to do with KVM. EFI is a firmware, which originates from
> Intel, but now adopted by ARM64 architecture too. You can also run it under
> qemu, if you want to make kind of "full" machine. And it writes some value
> to BPR1, which is indeed ignored by Linux kernel.
>
>
So where is it used in QEMU?
Which machine in hw/arm needs it?


> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
>
>


[Qemu-devel] [PATCH v1 3/5] sockets: remove use of QemuOpts from socket_connect

2015-10-22 Thread Daniel P. Berrange
The socket_connect method accepts a QAPI SocketAddress object
which it then turns into QemuOpts before calling the
inet_connect_opts/unix_connect_opts helper methods. By
converting the latter to use QAPI SocketAddress directly,
the QemuOpts conversion step can be eliminated

Signed-off-by: Daniel P. Berrange 
---
 util/qemu-sockets.c | 91 +
 1 file changed, 36 insertions(+), 55 deletions(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 768ca52..420f9ff 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -388,38 +388,34 @@ static int inet_connect_addr(struct addrinfo *addr, bool 
*in_progress,
 return sock;
 }
 
-static struct addrinfo *inet_parse_connect_opts(QemuOpts *opts, Error **errp)
+static struct addrinfo *inet_parse_connect_saddr(InetSocketAddress *saddr,
+ Error **errp)
 {
 struct addrinfo ai, *res;
 int rc;
-const char *addr;
-const char *port;
+Error *err = NULL;
 
 memset(&ai, 0, sizeof(ai));
 
 ai.ai_flags = AI_CANONNAME | AI_V4MAPPED | AI_ADDRCONFIG;
-ai.ai_family = PF_UNSPEC;
+ai.ai_family = inet_ai_family_from_address(saddr, &err);
 ai.ai_socktype = SOCK_STREAM;
 
-addr = qemu_opt_get(opts, "host");
-port = qemu_opt_get(opts, "port");
-if (addr == NULL || port == NULL) {
-error_setg(errp, "host and/or port not specified");
+if (err) {
+error_propagate(errp, err);
 return NULL;
 }
 
-if (qemu_opt_get_bool(opts, "ipv4", 0)) {
-ai.ai_family = PF_INET;
-}
-if (qemu_opt_get_bool(opts, "ipv6", 0)) {
-ai.ai_family = PF_INET6;
+if (saddr->host == NULL || saddr->port == NULL) {
+error_setg(errp, "host and/or port not specified");
+return NULL;
 }
 
 /* lookup */
-rc = getaddrinfo(addr, port, &ai, &res);
+rc = getaddrinfo(saddr->host, saddr->port, &ai, &res);
 if (rc != 0) {
-error_setg(errp, "address resolution failed for %s:%s: %s", addr, port,
-   gai_strerror(rc));
+error_setg(errp, "address resolution failed for %s:%s: %s",
+   saddr->host, saddr->port, gai_strerror(rc));
 return NULL;
 }
 return res;
@@ -428,8 +424,7 @@ static struct addrinfo *inet_parse_connect_opts(QemuOpts 
*opts, Error **errp)
 /**
  * Create a socket and connect it to an address.
  *
- * @opts: QEMU options, recognized parameters strings "host" and "port",
- *bools "ipv4" and "ipv6".
+ * @saddr: Inet socket address specification
  * @errp: set on error
  * @callback: callback function for non-blocking connect
  * @opaque: opaque for callback function
@@ -440,8 +435,8 @@ static struct addrinfo *inet_parse_connect_opts(QemuOpts 
*opts, Error **errp)
  * function succeeds, callback will be called when the connection
  * completes, with the file descriptor on success, or -1 on error.
  */
-static int inet_connect_opts(QemuOpts *opts, Error **errp,
- NonBlockingConnectHandler *callback, void *opaque)
+static int inet_connect_saddr(InetSocketAddress *saddr, Error **errp,
+  NonBlockingConnectHandler *callback, void 
*opaque)
 {
 Error *local_err = NULL;
 struct addrinfo *res, *e;
@@ -449,7 +444,7 @@ static int inet_connect_opts(QemuOpts *opts, Error **errp,
 bool in_progress;
 ConnectState *connect_state = NULL;
 
-res = inet_parse_connect_opts(opts, errp);
+res = inet_parse_connect_saddr(saddr, errp);
 if (!res) {
 return -1;
 }
@@ -701,17 +696,13 @@ int inet_listen(const char *str, char *ostr, int olen,
  **/
 int inet_connect(const char *str, Error **errp)
 {
-QemuOpts *opts;
 int sock = -1;
 InetSocketAddress *addr;
 
 addr = inet_parse(str, errp);
 if (addr != NULL) {
-opts = qemu_opts_create(&socket_optslist, NULL, 0, &error_abort);
-inet_addr_to_opts(opts, addr);
+sock = inet_connect_saddr(addr, errp, NULL, NULL);
 qapi_free_InetSocketAddress(addr);
-sock = inet_connect_opts(opts, errp, NULL, NULL);
-qemu_opts_del(opts);
 }
 return sock;
 }
@@ -733,7 +724,6 @@ int inet_nonblocking_connect(const char *str,
  NonBlockingConnectHandler *callback,
  void *opaque, Error **errp)
 {
-QemuOpts *opts;
 int sock = -1;
 InetSocketAddress *addr;
 
@@ -741,11 +731,8 @@ int inet_nonblocking_connect(const char *str,
 
 addr = inet_parse(str, errp);
 if (addr != NULL) {
-opts = qemu_opts_create(&socket_optslist, NULL, 0, &error_abort);
-inet_addr_to_opts(opts, addr);
+sock = inet_connect_saddr(addr, errp, callback, opaque);
 qapi_free_InetSocketAddress(addr);
-sock = inet_connect_opts(opts, errp, callback, opaque);
-qemu_opts_del(opts);
 }
 return sock;
 }
@@ -821,15 +808,14 @@ err:
 return -1;
 }
 
-static int 

[Qemu-devel] [RFC Patch 01/12] PCI: Add virtfn_index for struct pci_device

2015-10-22 Thread Lan Tianyu
Add a "virtfn_index" member to struct pci_dev to record the VF's index
within its PF. This will be used by the VF sysfs node handlers.

Signed-off-by: Lan Tianyu 
---
 drivers/pci/iov.c   | 1 +
 include/linux/pci.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index ee0ebff..065b6bb 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -136,6 +136,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int 
reset)
virtfn->physfn = pci_dev_get(dev);
virtfn->is_virtfn = 1;
virtfn->multifunction = 0;
+   virtfn->virtfn_index = id;
 
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = &dev->resource[i + PCI_IOV_RESOURCES];
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 353db8d..85c5531 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -356,6 +356,7 @@ struct pci_dev {
unsigned int io_window_1k:1; /* Intel P2P bridge 1K I/O windows */
unsigned int irq_managed:1;
pci_dev_flags_t dev_flags;
+   unsigned int virtfn_index;
atomic_t enable_cnt; /* pci_enable_device has been called */
 
u32 saved_config_space[16]; /* config space saved at 
suspend time */
-- 
1.8.4.rc0.1.g8f6a3e5.dirty




Re: [Qemu-devel] [PATCH RFC V5 1/9] hw/intc: Implement GIC-500 support files

2015-10-22 Thread Shlomo Pongratz
O.K.

On Wednesday, October 21, 2015, Pavel Fedin  wrote:

>  Hello!
>
> > Do you mean that in virt.c::create_gic I'll take the cpu's affinity from
> the cpu's property and not directly from
> > ARM_CPU(qemu_get_cpu(i))->mp_affinity
>
>  I mean that you can do it in your GIC's realize function. And, even
> better, in arm_gicv3_common_realize(), because KVM GICv3 live migration
> code will also need it:
> --- cut ---
> for (i = 0; i < s->num_cpu; i++) {
> Object *cpu = OBJECT(qemu_get_cpu(i));
> s->cpu[i].mp_affinity = object_property_get_int(cpu, "mp-affinity",
> NULL);
> }
> --- cut ---
>
> > I can do that but that depends on acceptance of your patch.
>
>  Peter ACKed it, just he doesn't like having unused code:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg03105.html
>  Just include it into your next respin and forget. :) Actually, i made my
> RFC so that you could just take 0001 and 0002 from it and use for your
> purpose. With additions, of course, if necessary.
>
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
>
>
>


[Qemu-devel] [PATCH qemu 0/2] ppc: Add "ibm,pa-features"

2015-10-22 Thread Alexey Kardashevskiy
This adds an "ibm,pa-features" property. Please comment. Thanks.


Benjamin Herrenschmidt (2):
  ppc: Add mmu_model defines for arch 2.03 and 2.07
  ppc/spapr: Add "ibm,pa-features" property to the device-tree

 hw/ppc/spapr.c  | 31 +++
 target-ppc/cpu.h| 11 ++-
 target-ppc/kvm.c| 15 ---
 target-ppc/mmu_helper.c | 16 
 target-ppc/translate.c  |  4 ++--
 target-ppc/translate_init.c |  5 +++--
 6 files changed, 62 insertions(+), 20 deletions(-)

-- 
2.5.0.rc3




[Qemu-devel] [PATCH qemu 2/2] ppc/spapr: Add "ibm,pa-features" property to the device-tree

2015-10-22 Thread Alexey Kardashevskiy
From: Benjamin Herrenschmidt 

LoPAPR defines a "ibm,pa-features" per-CPU device tree property which
describes extended features of the Processor Architecture.

This adds the property to the device tree. At the moment it is a copy
of what pHyp advertises, except for "I=1 (cache inhibited) Large Pages",
which is enabled for TCG and disabled when running under an HV KVM host
with a 4K system page size.

Signed-off-by: Benjamin Herrenschmidt 
[aik: rebased, changed commit log, moved ci_large_pages initialization,
renamed pa_features arrays]
Signed-off-by: Alexey Kardashevskiy 
---
 hw/ppc/spapr.c  | 31 +++
 target-ppc/cpu.h|  1 +
 target-ppc/kvm.c|  7 +++
 target-ppc/translate_init.c |  1 +
 4 files changed, 40 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3852ad1..21c1312 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -597,6 +597,24 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, 
int offset,
 uint32_t vcpus_per_socket = smp_threads * smp_cores;
 uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
 
+/* Note: we keep CI large pages off for now because a 64K capable guest
+ * provisioned with large pages might otherwise try to map a qemu
+ * framebuffer (or other kind of memory mapped PCI BAR) using 64K pages
+ * even if that qemu runs on a 4k host.
+ *
+ * We can later add this bit back when we are confident this is not
+ * an issue (!HV KVM or 64K host)
+ */
+uint8_t pa_features_206[] = { 6, 0,
+0xf6, 0x1f, 0xc7, 0x00, 0x80, 0xc0 };
+uint8_t pa_features_207[] = { 24, 0,
+0xf6, 0x1f, 0xc7, 0xc0, 0x80, 0xf0,
+0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
+0x00, 0x00, 0x00, 0x00, 0x80, 0x00,
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00 };
+uint8_t *pa_features;
+size_t pa_size;
+
 _FDT((fdt_setprop_cell(fdt, offset, "reg", index)));
 _FDT((fdt_setprop_string(fdt, offset, "device_type", "cpu")));
 
@@ -662,6 +680,19 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, 
int offset,
   page_sizes_prop, page_sizes_prop_size)));
 }
 
+/* Do the ibm,pa-features property, adjust it for ci-large-pages */
+if (env->mmu_model == POWERPC_MMU_2_06) {
+pa_features = pa_features_206;
+pa_size = sizeof(pa_features_206);
+} else /* env->mmu_model == POWERPC_MMU_2_07 */ {
+pa_features = pa_features_207;
+pa_size = sizeof(pa_features_207);
+}
+if (env->ci_large_pages) {
+pa_features[3] |= 0x20;
+}
+_FDT((fdt_setprop(fdt, offset, "ibm,pa-features", pa_features, pa_size)));
+
 _FDT((fdt_setprop_cell(fdt, offset, "ibm,chip-id",
cs->cpu_index / vcpus_per_socket)));
 
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 69d8cf6..b34aed6 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -1073,6 +1073,7 @@ struct CPUPPCState {
 uint64_t insns_flags2;
 #if defined(TARGET_PPC64)
 struct ppc_segment_page_sizes sps;
+bool ci_large_pages;
 #endif
 
 #if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 7671ae7..0c59f7f 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -414,6 +414,13 @@ static void kvm_fixup_page_sizes(PowerPCCPU *cpu)
 /* Convert to QEMU form */
 memset(&env->sps, 0, sizeof(env->sps));
 
+/* If we have HV KVM, we need to forbid CI large pages if our
+ * host page size is smaller than 64K.
+ */
+if (smmu_info.flags & KVM_PPC_PAGE_SIZES_REAL) {
+env->ci_large_pages = getpagesize() >= 0x10000;
+}
+
 /*
  * XXX This loop should be an entry wide AND of the capabilities that
  * the selected CPU has with the capabilities that KVM supports.
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 2adbb63..4934c80 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -7864,6 +7864,7 @@ static void init_proc_book3s_64(CPUPPCState *env, int 
version)
 gen_spr_book3s_ids(env);
 gen_spr_amr(env);
 gen_spr_book3s_purr(env);
+env->ci_large_pages = true;
 break;
 default:
 g_assert_not_reached();
-- 
2.5.0.rc3




Re: [Qemu-devel] [PATCH 1/2] iscsi: Translate scsi sense into error code

2015-10-22 Thread Peter Lieven

Am 22.10.2015 um 10:17 schrieb Fam Zheng:

Previously we return -EIO blindly when anything goes wrong. Add a helper
function to parse sense fields and try to make the return code more
meaningful.

Signed-off-by: Fam Zheng 
---
  block/iscsi.c | 56 +++-
  1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 93f1ee4..f3e20ae 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -84,6 +84,7 @@ typedef struct IscsiTask {
  IscsiLun *iscsilun;
  QEMUTimer retry_timer;
  bool force_next_flush;
+int err_code;
  } IscsiTask;
  
  typedef struct IscsiAIOCB {

@@ -182,6 +183,58 @@ static inline unsigned exp_random(double mean)
  #define QEMU_SCSI_STATUS_TIMEOUT SCSI_STATUS_TIMEOUT
  #endif
  
+static int iscsi_translate_sense(struct scsi_sense *sense)

+{
+int ret = 0;
+
+switch (sense->key) {
+case SCSI_SENSE_NO_SENSE:
+return 0;
+break;
+case SCSI_SENSE_NOT_READY:
+return -EBUSY;
+break;
+case SCSI_SENSE_DATA_PROTECTION:
+return -EACCES;
+break;
+case SCSI_SENSE_COMMAND_ABORTED:
+return -ECANCELED;
+break;
+case SCSI_SENSE_ILLEGAL_REQUEST:
+/* Parse ASCQ */
+break;
+default:
+return -EIO;
+break;
+}
+switch (sense->ascq) {
+case SCSI_SENSE_ASCQ_PARAMETER_LIST_LENGTH_ERROR:
+case SCSI_SENSE_ASCQ_INVALID_OPERATION_CODE:
+case SCSI_SENSE_ASCQ_INVALID_FIELD_IN_CDB:
+case SCSI_SENSE_ASCQ_INVALID_FIELD_IN_PARAMETER_LIST:
+ret = -EINVAL;
+break;
+case SCSI_SENSE_ASCQ_LBA_OUT_OF_RANGE:
+ret = -ERANGE;
+break;
+case SCSI_SENSE_ASCQ_LOGICAL_UNIT_NOT_SUPPORTED:
+ret = -ENOTSUP;
+break;
+case SCSI_SENSE_ASCQ_WRITE_PROTECTED:
+ret = -EACCES;
+break;
+case SCSI_SENSE_ASCQ_MEDIUM_NOT_PRESENT:
+case SCSI_SENSE_ASCQ_MEDIUM_NOT_PRESENT_TRAY_CLOSED:
+case SCSI_SENSE_ASCQ_MEDIUM_NOT_PRESENT_TRAY_OPEN:
+ret = -ENOMEDIUM;
+break;
+default:
+ret = -EIO;
+break;
+}
+return ret;
+}
+
  static void
  iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
  void *command_data, void *opaque)
@@ -226,6 +279,7 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
  return;
  }
  }
+iTask->err_code = iscsi_translate_sense(&task->sense);
  error_report("iSCSI Failure: %s", iscsi_get_error(iscsi));
  } else {
  iTask->iscsilun->force_next_flush |= iTask->force_next_flush;
@@ -455,7 +509,7 @@ retry:
  }
  
  if (iTask.status != SCSI_STATUS_GOOD) {

-return -EIO;
+return iTask.err_code;
  }


Why do you use the translated error code only for writev? Other calls could
benefit from it as well.

Peter




[Qemu-devel] [PATCH] virtio-9p: add savevm handlers

2015-10-22 Thread Greg Kurz
We don't support migration of mounted 9p shares. This is handled by a
migration blocker.

One would expect, however, to be able to migrate if the share is unmounted.
Unfortunately, virtio-9p-device does not register savevm handlers at all!
Migration succeeds and leaves the guest with a dangling device...

This patch simply registers migration handlers for virtio-9p-device. Whether
migration is possible or not still depends on the migration blocker.

Signed-off-by: Greg Kurz 
---
Michael, Aneesh,

This is the same patch minus the call to unregister_savevm() since we don't
have an unrealize handler.

I decided to simply drop all the other patches. Hot-unplug support is totally
missing and definitely needs more work. I'll try to come up with a solution
in its own series.

Cheers.

--
Greg

---
 hw/9pfs/virtio-9p-device.c |   11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 93a407c45926..e3abcfaffb2a 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -43,6 +43,16 @@ static void virtio_9p_get_config(VirtIODevice *vdev, uint8_t 
*config)
 g_free(cfg);
 }
 
+static void virtio_9p_save(QEMUFile *f, void *opaque)
+{
+virtio_save(VIRTIO_DEVICE(opaque), f);
+}
+
+static int virtio_9p_load(QEMUFile *f, void *opaque, int version_id)
+{
+return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
+}
+
 static void virtio_9p_device_realize(DeviceState *dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -130,6 +140,7 @@ static void virtio_9p_device_realize(DeviceState *dev, 
Error **errp)
 }
 v9fs_path_free(&path);
 
+register_savevm(dev, "virtio-9p", -1, 1, virtio_9p_save, virtio_9p_load, 
s);
 return;
 out:
 g_free(s->ctx.fs_root);




[Qemu-devel] [PULL v3 2/4] crypto: don't let builtin aes crash if no IV is provided

2015-10-22 Thread Daniel P. Berrange
If no IV is provided, then use a default IV of all-zeros
instead of crashing. This gives parity with gcrypt and
nettle backends.
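
The idea can be sketched roughly as follows (an illustration, not the code
from this patch): if the IV is stored inline in the cipher state instead of
behind a pointer, zero-initialising the state already yields an all-zeros
default IV, so encrypting without setting an IV has nothing to dereference.
All sketch_* names and SKETCH_BLOCK_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 16    /* AES block size in bytes */

struct sketch_cipher_state {
    /* Stored inline, so it is zeroed together with the struct. */
    uint8_t iv[SKETCH_BLOCK_SIZE];
};

/* Zero the whole state; the IV defaults to all-zeros. */
void sketch_state_init(struct sketch_cipher_state *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof(*s));
}

/* Replace the default IV; returns -1 if the length is not one block. */
int sketch_set_iv(struct sketch_cipher_state *s,
                  const uint8_t *iv, size_t niv)
{
    assert(s != NULL && iv != NULL);
    if (niv != SKETCH_BLOCK_SIZE) {
        return -1;
    }
    memcpy(s->iv, iv, SKETCH_BLOCK_SIZE);
    return 0;
}
```

A caller that never invokes sketch_set_iv() simply works with the zero IV
left behind by sketch_state_init().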

Signed-off-by: Daniel P. Berrange 
---
 crypto/cipher-builtin.c| 14 +-
 tests/test-crypto-cipher.c | 30 ++
 2 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/crypto/cipher-builtin.c b/crypto/cipher-builtin.c
index 30f4853..37e1a19 100644
--- a/crypto/cipher-builtin.c
+++ b/crypto/cipher-builtin.c
@@ -25,8 +25,7 @@ typedef struct QCryptoCipherBuiltinAES 
QCryptoCipherBuiltinAES;
 struct QCryptoCipherBuiltinAES {
 AES_KEY encrypt_key;
 AES_KEY decrypt_key;
-uint8_t *iv;
-size_t niv;
+uint8_t iv[AES_BLOCK_SIZE];
 };
 typedef struct QCryptoCipherBuiltinDESRFB QCryptoCipherBuiltinDESRFB;
 struct QCryptoCipherBuiltinDESRFB {
@@ -61,7 +60,6 @@ static void qcrypto_cipher_free_aes(QCryptoCipher *cipher)
 {
 QCryptoCipherBuiltin *ctxt = cipher->opaque;
 
-g_free(ctxt->state.aes.iv);
 g_free(ctxt);
 cipher->opaque = NULL;
 }
@@ -145,15 +143,13 @@ static int qcrypto_cipher_setiv_aes(QCryptoCipher *cipher,
  Error **errp)
 {
 QCryptoCipherBuiltin *ctxt = cipher->opaque;
-if (niv != 16) {
-error_setg(errp, "IV must be 16 bytes not %zu", niv);
+if (niv != AES_BLOCK_SIZE) {
+error_setg(errp, "IV must be %d bytes not %zu",
+   AES_BLOCK_SIZE, niv);
 return -1;
 }
 
-g_free(ctxt->state.aes.iv);
-ctxt->state.aes.iv = g_new0(uint8_t, niv);
-memcpy(ctxt->state.aes.iv, iv, niv);
-ctxt->state.aes.niv = niv;
+memcpy(ctxt->state.aes.iv, iv, AES_BLOCK_SIZE);
 
 return 0;
 }
diff --git a/tests/test-crypto-cipher.c b/tests/test-crypto-cipher.c
index 9d38d26..1b60c34 100644
--- a/tests/test-crypto-cipher.c
+++ b/tests/test-crypto-cipher.c
@@ -287,6 +287,32 @@ static void test_cipher(const void *opaque)
 qcrypto_cipher_free(cipher);
 }
 
+
+static void test_cipher_null_iv(void)
+{
+QCryptoCipher *cipher;
+uint8_t key[32] = { 0 };
+uint8_t plaintext[32] = { 0 };
+uint8_t ciphertext[32] = { 0 };
+
+cipher = qcrypto_cipher_new(
+QCRYPTO_CIPHER_ALG_AES_256,
+QCRYPTO_CIPHER_MODE_CBC,
+key, sizeof(key),
+&error_abort);
+g_assert(cipher != NULL);
+
+/* Don't call qcrypto_cipher_setiv */
+
+qcrypto_cipher_encrypt(cipher,
+   plaintext,
+   ciphertext,
+   sizeof(plaintext),
+   &error_abort);
+
+qcrypto_cipher_free(cipher);
+}
+
 int main(int argc, char **argv)
 {
 size_t i;
@@ -298,5 +324,9 @@ int main(int argc, char **argv)
 for (i = 0; i < G_N_ELEMENTS(test_data); i++) {
 g_test_add_data_func(test_data[i].path, &test_data[i], test_cipher);
 }
+
+g_test_add_func("/crypto/cipher/null-iv",
+test_cipher_null_iv);
+
 return g_test_run();
 }
-- 
2.4.3




[Qemu-devel] [PULL v3 3/4] crypto: add sanity checking of plaintext/ciphertext length

2015-10-22 Thread Daniel P. Berrange
When encrypting/decrypting data, the plaintext/ciphertext
buffers are required to be a multiple of the cipher block
size. If this is not done, nettle will abort and gcrypt
will report an error. To get consistent behaviour add
explicit checks upfront for the buffer sizes.
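
A minimal sketch of such an upfront check, assuming a hypothetical helper
named sketch_check_len (the patch itself performs the check inline in each
backend and reports the failure through an Error object):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 0 if len is a whole number of cipher
 * blocks, -1 otherwise. blocksize would be 16 for AES and 8 for DES.
 */
int sketch_check_len(size_t len, size_t blocksize)
{
    assert(blocksize != 0);
    return (len % blocksize) ? -1 : 0;
}
```

Doing this before handing the buffer to the backend means every backend
rejects a badly sized buffer in the same way.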

Signed-off-by: Daniel P. Berrange 
---
 crypto/cipher-builtin.c| 15 
 crypto/cipher-gcrypt.c | 61 ++
 crypto/cipher-nettle.c | 28 +++--
 tests/test-crypto-cipher.c | 50 +
 4 files changed, 130 insertions(+), 24 deletions(-)

diff --git a/crypto/cipher-builtin.c b/crypto/cipher-builtin.c
index 37e1a19..39e31a7 100644
--- a/crypto/cipher-builtin.c
+++ b/crypto/cipher-builtin.c
@@ -39,6 +39,7 @@ struct QCryptoCipherBuiltin {
 QCryptoCipherBuiltinAES aes;
 QCryptoCipherBuiltinDESRFB desrfb;
 } state;
+size_t blocksize;
 void (*free)(QCryptoCipher *cipher);
 int (*setiv)(QCryptoCipher *cipher,
  const uint8_t *iv, size_t niv,
@@ -181,6 +182,7 @@ static int qcrypto_cipher_init_aes(QCryptoCipher *cipher,
 goto error;
 }
 
+ctxt->blocksize = AES_BLOCK_SIZE;
 ctxt->free = qcrypto_cipher_free_aes;
 ctxt->setiv = qcrypto_cipher_setiv_aes;
 ctxt->encrypt = qcrypto_cipher_encrypt_aes;
@@ -282,6 +284,7 @@ static int qcrypto_cipher_init_des_rfb(QCryptoCipher 
*cipher,
 memcpy(ctxt->state.desrfb.key, key, nkey);
 ctxt->state.desrfb.nkey = nkey;
 
+ctxt->blocksize = 8;
 ctxt->free = qcrypto_cipher_free_des_rfb;
 ctxt->setiv = qcrypto_cipher_setiv_des_rfb;
 ctxt->encrypt = qcrypto_cipher_encrypt_des_rfb;
@@ -370,6 +373,12 @@ int qcrypto_cipher_encrypt(QCryptoCipher *cipher,
 {
 QCryptoCipherBuiltin *ctxt = cipher->opaque;
 
+if (len % ctxt->blocksize) {
+error_setg(errp, "Length %zu must be a multiple of block size %zu",
+   len, ctxt->blocksize);
+return -1;
+}
+
 return ctxt->encrypt(cipher, in, out, len, errp);
 }
 
@@ -382,6 +391,12 @@ int qcrypto_cipher_decrypt(QCryptoCipher *cipher,
 {
 QCryptoCipherBuiltin *ctxt = cipher->opaque;
 
+if (len % ctxt->blocksize) {
+error_setg(errp, "Length %zu must be a multiple of block size %zu",
+   len, ctxt->blocksize);
+return -1;
+}
+
 return ctxt->decrypt(cipher, in, out, len, errp);
 }
 
diff --git a/crypto/cipher-gcrypt.c b/crypto/cipher-gcrypt.c
index 8cfc562..c4f8114 100644
--- a/crypto/cipher-gcrypt.c
+++ b/crypto/cipher-gcrypt.c
@@ -34,6 +34,11 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg)
 }
 }
 
+typedef struct QCryptoCipherGcrypt QCryptoCipherGcrypt;
+struct QCryptoCipherGcrypt {
+gcry_cipher_hd_t handle;
+size_t blocksize;
+};
 
 QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
   QCryptoCipherMode mode,
@@ -41,7 +46,7 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
   Error **errp)
 {
 QCryptoCipher *cipher;
-gcry_cipher_hd_t handle;
+QCryptoCipherGcrypt *ctx;
 gcry_error_t err;
 int gcryalg, gcrymode;
 
@@ -87,7 +92,9 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
 cipher->alg = alg;
 cipher->mode = mode;
 
-err = gcry_cipher_open(&handle, gcryalg, gcrymode, 0);
+ctx = g_new0(QCryptoCipherGcrypt, 1);
+
+err = gcry_cipher_open(&ctx->handle, gcryalg, gcrymode, 0);
 if (err != 0) {
 error_setg(errp, "Cannot initialize cipher: %s",
gcry_strerror(err));
@@ -100,10 +107,12 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm 
alg,
  * bizarre RFB variant of DES :-)
  */
 uint8_t *rfbkey = qcrypto_cipher_munge_des_rfb_key(key, nkey);
-err = gcry_cipher_setkey(handle, rfbkey, nkey);
+err = gcry_cipher_setkey(ctx->handle, rfbkey, nkey);
 g_free(rfbkey);
+ctx->blocksize = 8;
 } else {
-err = gcry_cipher_setkey(handle, key, nkey);
+err = gcry_cipher_setkey(ctx->handle, key, nkey);
+ctx->blocksize = 16;
 }
 if (err != 0) {
 error_setg(errp, "Cannot set key: %s",
@@ -111,11 +120,12 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm 
alg,
 goto error;
 }
 
-cipher->opaque = handle;
+cipher->opaque = ctx;
 return cipher;
 
  error:
-gcry_cipher_close(handle);
+gcry_cipher_close(ctx->handle);
+g_free(ctx);
 g_free(cipher);
 return NULL;
 }
@@ -123,12 +133,13 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
 
 void qcrypto_cipher_free(QCryptoCipher *cipher)
 {
-gcry_cipher_hd_t handle;
+QCryptoCipherGcrypt *ctx;
 if (!cipher) {
 return;
 }
-handle = cipher->opaque;
-gcry_cipher_close(handle);
+ctx = cipher->opaque;
+gcry_cipher_close(ctx->handle);
+g_free(ctx);
 

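The pattern this patch applies to each backend (keep the block size next to the backend handle, then reject any length that is not a multiple of it before calling into the library) can be sketched outside QEMU as follows. This is only a minimal illustration: ExampleCipherCtx and example_check_len are hypothetical names, and the error text is written into a plain buffer rather than a QEMU Error object.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a per-cipher context that records the block size. */
typedef struct ExampleCipherCtx {
    size_t blocksize;   /* e.g. 8 for DES, 16 for AES */
} ExampleCipherCtx;

/*
 * Returns 0 if len is a whole number of blocks, -1 otherwise.
 * On failure a message is formatted into errbuf (at most errlen bytes).
 */
int example_check_len(const ExampleCipherCtx *ctx, size_t len,
                      char *errbuf, size_t errlen)
{
    if (len % ctx->blocksize) {
        snprintf(errbuf, errlen,
                 "Length %zu must be a multiple of block size %zu",
                 len, ctx->blocksize);
        return -1;
    }
    return 0;
}
```

Doing the check in the wrapper gives every backend the same error message for a short or ragged buffer, instead of relying on each library's own failure mode.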
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi

2015-10-22 Thread Valerio Aimale

On 10/21/15 4:54 AM, Markus Armbruster wrote:

Valerio Aimale  writes:


On 10/19/15 1:52 AM, Markus Armbruster wrote:

Valerio Aimale  writes:


On 10/16/15 2:15 AM, Markus Armbruster wrote:

vale...@aimale.com writes:


All-

I've produced a patch for the current QEMU HEAD, for libvmi to
introspect QEMU/KVM VMs.

Libvmi has patches for the old qemu-kvm fork, inside its source tree:
https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch

This patch adds an hmp and a qmp command, "pmemaccess". When the
command is invoked with a string argument (a filename), it will open
a UNIX socket and spawn a listening thread.

The client writes binary commands to the socket, in the form of a C
structure:

struct request {
    uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
    uint64_t address;  // address to read from OR write to
    uint64_t length;   // number of bytes to read OR write
};

The client receives as a response either (length+1) bytes, if it is a
read operation, or 1 byte if it is a write operation.

The last byte of a read operation response indicates success (1
success, 0 failure). The single byte returned for a write operation
indicates the same (1 success, 0 failure).
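A client-side reading of that description can be sketched as below. It only computes how many bytes to expect and where the status byte sits; the enum names and both helper functions are hypothetical, and returning 0 for request types other than read and write is an assumption, since the thread does not say what a quit request is answered with.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Request types as listed in the struct comment; the names are hypothetical. */
enum { REQ_QUIT = 0, REQ_READ = 1, REQ_WRITE = 2 };

struct request {
    uint8_t type;
    uint64_t address;
    uint64_t length;
};

/* Number of response bytes the client should expect for a request. */
uint64_t response_size(const struct request *req)
{
    switch (req->type) {
    case REQ_READ:
        return req->length + 1;   /* data bytes followed by one status byte */
    case REQ_WRITE:
        return 1;                 /* status byte only */
    default:
        return 0;                 /* assumption: no reply described */
    }
}

/* The status byte is the last byte of the response: 1 success, 0 failure. */
bool response_ok(const struct request *req, const uint8_t *resp)
{
    size_t status_index = req->type == REQ_READ ? (size_t)req->length : 0;
    return resp[status_index] == 1;
}
```

Because the status byte arrives after the data, a reader cannot tell a failed read from a successful one until all length bytes have been received.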

So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
garbage followed by the "it failed" byte?

Markus, that appears to be the case. However, I did not write the
communication protocol between libvmi and qemu. I'm assuming that the
person who wrote the protocol did not want to bother with
overcomplicating things.

https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c

I'm thinking he assumed reads would be small in size and the price of
reading garbage was less than the price of writing a more complicated
protocol. I can see his point; confronted with the same problem, I
might have done the same.

All right, the interface is designed for *small* memory blocks then.

Makes me wonder why he needs a separate binary protocol on a separate
socket.  Small blocks could be done just fine in QMP.

The problem is speed. If one is analyzing the memory space of a running
process (physical and paged), libvmi will make a large number of small
and mid-sized reads. If one uses xp, or pmemsave, the overhead is
quite significant. xp has overhead due to encoding, and pmemsave has
overhead due to file open/write (server), file open/read/close/unlink
(client).

Others have gone through the problem before me. It appears that
pmemsave and xp are significantly slower than reading memory using a
socket via pmemaccess.

That they're slower isn't surprising, but I'd expect the cost of
encoding a small block to be insignificant compared to the cost of the
network roundtrips.

As block size increases, the space overhead of encoding will eventually
bite.  But for that usage, the binary protocol appears ill-suited,
unless the client can pretty reliably avoid read failure.  I haven't
examined its failure modes, yet.


The following data is not mine, but it shows the time, in
milliseconds, required to resolve the content of a paged memory
address via socket (pmemaccess), pmemsave and xp:

http://cl.ly/image/322a3s0h1V05

Again, I did not produce those data points, they come from an old
libvmi thread.

90ms is a very long time.  What exactly was measured?


I think it might be conceivable that there could be a QMP command that
returns the content of an arbitrarily sized memory region as a base64
or a base85 json string. It would still have both time- (due to
encoding/decoding) and space- (base64 has 33% and base85 would be 25%)
overhead, + json encoding/decoding overhead. It might still be the
case that socket would outperform such a command as well,
speed-wise. I don't think it would be any faster than xp.
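The space side of that trade-off is simple arithmetic: base64 emits 4 characters per 3 input bytes (about 33% overhead), and an Ascii85-style base85 emits 5 characters per 4 input bytes (about 25%). A small sketch, with hypothetical function names, assuming the input is padded to whole groups and ignoring JSON quoting and any shortcut encodings:

```c
#include <stddef.h>

/* Encoded size for base64: every 3 input bytes become 4 characters. */
size_t base64_encoded_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}

/* Encoded size for a 4-bytes-to-5-characters base85 scheme. */
size_t base85_encoded_len(size_t n)
{
    return 5 * ((n + 3) / 4);
}
```

For a 4096-byte page this gives 5464 characters in base64 and 5120 in base85, against 4096 bytes (plus the one status byte) on the raw socket.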

A special-purpose binary protocol over a dedicated socket will always do
less than a QMP solution (ignoring foolishness like transmitting crap on
read error the client is then expected to throw away).  The question is
whether the difference in work translates to a worthwhile difference in
performance.

The larger question is actually whether we have an existing interface
that can serve the libvmi's needs.  We've discussed monitor commands
like xp, pmemsave, pmemread.  There's another existing interface: the
GDB stub.  Have you considered it?


There's also a similar patch, floating around the internet, that uses
shared memory, instead of sockets, as inter-process communication
between libvmi and QEMU. I've never used that.

By the time you built a working IPC mechanism on top of shared memory,
you're often no better off than with AF_LOCAL sockets.

Crazy idea: can we allocate guest memory in a way that supports sharing
it with another process?  Eduardo, can -mem-path do such wild things?

Markus, your suggestion led to a lightbulb going off in my head.

What if there was a qmp command, say 'pmemmap', that when invoked
performs the following:


qmp_pmemmap( 

Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi

2015-10-22 Thread Eduardo Habkost
On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> Valerio Aimale  writes:
[...]
> > There's also a similar patch, floating around the internet, that uses
> > shared memory, instead of sockets, as inter-process communication
> > between libvmi and QEMU. I've never used that.
> 
> By the time you built a working IPC mechanism on top of shared memory,
> you're often no better off than with AF_LOCAL sockets.
> 
> Crazy idea: can we allocate guest memory in a way that supports sharing
> it with another process?  Eduardo, can -mem-path do such wild things?

It can't today, but just because it creates a temporary file inside
mem-path and unlinks it immediately after opening a file descriptor. We
could make memory-backend-file also accept a full filename as argument,
or add a mechanism to let QEMU send the open file descriptor to a QMP
client.

-- 
Eduardo



Re: [Qemu-devel] [PATCH] copy, dd: simplify and optimize NUL bytes detection

2015-10-22 Thread Paolo Bonzini


On 22/10/2015 19:39, Radim Krčmář wrote:
> 2015-10-22 18:14+0200, Paolo Bonzini:
>> On 22/10/2015 18:02, Eric Blake wrote:
>>> I see a bug in there:
>>
>> Of course.  You shouldn't have told me what the bug was, I deserved
>> to look for it myself. :)
> 
> It rather seems that you don't want spoilers, :)
> 
> I see two bugs now.

Me too. :)  But Rusty surely has some testcases in case he wants to
adopt some of the ideas here. O:-)

Paolo

>> bool memeqzero4_paolo(const void *data, size_t length)
>> {
>> const unsigned char *p = data;
>> unsigned long word;
>>
>> while (__builtin_expect(length & (sizeof(word) - 1), 0)) {
>> if (*p)
>> return false;
>> p++;
>> length--;
>> if (!length)
>> return true;
>> }
>>
>> /* We must always read one byte or word, even if everything is aligned!
>>  * Otherwise, memcmp(data, data, length) is trivially true.
>>  */
>> for (;;) {
>> memcpy(&word, p, sizeof(word));
>> if (word)
>> return false;
>> if (__builtin_expect(length & (16 - sizeof(word)), 0) == 0)
>> break;
>> p += sizeof(word);
>> length -= sizeof(word);
>> if (!length)
>> return true;
>> }
>>
>>  /* Now we know that's zero, memcmp with self. */
>>  return memcmp(data, p, length) == 0;
>> }



Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi

2015-10-22 Thread Valerio Aimale

On 10/22/15 1:12 PM, Eduardo Habkost wrote:

On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:

Valerio Aimale  writes:

[...]

There's also a similar patch, floating around the internet, that uses
shared memory, instead of sockets, as inter-process communication
between libvmi and QEMU. I've never used that.

By the time you built a working IPC mechanism on top of shared memory,
you're often no better off than with AF_LOCAL sockets.

Crazy idea: can we allocate guest memory in a way that supports sharing
it with another process?  Eduardo, can -mem-path do such wild things?

It can't today, but just because it creates a temporary file inside
mem-path and unlinks it immediately after opening a file descriptor. We
could make memory-backend-file also accept a full filename as argument,
or add a mechanism to let QEMU send the open file descriptor to a QMP
client.

Eduardo, would my "artisanal" idea of creating an mmap'ed image of the 
guest memory footprint work, augmented by Eric's suggestion of having 
the qmp client pass the filename?


qmp_pmemmap( [...]) {

char *template = "/tmp/QEM_mmap_XXX";
int mmap_fd;
uint8_t *local_memspace = malloc( (size_t) 8589934592 /* assuming 
VM with 8GB RAM */);


cpu_physical_memory_rw( (hwaddr) 0,  local_memspace , (hwaddr) 
8589934592 /* assuming VM with 8GB RAM */, 0 /* no write for now will 
discuss write later */);


   mmap_fd = mkstemp("/tmp/QEUM_mmap_XXX");

   mmap((void *) local_memspace, (size_t) 8589934592, PROT_READ | 
PROT_WRITE,  MAP_SHARED | MAP_ANON,  mmap_fd, (off_t) 0);


  /* etc */

}

pmemmap would return the following json

{
'success' : 'true',
'map_filename' : '/tmp/QEM_mmap_1234567'
}
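The file-creation and mapping step of the pseudocode above can be sketched as a standalone POSIX function. This is a sketch under stated assumptions, not a proposed QEMU implementation: map_shared_file is a hypothetical name, the copy from guest memory is left out entirely, mkstemp() is given a writable template ending in six X characters as it requires, and MAP_ANON is dropped because an anonymous mapping is not backed by the file descriptor, so another process could not see it through the file.

```c
#define _POSIX_C_SOURCE 200809L

#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create a temporary file from path_template (writable, ending in "XXXXXX"),
 * size it, and map it MAP_SHARED so that any other process mapping the same
 * file sees the same bytes. Returns the mapping, or NULL on failure.
 * On success *fd_out holds the open descriptor and path_template the name.
 */
void *map_shared_file(char *path_template, size_t size, int *fd_out)
{
    int fd = mkstemp(path_template);
    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        unlink(path_template);
        return NULL;
    }
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        unlink(path_template);
        return NULL;
    }
    *fd_out = fd;
    return addr;
}
```

A client that opens the same file and maps it with MAP_SHARED would then read the region without any socket round trip; how the guest RAM itself ends up backed by such a file is the open question in this thread.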





Re: [Qemu-devel] [PATCH] hw/isa/lpc_ich9: inject the SMI on the VCPU that is writing to APM_CNT

2015-10-22 Thread Kevin O'Connor
On Thu, Oct 22, 2015 at 10:40:08AM +0200, Paolo Bonzini wrote:
> On 21/10/2015 20:36, Jordan Justen wrote:
> > On 2015-10-20 11:14:00, Laszlo Ersek wrote:
> > > Commit 4d00636e97b7 ("ich9: Add the lpc chip", Nov 14 2012) added the
> > > ich9_apm_ctrl_changed() ioport write callback function such that it would
> > > inject the SMI, in response to a write to the APM_CNT register, on the
> > > first CPU, invariably.
> > > 
> > > Since this register is used by guest code to trigger an SMI synchronously,
> > > the interrupt should be injected on the VCPU that is performing the write.
> > 
> > Why not send an SMI to *all* processors, like the real chipsets do?
> 
> That's much less scalable, and more important I would have to check that
> SeaBIOS can handle that correctly.  It probably doesn't, as it doesn't
> relocate SMBASEs.

SeaBIOS is only expecting its SMI handler to be called once in
response to a synchronous SMI.  We can change SeaBIOS to fix that.

SeaBIOS does relocate the smbase from 0x30000 to 0xa0000 during its
init phase (by creating a synchronous SMI on the BSP and then setting
the smbase register to 0xa0000 in the smi handler).

-Kevin



Re: [Qemu-devel] [PATCH v2 0/3] target-i386: save/restore vcpu's TSC rate during migration

2015-10-22 Thread Eduardo Habkost
On Tue, Oct 20, 2015 at 03:22:51PM +0800, Haozhong Zhang wrote:
> This patchset enables QEMU to save/restore vcpu's TSC rate during the
> migration. When cooperating with KVM which supports TSC scaling, guest
> programs can observe a consistent guest TSC rate even though they are
> migrated among machines with different host TSC rates.
> 
> A pair of cpu options 'save-tsc-freq' and 'load-tsc-freq' are added to
> control the migration of vcpu's TSC rate.

The requirements and goals aren't clear to me. I see two possible use
cases, here:

1) Best effort to keep TSC frequency constant if possible (but not
   aborting migration if not possible). This would be an interesting
   default, but a bit unpredictable.
2) Strictly ensuring TSC frequency stays constant on migration (and
   aborting migration if not possible). This would be a useful feature,
   but can't be enabled by default unless both hosts have the same TSC
   frequency or support TSC scaling.

Which one(s) are you trying to implement?

In other words, what is the right behavior when KVM_SET_TSC_KHZ fails or
KVM_CAP_TSC_CONTROL is not available? We can't answer that question if
the requirements and goals are not clear.

Once we know what exactly is the goal, we could enable the new mode with
a single option, instead of raw options to control migration stream
loading/saving.
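The difference between the two use cases comes down to what happens when the migrated rate cannot be applied on the destination (for example when KVM_SET_TSC_KHZ fails). A minimal sketch of that decision as a single mode, with every name hypothetical and no QEMU or KVM API involved:

```c
#include <stdbool.h>

/* Hypothetical single-option policy covering the two use cases. */
typedef enum { TSC_BEST_EFFORT, TSC_STRICT } TscPolicy;

typedef enum {
    MIGRATION_CONTINUE,               /* rate applied, nothing to report */
    MIGRATION_CONTINUE_WITH_WARNING,  /* rate not applied, best effort */
    MIGRATION_ABORT                   /* rate not applied, strict mode */
} MigrationAction;

/* Decide what to do after trying to apply the migrated TSC rate. */
MigrationAction tsc_restore_action(TscPolicy policy, bool rate_applied)
{
    if (rate_applied) {
        return MIGRATION_CONTINUE;
    }
    return policy == TSC_STRICT ? MIGRATION_ABORT
                                : MIGRATION_CONTINUE_WITH_WARNING;
}
```

With the policy expressed this way, one option selects the mode, and the save/load mechanics stay an internal detail.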


>  * By default, the migration of vcpu's TSC rate is enabled only on
>    pc-*-2.5 and newer machine types. If the cpu option 'save-tsc-freq'
>    is present, the vcpu's TSC rate will be migrated from older machine
>    types as well.
>  * Another cpu option 'load-tsc-freq' controls whether the migrated
>    vcpu's TSC rate is used. By default, QEMU will not use the migrated
>    TSC rate if this option is not present. Otherwise, QEMU will use
>    the migrated TSC rate and override the TSC rate given by the cpu
>    option 'tsc-freq'.
> 
> Changes in v2:
>  * Add a pair of cpu options 'save-tsc-freq' and 'load-tsc-freq' to
>    control the migration of vcpu's TSC rate.
>  * Move all logic of setting TSC rate to target-i386.
>  * Remove the duplicated TSC setup in kvm_arch_init_vcpu().
> 
> Haozhong Zhang (3):
>   target-i386: add a subsection for migrating vcpu's TSC rate
>   target-i386: calculate vcpu's TSC rate to be migrated
>   target-i386: load the migrated vcpu's TSC rate
> 
>  include/hw/i386/pc.h  |  5 +
>  target-i386/cpu.c |  2 ++
>  target-i386/cpu.h |  3 +++
>  target-i386/kvm.c | 61 +++
>  target-i386/machine.c | 19 
>  5 files changed, 81 insertions(+), 9 deletions(-)
> 
> -- 
> 2.4.8
> 

-- 
Eduardo



Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi

2015-10-22 Thread Eric Blake
On 10/22/2015 12:43 PM, Valerio Aimale wrote:

> 
> What if there was a qmp command, say 'pmemmap', that when invoked
> performs the following:
> 
> qmp_pmemmap( [...]) {
> 
> char *template = "/tmp/QEM_mmap_XXX";

Why not let the caller pass in the file name, rather than opening it
ourselves? But the idea of coordinating a file that both caller and qemu
mmap to the same guest memory view might indeed have merit.

[by the way, it's okay to trim messages to the relevant portions, rather
than making readers scroll through lots of quoted content to find the meat]

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [PATCH v2] i386/acpi: add _HID to processor objects

2015-10-22 Thread Matthias Lange
This patch appends "ACPI0007" as the HID to each processor object.

Until commit 20843d processor objects used to have a _HID. According
to the ACPI spec this is not required but removing it breaks systems
which relied on the HID. As it does no harm it is safe to add _HID
to processor objects and restore the old behaviour.

Signed-off-by: Matthias Lange 
---
 hw/i386/acpi-build.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 95e0c65..314cd0b 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1153,6 +1153,9 @@ build_ssdt(GArray *table_data, GArray *linker,
 for (i = 0; i < acpi_cpus; i++) {
 dev = aml_processor(i, 0, 0, "CP%.02X", i);
 
+/* for processor objects a _HID is not strictly required, however it
+ * does no harm and preserves compatibility with other BIOSes */
+aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
 method = aml_method("_MAT", 0);
 aml_append(method, aml_return(aml_call1("CPMA", aml_int(i))));
 aml_append(dev, method);
-- 
1.9.1




Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi

2015-10-22 Thread Eric Blake
On 10/22/2015 01:57 PM, Valerio Aimale wrote:

> 
> pmemmap would return the following json
> 
> {
> 'success' : 'true',
> 'map_filename' : '/tmp/QEM_mmap_1234567'
> }

In general, it is better if the client controls the filename, and not
qemu.  This is because things like libvirt like to run qemu in a
highly-constrained environment, where the caller can pass in a file
descriptor that qemu cannot itself open().  So returning a filename is
pointless if the filename was already provided by the caller.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH 00/40] Patch Round-up for stable 2.4.1, freeze on 2015-10-29

2015-10-22 Thread Cole Robinson
On 10/21/2015 02:05 PM, Cole Robinson wrote:
> On 10/21/2015 01:51 PM, Michael Roth wrote:
>> Hi everyone,
>>
>> The following new patches are queued for QEMU stable v2.4.1:
>>
>>   https://github.com/mdroth/qemu/commits/stable-2.4-staging
>>
>> The release is planned for 2015-11-03:
>>
>>   http://wiki.qemu.org/Planning/2.4
>>
>> Please respond here or CC qemu-sta...@nongnu.org on any patches you
>> think should be included in the release.
>>
> 

Another potential:

commit 98cf48f60aa4999f5b2808569a193a401a390e6a
Author: Paolo Bonzini 
Date:   Wed Sep 16 17:38:44 2015 +0200

trace: remove malloc tracing

Prevents qemu from dropping this stderr warning with latest glib:

(process:23283): GLib-WARNING **: gmem.c:482: custom memory allocation vtable
not supported

Not sure if it meets the stable criteria, but the error is annoying and I'll
be adding that patch to the fedora builds

Thanks,
Cole



[Qemu-devel] [PULL] vhost: build fix

2015-10-22 Thread Michael S. Tsirkin
The following changes since commit 3c23402d4032f69af44a87fdb8019ad3229a4f31:

  hw/isa/lpc_ich9: inject the SMI on the VCPU that is writing to APM_CNT 
(2015-10-22 14:39:09 +0300)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to 7f4a930e64b9e69cd340395a7e4f0494aef4fcdd:

  vhost-user: fix up rhel6 build (2015-10-22 22:34:59 +0300)


vhost: build fix

Fix build breakages when using older gcc.

Signed-off-by: Michael S. Tsirkin 


Michael S. Tsirkin (1):
  vhost-user: fix up rhel6 build

 hw/virtio/vhost-user.c | 48 
 1 file changed, 24 insertions(+), 24 deletions(-)




Re: [Qemu-devel] [PATCH] copy, dd: simplify and optimize NUL bytes detection

2015-10-22 Thread Radim Krčmář
2015-10-22 18:14+0200, Paolo Bonzini:
> On 22/10/2015 18:02, Eric Blake wrote:
>> I see a bug in there:
> 
> Of course.  You shouldn't have told me what the bug was, I deserved
> to look for it myself. :)

It rather seems that you don't want spoilers, :)

I see two bugs now.

> bool memeqzero4_paolo(const void *data, size_t length)
> {
> const unsigned char *p = data;
> unsigned long word;
> 
> while (__builtin_expect(length & (sizeof(word) - 1), 0)) {
> if (*p)
> return false;
> p++;
> length--;
> if (!length)
> return true;
> }
> 
> /* We must always read one byte or word, even if everything is aligned!
>  * Otherwise, memcmp(data, data, length) is trivially true.
>  */
> for (;;) {
> memcpy(&word, p, sizeof(word));
> if (word)
> return false;
> if (__builtin_expect(length & (16 - sizeof(word)), 0) == 0)
> break;
> p += sizeof(word);
> length -= sizeof(word);
> if (!length)
> return true;
> }
> 
>  /* Now we know that's zero, memcmp with self. */
>  return memcmp(data, p, length) == 0;
> }
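For comparison, the underlying trick (compare the buffer against itself at an offset once a short prefix is known to be zero) can be written in a simpler form that is easier to check by hand. This is an illustrative sketch only, not the function under review and not a fix for it; is_all_zero is a hypothetical name and the 16-byte prefix is an arbitrary choice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the first length bytes at data are all zero. */
bool is_all_zero(const void *data, size_t length)
{
    const unsigned char *p = data;
    size_t prefix = length < 16 ? length : 16;

    /* Check the first (up to) 16 bytes one at a time. */
    for (size_t i = 0; i < prefix; i++) {
        if (p[i]) {
            return false;
        }
    }
    if (length <= 16) {
        return true;
    }

    /*
     * p[0..15] are zero. If p[i + 16] == p[i] for every remaining i,
     * then by induction every byte is zero. memcmp only reads, so the
     * overlapping ranges are fine.
     */
    return memcmp(p, p + 16, length - 16) == 0;
}
```

The fixed prefix means an empty buffer and a buffer shorter than 16 bytes never reach memcmp, which removes the trivially-true self-comparison the comment in the quoted code warns about.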


