date:20160624

Re: [Qemu-devel] [PATCH v2 0/5] linux-user: safe_syscall updates

2016-06-24 Thread Riku Voipio

On Thu, Jun 23, 2016 at 01:10:40PM +0100, Peter Maydell wrote:
> On 22 June 2016 at 01:32, Richard Henderson  wrote:
> > Rebased on Riku's linux-user-for-upstream branch.
> > Fixed some nits that Peter pointed out.
> > Fixed the ppc64 version to properly return -errno.
> >
> > Retested all except s390x, which is, at the moment, inconvenient.
 
> I don't know enough s390/ppc to really review those two patches
> but they look right to me.

All pushed to linux-user, thanks!

https://git.linaro.org/people/riku.voipio/qemu.git/shortlog/refs/heads/linux-user-for-upstream

I'll craft a pull req to get these in well before soft freeze.

Riku

Re: [Qemu-devel] [PATCH v10 27/26] intel_iommu: disallow kernel-irqchip=on with IR

2016-06-24 Thread Peter Xu

On Fri, Jun 24, 2016 at 03:10:21PM +0800, Peter Xu wrote:
> When user specify "kernel-irqchip=on", throw error and then quit.
> 
> Signed-off-by: Peter Xu 
> ---
> 
> One more patch for this series. Without this one, guest kernel will
> possibly hang. This is not user friendly.

This patch should not be here. It should in-reply-to the cover letter.
My fault to erroneously pasted a wrong message ID. :(

-- peterx

Re: [Qemu-devel] [Qemu-block] [PATCH] qcow2 resize with snapshot

2016-06-24 Thread Stefan Hajnoczi

On Thu, Jun 23, 2016 at 10:34:48PM +0800, zhangzhiming wrote:
> hi, please help to review my code. i have checked it for serval times.  
> it is short and very easy too read. and i will be very grateful to you for 
> taking a very little time
> to read it. thanks very much!
> 
> zhangzhiming
> zhangzhimin...@meituan.com 
> 
> Signed-off-by: zhangzhiming  >

Please follow the guidelines for submitting patches:
http://qemu-project.org/Contribute/SubmitAPatch

Send patches as new email threads, not replies to old emails.

Include a meaningful commit description that explains why this patch is
necessary.  Do not put comments into the commit description that aren't
useful in the git log.

Stefan

signature.asc
Description: PGP signature

[Qemu-devel] [PATCH] arm: Re-enable tmp105 test

2016-06-24 Thread Thomas Huth

The tmp105 test is currently not executed since the following
line in the Makefile overwrites the check-qtest-arm-y variable
instead of extending it.

Signed-off-by: Thomas Huth 
---
 tests/Makefile.include | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index fd2dba4..6c09962 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -251,7 +251,7 @@ check-qtest-sparc64-y = tests/endianness-test$(EXESUF)
 gcov-files-sparc-y += hw/timer/m48t59.c
 gcov-files-sparc64-y += hw/timer/m48t59.c
 check-qtest-arm-y = tests/tmp105-test$(EXESUF)
-check-qtest-arm-y = tests/ds1338-test$(EXESUF)
+check-qtest-arm-y += tests/ds1338-test$(EXESUF)
 gcov-files-arm-y += hw/misc/tmp105.c
 check-qtest-arm-y += tests/virtio-blk-test$(EXESUF)
 gcov-files-arm-y += arm-softmmu/hw/block/virtio-blk.c
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v2 0/6] serial: flow control fixes

2016-06-24 Thread Dr. David Alan Gilbert

* Paolo Bonzini (pbonz...@redhat.com) wrote:
> v1->v2:
> - use guint for GSource tags
> - move migration patch last because it set watch_tag but never read it
> - add more details in errors
> - don't drop tsr_retry sanity check in migration patch

Thanks,
Reviewed-by: Dr. David Alan Gilbert 

Note you have a 1 character typo in both of your errors, 
+ "(tsr empty, tsr_retry=%d", s->tsr_retry);
+ "(tsr not empty, tsr_retry=0");

I think you want to add the close brackets, i.e.:
+ "(tsr empty, tsr_retry=%d)", s->tsr_retry);
+ "(tsr not empty, tsr_retry=0)");

Dave

> 
> Paolo Bonzini (6):
>   serial: make tsr_retry unsigned
>   serial: simplify tsr_retry reset
>   serial: separate serial_xmit and serial_watch_cb
>   char: change qemu_chr_fe_add_watch to return unsigned
>   serial: remove watch on reset
>   serial: reinstate watch after migration
> 
>  hw/char/cadence_uart.c   |  9 ---
>  hw/char/serial.c | 67 
> 
>  include/hw/char/serial.h |  3 ++-
>  include/sysemu/char.h| 16 ++--
>  net/vhost-user.c |  2 +-
>  qemu-char.c  |  8 +++---
>  6 files changed, 78 insertions(+), 27 deletions(-)
> 
> -- 
> 2.5.5
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] Qemu and heavily increased RSS usage

2016-06-24 Thread Stefan Hajnoczi

On Wed, Jun 22, 2016 at 09:56:06PM +0100, Peter Maydell wrote:
> On 22 June 2016 at 20:55, Peter Lieven  wrote:
> > What makes the coroutine pool memory intensive is the stack size of 1MB per
> > coroutine. Is it really necessary to have such a big stack?
> 
> That reminds me that I was wondering if we should allocate
> our coroutine stacks with MAP_GROWSDOWN (though if we're
> not actually using 1MB of stack then it's only going to
> be eating virtual memory, not necessarily real memory.)

Yes, MAP_GROWSDOWN will not reduce RSS.

It's possible that we can reduce RSS usage of the coroutine pool but it
will require someone to profile the pool usage patterns.

Stefan


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v4 0/7] virtio-blk: multiqueue support

2016-06-24 Thread Stefan Hajnoczi

On Tue, Jun 21, 2016 at 01:13:09PM +0100, Stefan Hajnoczi wrote:
> v4:
>  * Rebased onto qemu.git/master
>  * Included latest performance results
> 
> v3:
>  * Drop Patch 1 to batch guest notify for non-dataplane
> 
>The Linux AIO completion BH and the virtio-blk batch notify BH changed 
> order
>in the AioContext->first_bh list as a side-effect of moving the BH from
>hw/block/dataplane/virtio-blk.c to hw/block/virtio-blk.c.  This caused a
>serious performance regression for both dataplane and non-dataplane.
> 
>I've decided not to move the BH in this series and work on a separate
>solution for making batch notify generic.
> 
>The remaining patches have been reordered and cleaned up.
> 
>  * See performance data below.
> 
> v2:
>  * Simplify s->rq live migration [Paolo]
>  * Use more efficient bitmap ops for batch notification [Paolo]
>  * Fix perf regression due to batch notify BH in wrong AioContext [Christian]
> 
> The virtio_blk guest driver has supported multiple virtqueues since Linux 
> 3.17.
> This patch series adds multiple virtqueues to QEMU's virtio-blk emulated
> device.
> 
> Ming Lei sent patches previously but these were not merged.  This series
> implements virtio-blk multiqueue for QEMU from scratch since the codebase has
> changed.  Live migration support for s->rq was also missing from the previous
> series and has been added.
> 
> It's important to note that QEMU's block layer does not support multiqueue 
> yet.
> Therefore virtio-blk device processes all virtqueues in the same AioContext
> (IOThread).  Further work is necessary to take advantage of multiqueue support
> in QEMU's block layer once it becomes available.
> 
> Performance results:
> 
> Using virtio-blk-pci,num-queues=4 can produce a speed-up but -smp 4
> introduces a lot of variance across runs.  No pinning was performed.
> 
> RHEL 7.2 guest on RHEL 7.2 host with 1 vcpu and 1 GB RAM unless otherwise
> noted.  The default configuration of the Linux null_blk driver is used as
> /dev/vdb.
> 
> $ cat files/fio.job
> [global]
> filename=/dev/vdb
> ioengine=libaio
> direct=1
> gtod_reduce=1
> 
> [job1]
> numjobs=4
> iodepth=16
> rw=randread
> bs=4K
> 
> $ ./analyze.py runs/
> Name   IOPS   Error
> v4-smp-4-dataplane   13326598.0 ± 6.31%
> v4-smp-4-dataplane-no-mq 11483568.0 ± 3.42%
> v4-smp-4-no-dataplane18108611.6 ± 1.53%
> v4-smp-4-no-dataplane-no-mq  13951225.6 ± 7.81%
> 
> Stefan Hajnoczi (7):
>   virtio-blk: add VirtIOBlockConf->num_queues
>   virtio-blk: multiqueue batch notify
>   virtio-blk: tell dataplane which vq to notify
>   virtio-blk: associate request with a virtqueue
>   virtio-blk: live migrate s->rq with multiqueue
>   virtio-blk: dataplane multiqueue support
>   virtio-blk: add num-queues device property
> 
>  hw/block/dataplane/virtio-blk.c | 81 
> +
>  hw/block/dataplane/virtio-blk.h |  2 +-
>  hw/block/virtio-blk.c   | 52 +-
>  include/hw/virtio/virtio-blk.h  |  6 ++-
>  4 files changed, 105 insertions(+), 36 deletions(-)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH RFC 4/5] include: Move typedef qemu_irq to qemu/typedefs.h

2016-06-24 Thread Peter Maydell

On 24 June 2016 at 09:17, Paolo Bonzini  wrote:
>
>
> On 24/06/2016 10:15, Peter Maydell wrote:
> >>> @@ -31,6 +31,7 @@ typedef struct FWCfgState FWCfgState;
> >>>  typedef struct HCIInfo HCIInfo;
> >>>  typedef struct I2CBus I2CBus;
> >>>  typedef struct I2SCodec I2SCodec;
> >>> +typedef struct IRQState *qemu_irq;
> >>>  typedef struct ISABus ISABus;
> >>>  typedef struct ISADevice ISADevice;
> >>>  typedef struct IsaDma IsaDma;
 >>
 >> Everything else in typedefs.h is a "typedef struct Thing Thing",
 >> but qemu_irq is different...
>>> >
>>> > We want to keep our readers on their toes!
>> It would mean you now have to decide whether the file is orderd
>> by the types being defined or by the underlying implementation
>> type (previously both orders were the same)...
>
> Indeed, and renaming the struct is trivial because it's used in a
> handful of places only.

It would still be different by being a pointer-to-Foo, not a Foo.

thanks
-- PMM

[Qemu-devel] [PULL 00/24] linux-user changes

2016-06-24 Thread riku . voipio

From: Riku Voipio 

The following changes since commit c7288767523f6510cf557707d3eb5e78e519b90d:

  Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160623' into 
staging (2016-06-23 11:53:14 +0100)

are available in the git repository at:

  git://git.linaro.org/people/riku.voipio/qemu.git tags/pull-linux-user-20160624

for you to fetch changes up to 895b3c88e8ed1017bbcf0aa9aaf3d90754494ab5:

  linux-user: Provide safe_syscall for ppc64 (2016-06-24 11:57:35 +0300)


Drop building linux-user targets on HPPA or m68k host systems
and add safe_syscall support for i386, aarch64, arm, ppc64 and
s390x.



Laurent Vivier (7):
  linux-user: add socketcall() strace
  linux-user: add socket() strace
  linux-user: fix clone() strace
  linux-user: update get_thread_area/set_thread_area strace
  linux-user: add missing return in netlink switch statement
  linux-user: fd_trans_host_to_target_data() must process only received
data
  linux-user: don't swap NLMSG_DATA() fields

Peter Maydell (11):
  linux-user: Avoid possible misalignment in host_to_target_siginfo()
  linux-user: Use __get_user() and __put_user() to handle structs in
do_fcntl()
  linux-user: Use safe_syscall wrapper for fcntl
  linux-user: Don't use sigfillset() on uc->uc_sigmask
  configure: Don't override ARCH=unknown if enabling TCI
  configure: Don't allow user-only targets for unknown CPU architectures
  user-exec: Delete now-unused hppa and m68k cpu_signal_handler() code
  user-exec: Remove unused code for OSX hosts
  linux-user: Create a hostdep.h for each host architecture
  linux-user: Fix wrong type used for argument to rt_sigqueueinfo
  linux-user: Support F_GETPIPE_SZ and F_SETPIPE_SZ fcntls

Richard Henderson (6):
  linux-user: fix x86_64 safe_syscall
  linux-user: Provide safe_syscall for i386
  linux-user: Provide safe_syscall for arm
  linux-user: Provide safe_syscall for aarch64
  linux-user: Provide safe_syscall for s390x
  linux-user: Provide safe_syscall for ppc64

 Makefile.target|   5 +-
 configure  |   8 +-
 linux-user/host/aarch64/hostdep.h  |  38 ++
 linux-user/host/aarch64/safe-syscall.inc.S |  75 
 linux-user/host/arm/hostdep.h  |  38 ++
 linux-user/host/arm/safe-syscall.inc.S |  90 +
 linux-user/host/generic/hostdep.h  |  20 -
 linux-user/host/i386/hostdep.h |  38 ++
 linux-user/host/i386/safe-syscall.inc.S| 112 ++
 linux-user/host/ia64/hostdep.h |  15 +
 linux-user/host/mips/hostdep.h |  15 +
 linux-user/host/ppc/hostdep.h  |  15 +
 linux-user/host/ppc64/hostdep.h|  38 ++
 linux-user/host/ppc64/safe-syscall.inc.S   |  92 +
 linux-user/host/s390/hostdep.h |  15 +
 linux-user/host/s390x/hostdep.h|  38 ++
 linux-user/host/s390x/safe-syscall.inc.S   |  90 +
 linux-user/host/sparc/hostdep.h|  15 +
 linux-user/host/sparc64/hostdep.h  |  15 +
 linux-user/host/x32/hostdep.h  |  15 +
 linux-user/host/x86_64/safe-syscall.inc.S  |   6 +-
 linux-user/qemu.h  |   5 +
 linux-user/signal.c|  23 +-
 linux-user/strace.c| 621 -
 linux-user/strace.list |  10 +-
 linux-user/syscall.c   | 419 ++-
 linux-user/syscall_defs.h  |  24 +-
 user-exec.c| 107 +
 28 files changed, 1657 insertions(+), 345 deletions(-)
 create mode 100644 linux-user/host/aarch64/hostdep.h
 create mode 100644 linux-user/host/aarch64/safe-syscall.inc.S
 create mode 100644 linux-user/host/arm/hostdep.h
 create mode 100644 linux-user/host/arm/safe-syscall.inc.S
 delete mode 100644 linux-user/host/generic/hostdep.h
 create mode 100644 linux-user/host/i386/hostdep.h
 create mode 100644 linux-user/host/i386/safe-syscall.inc.S
 create mode 100644 linux-user/host/ia64/hostdep.h
 create mode 100644 linux-user/host/mips/hostdep.h
 create mode 100644 linux-user/host/ppc/hostdep.h
 create mode 100644 linux-user/host/ppc64/hostdep.h
 create mode 100644 linux-user/host/ppc64/safe-syscall.inc.S
 create mode 100644 linux-user/host/s390/hostdep.h
 create mode 100644 linux-user/host/s390x/hostdep.h
 create mode 100644 linux-user/host/s390x/safe-syscall.inc.S
 create mode 100644 linux-user/host/sparc/hostdep.h
 create mode 100644 linux-user/host/sparc64/hostdep.h
 create mode 100644 linux-user/host/x32/hostdep.h

-- 
2.1.4

[Qemu-devel] [PULL 05/24] configure: Don't override ARCH=unknown if enabling TCI

2016-06-24 Thread riku . voipio

From: Peter Maydell 

At the moment if configure finds an unknown CPU it will set
ARCH to 'unknown', and then later either bail out or set it
to 'tci' (depending on whether the user passed configure the
--enable-tcg-interpreter switch). This is unnecessarily
confusing, because we could be using TCI in two cases:
 * a known host architecture (in which case ARCH is set to
   the actual host architecture, like 'i386')
 * an unknown host architecture (in which case ARCH is
   set to 'tci')
so nothing can rely on ARCH=tci to mean "using TCI".
Remove the line setting ARCH, so we leave it as "unknown",
which is what the actual situation is.

Signed-off-by: Peter Maydell 
Reviewed-by: Laurent Vivier 
Reviewed-by: Richard Henderson 
Signed-off-by: Riku Voipio 
---
 configure | 1 -
 1 file changed, 1 deletion(-)

diff --git a/configure b/configure
index 5929aba..6696316 100755
--- a/configure
+++ b/configure
@@ -1380,7 +1380,6 @@ fi
 if test "$ARCH" = "unknown"; then
 if test "$tcg_interpreter" = "yes" ; then
 echo "Unsupported CPU = $cpu, will use TCG with TCI (experimental)"
-ARCH=tci
 else
 error_exit "Unsupported CPU = $cpu, try --enable-tcg-interpreter"
 fi
-- 
2.1.4

[Qemu-devel] [PULL 04/24] linux-user: Don't use sigfillset() on uc->uc_sigmask

2016-06-24 Thread riku . voipio

From: Peter Maydell 

The kernel and libc have different ideas about what a sigset_t
is -- for the kernel it is only _NSIG / 8 bytes in size (usually
8 bytes), but for libc it is much larger, 128 bytes. In most
situations the difference doesn't matter, because if you pass a
pointer to a libc sigset_t to the kernel it just acts on the first
8 bytes of it, but for the ucontext_t* argument to a signal handler
it trips us up. The kernel allocates this ucontext_t on the stack
according to its idea of the sigset_t type, but the type of the
ucontext_t defined by the libc headers uses the libc type, and
so do the manipulator functions like sigfillset(). This means that
 (1) sizeof(uc->uc_sigmask) is much larger than the actual
 space used on the stack
 (2) sigfillset(&uc->uc_sigmask) will write garbage 0xff bytes
 off the end of the structure, which can trash data that
 was on the stack before the signal handler was invoked,
 and may result in a crash after the handler returns

To avoid this, we use a memset() of the correct size to fill
the signal mask rather than using the libc function.

This fixes a problem where we would crash at least some of the
time on an i386 host when a signal was taken.

Signed-off-by: Peter Maydell 
Reviewed-by: Laurent Vivier 
Signed-off-by: Riku Voipio 
---
 linux-user/qemu.h|  5 +
 linux-user/signal.c  | 10 +-
 linux-user/syscall.c |  5 -
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 56f29c3..e8a5aed 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -20,6 +20,11 @@
 
 #define THREAD __thread
 
+/* This is the size of the host kernel's sigset_t, needed where we make
+ * direct system calls that take a sigset_t pointer and a size.
+ */
+#define SIGSET_T_SIZE (_NSIG / 8)
+
 /* This struct is used to hold certain information about the image.
  * Basically, it replicates in user space what would be certain
  * task_struct fields in the kernel
diff --git a/linux-user/signal.c b/linux-user/signal.c
index e2d55ff..9d98045 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -636,8 +636,16 @@ static void host_signal_handler(int host_signum, siginfo_t 
*info,
  * code in case the guest code provokes one in the window between
  * now and it getting out to the main loop. Signals will be
  * unblocked again in process_pending_signals().
+ *
+ * WARNING: we cannot use sigfillset() here because the uc_sigmask
+ * field is a kernel sigset_t, which is much smaller than the
+ * libc sigset_t which sigfillset() operates on. Using sigfillset()
+ * would write 0xff bytes off the end of the structure and trash
+ * data on the struct.
+ * We can't use sizeof(uc->uc_sigmask) either, because the libc
+ * headers define the struct field with the wrong (too large) type.
  */
-sigfillset(&uc->uc_sigmask);
+memset(&uc->uc_sigmask, 0xff, SIGSET_T_SIZE);
 sigdelset(&uc->uc_sigmask, SIGSEGV);
 sigdelset(&uc->uc_sigmask, SIGBUS);
 
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8dc8c7a..95eafeb 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -123,11 +123,6 @@ int __clone2(int (*fn)(void *), void *child_stack_base,
 #defineVFAT_IOCTL_READDIR_BOTH _IOR('r', 1, struct 
linux_dirent [2])
 #defineVFAT_IOCTL_READDIR_SHORT_IOR('r', 2, struct 
linux_dirent [2])
 
-/* This is the size of the host kernel's sigset_t, needed where we make
- * direct system calls that take a sigset_t pointer and a size.
- */
-#define SIGSET_T_SIZE (_NSIG / 8)
-
 #undef _syscall0
 #undef _syscall1
 #undef _syscall2
-- 
2.1.4

[Qemu-devel] [PULL 03/24] linux-user: Use safe_syscall wrapper for fcntl

2016-06-24 Thread riku . voipio

From: Peter Maydell 

Use the safe_syscall wrapper for fcntl. This is straightforward now
that we always use 'struct fcntl64' on the host, as we don't need
to select whether to call the host's fcntl64 or fcntl syscall
(a detail that the libc previously hid for us).

Signed-off-by: Peter Maydell 
Reviewed-by: Laurent Vivier 
Signed-off-by: Riku Voipio 
---
 linux-user/syscall.c | 34 +++---
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index ee05405..8dc8c7a 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -783,6 +783,16 @@ safe_syscall5(int, mq_timedreceive, int, mqdes, char *, 
msg_ptr,
  * the libc function.
  */
 #define safe_ioctl(...) safe_syscall(__NR_ioctl, __VA_ARGS__)
+/* Similarly for fcntl. Note that callers must always:
+ *  pass the F_GETLK64 etc constants rather than the unsuffixed F_GETLK
+ *  use the flock64 struct rather than unsuffixed flock
+ * This will then work and use a 64-bit offset for both 32-bit and 64-bit 
hosts.
+ */
+#ifdef __NR_fcntl64
+#define safe_fcntl(...) safe_syscall(__NR_fcntl64, __VA_ARGS__)
+#else
+#define safe_fcntl(...) safe_syscall(__NR_fcntl, __VA_ARGS__)
+#endif
 
 static inline int host_to_target_sock_type(int host_type)
 {
@@ -5740,7 +5750,7 @@ static abi_long do_fcntl(int fd, int cmd, abi_ulong arg)
 if (ret) {
 return ret;
 }
-ret = get_errno(fcntl(fd, host_cmd, &fl64));
+ret = get_errno(safe_fcntl(fd, host_cmd, &fl64));
 if (ret == 0) {
 ret = copy_to_user_flock(arg, &fl64);
 }
@@ -5752,7 +5762,7 @@ static abi_long do_fcntl(int fd, int cmd, abi_ulong arg)
 if (ret) {
 return ret;
 }
-ret = get_errno(fcntl(fd, host_cmd, &fl64));
+ret = get_errno(safe_fcntl(fd, host_cmd, &fl64));
 break;
 
 case TARGET_F_GETLK64:
@@ -5760,7 +5770,7 @@ static abi_long do_fcntl(int fd, int cmd, abi_ulong arg)
 if (ret) {
 return ret;
 }
-ret = get_errno(fcntl(fd, host_cmd, &fl64));
+ret = get_errno(safe_fcntl(fd, host_cmd, &fl64));
 if (ret == 0) {
 ret = copy_to_user_flock64(arg, &fl64);
 }
@@ -5771,23 +5781,25 @@ static abi_long do_fcntl(int fd, int cmd, abi_ulong arg)
 if (ret) {
 return ret;
 }
-ret = get_errno(fcntl(fd, host_cmd, &fl64));
+ret = get_errno(safe_fcntl(fd, host_cmd, &fl64));
 break;
 
 case TARGET_F_GETFL:
-ret = get_errno(fcntl(fd, host_cmd, arg));
+ret = get_errno(safe_fcntl(fd, host_cmd, arg));
 if (ret >= 0) {
 ret = host_to_target_bitmask(ret, fcntl_flags_tbl);
 }
 break;
 
 case TARGET_F_SETFL:
-ret = get_errno(fcntl(fd, host_cmd, target_to_host_bitmask(arg, 
fcntl_flags_tbl)));
+ret = get_errno(safe_fcntl(fd, host_cmd,
+   target_to_host_bitmask(arg,
+  fcntl_flags_tbl)));
 break;
 
 #ifdef F_GETOWN_EX
 case TARGET_F_GETOWN_EX:
-ret = get_errno(fcntl(fd, host_cmd, &fox));
+ret = get_errno(safe_fcntl(fd, host_cmd, &fox));
 if (ret >= 0) {
 if (!lock_user_struct(VERIFY_WRITE, target_fox, arg, 0))
 return -TARGET_EFAULT;
@@ -5805,7 +5817,7 @@ static abi_long do_fcntl(int fd, int cmd, abi_ulong arg)
 fox.type = tswap32(target_fox->type);
 fox.pid = tswap32(target_fox->pid);
 unlock_user_struct(target_fox, arg, 0);
-ret = get_errno(fcntl(fd, host_cmd, &fox));
+ret = get_errno(safe_fcntl(fd, host_cmd, &fox));
 break;
 #endif
 
@@ -5815,11 +5827,11 @@ static abi_long do_fcntl(int fd, int cmd, abi_ulong arg)
 case TARGET_F_GETSIG:
 case TARGET_F_SETLEASE:
 case TARGET_F_GETLEASE:
-ret = get_errno(fcntl(fd, host_cmd, arg));
+ret = get_errno(safe_fcntl(fd, host_cmd, arg));
 break;
 
 default:
-ret = get_errno(fcntl(fd, cmd, arg));
+ret = get_errno(safe_fcntl(fd, cmd, arg));
 break;
 }
 return ret;
@@ -10252,7 +10264,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 if (ret) {
 break;
 }
-ret = get_errno(fcntl(arg1, cmd, &fl));
+ret = get_errno(safe_fcntl(arg1, cmd, &fl));
break;
 default:
 ret = do_fcntl(arg1, arg2, arg3);
-- 
2.1.4

[Qemu-devel] [PULL 11/24] linux-user: Support F_GETPIPE_SZ and F_SETPIPE_SZ fcntls

2016-06-24 Thread riku . voipio

From: Peter Maydell 

Support the F_GETPIPE_SZ and F_SETPIPE_SZ fcntl operations.

Signed-off-by: Peter Maydell 
Reviewed-by: Laurent Vivier 
Signed-off-by: Riku Voipio 
---
 linux-user/strace.c   | 7 +++
 linux-user/syscall.c  | 6 ++
 linux-user/syscall_defs.h | 2 ++
 3 files changed, 15 insertions(+)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 4046b81..6ef5d38 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -918,6 +918,13 @@ print_fcntl(const struct syscallname *name,
 case TARGET_F_GETLEASE:
 gemu_log("F_GETLEASE");
 break;
+case TARGET_F_SETPIPE_SZ:
+gemu_log("F_SETPIPE_SZ,");
+print_raw_param(TARGET_ABI_FMT_ld, arg2, 1);
+break;
+case TARGET_F_GETPIPE_SZ:
+gemu_log("F_GETPIPE_SZ");
+break;
 case TARGET_F_DUPFD_CLOEXEC:
 gemu_log("F_DUPFD_CLOEXEC,");
 print_raw_param(TARGET_ABI_FMT_ld, arg2, 1);
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 686ebfb..b1ed57c 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -5585,6 +5585,10 @@ static int target_to_host_fcntl_cmd(int cmd)
case TARGET_F_SETOWN_EX:
return F_SETOWN_EX;
 #endif
+case TARGET_F_SETPIPE_SZ:
+return F_SETPIPE_SZ;
+case TARGET_F_GETPIPE_SZ:
+return F_GETPIPE_SZ;
default:
 return -TARGET_EINVAL;
 }
@@ -5822,6 +5826,8 @@ static abi_long do_fcntl(int fd, int cmd, abi_ulong arg)
 case TARGET_F_GETSIG:
 case TARGET_F_SETLEASE:
 case TARGET_F_GETLEASE:
+case TARGET_F_SETPIPE_SZ:
+case TARGET_F_GETPIPE_SZ:
 ret = get_errno(safe_fcntl(fd, host_cmd, arg));
 break;
 
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 6ee9251..420463b 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -2166,6 +2166,8 @@ struct target_statfs64 {
 #define TARGET_F_SETLEASE (TARGET_F_LINUX_SPECIFIC_BASE + 0)
 #define TARGET_F_GETLEASE (TARGET_F_LINUX_SPECIFIC_BASE + 1)
 #define TARGET_F_DUPFD_CLOEXEC (TARGET_F_LINUX_SPECIFIC_BASE + 6)
+#define TARGET_F_SETPIPE_SZ (TARGET_F_LINUX_SPECIFIC_BASE + 7)
+#define TARGET_F_GETPIPE_SZ (TARGET_F_LINUX_SPECIFIC_BASE + 8)
 #define TARGET_F_NOTIFY  (TARGET_F_LINUX_SPECIFIC_BASE+2)
 
 #if defined(TARGET_ALPHA)
-- 
2.1.4

[Qemu-devel] [PULL 01/24] linux-user: Avoid possible misalignment in host_to_target_siginfo()

2016-06-24 Thread riku . voipio

From: Peter Maydell 

host_to_target_siginfo() is implemented by a combination of
host_to_target_siginfo_noswap() followed by tswap_siginfo().
The first of these two functions assumes that the target_siginfo_t
it is writing to is correctly aligned, but the pointer passed
into host_to_target_siginfo() is directly from the guest and
might be misaligned. Use a local variable to avoid this problem.
(tswap_siginfo() does now correctly handle a misaligned destination.)

We have to add a memset() to host_to_target_siginfo_noswap()
to avoid some false positive "may be used uninitialized" warnings
from gcc about subfields of the _sifields union if it chooses to
inline both tswap_siginfo() and host_to_target_siginfo_noswap()
into host_to_target_siginfo().

Signed-off-by: Peter Maydell 
Reviewed-by: Laurent Vivier 
Signed-off-by: Peter Maydell 
---
 linux-user/signal.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index 1dadddf..e2d55ff 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -278,6 +278,14 @@ static inline void 
host_to_target_siginfo_noswap(target_siginfo_t *tinfo,
 tinfo->si_errno = 0;
 tinfo->si_code = info->si_code;
 
+/* This memset serves two purposes:
+ * (1) ensure we don't leak random junk to the guest later
+ * (2) placate false positives from gcc about fields
+ * being used uninitialized if it chooses to inline both this
+ * function and tswap_siginfo() into host_to_target_siginfo().
+ */
+memset(tinfo->_sifields._pad, 0, sizeof(tinfo->_sifields._pad));
+
 /* This is awkward, because we have to use a combination of
  * the si_code and si_signo to figure out which of the union's
  * members are valid. (Within the host kernel it is always possible
@@ -397,8 +405,9 @@ static void tswap_siginfo(target_siginfo_t *tinfo,
 
 void host_to_target_siginfo(target_siginfo_t *tinfo, const siginfo_t *info)
 {
-host_to_target_siginfo_noswap(tinfo, info);
-tswap_siginfo(tinfo, tinfo);
+target_siginfo_t tgt_tmp;
+host_to_target_siginfo_noswap(&tgt_tmp, info);
+tswap_siginfo(tinfo, &tgt_tmp);
 }
 
 /* XXX: we support only POSIX RT signals are used. */
-- 
2.1.4

[Qemu-devel] [PULL 02/24] linux-user: Use __get_user() and __put_user() to handle structs in do_fcntl()

2016-06-24 Thread riku . voipio

From: Peter Maydell 

Use the __get_user() and __put_user() to handle reading and writing the
guest structures in do_ioctl(). This has two benefits:
 * avoids possible errors due to misaligned guest pointers
 * correctly sign extends signed fields (like l_start in struct flock)
   which might be different sizes between guest and host

To do this we abstract out into copy_from/to_user functions. We
also standardize on always using host flock64 and the F_GETLK64
etc flock commands, as this means we always have 64 bit offsets
whether the host is 64-bit or 32-bit and we don't need to support
conversion to both host struct flock and struct flock64.

In passing we fix errors in converting l_type from the host to
the target (where we were doing a byteswap of the host value
before trying to do the convert-bitmasks operation rather than
otherwise, and inexplicably shifting left by 1); these were
accidentally left over when the original simple "just shift by 1"
arm<->x86 conversion of commit 43f238d was changed to the more
general scheme of using target_to_host_bitmask() functions in 2ba7f73.

Signed-off-by: Peter Maydell 
Reviewed-by: Laurent Vivier 
Signed-off-by: Riku Voipio 
---
 linux-user/syscall.c | 298 ---
 1 file changed, 166 insertions(+), 132 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 1c17b74..ee05405 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -5541,11 +5541,11 @@ static int target_to_host_fcntl_cmd(int cmd)
case TARGET_F_SETFL:
 return cmd;
 case TARGET_F_GETLK:
-   return F_GETLK;
-   case TARGET_F_SETLK:
-   return F_SETLK;
-   case TARGET_F_SETLKW:
-   return F_SETLKW;
+return F_GETLK64;
+case TARGET_F_SETLK:
+return F_SETLK64;
+case TARGET_F_SETLKW:
+return F_SETLKW64;
case TARGET_F_GETOWN:
return F_GETOWN;
case TARGET_F_SETOWN:
@@ -5596,12 +5596,134 @@ static const bitmask_transtbl flock_tbl[] = {
 { 0, 0, 0, 0 }
 };
 
-static abi_long do_fcntl(int fd, int cmd, abi_ulong arg)
+static inline abi_long copy_from_user_flock(struct flock64 *fl,
+abi_ulong target_flock_addr)
 {
-struct flock fl;
 struct target_flock *target_fl;
+short l_type;
+
+if (!lock_user_struct(VERIFY_READ, target_fl, target_flock_addr, 1)) {
+return -TARGET_EFAULT;
+}
+
+__get_user(l_type, &target_fl->l_type);
+fl->l_type = target_to_host_bitmask(l_type, flock_tbl);
+__get_user(fl->l_whence, &target_fl->l_whence);
+__get_user(fl->l_start, &target_fl->l_start);
+__get_user(fl->l_len, &target_fl->l_len);
+__get_user(fl->l_pid, &target_fl->l_pid);
+unlock_user_struct(target_fl, target_flock_addr, 0);
+return 0;
+}
+
+static inline abi_long copy_to_user_flock(abi_ulong target_flock_addr,
+  const struct flock64 *fl)
+{
+struct target_flock *target_fl;
+short l_type;
+
+if (!lock_user_struct(VERIFY_WRITE, target_fl, target_flock_addr, 0)) {
+return -TARGET_EFAULT;
+}
+
+l_type = host_to_target_bitmask(fl->l_type, flock_tbl);
+__put_user(l_type, &target_fl->l_type);
+__put_user(fl->l_whence, &target_fl->l_whence);
+__put_user(fl->l_start, &target_fl->l_start);
+__put_user(fl->l_len, &target_fl->l_len);
+__put_user(fl->l_pid, &target_fl->l_pid);
+unlock_user_struct(target_fl, target_flock_addr, 1);
+return 0;
+}
+
+typedef abi_long from_flock64_fn(struct flock64 *fl, abi_ulong target_addr);
+typedef abi_long to_flock64_fn(abi_ulong target_addr, const struct flock64 
*fl);
+
+#ifdef TARGET_ARM
+static inline abi_long copy_from_user_eabi_flock64(struct flock64 *fl,
+   abi_ulong target_flock_addr)
+{
+struct target_eabi_flock64 *target_fl;
+short l_type;
+
+if (!lock_user_struct(VERIFY_READ, target_fl, target_flock_addr, 1)) {
+return -TARGET_EFAULT;
+}
+
+__get_user(l_type, &target_fl->l_type);
+fl->l_type = target_to_host_bitmask(l_type, flock_tbl);
+__get_user(fl->l_whence, &target_fl->l_whence);
+__get_user(fl->l_start, &target_fl->l_start);
+__get_user(fl->l_len, &target_fl->l_len);
+__get_user(fl->l_pid, &target_fl->l_pid);
+unlock_user_struct(target_fl, target_flock_addr, 0);
+return 0;
+}
+
+static inline abi_long copy_to_user_eabi_flock64(abi_ulong target_flock_addr,
+ const struct flock64 *fl)
+{
+struct target_eabi_flock64 *target_fl;
+short l_type;
+
+if (!lock_user_struct(VERIFY_WRITE, target_fl, target_flock_addr, 0)) {
+return -TARGET_EFAULT;
+}
+
+l_type = host_to_target_bitmask(fl->l_type, flock_tbl);
+__put_user(l_type, &target_fl->l_type);
+__put_user(fl->l_whence, &target_fl->l_whence);
+__put_

[Qemu-devel] [PULL 08/24] user-exec: Remove unused code for OSX hosts

2016-06-24 Thread riku . voipio

From: Peter Maydell 

Since we dropped darwin-user support many years ago, the code in
user-exec to support hosts which define __APPLE__ is unused; delete it.

Reviewed-by: Laurent Vivier 
Reviewed-by: Richard Henderson 
Signed-off-by: Riku Voipio 
Signed-off-by: Peter Maydell 
---
 user-exec.c | 47 +--
 1 file changed, 1 insertion(+), 46 deletions(-)

diff --git a/user-exec.c b/user-exec.c
index 1e2449e..95f9f97 100644
--- a/user-exec.c
+++ b/user-exec.c
@@ -117,14 +117,7 @@ static inline int handle_cpu_signal(uintptr_t pc, unsigned 
long address,
 
 #if defined(__i386__)
 
-#if defined(__APPLE__)
-#include 
-
-#define EIP_sig(context)  (*((unsigned long *)&(context)->uc_mcontext->ss.eip))
-#define TRAP_sig(context)((context)->uc_mcontext->es.trapno)
-#define ERROR_sig(context)   ((context)->uc_mcontext->es.err)
-#define MASK_sig(context)((context)->uc_sigmask)
-#elif defined(__NetBSD__)
+#if defined(__NetBSD__)
 #include 
 
 #define EIP_sig(context) ((context)->uc_mcontext.__gregs[_REG_EIP])
@@ -274,44 +267,6 @@ int cpu_signal_handler(int host_signum, void *pinfo,
 #define TRAP_sig(context)  ((context)->uc_mcontext.mc_exc)
 #endif /* __FreeBSD__|| __FreeBSD_kernel__ */
 
-#ifdef __APPLE__
-#include 
-typedef struct ucontext SIGCONTEXT;
-/* All Registers access - only for local access */
-#define REG_sig(reg_name, context)  \
-((context)->uc_mcontext->ss.reg_name)
-#define FLOATREG_sig(reg_name, context) \
-((context)->uc_mcontext->fs.reg_name)
-#define EXCEPREG_sig(reg_name, context) \
-((context)->uc_mcontext->es.reg_name)
-#define VECREG_sig(reg_name, context)   \
-((context)->uc_mcontext->vs.reg_name)
-/* Gpr Registers access */
-#define GPR_sig(reg_num, context)  REG_sig(r##reg_num, context)
-/* Program counter */
-#define IAR_sig(context)   REG_sig(srr0, context)
-/* Machine State Register (Supervisor) */
-#define MSR_sig(context)   REG_sig(srr1, context)
-#define CTR_sig(context)   REG_sig(ctr, context)
-/* Link register */
-#define XER_sig(context)   REG_sig(xer, context)
-/* User's integer exception register */
-#define LR_sig(context)REG_sig(lr, context)
-/* Condition register */
-#define CR_sig(context)REG_sig(cr, context)
-/* Float Registers access */
-#define FLOAT_sig(reg_num, context) \
-FLOATREG_sig(fpregs[reg_num], context)
-#define FPSCR_sig(context)  \
-((double)FLOATREG_sig(fpscr, context))
-/* Exception Registers access */
-/* Fault registers for coredump */
-#define DAR_sig(context)   EXCEPREG_sig(dar, context)
-#define DSISR_sig(context) EXCEPREG_sig(dsisr, context)
-/* number of powerpc exception taken */
-#define TRAP_sig(context)  EXCEPREG_sig(exception, context)
-#endif /* __APPLE__ */
-
 int cpu_signal_handler(int host_signum, void *pinfo,
void *puc)
 {
-- 
2.1.4

[Qemu-devel] [PULL 12/24] linux-user: add socketcall() strace

2016-06-24 Thread riku . voipio

From: Laurent Vivier 

Signed-off-by: Laurent Vivier 
Signed-off-by: Riku Voipio 
Reviewed-by: Peter Maydell 
---
 linux-user/strace.c   | 549 ++
 linux-user/strace.list|   2 +-
 linux-user/syscall_defs.h |  22 +-
 3 files changed, 568 insertions(+), 5 deletions(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 6ef5d38..c8df76f 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -5,6 +5,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 
 #include "qemu.h"
 
@@ -57,10 +60,15 @@ UNUSED static void print_open_flags(abi_long, int);
 UNUSED static void print_syscall_prologue(const struct syscallname *);
 UNUSED static void print_syscall_epilogue(const struct syscallname *);
 UNUSED static void print_string(abi_long, int);
+UNUSED static void print_buf(abi_long addr, abi_long len, int last);
 UNUSED static void print_raw_param(const char *, abi_long, int);
 UNUSED static void print_timeval(abi_ulong, int);
 UNUSED static void print_number(abi_long, int);
 UNUSED static void print_signal(abi_ulong, int);
+UNUSED static void print_sockaddr(abi_ulong addr, abi_long addrlen);
+UNUSED static void print_socket_domain(int domain);
+UNUSED static void print_socket_type(int type);
+UNUSED static void print_socket_protocol(int domain, int type, int protocol);
 
 /*
  * Utility functions
@@ -146,6 +154,165 @@ print_signal(abi_ulong arg, int last)
 gemu_log("%s%s", signal_name, get_comma(last));
 }
 
+static void
+print_sockaddr(abi_ulong addr, abi_long addrlen)
+{
+struct target_sockaddr *sa;
+int i;
+int sa_family;
+
+sa = lock_user(VERIFY_READ, addr, addrlen, 1);
+if (sa) {
+sa_family = tswap16(sa->sa_family);
+switch (sa_family) {
+case AF_UNIX: {
+struct target_sockaddr_un *un = (struct target_sockaddr_un *)sa;
+int i;
+gemu_log("{sun_family=AF_UNIX,sun_path=\"");
+for (i = 0; i < addrlen -
+offsetof(struct target_sockaddr_un, sun_path) &&
+ un->sun_path[i]; i++) {
+gemu_log("%c", un->sun_path[i]);
+}
+gemu_log("\"}");
+break;
+}
+case AF_INET: {
+struct target_sockaddr_in *in = (struct target_sockaddr_in *)sa;
+uint8_t *c = (uint8_t *)&in->sin_addr.s_addr;
+gemu_log("{sin_family=AF_INET,sin_port=htons(%d),",
+ ntohs(in->sin_port));
+gemu_log("sin_addr=inet_addr(\"%d.%d.%d.%d\")",
+ c[0], c[1], c[2], c[3]);
+gemu_log("}");
+break;
+}
+case AF_PACKET: {
+struct target_sockaddr_ll *ll = (struct target_sockaddr_ll *)sa;
+uint8_t *c = (uint8_t *)&ll->sll_addr;
+gemu_log("{sll_family=AF_PACKET,"
+ "sll_protocol=htons(0x%04x),if%d,pkttype=",
+ ntohs(ll->sll_protocol), ll->sll_ifindex);
+switch (ll->sll_pkttype) {
+case PACKET_HOST:
+gemu_log("PACKET_HOST");
+break;
+case PACKET_BROADCAST:
+gemu_log("PACKET_BROADCAST");
+break;
+case PACKET_MULTICAST:
+gemu_log("PACKET_MULTICAST");
+break;
+case PACKET_OTHERHOST:
+gemu_log("PACKET_OTHERHOST");
+break;
+case PACKET_OUTGOING:
+gemu_log("PACKET_OUTGOING");
+break;
+default:
+gemu_log("%d", ll->sll_pkttype);
+break;
+}
+gemu_log(",sll_addr=%02x:%02x:%02x:%02x:%02x:%02x:%02x:%02x",
+ c[0], c[1], c[2], c[3], c[4], c[5], c[6], c[7]);
+gemu_log("}");
+break;
+}
+default:
+gemu_log("{sa_family=%d, sa_data={", sa->sa_family);
+for (i = 0; i < 13; i++) {
+gemu_log("%02x, ", sa->sa_data[i]);
+}
+gemu_log("%02x}", sa->sa_data[i]);
+gemu_log("}");
+break;
+}
+unlock_user(sa, addr, 0);
+} else {
+print_raw_param("0x"TARGET_ABI_FMT_lx, addr, 0);
+}
+gemu_log(", "TARGET_ABI_FMT_ld, addrlen);
+}
+
+static void
+print_socket_domain(int domain)
+{
+switch (domain) {
+case PF_UNIX:
+gemu_log("PF_UNIX");
+break;
+case PF_INET:
+gemu_log("PF_INET");
+break;
+case PF_PACKET:
+gemu_log("PF_PACKET");
+break;
+default:
+gemu_log("%d", domain);
+break;
+}
+}
+
+static void
+print_socket_type(int type)
+{
+switch (type) {
+case TARGET_SOCK_DGRAM:
+gemu_log("SOCK_DGRAM");
+break;
+case TARGET_SOCK_STREAM:
+gemu_log("SOCK_STREAM");
+break;
+case TARGET_SOCK_RAW:
+gemu_log("SOCK_RA

[Qemu-devel] [PULL 06/24] configure: Don't allow user-only targets for unknown CPU architectures

2016-06-24 Thread riku . voipio

From: Peter Maydell 

For the user-only targets, we need to know something about the host CPU
architecture even if we are using the TCI interpreter rather than TCG.
(In particular user-exec.c has code for handling signals that needs
to know about that host's context structures.)

Specifically forbid building the user-only targets on unknown CPU
architectures, rather than allowing them to configure but then fail
when building user-exec.c.

This change drops supports for two configurations which were theoretically
possible before:
 * linux-user targets on M68K hosts using TCI
 * linux-user targets on HPPA hosts using TCI

We don't think anybody is actually trying to use these in practice, though:
 * interpreted TCG on a slow host CPU would be unusably slow
 * the m68k user-exec.c support is missing is_write detection so guest
   code which writes to the same page it is executing from was broken
   (will include any guest program using signals)
 * HPPA TCG backend support was dropped two and a half years ago
   with no complaints

Signed-off-by: Peter Maydell 
Reviewed-by: Laurent Vivier 
Reviewed-by: Richard Henderson 
Signed-off-by: Riku Voipio 
---
 configure | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/configure b/configure
index 6696316..dce20f0 100755
--- a/configure
+++ b/configure
@@ -1216,6 +1216,13 @@ esac
 QEMU_CFLAGS="$CPU_CFLAGS $QEMU_CFLAGS"
 EXTRA_CFLAGS="$CPU_CFLAGS $EXTRA_CFLAGS"
 
+# For user-mode emulation the host arch has to be one we explicitly
+# support, even if we're using TCI.
+if [ "$ARCH" = "unknown" ]; then
+  bsd_user="no"
+  linux_user="no"
+fi
+
 default_target_list=""
 
 mak_wilds=""
-- 
2.1.4

[Qemu-devel] [PULL 13/24] linux-user: add socket() strace

2016-06-24 Thread riku . voipio

From: Laurent Vivier 

Signed-off-by: Laurent Vivier 
Signed-off-by: Riku Voipio 
Reviewed-by: Peter Maydell 
---
 linux-user/strace.c| 23 +++
 linux-user/strace.list |  2 +-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index c8df76f..95f4338 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -1227,6 +1227,29 @@ print__llseek(const struct syscallname *name,
 }
 #endif
 
+#if defined(TARGET_NR_socket)
+static void
+print_socket(const struct syscallname *name,
+ abi_long arg0, abi_long arg1, abi_long arg2,
+ abi_long arg3, abi_long arg4, abi_long arg5)
+{
+abi_ulong domain = arg0, type = arg1, protocol = arg2;
+
+print_syscall_prologue(name);
+print_socket_domain(domain);
+gemu_log(",");
+print_socket_type(type);
+gemu_log(",");
+if (domain == AF_PACKET ||
+(domain == AF_INET && type == TARGET_SOCK_PACKET)) {
+protocol = tswap16(protocol);
+}
+print_socket_protocol(domain, type, protocol);
+print_syscall_epilogue(name);
+}
+
+#endif
+
 #if defined(TARGET_NR_socketcall)
 
 #define get_user_ualx(x, gaddr, idx) \
diff --git a/linux-user/strace.list b/linux-user/strace.list
index b379497..7c54dc6 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -1291,7 +1291,7 @@
 { TARGET_NR_sigsuspend, "sigsuspend" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_socket
-{ TARGET_NR_socket, "socket" , NULL, NULL, NULL },
+{ TARGET_NR_socket, "socket" , NULL, print_socket, NULL },
 #endif
 #ifdef TARGET_NR_socketcall
 { TARGET_NR_socketcall, "socketcall" , NULL, print_socketcall, NULL },
-- 
2.1.4

[Qemu-devel] [PULL 07/24] user-exec: Delete now-unused hppa and m68k cpu_signal_handler() code

2016-06-24 Thread riku . voipio

From: Peter Maydell 

Now that configure blocks attempts to build user-mode code on hppa
and m68k hosts, we can delete the cpu_signal_handler() implementations
for those architectures.

Signed-off-by: Peter Maydell 
Reviewed-by: Laurent Vivier 
Reviewed-by: Richard Henderson 
Signed-off-by: Riku Voipio 
---
 user-exec.c | 60 
 1 file changed, 60 deletions(-)

diff --git a/user-exec.c b/user-exec.c
index 50e95a6..1e2449e 100644
--- a/user-exec.c
+++ b/user-exec.c
@@ -494,24 +494,6 @@ int cpu_signal_handler(int host_signum, void *pinfo, void 
*puc)
  is_write, &uc->uc_sigmask);
 }
 
-#elif defined(__mc68000)
-
-int cpu_signal_handler(int host_signum, void *pinfo,
-   void *puc)
-{
-siginfo_t *info = pinfo;
-struct ucontext *uc = puc;
-unsigned long pc;
-int is_write;
-
-pc = uc->uc_mcontext.gregs[16];
-/* XXX: compute is_write */
-is_write = 0;
-return handle_cpu_signal(pc, (unsigned long)info->si_addr,
- is_write,
- &uc->uc_sigmask);
-}
-
 #elif defined(__ia64)
 
 #ifndef __ISR_VALID
@@ -616,48 +598,6 @@ int cpu_signal_handler(int host_signum, void *pinfo,
  is_write, &uc->uc_sigmask);
 }
 
-#elif defined(__hppa__)
-
-int cpu_signal_handler(int host_signum, void *pinfo,
-   void *puc)
-{
-siginfo_t *info = pinfo;
-struct ucontext *uc = puc;
-unsigned long pc = uc->uc_mcontext.sc_iaoq[0];
-uint32_t insn = *(uint32_t *)pc;
-int is_write = 0;
-
-/* XXX: need kernel patch to get write flag faster.  */
-switch (insn >> 26) {
-case 0x1a: /* STW */
-case 0x19: /* STH */
-case 0x18: /* STB */
-case 0x1b: /* STWM */
-is_write = 1;
-break;
-
-case 0x09: /* CSTWX, FSTWX, FSTWS */
-case 0x0b: /* CSTDX, FSTDX, FSTDS */
-/* Distinguish from coprocessor load ... */
-is_write = (insn >> 9) & 1;
-break;
-
-case 0x03:
-switch ((insn >> 6) & 15) {
-case 0xa: /* STWS */
-case 0x9: /* STHS */
-case 0x8: /* STBS */
-case 0xe: /* STWAS */
-case 0xc: /* STBYS */
-is_write = 1;
-}
-break;
-}
-
-return handle_cpu_signal(pc, (unsigned long)info->si_addr,
- is_write, &uc->uc_sigmask);
-}
-
 #else
 
 #error host CPU specific signal handler needed
-- 
2.1.4

[Qemu-devel] [PULL 14/24] linux-user: fix clone() strace

2016-06-24 Thread riku . voipio

From: Laurent Vivier 

Signed-off-by: Laurent Vivier 
Signed-off-by: Riku Voipio 
Reviewed-by: Peter Maydell 
---
 linux-user/strace.c | 42 --
 1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 95f4338..cc10dc4 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -957,33 +957,31 @@ print_chmod(const struct syscallname *name,
 #endif
 
 #ifdef TARGET_NR_clone
+static void do_print_clone(unsigned int flags, abi_ulong newsp,
+   abi_ulong parent_tidptr, target_ulong newtls,
+   abi_ulong child_tidptr)
+{
+print_flags(clone_flags, flags, 0);
+print_raw_param("child_stack=0x" TARGET_ABI_FMT_lx, newsp, 0);
+print_raw_param("parent_tidptr=0x" TARGET_ABI_FMT_lx, parent_tidptr, 0);
+print_raw_param("tls=0x" TARGET_ABI_FMT_lx, newtls, 0);
+print_raw_param("child_tidptr=0x" TARGET_ABI_FMT_lx, child_tidptr, 1);
+}
+
 static void
 print_clone(const struct syscallname *name,
-abi_long arg0, abi_long arg1, abi_long arg2,
-abi_long arg3, abi_long arg4, abi_long arg5)
+abi_long arg1, abi_long arg2, abi_long arg3,
+abi_long arg4, abi_long arg5, abi_long arg6)
 {
 print_syscall_prologue(name);
-#if defined(TARGET_M68K)
-print_flags(clone_flags, arg0, 0);
-print_raw_param("newsp=0x" TARGET_ABI_FMT_lx, arg1, 1);
-#elif defined(TARGET_SH4) || defined(TARGET_ALPHA)
-print_flags(clone_flags, arg0, 0);
-print_raw_param("child_stack=0x" TARGET_ABI_FMT_lx, arg1, 0);
-print_raw_param("parent_tidptr=0x" TARGET_ABI_FMT_lx, arg2, 0);
-print_raw_param("child_tidptr=0x" TARGET_ABI_FMT_lx, arg3, 0);
-print_raw_param("tls=0x" TARGET_ABI_FMT_lx, arg4, 1);
-#elif defined(TARGET_CRIS)
-print_raw_param("child_stack=0x" TARGET_ABI_FMT_lx, arg0, 0);
-print_flags(clone_flags, arg1, 0);
-print_raw_param("parent_tidptr=0x" TARGET_ABI_FMT_lx, arg2, 0);
-print_raw_param("tls=0x" TARGET_ABI_FMT_lx, arg3, 0);
-print_raw_param("child_tidptr=0x" TARGET_ABI_FMT_lx, arg4, 1);
+#if defined(TARGET_MICROBLAZE)
+do_print_clone(arg1, arg2, arg4, arg6, arg5);
+#elif defined(TARGET_CLONE_BACKWARDS)
+do_print_clone(arg1, arg2, arg3, arg4, arg5);
+#elif defined(TARGET_CLONE_BACKWARDS2)
+do_print_clone(arg2, arg1, arg3, arg5, arg4);
 #else
-print_flags(clone_flags, arg0, 0);
-print_raw_param("child_stack=0x" TARGET_ABI_FMT_lx, arg1, 0);
-print_raw_param("parent_tidptr=0x" TARGET_ABI_FMT_lx, arg2, 0);
-print_raw_param("tls=0x" TARGET_ABI_FMT_lx, arg3, 0);
-print_raw_param("child_tidptr=0x" TARGET_ABI_FMT_lx, arg4, 1);
+do_print_clone(arg1, arg2, arg3, arg5, arg4);
 #endif
 print_syscall_epilogue(name);
 }
-- 
2.1.4

[Qemu-devel] [PULL 15/24] linux-user: update get_thread_area/set_thread_area strace

2016-06-24 Thread riku . voipio

From: Laurent Vivier 

   int get_thread_area(struct user_desc *u_info);
   int set_thread_area(struct user_desc *u_info);

Signed-off-by: Laurent Vivier 
Signed-off-by: Riku Voipio 
Reviewed-by: Peter Maydell 
---
 linux-user/strace.list | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index 7c54dc6..aa967a2 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -337,7 +337,8 @@
 { TARGET_NR_getsockopt, "getsockopt" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_get_thread_area
-{ TARGET_NR_get_thread_area, "get_thread_area" , NULL, NULL, NULL },
+{ TARGET_NR_get_thread_area, "get_thread_area", "%s(0x"TARGET_ABI_FMT_lx")",
+  NULL, NULL },
 #endif
 #ifdef TARGET_NR_gettid
 { TARGET_NR_gettid, "gettid" , NULL, NULL, NULL },
@@ -1234,7 +1235,8 @@
 { TARGET_NR_setsockopt, "setsockopt" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_set_thread_area
-{ TARGET_NR_set_thread_area, "set_thread_area" , NULL, NULL, NULL },
+{ TARGET_NR_set_thread_area, "set_thread_area", "%s(0x"TARGET_ABI_FMT_lx")",
+  NULL, NULL },
 #endif
 #ifdef TARGET_NR_set_tid_address
 { TARGET_NR_set_tid_address, "set_tid_address" , NULL, NULL, NULL },
-- 
2.1.4

[Qemu-devel] [PULL 17/24] linux-user: fd_trans_host_to_target_data() must process only received data

2016-06-24 Thread riku . voipio

From: Laurent Vivier 

if we process the whole buffer, the netlink helpers can try
to swap invalid data.

Signed-off-by: Laurent Vivier 
Signed-off-by: Riku Voipio 
Reviewed-by: Peter Maydell 
---
 linux-user/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index ce9f020..b635127 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -2991,7 +2991,7 @@ static abi_long do_sendrecvmsg_locked(int fd, struct 
target_msghdr *msgp,
 len = ret;
 if (fd_trans_host_to_target_data(fd)) {
 ret = fd_trans_host_to_target_data(fd)(msg.msg_iov->iov_base,
-   msg.msg_iov->iov_len);
+   len);
 } else {
 ret = host_to_target_cmsg(msgp, &msg);
 }
-- 
2.1.4

[Qemu-devel] [PULL 09/24] linux-user: Create a hostdep.h for each host architecture

2016-06-24 Thread riku . voipio

From: Peter Maydell 

In commit 4d330cee37a21 a new hostdep.h file was added, with the intent
that host architectures which needed one could provide it, and the
build system would automatically fall back to a generic version if
there was no version for the host architecture. Although this works,
it has a flaw: if a subsequent commit switches an architecture from
"uses generic/hostdep.h" to "uses its own hostdep.h" nothing in the
makefile dependencies notices this and so doing a rebuild without
a manual 'make clean' will fail.

So we drop the idea of having a 'generic' version in favour of
every architecture we support having its own hostdep.h, even if
it doesn't have anything in it. (There are only thirteen of these.)

If the dependency files claim that an object file depends on a
nonexistent file, our dependency system means that make will
rebuild the object file, and regenerate the dependencies in
the process. So moving between trees prior to this commit and
trees after this commit works without requiring a 'make clean'.

Signed-off-by: Peter Maydell 
Reviewed-by: Laurent Vivier 
Reviewed-by: Richard Henderson 
Signed-off-by: Riku Voipio 
---
 Makefile.target   |  5 +
 linux-user/host/aarch64/hostdep.h | 15 +++
 linux-user/host/arm/hostdep.h | 15 +++
 linux-user/host/generic/hostdep.h | 20 
 linux-user/host/i386/hostdep.h| 15 +++
 linux-user/host/ia64/hostdep.h| 15 +++
 linux-user/host/mips/hostdep.h| 15 +++
 linux-user/host/ppc/hostdep.h | 15 +++
 linux-user/host/ppc64/hostdep.h   | 15 +++
 linux-user/host/s390/hostdep.h| 15 +++
 linux-user/host/s390x/hostdep.h   | 15 +++
 linux-user/host/sparc/hostdep.h   | 15 +++
 linux-user/host/sparc64/hostdep.h | 15 +++
 linux-user/host/x32/hostdep.h | 15 +++
 14 files changed, 181 insertions(+), 24 deletions(-)
 create mode 100644 linux-user/host/aarch64/hostdep.h
 create mode 100644 linux-user/host/arm/hostdep.h
 delete mode 100644 linux-user/host/generic/hostdep.h
 create mode 100644 linux-user/host/i386/hostdep.h
 create mode 100644 linux-user/host/ia64/hostdep.h
 create mode 100644 linux-user/host/mips/hostdep.h
 create mode 100644 linux-user/host/ppc/hostdep.h
 create mode 100644 linux-user/host/ppc64/hostdep.h
 create mode 100644 linux-user/host/s390/hostdep.h
 create mode 100644 linux-user/host/s390x/hostdep.h
 create mode 100644 linux-user/host/sparc/hostdep.h
 create mode 100644 linux-user/host/sparc64/hostdep.h
 create mode 100644 linux-user/host/x32/hostdep.h

diff --git a/Makefile.target b/Makefile.target
index d720b3e..a440bcb 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -108,11 +108,8 @@ obj-$(CONFIG_LIBDECNUMBER) += libdecnumber/dpd/decimal128.o
 
 ifdef CONFIG_LINUX_USER
 
-# Note that we only add linux-user/host/$ARCH if it exists, and
-# that it must come before linux-user/host/generic in the search path.
 QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR) \
- $(patsubst %,-I%,$(wildcard $(SRC_PATH)/linux-user/host/$(ARCH))) 
\
- -I$(SRC_PATH)/linux-user/host/generic \
+ -I$(SRC_PATH)/linux-user/host/$(ARCH) \
  -I$(SRC_PATH)/linux-user
 
 obj-y += linux-user/
diff --git a/linux-user/host/aarch64/hostdep.h 
b/linux-user/host/aarch64/hostdep.h
new file mode 100644
index 000..7609bf5
--- /dev/null
+++ b/linux-user/host/aarch64/hostdep.h
@@ -0,0 +1,15 @@
+/*
+ * hostdep.h : things which are dependent on the host architecture
+ *
+ *  * Written by Peter Maydell 
+ *
+ * Copyright (C) 2016 Linaro Limited
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_HOSTDEP_H
+#define QEMU_HOSTDEP_H
+
+#endif
diff --git a/linux-user/host/arm/hostdep.h b/linux-user/host/arm/hostdep.h
new file mode 100644
index 000..7609bf5
--- /dev/null
+++ b/linux-user/host/arm/hostdep.h
@@ -0,0 +1,15 @@
+/*
+ * hostdep.h : things which are dependent on the host architecture
+ *
+ *  * Written by Peter Maydell 
+ *
+ * Copyright (C) 2016 Linaro Limited
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_HOSTDEP_H
+#define QEMU_HOSTDEP_H
+
+#endif
diff --git a/linux-user/host/generic/hostdep.h 
b/linux-user/host/generic/hostdep.h
deleted file mode 100644
index cfabc35..000
--- a/linux-user/host/generic/hostdep.h
+++ /dev/null
@@ -1,20 +0,0 @@
-/*
- * hostdep.h : fallback generic version of header for things
- * which are dependent on the host architecture
- *
- *  * Written by Peter Maydell 
- *
- * Copyright (C) 2016 Linaro Limited
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- */
-
-#ifndef QEMU_HOSTDEP_H

Re: [Qemu-devel] Qemu and heavily increased RSS usage

2016-06-24 Thread Peter Lieven

Am 24.06.2016 um 11:37 schrieb Stefan Hajnoczi:
> On Wed, Jun 22, 2016 at 09:56:06PM +0100, Peter Maydell wrote:
>> On 22 June 2016 at 20:55, Peter Lieven  wrote:
>>> What makes the coroutine pool memory intensive is the stack size of 1MB per
>>> coroutine. Is it really necessary to have such a big stack?
>> That reminds me that I was wondering if we should allocate
>> our coroutine stacks with MAP_GROWSDOWN (though if we're
>> not actually using 1MB of stack then it's only going to
>> be eating virtual memory, not necessarily real memory.)
> Yes, MAP_GROWSDOWN will not reduce RSS.

Yes, I can confirm just tested...

>
> It's possible that we can reduce RSS usage of the coroutine pool but it
> will require someone to profile the pool usage patterns.

It would be interesting to see what stack size we really need. Is it possible
to automatically detect this value (at compile time?)

I can also confirm that the coroutine pool is the second major RSS user beside
heap fragmentation.

Lowering the mmap threshold of malloc to about 32k also gives good results.
In this case there are very few active mappings in the running vServer, but the
RSS is still at about 50MB (without coroutine pool). Maybe it would be good
to identify which parts of Qemu malloc lets say >16kB and convert them to mmap
if it is feasible.

Peter

[Qemu-devel] [PULL 18/24] linux-user: don't swap NLMSG_DATA() fields

2016-06-24 Thread riku . voipio

From: Laurent Vivier 

If the structure pointed by NLMSG_DATA() is bigger
than the size of NLMSG_DATA(), don't swap its fields
to avoid memory corruption.

Signed-off-by: Laurent Vivier 
Signed-off-by: Riku Voipio 
Reviewed-by: Peter Maydell 
---
 linux-user/syscall.c | 72 ++--
 1 file changed, 42 insertions(+), 30 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b635127..52f88e2 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1948,29 +1948,35 @@ static abi_long host_to_target_data_route(struct 
nlmsghdr *nlh)
 case RTM_NEWLINK:
 case RTM_DELLINK:
 case RTM_GETLINK:
-ifi = NLMSG_DATA(nlh);
-ifi->ifi_type = tswap16(ifi->ifi_type);
-ifi->ifi_index = tswap32(ifi->ifi_index);
-ifi->ifi_flags = tswap32(ifi->ifi_flags);
-ifi->ifi_change = tswap32(ifi->ifi_change);
-host_to_target_link_rtattr(IFLA_RTA(ifi),
-   nlmsg_len - NLMSG_LENGTH(sizeof(*ifi)));
+if (nlh->nlmsg_len >= NLMSG_LENGTH(sizeof(*ifi))) {
+ifi = NLMSG_DATA(nlh);
+ifi->ifi_type = tswap16(ifi->ifi_type);
+ifi->ifi_index = tswap32(ifi->ifi_index);
+ifi->ifi_flags = tswap32(ifi->ifi_flags);
+ifi->ifi_change = tswap32(ifi->ifi_change);
+host_to_target_link_rtattr(IFLA_RTA(ifi),
+   nlmsg_len - NLMSG_LENGTH(sizeof(*ifi)));
+}
 break;
 case RTM_NEWADDR:
 case RTM_DELADDR:
 case RTM_GETADDR:
-ifa = NLMSG_DATA(nlh);
-ifa->ifa_index = tswap32(ifa->ifa_index);
-host_to_target_addr_rtattr(IFA_RTA(ifa),
-   nlmsg_len - NLMSG_LENGTH(sizeof(*ifa)));
+if (nlh->nlmsg_len >= NLMSG_LENGTH(sizeof(*ifa))) {
+ifa = NLMSG_DATA(nlh);
+ifa->ifa_index = tswap32(ifa->ifa_index);
+host_to_target_addr_rtattr(IFA_RTA(ifa),
+   nlmsg_len - NLMSG_LENGTH(sizeof(*ifa)));
+}
 break;
 case RTM_NEWROUTE:
 case RTM_DELROUTE:
 case RTM_GETROUTE:
-rtm = NLMSG_DATA(nlh);
-rtm->rtm_flags = tswap32(rtm->rtm_flags);
-host_to_target_route_rtattr(RTM_RTA(rtm),
-nlmsg_len - NLMSG_LENGTH(sizeof(*rtm)));
+if (nlh->nlmsg_len >= NLMSG_LENGTH(sizeof(*rtm))) {
+rtm = NLMSG_DATA(nlh);
+rtm->rtm_flags = tswap32(rtm->rtm_flags);
+host_to_target_route_rtattr(RTM_RTA(rtm),
+nlmsg_len - 
NLMSG_LENGTH(sizeof(*rtm)));
+}
 break;
 default:
 return -TARGET_EINVAL;
@@ -2086,30 +2092,36 @@ static abi_long target_to_host_data_route(struct 
nlmsghdr *nlh)
 break;
 case RTM_NEWLINK:
 case RTM_DELLINK:
-ifi = NLMSG_DATA(nlh);
-ifi->ifi_type = tswap16(ifi->ifi_type);
-ifi->ifi_index = tswap32(ifi->ifi_index);
-ifi->ifi_flags = tswap32(ifi->ifi_flags);
-ifi->ifi_change = tswap32(ifi->ifi_change);
-target_to_host_link_rtattr(IFLA_RTA(ifi), nlh->nlmsg_len -
-   NLMSG_LENGTH(sizeof(*ifi)));
+if (nlh->nlmsg_len >= NLMSG_LENGTH(sizeof(*ifi))) {
+ifi = NLMSG_DATA(nlh);
+ifi->ifi_type = tswap16(ifi->ifi_type);
+ifi->ifi_index = tswap32(ifi->ifi_index);
+ifi->ifi_flags = tswap32(ifi->ifi_flags);
+ifi->ifi_change = tswap32(ifi->ifi_change);
+target_to_host_link_rtattr(IFLA_RTA(ifi), nlh->nlmsg_len -
+   NLMSG_LENGTH(sizeof(*ifi)));
+}
 break;
 case RTM_GETADDR:
 case RTM_NEWADDR:
 case RTM_DELADDR:
-ifa = NLMSG_DATA(nlh);
-ifa->ifa_index = tswap32(ifa->ifa_index);
-target_to_host_addr_rtattr(IFA_RTA(ifa), nlh->nlmsg_len -
-   NLMSG_LENGTH(sizeof(*ifa)));
+if (nlh->nlmsg_len >= NLMSG_LENGTH(sizeof(*ifa))) {
+ifa = NLMSG_DATA(nlh);
+ifa->ifa_index = tswap32(ifa->ifa_index);
+target_to_host_addr_rtattr(IFA_RTA(ifa), nlh->nlmsg_len -
+   NLMSG_LENGTH(sizeof(*ifa)));
+}
 break;
 case RTM_GETROUTE:
 break;
 case RTM_NEWROUTE:
 case RTM_DELROUTE:
-rtm = NLMSG_DATA(nlh);
-rtm->rtm_flags = tswap32(rtm->rtm_flags);
-target_to_host_route_rtattr(RTM_RTA(rtm), nlh->nlmsg_len -
-NLMSG_LENGTH(sizeof(*rtm)));
+if (nlh->nlmsg_len >= NLMSG_LENGTH(sizeof(*rtm))) {
+rtm = NLMSG_DATA(nlh);
+rtm->rtm_flags = tswap32(rtm->rtm_flags);
+target_to_host_route_rtattr(RTM_RTA(rtm), nlh->nlmsg_len -
+NLMSG_LENGTH(sizeof(*rtm)));
+}
 break;

[Qemu-devel] [PULL 10/24] linux-user: Fix wrong type used for argument to rt_sigqueueinfo

2016-06-24 Thread riku . voipio

From: Peter Maydell 

The third argument to the rt_sigqueueinfo syscall is a pointer to
a siginfo_t, not a pointer to a sigset_t. Fix the error in the
arguments to lock_user(), which meant that we would not have
detected some faults that we should.

Signed-off-by: Peter Maydell 
Reviewed-by: Laurent Vivier 
Signed-off-by: Riku Voipio 
---
 linux-user/syscall.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 95eafeb..686ebfb 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7876,8 +7876,11 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 case TARGET_NR_rt_sigqueueinfo:
 {
 siginfo_t uinfo;
-if (!(p = lock_user(VERIFY_READ, arg3, sizeof(target_sigset_t), 
1)))
+
+p = lock_user(VERIFY_READ, arg3, sizeof(target_siginfo_t), 1);
+if (!p) {
 goto efault;
+}
 target_to_host_siginfo(&uinfo, p);
 unlock_user(p, arg1, 0);
 ret = get_errno(sys_rt_sigqueueinfo(arg1, arg2, &uinfo));
-- 
2.1.4

Re: [Qemu-devel] [Qemu-block] [PATCH] block/qdev: Fix NULL access when using BB twice

2016-06-24 Thread Stefan Hajnoczi

On Thu, Jun 23, 2016 at 09:30:01AM +0200, Kevin Wolf wrote:
> BlockBackend has only a single pointer to its guest device, so it makes
> sure that only a single guest device is attached to it. device-add
> returns an error if you try to attach a second device to a BB. In order
> to make the error message nicer, -device that manually connects to a
> if=none block device get a different message than -drive that implicitly
> creates a guest device. The if=... option is stored in DriveInfo.
> 
> However, since blockdev-add exists, not every BlockBackend has a
> DriveInfo any more. Check that it exists before we dereference it.
> 
> QMP reproducer resulting in a segfault:
> 
> {"execute":"blockdev-add","arguments":{"options":{"id":"disk","driver":"file","filename":"/tmp/test.img"}}}
> {"execute":"device_add","arguments":{"driver":"virtio-blk-pci","drive":"disk"}}
> {"execute":"device_add","arguments":{"driver":"virtio-blk-pci","drive":"disk"}}
> 
> Signed-off-by: Kevin Wolf 
> ---
>  hw/core/qdev-properties-system.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

[Qemu-devel] [PULL 16/24] linux-user: add missing return in netlink switch statement

2016-06-24 Thread riku . voipio

From: Laurent Vivier 

Reported-by: Peter Maydell 
Signed-off-by: Laurent Vivier 
Signed-off-by: Riku Voipio 
Reviewed-by: Peter Maydell 
---
 linux-user/syscall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b1ed57c..ce9f020 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1692,6 +1692,7 @@ static abi_long target_to_host_for_each_nlmsg(struct 
nlmsghdr *nlh,
 struct nlmsgerr *e = NLMSG_DATA(nlh);
 e->error = tswap32(e->error);
 tswap_nlmsghdr(&e->msg);
+return 0;
 }
 default:
 ret = target_to_host_nlmsg(nlh);
-- 
2.1.4

[Qemu-devel] [PULL 19/24] linux-user: fix x86_64 safe_syscall

2016-06-24 Thread riku . voipio

From: Richard Henderson 

Do what the comment says, test for signal_pending non-zero,
rather than the current code which tests for bit 0 non-zero.

Signed-off-by: Richard Henderson 
Signed-off-by: Riku Voipio 
Reviewed-by: Peter Maydell 
---
 linux-user/host/x86_64/safe-syscall.inc.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/linux-user/host/x86_64/safe-syscall.inc.S 
b/linux-user/host/x86_64/safe-syscall.inc.S
index e09368d..f36992d 100644
--- a/linux-user/host/x86_64/safe-syscall.inc.S
+++ b/linux-user/host/x86_64/safe-syscall.inc.S
@@ -67,8 +67,8 @@ safe_syscall_base:
  */
 safe_syscall_start:
 /* if signal_pending is non-zero, don't do the call */
-testl   $1, (%rbp)
-jnz return_ERESTARTSYS
+cmpl   $0, (%rbp)
+jnz 1f
 syscall
 safe_syscall_end:
 /* code path for having successfully executed the syscall */
@@ -78,7 +78,7 @@ safe_syscall_end:
 .cfi_restore rbp
 ret
 
-return_ERESTARTSYS:
+1:
 /* code path when we didn't execute the syscall */
 .cfi_restore_state
 mov $-TARGET_ERESTARTSYS, %rax
-- 
2.1.4

[Qemu-devel] [PULL 22/24] linux-user: Provide safe_syscall for aarch64

2016-06-24 Thread riku . voipio

From: Richard Henderson 

Signed-off-by: Richard Henderson 
Reviewed-by: Peter Maydell 
Signed-off-by: Riku Voipio 
[RV] Updated syscall argument comment to match code
---
 linux-user/host/aarch64/hostdep.h  | 23 +
 linux-user/host/aarch64/safe-syscall.inc.S | 75 ++
 2 files changed, 98 insertions(+)
 create mode 100644 linux-user/host/aarch64/safe-syscall.inc.S

diff --git a/linux-user/host/aarch64/hostdep.h 
b/linux-user/host/aarch64/hostdep.h
index 7609bf5..b79eaf1 100644
--- a/linux-user/host/aarch64/hostdep.h
+++ b/linux-user/host/aarch64/hostdep.h
@@ -12,4 +12,27 @@
 #ifndef QEMU_HOSTDEP_H
 #define QEMU_HOSTDEP_H
 
+/* We have a safe-syscall.inc.S */
+#define HAVE_SAFE_SYSCALL
+
+#ifndef __ASSEMBLER__
+
+/* These are defined by the safe-syscall.inc.S file */
+extern char safe_syscall_start[];
+extern char safe_syscall_end[];
+
+/* Adjust the signal context to rewind out of safe-syscall if we're in it */
+static inline void rewind_if_in_safe_syscall(void *puc)
+{
+struct ucontext *uc = puc;
+__u64 *pcreg = &uc->uc_mcontext.pc;
+
+if (*pcreg > (uintptr_t)safe_syscall_start
+&& *pcreg < (uintptr_t)safe_syscall_end) {
+*pcreg = (uintptr_t)safe_syscall_start;
+}
+}
+
+#endif /* __ASSEMBLER__ */
+
 #endif
diff --git a/linux-user/host/aarch64/safe-syscall.inc.S 
b/linux-user/host/aarch64/safe-syscall.inc.S
new file mode 100644
index 000..58a2329
--- /dev/null
+++ b/linux-user/host/aarch64/safe-syscall.inc.S
@@ -0,0 +1,75 @@
+/*
+ * safe-syscall.inc.S : host-specific assembly fragment
+ * to handle signals occurring at the same time as system calls.
+ * This is intended to be included by linux-user/safe-syscall.S
+ *
+ * Written by Richard Henderson 
+ * Copyright (C) 2016 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+   .global safe_syscall_base
+   .global safe_syscall_start
+   .global safe_syscall_end
+   .type   safe_syscall_base, #function
+   .type   safe_syscall_start, #function
+   .type   safe_syscall_end, #function
+
+   /* This is the entry point for making a system call. The calling
+* convention here is that of a C varargs function with the
+* first argument an 'int *' to the signal_pending flag, the
+* second one the system call number (as a 'long'), and all further
+* arguments being syscall arguments (also 'long').
+* We return a long which is the syscall's return value, which
+* may be negative-errno on failure. Conversion to the
+* -1-and-errno-set convention is done by the calling wrapper.
+*/
+safe_syscall_base:
+   .cfi_startproc
+   /* The syscall calling convention isn't the same as the
+* C one:
+* we enter with x0 == *signal_pending
+*   x1 == syscall number
+*   x2 ... x7, (stack) == syscall arguments
+*   and return the result in x0
+* and the syscall instruction needs
+*   x8 == syscall number
+*   x0 ... x7 == syscall arguments
+*   and returns the result in x0
+* Shuffle everything around appropriately.
+*/
+   mov x9, x0  /* signal_pending pointer */
+   mov x8, x1  /* syscall number */
+   mov x0, x2  /* syscall arguments */
+   mov x1, x3
+   mov x2, x4
+   mov x3, x5
+   mov x4, x6
+   mov x6, x7
+   ldr x7, [sp]
+
+   /* This next sequence of code works in conjunction with the
+* rewind_if_safe_syscall_function(). If a signal is taken
+* and the interrupted PC is anywhere between 'safe_syscall_start'
+* and 'safe_syscall_end' then we rewind it to 'safe_syscall_start'.
+* The code sequence must therefore be able to cope with this, and
+* the syscall instruction must be the final one in the sequence.
+*/
+safe_syscall_start:
+   /* if signal_pending is non-zero, don't do the call */
+   ldr w10, [x9]
+   cbnzw10, 0f 
+   svc 0x0
+safe_syscall_end:
+   /* code path for having successfully executed the syscall */
+   ret
+
+0:
+   /* code path when we didn't execute the syscall */
+   mov x0, #-TARGET_ERESTARTSYS
+   ret
+   .cfi_endproc
+
+   .size   safe_syscall_base, .-safe_syscall_base
-- 
2.1.4

Re: [Qemu-devel] [PATCH v2 1/1] cpu: report hyperv feature words through qom

2016-06-24 Thread Evgeny Yakovlev



On 23.06.2016 00:01, Eduardo Habkost wrote:

On Mon, Jun 20, 2016 at 06:29:40PM +0300, Denis V. Lunev wrote:

From: Evgeny Yakovlev 

This change adds hyperv feature words report through qom rpc.

When VM is configured with hyperv features enabled
libvirt will check that required feature words are set
in cpuid leaf 4003 through qom request.

Currently qemu does not report hyperv feature words
which prevents windows guests from starting with libvirt.

Signed-off-by: Evgeny Yakovlev 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: Marcelo Tosatti 
---
Changes from v1:
- renamed hyperv features so they don't conflict with hyperv properties
- refactored kvm_arch_init_vcpu to process hyperv props into feature words

  target-i386/cpu.c |  50 
  target-i386/cpu.h |   3 ++
  target-i386/kvm.c | 114 +++---
  3 files changed, 119 insertions(+), 48 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 3665fec..c79b4e3 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -245,6 +245,44 @@ static const char *kvm_feature_name[] = {
  NULL, NULL, NULL, NULL,
  };
  
+static const char *hyperv_priv_feature_name[] = {

+"hv_msr_vp_runtime_access", "hv_msr_time_refcount_access",
+"hv_msr_synic_access", "hv_msr_stimer_access",
+"hv_msr_apic_access", "hv_msr_hypercall_access",
+"hv_vpindex_access", "hv_msr_reset_access",
+"hv_msr_stats_access", "hv_reftsc_access",
+"hv_msr_idle_access", "hv_msr_frequency_access",
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+};
+
+static const char *hyperv_ident_feature_name[] = {
+"hv_create_partitions", "hv_access_partition_id",
+"hv_access_memory_pool", "hv_adjust_message_buffers",
+"hv_post_messages", "hv_signal_events",
+"hv_create_port", "hv_connect_port",
+"hv_access_stats", NULL, NULL, "hv_debugging",
+"hv_cpu_power_management", "hv_configure_profiler", NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+};
+
+static const char *hyperv_misc_feature_name[] = {
+"hv_mwait", "hv_guest_debugging", "hv_perf_monitor", "hv_cpu_dynamic_part",
+"hv_hypercall_params_xmm", "hv_guest_idle_state", NULL, NULL,
+NULL, NULL, "hv_guest_crash_msr", NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+};

Adding these feature bit names will let individual bits to be
configured from the command-line, not just reported using QOM.

I am not sure this is intentional, as now we have conflicting
ways to configure some hyperv features. For example: now
HV_X64_MSR_VP_RUNTIME_AVAILABLE can be set using "+hv-runtime" or
"+hv_msr_vp_runtime_access". And the difference between both
methods is not clear for users.

Also, as kvm_arch_get_supported_cpuid() won't return anything
about those feature flags, QEMU will get confused if you try to
use "+hv_msr_vp_runtime_access" in the command-line.

I believe this can be addressed by doing the work in three steps:

1) Add hyperv CPUID leaves to FeatureWord/feature_word_info
without any name arrays, make the changes you made below to
change env->features inside kvm_arch_init_vcpu().
* In other words: this patch, but without the feature_name
  arrays.
* This wil make QEMU report all the hyperv feature bits in the
  "feature-words" QOM property (read-only)
* This won't change any command-line interface.
* This shouldn't confuse QEMU due to
  lack of kvm_arch_get_supported_cpuid() support, because
  env->features is being set up after
  x86_cpu_filter_features() was already called.

If all you want is to report low-level CPUID data through QMP,
step (1) is enough: it will already include the low-level hyperv
CPUID data in the "feature-words" property.

2) Replace the hyperv_* boolean fields with env->feature bits.
* See below how this could be done for each specific case.
* This requires making kvm_arch_get_supported_cpuid() report
  them (after making the appropriate capability checks).
* This makes the check/enforce flags support hyperv
  capabilities.

3) (optional) Add the remaining feature names (that are not
configurable yet) to the feature_names arrays.
* This won't let them be configured in the command-line by
  now, if kvm_arch_get_supported_cpuid() doesn't report them
  as supported.
* I am not sure we really want that. What would be the point
  of adding feature names that we don't even support yet?


+
  static const char *svm_feature_name[] = {
  "npt", "lbrv", "svm_lock", "nrip_save",
  "tsc_scale", "vmcb_clean",  "flushbyasid", "decodeassists",

[...]

diff --git a/target-i386/kvm.c b/ta

[Qemu-devel] [PULL 20/24] linux-user: Provide safe_syscall for i386

2016-06-24 Thread riku . voipio

From: Richard Henderson 

Signed-off-by: Richard Henderson 
Reviewed-by: Peter Maydell 
Signed-off-by: Riku Voipio 
---
 linux-user/host/i386/hostdep.h  |  23 +++
 linux-user/host/i386/safe-syscall.inc.S | 112 
 2 files changed, 135 insertions(+)
 create mode 100644 linux-user/host/i386/safe-syscall.inc.S

diff --git a/linux-user/host/i386/hostdep.h b/linux-user/host/i386/hostdep.h
index 7609bf5..5a12f4a 100644
--- a/linux-user/host/i386/hostdep.h
+++ b/linux-user/host/i386/hostdep.h
@@ -12,4 +12,27 @@
 #ifndef QEMU_HOSTDEP_H
 #define QEMU_HOSTDEP_H
 
+/* We have a safe-syscall.inc.S */
+#define HAVE_SAFE_SYSCALL
+
+#ifndef __ASSEMBLER__
+
+/* These are defined by the safe-syscall.inc.S file */
+extern char safe_syscall_start[];
+extern char safe_syscall_end[];
+
+/* Adjust the signal context to rewind out of safe-syscall if we're in it */
+static inline void rewind_if_in_safe_syscall(void *puc)
+{
+struct ucontext *uc = puc;
+greg_t *pcreg = &uc->uc_mcontext.gregs[REG_EIP];
+
+if (*pcreg > (uintptr_t)safe_syscall_start
+&& *pcreg < (uintptr_t)safe_syscall_end) {
+*pcreg = (uintptr_t)safe_syscall_start;
+}
+}
+
+#endif /* __ASSEMBLER__ */
+
 #endif
diff --git a/linux-user/host/i386/safe-syscall.inc.S 
b/linux-user/host/i386/safe-syscall.inc.S
new file mode 100644
index 000..766d0de
--- /dev/null
+++ b/linux-user/host/i386/safe-syscall.inc.S
@@ -0,0 +1,112 @@
+/*
+ * safe-syscall.inc.S : host-specific assembly fragment
+ * to handle signals occurring at the same time as system calls.
+ * This is intended to be included by linux-user/safe-syscall.S
+ *
+ * Written by Richard Henderson 
+ * Copyright (C) 2016 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+   .global safe_syscall_base
+   .global safe_syscall_start
+   .global safe_syscall_end
+   .type   safe_syscall_base, @function
+
+   /* This is the entry point for making a system call. The calling
+* convention here is that of a C varargs function with the
+* first argument an 'int *' to the signal_pending flag, the
+* second one the system call number (as a 'long'), and all further
+* arguments being syscall arguments (also 'long').
+* We return a long which is the syscall's return value, which
+* may be negative-errno on failure. Conversion to the
+* -1-and-errno-set convention is done by the calling wrapper.
+*/
+safe_syscall_base:
+   .cfi_startproc
+   push%ebp
+   .cfi_adjust_cfa_offset 4
+   .cfi_rel_offset ebp, 0
+   push%esi
+   .cfi_adjust_cfa_offset 4
+   .cfi_rel_offset esi, 0
+   push%edi
+   .cfi_adjust_cfa_offset 4
+   .cfi_rel_offset edi, 0
+   push%ebx
+   .cfi_adjust_cfa_offset 4
+   .cfi_rel_offset ebx, 0
+
+   /* The syscall calling convention isn't the same as the C one:
+* we enter with 0(%esp) == return address
+*   4(%esp) == *signal_pending
+*   8(%esp) == syscall number
+*   12(%esp) ... 32(%esp) == syscall arguments
+*   and return the result in eax
+* and the syscall instruction needs
+*   eax == syscall number
+*   ebx, ecx, edx, esi, edi, ebp == syscall arguments
+*   and returns the result in eax
+* Shuffle everything around appropriately.
+* Note the 16 bytes that we pushed to save registers.
+*/
+   mov 12+16(%esp), %ebx   /* the syscall arguments */
+   mov 16+16(%esp), %ecx
+   mov 20+16(%esp), %edx
+   mov 24+16(%esp), %esi
+   mov 28+16(%esp), %edi
+   mov 32+16(%esp), %ebp
+
+   /* This next sequence of code works in conjunction with the
+* rewind_if_safe_syscall_function(). If a signal is taken
+* and the interrupted PC is anywhere between 'safe_syscall_start'
+* and 'safe_syscall_end' then we rewind it to 'safe_syscall_start'.
+* The code sequence must therefore be able to cope with this, and
+* the syscall instruction must be the final one in the sequence.
+*/
+safe_syscall_start:
+   /* if signal_pending is non-zero, don't do the call */
+   mov 4+16(%esp), %eax/* signal_pending */
+   cmp $0, (%eax)
+   jnz 1f
+   mov 8+16(%esp), %eax/* syscall number */
+   int $0x80
+safe_syscall_end:
+   /* code path for having successfully executed the syscall */
+   pop %ebx
+   .cfi_remember_state
+   .cfi_def_cfa_offset -4
+   .cfi_restore ebx
+   pop %edi
+   .cfi_def_cfa_offset -4
+   .cfi_restore edi
+   pop %esi
+   .cfi_def_cfa_offset -4
+   .cfi_restore esi
+   pop %ebp
+

[Qemu-devel] [PULL 21/24] linux-user: Provide safe_syscall for arm

2016-06-24 Thread riku . voipio

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Signed-off-by: Riku Voipio 
---
 linux-user/host/arm/hostdep.h  | 23 +
 linux-user/host/arm/safe-syscall.inc.S | 90 ++
 2 files changed, 113 insertions(+)
 create mode 100644 linux-user/host/arm/safe-syscall.inc.S

diff --git a/linux-user/host/arm/hostdep.h b/linux-user/host/arm/hostdep.h
index 7609bf5..8e1ff2f 100644
--- a/linux-user/host/arm/hostdep.h
+++ b/linux-user/host/arm/hostdep.h
@@ -12,4 +12,27 @@
 #ifndef QEMU_HOSTDEP_H
 #define QEMU_HOSTDEP_H
 
+/* We have a safe-syscall.inc.S */
+#define HAVE_SAFE_SYSCALL
+
+#ifndef __ASSEMBLER__
+
+/* These are defined by the safe-syscall.inc.S file */
+extern char safe_syscall_start[];
+extern char safe_syscall_end[];
+
+/* Adjust the signal context to rewind out of safe-syscall if we're in it */
+static inline void rewind_if_in_safe_syscall(void *puc)
+{
+struct ucontext *uc = puc;
+unsigned long *pcreg = &uc->uc_mcontext.arm_pc;
+
+if (*pcreg > (uintptr_t)safe_syscall_start
+&& *pcreg < (uintptr_t)safe_syscall_end) {
+*pcreg = (uintptr_t)safe_syscall_start;
+}
+}
+
+#endif /* __ASSEMBLER__ */
+
 #endif
diff --git a/linux-user/host/arm/safe-syscall.inc.S 
b/linux-user/host/arm/safe-syscall.inc.S
new file mode 100644
index 000..88c4958
--- /dev/null
+++ b/linux-user/host/arm/safe-syscall.inc.S
@@ -0,0 +1,90 @@
+/*
+ * safe-syscall.inc.S : host-specific assembly fragment
+ * to handle signals occurring at the same time as system calls.
+ * This is intended to be included by linux-user/safe-syscall.S
+ *
+ * Written by Richard Henderson 
+ * Copyright (C) 2016 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+   .global safe_syscall_base
+   .global safe_syscall_start
+   .global safe_syscall_end
+   .type   safe_syscall_base, %function
+
+   .cfi_sections   .debug_frame
+
+   .text
+   .syntax unified
+   .arm
+   .align 2
+
+   /* This is the entry point for making a system call. The calling
+* convention here is that of a C varargs function with the
+* first argument an 'int *' to the signal_pending flag, the
+* second one the system call number (as a 'long'), and all further
+* arguments being syscall arguments (also 'long').
+* We return a long which is the syscall's return value, which
+* may be negative-errno on failure. Conversion to the
+* -1-and-errno-set convention is done by the calling wrapper.
+*/
+safe_syscall_base:
+   .fnstart
+   .cfi_startproc
+   mov r12, sp /* save entry stack */
+   push{ r4, r5, r6, r7, r8, lr }
+   .save   { r4, r5, r6, r7, r8, lr }
+   .cfi_adjust_cfa_offset 24
+   .cfi_rel_offset r4, 0
+   .cfi_rel_offset r5, 4
+   .cfi_rel_offset r6, 8
+   .cfi_rel_offset r7, 12
+   .cfi_rel_offset r8, 16
+   .cfi_rel_offset lr, 20
+
+   /* The syscall calling convention isn't the same as the C one:
+* we enter with r0 == *signal_pending
+*   r1 == syscall number
+*   r2, r3, [sp+0] ... [sp+12] == syscall arguments
+*   and return the result in r0
+* and the syscall instruction needs
+*   r7 == syscall number
+*   r0 ... r6 == syscall arguments
+*   and returns the result in r0
+* Shuffle everything around appropriately.
+* Note the 16 bytes that we pushed to save registers.
+*/
+   mov r8, r0  /* copy signal_pending */
+   mov r7, r1  /* syscall number */
+   mov r0, r2  /* syscall args */
+   mov r1, r3
+   ldm r12, { r2, r3, r4, r5, r6 }
+
+   /* This next sequence of code works in conjunction with the
+* rewind_if_safe_syscall_function(). If a signal is taken
+* and the interrupted PC is anywhere between 'safe_syscall_start'
+* and 'safe_syscall_end' then we rewind it to 'safe_syscall_start'.
+* The code sequence must therefore be able to cope with this, and
+* the syscall instruction must be the final one in the sequence.
+*/
+safe_syscall_start:
+   /* if signal_pending is non-zero, don't do the call */
+   ldr r12, [r8]   /* signal_pending */
+   tst r12, r12
+   bne 1f
+   swi 0
+safe_syscall_end:
+   /* code path for having successfully executed the syscall */
+   pop { r4, r5, r6, r7, r8, pc }
+
+1:
+   /* code path when we didn't execute the syscall */
+   ldr r0, =-TARGET_ERESTARTSYS
+   pop { r4, r5, r6, r7, r8, pc }
+   .fnend
+   .cfi_endproc
+
+   .size   safe_syscall_base, .-safe_sy

[Qemu-devel] [PULL 24/24] linux-user: Provide safe_syscall for ppc64

2016-06-24 Thread riku . voipio

From: Richard Henderson 

Signed-off-by: Richard Henderson 
Signed-off-by: Riku Voipio 
---
 linux-user/host/ppc64/hostdep.h  | 23 
 linux-user/host/ppc64/safe-syscall.inc.S | 92 
 2 files changed, 115 insertions(+)
 create mode 100644 linux-user/host/ppc64/safe-syscall.inc.S

diff --git a/linux-user/host/ppc64/hostdep.h b/linux-user/host/ppc64/hostdep.h
index 7609bf5..310e7d1 100644
--- a/linux-user/host/ppc64/hostdep.h
+++ b/linux-user/host/ppc64/hostdep.h
@@ -12,4 +12,27 @@
 #ifndef QEMU_HOSTDEP_H
 #define QEMU_HOSTDEP_H
 
+/* We have a safe-syscall.inc.S */
+#define HAVE_SAFE_SYSCALL
+
+#ifndef __ASSEMBLER__
+
+/* These are defined by the safe-syscall.inc.S file */
+extern char safe_syscall_start[];
+extern char safe_syscall_end[];
+
+/* Adjust the signal context to rewind out of safe-syscall if we're in it */
+static inline void rewind_if_in_safe_syscall(void *puc)
+{
+struct ucontext *uc = puc;
+unsigned long *pcreg = &uc->uc_mcontext.gp_regs[PT_NIP];
+
+if (*pcreg > (uintptr_t)safe_syscall_start
+&& *pcreg < (uintptr_t)safe_syscall_end) {
+*pcreg = (uintptr_t)safe_syscall_start;
+}
+}
+
+#endif /* __ASSEMBLER__ */
+
 #endif
diff --git a/linux-user/host/ppc64/safe-syscall.inc.S 
b/linux-user/host/ppc64/safe-syscall.inc.S
new file mode 100644
index 000..d30050a
--- /dev/null
+++ b/linux-user/host/ppc64/safe-syscall.inc.S
@@ -0,0 +1,92 @@
+/*
+ * safe-syscall.inc.S : host-specific assembly fragment
+ * to handle signals occurring at the same time as system calls.
+ * This is intended to be included by linux-user/safe-syscall.S
+ *
+ * Written by Richard Henderson 
+ * Copyright (C) 2016 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+   .global safe_syscall_base
+   .global safe_syscall_start
+   .global safe_syscall_end
+   .type   safe_syscall_base, @function
+
+   .text
+
+   /* This is the entry point for making a system call. The calling
+* convention here is that of a C varargs function with the
+* first argument an 'int *' to the signal_pending flag, the
+* second one the system call number (as a 'long'), and all further
+* arguments being syscall arguments (also 'long').
+* We return a long which is the syscall's return value, which
+* may be negative-errno on failure. Conversion to the
+* -1-and-errno-set convention is done by the calling wrapper.
+*/
+#if _CALL_ELF == 2
+safe_syscall_base:
+   .cfi_startproc
+   .localentry safe_syscall_base,0
+#else
+   .section ".opd","aw"
+   .align  3
+safe_syscall_base:
+   .quad   .L.safe_syscall_base,.TOC.@tocbase,0
+   .previous
+.L.safe_syscall_base:
+   .cfi_startproc
+#endif
+   /* We enter with r3 == *signal_pending
+*   r4 == syscall number
+*   r5 ... r10 == syscall arguments
+*   and return the result in r3
+* and the syscall instruction needs
+*   r0 == syscall number
+*   r3 ... r8 == syscall arguments
+*   and returns the result in r3
+* Shuffle everything around appropriately.
+*/
+   mr  11, 3   /* signal_pending */
+   mr  0, 4/* syscall number */
+   mr  3, 5/* syscall arguments */
+   mr  4, 6
+   mr  5, 7
+   mr  6, 8
+   mr  7, 9
+   mr  8, 10
+
+   /* This next sequence of code works in conjunction with the
+* rewind_if_safe_syscall_function(). If a signal is taken
+* and the interrupted PC is anywhere between 'safe_syscall_start'
+* and 'safe_syscall_end' then we rewind it to 'safe_syscall_start'.
+* The code sequence must therefore be able to cope with this, and
+* the syscall instruction must be the final one in the sequence.
+*/
+safe_syscall_start:
+   /* if signal_pending is non-zero, don't do the call */
+   lwz 12, 0(11)
+   cmpwi   0, 12, 0
+   bne-0f
+   sc
+safe_syscall_end:
+   /* code path when we did execute the syscall */
+   bnslr+
+
+   /* syscall failed; return negative errno */
+   neg 3, 3
+   blr
+
+   /* code path when we didn't execute the syscall */
+0: addi3, 0, -TARGET_ERESTARTSYS
+   blr
+   .cfi_endproc
+
+#if _CALL_ELF == 2
+   .size   safe_syscall_base, .-safe_syscall_base
+#else
+   .size   safe_syscall_base, .-.L.safe_syscall_base
+   .size   .L.safe_syscall_base, .-.L.safe_syscall_base
+#endif
-- 
2.1.4

Re: [Qemu-devel] Queries on dataplane mechanism

2016-06-24 Thread Stefan Hajnoczi

On Thu, Jun 23, 2016 at 08:56:34PM +0530, Gaurav Sharma wrote:
> Hi,
> I am trying to explore how the data plane mechanism works in QEMU. I
> understand the behavior of QEMU big lock. Can someone clarify the following
> w.r.t. to data plane :
> 
> 1. Currently only virtio-blk-pci and virtio-scsi-pci have data plane
> enabled ?

Yes.

> 2. From qemu 2.1.0 data plane is enabled by default.

No "enabled by default" would mean that existing QEMU command-lines
enable dataplane.  This is not the case.  You have to explicitly define
an iothread object and then associate a virtio-blk/virtio-scsi device
with it.

> I specify the
> following options in the command line to enable :
> -enable-kvm -drive if=none,id=drive1,file=file_name -object
> iothread,id=iothread2 -device
> virtio-blk-pci,id=drv0,drive=drive1,iothread=iothread2
> Is the above syntax correct ?

Yes.

> 3. What is the best possible scenario to test data plane ? Currently, I
> have a test set up wherein i have two different devices [dev1 and dev2]. If
> i process a write to dev1 which i made blocking by putting a sleep
> statement, will i be able to process write on dev2 ? My understanding is
> that as in case of dataplane we have a different event loop, i should be
> able to process write on dev2. Is this correct ?

Dataplane improves scalability for high IOPS workloads when there are
multiple disks.

You do not need to modify any code in order to benchmark dataplane.  Run
fio inside an SMP 4 guest with 4 disks (you can use the host Linux
kernel's null_blk driver) and you should find that QEMU without
dataplane has lower iops.  The difference should become clear around 4
or 8 vcpus/disks.

Stefan

signature.asc
Description: PGP signature

[Qemu-devel] [PULL 23/24] linux-user: Provide safe_syscall for s390x

2016-06-24 Thread riku . voipio

From: Richard Henderson 

Signed-off-by: Richard Henderson 
Signed-off-by: Riku Voipio 
---
 linux-user/host/s390x/hostdep.h  | 23 
 linux-user/host/s390x/safe-syscall.inc.S | 90 
 2 files changed, 113 insertions(+)
 create mode 100644 linux-user/host/s390x/safe-syscall.inc.S

diff --git a/linux-user/host/s390x/hostdep.h b/linux-user/host/s390x/hostdep.h
index 7609bf5..e95871c 100644
--- a/linux-user/host/s390x/hostdep.h
+++ b/linux-user/host/s390x/hostdep.h
@@ -12,4 +12,27 @@
 #ifndef QEMU_HOSTDEP_H
 #define QEMU_HOSTDEP_H
 
+/* We have a safe-syscall.inc.S */
+#define HAVE_SAFE_SYSCALL
+
+#ifndef __ASSEMBLER__
+
+/* These are defined by the safe-syscall.inc.S file */
+extern char safe_syscall_start[];
+extern char safe_syscall_end[];
+
+/* Adjust the signal context to rewind out of safe-syscall if we're in it */
+static inline void rewind_if_in_safe_syscall(void *puc)
+{
+struct ucontext *uc = puc;
+unsigned long *pcreg = &uc->uc_mcontext.psw.addr;
+
+if (*pcreg > (uintptr_t)safe_syscall_start
+&& *pcreg < (uintptr_t)safe_syscall_end) {
+*pcreg = (uintptr_t)safe_syscall_start;
+}
+}
+
+#endif /* __ASSEMBLER__ */
+
 #endif
diff --git a/linux-user/host/s390x/safe-syscall.inc.S 
b/linux-user/host/s390x/safe-syscall.inc.S
new file mode 100644
index 000..f1b446a
--- /dev/null
+++ b/linux-user/host/s390x/safe-syscall.inc.S
@@ -0,0 +1,90 @@
+/*
+ * safe-syscall.inc.S : host-specific assembly fragment
+ * to handle signals occurring at the same time as system calls.
+ * This is intended to be included by linux-user/safe-syscall.S
+ *
+ * Written by Richard Henderson 
+ * Copyright (C) 2016 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+   .global safe_syscall_base
+   .global safe_syscall_start
+   .global safe_syscall_end
+   .type   safe_syscall_base, @function
+
+   /* This is the entry point for making a system call. The calling
+* convention here is that of a C varargs function with the
+* first argument an 'int *' to the signal_pending flag, the
+* second one the system call number (as a 'long'), and all further
+* arguments being syscall arguments (also 'long').
+* We return a long which is the syscall's return value, which
+* may be negative-errno on failure. Conversion to the
+* -1-and-errno-set convention is done by the calling wrapper.
+*/
+safe_syscall_base:
+   .cfi_startproc
+   stmg%r6,%r15,48(%r15)   /* save all call-saved registers */
+   .cfi_offset %r15,-40
+   .cfi_offset %r14,-48
+   .cfi_offset %r13,-56
+   .cfi_offset %r12,-64
+   .cfi_offset %r11,-72
+   .cfi_offset %r10,-80
+   .cfi_offset %r9,-88
+   .cfi_offset %r8,-96
+   .cfi_offset %r7,-104
+   .cfi_offset %r6,-112
+   lgr %r1,%r15
+   lg  %r0,8(%r15) /* load eos */
+   aghi%r15,-160
+   .cfi_adjust_cfa_offset 160
+   stg %r1,0(%r15) /* store back chain */
+   stg %r0,8(%r15) /* store eos */
+
+   /* The syscall calling convention isn't the same as the
+* C one:
+* we enter with r2 == *signal_pending
+*   r3 == syscall number
+*   r4, r5, r6, (stack) == syscall arguments
+*   and return the result in r2
+* and the syscall instruction needs
+*   r1 == syscall number
+*   r2 ... r7 == syscall arguments
+*   and returns the result in r2
+* Shuffle everything around appropriately.
+*/
+   lgr %r8,%r2 /* signal_pending pointer */
+   lgr %r1,%r3 /* syscall number */
+   lgr %r2,%r4 /* syscall args */
+   lgr %r3,%r5
+   lgr %r4,%r6
+   lmg %r5,%r7,320(%r15)
+
+   /* This next sequence of code works in conjunction with the
+* rewind_if_safe_syscall_function(). If a signal is taken
+* and the interrupted PC is anywhere between 'safe_syscall_start'
+* and 'safe_syscall_end' then we rewind it to 'safe_syscall_start'.
+* The code sequence must therefore be able to cope with this, and
+* the syscall instruction must be the final one in the sequence.
+*/
+safe_syscall_start:
+   /* if signal_pending is non-zero, don't do the call */
+   lt  %r0,0(%r8)
+   jne 2f
+   svc 0
+safe_syscall_end:
+
+1: lg  %r15,0(%r15)/* load back chain */
+   .cfi_remember_state
+   .cfi_adjust_cfa_offset -160
+   lmg %r6,%r15,48(%r15)   /* load saved registers */
+   br  %r14
+   .cfi_restore_state
+2: lghi%r2, -TARGET_ERESTARTSYS
+   j   1b
+   .cfi_endp

Re: [Qemu-devel] [Qemu-block] [PATCH 2/3] blockdev: Add dynamic generation of module_block.h

2016-06-24 Thread Stefan Hajnoczi

On Wed, Jun 22, 2016 at 05:35:53PM -0400, Colin Lord wrote:
> +def print_top(fheader):
> +fheader.write('''/* AUTOMATICALLY GENERATED, DO NOT MODIFY */
> +/*
> + * QEMU Block Module Infrastructure
> + *
> + * Copyright Red Hat, Inc. 2015
> + *
> + * Authors:
> + *  Marc Mari   
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + */

The copyright and license doesn't make sense.  qapi.py and tracetool.py
also do not emit a copyright header for generated code.


signature.asc
Description: PGP signature

Re: [Qemu-devel] Qemu and heavily increased RSS usage

2016-06-24 Thread Dr. David Alan Gilbert

* Peter Lieven (p...@kamp.de) wrote:
> Am 24.06.2016 um 11:37 schrieb Stefan Hajnoczi:
> > On Wed, Jun 22, 2016 at 09:56:06PM +0100, Peter Maydell wrote:
> >> On 22 June 2016 at 20:55, Peter Lieven  wrote:
> >>> What makes the coroutine pool memory intensive is the stack size of 1MB 
> >>> per
> >>> coroutine. Is it really necessary to have such a big stack?
> >> That reminds me that I was wondering if we should allocate
> >> our coroutine stacks with MAP_GROWSDOWN (though if we're
> >> not actually using 1MB of stack then it's only going to
> >> be eating virtual memory, not necessarily real memory.)
> > Yes, MAP_GROWSDOWN will not reduce RSS.
> 
> Yes, I can confirm just tested...
> 
> >
> > It's possible that we can reduce RSS usage of the coroutine pool but it
> > will require someone to profile the pool usage patterns.
> 
> It would be interesting to see what stack size we really need. Is it possible
> to automatically detect this value (at compile time?)
> 
> I can also confirm that the coroutine pool is the second major RSS user beside
> heap fragmentation.

But is it there stack? You said you tried marking GROWSDOWN, so can you check
 /proc/../smaps and see how much of the Rss is the growsdown space?

Dave

> Lowering the mmap threshold of malloc to about 32k also gives good results.
> In this case there are very few active mappings in the running vServer, but 
> the
> RSS is still at about 50MB (without coroutine pool). Maybe it would be good
> to identify which parts of Qemu malloc lets say >16kB and convert them to mmap
> if it is feasible.
> 
> Peter
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] Qemu and heavily increased RSS usage

2016-06-24 Thread Peter Maydell

On 24 June 2016 at 10:37, Stefan Hajnoczi  wrote:
> On Wed, Jun 22, 2016 at 09:56:06PM +0100, Peter Maydell wrote:
>> On 22 June 2016 at 20:55, Peter Lieven  wrote:
>> > What makes the coroutine pool memory intensive is the stack size of 1MB per
>> > coroutine. Is it really necessary to have such a big stack?
>>
>> That reminds me that I was wondering if we should allocate
>> our coroutine stacks with MAP_GROWSDOWN (though if we're
>> not actually using 1MB of stack then it's only going to
>> be eating virtual memory, not necessarily real memory.)
>
> Yes, MAP_GROWSDOWN will not reduce RSS.

Right, but then the 1MB of stack as currently allocated isn't
going to be affecting RSS either I would have thought (except
transiently, since we zero it on allocation which will
bring it into the RSS until it falls back out again
because we don't touch it after that).

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 0/2] small fix of block job

2016-06-24 Thread Stefan Hajnoczi

On Thu, Jun 23, 2016 at 04:57:19PM +0800, Changlong Xie wrote:
> V2
> p1: put assert(cb) in block_job_create
> 
> Changlong Xie (2):
>   blockjob: assert(cb) when create job
>   mirror: fix misleading comments
> 
>  block/backup.c | 1 -
>  block/mirror.c | 2 +-
>  blockjob.c | 1 +
>  3 files changed, 2 insertions(+), 2 deletions(-)
> 
> -- 
> 1.9.3
> 
> 
> 
> 

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation

2016-06-24 Thread Mark Cave-Ayland


On 24/06/16 04:48, Richard Henderson wrote:


I was unhappy about the complexity of the second try.

Better to convert to normal temps, allowing in rare
occasions, spilling the "globals" to the stack in order
to satisfy register allocation.

I can no longer provoke an allocation failure on i686.
Hopefully this fixes the OpenBSD case that Mark mentioned
re the second attempt.


r~


Richard Henderson (9):
  tcg: Fix name for high-half register
  tcg: Optimize spills of constants
  tcg: Require liveness analysis
  tcg: Compress liveness data to 16 bits
  tcg: Reorg TCGOp chaining
  tcg: Fold life data into TCGOp
  tcg: Compress dead_temps and mem_temps into a single array
  tcg: Include liveness info in the dumps
  tcg: Lower indirect registers in a separate pass

 include/exec/gen-icount.h|   2 +-
 include/qemu/log.h   |   1 +
 tcg/aarch64/tcg-target.inc.c |  10 +
 tcg/arm/tcg-target.inc.c |   6 +
 tcg/i386/tcg-target.inc.c|  21 +-
 tcg/ia64/tcg-target.inc.c|  10 +
 tcg/mips/tcg-target.inc.c|  10 +
 tcg/optimize.c   |  37 +--
 tcg/ppc/tcg-target.inc.c |   6 +
 tcg/s390/tcg-target.inc.c|   6 +
 tcg/sparc/tcg-target.inc.c   |  10 +
 tcg/tcg-op.c |   2 +-
 tcg/tcg.c| 690 ---
 tcg/tcg.h|  50 ++--
 tcg/tci/tcg-target.inc.c |   6 +
 util/log.c   |   5 +-
 16 files changed, 563 insertions(+), 309 deletions(-)


Hi Richard,

I gave this patchset a quick test, and I see no regressions with most of 
the images, however while the OpenBSD 5.5 boot gets further I still see 
(a different type) of panic:



$ ./qemu-system-sparc64 -cdrom /home/build/install55.iso -boot d -nographic
OpenBIOS for Sparc64
Configuration device id QEMU version 1 machine id 0
kernel cmdline
CPUs: 1 x SUNW,UltraSPARC-IIi
UUID: ----
Welcome to OpenBIOS v1.1 built on Apr 18 2016 08:20
  Type 'help' for detailed information
Trying cdrom:f...
Not a bootable ELF image
Not a bootable a.out image

Loading FCode image...
Loaded 4829 bytes
entry point is 0x4000
OpenBSD IEEE 1275 Bootblock 1.3
..
Jumping to entry point 0010 for type 0001...
switching to new context: entry point 0x10 stack 0xffe84a09
>> OpenBSD BOOT 1.6
Trying bsd...
open /pci@1fe,0/pci-ata@5/ide1@8200/cdrom@0:f/etc/random.seed: No such 
file or directory

Booting /pci@1fe,0/pci-ata@5/ide1@8200/cdrom@0:f/bsd
3901336@0x100+6248@0x13b8798+3261984@0x180+932320@0x1b1c620
symbols @ 0xffc5a300 119 start=0x100

Unexpected client interface exception: -1
console is /pci@1fe,0/ebus@3/su
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2014 OpenBSD. All rights reserved. 
http://www.OpenBSD.org


OpenBSD 5.5 (RAMDISK) #153: Tue Mar  4 15:12:10 MST 2014
dera...@sparc64.openbsd.org:/usr/src/sys/arch/sparc64/compile/RAMDISK
real mem = 134217728 (128MB)
avail mem = 122011648 (116MB)
mainbus0 at root: OpenBiosTeam,OpenBIOS
cpu0 at mainbus0: SUNW,UltraSPARC-IIi (rev 9.1) @ 100 MHz
cpu0: physical 256K instruction (64 b/l), 16K data (32 b/l), 256K 
external (64 b/l)

psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
psycho0: bus range 0-2, PCI bus 0
psycho0: dvma map c000-dfff
pci0 at psycho0
ppb0 at pci0 dev 1 function 0 "Sun Simba" rev 0x11
pci1 at ppb0 bus 1
ppb1 at pci0 dev 1 function 1 "Sun Simba" rev 0x11
pci2 at ppb1 bus 2
unknown vendor 0x1234 product 0x (class display subclass VGA, rev 
0x02) at pci0 dev 2 function 0 not configured

ebus0 at pci0 dev 3 function 0 "Sun PCIO EBus2" rev 0x01
clock1 at ebus0 addr 2000-3fff: mk48t59
"fdthree" at ebus0 addr 0- not configured
com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
com0: console
"kb_ps2" at ebus0 addr 60-67 not configured
"Realtek 8029" rev 0x00 at pci0 dev 4 function 0 not configured
pciide0 at pci0 dev 5 function 0 "CMD Technology PCI0646" rev 0x07: DMA, 
channel 0 configured to native-PCI, channel 1 configured to native-PCI

pciide0: using ivec 0x7d4 for native-PCI interrupt
pciide0: channel 0 disabled (no drives)
atapiscsi0 at pciide0 channel 1 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0:  ATAPI 5/cdrom 
removable

cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
softraid0 at root
scsibus1 at softraid0: 256 targets
bootpath: /pci@1fe,0/pci-ata@5,0/ide1@8200,0/cdrom@0,0:f
root on rd0a swap on rd0b dump on rd0b
erase ^?, weraseqemu: fatal: Trap 0x0030 while trap level (5) >= MAXTL 
(5), Error state

pc: 01011d9c  npc: 01011da0
%g0-3:  fc00d9d49470  
%g4-7: fc00d9d48000 0400e0017619 0400e0017619 0017
%o0-3: 0400e0016000 0030  
%o4-7: 0400e0016000 082d 0400062ce391 01011d9c
%l0-3:

Re: [Qemu-devel] [Qemu-block] [PATCH 3/3] blockdev: Add dynamic module loading for block drivers

2016-06-24 Thread Stefan Hajnoczi

On Wed, Jun 22, 2016 at 05:35:54PM -0400, Colin Lord wrote:
> +for (i = 0; i < ARRAY_SIZE(block_driver_modules); ++i) {
> +if (!strcmp(block_driver_modules[i].format_name, format_name)) {
> +block_module_load_one(block_driver_modules[i].library_name);
> +/* Copying code is not nice, but this way the current discovery 
> is
> + * not modified. Calling recursively could fail if the library
> + * has been deleted.
> + */
> +QLIST_FOREACH(drv1, &bdrv_drivers, list) {
> +if (!strcmp(drv1->format_name, format_name)) {
> +return drv1;
> +}
> +}
> +}
> +}
> +
>  return NULL;
>  }

No recursion:

static BlockDriver *bdrv_do_find_format(const char *format_name)
{
BlockDriver *drv1;

QLIST_FOREACH(drv1, &bdrv_drivers, list) {
if (!strcmp(drv1->format_name, format_name)) {
return drv1;
}
}

return NULL;
}

BlockDriver *bdrv_find_format(const char *format_name)
{
BlockDriver *drv;
size_t i;

drv = bdrv_do_find_format(format_name);
if (drv) {
return drv;
}

/* The driver isn't registered, maybe we need to load a module */
for (i = 0; i < ARRAY_SIZE(block_driver_modules); ++i) {
if (!strcmp(block_driver_modules[i].format_name, format_name)) {
block_module_load_one(block_driver_modules[i].library_name);
break;
}
}
return bdrv_do_find_format(format_name);
}

>  
> @@ -447,8 +466,15 @@ int get_tmp_filename(char *filename, int size)
>  static BlockDriver *find_hdev_driver(const char *filename)
>  {
>  int score_max = 0, score;
> +size_t i;
>  BlockDriver *drv = NULL, *d;
>  
> +for (i = 0; i < ARRAY_SIZE(block_driver_modules); ++i) {
> +if (block_driver_modules[i].has_probe_device) {
> +block_module_load_one(block_driver_modules[i].library_name);
> +}
> +}

This patch series needs to solve probing so that we don't end up loading
all block drivers.  Fam's suggestion for a built-in probe.c sounds good
to me.


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH] arm: Re-enable tmp105 test

2016-06-24 Thread Peter Maydell

On 24 June 2016 at 10:25, Thomas Huth  wrote:
> The tmp105 test is currently not executed since the following
> line in the Makefile overwrites the check-qtest-arm-y variable
> instead of extending it.
>
> Signed-off-by: Thomas Huth 
> ---


Applied to target-arm.next, thanks.

-- PMM

Re: [Qemu-devel] [Qemu-block] [PATCH 3/3] blockdev: Add dynamic module loading for block drivers

2016-06-24 Thread Daniel P. Berrange

On Fri, Jun 24, 2016 at 11:04:43AM +0100, Stefan Hajnoczi wrote:
> On Wed, Jun 22, 2016 at 05:35:54PM -0400, Colin Lord wrote:
> 
> >  
> > @@ -447,8 +466,15 @@ int get_tmp_filename(char *filename, int size)
> >  static BlockDriver *find_hdev_driver(const char *filename)
> >  {
> >  int score_max = 0, score;
> > +size_t i;
> >  BlockDriver *drv = NULL, *d;
> >  
> > +for (i = 0; i < ARRAY_SIZE(block_driver_modules); ++i) {
> > +if (block_driver_modules[i].has_probe_device) {
> > +block_module_load_one(block_driver_modules[i].library_name);
> > +}
> > +}
> 
> This patch series needs to solve probing so that we don't end up loading
> all block drivers.  Fam's suggestion for a built-in probe.c sounds good
> to me.

Do we really care if probing loads all drivers ? Last time we discussed
this I thought we decided that because probing almost always leads to
security vulnerabilities, no one should use it by default and so we
don't really need to worry about optimizing it.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [PULL 5/8] target-sparc: Use global registers for the register window

2016-06-24 Thread Mark Cave-Ayland


On 24/06/16 07:36, Paolo Bonzini wrote:


On 24/06/2016 05:57, Richard Henderson wrote:


Whatever happens, it happens after 10GB of logs, which is simply too
much to sift through.  I've tried to narrow it down, but the lack of a
hardware tlb refill means that we get hundreds of thousands of Data
Access Faults that are simply TLB misses and not the actual Segmentation
Fault in question.

It doesn't seem to affect other OSes, so I can't imagine what quirk is
being exercised in this case.

As loath as I am to suggest it, we may have to revert the sparc indirect
register patch for the release.


We have more than a month.  If it's reproducible, it can be fixed. :)


I do now ping the rest of my sparc improvements patchset.  It's
completely independent of the use of indirect registers.


Mark, perhaps you can try to use migration to reduce the amount of
logging?  (Start QEMU with -snapshot, try to stop the vm before it
fails.  If you succeed, do a "migrate exec:cat>foo.sav" followed by
"commit"; if you fail, try again).


Yeah, given the improvements that Richard has made, I'd prefer not to 
revert if at all possible. Finally I have some spare time today so I'll 
try and get this down to an easily-testable qcow2 image that can 
reproduce the issue.



ATB,

Mark.

Re: [Qemu-devel] Qemu and heavily increased RSS usage

2016-06-24 Thread Peter Lieven

Am 24.06.2016 um 11:58 schrieb Peter Maydell:
> On 24 June 2016 at 10:37, Stefan Hajnoczi  wrote:
>> On Wed, Jun 22, 2016 at 09:56:06PM +0100, Peter Maydell wrote:
>>> On 22 June 2016 at 20:55, Peter Lieven  wrote:
 What makes the coroutine pool memory intensive is the stack size of 1MB per
 coroutine. Is it really necessary to have such a big stack?
>>> That reminds me that I was wondering if we should allocate
>>> our coroutine stacks with MAP_GROWSDOWN (though if we're
>>> not actually using 1MB of stack then it's only going to
>>> be eating virtual memory, not necessarily real memory.)
>> Yes, MAP_GROWSDOWN will not reduce RSS.
> Right, but then the 1MB of stack as currently allocated isn't
> going to be affecting RSS either I would have thought (except
> transiently, since we zero it on allocation which will
> bring it into the RSS until it falls back out again
> because we don't touch it after that).

What I observe regarding the coroutine pool is really strange. Under I/O load
while booting the vServer the RSS size is low as expected. If the vServer runs
for some time the RSS size suddenly explodes as if suddenly all the stack 
memory gets
mapped. This symptom definetely goes away if I disable the pool.

Regarding the coroutine pool I had the following thoughts:
 - mmap the stack so its actually really freed if the coroutine is deleted 
(with MAP_GROWSDOWN or not?)
 - drop the release_pool. It has actually only an effect for non virtio devices 
where the coroutine is
   not created and deleted in the same thread. But for virtio the release pool 
has the drawback that there
   is always a pingpong between the release_pool and the alloc_pool.
 - implement some kind of garbage collector that detects that a threads 
alloc_pool is actually to big (e.g. it
   stays above a watermark for some time) and then reduce its size.
 - detect that a coroutine was created in a vcpu thread (e.g. IDE) and released 
in the iothread. In this case
   don't add it to the pool so the alloc_pool of the I/O thread does not grow 
to max without being used.

Peter

[Qemu-devel] [PATCH v3 1/1] cpu: report hyperv feature words through qom

2016-06-24 Thread Denis V. Lunev

From: Evgeny Yakovlev 

This change adds hyperv feature words report through qom rpc.

When VM is configured with hyperv features enabled
libvirt will check that required feature words are set
in cpuid leaf 4003 through qom request.

Currently qemu does not report hyperv feature words
which prevents windows guests from starting with libvirt.

To avoid conflicting with current hyperv properties all added feature
words cannot be set directly with -cpu +feature yet.

Signed-off-by: Evgeny Yakovlev 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: Marcelo Tosatti 
---
Changes from v2:
- removed all hyperv feature_names so that we don't have two ways of
  setting CPUID bits

Changes from v1:
- renamed hyperv features so they don't conflict with hyperv properties
- refactored kvm_arch_init_vcpu to process hyperv props into feature words

target-i386/cpu.c |  53 +
 target-i386/cpu.h |   3 ++
 target-i386/kvm.c | 114 +++---
 3 files changed, 122 insertions(+), 48 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 3bd3cfc..d07da78 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -245,6 +245,47 @@ static const char *kvm_feature_name[] = {
 NULL, NULL, NULL, NULL,
 };
 
+static const char *hyperv_priv_feature_name[] = {
+NULL /* hv_msr_vp_runtime_access */, NULL /* hv_msr_time_refcount_access 
*/,
+NULL /* hv_msr_synic_access */, NULL /* hv_msr_stimer_access */,
+NULL /* hv_msr_apic_access */, NULL /* hv_msr_hypercall_access */,
+NULL /* hv_vpindex_access */, NULL /* hv_msr_reset_access */,
+NULL /* hv_msr_stats_access */, NULL /* hv_reftsc_access */,
+NULL /* hv_msr_idle_access */, NULL /* hv_msr_frequency_access */,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+};
+
+static const char *hyperv_ident_feature_name[] = {
+NULL /* hv_create_partitions */, NULL /* hv_access_partition_id */,
+NULL /* hv_access_memory_pool */, NULL /* hv_adjust_message_buffers */,
+NULL /* hv_post_messages */, NULL /* hv_signal_events */,
+NULL /* hv_create_port */, NULL /* hv_connect_port */,
+NULL /* hv_access_stats */, NULL, NULL, NULL /* hv_debugging */,
+NULL /* hv_cpu_power_management */, NULL /* hv_configure_profiler */,
+NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+};
+
+static const char *hyperv_misc_feature_name[] = {
+NULL /* hv_mwait */, NULL /* hv_guest_debugging */,
+NULL /* hv_perf_monitor */, NULL /* hv_cpu_dynamic_part */,
+NULL /* hv_hypercall_params_xmm */, NULL /* hv_guest_idle_state */,
+NULL, NULL,
+NULL, NULL, NULL /* hv_guest_crash_msr */, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+};
+
 static const char *svm_feature_name[] = {
 "npt", "lbrv", "svm_lock", "nrip_save",
 "tsc_scale", "vmcb_clean",  "flushbyasid", "decodeassists",
@@ -411,6 +452,18 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .cpuid_eax = KVM_CPUID_FEATURES, .cpuid_reg = R_EAX,
 .tcg_features = TCG_KVM_FEATURES,
 },
+[FEAT_HYPERV_EAX] = {
+.feat_names = hyperv_priv_feature_name,
+.cpuid_eax = 0x4003, .cpuid_reg = R_EAX,
+},
+[FEAT_HYPERV_EBX] = {
+.feat_names = hyperv_ident_feature_name,
+.cpuid_eax = 0x4003, .cpuid_reg = R_EBX,
+},
+[FEAT_HYPERV_EDX] = {
+.feat_names = hyperv_misc_feature_name,
+.cpuid_eax = 0x4003, .cpuid_reg = R_EDX,
+},
 [FEAT_SVM] = {
 .feat_names = svm_feature_name,
 .cpuid_eax = 0x800A, .cpuid_reg = R_EDX,
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index d9ab884..4496c8b 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -440,6 +440,9 @@ typedef enum FeatureWord {
 FEAT_8000_0007_EDX, /* CPUID[8000_0007].EDX */
 FEAT_C000_0001_EDX, /* CPUID[C000_0001].EDX */
 FEAT_KVM,   /* CPUID[4000_0001].EAX (KVM_CPUID_FEATURES) */
+FEAT_HYPERV_EAX,/* CPUID[4000_0003].EAX */
+FEAT_HYPERV_EBX,/* CPUID[4000_0003].EBX */
+FEAT_HYPERV_EDX,/* CPUID[4000_0003].EDX */
 FEAT_SVM,   /* CPUID[8000_000A].EDX */
 FEAT_XSAVE, /* CPUID[EAX=0xd,ECX=1].EAX */
 FEAT_6_EAX, /* CPUID[6].EAX */
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f3698f1..f44e3c7 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -574,6 +574,66 @@ static int kvm_arch_set_tsc_khz(CPUState *cs)
 return 0;
 }
 
+static int hyperv_handle_properties(CPUState *cs)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+if (cpu->hyperv_relaxed_timing) {
+env->features[FEAT_HYPERV_EAX

Re: [Qemu-devel] [PATCH v3 3/3] palmetto-bmc: Configure the SCU's hardware strapping register

2016-06-24 Thread Cédric Le Goater

On 06/24/2016 06:58 AM, Andrew Jeffery wrote:
> The magic constant configures the following options:
> 
> * 28:27: Configure DRAM size as 256MB
> * 26:24: DDR3 SDRAM with CL = 6, CWL = 5
> * 23: Configure 24/48MHz CLKIN
> * 22: Disable GPIOE pass-through mode
> * 21: Disable GPIOD pass-through mode
> * 20: Enable LPC decode of SuperIO 0x2E/0x4E addresses
> * 19: Disable ACPI
> * 18: Configure 48MHz CLKIN
> * 17: Disable BMC 2nd boot watchdog timer
> * 16: Decode SuperIO address 0x2E
> * 15: VGA Class Code
> * 14: Enable LPC dedicated reset pin
> * 13:12: Enable SPI Master and SPI Slave to AHB Bridge
> * 11:10: Select CPU:AHB ratio = 2:1
> * 9:8: Select 384MHz H-PLL
> * 7: Configure MAC#2 for RMII/NCSI
> * 6: Configure MAC#1 for RMII/NCSI
> * 5: No VGA BIOS ROM
> * 4: Boot using 32bit SPI address mode
> * 3:2: Select 16MB VGA memory
> * 1:0: Boot from SPI flash memory

As the previous patchset was not setting the silicon version, we did not 
see that should be skipping the SPI Flash checks in U-Boot. So, with this 
little change :

-object_property_set_int(OBJECT(&bmc->soc), 0x120CE416, "hw-strap1",
+object_property_set_int(OBJECT(&bmc->soc), 0x120CE414, "hw-strap1",

We reach a point where U-Boot is ready to load a kernel :

U-Boot 2013.07 (Jun 07 2016 - 01:33:14)

I2C:   ready
DRAM:  256 MiB
WARNING: Caches not enabled
Flash: SPI Flash ID: 19ba20 
32 MiB
In:serial
Out:   serial
Err:   serial
H/W:   AST2400 series chip Rev. 00 
Watchdog: 300s
Net:   aspeednic#0
Warning: aspeednic#0 (eth0) using random MAC address - aa:36:26:50:30:9c

Hit any key to stop autoboot:  0 
boot#


I will see if it is worth adding what is required in the SMC controller to 
support this. Else, we have the quick solution above. 

Tested-by: Cédric Le Goater 

Thanks,

C.  

 
> Signed-off-by: Andrew Jeffery 
> Reviewed-by: Cédric Le Goater 
> Reviewed-by: Peter Maydell 
> ---
>  hw/arm/palmetto-bmc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/arm/palmetto-bmc.c b/hw/arm/palmetto-bmc.c
> index a51d960510ee..b8eed21348d8 100644
> --- a/hw/arm/palmetto-bmc.c
> +++ b/hw/arm/palmetto-bmc.c
> @@ -44,6 +44,8 @@ static void palmetto_bmc_init(MachineState *machine)
>  &bmc->ram);
>  object_property_add_const_link(OBJECT(&bmc->soc), "ram", 
> OBJECT(&bmc->ram),
> &error_abort);
> +object_property_set_int(OBJECT(&bmc->soc), 0x120CE416, "hw-strap1",
> +&error_abort);
>  object_property_set_bool(OBJECT(&bmc->soc), true, "realized",
>   &error_abort);
>  
>

Re: [Qemu-devel] [PATCH v2 3/3] palmetto-bmc: Configure the SCU's hardware strapping register

2016-06-24 Thread Peter Maydell

On 24 June 2016 at 03:21, Andrew Jeffery  wrote:
> On Thu, 2016-06-23 at 18:39 +0100, Peter Maydell wrote:
>> On 23 June 2016 at 03:15, Andrew Jeffery  wrote:
>> >
>> > The magic constant configures the following options:
>> >
>> > * 28:27: Configure DRAM size as 256MB
>> > * 26:24: DDR3 SDRAM with CL = 6, CWL = 5
>> > * 23: Configure 24/48MHz CLKIN
>> > * 22: Disable GPIOE pass-through mode
>> > * 21: Disable GPIOD pass-through mode
>> > * 20: Enable LPC decode of SuperIO 0x2E/0x4E addresses
>> > * 19: Disable ACPI
>> > * 18: Configure 48MHz CLKIN
>> > * 17: Disable BMC 2nd boot watchdog timer
>> > * 16: Decode SuperIO address 0x2E
>> > * 15: VGA Class Code
>> > * 14: Enable LPC dedicated reset pin
>> > * 13:12: Enable SPI Master and SPI Slave to AHB Bridge
>> > * 11:10: Select CPU:AHB ratio = 2:1
>> > * 9:8: Select 384MHz H-PLL
>> > * 7: Configure MAC#2 for RMII/NCSI
>> > * 6: Configure MAC#1 for RMII/NCSI
>> > * 5: No VGA BIOS ROM
>> > * 4: Boot using 32bit SPI address mode
>> > * 3:2: Select 16MB VGA memory
>> > * 1:0: Boot from SPI flash memory
>> Maybe we should say this in a comment in the code?
>
> The list describes our specific value choices in the register's
> bitfields rather than fully documenting the bitfields and values. If
> you prefer I could switch to the latter and make it a comment, but
> failing that my only thought was if we tweaked the value the comment
> maybe come out of sync. By putting our choices in the commit message
> the description is at least accurate for what was configured at the
> time.

I'd just like some idea of where the magic number comes
from. At the moment the source code doesn't even have a
reference to a data sheet that would indicate where to look.

thanks
-- PMM

[Qemu-devel] [PATCH] resize qcow2 with snapshot

2016-06-24 Thread zhangzhiming

qcow2 can’t be resized while there is a snapshot in qcow2 image but in version 
3 image of qcow2,
each disk size of snapshot is stored in the image, so we can resize image even 
through this are some snapshots 
since version 3 image. the function of resize nearly finished by others,  i 
just do very little job.


Signed-off-by: zhangzhiming mailto:zhangzhimin...@meituan.com>>

---
 block.c  |   19 +++
 block/qcow2-cluster.c|   29 +
 block/qcow2-snapshot.c   |   37 +
 block/qcow2.c|   29 -
 block/qcow2.h|1 +
 include/block/snapshot.h |1 +
 6 files changed, 95 insertions(+), 21 deletions(-)

diff --git a/block.c b/block.c
index f54bc25..0de7b2d 100644
--- a/block.c
+++ b/block.c
@@ -2586,6 +2586,25 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
 return ret;
 }
 
+int bdrv_apply_snapshot(BlockDriverState *bs, const char *snapshot_id,
+uint64_t snapshot_size)
+{
+int ret = bdrv_snapshot_goto(bs, snapshot_id);
+if (ret < 0) {
+return ret;
+}
+
+ret = refresh_total_sectors(bs, snapshot_size >> BDRV_SECTOR_BITS);
+if (ret < 0) {
+return ret;
+}
+bdrv_dirty_bitmap_truncate(bs);
+if (bs->blk) {
+blk_dev_resize_cb(bs->blk);
+}
+return ret;
+}
+
 /**
  * Length of a allocated file in bytes. Sparse files are counted by actual
  * allocated space. Return < 0 if error or unknown.
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index b04bfaf..ebadddf 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,35 @@
 #include "qemu/bswap.h"
 #include "trace.h"
 
+int shrink_l1_table(BlockDriverState *bs, int64_t new_l1_size)
+{
+BDRVQcow2State *s = bs->opaque;
+int64_t old_l1_size = s->l1_size;
+s->l1_size = new_l1_size;
+int ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset,
+ s->l1_size, 1);
+if (ret < 0) {
+return ret;
+}
+
+s->l1_size = old_l1_size;
+ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset,
+ s->l1_size, -1);
+if (ret < 0) {
+return ret;
+}
+s->l1_size = new_l1_size;
+
+uint32_t be_l1_size = cpu_to_be32(s->l1_size);
+ret = bdrv_pwrite_sync(bs->file->bs, offsetof(QCowHeader, l1_size),
+   &be_l1_size, sizeof(be_l1_size));
+if (ret < 0) {
+return ret;
+}
+
+return 0;
+}
+
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
 bool exact_size)
 {
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 242fb21..6dd6769 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -478,13 +478,6 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char 
*snapshot_id)
 }
 sn = &s->snapshots[snapshot_index];
 
-if (sn->disk_size != bs->total_sectors * BDRV_SECTOR_SIZE) {
-error_report("qcow2: Loading snapshots with different disk "
-"size is not implemented");
-ret = -ENOTSUP;
-goto fail;
-}
-
 /*
  * Make sure that the current L1 table is big enough to contain the whole
  * L1 table of the snapshot. If the snapshot L1 table is smaller, the
@@ -506,8 +499,8 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char 
*snapshot_id)
  * Decrease the refcount referenced by the old one only when the L1
  * table is overwritten.
  */
-sn_l1_table = g_try_malloc0(cur_l1_bytes);
-if (cur_l1_bytes && sn_l1_table == NULL) {
+sn_l1_table = g_try_malloc0(sn_l1_bytes);
+if (sn_l1_bytes && sn_l1_table == NULL) {
 ret = -ENOMEM;
 goto fail;
 }
@@ -525,17 +518,34 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char 
*snapshot_id)
 }
 
 ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_ACTIVE_L1,
-s->l1_table_offset, cur_l1_bytes);
+s->l1_table_offset, sn_l1_bytes);
 if (ret < 0) {
 goto fail;
 }
 
 ret = bdrv_pwrite_sync(bs->file->bs, s->l1_table_offset, sn_l1_table,
-   cur_l1_bytes);
+   sn_l1_bytes);
+if (ret < 0) {
+goto fail;
+}
+
+/* write updated header.size */
+uint64_t be_disk_size = cpu_to_be64(sn->disk_size);
+ret = bdrv_pwrite_sync(bs->file->bs, offsetof(QCowHeader, size),
+   &be_disk_size, sizeof(uint64_t));
 if (ret < 0) {
 goto fail;
 }
 
+uint32_t be_l1_size = cpu_to_be32(sn->l1_size);
+ret = bdrv_pwrite_sync(bs->file->bs, offsetof(QCowHeader, l1_size),
+   &be_l1_size, sizeof(be_l1_size));
+if (ret < 0) {
+goto fail;
+}
+
+s->l1_vm_state_index = sn->l1_size;
+
 /*

Re: [Qemu-devel] Change of max-ram-below-4g initial value breaks Xen

2016-06-24 Thread Anthony PERARD

On Fri, Jun 24, 2016 at 07:46:23AM +0200, Gerd Hoffmann wrote:
> On Do, 2016-06-23 at 17:18 +0100, Anthony PERARD wrote:
> > On Thu, Jun 23, 2016 at 04:57:54PM +0200, Gerd Hoffmann wrote:
> > >   Hi,
> > > 
> > > > How could xen_ram_init() find out if the value of max-ram-below-4g is
> > > > the default or if a user have set it? Is there another way we could fix
> > > > this?
> > > 
> > > Attached patch should fix it.  Patch survived a quick smoke test on kvm
> > > so far, need to do some more testing tomorrow.  Can you give it a spin
> > > on xen?
> > 
> > Thanks. Unfortunately, it does not work :(.
> > 
> > In this patch, max_ram_below_4g is set before the call to xen_ram_init()
> > and xen_ram_init read it back (via object_property_get_int()).  So, in
> > xen_ram_init, user_lowmem is not 0.
> 
> Ah, I see.  We do the split calculation twice on xen.  That is pretty
> pointless.  New patch attached.

I've tested on Xen, it works fine. Thanks. Also, the patch look good.

Cheers,

-- 
Anthony PERARD

Re: [Qemu-devel] Question about a qemu Aarch64 error when adding several SCSI disks

2016-06-24 Thread Cole Robinson

On 06/24/2016 12:58 AM, Kevin Zhao wrote:
> Hi Peter,
>  Follow your advice, I have complied the Qemu v2.6.
> stack@u202158:~$ kvm --version
> QEMU emulator version 2.6.50 (v2.6.0-1280-g6f1d2d1-dirty), Copyright (c)
> 2003-2008 Fabrice Bellard
>  With this newest version, I use virt-manager to create the guest , the
> xml file is in the attachment. But the Qemu return error when creating:
>  *error: internal error: process exited while connecting to monitor:
> qemu-system-aarch64: -device
> pci-bridge,chassis_nr=2,id=pci,bus=pci,addr=0x1: Duplicate ID 'pci' for
> device*
> 

That's probably this libvirt issue fixed in 1.3.4 and later:

https://www.redhat.com/archives/libvirt-users/2016-April/msg00030.html

I suggest testing with libvirt.git as well, there's been aarch64 related
patches trickling in regularly

- Cole

>  The guest xml file in in attachment. But the XML worked when Qemu is
> v2.4.0.
>   Also I delete the items in the xml :
>   -  
>   -  
>   -
>   - function='0x0'/>
>   -  
>   -  
>   -
>   -
>   - function='0x0'/>
>   -  
>   Using virsh create guest.xml, got the error too :
>   *error: internal error: process exited while connecting to monitor:
> qemu-system-aarch64: -device
> pci-bridge,chassis_nr=2,id=pci,bus=pci,addr=0x1: Duplicate ID 'pci' for
> device.*
> My test machine is Softiron, with AMD* ARM64 *server CPU. The  libvirt
> version is 1.3.1
> 
>  Kindly need your help. You will be really appreciated :-)
>  Big Thanks~
>

[Qemu-devel] [PATCH] xen: fix ram init regression

2016-06-24 Thread Gerd Hoffmann

Commit "8156d48 pc: allow raising low memory via max-ram-below-4g
option" causes a regression on xen, because it uses a different
memory split.

This patch initializes max-ram-below-4g to zero and leaves the
initialization to the memory initialization functions.  That way
they can pick different default values (max-ram-below-4g is zero
still) or use the user supplied value (max-ram-below-4g is non-zero).

Also skip the whole ram split calculation on Xen.  xen_ram_init()
does its own split calculation anyway so it is superfluous, also
this way xen_ram_init can actually see whenever max-ram-below-4g
is zero or not.

Reported-by: Anthony PERARD 
Tested-by: Anthony PERARD 
Signed-off-by: Gerd Hoffmann 
---
 hw/i386/pc.c  |  2 +-
 hw/i386/pc_piix.c | 52 +---
 hw/i386/pc_q35.c  |  3 +++
 xen-hvm.c |  3 +++
 4 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7198ed5..66e1dae 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1886,7 +1886,7 @@ static void pc_machine_initfn(Object *obj)
 pc_machine_get_hotplug_memory_region_size,
 NULL, NULL, NULL, &error_abort);
 
-pcms->max_ram_below_4g = 0xe000; /* 3.5G */
+pcms->max_ram_below_4g = 0; /* use default */
 object_property_add(obj, PC_MACHINE_MAX_RAM_BELOW_4G, "size",
 pc_machine_get_max_ram_below_4g,
 pc_machine_set_max_ram_below_4g,
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 53bc968..f51fa77 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -108,37 +108,43 @@ static void pc_init1(MachineState *machine,
  *so legacy non-PAE guests can get as much memory as possible in
  *the 32bit address space below 4G.
  *
+ *  - Note that Xen has its own ram setp code in xen_ram_init(),
+ *called via xen_hvm_init().
+ *
  * Examples:
  *qemu -M pc-1.7 -m 4G(old default)-> 3584M low,  512M high
  *qemu -M pc -m 4G(new default)-> 3072M low, 1024M high
  *qemu -M pc,max-ram-below-4g=2G -m 4G -> 2048M low, 2048M high
  *qemu -M pc,max-ram-below-4g=4G -m 3968M  -> 3968M low (=4G-128M)
  */
-lowmem = pcms->max_ram_below_4g;
-if (machine->ram_size >= pcms->max_ram_below_4g) {
-if (pcmc->gigabyte_align) {
-if (lowmem > 0xc000) {
-lowmem = 0xc000;
-}
-if (lowmem & ((1ULL << 30) - 1)) {
-error_report("Warning: Large machine and max_ram_below_4g "
- "(%" PRIu64 ") not a multiple of 1G; "
- "possible bad performance.",
- pcms->max_ram_below_4g);
-}
-}
-}
-
-if (machine->ram_size >= lowmem) {
-pcms->above_4g_mem_size = machine->ram_size - lowmem;
-pcms->below_4g_mem_size = lowmem;
-} else {
-pcms->above_4g_mem_size = 0;
-pcms->below_4g_mem_size = machine->ram_size;
-}
-
 if (xen_enabled()) {
 xen_hvm_init(pcms, &ram_memory);
+} else {
+if (!pcms->max_ram_below_4g) {
+pcms->max_ram_below_4g = 0xe000; /* default: 3.5G */
+}
+lowmem = pcms->max_ram_below_4g;
+if (machine->ram_size >= pcms->max_ram_below_4g) {
+if (pcmc->gigabyte_align) {
+if (lowmem > 0xc000) {
+lowmem = 0xc000;
+}
+if (lowmem & ((1ULL << 30) - 1)) {
+error_report("Warning: Large machine and max_ram_below_4g "
+ "(%" PRIu64 ") not a multiple of 1G; "
+ "possible bad performance.",
+ pcms->max_ram_below_4g);
+}
+}
+}
+
+if (machine->ram_size >= lowmem) {
+pcms->above_4g_mem_size = machine->ram_size - lowmem;
+pcms->below_4g_mem_size = lowmem;
+} else {
+pcms->above_4g_mem_size = 0;
+pcms->below_4g_mem_size = machine->ram_size;
+}
 }
 
 pc_cpus_init(pcms);
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index e4b541f..1b653e2 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -93,6 +93,9 @@ static void pc_q35_init(MachineState *machine)
 /* Handle the machine opt max-ram-below-4g.  It is basically doing
  * min(qemu limit, user limit).
  */
+if (!pcms->max_ram_below_4g) {
+pcms->max_ram_below_4g = 1ULL << 32; /* default: 4G */;
+}
 if (lowmem > pcms->max_ram_below_4g) {
 lowmem = pcms->max_ram_below_4g;
 if (machine->ram_size - lowmem > lowmem &&
diff --git a/xen-hvm.c b/xen-hvm.c
index 98ea44f..eb57792 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -190,6 +190,9 @@ static void xen_ram_init(PCMachineState *pcms,
 /* Handle the m

Re: [Qemu-devel] [PATCH v2 3/3] palmetto-bmc: Configure the SCU's hardware strapping register

2016-06-24 Thread Cédric Le Goater

On 06/24/2016 12:55 PM, Peter Maydell wrote:
> On 24 June 2016 at 03:21, Andrew Jeffery  wrote:
>> On Thu, 2016-06-23 at 18:39 +0100, Peter Maydell wrote:
>>> On 23 June 2016 at 03:15, Andrew Jeffery  wrote:

 The magic constant configures the following options:

 * 28:27: Configure DRAM size as 256MB
 * 26:24: DDR3 SDRAM with CL = 6, CWL = 5
 * 23: Configure 24/48MHz CLKIN
 * 22: Disable GPIOE pass-through mode
 * 21: Disable GPIOD pass-through mode
 * 20: Enable LPC decode of SuperIO 0x2E/0x4E addresses
 * 19: Disable ACPI
 * 18: Configure 48MHz CLKIN
 * 17: Disable BMC 2nd boot watchdog timer
 * 16: Decode SuperIO address 0x2E
 * 15: VGA Class Code
 * 14: Enable LPC dedicated reset pin
 * 13:12: Enable SPI Master and SPI Slave to AHB Bridge
 * 11:10: Select CPU:AHB ratio = 2:1
 * 9:8: Select 384MHz H-PLL
 * 7: Configure MAC#2 for RMII/NCSI
 * 6: Configure MAC#1 for RMII/NCSI
 * 5: No VGA BIOS ROM
 * 4: Boot using 32bit SPI address mode
 * 3:2: Select 16MB VGA memory
 * 1:0: Boot from SPI flash memory
>>> Maybe we should say this in a comment in the code?
>>
>> The list describes our specific value choices in the register's
>> bitfields rather than fully documenting the bitfields and values. If
>> you prefer I could switch to the latter and make it a comment, but
>> failing that my only thought was if we tweaked the value the comment
>> maybe come out of sync. By putting our choices in the commit message
>> the description is at least accurate for what was configured at the
>> time.
> 
> I'd just like some idea of where the magic number comes
> from. At the moment the source code doesn't even have a
> reference to a data sheet that would indicate where to look.

Yes. I think the HW_STRAP1 register deserves its list of defines. 
There are some differences between the SOCs. With defines, it will 
be easier to build and read the values in the platform file. 

I can do that with a patch I will probably need to unset SPI boot.

Thanks,

C.

Re: [Qemu-devel] [PATCH v3 1/5] m25p80: qdev-ify drive property

2016-06-24 Thread Cédric Le Goater

On 06/23/2016 08:35 PM, Peter Maydell wrote:
> On 23 June 2016 at 17:33, Cédric Le Goater  wrote:
>> From: Paolo Bonzini 
>>
>> This allows specifying the property via -drive if=none and creating
>> the flash device with -device.
>>
>> Signed-off-by: Paolo Bonzini 
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  hw/arm/xilinx_zynq.c|  8 +++-
>>  hw/arm/xlnx-ep108.c |  9 -
>>  hw/block/m25p80.c   | 10 ++
>>  hw/microblaze/petalogix_ml605_mmu.c |  9 -
>>  4 files changed, 25 insertions(+), 11 deletions(-)
> 
> hw/arm/sabrelite.c also creates one of these devices and needs updating.

OK. I will extend Paolo's patch then.

Thanks, 

C.

Re: [Qemu-devel] [PULL 5/8] target-sparc: Use global registers for the register window

2016-06-24 Thread Paolo Bonzini



On 24/06/2016 12:42, Mark Cave-Ayland wrote:
> On 24/06/16 07:36, Paolo Bonzini wrote:
> 
>> On 24/06/2016 05:57, Richard Henderson wrote:
>>>
>>> Whatever happens, it happens after 10GB of logs, which is simply too
>>> much to sift through.  I've tried to narrow it down, but the lack of a
>>> hardware tlb refill means that we get hundreds of thousands of Data
>>> Access Faults that are simply TLB misses and not the actual Segmentation
>>> Fault in question.
>>>
>>> It doesn't seem to affect other OSes, so I can't imagine what quirk is
>>> being exercised in this case.
>>>
>>> As loath as I am to suggest it, we may have to revert the sparc indirect
>>> register patch for the release.
>>
>> We have more than a month.  If it's reproducible, it can be fixed. :)
>>
>>> I do now ping the rest of my sparc improvements patchset.  It's
>>> completely independent of the use of indirect registers.
>>
>> Mark, perhaps you can try to use migration to reduce the amount of
>> logging?  (Start QEMU with -snapshot, try to stop the vm before it
>> fails.  If you succeed, do a "migrate exec:cat>foo.sav" followed by
>> "commit"; if you fail, try again).
> 
> Yeah, given the improvements that Richard has made, I'd prefer not to
> revert if at all possible. Finally I have some spare time today so I'll
> try and get this down to an easily-testable qcow2 image that can
> reproduce the issue.

I've gotten an image that reaches the segmentation fault in about 1
second but I cannot upload it anywhere in the next few hours.  The good
news is that it fails even without a hard disk (so it's a stateless vm)
and with -d nochain -singlestep.  The bad news is that the dump is not
very deterministic and that I failed to create images closer to the failure.

Paolo

Re: [Qemu-devel] [PATCH v2 3/3] socket: unlink unix socket on remove

2016-06-24 Thread Marc-André Lureau

On Thu, Jun 23, 2016 at 7:01 PM, Michael S. Tsirkin  wrote:
>> > Maybe what you want is a need_unlink feature.
>> > Set it for unix sockets only, that would make some sense.
>>
>> Oh perhaps what you mean is that if the fd was passed, we should cleanup the 
>> unix socket? Yes, I think we should do that then. I'll update the series.

Actually it's not possible to pass a listening fd to a socket chardev
today (the path argument doesn't understand /dev/fdset), so only path
created by qemu will be cleaned up.

>
> I'd like it better contained - that's all. So let's set a flag that says
> "must unlink" as opposed to "it's listening".

You suggest to rename QIO_CHANNEL_FEATURE_LISTEN to
QIO_CHANNEL_FEATURE_LISTEN_MUST_UNLINK ? Or to add another feature
flag? I don't think that brings anything useful here.

What would you think Daniel?

-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH v2 3/3] socket: unlink unix socket on remove

2016-06-24 Thread Daniel P. Berrange

On Fri, Jun 24, 2016 at 02:08:52PM +0200, Marc-André Lureau wrote:
> On Thu, Jun 23, 2016 at 7:01 PM, Michael S. Tsirkin  wrote:
> >> > Maybe what you want is a need_unlink feature.
> >> > Set it for unix sockets only, that would make some sense.
> >>
> >> Oh perhaps what you mean is that if the fd was passed, we should cleanup 
> >> the unix socket? Yes, I think we should do that then. I'll update the 
> >> series.
> 
> Actually it's not possible to pass a listening fd to a socket chardev
> today (the path argument doesn't understand /dev/fdset), so only path
> created by qemu will be cleaned up.
> 
> >
> > I'd like it better contained - that's all. So let's set a flag that says
> > "must unlink" as opposed to "it's listening".
> 
> You suggest to rename QIO_CHANNEL_FEATURE_LISTEN to
> QIO_CHANNEL_FEATURE_LISTEN_MUST_UNLINK ? Or to add another feature
> flag? I don't think that brings anything useful here.

IMHO the existing QIO_CHANNEL_FEATURE_LISTEN makes more sense than
a QIO_CHANNEL_FEATURE_MUST_UNLINK, so I'd like to see your current
patches go in as is.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump

2016-06-24 Thread Kai Storbeck

@serge,

I'd be happy to test each of the patches, but considering the length of
this page I'd like an exact link to a patch and/or patches that need to
be tested, and against which version (trusty-security i suppose?).

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1297218

Title:
  guest hangs after live migration due to tsc jump

Status in QEMU:
  New
Status in glusterfs package in Ubuntu:
  Invalid
Status in qemu package in Ubuntu:
  Fix Released
Status in glusterfs source package in Trusty:
  Confirmed
Status in qemu source package in Trusty:
  Confirmed

Bug description:
  We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing
  a Gluster filesystem. Guests can be live migrated between them.
  However, live migration often leads to the guest being stuck at 100%
  for a while. In that case, the dmesg output for such a guest will show
  (once it recovers): Clocksource tsc unstable (delta = 662463064082
  ns). In this particular example, a guest was migrated and only after
  11 minutes (662 seconds) did it become responsive again.

  It seems that newly booted guests doe not suffer from this problem,
  these can be migrated back and forth at will. After a day or so, the
  problem becomes apparent. It also seems that migrating from server A
  to server B causes much more problems than going from B back to A. If
  necessary, I can do more measurements to qualify these observations.

  The VM servers run Ubuntu 13.04 with these packages:
  Kernel: 3.8.0-35-generic x86_64
  Libvirt: 1.0.2
  Qemu: 1.4.0
  Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using 
libgfapi yet as the Ubuntu libvirt is not linked against libgfapi).
  The interconnect between both machines (both for migration and gluster) is 
10GbE. 
  Both servers are synced to NTP and well within 1ms form one another.

  Guests are either Ubuntu 13.04 or 13.10.

  On the guests, the current_clocksource is kvm-clock.
  The XML definition of the guests only contains:   

  Now as far as I've read in the documentation of kvm-clock, it specifically 
supports live migrations, so I'm a bit surprised at these problems. There isn't 
all that much information to find on these issue, although I have found 
postings by others that seem to have run into the same issues, but without a 
solution.
  --- 
  ApportVersion: 2.14.1-0ubuntu3
  Architecture: amd64
  DistroRelease: Ubuntu 14.04
  Package: libvirt (not installed)
  ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic 
root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 
BOOTIF=01-00-25-90-75-b5-c8
  ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
  Tags:  trusty apparmor apparmor apparmor apparmor apparmor
  Uname: Linux 3.13.0-24-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: True
  modified.conffile..etc.default.libvirt.bin: [modified]
  modified.conffile..etc.libvirt.libvirtd.conf: [modified]
  modified.conffile..etc.libvirt.qemu.conf: [modified]
  modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted]
  mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662
  mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837
  mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions

Re: [Qemu-devel] [PATCH RFC 4/5] include: Move typedef qemu_irq to qemu/typedefs.h

2016-06-24 Thread Markus Armbruster

Peter Maydell  writes:

> On 24 June 2016 at 09:17, Paolo Bonzini  wrote:
>>
>>
>> On 24/06/2016 10:15, Peter Maydell wrote:
>> >>> @@ -31,6 +31,7 @@ typedef struct FWCfgState FWCfgState;
>> >>>  typedef struct HCIInfo HCIInfo;
>> >>>  typedef struct I2CBus I2CBus;
>> >>>  typedef struct I2SCodec I2SCodec;
>> >>> +typedef struct IRQState *qemu_irq;
>> >>>  typedef struct ISABus ISABus;
>> >>>  typedef struct ISADevice ISADevice;
>> >>>  typedef struct IsaDma IsaDma;
> >>
> >> Everything else in typedefs.h is a "typedef struct Thing Thing",
> >> but qemu_irq is different...
 >
 > We want to keep our readers on their toes!
>>> It would mean you now have to decide whether the file is orderd
>>> by the types being defined or by the underlying implementation
>>> type (previously both orders were the same)...
>>
>> Indeed, and renaming the struct is trivial because it's used in a
>> handful of places only.
>
> It would still be different by being a pointer-to-Foo, not a Foo.

Hiding pointer-ness behind a typedef is a bad idea more often than not.

What do you want me to do, if anything?

[Qemu-devel] [PATCH] target-sparc: fix register corruption in ldstub if there is no write permission

2016-06-24 Thread Artyom Tarasenko

Signed-off-by: Artyom Tarasenko 
---
 target-sparc/translate.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index 5111cf0..065326c 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -5187,12 +5187,15 @@ printf("ops, illegal rdhpr\n");
 case 0xd:   /* ldstub -- XXX: should be atomically */
 {
 TCGv r_const;
+TCGv tmp = tcg_temp_new();
 
 gen_address_mask(dc, cpu_addr);
-tcg_gen_qemu_ld8u(cpu_val, cpu_addr, dc->mem_idx);
+tcg_gen_qemu_ld8u(tmp, cpu_addr, dc->mem_idx);
 r_const = tcg_const_tl(0xff);
 tcg_gen_qemu_st8(r_const, cpu_addr, dc->mem_idx);
+tcg_gen_mov_tl(cpu_val, tmp);
 tcg_temp_free(r_const);
+tcg_temp_free(tmp);
 }
 break;
 case 0x0f:
-- 
1.8.3.1

Re: [Qemu-devel] [PULL 5/8] target-sparc: Use global registers for the register window

2016-06-24 Thread Artyom Tarasenko

On Fri, Jun 24, 2016 at 2:03 PM, Paolo Bonzini  wrote:
>
>
> On 24/06/2016 12:42, Mark Cave-Ayland wrote:
>> On 24/06/16 07:36, Paolo Bonzini wrote:
>>
>>> On 24/06/2016 05:57, Richard Henderson wrote:

 Whatever happens, it happens after 10GB of logs, which is simply too
 much to sift through.  I've tried to narrow it down, but the lack of a
 hardware tlb refill means that we get hundreds of thousands of Data
 Access Faults that are simply TLB misses and not the actual Segmentation
 Fault in question.

 It doesn't seem to affect other OSes, so I can't imagine what quirk is
 being exercised in this case.

 As loath as I am to suggest it, we may have to revert the sparc indirect
 register patch for the release.
>>>
>>> We have more than a month.  If it's reproducible, it can be fixed. :)
>>>
 I do now ping the rest of my sparc improvements patchset.  It's
 completely independent of the use of indirect registers.
>>>
>>> Mark, perhaps you can try to use migration to reduce the amount of
>>> logging?  (Start QEMU with -snapshot, try to stop the vm before it
>>> fails.  If you succeed, do a "migrate exec:cat>foo.sav" followed by
>>> "commit"; if you fail, try again).
>>
>> Yeah, given the improvements that Richard has made, I'd prefer not to
>> revert if at all possible. Finally I have some spare time today so I'll
>> try and get this down to an easily-testable qcow2 image that can
>> reproduce the issue.
>
> I've gotten an image that reaches the segmentation fault in about 1
> second but I cannot upload it anywhere in the next few hours.  The good
> news is that it fails even without a hard disk (so it's a stateless vm)
> and with -d nochain -singlestep.  The bad news is that the dump is not
> very deterministic and that I failed to create images closer to the failure.

I have a fix for the bug, will post it shortly. This patch does reveal a bug in
the ldstub implementation, but I'm really surprised we haven't hit it before.

 I observe the following sequence under Solaris:

1. a memory page is mapped without the write permission.
2. the ldstub instruction is executed on this page.
3. ldstub raises an access exception
4b. Because of the bug in the ldstub, a register gets corrupted.
5b. Solaris kernel re-maps the page to have the write access
6b. ldstub is executed again with the corrupted register content
7b. Segmentation fault

With the ldstub fix it goes:
4a. Solaris kernel re-maps the page to have the write access
5a. ldstub is executed again, this time all is good.

There must be another bug in memory access handling.
Maybe cached TBs can be executed with the wrong mem idx?

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

[Qemu-devel] [PATCH v2 1/7] virtio-bus: Drop "set_handler" parameter

2016-06-24 Thread Fam Zheng

It always equals to assign now.

Signed-off-by: Fam Zheng 
---
 hw/virtio/virtio-bus.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index 1313760..f34b4fc 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -150,10 +150,9 @@ void virtio_bus_set_vdev_config(VirtioBusState *bus, 
uint8_t *config)
  * This function handles both assigning the ioeventfd handler and
  * registering it with the kernel.
  * assign: register/deregister ioeventfd with the kernel
- * set_handler: use the generic ioeventfd handler
  */
 static int set_host_notifier_internal(DeviceState *proxy, VirtioBusState *bus,
-  int n, bool assign, bool set_handler)
+  int n, bool assign)
 {
 VirtIODevice *vdev = virtio_bus_get_device(bus);
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
@@ -167,7 +166,7 @@ static int set_host_notifier_internal(DeviceState *proxy, 
VirtioBusState *bus,
 error_report("%s: unable to init event notifier: %d", __func__, r);
 return r;
 }
-virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler);
+virtio_queue_set_host_notifier_fd_handler(vq, true, true);
 r = k->ioeventfd_assign(proxy, notifier, n, assign);
 if (r < 0) {
 error_report("%s: unable to assign ioeventfd: %d", __func__, r);
@@ -201,7 +200,7 @@ void virtio_bus_start_ioeventfd(VirtioBusState *bus)
 if (!virtio_queue_get_num(vdev, n)) {
 continue;
 }
-r = set_host_notifier_internal(proxy, bus, n, true, true);
+r = set_host_notifier_internal(proxy, bus, n, true);
 if (r < 0) {
 goto assign_error;
 }
@@ -215,7 +214,7 @@ assign_error:
 continue;
 }
 
-r = set_host_notifier_internal(proxy, bus, n, false, false);
+r = set_host_notifier_internal(proxy, bus, n, false);
 assert(r >= 0);
 }
 k->ioeventfd_set_started(proxy, false, true);
@@ -237,7 +236,7 @@ void virtio_bus_stop_ioeventfd(VirtioBusState *bus)
 if (!virtio_queue_get_num(vdev, n)) {
 continue;
 }
-r = set_host_notifier_internal(proxy, bus, n, false, false);
+r = set_host_notifier_internal(proxy, bus, n, false);
 assert(r >= 0);
 }
 k->ioeventfd_set_started(proxy, false, false);
-- 
2.8.3

[Qemu-devel] [PATCH v2 0/7] virtio: Merge virtio-{blk, scsi} host notifier handling paths

2016-06-24 Thread Fam Zheng

v2: Only convert virtio-{blk,scsi}. [Paolo]

This series is based on top of Cornelia's

[PATCH 0/6] virtio: refactor host notifiers

The benifit is we don't use event_notifier_set_handler even in non-dataplane
now, which in turn makes virtio-blk and virtio-scsi follow block layer aio
context semantics. Specifically, I/O requests must come from
blk_get_aio_context(blk) events, rather than iohandler_get_aio_context(), so
that bdrv_drained_begin/end will work as expected.

Patch 4 reverts the hack (ab27c3b5e7) we added for 2.6. Lately, commit
b880481579 added another pair of bdrv_drained_begin/end so the crash cannot
happen even without ab27c3b5e7, but in order to avoid leaking requests, patch
two is still a must.


Fam Zheng (7):
  virtio-bus: Drop "set_handler" parameter
  virtio: Add typedef for handle_output
  virtio: Introduce virtio_add_queue_aio
  virtio: Use aio_set_event_notifier for aio vq
  virtio-blk: Call virtio_add_queue_aio
  virtio-scsi: Call virtio_add_queue_aio
  Revert "mirror: Workaround for unexpected iohandler events during
completion"

 block/mirror.c |  9 -
 hw/block/virtio-blk.c  |  2 +-
 hw/scsi/virtio-scsi.c  |  9 +++--
 hw/virtio/virtio-bus.c | 11 +--
 hw/virtio/virtio.c | 40 
 include/hw/virtio/virtio.h |  8 ++--
 6 files changed, 47 insertions(+), 32 deletions(-)

-- 
2.8.3

[Qemu-devel] [PATCH v2 6/7] virtio-scsi: Call virtio_add_queue_aio

2016-06-24 Thread Fam Zheng

AIO based handler is more appropriate here because it will then
cooperate with bdrv_drained_begin/end. It is needed by the coming
revert patch.

Signed-off-by: Fam Zheng 
---
 hw/scsi/virtio-scsi.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 71d09d3..57deddd 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -851,13 +851,10 @@ void virtio_scsi_common_realize(DeviceState *dev, Error 
**errp,
 s->sense_size = VIRTIO_SCSI_SENSE_DEFAULT_SIZE;
 s->cdb_size = VIRTIO_SCSI_CDB_DEFAULT_SIZE;
 
-s->ctrl_vq = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE,
-  ctrl);
-s->event_vq = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE,
-   evt);
+s->ctrl_vq = virtio_add_queue_aio(vdev, VIRTIO_SCSI_VQ_SIZE, ctrl);
+s->event_vq = virtio_add_queue_aio(vdev, VIRTIO_SCSI_VQ_SIZE, evt);
 for (i = 0; i < s->conf.num_queues; i++) {
-s->cmd_vqs[i] = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE,
- cmd);
+s->cmd_vqs[i] = virtio_add_queue_aio(vdev, VIRTIO_SCSI_VQ_SIZE, cmd);
 }
 
 if (s->conf.iothread) {
-- 
2.8.3

[Qemu-devel] [PATCH 4/6] e1000e: add boot rom

2016-06-24 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
---
 hw/net/e1000e.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index 692283f..4778744 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -693,6 +693,7 @@ static void e1000e_class_init(ObjectClass *class, void 
*data)
 c->vendor_id = PCI_VENDOR_ID_INTEL;
 c->device_id = E1000_DEV_ID_82574L;
 c->revision = 0;
+c->romfile = "efi-e1000e.rom";
 c->class_id = PCI_CLASS_NETWORK_ETHERNET;
 c->is_express = 1;
 
-- 
1.8.3.1

[Qemu-devel] [PATCH v2 4/7] virtio: Use aio_set_event_notifier for aio vq

2016-06-24 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 hw/virtio/virtio.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 1ea6f66..a586529 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1835,11 +1835,21 @@ static void 
virtio_queue_host_notifier_read(EventNotifier *n)
 void virtio_queue_set_host_notifier_fd_handler(VirtQueue *vq, bool assign,
bool set_handler)
 {
+AioContext *ctx = qemu_get_aio_context();
 if (assign && set_handler) {
-event_notifier_set_handler(&vq->host_notifier, true,
+if (vq->use_aio) {
+aio_set_event_notifier(ctx, &vq->host_notifier, true,
virtio_queue_host_notifier_read);
+} else {
+event_notifier_set_handler(&vq->host_notifier, true,
+   virtio_queue_host_notifier_read);
+}
 } else {
-event_notifier_set_handler(&vq->host_notifier, true, NULL);
+if (vq->use_aio) {
+aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL);
+} else {
+event_notifier_set_handler(&vq->host_notifier, true, NULL);
+}
 }
 if (!assign) {
 /* Test and clear notifier before after disabling event,
-- 
2.8.3

[Qemu-devel] [PATCH v2 5/7] virtio-blk: Call virtio_add_queue_aio

2016-06-24 Thread Fam Zheng

AIO based handler is more appropriate here because it will then
cooperate with bdrv_drained_begin/end. It is needed by the coming
revert patch.

Signed-off-by: Fam Zheng 
---
 hw/block/virtio-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 284e646..b1f56c3 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -888,7 +888,7 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 s->rq = NULL;
 s->sector_mask = (s->conf.conf.logical_block_size / BDRV_SECTOR_SIZE) - 1;
 
-s->vq = virtio_add_queue(vdev, 128, virtio_blk_handle_output);
+s->vq = virtio_add_queue_aio(vdev, 128, virtio_blk_handle_output);
 virtio_blk_data_plane_create(vdev, conf, &s->dataplane, &err);
 if (err != NULL) {
 error_propagate(errp, err);
-- 
2.8.3

[Qemu-devel] [PATCH v2 2/7] virtio: Add typedef for handle_output

2016-06-24 Thread Fam Zheng

The function pointer signature has been repeated a few times, using a
typedef may make coding easier.

Signed-off-by: Fam Zheng 
---
 hw/virtio/virtio.c | 9 -
 include/hw/virtio/virtio.h | 5 +++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7ed06ea..bba73bb 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -95,8 +95,8 @@ struct VirtQueue
 int inuse;
 
 uint16_t vector;
-void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
-void (*handle_aio_output)(VirtIODevice *vdev, VirtQueue *vq);
+VirtIOHandleOutput handle_output;
+VirtIOHandleOutput handle_aio_output;
 VirtIODevice *vdev;
 EventNotifier guest_notifier;
 EventNotifier host_notifier;
@@ -1131,7 +1131,7 @@ void virtio_queue_set_vector(VirtIODevice *vdev, int n, 
uint16_t vector)
 }
 
 VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
-void (*handle_output)(VirtIODevice *, VirtQueue *))
+VirtIOHandleOutput handle_output)
 {
 int i;
 
@@ -1794,8 +1794,7 @@ static void 
virtio_queue_host_notifier_aio_read(EventNotifier *n)
 }
 
 void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
-void 
(*handle_output)(VirtIODevice *,
-  
VirtQueue *))
+VirtIOHandleOutput 
handle_output)
 {
 if (handle_output) {
 vq->handle_aio_output = handle_output;
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 96b581d..b104104 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -138,9 +138,10 @@ void virtio_cleanup(VirtIODevice *vdev);
 /* Set the child bus name. */
 void virtio_device_set_child_bus_name(VirtIODevice *vdev, char *bus_name);
 
+typedef void (*VirtIOHandleOutput)(VirtIODevice *, VirtQueue *);
+
 VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
-void (*handle_output)(VirtIODevice *,
-  VirtQueue *));
+VirtIOHandleOutput handle_output);
 
 void virtio_del_queue(VirtIODevice *vdev, int n);
 
-- 
2.8.3

[Qemu-devel] [PATCH v2 7/7] Revert "mirror: Workaround for unexpected iohandler events during completion"

2016-06-24 Thread Fam Zheng

This reverts commit ab27c3b5e7408693dde0b565f050aa55c4a1bcef.

The virtio storage device host notifiers now work with
bdrv_drained_begin/end, so we don't need this hack any more.

Signed-off-by: Fam Zheng 
---
 block/mirror.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index a04ed9c..147c0d6 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -500,9 +500,6 @@ static void mirror_exit(BlockJob *job, void *opaque)
 block_job_completed(&s->common, data->ret);
 g_free(data);
 bdrv_drained_end(src);
-if (qemu_get_aio_context() == bdrv_get_aio_context(src)) {
-aio_enable_external(iohandler_get_aio_context());
-}
 bdrv_unref(src);
 }
 
@@ -726,12 +723,6 @@ immediate_exit:
 /* Before we switch to target in mirror_exit, make sure data doesn't
  * change. */
 bdrv_drained_begin(bs);
-if (qemu_get_aio_context() == bdrv_get_aio_context(bs)) {
-/* FIXME: virtio host notifiers run on iohandler_ctx, therefore the
- * above bdrv_drained_end isn't enough to quiesce it. This is ugly, we
- * need a block layer API change to achieve this. */
-aio_disable_external(iohandler_get_aio_context());
-}
 block_job_defer_to_main_loop(&s->common, mirror_exit, data);
 }
 
-- 
2.8.3

[Qemu-devel] [PATCH v2 3/7] virtio: Introduce virtio_add_queue_aio

2016-06-24 Thread Fam Zheng

Using this function instead of virtio_add_queue marks the vq as aio
based. This differentiation will be useful in later patches.

Signed-off-by: Fam Zheng 
---
 hw/virtio/virtio.c | 19 +--
 include/hw/virtio/virtio.h |  3 +++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index bba73bb..1ea6f66 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -97,6 +97,7 @@ struct VirtQueue
 uint16_t vector;
 VirtIOHandleOutput handle_output;
 VirtIOHandleOutput handle_aio_output;
+bool use_aio;
 VirtIODevice *vdev;
 EventNotifier guest_notifier;
 EventNotifier host_notifier;
@@ -1130,8 +1131,9 @@ void virtio_queue_set_vector(VirtIODevice *vdev, int n, 
uint16_t vector)
 }
 }
 
-VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
-VirtIOHandleOutput handle_output)
+static VirtQueue *virtio_add_queue_internal(VirtIODevice *vdev, int queue_size,
+VirtIOHandleOutput handle_output,
+bool use_aio)
 {
 int i;
 
@@ -1148,10 +1150,23 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int 
queue_size,
 vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
 vdev->vq[i].handle_output = handle_output;
 vdev->vq[i].handle_aio_output = NULL;
+vdev->vq[i].use_aio = use_aio;
 
 return &vdev->vq[i];
 }
 
+VirtQueue *virtio_add_queue_aio(VirtIODevice *vdev, int queue_size,
+VirtIOHandleOutput handle_output)
+{
+return virtio_add_queue_internal(vdev, queue_size, handle_output, true);
+}
+
+VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
+VirtIOHandleOutput handle_output)
+{
+return virtio_add_queue_internal(vdev, queue_size, handle_output, false);
+}
+
 void virtio_del_queue(VirtIODevice *vdev, int n)
 {
 if (n < 0 || n >= VIRTIO_QUEUE_MAX) {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b104104..1e8cae5 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -143,6 +143,9 @@ typedef void (*VirtIOHandleOutput)(VirtIODevice *, 
VirtQueue *);
 VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
 VirtIOHandleOutput handle_output);
 
+VirtQueue *virtio_add_queue_aio(VirtIODevice *vdev, int queue_size,
+VirtIOHandleOutput handle_output);
+
 void virtio_del_queue(VirtIODevice *vdev, int n);
 
 void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num);
-- 
2.8.3

[Qemu-devel] [PATCH 0/6] ipxe: update roms

2016-06-24 Thread Gerd Hoffmann

  Hi,

Here comes ipxe update for qemu 2.7.  It rebases the ipxe submodule
to latest upstream master.  We pick up support for virtio 1.0.
Also a fix for a EFI bug which causes problems with recent edk2
versions.

There are some more hickups due to network boot changes in edk2
which are not root-caused yet.  They are present with both old
and new ipxe versios, so we have at least no regressions here.
But possibly we will need another (much smaller) bugfix ipxe
update during freeze.

This patch series also adds a boot rom for the new e1000e emulation
and, while being at it, for vmxnet3 too.

Patch 6/6 with the binary update has *not* been sent to the list
as it is pretty big.  If you want the prebuild roms for testing
please fetch them from the git repository at:
  git://git.kraxel.org/qemu work/ipxe

cheers,
  Gerd

Gerd Hoffmann (6):
  ipxe: update submodule from 4e03af8ec to 041863191
  ipxe: add e1000e rom
  ipxe: add vmxnet3 rom
  e1000e: add boot rom
  vmxnet3: add boot rom
  ipxe: update prebuilt binaries

 hw/net/e1000e.c  |   1 +
 hw/net/vmxnet3.c |   1 +
 include/hw/i386/pc.h |   4 
 pc-bios/efi-e1000.rom| Bin 196608 -> 209408 bytes
 pc-bios/efi-e1000e.rom   | Bin 0 -> 209408 bytes
 pc-bios/efi-eepro100.rom | Bin 197120 -> 209920 bytes
 pc-bios/efi-ne2k_pci.rom | Bin 195584 -> 208384 bytes
 pc-bios/efi-pcnet.rom| Bin 195584 -> 208384 bytes
 pc-bios/efi-rtl8139.rom  | Bin 199168 -> 211456 bytes
 pc-bios/efi-virtio.rom   | Bin 193024 -> 211456 bytes
 pc-bios/efi-vmxnet3.rom  | Bin 0 -> 205312 bytes
 roms/Makefile|   8 ++--
 roms/ipxe|   2 +-
 13 files changed, 13 insertions(+), 3 deletions(-)
 create mode 100644 pc-bios/efi-e1000e.rom
 create mode 100644 pc-bios/efi-vmxnet3.rom

-- 
1.8.3.1

[Qemu-devel] [PATCH 2/6] ipxe: add e1000e rom

2016-06-24 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
---
 roms/Makefile | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/roms/Makefile b/roms/Makefile
index 7bd1252..e8133fe 100644
--- a/roms/Makefile
+++ b/roms/Makefile
@@ -1,11 +1,13 @@
 
 vgabios_variants := stdvga cirrus vmware qxl isavga virtio
 vgabios_targets  := $(subst -isavga,,$(patsubst 
%,vgabios-%.bin,$(vgabios_variants)))
-pxerom_variants  := e1000 eepro100 ne2k_pci pcnet rtl8139 virtio
-pxerom_targets   := 8086100e 80861209 10500940 10222000 10ec8139 1af41000
+pxerom_variants  := e1000 e1000e eepro100 ne2k_pci pcnet rtl8139 virtio
+pxerom_targets   := 8086100e 808610d3 80861209 10500940 10222000 10ec8139 
1af41000
 
 pxe-rom-e1000efi-rom-e1000: VID := 8086
 pxe-rom-e1000efi-rom-e1000: DID := 100e
+pxe-rom-e1000e   efi-rom-e1000e   : VID := 8086
+pxe-rom-e1000e   efi-rom-e1000e   : DID := 10d3
 pxe-rom-eepro100 efi-rom-eepro100 : VID := 8086
 pxe-rom-eepro100 efi-rom-eepro100 : DID := 1209
 pxe-rom-ne2k_pci efi-rom-ne2k_pci : VID := 1050
-- 
1.8.3.1

[Qemu-devel] [PATCH 1/6] ipxe: update submodule from 4e03af8ec to 041863191

2016-06-24 Thread Gerd Hoffmann

shortlog


Andrew Widdersheim (1):
  [netdevice] Add "ifname" setting

Carl Henrik Lunde (1):
  [vmxnet3] Avoid completely filling the TX descriptor ring

Christian Hesse (2):
  [golan] Fix build error on some versions of gcc
  [ath9k] Fix buffer overrun for ar9287

Christian Nilsson (2):
  [intel] Add PCI device ID for another I219-V
  [intel] Add PCI device ID for another I219-LM

Hummel Frank (1):
  [intel] Add INTEL_NO_PHY_RST for I218-LM

Kyösti Mälkki (1):
  [intel] Add PCI IDs for i210/i211 flashless operation

Ladi Prosek (6):
  [pci] Add pci_find_next_capability()
  [virtio] Add virtio 1.0 constants and data structures
  [virtio] Add virtio 1.0 PCI support
  [virtio] Add virtio-net 1.0 support
  [virtio] Renumber virtio_pci_region flags
  [virtio] Fix virtio-pci logging

Leendert van Doorn (2):
  [tg3] Fix address truncation bug on 64-bit machines
  [tg3] Add missing memory barrier

Michael Brown (287):
  [settings] Re-add "uristring" setting type
  [dhcp] Do not skip ProxyDHCPREQUEST if next-server is empty
  [efi] Add definitions of GUIDs observed when booting shim.efi and grub.efi
  [efi] Mark EFI debug transcription functions as __attribute__ (( pure ))
  [efi] Remove raw EFI_HANDLE values from debug messages
  [efi] Include installed protocol list in unknown handle names
  [efi] Improve efi_wrap debugging
  [pxe] Construct all fake DHCP packets before starting PXE NBP
  [efi] Add definitions of GUIDs observed when booting wdsmgfw.efi
  [efi] Fix debug directory size
  [efi] Populate debug directory entry FileOffset field
  [build] Search for ldlinux.c32 separately from isolinux.bin
  [tcpip] Allow supported address families to be detected at runtime
  [efi] Allow calls to efi_snp_claim() and efi_snp_release() to be nested
  [efi] Fix order of events on SNP removal path
  [efi] Do not return EFI_NOT_READY from our ReceiveFilters() method
  [pxe] Populate ciaddr in fake PXE Boot Server ACK packet
  [uri] Generalise tftp_uri() to pxe_uri()
  [efi] Implement the EFI_PXE_BASE_CODE_PROTOCOL
  [usb] Expose usb_find_driver()
  [usb] Add function to device's function list before attempting probe
  [efi] Add USB headers and GUID definitions
  [efi] Allow efidev_parent() to traverse multiple device generations
  [efi] Add a USB host controller driver based on EFI_USB_IO_PROTOCOL
  [tcpip] Avoid generating positive zero for transmitted UDP checksums
  [usb] Generalise zero-length packet generation logic
  [ehci] Do not treat zero-length NULL pointers as unreachable
  [ehci] Support arbitrarily large transfers
  [xhci] Support arbitrarily large transfers
  [efi] Provide efi_devpath_len()
  [efi] Include a copy of the device path within struct efi_device
  [usb] Select preferred USB device configuration based on driver score
  [usb] Allow for wildcard USB class IDs
  [efi] Expose unused USB devices via EFI_USB_IO_PROTOCOL
  [ncm] Support setting MAC address
  [build] Remove dependency on libiberty
  [efi] Minimise use of iPXE header files when building host utilities
  [pxe] Invoke INT 1a,564e when PXE stack is activated
  [pxe] Notify BIOS via INT 1a,564e for each new network device
  [efi] Work around broken 32-bit PE executable parsing in ImageHlp.dll
  [efi] Avoid infinite loops when asked to stop non-existent devices
  [efi] Expose an UNDI interface alongside the existing SNP interface
  [malloc] Avoid integer overflow for excessively large memory allocations
  [peerdist] Avoid NULL pointer dereference for plaintext blocks
  [http] Verify server port when reusing a pooled connection
  [efi] Reset root directory when installing EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
  [efi] Update to current EDK2 headers
  [efi] Import EFI_HII_FONT_PROTOCOL definitions
  [fbcon] Allow character height to be selected at runtime
  [fbcon] Move margin calculations to fbcon.c
  [console] Tidy up config/console.h
  [build] Generalise CONSOLE_VESAFB to CONSOLE_FRAMEBUFFER
  [efi] Add support for EFI_GRAPHICS_OUTPUT_PROTOCOL frame buffer consoles
  [dhcp] Reset start time when deferring discovery
  [dhcp] Limit maximum number of DHCP discovery deferrals
  [comboot] Reset console before starting COMBOOT executable
  [intel] Forcibly skip PHY reset on some models
  [intel] Correct definition of receive overrun bit
  [infiniband] Add definitions for FDR and EDR link speeds
  [infiniband] Add qword accessors for ib_guid and ib_gid
  [pci] Add definitions for PCI Express function level reset (FLR)
  [bitops] Fix definitions for big-endian devices
  [smsc95xx] Add driver for SMSC/Microchip LAN95xx USB Ethernet NICs
  [bitops] Provide BIT_QWORD_PTR()
  [efi] Add %.usb target for building EFI-bootable USB (or other

[Qemu-devel] [PATCH 5/6] vmxnet3: add boot rom

2016-06-24 Thread Gerd Hoffmann

Disable for old machine types as this is a guest visible change.

Signed-off-by: Gerd Hoffmann 
---
 hw/net/vmxnet3.c | 1 +
 include/hw/i386/pc.h | 4 
 2 files changed, 5 insertions(+)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index d978976..25cee9f 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2700,6 +2700,7 @@ static void vmxnet3_class_init(ObjectClass *class, void 
*data)
 c->vendor_id = PCI_VENDOR_ID_VMWARE;
 c->device_id = PCI_DEVICE_ID_VMWARE_VMXNET3;
 c->revision = PCI_DEVICE_ID_VMWARE_VMXNET3_REVISION;
+c->romfile = "efi-vmxnet3.rom";
 c->class_id = PCI_CLASS_NETWORK_ETHERNET;
 c->subsystem_vendor_id = PCI_VENDOR_ID_VMWARE;
 c->subsystem_id = PCI_DEVICE_ID_VMWARE_VMXNET3;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 49566c8..a112efb 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -362,6 +362,10 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
 .driver   = TYPE_X86_CPU,\
 .property = "cpuid-0xb",\
 .value= "off",\
+},{\
+.driver   = "vmxnet3",\
+.property = "romfile",\
+.value= "",\
 },
 
 #define PC_COMPAT_2_5 \
-- 
1.8.3.1

[Qemu-devel] [PATCH 3/6] ipxe: add vmxnet3 rom

2016-06-24 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
---
 roms/Makefile | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/roms/Makefile b/roms/Makefile
index e8133fe..88b3709 100644
--- a/roms/Makefile
+++ b/roms/Makefile
@@ -1,8 +1,8 @@
 
 vgabios_variants := stdvga cirrus vmware qxl isavga virtio
 vgabios_targets  := $(subst -isavga,,$(patsubst 
%,vgabios-%.bin,$(vgabios_variants)))
-pxerom_variants  := e1000 e1000e eepro100 ne2k_pci pcnet rtl8139 virtio
-pxerom_targets   := 8086100e 808610d3 80861209 10500940 10222000 10ec8139 
1af41000
+pxerom_variants  := e1000 e1000e eepro100 ne2k_pci pcnet rtl8139 virtio vmxnet3
+pxerom_targets   := 8086100e 808610d3 80861209 10500940 10222000 10ec8139 
1af41000 15ad07b0
 
 pxe-rom-e1000efi-rom-e1000: VID := 8086
 pxe-rom-e1000efi-rom-e1000: DID := 100e
@@ -18,6 +18,8 @@ pxe-rom-rtl8139  efi-rom-rtl8139  : VID := 10ec
 pxe-rom-rtl8139  efi-rom-rtl8139  : DID := 8139
 pxe-rom-virtio   efi-rom-virtio   : VID := 1af4
 pxe-rom-virtio   efi-rom-virtio   : DID := 1000
+pxe-rom-vmxnet3  efi-rom-vmxnet3  : VID := 15ad
+pxe-rom-vmxnet3  efi-rom-vmxnet3  : DID := 07b0
 
 #
 # cross compiler auto detection
-- 
1.8.3.1

Re: [Qemu-devel] [virtio-dev][RFC 2/2] virtio-sdm: new device specification

2016-06-24 Thread Christian Pinto

Hello Stefan,

On Thu, Jun 23, 2016 at 9:39 PM, Stefan Hajnoczi 
wrote:

> On Fri, Jun 17, 2016 at 06:03:14PM +0200, Christian Pinto wrote:
> > This patch adds the specification of the Signal Dristribution Module
> virtio
> > device to the current virtio specification document.
> >
> > Signed-off-by: Christian Pinto 
> > ---
> >  virtio-sdm.tex | 126
> +
> >  1 file changed, 126 insertions(+)
> >  create mode 100644 virtio-sdm.tex
> >
> > diff --git a/virtio-sdm.tex b/virtio-sdm.tex
> > new file mode 100644
> > index 000..abbdb80
> > --- /dev/null
> > +++ b/virtio-sdm.tex
> > @@ -0,0 +1,126 @@
> > +\section{Signal Distribution Module}\label{sec:Device Types / SDM
> Device}
> > +
> > +The virtio Signal Distribution Module is meant to enable Inter Processor
> > +interrupts in QEMU/KVM environment. The SDM enables different
> processors in the
> > +same or different QEMU instances to send mutual signals. An example are
> Inter
> > +Processor Interrupts used in AMP systems, for the processors involved
> to notify
> > +the presence of new data in the communication queues.
>
> Please rephrase without mentioning QEMU/KVM.  Focus on the device
> functionality and purpose.  VIRTIO's scope is more general than
> QEMU/KVM.
>

Right, thanks for pointing this out. I will remove any dependence from
QEMU/KVM from the device specs.


>
> I'm not familiar with AMP systems.  Can you explain how SDM relates to
> remoteproc and rpmsg?  There seems to be overlap because remoteproc can
> also boot a remote processor.
>

I will clarify this aspect in the cover letter of the next release.

In any case, to give you some preliminary information, the SDM is a device
to send generic signals among processors. Such signals can be customized
according to their purpose, as an example a payload can be added to a
signal in order to model a mailbox-like behavior.
 As an example it can be used by the remoteproc driver to send the BOOT
signal to the processor that needs to be booted. Similarly it is used in
rpmsg to send a signal to the destination every time a new message is
available. RPMSG uses virtio queues to exchange messages, and the SDM is
used when kicking a virtqueue to raise the associated interrupt in the
destination processor.



> > +
> > +\subsection{Device ID}\label{sec:Device Types / SDM Device / Device ID}
> > +
> > +19
> > +
> > +\subsection{Virtqueues}\label{sec:Device Types / SDM Device /
> Virtqueues}
> > +
> > +\begin{description}
> > +\item[0] hg_vq
> > +\item[1] gh_vq
> > +\end{description}
> > +
> > +Queue 0 is used for host-guest communication (i.e., notification of a
> signal),
> > +while queue 1 for guest-host communication.
>
> I think the VIRTIO terminology is "device" (host) and "driver" (guest).
> So it would be:
>
> Queue 0 is used for device-to-driver communication (i.e. notification of
> a signal), while queue 1 is for driver-to-device communication.
>

Thanks, I will update the text according to this comment.


>
> > +
> > +\subsection{Feature bits}\label{sec:Device Types / SDM Device / Feature
> bits}
> > +
> > +None.
> > +
> > +\subsection{Device configuration layout}\label{sec:Device Types / SDM
> Device /
> > +Device configuration layout}
> > +
> > +The device configuration is composed by three fields: \texttt{master},
> > +\texttt{max_slaves} and \texttt{current_slaves}.
> > +
> > +\begin{lstlisting}
> > +struct virtio_sdm_config {
> > + u8master;
> > + u16   max_slaves;
> > + u16   current_slaves;
> > +};
> > +\end{lstlisting}
> > +
> > +The SDM device can be instantiated either as a master or as a slave.
> The master
> > +slave behavior is meant on purpose to reflect the AMP like type of
> communication
> > +where a master processor controls one or more slave processors.
> > +A master SDM instance can send a signal to any of the slaves instances,
> > +while slaves can only signal the master.
> > +
> > +The \texttt{master} field of the configuration space is meant to be
> read by the
> > +Linux driver to figure out whether a specific instance is a master or a
> slave.
>
> The scope of VIRTIO is wider than Linux, just saying "the driver" is
> preferred.
>

ACK


>
> > +The \texttt{max_slaves} field contains the maximum number of slaves
> supported by
> > +the SDM device.
> > +The value of \texttt{max_slaves} is initially set from the command line
> of QEMU,
>
> The QEMU command-line is out of scope for the VIRTIO specification.
> Please drop this part of the sentence.  I suggest saying that a
> configuration change notification is sent when the value of max_slaves
> changes.
>

I agree, this will make the description agnostic w.r.t. QEMU.


>
> > +but can be changed during operations through the configuration space.
> > +Finally the \texttt{current_slaves} field contains the actual number of
> slaves
> > +registered to the master SDM. This field is a read only field.
> > +
> > +\subsection{Device Initialization}\label{sec:Devi

Re: [Qemu-devel] [PATCH RFC 4/5] include: Move typedef qemu_irq to qemu/typedefs.h

2016-06-24 Thread Peter Maydell

On 24 June 2016 at 13:32, Markus Armbruster  wrote:
> Peter Maydell  writes:
>> It would still be different by being a pointer-to-Foo, not a Foo.
>
> Hiding pointer-ness behind a typedef is a bad idea more often than not.
>
> What do you want me to do, if anything?

I think my underlying issue is that the purpose of typedefs.h
(as I see it) is to define some typedefs for handing around
pointers to opaque objects, but we don't pass around pointers
to qemu_irqs, we pass around actual qemu_irqs. So I don't
really feel like it belongs in this header.

thanks
-- PMM

Re: [Qemu-devel] [virtio-dev][RFC 1/2] content: reserve virtio device ID

2016-06-24 Thread Christian Pinto

Hello Stefan,

On Thu, Jun 23, 2016 at 8:57 PM, Stefan Hajnoczi 
wrote:

> On Fri, Jun 17, 2016 at 06:03:13PM +0200, Christian Pinto wrote:
> > Signed-off-by: Christian Pinto 
> > ---
> >  content.tex | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/content.tex b/content.tex
> > index d989d98..0d98926 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -2990,6 +2990,8 @@ Device ID  &  Virtio Device\\
> >  \hline
> >  18 &   Input device \\
> >  \hline
> > +19 &   Signal Distribution Module \\
> > +\hline
>
> Please use a higher number, 19 is already proposed for virtio-vsock:
> https://lists.oasis-open.org/archives/virtio-dev/201603/msg00042.html


Thanks, I will use 20 then.


>
>
> > @@ -5641,6 +5643,8 @@ descriptor for the \field{sense_len},
> \field{residual},
> >  \field{status_qualifier}, \field{status}, \field{response} and
> >  \field{sense} fields.
> >
> > +\input{virtio-sdm.tex}
>
> Please move this to the patch that adds virtio-sdm.tex.  Each patch must
> be self-contained and build successfully.
>

Right, I will add this change to the patch adding the device specification
text.


Best,

Christian

Re: [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value

2016-06-24 Thread Marc-André Lureau

Hi

On Thu, Jun 23, 2016 at 7:08 PM, Michael S. Tsirkin  wrote:
>>
>> Callers do not always ignore it (and in general it should not, should it?), 
>> this helps breaking the execution at right moment, help debugging, code 
>> consistency, good practices etc (perhaps it's too obvious to me and I am 
>> missing something?)
>
> Reporting error up the stack is helpful if it's handled in some way.
> If we just keep guest going on this error, then we could
> maybe log it for debug build but that's all.

Reporting up to guest somehow would be a good thing at some point, so
I think we should start from the bottom. vhost-user lacks error
handling, let's add it.

Regarding debug build messages, I don't think it's enough. As long as
we don't have an official supported way to handle disconnect. It's
better to report an error than be silent.

-- 
Marc-André Lureau

Re: [Qemu-devel] [PULL 00/34] pc, pci, virtio: new features, cleanups, fixes

2016-06-24 Thread Peter Maydell

On 24 June 2016 at 06:53, Michael S. Tsirkin  wrote:
> The following changes since commit c7288767523f6510cf557707d3eb5e78e519b90d:
>
>   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160623' 
> into staging (2016-06-23 11:53:14 +0100)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
>
> for you to fetch changes up to 21a4d96243e60a4c8eeb124a023b8a3bd9120e18:
>
>   virtio-bus: remove old set_host_notifier callback (2016-06-24 08:47:35 
> +0300)
>
> 
> pc, pci, virtio: new features, cleanups, fixes
>
> nvdimm label support
> cpu acpi hotplug rework
> virtio rework
> misc cleanups and fixes
>
> Signed-off-by: Michael S. Tsirkin 
>
> 

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error

2016-06-24 Thread Marc-André Lureau

Hi

On Thu, Jun 23, 2016 at 7:03 PM, Michael S. Tsirkin  wrote:
>> > why bother?  So callers can just ignore them in turn?
>>
>> The callers do not always ignore errors, fortunately (see
>> vhost_user_init() for ex).
>
> But that doesn't call set log base, right?

No, what is the point you are making?


-- 
Marc-André Lureau

Re: [Qemu-devel] [virtio-dev][RFC 1/2] content: reserve virtio device ID

2016-06-24 Thread Cornelia Huck

On Fri, 24 Jun 2016 14:45:29 +0200
Christian Pinto  wrote:

> On Thu, Jun 23, 2016 at 8:57 PM, Stefan Hajnoczi 
> wrote:
> 
> > On Fri, Jun 17, 2016 at 06:03:13PM +0200, Christian Pinto wrote:

> > > @@ -2990,6 +2990,8 @@ Device ID  &  Virtio Device\\
> > >  \hline
> > >  18 &   Input device \\
> > >  \hline
> > > +19 &   Signal Distribution Module \\
> > > +\hline
> >
> > Please use a higher number, 19 is already proposed for virtio-vsock:
> > https://lists.oasis-open.org/archives/virtio-dev/201603/msg00042.html
> 
> 
> Thanks, I will use 20 then.

I think virtio-crypto already wants to use 20. We sadly have a bit of a
backlog...

Re: [Qemu-devel] [PATCH v3 00/24] target-sparc improvements

2016-06-24 Thread Artyom Tarasenko

On Thu, Jun 2, 2016 at 2:17 PM, Artyom Tarasenko  wrote:
> On Thu, Jun 2, 2016 at 7:56 AM, Richard Henderson  wrote:
>> The primary focus of this patch set is to reduce the number of
>> helpers that modify TCG globals, and thus increase the lifetime
>> of those globals within each TB, and thus decrease the number
>> of times that tcg must spill and fill them from backing store.
>>
>> As a byproduct, I also implement the bulk of the interesting v9 ASIs
>> inline, thus exposing e.g. the little-endian loads and stores as
>> simple tcg operations.
>>
>> The patch set is relative to my outstanding tcg pull request.
>> For reference, the complete tree can be found at
>>
>>   git://github.com/rth7680/qemu.git tgt-sparc-2
>>
>> Changes from v2 to v3:
>>   * Add ASI_BLK_COMMIT_[PS] to patch 19.
>> This fixes the illegal instruction that Artyom reported.
>>   * Add gen_address_mask calls to all direct accesses.
>> This fixes a follow-on segv that affected the debian install.
>>
>> Changes from v1 to v2:
>>   * Commit message refers to UA2005 instead of UA2011 when
>> introducing new asi.h defines. (Artyom)

Once this change gets back to v4

Reviewed-By: Artyom Tarasenko 

(Currently rebasing my sun4v branch on top of this series)

>> Richard Henderson (24):
>>   target-sparc: Mark more flags for helpers
>>   target-sparc: Remove softint as a TCG global
>>   target-sparc: Store mmu index in TB flags
>>   target-sparc: Create gen_exception
>>   target-sparc: Unify asi handling between 32 and 64-bit
>>   target-sparc: Store %asi in TB flags
>>   target-sparc: Introduce get_asi
>>   target-sparc: Pass TCGMemOp to gen_ld/st_asi
>>   target-sparc: Import linux/arch/sparc/include/uapi/asm/asi.h
>>   target-sparc: Add UA2011 defines to asi.h
>>   target-sparc: Use defines from asi.h
>>   target-sparc: Directly implement easy ld/st asis
>>   target-sparc: Use QT0 to return results from ldda
>>   target-sparc: Introduce gen_check_align
>>   target-sparc: Directly implement easy ldd/std asis
>>   target-sparc: Fix obvious error in ASI_M_BFILL
>>   target-sparc: Pass TCGMemOp constants to helper_ld/st_asi
>>   target-sparc: Directly implement easy ldf/stf asis
>>   target-sparc: Directly implement block and short ldf/stf asis
>>   target-sparc: Remove helper_ldf_asi, helper_stf_asi
>>   target-sparc: Use explicit writes to cpu_fsr
>>   target-sparc: Use cpu_fsr in stfsr
>>   target-sparc: Use cpu_loop_exit_restore from
>> helper_check_ieee_exceptions
>>   target-sparc: Elide duplicate updates to fprs
>>
>>  target-sparc/asi.h |  311 +++
>>  target-sparc/cpu.h |   28 +-
>>  target-sparc/fop_helper.c  |  230 +++-
>>  target-sparc/helper.h  |  168 +++---
>>  target-sparc/ldst_helper.c |  696 +++-
>>  target-sparc/translate.c   | 1273 
>> 
>>  6 files changed, 1607 insertions(+), 1099 deletions(-)
>>  create mode 100644 target-sparc/asi.h
>>
>> --
>> 2.5.5
>>
>
>
>
> --
> Regards,
> Artyom Tarasenko
>
> SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu



-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

Re: [Qemu-devel] [PATCH v2 1/1] cpu: report hyperv feature words through qom

2016-06-24 Thread Eduardo Habkost

On Fri, Jun 24, 2016 at 01:10:45PM +0300, Evgeny Yakovlev wrote:
[...]
> Thanks for all the comments!
> Removing hyperv_* booleans sounds like the proper way forward however it
> will break compatibility with how libvirt enables hyperv enlightenments.

It won't break any compatibility, if the property name in the
feature_names arrays is the same as the previous boolean property
name. The only difference is internal to QEMU: now the property
will set a bit inside env->features instead of a boolean field in
X86CPU.

> I think we will now concentrate on the QOM feature-words and later get back
> to reworking hv_* properties. You suggestions will be very helpful for that.

OK!

-- 
Eduardo

Re: [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes

2016-06-24 Thread Marc-André Lureau

Hi

On Thu, Jun 23, 2016 at 6:32 AM, Michael S. Tsirkin  wrote:
>
> Can this be rarranged, first patches fixing the code paths that are wrong,
> then patches handling the asserts etc?

Ok, I'll try to improve ordering (minor fix, close(fd) fix,
assert/fprintf->error_report, check and return error,
vhost_dev_cleanup() fixes, keep vhost_net after a disconnect, wait
until connection is ready)


-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes

2016-06-24 Thread Marc-André Lureau

Hi

On Thu, Jun 23, 2016 at 6:31 AM, Michael S. Tsirkin  wrote:
>
> OK so if it's ok for read or write to fail, then I think
> the callback should just return success.
> This will make sure backends such as vhost net in kernel
> where failure indicates an internal bug are handled
> correctly, while vhost user ignores errors and attempts
> to go on.
>

That's what we are trying to do, but since it's not yet well defined
how to disconnect/reconnect and there are many bugs around this, it
would help a lot to add error checking and error reporting for now.

>> Since there is feature checks on reconnection, qemu should wait for
>> the initial connection feature negotiation to complete. The test added
>> demonstrates this.
>
>
> So on disconnect, we keep the guest going? But then when connect is
> attempted, then we for some reason block guest until negotiation
> is done? Sounds rather strange to me.

When qemu is the server, it already waits for an initial connection
from vhost-user backend. Here it is just enhanced to wait for the
initial nego to complete, which is necessary for proper device
initialization and also further reconnections checks.

-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail

2016-06-24 Thread Marc-André Lureau

On Thu, Jun 23, 2016 at 7:13 PM, Michael S. Tsirkin  wrote:
>> > If it's ok and we can recover, then why should we print errors?
>>
>> To me, the current disconnect handling is not handled cleanly. There
>> is not much/nothing in the protocol that tells when and how you can
>> disconnect. If qemu vhost-user reconnection works today, it is mostly
>> by luck. Imho, we should print errors if any call to vhost user fails
>> to help further analysis of broken behaviour.
>
> But we decided disconnect is OK, did we not? So now a failure like this
> is just expected. It's not broken behaviour anymore ...

As you know, there are many ways qemu or the vm can break when the
backend is disconnected.  For now, it would help a lot if we had a bit
of error messages in this case. But do we really want to support
spurious disconnect/reconnect at any time? It's going to be
challenging, to maintain, to test... Is it really worthwhile? I doubt
it. I think it would be easier and more future-proof to have a
dedicated vhost-user request for that and only do a best effort for
the ungraceful disconnect.

>> And we shoud have a
>> clean mechanism to handle shutdown/disconnect/reconnect.
>
>
> At some level this is something that's missing here.
>
> How about we patch docs/specs/vhost-user.txt
> and describe how is reconnection supposed to work?
>
> All we have is:
> If Master is unable to send the full message or receives a wrong reply
> it will close the connection. An optional reconnection mechanism can 
> be
> implemented.

This text is there from the original commit but it doesn't say how to
reconnect and or how to recover from the ring. I don't have enough
experience with virtio in general to say when it is acceptable for
backend to be gone (not processing the ring), I can imagine the
implementation vary widely, and the requirements depend on the kind of
device.

If we limit the spec to vhost-user protocol only and leave it open on
how each virtio device should support ungraceful disconnect/reconnect,
then it sounds reasonable to document that after such disconnect, on
reconnect:
- the server assumes the backend has no knowledge of previous connection
- the state can be changed between reconnections (new ring state, mem table etc)
- the server will check feature compatibility with previous backend
and reject incompatible backends
- backend must restore ring processing from used->idx and ignore
VHOST_USER_SET_VRING_BASE (or should we change and document that in
this case the ring base is updated from used->idx?)

(it's probably not a complete list, I am not good at imagining all
possible cases)

-- 
Marc-André Lureau

[Qemu-devel] [PATCH 16/17] s390x/pci: replace fid with idx in msg data of msix

2016-06-24 Thread Cornelia Huck

From: Yi Min Zhao 

Present code uses fid as the part of message data of msix for looking
up the specific zpci device. However it limits the usable range of fid,
and the code looking up the zpci device may fail due to truncation of
the fid.

In addition, fh is composed of enabled bit, FH_VIRT and the array index.
So we can use the array index as the identifier to store in msg data.

Signed-off-by: Yi Min Zhao 
Reviewed-by: Pierre Morel 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-pci-bus.c  | 8 
 hw/s390x/s390-pci-inst.c | 3 ++-
 target-s390x/kvm.c   | 4 ++--
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index f930b04..630a826 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -474,18 +474,18 @@ static void s390_msi_ctrl_write(void *opaque, hwaddr 
addr, uint64_t data,
 {
 S390PCIBusDevice *pbdev;
 uint32_t io_int_word;
-uint32_t fid = data >> ZPCI_MSI_VEC_BITS;
+uint32_t idx = data >> ZPCI_MSI_VEC_BITS;
 uint32_t vec = data & ZPCI_MSI_VEC_MASK;
 uint64_t ind_bit;
 uint32_t sum_bit;
 uint32_t e = 0;
 
-DPRINTF("write_msix data 0x%" PRIx64 " fid %d vec 0x%x\n", data, fid, vec);
+DPRINTF("write_msix data 0x%" PRIx64 " idx %d vec 0x%x\n", data, idx, vec);
 
-pbdev = s390_pci_find_dev_by_fid(fid);
+pbdev = s390_pci_find_dev_by_idx(idx);
 if (!pbdev) {
 e |= (vec << ERR_EVENT_MVN_OFFSET);
-s390_pci_generate_error_event(ERR_EVENT_NOMSI, 0, fid, addr, e);
+s390_pci_generate_error_event(ERR_EVENT_NOMSI, idx, 0, addr, e);
 return;
 }
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 37572df..331bc4c 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -419,7 +419,8 @@ static void update_msix_table_msg_data(S390PCIBusDevice 
*pbdev, uint64_t offset,
 
 msg_data = (uint8_t *)data - offset % PCI_MSIX_ENTRY_SIZE +
PCI_MSIX_ENTRY_VECTOR_CTRL;
-val = pci_get_long(msg_data) | (pbdev->fid << ZPCI_MSI_VEC_BITS);
+val = pci_get_long(msg_data) |
+((pbdev->fh & FH_MASK_INDEX) << ZPCI_MSI_VEC_BITS);
 pci_set_long(msg_data, val);
 DPRINTF("update msix msg_data to 0x%" PRIx64 "\n", *data);
 }
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 45e94ca..2991bff 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -2246,10 +2246,10 @@ int kvm_arch_fixup_msi_route(struct 
kvm_irq_routing_entry *route,
  uint64_t address, uint32_t data, PCIDevice *dev)
 {
 S390PCIBusDevice *pbdev;
-uint32_t fid = data >> ZPCI_MSI_VEC_BITS;
+uint32_t idx = data >> ZPCI_MSI_VEC_BITS;
 uint32_t vec = data & ZPCI_MSI_VEC_MASK;
 
-pbdev = s390_pci_find_dev_by_fid(fid);
+pbdev = s390_pci_find_dev_by_idx(idx);
 if (!pbdev) {
 DPRINTF("add_msi_route no dev\n");
 return -ENODEV;
-- 
2.9.0

[Qemu-devel] [PATCH 15/17] s390x/pci: fix stpcifc_service_call

2016-06-24 Thread Cornelia Huck

From: Yi Min Zhao 

Firstly the function misses dmaas checking. This patch adds it.

Secondly the function uses s390_pci_find_dev_by_fh() to look up the
zpci device. This may fail if the guest provides a valid and disabled
fh but fh of the associated zpci device is enabled. Thus we use
s390_pci_find_dev_by_idx() instead.

Signed-off-by: Yi Min Zhao 
Reviewed-by: Pierre Morel 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-pci-inst.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 70db835..37572df 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -944,6 +944,7 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t 
fiba, uint8_t ar)
 int stpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar)
 {
 CPUS390XState *env = &cpu->env;
+uint8_t dmaas;
 uint32_t fh;
 ZpciFib fib;
 S390PCIBusDevice *pbdev;
@@ -956,13 +957,20 @@ int stpcifc_service_call(S390CPU *cpu, uint8_t r1, 
uint64_t fiba, uint8_t ar)
 }
 
 fh = env->regs[r1] >> 32;
+dmaas = (env->regs[r1] >> 16) & 0xff;
+
+if (dmaas) {
+setcc(cpu, ZPCI_PCI_LS_ERR);
+s390_set_status_code(env, r1, ZPCI_STPCIFC_ST_INVAL_DMAAS);
+return 0;
+}
 
 if (fiba & 0x7) {
 program_interrupt(env, PGM_SPECIFICATION, 6);
 return 0;
 }
 
-pbdev = s390_pci_find_dev_by_fh(fh);
+pbdev = s390_pci_find_dev_by_idx(fh & FH_MASK_INDEX);
 if (!pbdev) {
 setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
 return 0;
-- 
2.9.0

[Qemu-devel] [PATCH 05/17] s390x/pci: refactor s390_pci_find_dev_by_fh

2016-06-24 Thread Cornelia Huck

From: Yi Min Zhao 

Because this function is called very frequently, we should use a more
effective way to find the zpci device. So we use the FH's index to get
the device directly.

Signed-off-by: Yi Min Zhao 
Reviewed-by: Pierre Morel 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-pci-bus.c | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 7111587..1b83772 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -198,19 +198,12 @@ S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx)
 
 S390PCIBusDevice *s390_pci_find_dev_by_fh(uint32_t fh)
 {
-S390PCIBusDevice *pbdev;
-int i;
 S390pciState *s = s390_get_phb();
+S390PCIBusDevice *pbdev;
 
-if (!fh) {
-return NULL;
-}
-
-for (i = 0; i < PCI_SLOT_MAX; i++) {
-pbdev = &s->pbdev[i];
-if (pbdev->fh == fh) {
-return pbdev;
-}
+pbdev = &s->pbdev[fh & FH_MASK_INDEX];
+if (pbdev->fh != 0 && pbdev->fh == fh) {
+return pbdev;
 }
 
 return NULL;
-- 
2.9.0

[Qemu-devel] [PATCH 08/17] s390x/pci: introduce S390PCIIOMMU

2016-06-24 Thread Cornelia Huck

From: Yi Min Zhao 

Currently each zpci device holds its own DMA address space and memory
region. At the same time, all instances of zpci device are stored in
S390pciState. So duirng the initialization of S390pciState, all zpci
devices are created and then all DMA address spaces are created. Thus,
when initializing pci devices, their corresponding DMA address spaces
could be found.

But zpci qdev will be introduced later. Zpci device may be initialized
and plugged afterwards generic pci device. So we should initialize all
DMA address spaces and memory regions before initializing zpci devices.

We introduce a new struct named S390PCIIOMMU. And a new field of
S390pciState, which is an array to store all instances of S390PCIIOMMU,
is added so that qemu pci code could find the corresponding DMA
address space when initializing a generic pci device. And this should
be done before the connection of a zpci device and a generic pci
device is built.

Signed-off-by: Yi Min Zhao 
Acked-by: Pierre Morel 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-pci-bus.c | 19 +++
 hw/s390x/s390-pci-bus.h |  9 +++--
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 0c67c1e..af3263f 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -406,7 +406,7 @@ static AddressSpace *s390_pci_dma_iommu(PCIBus *bus, void 
*opaque, int devfn)
 {
 S390pciState *s = opaque;
 
-return &s->pbdev[PCI_SLOT(devfn)].as;
+return &s->iommu[PCI_SLOT(devfn)]->as;
 }
 
 static uint8_t set_ind_atomic(uint64_t ind_loc, uint8_t to_be_set)
@@ -478,15 +478,15 @@ static const MemoryRegionOps s390_msi_ctrl_ops = {
 
 void s390_pci_iommu_enable(S390PCIBusDevice *pbdev)
 {
-memory_region_init_iommu(&pbdev->iommu_mr, OBJECT(&pbdev->mr),
+memory_region_init_iommu(&pbdev->iommu_mr, OBJECT(&pbdev->iommu->mr),
  &s390_iommu_ops, "iommu-s390", pbdev->pal + 1);
-memory_region_add_subregion(&pbdev->mr, 0, &pbdev->iommu_mr);
+memory_region_add_subregion(&pbdev->iommu->mr, 0, &pbdev->iommu_mr);
 pbdev->iommu_enabled = true;
 }
 
 void s390_pci_iommu_disable(S390PCIBusDevice *pbdev)
 {
-memory_region_del_subregion(&pbdev->mr, &pbdev->iommu_mr);
+memory_region_del_subregion(&pbdev->iommu->mr, &pbdev->iommu_mr);
 object_unparent(OBJECT(&pbdev->iommu_mr));
 pbdev->iommu_enabled = false;
 }
@@ -494,13 +494,15 @@ void s390_pci_iommu_disable(S390PCIBusDevice *pbdev)
 static void s390_pcihost_init_as(S390pciState *s)
 {
 int i;
-S390PCIBusDevice *pbdev;
+S390PCIIOMMU *iommu;
 
 for (i = 0; i < PCI_SLOT_MAX; i++) {
-pbdev = &s->pbdev[i];
-memory_region_init(&pbdev->mr, OBJECT(s),
+iommu = g_malloc0(sizeof(S390PCIIOMMU));
+memory_region_init(&iommu->mr, OBJECT(s),
"iommu-root-s390", UINT64_MAX);
-address_space_init(&pbdev->as, &pbdev->mr, "iommu-pci");
+address_space_init(&iommu->as, &iommu->mr, "iommu-pci");
+
+s->iommu[i] = iommu;
 }
 
 memory_region_init_io(&s->msix_notify_mr, OBJECT(s),
@@ -576,6 +578,7 @@ static void s390_pcihost_hot_plug(HotplugHandler 
*hotplug_dev,
 pbdev->pdev = pci_dev;
 pbdev->state = ZPCI_FS_DISABLED;
 pbdev->fh = s390_pci_get_pfh(pci_dev);
+pbdev->iommu = s->iommu[PCI_SLOT(pci_dev->devfn)];
 
 s390_pcihost_setup_msix(pbdev);
 
diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
index c4d4079..ea1efcc 100644
--- a/hw/s390x/s390-pci-bus.h
+++ b/hw/s390x/s390-pci-bus.h
@@ -248,6 +248,11 @@ typedef struct S390MsixInfo {
 uint32_t pba_offset;
 } S390MsixInfo;
 
+typedef struct S390PCIIOMMU {
+AddressSpace as;
+MemoryRegion mr;
+} S390PCIIOMMU;
+
 typedef struct S390PCIBusDevice {
 PCIDevice *pdev;
 ZpciState state;
@@ -263,8 +268,7 @@ typedef struct S390PCIBusDevice {
 uint8_t sum;
 S390MsixInfo msix;
 AdapterRoutes routes;
-AddressSpace as;
-MemoryRegion mr;
+S390PCIIOMMU *iommu;
 MemoryRegion iommu_mr;
 IndAddr *summary_ind;
 IndAddr *indicator;
@@ -278,6 +282,7 @@ typedef struct S390pciState {
 PCIHostState parent_obj;
 S390PCIBus *bus;
 S390PCIBusDevice pbdev[PCI_SLOT_MAX];
+S390PCIIOMMU *iommu[PCI_SLOT_MAX];
 AddressSpace msix_notify_as;
 MemoryRegion msix_notify_mr;
 QTAILQ_HEAD(, SeiContainer) pending_sei;
-- 
2.9.0

[Qemu-devel] [PATCH 17/17] s390x/pci: make hot-unplug handler smoother

2016-06-24 Thread Cornelia Huck

From: Yi Min Zhao 

The current implementation of hot-unplug handler is abrupt. Any pci
operation will be just rejected if pci device is unconfigured. Thus a
pci device can not be reset or destroyed in a right, smooth and safe
way.

Improve this as follows:
- Notify the guest via a HP_EVENT_DECONFIGURE_REQUEST(0x303) event in
  the unplug handler, giving it a chance to deconfigure the device via
  sclp and allowing us to continue hot-unplug afterwards.
- Set up a timer that will generate the HP_EVENT_CONFIGURE_TO_STBRES
  (0x304) event as before if the guest did not react after an adequate
  time.

Signed-off-by: Yi Min Zhao 
Reviewed-by: Pierre Morel 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-pci-bus.c | 35 ++-
 hw/s390x/s390-pci-bus.h |  3 +++
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 630a826..640a4ea 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -192,6 +192,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
 }
 pbdev->state = ZPCI_FS_STANDBY;
 rc = SCLP_RC_NORMAL_COMPLETION;
+
+if (pbdev->release_timer) {
+qdev_unplug(DEVICE(pbdev->pdev), NULL);
+}
 }
 out:
 psccb->header.response_code = cpu_to_be16(rc);
@@ -679,6 +683,23 @@ static void s390_pcihost_hot_plug(HotplugHandler 
*hotplug_dev,
 }
 }
 
+static void s390_pcihost_timer_cb(void *opaque)
+{
+S390PCIBusDevice *pbdev = opaque;
+
+if (pbdev->summary_ind) {
+pci_dereg_irqs(pbdev);
+}
+if (pbdev->iommu_enabled) {
+pci_dereg_ioat(pbdev);
+}
+
+pbdev->state = ZPCI_FS_STANDBY;
+s390_pci_generate_plug_event(HP_EVENT_CONFIGURED_TO_STBRES,
+ pbdev->fh, pbdev->fid);
+qdev_unplug(DEVICE(pbdev), NULL);
+}
+
 static void s390_pcihost_hot_unplug(HotplugHandler *hotplug_dev,
 DeviceState *dev, Error **errp)
 {
@@ -712,8 +733,20 @@ static void s390_pcihost_hot_unplug(HotplugHandler 
*hotplug_dev,
 case ZPCI_FS_STANDBY:
 break;
 default:
-s390_pci_generate_plug_event(HP_EVENT_CONFIGURED_TO_STBRES,
+s390_pci_generate_plug_event(HP_EVENT_DECONFIGURE_REQUEST,
  pbdev->fh, pbdev->fid);
+pbdev->release_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+s390_pcihost_timer_cb,
+pbdev);
+timer_mod(pbdev->release_timer,
+  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + HOT_UNPLUG_TIMEOUT);
+return;
+}
+
+if (pbdev->release_timer && timer_pending(pbdev->release_timer)) {
+timer_del(pbdev->release_timer);
+timer_free(pbdev->release_timer);
+pbdev->release_timer = NULL;
 }
 
 s390_pci_generate_plug_event(HP_EVENT_STANDBY_TO_RESERVED,
diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
index 71cdd7a..f1fbd3c 100644
--- a/hw/s390x/s390-pci-bus.h
+++ b/hw/s390x/s390-pci-bus.h
@@ -34,6 +34,7 @@
 #define ZPCI_MAX_UID 0x
 #define UID_UNDEFINED 0
 #define UID_CHECKING_ENABLED 0x01
+#define HOT_UNPLUG_TIMEOUT (NANOSECONDS_PER_SECOND * 60 * 5)
 
 #define S390_PCI_HOST_BRIDGE(obj) \
 OBJECT_CHECK(S390pciState, (obj), TYPE_S390_PCI_HOST_BRIDGE)
@@ -44,6 +45,7 @@
 
 #define HP_EVENT_TO_CONFIGURED0x0301
 #define HP_EVENT_RESERVED_TO_STANDBY  0x0302
+#define HP_EVENT_DECONFIGURE_REQUEST  0x0303
 #define HP_EVENT_CONFIGURED_TO_STBRES 0x0304
 #define HP_EVENT_STANDBY_TO_RESERVED  0x0308
 
@@ -283,6 +285,7 @@ typedef struct S390PCIBusDevice {
 MemoryRegion iommu_mr;
 IndAddr *summary_ind;
 IndAddr *indicator;
+QEMUTimer *release_timer;
 } S390PCIBusDevice;
 
 typedef struct S390PCIBus {
-- 
2.9.0

[Qemu-devel] [PATCH 06/17] s390x/pci: enforce zPCI state checking

2016-06-24 Thread Cornelia Huck

From: Yi Min Zhao 

Current code uses some fields combinatorially to indicate the state of
a s390 pci device. This patch introduces device states in order to make
the code more readable and more logical.

Signed-off-by: Yi Min Zhao 
Reviewed-by: Pierre Morel 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-pci-bus.c  | 104 +++--
 hw/s390x/s390-pci-bus.h  |  34 +-
 hw/s390x/s390-pci-inst.c | 168 ++-
 hw/s390x/s390-pci-inst.h |   5 ++
 include/hw/s390x/sclp.h  |   1 +
 5 files changed, 230 insertions(+), 82 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 1b83772..0f6fcef 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -116,16 +116,22 @@ void s390_pci_sclp_configure(SCCB *sccb)
 goto out;
 }
 
-if (pbdev) {
-if (pbdev->configured) {
-rc = SCLP_RC_NO_ACTION_REQUIRED;
-} else {
-pbdev->configured = true;
-rc = SCLP_RC_NORMAL_COMPLETION;
-}
-} else {
+if (!pbdev) {
 DPRINTF("sclp config no dev found\n");
 rc = SCLP_RC_ADAPTER_ID_NOT_RECOGNIZED;
+goto out;
+}
+
+switch (pbdev->state) {
+case ZPCI_FS_RESERVED:
+rc = SCLP_RC_ADAPTER_IN_RESERVED_STATE;
+break;
+case ZPCI_FS_STANDBY:
+pbdev->state = ZPCI_FS_DISABLED;
+rc = SCLP_RC_NORMAL_COMPLETION;
+break;
+default:
+rc = SCLP_RC_NO_ACTION_REQUIRED;
 }
 out:
 psccb->header.response_code = cpu_to_be16(rc);
@@ -142,22 +148,28 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
 goto out;
 }
 
-if (pbdev) {
-if (!pbdev->configured) {
-rc = SCLP_RC_NO_ACTION_REQUIRED;
-} else {
-if (pbdev->summary_ind) {
-pci_dereg_irqs(pbdev);
-}
-if (pbdev->iommu_enabled) {
-pci_dereg_ioat(pbdev);
-}
-pbdev->configured = false;
-rc = SCLP_RC_NORMAL_COMPLETION;
-}
-} else {
+if (!pbdev) {
 DPRINTF("sclp deconfig no dev found\n");
 rc = SCLP_RC_ADAPTER_ID_NOT_RECOGNIZED;
+goto out;
+}
+
+switch (pbdev->state) {
+case ZPCI_FS_RESERVED:
+rc = SCLP_RC_ADAPTER_IN_RESERVED_STATE;
+break;
+case ZPCI_FS_STANDBY:
+rc = SCLP_RC_NO_ACTION_REQUIRED;
+break;
+default:
+if (pbdev->summary_ind) {
+pci_dereg_irqs(pbdev);
+}
+if (pbdev->iommu_enabled) {
+pci_dereg_ioat(pbdev);
+}
+pbdev->state = ZPCI_FS_STANDBY;
+rc = SCLP_RC_NORMAL_COMPLETION;
 }
 out:
 psccb->header.response_code = cpu_to_be16(rc);
@@ -183,7 +195,7 @@ S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx)
 for (i = 0; i < PCI_SLOT_MAX; i++) {
 pbdev = &s->pbdev[i];
 
-if (pbdev->fh == 0) {
+if (pbdev->state == ZPCI_FS_RESERVED) {
 continue;
 }
 
@@ -233,9 +245,8 @@ static void s390_pci_generate_plug_event(uint16_t pec, 
uint32_t fh,
 s390_pci_generate_event(2, pec, fh, fid, 0, 0);
 }
 
-static void s390_pci_generate_error_event(uint16_t pec, uint32_t fh,
-  uint32_t fid, uint64_t faddr,
-  uint32_t e)
+void s390_pci_generate_error_event(uint16_t pec, uint32_t fh, uint32_t fid,
+   uint64_t faddr, uint32_t e)
 {
 s390_pci_generate_event(1, pec, fh, fid, faddr, e);
 }
@@ -337,8 +348,14 @@ static IOMMUTLBEntry s390_translate_iommu(MemoryRegion 
*iommu, hwaddr addr,
 .perm = IOMMU_NONE,
 };
 
-if (!pbdev->configured || !pbdev->pdev ||
-!(pbdev->fh & FH_MASK_ENABLE) || !pbdev->iommu_enabled) {
+switch (pbdev->state) {
+case ZPCI_FS_ENABLED:
+case ZPCI_FS_BLOCKED:
+if (!pbdev->iommu_enabled) {
+return ret;
+}
+break;
+default:
 return ret;
 }
 
@@ -357,30 +374,13 @@ static IOMMUTLBEntry s390_translate_iommu(MemoryRegion 
*iommu, hwaddr addr,
 return ret;
 }
 
-if (!pbdev->g_iota) {
-pbdev->error_state = true;
-pbdev->lgstg_blocked = true;
-s390_pci_generate_error_event(ERR_EVENT_INVALAS, pbdev->fh, pbdev->fid,
-  addr, 0);
-return ret;
-}
-
 if (addr < pbdev->pba || addr > pbdev->pal) {
-pbdev->error_state = true;
-pbdev->lgstg_blocked = true;
-s390_pci_generate_error_event(ERR_EVENT_OORANGE, pbdev->fh, pbdev->fid,
-  addr, 0);
 return ret;
 }
 
 pte = s390_guest_io_table_walk(s390_pci_get_table_origin(pbdev->g_iota),
addr);
-
 if (!pte) {
-pbdev->error_state = true;
-pbdev->lgstg_blocked = true;
-s390_pci_generate_error_event(ERR_EVENT_SE

[Qemu-devel] [PATCH 01/17] s390x/pci: fix failures of dma map/unmap

2016-06-24 Thread Cornelia Huck

From: Yi Min Zhao 

In commit d78c19b5cf4821d0c198f4132a085bdbf19dda4c, vfio code stores
the IOMMU's offset_within_address_space and adjusts the IOVA before
calling vfio_dma_map/vfio_dma_unmap. But s390_translate_iommu already
considers the base address of an IOMMU memory region.

Thus we use pal as the size and 0x0 as the base address to initialize
IOMMU memory subregion.

Signed-off-by: Yi Min Zhao 
Reviewed-by: Pierre Morel 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-pci-bus.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index a77c10c..8f03b82 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -498,11 +498,9 @@ static const MemoryRegionOps s390_msi_ctrl_ops = {
 
 void s390_pci_iommu_enable(S390PCIBusDevice *pbdev)
 {
-uint64_t size = pbdev->pal - pbdev->pba + 1;
-
 memory_region_init_iommu(&pbdev->iommu_mr, OBJECT(&pbdev->mr),
- &s390_iommu_ops, "iommu-s390", size);
-memory_region_add_subregion(&pbdev->mr, pbdev->pba, &pbdev->iommu_mr);
+ &s390_iommu_ops, "iommu-s390", pbdev->pal + 1);
+memory_region_add_subregion(&pbdev->mr, 0, &pbdev->iommu_mr);
 pbdev->iommu_enabled = true;
 }
 
-- 
2.9.0

[Qemu-devel] [PATCH 04/17] s390x/pci: unify FH_ macros

2016-06-24 Thread Cornelia Huck

From: Yi Min Zhao 

Present code uses some macros to structure PCI Function Handle. But
their names don't have a uniform format. Let's use FH_MASK_ as the
unified prefix.

While we're at it, differentiate the SHM bits: use different bits for
vfio and emulated devices.

Signed-off-by: Yi Min Zhao 
Reviewed-by: Pierre Morel 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-pci-bus.c  |  6 +++---
 hw/s390x/s390-pci-bus.h  |  9 ++---
 hw/s390x/s390-pci-inst.c | 18 +-
 3 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index ba637c9..7111587 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -170,7 +170,7 @@ static uint32_t s390_pci_get_pfid(PCIDevice *pdev)
 
 static uint32_t s390_pci_get_pfh(PCIDevice *pdev)
 {
-return PCI_SLOT(pdev->devfn) | FH_VIRT;
+return PCI_SLOT(pdev->devfn) | FH_SHM_VFIO;
 }
 
 S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx)
@@ -345,7 +345,7 @@ static IOMMUTLBEntry s390_translate_iommu(MemoryRegion 
*iommu, hwaddr addr,
 };
 
 if (!pbdev->configured || !pbdev->pdev ||
-!(pbdev->fh & FH_ENABLED) || !pbdev->iommu_enabled) {
+!(pbdev->fh & FH_MASK_ENABLE) || !pbdev->iommu_enabled) {
 return ret;
 }
 
@@ -456,7 +456,7 @@ static void s390_msi_ctrl_write(void *opaque, hwaddr addr, 
uint64_t data,
 return;
 }
 
-if (!(pbdev->fh & FH_ENABLED)) {
+if (!(pbdev->fh & FH_MASK_ENABLE)) {
 return;
 }
 
diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
index 2c852d4..5cf7926 100644
--- a/hw/s390x/s390-pci-bus.h
+++ b/hw/s390x/s390-pci-bus.h
@@ -21,9 +21,12 @@
 #include "hw/s390x/css.h"
 
 #define TYPE_S390_PCI_HOST_BRIDGE "s390-pcihost"
-#define FH_VIRT 0x00ff
-#define ENABLE_BIT_OFFSET 31
-#define FH_ENABLED (1 << ENABLE_BIT_OFFSET)
+#define FH_MASK_ENABLE   0x8000
+#define FH_MASK_INSTANCE 0x7f00
+#define FH_MASK_SHM  0x00ff
+#define FH_MASK_INDEX0x001f
+#define FH_SHM_VFIO  0x0001
+#define FH_SHM_EMUL  0x0002
 #define S390_PCIPT_ADAPTER 2
 
 #define S390_PCI_HOST_BRIDGE(obj) \
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 2cd7d14..8b68824 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -92,7 +92,7 @@ static int list_pci(ClpReqRspListPci *rrb, uint8_t *cc)
 stl_p(&rrb->response.fmt, 0);
 stq_p(&rrb->response.reserved1, 0);
 stq_p(&rrb->response.reserved2, 0);
-stl_p(&rrb->response.mdd, FH_VIRT);
+stl_p(&rrb->response.mdd, FH_MASK_SHM);
 stw_p(&rrb->response.max_fn, PCI_MAX_FUNCTIONS);
 rrb->response.entry_size = sizeof(ClpFhListEntry);
 finish = 0;
@@ -212,12 +212,12 @@ int clp_service_call(S390CPU *cpu, uint8_t r2)
 
 switch (reqsetpci->oc) {
 case CLP_SET_ENABLE_PCI_FN:
-pbdev->fh = pbdev->fh | FH_ENABLED;
+pbdev->fh |= FH_MASK_ENABLE;
 stl_p(&ressetpci->fh, pbdev->fh);
 stw_p(&ressetpci->hdr.rsp, CLP_RC_OK);
 break;
 case CLP_SET_DISABLE_PCI_FN:
-pbdev->fh = pbdev->fh & ~FH_ENABLED;
+pbdev->fh &= ~FH_MASK_ENABLE;
 pbdev->error_state = false;
 pbdev->lgstg_blocked = false;
 stl_p(&ressetpci->fh, pbdev->fh);
@@ -318,7 +318,7 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
 offset = env->regs[r2 + 1];
 
 pbdev = s390_pci_find_dev_by_fh(fh);
-if (!pbdev || !(pbdev->fh & FH_ENABLED)) {
+if (!pbdev || !(pbdev->fh & FH_MASK_ENABLE)) {
 DPRINTF("pcilg no pci dev\n");
 setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
 return 0;
@@ -435,7 +435,7 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r2)
 offset = env->regs[r2 + 1];
 
 pbdev = s390_pci_find_dev_by_fh(fh);
-if (!pbdev || !(pbdev->fh & FH_ENABLED)) {
+if (!pbdev || !(pbdev->fh & FH_MASK_ENABLE)) {
 DPRINTF("pcistg no pci dev\n");
 setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
 return 0;
@@ -526,7 +526,7 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
 end = start + env->regs[r2 + 1];
 
 pbdev = s390_pci_find_dev_by_fh(fh);
-if (!pbdev || !(pbdev->fh & FH_ENABLED)) {
+if (!pbdev || !(pbdev->fh & FH_MASK_ENABLE)) {
 DPRINTF("rpcit no pci dev\n");
 setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
 goto out;
@@ -590,7 +590,7 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r3, uint64_t gaddr,
 }
 
 pbdev = s390_pci_find_dev_by_fh(fh);
-if (!pbdev || !(pbdev->fh & FH_ENABLED)) {
+if (!pbdev || !(pbdev->fh & FH_MASK_ENABLE)) {
 DPRINTF("pcistb no pci dev fh 0x%x\n", fh);
 setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
 return 0;
@@ -743,7 +743,7 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t 
fiba, uint8_t ar)
 }
 
 pbdev = s390_pci_find_dev_by_fh(fh);
-if (!pbdev || !(pbdev->fh & FH_ENABLE

[Qemu-devel] [PATCH 03/17] s390x/pci: write fid in CLP_QUERY_PCI_FN

2016-06-24 Thread Cornelia Huck

From: Yi Min Zhao 

We forgot to write the fid; fix that.

Signed-off-by: Yi Min Zhao 
Acked-by: Pierre Morel 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-pci-inst.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 479375f..2cd7d14 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -256,6 +256,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2)
 
 stq_p(&resquery->sdma, ZPCI_SDMA_ADDR);
 stq_p(&resquery->edma, ZPCI_EDMA_ADDR);
+stl_p(&resquery->fid, pbdev->fid);
 stw_p(&resquery->pchid, 0);
 stw_p(&resquery->ug, 1);
 stl_p(&resquery->uid, pbdev->fid);
-- 
2.9.0

[Qemu-devel] [PATCH 11/17] s390x/pci: enable zpci hot-plug/hot-unplug

2016-06-24 Thread Cornelia Huck

From: Yi Min Zhao 

We need to support hot-plug/hot-unplug for the new zpci devices as
well. This patch enables the present hot-plug/hot-unplug handlers
to support not only generic pci devices but also zpci devices.

Signed-off-by: Yi Min Zhao 
Reviewed-by: Pierre Morel 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-pci-bus.c | 119 +---
 1 file changed, 73 insertions(+), 46 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 8e0f707..57d5d14 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -177,16 +177,6 @@ out:
 psccb->header.response_code = cpu_to_be16(rc);
 }
 
-static uint32_t s390_pci_get_pfid(PCIDevice *pdev)
-{
-return PCI_SLOT(pdev->devfn);
-}
-
-static uint32_t s390_pci_get_pfh(PCIDevice *pdev)
-{
-return PCI_SLOT(pdev->devfn) | FH_SHM_VFIO;
-}
-
 static S390PCIBusDevice *s390_pci_find_dev_by_uid(uint16_t uid)
 {
 int i;
@@ -580,6 +570,7 @@ static int s390_pcihost_init(SysBusDevice *dev)
 phb->bus = b;
 
 s->bus = S390_PCI_BUS(qbus_create(TYPE_S390_PCI_BUS, DEVICE(s), NULL));
+qbus_set_hotplug_handler(BUS(s->bus), DEVICE(s), NULL);
 
 QTAILQ_INIT(&s->pending_sei);
 return 0;
@@ -632,52 +623,87 @@ static S390PCIBusDevice *s390_pci_device_new(const char 
*target)
 static void s390_pcihost_hot_plug(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp)
 {
-PCIDevice *pci_dev = PCI_DEVICE(dev);
-S390PCIBusDevice *pbdev;
-S390pciState *s = S390_PCI_HOST_BRIDGE(pci_device_root_bus(pci_dev)
-   ->qbus.parent);
+PCIDevice *pdev = NULL;
+S390PCIBusDevice *pbdev = NULL;
+S390pciState *s = s390_get_phb();
 
-if (!dev->id) {
-/* In the case the PCI device does not define an id */
-/* we generate one based on the PCI address */
-dev->id = g_strdup_printf("auto_%02x:%02x.%01x",
-  pci_bus_num(pci_dev->bus),
-  PCI_SLOT(pci_dev->devfn),
-  PCI_FUNC(pci_dev->devfn));
-}
+if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+pdev = PCI_DEVICE(dev);
 
-pbdev = s390_pci_find_dev_by_target(dev->id);
-if (!pbdev) {
-pbdev = s390_pci_device_new(dev->id);
+if (!dev->id) {
+/* In the case the PCI device does not define an id */
+/* we generate one based on the PCI address */
+dev->id = g_strdup_printf("auto_%02x:%02x.%01x",
+  pci_bus_num(pdev->bus),
+  PCI_SLOT(pdev->devfn),
+  PCI_FUNC(pdev->devfn));
+}
+
+pbdev = s390_pci_find_dev_by_target(dev->id);
 if (!pbdev) {
-error_setg(errp, "create zpci device failed");
+pbdev = s390_pci_device_new(dev->id);
+if (!pbdev) {
+error_setg(errp, "create zpci device failed");
+}
 }
-}
 
-s->pbdev[PCI_SLOT(pci_dev->devfn)] = pbdev;
-pbdev->fid = s390_pci_get_pfid(pci_dev);
-pbdev->pdev = pci_dev;
-pbdev->state = ZPCI_FS_DISABLED;
-pbdev->fh = s390_pci_get_pfh(pci_dev);
-pbdev->iommu = s->iommu[PCI_SLOT(pci_dev->devfn)];
+if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
+pbdev->fh |= FH_SHM_VFIO;
+} else {
+pbdev->fh |= FH_SHM_EMUL;
+}
 
-s390_pcihost_setup_msix(pbdev);
+pbdev->pdev = pdev;
+pbdev->iommu = s->iommu[PCI_SLOT(pdev->devfn)];
+pbdev->state = ZPCI_FS_STANDBY;
+s390_pcihost_setup_msix(pbdev);
 
-if (dev->hotplugged) {
-s390_pci_generate_plug_event(HP_EVENT_RESERVED_TO_STANDBY,
- pbdev->fh, pbdev->fid);
-s390_pci_generate_plug_event(HP_EVENT_TO_CONFIGURED,
- pbdev->fh, pbdev->fid);
+if (dev->hotplugged) {
+s390_pci_generate_plug_event(HP_EVENT_RESERVED_TO_STANDBY,
+ pbdev->fh, pbdev->fid);
+}
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) {
+int idx;
+
+pbdev = S390_PCI_DEVICE(dev);
+for (idx = 0; idx < PCI_SLOT_MAX; idx++) {
+if (!s->pbdev[idx]) {
+s->pbdev[idx] = pbdev;
+pbdev->fh = idx;
+return;
+}
+}
+
+error_setg(errp, "no slot for plugging zpci device");
 }
 }
 
 static void s390_pcihost_hot_unplug(HotplugHandler *hotplug_dev,
 DeviceState *dev, Error **errp)
 {
-PCIDevice *pci_dev = PCI_DEVICE(dev);
-S390pciState *s = S390_PCI_HOST_BRIDGE(pci_device_root_bus(pci_dev)
-   ->qbus.parent);
-S390PCIBusDevice *pbdev = s->pbdev

[Qemu-devel] [PATCH 07/17] s390x/pci: introduce S390PCIBus

2016-06-24 Thread Cornelia Huck

From: Yi Min Zhao 

To enable S390PCIBusDevice as qdev, there should be a new bus to
plug and manage all instances of S390PCIBusDevice. Due to this,
S390PCIBus is introduced.

Signed-off-by: Yi Min Zhao 
Reviewed-by: Pierre Morel 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-pci-bus.c | 10 ++
 hw/s390x/s390-pci-bus.h |  8 
 2 files changed, 18 insertions(+)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 0f6fcef..0c67c1e 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -527,6 +527,9 @@ static int s390_pcihost_init(SysBusDevice *dev)
 bus = BUS(b);
 qbus_set_hotplug_handler(bus, DEVICE(dev), NULL);
 phb->bus = b;
+
+s->bus = S390_PCI_BUS(qbus_create(TYPE_S390_PCI_BUS, DEVICE(s), NULL));
+
 QTAILQ_INIT(&s->pending_sei);
 return 0;
 }
@@ -636,9 +639,16 @@ static const TypeInfo s390_pcihost_info = {
 }
 };
 
+static const TypeInfo s390_pcibus_info = {
+.name = TYPE_S390_PCI_BUS,
+.parent = TYPE_BUS,
+.instance_size = sizeof(S390PCIBus),
+};
+
 static void s390_pci_register_types(void)
 {
 type_register_static(&s390_pcihost_info);
+type_register_static(&s390_pcibus_info);
 }
 
 type_init(s390_pci_register_types)
diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
index e332f6a..c4d4079 100644
--- a/hw/s390x/s390-pci-bus.h
+++ b/hw/s390x/s390-pci-bus.h
@@ -21,6 +21,7 @@
 #include "hw/s390x/css.h"
 
 #define TYPE_S390_PCI_HOST_BRIDGE "s390-pcihost"
+#define TYPE_S390_PCI_BUS "s390-pcibus"
 #define FH_MASK_ENABLE   0x8000
 #define FH_MASK_INSTANCE 0x7f00
 #define FH_MASK_SHM  0x00ff
@@ -31,6 +32,8 @@
 
 #define S390_PCI_HOST_BRIDGE(obj) \
 OBJECT_CHECK(S390pciState, (obj), TYPE_S390_PCI_HOST_BRIDGE)
+#define S390_PCI_BUS(obj) \
+OBJECT_CHECK(S390PCIBus, (obj), TYPE_S390_PCI_BUS)
 
 #define HP_EVENT_TO_CONFIGURED0x0301
 #define HP_EVENT_RESERVED_TO_STANDBY  0x0302
@@ -267,8 +270,13 @@ typedef struct S390PCIBusDevice {
 IndAddr *indicator;
 } S390PCIBusDevice;
 
+typedef struct S390PCIBus {
+BusState qbus;
+} S390PCIBus;
+
 typedef struct S390pciState {
 PCIHostState parent_obj;
+S390PCIBus *bus;
 S390PCIBusDevice pbdev[PCI_SLOT_MAX];
 AddressSpace msix_notify_as;
 MemoryRegion msix_notify_mr;
-- 
2.9.0

1 2 3 4 >

1 - 100 of 313 matches

Mail list logo