Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support

2018-06-01 Thread Wei Wang

On 06/01/2018 12:58 PM, Peter Xu wrote:

On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:

This is the device part implementation to add a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
receives the guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not transferred by the migration thread to the destination.

- Test Environment
 Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
 Guest: 8G RAM, 4 vCPU
 Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 seconds

- Test Results
 - Idle Guest Live Migration Time (results are averaged over 10 runs):
   - Optimization vs. Legacy = 271ms vs. 1769ms --> ~86% reduction
 - Guest with Linux Compilation Workload (make bzImage -j4):
   - Live Migration Time (average)
     Optimization vs. Legacy = 1265ms vs. 2634ms --> ~51% reduction
   - Linux Compilation Time
     Optimization vs. Legacy = 4min56s vs. 5min3s
     --> no obvious difference

- Source Code
 - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
 - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git

Hi, Wei,

I have a very high-level question about the series.


Hi Peter,

Thanks for joining the discussion :)



IIUC the core idea of this series is that we can avoid sending some
of the pages if we know that we don't need to send them.  I think this
is based on the fact that on the destination side all the pages are
zero by default after they are malloced.  Before this series, IIUC,
any migration sends every single page to the destination, no matter
whether it's zeroed or not.  So I'm uncertain whether this will
affect the received bitmap on the destination side.  Say, before this
series, the received bitmap would cover the whole RAM bitmap
after migration finished; now it won't.  Will there be any side
effects?  I don't see an obvious issue now, but I just want to raise
the question.


This feature currently supports only pre-copy (I think the received
bitmap matters only to post-copy).

That's why we have
rs->free_page_support = ..&& !migrate_postcopy();
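
To make the mechanism concrete, here is a minimal sketch of how a hinted
free range could clear bits in a dirty bitmap, with the same kind of
"feature enabled and not postcopy" gate as above. All names
(demo_ram_state, hint_free_range, the 4 KiB page size) are assumptions
for illustration only and are not the code in this series.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12                        /* assumes 4 KiB pages */
#define DEMO_PAGE_SIZE  ((uint64_t)1 << DEMO_PAGE_SHIFT)
#define BITS_PER_WORD   (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the migration state; not QEMU's RAMState. */
struct demo_ram_state {
    unsigned long *dirty_bitmap;   /* one bit per guest page, 1 = must send */
    size_t nr_pages;
    bool free_page_support;        /* feature negotiated && !postcopy */
};

static bool page_is_dirty(const struct demo_ram_state *rs, size_t page)
{
    assert(page < rs->nr_pages);
    return (rs->dirty_bitmap[page / BITS_PER_WORD]
            >> (page % BITS_PER_WORD)) & 1UL;
}

/*
 * Clear the dirty bit of every page fully covered by the hinted range
 * [addr, addr + len).  Returns how many pages went from dirty to clean.
 */
static size_t hint_free_range(struct demo_ram_state *rs,
                              uint64_t addr, uint64_t len)
{
    uint64_t first, last, page;
    size_t cleared = 0;

    if (!rs->free_page_support) {
        return 0;                  /* e.g. postcopy: leave the bitmap alone */
    }
    first = (addr + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;  /* round up */
    last = (addr + len) >> DEMO_PAGE_SHIFT;      /* round down, exclusive */
    if (last > rs->nr_pages) {
        last = rs->nr_pages;
    }
    for (page = first; page < last; page++) {
        if (page_is_dirty(rs, (size_t)page)) {
            rs->dirty_bitmap[page / BITS_PER_WORD] &=
                ~(1UL << (page % BITS_PER_WORD));
            cleared++;
        }
    }
    return cleared;
}
```

With a fully dirty bitmap, hinting the range 0x2000..0x5000 in this
sketch clears pages 2, 3 and 4; a partially covered page keeps its
dirty bit, so it would still be sent.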


Meanwhile, this reminds me of a funnier idea: whether we can
just avoid sending the zero pages directly from QEMU's perspective.
In other words, can we just do nothing if save_zero_page() detects
that the page is zero (I guess is_zero_range() can be fast too,
but I don't know exactly how fast it is)?  And how would that
differ from this page hinting approach, in performance and in other
aspects?


I guess you are referring to the zero page optimization. I think the
major overhead comes from the zero page checking: lots of memory
accesses, which also waste memory bandwidth. Please see the results
in the cover letter. The legacy case already includes the zero page
optimization.
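
As a rough illustration of the cost mentioned above, the following
sketch counts the loads a naive zero check performs. It is not QEMU's
is_zero_range() (which may be vectorized); page_is_zero and the 4 KiB
page size are assumptions. The point is only that confirming a page is
zero requires reading all of it, whereas a hinted free page is skipped
without touching its contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096        /* assumed page size */

/*
 * Returns true if the page is all zero.  *words_read reports how many
 * 64-bit loads were needed to reach that verdict.
 */
static bool page_is_zero(const void *page, size_t *words_read)
{
    const unsigned char *p = page;
    size_t i, n = DEMO_PAGE_SIZE / sizeof(uint64_t);

    assert(page != NULL && words_read != NULL);
    for (i = 0; i < n; i++) {
        uint64_t w;

        memcpy(&w, p + i * sizeof(w), sizeof(w));  /* alignment-safe load */
        if (w != 0) {
            *words_read = i + 1;   /* early exit on first non-zero word */
            return false;
        }
    }
    *words_read = n;               /* a zero page costs a full scan */
    return true;
}
```

A non-zero page usually exits early, but every genuinely zero page
costs a full scan (512 loads here), which is the memory-bandwidth
overhead described above.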


Best,
Wei






Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support

2018-06-01 Thread Wei Wang

On 06/01/2018 01:07 PM, Peter Xu wrote:

On Fri, Jun 01, 2018 at 12:58:24PM +0800, Peter Xu wrote:

On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:

This is the device part implementation to add a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
receives the guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not transferred by the migration thread to the destination.

- Test Environment
 Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
 Guest: 8G RAM, 4 vCPU
 Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 seconds

- Test Results
 - Idle Guest Live Migration Time (results are averaged over 10 runs):
   - Optimization vs. Legacy = 271ms vs. 1769ms --> ~86% reduction
 - Guest with Linux Compilation Workload (make bzImage -j4):
   - Live Migration Time (average)
     Optimization vs. Legacy = 1265ms vs. 2634ms --> ~51% reduction
   - Linux Compilation Time
     Optimization vs. Legacy = 4min56s vs. 5min3s
     --> no obvious difference

- Source Code
 - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
 - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git

Hi, Wei,

I have a very high-level question about the series.

IIUC the core idea of this series is that we can avoid sending some
of the pages if we know that we don't need to send them.  I think this
is based on the fact that on the destination side all the pages are
zero by default after they are malloced.  Before this series, IIUC,
any migration sends every single page to the destination, no matter
whether it's zeroed or not.  So I'm uncertain whether this will
affect the received bitmap on the destination side.  Say, before this
series, the received bitmap would cover the whole RAM bitmap
after migration finished; now it won't.  Will there be any side
effects?  I don't see an obvious issue now, but I just want to raise
the question.

Meanwhile, this reminds me of a funnier idea: whether we can
just avoid sending the zero pages directly from QEMU's perspective.
In other words, can we just do nothing if save_zero_page() detects
that the page is zero (I guess is_zero_range() can be fast too,
but I don't know exactly how fast it is)?  And how would that
differ from this page hinting approach, in performance and in other
aspects?

I noticed a problem (after I wrote the above paragraph 5 minutes
ago...): suppose a page was valid and sent to the destination (with
non-zero data), but after a while that page was zeroed.  Then if
we don't send zero pages at all, we won't send the page after it's
zeroed, and on the destination side we'll have a stale non-zero
page.  Is my understanding correct?  Will that be a problem for this
series too, where a valid page can possibly be freed and hinted?


I think that won't be an issue for either the zero page optimization
or this free page optimization.


For the zero page optimization, QEMU always sends compressed 0s to the
destination. The zero page is detected at the time QEMU checks it
(before sending the page). If it is a 0 page, QEMU compresses all the
0s (actually just a flag) and sends it.


For the free page optimization, we skip free pages (could be thought of
as 0 pages in this context). The zero pages are detected at the time
the guest reports them to QEMU. A page won't be reported if it is
non-zero (i.e. used).
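
The stale-page scenario for the zero page path can be modelled with a
toy sender. This is only a sketch under assumptions: demo_send_page and
the two policies are hypothetical names, and the "wire" is a plain
memcpy/memset. It shows why sending the zero flag, rather than sending
nothing, keeps the destination consistent when a previously sent page
is zeroed later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096

enum zero_policy {
    SEND_ZERO_FLAG,    /* a flag is sent, the destination zeroes the page */
    SKIP_ZERO_PAGES    /* the hypothetical "do nothing" variant */
};

static bool demo_is_zero(const unsigned char *p)
{
    size_t i;

    for (i = 0; i < DEMO_PAGE_SIZE; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Model one transfer of a dirty page from src (source) to dst. */
static void demo_send_page(unsigned char *dst, const unsigned char *src,
                           enum zero_policy policy)
{
    assert(dst != NULL && src != NULL);
    if (!demo_is_zero(src)) {
        memcpy(dst, src, DEMO_PAGE_SIZE);      /* normal page: send data */
        return;
    }
    if (policy == SEND_ZERO_FLAG) {
        memset(dst, 0, DEMO_PAGE_SIZE);        /* destination acts on flag */
    }
    /* SKIP_ZERO_PAGES: nothing is sent, so dst keeps its old contents. */
}
```

Sending a non-zero page, zeroing it on the source, and sending it again
leaves stale data at the destination under SKIP_ZERO_PAGES, while
SEND_ZERO_FLAG overwrites it. The free page case relies on a different
argument (the guest reports only pages it is not using), which this
model does not cover.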



Best,
Wei



[Qemu-devel] [PATCH 00/33] linux-user: Begin splitting do_syscall

2018-06-01 Thread Richard Henderson
This function is, as I think everyone will agree, way too large.
This is about a third of the complete change, but I thought I'd
get some feedback on the method and form before I go any farther.


r~


Richard Henderson (33):
  linux-user: Split out do_syscall1
  linux-user: Relax single exit from "break"
  linux-user: Propagate goto ebadf to return
  linux-user: Propagate goto efault to return
  linux-user: Propagate goto unimplemented_nowarn to return
  linux-user: Split out goto unimplemented to do_unimplemented
  linux-user: Propagate goto fail to return
  linux-user: Make syscall number unsigned
  linux-user: Set up infrastructure for table-izing syscalls
  linux-user: Split out brk, close, exit, read, write
  linux-user: Split out execve
  linux-user: Split out open, openat
  linux-user: Split out name_to_handle_at
  linux-user: Split out open_to_handle_at
  linux-user: Split out creat, fork, waitid, waitpid
  linux-user: Split out link, linkat
  linux-user: Split out unlink, unlinkat
  linux-user: Split out chdir, mknod, mknodat, time, chmod
  linux-user: Remove all unimplemented entries
  linux-user: Split out getpid, getxpid, lseek
  linux-user: Split out mount, umount
  linux-user: Split out alarm, pause, stime, utime, utimes
  linux-user: Split out access, faccessat, futimesat, kill, nice, sync,
    syncfs
  linux-user: Split out rename, renameat, renameat2
  linux-user: Split out dup, mkdir, mkdirat, rmdir
  linux-user: Split out acct, pipe, pipe2, times, umount2
  linux-user: Split out ioctl
  linux-user: Split out chroot, dup2, dup3, fcntl, setpgid, umask
  linux-user: Split out getpgrp, getppid, setsid
  linux-user: Split out rt_sigaction, sigaction
  linux-user: Split out rt_sigprocmask, sgetmask, sigprocmask, ssetmask
  linux-user: Split out rt_sigpending, rt_sigsuspend, sigpending,
    sigsuspend
  linux-user: Split out rt_sigqueueinfo, rt_sigtimedwait,
    rt_tgsigqueueinfo

 linux-user/qemu.h    |    2 +-
 linux-user/syscall.c | 4651 ++
 2 files changed, 2394 insertions(+), 2259 deletions(-)

-- 
2.17.0




[Qemu-devel] [PATCH 05/33] linux-user: Propagate goto unimplemented_nowarn to return

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8ea2099001..f7b7051c1c 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -12081,7 +12081,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
   return 0;
   }
 #else
-  goto unimplemented_nowarn;
+  return -TARGET_ENOSYS;
 #endif
 #endif
 #ifdef TARGET_NR_get_thread_area
@@ -12094,12 +12094,12 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 return ts->tp_value;
 }
 #else
-goto unimplemented_nowarn;
+return -TARGET_ENOSYS;
 #endif
 #endif
 #ifdef TARGET_NR_getdomainname
 case TARGET_NR_getdomainname:
-goto unimplemented_nowarn;
+return -TARGET_ENOSYS;
 #endif
 
 #ifdef TARGET_NR_clock_settime
@@ -12184,7 +12184,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
  * holding a mutex that is shared with another process via
  * shared memory).
  */
-goto unimplemented_nowarn;
+return -TARGET_ENOSYS;
 #endif
 
 #if defined(TARGET_NR_utimensat)
@@ -12886,9 +12886,6 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 default:
 unimplemented:
 gemu_log("qemu: Unsupported syscall: %d\n", num);
-#if defined(TARGET_NR_setxattr) || defined(TARGET_NR_get_thread_area) || defined(TARGET_NR_getdomainname) || defined(TARGET_NR_set_robust_list)
-unimplemented_nowarn:
-#endif
 return -TARGET_ENOSYS;
 }
 fail:
-- 
2.17.0




[Qemu-devel] [PATCH 03/33] linux-user: Propagate goto ebadf to return

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 187 +--
 1 file changed, 92 insertions(+), 95 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 258aff0411..d0bf650c62 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8025,7 +8025,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 return 0;
 } else {
 if (is_hostfd(arg1)) {
-goto ebadf;
+return -TARGET_EBADF;
 }
 if (!(p = lock_user(VERIFY_WRITE, arg2, arg3, 0)))
 goto efault;
@@ -8039,7 +8039,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 return ret;
 case TARGET_NR_write:
 if (is_hostfd(arg1)) {
-goto ebadf;
+return -TARGET_EBADF;
 }
 if (!(p = lock_user(VERIFY_READ, arg2, arg3, 1)))
 goto efault;
@@ -8070,7 +8070,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #endif
 case TARGET_NR_openat:
 if (is_hostfd(arg1)) {
-goto ebadf;
+return -TARGET_EBADF;
 }
 if (!(p = lock_user_string(arg2)))
 goto efault;
@@ -8083,7 +8083,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 case TARGET_NR_name_to_handle_at:
 if (is_hostfd(arg1)) {
-goto ebadf;
+return -TARGET_EBADF;
 }
 ret = do_name_to_handle_at(arg1, arg2, arg3, arg4, arg5);
 return ret;
@@ -8091,7 +8091,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 case TARGET_NR_open_by_handle_at:
 if (is_hostfd(arg1)) {
-goto ebadf;
+return -TARGET_EBADF;
 }
 ret = do_open_by_handle_at(arg1, arg2, arg3);
 fd_trans_unregister(ret);
@@ -8099,7 +8099,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #endif
 case TARGET_NR_close:
 if (is_hostfd(arg1)) {
-goto ebadf;
+return -TARGET_EBADF;
 }
 fd_trans_unregister(arg1);
 return get_errno(close(arg1));
@@ -8163,7 +8163,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #if defined(TARGET_NR_linkat)
 case TARGET_NR_linkat:
 if (is_hostfd(arg1)) {
-goto ebadf;
+return -TARGET_EBADF;
 } else {
 void * p2 = NULL;
 if (!arg2 || !arg4)
@@ -8190,7 +8190,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #if defined(TARGET_NR_unlinkat)
 case TARGET_NR_unlinkat:
 if (is_hostfd(arg1)) {
-goto ebadf;
+return -TARGET_EBADF;
 }
 if (!(p = lock_user_string(arg2)))
 goto efault;
@@ -8324,7 +8324,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #if defined(TARGET_NR_mknodat)
 case TARGET_NR_mknodat:
 if (is_hostfd(arg1)) {
-goto ebadf;
+return -TARGET_EBADF;
 }
 if (!(p = lock_user_string(arg2)))
 goto efault;
@@ -8350,7 +8350,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #endif
 case TARGET_NR_lseek:
 if (is_hostfd(arg1)) {
-goto ebadf;
+return -TARGET_EBADF;
 }
 return get_errno(lseek(arg1, arg2, arg3));
 #if defined(TARGET_NR_getxpid) && defined(TARGET_ALPHA)
@@ -8497,7 +8497,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #if defined(TARGET_NR_futimesat)
 case TARGET_NR_futimesat:
 if (is_hostfd(arg1)) {
-goto ebadf;
+return -TARGET_EBADF;
 } else {
 struct timeval *tvp, tv[2];
 if (arg3) {
@@ -8543,7 +8543,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #if defined(TARGET_NR_faccessat) && defined(__NR_faccessat)
 case TARGET_NR_faccessat:
 if (is_hostfd(arg1)) {
-goto ebadf;
+return -TARGET_EBADF;
 }
 if (!(fn = lock_user_string(arg2))) {
 goto efault;
@@ -8590,7 +8590,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #if defined(TARGET_NR_renameat)
 case TARGET_NR_renameat:
 if (is_hostfd(arg1)) {
-goto ebadf;
+return -TARGET_EBADF;
 } else {
 void *p2;
 p  = lock_user_string(arg2);
@@ -8607,7 +8607,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #if defined(TARGET_NR_renameat2)
 case TARGET_NR_renameat2:
 if (is_hostfd(arg1)) {
-goto ebadf;
+return -TARGET_EBADF;
 } else {
 void *p2;

[Qemu-devel] [PATCH 08/33] linux-user: Make syscall number unsigned

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/qemu.h|  2 +-
 linux-user/syscall.c | 20 ++--
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 05a82a3628..623a8d8b7a 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -231,7 +231,7 @@ abi_long memcpy_to_target(abi_ulong dest, const void *src,
 void target_set_brk(abi_ulong new_brk);
 abi_long do_brk(abi_ulong new_brk);
 void syscall_init(void);
-abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
+abi_long do_syscall(void *cpu_env, unsigned num, abi_long arg1,
 abi_long arg2, abi_long arg3, abi_long arg4,
 abi_long arg5, abi_long arg6, abi_long arg7,
 abi_long arg8);
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index a413aad658..e2e2d58e84 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -719,20 +719,20 @@ static inline int next_free_host_timer(void)
 
 /* ARM EABI and MIPS expect 64bit types aligned even on pairs or registers */
 #ifdef TARGET_ARM
-static inline int regpairs_aligned(void *cpu_env, int num)
+static inline int regpairs_aligned(void *cpu_env, unsigned num)
 {
 return ((((CPUARMState *)cpu_env)->eabi) == 1) ;
 }
 #elif defined(TARGET_MIPS) && (TARGET_ABI_BITS == 32)
-static inline int regpairs_aligned(void *cpu_env, int num) { return 1; }
+static inline int regpairs_aligned(void *cpu_env, unsigned num) { return 1; }
 #elif defined(TARGET_PPC) && !defined(TARGET_PPC64)
 /* SysV AVI for PPC32 expects 64bit parameters to be passed on odd/even pairs
  * of registers which translates to the same as ARM/MIPS, because we start with
  * r3 as arg1 */
-static inline int regpairs_aligned(void *cpu_env, int num) { return 1; }
+static inline int regpairs_aligned(void *cpu_env, unsigned num) { return 1; }
 #elif defined(TARGET_SH4)
 /* SH4 doesn't align register pairs, except for p{read,write}64 */
-static inline int regpairs_aligned(void *cpu_env, int num)
+static inline int regpairs_aligned(void *cpu_env, unsigned num)
 {
 switch (num) {
 case TARGET_NR_pread64:
@@ -744,9 +744,9 @@ static inline int regpairs_aligned(void *cpu_env, int num)
 }
 }
 #elif defined(TARGET_XTENSA)
-static inline int regpairs_aligned(void *cpu_env, int num) { return 1; }
+static inline int regpairs_aligned(void *cpu_env, unsigned num) { return 1; }
 #else
-static inline int regpairs_aligned(void *cpu_env, int num) { return 0; }
+static inline int regpairs_aligned(void *cpu_env, unsigned num) { return 0; }
 #endif
 
 #define ERRNO_TABLE_SIZE 1200
@@ -7962,9 +7962,9 @@ static int host_to_target_cpu_mask(const unsigned long *host_mask,
 return 0;
 }
 
-static abi_long do_unimplemented(int num)
+static abi_long do_unimplemented(unsigned num)
 {
-gemu_log("qemu: Unsupported syscall: %d\n", num);
+gemu_log("qemu: Unsupported syscall: %u\n", num);
 return -TARGET_ENOSYS;
 }
 
@@ -7973,7 +7973,7 @@ static abi_long do_unimplemented(int num)
  * of syscall results, can be performed.
  * All errnos that do_syscall() returns must be -TARGET_.
  */
-static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
+static abi_long do_syscall1(void *cpu_env, unsigned num, abi_long arg1,
 abi_long arg2, abi_long arg3, abi_long arg4,
 abi_long arg5, abi_long arg6, abi_long arg7,
 abi_long arg8)
@@ -12880,7 +12880,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 return ret;
 }
 
-abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
+abi_long do_syscall(void *cpu_env, unsigned num, abi_long arg1,
 abi_long arg2, abi_long arg3, abi_long arg4,
 abi_long arg5, abi_long arg6, abi_long arg7,
 abi_long arg8)
-- 
2.17.0




[Qemu-devel] [PATCH 11/33] linux-user: Split out execve

2018-06-01 Thread Richard Henderson
At the same time, fix the repeated re-reading of the argv and env
arrays from guest memory.  Instead read into a unified array once.
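
The "unified array" layout can be sketched on the host side alone. This
is a simplified illustration under assumptions (count_ptrs and unify
are hypothetical helpers, and plain host pointers stand in for guest
addresses): argv entries occupy [0, argc), followed by a NULL, then the
env entries and a final NULL, so the env vector is simply the same
array offset by argc + 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Count the entries of a NULL-terminated pointer array. */
static size_t count_ptrs(char *const *v)
{
    size_t n = 0;

    while (v != NULL && v[n] != NULL) {
        n++;
    }
    return n;
}

/*
 * Build one array laid out as:
 *   argv[0] .. argv[argc-1], NULL, envp[0] .. envp[envc-1], NULL
 * The caller frees the result; *argc_out receives argc.
 */
static char **unify(char *const *argv, char *const *envp, size_t *argc_out)
{
    size_t argc = count_ptrs(argv);
    size_t envc = count_ptrs(envp);
    size_t i;
    char **u;

    assert(argc_out != NULL);
    u = calloc(argc + envc + 2, sizeof(*u));   /* zeroed: both NULLs set */
    if (u == NULL) {
        return NULL;
    }
    for (i = 0; i < argc; i++) {
        u[i] = argv[i];
    }
    for (i = 0; i < envc; i++) {
        u[argc + 1 + i] = envp[i];
    }
    *argc_out = argc;
    return u;
}
```

A caller would pass u as the argument vector and u + argc + 1 as the
environment vector, releasing both with a single free(u).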

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 203 ++-
 1 file changed, 106 insertions(+), 97 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b0d268dab7..a9b59a8658 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7998,6 +7998,111 @@ IMPL(close)
 return get_errno(close(arg1));
 }
 
+IMPL(execve)
+{
+abi_ulong *guest_ptrs;
+char **host_ptrs;
+int argc, envc, alloc, i;
+abi_ulong gp;
+abi_ulong guest_argp = arg2;
+abi_ulong guest_envp = arg3;
+char *filename;
+abi_long ret;
+
+/* Initial estimate of number of guest pointers required.  */
+alloc = 32;
+guest_ptrs = g_new(abi_ulong, alloc);
+
+/* Iterate through argp and envp, counting entries, and
+ * reading guest addresses from the arrays.
+ */
+for (gp = guest_argp, argc = 0; gp; gp += sizeof(abi_ulong)) {
+abi_ulong addr;
+if (get_user_ual(addr, gp)) {
+return -TARGET_EFAULT;
+}
+if (!addr) {
+break;
+}
+if (argc >= alloc) {
+alloc *= 2;
+guest_ptrs = g_renew(abi_ulong, guest_ptrs, alloc);
+}
+guest_ptrs[argc++] = addr;
+}
+for (gp = guest_envp, envc = 0; gp; gp += sizeof(abi_ulong)) {
+abi_ulong addr;
+if (get_user_ual(addr, gp)) {
+return -TARGET_EFAULT;
+}
+if (!addr) {
+break;
+}
+if (argc + envc >= alloc) {
+alloc *= 2;
+guest_ptrs = g_renew(abi_ulong, guest_ptrs, alloc);
+}
+guest_ptrs[argc + envc++] = addr;
+}
+
+/* Exact number of host pointers required.  */
+host_ptrs = g_new0(char *, argc + envc + 2);
+
+/* Iterate through the argp and envp that we already read
+ * and convert the guest pointers to host pointers.
+ */
+ret = -TARGET_EFAULT;
+for (i = 0; i < argc; ++i) {
+char *p = lock_user_string(guest_ptrs[i]);
+if (!p) {
+goto fini;
+}
+host_ptrs[i] = p;
+}
+for (i = 0; i < envc; ++i) {
+char *p = lock_user_string(guest_ptrs[argc + i]);
+if (!p) {
+goto fini;
+}
+host_ptrs[argc + 1 + i] = p;
+}
+
+/* Read the executable filename.  */
+filename = lock_user_string(arg1);
+if (!filename) {
+goto fini;
+}
+
+/* Although execve() is not an interruptible syscall it is
+ * a special case where we must use the safe_syscall wrapper:
+ * if we allow a signal to happen before we make the host
+ * syscall then we will 'lose' it, because at the point of
+ * execve the process leaves QEMU's control. So we use the
+ * safe syscall wrapper to ensure that we either take the
+ * signal as a guest signal, or else it does not happen
+ * before the execve completes and makes it the other
+ * program's problem.
+ */
+ret = get_errno(safe_execve(filename, host_ptrs, host_ptrs + argc + 1));
+unlock_user(filename, arg1, 0);
+
+ fini:
+/* Deallocate everything we allocated above.  */
+for (i = 0; i < argc; ++i) {
+if (host_ptrs[i]) {
+unlock_user(host_ptrs[i], guest_ptrs[i], 0);
+}
+}
+for (i = 0; i < envc; ++i) {
+if (host_ptrs[argc + 1 + i]) {
+unlock_user(host_ptrs[argc + 1 + i], guest_ptrs[argc + i], 0);
+}
+}
+g_free(host_ptrs);
+g_free(guest_ptrs);
+return ret;
+}
+
 IMPL(exit)
 {
 CPUState *cpu = ENV_GET_CPU(cpu_env);
@@ -8237,103 +8342,6 @@ IMPL(everything_else)
 unlock_user(p, arg2, 0);
 return ret;
 #endif
-case TARGET_NR_execve:
-{
-char **argp, **envp;
-int argc, envc;
-abi_ulong gp;
-abi_ulong guest_argp;
-abi_ulong guest_envp;
-abi_ulong addr;
-char **q;
-int total_size = 0;
-
-argc = 0;
-guest_argp = arg2;
-for (gp = guest_argp; gp; gp += sizeof(abi_ulong)) {
-if (get_user_ual(addr, gp))
-return -TARGET_EFAULT;
-if (!addr)
-break;
-argc++;
-}
-envc = 0;
-guest_envp = arg3;
-for (gp = guest_envp; gp; gp += sizeof(abi_ulong)) {
-if (get_user_ual(addr, gp))
-return -TARGET_EFAULT;
-if (!addr)
-break;
-envc++;
-}
-
-argp = g_new0(char *, argc + 1);
-envp = g_new0(char *, envc + 1);
-
-for (gp = guest_argp, q = argp; gp;
-  gp += sizeof(abi_ulong), q++) {
-if (get_user_ual(addr, gp))
-

[Qemu-devel] [PATCH 01/33] linux-user: Split out do_syscall1

2018-06-01 Thread Richard Henderson
There was supposed to be a single point of return for do_syscall
so that tracing works properly.  However, there are a few bugs
in that area.  It is significantly simpler to simply split out
an inner function to enforce this.

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 89 +++-
 1 file changed, 54 insertions(+), 35 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b75dd9a5bc..ebaefebcc2 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7962,13 +7962,15 @@ static int host_to_target_cpu_mask(const unsigned long *host_mask,
 return 0;
 }
 
-/* do_syscall() should always have a single exit point at the end so
-   that actions, such as logging of syscall results, can be performed.
-   All errnos that do_syscall() returns must be -TARGET_. */
-abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
-abi_long arg2, abi_long arg3, abi_long arg4,
-abi_long arg5, abi_long arg6, abi_long arg7,
-abi_long arg8)
+/* This is an internal helper for do_syscall so that it is easier
+ * to have a single return point, so that actions, such as logging
+ * of syscall results, can be performed.
+ * All errnos that do_syscall() returns must be -TARGET_.
+ */
+static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
+abi_long arg2, abi_long arg3, abi_long arg4,
+abi_long arg5, abi_long arg6, abi_long arg7,
+abi_long arg8)
 {
 CPUState *cpu = ENV_GET_CPU(cpu_env);
 abi_long ret;
@@ -7977,28 +7979,6 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 void *p;
 char *fn;
 
-#if defined(DEBUG_ERESTARTSYS)
-/* Debug-only code for exercising the syscall-restart code paths
- * in the per-architecture cpu main loops: restart every syscall
- * the guest makes once before letting it through.
- */
-{
-static int flag;
-
-flag = !flag;
-if (flag) {
-return -TARGET_ERESTARTSYS;
-}
-}
-#endif
-
-#ifdef DEBUG
-gemu_log("syscall %d", num);
-#endif
-trace_guest_user_syscall(cpu, num, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8);
-if(do_strace)
-print_syscall(num, arg1, arg2, arg3, arg4, arg5, arg6);
-
 switch(num) {
 case TARGET_NR_exit:
 /* In old applications this may be used to implement _exit(2).
@@ -13101,12 +13081,6 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 break;
 }
 fail:
-#ifdef DEBUG
-gemu_log(" = " TARGET_ABI_FMT_ld "\n", ret);
-#endif
-if(do_strace)
-print_syscall_ret(num, ret);
-trace_guest_user_syscall_ret(cpu, num, ret);
 return ret;
 efault:
 ret = -TARGET_EFAULT;
@@ -13115,3 +13089,48 @@ ebadf:
 ret = -TARGET_EBADF;
 goto fail;
 }
+
+abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
+abi_long arg2, abi_long arg3, abi_long arg4,
+abi_long arg5, abi_long arg6, abi_long arg7,
+abi_long arg8)
+{
+CPUState *cpu = ENV_GET_CPU(cpu_env);
+abi_long ret;
+
+#if defined(DEBUG_ERESTARTSYS)
+/* Debug-only code for exercising the syscall-restart code paths
+ * in the per-architecture cpu main loops: restart every syscall
+ * the guest makes once before letting it through.
+ */
+{
+static bool flag;
+flag = !flag;
+if (flag) {
+return -TARGET_ERESTARTSYS;
+}
+}
+#endif
+#ifdef DEBUG
+gemu_log("syscall %d", num);
+#endif
+
+trace_guest_user_syscall(cpu, num, arg1, arg2, arg3, arg4,
+ arg5, arg6, arg7, arg8);
+
+if (unlikely(do_strace)) {
+print_syscall(num, arg1, arg2, arg3, arg4, arg5, arg6);
+ret = do_syscall1(cpu_env, num, arg1, arg2, arg3, arg4,
+  arg5, arg6, arg7, arg8);
+print_syscall_ret(num, ret);
+} else {
+ret = do_syscall1(cpu_env, num, arg1, arg2, arg3, arg4,
+  arg5, arg6, arg7, arg8);
+}
+
+#ifdef DEBUG
+gemu_log(" = " TARGET_ABI_FMT_ld "\n", ret);
+#endif
+trace_guest_user_syscall_ret(cpu, num, ret);
+return ret;
+}
-- 
2.17.0
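
The shape of this split can be sketched in isolation. The following is
a minimal model, not the QEMU code: demo_inner and demo_outer and the
error constants are hypothetical. The inner function may return from
anywhere, while the outer wrapper has the only exit point, so result
logging cannot be bypassed.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_EINVAL 22     /* illustrative errno values */
#define DEMO_ENOSYS 38

static int demo_trace_enabled;   /* stands in for do_strace */
static int demo_results_logged;  /* counts passes through the exit path */

/* Inner helper: free to return early from any case. */
static long demo_inner(unsigned num, long arg1)
{
    switch (num) {
    case 0:
        return arg1 + 1;
    case 1:
        if (arg1 < 0) {
            return -DEMO_EINVAL;     /* early return, no goto needed */
        }
        return arg1 * 2;
    default:
        return -DEMO_ENOSYS;
    }
}

/* Outer wrapper: the single place where entry and result are logged. */
static long demo_outer(unsigned num, long arg1)
{
    long ret;

    if (demo_trace_enabled) {
        fprintf(stderr, "syscall %u(%ld)", num, arg1);
    }
    ret = demo_inner(num, arg1);
    if (demo_trace_enabled) {
        fprintf(stderr, " = %ld\n", ret);
    }
    demo_results_logged++;
    return ret;
}
```

Every call through demo_outer reaches the logging step, whichever
return statement demo_inner took.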




[Qemu-devel] [PATCH 07/33] linux-user: Propagate goto fail to return

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 62 
 1 file changed, 23 insertions(+), 39 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 4269ec2c23..a413aad658 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -9001,8 +9001,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 how = SIG_SETMASK;
 break;
 default:
-ret = -TARGET_EINVAL;
-goto fail;
+return -TARGET_EINVAL;
 }
 mask = arg2;
 target_to_host_old_sigset(&set, &mask);
@@ -9029,8 +9028,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 how = SIG_SETMASK;
 break;
 default:
-ret = -TARGET_EINVAL;
-goto fail;
+return -TARGET_EINVAL;
 }
 if (!(p = lock_user(VERIFY_READ, arg2, sizeof(target_sigset_t), 1)))
 return -TARGET_EFAULT;
@@ -9073,8 +9071,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 how = SIG_SETMASK;
 break;
 default:
-ret = -TARGET_EINVAL;
-goto fail;
+return -TARGET_EINVAL;
 }
 if (!(p = lock_user(VERIFY_READ, arg2, sizeof(target_sigset_t), 1)))
 return -TARGET_EFAULT;
@@ -9363,15 +9360,15 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 
 ret = copy_from_user_fdset_ptr(&rfds, &rfds_ptr, rfd_addr, n);
 if (ret) {
-goto fail;
+return ret;
 }
 ret = copy_from_user_fdset_ptr(&wfds, &wfds_ptr, wfd_addr, n);
 if (ret) {
-goto fail;
+return ret;
 }
 ret = copy_from_user_fdset_ptr(&efds, &efds_ptr, efd_addr, n);
 if (ret) {
-goto fail;
+return ret;
 }
 if (contains_hostfd(&rfds) ||
 contains_hostfd(&wfds) ||
@@ -9409,8 +9406,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 sig.set = &set;
 if (arg_sigsize != sizeof(*target_sigset)) {
 /* Like the kernel, we enforce correct size sigsets */
-ret = -TARGET_EINVAL;
-goto fail;
+return -TARGET_EINVAL;
 }
 target_sigset = lock_user(VERIFY_READ, arg_sigset,
   sizeof(*target_sigset), 1);
@@ -9951,18 +9947,15 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 case TARGET_SYSLOG_ACTION_READ_CLEAR:/* Read/clear msgs */
 case TARGET_SYSLOG_ACTION_READ_ALL:  /* Read last messages */
 {
-ret = -TARGET_EINVAL;
 if (len < 0) {
-goto fail;
+return -TARGET_EINVAL;
 }
-ret = 0;
 if (len == 0) {
-return ret;
+return 0;
 }
 p = lock_user(VERIFY_WRITE, arg2, arg3, 0);
 if (!p) {
-ret = -TARGET_EFAULT;
-goto fail;
+return -TARGET_EFAULT;
 }
 ret = get_errno(sys_syslog((int)arg1, p, (int)arg3));
 unlock_user(p, arg2, arg3);
@@ -10363,8 +10356,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 
 dirp = g_try_malloc(count);
 if (!dirp) {
-ret = -TARGET_ENOMEM;
-goto fail;
+return -TARGET_ENOMEM;
 }
 
 ret = get_errno(sys_getdents(arg1, dirp, count));
@@ -10556,7 +10548,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 if (ret < 0) {
 unlock_user(target_pfd, arg1,
 sizeof(struct target_pollfd) * nfds);
-goto fail;
+return ret;
 }
 }
 
@@ -10788,7 +10780,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
arg2 ? &node : NULL,
NULL));
 if (is_error(ret)) {
-goto fail;
+return ret;
 }
 if (arg1 && put_user_u32(cpu, arg1)) {
 return -TARGET_EFAULT;
@@ -11290,8 +11282,7 @@ static abi_long do_syscall1(void *c

[Qemu-devel] [PATCH 09/33] linux-user: Set up infrastructure for table-izing syscalls

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 42 ++
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index e2e2d58e84..fc3dc3f40d 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7962,21 +7962,34 @@ static int host_to_target_cpu_mask(const unsigned long *host_mask,
 return 0;
 }
 
+typedef abi_long impl_fn(void *cpu_env, unsigned num, abi_long arg1,
+ abi_long arg2, abi_long arg3, abi_long arg4,
+ abi_long arg5, abi_long arg6, abi_long arg7,
+ abi_long arg8);
+
 static abi_long do_unimplemented(unsigned num)
 {
 gemu_log("qemu: Unsupported syscall: %u\n", num);
 return -TARGET_ENOSYS;
 }
 
+#define IMPL(NAME) \
+static abi_long impl_##NAME(void *cpu_env, unsigned num, abi_long arg1,   \
+abi_long arg2, abi_long arg3, abi_long arg4,  \
+abi_long arg5, abi_long arg6, abi_long arg7,  \
+abi_long arg8)
+
+IMPL(enosys)
+{
+return do_unimplemented(num);
+}
+
 /* This is an internal helper for do_syscall so that it is easier
  * to have a single return point, so that actions, such as logging
  * of syscall results, can be performed.
  * All errnos that do_syscall() returns must be -TARGET_.
  */
-static abi_long do_syscall1(void *cpu_env, unsigned num, abi_long arg1,
-abi_long arg2, abi_long arg3, abi_long arg4,
-abi_long arg5, abi_long arg6, abi_long arg7,
-abi_long arg8)
+IMPL(everything_else)
 {
 CPUState *cpu = ENV_GET_CPU(cpu_env);
 abi_long ret;
@@ -12880,6 +12893,10 @@ static abi_long do_syscall1(void *cpu_env, unsigned num, abi_long arg1,
 return ret;
 }
 
+static impl_fn * const syscall_table[] = {
+impl_everything_else,
+};
+
 abi_long do_syscall(void *cpu_env, unsigned num, abi_long arg1,
 abi_long arg2, abi_long arg3, abi_long arg4,
 abi_long arg5, abi_long arg6, abi_long arg7,
@@ -12908,14 +12925,23 @@ abi_long do_syscall(void *cpu_env, unsigned num, abi_long arg1,
 trace_guest_user_syscall(cpu, num, arg1, arg2, arg3, arg4,
  arg5, arg6, arg7, arg8);
 
+/* ??? After impl_everything_else is fully split, initialize with NULL.  */
+impl_fn *fn = impl_everything_else;
+if (num < ARRAY_SIZE(syscall_table)) {
+fn = syscall_table[num];
+}
+if (fn == NULL) {
+fn = impl_enosys;
+}
+
 if (unlikely(do_strace)) {
 print_syscall(num, arg1, arg2, arg3, arg4, arg5, arg6);
-ret = do_syscall1(cpu_env, num, arg1, arg2, arg3, arg4,
-  arg5, arg6, arg7, arg8);
+ret = fn(cpu_env, num, arg1, arg2, arg3, arg4,
+ arg5, arg6, arg7, arg8);
 print_syscall_ret(num, ret);
 } else {
-ret = do_syscall1(cpu_env, num, arg1, arg2, arg3, arg4,
-  arg5, arg6, arg7, arg8);
+ret = fn(cpu_env, num, arg1, arg2, arg3, arg4,
+ arg5, arg6, arg7, arg8);
 }
 
 #ifdef DEBUG
-- 
2.17.0
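
The dispatch scheme introduced above can be sketched in isolation. This is a minimal illustration only, not the QEMU code: the three-argument handler signature, the NR_first/NR_second numbers, and the dispatch() name are hypothetical stand-ins, and the function pointer starts out NULL, which is the state the "???" comment in the patch describes for after impl_everything_else has been fully split.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the QEMU types and macros. */
typedef long abi_long;
typedef abi_long impl_fn(void *cpu_env, unsigned num, abi_long arg1);

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Declare one handler with the shared signature, in the style of IMPL. */
#define IMPL(NAME) \
    static abi_long impl_##NAME(void *cpu_env, unsigned num, abi_long arg1)

enum { NR_first = 1, NR_second = 3 };   /* made-up syscall numbers */

IMPL(enosys)
{
    (void)cpu_env; (void)num; (void)arg1;
    return -ENOSYS;
}

IMPL(first)
{
    (void)cpu_env; (void)num;
    return arg1 + 1;
}

IMPL(second)
{
    (void)cpu_env; (void)num;
    return arg1 * 2;
}

/* Designated initializers leave every unlisted slot NULL. */
static impl_fn * const syscall_table[] = {
    [NR_first] = impl_first,
    [NR_second] = impl_second,
};

abi_long dispatch(void *cpu_env, unsigned num, abi_long arg1)
{
    impl_fn *fn = NULL;

    if (num < ARRAY_SIZE(syscall_table)) {
        fn = syscall_table[num];
    }
    if (fn == NULL) {
        fn = impl_enosys;   /* hole in the table, or number out of range */
    }
    assert(fn != NULL);
    return fn(cpu_env, num, arg1);
}
```

Because the table is indexed by syscall number, entries guarded by #ifdef simply stay NULL when the target lacks that syscall, and the lookup falls through to the ENOSYS handler.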




[Qemu-devel] [PATCH 12/33] linux-user: Split out open, openat

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 65 
 1 file changed, 42 insertions(+), 23 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index a9b59a8658..fb1a8a4e7e 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8145,6 +8145,44 @@ IMPL(exit)
 g_assert_not_reached();
 }
 
+#ifdef TARGET_NR_open
+IMPL(open)
+{
+char *fn = lock_user_string(arg1);
+abi_long ret;
+
+if (!fn) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(do_openat(cpu_env, AT_FDCWD, fn,
+  target_to_host_bitmask(arg2, fcntl_flags_tbl),
+  arg3));
+fd_trans_unregister(ret);
+unlock_user(fn, arg1, 0);
+return ret;
+}
+#endif
+
+IMPL(openat)
+{
+char *fn;
+abi_long ret;
+
+if (is_hostfd(arg1)) {
+return -TARGET_EBADF;
+}
+fn = lock_user_string(arg2);
+if (!fn) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(do_openat(cpu_env, arg1, fn,
+  target_to_host_bitmask(arg3, fcntl_flags_tbl),
+  arg4));
+fd_trans_unregister(ret);
+unlock_user(fn, arg2, 0);
+return ret;
+}
+
 IMPL(read)
 {
 abi_long ret;
@@ -8210,29 +8248,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#ifdef TARGET_NR_open
-case TARGET_NR_open:
-if (!(p = lock_user_string(arg1)))
-return -TARGET_EFAULT;
-ret = get_errno(do_openat(cpu_env, AT_FDCWD, p,
-  target_to_host_bitmask(arg2, fcntl_flags_tbl),
-  arg3));
-fd_trans_unregister(ret);
-unlock_user(p, arg1, 0);
-return ret;
-#endif
-case TARGET_NR_openat:
-if (is_hostfd(arg1)) {
-return -TARGET_EBADF;
-}
-if (!(p = lock_user_string(arg2)))
-return -TARGET_EFAULT;
-ret = get_errno(do_openat(cpu_env, arg1, p,
-  target_to_host_bitmask(arg3, fcntl_flags_tbl),
-  arg4));
-fd_trans_unregister(ret);
-unlock_user(p, arg2, 0);
-return ret;
 #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 case TARGET_NR_name_to_handle_at:
 if (is_hostfd(arg1)) {
@@ -12926,6 +12941,10 @@ static impl_fn * const syscall_table[] = {
 [TARGET_NR_close] = impl_close,
 [TARGET_NR_execve] = impl_execve,
 [TARGET_NR_exit] = impl_exit,
+#ifdef TARGET_NR_open
+[TARGET_NR_open] = impl_open,
+#endif
+[TARGET_NR_openat] = impl_openat,
 [TARGET_NR_read] = impl_read,
 [TARGET_NR_write] = impl_write,
 };
-- 
2.17.0




[Qemu-devel] [PATCH 06/33] linux-user: Split out goto unimplemented to do_unimplemented

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 82 +++-
 1 file changed, 43 insertions(+), 39 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index f7b7051c1c..4269ec2c23 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7962,6 +7962,12 @@ static int host_to_target_cpu_mask(const unsigned long *host_mask,
 return 0;
 }
 
+static abi_long do_unimplemented(int num)
+{
+gemu_log("qemu: Unsupported syscall: %d\n", num);
+return -TARGET_ENOSYS;
+}
+
 /* This is an internal helper for do_syscall so that it is easier
  * to have a single return point, so that actions, such as logging
  * of syscall results, can be performed.
@@ -8342,11 +8348,11 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #endif
 #ifdef TARGET_NR_break
 case TARGET_NR_break:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 #ifdef TARGET_NR_oldstat
 case TARGET_NR_oldstat:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 case TARGET_NR_lseek:
 if (is_hostfd(arg1)) {
@@ -8436,14 +8442,14 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 }
 #endif
 case TARGET_NR_ptrace:
-goto unimplemented;
+return do_unimplemented(num);
 #ifdef TARGET_NR_alarm /* not on alpha */
 case TARGET_NR_alarm:
 return alarm(arg1);
 #endif
 #ifdef TARGET_NR_oldfstat
 case TARGET_NR_oldfstat:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 #ifdef TARGET_NR_pause /* not on alpha */
 case TARGET_NR_pause:
@@ -8522,11 +8528,11 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #endif
 #ifdef TARGET_NR_stty
 case TARGET_NR_stty:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 #ifdef TARGET_NR_gtty
 case TARGET_NR_gtty:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 #ifdef TARGET_NR_access
 case TARGET_NR_access:
@@ -8561,7 +8567,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #endif
 #ifdef TARGET_NR_ftime
 case TARGET_NR_ftime:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 case TARGET_NR_sync:
 sync();
@@ -8687,11 +8693,11 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 return ret;
 #ifdef TARGET_NR_prof
 case TARGET_NR_prof:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 #ifdef TARGET_NR_signal
 case TARGET_NR_signal:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 case TARGET_NR_acct:
 if (arg1 == 0) {
@@ -8715,7 +8721,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #endif
 #ifdef TARGET_NR_lock
 case TARGET_NR_lock:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 case TARGET_NR_ioctl:
 return do_ioctl(arg1, arg2, arg3);
@@ -8725,17 +8731,17 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #endif
 #ifdef TARGET_NR_mpx
 case TARGET_NR_mpx:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 case TARGET_NR_setpgid:
 return get_errno(setpgid(arg1, arg2));
 #ifdef TARGET_NR_ulimit
 case TARGET_NR_ulimit:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 #ifdef TARGET_NR_oldolduname
 case TARGET_NR_oldolduname:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 case TARGET_NR_umask:
 return get_errno(umask(arg1));
@@ -8747,7 +8753,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 return ret;
 #ifdef TARGET_NR_ustat
 case TARGET_NR_ustat:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 #ifdef TARGET_NR_dup2
 case TARGET_NR_dup2:
@@ -9471,7 +9477,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #endif
 #ifdef TARGET_NR_oldlstat
 case TARGET_NR_oldlstat:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 #ifdef TARGET_NR_readlink
 case TARGET_NR_readlink:
@@ -9536,7 +9542,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #endif
 #ifdef TARGET_NR_uselib
 case TARGET_NR_uselib:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 #ifdef TARGET_NR_swapon
 case TARGET_NR_swapon:
@@ -9561,7 +9567,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 return ret;
 #ifdef TARGET_NR_readdir
 case TARGET_NR_readdir:
-goto unimplemented;
+return do_unimplemented(num);
 #endif
 #ifdef TARGET_NR_mmap
 case TARGET_NR_mmap:
@@ -9699,7 +9705,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 return get_errno(setpriority(arg1, arg2, arg3));
 #ifdef TARGET_NR_pro

[Qemu-devel] [PATCH 10/33] linux-user: Split out brk, close, exit, read, write

2018-06-01 Thread Richard Henderson
These are relatively simple unconditionally defined syscalls.

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 198 ---
 1 file changed, 111 insertions(+), 87 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index fc3dc3f40d..b0d268dab7 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7984,6 +7984,112 @@ IMPL(enosys)
 return do_unimplemented(num);
 }
 
+IMPL(brk)
+{
+return do_brk(arg1);
+}
+
+IMPL(close)
+{
+if (is_hostfd(arg1)) {
+return -TARGET_EBADF;
+}
+fd_trans_unregister(arg1);
+return get_errno(close(arg1));
+}
+
+IMPL(exit)
+{
+CPUState *cpu = ENV_GET_CPU(cpu_env);
+
+/* In old applications this may be used to implement _exit(2).
+   However in threaded applications it is used for thread termination,
+   and _exit_group is used for application termination.
+   Do thread termination if we have more than one thread.  */
+if (block_signals()) {
+return -TARGET_ERESTARTSYS;
+}
+
+cpu_list_lock();
+
+if (CPU_NEXT(first_cpu)) {
+/* Remove the CPU from the list.  */
+QTAILQ_REMOVE(&cpus, cpu, node);
+cpu_list_unlock();
+
+TaskState *ts = cpu->opaque;
+if (ts->child_tidptr) {
+put_user_u32(0, ts->child_tidptr);
+sys_futex(g2h(ts->child_tidptr), FUTEX_WAKE, INT_MAX,
+  NULL, NULL, 0);
+}
+thread_cpu = NULL;
+object_unref(OBJECT(cpu));
+g_free(ts);
+rcu_unregister_thread();
+pthread_exit(NULL);
+} else {
+cpu_list_unlock();
+
+#ifdef TARGET_GPROF
+_mcleanup();
+#endif
+gdb_exit(cpu_env, arg1);
+_exit(arg1);
+}
+g_assert_not_reached();
+}
+
+IMPL(read)
+{
+abi_long ret;
+char *fn;
+
+if (arg3 == 0) {
+return 0;
+}
+if (is_hostfd(arg1)) {
+return -TARGET_EBADF;
+}
+fn = lock_user(VERIFY_WRITE, arg2, arg3, 0);
+if (!fn) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(safe_read(arg1, fn, arg3));
+if (ret >= 0 && fd_trans_host_to_target_data(arg1)) {
+ret = fd_trans_host_to_target_data(arg1)(fn, ret);
+}
+unlock_user(fn, arg2, ret);
+return ret;
+}
+
+IMPL(write)
+{
+abi_long ret;
+char *fn;
+
+if (is_hostfd(arg1)) {
+return -TARGET_EBADF;
+}
+fn = lock_user(VERIFY_READ, arg2, arg3, 1);
+if (!fn) {
+return -TARGET_EFAULT;
+}
+if (fd_trans_target_to_host_data(arg1)) {
+void *copy = g_malloc(arg3);
+memcpy(copy, fn, arg3);
+ret = fd_trans_target_to_host_data(arg1)(copy, arg3);
+if (ret >= 0) {
+ret = get_errno(safe_write(arg1, copy, ret));
+}
+g_free(copy);
+} else {
+ret = get_errno(safe_write(arg1, fn, arg3));
+}
+unlock_user(fn, arg2, ret);
+return ret;
+}
+
 /* This is an internal helper for do_syscall so that it is easier
  * to have a single return point, so that actions, such as logging
  * of syscall results, can be performed.
@@ -7999,83 +8105,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-case TARGET_NR_exit:
-/* In old applications this may be used to implement _exit(2).
-   However in threaded applictions it is used for thread termination,
-   and _exit_group is used for application termination.
-   Do thread termination if we have more then one thread.  */
-
-if (block_signals()) {
-return -TARGET_ERESTARTSYS;
-}
-
-cpu_list_lock();
-
-if (CPU_NEXT(first_cpu)) {
-TaskState *ts;
-
-/* Remove the CPU from the list.  */
-QTAILQ_REMOVE(&cpus, cpu, node);
-
-cpu_list_unlock();
-
-ts = cpu->opaque;
-if (ts->child_tidptr) {
-put_user_u32(0, ts->child_tidptr);
-sys_futex(g2h(ts->child_tidptr), FUTEX_WAKE, INT_MAX,
-  NULL, NULL, 0);
-}
-thread_cpu = NULL;
-object_unref(OBJECT(cpu));
-g_free(ts);
-rcu_unregister_thread();
-pthread_exit(NULL);
-}
-
-cpu_list_unlock();
-#ifdef TARGET_GPROF
-_mcleanup();
-#endif
-gdb_exit(cpu_env, arg1);
-_exit(arg1);
-return 0; /* avoid warning */
-case TARGET_NR_read:
-if (arg3 == 0) {
-return 0;
-} else {
-if (is_hostfd(arg1)) {
-return -TARGET_EBADF;
-}
-if (!(p = lock_user(VERIFY_WRITE, arg2, arg3, 0)))
-return -TARGET_EFAULT;
-ret = get_errno(safe_read(arg1, p, arg3));
-if (ret >= 0 &&
-fd_trans_host_to_target_data(arg1)) {
-ret = fd_trans_host_to_target_data(arg1)(p, ret);
-}
-unlock_user(p, arg2, ret);
-

[Qemu-devel] [PATCH 13/33] linux-user: Split out name_to_handle_at

2018-06-01 Thread Richard Henderson
At the same time, merge do_name_to_handle_at into the new function.

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 129 +--
 1 file changed, 64 insertions(+), 65 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index fb1a8a4e7e..4afc22c20c 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7369,63 +7369,6 @@ static int do_futex(target_ulong uaddr, int op, int val, target_ulong timeout,
 return -TARGET_ENOSYS;
 }
 }
-#if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
-static abi_long do_name_to_handle_at(abi_long dirfd, abi_long pathname,
- abi_long handle, abi_long mount_id,
- abi_long flags)
-{
-struct file_handle *target_fh;
-struct file_handle *fh;
-int mid = 0;
-abi_long ret;
-char *name;
-unsigned int size, total_size;
-
-if (get_user_s32(size, handle)) {
-return -TARGET_EFAULT;
-}
-
-name = lock_user_string(pathname);
-if (!name) {
-return -TARGET_EFAULT;
-}
-
-total_size = sizeof(struct file_handle) + size;
-target_fh = lock_user(VERIFY_WRITE, handle, total_size, 0);
-if (!target_fh) {
-unlock_user(name, pathname, 0);
-return -TARGET_EFAULT;
-}
-
-fh = g_malloc0(total_size);
-fh->handle_bytes = size;
-
-TRY_INTERP_FD(ret, name,
-  name_to_handle_at(interp_dirfd, name + 1, fh, &mid, flags),
-  name_to_handle_at(dirfd, name, fh, &mid, flags));
-ret = get_errno(ret);
-unlock_user(name, pathname, 0);
-
-/* man name_to_handle_at(2):
- * Other than the use of the handle_bytes field, the caller should treat
- * the file_handle structure as an opaque data type
- */
-
-memcpy(target_fh, fh, total_size);
-target_fh->handle_bytes = tswap32(fh->handle_bytes);
-target_fh->handle_type = tswap32(fh->handle_type);
-g_free(fh);
-unlock_user(target_fh, handle, total_size);
-
-if (put_user_s32(mid, mount_id)) {
-return -TARGET_EFAULT;
-}
-
-return ret;
-
-}
-#endif
-
 #if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 static abi_long do_open_by_handle_at(abi_long mount_fd, abi_long handle,
  abi_long flags)
@@ -8145,6 +8088,67 @@ IMPL(exit)
 g_assert_not_reached();
 }
 
+#if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
+IMPL(name_to_handle_at)
+{
+abi_long dirfd = arg1;
+abi_long pathname = arg2;
+abi_long handle = arg3;
+abi_long mount_id = arg4;
+abi_long flags = arg5;
+struct file_handle *target_fh;
+struct file_handle *fh;
+int mid = 0;
+abi_long ret;
+char *name;
+unsigned int size, total_size;
+
+if (is_hostfd(dirfd)) {
+return -TARGET_EBADF;
+}
+if (get_user_s32(size, handle)) {
+return -TARGET_EFAULT;
+}
+
+name = lock_user_string(pathname);
+if (!name) {
+return -TARGET_EFAULT;
+}
+
+total_size = sizeof(struct file_handle) + size;
+target_fh = lock_user(VERIFY_WRITE, handle, total_size, 0);
+if (!target_fh) {
+unlock_user(name, pathname, 0);
+return -TARGET_EFAULT;
+}
+
+fh = g_malloc0(total_size);
+fh->handle_bytes = size;
+
+TRY_INTERP_FD(ret, name,
+  name_to_handle_at(interp_dirfd, name + 1, fh, &mid, flags),
+  name_to_handle_at(dirfd, name, fh, &mid, flags));
+ret = get_errno(ret);
+unlock_user(name, pathname, 0);
+
+/* man name_to_handle_at(2):
+ * Other than the use of the handle_bytes field, the caller should treat
+ * the file_handle structure as an opaque data type
+ */
+
+memcpy(target_fh, fh, total_size);
+target_fh->handle_bytes = tswap32(fh->handle_bytes);
+target_fh->handle_type = tswap32(fh->handle_type);
+g_free(fh);
+unlock_user(target_fh, handle, total_size);
+
+if (put_user_s32(mid, mount_id)) {
+return -TARGET_EFAULT;
+}
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_open
 IMPL(open)
 {
@@ -8248,14 +8252,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
-case TARGET_NR_name_to_handle_at:
-if (is_hostfd(arg1)) {
-return -TARGET_EBADF;
-}
-ret = do_name_to_handle_at(arg1, arg2, arg3, arg4, arg5);
-return ret;
-#endif
 #if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 case TARGET_NR_open_by_handle_at:
 if (is_hostfd(arg1)) {
@@ -12941,6 +12937,9 @@ static impl_fn * const syscall_table[] = {
 [TARGET_NR_close] = impl_close,
 [TARGET_NR_execve] = impl_execve,
 [TARGET_NR_exit] = impl_exit,
+#if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
+[TARGET_NR_name_to_handle_at] 

[Qemu-devel] [PATCH 16/33] linux-user: Split out link, linkat

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 77 +---
 1 file changed, 43 insertions(+), 34 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index e208f8647a..b5736436f8 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8078,6 +8078,43 @@ IMPL(fork)
 }
 #endif
 
+#ifdef TARGET_NR_link
+IMPL(link)
+{
+char *p1 = lock_user_string(arg1);
+char *p2 = lock_user_string(arg2);
+abi_long ret = -TARGET_EFAULT;
+
+if (p1 && p2) {
+ret = get_errno(link(p1, p2));
+}
+unlock_user(p1, arg1, 0);
+unlock_user(p2, arg2, 0);
+return ret;
+}
+#endif
+
+#if defined(TARGET_NR_linkat)
+IMPL(linkat)
+{
+char *p1, *p2;
+abi_long ret;
+
+if (is_hostfd(arg1)) {
+return -TARGET_EBADF;
+}
+p1 = lock_user_string(arg2);
+p2 = lock_user_string(arg4);
+ret = -TARGET_EFAULT;
+if (p1 && p2) {
+ret = get_errno(linkat(arg1, p1, arg3, p2, arg5));
+}
+unlock_user(p1, arg2, 0);
+unlock_user(p2, arg4, 0);
+return ret;
+}
+#endif
+
 #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 IMPL(name_to_handle_at)
 {
@@ -8315,40 +8352,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#ifdef TARGET_NR_link
-case TARGET_NR_link:
-{
-void * p2;
-p = lock_user_string(arg1);
-p2 = lock_user_string(arg2);
-if (!p || !p2)
-ret = -TARGET_EFAULT;
-else
-ret = get_errno(link(p, p2));
-unlock_user(p2, arg2, 0);
-unlock_user(p, arg1, 0);
-}
-return ret;
-#endif
-#if defined(TARGET_NR_linkat)
-case TARGET_NR_linkat:
-if (is_hostfd(arg1)) {
-return -TARGET_EBADF;
-} else {
-void * p2 = NULL;
-if (!arg2 || !arg4)
-return -TARGET_EFAULT;
-p  = lock_user_string(arg2);
-p2 = lock_user_string(arg4);
-if (!p || !p2)
-ret = -TARGET_EFAULT;
-else
-ret = get_errno(linkat(arg1, p, arg3, p2, arg5));
-unlock_user(p, arg2, 0);
-unlock_user(p2, arg4, 0);
-}
-return ret;
-#endif
 #ifdef TARGET_NR_unlink
 case TARGET_NR_unlink:
 if (!(p = lock_user_string(arg1)))
@@ -12958,6 +12961,12 @@ static impl_fn * const syscall_table[] = {
 #ifdef TARGET_NR_fork
 [TARGET_NR_fork] = impl_fork,
 #endif
+#ifdef TARGET_NR_link
+[TARGET_NR_link] = impl_link,
+#endif
+#if defined(TARGET_NR_linkat)
+[TARGET_NR_linkat] = impl_linkat,
+#endif
 #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 [TARGET_NR_name_to_handle_at] = impl_name_to_handle_at,
 #endif
-- 
2.17.0




[Qemu-devel] [PATCH 02/33] linux-user: Relax single exit from "break"

2018-06-01 Thread Richard Henderson
Transform outermost "break" to "return ret".  If the immediately
preceding statement was an assignment to ret, return the value
directly.
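
As a minimal sketch of this transformation, using a hypothetical two-case switch rather than the real do_syscall1 (the function names and case numbers below are made up for illustration):

```c
#include <errno.h>

/* Before: each case assigns ret and breaks to a single exit point. */
long handle_before(int num, long arg)
{
    long ret;

    switch (num) {
    case 0:
        ret = arg + 1;
        break;
    case 1:
        if (arg < 0) {
            ret = -EINVAL;
            break;
        }
        ret = arg * 2;
        break;
    default:
        ret = -ENOSYS;
        break;
    }
    return ret;
}

/* After: the outermost "break" becomes "return ret", and an assignment
   immediately followed by that return collapses to returning the value. */
long handle_after(int num, long arg)
{
    switch (num) {
    case 0:
        return arg + 1;
    case 1:
        if (arg < 0) {
            return -EINVAL;
        }
        return arg * 2;
    default:
        return -ENOSYS;
    }
}
```

Both versions return the same value for every input; the second no longer depends on a shared exit label, which is what later allows each case to be lifted into its own function.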

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 969 +--
 1 file changed, 390 insertions(+), 579 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index ebaefebcc2..258aff0411 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7987,8 +7987,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
Do thread termination if we have more then one thread.  */
 
 if (block_signals()) {
-ret = -TARGET_ERESTARTSYS;
-break;
+return -TARGET_ERESTARTSYS;
 }
 
 cpu_list_lock();
@@ -8020,12 +8019,11 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #endif
 gdb_exit(cpu_env, arg1);
 _exit(arg1);
-ret = 0; /* avoid warning */
-break;
+return 0; /* avoid warning */
 case TARGET_NR_read:
-if (arg3 == 0)
-ret = 0;
-else {
+if (arg3 == 0) {
+return 0;
+} else {
 if (is_hostfd(arg1)) {
 goto ebadf;
 }
@@ -8038,7 +8036,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 }
 unlock_user(p, arg2, ret);
 }
-break;
+return ret;
 case TARGET_NR_write:
 if (is_hostfd(arg1)) {
 goto ebadf;
@@ -8057,7 +8055,8 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 ret = get_errno(safe_write(arg1, p, arg3));
 }
 unlock_user(p, arg2, 0);
-break;
+return ret;
+
 #ifdef TARGET_NR_open
 case TARGET_NR_open:
 if (!(p = lock_user_string(arg1)))
@@ -8067,7 +8066,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
   arg3));
 fd_trans_unregister(ret);
 unlock_user(p, arg1, 0);
-break;
+return ret;
 #endif
 case TARGET_NR_openat:
 if (is_hostfd(arg1)) {
@@ -8080,14 +8079,14 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
   arg4));
 fd_trans_unregister(ret);
 unlock_user(p, arg2, 0);
-break;
+return ret;
 #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 case TARGET_NR_name_to_handle_at:
 if (is_hostfd(arg1)) {
 goto ebadf;
 }
 ret = do_name_to_handle_at(arg1, arg2, arg3, arg4, arg5);
-break;
+return ret;
 #endif
 #if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 case TARGET_NR_open_by_handle_at:
@@ -8096,22 +8095,20 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 }
 ret = do_open_by_handle_at(arg1, arg2, arg3);
 fd_trans_unregister(ret);
-break;
+return ret;
 #endif
 case TARGET_NR_close:
 if (is_hostfd(arg1)) {
 goto ebadf;
 }
 fd_trans_unregister(arg1);
-ret = get_errno(close(arg1));
-break;
+return get_errno(close(arg1));
+
 case TARGET_NR_brk:
-ret = do_brk(arg1);
-break;
+return do_brk(arg1);
 #ifdef TARGET_NR_fork
 case TARGET_NR_fork:
-ret = get_errno(do_fork(cpu_env, TARGET_SIGCHLD, 0, 0, 0, 0));
-break;
+return get_errno(do_fork(cpu_env, TARGET_SIGCHLD, 0, 0, 0, 0));
 #endif
 #ifdef TARGET_NR_waitpid
 case TARGET_NR_waitpid:
@@ -8122,7 +8119,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 && put_user_s32(host_to_target_waitstatus(status), arg2))
 goto efault;
 }
-break;
+return ret;
 #endif
 #ifdef TARGET_NR_waitid
 case TARGET_NR_waitid:
@@ -8137,7 +8134,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 unlock_user(p, arg3, sizeof(target_siginfo_t));
 }
 }
-break;
+return ret;
 #endif
 #ifdef TARGET_NR_creat /* not on alpha */
 case TARGET_NR_creat:
@@ -8146,7 +8143,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 ret = get_errno(creat(p, arg2));
 fd_trans_unregister(ret);
 unlock_user(p, arg1, 0);
-break;
+return ret;
 #endif
 #ifdef TARGET_NR_link
 case TARGET_NR_link:
@@ -8161,7 +8158,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 unlock_user(p2, arg2, 0);
 unlock_user(p, arg1, 0);
 }
-break;
+return ret;
 #endif
 #if defined(TARGET_NR_linkat)
 case TARGET_NR_linkat:
@@ -8180,7 +8177,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 unlock_user(p, arg2, 0);
 unlock_user(p

[Qemu-devel] [PATCH 04/33] linux-user: Propagate goto efault to return

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 311 +--
 1 file changed, 154 insertions(+), 157 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index d0bf650c62..8ea2099001 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8028,7 +8028,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 return -TARGET_EBADF;
 }
 if (!(p = lock_user(VERIFY_WRITE, arg2, arg3, 0)))
-goto efault;
+return -TARGET_EFAULT;
 ret = get_errno(safe_read(arg1, p, arg3));
 if (ret >= 0 &&
 fd_trans_host_to_target_data(arg1)) {
@@ -8042,7 +8042,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 return -TARGET_EBADF;
 }
 if (!(p = lock_user(VERIFY_READ, arg2, arg3, 1)))
-goto efault;
+return -TARGET_EFAULT;
 if (fd_trans_target_to_host_data(arg1)) {
 void *copy = g_malloc(arg3);
 memcpy(copy, p, arg3);
@@ -8060,7 +8060,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #ifdef TARGET_NR_open
 case TARGET_NR_open:
 if (!(p = lock_user_string(arg1)))
-goto efault;
+return -TARGET_EFAULT;
 ret = get_errno(do_openat(cpu_env, AT_FDCWD, p,
   target_to_host_bitmask(arg2, fcntl_flags_tbl),
   arg3));
@@ -8073,7 +8073,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 return -TARGET_EBADF;
 }
 if (!(p = lock_user_string(arg2)))
-goto efault;
+return -TARGET_EFAULT;
 ret = get_errno(do_openat(cpu_env, arg1, p,
   target_to_host_bitmask(arg3, fcntl_flags_tbl),
   arg4));
@@ -8117,7 +8117,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 ret = get_errno(safe_wait4(arg1, &status, arg3, 0));
 if (!is_error(ret) && arg2 && ret
 && put_user_s32(host_to_target_waitstatus(status), arg2))
-goto efault;
+return -TARGET_EFAULT;
 }
 return ret;
 #endif
@@ -8129,7 +8129,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 ret = get_errno(safe_waitid(arg1, arg2, &info, arg4, NULL));
 if (!is_error(ret) && arg3 && info.si_pid != 0) {
 if (!(p = lock_user(VERIFY_WRITE, arg3, sizeof(target_siginfo_t), 0)))
-goto efault;
+return -TARGET_EFAULT;
 host_to_target_siginfo(p, &info);
 unlock_user(p, arg3, sizeof(target_siginfo_t));
 }
@@ -8139,7 +8139,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #ifdef TARGET_NR_creat /* not on alpha */
 case TARGET_NR_creat:
 if (!(p = lock_user_string(arg1)))
-goto efault;
+return -TARGET_EFAULT;
 ret = get_errno(creat(p, arg2));
 fd_trans_unregister(ret);
 unlock_user(p, arg1, 0);
@@ -8167,7 +8167,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 } else {
 void * p2 = NULL;
 if (!arg2 || !arg4)
-goto efault;
+return -TARGET_EFAULT;
 p  = lock_user_string(arg2);
 p2 = lock_user_string(arg4);
 if (!p || !p2)
@@ -8182,7 +8182,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #ifdef TARGET_NR_unlink
 case TARGET_NR_unlink:
 if (!(p = lock_user_string(arg1)))
-goto efault;
+return -TARGET_EFAULT;
 ret = get_errno(unlink(p));
 unlock_user(p, arg1, 0);
 return ret;
@@ -8193,7 +8193,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 return -TARGET_EBADF;
 }
 if (!(p = lock_user_string(arg2)))
-goto efault;
+return -TARGET_EFAULT;
 ret = get_errno(unlinkat(arg1, p, arg3));
 unlock_user(p, arg2, 0);
 return ret;
@@ -8213,7 +8213,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 guest_argp = arg2;
 for (gp = guest_argp; gp; gp += sizeof(abi_ulong)) {
 if (get_user_ual(addr, gp))
-goto efault;
+return -TARGET_EFAULT;
 if (!addr)
 break;
 argc++;
@@ -8222,7 +8222,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 guest_envp = arg3;
 for (gp = guest_envp; gp; gp += sizeof(abi_ulong)) {
 if (get_user_ual(addr, gp))
-goto efault;
+return -TARGET_EFAULT;

[Qemu-devel] [PATCH 25/33] linux-user: Split out dup, mkdir, mkdirat, rmdir

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 109 +--
 1 file changed, 73 insertions(+), 36 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 24514329b0..36092d753d 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7977,6 +7977,20 @@ IMPL(creat)
 }
 #endif
 
+IMPL(dup)
+{
+abi_long ret;
+
+if (is_hostfd(arg1)) {
+return -TARGET_EBADF;
+}
+ret = get_errno(dup(arg1));
+if (ret >= 0) {
+fd_trans_dup(arg1, ret);
+}
+return ret;
+}
+
 IMPL(execve)
 {
 abi_ulong *guest_ptrs;
@@ -8249,6 +8263,40 @@ IMPL(lseek)
 return get_errno(lseek(arg1, arg2, arg3));
 }
 
+#ifdef TARGET_NR_mkdir
+IMPL(mkdir)
+{
+char *p = lock_user_string(arg1);
+abi_long ret;
+
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(mkdir(p, arg2));
+unlock_user(p, arg1, 0);
+return ret;
+}
+#endif
+
+#ifdef TARGET_NR_mkdirat
+IMPL(mkdirat)
+{
+char *p;
+abi_long ret;
+
+if (is_hostfd(arg1)) {
+return -TARGET_EBADF;
+}
+p = lock_user_string(arg2);
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(mkdirat(arg1, p, arg3));
+unlock_user(p, arg2, 0);
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_mknod
 IMPL(mknod)
 {
@@ -8558,6 +8606,21 @@ IMPL(renameat2)
 }
 #endif
 
+#ifdef TARGET_NR_rmdir
+IMPL(rmdir)
+{
+char *p = lock_user_string(arg1);
+abi_long ret;
+
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(rmdir(p));
+unlock_user(p, arg1, 0);
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_stime
 IMPL(stime)
 {
@@ -8768,42 +8831,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#ifdef TARGET_NR_mkdir
-case TARGET_NR_mkdir:
-if (!(p = lock_user_string(arg1)))
-return -TARGET_EFAULT;
-ret = get_errno(mkdir(p, arg2));
-unlock_user(p, arg1, 0);
-return ret;
-#endif
-#if defined(TARGET_NR_mkdirat)
-case TARGET_NR_mkdirat:
-if (is_hostfd(arg1)) {
-return -TARGET_EBADF;
-}
-if (!(p = lock_user_string(arg2)))
-return -TARGET_EFAULT;
-ret = get_errno(mkdirat(arg1, p, arg3));
-unlock_user(p, arg2, 0);
-return ret;
-#endif
-#ifdef TARGET_NR_rmdir
-case TARGET_NR_rmdir:
-if (!(p = lock_user_string(arg1)))
-return -TARGET_EFAULT;
-ret = get_errno(rmdir(p));
-unlock_user(p, arg1, 0);
-return ret;
-#endif
-case TARGET_NR_dup:
-if (is_hostfd(arg1)) {
-return -TARGET_EBADF;
-}
-ret = get_errno(dup(arg1));
-if (ret >= 0) {
-fd_trans_dup(arg1, ret);
-}
-return ret;
 #ifdef TARGET_NR_pipe
 case TARGET_NR_pipe:
 return do_pipe(cpu_env, arg1, 0, 0);
@@ -12922,6 +12949,7 @@ static impl_fn * const syscall_table[] = {
 #ifdef TARGET_NR_creat
 [TARGET_NR_creat] = impl_creat,
 #endif
+[TARGET_NR_dup] = impl_dup,
 [TARGET_NR_execve] = impl_execve,
 [TARGET_NR_exit] = impl_exit,
 #ifdef TARGET_NR_faccessat
@@ -12947,6 +12975,12 @@ static impl_fn * const syscall_table[] = {
 [TARGET_NR_linkat] = impl_linkat,
 #endif
 [TARGET_NR_lseek] = impl_lseek,
+#ifdef TARGET_NR_mkdir
+[TARGET_NR_mkdir] = impl_mkdir,
+#endif
+#ifdef TARGET_NR_mkdirat
+[TARGET_NR_mkdirat] = impl_mkdirat,
+#endif
 #ifdef TARGET_NR_mknod
 [TARGET_NR_mknod] = impl_mknod,
 #endif
@@ -12980,6 +13014,9 @@ static impl_fn * const syscall_table[] = {
 #ifdef TARGET_NR_renameat2
 [TARGET_NR_renameat2] = impl_renameat2,
 #endif
+#ifdef TARGET_NR_rmdir
+[TARGET_NR_rmdir] = impl_rmdir,
+#endif
 #ifdef TARGET_NR_stime
 [TARGET_NR_stime] = impl_stime,
 #endif
-- 
2.17.0




[Qemu-devel] [PATCH 15/33] linux-user: Split out creat, fork, waitid, waitpid

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 108 +++
 1 file changed, 69 insertions(+), 39 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 48bb1c0231..e208f8647a 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7908,6 +7908,22 @@ IMPL(close)
 return get_errno(close(arg1));
 }
 
+#ifdef TARGET_NR_creat
+IMPL(creat)
+{
+char *p = lock_user_string(arg1);
+abi_long ret;
+
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(creat(p, arg2));
+fd_trans_unregister(ret);
+unlock_user(p, arg1, 0);
+return ret;
+}
+#endif
+
 IMPL(execve)
 {
 abi_ulong *guest_ptrs;
@@ -8055,6 +8071,13 @@ IMPL(exit)
 g_assert_not_reached();
 }
 
+#ifdef TARGET_NR_fork
+IMPL(fork)
+{
+return get_errno(do_fork(cpu_env, TARGET_SIGCHLD, 0, 0, 0, 0));
+}
+#endif
+
 #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 IMPL(name_to_handle_at)
 {
@@ -8216,6 +8239,40 @@ IMPL(read)
 return ret;
 }
 
+#ifdef TARGET_NR_waitid
+IMPL(waitid)
+{
+siginfo_t info;
+abi_long ret;
+
+info.si_pid = 0;
+ret = get_errno(safe_waitid(arg1, arg2, &info, arg4, NULL));
+if (!is_error(ret) && arg3 && info.si_pid != 0) {
+target_siginfo_t *p
+= lock_user(VERIFY_WRITE, arg3, sizeof(target_siginfo_t), 0);
+if (!p) {
+return -TARGET_EFAULT;
+}
+host_to_target_siginfo(p, &info);
+unlock_user(p, arg3, sizeof(target_siginfo_t));
+}
+return ret;
+}
+#endif
+
+#ifdef TARGET_NR_waitpid
+IMPL(waitpid)
+{
+int status;
+abi_long ret = get_errno(safe_wait4(arg1, &status, arg3, 0));
+if (!is_error(ret) && arg2 && ret &&
+put_user_s32(host_to_target_waitstatus(status), arg2)) {
+return -TARGET_EFAULT;
+}
+return ret;
+}
+#endif
+
 IMPL(write)
 {
 abi_long ret;
@@ -8258,45 +8315,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#ifdef TARGET_NR_fork
-case TARGET_NR_fork:
-return get_errno(do_fork(cpu_env, TARGET_SIGCHLD, 0, 0, 0, 0));
-#endif
-#ifdef TARGET_NR_waitpid
-case TARGET_NR_waitpid:
-{
-int status;
-ret = get_errno(safe_wait4(arg1, &status, arg3, 0));
-if (!is_error(ret) && arg2 && ret
-&& put_user_s32(host_to_target_waitstatus(status), arg2))
-return -TARGET_EFAULT;
-}
-return ret;
-#endif
-#ifdef TARGET_NR_waitid
-case TARGET_NR_waitid:
-{
-siginfo_t info;
-info.si_pid = 0;
-ret = get_errno(safe_waitid(arg1, arg2, &info, arg4, NULL));
-if (!is_error(ret) && arg3 && info.si_pid != 0) {
-if (!(p = lock_user(VERIFY_WRITE, arg3, sizeof(target_siginfo_t), 0)))
-return -TARGET_EFAULT;
-host_to_target_siginfo(p, &info);
-unlock_user(p, arg3, sizeof(target_siginfo_t));
-}
-}
-return ret;
-#endif
-#ifdef TARGET_NR_creat /* not on alpha */
-case TARGET_NR_creat:
-if (!(p = lock_user_string(arg1)))
-return -TARGET_EFAULT;
-ret = get_errno(creat(p, arg2));
-fd_trans_unregister(ret);
-unlock_user(p, arg1, 0);
-return ret;
-#endif
 #ifdef TARGET_NR_link
 case TARGET_NR_link:
 {
@@ -12932,8 +12950,14 @@ IMPL(everything_else)
 static impl_fn * const syscall_table[] = {
 [TARGET_NR_brk] = impl_brk,
 [TARGET_NR_close] = impl_close,
+#ifdef TARGET_NR_creat
+[TARGET_NR_creat] = impl_creat,
+#endif
 [TARGET_NR_execve] = impl_execve,
 [TARGET_NR_exit] = impl_exit,
+#ifdef TARGET_NR_fork
+[TARGET_NR_fork] = impl_fork,
+#endif
 #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 [TARGET_NR_name_to_handle_at] = impl_name_to_handle_at,
 #endif
@@ -12945,6 +12969,12 @@ static impl_fn * const syscall_table[] = {
 [TARGET_NR_open_by_handle_at] = impl_open_by_handle_at,
 #endif
 [TARGET_NR_read] = impl_read,
+#ifdef TARGET_NR_waitid
+[TARGET_NR_waitid] = impl_waitid,
+#endif
+#ifdef TARGET_NR_waitpid
+[TARGET_NR_waitpid] = impl_waitpid,
+#endif
 [TARGET_NR_write] = impl_write,
 };
 
-- 
2.17.0




[Qemu-devel] [PATCH 14/33] linux-user: Split out open_by_handle_at

2018-06-01 Thread Richard Henderson
At the same time, merge do_open_by_handle_at into the new function.

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 84 ++--
 1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 4afc22c20c..48bb1c0231 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7369,39 +7369,6 @@ static int do_futex(target_ulong uaddr, int op, int val, target_ulong timeout,
 return -TARGET_ENOSYS;
 }
 }
-#if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
-static abi_long do_open_by_handle_at(abi_long mount_fd, abi_long handle,
- abi_long flags)
-{
-struct file_handle *target_fh;
-struct file_handle *fh;
-unsigned int size, total_size;
-abi_long ret;
-
-if (get_user_s32(size, handle)) {
-return -TARGET_EFAULT;
-}
-
-total_size = sizeof(struct file_handle) + size;
-target_fh = lock_user(VERIFY_READ, handle, total_size, 1);
-if (!target_fh) {
-return -TARGET_EFAULT;
-}
-
-fh = g_memdup(target_fh, total_size);
-fh->handle_bytes = size;
-fh->handle_type = tswap32(target_fh->handle_type);
-
-ret = get_errno(open_by_handle_at(mount_fd, fh,
-target_to_host_bitmask(flags, fcntl_flags_tbl)));
-
-g_free(fh);
-
-unlock_user(target_fh, handle, total_size);
-
-return ret;
-}
-#endif
 
 #if defined(TARGET_NR_signalfd) || defined(TARGET_NR_signalfd4)
 
@@ -8187,6 +8154,45 @@ IMPL(openat)
 return ret;
 }
 
+#if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
+IMPL(open_by_handle_at)
+{
+abi_long mount_fd = arg1;
+abi_long handle = arg2;
+abi_long flags = arg3;
+struct file_handle *target_fh;
+struct file_handle *fh;
+unsigned int size, total_size;
+abi_long ret;
+
+if (is_hostfd(mount_fd)) {
+return -TARGET_EBADF;
+}
+if (get_user_s32(size, handle)) {
+return -TARGET_EFAULT;
+}
+
+total_size = sizeof(struct file_handle) + size;
+target_fh = lock_user(VERIFY_READ, handle, total_size, 1);
+if (!target_fh) {
+return -TARGET_EFAULT;
+}
+
+fh = g_memdup(target_fh, total_size);
+fh->handle_bytes = size;
+fh->handle_type = tswap32(target_fh->handle_type);
+
+ret = get_errno(open_by_handle_at(mount_fd, fh,
+target_to_host_bitmask(flags, fcntl_flags_tbl)));
+
+g_free(fh);
+unlock_user(target_fh, handle, total_size);
+
+fd_trans_unregister(ret);
+return ret;
+}
+#endif
+
 IMPL(read)
 {
 abi_long ret;
@@ -8252,15 +8258,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
-case TARGET_NR_open_by_handle_at:
-if (is_hostfd(arg1)) {
-return -TARGET_EBADF;
-}
-ret = do_open_by_handle_at(arg1, arg2, arg3);
-fd_trans_unregister(ret);
-return ret;
-#endif
 #ifdef TARGET_NR_fork
 case TARGET_NR_fork:
 return get_errno(do_fork(cpu_env, TARGET_SIGCHLD, 0, 0, 0, 0));
@@ -12944,6 +12941,9 @@ static impl_fn * const syscall_table[] = {
 [TARGET_NR_open] = impl_open,
 #endif
 [TARGET_NR_openat] = impl_openat,
+#if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
+[TARGET_NR_open_by_handle_at] = impl_open_by_handle_at,
+#endif
 [TARGET_NR_read] = impl_read,
 [TARGET_NR_write] = impl_write,
 };
-- 
2.17.0




[Qemu-devel] [PATCH 17/33] linux-user: Split out unlink, unlinkat

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 59 ++--
 1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b5736436f8..bbe9d6d9fb 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8276,6 +8276,40 @@ IMPL(read)
 return ret;
 }
 
+#ifdef TARGET_NR_unlink
+IMPL(unlink)
+{
+char *p = lock_user_string(arg1);
+abi_long ret;
+
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(unlink(p));
+unlock_user(p, arg1, 0);
+return ret;
+}
+#endif
+
+#ifdef TARGET_NR_unlinkat
+IMPL(unlinkat)
+{
+char *p;
+abi_long ret;
+
+if (is_hostfd(arg1)) {
+return -TARGET_EBADF;
+}
+p = lock_user_string(arg2);
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(unlinkat(arg1, p, arg3));
+unlock_user(p, arg2, 0);
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_waitid
 IMPL(waitid)
 {
@@ -8352,25 +8386,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#ifdef TARGET_NR_unlink
-case TARGET_NR_unlink:
-if (!(p = lock_user_string(arg1)))
-return -TARGET_EFAULT;
-ret = get_errno(unlink(p));
-unlock_user(p, arg1, 0);
-return ret;
-#endif
-#if defined(TARGET_NR_unlinkat)
-case TARGET_NR_unlinkat:
-if (is_hostfd(arg1)) {
-return -TARGET_EBADF;
-}
-if (!(p = lock_user_string(arg2)))
-return -TARGET_EFAULT;
-ret = get_errno(unlinkat(arg1, p, arg3));
-unlock_user(p, arg2, 0);
-return ret;
-#endif
 case TARGET_NR_chdir:
 if (!(p = lock_user_string(arg1)))
 return -TARGET_EFAULT;
@@ -12978,6 +12993,12 @@ static impl_fn * const syscall_table[] = {
 [TARGET_NR_open_by_handle_at] = impl_open_by_handle_at,
 #endif
 [TARGET_NR_read] = impl_read,
+#ifdef TARGET_NR_unlink
+[TARGET_NR_unlink] = impl_unlink,
+#endif
+#ifdef TARGET_NR_unlinkat
+[TARGET_NR_unlinkat] = impl_unlinkat,
+#endif
 #ifdef TARGET_NR_waitid
 [TARGET_NR_waitid] = impl_waitid,
 #endif
-- 
2.17.0




[Qemu-devel] [PATCH 29/33] linux-user: Split out getpgrp, getppid, setsid

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 4d9b9cad6e..3dfb77ac11 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8182,6 +8182,13 @@ IMPL(futimesat)
 }
 #endif
 
+#ifdef TARGET_NR_getpgrp
+IMPL(getpgrp)
+{
+return get_errno(getpgrp());
+}
+#endif
+
 #ifdef TARGET_NR_getpid
 IMPL(getpid)
 {
@@ -8189,6 +8196,13 @@ IMPL(getpid)
 }
 #endif
 
+#ifdef TARGET_NR_getppid
+IMPL(getppid)
+{
+return get_errno(getppid());
+}
+#endif
+
 #if defined(TARGET_NR_getxpid) && defined(TARGET_ALPHA)
 IMPL(getxpid)
 {
@@ -8721,6 +8735,11 @@ IMPL(setpgid)
 return get_errno(setpgid(arg1, arg2));
 }
 
+IMPL(setsid)
+{
+return get_errno(setsid());
+}
+
 #ifdef TARGET_NR_stime
 IMPL(stime)
 {
@@ -8972,16 +8991,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#ifdef TARGET_NR_getppid /* not on alpha */
-case TARGET_NR_getppid:
-return get_errno(getppid());
-#endif
-#ifdef TARGET_NR_getpgrp
-case TARGET_NR_getpgrp:
-return get_errno(getpgrp());
-#endif
-case TARGET_NR_setsid:
-return get_errno(setsid());
 #ifdef TARGET_NR_sigaction
 case TARGET_NR_sigaction:
 {
@@ -13020,9 +13029,15 @@ static impl_fn * const syscall_table[] = {
 #ifdef TARGET_NR_futimesat
 [TARGET_NR_futimesat] = impl_futimesat,
 #endif
+#ifdef TARGET_NR_getpgrp
+[TARGET_NR_getpgrp] = impl_getpgrp,
+#endif
 #ifdef TARGET_NR_getpid
 [TARGET_NR_getpid] = impl_getpid,
 #endif
+#ifdef TARGET_NR_getppid
+[TARGET_NR_getppid] = impl_getppid,
+#endif
 #if defined(TARGET_NR_getxpid) && defined(TARGET_ALPHA)
 [TARGET_NR_getxpid] = impl_getxpid,
 #endif
@@ -13084,6 +13099,7 @@ static impl_fn * const syscall_table[] = {
 [TARGET_NR_rmdir] = impl_rmdir,
 #endif
 [TARGET_NR_setpgid] = impl_setpgid,
+[TARGET_NR_setsid] = impl_setsid,
 #ifdef TARGET_NR_stime
 [TARGET_NR_stime] = impl_stime,
 #endif
-- 
2.17.0




[Qemu-devel] [PATCH 19/33] linux-user: Remove all unimplemented entries

2018-06-01 Thread Richard Henderson
There is no reason to list these, since -ENOSYS is the default.
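
As a minimal sketch of why the explicit entries are redundant, assume the
dispatcher treats an empty table slot as "unimplemented". The names below
(demo_table, demo_dispatch, the DEMO_NR_* numbers) are hypothetical and are
not taken from linux-user/syscall.c; only the sparse, designated-initializer
table layout mirrors syscall_table in this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical syscall numbers, for illustration only. */
enum {
    DEMO_NR_read  = 3,
    DEMO_NR_write = 4,
    DEMO_NR_break = 17,   /* deliberately has no table entry */
    DEMO_NR_MAX   = 32
};

typedef long demo_impl_fn(long arg1);

static long demo_impl_read(long arg1)
{
    return arg1;
}

static long demo_impl_write(long arg1)
{
    return arg1 * 2;
}

/* Sparse table: any slot without an initializer is NULL. */
static demo_impl_fn * const demo_table[DEMO_NR_MAX] = {
    [DEMO_NR_read]  = demo_impl_read,
    [DEMO_NR_write] = demo_impl_write,
};

/* Out-of-range numbers and empty slots both fall back to -ENOSYS. */
static long demo_dispatch(int num, long arg1)
{
    if (num < 0 || num >= DEMO_NR_MAX || demo_table[num] == NULL) {
        return -ENOSYS;
    }
    return demo_table[num](arg1);
}

/* Usage: listed syscalls are routed, unlisted ones report -ENOSYS. */
static void demo_self_check(void)
{
    assert(demo_dispatch(DEMO_NR_read, 5) == 5);
    assert(demo_dispatch(DEMO_NR_write, 5) == 10);
    assert(demo_dispatch(DEMO_NR_break, 0) == -ENOSYS);
    assert(demo_dispatch(DEMO_NR_MAX + 1, 0) == -ENOSYS);
}
```

With a fallback of this kind, a "case X: return do_unimplemented(num);" arm
adds nothing that the default path does not already provide.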

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 140 ---
 1 file changed, 140 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 88e0da31ba..6a701ea8f6 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8460,14 +8460,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#ifdef TARGET_NR_break
-case TARGET_NR_break:
-return do_unimplemented(num);
-#endif
-#ifdef TARGET_NR_oldstat
-case TARGET_NR_oldstat:
-return do_unimplemented(num);
-#endif
 case TARGET_NR_lseek:
 if (is_hostfd(arg1)) {
 return -TARGET_EBADF;
@@ -8555,16 +8547,10 @@ IMPL(everything_else)
 return get_errno(stime(&host_time));
 }
 #endif
-case TARGET_NR_ptrace:
-return do_unimplemented(num);
 #ifdef TARGET_NR_alarm /* not on alpha */
 case TARGET_NR_alarm:
 return alarm(arg1);
 #endif
-#ifdef TARGET_NR_oldfstat
-case TARGET_NR_oldfstat:
-return do_unimplemented(num);
-#endif
 #ifdef TARGET_NR_pause /* not on alpha */
 case TARGET_NR_pause:
 if (!block_signals()) {
@@ -8640,14 +8626,6 @@ IMPL(everything_else)
 }
 return ret;
 #endif
-#ifdef TARGET_NR_stty
-case TARGET_NR_stty:
-return do_unimplemented(num);
-#endif
-#ifdef TARGET_NR_gtty
-case TARGET_NR_gtty:
-return do_unimplemented(num);
-#endif
 #ifdef TARGET_NR_access
 case TARGET_NR_access:
 if (!(fn = lock_user_string(arg1))) {
@@ -8678,10 +8656,6 @@ IMPL(everything_else)
 #ifdef TARGET_NR_nice /* not on alpha */
 case TARGET_NR_nice:
 return get_errno(nice(arg1));
-#endif
-#ifdef TARGET_NR_ftime
-case TARGET_NR_ftime:
-return do_unimplemented(num);
 #endif
 case TARGET_NR_sync:
 sync();
@@ -8805,14 +8779,6 @@ IMPL(everything_else)
 ret = host_to_target_clock_t(ret);
 }
 return ret;
-#ifdef TARGET_NR_prof
-case TARGET_NR_prof:
-return do_unimplemented(num);
-#endif
-#ifdef TARGET_NR_signal
-case TARGET_NR_signal:
-return do_unimplemented(num);
-#endif
 case TARGET_NR_acct:
 if (arg1 == 0) {
 ret = get_errno(acct(NULL));
@@ -8832,31 +8798,15 @@ IMPL(everything_else)
 ret = get_errno(umount2(p, arg2));
 unlock_user(p, arg1, 0);
 return ret;
-#endif
-#ifdef TARGET_NR_lock
-case TARGET_NR_lock:
-return do_unimplemented(num);
 #endif
 case TARGET_NR_ioctl:
 return do_ioctl(arg1, arg2, arg3);
 #ifdef TARGET_NR_fcntl
 case TARGET_NR_fcntl:
 return do_fcntl(arg1, arg2, arg3);
-#endif
-#ifdef TARGET_NR_mpx
-case TARGET_NR_mpx:
-return do_unimplemented(num);
 #endif
 case TARGET_NR_setpgid:
 return get_errno(setpgid(arg1, arg2));
-#ifdef TARGET_NR_ulimit
-case TARGET_NR_ulimit:
-return do_unimplemented(num);
-#endif
-#ifdef TARGET_NR_oldolduname
-case TARGET_NR_oldolduname:
-return do_unimplemented(num);
-#endif
 case TARGET_NR_umask:
 return get_errno(umask(arg1));
 case TARGET_NR_chroot:
@@ -8865,10 +8815,6 @@ IMPL(everything_else)
 ret = get_errno(chroot(p));
 unlock_user(p, arg1, 0);
 return ret;
-#ifdef TARGET_NR_ustat
-case TARGET_NR_ustat:
-return do_unimplemented(num);
-#endif
 #ifdef TARGET_NR_dup2
 case TARGET_NR_dup2:
 if (is_hostfd(arg1) || is_hostfd(arg2)) {
@@ -9585,10 +9531,6 @@ IMPL(everything_else)
 }
 return ret;
 #endif
-#ifdef TARGET_NR_oldlstat
-case TARGET_NR_oldlstat:
-return do_unimplemented(num);
-#endif
 #ifdef TARGET_NR_readlink
 case TARGET_NR_readlink:
 {
@@ -9650,10 +9592,6 @@ IMPL(everything_else)
 }
 return ret;
 #endif
-#ifdef TARGET_NR_uselib
-case TARGET_NR_uselib:
-return do_unimplemented(num);
-#endif
 #ifdef TARGET_NR_swapon
 case TARGET_NR_swapon:
 if (!(p = lock_user_string(arg1)))
@@ -9675,10 +9613,6 @@ IMPL(everything_else)
ret = get_errno(reboot(arg1, arg2, arg3, NULL));
 }
 return ret;
-#ifdef TARGET_NR_readdir
-case TARGET_NR_readdir:
-return do_unimplemented(num);
-#endif
 #ifdef TARGET_NR_mmap
 case TARGET_NR_mmap:
 #if (defined(TARGET_I386) && defined(TARGET_ABI32)) || \
@@ -9813,10 +9747,6 @@ IMPL(everything_else)
 return ret;
 case TARGET_NR_setpriority:
 return get_errno(setpriority(arg1, arg2, arg3));
-#ifdef TARGET_NR_profil
-case TARGET_NR_profil:
-return do_unimplemented(num);
-#endif
 case TARGET_NR_statfs:
 if (!(fn = lock_user_string(arg1))) {
 return -TARGET_EFAULT;
@@ -9892,10 +9822,6 @@ IMPL(everything_else)
 ret = get_errno(fstatfs(arg1, &stfs));
 goto convert_statfs64;
 #endif
-#ifdef TARGET_NR_ioperm
-case TARGET_NR_ioperm:

[Qemu-devel] [PATCH 18/33] linux-user: Split out chdir, mknod, mknodat, time, chmod

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 132 ---
 1 file changed, 87 insertions(+), 45 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index bbe9d6d9fb..88e0da31ba 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7899,6 +7899,34 @@ IMPL(brk)
 return do_brk(arg1);
 }
 
+IMPL(chdir)
+{
+char *p = lock_user_string(arg1);
+abi_long ret;
+
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(chdir(p));
+unlock_user(p, arg1, 0);
+return ret;
+}
+
+#ifdef TARGET_NR_chmod
+IMPL(chmod)
+{
+char *p = lock_user_string(arg1);
+abi_long ret;
+
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(chmod(p, arg2));
+unlock_user(p, arg1, 0);
+return ret;
+}
+#endif
+
 IMPL(close)
 {
 if (is_hostfd(arg1)) {
@@ -8115,6 +8143,40 @@ IMPL(linkat)
 }
 #endif
 
+#ifdef TARGET_NR_mknod
+IMPL(mknod)
+{
+char *p = lock_user_string(arg1);
+abi_long ret;
+
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(mknod(p, arg2, arg3));
+unlock_user(p, arg1, 0);
+return ret;
+}
+#endif
+
+#ifdef TARGET_NR_mknodat
+IMPL(mknodat)
+{
+char *p;
+abi_long ret;
+
+if (is_hostfd(arg1)) {
+return -TARGET_EBADF;
+}
+p = lock_user_string(arg2);
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(mknodat(arg1, p, arg3, arg4));
+unlock_user(p, arg2, 0);
+return ret;
+}
+#endif
+
 #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 IMPL(name_to_handle_at)
 {
@@ -8276,6 +8338,18 @@ IMPL(read)
 return ret;
 }
 
+#ifdef TARGET_NR_time
+IMPL(time)
+{
+time_t host_time;
+abi_long ret = get_errno(time(&host_time));
+if (!is_error(ret) && arg1 && put_user_sal(host_time, arg1)) {
+return -TARGET_EFAULT;
+}
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_unlink
 IMPL(unlink)
 {
@@ -8386,51 +8460,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-case TARGET_NR_chdir:
-if (!(p = lock_user_string(arg1)))
-return -TARGET_EFAULT;
-ret = get_errno(chdir(p));
-unlock_user(p, arg1, 0);
-return ret;
-#ifdef TARGET_NR_time
-case TARGET_NR_time:
-{
-time_t host_time;
-ret = get_errno(time(&host_time));
-if (!is_error(ret)
-&& arg1
-&& put_user_sal(host_time, arg1))
-return -TARGET_EFAULT;
-}
-return ret;
-#endif
-#ifdef TARGET_NR_mknod
-case TARGET_NR_mknod:
-if (!(p = lock_user_string(arg1)))
-return -TARGET_EFAULT;
-ret = get_errno(mknod(p, arg2, arg3));
-unlock_user(p, arg1, 0);
-return ret;
-#endif
-#if defined(TARGET_NR_mknodat)
-case TARGET_NR_mknodat:
-if (is_hostfd(arg1)) {
-return -TARGET_EBADF;
-}
-if (!(p = lock_user_string(arg2)))
-return -TARGET_EFAULT;
-ret = get_errno(mknodat(arg1, p, arg3, arg4));
-unlock_user(p, arg2, 0);
-return ret;
-#endif
-#ifdef TARGET_NR_chmod
-case TARGET_NR_chmod:
-if (!(p = lock_user_string(arg1)))
-return -TARGET_EFAULT;
-ret = get_errno(chmod(p, arg2));
-unlock_user(p, arg1, 0);
-return ret;
-#endif
 #ifdef TARGET_NR_break
 case TARGET_NR_break:
 return do_unimplemented(num);
@@ -12968,6 +12997,10 @@ IMPL(everything_else)
 static impl_fn * const syscall_table[] = {
 [TARGET_NR_brk] = impl_brk,
 [TARGET_NR_close] = impl_close,
+[TARGET_NR_chdir] = impl_chdir,
+#ifdef TARGET_NR_chmod
+[TARGET_NR_chmod] = impl_chmod,
+#endif
 #ifdef TARGET_NR_creat
 [TARGET_NR_creat] = impl_creat,
 #endif
@@ -12982,6 +13015,12 @@ static impl_fn * const syscall_table[] = {
 #if defined(TARGET_NR_linkat)
 [TARGET_NR_linkat] = impl_linkat,
 #endif
+#ifdef TARGET_NR_mknod
+[TARGET_NR_mknod] = impl_mknod,
+#endif
+#ifdef TARGET_NR_mknodat
+[TARGET_NR_mknodat] = impl_mknodat,
+#endif
 #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 [TARGET_NR_name_to_handle_at] = impl_name_to_handle_at,
 #endif
@@ -12993,6 +13032,9 @@ static impl_fn * const syscall_table[] = {
 [TARGET_NR_open_by_handle_at] = impl_open_by_handle_at,
 #endif
 [TARGET_NR_read] = impl_read,
+#ifdef TARGET_NR_time
+[TARGET_NR_time] = impl_time,
+#endif
 #ifdef TARGET_NR_unlink
 [TARGET_NR_unlink] = impl_unlink,
 #endif
-- 
2.17.0




[Qemu-devel] [PATCH 31/33] linux-user: Split out rt_sigprocmask, sgetmask, sigprocmask, ssetmask

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 294 +++
 1 file changed, 158 insertions(+), 136 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 36e2bb838e..e37a3ab643 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8805,6 +8805,65 @@ IMPL(rt_sigaction)
 return ret;
 }
 
+IMPL(rt_sigprocmask)
+{
+int how = 0;
+sigset_t set, oldset, *set_ptr = NULL;
+abi_long ret;
+target_sigset_t *p;
+
+if (arg4 != sizeof(target_sigset_t)) {
+return -TARGET_EINVAL;
+}
+
+if (arg2) {
+switch (arg1) {
+case TARGET_SIG_BLOCK:
+how = SIG_BLOCK;
+break;
+case TARGET_SIG_UNBLOCK:
+how = SIG_UNBLOCK;
+break;
+case TARGET_SIG_SETMASK:
+how = SIG_SETMASK;
+break;
+default:
+return -TARGET_EINVAL;
+}
+p = lock_user(VERIFY_READ, arg2, sizeof(target_sigset_t), 1);
+if (!p) {
+return -TARGET_EFAULT;
+}
+target_to_host_sigset(&set, p);
+unlock_user(p, arg2, 0);
+set_ptr = &set;
+}
+ret = do_sigprocmask(how, set_ptr, &oldset);
+if (!is_error(ret) && arg3) {
+p = lock_user(VERIFY_WRITE, arg3, sizeof(target_sigset_t), 0);
+if (!p) {
+return -TARGET_EFAULT;
+}
+host_to_target_sigset(p, &oldset);
+unlock_user(p, arg3, sizeof(target_sigset_t));
+}
+return ret;
+}
+
+#ifdef TARGET_NR_sgetmask
+IMPL(sgetmask)
+{
+sigset_t cur_set;
+abi_ulong target_set;
+abi_long ret = do_sigprocmask(0, NULL, &cur_set);
+if (!ret) {
+host_to_target_old_sigset(&target_set, &cur_set);
+ret = target_set;
+}
+return ret;
+}
+#endif
+
 IMPL(setpgid)
 {
 return get_errno(setpgid(arg1, arg2));
@@ -8901,6 +8960,95 @@ IMPL(sigaction)
 }
 #endif
 
+#ifdef TARGET_NR_sigprocmask
+IMPL(sigprocmask)
+{
+abi_long ret;
+# ifdef TARGET_ALPHA
+sigset_t set, oldset;
+abi_ulong mask;
+int how;
+
+switch (arg1) {
+case TARGET_SIG_BLOCK:
+how = SIG_BLOCK;
+break;
+case TARGET_SIG_UNBLOCK:
+how = SIG_UNBLOCK;
+break;
+case TARGET_SIG_SETMASK:
+how = SIG_SETMASK;
+break;
+default:
+return -TARGET_EINVAL;
+}
+mask = arg2;
+target_to_host_old_sigset(&set, &mask);
+
+ret = do_sigprocmask(how, &set, &oldset);
+if (!is_error(ret)) {
+host_to_target_old_sigset(&mask, &oldset);
+ret = mask;
+((CPUAlphaState *)cpu_env)->ir[IR_V0] = 0; /* force no error */
+}
+# else
+sigset_t set, oldset, *set_ptr = NULL;
+int how = 0;
+abi_ulong *p;
+
+if (arg2) {
+switch (arg1) {
+case TARGET_SIG_BLOCK:
+how = SIG_BLOCK;
+break;
+case TARGET_SIG_UNBLOCK:
+how = SIG_UNBLOCK;
+break;
+case TARGET_SIG_SETMASK:
+how = SIG_SETMASK;
+break;
+default:
+return -TARGET_EINVAL;
+}
+p = lock_user(VERIFY_READ, arg2, sizeof(target_sigset_t), 1);
+if (!p) {
+return -TARGET_EFAULT;
+}
+target_to_host_old_sigset(&set, p);
+unlock_user(p, arg2, 0);
+set_ptr = &set;
+}
+ret = do_sigprocmask(how, set_ptr, &oldset);
+if (!is_error(ret) && arg3) {
+p = lock_user(VERIFY_WRITE, arg3, sizeof(target_sigset_t), 0);
+if (!p) {
+return -TARGET_EFAULT;
+}
+host_to_target_old_sigset(p, &oldset);
+unlock_user(p, arg3, sizeof(target_sigset_t));
+}
+# endif
+return ret;
+}
+#endif
+
+#ifdef TARGET_NR_ssetmask
+IMPL(ssetmask)
+{
+sigset_t set, oset;
+abi_ulong target_set = arg1;
+abi_long ret;
+
+target_to_host_old_sigset(&set, &target_set);
+ret = do_sigprocmask(SIG_SETMASK, &set, &oset);
+if (!ret) {
+host_to_target_old_sigset(&target_set, &oset);
+ret = target_set;
+}
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_stime
 IMPL(stime)
 {
@@ -9152,142 +9300,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#ifdef TARGET_NR_sgetmask /* not on alpha */
-case TARGET_NR_sgetmask:
-{
-sigset_t cur_set;
-abi_ulong target_set;
-ret = do_sigprocmask(0, NULL, &cur_set);
-if (!ret) {
-host_to_target_old_sigset(&target_set, &cur_set);
-ret = target_set;
-}
-}
-return ret;
-#endif
-#ifdef TARGET_NR_ssetmask /* not on alpha */
-case TARGET_NR_ssetmask:
-{
-sigset_t set, oset;
-abi_ulong target_set = arg1;
-target_to_host_old_sigset(&set, &target_set);
-ret = do_sigprocmask(SIG_SETMASK, &set, &oset);
-if (!ret) {
-host_to_target_old_sigset(&t

[Qemu-devel] [PATCH 23/33] linux-user: Split out access, faccessat, futimesat, kill, nice, sync, syncfs

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 179 +++
 1 file changed, 113 insertions(+), 66 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b3838c5161..2a172e24eb 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7894,6 +7894,24 @@ IMPL(enosys)
 return do_unimplemented(num);
 }
 
+#ifdef TARGET_NR_access
+IMPL(access)
+{
+char *fn = lock_user_string(arg1);
+abi_long ret;
+
+if (!fn) {
+return -TARGET_EFAULT;
+}
+TRY_INTERP_FD(ret, fn,
+  faccessat(interp_dirfd, fn + 1, arg2, 0),
+  access(fn, arg2));
+ret = get_errno(ret);
+unlock_user(fn, arg1, 0);
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_alarm
 IMPL(alarm)
 {
@@ -8106,6 +8124,28 @@ IMPL(exit)
 g_assert_not_reached();
 }
 
+#ifdef TARGET_NR_faccessat
+IMPL(faccessat)
+{
+char *fn;
+abi_long ret;
+
+if (is_hostfd(arg1)) {
+return -TARGET_EBADF;
+}
+fn = lock_user_string(arg2);
+if (!fn) {
+return -TARGET_EFAULT;
+}
+TRY_INTERP_FD(ret, fn,
+  faccessat(interp_dirfd, fn + 1, arg3, 0),
+  faccessat(arg1, fn, arg3, 0));
+ret = get_errno(ret);
+unlock_user(fn, arg2, 0);
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_fork
 IMPL(fork)
 {
@@ -8113,6 +8153,37 @@ IMPL(fork)
 }
 #endif
 
+#ifdef TARGET_NR_futimesat
+IMPL(futimesat)
+{
+struct timeval tv[2], *tvp = NULL;
+char *fn;
+abi_long ret;
+
+if (is_hostfd(arg1)) {
+return -TARGET_EBADF;
+}
+if (arg3) {
+if (copy_from_user_timeval(&tv[0], arg3) ||
+copy_from_user_timeval(&tv[1],
+   arg3 + sizeof(struct target_timeval))) {
+return -TARGET_EFAULT;
+}
+tvp = tv;
+}
+fn = lock_user_string(arg2);
+if (!fn) {
+return -TARGET_EFAULT;
+}
+TRY_INTERP_FD(ret, fn,
+  futimesat(interp_dirfd, fn + 1, tvp),
+  futimesat(arg1, fn, tvp));
+ret = get_errno(ret);
+unlock_user(fn, arg2, 0);
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_getpid
 IMPL(getpid)
 {
@@ -8128,6 +8199,11 @@ IMPL(getxpid)
 }
 #endif
 
+IMPL(kill)
+{
+return get_errno(safe_kill(arg1, target_to_host_signal(arg2)));
+}
+
 #ifdef TARGET_NR_link
 IMPL(link)
 {
@@ -8309,6 +8385,13 @@ IMPL(name_to_handle_at)
 }
 #endif
 
+#ifdef TARGET_NR_nice
+IMPL(nice)
+{
+return get_errno(nice(arg1));
+}
+#endif
+
 #ifdef TARGET_NR_open
 IMPL(open)
 {
@@ -8432,6 +8515,19 @@ IMPL(stime)
 }
 #endif
 
+IMPL(sync)
+{
+sync();
+return 0;
+}
+
+#if defined(TARGET_NR_syncfs) && defined(CONFIG_SYNCFS)
+IMPL(syncfs)
+{
+return get_errno(syncfs(arg1));
+}
+#endif
+
 #ifdef TARGET_NR_time
 IMPL(time)
 {
@@ -8618,72 +8714,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#if defined(TARGET_NR_futimesat)
-case TARGET_NR_futimesat:
-if (is_hostfd(arg1)) {
-return -TARGET_EBADF;
-} else {
-struct timeval *tvp, tv[2];
-if (arg3) {
-if (copy_from_user_timeval(&tv[0], arg3)
-|| copy_from_user_timeval(&tv[1],
-  arg3 + sizeof(struct target_timeval)))
-return -TARGET_EFAULT;
-tvp = tv;
-} else {
-tvp = NULL;
-}
-if (!(fn = lock_user_string(arg2))) {
-return -TARGET_EFAULT;
-}
-TRY_INTERP_FD(ret, fn,
-  futimesat(interp_dirfd, fn + 1, tvp),
-  futimesat(arg1, fn, tvp));
-ret = get_errno(ret);
-unlock_user(fn, arg2, 0);
-}
-return ret;
-#endif
-#ifdef TARGET_NR_access
-case TARGET_NR_access:
-if (!(fn = lock_user_string(arg1))) {
-return -TARGET_EFAULT;
-}
-TRY_INTERP_FD(ret, fn,
-  faccessat(interp_dirfd, fn + 1, arg2, 0),
-  access(fn, arg2));
-ret = get_errno(ret);
-unlock_user(fn, arg1, 0);
-return ret;
-#endif
-#if defined(TARGET_NR_faccessat) && defined(__NR_faccessat)
-case TARGET_NR_faccessat:
-if (is_hostfd(arg1)) {
-return -TARGET_EBADF;
-}
-if (!(fn = lock_user_string(arg2))) {
-return -TARGET_EFAULT;
-}
-TRY_INTERP_FD(ret, fn,
-  faccessat(interp_dirfd, fn + 1, arg3, 0),
-  faccessat(arg1, fn, arg3, 0));
-ret = get_errno(ret);
-unlock_user(fn, arg2, 0);
-return ret;
-#endif
-#ifdef TARGET_NR_nice /* not on alpha */
-case TARGET_NR_nice:
-return get_errno(nice(arg1));
-#endif
-case TARGET_NR_sync:
-sync();
-return 0;
-#if defined(TARGET_NR_syncfs) && defined(CONFIG_SYNCFS)
-case TARGET_NR_sync

[Qemu-devel] [PATCH 21/33] linux-user: Split out mount, umount

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 123 +--
 1 file changed, 60 insertions(+), 63 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b568144369..53eac58ec0 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8200,6 +8200,47 @@ IMPL(mknodat)
 }
 #endif
 
+IMPL(mount)
+{
+char *p1 = NULL, *p2, *p3 = NULL;
+abi_long ret = -TARGET_EFAULT;
+
+if (arg1) {
+p1 = lock_user_string(arg1);
+if (!p1) {
+goto exit1;
+}
+}
+p2 = lock_user_string(arg2);
+if (!p2) {
+goto exit2;
+}
+if (arg3) {
+p3 = lock_user_string(arg3);
+if (!p3) {
+goto exit3;
+}
+}
+
+/* FIXME - arg5 should be locked, but it isn't clear how to do that
+ * since it's not guaranteed to be a NULL-terminated string.
+ */
+ret = mount(p1, p2, p3, (unsigned long)arg4, arg5 ? g2h(arg5) : NULL);
+ret = get_errno(ret);
+
+if (arg3) {
+unlock_user(p3, arg3, 0);
+}
+ exit3:
+unlock_user(p2, arg2, 0);
+ exit2:
+if (arg1) {
+unlock_user(p1, arg1, 0);
+}
+ exit1:
+return ret;
+}
+
 #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 IMPL(name_to_handle_at)
 {
@@ -8373,6 +8414,21 @@ IMPL(time)
 }
 #endif
 
+#ifdef TARGET_NR_umount
+IMPL(umount)
+{
+char *p = lock_user_string(arg1);
+abi_long ret;
+
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(umount(p));
+unlock_user(p, arg1, 0);
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_unlink
 IMPL(unlink)
 {
@@ -8483,69 +8539,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-case TARGET_NR_mount:
-{
-/* need to look at the data field */
-void *p2, *p3;
-
-if (arg1) {
-p = lock_user_string(arg1);
-if (!p) {
-return -TARGET_EFAULT;
-}
-} else {
-p = NULL;
-}
-
-p2 = lock_user_string(arg2);
-if (!p2) {
-if (arg1) {
-unlock_user(p, arg1, 0);
-}
-return -TARGET_EFAULT;
-}
-
-if (arg3) {
-p3 = lock_user_string(arg3);
-if (!p3) {
-if (arg1) {
-unlock_user(p, arg1, 0);
-}
-unlock_user(p2, arg2, 0);
-return -TARGET_EFAULT;
-}
-} else {
-p3 = NULL;
-}
-
-/* FIXME - arg5 should be locked, but it isn't clear how to
- * do that since it's not guaranteed to be a NULL-terminated
- * string.
- */
-if (!arg5) {
-ret = mount(p, p2, p3, (unsigned long)arg4, NULL);
-} else {
-ret = mount(p, p2, p3, (unsigned long)arg4, g2h(arg5));
-}
-ret = get_errno(ret);
-
-if (arg1) {
-unlock_user(p, arg1, 0);
-}
-unlock_user(p2, arg2, 0);
-if (arg3) {
-unlock_user(p3, arg3, 0);
-}
-}
-return ret;
-#ifdef TARGET_NR_umount
-case TARGET_NR_umount:
-if (!(p = lock_user_string(arg1)))
-return -TARGET_EFAULT;
-ret = get_errno(umount(p));
-unlock_user(p, arg1, 0);
-return ret;
-#endif
 #ifdef TARGET_NR_stime /* not on alpha */
 case TARGET_NR_stime:
 {
@@ -12896,6 +12889,7 @@ static impl_fn * const syscall_table[] = {
 #ifdef TARGET_NR_mknodat
 [TARGET_NR_mknodat] = impl_mknodat,
 #endif
+[TARGET_NR_mount] = impl_mount,
 #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 [TARGET_NR_name_to_handle_at] = impl_name_to_handle_at,
 #endif
@@ -12910,6 +12904,9 @@ static impl_fn * const syscall_table[] = {
 #ifdef TARGET_NR_time
 [TARGET_NR_time] = impl_time,
 #endif
+#ifdef TARGET_NR_umount
+[TARGET_NR_umount] = impl_umount,
+#endif
 #ifdef TARGET_NR_unlink
 [TARGET_NR_unlink] = impl_unlink,
 #endif
-- 
2.17.0




[Qemu-devel] [PATCH 20/33] linux-user: Split out getpid, getxpid, lseek

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 45 +---
 1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 6a701ea8f6..b568144369 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8106,6 +8106,21 @@ IMPL(fork)
 }
 #endif
 
+#ifdef TARGET_NR_getpid
+IMPL(getpid)
+{
+return get_errno(getpid());
+}
+#endif
+
+#if defined(TARGET_NR_getxpid) && defined(TARGET_ALPHA)
+IMPL(getxpid)
+{
+((CPUAlphaState *)cpu_env)->ir[IR_A4] = getppid();
+return get_errno(getpid());
+}
+#endif
+
 #ifdef TARGET_NR_link
 IMPL(link)
 {
@@ -8143,6 +8158,14 @@ IMPL(linkat)
 }
 #endif
 
+IMPL(lseek)
+{
+if (is_hostfd(arg1)) {
+return -TARGET_EBADF;
+}
+return get_errno(lseek(arg1, arg2, arg3));
+}
+
 #ifdef TARGET_NR_mknod
 IMPL(mknod)
 {
@@ -8460,21 +8483,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-case TARGET_NR_lseek:
-if (is_hostfd(arg1)) {
-return -TARGET_EBADF;
-}
-return get_errno(lseek(arg1, arg2, arg3));
-#if defined(TARGET_NR_getxpid) && defined(TARGET_ALPHA)
-/* Alpha specific */
-case TARGET_NR_getxpid:
-((CPUAlphaState *)cpu_env)->ir[IR_A4] = getppid();
-return get_errno(getpid());
-#endif
-#ifdef TARGET_NR_getpid
-case TARGET_NR_getpid:
-return get_errno(getpid());
-#endif
 case TARGET_NR_mount:
 {
 /* need to look at the data field */
@@ -12869,12 +12877,19 @@ static impl_fn * const syscall_table[] = {
 #ifdef TARGET_NR_fork
 [TARGET_NR_fork] = impl_fork,
 #endif
+#ifdef TARGET_NR_getpid
+[TARGET_NR_getpid] = impl_getpid,
+#endif
+#if defined(TARGET_NR_getxpid) && defined(TARGET_ALPHA)
+[TARGET_NR_getxpid] = impl_getxpid,
+#endif
 #ifdef TARGET_NR_link
 [TARGET_NR_link] = impl_link,
 #endif
 #if defined(TARGET_NR_linkat)
 [TARGET_NR_linkat] = impl_linkat,
 #endif
+[TARGET_NR_lseek] = impl_lseek,
 #ifdef TARGET_NR_mknod
 [TARGET_NR_mknod] = impl_mknod,
 #endif
-- 
2.17.0




[Qemu-devel] [PATCH 32/33] linux-user: Split out rt_sigpending, rt_sigsuspend, sigpending, sigsuspend

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 176 +--
 1 file changed, 101 insertions(+), 75 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index e37a3ab643..c3bd625965 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8805,6 +8805,32 @@ IMPL(rt_sigaction)
 return ret;
 }
 
+IMPL(rt_sigpending)
+{
+sigset_t set;
+abi_long ret;
+
+/* Yes, this check is >, not != like most. We follow the kernel's
+ * logic and it does it like this because it implements
+ * NR_sigpending through the same code path, and in that case
+ * the old_sigset_t is smaller in size.
+ */
+if (arg2 > sizeof(target_sigset_t)) {
+return -TARGET_EINVAL;
+}
+ret = get_errno(sigpending(&set));
+if (!is_error(ret)) {
+target_sigset_t *p;
+p = lock_user(VERIFY_WRITE, arg1, sizeof(target_sigset_t), 0);
+if (!p) {
+return -TARGET_EFAULT;
+}
+host_to_target_sigset(p, &set);
+unlock_user(p, arg1, sizeof(target_sigset_t));
+}
+return ret;
+}
+
 IMPL(rt_sigprocmask)
 {
 int how = 0;
@@ -8850,6 +8876,29 @@ IMPL(rt_sigprocmask)
 return ret;
 }
 
+IMPL(rt_sigsuspend)
+{
+CPUState *cpu = ENV_GET_CPU(cpu_env);
+TaskState *ts = cpu->opaque;
+target_sigset_t *p;
+abi_long ret;
+
+if (arg2 != sizeof(target_sigset_t)) {
+return -TARGET_EINVAL;
+}
+p = lock_user(VERIFY_READ, arg1, sizeof(target_sigset_t), 1);
+if (!p) {
+return -TARGET_EFAULT;
+}
+target_to_host_sigset(&ts->sigsuspend_mask, p);
+unlock_user(p, arg1, 0);
+ret = get_errno(safe_rt_sigsuspend(&ts->sigsuspend_mask, SIGSET_T_SIZE));
+if (ret != -TARGET_ERESTARTSYS) {
+ts->in_sigsuspend = 1;
+}
+return ret;
+}
+
 #ifdef TARGET_NR_sgetmask
 IMPL(sgetmask)
 {
@@ -8960,6 +9009,24 @@ IMPL(sigaction)
 }
 #endif
 
+#ifdef TARGET_NR_sigpending
+IMPL(sigpending)
+{
+sigset_t set;
+abi_long ret = get_errno(sigpending(&set));
+if (!is_error(ret)) {
+abi_ulong *p;
+p = lock_user(VERIFY_WRITE, arg1, sizeof(target_sigset_t), 0);
+if (!p) {
+return -TARGET_EFAULT;
+}
+host_to_target_old_sigset(p, &set);
+unlock_user(p, arg1, sizeof(target_sigset_t));
+}
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_sigprocmask
 IMPL(sigprocmask)
 {
@@ -9032,6 +9099,32 @@ IMPL(sigprocmask)
 }
 #endif
 
+#ifdef TARGET_NR_sigsuspend
+IMPL(sigsuspend)
+{
+CPUState *cpu = ENV_GET_CPU(cpu_env);
+TaskState *ts = cpu->opaque;
+abi_long ret;
+
+# ifdef TARGET_ALPHA
+abi_ulong mask = arg1;
+target_to_host_old_sigset(&ts->sigsuspend_mask, &mask);
+# else
+abi_ulong *p = lock_user(VERIFY_READ, arg1, sizeof(target_sigset_t), 1);
+if (!p) {
+return -TARGET_EFAULT;
+}
+target_to_host_old_sigset(&ts->sigsuspend_mask, p);
+unlock_user(p, arg1, 0);
+# endif
+ret = get_errno(safe_rt_sigsuspend(&ts->sigsuspend_mask, SIGSET_T_SIZE));
+if (ret != -TARGET_ERESTARTSYS) {
+ts->in_sigsuspend = 1;
+}
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_ssetmask
 IMPL(ssetmask)
 {
@@ -9300,81 +9393,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#ifdef TARGET_NR_sigpending
-case TARGET_NR_sigpending:
-{
-sigset_t set;
-ret = get_errno(sigpending(&set));
-if (!is_error(ret)) {
-if (!(p = lock_user(VERIFY_WRITE, arg1, sizeof(target_sigset_t), 0)))
-return -TARGET_EFAULT;
-host_to_target_old_sigset(p, &set);
-unlock_user(p, arg1, sizeof(target_sigset_t));
-}
-}
-return ret;
-#endif
-case TARGET_NR_rt_sigpending:
-{
-sigset_t set;
-
-/* Yes, this check is >, not != like most. We follow the kernel's
- * logic and it does it like this because it implements
- * NR_sigpending through the same code path, and in that case
- * the old_sigset_t is smaller in size.
- */
-if (arg2 > sizeof(target_sigset_t)) {
-return -TARGET_EINVAL;
-}
-
-ret = get_errno(sigpending(&set));
-if (!is_error(ret)) {
-if (!(p = lock_user(VERIFY_WRITE, arg1, sizeof(target_sigset_t), 0)))
-return -TARGET_EFAULT;
-host_to_target_sigset(p, &set);
-unlock_user(p, arg1, sizeof(target_sigset_t));
-}
-}
-return ret;
-#ifdef TARGET_NR_sigsuspend
-case TARGET_NR_sigsuspend:
-{
-TaskState *ts = cpu->opaque;
-#if defined(TARGET_ALPHA)
-abi_ulong mask = arg1;
-target_to_host_old_sigset(&ts->sigsuspend_mask, &mask);
-#else
-if (!(p = lock_user(VERIFY_READ, arg1, sizeof(target_sigset_t), 1)))
- 

[Qemu-devel] [PATCH 27/33] linux-user: Split out ioctl

2018-06-01 Thread Richard Henderson
At the same time, merge do_ioctl into the new function.

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 190 ++-
 1 file changed, 97 insertions(+), 93 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index bde1f9872f..4be71367fc 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -5759,97 +5759,6 @@ static IOCTLEntry ioctl_entries[] = {
 { 0, 0, },
 };
 
-/* ??? Implement proper locking for ioctls.  */
-/* do_ioctl() Must return target values and target errnos. */
-static abi_long do_ioctl(int fd, int cmd, abi_long arg)
-{
-const IOCTLEntry *ie;
-const argtype *arg_type;
-abi_long ret;
-uint8_t buf_temp[MAX_STRUCT_SIZE];
-int target_size;
-void *argptr;
-
-ie = ioctl_entries;
-for(;;) {
-if (ie->target_cmd == 0) {
-gemu_log("Unsupported ioctl: cmd=0x%04lx\n", (long)cmd);
-return -TARGET_ENOSYS;
-}
-if (ie->target_cmd == cmd)
-break;
-ie++;
-}
-arg_type = ie->arg_type;
-#if defined(DEBUG)
-gemu_log("ioctl: cmd=0x%04lx (%s)\n", (long)cmd, ie->name);
-#endif
-if (ie->do_ioctl) {
-return ie->do_ioctl(ie, buf_temp, fd, cmd, arg);
-} else if (!ie->host_cmd) {
-/* Some architectures define BSD ioctls in their headers
-   that are not implemented in Linux.  */
-return -TARGET_ENOSYS;
-}
-
-switch(arg_type[0]) {
-case TYPE_NULL:
-/* no argument */
-ret = get_errno(safe_ioctl(fd, ie->host_cmd));
-break;
-case TYPE_PTRVOID:
-case TYPE_INT:
-ret = get_errno(safe_ioctl(fd, ie->host_cmd, arg));
-break;
-case TYPE_PTR:
-arg_type++;
-target_size = thunk_type_size(arg_type, 0);
-switch(ie->access) {
-case IOC_R:
-ret = get_errno(safe_ioctl(fd, ie->host_cmd, buf_temp));
-if (!is_error(ret)) {
-argptr = lock_user(VERIFY_WRITE, arg, target_size, 0);
-if (!argptr)
-return -TARGET_EFAULT;
-thunk_convert(argptr, buf_temp, arg_type, THUNK_TARGET);
-unlock_user(argptr, arg, target_size);
-}
-break;
-case IOC_W:
-argptr = lock_user(VERIFY_READ, arg, target_size, 1);
-if (!argptr)
-return -TARGET_EFAULT;
-thunk_convert(buf_temp, argptr, arg_type, THUNK_HOST);
-unlock_user(argptr, arg, 0);
-ret = get_errno(safe_ioctl(fd, ie->host_cmd, buf_temp));
-break;
-default:
-case IOC_RW:
-argptr = lock_user(VERIFY_READ, arg, target_size, 1);
-if (!argptr)
-return -TARGET_EFAULT;
-thunk_convert(buf_temp, argptr, arg_type, THUNK_HOST);
-unlock_user(argptr, arg, 0);
-ret = get_errno(safe_ioctl(fd, ie->host_cmd, buf_temp));
-if (!is_error(ret)) {
-argptr = lock_user(VERIFY_WRITE, arg, target_size, 0);
-if (!argptr)
-return -TARGET_EFAULT;
-thunk_convert(argptr, buf_temp, arg_type, THUNK_TARGET);
-unlock_user(argptr, arg, target_size);
-}
-break;
-}
-break;
-default:
-gemu_log("Unsupported ioctl type: cmd=0x%04lx type=%d\n",
- (long)cmd, arg_type[0]);
-ret = -TARGET_ENOSYS;
-break;
-}
-return ret;
-}
-
 static const bitmask_transtbl iflag_tbl[] = {
 { TARGET_IGNBRK, TARGET_IGNBRK, IGNBRK, IGNBRK },
 { TARGET_BRKINT, TARGET_BRKINT, BRKINT, BRKINT },
@@ -8231,6 +8140,102 @@ IMPL(getxpid)
 }
 #endif
 
+/* ??? Implement proper locking for ioctls.  */
+IMPL(ioctl)
+{
+abi_long fd = arg1;
+abi_long cmd = arg2;
+abi_long arg = arg3;
+const IOCTLEntry *ie;
+const argtype *arg_type;
+abi_long ret;
+uint8_t buf_temp[MAX_STRUCT_SIZE];
+int target_size;
+void *argptr;
+
+for (ie = ioctl_entries; ; ie++) {
+if (ie->target_cmd == 0) {
+gemu_log("Unsupported ioctl: cmd=0x%04lx\n", (long)cmd);
+return -TARGET_ENOSYS;
+}
+if (ie->target_cmd == cmd) {
+break;
+}
+}
+arg_type = ie->arg_type;
+#if defined(DEBUG)
+gemu_log("ioctl: cmd=0x%04lx (%s)\n", (long)cmd, ie->name);
+#endif
+if (ie->do_ioctl) {
+return ie->do_ioctl(ie, buf_temp, fd, cmd, arg);
+} else if (!ie->host_cmd) {
+/* Some architectures define BSD ioctls in their headers
+   that are not implemented in Linux.  */
+return -TARGET_ENOSYS;
+}
+
+switch (arg_type[0]) {
+case TYPE_NULL:
+/* no argument */
+ret = get_errno(safe_ioctl(fd, ie->host_cmd));
+break;
+case TYPE_PTRVOID:
+case TYPE_INT:
+ret = get_errno(safe_ioctl(fd, ie->host_cmd, arg));
+ 

[Qemu-devel] [PATCH 24/33] linux-user: Split out rename, renameat, renameat2

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 113 ---
 1 file changed, 63 insertions(+), 50 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 2a172e24eb..24514329b0 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8504,6 +8504,60 @@ IMPL(read)
 return ret;
 }
 
+#ifdef TARGET_NR_rename
+IMPL(rename)
+{
+char *p1 = lock_user_string(arg1);
+char *p2 = lock_user_string(arg2);
+abi_long ret = -TARGET_EFAULT;
+
+if (p1 && p2) {
+ret = get_errno(rename(p1, p2));
+}
+unlock_user(p2, arg2, 0);
+unlock_user(p1, arg1, 0);
+return ret;
+}
+#endif
+
+#if defined(TARGET_NR_renameat)
+IMPL(renameat)
+{
+if (is_hostfd(arg1)) {
+return -TARGET_EBADF;
+}
+
+char *p1 = lock_user_string(arg2);
+char *p2 = lock_user_string(arg4);
+abi_long ret = -TARGET_EFAULT;
+if (p1 && p2) {
+ret = get_errno(renameat(arg1, p1, arg3, p2));
+}
+unlock_user(p2, arg4, 0);
+unlock_user(p1, arg2, 0);
+return ret;
+}
+#endif
+
+#ifdef TARGET_NR_renameat2
+IMPL(renameat2)
+{
+if (is_hostfd(arg1)) {
+return -TARGET_EBADF;
+}
+
+char *p1 = lock_user_string(arg2);
+char *p2 = lock_user_string(arg4);
+abi_long ret = -TARGET_EFAULT;
+if (p1 && p2) {
+ret = get_errno(sys_renameat2(arg1, p1, arg3, p2, arg5));
+}
+unlock_user(p2, arg4, 0);
+unlock_user(p1, arg2, 0);
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_stime
 IMPL(stime)
 {
@@ -8714,56 +8768,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#ifdef TARGET_NR_rename
-case TARGET_NR_rename:
-{
-void *p2;
-p = lock_user_string(arg1);
-p2 = lock_user_string(arg2);
-if (!p || !p2)
-ret = -TARGET_EFAULT;
-else
-ret = get_errno(rename(p, p2));
-unlock_user(p2, arg2, 0);
-unlock_user(p, arg1, 0);
-}
-return ret;
-#endif
-#if defined(TARGET_NR_renameat)
-case TARGET_NR_renameat:
-if (is_hostfd(arg1)) {
-return -TARGET_EBADF;
-} else {
-void *p2;
-p  = lock_user_string(arg2);
-p2 = lock_user_string(arg4);
-if (!p || !p2)
-ret = -TARGET_EFAULT;
-else
-ret = get_errno(renameat(arg1, p, arg3, p2));
-unlock_user(p2, arg4, 0);
-unlock_user(p, arg2, 0);
-}
-return ret;
-#endif
-#if defined(TARGET_NR_renameat2)
-case TARGET_NR_renameat2:
-if (is_hostfd(arg1)) {
-return -TARGET_EBADF;
-} else {
-void *p2;
-p  = lock_user_string(arg2);
-p2 = lock_user_string(arg4);
-if (!p || !p2) {
-ret = -TARGET_EFAULT;
-} else {
-ret = get_errno(sys_renameat2(arg1, p, arg3, p2, arg5));
-}
-unlock_user(p2, arg4, 0);
-unlock_user(p, arg2, 0);
-}
-return ret;
-#endif
 #ifdef TARGET_NR_mkdir
 case TARGET_NR_mkdir:
 if (!(p = lock_user_string(arg1)))
@@ -12967,6 +12971,15 @@ static impl_fn * const syscall_table[] = {
 [TARGET_NR_pause] = impl_pause,
 #endif
 [TARGET_NR_read] = impl_read,
+#ifdef TARGET_NR_rename
+[TARGET_NR_rename] = impl_rename,
+#endif
+#ifdef TARGET_NR_renameat
+[TARGET_NR_renameat] = impl_renameat,
+#endif
+#ifdef TARGET_NR_renameat2
+[TARGET_NR_renameat2] = impl_renameat2,
+#endif
 #ifdef TARGET_NR_stime
 [TARGET_NR_stime] = impl_stime,
 #endif
-- 
2.17.0




[Qemu-devel] [PATCH 22/33] linux-user: Split out alarm, pause, stime, utime, utimes

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 156 ++-
 1 file changed, 94 insertions(+), 62 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 53eac58ec0..b3838c5161 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7894,6 +7894,13 @@ IMPL(enosys)
 return do_unimplemented(num);
 }
 
+#ifdef TARGET_NR_alarm
+IMPL(alarm)
+{
+return alarm(arg1);
+}
+#endif
+
 IMPL(brk)
 {
 return do_brk(arg1);
@@ -8379,6 +8386,18 @@ IMPL(open_by_handle_at)
 }
 #endif
 
+#ifdef TARGET_NR_pause
+IMPL(pause)
+{
+CPUState *cpu = ENV_GET_CPU(cpu_env);
+
+if (!block_signals()) {
+sigsuspend(&((TaskState *)cpu->opaque)->signal_mask);
+}
+return -TARGET_EINTR;
+}
+#endif
+
 IMPL(read)
 {
 abi_long ret;
@@ -8402,6 +8421,17 @@ IMPL(read)
 return ret;
 }
 
+#ifdef TARGET_NR_stime
+IMPL(stime)
+{
+time_t host_time;
+if (get_user_sal(host_time, arg1)) {
+return -TARGET_EFAULT;
+}
+return get_errno(stime(&host_time));
+}
+#endif
+
 #ifdef TARGET_NR_time
 IMPL(time)
 {
@@ -8463,6 +8493,55 @@ IMPL(unlinkat)
 }
 #endif
 
+#ifdef TARGET_NR_utime
+IMPL(utime)
+{
+struct utimbuf tbuf;
+char *p;
+abi_long ret;
+
+if (arg2) {
+struct target_utimbuf *target_tbuf;
+if (!lock_user_struct(VERIFY_READ, target_tbuf, arg2, 1)) {
+return -TARGET_EFAULT;
+}
+tbuf.actime = tswapal(target_tbuf->actime);
+tbuf.modtime = tswapal(target_tbuf->modtime);
+unlock_user_struct(target_tbuf, arg2, 0);
+}
+p = lock_user_string(arg1);
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(utime(p, arg2 ? &tbuf : NULL));
+unlock_user(p, arg1, 0);
+return ret;
+}
+#endif
+
+#ifdef TARGET_NR_utimes
+IMPL(utimes)
+{
+struct timeval tv[2];
+char *p;
+abi_long ret;
+
+if (arg2 &&
+(copy_from_user_timeval(&tv[0], arg2) ||
+ copy_from_user_timeval(&tv[1],
+arg2 + sizeof(struct target_timeval)))) {
+return -TARGET_EFAULT;
+}
+p = lock_user_string(arg1);
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(utimes(p, arg2 ? tv : NULL));
+unlock_user(p, arg1, 0);
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_waitid
 IMPL(waitid)
 {
@@ -8539,68 +8618,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#ifdef TARGET_NR_stime /* not on alpha */
-case TARGET_NR_stime:
-{
-time_t host_time;
-if (get_user_sal(host_time, arg1))
-return -TARGET_EFAULT;
-return get_errno(stime(&host_time));
-}
-#endif
-#ifdef TARGET_NR_alarm /* not on alpha */
-case TARGET_NR_alarm:
-return alarm(arg1);
-#endif
-#ifdef TARGET_NR_pause /* not on alpha */
-case TARGET_NR_pause:
-if (!block_signals()) {
-sigsuspend(&((TaskState *)cpu->opaque)->signal_mask);
-}
-return -TARGET_EINTR;
-#endif
-#ifdef TARGET_NR_utime
-case TARGET_NR_utime:
-{
-struct utimbuf tbuf, *host_tbuf;
-struct target_utimbuf *target_tbuf;
-if (arg2) {
-if (!lock_user_struct(VERIFY_READ, target_tbuf, arg2, 1))
-return -TARGET_EFAULT;
-tbuf.actime = tswapal(target_tbuf->actime);
-tbuf.modtime = tswapal(target_tbuf->modtime);
-unlock_user_struct(target_tbuf, arg2, 0);
-host_tbuf = &tbuf;
-} else {
-host_tbuf = NULL;
-}
-if (!(p = lock_user_string(arg1)))
-return -TARGET_EFAULT;
-ret = get_errno(utime(p, host_tbuf));
-unlock_user(p, arg1, 0);
-}
-return ret;
-#endif
-#ifdef TARGET_NR_utimes
-case TARGET_NR_utimes:
-{
-struct timeval *tvp, tv[2];
-if (arg2) {
-if (copy_from_user_timeval(&tv[0], arg2)
-|| copy_from_user_timeval(&tv[1],
-  arg2 + sizeof(struct target_timeval)))
-return -TARGET_EFAULT;
-tvp = tv;
-} else {
-tvp = NULL;
-}
-if (!(p = lock_user_string(arg1)))
-return -TARGET_EFAULT;
-ret = get_errno(utimes(p, tvp));
-unlock_user(p, arg1, 0);
-}
-return ret;
-#endif
 #if defined(TARGET_NR_futimesat)
 case TARGET_NR_futimesat:
 if (is_hostfd(arg1)) {
@@ -12856,6 +12873,9 @@ IMPL(everything_else)
 }
 
 static impl_fn * const syscall_table[] = {
+#ifdef TARGET_NR_alarm
+[TARGET_NR_alarm] = impl_alarm,
+#endif
 [TARGET_NR_brk] = impl_brk,
 [TARGET_NR_close] = impl_close,
 [TARGET_NR_chdir] = impl_chdir,
@@ -12899,8 +12919,14 @@ static impl_fn * const syscall_table[] = {
 [TARGET_NR_openat

[Qemu-devel] [PATCH 33/33] linux-user: Split out rt_sigqueueinfo, rt_sigtimedwait, rt_tgsigqueueinfo

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 129 ++-
 1 file changed, 67 insertions(+), 62 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index c3bd625965..b9e07c2d3f 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8876,6 +8876,20 @@ IMPL(rt_sigprocmask)
 return ret;
 }
 
+IMPL(rt_sigqueueinfo)
+{
+siginfo_t uinfo;
+target_siginfo_t *p;
+
+p = lock_user(VERIFY_READ, arg3, sizeof(target_siginfo_t), 1);
+if (!p) {
+return -TARGET_EFAULT;
+}
+target_to_host_siginfo(&uinfo, p);
+unlock_user(p, arg3, 0);
+return get_errno(sys_rt_sigqueueinfo(arg1, arg2, &uinfo));
+}
+
 IMPL(rt_sigsuspend)
 {
 CPUState *cpu = ENV_GET_CPU(cpu_env);
@@ -8899,6 +8913,56 @@ IMPL(rt_sigsuspend)
 return ret;
 }
 
+IMPL(rt_sigtimedwait)
+{
+sigset_t set;
+struct timespec uts, *puts = NULL;
+void *p;
+siginfo_t uinfo;
+abi_long ret;
+
+if (arg4 != sizeof(target_sigset_t)) {
+return -TARGET_EINVAL;
+}
+p = lock_user(VERIFY_READ, arg1, sizeof(target_sigset_t), 1);
+if (!p) {
+return -TARGET_EFAULT;
+}
+target_to_host_sigset(&set, p);
+unlock_user(p, arg1, 0);
+if (arg3) {
+puts = &uts;
+target_to_host_timespec(puts, arg3);
+}
+ret = get_errno(safe_rt_sigtimedwait(&set, &uinfo, puts, SIGSET_T_SIZE));
+if (!is_error(ret)) {
+if (arg2) {
+p = lock_user(VERIFY_WRITE, arg2, sizeof(target_siginfo_t), 0);
+if (!p) {
+return -TARGET_EFAULT;
+}
+host_to_target_siginfo(p, &uinfo);
+unlock_user(p, arg2, sizeof(target_siginfo_t));
+}
+ret = host_to_target_signal(ret);
+}
+return ret;
+}
+
+IMPL(rt_tgsigqueueinfo)
+{
+siginfo_t uinfo;
+target_siginfo_t *p;
+
+p = lock_user(VERIFY_READ, arg4, sizeof(target_siginfo_t), 1);
+if (!p) {
+return -TARGET_EFAULT;
+}
+target_to_host_siginfo(&uinfo, p);
+unlock_user(p, arg4, 0);
+return get_errno(sys_rt_tgsigqueueinfo(arg1, arg2, arg3, &uinfo));
+}
+
 #ifdef TARGET_NR_sgetmask
 IMPL(sgetmask)
 {
@@ -9393,68 +9457,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-case TARGET_NR_rt_sigtimedwait:
-{
-sigset_t set;
-struct timespec uts, *puts;
-siginfo_t uinfo;
-
-if (arg4 != sizeof(target_sigset_t)) {
-return -TARGET_EINVAL;
-}
-
-if (!(p = lock_user(VERIFY_READ, arg1, sizeof(target_sigset_t), 1)))
-return -TARGET_EFAULT;
-target_to_host_sigset(&set, p);
-unlock_user(p, arg1, 0);
-if (arg3) {
-puts = &uts;
-target_to_host_timespec(puts, arg3);
-} else {
-puts = NULL;
-}
-ret = get_errno(safe_rt_sigtimedwait(&set, &uinfo, puts,
- SIGSET_T_SIZE));
-if (!is_error(ret)) {
-if (arg2) {
-p = lock_user(VERIFY_WRITE, arg2, sizeof(target_siginfo_t),
-  0);
-if (!p) {
-return -TARGET_EFAULT;
-}
-host_to_target_siginfo(p, &uinfo);
-unlock_user(p, arg2, sizeof(target_siginfo_t));
-}
-ret = host_to_target_signal(ret);
-}
-}
-return ret;
-case TARGET_NR_rt_sigqueueinfo:
-{
-siginfo_t uinfo;
-
-p = lock_user(VERIFY_READ, arg3, sizeof(target_siginfo_t), 1);
-if (!p) {
-return -TARGET_EFAULT;
-}
-target_to_host_siginfo(&uinfo, p);
-unlock_user(p, arg3, 0);
-ret = get_errno(sys_rt_sigqueueinfo(arg1, arg2, &uinfo));
-}
-return ret;
-case TARGET_NR_rt_tgsigqueueinfo:
-{
-siginfo_t uinfo;
-
-p = lock_user(VERIFY_READ, arg4, sizeof(target_siginfo_t), 1);
-if (!p) {
-return -TARGET_EFAULT;
-}
-target_to_host_siginfo(&uinfo, p);
-unlock_user(p, arg4, 0);
-ret = get_errno(sys_rt_tgsigqueueinfo(arg1, arg2, arg3, &uinfo));
-}
-return ret;
 #ifdef TARGET_NR_sigreturn
 case TARGET_NR_sigreturn:
 if (block_signals()) {
@@ -13132,7 +13134,10 @@ static impl_fn * const syscall_table[] = {
 [TARGET_NR_rt_sigaction] = impl_rt_sigaction,
 [TARGET_NR_rt_sigpending] = impl_rt_sigpending,
 [TARGET_NR_rt_sigprocmask] = impl_rt_sigprocmask,
+[TARGET_NR_rt_sigqueueinfo] = impl_rt_sigqueueinfo,
 [TARGET_NR_rt_sigsuspend] = impl_rt_sigsuspend,
+[TARGET_NR_rt_sigtimedwait] = impl_rt_sigtimedwait,
+[TARGET_NR_rt_tgsigqueueinfo] = impl_rt_tgsigqueuein

[Qemu-devel] [PATCH 30/33] linux-user: Split out rt_sigaction, sigaction

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 325 ++-
 1 file changed, 165 insertions(+), 160 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 3dfb77ac11..36e2bb838e 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8730,6 +8730,81 @@ IMPL(rmdir)
 }
 #endif
 
+IMPL(rt_sigaction)
+{
+abi_long ret;
+#ifdef TARGET_ALPHA
+/* For Alpha and SPARC this is a 5 argument syscall, with
+ * a 'restorer' parameter which must be copied into the
+ * sa_restorer field of the sigaction struct.
+ * For Alpha that 'restorer' is arg5; for SPARC it is arg4,
+ * and arg5 is the sigsetsize.
+ * Alpha also has a separate rt_sigaction struct that it uses
+ * here; SPARC uses the usual sigaction struct.
+ */
+struct target_rt_sigaction *rt_act;
+struct target_sigaction act, oact, *pact = 0;
+
+if (arg4 != sizeof(target_sigset_t)) {
+return -TARGET_EINVAL;
+}
+if (arg2) {
+if (!lock_user_struct(VERIFY_READ, rt_act, arg2, 1)) {
+return -TARGET_EFAULT;
+}
+act._sa_handler = rt_act->_sa_handler;
+act.sa_mask = rt_act->sa_mask;
+act.sa_flags = rt_act->sa_flags;
+act.sa_restorer = arg5;
+unlock_user_struct(rt_act, arg2, 0);
+pact = &act;
+}
+ret = get_errno(do_sigaction(arg1, pact, &oact));
+if (!is_error(ret) && arg3) {
+if (!lock_user_struct(VERIFY_WRITE, rt_act, arg3, 0)) {
+return -TARGET_EFAULT;
+}
+rt_act->_sa_handler = oact._sa_handler;
+rt_act->sa_mask = oact.sa_mask;
+rt_act->sa_flags = oact.sa_flags;
+unlock_user_struct(rt_act, arg3, 1);
+}
+#else
+# ifdef TARGET_SPARC
+target_ulong restorer = arg4;
+target_ulong sigsetsize = arg5;
+# else
+target_ulong sigsetsize = arg4;
+# endif
+struct target_sigaction *act = NULL;
+struct target_sigaction *oact = NULL;
+
+if (sigsetsize != sizeof(target_sigset_t)) {
+return -TARGET_EINVAL;
+}
+if (arg2) {
+if (!lock_user_struct(VERIFY_READ, act, arg2, 1)) {
+return -TARGET_EFAULT;
+}
+# ifdef TARGET_ARCH_HAS_KA_RESTORER
+act->ka_restorer = restorer;
+# endif
+}
+if (arg3 && !lock_user_struct(VERIFY_WRITE, oact, arg3, 0)) {
+ret = -TARGET_EFAULT;
+} else {
+ret = get_errno(do_sigaction(arg1, act, oact));
+}
+if (act) {
+unlock_user_struct(act, arg2, 0);
+}
+if (oact) {
+unlock_user_struct(oact, arg3, 1);
+}
+#endif
+return ret;
+}
+
 IMPL(setpgid)
 {
 return get_errno(setpgid(arg1, arg2));
@@ -8740,6 +8815,92 @@ IMPL(setsid)
 return get_errno(setsid());
 }
 
+#ifdef TARGET_NR_sigaction
+IMPL(sigaction)
+{
+abi_long ret;
+# if defined(TARGET_ALPHA)
+struct target_sigaction act, oact, *pact = NULL;
+struct target_old_sigaction *old_act;
+if (arg2) {
+if (!lock_user_struct(VERIFY_READ, old_act, arg2, 1)) {
+return -TARGET_EFAULT;
+}
+act._sa_handler = old_act->_sa_handler;
+target_siginitset(&act.sa_mask, old_act->sa_mask);
+act.sa_flags = old_act->sa_flags;
+act.sa_restorer = 0;
+unlock_user_struct(old_act, arg2, 0);
+pact = &act;
+}
+ret = get_errno(do_sigaction(arg1, pact, &oact));
+if (!is_error(ret) && arg3) {
+if (!lock_user_struct(VERIFY_WRITE, old_act, arg3, 0)) {
+return -TARGET_EFAULT;
+}
+old_act->_sa_handler = oact._sa_handler;
+old_act->sa_mask = oact.sa_mask.sig[0];
+old_act->sa_flags = oact.sa_flags;
+unlock_user_struct(old_act, arg3, 1);
+}
+# elif defined(TARGET_MIPS)
+struct target_sigaction act, oact, *pact = NULL, *old_act;
+if (arg2) {
+if (!lock_user_struct(VERIFY_READ, old_act, arg2, 1)) {
+return -TARGET_EFAULT;
+}
+   act._sa_handler = old_act->_sa_handler;
+   target_siginitset(&act.sa_mask, old_act->sa_mask.sig[0]);
+   act.sa_flags = old_act->sa_flags;
+   unlock_user_struct(old_act, arg2, 0);
+   pact = &act;
+}
+ret = get_errno(do_sigaction(arg1, pact, &oact));
+if (!is_error(ret) && arg3) {
+if (!lock_user_struct(VERIFY_WRITE, old_act, arg3, 0)) {
+return -TARGET_EFAULT;
+}
+   old_act->_sa_handler = oact._sa_handler;
+   old_act->sa_flags = oact.sa_flags;
+   old_act->sa_mask.sig[0] = oact.sa_mask.sig[0];
+   old_act->sa_mask.sig[1] = 0;
+   old_act->sa_mask.sig[2] = 0;
+   old_act->sa_mask.sig[3] = 0;
+   unlock_user_struct(old_act, arg3, 1);
+}
+# else
+struct target_sigaction act, oact, *pact = NULL;
+struct target_old_sigaction *old_act;
+if (arg2) {
+if (!lock_user_struct(VERIFY_READ, old_act, arg2, 1)) {
+return -TARGET_EFAULT;
+}
+act._sa_handler = old_act->_sa

[Qemu-devel] [PATCH 26/33] linux-user: Split out acct, pipe, pipe2, times, umount2

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 127 +++
 1 file changed, 80 insertions(+), 47 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 36092d753d..bde1f9872f 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7912,6 +7912,24 @@ IMPL(access)
 }
 #endif
 
+IMPL(acct)
+{
+if (arg1 == 0) {
+return get_errno(acct(NULL));
+} else {
+char *fn = lock_user_string(arg1);
+abi_long ret;
+
+if (!fn) {
+return -TARGET_EFAULT;
+}
+TRY_INTERP_PATH(ret, fn, acct(fn));
+ret = get_errno(ret);
+unlock_user(fn, arg1, 0);
+return ret;
+}
+}
+
 #ifdef TARGET_NR_alarm
 IMPL(alarm)
 {
@@ -8529,6 +8547,21 @@ IMPL(pause)
 }
 #endif
 
+#ifdef TARGET_NR_pipe
+IMPL(pipe)
+{
+return do_pipe(cpu_env, arg1, 0, 0);
+}
+#endif
+
+#ifdef TARGET_NR_pipe2
+IMPL(pipe2)
+{
+return do_pipe(cpu_env, arg1,
+   target_to_host_bitmask(arg2, fcntl_flags_tbl), 1);
+}
+#endif
+
 IMPL(read)
 {
 abi_long ret;
@@ -8657,6 +8690,27 @@ IMPL(time)
 }
 #endif
 
+IMPL(times)
+{
+struct tms tms;
+abi_long ret = get_errno(times(&tms));
+if (arg1) {
+struct target_tms *tmsp
+= lock_user(VERIFY_WRITE, arg1, sizeof(struct target_tms), 0);
+if (!tmsp) {
+return -TARGET_EFAULT;
+}
+tmsp->tms_utime = tswapal(host_to_target_clock_t(tms.tms_utime));
+tmsp->tms_stime = tswapal(host_to_target_clock_t(tms.tms_stime));
+tmsp->tms_cutime = tswapal(host_to_target_clock_t(tms.tms_cutime));
+tmsp->tms_cstime = tswapal(host_to_target_clock_t(tms.tms_cstime));
+}
+if (!is_error(ret)) {
+ret = host_to_target_clock_t(ret);
+}
+return ret;
+}
+
 #ifdef TARGET_NR_umount
 IMPL(umount)
 {
@@ -8672,6 +8726,21 @@ IMPL(umount)
 }
 #endif
 
+#ifdef TARGET_NR_umount2
+IMPL(umount2)
+{
+char *p = lock_user_string(arg1);
+abi_long ret;
+
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(umount2(p, arg2));
+unlock_user(p, arg1, 0);
+return ret;
+}
+#endif
+
 #ifdef TARGET_NR_unlink
 IMPL(unlink)
 {
@@ -8831,53 +8900,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#ifdef TARGET_NR_pipe
-case TARGET_NR_pipe:
-return do_pipe(cpu_env, arg1, 0, 0);
-#endif
-#ifdef TARGET_NR_pipe2
-case TARGET_NR_pipe2:
-return do_pipe(cpu_env, arg1,
-   target_to_host_bitmask(arg2, fcntl_flags_tbl), 1);
-#endif
-case TARGET_NR_times:
-{
-struct target_tms *tmsp;
-struct tms tms;
-ret = get_errno(times(&tms));
-if (arg1) {
-tmsp = lock_user(VERIFY_WRITE, arg1, sizeof(struct target_tms), 0);
-if (!tmsp)
-return -TARGET_EFAULT;
-tmsp->tms_utime = tswapal(host_to_target_clock_t(tms.tms_utime));
-tmsp->tms_stime = tswapal(host_to_target_clock_t(tms.tms_stime));
-tmsp->tms_cutime = tswapal(host_to_target_clock_t(tms.tms_cutime));
-tmsp->tms_cstime = tswapal(host_to_target_clock_t(tms.tms_cstime));
-}
-if (!is_error(ret))
-ret = host_to_target_clock_t(ret);
-}
-return ret;
-case TARGET_NR_acct:
-if (arg1 == 0) {
-ret = get_errno(acct(NULL));
-} else {
-if (!(fn = lock_user_string(arg1))) {
-return -TARGET_EFAULT;
-}
-TRY_INTERP_PATH(ret, fn, acct(fn));
-ret = get_errno(ret);
-unlock_user(fn, arg1, 0);
-}
-return ret;
-#ifdef TARGET_NR_umount2
-case TARGET_NR_umount2:
-if (!(p = lock_user_string(arg1)))
-return -TARGET_EFAULT;
-ret = get_errno(umount2(p, arg2));
-unlock_user(p, arg1, 0);
-return ret;
-#endif
 case TARGET_NR_ioctl:
 return do_ioctl(arg1, arg2, arg3);
 #ifdef TARGET_NR_fcntl
@@ -12937,6 +12959,7 @@ static impl_fn * const syscall_table[] = {
 #ifdef TARGET_NR_access
 [TARGET_NR_access] = impl_access,
 #endif
+[TARGET_NR_acct] = impl_acct,
 #ifdef TARGET_NR_alarm
 [TARGET_NR_alarm] = impl_alarm,
 #endif
@@ -13003,6 +13026,12 @@ static impl_fn * const syscall_table[] = {
 #endif
 #ifdef TARGET_NR_pause
 [TARGET_NR_pause] = impl_pause,
+#endif
+#ifdef TARGET_NR_pipe
+[TARGET_NR_pipe] = impl_pipe,
+#endif
+#ifdef TARGET_NR_pipe2
+[TARGET_NR_pipe2] = impl_pipe2,
 #endif
 [TARGET_NR_read] = impl_read,
 #ifdef TARGET_NR_rename
@@ -13027,9 +13056,13 @@ static impl_fn * const syscall_table[] = {
 #ifdef TARGET_NR_time
 [TARGET_NR_time] = impl_time,
 #endif
+[TARGET_NR_times] = impl_times,
 #ifdef TARGET_NR_umount
 [TARGET_NR_umount] = impl_umount,
 #endif
+#ifdef TARGET_NR_umount2
+[TARGET_NR_umount2] = impl_umount2,
+#endif
 

Re: [Qemu-devel] [PATCH 00/33] linux-user: Begin splitting do_syscall

2018-06-01 Thread Richard Henderson
On 06/01/2018 12:30 AM, Richard Henderson wrote:
> This function is, as I think everyone will agree, way too large.
> This is about a third of the complete change, but I thought I'd
> get some feedback on the method and form before I go any farther.

Bah.  I also meant to say

Based-on: 20180531224911.23725-1-richard.hender...@linaro.org

that is, the interp_prefix patch set from earlier today.


r~
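
For readers following the series, here is a minimal sketch of the
method under discussion: each syscall gets its own small function,
and a table indexed by syscall number selects it. Every demo_* name
and the DEMO_ENOSYS value are hypothetical stand-ins for this
illustration; they are not the definitions used in linux-user/syscall.c,
and the IMPL() macro from the patches is not reproduced here.

```c
#include <assert.h>   /* only needed for the usage checks in the note below */
#include <stddef.h>

/* Hypothetical stand-ins; not the QEMU types or numbers. */
typedef long demo_long;
typedef demo_long demo_impl_fn(demo_long arg1, demo_long arg2);

enum { DEMO_NR_add = 1, DEMO_NR_sub = 3, DEMO_NR_max = 8 };
#define DEMO_ENOSYS 38

static demo_long demo_impl_add(demo_long a, demo_long b) { return a + b; }
static demo_long demo_impl_sub(demo_long a, demo_long b) { return a - b; }

/* Designated initializers leave every unlisted slot NULL. */
static demo_impl_fn * const demo_table[DEMO_NR_max] = {
    [DEMO_NR_add] = demo_impl_add,
    [DEMO_NR_sub] = demo_impl_sub,
};

/* Look up the handler; out-of-range or empty slots report ENOSYS. */
static demo_long demo_dispatch(int num, demo_long a, demo_long b)
{
    if (num < 0 || num >= DEMO_NR_max || demo_table[num] == NULL) {
        return -DEMO_ENOSYS;
    }
    return demo_table[num](a, b);
}
```

With this layout, demo_dispatch(DEMO_NR_add, 2, 3) calls demo_impl_add,
while an unlisted number such as 2 falls through to -DEMO_ENOSYS. Adding
a syscall then means adding one function and one table entry, wrapped in
the same #ifdef, rather than one more case in a large switch.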



[Qemu-devel] [PATCH 28/33] linux-user: Split out chroot, dup2, dup3, fcntl, setpgid, umask

2018-06-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 123 +++
 1 file changed, 79 insertions(+), 44 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 4be71367fc..4d9b9cad6e 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7879,6 +7879,19 @@ IMPL(chmod)
 }
 #endif
 
+IMPL(chroot)
+{
+char *p = lock_user_string(arg1);
+abi_long ret;
+
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(chroot(p));
+unlock_user(p, arg1, 0);
+return ret;
+}
+
 IMPL(close)
 {
 if (is_hostfd(arg1)) {
@@ -7918,6 +7931,43 @@ IMPL(dup)
 return ret;
 }
 
+#ifdef TARGET_NR_dup2
+IMPL(dup2)
+{
+abi_long ret;
+
+if (is_hostfd(arg1) || is_hostfd(arg2)) {
+return -TARGET_EBADF;
+}
+ret = get_errno(dup2(arg1, arg2));
+if (ret >= 0) {
+fd_trans_dup(arg1, arg2);
+}
+return ret;
+}
+#endif
+
+#if defined(TARGET_NR_dup3) && defined(CONFIG_DUP3)
+IMPL(dup3)
+{
+int host_flags;
+abi_long ret;
+
+if (is_hostfd(arg1) || is_hostfd(arg2)) {
+return -TARGET_EBADF;
+}
+if ((arg3 & ~TARGET_O_CLOEXEC) != 0) {
+return -EINVAL;
+}
+host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl);
+ret = get_errno(dup3(arg1, arg2, host_flags));
+if (ret >= 0) {
+fd_trans_dup(arg1, arg2);
+}
+return ret;
+}
+#endif
+
 IMPL(execve)
 {
 abi_ulong *guest_ptrs;
@@ -8087,6 +8137,13 @@ IMPL(faccessat)
 }
 #endif
 
+#ifdef TARGET_NR_fcntl
+IMPL(fcntl)
+{
+return do_fcntl(arg1, arg2, arg3);
+}
+#endif
+
 #ifdef TARGET_NR_fork
 IMPL(fork)
 {
@@ -8659,6 +8716,11 @@ IMPL(rmdir)
 }
 #endif
 
+IMPL(setpgid)
+{
+return get_errno(setpgid(arg1, arg2));
+}
+
 #ifdef TARGET_NR_stime
 IMPL(stime)
 {
@@ -8716,6 +8778,11 @@ IMPL(times)
 return ret;
 }
 
+IMPL(umask)
+{
+return get_errno(umask(arg1));
+}
+
 #ifdef TARGET_NR_umount
 IMPL(umount)
 {
@@ -8905,50 +8972,6 @@ IMPL(everything_else)
 char *fn;
 
 switch(num) {
-#ifdef TARGET_NR_fcntl
-case TARGET_NR_fcntl:
-return do_fcntl(arg1, arg2, arg3);
-#endif
-case TARGET_NR_setpgid:
-return get_errno(setpgid(arg1, arg2));
-case TARGET_NR_umask:
-return get_errno(umask(arg1));
-case TARGET_NR_chroot:
-if (!(p = lock_user_string(arg1)))
-return -TARGET_EFAULT;
-ret = get_errno(chroot(p));
-unlock_user(p, arg1, 0);
-return ret;
-#ifdef TARGET_NR_dup2
-case TARGET_NR_dup2:
-if (is_hostfd(arg1) || is_hostfd(arg2)) {
-return -TARGET_EBADF;
-}
-ret = get_errno(dup2(arg1, arg2));
-if (ret >= 0) {
-fd_trans_dup(arg1, arg2);
-}
-return ret;
-#endif
-#if defined(CONFIG_DUP3) && defined(TARGET_NR_dup3)
-case TARGET_NR_dup3:
-{
-int host_flags;
-
-if (is_hostfd(arg1) || is_hostfd(arg2)) {
-return -TARGET_EBADF;
-}
-if ((arg3 & ~TARGET_O_CLOEXEC) != 0) {
-return -EINVAL;
-}
-host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl);
-ret = get_errno(dup3(arg1, arg2, host_flags));
-if (ret >= 0) {
-fd_trans_dup(arg1, arg2);
-}
-return ret;
-}
-#endif
 #ifdef TARGET_NR_getppid /* not on alpha */
 case TARGET_NR_getppid:
 return get_errno(getppid());
@@ -12969,6 +12992,7 @@ static impl_fn * const syscall_table[] = {
 [TARGET_NR_brk] = impl_brk,
 [TARGET_NR_close] = impl_close,
 [TARGET_NR_chdir] = impl_chdir,
+[TARGET_NR_chroot] = impl_chroot,
 #ifdef TARGET_NR_chmod
 [TARGET_NR_chmod] = impl_chmod,
 #endif
@@ -12976,11 +13000,20 @@ static impl_fn * const syscall_table[] = {
 [TARGET_NR_creat] = impl_creat,
 #endif
 [TARGET_NR_dup] = impl_dup,
+#ifdef TARGET_NR_dup2
+[TARGET_NR_dup2] = impl_dup2,
+#endif
+#if defined(TARGET_NR_dup3) && defined(CONFIG_DUP3)
+[TARGET_NR_dup3] = impl_dup3,
+#endif
 [TARGET_NR_execve] = impl_execve,
 [TARGET_NR_exit] = impl_exit,
 #ifdef TARGET_NR_faccessat
 [TARGET_NR_faccessat] = impl_faccessat,
 #endif
+#ifdef TARGET_NR_fcntl
+[TARGET_NR_fcntl] = impl_fcntl,
+#endif
 #ifdef TARGET_NR_fork
 [TARGET_NR_fork] = impl_fork,
 #endif
@@ -13050,6 +13083,7 @@ static impl_fn * const syscall_table[] = {
 #ifdef TARGET_NR_rmdir
 [TARGET_NR_rmdir] = impl_rmdir,
 #endif
+[TARGET_NR_setpgid] = impl_setpgid,
 #ifdef TARGET_NR_stime
 [TARGET_NR_stime] = impl_stime,
 #endif
@@ -13061,6 +13095,7 @@ static impl_fn * const syscall_table[] = {
 [TARGET_NR_time] = impl_time,
 #endif
 [TARGET_NR_times] = impl_times,
+[TARGET_NR_umask] = impl_umask,
 #ifdef TARGET_NR_umount
 [TARGET_NR_umount] = impl_umount,
 #endif
-- 
2.17.0




Re: [Qemu-devel] [PATCH 01/33] linux-user: Split out do_syscall1

2018-06-01 Thread Laurent Vivier
On 01/06/2018 at 09:30, Richard Henderson wrote:
> There was supposed to be a single point of return for do_syscall
> so that tracing works properly.  However, there are a few bugs
> in that area.  It is significantly simpler to simply split out
> an inner function to enforce this.
> 
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/syscall.c | 89 +++-
>  1 file changed, 54 insertions(+), 35 deletions(-)

Reviewed-by: Laurent Vivier 
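
A minimal sketch of the idea in the commit message, assuming nothing
about the real code beyond the do_syscall/do_syscall1 split named in
the subject: the inner function may return from any branch, and the
outer function is the only place that traces, so every path is traced
exactly once. The demo_* names and the counters are hypothetical and
stand in for the real tracing hooks.

```c
#include <assert.h>   /* only needed for the usage checks in the note below */

/* Hypothetical trace counters standing in for real trace output. */
static int demo_trace_enter;
static int demo_trace_exit;

/* Inner function: free to return early from any branch. */
static long demo_syscall1(int num, long arg)
{
    if (num < 0) {
        return -1;      /* early error return */
    }
    if (num == 0) {
        return 0;       /* early success return */
    }
    return arg * 2;
}

/* Outer function: the single point where entry and exit are traced. */
static long demo_syscall(int num, long arg)
{
    long ret;

    demo_trace_enter++;
    ret = demo_syscall1(num, arg);
    demo_trace_exit++;
    return ret;
}
```

However many return statements the inner function has, the enter and
exit counters stay equal after each call to demo_syscall().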





Re: [Qemu-devel] [PATCH v7 3/5] migration: API to clear bits of guest free pages from the dirty bitmap

2018-06-01 Thread Wei Wang

On 06/01/2018 12:00 PM, Peter Xu wrote:

On Tue, Apr 24, 2018 at 02:13:46PM +0800, Wei Wang wrote:

This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Split the free page block if it crosses the QEMU
RAMBlock boundary.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
---
  include/migration/misc.h |  2 ++
  migration/ram.c  | 44 
  2 files changed, 46 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 4ebf24c..113320e 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -14,11 +14,13 @@
  #ifndef MIGRATION_MISC_H
  #define MIGRATION_MISC_H
  
+#include "exec/cpu-common.h"

  #include "qemu/notify.h"
  
  /* migration/ram.c */
  
  void ram_mig_init(void);

+void qemu_guest_free_page_hint(void *addr, size_t len);
  
  /* migration/block.c */
  
diff --git a/migration/ram.c b/migration/ram.c

index 9a72b1a..0147548 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2198,6 +2198,50 @@ static int ram_init_all(RAMState **rsp)
  }
  
  /*

+ * This function clears bits of the free pages reported by the caller from the
+ * migration dirty bitmap. @addr is the host address corresponding to the
+ * start of the continuous guest free pages, and @len is the total bytes of
+ * those pages.
+ */
+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+RAMBlock *block;
+ram_addr_t offset;
+size_t used_len, start, npages;

Do we need to check here on whether a migration is in progress?  Since
if not I'm not sure whether this hint still makes any sense any more,
and more importantly it seems to me that block->bmap below at [1] is
only valid during a migration.  So I'm not sure whether QEMU will
crash if this function is called without a running migration.


OK. How about just adding a comment above to tell users that this
function should only be used during migration?


If we want to do a sanity check here, I think it would be easier to just 
check !block->bmap here.






+
+for (; len > 0; len -= used_len) {
+block = qemu_ram_block_from_host(addr, false, &offset);
+if (unlikely(!block)) {
+return;

We should never reach here, should we?  Assuming the callers of this
function should always pass in a correct host address. If we are very
sure that the host addr should be valid, could we just assert?


Probably not, because of the corner case where the memory is
hot-unplugged after the free page has been reported to QEMU.







+}
+
+/*
+ * This handles the case that the RAMBlock is resized after the free
+ * page hint is reported.
+ */
+if (unlikely(offset > block->used_length)) {
+return;
+}
+
+if (len <= block->used_length - offset) {
+used_len = len;
+} else {
+used_len = block->used_length - offset;
+addr += used_len;
+}
+
+start = offset >> TARGET_PAGE_BITS;
+npages = used_len >> TARGET_PAGE_BITS;
+
+qemu_mutex_lock(&ram_state->bitmap_mutex);

So now I think I understand the lock can still be meaningful since
this function now can be called outside the migration thread (e.g., in
vcpu thread).  But still it would be nice to mention it somewhere on
the truth of the lock.



Yes. Thanks for the reminder. I will add some explanation to the patch 2 
commit log.



Best,
Wei
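
For illustration, the checks discussed in this thread (a missing bitmap when no migration is running, a block resized after the hint, and a range clipped at the block boundary) can be modeled in a small standalone sketch. This is not the QEMU code: SketchBlock, sketch_clear_free_range() and the 4 KiB page size are assumptions made for the example, and the bitmap mutex is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12   /* assumed 4 KiB pages */
#define SKETCH_WORD_BITS 64

/* Hypothetical stand-in for the RAMBlock fields the patch touches. */
typedef struct {
    uint64_t *bmap;        /* dirty bitmap; NULL when no migration runs */
    size_t    used_length; /* bytes currently in use */
} SketchBlock;

/*
 * Clear the dirty bits for [offset, offset + len) within one block.
 * Returns the number of bytes consumed so that a caller can continue
 * with the next block when the range crosses a block boundary.
 * A return value of 0 means "stop".
 */
static size_t sketch_clear_free_range(SketchBlock *block, size_t offset,
                                      size_t len)
{
    size_t used_len, start, npages, i;

    assert(block != NULL);

    if (!block->bmap) {
        return 0;   /* no migration in progress, nothing to clear */
    }
    if (offset > block->used_length) {
        return 0;   /* block was resized after the hint was reported */
    }

    /* Clip the range at the end of the block. */
    used_len = len <= block->used_length - offset
             ? len : block->used_length - offset;
    start  = offset   >> SKETCH_PAGE_BITS;
    npages = used_len >> SKETCH_PAGE_BITS;

    for (i = start; i < start + npages; i++) {
        block->bmap[i / SKETCH_WORD_BITS] &=
            ~(UINT64_C(1) << (i % SKETCH_WORD_BITS));
    }
    return used_len;
}
```

A caller would loop, advancing the host address by the returned length, until the whole hinted range is consumed or 0 is returned.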



Re: [Qemu-devel] About cpu_physical_memory_map()

2018-06-01 Thread Huaicheng Li
Hi Peter,

Thank you a lot for the analysis!

> So it'll be simpler
> if you start with the buffer in the host QEMU process, map this
> in to the guest's physical address space at some GPA, tell the
> guest kernel that that's the GPA to use, and have the guest kernel
> map that GPA into the guest userspace process's virtual address space.
> (Think of how you would map a framebuffer, for instance.)


This makes sense to me. Could you help provide a pointer where I can refer
to similar implementations?
Should I do something like this during system memory initialization:

memory_region_init_ram_ptr(my_mr, owner, "mybuf", buf_size, buf); // where buf is the buffer in QEMU's address space
memory_region_add_subregion(system_memory, GPA_OFFSET, my_mr);

If I set guest memory to be "-m 1G", can I make "GPA_OFFSET" beyond 1GB
(e.g. 2GB)? This way, the guest OS
won't be able to access my buffer and use it like other regular RAM.

Thanks!

Best,
Huaicheng
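
As a toy model of the idea above (a host buffer exposed at a guest physical address outside the regular RAM range), here is a standalone sketch of a GPA-to-host-pointer lookup. It does not use QEMU's memory API and says nothing about how a guest OS would treat the region; SketchRegion and sketch_gpa_to_hva() are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one region of guest physical address space. */
typedef struct {
    uint64_t gpa_base; /* first guest physical address of the region */
    uint64_t size;     /* region size in bytes */
    uint8_t *host;     /* host buffer backing the region */
} SketchRegion;

/*
 * Translate a guest physical address to a host pointer, or return NULL
 * if no region covers the address.
 */
static uint8_t *sketch_gpa_to_hva(const SketchRegion *regions, size_t n,
                                  uint64_t gpa)
{
    size_t i;

    assert(regions != NULL || n == 0);

    for (i = 0; i < n; i++) {
        const SketchRegion *r = &regions[i];

        if (gpa >= r->gpa_base && gpa - r->gpa_base < r->size) {
            return r->host + (gpa - r->gpa_base);
        }
    }
    return NULL;
}
```

With one region for RAM at GPA 0 and one for the extra buffer at a higher GPA, addresses in the gap between them resolve to NULL in this model.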




On Thu, May 31, 2018 at 3:11 AM Peter Maydell 
wrote:

> On 30 May 2018 at 01:24, Huaicheng Li  wrote:
> > Dear QEMU/KVM developers,
> >
> > I was trying to map a buffer in host QEMU process to a guest user space
> > application. I tried to achieve this
> > by allocating a buffer in the guest application first, then map this
> buffer
> > to QEMU process address space via
> > GVA -> GPA --> HVA (GPA to HVA is done via cpu_physical_memory_map).
> Last,
> > I wrote a host kernel driver to
> > walk QEMU process's page table and change corresponding page table
> entries
> > of HVA to the HPA of the target
> > buffer.
>
> This seems like the wrong way round to try to do this. As a rule
> of thumb, you'll have an easier life if you have things behave
> similarly to how they would in real hardware. So it'll be simpler
> if you start with the buffer in the host QEMU process, map this
> in to the guest's physical address space at some GPA, tell the
> guest kernel that that's the GPA to use, and have the guest kernel
> map that GPA into the guest userspace process's virtual address space.
> (Think of how you would map a framebuffer, for instance.)
>
> Changing the host page table entries for QEMU under its feet seems
> like it's never going to work reliably.
>
> (I think the specific problem you're running into is that guest memory
> is both mapped into the QEMU host process and also exposed to the
> guest VM. The former is controlled by the page tables for the
> QEMU host process, but the latter is a different set of page tables,
> which QEMU asks the kernel to configure, using KVM_SET_USER_MEMORY_REGION
> ioctls.)
>
> thanks
> -- PMM
>


[Qemu-devel] [PATCH V6 1/7] memory, exec: Expose all memory block related flags.

2018-06-01 Thread junyan . he
From: Junyan He 

We need to use these flags in other files rather than just in exec.c.
For example, RAM_SHARED should be used when creating a RAM block from a file.
So expose them in exec/memory.h.

Signed-off-by: Junyan He 
---
 exec.c| 17 -
 include/exec/memory.h | 17 +
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/exec.c b/exec.c
index c30f905..302c04b 100644
--- a/exec.c
+++ b/exec.c
@@ -87,23 +87,6 @@ AddressSpace address_space_memory;
 
 MemoryRegion io_mem_rom, io_mem_notdirty;
 static MemoryRegion io_mem_unassigned;
-
-/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
-#define RAM_PREALLOC   (1 << 0)
-
-/* RAM is mmap-ed with MAP_SHARED */
-#define RAM_SHARED (1 << 1)
-
-/* Only a portion of RAM (used_length) is actually used, and migrated.
- * This used_length size can change across reboots.
- */
-#define RAM_RESIZEABLE (1 << 2)
-
-/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
- * zero the page and wake waiting processes.
- * (Set during postcopy)
- */
-#define RAM_UF_ZEROPAGE (1 << 3)
 #endif
 
 #ifdef TARGET_PAGE_BITS_VARY
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 67ea7fe..3da315e 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -102,6 +102,23 @@ struct IOMMUNotifier {
 };
 typedef struct IOMMUNotifier IOMMUNotifier;
 
+/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
+#define RAM_PREALLOC   (1 << 0)
+
+/* RAM is mmap-ed with MAP_SHARED */
+#define RAM_SHARED (1 << 1)
+
+/* Only a portion of RAM (used_length) is actually used, and migrated.
+ * This used_length size can change across reboots.
+ */
+#define RAM_RESIZEABLE (1 << 2)
+
+/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
+ * zero the page and wake waiting processes.
+ * (Set during postcopy)
+ */
+#define RAM_UF_ZEROPAGE (1 << 3)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
hwaddr start, hwaddr end)
-- 
2.7.4




[Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory

2018-06-01 Thread junyan . he
From: Junyan He 

QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and
live migration. If the backend is on persistent memory, QEMU needs
to take proper operations to ensure that its writes are persistent on the
persistent memory. Otherwise, a host power failure may result in the
loss of the guest data on the persistent memory.

This v3 patch series is based on Marcel's patch "mem: add share
parameter to memory-backend-ram" [1] because of the changes in patch 1.

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html

Previous versions can be found at
v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html
V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html
v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html
v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html
v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html

Changes in v6:
* (Patch 1) Expose all ram block flags rather than redefine the flags.
* (Patch 4) Use pkg-config rather than a hard-coded check in configure.
* (Patch 7) Sync and flush all the pmem data when migration completes,
rather than sync pages one by one in previous version.

Changes in v5:
* (Patch 9) Add post copy check and output some messages for nvdimm.

Changes in v4:
* (Patch 2) Fix compilation errors found by patchew.

Changes in v3:
* (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle
PMEM writes in it, so we don't need the _common function.
* (Patch 6) Expose qemu_get_buffer_common so we can remove the
unnecessary qemu_get_buffer_to_pmem wrapper.
* (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle
PMEM writes in it, so we can remove the unnecessary
xbzrle_decode_buffer_{common, to_pmem}.
* Move libpmem stubs to stubs/pmem.c and fix the compilation failures
of test-{xbzrle,vmstate}.c.

Changes in v2:
* (Patch 1) Use a flags parameter in file ram allocation functions.
* (Patch 2) Add a new option 'pmem' to hostmem-file.
* (Patch 3) Use libpmem to operate on the persistent memory, rather
than re-implementing those operations in QEMU.
* (Patch 5-8) Consider the write persistence in the migration path.


Junyan:
[1/7] memory, exec: Expose all memory block related flags.
[6/7] migration/ram: Add check and info message to nvdimm post copy.
[7/7] migration/ram: ensure write persistence on loading all data to PMEM.

Haozhong:
[5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation

Haozhong & Junyan:
[2/7] memory, exec: switch file ram allocation functions to 'flags' parameters
[3/7] hostmem-file: add the 'pmem' option
[4/7] configure: add libpmem support


Signed-off-by: Haozhong Zhang 
Signed-off-by: Junyan He 

---
backends/hostmem-file.c | 28 +++-
configure   | 29 +
docs/nvdimm.txt | 14 ++
exec.c  | 36 ++--
hw/mem/nvdimm.c |  9 -
include/exec/memory.h   | 31 +--
include/exec/ram_addr.h | 28 ++--
include/qemu/pmem.h | 24 
memory.c|  8 +---
migration/ram.c | 18 ++
numa.c  |  2 +-
qemu-options.hx |  7 +++
stubs/Makefile.objs |  1 +
stubs/pmem.c| 23 +++
14 files changed, 226 insertions(+), 32 deletions(-)
-- 
2.7.4



[Qemu-devel] [PATCH V6 2/7] memory, exec: switch file ram allocation functions to 'flags' parameters

2018-06-01 Thread junyan . he
From: Junyan He 

As more flag parameters besides the existing 'share' are going to be
added to the following functions
  memory_region_init_ram_from_file
  qemu_ram_alloc_from_fd
  qemu_ram_alloc_from_file
let's switch them to use the 'flags' parameters so as to ease future
flag additions.

The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags,
and other flag bits are ignored by above functions right now.

Signed-off-by: Junyan He 
Signed-off-by: Haozhong Zhang 
---
 backends/hostmem-file.c |  3 ++-
 exec.c  | 10 +-
 include/exec/memory.h   |  8 ++--
 include/exec/ram_addr.h | 25 +++--
 memory.c|  8 +---
 numa.c  |  2 +-
 6 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 134b08d..34c68bb 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 path = object_get_canonical_path(OBJECT(backend));
 memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
  path,
- backend->size, fb->align, backend->share,
+ backend->size, fb->align,
+ backend->share ? RAM_SHARED : 0,
  fb->mem_path, errp);
 g_free(path);
 }
diff --git a/exec.c b/exec.c
index 302c04b..f2082fa 100644
--- a/exec.c
+++ b/exec.c
@@ -2054,7 +2054,7 @@ static void ram_block_add(RAMBlock *new_block, Error 
**errp, bool shared)
 
 #ifdef __linux__
 RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
- bool share, int fd,
+ uint64_t ram_flags, int fd,
  Error **errp)
 {
 RAMBlock *new_block;
@@ -2096,14 +2096,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, 
MemoryRegion *mr,
 new_block->mr = mr;
 new_block->used_length = size;
 new_block->max_length = size;
-new_block->flags = share ? RAM_SHARED : 0;
+new_block->flags = ram_flags;
 new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
 if (!new_block->host) {
 g_free(new_block);
 return NULL;
 }
 
-ram_block_add(new_block, &local_err, share);
+ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED);
 if (local_err) {
 g_free(new_block);
 error_propagate(errp, local_err);
@@ -2115,7 +2115,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, 
MemoryRegion *mr,
 
 
 RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
-   bool share, const char *mem_path,
+   uint64_t ram_flags, const char *mem_path,
Error **errp)
 {
 int fd;
@@ -2127,7 +2127,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, 
MemoryRegion *mr,
 return NULL;
 }
 
-block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp);
+block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp);
 if (!block) {
 if (created) {
 unlink(mem_path);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3da315e..3b68a43 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -596,6 +596,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
void *host),
Error **errp);
 #ifdef __linux__
+
 /**
  * memory_region_init_ram_from_file:  Initialize RAM memory region with a
  *mmap-ed backend.
@@ -607,7 +608,10 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @size: size of the region.
  * @align: alignment of the region base address; if 0, the default alignment
  * (getpagesize()) will be used.
- * @share: %true if memory must be mmaped with the MAP_SHARED flag
+ * @ram_flags: specify properties of this memory region, which can be one or
+ * bit-or of following values:
+ * - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag
+ * Other bits are ignored.
  * @path: the path in which to allocate the RAM.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -619,7 +623,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
   const char *name,
   uint64_t size,
   uint64_t align,
-  bool share,
+  uint64_t ram_flags,
   const char *path,
   Error **errp);
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index cf24

[Qemu-devel] [PATCH V6 4/7] configure: add libpmem support

2018-06-01 Thread junyan . he
From: Junyan He 

Add a pair of configure options --{enable,disable}-libpmem to control
whether QEMU is compiled with PMDK libpmem [1].

QEMU may write to the host persistent memory (e.g. in vNVDIMM label
emulation and live migration), so it must take the proper operations
to ensure the persistence of its own writes. Depending on the CPU
models and available instructions, the optimal operation can vary [2].
PMDK libpmem has already implemented those operations on multiple CPU
models (x86 and ARM) and the logic to select the optimal ones, so QEMU
can just use libpmem rather than re-implement them.

[1] PMDK (formerly known as NVML), https://github.com/pmem/pmdk/
[2] 
https://github.com/pmem/pmdk/blob/38bfa652721a37fd94c0130ce0e3f5d8baa3ed40/src/libpmem/pmem.c#L33

Signed-off-by: Junyan He 
Signed-off-by: Haozhong Zhang 
---
 configure | 29 +
 1 file changed, 29 insertions(+)

diff --git a/configure b/configure
index a6a4616..f44d669 100755
--- a/configure
+++ b/configure
@@ -456,6 +456,7 @@ jemalloc="no"
 replication="yes"
 vxhs=""
 libxml2=""
+libpmem=""
 
 supported_cpu="no"
 supported_os="no"
@@ -1381,6 +1382,10 @@ for opt do
   ;;
   --disable-git-update) git_update=no
   ;;
+  --enable-libpmem) libpmem=yes
+  ;;
+  --disable-libpmem) libpmem=no
+  ;;
   *)
   echo "ERROR: unknown option $opt"
   echo "Try '$0 --help' for more information"
@@ -1638,6 +1643,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   crypto-afalgLinux AF_ALG crypto backend driver
   vhost-user  vhost-user support
   capstonecapstone disassembler support
+  libpmem libpmem support
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -5445,6 +5451,24 @@ EOF
 fi
 
 ##
+# check for libpmem
+
+if test "$libpmem" != "no"; then
+   if $pkg_config --exists "libpmem"; then
+   libpmem="yes"
+   libpmem_libs=$($pkg_config --libs libpmem)
+   libpmem_cflags=$($pkg_config --cflags libpmem)
+   libs_softmmu="$libs_softmmu $libpmem_libs"
+   QEMU_CFLAGS="$QEMU_CFLAGS $libpmem_cflags"
+   else
+   if test "$libpmem" = "yes" ; then
+   feature_not_found "libpmem" "Install nvml or pmdk"
+   fi
+   libpmem="no"
+   fi
+fi
+
+##
 # End of CC checks
 # After here, no more $cc or $ld runs
 
@@ -5907,6 +5931,7 @@ echo "avx2 optimization $avx2_opt"
 echo "replication support $replication"
 echo "VxHS block device $vxhs"
 echo "capstone  $capstone"
+echo "libpmem support   $libpmem"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -6651,6 +6676,10 @@ if test "$vxhs" = "yes" ; then
   echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak
 fi
 
+if test "$libpmem" = "yes" ; then
+  echo "CONFIG_LIBPMEM=y" >> $config_host_mak
+fi
+
 if test "$tcg_interpreter" = "yes"; then
   QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES"
 elif test "$ARCH" = "sparc64" ; then
-- 
2.7.4




[Qemu-devel] [PATCH V6 5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation

2018-06-01 Thread junyan . he
From: Junyan He 

Guest writes to vNVDIMM labels are intercepted and performed on the
backend by QEMU. When the backend is a real persistent memory, QEMU
needs to take proper operations to ensure its write persistence on the
persistent memory. Otherwise, a host power failure may result in the
loss of guest label configurations.

Signed-off-by: Haozhong Zhang 
---
 hw/mem/nvdimm.c |  9 -
 include/qemu/pmem.h | 23 +++
 stubs/Makefile.objs |  1 +
 stubs/pmem.c| 19 +++
 4 files changed, 51 insertions(+), 1 deletion(-)
 create mode 100644 include/qemu/pmem.h
 create mode 100644 stubs/pmem.c

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 4087aca..03b478e 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -23,6 +23,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/pmem.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "hw/mem/nvdimm.h"
@@ -155,11 +156,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, 
const void *buf,
 {
 MemoryRegion *mr;
 PCDIMMDevice *dimm = PC_DIMM(nvdimm);
+bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem),
+"pmem", NULL);
 uint64_t backend_offset;
 
 nvdimm_validate_rw_label_data(nvdimm, size, offset);
 
-memcpy(nvdimm->label_data + offset, buf, size);
+if (!is_pmem) {
+memcpy(nvdimm->label_data + offset, buf, size);
+} else {
+pmem_memcpy_persist(nvdimm->label_data + offset, buf, size);
+}
 
 mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort);
 backend_offset = memory_region_size(mr) - nvdimm->label_size + offset;
diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
new file mode 100644
index 000..00d6680
--- /dev/null
+++ b/include/qemu/pmem.h
@@ -0,0 +1,23 @@
+/*
+ * QEMU header file for libpmem.
+ *
+ * Copyright (c) 2018 Intel Corporation.
+ *
+ * Author: Haozhong Zhang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_PMEM_H
+#define QEMU_PMEM_H
+
+#ifdef CONFIG_LIBPMEM
+#include 
+#else  /* !CONFIG_LIBPMEM */
+
+void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len);
+
+#endif /* CONFIG_LIBPMEM */
+
+#endif /* !QEMU_PMEM_H */
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 53d3f32..be9a042 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -43,3 +43,4 @@ stub-obj-y += xen-common.o
 stub-obj-y += xen-hvm.o
 stub-obj-y += pci-host-piix.o
 stub-obj-y += ram-block.o
+stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o
\ No newline at end of file
diff --git a/stubs/pmem.c b/stubs/pmem.c
new file mode 100644
index 000..b4ec72d
--- /dev/null
+++ b/stubs/pmem.c
@@ -0,0 +1,19 @@
+/*
+ * Stubs for libpmem.
+ *
+ * Copyright (c) 2018 Intel Corporation.
+ *
+ * Author: Haozhong Zhang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include 
+
+#include "qemu/pmem.h"
+
+void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len)
+{
+return memcpy(pmemdest, src, len);
+}
-- 
2.7.4
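
The label write path in this patch can be sketched in isolation as follows. This is only an illustration under stated assumptions: sketch_memcpy_persist() stands in for the stub fallback (plain memcpy, as in stubs/pmem.c above), and sketch_write_label() with its explicit bounds check is a hypothetical helper, not the nvdimm code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical fallback for builds without libpmem: a plain memcpy.
 * With libpmem, pmem_memcpy_persist() would also flush the copied
 * range to persistent memory.
 */
static void *sketch_memcpy_persist(void *dest, const void *src, size_t len)
{
    return memcpy(dest, src, len);
}

/*
 * Write 'size' bytes at 'offset' into the label area, choosing the
 * persistent copy only when the backend is flagged as real pmem.
 */
static void sketch_write_label(unsigned char *label_data, size_t label_size,
                               size_t offset, const void *buf, size_t size,
                               bool is_pmem)
{
    /* The write must stay inside the label area. */
    assert(offset <= label_size && size <= label_size - offset);

    if (is_pmem) {
        sketch_memcpy_persist(label_data + offset, buf, size);
    } else {
        memcpy(label_data + offset, buf, size);
    }
}
```

Both branches produce the same bytes in memory; the difference on real persistent memory is only whether the data is guaranteed to be durable when the call returns.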




[Qemu-devel] [PATCH V6 6/7] migration/ram: Add check and info message to nvdimm post copy.

2018-06-01 Thread junyan . he
From: Junyan He 

NVDIMM-backed memory does not support postcopy yet.
Disable postcopy if any NVDIMM memory is present and print an
informational message for the user.

Signed-off-by: Junyan He 
---
 migration/ram.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index c53e836..aa0c6f0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3397,6 +3397,15 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 
 static bool ram_has_postcopy(void *opaque)
 {
+RAMBlock *rb;
+RAMBLOCK_FOREACH(rb) {
+if (ramblock_is_pmem(rb)) {
+info_report("Block: %s, host: %p is a nvdimm memory, postcopy "
+ "is not supported now!", rb->idstr, rb->host);
+return false;
+}
+}
+
 return migrate_postcopy_ram();
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH V6 3/7] hostmem-file: add the 'pmem' option

2018-06-01 Thread junyan . he
From: Junyan He 

When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it
needs to know whether the backend storage is a real persistent memory,
in order to decide whether special operations should be performed to
ensure the data persistence.

This boolean option 'pmem' allows users to specify whether the backend
storage of memory-backend-file is a real persistent memory. If
'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the
corresponding memory region.

Signed-off-by: Junyan He 
Signed-off-by: Haozhong Zhang 
---
 backends/hostmem-file.c | 27 ++-
 docs/nvdimm.txt | 14 ++
 exec.c  |  9 +
 include/exec/memory.h   |  6 ++
 include/exec/ram_addr.h |  3 +++
 qemu-options.hx |  7 +++
 6 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 34c68bb..ccca7a1 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -34,6 +34,7 @@ struct HostMemoryBackendFile {
 bool discard_data;
 char *mem_path;
 uint64_t align;
+bool is_pmem;
 };
 
 static void
@@ -59,7 +60,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
  path,
  backend->size, fb->align,
- backend->share ? RAM_SHARED : 0,
+ (backend->share ? RAM_SHARED : 0) |
+ (fb->is_pmem ? RAM_PMEM : 0),
  fb->mem_path, errp);
 g_free(path);
 }
@@ -131,6 +133,26 @@ static void file_memory_backend_set_align(Object *o, 
Visitor *v,
 error_propagate(errp, local_err);
 }
 
+static bool file_memory_backend_get_pmem(Object *o, Error **errp)
+{
+return MEMORY_BACKEND_FILE(o)->is_pmem;
+}
+
+static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(o);
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+
+if (host_memory_backend_mr_inited(backend)) {
+error_setg(errp, "cannot change property 'pmem' of %s '%s'",
+   object_get_typename(o),
+   object_get_canonical_path_component(OBJECT(backend)));
+return;
+}
+
+fb->is_pmem = value;
+}
+
 static void file_backend_unparent(Object *obj)
 {
 HostMemoryBackend *backend = MEMORY_BACKEND(obj);
@@ -162,6 +184,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
 file_memory_backend_get_align,
 file_memory_backend_set_align,
 NULL, NULL, &error_abort);
+object_class_property_add_bool(oc, "pmem",
+file_memory_backend_get_pmem, file_memory_backend_set_pmem,
+&error_abort);
 }
 
 static void file_backend_instance_finalize(Object *o)
diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index e903d8b..bcb2032 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -153,3 +153,17 @@ guest NVDIMM region mapping structure.  This unarmed flag 
indicates
 guest software that this vNVDIMM device contains a region that cannot
 accept persistent writes. In result, for example, the guest Linux
 NVDIMM driver, marks such vNVDIMM device as read-only.
+
+If the vNVDIMM backend is on the host persistent memory that can be
+accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's
+suggested to set the 'pmem' option of memory-backend-file to 'on'. When
+'pmem=on' and QEMU is built with libpmem [2] support (configured with
+--enable-libpmem), QEMU will take necessary operations to guarantee
+the persistence of its own writes to the vNVDIMM backend (e.g., in
+vNVDIMM label emulation and live migration).
+
+References
+--
+
+[1] SNIA NVM Programming Model: 
https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf
+[2] PMDK: http://pmem.io/pmdk/
diff --git a/exec.c b/exec.c
index f2082fa..f066705 100644
--- a/exec.c
+++ b/exec.c
@@ -2061,6 +2061,9 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, 
MemoryRegion *mr,
 Error *local_err = NULL;
 int64_t file_size;
 
+/* Just support these ram flags by now. */
+assert(ram_flags == 0 || (ram_flags & (RAM_SHARED | RAM_PMEM)));
+
 if (xen_enabled()) {
 error_setg(errp, "-mem-path not supported with Xen");
 return NULL;
@@ -3971,6 +3974,11 @@ err:
 return ret;
 }
 
+bool ramblock_is_pmem(RAMBlock *rb)
+{
+return rb->flags & RAM_PMEM;
+}
+
 #endif
 
 void page_size_init(void)
@@ -4069,3 +4077,4 @@ void mtree_print_dispatch(fprintf_function mon, void *f,
 }
 
 #endif
+
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3b68a43..6523512 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -119,6 +119,11 @@ typedef struct IOMMUNotifier IOMMUNotifier;
  */
 #define RAM_UF_ZEROPAGE (1 << 3)
 
+/* QEMU_RAM_PMEM is avail
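
As a side note on the flag handling in patches 2/7 and 3/7, here is a standalone sketch of building ram_flags from the two booleans and of two ways to validate them. RAM_SHARED uses the value shown in patch 1/7; SKETCH_RAM_PMEM is an assumed value, since the definition is cut off in this message. If the intent of the assert in qemu_ram_alloc_from_fd() is to reject unsupported bits, a mask test as in sketch_flags_supported() may be closer to that intent, because the form used in the patch also accepts values that mix a supported bit with an unsupported one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAM_SHARED      (1 << 1)  /* value as defined in patch 1/7 */
#define SKETCH_RAM_PMEM (1 << 4)  /* assumed value, not from the patch */

#define SKETCH_SUPPORTED_FLAGS ((uint64_t)(RAM_SHARED | SKETCH_RAM_PMEM))

/* Strict check: true only if no bit outside the supported set is set. */
static bool sketch_flags_supported(uint64_t ram_flags)
{
    return (ram_flags & ~SKETCH_SUPPORTED_FLAGS) == 0;
}

/* The shape of the check used in the patch, kept for comparison. */
static bool sketch_flags_patch_check(uint64_t ram_flags)
{
    return ram_flags == 0 || (ram_flags & SKETCH_SUPPORTED_FLAGS) != 0;
}

/* Convert the backend's boolean properties into a flags word. */
static uint64_t sketch_file_ram_flags(bool share, bool is_pmem)
{
    uint64_t flags = (share ? RAM_SHARED : 0) |
                     (is_pmem ? SKETCH_RAM_PMEM : 0);

    assert(sketch_flags_supported(flags));
    return flags;
}
```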

Re: [Qemu-devel] [PATCH v2 01/20] cutils: Provide strchrnul

2018-06-01 Thread Greg Kurz
On Thu, 31 May 2018 21:25:56 -0400
Keno Fischer  wrote:

> strchrnul is a GNU extension and thus unavailable on a number of targets.
> In the review for a commit removing strchrnul from 9p, I was asked to
> create a qemu_strchrnul helper to factor out this functionality.
> Do so, and use it in a number of other places in the code base that inlined
> the replacement pattern in a place where strchrnul could be used.
> 
> Signed-off-by: Keno Fischer 
> ---
> 

And possibly we could detect in configure if the host has strchrnul() and
use it, but this optimization can be done later.

I haven't checked if there could be other candidates in the current code
base though. Also, this patch touches some other subsystems, so I'm Cc'ing
to the other maintainers as reported by ./scripts/get_maintainer.pl:

Greg Kurz  (supporter:virtio-9p)
Markus Armbruster  (supporter:QMP)
"Dr. David Alan Gilbert"  (maintainer:Human Monitor (HMP))
qemu-devel@nongnu.org (open list:All patches CC here)

Anyway,

Acked-by: Greg Kurz 
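
For reference, a standalone sketch of the helper and of the '|'-separated list matching that monitor.c uses it for. The sketch_ names are made up for this example so that it compiles outside the QEMU tree; the logic mirrors the quoted patch below.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Like strchr(), but return a pointer to the terminating null byte
 * instead of NULL when 'c' does not occur in 's'.
 */
static const char *sketch_strchrnul(const char *s, int c)
{
    const char *e = strchr(s, c);

    return e ? e : s + strlen(s);
}

/* Return true if 'name' equals one of the '|'-separated entries in 'list'. */
static bool sketch_name_in_list(const char *name, const char *list)
{
    const char *p = list;
    size_t len;

    assert(name != NULL && list != NULL);
    len = strlen(name);

    for (;;) {
        const char *start = p;

        p = sketch_strchrnul(p, '|');
        if ((size_t)(p - start) == len && memcmp(start, name, len) == 0) {
            return true;
        }
        if (*p == '\0') {
            return false;   /* reached the end of the list */
        }
        p++;                /* skip the '|' separator */
    }
}
```

Because the helper never returns NULL, the caller can compute the entry length as a pointer difference without a separate "not found" branch.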

> Changes since v1: New patch
> 
>  hw/9pfs/9p-local.c|  2 +-
>  include/qemu/cutils.h |  1 +
>  monitor.c |  8 ++--
>  util/cutils.c | 13 +
>  util/qemu-option.c|  6 +-
>  util/uri.c|  6 ++
>  6 files changed, 20 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
> index b37b1db..bcf2798 100644
> --- a/hw/9pfs/9p-local.c
> +++ b/hw/9pfs/9p-local.c
> @@ -65,7 +65,7 @@ int local_open_nofollow(FsContext *fs_ctx, const char 
> *path, int flags,
>  assert(*path != '/');
>  
>  head = g_strdup(path);
> -c = strchrnul(path, '/');
> +c = qemu_strchrnul(path, '/');
>  if (*c) {
>  /* Intermediate path element */
>  head[c - path] = 0;
> diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
> index a663340..bc40c30 100644
> --- a/include/qemu/cutils.h
> +++ b/include/qemu/cutils.h
> @@ -122,6 +122,7 @@ int qemu_strnlen(const char *s, int max_len);
>   * Returns: the pointer originally in @input.
>   */
>  char *qemu_strsep(char **input, const char *delim);
> +const char *qemu_strchrnul(const char *s, int c);
>  time_t mktimegm(struct tm *tm);
>  int qemu_fdatasync(int fd);
>  int fcntl_setfl(int fd, int flag);
> diff --git a/monitor.c b/monitor.c
> index 922cfc0..e1f01c4 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -798,9 +798,7 @@ static int compare_cmd(const char *name, const char *list)
>  p = list;
>  for(;;) {
>  pstart = p;
> -p = strchr(p, '|');
> -if (!p)
> -p = pstart + strlen(pstart);
> +p = qemu_strchrnul(p, '|');
>  if ((p - pstart) == len && !memcmp(pstart, name, len))
>  return 1;
>  if (*p == '\0')
> @@ -3401,9 +3399,7 @@ static void cmd_completion(Monitor *mon, const char 
> *name, const char *list)
>  p = list;
>  for(;;) {
>  pstart = p;
> -p = strchr(p, '|');
> -if (!p)
> -p = pstart + strlen(pstart);
> +p = qemu_strchrnul(p, '|');
>  len = p - pstart;
>  if (len > sizeof(cmd) - 2)
>  len = sizeof(cmd) - 2;
> diff --git a/util/cutils.c b/util/cutils.c
> index 0de69e6..6e078b0 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -545,6 +545,19 @@ int qemu_strtou64(const char *nptr, const char **endptr, 
> int base,
>  }
>  
>  /**
> + * Searches for the first occurrence of 'c' in 's', and returns a pointer
> + * to the trailing null byte if none was found.
> + */
> +const char *qemu_strchrnul(const char *s, int c)
> +{
> +const char *e = strchr(s, c);
> +if (!e) {
> +e = s + strlen(s);
> +}
> +return e;
> +}
> +
> +/**
>   * parse_uint:
>   *
>   * @s: String to parse
> diff --git a/util/qemu-option.c b/util/qemu-option.c
> index 58d1c23..54eca12 100644
> --- a/util/qemu-option.c
> +++ b/util/qemu-option.c
> @@ -77,11 +77,7 @@ const char *get_opt_value(const char *p, char **value)
>  
>  *value = NULL;
>  while (1) {
> -offset = strchr(p, ',');
> -if (!offset) {
> -offset = p + strlen(p);
> -}
> -
> +offset = qemu_strchrnul(p, ',');
>  length = offset - p;
>  if (*offset != '\0' && *(offset + 1) == ',') {
>  length++;
> diff --git a/util/uri.c b/util/uri.c
> index 8624a7a..8bdef84 100644
> --- a/util/uri.c
> +++ b/util/uri.c
> @@ -52,6 +52,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/cutils.h"
>  
>  #include "qemu/uri.h"
>  
> @@ -2266,10 +2267,7 @@ struct QueryParams *query_params_parse(const char 
> *query)
>  /* Find the next separator, or end of the string. */
>  end = strchr(query, '&');
>  if (!end) {
> -end = strchr(query, ';');
> -}
> -if (!end) {
> -end = query + strlen(query);
> +end = qemu_strchrnul(query, ';');
>  }
>  
>  /* Find the first '=' char

[Qemu-devel] [PATCH V6 7/7] migration/ram: ensure write persistence on loading all data to PMEM.

2018-06-01 Thread junyan . he
From: Junyan He 

Because we need to make sure that data in pmem-backed memory is synced
after migration, we call pmem_persist() when the migration
finishes. This makes sure the pmem data is safe and will not be
lost if power goes off.

Signed-off-by: Junyan He 
---
 include/qemu/pmem.h | 1 +
 migration/ram.c | 8 
 stubs/pmem.c| 4 
 3 files changed, 13 insertions(+)

diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
index 00d6680..b1e1b5c 100644
--- a/include/qemu/pmem.h
+++ b/include/qemu/pmem.h
@@ -17,6 +17,7 @@
 #else  /* !CONFIG_LIBPMEM */
 
 void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len);
+void *pmem_persist(const void *addr, size_t len);
 
 #endif /* CONFIG_LIBPMEM */
 
diff --git a/migration/ram.c b/migration/ram.c
index aa0c6f0..09525b2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -33,6 +33,7 @@
 #include "qemu/bitops.h"
 #include "qemu/bitmap.h"
 #include "qemu/main-loop.h"
+#include "qemu/pmem.h"
 #include "xbzrle.h"
 #include "ram.h"
 #include "migration.h"
@@ -3046,6 +3047,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque)
 static int ram_load_cleanup(void *opaque)
 {
 RAMBlock *rb;
+
+RAMBLOCK_FOREACH(rb) {
+if (ramblock_is_pmem(rb)) {
+pmem_persist(rb->host, rb->used_length);
+ }
+}
+
 xbzrle_load_cleanup();
 compress_threads_load_cleanup();
 
diff --git a/stubs/pmem.c b/stubs/pmem.c
index b4ec72d..c5bc6d6 100644
--- a/stubs/pmem.c
+++ b/stubs/pmem.c
@@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, 
size_t len)
 {
 return memcpy(pmemdest, src, len);
 }
+
+void *pmem_persist(const void *addr, size_t len)
+{
+}
-- 
2.7.4
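
The cleanup step in this patch can be illustrated outside of QEMU with a minimal sketch. Everything named demo_* below is a hypothetical stand-in, not QEMU or libpmem code: DemoBlock models only the three RAMBlock properties the patch touches, and demo_persist() merely counts bytes where the real pmem_persist() would flush them. Note that libpmem declares pmem_persist() with a void return type, which the stand-in mirrors.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a RAM block; QEMU's RAMBlock has many more fields. */
typedef struct DemoBlock {
    void *host;              /* start of the block's host mapping */
    size_t used_length;      /* bytes of the mapping that are in use */
    bool is_pmem;            /* true if backed by persistent memory */
    struct DemoBlock *next;  /* next block in the list, or NULL */
} DemoBlock;

/* Bytes "persisted" so far; only here so the sketch has observable output. */
static size_t demo_persisted_bytes;

/* Stand-in for pmem_persist(): a real build would flush the range to media. */
static void demo_persist(const void *addr, size_t len)
{
    (void)addr;
    demo_persisted_bytes += len;
}

/* After all data is loaded, flush every pmem-backed block exactly once. */
static void demo_load_cleanup(DemoBlock *head)
{
    for (DemoBlock *b = head; b; b = b->next) {
        if (b->is_pmem) {
            demo_persist(b->host, b->used_length);
        }
    }
}
```

Flushing once at the end, rather than after every page write, keeps the flush cost off the hot path of the load loop; whether that trade-off is right for a given setup is a design question for the series itself.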




Re: [Qemu-devel] Recording I/O activity after KVM does a VMEXIT

2018-06-01 Thread Arnabjyoti Kalita
Dear Pavel,

Thank you for providing me with all the details. Let us take the example of
a network packet. In icount mode, when the network backend receives a
network packet, you record the whole packet with the help of the
replay-filter, and the packet is written to the log file. When the time
comes for replay, you stop accepting any packets from the network backend
and directly inject all of the packets that you have already recorded in
the log file into the guest address space. Is my understanding correct?
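
A minimal sketch of the record-then-inject idea described above, under stated assumptions: the log is a fixed-size in-memory array rather than a file, and all demo_* names are hypothetical and unrelated to QEMU's actual replay code. In record mode the whole packet is copied into the log; in replay mode the live backend is ignored and the next logged packet is handed back instead.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PACKETS 8
#define DEMO_MAX_LEN 64

/* One recorded packet: the whole payload, not just a summary of it. */
typedef struct {
    size_t len;
    unsigned char data[DEMO_MAX_LEN];
} DemoPacket;

/* Hypothetical in-memory log; a real implementation would use a file. */
typedef struct {
    DemoPacket entries[DEMO_MAX_PACKETS];
    size_t count;  /* number of packets recorded */
    size_t next;   /* index of the next packet to replay */
} DemoLog;

/* Record mode: copy an incoming packet into the log.
 * Returns 0 on success, -1 if the log is full or the packet is too big. */
static int demo_record(DemoLog *log, const void *buf, size_t len)
{
    if (log->count == DEMO_MAX_PACKETS || len > DEMO_MAX_LEN) {
        return -1;
    }
    DemoPacket *p = &log->entries[log->count++];
    p->len = len;
    memcpy(p->data, buf, len);
    return 0;
}

/* Replay mode: hand back the next logged packet in recording order.
 * Returns its length, or 0 if the log is exhausted or 'out' is too small. */
static size_t demo_replay(DemoLog *log, void *out, size_t cap)
{
    if (log->next == log->count) {
        return 0;
    }
    const DemoPacket *p = &log->entries[log->next];
    if (p->len > cap) {
        return 0;
    }
    log->next++;
    memcpy(out, p->data, p->len);
    return p->len;
}
```

Because the packet is reinjected at the same boundary where it was captured, the device model downstream of that point only has to behave deterministically for the CPU to see the same input again.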

Thanks and Regards,
Arnab

On Fri, Jun 1, 2018 at 1:31 AM, Pavel Dovgalyuk  wrote:

> Hi,
>
>
>
> I’m not familiar with KVM, but I know of successful attempts at replaying
> execution by logging IO and MMIO in TCG mode.
>
> The difference between CPU I/O and VM I/O is the following. In icount we
> record anything coming into the VM, but not into the CPU.
>
> It means that the whole packet is recorded. Virtual hardware behaves
> deterministically, and therefore the CPU will get identical input in case
> of replay, because the whole recorded packet is injected again by the
> filter.
>
>
>
> Pavel Dovgalyuk
>
>
>
> *From:* Arnabjyoti Kalita [mailto:akal...@cs.stonybrook.edu]
> *Sent:* Thursday, May 31, 2018 11:14 PM
> *To:* Pavel Dovgalyuk
> *Cc:* Stefan Hajnoczi; qemu-devel@nongnu.org; Pavel Dovgalyuk
> *Subject:* Re: [Qemu-devel] Recording I/O activity after KVM does a VMEXIT
>
>
>
> Dear Pavel,
>
>
>
> Thank you for your answer. I am not able to understand the
> difference between CPU I/Os and VM I/Os. Would any network packet that
> comes into the Guest OS from the outside be a part of VM I/O or CPU I/O ? I
> am only interested in "recording" and "replaying" those network packets
> that come from the outside into the networking backend and not the other
> way around. Say for example when I get a VMExit because of the arrival of a
> network packet, I will use the VMExit reason : "KVM_EXIT_MMIO"  to trace
> back to "e1000_mmio_write()" which I expect should be enough to record
> network packets that come from the outside and write to the guest address
> space for "e1000" devices. In such a case, I think I will not have to use
> the "network-filter" backend that you use to record VM I/O only. Let me
> know if you find errors in my approach.
>
>
>
> I will try to see how I can record disk packets. If disk packets use other
> ways of writing to the guest memory apart from a normal VMExit, I will try
> to find it out. Eventually I hope that it will use one of the available
> disk front-end functions to write to the guest memory from the disk, just
> like e1000 does with an "e1000_mmio_write()" call.
>
>
>
> Thanks and best regards,
>
> Arnab
>
>
> On Thu, May 31, 2018 at 8:44 AM, Pavel Dovgalyuk 
> wrote:
>
> > From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
> > On Wed, May 30, 2018 at 11:19:13PM -0400, Arnabjyoti Kalita wrote:
> > > I am trying to implement a 'minimal' record-replay mechanism for KVM,
> > > which is similar to the one existing for TCG via -icount. I am trying
> > > to record I/O events only (specifically disk and network events) when
> > > KVM does a VMEXIT. This has led me to the function kvm_cpu_exec where
> > > I can clearly see the different ways of handling all of the possible
> > > VMExit cases (like PIO, MMIO etc.). To record network packets, I am
> > > working with the e1000 hardware device.
> > >
> > > Can I make sure that all of the network I/O, at least for the e1000
> > > device, happens through the KVM_EXIT_MMIO case and subsequent use of
> > > the address_space_rw() function? Do I also need to look at other
> > > functions as well? Also for recording disk activity, can I make sure
> > > that looking out for the KVM_EXIT_MMIO and/or KVM_EXIT_PIO cases in
> > > the vmexit mechanism will be enough?
> > >
> > > Let me know if there are other details that I need to take care of. I
> > > am using QEMU 2.11 on a x86-64 CPU and the guest runs a Linux Kernel
> > > 4.4 with Ubuntu 16.04.
>
> The main icount-based record/replay advantage is that we don't record
> any CPU IO. We record only VM IO (e.g., by using the network filter).
>
> Disk devices may transfer data to CPU using DMA, therefore intercepting
> only VMExit cases will not be enough.
>
> Pavel Dovgalyuk
>
>
>


Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory

2018-06-01 Thread no-reply
Hi,

This series failed docker-mingw@fedora build test. Please find the testing
commands and their output below. If you have Docker installed, you can
probably reproduce it locally.

Type: series
Message-id: 1527840629-18648-1-git-send-email-junyan...@gmx.com
Subject: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
4ddbee3d9b migration/ram: ensure write persistence on loading all data to PMEM.
1e8b882644 migration/ram: Add check and info message to nvdimm post copy.
fbc70b5463 mem/nvdimm: ensure write persistence to PMEM in label emulation
10e5fa15d3 configure: add libpmem support
8f7f12852b hostmem-file: add the 'pmem' option
ab99d47fbf memory, exec: switch file ram allocation functions to 'flags' 
parameters
08c1d813ef memory, exec: Expose all memory block related flags.

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-4kz3s9ce/src/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
  BUILD   fedora
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-4kz3s9ce/src'
  GEN 
/var/tmp/patchew-tester-tmp-4kz3s9ce/src/docker-src.2018-06-01-04.25.22.560/qemu.tar
Cloning into 
'/var/tmp/patchew-tester-tmp-4kz3s9ce/src/docker-src.2018-06-01-04.25.22.560/qemu.tar.vroot'...
done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 
'/var/tmp/patchew-tester-tmp-4kz3s9ce/src/docker-src.2018-06-01-04.25.22.560/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered 
for path 'ui/keycodemapdb'
Cloning into 
'/var/tmp/patchew-tester-tmp-4kz3s9ce/src/docker-src.2018-06-01-04.25.22.560/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out 
'6b3d716e2b6472eb7189d3220552280ef3d832ce'
  COPYRUNNER
RUN test-mingw in qemu:fedora 
Packages installed:
PyYAML-3.12-5.fc27.x86_64
SDL2-devel-2.0.7-2.fc27.x86_64
bc-1.07.1-3.fc27.x86_64
bison-3.0.4-8.fc27.x86_64
bluez-libs-devel-5.48-3.fc27.x86_64
brlapi-devel-0.6.6-8.fc27.x86_64
bzip2-1.0.6-24.fc27.x86_64
bzip2-devel-1.0.6-24.fc27.x86_64
ccache-3.3.6-1.fc27.x86_64
clang-5.0.1-5.fc27.x86_64
device-mapper-multipath-devel-0.7.1-9.git847cc43.fc27.x86_64
findutils-4.6.0-16.fc27.x86_64
flex-2.6.1-5.fc27.x86_64
gcc-7.3.1-5.fc27.x86_64
gcc-c++-7.3.1-5.fc27.x86_64
gettext-0.19.8.1-12.fc27.x86_64
git-2.14.3-3.fc27.x86_64
glib2-devel-2.54.3-2.fc27.x86_64
glusterfs-api-devel-3.12.7-1.fc27.x86_64
gnutls-devel-3.5.18-2.fc27.x86_64
gtk3-devel-3.22.26-2.fc27.x86_64
hostname-3.18-4.fc27.x86_64
libaio-devel-0.3.110-9.fc27.x86_64
libasan-7.3.1-5.fc27.x86_64
libattr-devel-2.4.47-21.fc27.x86_64
libcap-devel-2.25-7.fc27.x86_64
libcap-ng-devel-0.7.8-5.fc27.x86_64
libcurl-devel-7.55.1-10.fc27.x86_64
libfdt-devel-1.4.6-1.fc27.x86_64
libpng-devel-1.6.31-1.fc27.x86_64
librbd-devel-12.2.4-1.fc27.x86_64
libssh2-devel-1.8.0-5.fc27.x86_64
libubsan-7.3.1-5.fc27.x86_64
libusbx-devel-1.0.21-4.fc27.x86_64
libxml2-devel-2.9.7-1.fc27.x86_64
llvm-5.0.1-6.fc27.x86_64
lzo-devel-2.08-11.fc27.x86_64
make-4.2.1-4.fc27.x86_64
mingw32-SDL-1.2.15-9.fc27.noarch
mingw32-bzip2-1.0.6-9.fc27.noarch
mingw32-curl-7.54.1-2.fc27.noarch
mingw32-glib2-2.54.1-1.fc27.noarch
mingw32-gmp-6.1.2-2.fc27.noarch
mingw32-gnutls-3.5.13-2.fc27.noarch
mingw32-gtk2-2.24.31-4.fc27.noarch
mingw32-gtk3-3.22.16-1.fc27.noarch
mingw32-libjpeg-turbo-1.5.1-3.fc27.noarch
mingw32-libpng-1.6.29-2.fc27.noarch
mingw32-libssh2-1.8.0-3.fc27.noarch
mingw32-libtasn1-4.13-1.fc27.noarch
mingw32-nettle-3.3-3.fc27.noarch
mingw32-pixman-0.34.0-3.fc27.noarch
mingw32-pkg-config-0.28-9.fc27.x86_64
mingw64-SDL-1.2.15-9.fc27.noarch
mingw64-bzip2-1.0.6-9.fc27.noarch
mingw64-curl-7.54.1-2.fc27.noarch
mingw64-glib2-2.54.1-1.fc27.noarch
mingw64-gmp-6.1.2-2.fc27.noarch
mingw64-gnutls-3.5.13-2.fc27.noarch
mingw64-gtk2-2.24.31-4.fc27.noarch
mingw64-gtk3-3.22.16-1.fc27.noarch
mingw64-libjpeg-turbo-1.5.1-3.fc27.noarch
mingw64-libpng-1.6.29-2.fc27.noarch
mingw64-libssh2-1.8.0-3.fc27.noarch
mingw64-libtasn1-4.13-1.fc27.noarch
mingw64-nettle-3.3-3.fc27.noarch
mingw64-pixman-0.34.0-3.fc27.noarch
mingw64-pkg-config-0.28-9.fc27.x86_64
ncurses-devel-6.0-13.20170722.fc27.x86_64
nettle-devel-3.4-1.fc27.x86_64
nss-devel-3.36.0-1.0.fc27.x86_64
numactl-devel-2.0.11-5.fc27.x86_64
package libjpeg-devel is not installed
perl-5.26.1-403.fc27.x86_64
pixman-devel-0.34.0-4.fc27.x86_64
python3-3.6.2-13.fc27.x86_64
snappy-devel-1.1.4-5.fc27.x86_64
sparse-0.5.1-2.fc27.x86_64
spice-server-devel-0.14.0-1.fc27.x

Re: [Qemu-devel] [PATCH v8 00/11] qemu-img convert with copy offloading

2018-06-01 Thread no-reply
Hi,

This series failed build test on s390x host. Please find the details below.

Type: series
Message-id: 20180601062849.28641-1-f...@redhat.com
Subject: [Qemu-devel] [PATCH v8 00/11] qemu-img convert with copy offloading

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e
echo "=== ENV ==="
env
echo "=== PACKAGES ==="
rpm -qa
echo "=== TEST BEGIN ==="
CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
echo -n "Using CC: "
realpath $CC
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   patchew/20180601062849.28641-1-f...@redhat.com -> 
patchew/20180601062849.28641-1-f...@redhat.com
Switched to a new branch 'test'
5c03bb24cc qemu-img: Convert with copy offloading
51e732e62a block-backend: Add blk_co_copy_range
92edadadac iscsi: Implement copy offloading
6f8d57827a iscsi: Create and use iscsi_co_wait_for_task
5940e7ebff iscsi: Query and save device designator when opening
7619b4045d file-posix: Implement bdrv_co_copy_range
af0dbb042d qcow2: Implement copy offloading
d1550d576c raw: Implement copy offloading
61c824c950 raw: Check byte range uniformly
35c506afdd block: Introduce API for copy offloading
b5683be886 docker: Update fedora image to 28

=== OUTPUT BEGIN ===
=== ENV ===
LANG=en_US.UTF-8
XDG_SESSION_ID=212219
USER=fam
PWD=/var/tmp/patchew-tester-tmp-2l7s8dte/src
HOME=/home/fam
SHELL=/bin/sh
SHLVL=2
PATCHEW=/home/fam/patchew/patchew-cli -s http://patchew.org --nodebug
LOGNAME=fam
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1012/bus
XDG_RUNTIME_DIR=/run/user/1012
PATH=/usr/bin:/bin
_=/usr/bin/env
=== PACKAGES ===
gpg-pubkey-873529b8-54e386ff
glibc-debuginfo-common-2.24-10.fc25.s390x
fedora-release-26-1.noarch
dejavu-sans-mono-fonts-2.35-4.fc26.noarch
xemacs-filesystem-21.5.34-22.20170124hgf412e9f093d4.fc26.noarch
bash-4.4.12-7.fc26.s390x
libSM-1.2.2-5.fc26.s390x
libmpc-1.0.2-6.fc26.s390x
libaio-0.3.110-7.fc26.s390x
libverto-0.2.6-7.fc26.s390x
perl-Scalar-List-Utils-1.48-1.fc26.s390x
iptables-libs-1.6.1-2.fc26.s390x
tcl-8.6.6-2.fc26.s390x
libxshmfence-1.2-4.fc26.s390x
expect-5.45-23.fc26.s390x
perl-Thread-Queue-3.12-1.fc26.noarch
perl-encoding-2.19-6.fc26.s390x
keyutils-1.5.10-1.fc26.s390x
gmp-devel-6.1.2-4.fc26.s390x
enchant-1.6.0-16.fc26.s390x
python-gobject-base-3.24.1-1.fc26.s390x
python3-enchant-1.6.10-1.fc26.noarch
python-lockfile-0.11.0-6.fc26.noarch
python2-pyparsing-2.1.10-3.fc26.noarch
python2-lxml-4.1.1-1.fc26.s390x
librados2-10.2.7-2.fc26.s390x
trousers-lib-0.3.13-7.fc26.s390x
libdatrie-0.2.9-4.fc26.s390x
libsoup-2.58.2-1.fc26.s390x
passwd-0.79-9.fc26.s390x
bind99-libs-9.9.10-3.P3.fc26.s390x
python3-rpm-4.13.0.2-1.fc26.s390x
systemd-233-7.fc26.s390x
virglrenderer-0.6.0-1.20170210git76b3da97b.fc26.s390x
s390utils-ziomon-1.36.1-3.fc26.s390x
s390utils-osasnmpd-1.36.1-3.fc26.s390x
libXrandr-1.5.1-2.fc26.s390x
libglvnd-glx-1.0.0-1.fc26.s390x
texlive-ifxetex-svn19685.0.5-33.fc26.2.noarch
texlive-psnfss-svn33946.9.2a-33.fc26.2.noarch
texlive-dvipdfmx-def-svn40328-33.fc26.2.noarch
texlive-natbib-svn20668.8.31b-33.fc26.2.noarch
texlive-xdvi-bin-svn40750-33.20160520.fc26.2.s390x
texlive-cm-svn32865.0-33.fc26.2.noarch
texlive-beton-svn15878.0-33.fc26.2.noarch
texlive-fpl-svn15878.1.002-33.fc26.2.noarch
texlive-mflogo-svn38628-33.fc26.2.noarch
texlive-texlive-docindex-svn41430-33.fc26.2.noarch
texlive-luaotfload-bin-svn34647.0-33.20160520.fc26.2.noarch
texlive-koma-script-svn41508-33.fc26.2.noarch
texlive-pst-tree-svn24142.1.12-33.fc26.2.noarch
texlive-breqn-svn38099.0.98d-33.fc26.2.noarch
texlive-xetex-svn41438-33.fc26.2.noarch
gstreamer1-plugins-bad-free-1.12.3-1.fc26.s390x
xorg-x11-font-utils-7.5-33.fc26.s390x
ghostscript-fonts-5.50-36.fc26.noarch
libXext-devel-1.3.3-5.fc26.s390x
libusbx-devel-1.0.21-2.fc26.s390x
libglvnd-devel-1.0.0-1.fc26.s390x
emacs-25.3-3.fc26.s390x
alsa-lib-devel-1.1.4.1-1.fc26.s390x
kbd-2.0.4-2.fc26.s390x
dconf-0.26.0-2.fc26.s390x
mc-4.8.19-5.fc26.s390x
doxygen-1.8.13-9.fc26.s390x
dpkg-1.18.24-1.fc26.s390x
libtdb-1.3.13-1.fc26.s390x
python2-pynacl-1.1.1-1.fc26.s390x
perl-Filter-1.58-1.fc26.s390x
python2-pip-9.0.1-11.fc26.noarch
dnf-2.7.5-2.fc26.noarch
bind-license-9.11.2-1.P1.fc26.noarch
libtasn1-4.13-1.fc26.s390x
cpp-7.3.1-2.fc26.s390x
pkgconf-1.3.12-2.fc26.s390x
python2-fedora-0.10.0-1.fc26.noarch
cmake-filesystem-3.10.1-11.fc26.s390x
python3-requests-kerberos-0.12.0-1.fc26.noarch
libmicrohttpd-0.9.59-1.fc26.s390x
GeoIP-GeoLite-data-2018.01-1.fc26.noarch
python2-libs-2.7.14-7.fc26.s390x
libidn2-2.0.4-3.fc26.s390x
p11-kit-devel-0.23.10-1.fc26.s390x
perl-Errno-1.25-396.fc26.s390x
libdrm-2.4.90-2.fc26.s390x
sssd-common-1.16.1-1.fc26.s390x
boost-random-1.63.0-11.fc26.s390x
urw-fonts-2.4-

Re: [Qemu-devel] [PATCH v2 01/20] cutils: Provide strchrnul

2018-06-01 Thread Dr. David Alan Gilbert
* Greg Kurz (gr...@kaod.org) wrote:
> On Thu, 31 May 2018 21:25:56 -0400
> Keno Fischer  wrote:
> 
> > strchrnul is a GNU extension and thus unavailable on a number of targets.
> > In the review for a commit removing strchrnul from 9p, I was asked to
> > create a qemu_strchrnul helper to factor out this functionality.
> > Do so, and use it in a number of other places in the code base that inlined
> > the replacement pattern in a place where strchrnul could be used.
> > 
> > Signed-off-by: Keno Fischer 
> > ---
> > 
> 
> And possibly we could detect in configure if the host has strchrnul() and
> use it, but this optimization can be done later.
> 
> I haven't checked if there could be other candidates in the current code
> base though. Also, this patch touches some other subsystems, so I'm Cc'ing
> to the other maintainers as reported by ./scripts/get_maintainer.pl:

That looks fine from my point of view;  I can see you could probably
also use it in the code at the start of the while loop in hmp_sendkey:

while (1) {
separator = strchr(keys, '-');
keyname_len = separator ? separator - keys : strlen(keys);
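
A hedged sketch of how that loop might look with the helper. demo_strchrnul() repeats the fallback from the patch so the example stands alone, and demo_count_keys() is a hypothetical reduction of the key-list walk, not the real hmp_sendkey body. One assumption about the surrounding code: the end-of-list test has to check *separator rather than a NULL pointer, because the helper never returns NULL.

```c
#include <stddef.h>
#include <string.h>

/* Same fallback as qemu_strchrnul() in the patch, repeated so this compiles alone. */
static const char *demo_strchrnul(const char *s, int c)
{
    const char *e = strchr(s, c);
    if (!e) {
        e = s + strlen(s);
    }
    return e;
}

/* Walk a '-'-separated key list such as "ctrl-alt-delete".
 * Returns the number of key names and stores the longest name's length. */
static int demo_count_keys(const char *keys, size_t *longest)
{
    int n = 0;

    *longest = 0;
    while (1) {
        /* One call replaces the strchr() plus strlen() pair. */
        const char *separator = demo_strchrnul(keys, '-');
        size_t keyname_len = (size_t)(separator - keys);

        if (keyname_len > *longest) {
            *longest = keyname_len;
        }
        n++;

        /* The helper never returns NULL, so test for the terminator instead. */
        if (*separator == '\0') {
            break;
        }
        keys = separator + 1;
    }
    return n;
}
```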

Dave

> Greg Kurz  (supporter:virtio-9p)
> Markus Armbruster  (supporter:QMP)
> "Dr. David Alan Gilbert"  (maintainer:Human Monitor 
> (HMP))
> qemu-devel@nongnu.org (open list:All patches CC here)
> 
> Anyway,
> 
> Acked-by: Greg Kurz 
> 
> > Changes since v1: New patch
> > 
> >  hw/9pfs/9p-local.c|  2 +-
> >  include/qemu/cutils.h |  1 +
> >  monitor.c |  8 ++--
> >  util/cutils.c | 13 +
> >  util/qemu-option.c|  6 +-
> >  util/uri.c|  6 ++
> >  6 files changed, 20 insertions(+), 16 deletions(-)
> > 
> > diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
> > index b37b1db..bcf2798 100644
> > --- a/hw/9pfs/9p-local.c
> > +++ b/hw/9pfs/9p-local.c
> > @@ -65,7 +65,7 @@ int local_open_nofollow(FsContext *fs_ctx, const char 
> > *path, int flags,
> >  assert(*path != '/');
> >  
> >  head = g_strdup(path);
> > -c = strchrnul(path, '/');
> > +c = qemu_strchrnul(path, '/');
> >  if (*c) {
> >  /* Intermediate path element */
> >  head[c - path] = 0;
> > diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
> > index a663340..bc40c30 100644
> > --- a/include/qemu/cutils.h
> > +++ b/include/qemu/cutils.h
> > @@ -122,6 +122,7 @@ int qemu_strnlen(const char *s, int max_len);
> >   * Returns: the pointer originally in @input.
> >   */
> >  char *qemu_strsep(char **input, const char *delim);
> > +const char *qemu_strchrnul(const char *s, int c);
> >  time_t mktimegm(struct tm *tm);
> >  int qemu_fdatasync(int fd);
> >  int fcntl_setfl(int fd, int flag);
> > diff --git a/monitor.c b/monitor.c
> > index 922cfc0..e1f01c4 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -798,9 +798,7 @@ static int compare_cmd(const char *name, const char 
> > *list)
> >  p = list;
> >  for(;;) {
> >  pstart = p;
> > -p = strchr(p, '|');
> > -if (!p)
> > -p = pstart + strlen(pstart);
> > +p = qemu_strchrnul(p, '|');
> >  if ((p - pstart) == len && !memcmp(pstart, name, len))
> >  return 1;
> >  if (*p == '\0')
> > @@ -3401,9 +3399,7 @@ static void cmd_completion(Monitor *mon, const char 
> > *name, const char *list)
> >  p = list;
> >  for(;;) {
> >  pstart = p;
> > -p = strchr(p, '|');
> > -if (!p)
> > -p = pstart + strlen(pstart);
> > +p = qemu_strchrnul(p, '|');
> >  len = p - pstart;
> >  if (len > sizeof(cmd) - 2)
> >  len = sizeof(cmd) - 2;
> > diff --git a/util/cutils.c b/util/cutils.c
> > index 0de69e6..6e078b0 100644
> > --- a/util/cutils.c
> > +++ b/util/cutils.c
> > @@ -545,6 +545,19 @@ int qemu_strtou64(const char *nptr, const char 
> > **endptr, int base,
> >  }
> >  
> >  /**
> > + * Searches for the first occurrence of 'c' in 's', and returns a pointer
> > + * to the trailing null byte if none was found.
> > + */
> > +const char *qemu_strchrnul(const char *s, int c)
> > +{
> > +const char *e = strchr(s, c);
> > +if (!e) {
> > +e = s + strlen(s);
> > +}
> > +return e;
> > +}
> > +
> > +/**
> >   * parse_uint:
> >   *
> >   * @s: String to parse
> > diff --git a/util/qemu-option.c b/util/qemu-option.c
> > index 58d1c23..54eca12 100644
> > --- a/util/qemu-option.c
> > +++ b/util/qemu-option.c
> > @@ -77,11 +77,7 @@ const char *get_opt_value(const char *p, char **value)
> >  
> >  *value = NULL;
> >  while (1) {
> > -offset = strchr(p, ',');
> > -if (!offset) {
> > -offset = p + strlen(p);
> > -}
> > -
> > +offset = qemu_strchrnul(p, ',');
> >  length = offset - p;
> >  if (*offset != '\0' && *(offset + 1) == ',') {
> >  length++;
> > diff --git a/util/uri.c b/util/uri.c
> > index 8624a7a..8bdef84 100644
> > ---

Re: [Qemu-devel] [PATCH 00/33] linux-user: Begin splitting do_syscall

2018-06-01 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180601073050.8054-1-richard.hender...@linaro.org
Subject: [Qemu-devel] [PATCH 00/33] linux-user: Begin splitting do_syscall

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   
patchew/20180601073050.8054-1-richard.hender...@linaro.org -> 
patchew/20180601073050.8054-1-richard.hender...@linaro.org
Switched to a new branch 'test'
747f98cda4 linux-user: Split out rt_sigqueueinfo, rt_sigtimedwait, 
rt_tgsigqueueinfo
8c59503fd9 linux-user: Split out rt_sigpending, rt_sigsuspend, sigpending, 
sigsuspend
3a57cc334f linux-user: Split out rt_sigprocmask, sgetmask, sigprocmask, ssetmask
81bd8e94dd linux-user: Split out rt_sigaction, sigaction
0fc1031ca6 linux-user: Split out getpgrp, getppid, setsid
76b6ab61e4 linux-user: Split out chroot, dup2, dup3, fcntl, setpgid, umask
818843d921 linux-user: Split out ioctl
d3a9fa76f4 linux-user: Split out acct, pipe, pipe2, times, umount2
73775770a5 linux-user: Split out dup, mkdir, mkdirat, rmdir
8d6ff832d7 linux-user: Split out rename, renameat, renameat2
ded97414bf linux-user: Split out access, faccessat, futimesat, kill, nice, 
sync, syncfs
f2ac2715a0 linux-user: Split out alarm, pause, stime, utime, utimes
fd239f018f linux-user: Split out mount, umount
7f8d08b0df linux-user: Split out getpid, getxpid, lseek
0033fce107 linux-user: Remove all unimplemented entries
1d093d6966 linux-user: Split out chdir, mknod, mknodat, time, chmod
d0bc8c69af linux-user: Split out unlink, unlinkat
ee1804088c linux-user: Split out link, linkat
7da1a7d2ec linux-user: Split out creat, fork, waitid, waitpid
6c9db8aee2 linux-user: Split out open_to_handle_at
8e8c59cd27 linux-user: Split out name_to_handle_at
4ed7c56516 linux-user: Split out open, openat
dbf85fddf7 linux-user: Split out execve
d4654a6ef7 linux-user: Split out brk, close, exit, read, write
404318016d linux-user: Set up infrastructure for table-izing syscalls
a7b3ac0407 linux-user: Make syscall number unsigned
6086f3339b linux-user: Propagate goto fail to return
1969ae08d3 linux-user: Split out goto unimplemented to do_unimplemented
f3213e38ff linux-user: Propagate goto unimplemented_nowarn to return
4e20509f56 linux-user: Propagate goto efault to return
dda83e01f8 linux-user: Propagate goto ebadf to return
fddfe2eb57 linux-user: Relax single exit from "break"
bac309b293 linux-user: Split out do_syscall1

=== OUTPUT BEGIN ===
Checking PATCH 1/33: linux-user: Split out do_syscall1...
Checking PATCH 2/33: linux-user: Relax single exit from "break"...
ERROR: code indent should never use tabs
#1929: FILE: linux-user/syscall.c:11150:
+^Ireturn ret;$

ERROR: code indent should never use tabs
#1938: FILE: linux-user/syscall.c:11159:
+^Ireturn ret;$

ERROR: code indent should never use tabs
#1947: FILE: linux-user/syscall.c:11166:
+^Ireturn target_ftruncate64(cpu_env, arg1, arg2, arg3, arg4);$

ERROR: code indent should never use tabs
#2411: FILE: linux-user/syscall.c:11862:
+^Ireturn ret;$

ERROR: code indent should never use tabs
#2683: FILE: linux-user/syscall.c:12216:
+^Ireturn ret;$

total: 5 errors, 0 warnings, 2853 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/33: linux-user: Propagate goto ebadf to return...
Checking PATCH 4/33: linux-user: Propagate goto efault to return...
ERROR: suspect code indent for conditional statements (11, 14)
#642: FILE: linux-user/syscall.c:9553:
if (!p) {
+  return -TARGET_EFAULT;

total: 1 errors, 0 warnings, 1211 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 5/33: linux-user: Propagate goto unimplemented_nowarn to 
return...
Checking PATCH 6/33: linux-user: Split out goto unimplemented to 
do_unimplemented...
Checking PATCH 7/33: linux-user: Propagate goto fail to return...
Checking PATCH 8/33: linux-user: Make syscall number unsigned...
Checking PATCH 9/33: linux-user: Set up infrastructure for table-izing 
syscalls...
Checking PATCH 10/33: linux-user: Split out brk, close, exit, read, write...
Checking PATCH 11/33: linux-user: Split out execve...
Checking PATCH 12/33: linux-u

Re: [Qemu-devel] [PATCH v3 15/22] target/arm: Add ARM_FEATURE_V7VE for v7 Virtualization Extensions

2018-06-01 Thread Peter Maydell
On 31 May 2018 at 21:39, Aaron Lindsay  wrote:
> On May 31 15:18, Peter Maydell wrote:
>>if (arm_feature(env, ARM_FEATURE_V7VE)) {
>>/* v7 Virtualization Extensions. In real hardware this implies
>> * EL2 and also the presence of the Security Extensions.
>> * For QEMU, for backwards-compatibility we implement some
>> * CPUs or CPU configs which have no actual EL2 or EL3 but do
>> * include the various other features that V7VE implies.
>> * Presence of EL2 itself is ARM_FEATURE_EL2, and of the
>> * Security Extensions is ARM_FEATURE_EL3.
>> */
>>set_feature(env, ARM_FEATURE_ARM_DIV);
>
> Is it safe to assume from your comment above regarding keeping ARM_DIV
> separate from V7VE that the inclusion of it here is an oversight and
> that only LPAE and V7 should be set if V7VE is? (and that V8 should
> now directly imply both V7VE and ARM_DIV?)

No; V7VE always implies ARM_DIV. (ARM_DIV doesn't imply V7VE,
though, which is why it is a separate feature bit.)
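
The one-way implication can be sketched with a plain bitmask. The DEMO_* names and helpers below are hypothetical and only model the two bits discussed here; they are not QEMU's feature enum or its set_feature()/arm_feature() functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; only the two discussed above are modelled. */
enum {
    DEMO_FEATURE_V7VE,
    DEMO_FEATURE_ARM_DIV,
};

static void demo_set_feature(uint64_t *features, int bit)
{
    *features |= UINT64_C(1) << bit;
}

static bool demo_has_feature(uint64_t features, int bit)
{
    return (features >> bit) & 1;
}

/* Apply the implication in one direction only: V7VE sets ARM_DIV,
 * but a CPU with ARM_DIV alone is left without V7VE. */
static void demo_resolve_features(uint64_t *features)
{
    if (demo_has_feature(*features, DEMO_FEATURE_V7VE)) {
        demo_set_feature(features, DEMO_FEATURE_ARM_DIV);
    }
}
```

Keeping ARM_DIV as its own bit is what lets a CPU model advertise the divide instructions without claiming the rest of what V7VE implies.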

thanks
-- PMM



Re: [Qemu-devel] [PATCH v7 04/11] hmp: disable monitor in preconfig state

2018-06-01 Thread Dr. David Alan Gilbert
* Igor Mammedov (imamm...@redhat.com) wrote:
> On Fri, 25 May 2018 16:39:34 -0300
> Eduardo Habkost  wrote:
> 
> > On Fri, May 25, 2018 at 08:05:30AM +0200, Markus Armbruster wrote:
> > > Eduardo Habkost  writes:
> > >   
> > > > On Thu, May 24, 2018 at 08:16:20PM +0200, Markus Armbruster wrote:  
> > > >> Markus Armbruster  writes:
> > > >>   
> > > >> > Igor Mammedov  writes:
> > > >> >  
> > > >> >> Ban it for now, if someone would need it to work early,
> > > >> >> one would have to implement checks if HMP command is valid
> > > >> >> at preconfig state.
> > > >> >>
> > > >> >> Signed-off-by: Igor Mammedov 
> > > >> >> Reviewed-by: Eric Blake 
> > > >> >> ---
> > > >> >> v5:
> > > >> >>   * add "use QMP instead" to error message, suggesting user
> > > >> >> the right interface to use
> > > >> >> v4:
> > > >> >>   * v3 was only printing error but not preventing command execution,
> > > >> >> Fix it by returning after printing error message.
> > > >> >> ("Dr. David Alan Gilbert" )
> > > >> >> ---
> > > >> >>  monitor.c | 6 ++
> > > >> >>  1 file changed, 6 insertions(+)
> > > >> >>
> > > >> >> diff --git a/monitor.c b/monitor.c
> > > >> >> index 39f8ee1..0ffdf1d 100644
> > > >> >> --- a/monitor.c
> > > >> >> +++ b/monitor.c
> > > >> >> @@ -3374,6 +3374,12 @@ static void handle_hmp_command(Monitor *mon, 
> > > >> >> const char *cmdline)
> > > >> >>  
> > > >> >>  trace_handle_hmp_command(mon, cmdline);
> > > >> >>  
> > > >> >> +if (runstate_check(RUN_STATE_PRECONFIG)) {
> > > >> >> +monitor_printf(mon, "HMP not available in preconfig state, 
> > > >> >> "
> > > >> >> +"use QMP instead\n");
> > > >> >> +return;
> > > >> >> +}
> > > >> >> +
> > > >> >>  cmd = monitor_parse_command(mon, cmdline, &cmdline, 
> > > >> >> mon->cmd_table);
> > > >> >>  if (!cmd) {
> > > >> >>  return;  
> > > >> >
> > > >> > So we offer the user an HMP monitor, but we summarily fail all
> > > >> > commands.  I'm sorry, but that's... searching for polite word...
> > > >> > embarrassing.  We should accept HMP output only when we're ready
> > > >> > to accept it.  Yes, that would involve a bit more surgery rather
> > > >> > than this cheap hack.  The whole preconfig thing smells like a
> > > >> > cheap hack to me, but let's not overdo it.
> > > >> 
> > > >> Clarification: I don't think we need to hold the series because of
> > > >> this.  I do think you should investigate delaying HMP until it can
> > > >> work.
> > > >
> > > > What would it mean to delay HMP?  Not creating the socket?
> > > > Creating the socket but not accepting clients?  Accepting clients
> > > > but not consuming any input from the socket until we are out of
> > > > preconfig?
> > > >
> > > > I'm not sure if any of those options would be better.  If a human
> > > > is already trying to talk on the other side, it seems better to
> > > > show QEMU is alive (but not ready to hold a conversation yet)
> > > > than staying silent.  
> > > 
> > > If this
> > > 
> > > QEMU 2.12.50 monitor - type 'help' for more information
> > > (qemu) help
> > > HMP not available in preconfig state, use QMP instead
> > > (qemu) quit
> > > HMP not available in preconfig state, use QMP instead
> > > (qemu) let me out dammit
> > > HMP not available in preconfig state, use QMP instead
> > > (qemu) 
> > > 
> > > is better than the alternatives, then I wonder how much more
> > > entertainment the alternatives could provide!
> > > 
> > > We *can* do better.  Start like this:
> > > 
> > > QEMU 2.12.50 monitor is not ready with -preconfig until you complete
> > > configuration with QMP
> > > 
> > > and when we exit preconfig state, add:
> > > 
> > > QEMU 2.12.50 monitor - type 'help' for more information
> > > (qemu) 
> > > 
> > > Note that this is upfront about the monitor not being ready, avoids
> > > misleading the user about "help", talks to the user in the user's terms
> > > (-preconfig) instead of internal terms (preconfig state), and is more
> > > specific on how to ready the monitor.  
> > 
> > Yes, this sounds better than any of the options I have
> > considered.
> > 
> > Making at least 'help', 'quit', and 'exit-preconfig' work might
> > be even better, though.
> I'll look into both options and try to come up with a patch to make it better.

Let's keep whatever we do here simple.
As I understand it, the only reason to deny HMP in preconfig is
because we've not got a per-command flag to say which commands
are allowed in preconfig state.  If you're going to allow
'help', 'quit' etc then you just end up adding that flag
(which should be easy) and then we've got the flag and
we can go back and enable other HMP commands in preconfig as well.
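
A rough sketch of what such a per-command flag could look like. The table layout, the allow_preconfig field name, and the "some-late-command" entry are all hypothetical; this is not the real HMP command table, only an illustration of the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command entry with a per-command preconfig flag. */
typedef struct {
    const char *name;
    bool allow_preconfig;  /* may this command run before configuration ends? */
} DemoCmd;

static const DemoCmd demo_cmds[] = {
    { "help",              true  },
    { "quit",              true  },
    { "exit-preconfig",    true  },
    { "some-late-command", false },  /* placeholder for everything else */
};

enum { DEMO_OK, DEMO_UNKNOWN, DEMO_DENIED };

/* Look the command up, then consult its flag instead of rejecting
 * every command wholesale while in preconfig state. */
static int demo_dispatch(const char *name, bool in_preconfig)
{
    for (size_t i = 0; i < sizeof(demo_cmds) / sizeof(demo_cmds[0]); i++) {
        if (strcmp(demo_cmds[i].name, name) == 0) {
            if (in_preconfig && !demo_cmds[i].allow_preconfig) {
                return DEMO_DENIED;
            }
            return DEMO_OK;
        }
    }
    return DEMO_UNKNOWN;
}
```

With the flag in place, enabling a further command in preconfig becomes a one-line table change rather than a new special case in the handler.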

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH 1/6] gdbstub: Return the fd from gdbserver_start

2018-06-01 Thread Peter Maydell
On 31 May 2018 at 23:49, Richard Henderson  wrote:
> This will allow us to protect gdbserver_fd from the guest.

Ha, I hadn't realised we already had an internal-to-QEMU file descriptor :-)

thanks
-- PMM



[Qemu-devel] [Bug 1774605] [NEW] PowerPC guest does not emulate L2 and L3 cache for KVM vCPUs

2018-06-01 Thread Satheesh Rajendran
Public bug reported:

PowerPC KVM guest does not emulate L2 and L3 caches for vCPUs. It would
be good to have them enabled if there are no known issues/limitations
with PowerPC.

Host Env:
kernel: 4.17.0-rc7-00045-g0512e0134582
qemu: v2.12.0-923-gc181ddaa17-dirty
#libvirtd -V
libvirtd (libvirt) 4.4.0


Guest Kernel:
# uname -a
Linux atest-guest 4.17.0-rc7-00045-g0512e0134582 #9 SMP Fri Jun 1 02:55:50 EDT 2018 ppc64le ppc64le ppc64le GNU/Linux

Guest:
# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  8
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Model:               2.1 (pvr 004b 0201)
Model name:          POWER8 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           64K
L1i cache:           32K
NUMA node0 CPU(s):   0-15


Background: x86 enables the CPU L2 cache by default and the L3 cache on
demand for KVM guests, and claims a performance improvement because vCPUs
benefit from fewer `vmexits due to guest send IPIs` with the L3 cache
enabled. Below is the patch for that:

https://git.qemu.org/?p=qemu.git;a=commit;h=14c985cffa6cb177fc01a163d8bcf227c104718c

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1774605




[Qemu-devel] [Bug 1774605] Re: PowerPC guest does not emulate L2 and L3 cache for KVM vCPUs

2018-06-01 Thread Satheesh Rajendran
Guest xml(cpu portion):

...
   32
  
/machine
  
  
hvm
/home/kvmci/linux/vmlinux
root=/dev/sda2 rw console=tty0 console=ttyS0,115200 
init=/sbin/init initcall_debug

  
  

  
...


Host lscpu:
# lscpu
Architecture:         ppc64le
Byte Order:           Little Endian
CPU(s):               80
On-line CPU(s) list:  0,8,16,24,32,40,48,56,64,72
Off-line CPU(s) list: 1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79
Thread(s) per core:   1
Core(s) per socket:   5
Socket(s):            2
NUMA node(s):         2
Model:                2.1 (pvr 004b 0201)
Model name:           POWER8E (raw), altivec supported
CPU max MHz:          3690.
CPU min MHz:          2061.
L1d cache:            64K
L1i cache:            32K
L2 cache:             512K
L3 cache:             8192K
NUMA node0 CPU(s):    0,8,16,24,32
NUMA node1 CPU(s):    40,48,56,64,72




Re: [Qemu-devel] [PATCH v2 02/20] 9p: proxy: Fix size passed to `connect`

2018-06-01 Thread Greg Kurz
On Thu, 31 May 2018 21:25:57 -0400
Keno Fischer  wrote:

> The size to pass to the `connect` call is the size of the entire
> `struct sockaddr_un`. Passing anything shorter than this causes errors
> on darwin.
> 

From the linux unix(7) manual page:

   ret = connect (data_socket, (const struct sockaddr *) &addr,
  sizeof(struct sockaddr_un));

Not sure why it was done differently, but I definitely prefer the
fixed size version.
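
For reference, a small standalone sketch of the fixed-size form, outside
of QEMU. The helper name connect_unix_path is made up for illustration;
the socket calls are the plain POSIX ones. It passes the size of the whole
struct sockaddr_un to connect(), as the man page example does, and zeroes
the struct first so the unused tail of sun_path is well defined.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to the UNIX-domain stream socket at `path`.
 * Returns a connected fd, or -1 with errno set. An over-long path
 * is rejected with ENAMETOOLONG before any socket is created. */
int connect_unix_path(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path); /* length checked above */

    /* Pass the size of the entire structure, not a computed prefix. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```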

Applied to 9p-next.

Thanks !

> Signed-off-by: Keno Fischer 
> ---
> 
> Changes since v1: New patch
> 
>  hw/9pfs/9p-proxy.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/9pfs/9p-proxy.c b/hw/9pfs/9p-proxy.c
> index e2e0329..47a94e0 100644
> --- a/hw/9pfs/9p-proxy.c
> +++ b/hw/9pfs/9p-proxy.c
> @@ -1088,7 +1088,7 @@ static int proxy_ioc_getversion(FsContext *fs_ctx, 
> V9fsPath *path,
>  
>  static int connect_namedsocket(const char *path, Error **errp)
>  {
> -int sockfd, size;
> +int sockfd;
>  struct sockaddr_un helper;
>  
>  if (strlen(path) >= sizeof(helper.sun_path)) {
> @@ -1102,8 +1102,7 @@ static int connect_namedsocket(const char *path, Error 
> **errp)
>  }
>  strcpy(helper.sun_path, path);
>  helper.sun_family = AF_UNIX;
> -size = strlen(helper.sun_path) + sizeof(helper.sun_family);
> -if (connect(sockfd, (struct sockaddr *)&helper, size) < 0) {
> +if (connect(sockfd, (struct sockaddr *)&helper, sizeof(helper)) < 0) {
>  error_setg_errno(errp, errno, "failed to connect to '%s'", path);
>  close(sockfd);
>  return -1;




[Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-01 Thread xuyandong
Hi there,

I am running some tests on QEMU vCPU hotplug and have run into trouble.
An emulation failure occurs and QEMU prints the following message:

KVM internal error. Suberror: 1
emulation failure
EAX= EBX= ECX= EDX=0600
ESI= EDI= EBP= ESP=fff8
EIP=ff53 EFL=00010082 [--S] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   9300
CS =f000 000f  9b00
SS =   9300
DS =   9300
FS =   9300
GS =   9300
LDT=   8200
TR =   8b00if
GDT=  
IDT=  
CR0=6010 CR2= CR3= CR4=
DR0= DR1= DR2= 
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=31 d2 eb 04 66 83 ca ff 66 89 d0 66 5b 66 c3 66 89 d0 66 c3  66 68 21 
8a 00 00 e9 08 d7 66 56 66 53 66 83 ec 0c 66 89 c3 66 e8 ce 7b ff ff 66 89 c6

I notice that the guest is still running SeaBIOS in real mode when the vCPU
has just been plugged.
This emulation failure can be reproduced reliably if I hotplug a vCPU
during the VM launch process.
After some digging, I found that this KVM internal error shows up because
KVM cannot emulate some MMIO (gpa 0xfff53).

So I am confused:
(1) Does QEMU support vCPU hotplug while the guest is still running SeaBIOS?
(2) The gpa (0xfff53) is an address in the BIOS ROM section; why does KVM
incorrectly treat it as an MMIO address?


Re: [Qemu-devel] [PULL 00/25] target-arm queue

2018-06-01 Thread Peter Maydell
On 31 May 2018 at 17:00, Peter Maydell  wrote:
> target-arm queue. This has the "plumb txattrs through various
> bits of exec.c" patches, and a collection of bug fixes from
> various people.
>
> v2: fix compile error on arm hosts...
>
> thanks
> -- PMM
>
>
> The following changes since commit a3ac12fba028df90f7b3dbec924995c126c41022:
>
>   Merge remote-tracking branch 'remotes/ehabkost/tags/numa-next-pull-request' 
> into staging (2018-05-31 11:12:36 +0100)
>
> are available in the Git repository at:
>
>   git://git.linaro.org/people/pmaydell/qemu-arm.git 
> tags/pull-target-arm-20180531-1
>
> for you to fetch changes up to 2f15b79280cf71b7991dfd3f0312a1797630e376:
>
>   KVM: GIC: Fix memory leak due to calling kvm_init_irq_routing twice 
> (2018-05-31 16:32:35 +0100)
>

Applied, thanks.

-- PMM



[Qemu-devel] [PATCH] file-posix: Consolidate the locking error message

2018-06-01 Thread Fam Zheng
When hot-plugging a block device fails due to image locking errors,
users won't see the helpful 'Is another process using the image?'
message in QMP because currently the error hint is not carried over
there.

Even though extending QMP to include the hint is a conceivably easy task,
libvirt will need some changes to consume that data.

Before that is fully sorted out, let's just do the easy fix by joining
the two lines.

Signed-off-by: Fam Zheng 
---
 block/file-posix.c | 10 ++--
 tests/qemu-iotests/153.out | 99 +-
 tests/qemu-iotests/182.out |  3 +-
 3 files changed, 38 insertions(+), 74 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 5a602cfe37..03776e13b1 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -699,11 +699,10 @@ static int raw_check_lock_bytes(BDRVRawState *s,
 if (ret) {
 char *perm_name = bdrv_perm_names(p);
 error_setg(errp,
-   "Failed to get \"%s\" lock",
+   "Failed to get \"%s\" lock. "
+   "Is another process using the image?",
perm_name);
 g_free(perm_name);
-error_append_hint(errp,
-  "Is another process using the image?\n");
 return ret;
 }
 }
@@ -716,11 +715,10 @@ static int raw_check_lock_bytes(BDRVRawState *s,
 if (ret) {
 char *perm_name = bdrv_perm_names(p);
 error_setg(errp,
-   "Failed to get shared \"%s\" lock",
+   "Failed to get shared \"%s\" lock. "
+   "Is another process using the image?",
perm_name);
 g_free(perm_name);
-error_append_hint(errp,
-  "Is another process using the image?\n");
 return ret;
 }
 }
diff --git a/tests/qemu-iotests/153.out b/tests/qemu-iotests/153.out
index 2510762ba1..e256a9f714 100644
--- a/tests/qemu-iotests/153.out
+++ b/tests/qemu-iotests/153.out
@@ -11,86 +11,67 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=33554432 
backing_file=TEST_DIR/t
 == Launching QEMU, opts: '' ==
 
 == Launching another QEMU, opts: '' ==
-QEMU_PROG: -drive file=TEST_DIR/t.qcow2,if=none,: Failed to get "write" lock
-Is another process using the image?
+QEMU_PROG: -drive file=TEST_DIR/t.qcow2,if=none,: Failed to get "write" lock. 
Is another process using the image?
 
 == Launching another QEMU, opts: 'read-only=on' ==
-QEMU_PROG: -drive file=TEST_DIR/t.qcow2,if=none,read-only=on: Failed to get 
shared "write" lock
-Is another process using the image?
+QEMU_PROG: -drive file=TEST_DIR/t.qcow2,if=none,read-only=on: Failed to get 
shared "write" lock. Is another process using the image?
 
 == Launching another QEMU, opts: 'read-only=on,force-share=on' ==
 
 == Running utility commands  ==
 
 _qemu_io_wrapper -c read 0 512 TEST_DIR/t.qcow2
-can't open device TEST_DIR/t.qcow2: Failed to get "write" lock
-Is another process using the image?
+can't open device TEST_DIR/t.qcow2: Failed to get "write" lock. Is another 
process using the image?
 
 _qemu_io_wrapper -r -c read 0 512 TEST_DIR/t.qcow2
-can't open device TEST_DIR/t.qcow2: Failed to get shared "write" lock
-Is another process using the image?
+can't open device TEST_DIR/t.qcow2: Failed to get shared "write" lock. Is 
another process using the image?
 
 _qemu_io_wrapper -c open  TEST_DIR/t.qcow2 -c read 0 512
-can't open device TEST_DIR/t.qcow2: Failed to get "write" lock
-Is another process using the image?
+can't open device TEST_DIR/t.qcow2: Failed to get "write" lock. Is another 
process using the image?
 no file open, try 'help open'
 
 _qemu_io_wrapper -c open -r  TEST_DIR/t.qcow2 -c read 0 512
-can't open device TEST_DIR/t.qcow2: Failed to get shared "write" lock
-Is another process using the image?
+can't open device TEST_DIR/t.qcow2: Failed to get shared "write" lock. Is 
another process using the image?
 no file open, try 'help open'
 
 _qemu_img_wrapper info TEST_DIR/t.qcow2
-qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock
-Is another process using the image?
+qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" 
lock. Is another process using the image?
 
 _qemu_img_wrapper check TEST_DIR/t.qcow2
-qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock
-Is another process using the image?
+qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" 
lock. Is another process using the image?
 
 _qemu_img_wrapper compare TEST_DIR/t.qcow2 TEST_DIR/t.qcow2
-qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock
-Is another process using the image?
+qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" 
lock. Is another p

Re: [Qemu-devel] [PATCH v2 03/20] 9p: xattr: Fix crash due to free of uninitialized value

2018-06-01 Thread Greg Kurz
On Thu, 31 May 2018 21:25:58 -0400
Keno Fischer  wrote:

> If the size returned from llistxattr is 0, we skipped the malloc
> call, leaving xattr.value uninitialized. However, this value is
> later passed to `g_free` without any further checks, causing an

Ouch, good catch.

> error. Fix that by always calling g_malloc unconditionally. If
> `size` is 0, it will return a pointer that is safe to pass to g_free,
> likely NULL.
> 

"Allocates n_bytes bytes of memory, initialized to 0's. If n_bytes is 0 it
 returns NULL."

https://developer.gnome.org/glib/unstable/glib-Memory-Allocation.html#g-malloc

The fix is good, but it seems the same can also happen if v9fs_co_lgetxattr()
returns 0 a few lines below. Can you check this out and fix it if needed ?
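
The general pattern, sketched with plain C allocation instead of glib so
it stands alone: always assign the pointer, even for size 0, so the later
free never sees an indeterminate value. struct toy_xattr and the two
helpers are hypothetical stand-ins, not the 9p code itself; the only fact
relied on is that free(NULL) is a no-op in standard C.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the xattr state kept in the fid. */
struct toy_xattr {
    size_t len;
    void *value;
};

/* Always assign `value`, including for size 0, so toy_xattr_clear()
 * never frees an uninitialized pointer.
 * Returns 0 on success, -1 on allocation failure. */
int toy_xattr_init(struct toy_xattr *x, size_t size)
{
    x->len = size;
    x->value = size ? calloc(1, size) : NULL;
    if (size && !x->value) {
        return -1;
    }
    return 0;
}

/* Safe to call after any toy_xattr_init(), whatever the size was. */
void toy_xattr_clear(struct toy_xattr *x)
{
    free(x->value); /* free(NULL) is a no-op */
    x->value = NULL;
    x->len = 0;
}
```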

> Signed-off-by: Keno Fischer 
> ---
> 
> Changes since v1: New patch
> 
>  hw/9pfs/9p.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index d74302d..b80db65 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -3256,8 +3256,8 @@ static void coroutine_fn v9fs_xattrwalk(void *opaque)
>  xattr_fidp->fs.xattr.len = size;
>  xattr_fidp->fid_type = P9_FID_XATTR;
>  xattr_fidp->fs.xattr.xattrwalk_fid = true;
> +xattr_fidp->fs.xattr.value = g_malloc0(size);
>  if (size) {
> -xattr_fidp->fs.xattr.value = g_malloc0(size);
>  err = v9fs_co_llistxattr(pdu, &xattr_fidp->path,
>   xattr_fidp->fs.xattr.value,
>   xattr_fidp->fs.xattr.len);




Re: [Qemu-devel] [PATCH v8 00/11] qemu-img convert with copy offloading

2018-06-01 Thread Fam Zheng
On Thu, 05/31 23:45, no-re...@patchew.org wrote:
> Hi,
> 
> This series failed build test on s390x host. Please find the details below.
> 
> Type: series
> Message-id: 20180601062849.28641-1-f...@redhat.com
> Subject: [Qemu-devel] [PATCH v8 00/11] qemu-img convert with copy offloading
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> # Testing script will be invoked under the git checkout with
> # HEAD pointing to a commit that has the patches applied on top of "base"
> # branch
> set -e
> echo "=== ENV ==="
> env
> echo "=== PACKAGES ==="
> rpm -qa
> echo "=== TEST BEGIN ==="
> CC=$HOME/bin/cc
> INSTALL=$PWD/install
> BUILD=$PWD/build
> echo -n "Using CC: "
> realpath $CC
> mkdir -p $BUILD $INSTALL
> SRC=$PWD
> cd $BUILD
> $SRC/configure --cc=$CC --prefix=$INSTALL
> make -j4
> # XXX: we need reliable clean up
> # make check -j4 V=1
> make install
> === TEST SCRIPT END ===
> 
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> From https://github.com/patchew-project/qemu
>  * [new tag]   patchew/20180601062849.28641-1-f...@redhat.com -> 
> patchew/20180601062849.28641-1-f...@redhat.com
> Switched to a new branch 'test'
> 5c03bb24cc qemu-img: Convert with copy offloading
> 51e732e62a block-backend: Add blk_co_copy_range
> 92edadadac iscsi: Implement copy offloading
> 6f8d57827a iscsi: Create and use iscsi_co_wait_for_task
> 5940e7ebff iscsi: Query and save device designator when opening
> 7619b4045d file-posix: Implement bdrv_co_copy_range
> af0dbb042d qcow2: Implement copy offloading
> d1550d576c raw: Implement copy offloading
> 61c824c950 raw: Check byte range uniformly
> 35c506afdd block: Introduce API for copy offloading
> b5683be886 docker: Update fedora image to 28
> 
> === OUTPUT BEGIN ===
> === ENV ===
> LANG=en_US.UTF-8
> XDG_SESSION_ID=212219
> USER=fam
> PWD=/var/tmp/patchew-tester-tmp-2l7s8dte/src
> HOME=/home/fam
> SHELL=/bin/sh
> SHLVL=2
> PATCHEW=/home/fam/patchew/patchew-cli -s http://patchew.org --nodebug
> LOGNAME=fam
> DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1012/bus
> XDG_RUNTIME_DIR=/run/user/1012
> PATH=/usr/bin:/bin
> _=/usr/bin/env
> === PACKAGES ===
> gpg-pubkey-873529b8-54e386ff
> glibc-debuginfo-common-2.24-10.fc25.s390x
> fedora-release-26-1.noarch
> dejavu-sans-mono-fonts-2.35-4.fc26.noarch
> xemacs-filesystem-21.5.34-22.20170124hgf412e9f093d4.fc26.noarch
> bash-4.4.12-7.fc26.s390x
> libSM-1.2.2-5.fc26.s390x
> libmpc-1.0.2-6.fc26.s390x
> libaio-0.3.110-7.fc26.s390x
> libverto-0.2.6-7.fc26.s390x
> perl-Scalar-List-Utils-1.48-1.fc26.s390x
> iptables-libs-1.6.1-2.fc26.s390x
> tcl-8.6.6-2.fc26.s390x
> libxshmfence-1.2-4.fc26.s390x
> expect-5.45-23.fc26.s390x
> perl-Thread-Queue-3.12-1.fc26.noarch
> perl-encoding-2.19-6.fc26.s390x
> keyutils-1.5.10-1.fc26.s390x
> gmp-devel-6.1.2-4.fc26.s390x
> enchant-1.6.0-16.fc26.s390x
> python-gobject-base-3.24.1-1.fc26.s390x
> python3-enchant-1.6.10-1.fc26.noarch
> python-lockfile-0.11.0-6.fc26.noarch
> python2-pyparsing-2.1.10-3.fc26.noarch
> python2-lxml-4.1.1-1.fc26.s390x
> librados2-10.2.7-2.fc26.s390x
> trousers-lib-0.3.13-7.fc26.s390x
> libdatrie-0.2.9-4.fc26.s390x
> libsoup-2.58.2-1.fc26.s390x
> passwd-0.79-9.fc26.s390x
> bind99-libs-9.9.10-3.P3.fc26.s390x
> python3-rpm-4.13.0.2-1.fc26.s390x
> systemd-233-7.fc26.s390x
> virglrenderer-0.6.0-1.20170210git76b3da97b.fc26.s390x
> s390utils-ziomon-1.36.1-3.fc26.s390x
> s390utils-osasnmpd-1.36.1-3.fc26.s390x
> libXrandr-1.5.1-2.fc26.s390x
> libglvnd-glx-1.0.0-1.fc26.s390x
> texlive-ifxetex-svn19685.0.5-33.fc26.2.noarch
> texlive-psnfss-svn33946.9.2a-33.fc26.2.noarch
> texlive-dvipdfmx-def-svn40328-33.fc26.2.noarch
> texlive-natbib-svn20668.8.31b-33.fc26.2.noarch
> texlive-xdvi-bin-svn40750-33.20160520.fc26.2.s390x
> texlive-cm-svn32865.0-33.fc26.2.noarch
> texlive-beton-svn15878.0-33.fc26.2.noarch
> texlive-fpl-svn15878.1.002-33.fc26.2.noarch
> texlive-mflogo-svn38628-33.fc26.2.noarch
> texlive-texlive-docindex-svn41430-33.fc26.2.noarch
> texlive-luaotfload-bin-svn34647.0-33.20160520.fc26.2.noarch
> texlive-koma-script-svn41508-33.fc26.2.noarch
> texlive-pst-tree-svn24142.1.12-33.fc26.2.noarch
> texlive-breqn-svn38099.0.98d-33.fc26.2.noarch
> texlive-xetex-svn41438-33.fc26.2.noarch
> gstreamer1-plugins-bad-free-1.12.3-1.fc26.s390x
> xorg-x11-font-utils-7.5-33.fc26.s390x
> ghostscript-fonts-5.50-36.fc26.noarch
> libXext-devel-1.3.3-5.fc26.s390x
> libusbx-devel-1.0.21-2.fc26.s390x
> libglvnd-devel-1.0.0-1.fc26.s390x
> emacs-25.3-3.fc26.s390x
> alsa-lib-devel-1.1.4.1-1.fc26.s390x
> kbd-2.0.4-2.fc26.s390x
> dconf-0.26.0-2.fc26.s390x
> mc-4.8.19-5.fc26.s390x
> doxygen-1.8.13-9.fc26.s390x
> dpkg-1.18.24-1.fc26.s390x
> libtdb-1.3.13-1.fc26.s390x
> python2-pynacl-1.1.1-1.fc26.s390x
> perl-Filter-1.58-1.fc26.s390x
> python2-pip-9.0.1-11.fc26.noarch
> dnf-2.7.5-2.fc26.noarch
> bind-license-9.11.2-1.P1.fc26.noarch
> libtasn1-4.13-1.fc26.s390x
> cpp-7.3.1-2.fc26.s390x
> pkgconf-1.3.12-2.fc26.s390x
> python2-fedora-0.10.0-1.fc26.noarch
> cmake-filesystem-3.10.1-11.fc26.s390x
> python3-requests

Re: [Qemu-devel] Questions about the flow of interrupt simulation

2018-06-01 Thread Peter Maydell
On 1 June 2018 at 07:17, Eva Chen  wrote:
> 1. There are two kinds of interrupt: edge triggered and level triggered.
> I have seen two code segments related to the level: gic_set_irq() and
> arm_cpu_set_irq().
> In gic_set_irq(), if level == GIC_TEST_LEVEL(irq, cm), which means the
> level has not changed, the function returns.
> arm_cpu_set_irq() calls cpu_interrupt() if level == 1, and if
> level == 0 it calls cpu_reset_interrupt(), which clears that irq bit.
> Does that mean all interrupts on Arm are level triggered (active high)?
> How can I find out the trigger type of an interrupt?

This is mixing up interrupts in two different places. For the Arm
architecture, IRQ and FIQ are always level-sensitive: the thing
which sets them (the GIC, typically) has to set them and keep them
set until the CPU acknowledges them.

For the GIC, its input interrupts may be either level sensitive or
edge sensitive. This is configurable for each interrupt on GICv2 by
writing to the GICD_ICFGRn registers. The gic_set_irq() code implements
the behaviour that the GIC specification requires, depending on whether
the ICFGRn register says that interrupt should be edge or level
triggered.

Other interrupt controllers that QEMU models may behave differently.
(For instance the ARMv7M NVIC is different again.)

> 2. Interrupt signals are passed through the GIC from the device to the CPU.
> There are four types of interrupt in the CPU: CPU_INTERRUPT_HARD/FIQ/VIRQ/VFIQ.
> Where exactly is it defined which CPU_INTERRUPT_{type} a device's interrupt
> corresponds to?

Again, this is configurable by the guest by writing to GIC registers.
In the GICv2, the GICD_IGROUPRn registers set whether the interrupt
should be in "group 0" or "group 1". Group 1 interrupts always
cause an IRQ; group 0 interrupts cause either IRQ or FIQ depending
on the setting of the GICC_CTLR FIQEn bit. (The expected use is that
interrupts configured for use with the TrustZone Secure World will
use FIQ and those configured for use with the NonSecure World will
use IRQ.)
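
As a rough illustration only, the group-to-line rule described above fits
in a few lines of C. This is a toy, not QEMU's GIC model; the enum and
function names are invented here, and it ignores priorities, security
state and everything else the real GIC takes into account.

```c
#include <stdbool.h>

/* Which CPU input line a pending interrupt is signalled on (toy model). */
enum toy_cpu_line { TOY_LINE_IRQ, TOY_LINE_FIQ };

/* group:  0 or 1, as configured per interrupt via GICD_IGROUPRn.
 * fiq_en: state of the GICC_CTLR FIQEn bit. */
enum toy_cpu_line toy_gic_route(int group, bool fiq_en)
{
    if (group == 1) {
        return TOY_LINE_IRQ; /* group 1 always causes an IRQ */
    }
    /* group 0: IRQ or FIQ depending on FIQEn */
    return fiq_en ? TOY_LINE_FIQ : TOY_LINE_IRQ;
}
```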

VIRQ and VFIQ are for when the GIC and CPU support the Virtualization
Extension.

The behaviour of all of this is defined by the GIC specification;
QEMU just has to implement what the hardware does.

> 3. I have looked at other devices' code under the qemu/hw directory. Almost
> all devices call qemu_set_irq() at the end of their read/write functions.
> Is that so that a device can tell the CPU that it has done some work?
> But the second parameter of qemu_set_irq(), level, is set to
> different values (not always 1 or 0), which sometimes causes
> gic_set_irq() to return early instead of passing the interrupt to the CPU.
> What does the interrupt at the end of device_read/write (device_update())
> mean?

The best way to think of this is not to try to think about whether
the interrupt line is connected to the CPU or anything else. Just
think about a device model as being an emulation of a particular
bit of hardware. For instance, take the pl011 UART. The specification
for that UART says that when certain conditions inside the device are
true, the UART will assert its outgoing interrupt line. So our model
also must check those conditions and call qemu_set_irq() to raise
and lower the interrupt at the right time. The common way to code
this is to have a function which is called whenever any of the
relevant state has changed, which rechecks the conditions and calls
qemu_set_irq(). The "purpose" of this code is just to behave the
way the hardware behaves.
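
A minimal sketch of that "recheck and set the line" pattern, as a
self-contained toy rather than a real QEMU device. All names here
(toy_uart, toy_uart_update, the set_irq callback) are hypothetical; in
QEMU the callback's role would be played by qemu_set_irq() on the
device's output line.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_uart {
    uint32_t int_status; /* raw interrupt conditions, one bit each */
    uint32_t int_enable; /* which conditions may assert the line */
    bool irq_level;      /* current level of the outgoing line */
    void (*set_irq)(void *opaque, int level); /* stands in for the wire */
    void *opaque;
};

/* Called whenever relevant state changes: recompute the line level
 * from device state and drive the wire to that level. */
void toy_uart_update(struct toy_uart *s)
{
    bool level = (s->int_status & s->int_enable) != 0;

    s->irq_level = level;
    if (s->set_irq) {
        s->set_irq(s->opaque, level);
    }
}

/* A condition inside the device became true. */
void toy_uart_raise(struct toy_uart *s, uint32_t bits)
{
    s->int_status |= bits;
    toy_uart_update(s);
}

/* The guest cleared a condition. */
void toy_uart_clear(struct toy_uart *s, uint32_t bits)
{
    s->int_status &= ~bits;
    toy_uart_update(s);
}

/* The guest wrote the interrupt mask register. */
void toy_uart_set_mask(struct toy_uart *s, uint32_t mask)
{
    s->int_enable = mask;
    toy_uart_update(s);
}
```

Note that the device never asks what is on the other end of set_irq; it
only reports the level its own state implies.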

Commonly, the output IRQ line from a device is connected to an
interrupt controller and thus to a CPU, but it doesn't have to be.
On some boards, the IRQ line might not be connected to anything.
Or it might be connected up to some other device which provides its
value to the guest via a status register. Or perhaps it's ORed together
with lines from other devices and the output of the OR gate goes
to the interrupt controller. If you're designing a board in real
hardware, you are taking various components (UART, interrupt
controller, etc) and connecting them up to implement a useful design.
In QEMU, we also take various components and connect them up, to
produce the same design the hardware has.

qemu_set_irq() is usually just modelling what in the hardware is a
simple wire. The source end calls this function to say "the wire
is at logical 0/1", and on the destination end a function is called
to handle that. It is also possible to pass in a value other than
0 or 1. This happens in two cases:
 (1) a bug, where the source end really ought to be using 0 or 1;
often this doesn't have any visible bad effects because the
destination end is testing 0 vs not-0, rather than 0 vs 1
 (2) we really do want to transfer an integer rather than just
a 0-vs-1 level. This is less common and only happens when both
ends of the "wire" know that that's the convention they want to use.

I think the overall theme of my reply is that fundamentally
QEMU is modelling hardware. If you want to understand why
parts 

[Qemu-devel] [PATCH v9 01/10] block: Introduce API for copy offloading

2018-06-01 Thread Fam Zheng
Introduce the bdrv_co_copy_range() API for copy offloading.  Block
drivers implementing this API support efficient copy operations that
avoid reading each block from the source device and writing it to the
destination devices.  Examples of copy offload primitives are SCSI
EXTENDED COPY and Linux copy_file_range(2).

Signed-off-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 
---
 block/io.c| 97 +++
 include/block/block.h | 32 +
 include/block/block_int.h | 38 +++
 3 files changed, 167 insertions(+)

diff --git a/block/io.c b/block/io.c
index ca96b487eb..b7beaeeb9f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2835,3 +2835,100 @@ void bdrv_unregister_buf(BlockDriverState *bs, void 
*host)
 bdrv_unregister_buf(child->bs, host);
 }
 }
+
+static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
+uint64_t src_offset,
+BdrvChild *dst,
+uint64_t dst_offset,
+uint64_t bytes,
+BdrvRequestFlags flags,
+bool recurse_src)
+{
+int ret;
+
+if (!src || !dst || !src->bs || !dst->bs) {
+return -ENOMEDIUM;
+}
+ret = bdrv_check_byte_request(src->bs, src_offset, bytes);
+if (ret) {
+return ret;
+}
+
+ret = bdrv_check_byte_request(dst->bs, dst_offset, bytes);
+if (ret) {
+return ret;
+}
+if (flags & BDRV_REQ_ZERO_WRITE) {
+return bdrv_co_pwrite_zeroes(dst, dst_offset, bytes, flags);
+}
+
+if (!src->bs->drv->bdrv_co_copy_range_from
+|| !dst->bs->drv->bdrv_co_copy_range_to
+|| src->bs->encrypted || dst->bs->encrypted) {
+return -ENOTSUP;
+}
+if (recurse_src) {
+return src->bs->drv->bdrv_co_copy_range_from(src->bs,
+ src, src_offset,
+ dst, dst_offset,
+ bytes, flags);
+} else {
+return dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
+   src, src_offset,
+   dst, dst_offset,
+   bytes, flags);
+}
+}
+
+/* Copy range from @src to @dst.
+ *
+ * See the comment of bdrv_co_copy_range for the parameter and return value
+ * semantics. */
+int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
+ BdrvChild *dst, uint64_t dst_offset,
+ uint64_t bytes, BdrvRequestFlags 
flags)
+{
+return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
+   bytes, flags, true);
+}
+
+/* Copy range from @src to @dst.
+ *
+ * See the comment of bdrv_co_copy_range for the parameter and return value
+ * semantics. */
+int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
+   BdrvChild *dst, uint64_t dst_offset,
+   uint64_t bytes, BdrvRequestFlags flags)
+{
+return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
+   bytes, flags, false);
+}
+
+int coroutine_fn bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset,
+BdrvChild *dst, uint64_t dst_offset,
+uint64_t bytes, BdrvRequestFlags flags)
+{
+BdrvTrackedRequest src_req, dst_req;
+BlockDriverState *src_bs = src->bs;
+BlockDriverState *dst_bs = dst->bs;
+int ret;
+
+bdrv_inc_in_flight(src_bs);
+bdrv_inc_in_flight(dst_bs);
+tracked_request_begin(&src_req, src_bs, src_offset,
+  bytes, BDRV_TRACKED_READ);
+tracked_request_begin(&dst_req, dst_bs, dst_offset,
+  bytes, BDRV_TRACKED_WRITE);
+
+wait_serialising_requests(&src_req);
+wait_serialising_requests(&dst_req);
+ret = bdrv_co_copy_range_from(src, src_offset,
+  dst, dst_offset,
+  bytes, flags);
+
+tracked_request_end(&src_req);
+tracked_request_end(&dst_req);
+bdrv_dec_in_flight(src_bs);
+bdrv_dec_in_flight(dst_bs);
+return ret;
+}
diff --git a/include/block/block.h b/include/block/block.h
index 3894edda9d..6cc6c7e699 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -611,4 +611,36 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, 
const char *name,
  */
 void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size);
 void bdrv_unregister_buf(BlockDriver

[Qemu-devel] [PATCH v9 00/10] qemu-img convert with copy offloading

2018-06-01 Thread Fam Zheng
v9: Don't break older libiscsi. [patchew]

v8: Fix compiling against new glibc and libiscsi on Fedora 28 where v7 had
conflicting definitions. [Stefan, myself]
- Add HAVE_COPY_FILE_RANGE in configure.
- Drop IDENT_DESCR_TGT_DESCR from scsi constants header.

v7: Fix qcow2.

v6: Pick up rev-by from Stefan and Eric.
Tweak patch 2 commit message.

v5: - Fix raw offset/bytes check for read. [Eric]
- Fix qcow2_handle_l2meta. [Stefan]
- Add coroutine_fn wherever appropriate. [Stefan]

v4: - Fix raw offset and size. [Eric]
- iscsi: Drop unnecessary return values and variables in favor of
  constants. [Stefan]
- qcow2: Handle small backing case. [Stefan]
- file-posix: Translate ENOSYS to ENOTSUP. [Stefan]
- API documentation and commit message. [Stefan]
- Add rev-by to patches 3, 5 - 10. [Stefan, Eric]

This series introduces block layer API for copy offloading and makes use of it
in qemu-img convert.

For now we implemented the operation in local file protocol with
copy_file_range(2).  Besides that it's possible to add similar support for
iscsi, nfs and potentially more.
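
For readers unfamiliar with the primitive, here is a standalone sketch of
copy_file_range(2) with a read/write fallback. It is not the code from
this series; toy_copy_range is a made-up helper, and the sketch assumes
Linux with a glibc that provides the copy_file_range() wrapper (2.27 or
later). Which errno values should trigger the fallback is also a choice
made here for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy `len` bytes from in_fd to out_fd at their current file offsets.
 * Tries copy_file_range(2) first and falls back to a read/write loop if
 * the kernel or filesystem rejects it.
 * Returns 0 on success, -1 on error or premature end of input. */
int toy_copy_range(int in_fd, int out_fd, size_t len)
{
    char buf[65536];

    while (len > 0) {
        ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL, len, 0);
        if (n > 0) {
            len -= (size_t)n;
            continue;
        }
        if (n == 0) {
            return -1; /* input ended before `len` bytes were copied */
        }
        if (errno == EINTR) {
            continue;
        }
        if (errno != ENOSYS && errno != EXDEV &&
            errno != EINVAL && errno != EOPNOTSUPP) {
            return -1;
        }
        break; /* offload unavailable: copy the rest by hand */
    }

    while (len > 0) {
        size_t want = len < sizeof(buf) ? len : sizeof(buf);
        ssize_t r = read(in_fd, buf, want);
        if (r < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        if (r == 0) {
            return -1;
        }
        for (ssize_t off = 0; off < r; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(r - off));
            if (w < 0) {
                if (errno == EINTR) {
                    continue;
                }
                return -1;
            }
            off += w;
        }
        len -= (size_t)r;
    }
    return 0;
}
```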

As far as its usage goes, in addition to qemu-img convert, we can emulate
offloading in scsi-disk (handle EXTENDED COPY command), and use the API in
block jobs too.

Fam Zheng (10):
  block: Introduce API for copy offloading
  raw: Check byte range uniformly
  raw: Implement copy offloading
  qcow2: Implement copy offloading
  file-posix: Implement bdrv_co_copy_range
  iscsi: Query and save device designator when opening
  iscsi: Create and use iscsi_co_wait_for_task
  iscsi: Implement copy offloading
  block-backend: Add blk_co_copy_range
  qemu-img: Convert with copy offloading

 block/block-backend.c  |  18 ++
 block/file-posix.c |  98 +-
 block/io.c |  97 ++
 block/iscsi.c  | 314 +
 block/qcow2.c  | 229 
 block/raw-format.c |  96 +++---
 configure  |  17 ++
 include/block/block.h  |  32 
 include/block/block_int.h  |  38 
 include/block/raw-aio.h|  10 +-
 include/scsi/constants.h   |   4 +
 include/sysemu/block-backend.h |   4 +
 qemu-img.c |  50 +-
 13 files changed, 908 insertions(+), 99 deletions(-)

-- 
2.17.0




[Qemu-devel] [PATCH v9 02/10] raw: Check byte range uniformly

2018-06-01 Thread Fam Zheng
We don't verify the request range against s->size in the I/O callbacks
except for raw_co_pwritev. This is inconsistent (especially for
raw_co_pwrite_zeroes and raw_co_pdiscard), so fix them, and at the same
time make the helper reusable by the upcoming new callbacks.

Note that in most cases the block layer already verifies the request
byte range against our reported image length, before invoking the driver
callbacks.  The exception is during image creation, after
blk_set_allow_write_beyond_eof(blk, true) is called. But in that case,
the requests are not directly from the user or guest. So there is no
visible behavior change in adding the check code.

The int64_t -> uint64_t inconsistency, as shown by the type casting, is
pre-existing due to the interface.

Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Signed-off-by: Fam Zheng 
---
 block/raw-format.c | 64 --
 1 file changed, 39 insertions(+), 25 deletions(-)

diff --git a/block/raw-format.c b/block/raw-format.c
index fe33693a2d..b69a0674b3 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -167,16 +167,37 @@ static void raw_reopen_abort(BDRVReopenState *state)
 state->opaque = NULL;
 }
 
+/* Check and adjust the offset, against 'offset' and 'size' options. */
+static inline int raw_adjust_offset(BlockDriverState *bs, uint64_t *offset,
+uint64_t bytes, bool is_write)
+{
+BDRVRawState *s = bs->opaque;
+
+if (s->has_size && (*offset > s->size || bytes > (s->size - *offset))) {
+/* There's not enough space for the write, or the read request is
+ * out-of-range. Don't read/write anything to prevent leaking out of
+ * the size specified in options. */
+return is_write ? -ENOSPC : -EINVAL;
+}
+
+if (*offset > INT64_MAX - s->offset) {
+return -EINVAL;
+}
+*offset += s->offset;
+
+return 0;
+}
+
 static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset,
   uint64_t bytes, QEMUIOVector *qiov,
   int flags)
 {
-BDRVRawState *s = bs->opaque;
+int ret;
 
-if (offset > UINT64_MAX - s->offset) {
-return -EINVAL;
+ret = raw_adjust_offset(bs, &offset, bytes, false);
+if (ret) {
+return ret;
 }
-offset += s->offset;
 
 BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
 return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
@@ -186,23 +207,11 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, QEMUIOVector *qiov,
int flags)
 {
-BDRVRawState *s = bs->opaque;
 void *buf = NULL;
 BlockDriver *drv;
 QEMUIOVector local_qiov;
 int ret;
 
-if (s->has_size && (offset > s->size || bytes > (s->size - offset))) {
-/* There's not enough space for the data. Don't write anything and just
- * fail to prevent leaking out of the size specified in options. */
-return -ENOSPC;
-}
-
-if (offset > UINT64_MAX - s->offset) {
-ret = -EINVAL;
-goto fail;
-}
-
 if (bs->probed && offset < BLOCK_PROBE_BUF_SIZE && bytes) {
 /* Handling partial writes would be a pain - so we just
  * require that guests have 512-byte request alignment if
@@ -237,7 +246,10 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
 qiov = &local_qiov;
 }
 
-offset += s->offset;
+ret = raw_adjust_offset(bs, &offset, bytes, true);
+if (ret) {
+goto fail;
+}
 
 BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
 ret = bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
@@ -267,22 +279,24 @@ static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
  int64_t offset, int bytes,
  BdrvRequestFlags flags)
 {
-BDRVRawState *s = bs->opaque;
-if (offset > UINT64_MAX - s->offset) {
-return -EINVAL;
+int ret;
+
+ret = raw_adjust_offset(bs, (uint64_t *)&offset, bytes, true);
+if (ret) {
+return ret;
 }
-offset += s->offset;
 return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
 }
 
 static int coroutine_fn raw_co_pdiscard(BlockDriverState *bs,
 int64_t offset, int bytes)
 {
-BDRVRawState *s = bs->opaque;
-if (offset > UINT64_MAX - s->offset) {
-return -EINVAL;
+int ret;
+
+ret = raw_adjust_offset(bs, (uint64_t *)&offset, bytes, true);
+if (ret) {
+return ret;
 }
-offset += s->offset;
 return bdrv_co_pdiscard(bs->file->bs, offset, bytes);
 }
 
-- 
2.17.0
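
The range check in raw_adjust_offset() can be tried outside QEMU with a
small standalone sketch. RawWindow and window_adjust_offset() are
hypothetical names standing in for BDRVRawState and the helper above; the
sketch assumes, as the driver does, that the configured offset itself fits
in INT64_MAX.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 'offset' and 'size' options. */
typedef struct RawWindow {
    uint64_t offset;   /* start of the window inside the underlying file */
    uint64_t size;     /* length of the window, valid only if has_size */
    bool has_size;
} RawWindow;

/*
 * Validate [*offset, *offset + bytes) against the window and translate
 * *offset into the underlying file. Returns 0 or a negative errno; on
 * failure *offset is left untouched.
 */
static int window_adjust_offset(const RawWindow *w, uint64_t *offset,
                                uint64_t bytes, bool is_write)
{
    assert(w != NULL && offset != NULL);

    /* Written with a subtraction so that neither comparison can wrap. */
    if (w->has_size && (*offset > w->size || bytes > w->size - *offset)) {
        return is_write ? -ENOSPC : -EINVAL;
    }
    /* The translated offset must still fit in a signed 64-bit value. */
    if (*offset > INT64_MAX - w->offset) {
        return -EINVAL;
    }
    *offset += w->offset;
    return 0;
}
```

Comparing bytes against (size - offset) rather than (offset + bytes)
against size is what keeps the check safe for requests near UINT64_MAX.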




[Qemu-devel] [PATCH v9 05/10] file-posix: Implement bdrv_co_copy_range

2018-06-01 Thread Fam Zheng
With copy_file_range(2), we can implement the bdrv_co_copy_range
semantics.

Signed-off-by: Fam Zheng 
---
 block/file-posix.c  | 98 +++--
 configure   | 17 +++
 include/block/raw-aio.h | 10 -
 3 files changed, 120 insertions(+), 5 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 5a602cfe37..513d371bb1 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -59,6 +59,7 @@
 #ifdef __linux__
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -187,6 +188,8 @@ typedef struct RawPosixAIOData {
 #define aio_ioctl_cmd   aio_nbytes /* for QEMU_AIO_IOCTL */
 off_t aio_offset;
 int aio_type;
+int aio_fd2;
+off_t aio_offset2;
 } RawPosixAIOData;
 
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -1446,6 +1449,49 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
 return -ENOTSUP;
 }
 
+#ifndef HAVE_COPY_FILE_RANGE
+static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
+ off_t *out_off, size_t len, unsigned int flags)
+{
+#ifdef __NR_copy_file_range
+return syscall(__NR_copy_file_range, in_fd, in_off, out_fd,
+   out_off, len, flags);
+#else
+errno = ENOSYS;
+return -1;
+#endif
+}
+#endif
+
+static ssize_t handle_aiocb_copy_range(RawPosixAIOData *aiocb)
+{
+uint64_t bytes = aiocb->aio_nbytes;
+off_t in_off = aiocb->aio_offset;
+off_t out_off = aiocb->aio_offset2;
+
+while (bytes) {
+ssize_t ret = copy_file_range(aiocb->aio_fildes, &in_off,
+  aiocb->aio_fd2, &out_off,
+  bytes, 0);
+if (ret < 0) {
+/* copy_file_range() returns -1 and reports the cause in errno. */
+if (errno == EINTR) {
+continue;
+}
+if (errno == ENOSYS) {
+return -ENOTSUP;
+}
+return -errno;
+}
+if (!ret) {
+/* No progress (e.g. when beyond EOF), fall back to buffer I/O. */
+return -ENOTSUP;
+}
+bytes -= ret;
+}
+return 0;
+}
+
 static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 {
 int ret = -EOPNOTSUPP;
@@ -1526,6 +1572,9 @@ static int aio_worker(void *arg)
 case QEMU_AIO_WRITE_ZEROES:
 ret = handle_aiocb_write_zeroes(aiocb);
 break;
+case QEMU_AIO_COPY_RANGE:
+ret = handle_aiocb_copy_range(aiocb);
+break;
 default:
 fprintf(stderr, "invalid aio request (0x%x)\n", aiocb->aio_type);
 ret = -EINVAL;
@@ -1536,9 +1585,10 @@ static int aio_worker(void *arg)
 return ret;
 }
 
-static int paio_submit_co(BlockDriverState *bs, int fd,
-  int64_t offset, QEMUIOVector *qiov,
-  int bytes, int type)
+static int paio_submit_co_full(BlockDriverState *bs, int fd,
+   int64_t offset, int fd2, int64_t offset2,
+   QEMUIOVector *qiov,
+   int bytes, int type)
 {
 RawPosixAIOData *acb = g_new(RawPosixAIOData, 1);
 ThreadPool *pool;
@@ -1546,6 +1596,8 @@ static int paio_submit_co(BlockDriverState *bs, int fd,
 acb->bs = bs;
 acb->aio_type = type;
 acb->aio_fildes = fd;
+acb->aio_fd2 = fd2;
+acb->aio_offset2 = offset2;
 
 acb->aio_nbytes = bytes;
 acb->aio_offset = offset;
@@ -1561,6 +1613,13 @@ static int paio_submit_co(BlockDriverState *bs, int fd,
 return thread_pool_submit_co(pool, aio_worker, acb);
 }
 
+static inline int paio_submit_co(BlockDriverState *bs, int fd,
+ int64_t offset, QEMUIOVector *qiov,
+ int bytes, int type)
+{
+return paio_submit_co_full(bs, fd, offset, -1, 0, qiov, bytes, type);
+}
+
 static BlockAIOCB *paio_submit(BlockDriverState *bs, int fd,
 int64_t offset, QEMUIOVector *qiov, int bytes,
 BlockCompletionFunc *cb, void *opaque, int type)
@@ -2451,6 +2510,35 @@ static void raw_abort_perm_update(BlockDriverState *bs)
 raw_handle_perm_lock(bs, RAW_PL_ABORT, 0, 0, NULL);
 }
 
+static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
+   BdrvChild *src, uint64_t src_offset,
+   BdrvChild *dst, uint64_t dst_offset,
+   uint64_t bytes, BdrvRequestFlags flags)
+{
+return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags);
+}
+
+static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
+ BdrvChild *src, uint64_t src_offset,
+ BdrvChild *dst, uint64_t dst_offset,
+ uint64_t bytes, BdrvRequestFlags flags)
+BDRVRawState *s = bs->opaque;
+BDRVRawState *src_s;
+
+assert(dst->bs =

[Qemu-devel] [PATCH v9 03/10] raw: Implement copy offloading

2018-06-01 Thread Fam Zheng
Just pass down to ->file.

Signed-off-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 
---
 block/raw-format.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index b69a0674b3..f2e468df6f 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -497,6 +497,36 @@ static int raw_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
 return bdrv_probe_geometry(bs->file->bs, geo);
 }
 
+static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
+   BdrvChild *src, uint64_t src_offset,
+   BdrvChild *dst, uint64_t dst_offset,
+   uint64_t bytes, BdrvRequestFlags flags)
+{
+int ret;
+
+ret = raw_adjust_offset(bs, &src_offset, bytes, false);
+if (ret) {
+return ret;
+}
+return bdrv_co_copy_range_from(bs->file, src_offset, dst, dst_offset,
+   bytes, flags);
+}
+
+static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
+ BdrvChild *src, uint64_t src_offset,
+ BdrvChild *dst, uint64_t dst_offset,
+ uint64_t bytes, BdrvRequestFlags flags)
+{
+int ret;
+
+ret = raw_adjust_offset(bs, &dst_offset, bytes, true);
+if (ret) {
+return ret;
+}
+return bdrv_co_copy_range_to(src, src_offset, bs->file, dst_offset, bytes,
+ flags);
+}
+
 BlockDriver bdrv_raw = {
 .format_name  = "raw",
 .instance_size= sizeof(BDRVRawState),
@@ -513,6 +543,8 @@ BlockDriver bdrv_raw = {
 .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
 .bdrv_co_pdiscard = &raw_co_pdiscard,
 .bdrv_co_block_status = &raw_co_block_status,
+.bdrv_co_copy_range_from = &raw_co_copy_range_from,
+.bdrv_co_copy_range_to  = &raw_co_copy_range_to,
 .bdrv_truncate= &raw_truncate,
 .bdrv_getlength   = &raw_getlength,
 .has_variable_length  = true,
-- 
2.17.0




[Qemu-devel] [PATCH v9 06/10] iscsi: Query and save device designator when opening

2018-06-01 Thread Fam Zheng
The device designator data returned by the INQUIRY command will be useful to
fill in source/target fields during copy offloading. Query it when
connecting to the target and save the data for later use.

Signed-off-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 
---
 block/iscsi.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 3fd7203916..6d0035d4b9 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -68,6 +68,7 @@ typedef struct IscsiLun {
 QemuMutex mutex;
 struct scsi_inquiry_logical_block_provisioning lbp;
 struct scsi_inquiry_block_limits bl;
+struct scsi_inquiry_device_designator *dd;
 unsigned char *zeroblock;
 /* The allocmap tracks which clusters (pages) on the iSCSI target are
  * allocated and which are not. In case a target returns zeros for
@@ -1740,6 +1741,30 @@ static QemuOptsList runtime_opts = {
 },
 };
 
+static void iscsi_save_designator(IscsiLun *lun,
+  struct scsi_inquiry_device_identification *inq_di)
+{
+struct scsi_inquiry_device_designator *desig, *copy = NULL;
+
+for (desig = inq_di->designators; desig; desig = desig->next) {
+if (desig->association ||
+desig->designator_type > SCSI_DESIGNATOR_TYPE_NAA) {
+continue;
+}
+/* NAA works better than T10 vendor ID based designator. */
+if (!copy || copy->designator_type < desig->designator_type) {
+copy = desig;
+}
+}
+if (copy) {
+lun->dd = g_new(struct scsi_inquiry_device_designator, 1);
+*lun->dd = *copy;
+lun->dd->next = NULL;
+lun->dd->designator = g_malloc(copy->designator_length);
+memcpy(lun->dd->designator, copy->designator, copy->designator_length);
+}
+}
+
 static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
   Error **errp)
 {
@@ -1922,6 +1947,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
 struct scsi_task *inq_task;
 struct scsi_inquiry_logical_block_provisioning *inq_lbp;
 struct scsi_inquiry_block_limits *inq_bl;
+struct scsi_inquiry_device_identification *inq_di;
 switch (inq_vpd->pages[i]) {
 case SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING:
 inq_task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
@@ -1947,6 +1973,17 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
sizeof(struct scsi_inquiry_block_limits));
 scsi_free_scsi_task(inq_task);
 break;
+case SCSI_INQUIRY_PAGECODE_DEVICE_IDENTIFICATION:
+inq_task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
+SCSI_INQUIRY_PAGECODE_DEVICE_IDENTIFICATION,
+(void **) &inq_di, errp);
+if (inq_task == NULL) {
+ret = -EINVAL;
+goto out;
+}
+iscsi_save_designator(iscsilun, inq_di);
+scsi_free_scsi_task(inq_task);
+break;
 default:
 break;
 }
@@ -2003,6 +2040,10 @@ static void iscsi_close(BlockDriverState *bs)
 iscsi_logout_sync(iscsi);
 }
 iscsi_destroy_context(iscsi);
+if (iscsilun->dd) {
+g_free(iscsilun->dd->designator);
+g_free(iscsilun->dd);
+}
 g_free(iscsilun->zeroblock);
 iscsi_allocmap_free(iscsilun);
 qemu_mutex_destroy(&iscsilun->mutex);
-- 
2.17.0
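
The selection rule in iscsi_save_designator() (only designators associated
with the logical unit, and prefer NAA over the lower-numbered types) can be
sketched on its own. The Designator struct and pick_designator() below are
hypothetical and much simpler than libiscsi's types; the numeric type codes
are the SPC designator type values.

```c
#include <assert.h>
#include <stddef.h>

/* SPC designator types, in increasing order of preference here. */
enum { DESIG_T10_VENDOR = 1, DESIG_EUI64 = 2, DESIG_NAA = 3 };

/* Hypothetical, simplified designator record; not libiscsi's struct. */
typedef struct Designator {
    struct Designator *next;
    int association;      /* 0 means "addressed logical unit" */
    int designator_type;
} Designator;

/* Return the preferred designator in the list, or NULL if none fits. */
static const Designator *pick_designator(const Designator *list)
{
    const Designator *best = NULL;

    for (const Designator *d = list; d; d = d->next) {
        /* Skip port/target associations and types beyond NAA. */
        if (d->association != 0 || d->designator_type > DESIG_NAA) {
            continue;
        }
        /* A higher type value wins; ties keep the earlier entry. */
        if (!best || best->designator_type < d->designator_type) {
            best = d;
        }
    }
    assert(!best || best->association == 0);
    return best;
}
```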




[Qemu-devel] [PATCH v9 09/10] block-backend: Add blk_co_copy_range

2018-06-01 Thread Fam Zheng
It's a BlockBackend wrapper of the BDS interface.

Signed-off-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 
---
 block/block-backend.c  | 18 ++
 include/sysemu/block-backend.h |  4 
 2 files changed, 22 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 89f47b00ea..d55c328736 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2211,3 +2211,21 @@ void blk_unregister_buf(BlockBackend *blk, void *host)
 {
 bdrv_unregister_buf(blk_bs(blk), host);
 }
+
+int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
+   BlockBackend *blk_out, int64_t off_out,
+   int bytes, BdrvRequestFlags flags)
+{
+int r;
+r = blk_check_byte_request(blk_in, off_in, bytes);
+if (r) {
+return r;
+}
+r = blk_check_byte_request(blk_out, off_out, bytes);
+if (r) {
+return r;
+}
+return bdrv_co_copy_range(blk_in->root, off_in,
+  blk_out->root, off_out,
+  bytes, flags);
+}
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 92ab624fac..8d03d493c2 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -232,4 +232,8 @@ void blk_set_force_allow_inactivate(BlockBackend *blk);
 void blk_register_buf(BlockBackend *blk, void *host, size_t size);
 void blk_unregister_buf(BlockBackend *blk, void *host);
 
+int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
+   BlockBackend *blk_out, int64_t off_out,
+   int bytes, BdrvRequestFlags flags);
+
 #endif
-- 
2.17.0




[Qemu-devel] [PATCH v9 04/10] qcow2: Implement copy offloading

2018-06-01 Thread Fam Zheng
The two callbacks are implemented quite similarly to the read/write
functions: bdrv_co_copy_range_from maps for read and calls into bs->file
or bs->backing depending on the allocation status; bdrv_co_copy_range_to
maps for write and calls into bs->file.
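
As a rough illustration of the read-side mapping described above, the
decision made per cluster can be written as a pure function. The enum and
function names here are hypothetical and are not the qcow2 driver's own;
only the mapping itself is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the cluster types the patch switches on. */
typedef enum {
    CLUSTER_UNALLOCATED,
    CLUSTER_ZERO_PLAIN,
    CLUSTER_ZERO_ALLOC,
    CLUSTER_NORMAL,
    CLUSTER_COMPRESSED,
} ClusterKind;

/* Where the data for a copy_range_from request comes from. */
typedef enum {
    COPY_AS_ZEROES,     /* add BDRV_REQ_ZERO_WRITE, no source child */
    COPY_FROM_FILE,     /* allocated data: use bs->file */
    COPY_FROM_BACKING,  /* unallocated but covered by the backing file */
    COPY_UNSUPPORTED,   /* compressed: report -ENOTSUP */
} CopyPlan;

static CopyPlan plan_copy_from(ClusterKind kind, bool has_backing,
                               uint64_t src_offset, uint64_t backing_length)
{
    switch (kind) {
    case CLUSTER_UNALLOCATED:
        if (has_backing && src_offset < backing_length) {
            return COPY_FROM_BACKING;
        }
        return COPY_AS_ZEROES;
    case CLUSTER_ZERO_PLAIN:
    case CLUSTER_ZERO_ALLOC:
        return COPY_AS_ZEROES;
    case CLUSTER_NORMAL:
        return COPY_FROM_FILE;
    case CLUSTER_COMPRESSED:
        return COPY_UNSUPPORTED;
    }
    assert(!"unknown cluster kind");
    return COPY_UNSUPPORTED;
}
```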

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Fam Zheng 
---
 block/qcow2.c | 229 +++---
 1 file changed, 199 insertions(+), 30 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6d532470a8..8f89c4fe72 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1762,6 +1762,39 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
 return status;
 }
 
+static coroutine_fn int qcow2_handle_l2meta(BlockDriverState *bs,
+QCowL2Meta **pl2meta,
+bool link_l2)
+{
+int ret = 0;
+QCowL2Meta *l2meta = *pl2meta;
+
+while (l2meta != NULL) {
+QCowL2Meta *next;
+
+if (!ret && link_l2) {
+ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
+if (ret) {
+goto out;
+}
+}
+
+/* Take the request off the list of running requests */
+if (l2meta->nb_clusters != 0) {
+QLIST_REMOVE(l2meta, next_in_flight);
+}
+
+qemu_co_queue_restart_all(&l2meta->dependent_requests);
+
+next = l2meta->next;
+g_free(l2meta);
+l2meta = next;
+}
+out:
+*pl2meta = l2meta;
+return ret;
+}
+
 static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
 uint64_t bytes, QEMUIOVector *qiov,
 int flags)
@@ -2048,24 +2081,9 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
 }
 }
 
-while (l2meta != NULL) {
-QCowL2Meta *next;
-
-ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
-if (ret < 0) {
-goto fail;
-}
-
-/* Take the request off the list of running requests */
-if (l2meta->nb_clusters != 0) {
-QLIST_REMOVE(l2meta, next_in_flight);
-}
-
-qemu_co_queue_restart_all(&l2meta->dependent_requests);
-
-next = l2meta->next;
-g_free(l2meta);
-l2meta = next;
+ret = qcow2_handle_l2meta(bs, &l2meta, true);
+if (ret) {
+goto fail;
 }
 
 bytes -= cur_bytes;
@@ -2076,18 +2094,7 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
 ret = 0;
 
 fail:
-while (l2meta != NULL) {
-QCowL2Meta *next;
-
-if (l2meta->nb_clusters != 0) {
-QLIST_REMOVE(l2meta, next_in_flight);
-}
-qemu_co_queue_restart_all(&l2meta->dependent_requests);
-
-next = l2meta->next;
-g_free(l2meta);
-l2meta = next;
-}
+qcow2_handle_l2meta(bs, &l2meta, false);
 
 qemu_co_mutex_unlock(&s->lock);
 
@@ -3274,6 +3281,166 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
 return ret;
 }
 
+static int coroutine_fn
+qcow2_co_copy_range_from(BlockDriverState *bs,
+ BdrvChild *src, uint64_t src_offset,
+ BdrvChild *dst, uint64_t dst_offset,
+ uint64_t bytes, BdrvRequestFlags flags)
+{
+BDRVQcow2State *s = bs->opaque;
+int ret;
+unsigned int cur_bytes; /* number of bytes in current iteration */
+BdrvChild *child = NULL;
+BdrvRequestFlags cur_flags;
+
+assert(!bs->encrypted);
+qemu_co_mutex_lock(&s->lock);
+
+while (bytes != 0) {
+uint64_t copy_offset = 0;
+/* prepare next request */
+cur_bytes = MIN(bytes, INT_MAX);
+cur_flags = flags;
+
+ret = qcow2_get_cluster_offset(bs, src_offset, &cur_bytes, &copy_offset);
+if (ret < 0) {
+goto out;
+}
+
+switch (ret) {
+case QCOW2_CLUSTER_UNALLOCATED:
+if (bs->backing && bs->backing->bs) {
+int64_t backing_length = bdrv_getlength(bs->backing->bs);
+if (src_offset >= backing_length) {
+cur_flags |= BDRV_REQ_ZERO_WRITE;
+} else {
+child = bs->backing;
+cur_bytes = MIN(cur_bytes, backing_length - src_offset);
+copy_offset = src_offset;
+}
+} else {
+cur_flags |= BDRV_REQ_ZERO_WRITE;
+}
+break;
+
+case QCOW2_CLUSTER_ZERO_PLAIN:
+case QCOW2_CLUSTER_ZERO_ALLOC:
+cur_flags |= BDRV_REQ_ZERO_WRITE;
+break;
+
+case QCOW2_CLUSTER_COMPRESSED:
+ret = -ENOTSUP;
+goto out;
+break;
+
+case QCOW2_CLUSTER_NORMAL:
+child = bs->file;
+c

[Qemu-devel] [PATCH v9 07/10] iscsi: Create and use iscsi_co_wait_for_task

2018-06-01 Thread Fam Zheng
This loop is repeated a growing number of times. Make a helper.

Signed-off-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
---
 block/iscsi.c | 54 ---
 1 file changed, 17 insertions(+), 37 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 6d0035d4b9..6a365cb07b 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -556,6 +556,17 @@ static inline bool iscsi_allocmap_is_valid(IscsiLun *iscsilun,
offset / iscsilun->cluster_size) == size);
 }
 
+static void coroutine_fn iscsi_co_wait_for_task(IscsiTask *iTask,
+IscsiLun *iscsilun)
+{
+while (!iTask->complete) {
+iscsi_set_events(iscsilun);
+qemu_mutex_unlock(&iscsilun->mutex);
+qemu_coroutine_yield();
+qemu_mutex_lock(&iscsilun->mutex);
+}
+}
+
 static int coroutine_fn
 iscsi_co_writev(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
 QEMUIOVector *iov, int flags)
@@ -617,12 +628,7 @@ retry:
 scsi_task_set_iov_out(iTask.task, (struct scsi_iovec *) iov->iov,
   iov->niov);
 #endif
-while (!iTask.complete) {
-iscsi_set_events(iscsilun);
-qemu_mutex_unlock(&iscsilun->mutex);
-qemu_coroutine_yield();
-qemu_mutex_lock(&iscsilun->mutex);
-}
+iscsi_co_wait_for_task(&iTask, iscsilun);
 
 if (iTask.task != NULL) {
 scsi_free_scsi_task(iTask.task);
@@ -693,13 +699,7 @@ retry:
 ret = -ENOMEM;
 goto out_unlock;
 }
-
-while (!iTask.complete) {
-iscsi_set_events(iscsilun);
-qemu_mutex_unlock(&iscsilun->mutex);
-qemu_coroutine_yield();
-qemu_mutex_lock(&iscsilun->mutex);
-}
+iscsi_co_wait_for_task(&iTask, iscsilun);
 
 if (iTask.do_retry) {
 if (iTask.task != NULL) {
@@ -863,13 +863,8 @@ retry:
 #if LIBISCSI_API_VERSION < (20160603)
 scsi_task_set_iov_in(iTask.task, (struct scsi_iovec *) iov->iov, iov->niov);
 #endif
-while (!iTask.complete) {
-iscsi_set_events(iscsilun);
-qemu_mutex_unlock(&iscsilun->mutex);
-qemu_coroutine_yield();
-qemu_mutex_lock(&iscsilun->mutex);
-}
 
+iscsi_co_wait_for_task(&iTask, iscsilun);
 if (iTask.task != NULL) {
 scsi_free_scsi_task(iTask.task);
 iTask.task = NULL;
@@ -906,12 +901,7 @@ retry:
 return -ENOMEM;
 }
 
-while (!iTask.complete) {
-iscsi_set_events(iscsilun);
-qemu_mutex_unlock(&iscsilun->mutex);
-qemu_coroutine_yield();
-qemu_mutex_lock(&iscsilun->mutex);
-}
+iscsi_co_wait_for_task(&iTask, iscsilun);
 
 if (iTask.task != NULL) {
 scsi_free_scsi_task(iTask.task);
@@ -1143,12 +1133,7 @@ retry:
 goto out_unlock;
 }
 
-while (!iTask.complete) {
-iscsi_set_events(iscsilun);
-qemu_mutex_unlock(&iscsilun->mutex);
-qemu_coroutine_yield();
-qemu_mutex_lock(&iscsilun->mutex);
-}
+iscsi_co_wait_for_task(&iTask, iscsilun);
 
 if (iTask.task != NULL) {
 scsi_free_scsi_task(iTask.task);
@@ -1244,12 +1229,7 @@ retry:
 return -ENOMEM;
 }
 
-while (!iTask.complete) {
-iscsi_set_events(iscsilun);
-qemu_mutex_unlock(&iscsilun->mutex);
-qemu_coroutine_yield();
-qemu_mutex_lock(&iscsilun->mutex);
-}
+iscsi_co_wait_for_task(&iTask, iscsilun);
 
 if (iTask.status == SCSI_STATUS_CHECK_CONDITION &&
 iTask.task->sense.key == SCSI_SENSE_ILLEGAL_REQUEST &&
-- 
2.17.0




Re: [Qemu-devel] [PATCH v3 00/17] tcg: tb_lock removal redux v3

2018-06-01 Thread Alex Bennée


Richard Henderson  writes:

> On 05/30/2018 03:46 PM, Richard Henderson wrote:
>> Thanks.  Queued to tcg-next.
> Hmph.  Unqueued, at least for now.
>
> ERROR:/home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615:page_unlock__debug:
> assertion failed: (page_is_locked(pd))
>
> #3  0x74b6915e in g_assertion_message_expr ()
> at /lib64/libglib-2.0.so.0
> #4  0x5583c088 in page_unlock__debug (pd=0x7fffa423aa80)
> at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615
> #5  0x5583c1be in page_unlock (pd=0x7fffa423aa80)
> at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:661
> #6  0x5583c2ef in page_entry_destroy (p=0x7fffa8024460)
> at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:694
> #7  0x74b6f448 in  () at /lib64/libglib-2.0.so.0
> #8  0x74b6fea2 in g_tree_destroy () at /lib64/libglib-2.0.so.0
> #9  0x5583c791 in page_collection_unlock (set=0x7fffa802eba0)
> at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:842
> #10 0x557b301a in memory_notdirty_write_complete (ndi=0x7fffd9cf6050)
> at /home/rth/work/qemu/qemu/exec.c:2495
> #11 0x557b317f in notdirty_mem_write (opaque=0x0, ram_addr=12334096,
> val=18446739675675374544, size=8) at /home/rth/work/qemu/qemu/exec.c:2535
> #12 0x5580f14b in memory_region_write_accessor (mr=0x562a38a0
> , addr=12334096, value=0x7fffd9cf6178, size=8, shift=0,
> mask=18446744073709551615, attrs=...) at /home/rth/work/qemu/qemu/memory.c:530
> #13 0x5580f360 in access_with_adjusted_size (addr=12334096,
> value=0x7fffd9cf6178, size=8, access_size_min=1, access_size_max=8, access_fn=
> 0x5580f061 , mr=0x562a38a0
> , attrs=...) at /home/rth/work/qemu/qemu/memory.c:597
> #14 0x55811cef in memory_region_dispatch_write (mr=0x562a38a0
> , addr=12334096, data=18446739675675374544, size=8, 
> attrs=...)
> at /home/rth/work/qemu/qemu/memory.c:1474
> #15 0x55825d73 in io_writex (env=0x56869090,
> iotlbentry=0x56870520, mmu_idx=0, val=18446739675675374544,
> addr=18446739675675374608, retaddr=140736231479305, size=8) at
> /home/rth/work/qemu/qemu/accel/tcg/cputlb.c:813
> #16 0x55828b6d in io_writeq (env=0x56869090, mmu_idx=0, index=225,
> val=18446739675675374544, addr=18446739675675374608, retaddr=140736231479305)
> at /home/rth/work/qemu/qemu/accel/tcg/softmmu_template.h:265
> #17 0x55828d2c in helper_le_stq_mmu (env=0x56869090,
> addr=18446739675675374608, val=18446739675675374544, oi=48,
> retaddr=140736231479305)
> at /home/rth/work/qemu/qemu/accel/tcg/softmmu_template.h:301
> #18 0x7fffb5159809 in code_gen_buffer ()
>
> I can invoke similar crashes with just about every image I try.

Just booting up? I've been hammering builds in my system image with
debug-tcg enabled and haven't triggered it yet.

Using:

./aarch64-softmmu/qemu-system-aarch64 \
  -machine virt,graphics=on,gic-version=3,virtualization=on \
  -cpu cortex-a53 --serial mon:stdio \
  -nic user,model=virtio-net-pci,hostfwd=tcp::-:22 \
  -device virtio-blk-device,drive=myblock \
  -drive file=/home/alex/lsrc/qemu/images/debian-stable-arm64.qcow2,id=myblock,index=0,if=none \
  -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img \
  -append "console=ttyAMA0 root=/dev/vda1" -display none -m 4096 \
  -name debug-threads=on -smp 8
--
Alex Bennée



[Qemu-devel] [PATCH v9 08/10] iscsi: Implement copy offloading

2018-06-01 Thread Fam Zheng
Issue the EXTENDED COPY (LID1) command to implement the copy_range API.

The parameter data construction code is modified from libiscsi's
iscsi-dd.c.
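
Most of the parameter data construction is big-endian field packing. As a
sketch of the idea only: store_be() and fill_segment_fields() below are
hypothetical helpers, not part of this patch or of libiscsi. The byte
offsets (10 for the block count, 12 and 20 for the two LBAs) are the ones
used by iscsi_xcopy_populate_desc() in the diff, which implies a buffer of
at least 28 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low 'len' bytes of 'val' at 'buf', most significant first. */
static void store_be(uint8_t *buf, uint64_t val, size_t len)
{
    assert(len >= 1 && len <= 8);
    for (size_t i = 0; i < len; i++) {
        buf[len - 1 - i] = val & 0xFF;
        val >>= 8;
    }
}

/*
 * Fill the block count and the two LBAs of a block-to-block segment
 * descriptor. 'desc' must point to at least 28 bytes.
 */
static void fill_segment_fields(uint8_t *desc, uint16_t num_blks,
                                uint64_t src_lba, uint64_t dst_lba)
{
    store_be(desc + 10, num_blks, 2);  /* bytes 10..11 */
    store_be(desc + 12, src_lba, 8);   /* bytes 12..19 */
    store_be(desc + 20, dst_lba, 8);   /* bytes 20..27 */
}
```

Taking num_blks as uint16_t makes the "fewer than 65536 blocks" limit a
property of the type instead of a runtime assertion.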

Signed-off-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 
---
 block/iscsi.c| 219 +++
 include/scsi/constants.h |   4 +
 2 files changed, 223 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 6a365cb07b..c2fbd8a8aa 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2205,6 +2205,221 @@ static void coroutine_fn iscsi_co_invalidate_cache(BlockDriverState *bs,
 iscsi_allocmap_invalidate(iscsilun);
 }
 
+static int coroutine_fn iscsi_co_copy_range_from(BlockDriverState *bs,
+ BdrvChild *src,
+ uint64_t src_offset,
+ BdrvChild *dst,
+ uint64_t dst_offset,
+ uint64_t bytes,
+ BdrvRequestFlags flags)
+{
+return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags);
+}
+
+static struct scsi_task *iscsi_xcopy_task(int param_len)
+{
+struct scsi_task *task;
+
+task = g_new0(struct scsi_task, 1);
+
+task->cdb[0] = EXTENDED_COPY;
+task->cdb[10]= (param_len >> 24) & 0xFF;
+task->cdb[11]= (param_len >> 16) & 0xFF;
+task->cdb[12]= (param_len >> 8) & 0xFF;
+task->cdb[13]= param_len & 0xFF;
+task->cdb_size   = 16;
+task->xfer_dir   = SCSI_XFER_WRITE;
+task->expxferlen = param_len;
+
+return task;
+}
+
+static void iscsi_populate_target_desc(unsigned char *desc, IscsiLun *lun)
+{
+struct scsi_inquiry_device_designator *dd = lun->dd;
+
+memset(desc, 0, 32);
+desc[0] = 0xE4; /* IDENT_DESCR_TGT_DESCR */
+desc[4] = dd->code_set;
+desc[5] = (dd->designator_type & 0xF)
+| ((dd->association & 3) << 4);
+desc[7] = dd->designator_length;
+memcpy(desc + 8, dd->designator, dd->designator_length);
+
+desc[28] = 0;
+desc[29] = (lun->block_size >> 16) & 0xFF;
+desc[30] = (lun->block_size >> 8) & 0xFF;
+desc[31] = lun->block_size & 0xFF;
+}
+
+static void iscsi_xcopy_desc_hdr(uint8_t *hdr, int dc, int cat, int src_index,
+ int dst_index)
+{
+hdr[0] = 0x02; /* BLK_TO_BLK_SEG_DESCR */
+hdr[1] = ((dc << 1) | cat) & 0xFF;
+hdr[2] = (XCOPY_BLK2BLK_SEG_DESC_SIZE >> 8) & 0xFF;
+/* don't account for the first 4 bytes in descriptor header */
+hdr[3] = (XCOPY_BLK2BLK_SEG_DESC_SIZE - 4 /* SEG_DESC_SRC_INDEX_OFFSET */) & 0xFF;
+hdr[4] = (src_index >> 8) & 0xFF;
+hdr[5] = src_index & 0xFF;
+hdr[6] = (dst_index >> 8) & 0xFF;
+hdr[7] = dst_index & 0xFF;
+}
+
+static void iscsi_xcopy_populate_desc(uint8_t *desc, int dc, int cat,
+  int src_index, int dst_index, int num_blks,
+  uint64_t src_lba, uint64_t dst_lba)
+{
+iscsi_xcopy_desc_hdr(desc, dc, cat, src_index, dst_index);
+
+/* The caller should verify the request size */
+assert(num_blks < 65536);
+desc[10] = (num_blks >> 8) & 0xFF;
+desc[11] = num_blks & 0xFF;
+desc[12] = (src_lba >> 56) & 0xFF;
+desc[13] = (src_lba >> 48) & 0xFF;
+desc[14] = (src_lba >> 40) & 0xFF;
+desc[15] = (src_lba >> 32) & 0xFF;
+desc[16] = (src_lba >> 24) & 0xFF;
+desc[17] = (src_lba >> 16) & 0xFF;
+desc[18] = (src_lba >> 8) & 0xFF;
+desc[19] = src_lba & 0xFF;
+desc[20] = (dst_lba >> 56) & 0xFF;
+desc[21] = (dst_lba >> 48) & 0xFF;
+desc[22] = (dst_lba >> 40) & 0xFF;
+desc[23] = (dst_lba >> 32) & 0xFF;
+desc[24] = (dst_lba >> 24) & 0xFF;
+desc[25] = (dst_lba >> 16) & 0xFF;
+desc[26] = (dst_lba >> 8) & 0xFF;
+desc[27] = dst_lba & 0xFF;
+}
+
+static void iscsi_xcopy_populate_header(unsigned char *buf, int list_id, int str,
+int list_id_usage, int prio,
+int tgt_desc_len,
+int seg_desc_len, int inline_data_len)
+{
+buf[0] = list_id;
+buf[1] = ((str & 1) << 5) | ((list_id_usage & 3) << 3) | (prio & 7);
+buf[2] = (tgt_desc_len >> 8) & 0xFF;
+buf[3] = tgt_desc_len & 0xFF;
+buf[8] = (seg_desc_len >> 24) & 0xFF;
+buf[9] = (seg_desc_len >> 16) & 0xFF;
+buf[10] = (seg_desc_len >> 8) & 0xFF;
+buf[11] = seg_desc_len & 0xFF;
+buf[12] = (inline_data_len >> 24) & 0xFF;
+buf[13] = (inline_data_len >> 16) & 0xFF;
+buf[14] = (inline_data_len >> 8) & 0xFF;
+buf[15] = inline_data_len & 0xFF;
+}
+
+static void iscsi_xcopy_data(struct iscsi_data *data,
+ IscsiLun *src, int64_t src_lba,
+ IscsiLun *dst, int64_t dst_lba,
+ uint16_t num_blocks)
+{
+uint8_

Re: [Qemu-devel] [PATCH v8 06/11] file-posix: Implement bdrv_co_copy_range

2018-06-01 Thread Stefan Hajnoczi
On Fri, Jun 01, 2018 at 02:28:44PM +0800, Fam Zheng wrote:
> With copy_file_range(2), we can implement the bdrv_co_copy_range
> semantics.
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/file-posix.c  | 98 +++--
>  configure   | 17 +++
>  include/block/raw-aio.h | 10 -
>  3 files changed, 120 insertions(+), 5 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


[Qemu-devel] [PATCH v9 10/10] qemu-img: Convert with copy offloading

2018-06-01 Thread Fam Zheng
The new blk_co_copy_range interface offers a more efficient way to copy
data in the case of network-based storage. Make use of it to allow a faster
convert operation.

Since copy offloading cannot do zero detection (-S) or compression (-c),
only try it when these options are not used.

Signed-off-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 
---
 qemu-img.c | 50 --
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 976b437da0..75f1610aa0 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1547,6 +1547,7 @@ typedef struct ImgConvertState {
 bool compressed;
 bool target_has_backing;
 bool wr_in_order;
+bool copy_range;
 int min_sparse;
 size_t cluster_sectors;
 size_t buf_sectors;
@@ -1740,6 +1741,37 @@ static int coroutine_fn convert_co_write(ImgConvertState *s, int64_t sector_num,
 return 0;
 }
 
+static int coroutine_fn convert_co_copy_range(ImgConvertState *s, int64_t sector_num,
+  int nb_sectors)
+{
+int n, ret;
+
+while (nb_sectors > 0) {
+BlockBackend *blk;
+int src_cur;
+int64_t bs_sectors, src_cur_offset;
+int64_t offset;
+
+convert_select_part(s, sector_num, &src_cur, &src_cur_offset);
+offset = (sector_num - src_cur_offset) << BDRV_SECTOR_BITS;
+blk = s->src[src_cur];
+bs_sectors = s->src_sectors[src_cur];
+
+n = MIN(nb_sectors, bs_sectors - (sector_num - src_cur_offset));
+
+ret = blk_co_copy_range(blk, offset, s->target,
+sector_num << BDRV_SECTOR_BITS,
+n << BDRV_SECTOR_BITS, 0);
+if (ret < 0) {
+return ret;
+}
+
+sector_num += n;
+nb_sectors -= n;
+}
+return 0;
+}
+
 static void coroutine_fn convert_co_do_copy(void *opaque)
 {
 ImgConvertState *s = opaque;
@@ -1762,6 +1794,7 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
 int n;
 int64_t sector_num;
 enum ImgConvertBlockStatus status;
+bool copy_range;
 
 qemu_co_mutex_lock(&s->lock);
 if (s->ret != -EINPROGRESS || s->sector_num >= s->total_sectors) {
@@ -1791,7 +1824,9 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
 s->allocated_sectors, 0);
 }
 
-if (status == BLK_DATA) {
+retry:
+copy_range = s->copy_range && s->status == BLK_DATA;
+if (status == BLK_DATA && !copy_range) {
 ret = convert_co_read(s, sector_num, n, buf);
 if (ret < 0) {
 error_report("error while reading sector %" PRId64
@@ -1813,7 +1848,15 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
 }
 
 if (s->ret == -EINPROGRESS) {
-ret = convert_co_write(s, sector_num, n, buf, status);
+if (copy_range) {
+ret = convert_co_copy_range(s, sector_num, n);
+if (ret) {
+s->copy_range = false;
+goto retry;
+}
+} else {
+ret = convert_co_write(s, sector_num, n, buf, status);
+}
 if (ret < 0) {
 error_report("error while writing sector %" PRId64
  ": %s", sector_num, strerror(-ret));
@@ -1936,6 +1979,7 @@ static int img_convert(int argc, char **argv)
 ImgConvertState s = (ImgConvertState) {
 /* Need at least 4k of zeros for sparse detection */
 .min_sparse = 8,
+.copy_range = true,
 .buf_sectors= IO_BUF_SIZE / BDRV_SECTOR_SIZE,
 .wr_in_order= true,
 .num_coroutines = 8,
@@ -1976,6 +2020,7 @@ static int img_convert(int argc, char **argv)
 break;
 case 'c':
 s.compressed = true;
+s.copy_range = false;
 break;
 case 'o':
 if (!is_valid_option_list(optarg)) {
@@ -2017,6 +2062,7 @@ static int img_convert(int argc, char **argv)
 }
 
 s.min_sparse = sval / BDRV_SECTOR_SIZE;
+s.copy_range = false;
 break;
 }
 case 'p':
-- 
2.17.0




Re: [Qemu-devel] [PATCH v8 00/11] qemu-img convert with copy offloading

2018-06-01 Thread Stefan Hajnoczi
On Thu, May 31, 2018 at 11:45:17PM -0700, no-re...@patchew.org wrote:
> /var/tmp/patchew-tester-tmp-2l7s8dte/src/block/iscsi.c: In function 
> ‘iscsi_populate_target_desc’:
> /var/tmp/patchew-tester-tmp-2l7s8dte/src/block/iscsi.c:2242:15: error: 
> ‘IDENT_DESCR_TGT_DESCR’ undeclared (first use in this function); did you mean 
> ‘IDENT_DESCR_TGT_DESCR_SIZE’?
>  desc[0] = IDENT_DESCR_TGT_DESCR;
>^
>IDENT_DESCR_TGT_DESCR_SIZE
> /var/tmp/patchew-tester-tmp-2l7s8dte/src/block/iscsi.c:2242:15: note: each 
> undeclared identifier is reported only once for each function it appears in

Fam, is this failure expected?

Aside from this I'm happy with the series.




Re: [Qemu-devel] [PATCH v2 05/20] 9p: Properly set errp in fstatfs error path

2018-06-01 Thread Greg Kurz
On Thu, 31 May 2018 21:26:00 -0400
Keno Fischer  wrote:

> In the review of
> 
> 9p: Avoid warning if FS_IOC_GETVERSION is not defined
> 
> Greg Kurz noted this error path was failing to set errp.
> Fix that.
> 
> Signed-off-by: Keno Fischer 
> ---

This is a bug fix so I've applied it to 9p-next.

Thanks!

> 
> Changes since v1: New patch
> 
>  hw/9pfs/9p-local.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
> index adc169a..576c8e3 100644
> --- a/hw/9pfs/9p-local.c
> +++ b/hw/9pfs/9p-local.c
> @@ -1420,6 +1420,8 @@ static int local_init(FsContext *ctx, Error **errp)
>   */
>  if (fstatfs(data->mountfd, &stbuf) < 0) {
>  close_preserve_errno(data->mountfd);
> +error_setg_errno(errp, errno,
> +"failed to stat file system at '%s'", ctx->fs_root);
>  goto err;
>  }
>  switch (stbuf.f_type) {




Re: [Qemu-devel] [PATCH v8 01/11] docker: Update fedora image to 28

2018-06-01 Thread Stefan Hajnoczi
On Fri, Jun 01, 2018 at 02:28:39PM +0800, Fam Zheng wrote:
> Signed-off-by: Fam Zheng 
> ---
>  tests/docker/dockerfiles/fedora.docker | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Seems reasonable; Fedora is a cutting-edge distro.  Unlike stable
distros such as CentOS and Debian, where we actually want the oldest
supported release, we want the latest release for Fedora.

Reviewed-by: Stefan Hajnoczi 




Re: [Qemu-devel] Recording I/O activity after KVM does a VMEXIT

2018-06-01 Thread Pavel Dovgalyuk
That’s right.

Pavel Dovgalyuk

From: Arnabjyoti Kalita [mailto:akal...@cs.stonybrook.edu] 
Sent: Friday, June 01, 2018 11:27 AM
To: Pavel Dovgalyuk
Cc: Stefan Hajnoczi; qemu-devel@nongnu.org; Pavel Dovgalyuk
Subject: Re: [Qemu-devel] Recording I/O activity after KVM does a VMEXIT

 

Dear Pavel,

Thank you for providing all the details. Let us take a network packet as an
example. In icount mode, when the network backend receives a packet, you
record the whole packet with the help of the replay-filter, and the packet is
written to the log file. At replay time, you stop accepting packets from the
network backend and instead inject the packets already recorded in the log
file directly into the guest's address space. Is my understanding correct?

Thanks and Regards,

Arnab

 

On Fri, Jun 1, 2018 at 1:31 AM, Pavel Dovgalyuk  wrote:

Hi,

 

I’m not familiar with KVM, but I know of successful attempts at replaying the
execution by logging IO and MMIO in TCG mode.

The difference between CPU I/O and VM I/O is the following. In icount mode we
record everything coming into the VM, but not what goes into the CPU. That
means the whole packet is recorded. The virtual hardware behaves
deterministically, so the CPU gets identical input during replay, because
the whole recorded packet is injected again by the filter.
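
The record-then-inject idea described above can be sketched as a tiny packet log. This is only an illustration under stated assumptions: `PacketLog`, `log_record()` and `log_replay_next()` are hypothetical names, the log lives in memory rather than in a file, and nothing here is QEMU's actual replay code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOG_CAPACITY 16   /* hypothetical limits, chosen for the sketch */
#define MAX_PACKET   64

/* One recorded packet: the whole payload is kept, not individual CPU accesses. */
typedef struct {
    size_t len;
    unsigned char data[MAX_PACKET];
} LoggedPacket;

typedef struct {
    LoggedPacket entries[LOG_CAPACITY];
    size_t count;   /* packets recorded so far */
    size_t next;    /* next packet to hand out during replay */
} PacketLog;

/* Record mode: called when the backend delivers a packet to the VM. */
static int log_record(PacketLog *log, const void *data, size_t len)
{
    assert(log != NULL && data != NULL);
    if (log->count == LOG_CAPACITY || len > MAX_PACKET) {
        return -1;
    }
    memcpy(log->entries[log->count].data, data, len);
    log->entries[log->count].len = len;
    log->count++;
    return 0;
}

/* Replay mode: the live backend is ignored; packets come from the log only. */
static const LoggedPacket *log_replay_next(PacketLog *log)
{
    assert(log != NULL);
    if (log->next >= log->count) {
        return NULL;   /* nothing left to inject */
    }
    return &log->entries[log->next++];
}
```

Because the virtual device is deterministic, feeding it the same packets in the same order is enough; the individual MMIO accesses the CPU makes while processing them do not need to be logged.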

 

Pavel Dovgalyuk

 

From: Arnabjyoti Kalita [mailto:akal...@cs.stonybrook.edu] 
Sent: Thursday, May 31, 2018 11:14 PM
To: Pavel Dovgalyuk
Cc: Stefan Hajnoczi; qemu-devel@nongnu.org; Pavel Dovgalyuk
Subject: Re: [Qemu-devel] Recording I/O activity after KVM does a VMEXIT

 

Dear Pavel,

 

Thank you for your answer. I am not able to understand the difference between
CPU I/O and VM I/O. Would a network packet that comes into the guest OS from
the outside be part of VM I/O or CPU I/O? I am only interested in "recording"
and "replaying" network packets that come from the outside into the
networking backend, not the other way around. For example, when I get a
VMExit because of the arrival of a network packet, I will use the VMExit
reason "KVM_EXIT_MMIO" to trace back to "e1000_mmio_write()", which I expect
should be enough to record network packets that come from the outside and are
written to the guest address space for "e1000" devices. In that case, I think
I will not have to use the "network-filter" backend that you use to record
VM I/O only. Let me know if you find errors in my approach.

I will try to see how I can record disk packets. If disk packets use other
ways of writing to guest memory apart from a normal VMExit, I will try to
find them. Eventually I hope they will use one of the available disk
front-end functions to write to guest memory from the disk, just as e1000
does with an "e1000_mmio_write()" call.

 

Thanks and best regards,

Arnab

On Thu, May 31, 2018 at 8:44 AM, Pavel Dovgalyuk  wrote:

> From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
> On Wed, May 30, 2018 at 11:19:13PM -0400, Arnabjyoti Kalita wrote:
> > I am trying to implement a 'minimal' record-replay mechanism for KVM, which
> > is similar to the one existing for TCG via -icount. I am trying to record
> > I/O events only (specifically disk and network events) when KVM does a
> > VMEXIT. This has led me to the function kvm_cpu_exec where I can clearly
> > see the different ways of handling all of the possible VMExit cases (like
> > PIO, MMIO etc.). To record network packets, I am working with the e1000
> > hardware device.
> >
> > Can I make sure that all of the network I/O, atleast for the e1000 device
> > happens through the KVM_EXIT_MMIO case and subsequent use of the
> > address_space_rw() function ? Do I also need to look at other functions as
> > well ? Also for recording disk activity, can I make sure that looking out
> > for the KVM_EXIT_MMIO and/or KVM_EXIT_PIO cases in the vmexit mechanism,
> > will be enough ?
> >
> > Let me know if there are other details that I need to take care of. I am
> > using QEMU 2.11 on a x86-64 CPU and the guest runs a Linux Kernel 4.4 with
> > Ubuntu 16.04.

The main icount-based record/replay advantage is that we don't record
any CPU IO. We record only VM IO (e.g., by using the network filter).

Disk devices may transfer data to CPU using DMA, therefore intercepting
only VMExit cases will not be enough.

Pavel Dovgalyuk

 

 



Re: [Qemu-devel] [PATCH v8 00/11] qemu-img convert with copy offloading

2018-06-01 Thread Fam Zheng
On Fri, 06/01 10:37, Stefan Hajnoczi wrote:
> On Thu, May 31, 2018 at 11:45:17PM -0700, no-re...@patchew.org wrote:
> > /var/tmp/patchew-tester-tmp-2l7s8dte/src/block/iscsi.c: In function 
> > ‘iscsi_populate_target_desc’:
> > /var/tmp/patchew-tester-tmp-2l7s8dte/src/block/iscsi.c:2242:15: error: 
> > ‘IDENT_DESCR_TGT_DESCR’ undeclared (first use in this function); did you 
> > mean ‘IDENT_DESCR_TGT_DESCR_SIZE’?
> >  desc[0] = IDENT_DESCR_TGT_DESCR;
> >^
> >IDENT_DESCR_TGT_DESCR_SIZE
> > /var/tmp/patchew-tester-tmp-2l7s8dte/src/block/iscsi.c:2242:15: note: each 
> > undeclared identifier is reported only once for each function it appears in
> 
> Fam, is this failure expected?

Nope. See my other reply (v9 posted).

Fam



Re: [Qemu-devel] [PATCH v2 06/20] 9p: Avoid warning if FS_IOC_GETVERSION is not defined

2018-06-01 Thread Greg Kurz
On Thu, 31 May 2018 21:26:01 -0400
Keno Fischer  wrote:

> Both `stbuf` and `local_ioc_getversion` were unused when
> FS_IOC_GETVERSION was not defined, causing a compiler warning.
> 
> Reorganize the code to avoid this warning.
> 
> Signed-off-by: Keno Fischer 
> ---
> 
> Changes since v1:
>  * As requested in review, logic is factored into a
>local_ioc_getversion_init function.
> 
>  hw/9pfs/9p-local.c | 43 +--
>  1 file changed, 25 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
> index 576c8e3..6222891 100644
> --- a/hw/9pfs/9p-local.c
> +++ b/hw/9pfs/9p-local.c
> @@ -1375,10 +1375,10 @@ static int local_unlinkat(FsContext *ctx, V9fsPath 
> *dir,
>  return ret;
>  }
>  
> +#ifdef FS_IOC_GETVERSION
>  static int local_ioc_getversion(FsContext *ctx, V9fsPath *path,
>  mode_t st_mode, uint64_t *st_gen)
>  {
> -#ifdef FS_IOC_GETVERSION
>  int err;
>  V9fsFidOpenState fid_open;
>  
> @@ -1397,32 +1397,19 @@ static int local_ioc_getversion(FsContext *ctx, 
> V9fsPath *path,
>  err = ioctl(fid_open.fd, FS_IOC_GETVERSION, st_gen);
>  local_close(ctx, &fid_open);
>  return err;
> -#else
> -errno = ENOTTY;
> -return -1;
> -#endif
>  }
> +#endif
>  
> -static int local_init(FsContext *ctx, Error **errp)
> +static int local_ioc_getversion_init(FsContext *ctx, LocalData *data)
>  {
> +#ifdef FS_IOC_GETVERSION
>  struct statfs stbuf;
> -LocalData *data = g_malloc(sizeof(*data));
>  
> -data->mountfd = open(ctx->fs_root, O_DIRECTORY | O_RDONLY);
> -if (data->mountfd == -1) {
> -error_setg_errno(errp, errno, "failed to open '%s'", ctx->fs_root);
> -goto err;
> -}
> -
> -#ifdef FS_IOC_GETVERSION
>  /*
>   * use ioc_getversion only if the ioctl is definied
>   */
>  if (fstatfs(data->mountfd, &stbuf) < 0) {
> -close_preserve_errno(data->mountfd);
> -error_setg_errno(errp, errno,
> -"failed to stat file system at '%s'", ctx->fs_root);
> -goto err;

Hmm, I'd prefer to keep the error_setg_errno() with fstatfs(), i.e.,
add an errp argument to this function.

> +return -1;
>  }
>  switch (stbuf.f_type) {
>  case EXT2_SUPER_MAGIC:
> @@ -1433,6 +1420,26 @@ static int local_init(FsContext *ctx, Error **errp)
>  break;
>  }
>  #endif
> +return 0;
> +}
> +
> +static int local_init(FsContext *ctx, Error **errp)
> +{
> +LocalData *data = g_malloc(sizeof(*data));
> +
> +data->mountfd = open(ctx->fs_root, O_DIRECTORY | O_RDONLY);
> +if (data->mountfd == -1) {
> +error_setg_errno(errp, errno, "failed to open '%s'", ctx->fs_root);
> +goto err;
> +}
> +
> +if (local_ioc_getversion_init(ctx, data) < 0) {
> +close_preserve_errno(data->mountfd);

And this could even be a plain close()

> +error_setg_errno(errp, errno,
> +"failed initialize ioc_getversion for file system at '%s'",

True, but I think "failed to stat file system" is more meaningful,
especially with the errno.

> +ctx->fs_root);
> +goto err;
> +}
>  
>  if (ctx->export_flags & V9FS_SM_PASSTHROUGH) {
>  ctx->xops = passthrough_xattr_ops;
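
The suggestion above (report the error right next to the failing fstatfs() by passing an error pointer down) could look roughly like the sketch below. It is self-contained, so it does not use QEMU's Error API: `SketchError`, `sketch_error_setg_errno()` and `ioc_getversion_init_sketch()` are hypothetical stand-ins, and the f_type switch from the patch is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/vfs.h>   /* fstatfs(), struct statfs (Linux) */

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct {
    char msg[256];
} SketchError;

/* Hypothetical stand-in for error_setg_errno(): message plus strerror text. */
static void sketch_error_setg_errno(SketchError *err, int os_errno,
                                    const char *what, const char *path)
{
    if (err) {
        snprintf(err->msg, sizeof(err->msg), "%s '%s': %s",
                 what, path, strerror(os_errno));
    }
}

/*
 * Returns 0 on success, -1 on failure. On failure the error is filled in
 * here, next to the call that failed, so the caller only has to clean up.
 */
static int ioc_getversion_init_sketch(int mountfd, const char *fs_root,
                                      SketchError *err)
{
    struct statfs stbuf;

    assert(fs_root != NULL);
    if (fstatfs(mountfd, &stbuf) < 0) {
        sketch_error_setg_errno(err, errno,
                                "failed to stat file system at", fs_root);
        return -1;
    }
    /* The switch on stbuf.f_type from the patch would go here. */
    return 0;
}
```

With this shape the caller no longer depends on errno surviving until it builds the message, which is why a plain close() becomes sufficient in the error path.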




Re: [Qemu-devel] [PATCH 2/2] backup: Use copy offloading

2018-06-01 Thread Stefan Hajnoczi
On Thu, May 31, 2018 at 10:34:45AM +0800, Fam Zheng wrote:
> The implementation is similar to the 'qemu-img convert'. In the
> beginning of the job, offloaded copy is attempted. If it fails, further
> I/O will go through the existing bounce buffer code path.
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/backup.c | 93 
> +++---
>  block/trace-events |  1 +
>  2 files changed, 62 insertions(+), 32 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 4e228e959b..ab189693f4 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -45,6 +45,8 @@ typedef struct BackupBlockJob {
>  QLIST_HEAD(, CowRequest) inflight_reqs;
>  
>  HBitmap *copy_bitmap;
> +bool use_copy_range;
> +int64_t copy_range_size;
>  } BackupBlockJob;
>  
>  static const BlockJobDriver backup_job_driver;
> @@ -111,49 +113,70 @@ static int coroutine_fn backup_do_cow(BackupBlockJob 
> *job,
>  cow_request_begin(&cow_request, job, start, end);
>  
>  for (; start < end; start += job->cluster_size) {
> +retry:

This for loop is becoming complex.  Please introduce helper functions.
The loop body can be replaced with something like this:

  if (!hbitmap_get(job->copy_bitmap, start / job->cluster_size)) {
      trace_backup_do_cow_skip(job, start);
      continue; /* already copied */
  }

  trace_backup_do_cow_process(job, start);

  ret = -ENOTSUP;
  if (job->use_copy_range) {
      ret = cow_with_offload(...);
  }
  if (ret < 0) {
      job->use_copy_range = false;
      ret = cow_with_bounce_buffer(...);
  }
  if (ret < 0) {
      trace_backup_do_cow_write_fail(job, start, ret);
      goto out;
  }




Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support

2018-06-01 Thread Peter Xu
On Fri, Jun 01, 2018 at 03:29:45PM +0800, Wei Wang wrote:
> On 06/01/2018 01:07 PM, Peter Xu wrote:
> > On Fri, Jun 01, 2018 at 12:58:24PM +0800, Peter Xu wrote:
> > > On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
> > > > > This is the device part implementation to add a new feature,
> > > > VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> > > > receives the guest free page hints from the driver and clears the
> > > > corresponding bits in the dirty bitmap, so that those free pages are
> > > > not transferred by the migration thread to the destination.
> > > > 
> > > > - Test Environment
> > > >  Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > > >  Guest: 8G RAM, 4 vCPU
> > > > >  Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
> > > > > 
> > > > > - Test Results
> > > > >  - Idle Guest Live Migration Time (results are averaged over 10 runs):
> > > > >  - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
> > > > >  - Guest with Linux Compilation Workload (make bzImage -j4):
> > > > >  - Live Migration Time (average)
> > > > >Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
> > > >  - Linux Compilation Time
> > > >Optimization v.s. Legacy = 4min56s v.s. 5min3s
> > > >--> no obvious difference
> > > > 
> > > > - Source Code
> > > >  - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
> > > >  - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
> > > Hi, Wei,
> > > 
> > > I have a very high-level question to the series.
> > > 
> > > IIUC the core idea for this series is that we can avoid sending some
> > > of the pages if we know that we don't need to send them.  I think this
> > > is based on the fact that on the destination side all the pages are by
> > > default zero after they are malloced.  While before this series, IIUC
> > > any migration will send every single page to destination, no matter
> > > whether it's zeroed or not.  So I'm uncertain about whether this will
> > > affect the received bitmap on the destination side.  Say, before this
> > > series, the received bitmap will directly cover the whole RAM bitmap
> > > > after migration is finished, now it won't.  Will there be any side
> > > effect?  I don't see obvious issue now, but just raise this question
> > > up.
> > > 
> > > > Meanwhile, this reminds me of a funnier idea: whether we can
> > > > just avoid sending the zero pages directly from QEMU's perspective.
> > > > In other words, can we just do nothing if save_zero_page() detected
> > > > that the page is zero (I guess the is_zero_range() can be fast too,
> > > > but I don't know exactly how fast it is)?  And how would that
> > > > differ from this page hinting approach, in performance and in other
> > > > aspects?
> > I noticed a problem (after I wrote the above paragraph 5 minutes
> > ago...): when a page was valid and sent to the destination (with
> > non-zero data), however after a while that page was zeroed.  Then if
> > we don't send zero pages at all, we won't send the page after it's
> > zeroed.  Then on the destination side we'll have a stale non-zero
> > page.  Is my understanding correct?  Will that be a problem to this
> > series too where a valid page can be possibly freed and hinted?
> 
> I think that won't be an issue either for zero page optimization or this
> free page optimization.
> 
> For the zero page optimization, QEMU always sends compressed 0s to the
> destination. The zero page is detected at the time QEMU checks it (before
> sending the page). if it is a 0 page, QEMU compresses all 0s (actually just
> a flag) and send it.

What I meant is: can we avoid sending even that ZERO flag at all? :)

> 
> For the free page optimization, we skip free pages (could be thought of as 0
> pages in this context). The zero pages are detected at the time guest
> reports it QEMU. The page won't be reported if it is non-zero (i.e. used).

Sorry, I must not have explained myself well.  Let's assume the page
hint is used.  I meant this:

- start precopy, page P is non-zero (let's say, page has content P1,
  which is non-zero)
- we send page P with content P1 on src, then latest destination cache
  of page P is P1
- page P is freed by the guest, then it becomes zero, dirty bitmap of
  P is set since it's changed (from P1 to zeroed page)
- page P is provided as hint that we can skip it since it's zeroed,
  then the dirty bit of P is cleared
- ... (page P is never used until migration completes)

After migration completes, page P should be a zeroed page on the
source, while IIUC on the destination side it still holds the stale data
P1.  Did I miss anything important?

Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v2 07/20] 9p: Move a couple xattr functions to 9p-util

2018-06-01 Thread Greg Kurz
On Thu, 31 May 2018 21:26:02 -0400
Keno Fischer  wrote:

> These functions will need custom implementations on Darwin. Since the
> implementation is very similar among all of them, and 9p-util already
> has the _nofollow version of fgetxattrat, let's move them all there.
> 
> Signed-off-by: Keno Fischer 
> ---
> 

This cleanup makes sense irrespective of the rest of the series.

Applied to 9p-next.

Thanks!

> Changes since v1:
>  * fgetxattr_follow is dropped in favor of a different approach
>later in the series.
> 
>  hw/9pfs/9p-util.c  | 33 +
>  hw/9pfs/9p-util.h  |  4 
>  hw/9pfs/9p-xattr.c | 33 -
>  3 files changed, 37 insertions(+), 33 deletions(-)
> 
> diff --git a/hw/9pfs/9p-util.c b/hw/9pfs/9p-util.c
> index f709c27..614b7fc 100644
> --- a/hw/9pfs/9p-util.c
> +++ b/hw/9pfs/9p-util.c
> @@ -24,3 +24,36 @@ ssize_t fgetxattrat_nofollow(int dirfd, const char 
> *filename, const char *name,
>  g_free(proc_path);
>  return ret;
>  }
> +
> +ssize_t flistxattrat_nofollow(int dirfd, const char *filename,
> +  char *list, size_t size)
> +{
> +char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, 
> filename);
> +int ret;
> +
> +ret = llistxattr(proc_path, list, size);
> +g_free(proc_path);
> +return ret;
> +}
> +
> +ssize_t fremovexattrat_nofollow(int dirfd, const char *filename,
> +const char *name)
> +{
> +char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, 
> filename);
> +int ret;
> +
> +ret = lremovexattr(proc_path, name);
> +g_free(proc_path);
> +return ret;
> +}
> +
> +int fsetxattrat_nofollow(int dirfd, const char *filename, const char *name,
> + void *value, size_t size, int flags)
> +{
> +char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, 
> filename);
> +int ret;
> +
> +ret = lsetxattr(proc_path, name, value, size, flags);
> +g_free(proc_path);
> +return ret;
> +}
> diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h
> index dc0d2e2..79ed6b2 100644
> --- a/hw/9pfs/9p-util.h
> +++ b/hw/9pfs/9p-util.h
> @@ -60,5 +60,9 @@ ssize_t fgetxattrat_nofollow(int dirfd, const char *path, 
> const char *name,
>   void *value, size_t size);
>  int fsetxattrat_nofollow(int dirfd, const char *path, const char *name,
>   void *value, size_t size, int flags);
> +ssize_t flistxattrat_nofollow(int dirfd, const char *filename,
> +  char *list, size_t size);
> +ssize_t fremovexattrat_nofollow(int dirfd, const char *filename,
> +const char *name);
>  
>  #endif
> diff --git a/hw/9pfs/9p-xattr.c b/hw/9pfs/9p-xattr.c
> index d05c1a1..c696d8f 100644
> --- a/hw/9pfs/9p-xattr.c
> +++ b/hw/9pfs/9p-xattr.c
> @@ -60,17 +60,6 @@ ssize_t pt_listxattr(FsContext *ctx, const char *path,
>  return name_size;
>  }
>  
> -static ssize_t flistxattrat_nofollow(int dirfd, const char *filename,
> - char *list, size_t size)
> -{
> -char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, 
> filename);
> -int ret;
> -
> -ret = llistxattr(proc_path, list, size);
> -g_free(proc_path);
> -return ret;
> -}
> -
>  /*
>   * Get the list and pass to each layer to find out whether
>   * to send the data or not
> @@ -196,17 +185,6 @@ ssize_t pt_getxattr(FsContext *ctx, const char *path, 
> const char *name,
>  return local_getxattr_nofollow(ctx, path, name, value, size);
>  }
>  
> -int fsetxattrat_nofollow(int dirfd, const char *filename, const char *name,
> - void *value, size_t size, int flags)
> -{
> -char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, 
> filename);
> -int ret;
> -
> -ret = lsetxattr(proc_path, name, value, size, flags);
> -g_free(proc_path);
> -return ret;
> -}
> -
>  ssize_t local_setxattr_nofollow(FsContext *ctx, const char *path,
>  const char *name, void *value, size_t size,
>  int flags)
> @@ -235,17 +213,6 @@ int pt_setxattr(FsContext *ctx, const char *path, const 
> char *name, void *value,
>  return local_setxattr_nofollow(ctx, path, name, value, size, flags);
>  }
>  
> -static ssize_t fremovexattrat_nofollow(int dirfd, const char *filename,
> -   const char *name)
> -{
> -char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, 
> filename);
> -int ret;
> -
> -ret = lremovexattr(proc_path, name);
> -g_free(proc_path);
> -return ret;
> -}
> -
>  ssize_t local_removexattr_nofollow(FsContext *ctx, const char *path,
> const char *name)
>  {
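
All of the helpers moved by this patch share one trick: build a "/proc/self/fd/<dirfd>/<filename>" path and call the l*xattr() variant on it. A standalone sketch of that trick follows, using plain libc instead of glib so it compiles on its own. `proc_fd_path()` and `list_xattrs_nofollow()` are hypothetical names, and unlike the patch the sketch keeps the return value in an ssize_t and preserves errno across free().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>   /* llistxattr() (Linux) */

/*
 * Build "/proc/self/fd/<dirfd>/<filename>". The caller frees the result.
 * Returns NULL on failure.
 */
static char *proc_fd_path(int dirfd, const char *filename)
{
    assert(filename != NULL);
    int n = snprintf(NULL, 0, "/proc/self/fd/%d/%s", dirfd, filename);
    if (n < 0) {
        return NULL;
    }
    char *path = malloc((size_t)n + 1);
    if (path) {
        snprintf(path, (size_t)n + 1, "/proc/self/fd/%d/%s", dirfd, filename);
    }
    return path;
}

/* List xattrs of dirfd/filename without following a final symlink. */
static ssize_t list_xattrs_nofollow(int dirfd, const char *filename,
                                    char *list, size_t size)
{
    char *path = proc_fd_path(dirfd, filename);
    if (!path) {
        errno = ENOMEM;
        return -1;
    }
    ssize_t ret = llistxattr(path, list, size);
    int saved_errno = errno;   /* keep the syscall's errno across free() */
    free(path);
    errno = saved_errno;
    return ret;
}
```

The dependency on /proc is what makes these Linux-specific, which matches the commit message's point that Darwin will need its own implementations.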




Re: [Qemu-devel] [PATCH v7 3/5] migration: API to clear bits of guest free pages from the dirty bitmap

2018-06-01 Thread Peter Xu
On Fri, Jun 01, 2018 at 03:36:01PM +0800, Wei Wang wrote:
> On 06/01/2018 12:00 PM, Peter Xu wrote:
> > On Tue, Apr 24, 2018 at 02:13:46PM +0800, Wei Wang wrote:
> > > This patch adds an API to clear bits corresponding to guest free pages
> > > > from the dirty bitmap. Split the free page block if it crosses the QEMU
> > > RAMBlock boundary.
> > > 
> > > Signed-off-by: Wei Wang 
> > > CC: Dr. David Alan Gilbert 
> > > CC: Juan Quintela 
> > > CC: Michael S. Tsirkin 
> > > ---
> > >   include/migration/misc.h |  2 ++
> > >   migration/ram.c  | 44 
> > > 
> > >   2 files changed, 46 insertions(+)
> > > 
> > > diff --git a/include/migration/misc.h b/include/migration/misc.h
> > > index 4ebf24c..113320e 100644
> > > --- a/include/migration/misc.h
> > > +++ b/include/migration/misc.h
> > > @@ -14,11 +14,13 @@
> > >   #ifndef MIGRATION_MISC_H
> > >   #define MIGRATION_MISC_H
> > > +#include "exec/cpu-common.h"
> > >   #include "qemu/notify.h"
> > >   /* migration/ram.c */
> > >   void ram_mig_init(void);
> > > +void qemu_guest_free_page_hint(void *addr, size_t len);
> > >   /* migration/block.c */
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 9a72b1a..0147548 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -2198,6 +2198,50 @@ static int ram_init_all(RAMState **rsp)
> > >   }
> > >   /*
> > > + * This function clears bits of the free pages reported by the caller 
> > > from the
> > > + * migration dirty bitmap. @addr is the host address corresponding to the
> > > + * start of the continuous guest free pages, and @len is the total bytes 
> > > of
> > > + * those pages.
> > > + */
> > > +void qemu_guest_free_page_hint(void *addr, size_t len)
> > > +{
> > > +RAMBlock *block;
> > > +ram_addr_t offset;
> > > +size_t used_len, start, npages;
> > Do we need to check here on whether a migration is in progress?  Since
> > if not I'm not sure whether this hint still makes any sense any more,
> > and more importantly it seems to me that block->bmap below at [1] is
> > only valid during a migration.  So I'm not sure whether QEMU will
> > crash if this function is called without a running migration.
> 
> OK. How about just adding comments above to have users noted that this
> function should be used during migration?
> 
> If we want to do a sanity check here, I think it would be easier to just
> check !block->bmap here.

I think checking against the migration state might be the faster
way.

> 
> 
> > 
> > > +
> > > +for (; len > 0; len -= used_len) {
> > > +block = qemu_ram_block_from_host(addr, false, &offset);
> > > +if (unlikely(!block)) {
> > > +return;
> > We should never reach here, should we?  Assuming the callers of this
> > function should always pass in a correct host address. If we are very
> > sure that the host addr should be valid, could we just assert?
> 
> Probably not the case, because of the corner case that the memory would be
> hot unplugged after the free page is reported to QEMU.

Question: do we allow memory hot plug/unplug during
migration?

> 
> 
> 
> > 
> > > +}
> > > +
> > > +/*
> > > + * This handles the case that the RAMBlock is resized after the 
> > > free
> > > + * page hint is reported.
> > > + */
> > > +if (unlikely(offset > block->used_length)) {
> > > +return;
> > > +}
> > > +
> > > +if (len <= block->used_length - offset) {
> > > +used_len = len;
> > > +} else {
> > > +used_len = block->used_length - offset;
> > > +addr += used_len;
> > > +}
> > > +
> > > +start = offset >> TARGET_PAGE_BITS;
> > > +npages = used_len >> TARGET_PAGE_BITS;
> > > +
> > > +qemu_mutex_lock(&ram_state->bitmap_mutex);
> > So now I think I understand the lock can still be meaningful since
> > this function now can be called outside the migration thread (e.g., in
> > vcpu thread).  But still it would be nice to mention it somewhere on

(Actually, after reading the next patch I think it's in the iothread, so
 I'd better read the whole series before replying next time :)

> > the truth of the lock.
> > 
> 
> Yes. Thanks for the reminder. I will add some explanation to the patch 2
> commit log.

Thanks,

-- 
Peter Xu
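
To make the points discussed in this review concrete (clamping a hinted range to the block's used length, and refusing to touch a bitmap that does not exist outside migration), here is a reduced sketch. It is not the patch's code: `Block`, `clear_free_range()` and the one-byte-per-page "bitmap" are simplifications invented for the example, 4 KiB pages are assumed, and the bounds check uses >= where the patch uses >.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 12   /* assumed 4 KiB target pages */

typedef struct {
    uint8_t *bmap;        /* one byte per page; NULL when no migration runs */
    size_t used_length;   /* bytes of the block currently in use */
} Block;

/*
 * Clear dirty flags for [offset, offset + len) within one block.
 * Returns the number of bytes covered, which is less than len when the
 * range runs past the end of the block, or 0 if nothing was done.
 */
static size_t clear_free_range(Block *block, size_t offset, size_t len)
{
    assert(block != NULL);
    /* Sanity checks: no bitmap (no migration) or a stale offset. */
    if (!block->bmap || offset >= block->used_length) {
        return 0;
    }

    size_t used_len = len;
    if (used_len > block->used_length - offset) {
        used_len = block->used_length - offset;   /* split at block end */
    }

    size_t start = offset >> PAGE_BITS;
    size_t npages = used_len >> PAGE_BITS;
    for (size_t i = 0; i < npages; i++) {
        block->bmap[start + i] = 0;
    }
    return used_len;
}
```

A caller would loop, advancing by the returned length and looking up the next block, until the whole hinted range is consumed or a call returns 0.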



Re: [Qemu-devel] [PATCH v2 08/20] 9p: Rename 9p-util -> 9p-util-linux

2018-06-01 Thread Greg Kurz
On Thu, 31 May 2018 21:26:03 -0400
Keno Fischer  wrote:

> The current file only has the Linux versions of these functions.
> Rename the file accordingly and update the Makefile to only build
> it on Linux. A Darwin version of these will follow later in the
> series.
> 
> Signed-off-by: Keno Fischer 
> ---
> 

Reviewed-by: Greg Kurz 

> Changes since v1: New patch
> 
>  hw/9pfs/9p-util-linux.c | 59 
> +
>  hw/9pfs/9p-util.c   | 59 
> -
>  hw/9pfs/Makefile.objs   |  3 ++-
>  3 files changed, 61 insertions(+), 60 deletions(-)
>  create mode 100644 hw/9pfs/9p-util-linux.c
>  delete mode 100644 hw/9pfs/9p-util.c
> 
> diff --git a/hw/9pfs/9p-util-linux.c b/hw/9pfs/9p-util-linux.c
> new file mode 100644
> index 000..defa3a4
> --- /dev/null
> +++ b/hw/9pfs/9p-util-linux.c
> @@ -0,0 +1,59 @@
> +/*
> + * 9p utilities (Linux Implementation)
> + *
> + * Copyright IBM, Corp. 2017
> + *
> + * Authors:
> + *  Greg Kurz 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/xattr.h"
> +#include "9p-util.h"
> +
> +ssize_t fgetxattrat_nofollow(int dirfd, const char *filename, const char 
> *name,
> + void *value, size_t size)
> +{
> +char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, 
> filename);
> +int ret;
> +
> +ret = lgetxattr(proc_path, name, value, size);
> +g_free(proc_path);
> +return ret;
> +}
> +
> +ssize_t flistxattrat_nofollow(int dirfd, const char *filename,
> +  char *list, size_t size)
> +{
> +char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, 
> filename);
> +int ret;
> +
> +ret = llistxattr(proc_path, list, size);
> +g_free(proc_path);
> +return ret;
> +}
> +
> +ssize_t fremovexattrat_nofollow(int dirfd, const char *filename,
> +const char *name)
> +{
> +char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, 
> filename);
> +int ret;
> +
> +ret = lremovexattr(proc_path, name);
> +g_free(proc_path);
> +return ret;
> +}
> +
> +int fsetxattrat_nofollow(int dirfd, const char *filename, const char *name,
> + void *value, size_t size, int flags)
> +{
> +char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, 
> filename);
> +int ret;
> +
> +ret = lsetxattr(proc_path, name, value, size, flags);
> +g_free(proc_path);
> +return ret;
> +}
> diff --git a/hw/9pfs/9p-util.c b/hw/9pfs/9p-util.c
> deleted file mode 100644
> index 614b7fc..000
> --- a/hw/9pfs/9p-util.c
> +++ /dev/null
> @@ -1,59 +0,0 @@
> -/*
> - * 9p utilities
> - *
> - * Copyright IBM, Corp. 2017
> - *
> - * Authors:
> - *  Greg Kurz 
> - *
> - * This work is licensed under the terms of the GNU GPL, version 2 or later.
> - * See the COPYING file in the top-level directory.
> - */
> -
> -#include "qemu/osdep.h"
> -#include "qemu/xattr.h"
> -#include "9p-util.h"
> -
> -ssize_t fgetxattrat_nofollow(int dirfd, const char *filename, const char 
> *name,
> - void *value, size_t size)
> -{
> -char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, 
> filename);
> -int ret;
> -
> -ret = lgetxattr(proc_path, name, value, size);
> -g_free(proc_path);
> -return ret;
> -}
> -
> -ssize_t flistxattrat_nofollow(int dirfd, const char *filename,
> -  char *list, size_t size)
> -{
> -char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, 
> filename);
> -int ret;
> -
> -ret = llistxattr(proc_path, list, size);
> -g_free(proc_path);
> -return ret;
> -}
> -
> -ssize_t fremovexattrat_nofollow(int dirfd, const char *filename,
> -const char *name)
> -{
> -char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, 
> filename);
> -int ret;
> -
> -ret = lremovexattr(proc_path, name);
> -g_free(proc_path);
> -return ret;
> -}
> -
> -int fsetxattrat_nofollow(int dirfd, const char *filename, const char *name,
> - void *value, size_t size, int flags)
> -{
> -char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, 
> filename);
> -int ret;
> -
> -ret = lsetxattr(proc_path, name, value, size, flags);
> -g_free(proc_path);
> -return ret;
> -}
> diff --git a/hw/9pfs/Makefile.objs b/hw/9pfs/Makefile.objs
> index fd90b62..083508f 100644
> --- a/hw/9pfs/Makefile.objs
> +++ b/hw/9pfs/Makefile.objs
> @@ -1,4 +1,5 @@
> -common-obj-y  = 9p.o 9p-util.o
> +common-obj-y  = 9p.o
> +common-obj-$(CONFIG_LINUX) += 9p-util-linux.o
>  common-obj-y += 9p-local.o 9p-xattr.o
>  common-obj-y += 9p-xattr-user.o 9p-posix-acl.o
>  common-obj-y += coth.o cofs.o codir.o cofile.o




Re: [Qemu-devel] [PATCH v2 09/20] 9p: Properly check/translate flags in unlinkat

2018-06-01 Thread Greg Kurz
On Thu, 31 May 2018 21:26:04 -0400
Keno Fischer  wrote:

> This code previously relied on P9_DOTL_AT_REMOVEDIR and AT_REMOVEDIR
> having the same numerical value and deferred any error checking to the
> syscall itself. However, while the former assumption is true on Linux,
> it is not true in general. Thus, add appropriate error checking and
> translation to the 9p unlinkat server code.
> 
> Signed-off-by: Keno Fischer 
> ---
> 

Looks good but handle_unlinkat() needs to be adapted to this change.
Other backends (proxy and synth) seem to ignore the flags.

> Changes since v1:
>  * Code was moved from 9p-local.c to server entry point in 9p.c
> 
>  hw/9pfs/9p.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index b80db65..a757374 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -2522,7 +2522,7 @@ static void coroutine_fn v9fs_unlinkat(void *opaque)
>  {
>  int err = 0;
>  V9fsString name;
> -int32_t dfid, flags;
> +int32_t dfid, flags, rflags = 0;
>  size_t offset = 7;
>  V9fsPath path;
>  V9fsFidState *dfidp;
> @@ -2549,6 +2549,15 @@ static void coroutine_fn v9fs_unlinkat(void *opaque)
>  goto out_nofid;
>  }
>  
> +if (flags & ~P9_DOTL_AT_REMOVEDIR) {
> +err = -EINVAL;
> +goto out_nofid;
> +}
> +
> +if (flags & P9_DOTL_AT_REMOVEDIR) {
> +rflags |= AT_REMOVEDIR;
> +}
> +
>  dfidp = get_fid(pdu, dfid);
>  if (dfidp == NULL) {
>  err = -EINVAL;
> @@ -2567,7 +2576,7 @@ static void coroutine_fn v9fs_unlinkat(void *opaque)
>  if (err < 0) {
>  goto out_err;
>  }
> -err = v9fs_co_unlinkat(pdu, &dfidp->path, &name, flags);
> +err = v9fs_co_unlinkat(pdu, &dfidp->path, &name, rflags);
>  if (!err) {
>  err = offset;
>  }
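
For readers who want to try the check-and-translate step from the hunks above outside of QEMU, here is a minimal standalone sketch. It is not the patch itself: the helper name p9_unlinkat_flags_to_host() is made up for illustration, and the 0x200 value for P9_DOTL_AT_REMOVEDIR is an assumption about the wire protocol. Only AT_REMOVEDIR comes from the host's <fcntl.h>.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>    /* AT_REMOVEDIR (host value, may differ per OS) */
#include <stdint.h>

/* Assumed wire value of the 9P2000.L flag; defined by the protocol, not the host. */
#define P9_DOTL_AT_REMOVEDIR 0x200

/*
 * Hypothetical helper: translate 9p unlinkat flags to host unlinkat(2) flags.
 * Returns 0 and stores the host flags in *host_flags on success,
 * or -EINVAL if the client sent a bit this server does not know.
 */
static int p9_unlinkat_flags_to_host(int32_t p9_flags, int *host_flags)
{
    int rflags = 0;

    /* Reject unknown bits instead of passing them to the syscall. */
    if (p9_flags & ~P9_DOTL_AT_REMOVEDIR) {
        return -EINVAL;
    }
    /* Map the protocol bit to whatever value the host uses. */
    if (p9_flags & P9_DOTL_AT_REMOVEDIR) {
        rflags |= AT_REMOVEDIR;
    }
    *host_flags = rflags;
    return 0;
}
```

On Linux the two constants happen to be equal, so the translation is an identity there; the point of the explicit mapping is that the server no longer depends on that coincidence.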




[Qemu-devel] virtio-vsock feature has no TCG (non-KVM) support

2018-06-01 Thread Artem Pisarenko
Please add an important note to the https://wiki.qemu.org/Features/VirtioVsock
page saying that this feature is only supported in KVM-accelerated mode. This
is not obvious. Furthermore, QEMU does not check for it when invoked with
"-device vhost-vsock-pci,...", so the user only finds out when an application
communicating via AF_VSOCK fails to connect() with a confusing
"Connection timed out" error.
-- 

Best regards,
  Artem Pisarenko


Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-01 Thread Igor Mammedov
On Fri, 1 Jun 2018 08:17:12 +
xuyandong  wrote:

> Hi there,
> 
> I am doing some tests on qemu vcpu hotplug and ran into some trouble.
> An emulation failure occurs and qemu prints the following msg:
> 
> KVM internal error. Suberror: 1
> emulation failure
> EAX= EBX= ECX= EDX=0600
> ESI= EDI= EBP= ESP=fff8
> EIP=ff53 EFL=00010082 [--S] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =   9300
> CS =f000 000f  9b00
> SS =   9300
> DS =   9300
> FS =   9300
> GS =   9300
> LDT=   8200
> TR =   8b00if
> GDT=  
> IDT=  
> CR0=6010 CR2= CR3= CR4=
> DR0= DR1= DR2= 
> DR3=
> DR6=0ff0 DR7=0400
> EFER=
> Code=31 d2 eb 04 66 83 ca ff 66 89 d0 66 5b 66 c3 66 89 d0 66 c3  66 68 
> 21 8a 00 00 e9 08 d7 66 56 66 53 66 83 ec 0c 66 89 c3 66 e8 ce 7b ff ff 66 89 
> c6
> 
> I notice that the guest is still running SeaBIOS in real mode when the vcpu
> has just been plugged.
> This emulation failure can be reproduced reliably if I hotplug a vcpu
> during the VM launch process.
> After some digging, I found that this KVM internal error shows up because KVM
> cannot emulate some MMIO (gpa 0xfff53).
> 
> So I am confused,
> (1) Does qemu support vcpu hotplug even if the guest is still running SeaBIOS?
There is no code that forbids it, and I would expect it to be a NOP
rather than trigger an error.

> (2) The gpa (0xfff53) is an address in the BIOS ROM section, so why does KVM
> incorrectly treat it as an MMIO address?
A KVM trace and the BIOS debug log might give more information about where to
look. Even better, if you can do it, would be to debug SeaBIOS and find out
what exactly goes wrong.




Re: [Qemu-devel] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers

2018-06-01 Thread Igor Mammedov
On Thu, 17 May 2018 10:15:19 +0200
David Hildenbrand  wrote:

maybe subj: make hotplug handlers use local_error
> For multi stage hotplug handlers, we'll have to do some error handling
> in some hotplug functions, so let's use a local error variable (except
> for unplug requests).
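
As background for the local error variable mentioned above, the following is a minimal standalone sketch of the pattern. It deliberately does not use QEMU's Error API: DemoError, demo_error_propagate(), demo_check_node() and demo_plug() are hypothetical stand-ins. The point it illustrates is that a handler which needs to test for failure between steps cannot rely on *errp, because the caller may pass errp == NULL.

```c
#include <stddef.h>

/* Hypothetical stand-in for an error object; not QEMU's Error type. */
typedef struct DemoError {
    const char *msg;
} DemoError;

/* Hand a local error back to the caller, if there is one and the caller wants it. */
static void demo_error_propagate(DemoError **errp, DemoError *local_err)
{
    if (errp && local_err) {
        *errp = local_err;
    }
}

/* A step that fails when 'node' is out of range (the limit 4 is arbitrary). */
static void demo_check_node(int node, DemoError **errp)
{
    static DemoError invalid = { "Invalid node" };

    if (node < 0 || node >= 4) {
        demo_error_propagate(errp, &invalid);
    }
}

/*
 * Multi-step handler. It checks local_err, never *errp, so it behaves the
 * same whether or not the caller supplied an error pointer.
 * Returns the number of steps that ran.
 */
static int demo_plug(int node, DemoError **errp)
{
    DemoError *local_err = NULL;
    int steps = 0;

    demo_check_node(node, &local_err);
    steps++;
    if (local_err) {
        goto out;
    }
    steps++;    /* later stages run only after earlier ones succeeded */
out:
    demo_error_propagate(errp, local_err);
    return steps;
}
```

With this shape, demo_plug(9, NULL) stops after the first step without dereferencing a NULL pointer, which is the property the "if (*errp)" style cannot guarantee.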


> 
> Also, add code to pass control to the final stage hotplug handler at the
> parent bus.
Doing several unrelated things in one patch doesn't help with reviewing it.
Also, as explained in 04/14, it's not needed at all.
Could you try to keep the patches minimal?
We can add more complexity in later revisions if it's really necessary.

 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/ppc/spapr.c | 54 +++---
>  1 file changed, 43 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ebf30dd60b..b7c5c95f7a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3571,27 +3571,48 @@ static void spapr_machine_device_plug(HotplugHandler 
> *hotplug_dev,
>  {
>  MachineState *ms = MACHINE(hotplug_dev);
>  sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
> +Error *local_err = NULL;
>  
> +/* final stage hotplug handler */
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>  int node;
>  
>  if (!smc->dr_lmb_enabled) {
> -error_setg(errp, "Memory hotplug not supported for this 
> machine");
> -return;
> +error_setg(&local_err,
> +   "Memory hotplug not supported for this machine");
> +goto out;
>  }
> -node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, 
> errp);
> -if (*errp) {
> -return;
> +node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
> +&local_err);
> +if (local_err) {
> +goto out;
>  }
>  if (node < 0 || node >= MAX_NODES) {
> -error_setg(errp, "Invaild node %d", node);
> -return;
> +error_setg(&local_err, "Invaild node %d", node);
> +goto out;
>  }
>  
> -spapr_memory_plug(hotplug_dev, dev, node, errp);
> +spapr_memory_plug(hotplug_dev, dev, node, &local_err);
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> -spapr_core_plug(hotplug_dev, dev, errp);
> +spapr_core_plug(hotplug_dev, dev, &local_err);
> +} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, 
> &local_err);
> +}
> +out:
> +error_propagate(errp, local_err);
> +}
> +
> +static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
> +DeviceState *dev, Error **errp)
> +{
> +Error *local_err = NULL;
> +
> +/* final stage hotplug handler */
> +if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
> +   &local_err);
>  }
> +error_propagate(errp, local_err);
>  }
>  
>  static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
> @@ -3618,17 +3639,27 @@ static void 
> spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
>  return;
>  }
>  spapr_core_unplug_request(hotplug_dev, dev, errp);
> +} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +hotplug_handler_unplug_request(dev->parent_bus->hotplug_handler, dev,
> +   errp);
>  }
>  }
>  
>  static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
>DeviceState *dev, Error **errp)
>  {
> +Error *local_err = NULL;
> +
> +/* final stage hotplug handler */
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> -spapr_memory_pre_plug(hotplug_dev, dev, errp);
> +spapr_memory_pre_plug(hotplug_dev, dev, &local_err);
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> -spapr_core_pre_plug(hotplug_dev, dev, errp);
> +spapr_core_pre_plug(hotplug_dev, dev, &local_err);
> +} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
> + &local_err);
>  }
> +error_propagate(errp, local_err);
>  }
>  
>  static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
> @@ -3988,6 +4019,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
> void *data)
>  mc->get_default_cpu_node_id = spapr_get_default_cpu_node_id;
>  mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
>  hc->unplug_request = spapr_machine_device_unplug_request;
> +hc->unplug = spapr_machine_device_unplug;
>  
>  smc->dr_lmb_enabled = true;
>  mc->d

Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support

2018-06-01 Thread Peter Xu
On Fri, Jun 01, 2018 at 03:21:54PM +0800, Wei Wang wrote:
> On 06/01/2018 12:58 PM, Peter Xu wrote:
> > On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
> > > This is the device part implementation to add a new feature,
> > > VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> > > receives the guest free page hints from the driver and clears the
> > > corresponding bits in the dirty bitmap, so that those free pages are
> > > not transferred by the migration thread to the destination.
> > > 
> > > - Test Environment
> > >  Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > >  Guest: 8G RAM, 4 vCPU
> > >  Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 
> > > second
> > > 
> > > - Test Results
> > >  - Idle Guest Live Migration Time (results are averaged over 10 runs):
> > >  - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
> > >  - Guest with Linux Compilation Workload (make bzImage -j4):
> > >  - Live Migration Time (average)
> > >Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% 
> > > reduction
> > >  - Linux Compilation Time
> > >Optimization v.s. Legacy = 4min56s v.s. 5min3s
> > >--> no obvious difference
> > > 
> > > - Source Code
> > >  - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
> > >  - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
> > Hi, Wei,
> > 
> > I have a very high-level question to the series.
> 
> Hi Peter,
> 
> Thanks for joining the discussion :)

Thanks for letting me know about this thread.  It's an interesting idea. :)

> 
> > 
> > IIUC the core idea for this series is that we can avoid sending some
> > of the pages if we know that we don't need to send them.  I think this
> > is based on the fact that on the destination side all the pages are by
> > default zero after they are malloced.  While before this series, IIUC
> > any migration will send every single page to destination, no matter
> > whether it's zeroed or not.  So I'm uncertain about whether this will
> > affect the received bitmap on the destination side.  Say, before this
> > series, the received bitmap will directly cover the whole RAM bitmap
> > after migration is finished, now it won't.  Will there be any side
> > effect?  I don't see obvious issue now, but just raise this question
> > up.
> 
> This feature currently only supports pre-copy (I think the received bitmap
> is something that matters to postcopy only).
> That's why we have
> rs->free_page_support = ..&& !migrate_postcopy();

Okay.

> 
> > Meanwhile, this reminds me about a more funny idea: whether we can
> > just avoid sending the zero pages directly from QEMU's perspective.
> > In other words, can we just do nothing if save_zero_page() detected
> > that the page is zero (I guess the is_zero_range() can be fast too,
> > but I don't know exactly how fast it is)?  And how would that
> > differ from this page hinting approach in performance and other
> > aspects?
> 
> I guess you referred to the zero page optimization. I think the major
> overhead comes from the zero page checking - lots of memory accesses, which
> also waste memory bandwidth. Please see the results attached in the cover
> letter. The legacy case already includes the zero page optimization.

I replied in the other thread.  We can discuss it all there.

Actually, on second thought, I think what I worried about there is
exactly the reason why we must send the zero page flag: otherwise
there can be a stale non-zero page on the destination.  Here "zero page"
and "free page" are totally different ideas, since even if a page is
zeroed it might still be in use (not freed)!  For a "free page", on the
other hand, even if it's non-zero we might be able to skip sending it
entirely, though I am not sure whether that mismatch of data might cause
any side effects too. I think the corresponding question would be: if a
page is freed in the Linux kernel, does its data matter any more?
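
To make the "free page" versus "zero page" distinction concrete, here is a small standalone sketch of what a free page hint does to a dirty bitmap, as described in the cover letter. It is a toy model, not the series' code: DemoDirtyBitmap, the 128-page size and all function names are assumptions made for illustration. Note that the hint never looks at page contents; it only clears bits.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGES 128                  /* hypothetical guest size in pages */
#define DEMO_WORDS (DEMO_PAGES / 64)

/* One bit per guest page; a set bit means "still has to be sent". */
typedef struct DemoDirtyBitmap {
    uint64_t bits[DEMO_WORDS];
} DemoDirtyBitmap;

/* Start of migration: every page is marked as needing transfer. */
static void demo_bitmap_fill(DemoDirtyBitmap *bm)
{
    for (size_t i = 0; i < DEMO_WORDS; i++) {
        bm->bits[i] = UINT64_MAX;
    }
}

/* Apply a free page hint: clear the bits for pages [first, first + count). */
static void demo_bitmap_clear_range(DemoDirtyBitmap *bm, size_t first,
                                    size_t count)
{
    for (size_t page = first; page < first + count && page < DEMO_PAGES;
         page++) {
        bm->bits[page / 64] &= ~(UINT64_C(1) << (page % 64));
    }
}

/* Number of pages the migration loop would still transfer. */
static size_t demo_bitmap_count(const DemoDirtyBitmap *bm)
{
    size_t n = 0;

    for (size_t page = 0; page < DEMO_PAGES; page++) {
        n += (bm->bits[page / 64] >> (page % 64)) & 1;
    }
    return n;
}
```

A zero page check, by contrast, would have to read every byte of each page before deciding, which is the memory-access overhead mentioned earlier in the thread.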

Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [RFC 2/3] hw/char/nrf51_uart: Implement nRF51 SoC UART

2018-06-01 Thread Stefan Hajnoczi
On Thu, May 31, 2018 at 2:58 PM, sundeep subbaraya
 wrote:
> On Wed, May 30, 2018 at 3:33 AM, Julia Suvorova via Qemu-devel
>  wrote:
>> +static uint64_t uart_read(void *opaque, hwaddr addr, unsigned int size)
>> +{
>> +Nrf51UART *s = NRF51_UART(opaque);
>> +uint64_t r;
>> +
>> +switch (addr) {
>> +case A_RXD:
>> +r = s->rx_fifo[s->rx_fifo_pos];
>> +if (s->rx_fifo_len > 0) {
>> +s->rx_fifo_pos = (s->rx_fifo_pos + 1) % UART_FIFO_LENGTH;
>> +s->rx_fifo_len--;
>> +qemu_chr_fe_accept_input(&s->chr);
>> +}
>> +break;
>> +
>> +case A_INTENSET:
>> +case A_INTENCLR:
>> +case A_INTEN:
>> +r = s->reg[A_INTEN];
>> +break;
>> +default:
>> +r = s->reg[addr];
>
> You can use R_* macros for registers and access regs[ ] with addr/4 as index.
> It is better than using big regs[ ] array out of which most of
> locations go unused.

Good point.  The bug is more severe than an inefficiency.
s->reg[addr] allows out-of-bounds accesses.  This is a security bug.

The memory region is 0x1000 *bytes* long, but the array has 0x1000
32-bit *elements*.  A read from address 0xfffc results in a memory
load from s->reg + 0xfffc * sizeof(s->reg[0]).  That's beyond the end
of the array!

s->reg[A_*] should be changed to s->reg[R_*].  s->reg[addr] needs to
be s->reg[addr / sizeof(s->reg[0])].

It may be worth adding a warning to scripts/checkpatch.pl for
array[A_*] so this bug is reported automatically in the future.
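
A minimal standalone sketch of the suggested indexing follows. The names (DemoUART, DEMO_A_INTEN, DEMO_R_INTEN, demo_reg_read) and the 0x300 offset are placeholders chosen for illustration, not taken from the patch; the only idea carried over is that A_* values are byte offsets while the array is indexed in 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MMIO_SIZE 0x1000                 /* region length in bytes */
#define DEMO_A_INTEN   0x300                  /* hypothetical byte offset */
#define DEMO_R_INTEN   (DEMO_A_INTEN / 4)     /* matching word index */

/* The array holds exactly as many 32-bit words as the region has bytes / 4. */
typedef struct DemoUART {
    uint32_t reg[DEMO_MMIO_SIZE / sizeof(uint32_t)];
} DemoUART;

/* Convert a byte offset into an index into the 32-bit register array. */
static size_t demo_reg_index(uint64_t addr)
{
    return (size_t)(addr / sizeof(uint32_t));
}

/* Read a register; offsets outside the region read as zero. */
static uint32_t demo_reg_read(const DemoUART *s, uint64_t addr)
{
    if (addr >= DEMO_MMIO_SIZE) {
        return 0;
    }
    return s->reg[demo_reg_index(addr)];
}
```

Sized this way, the array occupies 0x1000 bytes instead of 0x1000 words, and the last valid offset 0xffc maps to the last element, index 0x3ff.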

Stefan



Re: [Qemu-devel] [RFC 2/3] hw/char/nrf51_uart: Implement nRF51 SoC UART

2018-06-01 Thread Stefan Hajnoczi
On Fri, Jun 1, 2018 at 11:41 AM, Stefan Hajnoczi  wrote:
> On Thu, May 31, 2018 at 2:58 PM, sundeep subbaraya
>  wrote:
>> On Wed, May 30, 2018 at 3:33 AM, Julia Suvorova via Qemu-devel
>>  wrote:
>>> +static uint64_t uart_read(void *opaque, hwaddr addr, unsigned int size)
>>> +{
>>> +Nrf51UART *s = NRF51_UART(opaque);
>>> +uint64_t r;
>>> +
>>> +switch (addr) {
>>> +case A_RXD:
>>> +r = s->rx_fifo[s->rx_fifo_pos];
>>> +if (s->rx_fifo_len > 0) {
>>> +s->rx_fifo_pos = (s->rx_fifo_pos + 1) % UART_FIFO_LENGTH;
>>> +s->rx_fifo_len--;
>>> +qemu_chr_fe_accept_input(&s->chr);
>>> +}
>>> +break;
>>> +
>>> +case A_INTENSET:
>>> +case A_INTENCLR:
>>> +case A_INTEN:
>>> +r = s->reg[A_INTEN];
>>> +break;
>>> +default:
>>> +r = s->reg[addr];
>>
>> You can use R_* macros for registers and access regs[ ] with addr/4 as index.
>> It is better than using big regs[ ] array out of which most of
>> locations go unused.
>
> Good point.  The bug is more severe than an inefficiency.
> s->reg[addr] allows out-of-bounds accesses.  This is a security bug.
>
> The memory region is 0x1000 *bytes* long, but the array has 0x1000
> 32-bit *elements*.  A read from address 0xfffc results in a memory
> load from s->reg + 0xfffc * sizeof(s->reg[0]).  That's beyond the end
> of the array!

Sorry, I was wrong.  The array is large enough after all.  It's just
an inefficiency, but still worth fixing.  Similar issues could lead to
out-of-bounds accesses.

Stefan



Re: [Qemu-devel] [RFC] Intermediate block mirroring

2018-06-01 Thread Alberto Garcia
On Thu 03 May 2018 02:22:41 PM CEST, Kevin Wolf wrote:
>> > Were the (more or less) exact requirements of QMP blockdev-reopen
>> > discussed? How is it different from qemu-io's "reopen" command?
>> > What are the options that you can and can not change?
>> 
>> I can't quite remember, I'm afraid.  I think it was supposed to be
>> pretty much qemu-io's reopen (so just bdrv_reopen()).  I suppose you
>> cannot change the driver (obviously) or probably the node name, because
>> either would result in the node being replaced by a completely new one.
>> 
>> Other than that, it probably depends on what the block driver supports,
>> but ideally you should be able to change everything.
>
> Honestly the design of bdrv_reopen() is quite broken because of the
> way it tries to maintain old options if they aren't specified, and
> guesses what you might mean when you add flags to the mix. The exact
> semantics are quite complicated and I'd rather avoid them in a stable
> API.
>
> A clean QMP command would probably apply the same defaults as
> blockdev-add, so you just get to specify the full options again.

I have a prototype of this working and almost ready to be published, but
there's a tricky thing with this part:

If we want blockdev-reopen to apply the defaults for all options except
for the ones explicitly specified by the user, then it means that we
need to check not just the options that are present, but also the ones
that are omitted.

For example:

   { "execute": "blockdev-add",
     "arguments": { "driver": "null-aio",
                    "node-name": "root",
                    "size": 1024 } }

This adds a null-aio block device with the "size" option set to 1024
(the default is 1 << 30).

null_reopen_prepare() allows reopening that block device, but it does
not allow changing any of its options. Attempting to change the value of
"size" is detected by the loop that checks unhandled options at the end
of bdrv_reopen_prepare() and returns "Cannot change the option 'size'".

So far, so good. We have this generic check for all options that works
with all drivers, so as long as we only specify options that we know
can be changed, everything is fine.

However if we want blockdev-reopen to apply the default values for all
omitted options, then omitting "size" would be equivalent to setting it
to its default value (1 << 30). And if "size" cannot be changed then
QEMU should complain unless we explicitly set "size" to 1024 again on
reopen.

This complicates things a bit, because we would go from "the options
that can't be changed are the ones that are not handled by each driver's
_prepare() function" to "options that are absent can also produce an
error".
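
The difference can be sketched with a toy model. This is not the prototype's code: DemoOption and demo_reopen_prepare() are hypothetical names, and a single integer option stands in for the real option dictionaries. An omitted option is represented by a NULL pointer and resolves to the default, as it would with blockdev-add.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one driver option; not a QEMU data structure. */
typedef struct DemoOption {
    uint64_t current;      /* value the node was opened with */
    uint64_t def;          /* default applied when the option is omitted */
    bool changeable;       /* whether reopen may alter it */
} DemoOption;

/*
 * Reopen with blockdev-add-like semantics: an omitted option
 * (value == NULL) means "use the default". Returns 0 on success, or -1
 * if the resulting value would change an option that cannot be changed.
 */
static int demo_reopen_prepare(const DemoOption *opt, const uint64_t *value)
{
    uint64_t requested = value ? *value : opt->def;

    if (requested != opt->current && !opt->changeable) {
        return -1;         /* corresponds to "Cannot change the option" */
    }
    return 0;
}
```

With current = 1024, def = 1 << 30 and changeable = false, passing 1024 explicitly succeeds, while omitting the option fails exactly like passing a different size would.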

Berto



Re: [Qemu-devel] [PATCH v4 08/14] spapr: handle pc-dimm unplug via hotplug handler chain

2018-06-01 Thread Igor Mammedov
On Thu, 17 May 2018 10:15:21 +0200
David Hildenbrand  wrote:

> Let's handle it via hotplug_handler_unplug(). E.g. necessary to hotplug/
> unplug memory devices (which a pc-dimm is) later.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/ppc/spapr.c | 23 +++
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 2f315f963b..286c38c842 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3291,7 +3291,8 @@ static sPAPRDIMMState 
> *spapr_recover_pending_dimm_state(sPAPRMachineState *ms,
>  /* Callback to be called during DRC release. */
>  void spapr_lmb_release(DeviceState *dev)
>  {
> -sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
> +HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_ctrl);
>  sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, 
> PC_DIMM(dev));
>  
>  /* This information will get lost if a migration occurs
> @@ -3309,9 +3310,21 @@ void spapr_lmb_release(DeviceState *dev)
>  
>  /*
>   * Now that all the LMBs have been removed by the guest, call the
> - * pc-dimm unplug handler to cleanup up the pc-dimm device.
> + * unplug handler chain. This can never fail.
>   */
> -pc_dimm_memory_unplug(dev, MACHINE(spapr));
> +hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> +}
> +
> +static void spapr_memory_unplug(HotplugHandler *hotplug_dev, DeviceState 
> *dev,
> +Error **errp)
> +{
> +sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
> +sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, 
> PC_DIMM(dev));

> +g_assert(ds);
> +g_assert(!ds->nr_lmbs);
These 2 lines seem unrelated to the patch topic;
could you drop them?

If these values should be checked, it would be better to audit 'ds' use
across spapr.c and post a separate patch, independent of this series.

> +pc_dimm_memory_unplug(dev, MACHINE(hotplug_dev));
>  object_unparent(OBJECT(dev));
>  spapr_pending_dimm_unplugs_remove(spapr, ds);
>  }
> @@ -3608,7 +3621,9 @@ static void spapr_machine_device_unplug(HotplugHandler 
> *hotplug_dev,
>  Error *local_err = NULL;
>  
>  /* final stage hotplug handler */
> -if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +spapr_memory_unplug(hotplug_dev, dev, &local_err);
> +} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>  hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
> &local_err);
>  }
Otherwise, ignoring the dev->parent_bus parts, the patch looks reasonable.



Re: [Qemu-devel] [PATCH v4 09/14] spapr: handle cpu core unplug via hotplug handler chain

2018-06-01 Thread Igor Mammedov
On Thu, 17 May 2018 10:15:22 +0200
David Hildenbrand  wrote:

> Let's handle it via hotplug_handler_unplug().
> 
> Signed-off-by: David Hildenbrand 
Acked-by: Igor Mammedov 

> ---
>  hw/ppc/spapr.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 286c38c842..13d153b5a6 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3412,7 +3412,16 @@ static void *spapr_populate_hotplug_cpu_dt(CPUState 
> *cs, int *fdt_offset,
>  /* Callback to be called during DRC release. */
>  void spapr_core_release(DeviceState *dev)
>  {
> -MachineState *ms = MACHINE(qdev_get_hotplug_handler(dev));
> +HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +
> +/* Call the unplug handler chain. This can never fail. */
> +hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> +}
> +
> +static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +  Error **errp)
> +{
> +MachineState *ms = MACHINE(hotplug_dev);
>  sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
>  CPUCore *cc = CPU_CORE(dev);
>  CPUArchId *core_slot = spapr_find_cpu_slot(ms, cc->core_id, NULL);
> @@ -3623,6 +3632,8 @@ static void spapr_machine_device_unplug(HotplugHandler 
> *hotplug_dev,
>  /* final stage hotplug handler */
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>  spapr_memory_unplug(hotplug_dev, dev, &local_err);
> +} else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> +spapr_core_unplug(hotplug_dev, dev, &local_err);
>  } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>  hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
> &local_err);



