Re: [Qemu-devel] [PATCH] linux-user: Add binfmt wrapper
On Tue, Jul 15, 2014 at 05:11:48PM +0200, Joakim Tjernlund wrote:
> Riku Voipio riku.voi...@iki.fi wrote on 2014/07/15 16:12:26:
> > On Mon, Jul 14, 2014 at 05:38:49PM +0200, Joakim Tjernlund wrote:
> > > Alexander Graf ag...@suse.de wrote on 2014/07/14 17:21:33:
> > > > On 14.07.14 16:38, Joakim Tjernlund wrote:
> > > > > The popular binfmt-wrapper patch adds an additional executable which
> > > > > mangles argv suitably for binfmt flag P. In a chroot you need both the
> > > > > (statically linked) qemu-$arch and qemu-$arch-binfmt-wrapper. This is
> > > > > suboptimal; a better approach is to recognize the -binfmt-wrapper
> > > > > extension within linux-user (qemu-$arch) and mangle argv there. This
> > > > > produces one executable which can be either copied into the chroot or
> > > > > bind mounted with the appropriate -binfmt-wrapper suffix.
> > > > > Signed-off-by: Joakim Tjernlund joakim.tjernl...@transmode.se
> > > > Please make sure to CC Riku on patches like this - he is the linux-user
> > > > maintainer.
> > > Doesn't he read the devel list? Anyhow, CC:ed.
> > I do - but CC goes directly to my inbox while qemu-devel goes to a folder.

I take it from this discussion that this patch has been superseded by the patch at
http://patchwork.ozlabs.org/patch/369770/ ?
> BTW, any chance qemu binfmt could fix up the ps output from within a container:
>
> jocke-ppc2 ~ # ps uaxww
> USER  PID %CPU %MEM     VSZ  RSS TTY     STAT START TIME COMMAND
> root    1  0.1  0.0 4138016 7600 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /sbin/init /sbin/init
> root   79  0.0  0.0 4138016 5792 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /usr/sbin/sshd /usr/sbin/sshd
> root  293  0.0  0.0 4137952 4072 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /bin/busybox /bin/busybox udhcpc -x hostname:jocke-ppc2 --interface=eth0 --now --script=/lib/netifrc/sh/udhcpc-hook.sh --pidfile=/var/run/udhcpc-eth0.pid
> root  334  0.3  0.0 4138016 5964 tty1    Ss+  17:02 0:00 /usr/bin/qemu-ppc /sbin/agetty /sbin/agetty 38400 tty1 linux
> root  335  3.1  0.0 4138048 7064 console Ss   17:02 0:00 /usr/bin/qemu-ppc /bin/login /bin/login -- root
> root  336  3.3  0.0 4138016 9764 console S    17:02 0:00 /usr/bin/qemu-ppc /bin/bash -bash
> root  340  0.0  0.0 4138016 6336 ?       R+   Jul10 0:00 /bin/ps ps uaxww
>
> As you can see, qemu-ppc is visible.

This isn't something binfmt could do. ps uses /proc/$pid/exe to map the right binary. However, qemu already fixes /proc/self/exe to point to the right file - as you can see from the last line, where ps uaxww doesn't have qemu shown. So it would be possible to make do_open() in syscall.c do a similar mapping for /proc/$pid/exe. Exactly how and when the qemu processes should be hidden within qemu needs careful thought, though. A naive approach would be to check whether /proc/$pid/exe points to the same binary as /proc/self/exe, hiding at least same-architecture qemu processes.

Riku
Re: [Qemu-devel] [PATCH] cadence_uart: check for serial backend before using it.
On 16/07/2014 05:48, Peter Crosthwaite wrote:
> On Wed, Jul 16, 2014 at 1:18 AM, fred.kon...@greensocs.com wrote:
> > From: KONRAD Frederic fred.kon...@greensocs.com
> >
> > This checks that s->chr is not NULL before using it.
> >
> > Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com
> > ---
> >  hw/char/cadence_uart.c | 15 +++++++++++----
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/char/cadence_uart.c b/hw/char/cadence_uart.c
> > index dbbc167..a5736cb 100644
> > --- a/hw/char/cadence_uart.c
> > +++ b/hw/char/cadence_uart.c
> > @@ -175,8 +175,10 @@ static void uart_send_breaks(UartState *s)
> >  {
> >      int break_enabled = 1;
> >
> > -    qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_BREAK,
> > -                      &break_enabled);
> > +    if (s->chr) {
> > +        qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_BREAK,
> > +                          &break_enabled);
> > +    }
> >  }
> >
> >  static void uart_parameters_setup(UartState *s)
> > @@ -227,7 +229,9 @@ static void uart_parameters_setup(UartState *s)
> >      packet_size += ssp.data_bits + ssp.stop_bits;
> >      s->char_tx_time = (get_ticks_per_sec() / ssp.speed) * packet_size;
> >
> > -    qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_PARAMS, &ssp);
> > +    if (s->chr) {
> > +        qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_PARAMS, &ssp);
> > +    }
> >  }
> >
> >  static int uart_can_receive(void *opaque)
> > @@ -295,6 +299,7 @@ static gboolean cadence_uart_xmit(GIOChannel *chan, GIOCondition cond,
> >      /* instant drain the fifo when there's no back-end */
> >      if (!s->chr) {
> >          s->tx_count = 0;
> > +        return FALSE;
>
> Is this needed? With s->tx_count forced to 0 in the NULL dev case, won't
> the next if trigger immediately and return?

True, I'll make this change and resend.

Thanks,
Fred

> 300     if (!s->tx_count) {
> 301         return FALSE;
> 302     }
>
> With this hunk dropped,
>
> Reviewed-by: Peter Crosthwaite peter.crosthwa...@xilinx.com
>
> And recommended for 2.1.
>
> Regards,
> Peter

> >      }
> >
> >      if (!s->tx_count) {
> > @@ -375,7 +380,9 @@ static void uart_read_rx_fifo(UartState *s, uint32_t *c)
> >          *c = s->rx_fifo[rx_rpos];
> >          s->rx_count--;
> >
> > -        qemu_chr_accept_input(s->chr);
> > +        if (s->chr) {
> > +            qemu_chr_accept_input(s->chr);
> > +        }
> >      } else {
> >          *c = 0;
> >      }
> > --
> > 1.9.0
Re: [Qemu-devel] [PATCH] linux-user: Add binfmt wrapper
Riku Voipio riku.voi...@iki.fi wrote on 2014/07/16 08:54:45:
> On Tue, Jul 15, 2014 at 05:11:48PM +0200, Joakim Tjernlund wrote:
> > Riku Voipio riku.voi...@iki.fi wrote on 2014/07/15 16:12:26:
> > > On Mon, Jul 14, 2014 at 05:38:49PM +0200, Joakim Tjernlund wrote:
> > > > Alexander Graf ag...@suse.de wrote on 2014/07/14 17:21:33:
> > > > > On 14.07.14 16:38, Joakim Tjernlund wrote:
> > > > > > The popular binfmt-wrapper patch adds an additional executable which
> > > > > > mangles argv suitably for binfmt flag P. In a chroot you need both the
> > > > > > (statically linked) qemu-$arch and qemu-$arch-binfmt-wrapper. This is
> > > > > > suboptimal; a better approach is to recognize the -binfmt-wrapper
> > > > > > extension within linux-user (qemu-$arch) and mangle argv there. This
> > > > > > produces one executable which can be either copied into the chroot or
> > > > > > bind mounted with the appropriate -binfmt-wrapper suffix.
> > > > > > Signed-off-by: Joakim Tjernlund joakim.tjernl...@transmode.se
> > > > > Please make sure to CC Riku on patches like this - he is the linux-user
> > > > > maintainer.
> > > > Doesn't he read the devel list? Anyhow, CC:ed.
> > > I do - but CC goes directly to my inbox while qemu-devel goes to a folder.
>
> I take it from this discussion that this patch has been superseded by the patch at
> http://patchwork.ozlabs.org/patch/369770/ ?
> > BTW, any chance qemu binfmt could fix up the ps output from within a container:
> >
> > jocke-ppc2 ~ # ps uaxww
> > USER  PID %CPU %MEM     VSZ  RSS TTY     STAT START TIME COMMAND
> > root    1  0.1  0.0 4138016 7600 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /sbin/init /sbin/init
> > root   79  0.0  0.0 4138016 5792 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /usr/sbin/sshd /usr/sbin/sshd
> > root  293  0.0  0.0 4137952 4072 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /bin/busybox /bin/busybox udhcpc -x hostname:jocke-ppc2 --interface=eth0 --now --script=/lib/netifrc/sh/udhcpc-hook.sh --pidfile=/var/run/udhcpc-eth0.pid
> > root  334  0.3  0.0 4138016 5964 tty1    Ss+  17:02 0:00 /usr/bin/qemu-ppc /sbin/agetty /sbin/agetty 38400 tty1 linux
> > root  335  3.1  0.0 4138048 7064 console Ss   17:02 0:00 /usr/bin/qemu-ppc /bin/login /bin/login -- root
> > root  336  3.3  0.0 4138016 9764 console S    17:02 0:00 /usr/bin/qemu-ppc /bin/bash -bash
> > root  340  0.0  0.0 4138016 6336 ?       R+   Jul10 0:00 /bin/ps ps uaxww
> >
> > As you can see, qemu-ppc is visible.
>
> This isn't something binfmt could do. ps uses /proc/$pid/exe to map the

Why not? I think it is the perfect place to do it in the Linux binfmt code. All the other interpreters (ELF ld.so and normal bash scripts) do it. Fixing this with no support from Linux amounts to hacks like:

    /*
     * Munge our argv so it will look like it would
     * if started without linux-user
     */
    if (execfd > 0) {
        unsigned long len;
        char *p = argv[0];

        for (i = 0; i < target_argc; i++, p += len) {
            len = strlen(target_argv[i]) + 1;
            memcpy(p, target_argv[i], len);
        }
        len = *envp - p;
        memset(p, 0, len);
    }

and it is still not perfect; apps like ps and top still do not work. The Linux binfmt code should at least pass a correct argv to us.

BTW, you forgot: [PATCH 4/4] ppc: remove excessive logging
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
> Andrey,
>
> Can you please provide instructions on how to create a reproducible environment?
>
> The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing.

Marcelo, the original reason for the patch adding cpu_synchronize_all_states() there was that this bug affected non-migration operations as well - http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00472.html. Won't moving it only to migration code break these things again?

> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index 272a88a..feb5fc5 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -17,7 +17,6 @@
>  #include "qemu/host-utils.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/kvm.h"
> -#include "sysemu/cpus.h"
>  #include "hw/sysbus.h"
>  #include "hw/kvm/clock.h"
> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
>
> -    assert(time.tsc_timestamp <= migration_tsc);
>      delta = migration_tsc - time.tsc_timestamp;
>      if (time.tsc_shift < 0) {
>          delta >>= -time.tsc_shift;
> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
>      if (s->clock_valid) {
>          return;
>      }
> -
> -    cpu_synchronize_all_states();
>      ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>      if (ret < 0) {
>          fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> diff --git a/migration.c b/migration.c
> index 8d675b3..34f2325 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>          qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>          old_vm_running = runstate_is_running();
>
> +        cpu_synchronize_all_states();
>          ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>          if (ret >= 0) {
>              qemu_file_set_rate_limit(s->file, INT64_MAX);

--
mg
[Qemu-devel] Help on vm (winxp sp3) having no sound
Hi all, I have one vm (winxp sp3), but it has no sound. I was using intel-hda, but the guest could not find a driver for the "Audio Device on High Definition Audio Bus". The vm is created by qemu-kvm on a CentOS 6.4 host. The command is:

qemu-system-x86_64 -enable-kvm -device intel-hda,id=sound0 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 ...

Thank you for any help.
[Qemu-devel] [PATCH for-2.1] cadence_uart: check serial backend before using it.
From: KONRAD Frederic fred.kon...@greensocs.com

A segfault occurs when there are fewer than two serial backends with the zynq platform. This checks that s->chr is not NULL before using it.

Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com
Reviewed-by: Peter Crosthwaite peter.crosthwa...@xilinx.com
---
 hw/char/cadence_uart.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/char/cadence_uart.c b/hw/char/cadence_uart.c
index dbbc167..958b6ac 100644
--- a/hw/char/cadence_uart.c
+++ b/hw/char/cadence_uart.c
@@ -175,8 +175,10 @@ static void uart_send_breaks(UartState *s)
 {
     int break_enabled = 1;

-    qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_BREAK,
-                      &break_enabled);
+    if (s->chr) {
+        qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_BREAK,
+                          &break_enabled);
+    }
 }

 static void uart_parameters_setup(UartState *s)
@@ -227,7 +229,9 @@ static void uart_parameters_setup(UartState *s)
     packet_size += ssp.data_bits + ssp.stop_bits;
     s->char_tx_time = (get_ticks_per_sec() / ssp.speed) * packet_size;

-    qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_PARAMS, &ssp);
+    if (s->chr) {
+        qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_PARAMS, &ssp);
+    }
 }

 static int uart_can_receive(void *opaque)
@@ -375,7 +379,9 @@ static void uart_read_rx_fifo(UartState *s, uint32_t *c)
         *c = s->rx_fifo[rx_rpos];
         s->rx_count--;

-        qemu_chr_accept_input(s->chr);
+        if (s->chr) {
+            qemu_chr_accept_input(s->chr);
+        }
     } else {
         *c = 0;
     }
--
1.9.0
Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging
On Sat, Jul 12, 2014 at 04:06:09PM +0200, Joakim Tjernlund wrote:
> Alexander Graf ag...@suse.de wrote on 2014/07/12 12:41:05:
> > On 12.07.14 12:40, Peter Maydell wrote:
> > > On 12 July 2014 10:39, Alexander Graf ag...@suse.de wrote:
> > > > On 12.07.14 10:58, Peter Maydell wrote:
> > > > > On 12 July 2014 01:39, Alexander Graf ag...@suse.de wrote:
> > > > > > What do the other platforms do on illegal instructions during user
> > > > > > mode? Any way we can get consistency across the board?
> > > > > Mostly it looks like they just silently generate the SIGILL.
> > > > > Consistency has never been our strong point :-)
> > > > That means this patch brings things towards consistency? It's certainly
> > > > good for me then :)
> > > No, this just removes one use of this logging. If you wanted consistency
> > > we should remove all of them...
> > Agreed :)
> So can I infer from this discussion that you will apply the patch?

I think Peter and Alex suggest that the EXCP_DUMP() logging should be removed from all cases in the PPC code where TARGET_SIGILL is raised. Your patch removes just one case, and that would make the PPC code internally inconsistent, where some SIGILLs are logged and others aren't. Even more dramatically, we could remove the whole EXCP_DUMP and its users, since none of the other archs output anything for SIGFPE/SIGSEGV either. After all, the Linux kernel (ppc or otherwise) doesn't output anything either, and it is the behaviour of Linux we try to match.

Riku
Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging
Riku Voipio riku.voi...@iki.fi wrote on 2014/07/16 09:55:50:

> From: Riku Voipio riku.voi...@iki.fi
> To: Joakim Tjernlund joakim.tjernl...@transmode.se
> Cc: Alexander Graf ag...@suse.de, Peter Maydell peter.mayd...@linaro.org, qemu-...@nongnu.org, QEMU Developers qemu-devel@nongnu.org
> Date: 2014/07/16 09:55
> Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging
>
> On Sat, Jul 12, 2014 at 04:06:09PM +0200, Joakim Tjernlund wrote:
> > Alexander Graf ag...@suse.de wrote on 2014/07/12 12:41:05:
> > > On 12.07.14 12:40, Peter Maydell wrote:
> > > > On 12 July 2014 10:39, Alexander Graf ag...@suse.de wrote:
> > > > > On 12.07.14 10:58, Peter Maydell wrote:
> > > > > > On 12 July 2014 01:39, Alexander Graf ag...@suse.de wrote:
> > > > > > > What do the other platforms do on illegal instructions during
> > > > > > > user mode? Any way we can get consistency across the board?
> > > > > > Mostly it looks like they just silently generate the SIGILL.
> > > > > > Consistency has never been our strong point :-)
> > > > > That means this patch brings things towards consistency? It's
> > > > > certainly good for me then :)
> > > > No, this just removes one use of this logging. If you wanted
> > > > consistency we should remove all of them...
> > > Agreed :)
> > So can I infer from this discussion that you will apply the patch?
>
> I think Peter and Alex suggest that the EXCP_DUMP() logging should be removed from all cases in the PPC code where TARGET_SIGILL is raised. Your patch removes just one case, and that would make the PPC code internally inconsistent, where some SIGILLs are logged and others aren't.

Something like that. This is one step in that direction. We (or at least I) cannot do the consistency work for all cases/arches at once. With the patch we come one step closer to the Linux kernel, so I don't see why not apply it.

> Even more dramatically, we could remove the whole EXCP_DUMP and its users, since none of the other archs output anything for SIGFPE/SIGSEGV either. After all, the Linux kernel (ppc or otherwise) doesn't output anything either, and it is the behaviour of Linux we try to match.
Hmm, not clear to me what you mean here.

Jocke
Re: [Qemu-devel] Bogus struct stat64 for qemu-microblaze (user emulation)?
On 16 July 2014 05:02, Rich Felker dal...@libc.org wrote:
> The qemu-microblaze definition of struct stat64 seems to mismatch the kernel definition, which is using asm-generic/stat.h. See:
>
> http://git.qemu.org/?p=qemu.git;a=blob;f=linux-user/syscall_defs.h;h=c9e6323905486452f518102bf40ba73143c9d601;hb=HEAD#l1469
> http://git.qemu.org/?p=qemu.git;a=blob;f=linux-user/syscall.c;h=a50229d0d72fc68966515fcf2bc308b833a3c032;hb=HEAD#l4949
>
> This seems to be causing a truncated-to-32-bit inode number to be stored in the location where st_ino should reside, and a spurious copy of the inode number to be written in an unused slot at the end of the structure.

Sounds quite plausible -- we've had issues with other archs not having correct stat struct definitions in QEMU. I don't suppose anybody's done much testing of the microblaze linux-user code.

> Is my analysis correct? Stefan Kristiansson and I found this while working on the or1k port of musl libc, where it seems our structure for the existing microblaze port is wrongly aligned with the qemu definition rather than the definition the real kernel is using. Before I try correcting this on our side, I want to make sure we're working with the right version.

I would definitely trust the kernel definition, not QEMU's!

thanks
-- PMM
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
> On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
> > On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini pbonz...@redhat.com wrote:
> > > On 15/07/2014 23:25, Andrey Korolyov wrote:
> > > > On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
> > > > > On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
> > > > > > On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov and...@xdel.ru wrote:
> > > > > > > On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah amit.s...@redhat.com wrote:
> > > > > > > > On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
> > > > > > > > > Hello, the issue is not specific to the iothread code, because generic virtio-blk also hangs up:
> > > > > > > > Do you know which version works well? If you could bisect, that'll help a lot.
> > > > > > > > Thanks, Amit
> > > > > > > Hi, 2.0 works definitely well. I'll try to finish the bisection today, though every step takes about 10 minutes to complete.
> > > > > > Yay! It is even outside of virtio-blk:
> > > > > >
> > > > > > commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
> > > > > > Author: Marcelo Tosatti mtosa...@redhat.com
> > > > > > Date:   Tue Jun 3 13:34:48 2014 -0300
> > > > > >
> > > > > >     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation
> > > > > >
> > > > > >     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
> > > > > >
> > > > > >     Reported-by: Marcin Gibuła m.gib...@beyond.pl
> > > > > >     Cc: qemu-sta...@nongnu.org
> > > > > >     Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
> > > > > >     Signed-off-by: Paolo Bonzini pbonz...@redhat.com
> > > > >
> > > > > Andrey,
> > > > >
> > > > > Can you please provide instructions on how to create a reproducible environment?
> > > > >
> > > > > The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing.
> > > > > diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> > > > > index 272a88a..feb5fc5 100644
> > > > > --- a/hw/i386/kvm/clock.c
> > > > > +++ b/hw/i386/kvm/clock.c
> > > > > @@ -17,7 +17,6 @@
> > > > >  #include "qemu/host-utils.h"
> > > > >  #include "sysemu/sysemu.h"
> > > > >  #include "sysemu/kvm.h"
> > > > > -#include "sysemu/cpus.h"
> > > > >  #include "hw/sysbus.h"
> > > > >  #include "hw/kvm/clock.h"
> > > > > @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
> > > > >      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
> > > > >
> > > > > -    assert(time.tsc_timestamp <= migration_tsc);
> > > > >      delta = migration_tsc - time.tsc_timestamp;
> > > > >      if (time.tsc_shift < 0) {
> > > > >          delta >>= -time.tsc_shift;
> > > > > @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
> > > > >      if (s->clock_valid) {
> > > > >          return;
> > > > >      }
> > > > > -
> > > > > -    cpu_synchronize_all_states();
> > > > >      ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> > > > >      if (ret < 0) {
> > > > >          fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> > > > > diff --git a/migration.c b/migration.c
> > > > > index 8d675b3..34f2325 100644
> > > > > --- a/migration.c
> > > > > +++ b/migration.c
> > > > > @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
> > > > >          qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> > > > >          old_vm_running = runstate_is_running();
> > > > >
> > > > > +        cpu_synchronize_all_states();
> > > > >          ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> > > > >          if (ret >= 0) {
> > > > >              qemu_file_set_rate_limit(s->file, INT64_MAX);
> > >
> > > It could also be useful to apply the above patch _and_ revert
> > > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
> > >
> > > Paolo
> >
> > Yes, it solved the issue for me! (it took much time to check because most of the country's Debian mirrors went inconsistent for some reason)
> >
> > Also a trivial addition:
> >
> > diff --git a/migration.c b/migration.c
> > index 34f2325..65d1c88 100644
> > --- a/migration.c
> > +++ b/migration.c
> > @@ -25,6 +25,7 @@
> >  #include "qemu/thread.h"
> >  #include "qmp-commands.h"
> >  #include "trace.h"
> > +#include "sysemu/cpus.h"

And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf? That is, test with a stock qemu.git tree and the patch sent today, on this thread, to move cpu_synchronize_all_states?
The main reason for things to work for me is the revert of 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, without adding any other patches. I tested two cases: with Alexander's patch completely reverted plus the suggestion from Marcelo, and with only 9b178682 reverted plus the same suggestion. The difference is that while Alexander's patch is not reverted, live migration always fails at the timeout value; when it is reverted, migration always succeeds in 8-10 seconds. The appropriate diffs are attached for reference.

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 272a88a..93e1829 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -17,7 +17,6 @@
 #include "qemu/host-utils.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
-#include "sysemu/cpus.h"
 #include "hw/sysbus.h"
 #include "hw/kvm/clock.h"
@@ -36,49 +35,6 @@ typedef struct KVMClockState {
Re: [Qemu-devel] hw/arm: add Lego NXT board
On 07/15/2014 10:09 PM, Paolo Bonzini wrote:
> BTW, sorry for the confusion. There is another frequent contributor to QEMU with the same name as yours. I only noticed now that the email address is different.

Oh right, I noticed that; I should have said something.

> > 4. Now, thanks to the help on this list, I know there is the -chardev functionality in qemu that basically achieves what I did by hand using pipes. Now my idea is to port my own proprietary implementation to the qemu way - chardevs. You said you think it is a bad idea to build a device that directly translates I/O memory access to a chardev. I still don't understand why. Is this all about legal issues or is there a technical reason?
>
> I think that there are two other ways to do it.

And what about the chardev way? Your silence in this regard is mysterious ;-)

> 1) [...]
>
> 2) [...]

Thanks a lot for your suggestions. I will play a bit and come back when I have some code to show.

Alexander
Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging
On 16.07.2014, at 10:32, Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> Riku Voipio riku.voi...@iki.fi wrote on 2014/07/16 09:55:50:
> > From: Riku Voipio riku.voi...@iki.fi
> > To: Joakim Tjernlund joakim.tjernl...@transmode.se
> > Cc: Alexander Graf ag...@suse.de, Peter Maydell peter.mayd...@linaro.org, qemu-...@nongnu.org, QEMU Developers qemu-devel@nongnu.org
> > Date: 2014/07/16 09:55
> > Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging
> >
> > On Sat, Jul 12, 2014 at 04:06:09PM +0200, Joakim Tjernlund wrote:
> > > Alexander Graf ag...@suse.de wrote on 2014/07/12 12:41:05:
> > > > On 12.07.14 12:40, Peter Maydell wrote:
> > > > > On 12 July 2014 10:39, Alexander Graf ag...@suse.de wrote:
> > > > > > On 12.07.14 10:58, Peter Maydell wrote:
> > > > > > > On 12 July 2014 01:39, Alexander Graf ag...@suse.de wrote:
> > > > > > > > What do the other platforms do on illegal instructions during
> > > > > > > > user mode? Any way we can get consistency across the board?
> > > > > > > Mostly it looks like they just silently generate the SIGILL.
> > > > > > > Consistency has never been our strong point :-)
> > > > > > That means this patch brings things towards consistency? It's
> > > > > > certainly good for me then :)
> > > > > No, this just removes one use of this logging. If you wanted
> > > > > consistency we should remove all of them...
> > > > Agreed :)
> > > So can I infer from this discussion that you will apply the patch?
> >
> > I think Peter and Alex suggest that the EXCP_DUMP() logging should be removed from all cases in the PPC code where TARGET_SIGILL is raised. Your patch removes just one case, and that would make the PPC code internally inconsistent, where some SIGILLs are logged and others aren't.
>
> Something like that. This is one step in that direction. We (or at least I) cannot do the consistency work for all cases/arches at once. With the patch we come one step closer to the Linux kernel, so I don't see why not apply it.

That's not how it will end up, though. If we apply this one patch now, it will stay that way forever and be even more confusing and inconsistent than today.
I think what we really want is proper -d int logging on all archs for linux-user. This patch is getting us nowhere close to it. Alex
Re: [Qemu-devel] hw/arm: add Lego NXT board
On 16/07/2014 10:40, Alexander Graf wrote:
> And what about the chardev way? Your silence in this regard is mysterious ;-)

The main problem with chardevs is that they are completely asynchronous, so they may not be a good match for your use case. (In fact, this is a problem for the i2c-over-chardev and especially spi-over-chardev ideas I shot out yesterday. For i2c-over-chardev one could add support for clock stretching in QEMU, but SPI is entirely synchronous.)

Note that -qmp and -qtest both use chardevs internally to connect the external program with QEMU.

Paolo

> 1) [...]
>
> 2) [...]
>
> Thanks a lot for your suggestions. I will play a bit and come back when I have some code to show.
[Qemu-devel] [PULL for-2.1] virtio-serial: bugfix for virtconsole port unplug
Hi, This is a small bugfix that prevents a port unplug from causing port 0 to lose its reservation for virtconsole use. The patch has been on-list for a few days and has been reviewed. Please pull. The following changes since commit 5a7348045091a2bc15d85bb177e5956aa6114e5a: Update version for v2.1.0-rc2 release (2014-07-15 18:55:37 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/virt/qemu/amit/virtio-serial.git for-2.1 for you to fetch changes up to 57d84cf35302fe51789c18354bf09a521bb603df: virtio-serial-bus: keep port 0 reserved for virtconsole even on unplug (2014-07-16 14:32:40 +0530) Amit Shah (1): virtio-serial-bus: keep port 0 reserved for virtconsole even on unplug hw/char/virtio-serial-bus.c | 14 +++++++++---- 1 file changed, 11 insertions(+), 3 deletions(-) Amit
[Qemu-devel] [PULL for-2.1] migration: detect section renames in vmstate static checker script
Hi, A minor enhancement to the static checker script that detects section renames. One such rename was added recently (ICH9-LPC -> ICH9 LPC). This doesn't affect the qemu code, and enhances a test program, so I recommend we pull it for 2.1. The patch has been on list for a few days. The following changes since commit 5a7348045091a2bc15d85bb177e5956aa6114e5a: Update version for v2.1.0-rc2 release (2014-07-15 18:55:37 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/virt/qemu/amit/migration.git for-2.1 for you to fetch changes up to 79fe16c0489ca658f53796206067a551fc915ba2: vmstate static checker: detect section renames (2014-07-16 14:29:34 +0530) Amit Shah (1): vmstate static checker: detect section renames scripts/vmstate-static-checker.py | 27 ++++++++++++++++++--- 1 file changed, 22 insertions(+), 5 deletions(-) Amit
[Qemu-devel] [PULL for-2.1] virtio-rng: Fix abort on invalid input
Hi, This patch returns an error instead of aborting, which is desirable not just for cmdline invocation, but also prevents an abort in case of device hotplug. The patch is small, and reviewed on-list. Please pull. The following changes since commit 5a7348045091a2bc15d85bb177e5956aa6114e5a: Update version for v2.1.0-rc2 release (2014-07-15 18:55:37 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/virt/qemu/amit/virtio-rng.git for-2.1 for you to fetch changes up to 9ef6be93250e46d35062c84d5c75c7cb515dc27c: virtio-rng: Add human-readable error message for negative max-bytes parameter (2014-07-16 14:25:29 +0530) John Snow (1): virtio-rng: Add human-readable error message for negative max-bytes parameter hw/virtio/virtio-rng.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) Amit
Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging
Joakim Tjernlund writes:
> Alexander Graf ag...@suse.de wrote on 2014/07/12 02:39:21:
> > On 11.07.14 20:22, Peter Maydell wrote:
> > > On 11 July 2014 19:15, Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> > > > Peter Maydell peter.mayd...@linaro.org wrote on 2014/07/11 19:14:25:
> > > > > On 11 July 2014 16:18, Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> > > > > > ppc logs every type of invalid instruction. This generates a lot
> > > > > Rather than just deleting this EXCP_DUMP, I would suggest changing the
> > > > > EXCP_DUMP macro so it only does anything if the user has passed the
> > > > > -d int debug logging flag:
> > > > I don't think ppc wants that. They want debug unconditionally on to get
> > > > relevant bug reports. This one is getting in the way of normal operations,
> > > > so I think it should be deleted.
> > > If the PPC maintainers want that behaviour then they need to defend it.
> > > No other architecture's linux-user code spews junk to stderr for
> > > exceptions, and PPC shouldn't either. The debug log switches are exactly
> > > for allowing us to say "please turn on debug logging" when bugs are
> > > reported, and those are what we should use.
> > I agree - and it's how we behave in system emulation mode already :).
> > What do the other platforms do on illegal instructions during user mode?
> > Any way we can get consistency across the board?
> In this case it is not an illegal insn; it is a valid insn, just not for the
> QEMU-emulated CPU.

On aarch64 we do:

#define unsupported_encoding(s, insn)                                    \
    do {                                                                 \
        qemu_log_mask(LOG_UNIMP,                                         \
                      "%s:%d: unsupported instruction encoding 0x%08x "  \
                      "at pc=%016" PRIx64 "\n",                          \
                      __FILE__, __LINE__, insn, s->pc - 4);              \
        unallocated_encoding(s);                                         \
    } while (0);

So we signal it's unimplemented before falling through to the signal generating code.

--
Alex Bennée
Re: [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops
* Paolo Bonzini (pbonz...@redhat.com) wrote:
> On 07/07/2014 16:35, Dr. David Alan Gilbert wrote:
> > * Paolo Bonzini (pbonz...@redhat.com) wrote:
> > > On 04/07/2014 19:41, Dr. David Alan Gilbert (git) wrote:
> > > > From: Dr. David Alan Gilbert dgilb...@redhat.com
> > > >
> > > > Postcopy needs to have two migration streams loading concurrently;
> > > > one from memory (with the device state) and the other from the fd
> > > > with the memory transactions.
> > >
> > > Can you explain this? I would have thought the order is:
> > >
> > >    precopy RAM and everything
> > >    prepare postcopy RAM (send dirty bitmap)
> > >    finish precopy non-RAM
> > >    finish devices
> > >    postcopy RAM
> > >
> > > Why do you need all the packaging stuff and a separate memory-based
> > > migration stream for devices? I'm sure I'm missing something. :)
> >
> > The thing you're missing is the details of 'finish devices'. The device
> > emulation may access guest memory as part of loading its state, so you
> > can't successfully complete 'finish devices' without having the
> > 'postcopy RAM' available to provide pages.
>
> I see. Can you document the flow (preferably as a reply to this email
> _and_ in docs/ when you send v2 of the code :))?

I thought I documented enough in the docs/migration.txt stuff in the last patch (see the 'Postcopy states' section); however, let's see if the following is better:

Postcopy stream

Loading of device data may cause the device emulation to access guest RAM, which may trigger faults that have to be resolved by the source. As such, the migration stream has to be able to respond with page data *during* the device load, and hence the device data has to be read from the stream completely before the device load begins, to free the stream up. This is achieved by 'packaging' the device data into a blob that's read in one go.

Source behaviour

Until postcopy is entered, the migration stream is identical to a normal precopy stream, except for the addition of a 'postcopy advise' command at the beginning, to let the destination know that postcopy might happen.
When postcopy starts, the source sends the page discard data and then forms the 'package' containing:

   Command: 'postcopy ram listen'
   The device state
      A series of sections, identical to the precopy stream's device state
      stream, containing everything except postcopiable devices (i.e. RAM)
   Command: 'postcopy ram run'

The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the contents are formatted in the same way as the main migration stream.

Destination behaviour

Initially the destination looks the same as precopy, with a single thread reading the migration stream; the 'postcopy advise' and 'discard' commands are processed to change the way RAM is managed, but don't affect the stream processing.

                                       1      2   3     4 5              6   7
 main  -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE DEVICE DEVICE    RUN  )
 thread                             |       |
                                    |     (page request)
                                    |        \___
                                    v            \
 listen thread:                     --- page -- page -- page -- page -- page --

                                    a   b        c

On receipt of CMD_PACKAGED (1), all the data associated with the package - the ( ... ) section in the diagram - is read into memory (into a QEMUSizedBuffer), and the main thread recurses into qemu_loadvm_state_main to process the contents of the package (2), which contains commands (3,6) and devices (4...).

On receipt of 'postcopy ram listen' (3) - i.e. the 1st command in the package - a new thread (a) is started that takes over servicing the migration stream, while the main thread carries on loading the package. It loads normal background page data (b), but if during a device load a fault happens (5), the returned page (c) is loaded by the listen thread, allowing the main thread's device load to carry on.

The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination CPUs start running. At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour and is no longer used by migration, while the listen thread carries on servicing page data until the end of migration.

Is that any better?

Dave

P.S.
I know of at least one bug in this code at the moment: it happens on a VM that doesn't have many dirty pages, where all the pages are transmitted (and hence the listen thread finishes) before the main thread gets to 'run'. From my cursory read of the code it is something like this on the source:

   finish precopy non-RAM
   start RAM postcopy
   for each device
       pack up data
   send it to destination

and on the destination:

   while source sends packet
       pick up packet
       atomically pass
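The 'read the whole package into memory, then process it' step described above can be sketched in miniature. This is an illustrative model only, not QEMU's implementation: a hypothetical length-prefixed blob is drained from the stream (here a plain byte array) in one go, so the real stream would be free to serve page requests while the payload is processed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of CMD_PACKAGED: a package is a 32-bit length followed by
 * that many payload bytes.  read_package() copies the whole payload out
 * of the stream before anything in it is processed, so the caller can
 * hand the stream to another reader (the "listen thread"). */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
} Stream;

static size_t stream_read(Stream *s, void *buf, size_t n)
{
    size_t avail = s->len - s->pos;
    if (n > avail) {
        n = avail;
    }
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Returns a malloc'd buffer holding the package payload, or NULL. */
static uint8_t *read_package(Stream *s, uint32_t *out_len)
{
    uint32_t len;
    uint8_t *buf;

    if (stream_read(s, &len, sizeof(len)) != sizeof(len)) {
        return NULL;
    }
    buf = malloc(len);
    if (!buf || stream_read(s, buf, len) != len) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

The point of the shape is that once `read_package()` returns, nothing that processes the payload touches the stream again.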
Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
* Paolo Bonzini (pbonz...@redhat.com) wrote: Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto: From: Dr. David Alan Gilbert dgilb...@redhat.com Postcopy needs a method to send messages from the destination back to the source, this is the 'return path'. snip

+/* Give a QEMUFile* off the same socket but data in the opposite
+ * direction.
+ * qemu_fopen_socket marks write fd's as blocking, but doesn't
+ * touch read fd's status, so we dup the fd just to keep settings
+ * separate. [TBD: Do I need to explicitly mark as non-block on read?]
+ */
+static QEMUFile *socket_dup_return_path(void *opaque)
+{
+    QEMUFileSocket *qfs = opaque;
+    int revfd;
+    bool this_is_read;
+    QEMUFile *result;
+
+    /* If it's already open, return it */
+    if (qfs->file->return_path) {
+        return qfs->file->return_path;

Wouldn't this leave a dangling file descriptor if you call socket_dup_return_path twice, and then close the original QEMUFile? Hmm - how?

+    }
+
+    if (qemu_file_get_error(qfs->file)) {
+        /* If the forward file is in error, don't try and open a return */
+        return NULL;
+    }
+
+    /* I don't think there's a better way to tell which direction 'this' is */
+    this_is_read = qfs->file->ops->get_buffer != NULL;
+
+    revfd = dup(qfs->fd);
+    if (revfd == -1) {
+        error_report("Error duplicating fd for return path: %s",
+                     strerror(errno));
+        return NULL;
+    }
+
+    qemu_set_nonblock(revfd);

Blocking/nonblocking is per-file *description*, not descriptor. So you're making the original QEMUFile nonblocking as well. Can you explain why this is needed before I reach the meat of the patch series? Yes, I went through that a few times until I got that it was per-entity not the fd itself, it still makes life easier for the rest of the QEMUFile code to have a separate fd for it (hence still worth the dup). In other words, can you draw a table with source/dest and read/write, and whether it should be blocking or non-blocking?
Sure; the non-blockingness is mostly on the source side; modifying the table in the docs patch a little:

Source side
  Forward path - written by migration thread: It's OK for this to be blocking, but we end up with it being non-blocking, and modify the socket code to emulate blocking.
  Return path - opened by main thread, read by fd_handler on main thread: Must be non-blocking so as not to block the main thread while waiting for a partially sent command.

Destination side
  Forward path - read by main thread
  Return path - opened by main thread, written by main thread AND postcopy thread (protected by rp_mutex)

I think I'm OK with both these being blocking. Dave Paolo

+    result = qemu_fopen_socket(revfd, this_is_read ? "wb" : "rb");
+    qfs->file->return_path = result;
+
+    if (result) {
+        /* We are the reverse path of our reverse path (although I don't
+           expect this to be used, it would stop another dup if it was) */
+        result->return_path = qfs->file;
+    } else {
+        close(revfd);
+    }
+
+    return result;
+}
+
 static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
                                     int64_t pos)
 {
@@ -313,17 +361,31 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 }

 static const QEMUFileOps socket_read_ops = {
-    .get_fd = socket_get_fd,
-    .get_buffer = socket_get_buffer,
-    .close = socket_close
+    .get_fd = socket_get_fd,
+    .get_buffer = socket_get_buffer,
+    .close = socket_close,
+    .get_return_path = socket_dup_return_path
 };

 static const QEMUFileOps socket_write_ops = {
-    .get_fd = socket_get_fd,
-    .writev_buffer = socket_writev_buffer,
-    .close = socket_close
+    .get_fd = socket_get_fd,
+    .writev_buffer = socket_writev_buffer,
+    .close = socket_close,
+    .get_return_path = socket_dup_return_path
 };

+/*
+ * Result: QEMUFile* for a 'return path' for comms in the opposite direction
+ *         NULL if not available
+ */
+QEMUFile *qemu_file_get_return_path(QEMUFile *f)
+{
+    if (!f->ops->get_return_path) {
+        return NULL;
+    }
+    return f->ops->get_return_path(f->opaque);
+}
+
 bool qemu_file_mode_is_not_valid(const char *mode)
 {
     if (mode == NULL ||
-- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
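Paolo's point that blocking/non-blocking is a property of the shared open file *description* rather than of each descriptor - so qemu_set_nonblock() on the dup'd fd also affects the original - can be demonstrated in a few lines. This standalone POSIX sketch is illustrative only and unrelated to the QEMU code:

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if setting O_NONBLOCK on one fd makes its dup() non-blocking
 * too, demonstrating that the status flags live in the shared open file
 * description, not in each descriptor; -1 on error. */
static int nonblock_is_shared(void)
{
    int fds[2];
    int dupfd, flags;

    if (pipe(fds) < 0) {
        return -1;
    }
    dupfd = dup(fds[0]);             /* shares the open file description */
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);
    flags = fcntl(dupfd, F_GETFL);   /* read the flags via the duplicate */

    close(fds[0]);
    close(fds[1]);
    close(dupfd);
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```

The dup() in the patch therefore separates bookkeeping (each QEMUFile gets its own fd number) but not the blocking mode itself.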
Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
Il 16/07/2014 11:37, Dr. David Alan Gilbert ha scritto:

+    /* If it's already open, return it */
+    if (qfs->file->return_path) {
+        return qfs->file->return_path;

Wouldn't this leave a dangling file descriptor if you call socket_dup_return_path twice, and then close the original QEMUFile? Hmm - how? The problem is that there is no reference count on QEMUFile, so if you do

    f1 = open_return_path(f0);
    f2 = open_return_path(f0);   /* now f1 == f2 */
    qemu_fclose(f1);             /* now f2 is dangling */

The remark about closing the original QEMUFile is also related to this part:

    if (result) {
        /* We are the reverse path of our reverse path (although I don't
           expect this to be used, it would stop another dup if it was) */
        result->return_path = qfs->file;

which has a similar bug

    f1 = open_return_path(f0);
    f2 = open_return_path(f1);   /* now f2 == f0 */
    qemu_fclose(f0);             /* now f2 is dangling */

If this is correct, the simplest fix is to drop the optimization. Source side Forward path - written by migration thread : It's OK for this to be blocking, but we end up with it being non-blocking, and modify the socket code to emulate blocking. This likely has a performance impact though. The first migration thread code drop from Juan already improved throughput a lot, even if it kept the iothread all the time and only converted from nonblocking writes to blocking. Return path - opened by main thread, read by fd_handler on main thread : Must be non-blocking so as not to block the main thread while waiting for a partially sent command. Why can't you handle this in the migration thread (or a new postcopy thread on the source side)? Then it can stay blocking. Destination side Forward path - read by main thread This must be nonblocking so that the monitor keeps responding. Return path - opened by main thread, written by main thread AND postcopy thread (protected by rp_mutex) When does the main thread need to write? If it doesn't need that, you can just switch to blocking when you process the listen command (i.e.
when the postcopy thread starts). Paolo I think I'm OK with both these being blocking.
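One standard way to address the dangling-pointer scenario Paolo describes is a reference count on the file object, so the underlying resource is released only by the last close. QEMUFile had no refcount at the time; the sketch below is a generic illustration of the idea under hypothetical names, not QEMU code:

```c
#include <stdlib.h>

/* A file-like handle whose backing resource is released only when the
 * last reference is dropped, so open_return_path() could hand out the
 * same object twice without creating a dangling second handle. */
typedef struct RefFile {
    int refcount;
    int closed;        /* stands in for "backing fd actually closed" */
} RefFile;

static RefFile *reffile_new(void)
{
    RefFile *f = calloc(1, sizeof(*f));
    f->refcount = 1;
    return f;
}

static RefFile *reffile_get(RefFile *f)
{
    f->refcount++;
    return f;
}

/* Returns 1 when the backing resource was really released. */
static int reffile_close(RefFile *f)
{
    if (--f->refcount > 0) {
        return 0;
    }
    f->closed = 1;     /* the real close(fd) would go here */
    /* A real implementation would free(f) here too; the struct is kept
     * alive in this demo so callers can observe 'closed'. */
    return 1;
}
```

With this shape, qemu_fclose(f1) after a second open_return_path(f0) would merely drop a reference instead of invalidating f2.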
Re: [Qemu-devel] [PATCH] Tap: fix vcpu long time io blocking on tap
-----Original Message----- From: Stefan Hajnoczi [mailto:stefa...@gmail.com] Sent: Tuesday, July 15, 2014 11:00 PM To: Wangkai (Kevin,C) Cc: Stefan Hajnoczi; Lee yang; qemu-devel@nongnu.org; aligu...@amazon.com Subject: Re: [Qemu-devel] [PATCH] Tap: fix vcpu long time io blocking on tap On Mon, Jul 14, 2014 at 10:44:58AM +0000, Wangkai (Kevin,C) wrote: Here the detail network:

+------------------------------------------------------+
| The host: add tap1 and eth10 to bridge 'br1'         |
|                                                      |
|  +------+                                    send    |
|  |  VM  | eth1-+-tap1 --- bridge --- eth10 --+------ |
|  +------+                                    packets |
+------------------------------------------------------+

Qemu starts the vm by virtio, using a tap interface; the options are: -net nic,vlan=101,model=virtio -net tap,vlan=101,ifname=tap1,script=no,downscript=no Use the newer -netdev/-device syntax to get offload support and slightly better performance: -netdev tap,id=tap0,ifname=tap1,script=no,downscript=no \ -device virtio-net-pci,netdev=tap0 And add tap1 and eth10 to bridge br1 in the host: brctl addif br1 tap1 brctl addif br1 eth10 total recv 505387 time 2000925 us: means one call of tap_send dealt with 505387 packets, the packet payload was 300 bytes, and the time used by tap_send() was 2,000,925 micro-seconds; time was measured by recording a time stamp at the start and end of tap_send(). We just test the performance of VM. That is 150 MB of incoming packets in a single tap_send(). Network rx queues are maybe a few 1000 packets so I wonder what is going on here. Maybe more packets are arriving while QEMU is reading them and we keep looping. That's strange though because the virtio-net rx virtqueue should fill up (it only has 256 entries). Can you investigate more and find out exactly what is going on? It's not clear yet that adding a budget is the solution or just hiding a deeper problem. Stefan [Wangkai (Kevin,C)] Hi Stefan, Yes, I have one machine that keeps sending packets at 300M/second, about 1M packets per second. I am checking the code; you mean the virtqueue has only 256 entries, does this member keep the value? vq->vring.num I will check and add some debug info to check this problem.
I have got the most time taken by virtio_net_receive():

    virtio_net_receive
    qemu_deliver_packet
    qemu_net_queue_send
    net_hub_port_receive
    qemu_deliver_packet
    qemu_net_queue_send
    tap_send
    qemu_iohandler_poll
    main_loop_wait
    main_loop
    main

regards Wangkai
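The fix under discussion - stopping tap_send() from looping for as long as packets keep arriving - amounts to a receive budget per main-loop iteration. A minimal model of that idea, with hypothetical names rather than the actual patch:

```c
/* Drain at most 'budget' packets per call; anything left stays queued
 * for the next main-loop iteration, so one busy tap device cannot
 * monopolise the iothread for seconds and stall vcpu I/O. */
static int tap_send_budgeted(int *queued, int budget)
{
    int sent = 0;

    while (*queued > 0 && sent < budget) {
        (*queued)--;   /* stands in for reading one packet from the tap fd */
        sent++;
    }
    return sent;
}
```

The open question in the thread still applies: a budget bounds the latency per iteration, but it does not explain why the virtio-net rx virtqueue (256 entries) fails to apply backpressure first.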
[Qemu-devel] [Bug 1185228] Re: with 'monitor pty', it needs to flush pts device after sending command to it
This is not a bug. What happens is that QEMU echoes the command you type onto the PTY. If you do not do a cat /dev/pts/8, the PTY's buffer gets full and QEMU stops reading from the PTY until it is emptied. It doesn't happen if you use QMP. ** Changed in: qemu Status: New => Invalid -- You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1185228 Title: with 'monitor pty', it needs to flush pts device after sending command to it Status in QEMU: Invalid Bug description: Environment: Host OS (ia32/ia32e/IA64): ia32e Guest OS (ia32/ia32e/IA64): ia32e Guest OS Type (Linux/Windows): Linux kvm.git Commit: 660696d1d16a71e15549ce1bf74953be1592bcd3 qemu.git Commit: 64afc2b4d48fb21e085517c38a59a3f61a11283c Host Kernel Version: 3.9.0-rc3 Hardware: SandyBridge Bug detailed description: -- when creating guest with parameter -monitor pty, then send command (e.g. migrate, device_add) to the pts device. If I don't flush the pts device (e.g. using cat /dev/pts/8), the monitor command doesn't take effect. I know some old qemu (e.g. 1.3) didn't have this issue. create KVM guest: [root@vt-snb9 ~]# qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -net none /root/cathy/rhel6u4.qcow -monitor pty char device redirected to /dev/pts/8 (label compat_monitor0) VNC server running on `::1:5900' hot plug a PCI device: [root@vt-snb9 ~]# echo "device_add pci-assign,host=05:10.0,id=nic" > /dev/pts/8 (here, the device cannot hot plug to the guest.) if you run command cat /dev/pts/8, the device will hot plug to the guest. hot remove the device: [root@vt-snb9 ~]# echo "device_del nic" > /dev/pts/8 (here, the device cannot hot remove from the guest.) if you run command cat /dev/pts/8 the device can hot remove from the guest Reproduce steps: 1. qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -net none /root/cathy/rhel6u4.qcow -monitor pty 2.
echo "device_add pci-assign,host=05:10.0,id=nic" > /dev/pts/8 (or using 'migrate' command on the qemu monitor redirection device /dev/pts/8) Current result: device can not hot plug or hot remove from the guest before exec 'cat /dev/pts/8'. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1185228/+subscriptions
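Paolo's explanation - the monitor's echo lands in the PTY buffer, and until something reads /dev/pts/N that buffer can fill and stall the monitor - relies only on ordinary PTY plumbing, which this small Linux/POSIX sketch (unrelated to QEMU's chardev code) exercises:

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Minimal PTY round-trip: whatever the slave side writes is readable on
 * the master.  QEMU's -monitor pty sits on the master; its echo of your
 * command goes into the master->slave buffer, and nothing drains that
 * buffer until something (e.g. 'cat /dev/pts/8') reads the slave. */
static int pty_roundtrip(char *buf, size_t len)
{
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    int slave;
    ssize_t n;

    if (master < 0 || grantpt(master) < 0 || unlockpt(master) < 0) {
        return -1;
    }
    slave = open(ptsname(master), O_RDWR | O_NOCTTY);
    if (slave < 0) {
        close(master);
        return -1;
    }
    write(slave, "hi", 2);        /* stand-in for monitor output */
    n = read(master, buf, len);   /* what a reader would drain */

    close(slave);
    close(master);
    return (int)n;
}
```

Using QMP avoids the problem because QMP peers always read their end of the channel.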
[Qemu-devel] [PATCH 0/2] virtio-serial: Prevent addition of ports with same name
Hi, The 2nd patch prevents adding ports to the system with conflicts in the 'name' parameter. This is done by going through all the ports in the system, including ports on other virtio-serial devices. The first patch prepares for this by creating a linked list of all available virtio-serial devices, so they can be walked over to check their ports' names. Please review. Amit Shah (2): virtio-serial: create a linked list of all active devices virtio-serial: search for duplicate port names before adding new ports hw/char/virtio-serial-bus.c | 32 include/hw/virtio/virtio-serial.h | 2 ++ 2 files changed, 34 insertions(+) -- 1.9.3
[Qemu-devel] [PATCH 1/2] virtio-serial: create a linked list of all active devices
To ensure two virtserialports don't get added to the system with the same 'name' parameter, we need to access all the ports on all the devices added, and compare the names. We currently don't have a list of all VirtIOSerial devices added to the system. This commit adds a simple linked list in which devices are put when they're initialized, and removed when they go away. Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/char/virtio-serial-bus.c | 10 ++ include/hw/virtio/virtio-serial.h | 2 ++ 2 files changed, 12 insertions(+)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 07bebc0..8c26f4e 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -26,6 +26,10 @@
 #include "hw/virtio/virtio-serial.h"
 #include "hw/virtio/virtio-access.h"

+struct VirtIOSerialDevices {
+    QLIST_HEAD(, VirtIOSerial) devices;
+} vserdevices;
+
 static VirtIOSerialPort *find_port_by_id(VirtIOSerial *vser, uint32_t id)
 {
     VirtIOSerialPort *port;
@@ -975,6 +979,8 @@ static void virtio_serial_device_realize(DeviceState *dev, Error **errp)
      */
     register_savevm(dev, "virtio-console", -1, 3, virtio_serial_save,
                     virtio_serial_load, vser);
+
+    QLIST_INSERT_HEAD(&vserdevices.devices, vser, next);
 }

 static void virtio_serial_port_class_init(ObjectClass *klass, void *data)
@@ -1003,6 +1009,8 @@ static void virtio_serial_device_unrealize(DeviceState *dev, Error **errp)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOSerial *vser = VIRTIO_SERIAL(dev);

+    QLIST_REMOVE(vser, next);
+
     unregister_savevm(dev, "virtio-console", vser);

     g_free(vser->ivqs);
@@ -1027,6 +1035,8 @@ static void virtio_serial_class_init(ObjectClass *klass, void *data)
     DeviceClass *dc = DEVICE_CLASS(klass);
     VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);

+    QLIST_INIT(&vserdevices.devices);
+
     dc->props = virtio_serial_properties;
     set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
     vdc->realize = virtio_serial_device_realize;
diff --git a/include/hw/virtio/virtio-serial.h b/include/hw/virtio/virtio-serial.h
index 4746312..a679e54 100644
--- a/include/hw/virtio/virtio-serial.h
+++ b/include/hw/virtio/virtio-serial.h
@@ -202,6 +202,8 @@ struct VirtIOSerial {

     QTAILQ_HEAD(, VirtIOSerialPort) ports;

+    QLIST_ENTRY(VirtIOSerial) next;
+
     /* bitmap for identifying active ports */
     uint32_t *ports_map;
-- 1.9.3
[Qemu-devel] [PATCH 2/2] virtio-serial: search for duplicate port names before adding new ports
Before adding new ports to VirtIOSerial devices, check if there's a conflict in the 'name' parameter. This ensures two virtserialports with identical names are not initialized. Reported-by: mazh...@redhat.com Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/char/virtio-serial-bus.c | 22 ++ 1 file changed, 22 insertions(+)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 8c26f4e..76e4228 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -56,6 +56,22 @@ static VirtIOSerialPort *find_port_by_vq(VirtIOSerial *vser, VirtQueue *vq)
     return NULL;
 }

+static VirtIOSerialPort *find_port_by_name(char *name)
+{
+    VirtIOSerial *vser;
+
+    QLIST_FOREACH(vser, &vserdevices.devices, next) {
+        VirtIOSerialPort *port;
+
+        QTAILQ_FOREACH(port, &vser->ports, next) {
+            if (!strcmp(port->name, name)) {
+                return port;
+            }
+        }
+    }
+    return NULL;
+}
+
 static bool use_multiport(VirtIOSerial *vser)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(vser);
@@ -847,6 +863,12 @@ static void virtser_port_device_realize(DeviceState *dev, Error **errp)
         return;
     }

+    if (find_port_by_name(port->name)) {
+        error_setg(errp, "virtio-serial-bus: A port already exists by name %s",
+                   port->name);
+        return;
+    }
+
     if (port->id == VIRTIO_CONSOLE_BAD_ID) {
         if (plugging_port0) {
             port->id = 0;
-- 1.9.3
Re: [Qemu-devel] [PATCH] linux-user: Add binfmt wrapper
Riku Voipio riku.voi...@iki.fi wrote on 2014/07/16 08:54:45: On Tue, Jul 15, 2014 at 05:11:48PM +0200, Joakim Tjernlund wrote: Riku Voipio riku.voi...@iki.fi wrote on 2014/07/15 16:12:26: On Mon, Jul 14, 2014 at 05:38:49PM +0200, Joakim Tjernlund wrote: Alexander Graf ag...@suse.de wrote on 2014/07/14 17:21:33: On 14.07.14 16:38, Joakim Tjernlund wrote: The popular binfmt-wrapper patch adds an additional executable which mangles argv suitably for binfmt flag P. In a chroot you need both the (statically linked) qemu-$arch and qemu-$arch-binfmt-wrapper. This is suboptimal and a better approach is to recognize the -binfmt-wrapper extension within linux-user (qemu-$arch) and mangle argv there. This just produces one executable which can be either copied to the chroot or bind mounted with the appropriate -binfmt-wrapper suffix. Signed-off-by: Joakim Tjernlund joakim.tjernl...@transmode.se Please make sure to CC Riku on patches like this - he is the linux-user maintainer. Doesn't he read the devel list? Anyhow CC:ed I do - but CC gets directly to my inbox while qemu-devel goes to a folder. I take from this discussion, that this patch has been superseded by the patch at: http://patchwork.ozlabs.org/patch/369770/ ?
BTW, any chance qemu binfmt could fixup the ps output from within a container:

jocke-ppc2 ~ # ps uaxww
USER  PID %CPU %MEM     VSZ  RSS TTY     STAT START TIME COMMAND
root    1  0.1  0.0 4138016 7600 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /sbin/init /sbin/init
root   79  0.0  0.0 4138016 5792 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /usr/sbin/sshd /usr/sbin/sshd
root  293  0.0  0.0 4137952 4072 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /bin/busybox /bin/busybox udhcpc -x hostname:jocke-ppc2 --interface=eth0 --now --script=/lib/netifrc/sh/udhcpc-hook.sh --pidfile=/var/run/udhcpc-eth0.pid
root  334  0.3  0.0 4138016 5964 tty1    Ss+  17:02 0:00 /usr/bin/qemu-ppc /sbin/agetty /sbin/agetty 38400 tty1 linux
root  335  3.1  0.0 4138048 7064 console Ss   17:02 0:00 /usr/bin/qemu-ppc /bin/login /bin/login -- root
root  336  3.3  0.0 4138016 9764 console S    17:02 0:00 /usr/bin/qemu-ppc /bin/bash -bash
root  340  0.0  0.0 4138016 6336 ?       R+   Jul10 0:00 /bin/ps ps uaxww

As you can see, qemu-ppc is visible. This isn't something binfmt could do. ps uses /proc/$pid/exe to map the right binary. However, qemu already fixes /proc/self/exe to point to the right file - as you can see from the last line where ps uaxww doesn't have qemu shown. So it would be possible to make do_open() in syscall.c do similar mapping for /proc/$pid/exe. Exactly how and when the qemu processes should be hidden within qemu needs careful thought though. A naive approach would check if /proc/$pid/exe points to the same binary as /proc/self/exe, hiding at least same-architecture qemu processes. Took a look at do_open(), what horror to read what you do there. How on earth is this supposed to work? You don't separate normal qemu invocation from binfmt, only adjust self stuff, always only remove one entry (binfmt P flag is 2 entries). Reasonably this hiding should only be performed when started by binfmt? What are the other uses for ? Jocke
Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
* Paolo Bonzini (pbonz...@redhat.com) wrote: Il 16/07/2014 11:37, Dr. David Alan Gilbert ha scritto:

+    /* If it's already open, return it */
+    if (qfs->file->return_path) {
+        return qfs->file->return_path;

Wouldn't this leave a dangling file descriptor if you call socket_dup_return_path twice, and then close the original QEMUFile? Hmm - how? The problem is that there is no reference count on QEMUFile, so if you do

    f1 = open_return_path(f0);
    f2 = open_return_path(f0);   /* now f1 == f2 */
    qemu_fclose(f1);             /* now f2 is dangling */

I think from the way I'm using it, I can remove the optimisation, but I do need to check. I'm not too sure what your worry is about 'f2' in this case; I guess the caller needs to know that it should only close the return path once - is that your worry? The remark about closing the original QEMUFile is also related to this part:

    if (result) {
        /* We are the reverse path of our reverse path (although I don't
           expect this to be used, it would stop another dup if it was) */
        result->return_path = qfs->file;

which has a similar bug

    f1 = open_return_path(f0);
    f2 = open_return_path(f1);   /* now f2 == f0 */
    qemu_fclose(f0);             /* now f2 is dangling */

If this is correct, the simplest fix is to drop the optimization. I'm more nervous about dropping that one, because the current scheme does provide a clean way of finding the forward path if you've got the reverse (although I don't think I make use of it). Source side Forward path - written by migration thread : It's OK for this to be blocking, but we end up with it being non-blocking, and modify the socket code to emulate blocking. This likely has a performance impact though. The first migration thread code drop from Juan already improved throughput a lot, even if it kept the iothread all the time and only converted from nonblocking writes to blocking. Can you give some more reasoning as to why you think this will hit the performance so much; I thought the output buffers were quite big anyway.
Return path - opened by main thread, read by fd_handler on main thread : Must be non-blocking so as not to block the main thread while waiting for a partially sent command. Why can't you handle this in the migration thread (or a new postcopy thread on the source side)? Then it can stay blocking. Handling it within the migration thread would make it much more complicated (which would be bad since it's already complex enough); a return path thread on the source side, hmm yes that could do it - especially since the migration thread is already a separate thread from the main thread handling this and thus already needs locking paraphernalia. Destination side Forward path - read by main thread This must be nonblocking so that the monitor keeps responding. Interesting, I suspect none of the code in there is set up for that at the moment, so how does that work during migration at the moment? Actually, I see I missed something here; this should be: Destination side Forward path - read by main thread, and listener thread (see the separate mail that described that listener thread) and that means that once postcopy is going (and the listener thread started) it can't block the monitor. Return path - opened by main thread, written by main thread AND postcopy thread (protected by rp_mutex) When does the main thread need to write? Not much; the only things the main thread currently responds to are the ReqAck (ping-like) requests; those are turning out to be very useful during debug; I've also got the ability for the destination to send a migration result back to the source which seems useful to be able to 'fail' early. If it doesn't need that, you can just switch to blocking when you process the listen command (i.e. when the postcopy thread starts). Why don't I just do it anyway? Prior to postcopy starting we're in the same situation as we're in with precopy today, so we can already get main-thread blocking. Dave Paolo I think I'm OK with both these being blocking. -- Dr.
David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote: On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote: On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 15/07/2014 23:25, Andrey Korolyov ha scritto: On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote: On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov and...@xdel.ru wrote: On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah amit.s...@redhat.com wrote: On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote: Hello, the issue is not specific to the iothread code because generic virtio-blk also hangs up: Do you know which version works well? If you could bisect, that'll help a lot. Thanks, Amit Hi, 2.0 works definitely well. I`ll try to finish bisection today, though every step takes about 10 minutes to complete. Yay! It is even outside of virtio-blk. commit 9b1786829aefb83f37a8f3135e3ea91c56001b56 Author: Marcelo Tosatti mtosa...@redhat.com Date: Tue Jun 3 13:34:48 2014 -0300 kvmclock: Ensure proper env-tsc value for kvmclock_current_nsec calculation Ensure proper env-tsc value for kvmclock_current_nsec calculation. Reported-by: Marcin Gibuła m.gib...@beyond.pl Cc: qemu-sta...@nongnu.org Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com Andrey, Can you please provide instructions on how to create reproducible environment? The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing. 
diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 272a88a..feb5fc5 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -17,7 +17,6 @@
 #include "qemu/host-utils.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
-#include "sysemu/cpus.h"
 #include "hw/sysbus.h"
 #include "hw/kvm/clock.h"
@@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
     cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));

-    assert(time.tsc_timestamp <= migration_tsc);
     delta = migration_tsc - time.tsc_timestamp;
     if (time.tsc_shift < 0) {
         delta >>= -time.tsc_shift;
@@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
         if (s->clock_valid) {
             return;
         }
-
-        cpu_synchronize_all_states();
         ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
         if (ret < 0) {
             fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
diff --git a/migration.c b/migration.c
index 8d675b3..34f2325 100644
--- a/migration.c
+++ b/migration.c
@@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
             qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
             old_vm_running = runstate_is_running();

+            cpu_synchronize_all_states();
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
             if (ret >= 0) {
                 qemu_file_set_rate_limit(s->file, INT64_MAX);

It could also be useful to apply the above patch _and_ revert a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce. Paolo Yes, it solved the issue for me! (it took much time to check because most of the country's debian mirrors went inconsistent for some reason) Also trivial addition:

diff --git a/migration.c b/migration.c
index 34f2325..65d1c88 100644
--- a/migration.c
+++ b/migration.c
@@ -25,6 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "sysemu/cpus.h"

And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf ? That is, test with a stock qemu.git tree and the patch sent today, on this thread, to move cpu_synchronize_all_states ?
The main reason for things to work for me is a revert of 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, not adding any other patches. I had tested two cases: one with Alexander's patch completely reverted plus the suggestion from Marcelo, and one with only a revert of 9b178682 plus the same suggestion. The difference is that until Alexander's patch is reverted, live migration always fails at the timeout value, and when it is reverted, migration always succeeds in 8-10 seconds. Appropriate diffs are attached for reference. Andrey, Can you please apply only the following attached patch to an upstream QEMU git tree (move_synchronize_all_states.patch), plus
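For reference, the tsc_shift/tsc_to_system_mul arithmetic that kvmclock_current_nsec() relies on follows the pvclock ABI: the TSC delta is first shifted by tsc_shift (negative means right shift), then scaled by a 32.32 fixed-point multiplier. A self-contained sketch of that conversion, not the QEMU function itself (the single 64-bit multiply here is a simplification that can overflow for very large deltas):

```c
#include <stdint.h>

/* Convert a guest TSC delta to nanoseconds per the pvclock ABI:
 * shift by tsc_shift, then apply the 32.32 fixed-point multiplier. */
static uint64_t pvclock_delta_to_ns(uint64_t delta, int8_t tsc_shift,
                                    uint32_t tsc_to_system_mul)
{
    if (tsc_shift >= 0) {
        delta <<= tsc_shift;
    } else {
        delta >>= -tsc_shift;
    }
    return (delta * (uint64_t)tsc_to_system_mul) >> 32;
}
```

This is also why the removed assertion mattered: if migration_tsc is behind time.tsc_timestamp, the unsigned delta wraps to a huge value before ever reaching this scaling step.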
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On Wed, Jul 16, 2014 at 09:35:16AM +0200, Marcin Gibuła wrote: Andrey, Can you please provide instructions on how to create reproducible environment? The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing. Marcelo, the original reason for the patch adding cpu_synchronize_all_states() there was that this bug affected non-migration operations as well - http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00472.html. Won't moving it only to migration code break these things again? Yes - it's just for debug purposes.
Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging
On 16 July 2014 10:17, Alex Bennée alex.ben...@linaro.org wrote: On aarch64 we do:

#define unsupported_encoding(s, insn)                                    \
    do {                                                                 \
        qemu_log_mask(LOG_UNIMP,                                         \
                      "%s:%d: unsupported instruction encoding 0x%08x "  \
                      "at pc=%016" PRIx64 "\n",                          \
                      __FILE__, __LINE__, insn, s->pc - 4);              \
        unallocated_encoding(s);                                         \
    } while (0);

So we signal it's unimplemented before falling through to the signal generating code. Yes, but that macro is only for instructions which exist but which QEMU doesn't implement, not for instructions which exist and which we do have an implementation of but which we generate an exception for because they aren't present on this particular CPU. That's a separate issue from whether you might want to optionally log every SIGILL a process generates. thanks -- PMM
[Qemu-devel] [RFC PATCH V4 3/6] icount: Make icount_time_shift available everywhere
icount_time_shift is used for calculating the delay qemu has to sleep in order to synchronise the host and guest clocks. Therefore, we need it in cpu-exec.c. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c | 8 ++-- include/qemu-common.h | 1 + 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index 34cc4c8..de0a8b2 100644
--- a/cpus.c
+++ b/cpus.c
@@ -105,8 +105,12 @@ static bool all_cpu_threads_idle(void)
 /* Compensate for varying guest execution speed. */
 static int64_t qemu_icount_bias;
 static int64_t vm_clock_warp_start;
-/* Conversion factor from emulated instructions to virtual clock ticks. */
-static int icount_time_shift;
+/* Conversion factor from emulated instructions to virtual clock ticks.
+ * icount_time_shift is defined as extern in include/qemu-common.h because
+ * it is used (in cpu-exec.c) for calculating the delay for sleeping
+ * qemu in order to align the host and virtual clocks.
+ */
+int icount_time_shift;
 /* Arbitrarily pick 1MIPS as the minimum allowable speed. */
 #define MAX_ICOUNT_SHIFT 10
diff --git a/include/qemu-common.h b/include/qemu-common.h
index 860bb15..d47aa02 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -109,6 +109,7 @@ static inline char *realpath(const char *path, char *resolved_path)
 void configure_icount(QemuOpts *opts, Error **errp);
 extern int use_icount;
 extern int icount_align_option;
+extern int icount_time_shift;

 #include "qemu/osdep.h"
 #include "qemu/bswap.h"
-- 2.0.0.rc2
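As a rough illustration of why cpu-exec.c needs icount_time_shift: virtual time advances 2^shift ns per executed instruction, and the delay to sleep is the amount by which that virtual time has run ahead of the host clock. This is a hypothetical helper capturing the arithmetic, not the actual align code:

```c
#include <stdint.h>

/* Virtual time advances 2^icount_time_shift ns per executed instruction.
 * If the guest has run ahead of the host clock, the host must sleep the
 * difference to keep the two aligned; if the guest is behind, no sleep
 * is needed. */
static int64_t icount_align_delay_ns(int64_t executed_insns,
                                     int icount_time_shift,
                                     int64_t host_elapsed_ns)
{
    int64_t virtual_ns = executed_insns << icount_time_shift;
    int64_t delay = virtual_ns - host_elapsed_ns;
    return delay > 0 ? delay : 0;
}
```

With shift=3, for example, 1000 instructions account for 8000 ns of virtual time.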
[Qemu-devel] [RFC PATCH V4 1/6] icount: Add QemuOpts for icount
Make icount parameter use QemuOpts style options in order to easily add other suboptions. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c | 10 +- include/qemu-common.h | 3 ++- qemu-options.hx | 4 ++-- qtest.c | 13 +++-- vl.c | 35 --- 5 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/cpus.c b/cpus.c
index 5e7f2cf..dcca96a 100644
--- a/cpus.c
+++ b/cpus.c
@@ -440,13 +440,21 @@ static const VMStateDescription vmstate_timers = {
     }
 };

-void configure_icount(const char *option)
+void configure_icount(QemuOpts *opts, Error **errp)
 {
+    const char *option;
+
     seqlock_init(&timers_state.vm_clock_seqlock, NULL);
     vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
+    option = qemu_opt_get(opts, "shift");
     if (!option) {
         return;
     }
+    /* When using -icount shift, the shift option will be
+       misinterpreted as a boolean */
+    if (strcmp(option, "on") == 0 || strcmp(option, "off") == 0) {
+        error_setg(errp, "The shift option must be a number or auto");
+    }
     icount_warp_timer = timer_new_ns(QEMU_CLOCK_REALTIME,
                                      icount_warp_rt, NULL);
diff --git a/include/qemu-common.h b/include/qemu-common.h
index ae76197..cc346ec 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -41,6 +41,7 @@
 #include <assert.h>
 #include <signal.h>
 #include "glib-compat.h"
+#include "qemu/option.h"

 #ifdef _WIN32
 #include "sysemu/os-win32.h"
@@ -105,7 +106,7 @@ static inline char *realpath(const char *path, char *resolved_path)
 #endif

 /* icount */
-void configure_icount(const char *option);
+void configure_icount(QemuOpts *opts, Error **errp);
 extern int use_icount;

 #include "qemu/osdep.h"
diff --git a/qemu-options.hx b/qemu-options.hx
index 9e54686..143def4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3011,11 +3011,11 @@ re-inject them.
ETEXI DEF(icount, HAS_ARG, QEMU_OPTION_icount, \ --icount [N|auto]\n \ +-icount [shift=N|auto]\n \ enable virtual instruction counter with 2^N clock ticks per\n \ instruction\n, QEMU_ARCH_ALL) STEXI -@item -icount [@var{N}|auto] +@item -icount [shift=@var{N}|auto] @findex -icount Enable virtual instruction counter. The virtual cpu will execute one instruction every 2^@var{N} ns of virtual time. If @code{auto} is specified diff --git a/qtest.c b/qtest.c index 04a6dc1..ef0d991 100644 --- a/qtest.c +++ b/qtest.c @@ -19,6 +19,9 @@ #include hw/irq.h #include sysemu/sysemu.h #include sysemu/cpus.h +#include qemu/config-file.h +#include qemu/option.h +#include qemu/error-report.h #define MAX_IRQ 256 @@ -509,10 +512,16 @@ static void qtest_event(void *opaque, int event) } } -int qtest_init_accel(MachineClass *mc) +static void configure_qtest_icount(const char *options) { -configure_icount(0); +QemuOpts *opts = qemu_opts_parse(qemu_find_opts(icount), options, 1); +configure_icount(opts, error_abort); +qemu_opts_del(opts); +} +int qtest_init_accel(MachineClass *mc) +{ +configure_qtest_icount(0); return 0; } diff --git a/vl.c b/vl.c index 41ddcd2..103027f 100644 --- a/vl.c +++ b/vl.c @@ -537,6 +537,20 @@ static QemuOptsList qemu_mem_opts = { }, }; +static QemuOptsList qemu_icount_opts = { +.name = icount, +.implied_opt_name = shift, +.merge_lists = true, +.head = QTAILQ_HEAD_INITIALIZER(qemu_icount_opts.head), +.desc = { +{ +.name = shift, +.type = QEMU_OPT_STRING, +}, +{ /* end of list */ } +}, +}; + /** * Get machine options * @@ -2896,13 +2910,12 @@ int main(int argc, char **argv, char **envp) { int i; int snapshot, linux_boot; -const char *icount_option = NULL; const char *initrd_filename; const char *kernel_filename, *kernel_cmdline; const char *boot_order; DisplayState *ds; int cyls, heads, secs, translation; -QemuOpts *hda_opts = NULL, *opts, *machine_opts; +QemuOpts *hda_opts = NULL, *opts, *machine_opts, *icount_opts = NULL; QemuOptsList *olist; int optind; const 
char *optarg; @@ -2967,6 +2980,7 @@ int main(int argc, char **argv, char **envp) qemu_add_opts(qemu_msg_opts); qemu_add_opts(qemu_name_opts); qemu_add_opts(qemu_numa_opts); +qemu_add_opts(qemu_icount_opts); runstate_init(); @@ -3817,7 +3831,11 @@ int main(int argc, char **argv, char **envp) } break; case QEMU_OPTION_icount: -icount_option = optarg; +icount_opts = qemu_opts_parse(qemu_find_opts(icount), + optarg, 1); +if (!icount_opts) { +exit(1); +
[Qemu-devel] [RFC PATCH V4 4/6] cpu_exec: Add sleeping algorithm
The goal is to make QEMU sleep whenever the guest clock is in advance compared to the host clock (we use the monotonic clocks). The amount of time to sleep is calculated in the execution loop in cpu_exec. At first, we tried to approximate at each for loop the real time elapsed while searching for a TB (generating or retrieving from cache) and executing it. We would then approximate the virtual time corresponding to the number of virtual instructions executed. The difference between these 2 values would allow us to know if the guest is in advance or delayed. However, the function used for measuring the real time (qemu_clock_get_ns(QEMU_CLOCK_REALTIME)) proved to be very expensive. We had an added overhead of 13% of the total run time. Therefore, we modified the algorithm and only take into account the difference between the 2 clocks at the beginning of the cpu_exec function. During the for loop we try to reduce the advance of the guest only by computing the virtual time elapsed and sleeping if necessary. The overhead is thus reduced to 3%. Even though this method still has a noticeable overhead, it no longer is a bottleneck in trying to achieve a better guest frequency for which the guest clock is faster than the host one. As for the alignment of the 2 clocks, with the first algorithm the guest clock was oscillating between -1 and 1ms compared to the host clock. Using the second algorithm we notice that the guest is 5ms behind the host, which is still acceptable for our use case. The tests were conducted using fio and stress. The host machine is an i5 CPU at 3.10GHz running Debian Jessie (kernel 3.12). The guest machine is an arm versatile-pb built with buildroot. Currently, on our test machine, the lowest icount we can achieve that is suitable for aligning the 2 clocks is 6. However, we observe that the IO tests (using fio) are slower than the cpu tests (using stress).
Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpu-exec.c | 97 ++ 1 file changed, 97 insertions(+) diff --git a/cpu-exec.c b/cpu-exec.c index 38e5f02..9ed6ac1 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -22,6 +22,90 @@ #include tcg.h #include qemu/atomic.h #include sysemu/qtest.h +#include qemu/timer.h + +/* -icount align implementation. */ + +typedef struct SyncClocks { +int64_t diff_clk; +int64_t original_instr_counter; +} SyncClocks; + +#if !defined(CONFIG_USER_ONLY) +/* Allow the guest to have a max 3ms advance. + * The difference between the 2 clocks could therefore + * oscillate around 0. + */ +#define VM_CLOCK_ADVANCE 300 + +static int64_t delay_host(int64_t diff_clk) +{ +if (diff_clk VM_CLOCK_ADVANCE) { +#ifndef _WIN32 +struct timespec sleep_delay, rem_delay; +sleep_delay.tv_sec = diff_clk / 10LL; +sleep_delay.tv_nsec = diff_clk % 10LL; +if (nanosleep(sleep_delay, rem_delay) 0) { +diff_clk -= (sleep_delay.tv_sec - rem_delay.tv_sec) * 10LL; +diff_clk -= sleep_delay.tv_nsec - rem_delay.tv_nsec; +} else { +diff_clk = 0; +} +#else +Sleep(diff_clk / SCALE_MS); +diff_clk = 0; +#endif +} +return diff_clk; +} + +static int64_t instr_to_vtime(int64_t instr_counter, const CPUState *cpu) +{ +int64_t instr_exec_time; +instr_exec_time = instr_counter - + (cpu-icount_extra + + cpu-icount_decr.u16.low); +instr_exec_time = instr_exec_time icount_time_shift; + +return instr_exec_time; +} + +static void align_clocks(SyncClocks *sc, const CPUState *cpu) +{ +if (!icount_align_option) { +return; +} +sc-diff_clk += instr_to_vtime(sc-original_instr_counter, cpu); +sc-original_instr_counter = cpu-icount_extra + cpu-icount_decr.u16.low; +sc-diff_clk = delay_host(sc-diff_clk); +} + +static void init_delay_params(SyncClocks *sc, + const CPUState *cpu) +{ +static int64_t clocks_offset; +if (!icount_align_option) { +return; +} +int64_t realtime_clock_value, 
virtual_clock_value; +realtime_clock_value = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); +virtual_clock_value = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); +/* Compute offset between the 2 clocks. */ +if (!clocks_offset) { +clocks_offset = realtime_clock_value - virtual_clock_value; +} +sc-diff_clk = virtual_clock_value - realtime_clock_value + clocks_offset; +sc-original_instr_counter = cpu-icount_extra + cpu-icount_decr.u16.low; +} +#else +static void align_clocks(SyncClocks *sc, const CPUState *cpu) +{ +} + +static void init_delay_params(SyncClocks *sc, const CPUState *cpu) +{ +} +#endif /* CONFIG USER ONLY */ void cpu_loop_exit(CPUState *cpu) { @@ -227,6 +311,8 @@ int cpu_exec(CPUArchState *env)
[Qemu-devel] [RFC PATCH V4 5/6] cpu_exec: Print to console if the guest is late
If the align option is enabled, we print a message to the user whenever the guest clock is behind the host clock, so that they have a hint about the actual performance. The maximum print interval is 2s and we limit the number of messages to 100. If desired, this can be changed in cpu-exec.c Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr Conflicts: cpu-exec.c --- cpu-exec.c | 39 ++- 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index 9ed6ac1..b62df15 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -29,6 +29,7 @@ typedef struct SyncClocks { int64_t diff_clk; int64_t original_instr_counter; +int64_t realtime_clock; } SyncClocks; #if !defined(CONFIG_USER_ONLY) @@ -37,6 +38,9 @@ typedef struct SyncClocks { * oscillate around 0. */ #define VM_CLOCK_ADVANCE 300 +#define THRESHOLD_REDUCE 1.5 +#define MAX_DELAY_PRINT_RATE 2 +#define MAX_NB_PRINTS 100 static int64_t delay_host(int64_t diff_clk) { @@ -80,6 +84,28 @@ static void align_clocks(SyncClocks *sc, const CPUState *cpu) sc-diff_clk = delay_host(sc-diff_clk); } +static void print_delay(const SyncClocks *sc) +{ +static float threshold_delay; +static int64_t last_realtime_clock; +static int nb_prints; + +if (icount_align_option +(sc-realtime_clock - last_realtime_clock) / 10LL += MAX_DELAY_PRINT_RATE nb_prints MAX_NB_PRINTS) { +if ((-sc-diff_clk / (float)10LL threshold_delay) || +(-sc-diff_clk / (float)10LL + (threshold_delay - THRESHOLD_REDUCE))) { +threshold_delay = (-sc-diff_clk / 10LL) + 1; +printf(Warning: The guest is now late by %.1f to %.1f seconds\n, + threshold_delay - 1, + threshold_delay); +nb_prints++; +last_realtime_clock = sc-realtime_clock; +} +} +} + static void init_delay_params(SyncClocks *sc, const CPUState *cpu) { @@ -87,15 +113,18 @@ static void init_delay_params(SyncClocks *sc, if (!icount_align_option) { return; } -int64_t realtime_clock_value, virtual_clock_value; -realtime_clock_value =
qemu_clock_get_ns(QEMU_CLOCK_REALTIME); -virtual_clock_value = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); +sc-realtime_clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); /* Compute offset between the 2 clocks. */ if (!clocks_offset) { -clocks_offset = realtime_clock_value - virtual_clock_value; +clocks_offset = sc-realtime_clock - +qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); } -sc-diff_clk = virtual_clock_value - realtime_clock_value + clocks_offset; +sc-diff_clk = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - + sc-realtime_clock + clocks_offset; sc-original_instr_counter = cpu-icount_extra + cpu-icount_decr.u16.low; +/* Print every 2s max if the guest is late. We limit the number + of printed messages to NB_PRINT_MAX(currently 100) */ +print_delay(sc); } #else static void align_clocks(SyncClocks *sc, const CPUState *cpu) -- 2.0.0.rc2
[Qemu-devel] [RFC PATCH V4 2/6] icount: Add align option to icount
The align option is used for activating the align algorithm in order to synchronise the host clock and the guest clock. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr --- cpus.c| 19 --- include/qemu-common.h | 1 + qemu-options.hx | 15 +-- vl.c | 4 4 files changed, 30 insertions(+), 9 deletions(-) diff --git a/cpus.c b/cpus.c index dcca96a..34cc4c8 100644 --- a/cpus.c +++ b/cpus.c @@ -443,25 +443,30 @@ static const VMStateDescription vmstate_timers = { void configure_icount(QemuOpts *opts, Error **errp) { const char *option; +char *rem_str = NULL; seqlock_init(timers_state.vm_clock_seqlock, NULL); vmstate_register(NULL, 0, vmstate_timers, timers_state); option = qemu_opt_get(opts, shift); if (!option) { +if (qemu_opt_get(opts, align) != NULL) { +error_setg(errp, Please specify shift option when using align); +} return; } -/* When using -icount shift, the shift option will be - misinterpreted as a boolean */ -if (strcmp(option, on) == 0 || strcmp(option, off) == 0) { -error_setg(errp, The shift option must be a number or auto); -} - +icount_align_option = qemu_opt_get_bool(opts, align, false); icount_warp_timer = timer_new_ns(QEMU_CLOCK_REALTIME, icount_warp_rt, NULL); if (strcmp(option, auto) != 0) { -icount_time_shift = strtol(option, NULL, 0); +errno = 0; +icount_time_shift = strtol(option, rem_str, 0); +if (errno != 0 || *rem_str != '\0' || !strlen(option)) { +error_setg(errp, icount: Invalid shift value); +} use_icount = 1; return; +} else if (icount_align_option) { +error_setg(errp, shift=auto and align=on are incompatible); } use_icount = 2; diff --git a/include/qemu-common.h b/include/qemu-common.h index cc346ec..860bb15 100644 --- a/include/qemu-common.h +++ b/include/qemu-common.h @@ -108,6 +108,7 @@ static inline char *realpath(const char *path, char *resolved_path) /* icount */ void configure_icount(QemuOpts *opts, Error **errp); extern int use_icount; +extern int icount_align_option; 
#include qemu/osdep.h #include qemu/bswap.h diff --git a/qemu-options.hx b/qemu-options.hx index 143def4..379d932 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -3011,9 +3011,9 @@ re-inject them. ETEXI DEF(icount, HAS_ARG, QEMU_OPTION_icount, \ --icount [shift=N|auto]\n \ +-icount [shift=N|auto][,align=on|off]\n \ enable virtual instruction counter with 2^N clock ticks per\n \ -instruction\n, QEMU_ARCH_ALL) +instruction and enable aligning the host and virtual clocks\n, QEMU_ARCH_ALL) STEXI @item -icount [shift=@var{N}|auto] @findex -icount @@ -3026,6 +3026,17 @@ Note that while this option can give deterministic behavior, it does not provide cycle accurate emulation. Modern CPUs contain superscalar out of order cores with complex cache hierarchies. The number of instructions executed often has little or no correlation with actual performance. + +@option{align=on} will activate the delay algorithm which will try +to synchronise the host clock and the virtual clock. The goal is to +have a guest running at the real frequency imposed by the shift option. +Whenever the guest clock is behind the host clock and if +@option{align=on} is specified then we print a message to the user +to inform them about the delay. +Currently this option does not work when @option{shift} is @code{auto}. +Note: The sync algorithm will work for those shift values for which +the guest clock runs ahead of the host clock. Typically this happens +when the shift value is high (how high depends on the host machine).
ETEXI DEF(watchdog, HAS_ARG, QEMU_OPTION_watchdog, \ diff --git a/vl.c b/vl.c index 103027f..d270070 100644 --- a/vl.c +++ b/vl.c @@ -183,6 +183,7 @@ uint8_t *boot_splash_filedata; size_t boot_splash_filedata_size; uint8_t qemu_extra_params_fw[2]; +int icount_align_option; typedef struct FWBootEntry FWBootEntry; struct FWBootEntry { @@ -546,6 +547,9 @@ static QemuOptsList qemu_icount_opts = { { .name = shift, .type = QEMU_OPT_STRING, +}, { +.name = align, +.type = QEMU_OPT_BOOL, }, { /* end of list */ } }, -- 2.0.0.rc2
[Qemu-devel] [RFC PATCH V4 6/6] monitor: Add drift info to 'info jit'
Show in 'info jit' the current delay between the host clock and the guest clock. In addition, print the maximum advance and delay of the guest compared to the host. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr --- cpu-exec.c| 19 ++- cpus.c| 17 + include/qemu-common.h | 5 + monitor.c | 1 + 4 files changed, 37 insertions(+), 5 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index b62df15..055e0e3 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -109,19 +109,28 @@ static void print_delay(const SyncClocks *sc) static void init_delay_params(SyncClocks *sc, const CPUState *cpu) { -static int64_t clocks_offset; -if (!icount_align_option) { -return; +static int64_t realtime_clock_value; +if (icount_align_option || !realtime_clock_value) { +realtime_clock_value = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); } -sc-realtime_clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); /* Compute offset between the 2 clocks. */ if (!clocks_offset) { -clocks_offset = sc-realtime_clock - +clocks_offset = realtime_clock_value - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); } +if (!icount_align_option) { +return; +} +sc-realtime_clock = realtime_clock_value; sc-diff_clk = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - sc-realtime_clock + clocks_offset; sc-original_instr_counter = cpu-icount_extra + cpu-icount_decr.u16.low; +if (sc-diff_clk max_delay) { +max_delay = sc-diff_clk; +} +if (sc-diff_clk max_advance) { +max_advance = sc-diff_clk; +} /* Print every 2s max if the guest is late. 
We limit the number of printed messages to NB_PRINT_MAX(currently 100) */ print_delay(sc); diff --git a/cpus.c b/cpus.c index de0a8b2..3c3fb64 100644 --- a/cpus.c +++ b/cpus.c @@ -64,6 +64,9 @@ #endif /* CONFIG_LINUX */ static CPUState *next_cpu; +int64_t clocks_offset; +int64_t max_delay; +int64_t max_advance; bool cpu_is_stopped(CPUState *cpu) { @@ -1504,3 +1507,17 @@ void qmp_inject_nmi(Error **errp) error_set(errp, QERR_UNSUPPORTED); #endif } + +void dump_drift_info(FILE *f, fprintf_function cpu_fprintf) +{ +cpu_fprintf(f, Host - Guest clock %ld(ms)\n, +(qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - clocks_offset - + qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL))/SCALE_MS); +if (icount_align_option) { +cpu_fprintf(f, Max guest delay %ld(ms)\n, -max_delay/SCALE_MS); +cpu_fprintf(f, Max guest advance %ld(ms)\n, max_advance/SCALE_MS); +} else { +cpu_fprintf(f, Max guest delay NA\n); +cpu_fprintf(f, Max guest advance NA\n); +} +} diff --git a/include/qemu-common.h b/include/qemu-common.h index d47aa02..fbc1bca 100644 --- a/include/qemu-common.h +++ b/include/qemu-common.h @@ -110,6 +110,11 @@ void configure_icount(QemuOpts *opts, Error **errp); extern int use_icount; extern int icount_align_option; extern int icount_time_shift; +extern int64_t clocks_offset; +/* drift information for info jit command */ +extern int64_t max_delay; +extern int64_t max_advance; +void dump_drift_info(FILE *f, fprintf_function cpu_fprintf); #include qemu/osdep.h #include qemu/bswap.h diff --git a/monitor.c b/monitor.c index 5bc70a6..cdbaa60 100644 --- a/monitor.c +++ b/monitor.c @@ -1047,6 +1047,7 @@ static void do_info_registers(Monitor *mon, const QDict *qdict) static void do_info_jit(Monitor *mon, const QDict *qdict) { dump_exec_info((FILE *)mon, monitor_fprintf); +dump_drift_info((FILE *)mon, monitor_fprintf); } static void do_info_history(Monitor *mon, const QDict *qdict) -- 2.0.0.rc2
[Qemu-devel] [RFC PATCH V4 0/6] icount: Implement delay algorithm between guest and host clocks
The icount option already implemented in QEMU allows the guest to run at a theoretical frequency of 1/(2^N) GHz (N is the icount parameter). The goal of this patch is to have a real guest frequency close to the one imposed by using the icount option. The main idea behind the algorithm is that we compare the virtual monotonic clock and the host monotonic clock. For big icounts (on our test machine, an i5 CPU @ 3.10GHz, icounts starting at 6) the guest clock will be ahead of the host clock. In this case, we try to make QEMU sleep for the difference between the 2 clocks. Therefore, the guest would have executed for a period almost equal to the one imposed by icount. We should point out that the algorithm works only for those icounts that allow the guest clock to be in front of the host clock. The first patch adds support for QemuOpts for the 'icount' parameter. It also adds a suboption called 'shift' that will hold the value for 'icount'. Therefore we now have -icount shift=N|auto or -icount N|auto. The second patch adds the 'align' suboption for icount. The third patch exports 'icount_time_shift' so that it can be used in places other than cpus.c; we need it in cpu-exec.c for calculating how long we want QEMU to sleep. The fourth patch implements the algorithm used for calculating the delay we want to sleep. It uses the number of instructions executed by the virtual cpu and also 'icount_time_shift'. The fifth patch prints to the console whenever the guest clock runs behind the host clock. The fastest printing speed is every 2 seconds, and we only print if the align option is enabled. We also have a limit of 100 printed messages. The sixth patch adds information about the difference between the host and guest clocks (taking into account the offset) in the 'info jit' command. We also print the maximum delay and advance of the guest clock compared to the host clock.
v3 -> v4 * Add better error handling for 'strtol' in patch 2 * Add 'Sleep' instead of 'nanosleep' for Windows hosts in patch 4 * Remove function pointers from patches 4 and 5 Sebastian Tanase (6): icount: Add QemuOpts for icount icount: Add align option to icount icount: Make icount_time_shift available everywhere cpu_exec: Add sleeping algorithm cpu_exec: Print to console if the guest is late monitor: Add drift info to 'info jit' cpu-exec.c| 135 ++ cpus.c| 44 ++-- include/qemu-common.h | 10 +++- monitor.c | 1 + qemu-options.hx | 17 +-- qtest.c | 13 - vl.c | 39 --- 7 files changed, 241 insertions(+), 18 deletions(-) -- 2.0.0.rc2
Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
Il 16/07/2014 13:52, Dr. David Alan Gilbert ha scritto: * Paolo Bonzini (pbonz...@redhat.com) wrote: Il 16/07/2014 11:37, Dr. David Alan Gilbert ha scritto: + +/* If it's already open, return it */ +if (qfs-file-return_path) { +return qfs-file-return_path; Wouldn't this leave a dangling file descriptor if you call socket_dup_return_path twice, and then close the original QEMUFile? Hmm - how? The problem is that there is no reference count on QEMUFile, so if you do f1 = open_return_path(f0); f2 = open_return_path(f0); /* now f1 == f2 */ qemu_fclose(f1); /* now f2 is dangling */ I think from the way I'm using it, I can remove the optimisation, but I do need to check. I'm not too sure what your worry is about 'f2' in this case; I guess the caller needs to know that it should only close the return path once - is that your worry? Yes. The API is not well encapsulated; a random caller of open_return_path does not know (and cannot know) whether it should close the returned file or not. I'm more nervous about dropping that one, because the current scheme does provide a clean way of finding the forward path if you've got the reverse (although I don't think I make use of it). If it isn't used, why keep it? Source side Forward path - written by migration thread : It's OK for this to be blocking, but we end up with it being non-blocking, and modify the socket code to emulate blocking. This likely has a performance impact though. The first migration thread code drop from Juan already improved throughput a lot, even if it kept the iothread all the time and only converted from nonblocking writes to blocking. Can you give some more reasoning as to why you think this will hit the performance so much; I thought the output buffers were quite big anyway. I don't really know, it's Return path - opened by main thread, read by fd_handler on main thread : Must be non-blocking so as not to block the main thread while waiting for a partially sent command. 
Why can't you handle this in the migration thread (or a new postcopy thread on the source side)? Then it can stay blocking. Handling it within the migration thread would make it much more complicated (which would be bad since it's already complex enough); Ok. I'm not sure why it is more complicated since migration is essentially two-phase, one where the source drives it and one where the source just waits for requests, but I'll trust you on this. :) Destination side Forward path - read by main thread This must be nonblocking so that the monitor keeps responding. Interesting, I suspect none of the code in there is set up for that at the moment, so how does that work during migration at the moment? It sure is. :) On the destination side, migration is done in a coroutine (see process_incoming_migration) so it's all transparent. Only socket_get_buffer has to know about this: len = qemu_recv(s->fd, buf, size, 0); if (len != -1) { break; } if (socket_error() == EAGAIN) { yield_until_fd_readable(s->fd); } else if (socket_error() != EINTR) { break; } If the socket is put in blocking mode recv will never return EAGAIN, so this code will only run if the socket is nonblocking. Actually, I see I missed something here; this should be: Destination side Forward path - read by main thread, and listener thread (see the separate mail that described that listener thread) and that means that once postcopy is going (and the listener thread started) it can't block the monitor. Ok, so the listener thread can do socket_set_block(qemu_get_fd(file)) once it gets its hands on the QEMUFile. Return path - opened by main thread, written by main thread AND postcopy thread (protected by rp_mutex) When does the main thread need to write?
Not much; the only things the main thread currently responds to are the ReqAck (ping like) requests; those are turning out to be very useful during debug; I've also got the ability for the destination to send a migration result back to the source which seems useful to be able to 'fail' early. Why can't this be done in the listener thread? (Thus transforming it into a more general postcopy migration thread; later we could even change incoming migration from a coroutine to a thread). If it doesn't need that, you can just switch to blocking when you process the listen command (i.e. when the postcopy thread starts). Why don't I just do it anyway? Prior to postcopy starting we're in the same situation as we're in with precopy today, so can already get mainblock threading. See above for the explanation. Paolo
[Qemu-devel] [PATCH for-2.2 0/5] scsi: enable passthrough of vendor-specific commands
Right now scsi-generic is parsing the CDB, in order to compute the expected number of bytes to be transferred. This is necessary if DMA is done by the HBA via scsi_req_data, but it prevents executing vendor-specific commands via scsi-generic because we don't know how to parse them. If DMA is delegated to the SCSI layer via get_sg_list, we know in advance how many bytes the guest will want to receive and we can pass the information straight from the guest to SG_IO. In this case, it is unnecessary to parse the CDB to get the same information. scsi-disk needs it to detect underruns and overruns, but scsi-generic and scsi-block can just ask the HBA about the transfer direction and size. This series introduces a new parse_cdb callback in both the device and the HBA. The latter is called by scsi_bus_parse_cdb, which devices can call for passthrough requests in their implementation of parse_cdb. Tamuki-san, can you please test if these patches are okay for your usecase? Paolo Paolo Bonzini (5): scsi-bus: prepare scsi_req_new for introduction of parse_cdb scsi-bus: introduce parse_cdb in SCSIDeviceClass and SCSIBusInfo scsi-block: extract scsi_block_is_passthrough scsi-block, scsi-generic: implement parse_cdb virtio-scsi: implement parse_cdb hw/scsi/scsi-bus.c | 68 ++ hw/scsi/scsi-disk.c| 52 +- hw/scsi/scsi-generic.c | 7 ++ hw/scsi/virtio-scsi.c | 24 ++ include/hw/scsi/scsi.h | 7 ++ 5 files changed, 124 insertions(+), 34 deletions(-) -- 1.9.3
[Qemu-devel] [PATCH 3/5] scsi-block: extract scsi_block_is_passthrough
This will be used for both scsi_block_new_request and the scsi-block implementation of parse_cdb. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi/scsi-disk.c | 38 ++ 1 file changed, 26 insertions(+), 12 deletions(-) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index d47ecd6..81b7276 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -2501,12 +2501,8 @@ static int scsi_block_initfn(SCSIDevice *dev) return scsi_initfn(s-qdev); } -static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, - uint32_t lun, uint8_t *buf, - void *hba_private) +static bool scsi_block_is_passthrough(SCSIDiskState *s, uint8_t *buf) { -SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d); - switch (buf[0]) { case READ_6: case READ_10: @@ -2523,9 +2519,9 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, case WRITE_VERIFY_12: case WRITE_VERIFY_16: /* If we are not using O_DIRECT, we might read stale data from the -* host cache if writes were made using other commands than these -* ones (such as WRITE SAME or EXTENDED COPY, etc.). So, without -* O_DIRECT everything must go through SG_IO. + * host cache if writes were made using other commands than these + * ones (such as WRITE SAME or EXTENDED COPY, etc.). So, without + * O_DIRECT everything must go through SG_IO. */ if (!(bdrv_get_flags(s-qdev.conf.bs) BDRV_O_NOCACHE)) { break; @@ -2542,13 +2538,31 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, * just make scsi-block operate the same as scsi-generic for them. 
*/ if (s-qdev.type != TYPE_ROM) { -return scsi_req_alloc(scsi_disk_dma_reqops, s-qdev, tag, lun, - hba_private); +return false; } +break; + +default: +break; } -return scsi_req_alloc(scsi_generic_req_ops, s-qdev, tag, lun, - hba_private); +return true; +} + + +static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, + uint32_t lun, uint8_t *buf, + void *hba_private) +{ +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d); + +if (scsi_block_is_passthrough(s, buf)) { +return scsi_req_alloc(scsi_generic_req_ops, s-qdev, tag, lun, + hba_private); +} else { +return scsi_req_alloc(scsi_disk_dma_reqops, s-qdev, tag, lun, + hba_private); +} } #endif -- 1.9.3
[Qemu-devel] [Bug 1342704] [NEW] error: Crash of qemu-img/qemu-io on the qcow2 image with large values in 'incompatible features' field
Public bug reported: qemu-io and qemu-img fail with an assertion (see below) at an attempt to interact with a qcow2 image having large values in the 'incompatible features' header field. util/error.c:34: error_set: Assertion `*errp == ((void *)0)' failed. The backtrace file and the test image can be found in the attachment. The backtraces are for the following command: qemu-img check -f qcow2 test_image The image was generated by the qcow2 image fuzzer. qemu.git head: 5a7348045091a2bc15 ** Affects: qemu Importance: Undecided Status: New ** Attachment added: Backtraces and the test image https://bugs.launchpad.net/bugs/1342704/+attachment/4153944/+files/artifacts.tar.gz
[Qemu-devel] [PATCH 2/5] scsi-bus: introduce parse_cdb in SCSIDeviceClass and SCSIBusInfo
These callbacks will let devices do their own request parsing, or defer it to the bus. If the bus does not provide an implementation either, fall back to the default parsing routine. Swap the first two arguments of scsi_req_parse, and rename it to scsi_req_parse_cdb, for consistency.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi/scsi-bus.c     | 25 ++---
 include/hw/scsi/scsi.h |  6 ++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index bf7ba8c..54fbf29 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -9,7 +9,7 @@
 static char *scsibus_get_dev_path(DeviceState *dev);
 static char *scsibus_get_fw_dev_path(DeviceState *dev);
-static int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf);
+static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf);
 static void scsi_req_dequeue(SCSIRequest *req);
 static uint8_t *scsi_target_alloc_buf(SCSIRequest *req, size_t len);
 static void scsi_target_free_buf(SCSIRequest *req);
@@ -54,6 +54,18 @@ static void scsi_device_destroy(SCSIDevice *s)
     }
 }
 
+int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
+                       void *hba_private)
+{
+    SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, dev->qdev.parent_bus);
+
+    if (bus->info->parse_cdb) {
+        return bus->info->parse_cdb(dev, cmd, buf, hba_private);
+    } else {
+        return scsi_req_parse_cdb(dev, cmd, buf);
+    }
+}
+
 static SCSIRequest *scsi_device_alloc_req(SCSIDevice *s, uint32_t tag, uint32_t lun,
                                           uint8_t *buf, void *hba_private)
 {
@@ -562,8 +574,10 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
 {
     SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, d->qdev.parent_bus);
     const SCSIReqOps *ops;
+    SCSIDeviceClass *sc = SCSI_DEVICE_GET_CLASS(d);
     SCSIRequest *req;
     SCSICommand cmd;
+    int ret;
 
     if ((d->unit_attention.key == UNIT_ATTENTION ||
          bus->unit_attention.key == UNIT_ATTENTION)
@@ -586,8 +600,13 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
         ops = NULL;
     }
 
-    if (scsi_req_parse(&cmd, d, buf) != 0) {
+    if (ops != NULL || !sc->parse_cdb) {
+        ret = scsi_req_parse_cdb(d, &cmd, buf);
+    } else {
+        ret = sc->parse_cdb(d, &cmd, buf, hba_private);
+    }
+
+    if (ret != 0) {
         trace_scsi_req_parse_bad(d->id, lun, tag, buf[0]);
         req = scsi_req_alloc(&reqops_invalid_opcode, d, tag, lun, hba_private);
     } else {
@@ -1188,7 +1207,7 @@ static uint64_t scsi_cmd_lba(SCSICommand *cmd)
     return lba;
 }
 
-static int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf)
+static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf)
 {
     int rc;
 
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 1adb549..4a0b860 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -76,6 +76,8 @@ typedef struct SCSIDeviceClass {
     DeviceClass parent_class;
     int (*init)(SCSIDevice *dev);
     void (*destroy)(SCSIDevice *s);
+    int (*parse_cdb)(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
+                     void *hba_private);
     SCSIRequest *(*alloc_req)(SCSIDevice *s, uint32_t tag, uint32_t lun,
                               uint8_t *buf, void *hba_private);
     void (*unit_attention_reported)(SCSIDevice *s);
@@ -131,6 +133,8 @@ struct SCSIReqOps {
 struct SCSIBusInfo {
     int tcq;
     int max_channel, max_target, max_lun;
+    int (*parse_cdb)(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
+                     void *hba_private);
     void (*transfer_data)(SCSIRequest *req, uint32_t arg);
     void (*complete)(SCSIRequest *req, uint32_t arg, size_t resid);
     void (*cancel)(SCSIRequest *req);
@@ -244,6 +248,8 @@ void scsi_req_free(SCSIRequest *req);
 SCSIRequest *scsi_req_ref(SCSIRequest *req);
 void scsi_req_unref(SCSIRequest *req);
 
+int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
+                       void *hba_private);
 void scsi_req_build_sense(SCSIRequest *req, SCSISense sense);
 void scsi_req_print(SCSIRequest *req);
 void scsi_req_continue(SCSIRequest *req);
-- 
1.9.3
[Qemu-devel] [PATCH 1/5] scsi-bus: prepare scsi_req_new for introduction of parse_cdb
The per-SCSIDevice parse_cdb callback must not be called if the request will go through special SCSIReqOps, so detect the special cases early enough.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi/scsi-bus.c | 44 +---
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 4341754..bf7ba8c 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -561,9 +561,32 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
                           uint8_t *buf, void *hba_private)
 {
     SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, d->qdev.parent_bus);
+    const SCSIReqOps *ops;
     SCSIRequest *req;
     SCSICommand cmd;
 
+    if ((d->unit_attention.key == UNIT_ATTENTION ||
+         bus->unit_attention.key == UNIT_ATTENTION) &&
+        (buf[0] != INQUIRY &&
+         buf[0] != REPORT_LUNS &&
+         buf[0] != GET_CONFIGURATION &&
+         buf[0] != GET_EVENT_STATUS_NOTIFICATION &&
+
+         /*
+          * If we already have a pending unit attention condition,
+          * report this one before triggering another one.
+          */
+         !(buf[0] == REQUEST_SENSE && d->sense_is_ua))) {
+        ops = &reqops_unit_attention;
+    } else if (lun != d->lun ||
+               buf[0] == REPORT_LUNS ||
+               (buf[0] == REQUEST_SENSE && d->sense_len)) {
+        ops = &reqops_target_command;
+    } else {
+        ops = NULL;
+    }
+
     if (scsi_req_parse(&cmd, d, buf) != 0) {
         trace_scsi_req_parse_bad(d->id, lun, tag, buf[0]);
         req = scsi_req_alloc(&reqops_invalid_opcode, d, tag, lun, hba_private);
@@ -577,25 +600,8 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
         if (cmd.xfer > INT32_MAX) {
             req = scsi_req_alloc(&reqops_invalid_field, d, tag, lun,
                                  hba_private);
-        } else if ((d->unit_attention.key == UNIT_ATTENTION ||
-                    bus->unit_attention.key == UNIT_ATTENTION) &&
-                   (buf[0] != INQUIRY &&
-                    buf[0] != REPORT_LUNS &&
-                    buf[0] != GET_CONFIGURATION &&
-                    buf[0] != GET_EVENT_STATUS_NOTIFICATION &&
-
-                    /*
-                     * If we already have a pending unit attention condition,
-                     * report this one before triggering another one.
-                     */
-                    !(buf[0] == REQUEST_SENSE && d->sense_is_ua))) {
-            req = scsi_req_alloc(&reqops_unit_attention, d, tag, lun,
-                                 hba_private);
-        } else if (lun != d->lun ||
-                   buf[0] == REPORT_LUNS ||
-                   (buf[0] == REQUEST_SENSE && d->sense_len)) {
-            req = scsi_req_alloc(&reqops_target_command, d, tag, lun,
-                                 hba_private);
+        } else if (ops) {
+            req = scsi_req_alloc(ops, d, tag, lun, hba_private);
         } else {
             req = scsi_device_alloc_req(d, tag, lun, buf, hba_private);
         }
-- 
1.9.3
[Qemu-devel] [PATCH 4/5] scsi-block, scsi-generic: implement parse_cdb
The callback lets the bus provide the direction and transfer count for passthrough commands, enabling passthrough of vendor-specific commands.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi/scsi-bus.c     |  3 +--
 hw/scsi/scsi-disk.c    | 14 ++
 hw/scsi/scsi-generic.c |  7 +++
 include/hw/scsi/scsi.h |  1 +
 4 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 54fbf29..6167bea 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -9,7 +9,6 @@
 static char *scsibus_get_dev_path(DeviceState *dev);
 static char *scsibus_get_fw_dev_path(DeviceState *dev);
-static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf);
 static void scsi_req_dequeue(SCSIRequest *req);
 static uint8_t *scsi_target_alloc_buf(SCSIRequest *req, size_t len);
 static void scsi_target_free_buf(SCSIRequest *req);
@@ -1207,7 +1206,7 @@ static uint64_t scsi_cmd_lba(SCSICommand *cmd)
     return lba;
 }
 
-static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf)
+int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf)
 {
     int rc;
 
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 81b7276..d55521d 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2564,6 +2564,19 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag,
                               hba_private);
     }
 }
+
+static int scsi_block_parse_cdb(SCSIDevice *d, SCSICommand *cmd,
+                                uint8_t *buf, void *hba_private)
+{
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d);
+
+    if (scsi_block_is_passthrough(s, buf)) {
+        return scsi_bus_parse_cdb(&s->qdev, cmd, buf, hba_private);
+    } else {
+        return scsi_req_parse_cdb(&s->qdev, cmd, buf);
+    }
+}
+
 #endif
 
 #define DEFINE_SCSI_DISK_PROPERTIES()\
@@ -2672,6 +2685,7 @@ static void scsi_block_class_initfn(ObjectClass *klass, void *data)
     sc->init         = scsi_block_initfn;
     sc->destroy      = scsi_destroy;
     sc->alloc_req    = scsi_block_new_request;
+    sc->parse_cdb    = scsi_block_parse_cdb;
     dc->fw_name = "disk";
     dc->desc = "SCSI block device passthrough";
     dc->reset = scsi_disk_reset;
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 3733d2c..0b2ff90 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -490,6 +490,12 @@ static Property scsi_generic_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static int scsi_generic_parse_cdb(SCSIDevice *dev, SCSICommand *cmd,
+                                  uint8_t *buf, void *hba_private)
+{
+    return scsi_bus_parse_cdb(dev, cmd, buf, hba_private);
+}
+
 static void scsi_generic_class_initfn(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -498,6 +504,7 @@ static void scsi_generic_class_initfn(ObjectClass *klass, void *data)
     sc->init         = scsi_generic_initfn;
     sc->destroy      = scsi_destroy;
     sc->alloc_req    = scsi_new_request;
+    sc->parse_cdb    = scsi_generic_parse_cdb;
     dc->fw_name = "disk";
     dc->desc = "pass through generic scsi device (/dev/sg*)";
     dc->reset = scsi_generic_reset;
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 4a0b860..a7a28e6 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -250,6 +250,7 @@ void scsi_req_unref(SCSIRequest *req);
 
 int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
                        void *hba_private);
+int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf);
 void scsi_req_build_sense(SCSIRequest *req, SCSISense sense);
 void scsi_req_print(SCSIRequest *req);
 void scsi_req_continue(SCSIRequest *req);
-- 
1.9.3
[Qemu-devel] [PATCH 5/5] virtio-scsi: implement parse_cdb
Enable passthrough of vendor-specific commands.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi/virtio-scsi.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 0eb069a..e8f4c0c 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -406,6 +406,29 @@ static void virtio_scsi_command_complete(SCSIRequest *r, uint32_t status,
     virtio_scsi_complete_cmd_req(req);
 }
 
+static int virtio_scsi_parse_cdb(SCSIDevice *dev, SCSICommand *cmd,
+                                 uint8_t *buf, void *hba_private)
+{
+    VirtIOSCSIReq *req = hba_private;
+
+    cmd->lba = -1;
+    cmd->len = MIN(VIRTIO_SCSI_CDB_SIZE, SCSI_CMD_BUF_SIZE);
+    memcpy(cmd->buf, buf, cmd->len);
+
+    /* Extract the direction and mode directly from the request, for
+     * host device passthrough.
+     */
+    cmd->xfer = req->qsgl.size;
+    if (cmd->xfer == 0) {
+        cmd->mode = SCSI_XFER_NONE;
+    } else if (iov_size(req->elem.in_sg, req->elem.in_num) > req->resp_size) {
+        cmd->mode = SCSI_XFER_FROM_DEV;
+    } else {
+        cmd->mode = SCSI_XFER_TO_DEV;
+    }
+    return 0;
+}
+
 static QEMUSGList *virtio_scsi_get_sg_list(SCSIRequest *r)
 {
     VirtIOSCSIReq *req = r->hba_private;
@@ -658,6 +681,7 @@ static struct SCSIBusInfo virtio_scsi_scsi_info = {
     .change = virtio_scsi_change,
     .hotplug = virtio_scsi_hotplug,
     .hot_unplug = virtio_scsi_hot_unplug,
+    .parse_cdb = virtio_scsi_parse_cdb,
     .get_sg_list = virtio_scsi_get_sg_list,
     .save_request = virtio_scsi_save_request,
     .load_request = virtio_scsi_load_request,
-- 
1.9.3
Re: [Qemu-devel] [RFC PATCH V4 6/6] monitor: Add drift info to 'info jit'
Il 16/07/2014 14:18, Sebastian Tanase ha scritto:

-static int64_t clocks_offset;
-if (!icount_align_option) {
-return;
+static int64_t realtime_clock_value;

Does this really need to be static?

+if (icount_align_option || !realtime_clock_value) {
+    realtime_clock_value = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 }
-sc->realtime_clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);

 /* Compute offset between the 2 clocks. */
 if (!clocks_offset) {
-    clocks_offset = sc->realtime_clock -
+    clocks_offset = realtime_clock_value -
                     qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 }

Isn't clocks_offset roughly the same as -timers_state.cpu_clock_offset? If so, you could do some simplification in the code. Feel free to move the TimersState struct definition to include/sysemu/cpus.h and make timers_state public.

+cpu_fprintf(f, "Host - Guest clock %ld(ms)\n",
+            (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - clocks_offset -
+             qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)) / SCALE_MS);

I think this is (cpu_get_clock() - cpu_get_icount()) / SCALE_MS.

Paolo
Re: [Qemu-devel] [RFC PATCH V4 0/6] icount: Implement delay algorithm between guest and host clocks
Il 16/07/2014 14:18, Sebastian Tanase ha scritto:

v3 -> v4
* Add better error handling for 'strtol' in patch 2
* Add 'Sleep' instead of 'nanosleep' for Windows hosts in patch 4
* Remove function pointers from patches 4 and 5

Hi Sebastian,

I think we're getting really close. I asked a question about clocks_offset; I think it's not necessary to compute it in cpu-exec.c because the same information is available elsewhere. It is also probably not necessary to make it public. Can you please check this?

Once this is sorted out, the patch should be ready for inclusion in 2.2. Thanks for your effort!

Paolo
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On Wed, Jul 16, 2014 at 3:52 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote:
On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini pbonz...@redhat.com wrote:
Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov and...@xdel.ru wrote:
On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah amit.s...@redhat.com wrote:
On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:

Hello, the issue is not specific to the iothread code because generic virtio-blk also hangs up:

Do you know which version works well? If you could bisect, that'll help a lot.

Thanks, Amit

Hi, 2.0 works definitely well. I'll try to finish bisection today, though every step takes about 10 minutes to complete.

Yay! It is even outside of virtio-blk.

commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
Author: Marcelo Tosatti mtosa...@redhat.com
Date: Tue Jun 3 13:34:48 2014 -0300

    kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation

    Ensure proper env->tsc value for kvmclock_current_nsec calculation.

    Reported-by: Marcin Gibuła m.gib...@beyond.pl
    Cc: qemu-sta...@nongnu.org
    Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
    Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Andrey, can you please provide instructions on how to create a reproducible environment?

The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing.
diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 272a88a..feb5fc5 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -17,7 +17,6 @@
 #include "qemu/host-utils.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
-#include "sysemu/cpus.h"
 #include "hw/sysbus.h"
 #include "hw/kvm/clock.h"
@@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
 
     cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
 
-    assert(time.tsc_timestamp <= migration_tsc);
     delta = migration_tsc - time.tsc_timestamp;
     if (time.tsc_shift < 0) {
         delta >>= -time.tsc_shift;
@@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
         if (s->clock_valid) {
             return;
         }
-
-        cpu_synchronize_all_states();
         ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
         if (ret < 0) {
             fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
diff --git a/migration.c b/migration.c
index 8d675b3..34f2325 100644
--- a/migration.c
+++ b/migration.c
@@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
             qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
             old_vm_running = runstate_is_running();
 
+            cpu_synchronize_all_states();
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
             if (ret >= 0) {
                 qemu_file_set_rate_limit(s->file, INT64_MAX);

It could also be useful to apply the above patch _and_ revert a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.

Paolo

Yes, it solved the issue for me! (it took much time to check because most of the country's debian mirrors went inconsistent for some reason)

Also trivial addition:

diff --git a/migration.c b/migration.c
index 34f2325..65d1c88 100644
--- a/migration.c
+++ b/migration.c
@@ -25,6 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "sysemu/cpus.h"

And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf? That is, test with a stock qemu.git tree and the patch sent today, on this thread, to move cpu_synchronize_all_states?
The main reason for things to work for me is a revert of 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, not adding any other patches. I had tested two cases: with Alexander's patch completely reverted plus the suggestion from Marcelo, and with only 9b178682 reverted plus the same suggestion. The difference is that until Alexander's patch is reverted, live migration always fails by hitting the timeout value; when it is reverted, migration always succeeds in 8-10 seconds. Appropriate diffs are attached for reference.

Andrey, can you please apply only the following
Re: [Qemu-devel] [RFC PATCH V4 0/6] icount: Implement delay algorithm between guest and host clocks
On Wed, 16 Jul 2014 14:18:00 +0200 Sebastian Tanase sebastian.tan...@openwide.fr wrote: The icount option already implemented in QEMU allows the guest to run at a theoretical frequency of 1/(2^N) GHz (N is the icount parameter). The goal of this patch is to have a real guest frequency close to the one imposed by using the icount option. Is this really an RFC?
[Qemu-devel] [Bug 1342686] [NEW] Windows 95 setup hangs
Public bug reported:

Windows 95 (first version, not 95A or OSR2) setup hangs on QEMU version 2.0.0, compiled from the sources on Fedora 20. Setup starts without problems, goes through the first phase, and then hangs on the "Getting ready to run Windows 95 for the first time..." screen (http://www.guidebookgallery.org/screenshots/win95#win95startupshutdownsplash1correctaspect).

Steps to reproduce:

qemu-img create -f qcow2 win95.img 2G
qemu -L . -machine isapc -cpu pentium -m 32 -vga std -soundhw sb16 -hda win95.img -cdrom POLWIN95.ISO -fda BOOT95A.IMA -boot a

After this, select the default options everywhere. When it asks to remove the boot floppy, do "eject floppy0" and confirm. It displays the "Getting ready to run Windows 95 for the first time..." screen and hangs.

The same behavior can be observed on 2.1.0-rc2. On 1.7.1 it hangs immediately after this screen, displaying an underscore cursor. I managed to complete the setup on QEMU 1.6.2.

** Affects: qemu
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1342686

Title: Windows 95 setup hangs
Status in QEMU: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1342686/+subscriptions
Re: [Qemu-devel] [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On Thu, Jul 03, 2014 at 11:27:40PM +0300, Michael S. Tsirkin wrote:
On Thu, Jul 03, 2014 at 12:09:28PM -0700, Jesse Barnes wrote:
On Thu, 3 Jul 2014 14:26:12 -0400 Konrad Rzeszutek Wilk konrad.w...@oracle.com wrote:
On Thu, Jul 03, 2014 at 10:32:12AM +0300, Michael S. Tsirkin wrote:
On Wed, Jul 02, 2014 at 12:23:37PM -0400, Konrad Rzeszutek Wilk wrote:
On Wed, Jul 02, 2014 at 04:50:15PM +0200, Paolo Bonzini wrote:
Il 02/07/2014 16:00, Konrad Rzeszutek Wilk ha scritto:

With this long thread I lost a bit of context about the challenges that exist. But let me try summarizing it here - which will hopefully get some consensus.

1). Fix IGD hardware to not use Southbridge magic addresses. We can moan and moan but I doubt it is going to change.

There are two problems:

- Northbridge (i.e. MCH i.e. PCI host bridge) configuration space addresses

Right. So in drivers/gpu/drm/i915/i915_dma.c:

1135 #define MCHBAR_I915 0x44
1136 #define MCHBAR_I965 0x48
1147     int reg = INTEL_INFO(dev)->gen >= 4 ? MCHBAR_I965 : MCHBAR_I915;
1152     if (INTEL_INFO(dev)->gen >= 4)
1153         pci_read_config_dword(dev_priv->bridge_dev, reg + 4, &temp_hi);
1154     pci_read_config_dword(dev_priv->bridge_dev, reg, &temp_lo);
1155     mchbar_addr = ((u64)temp_hi << 32) | temp_lo;

and

1139 #define DEVEN_REG 0x54
1193     int mchbar_reg = INTEL_INFO(dev)->gen >= 4 ? MCHBAR_I965 : MCHBAR_I915;
1202     if (IS_I915G(dev) || IS_I915GM(dev)) {
1203         pci_read_config_dword(dev_priv->bridge_dev, DEVEN_REG, &temp);
1204         enabled = !!(temp & DEVEN_MCHBAR_EN);
1205     } else {
1206         pci_read_config_dword(dev_priv->bridge_dev, mchbar_reg, &temp);
1207         enabled = temp & 1;
1208     }

- Southbridge (i.e. PCH i.e. ISA bridge) vendor/device ID; some versions of the driver identify it by class, some versions identify it by slot (1f.0).

Right. So in drivers/gpu/drm/i915/i915_drv.c the giant intel_detect_pch sets the pch_type based on:

432     if (pch->vendor == PCI_VENDOR_ID_INTEL) {
433         unsigned short id = pch->device & INTEL_PCH_DEVICE_ID_MASK;
434         dev_priv->pch_id = id;
435
436         if (id == INTEL_PCH_IBX_DEVICE_ID_TYPE) {

It checks for 0x3b00, 0x1c00, 0x1e00, 0x8c00 and 0x9c00. The INTEL_PCH_DEVICE_ID_MASK is 0xff00.

To solve the first, make a new machine type, PIIX4-based, and pass through the registers you need. The patch must document _exactly_ why the registers are safe to pass. If they are not reserved on PIIX4, the patch must document what the same offsets mean on PIIX4, and why it's sensible to assume that firmware for a virtual machine will not read/write them. Bonus points for also documenting the same for Q35.

OK. They look to be related to setting up an MBAR, but I don't understand why it is needed. Hopefully some of the i915 folks CC-ed here can answer.

In particular, I think setting up MBAR should move out of i915 and become the bridge driver tweak on Linux and Windows.

That is an excellent idea. However I am still curious as to _why_ it has to be done in the first place.

The display part of the GPU is partly on the PCH, and it's possible to mix and match them a bit, so we have this detection code to figure things out. In some cases, the PCH display portion may not even be present, so we look for that too. Practically speaking, we could probably assume specific CPU/PCH combos, as I don't think they're generally mixed across generations, though SNB and IVB did have compatible sockets, so there is the possibility of mixing CPT and PPT PCHs, but those are register identical as far as the graphics driver is concerned, so even that should be safe. Beyond that, the other MCH data we need to look at is mirrored into the GPU's
[Qemu-devel] [PATCH 0/2] Add SP810 to Versatile Express boards
Hi,

The Versatile Express emulation in QEMU currently does not have a model of the SP810 used in real hardware. The registers provided by this System Controller can be used to set the frequency of the SP804 timers. On newer releases of the board the SP804 is set to 32kHz by default and has to be increased by writing to the SCCTRL. See:
http://lists.infradead.org/pipermail/linux-arm-kernel/2011-January/038696.html

Since QEMU sets the SP804 timers to 1MHz by default, this should be reflected in the SCCTRL register. These two patches add a model of the SP810 to the vexpress boards and set the SCCTRL register so that software that queries this register behaves as expected.

Feedback is greatly appreciated!

Best,
Fabian

Fabian Aggeler (2):
  hw/misc/arm_sp810: Create SP810 device
  hw/arm/vexpress: add SP810 to the vexpress

 default-configs/arm-softmmu.mak |   1 +
 hw/arm/vexpress.c               |  11 ++-
 hw/misc/Makefile.objs           |   1 +
 hw/misc/arm_sp810.c             | 184
 4 files changed, 195 insertions(+), 2 deletions(-)
 create mode 100644 hw/misc/arm_sp810.c

-- 
1.8.3.2
[Qemu-devel] [PATCH 1/2] hw/misc/arm_sp810: Create SP810 device
This adds a device model for the PrimeXsys System Controller (SP810) which is present in the Versatile Express motherboards. It is so far read-only but allows setting the SCCTRL register.

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
---
 default-configs/arm-softmmu.mak |   1 +
 hw/misc/Makefile.objs           |   1 +
 hw/misc/arm_sp810.c             | 184
 3 files changed, 186 insertions(+)
 create mode 100644 hw/misc/arm_sp810.c

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index f3513fa..67ba99a 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -55,6 +55,7 @@ CONFIG_PL181=y
 CONFIG_PL190=y
 CONFIG_PL310=y
 CONFIG_PL330=y
+CONFIG_SP810=y
 CONFIG_CADENCE=y
 CONFIG_XGMAC=y
 CONFIG_EXYNOS4=y
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 979e532..49d023b 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -13,6 +13,7 @@ common-obj-$(CONFIG_PL310) += arm_l2x0.o
 common-obj-$(CONFIG_INTEGRATOR_DEBUG) += arm_integrator_debug.o
 common-obj-$(CONFIG_A9SCU) += a9scu.o
 common-obj-$(CONFIG_ARM11SCU) += arm11scu.o
+common-obj-$(CONFIG_SP810) += arm_sp810.o
 
 # PKUnity SoC devices
 common-obj-$(CONFIG_PUV3) += puv3_pm.o
diff --git a/hw/misc/arm_sp810.c b/hw/misc/arm_sp810.c
new file mode 100644
index 000..82cbcf0
--- /dev/null
+++ b/hw/misc/arm_sp810.c
@@ -0,0 +1,184 @@
+/*
+ * ARM PrimeXsys System Controller (SP810)
+ *
+ * Copyright (C) 2014 Fabian Aggeler aggel...@ethz.ch
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include "hw/hw.h"
+#include "hw/sysbus.h"
+
+#define LOCK_VALUE 0xa05f
+
+#define TYPE_ARM_SP810 "arm_sp810"
+#define ARM_SP810(obj) OBJECT_CHECK(arm_sp810_state, (obj), TYPE_ARM_SP810)
+
+typedef struct {
+    SysBusDevice parent_obj;
+    MemoryRegion iomem;
+
+    uint32_t sc_ctrl; /* SCCTRL */
+} arm_sp810_state;
+
+#define TIMEREN0SEL (1 << 15)
+#define TIMEREN1SEL (1 << 17)
+#define TIMEREN2SEL (1 << 19)
+#define TIMEREN3SEL (1 << 21)
+
+static const VMStateDescription vmstate_arm_sysctl = {
+    .name = "arm_sp810",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(sc_ctrl, arm_sp810_state),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static uint64_t arm_sp810_read(void *opaque, hwaddr offset, unsigned size)
+{
+    arm_sp810_state *s = (arm_sp810_state *)opaque;
+
+    switch (offset) {
+    case 0x000: /* SCCTRL */
+        return s->sc_ctrl;
+    case 0x004: /* SCSYSSTAT */
+    case 0x008: /* SCIMCTRL */
+    case 0x00C: /* SCIMSTAT */
+    case 0x010: /* SCXTALCTRL */
+    case 0x014: /* SCPLLCTRL */
+    case 0x018: /* SCPLLFCTRL */
+    case 0x01C: /* SCPERCTRL0 */
+    case 0x020: /* SCPERCTRL1 */
+    case 0x024: /* SCPEREN */
+    case 0x028: /* SCPERDIS */
+    case 0x02C: /* SCPERCLKEN */
+    case 0x030: /* SCPERSTAT */
+    case 0xEE0: /* SCSYSID0 */
+    case 0xEE4: /* SCSYSID1 */
+    case 0xEE8: /* SCSYSID2 */
+    case 0xEEC: /* SCSYSID3 */
+    case 0xF00: /* SCITCR */
+    case 0xF04: /* SCITIR0 */
+    case 0xF08: /* SCITIR1 */
+    case 0xF0C: /* SCITOR */
+    case 0xF10: /* SCCNTCTRL */
+    case 0xF14: /* SCCNTDATA */
+    case 0xF18: /* SCCNTSTEP */
+    case 0xFE0: /* SCPERIPHID0 */
+    case 0xFE4: /* SCPERIPHID1 */
+    case 0xFE8: /* SCPERIPHID2 */
+    case 0xFEC: /* SCPERIPHID3 */
+    case 0xFF0: /* SCPCELLID0 */
+    case 0xFF4: /* SCPCELLID1 */
+    case 0xFF8: /* SCPCELLID2 */
+    case 0xFFC: /* SCPCELLID3 */
+    default:
+        qemu_log_mask(LOG_UNIMP,
+                      "arm_sp810_read: Register not yet implemented: "
+                      "0x%x\n",
+                      (int)offset);
+        return 0;
+    }
+}
+
+
+
+static void arm_sp810_write(void *opaque, hwaddr offset,
+                            uint64_t val, unsigned size)
+{
+    switch (offset) {
+    case 0x000: /* SCCTRL */
+    case 0x004: /* SCSYSSTAT */
+    case 0x008: /* SCIMCTRL */
+    case 0x00C: /* SCIMSTAT */
+    case 0x010: /* SCXTALCTRL */
+    case 0x014: /* SCPLLCTRL */
+    case 0x018: /* SCPLLFCTRL */
+    case 0x01C: /* SCPERCTRL0 */
+    case 0x020: /* SCPERCTRL1 */
+    case 0x024: /* SCPEREN */
+    case 0x028: /* SCPERDIS */
+    case 0x02C: /* SCPERCLKEN */
+    case 0x030: /* SCPERSTAT */
+    case 0xEE0: /* SCSYSID0 */
+    case 0xEE4: /* SCSYSID1 */
+    case 0xEE8: /* SCSYSID2 */
+    case 0xEEC: /* SCSYSID3 */
+    case 0xF00: /* SCITCR */
+    case 0xF04: /* SCITIR0 */
+    case 0xF08: /* SCITIR1 */
+    case
[Qemu-devel] [PATCH 2/2] hw/arm/vexpress: add SP810 to the vexpress
The SP810, which is present in the Versatile Express motherboards, allows setting the timing reference to either REFCLK or TIMCLK. QEMU currently sets the SP804 timer to 1MHz by default. To reflect this, we set the TimerEn0Sel, TimerEn1Sel, TimerEn2Sel, and TimerEn3Sel bits of the system control register (SCCTRL) to TIMCLK (1).

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
---
 hw/arm/vexpress.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
index a88732c..b96c3fd 100644
--- a/hw/arm/vexpress.c
+++ b/hw/arm/vexpress.c
@@ -512,7 +512,7 @@ static pflash_t *ve_pflash_cfi01_register(hwaddr base, const char *name,
 static void vexpress_common_init(VEDBoardInfo *daughterboard,
                                  MachineState *machine)
 {
-    DeviceState *dev, *sysctl, *pl041;
+    DeviceState *dev, *sysctl, *pl041, *sp810;
     qemu_irq pic[64];
     uint32_t sys_id;
     DriveInfo *dinfo;
@@ -575,7 +575,14 @@ static void vexpress_common_init(VEDBoardInfo *daughterboard,
     qdev_init_nofail(sysctl);
     sysbus_mmio_map(SYS_BUS_DEVICE(sysctl), 0, map[VE_SYSREGS]);
 
-    /* VE_SP810: not modelled */
+    /* VE_SP810 */
+    sp810 = qdev_create(NULL, "arm_sp810");
+    /* SP804 is already running at 1MHz (TIMCLK) so SCCTRL TimerEnXSel=1 */
+    qdev_prop_set_uint32(sp810, "sc-ctrl", (1 << 15) | (1 << 17)
+                         | (1 << 19) | (1 << 21));
+    qdev_init_nofail(sp810);
+    sysbus_mmio_map(SYS_BUS_DEVICE(sp810), 0, map[VE_SP810]);
+
     /* VE_SERIALPCI: not modelled */
 
     pl041 = qdev_create(NULL, "pl041");
-- 
1.8.3.2
Re: [Qemu-devel] [PATCH 2/2] hw/arm/vexpress: add SP810 to the vexpress
Fabian Aggeler writes:

The SP810, which is present in the Versatile Express motherboards, allows setting the timing reference to either REFCLK or TIMCLK. QEMU currently sets the SP804 timer to 1MHz by default. To reflect this, we set the TimerEn0Sel, TimerEn1Sel, TimerEn2Sel, and TimerEn3Sel bits of the system control register (SCCTRL) to TIMCLK (1).

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
---
 hw/arm/vexpress.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
index a88732c..b96c3fd 100644
--- a/hw/arm/vexpress.c
+++ b/hw/arm/vexpress.c
@@ -512,7 +512,7 @@ static pflash_t *ve_pflash_cfi01_register(hwaddr base, const char *name,
 static void vexpress_common_init(VEDBoardInfo *daughterboard,
                                  MachineState *machine)
 {
-    DeviceState *dev, *sysctl, *pl041;
+    DeviceState *dev, *sysctl, *pl041, *sp810;
     qemu_irq pic[64];
     uint32_t sys_id;
     DriveInfo *dinfo;
@@ -575,7 +575,14 @@ static void vexpress_common_init(VEDBoardInfo *daughterboard,
     qdev_init_nofail(sysctl);
     sysbus_mmio_map(SYS_BUS_DEVICE(sysctl), 0, map[VE_SYSREGS]);
 
-    /* VE_SP810: not modelled */
+    /* VE_SP810 */
+    sp810 = qdev_create(NULL, "arm_sp810");
+    /* SP804 is already running at 1MHz (TIMCLK) so SCCTRL TimerEnXSel=1 */
+    qdev_prop_set_uint32(sp810, "sc-ctrl", (1 << 15) | (1 << 17)
+                         | (1 << 19) | (1 << 21));

snip

Could the #defines in the first patch be moved into a header and used here rather than manually setting these bits?

--
Alex Bennée
[Qemu-devel] [PATCH for-2.1 0/2] raw-posix: Fail gracefully if no working alignment is found
Kevin Wolf (2):
  block: Add Error argument to bdrv_refresh_limits()
  raw-posix: Fail gracefully if no working alignment is found

 block.c                   | 33 +++--
 block/iscsi.c             |  3 +--
 block/qcow2.c             |  4 +---
 block/qed.c               |  4 +---
 block/raw-posix.c         | 39 ---
 block/raw_bsd.c           |  3 +--
 block/stream.c            |  2 +-
 block/vmdk.c              |  4 +---
 include/block/block.h     |  2 +-
 include/block/block_int.h |  2 +-
 10 files changed, 59 insertions(+), 37 deletions(-)

-- 
1.8.3.1
[Qemu-devel] [PATCH 1/2] block: Add Error argument to bdrv_refresh_limits()
Signed-off-by: Kevin Wolf kw...@redhat.com --- block.c | 33 +++-- block/iscsi.c | 3 +-- block/qcow2.c | 4 +--- block/qed.c | 4 +--- block/raw-posix.c | 4 +--- block/raw_bsd.c | 3 +-- block/stream.c| 2 +- block/vmdk.c | 4 +--- include/block/block.h | 2 +- include/block/block_int.h | 2 +- 10 files changed, 32 insertions(+), 29 deletions(-) diff --git a/block.c b/block.c index 3e252a2..8cf519b 100644 --- a/block.c +++ b/block.c @@ -508,19 +508,24 @@ int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp) return ret; } -int bdrv_refresh_limits(BlockDriverState *bs) +void bdrv_refresh_limits(BlockDriverState *bs, Error **errp) { BlockDriver *drv = bs-drv; +Error *local_err = NULL; memset(bs-bl, 0, sizeof(bs-bl)); if (!drv) { -return 0; +return; } /* Take some limits from the children as a default */ if (bs-file) { -bdrv_refresh_limits(bs-file); +bdrv_refresh_limits(bs-file, local_err); +if (local_err) { +error_propagate(errp, local_err); +return; +} bs-bl.opt_transfer_length = bs-file-bl.opt_transfer_length; bs-bl.opt_mem_alignment = bs-file-bl.opt_mem_alignment; } else { @@ -528,7 +533,11 @@ int bdrv_refresh_limits(BlockDriverState *bs) } if (bs-backing_hd) { -bdrv_refresh_limits(bs-backing_hd); +bdrv_refresh_limits(bs-backing_hd, local_err); +if (local_err) { +error_propagate(errp, local_err); +return; +} bs-bl.opt_transfer_length = MAX(bs-bl.opt_transfer_length, bs-backing_hd-bl.opt_transfer_length); @@ -539,10 +548,8 @@ int bdrv_refresh_limits(BlockDriverState *bs) /* Then let the driver override it */ if (drv-bdrv_refresh_limits) { -return drv-bdrv_refresh_limits(bs); +drv-bdrv_refresh_limits(bs, errp); } - -return 0; } /* @@ -993,7 +1000,13 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file, goto free_and_fail; } -bdrv_refresh_limits(bs); +bdrv_refresh_limits(bs, local_err); +if (local_err) { +error_propagate(errp, local_err); +ret = -EINVAL; +goto free_and_fail; +} + assert(bdrv_opt_mem_align(bs) != 0); 
assert((bs-request_alignment != 0) || bs-sg); return 0; @@ -1154,7 +1167,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd) bdrv_op_unblock(bs-backing_hd, BLOCK_OP_TYPE_COMMIT, bs-backing_blocker); out: -bdrv_refresh_limits(bs); +bdrv_refresh_limits(bs, NULL); } /* @@ -1778,7 +1791,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state) BDRV_O_CACHE_WB); reopen_state-bs-read_only = !(reopen_state-flags BDRV_O_RDWR); -bdrv_refresh_limits(reopen_state-bs); +bdrv_refresh_limits(reopen_state-bs, NULL); } /* diff --git a/block/iscsi.c b/block/iscsi.c index f3e83e2..a7bb697 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -1450,7 +1450,7 @@ static void iscsi_close(BlockDriverState *bs) memset(iscsilun, 0, sizeof(IscsiLun)); } -static int iscsi_refresh_limits(BlockDriverState *bs) +static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp) { IscsiLun *iscsilun = bs-opaque; @@ -1475,7 +1475,6 @@ static int iscsi_refresh_limits(BlockDriverState *bs) } bs-bl.opt_transfer_length = sector_lun2qemu(iscsilun-bl.opt_xfer_len, iscsilun); -return 0; } /* Since iscsi_open() ignores bdrv_flags, there is nothing to do here in diff --git a/block/qcow2.c b/block/qcow2.c index b0faa69..445ead4 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -855,13 +855,11 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags, return ret; } -static int qcow2_refresh_limits(BlockDriverState *bs) +static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp) { BDRVQcowState *s = bs-opaque; bs-bl.write_zeroes_alignment = s-cluster_sectors; - -return 0; } static int qcow2_set_key(BlockDriverState *bs, const char *key) diff --git a/block/qed.c b/block/qed.c index cd4872b..7944832 100644 --- a/block/qed.c +++ b/block/qed.c @@ -528,13 +528,11 @@ out: return ret; } -static int bdrv_qed_refresh_limits(BlockDriverState *bs) +static void bdrv_qed_refresh_limits(BlockDriverState *bs, Error **errp) { BDRVQEDState *s = bs-opaque; 
bs-bl.write_zeroes_alignment = s-header.cluster_size BDRV_SECTOR_BITS; - -return 0; } /* We have nothing to do for QED reopen, stubs just return diff --git a/block/raw-posix.c b/block/raw-posix.c index 2bcc73d..ef497b2 100644 --- a/block/raw-posix.c +++
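The conversion in this patch follows QEMU's usual Error propagation pattern: collect failures from a callee into a local_err, hand them to the caller's errp with error_propagate(), and return early. A minimal self-contained model of that control flow (with a toy Error type standing in for QEMU's real one, so this is a sketch, not the QEMU API) looks like:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* toy stand-ins for QEMU's Error API, just to show the control flow */
typedef struct Error { char msg[64]; } Error;

static void error_setg(Error **errp, const char *msg)
{
    if (errp && !*errp) {
        *errp = malloc(sizeof(Error));
        snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
    }
}

static void error_propagate(Error **dst, Error *src)
{
    if (dst && !*dst) {
        *dst = src;          /* hand ownership to the caller */
    } else {
        free(src);           /* caller passed NULL, drop the error */
    }
}

/* the child refresh fails; the parent must stop instead of merging
 * stale limits -- same shape as the bdrv_refresh_limits() recursion */
static void child_refresh(Error **errp)
{
    error_setg(errp, "child driver refused");
}

static void parent_refresh(Error **errp)
{
    Error *local_err = NULL;

    child_refresh(&local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        return;              /* early return, no limit merging */
    }
    /* would merge child limits into bs->bl here */
}
```

The key property the patch relies on is that a caller who does not care (passes NULL, as bdrv_set_backing_hd() now does) still gets correct behavior: the error is simply dropped.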
[Qemu-devel] [PATCH qemu] i386, linux-headers: Add support for kvm_get_rng_seed
This updates x86's kvm_para.h for the feature bit definition and target-i386/cpu.c for the feature name and default. Signed-off-by: Andy Lutomirski l...@amacapital.net --- linux-headers/asm-x86/kvm_para.h | 2 ++ target-i386/cpu.c| 5 +++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h index e41c5c1..a9b27ce 100644 --- a/linux-headers/asm-x86/kvm_para.h +++ b/linux-headers/asm-x86/kvm_para.h @@ -24,6 +24,7 @@ #define KVM_FEATURE_STEAL_TIME 5 #define KVM_FEATURE_PV_EOI 6 #define KVM_FEATURE_PV_UNHALT 7 +#define KVM_FEATURE_GET_RNG_SEED 8 /* The last 8 bits are used to indicate how to interpret the flags field * in pvclock structure. If no bits are set, all flags are ignored. @@ -40,6 +41,7 @@ #define MSR_KVM_ASYNC_PF_EN 0x4b564d02 #define MSR_KVM_STEAL_TIME 0x4b564d03 #define MSR_KVM_PV_EOI_EN 0x4b564d04 +#define MSR_KVM_GET_RNG_SEED 0x4b564d05 struct kvm_steal_time { __u64 steal; diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 8fd1497..4ea7e6c 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -236,7 +236,7 @@ static const char *ext4_feature_name[] = { static const char *kvm_feature_name[] = { kvmclock, kvm_nopiodelay, kvm_mmu, kvmclock, kvm_asyncpf, kvm_steal_time, kvm_pv_eoi, kvm_pv_unhalt, -NULL, NULL, NULL, NULL, +kvm_get_rng_seed, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, @@ -368,7 +368,8 @@ static uint32_t kvm_default_features[FEATURE_WORDS] = { (1 KVM_FEATURE_ASYNC_PF) | (1 KVM_FEATURE_STEAL_TIME) | (1 KVM_FEATURE_PV_EOI) | -(1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT), +(1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) | +(1 KVM_FEATURE_GET_RNG_SEED), [FEAT_1_ECX] = CPUID_EXT_X2APIC, }; -- 1.9.3
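The two constants the patch adds are enough to sketch how a guest discovers the feature: KVM advertises its paravirt feature bits in CPUID leaf 0x40000001 EAX, and bit 8 here means MSR 0x4b564d05 is safe to read. A small sketch of the bit-testing side only (the rdmsr itself requires ring 0, so the MSR number appears just as a constant):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_FEATURE_GET_RNG_SEED 8          /* from the patch */
#define MSR_KVM_GET_RNG_SEED 0x4b564d05     /* from the patch */

/* eax would come from CPUID(0x40000001) when running as a KVM guest */
static bool has_rng_seed(uint32_t eax)
{
    return (eax >> KVM_FEATURE_GET_RNG_SEED) & 1;
}
```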
Re: [Qemu-devel] [musl] Re: Bogus struct stat64 for qemu-microblaze (user emulation)?
On Wed, Jul 16, 2014 at 09:36:23AM +0100, Peter Maydell wrote:

On 16 July 2014 05:02, Rich Felker dal...@libc.org wrote: The qemu-microblaze definition of struct stat64 seems to mismatch the kernel definition, which uses asm-generic/stat.h. See:

http://git.qemu.org/?p=qemu.git;a=blob;f=linux-user/syscall_defs.h;h=c9e6323905486452f518102bf40ba73143c9d601;hb=HEAD#l1469
http://git.qemu.org/?p=qemu.git;a=blob;f=linux-user/syscall.c;h=a50229d0d72fc68966515fcf2bc308b833a3c032;hb=HEAD#l4949

This seems to be causing a truncated-to-32-bit inode number to be stored in the location where st_ino should reside, and a spurious copy of the inode number to be written in an unused slot at the end of the structure.

Sounds quite plausible -- we've had issues with other archs not having correct stat struct definitions in QEMU. I don't suppose anybody's done much testing of the microblaze linux-user code.

The bug seems to have been introduced here: http://git.qemu.org/?p=qemu.git;a=commitdiff;h=a523eb06ec3fb2f4f4f4d362bb23704811d11379

I'm CC'ing the author/committer in case he has any input on why he did this. Is my analysis correct? Stefan Kristiansson and I found this while working on the or1k port of musl libc, where it seems our structure for the existing microblaze port is wrongly aligned with the qemu definition rather than with the definition the real kernel is using. Before I try correcting this on our side, I want to make sure we're working with the right version.

I would definitely trust the kernel definition, not QEMU's!

Yes.

Rich
[Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases (was: [ANNOUNCE] QEMU 2.1.0-rc2 is now available)
Am 16.07.2014 02:26, schrieb Michael Roth: Hello, On behalf of the QEMU Team, I'd like to announce the availability of the third release candidate for the QEMU 2.1 release. This release is meant for testing purposes and should not be used in a production environment. http://wiki.qemu.org/download/qemu-2.1.0-rc2.tar.bz2 You can help improve the quality of the QEMU 2.1 release by testing this release and reporting bugs on Launchpad: https://bugs.launchpad.net/qemu/ Hi, a recent commit (e49ab19fcaa617ad6cdfe1ac401327326b6a2552) increased the requirements for libiscsi from 1.4.0 to 1.9.0. From a user's point of view, this change results in a regression for Debian and similar Linux distributions: QEMU no longer includes iscsi features. In this special case, the current Debian stable includes QEMU packages with libiscsi 1.4, but new builds won't include it because that version is now too old. Debian testing includes a brand new libiscsi, but it does not include libiscsi.pc, so pkg-config won't know that it is available and configure will disable libiscsi. I have a patch which fixes this, so QEMU for Debian testing could include libiscsi again. Is a feature regression like this one acceptable? Do we need additional testing (maybe run the build bots with --enable-xxx, so builds fail when xxx no longer works)? Regards Stefan
[Qemu-devel] [PATCH RFC 00/14] dataplane: performance optimization and multi virtqueue
Hi,

These patches bring the following four changes:

- introduce a selective coroutine bypass mechanism to improve performance of virtio-blk dataplane with raw format images
- introduce an object allocation pool and apply it to virtio-blk dataplane to improve its performance
- linux-aio changes: fix the cases of -EAGAIN and partial completion, increase max events to 256, and remove one unused field in 'struct qemu_laiocb'
- support multiple virtqueues for virtio-blk dataplane

The virtio-blk multi virtqueue feature will be added to virtio spec 1.1[1], and the 3.17 Linux kernel[2] will support the feature in the virtio-blk driver. For those who want to play with it, the kernel-side patches can be found in either Jens's block tree[3] or linux-next[4].

The fio script below, run from inside the VM, is used to measure the improvement from these patches:

    [global]
    direct=1
    size=128G
    bsrange=4k-4k
    timeout=120
    numjobs=${JOBS}
    ioengine=libaio
    iodepth=64
    filename=/dev/vdc
    group_reporting=1

    [f]
    rw=randread

One quad-core VM is created in each of the two hosts below to run the fio test:

- laptop (quad core: 2 physical cores, 2 threads per physical core)
- server (16 cores: 8 physical cores, 2 threads per physical core)

Throughput improvement (IOPS) with this patchset (2 virtqueues per virtio-blk device) against QEMU 2.1.0-rc1:

    ---------------------------------------------------
            | VM in server host  | VM in laptop host
    ---------------------------------------------------
    JOBS=2  | +21%               | +30%
    ---------------------------------------------------
    JOBS=4  | +84%               | +85%
    ---------------------------------------------------

From the above results, we can see that both scalability and performance improve a lot. After commit 580b6b2aa2 (dataplane: use the QEMU block layer for I/O), the average time for submitting one single request increased a lot; in my traces it roughly doubled, even though the block plug/unplug mechanism was introduced to ease the effect. That is why this patchset first introduces the selective coroutine bypass mechanism and the object allocation pool, to save that time.
Based on QEMU 2.0, the single multi-virtqueue patch for virtio-blk dataplane alone achieved a much better improvement than the current result[5].

TODO:
- optimize the block layer for linux-aio, so that more time can be saved when submitting requests
- support more than one AioContext to improve virtio-blk performance

 async.c                         |    1 +
 block.c                         |  129 ++-
 block/linux-aio.c               |   93 +++-
 block/raw-posix.c               |   34 ++
 hw/block/dataplane/virtio-blk.c |  221 ++-
 hw/block/virtio-blk.c           |   30 +-
 include/block/aio.h             |   13 +++
 include/block/block.h           |    9 ++
 include/block/coroutine.h       |    7 ++
 include/block/coroutine_int.h   |    5 +
 include/hw/virtio/virtio-blk.h  |   15 +++
 include/qemu/obj_pool.h         |   64 +
 qemu-coroutine-lock.c           |    4 +-
 qemu-coroutine.c                |   33 ++
 14 files changed, 553 insertions(+), 105 deletions(-)

[1] http://marc.info/?l=linux-api&m=140486843317107&w=2
[2] http://marc.info/?l=linux-api&m=140418368421229&w=2
[3] http://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/ #for-3.17/drivers
[4] https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/
[5] http://marc.info/?l=linux-api&m=140377573830230&w=2

Thanks,
--
Ming Lei
[Qemu-devel] [PATCH RFC 03/14] block: support bypassing the qemu coroutine
This patch adds bypass-mode support for the coroutine in bdrv_co_aio_rw_vector(), which is in the fast path of many block devices, especially virtio-blk dataplane.

Signed-off-by: Ming Lei ming@canonical.com
---
 block.c | 129 +++++++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 105 insertions(+), 24 deletions(-)

diff --git a/block.c b/block.c
index 128a14f..db7ba37 100644
--- a/block.c
+++ b/block.c
@@ -55,6 +55,21 @@ struct BdrvDirtyBitmap {
     QLIST_ENTRY(BdrvDirtyBitmap) list;
 };
 
+typedef struct CoroutineIOCompletion {
+    Coroutine *coroutine;
+    int ret;
+    bool bypass;
+    QEMUIOVector *bounced_iov;
+} CoroutineIOCompletion;
+
+typedef struct BlockDriverAIOCBCoroutine {
+    BlockDriverAIOCB common;
+    BlockRequest req;
+    bool is_write;
+    bool *done;
+    QEMUBH *bh;
+} BlockDriverAIOCBCoroutine;
+
 #define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */
 
 #define COROUTINE_POOL_RESERVATION 64 /* number of coroutines to reserve */
@@ -122,6 +137,21 @@ int is_windows_drive(const char *filename)
 }
 #endif
 
+static CoroutineIOCompletion *bdrv_get_co_io_comp(BlockDriverAIOCBCoroutine
+                                                  *acb)
+{
+    return (CoroutineIOCompletion *)((void *)acb +
+                                     sizeof(BlockDriverAIOCBCoroutine));
+}
+
+static BlockDriverAIOCBCoroutine *bdrv_get_aio_co(CoroutineIOCompletion *co)
+{
+    assert(co->bypass);
+
+    return (BlockDriverAIOCBCoroutine *)((void *)co -
+                                         sizeof(BlockDriverAIOCBCoroutine));
+}
+
 /* throttling disk I/O limits */
 void bdrv_set_io_limits(BlockDriverState *bs,
                         ThrottleConfig *cfg)
@@ -3074,7 +3104,16 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
 
         ret = drv->bdrv_co_readv(bs, sector_num, local_sectors,
                                  &local_qiov);
-        qemu_iovec_destroy(&local_qiov);
+
+        if (qemu_coroutine_self_bypassed()) {
+            CoroutineIOCompletion *pco = bdrv_get_co_io_comp(
+                    (BlockDriverAIOCBCoroutine *)
+                    qemu_coroutine_get_var());
+            pco->bounced_iov = g_malloc(sizeof(QEMUIOVector));
+            *pco->bounced_iov = local_qiov;
+        } else {
+            qemu_iovec_destroy(&local_qiov);
+        }
     } else {
         ret = 0;
     }
@@ -4652,15 +4691,6 @@ static BlockDriverAIOCB *bdrv_aio_writev_em(BlockDriverState *bs,
     return bdrv_aio_rw_vector(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
 }
 
-typedef struct BlockDriverAIOCBCoroutine {
-    BlockDriverAIOCB common;
-    BlockRequest req;
-    bool is_write;
-    bool *done;
-    QEMUBH* bh;
-} BlockDriverAIOCBCoroutine;
-
 static void bdrv_aio_co_cancel_em(BlockDriverAIOCB *blockacb)
 {
     AioContext *aio_context = bdrv_get_aio_context(blockacb->bs);
@@ -4679,6 +4709,12 @@ static const AIOCBInfo bdrv_em_co_aiocb_info = {
     .cancel = bdrv_aio_co_cancel_em,
 };
 
+static const AIOCBInfo bdrv_em_co_bypass_aiocb_info = {
+    .aiocb_size = sizeof(BlockDriverAIOCBCoroutine) +
+                  sizeof(CoroutineIOCompletion),
+    .cancel = bdrv_aio_co_cancel_em,
+};
+
 static void bdrv_co_em_bh(void *opaque)
 {
     BlockDriverAIOCBCoroutine *acb = opaque;
@@ -4698,6 +4734,12 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
 {
     BlockDriverAIOCBCoroutine *acb = opaque;
     BlockDriverState *bs = acb->common.bs;
+    bool bypass = qemu_coroutine_self_bypassed();
+
+    if (bypass) {
+        qemu_coroutine_set_var(acb);
+        memset(bdrv_get_co_io_comp(acb), 0, sizeof(CoroutineIOCompletion));
+    }
 
     if (!acb->is_write) {
         acb->req.error = bdrv_co_do_readv(bs, acb->req.sector,
@@ -4707,8 +4749,10 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
             acb->req.nb_sectors, acb->req.qiov, acb->req.flags);
     }
 
-    acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_em_bh, acb);
-    qemu_bh_schedule(acb->bh);
+    if (!bypass) {
+        acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_em_bh, acb);
+        qemu_bh_schedule(acb->bh);
+    }
 }
 
 static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
@@ -4722,8 +4766,18 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
 {
     Coroutine *co;
     BlockDriverAIOCBCoroutine *acb;
+    const AIOCBInfo *aiocb_info;
+    bool bypass;
 
-    acb = qemu_aio_get(&bdrv_em_co_aiocb_info, bs, cb, opaque);
+    if (qemu_aio_get_bypass_co(bdrv_get_aio_context(bs))) {
+        aiocb_info = &bdrv_em_co_bypass_aiocb_info;
+        bypass = true;
+    } else {
+        aiocb_info = &bdrv_em_co_aiocb_info;
+        bypass = false;
+    }
+
+    acb = qemu_aio_get(aiocb_info, bs, cb, opaque);
     acb->req.sector = sector_num;
     acb->req.nb_sectors = nb_sectors;
     acb->req.qiov = qiov;
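The over-allocation trick the patch uses (aiocb_size = sizeof(BlockDriverAIOCBCoroutine) + sizeof(CoroutineIOCompletion), with bdrv_get_co_io_comp()/bdrv_get_aio_co() converting between the two halves by pointer arithmetic) can be shown in isolation. This is a simplified sketch with toy struct names, not the QEMU types; note that real code must also ensure the trailing struct's alignment requirement is satisfied at offset sizeof of the first struct:

```c
#include <assert.h>
#include <stdlib.h>

/* simplified stand-ins for the two structs in the patch */
typedef struct { int is_write; } AIOCB;
typedef struct { int ret; int bypass; } IOComp;

/* mirror bdrv_get_co_io_comp(): the completion lives right after the AIOCB */
static IOComp *get_io_comp(AIOCB *acb)
{
    return (IOComp *)((char *)acb + sizeof(AIOCB));
}

/* mirror bdrv_get_aio_co(): walk back from the completion to the AIOCB */
static AIOCB *get_aiocb(IOComp *co)
{
    return (AIOCB *)((char *)co - sizeof(AIOCB));
}

/* one allocation covers both, like the bypass AIOCBInfo's aiocb_size */
static AIOCB *alloc_bypass_aiocb(void)
{
    return calloc(1, sizeof(AIOCB) + sizeof(IOComp));
}
```

The benefit is that the bypass path needs no second allocation and no extra field linking the AIOCB to its completion record.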
[Qemu-devel] [PATCH RFC 14/14] dataplane: virtio-blk: support multi virtqueue
This patch adds support for handling host notifies from multiple virtqueues, while still processing/submitting I/O in the one iothread. Currently this patch brings the improvement below with two virtqueues (against a single virtqueue):

    ---------------------------------------------------
            | VM in server host  | VM in laptop host
    ---------------------------------------------------
    JOBS=2  | +8%                | -11%
    ---------------------------------------------------
    JOBS=4  | +64%               | +29%
    ---------------------------------------------------

The reason is that commit 580b6b2aa2 (dataplane: use the QEMU block layer for I/O) doubled the average submission time for a single request, so iothread performance decreased. Based on QEMU 2.0, this single patch alone can achieve a very good improvement: http://marc.info/?l=linux-api&m=140377573830230&w=2

So I hope the QEMU block layer can be optimized for linux-aio, or maybe a fast path is needed for linux-aio.

Signed-off-by: Ming Lei ming@canonical.com
---
 hw/block/dataplane/virtio-blk.c |  209 ---
 include/hw/virtio/virtio-blk.h  |    1 +
 2 files changed, 153 insertions(+), 57 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 828fe99..bd66274 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -26,6 +26,11 @@
 
 #define REQ_POOL_SZ 128
 
+typedef struct {
+    EventNotifier notifier;
+    VirtIOBlockDataPlane *s;
+} VirtIOBlockNotifier;
+
 struct VirtIOBlockDataPlane {
     bool started;
     bool starting;
@@ -35,9 +40,10 @@ struct VirtIOBlockDataPlane {
     VirtIOBlkConf *blk;
     VirtIODevice *vdev;
 
-    Vring vring;                    /* virtqueue vring */
-    EventNotifier *guest_notifier;  /* irq */
-    QEMUBH *bh;                     /* bh for guest notification */
+    Vring *vring;                    /* virtqueue vring */
+    EventNotifier **guest_notifier;  /* irq */
+    uint64_t pending_guest_notifier; /* pending guest notifier for vq */
+    QEMUBH *bh;                      /* bh for guest notification */
 
     /* Note that these EventNotifiers are assigned by value.  This is
      * fine as long as you do not call event_notifier_cleanup on them
@@ -47,7 +53,9 @@ struct VirtIOBlockDataPlane {
     IOThread *iothread;
     IOThread internal_iothread_obj;
     AioContext *ctx;
-    EventNotifier host_notifier;        /* doorbell */
+    VirtIOBlockNotifier *host_notifier; /* doorbell */
+    uint64_t pending_host_notifier;     /* pending host notifier for vq */
+    QEMUBH *host_notifier_bh;           /* for handling host notifier */
 
     /* Operation blocker on BDS */
     Error *blocker;
@@ -60,20 +68,26 @@ struct VirtIOBlockDataPlane {
 };
 
 /* Raise an interrupt to signal guest, if necessary */
-static void notify_guest(VirtIOBlockDataPlane *s)
+static void notify_guest(VirtIOBlockDataPlane *s, unsigned int qid)
 {
-    if (!vring_should_notify(s->vdev, &s->vring)) {
-        return;
+    if (vring_should_notify(s->vdev, &s->vring[qid])) {
+        event_notifier_set(s->guest_notifier[qid]);
     }
-
-    event_notifier_set(s->guest_notifier);
 }
 
 static void notify_guest_bh(void *opaque)
 {
     VirtIOBlockDataPlane *s = opaque;
+    unsigned int qid;
+    uint64_t pending = s->pending_guest_notifier;
+
+    s->pending_guest_notifier = 0;
 
-    notify_guest(s);
+    while ((qid = ffsl(pending))) {
+        qid--;
+        notify_guest(s, qid);
+        pending &= ~(1 << qid);
+    }
 }
 
 static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
@@ -81,7 +95,7 @@ static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
     VirtIOBlockDataPlane *s = req->dev->dataplane;
 
     stb_p(&req->in->status, status);
-    vring_push(&req->dev->dataplane->vring, &req->elem,
+    vring_push(&s->vring[req->qid], &req->elem,
                req->qiov.size + sizeof(*req->in));
 
     /* Suppress notification to guest by BH and its scheduled
@@ -90,17 +104,15 @@ static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
      * executed in dataplane aio context even after it is
      * stopped, so needn't worry about notification loss with BH.
      */
+    assert(req->qid < 64);
+    s->pending_guest_notifier |= (1 << req->qid);
     qemu_bh_schedule(s->bh);
 }
 
-static void handle_notify(EventNotifier *e)
+static void process_vq_notify(VirtIOBlockDataPlane *s, unsigned short qid)
 {
-    VirtIOBlockDataPlane *s = container_of(e, VirtIOBlockDataPlane,
-                                           host_notifier);
     VirtIOBlock *vblk = VIRTIO_BLK(s->vdev);
 
-    event_notifier_test_and_clear(&s->host_notifier);
-    bdrv_io_plug(s->blk->conf.bs);
     for (;;) {
         MultiReqBuffer mrb = {
             .num_writes = 0,
         };
        int ret;
 
        /*
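The completion side above coalesces per-queue guest notifications into a 64-bit pending bitmap which the BH then drains with an ffs-style loop (which also explains the patch's assumption that qid stays below 64). The drain loop can be shown standalone; this sketch uses __builtin_ffsll instead of the glibc ffsl the patch calls, purely for portability:

```c
#include <assert.h>
#include <stdint.h>

/* record which queue ids were notified, in drain order */
static unsigned notified[64];
static unsigned n_notified;

static void notify_one(unsigned qid)
{
    notified[n_notified++] = qid;
}

/* same loop shape as notify_guest_bh() in the patch */
static void drain_pending(uint64_t pending)
{
    int qid;

    while ((qid = __builtin_ffsll(pending))) {
        qid--;                        /* ffs-style results are 1-based */
        notify_one(qid);
        pending &= ~(1ULL << qid);    /* clear the bit we just handled */
    }
}
```

Queues are therefore always drained in ascending qid order, and setting a bit twice before the BH runs costs only a single notification.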
[Qemu-devel] [PATCH RFC 08/14] linux-aio: fix submit aio as a batch
In the enqueue path we can't complete requests, otherwise the coroutine may be re-entered recursively, so this patch fixes the issue with the ideas below:

- for -EAGAIN or partial completion, retry the submission via an introduced event handler
- for partial completion, also update the io queue
- for other failures, return the failure if in the enqueue path; otherwise, abort all queued I/O

Signed-off-by: Ming Lei ming@canonical.com
---
 block/linux-aio.c |   90 -
 1 file changed, 68 insertions(+), 22 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 7ac7e8c..5eb9c92 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -51,6 +51,7 @@ struct qemu_laio_state {
 
     /* io queue for submit at batch */
     LaioQueue io_q;
+    EventNotifier retry; /* handle -EAGAIN and partial completion */
 };
 
@@ -154,45 +155,80 @@ static void ioq_init(LaioQueue *io_q)
     io_q->plugged = 0;
 }
 
-static int ioq_submit(struct qemu_laio_state *s)
+static void abort_queue(struct qemu_laio_state *s)
+{
+    int i;
+
+    for (i = 0; i < s->io_q.idx; i++) {
+        struct qemu_laiocb *laiocb = container_of(s->io_q.iocbs[i],
+                                                  struct qemu_laiocb,
+                                                  iocb);
+        laiocb->ret = -EIO;
+        qemu_laio_process_completion(s, laiocb);
+    }
+}
+
+static int ioq_submit(struct qemu_laio_state *s, bool enqueue)
 {
     int ret, i = 0;
     int len = s->io_q.idx;
+    int j = 0;
 
-    do {
-        ret = io_submit(s->ctx, len, s->io_q.iocbs);
-    } while (i++ < 3 && ret == -EAGAIN);
+    if (!len) {
+        return 0;
+    }
 
-    /* empty io queue */
-    s->io_q.idx = 0;
+    ret = io_submit(s->ctx, len, s->io_q.iocbs);
+    if (ret == -EAGAIN) {
+        event_notifier_set(&s->retry);
+        return 0;
+    } else if (ret < 0) {
+        if (enqueue) {
+            return ret;
+        }
 
-    if (ret < 0) {
-        i = 0;
-    } else {
-        i = ret;
+        /* in non-queue path, all IOs have to be completed */
+        abort_queue(s);
+        ret = len;
+    } else if (ret == 0) {
+        goto out;
     }
 
-    for (; i < len; i++) {
-        struct qemu_laiocb *laiocb =
-            container_of(s->io_q.iocbs[i], struct qemu_laiocb, iocb);
-
-        laiocb->ret = (ret < 0) ? ret : -EIO;
-        qemu_laio_process_completion(s, laiocb);
+    for (i = ret; i < len; i++) {
+        s->io_q.iocbs[j++] = s->io_q.iocbs[i];
     }
+
+ out:
+    /* update io queue */
+    s->io_q.idx -= ret;
+
     return ret;
 }
 
-static void ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
+static void ioq_submit_retry(EventNotifier *e)
+{
+    struct qemu_laio_state *s = container_of(e, struct qemu_laio_state, retry);
+
+    event_notifier_test_and_clear(e);
+    ioq_submit(s, false);
+}
+
+static int ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
 {
     unsigned int idx = s->io_q.idx;
 
+    if (unlikely(idx == s->io_q.size)) {
+        return -1;
+    }
+
     s->io_q.iocbs[idx++] = iocb;
     s->io_q.idx = idx;
 
-    /* submit immediately if queue is full */
-    if (idx == s->io_q.size) {
-        ioq_submit(s);
+    /* submit immediately if queue depth is above 2/3 */
+    if (idx > s->io_q.size * 2 / 3) {
+        return ioq_submit(s, true);
     }
+
+    return 0;
 }
 
 void laio_io_plug(BlockDriverState *bs, void *aio_ctx)
@@ -214,7 +250,7 @@ int laio_io_unplug(BlockDriverState *bs, void *aio_ctx, bool unplug)
     }
 
     if (s->io_q.idx > 0) {
-        ret = ioq_submit(s);
+        ret = ioq_submit(s, false);
     }
 
     return ret;
@@ -258,7 +294,9 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
             goto out_free_aiocb;
         }
     } else {
-        ioq_enqueue(s, iocbs);
+        if (ioq_enqueue(s, iocbs) < 0) {
+            goto out_free_aiocb;
+        }
     }
 
     return &laiocb->common;
@@ -272,6 +310,7 @@ void laio_detach_aio_context(void *s_, AioContext *old_context)
     struct qemu_laio_state *s = s_;
 
     aio_set_event_notifier(old_context, &s->e, NULL);
+    aio_set_event_notifier(old_context, &s->retry, NULL);
 }
 
 void laio_attach_aio_context(void *s_, AioContext *new_context)
@@ -279,6 +318,7 @@ void laio_attach_aio_context(void *s_, AioContext *new_context)
     struct qemu_laio_state *s = s_;
 
     aio_set_event_notifier(new_context, &s->e, qemu_laio_completion_cb);
+    aio_set_event_notifier(new_context, &s->retry, ioq_submit_retry);
 }
 
 void *laio_init(void)
@@ -295,9 +335,14 @@ void *laio_init(void)
     }
 
     ioq_init(&s->io_q);
+    if (event_notifier_init(&s->retry, false) < 0) {
+        goto out_notifer_init;
+    }
 
     return s;
 
+out_notifer_init:
+    io_destroy(s->ctx);
 out_close_efd:
     event_notifier_cleanup(&s->e);
 out_free_state:
@@ -310,6 +355,7 @@ void laio_cleanup(void *s_)
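A key detail the new ioq_submit() relies on is io_submit(2)'s partial-submission semantics: the kernel may accept only the first N of the iocbs it is given, and the remainder must be kept queued. The compaction step (the i/j copy loop plus the idx adjustment) can be modeled without libaio; this is a toy queue, not the QEMU LaioQueue:

```c
#include <assert.h>

#define QSZ 8

/* toy io queue: queue[0..idx) are pending, like LaioQueue in the patch */
static int queue[QSZ];
static int idx;

/* pretend the kernel accepted the first 'accepted' requests; shift the
 * leftovers to the front and fix up idx, mirroring the copy loop in the
 * patched ioq_submit() */
static int submit_partial(int accepted)
{
    int i, j = 0;

    for (i = accepted; i < idx; i++) {
        queue[j++] = queue[i];
    }
    idx -= accepted;
    return accepted;
}
```

The leftover entries stay in submission order, so a later retry (driven by the retry EventNotifier in the patch) re-submits them unchanged.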
[Qemu-devel] [PATCH RFC 12/14] hw/virtio/virtio-blk.h: introduce VIRTIO_BLK_F_MQ
Prepare for supporting multiple virtqueues per virtio-blk device.

Signed-off-by: Ming Lei ming@canonical.com
---
 include/hw/virtio/virtio-blk.h |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 45f8894..ad70c9a 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -42,6 +42,12 @@
 #define VIRTIO_BLK_F_TOPOLOGY 10 /* Topology information is available */
 #define VIRTIO_BLK_F_CONFIG_WCE 11 /* write cache configurable */
 
+/*
+ * support multi vqs, and virtio_blk_config.num_queues is only
+ * available when this feature is enabled
+ */
+#define VIRTIO_BLK_F_MQ 12
+
 #define VIRTIO_BLK_ID_BYTES 20 /* ID string length */
 
 struct virtio_blk_config
@@ -58,6 +64,8 @@ struct virtio_blk_config
     uint16_t min_io_size;
     uint32_t opt_io_size;
     uint8_t wce;
+    uint8_t unused;
+    uint16_t num_queues; /* must be at the end */
 } QEMU_PACKED;
 
 /* These two define direction. */
-- 
1.7.9.5
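The `uint8_t unused` pad byte matters: without it, the new num_queues field would sit at an odd offset in the packed config space. The check below reproduces the full virtio_blk_config layout; only wce/unused/num_queues appear in the hunk above, so the earlier fields are filled in from the virtio-blk spec layout as I recall it (an assumption, not part of this patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* full virtio-blk config layout; earlier fields taken from the spec */
struct virtio_blk_config {
    uint64_t capacity;
    uint32_t size_max;
    uint32_t seg_max;
    struct {
        uint16_t cylinders;
        uint8_t heads;
        uint8_t sectors;
    } geometry;
    uint32_t blk_size;
    uint8_t physical_block_exp;
    uint8_t alignment_offset;
    uint16_t min_io_size;
    uint32_t opt_io_size;
    uint8_t wce;
    uint8_t unused;       /* pad so num_queues lands on an even offset */
    uint16_t num_queues;  /* must be at the end */
} __attribute__((packed));
```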
[Qemu-devel] [PATCH RFC 04/14] Revert "raw-posix: drop raw_get_aio_fd() since it is no longer used"
This reverts commit 76ef2cf5493a215efc351f48ae7094d6c183fcac.

Reintroduce the raw_get_aio_fd() helper to enable coroutine bypass mode for raw images.

Signed-off-by: Ming Lei ming@canonical.com
---
 block/raw-posix.c     |   34 ++++++++++++++++++++++++++++++++++
 include/block/block.h |    9 +++++++++
 2 files changed, 43 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 2bcc73d..98b9626 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -2419,6 +2419,40 @@ static BlockDriver bdrv_host_cdrom = {
 };
 #endif /* __FreeBSD__ */
 
+#ifdef CONFIG_LINUX_AIO
+/**
+ * Return the file descriptor for Linux AIO
+ *
+ * This function is a layering violation and should be removed when it becomes
+ * possible to call the block layer outside the global mutex.  It allows the
+ * caller to hijack the file descriptor so I/O can be performed outside the
+ * block layer.
+ */
+int raw_get_aio_fd(BlockDriverState *bs)
+{
+    BDRVRawState *s;
+
+    if (!bs->drv) {
+        return -ENOMEDIUM;
+    }
+
+    if (bs->drv == bdrv_find_format("raw")) {
+        bs = bs->file;
+    }
+
+    /* raw-posix has several protocols so just check for raw_aio_readv */
+    if (bs->drv->bdrv_aio_readv != raw_aio_readv) {
+        return -ENOTSUP;
+    }
+
+    s = bs->opaque;
+    if (!s->use_aio) {
+        return -ENOTSUP;
+    }
+    return s->fd;
+}
+#endif /* CONFIG_LINUX_AIO */
+
 static void bdrv_file_init(void)
 {
     /*
diff --git a/include/block/block.h b/include/block/block.h
index 32d3676..8d15693 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -482,6 +482,15 @@ void bdrv_op_block_all(BlockDriverState *bs, Error *reason);
 void bdrv_op_unblock_all(BlockDriverState *bs, Error *reason);
 bool bdrv_op_blocker_is_empty(BlockDriverState *bs);
 
+#ifdef CONFIG_LINUX_AIO
+int raw_get_aio_fd(BlockDriverState *bs);
+#else
+static inline int raw_get_aio_fd(BlockDriverState *bs)
+{
+    return -ENOTSUP;
+}
+#endif
+
 enum BlockAcctType {
     BDRV_ACCT_READ,
     BDRV_ACCT_WRITE,
-- 
1.7.9.5
[Qemu-devel] [PATCH RFC 05/14] dataplane: enable selective bypassing coroutine
This patch enables selective bypassing of the coroutine in bdrv_co_aio_rw_vector() if the image format is raw.

With this patch, ~16% throughput improvement can be observed in my laptop-based VM test, and ~7% improvement in the server-based VM.

Signed-off-by: Ming Lei ming@canonical.com
---
 hw/block/dataplane/virtio-blk.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index d6ba65c..2093e4a 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -28,6 +28,7 @@ struct VirtIOBlockDataPlane {
     bool started;
     bool starting;
     bool stopping;
+    bool raw_format;
 
     VirtIOBlkConf *blk;
 
@@ -193,6 +194,8 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *blk,
     error_setg(&s->blocker, "block device is in use by data plane");
     bdrv_op_block_all(blk->conf.bs, s->blocker);
 
+    s->raw_format = (raw_get_aio_fd(blk->conf.bs) >= 0);
+
     *dataplane = s;
 }
 
@@ -262,6 +265,10 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
     /* Kick right away to begin processing requests already in vring */
     event_notifier_set(virtio_queue_get_host_notifier(vq));
 
+    if (s->raw_format) {
+        qemu_aio_set_bypass_co(s->ctx, true);
+    }
+
     /* Get this show started by hooking up our callbacks */
     aio_context_acquire(s->ctx);
     aio_set_event_notifier(s->ctx, &s->host_notifier, handle_notify);
@@ -291,6 +298,9 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
 
     aio_context_release(s->ctx);
 
+    if (s->raw_format) {
+        qemu_aio_set_bypass_co(s->ctx, false);
+    }
     /* Sync vring state back to virtqueue so that non-dataplane request
      * processing can continue when we disable the host notifier below.
      */
-- 
1.7.9.5
[Qemu-devel] [PATCH RFC 13/14] hw/block/virtio-blk: create num_queues vqs if dataplane is enabled
Now we only support multi vqs for dataplane case. Signed-off-by: Ming Lei ming@canonical.com --- hw/block/virtio-blk.c | 16 +++- include/hw/virtio/virtio-blk.h |3 +++ 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index ab99156..160b021 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -556,6 +556,7 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) blkcfg.physical_block_exp = get_physical_block_exp(s-conf); blkcfg.alignment_offset = 0; blkcfg.wce = bdrv_enable_write_cache(s-bs); +stw_p(blkcfg.num_queues, s-blk.num_queues); memcpy(config, blkcfg, sizeof(struct virtio_blk_config)); } @@ -590,6 +591,12 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev, uint32_t features) if (bdrv_is_read_only(s-bs)) features |= 1 VIRTIO_BLK_F_RO; +#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE +if (s-blk.num_queues 1) { +features |= 1 VIRTIO_BLK_F_MQ; +} +#endif + return features; } @@ -739,8 +746,13 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE Error *err = NULL; #endif +int i; static int virtio_blk_id; +#ifndef CONFIG_VIRTIO_BLK_DATA_PLANE +blk-num_queues = 1; +#endif + if (!blk-conf.bs) { error_setg(errp, drive property not set); return; @@ -765,7 +777,9 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) s-rq = NULL; s-sector_mask = (s-conf-logical_block_size / BDRV_SECTOR_SIZE) - 1; -s-vq = virtio_add_queue(vdev, 128, virtio_blk_handle_output); +for (i = 0; i blk-num_queues; i++) +s-vqs[i] = virtio_add_queue(vdev, 128, virtio_blk_handle_output); +s-vq = s-vqs[0]; s-complete_request = virtio_blk_complete_request; #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE virtio_blk_data_plane_create(vdev, blk, s-dataplane, err); diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h index ad70c9a..91489b0 100644 --- a/include/hw/virtio/virtio-blk.h +++ b/include/hw/virtio/virtio-blk.h @@ -50,6 
+50,8 @@ #define VIRTIO_BLK_ID_BYTES 20 /* ID string length */ +#define VIRTIO_BLK_MAX_VQS 16 /* max virtio queues supported now */ + struct virtio_blk_config { uint64_t capacity; @@ -132,6 +134,7 @@ typedef struct VirtIOBlock { VirtIODevice parent_obj; BlockDriverState *bs; VirtQueue *vq; +VirtQueue *vqs[VIRTIO_BLK_MAX_VQS]; void *rq; QEMUBH *bh; BlockConf *conf; -- 1.7.9.5
[Qemu-devel] [PATCH RFC 06/14] qemu/obj_pool.h: introduce object allocation pool
This patch introduces an object allocation pool for speeding up object allocation in the fast path.

Signed-off-by: Ming Lei ming@canonical.com
---
 include/qemu/obj_pool.h | 64 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 include/qemu/obj_pool.h

diff --git a/include/qemu/obj_pool.h b/include/qemu/obj_pool.h
new file mode 100644
index 000..94b5f49
--- /dev/null
+++ b/include/qemu/obj_pool.h
@@ -0,0 +1,64 @@
+#ifndef QEMU_OBJ_POOL_HEAD
+#define QEMU_OBJ_POOL_HEAD
+
+typedef struct {
+    unsigned int size;
+    unsigned int cnt;
+
+    void **free_obj;
+    int free_idx;
+
+    char *objs;
+} ObjPool;
+
+static inline void obj_pool_init(ObjPool *op, void *objs_buf, void **free_objs,
+                                 unsigned int obj_size, unsigned cnt)
+{
+    int i;
+
+    op->objs = (char *)objs_buf;
+    op->free_obj = free_objs;
+    op->size = obj_size;
+    op->cnt = cnt;
+
+    for (i = 0; i < op->cnt; i++) {
+        op->free_obj[i] = (void *)&op->objs[i * op->size];
+    }
+    op->free_idx = op->cnt;
+}
+
+static inline void *obj_pool_get(ObjPool *op)
+{
+    void *obj;
+
+    if (!op) {
+        return NULL;
+    }
+
+    if (op->free_idx <= 0) {
+        return NULL;
+    }
+
+    obj = op->free_obj[--op->free_idx];
+    return obj;
+}
+
+static inline bool obj_pool_has_obj(ObjPool *op, void *obj)
+{
+    return op && (unsigned long)obj >= (unsigned long)&op->objs[0] &&
+           (unsigned long)obj <= (unsigned long)&op->objs[(op->cnt - 1) * op->size];
+}
+
+static inline void obj_pool_put(ObjPool *op, void *obj)
+{
+    if (!op || !obj_pool_has_obj(op, obj)) {
+        return;
+    }
+
+    assert(op->free_idx < op->cnt);
+
+    op->free_obj[op->free_idx++] = obj;
+}
+
+#endif
-- 
1.7.9.5
[Qemu-devel] [PATCH RFC 07/14] dataplane: use object pool to speed up allocation for virtio blk request
g_slice_new(VirtIOBlockReq), its free pair, and accessing the instance are a bit slow since sizeof(VirtIOBlockReq) is more than 40KB, so use an object pool to speed up its allocation and release. With this patch, ~20% throughput improvement can be observed in my laptop-based VM test, and ~5% improvement is observed in the server-based VM.

Signed-off-by: Ming Lei ming@canonical.com
---
 hw/block/dataplane/virtio-blk.c | 12
 hw/block/virtio-blk.c           | 13 +++--
 include/hw/virtio/virtio-blk.h  |  2 ++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 2093e4a..828fe99 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -24,6 +24,8 @@
 #include "hw/virtio/virtio-bus.h"
 #include "qom/object_interfaces.h"
 
+#define REQ_POOL_SZ 128
+
 struct VirtIOBlockDataPlane {
     bool started;
     bool starting;
@@ -51,6 +53,10 @@ struct VirtIOBlockDataPlane {
     Error *blocker;
     void (*saved_complete_request)(struct VirtIOBlockReq *req,
                                    unsigned char status);
+
+    VirtIOBlockReq reqs[REQ_POOL_SZ];
+    void *free_reqs[REQ_POOL_SZ];
+    ObjPool req_pool;
 };
 
 /* Raise an interrupt to signal guest, if necessary */
@@ -238,6 +244,10 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
         return;
     }
 
+    vblk->obj_pool = &s->req_pool;
+    obj_pool_init(vblk->obj_pool, s->reqs, s->free_reqs,
+                  sizeof(VirtIOBlockReq), REQ_POOL_SZ);
+
     /* Set up guest notifier (irq) */
     if (k->set_guest_notifiers(qbus->parent, 1, true) != 0) {
         fprintf(stderr, "virtio-blk failed to set guest notifier, 
@@ -298,6 +308,8 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
 
     aio_context_release(s->ctx);
 
+    vblk->obj_pool = NULL;
+
     if (s->raw_format) {
         qemu_aio_set_bypass_co(s->ctx, false);
     }
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index c241c50..2a11bc4 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -31,7 +31,11 @@
 
 VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
 {
-    VirtIOBlockReq *req = g_slice_new(VirtIOBlockReq);
+    VirtIOBlockReq *req = obj_pool_get(s->obj_pool);
+
+    if (!req) {
+        req = g_slice_new(VirtIOBlockReq);
+    }
     req->dev = s;
     req->qiov.size = 0;
     req->next = NULL;
@@ -41,7 +45,11 @@ VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
 void virtio_blk_free_request(VirtIOBlockReq *req)
 {
     if (req) {
-        g_slice_free(VirtIOBlockReq, req);
+        if (obj_pool_has_obj(req->dev->obj_pool, req)) {
+            obj_pool_put(req->dev->obj_pool, req);
+        } else {
+            g_slice_free(VirtIOBlockReq, req);
+        }
     }
 }
 
@@ -801,6 +809,7 @@ static void virtio_blk_instance_init(Object *obj)
 {
     VirtIOBlock *s = VIRTIO_BLK(obj);
 
+    s->obj_pool = NULL;
     object_property_add_link(obj, "iothread", TYPE_IOTHREAD,
                              (Object **)&s->blk.iothread,
                              qdev_prop_allow_set_link_before_realize,
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index afb7b8d..49ac234 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -18,6 +18,7 @@
 #include "hw/block/block.h"
 #include "sysemu/iothread.h"
 #include "block/block.h"
+#include "qemu/obj_pool.h"
 
 #define TYPE_VIRTIO_BLK "virtio-blk-device"
 #define VIRTIO_BLK(obj) \
@@ -135,6 +136,7 @@ typedef struct VirtIOBlock {
     Notifier migration_state_notifier;
     struct VirtIOBlockDataPlane *dataplane;
 #endif
+    ObjPool *obj_pool;
 } VirtIOBlock;
 
 typedef struct MultiReqBuffer {
-- 
1.7.9.5
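The ObjPool helper itself is introduced elsewhere in the series, so only its use is visible here. The idea — hand out preallocated slots from a free list and let the caller fall back to the regular allocator when the pool is exhausted — can be sketched in Python. This is an illustrative model, not QEMU's actual ObjPool API:

```python
class ObjPool(object):
    """Fixed-size pool: hand out preallocated slots, signal exhaustion with None."""

    def __init__(self, size):
        # Preallocate all objects up front (mirrors the reqs[] array)
        self.slots = [bytearray(64) for _ in range(size)]
        self.free = list(self.slots)  # free list starts full

    def get(self):
        # Pop a preallocated object, or return None when the pool is empty;
        # the caller then falls back to the general-purpose allocator
        return self.free.pop() if self.free else None

    def put(self, obj):
        self.free.append(obj)

    def has_obj(self, obj):
        # Mirrors obj_pool_has_obj(): was this object allocated from the pool?
        return any(obj is s for s in self.slots)

pool = ObjPool(2)
a = pool.get()
b = pool.get()
c = pool.get()   # pool exhausted: None, a real caller would g_slice_new() here
```

The fallback path is what keeps the patch safe: requests beyond REQ_POOL_SZ in flight are still served, just without the speedup.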
[Qemu-devel] [PATCH RFC 09/14] linux-aio: increase max event to 256
This patch increases max events to 256 for the coming virtio-blk multi virtqueue support.

Signed-off-by: Ming Lei ming@canonical.com
---
 block/linux-aio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 5eb9c92..c06a57d 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -23,7 +23,7 @@
  * than this we will get EAGAIN from io_submit which is communicated to
  * the guest as an I/O error.
  */
-#define MAX_EVENTS 128
+#define MAX_EVENTS 256
 
 #define MAX_QUEUED_IO 128
-- 
1.7.9.5
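The comment quoted in the hunk describes the behaviour this constant caps: once more requests are in flight than the event limit, io_submit fails with EAGAIN and the guest sees an I/O error. A toy model of that bounded-submission behaviour (illustrative names, not QEMU code):

```python
class BoundedQueue(object):
    """Toy model of a submission queue capped the way MAX_EVENTS caps linux-aio."""

    def __init__(self, max_events):
        self.max_events = max_events
        self.in_flight = 0

    def submit(self):
        if self.in_flight >= self.max_events:
            return "EAGAIN"      # over the cap: surfaces as a guest I/O error
        self.in_flight += 1
        return "OK"

    def complete(self):
        # A completion frees one slot for future submissions
        self.in_flight -= 1

ctx = BoundedQueue(2)
results = [ctx.submit() for _ in range(3)]   # third submit exceeds the cap
```

Raising the cap from 128 to 256 simply moves the point at which the third-style submit starts failing, which matters once multiple virtqueues feed the same context.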
[Qemu-devel] [PATCH RFC 11/14] hw/virtio-pci: introduce num_queues property
This patch introduces the 'num_queues' parameter and prepares for supporting multiple vqs in virtio-blk.

Signed-off-by: Ming Lei ming@canonical.com
---
 hw/block/virtio-blk.c          | 1 +
 include/hw/virtio/virtio-blk.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 2a11bc4..ab99156 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -826,6 +826,7 @@ static Property virtio_blk_properties[] = {
 #endif
 #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
     DEFINE_PROP_BIT("x-data-plane", VirtIOBlock, blk.data_plane, 0, false),
+    DEFINE_PROP_UINT32("num_queues", VirtIOBlock, blk.num_queues, 1),
 #endif
     DEFINE_PROP_END_OF_LIST(),
 };
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 49ac234..45f8894 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -114,6 +114,7 @@ struct VirtIOBlkConf
     uint32_t scsi;
     uint32_t config_wce;
     uint32_t data_plane;
+    uint32_t num_queues;
 };
 
 struct VirtIOBlockDataPlane;
-- 
1.7.9.5
Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases (was: [ANNOUNCE] QEMU 2.1.0-rc2 is now available)
On 16 July 2014 17:28, Stefan Weil s...@weilnetz.de wrote: a recent commit (e49ab19fcaa617ad6cdfe1ac401327326b6a2552) increased the requirements for libiscsi from 1.4.0 to 1.9.0. From a user's point of view, this change results in a regression for Debian and similar Linux distributions: QEMU no longer includes iscsi features. In this special case, the current Debian stable includes QEMU packages with libiscsi 1.4, but new builds won't include it because that version is now too old. Debian testing includes a brand new libiscsi, but it does not include libiscsi.pc, so pkg-config won't know that it is available and configure will disable libiscsi. I have a patch which fixes this, so QEMU for Debian testing could include libiscsi again. Is a feature regression like this one acceptable? Do we need additional testing (maybe run the build bots with --enable-xxx, so builds fail when xxx no longer works)? In general, we should try to avoid feature regressions, but it's going to be a case-by-case thing. In this particular instance, upstream libiscsi don't recommend pre-1.9 for production use (as the commit message documents), and I don't think we would be doing our users any favours by allowing them to build something that's likely to be broken. We should of course flag up this sort of "minimum version of our dependencies has been bumped" info in the release notes. I think that "does QEMU still build with all the features we need on our distro?" has to be testing done by the people who package and maintain QEMU for each distro -- they're the only people who know which configuration options they enable and which baseline versions of their distro they still care about packaging mainline QEMU for. thanks -- PMM
Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases
On 16/07/2014 18:28, Stefan Weil wrote: Debian testing includes a brand new libiscsi, but it does not include libiscsi.pc, so pkg-config won't know that it is available and configure will disable libiscsi. That's a packaging bug. I have a patch which fixes this, so QEMU for Debian testing could include libiscsi again. Is a feature regression like this one acceptable? Do we need additional testing (maybe run the build bots with --enable-xxx, so builds fail when xxx no longer works)? As mentioned in the e49ab19fcaa617ad6cdfe1ac401327326b6a2552 commit message, this was intentional. I was reluctant to do it, but ultimately Peter Lieven convinced me that it isn't just about using fancy new APIs; libiscsi was too buggy to be useful until release 1.8.0 (even 1.9.0 requires a patch to avoid segfaults, and many more if you want to run it on ARM). Paolo
Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
* Paolo Bonzini (pbonz...@redhat.com) wrote: On 16/07/2014 13:52, Dr. David Alan Gilbert wrote: * Paolo Bonzini (pbonz...@redhat.com) wrote: On 16/07/2014 11:37, Dr. David Alan Gilbert wrote:

+
+    /* If it's already open, return it */
+    if (qfs->file->return_path) {
+        return qfs->file->return_path;

Wouldn't this leave a dangling file descriptor if you call socket_dup_return_path twice, and then close the original QEMUFile? Hmm - how? The problem is that there is no reference count on QEMUFile, so if you do f1 = open_return_path(f0); f2 = open_return_path(f0); /* now f1 == f2 */ qemu_fclose(f1); /* now f2 is dangling */ I think from the way I'm using it, I can remove the optimisation, but I do need to check. I'm not too sure what your worry is about 'f2' in this case; I guess the caller needs to know that it should only close the return path once - is that your worry? Yes. The API is not well encapsulated; a random caller of open_return_path does not know (and cannot know) whether it should close the returned file or not. OK, then yes I'll give that a go taking out those optimisations. I'm more nervous about dropping that one, because the current scheme does provide a clean way of finding the forward path if you've got the reverse (although I don't think I make use of it). If it isn't used, why keep it? It just felt pleasantly symmetric being able to get the forward path by asking for the return path on the return path; but I can remove it. Source side Forward path - written by migration thread : It's OK for this to be blocking, but we end up with it being non-blocking, and modify the socket code to emulate blocking. This likely has a performance impact though. The first migration thread code drop from Juan already improved throughput a lot, even if it kept the iothread all the time and only converted from nonblocking writes to blocking. 
Can you give some more reasoning as to why you think this will hit the performance so much; I thought the output buffers were quite big anyway. I don't really know, it's Return path - opened by main thread, read by fd_handler on main thread : Must be non-blocking so as not to block the main thread while waiting for a partially sent command. Why can't you handle this in the migration thread (or a new postcopy thread on the source side)? Then it can stay blocking. Handling it within the migration thread would make it much more complicated (which would be bad since it's already complex enough); Ok. I'm not sure why it is more complicated since migration is essentially two-phase, one where the source drives it and one where the source just waits for requests, but I'll trust you on this. :) It's not as clean a split like that; during the postcopy phase we still do the linear page scan to send pages before they're requested, so the main migration thread code keeps going. (There's an 'interesting' balance here, send too many linear pages and they get in the way of the postcopy requests and increase the latency, but sending them means that you get a lot of the pages without having to request them which is 0 latency) Destination side Forward path - read by main thread This must be nonblocking so that the monitor keeps responding. Interesting, I suspect none of the code in there is set up for that at the moment, so how does that work during migration at the moment? It sure is. :) Oh so it is; I missed the 'qemu_set_nonblock(fd);' in process_incoming_migration On the destination side, migration is done in a coroutine (see process_incoming_migration) so it's all transparent. 
Only socket_get_buffer has to know about this:

    len = qemu_recv(s->fd, buf, size, 0);
    if (len != -1) {
        break;
    }
    if (socket_error() == EAGAIN) {
        yield_until_fd_readable(s->fd);
    } else if (socket_error() != EINTR) {
        break;
    }

If the socket is put in blocking mode recv will never return EAGAIN, so this code will only run if the socket is nonblocking. OK, yes. Actually, I see I missed something here; this should be: Destination side Forward path - read by main thread, and listener thread (see the separate mail that described that listener thread) and that means that once postcopy is going (and the listener thread started) it can't block the monitor. Ok, so the listener thread can do socket_set_block(qemu_get_fd(file)) once it gets its hands on the QEMUFile. Return path - opened by main thread, written by main thread AND postcopy thread (protected by rp_mutex) When does the main thread need to write? Not much; the only things the main thread currently responds to are the ReqAck (ping like) requests; those are turning out to be very useful during
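The retry loop socket_get_buffer implements — try a nonblocking read, yield until the fd is readable on EAGAIN, retry silently on EINTR, and give up on any other error — can be sketched generically. The callbacks here stand in for QEMU's socket read and coroutine yield; this is a model of the control flow, not QEMU code:

```python
import errno

def read_nonblocking(sock_recv, yield_until_readable):
    """Nonblocking read loop: yield and retry on EAGAIN, retry on EINTR,
    bail out on any other error.

    sock_recv() returns (data, None) on success or (None, errno_value)
    on failure; yield_until_readable() models the coroutine yield.
    """
    while True:
        data, err = sock_recv()
        if err is None:
            return data                 # successful read
        if err == errno.EAGAIN:
            yield_until_readable()      # nothing to read yet: let others run
        elif err != errno.EINTR:
            return None                 # hard error: give up
        # EINTR: simply retry the read
```

The key property, as the thread notes, is that a socket in blocking mode never produces EAGAIN, so the yield branch only ever runs for nonblocking sockets.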
Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases
Am 16.07.2014 19:23, schrieb ronnie sahlberg: On Wed, Jul 16, 2014 at 10:11 AM, Stefan Weil s...@weilnetz.de wrote: Am 16.07.2014 18:49, schrieb Paolo Bonzini: On 16/07/2014 18:28, Stefan Weil wrote: Debian testing includes a brand new libiscsi, but it does not include libiscsi.pc, so pkg-config won't know that it is available and configure will disable libiscsi. That's a packaging bug. CC'ing Michael as he is the Debian maintainer of this package and Aurélien who maintains QEMU for Debian. Michael, should I send a Debian bug report for libiscsi-dev? Would an update of libiscsi for Debian stable be reasonable if versions older than 1.9 are too buggy to be used? If you ask Debian to upgrade, could you ask them to wait and upgrade after I have released the next version, hopefully, if all goes well, at the end of this week? If someone from Ubuntu reads here, this is also for you. Your latest LTS release also contains 1.4.0. It contains new functionality, thanks to plieven, to better handle cases where active/passive storage arrays perform failover. Yes, and it fixes yet another bug in serial logic. Not a big one, but it could cause protocol errors. I must admit that I'm a little bit surprised because iSCSI support worked for me quite well the last time I used it with Debian wheezy. I think, and plieven please correct me if I am wrong, earlier versions would work reasonably well for basic use, but there were bugs and gaps in functionality that made them ill suited for enterprise environments. It depends how you define basic use. It causes severe protocol violations, especially on reconnects, and it will simply stop sending PDUs if CmdSN wraps from 0xffffffff to 0x0. That's OK for "try and mount an iSCSI volume and see if it works", but it's not usable for production use, enterprise or not. I mentioned the most critical bugs in the commit message for the libiscsi version bump to 1.8.0 (which Paolo later increased to 1.9.0 to make the BUSY handling possible). Peter
Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases
Am 16.07.2014 18:46, schrieb Peter Maydell: On 16 July 2014 17:28, Stefan Weil s...@weilnetz.de wrote: a recent commit (e49ab19fcaa617ad6cdfe1ac401327326b6a2552) increased the requirements for libiscsi from 1.4.0 to 1.9.0. From a user's point of view, this change results in a regression for Debian and similar Linux distributions: QEMU no longer includes iscsi features. In this special case, the current Debian stable includes QEMU packages with libiscsi 1.4, but new builds won't include it because that version is now too old. Debian testing includes a brand new libiscsi, but it does not include libiscsi.pc, so pkg-config won't know that it is available and configure will disable libiscsi. I have a patch which fixes this, so QEMU for Debian testing could include libiscsi again. Is a feature regression like this one acceptable? Do we need additional testing (maybe run the build bots with --enable-xxx, so builds fail when xxx no longer works)? In general, we should try to avoid feature regressions, but it's going to be a case-by-case thing. In this particular instance, upstream libiscsi don't recommend pre-1.9 for production use (as the commit message documents), and I don't think we would be doing our users any favours by allowing them to build something that's likely to be broken. We should of course flag up this sort of "minimum version of our dependencies has been bumped" info in the release notes. I will update the Wiki. Peter I think that "does QEMU still build with all the features we need on our distro?" has to be testing done by the people who package and maintain QEMU for each distro -- they're the only people who know which configuration options they enable and which baseline versions of their distro they still care about packaging mainline QEMU for. thanks -- PMM
Re: [Qemu-devel] [PATCH 1/2] block: Add Error argument to bdrv_refresh_limits()
On 07/16/2014 09:48 AM, Kevin Wolf wrote:

 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  block.c                   | 33 +++--
  block/iscsi.c             |  3 +--
  block/qcow2.c             |  4 +---
  block/qed.c               |  4 +---
  block/raw-posix.c         |  4 +---
  block/raw_bsd.c           |  3 +--
  block/stream.c            |  2 +-
  block/vmdk.c              |  4 +---
  include/block/block.h     |  2 +-
  include/block/block_int.h |  2 +-
  10 files changed, 32 insertions(+), 29 deletions(-)

     /* Take some limits from the children as a default */
     if (bs->file) {
-        bdrv_refresh_limits(bs->file);
+        bdrv_refresh_limits(bs->file, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }

Nice that you are no longer ignoring failure from the child.

 @@ -1154,7 +1167,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
     bdrv_op_unblock(bs->backing_hd, BLOCK_OP_TYPE_COMMIT,
                     bs->backing_blocker);
 out:
-    bdrv_refresh_limits(bs);
+    bdrv_refresh_limits(bs, NULL);
 }
 
 /*
 @@ -1778,7 +1791,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
                                BDRV_O_CACHE_WB);
     reopen_state->bs->read_only = !(reopen_state->flags & BDRV_O_RDWR);
-    bdrv_refresh_limits(reopen_state->bs);
+    bdrv_refresh_limits(reopen_state->bs, NULL);
 
 +++ b/block/stream.c
 @@ -76,7 +76,7 @@ static void close_unused_images(BlockDriverState *top, BlockDriverState *base,
         bdrv_unref(unused);
     }
-    bdrv_refresh_limits(top);
+    bdrv_refresh_limits(top, NULL);
 }

Should these three callers be concerned about failure? If so, would error_abort be better than NULL? But as for this patch, you are preserving existing semantics, so you could save it for a later patch.

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org
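The pattern under review — a parent's refresh first refreshes its child and propagates any child failure upward instead of silently continuing — can be illustrated outside of C. This is a Python sketch with made-up node names and an error list standing in for QEMU's Error** convention, not the actual block-layer API:

```python
def refresh_limits(node, errors):
    """Refresh a node's limits; a child failure aborts the parent refresh.

    `node` is a dict; its optional "file" entry models bs->file. `errors`
    collects failure messages, playing the role of an Error **errp.
    """
    child = node.get("file")
    if child is not None:
        refresh_limits(child, errors)
        if errors:
            return            # propagate the child's failure, stop here
    if node.get("broken"):
        errors.append("cannot detect alignment for %s" % node["name"])
        return
    # Only reached when the node and all its children refreshed cleanly
    node["limits"] = {"opt_mem_align": 4096}

errs = []
tree = {"name": "qcow2", "file": {"name": "raw", "broken": True}}
refresh_limits(tree, errs)    # child fails, so the parent gets no limits
```

This also shows why the reviewer asks about the NULL callers: passing a collector that is ignored would re-silence exactly the failures this structure surfaces.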
Re: [Qemu-devel] [PATCH 2/2] raw-posix: Fail gracefully if no working alignment is found
On 07/16/2014 09:48 AM, Kevin Wolf wrote: If qemu couldn't find out what O_DIRECT alignment to use with a given file, it would run into assert(bdrv_opt_mem_align(bs) != 0); in block.c and confuse users. This adds a more descriptive error message for such cases. Signed-off-by: Kevin Wolf kw...@redhat.com --- block/raw-posix.c | 35 +++ 1 file changed, 27 insertions(+), 8 deletions(-) Reviewed-by: Eric Blake ebl...@redhat.com -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On Wed, Jul 16, 2014 at 5:24 PM, Andrey Korolyov and...@xdel.ru wrote: On Wed, Jul 16, 2014 at 3:52 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote: On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote: On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini pbonz...@redhat.com wrote: On 15/07/2014 23:25, Andrey Korolyov wrote: On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote: On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov and...@xdel.ru wrote: On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah amit.s...@redhat.com wrote: On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote: Hello, the issue is not specific to the iothread code because generic virtio-blk also hangs up: Do you know which version works well? If you could bisect, that'll help a lot. Thanks, Amit Hi, 2.0 works definitely well. I`ll try to finish bisection today, though every step takes about 10 minutes to complete. Yay! It is even outside of virtio-blk. commit 9b1786829aefb83f37a8f3135e3ea91c56001b56 Author: Marcelo Tosatti mtosa...@redhat.com Date: Tue Jun 3 13:34:48 2014 -0300 kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation Ensure proper env->tsc value for kvmclock_current_nsec calculation. Reported-by: Marcin Gibuła m.gib...@beyond.pl Cc: qemu-sta...@nongnu.org Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com Andrey, Can you please provide instructions on how to create a reproducible environment? The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing. 
diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 272a88a..feb5fc5 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -17,7 +17,6 @@
 #include "qemu/host-utils.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
-#include "sysemu/cpus.h"
 #include "hw/sysbus.h"
 #include "hw/kvm/clock.h"
 
@@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
 
     cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
 
-    assert(time.tsc_timestamp <= migration_tsc);
     delta = migration_tsc - time.tsc_timestamp;
     if (time.tsc_shift < 0) {
         delta >>= -time.tsc_shift;
@@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
     if (s->clock_valid) {
         return;
     }
-
-    cpu_synchronize_all_states();
     ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
     if (ret < 0) {
         fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
diff --git a/migration.c b/migration.c
index 8d675b3..34f2325 100644
--- a/migration.c
+++ b/migration.c
@@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
                 qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
                 old_vm_running = runstate_is_running();
 
+                cpu_synchronize_all_states();
                 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
                 if (ret >= 0) {
                     qemu_file_set_rate_limit(s->file, INT64_MAX);

It could also be useful to apply the above patch _and_ revert a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce. Paolo Yes, it solved the issue for me! (it took much time to check because most of the country's Debian mirrors went inconsistent for some reason) Also trivial addition:

diff --git a/migration.c b/migration.c
index 34f2325..65d1c88 100644
--- a/migration.c
+++ b/migration.c
@@ -25,6 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "sysemu/cpus.h"

And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf ? That is, test with a stock qemu.git tree and the patch sent today, on this thread, to move cpu_synchronize_all_states ? 
The main reason for things to work for me is a revert of 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, not adding any other patches. I had tested two cases, one with Alexander's patch completely reverted plus the suggestion from Marcelo, and one with only the revert of 9b178682 plus the same suggestion. The difference is that until Alexander's patch is reverted, live migration always fails by the timeout value, and when it is reverted, migration always succeeds in 8-10 seconds. Appropriate
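For reference, the `tsc_shift` arithmetic visible in the kvmclock_current_nsec hunk above follows the pvclock convention: shift the TSC delta left or right by `tsc_shift`, then multiply by `tsc_to_system_mul` and take the top 32 bits to get nanoseconds. A sketch of that scaling with illustrative values (the multipliers here are chosen for readability, not taken from real hardware):

```python
def pvclock_scale(tsc_delta, tsc_shift, tsc_to_system_mul):
    """Scale a TSC delta to nanoseconds the way pvclock defines it.

    A negative tsc_shift shifts the delta right, a positive one left;
    tsc_to_system_mul is a 32.32 fixed-point multiplier.
    """
    if tsc_shift < 0:
        tsc_delta >>= -tsc_shift
    else:
        tsc_delta <<= tsc_shift
    return (tsc_delta * tsc_to_system_mul) >> 32

# mul = 1 << 32 means "1.0": 1000 ticks, shift -1 -> 500 ns
half_rate = pvclock_scale(1000, -1, 1 << 32)
```

The removed assert guarded the `migration_tsc - time.tsc_timestamp` subtraction: if the recorded timestamp ever exceeds the migration TSC, the unsigned delta wraps, which is exactly the kind of inconsistency the cpu_synchronize_all_states() placement debated in this thread is meant to prevent.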
[Qemu-devel] [PATCH V3 1/5] docs: Specification for the image fuzzer
The 'Overall fuzzer requirements' chapter contains the current product vision and features done and to be done. This chapter is still in progress. Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/docs/image-fuzzer.txt | 239 +++ 1 file changed, 239 insertions(+) create mode 100644 tests/image-fuzzer/docs/image-fuzzer.txt diff --git a/tests/image-fuzzer/docs/image-fuzzer.txt b/tests/image-fuzzer/docs/image-fuzzer.txt new file mode 100644 index 000..c1594a0 --- /dev/null +++ b/tests/image-fuzzer/docs/image-fuzzer.txt @@ -0,0 +1,239 @@ +# Specification for the fuzz testing tool +# +# Copyright (C) 2014 Maria Kustova mari...@catit.be +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. + + +Image fuzzer + + +Description +--- + +The goal of the image fuzzer is to catch crashes of qemu-io/qemu-img +by providing them randomly corrupted images. +Test images are generated from scratch and have a valid inner structure with some +elements, e.g. L1/L2 tables, having random invalid values. + + +Test runner +--- + +The test runner generates test images, executes tests utilizing generated +images, reports their results and collects all test related artifacts (logs, +core dumps, test images). +A test means execution of all available commands under test with the same +generated test image. +By default, the test runner generates new tests and executes them until +keyboard interruption. But if a test seed is specified via the '--seed' runner +parameter, then only one test with this seed will be executed, and after it +finishes the runner will exit. + +The runner uses an external image fuzzer to generate test images. An image +generator should be specified as a mandatory parameter of the test runner. +For details about interactions between the runner and fuzzers, see 'Module +interfaces'. + +The runner activates generation of core dumps during test executions, but it +assumes that core dumps will be generated in the current working directory. +For comprehensive test results, please set up your test environment +properly. + +Paths to binaries under test (SUTs) qemu-img and qemu-io are retrieved from +environment variables. If the environment check fails the runner will +use SUTs installed in system paths. +qemu-img is required for creation of backing files, so it's mandatory to set +the related environment variable if it's not installed in the system path. +For details about environment variables see qemu-iotests/check. + +The runner accepts via the '--config' argument JSON objects with lists of +fields expected to be fuzzed, e.g. + + '[["feature_name_table"], ["header", "l1_table_offset"]]' + +Each sublist can have one or two strings defining image structure elements. +In the latter case a parent element should be placed in the first position, +and a field name in the second one. + +The runner accepts lists of commands under test as JSON objects via +the '--command' argument. Each command is a list containing a SUT and all its +arguments, e.g. + + runner.py -c '[["qemu-io", "$test_img", "-c", "write $off $len"]]' + /tmp/test ../qcow2 + +For variable arguments the following aliases can be used: +- $test_img for a fuzzed img +- $off for an offset in the fuzzed image +- $len for a data size + +Values for the last two aliases will be generated based on the size of the virtual +disk in the fuzzed image. +If no commands are specified, the runner will execute commands from +the default list: +- qemu-img check +- qemu-img info +- qemu-img convert
- qemu-io -c read
- qemu-io -c write
- qemu-io -c aio_read
- qemu-io -c aio_write
- qemu-io -c flush
- qemu-io -c discard
- qemu-io -c truncate + + +Qcow2 image generator +- + +The 'qcow2' generator is a Python package providing the 'create_image' method as +a single public API. See details in 'Test runner/image fuzzer' in 'Module +interfaces'. + +Qcow2 contains two submodules: fuzz.py and layout.py. + +'fuzz.py' contains all fuzzing functions, one per image field. It's assumed +that after code analysis every field will have its own constraints for its value. +For now only universal potentially dangerous values are used, e.g. type limits +for integers or unsafe symbols as '%s' for strings. For bitmasks a random amount +of bits is set to ones. All fuzzed values are
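The '--seed' behaviour described above rests on one property of PRNG-driven fuzzing: recording the seed makes any generated image reproducible on demand. A minimal sketch of that pattern (the function name and the "image" it draws are illustrative, not the runner's actual code):

```python
import random

def generate_image(seed=None):
    """Seed the PRNG (recording the seed for later reproduction) and
    draw a fuzzed "image" as a list of random byte values."""
    if seed is None:
        # No seed given: pick one randomly, but remember it so the exact
        # failing image can be regenerated later
        seed = random.randint(0, 2**31 - 1)
    random.seed(seed)
    # Everything drawn from here on is fully determined by the seed
    return seed, [random.randrange(256) for _ in range(4)]

seed, image = generate_image(42)
_, image_again = generate_image(seed)   # same seed, identical image
```

This is why the runner can stop after a single test when '--seed' is given: one deterministic replay is all a crash investigation needs.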
[Qemu-devel] [PATCH V3 5/5] package: Public API for image-fuzzer/runner/runner.py
__init__.py provides the public API required by the test runner Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/qcow2/__init__.py | 1 + 1 file changed, 1 insertion(+) create mode 100644 tests/image-fuzzer/qcow2/__init__.py diff --git a/tests/image-fuzzer/qcow2/__init__.py b/tests/image-fuzzer/qcow2/__init__.py new file mode 100644 index 000..e2ebe19 --- /dev/null +++ b/tests/image-fuzzer/qcow2/__init__.py @@ -0,0 +1 @@ +from layout import create_image -- 1.9.3
[Qemu-devel] [PATCH V3 4/5] layout: Generator of fuzzed qcow2 images
The layout submodule of the qcow2 package creates a random valid image, randomly selects some of its fields, fuzzes them and writes the fuzzed image to a file. The fuzzing process can be controlled by an external configuration. For now only a header, header extensions and a backing file name are generated.

Signed-off-by: Maria Kustova mari...@catit.be
---
 tests/image-fuzzer/qcow2/layout.py | 359 +
 1 file changed, 359 insertions(+)
 create mode 100644 tests/image-fuzzer/qcow2/layout.py

diff --git a/tests/image-fuzzer/qcow2/layout.py b/tests/image-fuzzer/qcow2/layout.py
new file mode 100644
index 000..f475639
--- /dev/null
+++ b/tests/image-fuzzer/qcow2/layout.py
@@ -0,0 +1,359 @@
+# Generator of fuzzed qcow2 images
+#
+# Copyright (C) 2014 Maria Kustova mari...@catit.be
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see http://www.gnu.org/licenses/.
+#
+
+import random
+import struct
+import fuzz
+
+MAX_IMAGE_SIZE = 10 * 2**20
+# Standard sizes
+UINT32_S = 4
+UINT64_S = 8
+
+
+class Field(object):
+    """Atomic image element (field)
+
+    The class represents an image field as a quadruple of a data format
+    of its value necessary for packing to binary form, an offset from
+    the beginning of the image, a value and a name.
+
+    The field can be iterated as a list [format, offset, value]
+    """
+
+    __slots__ = ('fmt', 'offset', 'value', 'name')
+
+    def __init__(self, fmt, offset, val, name):
+        self.fmt = fmt
+        self.offset = offset
+        self.value = val
+        self.name = name
+
+    def __iter__(self):
+        return (x for x in [self.fmt, self.offset, self.value])
+
+    def __repr__(self):
+        return "Field(fmt='%s', offset=%d, value=%s, name=%s)" % \
+               (self.fmt, self.offset, str(self.value), self.name)
+
+
+class FieldsList(object):
+    """List of fields
+
+    The class allows access to a field in the list by its name and joins
+    several lists in one via in-place addition
+    """
+
+    def __init__(self, meta_data=None):
+        if meta_data is None:
+            self.data = []
+        else:
+            self.data = [Field(f[0], f[1], f[2], f[3])
+                         for f in meta_data]
+
+    def __getitem__(self, name):
+        return [x for x in self.data if x.name == name]
+
+    def __iter__(self):
+        return (x for x in self.data)
+
+    def __iadd__(self, other):
+        self.data += other.data
+        return self
+
+    def __len__(self):
+        return len(self.data)
+
+
+class Image(object):
+    """Qcow2 image object
+
+    This class allows to create qcow2 images with random valid structures
+    and values, fuzz them via the external qcow2.fuzz module and write the
+    result to a file
+    """
+
+    @staticmethod
+    def _size_params():
+        """Generate a random image size aligned to a random correct
+        cluster size
+        """
+        cluster_bits = random.randrange(9, 21)
+        cluster_size = 1 << cluster_bits
+        img_size = random.randrange(0, MAX_IMAGE_SIZE + 1,
+                                    cluster_size)
+        return [cluster_bits, img_size]
+
+    @staticmethod
+    def _header(cluster_bits, img_size, backing_file_name=None):
+        """Generate a random valid header"""
+        meta_header = [
+            ['>4s', 0, "QFI\xfb", 'magic'],
+            ['>I', 4, random.randint(2, 3), 'version'],
+            ['>Q', 8, 0, 'backing_file_offset'],
+            ['>I', 16, 0, 'backing_file_size'],
+            ['>I', 20, cluster_bits, 'cluster_bits'],
+            ['>Q', 24, img_size, 'size'],
+            ['>I', 32, 0, 'crypt_method'],
+            ['>I', 36, 0, 'l1_size'],
+            ['>Q', 40, 0, 'l1_table_offset'],
+            ['>Q', 48, 0, 'refcount_table_offset'],
+            ['>I', 56, 0, 'refcount_table_clusters'],
+            ['>I', 60, 0, 'nb_snapshots'],
+            ['>Q', 64, 0, 'snapshots_offset'],
+            ['>Q', 72, 0, 'incompatible_features'],
+            ['>Q', 80, 0, 'compatible_features'],
+            ['>Q', 88, 0, 'autoclear_features'],
+            # Only refcount_order = 4 is supported by the current (07.2014)
+            # implementation of QEMU
+            ['>I', 96, 4, 'refcount_order'],
+            ['>I', 100, 0, 'header_length']
+        ]
+        v_header = FieldsList(meta_header)
+
+        if v_header['version'][0].value == 2:
+            v_header['header_length'][0].value = 72
+        else:
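Each Field above pairs a struct format string with an offset, so serializing the image reduces to packing every value at its offset into a byte buffer. A toy version of that write-out step (not the actual layout.py code, which carries more per-field state):

```python
import struct

def pack_fields(fields, size):
    """Pack each (fmt, offset, value) triple into a bytearray image."""
    img = bytearray(size)   # zero-filled, like a sparse image
    for fmt, offset, value in fields:
        packed = struct.pack(fmt, value)
        # Overwrite exactly len(packed) bytes at the field's offset
        img[offset:offset + len(packed)] = packed
    return img

# Big-endian formats ('>') match the qcow2 on-disk byte order
img = pack_fields([('>4s', 0, b'QFI\xfb'),   # magic
                   ('>I', 4, 3)],            # version
                  8)
```

Making Field iterable as [format, offset, value] is what lets a packer like this unpack each field directly in the loop header.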
[Qemu-devel] [PATCH V3 3/5] fuzz: Fuzzing functions for qcow2 images
The fuzz submodule of the qcow2 image generator contains fuzzing functions for image fields. Each fuzzing function contains list of constraints and a call of a helper function that randomly selects a fuzzed value satisfied to one of constraints. For now constraints include only known as invalid or potentially dangerous values. But after investigation of code coverage by fuzz tests they will be expanded by heuristic values based on inner checks and flows of a program under test. Now fuzzing of a header, header extensions and a backing file name is supported. Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/qcow2/fuzz.py | 336 +++ 1 file changed, 336 insertions(+) create mode 100644 tests/image-fuzzer/qcow2/fuzz.py diff --git a/tests/image-fuzzer/qcow2/fuzz.py b/tests/image-fuzzer/qcow2/fuzz.py new file mode 100644 index 000..b576259 --- /dev/null +++ b/tests/image-fuzzer/qcow2/fuzz.py @@ -0,0 +1,336 @@ +# Fuzzing functions for qcow2 fields +# +# Copyright (C) 2014 Maria Kustova mari...@catit.be +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. 
+#
+
+import random
+
+
+UINT32 = 0xffffffff
+UINT64 = 0xffffffffffffffff
+# Most significant bit orders
+UINT32_M = 31
+UINT64_M = 63
+# Fuzz vectors
+UINT32_V = [0, 0x100, 0x1000, 0x1, 0x10, UINT32/4, UINT32/2 - 1,
+            UINT32/2, UINT32/2 + 1, UINT32 - 1, UINT32]
+UINT64_V = UINT32_V + [0x100, 0x1000, 0x1, UINT64/4,
+                       UINT64/2 - 1, UINT64/2, UINT64/2 + 1, UINT64 - 1,
+                       UINT64]
+STRING_V = ['%s%p%x%d', '.1024d', '%.2049d', '%p%p%p%p', '%x%x%x%x',
+            '%d%d%d%d', '%s%s%s%s', '%999s', '%08x', '%%20d', '%%20n',
+            '%%20x', '%%20s', '%s%s%s%s%s%s%s%s%s%s', '%p%p%p%p%p%p%p%p%p%p',
+            '%#0123456x%08x%x%s%p%d%n%o%u%c%h%l%q%j%z%Z%t%i%e%g%f%a%C%S%08x%%',
+            '%s x 129', '%x x 257']
+
+
+def random_from_intervals(intervals):
+    """Select a random integer number from the list of specified intervals
+
+    Each interval is a tuple of lower and upper limits of the interval. The
+    limits are included. Intervals in a list should not overlap.
+    """
+    total = reduce(lambda x, y: x + y[1] - y[0] + 1, intervals, 0)
+    r = random.randint(0, total - 1) + intervals[0][0]
+    temp = zip(intervals, intervals[1:])
+    for x in temp:
+        r = r + (r > x[0][1]) * (x[1][0] - x[0][1] - 1)
+    return r
+
+
+def random_bits(bit_ranges):
+    """Generate a random binary mask with ones in the specified bit ranges
+
+    bit_ranges is a list of tuples of lower and upper limits of the bit
+    positions that will be fuzzed. The limits are included. A random amount
+    of bits within the range limits will be set to ones. The mask is
+    returned in decimal integer format.
+    """
+    bit_numbers = []
+    # Select a random amount of random positions in bit_ranges
+    for rng in bit_ranges:
+        bit_numbers += random.sample(range(rng[0], rng[1] + 1),
+                                     random.randint(0, rng[1] - rng[0] + 1))
+    val = 0
+    # Set bits on the selected positions to ones
+    for bit in bit_numbers:
+        val |= 1 << bit
+    return val
+
+
+def truncate_string(strings, length):
+    """Return strings truncated to the specified length"""
+    if type(strings) == list:
+        return [s[:length] for s in strings]
+    else:
+        return strings[:length]
+
+
+def int_validator(current, intervals):
+    """Return a random value from intervals not equal to the current.
+
+    This function is useful for selection from valid values except the
+    current one.
+    """
+    val = random_from_intervals(intervals)
+    if val == current:
+        return int_validator(current, intervals)
+    else:
+        return val
+
+
+def bit_validator(current, bit_ranges):
+    """Return a random bit mask not equal to the current.
+
+    This function is useful for selection from valid values except the
+    current one.
+    """
+    val = random_bits(bit_ranges)
+    if val == current:
+        return bit_validator(current, bit_ranges)
+    else:
+        return val
+
+
+def string_validator(current, strings):
+    """Return a random string value from the list not equal to the current.
+
+    This function is useful for selection from valid values except the
+    current one.
+    """
+    val = random.choice(strings)
+    if val == current:
+        return string_validator(current,
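The pattern described in the commit message — a per-field fuzzing function holding a list of constraints, delegating to a selector that randomly picks a value satisfying one of them — can be sketched as follows. This is an illustrative Python 3 sketch, not the patch's actual API; the function names and constraint values are assumptions for demonstration.

```python
import random


def selector(current, constraints):
    """Pick a random fuzzed value satisfying one of the constraints.

    A constraint is an inclusive (lower, upper) interval or an explicit
    list of values; the pick is re-rolled if it equals the field's
    current value, so the field is guaranteed to actually change.
    """
    c = random.choice(constraints)
    val = random.randint(*c) if isinstance(c, tuple) else random.choice(c)
    return selector(current, constraints) if val == current else val


def version(current):
    """Hypothetical fuzzing function for the qcow2 'version' field:
    everything except the supported versions 2 and 3 counts as invalid."""
    return selector(current, [(0, 1), (4, 0xffffffff)])
```

A field's fuzzing function thus stays declarative — it only states which values are interesting — while the selection logic lives in one shared helper, which matches the structure of the patch.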
Re: [Qemu-devel] [PATCH v6 0/3] s390: Support for Hotplug of Standby Memory
On 06/30/2014 10:00 AM, Matthew Rosato wrote: This patchset adds support in s390 for a pool of standby memory, which can be set online/offline by the guest (i.e., via chmem). The standby pool of memory is allocated as the difference between the initial memory setting and the maxmem setting. As part of this work, additional results are provided for the Read SCP Information SCLP, and a new implementation is added for the Read Storage Element Information, Attach Storage Element, Assign Storage and Unassign Storage SCLPs, which enables the s390 guest to manipulate the standby memory pool. This patchset is based on work originally done by Jeng-Fang (Nick) Wang. Sample qemu command snippet: qemu -machine s390-ccw-virtio -m 1024M,maxmem=2048M,slots=32 -enable-kvm This will allocate 1024M of active memory, and another 1024M of standby memory. Example output from s390-tools lsmem:

0x0000000000000000-0x000000000fffffff   256  online   no    0-127
0x0000000010000000-0x000000001fffffff   256  online   yes   128-255
0x0000000020000000-0x000000003fffffff   512  online   no    256-511
0x0000000040000000-0x000000007fffffff  1024  offline  -     512-1023

Memory device size : 2 MB Memory block size : 256 MB Total online memory : 1024 MB Total offline memory: 1024 MB The guest can dynamically enable part or all of the standby pool via the s390-tools chmem, for example: chmem -e 512M And can attempt to dynamically disable: chmem -d 512M Ping... Changes for v6: * Fix in sclp.h - DeviceState parent -> SysBusDevice parent in struct sclpMemoryHotplugDev. * Fix in assign_storage - int this_subregion_size should be uint64_t. * Added information on how to test in the cover letter. Changes for v5: * Since ACPI memory hotplug is now in, removed Igor's patches from this set. * Updated sclp.c to use object_resolve_path() instead of object_property_find(). Changes for v4: * Remove initialization code from get_sclp_memory_hotplug_dev() and place it in its own function, init_sclp_memory_hotplug_dev(). * Add a hint to qemu-options.hx to note the fact that the memory size specified via -m might be forced to a boundary.
* Account for the legacy s390 machine, which does not support memory hotplug. * Fix a bug in sclp.c - Change memory hotplug device parent to sysbus. * Pulled latest version of Igor's patch. Matthew Rosato (3): sclp-s390: Add device to manage s390 memory hotplug virtio-ccw: Include standby memory when calculating storage increment sclp-s390: Add memory hotplug SCLPs hw/s390x/s390-virtio-ccw.c | 46 +-- hw/s390x/sclp.c| 289 +++- include/hw/s390x/sclp.h| 20 +++ qemu-options.hx|3 +- target-s390x/cpu.h | 18 +++ target-s390x/kvm.c |5 + 6 files changed, 366 insertions(+), 15 deletions(-)
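The standby pool sizing described above is pure arithmetic on the memory options; a trivial sketch of the stated rule (the function name is ours, not QEMU's):

```python
def standby_pool_mb(initial_mb, maxmem_mb):
    """Size of the s390 standby memory pool, per the cover letter:
    the difference between the initial memory setting (-m) and maxmem."""
    if maxmem_mb < initial_mb:
        raise ValueError("maxmem must be >= initial memory")
    return maxmem_mb - initial_mb
```

For the sample command line (`-m 1024M,maxmem=2048M`) this yields the 1024M standby pool shown as offline in the lsmem output.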
[Qemu-devel] [Bug 1318830] Re: High CPU usage on windows virtual machine
Hi, could you please try adding hyperv to the vm options, as shown here: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1308341/comments/10 Please let us know if that helps. ** Changed in: qemu (Ubuntu) Status: New = Incomplete -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1318830 Title: High CPU usage on windows virtual machine Status in QEMU: New Status in “qemu” package in Ubuntu: Incomplete Bug description: I got Ubuntu 14.04, with Qemu 2.0 and moved my windows VM to this new box, and made sure that what this article indicates was achieved https://www.kraxel.org/blog/2014/03/qemu-and-usb-tablet-cpu-consumtion/ I can attest that it works following the instructions, erasing the registry, etc. Unfortunately, with 4 cpus as below, I still see 60% CPU outside as shown by Top versus 0% CPU inside. My Kernel is 3.15.0-031500rc4-generic If some developer wants to log in and take a look, I am happy to help. The box is not in production and I take full responsibility. Until this is solved, KVM is not going to compete with Hyper-V or Vmware. Basically KVM is not suitable for the Enterprise as of yet. 
qemu-system-x86_64 -enable-kvm -name Production -S -machine pc- i440fx-2.0,accel=kvm,usb=off -cpu kvm64,+rdtscp,+pdpe1gb,+x2apic,+dca,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme,hv_relaxed,hv_vapic,hv_spinlocks=0xfff -m 4024 -realtime mlock=off -smp 4,sockets=1,cores=4,threads=1 -uuid e8701c5c-b542-0199-fd2a-1047df24770e -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/Production.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot strict=on -device piix3-usb- uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/Production.img,if=none,id=drive-virtio- disk0,format=raw -device virtio-blk- pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio- disk0,bootindex=1 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net- pci,netdev=hostnet0,id=net0,mac=00:16:3a:d2:cd:ea,bus=pci.0,addr=0x3 -netdev tap,fd=35,id=hostnet1,vhost=on,vhostfd=36 -device virtio-net- pci,netdev=hostnet1,id=net1,mac=52:54:00:70:fe:54,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa- serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -device VGA,id=video0,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x5 -device hda- duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon- pci,id=balloon0,bus=pci.0,addr=0x7 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1318830/+subscriptions
[Qemu-devel] pause and restore function before and after QEMU Live Snapshot
Hi folks, I am going to make a patch, and need to find the pause and thaw (restore) functions invoked before and after taking a live QEMU snapshot, respectively. Basically, I'm taking the snapshot with # (qemu) snapshot_blkdev blockX snapshot-file format (see http://wiki.qemu.org/Features/Snapshots); my profiling tool didn't work when giving the command inside QMP. Does anyone know how to find the pause (freeze) and restore (thaw) functions before and after taking a snapshot? Or is there any way to use the snapshot_blkdev command outside the QEMU console? Thanks a lot in advance! Best, Yuanzhen
Re: [Qemu-devel] pause and restore function before and after QEMU Live Snapshot
On 07/16/2014 03:13 PM, Yuanzhen Gu wrote: Hi folks, I am going to make a patch, and need to find the pause and thaw(restore) function before and after taking Live QEMU Snapshot respectively. Libvirt has the ability to do just that, when taking external disk-only snapshots. You can turn on libvirt debugging to trace what QMP/agent commands are sent during the overall snapshot operation. Basically, I'm using # (qemu) snapshot_blkdev blockX snapshot-fileformat taking snapshot, http://wiki.qemu.org/Features/Snapshotsmy profile tool didn't work when giving command inside QMP. Does anyone know how to find the pause (freeze) and restore(thaw) function before and after taking snapshot? Or anyway using snapshot_blkdev command outside QEMU console? Thanks a lot in advance! You have to coordinate multiple commands: freeze to the guest agent, then snapshot to QMP, then thaw to the guest agent. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
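The three-step sequence Eric describes can be written down as data, independent of transport. The command names below are the real guest-agent/QMP commands; the helper itself is only an illustrative sketch, not an actual client implementation:

```python
def snapshot_sequence(device, snapshot_file):
    """Return the ordered (transport, command) steps for a consistent
    external snapshot: freeze via the guest agent, snapshot via QMP,
    then thaw via the guest agent."""
    return [
        ("guest-agent", {"execute": "guest-fsfreeze-freeze"}),
        ("qmp", {"execute": "blockdev-snapshot-sync",
                 "arguments": {"device": device,
                               "snapshot-file": snapshot_file,
                               "format": "qcow2"}}),
        ("guest-agent", {"execute": "guest-fsfreeze-thaw"}),
    ]
```

In a real client the thaw step should run in a `finally` block, so a failed snapshot never leaves the guest's filesystems frozen.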
Re: [Qemu-devel] pause and restore function before and after QEMU Live Snapshot
On Thu, Jul 17, 2014 at 1:18 AM, Eric Blake ebl...@redhat.com wrote: On 07/16/2014 03:13 PM, Yuanzhen Gu wrote: Hi folks, I am going to make a patch, and need to find the pause and thaw(restore) function before and after taking Live QEMU Snapshot respectively. Libvirt has the ability to do just that, when taking external disk-only snapshots. You can turn on libvirt debugging to trace what QMP/agent commands are sent during the overall snapshot operation. Basically, I'm using # (qemu) snapshot_blkdev blockX snapshot-fileformat taking snapshot, http://wiki.qemu.org/Features/Snapshotsmy profile tool didn't work when giving command inside QMP. Does anyone know how to find the pause (freeze) and restore(thaw) function before and after taking snapshot? Or anyway using snapshot_blkdev command outside QEMU console? Thanks a lot in advance! You have to coordinate multiple commands: freeze to the guest agent, then snapshot to QMP, then thaw to the guest agent. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org Just 2c: fsfreeze will not succeed if there are any container instances, for example Docker, running in the VM on the same filesystem. This exact case could be fixed by extending the agent's logic, but it looks like putting a note in the documentation is enough.
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
Tested on iscsi pool, though there are no-cache requirement and rbd with disabled cache may survive one migration but iscsi backend hangs always. As it was before, just rolling back problematic commit fixes the problem and adding cpu_synchronize_all_states to migration.c has no difference at a glance in a VM` behavior. The problem consist at least two separate ones: the current hang and behavior with the unreverted patch from agraf - last one causes live migration with writeback cache to fail, cache=none works well in any variant which survives first condition. Marcin, would you mind to check the current state of the problem on your environments in a spare time? It is probably easier to reproduce on iscsi because of way smaller time needed to set it up, command line and libvirt config attached (v2.1.0-rc2 plus iscsi-1.11.0). Ok, but what exactly do you want me to test? Just to avoid any confusion, originally there were two problems with kvmclock: 1. Commit a096b3a6732f846ec57dc28b47ee9435aa0609bf fixes a problem where clock drift (?) caused kvmclock in the guest to report a time in the past, which caused the guest kernel to hang. This is hard to reproduce reliably (probably because it requires a long time for the drift to accumulate). 2. Commit 9b1786829aefb83f37a8f3135e3ea91c56001b56 fixes a regression caused by a096b3a6732f846ec57dc28b47ee9435aa0609bf which occurred during non-migration operations (drive-mirror + pivot), and which also caused the guest kernel to hang. This is trivial to reproduce. I'm using both of them applied on top of 2.0 in production and have no problems with them. I'm using NFS exclusively with cache=none. So, I shall test vm-migration and drive-migration with 2.1.0-rc2 with no extra patches applied or reverted, on a VM that is running fio, am I correct? -- mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On Thu, Jul 17, 2014 at 1:28 AM, Marcin Gibuła m.gib...@beyond.pl wrote: Tested on iscsi pool, though there are no-cache requirement and rbd with disabled cache may survive one migration but iscsi backend hangs always. As it was before, just rolling back problematic commit fixes the problem and adding cpu_synchronize_all_states to migration.c has no difference at a glance in a VM` behavior. The problem consist at least two separate ones: the current hang and behavior with the unreverted patch from agraf - last one causes live migration with writeback cache to fail, cache=none works well in any variant which survives first condition. Marcin, would you mind to check the current state of the problem on your environments in a spare time? It is probably easier to reproduce on iscsi because of way smaller time needed to set it up, command line and libvirt config attached (v2.1.0-rc2 plus iscsi-1.11.0). Ok, but what exacly do you want me to test? Just to avoid any confusion, originally there were two problems with kvmclock: 1. Commit a096b3a6732f846ec57dc28b47ee9435aa0609bf fixes problem when clock drift (?) caused kvmclock in guest to report time in past which caused guest kernel to hang. This is hard to reproduce reliably (probably as it requires long time for drift to accumulate). 2. Commit 9b1786829aefb83f37a8f3135e3ea91c56001b56 fixes regression caused by a096b3a6732f846ec57dc28b47ee9435aa0609bf which occured during non-migration operations (drive-mirror + pivot), which also caused guest kernel to hang. This is trival to reproduce. I'm using both of them applied on top of 2.0 in production and have no problems with them. I'm using NFS exclusively with cache=none. So, I shall test vm-migration and drive-migration with 2.1.0-rc2 with no extra patches applied or reverted, on VM that is running fio, am I correct? Yes, exactly. ISCSI-based setup can take some minutes to deploy, given prepared image, and I have one hundred percent hit rate for the original issue with it. 
-- mg
Re: [Qemu-devel] pause and restore function before and after QEMU Live Snapshot
Thank you very much, Eric! So do you mean that if I turn on libvirt debugging to trace the QMP/agent commands, I can find the pause and thaw functions in the source code, is that right? And is there any way to freeze and thaw outside the guest agent via the command line? Or any profiling tools you would suggest for the freeze and thaw in the guest agent? Thanks a lot! Best, Yuanzhen On Wed, Jul 16, 2014 at 5:18 PM, Eric Blake ebl...@redhat.com wrote: On 07/16/2014 03:13 PM, Yuanzhen Gu wrote: Hi folks, I am going to make a patch, and need to find the pause and thaw(restore) function before and after taking Live QEMU Snapshot respectively. Libvirt has the ability to do just that, when taking external disk-only snapshots. You can turn on libvirt debugging to trace what QMP/agent commands are sent during the overall snapshot operation. So do you mean, if I turn on libvirt debugging to trace QMP/agent commands, I can find the pause and thaw functions in the source code, is it right ? Basically, I'm using # (qemu) snapshot_blkdev blockX snapshot-fileformat taking snapshot, http://wiki.qemu.org/Features/Snapshotsmy profile tool didn't work when giving command inside QMP. Does anyone know how to find the pause (freeze) and restore(thaw) function before and after taking snapshot? Or anyway using snapshot_blkdev command outside QEMU console? Thanks a lot in advance! You have to coordinate multiple commands: freeze to the guest agent, then snapshot to QMP, then thaw to the guest agent. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] pause and restore function before and after QEMU Live Snapshot
Thanks very much, Andrey! How can I figure out whether fsfreeze did not succeed while taking a snapshot, if I run a simple application inside the VM, say mysql? Thanks! Best, Yuanzhen On Wed, Jul 16, 2014 at 5:25 PM, Andrey Korolyov and...@xdel.ru wrote: On Thu, Jul 17, 2014 at 1:18 AM, Eric Blake ebl...@redhat.com wrote: On 07/16/2014 03:13 PM, Yuanzhen Gu wrote: Hi folks, I am going to make a patch, and need to find the pause and thaw(restore) function before and after taking Live QEMU Snapshot respectively. Libvirt has the ability to do just that, when taking external disk-only snapshots. You can turn on libvirt debugging to trace what QMP/agent commands are sent during the overall snapshot operation. Basically, I'm using # (qemu) snapshot_blkdev blockX snapshot-fileformat taking snapshot, http://wiki.qemu.org/Features/Snapshotsmy profile tool didn't work when giving command inside QMP. Does anyone know how to find the pause (freeze) and restore(thaw) function before and after taking snapshot? Or anyway using snapshot_blkdev command outside QEMU console? Thanks a lot in advance! You have to coordinate multiple commands: freeze to the guest agent, then snapshot to QMP, then thaw to the guest agent. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org Just 2c: fsfreeze will not succeed if there are any container instances, for example Docker, running in the VM on same filesystem. This exact case can be fixed by extending agent` logic but it looks that the putting a note is enough.
Re: [Qemu-devel] pause and restore function before and after QEMU Live Snapshot
On Thu, Jul 17, 2014 at 1:48 AM, Yuanzhen Gu yg...@cs.rutgers.edu wrote: Thanks very much Andrey! how can I figure out whether fsfreeze not succeed while taking snapshot, if I run a simple application inside VM, say mysql? thanks! Best, Yuanzhen On Wed, Jul 16, 2014 at 5:25 PM, Andrey Korolyov and...@xdel.ru wrote: On Thu, Jul 17, 2014 at 1:18 AM, Eric Blake ebl...@redhat.com wrote: On 07/16/2014 03:13 PM, Yuanzhen Gu wrote: Hi folks, I am going to make a patch, and need to find the pause and thaw(restore) function before and after taking Live QEMU Snapshot respectively. Libvirt has the ability to do just that, when taking external disk-only snapshots. You can turn on libvirt debugging to trace what QMP/agent commands are sent during the overall snapshot operation. Basically, I'm using # (qemu) snapshot_blkdev blockX snapshot-fileformat taking snapshot, http://wiki.qemu.org/Features/Snapshotsmy profile tool didn't work when giving command inside QMP. Does anyone know how to find the pause (freeze) and restore(thaw) function before and after taking snapshot? Or anyway using snapshot_blkdev command outside QEMU console? Thanks a lot in advance! You have to coordinate multiple commands: freeze to the guest agent, then snapshot to QMP, then thaw to the guest agent. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org Just 2c: fsfreeze will not succeed if there are any container instances, for example Docker, running in the VM on same filesystem. This exact case can be fixed by extending agent` logic but it looks that the putting a note is enough. Just non-zero exit code from agent`s operation, not matter if you communicating directly via json exchange or via libvirt.
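Concretely, the "non-zero exit code" Andrey mentions surfaces as an error object in the agent's JSON reply, while a successful guest-fsfreeze-freeze returns the number of frozen filesystems. A hedged sketch of checking a raw agent reply (the helper name is ours; a real client would usually go through libvirt instead):

```python
import json


def fsfreeze_succeeded(agent_reply):
    """Decide from a raw guest-agent JSON reply whether
    guest-fsfreeze-freeze worked: success carries a 'return' field with
    the number of frozen filesystems, failure carries an 'error' object.
    A count of zero is treated as failure, since nothing was frozen."""
    reply = json.loads(agent_reply)
    return "error" not in reply and reply.get("return", 0) > 0
```

So for the mysql example in the question: if the reply carries an error (for instance because a container shares the filesystem), the snapshot should be aborted or retried rather than taken over an unfrozen filesystem.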
Re: [Qemu-devel] [PATCH 1/2] hw/misc/arm_sp810: Create SP810 device
On Thu, Jul 17, 2014 at 12:42 AM, Fabian Aggeler aggel...@ethz.ch wrote: This adds a device model for the PrimeXsys System Controller (SP810) which is present in the Versatile Express motherboards. It is so far read-only but allows to set the SCCTRL register. Signed-off-by: Fabian Aggeler aggel...@ethz.ch --- default-configs/arm-softmmu.mak | 1 + hw/misc/Makefile.objs | 1 + hw/misc/arm_sp810.c | 184 3 files changed, 186 insertions(+) create mode 100644 hw/misc/arm_sp810.c diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index f3513fa..67ba99a 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -55,6 +55,7 @@ CONFIG_PL181=y CONFIG_PL190=y CONFIG_PL310=y CONFIG_PL330=y +CONFIG_SP810=y CONFIG_CADENCE=y CONFIG_XGMAC=y CONFIG_EXYNOS4=y diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs index 979e532..49d023b 100644 --- a/hw/misc/Makefile.objs +++ b/hw/misc/Makefile.objs @@ -13,6 +13,7 @@ common-obj-$(CONFIG_PL310) += arm_l2x0.o common-obj-$(CONFIG_INTEGRATOR_DEBUG) += arm_integrator_debug.o common-obj-$(CONFIG_A9SCU) += a9scu.o common-obj-$(CONFIG_ARM11SCU) += arm11scu.o +common-obj-$(CONFIG_SP810) += arm_sp810.o # PKUnity SoC devices common-obj-$(CONFIG_PUV3) += puv3_pm.o diff --git a/hw/misc/arm_sp810.c b/hw/misc/arm_sp810.c new file mode 100644 index 000..82cbcf0 --- /dev/null +++ b/hw/misc/arm_sp810.c @@ -0,0 +1,184 @@ +/* + * ARM PrimeXsys System Controller (SP810) + * + * Copyright (C) 2014 Fabian Aggeler aggel...@ethz.ch + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the
 + * GNU General Public License for more details.
 + *
 + */
 +
 +#include "hw/hw.h"
 +#include "hw/sysbus.h"
 +
 +#define LOCK_VALUE 0xa05f
 +
 +#define TYPE_ARM_SP810 "arm_sp810"
 +#define ARM_SP810(obj) OBJECT_CHECK(arm_sp810_state, (obj), TYPE_ARM_SP810)
 +
 +typedef struct {
 +    SysBusDevice parent_obj;
 +    MemoryRegion iomem;
 +
 +    uint32_t sc_ctrl; /* SCCTRL */
 +} arm_sp810_state;
 +
 +#define TIMEREN0SEL (1 << 15)
 +#define TIMEREN1SEL (1 << 17)
 +#define TIMEREN2SEL (1 << 19)
 +#define TIMEREN3SEL (1 << 21)
 +
Are these fields of a particular register? If so, they should have the register name as a prefix.
 +static const VMStateDescription vmstate_arm_sysctl = {
 +    .name = "arm_sp810",
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_UINT32(sc_ctrl, arm_sp810_state),
 +        VMSTATE_END_OF_LIST()
 +    }
 +};
 +
 +static uint64_t arm_sp810_read(void *opaque, hwaddr offset, unsigned size)
 +{
 +    arm_sp810_state *s = (arm_sp810_state *)opaque;
Drop the cast and just rely on the implicit conversion.
 +
 +    switch (offset) {
 +    case 0x000: /* SCCTRL */
Define macros for the register offsets rather than switching on literal addresses + comments.
 +        return s->sc_ctrl;
 +    case 0x004: /* SCSYSSTAT */
 +    case 0x008: /* SCIMCTRL */
 +    case 0x00C: /* SCIMSTAT */
 +    case 0x010: /* SCXTALCTRL */
 +    case 0x014: /* SCPLLCTRL */
 +    case 0x018: /* SCPLLFCTRL */
 +    case 0x01C: /* SCPERCTRL0 */
 +    case 0x020: /* SCPERCTRL1 */
 +    case 0x024: /* SCPEREN */
 +    case 0x028: /* SCPERDIS */
 +    case 0x02C: /* SCPERCLKEN */
 +    case 0x030: /* SCPERSTAT */
 +    case 0xEE0: /* SCSYSID0 */
 +    case 0xEE4: /* SCSYSID1 */
 +    case 0xEE8: /* SCSYSID2 */
 +    case 0xEEC: /* SCSYSID3 */
 +    case 0xF00: /* SCITCR */
 +    case 0xF04: /* SCITIR0 */
 +    case 0xF08: /* SCITIR1 */
 +    case 0xF0C: /* SCITOR */
 +    case 0xF10: /* SCCNTCTRL */
 +    case 0xF14: /* SCCNTDATA */
 +    case 0xF18: /* SCCNTSTEP */
 +    case 0xFE0: /* SCPERIPHID0 */
 +    case 0xFE4: /* SCPERIPHID1 */
 +    case 0xFE8: /* SCPERIPHID2 */
 +    case 0xFEC: /* SCPERIPHID3 */
 +    case 0xFF0: /* SCPCELLID0 */
 +    case 0xFF4: /* SCPCELLID1 */
 +    case 0xFF8: /* SCPCELLID2 */
 +    case 0xFFC: /* SCPCELLID3 */
I would just let the default do its thing for the moment and not worry about these. As a general rule I prefer to just index an array for register reads/writes and limit the switch statement to only the regs with side effects. So most of these would disappear under that scheme. You could #define them all (as mentioned above for SCCTRL) if you are looking for completeness.
 +    default:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "arm_sp810_read: Register not yet implemented: 0x%x\n",
 +                      (int)offset);
HWADDR_PRIx will avoid the cast.
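The review advice — keep plainly-readable registers in a flat backing store and reserve explicit handling for registers with side effects — can be modeled compactly. This Python sketch is purely illustrative: the offsets match the SP810 map quoted above, but the read-to-clear side effect on SCSYSSTAT is invented for demonstration and is not the real hardware behavior.

```python
SCCTRL, SCSYSSTAT, SCPERIPHID0 = 0x000, 0x004, 0xFE0


class SP810Regs:
    """Read side of a register bank: a dict as the flat backing store,
    with only side-effecting offsets handled explicitly."""

    def __init__(self):
        self.regs = {SCCTRL: 0, SCSYSSTAT: 0, SCPERIPHID0: 0}
        self.unimp_log = []  # stands in for qemu_log_mask(LOG_UNIMP, ...)

    def read(self, offset):
        if offset == SCSYSSTAT:
            # Hypothetical side effect for illustration: read-to-clear
            val, self.regs[SCSYSSTAT] = self.regs[SCSYSSTAT], 0
            return val
        if offset in self.regs:
            return self.regs[offset]
        # The LOG_UNIMP path: record the unknown offset, read back as zero
        self.unimp_log.append(offset)
        return 0
```

Under this scheme, adding a new side-effect-free register is just one more backing-store entry plus an offset constant, which is what the reviewer means by "most of these would disappear".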
Re: [Qemu-devel] [PATCH 2/2] hw/arm/vexpress: add SP810 to the vexpress
On Thu, Jul 17, 2014 at 1:05 AM, Alex Bennée alex.ben...@linaro.org wrote: Fabian Aggeler writes: The SP810, which is present in the Versatile Express motherboards, allows setting the timing reference to either REFCLK or TIMCLK. QEMU currently sets the SP804 timer to 1MHz by default. To reflect this, we set the TimerEn0Sel, TimerEn1Sel, TimerEn2Sel, and TimerEn3Sel bits of the system control register (SCCTRL) to TIMCLK (1). Signed-off-by: Fabian Aggeler aggel...@ethz.ch --- hw/arm/vexpress.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c index a88732c..b96c3fd 100644 --- a/hw/arm/vexpress.c +++ b/hw/arm/vexpress.c @@ -512,7 +512,7 @@ static pflash_t *ve_pflash_cfi01_register(hwaddr base, const char *name, static void vexpress_common_init(VEDBoardInfo *daughterboard, MachineState *machine) { -DeviceState *dev, *sysctl, *pl041; +DeviceState *dev, *sysctl, *pl041, *sp810; qemu_irq pic[64]; uint32_t sys_id; DriveInfo *dinfo; @@ -575,7 +575,14 @@ static void vexpress_common_init(VEDBoardInfo *daughterboard, qdev_init_nofail(sysctl); sysbus_mmio_map(SYS_BUS_DEVICE(sysctl), 0, map[VE_SYSREGS]); -/* VE_SP810: not modelled */ +/* VE_SP810 */ +sp810 = qdev_create(NULL, "arm_sp810"); Move the type definition macro to the header as well. Regards, Peter +/* SP804 is already running at 1MHz (TIMCLK) so SCCTRL TimerEnXSel=1 */ +qdev_prop_set_uint32(sp810, "sc-ctrl", (1 << 15) | (1 << 17) + | (1 << 19) | (1 << 21)); snip Could the #defines in the first patch be moved into a header and used here rather than manually setting these bits? -- Alex Bennée
Re: [Qemu-devel] [PATCH] Tap: fix vcpu long time io blocking on tap
-Original Message- From: Stefan Hajnoczi [mailto:stefa...@gmail.com] Sent: Tuesday, July 15, 2014 11:00 PM To: Wangkai (Kevin,C) Cc: Stefan Hajnoczi; Lee yang; qemu-devel@nongnu.org; aligu...@amazon.com Subject: Re: [Qemu-devel] [PATCH] Tap: fix vcpu long time io blocking on tap On Mon, Jul 14, 2014 at 10:44:58AM +, Wangkai (Kevin,C) wrote: Here the detail network: ++ | The host add tap1 and eth10 to bridge 'br1'| +- ---+ | ++ | | send | | | VM eth1-+-tap1 --- bridge --- eth10 --+-+ | | packets| | ++ | | | ++ ++ ++ Qemu start vm by virtio, use tap interface, option is: -net nic,vlan=101, model=virtio -net tap,vlan=101,ifname=tap1,script=no,downscript=no Use the newer -netdev/-device syntax to get offload support and slightly better performance: -netdev tap,id=tap0,ifname=tap1,script=no,downscript=no \ -device virtio-net-pci,netdev=tap0 And add tap1 and eth10 to bridge br1 in the host: Brctl addif br1 tap1 Brctl addif br1 eth10 total recv 505387 time 2000925 us: means call tap_send once dealing 505387 packets, the packet payload was 300 bytes, and time use for tap_send() was 2,000,925 micro-seconds, time was measured by record time stamp at function tap_send() start and end. We just test the performance of VM. That is 150 MB of incoming packets in a single tap_send(). Network rx queues are maybe a few 1000 packets so I wonder what is going on here. Maybe more packets are arriving while QEMU is reading them and we keep looping. That's strange though because the virtio-net rx virtqueue should fill up (it only has 256 entries). Can you investigate more and find out exactly what is going on? It's not clear yet that adding a budget is the solution or just hiding a deeper problem. Stefan [Wangkai (Kevin,C)] Hi Stefan, I think I have found the problem, why 256 entries virtqueue cannot prevent packets receiving. 
I have started one SMP guest, which has 2 cores; one core was pending on I/O, and the other core was receiving the packets. QEMU was filling the virtqueue while the guest kernel was moving the packets out of the queue and processing them. They were racing: only when the guest got enough packets and was receiving more slowly than QEMU was sending did the virtqueue fill up and one receive pass finish. And I have tried the -netdev/-device syntax to start the guest again, and got very little improvement. Regards Wangkai
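The budget approach under discussion — cap how many packets one tap_send() call may drain before returning to the main loop — can be sketched like this. The cap value and names are illustrative, not the patch's actual ones:

```python
TAP_SEND_BUDGET = 256  # illustrative cap, not the patch's actual value


def tap_send(read_packet, deliver):
    """Drain at most TAP_SEND_BUDGET packets from the tap fd, then
    return so the event loop (and with it the vcpu) can make progress.
    The fd stays readable, so the loop calls us again for the rest."""
    sent = 0
    while sent < TAP_SEND_BUDGET:
        pkt = read_packet()  # returns None when the fd would block
        if pkt is None:
            break
        deliver(pkt)
        sent += 1
    return sent
```

This bounds the time a single tap_send() invocation can monopolize the I/O thread, regardless of how fast packets arrive — which is the failure mode the ps listing above illustrates.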
[Qemu-devel] [PATCH v4 0/2] add read-pattern for block quorum
v4: - swap the patch order - update comment for fifo pattern in qapi - use qapi enumeration in quorum driver instead of manual parsing v3: - separate patch into two, one for quorum and one for qapi for easier review - add enumeration for quorum read pattern - remove unrelated blank line fix from this patch set v2: - rename single as 'fifo' - rename read_all_children as read_quorum_children - fix quorum_aio_finalize() for fifo pattern This patch adds a single read pattern to the quorum driver; quorum vote remains the default pattern. For now we do a quorum vote on all reads; it is designed for unreliable underlying storage such as non-redundant NFS to ensure data integrity at the cost of read performance. Consider the following use case: (diagram: the VM writes to both storage nodes A and B) Both A and B have hardware RAID storage to justify the data integrity on their own, so it would help performance if we did a single read instead of reading from all the nodes. Further, if we run the VM on either of the storage nodes, we can make a local read request for better performance. This patch generalizes the above 2-node case to N nodes: the VM writes to all N nodes but reads from just one of them. If the single read fails, we try to read the next node in the FIFO order specified by the startup command. The 2-node case is very similar to DRBD[1], though it lacks auto-sync functionality on a single device/node failure for now. But compared with DRBD we still have some advantages: - Suppose we have 20 VMs running on one (assume A) of 2 nodes' DRBD-backed storage. If A crashes, we need to restart all the VMs on node B. In practice we can't, because B might not have enough resources to set up 20 VMs at once. If we instead run our 20 VMs with the quorum driver and scatter the replicated images over the data center, we can very likely restart the 20 VMs without any resource problem.
After all, I think we can build a more powerful replicated-image functionality on top of quorum and block jobs (block mirror) to meet various High Availability needs. E.g., enable the single read pattern on 2 children: -drive driver=quorum,children.0.file.filename=0.qcow2,\ children.1.file.filename=1.qcow2,read-pattern=fifo,vote-threshold=1 [1] http://en.wikipedia.org/wiki/Distributed_Replicated_Block_Device Liu Yuan (2): qapi: add read-pattern support for quorum block/quorum: add simple read pattern support block/quorum.c | 181 +-- qapi/block-core.json | 19 +- 2 files changed, 149 insertions(+), 51 deletions(-) -- 1.9.1
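The fifo read pattern then reduces to: try the children in their command-line order and fall through on error. The real implementation is asynchronous C in block/quorum.c; this synchronous Python sketch only illustrates the ordering semantics, with illustrative names:

```python
def fifo_read(children, offset, length):
    """Try each child in command-line order; return the first successful
    read, and re-raise the last error if every child fails."""
    assert children, "quorum needs at least one child"
    last_err = None
    for child in children:
        try:
            return child(offset, length)  # child: callable doing the read
        except IOError as err:
            last_err = err  # fall through to the next child in FIFO order
    raise last_err
```

Note the contrast with the default vote pattern: vote issues the read to all children and compares the results, while fifo touches only one child on the happy path — the source of the read-performance win the cover letter describes.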
[Qemu-devel] [PATCH v4 2/2] block/quorum: add simple read pattern support
This patch adds a single read pattern to the quorum driver; quorum vote remains the default pattern. For now we do a quorum vote on all reads; it is designed for unreliable underlying storage such as non-redundant NFS to ensure data integrity at the cost of read performance. Consider the following use case: (diagram: the VM writes to both storage nodes A and B) Both A and B have hardware RAID storage to justify the data integrity on their own, so it would help performance if we did a single read instead of reading from all the nodes. Further, if we run the VM on either of the storage nodes, we can make a local read request for better performance. This patch generalizes the above 2-node case to N nodes: the VM writes to all N nodes but reads from just one of them. If the single read fails, we try to read the next node in the FIFO order specified by the startup command. The 2-node case is very similar to DRBD[1], though it lacks auto-sync functionality on a single device/node failure for now. But compared with DRBD we still have some advantages: - Suppose we have 20 VMs running on one (assume A) of 2 nodes' DRBD-backed storage. If A crashes, we need to restart all the VMs on node B. In practice we can't, because B might not have enough resources to set up 20 VMs at once. If we instead run our 20 VMs with the quorum driver and scatter the replicated images over the data center, we can very likely restart the 20 VMs without any resource problem. After all, I think we can build a more powerful replicated-image functionality on quorum and block jobs (block mirror) to meet various High Availability needs.
E.g., enable the single read pattern on 2 children:

  -drive driver=quorum,children.0.file.filename=0.qcow2,\
         children.1.file.filename=1.qcow2,read-pattern=fifo,vote-threshold=1

[1] http://en.wikipedia.org/wiki/Distributed_Replicated_Block_Device

Cc: Benoit Canet <ben...@irqsave.net>
Cc: Eric Blake <ebl...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Signed-off-by: Liu Yuan <namei.u...@gmail.com>
---
 block/quorum.c | 179 +
 1 file changed, 131 insertions(+), 48 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index d5ee9c0..ebf5c71 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -24,6 +24,7 @@
 #define QUORUM_OPT_VOTE_THRESHOLD "vote-threshold"
 #define QUORUM_OPT_BLKVERIFY      "blkverify"
 #define QUORUM_OPT_REWRITE        "rewrite-corrupted"
+#define QUORUM_OPT_READ_PATTERN   "read-pattern"

 /* This union holds a vote hash value */
 typedef union QuorumVoteValue {
@@ -74,6 +75,8 @@ typedef struct BDRVQuorumState {
     bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted
                             * block if Quorum is reached.
                             */
+
+    QuorumReadPattern read_pattern;
 } BDRVQuorumState;

 typedef struct QuorumAIOCB QuorumAIOCB;
@@ -117,6 +120,7 @@ struct QuorumAIOCB {
     bool is_read;
     int vote_ret;
+    int child_iter;         /* which child to read in fifo pattern */
 };

 static bool quorum_vote(QuorumAIOCB *acb);
@@ -154,8 +158,10 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)

     if (acb->is_read) {
         for (i = 0; i < s->num_children; i++) {
-            qemu_vfree(acb->qcrs[i].buf);
-            qemu_iovec_destroy(&acb->qcrs[i].qiov);
+            if (i <= acb->child_iter) {
+                qemu_vfree(acb->qcrs[i].buf);
+                qemu_iovec_destroy(&acb->qcrs[i].qiov);
+            }
         }
     }

@@ -256,6 +262,21 @@ static void quorum_rewrite_aio_cb(void *opaque, int ret)
     quorum_aio_finalize(acb);
 }

+static BlockDriverAIOCB *read_fifo_child(QuorumAIOCB *acb);
+
+static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
+{
+    int i;
+
+    assert(dest->niov == source->niov);
+    assert(dest->size == source->size);
+    for (i = 0; i < source->niov; i++) {
+        assert(dest->iov[i].iov_len == source->iov[i].iov_len);
+        memcpy(dest->iov[i].iov_base,
+               source->iov[i].iov_base,
+               source->iov[i].iov_len);
+    }
+}
+
 static void quorum_aio_cb(void *opaque, int ret)
 {
     QuorumChildRequest *sacb = opaque;
@@ -263,6 +284,21 @@ static void quorum_aio_cb(void *opaque, int ret)
     BDRVQuorumState *s = acb->common.bs->opaque;
     bool rewrite = false;

+    if (acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO) {
+        /* We try to read next child in FIFO order if we fail to read */
+        if (ret < 0 && ++acb->child_iter < s->num_children) {
+            read_fifo_child(acb);
+            return;
+        }
+
+        if (ret == 0) {
+            quorum_copy_qiov(acb->qiov, &acb->qcrs[acb->child_iter].qiov);
+        }
+        acb->vote_ret = ret;
+        quorum_aio_finalize(acb);
+        return;
+    }
+
     sacb->ret = ret;
     acb->count++;
     if (ret == 0) {
@@ -343,19 +379,6 @@ static bool quorum_rewrite_bad_versions(BDRVQuorumState *s, QuorumAIOCB
[Qemu-devel] [PATCH v4 1/2] qapi: add read-pattern enum for quorum
Cc: Eric Blake <ebl...@redhat.com>
Signed-off-by: Liu Yuan <namei.u...@gmail.com>
---
 qapi/block-core.json | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index e378653..42033d9 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1384,6 +1384,19 @@
             'raw': 'BlockdevRef' } }

 ##
+# @QuorumReadPattern
+#
+# An enumeration of quorum read patterns.
+#
+# @quorum: read all the children and do a quorum vote on reads
+#
+# @fifo: read only from the first child that has not failed
+#
+# Since: 2.2
+##
+{ 'enum': 'QuorumReadPattern', 'data': [ 'quorum', 'fifo' ] }
+
+##
 # @BlockdevOptionsQuorum
 #
 # Driver specific block device options for Quorum
@@ -1398,12 +1411,17 @@
 # @rewrite-corrupted: #optional rewrite corrupted data when quorum is reached
 #                     (Since 2.1)
 #
+# @read-pattern: #optional choose read pattern and set to quorum by default
+#                (Since 2.2)
+#
 # Since: 2.0
 ##
 { 'type': 'BlockdevOptionsQuorum',
   'data': { '*blkverify': 'bool',
             'children': [ 'BlockdevRef' ],
-            'vote-threshold': 'int', '*rewrite-corrupted': 'bool' } }
+            'vote-threshold': 'int',
+            '*rewrite-corrupted': 'bool',
+            '*read-pattern': 'QuorumReadPattern' } }

 ##
 # @BlockdevOptions
--
1.9.1
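For illustration, a quorum options object built from this schema with the new optional field might look like the following. The filenames and child nesting here are made up for the example and mirror the 2-child fifo configuration from the cover letter; consult the schema's BlockdevRef definition for the exact accepted forms:

```json
{
    "driver": "quorum",
    "vote-threshold": 1,
    "read-pattern": "fifo",
    "children": [
        { "driver": "qcow2",
          "file": { "driver": "file", "filename": "0.qcow2" } },
        { "driver": "qcow2",
          "file": { "driver": "file", "filename": "1.qcow2" } }
    ]
}
```

Since `read-pattern` is optional (`*read-pattern`), omitting it keeps the existing quorum-vote behavior, so the change is backward compatible.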