Re: [ANNOUNCE] kvm-kmod-2.6.32-rc7
[ fixing up the list address ] John Wong wrote: When i install win7_x86 with this kvm-kmod-2.6.32-rc7, win7_x86 means 64-bit version? kvm will trip to blue screen. I can install win7_86 with my debian-sid/2.6.31-x kernel modules. uname -a: Linux retro 2.6.31-1-amd64 #1 SMP Sat Oct 24 17:50:31 UTC 2009 x86_64 GNU/Linux Which qemu-kvm version are you using for these tests? Does anyone have some idea on this effect? Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration
On 10/28/2009 08:54 AM, Michael Goldish wrote: - Dor Laordl...@redhat.com wrote: On 10/12/2009 05:28 PM, Lucas Meneghel Rodrigues wrote: Hi Michael, I am reviewing your patchset and have just a minor remark to make here: On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldishmgold...@redhat.com wrote: This patch adds a new test that checks the timedrift introduced by migrations. It uses the same parameters used by the timedrift test to get the guest time. In addition, the number of migrations the test performs is controlled by the parameter 'migration_iterations'. Signed-off-by: Michael Goldishmgold...@redhat.com --- client/tests/kvm/kvm_tests.cfg.sample | 33 --- client/tests/kvm/tests/timedrift_with_migration.py | 95 2 files changed, 115 insertions(+), 13 deletions(-) create mode 100644 client/tests/kvm/tests/timedrift_with_migration.py diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample index 540d0a2..618c21e 100644 --- a/client/tests/kvm/kvm_tests.cfg.sample +++ b/client/tests/kvm/kvm_tests.cfg.sample @@ -100,19 +100,26 @@ variants: type = linux_s3 - timedrift:install setup -type = timedrift extra_params += -rtc-td-hack -# Pin the VM and host load to CPU #0 -cpu_mask = 0x1 -# Set the load and rest durations -load_duration = 20 -rest_duration = 20 -# Fail if the drift after load is higher than 50% -drift_threshold = 50 -# Fail if the drift after the rest period is higher than 10% -drift_threshold_after_rest = 10 -# For now, make sure this test is executed alone -used_cpus = 100 +variants: +- with_load: +type = timedrift +# Pin the VM and host load to CPU #0 +cpu_mask = 0x1 Let's use -smp 2 always. We can also just make -smp 2 the default for all tests. Does that sound good? Yes btw: we need not to parallel the load test with standard tests. We already don't, because the load test has used_cpus = 100 which forces it to run alone. 
Soon I'll have 100 on my laptop :), better change it to -1 or MAX_INT +# Set the load and rest durations +load_duration = 20 +rest_duration = 20 Even the default duration here seems way too brief here, is there any reason why 20s was chosen instead of, let's say, 1800s? I am under the impression that 20s of load won't be enough to cause any noticeable drift... +# Fail if the drift after load is higher than 50% +drift_threshold = 50 +# Fail if the drift after the rest period is higher than 10% +drift_threshold_after_rest = 10 I am also curious about those tresholds and the reasoning behind them. Is there any official agreement on what we consider to be an unreasonable drift? Another thing that struck me out is drift calculation: On the original timedrift test, the guest drift is normalized against the host drift: drift = 100.0 * (host_delta - guest_delta) / host_delta While in the new drift tests, we consider only the guest drift. I believe is better to normalize all tests based on one drift calculation criteria, and those values should be reviewed, and at least a certain level of agreement on our development community should be reached. I think we don't need to calculate drift ratio. We should define a threshold in seconds, let's say 2 seconds. Beyond that, there should not be any drift. Are you talking about the timedrift with load or timedrift with migration or reboot tests? I was told that when running the load test for e.g 60 secs, the drift should be given in % of that duration. In the case of migration and reboot, absolute durations are used (in seconds, no %). Should we do that in the load test too? Yes, but: during extreme load, we do predict that a guest *without* pv clock will drift and won't be able to catchup until the load stops and only then it will catchup. So my recommendation is to do the following: - pvclock guest - can check with 'cat /sys/devices/system/clocksource/clocksource0/current_clocksource ' don't allow drift during huge loads. 
Exist (+safe) for rhel5.4 guests and ~2.6.29 (from 2.6.27). - non-pv clock - run the load, stop the load, wait 5 seconds, measure time For both, use absolute times. Do we support migration to a different host? We should, especially in this test too. The destination host reading should also be used. Apart for that, good patchset, and good thing you refactored some of the code to shared utils. We don't, and it would be very messy to implement with the framework right now. We should probably do that as some sort of server side test, but we don't have server side tests right now, so doing it may take a little time and effort. I got the
problem wit svm_get_msr on kvm-kmod-2.6.31.6
Hi all, We are testing kvm-kmod-2.6.31.6, and several people reported problems with AMD cpus: Nov 14 21:17:59 bigproxmox kernel: Pid: 3616, comm: kvm Not tainted 2.6.24-9-pve #1 ovz005 Nov 14 21:17:59 bigproxmox kernel: RIP: 0010:[88537906] [88537906] :kvm_amd:svm_get_msr+0x146/0x300 ... see http://www.proxmox.com/forum/showthread.php?t=2591 Any ideas? kvm-kmod-2.6.31.5 worked without problems. - Dietmar -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Virtualization Performance: Intel vs. AMD
On 11/15/2009 05:55 PM, Thomas Treutner wrote: On Sunday 15 November 2009 14:05:52 Neil Aggarwal wrote: I prefer AMD CPUs, they give you a better bang for the buck. Besides that, I don't think they would be any technical differences, they are supposed to be completely compatible. I have seen no evidence to the contrary. Isn't AMD the only one who has hardware support for nested virtualization? Or isn't that true any longer? No, the Core i7 has ept which is the Intel equivalent. Anyways, I'm just curious, as this feature is primarily interesting for development, IMHO. No, it's primarily interesting for performance. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: monitoring guest sidt execution
On 11/15/2009 05:37 PM, matteo wrote:
> Hi to all,
>
> I'm trying to intercept the guest sidt instruction execution from the host. I've added the bit to the control structure:
>
> control->intercept |= (1ULL << INTERCEPT_STORE_IDTR);
>
> then I have defined the sidt handler to manage the STORE_IDTR action:
>
> [SVM_EXIT_IDTR_READ] = idtr_write_interception,
>
> So, in the idtr_write_interception handler there is the invocation of the emulate_instruction(&svm->vcpu, kvm_run, 0, 0, 0); function. Following the execution flow I found that the emulation failed in the x86_emulate.c source file, precisely in the if (c->d == 0) conditional statement, but I really don't know why it happens or how to fix it. Could you please give me some hints with respect to this issue?

You need to fill the appropriate table entry for sidt (most likely group_table) and implement the opcode in the emulator.

-- 
error compiling committee.c: too many arguments to function
Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6
On 11/16/2009 11:42 AM, Dietmar Maurer wrote:
> Hi all,
>
> We are testing kvm-kmod-2.6.31.6, and several people reported problems with AMD cpus:
>
> Nov 14 21:17:59 bigproxmox kernel: Pid: 3616, comm: kvm Not tainted 2.6.24-9-pve #1 ovz005
> Nov 14 21:17:59 bigproxmox kernel: RIP: 0010:[88537906] [88537906] :kvm_amd:svm_get_msr+0x146/0x300
> ...
>
> see http://www.proxmox.com/forum/showthread.php?t=2591
>
> Any ideas? kvm-kmod-2.6.31.5 worked without problems.

Nothing changed between these two versions to warrant this.

Can you post a disassembly of svm_get_msr() around the offending address?

Did you change qemu-kvm as well?

-- 
error compiling committee.c: too many arguments to function
Re: git access via http to *bios.git repositories
ext Avi Kivity schrieb: On 11/14/2009 02:46 PM, Avi Kivity wrote: On 11/13/2009 08:16 PM, Jan Kiszka wrote: Bernhard Kohl wrote: Hi, there is something wrong with the new *bios.git repositories. I need to use git via http because of a firewall. This works well with the other repos, e.g. kvm.git. $ git clone http://www.kernel.org/pub/scm/virt/kvm/pcbios.git Initialized empty Git repository in /home/bernd/src/pcbios/.git/ fatal: http://www.kernel.org/pub/scm/virt/kvm/pcbios.git/info/refs not found: did you run git update-server-info on the server? I think that should happen automatically on kernel.org. But the required files are missing in the bios repositories, at least in the non-public master that is replicated into the public area. Avi, any idea? Just like the comment says, I need to run git update-server-info. I'll do that and set the hooks to automatically do it from now on. That's now done. Please try again. Thanks, now it works. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: problem wit svm_get_msr on kvm-kmod-2.6.31.6
> Nothing changed between these two versions to warrant this.

Oh, sorry - the one which works is kvm-kmod-2.6.30.1

> Can you post a disassembly of svm_get_msr() around the offending address?

Please can you tell me how to do that?

> Did you change qemu-kvm as well?

No, same qemu-kvm version (0.11.0).

- Dietmar
Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6
On 11/16/2009 12:46 PM, Dietmar Maurer wrote:
>> Nothing changed between these two versions to warrant this.
>
> Oh, sorry - the one which works is kvm-kmod-2.6.30.1
>
>> Can you post a disassembly of svm_get_msr() around the offending address?
>
> Please can you tell me how to do that?

objdump -Dr .../kvm-amd.ko

Look at the start address of svm_get_msr (search for the name), add 0x146 (from :kvm_amd:svm_get_msr+0x146/0x300), list ~30 lines above and below that.

-- 
error compiling committee.c: too many arguments to function
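The address arithmetic described above can be sketched in a few lines of C. This is only an illustration: oops_offset() is a hypothetical helper, not part of any kernel or binutils tool, and the 0x37c0 start address in the comment is taken from the objdump listing posted later in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the offset and function length from an
 * oops location string such as ":kvm_amd:svm_get_msr+0x146/0x300".
 * Returns 0 on success, -1 if the string has no "+off/len" part.
 */
int oops_offset(const char *loc, unsigned long *off, unsigned long *len)
{
    const char *plus;

    assert(loc != NULL && off != NULL && len != NULL);

    plus = strchr(loc, '+');
    if (plus == NULL)
        return -1;

    /* %lx accepts an optional "0x" prefix, like strtoul() with base 16. */
    return sscanf(plus, "+%lx/%lx", off, len) == 2 ? 0 : -1;
}

/*
 * With the symbol starting at 0x37c0 in the objdump output, the
 * instruction to look at is at 0x37c0 + 0x146 = 0x3906.
 */
```

The resulting address is the one to search for in the objdump output; the surrounding 30 lines or so then show which instruction faulted.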
RE: problem wit svm_get_msr on kvm-kmod-2.6.31.6
37c0 svm_get_msr: ... 387e: 66 90 xchg %ax,%ax 3880: 0f 84 8a 00 00 00 je 3910 svm_get_msr+0x150 3886: 66 90 xchg %ax,%ax 3888: 0f 86 c2 01 00 00 jbe3a50 svm_get_msr+0x290 388e: 48 8b 87 e0 27 00 00mov0x27e0(%rdi),%rax 3895: 48 8b 80 08 06 00 00mov0x608(%rax),%rax 389c: 48 89 02mov%rax,(%rdx) 389f: 90 nop 38a0: 31 c0 xor%eax,%eax 38a2: c3 retq 38a3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 38a8: 81 fe d9 01 00 00 cmp$0x1d9,%esi 38ae: 0f 84 7c 00 00 00 je 3930 svm_get_msr+0x170 38b4: 0f 86 46 01 00 00 jbe3a00 svm_get_msr+0x240 38ba: 81 fe db 01 00 00 cmp$0x1db,%esi 38c0: 0f 84 ca 01 00 00 je 3a90 svm_get_msr+0x2d0 38c6: 81 fe dc 01 00 00 cmp$0x1dc,%esi 38cc: 0f 1f 40 00 nopl 0x0(%rax) 38d0: 75 98 jne386a svm_get_msr+0xaa 38d2: 48 8b 87 e0 27 00 00mov0x27e0(%rdi),%rax 38d9: 48 8b 80 80 06 00 00mov0x680(%rax),%rax 38e0: 48 89 02mov%rax,(%rdx) 38e3: eb bb jmp38a0 svm_get_msr+0xe0 38e5: 0f 1f 00nopl (%rax) 38e8: 48 83 bf 78 28 00 00cmpq $0x0,0x2878(%rdi) 38ef: 00 38f0: 0f 85 82 01 00 00 jne3a78 svm_get_msr+0x2b8 38f6: 48 8b 87 e0 27 00 00mov0x27e0(%rdi),%rax 38fd: 48 8b 48 50 mov0x50(%rax),%rcx 3901: 0f 31 rdtsc 3903: 48 01 c8add%rcx,%rax # this is svm_get_msr+0x146 3906: 48 89 02mov%rax,(%rdx) 3909: eb 95 jmp38a0 svm_get_msr+0xe0 390b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 3910: 48 8b 87 e0 27 00 00mov0x27e0(%rdi),%rax 3917: 48 8b 80 00 06 00 00mov0x600(%rax),%rax 391e: 48 89 02mov%rax,(%rdx) 3921: e9 7a ff ff ff jmpq 38a0 svm_get_msr+0xe0 3926: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) 392d: 00 00 00 3930: 48 8b 87 e0 27 00 00mov0x27e0(%rdi),%rax 3937: 48 8b 80 70 06 00 00mov0x670(%rax),%rax 393e: 48 89 02mov%rax,(%rdx) 3941: e9 5a ff ff ff jmpq 38a0 svm_get_msr+0xe0 3946: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) 394d: 00 00 00 3950: 48 8b 87 e0 27 00 00mov0x27e0(%rdi),%rax 3957: 48 8b 80 28 06 00 00mov0x628(%rax),%rax 395e: 48 89 02mov%rax,(%rdx) 3961: e9 3a ff ff ff jmpq 38a0 svm_get_msr+0xe0 3966: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) 396d: 00 00 00 
3970: 48 c7 02 65 00 00 01movq $0x165,(%rdx) 3977: e9 24 ff ff ff jmpq 38a0 svm_get_msr+0xe0 397c: 0f 1f 40 00 nopl 0x0(%rax) 3980: 48 8b 87 e0 27 00 00mov0x27e0(%rdi),%rax 3987: 48 8b 80 10 06 00 00mov0x610(%rax),%rax 398e: 48 89 02mov%rax,(%rdx) 3991: e9 0a ff ff ff jmpq 38a0 svm_get_msr+0xe0 3996: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) 399d: 00 00 00 ... We use the ubunto 2.6.24 kernel (http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=summary) They have a few more patches applied: http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=history;f=include/asm-x86/msr.h;h=cfe169475b5b50a448326ef3c34f50100ac83faf;hb=HEAD Maybe those last 2 patches can cause the problem? -Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Montag, 16. November 2009 11:52 To: Dietmar Maurer Cc: kvm Subject: Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6 On 11/16/2009 12:46 PM, Dietmar Maurer wrote: Nothing changed between these two versions to warrant this. Oh, sorry - the one which works is kvm-kmod-2.6.30.1 Can you post a disassembly of svm_get_msr() around the offending address? Please can you tell me how to do that? objdump -Dr .../kvm-amd.ko Look at the start address of svm_get_msr (search for the name), add 0x146 (from :kvm_amd:svm_get_msr+0x146/0x300), list ~30 lines above and below that. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6
On 11/16/2009 01:17 PM, Dietmar Maurer wrote:
> 38f0: 0f 85 82 01 00 00     jne    3a78 svm_get_msr+0x2b8
> 38f6: 48 8b 87 e0 27 00 00  mov    0x27e0(%rdi),%rax
> 38fd: 48 8b 48 50           mov    0x50(%rax),%rcx
> 3901: 0f 31                 rdtsc
> 3903: 48 01 c8              add    %rcx,%rax
> # this is svm_get_msr+0x146
> 3906: 48 89 02              mov    %rax,(%rdx)

Looks like a miscompile of native_read_tsc(), it needs to use %edx:%eax, not assume the result is in %rax.

Jan, looks like the culprit is

static inline unsigned long long kvm_native_read_tsc(void)
{
	unsigned long long val;
	asm volatile("rdtsc" : "=A" (val));
	return val;
}

"=A" only works correctly on i386, need to use "=a" "=d" for portability.

-- 
error compiling committee.c: too many arguments to function
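For readers following along, here is a minimal sketch of the "=a"/"=d" form suggested above. It is not the kvm-kmod patch itself; read_tsc_portable() and combine_edx_eax() are hypothetical names chosen for illustration. The point is that rdtsc always writes EDX:EAX, so both registers must be declared as outputs. With "=A" on x86-64 the compiler may keep another live value in %rdx, which would be consistent with the faulting store through (%rdx) in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the two 32-bit halves that rdtsc leaves in EDX (high) and EAX (low). */
uint64_t combine_edx_eax(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/*
 * Hypothetical portable TSC read: declares both EAX and EDX as outputs,
 * so the compiler knows both registers are overwritten on i386 and x86-64.
 */
uint64_t read_tsc_portable(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;

    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return combine_edx_eax(hi, lo);
#else
    return 0; /* no rdtsc on this architecture; placeholder only */
#endif
}

/* Small self-check of the combining step, independent of the counter value. */
void tsc_self_check(void)
{
    assert(combine_edx_eax(1u, 2u) == 0x100000002ULL);
}
```

Splitting the result into two explicit 32-bit outputs costs one shift and one OR on x86-64, but it behaves the same on both architectures, which is what a compat header shared between i386 and x86-64 needs.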
Re: Virtualization Performance: Intel vs. AMD
Thomas Fjellstrom tfjellst...@shaw.ca writes: Hardware context switches aren't free either. FWIW, SMT has no hardware context switches, the 'S' stands for simultaneous: the operations from the different threads are travelling simultaneously through the CPU's pipeline. You seem to confuse it with 'CMT' (Coarse-grained Multi Threading), which has context switches. -Andi -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6
Dietmar Maurer wrote: 37c0 svm_get_msr: ... 387e: 66 90 xchg %ax,%ax 3880: 0f 84 8a 00 00 00 je 3910 svm_get_msr+0x150 3886: 66 90 xchg %ax,%ax 3888: 0f 86 c2 01 00 00 jbe3a50 svm_get_msr+0x290 388e: 48 8b 87 e0 27 00 00mov0x27e0(%rdi),%rax 3895: 48 8b 80 08 06 00 00mov0x608(%rax),%rax 389c: 48 89 02mov%rax,(%rdx) 389f: 90 nop 38a0: 31 c0 xor%eax,%eax 38a2: c3 retq 38a3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 38a8: 81 fe d9 01 00 00 cmp$0x1d9,%esi 38ae: 0f 84 7c 00 00 00 je 3930 svm_get_msr+0x170 38b4: 0f 86 46 01 00 00 jbe3a00 svm_get_msr+0x240 38ba: 81 fe db 01 00 00 cmp$0x1db,%esi 38c0: 0f 84 ca 01 00 00 je 3a90 svm_get_msr+0x2d0 38c6: 81 fe dc 01 00 00 cmp$0x1dc,%esi 38cc: 0f 1f 40 00 nopl 0x0(%rax) 38d0: 75 98 jne386a svm_get_msr+0xaa 38d2: 48 8b 87 e0 27 00 00mov0x27e0(%rdi),%rax 38d9: 48 8b 80 80 06 00 00mov0x680(%rax),%rax 38e0: 48 89 02mov%rax,(%rdx) 38e3: eb bb jmp38a0 svm_get_msr+0xe0 38e5: 0f 1f 00nopl (%rax) 38e8: 48 83 bf 78 28 00 00cmpq $0x0,0x2878(%rdi) 38ef: 00 38f0: 0f 85 82 01 00 00 jne3a78 svm_get_msr+0x2b8 38f6: 48 8b 87 e0 27 00 00mov0x27e0(%rdi),%rax 38fd: 48 8b 48 50 mov0x50(%rax),%rcx 3901: 0f 31 rdtsc 3903: 48 01 c8add%rcx,%rax # this is svm_get_msr+0x146 3906: 48 89 02mov%rax,(%rdx) 3909: eb 95 jmp38a0 svm_get_msr+0xe0 390b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 3910: 48 8b 87 e0 27 00 00mov0x27e0(%rdi),%rax 3917: 48 8b 80 00 06 00 00mov0x600(%rax),%rax 391e: 48 89 02mov%rax,(%rdx) 3921: e9 7a ff ff ff jmpq 38a0 svm_get_msr+0xe0 3926: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) 392d: 00 00 00 3930: 48 8b 87 e0 27 00 00mov0x27e0(%rdi),%rax 3937: 48 8b 80 70 06 00 00mov0x670(%rax),%rax 393e: 48 89 02mov%rax,(%rdx) 3941: e9 5a ff ff ff jmpq 38a0 svm_get_msr+0xe0 3946: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) 394d: 00 00 00 3950: 48 8b 87 e0 27 00 00mov0x27e0(%rdi),%rax 3957: 48 8b 80 28 06 00 00mov0x628(%rax),%rax 395e: 48 89 02mov%rax,(%rdx) 3961: e9 3a ff ff ff jmpq 38a0 svm_get_msr+0xe0 3966: 66 2e 0f 1f 84 00 00nopw 
%cs:0x0(%rax,%rax,1) 396d: 00 00 00 3970: 48 c7 02 65 00 00 01movq $0x165,(%rdx) 3977: e9 24 ff ff ff jmpq 38a0 svm_get_msr+0xe0 397c: 0f 1f 40 00 nopl 0x0(%rax) 3980: 48 8b 87 e0 27 00 00mov0x27e0(%rdi),%rax 3987: 48 8b 80 10 06 00 00mov0x610(%rax),%rax 398e: 48 89 02mov%rax,(%rdx) 3991: e9 0a ff ff ff jmpq 38a0 svm_get_msr+0xe0 3996: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) 399d: 00 00 00 ... We use the ubunto 2.6.24 kernel (http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=summary) They have a few more patches applied: http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=history;f=include/asm-x86/msr.h;h=cfe169475b5b50a448326ef3c34f50100ac83faf;hb=HEAD Maybe those last 2 patches can cause the problem? Nope, it was most probably a kvm-kmod bug. Patch below should fix it. Jan - Fix native_read_tsc wrapping for x86-64 Use register constraint macros so that the return values of rdtsc are properly picked up and no local variable is overwritten. This is supposed to fix an oops on x86-64 with a 2.6.24 host kernel. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- x86/external-module-compat.h |7 --- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/x86/external-module-compat.h b/x86/external-module-compat.h index b0b9f21..b0de024 100644 --- a/x86/external-module-compat.h +++ b/x86/external-module-compat.h @@ -94,9 +94,10 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr, static inline unsigned long long kvm_native_read_tsc(void) { - unsigned long long val; - asm
Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6
Avi Kivity wrote:
> On 11/16/2009 01:17 PM, Dietmar Maurer wrote:
>> 38f0: 0f 85 82 01 00 00     jne    3a78 svm_get_msr+0x2b8
>> 38f6: 48 8b 87 e0 27 00 00  mov    0x27e0(%rdi),%rax
>> 38fd: 48 8b 48 50           mov    0x50(%rax),%rcx
>> 3901: 0f 31                 rdtsc
>> 3903: 48 01 c8              add    %rcx,%rax
>> # this is svm_get_msr+0x146
>> 3906: 48 89 02              mov    %rax,(%rdx)
>
> Looks like a miscompile of native_read_tsc(), it needs to use %edx:%eax, not assume the result is in %rax.
>
> Jan, looks like the culprit is
>
> static inline unsigned long long kvm_native_read_tsc(void)
> {
> 	unsigned long long val;
> 	asm volatile("rdtsc" : "=A" (val));
> 	return val;
> }
>
> "=A" only works correctly on i386, need to use "=a" "=d" for portability.

Yes, I already committed a fix and am currently propagating it through all series.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6
On 11/16/2009 02:03 PM, Jan Kiszka wrote:
> Yes, I already committed a fix and am currently propagating it through all series.

Naming the fix will be interesting. kvm-kmod-2.6.31.6.1?

-- 
error compiling committee.c: too many arguments to function
Re: Virtualization Performance: Intel vs. AMD
On 11/16/2009 12:29 AM, Gordan Bobic wrote: Thomas Fjellstrom wrote: On Sun November 15 2009, Neil Aggarwal wrote: The Core i7 has hyperthreading, so you see 8 logical CPUs. Are you saying the AMD processors do not have hyperthreading? Course not. Hyperthreading is dubious at best. That's a rather questionable answer to a rather broad issue. SMT is useful, especially on processors with deep pipelines (think Pentium 4 - and in general, deeper pipelines tend to be required for higher clock speeds), because it reduces the number of context switches. Context switches are certainly one of the most expensive operations if not the most expensive operation you can do on a processor, and typically requires flushing the pipelines. Double the number of hardware threads, and you halve the number of context switches. The real win is in parallelizing memory access. If a cache miss costs 200 cycles, no amount of pipelining and out-of-order execution will hide this cost. Running two threads in parallel will at best hide the cost by letting another thread execute, or at least issue two memory accesses in parallel instead of just one. This typically isn't useful if your CPU is processing one single-threaded application 99% of the time, but on a loaded server it can make a significant difference to throughput. If you are able to saturate the multiple threads (typically easier with many small guests rather than a few large ones) then hyperthreading is likely a win. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6
On 11/16/2009 02:08 PM, Jan Kiszka wrote:
> Avi Kivity wrote:
>> On 11/16/2009 02:03 PM, Jan Kiszka wrote:
>>> Yes, I already committed a fix and am currently propagating it through all series.
>>
>> Naming the fix will be interesting. kvm-kmod-2.6.31.6.1?
>
> Yes, good question. I already thought about kvm-kmod-2.6.31.6b or kvm-kmod-2.6.31.6-2 as well. Nothing convinced me yet, still open for creative ideas.

-2 may confuse rpm if someone packages it. b or .1 ought to work.

-- 
error compiling committee.c: too many arguments to function
[PATCH 00/42] KVM updates for the 2.6.33 merge window (batch 1/2)
Highlights:
 - improved kernel context switching speed
 - better interoperation with other users of virtualization extensions
 - improved irq scaling
 - nested svm improvements and tracing
 - improved cpufreq integration
 - spin loop detection on newer hardware

Notes:
 - kvm/ppc64 support will be merged through the powerpc tree
 - depends on tip x86/entry branch (user return notifiers)

Alexander Graf (2):
      KVM: Activate Virtualization On Demand
      KVM: SVM: Notify nested hypervisor of lost event injections

Avi Kivity (6):
      core, x86: Add user return notifiers
      x86: Fix user return notifier build
      KVM: Don't wrap schedule() with vcpu_put()/vcpu_load()
      KVM: Don't pass kvm_run arguments
      KVM: Return -ENOTTY on unrecognized ioctls
      KVM: Move assigned device code to own file

Glauber Costa (1):
      KVM: x86: include pvclock MSRs in msrs_to_save

Gleb Natapov (9):
      KVM: Call pic_clear_isr() on pic reset to reuse logic there
      KVM: Move irq sharing information to irqchip level
      KVM: Change irq routing table to use gsi indexed array
      KVM: Maintain back mapping from irqchip/pin to gsi
      KVM: Move irq routing data structure to rcu locking
      KVM: Move irq ack notifier list to arch independent code
      KVM: Convert irq notifiers lists to RCU locking
      KVM: Move IO APIC to its own lock
      KVM: Drop kvm->irq_lock lock from irq injection path

Huang Weiyi (1):
      KVM: remove duplicated #include

Jan Kiszka (2):
      KVM: x86: Refactor guest debug IOCTL handling
      KVM: x86: Rework guest single-step flag injection and filtering

Jiri Slaby (1):
      KVM: fix lock imbalance in kvm_*_irq_source_id()

Joerg Roedel (7):
      KVM: SVM: reorganize svm_interrupt_allowed
      KVM: SVM: don't copy exit_int_info on nested vmrun
      KVM: SVM: Remove remaining occurences of rdtscll
      KVM: SVM: Move INTR vmexit out of atomic code
      KVM: SVM: Add tracepoint for nested vmrun
      KVM: SVM: Add tracepoint for nested #vmexit
      KVM: SVM: Add tracepoint for injected #vmexit

Juan Quintela (1):
      KVM: remove pre_task_link setting in save_state_to_tss16

Marcelo Tosatti (2):
      KVM: SVM: remove needless mmap_sem acquision from nested_svm_map
      KVM: x86: disable paravirt mmu reporting

Mohammed Gamal (5):
      KVM: x86 emulator: Add 'push/pop sreg' instructions
      KVM: x86 emulator: Introduce No64 decode option
      KVM: x86 emulator: Add missing decoder flags for 'or' instructions
      KVM: x86 emulator: Add pusha and popa instructions
      KVM: VMX: Enhance invalid guest state emulation

Stephen Rothwell (1):
      x86: Fix user return notifier put_cpu_var() invocation

Zachary Amsden (4):
      KVM: Separate timer intialization into an indepedent function
      KVM: Kill the confusing tsc_ref_khz and ref_freq variables
      KVM: Fix printk name error in svm.c
      KVM: Fix hotplug of CPUs
[PATCH 03/42] x86: Fix user return notifier put_cpu_var() invocation
From: Stephen Rothwell s...@canb.auug.org.au

Today's linux-next build (x86_64 allmodconfig) failed like this:

kernel/user-return-notifier.c: In function 'fire_user_return_notifiers':
kernel/user-return-notifier.c:45: error: expected expression before ')' token

Introduced by commit 7c68af6e32c73992bad24107311f3433c89016e2 (core, x86: Add user return notifiers) from the tip and kvm trees but revealed by commit e0fdb0e050eae331046385643618f12452aa7e73 (percpu: add __percpu for sparse) from the percpu tree.

Before that percpu tree commit, put_cpu_var() would compile without error (even though it really needs a parameter).

Signed-off-by: Stephen Rothwell s...@canb.auug.org.au
Cc: Avi Kivity a...@redhat.com
Cc: Peter Zijlstra pet...@infradead.org
Cc: Tejun Heo t...@kernel.org
Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Christoph Lameter c...@linux-foundation.org
LKML-Reference: 20091102161722.eea4358d@canb.auug.org.au
Signed-off-by: Ingo Molnar mi...@elte.hu
---
 kernel/user-return-notifier.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/user-return-notifier.c b/kernel/user-return-notifier.c
index 530ccb8..03e2d6f 100644
--- a/kernel/user-return-notifier.c
+++ b/kernel/user-return-notifier.c
@@ -42,5 +42,5 @@ void fire_user_return_notifiers(void)
 	head = &get_cpu_var(return_notifier_list);
 	hlist_for_each_entry_safe(urn, tmp1, tmp2, head, link)
 		urn->on_user_return(urn);
-	put_cpu_var();
+	put_cpu_var(return_notifier_list);
 }
-- 
1.6.5.2
[PATCH 28/42] KVM: fix lock imbalance in kvm_*_irq_source_id()
From: Jiri Slaby jirisl...@gmail.com

Stanse found 2 lock imbalances in kvm_request_irq_source_id and kvm_free_irq_source_id. They omit to unlock kvm->irq_lock on fail paths.

Fix that by adding unlock labels at the end of the functions and jump there from the fail paths.

Signed-off-by: Jiri Slaby jirisl...@gmail.com
Cc: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 virt/kvm/irq_comm.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 15a83b9..00c68d2 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -220,11 +220,13 @@ int kvm_request_irq_source_id(struct kvm *kvm)
 	if (irq_source_id >= sizeof(kvm->arch.irq_sources_bitmap)) {
 		printk(KERN_WARNING "kvm: exhaust allocatable IRQ sources!\n");
-		return -EFAULT;
+		irq_source_id = -EFAULT;
+		goto unlock;
 	}
 	ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID);
 	set_bit(irq_source_id, bitmap);
+unlock:
 	mutex_unlock(&kvm->irq_lock);
 	return irq_source_id;
@@ -240,7 +242,7 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
 	if (irq_source_id < 0 ||
 	    irq_source_id >= sizeof(kvm->arch.irq_sources_bitmap)) {
 		printk(KERN_ERR "kvm: IRQ source ID out of range!\n");
-		return;
+		goto unlock;
 	}
 	for (i = 0; i < KVM_IOAPIC_NUM_PINS; i++) {
 		clear_bit(irq_source_id, &kvm->arch.vioapic->irq_states[i]);
@@ -251,6 +253,7 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
 #endif
 	}
 	clear_bit(irq_source_id, &kvm->arch.irq_sources_bitmap);
+unlock:
 	mutex_unlock(&kvm->irq_lock);
 }
-- 
1.6.5.2
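The single-exit "goto unlock" pattern this patch applies can be sketched in user space as follows. This is an illustration under stated assumptions, not the KVM code: it uses a pthread mutex in place of the kernel mutex, and struct source_table, MAX_SOURCES and request_source_id() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_SOURCES 32 /* hypothetical limit, fits in an unsigned long */

struct source_table {
    pthread_mutex_t lock;
    unsigned long bitmap; /* bit n set means ID n is in use */
};

/*
 * Allocate the lowest free ID, or return -EFAULT when none is left.
 * Every path, including the failure path, leaves through the "unlock"
 * label, so the mutex is never left held on return.
 */
int request_source_id(struct source_table *t)
{
    int id;

    assert(t != NULL);

    pthread_mutex_lock(&t->lock);

    for (id = 0; id < MAX_SOURCES; id++)
        if (!(t->bitmap & (1UL << id)))
            break;

    if (id >= MAX_SOURCES) {
        id = -EFAULT; /* same error code the patch keeps */
        goto unlock;
    }

    t->bitmap |= 1UL << id;
unlock:
    pthread_mutex_unlock(&t->lock);
    return id;
}
```

Routing the error through the return variable and a shared label keeps the lock and unlock calls paired in one place, which is what makes this kind of imbalance easy for a static checker (or a reviewer) to rule out.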
[PATCH 18/42] KVM: Move assigned device code to own file
Signed-off-by: Avi Kivity a...@redhat.com --- arch/ia64/kvm/Makefile |2 +- arch/x86/kvm/Makefile|3 +- include/linux/kvm_host.h | 17 + virt/kvm/assigned-dev.c | 818 ++ virt/kvm/kvm_main.c | 798 + 5 files changed, 840 insertions(+), 798 deletions(-) create mode 100644 virt/kvm/assigned-dev.c diff --git a/arch/ia64/kvm/Makefile b/arch/ia64/kvm/Makefile index 0bb99b7..1089b3e 100644 --- a/arch/ia64/kvm/Makefile +++ b/arch/ia64/kvm/Makefile @@ -49,7 +49,7 @@ EXTRA_CFLAGS += -Ivirt/kvm -Iarch/ia64/kvm/ EXTRA_AFLAGS += -Ivirt/kvm -Iarch/ia64/kvm/ common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \ - coalesced_mmio.o irq_comm.o) + coalesced_mmio.o irq_comm.o assigned-dev.o) ifeq ($(CONFIG_IOMMU_API),y) common-objs += $(addprefix ../../../virt/kvm/, iommu.o) diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 0e7fe78..31a7035 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -6,7 +6,8 @@ CFLAGS_svm.o := -I. CFLAGS_vmx.o := -I. kvm-y += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \ - coalesced_mmio.o irq_comm.o eventfd.o) + coalesced_mmio.o irq_comm.o eventfd.o \ + assigned-dev.o) kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o) kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \ diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 4aa5e1d..c0a1cc3 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -577,4 +577,21 @@ static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu) return vcpu-kvm-bsp_vcpu_id == vcpu-vcpu_id; } #endif + +#ifdef __KVM_HAVE_DEVICE_ASSIGNMENT + +long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, + unsigned long arg); + +#else + +static inline long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, + unsigned long arg) +{ + return -ENOTTY; +} + #endif + +#endif + diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c new file mode 100644 index 000..fd9c097 --- /dev/null +++ b/virt/kvm/assigned-dev.c @@ -0,0 
+1,818 @@ +/* + * Kernel-based Virtual Machine - device assignment support + * + * Copyright (C) 2006-9 Red Hat, Inc + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#include linux/kvm_host.h +#include linux/kvm.h +#include linux/uaccess.h +#include linux/vmalloc.h +#include linux/errno.h +#include linux/spinlock.h +#include linux/pci.h +#include linux/interrupt.h +#include irq.h + +static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head, + int assigned_dev_id) +{ + struct list_head *ptr; + struct kvm_assigned_dev_kernel *match; + + list_for_each(ptr, head) { + match = list_entry(ptr, struct kvm_assigned_dev_kernel, list); + if (match-assigned_dev_id == assigned_dev_id) + return match; + } + return NULL; +} + +static int find_index_from_host_irq(struct kvm_assigned_dev_kernel + *assigned_dev, int irq) +{ + int i, index; + struct msix_entry *host_msix_entries; + + host_msix_entries = assigned_dev-host_msix_entries; + + index = -1; + for (i = 0; i assigned_dev-entries_nr; i++) + if (irq == host_msix_entries[i].vector) { + index = i; + break; + } + if (index 0) { + printk(KERN_WARNING Fail to find correlated MSI-X entry!\n); + return 0; + } + + return index; +} + +static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work) +{ + struct kvm_assigned_dev_kernel *assigned_dev; + struct kvm *kvm; + int i; + + assigned_dev = container_of(work, struct kvm_assigned_dev_kernel, + interrupt_work); + kvm = assigned_dev-kvm; + + spin_lock_irq(assigned_dev-assigned_dev_lock); + if (assigned_dev-irq_requested_type KVM_DEV_IRQ_HOST_MSIX) { + struct kvm_guest_msix_entry *guest_entries = + assigned_dev-guest_msix_entries; + for (i = 0; i assigned_dev-entries_nr; i++) { + if (!(guest_entries[i].flags + KVM_ASSIGNED_MSIX_PENDING)) + continue; + guest_entries[i].flags = ~KVM_ASSIGNED_MSIX_PENDING; + kvm_set_irq(assigned_dev-kvm, +
[PATCH 25/42] KVM: SVM: reorganize svm_interrupt_allowed
From: Joerg Roedel <joerg.roe...@amd.com>

This patch reorganizes the logic in svm_interrupt_allowed to make it
better to read. This is important because the logic is a lot more
complicated with Nested SVM.

Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/svm.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 59fe4d5..3f3fe81 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2472,10 +2472,18 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb *vmcb = svm->vmcb;
-	return (vmcb->save.rflags & X86_EFLAGS_IF) &&
-		!(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) &&
-		gif_set(svm) &&
-		!(is_nested(svm) && (svm->vcpu.arch.hflags & HF_VINTR_MASK));
+	int ret;
+
+	if (!gif_set(svm) ||
+	     (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK))
+		return 0;
+
+	ret = !!(vmcb->save.rflags & X86_EFLAGS_IF);
+
+	if (is_nested(svm))
+		return ret && !(svm->vcpu.arch.hflags & HF_VINTR_MASK);
+
+	return ret;
 }

 static void enable_irq_window(struct kvm_vcpu *vcpu)
-- 
1.6.5.2
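Note on the patch above: the reorganization is meant to be purely
cosmetic, so the single-expression form and the early-return form should
agree on every input. A minimal user-space sketch of that check follows.
struct irq_state and its fields are hypothetical stand-ins for the VMCB
and hflags bits the real function reads; nothing here is kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state svm_interrupt_allowed() reads. */
struct irq_state {
	bool gif;         /* global interrupt flag is set */
	bool int_shadow;  /* interrupt shadow is active */
	bool eflags_if;   /* guest RFLAGS.IF */
	bool nested;      /* a nested guest is running */
	bool vintr_mask;  /* HF_VINTR_MASK is set in hflags */
};

/* Single-expression form, modelled on the lines the patch removes. */
int allowed_old(const struct irq_state *s)
{
	return s->eflags_if &&
	       !s->int_shadow &&
	       s->gif &&
	       !(s->nested && s->vintr_mask);
}

/* Early-return form, modelled on the lines the patch adds. */
int allowed_new(const struct irq_state *s)
{
	int ret;

	if (!s->gif || s->int_shadow)
		return 0;

	ret = s->eflags_if;

	if (s->nested)
		return ret && !s->vintr_mask;

	return ret;
}

/* Count the input combinations on which the two forms disagree. */
int count_mismatches(void)
{
	int bits, mismatches = 0;

	for (bits = 0; bits < 32; bits++) {
		struct irq_state s = {
			.gif        = (bits >> 0) & 1,
			.int_shadow = (bits >> 1) & 1,
			.eflags_if  = (bits >> 2) & 1,
			.nested     = (bits >> 3) & 1,
			.vintr_mask = (bits >> 4) & 1,
		};

		if (allowed_old(&s) != allowed_new(&s))
			mismatches++;
	}
	return mismatches;
}
```

With five boolean inputs there are only 32 cases, so an exhaustive
comparison is cheap; count_mismatches() returning 0 supports the claim
that only readability changed.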
[PATCH 37/42] KVM: x86: include pvclock MSRs in msrs_to_save
From: Glauber Costa <glom...@redhat.com>

For a while now, we are issuing a rdmsr instruction to find out which
msrs in our save list are really supported by the underlying machine.
However, it fails to account for kvm-specific msrs, such as the pvclock
ones.

This patch moves them to the beginning of the list, and skips testing
them.

Cc: sta...@kernel.org
Signed-off-by: Glauber Costa <glom...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/x86.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 385cd0a..4de5bc0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -503,16 +503,19 @@ static inline u32 bit(int bitno)
  * and KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST.
  *
  * This list is modified at module load time to reflect the
- * capabilities of the host cpu.
+ * capabilities of the host cpu. This capabilities test skips MSRs that are
+ * kvm-specific. Those are put in the beginning of the list.
  */
+
+#define KVM_SAVE_MSRS_BEGIN	2
 static u32 msrs_to_save[] = {
+	MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
 	MSR_K6_STAR,
 #ifdef CONFIG_X86_64
 	MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
 #endif
-	MSR_IA32_TSC, MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
-	MSR_IA32_PERF_STATUS, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
+	MSR_IA32_TSC, MSR_IA32_PERF_STATUS, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
 };

 static unsigned num_msrs_to_save;
@@ -2446,7 +2449,8 @@ static void kvm_init_msr_list(void)
 	u32 dummy[2];
 	unsigned i, j;

-	for (i = j = 0; i < ARRAY_SIZE(msrs_to_save); i++) {
+	/* skip the first msrs in the list. KVM-specific */
+	for (i = j = KVM_SAVE_MSRS_BEGIN; i < ARRAY_SIZE(msrs_to_save); i++) {
 		if (rdmsr_safe(msrs_to_save[i], &dummy[0], &dummy[1]) < 0)
 			continue;
 		if (j < i)
-- 
1.6.5.2
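Note on the patch above: the idea is an in-place compaction that leaves
a fixed-size prefix of the list untested. A rough user-space sketch of
that loop shape follows. SAVE_LIST_BEGIN, probe_supported() and
filter_list() are made-up names for illustration; the probe is a toy
rule standing in for rdmsr_safe(), not a model of real hardware.

```c
#include <stddef.h>

/* Hypothetical analogue of KVM_SAVE_MSRS_BEGIN: entries before this
 * index are always kept and never probed. */
#define SAVE_LIST_BEGIN 2

/* Toy probe standing in for rdmsr_safe(): pretend that only
 * even-numbered ids are supported by the host. */
int probe_supported(unsigned id)
{
	return (id % 2) == 0;
}

/*
 * Compact the list in place, keeping the first SAVE_LIST_BEGIN entries
 * unconditionally and every later entry that passes the probe.
 * Returns the number of entries kept.
 */
size_t filter_list(unsigned *list, size_t n)
{
	size_t i, j;

	if (n < SAVE_LIST_BEGIN)
		return n;

	for (i = j = SAVE_LIST_BEGIN; i < n; i++) {
		if (!probe_supported(list[i]))
			continue;
		if (j < i)
			list[j] = list[i];
		j++;
	}
	return j;
}
```

Starting both i and j at the prefix length is what makes the prefix
survive: j counts kept entries, so it has to include the ones that were
never probed.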
[PATCH 21/42] KVM: VMX: Enhance invalid guest state emulation
From: Mohammed Gamal <m.gamal...@gmail.com>

- Change returned handle_invalid_guest_state() to return relevant exit codes
- Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
- Return to userspace instead of repeatedly trying to emulate
  instructions that have already failed

Signed-off-by: Mohammed Gamal <m.gamal...@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/vmx.c |   44 ++++++++++++++++++++------------------------
 1 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4635298..73cb5dd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -107,7 +107,6 @@ struct vcpu_vmx {
 	} rmode;
 	int vpid;
 	bool emulation_required;
-	enum emulation_result invalid_state_emulation_result;

 	/* Support for vnmi-less CPUs */
 	int soft_vnmi_blocked;
@@ -3322,35 +3321,37 @@ static int handle_nmi_window(struct kvm_vcpu *vcpu)
 	return 1;
 }

-static void handle_invalid_guest_state(struct kvm_vcpu *vcpu)
+static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	enum emulation_result err = EMULATE_DONE;
-
-	local_irq_enable();
-	preempt_enable();
+	int ret = 1;

 	while (!guest_state_valid(vcpu)) {
 		err = emulate_instruction(vcpu, 0, 0, 0);

-		if (err == EMULATE_DO_MMIO)
-			break;
+		if (err == EMULATE_DO_MMIO) {
+			ret = 0;
+			goto out;
+		}

 		if (err != EMULATE_DONE) {
 			kvm_report_emulation_failure(vcpu, "emulation failure");
-			break;
+			vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+			vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+			ret = 0;
+			goto out;
 		}

 		if (signal_pending(current))
-			break;
+			goto out;
 		if (need_resched())
 			schedule();
 	}

-	preempt_disable();
-	local_irq_disable();
-
-	vmx->invalid_state_emulation_result = err;
+	vmx->emulation_required = 0;
+out:
+	return ret;
 }

 /*
@@ -3406,13 +3407,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)

 	trace_kvm_exit(exit_reason, kvm_rip_read(vcpu));

-	/* If we need to emulate an MMIO from handle_invalid_guest_state
-	 * we just return 0 */
-	if (vmx->emulation_required && emulate_invalid_guest_state) {
-		if (guest_state_valid(vcpu))
-			vmx->emulation_required = 0;
-		return vmx->invalid_state_emulation_result != EMULATE_DO_MMIO;
-	}
+	/* If guest state is invalid, start emulating */
+	if (vmx->emulation_required && emulate_invalid_guest_state)
+		return handle_invalid_guest_state(vcpu);

 	/* Access CR3 don't cause VMExit in paging mode, so we need
 	 * to sync with guest real CR3. */
@@ -3607,11 +3604,10 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
 		vmx->entry_time = ktime_get();

-	/* Handle invalid guest state instead of entering VMX */
-	if (vmx->emulation_required && emulate_invalid_guest_state) {
-		handle_invalid_guest_state(vcpu);
+	/* Don't enter VMX if guest state is invalid, let the exit handler
+	   start emulation until we arrive back to a valid state */
+	if (vmx->emulation_required && emulate_invalid_guest_state)
 		return;
-	}

 	if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
 		vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
-- 
1.6.5.2
[PATCH 11/42] KVM: Maintain back mapping from irqchip/pin to gsi
From: Gleb Natapov <g...@redhat.com>

Maintain back mapping from irqchip/pin to gsi to speedup
interrupt acknowledgment notifications.

[avi: build fix on non-x86/ia64]

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/ia64/include/asm/kvm.h |    1 +
 arch/x86/include/asm/kvm.h  |    1 +
 include/linux/kvm_host.h    |    9 +++++++++
 virt/kvm/irq_comm.c         |   31 ++++++++++++++-----------------
 4 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/arch/ia64/include/asm/kvm.h b/arch/ia64/include/asm/kvm.h
index 18a7e49..bc90c75 100644
--- a/arch/ia64/include/asm/kvm.h
+++ b/arch/ia64/include/asm/kvm.h
@@ -60,6 +60,7 @@ struct kvm_ioapic_state {
 #define KVM_IRQCHIP_PIC_MASTER   0
 #define KVM_IRQCHIP_PIC_SLAVE    1
 #define KVM_IRQCHIP_IOAPIC       2
+#define KVM_NR_IRQCHIPS          3

 #define KVM_CONTEXT_SIZE	8*1024

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 4a5fe91..f02e87a 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -79,6 +79,7 @@ struct kvm_ioapic_state {
 #define KVM_IRQCHIP_PIC_MASTER   0
 #define KVM_IRQCHIP_PIC_SLAVE    1
 #define KVM_IRQCHIP_IOAPIC       2
+#define KVM_NR_IRQCHIPS          3

 /* for KVM_GET_REGS and KVM_SET_REGS */
 struct kvm_regs {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f403e66..cc2d749 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -131,7 +131,10 @@ struct kvm_kernel_irq_routing_entry {
 	struct hlist_node link;
 };

+#ifdef __KVM_HAVE_IOAPIC
+
 struct kvm_irq_routing_table {
+	int chip[KVM_NR_IRQCHIPS][KVM_IOAPIC_NUM_PINS];
 	struct kvm_kernel_irq_routing_entry *rt_entries;
 	u32 nr_rt_entries;
 	/*
@@ -141,6 +144,12 @@ struct kvm_irq_routing_table {
 	struct hlist_head map[0];
 };

+#else
+
+struct kvm_irq_routing_table {};
+
+#endif
+
 struct kvm {
 	spinlock_t mmu_lock;
 	spinlock_t requests_lock;
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 81950f6..59cf8da 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -175,25 +175,16 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 {
 	struct kvm_irq_ack_notifier *kian;
 	struct hlist_node *n;
-	unsigned gsi = pin;
-	int i;
+	int gsi;

 	trace_kvm_ack_irq(irqchip, pin);

-	for (i = 0; i < kvm->irq_routing->nr_rt_entries; i++) {
-		struct kvm_kernel_irq_routing_entry *e;
-		e = &kvm->irq_routing->rt_entries[i];
-		if (e->type == KVM_IRQ_ROUTING_IRQCHIP &&
-		    e->irqchip.irqchip == irqchip &&
-		    e->irqchip.pin == pin) {
-			gsi = e->gsi;
-			break;
-		}
-	}
-
-	hlist_for_each_entry(kian, n, &kvm->arch.irq_ack_notifier_list, link)
-		if (kian->gsi == gsi)
-			kian->irq_acked(kian);
+	gsi = kvm->irq_routing->chip[irqchip][pin];
+	if (gsi != -1)
+		hlist_for_each_entry(kian, n, &kvm->arch.irq_ack_notifier_list,
+				     link)
+			if (kian->gsi == gsi)
+				kian->irq_acked(kian);
 }

 void kvm_register_irq_ack_notifier(struct kvm *kvm,
@@ -332,6 +323,9 @@ static int setup_routing_entry(struct kvm_irq_routing_table *rt,
 		}
 		e->irqchip.irqchip = ue->u.irqchip.irqchip;
 		e->irqchip.pin = ue->u.irqchip.pin + delta;
+		if (e->irqchip.pin >= KVM_IOAPIC_NUM_PINS)
+			goto out;
+		rt->chip[ue->u.irqchip.irqchip][e->irqchip.pin] = ue->gsi;
 		break;
 	case KVM_IRQ_ROUTING_MSI:
 		e->set = kvm_set_msi;
@@ -356,7 +350,7 @@ int kvm_set_irq_routing(struct kvm *kvm,
 			unsigned flags)
 {
 	struct kvm_irq_routing_table *new, *old;
-	u32 i, nr_rt_entries = 0;
+	u32 i, j, nr_rt_entries = 0;
 	int r;

 	for (i = 0; i < nr; ++i) {
@@ -377,6 +371,9 @@ int kvm_set_irq_routing(struct kvm *kvm,

 	new->rt_entries = (void *)&new->map[nr_rt_entries];
 	new->nr_rt_entries = nr_rt_entries;
+	for (i = 0; i < 3; i++)
+		for (j = 0; j < KVM_IOAPIC_NUM_PINS; j++)
+			new->chip[i][j] = -1;

 	for (i = 0; i < nr; ++i) {
 		r = -EINVAL;
-- 
1.6.5.2
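Note on the patch above: the back mapping replaces a linear scan over
all routing entries with a direct table lookup keyed by (irqchip, pin),
using -1 as the "no gsi routed here" marker. A small user-space sketch
of that data structure follows. NR_CHIPS, NR_PINS and the table_*
helpers are illustrative names and values assumed for this example, not
the kernel's definitions.

```c
/* Illustrative sizes, assumed for this sketch only. */
#define NR_CHIPS 3
#define NR_PINS  24

struct routing_table {
	/* gsi routed to each (chip, pin), or -1 if none */
	int chip[NR_CHIPS][NR_PINS];
};

/* Mark every (chip, pin) slot as unrouted. */
void table_init(struct routing_table *rt)
{
	unsigned i, j;

	for (i = 0; i < NR_CHIPS; i++)
		for (j = 0; j < NR_PINS; j++)
			rt->chip[i][j] = -1;
}

/* Record that gsi is routed to (chip, pin). Returns -1 if out of range. */
int table_add(struct routing_table *rt, unsigned chip, unsigned pin, int gsi)
{
	if (chip >= NR_CHIPS || pin >= NR_PINS)
		return -1;
	rt->chip[chip][pin] = gsi;
	return 0;
}

/* Constant-time reverse lookup; -1 means nothing is routed there. */
int table_lookup(const struct routing_table *rt, unsigned chip, unsigned pin)
{
	if (chip >= NR_CHIPS || pin >= NR_PINS)
		return -1;
	return rt->chip[chip][pin];
}
```

The range check before the store matters as much as the lookup itself,
since the pin index comes from a userspace-supplied routing entry.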
[PATCH 10/42] KVM: Change irq routing table to use gsi indexed array
From: Gleb Natapov g...@redhat.com Use gsi indexed array instead of scanning all entries on each interrupt injection. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com --- include/linux/kvm_host.h | 21 +-- virt/kvm/irq_comm.c | 88 +++-- virt/kvm/kvm_main.c |1 - 3 files changed, 71 insertions(+), 39 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 1c7f8c4..f403e66 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -128,7 +128,17 @@ struct kvm_kernel_irq_routing_entry { } irqchip; struct msi_msg msi; }; - struct list_head link; + struct hlist_node link; +}; + +struct kvm_irq_routing_table { + struct kvm_kernel_irq_routing_entry *rt_entries; + u32 nr_rt_entries; + /* +* Array indexed by gsi. Each entry contains list of irq chips +* the gsi is connected to. +*/ + struct hlist_head map[0]; }; struct kvm { @@ -166,7 +176,7 @@ struct kvm { struct mutex irq_lock; #ifdef CONFIG_HAVE_KVM_IRQCHIP - struct list_head irq_routing; /* of kvm_kernel_irq_routing_entry */ + struct kvm_irq_routing_table *irq_routing; struct hlist_head mask_notifier_list; #endif @@ -390,7 +400,12 @@ void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq, struct kvm_irq_mask_notifier *kimn); void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask); -int kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level); +#ifdef __KVM_HAVE_IOAPIC +void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic, + union kvm_ioapic_redirect_entry *entry, + unsigned long *deliver_bitmask); +#endif +int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level); void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin); void kvm_register_irq_ack_notifier(struct kvm *kvm, struct kvm_irq_ack_notifier *kian); diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 9783f5c..81950f6 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -144,10 +144,12 @@ static int 
kvm_set_msi(struct kvm_kernel_irq_routing_entry *e, * = 0 Interrupt was coalesced (previous irq is still pending) * 0 Number of CPUs interrupt was delivered to */ -int kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level) +int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level) { struct kvm_kernel_irq_routing_entry *e; int ret = -1; + struct kvm_irq_routing_table *irq_rt; + struct hlist_node *n; trace_kvm_set_irq(irq, level, irq_source_id); @@ -157,8 +159,9 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level) * IOAPIC. So set the bit in both. The guest will ignore * writes to the unused one. */ - list_for_each_entry(e, kvm-irq_routing, link) - if (e-gsi == irq) { + irq_rt = kvm-irq_routing; + if (irq irq_rt-nr_rt_entries) + hlist_for_each_entry(e, n, irq_rt-map[irq], link) { int r = e-set(e, kvm, irq_source_id, level); if (r 0) continue; @@ -170,20 +173,23 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level) void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin) { - struct kvm_kernel_irq_routing_entry *e; struct kvm_irq_ack_notifier *kian; struct hlist_node *n; unsigned gsi = pin; + int i; trace_kvm_ack_irq(irqchip, pin); - list_for_each_entry(e, kvm-irq_routing, link) + for (i = 0; i kvm-irq_routing-nr_rt_entries; i++) { + struct kvm_kernel_irq_routing_entry *e; + e = kvm-irq_routing-rt_entries[i]; if (e-type == KVM_IRQ_ROUTING_IRQCHIP e-irqchip.irqchip == irqchip e-irqchip.pin == pin) { gsi = e-gsi; break; } + } hlist_for_each_entry(kian, n, kvm-arch.irq_ack_notifier_list, link) if (kian-gsi == gsi) @@ -280,26 +286,30 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask) kimn-func(kimn, mask); } -static void __kvm_free_irq_routing(struct list_head *irq_routing) -{ - struct kvm_kernel_irq_routing_entry *e, *n; - - list_for_each_entry_safe(e, n, irq_routing, link) - kfree(e); -} - void kvm_free_irq_routing(struct kvm *kvm) { mutex_lock(kvm-irq_lock); - 
__kvm_free_irq_routing(kvm-irq_routing); + kfree(kvm-irq_routing); mutex_unlock(kvm-irq_lock); } -static
[PATCH 09/42] KVM: Move irq sharing information to irqchip level
From: Gleb Natapov g...@redhat.com This removes assumptions that max GSIs is smaller than number of pins. Sharing is tracked on pin level not GSI level. [avi: no PIC on ia64] Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/include/asm/kvm_host.h |1 - arch/x86/kvm/irq.h |1 + include/linux/kvm_host.h|2 +- virt/kvm/ioapic.h |1 + virt/kvm/irq_comm.c | 59 +++--- 5 files changed, 39 insertions(+), 25 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 0b113f2..35d3236 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -410,7 +410,6 @@ struct kvm_arch{ gpa_t ept_identity_map_addr; unsigned long irq_sources_bitmap; - unsigned long irq_states[KVM_IOAPIC_NUM_PINS]; u64 vm_init_tsc; }; diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h index 7d6058a..c025a23 100644 --- a/arch/x86/kvm/irq.h +++ b/arch/x86/kvm/irq.h @@ -71,6 +71,7 @@ struct kvm_pic { int output; /* intr from master PIC */ struct kvm_io_device dev; void (*ack_notifier)(void *opaque, int irq); + unsigned long irq_states[16]; }; struct kvm_pic *kvm_create_pic(struct kvm *kvm); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index b7bbb5d..1c7f8c4 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -120,7 +120,7 @@ struct kvm_kernel_irq_routing_entry { u32 gsi; u32 type; int (*set)(struct kvm_kernel_irq_routing_entry *e, - struct kvm *kvm, int level); + struct kvm *kvm, int irq_source_id, int level); union { struct { unsigned irqchip; diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h index 7080b71..6e461ad 100644 --- a/virt/kvm/ioapic.h +++ b/virt/kvm/ioapic.h @@ -41,6 +41,7 @@ struct kvm_ioapic { u32 irr; u32 pad; union kvm_ioapic_redirect_entry redirtbl[IOAPIC_NUM_PINS]; + unsigned long irq_states[IOAPIC_NUM_PINS]; struct kvm_io_device dev; struct kvm *kvm; void (*ack_notifier)(void *opaque, int irq); diff --git a/virt/kvm/irq_comm.c 
b/virt/kvm/irq_comm.c index 001663f..9783f5c 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -31,20 +31,39 @@ #include ioapic.h +static inline int kvm_irq_line_state(unsigned long *irq_state, +int irq_source_id, int level) +{ + /* Logical OR for level trig interrupt */ + if (level) + set_bit(irq_source_id, irq_state); + else + clear_bit(irq_source_id, irq_state); + + return !!(*irq_state); +} + static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e, - struct kvm *kvm, int level) + struct kvm *kvm, int irq_source_id, int level) { #ifdef CONFIG_X86 - return kvm_pic_set_irq(pic_irqchip(kvm), e-irqchip.pin, level); + struct kvm_pic *pic = pic_irqchip(kvm); + level = kvm_irq_line_state(pic-irq_states[e-irqchip.pin], + irq_source_id, level); + return kvm_pic_set_irq(pic, e-irqchip.pin, level); #else return -1; #endif } static int kvm_set_ioapic_irq(struct kvm_kernel_irq_routing_entry *e, - struct kvm *kvm, int level) + struct kvm *kvm, int irq_source_id, int level) { - return kvm_ioapic_set_irq(kvm-arch.vioapic, e-irqchip.pin, level); + struct kvm_ioapic *ioapic = kvm-arch.vioapic; + level = kvm_irq_line_state(ioapic-irq_states[e-irqchip.pin], + irq_source_id, level); + + return kvm_ioapic_set_irq(ioapic, e-irqchip.pin, level); } inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq) @@ -96,10 +115,13 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, } static int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e, - struct kvm *kvm, int level) + struct kvm *kvm, int irq_source_id, int level) { struct kvm_lapic_irq irq; + if (!level) + return -1; + trace_kvm_msi_set_irq(e-msi.address_lo, e-msi.data); irq.dest_id = (e-msi.address_lo @@ -125,34 +147,19 @@ static int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e, int kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level) { struct kvm_kernel_irq_routing_entry *e; - unsigned long *irq_state, sig_level; int ret = -1; trace_kvm_set_irq(irq, level, 
irq_source_id); WARN_ON(!mutex_is_locked(kvm-irq_lock)); - if (irq KVM_IOAPIC_NUM_PINS) { - irq_state = (unsigned long *)kvm-arch.irq_states[irq]; - -
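Note on the patch above: kvm_irq_line_state() treats a shared level
triggered line as the logical OR of all its sources, with one bit per
irq_source_id. A minimal user-space sketch of that bookkeeping follows.
It uses plain bit operations rather than the kernel's set_bit() and
clear_bit(), so unlike those it is not atomic, and irq_line_state() is
just a name picked for the example.

```c
/*
 * Update the per-pin source bitmap and return the resulting line
 * level: 1 while at least one source still asserts the line, else 0.
 * Assumes 0 <= source_id < number of bits in unsigned long.
 */
int irq_line_state(unsigned long *irq_state, int source_id, int level)
{
	if (level)
		*irq_state |= 1UL << source_id;
	else
		*irq_state &= ~(1UL << source_id);

	return *irq_state != 0;
}
```

For example, with two sources asserting the same pin, deasserting one
of them still yields 1; the line only drops once the last source lets
go.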
[PATCH 06/42] KVM: x86 emulator: Introduce No64 decode option
From: Mohammed Gamal m.gamal...@gmail.com Introduces a new decode option No64, which is used for instructions that are invalid in long mode. Signed-off-by: Mohammed Gamal m.gamal...@gmail.com Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/emulate.c | 42 ++ 1 files changed, 14 insertions(+), 28 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 1cdfec5..1f0ff4a 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -75,6 +75,8 @@ #define Group (114) /* Bits 3:5 of modrm byte extend opcode */ #define GroupDual (115) /* Alternate decoding of mod == 3 */ #define GroupMask 0xff/* Group number stored in bits 0:7 */ +/* Misc flags */ +#define No64 (128) /* Source 2 operand type */ #define Src2None(029) #define Src2CL (129) @@ -93,21 +95,21 @@ static u32 opcode_table[256] = { ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM, ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM, ByteOp | DstAcc | SrcImm, DstAcc | SrcImm, - ImplicitOps | Stack, ImplicitOps | Stack, + ImplicitOps | Stack | No64, ImplicitOps | Stack | No64, /* 0x08 - 0x0F */ ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM, ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM, - 0, 0, ImplicitOps | Stack, 0, + 0, 0, ImplicitOps | Stack | No64, 0, /* 0x10 - 0x17 */ ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM, ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM, ByteOp | DstAcc | SrcImm, DstAcc | SrcImm, - ImplicitOps | Stack, ImplicitOps | Stack, + ImplicitOps | Stack | No64, ImplicitOps | Stack | No64, /* 0x18 - 0x1F */ ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM, ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM, ByteOp | DstAcc | SrcImm, DstAcc | SrcImm, - ImplicitOps | Stack, ImplicitOps | Stack, + ImplicitOps | Stack | No64, ImplicitOps | Stack | No64, /* 0x20 - 0x27 */ ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM, ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | 
ModRM, @@ -161,7 +163,7 @@ static u32 opcode_table[256] = { /* 0x90 - 0x97 */ DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, /* 0x98 - 0x9F */ - 0, 0, SrcImm | Src2Imm16, 0, + 0, 0, SrcImm | Src2Imm16 | No64, 0, ImplicitOps | Stack, ImplicitOps | Stack, 0, 0, /* 0xA0 - 0xA7 */ ByteOp | DstReg | SrcMem | Mov | MemAbs, DstReg | SrcMem | Mov | MemAbs, @@ -188,7 +190,7 @@ static u32 opcode_table[256] = { ByteOp | DstMem | SrcImm | ModRM | Mov, DstMem | SrcImm | ModRM | Mov, /* 0xC8 - 0xCF */ 0, 0, 0, ImplicitOps | Stack, - ImplicitOps, SrcImmByte, ImplicitOps, ImplicitOps, + ImplicitOps, SrcImmByte, ImplicitOps | No64, ImplicitOps, /* 0xD0 - 0xD7 */ ByteOp | DstMem | SrcImplicit | ModRM, DstMem | SrcImplicit | ModRM, ByteOp | DstMem | SrcImplicit | ModRM, DstMem | SrcImplicit | ModRM, @@ -201,7 +203,7 @@ static u32 opcode_table[256] = { ByteOp | SrcImmUByte, SrcImmUByte, /* 0xE8 - 0xEF */ SrcImm | Stack, SrcImm | ImplicitOps, - SrcImmU | Src2Imm16, SrcImmByte | ImplicitOps, + SrcImmU | Src2Imm16 | No64, SrcImmByte | ImplicitOps, SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps, SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps, /* 0xF0 - 0xF7 */ @@ -967,6 +969,11 @@ done_prefixes: } } + if (mode == X86EMUL_MODE_PROT64 (c-d No64)) { + kvm_report_emulation_failure(ctxt-vcpu, invalid x86/64 instruction);; + return -1; + } + if (c-d Group) { group = c-d GroupMask; c-modrm = insn_fetch(u8, 1, c-eip); @@ -1739,15 +1746,9 @@ special_insn: emulate_2op_SrcV(add, c-src, c-dst, ctxt-eflags); break; case 0x06: /* push es */ - if (ctxt-mode == X86EMUL_MODE_PROT64) - goto cannot_emulate; - emulate_push_sreg(ctxt, VCPU_SREG_ES); break; case 0x07: /* pop es */ -if (ctxt-mode == X86EMUL_MODE_PROT64) -goto cannot_emulate; - rc = emulate_pop_sreg(ctxt, ops, VCPU_SREG_ES); if (rc != 0) goto done; @@ -1757,9 +1758,6 @@ special_insn: emulate_2op_SrcV(or, c-src, c-dst, ctxt-eflags); break; case 0x0e: /* push cs */ -if (ctxt-mode == X86EMUL_MODE_PROT64) -goto 
cannot_emulate; - emulate_push_sreg(ctxt, VCPU_SREG_CS); break; case
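Note on the patch above: the No64 flag moves the "invalid in long mode"
check out of the individual opcode handlers and into one place right
after decode. A reduced user-space sketch of that table-driven check
follows. The F_* flag names, the three-entry table and decode_check()
are hypothetical and only mirror the shape of the emulator's opcode
table; bit 28 is assumed here for the No64 analogue.

```c
#include <stdint.h>

/* Hypothetical decode flags for this sketch. */
#define F_IMPLICIT (1u << 0)
#define F_STACK    (1u << 1)
#define F_NO64     (1u << 28)  /* instruction is invalid in 64-bit mode */

enum emu_mode { MODE_PROT32, MODE_PROT64 };

/* Tiny stand-in for the opcode table; unlisted opcodes get flags 0. */
static const uint32_t opcode_flags[256] = {
	[0x06] = F_IMPLICIT | F_STACK | F_NO64,  /* push es */
	[0x07] = F_IMPLICIT | F_STACK | F_NO64,  /* pop es */
	[0x50] = F_STACK,                        /* push reg */
};

/*
 * Central check done once after decode: reject any opcode flagged
 * F_NO64 when emulating in 64-bit mode. Returns 0 if the opcode may
 * proceed, -1 otherwise.
 */
int decode_check(enum emu_mode mode, uint8_t opcode)
{
	uint32_t d = opcode_flags[opcode];

	if (mode == MODE_PROT64 && (d & F_NO64))
		return -1;
	return 0;
}
```

Keeping the restriction in the table means a new handler cannot forget
the mode check; it only has to carry the right flag.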
[PATCH 02/42] x86: Fix user return notifier build
When CONFIG_USER_RETURN_NOTIFIER is set, we need to link
kernel/user-return-notifier.o.

Signed-off-by: Avi Kivity <a...@redhat.com>
LKML-Reference: <1256473485-23109-1-git-send-email-...@redhat.com>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
 kernel/Makefile |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/Makefile b/kernel/Makefile
index b8d4cd8..0ae57a8 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -95,6 +95,7 @@ obj-$(CONFIG_RING_BUFFER) += trace/
 obj-$(CONFIG_SMP) += sched_cpupri.o
 obj-$(CONFIG_SLOW_WORK) += slow-work.o
 obj-$(CONFIG_PERF_EVENTS) += perf_event.o
+obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o

 ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <a...@linuxcare.com.au>, the -fno-omit-frame-pointer is
-- 
1.6.5.2
[PATCH 04/42] KVM: Don't wrap schedule() with vcpu_put()/vcpu_load()
Preemption notifiers will do that for us automatically.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 virt/kvm/kvm_main.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7495ce3..22b520b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1689,9 +1689,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		if (signal_pending(current))
 			break;

-		vcpu_put(vcpu);
 		schedule();
-		vcpu_load(vcpu);
 	}

 	finish_wait(&vcpu->wq, &wait);
-- 
1.6.5.2
[PATCH 01/42] core, x86: Add user return notifiers
Add a general per-cpu notifier that is called whenever the kernel is about to return to userspace. The notifier uses a thread_info flag and existing checks, so there is no impact on user return or context switch fast paths. This will be used initially to speed up KVM task switching by lazily updating MSRs. Signed-off-by: Avi Kivity a...@redhat.com LKML-Reference: 1253342422-13811-1-git-send-email-...@redhat.com Signed-off-by: H. Peter Anvin h...@zytor.com --- arch/Kconfig | 10 +++ arch/x86/Kconfig |1 + arch/x86/include/asm/thread_info.h |7 +++- arch/x86/kernel/process.c|2 + arch/x86/kernel/signal.c |3 ++ include/linux/user-return-notifier.h | 42 +++ kernel/user-return-notifier.c| 46 ++ 7 files changed, 109 insertions(+), 2 deletions(-) create mode 100644 include/linux/user-return-notifier.h create mode 100644 kernel/user-return-notifier.c diff --git a/arch/Kconfig b/arch/Kconfig index 7f418bb..4e312ff 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -83,6 +83,13 @@ config KRETPROBES def_bool y depends on KPROBES HAVE_KRETPROBES +config USER_RETURN_NOTIFIER + bool + depends on HAVE_USER_RETURN_NOTIFIER + help + Provide a kernel-internal notification when a cpu is about to + switch to user mode. 
+ config HAVE_IOREMAP_PROT bool @@ -126,4 +133,7 @@ config HAVE_DMA_API_DEBUG config HAVE_DEFAULT_NO_SPIN_MUTEXES bool +config HAVE_USER_RETURN_NOTIFIER + bool + source kernel/gcov/Kconfig diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 8da9374..1df175d 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -50,6 +50,7 @@ config X86 select HAVE_KERNEL_BZIP2 select HAVE_KERNEL_LZMA select HAVE_ARCH_KMEMCHECK + select HAVE_USER_RETURN_NOTIFIER config OUTPUT_FORMAT string diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index d27d0a2..375c917 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -83,6 +83,7 @@ struct thread_info { #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */ #define TIF_SECCOMP8 /* secure computing */ #define TIF_MCE_NOTIFY 10 /* notify userspace of an MCE */ +#define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */ #define TIF_NOTSC 16 /* TSC is not accessible in userland */ #define TIF_IA32 17 /* 32bit process */ #define TIF_FORK 18 /* ret_from_fork */ @@ -107,6 +108,7 @@ struct thread_info { #define _TIF_SYSCALL_AUDIT (1 TIF_SYSCALL_AUDIT) #define _TIF_SECCOMP (1 TIF_SECCOMP) #define _TIF_MCE_NOTIFY(1 TIF_MCE_NOTIFY) +#define _TIF_USER_RETURN_NOTIFY(1 TIF_USER_RETURN_NOTIFY) #define _TIF_NOTSC (1 TIF_NOTSC) #define _TIF_IA32 (1 TIF_IA32) #define _TIF_FORK (1 TIF_FORK) @@ -142,13 +144,14 @@ struct thread_info { /* Only used for 64 bit */ #define _TIF_DO_NOTIFY_MASK\ - (_TIF_SIGPENDING|_TIF_MCE_NOTIFY|_TIF_NOTIFY_RESUME) + (_TIF_SIGPENDING | _TIF_MCE_NOTIFY | _TIF_NOTIFY_RESUME | \ +_TIF_USER_RETURN_NOTIFY) /* flags to check in __switch_to() */ #define _TIF_WORK_CTXSW \ (_TIF_IO_BITMAP|_TIF_DEBUGCTLMSR|_TIF_DS_AREA_MSR|_TIF_NOTSC) -#define _TIF_WORK_CTXSW_PREV _TIF_WORK_CTXSW +#define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY) #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW|_TIF_DEBUG) #define PREEMPT_ACTIVE 0x1000 diff 
--git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 5284cd2..e51b056 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -9,6 +9,7 @@ #include linux/pm.h #include linux/clockchips.h #include linux/random.h +#include linux/user-return-notifier.h #include trace/events/power.h #include asm/system.h #include asm/apic.h @@ -224,6 +225,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p, */ memset(tss-io_bitmap, 0xff, prev-io_bitmap_max); } + propagate_user_return_notify(prev_p, next_p); } int sys_fork(struct pt_regs *regs) diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index 6a44a76..c49f90f 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -19,6 +19,7 @@ #include linux/stddef.h #include linux/personality.h #include linux/uaccess.h +#include linux/user-return-notifier.h #include asm/processor.h #include asm/ucontext.h @@ -872,6 +873,8 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags) if
[PATCH 39/42] KVM: SVM: Move INTR vmexit out of atomic code
From: Joerg Roedel <joerg.roe...@amd.com>

The nested SVM code emulates a #vmexit caused by a request to open the
irq window right in the request function. This is a bug because the
request function runs with preemption and interrupts disabled but the
#vmexit emulation might sleep. This can cause a schedule()-while-atomic
bug and is fixed with this patch.

Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/svm.c |   26 +-
 1 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e372854..884bffc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -85,6 +85,9 @@ struct nested_state {
 	/* gpa pointers to the real vectors */
 	u64 vmcb_msrpm;
 
+	/* A VMEXIT is required but not yet emulated */
+	bool exit_required;
+
 	/* cache for intercepts of the guest */
 	u16 intercept_cr_read;
 	u16 intercept_cr_write;
@@ -1379,7 +1382,14 @@ static inline int nested_svm_intr(struct vcpu_svm *svm)
 
 	svm->vmcb->control.exit_code = SVM_EXIT_INTR;
 
-	if (nested_svm_exit_handled(svm)) {
+	if (svm->nested.intercept & 1ULL) {
+		/*
+		 * The #vmexit can't be emulated here directly because this
+		 * code path runs with irqs and preemtion disabled. A
+		 * #vmexit emulation might sleep. Only signal request for
+		 * the #vmexit here.
+		 */
+		svm->nested.exit_required = true;
 		nsvm_printk("VMexit -> INTR\n");
 		return 1;
 	}
@@ -2340,6 +2350,13 @@ static int handle_exit(struct kvm_vcpu *vcpu)
 
 	trace_kvm_exit(exit_code, svm->vmcb->save.rip);
 
+	if (unlikely(svm->nested.exit_required)) {
+		nested_svm_vmexit(svm);
+		svm->nested.exit_required = false;
+
+		return 1;
+	}
+
 	if (is_nested(svm)) {
 		int vmexit;
 
@@ -2615,6 +2632,13 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 	u16 gs_selector;
 	u16 ldt_selector;
 
+	/*
+	 * A vmexit emulation is required before the vcpu can be executed
+	 * again.
+	 */
+	if (unlikely(svm->nested.exit_required))
+		return;
+
 	svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
 	svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
 	svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
-- 
1.6.5.2
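For readers skimming the series, the pattern in this patch (record a request in atomic context, do the sleepable work later) can be sketched outside the kernel. The following is only an illustration: struct nested_sketch and the three helper functions are hypothetical names made up for this note and are not part of svm.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the nested state touched by the patch. */
struct nested_sketch {
	bool exit_required;   /* a #vmexit is pending but not yet emulated */
	int  exits_emulated;  /* counts how often the deferred work ran */
};

/* Atomic context: must not sleep, so only record the request. */
static int request_intr_exit(struct nested_sketch *n, bool intr_intercepted)
{
	if (!intr_intercepted)
		return 0;
	n->exit_required = true;
	return 1;
}

/* Guest entry: refuse to run while an exit is still pending. */
static bool vcpu_run(const struct nested_sketch *n)
{
	return !n->exit_required;
}

/* Exit handler, sleepable context: perform the deferred emulation. */
static int handle_exit(struct nested_sketch *n)
{
	if (n->exit_required) {
		n->exits_emulated++;      /* stands in for nested_svm_vmexit() */
		n->exit_required = false;
		return 1;
	}
	return 0;
}
```

The point of the split is that the flag write is the only thing done with irqs and preemption off; everything that may sleep is moved behind the check in the exit handler.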
[PATCH 20/42] KVM: x86 emulator: Add pusha and popa instructions
From: Mohammed Gamal <m.gamal...@gmail.com>

This adds pusha and popa instructions (opcodes 0x60-0x61), this enables
booting MINIX with invalid guest state emulation on.

[marcelo: remove unused variable]

Signed-off-by: Mohammed Gamal <m.gamal...@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/emulate.c |   48 +++-
 1 files changed, 47 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index db0820d..d226dff 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -139,7 +139,8 @@ static u32 opcode_table[256] = {
 	DstReg | Stack, DstReg | Stack, DstReg | Stack, DstReg | Stack,
 	DstReg | Stack, DstReg | Stack, DstReg | Stack, DstReg | Stack,
 	/* 0x60 - 0x67 */
-	0, 0, 0, DstReg | SrcMem32 | ModRM | Mov /* movsxd (x86/64) */ ,
+	ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
+	0, DstReg | SrcMem32 | ModRM | Mov /* movsxd (x86/64) */ ,
 	0, 0, 0, 0,
 	/* 0x68 - 0x6F */
 	SrcImm | Mov | Stack, 0, SrcImmByte | Mov | Stack, 0,
@@ -1225,6 +1226,43 @@ static int emulate_pop_sreg(struct x86_emulate_ctxt *ctxt,
 	return rc;
 }
 
+static void emulate_pusha(struct x86_emulate_ctxt *ctxt)
+{
+	struct decode_cache *c = &ctxt->decode;
+	unsigned long old_esp = c->regs[VCPU_REGS_RSP];
+	int reg = VCPU_REGS_RAX;
+
+	while (reg <= VCPU_REGS_RDI) {
+		(reg == VCPU_REGS_RSP) ?
+		(c->src.val = old_esp) : (c->src.val = c->regs[reg]);
+
+		emulate_push(ctxt);
+		++reg;
+	}
+}
+
+static int emulate_popa(struct x86_emulate_ctxt *ctxt,
+			struct x86_emulate_ops *ops)
+{
+	struct decode_cache *c = &ctxt->decode;
+	int rc = 0;
+	int reg = VCPU_REGS_RDI;
+
+	while (reg >= VCPU_REGS_RAX) {
+		if (reg == VCPU_REGS_RSP) {
+			register_address_increment(c, &c->regs[VCPU_REGS_RSP],
+							c->op_bytes);
+			--reg;
+		}
+
+		rc = emulate_pop(ctxt, ops, &c->regs[reg], c->op_bytes);
+		if (rc != 0)
+			break;
+		--reg;
+	}
+	return rc;
+}
+
 static inline int emulate_grp1a(struct x86_emulate_ctxt *ctxt,
 				struct x86_emulate_ops *ops)
 {
@@ -1816,6 +1854,14 @@ special_insn:
 		if (rc != 0)
 			goto done;
 		break;
+	case 0x60:	/* pusha */
+		emulate_pusha(ctxt);
+		break;
+	case 0x61:	/* popa */
+		rc = emulate_popa(ctxt, ops);
+		if (rc != 0)
+			goto done;
+		break;
 	case 0x63:		/* movsxd */
 		if (ctxt->mode != X86EMUL_MODE_PROT64)
 			goto cannot_emulate;
-- 
1.6.5.2
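The register ordering that emulate_pusha()/emulate_popa() rely on may be easier to see on a toy model. This is a minimal sketch, assuming a made-up struct toy_cpu with a word-indexed stack; none of these names exist in emulate.c, and operand-size and fault handling are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Register indices in x86 encoding order, mirroring VCPU_REGS_RAX..RDI. */
enum { R_AX, R_CX, R_DX, R_BX, R_SP, R_BP, R_SI, R_DI, R_COUNT };

#define STACK_WORDS 64

/* Hypothetical toy machine: 32-bit registers and a word-indexed stack. */
struct toy_cpu {
	uint32_t regs[R_COUNT];        /* regs[R_SP] is an index into stack[] */
	uint32_t stack[STACK_WORDS];
};

static void push(struct toy_cpu *c, uint32_t val)
{
	c->regs[R_SP]--;
	c->stack[c->regs[R_SP]] = val;
}

static uint32_t pop(struct toy_cpu *c)
{
	return c->stack[c->regs[R_SP]++];
}

/* PUSHA: push AX..DI in order, storing the SP value from before the first push. */
static void toy_pusha(struct toy_cpu *c)
{
	uint32_t old_sp = c->regs[R_SP];

	for (int reg = R_AX; reg <= R_DI; reg++)
		push(c, reg == R_SP ? old_sp : c->regs[reg]);
}

/* POPA: pop DI..AX in reverse order; the saved SP image is skipped, not loaded. */
static void toy_popa(struct toy_cpu *c)
{
	for (int reg = R_DI; reg >= R_AX; reg--) {
		if (reg == R_SP) {
			c->regs[R_SP]++;   /* discard the stored SP image */
			continue;
		}
		c->regs[reg] = pop(c);
	}
}
```

After toy_pusha() followed by toy_popa() the stack pointer is back where it started and the other seven registers hold their saved values, which is the invariant the two emulator helpers have to preserve.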
[PATCH 24/42] KVM: remove duplicated #include
From: Huang Weiyi <weiyi.hu...@gmail.com>

Remove duplicated #include('s) in
  arch/x86/kvm/lapic.c

Signed-off-by: Huang Weiyi <weiyi.hu...@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/lapic.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 8787637..cd60c0b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -32,7 +32,6 @@
 #include <asm/current.h>
 #include <asm/apicdef.h>
 #include <asm/atomic.h>
-#include <asm/apicdef.h>
 #include "kvm_cache_regs.h"
 #include "irq.h"
 #include "trace.h"
-- 
1.6.5.2
[PATCH 26/42] KVM: SVM: don't copy exit_int_info on nested vmrun
From: Joerg Roedel <joerg.roe...@amd.com>

The exit_int_info field is only written by the hardware and never read.
So it does not need to be copied on a vmrun emulation.

Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/svm.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 3f3fe81..41c996a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1797,8 +1797,6 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 	svm->nested.intercept            = nested_vmcb->control.intercept;
 
 	force_new_asid(&svm->vcpu);
-	svm->vmcb->control.exit_int_info = nested_vmcb->control.exit_int_info;
-	svm->vmcb->control.exit_int_info_err = nested_vmcb->control.exit_int_info_err;
 	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
 	if (nested_vmcb->control.int_ctl & V_IRQ_MASK) {
 		nsvm_printk("nSVM Injecting Interrupt: 0x%x\n",
-- 
1.6.5.2
[PATCH 27/42] KVM: SVM: Remove remaining occurrences of rdtscll
From: Joerg Roedel <joerg.roe...@amd.com>

This patch replaces them with native_read_tsc() which can
also be used in expressions and saves a variable on the
stack in this case.

Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/svm.c |    7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 41c996a..9a4daca 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -763,14 +763,13 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	int i;
 
 	if (unlikely(cpu != vcpu->cpu)) {
-		u64 tsc_this, delta;
+		u64 delta;
 
 		/*
 		 * Make sure that the guest sees a monotonically
 		 * increasing TSC.
 		 */
-		rdtscll(tsc_this);
-		delta = vcpu->arch.host_tsc - tsc_this;
+		delta = vcpu->arch.host_tsc - native_read_tsc();
 		svm->vmcb->control.tsc_offset += delta;
 		if (is_nested(svm))
 			svm->nested.hsave->control.tsc_offset += delta;
@@ -792,7 +791,7 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 	for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
 		wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
 
-	rdtscll(vcpu->arch.host_tsc);
+	vcpu->arch.host_tsc = native_read_tsc();
 }
 
 static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
-- 
1.6.5.2
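As a side note on the code being touched here: the delta computed in svm_vcpu_load() keeps the guest-visible TSC continuous when a vcpu moves to a CPU whose counter reads differently. A rough user-space sketch of that arithmetic follows; struct tsc_sketch and its helpers are hypothetical, and the host counter is passed in as a plain argument rather than read with native_read_tsc().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-vCPU state; field names only loosely follow the patch. */
struct tsc_sketch {
	uint64_t host_tsc;    /* host TSC sampled when the vCPU was last put */
	uint64_t tsc_offset;  /* value added to the host TSC for guest reads */
};

/* What the guest observes: host counter plus offset (mod 2^64). */
static uint64_t guest_tsc(const struct tsc_sketch *v, uint64_t host_now)
{
	return host_now + v->tsc_offset;
}

/* On deschedule: remember the host TSC, as svm_vcpu_put() does. */
static void vcpu_put(struct tsc_sketch *v, uint64_t host_now)
{
	v->host_tsc = host_now;
}

/*
 * On load onto a different CPU whose counter reads new_cpu_now.
 * The delta may be "negative"; unsigned wraparound makes the
 * addition come out right in both directions.
 */
static void vcpu_load_on_new_cpu(struct tsc_sketch *v, uint64_t new_cpu_now)
{
	uint64_t delta = v->host_tsc - new_cpu_now;

	v->tsc_offset += delta;
}
```

Whether the new CPU's counter is ahead of or behind the old one, the guest reads the same value immediately after the move as it did at the moment it was put.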
[PATCH 07/42] KVM: Don't pass kvm_run arguments
They're just copies of vcpu->run, which is readily accessible.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |   10 ++--
 arch/x86/kvm/emulate.c          |    6 +-
 arch/x86/kvm/mmu.c              |    2 +-
 arch/x86/kvm/svm.c              |  102 ++
 arch/x86/kvm/vmx.c              |  113 ++
 arch/x86/kvm/x86.c              |   50 -
 6 files changed, 141 insertions(+), 142 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d838922..0b113f2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -506,8 +506,8 @@ struct kvm_x86_ops {
 
 	void (*tlb_flush)(struct kvm_vcpu *vcpu);
 
-	void (*run)(struct kvm_vcpu *vcpu, struct kvm_run *run);
-	int (*handle_exit)(struct kvm_run *run, struct kvm_vcpu *vcpu);
+	void (*run)(struct kvm_vcpu *vcpu);
+	int (*handle_exit)(struct kvm_vcpu *vcpu);
 	void (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
 	void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
 	u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
@@ -568,7 +568,7 @@ enum emulation_result {
 #define EMULTYPE_NO_DECODE	    (1 << 0)
 #define EMULTYPE_TRAP_UD	    (1 << 1)
 #define EMULTYPE_SKIP		    (1 << 2)
-int emulate_instruction(struct kvm_vcpu *vcpu, struct kvm_run *run,
+int emulate_instruction(struct kvm_vcpu *vcpu,
 			unsigned long cr2, u16 error_code, int emulation_type);
 void kvm_report_emulation_failure(struct kvm_vcpu *cvpu, const char *context);
 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
@@ -585,9 +585,9 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
 
 struct x86_emulate_ctxt;
 
-int kvm_emulate_pio(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
+int kvm_emulate_pio(struct kvm_vcpu *vcpu, int in,
 		     int size, unsigned port);
-int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
+int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, int in,
 			   int size, unsigned long count, int down,
 			    gva_t address, int rep, unsigned port);
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1f0ff4a..0644d3d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1826,7 +1826,7 @@ special_insn:
 		break;
 	case 0x6c:		/* insb */
 	case 0x6d:		/* insw/insd */
-		if (kvm_emulate_pio_string(ctxt->vcpu, NULL,
+		if (kvm_emulate_pio_string(ctxt->vcpu,
 				1,
 				(c->d & ByteOp) ? 1 : c->op_bytes,
 				c->rep_prefix ?
@@ -1842,7 +1842,7 @@ special_insn:
 		return 0;
 	case 0x6e:		/* outsb */
 	case 0x6f:		/* outsw/outsd */
-		if (kvm_emulate_pio_string(ctxt->vcpu, NULL,
+		if (kvm_emulate_pio_string(ctxt->vcpu,
 				0,
 				(c->d & ByteOp) ? 1 : c->op_bytes,
 				c->rep_prefix ?
@@ -2135,7 +2135,7 @@ special_insn:
 	case 0xef: /* out (e/r)ax,dx */
 		port = c->regs[VCPU_REGS_RDX];
 		io_dir_in = 0;
-	do_io:	if (kvm_emulate_pio(ctxt->vcpu, NULL, io_dir_in,
+	do_io:	if (kvm_emulate_pio(ctxt->vcpu, io_dir_in,
 				   (c->d & ByteOp) ? 1 : c->op_bytes,
 				   port) != 0) {
 			c->eip = saved_eip;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 818b92a..a902479 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2789,7 +2789,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code)
 	if (r)
 		goto out;
 
-	er = emulate_instruction(vcpu, vcpu->run, cr2, error_code, 0);
+	er = emulate_instruction(vcpu, cr2, error_code, 0);
 
 	switch (er) {
 	case EMULATE_DONE:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c17404a..92048a6 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -286,7 +286,7 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	if (!svm->next_rip) {
-		if (emulate_instruction(vcpu, vcpu->run, 0, 0, EMULTYPE_SKIP) !=
+		if (emulate_instruction(vcpu, 0, 0, EMULTYPE_SKIP) !=
 				EMULATE_DONE)
 			printk(KERN_DEBUG "%s: NOP\n", __func__);
 		return;
@@ -1180,7 +1180,7 @@ static void svm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long value,
 	}
 }
 
-static int pf_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+static
[PATCH 41/42] KVM: SVM: Add tracepoint for nested #vmexit
From: Joerg Roedel <joerg.roe...@amd.com>

This patch adds a tracepoint for every #vmexit we get from a
nested guest.

Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/svm.c   |    6 ++
 arch/x86/kvm/trace.h |   36
 arch/x86/kvm/x86.c   |    1 +
 3 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 907af3f..edf6e8b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2366,6 +2366,12 @@ static int handle_exit(struct kvm_vcpu *vcpu)
 	if (is_nested(svm)) {
 		int vmexit;
 
+		trace_kvm_nested_vmexit(svm->vmcb->save.rip, exit_code,
+					svm->vmcb->control.exit_info_1,
+					svm->vmcb->control.exit_info_2,
+					svm->vmcb->control.exit_int_info,
+					svm->vmcb->control.exit_int_info_err);
+
 		nsvm_printk("nested handle_exit: 0x%x | 0x%lx | 0x%lx | 0x%lx\n",
 			    exit_code, svm->vmcb->control.exit_info_1,
 			    svm->vmcb->control.exit_info_2, svm->vmcb->save.rip);
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index b5798e1..a7eb629 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -382,6 +382,42 @@ TRACE_EVENT(kvm_nested_vmrun,
 		__entry->npt ? "on" : "off")
 );
 
+/*
+ * Tracepoint for #VMEXIT while nested
+ */
+TRACE_EVENT(kvm_nested_vmexit,
+	    TP_PROTO(__u64 rip, __u32 exit_code,
+		     __u64 exit_info1, __u64 exit_info2,
+		     __u32 exit_int_info, __u32 exit_int_info_err),
+	    TP_ARGS(rip, exit_code, exit_info1, exit_info2,
+		    exit_int_info, exit_int_info_err),
+
+	TP_STRUCT__entry(
+		__field(	__u64,		rip			)
+		__field(	__u32,		exit_code		)
+		__field(	__u64,		exit_info1		)
+		__field(	__u64,		exit_info2		)
+		__field(	__u32,		exit_int_info		)
+		__field(	__u32,		exit_int_info_err	)
+	),
+
+	TP_fast_assign(
+		__entry->rip			= rip;
+		__entry->exit_code		= exit_code;
+		__entry->exit_info1		= exit_info1;
+		__entry->exit_info2		= exit_info2;
+		__entry->exit_int_info		= exit_int_info;
+		__entry->exit_int_info_err	= exit_int_info_err;
+	),
+	TP_printk("rip: 0x%016llx reason: %s ext_inf1: 0x%016llx "
+		  "ext_inf2: 0x%016llx ext_int: 0x%08x ext_int_err: 0x%08x\n",
+		  __entry->rip,
+		  ftrace_print_symbols_seq(p, __entry->exit_code,
+					   kvm_x86_ops->exit_reasons_str),
+		  __entry->exit_info1, __entry->exit_info2,
+		  __entry->exit_int_info, __entry->exit_int_info_err)
+);
+
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3ab2f90..192d58e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4985,3 +4985,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
-- 
1.6.5.2
[PATCH 05/42] KVM: x86 emulator: Add 'push/pop sreg' instructions
From: Mohammed Gamal <m.gamal...@gmail.com>

[avi: avoid buffer overflow]

Signed-off-by: Mohammed Gamal <m.gamal...@gmail.com>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/emulate.c |  107 +---
 1 files changed, 101 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1be5cd6..1cdfec5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -92,19 +92,22 @@ static u32 opcode_table[256] = {
 	/* 0x00 - 0x07 */
 	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
-	ByteOp | DstAcc | SrcImm, DstAcc | SrcImm, 0, 0,
+	ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
+	ImplicitOps | Stack, ImplicitOps | Stack,
 	/* 0x08 - 0x0F */
 	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
-	0, 0, 0, 0,
+	0, 0, ImplicitOps | Stack, 0,
 	/* 0x10 - 0x17 */
 	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
-	ByteOp | DstAcc | SrcImm, DstAcc | SrcImm, 0, 0,
+	ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
+	ImplicitOps | Stack, ImplicitOps | Stack,
 	/* 0x18 - 0x1F */
 	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
-	ByteOp | DstAcc | SrcImm, DstAcc | SrcImm, 0, 0,
+	ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
+	ImplicitOps | Stack, ImplicitOps | Stack,
 	/* 0x20 - 0x27 */
 	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
@@ -244,11 +247,13 @@ static u32 twobyte_table[256] = {
 	/* 0x90 - 0x9F */
 	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0xA0 - 0xA7 */
-	0, 0, 0, DstMem | SrcReg | ModRM | BitOp,
+	ImplicitOps | Stack, ImplicitOps | Stack,
+	0, DstMem | SrcReg | ModRM | BitOp,
 	DstMem | SrcReg | Src2ImmByte | ModRM,
 	DstMem | SrcReg | Src2CL | ModRM, 0, 0,
 	/* 0xA8 - 0xAF */
-	0, 0, 0, DstMem | SrcReg | ModRM | BitOp,
+	ImplicitOps | Stack, ImplicitOps | Stack,
+	0, DstMem | SrcReg | ModRM | BitOp,
 	DstMem | SrcReg | Src2ImmByte | ModRM,
 	DstMem | SrcReg | Src2CL | ModRM,
 	ModRM, 0,
@@ -1186,6 +1191,32 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt,
 	return rc;
 }
 
+static void emulate_push_sreg(struct x86_emulate_ctxt *ctxt, int seg)
+{
+	struct decode_cache *c = &ctxt->decode;
+	struct kvm_segment segment;
+
+	kvm_x86_ops->get_segment(ctxt->vcpu, &segment, seg);
+
+	c->src.val = segment.selector;
+	emulate_push(ctxt);
+}
+
+static int emulate_pop_sreg(struct x86_emulate_ctxt *ctxt,
+			     struct x86_emulate_ops *ops, int seg)
+{
+	struct decode_cache *c = &ctxt->decode;
+	unsigned long selector;
+	int rc;
+
+	rc = emulate_pop(ctxt, ops, &selector, c->op_bytes);
+	if (rc != 0)
+		return rc;
+
+	rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)selector, 1, seg);
+	return rc;
+}
+
 static inline int emulate_grp1a(struct x86_emulate_ctxt *ctxt,
 				struct x86_emulate_ops *ops)
 {
@@ -1707,18 +1738,66 @@ special_insn:
 	      add:		/* add */
 		emulate_2op_SrcV("add", c->src, c->dst, ctxt->eflags);
 		break;
+	case 0x06:		/* push es */
+		if (ctxt->mode == X86EMUL_MODE_PROT64)
+			goto cannot_emulate;
+
+		emulate_push_sreg(ctxt, VCPU_SREG_ES);
+		break;
+	case 0x07:		/* pop es */
+		if (ctxt->mode == X86EMUL_MODE_PROT64)
+			goto cannot_emulate;
+
+		rc = emulate_pop_sreg(ctxt, ops, VCPU_SREG_ES);
+		if (rc != 0)
+			goto done;
+		break;
 	case 0x08 ... 0x0d:
 	      or:		/* or */
 		emulate_2op_SrcV("or", c->src, c->dst, ctxt->eflags);
 		break;
+	case 0x0e:		/* push cs */
+		if (ctxt->mode == X86EMUL_MODE_PROT64)
+			goto cannot_emulate;
+
+		emulate_push_sreg(ctxt, VCPU_SREG_CS);
+		break;
 	case 0x10 ... 0x15:
 	      adc:		/* adc */
 		emulate_2op_SrcV("adc", c->src, c->dst, ctxt->eflags);
 		break;
+	case 0x16:		/* push ss */
+		if (ctxt->mode == X86EMUL_MODE_PROT64)
+			goto cannot_emulate;
+
+		emulate_push_sreg(ctxt, VCPU_SREG_SS);
+		break;
+	case 0x17:		/* pop ss */
+		if
[PATCH 34/42] KVM: x86: Refactor guest debug IOCTL handling
From: Jan Kiszka <jan.kis...@web.de>

Much of the so far vendor-specific code for setting up guest debug can
actually be handled by the generic code. This also fixes a minor deficit
in the SVM part with respect to processing KVM_GUESTDBG_ENABLE.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |    4 ++--
 arch/x86/kvm/svm.c              |   14 ++
 arch/x86/kvm/vmx.c              |   18 +-
 arch/x86/kvm/x86.c              |   28 +---
 4 files changed, 26 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 295c7c4..e7f8708 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -475,8 +475,8 @@ struct kvm_x86_ops {
 	void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
 	void (*vcpu_put)(struct kvm_vcpu *vcpu);
 
-	int (*set_guest_debug)(struct kvm_vcpu *vcpu,
-			       struct kvm_guest_debug *dbg);
+	void (*set_guest_debug)(struct kvm_vcpu *vcpu,
+				struct kvm_guest_debug *dbg);
 	int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
 	int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
 	u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 02a4269..279a2ae 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1065,26 +1065,16 @@ static void update_db_intercept(struct kvm_vcpu *vcpu)
 		vcpu->guest_debug = 0;
 }
 
-static int svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
+static void svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
 {
-	int old_debug = vcpu->guest_debug;
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	vcpu->guest_debug = dbg->control;
-
-	update_db_intercept(vcpu);
-
 	if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
 		svm->vmcb->save.dr7 = dbg->arch.debugreg[7];
 	else
 		svm->vmcb->save.dr7 = vcpu->arch.dr7;
 
-	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
-		svm->vmcb->save.rflags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
-	else if (old_debug & KVM_GUESTDBG_SINGLESTEP)
-		svm->vmcb->save.rflags &= ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
-
-	return 0;
+	update_db_intercept(vcpu);
 }
 
 static void load_host_msrs(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 97f4265..70020e5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1096,30 +1096,14 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 	}
 }
 
-static int set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
+static void set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
 {
-	int old_debug = vcpu->guest_debug;
-	unsigned long flags;
-
-	vcpu->guest_debug = dbg->control;
-	if (!(vcpu->guest_debug & KVM_GUESTDBG_ENABLE))
-		vcpu->guest_debug = 0;
-
 	if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
 		vmcs_writel(GUEST_DR7, dbg->arch.debugreg[7]);
 	else
 		vmcs_writel(GUEST_DR7, vcpu->arch.dr7);
 
-	flags = vmcs_readl(GUEST_RFLAGS);
-	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
-		flags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
-	else if (old_debug & KVM_GUESTDBG_SINGLESTEP)
-		flags &= ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
-	vmcs_writel(GUEST_RFLAGS, flags);
-
 	update_exception_bitmap(vcpu);
-
-	return 0;
 }
 
 static __init int cpu_has_kvm_support(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f44d56..a06f88e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4472,12 +4472,19 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 					struct kvm_guest_debug *dbg)
 {
-	int i, r;
+	unsigned long rflags;
+	int old_debug;
+	int i;
 
 	vcpu_load(vcpu);
 
-	if ((dbg->control & (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_HW_BP)) ==
-	    (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_HW_BP)) {
+	old_debug = vcpu->guest_debug;
+
+	vcpu->guest_debug = dbg->control;
+	if (!(vcpu->guest_debug & KVM_GUESTDBG_ENABLE))
+		vcpu->guest_debug = 0;
+
+	if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) {
 		for (i = 0; i < KVM_NR_DB_REGS; ++i)
 			vcpu->arch.eff_db[i] = dbg->arch.debugreg[i];
 		vcpu->arch.switch_db_regs =
@@ -4488,16 +4495,23 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 		vcpu->arch.switch_db_regs = (vcpu->arch.dr7 & DR7_BP_EN_MASK);
 	}
 
-	r = kvm_x86_ops->set_guest_debug(vcpu, dbg);
+	rflags = kvm_x86_ops->get_rflags(vcpu);
[PATCH 42/42] KVM: SVM: Add tracepoint for injected #vmexit
From: Joerg Roedel <joerg.roe...@amd.com>

This patch adds a tracepoint for a nested #vmexit that gets
re-injected to the guest.

Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/svm.c   |    6 ++
 arch/x86/kvm/trace.h |   33 +
 arch/x86/kvm/x86.c   |    1 +
 3 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index edf6e8b..369eeb8 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1592,6 +1592,12 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
 	struct vmcb *hsave = svm->nested.hsave;
 	struct vmcb *vmcb = svm->vmcb;
 
+	trace_kvm_nested_vmexit_inject(vmcb->control.exit_code,
+				       vmcb->control.exit_info_1,
+				       vmcb->control.exit_info_2,
+				       vmcb->control.exit_int_info,
+				       vmcb->control.exit_int_info_err);
+
 	nested_vmcb = nested_svm_map(svm, svm->nested.vmcb, KM_USER0);
 	if (!nested_vmcb)
 		return 1;
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index a7eb629..4d6bb5e 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -418,6 +418,39 @@ TRACE_EVENT(kvm_nested_vmexit,
 		  __entry->exit_int_info, __entry->exit_int_info_err)
 );
 
+/*
+ * Tracepoint for #VMEXIT reinjected to the guest
+ */
+TRACE_EVENT(kvm_nested_vmexit_inject,
+	    TP_PROTO(__u32 exit_code,
+		     __u64 exit_info1, __u64 exit_info2,
+		     __u32 exit_int_info, __u32 exit_int_info_err),
+	    TP_ARGS(exit_code, exit_info1, exit_info2,
+		    exit_int_info, exit_int_info_err),
+
+	TP_STRUCT__entry(
+		__field(	__u32,		exit_code		)
+		__field(	__u64,		exit_info1		)
+		__field(	__u64,		exit_info2		)
+		__field(	__u32,		exit_int_info		)
+		__field(	__u32,		exit_int_info_err	)
+	),
+
+	TP_fast_assign(
+		__entry->exit_code		= exit_code;
+		__entry->exit_info1		= exit_info1;
+		__entry->exit_info2		= exit_info2;
+		__entry->exit_int_info		= exit_int_info;
+		__entry->exit_int_info_err	= exit_int_info_err;
+	),
+
+	TP_printk("reason: %s ext_inf1: 0x%016llx "
+		  "ext_inf2: 0x%016llx ext_int: 0x%08x ext_int_err: 0x%08x\n",
+		  ftrace_print_symbols_seq(p, __entry->exit_code,
+					   kvm_x86_ops->exit_reasons_str),
+		__entry->exit_info1, __entry->exit_info2,
+		__entry->exit_int_info, __entry->exit_int_info_err)
+);
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 192d58e..a522d9b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4986,3 +4986,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
-- 
1.6.5.2
[PATCH 40/42] KVM: SVM: Add tracepoint for nested vmrun
From: Joerg Roedel <joerg.roe...@amd.com>

This patch adds a dedicated kvm tracepoint for a nested
vmrun.

Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/svm.c   |    6 ++
 arch/x86/kvm/trace.h |   33 +
 arch/x86/kvm/x86.c   |    1 +
 3 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 884bffc..907af3f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1726,6 +1726,12 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 	/* nested_vmcb is our indicator if nested SVM is activated */
 	svm->nested.vmcb = svm->vmcb->save.rax;
 
+	trace_kvm_nested_vmrun(svm->vmcb->save.rip - 3, svm->nested.vmcb,
+			       nested_vmcb->save.rip,
+			       nested_vmcb->control.int_ctl,
+			       nested_vmcb->control.event_inj,
+			       nested_vmcb->control.nested_ctl);
+
 	/* Clear internal status */
 	kvm_clear_exception_queue(&svm->vcpu);
 	kvm_clear_interrupt_queue(&svm->vcpu);
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 0d480e7..b5798e1 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -349,6 +349,39 @@ TRACE_EVENT(kvm_apic_accept_irq,
 		  __entry->coalesced ? " (coalesced)" : "")
 );
 
+/*
+ * Tracepoint for nested VMRUN
+ */
+TRACE_EVENT(kvm_nested_vmrun,
+	    TP_PROTO(__u64 rip, __u64 vmcb, __u64 nested_rip, __u32 int_ctl,
+		     __u32 event_inj, bool npt),
+	    TP_ARGS(rip, vmcb, nested_rip, int_ctl, event_inj, npt),
+
+	TP_STRUCT__entry(
+		__field(	__u64,		rip		)
+		__field(	__u64,		vmcb		)
+		__field(	__u64,		nested_rip	)
+		__field(	__u32,		int_ctl		)
+		__field(	__u32,		event_inj	)
+		__field(	bool,		npt		)
+	),
+
+	TP_fast_assign(
+		__entry->rip		= rip;
+		__entry->vmcb		= vmcb;
+		__entry->nested_rip	= nested_rip;
+		__entry->int_ctl	= int_ctl;
+		__entry->event_inj	= event_inj;
+		__entry->npt		= npt;
+	),
+
+	TP_printk("rip: 0x%016llx vmcb: 0x%016llx nrip: 0x%016llx int_ctl: 0x%08x "
+		  "event_inj: 0x%08x npt: %s\n",
+		__entry->rip, __entry->vmcb, __entry->nested_rip,
+		__entry->int_ctl, __entry->event_inj,
+		__entry->npt ? "on" : "off")
+);
+
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4de5bc0..3ab2f90 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4984,3 +4984,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
-- 
1.6.5.2
[PATCH 38/42] KVM: SVM: Notify nested hypervisor of lost event injections
From: Alexander Graf <ag...@suse.de>

If event_inj is valid on a #vmexit the host CPU would write
the contents to exit_int_info, so the hypervisor knows that
the event wasn't injected.

We don't do this in nested SVM by now which is a bug and
fixed by this patch.

Signed-off-by: Alexander Graf <ag...@suse.de>
Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/svm.c |   16
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 279a2ae..e372854 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1615,6 +1615,22 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
 	nested_vmcb->control.exit_info_2       = vmcb->control.exit_info_2;
 	nested_vmcb->control.exit_int_info     = vmcb->control.exit_int_info;
 	nested_vmcb->control.exit_int_info_err = vmcb->control.exit_int_info_err;
+
+	/*
+	 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
+	 * to make sure that we do not lose injected events. So check event_inj
+	 * here and copy it to exit_int_info if it is valid.
+	 * Exit_int_info and event_inj can't be both valid because the case
+	 * below only happens on a VMRUN instruction intercept which has
+	 * no valid exit_int_info set.
+	 */
+	if (vmcb->control.event_inj & SVM_EVTINJ_VALID) {
+		struct vmcb_control_area *nc = &nested_vmcb->control;
+
+		nc->exit_int_info     = vmcb->control.event_inj;
+		nc->exit_int_info_err = vmcb->control.event_inj_err;
+	}
+
 	nested_vmcb->control.tlb_ctl           = 0;
 	nested_vmcb->control.event_inj         = 0;
 	nested_vmcb->control.event_inj_err     = 0;
-- 
1.6.5.2
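To illustrate what the hunk above changes, here is a stripped-down model of the copy-back step. It is a sketch under assumptions: struct control_sketch and report_exit() are invented for this note, only four control fields are modeled, and EVTINJ_VALID simply mirrors bit 31 as used by SVM_EVTINJ_VALID in the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 31 marks the field as valid (SVM_EVTINJ_VALID in the kernel). */
#define EVTINJ_VALID (1u << 31)

/* Hypothetical, trimmed-down control area; only the fields used here. */
struct control_sketch {
	uint32_t event_inj;
	uint32_t event_inj_err;
	uint32_t exit_int_info;
	uint32_t exit_int_info_err;
};

/*
 * Copy exit state from the host-side control area to the one the nested
 * hypervisor will read. A still-valid event_inj means the event was never
 * delivered, so it is reported back through exit_int_info.
 */
static void report_exit(const struct control_sketch *host,
			struct control_sketch *nested)
{
	nested->exit_int_info     = host->exit_int_info;
	nested->exit_int_info_err = host->exit_int_info_err;

	if (host->event_inj & EVTINJ_VALID) {
		nested->exit_int_info     = host->event_inj;
		nested->exit_int_info_err = host->event_inj_err;
	}

	nested->event_inj     = 0;
	nested->event_inj_err = 0;
}
```

With a pending injection the nested hypervisor sees the event in exit_int_info and can re-queue it; without one, the hardware-provided exit_int_info passes through unchanged.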
[PATCH 35/42] KVM: x86: disable paravirt mmu reporting
From: Marcelo Tosatti <mtosa...@redhat.com>

Disable paravirt MMU capability reporting, so that new (or rebooted)
guests switch to native operation.

Paravirt MMU is a burden to maintain and does not bring significant
advantages compared to shadow anymore.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/x86.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a06f88e..4693f91 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1238,8 +1238,8 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_NR_MEMSLOTS:
 		r = KVM_MEMORY_SLOTS;
 		break;
-	case KVM_CAP_PV_MMU:
-		r = !tdp_enabled;
+	case KVM_CAP_PV_MMU:	/* obsolete */
+		r = 0;
 		break;
 	case KVM_CAP_IOMMU:
 		r = iommu_found();
-- 
1.6.5.2
[PATCH 36/42] KVM: x86: Rework guest single-step flag injection and filtering
From: Jan Kiszka jan.kis...@siemens.com Push TF and RF injection and filtering on guest single-stepping into the vender get/set_rflags callbacks. This makes the whole mechanism more robust wrt user space IOCTL order and instruction emulations. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/include/asm/kvm_host.h |3 ++ arch/x86/kvm/x86.c | 77 +++ 2 files changed, 48 insertions(+), 32 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e7f8708..179a919 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -614,6 +614,9 @@ void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l); int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata); int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data); +unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu); +void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags); + void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr); void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code); void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long cr2, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4693f91..385cd0a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -235,6 +235,25 @@ bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl) } EXPORT_SYMBOL_GPL(kvm_require_cpl); +unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu) +{ + unsigned long rflags; + + rflags = kvm_x86_ops-get_rflags(vcpu); + if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) + rflags = ~(unsigned long)(X86_EFLAGS_TF | X86_EFLAGS_RF); + return rflags; +} +EXPORT_SYMBOL_GPL(kvm_get_rflags); + +void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) +{ + if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) + rflags |= X86_EFLAGS_TF | X86_EFLAGS_RF; + kvm_x86_ops-set_rflags(vcpu, rflags); +} +EXPORT_SYMBOL_GPL(kvm_set_rflags); + /* * Load the pae 
pdptrs. Return true is they are all valid. */ @@ -2777,7 +2796,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu, kvm_x86_ops-get_cs_db_l_bits(vcpu, cs_db, cs_l); vcpu-arch.emulate_ctxt.vcpu = vcpu; - vcpu-arch.emulate_ctxt.eflags = kvm_x86_ops-get_rflags(vcpu); + vcpu-arch.emulate_ctxt.eflags = kvm_get_rflags(vcpu); vcpu-arch.emulate_ctxt.mode = (vcpu-arch.emulate_ctxt.eflags X86_EFLAGS_VM) ? X86EMUL_MODE_REAL : cs_l @@ -2855,7 +2874,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu, return EMULATE_DO_MMIO; } - kvm_x86_ops-set_rflags(vcpu, vcpu-arch.emulate_ctxt.eflags); + kvm_set_rflags(vcpu, vcpu-arch.emulate_ctxt.eflags); if (vcpu-mmio_is_write) { vcpu-mmio_needed = 0; @@ -3291,7 +3310,7 @@ void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw, unsigned long *rflags) { kvm_lmsw(vcpu, msw); - *rflags = kvm_x86_ops-get_rflags(vcpu); + *rflags = kvm_get_rflags(vcpu); } unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr) @@ -3329,7 +3348,7 @@ void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val, switch (cr) { case 0: kvm_set_cr0(vcpu, mk_cr_64(vcpu-arch.cr0, val)); - *rflags = kvm_x86_ops-get_rflags(vcpu); + *rflags = kvm_get_rflags(vcpu); break; case 2: vcpu-arch.cr2 = val; @@ -3460,7 +3479,7 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu) { struct kvm_run *kvm_run = vcpu-run; - kvm_run-if_flag = (kvm_x86_ops-get_rflags(vcpu) X86_EFLAGS_IF) != 0; + kvm_run-if_flag = (kvm_get_rflags(vcpu) X86_EFLAGS_IF) != 0; kvm_run-cr8 = kvm_get_cr8(vcpu); kvm_run-apic_base = kvm_get_apic_base(vcpu); if (irqchip_in_kernel(vcpu-kvm)) @@ -3840,13 +3859,7 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) #endif regs-rip = kvm_rip_read(vcpu); - regs-rflags = kvm_x86_ops-get_rflags(vcpu); - - /* -* Don't leak debug flags in case they were set for guest debugging -*/ - if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) - regs-rflags = ~(X86_EFLAGS_TF | X86_EFLAGS_RF); + regs-rflags = kvm_get_rflags(vcpu); 
vcpu_put(vcpu); @@ -3874,12 +3887,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvm_register_write(vcpu, VCPU_REGS_R13, regs-r13); kvm_register_write(vcpu, VCPU_REGS_R14, regs-r14); kvm_register_write(vcpu, VCPU_REGS_R15, regs-r15); - #endif kvm_rip_write(vcpu, regs-rip); - kvm_x86_ops-set_rflags(vcpu, regs-rflags); - +
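As a reading aid, the TF/RF handling that this patch centralizes in kvm_get_rflags() and kvm_set_rflags() can be modelled in user space roughly as below. This is only a sketch under stated assumptions: struct model_vcpu, model_get_rflags() and model_set_rflags() are hypothetical stand-ins for the vcpu and the vendor callbacks, the TF and RF values are the architectural EFLAGS bits (bit 8 and bit 16), and the single-step control bit is given an assumed value rather than taken from the KVM headers.

```c
/* Architectural EFLAGS bits: trap flag (bit 8) and resume flag (bit 16). */
#define MODEL_EFLAGS_TF 0x00000100UL
#define MODEL_EFLAGS_RF 0x00010000UL
/* Stand-in for KVM_GUESTDBG_SINGLESTEP; the value is an assumption. */
#define MODEL_GUESTDBG_SINGLESTEP 0x2UL

/* Hypothetical stand-in for struct kvm_vcpu, not the kernel structure. */
struct model_vcpu {
	unsigned long guest_debug; /* debug control bits set by user space */
	unsigned long hw_rflags;   /* what the vendor backend would hold */
};

/* Hide the debugger's TF/RF from anyone reading the guest's flags. */
unsigned long model_get_rflags(const struct model_vcpu *vcpu)
{
	unsigned long rflags = vcpu->hw_rflags;

	if (vcpu->guest_debug & MODEL_GUESTDBG_SINGLESTEP)
		rflags &= ~(MODEL_EFLAGS_TF | MODEL_EFLAGS_RF);
	return rflags;
}

/* Re-inject TF/RF on every write while single-stepping is active. */
void model_set_rflags(struct model_vcpu *vcpu, unsigned long rflags)
{
	if (vcpu->guest_debug & MODEL_GUESTDBG_SINGLESTEP)
		rflags |= MODEL_EFLAGS_TF | MODEL_EFLAGS_RF;
	vcpu->hw_rflags = rflags;
}
```

Because both directions go through the same pair of helpers, a value written by the emulator or by a register ioctl cannot drop the stepping flags, and a value read back never exposes them, regardless of the order in which user space issues its calls.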
[PATCH 15/42] KVM: Move IO APIC to its own lock
From: Gleb Natapov g...@redhat.com The allows removal of irq_lock from the injection path. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com --- arch/ia64/kvm/kvm-ia64.c |7 +--- arch/x86/kvm/i8259.c | 22 +--- arch/x86/kvm/lapic.c |5 +-- arch/x86/kvm/x86.c | 10 + virt/kvm/ioapic.c| 80 +++--- virt/kvm/ioapic.h|4 ++ virt/kvm/irq_comm.c | 23 - 7 files changed, 100 insertions(+), 51 deletions(-) diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 0ad09f0..4a98314 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -851,8 +851,7 @@ static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm, r = 0; switch (chip-chip_id) { case KVM_IRQCHIP_IOAPIC: - memcpy(chip-chip.ioapic, ioapic_irqchip(kvm), - sizeof(struct kvm_ioapic_state)); + r = kvm_get_ioapic(kvm, chip-chip.ioapic); break; default: r = -EINVAL; @@ -868,9 +867,7 @@ static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip) r = 0; switch (chip-chip_id) { case KVM_IRQCHIP_IOAPIC: - memcpy(ioapic_irqchip(kvm), - chip-chip.ioapic, - sizeof(struct kvm_ioapic_state)); + r = kvm_set_ioapic(kvm, chip-chip.ioapic); break; default: r = -EINVAL; diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c index ccc941a..d057c0c 100644 --- a/arch/x86/kvm/i8259.c +++ b/arch/x86/kvm/i8259.c @@ -38,7 +38,15 @@ static void pic_clear_isr(struct kvm_kpic_state *s, int irq) s-isr_ack |= (1 irq); if (s != s-pics_state-pics[0]) irq += 8; + /* +* We are dropping lock while calling ack notifiers since ack +* notifier callbacks for assigned devices call into PIC recursively. +* Other interrupt may be delivered to PIC while lock is dropped but +* it should be safe since PIC state is already updated at this stage. 
+*/ + spin_unlock(s-pics_state-lock); kvm_notify_acked_irq(s-pics_state-kvm, SELECT_PIC(irq), irq); + spin_lock(s-pics_state-lock); } void kvm_pic_clear_isr_ack(struct kvm *kvm) @@ -176,16 +184,18 @@ int kvm_pic_set_irq(void *opaque, int irq, int level) static inline void pic_intack(struct kvm_kpic_state *s, int irq) { s-isr |= 1 irq; - if (s-auto_eoi) { - if (s-rotate_on_auto_eoi) - s-priority_add = (irq + 1) 7; - pic_clear_isr(s, irq); - } /* * We don't clear a level sensitive interrupt here */ if (!(s-elcr (1 irq))) s-irr = ~(1 irq); + + if (s-auto_eoi) { + if (s-rotate_on_auto_eoi) + s-priority_add = (irq + 1) 7; + pic_clear_isr(s, irq); + } + } int kvm_pic_read_irq(struct kvm *kvm) @@ -294,9 +304,9 @@ static void pic_ioport_write(void *opaque, u32 addr, u32 val) priority = get_priority(s, s-isr); if (priority != 8) { irq = (priority + s-priority_add) 7; - pic_clear_isr(s, irq); if (cmd == 5) s-priority_add = (irq + 1) 7; + pic_clear_isr(s, irq); pic_update_irq(s-pics_state); } break; diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 23c2176..df8bcb0 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -471,11 +471,8 @@ static void apic_set_eoi(struct kvm_lapic *apic) trigger_mode = IOAPIC_LEVEL_TRIG; else trigger_mode = IOAPIC_EDGE_TRIG; - if (!(apic_get_reg(apic, APIC_SPIV) APIC_SPIV_DIRECTED_EOI)) { - mutex_lock(apic-vcpu-kvm-irq_lock); + if (!(apic_get_reg(apic, APIC_SPIV) APIC_SPIV_DIRECTED_EOI)) kvm_ioapic_update_eoi(apic-vcpu-kvm, vector, trigger_mode); - mutex_unlock(apic-vcpu-kvm-irq_lock); - } } static void apic_send_ipi(struct kvm_lapic *apic) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1687d12..fdf989f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2038,9 +2038,7 @@ static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm, struct kvm_irqchip *chip) sizeof(struct kvm_pic_state)); break; case KVM_IRQCHIP_IOAPIC: - memcpy(chip-chip.ioapic, - ioapic_irqchip(kvm), -
[PATCH 31/42] KVM: Fix printk name error in svm.c
From: Zachary Amsden <zams...@redhat.com>

Signed-off-by: Zachary Amsden <zams...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/svm.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9a4daca..d1036ce 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -330,13 +330,14 @@ static int svm_hardware_enable(void *garbage)
 		return -EBUSY;
 
 	if (!has_svm()) {
-		printk(KERN_ERR "svm_cpu_init: err EOPNOTSUPP on %d\n", me);
+		printk(KERN_ERR "svm_hardware_enable: err EOPNOTSUPP on %d\n",
+		       me);
 		return -EINVAL;
 	}
 	svm_data = per_cpu(svm_data, me);
 
 	if (!svm_data) {
-		printk(KERN_ERR "svm_cpu_init: svm_data is NULL on %d\n",
+		printk(KERN_ERR "svm_hardware_enable: svm_data is NULL on %d\n",
 		       me);
 		return -EINVAL;
 	}
-- 
1.6.5.2
[PATCH 29/42] KVM: Separate timer intialization into an indepedent function
From: Zachary Amsden <zams...@redhat.com>

Signed-off-by: Zachary Amsden <zams...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/x86.c |   23 +++++++++++++++--------
 1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3d83de8..6a31dfb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3118,9 +3118,22 @@ static struct notifier_block kvmclock_cpufreq_notifier_block = {
 	.notifier_call = kvmclock_cpufreq_notifier
 };
 
+static void kvm_timer_init(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		per_cpu(cpu_tsc_khz, cpu) = tsc_khz;
+	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+		tsc_khz_ref = tsc_khz;
+		cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block,
+					  CPUFREQ_TRANSITION_NOTIFIER);
+	}
+}
+
 int kvm_arch_init(void *opaque)
 {
-	int r, cpu;
+	int r;
 	struct kvm_x86_ops *ops = (struct kvm_x86_ops *)opaque;
 
 	if (kvm_x86_ops) {
@@ -3152,13 +3165,7 @@ int kvm_arch_init(void *opaque)
 	kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
 			PT_DIRTY_MASK, PT64_NX_MASK, 0);
 
-	for_each_possible_cpu(cpu)
-		per_cpu(cpu_tsc_khz, cpu) = tsc_khz;
-	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
-		tsc_khz_ref = tsc_khz;
-		cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block,
-					  CPUFREQ_TRANSITION_NOTIFIER);
-	}
+	kvm_timer_init();
 
 	return 0;
 
-- 
1.6.5.2
[PATCH 22/42] KVM: SVM: remove needless mmap_sem acquision from nested_svm_map
From: Marcelo Tosatti <mtosa...@redhat.com>

nested_svm_map unnecessarily takes mmap_sem around gfn_to_page, since
gfn_to_page / get_user_pages are responsible for it.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
Acked-by: Alexander Graf <ag...@suse.de>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/svm.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 92048a6..f54c4f9 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1396,10 +1396,7 @@ static void *nested_svm_map(struct vcpu_svm *svm, u64 gpa, enum km_type idx)
 {
 	struct page *page;
 
-	down_read(&current->mm->mmap_sem);
 	page = gfn_to_page(svm->vcpu.kvm, gpa >> PAGE_SHIFT);
-	up_read(&current->mm->mmap_sem);
-
 	if (is_error_page(page))
 		goto error;
 
-- 
1.6.5.2
[PATCH 33/42] KVM: remove pre_task_link setting in save_state_to_tss16
From: Juan Quintela <quint...@redhat.com>

Now, also remove pre_task_link setting in save_state_to_tss16.

    commit b237ac37a149e8b56436fabf093532483bff13b0
    Author: Gleb Natapov <g...@redhat.com>
    Date:   Mon Mar 30 16:03:24 2009 +0300

        KVM: Fix task switch back link handling.

CC: Gleb Natapov <g...@redhat.com>
Signed-off-by: Juan Quintela <quint...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/x86.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6f75856..5f44d56 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4203,7 +4203,6 @@ static void save_state_to_tss16(struct kvm_vcpu *vcpu,
 	tss->ss = get_segment_selector(vcpu, VCPU_SREG_SS);
 	tss->ds = get_segment_selector(vcpu, VCPU_SREG_DS);
 	tss->ldt = get_segment_selector(vcpu, VCPU_SREG_LDTR);
-	tss->prev_task_link = get_segment_selector(vcpu, VCPU_SREG_TR);
 }
 
 static int load_state_from_tss16(struct kvm_vcpu *vcpu,
-- 
1.6.5.2
[PATCH 32/42] KVM: Fix hotplug of CPUs
From: Zachary Amsden <zams...@redhat.com>

Both VMX and SVM require per-cpu memory allocation, which is done at module
init time, for only online cpus. Backend was not allocating enough structure
for all possible CPUs, so new CPUs coming online could not be hardware enabled.

Signed-off-by: Zachary Amsden <zams...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
 arch/x86/kvm/svm.c |    4 ++--
 arch/x86/kvm/vmx.c |    6 ++++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d1036ce..02a4269 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -482,7 +482,7 @@ static __init int svm_hardware_setup(void)
 		kvm_enable_efer_bits(EFER_SVME);
 	}
 
-	for_each_online_cpu(cpu) {
+	for_each_possible_cpu(cpu) {
 		r = svm_cpu_init(cpu);
 		if (r)
 			goto err;
@@ -516,7 +516,7 @@ static __exit void svm_hardware_unsetup(void)
 {
 	int cpu;
 
-	for_each_online_cpu(cpu)
+	for_each_possible_cpu(cpu)
 		svm_cpu_uninit(cpu);
 
 	__free_pages(pfn_to_page(iopm_base >> PAGE_SHIFT), IOPM_ALLOC_ORDER);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a187570..97f4265 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1350,15 +1350,17 @@ static void free_kvm_area(void)
 {
 	int cpu;
 
-	for_each_online_cpu(cpu)
+	for_each_possible_cpu(cpu) {
 		free_vmcs(per_cpu(vmxarea, cpu));
+		per_cpu(vmxarea, cpu) = NULL;
+	}
 }
 
 static __init int alloc_kvm_area(void)
 {
 	int cpu;
 
-	for_each_online_cpu(cpu) {
+	for_each_possible_cpu(cpu) {
 		struct vmcs *vmcs;
 
 		vmcs = alloc_vmcs_cpu(cpu);
-- 
1.6.5.2
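The failure mode described in this commit message can be illustrated with a small user-space model. This is a sketch, not the kernel code: model_area, model_alloc_areas(), model_free_areas() and model_hardware_enable() are hypothetical names, and CPU sets are represented as plain bit masks instead of the kernel's cpumask iterators.

```c
#include <stdlib.h>

#define MODEL_NR_CPUS 4

/* Hypothetical per-CPU slots standing in for svm_data / vmxarea. */
static void *model_area[MODEL_NR_CPUS];

/* Allocate one area for every CPU whose bit is set in mask. */
int model_alloc_areas(unsigned int mask)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(mask & (1u << cpu)) || model_area[cpu])
			continue;
		model_area[cpu] = malloc(64);
		if (!model_area[cpu])
			return -1;
	}
	return 0;
}

/* Free every area and clear the slot so stale pointers cannot be reused. */
void model_free_areas(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		free(model_area[cpu]);
		model_area[cpu] = NULL;
	}
}

/* A CPU coming online can only be enabled if its area already exists. */
int model_hardware_enable(int cpu)
{
	return model_area[cpu] ? 0 : -1;
}
```

If the setup pass walks only the CPUs that are online at module init (say mask 0x3), enabling CPU 2 later fails because its slot was never filled. Walking every possible CPU (mask 0xF) trades a little memory for a hotplug path that needs no allocation.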
[PATCH 12/42] KVM: Move irq routing data structure to rcu locking
From: Gleb Natapov <g...@redhat.com>

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 virt/kvm/irq_comm.c |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 59cf8da..fb861dd 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -159,7 +159,8 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
 	 * IOAPIC.  So set the bit in both. The guest will ignore
 	 * writes to the unused one.
 	 */
-	irq_rt = kvm->irq_routing;
+	rcu_read_lock();
+	irq_rt = rcu_dereference(kvm->irq_routing);
 	if (irq < irq_rt->nr_rt_entries)
 		hlist_for_each_entry(e, n, &irq_rt->map[irq], link) {
 			int r = e->set(e, kvm, irq_source_id, level);
@@ -168,6 +169,7 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
 
 			ret = r + ((ret < 0) ? 0 : ret);
 		}
+	rcu_read_unlock();
 	return ret;
 }
 
@@ -179,7 +181,10 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 
 	trace_kvm_ack_irq(irqchip, pin);
 
-	gsi = kvm->irq_routing->chip[irqchip][pin];
+	rcu_read_lock();
+	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
+	rcu_read_unlock();
+
 	if (gsi != -1)
 		hlist_for_each_entry(kian, n, &kvm->arch.irq_ack_notifier_list,
 				     link)
@@ -279,9 +284,9 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask)
 
 void kvm_free_irq_routing(struct kvm *kvm)
 {
-	mutex_lock(&kvm->irq_lock);
+	/* Called only during vm destruction. Nobody can use the pointer
+	   at this stage */
 	kfree(kvm->irq_routing);
-	mutex_unlock(&kvm->irq_lock);
 }
 
 static int setup_routing_entry(struct kvm_irq_routing_table *rt,
@@ -387,8 +392,9 @@ int kvm_set_irq_routing(struct kvm *kvm,
 
 	mutex_lock(&kvm->irq_lock);
 	old = kvm->irq_routing;
-	kvm->irq_routing = new;
+	rcu_assign_pointer(kvm->irq_routing, new);
 	mutex_unlock(&kvm->irq_lock);
+	synchronize_rcu();
 
 	new = old;
 	r = 0;
-- 
1.6.5.2
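The update-side ordering this patch relies on (publish the new table, wait for readers that may still hold the old one, then free it) can be sketched in user space with C11 atomics. This is deliberately not RCU: a single global reader counter stands in for rcu_read_lock()/rcu_read_unlock() and a busy-wait stands in for synchronize_rcu(), so it can stall under a steady stream of readers where real RCU would not. All names here (struct model_table, model_routing, model_read_nr_entries(), model_set_routing()) are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kvm_irq_routing_table. */
struct model_table {
	int nr_entries;
};

static _Atomic(struct model_table *) model_routing; /* published pointer */
static atomic_int model_readers;                     /* readers in flight */

/* Read side: enter, fetch the pointer once, use it, leave. */
int model_read_nr_entries(void)
{
	struct model_table *rt;
	int nr;

	atomic_fetch_add(&model_readers, 1);   /* roughly rcu_read_lock() */
	rt = atomic_load(&model_routing);      /* roughly rcu_dereference() */
	nr = rt ? rt->nr_entries : 0;
	atomic_fetch_sub(&model_readers, 1);   /* roughly rcu_read_unlock() */
	return nr;
}

/* Update side: build, publish, wait for old readers, then free. */
int model_set_routing(int nr_entries)
{
	struct model_table *fresh = malloc(sizeof(*fresh));
	struct model_table *old;

	if (!fresh)
		return -1;
	fresh->nr_entries = nr_entries;

	/* Publish only after the table is fully initialised. */
	old = atomic_exchange(&model_routing, fresh);

	/* Crude grace period: spin until no reader is inside its section. */
	while (atomic_load(&model_readers) != 0)
		;

	free(old);
	return 0;
}
```

The point of the ordering is that the old table is freed only after every reader that could have fetched the old pointer has left its read-side section, which is why the patch places synchronize_rcu() after the pointer swap and outside irq_lock.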
[PATCH 14/42] KVM: Convert irq notifiers lists to RCU locking
From: Gleb Natapov <g...@redhat.com>

Use RCU locking for mask/ack notifiers lists.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 virt/kvm/irq_comm.c |   22 ++++++++++++----------
 1 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index f019725..6c94614 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -183,19 +183,19 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 
 	rcu_read_lock();
 	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
-	rcu_read_unlock();
-
 	if (gsi != -1)
-		hlist_for_each_entry(kian, n, &kvm->irq_ack_notifier_list, link)
+		hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
+					 link)
 			if (kian->gsi == gsi)
 				kian->irq_acked(kian);
+	rcu_read_unlock();
 }
 
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
 				   struct kvm_irq_ack_notifier *kian)
 {
 	mutex_lock(&kvm->irq_lock);
-	hlist_add_head(&kian->link, &kvm->irq_ack_notifier_list);
+	hlist_add_head_rcu(&kian->link, &kvm->irq_ack_notifier_list);
 	mutex_unlock(&kvm->irq_lock);
 }
 
@@ -203,8 +203,9 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
 				     struct kvm_irq_ack_notifier *kian)
 {
 	mutex_lock(&kvm->irq_lock);
-	hlist_del_init(&kian->link);
+	hlist_del_init_rcu(&kian->link);
 	mutex_unlock(&kvm->irq_lock);
+	synchronize_rcu();
 }
 
 int kvm_request_irq_source_id(struct kvm *kvm)
@@ -257,7 +258,7 @@ void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq,
 {
 	mutex_lock(&kvm->irq_lock);
 	kimn->irq = irq;
-	hlist_add_head(&kimn->link, &kvm->mask_notifier_list);
+	hlist_add_head_rcu(&kimn->link, &kvm->mask_notifier_list);
 	mutex_unlock(&kvm->irq_lock);
 }
 
@@ -265,8 +266,9 @@ void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq,
 				      struct kvm_irq_mask_notifier *kimn)
 {
 	mutex_lock(&kvm->irq_lock);
-	hlist_del(&kimn->link);
+	hlist_del_rcu(&kimn->link);
 	mutex_unlock(&kvm->irq_lock);
+	synchronize_rcu();
 }
 
 void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask)
@@ -274,11 +276,11 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask)
 	struct kvm_irq_mask_notifier *kimn;
 	struct hlist_node *n;
 
-	WARN_ON(!mutex_is_locked(&kvm->irq_lock));
-
-	hlist_for_each_entry(kimn, n, &kvm->mask_notifier_list, link)
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(kimn, n, &kvm->mask_notifier_list, link)
 		if (kimn->irq == irq)
 			kimn->func(kimn, mask);
+	rcu_read_unlock();
 }
 
 void kvm_free_irq_routing(struct kvm *kvm)
-- 
1.6.5.2
[PATCH 17/42] KVM: Return -ENOTTY on unrecognized ioctls
Not the incorrect -EINVAL.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/ia64/kvm/kvm-ia64.c   |    2 +-
 arch/powerpc/kvm/powerpc.c |    2 +-
 arch/s390/kvm/kvm-s390.c   |    2 +-
 arch/x86/kvm/x86.c         |    2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index f534e0f..f6471c8 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -941,7 +941,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 {
 	struct kvm *kvm = filp->private_data;
 	void __user *argp = (void __user *)arg;
-	int r = -EINVAL;
+	int r = -ENOTTY;
 
 	switch (ioctl) {
 	case KVM_SET_MEMORY_REGION: {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2a4551f..95af622 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -421,7 +421,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 
 	switch (ioctl) {
 	default:
-		r = -EINVAL;
+		r = -ENOTTY;
 	}
 
 	return r;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 07ced89..00e2ce8 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -150,7 +150,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		break;
 	}
 	default:
-		r = -EINVAL;
+		r = -ENOTTY;
 	}
 
 	return r;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5beb4c1..829e306 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2176,7 +2176,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 {
 	struct kvm *kvm = filp->private_data;
 	void __user *argp = (void __user *)arg;
-	int r = -EINVAL;
+	int r = -ENOTTY;
 	/*
 	 * This union makes it completely explicit to gcc-3.x
 	 * that these two variables' stack usage should be
-- 
1.6.5.2
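For readers unfamiliar with the convention, the distinction this patch draws can be shown with a tiny user-space dispatcher. It is a sketch only: model_vm_ioctl() and MODEL_IOCTL_KNOWN are made-up names, not KVM interfaces. The idea is that -ENOTTY means "this handler does not know the command at all", while -EINVAL is kept for a command that is recognised but was given a bad argument.

```c
#include <errno.h>

/* Hypothetical command number for illustration; not a real KVM ioctl. */
#define MODEL_IOCTL_KNOWN 0x01u

/* Dispatch a command, defaulting to -ENOTTY for anything unrecognised. */
long model_vm_ioctl(unsigned int cmd, long arg)
{
	long r = -ENOTTY; /* stays in place unless a case claims the command */

	switch (cmd) {
	case MODEL_IOCTL_KNOWN:
		/* Recognised command: a bad argument is -EINVAL, not -ENOTTY. */
		r = (arg >= 0) ? 0 : -EINVAL;
		break;
	default:
		break;
	}
	return r;
}
```

With this split, a caller probing for an optional command can tell "not supported here" apart from "supported, but you passed something wrong".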
[PATCH 08/42] KVM: Call pic_clear_isr() on pic reset to reuse logic there
From: Gleb Natapov <g...@redhat.com>

Also move call of ack notifiers after pic state change.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/i8259.c |   22 +++++++++-------------
 1 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 01f1516..ccc941a 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -225,22 +225,11 @@ int kvm_pic_read_irq(struct kvm *kvm)
 
 void kvm_pic_reset(struct kvm_kpic_state *s)
 {
-	int irq, irqbase, n;
+	int irq;
 	struct kvm *kvm = s->pics_state->irq_request_opaque;
 	struct kvm_vcpu *vcpu0 = kvm->bsp_vcpu;
+	u8 irr = s->irr, isr = s->imr;
 
-	if (s == &s->pics_state->pics[0])
-		irqbase = 0;
-	else
-		irqbase = 8;
-
-	for (irq = 0; irq < PIC_NUM_PINS/2; irq++) {
-		if (vcpu0 && kvm_apic_accept_pic_intr(vcpu0))
-			if (s->irr & (1 << irq) || s->isr & (1 << irq)) {
-				n = irq + irqbase;
-				kvm_notify_acked_irq(kvm, SELECT_PIC(n), n);
-			}
-	}
 	s->last_irr = 0;
 	s->irr = 0;
 	s->imr = 0;
@@ -256,6 +245,13 @@ void kvm_pic_reset(struct kvm_kpic_state *s)
 	s->rotate_on_auto_eoi = 0;
 	s->special_fully_nested_mode = 0;
 	s->init4 = 0;
+
+	for (irq = 0; irq < PIC_NUM_PINS/2; irq++) {
+		if (vcpu0 && kvm_apic_accept_pic_intr(vcpu0))
+			if (irr & (1 << irq) || isr & (1 << irq)) {
+				pic_clear_isr(s, irq);
+			}
+	}
 }
 
 static void pic_ioport_write(void *opaque, u32 addr, u32 val)
-- 
1.6.5.2
Re: [RFC] KVM Fault Tolerance: Kemari for KVM
Avi Kivity wrote: On 11/09/2009 05:53 AM, Fernando Luis Vázquez Cao wrote: Kemari runs paired virtual machines in an active-passive configuration and achieves whole-system replication by continuously copying the state of the system (dirty pages and the state of the virtual devices) from the active node to the passive node. An interesting implication of this is that during normal operation only the active node is actually executing code. Can you characterize the performance impact for various workloads? I assume you are running continuously in log-dirty mode. Doesn't this make memory intensive workloads suffer? Yes, we're running continuously in log-dirty mode. We still do not have numbers to show for KVM, but the snippets below from several runs of lmbench using Xen+Kemari will give you an idea of what you can expect in terms of overhead. All the tests were run using a fully virtualized Debian guest with hardware nested paging enabled. fork exec shP/F C/S [us] -- Base 114 349 1197 1.2845 8.2 Kemari(10GbE) + FC141 403 1280 1.2835 11.6 Kemari(10GbE) + DRBD 161 415 1388 1.3145 11.6 Kemari(1GbE) + FC 151 410 1335 1.3370 11.5 Kemari(1GbE) + DRBD 162 413 1318 1.3239 11.6 * P/F=page fault, C/S=context switch The benchmarks above are memory intensive and, as you can see, the overhead varies widely from 7% to 40%. We also measured CPU bound operations, but, as expected, Kemari incurred almost no overhead. The synchronization process can be broken down as follows: - Event tapping: On KVM all I/O generates a VMEXIT that is synchronously handled by the Linux kernel monitor i.e. KVM (it is worth noting that this applies to virtio devices too, because they use MMIO and PIO just like a regular PCI device). Some I/O (virtio-based) is asynchronous, but you still have well-known tap points within qemu. Yep, and in some cases we have polling from the backend, which I forgot to mention in the RFC. 
- Notification to qemu: Taking a page from live migration's playbook, the synchronization process is user-space driven, which means that qemu needs to be woken up at each synchronization point. That is already the case for qemu-emulated devices, but we also have in-kernel emulators. To compound the problem, even for user-space emulated devices accesses to coalesced MMIO areas can not be detected. As a consequence we need a mechanism to communicate KVM-handled events to qemu. Do you mean the ioapic, pic, and lapic? Well, I was more worried about the in-kernel backends currently in the works. To save the state of those devices we could leverage qemu's vmstate infrastructure and even reuse struct VMStateDescription's pre_save() callback, but we would like to pass the device state through the kvm_run area to avoid a ioctl call right after returning to user space. Perhaps its best to start with those in userspace (-no-kvm-irqchip). That's precisely what we were planning to do. Once we get a working prototype we will take care of existing optimizations such as in-kernel emulators and add our own. Why is access to those chips considered a synchronization point? The main problem with those is that to get the chip state we use an ioctl when we could have copied it to qemu's memory before going back to user space. Not all accesses to those chips need to be treated as synchronization points. - Virtual machine synchronization: All the dirty pages since the last synchronization point and the state of the virtual devices is sent to the fallback node from the user-space qemu process. For this the existing savevm infrastructure and KVM's dirty page tracking capabilities can be reused. Regarding in-kernel devices, with the likely advent of in-kernel virtio backends we need a generic way to access their state from user-space, for which, again, the kvm_run share memory area could be used. I wonder if you can pipeline dirty memory synchronization. 
That is, write-protect those pages that are dirty, start copying them to the other side, and continue execution, copying memory if the guest faults it again. Asynchronous transmission of dirty pages would be really helpful to eliminate the performance hiccups that tend to occur at synchronization points. What we can do is to copy dirty pages asynchronously until we reach a synchronization point, where we need to stop the guest and send the remaining dirty pages and the state of devices to the other side. However, we can not delay the transmission of a dirty page across a synchronization point, because if the primary node crashed before the page reached the fallback node the I/O operation that caused the synchronization point cannot be replayed reliably. How many pages do you copy per synchronization point for reasonably difficult workloads? That is very
kvm segmentation fault on pci-hotplug add/del/re-add
Hi,

running kvm 0.11.0 from http://www.corpit.ru/debian/tls/ on debian lenny host, guest also debian lenny (with virtio and pci_hotplug, 2.6.30)

Starting the guest with:

/usr/bin/kvm -name test -vnc :5 -net nic,vlan=0,model=virtio,macaddr=00:08:15:00:09:95 -net tap,vlan=0,ifname=tap05,script=no -m 256 -drive file=/vsimages/test/test.raw,if=virtio,boot=on -monitor tcp:127.0.0.1:3,server,nowait -vga std -k de -tdf

1. Adding hdd

On qemu-console:
pci_add auto storage file=test2.raw,if=virtio
OK domain 0, bus 0, slot 6, function 0

In guest:
pci :00:06.0: reg 10 io port: [0x00-0x3ff]
pci :00:02.0: BAR 6: bogus alignment [0x0-0x0] flags 0x2
decode_hpp: Could not get hotplug parameters. Use defaults
virtio-pci :00:06.0: enabling device ( - 0001)
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
virtio-pci :00:06.0: PCI INT A -> Link[LNKB] -> GSI 11 (level, high) -> IRQ 11
 vdb: vdb1

2. Deleting hdd

On qemu-console:
pci_del 0:6

In guest:
virtio-pci :00:06.0: PCI INT A disabled

3. Re-adding the same hdd fails

qemu-console:
pci_add auto storage file=test2.raw,if=virtio

host:
Segmentation fault

Running with gdb gives:

dbg: Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fc15550a6e0 (LWP 9085)]
virtio_blk_init (dev=0xcb2010) at /build/kvm/qemu-kvm-0.11.0/build/hw/virtio-blk.c:439
439 /build/kvm/qemu-kvm-0.11.0/build/hw/virtio-blk.c: No such file or directory.
 in /build/kvm/qemu-kvm-0.11.0/build/hw/virtio-blk.c

Is this problem known? Is pci-hotplug a qemu or kvm feature? Where to file a bug?

Regards
Thomas
Re: [RFC] KVM Fault Tolerance: Kemari for KVM
On 11/16/2009 04:18 PM, Fernando Luis Vázquez Cao wrote: Avi Kivity wrote: On 11/09/2009 05:53 AM, Fernando Luis Vázquez Cao wrote: Kemari runs paired virtual machines in an active-passive configuration and achieves whole-system replication by continuously copying the state of the system (dirty pages and the state of the virtual devices) from the active node to the passive node. An interesting implication of this is that during normal operation only the active node is actually executing code. Can you characterize the performance impact for various workloads? I assume you are running continuously in log-dirty mode. Doesn't this make memory intensive workloads suffer? Yes, we're running continuously in log-dirty mode. We still do not have numbers to show for KVM, but the snippets below from several runs of lmbench using Xen+Kemari will give you an idea of what you can expect in terms of overhead. All the tests were run using a fully virtualized Debian guest with hardware nested paging enabled. fork exec shP/F C/S [us] -- Base 114 349 1197 1.2845 8.2 Kemari(10GbE) + FC141 403 1280 1.2835 11.6 Kemari(10GbE) + DRBD 161 415 1388 1.3145 11.6 Kemari(1GbE) + FC 151 410 1335 1.3370 11.5 Kemari(1GbE) + DRBD 162 413 1318 1.3239 11.6 * P/F=page fault, C/S=context switch The benchmarks above are memory intensive and, as you can see, the overhead varies widely from 7% to 40%. We also measured CPU bound operations, but, as expected, Kemari incurred almost no overhead. Is lmbench fork that memory intensive? Do you have numbers for benchmarks that use significant anonymous RSS? Say, a parallel kernel build. Note that scaling vcpus will increase a guest's memory-dirtying power but snapshot rate will not scale in the same way. - Notification to qemu: Taking a page from live migration's playbook, the synchronization process is user-space driven, which means that qemu needs to be woken up at each synchronization point. 
That is already the case for qemu-emulated devices, but we also have in-kernel emulators. To compound the problem, even for user-space emulated devices accesses to coalesced MMIO areas can not be detected. As a consequence we need a mechanism to communicate KVM-handled events to qemu. Do you mean the ioapic, pic, and lapic? Well, I was more worried about the in-kernel backends currently in the works. To save the state of those devices we could leverage qemu's vmstate infrastructure and even reuse struct VMStateDescription's pre_save() callback, but we would like to pass the device state through the kvm_run area to avoid a ioctl call right after returning to user space. Hm, let's defer all that until we have something working so we can estimate the impact of userspace virtio in those circumstances. Why is access to those chips considered a synchronization point? The main problem with those is that to get the chip state we use an ioctl when we could have copied it to qemu's memory before going back to user space. Not all accesses to those chips need to be treated as synchronization points. Ok. Note that piggybacking on an exit will work for the lapic, but not for the global irqchips (ioapic, pic) since they can still be modified by another vcpu. I wonder if you can pipeline dirty memory synchronization. That is, write-protect those pages that are dirty, start copying them to the other side, and continue execution, copying memory if the guest faults it again. Asynchronous transmission of dirty pages would be really helpful to eliminate the performance hiccups that tend to occur at synchronization points. What we can do is to copy dirty pages asynchronously until we reach a synchronization point, where we need to stop the guest and send the remaining dirty pages and the state of devices to the other side. 
However, we can not delay the transmission of a dirty page across a synchronization point, because if the primary node crashed before the page reached the fallback node the I/O operation that caused the synchronization point cannot be replayed reliably. What I mean is: - choose synchronization point A - start copying memory for synchronization point A - output is delayed - choose synchronization point B - copy memory for A and B if guest touches memory not yet copied for A, COW it - once A copying is complete, release A output - continue copying memory for B - choose synchronization point B by keeping two synchronization points active, you don't have any pauses. The cost is maintaining copy-on-write so we can copy dirty pages for A while keeping execution. How many pages do you copy per synchronization point for reasonably difficult workloads? That is very workload-dependent, but if you take a look at the examples below you can get a feeling of how Kemari behaves. IOzoneKemari sync
buildbot failure in qemu-kvm on default_i386_debian_5_0
The Buildbot has detected a new failure of default_i386_debian_5_0 on qemu-kvm.
Full details are available at:
 http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_debian_5_0/builds/158

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason:
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity <a...@redhat.com>

BUILD FAILED: failed git

sincerely,
 -The Buildbot
[RFC][PATCH] qemu-kvm: Introduce writeback scope for cpu_synchronize_state
This patch aims at addressing the mp_state writeback issue in a cleaner
fashion. By introducing additional information about the scope of the
scheduled vcpu state writeback, we can simply skip mp_state (and maybe
other specific states in the future) when updating the in-kernel state.

The writeback scope is defined when calling cpu_synchronize_state. It
accumulates, i.e. once a full writeback was requested, this will stick
until it was performed.

This unbreaks --disable-kvm builds of qemu-kvm again.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---

A corresponding upstream patch is ready to be posted as well, just
waiting for comments on the general direction from KVM POV.

 cpu-defs.h            |    2 +-
 exec.c                |    4 ++--
 gdbstub.c             |    8
 hw/apic.c             |    5 ++---
 hw/pc.c               |    2 +-
 monitor.c             |    6 ++
 qemu-kvm-ia64.c       |    2 ++
 qemu-kvm-x86.c        |    6 --
 qemu-kvm.c            |   44 +---
 qemu-kvm.h            |   13 ++---
 target-i386/helper.c  |    2 +-
 target-i386/machine.c |    7 ++-
 target-ppc/machine.c  |    4 ++--
 13 files changed, 54 insertions(+), 51 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index cf502e9..b7cda81 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -142,7 +142,7 @@ struct KVMCPUState {
     pthread_t thread;
     int signalled;
     struct qemu_work_item *queued_work_first, *queued_work_last;
-    int regs_modified;
+    int writeback_scope;
 };
 
 #define CPU_TEMP_BUF_NLONGS 128
diff --git a/exec.c b/exec.c
index fcffb0f..290a565 100644
--- a/exec.c
+++ b/exec.c
@@ -529,14 +529,14 @@ static void cpu_common_pre_save(void *opaque)
 {
     CPUState *env = opaque;
 
-    cpu_synchronize_state(env);
+    cpu_synchronize_state(env, CPU_SYNC_RUNTIME);
 }
 
 static int cpu_common_pre_load(void *opaque)
 {
     CPUState *env = opaque;
 
-    cpu_synchronize_state(env);
+    cpu_synchronize_state(env, CPU_SYNC_RESET);
     return 0;
 }
 
diff --git a/gdbstub.c b/gdbstub.c
index ad7cdca..5a3e5ee 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -1598,7 +1598,7 @@ static void gdb_breakpoint_remove_all(void)
 static void gdb_set_cpu_pc(GDBState *s, target_ulong pc)
 {
 #if defined(TARGET_I386)
-    cpu_synchronize_state(s->c_cpu);
+    cpu_synchronize_state(s->c_cpu, CPU_SYNC_RUNTIME);
     s->c_cpu->eip = pc;
 #elif defined (TARGET_PPC)
     s->c_cpu->nip = pc;
@@ -1785,7 +1785,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
         }
         break;
     case 'g':
-        cpu_synchronize_state(s->g_cpu);
+        cpu_synchronize_state(s->g_cpu, CPU_SYNC_RUNTIME);
         len = 0;
         for (addr = 0; addr < num_g_regs; addr++) {
             reg_size = gdb_read_register(s->g_cpu, mem_buf + len, addr);
@@ -1795,7 +1795,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
         put_packet(s, buf);
         break;
     case 'G':
-        cpu_synchronize_state(s->g_cpu);
+        cpu_synchronize_state(s->g_cpu, CPU_SYNC_RUNTIME);
         registers = mem_buf;
         len = strlen(p) / 2;
         hextomem((uint8_t *)registers, p, len);
@@ -1959,7 +1959,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
             thread = strtoull(p+16, (char **)&p, 16);
             env = find_cpu(thread);
             if (env != NULL) {
-                cpu_synchronize_state(env);
+                cpu_synchronize_state(env, CPU_SYNC_RUNTIME);
                 len = snprintf((char *)mem_buf, sizeof(mem_buf),
                                "CPU#%d [%s]", env->cpu_index,
                                env->halted ? "halted" : "running");
diff --git a/hw/apic.c b/hw/apic.c
index f7cb9d2..abebde3 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -488,7 +488,7 @@ void apic_init_reset(CPUState *env)
     if (!s)
         return;
 
-    cpu_synchronize_state(env);
+    cpu_synchronize_state(env, CPU_SYNC_RESET);
     s->tpr = 0;
     s->spurious_vec = 0xff;
     s->log_dest = 0;
@@ -512,7 +512,6 @@ void apic_init_reset(CPUState *env)
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         env->mp_state
             = env->halted ? KVM_MP_STATE_UNINITIALIZED : KVM_MP_STATE_RUNNABLE;
-        kvm_load_mpstate(env);
     }
 #endif
 }
@@ -1070,7 +1069,7 @@ static void apic_reset(void *opaque)
     APICState *s = opaque;
     int bsp;
 
-    cpu_synchronize_state(s->cpu_env);
+    cpu_synchronize_state(s->cpu_env, CPU_SYNC_RESET);
     bsp = cpu_is_bsp(s->cpu_env);
     s->apicbase = 0xfee00000 |
diff --git a/hw/pc.c b/hw/pc.c
index 5d90f8c..23d4a8e 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1021,7 +1021,7 @@ CPUState *pc_new_cpu(const char *cpu_model)
         fprintf(stderr, "Unable to find x86 CPU definition\n");
         exit(1);
     }
-    env->kvm_cpu_state.regs_modified = 1;
+    env->kvm_cpu_state.writeback_scope = CPU_SYNC_RESET;
     if ((env->cpuid_features & CPUID_APIC) || smp_cpus > 1) {
         env->cpuid_apic_id =
[PATCH] Fix qemu user mode build
This hunk is bogus, probably some merge left-over.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Makefile.target |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 9e7481e..17ffece 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -102,7 +102,6 @@ VPATH+=:$(SRC_PATH)/linux-user:$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR)
 QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user -I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR)
 obj-y = main.o syscall.o strace.o mmap.o signal.o thunk.o \
       elfload.o linuxload.o uaccess.o gdbstub.o
-obj-$(CONFIG_CPU_EMULATION) += tcg-runtime.o
 obj-y += host-utils.o
 obj-$(TARGET_HAS_BFLT) += flatload.o
Re: virtio disk slower than IDE?
[prior attempts from elsewhere kept bouncing, apologies for any replication]

Gordan Bobic wrote:

The test is building the Linux kernel (only taking the second run to give the test the benefit of local cache):

make clean; make -j8 all; make clean; sync; time make -j8 all

This takes about 10 minutes with IDE disk emulation and about 13 minutes with virtio. I ran the tests multiple times with most non-essential services on the host switched off (including cron/atd), and the guest in single-user mode to reduce the noise in the test to the minimum, and the results are pretty consistent, with virtio being about 30% behind.

I'd expect that, for an observed 30% wall clock time difference in an operation as complex as a kernel build, the base i/o throughput disparity is substantially greater. Did you try a simpler, more regular load, e.g. a streaming dd read of various block sizes from guest raw disk devices? This is also considerably easier to debug than the complex i/o load generated by a build.

One way to chop up the problem space is using blktrace on the host to observe both the i/o patterns coming out of qemu and the host's response to them in terms of turnaround time. I expect you'll see requests of a somewhat different nature generated by qemu w/r/t blocking and the number of threads serving virtio_blk requests relative to ide, but the host response should be essentially the same in terms of data returned per unit time.

If the host looks to be turning around i/o requests with similar latency in both cases, the problem would be a lower frequency of requests generated by qemu in the case of virtio_blk. Here it would be useful to know the host load generated by the guest for both cases.

-john

-- 
john.coo...@redhat.com
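For anyone who wants the "streaming read at various block sizes" measurement without dd, here is a minimal sketch in C. The function name stream_read is made up for illustration, and the reads go through the guest page cache (no O_DIRECT), so treat the numbers as rough unless the cache is dropped between runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `path` front to back with read() calls of
 * `block_size` bytes. Returns the number of bytes read, or -1 on error.
 * If `seconds` is non-NULL it receives the elapsed wall clock time.
 * Note: this goes through the page cache; it is not O_DIRECT.
 */
static long long stream_read(const char *path, size_t block_size,
                             double *seconds)
{
    assert(block_size > 0);

    char *buf = malloc(block_size);
    if (buf == NULL)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    struct timespec t0, t1;
    long long total = 0;
    ssize_t n;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((n = read(fd, buf, block_size)) > 0)
        total += n;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    free(buf);

    if (n < 0)
        return -1;
    if (seconds != NULL)
        *seconds = (double)(t1.tv_sec - t0.tv_sec)
                 + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return total;
}
```

Calling it in a loop over block sizes such as 4k, 64k and 1M against the same raw device, once with the disk attached as IDE and once as virtio, should give numbers that are easier to compare than a kernel build.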
Re: virtio disk slower than IDE?
Gordan Bobic wrote:

Lastly, do you use cache=wb on qemu? it's just a fun mode, we use cache=off only.

I don't see the option being set in the logs, so I'd guess it's whatever qemu-kvm defaults to.

You can set this through libvirt by putting an element such as the following within your disk element:

<driver name='qemu' type='qcow2' cache='none'/>

(Setting the type is preferred to avoid security issues wherein a guest writes an arbitrary qcow2 header to the beginning of a raw disk, reboots and allows qemu's autodetection to decide that this formerly-raw disk should now be treated as a delta against a file they otherwise might not have access to read; as such, it's particularly important if you intend that the type be raw).
Re: [RFC][PATCH] qemu-kvm: Introduce writeback scope for cpu_synchronize_state
On 16.11.2009 at 18:00, Jan Kiszka <jan.kis...@siemens.com> wrote:

This patch aims at addressing the mp_state writeback issue in a cleaner fashion. By introducing additional information about the scope of the scheduled vcpu state writeback, we can simply skip mp_state (and maybe other specific states in the future) when updating the in-kernel state.

The writeback scope is defined when calling cpu_synchronize_state. It accumulates, i.e. once a full writeback was requested, this will stick until it was performed.

This unbreaks --disable-kvm builds of qemu-kvm again.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---

A corresponding upstream patch is ready to be posted as well, just waiting for comments on the general direction from KVM POV.

I think I'd rather have a sync function that implicitly does the RUNTIME sync, the way it is now, and an 'advanced' one to which you can pass a constant saying what it syncs. That way most code continues to work the way it does now. It also makes it more readable IMHO.

Alex
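To make the suggestion concrete, a minimal sketch of the two-function interface could look like the following. This is not code from the patch: cpu_synchronize_state_scope, struct vcpu_sketch, CPU_SYNC_NONE and the numeric values of the constants are assumptions made for illustration; only the names CPU_SYNC_RUNTIME and CPU_SYNC_RESET come from the posted diff.

```c
#include <assert.h>
#include <stddef.h>

/* Scope names follow the patch; CPU_SYNC_NONE and the values are assumed. */
enum cpu_sync_scope {
    CPU_SYNC_NONE    = 0,
    CPU_SYNC_RUNTIME = 1,
    CPU_SYNC_RESET   = 2
};

/* Stand-in for CPUState, reduced to the one field this sketch needs. */
struct vcpu_sketch {
    int writeback_scope;
};

/* Hypothetical 'advanced' variant: the caller names the scope explicitly.
 * The widest scope requested so far is kept until writeback happens. */
static void cpu_synchronize_state_scope(struct vcpu_sketch *env,
                                        enum cpu_sync_scope scope)
{
    assert(env != NULL);
    if ((int)scope > env->writeback_scope)
        env->writeback_scope = (int)scope;
}

/* Simple variant: existing callers stay unchanged and get a RUNTIME sync. */
static void cpu_synchronize_state(struct vcpu_sketch *env)
{
    cpu_synchronize_state_scope(env, CPU_SYNC_RUNTIME);
}
```

With this shape only the reset paths (apic_reset, pc_new_cpu and friends in the diff) would need to call the scoped variant, while gdbstub and monitor callers keep the one-argument form.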
Re: [RFC][PATCH] qemu-kvm: Introduce writeback scope for cpu_synchronize_state
On 11/16/2009 07:00 PM, Jan Kiszka wrote:

This patch aims at addressing the mp_state writeback issue in a cleaner fashion.

What's the issue? The fact that mp_state is updated whenever state is synchronized, while it could be simultaneously updated from other vcpus (whose updates are then lost)?

By introducing additional information about the scope of the scheduled vcpu state writeback, we can simply skip mp_state (and maybe other specific states in the future) when updating the in-kernel state.

The writeback scope is defined when calling cpu_synchronize_state. It accumulates, i.e. once a full writeback was requested, this will stick until it was performed.

Maybe it's just simpler to divorce mp_state from the rest of the state.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
[PATCH] viostor driver. XP driver performance improvement.
repository: /home/vadimr/shares/kvm-guest-drivers-windows
branch: XP
commit e58183a0df0fd398e31e3f079b6c301520b2a5a2
Author: Vadim Rozenfeld <vroze...@redhat.com>
Date:   Sun Nov 15 22:50:55 2009 +0200

    [PATCH] viostor driver. XP driver performance improvement.

    Signed-off-by: Vadim Rozenfeld <vroze...@redhat.com>

diff --git a/viostor/virtio_ring.c b/viostor/virtio_ring.c
index 2911cef..de69557 100644
--- a/viostor/virtio_ring.c
+++ b/viostor/virtio_ring.c
@@ -39,7 +39,6 @@ initialize_virtqueue(
     IN VOID (*notify)(struct virtqueue *));
 
 
-//#define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 #define to_vvq(_vq) (struct vring_virtqueue *)_vq
 
 static
@@ -119,7 +118,7 @@ vring_add_buf(
     RhelDbgPrint(TRACE_LEVEL_VERBOSE, ("%s: Added buffer head %i to %p\n", __FUNCTION__, head, vq) );
 
-    return 0;
+    return vq->num_free;
 }
 
 static
diff --git a/viostor/virtio_stor_hw_helper.c b/viostor/virtio_stor_hw_helper.c
index 21d27cd..2e61b30 100644
--- a/viostor/virtio_stor_hw_helper.c
+++ b/viostor/virtio_stor_hw_helper.c
@@ -54,6 +54,8 @@ RhelDoReadWrite(PVOID DeviceExtension,
     PVOID                 DataBuffer;
     PADAPTER_EXTENSION    adaptExt;
     PRHEL_SRB_EXTENSION   srbExt;
+    int                   num_free;
+
     cdb      = (PCDB)&Srb->Cdb[0];
     srbExt   = (PRHEL_SRB_EXTENSION)Srb->SrbExtension;
     adaptExt = (PADAPTER_EXTENSION)DeviceExtension;
@@ -88,20 +90,21 @@ RhelDoReadWrite(PVOID DeviceExtension,
     srbExt->vbr.sg[sgElement].physAddr = ScsiPortGetPhysicalAddress(DeviceExtension, NULL, &srbExt->vbr.status, &fragLen);
     srbExt->vbr.sg[sgElement].ulSize   = sizeof(srbExt->vbr.status);
-
-
-    if (adaptExt->pci_vq_info.vq->vq_ops->add_buf(adaptExt->pci_vq_info.vq,
+    num_free = adaptExt->pci_vq_info.vq->vq_ops->add_buf(adaptExt->pci_vq_info.vq,
                      &srbExt->vbr.sg[0],
                      srbExt->out, srbExt->in,
-                     &srbExt->vbr) == 0) {
+                     &srbExt->vbr);
+    if ( num_free >= 0) {
         InsertTailList(&adaptExt->list_head, &srbExt->vbr.list_entry);
         adaptExt->pci_vq_info.vq->vq_ops->kick(adaptExt->pci_vq_info.vq);
-        if(++adaptExt->requests < adaptExt->queue_depth) {
+        srbExt->call_next = FALSE;
+        if(num_free < VIRTIO_MAX_SG) {
+            srbExt->call_next = TRUE;
+        } else {
            ScsiPortNotification(NextLuRequest, DeviceExtension, Srb->PathId, Srb->TargetId, Srb->Lun);
         }
-        return TRUE;
     }
-    return FALSE;
+    return TRUE;
 }
 #endif
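The flow-control idea in this patch, as far as it can be read from the diff, is that add_buf now reports how many descriptors remain free, and the driver only asks the port driver for the next request while at least one maximum-size request still fits. A standalone sketch of that decision follows; struct ring_sketch, ring_add_buf, submit_request and SKETCH_MAX_SG are illustrative names, the value 4 is arbitrary, and the negative return on a full ring is an assumption rather than something the diff shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_SG 4  /* stands in for VIRTIO_MAX_SG; the value is assumed */

/* Stand-in for the virtqueue, reduced to its free descriptor count. */
struct ring_sketch {
    int num_free;
};

/* Returns the number of descriptors still free after queuing the buffer,
 * or -1 (assumed convention) if the buffer does not fit. */
static int ring_add_buf(struct ring_sketch *vq, int descriptors)
{
    assert(vq != NULL && descriptors > 0);
    if (descriptors > vq->num_free)
        return -1;
    vq->num_free -= descriptors;
    return vq->num_free;
}

/* Queues one request. Returns true if it was queued. *call_next is set to
 * true when the ring is too full to accept another maximum-size request
 * right away, so asking for the next request should be deferred until a
 * completion frees descriptors. */
static bool submit_request(struct ring_sketch *vq, int descriptors,
                           bool *call_next)
{
    int num_free = ring_add_buf(vq, descriptors);

    *call_next = false;
    if (num_free < 0)
        return false;
    if (num_free < SKETCH_MAX_SG)
        *call_next = true;
    return true;
}
```

Compared with the old ++requests < queue_depth counter, this ties the decision to the actual ring occupancy instead of a separately maintained request count.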
Re: [ANNOUNCE] kvm-kmod-2.6.32-rc7
Jan Kiszka wrote:

win7_x86 means 64-bit version?

win7_x86 means the Windows 7 32-bit version.

Which qemu-kvm version are you using for these tests?

I downloaded qemu-kvm from http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=summary on 09-Nov-2009:

QEMU 0.11.50 monitor - type 'help' for more information
Re: virtio disk slower than IDE?
On 11/16/2009 08:11 PM, Charles Duffy wrote:

Gordan Bobic wrote:

Lastly, do you use cache=wb on qemu? it's just a fun mode, we use cache=off only.

I don't see the option being set in the logs, so I'd guess it's whatever qemu-kvm defaults to.

You can set this through libvirt by putting an element such as the following within your disk element:

<driver name='qemu' type='qcow2' cache='none'/>

It's not needed on rhel5.4 qemu - we have cache=none as a default.

(Setting the type is preferred to avoid security issues wherein a guest writes an arbitrary qcow2 header to the beginning of a raw disk, reboots and allows qemu's autodetection to decide that this formerly-raw disk should now be treated as a delta against a file they otherwise might not have access to read; as such, it's particularly important if you intend that the type be raw).
Re: [RFC][PATCH] qemu-kvm: Introduce writeback scope for cpu_synchronize_state
Avi Kivity wrote:

On 11/16/2009 07:00 PM, Jan Kiszka wrote:

This patch aims at addressing the mp_state writeback issue in a cleaner fashion.

What's the issue? The fact that mp_state is updated whenever state is synchronized, while it could be simultaneously updated from other vcpus (whose updates are then lost)?

Right, the issue b8a7857071 addressed. But that approach spreads more kvm_* fragments in unrelated qemu code, e.g. the monitor, and fails to update other parts (gdbstub). And it doesn't care about what happens if kvm is off at build or runtime. Such things are better addressed in upstream by encapsulating kvm calls in synchronization points.

By introducing additional information about the scope of the scheduled vcpu state writeback, we can simply skip mp_state (and maybe other specific states in the future) when updating the in-kernel state.

The writeback scope is defined when calling cpu_synchronize_state. It accumulates, i.e. once a full writeback was requested, this will stick until it was performed.

Maybe it's just simpler to divorce mp_state from the rest of the state.

That won't solve the core issue. mp_state *is* part of the state, and needs to be read (to update halted) and sometimes also written when the state was hard reset.

Jan
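The lost-update problem and the scoped writeback discussed in this thread can be modelled in a few lines. The sketch below is only an illustration of the idea: the struct layouts, schedule_writeback, write_back, CPU_SYNC_NONE and the constant values are assumptions, not qemu-kvm code. Register state is always written back, while mp_state is only written when a RESET scope was requested.

```c
#include <assert.h>
#include <stddef.h>

/* Scope names follow the patch; CPU_SYNC_NONE and the values are assumed. */
enum cpu_sync_scope {
    CPU_SYNC_NONE    = 0,
    CPU_SYNC_RUNTIME = 1,
    CPU_SYNC_RESET   = 2
};

/* Simplified in-kernel copy of the vcpu state. */
struct kernel_vcpu {
    unsigned long rip;
    int mp_state;
};

/* Simplified user space copy plus the pending writeback scope. */
struct user_vcpu {
    unsigned long rip;
    int mp_state;
    int writeback_scope;
};

/* Requests accumulate: a RESET request sticks until writeback is done. */
static void schedule_writeback(struct user_vcpu *env,
                               enum cpu_sync_scope scope)
{
    assert(env != NULL);
    if ((int)scope > env->writeback_scope)
        env->writeback_scope = (int)scope;
}

/* Push state back before re-entering the guest. mp_state is skipped for
 * RUNTIME scope so that a concurrent in-kernel update is not overwritten. */
static void write_back(struct user_vcpu *env, struct kernel_vcpu *k)
{
    assert(env != NULL && k != NULL);
    if (env->writeback_scope == CPU_SYNC_NONE)
        return;
    k->rip = env->rip;
    if (env->writeback_scope == CPU_SYNC_RESET)
        k->mp_state = env->mp_state;
    env->writeback_scope = CPU_SYNC_NONE;
}
```

In this model a gdbstub register write requests RUNTIME scope and leaves an mp_state change made by another vcpu alone, whereas a hard reset requests RESET scope and deliberately overwrites it.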
[ kvm-Bugs-2897679 ] strange mouse behavior when connecting to kvm-sesion via vnc
Bugs item #2897679, was opened at 2009-11-14 14:19
Message generated for change (Comment added) made by d3vi0n
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2897679&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: intel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Michael Mair-Keimberger (d3vi0n)
Assigned to: Nobody/Anonymous (nobody)
Summary: strange mouse behavior when connecting to kvm-sesion via vnc

Initial Comment:
I have a few kvm sessions running on my server (fedora, ubuntu and winxp). I can connect to them via vnc, which I enabled in kvm (via kvm -vnc). The problem is that the mouse pointer in a kvm window is never where it should be. This is really annoying because, for example, if I try to press the start button (in windows), most of the time my mouse is already out of the window. I always have to play with the mouse to reach the button. In general this affects everything I do with the mouse in windows. It's the same with the linux guests. It seems to depend on the point at which I enter the window. Also, the distance between the local mouse pointer and the pointer in windows changes while I move the mouse in windows.

I already filed a bug report on bugs.kde.org, because I thought krdc (which is my vnc client) was at fault, but I have the same issue with other clients too. Here is the link to the bug report: https://bugs.kde.org/show_bug.cgi?id=212498

Some info about the system:
It's a stable full 64-bit (no multilib) gentoo system:

Portage 2.1.6.13 (default/linux/amd64/10.0/no-multilib, gcc-4.3.4, glibc-2.9_p20081201-r2, 2.6.30-gentoo-r4 x86_64)
=
System uname: linux-2.6.30-gentoo-r4-x86_64-intel-r-_xeon-r-_cpu_e54...@_2.00ghz-with-gentoo-2.0.1
Timestamp of tree: Sat, 14 Nov 2009 05:20:01 +
app-shells/bash:     4.0_p28
dev-lang/python:     2.6.2-r1
sys-apps/baselayout: 2.0.1
sys-apps/openrc:     0.5.2-r2
sys-apps/sandbox:    1.6-r2
sys-devel/autoconf:  2.63-r1
sys-devel/automake:  1.7.9-r1, 1.9.6-r2, 1.10.2
sys-devel/binutils:  2.18-r3
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.6a
virtual/os-headers:  2.6.27-r2
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=nocona -msse4.1"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-O2 -pipe -march=nocona -msse4.1"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks fixpackages parallel-fetch protect-owned sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://gentoo.supp.name/ http://ftp.fi.muni.cz/pub/linux/gentoo/ http://gentoo.mirror.web4u.cz/ http://gentoo.mirror.dkm.cz/pub/gentoo/ http://gentoo.ynet.sk/pub"
LANG="de_DE.utf8"
LDFLAGS="-Wl,-O1"
LINGUAS="de"
MAKEOPTS="-j9"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp/tunafix"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/home/clown/overlays/local /home/clown/overlays/layman/x11 /home/clown/overlays/layman/sunrise"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage/"
USE="acl acpi amd64 berkdb bzip2 cli cracklib crypt cups dbus dri fortran gdbm gpm iconv ipv6 mmx modules mudflap ncurses nls nptl nptlonly openmp pam pcre perl pppd python readline reflection session spl sse sse2 ssl ssse3 sysfs tcpd unicode xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" DVB_CARDS="ttpci" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="de" USERLAND="GNU"
Windows XP Bluescreen when unplugging printer
Hi all,

I want to run an Epson inkjet within my Windows XP guest. My host has USB 2.0 enabled, and a USB flash drive works without any problems. When I plug in the printer (it works with the same drivers on a native Windows XP!), it is recognized and the status monitor also shows the ink levels. That is, until I start printing. Then the ink level disappears and the status monitor hangs. The printer itself doesn't do anything: no LED blinks, no printing starts. When I shut down Windows or unplug the printer I get a bluescreen in XP in usbuhci.sys.

Interesting: when I switch off USB 2.0 and enable only USB 1.x in the host BIOS, everything related to USB is SLOW (the USB flash drive, too!) but the printer works (also slowly, but it prints).

Any ideas what could cause that behaviour? It shows up with kvm-88 and kvm-77 as well. I tested it on two different systems; both show the same behaviour.

Best regards,

Erik
Re: Serial Port Driver does not handle interrupt
Erik Rull wrote:

Hi all,

I've tested two kvm versions, 77 and 88, both with the same behaviour:

I added a serial device with -serial /dev/ttyS0 to my guest and launched HyperTerm on my Windows guest. Additionally I plugged a loopback plug into the serial connector that just routes the data sent back to the receive line (TXD - RXD). When I enter characters in HyperTerm, all characters are displayed except the last one that was sent.

When I plug in a real serial device and request its status via HyperTerm, I only get back the first 16 chars (I expect 50) when I send an additional dummy character (16 being the size of the 16550 FIFO buffer).

On my normal Windows PC it works as expected, so it seems to be an issue with kvm.

On the Linux host, I don't see any changes in /proc/interrupts; the driver is opened and is the exclusive user of this interrupt line. Within the host the serial line works without any problems, and there the interrupt counts increase during normal operation with software that runs on the host and uses this serial line.

What must I do to bring the serial line to life, with complete communication and no data lost because of missing interrupts? At the moment this is a key issue that has to be solved before I can continue my work with kvm.

Any ideas? I also tested other IRQ lines and other ttyS* on the system, with the same behaviour.

Thanks in advance,

Erik

Fixed: the APIC on the host side was disabled; kvm / qemu seems to need it.
[ kvm-Bugs-2896992 ] Intel PCI NIC passthrough problem
Bugs item #2896992, was opened at 2009-11-13 05:58
Message generated for change (Comment added) made by
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2896992&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Sergey Cheperis ()
Assigned to: Nobody/Anonymous (nobody)
Summary: Intel PCI NIC passthrough problem

Initial Comment:
Host:
- CPU Core2Duo E6300, Intel q45 chipset, VT-d enabled
- Ubuntu 9.10 x86_64, kernel 2.6.31.4 recompiled with CONFIG_DMAR=y and CONFIG_INTR_REMAP=y according to http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM
- in-kernel kvm module
- qemu-kvm commit c04b2aebf50c7d8cba883b86d1b872ccfc8f2249
- intel_iommu=igfx_off
- Qemu command line: /usr/local/bin/qemu-system-x86_64 -m 512 -k en-us -drive if=ide,file=/dev/server2/ubuntu,boot=on -cdrom /home/install/Linux/i386/ubuntu-9.10-desktop-i386.iso -boot d -vga std -pcidevice host=01:00.0 -net none -vnc :15 -daemonize

Guest OSes:
- Ubuntu 9.10 live CD i386, Windows Server 2003 i386, Windows Server 2008 x86_64, MacOS X 10.5.x i386

The device on the host:
01:00.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
        Subsystem: Intel Corporation Device 002e
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR- INTx-
        Latency: 32 (63750ns min), Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 31
        Region 0: Memory at d054 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at d052 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at d000 [size=64]
        Expansion ROM at bf00 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [e4] PCI-X non-bridge device
                Command: DPERE- ERO+ RBC=512 OST=1
                Status: Dev=00:00.0 64bit- 133MHz- SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
        Capabilities: [f0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
                Address: fee0300c Data: 41e9
        Kernel driver in use: pci-stub
        Kernel modules: e1000

Symptoms:
The device is found and initialized in all OSes. The driver properly detects the link speed, and detects when I plug the cable in or out. However, it does not send or receive any packets: all the counters always stay at 0, though there should be at least some traffic due to DHCP activity. dmesg does not show any errors on either the host or the guest.

ifconfig on the guest:
eth0      Link encap:Ethernet  HWaddr 00:07:e9:0f:c8:10
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

lspci -vvv on the guest:
00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
        Subsystem: Intel Corporation Device 002e
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR- INTx-
        Latency: 32 (63750ns min), Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at f000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at f002 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at c040 [size=64]
        Expansion ROM at 2000 [disabled] [size=128K]
        Capabilities: [40] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
                Address: Data:
        Kernel driver in use: e1000
        Kernel modules: e1000

cat /proc/interrupts on the guest:
           CPU0
  0:         89   IO-APIC-edge      timer
  1:       1088   IO-APIC-edge      i8042
  4:          2   IO-APIC-edge
  6:          2   IO-APIC-edge      floppy
  7:          4   IO-APIC-edge      parport0
  8:          0   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 11:          0   IO-APIC-fasteoi   eth0
 12:       1333   IO-APIC-edge      i8042
 14:         98   IO-APIC-edge      ata_piix
 15:       9874   IO-APIC-edge      ata_piix
NMI:          0   Non-maskable interrupts
LOC:      59892   Local timer interrupts
SPU:          0   Spurious
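The guest's /proc/interrupts output above shows IRQ 11 (eth0) stuck at zero, which matches the all-zero packet counters. A minimal sketch of how such a listing could be checked automatically is given below. The function name interrupt_counts and the SAMPLE constant are made up for illustration; the sample rows are copied from the report (the truncated last row is left out).

```python
# Sketch only: parse /proc/interrupts-style text and total the per-CPU counts.
# SAMPLE is copied from the guest output in the bug report above.
SAMPLE = """\
           CPU0
  0:         89   IO-APIC-edge      timer
  1:       1088   IO-APIC-edge      i8042
  9:          0   IO-APIC-fasteoi   acpi
 11:          0   IO-APIC-fasteoi   eth0
 15:       9874   IO-APIC-edge      ata_piix
NMI:          0   Non-maskable interrupts
LOC:      59892   Local timer interrupts
"""


def interrupt_counts(text):
    """Map each row label (e.g. "11", "NMI") to (total count, description)."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row lists CPU0, CPU1, ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # The first ncpus fields are per-CPU counters; stop at the first
        # non-numeric field because some rows carry fewer counters.
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        if counts:
            table[label.strip()] = (sum(counts), " ".join(fields[len(counts):]))
    return table


if __name__ == "__main__":
    for label, (total, desc) in interrupt_counts(SAMPLE).items():
        if total == 0:
            print("no interrupts seen on %s (%s)" % (label, desc))
```

Taking two snapshots a few seconds apart and comparing the totals for the eth0 row would show whether the assigned device raises any interrupts in the guest at all.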
Re: Windows XP Bluescreen when unplugging printer
Erik Rull wrote: Hi all, I want to use an Epson inkjet printer within my Windows XP guest. My host has USB 2.0 enabled, and a USB flash drive works without any problems. When I plug in the printer (which works with the same drivers on a native Windows XP!), it is recognized and the status monitor even shows the ink levels, until I start printing. Then the ink level disappears and the status monitor hangs. The printer itself doesn't do anything: no LED blinks, no printing starts. When I shut down Windows or unplug the printer, I get a bluescreen in XP in usbuhci.sys. Interesting: when I switch off USB 2.0 and enable only USB 1.x in the host BIOS, everything related to USB is SLOW (the USB flash drive, too!) but the printer works (also slowly, but it prints). Any ideas what could cause this behaviour? It happens with kvm-88 and kvm-77 alike. I tested on two different systems, and both show the same behaviour.

You might try this usb fix: http://git.savannah.gnu.org/cgit/qemu.git/commit/?id=c4c0e236beabb9de5ff472f77aeb811ec5484615 It's been around for a while but hasn't made it into any qemu or kvm releases yet. -jim
1:1 mapping
Hi all, I work in the embedded world and am evaluating potential hypervisors, and kvm looks suitable for what we want. However, I see some old references to 1:1 mapping, which would allow a guest to see a PCI device even without an IOMMU. Is this currently incorporated into the qemu-0.11 and kvm-88 releases? Is dma=none equivalent to this by any chance, or is it only meant for devices that do not use DMA? If 1:1 mapping is not in the tree, is there a reasonable chance that the old patch would work with the latest code? Thanks, Marty
[KVM-AUTOTEST] KSM-overcommit test v.2 (python version)
Hi, based on your requirements we have created a new version of the KSM-overcommit patch (submitted in September).

Description: It tests KSM (kernel shared memory) with memory overcommit.

Changelog:
1) Based only on python (C code removed)
2) Added a new test (check last 96B)
3) Split the test into (serial, parallel, both)
4) Improved log and documentation
5) Added a perf constant to change the time limit for waiting (slow computer problem)

Functionality: The KSM test starts guests and connects to them over ssh. It copies allocator.py to the guests and runs it there. The host can run any python command through the allocator.py loop on the client side.
Start run_ksm_overcommit.
Define host and guest reserve variables (host_reserver, guest_reserver).
Calculate the number of virtual machines and their memory based on the variables host_mem and overcommit.
Check KSM status.
Create and start virtual guests.

Test:
a] serial
1) initialize, merge all mem to a single page
2) separate first guest mem
3) separate the rest of the guests until all mem is filled
4) kill all guests except for the last
5) check if mem of last guest is ok
6) kill guest
b] parallel
1) initialize, merge all mem to a single page
2) separate mem of guest
3) verification of guest mem
4) merge mem to one block
5) verification of guests mem
6) separate mem of guests by 96B
7) check if mem is all right
8) kill guest

allocator.py (client side script): After start it waits for commands, which it executes on the client side. The mem_fill class implements commands to fill and check mem and to return errors to the host. We need a client side script because we need to generate many GB of special data.

Future plans: We want to add information about the time spent in each task to the log. We want to use the information from the log to compute the perf constant automatically. And add new tests.
diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
index ac9ef66..90f62bb 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -118,6 +118,23 @@ variants:
         test_name = npb
         test_control_file = npb.control
 
+    - ksm_overcommit:
+        # Don't preprocess any vms as we need to change its params
+        vms = ''
+        image_snapshot = yes
+        kill_vm_gracefully = no
+        type = ksm_overcommit
+        ksm_swap = yes # yes | no
+        no hugepages
+        # Overcommit of host memory
+        ksm_overcommit_ratio = 3
+        # Max parallel runs machine
+        ksm_paralel_ratio = 4
+        variants:
+            - serial
+                ksm_test_size = serial
+            - paralel
+                ksm_test_size = paralel
     - linux_s3: install setup unattended_install
         type = linux_s3
diff --git a/client/tests/kvm/tests/ksm_overcommit.py b/client/tests/kvm/tests/ksm_overcommit.py
new file mode 100644
index 000..fb7ded6
--- /dev/null
+++ b/client/tests/kvm/tests/ksm_overcommit.py
@@ -0,0 +1,605 @@
+import logging, time
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils
+import kvm_preprocessing
+import random, string, math, os
+
+def run_ksm_overcommit(test, params, env):
+    """
+    Test how KSM (Kernel Shared Memory) acts when more than physical memory is
+    used. The second part also tests how KVM handles the situation when the
+    host runs out of memory (expected is to pause the guest system, wait
+    until some process returns the memory and bring the guest back to life)
+
+    @param test: kvm test object.
+    @param params: Dictionary with test parameters.
+    @param env: Dictionary with the test environment.
+    """
+
+    def parse_meminfo(rowName):
+        """
+        Get data from file /proc/meminfo
+
+        @param rowName: Name of line in meminfo
+        """
+        for line in open('/proc/meminfo').readlines():
+            if line.startswith(rowName+":"):
+                name, amt, unit = line.split()
+                return name, amt, unit
+
+    def parse_meminfo_value(rowName):
+        """
+        Return the value of a meminfo row (callers convert it to int)
+
+        @param rowName: Name of line in meminfo
+        """
+        name, amt, unit = parse_meminfo(rowName)
+        return amt
+
+
+    def get_stat(lvms):
+        """
+        Get statistics in format:
+        Host: memfree = XXXM; Guests memsh = {XXX,XXX,...}
+
+        @params lvms: List of VMs
+        """
+        if not isinstance(lvms, list):
+            raise error.TestError("get_stat: parameter has to be a proper list")
+
+        try:
+            stat = "Host: memfree = "
+            stat += str(int(parse_meminfo_value("MemFree")) / 1024) + "M; "
+            stat += "swapfree = "
+            stat += str(int(parse_meminfo_value("SwapFree")) / 1024) + "M; "
+        except:
+            raise error.TestFail("Could not fetch free memory
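The meminfo helpers in the patch above can be tried outside autotest. The following is a minimal standalone sketch in Python 3, not the code from the patch: it parses the text once into a dictionary instead of re-reading the file per row, and it uses integer division. The names host_stat and SAMPLE, and the sample numbers, are made up for illustration.

```python
# Sketch only: a Python 3 variant of the patch's meminfo helpers.
# SAMPLE uses invented numbers; on a real host pass the contents of
# /proc/meminfo instead.
SAMPLE = """\
MemTotal:        4046516 kB
MemFree:         2048000 kB
SwapFree:           1024 kB
HugePages_Total:       0
"""


def parse_meminfo(text):
    """Return {row name: value as int}; most rows are in kB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Rows such as HugePages_Total carry no unit, so only the first
        # field after the colon is required.
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def host_stat(text):
    """Build the host part of the statistics line described in get_stat."""
    info = parse_meminfo(text)
    return "Host: memfree = %dM; swapfree = %dM; " % (
        info["MemFree"] // 1024, info["SwapFree"] // 1024)


if __name__ == "__main__":
    print(host_stat(SAMPLE))
```

Passing the text in as an argument keeps the parser testable on machines without /proc; a caller on a Linux host would read /proc/meminfo and hand over its contents.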
Re: Serial Port Driver does not handle interrupt
On Tue, Nov 17, 2009 at 12:01:08AM +0100, Erik Rull wrote: Erik Rull wrote: Any ideas? I also tested other IRQ lines and other ttyS* on the system - same behaviour.

Fixed: the APIC was disabled on the host side; kvm / qemu seems to need it.

I think I hit the same issue. What exactly did you do to solve it? Enable a kernel option? May I ask which one? :) Sorry, I don't have the hardware right now (so I can't play with apic options). I will have it in a few weeks, which is why I am asking :) Thanks a lot, Rodrigo
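The thread does not say how the APIC was re-enabled (BIOS setting or kernel option), so that part stays open. What can be checked without the original hardware is which interrupt controller the host kernel is actually using. The sketch below relies on an assumption that is not stated in the thread: on x86 Linux, /proc/interrupts names the controller per IRQ line, typically "IO-APIC" when the APIC is active and "XT-PIC" when the kernel has fallen back to the legacy PIC. The function name interrupt_mode is made up for illustration.

```python
# Sketch only: guess the host interrupt mode from /proc/interrupts text.
# Assumption: controller names contain "IO-APIC" or "XT-PIC" as on typical
# x86 Linux kernels; other platforms will report "unknown".
import os


def interrupt_mode(text):
    """Return "io-apic", "xt-pic" or "unknown" for /proc/interrupts text."""
    # Check IO-APIC first: a cascade line may still mention XT-PIC even
    # when the IO-APIC handles the device interrupts.
    if "IO-APIC" in text:
        return "io-apic"
    if "XT-PIC" in text:
        return "xt-pic"
    return "unknown"


if __name__ == "__main__":
    path = "/proc/interrupts"
    if os.path.exists(path):
        with open(path) as handle:
            print(interrupt_mode(handle.read()))
```

If this reports "xt-pic" on the host, checking the BIOS setup and the kernel command line for APIC-related settings would be a reasonable next step.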
Re: virtio disk slower than IDE?
john cooper wrote: The test is building the Linux kernel (only taking the second run to give the test the benefit of local cache): make clean; make -j8 all; make clean; sync; time make -j8 all This takes about 10 minutes with IDE disk emulation and about 13 minutes with virtio. I ran the tests multiple times with most non-essential services on the host switched off (including cron/atd), and the guest in single-user mode to reduce the noise in the test to the minimum, and the results are pretty consistent, with virtio being about 30% behind.

I'd expect that, for an observed 30% wall clock time difference in an operation as complex as a kernel build, the base i/o throughput disparity is substantially greater. Did you try a simpler, more regular load, e.g. a streaming dd read of various block sizes from guest raw disk devices? This is also considerably easier to debug than the complex i/o load generated by a build.

I'm not convinced it's the read performance, since it's the second pass that is timed, by which time all the source files will be in the guest's cache. I verified this by doing just one pass and priming it with: find . -type f -exec cat '{}' > /dev/null \; The execution times are indistinguishable from the second pass in the two-pass test. To me that would indicate that the problem is with write performance, rather than read performance.

One way to chop up the problem space is to use blktrace on the host to observe both the i/o patterns coming out of qemu and the host's response to them in terms of turnaround time. I expect you'll see requests of a somewhat different nature generated by qemu w/r/t blocking and the number of threads serving virtio_blk requests relative to ide, but the host response should be essentially the same in terms of data returned per unit time. If the host looks to be turning around i/o requests with similar latency in both cases, the problem would be a lower frequency of requests generated by qemu in the case of virtio_blk. Here it would be useful to know the host load generated by the guest for both cases.

With virtio the CPU usage did seem to be noticeably lower. I figured that was because it was spending more time waiting for I/O to finish, since it was clearly bottlenecking on disk I/O (since that's the only thing that changed). I'll try iozone's write tests and see how that compares. If I'm right about write performance being problematic, iozone might show the same performance deterioration on write tests compared to the IDE emulation. Gordan
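For a load that is simpler than a kernel build, a sequential write test at a few block sizes is one option. The sketch below is neither iozone nor dd; it is a small stand-in that could be run inside the guest once on a virtio-backed filesystem and once on an IDE-backed one. The function name write_throughput, the block sizes and the 8 MiB total are arbitrary choices for illustration, and a serious comparison would need much larger totals than fit in the page cache.

```python
# Sketch only: sequential write throughput for a few block sizes.
# Zero-filled blocks are used for simplicity; on a compressing or
# deduplicating filesystem the numbers would not be representative.
import os
import tempfile
import time


def write_throughput(path, block_size, total_bytes):
    """Write about total_bytes in block_size chunks, fsync, return MiB/s."""
    block = b"\0" * block_size
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        while written < total_bytes:
            # os.write may write fewer bytes than requested, so count
            # what was actually written.
            written += os.write(fd, block)
        os.fsync(fd)  # include the flush to disk in the measured time
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return written / (1024.0 * 1024.0) / max(elapsed, 1e-9)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "throughput.bin")
        for block_size in (4096, 65536, 1048576):
            rate = write_throughput(target, block_size, 8 * 1024 * 1024)
            print("block size %8d: %.1f MiB/s" % (block_size, rate))
```

Including fsync in the timed region matters here: without it the test mostly measures how fast the guest page cache absorbs writes, which would hide the disk emulation difference under discussion.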