[PATCH] KVM: x86 emulator: add CALL FAR instruction emulation (opcode 9a)
Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
---
 arch/x86/kvm/emulate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 26f69c2..e5f7f5c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2487,7 +2487,7 @@ static struct opcode opcode_table[256] = {
 	X8(D(SrcAcc | DstReg)),
 	/* 0x98 - 0x9F */
 	D(DstAcc | SrcNone), I(ImplicitOps | SrcAcc, em_cwd),
-	D(SrcImmFAddr | No64), N,
+	I(SrcImmFAddr | No64, em_call_far), N,
 	D(ImplicitOps | Stack), D(ImplicitOps | Stack), N, N,
 	/* 0xA0 - 0xA7 */
 	D(ByteOp | DstAcc | SrcMem | Mov | MemAbs), D(DstAcc | SrcMem | Mov | MemAbs),
--
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Add realmode test for CALL FAR IMM instruction
Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
---
 x86/realmode.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/x86/realmode.c b/x86/realmode.c
index a833829..2e12680 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -437,6 +437,9 @@ void test_call(void)
 			    "ret\n\t"
 			    "2:\t");
 	MK_INSN(call_far1, "lcallw *(%ebx)\n\t");
+	MK_INSN(call_far2, ".byte 0x9a\n\t"
+			   ".word retf\n\t"
+			   ".word 0x00\n\t");
 	MK_INSN(ret_imm,	"sub $10, %sp; jmp 2f; 1: retw $10; 2: callw 1b");
 
 	exec_in_big_real_mode(&insn_call1);
@@ -453,6 +456,9 @@ void test_call(void)
 	exec_in_big_real_mode(&insn_call_far1);
 	report("call far 1", 0, 1);
 
+	exec_in_big_real_mode(&insn_call_far2);
+	report("call far 2", 0, 1);
+
 	exec_in_big_real_mode(&insn_ret_imm);
 	report("ret imm 1", 0, 1);
 }
--
1.7.0.4
Re: [PATCH] Add realmode test for CALL FAR IMM instruction
On 08/25/2010 09:16 AM, Wei Yongjun wrote:
> Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
> ---
>  x86/realmode.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/x86/realmode.c b/x86/realmode.c
> index a833829..2e12680 100644
> --- a/x86/realmode.c
> +++ b/x86/realmode.c
> @@ -437,6 +437,9 @@ void test_call(void)
>  			    "ret\n\t"
>  			    "2:\t");
>  	MK_INSN(call_far1, "lcallw *(%ebx)\n\t");
> +	MK_INSN(call_far2, ".byte 0x9a\n\t"
> +			   ".word retf\n\t"
> +			   ".word 0x00\n\t");

Why .byte encoding? Won't lcallw $0, $retf (or the other way round) work?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [PATCH] Add realmode test for CALL FAR IMM instruction
>> On 08/25/2010 09:16 AM, Wei Yongjun wrote:
>> Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
>> ---
>>  x86/realmode.c |    6 ++++++
>>  1 files changed, 6 insertions(+), 0 deletions(-)
>>
>> diff --git a/x86/realmode.c b/x86/realmode.c
>> index a833829..2e12680 100644
>> --- a/x86/realmode.c
>> +++ b/x86/realmode.c
>> @@ -437,6 +437,9 @@ void test_call(void)
>>  			    "ret\n\t"
>>  			    "2:\t");
>>  	MK_INSN(call_far1, "lcallw *(%ebx)\n\t");
>> +	MK_INSN(call_far2, ".byte 0x9a\n\t"
>> +			   ".word retf\n\t"
>> +			   ".word 0x00\n\t");
>
> Why .byte encoding? won't lcallw $0, $retf (or the other way round) work?

Oh, it works, I will fix this.
[PATCHv2] Add realmode test for CALL FAR IMM instruction
Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
---
 x86/realmode.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/x86/realmode.c b/x86/realmode.c
index a833829..d171a56 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -437,6 +437,7 @@ void test_call(void)
 			    "ret\n\t"
 			    "2:\t");
 	MK_INSN(call_far1, "lcallw *(%ebx)\n\t");
+	MK_INSN(call_far2, "lcallw $0, $retf\n\t");
 	MK_INSN(ret_imm,	"sub $10, %sp; jmp 2f; 1: retw $10; 2: callw 1b");
 
 	exec_in_big_real_mode(&insn_call1);
@@ -453,6 +454,9 @@ void test_call(void)
 	exec_in_big_real_mode(&insn_call_far1);
 	report("call far 1", 0, 1);
 
+	exec_in_big_real_mode(&insn_call_far2);
+	report("call far 2", 0, 1);
+
 	exec_in_big_real_mode(&insn_ret_imm);
 	report("ret imm 1", 0, 1);
 }
--
1.7.0.4
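Editor's note: the v1 and v2 forms of the test should assemble to the same five bytes. The following is a minimal sketch of that encoding, assuming 16-bit operand size as in the real-mode test: opcode 0x9a, then the 16-bit offset, then the 16-bit segment selector, both little endian. The helper name encode_call_far16() is hypothetical and exists only for illustration; it is not part of kvm-unit-tests or the KVM emulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode CALL FAR ptr16:16 (opcode 0x9a) for
 * 16-bit operand size. The immediate is stored as offset first,
 * segment selector second, each in little-endian byte order.
 * Returns the number of bytes written.
 */
static size_t encode_call_far16(uint8_t *buf, uint16_t segment, uint16_t offset)
{
	assert(buf != NULL);

	buf[0] = 0x9a;                      /* CALL FAR ptr16:16 */
	buf[1] = (uint8_t)(offset & 0xff);  /* offset, low byte */
	buf[2] = (uint8_t)(offset >> 8);    /* offset, high byte */
	buf[3] = (uint8_t)(segment & 0xff); /* segment, low byte */
	buf[4] = (uint8_t)(segment >> 8);   /* segment, high byte */
	return 5;
}
```

With segment 0 and the offset of the retf label, this gives the same byte sequence as the ".byte 0x9a / .word retf / .word 0x00" form from v1, which is why the mnemonic form in v2 can replace it.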
Re: [PATCH 2/3] S390: Add virtio hotplug add support
On Tue, Aug 24, 2010 at 03:48:51PM +0200, Alexander Graf wrote:
> +static void hotplug_devices(struct work_struct *dummy)
> +{
> +	unsigned int i;
> +	struct kvm_device_desc *d;
> +	struct device *dev;
> +
> +	for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {

This should be

	for (i = 0; i + desc_size(d) <= PAGE_SIZE; i += desc_size(d)) {

otherwise you might have memory accesses beyond the device page...

> +		d = kvm_devices + i;
> +
> +		/* end of list */
> +		if (d->type == 0)
> +			break;

...even if that should not happen if everything works. But let's be paranoid.
Re: [PATCH 2/3] S390: Add virtio hotplug add support
On 25.08.2010, at 10:16, Heiko Carstens wrote:
> On Tue, Aug 24, 2010 at 03:48:51PM +0200, Alexander Graf wrote:
>> +static void hotplug_devices(struct work_struct *dummy)
>> +{
>> +	unsigned int i;
>> +	struct kvm_device_desc *d;
>> +	struct device *dev;
>> +
>> +	for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {
>
> This should be
>
> 	for (i = 0; i + desc_size(d) <= PAGE_SIZE; i += desc_size(d)) {
>
> otherwise you might have memory accesses beyond the device page...

Oh, this is a simple copy-paste from the original search method. Is d valid in the first part of the loop already?

>> +		d = kvm_devices + i;
>> +
>> +		/* end of list */
>> +		if (d->type == 0)
>> +			break;
>
> ...even if that should not happen if everything works. But let's be paranoid.

Yeah :). I like paranoid.

Alex
KVM doesn't send an arp announce after live migrating a domain
Hey guys,

not sure if this is a bug or a feature request. It's just something we've noticed and are having problems with.

We're using the qemu-kvm lenny-backports package on Debian 5.0.5. When doing a live migration from the virsh shell, the server in question becomes unreachable because the ARP cache on our switches still thinks the server is on another port. As soon as the server sends out some traffic, such as a ping, the ARP cache gets updated as expected. If it does nothing, the server remains unreachable until the ARP cache expires on the switches (in our case, 4 hours).

We would like to be able to do live migration for customer machines to which we have no access, so we really need KVM to send out an ARP announcement/gratuitous ARP when doing a live migration.

Could anyone tell me if this is a bug in KVM, libvirt, or the Debian qemu-kvm package? (Or if I'm doing something wrong? :-) )

I've been tcpdumping the bridged network interfaces on the hosts while doing the migration, and couldn't see any ARP broadcasts.

Debian 5.0.5
Kernel: 2.6.32-bpo.3-amd64
qemu-kvm 0.12.4+dfsg-1~bpo50+2
libvirt0 0.7.6-1~bpo50+1

migrate --live testserver qemu+ssh://192.168.1.3/system

Thanks in advance,
Nils
Re: [PATCH 2/3] S390: Add virtio hotplug add support
On Wed, Aug 25, 2010 at 10:20:03AM +0200, Alexander Graf wrote:
> On 25.08.2010, at 10:16, Heiko Carstens wrote:
>> On Tue, Aug 24, 2010 at 03:48:51PM +0200, Alexander Graf wrote:
>>> +static void hotplug_devices(struct work_struct *dummy)
>>> +{
>>> +	unsigned int i;
>>> +	struct kvm_device_desc *d;
>>> +	struct device *dev;
>>> +
>>> +	for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {
>>
>> This should be
>>
>> 	for (i = 0; i + desc_size(d) <= PAGE_SIZE; i += desc_size(d)) {
>>
>> otherwise you might have memory accesses beyond the device page...
>
> Oh, this is a simple copypaste from the original search method. Is d
> valid in the first part of the loop already?

No, you would need to initialize it with kvm_devices if you change the loop.
Re: [PATCH 2/3] S390: Add virtio hotplug add support
On 25.08.2010, at 10:35, Heiko Carstens wrote:
> On Wed, Aug 25, 2010 at 10:20:03AM +0200, Alexander Graf wrote:
>> On 25.08.2010, at 10:16, Heiko Carstens wrote:
>>> On Tue, Aug 24, 2010 at 03:48:51PM +0200, Alexander Graf wrote:
>>>> +static void hotplug_devices(struct work_struct *dummy)
>>>> +{
>>>> +	unsigned int i;
>>>> +	struct kvm_device_desc *d;
>>>> +	struct device *dev;
>>>> +
>>>> +	for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {
>>>
>>> This should be
>>>
>>> 	for (i = 0; i + desc_size(d) <= PAGE_SIZE; i += desc_size(d)) {
>>>
>>> otherwise you might have memory accesses beyond the device page...
>>
>> Oh, this is a simple copypaste from the original search method. Is d
>> valid in the first part of the loop already?
>
> No, you would need to initialize it with kvm_devices if you change the loop.

But even then it would take the size of the current entry, not the next one we're actually looking for, no? Maybe it'd be easier to just convert this into a while loop and explicitly open code everything.

Alex
Re: KVM doesn't send an arp announce after live migrating a domain
25.08.2010 11:52, Nils Cant wrote:
> Hey guys,
>
> not sure if this is a bug or a feature request. It's just something
> we've noticed and are having problems with.
>
> We're using the qemu-kvm lenny-backports package on Debian 5.0.5.
> When doing a live migration from the virsh shell, the server in question
> becomes unreachable because the ARP cache on our switches still thinks
> the server is on another port.
> As soon as the server sends out some traffic, such as a ping, the ARP
> cache gets updated as expected. If it does nothing, the server remains
> unreachable until the ARP cache expires on the switches. (in our case 4
> hours)
>
> We would like to be able to do live migration for customer machines on
> which we have no access, so we really need KVM to send out an ARP
> announcement/gratuitous ARP when doing a live migration.
>
> Could anyone tell me if this is a bug in KVM, libvirt, or the debian
> qemu-kvm package? (or if I'm doing something wrong? :-) )

It's probably a bug in your understanding ;)

Jokes aside, the thing is that kvm does not know what an ARP is or what an IP address is. It emulates a hardware network card, which never sends any ARP on its own; it is the operating system's IP stack that does that. The network card as emulated by kvm does not know what IP addresses are assigned to it inside the guest (there may be many, or none at all), so it simply cannot send the ARPs.

These ARPs should be sent by the guest. Another question is how to force or tell it to do so, and this, again, depends on the guest operating system, the number of addresses assigned to the interface, and so on.

The mechanism to trigger it may be based on the link status of the card, for example: kvm may lower it for a few ms right after migration, to indicate that the cable was unplugged and plugged back in, to force the guest to do whatever it needs to do... But that's just a possibility for future development.

/mjt
Re: KVM doesn't send an arp announce after live migrating a domain
On Wed, Aug 25, 2010 at 09:52:16AM +0200, Nils Cant wrote:
> Hey guys,
>
> not sure if this is a bug or a feature request. It's just something
> we've noticed and are having problems with.
>
> We're using the qemu-kvm lenny-backports package on Debian 5.0.5.
> When doing a live migration from the virsh shell, the server in question
> becomes unreachable because the ARP cache on our switches still thinks
> the server is on another port.
> As soon as the server sends out some traffic, such as a ping, the ARP
> cache gets updated as expected. If it does nothing, the server remains
> unreachable until the ARP cache expires on the switches. (in our case 4
> hours)
>
> We would like to be able to do live migration for customer machines on
> which we have no access, so we really need KVM to send out an ARP
> announcement/gratuitous ARP when doing a live migration.
>
> Could anyone tell me if this is a bug in KVM, libvirt, or the debian
> qemu-kvm package? (or if I'm doing something wrong? :-) )

qemu sends gratuitous ARP after migration. Check forward delay setting on your bridge interface. It should be set to zero.

> I've been tcpdumping the bridged network interfaces on the hosts while
> doing the migrate, and couldn't see any ARP broadcasts.
>
> Debian 5.0.5
> Kernel: 2.6.32-bpo.3-amd64
> qemu-kvm 0.12.4+dfsg-1~bpo50+2
> libvirt0 0.7.6-1~bpo50+1
>
> migrate --live testserver qemu+ssh://192.168.1.3/system
>
> Thanks in advance,
> Nils

--
Gleb.
Re: KVM doesn't send an arp announce after live migrating a domain
On Wed, Aug 25, 2010 at 12:37:21PM +0400, Michael Tokarev wrote:
> 25.08.2010 11:52, Nils Cant wrote:
>> Hey guys,
>>
>> not sure if this is a bug or a feature request. It's just something
>> we've noticed and are having problems with.
>>
>> We're using the qemu-kvm lenny-backports package on Debian 5.0.5.
>> When doing a live migration from the virsh shell, the server in question
>> becomes unreachable because the ARP cache on our switches still thinks
>> the server is on another port.
>> As soon as the server sends out some traffic, such as a ping, the ARP
>> cache gets updated as expected. If it does nothing, the server remains
>> unreachable until the ARP cache expires on the switches. (in our case 4
>> hours)
>>
>> We would like to be able to do live migration for customer machines on
>> which we have no access, so we really need KVM to send out an ARP
>> announcement/gratuitous ARP when doing a live migration.
>>
>> Could anyone tell me if this is a bug in KVM, libvirt, or the debian
>> qemu-kvm package? (or if I'm doing something wrong? :-) )
>
> It's probably a bug in your understanding ;)
>
> Jokes aside, the thing is that kvm does not know what is an ARP
> and what is an IP address. It emulates a hardware network card,
> which never sends any ARP out by its own, it is the operating
> system IP stack who's doing that. That network card as emulated
> by kvm does not know what IP addresses are assigned to it inside
> the guest (there may be many, or may be none at all), so it just
> can not send the ARPs.

True. Although qemu sends gratuitous ARP the IP field there is incorrect. It is done to update layer 2 topology, not layer 3.

> These ARPs should be sent by guest. Another question is how to
> force/tell it to do so, and this is, again, depends on the guest
> operating system, number of addresses assigned to the interface
> and so on.
>
> The mechanism to trigger it may be based on link status of the
> card for example - kvm may lower it for a few ms right after
> migration, to indicate that the cord were un-plugged and plugged
> back, to force the guest to do whatever it needs to do... But
> that's just a possibility for future development.
>
> /mjt

--
Gleb.
Re: [PATCH 2/3] S390: Add virtio hotplug add support
On Wed, Aug 25, 2010 at 10:34:29AM +0200, Alexander Graf wrote:
> On 25.08.2010, at 10:35, Heiko Carstens wrote:
>> On Wed, Aug 25, 2010 at 10:20:03AM +0200, Alexander Graf wrote:
>>> On 25.08.2010, at 10:16, Heiko Carstens wrote:
>>>> On Tue, Aug 24, 2010 at 03:48:51PM +0200, Alexander Graf wrote:
>>>>> +static void hotplug_devices(struct work_struct *dummy)
>>>>> +{
>>>>> +	unsigned int i;
>>>>> +	struct kvm_device_desc *d;
>>>>> +	struct device *dev;
>>>>> +
>>>>> +	for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {
>>>>
>>>> This should be
>>>>
>>>> 	for (i = 0; i + desc_size(d) <= PAGE_SIZE; i += desc_size(d)) {
>>>>
>>>> otherwise you might have memory accesses beyond the device page...
>>>
>>> Oh, this is a simple copypaste from the original search method. Is d
>>> valid in the first part of the loop already?
>>
>> No, you would need to initialize it with kvm_devices if you change the loop.
>
> But even then it would take the size of the current entry, not the next
> one we're actually looking for, no? Maybe it'd be easier to just convert
> this into a while loop and explicitly open code everything.

Yes indeed. It's going to be quite ugly if you want to make sure that at no time you're going to access memory beyond the device page. Eeek.. :)
Re: [PATCH 2/3] S390: Add virtio hotplug add support
On 25.08.2010, at 10:48, Heiko Carstens wrote:
> On Wed, Aug 25, 2010 at 10:34:29AM +0200, Alexander Graf wrote:
>> On 25.08.2010, at 10:35, Heiko Carstens wrote:
>>> On Wed, Aug 25, 2010 at 10:20:03AM +0200, Alexander Graf wrote:
>>>> On 25.08.2010, at 10:16, Heiko Carstens wrote:
>>>>> On Tue, Aug 24, 2010 at 03:48:51PM +0200, Alexander Graf wrote:
>>>>>> +static void hotplug_devices(struct work_struct *dummy)
>>>>>> +{
>>>>>> +	unsigned int i;
>>>>>> +	struct kvm_device_desc *d;
>>>>>> +	struct device *dev;
>>>>>> +
>>>>>> +	for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {
>>>>>
>>>>> This should be
>>>>>
>>>>> 	for (i = 0; i + desc_size(d) <= PAGE_SIZE; i += desc_size(d)) {
>>>>>
>>>>> otherwise you might have memory accesses beyond the device page...
>>>>
>>>> Oh, this is a simple copypaste from the original search method. Is d
>>>> valid in the first part of the loop already?
>>>
>>> No, you would need to initialize it with kvm_devices if you change the loop.
>>
>> But even then it would take the size of the current entry, not the next
>> one we're actually looking for, no? Maybe it'd be easier to just convert
>> this into a while loop and explicitly open code everything.
>
> Yes indeed. It's going to be quite ugly if you want to make sure that at
> no time you're going to access memory beyond the device page. Eeek.. :)

Thinking about this a bit more ... it's no problem to access beyond the device page. After the device page follow the rings and we always have rings because we always have a virtio console :).

Alex
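Editor's note: for readers wondering what the "paranoid" while-loop variant discussed above might look like, here is a minimal sketch. It does not use the real struct kvm_device_desc or desc_size(); instead it assumes a made-up two-byte descriptor header (type, payload length), and all demo_ names are hypothetical. The point is only the order of the checks: first make sure the header is inside the page, then read it, then make sure the whole entry fits before stepping over it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_HDR_SIZE  2u	/* hypothetical header: type byte + length byte */

/*
 * Count the descriptors in a page without ever reading past its end.
 * Layout assumed for this sketch only: byte 0 is the type (0 ends the
 * list), byte 1 is the number of payload bytes following the header.
 */
static unsigned int demo_count_descs(const uint8_t *page)
{
	size_t off = 0;
	unsigned int n = 0;

	assert(page != NULL);

	/* Only read a header that lies entirely inside the page. */
	while (off + DEMO_HDR_SIZE <= DEMO_PAGE_SIZE) {
		uint8_t type = page[off];
		size_t size = DEMO_HDR_SIZE + page[off + 1];

		/* end of list */
		if (type == 0)
			break;

		/* The entry itself must not straddle the page end. */
		if (off + size > DEMO_PAGE_SIZE)
			break;

		n++;
		off += size;
	}
	return n;
}
```

As Alex points out, the real code may not need this, since the memory after the device page is always mapped. The sketch only shows that the bounded walk is not much longer once it is written as a while loop.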
Re: KVM doesn't send an arp announce after live migrating a domain
On 08/25/2010 10:38 AM, Gleb Natapov wrote:
> qemu sends gratuitous ARP after migration. Check forward delay setting on
> your bridge interface. It should be set to zero.

Aha! That fixed it. Turns out that Debian bridge-utils sets the default to 15 for bridges. Manually setting it to 0 with 'brctl setfd br0 0' or setting the 'bridge_fd' parameter to 0 in /etc/network/interfaces solves the issue.

Thanks for the help!

Nils
Re: [PATCH kvm-unit-tests 07/10] Correct the tss size
- Avi Kivity a...@redhat.com wrote:
> On 08/24/2010 04:47 PM, Jason Wang wrote:
>> TSS size should be 104 byte.
>>
>> Signed-off-by: Jason Wang jasow...@redhat.com
>> ---
>>  x86/cstart64.S |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/x86/cstart64.S b/x86/cstart64.S
>> index 5d358ad..b871153 100644
>> --- a/x86/cstart64.S
>> +++ b/x86/cstart64.S
>> @@ -69,7 +69,7 @@ tss:
>>  	.long 0
>>  	.quad ring0stacktop - i * 4096
>
> ring 0 stack
>
>>  	.quad 0, 0, 0
>
> rings 1, 2, 3 stack

Hello Avi:

I rechecked with the manual: there is no RSP3 field. So this patch may make sense. But unfortunately it breaks the 64-bit vmexit test. A triple fault happens in setup_args(). Any suggestions, or is there anything I missed?

>> -	.quad 0, 0, 0, 0, 0, 0, 0, 0
>
> 1 qword reserved, 7 qwords IST
>
>> +	.quad 0, 0, 0, 0, 0, 0, 0
>>  	.long 0, 0, 0
>
> 3 dwords reserved + I/O map base address
>
> - so this looks correct?
>
>>  i = i + 1
>>  	.endr
>
> --
> error compiling committee.c: too many arguments to function
[PATCHv2 2/3] KVM: x86 emulator: move string instruction completion check into separate function
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |   37 ++++++++++++++++++++++++-------------
 1 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8a77c99..922f874 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2938,6 +2938,28 @@ done:
 	return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
 
+static bool string_insn_completed(struct x86_emulate_ctxt *ctxt)
+{
+	struct decode_cache *c = &ctxt->decode;
+
+	/* The second termination condition only applies for REPE
+	 * and REPNE. Test if the repeat string operation prefix is
+	 * REPE/REPZ or REPNE/REPNZ and if it's the case it tests the
+	 * corresponding termination condition according to:
+	 * 	- if REPE/REPZ and ZF = 0 then done
+	 * 	- if REPNE/REPNZ and ZF = 1 then done
+	 */
+	if (((c->b == 0xa6) || (c->b == 0xa7) ||
+	     (c->b == 0xae) || (c->b == 0xaf))
+	    && (((c->rep_prefix == REPE_PREFIX) &&
+		 ((ctxt->eflags & EFLG_ZF) == 0))
+		|| ((c->rep_prefix == REPNE_PREFIX) &&
+		    ((ctxt->eflags & EFLG_ZF) == EFLG_ZF))))
+		return true;
+
+	return false;
+}
+
 int
 x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 {
@@ -3426,19 +3448,8 @@ writeback:
 	if (c->rep_prefix && (c->d & String)) {
 		struct read_cache *r = &ctxt->decode.io_read;
 		register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
-		/* The second termination condition only applies for REPE
-		 * and REPNE. Test if the repeat string operation prefix is
-		 * REPE/REPZ or REPNE/REPNZ and if it's the case it tests the
-		 * corresponding termination condition according to:
-		 * 	- if REPE/REPZ and ZF = 0 then done
-		 * 	- if REPNE/REPNZ and ZF = 1 then done
-		 */
-		if (((c->b == 0xa6) || (c->b == 0xa7) ||
-		     (c->b == 0xae) || (c->b == 0xaf))
-		    && (((c->rep_prefix == REPE_PREFIX) &&
-			 ((ctxt->eflags & EFLG_ZF) == 0))
-			|| ((c->rep_prefix == REPNE_PREFIX) &&
-			    ((ctxt->eflags & EFLG_ZF) == EFLG_ZF))))
+
+		if (string_insn_completed(ctxt))
 			ctxt->restart = false;
 		/*
 		 * Re-enter guest when pio read ahead buffer is empty or,
--
1.7.1
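Editor's note: the termination rule factored out by this patch can be tried outside the kernel. Below is a minimal standalone sketch of the same condition. It does not use the emulator's types; the demo_ names, the enum, and the flat argument list are assumptions made for illustration. Opcodes 0xa6/0xa7 are CMPS and 0xae/0xaf are SCAS, the only string instructions whose REPE/REPNE prefix also tests ZF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_EFLG_ZF (1u << 6)	/* ZF is bit 6 of EFLAGS */

/* Hypothetical stand-in for the emulator's rep_prefix field. */
enum demo_rep_prefix { DEMO_REP_NONE, DEMO_REPE_PREFIX, DEMO_REPNE_PREFIX };

/*
 * Second termination condition of a repeated string instruction:
 *   - REPE/REPZ   stops when ZF = 0
 *   - REPNE/REPNZ stops when ZF = 1
 * It applies only to CMPS (0xa6, 0xa7) and SCAS (0xae, 0xaf).
 */
static bool demo_string_insn_completed(uint8_t opcode,
				       enum demo_rep_prefix rep,
				       uint32_t eflags)
{
	bool compares = opcode == 0xa6 || opcode == 0xa7 ||
			opcode == 0xae || opcode == 0xaf;
	bool zf = (eflags & DEMO_EFLG_ZF) != 0;

	assert(rep == DEMO_REP_NONE || rep == DEMO_REPE_PREFIX ||
	       rep == DEMO_REPNE_PREFIX);

	if (!compares)
		return false;

	return (rep == DEMO_REPE_PREFIX && !zf) ||
	       (rep == DEMO_REPNE_PREFIX && zf);
}
```

For example, a REPE CMPSB whose last comparison cleared ZF is finished, while a REP MOVSB (opcode 0xa4) never finishes through this condition and ends only when RCX reaches zero.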
[PATCHv2 1/3] KVM: x86 emulator: Rename variable that shadows another local variable.
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4a89348..8a77c99 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3424,7 +3424,7 @@ writeback:
 				&c->dst);
 
 	if (c->rep_prefix && (c->d & String)) {
-		struct read_cache *rc = &ctxt->decode.io_read;
+		struct read_cache *r = &ctxt->decode.io_read;
 		register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
 		/* The second termination condition only applies for REPE
 		 * and REPNE. Test if the repeat string operation prefix is
@@ -3444,8 +3444,8 @@ writeback:
 		 * Re-enter guest when pio read ahead buffer is empty or,
 		 * if it is not used, after each 1024 iteration.
 		 */
-		else if ((rc->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
-			 (rc->end != 0 && rc->end == rc->pos)) {
+		else if ((r->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
+			 (r->end != 0 && r->end == r->pos)) {
 			ctxt->restart = false;
 			c->eip = ctxt->eip;
 		}
--
1.7.1
[PATCHv2 3/3] KVM: x86 emulator: get rid of restart in emulation context.
x86_emulate_insn() will return 1 if instruction can be restarted without re-entering a guest. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/include/asm/kvm_emulate.h |4 ++- arch/x86/kvm/emulate.c | 43 arch/x86/kvm/x86.c | 16 ++-- 3 files changed, 30 insertions(+), 33 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index f22e5da..3a82ecd 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -220,7 +220,6 @@ struct x86_emulate_ctxt { /* interruptibility state, as a result of execution of STI or MOV SS */ int interruptibility; - bool restart; /* restart string instruction after writeback */ bool perm_ok; /* do not check permissions if true */ int exception; /* exception that happens during emulation or -1 */ @@ -251,6 +250,9 @@ struct x86_emulate_ctxt { #endif int x86_decode_insn(struct x86_emulate_ctxt *ctxt); +#define EMULATION_FAILED -1 +#define EMULATION_OK 0 +#define EMULATION_RESTART 1 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt); int emulator_task_switch(struct x86_emulate_ctxt *ctxt, u16 tss_selector, int reason, diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 922f874..a4cf1f6 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -438,7 +438,6 @@ static void emulate_exception(struct x86_emulate_ctxt *ctxt, int vec, ctxt-exception = vec; ctxt-error_code = error; ctxt-error_code_valid = valid; - ctxt-restart = false; } static void emulate_gp(struct x86_emulate_ctxt *ctxt, int err) @@ -2635,9 +2634,6 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt) struct opcode opcode, *g_mod012, *g_mod3; struct operand memop = { .type = OP_NONE }; - /* we cannot decode insn before we complete previous rep insn */ - WARN_ON(ctxt-restart); - c-eip = ctxt-eip; c-fetch.start = c-fetch.end = c-eip; ctxt-cs_base = seg_base(ctxt, ops, VCPU_SREG_CS); @@ -2990,10 +2986,8 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt) } if (c-rep_prefix (c-d 
String)) { - ctxt-restart = true; /* All REP prefixes have the same first termination condition */ if (address_mask(c, c-regs[VCPU_REGS_RCX]) == 0) { - ctxt-restart = false; ctxt-eip = c-eip; goto done; } @@ -3449,28 +3443,29 @@ writeback: struct read_cache *r = ctxt-decode.io_read; register_address_increment(c, c-regs[VCPU_REGS_RCX], -1); - if (string_insn_completed(ctxt)) - ctxt-restart = false; - /* -* Re-enter guest when pio read ahead buffer is empty or, -* if it is not used, after each 1024 iteration. -*/ - else if ((r-end == 0 !(c-regs[VCPU_REGS_RCX] 0x3ff)) || -(r-end != 0 r-end == r-pos)) { - ctxt-restart = false; - c-eip = ctxt-eip; + if (!string_insn_completed(ctxt)) { + /* +* Re-enter guest when pio read ahead buffer is empty +* or, if it is not used, after each 1024 iteration. +*/ + if ((r-end != 0 || c-regs[VCPU_REGS_RCX] 0x3ff) + (r-end == 0 || r-end != r-pos)) { + /* +* Reset read cache. Usually happens before +* decode, but since instruction is restarted +* we have to do it here. +*/ + ctxt-decode.mem_read.end = 0; + return EMULATION_RESTART; + } + goto done; /* skip rip writeback */ } } - /* -* reset read cache here in case string instruction is restared -* without decoding -*/ - ctxt-decode.mem_read.end = 0; - if (!ctxt-restart) - ctxt-eip = c-eip; + + ctxt-eip = c-eip; done: - return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0; + return (rc == X86EMUL_UNHANDLEABLE) ? EMULATION_FAILED : EMULATION_OK; twobyte_insn: switch (c-b) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 524bd3f..112213e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4171,18 +4171,17 @@ int emulate_instruction(struct kvm_vcpu *vcpu, restart: r = x86_emulate_insn(vcpu-arch.emulate_ctxt); - if (r) { /* emulation failed */ + if (r == EMULATION_FAILED) { if
Re: [PATCH kvm-unit-tests 07/10] Correct the tss size
On 08/25/2010 12:40 PM, Jason Wang wrote:
> - Avi Kivity a...@redhat.com wrote:
>> On 08/24/2010 04:47 PM, Jason Wang wrote:
>>> TSS size should be 104 byte.
>>>
>>> Signed-off-by: Jason Wang jasow...@redhat.com
>>> ---
>>>  x86/cstart64.S |    2 +-
>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/x86/cstart64.S b/x86/cstart64.S
>>> index 5d358ad..b871153 100644
>>> --- a/x86/cstart64.S
>>> +++ b/x86/cstart64.S
>>> @@ -69,7 +69,7 @@ tss:
>>>  	.long 0
>>>  	.quad ring0stacktop - i * 4096
>>
>> ring 0 stack
>>
>>>  	.quad 0, 0, 0
>>
>> rings 1, 2, 3 stack
>
> Hello Avi:
>
> I rechecked with the manual: there is no RSP3 field. So this patch
> may make sense.

That is true. But please redo it to remove one 0 from the line above, not from the IST.

> But unfortunately it breaks the 64-bit vmexit test. A triple fault
> happens in setup_args(). Any suggestions, or is there anything I missed?

No idea. Can you post an ftrace of the crash?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
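Editor's note: the 104-byte figure and the "no RSP3" observation can be checked mechanically. The sketch below lays out the 64-bit TSS as a C struct and lets the compiler verify its size. The struct and function names (demo_tss64, demo_tss64_init) are hypothetical, and the packed attribute is a GCC/Clang extension assumed here so that the 64-bit fields can sit at 4-byte-aligned offsets as the hardware format requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the 64-bit TSS layout; names are for illustration only. */
struct demo_tss64 {
	uint32_t reserved0;
	uint64_t rsp[3];	/* RSP0, RSP1, RSP2: there is no RSP3 */
	uint64_t reserved1;
	uint64_t ist[7];	/* IST1 .. IST7 */
	uint64_t reserved2;
	uint16_t reserved3;
	uint16_t iomap_base;	/* I/O map base address */
} __attribute__((packed));

/* 4 + 3*8 + 8 + 7*8 + 8 + 2 + 2 = 104 bytes */
_Static_assert(sizeof(struct demo_tss64) == 104,
	       "64-bit TSS must be 104 bytes");

/* Zero the TSS and install only the ring 0 stack pointer. */
static void demo_tss64_init(struct demo_tss64 *tss, uint64_t ring0_stack_top)
{
	assert(tss != NULL);

	memset(tss, 0, sizeof(*tss));
	tss->rsp[0] = ring0_stack_top;
}
```

Read against the assembly in the thread, this matches Avi's request: the ".quad 0, 0, 0" line after the ring 0 stack should shrink to two entries (RSP1 and RSP2), while the following eight quadwords (one reserved plus seven IST slots) stay as they are.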
Re: guest MAC-address isolation
On 08/20/2010 08:48 PM, Robert Rebstock wrote:
> Hello. Thank you for your answer.
>
> - Original Message -
> From: Avi Kivity a...@redhat.com
> To: Robert Rebstock rebst...@scienceworks.com
> Cc: kvm@vger.kernel.org
> Sent: Tuesday, August 17, 2010 11:36:41 AM
> Subject: Re: guest MAC-address isolation
>
>> On 08/06/2010 08:09 PM, Robert Rebstock wrote:
>>> Hello all,
>>>
>>> can anyone recommend a better way to achieve (guest agnostic)
>>> MAC-address isolation in qemu/kvm than with user-mode networking?
>>>
>>> I have multiple guests requiring the same MAC-address, and
>>> user-mode/slirp networking is quite slow.
>>
>> You can put the different guests on different bridges, and use IP
>> routing to connect the two bridges; or you can use ebtables to mangle
>> the MAC addresses.
>
> Could you possibly give me an example? Unfortunately my networking
> skills are not the best, which is not to say that I don't try.
>
> The best I can do, after reading the documentation I could find, is:
>
> ebtables -t nat -A PREROUTING -d 00:11:11:11:11:11 -j dnat --to-dest 00:01:23:45:67:89 --dnat-target ACCEPT
> ebtables -t nat -A POSTROUTING -s 00:01:23:45:67:89 -j snat --to-src 00:11:11:11:11:11 --snat-arp --snat-target ACCEPT
>
> but I can see no way to mangle multiple identical MACs so as to achieve
> layer-2 isolation for my snapshotted VMs.

You could use --in-interface to select packets based on which guest they originated from (for snat).

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [PATCHv2 3/3] KVM: x86 emulator: get rid of restart in emulation context.
On 08/25/2010 12:47 PM, Gleb Natapov wrote:
> x86_emulate_insn() will return 1 if instruction can be restarted
> without re-entering a guest.

Looks good.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
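Editor's note: the calling convention introduced by patch 3/3 (a three-way return code instead of a restart flag in the context) can be sketched in isolation. Everything below is a toy model under stated assumptions: the demo_ names, the context struct, and the simplified "every 1024 iterations" rule are made up for illustration and only mirror the shape of x86_emulate_insn() and its caller, not their actual code. In particular, the real patch leaves RIP unchanged when it goes back to the guest mid-instruction; the toy model simply returns OK in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the EMULATION_* values from the patch, with a demo prefix. */
enum demo_emul_result {
	DEMO_EMULATION_FAILED  = -1,
	DEMO_EMULATION_OK      = 0,
	DEMO_EMULATION_RESTART = 1,
};

/* Hypothetical, heavily reduced emulation context. */
struct demo_ctxt {
	unsigned long rcx;	/* remaining REP iterations */
	int unhandleable;	/* nonzero simulates an emulation failure */
};

/* Emulate one iteration of a repeated string instruction. */
static int demo_emulate_insn(struct demo_ctxt *ctxt)
{
	assert(ctxt != NULL);

	if (ctxt->unhandleable)
		return DEMO_EMULATION_FAILED;
	if (ctxt->rcx == 0)
		return DEMO_EMULATION_OK;

	ctxt->rcx--;

	/* Keep going without re-entering the guest, except when the
	 * count is exhausted or every 1024 iterations. */
	if (ctxt->rcx != 0 && (ctxt->rcx & 0x3ff) != 0)
		return DEMO_EMULATION_RESTART;

	return DEMO_EMULATION_OK;
}

/* Caller side: loop for as long as the emulator asks for a restart. */
static int demo_emulate_instruction(struct demo_ctxt *ctxt,
				    unsigned long *calls)
{
	int r;

	do {
		r = demo_emulate_insn(ctxt);
		(*calls)++;
	} while (r == DEMO_EMULATION_RESTART);

	return r;
}
```

The design point is that the restart decision travels in the return value, so no state has to be cleared on every exit path (exceptions, early termination) as the removed ctxt->restart field required.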
Re: KVM doesn't send an arp announce after live migrating a domain
On 08/25/2010 12:21 PM, Nils Cant wrote:
> On 08/25/2010 10:38 AM, Gleb Natapov wrote:
>> qemu sends gratuitous ARP after migration. Check forward delay setting on your bridge interface. It should be set to zero.
> Aha! That fixed it. Turns out that Debian bridge-utils sets the default to 15 for bridges. Manually setting it to 0 with 'brctl setfd br0 0' or setting the 'bridge_fd' parameter to 0 in /etc/network/interfaces solves the issue.

I think libvirt is doing something about this, copying list for further info.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: KVM doesn't send an arp announce after live migrating a domain
Gleb Natapov wrote:
> On Wed, Aug 25, 2010 at 12:37:21PM +0400, Michael Tokarev wrote:
> []
>> Jokes aside, the thing is that kvm does not know what is an ARP and what is an IP address. It emulates a hardware network card, which never sends any ARP out by its own, it is the operating system IP stack who's doing that. That network card as emulated by kvm does not know what IP addresses are assigned to it inside the guest (there may be many, or may be none at all), so it just can not send the ARPs.
> True. Although qemu sends gratuitous ARP the IP field there is incorrect. It is done to update layer 2 topology, not layer 3.

Actually, the more I think about that, the more it looks like a job for external (for the guest) piece. For example, we may teach libvirt or kvm about IP addresses the guest is using, so that kvm will send these ARPs automatically after migration has completed. It shouldn't be difficult to implement. Something like:

  -net nic,model=virtio,arp=1.2.3.4:5.6.7.8,mac=foo:bar

or, even,

  -net tap,arp=...,...

for the command-line interface, and/or a 'sendarp' monitor command that expects a network device and a list of ip addresses.

Kvm is the most natural place to do that, I think, and it's easy to implement there too (it has the tun device which can inject packets on behalf of the guest)

Yes, the configuration will be duplicated somehow, but that's not a big problem, and it will make things much more reliable.

/mjt
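For reference, the layer-3 announcement discussed above (a gratuitous ARP carrying a real guest IP address) can be sketched in C as follows. This is only an illustration of the frame layout, not qemu's code: the function name build_gratuitous_arp, the ARP_FRAME_LEN constant, and the choice of an ARP request rather than a reply are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN      6
#define ARP_FRAME_LEN 42  /* 14-byte Ethernet header + 28-byte ARP payload */

/*
 * Hypothetical helper: fill buf (at least ARP_FRAME_LEN bytes) with a
 * gratuitous ARP request announcing that IPv4 address `ip` is at `mac`.
 * Returns the number of bytes written.
 */
static size_t build_gratuitous_arp(uint8_t *buf,
                                   const uint8_t mac[ETH_ALEN],
                                   const uint8_t ip[4])
{
    uint8_t *p = buf;

    /* Ethernet header */
    memset(p, 0xff, ETH_ALEN); p += ETH_ALEN;  /* destination: broadcast */
    memcpy(p, mac, ETH_ALEN);  p += ETH_ALEN;  /* source: guest MAC */
    *p++ = 0x08; *p++ = 0x06;                  /* EtherType: ARP */

    /* ARP payload */
    *p++ = 0x00; *p++ = 0x01;                  /* hardware type: Ethernet */
    *p++ = 0x08; *p++ = 0x00;                  /* protocol type: IPv4 */
    *p++ = ETH_ALEN;                           /* hardware address length */
    *p++ = 4;                                  /* protocol address length */
    *p++ = 0x00; *p++ = 0x01;                  /* operation: request */
    memcpy(p, mac, ETH_ALEN);  p += ETH_ALEN;  /* sender MAC */
    memcpy(p, ip, 4);          p += 4;         /* sender IP */
    memset(p, 0x00, ETH_ALEN); p += ETH_ALEN;  /* target MAC: unset */
    memcpy(p, ip, 4);          p += 4;         /* target IP == sender IP */

    return (size_t)(p - buf);
}
```

The sender and target IP fields are identical, which is what makes the ARP gratuitous. The sketch also shows why the emulator needs to be told the guest's address: the MAC is known to the virtual NIC, but the four IP bytes are not.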
Re: KVM doesn't send an arp announce after live migrating a domain
On Wed, Aug 25, 2010 at 01:40:19PM +0300, Avi Kivity wrote:
> On 08/25/2010 12:21 PM, Nils Cant wrote:
>> On 08/25/2010 10:38 AM, Gleb Natapov wrote:
>>> qemu sends gratuitous ARP after migration. Check forward delay setting on your bridge interface. It should be set to zero.
>> Aha! That fixed it. Turns out that Debian bridge-utils sets the default to 15 for bridges. Manually setting it to 0 with 'brctl setfd br0 0' or setting the 'bridge_fd' parameter to 0 in /etc/network/interfaces solves the issue.
> I think libvirt is doing something about this, copying list for further info.

libvirt doesn't set a policy for this. It provides an API for configuring host networking, but we don't override the kernel's forward delay policy, since we don't presume that all bridges are going to have VMs attached. In any case the API isn't available for Debian yet, since no one has ported netcf to Debian, so I assume the OP set bridging up manually. The '15' second default is actually a kernel level default IIRC.

The two main host network configs recommended for use with libvirt+KVM (either NAT or bridging) are documented here:

  http://wiki.libvirt.org/page/Networking

Regards,
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: KVM doesn't send an arp announce after live migrating a domain
On Wed, Aug 25, 2010 at 02:51:31PM +0400, Michael Tokarev wrote:
> Gleb Natapov wrote:
>> On Wed, Aug 25, 2010 at 12:37:21PM +0400, Michael Tokarev wrote:
>> []
>>> Jokes aside, the thing is that kvm does not know what is an ARP and what is an IP address. It emulates a hardware network card, which never sends any ARP out by its own, it is the operating system IP stack who's doing that. That network card as emulated by kvm does not know what IP addresses are assigned to it inside the guest (there may be many, or may be none at all), so it just can not send the ARPs.
>> True. Although qemu sends gratuitous ARP the IP field there is incorrect. It is done to update layer 2 topology, not layer 3.
> Actually, the more I think about that, the more it looks like a job for external (for the guest) piece. For example, we may teach libvirt or kvm about IP addresses the guest is using, so that kvm will send these ARPs automatically after migration has completed. It shouldn't be difficult to implement. Something like:
>
>   -net nic,model=virtio,arp=1.2.3.4:5.6.7.8,mac=foo:bar

Back to static IP age?

> or, even,
>
>   -net tap,arp=...,...
>
> for the command-line interface, and/or a 'sendarp' monitor command that expects a network device and a list of ip addresses.
>
> Kvm is the most natural place to do that, I think, and it's easy to implement there too (it has the tun device which can inject packets on behalf of the guest)
>
> Yes, the configuration will be duplicated somehow, but that's not a big problem, and it will make things much more reliable.

KVM is certainly not the most natural place to do that. Even gratuitous ARP we have today will not work if guest changes mac address. KVM couldn't care less about host network protocols. Management may implement guest daemon that will take appropriate action to restore networking after migration on demand (send gratuitous pigeon if IP over pigeons are used by guest).

--
Gleb.
Re: KVM doesn't send an arp announce after live migrating a domain
On 08/25/2010 01:52 PM, Daniel P. Berrange wrote:
>> I think libvirt is doing something about this, copying list for further info.
> libvirt doesn't set a policy for this. It provides an API for configuring host networking, but we don't override the kernel's forward delay policy, since we don't presume that all bridges are going to have VMs attached. In any case the API isn't available for Debian yet, since no one has ported netcf to Debian, so I assume the OP set bridging up manually. The '15' second default is actually a kernel level default IIRC.
>
> The two main host network configs recommended for use with libvirt+KVM (either NAT or bridging) are documented here:
>
>   http://wiki.libvirt.org/page/Networking

From that page:

  # virsh net-define /usr/share/libvirt/networks/default.xml

From my copy of that file:

  <network>
    <name>default</name>
    <bridge name="virbr0" />
    <forward/>
    <ip address="192.168.122.1" netmask="255.255.255.0">
      <dhcp>
        <range start="192.168.122.2" end="192.168.122.254" />
      </dhcp>
    </ip>
  </network>

So it looks like the default config uses the kernel default?

If libvirt uses an existing bridge I agree it shouldn't hack it, but if it creates its own can't it use a sensible default?

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: KVM doesn't send an arp announce after live migrating a domain
On Wed, Aug 25, 2010 at 01:59:02PM +0300, Gleb Natapov wrote:
>> Kvm is the most natural place to do that, I think, and it's easy to implement there too (it has the tun device which can inject packets on behalf of the guest)
>>
>> Yes, the configuration will be duplicated somehow, but that's not a big problem, and it will make things much more reliable.
> KVM is certainly not the most natural place to do that. Even gratuitous ARP we have today will not work if guest changes mac address. KVM couldn't care less about host network protocols. Management may implement

Correction: couldn't care less about _guest_ network protocols

> guest daemon that will take appropriate action to restore networking after migration on demand (send gratuitous pigeon if IP over pigeons are used by guest).

--
Gleb.
Re: KVM doesn't send an arp announce after live migrating a domain
On Wed, Aug 25, 2010 at 02:05:45PM +0300, Avi Kivity wrote:
> On 08/25/2010 01:52 PM, Daniel P. Berrange wrote:
>>> I think libvirt is doing something about this, copying list for further info.
>> libvirt doesn't set a policy for this. It provides an API for configuring host networking, but we don't override the kernel's forward delay policy, since we don't presume that all bridges are going to have VMs attached. In any case the API isn't available for Debian yet, since no one has ported netcf to Debian, so I assume the OP set bridging up manually. The '15' second default is actually a kernel level default IIRC.
>>
>> The two main host network configs recommended for use with libvirt+KVM (either NAT or bridging) are documented here:
>>
>>   http://wiki.libvirt.org/page/Networking
> From that page:
>
>   # virsh net-define /usr/share/libvirt/networks/default.xml
>
> From my copy of that file:
>
>   <network>
>     <name>default</name>
>     <bridge name="virbr0" />
>     <forward/>
>     <ip address="192.168.122.1" netmask="255.255.255.0">
>       <dhcp>
>         <range start="192.168.122.2" end="192.168.122.254" />
>       </dhcp>
>     </ip>
>   </network>
>
> So it looks like the default config uses the kernel default?
>
> If libvirt uses an existing bridge I agree it shouldn't hack it, but if it creates its own can't it use a sensible default?

That is the NAT virtual network. That one *does* default to a forward delay of 0, but since it is NAT, it is fairly useless for migration in any case. If you do 'virsh net-dumpxml default' you should see that delay='0' was added.

The OP was using bridging rather than NAT though, so this XML example doesn't apply. My comments about libvirt not overriding kernel policy for forward delay were WRT full bridging mode, not the NAT mode[1]

Regards,
Daniel

[1] Yes, the NAT mode uses a bridge as an implementation detail, but there's no physical NIC in that bridge - it is merely to connect the TAP devices together. Connection to the LAN is forwarded + NAT.
Re: KVM doesn't send an arp announce after live migrating a domain
On 08/25/2010 02:15 PM, Daniel P. Berrange wrote:
>> So it looks like the default config uses the kernel default?
>>
>> If libvirt uses an existing bridge I agree it shouldn't hack it, but if it creates its own can't it use a sensible default?
> That is the NAT virtual network. That one *does* default to a forward delay of 0, but since it is NAT, it is fairly useless for migration in any case. If you do 'virsh net-dumpxml default' you should see that delay='0' was added.
>
> The OP was using bridging rather than NAT though, so this XML example doesn't apply. My comments about libvirt not overriding kernel policy for forward delay were WRT full bridging mode, not the NAT mode[1]

Yes, of course.

Can't libvirt also create a non-NAT bridge? Looks like it would prevent a lot of manual work and opportunity for misconfiguration.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: KVM doesn't send an arp announce after live migrating a domain
On Wed, Aug 25, 2010 at 02:30:01PM +0300, Avi Kivity wrote:
> On 08/25/2010 02:15 PM, Daniel P. Berrange wrote:
>>> So it looks like the default config uses the kernel default?
>>>
>>> If libvirt uses an existing bridge I agree it shouldn't hack it, but if it creates its own can't it use a sensible default?
>> That is the NAT virtual network. That one *does* default to a forward delay of 0, but since it is NAT, it is fairly useless for migration in any case. If you do 'virsh net-dumpxml default' you should see that delay='0' was added.
>>
>> The OP was using bridging rather than NAT though, so this XML example doesn't apply. My comments about libvirt not overriding kernel policy for forward delay were WRT full bridging mode, not the NAT mode[1]
> Yes, of course.
>
> Can't libvirt also create a non-NAT bridge? Looks like it would prevent a lot of manual work and opportunity for misconfiguration.

Yes, it can on latest Fedora/RHEL6, using the netcf library. This is the new 'virsh iface-XXX' command set (and equivalent APIs). I've not updated the docs to cover this functionality yet though. It also does bonding, and vlans, etc

Daniel
Re: KVM doesn't send an arp announce after live migrating a domain
Gleb Natapov wrote:
> On Wed, Aug 25, 2010 at 02:51:31PM +0400, Michael Tokarev wrote:
> []
>> For example, we may teach libvirt or kvm about IP addresses the guest is using, so that kvm will send these ARPs automatically after migration has completed.
> []
>> Kvm is the most natural place to do that, I think, and it's easy to implement there too (it has the tun device which can inject packets on behalf of the guest)
>>
>> Yes, the configuration will be duplicated somehow, but that's not a big problem, and it will make things much more reliable.
> KVM is certainly not the most natural place to do that. Even gratuitous ARP we have today will not work if guest changes mac address. KVM couldn't care less about host network protocols. Management may implement guest daemon that will take appropriate action to restore networking after migration on demand (send gratuitous pigeon if IP over pigeons are used by guest).

I mean something else. When using standard, the most common configuration, without fancy settings or technologies like IP over pigeons, the easiest way to do that is in kvm, it should be just about 20 lines of code or so. Yes that will not work in some complex setups, where in-guest solution will be needed, but in that case a guest daemon alone won't help, it will need to run some script to do custom actions.

For the MAC address changes for example -- the solution is simple: don't change MAC address in guest. Or if you do, either teach kvm about that (so it'll send proper ARP), or implement custom solution in guest, or don't migrate, or live with delays after migration. There are multiple choices.

Yet doing it the simplest way (in kvm) will cover some 99% cases, and doing it as some daemon in the guest will cover that same 99% cases anyway (for the rest some custom script will be needed). So I think it's the best to implement it in kvm in the most straightforward and easy way.

Yes, some guest notification is probably needed anyway - not only for this case with networks but also in order to notify guest about, say, resume from freeze (after loadvm or migrate from file), after migration and so on, so guest can react to such events in a meaningful way. But this is in parallel with the ability to send an ARP after migration.

Just IMHO of course.

/mjt
Re: KVM doesn't send an arp announce after live migrating a domain
On 08/25/2010 02:36 PM, Daniel P. Berrange wrote:
>> Can't libvirt also create a non-NAT bridge? Looks like it would prevent a lot of manual work and opportunity for misconfiguration.
> Yes, it can on latest Fedora/RHEL6, using the netcf library. This is the new 'virsh iface-XXX' command set (and equivalent APIs). I've not updated the docs to cover this functionality yet though. It also does bonding, and vlans, etc

Great. Is virt-manager able to drive this? It would be great if you could drive everything from there.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: KVM doesn't send an arp announce after live migrating a domain
On Wed, Aug 25, 2010 at 02:38:25PM +0300, Avi Kivity wrote:
> On 08/25/2010 02:36 PM, Daniel P. Berrange wrote:
>>> Can't libvirt also create a non-NAT bridge? Looks like it would prevent a lot of manual work and opportunity for misconfiguration.
>> Yes, it can on latest Fedora/RHEL6, using the netcf library. This is the new 'virsh iface-XXX' command set (and equivalent APIs). I've not updated the docs to cover this functionality yet though. It also does bonding, and vlans, etc
> Great. Is virt-manager able to drive this? it would be great if you could drive everything from there.

Yes, it does now, under the menu

  Edit - Host Details - Network Interfaces

NetworkManager has also finally learnt to ignore ifcfg-XXX files which have a BRIDGE= setting in them, so it shouldn't totally trash your guest bridge networking if you leave NM running.

Regards,
Daniel
Re: KVM doesn't send an arp announce after live migrating a domain
On 08/25/2010 02:42 PM, Daniel P. Berrange wrote:
>> Is virt-manager able to drive this? it would be great if you could drive everything from there.
> Yes, it does now, under the menu
>
>   Edit - Host Details - Network Interfaces
>
> NetworkManager has also finally learnt to ignore ifcfg-XXX files which have a BRIDGE= setting in them, so it shouldn't totally trash your guest bridge networking if you leave NM running.

Cool. I guess what remains is to get people to unlearn all the previous hacks.

(also would be nice to have libvirt talk to NetworkManager instead of /etc/sysconfig)

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: KVM doesn't send an arp announce after live migrating a domain
On Wed, Aug 25, 2010 at 03:36:29PM +0400, Michael Tokarev wrote:
> Gleb Natapov wrote:
>> On Wed, Aug 25, 2010 at 02:51:31PM +0400, Michael Tokarev wrote:
>> []
>>> For example, we may teach libvirt or kvm about IP addresses the guest is using, so that kvm will send these ARPs automatically after migration has completed.
>> []
>>> Kvm is the most natural place to do that, I think, and it's easy to implement there too (it has the tun device which can inject packets on behalf of the guest)
>>>
>>> Yes, the configuration will be duplicated somehow, but that's not a big problem, and it will make things much more reliable.
>> KVM is certainly not the most natural place to do that. Even gratuitous ARP we have today will not work if guest changes mac address. KVM couldn't care less about host network protocols. Management may implement guest daemon that will take appropriate action to restore networking after migration on demand (send gratuitous pigeon if IP over pigeons are used by guest).
> I mean something else. When using standard, the most common configuration, without fancy settings or technologies like IP over pigeons, the easiest way to do that is in kvm, it should be just about 20 lines of code or so. Yes that will not work in some complex setups, where in-guest solution will be needed, but in that case a guest daemon alone won't help, it will need to run some script to do custom actions.
>
> For the MAC address changes for example -- the solution is simple: don't change MAC address in guest. Or if you do, either teach kvm about that (so it'll send proper ARP), or implement custom solution in guest, or don't migrate, or live with delays after migration. There are multiple choices.
>
> Yet doing it the simplest way (in kvm) will cover some 99% cases, and doing it as some daemon in the guest will cover that same 99% cases anyway (for the rest some custom script will be needed). So I think it's the best to implement it in kvm in the most straightforward and easy way.
>
> Yes, some guest notification is probably needed anyway - not only for this case with networks but also in order to notify guest about, say, resume from freeze (after loadvm or migrate from file), after migration and so on, so guest can react to such events in a meaningful way. But this is in parallel with the ability to send an ARP after migration.
>
> Just IMHO of course.

The most common case is for guest to use DHCP to obtain IP dynamically, so KVM cannot know what IP to use without sniffing network. And no, we do not want to pass mac/ip from guest to host. For that guest agent is needed anyway and it can send gratuitous ARP by itself.

--
Gleb.
Re: [PATCH kvm-unit-tests 07/10] Correct the tss size
- Avi Kivity a...@redhat.com wrote:
> On 08/25/2010 12:40 PM, Jason Wang wrote:
>> - Avi Kivity a...@redhat.com wrote:
>>> On 08/24/2010 04:47 PM, Jason Wang wrote:
>>>> TSS size should be 104 byte.
>>>>
>>>> Signed-off-by: Jason Wang jasow...@redhat.com
>>>> ---
>>>>  x86/cstart64.S |2 +-
>>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/x86/cstart64.S b/x86/cstart64.S
>>>> index 5d358ad..b871153 100644
>>>> --- a/x86/cstart64.S
>>>> +++ b/x86/cstart64.S
>>>> @@ -69,7 +69,7 @@ tss:
>>>>  .long 0
>>>>  .quad ring0stacktop - i * 4096 ring 0 stack
>>>>  .quad 0, 0, 0 rings 1, 2, 3 stack
>> Hello avi:
>>
>> Recheck with the manual, there's no field of RSP3. So this patch may make sense.
> That is true. But please redo it to remove one 0 from the line above, not from the IST.
>> But unfortunately it breaks 64bit vmexit test. Triple fault happens in setup_args(). Any suggestions or is there any thing I missed?
> No idea. Can you post an ftrace of the crash?

The trace before triple fault:

..
qemu-kvm-8101 [002] 243.138507: kvm_entry: vcpu 0
qemu-kvm-8101 [002] 243.138508: kvm_exit: reason IO_INSTRUCTION rip 0x400e5f
qemu-kvm-8101 [002] 243.138508: kvm_pio: pio_read at 0x510 size 2 count 1
qemu-kvm-8101 [002] 243.138512: kvm_entry: vcpu 0
qemu-kvm-8101 [002] 243.138513: kvm_exit: reason IO_INSTRUCTION rip 0x400e71
qemu-kvm-8101 [002] 243.138514: kvm_emulate_insn: 0:400e71: ec (prot64)
qemu-kvm-8101 [002] 243.138515: kvm_pio: pio_write at 0x511 size 1 count 1
qemu-kvm-8101 [002] 243.138519: kvm_entry: vcpu 0
qemu-kvm-8101 [002] 243.138520: kvm_exit: reason IO_INSTRUCTION rip 0x400e71
qemu-kvm-8101 [002] 243.138521: kvm_emulate_insn: 0:400e71: ec (prot64)
qemu-kvm-8101 [002] 243.138521: kvm_pio: pio_write at 0x511 size 1 count 1
qemu-kvm-8101 [002] 243.138525: kvm_entry: vcpu 0
qemu-kvm-8101 [002] 243.138526: kvm_exit: reason CPUID rip 0x400ff7
qemu-kvm-8101 [002] 243.138526: kvm_cpuid: func 1 rax 6d3 rbx 800 rcx 80002001 rdx 78bfbfd
qemu-kvm-8101 [002] 243.138527: kvm_entry: vcpu 0
qemu-kvm-8101 [002] 243.138528: kvm_exit: reason EXCEPTION_NMI rip 0x400271
qemu-kvm-8101 [002] 243.138528: kvm_page_fault: address 40f3a0 error_code b
qemu-kvm-8101 [002] 243.138530: kvm_entry: vcpu 0
qemu-kvm-8101 [002] 243.138531: kvm_exit: reason TRIPLE_FAULT rip 0x400c15
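As a cross-check of the 104-byte figure in the patch description, the 64-bit TSS layout from the Intel and AMD architecture manuals can be written down as a packed C struct. This is a sketch for illustration only and is not code from kvm-unit-tests: the struct name tss64 and its field names are made up here, and __attribute__((packed)) assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the architectural 64-bit TSS. */
struct tss64 {
    uint32_t reserved0;   /* offset 0 */
    uint64_t rsp[3];      /* offset 4: RSP0..RSP2 (there is no RSP3) */
    uint64_t reserved1;   /* offset 28 */
    uint64_t ist[7];      /* offset 36: IST1..IST7 */
    uint64_t reserved2;   /* offset 92 */
    uint16_t reserved3;   /* offset 100 */
    uint16_t iomap_base;  /* offset 102: I/O permission bitmap offset */
} __attribute__((packed));

/* Compile-time checks (C11) that the layout matches the manual. */
_Static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");
_Static_assert(offsetof(struct tss64, ist) == 36, "IST1 is at offset 36");
_Static_assert(offsetof(struct tss64, iomap_base) == 102,
               "I/O map base is at offset 102");
```

The three-entry rsp array corresponds to the point made in the thread: the ring 0 stack quad plus a further `.quad 0, 0, 0` gives four stack-pointer slots where the architecture defines only three, which is why one 0 has to go from that line rather than from the IST.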
Re: [PATCH kvm-unit-tests 07/10] Correct the tss size
On 08/25/2010 03:27 PM, Jason Wang wrote:
> - Avi Kivity a...@redhat.com wrote:
>> On 08/25/2010 12:40 PM, Jason Wang wrote:
>>> - Avi Kivity a...@redhat.com wrote:
>>>> On 08/24/2010 04:47 PM, Jason Wang wrote:
>>>>> TSS size should be 104 byte.
>>>>>
>>>>> Signed-off-by: Jason Wang jasow...@redhat.com
>>>>> ---
>>>>>  x86/cstart64.S |2 +-
>>>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>
>>>>> diff --git a/x86/cstart64.S b/x86/cstart64.S
>>>>> index 5d358ad..b871153 100644
>>>>> --- a/x86/cstart64.S
>>>>> +++ b/x86/cstart64.S
>>>>> @@ -69,7 +69,7 @@ tss:
>>>>>  .long 0
>>>>>  .quad ring0stacktop - i * 4096 ring 0 stack
>>>>>  .quad 0, 0, 0 rings 1, 2, 3 stack
>>> Hello avi:
>>>
>>> Recheck with the manual, there's no field of RSP3. So this patch may make sense.
>> That is true. But please redo it to remove one 0 from the line above, not from the IST.
>>> But unfortunately it breaks 64bit vmexit test. Triple fault happens in setup_args(). Any suggestions or is there any thing I missed?
>> No idea. Can you post an ftrace of the crash?
> The trace before triple fault:
>
> ..
> qemu-kvm-8101 [002] 243.138507: kvm_entry: vcpu 0
> qemu-kvm-8101 [002] 243.138508: kvm_exit: reason IO_INSTRUCTION rip 0x400e5f
> qemu-kvm-8101 [002] 243.138508: kvm_pio: pio_read at 0x510 size 2 count 1
> qemu-kvm-8101 [002] 243.138512: kvm_entry: vcpu 0
> qemu-kvm-8101 [002] 243.138513: kvm_exit: reason IO_INSTRUCTION rip 0x400e71
> qemu-kvm-8101 [002] 243.138514: kvm_emulate_insn: 0:400e71: ec (prot64)
> qemu-kvm-8101 [002] 243.138515: kvm_pio: pio_write at 0x511 size 1 count 1
> qemu-kvm-8101 [002] 243.138519: kvm_entry: vcpu 0
> qemu-kvm-8101 [002] 243.138520: kvm_exit: reason IO_INSTRUCTION rip 0x400e71
> qemu-kvm-8101 [002] 243.138521: kvm_emulate_insn: 0:400e71: ec (prot64)
> qemu-kvm-8101 [002] 243.138521: kvm_pio: pio_write at 0x511 size 1 count 1
> qemu-kvm-8101 [002] 243.138525: kvm_entry: vcpu 0
> qemu-kvm-8101 [002] 243.138526: kvm_exit: reason CPUID rip 0x400ff7
> qemu-kvm-8101 [002] 243.138526: kvm_cpuid: func 1 rax 6d3 rbx 800 rcx 80002001 rdx 78bfbfd
> qemu-kvm-8101 [002] 243.138527: kvm_entry: vcpu 0
> qemu-kvm-8101 [002] 243.138528: kvm_exit: reason EXCEPTION_NMI rip 0x400271
> qemu-kvm-8101 [002] 243.138528: kvm_page_fault: address 40f3a0 error_code b
> qemu-kvm-8101 [002] 243.138530: kvm_entry: vcpu 0
> qemu-kvm-8101 [002] 243.138531: kvm_exit: reason TRIPLE_FAULT rip 0x400c15

What's the corresponding disassembly?

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH kvm-unit-tests 07/10] Correct the tss size
- Avi Kivity a...@redhat.com wrote:
> On 08/25/2010 03:27 PM, Jason Wang wrote:
>> - Avi Kivity a...@redhat.com wrote:
>>> On 08/25/2010 12:40 PM, Jason Wang wrote:
>>>> - Avi Kivity a...@redhat.com wrote:
>>>>> On 08/24/2010 04:47 PM, Jason Wang wrote:
>>>>>> TSS size should be 104 byte.
>>>>>>
>>>>>> Signed-off-by: Jason Wang jasow...@redhat.com
>>>>>> ---
>>>>>>  x86/cstart64.S |2 +-
>>>>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>>
>>>>>> diff --git a/x86/cstart64.S b/x86/cstart64.S
>>>>>> index 5d358ad..b871153 100644
>>>>>> --- a/x86/cstart64.S
>>>>>> +++ b/x86/cstart64.S
>>>>>> @@ -69,7 +69,7 @@ tss:
>>>>>>  .long 0
>>>>>>  .quad ring0stacktop - i * 4096 ring 0 stack
>>>>>>  .quad 0, 0, 0 rings 1, 2, 3 stack
>>>> Hello avi:
>>>>
>>>> Recheck with the manual, there's no field of RSP3. So this patch may make sense.
>>> That is true. But please redo it to remove one 0 from the line above, not from the IST.
>>>> But unfortunately it breaks 64bit vmexit test. Triple fault happens in setup_args(). Any suggestions or is there any thing I missed?
>>> No idea. Can you post an ftrace of the crash?
>> The trace before triple fault:
>>
>> ..
>> qemu-kvm-8101 [002] 243.138507: kvm_entry: vcpu 0
>> qemu-kvm-8101 [002] 243.138508: kvm_exit: reason IO_INSTRUCTION rip 0x400e5f
>> qemu-kvm-8101 [002] 243.138508: kvm_pio: pio_read at 0x510 size 2 count 1
>> qemu-kvm-8101 [002] 243.138512: kvm_entry: vcpu 0
>> qemu-kvm-8101 [002] 243.138513: kvm_exit: reason IO_INSTRUCTION rip 0x400e71
>> qemu-kvm-8101 [002] 243.138514: kvm_emulate_insn: 0:400e71: ec (prot64)
>> qemu-kvm-8101 [002] 243.138515: kvm_pio: pio_write at 0x511 size 1 count 1
>> qemu-kvm-8101 [002] 243.138519: kvm_entry: vcpu 0
>> qemu-kvm-8101 [002] 243.138520: kvm_exit: reason IO_INSTRUCTION rip 0x400e71
>> qemu-kvm-8101 [002] 243.138521: kvm_emulate_insn: 0:400e71: ec (prot64)
>> qemu-kvm-8101 [002] 243.138521: kvm_pio: pio_write at 0x511 size 1 count 1
>> qemu-kvm-8101 [002] 243.138525: kvm_entry: vcpu 0
>> qemu-kvm-8101 [002] 243.138526: kvm_exit: reason CPUID rip 0x400ff7
>> qemu-kvm-8101 [002] 243.138526: kvm_cpuid: func 1 rax 6d3 rbx 800 rcx 80002001 rdx 78bfbfd
>> qemu-kvm-8101 [002] 243.138527: kvm_entry: vcpu 0
>> qemu-kvm-8101 [002] 243.138528: kvm_exit: reason EXCEPTION_NMI rip 0x400271
>> qemu-kvm-8101 [002] 243.138528: kvm_page_fault: address 40f3a0 error_code b
>> qemu-kvm-8101 [002] 243.138530: kvm_entry: vcpu 0
>> qemu-kvm-8101 [002] 243.138531: kvm_exit: reason TRIPLE_FAULT rip 0x400c15
> What's the corresponding disassembly?

00400bb8 <__setup_args>:
  400bb8: 41 55                 push   %r13
  400bba: 41 54                 push   %r12
  400bbc: 55                    push   %rbp
  400bbd: 53                    push   %rbx
  400bbe: 48 8b 1d db e7 00 00  mov    0xe7db(%rip),%rbx   # 40f3a0 <__args>
  400bc5: 41 bc 80 ec 40 00     mov    $0x40ec80,%r12d
  400bcb: 41 bd 80 f0 40 00     mov    $0x40f080,%r13d
  400bd1: eb 42                 jmp    400c15 <__setup_args+0x5d>
  400bd3: 4d 89 65 00           mov    %r12,0x0(%r13)
  400bd7: 0f b6 28              movzbl (%rax),%ebp
  400bda: 40 84 ed              test   %bpl,%bpl
  400bdd: 75 16                 jne    400bf5 <__setup_args+0x3d>
  400bdf: eb 21                 jmp    400c02 <__setup_args+0x4a>
  400be1: 41 88 2c 24           mov    %bpl,(%r12)
  400be5: 49 83 c4 01           add    $0x1,%r12
  400bed: 0f b6 2b              movzbl (%rbx),%ebp
  400bf0: 40 84 ed              test   %bpl,%bpl
  400bf3: 74 0d                 je     400c02 <__setup_args+0x4a>
  400bf5: 40 0f be fd           movsbl %bpl,%edi
  400bf9: e8 a6 ff ff ff        callq  400ba4 <isblank>
  400bfe: 84 c0                 test   %al,%al
  400c00: 74 df                 je     400be1 <__setup_args+0x29>
  400c02: 49 83 c5 08           add    $0x8,%r13
  400c06: 41 c6 04 24 00        movb   $0x0,(%r12)
  400c0b: 49 83 c4 01           add    $0x1,%r12
  400c0f: eb 04                 jmp    400c15 <__setup_args+0x5d>
  400c11: 48 83 c3 01           add    $0x1,%rbx
  400c15: 0f b6 2b              movzbl (%rbx),%ebp
  400c18: 40 0f be fd           movsbl %bpl,%edi
  400c1c: e8 83 ff ff ff        callq  400ba4 <isblank>
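The error_code value in the kvm_page_fault line of the trace can be decoded by hand. Assuming the tracepoint prints it in hexadecimal (so "b" is 0xb), the architectural x86 page-fault error code bits can be unpacked as in the small sketch below. The enum constants and the describe_pf_error function are hypothetical names chosen for this example; they are not taken from KVM.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural x86 page-fault error code bits. */
enum {
    PF_PRESENT = 1u << 0,  /* 0: page not present, 1: fault on a present page */
    PF_WRITE   = 1u << 1,  /* 0: read access,      1: write access */
    PF_USER    = 1u << 2,  /* 0: supervisor mode,  1: user mode */
    PF_RSVD    = 1u << 3,  /* reserved bit set in a paging-structure entry */
    PF_FETCH   = 1u << 4   /* fault on an instruction fetch */
};

/* Hypothetical helper: write a readable description of `code` into `out`. */
static int describe_pf_error(uint32_t code, char *out, size_t len)
{
    return snprintf(out, len, "%s, %s, %s%s%s",
                    (code & PF_PRESENT) ? "present page" : "non-present page",
                    (code & PF_WRITE)   ? "write" : "read",
                    (code & PF_USER)    ? "user mode" : "supervisor mode",
                    (code & PF_RSVD)    ? ", reserved bit set" : "",
                    (code & PF_FETCH)   ? ", instruction fetch" : "");
}
```

Under that assumption, 0xb has bits 0, 1 and 3 set: a supervisor-mode write to a present page, with the reserved-bit flag raised.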
BUG: soft lockup - CPU#0 stuck for 13s! [swapper:0]
Hello!

I try to use KVM on slackware host and centos5 guest with SMP. When the server is idle, I get following errors in /var/log/messages:
http://pastebin.com/4RrMVuXq

Guest:
[r...@centos-5-x64 ~]# uname -a
Linux centos-5-x64.slave 2.6.18-194.11.1.el5 #1 SMP Tue Aug 10 19:05:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Start command:
/usr/local/qemu/bin/qemu-system-x86_64 -name centos-5-x64 -m 384 -smp 2 -cdrom /root/CentOS-5.5-x86_64-netinstall.iso -drive index=0,media=disk,if=virtio,file=/dev/mapper/vgvm-centos5x64,boot=on -vnc 0.0.0.0:1 -net nic,macaddr=fe:e1:de:ad:57:22 -net tap,script=/usr/local/qemu/bin/qemu-ifup -monitor stdio -usb -usbdevice tablet -enable-kvm

Host:
r...@superserver:~# uname -a
Linux superserver 2.6.35.3 #4 SMP Mon Aug 23 12:17:01 MSD 2010 x86_64 Intel(R) Xeon(R) CPU X3460 @ 2.80GHz GenuineIntel GNU/Linux
r...@superserver:~# /usr/local/qemu/bin/qemu-system-x86_64 -version
QEMU PC emulator version 0.12.5 (qemu-kvm-0.12.5), Copyright (c) 2003-2008 Fabrice Bellard

What does it mean?

--
Boris Dolgov.
Re: BUG: soft lockup - CPU#0 stuck for 13s! [swapper:0]
On Wed, Aug 25, 2010 at 5:19 PM, Avi Kivity a...@redhat.com wrote:
> Looks like a bug in our USB emulation. Copying qemu-devel to see if anyone has a clue.

I have disabled USB support, but the bug has not disappeared. Please note that CPU#1 was stuck too.

--
Boris Dolgov.
[stable 2.6.32] instant crash (jump to NULL) with virtio-net, tap, bridge and veth
Hello.

I'm seeing instant host kernel crash triggered by _any_ network activity to/from a kvm guest that's using virtio-net.

My setup is maybe a bit unusual, but here we go. I've a host machine that has one bridge configured, and is running a few kvm virtual machines and a few linux containers (LXC). All the guests/containers are connected to that single bridge - guests using tap devices, lxc containers using veth devices. Host eth0 is connected to the same bridge as well.

The problem happens with virtio-net drivers used in guest (this is windowsXP virtual machine with latest netkvm driver from alt.fedoraproject.org), when I connect to that guest from an LXC container. I.e, when packet goes lxc => veth => bridge => tun => kvm => virtio in guest (or back).

When I connect to the same guest from _host_, it all works as expected. When I change (virtual) NIC in guest to e1000 or older (from 2009) virtio-net driver, it works. When I connect from lxc container to a linux guest with latest virtio-net drivers, it all works as expected too. So only one combination so far that triggers the issue.

This is all with 2.6.32 kernel. Initially it was 2.6.32.15, but 2.6.32.20 behaves the same way too. All 64bit. Also it does NOT happen with 2.6.35.3, the current latest released kernel.
Here's one of captured OOPSes (i did it several times, but they were incomplete): console [netcon0] enabled netconsole: network logging started BUG: unable to handle kernel NULL pointer dereference at (null) IP: [(null)] (null) PGD 177bf2067 PUD 177ae5067 PMD 0 Oops: 0010 [#1] SMP last sysfs file: /sys/devices/virtual/block/md8/md/mismatch_cnt CPU 0 Modules linked in: netconsole configfs squashfs kvm_amd kvm veth autofs4 bridge quota_v2 quota_tree ext4 jbd2 crc16 raid0 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx loop sr_mod cdrom tun powernow_k8 processor thermal_sys 8021q garp stp llc asus_atk0110 hwmon atl1 mii ext3 jbd mbcache raid1 md_mod pata_atiixp ehci_hcd ohci_hcd usbcore nls_base ahci libata sd_mod scsi_mod Pid: 2345, comm: kvm Not tainted 2.6.32-amd64 #2.6.32.20 System Product Name RIP: 0010:[] [(null)] (null) RSP: 0018:880028203e70 EFLAGS: 00010293 RAX: 880179480ec0 RBX: 8801a07770c0 RCX: RDX: RSI: 8801a07770c0 RDI: 8801a07770c0 RBP: 880124b89030 R08: 8125fab0 R09: 880028203e40 R10: R11: R12: 880028210888 R13: 880028210880 R14: 0001e60f R15: 0040 FS: 7fe2da5e5700() GS:88002820() knlGS:f74a59d0 CS: 0010 DS: ES: CR0: 8005003b CR2: CR3: 000177a8a000 CR4: 06f0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process kvm64 (pid: 2345, threadinfo 880177be2000, task 880177a7c0c0) Stack: 8125fbd5 0040 8126013c 8000 0 8800282108b8 0002 880028210888 880028210880 0 81236276 880028203f48 8800282108b8 Call Trace: IRQ [8125fbd5] ? ip_rcv_finish+0x125/0x430 [8126013c] ? ip_rcv+0x25c/0x350 [81236276] ? process_backlog+0x76/0xd0 [81236a18] ? net_rx_action+0xf8/0x1f0 [81059120] ? __do_softirq+0xb0/0x1d0 [8100c56c] ? call_softirq+0x1c/0x30 EOI [8100e595] ? do_softirq+0x65/0xa0 [81236b2e] ? netif_rx_ni+0x1e/0x30 [a014e97a] ? tun_chr_aio_write+0x35a/0x510 [tun] [a014e620] ? tun_chr_aio_write+0x0/0x510 [tun] [810ffea4] ? do_sync_readv_writev+0xd4/0x110 [8106e890] ? autoremove_wake_function+0x0/0x30 [81071709] ? enqueue_hrtimer+0x79/0xc0 [810ffd08] ? 
rw_copy_check_uvector+0x88/0x110 [811005bc] ? do_readv_writev+0xdc/0x220 [8106dafc] ? sys_timer_settime+0x13c/0x2e0 [8110084e] ? sys_writev+0x4e/0x90 [8100b482] ? system_call_fastpath+0x16/0x1b Code: Bad RIP value. RIP [(null)] (null) RSP 880028203e70 CR2: ---[ end trace 1dcd3c52bde0fa25 ]--- Kernel panic - not syncing: Fatal exception in interrupt Pid: 2345, comm: kvm Tainted: G D2.6.32-amd64 #2.6.32.20 Call Trace: IRQ [812c22de] ? panic+0x7a/0x134 [812c23d8] ? printk+0x40/0x48 [8100faa3] ? oops_end+0xa3/0xb0 [8103138a] ? no_context+0xfa/0x260 [812c52a5] ? page_fault+0x25/0x30 [8125fab0] ? ip_rcv_finish+0x0/0x430 [8125fbd5] ? ip_rcv_finish+0x125/0x430 [8126013c] ? ip_rcv+0x25c/0x350 [81236276] ? process_backlog+0x76/0xd0 [81236a18] ? net_rx_action+0xf8/0x1f0 [81059120] ? __do_softirq+0xb0/0x1d0 [8100c56c] ? call_softirq+0x1c/0x30 EOI [8100e595] ? do_softirq+0x65/0xa0 [81236b2e] ? netif_rx_ni+0x1e/0x30 [a014e97a] ?
Re: BUG: soft lockup - CPU#0 stuck for 13s! [swapper:0]
On Wed, Aug 25, 2010 at 5:15 PM, Boris Dolgov bo...@dolgov.name wrote:
> Hello!
>
> I try to use KVM on slackware host and centos5 guest with SMP. When the server is idle, I get following errors in /var/log/messages:
> http://pastebin.com/4RrMVuXq
>
> Guest:
> [r...@centos-5-x64 ~]# uname -a
> Linux centos-5-x64.slave 2.6.18-194.11.1.el5 #1 SMP Tue Aug 10 19:05:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
>
> Start command:
> /usr/local/qemu/bin/qemu-system-x86_64 -name centos-5-x64 -m 384 -smp 2 -cdrom /root/CentOS-5.5-x86_64-netinstall.iso -drive index=0,media=disk,if=virtio,file=/dev/mapper/vgvm-centos5x64,boot=on -vnc 0.0.0.0:1 -net nic,macaddr=fe:e1:de:ad:57:22 -net tap,script=/usr/local/qemu/bin/qemu-ifup -monitor stdio -usb -usbdevice tablet -enable-kvm
>
> Host:
> r...@superserver:~# uname -a
> Linux superserver 2.6.35.3 #4 SMP Mon Aug 23 12:17:01 MSD 2010 x86_64 Intel(R) Xeon(R) CPU X3460 @ 2.80GHz GenuineIntel GNU/Linux
> r...@superserver:~# /usr/local/qemu/bin/qemu-system-x86_64 -version
> QEMU PC emulator version 0.12.5 (qemu-kvm-0.12.5), Copyright (c) 2003-2008 Fabrice Bellard
>
> What does it mean?

The problem also reproduces without SMP or USB for a guest.

I have installed Debian 5.0.5 to another VM, and the problem is unreproducible under it:
debian-5-x64:~# uname -a
Linux debian-5-x64.slave 2.6.26-2-amd64 #1 SMP Thu Aug 19 00:37:36 UTC 2010 x86_64 GNU/Linux

Should I post it to redhat bugzilla as redhat bug, or it is KVM bug?

Also, I have noticed a strange thing. In top on host, Debian qemu process consumes nearly 1% of cpu, but CentOS qemu process consumes nearly 20-30%, even when VM is idle.

The problem also reproduces on centos-i386 guest and doesn't reproduce on debian-5-i386.

--
Boris Dolgov.
Re: BUG: soft lockup - CPU#0 stuck for 13s! [swapper:0]
On 08/25/2010 11:14 AM, Boris Dolgov wrote:
> On Wed, Aug 25, 2010 at 5:15 PM, Boris Dolgov bo...@dolgov.name wrote:
>> Hello!
>>
>> I try to use KVM on slackware host and centos5 guest with SMP. When the server is idle, I get following errors in /var/log/messages:
>> http://pastebin.com/4RrMVuXq
>>
>> Guest:
>> [r...@centos-5-x64 ~]# uname -a
>> Linux centos-5-x64.slave 2.6.18-194.11.1.el5 #1 SMP Tue Aug 10 19:05:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
>>
>> Start command:
>> /usr/local/qemu/bin/qemu-system-x86_64 -name centos-5-x64 -m 384 -smp 2 -cdrom /root/CentOS-5.5-x86_64-netinstall.iso -drive index=0,media=disk,if=virtio,file=/dev/mapper/vgvm-centos5x64,boot=on -vnc 0.0.0.0:1 -net nic,macaddr=fe:e1:de:ad:57:22 -net tap,script=/usr/local/qemu/bin/qemu-ifup -monitor stdio -usb -usbdevice tablet -enable-kvm
>>
>> Host:
>> r...@superserver:~# uname -a
>> Linux superserver 2.6.35.3 #4 SMP Mon Aug 23 12:17:01 MSD 2010 x86_64 Intel(R) Xeon(R) CPU X3460 @ 2.80GHz GenuineIntel GNU/Linux
>> r...@superserver:~# /usr/local/qemu/bin/qemu-system-x86_64 -version
>> QEMU PC emulator version 0.12.5 (qemu-kvm-0.12.5), Copyright (c) 2003-2008 Fabrice Bellard
>>
>> What does it mean?
>
> The problem also reproduces without SMP or USB for a guest.
>
> I have installed Debian 5.0.5 to another VM, and the problem is unreproducible under it:
> debian-5-x64:~# uname -a
> Linux debian-5-x64.slave 2.6.26-2-amd64 #1 SMP Thu Aug 19 00:37:36 UTC 2010 x86_64 GNU/Linux
>
> Should I post it to redhat bugzilla as redhat bug, or it is KVM bug?

Or post it to the CentOS bugzilla since you're not using Red Hat.

There's been various time related issues in RHEL5.x so my inclination would be that it's not an upstream issue until you verify it with the latest upstream.

Regards,
Anthony Liguori

> Also, I have noticed a strange thing. In top on host, Debian qemu process consumes nearly 1% of cpu, but CentOS qemu process consumes nearly 20-30%, even when VM is idle.
>
> The problem also reproduces on centos-i386 guest and doesn't reproduce on debian-5-i386.
Re: BUG: soft lockup - CPU#0 stuck for 13s! [swapper:0]
On Wed, Aug 25, 2010 at 8:19 PM, Anthony Liguori anth...@codemonkey.ws wrote:
> Or post it to the CentOS bugzilla since you're not using Red Hat.

I am using Red Hat, but not on this VM.

> There's been various time related issues in RHEL5.x so my inclination would be that it's not an upstream issue until you verify it with the latest upstream.

Do you mean the latest kvm or the latest RHEL? I tried to use qemu-kvm from git today, but it doesn't compile.

--
Boris Dolgov.
Re: KVM timekeeping and TSC virtualization
On Tue, Aug 24, 2010 at 01:01:38PM -1000, Zachary Amsden wrote:
>>> With this patchset, KVM now has a much stronger guarantee: If you have old guest software running on broken hardware, using SMP virtual machines, you do not get hardware performance and error-free time virtualization. However, if you have new guest software, non-broken hardware, or can simply run UP guests instead of SMP, you can have hardware performance, and it is now error free. Alternatively, you can sacrifice some accuracy and have hardware performance, even for SMP guests, if you can tolerate some minor cross-CPU TSC variation. No other vendor I know of can make that guarantee.
>>>
>>> Zach
>>
>> If the processor has a stable TSC why trap it? I realize you are trying to cover a gauntlet of hardware and guests, so maybe a nerd knob is needed to disable.
>
> Exactly. If you have a stable TSC, we don't trap it. If you don't have a stable TSC, we do. That's the point of these patches.

Wait, don't you trap if host TSC is faster than guest TSC?
Re: [KVM timekeeping 25/35] Add clock catchup mode
On Thu, Aug 19, 2010 at 10:07:39PM -1000, Zachary Amsden wrote:
> Make the clock update handler handle generic clock synchronization, not just KVM clock. We add a catchup mode which keeps passthrough TSC in line with absolute guest TSC.
>
> Signed-off-by: Zachary Amsden zams...@redhat.com
> ---
>  arch/x86/include/asm/kvm_host.h |1 +
>  arch/x86/kvm/x86.c | 55 ++
>  2 files changed, 38 insertions(+), 18 deletions(-)
>
>  	kvm_x86_ops->vcpu_load(vcpu, cpu);
> -	if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
> +	if (unlikely(vcpu->cpu != cpu) || vcpu->arch.tsc_rebase) {
>  		/* Make sure TSC doesn't go backwards */
>  		s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
>  				native_read_tsc() - vcpu->arch.last_host_tsc;
>  		if (tsc_delta < 0)
>  			mark_tsc_unstable("KVM discovered backwards TSC");
> -		if (check_tsc_unstable())
> +		if (check_tsc_unstable()) {
>  			kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
> -		kvm_migrate_timers(vcpu);
> +			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> +		}
> +		if (vcpu->cpu != cpu)
> +			kvm_migrate_timers(vcpu);
>  		vcpu->cpu = cpu;
> +		vcpu->arch.tsc_rebase = 0;
>  	}
>  }
> @@ -1947,6 +1961,12 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	kvm_x86_ops->vcpu_put(vcpu);
>  	kvm_put_guest_fpu(vcpu);
>  	vcpu->arch.last_host_tsc = native_read_tsc();
> +
> +	/* For unstable TSC, force compensation and catchup on next CPU */
> +	if (check_tsc_unstable()) {
> +		vcpu->arch.tsc_rebase = 1;
> +		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> +	}

The mix between catchup, trap versus stable, unstable TSC is confusing and difficult to grasp. Can you please introduce all the infrastructure first, then control usage of them in centralized places? Examples:

> +static void kvm_update_tsc_trapping(struct kvm *kvm)
> +{
> +	int trap, i;
> +	struct kvm_vcpu *vcpu;
> +
> +	trap = check_tsc_unstable() && atomic_read(&kvm->online_vcpus) > 1;
> +	kvm_for_each_vcpu(i, vcpu, kvm)
> +		kvm_x86_ops->set_tsc_trap(vcpu, trap && !vcpu->arch.time_page);
> +}
> +

> +	/* For unstable TSC, force compensation and catchup on next CPU */
> +	if (check_tsc_unstable()) {
> +		vcpu->arch.tsc_rebase = 1;
> +		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> +	}

kvm_guest_time_update is becoming very confusing too. I understand this is due to the many cases it's dealing with, but please make it as simple as possible.

> +	/*
> +	 * If we are trapping and no longer need to, use catchup to
> +	 * ensure passthrough TSC will not be less than trapped TSC
> +	 */
> +	if (vcpu->tsc_mode == TSC_MODE_PASSTHROUGH && vcpu->tsc_trapping &&
> +	    ((this_tsc_khz <= v->kvm->arch.virtual_tsc_khz || kvmclock))) {
> +		catchup = 1;

What, TSC trapping with kvmclock enabled?

For both catchup and trapping the resolution of the host clock is important, as Glauber commented for kvmclock. Can you comment on the problems that arise from a low res clock for both modes?

Similarly for catchup mode, the effect of exit frequency. No need for any guarantees?
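The trapping rule in the quoted kvm_update_tsc_trapping() hunk can be restated as a small standalone predicate. This is a sketch under stated assumptions, not the patch itself: the function name and boolean parameters are hypothetical, standing in for check_tsc_unstable(), the online vCPU count, and whether the vCPU has a kvmclock time page registered.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical restatement of the quoted policy:
 *   trap = unstable host TSC AND more than one online vCPU,
 *   and a vCPU is only trapped if it has no kvmclock page.
 */
bool should_trap_tsc(bool host_tsc_unstable, int online_vcpus,
                     bool has_kvmclock_page)
{
    bool trap = host_tsc_unstable && online_vcpus > 1;

    return trap && !has_kvmclock_page;
}
```

Read this way, a UP guest or a guest using kvmclock is never trapped under this rule alone, which seems to be the source of the "TSC trapping with kvmclock enabled?" question about the separate catchup condition.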
Re: KVM Bridged Networking Issue in Ubuntu Lucid
Thanks for your help, it seems enabling ip forwarding on the host and setting a default gateway on the vm pointing to the host fixed it! On Wed, Aug 25, 2010 at 3:08 AM, haishan haishan@gmail.com wrote: Rus Hughes wrote: Hi guys, I'm trying to sort out networking for a VM I've created on my Ubuntu Lucid box but the VM cannot access the Internet. I can connect to the VM (crisps) from the host (holly) and to the host from the VM. But the VM cannot connect to the Internet and the Internet cannot connect to the VM. I followed the guide here: https://help.ubuntu.com/community/KVM I created it using ubuntu-vm-builder using : ubuntu-vm-builder kvm lucid \ --domain crisps \ --dest crisps \ --arch amd64 \ --hostname crisps \ --mem 2048 \ --rootsize 4096 \ --swapsize 1024 \ --user magicalusernameofdoom \ --pass somesecretawesomepassword \ --ip 178.63.60.159 \ --mask 255.255.255.192 \ --bcast 178.63.60.191 \ --gw 178.63.60.129 \ --dns 213.133.99.99 \ --mirror http://de.archive.ubuntu.com/ubuntu \ --components main,universe,restricted,multiverse \ --addpkg openssh-server \ --libvirt qemu:///system \ --bridge br0; virsh start crisps starts the vm up : root 29297 0.5 3.3 2280396 271220 ? 
Sl 14:12 0:13 /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 2048 -smp 1 -name crisps -uuid e2cda480-29e7-3740-d6c6-816e2f78223e -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/crisps.monitor,server,nowait -monitor chardev:monitor -boot c -drive file=/home/idimmu/vms/crisps/tmplXQBtt.qcow2,if=ide,index=0,boot=on -net nic,macaddr=52:54:00:a6:28:b9,vlan=0,model=virtio,name=virtio.0 -net tap,fd=41,vlan=0,name=tap.0 -serial none -parallel none -usb -vnc 127.0.0.1:0 -vga cirrus Networking on the host is like so: br0 Link encap:Ethernet HWaddr 26:5d:c9:d2:75:2e inet addr:178.63.60.138 Bcast:178.63.60.191 Mask:255.255.255.192 inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:49080900 errors:0 dropped:0 overruns:0 frame:0 TX packets:54086580 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:35266886777 (35.2 GB) TX bytes:60292047383 (60.2 GB) eth0 Link encap:Ethernet HWaddr 40:61:86:e9:d6:9a inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:49418744 errors:0 dropped:0 overruns:0 frame:0 TX packets:54348661 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:36337218359 (36.3 GB) TX bytes:60378047518 (60.3 GB) Interrupt:29 Base address:0x4000 eth0:0 Link encap:Ethernet HWaddr 40:61:86:e9:d6:9a inet addr:178.63.60.178 Bcast:178.63.60.191 Mask:255.255.255.192 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 Interrupt:29 Base address:0x4000 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:1063249 errors:0 dropped:0 overruns:0 frame:0 TX packets:1063249 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:325111540 (325.1 MB) TX bytes:325111540 (325.1 MB) vnet0 Link encap:Ethernet HWaddr 26:5d:c9:d2:75:2e inet6 addr: fe80::245d:c9ff:fed2:752e/64 Scope:Link UP BROADCAST RUNNING MULTICAST 
MTU:1500 Metric:1 RX packets:72 errors:0 dropped:0 overruns:0 frame:0 TX packets:33 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:500 RX bytes:3412 (3.4 KB) TX bytes:2698 (2.6 KB) the hosts /etc/network/interfaces is like so : auto lo iface lo inet loopback # device: eth0 auto eth0 iface eth0 inet manual auto br0 iface br0 inet static address 178.63.60.138 broadcast 178.63.60.191 netmask 255.255.255.192 gateway 178.63.60.129 network 178.63.60.128 bridge_ports eth0 bridge_stp off bridge_fd 0 bridge_maxwait 0 # subli.me.uk auto eth0:0 iface eth0:0 inet static address 178.63.60.178 netmask 255.255.255.192 If I ping the VM I see the packets on the host br0 but not vnet0, this is the output of brctl show on the host : bridge name bridge id STP enabled interfaces br0 8000.265dc9d2752e no eth0 vnet0 this is the routing table on the host Kernel IP routing table Destination Gateway Genmask Flags
Re: BUG: soft lockup - CPU#0 stuck for 13s! [swapper:0]
On Wed, Aug 25, 2010 at 8:19 PM, Anthony Liguori anth...@codemonkey.ws wrote:
> Or post it to the CentOS bugzilla since you're not using Red Hat.
>
> There's been various time related issues in RHEL5.x so my inclination would be that it's not an upstream issue until you verify it with the latest upstream.

After setting clocksource to rtc (instead of dynticks), the problem disappeared.

--
Boris Dolgov.
Re: [PATCH 2/8] KVM: x86 emulator: simplify ALU block (opcodes 00-3F) decode flags
On Mon, Aug 23, 2010 at 02:06:11PM +0300, Avi Kivity wrote:
> Use the new byte/word dual opcode decode.
>
> Signed-off-by: Avi Kivity a...@redhat.com
> ---
>  arch/x86/kvm/emulate.c | 40
>  1 files changed, 16 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 152654e..79a5f81 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2370,42 +2370,34 @@ static struct group_dual group9 = { {
>  static struct opcode opcode_table[256] = {
>  	/* 0x00 - 0x07 */
> -	D(ByteOp | DstMem | SrcReg | ModRM | Lock), D(DstMem | SrcReg | ModRM | Lock),
> -	D(ByteOp | DstReg | SrcMem | ModRM), D(DstReg | SrcMem | ModRM),
> -	D(ByteOp | DstAcc | SrcImm), D(DstAcc | SrcImm),
> +	D2bv(DstMem | SrcReg | ModRM | Lock), D2bv(DstReg | SrcMem | ModRM),
> +	D2bv(DstAcc | SrcImm),
>  	D(ImplicitOps | Stack | No64), D(ImplicitOps | Stack | No64),
>  	/* 0x08 - 0x0F */
> -	D(ByteOp | DstMem | SrcReg | ModRM | Lock), D(DstMem | SrcReg | ModRM | Lock),
> -	D(ByteOp | DstReg | SrcMem | ModRM), D(DstReg | SrcMem | ModRM),
> -	D(ByteOp | DstAcc | SrcImm), D(DstAcc | SrcImm),
> +	D2bv(DstMem | SrcReg | ModRM | Lock), D2bv(DstReg | SrcMem | ModRM),
> +	D2bv(DstAcc | SrcImm),
>  	D(ImplicitOps | Stack | No64), N,
>  	/* 0x10 - 0x17 */
> -	D(ByteOp | DstMem | SrcReg | ModRM | Lock), D(DstMem | SrcReg | ModRM | Lock),
> -	D(ByteOp | DstReg | SrcMem | ModRM), D(DstReg | SrcMem | ModRM),
> -	D(ByteOp | DstAcc | SrcImm), D(DstAcc | SrcImm),
> +	D2bv(DstMem | SrcReg | ModRM | Lock), D2bv(DstReg | SrcMem | ModRM),
> +	D2bv(DstAcc | SrcImm),
>  	D(ImplicitOps | Stack | No64), D(ImplicitOps | Stack | No64),
>  	/* 0x18 - 0x1F */
> -	D(ByteOp | DstMem | SrcReg | ModRM | Lock), D(DstMem | SrcReg | ModRM | Lock),
> -	D(ByteOp | DstReg | SrcMem | ModRM), D(DstReg | SrcMem | ModRM),
> -	D(ByteOp | DstAcc | SrcImm), D(DstAcc | SrcImm),
> +	D2bv(DstMem | SrcReg | ModRM | Lock), D2bv(DstReg | SrcMem | ModRM),
> +	D2bv(DstAcc | SrcImm),
>  	D(ImplicitOps | Stack | No64), D(ImplicitOps | Stack | No64),
>  	/* 0x20 - 0x27 */
> -	D(ByteOp | DstMem | SrcReg | ModRM | Lock), D(DstMem | SrcReg | ModRM | Lock),
> -	D(ByteOp | DstReg | SrcMem | ModRM), D(DstReg | SrcMem | ModRM),
> -	D(ByteOp | DstAcc | SrcImmByte), D(DstAcc | SrcImm), N, N,
> +	D2bv(DstMem | SrcReg | ModRM | Lock), D2bv(DstReg | SrcMem | ModRM),
> +	D2bv(DstAcc | SrcImmByte), N, N,

SrcImm? opcode 0x25...
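The "SrcImm? opcode 0x25..." remark is easier to follow with the macro expansion spelled out. The sketch below assumes D2bv(f) expands to a ByteOp entry followed by a non-byte entry with the same flags; that expansion, the flag bit values, and the struct layout are illustrative assumptions, not copied from emulate.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
enum {
    ByteOp     = 1u << 0,
    DstAcc     = 1u << 1,
    SrcImm     = 1u << 2,
    SrcImmByte = 1u << 3,
};

struct opcode {
    uint32_t flags;
};

#define D(_f)    { .flags = (_f) }
/* Assumed expansion: byte variant first, then the word variant. */
#define D2bv(_f) D((_f) | ByteOp), D(_f)

/* Opcodes 0x24/0x25 as written in the posted hunk. */
const struct opcode acc_imm_as_posted[2] = { D2bv(DstAcc | SrcImmByte) };

/* Opcodes 0x24/0x25 as they were before the conversion. */
const struct opcode acc_imm_before[2] = {
    D(ByteOp | DstAcc | SrcImmByte), D(DstAcc | SrcImm)
};

int entry_has(const struct opcode *op, uint32_t flag)
{
    return (op->flags & flag) != 0;
}
```

Under that assumption the second entry (opcode 0x25) inherits SrcImmByte from the macro argument, whereas the removed line decoded it with SrcImm. By analogy with the other rows, D2bv(DstAcc | SrcImm) looks like the intended form, but the thread does not say which fix was chosen.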
Re: [PATCH v7 5/5] KVM: MMU: combine guest pte read between fetch and pte prefetch
On Sun, Aug 22, 2010 at 07:13:33PM +0800, Xiao Guangrong wrote:
> Combine guest pte read between guest pte check in the fetch path and pte prefetch
>
> Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
> ---
>  arch/x86/kvm/paging_tmpl.h | 40 +---
>  1 files changed, 21 insertions(+), 19 deletions(-)

Applied all, thanks.
Re: KVM timekeeping and TSC virtualization
On 08/25/2010 06:55 AM, Marcelo Tosatti wrote:
> On Tue, Aug 24, 2010 at 01:01:38PM -1000, Zachary Amsden wrote:
>>>> With this patchset, KVM now has a much stronger guarantee: If you have old guest software running on broken hardware, using SMP virtual machines, you do not get hardware performance and error-free time virtualization. However, if you have new guest software, non-broken hardware, or can simply run UP guests instead of SMP, you can have hardware performance, and it is now error free. Alternatively, you can sacrifice some accuracy and have hardware performance, even for SMP guests, if you can tolerate some minor cross-CPU TSC variation. No other vendor I know of can make that guarantee.
>>>>
>>>> Zach
>>>
>>> If the processor has a stable TSC why trap it? I realize you are trying to cover a gauntlet of hardware and guests, so maybe a nerd knob is needed to disable.
>>
>> Exactly. If you have a stable TSC, we don't trap it. If you don't have a stable TSC, we do. That's the point of these patches.
>
> Wait, don't you trap if host TSC is faster than guest TSC?

Yes, but that's not a standard scenario, that only applies for a migration. There's also a way to avoid this:

set guest TSC MAX (all host TSCs in cluster)

Then catchup mode will be used to keep the TSC up to speed.

Actually, it's an open question what to do for SMP in this case, I'm not sure I got that right. Technically the proper thing to do is to trap, but this should have a user override.

Zach
Re: [KVM timekeeping 25/35] Add clock catchup mode
On 08/25/2010 07:27 AM, Marcelo Tosatti wrote:
> On Thu, Aug 19, 2010 at 10:07:39PM -1000, Zachary Amsden wrote:
>> Make the clock update handler handle generic clock synchronization, not just KVM clock. We add a catchup mode which keeps passthrough TSC in line with absolute guest TSC.
>>
>> Signed-off-by: Zachary Amsden zams...@redhat.com
>> ---
>>  arch/x86/include/asm/kvm_host.h |1 +
>>  arch/x86/kvm/x86.c | 55 ++
>>  2 files changed, 38 insertions(+), 18 deletions(-)
>>
>>  	kvm_x86_ops->vcpu_load(vcpu, cpu);
>> -	if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
>> +	if (unlikely(vcpu->cpu != cpu) || vcpu->arch.tsc_rebase) {
>>  		/* Make sure TSC doesn't go backwards */
>>  		s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
>>  				native_read_tsc() - vcpu->arch.last_host_tsc;
>>  		if (tsc_delta < 0)
>>  			mark_tsc_unstable("KVM discovered backwards TSC");
>> -		if (check_tsc_unstable())
>> +		if (check_tsc_unstable()) {
>>  			kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>> -		kvm_migrate_timers(vcpu);
>> +			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>> +		}
>> +		if (vcpu->cpu != cpu)
>> +			kvm_migrate_timers(vcpu);
>>  		vcpu->cpu = cpu;
>> +		vcpu->arch.tsc_rebase = 0;
>>  	}
>>  }
>> @@ -1947,6 +1961,12 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  	kvm_x86_ops->vcpu_put(vcpu);
>>  	kvm_put_guest_fpu(vcpu);
>>  	vcpu->arch.last_host_tsc = native_read_tsc();
>> +
>> +	/* For unstable TSC, force compensation and catchup on next CPU */
>> +	if (check_tsc_unstable()) {
>> +		vcpu->arch.tsc_rebase = 1;
>> +		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>> +	}
>
> The mix between catchup, trap versus stable, unstable TSC is confusing and difficult to grasp. Can you please introduce all the infrastructure first, then control usage of them in centralized places? Examples:
>
>> +static void kvm_update_tsc_trapping(struct kvm *kvm)
>> +{
>> +	int trap, i;
>> +	struct kvm_vcpu *vcpu;
>> +
>> +	trap = check_tsc_unstable() && atomic_read(&kvm->online_vcpus) > 1;
>> +	kvm_for_each_vcpu(i, vcpu, kvm)
>> +		kvm_x86_ops->set_tsc_trap(vcpu, trap && !vcpu->arch.time_page);
>> +}
>> +
>
>> +	/* For unstable TSC, force compensation and catchup on next CPU */
>> +	if (check_tsc_unstable()) {
>> +		vcpu->arch.tsc_rebase = 1;
>> +		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>> +	}
>
> kvm_guest_time_update is becoming very confusing too. I understand this is due to the many cases it's dealing with, but please make it as simple as possible.

I tried to comment as best as I could. I think the whole kvm_update_tsc_trapping thing is probably a poor design choice. It works, but it's thoroughly unintelligible right now without spending some days figuring out why. I'll rework the tail series of patches to try to make them clearer.

>> +	/*
>> +	 * If we are trapping and no longer need to, use catchup to
>> +	 * ensure passthrough TSC will not be less than trapped TSC
>> +	 */
>> +	if (vcpu->tsc_mode == TSC_MODE_PASSTHROUGH && vcpu->tsc_trapping &&
>> +	    ((this_tsc_khz <= v->kvm->arch.virtual_tsc_khz || kvmclock))) {
>> +		catchup = 1;
>
> What, TSC trapping with kvmclock enabled?

Transitioning to use of kvmclock after a cold boot means we may have been trapping and now we will not be.

> For both catchup and trapping the resolution of the host clock is important, as Glauber commented for kvmclock. Can you comment on the problems that arise from a low res clock for both modes?
>
> Similarly for catchup mode, the effect of exit frequency. No need for any guarantees?

The scheduler will do something to get an IRQ at whatever resolution it uses for its timeslice. That guarantees an exit per timeslice, so we'll never be behind by more than one slice while scheduling. While not scheduling, we're dormant anyway, waiting on either an IRQ or shared memory variable change. Local timers could end up behind when dormant.

We may need a hack to accelerate firing of timers in such a case, or perhaps bounds on when to use catchup mode and when to not.

Partly, the lack of implementation is by deliberate choice; the logic involved with setting such bounds and wisdom of doing so is a choice most likely to be done by a policy agent in userspace, in our case, qemu. In the end, that is what has full control over the setting or not of guest TSC rate and choice of TSC mode.

What's lacking is the ability to force the use of a certain mode. I think it's clear now, that needs to be a per-VM choice, not a global one.

Zach
Re: [PATCH kvm-unit-tests 0/3] DIV exception tests
On Tue, Aug 24, 2010 at 02:01:08PM +0300, Avi Kivity wrote:
> DIV may raise an exception if dividing by zero or causing an overflow. Test it.
>
> Avi Kivity (3):
>   Support #DE (divide error) exception handlers
>   Allow emulation tests to trap exceptions
>   Add DIV tests
>
>  config-x86-common.mak |3 ++-
>  x86/emulator.c| 20
>  x86/idt.c |8 +++-
>  3 files changed, 29 insertions(+), 2 deletions(-)

Applied, thanks.
Re: [PATCH 3/3] KVM: x86 emulator: trap and propagate #DE from DIV and IDIV
On Tue, Aug 24, 2010 at 02:10:29PM +0300, Avi Kivity wrote:
> Signed-off-by: Avi Kivity a...@redhat.com
> ---
>  arch/x86/kvm/emulate.c | 17 ++---
>  1 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index f82e43a..a7e26d0 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -505,6 +505,12 @@ static void emulate_ts(struct x86_emulate_ctxt *ctxt, int err)
>  	emulate_exception(ctxt, TS_VECTOR, err, true);
>  }
>
> +static int emulate_de(struct x86_emulate_ctxt *ctxt)
> +{
> +	emulate_exception(ctxt, DE_VECTOR, 0, false);
> +	return X86EMUL_PROPAGATE_FAULT;
> +}
> +
>  static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
>  			      struct x86_emulate_ops *ops,
>  			      unsigned long eip, u8 *dest)
> @@ -1459,6 +1465,7 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
>  	struct decode_cache *c = &ctxt->decode;
>  	unsigned long *rax = &c->regs[VCPU_REGS_RAX];
>  	unsigned long *rdx = &c->regs[VCPU_REGS_RDX];
> +	u8 de = 0;
>
>  	switch (c->modrm_reg) {
>  	case 0 ... 1:	/* test */
> @@ -1477,14 +1484,18 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
>  		emulate_1op_rax_rdx("imul", c->src, *rax, *rdx, ctxt->eflags);
>  		break;
>  	case 6: /* div */
> -		emulate_1op_rax_rdx("div", c->src, *rax, *rdx, ctxt->eflags);
> +		emulate_1op_rax_rdx_ex("div", c->src, *rax, *rdx,
> +				       ctxt->eflags, de);
>  		break;
>  	case 7: /* idiv */
> -		emulate_1op_rax_rdx("idiv", c->src, *rax, *rdx, ctxt->eflags);
> +		emulate_1op_rax_rdx_ex("idiv", c->src, *rax, *rdx,
> +				       ctxt->eflags, de);
>  		break;
>  	default:
>  		return X86EMUL_UNHANDLEABLE;
>  	}
> +	if (de)
> +		return emulate_de(ctxt);
>  	return X86EMUL_CONTINUE;
>  }
>
> @@ -3363,7 +3374,7 @@ special_insn:
>  		break;
>  	case 0xf6 ... 0xf7:	/* Grp3 */
>  		if (emulate_grp3(ctxt, ops) != X86EMUL_CONTINUE)
> -			goto cannot_emulate;
> +			goto done;
>  		break;

Must assign rc to emulate_grp3() retval.
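The review comment above is about the caller dropping the fault code: jumping to done without storing the return value loses the information that a #DE must be propagated. The sketch below models that pattern in plain C for a 32-bit unsigned DIV. All names and the numeric return codes are hypothetical stand-ins, not the emulator's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the emulator's return codes. */
enum { EMUL_CONTINUE = 0, EMUL_PROPAGATE_FAULT = 2 };

/*
 * Model of 32-bit unsigned DIV: EDX:EAX / divisor.
 * A divide error is reported when the divisor is zero or the
 * quotient does not fit in 32 bits; registers are left untouched.
 */
int emulate_div32(uint32_t *eax, uint32_t *edx, uint32_t divisor)
{
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;
    uint64_t quotient;

    if (divisor == 0)
        return EMUL_PROPAGATE_FAULT;

    quotient = dividend / divisor;
    if (quotient > UINT32_MAX)
        return EMUL_PROPAGATE_FAULT;

    *edx = (uint32_t)(dividend % divisor);
    *eax = (uint32_t)quotient;
    return EMUL_CONTINUE;
}

/* Caller that keeps the helper's result in rc before bailing out. */
int emulate_insn(uint32_t *eax, uint32_t *edx, uint32_t divisor)
{
    int rc = emulate_div32(eax, edx, divisor);

    if (rc != EMUL_CONTINUE)
        goto done;

    /* ... further emulation steps would go here ... */
done:
    return rc;
}
```

With rc assigned, the fault code reaches whatever sits behind the done label; with only a comparison, the caller would fall through with its previous rc value.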
Re: [PATCH 1/3] S390: take a full byte as ext_param indicator
On Tue, Aug 24, 2010 at 03:48:50PM +0200, Alexander Graf wrote:
> Currently the ext_param field only distinguishes between config change and vring interrupt. We can do a lot more with it though, so let's enable a full byte of possible values and turn the constants into #defines while at it.
>
> Signed-off-by: Alexander Graf ag...@suse.de
> ---
> v1 -> v2:
>   - move defines to virtio_s390.h
> ---
>  arch/s390/include/asm/kvm_virtio.h |6 ++
>  drivers/s390/kvm/kvm_virtio.c | 19 +--
>  2 files changed, 19 insertions(+), 6 deletions(-)

Applied, thanks.
Re: [PATCH] cgroups: fix API thinko
On Fri, 06 Aug 2010 10:38:24 -0600 Alex Williamson alex.william...@redhat.com wrote:
> On Fri, 2010-08-06 at 09:34 -0700, Sridhar Samudrala wrote:
> > On 8/5/2010 3:59 PM, Michael S. Tsirkin wrote:
> > > The cgroup_attach_task_current_cg API that we have upstream is
> > > backwards: we really need an API to attach to the cgroups from
> > > another process A to the current one.
> > >
> > > In our case (vhost), a privileged user wants to attach its task to
> > > cgroups from a less privileged one; the API makes us run it in the
> > > other task's context, and this fails.
> > >
> > > So let's make the API generic and just pass in 'from' and 'to' tasks.
> > > Add an inline wrapper for cgroup_attach_task_current_cg to avoid
> > > breaking bisect.
> > >
> > > Signed-off-by: Michael S. Tsirkin m...@redhat.com
> > > ---
> > >
> > > Paul, Li, Sridhar, could you please review the following patch?
> > >
> > > I only compile-tested it due to travel, but it looks straightforward
> > > to me. Alex Williamson volunteered to test and report the results.
> > > Sending out now for review as I might be offline for a bit.
> > > Will only try to merge when done, obviously.
> > >
> > > If OK, I would like to merge this through the -net tree, together
> > > with the patch fixing vhost-net. Let me know if that sounds ok.
> > >
> > > Thanks!
> > >
> > > This patch is on top of net-next. It is needed to fix a vhost-net
> > > regression in net-next, where a non-privileged process can't enable
> > > the device anymore: when qemu uses vhost, inside the ioctl call it
> > > creates a thread and tries to add this thread to the groups of
> > > current, and it fails. But we control the thread, so to solve the
> > > problem, we really should tell it 'connect to our cgroups'.

So am I correct to assume that this change is now needed in 2.6.36, and
unneeded in 2.6.35?

Can it affect the userspace-kernel API in any manner? If so, it
should be backported into earlier kernels to reduce the number of
incompatible kernels out there.

Paul, did you have any comments?

I didn't see any update in response to the minor review comments, so...

 include/linux/cgroup.h |    1 +
 kernel/cgroup.c        |    6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff -puN include/linux/cgroup.h~cgroups-fix-api-thinko-fix include/linux/cgroup.h
--- a/include/linux/cgroup.h~cgroups-fix-api-thinko-fix
+++ a/include/linux/cgroup.h
@@ -579,6 +579,7 @@ void cgroup_iter_end(struct cgroup *cgrp
 int cgroup_scan_tasks(struct cgroup_scanner *scan);
 int cgroup_attach_task(struct cgroup *, struct task_struct *);
 int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
+
 static inline int cgroup_attach_task_current_cg(struct task_struct *tsk)
 {
 	return cgroup_attach_task_all(current, tsk);
diff -puN kernel/cgroup.c~cgroups-fix-api-thinko-fix kernel/cgroup.c
--- a/kernel/cgroup.c~cgroups-fix-api-thinko-fix
+++ a/kernel/cgroup.c
@@ -1798,13 +1798,13 @@ out:
 int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
 {
 	struct cgroupfs_root *root;
-	struct cgroup *cur_cg;
 	int retval = 0;

 	cgroup_lock();
 	for_each_active_root(root) {
-		cur_cg = task_cgroup_from_root(from, root);
-		retval = cgroup_attach_task(cur_cg, tsk);
+		struct cgroup *from_cg = task_cgroup_from_root(from, root);
+
+		retval = cgroup_attach_task(from_cg, tsk);
 		if (retval)
 			break;
 	}
_
[RFC 0/7] KVM steal time implementation
Hi,

In this series, I am proposing a steal time implementation for KVM. It is an RFC, so don't be too harsh =]

There are two parts to it: the guest part and the host part.

The proposal for the guest part is to just change the common time accounting, and try to identify at that spot whether or not we should account any steal time. I considered this idea less cumbersome than trying to cook up a clockevents implementation ourselves, since I see little value in it. I am, however, pretty open to suggestions.

For the host-guest communication, I am using a shared page, in the same way as pvclock. Because of that, I am just hijacking the pvclock structure anyway. There is a 32-bit field floating by, that gives us enough room for 8 years of steal time (we use msec resolution).

The main idea is to timestamp our exit and entry through sched notifiers, and export the value at pvclock updates. This obviously has some disadvantages: by doing this we are giving up on future ideas about only updating this structure once, and even right now, it won't work on pinned-smp (since we don't update pvclock if we haven't changed cpus).

But again, it is just an RFC, and I'd like to get a feel for the reception of the idea as a whole.

Have a nice review.
Glauber Costa (6):
  change headers preparing for steal time
  measure time out of guest
  change kernel accounting to include steal time
  kvm steal time implementation
  touch softlockup watchdog
  tell guest about steal time feature

Zachary Amsden (1):
  Implement getnsboottime kernel API

 arch/x86/include/asm/kvm_host.h    |    2 +
 arch/x86/include/asm/kvm_para.h    |    1 +
 arch/x86/include/asm/pvclock-abi.h |    4 ++-
 arch/x86/kernel/kvmclock.c         |   38
 arch/x86/kvm/x86.c                 |   15 -
 include/linux/sched.h              |    1 +
 include/linux/time.h               |    1 +
 kernel/sched.c                     |   29 +++
 kernel/time/timekeeping.c          |   27 +
 9 files changed, 115 insertions(+), 3 deletions(-)
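A rough range check for the 32-bit field may be useful when reviewing the series. The sketch below assumes the field is an unsigned counter of milliseconds that starts at zero; the cover letter does not spell out the exact encoding, so that is an assumption, and msec_counter_days() is a hypothetical helper written for this note. Under that assumption the counter covers roughly 49 days before wrapping, so a reader of the field would probably want to compute deltas with unsigned arithmetic rather than rely on the counter never wrapping.

```c
#include <stdint.h>

#define MSEC_PER_DAY (24ULL * 60 * 60 * 1000)

/*
 * Whole days that an unsigned counter of `bits` bits can represent
 * when it counts milliseconds (hypothetical helper for this note).
 */
static uint64_t msec_counter_days(unsigned int bits)
{
	uint64_t range = (bits >= 64) ? UINT64_MAX : (1ULL << bits);

	return range / MSEC_PER_DAY;
}
```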
[RFC 1/7] Implement getnsboottime kernel API
From: Zachary Amsden zams...@redhat.com

Add a kernel call to get the number of nanoseconds since boot. This
is generally useful enough to make it a generic call.

Signed-off-by: Zachary Amsden zams...@redhat.com
---
 include/linux/time.h      |    1 +
 kernel/time/timekeeping.c |   27 +++++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index ea3559f..5d04108 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -145,6 +145,7 @@ extern void getnstimeofday(struct timespec *tv);
 extern void getrawmonotonic(struct timespec *ts);
 extern void getboottime(struct timespec *ts);
 extern void monotonic_to_bootbased(struct timespec *ts);
+extern s64 getnsboottime(void);

 extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
 extern int timekeeping_valid_for_hres(void);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index caf8d4d..d250f0a 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -285,6 +285,33 @@ void ktime_get_ts(struct timespec *ts)
 }
 EXPORT_SYMBOL_GPL(ktime_get_ts);
+
+/**
+ * getnsboottime - get the bootbased clock in nsec format
+ *
+ * The function calculates the bootbased clock from the realtime
+ * clock and the wall_to_monotonic offset and stores the result
+ * in normalized timespec format in the variable pointed to by @ts.
+ */
+s64 getnsboottime(void)
+{
+	unsigned int seq;
+	s64 secs, nsecs;
+
+	WARN_ON(timekeeping_suspended);
+
+	do {
+		seq = read_seqbegin(&xtime_lock);
+		secs = xtime.tv_sec + wall_to_monotonic.tv_sec;
+		secs += total_sleep_time.tv_sec;
+		nsecs = xtime.tv_nsec + wall_to_monotonic.tv_nsec;
+		nsecs += total_sleep_time.tv_nsec + timekeeping_get_ns();
+
+	} while (read_seqretry(&xtime_lock, seq));
+	return nsecs + (secs * NSEC_PER_SEC);
+}
+EXPORT_SYMBOL_GPL(getnsboottime);
+
 /**
  * do_gettimeofday - Returns the time of day in a timeval
  * @tv:	pointer to the timeval to be set
--
1.6.2.2
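The arithmetic at the end of getnsboottime() can be modelled outside the kernel. The sketch below is a userspace approximation, not the kernel implementation: struct ts and boot_ns() are hypothetical names, and the seqlock retry loop is left out on the assumption that the inputs are already a consistent snapshot. It shows that the nanosecond fields do not need to be normalized before the final conversion, because seconds and nanoseconds are summed separately and combined once.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Simplified stand-in for struct timespec (hypothetical). */
struct ts {
	int64_t tv_sec;
	long tv_nsec;
};

/*
 * Sum realtime, the wall-to-monotonic offset and accumulated sleep time,
 * add the nanoseconds elapsed since the last timekeeping update, and
 * return the total as a single nanosecond count.
 */
static int64_t boot_ns(struct ts xtime, struct ts wall_to_mono,
		       struct ts sleep, int64_t clock_delta_ns)
{
	int64_t secs = xtime.tv_sec + wall_to_mono.tv_sec + sleep.tv_sec;
	int64_t nsecs = (int64_t)xtime.tv_nsec + wall_to_mono.tv_nsec +
			sleep.tv_nsec + clock_delta_ns;

	return nsecs + secs * NSEC_PER_SEC;
}
```

The nanosecond sum may exceed one second here; that is harmless as long as the result type is wide enough, which a signed 64-bit value is for any realistic uptime.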
[RFC 2/7] change headers preparing for steal time
This guest/host common patch prepares infrastructure for
the steal time implementation. Some constants are added,
and a name change happens in the pvclock vcpu structure.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/include/asm/kvm_para.h    |    1 +
 arch/x86/include/asm/pvclock-abi.h |    4 +++-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 05eba5e..1759c81 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -25,6 +25,7 @@
  * in pvclock structure. If no bits are set, all flags are ignored.
  */
 #define KVM_FEATURE_CLOCKSOURCE_STABLE_BIT	24
+#define KVM_FEATURE_CLOCKSOURCE_STEAL_BIT	25

 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 35f2d19..417061b 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -24,7 +24,7 @@
 struct pvclock_vcpu_time_info {
 	u32   version;
-	u32   pad0;
+	u32   steal_time;
 	u64   tsc_timestamp;
 	u64   system_time;
 	u32   tsc_to_system_mul;
@@ -40,5 +40,7 @@ struct pvclock_wall_clock {
 } __attribute__((__packed__));

 #define PVCLOCK_TSC_STABLE_BIT (1 << 0)
+#define PVCLOCK_STEAL_BIT (2 << 0)
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */
--
1.6.2.2
[RFC 3/7] measure time out of guest
By measuring the time between a vcpu_put and a vcpu_load, we can
estimate how long the guest stayed off the cpu. This is exported
to the guest at every clock update.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |    2 ++
 arch/x86/kvm/x86.c              |   12 +++++++++++-
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 502e53f..bc28aff 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -364,6 +364,8 @@ struct kvm_vcpu_arch {
 	u64 hv_vapic;

 	cpumask_var_t wbinvd_dirty_mask;
+	u64 time_out;
+	u64 last_time_out;
 };

 struct kvm_arch {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e7e3b50..680feaa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -925,8 +925,9 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
 	vcpu->hv_clock.system_time = ts.tv_nsec +
 				     (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset;

-	vcpu->hv_clock.flags = 0;
+	vcpu->hv_clock.flags = PVCLOCK_STEAL_BIT;

+	vcpu->hv_clock.steal_time = vcpu->time_out / 100;
 	/*
 	 * The interface expects us to write an even number signaling that the
 	 * update is finished. Since the guest won't see the intermediate
@@ -1798,6 +1799,8 @@ static bool need_emulate_wbinvd(struct kvm_vcpu *vcpu)

 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
+	s64 now;
+
 	/* Address WBINVD may be executed by guest */
 	if (need_emulate_wbinvd(vcpu)) {
 		if (kvm_x86_ops->has_wbinvd_exit())
@@ -1815,12 +1818,19 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		per_cpu(cpu_tsc_khz, cpu) = khz;
 	}
 	kvm_request_guest_time_update(vcpu);
+
+	now = getnsboottime();
+
+	if (vcpu->arch.last_time_out != 0)
+		vcpu->arch.time_out += now - vcpu->arch.last_time_out;
 }

 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	kvm_x86_ops->vcpu_put(vcpu);
 	kvm_put_guest_fpu(vcpu);
+
+	vcpu->arch.last_time_out = getnsboottime();
 }

 static int is_efer_nx(void)
--
1.6.2.2
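The put/load bookkeeping in this patch can be sketched as a small userspace model. The struct and function names below (vcpu_steal, vcpu_put_at, vcpu_load_at, steal_msec) are hypothetical, and timestamps are passed in explicitly instead of being read from getnsboottime(). One deliberate difference, flagged here as an assumption: the sketch divides by 1,000,000 to turn nanoseconds into the milliseconds mentioned in the cover letter, whereas the patch as posted divides by 100.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical per-vcpu bookkeeping, mirroring the two new fields. */
struct vcpu_steal {
	uint64_t time_out;	/* total nanoseconds spent descheduled */
	uint64_t last_time_out;	/* timestamp of the last put, 0 = none yet */
};

/* Called when the vcpu is scheduled out: remember when it left. */
static void vcpu_put_at(struct vcpu_steal *v, uint64_t now_ns)
{
	v->last_time_out = now_ns;
}

/* Called when the vcpu is scheduled back in: add the time it was away. */
static void vcpu_load_at(struct vcpu_steal *v, uint64_t now_ns)
{
	if (v->last_time_out != 0)
		v->time_out += now_ns - v->last_time_out;
}

/* Value exported to the guest, in milliseconds (assumed unit). */
static uint32_t steal_msec(const struct vcpu_steal *v)
{
	return (uint32_t)(v->time_out / NSEC_PER_MSEC);
}
```

The very first load adds nothing because no put has been recorded yet, which matches the last_time_out != 0 check in the patch.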
[RFC 4/7] change kernel accounting to include steal time
This patch proposes a common steal time implementation. When no steal time is accounted, we just add a branch to the current accounting code, which shouldn't add much overhead.

When we do want to register steal time, we proceed as follows:
- if we would account user or system time in this tick, and there is out-of-cpu time registered, we skip it altogether, and account steal time only.
- if we would account user or system time in this tick, and we got the cpu for the whole slice, we proceed normally.
- if we are idle in this tick, we flush out-of-cpu time to give it the chance to update whatever last-measure internal variable it may have.

This approach is simple, but proved to work well for my test scenarios. A UP guest on a UP host, with a cpu-hog in both guest and host, shows ~50 % steal time. Steal time is also accounted proportionally if nice values are given to the host cpu-hog.

A cpu-hog in the host with no load in the guest produces 0 % steal time, with 100 % idle, as one would expect.
Signed-off-by: Glauber Costa glom...@redhat.com
---
 include/linux/sched.h |    1 +
 kernel/sched.c        |   29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 047..e571ddd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -312,6 +312,7 @@ long io_schedule_timeout(long timeout);
 extern void cpu_init (void);
 extern void trap_init(void);
 extern void update_process_times(int user);
+extern cputime_t (*hypervisor_steal_time)(void);
 extern void scheduler_tick(void);

 extern void sched_show_task(struct task_struct *p);
diff --git a/kernel/sched.c b/kernel/sched.c
index f52a880..9695c92 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3157,6 +3157,16 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p)
 	return ns;
 }

+cputime_t (*hypervisor_steal_time)(void) = NULL;
+
+static inline cputime_t get_steal_time_from_hypervisor(void)
+{
+	if (!hypervisor_steal_time)
+		return 0;
+	return hypervisor_steal_time();
+}
+
+
 /*
  * Account user cpu time to a process.
  * @p: the process that the cpu time gets accounted to
@@ -3169,6 +3179,12 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
 	cputime64_t tmp;

+	tmp = get_steal_time_from_hypervisor();
+	if (tmp) {
+		account_steal_time(tmp);
+		return;
+	}
+
 	/* Add user time to process. */
 	p->utime = cputime_add(p->utime, cputime);
 	p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
@@ -3234,6 +3250,12 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
 		return;
 	}

+	tmp = get_steal_time_from_hypervisor();
+	if (tmp) {
+		account_steal_time(tmp);
+		return;
+	}
+
 	/* Add system time to process. */
 	p->stime = cputime_add(p->stime, cputime);
 	p->stimescaled = cputime_add(p->stimescaled, cputime_scaled);
@@ -3276,6 +3298,13 @@ void account_idle_time(cputime_t cputime)
 	cputime64_t cputime64 = cputime_to_cputime64(cputime);
 	struct rq *rq = this_rq();

+	/*
+	 * if we're idle, we don't account it as steal time, since we did
+	 * not want to run anyway. We do call the steal function, however, to
+	 * give the guest the chance to flush its internal buffers
+	 */
+	get_steal_time_from_hypervisor();
+
 	if (atomic_read(&rq->nr_iowait) > 0)
 		cpustat->iowait = cputime64_add(cpustat->iowait, cputime64);
 	else
--
1.6.2.2
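The three accounting rules described in this patch can be condensed into one decision function. The sketch below is a userspace model under stated assumptions: cpu_stat, tick_kind, account_tick() and the two example steal sources are hypothetical names, and cputime_t is reduced to a plain integer. It is meant only to make the branch order visible, not to mirror the scheduler code line for line.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cputime_t;	/* simplified stand-in */

struct cpu_stat {
	cputime_t user, system, idle, steal;
};

typedef cputime_t (*steal_fn)(void);

enum tick_kind { TICK_USER, TICK_SYSTEM, TICK_IDLE };

/* Mirrors the NULL check: no hook registered means no steal time. */
static cputime_t read_steal(steal_fn fn)
{
	return fn ? fn() : 0;
}

/* Example steal sources used for illustration only. */
static cputime_t no_steal(void) { return 0; }
static cputime_t two_ticks_stolen(void) { return 2; }

static void account_tick(struct cpu_stat *st, enum tick_kind kind,
			 cputime_t tick, steal_fn fn)
{
	cputime_t stolen = read_steal(fn);

	if (kind == TICK_IDLE) {
		/* Idle: the steal value is read but not accounted. */
		st->idle += tick;
		return;
	}
	if (stolen) {
		/* Out-of-cpu time pending: account steal only. */
		st->steal += stolen;
		return;
	}
	if (kind == TICK_USER)
		st->user += tick;
	else
		st->system += tick;
}
```

A call such as account_tick(&st, TICK_USER, 1, two_ticks_stolen) adds two units to st.steal and leaves st.user unchanged, which is the "skip it altogether" rule from the description.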
[RFC 5/7] kvm steal time implementation
This is the proposed kvm-side steal time implementation.
It is migration safe, as it checks flags at every read.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/kernel/kvmclock.c |   35 +++++++++++++++++++++++++++++++++++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index eb9b76c..a1f4852 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -18,6 +18,8 @@

 #include <linux/clocksource.h>
 #include <linux/kvm_para.h>
+#include <linux/kernel_stat.h>
+#include <linux/sched.h>
 #include <asm/pvclock.h>
 #include <asm/msr.h>
 #include <asm/apic.h>
@@ -41,6 +43,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
 /* The hypervisor will put information about time periodically here */
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
+static DEFINE_PER_CPU(u64, steal_info);
 static struct pvclock_wall_clock wall_clock;

 /*
@@ -82,6 +85,32 @@ static cycle_t kvm_clock_read(void)
 	return ret;
 }

+static DEFINE_PER_CPU(u64, steal_info);
+
+cputime_t kvm_get_steal_time(void)
+{
+	u64 delta = 0;
+	u64 *last_steal_info, this_steal_info;
+	struct pvclock_vcpu_time_info *src;
+
+	src = &get_cpu_var(hv_clock);
+	if (!(src->flags & PVCLOCK_STEAL_BIT))
+		goto out;
+
+	this_steal_info = src->steal_time;
+	put_cpu_var(hv_clock);
+
+	last_steal_info = &get_cpu_var(steal_info);
+
+	delta = this_steal_info - *last_steal_info;
+
+	*last_steal_info = this_steal_info;
+	put_cpu_var(steal_info);
+
+out:
+	return msecs_to_cputime(delta);
+}
+
 static cycle_t kvm_clock_get_cycles(struct clocksource *cs)
 {
 	return kvm_clock_read();
@@ -134,6 +163,8 @@ static int kvm_register_clock(char *txt)
 	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
 	       cpu, high, low, txt);

+	per_cpu(steal_info, cpu) = 0;
+
 	return native_write_msr_safe(msr_kvm_system_time, low, high);
 }

@@ -218,4 +249,8 @@ void __init kvmclock_init(void)

 	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
 		pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
+
+
+	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STEAL_BIT))
+		hypervisor_steal_time = kvm_get_steal_time;
 }
--
1.6.2.2
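The guest-side read boils down to "report the difference since the last read, but only when the host says the field is valid". A minimal userspace sketch of that idea follows. The names steal_reader and steal_delta() are hypothetical, and the sketch keeps the last-seen value as a 32-bit quantity to match the width of the shared field, which is an assumption (the patch stores it in a u64). With matching widths, unsigned subtraction gives the right delta even across a counter wraparound.

```c
#include <stdint.h>

/* Hypothetical per-cpu state: the last steal value the guest consumed. */
struct steal_reader {
	uint32_t last;
};

/*
 * Return the milliseconds stolen since the previous call. When the host
 * has not set the steal flag, report nothing and leave the state alone.
 */
static uint32_t steal_delta(struct steal_reader *r, uint32_t current,
			    int flag_set)
{
	uint32_t delta;

	if (!flag_set)
		return 0;

	delta = current - r->last;	/* modulo 2^32, so wraparound is fine */
	r->last = current;
	return delta;
}
```

Checking the flag on every call is what the commit message refers to as migration safety: if the guest lands on a host that does not export steal time, the reader simply reports zero.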
[RFC 7/7] tell guest about steal time feature
Guest kernel will only activate steal time if the host exports it.
Warn the guest about it.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/kvm/x86.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 680feaa..9a20cd8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2101,7 +2101,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
 			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
-			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+			     (1 << KVM_FEATURE_CLOCKSOURCE_STEAL_BIT);
 		entry->ebx = 0;
 		entry->ecx = 0;
 		entry->edx = 0;
--
1.6.2.2
[RFC 6/7] touch softlockup watchdog
With a reliable steal time mechanism, we can tell if we're
out of the cpu for very long, differentiating from the case
that we simply got a real softlockup.

In the case we were out of cpu, the watchdog is fed, making
bogus softlockups disappear.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/kernel/kvmclock.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index a1f4852..1c496c8 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -104,6 +104,9 @@ cputime_t kvm_get_steal_time(void)

 	delta = this_steal_info - *last_steal_info;

+	if (delta > 1000UL)
+		touch_softlockup_watchdog();
+
 	*last_steal_info = this_steal_info;
 	put_cpu_var(steal_info);

--
1.6.2.2
Re: [KVM timekeeping 25/35] Add clock catchup mode
On Wed, Aug 25, 2010 at 10:48:20AM -1000, Zachary Amsden wrote: On 08/25/2010 07:27 AM, Marcelo Tosatti wrote: On Thu, Aug 19, 2010 at 10:07:39PM -1000, Zachary Amsden wrote: Make the clock update handler handle generic clock synchronization, not just KVM clock. We add a catchup mode which keeps passthrough TSC in line with absolute guest TSC. Signed-off-by: Zachary Amsdenzams...@redhat.com --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/x86.c | 55 ++ 2 files changed, 38 insertions(+), 18 deletions(-) kvm_x86_ops-vcpu_load(vcpu, cpu); - if (unlikely(vcpu-cpu != cpu) || check_tsc_unstable()) { + if (unlikely(vcpu-cpu != cpu) || vcpu-arch.tsc_rebase) { /* Make sure TSC doesn't go backwards */ s64 tsc_delta = !vcpu-arch.last_host_tsc ? 0 : native_read_tsc() - vcpu-arch.last_host_tsc; if (tsc_delta 0) mark_tsc_unstable(KVM discovered backwards TSC); - if (check_tsc_unstable()) + if (check_tsc_unstable()) { kvm_x86_ops-adjust_tsc_offset(vcpu, -tsc_delta); - kvm_migrate_timers(vcpu); + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); + } + if (vcpu-cpu != cpu) + kvm_migrate_timers(vcpu); vcpu-cpu = cpu; + vcpu-arch.tsc_rebase = 0; } } @@ -1947,6 +1961,12 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) kvm_x86_ops-vcpu_put(vcpu); kvm_put_guest_fpu(vcpu); vcpu-arch.last_host_tsc = native_read_tsc(); + + /* For unstable TSC, force compensation and catchup on next CPU */ + if (check_tsc_unstable()) { + vcpu-arch.tsc_rebase = 1; + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); + } The mix between catchup,trap versus stable,unstable TSC is confusing and difficult to grasp. Can you please introduce all the infrastructure first, then control usage of them in centralized places? 
Examples: +static void kvm_update_tsc_trapping(struct kvm *kvm) +{ + int trap, i; + struct kvm_vcpu *vcpu; + + trap = check_tsc_unstable() atomic_read(kvm-online_vcpus) 1; + kvm_for_each_vcpu(i, vcpu, kvm) + kvm_x86_ops-set_tsc_trap(vcpu, trap !vcpu-arch.time_page); +} + /* For unstable TSC, force compensation and catchup on next CPU */ + if (check_tsc_unstable()) { + vcpu-arch.tsc_rebase = 1; + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); + } kvm_guest_time_update is becoming very confusing too. I understand this is due to the many cases its dealing with, but please make it as simple as possible. I tried to comment as best as I could. I think the whole kvm_update_tsc_trapping thing is probably a poor design choice. It works, but it's thoroughly unintelligible right now without spending some days figuring out why. I'll rework the tail series of patches to try to make them more clear. + /* +* If we are trapping and no longer need to, use catchup to +* ensure passthrough TSC will not be less than trapped TSC +*/ + if (vcpu-tsc_mode == TSC_MODE_PASSTHROUGH vcpu-tsc_trapping + ((this_tsc_khz= v-kvm-arch.virtual_tsc_khz || kvmclock))) { + catchup = 1; What, TSC trapping with kvmclock enabled? Transitioning to use of kvmclock after a cold boot means we may have been trapping and now we will not be. For both catchup and trapping the resolution of the host clock is important, as Glauber commented for kvmclock. Can you comment on the problems that arrive from a low res clock for both modes? Similarly for catchup mode, the effect of exit frequency. No need for any guarantees? The scheduler will do something to get an IRQ at whatever resolution it uses for it's timeslice. That guarantees an exit per timeslice, so we'll never be behind by more than one slice while scheduling. While not scheduling, we're dormant anyway, waiting on either an IRQ or shared memory variable change. Local timers could end up behind when dormant. 
We may need a hack to accelerate firing of timers in such a case, or perhaps bounds on when to use catchup mode and when to not. What about emulating rdtsc with low res clock? The RDTSC instruction reads the time-stamp counter and is guaranteed to return a monotonically increasing unique value whenever executed, except for a 64-bit counter wraparound. Partly, the lack of implementation is by deliberate choice; the logic involved with setting such bounds and wisdom of doing so is a choice most likely to be done by a policy agent in userspace, in our case, qemu. In the end, that is what has full control over the setting or not of guest TSC rate and choice of TSC mode. What's lacking is the ability to force the use of a certain mode.
QEMU/KVM Hardware Acceleration Clarity Question
Hello,

After looking all over the net and trying some KVM stuff, I just want to get some clarification on which QEMU I should be using. I am trying to use virtio with hardware acceleration.

Platform: Fedora x86_64, kernel 2.6.32.16-150
KVM kernel modules: 2.6.32.16-150

Question: Does qemu-system-x86_64 version 0.12.5 with the -enable-kvm switch have all of the hardware acceleration stuff that is in qemu-kvm 0.11.0?

The reason I ask is that I seem to be having network performance problems, which I will save for another email. I have read various message boards indicating that qemu-system does not have all of the acceleration in it yet, and others that make me believe it should have all of the acceleration stuff in qemu-kvm. I have tried both and seem to get slightly better performance with qemu-kvm. The version of qemu-kvm came with the installation and I can't seem to find a newer version. I downloaded qemu-system 0.12.5 from the repository and built it myself.

The qemu-system appears to be doing hardware acceleration, as the indicators match the qemu-kvm indicators. But since the performance was slightly worse, I thought I would check and see if version 0.12.5 had all of the hardware acceleration that is in qemu-kvm version 0.11.0.

Also, vhost support is only in kernel 2.6.35, even though qemu 0.12.5 has support. Is that correct?

Thank you very much in advance! Your help is greatly appreciated.
Re: [PATCH] KVM: x86 emulator: add CALL FAR instruction emulation (opcode 9a)
On Wed, Aug 25, 2010 at 02:10:53PM +0800, Wei Yongjun wrote:
> Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
> ---
>  arch/x86/kvm/emulate.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)

Applied (and v2 testcase), thanks.
Re: [KVM timekeeping 25/35] Add clock catchup mode
On Wed, Aug 25, 2010 at 07:01:34PM -0300, Marcelo Tosatti wrote: On Wed, Aug 25, 2010 at 10:48:20AM -1000, Zachary Amsden wrote: On 08/25/2010 07:27 AM, Marcelo Tosatti wrote: On Thu, Aug 19, 2010 at 10:07:39PM -1000, Zachary Amsden wrote: Make the clock update handler handle generic clock synchronization, not just KVM clock. We add a catchup mode which keeps passthrough TSC in line with absolute guest TSC. Signed-off-by: Zachary Amsdenzams...@redhat.com --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/x86.c | 55 ++ 2 files changed, 38 insertions(+), 18 deletions(-) kvm_x86_ops-vcpu_load(vcpu, cpu); - if (unlikely(vcpu-cpu != cpu) || check_tsc_unstable()) { + if (unlikely(vcpu-cpu != cpu) || vcpu-arch.tsc_rebase) { /* Make sure TSC doesn't go backwards */ s64 tsc_delta = !vcpu-arch.last_host_tsc ? 0 : native_read_tsc() - vcpu-arch.last_host_tsc; if (tsc_delta 0) mark_tsc_unstable(KVM discovered backwards TSC); - if (check_tsc_unstable()) + if (check_tsc_unstable()) { kvm_x86_ops-adjust_tsc_offset(vcpu, -tsc_delta); - kvm_migrate_timers(vcpu); + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); + } + if (vcpu-cpu != cpu) + kvm_migrate_timers(vcpu); vcpu-cpu = cpu; + vcpu-arch.tsc_rebase = 0; } } @@ -1947,6 +1961,12 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) kvm_x86_ops-vcpu_put(vcpu); kvm_put_guest_fpu(vcpu); vcpu-arch.last_host_tsc = native_read_tsc(); + + /* For unstable TSC, force compensation and catchup on next CPU */ + if (check_tsc_unstable()) { + vcpu-arch.tsc_rebase = 1; + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); + } The mix between catchup,trap versus stable,unstable TSC is confusing and difficult to grasp. Can you please introduce all the infrastructure first, then control usage of them in centralized places? 
Examples: +static void kvm_update_tsc_trapping(struct kvm *kvm) +{ + int trap, i; + struct kvm_vcpu *vcpu; + + trap = check_tsc_unstable() atomic_read(kvm-online_vcpus) 1; + kvm_for_each_vcpu(i, vcpu, kvm) + kvm_x86_ops-set_tsc_trap(vcpu, trap !vcpu-arch.time_page); +} + /* For unstable TSC, force compensation and catchup on next CPU */ + if (check_tsc_unstable()) { + vcpu-arch.tsc_rebase = 1; + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); + } kvm_guest_time_update is becoming very confusing too. I understand this is due to the many cases its dealing with, but please make it as simple as possible. I tried to comment as best as I could. I think the whole kvm_update_tsc_trapping thing is probably a poor design choice. It works, but it's thoroughly unintelligible right now without spending some days figuring out why. I'll rework the tail series of patches to try to make them more clear. + /* +* If we are trapping and no longer need to, use catchup to +* ensure passthrough TSC will not be less than trapped TSC +*/ + if (vcpu-tsc_mode == TSC_MODE_PASSTHROUGH vcpu-tsc_trapping + ((this_tsc_khz= v-kvm-arch.virtual_tsc_khz || kvmclock))) { + catchup = 1; What, TSC trapping with kvmclock enabled? Transitioning to use of kvmclock after a cold boot means we may have been trapping and now we will not be. For both catchup and trapping the resolution of the host clock is important, as Glauber commented for kvmclock. Can you comment on the problems that arrive from a low res clock for both modes? Similarly for catchup mode, the effect of exit frequency. No need for any guarantees? The scheduler will do something to get an IRQ at whatever resolution it uses for it's timeslice. That guarantees an exit per timeslice, so we'll never be behind by more than one slice while scheduling. While not scheduling, we're dormant anyway, waiting on either an IRQ or shared memory variable change. Local timers could end up behind when dormant. 
We may need a hack to accelerate firing of timers in such a case, or perhaps bounds on when to use catchup mode and when to not. What about emulating rdtsc with low res clock? The RDTSC instruction reads the time-stamp counter and is guaranteed to return a monotonically increasing unique value whenever executed, except for a 64-bit counter wraparound. This is bad semantics, IMHO. It is a totally different behaviour than the one guest users would expect. -- To unsubscribe from this list: send the line
Re: [KVM timekeeping 25/35] Add clock catchup mode
On 08/25/2010 12:01 PM, Marcelo Tosatti wrote: On Wed, Aug 25, 2010 at 10:48:20AM -1000, Zachary Amsden wrote: On 08/25/2010 07:27 AM, Marcelo Tosatti wrote: On Thu, Aug 19, 2010 at 10:07:39PM -1000, Zachary Amsden wrote: Make the clock update handler handle generic clock synchronization, not just KVM clock. We add a catchup mode which keeps passthrough TSC in line with absolute guest TSC. Signed-off-by: Zachary Amsdenzams...@redhat.com --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/x86.c | 55 ++ 2 files changed, 38 insertions(+), 18 deletions(-) kvm_x86_ops-vcpu_load(vcpu, cpu); - if (unlikely(vcpu-cpu != cpu) || check_tsc_unstable()) { + if (unlikely(vcpu-cpu != cpu) || vcpu-arch.tsc_rebase) { /* Make sure TSC doesn't go backwards */ s64 tsc_delta = !vcpu-arch.last_host_tsc ? 0 : native_read_tsc() - vcpu-arch.last_host_tsc; if (tsc_delta 0) mark_tsc_unstable(KVM discovered backwards TSC); - if (check_tsc_unstable()) + if (check_tsc_unstable()) { kvm_x86_ops-adjust_tsc_offset(vcpu, -tsc_delta); - kvm_migrate_timers(vcpu); + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); + } + if (vcpu-cpu != cpu) + kvm_migrate_timers(vcpu); vcpu-cpu = cpu; + vcpu-arch.tsc_rebase = 0; } } @@ -1947,6 +1961,12 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) kvm_x86_ops-vcpu_put(vcpu); kvm_put_guest_fpu(vcpu); vcpu-arch.last_host_tsc = native_read_tsc(); + + /* For unstable TSC, force compensation and catchup on next CPU */ + if (check_tsc_unstable()) { + vcpu-arch.tsc_rebase = 1; + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); + } The mix between catchup,trap versus stable,unstable TSC is confusing and difficult to grasp. Can you please introduce all the infrastructure first, then control usage of them in centralized places? 
Examples: +static void kvm_update_tsc_trapping(struct kvm *kvm) +{ + int trap, i; + struct kvm_vcpu *vcpu; + + trap = check_tsc_unstable() atomic_read(kvm-online_vcpus) 1; + kvm_for_each_vcpu(i, vcpu, kvm) + kvm_x86_ops-set_tsc_trap(vcpu, trap !vcpu-arch.time_page); +} + /* For unstable TSC, force compensation and catchup on next CPU */ + if (check_tsc_unstable()) { + vcpu-arch.tsc_rebase = 1; + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); + } kvm_guest_time_update is becoming very confusing too. I understand this is due to the many cases its dealing with, but please make it as simple as possible. I tried to comment as best as I could. I think the whole kvm_update_tsc_trapping thing is probably a poor design choice. It works, but it's thoroughly unintelligible right now without spending some days figuring out why. I'll rework the tail series of patches to try to make them more clear. + /* +* If we are trapping and no longer need to, use catchup to +* ensure passthrough TSC will not be less than trapped TSC +*/ + if (vcpu-tsc_mode == TSC_MODE_PASSTHROUGH vcpu-tsc_trapping + ((this_tsc_khz= v-kvm-arch.virtual_tsc_khz || kvmclock))) { + catchup = 1; What, TSC trapping with kvmclock enabled? Transitioning to use of kvmclock after a cold boot means we may have been trapping and now we will not be. For both catchup and trapping the resolution of the host clock is important, as Glauber commented for kvmclock. Can you comment on the problems that arrive from a low res clock for both modes? Similarly for catchup mode, the effect of exit frequency. No need for any guarantees? The scheduler will do something to get an IRQ at whatever resolution it uses for it's timeslice. That guarantees an exit per timeslice, so we'll never be behind by more than one slice while scheduling. While not scheduling, we're dormant anyway, waiting on either an IRQ or shared memory variable change. Local timers could end up behind when dormant. 
>> We may need a hack to accelerate firing of timers in such a case, or
>> perhaps bounds on when to use catchup mode and when to not.
>
> What about emulating rdtsc with low res clock?
>
> "The RDTSC instruction reads the time-stamp counter and is guaranteed
> to return a monotonically increasing unique value whenever executed,
> except for a 64-bit counter wraparound."

Technically, that may not be quite correct.

<digression into weeds>

The RDTSC instruction will return a monotonically increasing unique value, but the execution and retirement of the instruction are unserialized. So technically, two simultaneous RDTSC could
Re: [Qemu-devel] [PATCH v2 0/2] Use kvm64/kvm32 when running under KVM
On Wednesday 28 July 2010 20:42:22 jes.soren...@redhat.com wrote:
> From: Jes Sorensen <jes.soren...@redhat.com>
>
> This set of patches adds default CPU types to the PC compat
> definitions, and patch #2 sets the CPU type to kvm64/kvm32 when running
> under KVM.
>
> Long term we might want to qdev'ify the CPUs but I think it is better
> to keep it simple for 0.13.

Looks good to me. Some Windows guests won't enable MSI if they think the processor is too old, so we need this patchset.

--
regards
Yang, Sheng

> Jes Sorensen (2):
>   Set a default CPU type for compat PC machine defs.
>   Use kvm32/kvm64 as default CPUs when running under KVM.
>
>  hw/boards.h  |    1 +
>  hw/pc.c      |   13 +++--
>  hw/pc_piix.c |   15 +++
>  vl.c         |    2 ++
>  4 files changed, 29 insertions(+), 2 deletions(-)
Re: [PATCH] cgroups: fix API thinko
On Wed, Aug 25, 2010 at 2:35 PM, Andrew Morton <a...@linux-foundation.org> wrote:
>
> So am I correct to assume that this change is now needed in 2.6.36, and
> unneeded in 2.6.35?
>
> Can it affect the userspace-kernel API in any manner? If so, it should
> be backported into earlier kernels to reduce the number of incompatible
> kernels out there.

AFAICS it shouldn't affect any existing APIs, either in-kernel or to userspace - it just makes the existing function cgroup_attach_task_current_cg() a specialization of a more generic new function.

> Paul, did you have any comments?

Other than the language being a bit confusing, it seems fine. I'd probably word the patch description as:

  Add cgroup_attach_task_all()

  The existing cgroup_attach_task_current_cg() API is called by a thread
  to attach another thread to all of its cgroups; this is unsuitable for
  cases where a privileged task wants to attach itself to the cgroups of
  a less privileged one, since the call must be made from the context of
  the target task.

  This patch adds a more generic cgroup_attach_task_all() API that allows
  both the source task and to-be-moved task to be specified.
  cgroup_attach_task_current_cg() becomes a specialization of the more
  generic new function.

Acked-by: Paul Menage <men...@google.com>
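The relationship described in the proposed changelog can be illustrated with a toy userspace model. This is not kernel code: the struct layout, the toy_ names, the fixed hierarchy count and the explicit "current" parameter are all assumptions made for the sketch. Only the idea that the current-cgroup call becomes a thin wrapper around the generic two-task call comes from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HIERARCHIES 3   /* assumed: number of mounted hierarchies */

/* Toy stand-in for a task: one cgroup id per hierarchy. */
struct toy_task {
    int cgroup_id[TOY_HIERARCHIES];
};

/* Generic form: put 'tsk' into every cgroup that 'from' belongs to. */
static int toy_attach_task_all(const struct toy_task *from,
                               struct toy_task *tsk)
{
    size_t i;

    assert(from != NULL && tsk != NULL);
    for (i = 0; i < TOY_HIERARCHIES; i++)
        tsk->cgroup_id[i] = from->cgroup_id[i];
    return 0;
}

/*
 * Specialization: the source task is always the caller. In this model
 * the caller is passed in explicitly as 'current_task' (hypothetical).
 */
static int toy_attach_task_current_cg(const struct toy_task *current_task,
                                      struct toy_task *tsk)
{
    return toy_attach_task_all(current_task, tsk);
}
```

With the generic form, a privileged task can name a less privileged task as the source and itself as the task to move, which the caller-only form cannot express.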
Re: Receiving delayed packets from RTL8139 card in KVM
Hi Avi,

> There may be a missing wakeup in the networking path. You can try
> adding printf()s in RTL8139's receive function, and in
> kvm_set_irq_level() to see if there's a problem in interrupt flow.

Thanks for your insights. I debugged the RTL8139 receive function rtl8139_do_receive(), the function rtl8139_update_irq() (responsible for raising the rtl8139 interrupts), and kvm_set_irq_level(). I found that the packets are not delayed, contrary to my earlier assumption. The interrupts are raised promptly by the rtl8139 through rtl8139_update_irq(), and kvm_set_irq_level() appears to inject them into the guest properly.

Inside the guest (kitten LWK), however, I see fewer interrupts than KVM injects. Either the interrupts are being coalesced in KVM before injection, or the guest somehow misses some of them.

I suspect the coalescing is what causes the problem in my application. For example, suppose two packets need to be received, and instead of two interrupts only one coalesced interrupt is injected into the guest. On receiving the coalesced interrupt, the RTL8139 driver in the guest reads the first packet (which is already in the buffer), but when it immediately tries to read the next one, that packet has not yet arrived in the guest's RTL8139 buffer, so the driver assumes there is no more data and returns. The second packet is never received, which leads to a retransmission. If a second interrupt had occurred, the driver would have checked the buffer again, found the data, and avoided the retransmission.

What is the minimum time between interrupts for them to be considered for coalescing? Is it possible to reduce this time further? That would be useful in my case.

Please let me know your comments on this issue.
Thanks for your time,
karthik
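The two-packet scenario in the message above can be modelled with a toy receive ring. This is only a sketch of the suspected race, not the qemu rtl8139 model or the guest driver; every name in it is hypothetical, and it assumes a handler that drains whatever is in the ring at the moment the interrupt is serviced.

```c
#include <assert.h>
#include <stddef.h>

/* Toy receive ring: counts packets only, no payloads. */
struct toy_ring {
    int pending;    /* packets written by the device, not yet read */
    int delivered;  /* packets the driver has handed to the stack */
};

/* Device side: one packet lands in the ring. */
static void toy_packet_arrives(struct toy_ring *r)
{
    assert(r != NULL);
    r->pending++;
}

/*
 * Driver side: service one interrupt by draining everything that is in
 * the ring right now. Returns the number of packets handled.
 */
static int toy_irq_handler(struct toy_ring *r)
{
    int handled = 0;

    assert(r != NULL);
    while (r->pending > 0) {
        r->pending--;
        r->delivered++;
        handled++;
    }
    return handled;
}
```

If both packets are already in the ring when the single coalesced interrupt is serviced, the handler returns 2 and nothing is lost. If the second packet arrives after the handler has returned and no further interrupt is raised, pending stays at 1 until something else triggers the handler, which matches the stall and retransmission described.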
virtio-9p error
Hi,

Every time I try to rsync something to my mounted 9p filesystem, I get the following message in the guest kernel log:

    [ 45.866789] 9pnet_virtio virtio2: requests:id 0 is not a head!

I'm exporting the filesystem with the following qemu-kvm argument (commit head 422476cc422 of August 12):

    -virtfs local,path=/home/outras/montecristo,mount_tag=home,security_model=passthrough

and mounting it in the guest with:

    mount -t 9p -o trans=virtio home /home

My guest is running kernel 2.6.34.

About a second after I start the rsync I get the message shown above, then the virtio filesystem crashes and I have to reboot the virtual machine.

Another "interesting" thing is that the filesystem won't show up in df, but it appears correctly in /proc/mounts:

    montecristo-nova:/home# mount|tail -1
    home on /home type 9p (rw,sync,dirsync,relatime,trans=virtio)

Here is a little of my rsync output:

    montecristo-nova:/home# cd nobackup/
    montecristo-nova:/home/nobackup# rsync -aHx --progress xadrezlivre:/ xadrezlivre/
    receiving incremental file list
    rsync: symlink /home/nobackup/xadrezlivre/cdrom -> media/cdrom failed: No such file or directory (2)
    rsync: symlink /home/nobackup/xadrezlivre/initrd.img -> boot/initrd.img-2.6.26-1-486 failed: No such file or directory (2)
    rsync: symlink /home/nobackup/xadrezlivre/initrd.img.old -> boot/initrd.img-2.6.18-6-486 failed: No such file or directory (2)
    rsync: symlink /home/nobackup/xadrezlivre/vmlinuz -> boot/vmlinuz-2.6.26-1-486 failed: No such file or directory (2)
    rsync: symlink /home/nobackup/xadrezlivre/vmlinuz.old -> boot/vmlinuz-2.6.18-6-486 failed: No such file or directory (2)
    rsync: symlink /home/nobackup/xadrezlivre/bin/nc -> /etc/alternatives/nc failed: No such file or directory (2)
    rsync: symlink /home/nobackup/xadrezlivre/bin/netcat -> /etc/alternatives/netcat failed: No such file or directory (2)
    rsync: symlink /home/nobackup/xadrezlivre/bin/pidof -> ../sbin/killall5 failed: No such file or directory (2)
    rsync: symlink /home/nobackup/xadrezlivre/bin/rnano -> nano failed: No such file or directory (2)

If anyone knows what this is, or what else I should send to help understand the problem, please tell me.

Thanks in advance

--
Bruno Ribas - ri...@c3sl.ufpr.br
http://www.inf.ufpr.br/ribas
C3SL: http://www.c3sl.ufpr.br
latest KVM tree build fail on 32bit system
Hi,

Has anyone seen the latest KVM tree fail to build on a 32-bit system?

kvm commit: 152d921348ee2ac7bb73d599c5796a027f0a660c
gcc version 4.1.2

...
  LD      arch/x86/boot/setup.elf
  OBJCOPY arch/x86/boot/setup.bin
  OBJCOPY arch/x86/boot/vmlinux.bin
  BUILD   arch/x86/boot/bzImage
Root device is (8, 2)
Setup is 14008 bytes (padded to 14336 bytes).
System is 2565 kB
CRC 72558b30
Kernel: arch/x86/boot/bzImage is ready  (#3)
  Building modules, stage 2.
  MODPOST 958 modules
WARNING: drivers/block/cpqarray.o(.devinit.text+0x20d): Section mismatch in reference from the function cpqarray_register_ctlr() to the function .init.text:ida_procinit()
The function __devinit cpqarray_register_ctlr() references
a function __init ida_procinit().
If ida_procinit is only used by cpqarray_register_ctlr then
annotate ida_procinit with a matching annotation.

WARNING: net/bluetooth/rfcomm/rfcomm.o(.init.text+0xac): Section mismatch in reference from the function init_module() to the function .exit.text:rfcomm_cleanup_ttys()
The function __init init_module() references
a function __exit rfcomm_cleanup_ttys().
This is often seen when error handling in the init function
uses functionality in the exit path.
The fix is often to remove the __exit annotation of
rfcomm_cleanup_ttys() so it may be used outside an exit section.

ERROR: "__udivdi3" [arch/x86/kvm/kvm.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

Best Regards,
Xudong Hao
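For context on the final error: __udivdi3 is the libgcc routine gcc calls for unsigned 64-bit division on 32-bit targets, and the kernel is not linked against libgcc. The report does not identify the offending line, so the following is only a sketch, assuming the symbol comes from a plain 64-bit "/" somewhere in the KVM code. In-kernel code normally uses helpers such as do_div() or div_u64() instead. The userspace function below (a hypothetical name, not a kernel API) shows the underlying idea: a 64-by-32 division written so that no 64-bit "/" or "%" operator appears.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: divide a 64-bit value by
 * a 32-bit divisor using shifts, compares and subtraction, so no 64-bit
 * '/' or '%' operator appears in the source.
 */
static uint64_t sketch_div_u64_rem(uint64_t n, uint32_t base, uint32_t *rem)
{
    uint64_t quotient = 0;
    uint64_t r = 0;
    int i;

    assert(base != 0);
    assert(rem != NULL);

    /* Restoring long division, one bit of the dividend per iteration. */
    for (i = 63; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);   /* bring down the next bit */
        if (r >= base) {
            r -= base;
            quotient |= (uint64_t)1 << i;
        }
    }

    *rem = (uint32_t)r;   /* r < base here, so it fits in 32 bits */
    return quotient;
}
```

A bit-at-a-time loop is much slower than a hardware divide and is shown only to make the mechanism visible; it is not a suggested fix for the build failure.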