[PATCH] KVM: x86 emulator: add CALL FAR instruction emulation (opcode 9a)

2010-08-25 Thread Wei Yongjun
Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
---
 arch/x86/kvm/emulate.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 26f69c2..e5f7f5c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2487,7 +2487,7 @@ static struct opcode opcode_table[256] = {
X8(D(SrcAcc | DstReg)),
/* 0x98 - 0x9F */
D(DstAcc | SrcNone), I(ImplicitOps | SrcAcc, em_cwd),
-   D(SrcImmFAddr | No64), N,
+   I(SrcImmFAddr | No64, em_call_far), N,
D(ImplicitOps | Stack), D(ImplicitOps | Stack), N, N,
/* 0xA0 - 0xA7 */
D(ByteOp | DstAcc | SrcMem | Mov | MemAbs), D(DstAcc | SrcMem | Mov | MemAbs),
-- 
1.7.0.4
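This patch only changes the decode-table entry for opcode 0x9a from a plain `D(...)` entry to an `I(...)` entry that names `em_call_far` as its handler; the handler body is not part of this excerpt. As a rough, hypothetical model of what a real-mode `CALL ptr16:16` has to do (push CS, push the return IP, then load the new CS:IP from the immediate operand), here is a small C sketch. `struct rm_cpu`, `linear()`, `push16()` and `call_far_imm()` are made up for illustration and ignore segment limits, A20 wrapping and 32-bit operand size; this is not KVM's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified real-mode CPU: 16-bit operand size only,
 * no segment limit checks, no A20 wrap-around. */
struct rm_cpu {
	uint16_t cs, ip, ss, sp;
	uint8_t mem[0x110000];	/* covers any seg:off up to FFFF:FFFF */
};

static uint32_t linear(uint16_t seg, uint16_t off)
{
	return ((uint32_t)seg << 4) + off;
}

static void push16(struct rm_cpu *cpu, uint16_t val)
{
	uint32_t a;

	cpu->sp -= 2;
	a = linear(cpu->ss, cpu->sp);
	cpu->mem[a] = val & 0xff;	/* little-endian store */
	cpu->mem[a + 1] = val >> 8;
}

/* Execute the 5-byte "9a <off16> <seg16>" instruction located at cs:ip. */
static void call_far_imm(struct rm_cpu *cpu)
{
	uint32_t a = linear(cpu->cs, cpu->ip);
	uint16_t off = cpu->mem[a + 1] | (cpu->mem[a + 2] << 8);
	uint16_t seg = cpu->mem[a + 3] | (cpu->mem[a + 4] << 8);

	cpu->ip += 5;			/* return address: next instruction */
	push16(cpu, cpu->cs);		/* CS is pushed first, */
	push16(cpu, cpu->ip);		/* then the return IP */
	cpu->cs = seg;
	cpu->ip = off;
}
```

A far return pops the two words in the opposite order (IP first, then CS).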


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Add realmode test for CALL FAR IMM instruction

2010-08-25 Thread Wei Yongjun
Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
---
 x86/realmode.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/x86/realmode.c b/x86/realmode.c
index a833829..2e12680 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -437,6 +437,9 @@ void test_call(void)
"ret\n\t"
"2:\t");
MK_INSN(call_far1,  "lcallw *(%ebx)\n\t");
+   MK_INSN(call_far2,  ".byte 0x9a\n\t"
+   ".word retf\n\t"
+   ".word 0x00\n\t");
MK_INSN(ret_imm,    "sub $10, %sp; jmp 2f; 1: retw $10; 2: callw 1b");
 
exec_in_big_real_mode(insn_call1);
@@ -453,6 +456,9 @@ void test_call(void)
exec_in_big_real_mode(insn_call_far1);
report("call far 1", 0, 1);
 
+   exec_in_big_real_mode(insn_call_far2);
+   report("call far 2", 0, 1);
+
exec_in_big_real_mode(insn_ret_imm);
report("ret imm 1", 0, 1);
 }
-- 
1.7.0.4




Re: [PATCH] Add realmode test for CALL FAR IMM instruction

2010-08-25 Thread Avi Kivity
 On 08/25/2010 09:16 AM, Wei Yongjun wrote:
 Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
 ---
  x86/realmode.c |6 ++
  1 files changed, 6 insertions(+), 0 deletions(-)

 diff --git a/x86/realmode.c b/x86/realmode.c
 index a833829..2e12680 100644
 --- a/x86/realmode.c
 +++ b/x86/realmode.c
 @@ -437,6 +437,9 @@ void test_call(void)
   "ret\n\t"
   "2:\t");
   MK_INSN(call_far1,  "lcallw *(%ebx)\n\t");
 + MK_INSN(call_far2,  ".byte 0x9a\n\t"
 + ".word retf\n\t"
 + ".word 0x00\n\t");

Why .byte encoding? won't lcallw $0, $retf (or the other way round) work?
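Both spellings should produce the same five bytes when assembled as 16-bit code: opcode 0x9a, then the 16-bit offset, then the 16-bit segment, each little-endian. In AT&T syntax `lcallw $seg, $off` takes the segment first, so `lcallw $0, $retf` corresponds to `.byte 0x9a` / `.word retf` / `.word 0x00`. The small encoder below (its name is made up for illustration, and it assumes no operand-size prefix is emitted) spells out that byte order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a direct far call with 16-bit operand size (CALL ptr16:16):
 * opcode 0x9a, then the offset, then the segment, each little-endian. */
static size_t encode_call_far16(uint8_t out[5], uint16_t seg, uint16_t off)
{
	out[0] = 0x9a;
	out[1] = off & 0xff;
	out[2] = off >> 8;
	out[3] = seg & 0xff;
	out[4] = seg >> 8;
	return 5;
}
```

With `seg = 0` and `off` set to the address of `retf`, the output matches the hand-written `.byte`/`.word` sequence.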

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH] Add realmode test for CALL FAR IMM instruction

2010-08-25 Thread Wei Yongjun


  On 08/25/2010 09:16 AM, Wei Yongjun wrote:
   
 Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
 ---
  x86/realmode.c |6 ++
  1 files changed, 6 insertions(+), 0 deletions(-)

 diff --git a/x86/realmode.c b/x86/realmode.c
 index a833829..2e12680 100644
 --- a/x86/realmode.c
 +++ b/x86/realmode.c
 @@ -437,6 +437,9 @@ void test_call(void)
  "ret\n\t"
  "2:\t");
  MK_INSN(call_far1,  "lcallw *(%ebx)\n\t");
 +MK_INSN(call_far2,  ".byte 0x9a\n\t"
 +".word retf\n\t"
 +".word 0x00\n\t");
 
 Why .byte encoding? won't lcallw $0, $retf (or the other way round) work?
   

Oh, it works, I will fix this.



[PATCHv2] Add realmode test for CALL FAR IMM instruction

2010-08-25 Thread Wei Yongjun
Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
---
 x86/realmode.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/x86/realmode.c b/x86/realmode.c
index a833829..d171a56 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -437,6 +437,7 @@ void test_call(void)
"ret\n\t"
"2:\t");
MK_INSN(call_far1,  "lcallw *(%ebx)\n\t");
+   MK_INSN(call_far2,  "lcallw $0, $retf\n\t");
MK_INSN(ret_imm,    "sub $10, %sp; jmp 2f; 1: retw $10; 2: callw 1b");
 
exec_in_big_real_mode(insn_call1);
@@ -453,6 +454,9 @@ void test_call(void)
exec_in_big_real_mode(insn_call_far1);
report("call far 1", 0, 1);
 
+   exec_in_big_real_mode(insn_call_far2);
+   report("call far 2", 0, 1);
+
exec_in_big_real_mode(insn_ret_imm);
report("ret imm 1", 0, 1);
 }
-- 
1.7.0.4




Re: [PATCH 2/3] S390: Add virtio hotplug add support

2010-08-25 Thread Heiko Carstens
On Tue, Aug 24, 2010 at 03:48:51PM +0200, Alexander Graf wrote:
 +static void hotplug_devices(struct work_struct *dummy)
 +{
 + unsigned int i;
 + struct kvm_device_desc *d;
 + struct device *dev;
 +
 + for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {

This should be 

for (i = 0; i + desc_size(d) <= PAGE_SIZE; i += desc_size(d)) {

otherwise you might have memory accesses beyond the device page...

 + d = kvm_devices + i;
 +
 + /* end of list */
 + if (d->type == 0)
 + break;

...even if that should not happen if everything works.
But let's be paranoid.


Re: [PATCH 2/3] S390: Add virtio hotplug add support

2010-08-25 Thread Alexander Graf

On 25.08.2010, at 10:16, Heiko Carstens wrote:

 On Tue, Aug 24, 2010 at 03:48:51PM +0200, Alexander Graf wrote:
 +static void hotplug_devices(struct work_struct *dummy)
 +{
 +unsigned int i;
 +struct kvm_device_desc *d;
 +struct device *dev;
 +
 +for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {
 
 This should be 
 
   for (i = 0; i + desc_size(d) <= PAGE_SIZE; i += desc_size(d)) {
 
 otherwise you might have memory accesses beyond the device page...

Oh, this is a simple copy&paste from the original search method. Is d valid in 
the first part of the loop already?

 
 +d = kvm_devices + i;
 +
 +/* end of list */
 +if (d->type == 0)
 +break;
 
 ...even if that should not happen if everything works.
 But let's be paranoid.

Yeah :). I like paranoid.


Alex



KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Nils Cant

Hey guys,

not sure if this is a bug or a feature request. It's just something 
we've noticed and are having problems with.


We're using the qemu-kvm lenny-backports package on Debian 5.0.5.

When doing a live migration from the virsh shell, the server in question 
becomes unreachable because the ARP cache on our switches still thinks 
the server is on another port.


As soon as the server sends out some traffic, such as a ping, the ARP 
cache gets updated as expected. If it does nothing, the server remains 
unreachable until the ARP cache expires on the switches (in our case 4 
hours).


We would like to be able to do live migration for customer machines on 
which we have no access, so we really need KVM to send out an ARP 
announcement/gratuitous ARP when doing a live migration.


Could anyone tell me if this is a bug in KVM, libvirt, or the Debian 
qemu-kvm package? (or if I'm doing something wrong? :-) )


I've been tcpdumping the bridged network interfaces on the hosts while 
doing the migrate, and couldn't see any ARP broadcasts.


Debian 5.0.5
Kernel: 2.6.32-bpo.3-amd64
qemu-kvm  0.12.4+dfsg-1~bpo50+2
libvirt0  0.7.6-1~bpo50+1

migrate --live testserver qemu+ssh://192.168.1.3/system

Thanks in advance,

Nils


Re: [PATCH 2/3] S390: Add virtio hotplug add support

2010-08-25 Thread Heiko Carstens
On Wed, Aug 25, 2010 at 10:20:03AM +0200, Alexander Graf wrote:
 On 25.08.2010, at 10:16, Heiko Carstens wrote:
  On Tue, Aug 24, 2010 at 03:48:51PM +0200, Alexander Graf wrote:
  +static void hotplug_devices(struct work_struct *dummy)
  +{
  +  unsigned int i;
  +  struct kvm_device_desc *d;
  +  struct device *dev;
  +
  +  for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {
  
  This should be 
  
  for (i = 0; i + desc_size(d) <= PAGE_SIZE; i += desc_size(d)) {
  
  otherwise you might have memory accesses beyond the device page...
 
 Oh, this is a simple copy&paste from the original search method.
 Is d valid in the first part of the loop already?

No, you would need to initialize it with kvm_devices if you change
the loop.


Re: [PATCH 2/3] S390: Add virtio hotplug add support

2010-08-25 Thread Alexander Graf

On 25.08.2010, at 10:35, Heiko Carstens wrote:

 On Wed, Aug 25, 2010 at 10:20:03AM +0200, Alexander Graf wrote:
 On 25.08.2010, at 10:16, Heiko Carstens wrote:
 On Tue, Aug 24, 2010 at 03:48:51PM +0200, Alexander Graf wrote:
 +static void hotplug_devices(struct work_struct *dummy)
 +{
 +  unsigned int i;
 +  struct kvm_device_desc *d;
 +  struct device *dev;
 +
 +  for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {
 
 This should be 
 
 for (i = 0; i + desc_size(d) <= PAGE_SIZE; i += desc_size(d)) {
 
 otherwise you might have memory accesses beyond the device page...
 
 Oh, this is a simple copy&paste from the original search method.
 Is d valid in the first part of the loop already?
 
 No, you would need to initialize it with kvm_devices if you change
 the loop.

But even then it would take the size of the current entry, not the next one 
we're actually looking for, no? Maybe it'd be easier to just convert this into 
a while loop and explicitly open code everything.
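One way to open-code the walk Alex describes is to check that a descriptor header fits inside the page before reading it, and only then check that the whole entry (header plus the variable-length part the header announces) also fits. The sketch below uses a hypothetical two-byte header (type, payload length) rather than the real `struct kvm_device_desc` layout, and `DEV_PAGE_SIZE`, `DESC_HDR_LEN` and `count_descs()` are illustrative names; only the bounds logic is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEV_PAGE_SIZE 4096	/* stands in for PAGE_SIZE */
#define DESC_HDR_LEN 2		/* hypothetical: type byte + payload-length byte */

static size_t desc_size(const uint8_t *d)
{
	return DESC_HDR_LEN + d[1];
}

/* Count descriptors in the device page without reading past its end. */
static unsigned int count_descs(const uint8_t *page)
{
	size_t i = 0;
	unsigned int n = 0;

	/* Only look at a header if the whole header lies inside the page. */
	while (i + DESC_HDR_LEN <= DEV_PAGE_SIZE) {
		const uint8_t *d = page + i;

		if (d[0] == 0)				/* end of list */
			break;
		if (i + desc_size(d) > DEV_PAGE_SIZE)	/* entry crosses page end */
			break;
		n++;
		i += desc_size(d);
	}
	return n;
}
```

Alex later notes that the virtio rings follow the device page, so an overrun would read ring memory rather than fault; the explicit checks simply avoid relying on that.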

Alex



Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Michael Tokarev
25.08.2010 11:52, Nils Cant wrote:
 Hey guys,
 
 not sure if this is a bug or a feature request. It's just something
 we've noticed and are having problems with.
 
 We're using the qemu-kvm lenny-backports package on Debian 5.0.5.
 
 When doing a live migration from the virsh shell, the server in question
 becomes unreachable because the ARP cache on our switches still thinks
 the server is on another port.
 
 As soon as the server sends out some traffic, such as a ping, the ARP
 cache gets updated as expected. If it does nothing, the server remains
 unreachable until the ARP cache expires on the switches (in our case 4
 hours).
 
 We would like to be able to do live migration for customer machines on
 which we have no access, so we really need KVM to send out an ARP
 announcement/gratuitous ARP when doing a live migration.
 
 Could anyone tell me if this is a bug in KVM, libvirt, or the Debian
 qemu-kvm package? (or if I'm doing something wrong? :-) )

It's probably a bug in your understanding ;)

Jokes aside, the thing is that kvm does not know what an
ARP is or what an IP address is.  It emulates a hardware
network card, which never sends ARPs out on its own; it is
the operating system's IP stack that does that.
The network card emulated by kvm does not know which
IP addresses are assigned to it inside the guest (there
may be many, or none at all), so it simply cannot
send the ARPs.

These ARPs should be sent by the guest.  Another question is
how to force/tell it to do so, and this, again, depends
on the guest operating system, the number of addresses assigned
to the interface and so on.

The mechanism to trigger it could be based on the link status
of the card, for example: kvm could lower it for a few ms
right after migration, to indicate that the cord was
unplugged and plugged back in, forcing the guest to do
whatever it needs to do...  But that's just a possibility
for future development.

/mjt


Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Gleb Natapov
On Wed, Aug 25, 2010 at 09:52:16AM +0200, Nils Cant wrote:
 Hey guys,
 
 not sure if this is a bug or a feature request. It's just something
 we've noticed and are having problems with.
 
 We're using the qemu-kvm lenny-backports package on Debian 5.0.5.
 
 When doing a live migration from the virsh shell, the server in
 question becomes unreachable because the ARP cache on our switches
 still thinks the server is on another port.
 
 As soon as the server sends out some traffic, such as a ping, the
 ARP cache gets updated as expected. If it does nothing, the server
 remains unreachable until the ARP cache expires on the switches (in
 our case 4 hours).
 
 We would like to be able to do live migration for customer machines
 on which we have no access, so we really need KVM to send out an ARP
 announcement/gratuitous ARP when doing a live migration.
 
 Could anyone tell me if this is a bug in KVM, libvirt, or the Debian
 qemu-kvm package? (or if I'm doing something wrong? :-) )
 
qemu sends a gratuitous ARP after migration. Check the forward delay setting on
your bridge interface; it should be set to zero.
  
 I've been tcpdumping the bridged network interfaces on the hosts
 while doing the migrate, and couldn't see any ARP broadcasts.
 
 Debian 5.0.5
 Kernel: 2.6.32-bpo.3-amd64
 qemu-kvm  0.12.4+dfsg-1~bpo50+2
 libvirt0  0.7.6-1~bpo50+1
 
 migrate --live testserver qemu+ssh://192.168.1.3/system
 
 Thanks in advance,
 
 Nils

--
Gleb.


Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Gleb Natapov
On Wed, Aug 25, 2010 at 12:37:21PM +0400, Michael Tokarev wrote:
 25.08.2010 11:52, Nils Cant wrote:
  Hey guys,
  
  not sure if this is a bug or a feature request. It's just something
  we've noticed and are having problems with.
  
  We're using the qemu-kvm lenny-backports package on Debian 5.0.5.
  
  When doing a live migration from the virsh shell, the server in question
  becomes unreachable because the ARP cache on our switches still thinks
  the server is on another port.
  
  As soon as the server sends out some traffic, such as a ping, the ARP
  cache gets updated as expected. If it does nothing, the server remains
  unreachable until the ARP cache expires on the switches (in our case 4
  hours).
  
  We would like to be able to do live migration for customer machines on
  which we have no access, so we really need KVM to send out an ARP
  announcement/gratuitous ARP when doing a live migration.
  
  Could anyone tell me if this is a bug in KVM, libvirt, or the Debian
  qemu-kvm package? (or if I'm doing something wrong? :-) )
 
 It's probably a bug in your understanding ;)
 
 Jokes aside, the thing is that kvm does not know what an
 ARP is or what an IP address is.  It emulates a hardware
 network card, which never sends ARPs out on its own; it is
 the operating system's IP stack that does that.
 The network card emulated by kvm does not know which
 IP addresses are assigned to it inside the guest (there
 may be many, or none at all), so it simply cannot
 send the ARPs.
 
True. Although qemu sends a gratuitous ARP, the IP field in it is
incorrect. It is done to update layer 2 topology, not layer 3.
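To illustrate the layer 2 point: a switch learns which port a station sits behind from the source MAC address of frames it receives, so a broadcast frame sent from the guest NIC's MAC is enough to move the forwarding entry, even if the IP fields inside it are meaningless. The sketch builds such a frame in ARP format with zeroed IPv4 fields. It is an illustration only: `build_l2_announce()` is a hypothetical helper, and qemu's actual announcement packet may use a different format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_MIN_FRAME 60	/* minimum Ethernet frame length without FCS */

/* Build a broadcast ARP-format frame carrying only the NIC's MAC address.
 * The IPv4 fields stay 0.0.0.0 because the emulated NIC does not know which
 * addresses the guest has configured. Returns the frame length. */
static size_t build_l2_announce(uint8_t buf[ETH_MIN_FRAME], const uint8_t mac[6])
{
	uint8_t *p = buf;

	memset(buf, 0, ETH_MIN_FRAME);
	memset(p, 0xff, 6);  p += 6;		/* destination: broadcast */
	memcpy(p, mac, 6);   p += 6;		/* source: the guest NIC's MAC */
	*p++ = 0x08; *p++ = 0x06;		/* EtherType: ARP */
	*p++ = 0x00; *p++ = 0x01;		/* hardware type: Ethernet */
	*p++ = 0x08; *p++ = 0x00;		/* protocol type: IPv4 */
	*p++ = 6;    *p++ = 4;			/* hardware/protocol address lengths */
	*p++ = 0x00; *p++ = 0x01;		/* operation: request */
	memcpy(p, mac, 6);			/* sender hardware address */
	/* Sender IP, target MAC, target IP and the padding up to 60 bytes
	 * stay zero from the memset above. */
	return ETH_MIN_FRAME;
}
```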

 These ARPs should be sent by the guest.  Another question is
 how to force/tell it to do so, and this, again, depends
 on the guest operating system, the number of addresses assigned
 to the interface and so on.
 
 The mechanism to trigger it could be based on the link status
 of the card, for example: kvm could lower it for a few ms
 right after migration, to indicate that the cord was
 unplugged and plugged back in, forcing the guest to do
 whatever it needs to do...  But that's just a possibility
 for future development.
 
 /mjt

--
Gleb.


Re: [PATCH 2/3] S390: Add virtio hotplug add support

2010-08-25 Thread Heiko Carstens
On Wed, Aug 25, 2010 at 10:34:29AM +0200, Alexander Graf wrote:
 On 25.08.2010, at 10:35, Heiko Carstens wrote:
  On Wed, Aug 25, 2010 at 10:20:03AM +0200, Alexander Graf wrote:
  On 25.08.2010, at 10:16, Heiko Carstens wrote:
  On Tue, Aug 24, 2010 at 03:48:51PM +0200, Alexander Graf wrote:
  +static void hotplug_devices(struct work_struct *dummy)
  +{
  +unsigned int i;
  +struct kvm_device_desc *d;
  +struct device *dev;
  +
  +for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {
  
  This should be 
  
for (i = 0; i + desc_size(d) <= PAGE_SIZE; i += desc_size(d)) {
  
  otherwise you might have memory accesses beyond the device page...
  
  Oh, this is a simple copy&paste from the original search method.
  Is d valid in the first part of the loop already?
  
  No, you would need to initialize it with kvm_devices if you change
  the loop.
 
 But even then it would take the size of the current entry, not the next
 one we're actually looking for, no? Maybe it'd be easier to just convert
 this into a while loop and explicitly open code everything.

Yes indeed. It's going to be quite ugly if you want to make sure that at
no time you're going to access memory beyond the device page. Eeek.. :)


Re: [PATCH 2/3] S390: Add virtio hotplug add support

2010-08-25 Thread Alexander Graf

On 25.08.2010, at 10:48, Heiko Carstens wrote:

 On Wed, Aug 25, 2010 at 10:34:29AM +0200, Alexander Graf wrote:
 On 25.08.2010, at 10:35, Heiko Carstens wrote:
 On Wed, Aug 25, 2010 at 10:20:03AM +0200, Alexander Graf wrote:
 On 25.08.2010, at 10:16, Heiko Carstens wrote:
 On Tue, Aug 24, 2010 at 03:48:51PM +0200, Alexander Graf wrote:
 +static void hotplug_devices(struct work_struct *dummy)
 +{
 +unsigned int i;
 +struct kvm_device_desc *d;
 +struct device *dev;
 +
 +for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {
 
 This should be 
 
   for (i = 0; i + desc_size(d) <= PAGE_SIZE; i += desc_size(d)) {
 
 otherwise you might have memory accesses beyond the device page...
 
 Oh, this is a simple copy&paste from the original search method.
 Is d valid in the first part of the loop already?
 
 No, you would need to initialize it with kvm_devices if you change
 the loop.
 
 But even then it would take the size of the current entry, not the next
 one we're actually looking for, no? Maybe it'd be easier to just convert
 this into a while loop and explicitly open code everything.
 
 Yes indeed. It's going to be quite ugly if you want to make sure that at
 no time you're going to access memory beyond the device page. Eeek.. :)

Thinking about this a bit more ... it's no problem to access beyond the device 
page. After the device page follow the rings and we always have rings because 
we always have a virtio console :).


Alex



Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Nils Cant

On 08/25/2010 10:38 AM, Gleb Natapov wrote:

qemu sends a gratuitous ARP after migration. Check the forward delay setting on
your bridge interface; it should be set to zero.



Aha! That fixed it. It turns out that Debian's bridge-utils sets the default 
forward delay to 15 (seconds) for bridges.
Manually setting it to 0 with 'brctl setfd br0 0', or setting the 
'bridge_fd' parameter to 0 in /etc/network/interfaces, solves the issue.
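Presumably the fix works because a newly added bridge port (such as the migrated guest's tap device) does not forward frames until the forward delay has elapsed, so the announcement sent right after migration is dropped. `brctl setfd` changes the running bridge, while `bridge_fd` in /etc/network/interfaces makes the setting persistent. Linux also exposes the same setting through sysfs; the sketch below writes 0 to it. The attribute path (for a bridge named br0, /sys/class/net/br0/bridge/forward_delay) and the helper name are assumptions for illustration, writing the attribute requires root, and since the value is 0 the attribute's time unit does not matter.

```c
#include <assert.h>
#include <stdio.h>

/* Write "0" to a bridge forward_delay attribute such as
 * /sys/class/net/br0/bridge/forward_delay (path is an assumption about
 * the bridge name). Returns 0 on success, -1 on any error. */
static int set_bridge_forward_delay_zero(const char *attr_path)
{
	FILE *f = fopen(attr_path, "w");

	if (!f)
		return -1;
	if (fputs("0\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	/* sysfs reports write errors when the buffered data is flushed. */
	return fclose(f) == 0 ? 0 : -1;
}
```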


Thanks for the help!

Nils


Re: [PATCH kvm-unit-tests 07/10] Correct the tss size

2010-08-25 Thread Jason Wang

- Avi Kivity <a...@redhat.com> wrote:

 On 08/24/2010 04:47 PM, Jason Wang wrote:
  TSS size should be 104 bytes.
 
  Signed-off-by: Jason Wang <jasow...@redhat.com>
  ---
x86/cstart64.S |2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
 
  diff --git a/x86/cstart64.S b/x86/cstart64.S
  index 5d358ad..b871153 100644
  --- a/x86/cstart64.S
  +++ b/x86/cstart64.S
  @@ -69,7 +69,7 @@ tss:
  .long 0
  .quad ring0stacktop - i * 4096
 
 ring 0 stack
 
  .quad 0, 0, 0
 
 rings 1, 2, 3 stack

Hello Avi:

Rechecked with the manual: there's no field for RSP3, so this patch may
make sense. But unfortunately it breaks the 64-bit vmexit test. A triple
fault happens in setup_args(). Any suggestions, or is there anything I
missed?

 
  -   .quad 0, 0, 0, 0, 0, 0, 0, 0
 
 1 qword reserved, 7 qwords IST
 
  +   .quad 0, 0, 0, 0, 0, 0, 0
  .long 0, 0, 0
 
 3 dwords reserved + I/O map base address - so this looks correct?
 
i = i + 1
  .endr
 
 
 
 -- 
 error compiling committee.c: too many arguments to function
 


[PATCHv2 2/3] KVM: x86 emulator: move string instruction completion check into separate function

2010-08-25 Thread Gleb Natapov

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/emulate.c |   37 -
 1 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8a77c99..922f874 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2938,6 +2938,28 @@ done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
 
+static bool string_insn_completed(struct x86_emulate_ctxt *ctxt)
+{
+   struct decode_cache *c = &ctxt->decode;
+
+   /* The second termination condition only applies for REPE
+* and REPNE. Test if the repeat string operation prefix is
+* REPE/REPZ or REPNE/REPNZ and if it's the case it tests the
+* corresponding termination condition according to:
+*  - if REPE/REPZ and ZF = 0 then done
+*  - if REPNE/REPNZ and ZF = 1 then done
+*/
+   if (((c->b == 0xa6) || (c->b == 0xa7) ||
+        (c->b == 0xae) || (c->b == 0xaf))
+       && (((c->rep_prefix == REPE_PREFIX) &&
+            ((ctxt->eflags & EFLG_ZF) == 0))
+           || ((c->rep_prefix == REPNE_PREFIX) &&
+               ((ctxt->eflags & EFLG_ZF) == EFLG_ZF))))
+   return true;
+
+   return false;
+}
+
 int
 x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 {
@@ -3426,19 +3448,8 @@ writeback:
if (c->rep_prefix && (c->d & String)) {
struct read_cache *r = &ctxt->decode.io_read;
register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
-   /* The second termination condition only applies for REPE
-* and REPNE. Test if the repeat string operation prefix is
-* REPE/REPZ or REPNE/REPNZ and if it's the case it tests the
-* corresponding termination condition according to:
-*  - if REPE/REPZ and ZF = 0 then done
-*  - if REPNE/REPNZ and ZF = 1 then done
-*/
-   if (((c->b == 0xa6) || (c->b == 0xa7) ||
-        (c->b == 0xae) || (c->b == 0xaf))
-       && (((c->rep_prefix == REPE_PREFIX) &&
-            ((ctxt->eflags & EFLG_ZF) == 0))
-           || ((c->rep_prefix == REPNE_PREFIX) &&
-               ((ctxt->eflags & EFLG_ZF) == EFLG_ZF))))
+
+   if (string_insn_completed(ctxt))
ctxt->restart = false;
/*
 * Re-enter guest when pio read ahead buffer is empty or,
-- 
1.7.1
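The termination test that this patch moves into `string_insn_completed()` can be modeled on its own. The standalone version below uses a made-up `rep_kind` enum and `FLAG_ZF` constant instead of the emulator's `REPE_PREFIX`/`REPNE_PREFIX` and `EFLG_ZF`, but applies the same rule: only CMPS (0xa6/0xa7) and SCAS (0xae/0xaf) consult ZF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the emulator's REPE_PREFIX/REPNE_PREFIX. */
enum rep_kind { REP_NONE, REP_E, REP_NE };

#define FLAG_ZF (1u << 6)	/* ZF is bit 6 of EFLAGS */

/* Second REP termination condition. Only CMPS (0xa6/0xa7) and SCAS
 * (0xae/0xaf) check it: REPE/REPZ stops when ZF == 0, REPNE/REPNZ stops
 * when ZF == 1. The count-reaches-zero condition is checked elsewhere. */
static bool rep_cond_done(uint8_t opcode, enum rep_kind rep, uint32_t eflags)
{
	bool cmps_or_scas = opcode == 0xa6 || opcode == 0xa7 ||
			    opcode == 0xae || opcode == 0xaf;

	if (!cmps_or_scas)
		return false;
	if (rep == REP_E)
		return (eflags & FLAG_ZF) == 0;
	if (rep == REP_NE)
		return (eflags & FLAG_ZF) != 0;
	return false;
}
```

For example, `repe cmpsb` stops at the first mismatching byte (ZF cleared), while `rep movsb` (0xa4) only stops when the count runs out.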



[PATCHv2 1/3] KVM: x86 emulator: Rename variable that shadows another local variable.

2010-08-25 Thread Gleb Natapov

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/emulate.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4a89348..8a77c99 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3424,7 +3424,7 @@ writeback:
&c->dst);
 
if (c->rep_prefix && (c->d & String)) {
-   struct read_cache *rc = &ctxt->decode.io_read;
+   struct read_cache *r = &ctxt->decode.io_read;
register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
/* The second termination condition only applies for REPE
 * and REPNE. Test if the repeat string operation prefix is
@@ -3444,8 +3444,8 @@ writeback:
 * Re-enter guest when pio read ahead buffer is empty or,
 * if it is not used, after each 1024 iteration.
 */
-   else if ((rc->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
-            (rc->end != 0 && rc->end == rc->pos)) {
+   else if ((r->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
+            (r->end != 0 && r->end == r->pos)) {
ctxt->restart = false;
c->eip = ctxt->eip;
}
-- 
1.7.1



[PATCHv2 3/3] KVM: x86 emulator: get rid of restart in emulation context.

2010-08-25 Thread Gleb Natapov
x86_emulate_insn() will return 1 if the instruction can be restarted
without re-entering the guest.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |4 ++-
 arch/x86/kvm/emulate.c |   43 
 arch/x86/kvm/x86.c |   16 ++--
 3 files changed, 30 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index f22e5da..3a82ecd 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -220,7 +220,6 @@ struct x86_emulate_ctxt {
/* interruptibility state, as a result of execution of STI or MOV SS */
int interruptibility;
 
-   bool restart; /* restart string instruction after writeback */
bool perm_ok; /* do not check permissions if true */
 
int exception; /* exception that happens during emulation or -1 */
@@ -251,6 +250,9 @@ struct x86_emulate_ctxt {
 #endif
 
 int x86_decode_insn(struct x86_emulate_ctxt *ctxt);
+#define EMULATION_FAILED -1
+#define EMULATION_OK 0
+#define EMULATION_RESTART 1
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt);
 int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 u16 tss_selector, int reason,
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 922f874..a4cf1f6 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -438,7 +438,6 @@ static void emulate_exception(struct x86_emulate_ctxt *ctxt, int vec,
ctxt->exception = vec;
ctxt->error_code = error;
ctxt->error_code_valid = valid;
-   ctxt->restart = false;
 }
 
 static void emulate_gp(struct x86_emulate_ctxt *ctxt, int err)
@@ -2635,9 +2634,6 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt)
struct opcode opcode, *g_mod012, *g_mod3;
struct operand memop = { .type = OP_NONE };
 
-   /* we cannot decode insn before we complete previous rep insn */
-   WARN_ON(ctxt->restart);
-
c->eip = ctxt->eip;
c->fetch.start = c->fetch.end = c->eip;
ctxt->cs_base = seg_base(ctxt, ops, VCPU_SREG_CS);
@@ -2990,10 +2986,8 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
}
 
if (c->rep_prefix && (c->d & String)) {
-   ctxt->restart = true;
/* All REP prefixes have the same first termination condition */
if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) {
-   ctxt->restart = false;
ctxt->eip = c->eip;
goto done;
}
@@ -3449,28 +3443,29 @@ writeback:
struct read_cache *r = &ctxt->decode.io_read;
register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
 
-   if (string_insn_completed(ctxt))
-   ctxt->restart = false;
-   /*
-* Re-enter guest when pio read ahead buffer is empty or,
-* if it is not used, after each 1024 iteration.
-*/
-   else if ((r->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
-            (r->end != 0 && r->end == r->pos)) {
-   ctxt->restart = false;
-   c->eip = ctxt->eip;
+   if (!string_insn_completed(ctxt)) {
+   /*
+* Re-enter guest when pio read ahead buffer is empty
+* or, if it is not used, after each 1024 iteration.
+*/
+   if ((r->end != 0 || c->regs[VCPU_REGS_RCX] & 0x3ff) &&
+       (r->end == 0 || r->end != r->pos)) {
+   /*
+* Reset read cache. Usually happens before
+* decode, but since instruction is restarted
+* we have to do it here.
+*/
+   ctxt->decode.mem_read.end = 0;
+   return EMULATION_RESTART;
+   }
+   goto done; /* skip rip writeback */
}
}
-   /*
-* reset read cache here in case string instruction is restared
-* without decoding
-*/
-   ctxt->decode.mem_read.end = 0;
-   if (!ctxt->restart)
-   ctxt->eip = c->eip;
+
+   ctxt->eip = c->eip;
 
 done:
-   return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
+   return (rc == X86EMUL_UNHANDLEABLE) ? EMULATION_FAILED : EMULATION_OK;
 
 twobyte_insn:
switch (c->b) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 524bd3f..112213e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4171,18 +4171,17 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 restart:
r = x86_emulate_insn(&vcpu->arch.emulate_ctxt);
 
-   if (r) { /* emulation failed */
+   if (r == EMULATION_FAILED) {
if 
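The archived x86.c hunk is cut off at this point. As a hypothetical illustration of how a caller can consume the three return codes the patch defines, the loop below keeps emulating in place while `EMULATION_RESTART` is returned, in the spirit of the `restart:` label in `emulate_instruction()`. `fake_emulate_insn()` and `run_emulation()` are stand-ins written for this sketch, not KVM functions.

```c
#include <assert.h>

/* Return codes as defined in the kvm_emulate.h hunk of this patch. */
#define EMULATION_FAILED -1
#define EMULATION_OK 0
#define EMULATION_RESTART 1

/* Hypothetical stand-in for x86_emulate_insn(): performs one iteration of a
 * REP string instruction per call and asks to be restarted until the
 * remaining count reaches zero. */
static int fake_emulate_insn(unsigned int *rcx)
{
	if (*rcx == 0)
		return EMULATION_OK;
	--*rcx;
	return *rcx ? EMULATION_RESTART : EMULATION_OK;
}

/* Re-run the emulator without re-entering the guest while it returns
 * EMULATION_RESTART. Returns the number of emulator calls, or -1 on failure. */
static int run_emulation(unsigned int *rcx)
{
	int calls = 0;
	int r;

restart:
	r = fake_emulate_insn(rcx);
	calls++;
	if (r == EMULATION_FAILED)
		return -1;
	if (r == EMULATION_RESTART)
		goto restart;
	return calls;
}
```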

Re: [PATCH kvm-unit-tests 07/10] Correct the tss size

2010-08-25 Thread Avi Kivity

 On 08/25/2010 12:40 PM, Jason Wang wrote:

- Avi Kivity <a...@redhat.com> wrote:


On 08/24/2010 04:47 PM, Jason Wang wrote:

TSS size should be 104 bytes.

Signed-off-by: Jason Wang <jasow...@redhat.com>
---
   x86/cstart64.S |2 +-
   1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/x86/cstart64.S b/x86/cstart64.S
index 5d358ad..b871153 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -69,7 +69,7 @@ tss:
.long 0
.quad ring0stacktop - i * 4096

ring 0 stack


.quad 0, 0, 0

rings 1, 2, 3 stack

Hello Avi:

Rechecked with the manual: there's no field for RSP3, so this patch may
make sense.


That is true.  But please redo it to remove one 0 from the line above, 
not from the IST.
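Avi's suggested layout can be cross-checked against a C description of the 64-bit TSS: a reserved dword, RSP0 to RSP2 (there is no RSP3, since ring 3 is never the target of a stack switch), a reserved qword, seven IST pointers, a reserved qword and word, and the 16-bit I/O map base. In the assembly that is `.long 0`, the RSP0 quad, two more quads, eight quads (reserved plus IST) and `.long 0, 0, 0`, i.e. 4 + 8 + 16 + 64 + 12 = 104 bytes. The struct name is illustrative, and `__attribute__((packed))` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit TSS layout (Intel SDM Vol. 3, task management in 64-bit mode).
 * Packed so the compiler cannot insert padding before the 64-bit fields. */
struct tss64 {
	uint32_t reserved0;
	uint64_t rsp0;		/* ".quad ring0stacktop - i * 4096" */
	uint64_t rsp1;
	uint64_t rsp2;		/* no rsp3 field */
	uint64_t reserved1;
	uint64_t ist[7];	/* IST1..IST7 */
	uint64_t reserved2;
	uint16_t reserved3;
	uint16_t iomap_base;
} __attribute__((packed));

_Static_assert(sizeof(struct tss64) == 104, "64-bit TSS must be 104 bytes");
_Static_assert(offsetof(struct tss64, ist) == 0x24, "IST1 lives at offset 0x24");
```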



But unfortunately it breaks the 64-bit vmexit test. A triple
fault happens in setup_args(). Any suggestions, or is there anything I
missed?


No idea.  Can you post an ftrace of the crash?


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: guest MAC-address isolation

2010-08-25 Thread Avi Kivity

 On 08/20/2010 08:48 PM, Robert Rebstock wrote:

Hello.
Thank you for your answer.


- Original Message -
From: Avi Kivity <a...@redhat.com>
To: Robert Rebstock <rebst...@scienceworks.com>
Cc: kvm@vger.kernel.org
Sent: Tuesday, August 17, 2010 11:36:41 AM
Subject: Re: guest MAC-address isolation

   On 08/06/2010 08:09 PM, Robert Rebstock wrote:

Hello all,

can anyone recommend a better way to achieve (guest agnostic) MAC-address
isolation in qemu/kvm than with user-mode networking?

I have multiple guests requiring the same MAC-address, and user-mode/slirp
networking is quite slow.


You can put the different guests on different bridges, and use IP
routing to connect the two bridges; or you can use ebtables to mangle
the MAC addresses.


Could you possibly give me an example? Unfortunately my networking skills are
not the best, which is not to say that I don't try. The best I can do, after
reading the documentation I could find, is:

ebtables -t nat -A PREROUTING  -d 00:11:11:11:11:11 -j dnat --to-dest 00:01:23:45:67:89 --dnat-target ACCEPT
ebtables -t nat -A POSTROUTING -s 00:01:23:45:67:89 -j snat --to-src 00:11:11:11:11:11 --snat-arp --snat-target ACCEPT

but I can see no way to mangle multiple identical MACs so as to achieve layer-2
isolation for my snapshotted VMs.



You could use --in-interface to select packets based on which guest they 
originated from (for snat).




Re: [PATCHv2 3/3] KVM: x86 emulator: get rid of restart in emulation context.

2010-08-25 Thread Avi Kivity

 On 08/25/2010 12:47 PM, Gleb Natapov wrote:

x86_emulate_insn() will return 1 if the instruction can be restarted
without re-entering the guest.


Looks good.
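The restart convention can be illustrated with a self-contained toy. This is not KVM code. The emulator core returns 1 when the instruction should be re-executed. The caller then jumps back to its restart: label instead of returning to the guest, which is the same shape as the emulate_instruction() fragment quoted earlier in this section. EMULATION_FAILED appears in that fragment. Its value, the EMULATION_OK and EMULATION_RESTART names, and the toy_* helpers are assumptions made for this sketch.

```c
/*
 * Toy model of the restart convention (not KVM code).  Only
 * EMULATION_FAILED and "restart == 1" come from the thread; the
 * other names and values are assumptions.
 */
#define EMULATION_FAILED  (-1)
#define EMULATION_OK       0
#define EMULATION_RESTART  1

struct toy_ctxt {
	unsigned long rcx;	/* remaining iterations of a REP-style insn */
	unsigned long done;	/* iterations emulated so far */
};

/* Emulate one iteration; ask for a restart while work remains. */
static int toy_emulate_insn(struct toy_ctxt *ctxt)
{
	if (ctxt->rcx == 0)
		return EMULATION_OK;
	ctxt->rcx--;
	ctxt->done++;
	return ctxt->rcx ? EMULATION_RESTART : EMULATION_OK;
}

/* Caller shaped like emulate_instruction(): loop without a guest entry. */
static int toy_emulate_instruction(struct toy_ctxt *ctxt)
{
	int r;

restart:
	r = toy_emulate_insn(ctxt);
	if (r == EMULATION_FAILED)
		return -1;		/* never happens in this toy */
	if (r == EMULATION_RESTART)
		goto restart;		/* re-run the insn, no VM entry */
	return 0;
}
```

With rcx = 3, the caller loops three times and returns 0, all without leaving the emulator.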



Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Avi Kivity

 On 08/25/2010 12:21 PM, Nils Cant wrote:

On 08/25/2010 10:38 AM, Gleb Natapov wrote:
qemu sends a gratuitous ARP after migration. Check the forward delay
setting on your bridge interface. It should be set to zero.



Aha! That fixed it. It turns out that Debian's bridge-utils sets the
default forward delay to 15 seconds for bridges.
Manually setting it to 0 with 'brctl setfd br0 0' or setting the 
'bridge_fd' parameter to 0 in /etc/network/interfaces solves the issue.




I think libvirt is doing something about this, copying list for further 
info.




Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Michael Tokarev
Gleb Natapov wrote:
 On Wed, Aug 25, 2010 at 12:37:21PM +0400, Michael Tokarev wrote:
[]
 Jokes aside, the thing is that kvm does not know what is
 an ARP and what is an IP address.  It emulates a hardware
 network card, which never sends any ARP out on its own;
 it is the operating system's IP stack that does that.
 That network card as emulated by kvm does not know what
 IP addresses are assigned to it inside the guest (there
 may be many, or none at all), so it simply cannot
 send the ARPs.

 True. Although qemu sends a gratuitous ARP, the IP field there is
 incorrect. It is done to update layer 2 topology, not layer 3.

Actually, the more I think about that, the more it looks
like a job for a piece external to the guest.

For example, we may teach libvirt or kvm about IP addresses
the guest is using, so that kvm will send these ARPs automatically
after migration has completed.

It shouldn't be difficult to implement.  Something like:

 -net nic,model=virtio,arp=1.2.3.4:5.6.7.8,mac=foo:bar

or, even,

 -net tap,arp=...,...

for the command-line interface, and/or a 'sendarp' monitor
command that expects a network device and a list of ip
addresses.

Kvm is the most natural place to do that, I think, and it's
easy to implement there too (it has the tun device which can
inject packets on behalf of the guest). Yes, the configuration
will be duplicated somehow, but that's not a big problem, and
it will make things much more reliable.
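To make the proposal concrete, here is a minimal sketch of the frame that such a feature would need to inject. It is a gratuitous ARP request broadcast from the guest's MAC, with the sender and target protocol addresses both set to the announced IPv4 address (ARP packet layout per RFC 826). This is not qemu code. build_garp() and the constants are hypothetical, and writing the frame to the tap device is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN        6
#define GARP_FRAME_LEN  60	/* minimum Ethernet frame, without FCS */

/* Store a 16-bit value in network (big-endian) byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/*
 * Hypothetical helper: build a gratuitous ARP request announcing
 * ip -> mac.  buf must hold GARP_FRAME_LEN bytes.  Returns the length.
 */
static size_t build_garp(uint8_t *buf, const uint8_t mac[ETH_ALEN],
			 const uint8_t ip[4])
{
	uint8_t *arp = buf + 14;		/* ARP payload after Ethernet header */

	memset(buf, 0, GARP_FRAME_LEN);		/* also zero-fills the padding */
	memset(buf, 0xff, ETH_ALEN);		/* destination: broadcast */
	memcpy(buf + 6, mac, ETH_ALEN);		/* source: guest MAC */
	put_be16(buf + 12, 0x0806);		/* EtherType: ARP */

	put_be16(arp + 0, 1);			/* HTYPE: Ethernet */
	put_be16(arp + 2, 0x0800);		/* PTYPE: IPv4 */
	arp[4] = ETH_ALEN;			/* HLEN */
	arp[5] = 4;				/* PLEN */
	put_be16(arp + 6, 1);			/* OPER: request */
	memcpy(arp + 8, mac, ETH_ALEN);		/* sender hardware address */
	memcpy(arp + 14, ip, 4);		/* sender protocol address */
	/* target hardware address (arp + 18) stays zero */
	memcpy(arp + 24, ip, 4);		/* target IP == sender IP: gratuitous */
	return GARP_FRAME_LEN;
}
```

As Gleb notes in the follow-ups, the hard part is not building this frame but knowing which IP addresses to put in it.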

/mjt


Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Daniel P. Berrange
On Wed, Aug 25, 2010 at 01:40:19PM +0300, Avi Kivity wrote:
  On 08/25/2010 12:21 PM, Nils Cant wrote:
 On 08/25/2010 10:38 AM, Gleb Natapov wrote:
 qemu sends a gratuitous ARP after migration. Check the forward delay
 setting on your bridge interface. It should be set to zero.
 
 
 Aha! That fixed it. It turns out that Debian's bridge-utils sets the
 default forward delay to 15 seconds for bridges.
 Manually setting it to 0 with 'brctl setfd br0 0' or setting the 
 'bridge_fd' parameter to 0 in /etc/network/interfaces solves the issue.
 
 
 I think libvirt is doing something about this, copying list for further 
 info.

libvirt doesn't set a policy for this. It provides an API for 
configuring host networking, but we don't override the kernel's
forward delay policy, since we don't presume that all bridges 
are going to have VMs attached. In any case the API isn't available
for Debian yet, since no one has ported netcf to Debian, so I 
assume the OP set up bridging manually. The 15-second default is
actually a kernel-level default, IIRC.

The two main host network configs recommended for use with libvirt+KVM
(either NAT or bridging) are documented here:

  http://wiki.libvirt.org/page/Networking

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Gleb Natapov
On Wed, Aug 25, 2010 at 02:51:31PM +0400, Michael Tokarev wrote:
 Gleb Natapov wrote:
  On Wed, Aug 25, 2010 at 12:37:21PM +0400, Michael Tokarev wrote:
 []
  Jokes aside, the thing is that kvm does not know what is
  an ARP and what is an IP address.  It emulates a hardware
  network card, which never sends any ARP out on its own;
  it is the operating system's IP stack that does that.
  That network card as emulated by kvm does not know what
  IP addresses are assigned to it inside the guest (there
  may be many, or none at all), so it simply cannot
  send the ARPs.
 
  True. Although qemu sends a gratuitous ARP, the IP field there is
  incorrect. It is done to update layer 2 topology, not layer 3.
 
 Actually, the more I think about that, the more it looks
 like a job for a piece external to the guest.
 
 For example, we may teach libvirt or kvm about IP addresses
 the guest is using, so that kvm will send these ARPs automatically
 after migration has completed.
 
 It shouldn't be difficult to implement.  Something like:
 
  -net nic,model=virtio,arp=1.2.3.4:5.6.7.8,mac=foo:bar
 
Back to the static IP age?

 or, even,
 
  -net tap,arp=...,...
 
 for the command-line interface, and/or a 'sendarp' monitor
 command that expects a network device and a list of ip
 addresses.
 
 Kvm is the most natural place to do that, I think, and it's
 easy to implement there too (it has the tun device which can
 inject packets on behalf of the guest). Yes, the configuration
 will be duplicated somehow, but that's not a big problem, and
 it will make things much more reliable.
 
KVM is certainly not the most natural place to do that. Even the gratuitous
ARP we send today will not work if the guest changes its MAC address. KVM
couldn't care less about host network protocols. Management may implement a
guest daemon that will take appropriate action to restore networking
after migration on demand (send a gratuitous pigeon if IP over pigeons is
used by the guest).

--
Gleb.


Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Avi Kivity

 On 08/25/2010 01:52 PM, Daniel P. Berrange wrote:



I think libvirt is doing something about this, copying list for further
info.

libvirt doesn't set a policy for this. It provides an API for
configuring host networking, but we don't override the kernel's
forward delay policy, since we don't presume that all bridges
are going to have VMs attached. In any case the API isn't available
for Debian yet, since no one has ported netcf to Debian, so I
assume the OP set up bridging manually. The 15-second default is
actually a kernel-level default, IIRC.

The two main host network configs recommended for use with libvirt+KVM
(either NAT or bridging) are documented here:

   http://wiki.libvirt.org/page/Networking


From that page:

# virsh net-define /usr/share/libvirt/networks/default.xml

From my copy of that file:

<network>
  <name>default</name>
  <bridge name="virbr0" />
  <forward/>
  <ip address="192.168.122.1" netmask="255.255.255.0">
    <dhcp>
      <range start="192.168.122.2" end="192.168.122.254" />
    </dhcp>
  </ip>
</network>

So it looks like the default config uses the kernel default?  If libvirt 
uses an existing bridge I agree it shouldn't hack it, but if it creates 
its own can't it use a sensible default?





Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Gleb Natapov
On Wed, Aug 25, 2010 at 01:59:02PM +0300, Gleb Natapov wrote:
  Kvm is the most natural place to do that, I think, and it's
  easy to implement there too (it has the tun device which can
  inject packets on behalf of the guest). Yes, the configuration
  will be duplicated somehow, but that's not a big problem, and
  it will make things much more reliable.
  
 KVM is certainly not the most natural place to do that. Even the gratuitous
 ARP we send today will not work if the guest changes its MAC address. KVM
 couldn't care less about host network protocols. Management may implement
Correction: couldn't care less about _guest_ network protocols

 a guest daemon that will take appropriate action to restore networking
 after migration on demand (send a gratuitous pigeon if IP over pigeons is
 used by the guest).
 

--
Gleb.


Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Daniel P. Berrange
On Wed, Aug 25, 2010 at 02:05:45PM +0300, Avi Kivity wrote:
  On 08/25/2010 01:52 PM, Daniel P. Berrange wrote:
 
 I think libvirt is doing something about this, copying list for further
 info.
 libvirt doesn't set a policy for this. It provides an API for
 configuring host networking, but we don't override the kernel's
 forward delay policy, since we don't presume that all bridges
 are going to have VMs attached. In any case the API isn't available
 for Debian yet, since no one has ported netcf to Debian, so I
 assume the OP set up bridging manually. The 15-second default is
 actually a kernel-level default, IIRC.
 
 The two main host network configs recommended for use with libvirt+KVM
 (either NAT or bridging) are documented here:
 
http://wiki.libvirt.org/page/Networking
 
 From that page:
 
 # virsh net-define /usr/share/libvirt/networks/default.xml
 
 From my copy of that file:
 
 <network>
   <name>default</name>
   <bridge name="virbr0" />
   <forward/>
   <ip address="192.168.122.1" netmask="255.255.255.0">
     <dhcp>
       <range start="192.168.122.2" end="192.168.122.254" />
     </dhcp>
   </ip>
 </network>
 
 So it looks like the default config uses the kernel default?  If libvirt 
 uses an existing bridge I agree it shouldn't hack it, but if it creates 
 its own can't it use a sensible default?

That is the NAT virtual network. That one *does* default to a forward
delay of 0, but since it is NAT, it is fairly useless for migration
in any case. If you run 'virsh net-dumpxml default' you should see that
delay='0' was added.

The OP was using bridging rather than NAT though, so this XML example
doesn't apply. My comments about libvirt not overriding kernel policy
for forward delay were WRT full bridging mode, not the NAT mode[1].

Regards,
Daniel

[1] Yes, the NAT mode uses a bridge as an implementation detail, but
there's no physical NIC in that bridge; it merely connects the TAP
devices together. Traffic to the LAN is forwarded and NATed.


Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Avi Kivity

 On 08/25/2010 02:15 PM, Daniel P. Berrange wrote:



So it looks like the default config uses the kernel default?  If libvirt
uses an existing bridge I agree it shouldn't hack it, but if it creates
its own can't it use a sensible default?

That is the NAT virtual network. That one *does* default to a forward
delay of 0, but since it is NAT, it is fairly useless for migration
in any case. If you run 'virsh net-dumpxml default' you should see that
delay='0' was added.

The OP was using bridging rather than NAT though, so this XML example
doesn't apply. My comments about libvirt not overriding kernel policy
for forward delay were WRT full bridging mode, not the NAT mode[1].


Yes, of course.

Can't libvirt also create a non-NAT bridge?  Looks like it would prevent 
a lot of manual work and opportunity for misconfiguration.




Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Daniel P. Berrange
On Wed, Aug 25, 2010 at 02:30:01PM +0300, Avi Kivity wrote:
  On 08/25/2010 02:15 PM, Daniel P. Berrange wrote:
 
 So it looks like the default config uses the kernel default?  If libvirt
 uses an existing bridge I agree it shouldn't hack it, but if it creates
 its own can't it use a sensible default?
 That is the NAT virtual network. That one *does* default to a forward
 delay of 0, but since it is NAT, it is fairly useless for migration
 in any case. If you run 'virsh net-dumpxml default' you should see that
 delay='0' was added.
 
 The OP was using bridging rather than NAT though, so this XML example
 doesn't apply. My comments about libvirt not overriding kernel policy
 for forward delay were WRT full bridging mode, not the NAT mode[1].
 
 Yes, of course.
 
 Can't libvirt also create a non-NAT bridge?  Looks like it would prevent 
 a lot of manual work and opportunity for misconfiguration.

Yes, it can on the latest Fedora/RHEL 6, using the netcf library. This is the
new 'virsh iface-XXX' command set (and equivalent APIs). I've not updated
the docs to cover this functionality yet, though. It also handles bonding,
VLANs, etc.

Daniel


Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Michael Tokarev
Gleb Natapov wrote:
 On Wed, Aug 25, 2010 at 02:51:31PM +0400, Michael Tokarev wrote:
[]
 For example, we may teach libvirt or kvm about IP addresses
 the guest is using, so that kvm will send these ARPs automatically
 after migration has completed.
[]
 Kvm is the most natural place to do that, I think, and it's
 easy to implement there too (it has the tun device which can
 inject packets on behalf of the guest). Yes, the configuration
 will be duplicated somehow, but that's not a big problem, and
 it will make things much more reliable.

 KVM is certainly not the most natural place to do that. Even the gratuitous
 ARP we send today will not work if the guest changes its MAC address. KVM
 couldn't care less about host network protocols. Management may implement a
 guest daemon that will take appropriate action to restore networking
 after migration on demand (send a gratuitous pigeon if IP over pigeons is
 used by the guest).

I mean something else.  In the standard, most common configuration,
without fancy settings or technologies like IP over pigeons, the easiest
way to do that is in kvm; it should be just about 20 lines of code or so.
Yes, that will not work in some complex setups where an in-guest solution will
be needed, but in that case a guest daemon alone won't help; it will need
to run some script to do custom actions.

For MAC address changes, for example, the solution is simple: don't
change the MAC address in the guest.  Or if you do, either teach kvm about that
(so it'll send the proper ARP), or implement a custom solution in the guest, or
don't migrate, or live with delays after migration.  There are multiple
choices.

Yet doing it the simplest way (in kvm) will cover some 99% of cases, and doing
it as a daemon in the guest will cover the same 99% of cases anyway (for
the rest some custom script will be needed).

So I think it's best to implement it in kvm in the most straightforward
and easy way.

Yes, some guest notification is probably needed anyway, not only for this
case with networks but also in order to notify the guest about, say, resume
from freeze (after loadvm or migrate from file), after migration and so
on, so the guest can react to such events in a meaningful way.  But this is
in parallel with the ability to send an ARP after migration.

Just IMHO, of course.

/mjt


Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Avi Kivity

 On 08/25/2010 02:36 PM, Daniel P. Berrange wrote:



Can't libvirt also create a non-NAT bridge?  Looks like it would prevent
a lot of manual work and opportunity for misconfiguration.

Yes, it can on the latest Fedora/RHEL 6, using the netcf library. This is the
new 'virsh iface-XXX' command set (and equivalent APIs). I've not updated
the docs to cover this functionality yet, though. It also handles bonding,
VLANs, etc.


Great.

Is virt-manager able to drive this?  It would be great if you could
drive everything from there.




Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Daniel P. Berrange
On Wed, Aug 25, 2010 at 02:38:25PM +0300, Avi Kivity wrote:
  On 08/25/2010 02:36 PM, Daniel P. Berrange wrote:
 
 Can't libvirt also create a non-NAT bridge?  Looks like it would prevent
 a lot of manual work and opportunity for misconfiguration.
 Yes, it can on the latest Fedora/RHEL 6, using the netcf library. This is the
 new 'virsh iface-XXX' command set (and equivalent APIs). I've not updated
 the docs to cover this functionality yet, though. It also handles bonding,
 VLANs, etc.
 
 Great.
 
 Is virt-manager able to drive this?  It would be great if you could
 drive everything from there.

Yes, it does now, under the menu Edit -> Host Details -> Network Interfaces.
NetworkManager has also finally learnt to ignore ifcfg-XXX files which
have a BRIDGE= setting in them, so it shouldn't totally trash your guest
bridge networking if you leave NM running.

Regards,
Daniel


Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Avi Kivity

 On 08/25/2010 02:42 PM, Daniel P. Berrange wrote:



Is virt-manager able to drive this?  it would be great if you could
drive everything from there.

Yes, it does now, under the menu Edit -> Host Details -> Network Interfaces.
NetworkManager has also finally learnt to ignore ifcfg-XXX files which
have a BRIDGE= setting in them, so it shouldn't totally trash your guest
bridge networking if you leave NM running.


Cool.  I guess what remains is to get people to unlearn all the previous 
hacks.


(It would also be nice to have libvirt talk to NetworkManager instead of
/etc/sysconfig.)




Re: KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Gleb Natapov
On Wed, Aug 25, 2010 at 03:36:29PM +0400, Michael Tokarev wrote:
 Gleb Natapov wrote:
  On Wed, Aug 25, 2010 at 02:51:31PM +0400, Michael Tokarev wrote:
 []
  For example, we may teach libvirt or kvm about IP addresses
  the guest is using, so that kvm will send these ARPs automatically
  after migration has completed.
 []
  Kvm is the most natural place to do that, I think, and it's
  easy to implement there too (it has the tun device which can
  inject packets on behalf of the guest). Yes, the configuration
  will be duplicated somehow, but that's not a big problem, and
  it will make things much more reliable.
 
  KVM is certainly not the most natural place to do that. Even the gratuitous
  ARP we send today will not work if the guest changes its MAC address. KVM
  couldn't care less about host network protocols. Management may implement a
  guest daemon that will take appropriate action to restore networking
  after migration on demand (send a gratuitous pigeon if IP over pigeons is
  used by the guest).
 
 I mean something else.  In the standard, most common configuration,
 without fancy settings or technologies like IP over pigeons, the easiest
 way to do that is in kvm; it should be just about 20 lines of code or so.
 Yes, that will not work in some complex setups where an in-guest solution will
 be needed, but in that case a guest daemon alone won't help; it will need
 to run some script to do custom actions.
 
 For MAC address changes, for example, the solution is simple: don't
 change the MAC address in the guest.  Or if you do, either teach kvm about that
 (so it'll send the proper ARP), or implement a custom solution in the guest, or
 don't migrate, or live with delays after migration.  There are multiple
 choices.
 
 Yet doing it the simplest way (in kvm) will cover some 99% of cases, and doing
 it as a daemon in the guest will cover the same 99% of cases anyway (for
 the rest some custom script will be needed).
 
 So I think it's best to implement it in kvm in the most straightforward
 and easy way.
 
 Yes, some guest notification is probably needed anyway, not only for this
 case with networks but also in order to notify the guest about, say, resume
 from freeze (after loadvm or migrate from file), after migration and so
 on, so the guest can react to such events in a meaningful way.  But this is
 in parallel with the ability to send an ARP after migration.
 
 Just IMHO, of course.
 
The most common case is for the guest to use DHCP to obtain its IP dynamically, so
KVM cannot know what IP to use without sniffing the network. And no, we do
not want to pass MAC/IP from guest to host. For that a guest agent is
needed anyway, and it can send a gratuitous ARP by itself.

--
Gleb.


Re: [PATCH kvm-unit-tests 07/10] Correct the tss size

2010-08-25 Thread Jason Wang

- Avi Kivity a...@redhat.com wrote:

 On 08/25/2010 12:40 PM, Jason Wang wrote:
  - Avi Kivity a...@redhat.com wrote:
 
  On 08/24/2010 04:47 PM, Jason Wang wrote:
  TSS size should be 104 bytes.
 
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
 x86/cstart64.S |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
 
  diff --git a/x86/cstart64.S b/x86/cstart64.S
  index 5d358ad..b871153 100644
  --- a/x86/cstart64.S
  +++ b/x86/cstart64.S
  @@ -69,7 +69,7 @@ tss:
.long 0
.quad ring0stacktop - i * 4096
  ring 0 stack
 
.quad 0, 0, 0
  rings 1, 2, 3 stack
  Hello avi:
 
  Rechecked with the manual: there's no RSP3 field. So this patch may
  make sense.
 
 That is true.  But please redo it to remove one 0 from the line above,
 not from the IST.
 
  But unfortunately it breaks the 64-bit vmexit test: a triple
  fault happens in setup_args(). Any suggestions, or is there anything
  I missed?
 
 No idea.  Can you post an ftrace of the crash?
 

The trace before triple fault:

..
qemu-kvm-8101  [002]   243.138507: kvm_entry: vcpu 0
qemu-kvm-8101  [002]   243.138508: kvm_exit: reason IO_INSTRUCTION rip 0x400e5f
qemu-kvm-8101  [002]   243.138508: kvm_pio: pio_read at 0x510 size 2 count 1
qemu-kvm-8101  [002]   243.138512: kvm_entry: vcpu 0
qemu-kvm-8101  [002]   243.138513: kvm_exit: reason IO_INSTRUCTION rip 0x400e71
qemu-kvm-8101  [002]   243.138514: kvm_emulate_insn: 0:400e71: ec (prot64)
qemu-kvm-8101  [002]   243.138515: kvm_pio: pio_write at 0x511 size 1 count 1
qemu-kvm-8101  [002]   243.138519: kvm_entry: vcpu 0
qemu-kvm-8101  [002]   243.138520: kvm_exit: reason IO_INSTRUCTION rip 0x400e71
qemu-kvm-8101  [002]   243.138521: kvm_emulate_insn: 0:400e71: ec (prot64)
qemu-kvm-8101  [002]   243.138521: kvm_pio: pio_write at 0x511 size 1 count 1
qemu-kvm-8101  [002]   243.138525: kvm_entry: vcpu 0
qemu-kvm-8101  [002]   243.138526: kvm_exit: reason CPUID rip 0x400ff7
qemu-kvm-8101  [002]   243.138526: kvm_cpuid: func 1 rax 6d3 rbx 800 rcx 80002001 rdx 78bfbfd
qemu-kvm-8101  [002]   243.138527: kvm_entry: vcpu 0
qemu-kvm-8101  [002]   243.138528: kvm_exit: reason EXCEPTION_NMI rip 0x400271
qemu-kvm-8101  [002]   243.138528: kvm_page_fault: address 40f3a0 error_code b
qemu-kvm-8101  [002]   243.138530: kvm_entry: vcpu 0
qemu-kvm-8101  [002]   243.138531: kvm_exit: reason TRIPLE_FAULT rip 0x400c15
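The error_code b on the kvm_page_fault line is a raw x86 page-fault error code. A small decoder reads 0xb as P=1, W/R=1, U/S=0, RSVD=1, meaning a supervisor-mode write that faulted because a reserved bit was set in a paging-structure entry. The bit meanings come from the Intel SDM; the macro and struct names are illustrative. The decoder only interprets the logged value. Whether those bits describe the guest's own page tables or KVM's shadow tables depends on the paging mode the host was using, and the decoded value does not by itself explain the triple fault.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault (#PF) error code bits, per the Intel SDM. */
#define PFERR_PRESENT (1u << 0)	/* 0: non-present page, 1: protection violation */
#define PFERR_WRITE   (1u << 1)	/* faulting access was a write */
#define PFERR_USER    (1u << 2)	/* faulting access was made at CPL 3 */
#define PFERR_RSVD    (1u << 3)	/* reserved bit set in a paging-structure entry */
#define PFERR_FETCH   (1u << 4)	/* faulting access was an instruction fetch */

struct pf_info {
	bool protection;
	bool write;
	bool user;
	bool reserved_bit;
	bool fetch;
};

/* Split a raw error code (as printed in the trace) into its flags. */
static struct pf_info decode_pf_error(uint32_t ec)
{
	struct pf_info i = {
		.protection   = (ec & PFERR_PRESENT) != 0,
		.write        = (ec & PFERR_WRITE) != 0,
		.user         = (ec & PFERR_USER) != 0,
		.reserved_bit = (ec & PFERR_RSVD) != 0,
		.fetch        = (ec & PFERR_FETCH) != 0,
	};
	return i;
}
```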


 


Re: [PATCH kvm-unit-tests 07/10] Correct the tss size

2010-08-25 Thread Avi Kivity

 On 08/25/2010 03:27 PM, Jason Wang wrote:

- Avi Kivity a...@redhat.com wrote:


On 08/25/2010 12:40 PM, Jason Wang wrote:

- Avi Kivity a...@redhat.com wrote:


On 08/24/2010 04:47 PM, Jason Wang wrote:

TSS size should be 104 bytes.

Signed-off-by: Jason Wang jasow...@redhat.com
---
x86/cstart64.S |2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/x86/cstart64.S b/x86/cstart64.S
index 5d358ad..b871153 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -69,7 +69,7 @@ tss:
.long 0
.quad ring0stacktop - i * 4096

ring 0 stack


.quad 0, 0, 0

rings 1, 2, 3 stack

Hello avi:

Rechecked with the manual: there's no RSP3 field. So this patch may
make sense.

That is true.  But please redo it to remove one 0 from the line above,
not from the IST.


But unfortunately it breaks the 64-bit vmexit test: a triple
fault happens in setup_args(). Any suggestions, or is there anything
I missed?

No idea.  Can you post an ftrace of the crash?


The trace before triple fault:

 ..
 qemu-kvm-8101  [002]   243.138507: kvm_entry: vcpu 0
 qemu-kvm-8101  [002]   243.138508: kvm_exit: reason IO_INSTRUCTION rip 0x400e5f
 qemu-kvm-8101  [002]   243.138508: kvm_pio: pio_read at 0x510 size 2 count 1
 qemu-kvm-8101  [002]   243.138512: kvm_entry: vcpu 0
 qemu-kvm-8101  [002]   243.138513: kvm_exit: reason IO_INSTRUCTION rip 0x400e71
 qemu-kvm-8101  [002]   243.138514: kvm_emulate_insn: 0:400e71: ec (prot64)
 qemu-kvm-8101  [002]   243.138515: kvm_pio: pio_write at 0x511 size 1 count 1
 qemu-kvm-8101  [002]   243.138519: kvm_entry: vcpu 0
 qemu-kvm-8101  [002]   243.138520: kvm_exit: reason IO_INSTRUCTION rip 0x400e71
 qemu-kvm-8101  [002]   243.138521: kvm_emulate_insn: 0:400e71: ec (prot64)
 qemu-kvm-8101  [002]   243.138521: kvm_pio: pio_write at 0x511 size 1 count 1
 qemu-kvm-8101  [002]   243.138525: kvm_entry: vcpu 0
 qemu-kvm-8101  [002]   243.138526: kvm_exit: reason CPUID rip 0x400ff7
 qemu-kvm-8101  [002]   243.138526: kvm_cpuid: func 1 rax 6d3 rbx 800 rcx 80002001 rdx 78bfbfd
 qemu-kvm-8101  [002]   243.138527: kvm_entry: vcpu 0
 qemu-kvm-8101  [002]   243.138528: kvm_exit: reason EXCEPTION_NMI rip 0x400271
 qemu-kvm-8101  [002]   243.138528: kvm_page_fault: address 40f3a0 error_code b
 qemu-kvm-8101  [002]   243.138530: kvm_entry: vcpu 0
 qemu-kvm-8101  [002]   243.138531: kvm_exit: reason TRIPLE_FAULT rip 0x400c15



What's the corresponding disassembly?



Re: [PATCH kvm-unit-tests 07/10] Correct the tss size

2010-08-25 Thread Jason Wang

- Avi Kivity a...@redhat.com wrote:

 On 08/25/2010 03:27 PM, Jason Wang wrote:
  - Avi Kivity a...@redhat.com wrote:
 
  On 08/25/2010 12:40 PM, Jason Wang wrote:
  - Avi Kivity a...@redhat.com wrote:
 
  On 08/24/2010 04:47 PM, Jason Wang wrote:
  TSS size should be 104 bytes.
 
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
  x86/cstart64.S |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
  diff --git a/x86/cstart64.S b/x86/cstart64.S
  index 5d358ad..b871153 100644
  --- a/x86/cstart64.S
  +++ b/x86/cstart64.S
  @@ -69,7 +69,7 @@ tss:
  .long 0
  .quad ring0stacktop - i * 4096
  ring 0 stack
 
  .quad 0, 0, 0
  rings 1, 2, 3 stack
  Hello Avi:
 
  Rechecked the manual: there's no RSP3 field. So this patch may make sense.
  That is true.  But please redo it to remove one 0 from the line above,
  not from the IST.
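To make the 104-byte figure concrete, here is the 64-bit TSS written as a packed C struct following the Intel SDM "64-Bit TSS Format" figure. It is a sketch, not code from kvm-unit-tests: the field names are illustrative and __attribute__((packed)) assumes GCC or Clang. Under this layout the ".long 0" in cstart64.S is the leading reserved word and the ring0stacktop quad is RSP0, so the next line needs only two quads (RSP1, RSP2) while the IST entries stay untouched, which matches the suggestion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 64-bit TSS laid out per the Intel SDM "64-Bit TSS Format" figure.
 * Only RSP0-RSP2 exist; there is no RSP3 slot.
 */
struct tss64 {
	uint32_t reserved0;	/* offset 0 */
	uint64_t rsp[3];	/* offsets 4, 12, 20: RSP0, RSP1, RSP2 */
	uint64_t reserved1;	/* offset 28 */
	uint64_t ist[7];	/* offsets 36..84: IST1..IST7 */
	uint64_t reserved2;	/* offset 92 */
	uint16_t reserved3;	/* offset 100 */
	uint16_t iomap_base;	/* offset 102: I/O permission bitmap base */
} __attribute__((packed));

static_assert(sizeof(struct tss64) == 104, "64-bit TSS must be 104 bytes");
static_assert(offsetof(struct tss64, ist) == 36, "IST1 starts at offset 0x24");
```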
 
  But unfortunately it breaks the 64-bit vmexit test. A triple fault
  happens in setup_args(). Any suggestions, or is there anything I missed?
  No idea.  Can you post an ftrace of the crash?
 
  The trace before the triple fault:
 
   ..
   qemu-kvm-8101  [002]   243.138507: kvm_entry: vcpu 0
   qemu-kvm-8101  [002]   243.138508: kvm_exit: reason IO_INSTRUCTION rip 0x400e5f
   qemu-kvm-8101  [002]   243.138508: kvm_pio: pio_read at 0x510 size 2 count 1
   qemu-kvm-8101  [002]   243.138512: kvm_entry: vcpu 0
   qemu-kvm-8101  [002]   243.138513: kvm_exit: reason IO_INSTRUCTION rip 0x400e71
   qemu-kvm-8101  [002]   243.138514: kvm_emulate_insn: 0:400e71: ec (prot64)
   qemu-kvm-8101  [002]   243.138515: kvm_pio: pio_write at 0x511 size 1 count 1
   qemu-kvm-8101  [002]   243.138519: kvm_entry: vcpu 0
   qemu-kvm-8101  [002]   243.138520: kvm_exit: reason IO_INSTRUCTION rip 0x400e71
   qemu-kvm-8101  [002]   243.138521: kvm_emulate_insn: 0:400e71: ec (prot64)
   qemu-kvm-8101  [002]   243.138521: kvm_pio: pio_write at 0x511 size 1 count 1
   qemu-kvm-8101  [002]   243.138525: kvm_entry: vcpu 0
   qemu-kvm-8101  [002]   243.138526: kvm_exit: reason CPUID rip 0x400ff7
   qemu-kvm-8101  [002]   243.138526: kvm_cpuid: func 1 rax 6d3 rbx 800 rcx 80002001 rdx 78bfbfd
   qemu-kvm-8101  [002]   243.138527: kvm_entry: vcpu 0
   qemu-kvm-8101  [002]   243.138528: kvm_exit: reason EXCEPTION_NMI rip 0x400271
   qemu-kvm-8101  [002]   243.138528: kvm_page_fault: address 40f3a0 error_code b
   qemu-kvm-8101  [002]   243.138530: kvm_entry: vcpu 0
   qemu-kvm-8101  [002]   243.138531: kvm_exit: reason TRIPLE_FAULT rip 0x400c15
 
 
 What's the corresponding disassembly?

00400bb8 <__setup_args>:
  400bb8:   41 55                   push   %r13
  400bba:   41 54                   push   %r12
  400bbc:   55                      push   %rbp
  400bbd:   53                      push   %rbx
  400bbe:   48 8b 1d db e7 00 00    mov    0xe7db(%rip),%rbx        # 40f3a0 <__args>
  400bc5:   41 bc 80 ec 40 00       mov    $0x40ec80,%r12d
  400bcb:   41 bd 80 f0 40 00       mov    $0x40f080,%r13d
  400bd1:   eb 42                   jmp    400c15 <__setup_args+0x5d>
  400bd3:   4d 89 65 00             mov    %r12,0x0(%r13)
  400bd7:   0f b6 28                movzbl (%rax),%ebp
  400bda:   40 84 ed                test   %bpl,%bpl
  400bdd:   75 16                   jne    400bf5 <__setup_args+0x3d>
  400bdf:   eb 21                   jmp    400c02 <__setup_args+0x4a>
  400be1:   41 88 2c 24             mov    %bpl,(%r12)
  400be5:   49 83 c4 01             add    $0x1,%r12
  400bed:   0f b6 2b                movzbl (%rbx),%ebp
  400bf0:   40 84 ed                test   %bpl,%bpl
  400bf3:   74 0d                   je     400c02 <__setup_args+0x4a>
  400bf5:   40 0f be fd             movsbl %bpl,%edi
  400bf9:   e8 a6 ff ff ff          callq  400ba4 <isblank>
  400bfe:   84 c0                   test   %al,%al
  400c00:   74 df                   je     400be1 <__setup_args+0x29>
  400c02:   49 83 c5 08             add    $0x8,%r13
  400c06:   41 c6 04 24 00          movb   $0x0,(%r12)
  400c0b:   49 83 c4 01             add    $0x1,%r12
  400c0f:   eb 04                   jmp    400c15 <__setup_args+0x5d>
  400c11:   48 83 c3 01             add    $0x1,%rbx
400c15:   0f b6 2b                  movzbl (%rbx),%ebp
  400c18:   40 0f be fd             movsbl %bpl,%edi
  400c1c:   e8 83 ff ff ff          callq  400ba4 <isblank>
  


 
 -- 
 I have a truly marvellous patch that fixes the bug which this
 signature is too narrow to contain.


BUG: soft lockup - CPU#0 stuck for 13s! [swapper:0]

2010-08-25 Thread Boris Dolgov
Hello!

I'm trying to use KVM on a Slackware host with a CentOS 5 SMP guest.

When the server is idle, I get the following errors in /var/log/messages:
http://pastebin.com/4RrMVuXq

Guest:
[r...@centos-5-x64 ~]# uname -a
Linux centos-5-x64.slave 2.6.18-194.11.1.el5 #1 SMP Tue Aug 10
19:05:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Start command: /usr/local/qemu/bin/qemu-system-x86_64 -name
centos-5-x64 -m 384 -smp 2 -cdrom
/root/CentOS-5.5-x86_64-netinstall.iso -drive
index=0,media=disk,if=virtio,file=/dev/mapper/vgvm-centos5x64,boot=on
-vnc 0.0.0.0:1 -net nic,macaddr=fe:e1:de:ad:57:22 -net
tap,script=/usr/local/qemu/bin/qemu-ifup -monitor stdio -usb
-usbdevice tablet -enable-kvm

Host:
r...@superserver:~# uname -a
Linux superserver 2.6.35.3 #4 SMP Mon Aug 23 12:17:01 MSD 2010 x86_64
Intel(R) Xeon(R) CPU   X3460  @ 2.80GHz GenuineIntel GNU/Linux
r...@superserver:~# /usr/local/qemu/bin/qemu-system-x86_64 -version
QEMU PC emulator version 0.12.5 (qemu-kvm-0.12.5), Copyright (c)
2003-2008 Fabrice Bellard

What does it mean?

-- 
Boris Dolgov.


Re: BUG: soft lockup - CPU#0 stuck for 13s! [swapper:0]

2010-08-25 Thread Boris Dolgov
On Wed, Aug 25, 2010 at 5:19 PM, Avi Kivity a...@redhat.com wrote:
 Looks like a bug in our USB emulation.  Copying qemu-devel to see if anyone
 has a clue.
I have disabled USB support, but the bug has not disappeared.
Please note that CPU#1 was stuck too.


-- 
Boris Dolgov.


[stable 2.6.32] instant crash (jump to NULL) with virtio-net, tap, bridge and veth

2010-08-25 Thread Michael Tokarev
Hello.

I'm seeing an instant host kernel crash triggered by _any_
network activity to/from a kvm guest that's using virtio-net.

My setup is maybe a bit unusual, but here we go.

I have a host machine that has one bridge configured,
and is running a few kvm virtual machines and a few
linux containers (LXC).  All the guests/containers
are connected to that single bridge - guests using
tap devices, lxc containers using veth devices. Host
eth0 is connected to the same bridge as well.

The problem happens when virtio-net drivers are used in
the guest (a Windows XP virtual machine with the latest
netkvm driver from alt.fedoraproject.org) and I
connect to that guest from an LXC container, i.e.
when a packet goes lxc => veth => bridge => tun =>
kvm => virtio in the guest (or back).

When I connect to the same guest from the _host_, it all
works as expected.  When I change the (virtual) NIC in the
guest to e1000 or to an older (from 2009) virtio-net driver,
it works.  When I connect from an lxc container to a
Linux guest with the latest virtio-net drivers, it all
works as expected too.  So far only one combination
triggers the issue.

This is all with the 2.6.32 kernel.  Initially it was
2.6.32.15, but 2.6.32.20 behaves the same way.
All 64-bit.

Also, it does NOT happen with 2.6.35.3, the latest
released kernel.

Here's one of the captured OOPSes (I captured it several
times, but the captures were incomplete):

console [netcon0] enabled
netconsole: network logging started
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [(null)] (null)
PGD 177bf2067 PUD 177ae5067 PMD 0
Oops: 0010 [#1] SMP
last sysfs file: /sys/devices/virtual/block/md8/md/mismatch_cnt
CPU 0
Modules linked in: netconsole configfs squashfs kvm_amd kvm veth autofs4 bridge 
quota_v2 quota_tree ext4 jbd2 crc16 raid0 raid456 async_pq async_xor xor 
async_memcpy async_raid6_recov raid6_pq async_tx loop sr_mod cdrom tun 
powernow_k8 processor thermal_sys 8021q garp stp llc asus_atk0110 hwmon atl1 
mii ext3 jbd mbcache raid1 md_mod pata_atiixp ehci_hcd ohci_hcd usbcore 
nls_base ahci libata sd_mod scsi_mod
Pid: 2345, comm: kvm Not tainted 2.6.32-amd64 #2.6.32.20 System Product Name
RIP: 0010:[]  [(null)] (null)
RSP: 0018:880028203e70  EFLAGS: 00010293
RAX: 880179480ec0 RBX: 8801a07770c0 RCX: 
RDX:  RSI: 8801a07770c0 RDI: 8801a07770c0
RBP: 880124b89030 R08: 8125fab0 R09: 880028203e40
R10:  R11:  R12: 880028210888
R13: 880028210880 R14: 0001e60f R15: 0040
FS:  7fe2da5e5700() GS:88002820() knlGS:f74a59d0
CS:  0010 DS:  ES:  CR0: 8005003b
CR2:  CR3: 000177a8a000 CR4: 06f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process kvm64 (pid: 2345, threadinfo 880177be2000, task 880177a7c0c0)
Stack:
 8125fbd5 0040 8126013c 8000
0 8800282108b8 0002 880028210888 880028210880
0 81236276 880028203f48 8800282108b8 
Call Trace:
 IRQ
 [8125fbd5] ? ip_rcv_finish+0x125/0x430
 [8126013c] ? ip_rcv+0x25c/0x350
 [81236276] ? process_backlog+0x76/0xd0
 [81236a18] ? net_rx_action+0xf8/0x1f0
 [81059120] ? __do_softirq+0xb0/0x1d0
 [8100c56c] ? call_softirq+0x1c/0x30
 EOI
 [8100e595] ? do_softirq+0x65/0xa0
 [81236b2e] ? netif_rx_ni+0x1e/0x30
 [a014e97a] ? tun_chr_aio_write+0x35a/0x510 [tun]
 [a014e620] ? tun_chr_aio_write+0x0/0x510 [tun]
 [810ffea4] ? do_sync_readv_writev+0xd4/0x110
 [8106e890] ? autoremove_wake_function+0x0/0x30
 [81071709] ? enqueue_hrtimer+0x79/0xc0
 [810ffd08] ? rw_copy_check_uvector+0x88/0x110
 [811005bc] ? do_readv_writev+0xdc/0x220
 [8106dafc] ? sys_timer_settime+0x13c/0x2e0
 [8110084e] ? sys_writev+0x4e/0x90
 [8100b482] ? system_call_fastpath+0x16/0x1b
Code:  Bad RIP value.
RIP  [(null)] (null)
 RSP 880028203e70
CR2: 
---[ end trace 1dcd3c52bde0fa25 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 2345, comm: kvm Tainted: G  D2.6.32-amd64 #2.6.32.20
Call Trace:
 IRQ  [812c22de] ? panic+0x7a/0x134
 [812c23d8] ? printk+0x40/0x48
 [8100faa3] ? oops_end+0xa3/0xb0
 [8103138a] ? no_context+0xfa/0x260
 [812c52a5] ? page_fault+0x25/0x30
 [8125fab0] ? ip_rcv_finish+0x0/0x430
 [8125fbd5] ? ip_rcv_finish+0x125/0x430
 [8126013c] ? ip_rcv+0x25c/0x350
 [81236276] ? process_backlog+0x76/0xd0
 [81236a18] ? net_rx_action+0xf8/0x1f0
 [81059120] ? __do_softirq+0xb0/0x1d0
 [8100c56c] ? call_softirq+0x1c/0x30
 EOI  [8100e595] ? do_softirq+0x65/0xa0
 [81236b2e] ? netif_rx_ni+0x1e/0x30
 [a014e97a] ? 

Re: BUG: soft lockup - CPU#0 stuck for 13s! [swapper:0]

2010-08-25 Thread Boris Dolgov
On Wed, Aug 25, 2010 at 5:15 PM, Boris Dolgov bo...@dolgov.name wrote:
 Hello!

 I'm trying to use KVM on a Slackware host with a CentOS 5 SMP guest.

 When the server is idle, I get the following errors in /var/log/messages:
 http://pastebin.com/4RrMVuXq

 Guest:
 [r...@centos-5-x64 ~]# uname -a
 Linux centos-5-x64.slave 2.6.18-194.11.1.el5 #1 SMP Tue Aug 10
 19:05:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

 Start command: /usr/local/qemu/bin/qemu-system-x86_64 -name
 centos-5-x64 -m 384 -smp 2 -cdrom
 /root/CentOS-5.5-x86_64-netinstall.iso -drive
 index=0,media=disk,if=virtio,file=/dev/mapper/vgvm-centos5x64,boot=on
 -vnc 0.0.0.0:1 -net nic,macaddr=fe:e1:de:ad:57:22 -net
 tap,script=/usr/local/qemu/bin/qemu-ifup -monitor stdio -usb
 -usbdevice tablet -enable-kvm

 Host:
 r...@superserver:~# uname -a
 Linux superserver 2.6.35.3 #4 SMP Mon Aug 23 12:17:01 MSD 2010 x86_64
 Intel(R) Xeon(R) CPU           X3460  @ 2.80GHz GenuineIntel GNU/Linux
 r...@superserver:~# /usr/local/qemu/bin/qemu-system-x86_64 -version
 QEMU PC emulator version 0.12.5 (qemu-kvm-0.12.5), Copyright (c)
 2003-2008 Fabrice Bellard

 What does it mean?
The problem also reproduces with a guest that has neither SMP nor USB.

I have installed Debian 5.0.5 in another VM, and the problem does not
reproduce there:
debian-5-x64:~# uname -a
Linux debian-5-x64.slave 2.6.26-2-amd64 #1 SMP Thu Aug 19 00:37:36 UTC
2010 x86_64 GNU/Linux

Should I report it to the Red Hat Bugzilla as a Red Hat bug, or is it a KVM bug?

Also, I have noticed a strange thing. In top on the host, the Debian qemu
process consumes nearly 1% of CPU, but the CentOS qemu process consumes
nearly 20-30%, even when the VM is idle.

The problem also reproduces on a centos-i386 guest and doesn't reproduce
on debian-5-i386.

-- 
Boris Dolgov.


Re: BUG: soft lockup - CPU#0 stuck for 13s! [swapper:0]

2010-08-25 Thread Anthony Liguori

On 08/25/2010 11:14 AM, Boris Dolgov wrote:

On Wed, Aug 25, 2010 at 5:15 PM, Boris Dolgov bo...@dolgov.name wrote:
   

Hello!

I'm trying to use KVM on a Slackware host with a CentOS 5 SMP guest.

When the server is idle, I get the following errors in /var/log/messages:
http://pastebin.com/4RrMVuXq

Guest:
[r...@centos-5-x64 ~]# uname -a
Linux centos-5-x64.slave 2.6.18-194.11.1.el5 #1 SMP Tue Aug 10
19:05:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Start command: /usr/local/qemu/bin/qemu-system-x86_64 -name
centos-5-x64 -m 384 -smp 2 -cdrom
/root/CentOS-5.5-x86_64-netinstall.iso -drive
index=0,media=disk,if=virtio,file=/dev/mapper/vgvm-centos5x64,boot=on
-vnc 0.0.0.0:1 -net nic,macaddr=fe:e1:de:ad:57:22 -net
tap,script=/usr/local/qemu/bin/qemu-ifup -monitor stdio -usb
-usbdevice tablet -enable-kvm

Host:
r...@superserver:~# uname -a
Linux superserver 2.6.35.3 #4 SMP Mon Aug 23 12:17:01 MSD 2010 x86_64
Intel(R) Xeon(R) CPU   X3460  @ 2.80GHz GenuineIntel GNU/Linux
r...@superserver:~# /usr/local/qemu/bin/qemu-system-x86_64 -version
QEMU PC emulator version 0.12.5 (qemu-kvm-0.12.5), Copyright (c)
2003-2008 Fabrice Bellard

What does it mean?
 

The problem also reproduces with a guest that has neither SMP nor USB.

I have installed Debian 5.0.5 in another VM, and the problem does not
reproduce there:
debian-5-x64:~# uname -a
Linux debian-5-x64.slave 2.6.26-2-amd64 #1 SMP Thu Aug 19 00:37:36 UTC
2010 x86_64 GNU/Linux

Should I report it to the Red Hat Bugzilla as a Red Hat bug, or is it a KVM bug?
   


Or post it to the CentOS bugzilla since you're not using Red Hat.

There have been various time-related issues in RHEL5.x, so my inclination
would be that it's not an upstream issue until you verify it with the
latest upstream.


Regards,

Anthony Liguori


Also, I have noticed a strange thing. In top on the host, the Debian qemu
process consumes nearly 1% of CPU, but the CentOS qemu process consumes
nearly 20-30%, even when the VM is idle.

The problem also reproduces on a centos-i386 guest and doesn't reproduce
on debian-5-i386.

   




Re: BUG: soft lockup - CPU#0 stuck for 13s! [swapper:0]

2010-08-25 Thread Boris Dolgov
On Wed, Aug 25, 2010 at 8:19 PM, Anthony Liguori anth...@codemonkey.ws wrote:
 Or post it to the CentOS bugzilla since you're not using Red Hat.
I am using Red Hat, but not on this VM.

 There have been various time-related issues in RHEL5.x, so my inclination would
 be that it's not an upstream issue until you verify it with the latest
 upstream.
Do you mean the latest KVM or the latest RHEL?
I tried to use qemu-kvm from git today, but it doesn't compile.

-- 
Boris Dolgov.


Re: KVM timekeeping and TSC virtualization

2010-08-25 Thread Marcelo Tosatti
On Tue, Aug 24, 2010 at 01:01:38PM -1000, Zachary Amsden wrote:
 With this patchset, KVM now has a much stronger guarantee: If you have
 old guest software running on broken hardware, using SMP virtual
 machines, you do not get hardware performance and error-free time
 virtualization. However, if you have new guest software, non-broken
 hardware, or can simply run UP guests instead of SMP, you can have
 hardware performance, and it is now error free.  Alternatively, you can
 sacrifice some accuracy and have hardware performance, even for SMP
 guests, if you can tolerate some minor cross-CPU TSC variation.  No
 other vendor I know of can make that guarantee.
 
 Zach
 If the processor has a stable TSC, why trap it? I realize you are trying
 to cover a gauntlet of hardware and guests, so maybe a nerd knob is
 needed to disable.
 
 Exactly.  If you have a stable TSC, we don't trap it.  If you don't
 have a stable TSC, we do.  That's the point of these patches.

Wait, don't you trap if host TSC is faster than guest TSC?



Re: [KVM timekeeping 25/35] Add clock catchup mode

2010-08-25 Thread Marcelo Tosatti
On Thu, Aug 19, 2010 at 10:07:39PM -1000, Zachary Amsden wrote:
 Make the clock update handler handle generic clock synchronization,
 not just KVM clock.  We add a catchup mode which keeps passthrough
 TSC in line with absolute guest TSC.
 
 Signed-off-by: Zachary Amsden zams...@redhat.com
 ---
  arch/x86/include/asm/kvm_host.h |1 +
  arch/x86/kvm/x86.c  |   55 ++
  2 files changed, 38 insertions(+), 18 deletions(-)
 

   kvm_x86_ops->vcpu_load(vcpu, cpu);
 - if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
 + if (unlikely(vcpu->cpu != cpu) || vcpu->arch.tsc_rebase) {
   /* Make sure TSC doesn't go backwards */
   s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
   native_read_tsc() - vcpu->arch.last_host_tsc;
   if (tsc_delta < 0)
   mark_tsc_unstable("KVM discovered backwards TSC");
 - if (check_tsc_unstable())
 + if (check_tsc_unstable()) {
   kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
 - kvm_migrate_timers(vcpu);
 + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 + }
 + if (vcpu->cpu != cpu)
 + kvm_migrate_timers(vcpu);
   vcpu->cpu = cpu;
 + vcpu->arch.tsc_rebase = 0;
   }
  }
  
 @@ -1947,6 +1961,12 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
   kvm_x86_ops->vcpu_put(vcpu);
   kvm_put_guest_fpu(vcpu);
   vcpu->arch.last_host_tsc = native_read_tsc();
 +
 + /* For unstable TSC, force compensation and catchup on next CPU */
 + if (check_tsc_unstable()) {
 + vcpu->arch.tsc_rebase = 1;
 + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 + }
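The compensation in the quoted kvm_arch_vcpu_load() hunk can be checked with a toy model. It assumes the usual hardware TSC offsetting rule (guest TSC = host TSC + offset); struct toy_vcpu and the toy_* functions are hypothetical stand-ins, not KVM code. Applying the equivalent of adjust_tsc_offset(vcpu, -tsc_delta) makes the guest TSC resume at exactly the value it had at vcpu_put time, whether the new CPU's TSC is ahead of or behind the old one.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the quoted vcpu_load compensation. Assumes the hardware
 * TSC offsetting rule: guest_tsc = host_tsc + tsc_offset.
 * struct toy_vcpu and the toy_* helpers are hypothetical, not KVM code.
 */
struct toy_vcpu {
	int64_t tsc_offset;
	uint64_t last_host_tsc;	/* host TSC sampled at vcpu_put; 0 = never put */
};

uint64_t guest_tsc(const struct toy_vcpu *v, uint64_t host_tsc)
{
	/* Unsigned wrap-around gives the right result for negative offsets. */
	return host_tsc + (uint64_t)v->tsc_offset;
}

void toy_vcpu_put(struct toy_vcpu *v, uint64_t host_tsc)
{
	v->last_host_tsc = host_tsc;
}

/*
 * Mirrors the unstable-TSC branch: compute tsc_delta as in the patch and
 * apply the equivalent of adjust_tsc_offset(vcpu, -tsc_delta).
 * Returns 1 when the host TSC went backwards, the case the patch reports
 * through mark_tsc_unstable().
 */
int toy_vcpu_load(struct toy_vcpu *v, uint64_t host_tsc)
{
	int64_t tsc_delta = !v->last_host_tsc ? 0 :
			    (int64_t)(host_tsc - v->last_host_tsc);

	v->tsc_offset -= tsc_delta;
	return tsc_delta < 0;
}
```

With last_host_tsc = 1000 and a zero offset, loading on a CPU whose TSC reads 900 yields an offset of 100 and a guest TSC of 1000, and the negative delta is flagged. The descheduled time itself is not added back in this model; in the patch that appears to be left to the KVM_REQ_CLOCK_UPDATE request (the "catchup" in the vcpu_put comment).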

The mix between catchup,trap versus stable,unstable TSC is confusing and
difficult to grasp. Can you please introduce all the infrastructure
first, then control usage of them in centralized places? Examples:

+static void kvm_update_tsc_trapping(struct kvm *kvm)
+{
+   int trap, i;
+   struct kvm_vcpu *vcpu;
+
+   trap = check_tsc_unstable() && atomic_read(&kvm->online_vcpus) > 1;
+   kvm_for_each_vcpu(i, vcpu, kvm)
+   kvm_x86_ops->set_tsc_trap(vcpu, trap && !vcpu->arch.time_page);
+}

+   /* For unstable TSC, force compensation and catchup on next CPU */
+   if (check_tsc_unstable()) {
+   vcpu->arch.tsc_rebase = 1;
+   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+   }


kvm_guest_time_update is becoming very confusing too. I understand this
is due to the many cases it's dealing with, but please make it as simple
as possible.

+   /*
+* If we are trapping and no longer need to, use catchup to
+* ensure passthrough TSC will not be less than trapped TSC
+*/
+   if (vcpu->tsc_mode == TSC_MODE_PASSTHROUGH && vcpu->tsc_trapping &&
+   ((this_tsc_khz <= v->kvm->arch.virtual_tsc_khz || kvmclock))) {
+   catchup = 1;

What, TSC trapping with kvmclock enabled?

For both catchup and trapping the resolution of the host clock is
important, as Glauber commented for kvmclock. Can you comment on the
problems that arise from a low-res clock for both modes?

Similarly for catchup mode, the effect of exit frequency. No need for
any guarantees?
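Reading the operators stripped from the quoted kvm_update_tsc_trapping() as "&&" and "> 1" (an assumption, since the archive dropped them), the trapping policy reduces to two side-effect-free predicates. The function names below are hypothetical; the sketch only illustrates keeping the decision in one place, in the spirit of the request to control usage from centralized places.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical restatement of the quoted kvm_update_tsc_trapping() policy,
 * reading the stripped operators as "&&" and "> 1" (an assumption).
 */

/* VM-wide decision: trap only on an unstable host TSC with an SMP guest. */
bool vm_wants_tsc_trap(bool host_tsc_unstable, int online_vcpus)
{
	return host_tsc_unstable && online_vcpus > 1;
}

/* Per-vCPU decision: a vCPU with a kvmclock page (arch.time_page) is never trapped. */
bool vcpu_wants_tsc_trap(bool vm_trap, bool has_kvmclock_page)
{
	return vm_trap && !has_kvmclock_page;
}
```

A UP guest, or any guest on a host with a stable TSC, never traps under this reading, which matches the "If you have a stable TSC, we don't trap it" statement in the thread.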



Re: KVM Bridged Networking Issue in Ubuntu Lucid

2010-08-25 Thread Rus Hughes
Thanks for your help! It seems that enabling IP forwarding on the host and
setting a default gateway on the VM pointing to the host fixed it!

On Wed, Aug 25, 2010 at 3:08 AM, haishan haishan@gmail.com wrote:
 Rus Hughes wrote:

 Hi guys,

 I'm trying to sort out networking for a VM I've created on my Ubuntu
 Lucid box but the VM cannot access the Internet.
 I can connect to the VM (crisps) from the host (holly) and to the host
 from the VM.
 But the VM cannot connect to the Internet and the Internet cannot
 connect to the VM.

 I followed the guide here: https://help.ubuntu.com/community/KVM

 I created it with ubuntu-vm-builder:

 ubuntu-vm-builder kvm lucid \
                  --domain crisps \
                  --dest crisps \
                  --arch amd64 \
                  --hostname crisps \
                  --mem 2048 \
                  --rootsize 4096 \
                  --swapsize 1024 \
                  --user magicalusernameofdoom \
                  --pass somesecretawesomepassword \
                  --ip 178.63.60.159 \
                  --mask 255.255.255.192 \
                  --bcast 178.63.60.191 \
                  --gw 178.63.60.129 \
                  --dns 213.133.99.99 \
                  --mirror http://de.archive.ubuntu.com/ubuntu \
                  --components main,universe,restricted,multiverse \
                  --addpkg openssh-server \
                  --libvirt qemu:///system \
                --bridge br0;

 virsh start crisps starts the VM up:

 root     29297  0.5  3.3 2280396 271220 ?      Sl   14:12   0:13
 /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 2048 -smp 1 -name crisps
 -uuid e2cda480-29e7-3740-d6c6-816e2f78223e -chardev
 socket,id=monitor,path=/var/lib/libvirt/qemu/crisps.monitor,server,nowait
 -monitor chardev:monitor -boot c -drive
 file=/home/idimmu/vms/crisps/tmplXQBtt.qcow2,if=ide,index=0,boot=on
 -net nic,macaddr=52:54:00:a6:28:b9,vlan=0,model=virtio,name=virtio.0
 -net tap,fd=41,vlan=0,name=tap.0 -serial none -parallel none -usb -vnc
 127.0.0.1:0 -vga cirrus

 Networking on the host is like so:

 br0       Link encap:Ethernet  HWaddr 26:5d:c9:d2:75:2e
          inet addr:178.63.60.138  Bcast:178.63.60.191  Mask:255.255.255.192
          inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:49080900 errors:0 dropped:0 overruns:0 frame:0
          TX packets:54086580 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:35266886777 (35.2 GB)  TX bytes:60292047383 (60.2 GB)

 eth0      Link encap:Ethernet  HWaddr 40:61:86:e9:d6:9a
          inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:49418744 errors:0 dropped:0 overruns:0 frame:0
          TX packets:54348661 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:36337218359 (36.3 GB)  TX bytes:60378047518 (60.3 GB)
          Interrupt:29 Base address:0x4000

 eth0:0    Link encap:Ethernet  HWaddr 40:61:86:e9:d6:9a
          inet addr:178.63.60.178  Bcast:178.63.60.191  Mask:255.255.255.192
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:29 Base address:0x4000

 lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1063249 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1063249 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:325111540 (325.1 MB)  TX bytes:325111540 (325.1 MB)

 vnet0     Link encap:Ethernet  HWaddr 26:5d:c9:d2:75:2e
          inet6 addr: fe80::245d:c9ff:fed2:752e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:72 errors:0 dropped:0 overruns:0 frame:0
          TX packets:33 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:3412 (3.4 KB)  TX bytes:2698 (2.6 KB)

 The host's /etc/network/interfaces is like so:

 auto lo
 iface lo inet loopback

 # device: eth0
 auto  eth0
 iface eth0 inet manual

 auto br0
 iface br0 inet static
  address   178.63.60.138
  broadcast 178.63.60.191
  netmask   255.255.255.192
  gateway   178.63.60.129
  network 178.63.60.128
  bridge_ports eth0
  bridge_stp off
  bridge_fd 0
  bridge_maxwait 0

 # subli.me.uk
 auto eth0:0
 iface eth0:0 inet static
  address 178.63.60.178
  netmask 255.255.255.192


 If I ping the VM, I see the packets on the host's br0 but not on vnet0.
 This is the output of brctl show on the host:

 bridge name     bridge id               STP enabled     interfaces
 br0             8000.265dc9d2752e       no              eth0
                                                         vnet0
 This is the routing table on the host:

 Kernel IP routing table
 Destination     Gateway         Genmask         Flags 

Re: BUG: soft lockup - CPU#0 stuck for 13s! [swapper:0]

2010-08-25 Thread Boris Dolgov
On Wed, Aug 25, 2010 at 8:19 PM, Anthony Liguori anth...@codemonkey.ws wrote:
 Or post it to the CentOS bugzilla since you're not using Red Hat.

 There have been various time-related issues in RHEL5.x, so my inclination would
 be that it's not an upstream issue until you verify it with the latest
 upstream.
After setting clocksource to rtc (instead of dynticks), the problem disappeared.


-- 
Boris Dolgov.


Re: [PATCH 2/8] KVM: x86 emulator: simplify ALU block (opcodes 00-3F) decode flags

2010-08-25 Thread Marcelo Tosatti
On Mon, Aug 23, 2010 at 02:06:11PM +0300, Avi Kivity wrote:
 Use the new byte/word dual opcode decode.
 
 Signed-off-by: Avi Kivity a...@redhat.com
 ---
  arch/x86/kvm/emulate.c |   40 
  1 files changed, 16 insertions(+), 24 deletions(-)
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 152654e..79a5f81 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -2370,42 +2370,34 @@ static struct group_dual group9 = { {
  
  static struct opcode opcode_table[256] = {
   /* 0x00 - 0x07 */
 - D(ByteOp | DstMem | SrcReg | ModRM | Lock), D(DstMem | SrcReg | ModRM | Lock),
 - D(ByteOp | DstReg | SrcMem | ModRM), D(DstReg | SrcMem | ModRM),
 - D(ByteOp | DstAcc | SrcImm), D(DstAcc | SrcImm),
 + D2bv(DstMem | SrcReg | ModRM | Lock), D2bv(DstReg | SrcMem | ModRM),
 + D2bv(DstAcc | SrcImm),
   D(ImplicitOps | Stack | No64), D(ImplicitOps | Stack | No64),
   /* 0x08 - 0x0F */
 - D(ByteOp | DstMem | SrcReg | ModRM | Lock), D(DstMem | SrcReg | ModRM | Lock),
 - D(ByteOp | DstReg | SrcMem | ModRM), D(DstReg | SrcMem | ModRM),
 - D(ByteOp | DstAcc | SrcImm), D(DstAcc | SrcImm),
 + D2bv(DstMem | SrcReg | ModRM | Lock), D2bv(DstReg | SrcMem | ModRM),
 + D2bv(DstAcc | SrcImm),
   D(ImplicitOps | Stack | No64), N,
   /* 0x10 - 0x17 */
 - D(ByteOp | DstMem | SrcReg | ModRM | Lock), D(DstMem | SrcReg | ModRM | Lock),
 - D(ByteOp | DstReg | SrcMem | ModRM), D(DstReg | SrcMem | ModRM),
 - D(ByteOp | DstAcc | SrcImm), D(DstAcc | SrcImm),
 + D2bv(DstMem | SrcReg | ModRM | Lock), D2bv(DstReg | SrcMem | ModRM),
 + D2bv(DstAcc | SrcImm),
   D(ImplicitOps | Stack | No64), D(ImplicitOps | Stack | No64),
   /* 0x18 - 0x1F */
 - D(ByteOp | DstMem | SrcReg | ModRM | Lock), D(DstMem | SrcReg | ModRM | Lock),
 - D(ByteOp | DstReg | SrcMem | ModRM), D(DstReg | SrcMem | ModRM),
 - D(ByteOp | DstAcc | SrcImm), D(DstAcc | SrcImm),
 + D2bv(DstMem | SrcReg | ModRM | Lock), D2bv(DstReg | SrcMem | ModRM),
 + D2bv(DstAcc | SrcImm),
   D(ImplicitOps | Stack | No64), D(ImplicitOps | Stack | No64),
   /* 0x20 - 0x27 */
 - D(ByteOp | DstMem | SrcReg | ModRM | Lock), D(DstMem | SrcReg | ModRM | Lock),
 - D(ByteOp | DstReg | SrcMem | ModRM), D(DstReg | SrcMem | ModRM),
 - D(ByteOp | DstAcc | SrcImmByte), D(DstAcc | SrcImm), N, N,
 + D2bv(DstMem | SrcReg | ModRM | Lock), D2bv(DstReg | SrcMem | ModRM),
 + D2bv(DstAcc | SrcImmByte), N, N,

SrcImm? opcode 0x25...
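The question about opcode 0x25 becomes visible once the macro is expanded. This sketch assumes D2bv(_f) expands to D((_f) | ByteOp), D(_f), i.e. the byte form followed by the word/dword form; the flag values and struct opcode are simplified stand-ins for those in emulate.c. With that expansion, D2bv(DstAcc | SrcImmByte) gives 0x25 (AND eAX, imm16/32) a byte immediate instead of the SrcImm it had before.

```c
#include <assert.h>

/* Simplified stand-in flag values; only their distinctness matters. */
enum {
	ByteOp     = 1 << 0,
	DstAcc     = 1 << 1,
	SrcImm     = 1 << 2,
	SrcImmByte = 1 << 3,
};

struct opcode {
	unsigned flags;
};

#define D(_f)    { .flags = (_f) }
/* Assumed expansion: byte form first, then the word/dword form. */
#define D2bv(_f) D((_f) | ByteOp), D(_f)

/* Opcodes 0x24 (AND AL, imm8) and 0x25 (AND eAX, imm16/32). */
const struct opcode and_acc_before[2] = {
	D(ByteOp | DstAcc | SrcImmByte), D(DstAcc | SrcImm),
};
const struct opcode and_acc_after[2] = {
	D2bv(DstAcc | SrcImmByte),
};
```

Since the other rows in the old table already pair ByteOp with SrcImm for their imm8 forms (for example 0x04), D2bv(DstAcc | SrcImm) looks like the natural replacement for 0x24/0x25.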



Re: [PATCH v7 5/5] KVM: MMU: combine guest pte read between fetch and pte prefetch

2010-08-25 Thread Marcelo Tosatti
On Sun, Aug 22, 2010 at 07:13:33PM +0800, Xiao Guangrong wrote:
 Combine guest pte read between guest pte check in the fetch path and pte 
 prefetch
 
 Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
 ---
  arch/x86/kvm/paging_tmpl.h |   40 +---
  1 files changed, 21 insertions(+), 19 deletions(-)

Applied all, thanks.



Re: KVM timekeeping and TSC virtualization

2010-08-25 Thread Zachary Amsden

On 08/25/2010 06:55 AM, Marcelo Tosatti wrote:

On Tue, Aug 24, 2010 at 01:01:38PM -1000, Zachary Amsden wrote:
   

With this patchset, KVM now has a much stronger guarantee: If you have
old guest software running on broken hardware, using SMP virtual
machines, you do not get hardware performance and error-free time
virtualization. However, if you have new guest software, non-broken
hardware, or can simply run UP guests instead of SMP, you can have
hardware performance, and it is now error free.  Alternatively, you can
sacrifice some accuracy and have hardware performance, even for SMP
guests, if you can tolerate some minor cross-CPU TSC variation.  No
other vendor I know of can make that guarantee.

Zach
 

If the processor has a stable TSC, why trap it? I realize you are trying
to cover a gauntlet of hardware and guests, so maybe a nerd knob is
needed to disable.
   

Exactly.  If you have a stable TSC, we don't trap it.  If you don't
have a stable TSC, we do.  That's the point of these patches.
 

Wait, don't you trap if host TSC is faster than guest TSC?
   


Yes, but that's not a standard scenario; it only applies to
migration.  There's also a way to avoid this:


set guest TSC > MAX(all host TSCs in cluster)

Then catchup mode will be used to keep the TSC up to speed.  Actually, 
it's an open question what to do for SMP in this case, I'm not sure I 
got that right.  Technically the proper thing to do is to trap, but this 
should have a user override.
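The workaround above can be expressed as a small calculation. This sketch assumes the rule from this exchange (a host must trap when its TSC runs faster than the guest's virtual TSC rate) and simply takes the fastest host rate in the cluster as the guest rate; host_needs_tsc_trap() and max_host_khz() are hypothetical helpers. The suggestion is a guest rate above the maximum, but with the strict host-faster-than-guest comparison used here the maximum itself already avoids trapping, and slower hosts would rely on catchup mode as described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Rule from the exchange above: trap when the host TSC outruns the guest's. */
bool host_needs_tsc_trap(unsigned host_khz, unsigned guest_khz)
{
	return host_khz > guest_khz;
}

/* Hypothetical helper: the fastest host TSC rate in the cluster, in kHz. */
unsigned max_host_khz(const unsigned *host_khz, size_t n)
{
	unsigned max = 0;

	for (size_t i = 0; i < n; i++)
		if (host_khz[i] > max)
			max = host_khz[i];
	return max;
}
```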


Zach


Re: [KVM timekeeping 25/35] Add clock catchup mode

2010-08-25 Thread Zachary Amsden

On 08/25/2010 07:27 AM, Marcelo Tosatti wrote:

On Thu, Aug 19, 2010 at 10:07:39PM -1000, Zachary Amsden wrote:
   

Make the clock update handler handle generic clock synchronization,
not just KVM clock.  We add a catchup mode which keeps passthrough
TSC in line with absolute guest TSC.

Signed-off-by: Zachary Amsden zams...@redhat.com
---
  arch/x86/include/asm/kvm_host.h |1 +
  arch/x86/kvm/x86.c  |   55 ++
  2 files changed, 38 insertions(+), 18 deletions(-)

 
   

kvm_x86_ops->vcpu_load(vcpu, cpu);
-   if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
+   if (unlikely(vcpu->cpu != cpu) || vcpu->arch.tsc_rebase) {
/* Make sure TSC doesn't go backwards */
s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
native_read_tsc() - vcpu->arch.last_host_tsc;
if (tsc_delta < 0)
mark_tsc_unstable("KVM discovered backwards TSC");
-   if (check_tsc_unstable())
+   if (check_tsc_unstable()) {
kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
-   kvm_migrate_timers(vcpu);
+   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+   }
+   if (vcpu->cpu != cpu)
+   kvm_migrate_timers(vcpu);
vcpu->cpu = cpu;
+   vcpu->arch.tsc_rebase = 0;
}
  }

@@ -1947,6 +1961,12 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
kvm_x86_ops->vcpu_put(vcpu);
kvm_put_guest_fpu(vcpu);
vcpu->arch.last_host_tsc = native_read_tsc();
+
+   /* For unstable TSC, force compensation and catchup on next CPU */
+   if (check_tsc_unstable()) {
+   vcpu->arch.tsc_rebase = 1;
+   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+   }
 

The mix between catchup,trap versus stable,unstable TSC is confusing and
difficult to grasp. Can you please introduce all the infrastructure
first, then control usage of them in centralized places? Examples:

+static void kvm_update_tsc_trapping(struct kvm *kvm)
+{
+   int trap, i;
+   struct kvm_vcpu *vcpu;
+
+   trap = check_tsc_unstable() && atomic_read(&kvm->online_vcpus) > 1;
+   kvm_for_each_vcpu(i, vcpu, kvm)
+   kvm_x86_ops->set_tsc_trap(vcpu, trap && !vcpu->arch.time_page);
+}

+   /* For unstable TSC, force compensation and catchup on next CPU */
+   if (check_tsc_unstable()) {
+   vcpu->arch.tsc_rebase = 1;
+   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+   }


kvm_guest_time_update is becoming very confusing too. I understand this
is due to the many cases it is dealing with, but please make it as simple
as possible.
   


I tried to comment as best as I could.  I think the whole 
kvm_update_tsc_trapping thing is probably a poor design choice.  It 
works, but it's thoroughly unintelligible right now without spending 
some days figuring out why.
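Stripped of the surrounding machinery, the trapping rule in the quoted kvm_update_tsc_trapping() is a small predicate. A standalone sketch of it, for readers following the thread; the struct and function names are hypothetical and only model the values that function reads:

```c
#include <stdbool.h>

/* Hypothetical summary of the state kvm_update_tsc_trapping() consults. */
struct tsc_trap_inputs {
	bool host_tsc_unstable;   /* check_tsc_unstable() */
	int online_vcpus;         /* atomic_read(&kvm->online_vcpus) */
	bool vcpu_uses_kvmclock;  /* vcpu->arch.time_page is set */
};

/* Trap RDTSC only for SMP guests on a host with an unstable TSC, and
 * only for vCPUs that have not registered a kvmclock page. */
static bool should_trap_tsc(const struct tsc_trap_inputs *in)
{
	bool trap = in->host_tsc_unstable && in->online_vcpus > 1;

	return trap && !in->vcpu_uses_kvmclock;
}
```

Keeping the policy in a pure function like this is one way to control it from a single, centralized place, since callers only supply state.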


I'll rework the tail series of patches to try to make them more clear.


+   /*
+* If we are trapping and no longer need to, use catchup to
+* ensure passthrough TSC will not be less than trapped TSC
+*/
+   if (vcpu->tsc_mode == TSC_MODE_PASSTHROUGH && vcpu->tsc_trapping &&
+   ((this_tsc_khz <= v->kvm->arch.virtual_tsc_khz || kvmclock))) {
+   catchup = 1;

What, TSC trapping with kvmclock enabled?
   


Transitioning to use of kvmclock after a cold boot means we may have 
been trapping and now we will not be.



For both catchup and trapping the resolution of the host clock is
important, as Glauber commented for kvmclock. Can you comment on the
problems that arise from a low res clock for both modes?

Similarly for catchup mode, the effect of exit frequency. No need for
any guarantees?
   


The scheduler will do something to get an IRQ at whatever resolution it 
uses for its timeslice.  That guarantees an exit per timeslice, so 
we'll never be behind by more than one slice while scheduling.  While 
not scheduling, we're dormant anyway, waiting on either an IRQ or shared 
memory variable change.  Local timers could end up behind when dormant.
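To make that lag bound concrete, here is a minimal model of a single catchup step, assuming the guest's expected TSC is derived from elapsed host nanoseconds and a virtual TSC rate in kHz. The function names are hypothetical and this is not the code in the patch series; it only illustrates why, if the offset is corrected on every exit, the guest can be behind by at most the interval between two exits.

```c
#include <stdint.h>

/* Expected guest TSC after elapsed_ns at virtual_tsc_khz, starting at base.
 * kHz is cycles per millisecond, so cycles = ns * khz / 10^6; the division
 * is split to avoid overflowing 64 bits for long intervals. */
static uint64_t expected_guest_tsc(uint64_t base, uint64_t elapsed_ns,
				   uint32_t virtual_tsc_khz)
{
	uint64_t ms = elapsed_ns / 1000000ULL;
	uint64_t rem_ns = elapsed_ns % 1000000ULL;

	return base + ms * virtual_tsc_khz +
	       rem_ns * virtual_tsc_khz / 1000000ULL;
}

/* One catchup step, run on an exit: if the passthrough TSC (host TSC plus
 * offset) has fallen behind the expected guest TSC, advance the offset so
 * the guest does not observe a smaller value. Returns the new offset. */
static int64_t catchup_offset(uint64_t host_tsc, int64_t offset,
			      uint64_t expected)
{
	uint64_t guest_tsc = host_tsc + (uint64_t)offset;

	if (guest_tsc < expected)
		offset += (int64_t)(expected - guest_tsc);
	return offset;
}
```

Between two such steps the guest sees the raw passthrough rate, which is why the exit frequency bounds how far behind it can fall.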


We may need a hack to accelerate firing of timers in such a case, or 
perhaps bounds on when to use catchup mode and when not to.


Partly, the lack of implementation is by deliberate choice; the logic 
involved in setting such bounds, and the wisdom of doing so, is most 
likely a decision for a policy agent in userspace, in our case, 
qemu.  In the end, that agent has full control over whether to set the 
guest TSC rate and over the choice of TSC mode.


What's lacking is the ability to force the use of a certain mode.  I 
think it's clear now, that needs to be a per-VM choice, not a global one.


Zach

Re: [PATCH kvm-unit-tests 0/3] DIV exception tests

2010-08-25 Thread Marcelo Tosatti
On Tue, Aug 24, 2010 at 02:01:08PM +0300, Avi Kivity wrote:
 DIV may raise an exception if dividing by zero or causing an overflow.  Test
 it.
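Architecturally, DIV and IDIV raise #DE in two situations: a zero divisor, or a quotient too large for the destination register. For the 32-bit forms (dividend in EDX:EAX), the check can be sketched in plain C as below; the function names are illustrative and unrelated to the emulator's own helpers, which are not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned DIV r/m32: EDX:EAX / src. #DE if src == 0 or the quotient
 * does not fit in EAX. */
static bool div32_raises_de(uint32_t edx, uint32_t eax, uint32_t src)
{
	uint64_t dividend = ((uint64_t)edx << 32) | eax;

	if (src == 0)
		return true;
	return dividend / src > UINT32_MAX;
}

/* Signed IDIV r/m32: #DE on a zero divisor or a quotient outside
 * [INT32_MIN, INT32_MAX]. */
static bool idiv32_raises_de(uint32_t edx, uint32_t eax, int32_t src)
{
	/* Reinterpret EDX:EAX as signed (two's complement, as on all
	 * mainstream compilers). */
	int64_t dividend = (int64_t)(((uint64_t)edx << 32) | eax);
	int64_t q;

	if (src == 0)
		return true;
	if (dividend == INT64_MIN && src == -1)
		return true;	/* quotient 2^63 overflows; also UB in C */
	q = dividend / src;
	return q < INT32_MIN || q > INT32_MAX;
}
```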
 
 Avi Kivity (3):
   Support #DE (divide error) exception handlers
   Allow emulation tests to trap exceptions
   Add DIV tests
 
  config-x86-common.mak |3 ++-
  x86/emulator.c|   20 
  x86/idt.c |8 +++-
  3 files changed, 29 insertions(+), 2 deletions(-)

Applied, thanks.



Re: [PATCH 3/3] KVM: x86 emulator: trap and propagate #DE from DIV and IDIV

2010-08-25 Thread Marcelo Tosatti
On Tue, Aug 24, 2010 at 02:10:29PM +0300, Avi Kivity wrote:
 Signed-off-by: Avi Kivity a...@redhat.com
 ---
  arch/x86/kvm/emulate.c |   17 ++---
  1 files changed, 14 insertions(+), 3 deletions(-)
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index f82e43a..a7e26d0 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -505,6 +505,12 @@ static void emulate_ts(struct x86_emulate_ctxt *ctxt, int err)
   emulate_exception(ctxt, TS_VECTOR, err, true);
  }
  
 +static int emulate_de(struct x86_emulate_ctxt *ctxt)
 +{
 + emulate_exception(ctxt, DE_VECTOR, 0, false);
 + return X86EMUL_PROPAGATE_FAULT;
 +}
 +
  static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
 struct x86_emulate_ops *ops,
 unsigned long eip, u8 *dest)
 @@ -1459,6 +1465,7 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
   struct decode_cache *c = ctxt->decode;
   unsigned long *rax = &c->regs[VCPU_REGS_RAX];
   unsigned long *rdx = &c->regs[VCPU_REGS_RDX];
 + u8 de = 0;
  
   switch (c->modrm_reg) {
   case 0 ... 1:   /* test */
 @@ -1477,14 +1484,18 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
   emulate_1op_rax_rdx("imul", c->src, *rax, *rdx, ctxt->eflags);
   break;
   case 6: /* div */
 - emulate_1op_rax_rdx("div", c->src, *rax, *rdx, ctxt->eflags);
 + emulate_1op_rax_rdx_ex("div", c->src, *rax, *rdx,
 +ctxt->eflags, de);
   break;
   case 7: /* idiv */
 - emulate_1op_rax_rdx("idiv", c->src, *rax, *rdx, ctxt->eflags);
 + emulate_1op_rax_rdx_ex("idiv", c->src, *rax, *rdx,
 +ctxt->eflags, de);
   break;
   default:
   return X86EMUL_UNHANDLEABLE;
   }
 + if (de)
 + return emulate_de(ctxt);
   return X86EMUL_CONTINUE;
  }
  
 @@ -3363,7 +3374,7 @@ special_insn:
   break;
   case 0xf6 ... 0xf7: /* Grp3 */
   if (emulate_grp3(ctxt, ops) != X86EMUL_CONTINUE)
 - goto cannot_emulate;
 + goto done;
   break;

Must assign rc to emulate_grp3() retval.
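The point of this review comment is that the exit path acts on whatever is in rc, so jumping to done without first storing the helper's return value would drop the #DE. A self-contained model of the corrected control flow, with hypothetical names standing in for the emulator's; it assumes, as the comment implies, that the code at done: reports rc to the caller.

```c
/* Hypothetical stand-ins for the emulator's return codes. */
enum emul_rc { EMUL_CONTINUE, EMUL_PROPAGATE_FAULT, EMUL_UNHANDLEABLE };

/* Stand-in for emulate_grp3(): reports a fault for a zero divisor. */
static enum emul_rc fake_grp3(unsigned int divisor)
{
	return divisor == 0 ? EMUL_PROPAGATE_FAULT : EMUL_CONTINUE;
}

/* The helper's result is stored in rc before "goto done", because the
 * exit path returns rc; testing the call directly would lose it. */
static enum emul_rc emulate_one(unsigned int divisor)
{
	enum emul_rc rc = EMUL_CONTINUE;

	rc = fake_grp3(divisor);
	if (rc != EMUL_CONTINUE)
		goto done;
	/* ... writeback of results would happen here ... */
done:
	return rc;
}
```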



Re: [PATCH 1/3] S390: take a full byte as ext_param indicator

2010-08-25 Thread Marcelo Tosatti
On Tue, Aug 24, 2010 at 03:48:50PM +0200, Alexander Graf wrote:
 Currently the ext_param field only distinguishes between config change and
 vring interrupt. We can do a lot more with it though, so let's enable a
 full byte of possible values and convert the constants to #defines while at it.
 
 Signed-off-by: Alexander Graf ag...@suse.de
 
 ---
 
 v1 -> v2:
 
   - move defines to virtio_s390.h
 ---
  arch/s390/include/asm/kvm_virtio.h |6 ++
  drivers/s390/kvm/kvm_virtio.c  |   19 +--
  2 files changed, 19 insertions(+), 6 deletions(-)

Applied, thanks.



Re: [PATCH] cgroups: fix API thinko

2010-08-25 Thread Andrew Morton
On Fri, 06 Aug 2010 10:38:24 -0600
Alex Williamson alex.william...@redhat.com wrote:

 On Fri, 2010-08-06 at 09:34 -0700, Sridhar Samudrala wrote:
  On 8/5/2010 3:59 PM, Michael S. Tsirkin wrote:
   cgroup_attach_task_current_cg API that we have upstream is backwards: we
   really need an API to attach to the cgroups from another process A to
   the current one.
  
   In our case (vhost), a privileged user wants to attach its task to 
   cgroups
   from a less privileged one; the API makes us run it in the other
   task's context, and this fails.
  
   So let's make the API generic and just pass in 'from' and 'to' tasks.
   Add an inline wrapper for cgroup_attach_task_current_cg to avoid
   breaking bisect.
  
   Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
   ---
  
   Paul, Li, Sridhar, could you please review the following
   patch?
  
   I only compile-tested it due to travel, but it looks
   straightforward to me.
   Alex Williamson volunteered to test and report the results.
   Sending out now for review as I might be offline for a bit.
   Will only try to merge when done, obviously.
  
   If OK, I would like to merge this through -net tree,
   together with the patch fixing vhost-net.
   Let me know if that sounds ok.
  
   Thanks!
  
   This patch is on top of net-next; it is needed to fix the
   vhost-net regression in net-next, where a non-privileged
   process can't enable the device anymore:
  
   when qemu uses vhost, inside the ioctl call it
   creates a thread, and tries to add
   this thread to the groups of current, and it fails.
   But we control the thread, so to solve the problem,
   we really should tell it 'connect to our cgroups'.

So am I correct to assume that this change is now needed in 2.6.36, and
unneeded in 2.6.35?

Can it affect the userspace-kernel API in any manner?  If so, it
should be backported into earlier kernels to reduce the number of
incompatible kernels out there.

Paul, did you have any comments?

I didn't see any update in response to the minor review comments, so...


 include/linux/cgroup.h |1 +
 kernel/cgroup.c|6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff -puN include/linux/cgroup.h~cgroups-fix-api-thinko-fix include/linux/cgroup.h
--- a/include/linux/cgroup.h~cgroups-fix-api-thinko-fix
+++ a/include/linux/cgroup.h
@@ -579,6 +579,7 @@ void cgroup_iter_end(struct cgroup *cgrp
 int cgroup_scan_tasks(struct cgroup_scanner *scan);
 int cgroup_attach_task(struct cgroup *, struct task_struct *);
 int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
+
 static inline int cgroup_attach_task_current_cg(struct task_struct *tsk)
 {
return cgroup_attach_task_all(current, tsk);
diff -puN kernel/cgroup.c~cgroups-fix-api-thinko-fix kernel/cgroup.c
--- a/kernel/cgroup.c~cgroups-fix-api-thinko-fix
+++ a/kernel/cgroup.c
@@ -1798,13 +1798,13 @@ out:
 int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
 {
struct cgroupfs_root *root;
-   struct cgroup *cur_cg;
int retval = 0;
 
cgroup_lock();
for_each_active_root(root) {
-   cur_cg = task_cgroup_from_root(from, root);
-   retval = cgroup_attach_task(cur_cg, tsk);
+   struct cgroup *from_cg = task_cgroup_from_root(from, root);
+
+   retval = cgroup_attach_task(from_cg, tsk);
if (retval)
break;
}
_



[RFC 0/7] KVM steal time implementation

2010-08-25 Thread Glauber Costa
Hi,

In this series, I am proposing a steal time implementation
for KVM. It is an RFC, so don't be too harsh =]

There are two parts of it: the guest and host part.

The proposal for the guest part is to just change the
common time accounting code, and try to identify at that spot
whether or not we should account any steal time. I considered
this idea less cumbersome than trying to cook up a clockevents
implementation ourselves, since I see little value in it.
I am, however, pretty open to suggestions.

For the host-guest communications, I am using a shared
page, in the same way as pvclock. Because of that, I am just
hijacking the pvclock structure anyway. There is an unused 32-bit
padding field (pad0) floating by; at msec resolution it gives us
room for about 49 days of accumulated steal time.
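How long a 32-bit counter lasts depends only on its resolution, and is easy to check. A small helper (the name is ours, not from the patch):

```c
#include <stdint.h>

/* Days an unsigned 32-bit counter can span before wrapping, given how
 * many counter units elapse per second. */
static double u32_span_days(double units_per_second)
{
	double units = (double)UINT32_MAX + 1.0;  /* 2^32 distinct values */

	return units / units_per_second / 86400.0;
}
```

At msec resolution, u32_span_days(1000.0) is about 49.7 days; a 32-bit count of whole seconds would last about 136 years.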

The main idea is to timestamp our exit and entry through
sched notifiers, and export the value at pvclock updates.
This obviously has some disadvantages: by doing this we
are giving up future ideas about only updating
this structure once, and even right now, it won't work
on pinned SMP (since we don't update pvclock if we
haven't changed CPUs).

But again, it is just an RFC, and I'd like to gauge the
reception of the idea as a whole.

Have a nice review.

Glauber Costa (6):
  change headers preparing for steal time
  measure time out of guest
  change kernel accounting to include steal time
  kvm steal time implementation
  touch softlockup watchdog
  tell guest about steal time feature

Zachary Amsden (1):
  Implement getnsboottime kernel API

 arch/x86/include/asm/kvm_host.h|2 +
 arch/x86/include/asm/kvm_para.h|1 +
 arch/x86/include/asm/pvclock-abi.h |4 ++-
 arch/x86/kernel/kvmclock.c |   38 
 arch/x86/kvm/x86.c |   15 -
 include/linux/sched.h  |1 +
 include/linux/time.h   |1 +
 kernel/sched.c |   29 +++
 kernel/time/timekeeping.c  |   27 +
 9 files changed, 115 insertions(+), 3 deletions(-)



[RFC 1/7] Implement getnsboottime kernel API

2010-08-25 Thread Glauber Costa
From: Zachary Amsden zams...@redhat.com

Add a kernel call to get the number of nanoseconds since boot.  This
is generally useful enough to make it a generic call.
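The body of getnsboottime() below is a plain sum of three seconds/nanoseconds pairs plus the clocksource delta. A userspace model of that arithmetic (type and function names hypothetical), showing that the nanosecond parts need no normalization first, because everything is folded into one signed 64-bit count at the end:

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for a struct timespec value. */
struct sec_nsec {
	int64_t sec;
	int64_t nsec;
};

/* realtime + wall_to_monotonic + total sleep time + ns since last tick,
 * returned as nanoseconds since boot. */
static int64_t boot_ns(struct sec_nsec xtime, struct sec_nsec wall_to_mono,
		       struct sec_nsec sleep, int64_t ns_since_tick)
{
	int64_t secs = xtime.sec + wall_to_mono.sec + sleep.sec;
	int64_t nsecs = xtime.nsec + wall_to_mono.nsec + sleep.nsec +
			ns_since_tick;

	/* nsecs may exceed one second here; it is simply added in. */
	return nsecs + secs * NSEC_PER_SEC;
}
```

For example, realtime 1000.5 s with wall_to_monotonic {-901, 800000000} (that is, -900.2 s) and 5 s of sleep gives 105.3 s since boot.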

Signed-off-by: Zachary Amsden zams...@redhat.com
---
 include/linux/time.h  |1 +
 kernel/time/timekeeping.c |   27 +++
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index ea3559f..5d04108 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -145,6 +145,7 @@ extern void getnstimeofday(struct timespec *tv);
 extern void getrawmonotonic(struct timespec *ts);
 extern void getboottime(struct timespec *ts);
 extern void monotonic_to_bootbased(struct timespec *ts);
+extern s64 getnsboottime(void);
 
 extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
 extern int timekeeping_valid_for_hres(void);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index caf8d4d..d250f0a 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -285,6 +285,33 @@ void ktime_get_ts(struct timespec *ts)
 }
 EXPORT_SYMBOL_GPL(ktime_get_ts);
 
+
+/**
+ * getnsboottime - get the bootbased clock in nsec format
+ *
+ * The function calculates the bootbased clock from the realtime
+ * clock, the wall_to_monotonic offset and the total sleep time, and
+ * returns the result in nanoseconds.
+ */
+s64 getnsboottime(void)
+{
+   unsigned int seq;
+   s64 secs, nsecs;
+
+   WARN_ON(timekeeping_suspended);
+
+   do {
+   seq = read_seqbegin(&xtime_lock);
+   secs = xtime.tv_sec + wall_to_monotonic.tv_sec;
+   secs += total_sleep_time.tv_sec;
+   nsecs = xtime.tv_nsec + wall_to_monotonic.tv_nsec;
+   nsecs += total_sleep_time.tv_nsec + timekeeping_get_ns();
+
+   } while (read_seqretry(&xtime_lock, seq));
+   return nsecs + (secs * NSEC_PER_SEC);
+}
+EXPORT_SYMBOL_GPL(getnsboottime);
+
 /**
  * do_gettimeofday - Returns the time of day in a timeval
  * @tv:pointer to the timeval to be set
-- 
1.6.2.2



[RFC 2/7] change headers preparing for steal time

2010-08-25 Thread Glauber Costa
This guest/host common patch prepares infrastructure for
the steal time implementation. Some constants are added,
and a name change happens in pvclock vcpu structure.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/include/asm/kvm_para.h|1 +
 arch/x86/include/asm/pvclock-abi.h |4 +++-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 05eba5e..1759c81 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -25,6 +25,7 @@
  * in pvclock structure. If no bits are set, all flags are ignored.
  */
 #define KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24
+#define KVM_FEATURE_CLOCKSOURCE_STEAL_BIT  25
 
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 35f2d19..417061b 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -24,7 +24,7 @@
 
 struct pvclock_vcpu_time_info {
u32   version;
-   u32   pad0;
+   u32   steal_time;
u64   tsc_timestamp;
u64   system_time;
u32   tsc_to_system_mul;
@@ -40,5 +40,7 @@ struct pvclock_wall_clock {
 } __attribute__((__packed__));
 
 #define PVCLOCK_TSC_STABLE_BIT (1 << 0)
+#define PVCLOCK_STEAL_BIT   (2 << 0)
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */
-- 
1.6.2.2



[RFC 3/7] measure time out of guest

2010-08-25 Thread Glauber Costa
By measuring the time between a vcpu_put and a vcpu_load, we can
estimate how much time the guest stayed off the CPU.
This is exported to the guest at every clock update.
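The bookkeeping this patch adds (time_out and last_time_out) can be modelled in a few lines. A sketch with a hypothetical struct and function names, mirroring the logic in kvm_arch_vcpu_put() and kvm_arch_vcpu_load() below:

```c
#include <stdint.h>

/* Hypothetical per-vCPU bookkeeping mirroring time_out/last_time_out. */
struct vcpu_steal {
	uint64_t time_out;       /* total ns spent descheduled */
	uint64_t last_time_out;  /* timestamp of the last put, 0 if none */
};

/* Called when the vCPU is scheduled out (vcpu_put). */
static void steal_on_put(struct vcpu_steal *s, uint64_t now_ns)
{
	s->last_time_out = now_ns;
}

/* Called when the vCPU is scheduled back in (vcpu_load); the very first
 * load has no previous put and adds nothing. */
static void steal_on_load(struct vcpu_steal *s, uint64_t now_ns)
{
	if (s->last_time_out != 0)
		s->time_out += now_ns - s->last_time_out;
}
```

As written, this counts every interval between a put and the next load, including time the vCPU thread spends blocked voluntarily; the guest side (RFC 4/7) deals with that by reading and discarding the value on idle ticks.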

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |2 ++
 arch/x86/kvm/x86.c  |   12 +++-
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 502e53f..bc28aff 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -364,6 +364,8 @@ struct kvm_vcpu_arch {
u64 hv_vapic;
 
cpumask_var_t wbinvd_dirty_mask;
+   u64 time_out;
+   u64 last_time_out;
 };
 
 struct kvm_arch {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e7e3b50..680feaa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -925,8 +925,9 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
vcpu->hv_clock.system_time = ts.tv_nsec +
 (NSEC_PER_SEC * (u64)ts.tv_sec) + 
v->kvm->arch.kvmclock_offset;
 
-   vcpu->hv_clock.flags = 0;
+   vcpu->hv_clock.flags = PVCLOCK_STEAL_BIT;
 
+   vcpu->hv_clock.steal_time = vcpu->time_out / 100;
/*
 * The interface expects us to write an even number signaling that the
 * update is finished. Since the guest won't see the intermediate
@@ -1798,6 +1799,8 @@ static bool need_emulate_wbinvd(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
+   s64 now;
+
/* Address WBINVD may be executed by guest */
if (need_emulate_wbinvd(vcpu)) {
if (kvm_x86_ops->has_wbinvd_exit())
@@ -1815,12 +1818,19 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
per_cpu(cpu_tsc_khz, cpu) = khz;
}
kvm_request_guest_time_update(vcpu);
+
+   now = getnsboottime();
+
+   if (vcpu->arch.last_time_out != 0)
+   vcpu->arch.time_out += now - vcpu->arch.last_time_out;
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
kvm_x86_ops->vcpu_put(vcpu);
kvm_put_guest_fpu(vcpu);
+
+   vcpu->arch.last_time_out = getnsboottime();
 }
 
 static int is_efer_nx(void)
-- 
1.6.2.2



[RFC 4/7] change kernel accounting to include steal time

2010-08-25 Thread Glauber Costa
This patch proposes a common steal time implementation. When no
steal time is accounted, we just add a branch to the current
accounting code, which shouldn't add much overhead.

When we do want to register steal time, we proceed as follows:
- if we would account user or system time in this tick, and there is
  out-of-cpu time registered, we skip it altogether, and account steal
  time only.
- if we would account user or system time in this tick, and we got the
  cpu for the whole slice, we proceed normally.
- if we are idle in this tick, we flush out-of-cpu time to give it the
  chance to update whatever last-measure internal variable it may have.
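The three rules above reduce to a small decision per tick: is the tick charged as user, system or idle time, or replaced by steal time? A sketch of that decision as a pure function (enum and function names hypothetical; the patch itself does this inline in account_user_time(), account_system_time() and account_idle_time()):

```c
#include <stdint.h>

enum tick_kind { TICK_USER, TICK_SYSTEM, TICK_IDLE };
enum charge { CHARGE_USER, CHARGE_SYSTEM, CHARGE_STEAL, CHARGE_IDLE };

/* pending_steal is the out-of-CPU time reported since the last read;
 * reading it is assumed to consume it (the "flush" on idle ticks). */
static enum charge charge_tick(enum tick_kind kind, uint64_t pending_steal)
{
	switch (kind) {
	case TICK_IDLE:
		/* Steal is read but discarded: we didn't want to run anyway. */
		return CHARGE_IDLE;
	case TICK_USER:
		return pending_steal ? CHARGE_STEAL : CHARGE_USER;
	case TICK_SYSTEM:
	default:
		return pending_steal ? CHARGE_STEAL : CHARGE_SYSTEM;
	}
}
```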

This approach is simple, but proved to work well in my test scenarios:
a UP guest on a UP host, with a cpu-hog in both guest and host, shows
~50% steal time. Steal time is also accounted proportionally if
nice values are given to the host cpu-hog.

A cpu-hog in the host with no load in the guest produces 0% steal time
and 100% idle, as one would expect.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 include/linux/sched.h |1 +
 kernel/sched.c|   29 +
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 047..e571ddd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -312,6 +312,7 @@ long io_schedule_timeout(long timeout);
 extern void cpu_init (void);
 extern void trap_init(void);
 extern void update_process_times(int user);
+extern cputime_t (*hypervisor_steal_time)(void);
 extern void scheduler_tick(void);
 
 extern void sched_show_task(struct task_struct *p);
diff --git a/kernel/sched.c b/kernel/sched.c
index f52a880..9695c92 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3157,6 +3157,16 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p)
return ns;
 }
 
+cputime_t (*hypervisor_steal_time)(void) = NULL;
+
+static inline cputime_t get_steal_time_from_hypervisor(void)
+{
+   if (!hypervisor_steal_time)
+   return 0;
+   return hypervisor_steal_time();
+}
+
+
 /*
  * Account user cpu time to a process.
  * @p: the process that the cpu time gets accounted to
@@ -3169,6 +3179,12 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
cputime64_t tmp;
 
+   tmp = get_steal_time_from_hypervisor();
+   if (tmp) {
+   account_steal_time(tmp);
+   return;
+   }
+
/* Add user time to process. */
p->utime = cputime_add(p->utime, cputime);
p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
@@ -3234,6 +3250,12 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
return;
}
 
+   tmp = get_steal_time_from_hypervisor();
+   if (tmp) {
+   account_steal_time(tmp);
+   return;
+   }
+
/* Add system time to process. */
p->stime = cputime_add(p->stime, cputime);
p->stimescaled = cputime_add(p->stimescaled, cputime_scaled);
@@ -3276,6 +3298,13 @@ void account_idle_time(cputime_t cputime)
cputime64_t cputime64 = cputime_to_cputime64(cputime);
struct rq *rq = this_rq();
 
+   /*
+* if we're idle, we don't account it as steal time, since we did
+* not want to run anyway. We do call the steal function, however, to
+* give the guest the chance to flush its internal buffers
+*/
+   get_steal_time_from_hypervisor();
+
if (atomic_read(&rq->nr_iowait) > 0)
cpustat->iowait = cputime64_add(cpustat->iowait, cputime64);
else
-- 
1.6.2.2



[RFC 5/7] kvm steal time implementation

2010-08-25 Thread Glauber Costa
This is the proposed kvm-side steal time implementation.
It is migration safe, as it checks flags at every read.
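The guest-side read follows a check-flag-then-take-delta pattern: if the feature flag is absent (for example after migrating to a host without it), the guest simply sees no new steal time. A sketch with a hypothetical struct and flag value standing in for the pvclock page; unlike the patch, which widens the counter to u64 before subtracting, this sketch subtracts in 32 bits so that a single wrap of the counter still yields the right delta.

```c
#include <stdint.h>

#define STEAL_FLAG (1u << 1)  /* hypothetical; same value as (2 << 0) */

/* Hypothetical view of the shared page: only the fields used here. */
struct shared_time {
	uint32_t steal_ms;
	uint8_t flags;
};

/* Return the ms stolen since the previous call on this CPU, updating the
 * per-CPU last value. The flag is checked on every read. */
static uint32_t read_steal_delta(const struct shared_time *src,
				 uint32_t *last_ms)
{
	uint32_t now, delta;

	if (!(src->flags & STEAL_FLAG))
		return 0;
	now = src->steal_ms;
	delta = now - *last_ms;   /* modulo 2^32: tolerates one wrap */
	*last_ms = now;
	return delta;
}
```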

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/kernel/kvmclock.c |   35 +++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index eb9b76c..a1f4852 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -18,6 +18,8 @@
 
 #include <linux/clocksource.h>
 #include <linux/kvm_para.h>
+#include <linux/kernel_stat.h>
+#include <linux/sched.h>
 #include <asm/pvclock.h>
 #include <asm/msr.h>
 #include <asm/apic.h>
@@ -41,6 +43,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
 
 /* The hypervisor will put information about time periodically here */
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
+static DEFINE_PER_CPU(u64, steal_info);
 static struct pvclock_wall_clock wall_clock;
 
 /*
@@ -82,6 +85,32 @@ static cycle_t kvm_clock_read(void)
return ret;
 }
 
+static DEFINE_PER_CPU(u64, steal_info);
+
+cputime_t kvm_get_steal_time(void)
+{
+   u64 delta = 0;
+   u64 *last_steal_info, this_steal_info;
+   struct pvclock_vcpu_time_info *src;
+
+   src = &get_cpu_var(hv_clock);
+   if (!(src->flags & PVCLOCK_STEAL_BIT))
+   goto out;
+
+   this_steal_info = src->steal_time;
+   put_cpu_var(hv_clock);
+
+   last_steal_info = &get_cpu_var(steal_info);
+
+   delta = this_steal_info - *last_steal_info;
+
+   *last_steal_info = this_steal_info;
+   put_cpu_var(steal_info);
+
+out:
+   return msecs_to_cputime(delta);
+}
+
 static cycle_t kvm_clock_get_cycles(struct clocksource *cs)
 {
return kvm_clock_read();
@@ -134,6 +163,8 @@ static int kvm_register_clock(char *txt)
printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
   cpu, high, low, txt);
 
+   per_cpu(steal_info, cpu) = 0;
+
return native_write_msr_safe(msr_kvm_system_time, low, high);
 }
 
@@ -218,4 +249,8 @@ void __init kvmclock_init(void)
 
if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
+
+   
+   if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STEAL_BIT))
+   hypervisor_steal_time = kvm_get_steal_time;
 }
-- 
1.6.2.2



[RFC 7/7] tell guest about steal time feature

2010-08-25 Thread Glauber Costa
Guest kernel will only activate steal time if the host exports it.
Advertise the feature to the guest.
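Host and guest agree on the feature through a single bit in the KVM CPUID features leaf (EAX). A standalone sketch of both sides: the bit numbers for STABLE_BIT and STEAL_BIT come from these patches, the others are the existing KVM_FEATURE_* values, and the two helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_FEATURE_CLOCKSOURCE               0
#define KVM_FEATURE_NOP_IO_DELAY              1
#define KVM_FEATURE_CLOCKSOURCE2              3
#define KVM_FEATURE_CLOCKSOURCE_STABLE_BIT   24
#define KVM_FEATURE_CLOCKSOURCE_STEAL_BIT    25  /* proposed in RFC 2/7 */

/* Host side: the EAX value do_cpuid_ent() reports after this patch. */
static uint32_t advertised_features(void)
{
	return (1u << KVM_FEATURE_CLOCKSOURCE) |
	       (1u << KVM_FEATURE_NOP_IO_DELAY) |
	       (1u << KVM_FEATURE_CLOCKSOURCE2) |
	       (1u << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
	       (1u << KVM_FEATURE_CLOCKSOURCE_STEAL_BIT);
}

/* Guest side: the bit test that decides whether steal time is enabled. */
static bool has_feature(uint32_t eax, unsigned int bit)
{
	return (eax >> bit) & 1u;
}
```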

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/kvm/x86.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 680feaa..9a20cd8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2101,7 +2101,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
 (1 << KVM_FEATURE_NOP_IO_DELAY) |
 (1 << KVM_FEATURE_CLOCKSOURCE2) |
-(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+(1 << KVM_FEATURE_CLOCKSOURCE_STEAL_BIT);
entry->ebx = 0;
entry->ecx = 0;
entry->edx = 0;
-- 
1.6.2.2



[RFC 6/7] touch softlockup watchdog

2010-08-25 Thread Glauber Costa
With a reliable steal time mechanism, we can tell whether we were
out of the cpu for very long, as opposed to the case
where we simply hit a real softlockup.

If we were out of the cpu, the watchdog is fed, making
bogus softlockups disappear.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/kernel/kvmclock.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index a1f4852..1c496c8 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -104,6 +104,9 @@ cputime_t kvm_get_steal_time(void)
 
delta = this_steal_info - *last_steal_info;
 
+   if (delta > 1000UL)
+   touch_softlockup_watchdog();
+
*last_steal_info = this_steal_info;
put_cpu_var(steal_info);
 
-- 
1.6.2.2



Re: [KVM timekeeping 25/35] Add clock catchup mode

2010-08-25 Thread Marcelo Tosatti
On Wed, Aug 25, 2010 at 10:48:20AM -1000, Zachary Amsden wrote:
 On 08/25/2010 07:27 AM, Marcelo Tosatti wrote:
 On Thu, Aug 19, 2010 at 10:07:39PM -1000, Zachary Amsden wrote:
 Make the clock update handler handle generic clock synchronization,
 not just KVM clock.  We add a catchup mode which keeps passthrough
 TSC in line with absolute guest TSC.
 
 Signed-off-by: Zachary Amsden <zams...@redhat.com>
 ---
   arch/x86/include/asm/kvm_host.h |1 +
   arch/x86/kvm/x86.c  |   55 ++
   2 files changed, 38 insertions(+), 18 deletions(-)
 
 kvm_x86_ops->vcpu_load(vcpu, cpu);
 -   if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
 +   if (unlikely(vcpu->cpu != cpu) || vcpu->arch.tsc_rebase) {
 /* Make sure TSC doesn't go backwards */
 s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
 native_read_tsc() - vcpu->arch.last_host_tsc;
 if (tsc_delta < 0)
 mark_tsc_unstable("KVM discovered backwards TSC");
 -   if (check_tsc_unstable())
 +   if (check_tsc_unstable()) {
 kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
 -   kvm_migrate_timers(vcpu);
 +   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 +   }
 +   if (vcpu->cpu != cpu)
 +   kvm_migrate_timers(vcpu);
 vcpu->cpu = cpu;
 +   vcpu->arch.tsc_rebase = 0;
 }
   }
 
 @@ -1947,6 +1961,12 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 kvm_x86_ops->vcpu_put(vcpu);
 kvm_put_guest_fpu(vcpu);
 vcpu->arch.last_host_tsc = native_read_tsc();
 +
 +   /* For unstable TSC, force compensation and catchup on next CPU */
 +   if (check_tsc_unstable()) {
 +   vcpu->arch.tsc_rebase = 1;
 +   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 +   }
 The mix of catchup/trap versus stable/unstable TSC is confusing and
 difficult to grasp. Can you please introduce all the infrastructure
 first, then control usage of them in centralized places? Examples:
 
 +static void kvm_update_tsc_trapping(struct kvm *kvm)
 +{
 +   int trap, i;
 +   struct kvm_vcpu *vcpu;
 +
 +   trap = check_tsc_unstable() && atomic_read(&kvm->online_vcpus) > 1;
 +   kvm_for_each_vcpu(i, vcpu, kvm)
 +   kvm_x86_ops->set_tsc_trap(vcpu, trap && !vcpu->arch.time_page);
 +}
 
 +   /* For unstable TSC, force compensation and catchup on next CPU */
 +   if (check_tsc_unstable()) {
 +   vcpu->arch.tsc_rebase = 1;
 +   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 +   }
 
 
 kvm_guest_time_update is becoming very confusing too. I understand this
 is due to the many cases it is dealing with, but please make it as simple
 as possible.
 
 I tried to comment as best as I could.  I think the whole
 kvm_update_tsc_trapping thing is probably a poor design choice.
 It works, but it's thoroughly unintelligible right now without
 spending some days figuring out why.
 
 I'll rework the tail series of patches to try to make them more clear.
 
 +   /*
 +* If we are trapping and no longer need to, use catchup to
 +* ensure passthrough TSC will not be less than trapped TSC
 +*/
 +   if (vcpu->tsc_mode == TSC_MODE_PASSTHROUGH && vcpu->tsc_trapping &&
 +   ((this_tsc_khz <= v->kvm->arch.virtual_tsc_khz || kvmclock))) {
 +   catchup = 1;
 
 What, TSC trapping with kvmclock enabled?
 
 Transitioning to use of kvmclock after a cold boot means we may have
 been trapping and now we will not be.
 
 For both catchup and trapping the resolution of the host clock is
 important, as Glauber commented for kvmclock. Can you comment on the
 problems that arise from a low res clock for both modes?
 
 Similarly for catchup mode, the effect of exit frequency. No need for
 any guarantees?
 
 The scheduler will do something to get an IRQ at whatever resolution
 it uses for its timeslice.  That guarantees an exit per timeslice,
 so we'll never be behind by more than one slice while scheduling.
 While not scheduling, we're dormant anyway, waiting on either an IRQ
 or shared memory variable change.  Local timers could end up behind
 when dormant.
 
 We may need a hack to accelerate firing of timers in such a case, or
 perhaps bounds on when to use catchup mode and when not to.

What about emulating rdtsc with a low res clock? 

The RDTSC instruction reads the time-stamp counter and is guaranteed to
return a monotonically increasing unique value whenever executed, except
for a 64-bit counter wraparound.
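One way to honor that guarantee on top of a coarse clock is to clamp each result to be strictly greater than the last value handed out. A minimal sketch under that assumption (names hypothetical; per vCPU only, so it does not address ordering across vCPUs):

```c
#include <stdint.h>

/* Hypothetical state for an emulated TSC derived from a coarse clock. */
struct emul_tsc {
	uint64_t last;  /* last value returned to the guest */
};

/* Convert coarse host time (ns) to cycles at tsc_khz, then force the
 * result above the previous one, so repeated reads within one coarse
 * tick still return unique, increasing values. */
static uint64_t emulated_rdtsc(struct emul_tsc *t, uint64_t coarse_ns,
			       uint32_t tsc_khz)
{
	uint64_t cycles = (coarse_ns / 1000000ULL) * tsc_khz +
			  (coarse_ns % 1000000ULL) * tsc_khz / 1000000ULL;

	if (cycles <= t->last)
		cycles = t->last + 1;
	t->last = cycles;
	return cycles;
}
```

The cost of the clamp is that a guest reading in a tight loop sees the counter advance by one per read within a tick, so it runs slightly ahead of the coarse clock until the next tick catches up.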

 Partly, the lack of implementation is by deliberate choice; the
 logic involved in setting such bounds, and the wisdom of doing so, is
 most likely a decision for a policy agent in userspace, in our
 case, qemu.  In the end, that agent has full control over whether to
 set the guest TSC rate and over the choice of TSC mode.
 
 What's lacking is the ability to force the use of a certain mode. 

QEMU/KVM Hardware Acceleration Clarity Question

2010-08-25 Thread matthew . r . rohrer
Hello, 
After looking all over the net and trying some KVM stuff I just want to
get some clarification on which QEMU I should be using.  I am trying to
use virtio with hardware acceleration.  

Platform: Fedora x86_64, kernel 2.6.32.16-150, KVM kernel modules
2.6.32.16-150

Question:
Does the qemu-system-x86_64 version 0.12.5 with the -enable-kvm switch
have all of the hardware acceleration stuff that is in qemu-kvm 0.11.0?
The reason I ask is that I seem to be having network performance problems,
which I will save for another email.  I have read various message boards
indicating that qemu-system does not have all of the acceleration in it
yet, and others that make me believe it should have all of the
acceleration stuff that is in qemu-kvm.  I have tried both and seem to get
slightly better performance with qemu-kvm.  

The qemu-kvm version came with the installation, and I can't seem to
find a newer one.  I downloaded qemu-system 0.12.5 from the repository
and built it myself.

The qemu-system appears to be doing hardware acceleration, as its
indicators match the qemu-kvm indicators.  But since the performance was
slightly worse, I thought I would check whether version 0.12.5 has all
of the hardware acceleration that is in qemu-kvm version 0.11.0.

Also, is it correct that vhost support is only in kernel 2.6.35, even
though qemu 0.12.5 has support for it?

Thank you very much in advance! Your help is greatly appreciated.


Re: [PATCH] KVM: x86 emulator: add CALL FAR instruction emulation (opcode 9a)

2010-08-25 Thread Marcelo Tosatti
On Wed, Aug 25, 2010 at 02:10:53PM +0800, Wei Yongjun wrote:
 Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
 ---
  arch/x86/kvm/emulate.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

Applied (and v2 testcase), thanks.
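For reference, opcode 9a is CALL FAR with a direct ptr16:16 operand: in 16-bit code the immediate is a 16-bit offset followed by a 16-bit segment, and the CPU pushes CS and then the return IP before loading the new CS:IP. The following is a hypothetical real-mode-only sketch of those semantics; it is not em_call_far(), and it ignores protected mode, 32-bit operand size and fault checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical real-mode CPU state: just what CALL FAR touches. */
struct rm_cpu {
    uint16_t cs, ip, ss, sp;
    uint8_t mem[1u << 20];   /* 1 MiB real-mode address space */
};

static uint32_t linear(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFF;   /* wrap at 1 MiB */
}

static uint16_t read16(const struct rm_cpu *c, uint32_t addr)
{
    /* little-endian 16-bit load */
    return (uint16_t)(c->mem[addr & 0xFFFFF] |
                      (c->mem[(addr + 1) & 0xFFFFF] << 8));
}

static void push16(struct rm_cpu *c, uint16_t val)
{
    c->sp -= 2;                                   /* stack grows down */
    uint32_t addr = linear(c->ss, c->sp);
    c->mem[addr] = (uint8_t)(val & 0xFF);
    c->mem[(addr + 1) & 0xFFFFF] = (uint8_t)(val >> 8);
}

/* Execute "9a iw iw" (CALL ptr16:16) at CS:IP; false if not opcode 0x9a. */
static bool call_far_9a(struct rm_cpu *c)
{
    uint32_t insn = linear(c->cs, c->ip);

    if (c->mem[insn] != 0x9a)
        return false;
    uint16_t new_ip = read16(c, insn + 1);        /* offset comes first */
    uint16_t new_cs = read16(c, insn + 3);        /* then the segment */
    push16(c, c->cs);                             /* push CS ... */
    push16(c, (uint16_t)(c->ip + 5));             /* ... then return IP */
    c->cs = new_cs;
    c->ip = new_ip;
    return true;
}
```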


Re: [KVM timekeeping 25/35] Add clock catchup mode

2010-08-25 Thread Glauber Costa
On Wed, Aug 25, 2010 at 07:01:34PM -0300, Marcelo Tosatti wrote:
 On Wed, Aug 25, 2010 at 10:48:20AM -1000, Zachary Amsden wrote:
  On 08/25/2010 07:27 AM, Marcelo Tosatti wrote:
  On Thu, Aug 19, 2010 at 10:07:39PM -1000, Zachary Amsden wrote:
  Make the clock update handler handle generic clock synchronization,
  not just KVM clock.  We add a catchup mode which keeps passthrough
  TSC in line with absolute guest TSC.
  
  Signed-off-by: Zachary Amsden zams...@redhat.com
  ---
    arch/x86/include/asm/kvm_host.h |    1 +
    arch/x86/kvm/x86.c              |   55 ++
    2 files changed, 38 insertions(+), 18 deletions(-)
  
       kvm_x86_ops->vcpu_load(vcpu, cpu);
  -    if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
  +    if (unlikely(vcpu->cpu != cpu) || vcpu->arch.tsc_rebase) {
           /* Make sure TSC doesn't go backwards */
           s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
               native_read_tsc() - vcpu->arch.last_host_tsc;
           if (tsc_delta < 0)
               mark_tsc_unstable("KVM discovered backwards TSC");
  -        if (check_tsc_unstable())
  +        if (check_tsc_unstable()) {
               kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
  -        kvm_migrate_timers(vcpu);
  +            kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
  +        }
  +        if (vcpu->cpu != cpu)
  +            kvm_migrate_timers(vcpu);
           vcpu->cpu = cpu;
  +        vcpu->arch.tsc_rebase = 0;
       }
   }
  
  @@ -1947,6 +1961,12 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
       kvm_x86_ops->vcpu_put(vcpu);
       kvm_put_guest_fpu(vcpu);
       vcpu->arch.last_host_tsc = native_read_tsc();
  +
  +    /* For unstable TSC, force compensation and catchup on next CPU */
  +    if (check_tsc_unstable()) {
  +        vcpu->arch.tsc_rebase = 1;
  +        kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
  +    }
  The mix between catchup,trap versus stable,unstable TSC is confusing and
  difficult to grasp. Can you please introduce all the infrastructure
  first, then control usage of them in centralized places? Examples:
  
  +static void kvm_update_tsc_trapping(struct kvm *kvm)
  +{
  +   int trap, i;
  +   struct kvm_vcpu *vcpu;
  +
  +   trap = check_tsc_unstable() && atomic_read(&kvm->online_vcpus) > 1;
  +   kvm_for_each_vcpu(i, vcpu, kvm)
  +       kvm_x86_ops->set_tsc_trap(vcpu, trap && !vcpu->arch.time_page);
  +}
  
  +   /* For unstable TSC, force compensation and catchup on next CPU */
  +   if (check_tsc_unstable()) {
  +   vcpu->arch.tsc_rebase = 1;
  +   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
  +   }
  
  
  kvm_guest_time_update is becoming very confusing too. I understand this
  is due to the many cases it's dealing with, but please make it as simple
  as possible.
  
  I tried to comment as best as I could.  I think the whole
  kvm_update_tsc_trapping thing is probably a poor design choice.
  It works, but it's thoroughly unintelligible right now without
  spending some days figuring out why.
  
  I'll rework the tail series of patches to try to make them more clear.
  
  +   /*
  +* If we are trapping and no longer need to, use catchup to
  +* ensure passthrough TSC will not be less than trapped TSC
  +*/
  +   if (vcpu->tsc_mode == TSC_MODE_PASSTHROUGH && vcpu->tsc_trapping &&
  +       ((this_tsc_khz <= v->kvm->arch.virtual_tsc_khz || kvmclock))) {
  +   catchup = 1;
  
  What, TSC trapping with kvmclock enabled?
  
  Transitioning to use of kvmclock after a cold boot means we may have
  been trapping and now we will not be.
  
  For both catchup and trapping the resolution of the host clock is
  important, as Glauber commented for kvmclock. Can you comment on the
  problems that arise from a low res clock for both modes?
  
  Similarly for catchup mode, the effect of exit frequency. No need for
  any guarantees?
  
  The scheduler will do something to get an IRQ at whatever resolution
  it uses for its timeslice.  That guarantees an exit per timeslice,
  so we'll never be behind by more than one slice while scheduling.
  While not scheduling, we're dormant anyway, waiting on either an IRQ
  or shared memory variable change.  Local timers could end up behind
  when dormant.
  
  We may need a hack to accelerate firing of timers in such a case, or
  perhaps bounds on when to use catchup mode and when to not.
 
 What about emulating rdtsc with low res clock? 
 
 The RDTSC instruction reads the time-stamp counter and is guaranteed to
 return a monotonically increasing unique value whenever executed, except
 for a 64-bit counter wraparound.
 
This is bad semantics, IMHO. It is a totally different behaviour than the
one guest users would expect.

Re: [KVM timekeeping 25/35] Add clock catchup mode

2010-08-25 Thread Zachary Amsden

On 08/25/2010 12:01 PM, Marcelo Tosatti wrote:

On Wed, Aug 25, 2010 at 10:48:20AM -1000, Zachary Amsden wrote:
   

On 08/25/2010 07:27 AM, Marcelo Tosatti wrote:
 

On Thu, Aug 19, 2010 at 10:07:39PM -1000, Zachary Amsden wrote:
   

Make the clock update handler handle generic clock synchronization,
not just KVM clock.  We add a catchup mode which keeps passthrough
TSC in line with absolute guest TSC.

Signed-off-by: Zachary Amsden zams...@redhat.com
---
  arch/x86/include/asm/kvm_host.h |1 +
  arch/x86/kvm/x86.c  |   55 ++
  2 files changed, 38 insertions(+), 18 deletions(-)

     kvm_x86_ops->vcpu_load(vcpu, cpu);
-    if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
+    if (unlikely(vcpu->cpu != cpu) || vcpu->arch.tsc_rebase) {
         /* Make sure TSC doesn't go backwards */
         s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
             native_read_tsc() - vcpu->arch.last_host_tsc;
         if (tsc_delta < 0)
             mark_tsc_unstable("KVM discovered backwards TSC");
-        if (check_tsc_unstable())
+        if (check_tsc_unstable()) {
             kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
-        kvm_migrate_timers(vcpu);
+            kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+        }
+        if (vcpu->cpu != cpu)
+            kvm_migrate_timers(vcpu);
         vcpu->cpu = cpu;
+        vcpu->arch.tsc_rebase = 0;
     }
 }

@@ -1947,6 +1961,12 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
     kvm_x86_ops->vcpu_put(vcpu);
     kvm_put_guest_fpu(vcpu);
     vcpu->arch.last_host_tsc = native_read_tsc();
+
+    /* For unstable TSC, force compensation and catchup on next CPU */
+    if (check_tsc_unstable()) {
+        vcpu->arch.tsc_rebase = 1;
+        kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+    }
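The effect of the vcpu_load and vcpu_put hunks can be modelled in userspace. This is a hypothetical sketch (all names are illustrative, not KVM's), assuming the characters stripped by the archive are `->`, `<` and the usual string quotes: on put the host TSC is sampled and, if the TSC is unstable, a rebase is forced; on load the offset is shifted by -delta so the guest TSC never goes backwards, and a clock update is requested so catchup can advance it again.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-vcpu TSC state touched by the patch. */
struct tsc_vcpu {
    int      cpu;            /* physical CPU we last ran on */
    uint64_t tsc_offset;     /* guest TSC = host TSC + offset (mod 2^64) */
    uint64_t last_host_tsc;  /* host TSC sampled at put time */
    bool     tsc_rebase;     /* force compensation on the next load */
};

static bool host_tsc_unstable;   /* stands in for check_tsc_unstable() */

static void model_vcpu_put(struct tsc_vcpu *v, uint64_t host_tsc_now)
{
    v->last_host_tsc = host_tsc_now;
    if (host_tsc_unstable)
        v->tsc_rebase = true;    /* real code also requests a clock update */
}

/* Returns true where the real code would raise KVM_REQ_CLOCK_UPDATE. */
static bool model_vcpu_load(struct tsc_vcpu *v, int cpu, uint64_t host_tsc_now)
{
    bool clock_update = false;

    if (v->cpu != cpu || v->tsc_rebase) {
        /* Make sure the guest TSC doesn't go backwards. */
        int64_t delta = v->last_host_tsc ?
            (int64_t)(host_tsc_now - v->last_host_tsc) : 0;
        if (delta < 0)
            host_tsc_unstable = true;          /* mark_tsc_unstable() */
        if (host_tsc_unstable) {
            v->tsc_offset -= (uint64_t)delta;  /* adjust offset by -delta */
            clock_update = true;
        }
        /* the real code also migrates timers when the CPU changes */
        v->cpu = cpu;
        v->tsc_rebase = false;
    }
    return clock_update;
}
```

For example, a vcpu put at host TSC 1000 and loaded on a CPU whose TSC reads 400 gets an offset of 600, so the guest still sees 1000.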
 

The mix between catchup,trap versus stable,unstable TSC is confusing and
difficult to grasp. Can you please introduce all the infrastructure
first, then control usage of them in centralized places? Examples:

+static void kvm_update_tsc_trapping(struct kvm *kvm)
+{
+   int trap, i;
+   struct kvm_vcpu *vcpu;
+
+   trap = check_tsc_unstable() && atomic_read(&kvm->online_vcpus) > 1;
+   kvm_for_each_vcpu(i, vcpu, kvm)
+       kvm_x86_ops->set_tsc_trap(vcpu, trap && !vcpu->arch.time_page);
+}
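A hypothetical restatement of the per-vcpu decision in kvm_update_tsc_trapping(), reading the operators stripped from the archived copy as `&&`, `&` and `>`: trap only with an unstable TSC, more than one online vcpu, and no kvmclock page registered for that vcpu. The function and parameter names are illustrative.

```c
#include <stdbool.h>

/* Hypothetical helper mirroring the quoted trapping policy. */
static bool vcpu_should_trap_tsc(bool tsc_unstable, int online_vcpus,
                                 bool has_kvmclock_page)
{
    bool trap = tsc_unstable && online_vcpus > 1;   /* VM-wide condition */

    return trap && !has_kvmclock_page;              /* per-vcpu override */
}
```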

+   /* For unstable TSC, force compensation and catchup on next CPU */
+   if (check_tsc_unstable()) {
+   vcpu->arch.tsc_rebase = 1;
+   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+   }


kvm_guest_time_update is becoming very confusing too. I understand this
is due to the many cases it's dealing with, but please make it as simple
as possible.
   

I tried to comment as best as I could.  I think the whole
kvm_update_tsc_trapping thing is probably a poor design choice.
It works, but it's thoroughly unintelligible right now without
spending some days figuring out why.

I'll rework the tail series of patches to try to make them more clear.

 

+   /*
+* If we are trapping and no longer need to, use catchup to
+* ensure passthrough TSC will not be less than trapped TSC
+*/
+   if (vcpu->tsc_mode == TSC_MODE_PASSTHROUGH && vcpu->tsc_trapping &&
+       ((this_tsc_khz <= v->kvm->arch.virtual_tsc_khz || kvmclock))) {
+   catchup = 1;
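The condition for switching from trapping back to passthrough with catchup can be sketched as a predicate. This assumes the operator lost before `=` in the archived copy is `<=`, which fits the comment: once the host TSC is not faster than the guest's virtual rate (or kvmclock is in use), trapping is no longer needed, and catchup keeps passthrough TSC from falling below the trapped value. Parameter names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: should a trapping passthrough vcpu switch to
 * catchup?  Mirrors the quoted condition under the "<=" assumption. */
static bool switch_to_catchup(bool passthrough_mode, bool currently_trapping,
                              uint32_t this_tsc_khz, uint32_t virtual_tsc_khz,
                              bool kvmclock)
{
    bool trap_unneeded = this_tsc_khz <= virtual_tsc_khz || kvmclock;

    return passthrough_mode && currently_trapping && trap_unneeded;
}
```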

What, TSC trapping with kvmclock enabled?
   

Transitioning to use of kvmclock after a cold boot means we may have
been trapping and now we will not be.

 

For both catchup and trapping the resolution of the host clock is
important, as Glauber commented for kvmclock. Can you comment on the
problems that arise from a low res clock for both modes?

Similarly for catchup mode, the effect of exit frequency. No need for
any guarantees?
   

The scheduler will do something to get an IRQ at whatever resolution
it uses for its timeslice.  That guarantees an exit per timeslice,
so we'll never be behind by more than one slice while scheduling.
While not scheduling, we're dormant anyway, waiting on either an IRQ
or shared memory variable change.  Local timers could end up behind
when dormant.

We may need a hack to accelerate firing of timers in such a case, or
perhaps bounds on when to use catchup mode and when to not.
 

What about emulating rdtsc with low res clock?

The RDTSC instruction reads the time-stamp counter and is guaranteed to
return a monotonically increasing unique value whenever executed, except
for a 64-bit counter wraparound.
   


Technically, that may not be quite correct.

<digression into weeds>

The RDTSC instruction will return a monotonically increasing unique 
value, but the execution and retirement of the instruction are 
unserialized.  So technically, two simultaneous RDTSC could 

Re: [Qemu-devel] [PATCH v2 0/2] Use kvm64/kvm32 when running under KVM

2010-08-25 Thread Sheng Yang
On Wednesday 28 July 2010 20:42:22 jes.soren...@redhat.com wrote:
 From: Jes Sorensen jes.soren...@redhat.com
 
 This set of patches adds default CPU types to the PC compat
 definitions, and patch #2 sets the CPU type to kvm64/kvm32 when
 running under KVM.
 
 Long term we might want to qdev'ify the CPUs but I think it is better
 to keep it simple for 0.13.
 
Looks good to me.

Some Windows guests won't enable MSI if they think the processor is too
old, so we need this patchset.

--
regards
Yang, Sheng


 Jes Sorensen (2):
   Set a default CPU type for compat PC machine defs.
   Use kvm32/kvm64 as default CPUs when running under KVM.
 
  hw/boards.h  |1 +
  hw/pc.c  |   13 +++--
  hw/pc_piix.c |   15 +++
  vl.c |2 ++
  4 files changed, 29 insertions(+), 2 deletions(-)


Re: [PATCH] cgroups: fix API thinko

2010-08-25 Thread Paul Menage
On Wed, Aug 25, 2010 at 2:35 PM, Andrew Morton
a...@linux-foundation.org wrote:

 So am I correct to assume that this change is now needed in 2.6.36, and
 unneeded in 2.6.35?

 Can it affect the userspace-kernel API in any manner?  If so, it
 should be backported into earlier kernels to reduce the number of
 incompatible kernels out there.

AFAICS it shouldn't affect any existing APIs, either in-kernel or to
userspace - it just makes the existing function
cgroup_attach_task_current_cg() a specialization of a more generic new
function.


 Paul, did you have any comments?

Other than the language being a bit confusing, it seems fine. I'd
probably word the patch description as:

Add cgroup_attach_task_all()

The existing cgroup_attach_task_current_cg() API is called by a thread
to attach another thread to all of its cgroups; this is unsuitable for
cases where a privileged task wants to attach itself to the cgroups
of a less privileged one, since the call must be made from the context
of the target task.

This patch adds a more generic cgroup_attach_task_all() API that
allows both the source task and to-be-moved task to be specified.
cgroup_attach_task_current_cg() becomes a specialization of the more
generic new function.
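The shape of the proposed API can be illustrated with a toy model. This is a hypothetical userspace sketch: `struct task`, `attach_task_all()` and `attach_task_current_cg()` stand in for the kernel's task_struct and cgroup functions, and do no locking or permission checks. The point is only that the old call becomes the generic one with the caller as the source.

```c
/* Hypothetical toy model; not kernel code. */
#define NR_HIERARCHIES 3

struct task {
    const char *name;
    int cgroup[NR_HIERARCHIES];   /* cgroup id in each mounted hierarchy */
};

static struct task *current_task; /* stands in for the kernel's `current` */

/* Generic form: put `tsk` into every cgroup that `from` belongs to. */
static int attach_task_all(const struct task *from, struct task *tsk)
{
    for (int i = 0; i < NR_HIERARCHIES; i++)
        tsk->cgroup[i] = from->cgroup[i];
    return 0;
}

/* The old API as a specialization: the source is always the caller. */
static int attach_task_current_cg(struct task *tsk)
{
    return attach_task_all(current_task, tsk);
}
```

In the privileged-task case from the description, the privileged task can call attach_task_all(less_privileged, current_task) from its own context instead of having to run in the target's context.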

Acked-by: Paul Menage men...@google.com


Re: Receiving delayed packets from RTL8139 card in KVM

2010-08-25 Thread Karthik Vadambacheri Manian
Hi Avi,

 There may be a missing wakeup in the networking path.  You can try adding
 printf()s in RTL8139's receive function, and in kvm_set_irq_level() to see
 if there's a problem in interrupt flow.

Thanks for your insights. I debugged the RTL8139's receive function
rtl8139_do_receive(), rtl8139_update_irq() (responsible for raising the
RTL8139 interrupts) and kvm_set_irq_level(). I found that, contrary to
my previous assumption, the packets were not delayed. The interrupts
were also raised promptly by the RTL8139 via rtl8139_update_irq(), and
kvm_set_irq_level() seems to inject them into the guest properly.

But inside the guest (Kitten LWK) I see fewer interrupts than KVM
injects. This may be because KVM coalesces the interrupts before
injection, or because the guest somehow misses some of them. I suspect
the coalescing causes the problem in my application. For example, when
two packets need to be received, only one coalesced interrupt is
injected into the guest instead of two. On receiving it, the guest's
RTL8139 driver reads the first packet (which is present in the buffer),
but when it immediately tries to read the next packet, that packet has
not yet arrived in the guest's RTL8139 buffer, so the driver assumes
there is no more data and returns. The second packet is therefore not
received, which leads to a retransmission. If a second interrupt had
occurred in this scenario, the driver would have checked the buffer
again, found the data available, and avoided the retransmission.
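The suspected coalescing can be illustrated with a toy interrupt line that has a single pending bit, in the spirit of an IRR bit: asserting it again before the guest takes the interrupt merges the two. This is a hypothetical model, not KVM's PIC/IOAPIC code, and it does not show where coalescing happens in this particular setup.

```c
#include <stdbool.h>

/* Hypothetical interrupt line with a single pending bit. */
struct irq_line {
    bool pending;         /* raised but not yet taken by the guest */
    unsigned raised;      /* times the device asserted the line */
    unsigned delivered;   /* interrupts the guest actually took */
    unsigned coalesced;   /* raises merged into an already-pending one */
};

/* Device side: e.g. one call per received packet. */
static void irq_raise(struct irq_line *l)
{
    l->raised++;
    if (l->pending)
        l->coalesced++;
    else
        l->pending = true;
}

/* Guest side: take the pending interrupt, if there is one. */
static bool irq_deliver(struct irq_line *l)
{
    if (!l->pending)
        return false;
    l->pending = false;
    l->delivered++;
    return true;
}
```

Two packets arriving before the guest runs give raised == 2 but delivered == 1, matching the symptom of the guest counting fewer interrupts than were injected; a packet arriving after the guest took the first interrupt produces a second one.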

Please let me know what the minimum time between interrupts is below
which they are coalesced, and whether it is possible to reduce this time
further, as that would be useful in my case. Please let me know your
comments on this issue.

Thanks for your time,
karthik


virtio-9p error

2010-08-25 Thread Bruno Cesar Ribas
Hi,

Every time I try to rsync something to my mounted 9p fs, I get the following
message from the guest kernel:

8<----
[   45.866789] 9pnet_virtio virtio2: requests:id 0 is not a head!
8<----

I'm exporting the FS with the following argument to qemu-kvm (commit head
422476cc422 of August 12):
 -virtfs local,path=/home/outras/montecristo,mount_tag=home,security_model=passthrough

And I'm mounting on the guest with:
   mount -t 9p -o trans=virtio home /home

My guest is running kernel 2.6.34.

About a second after I start the rsync, I get the message shown above,
the virtio fs crashes, and I have to reboot the virtual machine.

Another 'interesting' thing is that the filesystem won't show up in 'df',
but it does appear correctly in /proc/mounts:

8<----
montecristo-nova:/home# mount|tail -1
home on /home type 9p (rw,sync,dirsync,relatime,trans=virtio)
8<----


Here is a little of my rsync output:

8<----
montecristo-nova:/home# cd nobackup/
montecristo-nova:/home/nobackup# rsync -aHx --progress xadrezlivre:/ xadrezlivre/
receiving incremental file list
rsync: symlink /home/nobackup/xadrezlivre/cdrom -> media/cdrom failed: No such file or directory (2)
rsync: symlink /home/nobackup/xadrezlivre/initrd.img -> boot/initrd.img-2.6.26-1-486 failed: No such file or directory (2)
rsync: symlink /home/nobackup/xadrezlivre/initrd.img.old -> boot/initrd.img-2.6.18-6-486 failed: No such file or directory (2)
rsync: symlink /home/nobackup/xadrezlivre/vmlinuz -> boot/vmlinuz-2.6.26-1-486 failed: No such file or directory (2)
rsync: symlink /home/nobackup/xadrezlivre/vmlinuz.old -> boot/vmlinuz-2.6.18-6-486 failed: No such file or directory (2)
rsync: symlink /home/nobackup/xadrezlivre/bin/nc -> /etc/alternatives/nc failed: No such file or directory (2)
rsync: symlink /home/nobackup/xadrezlivre/bin/netcat -> /etc/alternatives/netcat failed: No such file or directory (2)
rsync: symlink /home/nobackup/xadrezlivre/bin/pidof -> ../sbin/killall5 failed: No such file or directory (2)
rsync: symlink /home/nobackup/xadrezlivre/bin/rnano -> nano failed: No such file or directory (2)
8<----

If anyone knows what this is, or what else I should send to help
understand the problem, please tell me.

Thanks in advance

-- 
Bruno Ribas - ri...@c3sl.ufpr.br
http://www.inf.ufpr.br/ribas
C3SL: http://www.c3sl.ufpr.br


latest KVM tree build fail on 32bit system

2010-08-25 Thread Hao, Xudong
Hi, has anyone else seen the latest kvm tree fail to build on a 32-bit system?
kvm commit: 152d921348ee2ac7bb73d599c5796a027f0a660c
gcc version 4.1.2

   ...
LD  arch/x86/boot/setup.elf
OBJCOPY arch/x86/boot/setup.bin
OBJCOPY arch/x86/boot/vmlinux.bin
BUILD   arch/x86/boot/bzImage
Root device is (8, 2)
Setup is 14008 bytes (padded to 14336 bytes).
System is 2565 kB
CRC 72558b30
Kernel: arch/x86/boot/bzImage is ready  (#3)
  Building modules, stage 2.
MODPOST 958 modules
WARNING: drivers/block/cpqarray.o(.devinit.text+0x20d): Section mismatch in reference from the function cpqarray_register_ctlr() to the function .init.text:ida_procinit()
The function __devinit cpqarray_register_ctlr() references
a function __init ida_procinit().
If ida_procinit is only used by cpqarray_register_ctlr then
annotate ida_procinit with a matching annotation.

WARNING: net/bluetooth/rfcomm/rfcomm.o(.init.text+0xac): Section mismatch in reference from the function init_module() to the function .exit.text:rfcomm_cleanup_ttys()
The function __init init_module() references
a function __exit rfcomm_cleanup_ttys().
This is often seen when error handling in the init function
uses functionality in the exit path.
The fix is often to remove the __exit annotation of
rfcomm_cleanup_ttys() so it may be used outside an exit section.

ERROR: __udivdi3 [arch/x86/kvm/kvm.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
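`__udivdi3` is the libgcc helper that gcc calls for 64-bit unsigned division on 32-bit x86, and the x86 kernel does not provide it, so an open-coded u64 division somewhere in kvm.ko would produce exactly this error. The log does not show which line is responsible; kernel code normally uses the div_u64()/do_div() helpers for this. Purely to illustrate why such helpers avoid the problem, here is a hypothetical shift-and-subtract 64-by-32 division (not the kernel's implementation) that needs only shifts, compares and subtractions, which gcc open-codes on i386.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's div_u64(): binary long division
 * of a 64-bit dividend by a non-zero 32-bit divisor.  No '/' or '%' on
 * 64-bit values is used, so no call to __udivdi3 is emitted.
 */
static uint64_t div_u64_rem_sketch(uint64_t dividend, uint32_t divisor,
                                   uint32_t *remainder)
{
    uint64_t quotient = 0;
    uint64_t rem = 0;                 /* stays below divisor after each step */

    for (int bit = 63; bit >= 0; bit--) {
        rem = (rem << 1) | ((dividend >> bit) & 1);   /* bring down a bit */
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= (uint64_t)1 << bit;
        }
    }
    *remainder = (uint32_t)rem;
    return quotient;
}
```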


Best Regards,
Xudong Hao