[SeaBIOS PATCH v2] Fix regression of commit 87b533bf

2011-09-20 Thread Amos Kong
From 4678a3cb0e0a3cd7a4bc3d284c5719fdba90bc61 Mon Sep 17 00:00:00 2001
From: Kevin O'Connor ke...@koconnor.net
Date: Tue, 20 Sep 2011 15:43:55 +0800
Subject: [PATCH V2] Fix regression of commit 87b533bf

From: Kevin O'Connor ke...@koconnor.net

After adding more device entries to the ACPI DSDT tables,
the size of bios.bin grew from 128K to 256K,
and the BIOS could no longer initialize successfully.

This is a regression since seabios commit 87b533bf. Prior to
that commit, seabios did not mark the early 32bit initialization
code as init code. However, a side effect of marking that code
(handle_post) as init code is that it is more likely the linker
could place the code at an address less than 0xe0000.
This patch (admittedly a hack) works around the issue.

---
Changes from v1:
- correct the attribution

Signed-off-by: Kevin O'Connor ke...@koconnor.net
---
 src/post.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/src/post.c b/src/post.c
index e195e89..bc2e548 100644
--- a/src/post.c
+++ b/src/post.c
@@ -336,7 +336,7 @@ reloc_init(void)
 // Start of Power On Self Test (POST) - the BIOS initialization phase.
 // This function does the setup needed for code relocation, and then
 // invokes the relocation and main setup code.
-void VISIBLE32INIT
+void VISIBLE32FLAT
 handle_post(void)
 {
 debug_serial_setup();
@@ -356,6 +356,14 @@ handle_post(void)
 
 // Allow writes to modify bios area (0xf0000)
 make_bios_writable();
+
+    void handle_post2(void);
+    handle_post2();
+}
+
+void VISIBLE32INIT
+handle_post2(void)
+{
 HaveRunPost = 1;
 
 // Detect ram and setup internal malloc.
-- 
1.7.6.1
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Migration (suspending) fails with large memory VM

2011-09-20 Thread Matthias Hovestadt

Hi!

In our cluster system (16 nodes, each node 32GB RAM and two quad-core
CPUs) we are using Gentoo with kvm 0.15.0 for executing virtual
machines.

---
Linux asok01 3.0.4-gentoo #1 SMP Thu Sep 8 09:15:01 CEST 2011 x86_64 Intel(R) Xeon(R) CPU X5355 @ 2.66GHz GenuineIntel GNU/Linux

---

Migrating VMs between server nodes works perfectly fine, except for
migrating large memory VMs to local disk (a.k.a. suspending the VM).


The command we use for migrating (suspending) the VM is:

---
QEMU 0.15.0 monitor - type 'help' for more information
(qemu) migrate -d exec:gzip -c > /var/run/kvm/suspend-vm-219.bin.gz
---

For most VMs, this command triggers the suspension procedure, having
the file suspend-vm-219.bin.gz being generated. The progress can be
monitored using:

---
QEMU 0.15.0 monitor - type 'help' for more information
(qemu) info migrate
Migration status: active
transferred ram: 36758 kbytes
remaining ram: 4164888 kbytes
total ram: 4211136 kbytes
(qemu)
---

After some seconds, the memory of this VM has been written to disk,
which concludes the migration process:

---
QEMU 0.15.0 monitor - type 'help' for more information
(qemu) info migrate
Migration status: completed
(qemu)
---


Unfortunately, migration is not working for our large VMs, which have 7 CPUs
and 30720 MB of memory:

---
qemu-system-x86_64 -name wip -boot c -smp 7 -m 30720 -drive 
file=/dev/disk/by-path/ip-192.168.251.1:3260-iscsi-iqn.2010-09.de.tu-berlin.cit.wip-lun-0 
-net nic,model=e1000,macaddr=02:59:59:59:3:E4 -net 
tap,ifname=tap249,script=/etc/kvm/kvm-ifup,downscript=/etc/kvm/kvm-ifdown -k 
de -vnc :30249 -monitor unix:/var/run/kvm/wip/monitor,server,nowait 
-serial none -parallel none -pidfile /var/run/kvm/wip/pid

---

In case of this VM, the migration process directly fails at execution
time:

---
QEMU 0.15.0 monitor - type 'help' for more information
(qemu) migrate -d exec:gzip -c > /var/run/kvm/suspend-vm-249.bin.gz
migration failed
(qemu)
---


I did quite some testing:

- the problem seems to be independent of the number of virtual cores
  of the VM. Suspension fails with -smp 7 as well as with -smp 1

- the problem seems to be related only to the amount of memory of the
  VM. Suspending with -m 30720 fails, but slightly reducing the memory of
  the VM is sufficient for suspending to work fine. For example,
  with -m 3 suspending works.

- the problem seems to affect ONLY suspension using exec:gzip.
  Actually migrating the VM to another server host using exec:tcp...
  works absolutely fine.


Any help is greatly appreciated.


Best,
Matthias



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-20 Thread Avi Kivity

On 09/19/2011 06:57 PM, Marcelo Tosatti wrote:

  Decrement when setting nmi_injected = false, increment when setting
  nmi_injected = true, in vmx/svm.c.

  That gives a queue length of 3: one running nmi and nmi_pending = 2.

Increment through the same wrapper that will collapse the second and next, also
used by kvm_inject_nmi.




Cannot be done outside vcpu context.

--
error compiling committee.c: too many arguments to function



[PATCH v2] KVM: Fix simultaneous NMIs

2011-09-20 Thread Avi Kivity
If simultaneous NMIs happen, we're supposed to queue the second
and next (collapsing them), but currently we sometimes collapse
the second into the first.

Fix by using a counter for pending NMIs instead of a bool; since
the counter limit depends on whether the processor is currently
in an NMI handler, which can only be checked in vcpu context
(via the NMI mask), we add a new KVM_REQ_NMI to request recalculation
of the counter.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |5 ++-
 arch/x86/kvm/x86.c  |   48 +-
 include/linux/kvm_host.h|1 +
 3 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6ab4241..ab62711 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -413,8 +413,9 @@ struct kvm_vcpu_arch {
u32  tsc_catchup_mult;
s8   tsc_catchup_shift;
 
-   bool nmi_pending;
-   bool nmi_injected;
+   atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
+   unsigned nmi_pending; /* NMI queued after currently running handler */
+   bool nmi_injected;/* Trying to inject an NMI this entry */
 
struct mtrr_state_type mtrr_state;
u32 pat;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b37f18..d51e407 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -83,6 +83,7 @@
 static void update_cr8_intercept(struct kvm_vcpu *vcpu);
 static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
struct kvm_cpuid_entry2 __user *entries);
+static void process_nmi(struct kvm_vcpu *vcpu);
 
 struct kvm_x86_ops *kvm_x86_ops;
 EXPORT_SYMBOL_GPL(kvm_x86_ops);
@@ -359,8 +360,8 @@ void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 {
-   kvm_make_request(KVM_REQ_EVENT, vcpu);
-   vcpu->arch.nmi_pending = 1;
+   atomic_inc(&vcpu->arch.nmi_queued);
+   kvm_make_request(KVM_REQ_NMI, vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_inject_nmi);
 
@@ -2827,6 +2828,7 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
 static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
   struct kvm_vcpu_events *events)
 {
+   process_nmi(vcpu);
 events->exception.injected =
 vcpu->arch.exception.pending &&
 !kvm_exception_is_soft(vcpu->arch.exception.nr);
@@ -2844,7 +2846,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
KVM_X86_SHADOW_INT_MOV_SS | KVM_X86_SHADOW_INT_STI);
 
 events->nmi.injected = vcpu->arch.nmi_injected;
-   events->nmi.pending = vcpu->arch.nmi_pending;
+   events->nmi.pending = vcpu->arch.nmi_pending != 0;
 events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
 events->nmi.pad = 0;
 
@@ -2864,6 +2866,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
  | KVM_VCPUEVENT_VALID_SHADOW))
return -EINVAL;
 
+   process_nmi(vcpu);
 vcpu->arch.exception.pending = events->exception.injected;
 vcpu->arch.exception.nr = events->exception.nr;
 vcpu->arch.exception.has_error_code = events->exception.has_error_code;
@@ -4763,7 +4766,7 @@ int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
 kvm_set_rflags(vcpu, ctxt->eflags);
 
if (irq == NMI_VECTOR)
-   vcpu->arch.nmi_pending = false;
+   vcpu->arch.nmi_pending = 0;
 else
 vcpu->arch.interrupt.pending = false;
 
@@ -5572,7 +5575,7 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
/* try to inject new event if pending */
 if (vcpu->arch.nmi_pending) {
 if (kvm_x86_ops->nmi_allowed(vcpu)) {
-   vcpu->arch.nmi_pending = false;
+   --vcpu->arch.nmi_pending;
 vcpu->arch.nmi_injected = true;
 kvm_x86_ops->set_nmi(vcpu);
 }
}
@@ -5604,10 +5607,26 @@ static void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu)
}
 }
 
+static void process_nmi(struct kvm_vcpu *vcpu)
+{
+   unsigned limit = 2;
+
+   /*
+* x86 is limited to one NMI running, and one NMI pending after it.
+* If an NMI is already in progress, limit further NMIs to just one.
+* Otherwise, allow two (and we'll inject the first one immediately).
+*/
+   if (kvm_x86_ops->get_nmi_mask(vcpu) || vcpu->arch.nmi_injected)
+   limit = 1;
+
+   vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
+   vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
+   kvm_make_request(KVM_REQ_EVENT, vcpu);
+}
+
 static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 {
int r;
-   bool nmi_pending;
 

[SeaBIOS PATCH] hotplug: Add device per func in ACPI DSDT tables

2011-09-20 Thread Amos Kong
- Original Message -
 On Mon, Sep 19, 2011 at 01:02:59PM +0300, Michael S. Tsirkin wrote:
  On Mon, Sep 19, 2011 at 12:57:33PM +0300, Gleb Natapov wrote:
   On Mon, Sep 19, 2011 at 03:27:38AM -0400, Amos Kong wrote:

Only func 0 is registered to the guest driver (we can
only find func 0 in the slot->funcs list of the driver),
so the other functions cannot be cleaned up when
hot-removing the whole slot. This patch adds a
device per function in the ACPI DSDT tables.

   You can't unplug a single function. Guest surely knows that.
  
  Looking at guest code, it's clear that
  at least a Linux guest doesn't know that.
 
 acpiphp_disable_slot function appears to eject all functions.

Have tested with linux/winxp/win7, hot-adding/hot-removing,
single/multiple function devices; they are all fine.

 
 Does not work for me (FC12 guest).

What's your problem?  Only func 0 can be added?
I hotplug/hot-remove a device with this script (add funcs 1-7, then add func 0):

j=6
for i in `seq 1 7` 0; do
    qemu-img create /tmp/resize$j$i.qcow2 10M -f qcow2
    echo "drive_add 0x$j.$i id=drv$j$i,if=none,file=/tmp/resize$j$i.qcow2" | nc -U /tmp/a
    echo "device_add virtio-blk-pci,id=dev$j$i,drive=drv$j$i,addr=0x$j.$i,multifunction=on" | nc -U /tmp/monitor
done
sleep 5
echo device_del dev60 | nc -U /tmp/monitor

 As mentioned previously, Linux driver
 looks for function 0 when injection request is seen (see
 enable_device
 function in acpiphp_glue.c).

   What was not fine before?

When hot-removing a multifunction device, only func 0 can be removed from the guest.

   Have you looked at real HW that supports PCI hot plug DSDT? Does
   it
   looks the same?
  
  I recall I saw some examples like this on the net.
  
  
new acpi-dsdt.hex (332K):
http://amos-kong.rhcloud.com/pub/acpi-dsdt.hex

Signed-off-by: Amos Kong ak...@redhat.com


[SeaBIOS PATCH v2] hotplug: Add device per func in ACPI DSDT tables

2011-09-20 Thread Amos Kong
From 48ea1c9188334b89a60b4f9e853e86fc04fda4a5 Mon Sep 17 00:00:00 2001
From: Amos Kong ak...@redhat.com
Date: Tue, 20 Sep 2011 15:38:43 +0800
Subject: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI DSDT tables

Only func 0 is registered to the guest driver (we can
only find func 0 in the slot->funcs list of the driver),
so the other functions cannot be cleaned up when
hot-removing the whole slot. This patch adds a
device per function in the ACPI DSDT tables.

Have tested with linux/winxp/win7, hot-adding/hot-removing,
single/multiple function devices; they are all fine.
---
Changes from v1:
- cleanup the macros, bios.bin gets back to 128K
- notify only when func0 is added and removed

Signed-off-by: Amos Kong ak...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 src/acpi-dsdt.dsl |  106 ++--
 1 files changed, 61 insertions(+), 45 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 08412e2..707c3d6 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -128,9 +128,9 @@ DefinitionBlock (
 PCRM, 32,
 }
 
-#define hotplug_slot(name, nr) \
-Device (S##name) {\
-   Name (_ADR, nr##0000)  \
+#define hotplug_func(name, nr, adr, fn) \
+Device (S##name##fn) {\
+   Name (_ADR, adr)\
Method (_EJ0,1) {  \
 Store(ShiftLeft(1, nr), B0EJ) \
 Return (0x0)  \
@@ -138,6 +138,16 @@ DefinitionBlock (
Name (_SUN, name)  \
 }
 
+#define hotplug_slot(name, nr) \
+   hotplug_func(name, nr, nr##0000, 0)  \
+   hotplug_func(name, nr, nr##0001, 1)  \
+   hotplug_func(name, nr, nr##0002, 2)  \
+   hotplug_func(name, nr, nr##0003, 3)  \
+   hotplug_func(name, nr, nr##0004, 4)  \
+   hotplug_func(name, nr, nr##0005, 5)  \
+   hotplug_func(name, nr, nr##0006, 6)  \
+   hotplug_func(name, nr, nr##0007, 7)
+
hotplug_slot(1, 0x0001)
hotplug_slot(2, 0x0002)
hotplug_slot(3, 0x0003)
@@ -460,7 +470,7 @@ DefinitionBlock (
}
}
 
-#define gen_pci_device(name, nr)\
+#define gen_pci_device(name, nr) \
 Device(SL##name) {  \
 Name (_ADR, nr##0000)   \
 Method (_RMV) { \
@@ -502,6 +512,52 @@ DefinitionBlock (
gen_pci_device(29, 0x001d)
gen_pci_device(30, 0x001e)
gen_pci_device(31, 0x001f)
+
+#define gen_pci_hotplug(nr) \
+If (And(PCIU, ShiftLeft(1, nr))) { \
+Notify(S##nr##0, 1)\
+}  \
+If (And(PCID, ShiftLeft(1, nr))) { \
+Notify(S##nr##0, 3)\
+}
+
+Method(HPLG) {
+gen_pci_hotplug(1)
+gen_pci_hotplug(2)
+gen_pci_hotplug(3)
+gen_pci_hotplug(4)
+gen_pci_hotplug(5)
+gen_pci_hotplug(6)
+gen_pci_hotplug(7)
+gen_pci_hotplug(8)
+gen_pci_hotplug(9)
+gen_pci_hotplug(10)
+gen_pci_hotplug(11)
+gen_pci_hotplug(12)
+gen_pci_hotplug(13)
+gen_pci_hotplug(14)
+gen_pci_hotplug(15)
+gen_pci_hotplug(16)
+gen_pci_hotplug(17)
+gen_pci_hotplug(18)
+gen_pci_hotplug(19)
+gen_pci_hotplug(20)
+gen_pci_hotplug(21)
+gen_pci_hotplug(22)
+gen_pci_hotplug(23)
+gen_pci_hotplug(24)
+gen_pci_hotplug(25)
+gen_pci_hotplug(26)
+gen_pci_hotplug(27)
+gen_pci_hotplug(28)
+gen_pci_hotplug(29)
+gen_pci_hotplug(30)
+gen_pci_hotplug(31)
+
+Return (0x01)
+}
+
+
 }
 
 /* PCI IRQs */
@@ -842,49 +898,9 @@ DefinitionBlock (
 Return(0x01)
 }
 
-#define gen_pci_hotplug(nr)   \
-If (And(\_SB.PCI0.PCIU, ShiftLeft(1, nr))) {  \
-Notify(\_SB.PCI0.S##nr, 1)\
-} \
-If (And(\_SB.PCI0.PCID, ShiftLeft(1, nr))) {  \
-Notify(\_SB.PCI0.S##nr, 3)\
-}
-
 Method(_L01) {
-gen_pci_hotplug(1)
-gen_pci_hotplug(2)
-gen_pci_hotplug(3)
-gen_pci_hotplug(4)
-gen_pci_hotplug(5)
-gen_pci_hotplug(6)
-gen_pci_hotplug(7)
-gen_pci_hotplug(8)
-gen_pci_hotplug(9)
-

Re: [SeaBIOS PATCH] hotplug: Add device per func in ACPI DSDT tables

2011-09-20 Thread Marcelo Tosatti
On Tue, Sep 20, 2011 at 06:44:54AM -0400, Amos Kong wrote:
 - Original Message -
  On Mon, Sep 19, 2011 at 01:02:59PM +0300, Michael S. Tsirkin wrote:
   On Mon, Sep 19, 2011 at 12:57:33PM +0300, Gleb Natapov wrote:
On Mon, Sep 19, 2011 at 03:27:38AM -0400, Amos Kong wrote:
 
 Only func 0 is registered to the guest driver (we can
 only find func 0 in the slot->funcs list of the driver),
 so the other functions cannot be cleaned up when
 hot-removing the whole slot. This patch adds a
 device per function in the ACPI DSDT tables.
 
You can't unplug a single function. Guest surely knows that.
   
   Looking at guest code, it's clear that
   at least a Linux guest doesn't know that.
  
  acpiphp_disable_slot function appears to eject all functions.
 
 Have tested with linux/winxp/win7, hot-adding/hot-removing,
 single/multiple function devices; they are all fine.
 
  
  Does not work for me (FC12 guest).
 
 what's your problem?  only func 0 can be added?
 I hotplug/hot-remove device by this script(add func 1~7, then add func 0):
 
 j=6
 for i in `seq 1 7` 0; do
     qemu-img create /tmp/resize$j$i.qcow2 10M -f qcow2
     echo "drive_add 0x$j.$i id=drv$j$i,if=none,file=/tmp/resize$j$i.qcow2" | nc -U /tmp/a
     echo "device_add virtio-blk-pci,id=dev$j$i,drive=drv$j$i,addr=0x$j.$i,multifunction=on" | nc -U /tmp/monitor
 done
 sleep 5
 echo device_del dev60 | nc -U /tmp/monitor
 
  As mentioned previously, Linux driver
  looks for function 0 when injection request is seen (see
  enable_device
  function in acpiphp_glue.c).
 
What was not fine before?
 
 When hot-removing a multifunction device, only func 0 can be removed from the guest.

Ah OK, I misunderstood that the patch was aiming for per-function hotplug.



Re: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI DSDT tables

2011-09-20 Thread Marcelo Tosatti
On Tue, Sep 20, 2011 at 06:45:57AM -0400, Amos Kong wrote:
 From 48ea1c9188334b89a60b4f9e853e86fc04fda4a5 Mon Sep 17 00:00:00 2001
 From: Amos Kong ak...@redhat.com
 Date: Tue, 20 Sep 2011 15:38:43 +0800
 Subject: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI DSDT tables
 
 Only func 0 is registered to the guest driver (we can
 only find func 0 in the slot->funcs list of the driver),
 so the other functions cannot be cleaned up when
 hot-removing the whole slot. This patch adds a
 device per function in the ACPI DSDT tables.
 
 Have tested with linux/winxp/win7, hot-adding/hot-removing,
 single/multiple function devices; they are all fine.
 ---
 Changes from v1:
 - cleanup the macros, bios.bin gets back to 128K
 - notify only when func0 is added and removed
 
 Signed-off-by: Amos Kong ak...@redhat.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  src/acpi-dsdt.dsl |  106 ++--
  1 files changed, 61 insertions(+), 45 deletions(-)
 
 diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
 index 08412e2..707c3d6 100644
 --- a/src/acpi-dsdt.dsl
 +++ b/src/acpi-dsdt.dsl
 @@ -128,9 +128,9 @@ DefinitionBlock (
  PCRM, 32,
  }
  
 -#define hotplug_slot(name, nr) \
 -Device (S##name) {\
 -   Name (_ADR, nr##0000)  \
 +#define hotplug_func(name, nr, adr, fn) \
 +Device (S##name##fn) {\
 +   Name (_ADR, adr)\
 Method (_EJ0,1) {  \
  Store(ShiftLeft(1, nr), B0EJ) \
  Return (0x0)  \
 @@ -138,6 +138,16 @@ DefinitionBlock (
 Name (_SUN, name)  \
  }
  
 +#define hotplug_slot(name, nr) \
 + hotplug_func(name, nr, nr##0000, 0)  \
 + hotplug_func(name, nr, nr##0001, 1)  \
 + hotplug_func(name, nr, nr##0002, 2)  \
 + hotplug_func(name, nr, nr##0003, 3)  \
 + hotplug_func(name, nr, nr##0004, 4)  \
 + hotplug_func(name, nr, nr##0005, 5)  \
 + hotplug_func(name, nr, nr##0006, 6)  \
 + hotplug_func(name, nr, nr##0007, 7)
 +
   hotplug_slot(1, 0x0001)
   hotplug_slot(2, 0x0002)
   hotplug_slot(3, 0x0003)
 @@ -460,7 +470,7 @@ DefinitionBlock (
   }
   }
  
 -#define gen_pci_device(name, nr)\
 +#define gen_pci_device(name, nr) \
  Device(SL##name) {  \
 Name (_ADR, nr##0000)   \
  Method (_RMV) { \
 @@ -502,6 +512,52 @@ DefinitionBlock (
   gen_pci_device(29, 0x001d)
   gen_pci_device(30, 0x001e)
   gen_pci_device(31, 0x001f)
 +
 +#define gen_pci_hotplug(nr) \
 +If (And(PCIU, ShiftLeft(1, nr))) { \
 +Notify(S##nr##0, 1)\
 +}  \
 +If (And(PCID, ShiftLeft(1, nr))) { \
 +Notify(S##nr##0, 3)\
 +}
 +
 +Method(HPLG) {
 +gen_pci_hotplug(1)
 +gen_pci_hotplug(2)
 +gen_pci_hotplug(3)
 +gen_pci_hotplug(4)
 +gen_pci_hotplug(5)
 +gen_pci_hotplug(6)
 +gen_pci_hotplug(7)
 +gen_pci_hotplug(8)
 +gen_pci_hotplug(9)
 +gen_pci_hotplug(10)
 +gen_pci_hotplug(11)
 +gen_pci_hotplug(12)
 +gen_pci_hotplug(13)
 +gen_pci_hotplug(14)
 +gen_pci_hotplug(15)
 +gen_pci_hotplug(16)
 +gen_pci_hotplug(17)
 +gen_pci_hotplug(18)
 +gen_pci_hotplug(19)
 +gen_pci_hotplug(20)
 +gen_pci_hotplug(21)
 +gen_pci_hotplug(22)
 +gen_pci_hotplug(23)
 +gen_pci_hotplug(24)
 +gen_pci_hotplug(25)
 +gen_pci_hotplug(26)
 +gen_pci_hotplug(27)
 +gen_pci_hotplug(28)
 +gen_pci_hotplug(29)
 +gen_pci_hotplug(30)
 +gen_pci_hotplug(31)
 +
 +Return (0x01)
 +}
 +
 +
  }
  
  /* PCI IRQs */
 @@ -842,49 +898,9 @@ DefinitionBlock (
  Return(0x01)
  }
  
 -#define gen_pci_hotplug(nr)   \
 -If (And(\_SB.PCI0.PCIU, ShiftLeft(1, nr))) {  \
 -Notify(\_SB.PCI0.S##nr, 1)\
 -} \
 -If (And(\_SB.PCI0.PCID, ShiftLeft(1, nr))) {  \
 -Notify(\_SB.PCI0.S##nr, 3)\
 -}
 -
  Method(_L01) {
 -gen_pci_hotplug(1)
 -gen_pci_hotplug(2)
 -gen_pci_hotplug(3)
 -gen_pci_hotplug(4)
 -   

RFC: s390: extension capability for new address space layout

2011-09-20 Thread Christian Borntraeger
Avi,Marcelo,

598841ca9919d008b520114d8a4378c4ce4e40a1 ([S390] use gmap address
spaces for kvm guest images) changed kvm on s390 to use a separate 
address space for kvm guests. We can now put KVM guests anywhere
in the user address space with a size up to 8PB - as long as the 
memory is 1MB-aligned. This change was done without KVM extension
capability bit. 
The change was added after 3.0, but we still have a chance to add
a feature bit before 3.1 (keeping the releases in a sane state).

Can you have a look at the change below and give your ACK or NACK?
If ok, I would push this patch to Heiko to be submitted via the 
s390 stream for 3.1.

Christian

Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -123,6 +123,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 
switch (ext) {
case KVM_CAP_S390_PSW:
+   case KVM_CAP_S390_GMAP:
r = 1;
break;
default:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2c366b5..b2e9cc1 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -553,6 +553,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_SPAPR_TCE 63
 #define KVM_CAP_PPC_SMT 64
 #define KVM_CAP_PPC_RMA65
+#define KVM_CAP_S390_GMAP 66
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: s390: extension capability for new address space layout

2011-09-20 Thread Alexander Graf

On 09/20/2011 01:56 PM, Christian Borntraeger wrote:

Avi,Marcelo,

598841ca9919d008b520114d8a4378c4ce4e40a1 ([S390] use gmap address
spaces for kvm guest images) changed kvm on s390 to use a separate
address space for kvm guests. We can now put KVM guests anywhere
in the user address space with a size up to 8PB - as long as the
memory is 1MB-aligned. This change was done without KVM extension
capability bit.
The change was added after 3.0, but we still have a chance to add
a feature bit before 3.1 (keeping the releases in a sane state).

Can you have a look at the change below and give your ACK or NACK?
If ok, I would push this patch to Heiko to be submitted via the
s390 stream for 3.1.

Christian

Signed-off-by: Christian Borntraegerborntrae...@de.ibm.com

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -123,6 +123,7 @@ int kvm_dev_ioctl_check_extension(long ext)

switch (ext) {
case KVM_CAP_S390_PSW:
+   case KVM_CAP_S390_GMAP:
r = 1;
break;
default:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2c366b5..b2e9cc1 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -553,6 +553,7 @@ struct kvm_ppc_pvinfo {
  #define KVM_CAP_SPAPR_TCE 63
  #define KVM_CAP_PPC_SMT 64
  #define KVM_CAP_PPC_RMA   65
+#define KVM_CAP_S390_GMAP 66


I would really appreciate it if you could take capability number 71. I 
already have patches pending (partly already in Avi's tree) that occupy 
everything up to 70 :)



Alex



Re: RFC: s390: extension capability for new address space layout

2011-09-20 Thread Christian Borntraeger
On 20/09/11 14:00, Alexander Graf wrote:
 +#define KVM_CAP_S390_GMAP 66
 
 I would really appreciate if you could take capability number 71. I already 
 have patches pending (partly already in avi's tree) that occupy everything up 
 to 70 :)

Whatever number is best for you all. Just waiting for Avi or Marcelo to ACK.
Christian


Re: RFC: s390: extension capability for new address space layout

2011-09-20 Thread Avi Kivity

On 09/20/2011 02:56 PM, Christian Borntraeger wrote:

Avi,Marcelo,

598841ca9919d008b520114d8a4378c4ce4e40a1 ([S390] use gmap address
spaces for kvm guest images) changed kvm on s390 to use a separate
address space for kvm guests. We can now put KVM guests anywhere
in the user address space with a size up to 8PB - as long as the
memory is 1MB-aligned. This change was done without KVM extension
capability bit.
The change was added after 3.0, but we still have a chance to add
a feature bit before 3.1 (keeping the releases in a sane state).

Can you have a look at the change below and give your ACK or NACK?
If ok, I would push this patch to Heiko to be submitted via the
s390 stream for 3.1.


ACK, with Alex's changes.  Why are kvm changes not going in through the 
kvm tree?  Or at least kvm@ review?


--
error compiling committee.c: too many arguments to function



Re: RFC: s390: extension capability for new address space layout

2011-09-20 Thread Carsten Otte

On 20.09.2011 14:11, Avi Kivity wrote:

ACK, with Alex's changes. Why are kvm changes not going in through the
kvm tree? Or at least kvm@ review?
It slipped through as I've put it into our CVS to test against Martin's 
changes. Sorry about that, we'll make sure it won't happen again.


so long,
Carsten


Re: [PATCH v8 3/4] block: add block timer and throttling algorithm

2011-09-20 Thread Marcelo Tosatti
On Mon, Sep 19, 2011 at 05:55:41PM +0800, Zhi Yong Wu wrote:
 On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
  On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
  On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti mtosa...@redhat.com 
  wrote:
   On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
   Note:
        1.) When bps/iops limits are specified to a small value such as 
   511 bytes/s, this VM will hang up. We are considering how to handle 
   this scenario.
  
   You can increase the length of the slice, if the request is larger than
   slice_time * bps_limit.
  Yeah, but it is a challenge for how to increase it. Do you have some nice 
  idea?
 
  If the queue is empty, and the request being processed does not fit the
  queue, increase the slice so that the request fits.
 Sorry for the late reply. Actually, do you think that this scenario is
 meaningful for the user?
 Since we implemented this, if the user limits the bps to below 512
 bytes/second, the VM will also not be able to run any task.
 Can you let us know why we need to make such an effort?

It would be good to handle requests larger than the slice.

It is not strictly necessary, but if it is not handled, a minimum
should be in place, reflecting the maximum known request size. Being able to
specify something which crashes is not acceptable.



Re: [PATCH v2] KVM: Fix simultaneous NMIs

2011-09-20 Thread Marcelo Tosatti
On Tue, Sep 20, 2011 at 01:43:14PM +0300, Avi Kivity wrote:
 If simultaneous NMIs happen, we're supposed to queue the second
 and next (collapsing them), but currently we sometimes collapse
 the second into the first.
 
 Fix by using a counter for pending NMIs instead of a bool; since
 the counter limit depends on whether the processor is currently
 in an NMI handler, which can only be checked in vcpu context
 (via the NMI mask), we add a new KVM_REQ_NMI to request recalculation
 of the counter.
 
 Signed-off-by: Avi Kivity a...@redhat.com
 ---
  arch/x86/include/asm/kvm_host.h |5 ++-
  arch/x86/kvm/x86.c  |   48 +-
  include/linux/kvm_host.h|1 +
  3 files changed, 35 insertions(+), 19 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index 6ab4241..ab62711 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -413,8 +413,9 @@ struct kvm_vcpu_arch {
   u32  tsc_catchup_mult;
   s8   tsc_catchup_shift;
  
 - bool nmi_pending;
 - bool nmi_injected;
 + atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
 + unsigned nmi_pending; /* NMI queued after currently running handler */
 + bool nmi_injected;/* Trying to inject an NMI this entry */
  
   struct mtrr_state_type mtrr_state;
   u32 pat;
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 6b37f18..d51e407 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -83,6 +83,7 @@
  static void update_cr8_intercept(struct kvm_vcpu *vcpu);
  static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
   struct kvm_cpuid_entry2 __user *entries);
 +static void process_nmi(struct kvm_vcpu *vcpu);
  
  struct kvm_x86_ops *kvm_x86_ops;
  EXPORT_SYMBOL_GPL(kvm_x86_ops);
 @@ -359,8 +360,8 @@ void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
  
  void kvm_inject_nmi(struct kvm_vcpu *vcpu)
  {
 - kvm_make_request(KVM_REQ_EVENT, vcpu);
 - vcpu->arch.nmi_pending = 1;
 + atomic_inc(&vcpu->arch.nmi_queued);
 + kvm_make_request(KVM_REQ_NMI, vcpu);
  }
  EXPORT_SYMBOL_GPL(kvm_inject_nmi);
  
 @@ -2827,6 +2828,7 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
  static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
  struct kvm_vcpu_events *events)
  {
 + process_nmi(vcpu);
   events->exception.injected =
   vcpu->arch.exception.pending &&
   !kvm_exception_is_soft(vcpu->arch.exception.nr);
 @@ -2844,7 +2846,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
   KVM_X86_SHADOW_INT_MOV_SS | KVM_X86_SHADOW_INT_STI);
  
   events->nmi.injected = vcpu->arch.nmi_injected;
 - events->nmi.pending = vcpu->arch.nmi_pending;
 + events->nmi.pending = vcpu->arch.nmi_pending != 0;
   events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
   events-nmi.pad = 0;

nmi_queued should also be saved and restored. Not sure if its necessary
though.

Should at least reset nmi_queued somewhere (set_vcpu_events?).

 @@ -2864,6 +2866,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 | KVM_VCPUEVENT_VALID_SHADOW))
   return -EINVAL;
  
 + process_nmi(vcpu);

This should be after nmi fields are set, not before?



Re: [Qemu-devel] Memory API code review

2011-09-20 Thread Anthony Liguori

On 09/14/2011 10:07 AM, Avi Kivity wrote:

I would like to carry out an online code review of the memory API so that more
people are familiar with the internals, and perhaps even to catch some bugs or
deficiency. I'd like to use the next kvm conference call slot for this (Tuesday
1400 UTC) since many people already have it reserved in the schedule.

It would be great if people from the wider qemu community be present, rather
than the usual "x86 is everything" crowd (+Jan) that usually participates in the
kvm weekly call.

Juan, Chris, can we dedicate next week's call to this?

We'll also need a way to disseminate a few slides and an editor session for
showing the code. We have an Elluminate account that can be used for this, but
usually this has a 50% failure rate on Linux. Anthony, perhaps we can set up a
view-only vnc reflector on qemu.org?



The reflector is at:

qemu.osuosl.org:0

Regards,

Anthony Liguori


Re: [PATCH v2] KVM: Fix simultaneous NMIs

2011-09-20 Thread Avi Kivity

On 09/20/2011 04:25 PM, Marcelo Tosatti wrote:


  @@ -2827,6 +2828,7 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu 
*vcpu,
   static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
struct kvm_vcpu_events *events)
   {
  + process_nmi(vcpu);
events->exception.injected =
vcpu->arch.exception.pending &&
!kvm_exception_is_soft(vcpu->arch.exception.nr);
  @@ -2844,7 +2846,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct 
kvm_vcpu *vcpu,
KVM_X86_SHADOW_INT_MOV_SS | KVM_X86_SHADOW_INT_STI);

events->nmi.injected = vcpu->arch.nmi_injected;
  - events->nmi.pending = vcpu->arch.nmi_pending;
  + events->nmi.pending = vcpu->arch.nmi_pending != 0;
events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
events->nmi.pad = 0;

nmi_queued should also be saved and restored. Not sure if it's necessary
though.

Should at least reset nmi_queued somewhere (set_vcpu_events?).


Did you miss the call to process_nmi()?

We do have a small issue.  If we exit during NMI-blocked-by-STI and 
nmi_pending == 2, then we lose the second interrupt.  Should rarely 
happen, since external interrupts never exit in that condition, but it's 
a wart.




  @@ -2864,6 +2866,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct 
kvm_vcpu *vcpu,
| KVM_VCPUEVENT_VALID_SHADOW))
return -EINVAL;

  + process_nmi(vcpu);

This should be after nmi fields are set, not before?



It's actually to clear queued NMIs in case we set nmi_pending (which 
should never happen unless the machine is completely quiet, since it's 
asynchronous to the vcpu; same as the IRR).


--
error compiling committee.c: too many arguments to function



[PATCH] s390 Fix address mode switching

2011-09-20 Thread Christian Borntraeger
Avi,

here is another fix I want to submit for 3.1 (if possible) 
that repairs KVM for s390. Are you OK with submitting this patch
via the s390 tree - IOW, can you ACK the KVM-specific part
(arch/s390/kvm/kvm-s390.c). The biggest chunk is low level
SIE handling and memory management. This was already
discussed with Martin Schwidefsky.

Christian




From: Christian Borntraeger borntrae...@de.ibm.com

598841ca9919d008b520114d8a4378c4ce4e40a1 ([S390] use gmap address
spaces for kvm guest images) changed kvm to use a separate address
space for kvm guests. This address space was switched in __vcpu_run
In some cases (preemption, page fault) there is the possibility that
this address space switch is lost.
The typical symptom was a huge amount of validity intercepts or
random guest addressing exceptions.
Fix this by doing the switch in sie_loop and sie_exit and saving the
address space in the gmap structure itself. Also use the preempt
notifier.

Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com

--- linux/arch/s390/include/asm/pgtable.h   21 Jun 2011 10:33:42 -  
1.38
+++ linux/arch/s390/include/asm/pgtable.h   19 Sep 2011 07:37:09 -  
1.39
@@ -658,12 +658,14 @@
  * struct gmap_struct - guest address space
  * @mm: pointer to the parent mm_struct
  * @table: pointer to the page directory
+ * @asce: address space control element for gmap page table
  * @crst_list: list of all crst tables used in the guest address space
  */
 struct gmap {
struct list_head list;
struct mm_struct *mm;
unsigned long *table;
+   unsigned long asce;
struct list_head crst_list;
 };

--- linux/arch/s390/kernel/asm-offsets.c14 Sep 2011 12:49:33 -  
1.36
+++ linux/arch/s390/kernel/asm-offsets.c19 Sep 2011 07:37:09 -  
1.37
@@ -10,6 +10,7 @@
 #include linux/sched.h
 #include asm/vdso.h
 #include asm/sigp.h
+#include asm/pgtable.h

 /*
  * Make sure that the compiler is new enough. We want a compiler that
@@ -125,6 +126,7 @@
DEFINE(__LC_KERNEL_STACK, offsetof(struct _lowcore, kernel_stack));
DEFINE(__LC_ASYNC_STACK, offsetof(struct _lowcore, async_stack));
DEFINE(__LC_PANIC_STACK, offsetof(struct _lowcore, panic_stack));
+   DEFINE(__LC_USER_ASCE, offsetof(struct _lowcore, user_asce));
DEFINE(__LC_INT_CLOCK, offsetof(struct _lowcore, int_clock));
DEFINE(__LC_MCCK_CLOCK, offsetof(struct _lowcore, mcck_clock));
DEFINE(__LC_MACHINE_FLAGS, offsetof(struct _lowcore, machine_flags));
@@ -149,6 +151,7 @@
DEFINE(__LC_VDSO_PER_CPU, offsetof(struct _lowcore, vdso_per_cpu_data));
DEFINE(__LC_GMAP, offsetof(struct _lowcore, gmap));
DEFINE(__LC_CMF_HPP, offsetof(struct _lowcore, cmf_hpp));
+   DEFINE(__GMAP_ASCE, offsetof(struct gmap, asce));
 #endif /* CONFIG_32BIT */
return 0;
 }
--- linux/arch/s390/kernel/entry64.S14 Sep 2011 12:49:33 -  1.139
+++ linux/arch/s390/kernel/entry64.S19 Sep 2011 07:37:09 -  1.140
@@ -1073,6 +1073,11 @@
lg  %r14,__LC_THREAD_INFO   # pointer thread_info struct
tm  __TI_flags+7(%r14),_TIF_EXIT_SIE
jnz sie_exit
+   lg  %r14,__LC_GMAP  # get gmap pointer
+   ltgr%r14,%r14
+   jz  sie_gmap
+   lctlg   %c1,%c1,__GMAP_ASCE(%r14)   # load primary asce
+sie_gmap:
lg  %r14,__SF_EMPTY(%r15)   # get control block pointer
SPP __SF_EMPTY(%r15)# set guest id
sie 0(%r14)
@@ -1080,6 +1085,7 @@
SPP __LC_CMF_HPP# set host id
lg  %r14,__LC_THREAD_INFO   # pointer thread_info struct
 sie_exit:
+   lctlg   %c1,%c1,__LC_USER_ASCE  # load primary asce
	ni  __TI_flags+6(%r14),255-(_TIF_SIE>>8)
lg  %r14,__SF_EMPTY+8(%r15) # load guest register save area
stmg%r0,%r13,0(%r14)# save guest gprs 0-13
--- linux/arch/s390/kvm/kvm-s390.c  22 Jul 2011 07:35:28 -  1.27
+++ linux/arch/s390/kvm/kvm-s390.c  19 Sep 2011 07:37:09 -  1.28
@@ -263,10 +263,12 @@
	vcpu->arch.guest_fpregs.fpc &= FPC_VALID_MASK;
	restore_fp_regs(&vcpu->arch.guest_fpregs);
	restore_access_regs(vcpu->arch.guest_acrs);
+	gmap_enable(vcpu->arch.gmap);
 }

 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	gmap_disable(vcpu->arch.gmap);
	save_fp_regs(&vcpu->arch.guest_fpregs);
	save_access_regs(vcpu->arch.guest_acrs);
	restore_fp_regs(&vcpu->arch.host_fpregs);
@@ -461,7 +463,6 @@
local_irq_disable();
kvm_guest_enter();
local_irq_enable();
-	gmap_enable(vcpu->arch.gmap);
	VCPU_EVENT(vcpu, 6, "entering sie flags %x",
		   atomic_read(&vcpu->arch.sie_block->cpuflags));
	if (sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs)) {
@@ -470,7 +471,6 @@
	}
	VCPU_EVENT(vcpu, 6, "exit sie icptcode %d",
 

Re: [PATCH v2] KVM: Fix simultaneous NMIs

2011-09-20 Thread Marcelo Tosatti
On Tue, Sep 20, 2011 at 04:56:02PM +0300, Avi Kivity wrote:
 On 09/20/2011 04:25 PM, Marcelo Tosatti wrote:
 
   @@ -2827,6 +2828,7 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct 
  kvm_vcpu *vcpu,
static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 struct kvm_vcpu_events *events)
{
   + process_nmi(vcpu);
 events->exception.injected =
 vcpu->arch.exception.pending &&
 !kvm_exception_is_soft(vcpu->arch.exception.nr);
   @@ -2844,7 +2846,7 @@ static void 
  kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 KVM_X86_SHADOW_INT_MOV_SS | KVM_X86_SHADOW_INT_STI);
 
 events->nmi.injected = vcpu->arch.nmi_injected;
   - events->nmi.pending = vcpu->arch.nmi_pending;
   + events->nmi.pending = vcpu->arch.nmi_pending != 0;
 events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
 events->nmi.pad = 0;
 
 nmi_queued should also be saved and restored. Not sure if it's necessary
 though.
 
 Should at least reset nmi_queued somewhere (set_vcpu_events?).
 
 Did you miss the call to process_nmi()?

It transfers nmi_queued to nmi_pending with capping. What I mean is that
upon system reset, nmi_queued (which refers to pre-reset system state)
should be zeroed.
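For readers following the thread, the transfer-with-capping that process_nmi() performs can be sketched roughly as below. This is a hypothetical standalone approximation, not the kernel's exact code: the real version uses atomic_xchg() on nmi_queued and queries the NMI-masked state through kvm_x86_ops.

```c
/* Rough sketch: collapse asynchronously queued NMIs into nmi_pending.
 * x86 hardware can latch at most two simultaneous NMIs (one being
 * serviced, one pending), or only one while NMIs are masked. */
static unsigned int process_nmi_sketch(unsigned int *nmi_queued,
                                       unsigned int nmi_pending,
                                       int nmi_masked)
{
    unsigned int limit = nmi_masked ? 1 : 2;

    /* the real code does atomic_xchg(&vcpu->arch.nmi_queued, 0) */
    nmi_pending += *nmi_queued;
    *nmi_queued = 0;

    if (nmi_pending > limit)
        nmi_pending = limit;    /* excess simultaneous NMIs collapse */
    return nmi_pending;
}
```

Under this model, resetting nmi_queued on the set_vcpu_events path is exactly what the call to process_nmi() achieves: it drains the queue before nmi_pending is overwritten.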

 We do have a small issue.  If we exit during NMI-blocked-by-STI and
 nmi_pending == 2, then we lose the second interrupt.  Should rarely
 happen, since external interrupts never exit in that condition, but
 it's a wart.

And the above system reset case, you should be able to handle it by
saving/restoring nmi_queued (so that QEMU can zero it in vcpu_reset).

 
   @@ -2864,6 +2866,7 @@ static int 
  kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 | KVM_VCPUEVENT_VALID_SHADOW))
 return -EINVAL;
 
   + process_nmi(vcpu);
 
 This should be after nmi fields are set, not before?
 
 
 It's actually to clear queued NMIs in case we set nmi_pending (which
 should never happen unless the machine is completely quiet, since
 it's asynchronous to the vcpu; same as the IRR).

OK.



Re: [PATCH] s390 Fix address mode switching

2011-09-20 Thread Avi Kivity

On 09/20/2011 05:42 PM, Christian Borntraeger wrote:

Avi,

here is another fix I want to submit for 3.1 (if possible)
that repairs KVM for s390. Are you ok to submit this patch
via s390 tree - iow can you ACK the kvm specific part
(arch/s390/kvm/kvm-s390.c). The biggest chunk is low level
SIE handling and memory management. This was already
discussed with Martin Schwidefsky.




Sure - ACK.

--
error compiling committee.c: too many arguments to function



Re: [PATCH v2] KVM: Fix simultaneous NMIs

2011-09-20 Thread Avi Kivity

On 09/20/2011 05:59 PM, Marcelo Tosatti wrote:

On Tue, Sep 20, 2011 at 04:56:02PM +0300, Avi Kivity wrote:
  On 09/20/2011 04:25 PM, Marcelo Tosatti wrote:
  
 @@ -2827,6 +2828,7 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct 
kvm_vcpu *vcpu,
  static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
struct kvm_vcpu_events *events)
  {
 +  process_nmi(vcpu);
events->exception.injected =
vcpu->arch.exception.pending &&
!kvm_exception_is_soft(vcpu->arch.exception.nr);
 @@ -2844,7 +2846,7 @@ static void 
kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
KVM_X86_SHADOW_INT_MOV_SS | 
KVM_X86_SHADOW_INT_STI);
  
events->nmi.injected = vcpu->arch.nmi_injected;
 -  events->nmi.pending = vcpu->arch.nmi_pending;
 +  events->nmi.pending = vcpu->arch.nmi_pending != 0;
events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
events->nmi.pad = 0;
  
  nmi_queued should also be saved and restored. Not sure if it's necessary
  though.
  
  Should at least reset nmi_queued somewhere (set_vcpu_events?).

  Did you miss the call to process_nmi()?

It transfers nmi_queued to nmi_pending with capping. What I mean is that
upon system reset, nmi_queued (which refers to pre-reset system state)
should be zeroed.


That is get_events(); for set_events(), process_nmi() does zero 
nmi_queued (and then we overwrite nmi_pending).




  We do have a small issue.  If we exit during NMI-blocked-by-STI and
  nmi_pending == 2, then we lose the second interrupt.  Should rarely
  happen, since external interrupts never exit in that condition, but
  it's a wart.

And the above system reset case, you should be able to handle it by
saving/restoring nmi_queued (so that QEMU can zero it in vcpu_reset).


We could just add a KVM_CAP (and flag) that extends nmi_pending from a 
bool to a counter.





--
error compiling committee.c: too many arguments to function



Re: [PATCH v2] KVM: Fix simultaneous NMIs

2011-09-20 Thread Marcelo Tosatti
On Tue, Sep 20, 2011 at 07:24:01PM +0300, Avi Kivity wrote:
 On 09/20/2011 05:59 PM, Marcelo Tosatti wrote:
 On Tue, Sep 20, 2011 at 04:56:02PM +0300, Avi Kivity wrote:
   On 09/20/2011 04:25 PM, Marcelo Tosatti wrote:
   
  @@ -2827,6 +2828,7 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct 
  kvm_vcpu *vcpu,
   static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu 
  *vcpu,
   struct kvm_vcpu_events *events)
   {
  +process_nmi(vcpu);
   events->exception.injected =
   vcpu->arch.exception.pending &&
   !kvm_exception_is_soft(vcpu->arch.exception.nr);
  @@ -2844,7 +2846,7 @@ static void 
  kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
   KVM_X86_SHADOW_INT_MOV_SS | 
  KVM_X86_SHADOW_INT_STI);
   
   events->nmi.injected = vcpu->arch.nmi_injected;
  -events->nmi.pending = vcpu->arch.nmi_pending;
  +events->nmi.pending = vcpu->arch.nmi_pending != 0;
   events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
   events->nmi.pad = 0;
   
   nmi_queued should also be saved and restored. Not sure if it's necessary
   though.
   
   Should at least reset nmi_queued somewhere (set_vcpu_events?).
 
   Did you miss the call to process_nmi()?
 
 It transfers nmi_queued to nmi_pending with capping. What I mean is that
 upon system reset, nmi_queued (which refers to pre-reset system state)
 should be zeroed.
 
 That is get_events(); for set_events(), process_nmi() does zero
 nmi_queued (and then we overwrite nmi_pending).

Right.

 
   We do have a small issue.  If we exit during NMI-blocked-by-STI and
   nmi_pending == 2, then we lose the second interrupt.  Should rarely
   happen, since external interrupts never exit in that condition, but
   it's a wart.
 
 And the above system reset case, you should be able to handle it by
 saving/restoring nmi_queued (so that QEMU can zero it in vcpu_reset).
 
 We could just add a KVM_CAP (and flag) that extends nmi_pending from
 a bool to a counter.

Or just add a new field to the pad.


[PATCH] qemu-kvm: Switch POSIX compat AIO implementation to upstream

2011-09-20 Thread Jan Kiszka
Upstream's version is about to be signal-free and will stop handling
SIGUSR2 specially. So it's time to adopt its implementation, ie. switch
from signalfd to a pipe.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

This should help pulling upstream into qemu-kvm when block: avoid
SIGUSR2 is merged. And will help merging further cleanups of this code
I'm working on.

 posix-aio-compat.c |   73 ++-
 1 files changed, 32 insertions(+), 41 deletions(-)

diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index d8ad9ef..d375b56 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -27,7 +27,6 @@
 #include "qemu-common.h"
 #include "trace.h"
 #include "block_int.h"
-#include "compatfd.h"
 
 #include "block/raw-posix-aio.h"
 
@@ -43,7 +42,6 @@ struct qemu_paiocb {
 int aio_niov;
 size_t aio_nbytes;
 #define aio_ioctl_cmd   aio_nbytes /* for QEMU_AIO_IOCTL */
-int ev_signo;
 off_t aio_offset;
 
 QTAILQ_ENTRY(qemu_paiocb) node;
@@ -54,7 +52,7 @@ struct qemu_paiocb {
 };
 
 typedef struct PosixAioState {
-int fd;
+int rfd, wfd;
 struct qemu_paiocb *first_aio;
 } PosixAioState;
 
@@ -310,12 +308,10 @@ static ssize_t handle_aiocb_rw(struct qemu_paiocb *aiocb)
 return nbytes;
 }
 
+static void posix_aio_notify_event(void);
+
 static void *aio_thread(void *unused)
 {
-pid_t pid;
-
-pid = getpid();
-
 mutex_lock(lock);
 pending_threads--;
 mutex_unlock(lock);
@@ -382,7 +378,7 @@ static void *aio_thread(void *unused)
 aiocb->ret = ret;
 mutex_unlock(lock);
 
-if (kill(pid, aiocb->ev_signo)) die("kill failed");
+posix_aio_notify_event();
 }
 
 cur_threads--;
@@ -524,29 +520,18 @@ static int posix_aio_process_queue(void *opaque)
 static void posix_aio_read(void *opaque)
 {
 PosixAioState *s = opaque;
-union {
-struct qemu_signalfd_siginfo siginfo;
-char buf[128];
-} sig;
-size_t offset;
+ssize_t len;
 
-/* try to read from signalfd, don't freak out if we can't read anything */
-offset = 0;
-while (offset < 128) {
-ssize_t len;
+/* read all bytes from signal pipe */
+for (;;) {
+char bytes[16];
 
-len = read(s->fd, sig.buf + offset, 128 - offset);
+len = read(s->rfd, bytes, sizeof(bytes));
 if (len == -1 && errno == EINTR)
-continue;
-if (len == -1 && errno == EAGAIN) {
-/* there is no natural reason for this to happen,
- * so we'll spin hard until we get everything just
- * to be on the safe side. */
-if (offset > 0)
-continue;
-}
-
-offset += len;
+continue; /* try again */
+if (len == sizeof(bytes))
+continue; /* more to read */
+break;
 }
 
 posix_aio_process_queue(s);
@@ -560,6 +545,16 @@ static int posix_aio_flush(void *opaque)
 
 static PosixAioState *posix_aio_state;
 
+static void posix_aio_notify_event(void)
+{
+char byte = 0;
+ssize_t ret;
+
+ret = write(posix_aio_state->wfd, &byte, sizeof(byte));
+if (ret < 0 && errno != EAGAIN)
+die("write()");
+}
+
 static void paio_remove(struct qemu_paiocb *acb)
 {
 struct qemu_paiocb **pacb;
@@ -621,7 +616,6 @@ BlockDriverAIOCB *paio_submit(BlockDriverState *bs, int fd,
 return NULL;
 acb->aio_type = type;
 acb->aio_fildes = fd;
-acb->ev_signo = SIGUSR2;
 
 if (qiov) {
 acb->aio_iov = qiov->iov;
@@ -649,7 +643,6 @@ BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd,
 return NULL;
 acb->aio_type = QEMU_AIO_IOCTL;
 acb->aio_fildes = fd;
-acb->ev_signo = SIGUSR2;
 acb->aio_offset = 0;
 acb->aio_ioctl_buf = buf;
 acb->aio_ioctl_cmd = req;
@@ -663,8 +656,8 @@ BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd,
 
 int paio_init(void)
 {
-sigset_t mask;
 PosixAioState *s;
+int fds[2];
 int ret;
 
 if (posix_aio_state)
@@ -672,21 +665,19 @@ int paio_init(void)
 
 s = g_malloc(sizeof(PosixAioState));
 
-/* Make sure to block AIO signal */
-sigemptyset(&mask);
-sigaddset(&mask, SIGUSR2);
-sigprocmask(SIG_BLOCK, &mask, NULL);
-
 s->first_aio = NULL;
-s->fd = qemu_signalfd(&mask);
-if (s->fd == -1) {
-fprintf(stderr, "failed to create signalfd\n");
+if (qemu_pipe(fds) == -1) {
+fprintf(stderr, "failed to create pipe\n");
 return -1;
 }
 
-fcntl(s->fd, F_SETFL, O_NONBLOCK);
+s->rfd = fds[0];
+s->wfd = fds[1];
+
+fcntl(s->rfd, F_SETFL, O_NONBLOCK);
+fcntl(s->wfd, F_SETFL, O_NONBLOCK);
 
-qemu_aio_set_fd_handler(s->fd, posix_aio_read, NULL, posix_aio_flush,
+qemu_aio_set_fd_handler(s->rfd, posix_aio_read, NULL, posix_aio_flush,
 posix_aio_process_queue, s);
 
 ret = pthread_attr_init(&attr);
-- 
1.7.3.4

Re: [PATCH v2] KVM: Fix simultaneous NMIs

2011-09-20 Thread Avi Kivity

On 09/20/2011 07:30 PM, Marcelo Tosatti wrote:

  
 We do have a small issue.  If we exit during NMI-blocked-by-STI and
 nmi_pending == 2, then we lose the second interrupt.  Should rarely
 happen, since external interrupts never exit in that condition, but
 it's a wart.
  
  And the above system reset case, you should be able to handle it by
  saving/restoring nmi_queued (so that QEMU can zero it in vcpu_reset).

  We could just add a KVM_CAP (and flag) that extends nmi_pending from
  a bool to a counter.

Or just add a new field to the pad.



Okay; I'll address this in a follow-on patch (my preference is making 
nmi_pending a counter).


--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: PPC: E500: Support hugetlbfs

2011-09-20 Thread Scott Wood
On 09/19/2011 06:35 PM, Alexander Graf wrote:
 + /*
 +  * e500 doesn't implement the lowest tsize bit,
 +  * or 1K pages.
 +  */
 + tsize = max(BOOK3E_PAGESZ_4K, tsize & ~1);
 +
 + /* take the smallest page size that satisfies host and
 +guest mapping */

kernel comment style...

I don't personally care much, but this leaves a checkpatch complaint for
the next person to modify the comment.  Such as to make it say "take the
largest page size that..." :-)

 + asm (PPC_CNTLZL " %0,%1" : "=r" (lz) : "r" (psize));
 + tsize = min(21 - lz, tsize);

No need to open-code the cntlz and subtract-from-21:

tsize = min(ilog2(psize) - 10, tsize);

/*
 * e500 doesn't implement the lowest tsize bit,
 * or 1K pages.
 */
tsize = max(BOOK3E_PAGESZ_4K, tsize & ~1);

There's still an open-coded subtraction of 10, but that relates more
straightforwardly to the definition of tsize (and could be factored out
into a size-to-tsize function).
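Putting Scott's suggestion together, the clamping could look roughly like this standalone sketch (ilog2_u and clamp_tsize are hypothetical helper names for illustration; the kernel already provides ilog2()):

```c
#define BOOK3E_PAGESZ_4K 2   /* tsize encoding: page size = 1 << (10 + tsize) */

/* integer log2 of a nonzero value (stand-in for the kernel's ilog2()) */
static int ilog2_u(unsigned long v)
{
    int l = -1;

    while (v) {
        v >>= 1;
        l++;
    }
    return l;
}

/* Clamp a guest tsize to the host page size, then round down to the
 * even tsize values e500 actually implements, never below 4K. */
static int clamp_tsize(unsigned long host_psize, int tsize)
{
    int host_tsize = ilog2_u(host_psize) - 10;

    if (host_tsize < tsize)
        tsize = host_tsize;          /* min(host, guest) */
    tsize &= ~1;                     /* e500 has no odd tsize values */
    if (tsize < BOOK3E_PAGESZ_4K)
        tsize = BOOK3E_PAGESZ_4K;    /* ...and no 1K pages */
    return tsize;
}
```

The `- 10` is the size-to-tsize conversion Scott mentions: a page of 2^n bytes corresponds to tsize n - 10 in this encoding.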

   }
  
  up_read(&current->mm->mmap_sem);
   }
  
   if (likely(!pfnmap)) {
  + unsigned long tsize_pages = 1 << (tsize - 2);

1 << (tsize + 10 - PAGE_SHIFT);

   pfn = gfn_to_pfn_memslot(vcpu_e500->vcpu.kvm, slot, gfn);
  + pfn &= ~(tsize_pages - 1);
  + gvaddr &= ~((tsize_pages << PAGE_SHIFT) - 1);
   if (is_error_pfn(pfn)) {
   if (is_error_pfn(pfn)) {

Won't the masking of pfn affect is_error_pfn()?

-Scott



Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-20 Thread Eric B Munson
On Thu, 15 Sep 2011, Marcelo Tosatti wrote:

 On Tue, Sep 13, 2011 at 04:49:55PM -0400, Eric B Munson wrote:
  On Fri, 09 Sep 2011, Marcelo Tosatti wrote:
  
   On Thu, Sep 01, 2011 at 02:27:49PM -0600, emun...@mgebm.net wrote:
On Thu, 01 Sep 2011 14:24:12 -0500, Anthony Liguori wrote:
On 08/30/2011 07:26 AM, Marcelo Tosatti wrote:
On Mon, Aug 29, 2011 at 05:27:11PM -0600, Eric B Munson wrote:
Currently, when qemu stops a guest kernel that guest will
issue a soft lockup
message when it resumes.  This set provides the ability for
qemu to communicate
to the guest that it has been stopped.  When the guest hits
the watchdog on
resume it will check if it was suspended before issuing the
warning.

Eric B Munson (4):
   Add flag to indicate that a vm was stopped by the host
   Add functions to check if the host has stopped the vm
   Add generic stubs for kvm stop check functions
   Add check for suspended vm in softlockup detector

  arch/x86/include/asm/pvclock-abi.h |1 +
  arch/x86/include/asm/pvclock.h |2 ++
  arch/x86/kernel/kvmclock.c |   14 ++
  include/asm-generic/pvclock.h  |   14 ++
  kernel/watchdog.c  |   12 
  5 files changed, 43 insertions(+), 0 deletions(-)
  create mode 100644 include/asm-generic/pvclock.h

--
1.7.4.1

How is the host supposed to set this flag?

As mentioned previously, if you save save/restore the offset
added to
kvmclock on stop/cont (and the TSC MSR, forgot to mention that), no
paravirt infrastructure is required. Which means the issue is
also fixed
for older guests.

  
  Marcelo,
  
  I think that stopping the TSC is the wrong approach because it will break 
  time
  between the two systems so something that expects the monotonic clock to 
  move
  consistently will be wrong.
 
 In case the VM stops for whatever reason, the host system is not
 supposed to adjust time related hardware state to compensate, in an
 attempt to present apparent continuous time.
 
 If you save a VM and then restore it later, it is the guest
 responsibility to adjust its time representation.
 
 QEMU exposing continuous TSC and kvmclock state between stop and
 cont should not be a reason to introduce new paravirt infrastructure.
 
   IMO, messing with the TSC at run time to avoid a watchdog message
  is the wrong solution, better to teach the watchdog to ignore this
  special case.
 
 OK then, it is not a harmful addition, can you post the QEMU patches to
 set the ignore watchdog bit.

I am about to start working on the interface to set this bit, but to double
check I tried your suggestion to save the clock value on stop and restore it on
continue (see qemu patch below) and I am seeing very odd behavior from it.  On
random resumes the vm won't come back immediately (it took one 15 minutes to
respond) and the wall clock stays behind my host wall clock by the amount of
time it took to resume.

---
 cpus.c |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/cpus.c b/cpus.c
index 54c188c..4573e23 100644
--- a/cpus.c
+++ b/cpus.c
@@ -63,6 +63,7 @@
 #endif /* CONFIG_LINUX */
 
 static CPUState *next_cpu;
+struct kvm_clock_data data;
 
 /***/
 void hw_error(const char *fmt, ...)
@@ -788,6 +789,7 @@ static int all_vcpus_paused(void)
 void pause_all_vcpus(void)
 {
 CPUState *penv = first_cpu;
+int ret;
 
 while (penv) {
 penv->stop = 1;
@@ -803,11 +805,18 @@ void pause_all_vcpus(void)
 penv = (CPUState *)penv->next_cpu;
 }
 }
+
+ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
+if (ret < 0) {
+fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
+data.clock = 0;
+}
 }
 
 void resume_all_vcpus(void)
 {
 CPUState *penv = first_cpu;
+int ret;
 
 while (penv) {
 penv->stop = 0;
@@ -815,6 +824,11 @@ void resume_all_vcpus(void)
 qemu_cpu_kick(penv);
 penv = (CPUState *)penv->next_cpu;
 }
+if (data.clock != 0) {
+ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
+if (ret < 0)
+fprintf(stderr, "KVM_SET_CLOCK failed: %s\n", strerror(ret));
+}
 }
 
 static void qemu_tcg_init_vcpu(void *_env)
 




Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-20 Thread Marcelo Tosatti
On Tue, Sep 20, 2011 at 03:00:54PM -0400, Eric B Munson wrote:
 On Thu, 15 Sep 2011, Marcelo Tosatti wrote:
 
  On Tue, Sep 13, 2011 at 04:49:55PM -0400, Eric B Munson wrote:
   On Fri, 09 Sep 2011, Marcelo Tosatti wrote:
   
On Thu, Sep 01, 2011 at 02:27:49PM -0600, emun...@mgebm.net wrote:
 On Thu, 01 Sep 2011 14:24:12 -0500, Anthony Liguori wrote:
 On 08/30/2011 07:26 AM, Marcelo Tosatti wrote:
 On Mon, Aug 29, 2011 at 05:27:11PM -0600, Eric B Munson wrote:
 Currently, when qemu stops a guest kernel that guest will
 issue a soft lockup
 message when it resumes.  This set provides the ability for
 qemu to communicate
 to the guest that it has been stopped.  When the guest hits
 the watchdog on
 resume it will check if it was suspended before issuing the
 warning.
 
 Eric B Munson (4):
Add flag to indicate that a vm was stopped by the host
Add functions to check if the host has stopped the vm
Add generic stubs for kvm stop check functions
Add check for suspended vm in softlockup detector
 
   arch/x86/include/asm/pvclock-abi.h |1 +
   arch/x86/include/asm/pvclock.h |2 ++
   arch/x86/kernel/kvmclock.c |   14 ++
   include/asm-generic/pvclock.h  |   14 ++
   kernel/watchdog.c  |   12 
   5 files changed, 43 insertions(+), 0 deletions(-)
   create mode 100644 include/asm-generic/pvclock.h
 
 --
 1.7.4.1
 
 How is the host supposed to set this flag?
 
 As mentioned previously, if you save save/restore the offset
 added to
 kvmclock on stop/cont (and the TSC MSR, forgot to mention that), no
 paravirt infrastructure is required. Which means the issue is
 also fixed
 for older guests.
 
   
   Marcelo,
   
   I think that stopping the TSC is the wrong approach because it will break 
   time
   between the two systems so something that expects the monotonic clock to 
   move
   consistently will be wrong.
  
  In case the VM stops for whatever reason, the host system is not
  supposed to adjust time related hardware state to compensate, in an
  attempt to present apparent continuous time.
  
  If you save a VM and then restore it later, it is the guest
  responsibility to adjust its time representation.
  
  QEMU exposing continuous TSC and kvmclock state between stop and
  cont should not be a reason to introduce new paravirt infrastructure.
  
IMO, messing with the TSC at run time to avoid a watchdog message
   is the wrong solution, better to teach the watchdog to ignore this
   special case.
  
  OK then, it is not a harmful addition, can you post the QEMU patches to
  set the ignore watchdog bit.
 
 I am about to start working on the interface to set this bit, but to double
 check I tried your suggestion to save the clock value on stop and restore it 
 on
 continue (see qemu patch below) and I am seeing very odd behavior from it.  On
 random resumes the vm won't come back immediately (it took one 15 minutes to
 respond) 

You should also save/restore the TSC MSR for each VCPU. And it appears
there is a missing

kvm_for_each_vcpu(vcpu)
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu)

in KVM_SET_CLOCK ioctl handling.

 and the wall clock stays behind my host wall clock by the amount of
 time it took to resume.

This is expected, similar to savevm/loadvm.

 ---
  cpus.c |   14 ++
  1 files changed, 14 insertions(+), 0 deletions(-)
 
 diff --git a/cpus.c b/cpus.c
 index 54c188c..4573e23 100644
 --- a/cpus.c
 +++ b/cpus.c
 @@ -63,6 +63,7 @@
  #endif /* CONFIG_LINUX */
  
  static CPUState *next_cpu;
 +struct kvm_clock_data data;
  
  /***/
  void hw_error(const char *fmt, ...)
 @@ -788,6 +789,7 @@ static int all_vcpus_paused(void)
  void pause_all_vcpus(void)
  {
  CPUState *penv = first_cpu;
 +int ret;
  
  while (penv) {
  penv->stop = 1;
 @@ -803,11 +805,18 @@ void pause_all_vcpus(void)
  penv = (CPUState *)penv->next_cpu;
  }
  }
 +
 +ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
 +if (ret < 0) {
 +fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
 +data.clock = 0;
 +}
  }
  
  void resume_all_vcpus(void)
  {
  CPUState *penv = first_cpu;
 +int ret;
  
  while (penv) {
  penv->stop = 0;
 @@ -815,6 +824,11 @@ void resume_all_vcpus(void)
  qemu_cpu_kick(penv);
  penv = (CPUState *)penv->next_cpu;
  }
 +if (data.clock != 0) {
 +ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
 +if (ret < 0)
 +fprintf(stderr, "KVM_SET_CLOCK failed: %s\n", strerror(ret));
 +}
  }
  
  static void qemu_tcg_init_vcpu(void *_env)
  



Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-20 Thread Jamie Lokier
Marcelo Tosatti wrote:
 In case the VM stops for whatever reason, the host system is not
 supposed to adjust time related hardware state to compensate, in an
 attempt to present apparent continuous time.
 
 If you save a VM and then restore it later, it is the guest
 responsibility to adjust its time representation.

If the guest doesn't know it's been stopped, then its time
representation will be wrong until it finds out, e.g. after a few
minutes with NTP, and even a few seconds can be too long.

That is sad when it happens because it breaks the coherence of any
timed-lease caching the guest is involved in.  I.e. where the guest
acquires a lock on some data object (like a file in NFS) that it can
efficiently access without network round trips (similar to MESI), with
all nodes having agreed that it's coherent for, say, 5 seconds before
renewing or breaking.  (It's just a way to reduce latency.)

But we can't trust CLOCK_MONOTONIC when a VM is involved, it's just
one of those facts of life.  So instead the effort is to try and
detect when a VM is involved and then distrust the clock.

(Non-VM) suspend/resume is similar, but there's usually a way to
be notified about that as it happens.

-- Jamie


Re: [SeaBIOS PATCH v2] Fix regression of commit 87b533bf

2011-09-20 Thread Kevin O'Connor
On Tue, Sep 20, 2011 at 04:00:57AM -0400, Amos Kong wrote:
 From 4678a3cb0e0a3cd7a4bc3d284c5719fdba90bc61 Mon Sep 17 00:00:00 2001
 From: Kevin O'Connor ke...@koconnor.net
 Date: Tue, 20 Sep 2011 15:43:55 +0800
 Subject: [PATCH V2] Fix regression of commit 87b533bf
 
 From: Kevin O'Connor ke...@koconnor.net
 
 After adding more device entries in ACPI DSDT tables,
 the filesize of bios.bin changed from 128K to 256K.
 But bios could not initialize successfully.
 
 This is a regression since seabios commit 87b533bf. Prior to
 that commit, seabios did not mark the early 32bit initialization
 code as init code. However, a side effect of marking that code
 (handle_post) as init code is that it is more likely the linker
 could place the code at an address less than 0xe.
 The patch (just a hack) would cover up the issue.

Thanks.  I committed a slightly different patch (but similar concept)
to the seabios repo.

-Kevin


Re: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI DSDT tables

2011-09-20 Thread Kevin O'Connor
On Tue, Sep 20, 2011 at 06:45:57AM -0400, Amos Kong wrote:
 From 48ea1c9188334b89a60b4f9e853e86fc04fda4a5 Mon Sep 17 00:00:00 2001
 From: Amos Kong ak...@redhat.com
 Date: Tue, 20 Sep 2011 15:38:43 +0800
 Subject: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI DSDT tables
 
 Only func 0 is registered with the guest driver (we can
 only find func 0 in the slot->funcs list of the driver),
 so the other functions cannot be cleaned up when
 hot-removing the whole slot. This patch adds a
 device per function in the ACPI DSDT tables.
 
 Have tested with linux/winxp/win7, hot-adding/hot-removing,
 single/multiple function device; they are all fine.
 ---
 Changes from v1:
 - cleanup the macros, bios.bin gets back to 128K
 - notify only when func0 is added and removed

How about moving code into functions so that it isn't duplicated for
each PCI device.  See the patch below as an example (100% untested).
The CPU hotplug stuff works this way, except it run-time generates the
equivalent of Device(S???) and PCNT.  (It may make sense to
run-time generate the PCI hotplug as well - it consumes about 10K of
space to statically generate the 31 devices.)

-Kevin


diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 08412e2..31ac5eb 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -128,48 +128,6 @@ DefinitionBlock (
 PCRM, 32,
 }
 
-#define hotplug_slot(name, nr) \
-Device (S##name) {\
-   Name (_ADR, nr##0000)  \
-   Method (_EJ0,1) {  \
-Store(ShiftLeft(1, nr), B0EJ) \
-Return (0x0)  \
-   }  \
-   Name (_SUN, name)  \
-}
-
-   hotplug_slot(1, 0x0001)
-   hotplug_slot(2, 0x0002)
-   hotplug_slot(3, 0x0003)
-   hotplug_slot(4, 0x0004)
-   hotplug_slot(5, 0x0005)
-   hotplug_slot(6, 0x0006)
-   hotplug_slot(7, 0x0007)
-   hotplug_slot(8, 0x0008)
-   hotplug_slot(9, 0x0009)
-   hotplug_slot(10, 0x000a)
-   hotplug_slot(11, 0x000b)
-   hotplug_slot(12, 0x000c)
-   hotplug_slot(13, 0x000d)
-   hotplug_slot(14, 0x000e)
-   hotplug_slot(15, 0x000f)
-   hotplug_slot(16, 0x0010)
-   hotplug_slot(17, 0x0011)
-   hotplug_slot(18, 0x0012)
-   hotplug_slot(19, 0x0013)
-   hotplug_slot(20, 0x0014)
-   hotplug_slot(21, 0x0015)
-   hotplug_slot(22, 0x0016)
-   hotplug_slot(23, 0x0017)
-   hotplug_slot(24, 0x0018)
-   hotplug_slot(25, 0x0019)
-   hotplug_slot(26, 0x001a)
-   hotplug_slot(27, 0x001b)
-   hotplug_slot(28, 0x001c)
-   hotplug_slot(29, 0x001d)
-   hotplug_slot(30, 0x001e)
-   hotplug_slot(31, 0x001f)
-
 Name (_CRS, ResourceTemplate ()
 {
 WordBusNumber (ResourceProducer, MinFixed, MaxFixed, PosDecode,
@@ -762,6 +720,119 @@ DefinitionBlock (
 Zero   /* reserved */
 })
 
+/* PCI hotplug */
+Scope(\_SB.PCI0) {
+/* Methods called by bulk generated PCI devices below */
+Method (PCEJ, 1, NotSerialized) {
+// _EJ0 method - eject callback
+Store(ShiftLeft(1, Arg0), B0EJ)
+Return (0x0)
+}
+
+/* Bulk generated PCI hotplug devices */
+#define hotplug_func(nr, fn)\
+Device (S##nr##fn) {\
+Name (_ADR, 0x##nr##000##fn)\
+Method (_EJ0, 1) { Return(PCEJ(0x##nr)) }   \
+Name (_SUN, 0x##nr) \
+}
+
+#define hotplug_slot(nr) \
+hotplug_func(nr, 0)  \
+hotplug_func(nr, 1)  \
+hotplug_func(nr, 2)  \
+hotplug_func(nr, 3)  \
+hotplug_func(nr, 4)  \
+hotplug_func(nr, 5)  \
+hotplug_func(nr, 6)  \
+hotplug_func(nr, 7)
+
+   hotplug_slot(01)
+   hotplug_slot(02)
+   hotplug_slot(03)
+   hotplug_slot(04)
+   hotplug_slot(05)
+   hotplug_slot(06)
+   hotplug_slot(07)
+   hotplug_slot(08)
+   hotplug_slot(09)
+   hotplug_slot(0a)
+   hotplug_slot(0b)
+   hotplug_slot(0c)
+   hotplug_slot(0d)
+   hotplug_slot(0e)
+   hotplug_slot(0f)
+   hotplug_slot(10)
+   hotplug_slot(11)
+   hotplug_slot(12)
+   hotplug_slot(13)
+   hotplug_slot(14)
+   hotplug_slot(15)
+   hotplug_slot(16)
+   hotplug_slot(17)
+   hotplug_slot(18)
+   hotplug_slot(19)
+   hotplug_slot(1a)
+   hotplug_slot(1b)
+   hotplug_slot(1c)
+   hotplug_slot(1d)
+   hotplug_slot(1e)
+   hotplug_slot(1f)
+
+/* PCI hotplug notify method */
+Method(PCNF, 0) {
+// Local0 = iterator
+

Re: [PATCH v8 3/4] block: add block timer and throttling algorithm

2011-09-20 Thread Zhi Yong Wu
On Tue, Sep 20, 2011 at 8:34 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Mon, Sep 19, 2011 at 05:55:41PM +0800, Zhi Yong Wu wrote:
 On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
  On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
  On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti mtosa...@redhat.com 
  wrote:
   On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
   Note:
        1.) When bps/iops limits are specified to a small value such as 
   511 bytes/s, this VM will hang up. We are considering how to handle 
   this scenario.
  
   You can increase the length of the slice, if the request is larger than
   slice_time * bps_limit.
  Yeah, but it is a challenge for how to increase it. Do you have some nice 
  idea?
 
  If the queue is empty, and the request being processed does not fit the
  queue, increase the slice so that the request fits.
 Sorry for the late reply. Actually, do you think that this scenario is
 meaningful for the user?
 Even if we implement this, when the user limits the bps below 512
 bytes/second, the VM still cannot run every task.
 Can you let us know why we need to make such an effort?

 It would be good to handle request larger than the slice.
OK. Let me spend some time on trying your way.

 It is not strictly necessary, but in case it's not handled, a minimum
 should be in place, reflecting the maximum known request size. Being able to
In fact, slice_time is now dynamic, and is adjusted within some range.
 specify something which crashes is not acceptable.
Do you mean that one warning should be displayed if the specified
limit is smaller than the minimum capability?






-- 
Regards,

Zhi Yong Wu


Re: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI DSDT tables

2011-09-20 Thread Amos Kong
----- Original Message -----
 On Tue, Sep 20, 2011 at 06:45:57AM -0400, Amos Kong wrote:
  From 48ea1c9188334b89a60b4f9e853e86fc04fda4a5 Mon Sep 17 00:00:00
  2001
  From: Amos Kong ak...@redhat.com
  Date: Tue, 20 Sep 2011 15:38:43 +0800
  Subject: [SeaBIOS PATCH v2] hotplug: Add device per func in ACPI
  DSDT tables
  
  Only func 0 is registered with the guest driver (we can
  only find func 0 in the slot->funcs list of the driver),
  so the other functions cannot be cleaned up when
  hot-removing the whole slot. This patch adds a
  device per function in the ACPI DSDT tables.
  
  Have tested with linux/winxp/win7, hot-adding/hot-removing,
  single/multiple function device; they are all fine.
  ---
  Changes from v1:
  - cleanup the macros, bios.bin gets back to 128K
  - notify only when func0 is added and removed
 
 How about moving code into functions so that it isn't duplicated for
 each PCI device.  See the patch below as an example (100% untested).

Tested; it works the same as my original patch v2.

 The CPU hotplug stuff works this way, except it run-time generates
 the
 equivalent of Device(S???) and PCNT.  (It may make sense to
 run-time generate the PCI hotplug as well - it consumes about 10K of
 space to statically generate the 31 devices.)

I'm ok with the new version.

Acked-by: Amos Kong ak...@redhat.com


Re: [PATCH v8 3/4] block: add block timer and throttling algorithm

2011-09-20 Thread Zhi Yong Wu
On Wed, Sep 21, 2011 at 11:14 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Tue, Sep 20, 2011 at 8:34 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Mon, Sep 19, 2011 at 05:55:41PM +0800, Zhi Yong Wu wrote:
 On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti mtosa...@redhat.com 
 wrote:
  On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
  On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti mtosa...@redhat.com 
  wrote:
   On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
   Note:
        1.) When bps/iops limits are specified to a small value such as 
   511 bytes/s, this VM will hang up. We are considering how to handle 
   this scenario.
  
   You can increase the length of the slice, if the request is larger than
   slice_time * bps_limit.
  Yeah, but it is a challenge for how to increase it. Do you have some 
  nice idea?
 
  If the queue is empty, and the request being processed does not fit the
  queue, increase the slice so that the request fits.
 Sorry for the late reply. Actually, do you think that this scenario is
 meaningful for the user?
 Even if we implement this, when the user limits the bps below 512
 bytes/second, the VM still cannot run every task.
 Can you let us know why we need to make such an effort?

 It would be good to handle request larger than the slice.
 OK. Let me spend some time on trying your way.

 It is not strictly necessary, but in case it's not handled, a minimum
 should be in place, reflecting the maximum known request size. Being able to
 In fact, slice_time is now dynamic, and is adjusted within some range.
Sorry, I made a mistake. Currently it is fixed.
 specify something which crashes is not acceptable.
 Do you mean that one warning should be displayed if the specified
 limit is smaller than the minimum capability?






 --
 Regards,

 Zhi Yong Wu




-- 
Regards,

Zhi Yong Wu


Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-20 Thread Eric B Munson
On Thu, 15 Sep 2011, Marcelo Tosatti wrote:

 On Tue, Sep 13, 2011 at 04:49:55PM -0400, Eric B Munson wrote:
  On Fri, 09 Sep 2011, Marcelo Tosatti wrote:
  
   On Thu, Sep 01, 2011 at 02:27:49PM -0600, emun...@mgebm.net wrote:
On Thu, 01 Sep 2011 14:24:12 -0500, Anthony Liguori wrote:
On 08/30/2011 07:26 AM, Marcelo Tosatti wrote:
On Mon, Aug 29, 2011 at 05:27:11PM -0600, Eric B Munson wrote:
Currently, when qemu stops a guest kernel that guest will
issue a soft lockup
message when it resumes.  This set provides the ability for
qemu to comminucate
to the guest that it has been stopped.  When the guest hits
the watchdog on
resume it will check if it was suspended before issuing the
warning.

Eric B Munson (4):
   Add flag to indicate that a vm was stopped by the host
   Add functions to check if the host has stopped the vm
   Add generic stubs for kvm stop check functions
   Add check for suspended vm in softlockup detector

  arch/x86/include/asm/pvclock-abi.h |1 +
  arch/x86/include/asm/pvclock.h |2 ++
  arch/x86/kernel/kvmclock.c |   14 ++
  include/asm-generic/pvclock.h  |   14 ++
  kernel/watchdog.c  |   12 
  5 files changed, 43 insertions(+), 0 deletions(-)
  create mode 100644 include/asm-generic/pvclock.h

--
1.7.4.1

How is the host supposed to set this flag?

As mentioned previously, if you save save/restore the offset
added to
kvmclock on stop/cont (and the TSC MSR, forgot to mention that), no
paravirt infrastructure is required. Which means the issue is
also fixed
for older guests.

  
  Marcelo,
  
  I think that stopping the TSC is the wrong approach because it will break
  time between the two systems, so anything that expects the monotonic clock
  to move consistently will be wrong.
 
 In case the VM stops for whatever reason, the host system is not
 supposed to adjust time related hardware state to compensate, in an
 attempt to present apparent continuous time.
 
 If you save a VM and then restore it later, it is the guest
  responsibility to adjust its time representation.
 
 QEMU exposing continuous TSC and kvmclock state between stop and
 cont should not be a reason to introduce new paravirt infrastructure.
 
   IMO, messing with the TSC at run time to avoid a watchdog message
  is the wrong solution, better to teach the watchdog to ignore this
  special case.
 
 OK then, it is not a harmful addition, can you post the QEMU patches to
 set the ignore watchdog bit.

I am about to start working on the interface to set this bit, but to double
check I tried your suggestion of saving the clock value on stop and restoring
it on continue (see the qemu patch below), and I am seeing very odd behavior
from it.  On random resumes the vm won't come back immediately (one took 15
minutes to respond), and the wall clock stays behind my host wall clock by the
amount of time it took to resume.

---
 cpus.c |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/cpus.c b/cpus.c
index 54c188c..4573e23 100644
--- a/cpus.c
+++ b/cpus.c
@@ -63,6 +63,7 @@
 #endif /* CONFIG_LINUX */
 
 static CPUState *next_cpu;
+struct kvm_clock_data data;
 
 /***/
 void hw_error(const char *fmt, ...)
@@ -788,6 +789,7 @@ static int all_vcpus_paused(void)
 void pause_all_vcpus(void)
 {
 CPUState *penv = first_cpu;
+int ret;
 
 while (penv) {
 penv->stop = 1;
@@ -803,11 +805,18 @@ void pause_all_vcpus(void)
 penv = (CPUState *)penv->next_cpu;
 }
 }
+
 +ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
 +if (ret < 0) {
 +fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
 +data.clock = 0;
 +}
 }
 
 void resume_all_vcpus(void)
 {
 CPUState *penv = first_cpu;
+int ret;
 
 while (penv) {
 penv->stop = 0;
@@ -815,6 +824,11 @@ void resume_all_vcpus(void)
 qemu_cpu_kick(penv);
 penv = (CPUState *)penv->next_cpu;
 }
 +if (data.clock != 0) {
 +ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
 +if (ret < 0)
 +fprintf(stderr, "KVM_SET_CLOCK failed: %s\n", strerror(ret));
 +}
 }
 
 static void qemu_tcg_init_vcpu(void *_env)
 



