Re: [kvm-devel] [PATCH] kvm.h: __user requires compiler.h

2008-03-17 Thread Avi Kivity
Christian Borntraeger wrote:
> include/linux/kvm.h defines struct kvm_dirty_log to
>   [...]
>   union {
>   void __user *dirty_bitmap; /* one bit per page */
>   __u64 padding;
>   };
>
> __user requires compiler.h to compile. Currently, this works on x86
> only coincidentally due to other include files. This patch makes 
> kvm.h compile in all cases.
>   

Applied, thanks.
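
For context: in the kernel tree, __user is provided by linux/compiler.h and, outside of sparse checking runs, expands to nothing. The following stand-alone sketch shows why the declaration needs that definition to be visible. The struct name dirty_log_sketch and the fallback #define are illustrative assumptions, and only the union quoted above is reproduced; the elided fields are left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for what linux/compiler.h provides in a normal (non-sparse)
 * build: the annotation expands to nothing. Without some definition in
 * scope, the declaration below does not compile.
 */
#ifndef __user
#define __user
#endif

/* Hypothetical cut-down struct; only the union from the patch is shown. */
struct dirty_log_sketch {
    union {
        void __user *dirty_bitmap; /* one bit per page */
        uint64_t padding;          /* keeps the size fixed on 32- and 64-bit */
    };
};

/* The padding member pins the union at 8 bytes regardless of pointer size. */
static_assert(sizeof(struct dirty_log_sketch) == 8, "union must be 8 bytes");
```

Removing the fallback #define (and not including anything else that defines __user) reproduces the kind of failure the patch fixes.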

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 0/3] kvmclock: shutdown clock before reboot

2008-03-17 Thread Avi Kivity
Glauber Costa wrote:
> Avi,
>
> This series now applies on top of kvm.git.
> Only the needed function from machine_ops is made non-static.
>
>   

Applied, thanks.

Ingo, can you carry the first two patches as well?  They are 
602ac559a208ba44d5879a8e6381a379b376a8b7 and 
0c7f95e535a02caba52f944f067fbc05a0608cc1 in kvm.git 
(git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git).

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH] use slots_lock to protect writes to the wall clock

2008-03-17 Thread Avi Kivity
Glauber Costa wrote:
> As Marcelo pointed out, we need slots_lock to protect
> against slots changing under our nose during wall clock
> writing.
>
> This patch addresses this issue.
>
>   

Applied, thanks.

This lock is fairly annoying.  What do you think about taking it in 
vcpu_run unconditionally and only dropping it while in guest mode?  Most 
exits are mmu (or with npt, mmio) so they need to take it anyway.
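
The suggested pattern can be sketched in user space as follows. This is not KVM code: mock_rwsem, enter_guest() and handle_exit() are hypothetical stand-ins, and the "lock" is only a reader counter so that the hold/drop pattern can be checked.

```c
#include <assert.h>

/* Hypothetical stand-in for the read side of kvm->slots_lock. */
struct mock_rwsem { int readers; };

static void mock_down_read(struct mock_rwsem *s) { s->readers++; }
static void mock_up_read(struct mock_rwsem *s)
{
    assert(s->readers > 0);
    s->readers--;
}

struct mock_vcpu {
    struct mock_rwsem *slots_lock;
    int exits_left;     /* number of guest exits to simulate */
    int exits_handled;
    int held_in_guest;  /* guest entries made with the lock held (should stay 0) */
};

/* Pretend to run the guest until the next exit; the lock is not held here. */
static void enter_guest(struct mock_vcpu *v)
{
    if (v->slots_lock->readers != 0)
        v->held_in_guest++;
    v->exits_left--;
}

/* Exit handling (mmu fault, mmio, ...) runs with the lock already held. */
static void handle_exit(struct mock_vcpu *v)
{
    assert(v->slots_lock->readers > 0);
    v->exits_handled++;
}

/* Take the lock once on entry and drop it only around guest mode. */
static void vcpu_run_sketch(struct mock_vcpu *v)
{
    mock_down_read(v->slots_lock);
    while (v->exits_left > 0) {
        mock_up_read(v->slots_lock);    /* about to enter guest mode */
        enter_guest(v);
        mock_down_read(v->slots_lock);  /* back in host context */
        handle_exit(v);
    }
    mock_up_read(v->slots_lock);
}
```

With this shape, individual exit handlers no longer need their own down_read/up_read pairs, at the cost of holding the read side for the whole time spent in host context.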

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [patch 0/3] QEMU: virtio should register power of two sized regions

2008-03-17 Thread Avi Kivity
Marcelo Tosatti wrote:
> Otherwise the guest will miscalculate the region size.
>
>   
Applied all, thanks.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH] shrinker support for the mmu cache

2008-03-17 Thread Avi Kivity
Marcelo Tosatti wrote:
> On Mon, Mar 17, 2008 at 04:41:18PM +0200, Avi Kivity wrote:
>   
>> Marcelo Tosatti wrote:
>> 
>>>> While aging is not too hard to do, I don't think it would add much in
>>>> practice; we rarely observe mmu shadow pages being recycled due to
>>>> memory pressure.  So this is mostly helpful for preventing a VM from
>>>> pinning memory when under severe memory pressure, where we don't expect
>>>> good performance anyway.

 
>>> The issue is that the shrinker callback is called not only under
>>> severe memory pressure, but under normal system pressure too.
>>>
>>>  
>>>   
>> How much shrinkage goes on under normal pressure?
>> 
>
> It depends on the number of LRU pages scanned and the size of the cache.
>
> Roughly the number of LRU pages scanned divided by shrinker->seeks,
> relative to cache size (mm/vmscan.c shrink_slab).
>
>   

Since the maximum cache size is a small fraction of memory size, I think 
we should be okay here.
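
To put rough numbers on "scanned divided by seeks, relative to cache size", here is a small model of the per-shrinker scan target. It is written from memory of shrink_slab() in kernels of that period: the factor 4 and the "+ 1" in the divisor are assumptions to be checked against mm/vmscan.c, and the function name is made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough model of how many cache objects one shrink_slab() call asks a
 * shrinker to scan. The constant 4 and the "+ 1" are recalled, not
 * verified; treat them as assumptions.
 */
static uint64_t shrink_target(uint64_t scanned, uint64_t lru_pages,
                              uint64_t cache_objects, unsigned int seeks)
{
    uint64_t delta;

    assert(seeks != 0);
    delta = (4 * scanned) / seeks;   /* higher seeks means less pressure */
    delta *= cache_objects;          /* scale by the size of the cache */
    return delta / (lru_pages + 1);  /* relative to the size of the LRU */
}
```

For example, scanning 1000 of 99999 LRU pages with a 1000-object cache gives a target of 20 objects at seeks = 2 and 2 objects at seeks = 20, which is the effect of "setting the cost high enough".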

>> Rebuilding a single shadow page costs a maximum of 512 faults (so about 
>> 1 msec).  If the shrinker evicts one entry per second, this is a 
>> performance hit of 0.1%.
>>
>> Perhaps if we set the cost high enough, the normal eviction rate will be 
>> low enough.
>> 
>
> I think it's pretty easy to check the referenced bit on pages to
> keep recently used ones from being zapped.
>   

Not so easy:

- the pages don't have an accessed bit, the parent ptes do, so we need 
to scan the parent ptes list
- pages start out referenced, so we need to age them in two stages: 
first clear the accessed bits (and move them back to the tail of the 
queue); if we find a page on the head with all accessed bits clear, we 
can throw it away.
- root pages don't have parent ptes, so we need to track access to them 
manually
- if the accessed bit clearing rate is too high, it loses its meaning

Nothing horribly hard, but not trivial either.
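
The two-stage aging described in the list above can be sketched as a second-chance queue. This is a user-space illustration only: shadow_page_sketch, page_queue and age_and_evict() are hypothetical names, and the single accessed flag stands in for "any parent pte has its accessed bit set" (or the manual tracking needed for root pages).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

/* Hypothetical stand-in for a shadow page. */
struct shadow_page_sketch {
    int id;
    bool accessed;
};

/* Queue with the head at index 0 and the tail at index len - 1. */
struct page_queue {
    struct shadow_page_sketch pages[NPAGES];
    size_t len;
};

/* Move the head entry to the tail, shifting the rest forward. */
static void rotate_head_to_tail(struct page_queue *q)
{
    struct shadow_page_sketch head = q->pages[0];

    for (size_t i = 1; i < q->len; i++)
        q->pages[i - 1] = q->pages[i];
    q->pages[q->len - 1] = head;
}

/*
 * One pass over at most 'budget' entries. Stage one clears the accessed
 * state and requeues the page at the tail; stage two evicts a page found
 * at the head with the accessed state still clear. Returns the id of the
 * evicted page, or -1 if nothing was evicted within the budget.
 */
static int age_and_evict(struct page_queue *q, size_t budget)
{
    while (budget-- > 0 && q->len > 0) {
        if (q->pages[0].accessed) {
            q->pages[0].accessed = false;
            rotate_head_to_tail(q);
            continue;
        }
        int victim = q->pages[0].id;
        rotate_head_to_tail(q);  /* victim is now at the tail */
        q->len--;                /* drop it from the queue */
        return victim;
    }
    return -1;
}
```

Because pages start out referenced, the first pass over a fresh queue evicts nothing; only a page that stays untouched between two passes is thrown away. The last bullet shows up here as the budget: if passes run too often, every page looks idle.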

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] tools to dump guest memory and generate core file

2008-03-17 Thread Avi Kivity
david ahern wrote:
> When you attach gdb to qemu, you work with addresses as seen by the qemu
> process; the idea is to work with addresses as seen inside the guest.
>
> For example, in the qemu console you can examine guest kernel memory such as
> task structs using guest kernel based addresses:
>
> (qemu) x /128w 0xc0327a80
> c0327a80: 0x 0xc039a000 0x0002 0x
> c0327a90: 0x 0x 0x008c 0x0078
> c0327aa0: 0xc0327aa0 0xc0327aa0 0x 0x
> c0327ab0: 0xff9b 0xdae71a00 0x003d098c 0xdae71a00
> c0327ac0: 0x003d098c 0x 0x 0x
> ...
>
> where 0xc0327a80 is the address of the first task (init_task symbol). This
> form is really painful to decipher, much less to follow the task list.
>
>
> Now, if you attach gdb to the qemu process,
>
> gdb /usr/local/bin/qemu-system-x86_64 2346
>   


I meant connecting to the gdb stub in qemu that represents the guest:

  (gdb) target remote localhost:1234

Of course, it means starting qemu with the gdb stub enabled.  We might 
add a monitor command to start it after the fact.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] tools to dump guest memory and generate core file

2008-03-17 Thread david ahern
When you attach gdb to qemu, you work with addresses as seen by the qemu
process; the idea is to work with addresses as seen inside the guest.

For example, in the qemu console you can examine guest kernel memory such as
task structs using guest kernel based addresses:

(qemu) x /128w 0xc0327a80
c0327a80: 0x 0xc039a000 0x0002 0x
c0327a90: 0x 0x 0x008c 0x0078
c0327aa0: 0xc0327aa0 0xc0327aa0 0x 0x
c0327ab0: 0xff9b 0xdae71a00 0x003d098c 0xdae71a00
c0327ac0: 0x003d098c 0x 0x 0x
...

where 0xc0327a80 is the address of the first task (init_task symbol). This form
is really painful to decipher, much less to follow the task list.


Now, if you attach gdb to the qemu process,

gdb /usr/local/bin/qemu-system-x86_64 2346

and run that same memory dump request you get:

(gdb) x /128w 0xc0327a80
0xc0327a80: Cannot access memory at address 0xc0327a80

because from qemu's perspective 0xc0327a80 is not a valid address.
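
The failure follows from the two address spaces involved: 0xc0327a80 is a guest virtual address, while gdb attached to the qemu process resolves host virtual addresses. A dump tool would first have to translate. The sketch below covers only the simplest case and rests on stated assumptions: a 32-bit Linux guest with a 3G/1G split, where the kernel's direct mapping starts at 0xc0000000, and a caller that knows how much guest memory is directly mapped. The function name is made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Start of the kernel direct mapping for a 3G/1G split (assumption). */
#define GUEST_PAGE_OFFSET 0xc0000000u

/*
 * Translate a guest kernel lowmem virtual address to a guest physical
 * address. Returns false for user addresses and for addresses beyond the
 * directly mapped range (vmalloc, highmem), which need a page table walk.
 */
static bool guest_lowmem_virt_to_phys(uint32_t vaddr, uint32_t lowmem_bytes,
                                      uint32_t *paddr)
{
    if (vaddr < GUEST_PAGE_OFFSET)
        return false;
    if (vaddr - GUEST_PAGE_OFFSET >= lowmem_bytes)
        return false;
    *paddr = vaddr - GUEST_PAGE_OFFSET;
    return true;
}
```

Under these assumptions init_task at 0xc0327a80 sits at guest physical 0x00327a80, which is the offset a dump tool would read from guest RAM.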



Ideally it would be much more convenient to dump the guest memory based on its
view of addresses and then run crash. e.g.,

crash> foreach task

PID: 0  TASK: c0327a80  CPU: 0   COMMAND: "swapper"
struct task_struct {
  state = 0,
  thread_info = 0xc039a000,
  usage = {
counter = 2
...


Does that make sense?

Such a tool would be really useful to figure out why for example a guest appears
to have locked up or why it is sucking up host cpu time.

david


Avi Kivity wrote:
> david ahern wrote:
>> Does anyone know of tools that can dump memory for a qemu guest with
>> addresses as seen by the guest and generate a core file? For instance, say
>> you know the guest is running a 32-bit linux kernel with a 1G/3G split. Then
>> you would want to dump 1G of memory starting 0xc000 and create an ELF core
>> file. The core file could then be analyzed using tools like crash (or gdb
>> for the truly adventurous).
>>
>> I know VMware can take a snapshot and generate such a core file. Does a
>> similar tool exist for qemu? I see that the qemu console supports a raw
>> memory dump, so it should be possible. (Note I am not talking about a core
>> file of the qemu process, but rather a core file based on guest memory
>> addressing.)
>>
>>   
> 
> You might try connecting with gdb and using the generate-core-file command.
> 




Re: [kvm-devel] [PATCH] shrinker support for the mmu cache

2008-03-17 Thread Marcelo Tosatti
On Mon, Mar 17, 2008 at 04:41:18PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
> >>
> >>While aging is not too hard to do, I don't think it would add much in 
> >>practice; we rarely observe mmu shadow pages being recycled due to 
> >>memory pressure.  So this is mostly helpful for preventing a VM from 
> >>pinning memory when under severe memory pressure, where we don't expect 
> >>good performance anyway.
> >>
> >
> >The issue is that the shrinker callback is called not only under
> >severe memory pressure, but under normal system pressure too.
> >
> >  
> 
> How much shrinkage goes on under normal pressure?

It depends on the number of LRU pages scanned and the size of the cache.

Roughly the number of LRU pages scanned divided by shrinker->seeks,
relative to cache size (mm/vmscan.c shrink_slab).

> Rebuilding a single shadow page costs a maximum of 512 faults (so about 
> 1 msec).  If the shrinker evicts one entry per second, this is a 
> performance hit of 0.1%.
> 
> Perhaps if we set the cost high enough, the normal eviction rate will be 
> low enough.

I think it's pretty easy to check the referenced bit on pages to
keep recently used ones from being zapped.
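
For reference, the 0.1% figure quoted above is simply the eviction rate multiplied by the rebuild cost. A minimal sketch of that arithmetic, with a made-up function name:

```c
#include <assert.h>

/*
 * Fraction of run time spent rebuilding evicted shadow pages.
 * rebuild_seconds is the cost of rebuilding one page: about 1 msec in
 * the quoted estimate, i.e. up to 512 faults at roughly 2 usec each.
 */
static double eviction_overhead(double evictions_per_second,
                                double rebuild_seconds)
{
    assert(evictions_per_second >= 0.0 && rebuild_seconds >= 0.0);
    return evictions_per_second * rebuild_seconds;
}
```

One eviction per second at 1 msec each gives 0.001, i.e. 0.1%; the overhead scales linearly with the eviction rate, so 100 evictions per second would already cost about 10%.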



[kvm-devel] [patch 3/3] QEMU/KVM: bail out if PCI region is not power of two

2008-03-17 Thread Marcelo Tosatti
Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm-userspace.hotplug3/qemu/hw/pci.c
===
--- kvm-userspace.hotplug3.orig/qemu/hw/pci.c
+++ kvm-userspace.hotplug3/qemu/hw/pci.c
@@ -236,6 +236,11 @@ void pci_register_io_region(PCIDevice *p
 
 if ((unsigned int)region_num >= PCI_NUM_REGIONS)
 return;
+
+if (size & (size-1))
+term_printf("WARNING: PCI region size must be pow2 "
+"type=0x%x, size=0x%x\n", type, size);
+
 r = &pci_dev->io_regions[region_num];
 r->addr = -1;
 r->size = size;

-- 
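
The test used in the hunk above relies on the fact that a power of two has exactly one bit set, so clearing the lowest set bit with size & (size - 1) leaves zero. A stand-alone sketch follows; the function name is an assumption, and unlike the patch it rejects zero explicitly, since the bare expression also evaluates to 0 for size == 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if size is a power of two. size & (size - 1) clears the lowest
 * set bit; the result is zero only when at most one bit was set.
 */
static bool is_pow2(uint32_t size)
{
    return size != 0 && (size & (size - 1)) == 0;
}
```

For example, a 28-byte region fails the check (28 = 0b11100) while a 32-byte region passes.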




[kvm-devel] [patch 2/3] QEMU/KVM: virtio register pow2 memory regions

2008-03-17 Thread Marcelo Tosatti
Otherwise the PCI size for such regions will be calculated erroneously.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
Cc: Anthony Liguori <[EMAIL PROTECTED]>

Index: kvm-userspace.hotplug3/qemu/hw/virtio.c
===
--- kvm-userspace.hotplug3.orig/qemu/hw/virtio.c
+++ kvm-userspace.hotplug3/qemu/hw/virtio.c
@@ -404,6 +404,7 @@ VirtIODevice *virtio_init_pci(PCIBus *bu
 VirtIODevice *vdev;
 PCIDevice *pci_dev;
 uint8_t *config;
+uint32_t size;
 
 pci_dev = pci_register_device(bus, name, struct_size,
  -1, NULL, NULL);
@@ -441,7 +442,11 @@ VirtIODevice *virtio_init_pci(PCIBus *bu
 else
vdev->config = NULL;
 
-pci_register_io_region(pci_dev, 0, 20 + config_size, PCI_ADDRESS_SPACE_IO,
+size = 20 + config_size;
+if (size & (size-1))
+size = 1 << fls(size);
+
+pci_register_io_region(pci_dev, 0, size, PCI_ADDRESS_SPACE_IO,
   virtio_map);
 qemu_register_reset(virtio_reset, vdev);
 

-- 
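
The rounding step in the hunk above can be tried in isolation. The sketch below is self-contained, so it carries its own highest-set-bit helper (fls32, a hypothetical name) following the same 1-based convention as the fls() from patch 1/3. The config_size of 8 used in the example is an assumption for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the highest set bit, 0 if no bit is set. */
static int fls32(uint32_t v)
{
    int bit = 0;

    while (v) {
        bit++;
        v >>= 1;
    }
    return bit;
}

/*
 * Round size up to the next power of two; powers of two are returned
 * unchanged. Assumes 0 < size <= 2^31 so that the shift cannot overflow.
 */
static uint32_t roundup_pow2(uint32_t size)
{
    assert(size != 0 && size <= 0x80000000u);
    if (size & (size - 1))
        size = UINT32_C(1) << fls32(size);
    return size;
}
```

With config_size = 8 the region would be 20 + 8 = 28 bytes, whose highest set bit is bit 5 (1-based), so it is registered as 1 << 5 = 32 bytes.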




[kvm-devel] [patch 1/3] QEMU/KVM: add fls()

2008-03-17 Thread Marcelo Tosatti
Find Last Set, in accordance with glibc's ffs.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm-userspace.hotplug3/qemu/cutils.c
===
--- kvm-userspace.hotplug3.orig/qemu/cutils.c
+++ kvm-userspace.hotplug3/qemu/cutils.c
@@ -135,3 +135,14 @@ char *urldecode(const char *ptr)
 return ret;
 }
 
+int fls(int i)
+{
+int bit;
+
+for (bit=31; bit >= 0; bit--)
+if (i & (1 << bit))
+return bit+1;
+
+return 0;
+}
+
Index: kvm-userspace.hotplug3/qemu/qemu-common.h
===
--- kvm-userspace.hotplug3.orig/qemu/qemu-common.h
+++ kvm-userspace.hotplug3/qemu/qemu-common.h
@@ -87,6 +87,7 @@ int stristart(const char *str, const cha
 time_t mktimegm(struct tm *tm);
 int hex2bin(char ch);
 char *urldecode(const char *ptr);
+int fls(int i);
 
 /* Error handling.  */
 

-- 
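
A note on the convention: like ffs(), the result is 1-based and 0 means "no bit set". The variant below is a sketch, not the patch's code: it takes an unsigned argument and shifts an unsigned 1, which avoids shifting into the sign bit of an int when bit 31 is tested. It assumes a 32-bit unsigned int, and the names are made up for this example.

```c
#include <assert.h>
#include <strings.h>   /* ffs(), POSIX */

/* Same results as the patch's fls() for non-negative inputs. */
static int fls_sketch(unsigned int v)
{
    int bit;

    for (bit = 31; bit >= 0; bit--)
        if (v & (1u << bit))
            return bit + 1;
    return 0;
}

/* For a value with a single bit set, first-set and last-set agree. */
static int single_bit_agrees(unsigned int v)
{
    return ffs((int)v) == fls_sketch(v);
}
```

So fls_sketch(0) is 0, fls_sketch(1) is 1 and fls_sketch(0x80000000) is 32, mirroring ffs() from the other end of the word.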




[kvm-devel] [patch 0/3] QEMU: virtio should register power of two sized regions

2008-03-17 Thread Marcelo Tosatti
Otherwise the guest will miscalculate the region size.

-- 




[kvm-devel] [PATCH] use slots_lock to protect writes to the wall clock

2008-03-17 Thread Glauber Costa
As Marcelo pointed out, we need slots_lock to protect
against slots changing under our nose during wall clock
writing.

This patch addresses this issue.

Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
CC: Marcelo Tosatti <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86.c |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1ef56ad..9e4c6f2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -500,21 +500,21 @@ static void kvm_write_wall_clock(struct 
if (!wall_clock)
return;
 
-   mutex_lock(&kvm->lock);
-
version++;
+   
+   down_read(&kvm->slots_lock);
kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
 
wc_ts = current_kernel_time();
wc.wc_sec = wc_ts.tv_sec;
wc.wc_nsec = wc_ts.tv_nsec;
wc.wc_version = version;
+
kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc));
 
version++;
kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
-
-   mutex_unlock(&kvm->lock);
+   up_read(&kvm->slots_lock);
 }
 
 static void kvm_write_guest_time(struct kvm_vcpu *v)
@@ -608,8 +608,10 @@ int kvm_set_msr_common(struct kvm_vcpu *
vcpu->arch.hv_clock.tsc_shift = 22;
 
down_read(&current->mm->mmap_sem);
+   down_read(&vcpu->kvm->slots_lock);
vcpu->arch.time_page =
gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
+   up_read(&vcpu->kvm->slots_lock);
up_read(&current->mm->mmap_sem);
 
if (is_error_page(vcpu->arch.time_page)) {
-- 
1.4.2
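
Besides the locking change, the first hunk shows the version-bump sequence around the wall clock write: the version is incremented before and after the payload is written. The sketch below models that sequence in user space under the usual reading of such a protocol, namely that an odd version means an update is in progress. That reading, the struct layout and the function names are assumptions; memory barriers and the guest-memory accessors are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the guest-visible wall clock record. */
struct wall_clock_sketch {
    uint32_t version;  /* odd while an update is in progress */
    uint32_t sec;
    uint32_t nsec;
};

/* Writer side: bump to odd, write the payload, bump back to even. */
static void write_wall_clock(struct wall_clock_sketch *wc,
                             uint32_t sec, uint32_t nsec)
{
    wc->version++;     /* now odd: readers must not trust the payload */
    wc->sec = sec;
    wc->nsec = nsec;
    wc->version++;     /* even again: payload is consistent */
}

/* Reader side: one attempt; returns false if the caller should retry. */
static bool try_read_wall_clock(const struct wall_clock_sketch *wc,
                                uint32_t *sec, uint32_t *nsec)
{
    uint32_t before = wc->version;

    if (before & 1)
        return false;              /* update in progress */
    *sec = wc->sec;
    *nsec = wc->nsec;
    return wc->version == before;  /* false if it changed underneath us */
}
```

The slots_lock taken by the patch protects a different thing: it keeps the memory slots stable while each of the three guest writes is translated and performed.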




[kvm-devel] [PATCH 2/3] make native_machine_shutdown non-static

2008-03-17 Thread Glauber Costa
This allows external users to call it. It is mainly
useful for routines that override the machine_ops
field for their own special purposes, but want to call the
normal shutdown routine once they are done.

Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
---
 arch/x86/kernel/reboot.c |2 +-
 include/asm-x86/reboot.h |1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index bb21078..bb3ee86 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -382,7 +382,7 @@ #endif
}
 }
 
-static void native_machine_shutdown(void)
+void native_machine_shutdown(void)
 {
/* Stop the cpus and apics */
 #ifdef CONFIG_SMP
diff --git a/include/asm-x86/reboot.h b/include/asm-x86/reboot.h
index ff9b546..c5e8722 100644
--- a/include/asm-x86/reboot.h
+++ b/include/asm-x86/reboot.h
@@ -17,5 +17,6 @@ extern struct machine_ops machine_ops;
 
 void machine_real_restart(unsigned char *code, int length);
 void native_machine_crash_shutdown(struct pt_regs *regs);
+void native_machine_shutdown(void);
 
 #endif /* _ASM_REBOOT_H */
-- 
1.4.2




[kvm-devel] [PATCH 1/3] allow machine_crash_shutdown to be replaced

2008-03-17 Thread Glauber Costa
This patch allows machine_crash_shutdown to
be replaced, just like any of the other functions
in machine_ops

Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
---
 arch/x86/kernel/crash.c  |3 ++-
 arch/x86/kernel/reboot.c |7 ++-
 include/asm-x86/reboot.h |1 +
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 9a5fa0a..d262306 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -25,6 +25,7 @@ #include 
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86_32
 #include 
@@ -121,7 +122,7 @@ static void nmi_shootdown_cpus(void)
 }
 #endif
 
-void machine_crash_shutdown(struct pt_regs *regs)
+void native_machine_crash_shutdown(struct pt_regs *regs)
 {
/* This function is only called after the system
 * has panicked or is otherwise in a critical state.
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 55ceb8c..bb21078 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -453,7 +453,8 @@ struct machine_ops machine_ops = {
.shutdown = native_machine_shutdown,
.emergency_restart = native_machine_emergency_restart,
.restart = native_machine_restart,
-   .halt = native_machine_halt
+   .halt = native_machine_halt,
+   .crash_shutdown = native_machine_crash_shutdown,
 };
 
 void machine_power_off(void)
@@ -481,3 +482,7 @@ void machine_halt(void)
machine_ops.halt();
 }
 
+void machine_crash_shutdown(struct pt_regs *regs)
+{
+   machine_ops.crash_shutdown(regs);
+}
diff --git a/include/asm-x86/reboot.h b/include/asm-x86/reboot.h
index e9e3ffc..ff9b546 100644
--- a/include/asm-x86/reboot.h
+++ b/include/asm-x86/reboot.h
@@ -16,5 +16,6 @@ struct machine_ops
 extern struct machine_ops machine_ops;
 
 void machine_real_restart(unsigned char *code, int length);
+void native_machine_crash_shutdown(struct pt_regs *regs);
 
 #endif /* _ASM_REBOOT_H */
-- 
1.4.2




[kvm-devel] [PATCH 0/3] kvmclock: shutdown clock before reboot

2008-03-17 Thread Glauber Costa
Avi,

This series now applies on top of kvm.git.
Only the needed function from machine_ops is made non-static.





[kvm-devel] [PATCH 3/3] disable clock before rebooting.

2008-03-17 Thread Glauber Costa
This patch writes 0 (actually, what really matters is that the
LSB is cleared) to the system time msr before shutting down
the machine for kexec.

Without it, a random memory location can be written to
when the guest comes back.

It overrides the shutdown function, used in the path of kernel_kexec() (sys.c),
and crash_shutdown, used in the path of crash_kexec() (kexec.c).

Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
---
 arch/x86/kernel/kvmclock.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index b999f5e..d47734a 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -22,6 +22,7 @@ #include 
 #include 
 #include 
 #include 
+#include 
 
 #define KVM_SCALE 22
 
@@ -143,6 +144,26 @@ static void kvm_setup_secondary_clock(vo
setup_secondary_APIC_clock();
 }
 
+/*
+ * After the clock is registered, the host will keep writing to the
+ * registered memory location. If the guest happens to shutdown, this memory
+ * won't be valid. In cases like kexec, in which you install a new kernel, this
+ * means a random memory location will be kept being written. So before any
+ * kind of shutdown from our side, we unregister the clock by writing anything
+ * that does not have the 'enable' bit set in the msr
+ */
+static void kvm_crash_shutdown(struct pt_regs *regs)
+{
+   native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
+   native_machine_crash_shutdown(regs);
+}
+
+static void kvm_shutdown(void)
+{
+   native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
+   native_machine_shutdown();
+}
+
 void __init kvmclock_init(void)
 {
if (!kvm_para_available())
@@ -155,6 +176,8 @@ void __init kvmclock_init(void)
pv_time_ops.set_wallclock = kvm_set_wallclock;
pv_time_ops.sched_clock = kvm_clock_read;
pv_apic_ops.setup_secondary_clock = kvm_setup_secondary_clock;
+   machine_ops.shutdown  = kvm_shutdown;
+   machine_ops.crash_shutdown  = kvm_crash_shutdown;
clocksource_register(&kvm_clock);
}
 }
-- 
1.4.2
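
The override pattern used across this series (dispatch through machine_ops, replace one hook, chain to the native routine) can be sketched in user space as follows. Everything here is a mock: the variable standing in for the system time msr, the call counter and all _sketch names are hypothetical, and only the shutdown hook is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Mock msr contents: some address with the enable bit (bit 0) set. */
static uint64_t mock_system_time_msr = 0x1000 | 1;
static int native_shutdown_calls;

/* Stand-in for native_machine_shutdown(). */
static void mock_native_machine_shutdown(void)
{
    native_shutdown_calls++;
}

struct machine_ops_sketch {
    void (*shutdown)(void);
};

static struct machine_ops_sketch ops = { mock_native_machine_shutdown };

/* Override: disable the clock first (clear bit 0), then shut down normally. */
static void kvm_shutdown_sketch(void)
{
    mock_system_time_msr = 0;
    mock_native_machine_shutdown();
}

/* What kvmclock_init() does in the patch, reduced to the ops override. */
static void kvmclock_init_sketch(void)
{
    ops.shutdown = kvm_shutdown_sketch;
}

/* Generic entry point: always dispatches through the ops table. */
static void machine_shutdown_sketch(void)
{
    ops.shutdown();
}
```

After kvmclock_init_sketch(), a call to machine_shutdown_sketch() clears the enable bit before the native routine runs, which is why patch 2/3 has to make native_machine_shutdown callable from outside reboot.c.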




Re: [kvm-devel] [Qemu-devel] [PATCH] use a thread id variable

2008-03-17 Thread Glauber Costa
On Sun, Mar 9, 2008 at 11:52 AM, Gilad Ben-Yossef <[EMAIL PROTECTED]> wrote:
> Jamie Lokier wrote:
>  > Gilad Ben-Yossef wrote:
>  >> Glauber Costa wrote:
>  >>> This patch introduces a "thread_id" variable to CPUState.
>  >>> Its duty will be to hold the process, or more generally, thread
>  >>> id of the currently executing cpu
>  >>>
>  >>> env->nb_watchpoints = 0;
>  >>> +#ifdef __WIN32
>  >>> +env->thread_id = GetCurrentProcessId();
>  >>> +#else
>  >>> +env->thread_id = getpid();
>  >>> +#endif
>  >>> *penv = env;
>  >> hmm... maybe I'm missing something, but in Linux at least I think you
>  >> would prefer this to be gettid() rather than getpid as each CPU has its
>  >> own thread, not a different process.
>  >
>  > On most platforms, getpid() returns the same value for all threads, so
>  > it's not useful as a thread id.
>
>  Of course it does - this is what POSIX says it should do (Linux 2.4's
>  non-compliance notwithstanding). Which is why I suggested using the
>  non-standard gettid() on Linux rather than getpid
>
> >
>  > On Linux, it depends which version of threads.  The old package,
>  > LinuxThreads, has different getpid() for each thread.  The current one,
>  > NPTL, has them all the same.
>
>
>  LinuxThreads behavior is wrong according to the POSIX standard. Of
>  course, nothing else was possible with 2.4 kernels, so this is not an
>  error on LinuxThreads coders part.
>
>
>
>  > What you're supposed to do with pthreads in general is use pthread_self().
>
>  Unfortunately, AFAIK the opaque handle that pthread_self() returns is
>  quite meaningless outside of the process, whereas what the non-standard
>  gettid() returns can actually be used to identify a thread from
>  "outside" the process, like the shell.
>

Identifying a thread from the outside world is _exactly_ my intent
here, so pthread_self won't do.

I can easily write a wrapper for gettid() and use it. It would make the
kvm-specific patch unnecessary, which is good.

The reason I used getpid in the first place is that all raw qemu cpus
are in the same process anyway.

It's up to you, guys.
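
A minimal sketch of such a wrapper, assuming Linux and a C library without its own gettid() (glibc did not ship one at the time, so the raw syscall is used). The function name is hypothetical, and the Windows branch from the original patch is left out.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/syscall.h>
#endif

/*
 * Returns the kernel thread id on Linux, which can be used to identify
 * the thread from outside the process. Falls back to the process id on
 * other systems.
 */
static pid_t thread_id_sketch(void)
{
#ifdef __linux__
    return (pid_t)syscall(SYS_gettid);
#else
    return getpid();
#endif
}
```

In the main thread the returned id equals getpid(); in any other thread it differs, which is what makes it usable as a per-cpu-thread identifier.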

-- 
Glauber Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."



Re: [kvm-devel] [PATCH] debugfs directory for each VM

2008-03-17 Thread Ryota OZAKI
Hi Avi,

2008/3/17, Avi Kivity <[EMAIL PROTECTED]>:
> Ryota OZAKI wrote:
>  > Hi all,
>  >
>  > This patch allows a VM to have its own directory on debugfs,
>  > that contains statistics only for that VM. Each directory
>  > is identified by the pid of the VM (ie qemu).
>  >
>  > I tried this patch under several host kernel versions,
>  > .22, .23, and .24, and confirmed things work out well.
>  >
>
>
> Nice patch; it's certainly useful to improve per-vm trace capabilities.
>
>  It would be nice to keep the summary, since that allows running kvm_stat
>  with no arguments.

My patch keeps the summary. However, it breaks an assumption made by kvm_stat
(that every entry under /sys/kernel/debug/kvm is a stat file), so kvm_stat
doesn't work under kvm with my patch. I wrote a patch to fix this
problem. See the following patch. (I'm not familiar with Python, sorry.)


>  On the other hand, the kvmtrace patchset recently posted adds much more
>  detailed data; we should probably concentrate on that as a means of
>  improving performance tracing.

The patchset is useful for me, too. I will concentrate on that as well.

Thanks,
ozaki-r

>  --
>  error compiling committee.c: too many arguments to function


Signed-off-by: Ryota Ozaki <[EMAIL PROTECTED]>
--
 kvm_stat |   25 ++---
 1 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/kvm_stat b/kvm_stat
index 07773b0..e507e62 100755
--- a/kvm_stat
+++ b/kvm_stat
@@ -4,11 +4,15 @@ import curses
 import sys, os, time, optparse

 class Stats:
-def __init__(self):
-self.base = '/sys/kernel/debug/kvm'
+def __init__(self, pid = None):
+if pid:
+self.base = '/sys/kernel/debug/kvm/' + pid
+else:
+self.base = '/sys/kernel/debug/kvm'
 self.values = {}
 for key in os.listdir(self.base):
-self.values[key] = None
+if not os.path.isdir(self.base + '/' + key):
+self.values[key] = None
 def get(self):
 for key, oldval in self.values.iteritems():
 newval = int(file(self.base + '/' + key).read())
@@ -26,7 +30,6 @@ if not os.access('/sys/kernel/debug/kvm', os.F_OK):
 print "and ensure the kvm modules are loaded"
 sys.exit(1)

-stats = Stats()

 label_width = 20
 number_width = 10
@@ -73,14 +76,22 @@ def batch(stats):
 values = s[key]
 print '%-22s%10d%10d' % (key, values[0], values[1])

-options = optparse.OptionParser()
-options.add_option('-1', '--once', '--batch',
+usage = "usage: %prog [options] []"
+parser = optparse.OptionParser(usage)
+parser.add_option('-1', '--once', '--batch',
action = 'store_true',
default = False,
dest = 'once',
help = 'run in batch mode for one second',
)
-(options, args) = options.parse_args(sys.argv)
+(options, args) = parser.parse_args(sys.argv)
+
+if len(args) == 1:
+stats = Stats()
+elif len(args) == 2:
+stats = Stats(args[1])
+else:
+parser.error("too many arguments.")

 if not options.once:
 import curses.wrapper
--



Re: [kvm-devel] [PATCH] x86: don't allow KVM_CLOCK without HAVE_KVM

2008-03-17 Thread Avi Kivity
Randy Dunlap wrote:
> On Sun, 16 Mar 2008 13:13:08 +0200 Avi Kivity wrote:
>
>   
>> Randy Dunlap wrote:
>> 
>>> From: Randy Dunlap <[EMAIL PROTECTED]>
>>>
>>> Make KVM_CLOCK depend on HAVE_KVM.  Otherwise a Voyager build can
>>> fail with:
>>>
>>>   CC  arch/x86/kernel/asm-offsets.s
>>> In file included from include2/asm/irqflags.h:59,
>>>  from 
>>> /local/linsrc/next-20080314/include/linux/irqflags.h:46,
>>>  from include2/asm/system.h:11,
>>>  from include2/asm/processor.h:21,
>>>  from include2/asm/atomic_32.h:5,
>>>  from include2/asm/atomic.h:2,
>>>  from /local/linsrc/next-20080314/include/linux/crypto.h:20,
>>>  from 
>>> /local/linsrc/next-20080314/arch/x86/kernel/asm-offsets_32.c:7,
>>>  from 
>>> /local/linsrc/next-20080314/arch/x86/kernel/asm-offsets.c:2:
>>> include2/asm/paravirt.h: In function 'startup_ipi_hook':
>>> include2/asm/paravirt.h:856: error: 'struct pv_apic_ops' has no member 
>>> named 'startup_ipi_hook'
>>> include2/asm/paravirt.h:856: error: 'struct pv_apic_ops' has no member 
>>> named 'startup_ipi_hook'
>>> include2/asm/paravirt.h:856: error: memory input 4 is not directly 
>>> addressable
>>> make[2]: *** [arch/x86/kernel/asm-offsets.s] Error 1
>>> make[1]: *** [prepare0] Error 2
>>> make: *** [sub-make] Error 2
>>>
>>>   
>>>   
>> Looks like it's a general paravirt vs voyager issue, nothing kvmclock 
>> specific about it.  Wouldn't it be better to have voyager and paravirt 
>> mutually exclude each other, rather than every paravirt user?
>> 
>
> They do generally mutually exclude each other.  I think that the problem
> is just that dirty old "select PARAVIRT" in config KVM_CLOCK.
> PARAVIRT depends on !(X86_VISWS || X86_VOYAGER), but "select" doesn't
> care^W honor that.  As Documentation/kbuild/kconfig-language.txt says:
>
>   "In general use select only for
>   non-visible symbols (no prompts anywhere) and for symbols with
>   no dependencies. That will limit the usefulness but on the
>   other hand avoid the illegal configurations all over. kconfig
>   should one day warn about such things."
>
> so changing the select to depends on would fix it, but that's the
> only fix that I know of.
>   

A depends is horrible from the user point of view as it hides the 
feature completely if paravirt is not enabled.  So your original 
workaround is probably best.

Or maybe
   depends on PARAVIRT_CAPABLE
   select PARAVIRT

Where PARAVIRT_CAPABLE is a synonym for !(X86_REMOVE_ME || 
X86_TOTAL_SILLYNESS), so we don't have to repeat it everywhere.

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [PATCH] x86: don't allow KVM_CLOCK without HAVE_KVM

2008-03-17 Thread Randy Dunlap
On Sun, 16 Mar 2008 13:13:08 +0200 Avi Kivity wrote:

> Randy Dunlap wrote:
> > From: Randy Dunlap <[EMAIL PROTECTED]>
> >
> > Make KVM_CLOCK depend on HAVE_KVM.  Otherwise a Voyager build can
> > fail with:
> >
> >   CC  arch/x86/kernel/asm-offsets.s
> > In file included from include2/asm/irqflags.h:59,
> >  from 
> > /local/linsrc/next-20080314/include/linux/irqflags.h:46,
> >  from include2/asm/system.h:11,
> >  from include2/asm/processor.h:21,
> >  from include2/asm/atomic_32.h:5,
> >  from include2/asm/atomic.h:2,
> >  from /local/linsrc/next-20080314/include/linux/crypto.h:20,
> >  from 
> > /local/linsrc/next-20080314/arch/x86/kernel/asm-offsets_32.c:7,
> >  from 
> > /local/linsrc/next-20080314/arch/x86/kernel/asm-offsets.c:2:
> > include2/asm/paravirt.h: In function 'startup_ipi_hook':
> > include2/asm/paravirt.h:856: error: 'struct pv_apic_ops' has no member 
> > named 'startup_ipi_hook'
> > include2/asm/paravirt.h:856: error: 'struct pv_apic_ops' has no member 
> > named 'startup_ipi_hook'
> > include2/asm/paravirt.h:856: error: memory input 4 is not directly 
> > addressable
> > make[2]: *** [arch/x86/kernel/asm-offsets.s] Error 1
> > make[1]: *** [prepare0] Error 2
> > make: *** [sub-make] Error 2
> >
> >   
> 
> Looks like it's a general paravirt vs voyager issue, nothing kvmclock 
> specific about it.  Wouldn't it be better to have voyager and paravirt 
> mutually exclude each other, rather than every paravirt user?

They do generally mutually exclude each other.  I think that the problem
is just that dirty old "select PARAVIRT" in config KVM_CLOCK.
PARAVIRT depends on !(X86_VISWS || X86_VOYAGER), but "select" doesn't
care^W honor that.  As Documentation/kbuild/kconfig-language.txt says:

"In general use select only for
non-visible symbols (no prompts anywhere) and for symbols with
no dependencies. That will limit the usefulness but on the
other hand avoid the illegal configurations all over. kconfig
should one day warn about such things."

so changing the select to depends on would fix it, but that's the
only fix that I know of.

> HAVE_KVM is intended for the host, not the guest, btw.


---
~Randy



[kvm-devel] script for watching kvm_stat

2008-03-17 Thread Christian Ehrhardt

I added the kvm_stat support to ppc to watch my guest while executing e.g. to check 
things like "what does my guest do when I see nothing" ;-)
To be able to do that I also needed a script that reports in an iostat/vmstat-like 
style, so I wrote a small shell script.
Additionally my colleague Christian Bornträger had the need for a top style 
output and wrote a script for that need.
I attached both to help everyone experimenting with kvm_stat.

They are just small helpers and might still contain issues, feel free to 
comment, correct, whatever ;-)

--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

#!/bin/bash

kvmstatdir="/sys/kernel/debug/kvm"
duration=2
summode="no"
lph=20
line=-1
if [ $summode = "no" ]
then
    declare -a values
    counter=0
    for i in `ls $kvmstatdir`
    do
        values[$counter]=0
        counter=`expr $counter + 1`
    done
fi

kvmstat_header() {
    for i in `ls $kvmstatdir`
    do
        printf "|%16s " $i
    done
    printf "|\n"
}

USAGE="Usage: `basename $0` [-d arg] [-l arg] [-hs]"
while getopts hvsd:l: OPT; do
    case "$OPT" in
    h)  echo $USAGE
        echo "d - sleep between reported lines"
        echo "l - number of lines between headers"
        echo "s - summary mode (default is difference since last print)"
        exit 0
        ;;
    v)  echo "`basename $0` version 0.1"
        exit 0
        ;;
    d)  duration=$OPTARG
        ;;
    l)  lph=$OPTARG
        ;;
    s)  summode="yes"
        ;;
    \?) # getopts issues an error message
        echo $USAGE >&2
        exit 1
        ;;
    esac
done

while true
do
    line=`expr $line + 1`
    line=`expr $line % $lph`
    if [ $line -eq "0" ]
    then
        kvmstat_header
    fi

    counter=0
    for i in `ls $kvmstatdir`
    do
        val=`cat $kvmstatdir/$i`
        if [ $summode = "no" ]
        then
            cval=$val;
            val=`expr $cval - ${values[$counter]}`
            values[$counter]=$cval
            counter=`expr $counter + 1`
        fi
        printf "|%16d " $val
    done
    printf "|\n"

    sleep $duration
done

#!/bin/bash
watch --differences -n 1 " kvmtop_once | sort -n -r"

#!/bin/bash
cd /sys/kernel/debug/kvm
for d in *
do
    cat $d | tr -d \\n
    echo -e :\\t $d
done


Re: [kvm-devel] [patch 0/3] QEMU balloon support

2008-03-17 Thread Avi Kivity
Marcelo Tosatti wrote:
> On Mon, Mar 17, 2008 at 01:11:11PM +0200, Avi Kivity wrote:
>
>   
>>> +   up_read(&kvm->slots_lock);
>>>
>>> So as to avoid rmap_nuke, since that will be done through the madvise()
>>> path.
>>>
>>>  
>>>   
>> Why not do it in userspace?
>> 
>
> I don't see any way of knowing whether you have or not mmu notifiers from 
> userspace except adding an ioctl.
>
>   

KVM_CHECK_CAPABILITY (KVM_CAP_MMU_NOTIFIERS)
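
(A userspace sketch of such a test, not code from this thread. 
KVM_CHECK_EXTENSION is the capability-query ioctl declared in <linux/kvm.h>; 
whether a given header defines KVM_CAP_MMU_NOTIFIERS under that name is an 
assumption, so the capability number is passed in as a parameter. 
kvm_has_capability is a made-up helper name.)

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Returns 1 if the kernel behind kvm_fd reports the capability, 0 otherwise.
 * A failing ioctl (for example a bad descriptor) is treated as "not present".
 */
int kvm_has_capability(int kvm_fd, long cap)
{
    int ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap);

    return ret > 0;
}
```

Userspace would call this once on the /dev/kvm descriptor and only madvise() 
ballooned pages when the answer is 1.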

>> I'll likely merge zap directly into the backwards compatibility support 
>> (it won't ever appear in mainline since mmu notifiers will be merged 
>> sooner).
>> 
>
> Even with mmu notifiers QEMU needs to ask the host kernel "do you have
> mmu notifiers?" before zapping any page, otherwise the guest will crash.
>
>   

If we drop the shadow ptes explicitly, it shouldn't crash?


> So some method of finding if the host kernel has mmu notifiers needs to
> be in place even in mainline.
>
>   

Yes, the capability test.

> And what about allocation of ioctl numbers? Will you reserve the number
> used for ZAP_GFN on mainline?
>
>   

We can allocate those from the end and hope they never meet.

> KVM: MMU: handle page removal with shadow mapping
>
> Do not assume that a shadow mapping will always point to the same host
> frame number. Fixes crash with madvise(MADV_DONTNEED).
>
>   

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [patch 0/6] pv mmu fixes

2008-03-17 Thread Avi Kivity
Marcelo Tosatti wrote:
> The following patchset fixes pvmmu/cr3-cache for 32-bit guests.
>
>
>   
Thanks.  I folded the fixes into the patches they fixed, and merged 
everything except cr3 cache (I want to look at it again and see if I can 
reduce the impact a little).  pvmmu branch now contains the unmerged 
patches.

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [PATCH] shrinker support for the mmu cache

2008-03-17 Thread Avi Kivity
Marcelo Tosatti wrote:
>>
>> While aging is not too hard to do, I don't think it would add much in 
>> practice; we rarely observe mmu shadow pages being recycled due to 
>> memory pressure.  So this is mostly helpful for preventing a VM from 
>> pinning memory when under severe memory pressure, where we don't expect 
>> good performance anyway.
>> 
>
> Issue is that the shrinker callback will not be called only under
> severe memory pressure, but for normal system pressure too.
>
>   

How much shrinkage goes on under normal pressure?

Rebuilding a single shadow page costs a maximum of 512 faults (so about 
1 msec).  If the shrinker evicts one entry per second, this is a 
performance hit of 0.1%.
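
The same arithmetic as a minimal sketch. The function name is made up, and 
the 1 msec rebuild cost and one eviction per second are the estimates from 
the paragraph above, not measurements:

```c
/*
 * Fraction of guest time spent rebuilding evicted shadow pages.
 * rebuild_cost_sec:  time to refault one shadow page (about 1e-3 above,
 *                    i.e. up to 512 faults at roughly 2 usec each).
 * evictions_per_sec: shadow pages the shrinker evicts per second.
 */
double shrinker_overhead(double rebuild_cost_sec, double evictions_per_sec)
{
    return rebuild_cost_sec * evictions_per_sec;
}
```

shrinker_overhead(1e-3, 1.0) gives 0.001, the 0.1% quoted; under the same 
model 100 evictions per second would cost 10%, which is why the eviction 
rate matters.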

Perhaps if we set the cost high enough, the normal eviction rate will be 
low enough.

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [PATCH] shrinker support for the mmu cache

2008-03-17 Thread Marcelo Tosatti
On Sun, Mar 16, 2008 at 01:28:43PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
> > On Wed, Mar 12, 2008 at 08:13:41PM +0200, Izik Eidus wrote:
> >   
> >> this patch simply register the mmu cache with the shrinker.
> >> 
> >
> > Hi Izik,
> >
> > Nice.
> >
> > I think you want some sort of aging mechanism here. Walk through all
> > translations of a shadow page clearing the referenced bit of all
> > mappings it holds (and moving pages with any accessed translation to the
> > head of the list).
> >   
> 
> While aging is not too hard to do, I don't think it would add much in 
> practice; we rarely observe mmu shadow pages being recycled due to 
> memory pressure.  So this is mostly helpful for preventing a VM from 
> pinning memory when under severe memory pressure, where we don't expect 
> good performance anyway.

Issue is that the shrinker callback will not be called only under
severe memory pressure, but for normal system pressure too.


Re: [kvm-devel] [patch 0/3] QEMU balloon support

2008-03-17 Thread Marcelo Tosatti
On Mon, Mar 17, 2008 at 01:11:11PM +0200, Avi Kivity wrote:

> >+   up_read(&kvm->slots_lock);
> >
> >So as to avoid rmap_nuke, since that will be done through the madvise()
> >path.
> >
> >  
> 
> Why not do it in userspace?

I don't see any way of knowing whether you have or not mmu notifiers from 
userspace except adding an ioctl.

> I'll likely merge zap directly into the backwards compatibility support 
> (it won't ever appear in mainline since mmu notifiers will be merged 
> sooner).

Even with mmu notifiers QEMU needs to ask the host kernel "do you have
mmu notifiers?" before zapping any page, otherwise the guest will crash.

So some method of finding if the host kernel has mmu notifiers needs to
be in place even in mainline.

And what about allocation of ioctl numbers? Will you reserve the number
used for ZAP_GFN on mainline?

> >>Did you find out what's causing the errors in the first place (if zap is 
> >>not used)?  It worries me greatly.
> >>
> >
> >Yes, the problem is that the rmap code does not handle the qemu process
> >mappings from vanishing while there is a present rmap. If that happens,
> >and there is a fault for a gfn whose qemu mapping has been removed, a
> >different physical zero page will be allocated:
> >
> > rmap a -> gfn 0 -> physical host page 0
> > mapping for gfn 0 gets removed
> > guest faults in gfn 0 through the same pte "chain"
> > rmap a -> gfn 0 -> physical host page 1
> >
> >When instantiating the shadow mapping for the second time, the
> >"is_rmap_pte" check succeeds, so we release the reference grabbed by
> >gfn_to_page() at mmu_set_spte(). We now have a shadow mapping pointing
> >to a physical page without having an additional reference on that page.
> >
> >The following makes the host not crash under such a condition, but the
> >condition itself is invalid leading to inconsistent state on the guest.
> >So IMHO it shouldnt be allowed to happen in the first place.
> >
> >  
> 
> The only way to prevent it is with mmu notifiers, which we may not have 
> for 2.6.25.  So I'd like to send this for 2.6.25-rc.

Well, you can prevent it by never allowing a page from being removed while 
there are shadow mappings for it. So while this fixes the host crash, it 
won't fix the guest crash.

KVM: MMU: handle page removal with shadow mapping

Do not assume that a shadow mapping will always point to the same host
frame number. Fixes crash with madvise(MADV_DONTNEED).

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f0cdfba..4caa1a0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1016,8 +1016,16 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*shadow_pte,
 struct page *page)
 {
u64 spte;
-   int was_rmapped = is_rmap_pte(*shadow_pte);
+   int was_rmapped = 0;
int was_writeble = is_writeble_pte(*shadow_pte);
+   hfn_t host_pfn = (*shadow_pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
+
+   if (is_rmap_pte(*shadow_pte)) {
+   if (host_pfn != page_to_pfn(page))
+   rmap_remove(vcpu->kvm, shadow_pte);
+   else
+   was_rmapped = 1;
+   }
 
/*
 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
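
A stand-alone model of the check this patch adds, for readers following the 
rmap discussion. It is not kernel code: classify_spte, the enum and 
EX_BASE_ADDR_MASK are illustrative stand-ins, 4 KiB pages are assumed, and 
the caller is assumed to pass in the result of is_rmap_pte().

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_PAGE_SHIFT     12                     /* assumes 4 KiB pages */
#define EX_BASE_ADDR_MASK 0x000ffffffffff000ULL  /* stand-in for PT64_BASE_ADDR_MASK */

enum spte_reuse {
    SPTE_NOT_RMAPPED,   /* no previous rmap entry: nothing to fix up */
    SPTE_SAME_FRAME,    /* old mapping points at the same host frame */
    SPTE_FRAME_CHANGED  /* host frame changed: old rmap must be dropped */
};

enum spte_reuse classify_spte(uint64_t old_spte, bool old_is_rmap,
                              uint64_t new_pfn)
{
    /* Extract the host frame number the old shadow pte pointed at. */
    uint64_t old_pfn = (old_spte & EX_BASE_ADDR_MASK) >> EX_PAGE_SHIFT;

    if (!old_is_rmap)
        return SPTE_NOT_RMAPPED;
    return old_pfn == new_pfn ? SPTE_SAME_FRAME : SPTE_FRAME_CHANGED;
}
```

SPTE_FRAME_CHANGED corresponds to the rmap_remove() branch in the patch, 
SPTE_SAME_FRAME to was_rmapped = 1.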



[kvm-devel] [patch 6/6] KVM: pvmmu: cache pdptrs

2008-03-17 Thread Marcelo Tosatti
The pdptrs need to be cached in addition to the shadowed root tables, so 
the guest walk can be done properly.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm.first/arch/x86/kvm/mmu.c
===
--- kvm.first.orig/arch/x86/kvm/mmu.c
+++ kvm.first/arch/x86/kvm/mmu.c
@@ -1320,11 +1320,11 @@ static void mmu_alloc_roots(struct kvm_v
 
ASSERT(!VALID_PAGE(root));
if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
-   if (!is_present_pte(vcpu->arch.pdptrs[i])) {
+   if (!is_present_pte(vcpu->arch.pdptrs[j][i])) {
vcpu->arch.mmu.pae_root[j][i] = 0;
continue;
}
-   root_gfn = vcpu->arch.pdptrs[i] >> PAGE_SHIFT;
+   root_gfn = vcpu->arch.pdptrs[j][i] >> PAGE_SHIFT;
} else if (vcpu->arch.mmu.root_level == 0)
root_gfn = 0;
sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
Index: kvm.first/arch/x86/kvm/paging_tmpl.h
===
--- kvm.first.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm.first/arch/x86/kvm/paging_tmpl.h
@@ -136,7 +136,8 @@ walk:
pte = vcpu->arch.cr3;
 #if PTTYPE == 64
if (!is_long_mode(vcpu)) {
-   pte = vcpu->arch.pdptrs[(addr >> 30) & 3];
+   pte = vcpu->arch.pdptrs[vcpu->arch.cr3_cache_idx]
+  [(addr >> 30) & 3];
if (!is_present_pte(pte))
goto not_present;
--walker->level;
Index: kvm.first/arch/x86/kvm/x86.c
===
--- kvm.first.orig/arch/x86/kvm/x86.c
+++ kvm.first/arch/x86/kvm/x86.c
@@ -192,13 +192,21 @@ static void __queue_exception(struct kvm
 /*
  * Load the pae pdptrs.  Return true is they are all valid.
  */
-int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
+int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3, int cr3_cache_inc)
 {
gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT;
unsigned offset = ((cr3 & (PAGE_SIZE-1)) >> 5) << 2;
int i;
int ret;
-   u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)];
+   u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs[0])];
+   int idx = vcpu->arch.cr3_cache_idx;
+
+   idx++;
+   if (unlikely(idx >= vcpu->arch.cr3_cache_limit))
+   idx = 0;
+
+   if (cr3_cache_inc)
+   vcpu->arch.cr3_cache_idx = idx;
 
down_read(&vcpu->kvm->slots_lock);
ret = kvm_read_guest_page(vcpu->kvm, pdpt_gfn, pdpte,
@@ -215,7 +223,7 @@ int load_pdptrs(struct kvm_vcpu *vcpu, u
}
ret = 1;
 
-   memcpy(vcpu->arch.pdptrs, pdpte, sizeof(vcpu->arch.pdptrs));
+   memcpy(vcpu->arch.pdptrs[idx], pdpte, sizeof(vcpu->arch.pdptrs[0]));
 out:
up_read(&vcpu->kvm->slots_lock);
 
@@ -225,7 +233,7 @@ EXPORT_SYMBOL_GPL(load_pdptrs);
 
 static bool pdptrs_changed(struct kvm_vcpu *vcpu)
 {
-   u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)];
+   u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs[0])];
bool changed = true;
int r;
 
@@ -236,7 +244,8 @@ static bool pdptrs_changed(struct kvm_vc
r = kvm_read_guest(vcpu->kvm, vcpu->arch.cr3 & ~31u, pdpte, 
sizeof(pdpte));
if (r < 0)
goto out;
-   changed = memcmp(pdpte, vcpu->arch.pdptrs, sizeof(pdpte)) != 0;
+   changed = memcmp(pdpte, vcpu->arch.pdptrs[vcpu->arch.cr3_cache_idx],
+sizeof(pdpte)) != 0;
 out:
up_read(&vcpu->kvm->slots_lock);
 
@@ -286,7 +295,7 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, 
}
} else
 #endif
-   if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.cr3)) {
+   if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.cr3, 1)) {
printk(KERN_DEBUG "set_cr0: #GP, pdptrs "
   "reserved bits\n");
kvm_inject_gp(vcpu, 0);
@@ -325,7 +334,7 @@ void kvm_set_cr4(struct kvm_vcpu *vcpu, 
return;
}
} else if (is_paging(vcpu) && !is_pae(vcpu) && (cr4 & X86_CR4_PAE)
-  && !load_pdptrs(vcpu, vcpu->arch.cr3)) {
+  && !load_pdptrs(vcpu, vcpu->arch.cr3, 1)) {
printk(KERN_DEBUG "set_cr4: #GP, pdptrs reserved bits\n");
kvm_inject_gp(vcpu, 0);
return;
@@ -363,7 +372,7 @@ void kvm_set_cr3(struct kvm_vcpu *vcpu, 
kvm_inject_gp(vcpu, 0);
return;
}
-   if (is_paging(vcpu) && !load_pdptrs(vcpu, cr3)) {
+   if (is_paging(vcpu) && !load_pdptrs(vcpu, cr3, 0)) {
printk(KERN_DEBUG "set_cr3: #GP, pdptrs "
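
The slot selection in load_pdptrs() above is a plain wrap-around increment. 
As a stand-alone sketch (next_cr3_cache_idx is a made-up name, and the limit 
of 4 used in the note below is an arbitrary assumption):

```c
/*
 * Advance a cache index by one slot, wrapping to 0 once it reaches
 * cache_limit. Mirrors the idx++ / idx >= limit test in the hunk above.
 */
int next_cr3_cache_idx(int cur_idx, int cache_limit)
{
    int idx = cur_idx + 1;

    if (idx >= cache_limit)
        idx = 0;
    return idx;
}
```

With a limit of 4 the index cycles 0, 1, 2, 3, 0. Note that the patch only 
stores the new index in vcpu->arch.cr3_cache_idx when cr3_cache_inc is set.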
 

[kvm-devel] [patch 5/6] KVM: pvmmu: kvm_patch might be called after initialization

2008-03-17 Thread Marcelo Tosatti
kvm_patch might be called during module load.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm.first/arch/x86/kernel/kvm.c
===
--- kvm.first.orig/arch/x86/kernel/kvm.c
+++ kvm.first/arch/x86/kernel/kvm.c
@@ -184,7 +184,7 @@ static void register_cr3_cache(void *cac
wrmsrl(KVM_MSR_SET_CR3_CACHE, __pa(&state->cr3_cache));
 }
 
-static unsigned __init kvm_patch(u8 type, u16 clobbers, void *ibuf,
+static unsigned kvm_patch(u8 type, u16 clobbers, void *ibuf,
 unsigned long addr, unsigned len)
 {
switch (type) {

-- 


[kvm-devel] [patch 2/6] KVM: pvmmu: hook set_pud for 3-level pagetables

2008-03-17 Thread Marcelo Tosatti
The paravirt interface will export set_pud for 3-level pagetables. Hook it.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm.first/arch/x86/kernel/kvm.c
===
--- kvm.first.orig/arch/x86/kernel/kvm.c
+++ kvm.first/arch/x86/kernel/kvm.c
@@ -296,15 +296,17 @@ static void kvm_pmd_clear(pmd_t *pmdp)
 }
 #endif
 
-static void kvm_set_pgd(pgd_t *pgdp, pgd_t pgd)
+static void kvm_set_pud(pud_t *pudp, pud_t pud)
 {
-   kvm_mmu_write(pgdp, pgd_val(pgd));
+   kvm_mmu_write(pudp, pud_val(pud));
 }
 
-static void kvm_set_pud(pud_t *pudp, pud_t pud)
+#if PAGETABLE_LEVELS == 4
+static void kvm_set_pgd(pgd_t *pgdp, pgd_t pgd)
 {
-   kvm_mmu_write(pudp, pud_val(pud));
+   kvm_mmu_write(pgdp, pgd_val(pgd));
 }
+#endif
 #endif /* PAGETABLE_LEVELS >= 3 */
 
 static void kvm_flush_tlb(void)
@@ -363,8 +365,10 @@ static void paravirt_ops_setup(void)
pv_mmu_ops.pmd_clear = kvm_pmd_clear;
 #endif
pv_mmu_ops.set_pud = kvm_set_pud;
+#if PAGETABLE_LEVELS == 4
pv_mmu_ops.set_pgd = kvm_set_pgd;
 #endif
+#endif
pv_mmu_ops.flush_tlb_user = kvm_flush_tlb;
pv_mmu_ops.release_pt = kvm_release_pt;
pv_mmu_ops.release_pd = kvm_release_pt;

-- 


[kvm-devel] [patch 4/6] KVM: pvmmu: fix mmu_alloc_roots() typo

2008-03-17 Thread Marcelo Tosatti
Slipped through on the first patch.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm.first/arch/x86/kvm/mmu.c
===
--- kvm.first.orig/arch/x86/kvm/mmu.c
+++ kvm.first/arch/x86/kvm/mmu.c
@@ -1321,7 +1321,7 @@ static void mmu_alloc_roots(struct kvm_v
ASSERT(!VALID_PAGE(root));
if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
if (!is_present_pte(vcpu->arch.pdptrs[i])) {
-   vcpu->arch.mmu.pae_root[i] = 0;
+   vcpu->arch.mmu.pae_root[j][i] = 0;
continue;
}
root_gfn = vcpu->arch.pdptrs[i] >> PAGE_SHIFT;

-- 


[kvm-devel] [patch 3/6] KVM: pvmmu: kvm_write_cr3() inline asm fix

2008-03-17 Thread Marcelo Tosatti
Let the compiler choose register placing. All we care is that "trap" 
value gets retained.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm.first/arch/x86/kernel/kvm.c
===
--- kvm.first.orig/arch/x86/kernel/kvm.c
+++ kvm.first/arch/x86/kernel/kvm.c
@@ -138,7 +138,7 @@ static void kvm_write_cr3(unsigned long 
_ASM_EXTABLE(0b, 2b)
: "=&r" (trap)
: "n" (1UL), "n" (0UL),
- "b" (cache->entry[idx].host_cr3),
+ "r" (cache->entry[idx].host_cr3),
  "m" (__force_order));
if (!trap)
goto out;

-- 


[kvm-devel] [patch 0/6] pv mmu fixes

2008-03-17 Thread Marcelo Tosatti
The following patchset fixes pvmmu/cr3-cache for 32-bit guests.


-- 


[kvm-devel] [patch 1/6] KVM: pvmmu: handle ptes in highmem

2008-03-17 Thread Marcelo Tosatti
Find the physical address through kmap_atomic_to_page().

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm.first/arch/x86/kernel/kvm.c
===
--- kvm.first.orig/arch/x86/kernel/kvm.c
+++ kvm.first/arch/x86/kernel/kvm.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -205,11 +206,23 @@ static void __init setup_guest_cr3_cache
 
 static void kvm_mmu_write(void *dest, u64 val)
 {
-   struct kvm_mmu_op_write_pte wpte = {
-   .header.op = KVM_MMU_OP_WRITE_PTE,
-   .pte_phys = (unsigned long)__pa(dest),
-   .pte_val = val,
-   };
+   __u64 pte_phys;
+   struct kvm_mmu_op_write_pte wpte;
+
+#ifdef CONFIG_HIGHPTE
+   struct page *page;
+   unsigned long dst = (unsigned long) dest;
+
+   page = kmap_atomic_to_page(dest);
+   pte_phys = page_to_pfn(page);
+   pte_phys <<= PAGE_SHIFT;
+   pte_phys += (dst & ~(PAGE_MASK));
+#else
+   pte_phys = (unsigned long)__pa(dest);
+#endif
+   wpte.header.op = KVM_MMU_OP_WRITE_PTE;
+   wpte.pte_val = val;
+   wpte.pte_phys = pte_phys;
 
kvm_deferred_mmu_op(&wpte, sizeof wpte);
 }
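
The CONFIG_HIGHPTE branch above rebuilds the physical address from the page 
frame number plus the offset within the page. A stand-alone sketch of just 
that arithmetic (pte_phys_from_pfn and the EX_ macros are illustrative 
names, and 4 KiB pages are assumed):

```c
#include <stdint.h>

#define EX_PAGE_SHIFT 12                      /* assumes 4 KiB pages */
#define EX_PAGE_SIZE  (1UL << EX_PAGE_SHIFT)
#define EX_PAGE_MASK  (~(EX_PAGE_SIZE - 1))

/*
 * pfn:         frame number of the page holding the pte
 * mapped_addr: address the pte is currently mapped at; only its
 *              low 12 bits (the offset inside the page) are used
 */
uint64_t pte_phys_from_pfn(uint64_t pfn, unsigned long mapped_addr)
{
    return (pfn << EX_PAGE_SHIFT) + (mapped_addr & ~EX_PAGE_MASK);
}
```

For example, frame 0x1234 and a mapping address ending in 0xabc give 
physical address 0x1234abc.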

-- 


Re: [kvm-devel] KVM Test result, kernel a16664b.., userspace 3017d05.. one new issue

2008-03-17 Thread Avi Kivity
Yunfeng Zhao wrote:
> Hi All,
> This is today's KVM test result against kvm.git 
> a16664b59065b8ae877b213e68a6f962ac359ca4 and kvm-userspace.git 
> 3017d054a6a9f37bfffee4ebf0bf52a51c9860b5.
>
>
> One new issue:
> 1.  booting smp windows guests has 30% chance of hang
> https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1910923&group_id=180599
>
>   
Is this a regression?  What was the last known good commit?

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [PATCH] debugfs directory for each VM

2008-03-17 Thread Avi Kivity
Ryota OZAKI wrote:
> Hi all,
>
> This patch allows a VM to have its own directory on debugfs,
> that contains statistics only for the VM. Each directory
> is identified by the pid of the VM (ie qemu).
>
> I tried this patch under several host kernel versions,
> .22, .23, and .24, and confirmed things work out well.
>   

Nice patch; it's certainly useful to improve per-vm trace capabilities.

It would be nice to keep the summary, since that allows running kvm_stat 
with no arguments.

On the other hand, the kvmtrace patchset recently posted adds much more 
detailed data; we should probably concentrate on that as a means of 
improving performance tracing.

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [patch 0/3] QEMU balloon support

2008-03-17 Thread Avi Kivity
Marcelo Tosatti wrote:
> Hi Avi,
>
> Good that you're back.
>
> On Sun, Mar 16, 2008 at 04:00:06PM +0200, Avi Kivity wrote:
>   
>> Marcelo Tosatti wrote:
>> 
>>> This patchset resends Anthony's QEMU balloon support plus:
>>>
>>> - Truncates the target size to ram size
>>> - Enables madvise() conditioned on KVM_ZAP_GFN ioctl
>>>
>>>  
>>>   
>> Once mmu notifiers are in, KVM_ZAP_GFN isn't needed.  So we have three 
>> possible situations:
>>
>> - zap needed, but not available: don't madvise()
>> - zap needed and available: zap and madvise()
>> - zap unneeded: madvise()
>> 
>
> Right. That is what the patch does. You just have to fill in
> "have_mmu_notifiers" here:
>
> +int kvm_zap_single_gfn(struct kvm *kvm, gfn_t gfn)
> +{
> +   unsigned long addr;
> +   int have_mmu_notifiers = 0;
> +
> +   down_read(&kvm->slots_lock);
> +   addr = gfn_to_hva(kvm, gfn);
> +
> +   if (kvm_is_error_hva(addr)) {
> +   up_read(&kvm->slots_lock);
> +   return -EINVAL;
> +   }
> +
> +   if (!have_mmu_notifiers) {
> +   spin_lock(&kvm->mmu_lock);
> +   rmap_nuke(kvm, gfn);
> +   spin_unlock(&kvm->mmu_lock);
> +   }
> +   up_read(&kvm->slots_lock);
>
> So as to avoid rmap_nuke, since that will be done through the madvise()
> path.
>
>   

Why not do it in userspace?

I'll likely merge zap directly into the backwards compatibility support 
(it won't ever appear in mainline since mmu notifiers will be merged 
sooner).
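
For reference, the three cases quoted above reduce to a small decision that 
userspace could make once at startup. This is only a sketch: the enum and 
function names are made up, and how the two flags are discovered (capability 
test, ioctl probe) is left open here.

```c
#include <stdbool.h>

enum balloon_action {
    BALLOON_NO_MADVISE,       /* zap needed, but not available */
    BALLOON_ZAP_AND_MADVISE,  /* zap needed and available */
    BALLOON_MADVISE_ONLY      /* zap unneeded */
};

enum balloon_action balloon_choose_action(bool have_mmu_notifiers,
                                          bool have_zap_gfn)
{
    /* With mmu notifiers the madvise() path drops the shadow ptes itself. */
    if (have_mmu_notifiers)
        return BALLOON_MADVISE_ONLY;
    return have_zap_gfn ? BALLOON_ZAP_AND_MADVISE : BALLOON_NO_MADVISE;
}
```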

>> Did you find out what's causing the errors in the first place (if zap is 
>> not used)?  It worries me greatly.
>> 
>
> Yes, the problem is that the rmap code does not handle the qemu process
> mappings from vanishing while there is a present rmap. If that happens,
> and there is a fault for a gfn whose qemu mapping has been removed, a
> different physical zero page will be allocated:
>
>   rmap a -> gfn 0 -> physical host page 0
>   mapping for gfn 0 gets removed
>   guest faults in gfn 0 through the same pte "chain"
>   rmap a -> gfn 0 -> physical host page 1
>
> When instantiating the shadow mapping for the second time, the
> "is_rmap_pte" check succeeds, so we release the reference grabbed by
> gfn_to_page() at mmu_set_spte(). We now have a shadow mapping pointing
> to a physical page without having an additional reference on that page.
>
> The following makes the host not crash under such a condition, but the
> condition itself is invalid leading to inconsistent state on the guest.
> So IMHO it shouldnt be allowed to happen in the first place.
>
>   

The only way to prevent it is with mmu notifiers, which we may not have 
for 2.6.25.  So I'd like to send this for 2.6.25-rc.

Please add a signoff.

> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index f0cdfba..4c93b79 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1009,6 +1009,21 @@ struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t
> +gva)
> return page;
>  }
>
> +static int was_spte_rmapped(struct kvm *kvm, u64 *spte, struct page *page)
> +{
> +   int ret = 0;
> +   unsigned long host_pfn = (*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
>   

hfn_t hfn

> +
> +   if (is_rmap_pte(*spte)) {
> +   if (host_pfn != page_to_pfn(page))
> +   rmap_remove(kvm, spte);
> +   else
> +   ret = 1;
> +   }
> +
> +   return ret;
> +}
> +
>  static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
>  unsigned pt_access, unsigned pte_access,
>  int user_fault, int write_fault, int dirty,
> @@ -1016,7 +1031,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
>  struct page *page)
>  {
> u64 spte;
> -   int was_rmapped = is_rmap_pte(*shadow_pte);
> +   int was_rmapped = was_spte_rmapped(vcpu->kvm, shadow_pte, page);
> int was_writeble = is_writeble_pte(*shadow_pte);
>   

Calling code with side effects in an initializer is a bit confusing.  I 
suggest simply inlining the code into mmu_set_spte().
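
A minimal sketch of what the inlined form might look like, written as a
self-contained userspace model rather than kernel code. Everything prefixed
model_ is a hypothetical stand-in, the SPTE_PRESENT bit is only an assumed
replacement for the real is_rmap_pte() test, and the mask value is
illustrative. The point is just that the rmap removal becomes a visible
statement in the function body instead of hiding in an initializer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for kernel definitions; the values are illustrative only. */
#define PAGE_SHIFT          12
#define PT64_BASE_ADDR_MASK 0x000ffffffffff000ULL
#define SPTE_PRESENT        0x1ULL   /* assumed stand-in for is_rmap_pte() */

typedef uint64_t hfn_t;              /* host frame number */

struct model_mmu {
    int rmap_removals;               /* counts calls to the rmap_remove stand-in */
};

static bool model_is_rmap_pte(uint64_t spte)
{
    return (spte & SPTE_PRESENT) != 0;
}

static void model_rmap_remove(struct model_mmu *mmu, uint64_t *spte)
{
    mmu->rmap_removals++;
    *spte = 0;                       /* the stale mapping is dropped */
}

/*
 * Model of mmu_set_spte() with the check written inline.
 * Returns true if the spte was already rmapped to new_hfn.
 */
static bool model_set_spte(struct model_mmu *mmu, uint64_t *shadow_pte,
                           hfn_t new_hfn)
{
    bool was_rmapped = false;

    assert(mmu && shadow_pte);

    if (model_is_rmap_pte(*shadow_pte)) {
        hfn_t old_hfn = (*shadow_pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;

        if (old_hfn != new_hfn)
            model_rmap_remove(mmu, shadow_pte);  /* side effect is explicit */
        else
            was_rmapped = true;
    }

    /* Install the new mapping in the model spte. */
    *shadow_pte = ((uint64_t)new_hfn << PAGE_SHIFT) | SPTE_PRESENT;
    return was_rmapped;
}
```

In this model, mapping the same frame twice reports was_rmapped on the
second call, while switching to a different frame triggers exactly one
removal of the stale entry.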

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd()

2008-03-17 Thread Avi Kivity
Christoph Hellwig wrote:
> On Sat, Mar 08, 2008 at 06:45:34PM -0800, Roland Dreier wrote:
>   
>>  >   spin_lock(&kvm_lock);
>>  > + if (--kvm->refcount) {
>>  > + spin_lock(&kvm_lock);
>>
>> obvious typo here...
>> 
>
> Indeed.  Any comments from the kvm developers in this approach?  The
> current multi-level file refcounting seems rather bogus so I'd like
> to make some progress on this.
>
>   

I'm okay with switching away from the file-based refcounts to a refcount 
embedded in the kvm structures, though I'm not particularly thrilled by 
it.  Any reason not to use krefs for this?
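
For illustration, here is a rough userspace model of the kref pattern with
the count embedded in the object. The model_ names are hypothetical
stand-ins (the kernel API is kref_init(), kref_get() and kref_put()), and
C11 atomics are only an assumed substitute for the kernel's atomic_t:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct kref. */
struct model_kref {
    atomic_int refcount;
};

static void model_kref_init(struct model_kref *k)
{
    atomic_init(&k->refcount, 1);    /* the creator holds the first reference */
}

static void model_kref_get(struct model_kref *k)
{
    atomic_fetch_add(&k->refcount, 1);
}

/*
 * Drop one reference. When the last reference goes away, call release()
 * and return 1; otherwise return 0.
 */
static int model_kref_put(struct model_kref *k,
                          void (*release)(struct model_kref *))
{
    assert(release);
    if (atomic_fetch_sub(&k->refcount, 1) == 1) {
        release(k);
        return 1;
    }
    return 0;
}

/* Hypothetical VM object with the refcount embedded in it. */
struct model_kvm {
    struct model_kref kref;
    int destroyed;
};

static void model_kvm_release(struct model_kref *k)
{
    /* Equivalent of container_of(): recover the enclosing object. */
    struct model_kvm *kvm =
        (struct model_kvm *)((char *)k - offsetof(struct model_kvm, kref));

    kvm->destroyed = 1;              /* real code would free resources here */
}
```

Because the decrement and the "was that the last one?" test happen in a
single atomic operation, there is no separate lock/unlock pair around the
count to get wrong.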

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd()

2008-03-17 Thread Christoph Hellwig
On Sat, Mar 08, 2008 at 06:45:34PM -0800, Roland Dreier wrote:
>  >spin_lock(&kvm_lock);
>  > +  if (--kvm->refcount) {
>  > +  spin_lock(&kvm_lock);
> 
> obvious typo here...

Indeed.  Any comments from the kvm developers in this approach?  The
current multi-level file refcounting seems rather bogus so I'd like
to make some progress on this.




Re: [kvm-devel] [PATCH RFC 0/4]Porting Xentrace to kvm

2008-03-17 Thread Liu, Eric E
[EMAIL PROTECTED] wrote:
> Avi Kivity wrote:
>> Jan Kiszka wrote:
>>> Avi Kivity wrote:
>>> 
>>>> Liu, Eric E wrote:
>>>>
>>>>> 
>>>>> Since we already have had debugfs_entries and some other kernel
>>>>> debug mechanism to use, does this kvmtrace make sense? Any
>>>>> comment is welcomed.
>>>> This looks very useful.  While kvm_stat provides good data, this
>>>> is much more in depth. 
>>>>
>>>> The kernel already contains a method of transferring percpu data to
>>>> userspace; see Documentation/filesystems/relay.txt.  It supports
>>>> mmap() and read().  Please see if it is a good fit.
>>>>
>>> 
>>> I would also suggest to use the new kernel standard for
>>> instrumentation: trace_mark(). 
>>> 
>>> This would also allow to reuse the trace points with other tracer
>>> and maybe even obsolete a separate transportation channel, e.g.
>>> when LTTng is once :-/ merged. 
>>> 
>> 
>> Can one have markers automatically recorded?  Or do you need to
>> connect the marker with a logging function?
> 
> Yes, of course you need a probe function. Either your own one, or one
> provided by other tracers.
> 
>> 
>> If the latter, it's too difficult to use.
> 
> I don't think it is. Check linux/samples/markers/probe-example.c. BTW,
> RCU-Trace (not yet mainlined) make use of markers as well, see -rt.
> 
> And in contrast to the suggested implementation, markers have less
> impact on the code if built-in but disabled. So there would be no
> reasons for not shipping kvm production binaries that are prepared to
> be traced.
> 
> But if you prefer the inlined version, I bet you'll have to hide it
> from reviewers at LKML. ;)
> 
> Jan

Thanks for your comments. I think using relayfs and trace_mark() is the
more reasonable approach; if that makes sense, I will modify the patches
accordingly.
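
For reference, the marker-plus-probe arrangement discussed above can be
modelled in a few lines of userspace C. This is only a sketch of the idea,
not the kernel markers API: every model_ name is hypothetical, and the real
interface is trace_mark() plus a probe registered by a tracer. It shows why
a disabled marker costs little (one pointer test) and why nothing is
recorded until a probe is connected:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Probe signature used by this model; the kernel's differs. */
typedef void (*model_probe_fn)(void *priv, const char *fmt, va_list ap);

struct model_marker {
    const char *name;
    model_probe_fn probe;            /* NULL while no tracer is connected */
    void *priv;
};

/* Connect a probe to a marker; fails if one is already connected. */
static int model_probe_register(struct model_marker *m,
                                model_probe_fn fn, void *priv)
{
    assert(m && fn);
    if (m->probe)
        return -1;
    m->priv = priv;
    m->probe = fn;
    return 0;
}

static void model_probe_unregister(struct model_marker *m)
{
    m->probe = NULL;
    m->priv = NULL;
}

/* The trace point itself: a single test and branch when disabled. */
static void model_trace_mark(struct model_marker *m, const char *fmt, ...)
{
    va_list ap;

    if (!m->probe)
        return;
    va_start(ap, fmt);
    m->probe(m->priv, fmt, ap);
    va_end(ap);
}

/* Example probe: formats the event into a buffer and counts hits. */
struct model_log {
    char buf[128];
    int hits;
};

static void model_log_probe(void *priv, const char *fmt, va_list ap)
{
    struct model_log *log = priv;

    vsnprintf(log->buf, sizeof(log->buf), fmt, ap);
    log->hits++;
}
```

A trace point such as model_trace_mark(&m, "reason %d", 3) does nothing
until model_log_probe (or another tracer's probe) is registered on m.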





Re: [kvm-devel] tools to dump guest memory and generate core file

2008-03-17 Thread Avi Kivity
david ahern wrote:
> Does anyone know of tools that can dump memory for a qemu guest with addresses
> as seen by the guest and generate a core file? For instance, say you know the
> guest is running a 32-bit linux kernel with a 1G/3G split. Then you would want
> to dump 1G of memory starting at 0xc0000000 and create an ELF core file. The
> core file could then be analyzed using tools like crash (or gdb for the truly
> adventurous).
>
> I know VMware can take a snapshot and generate such a core file. Does a
> similar tool exist for qemu? I see that the qemu console supports a raw memory
> dump, so it should be possible. (Note I am not talking about a core file of the
> qemu process, but rather a core file based on guest memory addressing.)
>
>   

You might try connecting with gdb and using the generate-core-file command.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] Oopses with Oprofile and qemu-kvm

2008-03-17 Thread Jan Kiszka
Yang, Sheng wrote:
> ...
> BTW: I am working on NMI supporting now. 

That's good news! Some weeks ago I posted an NMI infrastructure patch [1]
to qemu-devel already, for now only to support IOAPIC NMI watchdogs [2,
3]. Maybe you can make use of this. Please keep us posted about your
approach!

Jan

[1] http://permalink.gmane.org/gmane.comp.emulators.qemu/22989
[2] http://permalink.gmane.org/gmane.comp.emulators.qemu/23448
[3] http://permalink.gmane.org/gmane.comp.emulators.qemu/23456

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



Re: [kvm-devel] Oopses with Oprofile and qemu-kvm

2008-03-17 Thread Yang, Sheng
On Friday 14 March 2008 20:31:50 Zdenek Kabelac wrote:
> Hi
>
> I'm trying various things with qemu to resolve my other issue, and I've
> tried to use oprofile.
> But as soon as I run these:
>
> opcontrol --vmlinux=mine/vmlinux
> opcontrol --start
>
> I get a kernel oops (see below)

That's a known issue: the performance counters (part of the PMU), which 
oprofile uses for profiling, are not yet supported in KVM. The MSRs oprofile 
wants to access do not exist, so KVM injects a #GP into the guest, causing 
this oops. Even once the PMU is supported, current KVM cannot deliver virtual 
NMIs, so oprofile won't collect samples under normal conditions. 
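
To make the failure path concrete, here is a small userspace model of it.
It is a sketch under assumptions, not KVM's code: the model_ names and the
table of emulated MSRs are hypothetical. What is taken from the oops is
that the faulting instruction bytes <0f> 32 are RDMSR and that RCX ends in
c1, which is the index of performance counter 0:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_GP_VECTOR    13        /* #GP is exception vector 13 on x86 */
#define MODEL_MSR_PERFCTR0 0xc1      /* performance counter 0 */

struct model_vcpu {
    int pending_exception;           /* -1 when nothing is queued */
};

/* Hypothetical list of MSRs this model hypervisor emulates. */
static const uint32_t model_known_msrs[] = {
    0x10,                            /* time stamp counter */
    0x174,                           /* SYSENTER_CS */
};

static bool model_msr_known(uint32_t msr)
{
    size_t i;

    for (i = 0; i < sizeof(model_known_msrs) / sizeof(model_known_msrs[0]); i++)
        if (model_known_msrs[i] == msr)
            return true;
    return false;
}

/*
 * Emulate a guest RDMSR. Unknown MSRs queue a #GP for the guest and
 * return -1; known ones return 0 and a dummy value.
 */
static int model_emulate_rdmsr(struct model_vcpu *vcpu, uint32_t msr,
                               uint64_t *val)
{
    assert(vcpu && val);

    if (!model_msr_known(msr)) {
        vcpu->pending_exception = MODEL_GP_VECTOR;
        return -1;
    }
    *val = 0;
    return 0;
}
```

In the model, reading MODEL_MSR_PERFCTR0 queues vector 13, which is the
general protection fault the guest kernel reports in
nmi_save_registers().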

There is one way to work around this: use oprofile in timer interrupt 
mode. Add the "timer=1" module parameter to oprofile. Please refer to the 
oprofile manual for details. 

BTW: I am working on NMI support now. 

-- 
Thanks
Yang, Sheng

>
> Is the fault in qemu (i.e. NMI is not emulated) or do I have to use
> some other tools ?
> Or should I report this to lkml ?
>
> [   59.423311] oprofile: using NMI interrupt.
> [   65.125411] general protection fault:  [1] PREEMPT SMP
> [   65.128156] CPU 1
> [   65.128156] Modules linked in: oprofile nfs lockd nfs_acl sunrpc autofs4 dm_mod loop rtc psmouse evdev serio_raw i2c_piix4 pcnet32 mii i2c_core button
> [   65.128156] Pid: 2584, comm: udevd Not tainted 2.6.25-rc5-replic-server #52
> [   65.128156] RIP: 0010:[]  [] :oprofile:nmi_save_registers+0x49/0xb0
> [   65.128156] RSP: :81001f89bf78  EFLAGS: 0002
> [   65.128156] RAX: 0001 RBX:  RCX: 00c1
> [   65.128156] RDX: 88106400 RSI:  RDI: 81001b9513a8
> [   65.128156] RBP: 81001f89bf78 R08: 0020 R09: 81001b951410
> [   65.128156] R10: 0002 R11: 0001 R12: 881043a0
> [   65.128156] R13: 0001 R14:  R15: 81001c1c65a0
> [   65.128156] FS:  () GS:81001f80f190(0063) knlGS:f7d79720
> [   65.128156] CS:  0010 DS: 002b ES: 002b CR0: 8005003b
> [   65.128156] CR2: f7dad1b0 CR3: 1b93b000 CR4: 06e0
> [   65.128156] DR0:  DR1:  DR2: 
> [   65.128156] DR3:  DR6: 0ff0 DR7: 0400
> [   65.128156] Process udevd (pid: 2584, threadinfo 81001b004000, task 81001bba8000)
> [   65.128156] Stack:  81001f89bfa8 8101f74f aaab 81001c0f1d68
> [   65.128156]  810002666220  81001b005c30 8100d25b
> [   65.128156]  81001b005c30   81001b005d48 0001 
> [   65.128156] Call Trace:
> [   65.128156][] smp_call_function_interrupt+0x4f/0x80
> [   65.128156]  [] call_function_interrupt+0x6b/0x70
> [   65.128156][] ? __do_fault+0x240/0x580
> [   65.128156]  [] ? __do_fault+0x1bd/0x580
> [   65.128156]  [] ? handle_mm_fault+0x26d/0x8c0
> [   65.128156]  [] ? do_page_fault+0x3fc/0xb50
> [   65.128156]  [] ? hrtimer_start+0xe1/0x1a0
> [   65.128156]  [] ? get_lock_stats+0x2a/0x70
> [   65.128156]  [] ? put_lock_stats+0xe/0x30
> [   65.128156]  [] ? do_setitimer+0x393/0x3b0
> [   65.128156]  [] ? _spin_unlock_irq+0x32/0x80
> [   65.128156]  [] ? trace_hardirqs_on+0x131/0x190
> [   65.128156]  [] ? _spin_unlock_irq+0x3d/0x80
> [   65.128156]  [] ? do_setitimer+0x393/0x3b0
> [   65.128156]  [] ? trace_hardirqs_on_thunk+0x35/0x3a
> [   65.128156]  [] ? error_exit+0x0/0xa9
> [   65.128156]
> [   65.128156]
> [   65.128156] Code: b8 60 c6 10 88 4c 8b 88 68 c6 10 88 85 c9 74 3b 8d 41 ff 31 f6 4c 8d 40 01 49 c1 e0 04 0f 1f 44 00 00 48 8b 0c 37 48 85 c9 74 17 <0f> 32 48 c1 e2 20 89 c0 48 09 c2 89 54 37 0c 48 c1 ea 20 89 54
> [   65.128156] RIP  [] :oprofile:nmi_save_registers+0x49/0xb0
> [   65.128156]  RSP 
> [   65.128156] ---[ end trace e503d3702565a4da ]---
> [   65.128156] Kernel panic - not syncing: Aiee, killing interrupt handler!
>
>
> Or another one:
>
> [   69.388548] oprofile: using NMI interrupt.
> [   78.377689] general protection fault:  [1] PREEMPT SMP
> [   78.378754] CPU 0
> [   78.378754] Modules linked in: oprofile nfs lockd nfs_acl sunrpc autofs4 dm_mod loop psmouse i2c_piix4 evdev rtc serio_raw i2c_core pcnet32 mii button
> [   78.378754] Pid: 506, comm: udevd Not tainted 2.6.25-rc5-replic-server #52
> [   78.378754] RIP: 0010:[]  [] :oprofile:nmi_save_registers+0x49/0xb0
> [   78.378754] RSP: 0018:81559f78  EFLAGS: 0002
> [   78.378754] RAX: 0001 RBX:  RCX: 00c1
> [   78.378754] RDX: 88108400 RSI:  RDI: 81001b589000
> [   78.378754] RBP: 81559f78 R08: 0020 R09: 81001b589270
> [   78.378754] R10: 0002 R11: 0001 R12: 881063a0
> [   78.378754] R13: 0001 R14: 81001f20c000 R15: 
> [   78.