Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Ingo Molnar

* Joerg Roedel j...@8bytes.org wrote:

 On Tue, Mar 16, 2010 at 12:25:00PM +0100, Ingo Molnar wrote:
  Hm, that sounds rather messy if we want to use it to basically expose
  kernel functionality in a guest/host unified way. Is the qemu process
  discoverable in some secure way? Can we trust it? Is there some proper
  tooling available to do it, or do we have to push it through 2-3
  packages to get such a useful feature done?
 
 Since we want to implement a PMU usable for the guest anyway, why don't we 
 just use the guest's perf to get all the information we want? [...]

Look at the previous posting of this patch, this is something new and rather 
unique. The main power in the 'perf kvm' kind of instrumentation is to profile 
_both_ the host and the guest on the host, using the same tool (often using 
the same kernel) and using similar workloads, and do profile comparisons using 
'perf diff'.

Note that KVM's in-kernel design makes it easy to offer this kind of 
host/guest shared implementation that Yanmin has created. Other virtualization 
solutions with a poorer design (for example where the hypervisor code base is 
split away from the guest implementation) will have it much harder to create 
something similar.

That kind of integrated approach can result in very interesting finds straight 
away, see:

  http://lkml.indiana.edu/hypermail/linux/kernel/1003.0/00613.html

( the profile there demos the need for spinlock accelerators for example - 
  there's clearly asymmetrically large overhead in guest spinlock code. Guess 
  how much else we'll be able to find with a full 'perf kvm' implementation. )

One of the main goals of a virtualization implementation is to eliminate as 
many performance differences to the host kernel as possible. From the first 
day KVM was released the overriding question from users was always: 'how much 
slower is it than native, and which workloads are hit worst, and why, and 
could you pretty please speed up important workload XYZ'.

'perf kvm' helps exactly that kind of development workflow.

Note that with oprofile you can already do separate guest space and host space 
profiling (with the timer driven fallback in the guest). One idea with 'perf 
kvm' is to change that paradigm of forced separation and forced duplication 
and to support the workflow that most developers employ: use the host space for 
development and unify instrumentation in an intuitive framework. Yanmin's 
'perf kvm' patch is a very good step towards that goal.

Anyway ... look at the patches, try them and see it for yourself. Back in the 
days when i did KVM performance work i wish i had something like Yanmin's 
'perf kvm' feature. I'd probably still be hacking KVM today ;-)

So, the code is there, it's useful and it's up to you guys whether you live 
with this opportunity - the perf developers are certainly eager to help out 
with the details. There's already a number of per-kernel-subsystem perf helper 
tools: perf sched, perf kmem, perf lock, perf bench, perf timechart.

'perf kvm' is really a natural and good next step IMO that underlines the main 
design goodness KVM brought to the world of virtualization: proper guest/host 
code base integration.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] KVM: Make locked operations truly atomic

2010-03-17 Thread Jan Kiszka
Avi Kivity wrote:
 Once upon a time, locked operations were emulated while holding the mmu mutex.
 Since mmu pages were write protected, it was safe to emulate the writes in
 a non-atomic manner, since there could be no other writer, either in the
 guest or in the kernel.
 
 These days emulation takes place without holding the mmu spinlock, so the
 write could be preempted by an unshadowing event, which exposes the page
 to writes by the guest.  This may cause corruption of guest page tables.
 
 Fix by using an atomic cmpxchg for these operations.
 
 Signed-off-by: Avi Kivity a...@redhat.com
 ---
  arch/x86/kvm/x86.c |   69 
 
  1 files changed, 48 insertions(+), 21 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 9d02cc7..d724a52 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -3299,41 +3299,68 @@ int emulator_write_emulated(unsigned long addr,
  }
  EXPORT_SYMBOL_GPL(emulator_write_emulated);
  
 +#define CMPXCHG_TYPE(t, ptr, old, new) \
 + (cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old))
 +
 +#ifdef CONFIG_X86_64
 +#  define CMPXCHG64(ptr, old, new) CMPXCHG_TYPE(u64, ptr, old, new)
 +#else
 +#  define CMPXCHG64(ptr, old, new) \
 + (cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u *)(new)) == *(u64 *)(old))
^^
This should cause the 32-bit build breakage I see with the current next
branch.

Jan





[PATCH v2] KVM: cleanup {kvm_vm_ioctl, kvm}_get_dirty_log()

2010-03-17 Thread Xiao Guangrong
Using bitmap_empty() to see whether memslot->dirty_bitmap is empty 

Changelog:
cleanup x86 specific kvm_vm_ioctl_get_dirty_log() and fix a local
parameter's type, addressing Takuya Yoshikawa's suggestion

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/x86.c  |   17 -
 virt/kvm/kvm_main.c |7 ++-
 2 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bcf52d1..e6cbbd4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2644,22 +2644,17 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
  struct kvm_dirty_log *log)
 {
-   int r, n, i;
+   int r, n, is_dirty = 0;
struct kvm_memory_slot *memslot;
-   unsigned long is_dirty = 0;
unsigned long *dirty_bitmap = NULL;
 
	mutex_lock(&kvm->slots_lock);
 
-   r = -EINVAL;
-   if (log->slot >= KVM_MEMORY_SLOTS)
+   r = kvm_get_dirty_log(kvm, log, &is_dirty);
+   if (r)
goto out;
 
	memslot = kvm->memslots->memslots[log->slot];
-   r = -ENOENT;
-   if (!memslot->dirty_bitmap)
-   goto out;
-
	n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
 
r = -ENOMEM;
@@ -2668,9 +2663,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
goto out;
memset(dirty_bitmap, 0, n);
 
-   for (i = 0; !is_dirty && i < n/sizeof(long); i++)
-   is_dirty = memslot->dirty_bitmap[i];
-
/* If nothing is dirty, don't bother messing with page tables. */
if (is_dirty) {
struct kvm_memslots *slots, *old_slots;
@@ -2694,8 +2686,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
}
 
r = 0;
-   if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
-   r = -EFAULT;
+
 out_free:
vfree(dirty_bitmap);
 out:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bcd08b8..b08a7de 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -767,9 +767,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
struct kvm_dirty_log *log, int *is_dirty)
 {
struct kvm_memory_slot *memslot;
-   int r, i;
-   int n;
-   unsigned long any = 0;
+   int r, n, any = 0;
 
r = -EINVAL;
	if (log->slot >= KVM_MEMORY_SLOTS)
@@ -782,8 +780,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
 
	n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
 
-   for (i = 0; !any && i < n/sizeof(long); ++i)
-   any = memslot->dirty_bitmap[i];
+   any = !bitmap_empty(memslot->dirty_bitmap, memslot->npages);
 
r = -EFAULT;
	if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
-- 
1.6.1.2



Re: [PATCH 2/5] KVM: Make locked operations truly atomic

2010-03-17 Thread Avi Kivity

On 03/17/2010 09:45 AM, Jan Kiszka wrote:

Avi Kivity wrote:
   

Once upon a time, locked operations were emulated while holding the mmu mutex.
Since mmu pages were write protected, it was safe to emulate the writes in
a non-atomic manner, since there could be no other writer, either in the
guest or in the kernel.

These days emulation takes place without holding the mmu spinlock, so the
write could be preempted by an unshadowing event, which exposes the page
to writes by the guest.  This may cause corruption of guest page tables.

Fix by using an atomic cmpxchg for these operations.

Signed-off-by: Avi Kivitya...@redhat.com
---
  arch/x86/kvm/x86.c |   69 
  1 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9d02cc7..d724a52 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3299,41 +3299,68 @@ int emulator_write_emulated(unsigned long addr,
  }
  EXPORT_SYMBOL_GPL(emulator_write_emulated);

+#define CMPXCHG_TYPE(t, ptr, old, new) \
+   (cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old))
+
+#ifdef CONFIG_X86_64
+#  define CMPXCHG64(ptr, old, new) CMPXCHG_TYPE(u64, ptr, old, new)
+#else
+#  define CMPXCHG64(ptr, old, new) \
+   (cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u *)(new)) == *(u64 *)(old))
 

 ^^
This should cause the 32-bit build breakage I see with the current next
branch.
   


Also, Marcelo sees autotest breakage, so it's also broken on 64-bit somehow.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-17 Thread Ingo Molnar

* Anthony Liguori anth...@codemonkey.ws wrote:

 On 03/16/2010 12:39 PM, Ingo Molnar wrote:
 If we look at the use-case, it's going to be something like, a user is
 creating virtual machines and wants to get performance information about
 them.
 
 Having to run a separate tool like perf is not going to be what they would
 expect they had to do.  Instead, they would either use their existing GUI
 tool (like virt-manager) or they would use their management interface
 (either QMP or libvirt).
 
 The complexity of interaction is due to the fact that perf shouldn't be a
 stand alone tool.  It should be a library or something with a programmatic
 interface that another tool can make use of.
 But ... a GUI interface/integration is of course possible too, and it's being
 worked on.
 
 perf is mainly a kernel developer tool, and kernel developers generally dont
 use GUIs to do their stuff: which is the (sole) reason why its first ~850
 commits of tools/perf/ were done without a GUI. We go where our developers
 are.
 
 In any case it's not an excuse to have no proper command-line tooling. In 
 fact
 if you cannot get simpler, more atomic command-line tooling right then you'll
 probably doubly suck at doing a GUI as well.
 
 It's about who owns the user interface.
 
 If qemu owns the user interface, then we can satisfy this in a very simple 
 way by adding a perf monitor command.  If we have to support third party 
 tools, then it significantly complicates things.

Of course illogical modularization complicates things 'significantly'.

I wish both you and Avi looked back 3-4 years and realized what made KVM so 
successful back then and why the hearts and minds of virtualization developers 
were captured by KVM almost overnight.

KVM's main strength back then was that it was a surprisingly functional piece 
of code offered by a 10 KLOC patch - right on the very latest upstream kernel. 
Code was shared with upstream, there was version parity, and it all was in the 
same single repo which was (and is) a pleasure to develop on.

Unlike Xen, which was a 200+ KLOC patch on top of a forked 10 MLOC kernel a 
few upstream versions back. Xen had constant version friction due to that fork 
and due to that forced/false separation/modularization: Xen _itself_ was a 
fork of Linux to begin with. (for example Xen still had my copyrights last i 
checked, which it got from old Linux code i worked on)

That forced separation and version friction in Xen was a development and 
productization nightmare, and developing on KVM was a truly refreshing 
experience. (I'll go out on a limb to declare that you wont find a _single_ 
developer on this list who will tell us otherwise.)

Fast forward to 2010. The kernel side of KVM is maximum goodness - by far the 
worst-quality remaining aspects of KVM are precisely in areas that you 
mention: 'if we have to support third party tools, then it significantly 
complicates things'. You kept Qemu as an external 'third party' entity to KVM, 
and KVM is clearly hurting from that - just see the recent KVM usability 
thread for examples about suckage.

So a similar 'complication' is the crux of the matter behind KVM quality 
problems: you've not followed through with the original KVM vision and you 
have not applied that concept to Qemu!

And please realize that the user does not care that KVM's kernel bits are top 
notch, if the rest of the package has sucky aspects: it's always the weakest 
link of the chain that matters to the user.

Xen sucked because of such design shortsightedness on the kernel level, and 
now KVM suffers from it on the user space level.

If you want to jump to the next level of technological quality you need to fix 
this attitude and you need to go back to the design roots of KVM. Concentrate 
on Qemu (as that is the weakest link now), make it a first class member of the 
KVM repo and simplify your development model by having a single repo:

 - move a clean (and minimal) version of the Qemu code base to tools/kvm/, in
   the upstream kernel repo, and work on that from that point on.

 - co-develop new features within the same patch. Release new versions of
   kvm-qemu and the kvm bits at the same time (together with the upstream
   kernel), at well defined points in time.

 - encourage kernel-space and user-space KVM developers to work on both 
   user-space and kernel-space bits as a single unit. It's one project and a
   single experience to the user.

 - [ and probably libvirt should go there too ]

If KVM's hypervisor and guest kernel code can enjoy the benefits of a single 
repository, why cannot the rest of KVM enjoy the same developer goodness? Only 
fixing that will bring the break-through in quality - not more manpower 
really.

Yes, i've read a thousand excuses for why this is an absolutely impossible and 
a bad thing to do, and none of them was really convincing to me - and you also 
have become rather emotional about all the arguments so it's hard to argue 
about it on 

Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Ingo Molnar

* Frank Ch. Eigler f...@redhat.com wrote:

 Hi -
 
 On Tue, Mar 16, 2010 at 06:04:10PM -0500, Anthony Liguori wrote:
  [...]
  The only way to really address this is to change the interaction.  
  Instead of running perf externally to qemu, we should support a perf 
  command in the qemu monitor that can then tie directly to the perf 
  tooling.  That gives us the best possible user experience.
 
 To what extent could this be solved with less crossing of 
 isolation/abstraction layers, if the perfctr facilities were properly 
 virtualized? [...]

Note, 'perfctr' is a different out-of-tree Linux kernel project run by someone 
else: it offers the /dev/perfctr special-purpose device that allows raw, 
unabstracted, low-level access to the PMU.

I suspect the one you wanted to mention here is called 'perf' or 'perf 
events'. (and used to be called 'performance counters' or 'perfcounters' until 
it got renamed about a year ago)

Thanks,

Ingo


Re: [PATCH v2] KVM: cleanup {kvm_vm_ioctl, kvm}_get_dirty_log()

2010-03-17 Thread Takuya Yoshikawa

Xiao Guangrong wrote:
Using bitmap_empty() to see whether memslot->dirty_bitmap is empty 


Changelog:
cleanup x86 specific kvm_vm_ioctl_get_dirty_log() and fix a local
parameter's type, addressing Takuya Yoshikawa's suggestion


Oh, for such a tiny comment.



Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/x86.c  |   17 -
 virt/kvm/kvm_main.c |7 ++-
 2 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bcf52d1..e6cbbd4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c


What I said was just that you may be able to use bitmap_empty() instead of

 -  for (i = 0; !is_dirty && i < n/sizeof(long); i++)
 -  is_dirty = memslot->dirty_bitmap[i];

for x86's code too, if your patch for kvm_get_dirty_log() was correct.





Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Ingo Molnar

* Avi Kivity a...@redhat.com wrote:

 Monitoring guests from the host is useful for kvm developers, but less so 
 for users.

Guest space profiling is easy, and 'perf kvm' is not about that. (plain 'perf' 
will work if a proper paravirt channel is opened to the host)

I think you might have misunderstood the purpose and role of the 'perf kvm' 
patch here? 'perf kvm' is aimed at KVM developers: it is them who improve KVM 
code, not guest kernel users.

Ingo


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Avi Kivity

On 03/17/2010 10:16 AM, Ingo Molnar wrote:

* Avi Kivity a...@redhat.com wrote:

   

Monitoring guests from the host is useful for kvm developers, but less so
for users.
 

Guest space profiling is easy, and 'perf kvm' is not about that. (plain 'perf'
will work if a proper paravirt channel is opened to the host)

I think you might have misunderstood the purpose and role of the 'perf kvm'
patch here? 'perf kvm' is aimed at KVM developers: it is them who improve KVM
code, not guest kernel users.
   


Of course I understood it.  My point was that 'perf kvm' serves a tiny 
minority of users.  That doesn't mean it isn't useful, just that it 
doesn't satisfy all needs by itself.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH v2] KVM: cleanup {kvm_vm_ioctl, kvm}_get_dirty_log()

2010-03-17 Thread Xiao Guangrong


Takuya Yoshikawa wrote:

 
 Oh, for such a tiny comment.

Your comment is valuable although it's tiny :-)

 

 
 What I said was just you may be able to use bitmap_empty() instead of
 
 -for (i = 0; !is_dirty && i < n/sizeof(long); i++)
 -is_dirty = memslot->dirty_bitmap[i];
 
 for x86's code too, if your patch for kvm_get_dirty_log() was correct.

While looking into the x86 code, i found we can directly call kvm_get_dirty_log()
in kvm_vm_ioctl_get_dirty_log() to remove some unnecessary code - this is a
better cleanup way

Thanks,
Xiao
 


2 serial ports?

2010-03-17 Thread Michael Tokarev
Since 0.12, it appears that kvm does not allow more than
2 serial ports for a guest:

$ kvm \
 -serial unix:s1,server,nowait \
 -serial unix:s2,server,nowait \
 -serial unix:s3,server,nowait
isa irq 4 already assigned

Is there a work-around for this?

Thanks!

/mjt


Re: [PATCH v2] KVM: cleanup {kvm_vm_ioctl, kvm}_get_dirty_log()

2010-03-17 Thread Takuya Yoshikawa

Xiao Guangrong wrote:


Takuya Yoshikawa wrote:


Oh, for such a tiny comment.


Your comment is valuable although it's tiny :-)



What I said was just you may be able to use bitmap_empty() instead of


-for (i = 0; !is_dirty && i < n/sizeof(long); i++)
-is_dirty = memslot->dirty_bitmap[i];

for x86's code too, if your patch for kvm_get_dirty_log() was correct.


While looking into the x86 code, i found we can directly call kvm_get_dirty_log()
in kvm_vm_ioctl_get_dirty_log() to remove some unnecessary code - this is a
better cleanup way


Ah, probably checking the git log will explain why it is like that!
Marcelo's work? IIRC.



Thanks,
Xiao
 




Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig
On Tue, Mar 16, 2010 at 01:08:28PM +0200, Avi Kivity wrote:
 If the batch size is larger than the virtio queue size, or if there are 
 no flushes at all, then yes the huge write cache gives more opportunity 
 for reordering.  But we're already talking hundreds of requests here.

Yes.  And remember those don't have to come from the same host.  Also
remember that we rather limit excessive reordering of O_DIRECT requests
in the I/O scheduler because they are synchronous type I/O while
we don't do that for pagecache writeback.

And we don't have unlimited virtio queue size, in fact it's quite
limited.


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Ingo Molnar

* Anthony Liguori aligu...@linux.vnet.ibm.com wrote:

 If you want to use a synthetic filesystem as the management interface for 
 qemu, that's one thing.  But you suggested exposing the guest filesystem in 
 its entirely and that's what I disagreed with.

What did you think, that it would be world-readable? Why would we do such a 
stupid thing? Any mounted content should at minimum match whatever policy 
covers the image file. The mounting of contents is not a privilege escalation 
and it is already possible today - just not integrated properly and not 
practical. (and apparently not implemented for all the wrong 'security' 
reasons)

 The guest may encrypt it's disk image.  It still ought to be possible to run 
 perf against that guest, no?

_In_ the guest you can of course run it just fine. (once paravirt bits are in 
place)

That has no connection to 'perf kvm' though, which this patch submission is 
about ...

If you want unified profiling of both host and guest then you need access to 
both the guest and the host. This is what the 'perf kvm' patch is about. 
Please read the patch, i think you might be misunderstanding what it does ...

Regarding encrypted contents - that's really a distraction but the host has 
absolute, 100% control over the guest and there's nothing the guest can do 
about that - unless you are thinking about the sub-sub-case of Orwellian 
DRM-locked-down systems - in which case there's nothing for the host to mount 
and the guest can reject any requests for information on itself and impose 
additional policy that way. So it's a security non-issue.

Note that DRM is pretty much the worst place to look at when it comes to 
usability: DRM lock-down is the antithesis of usability. Do you really want 
KVM to match the mind-set of the RIAA and MPAA? Why do you pretend that a 
developer cannot mount his own disk image? Pretty please, help Linux instead, 
where development is driven by usability and accessibility ...

Thanks,

Ingo


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Ingo Molnar

* Avi Kivity a...@redhat.com wrote:

 On 03/17/2010 10:16 AM, Ingo Molnar wrote:
 * Avi Kivitya...@redhat.com  wrote:
 
  Monitoring guests from the host is useful for kvm developers, but less so
  for users.
 
  Guest space profiling is easy, and 'perf kvm' is not about that. (plain 
  'perf' will work if a proper paravirt channel is opened to the host)
 
  I think you might have misunderstood the purpose and role of the 'perf 
  kvm' patch here? 'perf kvm' is aimed at KVM developers: it is them who 
  improve KVM code, not guest kernel users.
 
 Of course I understood it.  My point was that 'perf kvm' serves a tiny 
 minority of users. [...]

I hope you wont be disappointed to learn that 100% of Linux, all 13+ million 
lines of it, was and is being developed by a tiny, tiny, tiny minority of 
users ;-)

 [...]  That doesn't mean it isn't useful, just that it doesn't satisfy all 
 needs by itself.

Of course - and it doesnt bring world peace either. One step at a time.

Thanks,

Ingo


Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity

On 03/17/2010 10:49 AM, Christoph Hellwig wrote:

On Tue, Mar 16, 2010 at 01:08:28PM +0200, Avi Kivity wrote:
   

If the batch size is larger than the virtio queue size, or if there are
no flushes at all, then yes the huge write cache gives more opportunity
for reordering.  But we're already talking hundreds of requests here.
 

Yes.  And rememember those don't have to come from the same host.  Also
remember that we rather limit execssive reodering of O_DIRECT requests
in the I/O scheduler because they are synchronous type I/O while
we don't do that for pagecache writeback.
   


Maybe we should relax that for kvm.  Perhaps some of the problem comes 
from the fact that we call io_submit() once per request.



And we don't have unlimited virtio queue size, in fact it's quite
limited.
   


That can be extended easily if it fixes the problem.

--
error compiling committee.c: too many arguments to function



Re: 2 serial ports?

2010-03-17 Thread Gerd Hoffmann

On 03/17/10 09:38, Michael Tokarev wrote:

Since 0.12, it appears that kvm does not allow more than
2 serial ports for a guest:

$ kvm \
  -serial unix:s1,server,nowait \
  -serial unix:s2,server,nowait \
  -serial unix:s3,server,nowait
isa irq 4 already assigned

Is there a work-around for this?


Oh, well, yes, I remember.  qemu is more strict on ISA irq sharing now.
A bit too strict.

/me goes dig out an old patch which never made it upstream for some
reason I forgot.  Attached.


HTH,
  Gerd
From 7d5d53e8a23544ac6413487a8ecdd43537ade9f3 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann kra...@redhat.com
Date: Fri, 11 Sep 2009 13:43:46 +0200
Subject: [PATCH] isa: refine irq reservations

There are a few cases where IRQ sharing on the ISA bus is used and
possible.  In general only devices of the same kind can do that.
A few use cases:

  * serial lines 1+3 share irq 4
  * serial lines 2+4 share irq 3
  * parallel ports share irq 7
  * ppc/prep: ide ports share irq 13

This patch refines the irq reservation mechanism for the isa bus to
handle those cases.  It keeps track of the driver which owns the IRQ in
question and allows irq sharing for devices handled by the same driver.

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/isa-bus.c |   16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/hw/isa-bus.c b/hw/isa-bus.c
index 4d489d2..bd2f69c 100644
--- a/hw/isa-bus.c
+++ b/hw/isa-bus.c
@@ -26,6 +26,7 @@ struct ISABus {
 BusState qbus;
 qemu_irq *irqs;
 uint32_t assigned;
+DeviceInfo *irq_owner[16];
 };
 static ISABus *isabus;
 
@@ -71,7 +72,9 @@ qemu_irq isa_reserve_irq(int isairq)
 exit(1);
 }
 if (isabus->assigned & (1 << isairq)) {
-fprintf(stderr, "isa irq %d already assigned\n", isairq);
+DeviceInfo *owner = isabus->irq_owner[isairq];
+fprintf(stderr, "isa irq %d already assigned (%s)\n",
+isairq, owner ? owner->name : "unknown");
 exit(1);
 }
 isabus->assigned |= (1 << isairq);
@@ -82,10 +85,17 @@ void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq)
 {
 assert(dev->nirqs < ARRAY_SIZE(dev->isairq));
 if (isabus->assigned & (1 << isairq)) {
-fprintf(stderr, "isa irq %d already assigned\n", isairq);
-exit(1);
+DeviceInfo *owner = isabus->irq_owner[isairq];
+if (owner == dev->qdev.info) {
+/* irq sharing is ok in case the same driver handles both */;
+} else {
+fprintf(stderr, "isa irq %d already assigned (%s)\n",
+isairq, owner ? owner->name : "unknown");
+exit(1);
+}
 }
 isabus->assigned |= (1 << isairq);
+isabus->irq_owner[isairq] = dev->qdev.info;
 dev->isairq[dev->nirqs] = isairq;
 *p = isabus->irqs[isairq];
 dev->nirqs++;
-- 
1.6.6.1



Re: 2 serial ports?

2010-03-17 Thread Neo Jia
May I ask if it is possible to bind a real physical serial port to a guest?

Thanks,
Neo

On Wed, Mar 17, 2010 at 1:38 AM, Michael Tokarev m...@tls.msk.ru wrote:
 Since 0.12, it appears that kvm does not allow more than
 2 serial ports for a guest:

 $ kvm \
  -serial unix:s1,server,nowait \
  -serial unix:s2,server,nowait \
  -serial unix:s3,server,nowait
 isa irq 4 already assigned

 Is there a work-around for this?

 Thanks!

 /mjt




-- 
Remember that if researchers were not ambitious,
we probably wouldn't have the technology we are using today!


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Zhang, Yanmin
On Tue, 2010-03-16 at 10:47 +0100, Ingo Molnar wrote:
 * Zhang, Yanmin yanmin_zh...@linux.intel.com wrote:
 
  On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote:
   On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
 From: Zhang, Yanmin <yanmin_zh...@linux.intel.com>

 Based on the discussion in KVM community, I worked out the patch to 
 support
 perf to collect guest os statistics from host side. This patch is 
 implemented
 with Ingo, Peter and some other guys' kind help. Yang Sheng pointed 
 out a
 critical bug and provided good suggestions with other guys. I really 
 appreciate
 their kind help.

 The patch adds new subcommand kvm to perf.

perf kvm top
perf kvm record
perf kvm report
perf kvm diff

 The new perf can profile the guest os kernel but not guest os user space;
 however, it can summarize guest os user space utilization per guest os.

 Below are some examples.
 1) perf kvm top
 [r...@lkp-ne01 norm]# perf kvm --host --guest 
 --guestkallsyms=/home/ymzhang/guest/kallsyms
 --guestmodules=/home/ymzhang/guest/modules top



   Thanks for your kind comments.
   
Excellent, support for guest kernel != host kernel is critical (I can't 
remember the last time I ran same kernels).

How would we support multiple guests with different kernels?
   With the patch, 'perf kvm report --sort pid' could show
   summary statistics for all guest os instances. Then, use
   parameter --pid of 'perf kvm record' to collect single problematic 
   instance data.
  Sorry, I found that currently --pid doesn't mean a process but a thread (the main thread).
  
  Ingo,
  
  Is it possible to support a new parameter or extend --inherit, so 'perf 
  record' and 'perf top' could collect data on all threads of a process when 
  the process is running?
  
  If not, I need to add a new ugly parameter similar to --pid to filter
  out process data in userspace.
 
 Yeah. For maximum utility I'd suggest extending --pid to include this, and 
 introducing --tid for the previous, limited-to-a-single-task functionality.
 
 Most users would expect --pid to work like a 'late attach' - i.e. to work 
 like 
 strace -f or like a gdb attach.

Thanks Ingo, Avi.

I worked out below patch against tip/master of March 15th.

Subject: [PATCH] Change perf's parameter --pid to process-wide collection
From: Zhang, Yanmin yanmin_zh...@linux.intel.com

Change parameter -p (--pid) to mean a real process pid and add -t (--tid) for
the thread id. Now --pid means perf collects the statistics of all threads of
the process, while --tid means perf just collects the statistics of that thread.

BTW, the patch fixes a bug in 'perf stat -p'. 'perf stat' always configures
attr->disabled = 1 if it isn't a system-wide collection, so with a '-p'
and no forks, 'perf stat -p' doesn't collect any data. In addition, the
while (!done) loop in run_perf_stat consumes 100% of a single cpu, which has a
bad impact on the running workload, so I added a sleep(1) in the loop.

Signed-off-by: Zhang Yanmin yanmin_zh...@linux.intel.com

---

diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-record.c linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-record.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-record.c	2010-03-16 08:59:54.896488489 +0800
+++ linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-record.c	2010-03-17 16:30:17.71706 +0800
@@ -27,7 +27,7 @@
 #include <unistd.h>
 #include <sched.h>
 
-static int fd[MAX_NR_CPUS][MAX_COUNTERS];
+static int *fd[MAX_NR_CPUS][MAX_COUNTERS];
 
static long default_interval = 0;
 
@@ -43,6 +43,9 @@ static int raw_samples = 0;
 static int system_wide =  0;
 static int profile_cpu = -1;
 static pid_t   target_pid  = -1;
+static pid_t   target_tid  = -1;
+static int *all_tids   =  NULL;
+static int thread_num  =  0;
 static pid_t   child_pid   = -1;
 static int inherit =  1;
 static int force   =  0;
@@ -60,7 +63,7 @@ static struct timeval this_read;
 
 static u64 bytes_written   =  0;
 
-static struct pollfd   event_array[MAX_NR_CPUS * MAX_COUNTERS];
+static struct pollfd   *event_array;
 
 static int nr_poll =  0;
 static int nr_cpu  =  0;
@@ -77,7 +80,7 @@ struct mmap_data {
unsigned int   
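The patch above turns fd[] and event_array into dynamically sized arrays because process-wide --pid must attach one event per thread (per cpu). On Linux the thread list comes from /proc/<pid>/task; a hedged sketch of just that enumeration step (an illustrative helper, not code from the patch):

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill `tids` with the thread ids of process `pid` (up to `max` entries).
 * Returns the number of threads found, or -1 if the process doesn't exist. */
static int enumerate_tids(pid_t pid, pid_t *tids, int max)
{
    char path[64];
    struct dirent *ent;
    DIR *dir;
    int n = 0;

    snprintf(path, sizeof(path), "/proc/%d/task", (int)pid);
    dir = opendir(path);
    if (!dir)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;               /* skip "." and ".." */
        if (n < max)
            tids[n] = (pid_t)atoi(ent->d_name);
        n++;
    }
    closedir(dir);
    return n;
}
```

perf would then open one counter per (tid, cpu) pair, which is why the fixed-size fd and event arrays had to go.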

Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Sheng Yang
On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
 On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
  On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
   Right, but there is a window between kvm_guest_enter and really running
   in the guest os, where a perf event might overflow. Anyway, the window is
   very narrow; I will change it to use the flag PF_VCPU.
 
  There is also a window between setting the flag and calling 'int $2'
  where an NMI might happen and be accounted incorrectly.
 
  Perhaps separate the 'int $2' into a direct call into perf and another
  call for the rest of NMI handling.  I don't see how it would work on svm
  though - AFAICT the NMI is held whereas vmx swallows it.
 
   I guess NMIs
  will be disabled until the next IRET so it isn't racy, just tricky.
 
 I'm not sure whether vmexit breaks NMI context or not. A hardware NMI context
 isn't reentrant until an IRET. YangSheng would like to double-check it.

After more checking, I think VMX won't retain the NMI-blocked state for the host. 
That means, if an NMI happens while the processor is in VMX non-root mode, it 
would only result in a VMExit, with a reason indicating that it's due to the NMI, 
but no other state change in the host.

So in that sense, there _is_ a window between the VMExit and KVM handling the NMI. 
Moreover, I think we _can't_ stop re-entrance of the NMI handling code, because 
int $2 doesn't block a following NMI.

And if the NMI ordering is not important (I think it isn't), then we need to 
generate a real NMI in the current after-vmexit code. It seems that letting the 
APIC send an NMI IPI to itself is a good idea.

I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to replace 
int $2. Something unexpected is happening...

-- 
regards
Yang, Sheng


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Avi Kivity

On 03/17/2010 11:28 AM, Sheng Yang wrote:



I'm not sure whether vmexit breaks NMI context or not. A hardware NMI context
isn't reentrant until an IRET. YangSheng would like to double-check it.

After more checking, I think VMX won't retain the NMI-blocked state for the host.
That means, if an NMI happens while the processor is in VMX non-root mode, it
would only result in a VMExit, with a reason indicating that it's due to the NMI,
but no other state change in the host.

So in that sense, there _is_ a window between the VMExit and KVM handling the NMI.
Moreover, I think we _can't_ stop re-entrance of the NMI handling code, because
int $2 doesn't block a following NMI.


That's pretty bad, as NMI runs on a separate stack (via IST).  So if 
another NMI happens while our int $2 is running, the stack will be 
corrupted.



And if the NMI ordering is not important (I think it isn't), then we need to
generate a real NMI in the current after-vmexit code. It seems that letting the
APIC send an NMI IPI to itself is a good idea.

I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to replace
int $2. Something unexpected is happening...


I think you need DM_NMI for that to work correctly.

An alternative is to call the NMI handler directly.

--
error compiling committee.c: too many arguments to function



RE: [PATCH v1 2/3] Provides multiple submits and asynchronous notifications.

2010-03-17 Thread Xin, Xiaohui
 Michael,
 I don't use the kiocb that comes from sendmsg/recvmsg,
 since I have embedded the kiocb in the page_info structure,
 and allocate it when page_info is allocated.

So what I suggested was that vhost allocates and tracks the iocbs, and
passes them to your device with sendmsg/ recvmsg calls. This way your
device won't need to share structures and locking strategy with vhost:
you get an iocb, handle it, invoke a callback to notify vhost about
completion.

This also gets rid of the 'receiver' callback

I'm not sure the receiver callback can be removed here.
The patch describes a work flow like this:
netif_receive_skb() gets the packet; it does nothing but queue the skb
and wake up vhost's handle_rx(). handle_rx() then calls the receiver
callback to deal with the skb and collect the necessary notify info into a
list; vhost owns the list and uses it to complete, in the same handle_rx()
context.

We use the receiver callback here because only handle_rx() is woken up from
netif_receive_skb(), and we need mp device context to deal with the skb and
the notify info attached to it. We also take some locks in the callback
function.

If I removed the receiver callback, I could only deal with the skb and notify
info in netif_receive_skb(), but that function runs in interrupt context,
where I think taking locks is not allowed. And I cannot remove the lock.


 Please have a review, and thanks for the instructions on replying to
 email, which helped me a lot.
 
 Thanks,
 Xiaohui
 
  drivers/vhost/net.c   |  159 
  +++--
  drivers/vhost/vhost.h |   12 
  2 files changed, 166 insertions(+), 5 deletions(-)
 
 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index 22d5fef..5483848 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -17,11 +17,13 @@
  #include <linux/workqueue.h>
  #include <linux/rcupdate.h>
  #include <linux/file.h>
 +#include <linux/aio.h>
  
  #include <linux/net.h>
  #include <linux/if_packet.h>
  #include <linux/if_arp.h>
  #include <linux/if_tun.h>
 +#include <linux/mpassthru.h>
  
  #include <net/sock.h>
  
 @@ -91,6 +93,12 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock)
  net->tx_poll_state = VHOST_NET_POLL_STARTED;
  }
  
 +static void handle_async_rx_events_notify(struct vhost_net *net,
 +struct vhost_virtqueue *vq);
 +
 +static void handle_async_tx_events_notify(struct vhost_net *net,
 +struct vhost_virtqueue *vq);
 +

A couple of style comments:

- It's better to arrange functions in such order that forward declarations
aren't necessary.  Since we don't have recursion, this should always be
possible.

- continuation lines should be indented at least at the position of '('
on the previous line.

Thanks, I'll correct that.

  /* Expects to be always run from workqueue - which acts as
   * read-size critical section for our kind of RCU. */
  static void handle_tx(struct vhost_net *net)
 @@ -124,6 +132,8 @@ static void handle_tx(struct vhost_net *net)
  tx_poll_stop(net);
  hdr_size = vq->hdr_size;
  
 +handle_async_tx_events_notify(net, vq);
 +
  for (;;) {
  head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
   ARRAY_SIZE(vq->iov),
 @@ -151,6 +161,12 @@ static void handle_tx(struct vhost_net *net)
  /* Skip header. TODO: support TSO. */
  s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
  msg.msg_iovlen = out;
 +
 +if (vq->link_state == VHOST_VQ_LINK_ASYNC) {
 +vq->head = head;
 +msg.msg_control = (void *)vq;

So here a device gets a pointer to vhost_virtqueue structure. If it gets
an iocb and invokes a callback, it would not care about vhost internals.

 +}
 +
  len = iov_length(vq->iov, out);
  /* Sanity check */
  if (!len) {
 @@ -166,6 +182,10 @@ static void handle_tx(struct vhost_net *net)
  tx_poll_start(net, sock);
  break;
  }
 +
 +if (vq->link_state == VHOST_VQ_LINK_ASYNC)
 +continue;
+
  if (err != len)
  pr_err("Truncated TX packet: "
  "len %d != %zd\n", err, len);
 @@ -177,6 +197,8 @@ static void handle_tx(struct vhost_net *net)
  }
  }
  
 +handle_async_tx_events_notify(net, vq);
 +
  mutex_unlock(&vq->mutex);
  unuse_mm(net->dev.mm);
  }
@@ -206,7 +228,8 @@ static void handle_rx(struct vhost_net *net)
  int err;
  size_t hdr_size;
  struct socket *sock = rcu_dereference(vq->private_data);
 -if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
 +if (!sock || (skb_queue_empty(&sock->sk->sk_receive_queue) &&
 +vq->link_state == VHOST_VQ_LINK_SYNC))
  return;
  
  use_mm(net->dev.mm);
 @@ -214,9 +237,18 @@ static void handle_rx(struct vhost_net 

Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Sheng Yang
On Wednesday 17 March 2010 17:41:58 Avi Kivity wrote:
 On 03/17/2010 11:28 AM, Sheng Yang wrote:
  I'm not sure whether vmexit breaks NMI context or not. A hardware NMI
  context isn't reentrant until an IRET. YangSheng would like to
  double-check it.
 
  After more checking, I think VMX won't retain the NMI-blocked state for the
  host. That means, if an NMI happens while the processor is in VMX non-root
  mode, it would only result in a VMExit, with a reason indicating that it's
  due to the NMI, but no other state change in the host.
 
  So in that sense, there _is_ a window between the VMExit and KVM handling
  the NMI. Moreover, I think we _can't_ stop re-entrance of the NMI handling
  code, because int $2 doesn't block a following NMI.
 
 That's pretty bad, as NMI runs on a separate stack (via IST).  So if
 another NMI happens while our int $2 is running, the stack will be
 corrupted.

Though the hardware doesn't provide this kind of blocking, software would at 
least warn about it... nmi_enter() would still be executed by int $2, and 
result in BUG() if we are already in NMI context (OK, that is a little better 
than a mysterious crash due to a corrupted stack).
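The nmi_enter() behaviour described above can be modelled in plain userspace: a depth counter that trips when the handler is entered again before the previous entry has exited. This is only a model of why re-entering the real NMI path (which runs on a single IST stack) is fatal; the names are made up, not kernel API:

```c
static int nmi_depth;          /* models the in-NMI state nmi_enter() tracks */

/* Returns 0 on a clean entry, -1 if we re-entered (models BUG()). */
static int model_nmi_enter(void)
{
    if (nmi_depth++ > 0)
        return -1;             /* nested NMI: the real stack would be corrupted */
    return 0;
}

static void model_nmi_exit(void)
{
    nmi_depth--;
}

/* Simulate: a real NMI arrives while the handler for a previous one
 * (entered via "int $2" after a vmexit) is still running. */
static int simulate_nested_nmi(void)
{
    int first = model_nmi_enter();   /* the int $2 path */
    int second = model_nmi_enter();  /* an NMI arriving meanwhile */
    model_nmi_exit();
    model_nmi_exit();
    return (first == 0 && second == -1);
}
```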
 
  And if the NMI ordering is not important (I think it isn't), then we need
  to generate a real NMI in the current after-vmexit code. It seems that
  letting the APIC send an NMI IPI to itself is a good idea.
 
  I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
  replace int $2. Something unexpected is happening...
 
 I think you need DM_NMI for that to work correctly.
 
 An alternative is to call the NMI handler directly.

apic_send_IPI_self() already took care of APIC_DM_NMI.

And would the NMI handler block the following NMI?

-- 
regards
Yang, Sheng


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Avi Kivity

On 03/17/2010 11:51 AM, Sheng Yang wrote:



I think you need DM_NMI for that to work correctly.

An alternative is to call the NMI handler directly.

apic_send_IPI_self() already took care of APIC_DM_NMI.


So it does (though not for x2apic?).  I don't see why it doesn't work.


And would the NMI handler block the following NMI?



It wouldn't - won't work without extensive changes.

--
error compiling committee.c: too many arguments to function



Re: [PATCH v2] KVM: cleanup {kvm_vm_ioctl, kvm}_get_dirty_log()

2010-03-17 Thread Xiao Guangrong

Takuya Yoshikawa wrote:

 
 Ah, probably checking the git log will explain to you why it is like that!
 Marcelo's work, IIRC.

Oh, I found this commit:

commit 706831a7faec7ac0d3057d20df8234c45bbbc3c5
Author: Marcelo Tosatti mtosa...@redhat.com
Date:   Wed Dec 23 14:35:22 2009 -0200

KVM: use SRCU for dirty log

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

But I don't know why Marcelo separated kvm_get_dirty_log()'s code
into kvm_vm_ioctl_get_dirty_log(). :-(

Thanks,
Xiao



Re: [PATCH v1 2/3] Provides multiple submits and asynchronous notifications.

2010-03-17 Thread Michael S. Tsirkin
On Wed, Mar 17, 2010 at 05:48:10PM +0800, Xin, Xiaohui wrote:
  Michael,
  I don't use the kiocb that comes from sendmsg/recvmsg,
  since I have embedded the kiocb in the page_info structure,
  and allocate it when page_info is allocated.
 
 So what I suggested was that vhost allocates and tracks the iocbs, and
 passes them to your device with sendmsg/ recvmsg calls. This way your
 device won't need to share structures and locking strategy with vhost:
 you get an iocb, handle it, invoke a callback to notify vhost about
 completion.
 
 This also gets rid of the 'receiver' callback
 
 I'm not sure the receiver callback can be removed here.
 The patch describes a work flow like this:
 netif_receive_skb() gets the packet; it does nothing but queue the skb
 and wake up vhost's handle_rx(). handle_rx() then calls the receiver
 callback to deal with the skb and collect the necessary notify info into a
 list; vhost owns the list and uses it to complete, in the same handle_rx()
 context.
 
 We use the receiver callback here because only handle_rx() is woken up from
 netif_receive_skb(), and we need mp device context to deal with the skb and
 the notify info attached to it. We also take some locks in the callback
 function.
 
 If I removed the receiver callback, I could only deal with the skb and notify
 info in netif_receive_skb(), but that function runs in interrupt context,
 where I think taking locks is not allowed. And I cannot remove the lock.
 
 

The basic idea is that vhost passes an iocb to recvmsg, and the backend
completes the iocb to signal that data is ready. That completion could be in
interrupt context, so we do need to switch to a workqueue to handle the
event, it's true, but the code to do this would live in vhost.c or net.c.

With this structure your device won't depend on
vhost, and can go under drivers/net/, opening up possibility
to use it for zero copy without vhost in the future.
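The division of labour Michael describes, where vhost owns and tracks the iocbs and the backend merely completes them through a callback, is a completion-token pattern. A hedged sketch with invented names (this is not the vhost or mp-device API):

```c
#include <stddef.h>

/* A minimal completion token: the producer (vhost) allocates it, the
 * consumer (backend device) fills in the result and calls complete(). */
struct iocb {
    long result;
    void (*complete)(struct iocb *iocb, void *opaque);
    void *opaque;               /* producer-private context */
};

/* Backend side: finish the operation without knowing producer internals. */
static void backend_finish(struct iocb *iocb, long bytes)
{
    iocb->result = bytes;
    iocb->complete(iocb, iocb->opaque);   /* notify the producer */
}

/* Producer side: the callback runs with producer-owned state. */
struct producer_state {
    int completions;
    long last_result;
};

static void on_complete(struct iocb *iocb, void *opaque)
{
    struct producer_state *s = opaque;
    s->completions++;
    s->last_result = iocb->result;
}
```

With this shape the backend never needs to see the producer's virtqueue structures or take its locks; all producer state stays behind the opaque pointer.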



  Please have a review, and thanks for the instructions on replying to
  email, which helped me a lot.
  
  Thanks,
  Xiaohui
  
   drivers/vhost/net.c   |  159 
   +++--
   drivers/vhost/vhost.h |   12 
   2 files changed, 166 insertions(+), 5 deletions(-)
  
  diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
  index 22d5fef..5483848 100644
  --- a/drivers/vhost/net.c
  +++ b/drivers/vhost/net.c
  @@ -17,11 +17,13 @@
   #include <linux/workqueue.h>
   #include <linux/rcupdate.h>
   #include <linux/file.h>
  +#include <linux/aio.h>
   
   #include <linux/net.h>
   #include <linux/if_packet.h>
   #include <linux/if_arp.h>
   #include <linux/if_tun.h>
  +#include <linux/mpassthru.h>
   
   #include <net/sock.h>
   
  @@ -91,6 +93,12 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock)
  	net->tx_poll_state = VHOST_NET_POLL_STARTED;
   }
   
  +static void handle_async_rx_events_notify(struct vhost_net *net,
  +  struct vhost_virtqueue *vq);
  +
  +static void handle_async_tx_events_notify(struct vhost_net *net,
  +  struct vhost_virtqueue *vq);
  +
 
 A couple of style comments:
 
 - It's better to arrange functions in such order that forward declarations
 aren't necessary.  Since we don't have recursion, this should always be
 possible.
 
 - continuation lines should be indented at least at the position of '('
 on the previous line.
 
 Thanks, I'll correct that.
 
   /* Expects to be always run from workqueue - which acts as
* read-size critical section for our kind of RCU. */
   static void handle_tx(struct vhost_net *net)
  @@ -124,6 +132,8 @@ static void handle_tx(struct vhost_net *net)
 tx_poll_stop(net);
 	hdr_size = vq->hdr_size;
   
  +  handle_async_tx_events_notify(net, vq);
  +
 for (;;) {
 	head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
 				 ARRAY_SIZE(vq->iov),
  @@ -151,6 +161,12 @@ static void handle_tx(struct vhost_net *net)
 /* Skip header. TODO: support TSO. */
 	s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
 	msg.msg_iovlen = out;
  +
  +	if (vq->link_state == VHOST_VQ_LINK_ASYNC) {
  +		vq->head = head;
  +		msg.msg_control = (void *)vq;
 
 So here a device gets a pointer to vhost_virtqueue structure. If it gets
 an iocb and invokes a callback, it would not care about vhost internals.
 
  +  }
  +
 	len = iov_length(vq->iov, out);
 /* Sanity check */
 if (!len) {
  @@ -166,6 +182,10 @@ static void handle_tx(struct vhost_net *net)
 tx_poll_start(net, sock);
 break;
 }
  +
  +	if (vq->link_state == VHOST_VQ_LINK_ASYNC)
  +  continue;
 +
 if (err != len)
 		pr_err("Truncated TX packet: "
 		       "len %d != %zd\n", err, len);
  @@ -177,6 +197,8 @@ static void handle_tx(struct vhost_net *net)
 }
  

Re: 2 serial ports?

2010-03-17 Thread Michael Tokarev
Gerd Hoffmann wrote:
 On 03/17/10 09:38, Michael Tokarev wrote:
 Since 0.12, it appears that kvm does not allow more than
 2 serial ports for a guest:

 $ kvm \
   -serial unix:s1,server,nowait \
   -serial unix:s2,server,nowait \
   -serial unix:s3,server,nowait
 isa irq 4 already assigned

 Is there a work-around for this?
 
 Oh, well, yes, I remember.  qemu is more strict on ISA irq sharing now.
  A bit too strict.
 
 /me goes dig out an old patch which never made it upstream for some
 reason I forgot.  Attached.

I tried the patch, and it now appears to work.  I did not try
to run various stress tests so far, but basic tests are fine.

Thank you Gerd!  And I think it's time to push it finally :)

/mjt


Re: 2 serial ports?

2010-03-17 Thread Michael Tokarev
Neo Jia wrote:
 May I ask if it is possible to bind a real physical serial port to a guest?

It is all described in the documentation, quite a long list of
various things you can attach to a virtual serial port, incl.
a real one.

/mjt


Re: [Qemu-devel] Re: 2 serial ports?

2010-03-17 Thread Paul Brook
 Oh, well, yes, I remember.  qemu is more strict on ISA irq sharing now.
   A bit too strict.
 
 /me goes dig out an old patch which never made it upstream for some
 reason I forgot.  Attached.

This is wrong. Two devices should never be manipulating the same qemu_irq 
object.  If you want multiple devices connected to the same IRQ then you need 
an explicit multiplexer. e.g. arm_timer.c:sp804_set_irq.

Paul
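For reference, the multiplexer Paul is pointing at keeps a level for each input and drives its single output with the logical OR of them, in the style of arm_timer.c:sp804_set_irq. A standalone model (qemu's qemu_irq plumbing elided; names are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* OR-gate IRQ multiplexer: the output is raised while any input is. */
struct irq_mux {
    uint32_t level;                       /* bit n = level of input n */
    int out;                              /* current output level */
    void (*set_out)(int level, void *opaque);
    void *opaque;
};

static void irq_mux_set(struct irq_mux *mux, int input, int level)
{
    uint32_t old = mux->level;

    if (level)
        mux->level |= 1u << input;
    else
        mux->level &= ~(1u << input);

    /* Only propagate edges on the combined line. */
    if (!!mux->level != !!old) {
        mux->out = !!mux->level;
        if (mux->set_out)
            mux->set_out(mux->out, mux->opaque);
    }
}
```

Wired in front of the PIC, two UARTs would each drive their own mux input, and the shared ISA line would only fall once both have deasserted.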


[PATCH] KVM test: Make qcow2 check image non critical

2010-03-17 Thread Lucas Meneghel Rodrigues
Instead of forcing the vms to shut down due to the qemu-img
check step, just make the postprocess step non-critical,
ie, it doesn't make the test fail. The check
is still there, but it won't mask the results of the tests
themselves, while still providing useful additional info.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/tests_base.cfg.sample |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index beae786..bb455e6 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -1049,8 +1049,7 @@ variants:
 post_command =  python scripts/check_image.py;
 remove_image = no
 post_command_timeout = 600
-kill_vm = yes
-kill_vm_gracefully = yes
+post_command_noncritical = yes
 - vmdk:
 only Fedora Ubuntu Windows
 only smp2
-- 
1.6.6.1



Re: [Autotest] [PATCH] KVM-test: SR-IOV: Fix a bug that wrongly check VFs count

2010-03-17 Thread Lucas Meneghel Rodrigues
On Thu, Mar 11, 2010 at 2:54 AM, Yolkfull Chow yz...@redhat.com wrote:
 The parameter 'devices_requested' is unrelated to the driver option
 'max_vfs' of 'igb'.

 NIC card 82576 has two network interfaces, and each can be
 virtualized into up to 7 virtual functions; therefore we multiply
 the value of the driver option 'max_vfs' by two to get
 the total number of VFs.

Applied, thanks!

 Signed-off-by: Yolkfull Chow yz...@redhat.com
 ---
  client/tests/kvm/kvm_utils.py |   19 +--
  1 files changed, 13 insertions(+), 6 deletions(-)

 diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
 index 4565dc1..1813ed1 100644
 --- a/client/tests/kvm/kvm_utils.py
 +++ b/client/tests/kvm/kvm_utils.py
 @@ -1012,17 +1012,22 @@ class PciAssignable(object):
         """
         Get VFs count number according to lspci.
         """
 +        # FIXME: Need to think out a method of identifying which
 +        # 'virtual function' belongs to which physical card, considering
 +        # that the host may have more than one 82576 card. PCI_ID?
         cmd = "lspci | grep 'Virtual Function' | wc -l"
 -        # For each VF we'll see 2 prints of 'Virtual Function', so let's
 -        # divide the result per 2
 -        return int(commands.getoutput(cmd)) / 2
 +        return int(commands.getoutput(cmd))


     def check_vfs_count(self):
         """
         Check VFs count number according to the parameter driver_options.
         """
 -        return (self.get_vfs_count == self.devices_requested)
 +        # Network card 82576 has two network interfaces and each can be
 +        # virtualized up to 7 virtual functions, therefore we multiply
 +        # two for the value of driver_option 'max_vfs'.
 +        expected_count = int((re.findall("(\d)", self.driver_option)[0])) * 2
 +        return (self.get_vfs_count() == expected_count)


     def is_binded_to_stub(self, full_id):
 @@ -1054,15 +1059,17 @@ class PciAssignable(object):
         elif not self.check_vfs_count():
             os.system("modprobe -r %s" % self.driver)
             re_probe = True
 +        else:
 +            return True

         # Re-probe driver with proper number of VFs
         if re_probe:
             cmd = "modprobe %s %s" % (self.driver, self.driver_option)
 +            logging.info("Loading the driver '%s' with option '%s'" %
 +                                   (self.driver, self.driver_option))
             s, o = commands.getstatusoutput(cmd)
             if s:
                 return False
 -            if not self.check_vfs_count():
 -                return False
             return True


 --
 1.7.0.1

 ___
 Autotest mailing list
 autot...@test.kernel.org
 http://test.kernel.org/cgi-bin/mailman/listinfo/autotest




-- 
Lucas


Re: [Autotest] [PATCH 1/2] KVM test: Refactoring the 'autotest' subtest

2010-03-17 Thread Lucas Meneghel Rodrigues
On Fri, Feb 26, 2010 at 1:13 AM, sudhir kumar smalik...@gmail.com wrote:
 Looks good to me. It will definitely boost test speed for certain
 tests and give flexibility to use existing autotest strengths in a more
 granular way.

Thank you! FYI, this patch was applied, mainly because it's not
dependent on cpu_set test itself:

http://autotest.kernel.org/changeset/4308

 On Fri, Feb 26, 2010 at 1:13 AM, Lucas Meneghel Rodrigues
 l...@redhat.com wrote:
 Refactor autotest subtest into a utility function, so other
 KVM subtests can run autotest control files in hosts as part
 of their routine.

 This arrangement was made to accommodate the upcoming 'cpu_set'
 test.

 Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
 ---
  client/tests/kvm/kvm_test_utils.py |  165 
 +++-
  client/tests/kvm/tests/autotest.py |  153 ++---
  2 files changed, 171 insertions(+), 147 deletions(-)

 diff --git a/client/tests/kvm/kvm_test_utils.py 
 b/client/tests/kvm/kvm_test_utils.py
 index 7d96d6e..71d6303 100644
 --- a/client/tests/kvm/kvm_test_utils.py
 +++ b/client/tests/kvm/kvm_test_utils.py
 @@ -24,7 +24,7 @@ More specifically:
  import time, os, logging, re, commands
  from autotest_lib.client.common_lib import error
  from autotest_lib.client.bin import utils
 -import kvm_utils, kvm_vm, kvm_subprocess
 +import kvm_utils, kvm_vm, kvm_subprocess, scan_results


  def get_living_vm(env, vm_name):
 @@ -237,3 +237,166 @@ def get_memory_info(lvms):
     meminfo = meminfo[0:-2] + }

     return meminfo
 +
 +
 +def run_autotest(vm, session, control_path, timeout, test_name, outputdir):
 +    """
 +    Run an autotest control file inside a guest (linux only utility).
 +
 +    @param vm: VM object.
 +    @param session: A shell session on the VM provided.
 +    @param control_path: An autotest control file.
 +    @param timeout: Timeout under which the autotest test must complete.
 +    @param test_name: Autotest client test name.
 +    @param outputdir: Path on host where we should copy the guest autotest
 +            results to.
 +    """
 +    def copy_if_size_differs(vm, local_path, remote_path):
 +        """
 +        Copy a file to a guest if it doesn't exist or if its size differs.
 +
 +        @param vm: VM object.
 +        @param local_path: Local path.
 +        @param remote_path: Remote path.
 +        """
 +        copy = False
 +        basename = os.path.basename(local_path)
 +        local_size = os.path.getsize(local_path)
 +        output = session.get_command_output("ls -l %s" % remote_path)
 +        if "such file" in output:
 +            logging.info("Copying %s to guest (remote file is missing)" %
 +                         basename)
 +            copy = True
 +        else:
 +            try:
 +                remote_size = output.split()[4]
 +                remote_size = int(remote_size)
 +            except (IndexError, ValueError):
 +                logging.error("Check for remote path size %s returned %s. "
 +                              "Cannot process.", remote_path, output)
 +                raise error.TestFail("Failed to check for %s (Guest died?)" %
 +                                     remote_path)
 +            if remote_size != local_size:
 +                logging.debug("Copying %s to guest due to size mismatch "
 +                              "(remote size %s, local size %s)" %
 +                              (basename, remote_size, local_size))
 +                copy = True
 +
 +        if copy:
 +            if not vm.copy_files_to(local_path, remote_path):
 +                raise error.TestFail("Could not copy %s to guest" %
 +                                     local_path)
 +
 +
 +    def extract(vm, remote_path, dest_dir="."):
 +        """
 +        Extract a .tar.bz2 file on the guest.
 +
 +        @param vm: VM object
 +        @param remote_path: Remote file path
 +        @param dest_dir: Destination dir for the contents
 +        """
 +        basename = os.path.basename(remote_path)
 +        logging.info("Extracting %s..." % basename)
 +        (status, output) = session.get_command_status_output(
 +                                  "tar xjvf %s -C %s" % (remote_path, dest_dir))
 +        if status != 0:
 +            logging.error("Uncompress output:\n%s" % output)
 +            raise error.TestFail("Could not extract %s on guest" % basename)
 +
 +    if not os.path.isfile(control_path):
 +        raise error.TestError("Invalid path to autotest control file: %s" %
 +                              control_path)
 +
 +    tarred_autotest_path = "/tmp/autotest.tar.bz2"
 +    tarred_test_path = "/tmp/%s.tar.bz2" % test_name
 +
 +    # To avoid problems, let's make the test use the current AUTODIR
 +    # (autotest client path) location
 +    autotest_path = os.environ['AUTODIR']
 +    tests_path = os.path.join(autotest_path, 'tests')
 +    test_path = os.path.join(tests_path, test_name)
 +
 +    # tar the contents of bindir/autotest
 +    cmd = tar cvjf %s %s/* % (tarred_autotest_path, 

[PATCHv6 0/4] qemu-kvm: vhost net port

2010-03-17 Thread Michael S. Tsirkin
This is port of vhost v6 patch set I posted previously to qemu-kvm, for
those that want to get good performance out of it :) This patchset needs
to be applied when qemu.git one gets merged, this includes irqchip
support.

Changes from previous version:
- check kvm_enabled in irqfd call

Michael S. Tsirkin (4):
  qemu-kvm: add vhost.h header
  kvm: irqfd support
  msix: add mask/unmask notifiers
  virtio-pci: irqfd support

 hw/msix.c |   36 -
 hw/msix.h |1 +
 hw/pci.h  |6 ++
 hw/virtio-pci.c   |   27 +
 kvm-all.c |   19 +++
 kvm.h |   10 
 kvm/include/linux/vhost.h |  130 +
 7 files changed, 228 insertions(+), 1 deletions(-)
 create mode 100644 kvm/include/linux/vhost.h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv6 1/4] qemu-kvm: add vhost.h header

2010-03-17 Thread Michael S. Tsirkin
This makes it possible to build vhost support
on systems which do not have this header.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 kvm/include/linux/vhost.h |  130 +
 1 files changed, 130 insertions(+), 0 deletions(-)
 create mode 100644 kvm/include/linux/vhost.h

diff --git a/kvm/include/linux/vhost.h b/kvm/include/linux/vhost.h
new file mode 100644
index 000..165a484
--- /dev/null
+++ b/kvm/include/linux/vhost.h
@@ -0,0 +1,130 @@
+#ifndef _LINUX_VHOST_H
+#define _LINUX_VHOST_H
+/* Userspace interface for in-kernel virtio accelerators. */
+
+/* vhost is used to reduce the number of system calls involved in virtio.
+ *
+ * Existing virtio net code is used in the guest without modification.
+ *
+ * This header includes interface used by userspace hypervisor for
+ * device configuration.
+ */
+
+#include <linux/types.h>
+
+#include <linux/ioctl.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_ring.h>
+
+struct vhost_vring_state {
+   unsigned int index;
+   unsigned int num;
+};
+
+struct vhost_vring_file {
+   unsigned int index;
+   int fd; /* Pass -1 to unbind from file. */
+
+};
+
+struct vhost_vring_addr {
+   unsigned int index;
+   /* Option flags. */
+   unsigned int flags;
+   /* Flag values: */
+   /* Whether log address is valid. If set enables logging. */
+#define VHOST_VRING_F_LOG 0
+
+   /* Start of array of descriptors (virtually contiguous) */
+   __u64 desc_user_addr;
+   /* Used structure address. Must be 32 bit aligned */
+   __u64 used_user_addr;
+   /* Available structure address. Must be 16 bit aligned */
+   __u64 avail_user_addr;
+   /* Logging support. */
+   /* Log writes to used structure, at offset calculated from specified
+* address. Address must be 32 bit aligned. */
+   __u64 log_guest_addr;
+};
+
+struct vhost_memory_region {
+   __u64 guest_phys_addr;
+   __u64 memory_size; /* bytes */
+   __u64 userspace_addr;
+   __u64 flags_padding; /* No flags are currently specified. */
+};
+
+/* All region addresses and sizes must be 4K aligned. */
+#define VHOST_PAGE_SIZE 0x1000
+
+struct vhost_memory {
+   __u32 nregions;
+   __u32 padding;
+   struct vhost_memory_region regions[0];
+};
+
+/* ioctls */
+
+#define VHOST_VIRTIO 0xAF
+
+/* Features bitmask for forward compatibility.  Transport bits are used for
+ * vhost specific features. */
+#define VHOST_GET_FEATURES _IOR(VHOST_VIRTIO, 0x00, __u64)
+#define VHOST_SET_FEATURES _IOW(VHOST_VIRTIO, 0x00, __u64)
+
+/* Set current process as the (exclusive) owner of this file descriptor.  This
+ * must be called before any other vhost command.  Further calls to
+ * VHOST_OWNER_SET fail until VHOST_OWNER_RESET is called. */
+#define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
+/* Give up ownership, and reset the device to default values.
+ * Allows subsequent call to VHOST_OWNER_SET to succeed. */
+#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
+
+/* Set up/modify memory layout */
+#define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory)
+
+/* Write logging setup. */
+/* Memory writes can optionally be logged by setting bit at an offset
+ * (calculated from the physical address) from specified log base.
+ * The bit is set using an atomic 32 bit operation. */
+/* Set base address for logging. */
+#define VHOST_SET_LOG_BASE _IOW(VHOST_VIRTIO, 0x04, __u64)
+/* Specify an eventfd file descriptor to signal on log write. */
+#define VHOST_SET_LOG_FD _IOW(VHOST_VIRTIO, 0x07, int)
+
+/* Ring setup. */
+/* Set number of descriptors in ring. This parameter can not
+ * be modified while ring is running (bound to a device). */
+#define VHOST_SET_VRING_NUM _IOW(VHOST_VIRTIO, 0x10, struct vhost_vring_state)
+/* Set addresses for the ring. */
+#define VHOST_SET_VRING_ADDR _IOW(VHOST_VIRTIO, 0x11, struct vhost_vring_addr)
+/* Base value where queue looks for available descriptors */
+#define VHOST_SET_VRING_BASE _IOW(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
+/* Get accessor: reads index, writes value in num */
+#define VHOST_GET_VRING_BASE _IOWR(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
+
+/* The following ioctls use eventfd file descriptors to signal and poll
+ * for events. */
+
+/* Set eventfd to poll for added buffers */
+#define VHOST_SET_VRING_KICK _IOW(VHOST_VIRTIO, 0x20, struct vhost_vring_file)
+/* Set eventfd to signal when buffers have been used */
+#define VHOST_SET_VRING_CALL _IOW(VHOST_VIRTIO, 0x21, struct vhost_vring_file)
+/* Set eventfd to signal an error */
+#define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
+
+/* VHOST_NET specific defines */
+
+/* Attach virtio net ring to a raw socket, or tap device.
+ * The socket must be already bound to an ethernet device, this device will be
+ * used for transmit.  Pass fd -1 to unbind from the socket and the transmit
+ * device.  This can be used to stop the ring 

[PATCHv6 2/4] kvm: irqfd support

2010-03-17 Thread Michael S. Tsirkin
Add API to assign/deassign irqfd to kvm.
Add stub so that users do not have to use
ifdefs.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 kvm-all.c |   19 +++
 kvm.h |   10 ++
 2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 7b05462..1a15662 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1200,5 +1200,24 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign)
 }
 #endif
 
+#if defined(KVM_IRQFD)
+int kvm_set_irqfd(int gsi, int fd, bool assigned)
+{
+    struct kvm_irqfd irqfd = {
+        .fd = fd,
+        .gsi = gsi,
+        .flags = assigned ? 0 : KVM_IRQFD_FLAG_DEASSIGN,
+    };
+    int r;
+    if (!kvm_enabled() || !kvm_irqchip_in_kernel())
+        return -ENOSYS;
+
+    r = kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd);
+    if (r < 0)
+        return r;
+    return 0;
+}
+#endif
+
 #undef PAGE_SIZE
 #include qemu-kvm.c
diff --git a/kvm.h b/kvm.h
index 0951380..72dcaca 100644
--- a/kvm.h
+++ b/kvm.h
@@ -180,4 +180,14 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign)
 }
 #endif
 
+#if defined(KVM_IRQFD) && defined(CONFIG_KVM)
+int kvm_set_irqfd(int gsi, int fd, bool assigned);
+#else
+static inline
+int kvm_set_irqfd(int gsi, int fd, bool assigned)
+{
+    return -ENOSYS;
+}
+#endif
+
 #endif
-- 
1.7.0.18.g0d53a5



[PATCHv6 3/4] msix: add mask/unmask notifiers

2010-03-17 Thread Michael S. Tsirkin
Support per-vector callbacks for msix mask/unmask.
Will be used for vhost net.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/msix.c |   36 +++-
 hw/msix.h |1 +
 hw/pci.h  |6 ++
 3 files changed, 42 insertions(+), 1 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index faee0b2..3ec8805 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -317,6 +317,13 @@ static void msix_mmio_writel(void *opaque, target_phys_addr_t addr,
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
     }
+    if (was_masked != msix_is_masked(dev, vector) &&
+        dev->msix_mask_notifier && dev->msix_mask_notifier_opaque[vector]) {
+        int r = dev->msix_mask_notifier(dev, vector,
+                                        dev->msix_mask_notifier_opaque[vector],
+                                        msix_is_masked(dev, vector));
+        assert(r >= 0);
+    }
     msix_handle_mask_update(dev, vector);
 }
 
@@ -355,10 +362,18 @@ void msix_mmio_map(PCIDevice *d, int region_num,
 
 static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
 {
-    int vector;
+    int vector, r;
     for (vector = 0; vector < nentries; ++vector) {
         unsigned offset = vector * MSIX_ENTRY_SIZE + MSIX_VECTOR_CTRL;
+        int was_masked = msix_is_masked(dev, vector);
         dev->msix_table_page[offset] |= MSIX_VECTOR_MASK;
+        if (was_masked != msix_is_masked(dev, vector) &&
+            dev->msix_mask_notifier && dev->msix_mask_notifier_opaque[vector]) {
+            r = dev->msix_mask_notifier(dev, vector,
+                                        dev->msix_mask_notifier_opaque[vector],
+                                        msix_is_masked(dev, vector));
+            assert(r >= 0);
+        }
     }
 }
 
@@ -381,6 +396,9 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
                             sizeof *dev->msix_irq_entries);
     }
 #endif
+    dev->msix_mask_notifier_opaque =
+        qemu_mallocz(nentries * sizeof *dev->msix_mask_notifier_opaque);
+    dev->msix_mask_notifier = NULL;
     dev->msix_entry_used = qemu_mallocz(MSIX_MAX_ENTRIES *
                                         sizeof *dev->msix_entry_used);
 
@@ -443,6 +461,8 @@ int msix_uninit(PCIDevice *dev)
     dev->msix_entry_used = NULL;
     qemu_free(dev->msix_irq_entries);
     dev->msix_irq_entries = NULL;
+    qemu_free(dev->msix_mask_notifier_opaque);
+    dev->msix_mask_notifier_opaque = NULL;
     dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
     return 0;
 }
@@ -586,3 +606,17 @@ void msix_unuse_all_vectors(PCIDevice *dev)
         return;
     msix_free_irq_entries(dev);
 }
+
+int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
+{
+    int r = 0;
+    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
+        return 0;
+
+    if (dev->msix_mask_notifier)
+        r = dev->msix_mask_notifier(dev, vector, opaque,
+                                    msix_is_masked(dev, vector));
+    if (r >= 0)
+        dev->msix_mask_notifier_opaque[vector] = opaque;
+    return r;
+}
diff --git a/hw/msix.h b/hw/msix.h
index a9f7993..f167231 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -33,4 +33,5 @@ void msix_reset(PCIDevice *dev);
 
 extern int msix_supported;
 
+int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque);
 #endif
diff --git a/hw/pci.h b/hw/pci.h
index 1eab8f2..100104c 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -136,6 +136,9 @@ enum {
 #define PCI_CAPABILITY_CONFIG_MSI_LENGTH 0x10
 #define PCI_CAPABILITY_CONFIG_MSIX_LENGTH 0x10
 
+typedef int (*msix_mask_notifier_func)(PCIDevice *, unsigned vector,
+                                       void *opaque, int masked);
+
 struct PCIDevice {
 DeviceState qdev;
 /* PCI config space */
@@ -201,6 +204,9 @@ struct PCIDevice {
 
 struct kvm_irq_routing_entry *msix_irq_entries;
 
+void **msix_mask_notifier_opaque;
+msix_mask_notifier_func msix_mask_notifier;
+
 /* Device capability configuration space */
 struct {
 int supported;
-- 
1.7.0.18.g0d53a5



[PATCHv6 4/4] virtio-pci: irqfd support

2010-03-17 Thread Michael S. Tsirkin
Use irqfd when supported by kernel.
This uses msix mask notifiers: when vector is masked, we poll it from
userspace.  When it is unmasked, we poll it from kernel.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/virtio-pci.c |   27 +++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 4255d98..f8d8022 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -402,6 +402,27 @@ static void virtio_pci_guest_notifier_read(void *opaque)
 }
 }
 
+static int virtio_pci_mask_notifier(PCIDevice *dev, unsigned vector,
+                                    void *opaque, int masked)
+{
+    VirtQueue *vq = opaque;
+    EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
+    int r = kvm_set_irqfd(dev->msix_irq_entries[vector].gsi,
+                          event_notifier_get_fd(notifier),
+                          !masked);
+    if (r < 0) {
+        return (r == -ENOSYS) ? 0 : r;
+    }
+    if (masked) {
+        qemu_set_fd_handler(event_notifier_get_fd(notifier),
+                            virtio_pci_guest_notifier_read, NULL, vq);
+    } else {
+        qemu_set_fd_handler(event_notifier_get_fd(notifier),
+                            NULL, NULL, NULL);
+    }
+    return 0;
+}
+
 static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign)
 {
 VirtIOPCIProxy *proxy = opaque;
@@ -415,7 +436,11 @@ static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign)
         }
         qemu_set_fd_handler(event_notifier_get_fd(notifier),
                             virtio_pci_guest_notifier_read, NULL, vq);
+        msix_set_mask_notifier(&proxy->pci_dev,
+                               virtio_queue_vector(proxy->vdev, n), vq);
     } else {
+        msix_set_mask_notifier(&proxy->pci_dev,
+                               virtio_queue_vector(proxy->vdev, n), NULL);
         qemu_set_fd_handler(event_notifier_get_fd(notifier),
                             NULL, NULL, NULL);
         event_notifier_cleanup(notifier);
@@ -500,6 +525,8 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
 
     proxy->pci_dev.config_write = virtio_write_config;
 
+    proxy->pci_dev.msix_mask_notifier = virtio_pci_mask_notifier;
+
     size = VIRTIO_PCI_REGION_SIZE(&proxy->pci_dev) + vdev->config_len;
     if (size & (size - 1))
         size = 1 << qemu_fls(size);
-- 
1.7.0.18.g0d53a5


Re: Corrupt qcow2 image, recovery?

2010-03-17 Thread RW
Liang Guo gave me this advice some weeks ago:

 you may use kvm-nbd or qemu-nbd to present kvm image
 as a NBD device, so
 that you can use nbd-client to access them. eg:

 kvm-nbd /vm/sid1.img
 modprobe nbd
 nbd-client localhost 1024 /dev/nbd0
 fdisk -l /dev/nbd0

Didn't work for me because I always got a segfault
but maybe it work's for you.

- Robert




On 03/16/10 19:21, Christian Nilsson wrote:
 Hi!
 
 I'm running kvm / qemu-kvm on a couple of production servers everything (or 
 at least most things) works as it should.
 However today someone thought it was a good idea to restart one of the 
 servers and after that the windows 2k3 guest on that server don't boot 
 anymore.
 
 kvm on this server is a bit outdated: QEMU PC emulator version 0.9.1 
 (kvm-83)
 (I guess this is one of the qcow2 corruption bugs, and i can only blame 
 myself for not upgrading kvm sooner.)
 The guest.qcow2 is a 21GiB file for a 60GiB disk
 
 i have tried a couple of things kvm-img convert -f qcow2 -O raw guest.qcow2 
 guest.raw
 this stops and does nothing after creating a guest.raw that is 60GiB but only 
 using 60MiB
 
 so mounted the fs from another server running: QEMU PC emulator version 
 0.12.1 (qemu-kvm-0.12.1.2)
 
 and run qemu-img with the same options as above and after a few secs got 
 qemu-img: error while reading
 and the same 60MiB used by guest.raw
 
 i also tried booting qemu-kvm with a linux guest and this qcow2 image but 
 only get I/O Errors (and no partitions found)
 
 # qemu-img check guest.qcow2
 ERROR: invalid cluster offset=0x10a000  
 ERROR OFLAG_COPIED: l2_offset=ee73 refcount=1   
 ERROR l2_offset=ee73: Table is not cluster aligned; L1 entry corrupted
 ERROR: invalid cluster offset=0x11d44100080   
 ERROR: invalid cluster offset=0x11d61600080   
 ERROR: invalid cluster offset=0x11d68600080   
 ERROR: invalid cluster offset=0x11d95300080
 (and a lot more in this style, full log can be provided if
 it would be of help to anybody)
 
 
 
 is there any possibility to repair this file, or convert it to a RAW file 
 (even with parts padded that are not safe from the qcow2 image), or as a 
 last resort, are there any debug tools for qcow2 images that might be of use?
 
 I have read up on the qcow fileformat but right now i'm a bit short of time; 
 i need the data in this guest's disk image, or at least the MS SQL datafiles 
 that are on this disk. I have also checked the qcow2 file and it does contain a 
 NTLDR string and a lot of other NTFS-recognized strings, so i know that all 
 data is not gone. The question is: how can i access it as a filesystem again?
 
 
 Any help would be appreciated!
 
 Regards
 Christian Nilsson


Re: [PATCH] KVM: x86: Add KVM_GET/SET_VCPU_EVENTS

2010-03-17 Thread Alexander Graf
Avi Kivity wrote:
 On 11/12/2009 02:04 AM, Jan Kiszka wrote:
 This new IOCTL exports all yet user-invisible states related to
 exceptions, interrupts, and NMIs. Together with appropriate user space
 changes, this fixes sporadic problems of vmsave/restore, live migration
 and system reset.



 Applied, thanks.  I added a flags field to the structure in case we
 discover a new bit that needs to fit in there.  Please take a look
 (separate commit in kvm-next).


So without this patch migration fails? Sounds like a stable candidate to
me. Same goes for the follow-up that adds the shadow field.


Alex


Re: [Autotest] [Autotest PATCH] KVM-test: Add a subtest 'qemu_img'

2010-03-17 Thread Lucas Meneghel Rodrigues
Copying Michael on the message.

Hi Yolkfull, I have reviewed this patch and I have some comments to
make on it, similar to the ones I made on an earlier version of it:

One of the things that I noticed is that this patch doesn't work very
well out of the box:

[...@freedom kvm]$ ./scan_results.py
Test                                            Status  Seconds Info
----                                            ------  ------- ----
(Result file: ../../results/default/status)
smp2.Fedora.11.64.qemu_img.checkGOOD47  
completed successfully
smp2.Fedora.11.64.qemu_img.create   GOOD44  
completed successfully
smp2.Fedora.11.64.qemu_img.convert.to_qcow2 FAIL45  
Image
converted failed; Command: /usr/bin/qemu-img convert -f qcow2 -O qcow2
/tmp/kvm_autotest_root/images/fc11-64.qcow2
/tmp/kvm_autotest_root/images/fc11-64.qcow2.converted_qcow2;Output is:
qemu-img: Could not open '/tmp/kvm_autotest_root/images/fc11-64.qcow2'
smp2.Fedora.11.64.qemu_img.convert.to_raw   FAIL46  
Image
converted failed; Command: /usr/bin/qemu-img convert -f qcow2 -O raw
/tmp/kvm_autotest_root/images/fc11-64.qcow2
/tmp/kvm_autotest_root/images/fc11-64.qcow2.converted_raw;Output is:
qemu-img: Could not open '/tmp/kvm_autotest_root/images/fc11-64.qcow2'
smp2.Fedora.11.64.qemu_img.snapshot FAIL44  
Create
snapshot failed via command: /usr/bin/qemu-img snapshot -c snapshot0
/tmp/kvm_autotest_root/images/fc11-64.qcow2;Output is: qemu-img: Could
not open '/tmp/kvm_autotest_root/images/fc11-64.qcow2'
smp2.Fedora.11.64.qemu_img.commit   GOOD44  
completed successfully
smp2.Fedora.11.64.qemu_img.info FAIL44  
Unhandled
str: Unhandled TypeError: argument of type 'NoneType' is not iterable
smp2.Fedora.11.64.qemu_img.rebase   TEST_NA 43  
Current
kvm user space version does not support 'rebase' subcommand
GOOD412 

We need to fix that before upstream inclusion.

Also, one thing that I've noticed is that this test doesn't depend on
any other variants, so we don't need to repeat it for every combination
of guest and qemu command line options. Michael, does it occur to you
a way to get this test out of the variants block, so it gets executed
only once per job and not every combination of guest and other qemu
options?

On Fri, Jan 29, 2010 at 4:00 AM, Yolkfull Chow yz...@redhat.com wrote:
 This is designed to test all subcommands of 'qemu-img' however
 so far 'commit' is not implemented.

 * For 'check' subcommand test, it will 'dd' to create a file with specified
 size and see whether it's supported to be checked. Then convert it to the
 supported formats ('qcow2' and 'raw' so far) to see whether there's an error
 after conversion.

 * For 'convert' subcommand test, it will convert both to 'qcow2' and 'raw'
 from the format specified in config file. And only check 'qcow2' after
 conversion.

 * For 'snapshot' subcommand test, it will create two snapshots and list them.
 Finally delete them if no errors found.

 * For 'info' subcommand test, it will check image format and size according to
 output of 'info' subcommand at specified image file.

 * For 'rebase' subcommand test, it will create first snapshot 'sn1' based on
 the original base_img, and create second snapshot 'sn2' based on sn1. And then
 rebase sn2 to base_img. After rebase, check the backing_file of sn2.

 This supports two rebase modes: unsafe mode and safe mode.

 Unsafe mode:
 With -u an unsafe mode is enabled that doesn't require the backing files to
 exist. It merely changes the backing file reference in the COW image. This is
 useful for renaming or moving the backing file. The user is responsible to
 make sure that the new backing file has no changes compared to the old one,
 or corruption may occur.

 Safe mode:
 Both the current and the new backing file need to exist, and after the
 rebase, the COW image is guaranteed to have the same guest visible content
 as before. To achieve this, old and new backing file are compared and, if
 necessary, data is copied from the old backing file into the COW image.

 Signed-off-by: Yolkfull Chow yz...@redhat.com
 ---
  client/tests/kvm/tests/qemu_img.py     |  235 
 
  client/tests/kvm/tests_base.cfg.sample |   40 ++
  2 files changed, 275 insertions(+), 0 deletions(-)
  create mode 100644 client/tests/kvm/tests/qemu_img.py

 diff --git a/client/tests/kvm/tests/qemu_img.py 
 b/client/tests/kvm/tests/qemu_img.py
 new file mode 100644
 index 000..e6352a0
 --- /dev/null
 +++ b/client/tests/kvm/tests/qemu_img.py
 @@ -0,0 +1,235 @@
 +import re, os, logging, commands
 +from autotest_lib.client.common_lib import utils, error
 +import 

Re: [PATCH 05/10] Don't call apic functions directly from kvm code

2010-03-17 Thread Glauber Costa
On Tue, Mar 09, 2010 at 03:27:02PM +0200, Avi Kivity wrote:
 On 02/26/2010 10:12 PM, Glauber Costa wrote:
 It is actually not necessary to call a tpr function to save and load cr8,
 as cr8 is part of the processor state, and thus, it is much easier
 to just add it to CPUState.
 
 As for apic base, wrap kvm usages, so we can call either the qemu device,
 or the in kernel version.
 
 
   }
 
 +static void kvm_set_apic_base(CPUState *env, uint64_t val)
 +{
  +    if (!kvm_irqchip_in_kernel())
  +        cpu_set_apic_base(env, val);
 
 What if it is in kernel?  Just ignored?  Doesn't seem right.
At this point it is right, because there is no irqchip in kernel yet.

In a later patch, irqchip in kernel begins to exist, and this function
gets filled.


[PATCH] vhost: fix error handling in vring ioctls

2010-03-17 Thread Michael S. Tsirkin
Stanse found a locking problem in vhost_set_vring:
several returns from VHOST_SET_VRING_KICK, VHOST_SET_VRING_CALL,
VHOST_SET_VRING_ERR with the vq-mutex held.
Fix these up.

Reported-by: Jiri Slaby jirisl...@gmail.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/vhost.c |   18 --
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 7cd55e0..7bd7a1e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -476,8 +476,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 		if (r < 0)
 			break;
 		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
-		if (IS_ERR(eventfp))
-			return PTR_ERR(eventfp);
+		if (IS_ERR(eventfp)) {
+			r = PTR_ERR(eventfp);
+			break;
+		}
 		if (eventfp != vq->kick) {
 			pollstop = filep = vq->kick;
 			pollstart = vq->kick = eventfp;
@@ -489,8 +491,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 		if (r < 0)
 			break;
 		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
-		if (IS_ERR(eventfp))
-			return PTR_ERR(eventfp);
+		if (IS_ERR(eventfp)) {
+			r = PTR_ERR(eventfp);
+			break;
+		}
 		if (eventfp != vq->call) {
 			filep = vq->call;
 			ctx = vq->call_ctx;
@@ -505,8 +509,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 		if (r < 0)
 			break;
 		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
-		if (IS_ERR(eventfp))
-			return PTR_ERR(eventfp);
+		if (IS_ERR(eventfp)) {
+			r = PTR_ERR(eventfp);
+			break;
+		}
 		if (eventfp != vq->error) {
 			filep = vq->error;
 			vq->error = eventfp;
-- 
1.7.0.18.g0d53a5


[PATCH] vhost: fix interrupt mitigation with raw sockets

2010-03-17 Thread Michael S. Tsirkin
A thinko in code means we never trigger interrupt
mitigation. Fix this.

Reported-by: Juan Quintela quint...@redhat.com
Reported-by: Unai Uribarri unai.uriba...@optenet.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index fcafb6b..a6a88df 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -125,7 +125,7 @@ static void handle_tx(struct vhost_net *net)
 	mutex_lock(&vq->mutex);
 	vhost_disable_notify(vq);
 
-	if (wmem < sock->sk->sk_sndbuf * 2)
+	if (wmem < sock->sk->sk_sndbuf / 2)
 		tx_poll_stop(net);
 	hdr_size = vq->hdr_size;
 
-- 
1.7.0.18.g0d53a5


Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb
Anthony Liguori anth...@codemonkey.ws writes:

 This really gets down to your definition of safe behaviour.  As it
 stands, if you suffer a power outage, it may lead to guest
 corruption.
 
 While we are correct in advertising a write-cache, write-caches are
 volatile and should a drive lose power, it could lead to data
 corruption.  Enterprise disks tend to have battery backed write
 caches to prevent this.
 
 In the set up you're emulating, the host is acting as a giant write
 cache.  Should your host fail, you can get data corruption.

Hi Anthony. I suspected my post might spark an interesting discussion!

Before considering anything like this, we did quite a bit of testing with
OSes in qemu-kvm guests running filesystem-intensive work, using an ipmitool
power off to kill the host. I didn't manage to corrupt any ext3, ext4 or
NTFS filesystems despite these efforts.

Is your claim here that:-

  (a) qemu doesn't emulate a disk write cache correctly; or

  (b) operating systems are inherently unsafe running on top of a disk with
  a write-cache; or

  (c) installations that are already broken and lose data with a physical
  drive with a write-cache can lose much more in this case because the
  write cache is much bigger?

Following Christoph Hellwig's patch series from last September, I'm pretty
convinced that (a) isn't true apart from the inability to disable the
write-cache at run-time, which is something that neither recent linux nor
windows seem to want to do out-of-the box.

Given that modern SATA drives come with fairly substantial write-caches
nowadays which operating systems leave on without widespread disaster, I
don't really believe in (b) either, at least for the ide and scsi case.
Filesystems know they have to flush the disk cache to avoid corruption.
(Virtio makes the write cache invisible to the OS except in linux 2.6.32+ so
I know virtio-blk has to be avoided for current windows and obsolete linux
when writeback caching is on.)

I can certainly imagine (c) might be the case, although when I use strace to
watch the IO to the block device, I see pretty regular fdatasyncs being
issued by the guests, interleaved with the writes, so I'm not sure how
likely the problem would be in practice. Perhaps my test guests were
unrepresentatively well-behaved.

However, the potentially unlimited time-window for loss of incorrectly
unsynced data is also something one could imagine fixing at the qemu level.
Perhaps I should be implementing something like
cache=writeback,flushtimeout=N which, upon a write being issued to the block
device, starts an N second timer if it isn't already running. The timer is
destroyed on flush, and if it expires before it's destroyed, a gratuitous
flush is sent. Do you think this is worth doing? Just a simple 'while sleep
10; do sync; done' on the host even!

We've used cache=none and cache=writethrough, and whilst performance is fine
with a single guest accessing a disk, when we chop the disks up with LVM and
run even a small handful of guests, the constant seeking to serve tiny
synchronous IOs leads to truly abysmal throughput---we've seen less than
700kB/s streaming write rates within guests when the backing store is
capable of 100MB/s.

With cache=writeback, there's still IO contention between guests, but the
write granularity is a bit coarser, so the host's elevator seems to get a
bit more of a chance to help us out and we can at least squeeze out 5-10MB/s
from two or three concurrently running guests, getting a total of 20-30% of
the performance of the underlying block device rather than a total of around
5%.

Cheers,

Chris.


Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb
Avi Kivity a...@redhat.com writes:

 On 03/15/2010 10:23 PM, Chris Webb wrote:

 Wasteful duplication of page cache between guest and host notwithstanding,
 turning on cache=writeback is a spectacular performance win for our guests.
 
 Is this with qcow2, raw file, or direct volume access?

This is with direct access to logical volumes. No file systems or qcow2 in
the stack. Our typical host has a couple of SATA disks, combined in md
RAID1, chopped up into volumes with LVM2 (really just dm linear targets).
The performance measured outside qemu is excellent, inside qemu-kvm is fine
too until multiple guests are trying to access their drives at once, but
then everything starts to grind badly.

 I can understand it for qcow2, but for direct volume access this
 shouldn't happen.  The guest schedules as many writes as it can,
 followed by a sync.  The host (and disk) can then reschedule them
 whether they are in the writeback cache or in the block layer, and
 must sync in the same way once completed.

I don't really understand what's going on here, but I wonder if the
underlying problem might be that all the O_DIRECT/O_SYNC writes from the
guests go down into the same block device at the bottom of the device mapper
stack, and thus can't be reordered with respect to one another. For our
purposes,

   Guest A   Guest B     Guest A   Guest B     Guest A   Guest B
   write A1              write A1                        write B1
             write B1    write A2              write A1
   write A2              write B1              write A2

are all equivalent, but the system isn't allowed to reorder in this way
because there isn't a separate request queue for each logical volume, just
the one at the bottom. (I don't know whether nested request queues would
behave remotely reasonably either, though!)

Also, if my guest kernel issues (say) three small writes, one at the start
of the disk, one in the middle, one at the end, and then does a flush, can
virtio really express this as one non-contiguous O_DIRECT write (the three
components of which can be reordered by the elevator with respect to one
another) rather than three distinct O_DIRECT writes which can't be permuted?
Can qemu issue a write like that? cache=writeback + flush allows this to be
optimised by the block layer in the normal way.
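The pattern Chris is asking about is easiest to see in code. The sketch below (plain POSIX, with a hypothetical function name and file path; qemu's real path would add O_DIRECT with aligned buffers, omitted here for brevity) shows N reorderable writes followed by a single ordering point, as opposed to per-write syncing:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Three small writes at scattered offsets, then one flush.  The kernel
 * elevator may reorder the three writes relative to each other; only
 * the final fdatasync() imposes ordering.  With O_SYNC instead, every
 * pwrite() would carry its own implicit flush and nothing could be
 * permuted. */
int scattered_writes_one_flush(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* start, middle and end of a small "disk" */
    if (pwrite(fd, "first",  5, 0)    != 5 ||
        pwrite(fd, "second", 6, 4096) != 6 ||
        pwrite(fd, "third",  5, 8192) != 5) {
        close(fd);
        return -1;
    }

    int r = fdatasync(fd);   /* single ordering point for all three */
    close(fd);
    return r;
}
```

Whether virtio and qemu can preserve this batching end-to-end is exactly the open question in the thread.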

Cheers,

Chris.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Anthony Liguori

On 03/17/2010 10:14 AM, Chris Webb wrote:

Anthony Liguori anth...@codemonkey.ws writes:


This really gets down to your definition of safe behaviour.  As it
stands, if you suffer a power outage, it may lead to guest
corruption.

While we are correct in advertising a write-cache, write-caches are
volatile and should a drive lose power, it could lead to data
corruption.  Enterprise disks tend to have battery backed write
caches to prevent this.

In the set up you're emulating, the host is acting as a giant write
cache.  Should your host fail, you can get data corruption.
 

Hi Anthony. I suspected my post might spark an interesting discussion!

Before considering anything like this, we did quite a bit of testing with
OSes in qemu-kvm guests running filesystem-intensive work, using an ipmitool
power off to kill the host. I didn't manage to corrupt any ext3, ext4 or
NTFS filesystems despite these efforts.

Is your claim here that:-

   (a) qemu doesn't emulate a disk write cache correctly; or

   (b) operating systems are inherently unsafe running on top of a disk with
   a write-cache; or

   (c) installations that are already broken and lose data with a physical
   drive with a write-cache can lose much more in this case because the
   write cache is much bigger?
   


This is the closest to the most accurate.

It basically boils down to this: most enterprises use disks with 
battery-backed write caches.  Having the host act as a giant write cache 
means that you can lose data.


I agree that a well behaved file system will not become corrupt, but my 
contention is that for many types of applications, data loss == 
corruption, and not all file systems are well behaved.  It's 
certainly valid to argue about whether common filesystems are broken, 
but from a purely pragmatic perspective, this is going to be the case.


Regards,

Anthony Liguori


Re: [Qemu-devel] [PATCH] Fix SIGFPE for vnc display of width/height = 1

2010-03-17 Thread Anthony Liguori

On 03/08/2010 08:34 AM, Chris Webb wrote:

During boot, the screen gets resized to height 1 and a mouse click at this
point will cause a division by zero when calculating the absolute pointer
position from the pixel (x, y). Return a click in the middle of the screen
instead in this case.

Signed-off-by: Chris Webbch...@arachsys.com
   

Applied.  Thanks.

Regards,

Anthony Liguori

---
  vnc.c |6 --
  1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/vnc.c b/vnc.c
index 01353a9..676a707 100644
--- a/vnc.c
+++ b/vnc.c
@@ -1457,8 +1457,10 @@ static void pointer_event(VncState *vs, int button_mask, int x, int y)
         dz = 1;
 
     if (vs->absolute) {
-        kbd_mouse_event(x * 0x7FFF / (ds_get_width(vs->ds) - 1),
-                        y * 0x7FFF / (ds_get_height(vs->ds) - 1),
+        kbd_mouse_event(ds_get_width(vs->ds) > 1 ?
+                          x * 0x7FFF / (ds_get_width(vs->ds) - 1) : 0x4000,
+                        ds_get_height(vs->ds) > 1 ?
+                          y * 0x7FFF / (ds_get_height(vs->ds) - 1) : 0x4000,
                         dz, buttons);
     } else if (vnc_has_feature(vs, VNC_FEATURE_POINTER_TYPE_CHANGE)) {
         x -= 0x7FFF;




Re: [Qemu-devel] Re: [PATCH 2/6] qemu-kvm: Modify and introduce wrapper functions to access phys_ram_dirty.

2010-03-17 Thread Paul Brook
 On 03/16/2010 10:10 PM, Blue Swirl wrote:
Yes, and that is what tlb_protect_code() does and it's called from
  tb_alloc_page() which is what's called when a TB is created.
 
  Just a tangential note: a long time ago, I tried to disable self
  modifying code detection for Sparc. On most RISC architectures, SMC
  needs explicit flushing so in theory we need not track code memory
  writes. However, during exceptions the translator needs to access the
  original unmodified code that was used to generate the TB. But maybe
  there are other ways to avoid SMC tracking, on x86 it's still needed
 
 On x86 you're supposed to execute a serializing instruction (one of
 INVD, INVEPT, INVLPG, INVVPID, LGDT, LIDT, LLDT, LTR, MOV (to control
 register, with the exception of MOV CR8), MOV (to debug register),
 WBINVD, WRMSR, CPUID, IRET, and RSM) before running modified code.

Last time I checked, a jump instruction was sufficient to ensure coherency 
within a core.  Serializing instructions are only required for coherency 
between cores on SMP systems.

QEMU effectively has a very large physically tagged icache[1] with very 
expensive cache loads.  AFAIK the only practical way to maintain that cache on 
x86 targets is to do write snooping via dirty bits. On targets that mandate 
explicit icache invalidation we might be able to get away with this, however I 
doubt it actually gains you anything - a correctly written guest is going to 
invalidate at least as much as we get from dirty tracking, and we still need 
to provide correct behaviour when executing with cache disabled.
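The write-snooping scheme Paul describes can be modelled in a few lines. This is a toy sketch with invented names, not QEMU's actual TB/dirty-bitmap code: pages holding translated code are marked, and a snooped write to such a page throws away the stale translations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_PAGES 64

static uint8_t code_page[N_PAGES];   /* 1 = page has cached translations */
static int     tb_valid[N_PAGES];    /* toy per-page "translation cache" */

/* Translating code on a page starts snooping writes to that page. */
static void translate_page(unsigned page)
{
    code_page[page] = 1;
    tb_valid[page] = 1;
}

/* Every guest store is checked (the "expensive cache maintenance"). */
static void guest_write(unsigned page)
{
    if (code_page[page]) {   /* snoop hit: self-modifying code */
        tb_valid[page] = 0;  /* discard stale translations */
        code_page[page] = 0; /* stop snooping until retranslated */
    }
}
```

Writes to pure data pages cost only the check; writes to code pages pay for invalidation, which is why SMC-heavy guests are slow to emulate.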

  but I suppose SMC is pretty rare.
 
 Every time you demand load a code page from disk, you're running self
 modifying code (though it usually doesn't exist in the tlb, so there's
 no previous version that can cause trouble).

I think you're confusing TLB flushes with TB flushes.

Paul

[1] Even modern x86 CPUs have only relatively small icaches. The large L2/L3 caches 
aren't relevant as they are unified I/D caches.


Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity

On 03/17/2010 05:24 PM, Chris Webb wrote:

Avi Kivity a...@redhat.com writes:

   

On 03/15/2010 10:23 PM, Chris Webb wrote:

 

Wasteful duplication of page cache between guest and host notwithstanding,
turning on cache=writeback is a spectacular performance win for our guests.
   

Is this with qcow2, raw file, or direct volume access?
 

This is with direct access to logical volumes. No file systems or qcow2 in
the stack. Our typical host has a couple of SATA disks, combined in md
RAID1, chopped up into volumes with LVM2 (really just dm linear targets).
The performance measured outside qemu is excellent, inside qemu-kvm is fine
too until multiple guests are trying to access their drives at once, but
then everything starts to grind badly.

   


OK.


I can understand it for qcow2, but for direct volume access this
shouldn't happen.  The guest schedules as many writes as it can,
followed by a sync.  The host (and disk) can then reschedule them
whether they are in the writeback cache or in the block layer, and
must sync in the same way once completed.
 

I don't really understand what's going on here, but I wonder if the
underlying problem might be that all the O_DIRECT/O_SYNC writes from the
guests go down into the same block device at the bottom of the device mapper
stack, and thus can't be reordered with respect to one another.


They should be reorderable.  Otherwise host filesystems on several 
volumes would suffer the same problems.


Whether the filesystem is in the host or guest shouldn't matter.


For our
purposes,

   Guest A    Guest B   |   Guest A    Guest B   |   Guest A    Guest B
   write A1             |   write A1             |              write B1
              write B1  |   write A2             |   write A1
   write A2             |              write B1  |   write A2

are all equivalent, but the system isn't allowed to reorder in this way
because there isn't a separate request queue for each logical volume, just
the one at the bottom. (I don't know whether nested request queues would
behave remotely reasonably either, though!)

Also, if my guest kernel issues (say) three small writes, one at the start
of the disk, one in the middle, one at the end, and then does a flush, can
virtio really express this as one non-contiguous O_DIRECT write (the three
components of which can be reordered by the elevator with respect to one
another) rather than three distinct O_DIRECT writes which can't be permuted?
Can qemu issue a write like that? cache=writeback + flush allows this to be
optimised by the block layer in the normal way.
   


Guest side virtio will send this as three requests followed by a flush.  
Qemu will issue these as three distinct requests and then flush.  The 
requests are marked, as Christoph says, in a way that limits their 
reorderability, and perhaps if we fix these two problems performance 
will improve.


Something that comes to mind is merging of flush requests.  If N guests 
issue one write and one flush each, we should issue N writes and just 
one flush - a flush for the disk applies to all volumes on that disk.
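Avi's flush-merging idea can be sketched as a per-disk batcher. This is a deliberately simplified toy with invented names (not qemu code), and it ignores completion signalling: a real implementation must not acknowledge any guest's flush until the merged flush has actually completed.

```c
#include <assert.h>

enum req_type { REQ_WRITE, REQ_FLUSH };

struct disk_stats {
    int writes_issued;    /* writes passed straight through */
    int flushes_issued;   /* flushes actually sent to the disk */
    int flush_pending;    /* a flush was requested but not yet sent */
};

/* A request arrives from some guest volume backed by this disk. */
static void submit(struct disk_stats *d, enum req_type t)
{
    if (t == REQ_WRITE)
        d->writes_issued++;
    else
        d->flush_pending = 1;   /* coalesce: remember it, don't issue yet */
}

/* Batch boundary: one flush covers all volumes on the disk. */
static void drain(struct disk_stats *d)
{
    if (d->flush_pending) {
        d->flushes_issued++;
        d->flush_pending = 0;
    }
}
```

With N guests each submitting one write and one flush inside a batch window, the disk sees N writes and a single flush.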


--
error compiling committee.c: too many arguments to function



Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Balbir Singh
* Anthony Liguori anth...@codemonkey.ws [2010-03-17 10:55:47]:

 On 03/17/2010 10:14 AM, Chris Webb wrote:
 Anthony Liguori anth...@codemonkey.ws writes:
 
 This really gets down to your definition of safe behaviour.  As it
 stands, if you suffer a power outage, it may lead to guest
 corruption.
 
 While we are correct in advertising a write-cache, write-caches are
 volatile and should a drive lose power, it could lead to data
 corruption.  Enterprise disks tend to have battery backed write
 caches to prevent this.
 
 In the set up you're emulating, the host is acting as a giant write
 cache.  Should your host fail, you can get data corruption.
 Hi Anthony. I suspected my post might spark an interesting discussion!
 
 Before considering anything like this, we did quite a bit of testing with
 OSes in qemu-kvm guests running filesystem-intensive work, using an ipmitool
 power off to kill the host. I didn't manage to corrupt any ext3, ext4 or
 NTFS filesystems despite these efforts.
 
 Is your claim here that:-
 
(a) qemu doesn't emulate a disk write cache correctly; or
 
(b) operating systems are inherently unsafe running on top of a disk with
a write-cache; or
 
(c) installations that are already broken and lose data with a physical
drive with a write-cache can lose much more in this case because the
write cache is much bigger?
 
 This is the closest to the most accurate.
 
 It basically boils down to this: most enterprises use disks with
 battery-backed write caches.  Having the host act as a giant write
 cache means that you can lose data.
 

Dirty limits can help control how much we lose, but also affect how
much we write out.

 I agree that a well behaved file system will not become corrupt, but
 my contention is that for many types of applications, data loss ==
 corruption and not all file systems are well behaved.  And it's
 certainly valid to argue about whether common filesystems are
 broken but from a purely pragmatic perspective, this is going to
 be the case.


I think it is a trade-off for end users to decide on. cache=writeback
does provide performance benefits, but can cause data loss.


-- 
Three Cheers,
Balbir


Re: [Qemu-devel] Re: [PATCH 2/6] qemu-kvm: Modify and introduce wrapper functions to access phys_ram_dirty.

2010-03-17 Thread Avi Kivity

On 03/17/2010 06:06 PM, Paul Brook wrote:

On 03/16/2010 10:10 PM, Blue Swirl wrote:
 

   Yes, and that is what tlb_protect_code() does and it's called from
tb_alloc_page() which is what's called when a TB is created.
 

Just a tangential note: a long time ago, I tried to disable self
modifying code detection for Sparc. On most RISC architectures, SMC
needs explicit flushing so in theory we need not track code memory
writes. However, during exceptions the translator needs to access the
original unmodified code that was used to generate the TB. But maybe
there are other ways to avoid SMC tracking, on x86 it's still needed
   

On x86 you're supposed to execute a serializing instruction (one of
INVD, INVEPT, INVLPG, INVVPID, LGDT, LIDT, LLDT, LTR, MOV (to control
register, with the exception of MOV CR8), MOV (to debug register),
WBINVD, WRMSR, CPUID, IRET, and RSM) before running modified code.
 

Last time I checked, a jump instruction was sufficient to ensure coherency
within a core.  Serializing instructions are only required for coherency
between cores on SMP systems.
   


Yeah, the docs say either a jump or a serializing instruction is needed.


QEMU effectively has a very large physically tagged icache[1] with very
expensive cache loads.  AFAIK the only practical way to maintain that cache on
x86 targets is to do write snooping via dirty bits. On targets that mandate
explicit icache invalidation we might be able to get away with this, however I
doubt it actually gains you anything - a correctly written guest is going to
invalidate at least as much as we get from dirty tracking, and we still need
to provide correct behaviour when executing with cache disabled.
   


Agreed.

   

but I suppose SMC is pretty rare.
   

Every time you demand load a code page from disk, you're running self
modifying code (though it usually doesn't exist in the tlb, so there's
no previous version that can cause trouble).
 

I think you're confusing TLB flushes with TB flushes.
   


No - my thinking was page fault, load page, invlpg, continue.  But the 
invlpg is unneeded, and continue has to include a jump anyway.


--
error compiling committee.c: too many arguments to function



Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb
Anthony Liguori anth...@codemonkey.ws writes:

 On 03/17/2010 10:14 AM, Chris Webb wrote:
(c) installations that are already broken and lose data with a physical
drive with a write-cache can lose much more in this case because the
write cache is much bigger?
 
 This is the closest to the most accurate.
 
 It basically boils down to this: most enterprises use disks with
 battery backed write caches.  Having the host act as a giant write
 cache means that you can lose data.
 
 I agree that a well behaved file system will not become corrupt, but
 my contention is that for many types of applications, data loss ==
 corruption and not all file systems are well behaved.  And it's
 certainly valid to argue about whether common filesystems are
 broken but from a purely pragmatic perspective, this is going to
 be the case.

Okay. What I was driving at in describing these systems as 'already broken'
is that they will already lose data (in this sense) if they're run on bare
metal with normal commodity SATA disks with their 32MB write caches on. That
configuration surely describes the vast majority of PC-class desktops and
servers!

If I understand correctly, your point here is that the small cache on a real
SATA drive gives a relatively small time window for data loss, whereas the
worry with cache=writeback is that the host page cache can be gigabytes, so
the time window for unsynced data to be lost is potentially enormous.

Isn't the fix for that just forcing a periodic sync on the host, to put an
upper bound on the time window for unsynced data loss in the guest?
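Chris's bounding idea reduces to a trivial loop. The sketch below (hypothetical name; in practice the same effect is usually achieved by tuning the host's vm.dirty_expire_centisecs and vm.dirty_writeback_centisecs sysctls rather than calling sync() from a daemon) caps unsynced data at roughly `interval` seconds:

```c
#include <unistd.h>

/* Force dirty pagecache to storage every `interval` seconds, so that
 * unsynced guest data is at most about `interval` seconds old.  The
 * iteration cap exists only so the sketch terminates; a real daemon
 * would loop forever. */
int bounded_sync_loop(int iterations, unsigned interval)
{
    int done = 0;
    while (done < iterations) {
        sleep(interval);
        sync();        /* flush all dirty pagecache data on the host */
        done++;
    }
    return done;
}
```

The trade-off is exactly the one debated in the thread: a shorter interval narrows the loss window but gives the host elevator less opportunity to merge writes.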

Cheers,

Chris.


Re: [PATCH 05/10] Don't call apic functions directly from kvm code

2010-03-17 Thread Avi Kivity

On 03/17/2010 04:00 PM, Glauber Costa wrote:

On Tue, Mar 09, 2010 at 03:27:02PM +0200, Avi Kivity wrote:
   

On 02/26/2010 10:12 PM, Glauber Costa wrote:
 

It is actually not necessary to call a tpr function to save and load cr8,
as cr8 is part of the processor state, and thus, it is much easier
to just add it to CPUState.

As for apic base, wrap kvm usages, so we can call either the qemu device,
or the in kernel version.


  }

+static void kvm_set_apic_base(CPUState *env, uint64_t val)
+{
+    if (!kvm_irqchip_in_kernel())
+        cpu_set_apic_base(env, val);
   

What if it is in kernel?  Just ignored?  Doesn't seem right.
 

At this point it is right, because there is no irqchip in kernel yet.

In a later patch, irqchip in kernel begins to exist, and this function
gets filled.
   


Ok.  In the future please code things like that without the if (), and 
add it when you introduce the other side.  Helps fend off nit-pickers.


--
error compiling committee.c: too many arguments to function



Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity

On 03/17/2010 06:22 PM, Avi Kivity wrote:
Also, if my guest kernel issues (say) three small writes, one at the 
start
of the disk, one in the middle, one at the end, and then does a 
flush, can
virtio really express this as one non-contiguous O_DIRECT write (the 
three

components of which can be reordered by the elevator with respect to one
another) rather than three distinct O_DIRECT writes which can't be 
permuted?
Can qemu issue a write like that? cache=writeback + flush allows this 
to be

optimised by the block layer in the normal way.



Guest side virtio will send this as three requests followed by a 
flush.  Qemu will issue these as three distinct requests and then 
flush.  The requests are marked, as Christoph says, in a way that 
limits their reorderability, and perhaps if we fix these two problems 
performance will improve.


Something that comes to mind is merging of flush requests.  If N 
guests issue one write and one flush each, we should issue N writes 
and just one flush - a flush for the disk applies to all volumes on 
that disk.




Chris, can you carry out an experiment?  Write a program that pwrite()s 
a byte to a file at the same location repeatedly, with the file opened 
using O_SYNC.  Measure the write rate, and run blktrace on the host to 
see what the disk (/dev/sda, not the volume) sees.  Should be a (write, 
flush, write, flush) per pwrite pattern or similar (for writing the data 
and a journal block, perhaps even three writes will be needed).


Then scale this across multiple guests, measure and trace again.  If 
we're lucky, the flushes will be coalesced, if not, we need to work on it.


--
error compiling committee.c: too many arguments to function



Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb
Avi Kivity a...@redhat.com writes:

 Chris, can you carry out an experiment?  Write a program that
 pwrite()s a byte to a file at the same location repeatedly, with the
 file opened using O_SYNC.  Measure the write rate, and run blktrace
 on the host to see what the disk (/dev/sda, not the volume) sees.
 Should be a (write, flush, write, flush) per pwrite pattern or
 similar (for writing the data and a journal block, perhaps even
 three writes will be needed).
 
 Then scale this across multiple guests, measure and trace again.  If
 we're lucky, the flushes will be coalesced, if not, we need to work
 on it.

Sure, sounds like an excellent plan. I don't have a test machine at the
moment as the last host I was using for this has gone into production, but
I'm due to get another one to install later today or first thing tomorrow
which would be ideal for doing this. I'll follow up with the results once I
have them.

Cheers,

Chris.


Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig
On Wed, Mar 17, 2010 at 06:22:29PM +0200, Avi Kivity wrote:
 They should be reorderable.  Otherwise host filesystems on several 
 volumes would suffer the same problems.

They are reorderable, just not as extremely as the page cache.
Remember that the request queue really is just a relatively small queue
of outstanding I/O, and that is absolutely intentional.  Large scale
_caching_ is done by the VM in the pagecache, with all the usual aging,
pressure, etc algorithms applied to it.  The block devices have a
relatively small fixed size request queue associated with it to
facilitate request merging and limited reordering and having fully
set up I/O requests for the device.



Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity

On 03/17/2010 06:47 PM, Chris Webb wrote:

Avi Kivity a...@redhat.com writes:

   

Chris, can you carry out an experiment?  Write a program that
pwrite()s a byte to a file at the same location repeatedly, with the
file opened using O_SYNC.  Measure the write rate, and run blktrace
on the host to see what the disk (/dev/sda, not the volume) sees.
Should be a (write, flush, write, flush) per pwrite pattern or
similar (for writing the data and a journal block, perhaps even
three writes will be needed).

Then scale this across multiple guests, measure and trace again.  If
we're lucky, the flushes will be coalesced, if not, we need to work
on it.
 

Sure, sounds like an excellent plan. I don't have a test machine at the
moment as the last host I was using for this has gone into production, but
I'm due to get another one to install later today or first thing tomorrow
which would be ideal for doing this. I'll follow up with the results once I
have them.
   


Meanwhile I looked at the code, and it looks bad.  There is an 
IO_CMD_FDSYNC, but it isn't tagged, so we have to drain the queue before 
issuing it.  In any case, qemu doesn't use it as far as I could tell, 
and even if it did, the device mapper doesn't implement the needed 
->aio_fsync() operation.


So, there's a lot of plumbing needed before we can get cache flushes 
merged into each other.  Given cache=writeback does allow merging, I 
think we explained part of the problem at least.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] Fix SIGFPE for vnc display of width/height = 1

2010-03-17 Thread Alexander Graf
Anthony Liguori wrote:
 On 03/08/2010 08:34 AM, Chris Webb wrote:
 During boot, the screen gets resized to height 1 and a mouse click at
 this
 point will cause a division by zero when calculating the absolute
 pointer
 position from the pixel (x, y). Return a click in the middle of the
 screen
 instead in this case.

 Signed-off-by: Chris Webbch...@arachsys.com

 Applied.  Thanks.

Also queued it to stable?


Alex


Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig
On Wed, Mar 17, 2010 at 06:40:30PM +0200, Avi Kivity wrote:
 Chris, can you carry out an experiment?  Write a program that pwrite()s 
 a byte to a file at the same location repeatedly, with the file opened 
 using O_SYNC.  Measure the write rate, and run blktrace on the host to 
 see what the disk (/dev/sda, not the volume) sees.  Should be a (write, 
 flush, write, flush) per pwrite pattern or similar (for writing the data 
 and a journal block, perhaps even three writes will be needed).
 
 Then scale this across multiple guests, measure and trace again.  If 
 we're lucky, the flushes will be coalesced, if not, we need to work on it.

As the person who has written quite a bit of the current O_SYNC
implementation and also reviewed the rest of it I can tell you that
those flushes won't be coalesced.  If we always rewrite the same block
we do the cache flush from the fsync method and there is nothing
to coalesce it there.  If you actually do modify metadata (e.g. by
using the new real O_SYNC instead of the old one that always was O_DSYNC
that I introduced in 2.6.33 but that isn't picked up by userspace yet)
you might hit a very limited transaction merging window in some
filesystems, but it's generally very small for a good reason.  If it
were too large we'd make one process wait for I/O in another just
because we might expect transactions to coalesce later.  There's been
some long discussion about that fsync transaction batching tuning
for ext3 a while ago.



Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig
On Wed, Mar 17, 2010 at 06:53:34PM +0200, Avi Kivity wrote:
 Meanwhile I looked at the code, and it looks bad.  There is an 
 IO_CMD_FDSYNC, but it isn't tagged, so we have to drain the queue before 
 issuing it.  In any case, qemu doesn't use it as far as I could tell, 
  and even if it did, the device mapper doesn't implement the needed 
  ->aio_fsync() operation.

No one implements it, and all surrounding code is dead wood.  It would
require us to do asynchronous pagecache operations, which involve
major surgery of the VM code.  Patches to do this were rejected multiple
times.



Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity

On 03/17/2010 06:58 PM, Christoph Hellwig wrote:

On Wed, Mar 17, 2010 at 06:53:34PM +0200, Avi Kivity wrote:
   

Meanwhile I looked at the code, and it looks bad.  There is an
IO_CMD_FDSYNC, but it isn't tagged, so we have to drain the queue before
issuing it.  In any case, qemu doesn't use it as far as I could tell,
and even if it did, the device mapper doesn't implement the needed
->aio_fsync() operation.
 

No one implements it, and all surrounding code is dead wood.  It would
require us to do asynchronous pagecache operations, which involve
major surgery of the VM code.  Patches to do this were rejected multiple
times.
   


Pity.  What about the O_DIRECT aio case?  It's ridiculous that you can 
submit async write requests but have to wait synchronously for them to 
actually hit the disk if you have a write cache.


--
error compiling committee.c: too many arguments to function



Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity

On 03/17/2010 06:52 PM, Christoph Hellwig wrote:

On Wed, Mar 17, 2010 at 06:22:29PM +0200, Avi Kivity wrote:
   

They should be reorderable.  Otherwise host filesystems on several
volumes would suffer the same problems.
 

They are reorderable, just not as extremely as the page cache.
Remember that the request queue really is just a relatively small queue
of outstanding I/O, and that is absolutely intentional.  Large scale
_caching_ is done by the VM in the pagecache, with all the usual aging,
pressure, etc algorithms applied to it.


We already have the large scale caching and stuff running in the guest.  
We have a stream of optimized requests coming out of guests; running the 
same algorithm again shouldn't improve things.  The host has an 
opportunity to do inter-guest optimization, but given each guest has its 
own disk area, I don't see how any reordering or merging could help here 
(beyond sorting guests according to disk order).



The block devices have a
relatively small fixed size request queue associated with it to
facilitate request merging and limited reordering and having fully
set up I/O requests for the device.
   


We should enlarge the queues, increase request reorderability, and merge 
flushes (delay flushes until after unrelated writes, then adjacent 
flushes can be collapsed).


Collapsing flushes should get us better than linear scaling (since we 
collapse N writes + M flushes into N writes and 1 flush).  However the 
writes themselves scale worse than linearly, since they now span a 
larger disk space and cause higher seek penalties.


--
error compiling committee.c: too many arguments to function



Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity

On 03/17/2010 06:57 PM, Christoph Hellwig wrote:

On Wed, Mar 17, 2010 at 06:40:30PM +0200, Avi Kivity wrote:
   

Chris, can you carry out an experiment?  Write a program that pwrite()s
a byte to a file at the same location repeatedly, with the file opened
using O_SYNC.  Measure the write rate, and run blktrace on the host to
see what the disk (/dev/sda, not the volume) sees.  Should be a (write,
flush, write, flush) per pwrite pattern or similar (for writing the data
and a journal block, perhaps even three writes will be needed).

Then scale this across multiple guests, measure and trace again.  If
we're lucky, the flushes will be coalesced, if not, we need to work on it.
 

As the person who has written quite a bit of the current O_SYNC
implementation and also reviewed the rest of it I can tell you that
those flushes won't be coalesced.  If we always rewrite the same block
we do the cache flush from the fsync method and there is nothing
to coalesce it there.  If you actually do modify metadata (e.g. by
using the new real O_SYNC instead of the old one that always was O_DSYNC
that I introduced in 2.6.33 but that isn't picked up by userspace yet)
you might hit a very limited transaction merging window in some
filesystems, but it's generally very small for a good reason.  If it
were too large we'd make one process wait for I/O in another just
because we might expect transactions to coalesce later.  There's been
some long discussion about that fsync transaction batching tuning
for ext3 a while ago.
   


I definitely don't expect flush merging for a single guest, but for 
multiple guests there is certainly an opportunity for merging.  Most 
likely we don't take advantage of it and that's one of the problems.  
Copying data into pagecache so that we can merge the flushes seems like 
a very unsatisfactory implementation.







Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Vivek Goyal
On Wed, Mar 17, 2010 at 03:14:10PM +, Chris Webb wrote:
 Anthony Liguori anth...@codemonkey.ws writes:
 
  This really gets down to your definition of safe behaviour.  As it
  stands, if you suffer a power outage, it may lead to guest
  corruption.
  
  While we are correct in advertising a write-cache, write-caches are
  volatile and should a drive lose power, it could lead to data
  corruption.  Enterprise disks tend to have battery backed write
  caches to prevent this.
  
  In the set up you're emulating, the host is acting as a giant write
  cache.  Should your host fail, you can get data corruption.
 
 Hi Anthony. I suspected my post might spark an interesting discussion!
 
 Before considering anything like this, we did quite a bit of testing with
 OSes in qemu-kvm guests running filesystem-intensive work, using an ipmitool
 power off to kill the host. I didn't manage to corrupt any ext3, ext4 or
 NTFS filesystems despite these efforts.
 
 Is your claim here that:-
 
   (a) qemu doesn't emulate a disk write cache correctly; or
 
   (b) operating systems are inherently unsafe running on top of a disk with
   a write-cache; or
 
   (c) installations that are already broken and lose data with a physical
   drive with a write-cache can lose much more in this case because the
   write cache is much bigger?
 
 Following Christoph Hellwig's patch series from last September, I'm pretty
 convinced that (a) isn't true apart from the inability to disable the
 write-cache at run-time, which is something that neither recent linux nor
 windows seem to want to do out-of-the box.
 
 Given that modern SATA drives come with fairly substantial write-caches
 nowadays which operating systems leave on without widespread disaster, I
 don't really believe in (b) either, at least for the ide and scsi case.
 Filesystems know they have to flush the disk cache to avoid corruption.
 (Virtio makes the write cache invisible to the OS except in linux 2.6.32+ so
 I know virtio-blk has to be avoided for current windows and obsolete linux
 when writeback caching is on.)
 
 I can certainly imagine (c) might be the case, although when I use strace to
 watch the IO to the block device, I see pretty regular fdatasyncs being
 issued by the guests, interleaved with the writes, so I'm not sure how
 likely the problem would be in practice. Perhaps my test guests were
 unrepresentatively well-behaved.
 
 However, the potentially unlimited time-window for loss of incorrectly
 unsynced data is also something one could imagine fixing at the qemu level.
 Perhaps I should be implementing something like
 cache=writeback,flushtimeout=N which, upon a write being issued to the block
 device, starts an N second timer if it isn't already running. The timer is
 destroyed on flush, and if it expires before it's destroyed, a gratuitous
 flush is sent. Do you think this is worth doing? Just a simple 'while sleep
 10; do sync; done' on the host even!
 
 We've used cache=none and cache=writethrough, and whilst performance is fine
 with a single guest accessing a disk, when we chop the disks up with LVM and
 run even a small handful of guests, the constant seeking to serve tiny
 synchronous IOs leads to truly abysmal throughput---we've seen less than
 700kB/s streaming write rates within guests when the backing store is
 capable of 100MB/s.
 
 With cache=writeback, there's still IO contention between guests, but the
 write granularity is a bit coarser, so the host's elevator seems to get a
 bit more of a chance to help us out and we can at least squeeze out 5-10MB/s
 from two or three concurrently running guests, getting a total of 20-30% of
 the performance of the underlying block device rather than a total of around
 5%.

Hi Chris,

Are you using CFQ in the host? What is the host kernel version? I am not sure
what is the problem here but you might want to play with IO controller and put
these guests in individual cgroups and see if you get better throughput even
with cache=writethrough.

If the problem is that if sync writes from different guests get intermixed
resulting in more seeks, IO controller might help as these writes will now
go on different group service trees and in CFQ, we try to service requests
from one service tree at a time for a period before we switch the service
tree.

The issue is that all the logic is in CFQ, which works at the leaf nodes
of the storage stack and not at the LVM nodes. So you might first want to try
it with a single partitioned disk. If that helps, then it might help with the
LVM configuration also (IO control working at leaf nodes).

Thanks
Vivek


Re: [PATCH] vhost: fix error handling in vring ioctls

2010-03-17 Thread Laurent Chavey
Acked-by: cha...@google.com


On Wed, Mar 17, 2010 at 7:42 AM, Michael S. Tsirkin m...@redhat.com wrote:
 Stanse found a locking problem in vhost_set_vring:
 several returns from VHOST_SET_VRING_KICK, VHOST_SET_VRING_CALL,
 VHOST_SET_VRING_ERR with the vq->mutex held.
 Fix these up.

 Reported-by: Jiri Slaby jirisl...@gmail.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  drivers/vhost/vhost.c |   18 ++++++++++++------
  1 files changed, 12 insertions(+), 6 deletions(-)

 diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
 index 7cd55e0..7bd7a1e 100644
 --- a/drivers/vhost/vhost.c
 +++ b/drivers/vhost/vhost.c
 @@ -476,8 +476,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
                if (r < 0)
                        break;
                eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
 -               if (IS_ERR(eventfp))
 -                       return PTR_ERR(eventfp);
 +               if (IS_ERR(eventfp)) {
 +                       r = PTR_ERR(eventfp);
 +                       break;
 +               }
                if (eventfp != vq->kick) {
                        pollstop = filep = vq->kick;
                        pollstart = vq->kick = eventfp;
 @@ -489,8 +491,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
                if (r < 0)
                        break;
                eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
 -               if (IS_ERR(eventfp))
 -                       return PTR_ERR(eventfp);
 +               if (IS_ERR(eventfp)) {
 +                       r = PTR_ERR(eventfp);
 +                       break;
 +               }
                if (eventfp != vq->call) {
                        filep = vq->call;
                        ctx = vq->call_ctx;
 @@ -505,8 +509,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
                if (r < 0)
                        break;
                eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
 -               if (IS_ERR(eventfp))
 -                       return PTR_ERR(eventfp);
 +               if (IS_ERR(eventfp)) {
 +                       r = PTR_ERR(eventfp);
 +                       break;
 +               }
                if (eventfp != vq->error) {
                        filep = vq->error;
                        vq->error = eventfp;
 --
 1.7.0.18.g0d53a5
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH 2/3] kvm: svm: reset cr0 properly on vcpu reset

2010-03-17 Thread Alexander Graf
Eduardo Habkost wrote:
 svm_vcpu_reset() was not properly resetting the contents of the guest-visible
 cr0 register, causing the following issue:
 https://bugzilla.redhat.com/show_bug.cgi?id=525699

 Without resetting cr0 properly, the vcpu was running the SIPI bootstrap 
 routine
 with paging enabled, making the vcpu get a pagefault exception while trying to
 run it.

 Instead of setting vmcb->save.cr0 directly, the new code just resets
 kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
 vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().

 kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
 kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.

 Signed-off-by: Eduardo Habkost ehabk...@redhat.com
   

Should this go into -stable?


Alex


Re: qemu-kvm crashes with Assertion ... failed.

2010-03-17 Thread Marcelo Tosatti
On Sun, Mar 14, 2010 at 09:57:52AM +0100, André Weidemann wrote:
 Hi,
 I cloned the qemu-kvm git repository today with git clone
 git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
 qemu-kvm-2010-03-14, ran configure and compiled it and did a make
 install. Everything went fine without warnings or errors.
 For configure output take a look here: http://pastebin.com/BL4DYCRY
 
 Here is my Server Hardware:
 Asus P5Q Mainbaord
 Intel Q9300
 8GB RAM
 RAID5 with mdadm consisting of 4x 1TB disks
 The volume /dev/storage/Windows7test mentioned below is on this RAID5.
 
 I ran my virtual machine with the following command:
 
 qemu-system-x86_64 -cpu core2duo -vga cirrus -boot order=ndc -vnc
 192.168.3.42:2 -k de -smp 4,cores=4 -drive
 file=/vmware/Windows7Test_600G.img,if=ide,index=0,cache=writeback -m
 1024 -net nic,model=e1000,macaddr=DE:AD:BE:EF:12:3A -net
 tap,script=/usr/local/bin/qemu-ifup  -monitor pty -name
 Windows7test,process=Windows7test -drive
 file=/dev/storage/Windows7test,if=ide,index=1,cache=none,aio=native

Andre,

Can you try qemu-kvm-0.12.3 ?

 Windows7Test_600G.img is a qcow2 file and contains a Windows 7 Pro image.
 /dev/storage/Windows7test is formated with XFS
 
 After starting the machine with the above command line, I booted
 into an Ubuntu 9.10 x86_64 Live Image via PXE and mounted /dev/sdb1
 (/dev/storage/Windows7test) under /mnt. I then did cd /mnt/ and
 ran iozone -Ra -g 2G -b /tmp/iozone-aoi-linux-xls
 
 iozone ran some test and then kvm simply quit with the following
 error message:
 qemu-system-x86_64:
 /usr/local/src/qemu-kvm-2010-03-10/hw/ide/internal.h:510:
 bmdma_active_if: Assertion `bmdma->unit != (uint8_t)-1' failed.
 
 /var/log/syslog contained the folowing:
 Mar 14 09:18:14 server kernel: [318080.627468] kvm: 1361: cpu0
 kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
 Mar 14 09:18:14 server  kernel: [318080.627473] kvm: 1361: cpu0
 kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
 Mar 14 09:18:14 server kernel: [318080.627476] kvm: 1361: cpu0
 unhandled wrmsr: 0x400 data 
 Mar 14 09:18:14 server kernel: [318080.627506] kvm: 1361: cpu1
 kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
 Mar 14 09:18:14 server  kernel: [318080.627509] kvm: 1361: cpu1
 kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
 Mar 14 09:18:14 server kernel: [318080.627511] kvm: 1361: cpu1
 unhandled wrmsr: 0x400 data 
 Mar 14 09:18:14 server kernel: [318080.627538] kvm: 1361: cpu2
 kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
 Mar 14 09:18:14 server kernel: [318080.627540] kvm: 1361: cpu2
 kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
 Mar 14 09:18:14 server kernel: [318080.627543] kvm: 1361: cpu2
 unhandled wrmsr: 0x400 data 
 
 
 I was able to reproduce this error 3 times in a row.
 
 Regards,
  André


Re: [PATCH] KVM: MMU: Disassociate direct maps from guest levels

2010-03-17 Thread Marcelo Tosatti
On Sun, Mar 14, 2010 at 10:22:52AM +0200, Avi Kivity wrote:
 Direct maps are linear translations for a section of memory, used for
 real mode or with large pages.  As such, they are independent of the guest
 levels.
 
 Teach the mmu about this by making page->role.glevels = 0 for direct maps.
 This allows direct maps to be shared among real mode and the various paging
 modes.
 
 Signed-off-by: Avi Kivity a...@redhat.com
 ---
  arch/x86/kvm/mmu.c |2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)
 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index b137515..a984bc1 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -1328,6 +1328,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
   role = vcpu->arch.mmu.base_role;
   role.level = level;
   role.direct = direct;
 + if (role.direct)
 +  role.glevels = 0;
   role.access = access;
   if (vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
    quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
 -- 
 1.7.0.2

Isn't this what happens already, since for tdp base_role.glevels is not 
initialized?


Re: [PATCH] KVM: VMX: Disable unrestricted guest when EPT disabled

2010-03-17 Thread Alexander Graf
Marcelo Tosatti wrote:
 On Fri, Nov 27, 2009 at 04:46:26PM +0800, Sheng Yang wrote:
   
 Otherwise would cause VMEntry failure when using ept=0 on unrestricted guest
 supported processors.

 Signed-off-by: Sheng Yang sh...@linux.intel.com
 

 Applied, thanks.
   

So without this patch kvm breaks with ept=0? Sounds like a stable
candidate to me.


Alex


Re: [PATCH] vhost: fix error handling in vring ioctls

2010-03-17 Thread Laurent Chavey
Acked-by: Laurent Chavey cha...@google.com

On Wed, Mar 17, 2010 at 10:54 AM, Laurent Chavey cha...@google.com wrote:
 Acked-by: cha...@google.com


 On Wed, Mar 17, 2010 at 7:42 AM, Michael S. Tsirkin m...@redhat.com wrote:
 Stanse found a locking problem in vhost_set_vring:
 several returns from VHOST_SET_VRING_KICK, VHOST_SET_VRING_CALL,
 VHOST_SET_VRING_ERR with the vq->mutex held.
 Fix these up.

 Reported-by: Jiri Slaby jirisl...@gmail.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com




Re: 2 serial ports?

2010-03-17 Thread Neo Jia
On Wed, Mar 17, 2010 at 3:35 AM, Michael Tokarev m...@tls.msk.ru wrote:
 Neo Jia wrote:
 May I ask if it is possible to bind a real physical serial port to a guest?

 It is all described in the documentation, quite a long list of
 various things you can attach to a virtual serial port, incl.
 a real one.

I have tried -serial /dev/ttyS0 but I can't use it to debug my Windows guest.

Thanks,
Neo


 /mjt




-- 
Remember that if researchers were not ambitious,
we probably would not have the technology we are using today!


Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb
Vivek Goyal vgo...@redhat.com writes:

 Are you using CFQ in the host? What is the host kernel version? I am not sure
 what is the problem here but you might want to play with IO controller and put
 these guests in individual cgroups and see if you get better throughput even
 with cache=writethrough.

Hi. We're using the deadline IO scheduler on 2.6.32.7. We got better
performance from deadline than from cfq when we last tested, which was
admittedly around the 2.6.30 timescale so is now a rather outdated
measurement.

 If the problem is that if sync writes from different guests get intermixed
 resulting in more seeks, IO controller might help as these writes will now
 go on different group service trees and in CFQ, we try to service requests
 from one service tree at a time for a period before we switch the service
 tree.

Thanks for the suggestion: I'll have a play with this. I currently use
/sys/kernel/uids/N/cpu_share with one UID per guest to divide up the CPU
between guests, but this could just as easily be done with a cgroup per
guest if a side-effect is to provide a hint about IO independence to CFQ.

Best wishes,

Chris.


Re: 2 serial ports?

2010-03-17 Thread Michael Tokarev
Neo Jia wrote:
 On Wed, Mar 17, 2010 at 3:35 AM, Michael Tokarev m...@tls.msk.ru wrote:
 Neo Jia wrote:
 May I ask if it is possible to bind a real physical serial port to a guest?
 It is all described in the documentation, quite a long list of
 various things you can attach to a virtual serial port, incl.
 a real one.
 
 I have tried -serial /dev/ttyS0 but I can't use it to debug my Windows guest.

That's an entirely different issue -- the inability to debug Windows
guests.  Please don't hijack other threads for unrelated issues -- it
makes finding information and replying more difficult.  If it does not
work for you, ask in a new thread.  But first, try to research
the issue a bit; I've seen several discussions about debugging
guests over a serial port in kvm.  Besides, I've no idea what you are
really trying to do -- debugging a guest is much easier in kvm
than setting up another host and connecting two hosts over a
null-modem serial cable.

/mjt



Re: [PATCH] add xchg ax, reg emulator test

2010-03-17 Thread Marcelo Tosatti
On Tue, Mar 16, 2010 at 02:42:52PM +0200, Gleb Natapov wrote:
 Add test for opcodes 0x90-0x9f emulation
 
 Signed-off-by: Gleb Natapov g...@redhat.com
 diff --git a/kvm/user/test/x86/realmode.c b/kvm/user/test/x86/realmode.c
 index bc6b27f..bfc2942 100644

Applied, thanks.



Re: Broken loadvm ?

2010-03-17 Thread Marcelo Tosatti
On Tue, Mar 16, 2010 at 05:25:13PM +0200, Alpár Török  wrote:
 PS: It just occurred to me that it does indeed freeze and cause
 100% CPU usage. At least I can say for sure that neither network, serial line,
 keyboard, nor mouse work if loadvm is loaded from the command line.
 If loaded from the monitor, everything seems to work, except the
 mouse.  After a -loadvm from the command line, repeating the command
 from the monitor doesn't unfreeze it.
 
 i am really stuck with this. Any help is greatly appreciated, as
 downgrading is not an option.

Upgrade to qemu-kvm-0.12.3?


Re: [PATCH rework] KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s error handling

2010-03-17 Thread Marcelo Tosatti
On Mon, Mar 15, 2010 at 10:13:30PM +0900, Takuya Yoshikawa wrote:
 kvm_coalesced_mmio_init() keeps holding the addresses of a coalesced
 mmio ring page and dev even after it has freed them.
 
 Also, if this function fails, though it might be rare, it seems to
 suggest a serious system state: so we'd better stop the work
 following kvm_create_vm().
 
 This patch clears these problems.
 
   We move the coalesced mmio's initialization out of kvm_create_vm().
   This seems to be natural because it includes a registration which
   can be done only when the vm is successfully created.
 
 Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
 ---
  virt/kvm/coalesced_mmio.c |2 ++
  virt/kvm/kvm_main.c   |   12 
  2 files changed, 10 insertions(+), 4 deletions(-)

Applied, thanks.



Re: [PATCH] KVM: Cleanup: change to use bool return values

2010-03-17 Thread Marcelo Tosatti
On Mon, Mar 15, 2010 at 05:29:09PM +0800, Gui Jianfeng wrote:
 Make use of bool as return values, and remove some useless
 bool value conversions. Thanks to Avi for pointing this out.
 
 Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com

Applied, thanks.



qemu-kvm 0.12.3, Slackware 13 host and Windows XP guest - time drift a lot

2010-03-17 Thread Thomas Løcke
Hey all,

I'm working on moving from a mixture of physical servers and
virtualized servers running on Virtualbox, to a pure KVM setup. But
I'm having some problems with my Windows XP guests in my test-setup.

This is the host I'm testing on:

CPU: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
RAM: 8GB
2x320GB WD SATA disks (one for host OS and one for KVM guest images)
2x1GBs Intel nics (bonded)

Host OS is Slackware 13 with the following kernels: 2.6.29.6-huge,
2.6.29.6-generic, 2.6.33 and 2.6.33.1

qemu-kvm is 0.12.3

My Linux guests work like a charm. When they boot up I do a single
ntpdate -b europe.pool.ntp.org and after that the time stays in near
perfect sync with the host, with no ntpd running on the guests. My
Windows XP guests on the other hand drifts backwards in time,
especially when there's load on the guest, for example when I'm
copying a large file from my samba server to the Windows XP guest. The
guest can easily lose 10 minutes while copying a 600MB file. Or if I
start a few browsers and point them at some horrible flash heavy sites
and just let them sit there, then the VM also start losing a lot of
time real fast.

This is the commandline I use to start the Windows XP guests:

qemu-system-x86_64 -hda winxppro.raw -boot c -m 1024 -vnc :1 -k da
-smp 1 -localtime -daemonize -name qemu_winxppro,process=qemu_winxppro
-net nic,macaddr=de:ad:be:ef:00:01,model=rtl8139 -net tap -runas kvm

I use the same commandline for my Linux guests, except the nic is virtio.

I'm at my wits end. I've tried the -tdf option with no success. I've
tried setting various -rtc options with no success.

Could it be I'm missing some key-component in the kernel? Or is there
perhaps some dev version of qemu-kvm I could/should try?

According to some of the #kvm residents, this should just work (tm),
but I simply cannot make it happen.

Any and all advice are more than welcome.

:o)
/Thomas


Re: qemu-kvm 0.12.3, Slackware 13 host and Windows XP guest - time drift a lot

2010-03-17 Thread Zachary Amsden

On 03/17/2010 09:22 AM, Thomas Løcke wrote:

Hey all,

I'm working on moving from a mixture of physical servers and
virtualized servers running on Virtualbox, to a pure KVM setup. But
I'm having some problems with my Windows XP guests in my test-setup.

This is the host I'm testing on:

CPU: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
RAM: 8GB
2x320GB WD SATA disks (one for host OS and one for KVM guest images)
2x1GBs Intel nics (bonded)

Host OS is Slackware 13 with the following kernels: 2.6.29.6-huge,
2.6.29.6-generic, 2.6.33 and 2.6.33.1

qemu-kvm is 0.12.3
   


qemu's been changing a lot, might be best to build from the actual git 
repository, which is 0.12.50 now.



My Linux guests works like a charm. When they boot up I do a single
ntpdate -b europe.pool.ntp.org and after that the time stays in near
perfect sync with the host, with no ntpd running on the guests. My
Windows XP guests on the other hand drifts backwards in time,
especially when there's load on the guest, for example when I'm
copying a large file from my samba server to the Windows XP guest. The
guest can easily lose 10 minutes while copying a 600MB file. Or if I
start a few browsers and point them at some horrible flash heavy sites
and just let them sit there, then the VM also start losing a lot of
time real fast.
   


What does your host CPU load get up to?  You only have a single core?


This is the commandline I use to start the Windows XP guests:

qemu-system-x86_64 -hda winxppro.raw -boot c -m 1024 -vnc :1 -k da
-smp 1 -localtime -daemonize -name qemu_winxppro,process=qemu_winxppro
-net nic,macaddr=de:ad:be:ef:00:01,model=rtl8139 -net tap -runas kvm

I use the same commandline for my Linux guests, except the nic is virtio.

I'm at my wits end. I've tried the -tdf option with no success. I've
tried setting various -rtc options with no success.
   


Including -rtc-td-hack ?

Could it be I'm missing some key-component in the kernel? Or is there
perhaps some dev version of qemu-kvm I could/should try?

According to some of the #kvm residents, this should just work (tm),
but I simply cannot make it happen.

Any and all advice are more than welcome.
   


As always, make sure you are running the latest and greatest modules, 
those matter even more than the kernel, and check for any warning 
messages in dmesg and qemu output.


Zach


[ kvm-Bugs-2972152 ] guest crash when -cpu kvm64

2010-03-17 Thread SourceForge.net
Bugs item #2972152, was opened at 2010-03-17 14:43
Message generated for change (Tracker Item Submitted) made by high33
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2972152&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: libkvm
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: hugohiggins (high33)
Assigned to: Nobody/Anonymous (nobody)
Summary: guest crash when -cpu kvm64

Initial Comment:
When using -cpu kvm64 guest crashes when X starts up.  
dmesg on hypervisor says:
[6149047.906364] kvm: 29020: cpu0 unhandled rdmsr: 0xc0010112

Guest boots OK without -cpu parameter

cpu: dual opteron 2435 (12 cores total)
ram: 32gig 
host dist: ubuntu 9.04
host kernel: 2.6.28-16-generic #55-Ubuntu SMP
guest dist: xubuntu-9.10-amd64

# /usr/local/qemu-kvm-0.12.3/bin/qemu-system-x86_64 -name ubu64 localhost:69 
-M pc \
-m 2048 -boot d -vga std \
-net nic,macaddr=BA:DD:C0:FF:EE:F6,model=virtio -net vde \
-drive file=/dev/sdp,if=scsi,boot=on \
-cpu kvm64 \
-cdrom iso/xubuntu-9.10-desktop-amd64.iso -k en-us -localtime -sdl -vnc 
localhost:69 -daemonize -usbdevice tablet



--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2972152&group_id=180599


SIGSEGV with -smp 17+, and error handling around...

2010-03-17 Thread Michael Tokarev
When run with -smp 17 or greater, kvm
fails like this:

$ kvm -smp 17
kvm_create_vcpu: Invalid argument
kvm_setup_mce FAILED: Invalid argument
KVM_SET_LAPIC failed
Segmentation fault
$ _

In qemu-kvm.c, the kvm_create_vcpu() routine
(which is used in a vcpu thread to set up
vcpu) is declared as void, i.e, no error
return.  And the code that calls it blindly
assumes that it will never fail...

But the first error message above is from kernel,
which - apparently - refuses to create 17th vCPU.
Hence we've a vcpu thread which is empty/dummy and
not even fully initialized... so it fails later
in the game.

This all looks quite... raw, not polished ;)

Can we somehow handle the (several possible) errors
in those (and other) places, and how can we ever act
on them?  Abort?  Warn the user and reduce the number
of vcpus accordingly (which seems wrong, especially if it were
some of the first vcpus, or one in the middle, that failed)...

Thanks!

/mjt


-enable-kvm - can it be a required option?

2010-03-17 Thread Michael Tokarev
What I mean is: if asked to enable kvm but kvm
can't be initialized for some reason (lack of
virt extensions on the cpu, permission denied
and so on), can we stop with a fatal error
instead of continuing in emulated mode?

Or maybe with another option, like -require-kvm?

I understand that -enable-kvm is now in upstream
qemu too, and _there_ it means something different,
that is, it enables something that is disabled by
default.  But even with that, if the user asks for
something and that something isn't available, it
seems like a good idea to stop there instead of
producing a warning and continuing...

This is especially true for kvm where -enable-kvm
is the default anyway.

I see more and more people using this option
now in the hope that kvm will actually stop when
no virt extensions are available.  It was my
first reaction too: "wow, now I can force it to
require kvm extensions instead of running 1000
times slower!"  So this is something to think
about, it looks like... ;)

Thanks!

/mjt


Re: -enable-kvm - can it be a required option?

2010-03-17 Thread Anthony Liguori

On 03/17/2010 03:18 PM, Michael Tokarev wrote:

What I mean is: if asked to enable kvm but kvm
can't be initialized for some reason (lack of
virt extensions on the cpu, permission denied
and so on), can we stop with a fatal error
instead of continuing in emulated mode?
   


What I've been thinking, is that we should make kvm enablement a -cpu 
option.  Something like:


-cpu host,accel=kvm
-cpu host,accel=tcg
-cpu host,accel=kvm:tcg

(1) would be KVM only, (2) would be TCG only, (3) would be KVM falling 
back to TCG.


What's nice about this approach, is that we already pull CPU model 
definitions from a global config file which means that you could tweak 
this parameter to your liking.


Regards,

Anthony Liguori



Re: qemu-kvm crashes with Assertion ... failed.

2010-03-17 Thread André Weidemann

On 17.03.2010 19:22, Marcelo Tosatti wrote:

On Sun, Mar 14, 2010 at 09:57:52AM +0100, André Weidemann wrote:

Hi,
I cloned the qemu-kvm git repository today with git clone
git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
qemu-kvm-2010-03-14, ran configure and compiled it and did a make
install. Everything went fine without warnings or errors.
For configure output take a look here: http://pastebin.com/BL4DYCRY

Here is my Server Hardware:
Asus P5Q Mainboard
Intel Q9300
8GB RAM
RAID5 with mdadm consisting of 4x 1TB disks
The volume /dev/storage/Windows7test mentioned below is on this RAID5.

I ran my virtual machine with the following command:

qemu-system-x86_64 -cpu core2duo -vga cirrus -boot order=ndc -vnc
192.168.3.42:2 -k de -smp 4,cores=4 -drive
file=/vmware/Windows7Test_600G.img,if=ide,index=0,cache=writeback -m
1024 -net nic,model=e1000,macaddr=DE:AD:BE:EF:12:3A -net
tap,script=/usr/local/bin/qemu-ifup  -monitor pty -name
Windows7test,process=Windows7test -drive
file=/dev/storage/Windows7test,if=ide,index=1,cache=none,aio=native


Andre,

Can you try qemu-kvm-0.12.3 ?


I did the following:
git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git 
qemu-kvm-2010-03-17

cd qemu-kvm-2010-03-17
git checkout -b test qemu-kvm-0.12.3
./configure
make -j6 && make install

I started the VM again exactly as I did the last time and it crashed 
again with the same error message.
qemu-system-x86_64: 
/usr/local/src/qemu-kvm-2010-03-17/hw/ide/internal.h:507: 
bmdma_active_if: Assertion `bmdma->unit != (uint8_t)-1' failed.


 André


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Zachary Amsden

On 03/16/2010 11:28 PM, Sheng Yang wrote:

On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
   

On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
 

On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
   

Right, but there is a scope between kvm_guest_enter and really running
in guest os, where a perf event might overflow. Anyway, the scope is
very narrow, I will change it to use flag PF_VCPU.
 

There is also a window between setting the flag and calling 'int $2'
where an NMI might happen and be accounted incorrectly.

Perhaps separate the 'int $2' into a direct call into perf and another
call for the rest of NMI handling.  I don't see how it would work on svm
though - AFAICT the NMI is held whereas vmx swallows it.

  I guess NMIs
will be disabled until the next IRET so it isn't racy, just tricky.
   

I'm not sure whether vmexit breaks NMI context or not. A hardware NMI context
isn't reentrant until an IRET. YangSheng would like to double-check it.
 

After more checking, I think VMX won't retain the NMI-blocked state for the
host. That means, if an NMI happens while the processor is in VMX non-root
mode, it only results in a VMExit, with a reason indicating that it was due to
an NMI, but no further state change in the host.

So in that sense, there _is_ a window between the VMExit and KVM handling the
NMI. Moreover, I think we _can't_ stop re-entrance of the NMI handling code,
because int $2 has no effect in blocking a following NMI.

And if the NMI ordering is not important (I think so), then we need to generate
a real NMI in the current after-vmexit code. Letting the APIC send an NMI IPI
to itself seems like a good idea.

I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to replace
int $2. Something unexpected is happening...
   


You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't 
supposed to be able to.


Zach


Re: [PATCH 2/3] kvm: svm: reset cr0 properly on vcpu reset

2010-03-17 Thread Eduardo Habkost
On Wed, Mar 17, 2010 at 07:17:32PM +0100, Alexander Graf wrote:
 Eduardo Habkost wrote:
  svm_vcpu_reset() was not properly resetting the contents of the 
  guest-visible
  cr0 register, causing the following issue:
  https://bugzilla.redhat.com/show_bug.cgi?id=525699
 
  Without resetting cr0 properly, the vcpu was running the SIPI bootstrap 
  routine
  with paging enabled, making the vcpu get a pagefault exception while trying 
  to
  run it.
 
  Instead of setting vmcb->save.cr0 directly, the new code just resets
  kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
  vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().
 
  kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
  kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.
 
  Signed-off-by: Eduardo Habkost ehabk...@redhat.com

 
 Should this go into -stable?

I think so. The patch is from October, was -stable branched before that?

-- 
Eduardo


Re: [PATCH 2/3] kvm: svm: reset cr0 properly on vcpu reset

2010-03-17 Thread Alexander Graf

On 17.03.2010, at 22:42, Eduardo Habkost wrote:

 On Wed, Mar 17, 2010 at 07:17:32PM +0100, Alexander Graf wrote:
 Eduardo Habkost wrote:
 svm_vcpu_reset() was not properly resetting the contents of the 
 guest-visible
 cr0 register, causing the following issue:
 https://bugzilla.redhat.com/show_bug.cgi?id=525699
 
 Without resetting cr0 properly, the vcpu was running the SIPI bootstrap 
 routine
 with paging enabled, making the vcpu get a pagefault exception while trying 
 to
 run it.
 
 Instead of setting vmcb->save.cr0 directly, the new code just resets
 kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
 vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().
 
 kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
 kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.
 
 Signed-off-by: Eduardo Habkost ehabk...@redhat.com
 
 
 Should this go into -stable?
 
 I think so. The patch is from October, was -stable branched before that?

If I read the diff log correctly 2.6.32 kvm development was branched off end of 
July 2009. The important question is if this patch fixes a regression 
introduced by some speedup magic.


Alex


Re: [PATCH 23/42] KVM: Activate Virtualization On Demand

2010-03-17 Thread Alexander Graf

On 17.03.2010, at 22:57, Dieter Ries wrote:

 Am 16.11.2009 13:19, schrieb Avi Kivity:
 From: Alexander Graf ag...@suse.de
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index f54c4f9..59fe4d5 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -316,7 +316,7 @@ static void svm_hardware_disable(void *garbage)
  cpu_svm_disable();
 }
 
 -static void svm_hardware_enable(void *garbage)
 +static int svm_hardware_enable(void *garbage)
 {
 
  struct svm_cpu_data *svm_data;
 @@ -325,16 +325,20 @@ static void svm_hardware_enable(void *garbage)
  struct desc_struct *gdt;
  int me = raw_smp_processor_id();
 
 +rdmsrl(MSR_EFER, efer);
 +if (efer & EFER_SVME)
 +return -EBUSY;
 +
 
 Hi,
 
 This is breaking KVM on my Phenom II X4 955.
 
 When I start kvm I get this on the terminal:
 
 kvm_create_vm: Device or resource busy
 Could not initialize KVM, will disable KVM support
 
 And in dmesg:
 [   67.980732] kvm: enabling virtualization on CPU0 failed
 
 
 I commented out the if() and return, and I added 2 printk's there for
 debugging, and now that's what I see in dmesg when I start kvm:
 
 [ 3341.740112] efer is 3329
 [ 3341.740113] efer is 3329
 [ 3341.740117] efer is 3329
 [ 3341.740119] EFER_SVME is 4096
 [ 3341.740121] EFER_SVME is 4096
 [ 3341.740124] EFER_SVME is 4096
 [ 3341.740130] efer is 3329
 [ 3341.740132] EFER_SVME is 4096
 
 In hex the values are 0x1000 and 0x0d01
 
 KVM has been working well on this machine before, and it still works
 well after commenting that part out.
 
 I am not sure what the value of this register is supposed to be, but are
 you sure
 
 if (efer & EFER_SVME)
 
 is the right condition?

According to the printks you show above, the & condition should never apply.

Are you 100% sure you don't have vmware, virtualbox, parallels, whatever 
running in parallel on that machine?


Alex


Re: [PATCH 23/42] KVM: Activate Virtualization On Demand

2010-03-17 Thread Dieter Ries
Am 16.11.2009 13:19, schrieb Avi Kivity:
 From: Alexander Graf ag...@suse.de
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index f54c4f9..59fe4d5 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -316,7 +316,7 @@ static void svm_hardware_disable(void *garbage)
   cpu_svm_disable();
  }
  
 -static void svm_hardware_enable(void *garbage)
 +static int svm_hardware_enable(void *garbage)
  {
  
   struct svm_cpu_data *svm_data;
 @@ -325,16 +325,20 @@ static void svm_hardware_enable(void *garbage)
   struct desc_struct *gdt;
   int me = raw_smp_processor_id();
  
 + rdmsrl(MSR_EFER, efer);
 + if (efer & EFER_SVME)
 + return -EBUSY;
 +

Hi,

This is breaking KVM on my Phenom II X4 955.

When I start kvm I get this on the terminal:

kvm_create_vm: Device or resource busy
Could not initialize KVM, will disable KVM support

And in dmesg:
[   67.980732] kvm: enabling virtualization on CPU0 failed


I commented out the if() and return, and I added 2 printk's there for
debugging, and now that's what I see in dmesg when I start kvm:

[ 3341.740112] efer is 3329
[ 3341.740113] efer is 3329
[ 3341.740117] efer is 3329
[ 3341.740119] EFER_SVME is 4096
[ 3341.740121] EFER_SVME is 4096
[ 3341.740124] EFER_SVME is 4096
[ 3341.740130] efer is 3329
[ 3341.740132] EFER_SVME is 4096

In hex the values are 0x1000 and 0x0d01

KVM has been working well on this machine before, and it still works
well after commenting that part out.

I am not sure what the value of this register is supposed to be, but are
you sure

if (efer & EFER_SVME)

is the right condition?



cu
Dieter


Re: qemu-kvm 0.12.3, Slackware 13 host and Windows XP guest - time drift a lot

2010-03-17 Thread Thomas Løcke
On Wed, Mar 17, 2010 at 8:33 PM, Zachary Amsden zams...@redhat.com wrote:
 What's your host CPU load get up to.  You only have a single core?

Dual core.

If I only run a single Windows VM, the host load is pretty low. Sure
it goes up a bit when for example copying a file, but it's nothing
serious. It's not getting hammered in any way.

 Including -rtc-td-hack ?

Yup, tried that, as suggested by one of the #kvm users. Didn't fix
it. But come to think of it, I didn't change any of the other options.
Should I have dropped -localtime and/or -tdf options? I will try again
tomorrow.


 As always, make sure you are running the latest and greatest modules, those
 matter even more than the kernel, and check for any warning messages in
 dmesg and qemu output.


But don't the latest kvm modules come with the kernel? So if I compile
a new kernel, the kvm modules should be updated too, yes?

I will try the latest qemu-kvm.

/Thomas


Re: [PATCH 23/42] KVM: Activate Virtualization On Demand

2010-03-17 Thread Dieter Ries
On Wed, Mar 17, 2010 at 11:02:40PM +0100, Alexander Graf wrote:
 On 17.03.2010, at 22:57, Dieter Ries wrote:
  Hi,
  
  This is breaking KVM on my Phenom II X4 955.
  
  When I start kvm I get this on the terminal:
  
  kvm_create_vm: Device or resource busy
  Could not initialize KVM, will disable KVM support
  
  And in dmesg:
  [   67.980732] kvm: enabling virtualization on CPU0 failed
  
  
  I commented out the if() and return, and I added 2 printk's there for
  debugging, and now that's what I see in dmesg when I start kvm:
  
  [ 3341.740112] efer is 3329
  [ 3341.740113] efer is 3329
  [ 3341.740117] efer is 3329
  [ 3341.740119] EFER_SVME is 4096
  [ 3341.740121] EFER_SVME is 4096
  [ 3341.740124] EFER_SVME is 4096
  [ 3341.740130] efer is 3329
  [ 3341.740132] EFER_SVME is 4096
  
  In hex the values are 0x1000 and 0x0d01
  
  KVM has been working well on this machine before, and it still works
  well after commenting that part out.
  
  I am not sure what the value of this register is supposed to be, but are
  you sure
  
  if (efer & EFER_SVME)
  
  is the right condition?
 
 According to the printks you show above, the & condition should never apply.
 
 Are you 100% sure you don't have vmware, virtualbox, parallels, whatever 
 running in parallel on that machine?

Definitely. I have virtualbox installed, but haven't used it in months.
The others I don't use at all, so they are not installed either.

There is nothing running which could cause that. Behaviour is the same
when I don't log into KDE but just try this without X, where nearly
nothing is started.

I noted something more now: When I comment it out once, and start kvm
like that, and then remove the comments again, then it works. So I guess
the dmesg parts I wrote were not perfect. It's more like:

I: After reboot, with debugging printk and if condition:

[   42.089423] efer is d01
[   42.089425] efer is d01
[   42.089428] efer is d01
[   42.089430] EFER_SVME is 1000
[   42.089431] EFER_SVME is 1000
[   42.089433] EFER_SVME is 1000
[   42.089436] efer is 1d01
[   42.089438] EFER_SVME is 1000
[   42.089440] kvm: enabling virtualization on CPU0 failed

II: debugging printk, no if condition:

[  317.355519] efer is d01
[  317.355522] efer is d01
[  317.355524] efer is d01
[  317.355527] EFER_SVME is 1000
[  317.355528] EFER_SVME is 1000
[  317.355531] EFER_SVME is 1000
[  317.355534] efer is 1d01
[  317.355536] EFER_SVME is 1000

III: debugging printk and if condition:

[  421.955433] efer is d01
[  421.955437] efer is d01
[  421.955440] efer is d01
[  421.955442] EFER_SVME is 1000
[  421.955443] EFER_SVME is 1000
[  421.955445] EFER_SVME is 1000
[  421.955449] efer is d01
[  421.955451] EFER_SVME is 1000



This is without reboots in between. So before I use the commented-out
version for the first time, it doesn't work; the 2nd time it works.
Maybe some initialization problem...

 Alex

cu
Dieter


Re: qemu-kvm 0.12.3, Slackware 13 host and Windows XP guest - time drift a lot

2010-03-17 Thread Zachary Amsden

On 03/17/2010 12:17 PM, Thomas Løcke wrote:

On Wed, Mar 17, 2010 at 8:33 PM, Zachary Amsden zams...@redhat.com wrote:
   

What's your host CPU load get up to.  You only have a single core?
 

Dual core.

If I only run a single Windows VM, the host load is pretty low. Sure
it goes up a bit when for example copying a file, but it's nothing
serious. It's not getting hammered in any way.

   

Including -rtc-td-hack ?
 

Yup, tried that as per suggested by one of the #kvm users. Didn't fix
it. But come to think of it, I didn't change any of the other options.
Should I have dropped -localtime and/or -tdf options? I will try again
tomorrow.
   


-rtc localtime

is required for Windows to get the proper RTC time, and -tdf should have 
no effect on Windows guests.


You might try

-rtc localtime,clock=host,driftfix=slew



   

As always, make sure you are running the latest and greatest modules, those
matter even more than the kernel, and check for any warning messages in
dmesg and qemu output.
 


But don't the latest kvm modules come with the kernel? So if I compile
a new kernel, the kvm modules should be updated too, yes?

I will try the latest qemu-kvm.
   


I use git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git and track a 2.6 
kernel branch directly so I always have latest module source regardless 
of host kernel.


Zach




Re: [PATCH 23/42] KVM: Activate Virtualization On Demand

2010-03-17 Thread Alexander Graf

On 17.03.2010, at 23:40, Dieter Ries wrote:

 On Wed, Mar 17, 2010 at 11:02:40PM +0100, Alexander Graf wrote:
 On 17.03.2010, at 22:57, Dieter Ries wrote:
 Hi,
 
 This is breaking KVM on my Phenom II X4 955.
 
 When I start kvm I get this on the terminal:
 
 kvm_create_vm: Device or resource busy
 Could not initialize KVM, will disable KVM support
 
 And in dmesg:
 [   67.980732] kvm: enabling virtualization on CPU0 failed
 
 
 I commented out the if() and return, and I added 2 printk's there for
 debugging, and now that's what I see in dmesg when I start kvm:
 
 [ 3341.740112] efer is 3329
 [ 3341.740113] efer is 3329
 [ 3341.740117] efer is 3329
 [ 3341.740119] EFER_SVME is 4096
 [ 3341.740121] EFER_SVME is 4096
 [ 3341.740124] EFER_SVME is 4096
 [ 3341.740130] efer is 3329
 [ 3341.740132] EFER_SVME is 4096
 
 In hex the values are 0x1000 and 0x0d01
 
 KVM has been working well on this machine before, and it still works
 well after commenting that part out.
 
 I am not sure what the value of this register is supposed to be, but are
 you sure
 
  if (efer & EFER_SVME)
 
 is the right condition?
 
 According to the printks you show above, the & condition should never apply.
 
 Are you 100% sure you don't have vmware, virtualbox, parallels, whatever 
 running in parallel on that machine?
 
 Definitely. I have virtualbox installed, but haven't used it in months.
 The others I don't use at all, so they are not installed either.
 
 There is nothing running which could cause that. Behaviour is the same
 when I don't log into KDE but just try this without X, where nearly
 nothing is started.
 
 I noted something more now: When I comment it out once, and start kvm
 like that, and then remove the comments again, then it works. So I guess
 the dmesg parts I wrote were not perfect. It's more like:
 
 I: After reboot, with debugging printk and if condition:
 
 [   42.089423] efer is d01
 [   42.089425] efer is d01
 [   42.089428] efer is d01
 [   42.089430] EFER_SVME is 1000
 [   42.089431] EFER_SVME is 1000
 [   42.089433] EFER_SVME is 1000
 [   42.089436] efer is 1d01
 [   42.089438] EFER_SVME is 1000
 [   42.089440] kvm: enabling virtualization on CPU0 failed
 
 II: debugging printk, no if condition:
 
 [  317.355519] efer is d01
 [  317.355522] efer is d01
 [  317.355524] efer is d01
 [  317.355527] EFER_SVME is 1000
 [  317.355528] EFER_SVME is 1000
 [  317.355531] EFER_SVME is 1000
 [  317.355534] efer is 1d01
 [  317.355536] EFER_SVME is 1000
 
 III: debugging printk and if condition:
 
 [  421.955433] efer is d01
 [  421.955437] efer is d01
 [  421.955440] efer is d01
 [  421.955442] EFER_SVME is 1000
 [  421.955443] EFER_SVME is 1000
 [  421.955445] EFER_SVME is 1000
 [  421.955449] efer is d01
 [  421.955451] EFER_SVME is 1000
 
 
 
 This is without reboots in between. So now before I use the commented
 out version for the first time, it doesnt work, the 2nd time it works.
 Maybe some initialization problem...

It looks like one of your CPUs has EFER_SVME enabled on bootup already. I'm not 
aware of code clearing EFER, so if there's garbage in there on boot it stays 
there.

Could you please add the current CPU number to your printk? I bet it's always 
the same one.
If that's the case I'd say you have a broken BIOS or bootloader.


Alex


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Sheng Yang
On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote:
 On 03/16/2010 11:28 PM, Sheng Yang wrote:
  On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
  On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
  On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
  Right, but there is a scope between kvm_guest_enter and really running
  in guest os, where a perf event might overflow. Anyway, the scope is
  very narrow, I will change it to use flag PF_VCPU.
 
  There is also a window between setting the flag and calling 'int $2'
  where an NMI might happen and be accounted incorrectly.
 
  Perhaps separate the 'int $2' into a direct call into perf and another
  call for the rest of NMI handling.  I don't see how it would work on
  svm though - AFAICT the NMI is held whereas vmx swallows it.
 
I guess NMIs
  will be disabled until the next IRET so it isn't racy, just tricky.
 
  I'm not sure whether vmexit breaks NMI context or not. A hardware NMI
  context isn't reentrant until an IRET. YangSheng would like to
  double-check it.
 
  After more checking, I think VMX won't retain the NMI-blocked state for
  the host. That means, if an NMI happens while the processor is in VMX
  non-root mode, it only results in a VMExit, with a reason indicating that
  it was due to an NMI, but no further state change in the host.

  So in that sense, there _is_ a window between the VMExit and KVM handling
  the NMI. Moreover, I think we _can't_ stop re-entrance of the NMI
  handling code, because int $2 has no effect in blocking a following NMI.

  And if the NMI ordering is not important (I think so), then we need to
  generate a real NMI in the current after-vmexit code. Letting the APIC
  send an NMI IPI to itself seems like a good idea.

  I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
  replace int $2. Something unexpected is happening...
 
 You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't
 supposed to be able to.

Um? Why?

Especially since the kernel is already using it to deliver NMIs.

-- 
regards
Yang, Sheng


Re: [PATCH] KVM: VMX: Disable unrestricted guest when EPT disabled

2010-03-17 Thread Sheng Yang
On Thursday 18 March 2010 02:37:10 Alexander Graf wrote:
 Marcelo Tosatti wrote:
  On Fri, Nov 27, 2009 at 04:46:26PM +0800, Sheng Yang wrote:
  Otherwise would cause VMEntry failure when using ept=0 on unrestricted
  guest supported processors.
 
  Signed-off-by: Sheng Yang sh...@linux.intel.com
 
  Applied, thanks.
 
 So without this patch kvm breaks with ept=0? Sounds like a stable
 candidate to me.

Seems unrestricted guest code isn't in v2.6.31-stable, and v2.6.32 had already 
fixed this issue. So it should be fine.

-- 
regards
Yang, Sheng


[PATCH] KVM test: Parallel install of guest OS v3

2010-03-17 Thread Lucas Meneghel Rodrigues
From: yogi anant...@linux.vnet.ibm.com

The patch enables doing multiple installs of guest OS in parallel.
Added four more options to test_base.cfg: a port redirection
entry 'guest_port_unattend_shell' for the host to communicate with
the guest during installation, and 'pxe_dir', 'pxe_image' and
'pxe_initrd' to specify locations for the kernel and initrd.
For parallel installation to work in unattended mode, the floppy
image and pxe boot path also have to be unique for each guest.

All the relevant unattended post-install steps for guests were
changed; they are now server-based.

Notes:
 * Yogi, I am going to remove the SLES patch, and will wait for
you to send a new patchset with both the SLES files and the
opensuse ones, OK? Thanks.

Changes from v2:
 * According to Michael Goldish comments, handled a possible
socket.error exception that could be generated during the
unattended install test
 * Modified the floppy image names to be contained inside
the same directory that might hold the tftp root for each
OS, making the needed changes on unattended.py.
 * Added floppy names for windows based OSs, which were lacking
on previous patches.

Changes from v1:
 * Fixed the logic for the new unattended install test (original
implementation would hang indefinitely if guest dies in the middle
of the install).
 * Fixed the config changes to make sure the unattended install
port actually gets redirected so the test can work, also made the
config specific to unattended install
 * Merged the finish.exe patch, including a binary patch that
changes the binary shipped to the new version
 * Changed all unattended install files to use the parallel
mechanism

Tested with Windows 7 and Fedora 11 guests. I (lmr) am going to
keep this in the queue for a bit so I can test it more in the
internal test farm and everybody can take a look at the patch.

Signed-off-by: Yogananth Subramanian anant...@linux.vnet.ibm.com
Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/deps/finish.cpp   |  111 +++-
 client/tests/kvm/deps/finish.exe   |  Bin 26913 -> 26926 bytes
 client/tests/kvm/kvm_utils.py  |4 +-
 client/tests/kvm/scripts/unattended.py |   59 ++-
 client/tests/kvm/tests/unattended_install.py   |   45 
 client/tests/kvm/tests_base.cfg.sample |   81 +--
 client/tests/kvm/unattended/Fedora-10.ks   |   12 +-
 client/tests/kvm/unattended/Fedora-11.ks   |   11 +-
 client/tests/kvm/unattended/Fedora-12.ks   |   11 +-
 client/tests/kvm/unattended/Fedora-8.ks|   11 +-
 client/tests/kvm/unattended/Fedora-9.ks|   11 +-
 client/tests/kvm/unattended/RHEL-3-series.ks   |   12 +-
 client/tests/kvm/unattended/RHEL-4-series.ks   |   11 +-
 client/tests/kvm/unattended/RHEL-5-series.ks   |   11 +-
 client/tests/kvm/unattended/win2003-32.sif |2 +-
 client/tests/kvm/unattended/win2003-64.sif |2 +-
 .../kvm/unattended/win2008-32-autounattend.xml |2 +-
 .../kvm/unattended/win2008-64-autounattend.xml |2 +-
 .../kvm/unattended/win2008-r2-autounattend.xml |2 +-
 .../tests/kvm/unattended/win7-32-autounattend.xml  |2 +-
 .../tests/kvm/unattended/win7-64-autounattend.xml  |2 +-
 .../kvm/unattended/winvista-32-autounattend.xml|2 +-
 .../kvm/unattended/winvista-64-autounattend.xml|2 +-
 client/tests/kvm/unattended/winxp32.sif|2 +-
 client/tests/kvm/unattended/winxp64.sif|2 +-
 25 files changed, 242 insertions(+), 170 deletions(-)

diff --git a/client/tests/kvm/deps/finish.cpp b/client/tests/kvm/deps/finish.cpp
index 9c2867c..e5ba128 100644
--- a/client/tests/kvm/deps/finish.cpp
+++ b/client/tests/kvm/deps/finish.cpp
@@ -1,12 +1,13 @@
-// Simple app that only sends an ack string to the KVM unattended install
-// watch code.
+// Simple application that creates a server socket, listening for connections
+// of the unattended install test. Once it gets a client connected, the
+// app will send back an ACK string, indicating the install process is done.
 //
 // You must link this code with Ws2_32.lib, Mswsock.lib, and Advapi32.lib
 //
 // Author: Lucas Meneghel Rodrigues l...@redhat.com
 // Code was adapted from an MSDN sample.
 
-// Usage: finish.exe [Host OS IP]
+// Usage: finish.exe
 
 // MinGW's ws2tcpip.h only defines getaddrinfo and other functions only for
 // the case _WIN32_WINNT >= 0x0501.
@@ -21,24 +22,18 @@
 #include <stdlib.h>
 #include <stdio.h>
 
-#define DEFAULT_BUFLEN 512
 #define DEFAULT_PORT 12323
-
 int main(int argc, char **argv)
 {
 WSADATA wsaData;
-SOCKET ConnectSocket = INVALID_SOCKET;
-struct addrinfo *result = NULL,
-*ptr = NULL,
-hints;
+SOCKET ListenSocket = INVALID_SOCKET, ClientSocket = INVALID_SOCKET;
+struct addrinfo *result = NULL, hints;
 char *sendbuf = "done";
-char 

Re: [PATCH] KVM test: Parallel install of guest OS v3

2010-03-17 Thread Lucas Meneghel Rodrigues
FYI, patch applied, see:

http://autotest.kernel.org/changeset/4309

On Wed, Mar 17, 2010 at 11:28 PM, Lucas Meneghel Rodrigues
l...@redhat.com wrote:
 From: yogi anant...@linux.vnet.ibm.com

 The patch enables doing multiple installs of guest OS in parallel.
 Added four more options to test_base.cfg: a port redirection
 entry 'guest_port_unattend_shell' for the host to communicate with
 the guest during installation, and 'pxe_dir', 'pxe_image' and
 'pxe_initrd' to specify locations for the kernel and initrd.
 For parallel installation to work in unattended mode, the floppy
 image and pxe boot path also have to be unique for each guest.

 All the relevant unattended post-install steps for guests were
 changed; they are now server-based.

 Notes:
  * Yogi, I am going to remove the SLES patch, and will wait for
 you to send a new patchset with both the SLES files and the
 opensuse ones, OK? Thanks.

 Changes from v2:
  * According to Michael Goldish comments, handled a possible
 socket.error exception that could be generated during the
 unattended install test
  * Modified the floppy image names to be contained inside
 the same directory that might hold the tftp root for each
 OS, making the needed changes on unattended.py.
  * Added floppy names for windows based OSs, which were lacking
 on previous patches.

 Changes from v1:
  * Fixed the logic for the new unattended install test (original
 implementation would hang indefinitely if guest dies in the middle
 of the install).
  * Fixed the config changes to make sure the unattended install
 port actually gets redirected so the test can work, also made the
 config specific to unattended install
  * Merged the finish.exe patch, including a binary patch that
 changes the binary shipped to the new version
  * Changed all unattended install files to use the parallel
 mechanism

 Tested with Windows 7 and Fedora 11 guests. I (lmr) am going to
 keep this in the queue for a bit so I can test it more in the
 internal test farm and everybody can take a look at the patch.
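[Editorial sketch] The handshake this test relies on — the guest-side finish.exe listens on the redirected port and answers with an ACK string, while the host polls until the guest responds or dies — can be sketched from the host side roughly as below. This is an illustration of the mechanism, not the actual unattended install code; the function name, timeout handling, and poll interval are invented:

```python
# Illustrative host-side wait loop: poll the redirected guest port until
# the guest's finish.exe answers "done", instead of hanging indefinitely
# if the guest dies mid-install (the v1 bug described above).
import socket
import time

def wait_for_install(host, port, timeout, step=1.0):
    """Return True once the guest ACKs with 'done', False on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            client = socket.create_connection((host, port), timeout=step)
            data = client.recv(1024)
            client.close()
            if b"done" in data:
                return True
        except socket.error:
            # Guest not up yet (or port not redirected); keep polling.
            pass
        time.sleep(step)
    return False
```

The socket.error handler corresponds to the v2 change noted above: a refused or reset connection is expected while the guest is still installing, so it must not abort the wait.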

 Signed-off-by: Yogananth Subramanian anant...@linux.vnet.ibm.com
 Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
 ---
  client/tests/kvm/deps/finish.cpp                   |  111 +++-
  client/tests/kvm/deps/finish.exe                   |  Bin 26913 -> 26926 bytes
  client/tests/kvm/kvm_utils.py                      |    4 +-
  client/tests/kvm/scripts/unattended.py             |   59 ++-
  client/tests/kvm/tests/unattended_install.py       |   45 
  client/tests/kvm/tests_base.cfg.sample             |   81 +--
  client/tests/kvm/unattended/Fedora-10.ks           |   12 +-
  client/tests/kvm/unattended/Fedora-11.ks           |   11 +-
  client/tests/kvm/unattended/Fedora-12.ks           |   11 +-
  client/tests/kvm/unattended/Fedora-8.ks            |   11 +-
  client/tests/kvm/unattended/Fedora-9.ks            |   11 +-
  client/tests/kvm/unattended/RHEL-3-series.ks       |   12 +-
  client/tests/kvm/unattended/RHEL-4-series.ks       |   11 +-
  client/tests/kvm/unattended/RHEL-5-series.ks       |   11 +-
  client/tests/kvm/unattended/win2003-32.sif         |    2 +-
  client/tests/kvm/unattended/win2003-64.sif         |    2 +-
  .../kvm/unattended/win2008-32-autounattend.xml     |    2 +-
  .../kvm/unattended/win2008-64-autounattend.xml     |    2 +-
  .../kvm/unattended/win2008-r2-autounattend.xml     |    2 +-
  .../tests/kvm/unattended/win7-32-autounattend.xml  |    2 +-
  .../tests/kvm/unattended/win7-64-autounattend.xml  |    2 +-
  .../kvm/unattended/winvista-32-autounattend.xml    |    2 +-
  .../kvm/unattended/winvista-64-autounattend.xml    |    2 +-
  client/tests/kvm/unattended/winxp32.sif            |    2 +-
  client/tests/kvm/unattended/winxp64.sif            |    2 +-
  25 files changed, 242 insertions(+), 170 deletions(-)

 diff --git a/client/tests/kvm/deps/finish.cpp 
 b/client/tests/kvm/deps/finish.cpp
 index 9c2867c..e5ba128 100644
 --- a/client/tests/kvm/deps/finish.cpp
 +++ b/client/tests/kvm/deps/finish.cpp
 @@ -1,12 +1,13 @@
 -// Simple app that only sends an ack string to the KVM unattended install
 -// watch code.
 +// Simple application that creates a server socket, listening for connections
 +// of the unattended install test. Once it gets a client connected, the
 +// app will send back an ACK string, indicating the install process is done.
  //
  // You must link this code with Ws2_32.lib, Mswsock.lib, and Advapi32.lib
  //
  // Author: Lucas Meneghel Rodrigues l...@redhat.com
  // Code was adapted from an MSDN sample.

 -// Usage: finish.exe [Host OS IP]
 +// Usage: finish.exe

  // MinGW's ws2tcpip.h defines getaddrinfo and other functions only for
  // the case _WIN32_WINNT >= 0x0501.
 @@ -21,24 +22,18 @@
  #include <stdlib.h>
  #include <stdio.h>

 -#define DEFAULT_BUFLEN 512
  #define DEFAULT_PORT 12323
 -
  int main(int argc, char **argv)
  {
     WSADATA wsaData;
 -    SOCKET ConnectSocket = INVALID_SOCKET;
 -    

Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Zhang, Yanmin
On Wed, 2010-03-17 at 17:26 +0800, Zhang, Yanmin wrote:
 On Tue, 2010-03-16 at 10:47 +0100, Ingo Molnar wrote:
  * Zhang, Yanmin yanmin_zh...@linux.intel.com wrote:
  
   On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote:
On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
 On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
  From: Zhang, Yanmin yanmin_zh...@linux.intel.com
 
  Based on the discussion in KVM community, I worked out the patch to 
  support
  perf to collect guest os statistics from host side. This patch is 
  implemented
  with Ingo, Peter and some other guys' kind help. Yang Sheng pointed 
  out a
  critical bug and provided good suggestions with other guys. I 
  really appreciate
  their kind help.
 
  The patch adds new subcommand kvm to perf.
 
 perf kvm top
 perf kvm record
 perf kvm report
 perf kvm diff
 
  The new perf could profile the guest os kernel, but not guest os user
  space; it could, however, summarize guest os user space utilization
  per guest os.
 
  Below are some examples.
  1) perf kvm top
  [r...@lkp-ne01 norm]# perf kvm --host --guest 
  --guestkallsyms=/home/ymzhang/guest/kallsyms
  --guestmodules=/home/ymzhang/guest/modules top
 
 
 
Thanks for your kind comments.

 Excellent, support for guest kernel != host kernel is critical (I 
 can't 
 remember the last time I ran same kernels).
 
 How would we support multiple guests with different kernels?
With the patch, 'perf kvm report --sort pid' could show
summary statistics for all guest os instances. Then, use the
--pid parameter of 'perf kvm record' to collect data from a single
problematic instance.
   Sorry. I found that currently --pid doesn't select a process but a single thread (the main thread).
   
   Ingo,
   
   Is it possible to support a new parameter or extend --inherit, so 'perf 
   record' and 'perf top' could collect data on all threads of a process 
   when 
   the process is running?
   
   If not, I need add a new ugly parameter which is similar to --pid to 
   filter 
   out process data in userspace.
  
  Yeah. For maximum utility i'd suggest to extend --pid to include this, and 
  introduce --tid for the previous, limited-to-a-single-task functionality.
  
  Most users would expect --pid to work like a 'late attach' - i.e. to work 
  like 
  strace -f or like a gdb attach.
 
 Thanks Ingo, Avi.
 
 I worked out below patch against tip/master of March 15th.
 
 Subject: [PATCH] Change perf's parameter --pid to process-wide collection
 From: Zhang, Yanmin yanmin_zh...@linux.intel.com
 
 Change parameter -p (--pid) to real process pid and add -t (--tid) meaning
 thread id. Now, --pid means perf collects the statistics of all threads of
 the process, while --tid means perf just collects the statistics of that 
 thread.
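
[Editorial sketch] The --pid versus --tid semantics described above can be modeled as a toy filter: a sample carries both a process id and a thread id; --pid keeps every thread of the process, --tid keeps exactly one thread. This is an illustration of the described behavior, not perf's implementation; the record layout and field names are invented:

```python
# Toy model of process-wide (--pid) vs single-thread (--tid) collection.
def filter_samples(samples, pid=None, tid=None):
    """--tid selects one thread; --pid selects all threads of a process."""
    if tid is not None:
        return [s for s in samples if s["tid"] == tid]
    if pid is not None:
        return [s for s in samples if s["pid"] == pid]
    return list(samples)

samples = [
    {"pid": 100, "tid": 100},  # main thread of process 100
    {"pid": 100, "tid": 101},  # worker thread of process 100
    {"pid": 200, "tid": 200},  # unrelated process
]
```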
 
 BTW, the patch fixes a bug in 'perf stat -p'. 'perf stat' always configures
 attr->disabled = 1 if it isn't a system-wide collection. If there is a '-p'
 and no forked workload, 'perf stat -p' doesn't collect any data. In addition,
 the while (!done) loop in run_perf_stat consumes 100% of a single cpu, which
 has a bad impact on the running workload. I added a sleep(1) in the loop.
 
 Signed-off-by: Zhang Yanmin yanmin_zh...@linux.intel.com
Ingo,

Sorry, the patch has bugs. I need to do a better job and will work out 2
separate patches against the 2 issues.

Yanmin


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM MMU: check reserved bits only when CR4.PSE=1 or CR4.PAE=1

2010-03-17 Thread Marcelo Tosatti
On Wed, Mar 17, 2010 at 11:43:06AM +0800, Xiao Guangrong wrote:
 - The RSV bit can be set in the error code when a #PF occurs
   only if CR4.PSE=1 or CR4.PAE=1
   
 - context->rsvd_bits_mask[1][0] is always 0
 
 Changelog:
 Moved this operation to reset_rsvds_bits_mask() to address Avi Kivity's suggestion
 
 Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
 ---
  arch/x86/kvm/mmu.c |   12 +---
  1 files changed, 9 insertions(+), 3 deletions(-)
 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index b137515..c49f8ec 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
  @@ -2288,18 +2288,26 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
  
   if (!is_nx(vcpu))
   exb_bit_rsvd = rsvd_bits(63, 63);
 +
  + context->rsvd_bits_mask[1][0] = 0;

So if the guest enables PAT at the PTE level you completely disable reserved
bit checking? You should only disable checking for [1][1] if !PSE.
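
[Editorial sketch] The masks being discussed here are built by KVM's rsvd_bits() helper in mmu.c, which sets bits s..e inclusive. Below is a Python transcription for reference; the helper name matches the kernel source, while the exb_bit_rsvd usage line merely mirrors the quoted hunk and is illustrative:

```python
# Sketch of KVM's rsvd_bits(s, e): a mask with bits s..e (inclusive) set,
# used to populate context->rsvd_bits_mask[level][index].
def rsvd_bits(s, e):
    return ((1 << (e - s + 1)) - 1) << s

# Without NX support, bit 63 of a PTE is reserved (the !is_nx() case above).
exb_bit_rsvd = rsvd_bits(63, 63)
```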


