Re: Timedrift in KVM guests after livemigration.

2010-04-18 Thread Dor Laor

On 04/18/2010 02:21 AM, Espen Berg wrote:

On 17.04.2010 22:17, Michael Tokarev wrote:

We have three KVM hosts that support live migration between them, but
one of our problems is time drift. The three frontends have different
CPU frequencies, and the KVM guests adopt the frequency of the host
machine where they were first started.

What do you mean by "adopts"? Note that CPU frequency
means nothing to modern operating systems, at least not
since the days of MS-DOS, which relied on CPU frequency
for its time functions. All interesting things are now
done using timers instead, and timers (which, again, don't
depend on CPU frequency) usually work quite well.


The assumption that the tick frequency was calculated from the host's
MHz was based on the fact that greater clock frequency differences
caused greater time drift: a 60 MHz difference caused about 24 min of
drift, and a 332 MHz difference caused about 2 h 25 min of drift.



What complicates things is that the cheapest sufficiently accurate
time source is the TSC (the time stamp counter register in
the CPU), but it will definitely be different on each
machine. To handle that, kvm 0.12.3 and kernel 2.6.32 (I think)
introduced a compensation. See, for example, the -tdf kvm option.


Ah, nice to know. :)


There are two different things here:
The issue that Espen is reporting is that the hosts have different
frequencies, and guests that rely on the TSC as a source clock will notice
that after migration. This is indeed a problem that -tdf does not solve.
-tdf only adds compensation for the RTC clock emulation.
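A minimal Python sketch of the effect Dor describes, assuming the guest converts raw TSC ticks to seconds with a frequency it calibrated once on the source host, while after migration the counter advances at the destination host's rate. This is a hypothetical model, not KVM or guest-kernel code; the function names and the 2.40 GHz / 2.34 GHz figures are illustrative and are not Espen's actual hosts.

```python
from fractions import Fraction


def guest_elapsed_seconds(true_elapsed_s, calibrated_hz, actual_hz):
    """Elapsed time as seen by a guest that divides raw TSC ticks by a
    frequency calibrated on the source host (hypothetical model).

    After migration the TSC advances at actual_hz (destination host),
    but the guest still divides by calibrated_hz. Fraction keeps it exact.
    """
    ticks = Fraction(true_elapsed_s) * actual_hz  # ticks counted on the destination
    return ticks / calibrated_hz                  # guest's (stale) conversion


def drift_seconds(true_elapsed_s, calibrated_hz, actual_hz):
    """Positive: guest clock runs slow. Negative: guest clock runs fast."""
    return true_elapsed_s - guest_elapsed_seconds(true_elapsed_s, calibrated_hz, actual_hz)


if __name__ == "__main__":
    day = 24 * 3600
    # Hypothetical hosts: calibrated at 2.40 GHz, migrated to a 2.34 GHz host.
    lag = drift_seconds(day, 2_400_000_000, 2_340_000_000)
    print(f"guest clock lags by {float(lag) / 60:.0f} min per day")  # 36 min
```

In this model the error grows linearly with elapsed time and with the relative frequency mismatch, which is consistent with the observation that larger MHz differences produced larger drift.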


What's the guest type and what's the guest's source clock?
Using the TSC directly as a source clock is not recommended because of this
migration issue (which is not solvable until we trap every rdtsc by the
guest). Using pv kvmclock in Linux mitigates this issue, since it exposes
both the TSC and the host clock so guests can adjust themselves.
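Roughly how a guest turns the kvmclock (pvclock) record into time: the host publishes a per-vCPU record holding a TSC timestamp, the host time at that timestamp, and a fixed-point TSC-to-nanosecond multiplier, and the guest scales its own TSC delta with it. The Python below is a simplified sketch modeled on the fields of Linux's `struct pvclock_vcpu_time_info` (`tsc_timestamp`, `system_time`, `tsc_to_system_mul`, `tsc_shift`); it ignores the version/retry protocol and the flags field, and the helper names are hypothetical. When the host republishes the record (for example on the destination after migration), the multiplier reflects that host's TSC rate, so the guest's arithmetic stays right without recalibrating.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


@dataclass
class PvclockTimeInfo:
    """Subset of the per-vCPU kvmclock record (version/flags omitted)."""
    tsc_timestamp: int      # guest TSC value when the host wrote the record
    system_time: int        # host time in ns at tsc_timestamp
    tsc_to_system_mul: int  # 32.32 fixed-point factor: (shifted) ticks -> ns
    tsc_shift: int          # shift applied to the TSC delta first (may be < 0)


def scale_delta(delta, mul_frac, shift):
    """ns = ((delta shifted by `shift`) * mul_frac) >> 32."""
    if shift < 0:
        delta >>= -shift
    else:
        delta = (delta << shift) & MASK64
    return (delta * mul_frac) >> 32


def kvmclock_read_ns(info, tsc_now):
    """Guest-side read: host time at the last update plus the scaled TSC delta."""
    delta = (tsc_now - info.tsc_timestamp) & MASK64
    return info.system_time + scale_delta(delta, info.tsc_to_system_mul, info.tsc_shift)


if __name__ == "__main__":
    # Hypothetical 2 GHz host: 1 tick = 0.5 ns, so mul = 0.5 * 2**32, shift 0.
    src = PvclockTimeInfo(tsc_timestamp=1_000, system_time=0,
                          tsc_to_system_mul=1 << 31, tsc_shift=0)
    print(kvmclock_read_ns(src, 1_000 + 2_000_000_000))  # prints 1000000000
```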


Several months ago a pvclock migration fix was added to pass the pvclock
MSR readings to the destination: 1a03675db146dfc760b3b48b3448075189f142cc






Since this is a cluster in production, I'm not able to try the latest
version either.

Well, that's a difficult one, no? It either works or it doesn't.
If you can't try anything else, why ask? :)


What I tried to say was that there are many important virtual servers
running on this cluster at the moment, so "trial and error" was not an
option. The last time we tried 0.12.x (during the initial tests of the
cluster) there were a lot of stability issues, crashes during migration,
etc.

Regards, Espen

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html




Re: Timedrift in KVM guests after livemigration.

2010-04-18 Thread Espen Berg

On 18.04.2010 11:22, Dor Laor wrote:

What do you mean by "adopts"? Note that CPU frequency
means nothing to modern operating systems, at least not
since the days of MS-DOS, which relied on CPU frequency
for its time functions. All interesting things are now
done using timers instead, and timers (which, again, don't
depend on CPU frequency) usually work quite well.

The assumption that the tick frequency was calculated from the host's
MHz was based on the fact that greater clock frequency differences
caused greater time drift: a 60 MHz difference caused about 24 min of
drift, and a 332 MHz difference caused about 2 h 25 min of drift.

What complicates things is that the cheapest sufficiently accurate
time source is the TSC (the time stamp counter register in
the CPU), but it will definitely be different on each
machine. To handle that, kvm 0.12.3 and kernel 2.6.32 (I think)
introduced a compensation. See, for example, the -tdf kvm option.

Ah, nice to know. :)

There are two different things here:
The issue that Espen is reporting is that the hosts have different
frequencies, and guests that rely on the TSC as a source clock will notice
that after migration. This is indeed a problem that -tdf does not solve.
-tdf only adds compensation for the RTC clock emulation.

What's the guest type and what's the guest's source clock?


All guests are Debian lenny with the latest upstream kernel, hvm/kvm.

We are using kvm-clock as the guest source clock.

cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock
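A small Python helper in the same spirit as the `cat` command above, reading both the active and the available clocksources from the same sysfs directory; the function name is hypothetical. On Linux, writing one of the listed names into `current_clocksource` as root switches the clocksource at runtime.

```python
from pathlib import Path

CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def clocksource_info(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names from sysfs."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__" and CLOCKSOURCE_DIR.is_dir():
    current, available = clocksource_info()
    print(f"current: {current}; available: {' '.join(available)}")
    if current == "tsc":
        # Raw TSC is the case that is fragile across hosts with different rates.
        print("warning: guest reads the TSC directly")
```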


Regards
Espen


Re: Timedrift in KVM guests after livemigration.

2010-04-18 Thread Gleb Natapov
On Sun, Apr 18, 2010 at 12:22:54PM +0300, Dor Laor wrote:
> On 04/18/2010 02:21 AM, Espen Berg wrote:
> >On 17.04.2010 22:17, Michael Tokarev wrote:
> >>>We have three KVM hosts that support live migration between them, but
> >>>one of our problems is time drift. The three frontends have different
> >>>CPU frequencies, and the KVM guests adopt the frequency of the host
> >>>machine where they were first started.
> >>What do you mean by "adopts"? Note that CPU frequency
> >>means nothing to modern operating systems, at least not
> >>since the days of MS-DOS, which relied on CPU frequency
> >>for its time functions. All interesting things are now
> >>done using timers instead, and timers (which, again, don't
> >>depend on CPU frequency) usually work quite well.
> >
> >The assumption that the tick frequency was calculated from the host's
> >MHz was based on the fact that greater clock frequency differences
> >caused greater time drift: a 60 MHz difference caused about 24 min of
> >drift, and a 332 MHz difference caused about 2 h 25 min of drift.
> >
> >
> >>What complicates things is that the cheapest sufficiently accurate
> >>time source is the TSC (the time stamp counter register in
> >>the CPU), but it will definitely be different on each
> >>machine. To handle that, kvm 0.12.3 and kernel 2.6.32 (I think)
> >>introduced a compensation. See, for example, the -tdf kvm option.
> >
> >Ah, nice to know. :)
> 
> There are two different things here:
> The issue that Espen is reporting is that the hosts have different
> frequencies, and guests that rely on the TSC as a source clock will
> notice that after migration. This is indeed a problem that -tdf does
> not solve. -tdf only adds compensation for the RTC clock emulation.
> 
It's -rtc-td-hack. -tdf does PIT compensation, but since the in-kernel
PIT is usually used, it does nothing.

> What's the guest type and what's the guest's source clock?
> Using the TSC directly as a source clock is not recommended because of
> this migration issue (which is not solvable until we trap every
> rdtsc by the guest). Using pv kvmclock in Linux mitigates this issue,
> since it exposes both the TSC and the host clock so guests can
> adjust themselves.
> 
> Several months ago a pvclock migration fix was added to pass the
> pvclock MSR readings to the destination:
> 1a03675db146dfc760b3b48b3448075189f142cc
> 
> 
> >
> >>>Since this is a cluster in production, I'm not able to try the latest
> >>>version either.
> >>Well, that's a difficult one, no? It either works or it doesn't.
> >>If you can't try anything else, why ask? :)
> >
> >What I tried to say was that there are many important virtual servers
> >running on this cluster at the moment, so "trial and error" was not an
> >option. The last time we tried 0.12.x (during the initial tests of the
> >cluster) there were a lot of stability issues, crashes during migration,
> >etc.
> >
> >Regards, Espen

--
Gleb.


Re: [Autotest] Autotest: Unattended_install testcase always fail with rhel3.9-32 guest

2010-04-18 Thread Lucas Meneghel Rodrigues
On Sat, 2010-04-17 at 22:55 -0600, David S. Ahern wrote:
> 
> On 04/17/2010 10:09 PM, Amos Kong wrote:
> > %post --interpreter /usr/bin/python
> > import socket, os
> > os.system('dhclient')
> > os.system('chkconfig sshd on')
> > os.system('iptables -F')
> > os.system('echo 0 > /selinux/enforce')
> > server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> > server.bind(('', 12323))
> > server.listen(1)
> > (client, addr) = server.accept()
> > client.send("done")
> > client.close()
> 
> So, effectively after the install completes use dhclient to configure a
> network address, start a server on a known port and when a client
> connects send the message "done". I would expect that to work just fine.

Me too; it has been working for RHEL 4.X, 5.X 32/64 bit and 3.X 64 bit.
The only problem has been 3.9 32 bit.

> What part is not working? Have you used anaconda's root shell (alt-f2)
> to confirm each step, and if so, which one is not set up as expected?

dhclient. It fails saying "module IP_... could not be loaded".
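For context, the %post script quoted above only signals completion: it brings up networking with dhclient, then waits on TCP port 12323 and sends "done" to whoever connects. The host side therefore needs something like the Python sketch below, which polls the guest until that message arrives. It is a hypothetical helper, not autotest's actual code. If dhclient fails inside the guest, as reported here for RHEL 3.9 32 bit, the connection never succeeds and this loop simply runs until its timeout.

```python
import socket
import time


def wait_for_install_done(host, port=12323, timeout=600.0, interval=5.0):
    """Poll the guest's %post listener until it sends b"done".

    Returns True on success, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval) as conn:
                chunks = []
                while True:
                    chunk = conn.recv(64)
                    if not chunk:  # peer closed the connection
                        break
                    chunks.append(chunk)
                if b"".join(chunks) == b"done":
                    return True
        except OSError:
            pass  # refused, unreachable or timed out: guest not ready yet
        time.sleep(interval)
    return False
```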

> David





[PATCH 2/8] KVM: PPC: Make Performance Counters work

2010-04-18 Thread Alexander Graf
When we get a performance counter interrupt we need to route it on to the
Linux handler after we get out of the guest context. We also need to tell
our handling code that this particular interrupt doesn't need treatment.

So let's add those two bits in, making perf work while having a KVM guest
running.
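The two bits of this patch can be modeled in a few lines of Python: the assembly exit path forwards certain interrupts to the Linux handler, and kvmppc_handle_exit() then just resumes the guest. This is a simplified, hypothetical model (function names are made up) using the standard Book3S vector offsets 0x700 (program), 0x900 (decrementer) and 0xf00 (performance monitor). Only the two vectors visible in the diff are listed; the diff context shows the real chain also covers other vectors.

```python
# Exception vector offsets (standard Book3S values; KVM names them
# BOOK3S_INTERRUPT_*).
BOOK3S_INTERRUPT_PROGRAM = 0x700
BOOK3S_INTERRUPT_DECREMENTER = 0x900
BOOK3S_INTERRUPT_PERFMON = 0xF00

# Exits the host kernel handles itself (subset of the cmpwi/beq chain).
LINUX_HANDLED = frozenset({BOOK3S_INTERRUPT_DECREMENTER, BOOK3S_INTERRUPT_PERFMON})

RESUME_GUEST = "RESUME_GUEST"


def handle_guest_exit(exit_nr, linux_handler, emulate):
    """Hypothetical model of the exit path after leaving guest context."""
    if exit_nr in LINUX_HANDLED:
        linux_handler(exit_nr)  # e.g. perf records the counter overflow
        return RESUME_GUEST     # nothing left for KVM to do
    return emulate(exit_nr)     # everything else goes to KVM's own handling
```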

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s.c|3 +++
 arch/powerpc/kvm/book3s_interrupts.S |2 ++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index a7de709..a03163b 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -872,6 +872,9 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
vcpu->stat.ext_intr_exits++;
r = RESUME_GUEST;
break;
+   case BOOK3S_INTERRUPT_PERFMON:
+   r = RESUME_GUEST;
+   break;
case BOOK3S_INTERRUPT_PROGRAM:
{
enum emulation_result er;
diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S
index f5b3358..e486193 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -228,6 +228,8 @@ no_dcbz32_off:
beq call_linux_handler
cmpwi   r12, BOOK3S_INTERRUPT_DECREMENTER
beq call_linux_handler
+   cmpwi   r12, BOOK3S_INTERRUPT_PERFMON
+   beq call_linux_handler
 
/* Back to EE=1 */
mtmsr   r6
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/8] KVM: PPC: Make Alignment interrupts work again

2010-04-18 Thread Alexander Graf
In the process of merging Book3S_32 and 64 I somehow ended up having the
alignment interrupt handler take last_inst, but the fetching code not
fetching it. So we ended up with stale last_inst values.

Let's just enable last_inst fetching for alignment interrupts too.
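A hypothetical Python model of the stale-value bug: the exit trampoline refreshes last_inst only for the exit vectors it checks, so any handler that reads last_inst for another vector sees whatever the previous fetching exit left behind. The vector offsets 0x600 (alignment) and 0x700 (program) are the standard Book3S values; the instruction words are arbitrary placeholders and the function name is made up.

```python
BOOK3S_INTERRUPT_ALIGNMENT = 0x600
BOOK3S_INTERRUPT_PROGRAM = 0x700


def make_exit_trampoline(fetch_vectors):
    """Return an exit hook that refreshes vcpu["last_inst"] only for the
    listed vectors, like the ld_last_inst branch (hypothetical model)."""
    def on_exit(vcpu, exit_nr, guest_text):
        if exit_nr in fetch_vectors:
            vcpu["last_inst"] = guest_text[vcpu["pc"]]
        return vcpu["last_inst"]  # what the exit handler will decode
    return on_exit


# Arbitrary 32-bit words standing in for guest instructions.
GUEST_TEXT = {0x100: 0x11111111, 0x104: 0x22222222}
```

Without the alignment vector in the set, an alignment exit at 0x104 decodes the word fetched by an earlier program exit at 0x100, which is the stale last_inst the patch describes.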

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_segment.S |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S
index 778e3fc..ede47fd 100644
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -196,6 +196,8 @@ kvmppc_handler_trampoline_exit:
beq ld_last_inst
cmpwi   r12, BOOK3S_INTERRUPT_PROGRAM
beq ld_last_inst
+   cmpwi   r12, BOOK3S_INTERRUPT_ALIGNMENT
+   beq-    ld_last_inst
 
b   no_ld_last_inst
 
-- 
1.6.0.2



[PATCH 6/8] KVM: PPC: Set VSID_PR also for Book3S_64

2010-04-18 Thread Alexander Graf
Book3S_64 didn't set VSID_PR when we're in PR=1. This led to pretty bad
behavior when searching for the shadow segment, as part of the code relied
on VSID_PR being set.

This patch fixes booting Book3S_64 guests.
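A minimal Python model of the change: when MSR_PR is set, the 64-bit esid_to_vsid now ORs VSID_PR into the returned VSID, as the diff shows. MSR_PR is bit 14 of the MSR; the VSID_PR value used here is an assumption (a single high bit, since the archived #define is truncated), and the function name is hypothetical. In this model, keying shadow segments by the tagged VSID keeps problem-state and supervisor lookups of the same guest segment apart.

```python
MSR_PR = 1 << 14   # MSR problem-state (user mode) bit
VSID_PR = 1 << 63  # assumed: a high tag bit; the archived #define is truncated


def esid_to_vsid_tagged(guest_vsid, msr):
    """Hypothetical model of the patched tail of esid_to_vsid."""
    if msr & MSR_PR:
        guest_vsid |= VSID_PR
    return guest_vsid


# Shadow segments keyed by tagged VSID: same guest segment, two entries.
shadow_segments = {
    esid_to_vsid_tagged(0x123, 0): "supervisor mapping",
    esid_to_vsid_tagged(0x123, MSR_PR): "problem-state mapping",
}
```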

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_64_mmu.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 612de6e..4025ea2 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -473,6 +473,9 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
break;
}
 
+   if (vcpu->arch.msr & MSR_PR)
+   *vsid |= VSID_PR;
+
return 0;
 }
 
-- 
1.6.0.2



[PATCH 0/8] Post-PPC32 series

2010-04-18 Thread Alexander Graf
While working with the PPC32 host target we finally have, I stumbled over
several things. Thanks to the now possible performance measurements I also
tracked down split mode as one of the major slowdowns to KVM.

What's left now that slows us down is the normal flushing code that needs
to move to a table based lookup and instruction emulation. On PPC32 guests
we waste about 70% of our time on emulating mfmsr, mtmsr, mfsprg, mtsprg
and friends.

Either way - this patch series deprecates the former performance counter
and u64 patch.

Avi / Marcelo, please apply the former series and this series. Ignore the
two patches in between.

Alexander Graf (8):
  KVM: PPC: Convert u64 -> ulong
  KVM: PPC: Make Performance Counters work
  KVM: PPC: Improve split mode
  KVM: PPC: Make Alignment interrupts work again
  KVM: PPC: Be more informative on BUG
  KVM: PPC: Set VSID_PR also for Book3S_64
  KVM: PPC: Fix Book3S_64 Host MMU debug output
  KVM: PPC: Find HTAB ourselves

 arch/powerpc/include/asm/kvm_book3s.h |   13 +--
 arch/powerpc/include/asm/kvm_host.h   |6 ++--
 arch/powerpc/kernel/ppc_ksyms.c   |5 
 arch/powerpc/kvm/book3s.c |   37 +---
 arch/powerpc/kvm/book3s_32_mmu.c  |   27 ++-
 arch/powerpc/kvm/book3s_32_mmu_host.c |   29 ++---
 arch/powerpc/kvm/book3s_64_mmu.c  |   34 +
 arch/powerpc/kvm/book3s_64_mmu_host.c |   36 ---
 arch/powerpc/kvm/book3s_interrupts.S  |2 +
 arch/powerpc/kvm/book3s_segment.S |2 +
 10 files changed, 108 insertions(+), 83 deletions(-)



[PATCH 8/8] KVM: PPC: Find HTAB ourselves

2010-04-18 Thread Alexander Graf
For KVM we need to find the location of the HTAB. We can either rely
on internal data structures of the kernel or ask the hardware.

Ben issued complaints about the internal data structure method, so
let's switch it to our own inquiry of the HTAB. Now we're fully
independent :-).
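The SDR1 decoding this patch adds can be sketched in Python. On 32-bit Book3S, the upper 16 bits of SDR1 hold the physical HTAB origin (HTABORG) and the low 9 bits the HTABMASK, which the patch widens into htabmask exactly as shown in the diff. The hash step (VSID XOR page index, shifted left by 6 because PTEGs are 64 bytes) is the architecture's primary hash and is an assumption here, since those lines fall outside the diff context. The patch stores the kernel virtual address via __va(); this sketch returns the physical address, and the function names are hypothetical.

```python
ESID_MASK = 0xF0000000  # 32-bit: the top 4 bits of an EA select the segment


def decode_sdr1(sdr1):
    """Split SDR1 into (HTABORG physical base, hash mask), as the patch does."""
    htaborg = sdr1 & 0xFFFF0000
    htabmask = ((sdr1 & 0x1FF) << 16) | 0xFFC0
    return htaborg, htabmask


def pteg_address(htaborg, htabmask, vsid, eaddr, primary=True):
    """Physical PTEG address for (vsid, eaddr)."""
    page = (eaddr & ~ESID_MASK & 0xFFFFFFFF) >> 12  # page index in the segment
    hash_ = ((vsid ^ page) << 6) & 0xFFFFFFFF       # << 6: PTEGs are 64 bytes
    if not primary:
        hash_ = ~hash_ & 0xFFFFFFFF                  # secondary hash
    return htaborg | (hash_ & htabmask)
```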

CC: Benjamin Herrenschmidt 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kernel/ppc_ksyms.c   |5 -
 arch/powerpc/kvm/book3s_32_mmu_host.c |   21 +
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 2b7c43f..bc9f39d 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -178,11 +178,6 @@ EXPORT_SYMBOL(switch_mmu_context);
 extern long mol_trampoline;
 EXPORT_SYMBOL(mol_trampoline); /* For MOL */
 EXPORT_SYMBOL(flush_hash_pages); /* For MOL */
-
-extern struct hash_pte *Hash;
-extern unsigned long _SDR1;
-EXPORT_SYMBOL_GPL(Hash); /* For KVM */
-EXPORT_SYMBOL_GPL(_SDR1); /* For KVM */
 #ifdef CONFIG_SMP
 extern int mmu_hash_lock;
 EXPORT_SYMBOL(mmu_hash_lock); /* For MOL */
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 2bb67e6..0bb6600 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -54,6 +54,9 @@
 #error Only 32 bit pages are supported for now
 #endif
 
+static ulong htab;
+static u32 htabmask;
+
 static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
 {
volatile u32 *pteg;
@@ -217,14 +220,11 @@ static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid)
return NULL;
 }
 
-extern struct hash_pte *Hash;
-extern unsigned long _SDR1;
-
 static u32 *kvmppc_mmu_get_pteg(struct kvm_vcpu *vcpu, u32 vsid, u32 eaddr,
bool primary)
 {
-   u32 page, hash, htabmask;
-   ulong pteg = (ulong)Hash;
+   u32 page, hash;
+   ulong pteg = htab;
 
page = (eaddr & ~ESID_MASK) >> 12;
 
@@ -232,13 +232,12 @@ static u32 *kvmppc_mmu_get_pteg(struct kvm_vcpu *vcpu, u32 vsid, u32 eaddr,
if (!primary)
hash = ~hash;
 
-   htabmask = ((_SDR1 & 0x1FF) << 16) | 0xFFC0;
hash &= htabmask;
 
pteg |= hash;
 
-   dprintk_mmu("htab: %p | hash: %x | htabmask: %x | pteg: %lx\n",
-   Hash, hash, htabmask, pteg);
+   dprintk_mmu("htab: %lx | hash: %x | htabmask: %x | pteg: %lx\n",
+   htab, hash, htabmask, pteg);
 
return (u32*)pteg;
 }
@@ -453,6 +452,7 @@ int kvmppc_mmu_init(struct kvm_vcpu *vcpu)
 {
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
int err;
+   ulong sdr1;
 
err = __init_new_context();
if (err < 0)
@@ -474,5 +474,10 @@ int kvmppc_mmu_init(struct kvm_vcpu *vcpu)
 
vcpu3s->vsid_next = vcpu3s->vsid_first;
 
+   /* Remember where the HTAB is */
+   asm ( "mfsdr1 %0" : "=r"(sdr1) );
+   htabmask = ((sdr1 & 0x1FF) << 16) | 0xFFC0;
+   htab = (ulong)__va(sdr1 & 0xffff0000);
+
return 0;
 }
-- 
1.6.0.2



[PATCH 5/8] KVM: PPC: Be more informative on BUG

2010-04-18 Thread Alexander Graf
We have a condition in the ppc64 host mmu code that should never occur.
Unfortunately, it just did happen to me and I was rather puzzled as to why,
because BUG_ON doesn't tell me anything useful.

So let's add some more debug output in case this goes wrong. Also change
BUG to WARN, since I don't want to reboot every time I mess something up.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_64_mmu_host.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 41af12f..5bf91a7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -231,10 +231,15 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
vcpu->arch.mmu.esid_to_vsid(vcpu, orig_pte->eaddr >> SID_SHIFT, &vsid);
map = find_sid_vsid(vcpu, vsid);
if (!map) {
-   kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr);
+   ret = kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr);
+   WARN_ON(ret < 0);
map = find_sid_vsid(vcpu, vsid);
}
-   BUG_ON(!map);
+   if (!map) {
+   printk(KERN_ERR "KVM: Segment map for 0x%llx (0x%lx) failed\n",
+   vsid, orig_pte->eaddr);
+   WARN();
+   }
 
vsid = map->host_vsid;
va = hpt_va(orig_pte->eaddr, vsid, MMU_SEGSIZE_256M);
-- 
1.6.0.2



[PATCH 1/8] KVM: PPC: Convert u64 -> ulong

2010-04-18 Thread Alexander Graf
There are some pieces in the code that I overlooked that still use
u64s instead of longs. This slows down 32 bit hosts unnecessarily, so
let's just move them to ulong.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - don't touch vsid - that stays u64!
---
 arch/powerpc/include/asm/kvm_book3s.h |4 ++--
 arch/powerpc/include/asm/kvm_host.h   |6 +++---
 arch/powerpc/kvm/book3s.c |6 +++---
 arch/powerpc/kvm/book3s_32_mmu.c  |6 +++---
 arch/powerpc/kvm/book3s_32_mmu_host.c |8 +++-
 arch/powerpc/kvm/book3s_64_mmu.c  |4 ++--
 arch/powerpc/kvm/book3s_64_mmu_host.c |6 +++---
 7 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 9517b8d..5d3bd0c 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -107,9 +107,9 @@ struct kvmppc_vcpu_book3s {
 #define VSID_BAT   0x7fb0ULL
 #define VSID_PR0x8000ULL
 
-extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 ea, u64 ea_mask);
+extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong ea, ulong ea_mask);
 extern void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 vp, u64 vp_mask);
-extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, u64 pa_start, u64 pa_end);
+extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end);
 extern void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 new_msr);
 extern void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu);
 extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu);
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 5a83995..0c9ad86 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -124,9 +124,9 @@ struct kvm_arch {
 };
 
 struct kvmppc_pte {
-   u64 eaddr;
+   ulong eaddr;
u64 vpage;
-   u64 raddr;
+   ulong raddr;
bool may_read   : 1;
bool may_write  : 1;
bool may_execute: 1;
@@ -145,7 +145,7 @@ struct kvmppc_mmu {
int  (*xlate)(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data);
void (*reset_msr)(struct kvm_vcpu *vcpu);
void (*tlbie)(struct kvm_vcpu *vcpu, ulong addr, bool large);
-   int  (*esid_to_vsid)(struct kvm_vcpu *vcpu, u64 esid, u64 *vsid);
+   int  (*esid_to_vsid)(struct kvm_vcpu *vcpu, ulong esid, u64 *vsid);
u64  (*ea_to_vp)(struct kvm_vcpu *vcpu, gva_t eaddr, bool data);
bool (*is_dcbz32)(struct kvm_vcpu *vcpu);
 };
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 5805f99..a7de709 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -812,12 +812,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 * so we can't use the NX bit inside the guest. Let's cross our fingers,
 * that no guest that needs the dcbz hack does NX.
 */
-   kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), ~0xFFFULL);
+   kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), ~0xFFFUL);
r = RESUME_GUEST;
} else {
vcpu->arch.msr |= to_svcpu(vcpu)->shadow_srr1 & 0x5800;
kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
-   kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), ~0xFFFULL);
+   kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), ~0xFFFUL);
r = RESUME_GUEST;
}
break;
@@ -843,7 +843,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
vcpu->arch.dear = dar;
to_book3s(vcpu)->dsisr = to_svcpu(vcpu)->fault_dsisr;
kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
-   kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFULL);
+   kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFUL);
r = RESUME_GUEST;
}
break;
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 48efb37..33186b7 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -60,7 +60,7 @@ static inline bool check_debug_ip(struct kvm_vcpu *vcpu)
 
 static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
  struct kvmppc_pte *pte, bool data);
-static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
+static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
 u64 *vsid);
 
 static struct kvmppc_sr *find_sr(struct kvmppc_vcpu_book3s *vcpu_book3s

[PATCH 7/8] KVM: PPC: Fix Book3S_64 Host MMU debug output

2010-04-18 Thread Alexander Graf
We have some debug output in Book3S_64. Some of it was invalid, though,
and some of it did not even compile because it accessed incorrect variables.

So let's fix that up, making debugging more fun again.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_64_mmu_host.c |   23 ++-
 1 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 5bf91a7..e4b5744 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -48,8 +48,8 @@
 
 static void invalidate_pte(struct hpte_cache *pte)
 {
-   dprintk_mmu("KVM: Flushing SPT %d: 0x%llx (0x%llx) -> 0x%llx\n",
-   i, pte->pte.eaddr, pte->pte.vpage, pte->host_va);
+   dprintk_mmu("KVM: Flushing SPT: 0x%lx (0x%llx) -> 0x%llx\n",
+   pte->pte.eaddr, pte->pte.vpage, pte->host_va);
 
ppc_md.hpte_invalidate(pte->slot, pte->host_va,
   MMU_PAGE_4K, MMU_SEGSIZE_256M,
@@ -66,7 +66,7 @@ void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong ea_mask)
 {
int i;
 
-   dprintk_mmu("KVM: Flushing %d Shadow PTEs: 0x%llx & 0x%llx\n",
+   dprintk_mmu("KVM: Flushing %d Shadow PTEs: 0x%lx & 0x%lx\n",
vcpu->arch.hpte_cache_offset, guest_ea, ea_mask);
BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
 
@@ -114,8 +114,8 @@ void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end)
 {
int i;
 
-   dprintk_mmu("KVM: Flushing %d Shadow pPTEs: 0x%llx & 0x%llx\n",
-   vcpu->arch.hpte_cache_offset, guest_pa, pa_mask);
+   dprintk_mmu("KVM: Flushing %d Shadow pPTEs: 0x%lx & 0x%lx\n",
+   vcpu->arch.hpte_cache_offset, pa_start, pa_end);
BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
 
for (i = 0; i < vcpu->arch.hpte_cache_offset; i++) {
@@ -186,7 +186,7 @@ static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid)
sid_map_mask = kvmppc_sid_hash(vcpu, gvsid);
map = &to_book3s(vcpu)->sid_map[sid_map_mask];
if (map->guest_vsid == gvsid) {
-   dprintk_slb("SLB: Searching 0x%llx -> 0x%llx\n",
+   dprintk_slb("SLB: Searching: 0x%llx -> 0x%llx\n",
gvsid, map->host_vsid);
return map;
}
@@ -198,7 +198,8 @@ static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid)
return map;
}
 
-   dprintk_slb("SLB: Searching 0x%llx -> not found\n", gvsid);
+   dprintk_slb("SLB: Searching %d/%d: 0x%llx -> not found\n",
+   sid_map_mask, SID_MAP_MASK - sid_map_mask, gvsid);
return NULL;
 }
 
@@ -238,7 +239,8 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
if (!map) {
printk(KERN_ERR "KVM: Segment map for 0x%llx (0x%lx) failed\n",
vsid, orig_pte->eaddr);
-   WARN();
+   WARN_ON(true);
+   return -EINVAL;
}
 
vsid = map->host_vsid;
@@ -274,7 +276,7 @@ map_again:
int hpte_id = kvmppc_mmu_hpte_cache_next(vcpu);
struct hpte_cache *pte = &vcpu->arch.hpte_cache[hpte_id];
 
-   dprintk_mmu("KVM: %c%c Map 0x%llx: [%lx] 0x%lx (0x%llx) -> %lx\n",
+   dprintk_mmu("KVM: %c%c Map 0x%lx: [%lx] 0x%lx (0x%llx) -> %lx\n",
((rflags & HPTE_R_PP) == 3) ? '-' : 'w',
(rflags & HPTE_R_N) ? '-' : 'x',
orig_pte->eaddr, hpteg, va, orig_pte->vpage, hpaddr);
@@ -330,6 +332,9 @@ static struct kvmppc_sid_map *create_sid_map(struct kvm_vcpu *vcpu, u64 gvsid)
map->guest_vsid = gvsid;
map->valid = true;
 
+   dprintk_slb("SLB: New mapping at %d: 0x%llx -> 0x%llx\n",
+   sid_map_mask, gvsid, map->host_vsid);
+
return map;
 }
 
-- 
1.6.0.2



[PATCH 3/8] KVM: PPC: Improve split mode

2010-04-18 Thread Alexander Graf
When in split mode, instruction relocation and data relocation are not equal.

So far we implemented this mode by reserving a special pseudo-VSID for the
two cases and flushing all PTEs when going into split mode, which is slow.

Unfortunately 32bit Linux and Mac OS X use split mode extensively. So to not
slow down things too much, I came up with a different idea: Mark the split
mode with a bit in the VSID and then treat it like any other segment.

This means we can just flush the shadow segment cache, but keep the PTEs
intact. I verified that this works with ppc32 Linux and Mac OS X 10.4
guests and does speed them up.
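The idea can be modeled in Python: the patched kvmppc_set_msr() flushes the segment cache whenever PR, IR or DR changes (mirroring the new condition in the diff), and split-mode translations carry a direction tag in the VSID, so their shadow PTEs cannot collide with other modes and can be kept. MSR_DR, MSR_IR and MSR_PR are bits 4, 5 and 14 of the MSR; the tag values are assumptions (the archived #defines are truncated), real mode is not modeled, and the function names are hypothetical.

```python
MSR_DR = 1 << 4   # data relocation enabled
MSR_IR = 1 << 5   # instruction relocation enabled
MSR_PR = 1 << 14  # problem state

# Assumed tag bits above the guest VSID.
VSID_REAL_DR = 1 << 61
VSID_REAL_IR = 1 << 62


def needs_segment_flush(old_msr, new_msr):
    """True when PR, IR or DR changed: only the segment cache is flushed."""
    return bool((old_msr ^ new_msr) & (MSR_PR | MSR_IR | MSR_DR))


def split_mode_vsid(guest_vsid, msr):
    """Hypothetical tagging: each split-mode direction gets its own VSID space."""
    mode = msr & (MSR_IR | MSR_DR)
    if mode == MSR_DR:
        return guest_vsid | VSID_REAL_DR
    if mode == MSR_IR:
        return guest_vsid | VSID_REAL_IR
    if mode == (MSR_IR | MSR_DR):
        return guest_vsid  # fully relocated: a normal segment
    raise ValueError("real mode uses a fixed pseudo-VSID; not modeled here")


# Shadow PTEs keyed by (tagged VSID, page) survive mode switches: an entry
# made in DR-only mode cannot alias an IR-only or fully relocated one.
shadow_ptes = {(split_mode_vsid(0x42, MSR_DR), 0x5): "host page A"}
```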

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h |9 -
 arch/powerpc/kvm/book3s.c |   28 ++--
 arch/powerpc/kvm/book3s_32_mmu.c  |   21 +
 arch/powerpc/kvm/book3s_64_mmu.c  |   27 +++
 4 files changed, 46 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 5d3bd0c..6f74d93 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -100,11 +100,10 @@ struct kvmppc_vcpu_book3s {
 #define CONTEXT_GUEST  1
 #define CONTEXT_GUEST_END  2
 
-#define VSID_REAL_DR   0x7ff0ULL
-#define VSID_REAL_IR   0x7fe0ULL
-#define VSID_SPLIT_MASK0x7fe0ULL
-#define VSID_REAL  0x7fc0ULL
-#define VSID_BAT   0x7fb0ULL
+#define VSID_REAL  0x1fc0ULL
+#define VSID_BAT   0x1fb0ULL
+#define VSID_REAL_DR   0x2000ULL
+#define VSID_REAL_IR   0x4000ULL
 #define VSID_PR0x8000ULL
 
extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong ea, ulong ea_mask);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index a03163b..91dc42d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -147,16 +147,8 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
}
}
 
-   if (((vcpu->arch.msr & (MSR_IR|MSR_DR)) != (old_msr & (MSR_IR|MSR_DR))) ||
-   (vcpu->arch.msr & MSR_PR) != (old_msr & MSR_PR)) {
-   bool dr = (vcpu->arch.msr & MSR_DR) ? true : false;
-   bool ir = (vcpu->arch.msr & MSR_IR) ? true : false;
-
-   /* Flush split mode PTEs */
-   if (dr != ir)
-   kvmppc_mmu_pte_vflush(vcpu, VSID_SPLIT_MASK,
- VSID_SPLIT_MASK);
-
+   if ((vcpu->arch.msr & (MSR_PR|MSR_IR|MSR_DR)) !=
+  (old_msr & (MSR_PR|MSR_IR|MSR_DR))) {
kvmppc_mmu_flush_segments(vcpu);
kvmppc_mmu_map_segment(vcpu, kvmppc_get_pc(vcpu));
}
@@ -534,6 +526,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
bool is_mmio = false;
bool dr = (vcpu->arch.msr & MSR_DR) ? true : false;
bool ir = (vcpu->arch.msr & MSR_IR) ? true : false;
+   u64 vsid;
 
relocated = data ? dr : ir;
 
@@ -551,13 +544,20 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
case 0:
-   pte.vpage |= VSID_REAL;
+   pte.vpage |= ((u64)VSID_REAL << (SID_SHIFT - 12));
break;
case MSR_DR:
-   pte.vpage |= VSID_REAL_DR;
-   break;
case MSR_IR:
-   pte.vpage |= VSID_REAL_IR;
+   vcpu->arch.mmu.esid_to_vsid(vcpu, eaddr >> SID_SHIFT, &vsid);
+
+   if ((vcpu->arch.msr & (MSR_DR|MSR_IR)) == MSR_DR)
+   pte.vpage |= ((u64)VSID_REAL_DR << (SID_SHIFT - 12));
+   else
+   pte.vpage |= ((u64)VSID_REAL_IR << (SID_SHIFT - 12));
+   pte.vpage |= vsid;
+
+   if (vsid == -1)
+   page_found = -EINVAL;
break;
}
 
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 33186b7..0b10503 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -330,30 +330,35 @@ static void kvmppc_mmu_book3s_32_tlbie(struct kvm_vcpu *vcpu, ulong ea, bool lar
 static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
 u64 *vsid)
 {
+   ulong ea = esid << SID_SHIFT;
+   struct kvmppc_sr *sr;
+   u64 gvsid = esid;
+
+   if (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+   sr = find_sr(to_book3s(vcpu), ea);
+   if (sr->valid)
+   gvsid = sr->vsid;
+   }
+
/* In case we only have one of MSR_IR or MSR_DR set, let's put
   that in the real-mode context (and hope RM doesn't access
   high memory) */
switch (vcpu
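For readers following the VSID rework, a minimal userspace sketch of what the book3s.c hunks above change. The VSID_* values are copied from the patch; the MSR_PR/MSR_IR/MSR_DR bit positions and SID_SHIFT = 28 (256 MB segments) are assumptions for illustration, and needs_segment_flush() and tag_vpage() are hypothetical helpers that mirror, rather than reproduce, kvmppc_set_msr() and kvmppc_handle_pagefault().

```c
#include <stdint.h>

/* Assumed PowerPC MSR bit positions (PR = bit 14, IR = bit 5, DR = bit 4). */
#define MSR_PR (1ULL << 14)
#define MSR_IR (1ULL << 5)
#define MSR_DR (1ULL << 4)

/* New VSID tags, as defined by the patch. */
#define VSID_REAL    0x1fc0ULL
#define VSID_REAL_DR 0x2000ULL
#define VSID_REAL_IR 0x4000ULL

#define SID_SHIFT 28 /* assumed: 256 MB segments */

/* Mirrors the simplified check: flush segments if PR, IR or DR changed. */
static int needs_segment_flush(uint64_t old_msr, uint64_t new_msr)
{
    const uint64_t mask = MSR_PR | MSR_IR | MSR_DR;

    return (old_msr & mask) != (new_msr & mask);
}

/* Mirrors the new vpage tagging; vsid stands for what esid_to_vsid()
 * would have returned and is simply passed in here. */
static uint64_t tag_vpage(uint64_t vpage, uint64_t msr, uint64_t vsid)
{
    switch (msr & (MSR_DR | MSR_IR)) {
    case 0:                     /* real mode */
        vpage |= VSID_REAL << (SID_SHIFT - 12);
        break;
    case MSR_DR:                /* split mode: only one of DR/IR set */
    case MSR_IR:
        if ((msr & (MSR_DR | MSR_IR)) == MSR_DR)
            vpage |= VSID_REAL_DR << (SID_SHIFT - 12);
        else
            vpage |= VSID_REAL_IR << (SID_SHIFT - 12);
        vpage |= vsid;
        break;
    }
    return vpage;               /* fully translated: left untouched here */
}
```

Because VSID_REAL_DR and VSID_REAL_IR are now single bits above the VSID_REAL range, a split-mode tag can be ORed together with the guest VSID instead of replacing it, which is presumably why the separate VSID_SPLIT_MASK flush could be dropped.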

Re: [PATCH 5/8] KVM: PPC: Be more informative on BUG

2010-04-18 Thread Jim Paris
Alexander Graf wrote:
> We have a condition in the ppc64 host mmu code that should never occur.
> Unfortunately, it just did happen to me and I was rather puzzled on why,
> because BUG_ON doesn't tell me anything useful.
> 
> So let's add some more debug output in case this goes wrong. Also change
> BUG to WARN, since I don't want to reboot every time I mess something up.
> 
> Signed-off-by: Alexander Graf 
> ---
>  arch/powerpc/kvm/book3s_64_mmu_host.c |    9 +++++++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
> index 41af12f..5bf91a7 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_host.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
> @@ -231,10 +231,15 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
>   vcpu->arch.mmu.esid_to_vsid(vcpu, orig_pte->eaddr >> SID_SHIFT, &vsid);
>   map = find_sid_vsid(vcpu, vsid);
>   if (!map) {
> - kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr);
> + ret = kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr);
> + WARN_ON(ret < 0);
>   map = find_sid_vsid(vcpu, vsid);
>   }
> - BUG_ON(!map);
> + if (!map) {
> + printk(KERN_ERR "KVM: Segment map for 0x%llx (0x%lx) failed\n",
> + vsid, orig_pte->eaddr);
> + WARN();

Return here, otherwise dereferencing map in the next line will crash anyway...

> + }
>  
>   vsid = map->host_vsid;
>   va = hpt_va(orig_pte->eaddr, vsid, MMU_SEGSIZE_256M);
> -- 
> 1.6.0.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-jim
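Jim's point, sketched in userspace: once the lookup is allowed to fail, the error path has to leave the function before map is dereferenced. struct sid_map, find_map(), map_page() and the -EFAULT return value are hypothetical stand-ins, not the kvmppc code. Note also that the kernel's WARN() macro expects a condition plus a printk-style format (e.g. WARN(1, "...")), so a bare WARN() does not build cleanly; WARN_ON(1) is the simpler form.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a guest-VSID -> host-VSID segment mapping. */
struct sid_map {
    unsigned long long guest_vsid;
    unsigned long long host_vsid;
};

/* Hypothetical table standing in for the per-vcpu map that find_sid_vsid() searches. */
static struct sid_map maps[] = { { 0x10, 0x9010 }, { 0x20, 0x9020 } };

static struct sid_map *find_map(unsigned long long vsid)
{
    for (size_t i = 0; i < sizeof(maps) / sizeof(maps[0]); i++)
        if (maps[i].guest_vsid == vsid)
            return &maps[i];
    return NULL;
}

/* Report and bail out instead of BUG_ON(!map); map is never touched when NULL. */
static int map_page(unsigned long long vsid, unsigned long eaddr,
                    unsigned long long *host_vsid)
{
    struct sid_map *map = find_map(vsid);

    if (!map) {
        fprintf(stderr, "Segment map for 0x%llx (0x%lx) failed\n", vsid, eaddr);
        return -EFAULT; /* in the kernel this would sit next to WARN_ON(1) */
    }
    *host_vsid = map->host_vsid;
    return 0;
}
```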


Re: [PATCH 5/8] KVM: PPC: Be more informative on BUG

2010-04-18 Thread Alexander Graf

On 19.04.2010, at 03:01, Jim Paris wrote:

> Alexander Graf wrote:
>> We have a condition in the ppc64 host mmu code that should never occur.
>> Unfortunately, it just did happen to me and I was rather puzzled on why,
>> because BUG_ON doesn't tell me anything useful.
>> 
>> So let's add some more debug output in case this goes wrong. Also change
>> BUG to WARN, since I don't want to reboot every time I mess something up.
>> 
>> Signed-off-by: Alexander Graf 
>> ---
>> arch/powerpc/kvm/book3s_64_mmu_host.c |    9 +++++++--
>> 1 files changed, 7 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
>> index 41af12f..5bf91a7 100644
>> --- a/arch/powerpc/kvm/book3s_64_mmu_host.c
>> +++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
>> @@ -231,10 +231,15 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
>>  vcpu->arch.mmu.esid_to_vsid(vcpu, orig_pte->eaddr >> SID_SHIFT, &vsid);
>>  map = find_sid_vsid(vcpu, vsid);
>>  if (!map) {
>> -kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr);
>> +ret = kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr);
>> +WARN_ON(ret < 0);
>>  map = find_sid_vsid(vcpu, vsid);
>>  }
>> -BUG_ON(!map);
>> +if (!map) {
>> +printk(KERN_ERR "KVM: Segment map for 0x%llx (0x%lx) failed\n",
>> +vsid, orig_pte->eaddr);
>> +WARN();
> 
> Return here, otherwise dereferencing map in the next line will crash anyway...

Very true. In fact, I distinctly remember putting a return and a
WARN_ON(true) there, because WARN() gave me a warning here. I wonder where
that code went ... hrm ...
Either way, thanks for looking over this patch!


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] KVM: PPC: Be more informative on BUG

2010-04-18 Thread Alexander Graf

On 19.04.2010, at 03:07, Alexander Graf wrote:

> 
> On 19.04.2010, at 03:01, Jim Paris wrote:
> 
>> Alexander Graf wrote:
>>> We have a condition in the ppc64 host mmu code that should never occur.
>>> Unfortunately, it just did happen to me and I was rather puzzled on why,
>>> because BUG_ON doesn't tell me anything useful.
>>> 
>>> So let's add some more debug output in case this goes wrong. Also change
>>> BUG to WARN, since I don't want to reboot every time I mess something up.
>>> 
>>> Signed-off-by: Alexander Graf 
>>> ---
>>> arch/powerpc/kvm/book3s_64_mmu_host.c |    9 +++++++--
>>> 1 files changed, 7 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
>>> index 41af12f..5bf91a7 100644
>>> --- a/arch/powerpc/kvm/book3s_64_mmu_host.c
>>> +++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
>>> @@ -231,10 +231,15 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
>>> vcpu->arch.mmu.esid_to_vsid(vcpu, orig_pte->eaddr >> SID_SHIFT, &vsid);
>>> map = find_sid_vsid(vcpu, vsid);
>>> if (!map) {
>>> -   kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr);
>>> +   ret = kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr);
>>> +   WARN_ON(ret < 0);
>>> map = find_sid_vsid(vcpu, vsid);
>>> }
>>> -   BUG_ON(!map);
>>> +   if (!map) {
>>> +   printk(KERN_ERR "KVM: Segment map for 0x%llx (0x%lx) failed\n",
>>> +   vsid, orig_pte->eaddr);
>>> +   WARN();
>> 
>> Return here, otherwise dereferencing map in the next line will crash anyway...
> 
> Very true. In fact, I distinctly remember putting a return and a
> WARN_ON(true) there, because WARN() gave me a warning here. I wonder where
> that code went ... hrm ...
> Either way, thanks for looking over this patch!

Ugh - I messed up my patch rebasing. The hunk to put the return there is 
included in 6/8.

Oh well, I don't think that's enough of a bummer for a resubmit. Avi, please 
apply this set nevertheless.


Alex



RE: VM performance issue in KVM guests.

2010-04-18 Thread Zhang, Xiantao
Srivatsa Vaddagiri wrote:
> On Thu, Apr 15, 2010 at 03:33:18PM +0200, Peter Zijlstra wrote:
>> On Thu, 2010-04-15 at 11:18 +0300, Avi Kivity wrote:
>>> 
>>> Certainly that has even greater potential for Linux guests.  Note
>>> that we spin on mutexes now, so we need to prevent preemption while
>>> the lock owner is running.
>> 
>> either that, or disable spinning on (para) virt kernels. Para virt
>> kernels could possibly extend the thing by also checking to see if
>> the owner's vcpu is running.
> 
> I suspect we will need a combination of both approaches, given that
> we will not be able to avoid preempting guests in their critical
> section always (too long critical sections or real-time tasks wanting
> to preempt). Other idea is to gang-schedule VCPUs of the same guest
> as much as possible? 
Gang-scheduling may be the ideal solution to this issue, but implementing it
would require substantial changes to the host's scheduler, and it may be hard
to get upstream. So can we figure out a simpler (if not the best) way to handle this?
Xiantao


Re: [Autotest] Autotest: Unattended_install testcase always fail with rhel3.9-32 guest

2010-04-18 Thread David S. Ahern


On 04/18/2010 12:26 PM, Lucas Meneghel Rodrigues wrote:
> On Sat, 2010-04-17 at 22:55 -0600, David S. Ahern wrote:
>>
>> On 04/17/2010 10:09 PM, Amos Kong wrote:
>>> %post --interpreter /usr/bin/python
>>> import socket, os
>>> os.system('dhclient')
>>> os.system('chkconfig sshd on')
>>> os.system('iptables -F')
>>> os.system('echo 0 > /selinux/enforce')
>>> server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> server.bind(('', 12323))
>>> server.listen(1)
>>> (client, addr) = server.accept()
>>> client.send("done")
>>> client.close()
>>
>> So, effectively after the install completes use dhclient to configure a
>> network address, start a server on a known port and when a client
>> connects send the message "done". I would expect that to work just fine.
> 
> Me too, it has been working for RHEL 4.X, 5.X 32/64 bit and 3.X 64 bit.
> The problem has been effectively 3.9 32 bit.

I fired up a 3.9 guest with your ks.cfg. The problem is due to the
limited functionality in the RHEL3 BOOT kernel for i386. Specifically,
dhclient is failing at:

setsockopt(6, SOL_SOCKET, SO_ATTACH_FILTER, "\v\0\6\10\240Y\n\10", 8) = -1 ENOPROTOOPT

So dhclient is out. But you can still configure and use networking via
ifconfig if static addressing is an option for you. I was able to use
ifconfig to configure eth0 and push a strace output file for dhclient
out of the guest.

Also, a couple of comments on this use case:
- SELinux is not applicable
- 32 GB of RAM is way beyond what the RHEL3 i386 kernel can detect and use
- 12 vcpus seems high as well.

David


> 
>> What part is not working? Have you used anaconda's root shell (alt-f2)
>> to confirm each step and if so which one is not setup as expected?
> 
> dhclient. It fails saying "module IP_... could not be loaded.
> 
>> David
> 
> 


[PATCH V5 0/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-18 Thread Zhang, Yanmin
Here is the new patch of V5 against tip/master of April 17th
if anyone wants to try it.

ChangeLog V5:
1) Split the kernel patch into 2 parts. One introduces
perf_guest_info_callbacks() and the related register/unregister
functions. The other is the kvm implementation of the callbacks.
2) Port to the tip/master tree of April 17th.
3) Fix a bug that caused module parsing of the default guest kernel
to fail.

ChangeLog V4:
1) Based on Ingo's comments, I added help information around kvm,
such as command-list.txt and perf-kvm.txt.
2) Added the guest process id at the tail of the kernel dso long name, so
the display can show a different label for each guest os.
3) Based on Avi's comments, closed the race window that could attribute
an NMI to the guest os even though the NMI did not occur in the guest os.
4) Fixed all the errors and warnings reported by scripts/checkpatch.pl.
5) Fixed a compilation error pointed out by Yang Sheng.

ChangeLog V3:
1) Add the --guestmount=/dir/to/all/guestos parameter. The admin mounts the
guest os root directories under /dir/to/all/guestos with sshfs. For example,
I start 2 guest os instances. The one's pid is  and the other's is .
#mkdir ~/guestmount; cd ~/guestmount
#sshfs -o allow_other,direct_io -p 5551 localhost:/ /
#sshfs -o allow_other,direct_io -p 5552 localhost:/ /
#perf kvm --host --guest --guestmount=~/guestmount top

The old --guestkallsyms and --guestmodules options are still supported as
the default guest os symbol parsing.

2) Add guest os buildid support.
3) Add sub command 'perf kvm buildid-list'.
4) Delete sub command 'perf kvm stat', because our current implementation
doesn't pass the guest/host requirement to the kernel, and the kernel always
collects both host and guest statistics. So regular 'perf stat' is ok.
5) Fix a couple of perf bugs.
6) We still have no support for commands with parameter 'any', as current
KVM just uses the process id to identify a specific guest os instance. Users
can use parameter -p to collect statistics for a specific guest os instance.

ChangeLog V2:
1) Based on Avi's suggestion, I moved callback functions
to generic code area. So the kernel part of the patch is
clearer.
2) Add 'perf kvm stat'.


From: Zhang, Yanmin 

Based on the discussion in the KVM community, I worked out the patch to let
perf collect guest os statistics from the host side. This patch was implemented
with kind help from Ingo, Peter and some others. Yang Sheng pointed out a
critical bug and, along with others, provided good suggestions. I really
appreciate their kind help.

The patch adds a new sub command, kvm, to perf.

  perf kvm top
  perf kvm record
  perf kvm report
  perf kvm diff
  perf kvm buildid-list

The new perf can profile the guest os kernel but not guest os user space;
it can, however, summarize guest os user space utilization per guest os.

Below are some examples.
1) perf kvm top
[r...@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms --guestmodules=/home/ymzhang/guest/modules top

---
   PerfTop:   16024 irqs/sec  kernel: 2.6% us: 0.6% guest kernel:76.2% guest us:20.6% exact:  0.0% [1000Hz cycles],  (all, 16 CPUs)
---

 samples  pcnt function                  DSO
 _______  ____ ________________________ _______________________

 3740.00  8.0% __ticket_spin_lock        [guest.kernel.kallsyms]
 2056.00  4.4% copy_user_generic_string [guest.kernel.kallsyms]
 1412.00  3.0% resource_string          [guest.kernel.kallsyms]
  595.00  1.3% __switch_to              [guest.kernel.kallsyms]
  586.00  1.2% __d_lookup               [guest.kernel.kallsyms]
  574.00  1.2% tcp_sendmsg              [guest.kernel.kallsyms]
  565.00  1.2% kmem_cache_alloc         [guest.kernel.kallsyms]
  532.00  1.1% tcp_ack                  [guest.kernel.kallsyms]
  494.00  1.1% __kmalloc                [guest.kernel.kallsyms]
  468.00  1.0% print_cfs_rq             [guest.kernel.kallsyms]
  437.00  0.9% link_path_walk           [guest.kernel.kallsyms]
  380.00  0.8% balance_runtime          [guest.kernel.kallsyms]
  379.00  0.8% kmem_cache_free          [guest.kernel.kallsyms]
  377.00  0.8% in_gate_area_no_task     [guest.kernel.kallsyms]
  374.00  0.8% get_page_from_freelist   [guest.kernel.kallsyms]
  372.00  0.8% mark_files_ro            [guest.kernel.kallsyms]
  368.00  0.8% 

[PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-18 Thread Zhang, Yanmin
The patch below introduces perf_guest_info_callbacks and the related
register/unregister functions. It also adds more PERF_RECORD_MISC_XXX bits
for guest kernel and guest user space.
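The widened PERF_RECORD_MISC_CPUMODE_MASK matters to anything that decodes the misc field of a sample. A small sketch, with the values copied from the include/linux/perf_event.h hunk in this patch; cpumode_name() and is_guest_sample() are hypothetical helpers, not part of perf.

```c
#include <stdint.h>

/* Values as defined by this patch's include/linux/perf_event.h hunk. */
#define PERF_RECORD_MISC_CPUMODE_MASK    (7 << 0)
#define PERF_RECORD_MISC_CPUMODE_UNKNOWN (0 << 0)
#define PERF_RECORD_MISC_KERNEL          (1 << 0)
#define PERF_RECORD_MISC_USER            (2 << 0)
#define PERF_RECORD_MISC_HYPERVISOR      (3 << 0)
#define PERF_RECORD_MISC_GUEST_KERNEL    (4 << 0)
#define PERF_RECORD_MISC_GUEST_USER      (5 << 0)
#define PERF_RECORD_MISC_EXACT           (1 << 14)

/* Label the cpumode bits of a sample's misc field (other flag bits are ignored). */
static const char *cpumode_name(uint16_t misc)
{
    switch (misc & PERF_RECORD_MISC_CPUMODE_MASK) {
    case PERF_RECORD_MISC_KERNEL:       return "kernel";
    case PERF_RECORD_MISC_USER:         return "user";
    case PERF_RECORD_MISC_HYPERVISOR:   return "hypervisor";
    case PERF_RECORD_MISC_GUEST_KERNEL: return "guest kernel";
    case PERF_RECORD_MISC_GUEST_USER:   return "guest user";
    default:                            return "unknown";
    }
}

/* True if the sample was taken while a guest was running. */
static int is_guest_sample(uint16_t misc)
{
    uint16_t mode = misc & PERF_RECORD_MISC_CPUMODE_MASK;

    return mode == PERF_RECORD_MISC_GUEST_KERNEL ||
           mode == PERF_RECORD_MISC_GUEST_USER;
}
```

With the old 2-bit mask (3), GUEST_KERNEL (4) would decode as UNKNOWN (0) and GUEST_USER (5) as KERNEL (1), which is why the mask has to grow to 7.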

Signed-off-by: Zhang Yanmin 

---

diff -Nraup --exclude-from=exclude.diff linux-2.6_tip0417/arch/x86/include/asm/perf_event.h linux-2.6_tip0417_perfkvm/arch/x86/include/asm/perf_event.h
--- linux-2.6_tip0417/arch/x86/include/asm/perf_event.h 2010-04-19 09:51:47.557797121 +0800
+++ linux-2.6_tip0417_perfkvm/arch/x86/include/asm/perf_event.h 2010-04-19 09:53:59.689452915 +0800
@@ -135,17 +135,10 @@ extern void perf_events_lapic_init(void)
  */
 #define PERF_EFLAGS_EXACT  (1UL << 3)
 
-#define perf_misc_flags(regs)  \
-({ int misc = 0;   \
-   if (user_mode(regs))\
-   misc |= PERF_RECORD_MISC_USER;  \
-   else\
-   misc |= PERF_RECORD_MISC_KERNEL;\
-   if (regs->flags & PERF_EFLAGS_EXACT)\
-   misc |= PERF_RECORD_MISC_EXACT; \
-   misc; })
-
-#define perf_instruction_pointer(regs) ((regs)->ip)
+struct pt_regs;
+extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
+extern unsigned long perf_misc_flags(struct pt_regs *regs);
+#define perf_misc_flags(regs)  perf_misc_flags(regs)
 
 #else
 static inline void init_hw_perf_events(void)   { }
diff -Nraup --exclude-from=exclude.diff linux-2.6_tip0417/arch/x86/kernel/cpu/perf_event.c linux-2.6_tip0417_perfkvm/arch/x86/kernel/cpu/perf_event.c
--- linux-2.6_tip0417/arch/x86/kernel/cpu/perf_event.c  2010-04-19 09:51:48.347655964 +0800
+++ linux-2.6_tip0417_perfkvm/arch/x86/kernel/cpu/perf_event.c  2010-04-19 09:53:59.689452915 +0800
@@ -1720,6 +1720,11 @@ struct perf_callchain_entry *perf_callch
 {
struct perf_callchain_entry *entry;
 
+   if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+   /* TODO: We don't support guest os callchain now */
+   return NULL;
+   }
+
if (in_nmi())
entry = &__get_cpu_var(pmc_nmi_entry);
else
@@ -1743,3 +1748,30 @@ void perf_arch_fetch_caller_regs(struct 
regs->cs = __KERNEL_CS;
local_save_flags(regs->flags);
 }
+
+unsigned long perf_instruction_pointer(struct pt_regs *regs)
+{
+   unsigned long ip;
+   if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
+   ip = perf_guest_cbs->get_guest_ip();
+   else
+   ip = instruction_pointer(regs);
+   return ip;
+}
+
+unsigned long perf_misc_flags(struct pt_regs *regs)
+{
+   int misc = 0;
+   if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+   misc |= perf_guest_cbs->is_user_mode() ?
+   PERF_RECORD_MISC_GUEST_USER :
+   PERF_RECORD_MISC_GUEST_KERNEL;
+   } else
+   misc |= user_mode(regs) ? PERF_RECORD_MISC_USER :
+   PERF_RECORD_MISC_KERNEL;
+   if (regs->flags & PERF_EFLAGS_EXACT)
+   misc |= PERF_RECORD_MISC_EXACT;
+
+   return misc;
+}
+
diff -Nraup --exclude-from=exclude.diff linux-2.6_tip0417/include/linux/perf_event.h linux-2.6_tip0417_perfkvm/include/linux/perf_event.h
--- linux-2.6_tip0417/include/linux/perf_event.h 2010-04-19 09:51:59.544791000 +0800
+++ linux-2.6_tip0417_perfkvm/include/linux/perf_event.h 2010-04-19 09:53:59.691378953 +0800
@@ -288,11 +288,13 @@ struct perf_event_mmap_page {
__u64   data_tail;  /* user-space written tail */
 };
 
-#define PERF_RECORD_MISC_CPUMODE_MASK  (3 << 0)
+#define PERF_RECORD_MISC_CPUMODE_MASK  (7 << 0)
 #define PERF_RECORD_MISC_CPUMODE_UNKNOWN   (0 << 0)
 #define PERF_RECORD_MISC_KERNEL            (1 << 0)
 #define PERF_RECORD_MISC_USER              (2 << 0)
 #define PERF_RECORD_MISC_HYPERVISOR        (3 << 0)
+#define PERF_RECORD_MISC_GUEST_KERNEL      (4 << 0)
+#define PERF_RECORD_MISC_GUEST_USER        (5 << 0)
 
 #define PERF_RECORD_MISC_EXACT (1 << 14)
 /*
@@ -446,6 +448,12 @@ enum perf_callchain_context {
 # include 
 #endif
 
+struct perf_guest_info_callbacks {
+   int (*is_in_guest) (void);
+   int (*is_user_mode) (void);
+   unsigned long (*get_guest_ip) (void);
+};
+
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 #include 
 #endif
@@ -932,6 +940,12 @@ static inline void perf_event_mmap(struc
__perf_event_mmap(vma);
 }
 
+extern struct perf_guest_info_callbacks *perf_guest_cbs;
+extern int perf_register_guest_info_callbacks(
+   struct perf_guest_info_callbacks *);
+extern int perf_unregister_guest_info_callbacks(
+   struct perf_guest_info_callbacks *);
+
 extern void perf_event_comm(struct task_struct *tsk);
 extern void perf_event_fork(struct task_struct *tsk);
 
@@ -1001,6 +1015,11 @@ perf_sw_event(u32 event_id, u64 nr

[PATCH V5 2/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-18 Thread Zhang, Yanmin
The patch below implements the perf_guest_info_callbacks for kvm.
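How patches 1/3 and 2/3 fit together, reduced to a single-CPU userspace model: KVM registers a callback table, sets current_vcpu only around the guest NMI (the "int $2" in vmx.c), and perf's instruction-pointer hook consults the callbacks. struct fake_vcpu, the plain global standing in for the per-cpu variable, and sample_ip()/handle_guest_nmi() are hypothetical; the real code uses percpu_read()/percpu_write(), kvm_x86_ops->get_cpl() and kvm_rip_read().

```c
#include <stddef.h>

/* Same shape as the callback table added to include/linux/perf_event.h. */
struct perf_guest_info_callbacks {
    int (*is_in_guest)(void);
    int (*is_user_mode)(void);
    unsigned long (*get_guest_ip)(void);
};

/* Hypothetical vcpu with only the state this sketch needs. */
struct fake_vcpu {
    int cpl;           /* guest privilege level; 0 means guest kernel */
    unsigned long rip; /* guest instruction pointer */
};

/* Single-CPU stand-ins for the per-cpu current_vcpu and the registered table. */
static struct fake_vcpu *current_vcpu;
static struct perf_guest_info_callbacks *perf_guest_cbs;

static int is_in_guest(void) { return current_vcpu != NULL; }
static int is_user_mode(void) { return current_vcpu ? current_vcpu->cpl != 0 : 1; }
static unsigned long get_guest_ip(void) { return current_vcpu ? current_vcpu->rip : 0; }

static struct perf_guest_info_callbacks guest_cbs = {
    .is_in_guest  = is_in_guest,
    .is_user_mode = is_user_mode,
    .get_guest_ip = get_guest_ip,
};

/* Roughly what perf_instruction_pointer() does with the registered callbacks. */
static unsigned long sample_ip(unsigned long host_ip)
{
    if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
        return perf_guest_cbs->get_guest_ip();
    return host_ip;
}

/* current_vcpu is set only while the guest's NMI is handled, so samples
 * taken outside that window are attributed to the host. */
static unsigned long handle_guest_nmi(struct fake_vcpu *vcpu, unsigned long host_ip)
{
    unsigned long ip;

    current_vcpu = vcpu;     /* kvm_before_handle_nmi() */
    ip = sample_ip(host_ip); /* perf's NMI handler would run here */
    current_vcpu = NULL;     /* kvm_after_handle_nmi() */
    return ip;
}
```

Registration corresponds to assigning perf_guest_cbs = &guest_cbs, which is what perf_register_guest_info_callbacks() does in kvm_arch_init().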

Signed-off-by: Zhang Yanmin 

---

diff -Nraup linux-2.6_tip0417/arch/x86/kvm/vmx.c linux-2.6_tip0417_perfkvm/arch/x86/kvm/vmx.c
--- linux-2.6_tip0417/arch/x86/kvm/vmx.c 2010-04-19 09:51:47.908673911 +0800
+++ linux-2.6_tip0417_perfkvm/arch/x86/kvm/vmx.c 2010-04-19 09:53:59.690399987 +0800
@@ -3654,8 +3654,11 @@ static void vmx_complete_interrupts(stru
 
/* We need to handle NMIs before interrupts are enabled */
if ((exit_intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR &&
-   (exit_intr_info & INTR_INFO_VALID_MASK))
+   (exit_intr_info & INTR_INFO_VALID_MASK)) {
+   kvm_before_handle_nmi(&vmx->vcpu);
asm("int $2");
+   kvm_after_handle_nmi(&vmx->vcpu);
+   }
 
idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 
diff -Nraup linux-2.6_tip0417/arch/x86/kvm/x86.c linux-2.6_tip0417_perfkvm/arch/x86/kvm/x86.c
--- linux-2.6_tip0417/arch/x86/kvm/x86.c 2010-04-19 09:51:47.892676413 +0800
+++ linux-2.6_tip0417_perfkvm/arch/x86/kvm/x86.c 2010-04-19 09:53:59.691378953 +0800
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #undef TRACE_INCLUDE_FILE
 #define CREATE_TRACE_POINTS
@@ -3765,6 +3766,47 @@ static void kvm_timer_init(void)
}
 }
 
+static DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu);
+
+static int kvm_is_in_guest(void)
+{
+   return percpu_read(current_vcpu) != NULL;
+}
+
+static int kvm_is_user_mode(void)
+{
+   int user_mode = 3;
+   if (percpu_read(current_vcpu))
+   user_mode = kvm_x86_ops->get_cpl(percpu_read(current_vcpu));
+   return user_mode != 0;
+}
+
+static unsigned long kvm_get_guest_ip(void)
+{
+   unsigned long ip = 0;
+   if (percpu_read(current_vcpu))
+   ip = kvm_rip_read(percpu_read(current_vcpu));
+   return ip;
+}
+
+static struct perf_guest_info_callbacks kvm_guest_cbs = {
+   .is_in_guest    = kvm_is_in_guest,
+   .is_user_mode   = kvm_is_user_mode,
+   .get_guest_ip   = kvm_get_guest_ip,
+};
+
+void kvm_before_handle_nmi(struct kvm_vcpu *vcpu)
+{
+   percpu_write(current_vcpu, vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_before_handle_nmi);
+
+void kvm_after_handle_nmi(struct kvm_vcpu *vcpu)
+{
+   percpu_write(current_vcpu, NULL);
+}
+EXPORT_SYMBOL_GPL(kvm_after_handle_nmi);
+
 int kvm_arch_init(void *opaque)
 {
int r;
@@ -3801,6 +3843,8 @@ int kvm_arch_init(void *opaque)
 
kvm_timer_init();
 
+   perf_register_guest_info_callbacks(&kvm_guest_cbs);
+
return 0;
 
 out:
@@ -3809,6 +3853,8 @@ out:
 
 void kvm_arch_exit(void)
 {
+   perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+
if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block,
CPUFREQ_TRANSITION_NOTIFIER);
diff -Nraup linux-2.6_tip0417/arch/x86/kvm/x86.h linux-2.6_tip0417_perfkvm/arch/x86/kvm/x86.h
--- linux-2.6_tip0417/arch/x86/kvm/x86.h 2010-04-19 09:51:47.884709050 +0800
+++ linux-2.6_tip0417_perfkvm/arch/x86/kvm/x86.h 2010-04-19 09:53:59.691378953 +0800
@@ -65,4 +65,7 @@ static inline int is_paging(struct kvm_v
return kvm_read_cr0_bits(vcpu, X86_CR0_PG);
 }
 
+void kvm_before_handle_nmi(struct kvm_vcpu *vcpu);
+void kvm_after_handle_nmi(struct kvm_vcpu *vcpu);
+
 #endif




[PATCH] KVM-Test: Add KVM unit test (kvmctl)

2010-04-18 Thread Lucas Meneghel Rodrigues
From: sshang 

The test uses the kvm test harness kvmctl to load binary test case files
and test various functions of the kvm kernel module.

This test is for older-style unit testing; after some
consideration we decided to keep the 2 modules separate.

Signed-off-by: Shuxi Shang 
---
 client/tests/kvm/tests/unit_test.py|   34 
 client/tests/kvm/tests_base.cfg.sample |   30 
 2 files changed, 64 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/unit_test.py

diff --git a/client/tests/kvm/tests/unit_test.py b/client/tests/kvm/tests/unit_test.py
new file mode 100644
index 000..433cbcc
--- /dev/null
+++ b/client/tests/kvm/tests/unit_test.py
@@ -0,0 +1,34 @@
+import logging, os
+from autotest_lib.client.bin import utils
+from autotest_lib.client.common_lib import error
+
+def run_unit_test_kvmctl(test, params, env):
+    """
+    KVM userspace unit test: use the kvm test harness kvmctl to load binary
+    test case files and test various functions of the kvm kernel module.
+    The output of each unit test can be found in the test result dir.
+
+    @param test: KVM test object.
+    @param params: Dictionary with the test parameters.
+    @param env: Dictionary with test environment.
+    """
+    case_list = params.get("case_list").split()
+    srcdir = params.get("srcdir", test.srcdir)
+    user_dir = os.path.join(srcdir, "kvm_userspace", "kvm", "user")
+    os.chdir(user_dir)
+    test_fail_list = []
+    for case in case_list:
+        result_file = os.path.join(test.outputdir, case)
+        cmd = "./kvmctl test/x86/bootstrap test/x86/%s.flat" % case
+        results = None
+        try:
+            results = utils.system_output(cmd)
+        except error.CmdError:
+            logging.error("Unit test %s failed", case)
+            test_fail_list.append(case)
+
+        if results is not None:
+            utils.open_write_close(result_file, results)
+
+    if test_fail_list:
+        raise error.TestFail("Unit tests failed: %s" % " ".join(test_fail_list))
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index e73ba44..eaba30f 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -306,6 +306,36 @@ variants:
 - ksm_parallel:
 ksm_mode = "parallel"
 
+- unit_test_kvmctl:
+type = unit_test
+vms = ''
+profilers = ''
+variants:
+- access:
+case = access
+- apic:
+case = apic
+- emulator:
+case = emulator
+- hypercall:
+case = hypercall
+- msr:
+case = msr
+- port80:
+case = port80
+- realmode:
+case = realmode
+- sieve:
+case = sieve
+- smptest:
+case = smptest
+- tsc:
+case = tsc
+- stringio:
+case = stringio
+- vmexit:
+case = vmexit
+
 - qemu_img:
 type = qemu_img
 vms = ''
-- 
1.6.6.1



Re: [PATCH V5 0/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-18 Thread Ingo Molnar

* Zhang, Yanmin  wrote:

> Here is the new patch of V5 against tip/master of April 17th if anyone wants 
> to try it.

Ok, this looks pretty good from the perf angle - so once Avi likes patches #1 
and #2 and creates a pullable branch we can apply #3 as well to tip:perf/core 
and put it on the potential-2.6.35-merge road.

Thanks,

Ingo