RE: CONFIG_NO_HZ causing poor console responsiveness

2010-07-01 Thread Li Yang-R58472

>-Original Message-
>From: linuxppc-dev-bounces+leoli=freescale@lists.ozlabs.org
>[mailto:linuxppc-dev-bounces+leoli=freescale@lists.ozlabs.org] On
>Behalf Of Benjamin Herrenschmidt
>Sent: Friday, July 02, 2010 1:47 PM
>To: Tabi Timur-B04825
>Cc: Linuxppc-dev Development
>Subject: Re: CONFIG_NO_HZ causing poor console responsiveness
>
>On Tue, 2010-06-29 at 14:54 -0500, Timur Tabi wrote:
>> I'm adding support for a new e500-based board (the P1022DS), and in
>> the process I've discovered that enabling CONFIG_NO_HZ (Tickless
>> System / Dynamic Ticks) causes significant responsiveness problems on
>> the serial console.  When I type on the console, I see delays of up to
>> a half-second for almost every character.  It acts as if there's a
>> background process eating all the CPU.
>>
>> I don't have time to debug this thoroughly at the moment.  The problem
>> occurs in the latest kernel, but it appears not to occur in 2.6.32.
>>
>> Has anyone else seen anything like this?
>
>I noticed that on the bimini with 2.6.35-rc* though I didn't get to track
>it down yet.


The patch found at the following location fixed this problem.

http://www.spinics.net/lists/linux-tip-commits/msg08279.html

Hope it has already been merged.

- Leo
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: CONFIG_NO_HZ causing poor console responsiveness

2010-07-01 Thread Benjamin Herrenschmidt
On Tue, 2010-06-29 at 14:54 -0500, Timur Tabi wrote:
> I'm adding support for a new e500-based board (the P1022DS), and in
> the process I've discovered that enabling CONFIG_NO_HZ (Tickless
> System / Dynamic Ticks) causes significant responsiveness problems on
> the serial console.  When I type on the console, I see delays of up to
> a half-second for almost every character.  It acts as if there's a
> background process eating all the CPU.
> 
> I don't have time to debug this thoroughly at the moment.  The problem
> occurs in the latest kernel, but it appears not to occur in 2.6.32.
> 
> Has anyone else seen anything like this?

I noticed that on the bimini with 2.6.35-rc* though I didn't get to
track it down yet.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: CONFIG_NO_HZ causing poor console responsiveness

2010-07-01 Thread Tabi Timur-B04825
On Jul 1, 2010, at 10:46 PM, "Mike Galbraith"  wrote:
> 
> Hi Timur,
> 
> This has already been fixed.  Below is the final fix from tip.

Thanks Mike.  I thought I was using the latest code, but I guess not.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: CONFIG_NO_HZ causing poor console responsiveness

2010-07-01 Thread Mike Galbraith
On Thu, 2010-07-01 at 16:55 -0500, Timur Tabi wrote:
> On Tue, Jun 29, 2010 at 2:54 PM, Timur Tabi  wrote:
> > I'm adding support for a new e500-based board (the P1022DS), and in
> > the process I've discovered that enabling CONFIG_NO_HZ (Tickless
> > System / Dynamic Ticks) causes significant responsiveness problems on
> > the serial console.  When I type on the console, I see delays of up to
> > a half-second for almost every character.  It acts as if there's a
> > background process eating all the CPU.
> 
> I finally finished my git-bisect, and it wasn't that helpful.  I had
> to skip several commits because the kernel just wouldn't boot:
> 
> There are only 'skip'ped commits left to test.
> The first bad commit could be any of:
> 6bc6cf2b61336ed0c55a615eb4c0c8ed5daf3f08
> 8b911acdf08477c059d1c36c21113ab1696c612b
> 21406928afe43f1db6acab4931bb8c886f4d04ce
> 5ca9880c6f4ba4c84b517bc2fed5366adf63d191
> a64692a3afd85fe048551ab89142fd5ca99a0dbd
> f2e74eeac03ffb779d64b66a643c5e598145a28b
> c6ee36c423c3ed1fb86bb3eabba9fc256a300d16
> e12f31d3e5d36328c7fbd0fce40a95e70b59152c
> 13814d42e45dfbe845a0bbe5184565d9236896ae
> b42e0c41a422a212ddea0666d5a3a0e3c35206db
> 39c0cbe2150cbd848a25ba6cdb271d1ad46818ad <== the crime scene
> beac4c7e4a1cc6d57801f690e5e82fa2c9c245c8
> 41acab8851a0408c1d5ad6c21a07456f88b54d40
> 6427462bfa50f50dc6c088c07037264fcc73eca1
> c9494727cf293ae2ec66af57547a3e79c724fec2
> We cannot bisect more!
> 
> These correspond to a batch of scheduler patches, most from Mike Galbraith.
> 
> I don't know what to do now.  I can't test any of these commits.  Even
> if I could, they look like they're all part of one set, so I doubt I
> could narrow it down to one commit anyway.

Hi Timur,

This has already been fixed.  Below is the final fix from tip.

commit 3310d4d38fbc514e7b18bd3b1eea8effdd63b5aa
Author: Peter Zijlstra 
Date:   Thu Jun 17 18:02:37 2010 +0200

nohz: Fix nohz ratelimit

Chris Wedgwood reports that 39c0cbe (sched: Rate-limit nohz) causes a
serial console regression, unresponsiveness, and indeed it does. The
reason is that the nohz code is skipped even when the tick was already
stopped before the nohz_ratelimit(cpu) condition changed.

Move the nohz_ratelimit() check to the other conditions which prevent
long idle sleeps.

Reported-by: Chris Wedgwood 
Tested-by: Brian Bloniarz 
Signed-off-by: Mike Galbraith 
Signed-off-by: Peter Zijlstra 
Cc: Jiri Kosina 
Cc: Linus Torvalds 
Cc: Greg KH 
Cc: Alan Cox 
Cc: OGAWA Hirofumi 
Cc: Jef Driesen 
LKML-Reference: <1276790557.27822.516.ca...@twins>
Signed-off-by: Thomas Gleixner 

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 1d7b9bc..783fbad 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -315,9 +315,6 @@ void tick_nohz_stop_sched_tick(int inidle)
goto end;
}
 
-   if (nohz_ratelimit(cpu))
-   goto end;
-
ts->idle_calls++;
/* Read jiffies and the time when jiffies were updated last */
do {
@@ -328,7 +325,7 @@ void tick_nohz_stop_sched_tick(int inidle)
} while (read_seqretry(&xtime_lock, seq));
 
if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu) ||
-   arch_needs_cpu(cpu)) {
+   arch_needs_cpu(cpu) || nohz_ratelimit(cpu)) {
next_jiffies = last_jiffies + 1;
delta_jiffies = 1;
} else {


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/2] Faster MMU lookups for Book3s v3

2010-07-01 Thread Benjamin Herrenschmidt
On Thu, 2010-07-01 at 16:42 +0300, Avi Kivity wrote:
> > So I think the only reasonable way to implement page ageing is to
> unmap
> > pages. And that's slow, because it means we have to map them again
> on
> > access. Bleks. Or we could look for the HTAB entry and only unmap
> them
> > if the entry is moot.
> >
> 
> I think it works out if you update struct page when you clear out an
> HTAB.

Hrm... going to struct page without going through the PTE might work out
indeed. We can get to the struct page from the RPN.

However, that means -reading- the hash entry we want to evict, and
that's a fairly expensive H-Call, especially if we ask phyp to
back-translate the real address into a logical (partition) address so we
can get to the struct page. We might be able to reconstitute the
virtual address from the hash content + bucket address instead, but
going from the VSID back to the page table might be tricky as well.

IE. Either way, it's not a simple process.

Now, eviction is rare, our MMU hash is generally big, so maybe the read
back with back translate to hit struct page might be the way to go here.

As for other kind of invalidations, we do have the PTE around when they
happen so we can go fetch the HW ref bit and update the PTE I suppose.
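
A rough sketch of that idea, purely for illustration (hpte_read_r() and
hpte_invalidate() are hypothetical stand-ins for the real hash-flush
primitives, not existing kernel functions):

#include <linux/mm.h>

/* Hypothetical helpers standing in for the real hash-flush primitives. */
extern unsigned long hpte_read_r(unsigned long slot);
extern void hpte_invalidate(unsigned long slot);

/*
 * Sketch only: when a hash PTE is flushed, snapshot its hardware
 * reference ("R") bit and fold it back into the Linux PTE so that
 * page ageing can still see it later.
 */
static void flush_hash_entry_sketch(unsigned long slot, pte_t *ptep)
{
	unsigned long referenced = hpte_read_r(slot); /* read R before the entry goes away */

	hpte_invalidate(slot);                        /* tlbie + clear the hash slot */

	if (referenced)
		*ptep = pte_mkyoung(*ptep);           /* mark the Linux PTE accessed */
}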

Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/2] Faster MMU lookups for Book3s v3

2010-07-01 Thread Benjamin Herrenschmidt
On Thu, 2010-07-01 at 14:52 +0200, Alexander Graf wrote:
> Page ageing is difficult. The HTAB has a hardware set referenced bit,
> but we don't have a guarantee that the entry is still there when we look
> for it. Something else could have overwritten it by then, but the entry
> could still be lingering around in the TLB.
> 
> So I think the only reasonable way to implement page ageing is to unmap
> pages. And that's slow, because it means we have to map them again on
> access. Bleks. Or we could look for the HTAB entry and only unmap them
> if the entry is moot.

Well, not quite.

We -could- use the HW reference bit. However, that means that whenever
we flush the hash PTE we get a snapshot of the HW bit and copy it over
to the PTE.

That's not -that- bad for normal invalidations. However, it's a problem
potentially for eviction. IE. When a hash bucket is full, we
pseudo-randomly evict a slot. If we were to use the HW ref bit, we would
need a way to go back to the PTE from the hash bucket to perform that
update (or something really tricky like sticking it in a list somewhere,
and have the young test walk that list when non-empty, etc...)

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Oops while running fs_racer test on a POWER6 box against latest git

2010-07-01 Thread Michael Neuling
In message <20100701105907.gk22...@laptop> you wrote:
> On Thu, Jul 01, 2010 at 03:04:54PM +1000, Michael Neuling wrote:
> > > While running fs_racer test from LTP on a POWER6 box against latest git(2.6.35-rc3-git4 - commitid 984bc9601f64fd)
> > > came across the following warning followed by multiple oops.
> > > 
> > > [ cut here ]
> > > 
> > > Badness at kernel/mutex-debug.c:64
> > > NIP: c00be9e8 LR: c00be9cc CTR: 
> > > REGS: c0010be8f6f0 TRAP: 0700   Not tainted  (2.6.35-rc3-git4-autotest)
> > > MSR: 80029032CR: 24224422  XER: 0012
> > > TASK = c0010727cf00[8211] 'fs_racer_file_c' THREAD: c0010be8bb50 CPU: 2
> > > GPR00:  c0010be8f970 c0d3d798 0001
> > > GPR04: c0010be8fa70 c0010be8c000 c0010727d9f8 0000
> > > GPR08: c43042f0 c16534e8 017a c0c29a1c
> > > GPR12: 28228424 cf600500 c0010be8fc40 2000
> > > GPR16: f000 c00109c73000 c0010be8fc30 00010442
> > > GPR20:   01b6 c0010dd12250
> > > GPR24: c017c08c c0010727cf00 c0010dd12278 c0010dd12210
> > > GPR28: 0001 c0010be8c000 c0ca2008 c0010be8fa70
> > > NIP [c00be9e8] .mutex_remove_waiter+0xa4/0x130
> > > LR [c00be9cc] .mutex_remove_waiter+0x88/0x130
> > > Call Trace:
> > > [c0010be8f970] [c0010be8fa00] 0xc0010be8fa00 (unreliable)
> > > [c0010be8fa00] [c064a9f0] .mutex_lock_nested+0x384/0x430
> > > Instruction dump:
> > > e81f0010 e93d 7fa04800 41fe0028 482e96e5 6000 2fa3 419e0018
> > > e93e8008 8009 2f80 409e0008<0fe0>   e93e8000 8009 2f80
> > > Unable to handle kernel paging request for unknown fault
> > > Faulting instruction address: 0xc008d0f4
> > > Oops: Kernel access of bad area, sig: 7 [#1]
> > > SMP NR_CPUS=1024 NUMA
> > > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > > pSeries
> > > last sysfs file: /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
> > > Modules linked in: ipv6 fuse loop dm_mod sr_mod cdrom ibmveth sg
> > > sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
> > > NIP: c008d0f4 LR: c008d0d0 CTR: 
> > > REGS: c0010978f900 TRAP: 0600   Tainted: GW (2.6.35-rc3-git4-autotest)
> > > MSR: 80009032
> > > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > > EE,ME,IR,DR>CR: 24022442  XER: 0012
> > > DAR: c0648f54, DSISR: 4001
> > > TASK = c001096e4900[7353] 'fs_racer_file_s' THREAD: c0010978c000 CPU: 10
> > > GPR00: 4000 c0010978fb80 c0d3d798 0001
> > > GPR04: c083539e c1610228  c54c6880
> > > GPR08: 06a5 c0648f54 0007 049b0000
> > > GPR12:  cf601900  ffff
> > > GPR16: 4b7dc520   c0010978fea0
> > > GPR20: 0fffcca7e7a0 0fffcca7e7a0 0fffabf7dfd0 0fffabf7dfd0
> > > GPR24:  01200011 c0e1c0a8 c0648ed4
> > > GPR28:  c001096e4900 c0ca0458 c0010725d400
> > > NIP [c008d0f4] .copy_process+0x310/0xf40
> > > LR [c008d0d0] .copy_process+0x2ec/0xf40
> > > Call Trace:
> > > [c0010978fb80] [c008d0d0] .copy_process+0x2ec/0xf40 (unreliable)
> > > [c0010978fc80] [c008deb4] .do_fork+0x190/0x3cc
> > > [c0010978fdc0] [c0011ef4] .sys_clone+0x58/0x70
> > > [c0010978fe30] [c00087f0] .ppc_clone+0x8/0xc
> > > Instruction dump:
> > > 419e0010 7fe3fb78 480774cd 6000 801f0014 e93f0008 7800b842 39290080
> > > 78004800 6042 901f0014 38004000<7d6048a8>   7d6b0078 7d6049ad 40c2fff4
> > > 
> > > Kernel version 2.6.34-rc3-git3 works fine.
> > 
> > Should this read 2.6.35-rc3-git3?
> > 
> > If so, there's only about 20 commits in:
> > 5904b3b81d2516..984bc9601f64fd
> > 
> > The likely fs related candidates are from Christoph and Nick Piggin
> > (added to CC)
> > 
> > No commits relating to POWER6 or PPC.
> 
> Not sure what's happening here. The first warning looks like some mutex
> corruption, but it doesn't have a stack trace (these are 2 separate
> dumps, right? ie. the copy_process stack doesn't relate to the mutex
> warning?) So I don't have much idea.
> 
> If it is reproducible, can you try getting a better stack trace, or
> better 

Re: CONFIG_NO_HZ causing poor console responsiveness

2010-07-01 Thread Timur Tabi
On Tue, Jun 29, 2010 at 2:54 PM, Timur Tabi  wrote:
> I'm adding support for a new e500-based board (the P1022DS), and in
> the process I've discovered that enabling CONFIG_NO_HZ (Tickless
> System / Dynamic Ticks) causes significant responsiveness problems on
> the serial console.  When I type on the console, I see delays of up to
> a half-second for almost every character.  It acts as if there's a
> background process eating all the CPU.

I finally finished my git-bisect, and it wasn't that helpful.  I had
to skip several commits because the kernel just wouldn't boot:

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
6bc6cf2b61336ed0c55a615eb4c0c8ed5daf3f08
8b911acdf08477c059d1c36c21113ab1696c612b
21406928afe43f1db6acab4931bb8c886f4d04ce
5ca9880c6f4ba4c84b517bc2fed5366adf63d191
a64692a3afd85fe048551ab89142fd5ca99a0dbd
f2e74eeac03ffb779d64b66a643c5e598145a28b
c6ee36c423c3ed1fb86bb3eabba9fc256a300d16
e12f31d3e5d36328c7fbd0fce40a95e70b59152c
13814d42e45dfbe845a0bbe5184565d9236896ae
b42e0c41a422a212ddea0666d5a3a0e3c35206db
39c0cbe2150cbd848a25ba6cdb271d1ad46818ad
beac4c7e4a1cc6d57801f690e5e82fa2c9c245c8
41acab8851a0408c1d5ad6c21a07456f88b54d40
6427462bfa50f50dc6c088c07037264fcc73eca1
c9494727cf293ae2ec66af57547a3e79c724fec2
We cannot bisect more!

These correspond to a batch of scheduler patches, most from Mike Galbraith.

I don't know what to do now.  I can't test any of these commits.  Even
if I could, they look like they're all part of one set, so I doubt I
could narrow it down to one commit anyway.

-- 
Timur Tabi
Linux kernel developer at Freescale
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: machine check in kernel for a mpc870 board

2010-07-01 Thread Scott Wood

On 07/01/2010 03:17 PM, Shawn Jin wrote:

How do I find the address, reg, and range for nodes like localbus,
soc, eth0, cpm, serial etc.?


If your CCSRBAR is 0xfa20, then pretty much anywhere you see 0xff0x
change it to 0xfa2x.


I'm not sure about the range settings of 0xfe00. How do you get this?

localbus@fa200100 {
compatible = "fsl,mpc885-localbus", "fsl,pq1-localbus",
 "simple-bus";
#address-cells =<2>;
#size-cells =<1>;
reg =<0xfa200100 0x40>;

ranges =<
0 0 0xfe00 0x0100// I'm not sure about this?
>;
};


Change 0xfe00 to wherever u-boot maps your flash, and 0x0100 to 
whatever the size of the flash localbus mapping is.


Or more generally update this section to hold whatever is connected to 
the localbus on your board.  The first cell is the chipselect.



Make sure that you've got Linux platform code enabled that matches the
top-level compatible of your device tree.  Try enabling PPC_EARLY_DEBUG_CPM,
making sure to update PPC_EARLY_DEBUG_CPM_ADDR to 0xfa202008.


I enabled this early debug feature but don't know this address change.


The address change is for the different IMMR base; this use is too
early/hacky to get the address from the device tree.


-Scott
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: machine check in kernel for a mpc870 board

2010-07-01 Thread Shawn Jin
>>> How do I find the address, reg, and range for nodes like localbus,
>>> soc, eth0, cpm, serial etc.?
>
> If your CCSRBAR is 0xfa20, then pretty much anywhere you see 0xff0x
> change it to 0xfa2x.

I'm not sure about the range settings of 0xfe00. How do you get this?

   localbus@fa200100 {
   compatible = "fsl,mpc885-localbus", "fsl,pq1-localbus",
"simple-bus";
   #address-cells = <2>;
   #size-cells = <1>;
   reg = <0xfa200100 0x40>;

   ranges = <
   0 0 0xfe00 0x0100// I'm not sure about this?
   >;
   };


>>     Linux/PowerPC load: root=/dev/ram
>>     Finalizing device tree... flat tree at 0x59e300
>>
>> The gdb showed deadbeef.
>>     (gdb) target remote ppcbdi:2001
>>     Remote debugging using ppcbdi:2001
>>     0xdeadbeef in ?? ()
>>     (gdb)
>>
>> The kernel doesn't seem to start. What could go wrong here?
>
> Pretty much anything. :-)

I realized that. :-P The kernel booting was able to stop at
start_kernel(). I'm going to trace further.

> Make sure that you've got Linux platform code enabled that matches the
> top-level compatible of your device tree.  Try enabling PPC_EARLY_DEBUG_CPM,
> making sure to update PPC_EARLY_DEBUG_CPM_ADDR to 0xfa202008.

I enabled this early debug feature but don't know this address change.
I'll try it later.

Thanks a lot, Scott.

-Shawn.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Oops while running fs_racer test on a POWER6 box against latest git

2010-07-01 Thread Maciej Rutecki
On Wednesday, 30 June 2010 at 13:22:27 divya wrote:
> While running fs_racer test from LTP on a POWER6 box against latest
> git(2.6.35-rc3-git4 - commitid 984bc9601f64fd) came across the following
> warning followed by multiple oops.
> 

I created a Bugzilla entry at 
https://bugzilla.kernel.org/show_bug.cgi?id=16324
for your bug report, please add your address to the CC list in there, thanks!


-- 
Maciej Rutecki
http://www.maciek.unixy.pl
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH v1]460EX on-chip SATA driver

2010-07-01 Thread Rupjyoti Sarmah
Dear All,

The Synopsys DesignWare core is taskfile-oriented, so the driver would
still need CONFIG_ATA_SFF.
I will fix the Kconfig file to make it depend on CONFIG_ATA_SFF.

Regards,
Rup



-Original Message-
From: Wolfgang Denk [mailto:w...@denx.de]
Sent: Thursday, July 01, 2010 4:25 AM
To: Josh Boyer
Cc: Jeff Garzik; linux-...@vger.kernel.org; s...@denx.de; Rupjyoti Sarmah;
linux-ker...@vger.kernel.org; linuxppc-...@ozlabs.org
Subject: Re: [PATCH v1]460EX on-chip SATA driver

Dear Josh Boyer,

In message <20100630200325.gd7...@zod.rchland.ibm.com> you wrote:
>
> The driver doesn't depend on CONFIG_ATA_SFF in it's Kconfig file, but
seems to
> require it at build time.  Isn't that something that needs fixing in the
> driver?

Right.  Next question is if this is really needed for this driver.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
Copy from one, it's plagiarism; copy from two, it's research.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: machine check in kernel for a mpc870 board

2010-07-01 Thread Scott Wood

On 07/01/2010 02:50 AM, Shawn Jin wrote:

Hi Scott,


How do I find the address, reg, and range for nodes like localbus,
soc, eth0, cpm, serial etc.?


If your CCSRBAR is 0xfa20, then pretty much anywhere you see 
0xff0x change it to 0xfa2x.



I managed to proceed a little bit further.
 Memory <- <0x0 0x800> (128MB)
 ENET0: local-mac-address <- 00:09:9b:01:58:64
 CPU clock-frequency <- 0x7270e00 (120MHz)
 CPU timebase-frequency <- 0x393870 (4MHz)
 CPU bus-frequency <- 0x3938700 (60MHz)

 zImage starting: loaded at 0x0040 (sp: 0x07d1ccd0)
 Allocating 0x186bdd bytes for kernel ...
 gunzipping (0x <- 0x0040c000:0x00591c30)...done 0x173b18 bytes

 Linux/PowerPC load: root=/dev/ram
 Finalizing device tree... flat tree at 0x59e300

The gdb showed deadbeef.
 (gdb) target remote ppcbdi:2001
 Remote debugging using ppcbdi:2001
 0xdeadbeef in ?? ()
 (gdb)

The kernel doesn't seem to start. What could go wrong here?


Pretty much anything. :-)

Make sure that you've got Linux platform code enabled that matches the 
top-level compatible of your device tree.  Try enabling 
PPC_EARLY_DEBUG_CPM, making sure to update PPC_EARLY_DEBUG_CPM_ADDR to 
0xfa202008.


-Scott
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] KVM: PPC: Book3S_32 MMU debug compile fixes

2010-07-01 Thread Alexander Graf
Alexander Graf wrote:
> Due to previous changes, the Book3S_32 guest MMU code didn't compile properly
> when enabling debugging.
>
> This patch repairs the broken code paths, making it possible to define 
> DEBUG_MMU
> and friends again.
>
> Signed-off-by: Alexander Graf 
>   

Please also don't forget this patch :)


Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/2] Faster MMU lookups for Book3s v3

2010-07-01 Thread Marcelo Tosatti
On Wed, Jun 30, 2010 at 03:18:44PM +0200, Alexander Graf wrote:
> Book3s suffered from my really bad shadow MMU implementation so far. So
> I finally got around to implement a combined hash and list mechanism that
> allows for much faster lookup of mapped pages.
> 
> To show that it really is faster, I tried to run simple process spawning
> code inside the guest with and without these patches:
> 
> [without]
> 
> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; 
> done
> 
> real0m20.235s
> user0m10.418s
> sys 0m9.766s
> 
> [with]
> 
> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; 
> done
> 
> real0m14.659s
> user0m8.967s
> sys 0m5.688s
> 
> So as you can see, performance improved significantly.
> 
> v2 -> v3:
> 
>   - use hlist
>   - use global slab cache
> 
> Alexander Graf (2):
>   KVM: PPC: Add generic hpte management functions
>   KVM: PPC: Make use of hash based Shadow MMU
> 
>  arch/powerpc/include/asm/kvm_book3s.h |9 +
>  arch/powerpc/include/asm/kvm_host.h   |   17 ++-
>  arch/powerpc/kvm/Makefile |2 +
>  arch/powerpc/kvm/book3s.c |   14 ++-
>  arch/powerpc/kvm/book3s_32_mmu_host.c |  104 ++---
>  arch/powerpc/kvm/book3s_64_mmu_host.c |   98 +---
>  arch/powerpc/kvm/book3s_mmu_hpte.c|  277 
> +
>  7 files changed, 331 insertions(+), 190 deletions(-)
>  create mode 100644 arch/powerpc/kvm/book3s_mmu_hpte.c

Applied, thanks.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/2] Faster MMU lookups for Book3s v3

2010-07-01 Thread Avi Kivity

On 07/01/2010 03:52 PM, Alexander Graf wrote:

>>> Don't you use lazy spte updates?
>>
>> We do, but given enough time, the guest will touch its entire memory.
>
> Oh, so that's the major difference. On PPC we have the HTAB with a
> fraction of all the mapped pages in it. We don't have a notion of a full
> page table for a guest process. We always only have a snapshot of some
> mappings and shadow those lazily.
>
> So at worst, we have HPTEG_CACHE_NUM shadow pages mapped, which would be
> (1 << 15) * 4k which again would be at most 128MB of guest memory. We
> can't hold more mappings than that anyways, so chances are low we have a
> mapping for each hva.

Doesn't that seriously impact performance?  A guest that recycles pages
from its lru will touch pages at random from its entire address space.
On bare metal that isn't a problem (I imagine) due to large tlbs.  But
virtualized on 4K pages that means the htlb will be thrashed.

>>> But then again I probably do need an rmap for the mmu_notifier magic,
>>> right? But I'd rather prefer to have that code path be slow and the
>>> dirty bitmap invalidation fast than the other way around. Swapping is
>>> slow either way.
>>
>> It's not just swapping, it's also page ageing.  That needs to be
>> fast.  Does ppc have a hardware-set referenced bit?  If so, you need a
>> fast rmap for mmu notifiers.
>
> Page ageing is difficult. The HTAB has a hardware set referenced bit,
> but we don't have a guarantee that the entry is still there when we look
> for it. Something else could have overwritten it by then, but the entry
> could still be lingering around in the TLB.

Whoever's dropping the HTAB needs to update the host struct page, and
also reflect the bit into the guest's HTAB, no?

In fact, on x86 shadow, we don't have an spte for a gpte that is not
accessed, precisely so we know the exact point in time when the accessed
bit is set.

> So I think the only reasonable way to implement page ageing is to unmap
> pages. And that's slow, because it means we have to map them again on
> access. Bleks. Or we could look for the HTAB entry and only unmap them
> if the entry is moot.

I think it works out if you update struct page when you clear out an HTAB.

--
error compiling committee.c: too many arguments to function

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/2] Faster MMU lookups for Book3s v3

2010-07-01 Thread Alexander Graf
Avi Kivity wrote:
> On 07/01/2010 03:28 PM, Alexander Graf wrote:
>>>> Wouldn't it speed up dirty bitmap flushing
>>>> a lot if we'd just have a simple linked list of all sPTEs belonging to
>>>> that memslot?


>>> The complexity is O(pages_in_slot) + O(sptes_for_slot).
>>>
>>> Usually, every page is mapped at least once, so sptes_for_slot
>>> dominates.  Even when it isn't so, iterating the rmap base pointers is
>>> very fast since they are linear in memory, while sptes are scattered
>>> around, causing cache misses.
>>>  
>> Why would pages be mapped often?
>
> It's not a question of how often they are mapped (shadow: very often;
> tdp: very rarely) but what percentage of pages are mapped.  It's
> usually 100%.
>
>> Don't you use lazy spte updates?
>>
>
> We do, but given enough time, the guest will touch its entire memory.

Oh, so that's the major difference. On PPC we have the HTAB with a
fraction of all the mapped pages in it. We don't have a notion of a full
page table for a guest process. We always only have a snapshot of some
mappings and shadow those lazily.

So at worst, we have HPTEG_CACHE_NUM shadow pages mapped, which would be
(1 << 15) * 4k which again would be at most 128MB of guest memory. We
can't hold more mappings than that anyways, so chances are low we have a
mapping for each hva.
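
As a purely illustrative aside, the hashed shadow-mapping cache behind
this patch set can be pictured roughly like the sketch below (names and
sizes are made up here, not the actual book3s_mmu_hpte.c code): each
shadow mapping hangs off an hlist bucket keyed on the guest effective
address, so lookup and invalidation walk one short chain instead of
scanning every cached entry.

#include <linux/hash.h>
#include <linux/list.h>

#define SHADOW_HASH_BITS	10		/* illustrative size only */
#define SHADOW_HASH_NUM		(1 << SHADOW_HASH_BITS)

/* Hypothetical bookkeeping entry for one shadow mapping. */
struct shadow_hpte {
	struct hlist_node list;			/* chained in one hash bucket */
	unsigned long eaddr;			/* guest effective address */
	unsigned long slot;			/* host hash slot it went into */
};

static struct hlist_head shadow_hash[SHADOW_HASH_NUM];

static inline struct hlist_head *shadow_bucket(unsigned long eaddr)
{
	/* hash on the page number of the guest effective address */
	return &shadow_hash[hash_long(eaddr >> 12, SHADOW_HASH_BITS)];
}

/* Lookup walks one short chain instead of scanning every cached mapping. */
static struct shadow_hpte *shadow_find(unsigned long eaddr)
{
	struct shadow_hpte *shpte;
	struct hlist_node *node;

	hlist_for_each_entry(shpte, node, shadow_bucket(eaddr), list)
		if ((shpte->eaddr >> 12) == (eaddr >> 12))
			return shpte;

	return NULL;
}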

>
>
>>> Another consideration is that on x86, an spte occupies just 64 bits
>>> (for the hardware pte); if there are multiple sptes per page (rare on
>>> modern hardware), there is also extra memory for rmap chains;
>>> sometimes we also allocate 64 bits for the gfn.  Having an extra
>>> linked list would require more memory to be allocated and maintained.
>>>  
>> Hrm. I was thinking of not having an rmap but only using the chain. The
>> only slots that would require such a chain would be the ones with dirty
>> bitmapping enabled, so no penalty for normal RAM (unless you use kemari
>> or live migration of course).
>>
>
> You could also only chain writeable ptes.

Very true. Probably even more useful :).

>
>> But then again I probably do need an rmap for the mmu_notifier magic,
>> right? But I'd rather prefer to have that code path be slow and the
>> dirty bitmap invalidation fast than the other way around. Swapping is
>> slow either way.
>>
>
> It's not just swapping, it's also page ageing.  That needs to be
> fast.  Does ppc have a hardware-set referenced bit?  If so, you need a
> fast rmap for mmu notifiers.

Page ageing is difficult. The HTAB has a hardware set referenced bit,
but we don't have a guarantee that the entry is still there when we look
for it. Something else could have overwritten it by then, but the entry
could still be lingering around in the TLB.

So I think the only reasonable way to implement page ageing is to unmap
pages. And that's slow, because it means we have to map them again on
access. Bleks. Or we could look for the HTAB entry and only unmap them
if the entry is moot.
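
For illustration only, a minimal sketch of that unmap-based ageing
approach (shadow_find_hva() and shadow_unmap() are hypothetical helpers,
not the real KVM MMU-notifier glue):

struct shadow_hpte;

/* Hypothetical helpers for the shadow MMU discussed above. */
extern struct shadow_hpte *shadow_find_hva(unsigned long hva);
extern void shadow_unmap(struct shadow_hpte *shpte);

/*
 * Sketch of unmap-based page ageing: a page counts as "young" if we
 * still had a shadow mapping for it, and we drop that mapping so the
 * next guest access has to fault it back in (which is the slow part).
 */
static int age_hva_sketch(unsigned long hva)
{
	struct shadow_hpte *shpte = shadow_find_hva(hva);

	if (!shpte)
		return 0;	/* no mapping -> not referenced recently */

	shadow_unmap(shpte);	/* guest must re-fault the page on next access */
	return 1;		/* report it as young */
}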


Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/2] Faster MMU lookups for Book3s v3

2010-07-01 Thread Avi Kivity

On 07/01/2010 03:28 PM, Alexander Graf wrote:

>>> Wouldn't it speed up dirty bitmap flushing
>>> a lot if we'd just have a simple linked list of all sPTEs belonging to
>>> that memslot?
>>
>> The complexity is O(pages_in_slot) + O(sptes_for_slot).
>>
>> Usually, every page is mapped at least once, so sptes_for_slot
>> dominates.  Even when it isn't so, iterating the rmap base pointers is
>> very fast since they are linear in memory, while sptes are scattered
>> around, causing cache misses.
>
> Why would pages be mapped often?

It's not a question of how often they are mapped (shadow: very often;
tdp: very rarely) but what percentage of pages are mapped.  It's usually
100%.

> Don't you use lazy spte updates?

We do, but given enough time, the guest will touch its entire memory.

>> Another consideration is that on x86, an spte occupies just 64 bits
>> (for the hardware pte); if there are multiple sptes per page (rare on
>> modern hardware), there is also extra memory for rmap chains;
>> sometimes we also allocate 64 bits for the gfn.  Having an extra
>> linked list would require more memory to be allocated and maintained.
>
> Hrm. I was thinking of not having an rmap but only using the chain. The
> only slots that would require such a chain would be the ones with dirty
> bitmapping enabled, so no penalty for normal RAM (unless you use kemari
> or live migration of course).

You could also only chain writeable ptes.

> But then again I probably do need an rmap for the mmu_notifier magic,
> right? But I'd rather prefer to have that code path be slow and the
> dirty bitmap invalidation fast than the other way around. Swapping is
> slow either way.

It's not just swapping, it's also page ageing.  That needs to be fast.
Does ppc have a hardware-set referenced bit?  If so, you need a fast
rmap for mmu notifiers.


--
error compiling committee.c: too many arguments to function

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/2] Faster MMU lookups for Book3s v3

2010-07-01 Thread Alexander Graf
Avi Kivity wrote:
> On 07/01/2010 01:00 PM, Alexander Graf wrote:
>>
>> But doesn't that mean that you still need to loop through all the hvas
>> that you want to invalidate?
>
> It does.
>
>>   Wouldn't it speed up dirty bitmap flushing
>> a lot if we'd just have a simple linked list of all sPTEs belonging to
>> that memslot?
>>
>
> The complexity is O(pages_in_slot) + O(sptes_for_slot).
>
> Usually, every page is mapped at least once, so sptes_for_slot
> dominates.  Even when it isn't so, iterating the rmap base pointers is
> very fast since they are linear in memory, while sptes are scattered
> around, causing cache misses.

Why would pages be mapped often? Don't you use lazy spte updates?

>
> Another consideration is that on x86, an spte occupies just 64 bits
> (for the hardware pte); if there are multiple sptes per page (rare on
> modern hardware), there is also extra memory for rmap chains;
> sometimes we also allocate 64 bits for the gfn.  Having an extra
> linked list would require more memory to be allocated and maintained.

Hrm. I was thinking of not having an rmap but only using the chain. The
only slots that would require such a chain would be the ones with dirty
bitmapping enabled, so no penalty for normal RAM (unless you use kemari
or live migration of course).

But then again I probably do need an rmap for the mmu_notifier magic,
right? But I'd rather prefer to have that code path be slow and the
dirty bitmap invalidation fast than the other way around. Swapping is
slow either way.


Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 14/27] KVM: PPC: Magic Page BookE support

2010-07-01 Thread Alexander Graf
Josh Boyer wrote:
> On Thu, Jul 01, 2010 at 12:42:49PM +0200, Alexander Graf wrote:
>   
>> As we now have Book3s support for the magic page, we also need BookE to
>> join in on the party.
>>
>> This patch implements generic magic page logic for BookE and specific
>> TLB logic for e500. I didn't have any 440 around, so I didn't dare to
>> blindly try and write up broken code.
>> 
>
> Is this the only patch in the series that needs 440 specific code?  Also,
> does 440 KVM still work after this series is applied even without the code
> not present in this patch?
>   

Yes, pretty much. The rest of the code is generic. But 440 should easily
just work with this patch set. If you have one to try it out, please
give it a try. I can even prepare a 440 enabling patch so you could
verify if it works.

Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 14/27] KVM: PPC: Magic Page BookE support

2010-07-01 Thread Josh Boyer
On Thu, Jul 01, 2010 at 12:42:49PM +0200, Alexander Graf wrote:
>As we now have Book3s support for the magic page, we also need BookE to
>join in on the party.
>
>This patch implements generic magic page logic for BookE and specific
>TLB logic for e500. I didn't have any 440 around, so I didn't dare to
>blindly try and write up broken code.

Is this the only patch in the series that needs 440 specific code?  Also,
does 440 KVM still work after this series is applied even without the code
not present in this patch?

josh
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/2] Faster MMU lookups for Book3s v3

2010-07-01 Thread Avi Kivity

On 07/01/2010 01:00 PM, Alexander Graf wrote:

> But doesn't that mean that you still need to loop through all the hvas
> that you want to invalidate?

It does.

> Wouldn't it speed up dirty bitmap flushing
> a lot if we'd just have a simple linked list of all sPTEs belonging to
> that memslot?

The complexity is O(pages_in_slot) + O(sptes_for_slot).

Usually, every page is mapped at least once, so sptes_for_slot
dominates.  Even when it isn't so, iterating the rmap base pointers is
very fast since they are linear in memory, while sptes are scattered
around, causing cache misses.

Another consideration is that on x86, an spte occupies just 64 bits (for
the hardware pte); if there are multiple sptes per page (rare on modern
hardware), there is also extra memory for rmap chains; sometimes we also
allocate 64 bits for the gfn.  Having an extra linked list would require
more memory to be allocated and maintained.
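
To make the complexity argument concrete, here is a schematic sketch
with made-up types (not the real x86 KVM structures): the per-slot array
of rmap base pointers is scanned linearly, and only pages that actually
have sptes cost extra work.

#include <stddef.h>

/* Made-up stand-ins for the real KVM structures. */
struct spte_rmap {
	unsigned long *sptep;		/* shadow PTE mapping this guest page */
	struct spte_rmap *next;		/* additional mappings, rarely present */
};

struct memslot_sketch {
	unsigned long npages;
	struct spte_rmap **rmap;	/* one base pointer per guest page, linear in memory */
};

/* O(pages_in_slot) for the linear scan plus O(sptes_for_slot) for the chains. */
static void write_protect_slot(struct memslot_sketch *slot)
{
	unsigned long i;

	for (i = 0; i < slot->npages; i++) {
		struct spte_rmap *r;

		for (r = slot->rmap[i]; r; r = r->next)
			*r->sptep &= ~1UL;	/* clear a hypothetical "writable" bit */
	}
}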


--
error compiling committee.c: too many arguments to function

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Oops while running fs_racer test on a POWER6 box against latest git

2010-07-01 Thread Nick Piggin
On Thu, Jul 01, 2010 at 03:04:54PM +1000, Michael Neuling wrote:
> > While running fs_racer test from LTP on a POWER6 box against latest git(2.6.35-rc3-git4 - commitid 984bc9601f64fd)
> > came across the following warning followed by multiple oops.
> > 
> > [ cut here ]
> > 
> > Badness at kernel/mutex-debug.c:64
> > NIP: c00be9e8 LR: c00be9cc CTR: 
> > REGS: c0010be8f6f0 TRAP: 0700   Not tainted  (2.6.35-rc3-git4-autotest)
> > MSR: 80029032CR: 24224422  XER: 0012
> > TASK = c0010727cf00[8211] 'fs_racer_file_c' THREAD: c0010be8bb50 CPU: 2
> > GPR00:  c0010be8f970 c0d3d798 0001
> > GPR04: c0010be8fa70 c0010be8c000 c0010727d9f8 
> > GPR08: c43042f0 c16534e8 017a c0c29a1c
> > GPR12: 28228424 cf600500 c0010be8fc40 2000
> > GPR16: f000 c00109c73000 c0010be8fc30 00010442
> > GPR20:   01b6 c0010dd12250
> > GPR24: c017c08c c0010727cf00 c0010dd12278 c0010dd12210
> > GPR28: 0001 c0010be8c000 c0ca2008 c0010be8fa70
> > NIP [c00be9e8] .mutex_remove_waiter+0xa4/0x130
> > LR [c00be9cc] .mutex_remove_waiter+0x88/0x130
> > Call Trace:
> > [c0010be8f970] [c0010be8fa00] 0xc0010be8fa00 (unreliable)
> > [c0010be8fa00] [c064a9f0] .mutex_lock_nested+0x384/0x430
> > Instruction dump:
> > e81f0010 e93d 7fa04800 41fe0028 482e96e5 6000 2fa3 419e0018
> > e93e8008 8009 2f80 409e0008<0fe0>   e93e8000 8009 2f80
> > Unable to handle kernel paging request for unknown fault
> > Faulting instruction address: 0xc008d0f4
> > Oops: Kernel access of bad area, sig: 7 [#1]
> > SMP NR_CPUS=1024 NUMA
> > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > pSeries
> > last sysfs file: /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
> > Modules linked in: ipv6 fuse loop dm_mod sr_mod cdrom ibmveth sg
> > sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
> > NIP: c008d0f4 LR: c008d0d0 CTR: 
> > REGS: c0010978f900 TRAP: 0600   Tainted: GW (2.6.35-rc3-git4-autotest)
> > MSR: 80009032
> > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > EE,ME,IR,DR>CR: 24022442  XER: 0012
> > DAR: c0648f54, DSISR: 4001
> > TASK = c001096e4900[7353] 'fs_racer_file_s' THREAD: c0010978c000 CPU: 10
> > GPR00: 4000 c0010978fb80 c0d3d798 0001
> > GPR04: c083539e c1610228  c54c6880
> > GPR08: 06a5 c0648f54 0007 049b
> > GPR12:  cf601900  
> > GPR16: 4b7dc520   c0010978fea0
> > GPR20: 0fffcca7e7a0 0fffcca7e7a0 0fffabf7dfd0 0fffabf7dfd0
> > GPR24:  01200011 c0e1c0a8 c0648ed4
> > GPR28:  c001096e4900 c0ca0458 c0010725d400
> > NIP [c008d0f4] .copy_process+0x310/0xf40
> > LR [c008d0d0] .copy_process+0x2ec/0xf40
> > Call Trace:
> > [c0010978fb80] [c008d0d0] .copy_process+0x2ec/0xf40 (unreliable)
> > [c0010978fc80] [c008deb4] .do_fork+0x190/0x3cc
> > [c0010978fdc0] [c0011ef4] .sys_clone+0x58/0x70
> > [c0010978fe30] [c00087f0] .ppc_clone+0x8/0xc
> > Instruction dump:
> > 419e0010 7fe3fb78 480774cd 6000 801f0014 e93f0008 7800b842 39290080
> > 78004800 6042 901f0014 38004000<7d6048a8>   7d6b0078 7d6049ad 40c2fff4
> > 
> > Kernel version 2.6.34-rc3-git3 works fine.
> 
> Should this read 2.6.35-rc3-git3?
> 
> If so, there's only about 20 commits in:
> 5904b3b81d2516..984bc9601f64fd
> 
> The likely fs related candidates are from Christoph and Nick Piggin
> (added to CC)
> 
> No commits relating to POWER6 or PPC.

Not sure what's happening here. The first warning looks like some mutex
corruption, but it doesn't have a stack trace (these are 2 separate
dumps, right? ie. the copy_process stack doesn't relate to the mutex
warning?) So I don't have much idea.

If it is reproducible, can you try getting a better stack trace, or
better yet, even bisecting if there is just a small window?

Thanks,
Nick

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 14/27] KVM: PPC: Magic Page BookE support

2010-07-01 Thread Alexander Graf
As we now have Book3s support for the magic page, we also need BookE to
join in on the party.

This patch implements generic magic page logic for BookE and specific
TLB logic for e500. I didn't have any 440 around, so I didn't dare to
blindly try and write up broken code.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/booke.c|   29 +
 arch/powerpc/kvm/e500_tlb.c |   19 +--
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 0f8ff9d..9609207 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -244,6 +244,31 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
vcpu->arch.shared->int_pending = 0;
 }
 
+/* Check if a DTLB miss was on the magic page. Returns !0 if so. */
+int kvmppc_dtlb_magic_page(struct kvm_vcpu *vcpu, ulong eaddr)
+{
+   ulong mp_ea = vcpu->arch.magic_page_ea;
+   ulong gpaddr = vcpu->arch.magic_page_pa;
+   int gtlb_index = 11 | (1 << 16); /* Random number in TLB1 */
+
+   /* Check for existence of magic page */
+   if(likely(!mp_ea))
+   return 0;
+
+   /* Check if we're on the magic page */
+   if(likely((eaddr >> 12) != (mp_ea >> 12)))
+   return 0;
+
+   /* Don't map in user mode */
+   if(vcpu->arch.shared->msr & MSR_PR)
+   return 0;
+
+   kvmppc_mmu_map(vcpu, vcpu->arch.magic_page_ea, gpaddr, gtlb_index);
+   kvmppc_account_exit(vcpu, DTLB_VIRT_MISS_EXITS);
+
+   return 1;
+}
+
 /**
  * kvmppc_handle_exit
  *
@@ -311,6 +336,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,
r = RESUME_HOST;
break;
case EMULATE_FAIL:
+   case EMULATE_DO_MMIO:
/* XXX Deliver Program interrupt to guest. */
printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n",
   __func__, vcpu->arch.pc, vcpu->arch.last_inst);
@@ -380,6 +406,9 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,
gpa_t gpaddr;
gfn_t gfn;
 
+   if (kvmppc_dtlb_magic_page(vcpu, eaddr))
+   break;
+
/* Check the guest TLB. */
gtlb_index = kvmppc_mmu_dtlb_index(vcpu, eaddr);
if (gtlb_index < 0) {
diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index 66845a5..f5582ca 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -295,9 +295,22 @@ static inline void kvmppc_e500_shadow_map(struct 
kvmppc_vcpu_e500 *vcpu_e500,
struct page *new_page;
struct tlbe *stlbe;
hpa_t hpaddr;
+   u32 mas2 = gtlbe->mas2;
+   u32 mas3 = gtlbe->mas3;
 
stlbe = &vcpu_e500->shadow_tlb[tlbsel][esel];
 
+   if ((vcpu_e500->vcpu.arch.magic_page_ea) &&
+   ((vcpu_e500->vcpu.arch.magic_page_pa >> PAGE_SHIFT) == gfn) &&
+   !(vcpu_e500->vcpu.arch.shared->msr & MSR_PR)) {
+   mas2 = 0;
+   mas3 = E500_TLB_SUPER_PERM_MASK;
+   hpaddr = virt_to_phys(vcpu_e500->vcpu.arch.shared);
+   new_page = pfn_to_page(hpaddr >> PAGE_SHIFT);
+   get_page(new_page);
+   goto mapped;
+   }
+
/* Get reference to new page. */
new_page = gfn_to_page(vcpu_e500->vcpu.kvm, gfn);
if (is_error_page(new_page)) {
@@ -305,6 +318,8 @@ static inline void kvmppc_e500_shadow_map(struct 
kvmppc_vcpu_e500 *vcpu_e500,
kvm_release_page_clean(new_page);
return;
}
+
+mapped:
hpaddr = page_to_phys(new_page);
 
/* Drop reference to old page. */
@@ -316,10 +331,10 @@ static inline void kvmppc_e500_shadow_map(struct 
kvmppc_vcpu_e500 *vcpu_e500,
stlbe->mas1 = MAS1_TSIZE(BOOK3E_PAGESZ_4K)
| MAS1_TID(get_tlb_tid(gtlbe)) | MAS1_TS | MAS1_VALID;
stlbe->mas2 = (gvaddr & MAS2_EPN)
-   | e500_shadow_mas2_attrib(gtlbe->mas2,
+   | e500_shadow_mas2_attrib(mas2,
vcpu_e500->vcpu.arch.shared->msr & MSR_PR);
stlbe->mas3 = (hpaddr & MAS3_RPN)
-   | e500_shadow_mas3_attrib(gtlbe->mas3,
+   | e500_shadow_mas3_attrib(mas3,
vcpu_e500->vcpu.arch.shared->msr & MSR_PR);
stlbe->mas7 = (hpaddr >> 32) & MAS7_RPN;
 
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 25/27] KVM: PPC: PV mtmsrd L=0 and mtmsr

2010-07-01 Thread Alexander Graf
There is also a form of mtmsr where all bits need to be addressed. While the
PPC64 Linux kernel behaves reasonably well here, on PPC32 we do not have an
L=1 form. It does mtmsr even for simple things like only changing EE.

So we need to hook into that one as well and check for a mask of bits that we
deem safe to change from within guest context.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - use kvm_patch_ins_b
---
 arch/powerpc/kernel/kvm.c  |   51 
 arch/powerpc/kernel/kvm_emul.S |   84 
 2 files changed, 135 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 1e32298..2541736 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -62,7 +62,9 @@
 #define KVM_INST_MTSPR_DSISR   0x7c1203a6
 
 #define KVM_INST_TLBSYNC   0x7c00046c
+#define KVM_INST_MTMSRD_L0 0x7c000164
 #define KVM_INST_MTMSRD_L1 0x7c010164
+#define KVM_INST_MTMSR 0x7c000124
 
 static bool kvm_patching_worked = true;
 static char kvm_tmp[1024 * 1024];
@@ -166,6 +168,49 @@ static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt)
kvm_patch_ins_b(inst, distance_start);
 }
 
+extern u32 kvm_emulate_mtmsr_branch_offs;
+extern u32 kvm_emulate_mtmsr_reg1_offs;
+extern u32 kvm_emulate_mtmsr_reg2_offs;
+extern u32 kvm_emulate_mtmsr_reg3_offs;
+extern u32 kvm_emulate_mtmsr_orig_ins_offs;
+extern u32 kvm_emulate_mtmsr_len;
+extern u32 kvm_emulate_mtmsr[];
+
+static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt)
+{
+   u32 *p;
+   int distance_start;
+   int distance_end;
+   ulong next_inst;
+
+   p = kvm_alloc(kvm_emulate_mtmsr_len * 4);
+   if (!p)
+   return;
+
+   /* Find out where we are and put everything there */
+   distance_start = (ulong)p - (ulong)inst;
+   next_inst = ((ulong)inst + 4);
+   distance_end = next_inst - (ulong)&p[kvm_emulate_mtmsr_branch_offs];
+
+   /* Make sure we only write valid b instructions */
+   if (distance_start > KVM_INST_B_MAX) {
+   kvm_patching_worked = false;
+   return;
+   }
+
+   /* Modify the chunk to fit the invocation */
+   memcpy(p, kvm_emulate_mtmsr, kvm_emulate_mtmsr_len * 4);
+   p[kvm_emulate_mtmsr_branch_offs] |= distance_end & KVM_INST_B_MASK;
+   p[kvm_emulate_mtmsr_reg1_offs] |= rt;
+   p[kvm_emulate_mtmsr_reg2_offs] |= rt;
+   p[kvm_emulate_mtmsr_reg3_offs] |= rt;
+   p[kvm_emulate_mtmsr_orig_ins_offs] = *inst;
+   flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsr_len * 4);
+
+   /* Patch the invocation */
+   kvm_patch_ins_b(inst, distance_start);
+}
+
 static void kvm_map_magic_page(void *data)
 {
kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
@@ -246,6 +291,12 @@ static void kvm_check_ins(u32 *inst)
if (get_rt(inst_rt) < 30)
kvm_patch_ins_mtmsrd(inst, inst_rt);
break;
+   case KVM_INST_MTMSR:
+   case KVM_INST_MTMSRD_L0:
+   /* We use r30 and r31 during the hook */
+   if (get_rt(inst_rt) < 30)
+   kvm_patch_ins_mtmsr(inst, inst_rt);
+   break;
}
 
switch (_inst) {
diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index 25e6683..ccf5a42 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -110,3 +110,87 @@ kvm_emulate_mtmsrd_reg_offs:
 .global kvm_emulate_mtmsrd_len
 kvm_emulate_mtmsrd_len:
.long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4
+
+
+#define MSR_SAFE_BITS (MSR_EE | MSR_CE | MSR_ME | MSR_RI)
+#define MSR_CRITICAL_BITS ~MSR_SAFE_BITS
+
+.global kvm_emulate_mtmsr
+kvm_emulate_mtmsr:
+
+   SCRATCH_SAVE
+
+   /* Fetch old MSR in r31 */
+   LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
+
+   /* Find the changed bits between old and new MSR */
+kvm_emulate_mtmsr_reg1:
+   xor r31, r0, r31
+
+   /* Check if we need to really do mtmsr */
+   LOAD_REG_IMMEDIATE(r30, MSR_CRITICAL_BITS)
+   and.r31, r31, r30
+
+   /* No critical bits changed? Maybe we can stay in the guest. */
+   beq maybe_stay_in_guest
+
+do_mtmsr:
+
+   SCRATCH_RESTORE
+
+   /* Just fire off the mtmsr if it's critical */
+kvm_emulate_mtmsr_orig_ins:
+   mtmsr   r0
+
+   b   kvm_emulate_mtmsr_branch
+
+maybe_stay_in_guest:
+
+   /* Check if we have to fetch an interrupt */
+   lwz r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
+   cmpwi   r31, 0
+   beq+no_mtmsr
+
+   /* Check if we may trigger an interrupt */
+kvm_emulate_mtmsr_reg2:
+   andi.   r31, r0, MSR_EE
+   beq no_mtmsr
+
+   b   do_mtmsr
+
+no_mtmsr:
+
+   /* Put MSR into magic page because we don't call mtmsr */
+kvm_emulate_mtmsr_reg3:
+   STL64(r0, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
+
+   SCRATCH_RESTORE
+
+   /* Go back to caller 

[PATCH 19/27] KVM: PPC: PV instructions to loads and stores

2010-07-01 Thread Alexander Graf
Some instructions can simply be replaced by load and store instructions to
or from the magic page.

This patch replaces often called instructions that fall into the above category.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - use kvm_patch_ins
---
 arch/powerpc/kernel/kvm.c |  111 +
 1 files changed, 111 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 1f328d5..7094ee4 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -32,6 +32,35 @@
 #define KVM_MAGIC_PAGE (-4096L)
 #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
 
+#define KVM_INST_LWZ   0x8000
+#define KVM_INST_STW   0x9000
+#define KVM_INST_LD0xe800
+#define KVM_INST_STD   0xf800
+#define KVM_INST_NOP   0x6000
+#define KVM_INST_B 0x4800
+#define KVM_INST_B_MASK0x03ff
+#define KVM_INST_B_MAX 0x01ff
+
+#define KVM_MASK_RT0x03e0
+#define KVM_INST_MFMSR 0x7ca6
+#define KVM_INST_MFSPR_SPRG0   0x7c1042a6
+#define KVM_INST_MFSPR_SPRG1   0x7c1142a6
+#define KVM_INST_MFSPR_SPRG2   0x7c1242a6
+#define KVM_INST_MFSPR_SPRG3   0x7c1342a6
+#define KVM_INST_MFSPR_SRR00x7c1a02a6
+#define KVM_INST_MFSPR_SRR10x7c1b02a6
+#define KVM_INST_MFSPR_DAR 0x7c1302a6
+#define KVM_INST_MFSPR_DSISR   0x7c1202a6
+
+#define KVM_INST_MTSPR_SPRG0   0x7c1043a6
+#define KVM_INST_MTSPR_SPRG1   0x7c1143a6
+#define KVM_INST_MTSPR_SPRG2   0x7c1243a6
+#define KVM_INST_MTSPR_SPRG3   0x7c1343a6
+#define KVM_INST_MTSPR_SRR00x7c1a03a6
+#define KVM_INST_MTSPR_SRR10x7c1b03a6
+#define KVM_INST_MTSPR_DAR 0x7c1303a6
+#define KVM_INST_MTSPR_DSISR   0x7c1203a6
+
 static bool kvm_patching_worked = true;
 
 static inline void kvm_patch_ins(u32 *inst, u32 new_inst)
@@ -40,6 +69,34 @@ static inline void kvm_patch_ins(u32 *inst, u32 new_inst)
flush_icache_range((ulong)inst, (ulong)inst + 4);
 }
 
+static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt)
+{
+#ifdef CONFIG_64BIT
+   kvm_patch_ins(inst, KVM_INST_LD | rt | (addr & 0xfffc));
+#else
+   kvm_patch_ins(inst, KVM_INST_LWZ | rt | ((addr + 4) & 0xfffc));
+#endif
+}
+
+static void kvm_patch_ins_lwz(u32 *inst, long addr, u32 rt)
+{
+   kvm_patch_ins(inst, KVM_INST_LWZ | rt | (addr & 0x));
+}
+
+static void kvm_patch_ins_std(u32 *inst, long addr, u32 rt)
+{
+#ifdef CONFIG_64BIT
+   kvm_patch_ins(inst, KVM_INST_STD | rt | (addr & 0xfffc));
+#else
+   kvm_patch_ins(inst, KVM_INST_STW | rt | ((addr + 4) & 0xfffc));
+#endif
+}
+
+static void kvm_patch_ins_stw(u32 *inst, long addr, u32 rt)
+{
+   kvm_patch_ins(inst, KVM_INST_STW | rt | (addr & 0xfffc));
+}
+
 static void kvm_map_magic_page(void *data)
 {
kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
@@ -54,6 +111,60 @@ static void kvm_check_ins(u32 *inst)
u32 inst_rt = _inst & KVM_MASK_RT;
 
switch (inst_no_rt) {
+   /* Loads */
+   case KVM_INST_MFMSR:
+   kvm_patch_ins_ld(inst, magic_var(msr), inst_rt);
+   break;
+   case KVM_INST_MFSPR_SPRG0:
+   kvm_patch_ins_ld(inst, magic_var(sprg0), inst_rt);
+   break;
+   case KVM_INST_MFSPR_SPRG1:
+   kvm_patch_ins_ld(inst, magic_var(sprg1), inst_rt);
+   break;
+   case KVM_INST_MFSPR_SPRG2:
+   kvm_patch_ins_ld(inst, magic_var(sprg2), inst_rt);
+   break;
+   case KVM_INST_MFSPR_SPRG3:
+   kvm_patch_ins_ld(inst, magic_var(sprg3), inst_rt);
+   break;
+   case KVM_INST_MFSPR_SRR0:
+   kvm_patch_ins_ld(inst, magic_var(srr0), inst_rt);
+   break;
+   case KVM_INST_MFSPR_SRR1:
+   kvm_patch_ins_ld(inst, magic_var(srr1), inst_rt);
+   break;
+   case KVM_INST_MFSPR_DAR:
+   kvm_patch_ins_ld(inst, magic_var(dar), inst_rt);
+   break;
+   case KVM_INST_MFSPR_DSISR:
+   kvm_patch_ins_lwz(inst, magic_var(dsisr), inst_rt);
+   break;
+
+   /* Stores */
+   case KVM_INST_MTSPR_SPRG0:
+   kvm_patch_ins_std(inst, magic_var(sprg0), inst_rt);
+   break;
+   case KVM_INST_MTSPR_SPRG1:
+   kvm_patch_ins_std(inst, magic_var(sprg1), inst_rt);
+   break;
+   case KVM_INST_MTSPR_SPRG2:
+   kvm_patch_ins_std(inst, magic_var(sprg2), inst_rt);
+   break;
+   case KVM_INST_MTSPR_SPRG3:
+   kvm_patch_ins_std(inst, magic_var(sprg3), inst_rt);
+   break;
+   case KVM_INST_MTSPR_SRR0:
+   kvm_patch_ins_std(inst, magic_var(srr0), inst_rt);
+   break;
+   case KVM_INST_MTSPR_SRR1:
+   kvm_patch_ins_std(inst, magic_var(srr1), inst_rt);
+   break;
+   case KVM_INST

[PATCH 24/27] KVM: PPC: PV mtmsrd L=1

2010-07-01 Thread Alexander Graf
The PowerPC ISA has a special instruction for mtmsr that only changes the EE
and RI bits, namely the L=1 form.

Since that one occurs reasonably often and is simple to implement, let's
go with this first. Writing EE=0 is always just a store. Doing EE=1 also
requires us to check for pending interrupts and if necessary exit back to the
hypervisor.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - use kvm_patch_ins_b
---
 arch/powerpc/kernel/kvm.c  |   45 
 arch/powerpc/kernel/kvm_emul.S |   56 
 2 files changed, 101 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 337e3e5..1e32298 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -62,6 +62,7 @@
 #define KVM_INST_MTSPR_DSISR   0x7c1203a6
 
 #define KVM_INST_TLBSYNC   0x7c00046c
+#define KVM_INST_MTMSRD_L1 0x7c010164
 
 static bool kvm_patching_worked = true;
 static char kvm_tmp[1024 * 1024];
@@ -128,6 +129,43 @@ static u32 *kvm_alloc(int len)
return p;
 }
 
+extern u32 kvm_emulate_mtmsrd_branch_offs;
+extern u32 kvm_emulate_mtmsrd_reg_offs;
+extern u32 kvm_emulate_mtmsrd_len;
+extern u32 kvm_emulate_mtmsrd[];
+
+static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt)
+{
+   u32 *p;
+   int distance_start;
+   int distance_end;
+   ulong next_inst;
+
+   p = kvm_alloc(kvm_emulate_mtmsrd_len * 4);
+   if (!p)
+   return;
+
+   /* Find out where we are and put everything there */
+   distance_start = (ulong)p - (ulong)inst;
+   next_inst = ((ulong)inst + 4);
+   distance_end = next_inst - (ulong)&p[kvm_emulate_mtmsrd_branch_offs];
+
+   /* Make sure we only write valid b instructions */
+   if (distance_start > KVM_INST_B_MAX) {
+   kvm_patching_worked = false;
+   return;
+   }
+
+   /* Modify the chunk to fit the invocation */
+   memcpy(p, kvm_emulate_mtmsrd, kvm_emulate_mtmsrd_len * 4);
+   p[kvm_emulate_mtmsrd_branch_offs] |= distance_end & KVM_INST_B_MASK;
+   p[kvm_emulate_mtmsrd_reg_offs] |= rt;
+   flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsrd_len * 4);
+
+   /* Patch the invocation */
+   kvm_patch_ins_b(inst, distance_start);
+}
+
 static void kvm_map_magic_page(void *data)
 {
kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
@@ -201,6 +239,13 @@ static void kvm_check_ins(u32 *inst)
case KVM_INST_TLBSYNC:
kvm_patch_ins_nop(inst);
break;
+
+   /* Rewrites */
+   case KVM_INST_MTMSRD_L1:
+   /* We use r30 and r31 during the hook */
+   if (get_rt(inst_rt) < 30)
+   kvm_patch_ins_mtmsrd(inst, inst_rt);
+   break;
}
 
switch (_inst) {
diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index 7da835a..25e6683 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -54,3 +54,59 @@
/* Disable critical section. We are critical if \
   shared->critical == r1 and r2 is always != r1 */ \
STL64(r2, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0);
+
+.global kvm_emulate_mtmsrd
+kvm_emulate_mtmsrd:
+
+   SCRATCH_SAVE
+
+   /* Put MSR & ~(MSR_EE|MSR_RI) in r31 */
+   LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
+   lis r30, (~(MSR_EE | MSR_RI))@h
+   ori r30, r30, (~(MSR_EE | MSR_RI))@l
+   and r31, r31, r30
+
+   /* OR the register's (MSR_EE|MSR_RI) on MSR */
+kvm_emulate_mtmsrd_reg:
+   andi.   r30, r0, (MSR_EE|MSR_RI)
+   or  r31, r31, r30
+
+   /* Put MSR back into magic page */
+   STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
+
+   /* Check if we have to fetch an interrupt */
+   lwz r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
+   cmpwi   r31, 0
+   beq+no_check
+
+   /* Check if we may trigger an interrupt */
+   andi.   r30, r30, MSR_EE
+   beq no_check
+
+   SCRATCH_RESTORE
+
+   /* Nag hypervisor */
+   tlbsync
+
+   b   kvm_emulate_mtmsrd_branch
+
+no_check:
+
+   SCRATCH_RESTORE
+
+   /* Go back to caller */
+kvm_emulate_mtmsrd_branch:
+   b   .
+kvm_emulate_mtmsrd_end:
+
+.global kvm_emulate_mtmsrd_branch_offs
+kvm_emulate_mtmsrd_branch_offs:
+   .long (kvm_emulate_mtmsrd_branch - kvm_emulate_mtmsrd) / 4
+
+.global kvm_emulate_mtmsrd_reg_offs
+kvm_emulate_mtmsrd_reg_offs:
+   .long (kvm_emulate_mtmsrd_reg - kvm_emulate_mtmsrd) / 4
+
+.global kvm_emulate_mtmsrd_len
+kvm_emulate_mtmsrd_len:
+   .long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
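
A minimal user-space C sketch of what the generated kvm_emulate_mtmsrd chunk above
computes for the cached MSR in the magic page; the MSR_EE/MSR_RI values are
assumptions taken from the usual PowerPC register layout, not quoted from this patch:

#include <stdint.h>
#include <stdio.h>

#define MSR_EE 0x8000ULL   /* assumed: external interrupt enable */
#define MSR_RI 0x0002ULL   /* assumed: recoverable interrupt */

/* Only EE and RI are taken from the register operand (rt); every other
 * bit of the shared MSR copy stays untouched. */
static uint64_t emulate_mtmsrd_l1(uint64_t shared_msr, uint64_t gpr_rt)
{
	uint64_t msr = shared_msr & ~(MSR_EE | MSR_RI);

	msr |= gpr_rt & (MSR_EE | MSR_RI);
	return msr;
}

int main(void)
{
	uint64_t msr = 0x1032;	/* arbitrary example value, EE off */

	printf("new msr = 0x%llx\n",
	       (unsigned long long)emulate_mtmsrd_l1(msr, MSR_EE | MSR_RI));
	return 0;
}

If the write re-enables EE while int_pending is set, the generated code falls
through to the patched tlbsync and traps into the host so the pending interrupt
can be delivered - the "Nag hypervisor" path in the assembly above.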


[PATCH 27/27] KVM: PPC: Add Documentation about PV interface

2010-07-01 Thread Alexander Graf
We just introduced a new PV interface that screams for documentation. So here
it is - a shiny new and awesome text file describing the internal works of
the PPC KVM paravirtual interface.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - clarify guest implementation
  - clarify that privileged instructions still work
  - explain safe MSR bits
  - Fix dsisr patch description
  - change hypervisor calls to use new register values
---
 Documentation/kvm/ppc-pv.txt |  185 ++
 1 files changed, 185 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/kvm/ppc-pv.txt

diff --git a/Documentation/kvm/ppc-pv.txt b/Documentation/kvm/ppc-pv.txt
new file mode 100644
index 000..82de6c6
--- /dev/null
+++ b/Documentation/kvm/ppc-pv.txt
@@ -0,0 +1,185 @@
+The PPC KVM paravirtual interface
+=================================
+
+The basic execution principle by which KVM on PowerPC works is to run all 
kernel
+space code in PR=1 which is user space. This way we trap all privileged
+instructions and can emulate them accordingly.
+
+Unfortunately that is also its downfall. There are quite a few privileged
+instructions that needlessly return us to the hypervisor even though they
+could be handled differently.
+
+This is what the PPC PV interface helps with. It takes privileged instructions
+and transforms them into unprivileged ones with some help from the hypervisor.
+This cuts down virtualization costs by about 50% on some of my benchmarks.
+
+The code for that interface can be found in arch/powerpc/kernel/kvm*
+
+Querying for existence
+======================
+
+To find out if we're running on KVM or not, we overlay the PVR register. 
Usually
+the PVR register contains an id that identifies your CPU type. If, however, you
+pass KVM_PVR_PARA in the register that you want the PVR result in, the register
+still contains KVM_PVR_PARA after the mfpvr call.
+
+   LOAD_REG_IMM(r5, KVM_PVR_PARA)
+   mfpvr   r5
+   [r5 still contains KVM_PVR_PARA]
+
+Once determined to run under a PV capable KVM, you can now use hypercalls as
+described below.
+
+PPC hypercalls
+==============
+
+The only viable ways to reliably get from guest context to host context are:
+
+   1) Call an invalid instruction
+   2) Call the "sc" instruction with a parameter to "sc"
+   3) Call the "sc" instruction with parameters in GPRs
+
+Method 1 is always a bad idea. Invalid instructions can be replaced later on
+by valid instructions, rendering the interface broken.
+
+Method 2 also has drawbacks. If the parameter to "sc" is != 0, the spec is
+rather unclear whether the sc is targeted directly at the hypervisor or the
+supervisor. It would also require that we read the syscall issuing instruction
+every time a syscall is issued, slowing down guest syscalls.
+
+Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R0 and
+KVM_SC_MAGIC_R3) in r0 and r3 respectively. If a syscall instruction with these
+magic values arrives from the guest's kernel mode, we take the syscall as a
+hypercall.
+
+The parameters are as follows:
+
+   r0  KVM_SC_MAGIC_R0
+   r3  KVM_SC_MAGIC_R3 Return code
+   r4  Hypercall number
+   r5  First parameter
+   r6  Second parameter
+   r7  Third parameter
+   r8  Fourth parameter
+
+Hypercall definitions are shared in generic code, so the same hypercall numbers
+apply for x86 and powerpc alike.
+
+The magic page
+==============
+
+To enable communication between the hypervisor and guest there is a new shared
+page that contains parts of supervisor visible register state. The guest can
+map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.
+
+With this hypercall issued the guest always gets the magic page mapped at the
+desired location in effective and physical address space. For now, we always
+map the page to -4096. This way we can access it using absolute load and store
+functions. The following instruction reads the first field of the magic page:
+
+   ld  rX, -4096(0)
+
+The interface is designed to be extensible should there be need later to add
+additional registers to the magic page. If you add fields to the magic page,
+also define a new hypercall feature to indicate that the host can give you more
+registers. Only if the host supports the additional features, make use of them.
+
+The magic page has the following layout as described in
+arch/powerpc/include/asm/kvm_para.h:
+
+struct kvm_vcpu_arch_shared {
+   __u64 scratch1;
+   __u64 scratch2;
+   __u64 scratch3;
+   __u64 critical; /* Guest may not get interrupts if == r1 */
+   __u64 sprg0;
+   __u64 sprg1;
+   __u64 sprg2;
+   __u64 sprg3;
+   __u64 srr0;
+   __u64 srr1;
+   __u64 dar;
+   __u64 msr;
+   __u32 dsisr;
+   __u32 int_pending;  /* Tells the guest if we have an interrupt 
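
To make the layout above concrete, here is a small self-contained C sketch of how a
guest reads the shared MSR copy once the magic page is mapped. The struct mirrors
the fields listed above; the stand-in buffer only exists so the example runs in
user space, whereas a real guest would dereference the fixed -4096 mapping instead:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Field order as documented above. */
struct kvm_vcpu_arch_shared {
	uint64_t scratch1, scratch2, scratch3;
	uint64_t critical;
	uint64_t sprg0, sprg1, sprg2, sprg3;
	uint64_t srr0, srr1;
	uint64_t dar;
	uint64_t msr;
	uint32_t dsisr;
	uint32_t int_pending;
};

#define KVM_MAGIC_PAGE (-4096L)	/* where the guest maps the page */

/* In a real guest "magic" would simply be (void *)KVM_MAGIC_PAGE. */
static uint64_t read_shared_msr(const struct kvm_vcpu_arch_shared *magic)
{
	/* Equivalent of: ld rX, (KVM_MAGIC_PAGE + offsetof(..., msr))(0) */
	return magic->msr;
}

int main(void)
{
	struct kvm_vcpu_arch_shared fake = { .msr = 0x8000000000009032ULL };

	printf("msr lives at offset %zu of the magic page\n",
	       offsetof(struct kvm_vcpu_arch_shared, msr));
	printf("shared msr: 0x%llx\n",
	       (unsigned long long)read_shared_msr(&fake));
	return 0;
}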

[PATCH 21/27] KVM: PPC: Introduce kvm_tmp framework

2010-07-01 Thread Alexander Graf
We will soon require more sophisticated methods to replace single instructions
with multiple instructions. We do that by branching to a memory region into which
we write replacement code for the instruction.

This region needs to be within 32 MB of the patched instruction though, because
that's the furthest we can jump with immediate branches.

So we keep 1MB of free space around in bss. After we're done initializing we can
just tell the mm system that the unused pages are free, but until then we have
enough space to fit all our code in.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kernel/kvm.c |   41 +++--
 1 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 3a49de5..75c9e0b 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -64,6 +64,8 @@
 #define KVM_INST_TLBSYNC   0x7c00046c
 
 static bool kvm_patching_worked = true;
+static char kvm_tmp[1024 * 1024];
+static int kvm_tmp_index;
 
 static inline void kvm_patch_ins(u32 *inst, u32 new_inst)
 {
@@ -104,6 +106,23 @@ static void kvm_patch_ins_nop(u32 *inst)
kvm_patch_ins(inst, KVM_INST_NOP);
 }
 
+static u32 *kvm_alloc(int len)
+{
+   u32 *p;
+
+   if ((kvm_tmp_index + len) > ARRAY_SIZE(kvm_tmp)) {
+   printk(KERN_ERR "KVM: No more space (%d + %d)\n",
+   kvm_tmp_index, len);
+   kvm_patching_worked = false;
+   return NULL;
+   }
+
+   p = (void*)&kvm_tmp[kvm_tmp_index];
+   kvm_tmp_index += len;
+
+   return p;
+}
+
 static void kvm_map_magic_page(void *data)
 {
kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
@@ -201,12 +220,27 @@ static void kvm_use_magic_page(void)
kvm_check_ins(p);
 }
 
+static void kvm_free_tmp(void)
+{
+   unsigned long start, end;
+
+   start = (ulong)&kvm_tmp[kvm_tmp_index + (PAGE_SIZE - 1)] & PAGE_MASK;
+   end = (ulong)&kvm_tmp[ARRAY_SIZE(kvm_tmp)] & PAGE_MASK;
+
+   /* Free the tmp space we don't need */
+   for (; start < end; start += PAGE_SIZE) {
+   ClearPageReserved(virt_to_page(start));
+   init_page_count(virt_to_page(start));
+   free_page(start);
+   totalram_pages++;
+   }
+}
+
 static int __init kvm_guest_init(void)
 {
-   char *p;
 
if (!kvm_para_available())
-   return 0;
+   goto free_tmp;
 
if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE))
kvm_use_magic_page();
@@ -214,6 +248,9 @@ static int __init kvm_guest_init(void)
printk(KERN_INFO "KVM: Live patching for a fast VM %s\n",
 kvm_patching_worked ? "worked" : "failed");
 
+free_tmp:
+   kvm_free_tmp();
+
return 0;
 }
 
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
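
A quick user-space C sketch of the page arithmetic kvm_free_tmp() performs in the
patch above, computing which part of the 1MB scratch buffer gets handed back once
patching is done; the 4KB PAGE_SIZE/PAGE_MASK definitions are assumptions, not
taken from this patch:

#include <stdio.h>

#define PAGE_SIZE	4096UL			/* assumed 4KB pages */
#define PAGE_MASK	(~(PAGE_SIZE - 1))
#define KVM_TMP_SIZE	(1024UL * 1024UL)	/* matches kvm_tmp[] above */

int main(void)
{
	unsigned long base = 0x10000000UL;	/* stand-in for &kvm_tmp[0] */
	unsigned long used = 12345;		/* stand-in for kvm_tmp_index */

	/* First whole page past the used area, and the end of the buffer. */
	unsigned long start = (base + used + (PAGE_SIZE - 1)) & PAGE_MASK;
	unsigned long end   = (base + KVM_TMP_SIZE) & PAGE_MASK;

	printf("would free %lu pages (0x%lx - 0x%lx)\n",
	       (end - start) / PAGE_SIZE, start, end);
	return 0;
}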


[PATCH 22/27] KVM: PPC: Introduce branch patching helper

2010-07-01 Thread Alexander Graf
We will need to patch several instruction streams over to a different
code path, so we need a way to patch a single instruction with a branch
somewhere else.

This patch adds a helper to facilitate this patching.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kernel/kvm.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 75c9e0b..337e3e5 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -106,6 +106,11 @@ static void kvm_patch_ins_nop(u32 *inst)
kvm_patch_ins(inst, KVM_INST_NOP);
 }
 
+static void kvm_patch_ins_b(u32 *inst, int addr)
+{
+   kvm_patch_ins(inst, KVM_INST_B | (addr & KVM_INST_B_MASK));
+}
+
 static u32 *kvm_alloc(int len)
 {
u32 *p;
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
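
The helper above relies on the fixed layout of the PowerPC I-form branch: opcode
bits plus a 26-bit, word-aligned displacement. Below is a user-space C sketch of
the same encoding and of the distance check the later patches perform before
patching; the KVM_INST_B* values are assumptions based on that layout, mirroring
the defines the series uses elsewhere:

#include <stdint.h>
#include <stdio.h>

#define KVM_INST_B	0x48000000u	/* assumed: "b" with AA=0, LK=0 */
#define KVM_INST_B_MASK	0x03fffffcu	/* assumed: LI displacement field */
#define KVM_INST_B_MAX	0x01ffffffu	/* assumed: farthest reachable target */

/* Same idea as kvm_patch_ins_b(): drop the word-aligned offset into LI. */
static uint32_t encode_branch(int32_t offset)
{
	return KVM_INST_B | ((uint32_t)offset & KVM_INST_B_MASK);
}

int main(void)
{
	int32_t distance = 0x1234;	/* pretend distance to a kvm_tmp chunk */

	if (distance > (int32_t)KVM_INST_B_MAX) {
		fprintf(stderr, "target too far for an immediate branch\n");
		return 1;
	}
	printf("patched instruction: 0x%08x\n", encode_branch(distance));
	return 0;
}

This is also why the kvm_tmp scratch area from patch 21 lives in bss: the
replacement chunks have to stay within immediate-branch range of the kernel text
they patch.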


[PATCH 26/27] KVM: PPC: PV wrteei

2010-07-01 Thread Alexander Graf
On BookE the preferred way to write the EE bit is the wrteei instruction. It
already encodes the EE bit in the instruction.

So in order to get BookE some speedups as well, let's also PV'nize that
instruction.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - use kvm_patch_ins_b
---
 arch/powerpc/kernel/kvm.c  |   50 
 arch/powerpc/kernel/kvm_emul.S |   41 
 2 files changed, 91 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 2541736..995fadd 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -66,6 +66,9 @@
 #define KVM_INST_MTMSRD_L1 0x7c010164
 #define KVM_INST_MTMSR 0x7c000124
 
+#define KVM_INST_WRTEEI_0  0x7c000146
+#define KVM_INST_WRTEEI_1  0x7c008146
+
 static bool kvm_patching_worked = true;
 static char kvm_tmp[1024 * 1024];
 static int kvm_tmp_index;
@@ -211,6 +214,47 @@ static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt)
kvm_patch_ins_b(inst, distance_start);
 }
 
+#ifdef CONFIG_BOOKE
+
+extern u32 kvm_emulate_wrteei_branch_offs;
+extern u32 kvm_emulate_wrteei_ee_offs;
+extern u32 kvm_emulate_wrteei_len;
+extern u32 kvm_emulate_wrteei[];
+
+static void kvm_patch_ins_wrteei(u32 *inst)
+{
+   u32 *p;
+   int distance_start;
+   int distance_end;
+   ulong next_inst;
+
+   p = kvm_alloc(kvm_emulate_wrteei_len * 4);
+   if (!p)
+   return;
+
+   /* Find out where we are and put everything there */
+   distance_start = (ulong)p - (ulong)inst;
+   next_inst = ((ulong)inst + 4);
+   distance_end = next_inst - (ulong)&p[kvm_emulate_wrteei_branch_offs];
+
+   /* Make sure we only write valid b instructions */
+   if (distance_start > KVM_INST_B_MAX) {
+   kvm_patching_worked = false;
+   return;
+   }
+
+   /* Modify the chunk to fit the invocation */
+   memcpy(p, kvm_emulate_wrteei, kvm_emulate_wrteei_len * 4);
+   p[kvm_emulate_wrteei_branch_offs] |= distance_end & KVM_INST_B_MASK;
+   p[kvm_emulate_wrteei_ee_offs] |= (*inst & MSR_EE);
+   flush_icache_range((ulong)p, (ulong)p + kvm_emulate_wrteei_len * 4);
+
+   /* Patch the invocation */
+   kvm_patch_ins_b(inst, distance_start);
+}
+
+#endif
+
 static void kvm_map_magic_page(void *data)
 {
kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
@@ -300,6 +344,12 @@ static void kvm_check_ins(u32 *inst)
}
 
switch (_inst) {
+#ifdef CONFIG_BOOKE
+   case KVM_INST_WRTEEI_0:
+   case KVM_INST_WRTEEI_1:
+   kvm_patch_ins_wrteei(inst);
+   break;
+#endif
}
 }
 
diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index ccf5a42..b79b9de 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -194,3 +194,44 @@ kvm_emulate_mtmsr_orig_ins_offs:
 .global kvm_emulate_mtmsr_len
 kvm_emulate_mtmsr_len:
.long (kvm_emulate_mtmsr_end - kvm_emulate_mtmsr) / 4
+
+
+
+.global kvm_emulate_wrteei
+kvm_emulate_wrteei:
+
+   SCRATCH_SAVE
+
+   /* Fetch old MSR in r31 */
+   LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
+
+   /* Remove MSR_EE from old MSR */
+   li  r30, 0
+   ori r30, r30, MSR_EE
+   andc    r31, r31, r30
+
+   /* OR new MSR_EE onto the old MSR */
+kvm_emulate_wrteei_ee:
+   ori r31, r31, 0
+
+   /* Write new MSR value back */
+   STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
+
+   SCRATCH_RESTORE
+
+   /* Go back to caller */
+kvm_emulate_wrteei_branch:
+   b   .
+kvm_emulate_wrteei_end:
+
+.global kvm_emulate_wrteei_branch_offs
+kvm_emulate_wrteei_branch_offs:
+   .long (kvm_emulate_wrteei_branch - kvm_emulate_wrteei) / 4
+
+.global kvm_emulate_wrteei_ee_offs
+kvm_emulate_wrteei_ee_offs:
+   .long (kvm_emulate_wrteei_ee - kvm_emulate_wrteei) / 4
+
+.global kvm_emulate_wrteei_len
+kvm_emulate_wrteei_len:
+   .long (kvm_emulate_wrteei_end - kvm_emulate_wrteei) / 4
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
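
The |= (*inst & MSR_EE) trick above works because wrteei encodes its E bit in the
same bit position that MSR_EE occupies, so the instruction word can be masked
straight into the generated ori immediate. A small user-space C sketch of the
resulting MSR update; the MSR_EE value is an assumption from the usual PowerPC
definitions:

#include <stdint.h>
#include <stdio.h>

#define MSR_EE		0x8000u		/* assumed: external interrupt enable */
#define INST_WRTEEI_1	0x7c008146u	/* wrteei 1: E bit overlaps MSR_EE */

/* What the generated kvm_emulate_wrteei chunk computes for the magic page. */
static uint64_t emulate_wrteei(uint64_t shared_msr, uint32_t inst)
{
	uint64_t msr = shared_msr & ~(uint64_t)MSR_EE;	/* clear EE */

	msr |= inst & MSR_EE;		/* copy EE straight from the opcode */
	return msr;
}

int main(void)
{
	uint64_t msr = 0x29200ULL;	/* arbitrary example, EE currently off */

	printf("after wrteei 1: 0x%llx\n",
	       (unsigned long long)emulate_wrteei(msr, INST_WRTEEI_1));
	return 0;
}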


[PATCH 02/27] KVM: PPC: Convert MSR to shared page

2010-07-01 Thread Alexander Graf
One of the most obvious registers to share with the guest directly is the
MSR. The MSR contains the "interrupts enabled" flag which the guest has to
toggle in critical sections.

So in order to bring the overhead of interrupt en- and disabling down, let's
put msr into the shared page. Keep in mind that even though you can fully read
its contents, writing to it doesn't always update all state. There are a few
safe fields that don't require hypervisor interaction. See the documentation
for a list of MSR bits that are safe to be set from inside the guest.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h  |1 -
 arch/powerpc/include/asm/kvm_para.h  |1 +
 arch/powerpc/kernel/asm-offsets.c|2 +-
 arch/powerpc/kvm/44x_tlb.c   |8 ++--
 arch/powerpc/kvm/book3s.c|   65 --
 arch/powerpc/kvm/book3s_32_mmu.c |   12 +++---
 arch/powerpc/kvm/book3s_32_mmu_host.c|4 +-
 arch/powerpc/kvm/book3s_64_mmu.c |   12 +++---
 arch/powerpc/kvm/book3s_64_mmu_host.c|4 +-
 arch/powerpc/kvm/book3s_emulate.c|9 ++--
 arch/powerpc/kvm/book3s_paired_singles.c |7 ++-
 arch/powerpc/kvm/booke.c |   20 +-
 arch/powerpc/kvm/booke.h |6 +-
 arch/powerpc/kvm/booke_emulate.c |6 +-
 arch/powerpc/kvm/booke_interrupts.S  |3 +-
 arch/powerpc/kvm/e500_tlb.c  |   12 +++---
 arch/powerpc/kvm/e500_tlb.h  |2 +-
 arch/powerpc/kvm/powerpc.c   |3 +-
 18 files changed, 93 insertions(+), 84 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 246a3dd..c7aee42 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -210,7 +210,6 @@ struct kvm_vcpu_arch {
u32 cr;
 #endif
 
-   ulong msr;
 #ifdef CONFIG_PPC_BOOK3S
ulong shadow_msr;
ulong hflags;
diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index 1485ba8..a17dc52 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -23,6 +23,7 @@
 #include 
 
 struct kvm_vcpu_arch_shared {
+   __u64 msr;
 };
 
 #ifdef __KERNEL__
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 944f593..a55d47e 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -394,13 +394,13 @@ int main(void)
DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, arch.host_stack));
DEFINE(VCPU_HOST_PID, offsetof(struct kvm_vcpu, arch.host_pid));
DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr));
-   DEFINE(VCPU_MSR, offsetof(struct kvm_vcpu, arch.msr));
DEFINE(VCPU_SPRG4, offsetof(struct kvm_vcpu, arch.sprg4));
DEFINE(VCPU_SPRG5, offsetof(struct kvm_vcpu, arch.sprg5));
DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6));
DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7));
DEFINE(VCPU_SHADOW_PID, offsetof(struct kvm_vcpu, arch.shadow_pid));
DEFINE(VCPU_SHARED, offsetof(struct kvm_vcpu, arch.shared));
+   DEFINE(VCPU_SHARED_MSR, offsetof(struct kvm_vcpu_arch_shared, msr));
 
/* book3s */
 #ifdef CONFIG_PPC_BOOK3S
diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c
index 8123125..4cbbca7 100644
--- a/arch/powerpc/kvm/44x_tlb.c
+++ b/arch/powerpc/kvm/44x_tlb.c
@@ -221,14 +221,14 @@ gpa_t kvmppc_mmu_xlate(struct kvm_vcpu *vcpu, unsigned 
int gtlb_index,
 
 int kvmppc_mmu_itlb_index(struct kvm_vcpu *vcpu, gva_t eaddr)
 {
-   unsigned int as = !!(vcpu->arch.msr & MSR_IS);
+   unsigned int as = !!(vcpu->arch.shared->msr & MSR_IS);
 
return kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as);
 }
 
 int kvmppc_mmu_dtlb_index(struct kvm_vcpu *vcpu, gva_t eaddr)
 {
-   unsigned int as = !!(vcpu->arch.msr & MSR_DS);
+   unsigned int as = !!(vcpu->arch.shared->msr & MSR_DS);
 
return kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as);
 }
@@ -353,7 +353,7 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, 
gpa_t gpaddr,
 
	stlbe.word1 = (hpaddr & 0xfffffc00) | ((hpaddr >> 32) & 0xf);
stlbe.word2 = kvmppc_44x_tlb_shadow_attrib(flags,
-   vcpu->arch.msr & MSR_PR);
+   vcpu->arch.shared->msr & 
MSR_PR);
stlbe.tid = !(asid & 0xff);
 
/* Keep track of the reference so we can properly release it later. */
@@ -422,7 +422,7 @@ static int tlbe_is_host_safe(const struct kvm_vcpu *vcpu,
 
/* Does it match current guest AS? */
/* XXX what about IS != DS? */
-   if (get_tlb_ts(tlbe) != !!(vcpu->arch.msr & MSR_IS))
+   if (get_tlb_ts(tlbe) != !!(vcpu->arch.shared->msr & MSR_IS))
return 0;
 
gpa = get_tlb_raddr(tlbe);
di

[PATCH 17/27] KVM: PPC: Generic KVM PV guest support

2010-07-01 Thread Alexander Graf
We have all the hypervisor pieces in place now, but the guest parts are still
missing.

This patch implements basic awareness of KVM when running Linux as guest. It
doesn't do anything with it yet though.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kernel/Makefile  |2 ++
 arch/powerpc/kernel/asm-offsets.c |   15 +++
 arch/powerpc/kernel/kvm.c |   34 ++
 arch/powerpc/kernel/kvm_emul.S|   27 +++
 arch/powerpc/platforms/Kconfig|   10 ++
 5 files changed, 88 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kernel/kvm.c
 create mode 100644 arch/powerpc/kernel/kvm_emul.S

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 58d0572..2d7eb9e 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -125,6 +125,8 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),)
 obj-y  += ppc_save_regs.o
 endif
 
+obj-$(CONFIG_KVM_GUEST) += kvm.o kvm_emul.o
+
 # Disable GCOV in odd or sensitive code
 GCOV_PROFILE_prom_init.o := n
 GCOV_PROFILE_ftrace.o := n
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index a55d47e..e3e740b 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -465,6 +465,21 @@ int main(void)
DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
 #endif /* CONFIG_PPC_BOOK3S */
 #endif
+
+#ifdef CONFIG_KVM_GUEST
+   DEFINE(KVM_MAGIC_SCRATCH1, offsetof(struct kvm_vcpu_arch_shared,
+   scratch1));
+   DEFINE(KVM_MAGIC_SCRATCH2, offsetof(struct kvm_vcpu_arch_shared,
+   scratch2));
+   DEFINE(KVM_MAGIC_SCRATCH3, offsetof(struct kvm_vcpu_arch_shared,
+   scratch3));
+   DEFINE(KVM_MAGIC_INT, offsetof(struct kvm_vcpu_arch_shared,
+  int_pending));
+   DEFINE(KVM_MAGIC_MSR, offsetof(struct kvm_vcpu_arch_shared, msr));
+   DEFINE(KVM_MAGIC_CRITICAL, offsetof(struct kvm_vcpu_arch_shared,
+   critical));
+#endif
+
 #ifdef CONFIG_44x
DEFINE(PGD_T_LOG2, PGD_T_LOG2);
DEFINE(PTE_T_LOG2, PTE_T_LOG2);
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
new file mode 100644
index 000..2d8dd73
--- /dev/null
+++ b/arch/powerpc/kernel/kvm.c
@@ -0,0 +1,34 @@
+/*
+ * Copyright (C) 2010 SUSE Linux Products GmbH. All rights reserved.
+ *
+ * Authors:
+ * Alexander Graf 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define KVM_MAGIC_PAGE (-4096L)
+#define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
+
diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
new file mode 100644
index 000..c7b9fc9
--- /dev/null
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -0,0 +1,27 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2010
+ *
+ * Authors: Alexander Graf 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define KVM_MAGIC_PAGE (-4096)
+
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index d1663db..1744349 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -21,6 +21,16 @@ source "arch/powerpc/platforms/44x/Kconfig"
 source "arch/powerpc/platforms/40x/Kconfig"
 source "arch/powerpc/platforms/amigaone/Kconfig"
 
+config KVM_GUEST
+   bool "KVM Guest support"
+   default y
+   ---help---
+ This

[PATCH 23/27] KVM: PPC: PV assembler helpers

2010-07-01 Thread Alexander Graf
When we hook an instruction we need to make sure we don't clobber any of
the registers at that point. So we write them out to scratch space in the
magic page. To make sure we don't fall into a race with another piece of
hooked code, we need to disable interrupts.

To make the later patches and code in general easier readable, let's introduce
a set of defines that save and restore r30, r31 and cr. Let's also define some
helpers to read the lower 32 bits of a 64 bit field on 32 bit systems.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kernel/kvm_emul.S |   29 +
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index c7b9fc9..7da835a 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -25,3 +25,32 @@
 
 #define KVM_MAGIC_PAGE (-4096)
 
+#ifdef CONFIG_64BIT
+#define LL64(reg, offs, reg2)  ld  reg, (offs)(reg2)
+#define STL64(reg, offs, reg2) std reg, (offs)(reg2)
+#else
+#define LL64(reg, offs, reg2)  lwz reg, (offs + 4)(reg2)
+#define STL64(reg, offs, reg2) stw reg, (offs + 4)(reg2)
+#endif
+
+#define SCRATCH_SAVE   \
+   /* Enable critical section. We are critical if  \
+  shared->critical == r1 */\
+   STL64(r1, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0);  \
+   \
+   /* Save state */\
+   PPC_STL r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH1)(0);  \
+   PPC_STL r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH2)(0);  \
+   mfcr    r31;                                \
+   stw r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH3)(0);
+
+#define SCRATCH_RESTORE
\
+   /* Restore state */ \
+   PPC_LL  r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH1)(0);  \
+   lwz r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH3)(0);  \
+   mtcr    r30;                                \
+   PPC_LL  r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH2)(0);  \
+   \
+   /* Disable critical section. We are critical if \
+  shared->critical == r1 and r2 is always != r1 */ \
+   STL64(r2, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0);
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
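
The offs + 4 in the 32-bit LL64/STL64 variants above is plain big-endian layout: on
a 32-bit big-endian guest the interesting low word of a __u64 magic-page field sits
four bytes into it. A tiny C illustration (it runs anywhere, but the +4 case is the
big-endian one these macros target):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	uint64_t msr = 0x00000000deadbeefULL;	/* only the low 32 bits set */
	unsigned char bytes[8];
	uint32_t word;

	memcpy(bytes, &msr, sizeof(bytes));

	/* Big endian: the low word is at offset 4, hence "offs + 4" above. */
	memcpy(&word, bytes + 4, sizeof(word));
	printf("word at offset 4: 0x%08x (0xdeadbeef on big endian)\n", word);

	memcpy(&word, bytes, sizeof(word));
	printf("word at offset 0: 0x%08x (0xdeadbeef on little endian)\n", word);
	return 0;
}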


[PATCH 13/27] KVM: PPC: Magic Page Book3s support

2010-07-01 Thread Alexander Graf
We need to override EA as well as PA lookups for the magic page. When the guest
tells us to project it, the magic page overrides any guest mappings.

In order to reflect that, we need to hook into all the MMU layers of KVM to
force map the magic page if necessary.

Signed-off-by: Alexander Graf 

v1 -> v2:

  - RMO -> PAM
---
 arch/powerpc/kvm/book3s.c |7 +++
 arch/powerpc/kvm/book3s_32_mmu.c  |   16 
 arch/powerpc/kvm/book3s_32_mmu_host.c |   12 
 arch/powerpc/kvm/book3s_64_mmu.c  |   30 +-
 arch/powerpc/kvm/book3s_64_mmu_host.c |   12 
 5 files changed, 76 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 14db032..b22e608 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -554,6 +554,13 @@ mmio:
 
 static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
+   ulong mp_pa = vcpu->arch.magic_page_pa;
+
+   if (unlikely(mp_pa) &&
+   unlikely((mp_pa & KVM_RMO) >> PAGE_SHIFT == gfn)) {
+   return 1;
+   }
+
return kvm_is_visible_gfn(vcpu->kvm, gfn);
 }
 
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 41130c8..5bf4bf8 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -281,8 +281,24 @@ static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu 
*vcpu, gva_t eaddr,
  struct kvmppc_pte *pte, bool data)
 {
int r;
+   ulong mp_ea = vcpu->arch.magic_page_ea;
 
pte->eaddr = eaddr;
+
+   /* Magic page override */
+   if (unlikely(mp_ea) &&
+   unlikely((eaddr & ~0xfffULL) == (mp_ea & ~0xfffULL)) &&
+   !(vcpu->arch.shared->msr & MSR_PR)) {
+   pte->vpage = kvmppc_mmu_book3s_32_ea_to_vp(vcpu, eaddr, data);
+   pte->raddr = vcpu->arch.magic_page_pa | (pte->raddr & 0xfff);
+   pte->raddr &= KVM_PAM;
+   pte->may_execute = true;
+   pte->may_read = true;
+   pte->may_write = true;
+
+   return 0;
+   }
+
r = kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, pte, data);
if (r < 0)
   r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, true);
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c 
b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 67b8c38..506d187 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -145,6 +145,16 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct 
kvmppc_pte *orig_pte)
bool primary = false;
bool evict = false;
struct hpte_cache *pte;
+   ulong mp_pa = vcpu->arch.magic_page_pa;
+
+   /* Magic page override */
+   if (unlikely(mp_pa) &&
+   unlikely((orig_pte->raddr & ~0xfffUL & KVM_PAM) ==
+(mp_pa & ~0xfffUL & KVM_PAM))) {
+   hpaddr = (pfn_t)virt_to_phys(vcpu->arch.shared);
+   get_page(pfn_to_page(hpaddr >> PAGE_SHIFT));
+   goto mapped;
+   }
 
/* Get host physical address for gpa */
hpaddr = gfn_to_pfn(vcpu->kvm, orig_pte->raddr >> PAGE_SHIFT);
@@ -155,6 +165,8 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct 
kvmppc_pte *orig_pte)
}
hpaddr <<= PAGE_SHIFT;
 
+mapped:
+
/* and write the mapping ea -> hpa into the pt */
vcpu->arch.mmu.esid_to_vsid(vcpu, orig_pte->eaddr >> SID_SHIFT, &vsid);
map = find_sid_vsid(vcpu, vsid);
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 58aa840..d7889ef 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -163,6 +163,22 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu 
*vcpu, gva_t eaddr,
bool found = false;
bool perm_err = false;
int second = 0;
+   ulong mp_ea = vcpu->arch.magic_page_ea;
+
+   /* Magic page override */
+   if (unlikely(mp_ea) &&
+   unlikely((eaddr & ~0xfffULL) == (mp_ea & ~0xfffULL)) &&
+   !(vcpu->arch.shared->msr & MSR_PR)) {
+   gpte->eaddr = eaddr;
+   gpte->vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu, eaddr, data);
+   gpte->raddr = vcpu->arch.magic_page_pa | (gpte->raddr & 0xfff);
+   gpte->raddr &= KVM_PAM;
+   gpte->may_execute = true;
+   gpte->may_read = true;
+   gpte->may_write = true;
+
+   return 0;
+   }
 
slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, eaddr);
if (!slbe)
@@ -445,6 +461,7 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct 
kvm_vcpu *vcpu, ulong esid,
ulong ea = esid << SID_SHIFT;
struct kvmppc_slb *slb;
u64 gvsid = esid;
+   ulong mp_ea = vcpu->arch.magic_page_ea;
 
if (vcpu->arch.shared->msr & (MSR_DR|MSR_IR)) {
   

[PATCH 10/27] KVM: PPC: Tell guest about pending interrupts

2010-07-01 Thread Alexander Graf
When the guest turns on interrupts again, it needs to know whether we have an
interrupt pending for it, because if so, it should get out of guest context and
take the interrupt.

So we introduce a new field in the shared page that we use to tell the guest
that there's a pending interrupt lying around.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_para.h |1 +
 arch/powerpc/kvm/book3s.c   |7 +++
 arch/powerpc/kvm/booke.c|7 +++
 3 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index 1f7dccd..82131fc 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -36,6 +36,7 @@ struct kvm_vcpu_arch_shared {
__u64 dar;
__u64 msr;
__u32 dsisr;
+   __u32 int_pending;  /* Tells the guest if we have an interrupt */
 };
 
 #define KVM_PVR_PARA   0x4b564d3f /* "KVM?" */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index ab43744..66313a2 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -337,6 +337,7 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, 
unsigned int priority)
 void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
 {
unsigned long *pending = &vcpu->arch.pending_exceptions;
+   unsigned long old_pending = vcpu->arch.pending_exceptions;
unsigned int priority;
 
 #ifdef EXIT_DEBUG
@@ -356,6 +357,12 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
 BITS_PER_BYTE * sizeof(*pending),
 priority + 1);
}
+
+   /* Tell the guest about our interrupt status */
+   if (*pending)
+   vcpu->arch.shared->int_pending = 1;
+   else if (old_pending)
+   vcpu->arch.shared->int_pending = 0;
 }
 
 void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index b9f8ecf..0f8ff9d 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -224,6 +224,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu 
*vcpu,
 void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
 {
unsigned long *pending = &vcpu->arch.pending_exceptions;
+   unsigned long old_pending = vcpu->arch.pending_exceptions;
unsigned int priority;
 
priority = __ffs(*pending);
@@ -235,6 +236,12 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
 BITS_PER_BYTE * sizeof(*pending),
 priority + 1);
}
+
+   /* Tell the guest about our interrupt status */
+   if (*pending)
+   vcpu->arch.shared->int_pending = 1;
+   else if (old_pending)
+   vcpu->arch.shared->int_pending = 0;
 }
 
 /**
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 08/27] KVM: PPC: Add PV guest critical sections

2010-07-01 Thread Alexander Graf
When running in hooked code we need a way to disable interrupts without
clobbering any registers or exiting out to the hypervisor.

To achieve this, we have an additional critical field in the shared page. If
that field is equal to the r1 register of the guest, it tells the hypervisor
that we're in such a critical section and thus may not receive any interrupts.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - make crit detection only trigger in supervisor mode
---
 arch/powerpc/include/asm/kvm_para.h |1 +
 arch/powerpc/kvm/book3s.c   |   18 --
 arch/powerpc/kvm/booke.c|   15 +++
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index 89c2760..d9c06ab 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -23,6 +23,7 @@
 #include 
 
 struct kvm_vcpu_arch_shared {
+   __u64 critical; /* Guest may not get interrupts if == r1 */
__u64 sprg0;
__u64 sprg1;
__u64 sprg2;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 10afa48..ab43744 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -251,14 +251,28 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, 
unsigned int priority)
int deliver = 1;
int vec = 0;
ulong flags = 0ULL;
+   ulong crit_raw = vcpu->arch.shared->critical;
+   ulong crit_r1 = kvmppc_get_gpr(vcpu, 1);
+   bool crit;
+
+   /* Truncate crit indicators in 32 bit mode */
+   if (!(vcpu->arch.shared->msr & MSR_SF)) {
+   crit_raw &= 0xffffffff;
+   crit_r1 &= 0xffffffff;
+   }
+
+   /* Critical section when crit == r1 */
+   crit = (crit_raw == crit_r1);
+   /* ... and we're in supervisor mode */
+   crit = crit && !(vcpu->arch.shared->msr & MSR_PR);
 
switch (priority) {
case BOOK3S_IRQPRIO_DECREMENTER:
-   deliver = vcpu->arch.shared->msr & MSR_EE;
+   deliver = (vcpu->arch.shared->msr & MSR_EE) && !crit;
vec = BOOK3S_INTERRUPT_DECREMENTER;
break;
case BOOK3S_IRQPRIO_EXTERNAL:
-   deliver = vcpu->arch.shared->msr & MSR_EE;
+   deliver = (vcpu->arch.shared->msr & MSR_EE) && !crit;
vec = BOOK3S_INTERRUPT_EXTERNAL;
break;
case BOOK3S_IRQPRIO_SYSTEM_RESET:
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index bd812f4..b9f8ecf 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -147,6 +147,20 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu 
*vcpu,
int allowed = 0;
ulong uninitialized_var(msr_mask);
bool update_esr = false, update_dear = false;
+   ulong crit_raw = vcpu->arch.shared->critical;
+   ulong crit_r1 = kvmppc_get_gpr(vcpu, 1);
+   bool crit;
+
+   /* Truncate crit indicators in 32 bit mode */
+   if (!(vcpu->arch.shared->msr & MSR_SF)) {
+   crit_raw &= 0xffffffff;
+   crit_r1 &= 0xffffffff;
+   }
+
+   /* Critical section when crit == r1 */
+   crit = (crit_raw == crit_r1);
+   /* ... and we're in supervisor mode */
+   crit = crit && !(vcpu->arch.shared->msr & MSR_PR);
 
switch (priority) {
case BOOKE_IRQPRIO_DTLB_MISS:
@@ -181,6 +195,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu 
*vcpu,
case BOOKE_IRQPRIO_DECREMENTER:
case BOOKE_IRQPRIO_FIT:
allowed = vcpu->arch.shared->msr & MSR_EE;
+   allowed = allowed && !crit;
msr_mask = MSR_CE|MSR_ME|MSR_DE;
break;
case BOOKE_IRQPRIO_DEBUG:
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
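
The two hunks above add the same gate to Book3S and BookE interrupt delivery;
written out as one predicate it looks like the user-space C sketch below. The MSR
bit values are assumptions from the usual PowerPC definitions, not quoted from the
patch:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MSR_SF 0x8000000000000000ULL	/* assumed: 64-bit mode */
#define MSR_PR 0x4000ULL		/* assumed: problem (user) state */

/* True means "the guest is in a critical section, hold the interrupt back". */
static bool in_guest_critical(uint64_t shared_critical, uint64_t guest_r1,
			      uint64_t shared_msr)
{
	uint64_t crit_raw = shared_critical;
	uint64_t crit_r1 = guest_r1;

	/* 32-bit guests only compare the low words. */
	if (!(shared_msr & MSR_SF)) {
		crit_raw &= 0xffffffffULL;
		crit_r1 &= 0xffffffffULL;
	}

	/* Critical iff the stored value matches r1 and we're in supervisor mode. */
	return crit_raw == crit_r1 && !(shared_msr & MSR_PR);
}

int main(void)
{
	/* A 32-bit guest that stored its stack pointer into shared->critical. */
	printf("%d\n", in_guest_critical(0xc0000000UL, 0xffffffffc0000000ULL, 0));
	return 0;
}

SCRATCH_RESTORE from the PV assembler helpers later clears the marker by storing
r2 (which is never equal to r1) back into shared->critical.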


[PATCH 11/27] KVM: PPC: Make RMO a define

2010-07-01 Thread Alexander Graf
On PowerPC it's very normal to not support all of the physical RAM in real mode.
To check if we're matching on the shared page or not, we need to know the limits
so we can restrain ourselves to that range.

So let's make it a define instead of open-coding it. And while at it, let's also
increase it.

Signed-off-by: Alexander Graf 

v1 -> v2:

  - RMO -> PAM
---
 arch/powerpc/include/asm/kvm_host.h |3 +++
 arch/powerpc/kvm/book3s.c   |4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 5674300..fdfb7f0 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -47,6 +47,9 @@
#define HPTEG_HASH_NUM_VPTE	(1 << HPTEG_HASH_BITS_VPTE)
 #define HPTEG_HASH_NUM_VPTE_LONG   (1 << HPTEG_HASH_BITS_VPTE_LONG)
 
+/* Physical Address Mask - allowed range of real mode RAM access */
+#define KVM_PAM		0x0fffffffffffffffULL
+
 struct kvm;
 struct kvm_run;
 struct kvm_vcpu;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 66313a2..14db032 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -465,7 +465,7 @@ static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, 
bool data,
r = vcpu->arch.mmu.xlate(vcpu, eaddr, pte, data);
} else {
pte->eaddr = eaddr;
-   pte->raddr = eaddr & 0xffffffff;
+   pte->raddr = eaddr & KVM_PAM;
pte->vpage = VSID_REAL | eaddr >> 12;
pte->may_read = true;
pte->may_write = true;
@@ -579,7 +579,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
pte.may_execute = true;
pte.may_read = true;
pte.may_write = true;
-   pte.raddr = eaddr & 0xffffffff;
+   pte.raddr = eaddr & KVM_PAM;
pte.eaddr = eaddr;
pte.vpage = eaddr >> 12;
}
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 05/27] KVM: PPC: Convert SRR0 and SRR1 to shared page

2010-07-01 Thread Alexander Graf
The SRR0 and SRR1 registers contain cached values of the PC and MSR
respectively. They get written to by the hypervisor when an interrupt
occurs or directly by the kernel. They are also used to tell the rfi(d)
instruction where to jump to.

Because it only gets touched on well-defined events, it's very simple to
share with the guest. Hypervisor and guest both have full r/w access.

This patch converts all users of the current field to the shared page.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h |2 --
 arch/powerpc/include/asm/kvm_para.h |2 ++
 arch/powerpc/kvm/book3s.c   |   12 ++--
 arch/powerpc/kvm/book3s_emulate.c   |4 ++--
 arch/powerpc/kvm/booke.c|   15 ---
 arch/powerpc/kvm/booke_emulate.c|4 ++--
 arch/powerpc/kvm/emulate.c  |   12 
 7 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 4502c0f..227f770 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -224,8 +224,6 @@ struct kvm_vcpu_arch {
ulong sprg5;
ulong sprg6;
ulong sprg7;
-   ulong srr0;
-   ulong srr1;
ulong csrr0;
ulong csrr1;
ulong dsrr0;
diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index ec72a1c..d7fc6c2 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -23,6 +23,8 @@
 #include 
 
 struct kvm_vcpu_arch_shared {
+   __u64 srr0;
+   __u64 srr1;
__u64 dar;
__u64 msr;
__u32 dsisr;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 29a3ed6..7cc3da6 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -162,8 +162,8 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
 
 void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags)
 {
-   vcpu->arch.srr0 = kvmppc_get_pc(vcpu);
-   vcpu->arch.srr1 = vcpu->arch.shared->msr | flags;
+   vcpu->arch.shared->srr0 = kvmppc_get_pc(vcpu);
+   vcpu->arch.shared->srr1 = vcpu->arch.shared->msr | flags;
kvmppc_set_pc(vcpu, to_book3s(vcpu)->hior + vec);
vcpu->arch.mmu.reset_msr(vcpu);
 }
@@ -1059,8 +1059,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
regs->lr = kvmppc_get_lr(vcpu);
regs->xer = kvmppc_get_xer(vcpu);
regs->msr = vcpu->arch.shared->msr;
-   regs->srr0 = vcpu->arch.srr0;
-   regs->srr1 = vcpu->arch.srr1;
+   regs->srr0 = vcpu->arch.shared->srr0;
+   regs->srr1 = vcpu->arch.shared->srr1;
regs->pid = vcpu->arch.pid;
regs->sprg0 = vcpu->arch.sprg0;
regs->sprg1 = vcpu->arch.sprg1;
@@ -1086,8 +1086,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
kvmppc_set_lr(vcpu, regs->lr);
kvmppc_set_xer(vcpu, regs->xer);
kvmppc_set_msr(vcpu, regs->msr);
-   vcpu->arch.srr0 = regs->srr0;
-   vcpu->arch.srr1 = regs->srr1;
+   vcpu->arch.shared->srr0 = regs->srr0;
+   vcpu->arch.shared->srr1 = regs->srr1;
vcpu->arch.sprg0 = regs->sprg0;
vcpu->arch.sprg1 = regs->sprg1;
vcpu->arch.sprg2 = regs->sprg2;
diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index c147864..f333cb4 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -73,8 +73,8 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
switch (get_xop(inst)) {
case OP_19_XOP_RFID:
case OP_19_XOP_RFI:
-   kvmppc_set_pc(vcpu, vcpu->arch.srr0);
-   kvmppc_set_msr(vcpu, vcpu->arch.srr1);
+   kvmppc_set_pc(vcpu, vcpu->arch.shared->srr0);
+   kvmppc_set_msr(vcpu, vcpu->arch.shared->srr1);
*advance = 0;
break;
 
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 5844bcf..8b546fe 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -64,7 +64,8 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu)
 
printk("pc:   %08lx msr:  %08llx\n", vcpu->arch.pc, 
vcpu->arch.shared->msr);
printk("lr:   %08lx ctr:  %08lx\n", vcpu->arch.lr, vcpu->arch.ctr);
-   printk("srr0: %08lx srr1: %08lx\n", vcpu->arch.srr0, vcpu->arch.srr1);
+   printk("srr0: %08llx srr1: %08llx\n", vcpu->arch.shared->srr0,
+   vcpu->arch.shared->srr1);
 
printk("exceptions: %08lx\n", vcpu->arch.pending_exceptions);
 
@@ -189,8 +190,8 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu 
*vcpu,
}
 
if (allowed) {
-   vcpu->arch.srr0 = vcpu->arch.pc;
-   vcpu->arch.srr1 = vcpu->arch.shared->msr;
+ 

[PATCH 09/27] KVM: PPC: Add PV guest scratch registers

2010-07-01 Thread Alexander Graf
While running in hooked code we need to store register contents out because
we must not clobber any registers.

So let's add some fields to the shared page we can just happily write to.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_para.h |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index d9c06ab..1f7dccd 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -23,6 +23,9 @@
 #include 
 
 struct kvm_vcpu_arch_shared {
+   __u64 scratch1;
+   __u64 scratch2;
+   __u64 scratch3;
__u64 critical; /* Guest may not get interrupts if == r1 */
__u64 sprg0;
__u64 sprg1;
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 20/27] KVM: PPC: PV tlbsync to nop

2010-07-01 Thread Alexander Graf
With our current MMU scheme we don't need to know about the tlbsync instruction.
So we can just nop it out.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - use kvm_patch_ins
---
 arch/powerpc/kernel/kvm.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 7094ee4..3a49de5 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -61,6 +61,8 @@
 #define KVM_INST_MTSPR_DAR 0x7c1303a6
 #define KVM_INST_MTSPR_DSISR   0x7c1203a6
 
+#define KVM_INST_TLBSYNC   0x7c00046c
+
 static bool kvm_patching_worked = true;
 
 static inline void kvm_patch_ins(u32 *inst, u32 new_inst)
@@ -97,6 +99,11 @@ static void kvm_patch_ins_stw(u32 *inst, long addr, u32 rt)
kvm_patch_ins(inst, KVM_INST_STW | rt | (addr & 0xfffc));
 }
 
+static void kvm_patch_ins_nop(u32 *inst)
+{
+   kvm_patch_ins(inst, KVM_INST_NOP);
+}
+
 static void kvm_map_magic_page(void *data)
 {
kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
@@ -165,6 +172,11 @@ static void kvm_check_ins(u32 *inst)
case KVM_INST_MTSPR_DSISR:
kvm_patch_ins_stw(inst, magic_var(dsisr), inst_rt);
break;
+
+   /* Nops */
+   case KVM_INST_TLBSYNC:
+   kvm_patch_ins_nop(inst);
+   break;
}
 
switch (_inst) {
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 18/27] KVM: PPC: KVM PV guest stubs

2010-07-01 Thread Alexander Graf
We will soon start to replace instructions from the text section with
other, paravirtualized versions. To ease the readability of those patches
I split out the generic looping and magic page mapping code out.

This patch still only contains stubs. But at least it loops through the
text section :).

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - kvm guest patch framework: introduce patch_ins
---
 arch/powerpc/kernel/kvm.c |   63 +
 1 files changed, 63 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 2d8dd73..1f328d5 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -32,3 +32,66 @@
 #define KVM_MAGIC_PAGE (-4096L)
 #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
 
+static bool kvm_patching_worked = true;
+
+static inline void kvm_patch_ins(u32 *inst, u32 new_inst)
+{
+   *inst = new_inst;
+   flush_icache_range((ulong)inst, (ulong)inst + 4);
+}
+
+static void kvm_map_magic_page(void *data)
+{
+   kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
+  KVM_MAGIC_PAGE,  /* Physical Address */
+  KVM_MAGIC_PAGE); /* Effective Address */
+}
+
+static void kvm_check_ins(u32 *inst)
+{
+   u32 _inst = *inst;
+   u32 inst_no_rt = _inst & ~KVM_MASK_RT;
+   u32 inst_rt = _inst & KVM_MASK_RT;
+
+   switch (inst_no_rt) {
+   }
+
+   switch (_inst) {
+   }
+}
+
+static void kvm_use_magic_page(void)
+{
+   u32 *p;
+   u32 *start, *end;
+
+   /* Tell the host to map the magic page to -4096 on all CPUs */
+
+   on_each_cpu(kvm_map_magic_page, NULL, 1);
+
+   /* Now loop through all code and find instructions */
+
+   start = (void*)_stext;
+   end = (void*)_etext;
+
+   for (p = start; p < end; p++)
+   kvm_check_ins(p);
+}
+
+static int __init kvm_guest_init(void)
+{
+   char *p;
+
+   if (!kvm_para_available())
+   return 0;
+
+   if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE))
+   kvm_use_magic_page();
+
+   printk(KERN_INFO "KVM: Live patching for a fast VM %s\n",
+kvm_patching_worked ? "worked" : "failed");
+
+   return 0;
+}
+
+postcore_initcall(kvm_guest_init);
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 16/27] KVM: Move kvm_guest_init out of generic code

2010-07-01 Thread Alexander Graf
Currently x86 is the only architecture that uses kvm_guest_init(). With
PowerPC we're getting a second user, but the signature is different there
and we don't need to export it, as it uses the normal kernel init framework.

So let's move the x86 specific definition of that function over to the x86
specific header file.

Signed-off-by: Alexander Graf 
---
 arch/x86/include/asm/kvm_para.h |6 ++
 include/linux/kvm_para.h|5 -
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 05eba5e..7b562b6 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -158,6 +158,12 @@ static inline unsigned int kvm_arch_para_features(void)
return cpuid_eax(KVM_CPUID_FEATURES);
 }
 
+#ifdef CONFIG_KVM_GUEST
+void __init kvm_guest_init(void);
+#else
+#define kvm_guest_init() do { } while (0)
 #endif
 
+#endif /* __KERNEL__ */
+
 #endif /* _ASM_X86_KVM_PARA_H */
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index ac2015a..47a070b 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -26,11 +26,6 @@
 #include 
 
 #ifdef __KERNEL__
-#ifdef CONFIG_KVM_GUEST
-void __init kvm_guest_init(void);
-#else
-#define kvm_guest_init() do { } while (0)
-#endif
 
 static inline int kvm_para_has_feature(unsigned int feature)
 {
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 15/27] KVM: PPC: Expose magic page support to guest

2010-07-01 Thread Alexander Graf
Now that we have the shared page in place and the MMU code knows about
the magic page, we can expose that capability to the guest!

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_para.h |2 ++
 arch/powerpc/kvm/powerpc.c  |   11 +++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index 82131fc..3cae15d 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -43,6 +43,8 @@ struct kvm_vcpu_arch_shared {
#define KVM_SC_MAGIC_R0	0x4b564d52 /* "KVMR" */
#define KVM_SC_MAGIC_R3	0x554c455a /* "ULEZ" */
 
+#define KVM_FEATURE_MAGIC_PAGE 1
+
 #ifdef __KERNEL__
 
 static inline int kvm_para_available(void)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 1ebb29e..0be119a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -60,8 +60,19 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
}
 
switch (nr) {
+   case KVM_HC_PPC_MAP_MAGIC_PAGE:
+   {
+   vcpu->arch.magic_page_pa = param1;
+   vcpu->arch.magic_page_ea = param2;
+
+   r = 0;
+   break;
+   }
case KVM_HC_FEATURES:
r = 0;
+#if !defined(CONFIG_KVM_440) /* XXX missing bits on 440 */
+   r |= (1 << KVM_FEATURE_MAGIC_PAGE);
+#endif
break;
default:
r = -KVM_ENOSYS;
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 04/27] KVM: PPC: Convert DAR to shared page.

2010-07-01 Thread Alexander Graf
The DAR register contains the address a data page fault occurred at. This
register behaves pretty much like a simple data storage register that gets
written to on data faults. There is no hypervisor interaction required on
read or write.

This patch converts all users of the current field to the shared page.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h  |1 -
 arch/powerpc/include/asm/kvm_para.h  |1 +
 arch/powerpc/kvm/book3s.c|   14 +++---
 arch/powerpc/kvm/book3s_emulate.c|6 +++---
 arch/powerpc/kvm/book3s_paired_singles.c |2 +-
 arch/powerpc/kvm/booke.c |2 +-
 arch/powerpc/kvm/booke_emulate.c |4 ++--
 7 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index c7aee42..4502c0f 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -230,7 +230,6 @@ struct kvm_vcpu_arch {
ulong csrr1;
ulong dsrr0;
ulong dsrr1;
-   ulong dear;
ulong esr;
u32 dec;
u32 decar;
diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index 9f7565b..ec72a1c 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -23,6 +23,7 @@
 #include 
 
 struct kvm_vcpu_arch_shared {
+   __u64 dar;
__u64 msr;
__u32 dsisr;
 };
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 72917f8..29a3ed6 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -594,14 +594,14 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 
if (page_found == -ENOENT) {
/* Page not found in guest PTE entries */
-   vcpu->arch.dear = kvmppc_get_fault_dar(vcpu);
+   vcpu->arch.shared->dar = kvmppc_get_fault_dar(vcpu);
vcpu->arch.shared->dsisr = to_svcpu(vcpu)->fault_dsisr;
vcpu->arch.shared->msr |=
(to_svcpu(vcpu)->shadow_srr1 & 0xf800ULL);
kvmppc_book3s_queue_irqprio(vcpu, vec);
} else if (page_found == -EPERM) {
/* Storage protection */
-   vcpu->arch.dear = kvmppc_get_fault_dar(vcpu);
+   vcpu->arch.shared->dar = kvmppc_get_fault_dar(vcpu);
vcpu->arch.shared->dsisr =
to_svcpu(vcpu)->fault_dsisr & ~DSISR_NOHPTE;
vcpu->arch.shared->dsisr |= DSISR_PROTFAULT;
@@ -610,7 +610,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
kvmppc_book3s_queue_irqprio(vcpu, vec);
} else if (page_found == -EINVAL) {
/* Page not found in guest SLB */
-   vcpu->arch.dear = kvmppc_get_fault_dar(vcpu);
+   vcpu->arch.shared->dar = kvmppc_get_fault_dar(vcpu);
kvmppc_book3s_queue_irqprio(vcpu, vec + 0x80);
} else if (!is_mmio &&
   kvmppc_visible_gfn(vcpu, pte.raddr >> PAGE_SHIFT)) {
@@ -867,17 +867,17 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
if (to_svcpu(vcpu)->fault_dsisr & DSISR_NOHPTE) {
r = kvmppc_handle_pagefault(run, vcpu, dar, exit_nr);
} else {
-   vcpu->arch.dear = dar;
+   vcpu->arch.shared->dar = dar;
vcpu->arch.shared->dsisr = to_svcpu(vcpu)->fault_dsisr;
kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
-   kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFUL);
+   kvmppc_mmu_pte_flush(vcpu, dar, ~0xFFFUL);
r = RESUME_GUEST;
}
break;
}
case BOOK3S_INTERRUPT_DATA_SEGMENT:
if (kvmppc_mmu_map_segment(vcpu, kvmppc_get_fault_dar(vcpu)) < 
0) {
-   vcpu->arch.dear = kvmppc_get_fault_dar(vcpu);
+   vcpu->arch.shared->dar = kvmppc_get_fault_dar(vcpu);
kvmppc_book3s_queue_irqprio(vcpu,
BOOK3S_INTERRUPT_DATA_SEGMENT);
}
@@ -997,7 +997,7 @@ program_interrupt:
if (kvmppc_read_inst(vcpu) == EMULATE_DONE) {
vcpu->arch.shared->dsisr = kvmppc_alignment_dsisr(vcpu,
kvmppc_get_last_inst(vcpu));
-   vcpu->arch.dear = kvmppc_alignment_dar(vcpu,
+   vcpu->arch.shared->dar = kvmppc_alignment_dar(vcpu,
kvmppc_get_last_inst(vcpu));
kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
}
diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index 9982ff1..c147864 100644
--- a/arch/powerpc/kvm/book3s_emulate.c

[PATCH 12/27] KVM: PPC: First magic page steps

2010-07-01 Thread Alexander Graf
We will be introducing a method to project the shared page in guest context.
As soon as we're talking about this coupling, the shared page is called the
magic page.

This patch introduces simple defines, so the follow-up patches are easier to
read.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h |2 ++
 include/linux/kvm_para.h|1 +
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index fdfb7f0..14be0f3 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -286,6 +286,8 @@ struct kvm_vcpu_arch {
u64 dec_jiffies;
unsigned long pending_exceptions;
struct kvm_vcpu_arch_shared *shared;
+   unsigned long magic_page_pa; /* phys addr to map the magic page to */
+   unsigned long magic_page_ea; /* effect. addr to map the magic page to */
 
 #ifdef CONFIG_PPC_BOOK3S
struct hlist_head hpte_hash_pte[HPTEG_HASH_NUM_PTE];
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 3b8080e..ac2015a 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -18,6 +18,7 @@
 #define KVM_HC_VAPIC_POLL_IRQ  1
 #define KVM_HC_MMU_OP  2
 #define KVM_HC_FEATURES3
+#define KVM_HC_PPC_MAP_MAGIC_PAGE  4
 
 /*
  * hypercalls use architecture specific
-- 
1.6.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 07/27] KVM: PPC: Implement hypervisor interface

2010-07-01 Thread Alexander Graf
To communicate with KVM directly we need to plumb some sort of interface
between the guest and KVM. Usually those interfaces use hypercalls.

This hypercall implementation is described in the last patch of the series
in a special documentation file. Please read that for further information.

This patch implements stubs to handle KVM PPC hypercalls on the host and
guest side alike.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - change hypervisor calls to use new register values
---
 arch/powerpc/include/asm/kvm_para.h |  100 ++-
 arch/powerpc/include/asm/kvm_ppc.h  |1 +
 arch/powerpc/kvm/book3s.c   |   10 +++-
 arch/powerpc/kvm/booke.c|   11 -
 arch/powerpc/kvm/emulate.c  |   11 -
 arch/powerpc/kvm/powerpc.c  |   28 ++
 include/linux/kvm_para.h|1 +
 7 files changed, 156 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index e402999..89c2760 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -34,16 +34,112 @@ struct kvm_vcpu_arch_shared {
__u32 dsisr;
 };
 
+#define KVM_PVR_PARA   0x4b564d3f /* "KVM?" */
+#define KVM_SC_MAGIC_R0	0x4b564d52 /* "KVMR" */
+#define KVM_SC_MAGIC_R3	0x554c455a /* "ULEZ" */
+
 #ifdef __KERNEL__
 
 static inline int kvm_para_available(void)
 {
-   return 0;
+   unsigned long pvr = KVM_PVR_PARA;
+
+   asm volatile("mfpvr %0" : "=r"(pvr) : "0"(pvr));
+   return pvr == KVM_PVR_PARA;
+}
+
+static inline long kvm_hypercall0(unsigned int nr)
+{
+   unsigned long register r0 asm("r0") = KVM_SC_MAGIC_R0;
+   unsigned long register r3 asm("r3") = KVM_SC_MAGIC_R3;
+   unsigned long register _nr asm("r4") = nr;
+
+   asm volatile("sc"
+: "=r"(r3)
+: "r"(r0), "r"(r3), "r"(_nr)
+: "memory");
+
+   return r3;
 }
 
+static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
+{
+   unsigned long register r0 asm("r0") = KVM_SC_MAGIC_R0;
+   unsigned long register r3 asm("r3") = KVM_SC_MAGIC_R3;
+   unsigned long register _nr asm("r4") = nr;
+   unsigned long register _p1 asm("r5") = p1;
+
+   asm volatile("sc"
+: "=r"(r3)
+: "r"(r0), "r"(r3), "r"(_nr), "r"(_p1)
+: "memory");
+
+   return r3;
+}
+
+static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
+ unsigned long p2)
+{
+   unsigned long register r0 asm("r0") = KVM_SC_MAGIC_R0;
+   unsigned long register r3 asm("r3") = KVM_SC_MAGIC_R3;
+   unsigned long register _nr asm("r4") = nr;
+   unsigned long register _p1 asm("r5") = p1;
+   unsigned long register _p2 asm("r6") = p2;
+
+   asm volatile("sc"
+: "=r"(r3)
+: "r"(r0), "r"(r3), "r"(_nr), "r"(_p1), "r"(_p2)
+: "memory");
+
+   return r3;
+}
+
+static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
+ unsigned long p2, unsigned long p3)
+{
+   unsigned long register r0 asm("r0") = KVM_SC_MAGIC_R0;
+   unsigned long register r3 asm("r3") = KVM_SC_MAGIC_R3;
+   unsigned long register _nr asm("r4") = nr;
+   unsigned long register _p1 asm("r5") = p1;
+   unsigned long register _p2 asm("r6") = p2;
+   unsigned long register _p3 asm("r7") = p3;
+
+   asm volatile("sc"
+: "=r"(r3)
+: "r"(r0), "r"(r3), "r"(_nr), "r"(_p1), "r"(_p2), "r"(_p3)
+: "memory");
+
+   return r3;
+}
+
+static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
+ unsigned long p2, unsigned long p3,
+ unsigned long p4)
+{
+   unsigned long register r0 asm("r0") = KVM_SC_MAGIC_R0;
+   unsigned long register r3 asm("r3") = KVM_SC_MAGIC_R3;
+   unsigned long register _nr asm("r4") = nr;
+   unsigned long register _p1 asm("r5") = p1;
+   unsigned long register _p2 asm("r6") = p2;
+   unsigned long register _p3 asm("r7") = p3;
+   unsigned long register _p4 asm("r8") = p4;
+
+   asm volatile("sc"
+: "=r"(r3)
+: "r"(r0), "r"(r3), "r"(_nr), "r"(_p1), "r"(_p2), "r"(_p3),
+  "r"(_p4)
+: "memory");
+
+   return r3;
+}
+
+
 static inline unsigned int kvm_arch_para_features(void)
 {
-   return 0;
+   if (!kvm_para_available())
+   return 0;
+
+   return kvm_hypercall0(KVM_HC_FEATURES);
 }
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 18d139e..ecb3bc7 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -10

[PATCH 03/27] KVM: PPC: Convert DSISR to shared page

2010-07-01 Thread Alexander Graf
The DSISR register contains information about a data page fault. It is fully
readable and writable from inside the guest context, and we do not need to
intercept or react to guest writes of this register.

This patch converts all users of the current field to the shared page.
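
A hedged aside (not something this patch itself does): once the later
magic-page patches map the shared page into the guest, the guest side can read
DSISR with a plain load instead of an emulated mfspr, along these lines:

/* Sketch only; the 'shared' pointer stands in for the guest mapping that the
 * magic-page patches establish later in the series. */
static inline __u32 example_guest_read_dsisr(struct kvm_vcpu_arch_shared *shared)
{
	return shared->dsisr;	/* plain load, no exit to the hypervisor */
}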

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h|1 -
 arch/powerpc/include/asm/kvm_para.h  |1 +
 arch/powerpc/kvm/book3s.c|   11 ++-
 arch/powerpc/kvm/book3s_emulate.c|6 +++---
 arch/powerpc/kvm/book3s_paired_singles.c |2 +-
 5 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 8274a2d..b5b1961 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -85,7 +85,6 @@ struct kvmppc_vcpu_book3s {
u64 hid[6];
u64 gqr[8];
int slb_nr;
-   u32 dsisr;
u64 sdr1;
u64 hior;
u64 msr_mask;
diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index a17dc52..9f7565b 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -24,6 +24,7 @@
 
 struct kvm_vcpu_arch_shared {
__u64 msr;
+   __u32 dsisr;
 };
 
 #ifdef __KERNEL__
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 38cca77..72917f8 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -595,15 +595,16 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
if (page_found == -ENOENT) {
/* Page not found in guest PTE entries */
vcpu->arch.dear = kvmppc_get_fault_dar(vcpu);
-   to_book3s(vcpu)->dsisr = to_svcpu(vcpu)->fault_dsisr;
+   vcpu->arch.shared->dsisr = to_svcpu(vcpu)->fault_dsisr;
vcpu->arch.shared->msr |=
(to_svcpu(vcpu)->shadow_srr1 & 0xf800ULL);
kvmppc_book3s_queue_irqprio(vcpu, vec);
} else if (page_found == -EPERM) {
/* Storage protection */
vcpu->arch.dear = kvmppc_get_fault_dar(vcpu);
-   to_book3s(vcpu)->dsisr = to_svcpu(vcpu)->fault_dsisr & 
~DSISR_NOHPTE;
-   to_book3s(vcpu)->dsisr |= DSISR_PROTFAULT;
+   vcpu->arch.shared->dsisr =
+   to_svcpu(vcpu)->fault_dsisr & ~DSISR_NOHPTE;
+   vcpu->arch.shared->dsisr |= DSISR_PROTFAULT;
vcpu->arch.shared->msr |=
(to_svcpu(vcpu)->shadow_srr1 & 0xf800ULL);
kvmppc_book3s_queue_irqprio(vcpu, vec);
@@ -867,7 +868,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,
r = kvmppc_handle_pagefault(run, vcpu, dar, exit_nr);
} else {
vcpu->arch.dear = dar;
-   to_book3s(vcpu)->dsisr = to_svcpu(vcpu)->fault_dsisr;
+   vcpu->arch.shared->dsisr = to_svcpu(vcpu)->fault_dsisr;
kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFUL);
r = RESUME_GUEST;
@@ -994,7 +995,7 @@ program_interrupt:
}
case BOOK3S_INTERRUPT_ALIGNMENT:
if (kvmppc_read_inst(vcpu) == EMULATE_DONE) {
-   to_book3s(vcpu)->dsisr = kvmppc_alignment_dsisr(vcpu,
+   vcpu->arch.shared->dsisr = kvmppc_alignment_dsisr(vcpu,
kvmppc_get_last_inst(vcpu));
vcpu->arch.dear = kvmppc_alignment_dar(vcpu,
kvmppc_get_last_inst(vcpu));
diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index 35d3c16..9982ff1 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -221,7 +221,7 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
else if (r == -EPERM)
dsisr |= DSISR_PROTFAULT;
 
-   to_book3s(vcpu)->dsisr = dsisr;
+   vcpu->arch.shared->dsisr = dsisr;
to_svcpu(vcpu)->fault_dsisr = dsisr;
 
kvmppc_book3s_queue_irqprio(vcpu,
@@ -327,7 +327,7 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int 
sprn, int rs)
to_book3s(vcpu)->sdr1 = spr_val;
break;
case SPRN_DSISR:
-   to_book3s(vcpu)->dsisr = spr_val;
+   vcpu->arch.shared->dsisr = spr_val;
break;
case SPRN_DAR:
vcpu->arch.dear = spr_val;
@@ -440,7 +440,7 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int 
sprn, int rt)
kvmppc_set_gpr(vcpu, rt, to_book3s(v

[PATCH 06/27] KVM: PPC: Convert SPRG[0-4] to shared page

2010-07-01 Thread Alexander Graf
When in kernel mode, there are four additional registers (SPRG0-3) available
that serve as simple data storage. Instead of exiting to the hypervisor to
read and write them, we can simply share them with the guest via the shared
page.

This patch converts all users of the current field to the shared page.
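
A minimal sketch of what the host-side mfspr emulation for SPRG0 looks like
once the register is backed by the shared page (illustrative only, not a
verbatim excerpt of this patch; kvmppc_set_gpr() is the existing accessor):

/* Sketch: SPRG0 now lives in the shared page instead of vcpu->arch.sprg0. */
static void example_emulate_mfsprg0(struct kvm_vcpu *vcpu, int rt)
{
	kvmppc_set_gpr(vcpu, rt, vcpu->arch.shared->sprg0);
}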

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h |4 
 arch/powerpc/include/asm/kvm_para.h |4 
 arch/powerpc/kvm/book3s.c   |   16 
 arch/powerpc/kvm/booke.c|   16 
 arch/powerpc/kvm/emulate.c  |   24 
 5 files changed, 36 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 227f770..5674300 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -216,10 +216,6 @@ struct kvm_vcpu_arch {
ulong guest_owned_ext;
 #endif
u32 mmucr;
-   ulong sprg0;
-   ulong sprg1;
-   ulong sprg2;
-   ulong sprg3;
ulong sprg4;
ulong sprg5;
ulong sprg6;
diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index d7fc6c2..e402999 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -23,6 +23,10 @@
 #include 
 
 struct kvm_vcpu_arch_shared {
+   __u64 sprg0;
+   __u64 sprg1;
+   __u64 sprg2;
+   __u64 sprg3;
__u64 srr0;
__u64 srr1;
__u64 dar;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 7cc3da6..0a56e8d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1062,10 +1062,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
regs->srr0 = vcpu->arch.shared->srr0;
regs->srr1 = vcpu->arch.shared->srr1;
regs->pid = vcpu->arch.pid;
-   regs->sprg0 = vcpu->arch.sprg0;
-   regs->sprg1 = vcpu->arch.sprg1;
-   regs->sprg2 = vcpu->arch.sprg2;
-   regs->sprg3 = vcpu->arch.sprg3;
+   regs->sprg0 = vcpu->arch.shared->sprg0;
+   regs->sprg1 = vcpu->arch.shared->sprg1;
+   regs->sprg2 = vcpu->arch.shared->sprg2;
+   regs->sprg3 = vcpu->arch.shared->sprg3;
regs->sprg5 = vcpu->arch.sprg4;
regs->sprg6 = vcpu->arch.sprg5;
regs->sprg7 = vcpu->arch.sprg6;
@@ -1088,10 +1088,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
kvmppc_set_msr(vcpu, regs->msr);
vcpu->arch.shared->srr0 = regs->srr0;
vcpu->arch.shared->srr1 = regs->srr1;
-   vcpu->arch.sprg0 = regs->sprg0;
-   vcpu->arch.sprg1 = regs->sprg1;
-   vcpu->arch.sprg2 = regs->sprg2;
-   vcpu->arch.sprg3 = regs->sprg3;
+   vcpu->arch.shared->sprg0 = regs->sprg0;
+   vcpu->arch.shared->sprg1 = regs->sprg1;
+   vcpu->arch.shared->sprg2 = regs->sprg2;
+   vcpu->arch.shared->sprg3 = regs->sprg3;
vcpu->arch.sprg5 = regs->sprg4;
vcpu->arch.sprg6 = regs->sprg5;
vcpu->arch.sprg7 = regs->sprg6;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 8b546fe..984c461 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -495,10 +495,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
regs->srr0 = vcpu->arch.shared->srr0;
regs->srr1 = vcpu->arch.shared->srr1;
regs->pid = vcpu->arch.pid;
-   regs->sprg0 = vcpu->arch.sprg0;
-   regs->sprg1 = vcpu->arch.sprg1;
-   regs->sprg2 = vcpu->arch.sprg2;
-   regs->sprg3 = vcpu->arch.sprg3;
+   regs->sprg0 = vcpu->arch.shared->sprg0;
+   regs->sprg1 = vcpu->arch.shared->sprg1;
+   regs->sprg2 = vcpu->arch.shared->sprg2;
+   regs->sprg3 = vcpu->arch.shared->sprg3;
regs->sprg5 = vcpu->arch.sprg4;
regs->sprg6 = vcpu->arch.sprg5;
regs->sprg7 = vcpu->arch.sprg6;
@@ -521,10 +521,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
kvmppc_set_msr(vcpu, regs->msr);
vcpu->arch.shared->srr0 = regs->srr0;
vcpu->arch.shared->srr1 = regs->srr1;
-   vcpu->arch.sprg0 = regs->sprg0;
-   vcpu->arch.sprg1 = regs->sprg1;
-   vcpu->arch.sprg2 = regs->sprg2;
-   vcpu->arch.sprg3 = regs->sprg3;
+   vcpu->arch.shared->sprg0 = regs->sprg0;
+   vcpu->arch.shared->sprg1 = regs->sprg1;
+   vcpu->arch.shared->sprg2 = regs->sprg2;
+   vcpu->arch.shared->sprg3 = regs->sprg3;
vcpu->arch.sprg5 = regs->sprg4;
vcpu->arch.sprg6 = regs->sprg5;
vcpu->arch.sprg7 = regs->sprg6;
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index ad0fa4f..454869b 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -263,13 +263,17 @@ int kvmppc_emulate_instruction(struct kvm_run *run, 
struct kvm_vcpu *vcpu)
kvmppc_set_gpr(vcpu, rt, get_tb()); break;
 
 

[PATCH 01/27] KVM: PPC: Introduce shared page

2010-07-01 Thread Alexander Graf
For transparent variable sharing between the hypervisor and guest, I introduce
a shared page. This shared page will contain all the registers the guest can
read and write safely without exiting guest context.

This patch only implements the stubs required for the basic structure of the
shared page. The actual register conversions follow in later patches; a brief
sketch of the resulting access pattern is included below.
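
A minimal sketch of the access pattern the conversion patches establish (the
dsisr field used here is added in patch 03 of this series; the function name
is made up for illustration):

/* Sketch only: host code records fault state in the shared page, where the
 * guest can later read it without exiting guest context. */
static void example_record_fault(struct kvm_vcpu *vcpu, u32 fault_dsisr)
{
	/* previously: to_book3s(vcpu)->dsisr = fault_dsisr; */
	vcpu->arch.shared->dsisr = fault_dsisr;
}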

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h |2 ++
 arch/powerpc/include/asm/kvm_para.h |5 +
 arch/powerpc/kernel/asm-offsets.c   |1 +
 arch/powerpc/kvm/44x.c  |7 +++
 arch/powerpc/kvm/book3s.c   |7 +++
 arch/powerpc/kvm/e500.c |7 +++
 6 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index e004eaf..246a3dd 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define KVM_MAX_VCPUS 1
@@ -289,6 +290,7 @@ struct kvm_vcpu_arch {
struct tasklet_struct tasklet;
u64 dec_jiffies;
unsigned long pending_exceptions;
+   struct kvm_vcpu_arch_shared *shared;
 
 #ifdef CONFIG_PPC_BOOK3S
struct hlist_head hpte_hash_pte[HPTEG_HASH_NUM_PTE];
diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index 2d48f6a..1485ba8 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -20,6 +20,11 @@
 #ifndef __POWERPC_KVM_PARA_H__
 #define __POWERPC_KVM_PARA_H__
 
+#include 
+
+struct kvm_vcpu_arch_shared {
+};
+
 #ifdef __KERNEL__
 
 static inline int kvm_para_available(void)
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 496cc5b..944f593 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -400,6 +400,7 @@ int main(void)
DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6));
DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7));
DEFINE(VCPU_SHADOW_PID, offsetof(struct kvm_vcpu, arch.shadow_pid));
+   DEFINE(VCPU_SHARED, offsetof(struct kvm_vcpu, arch.shared));
 
/* book3s */
 #ifdef CONFIG_PPC_BOOK3S
diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c
index 73c0a3f..e7b1f3f 100644
--- a/arch/powerpc/kvm/44x.c
+++ b/arch/powerpc/kvm/44x.c
@@ -123,8 +123,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, 
unsigned int id)
if (err)
goto free_vcpu;
 
+   vcpu->arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO);
+   if (!vcpu->arch.shared)
+   goto uninit_vcpu;
+
return vcpu;
 
+uninit_vcpu:
+   kvm_vcpu_uninit(vcpu);
 free_vcpu:
kmem_cache_free(kvm_vcpu_cache, vcpu_44x);
 out:
@@ -135,6 +141,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
 {
struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
 
+   free_page((unsigned long)vcpu->arch.shared);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, vcpu_44x);
 }
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 30c0bd5..2c2c3ca 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1247,6 +1247,10 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm 
*kvm, unsigned int id)
if (err)
goto free_shadow_vcpu;
 
+   vcpu->arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO);
+   if (!vcpu->arch.shared)
+   goto uninit_vcpu;
+
vcpu->arch.host_retip = kvm_return_point;
vcpu->arch.host_msr = mfmsr();
 #ifdef CONFIG_PPC_BOOK3S_64
@@ -1277,6 +1281,8 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, 
unsigned int id)
 
return vcpu;
 
+uninit_vcpu:
+   kvm_vcpu_uninit(vcpu);
 free_shadow_vcpu:
kfree(vcpu_book3s->shadow_vcpu);
 free_vcpu:
@@ -1289,6 +1295,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
 {
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
 
+   free_page((unsigned long)vcpu->arch.shared);
kvm_vcpu_uninit(vcpu);
kfree(vcpu_book3s->shadow_vcpu);
vfree(vcpu_book3s);
diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
index e8a00b0..71750f2 100644
--- a/arch/powerpc/kvm/e500.c
+++ b/arch/powerpc/kvm/e500.c
@@ -117,8 +117,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, 
unsigned int id)
if (err)
goto uninit_vcpu;
 
+   vcpu->arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO);
+   if (!vcpu->arch.shared)
+   goto uninit_tlb;
+
return vcpu;
 
+uninit_tlb:
+   kvmppc_e500_tlb_uninit(vcpu_e500);
 uninit_vcpu:
kvm_vcpu_uninit(vcpu);
 free_vcpu:
@@ -131,6 +137,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
 {
struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
 
+   free_page((unsigned long)vcpu->arch.shared);

[PATCH 00/27] KVM PPC PV framework

2010-07-01 Thread Alexander Graf
On PPC we run PR=0 (kernel mode) code in PR=1 (user mode) and don't use the
hypervisor extensions.

While that is all great for showing that virtualization is possible, there are
quite a few cases where the emulation overhead of privileged instructions kills
performance.

This patchset tackles exactly that issue. It introduces a paravirtual framework
through which KVM and the Linux guest share a page to exchange register state.
That way we don't have to switch to the hypervisor just to change the value of
a privileged register.

To prove my point, I ran the same test I did for the MMU optimizations against
the PV framework. Here are the results:

[without]

debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real    0m14.659s
user    0m8.967s
sys     0m5.688s

[with]

debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real    0m7.557s
user    0m4.121s
sys     0m3.426s


So this is a significant performance improvement! I'm quite happy with how fast
this whole thing becomes :)

I tried to take into account all the comments I've heard from people so far
about such a PV framework. If you told me before that something is a no-go and
I still did it, please just tell me again.

Now go and have fun with fast VMs on PPC! Get yourself a G5 on ebay and start
experiencing the power yourself. - heh

v1 -> v2:

  - change hypervisor calls to use r0 and r3
  - make crit detection only trigger in supervisor mode
  - RMO -> PAM
  - introduce kvm_patch_ins
  - only flush icache when patching
  - introduce kvm_patch_ins_b
  - update documentation

Alexander Graf (27):
  KVM: PPC: Introduce shared page
  KVM: PPC: Convert MSR to shared page
  KVM: PPC: Convert DSISR to shared page
  KVM: PPC: Convert DAR to shared page.
  KVM: PPC: Convert SRR0 and SRR1 to shared page
  KVM: PPC: Convert SPRG[0-4] to shared page
  KVM: PPC: Implement hypervisor interface
  KVM: PPC: Add PV guest critical sections
  KVM: PPC: Add PV guest scratch registers
  KVM: PPC: Tell guest about pending interrupts
  KVM: PPC: Make RMO a define
  KVM: PPC: First magic page steps
  KVM: PPC: Magic Page Book3s support
  KVM: PPC: Magic Page BookE support
  KVM: PPC: Expose magic page support to guest
  KVM: Move kvm_guest_init out of generic code
  KVM: PPC: Generic KVM PV guest support
  KVM: PPC: KVM PV guest stubs
  KVM: PPC: PV instructions to loads and stores
  KVM: PPC: PV tlbsync to nop
  KVM: PPC: Introduce kvm_tmp framework
  KVM: PPC: Introduce branch patching helper
  KVM: PPC: PV assembler helpers
  KVM: PPC: PV mtmsrd L=1
  KVM: PPC: PV mtmsrd L=0 and mtmsr
  KVM: PPC: PV wrteei
  KVM: PPC: Add Documentation about PV interface

 Documentation/kvm/ppc-pv.txt |  185 ++
 arch/powerpc/include/asm/kvm_book3s.h|1 -
 arch/powerpc/include/asm/kvm_host.h  |   15 +-
 arch/powerpc/include/asm/kvm_para.h  |  121 +-
 arch/powerpc/include/asm/kvm_ppc.h   |1 +
 arch/powerpc/kernel/Makefile |2 +
 arch/powerpc/kernel/asm-offsets.c|   18 ++-
 arch/powerpc/kernel/kvm.c|  408 ++
 arch/powerpc/kernel/kvm_emul.S   |  237 +
 arch/powerpc/kvm/44x.c   |7 +
 arch/powerpc/kvm/44x_tlb.c   |8 +-
 arch/powerpc/kvm/book3s.c|  165 -
 arch/powerpc/kvm/book3s_32_mmu.c |   28 ++-
 arch/powerpc/kvm/book3s_32_mmu_host.c|   16 +-
 arch/powerpc/kvm/book3s_64_mmu.c |   42 +++-
 arch/powerpc/kvm/book3s_64_mmu_host.c|   16 +-
 arch/powerpc/kvm/book3s_emulate.c|   25 +-
 arch/powerpc/kvm/book3s_paired_singles.c |   11 +-
 arch/powerpc/kvm/booke.c |  113 +++--
 arch/powerpc/kvm/booke.h |6 +-
 arch/powerpc/kvm/booke_emulate.c |   14 +-
 arch/powerpc/kvm/booke_interrupts.S  |3 +-
 arch/powerpc/kvm/e500.c  |7 +
 arch/powerpc/kvm/e500_tlb.c  |   31 ++-
 arch/powerpc/kvm/e500_tlb.h  |2 +-
 arch/powerpc/kvm/emulate.c   |   47 +++-
 arch/powerpc/kvm/powerpc.c   |   42 +++-
 arch/powerpc/platforms/Kconfig   |   10 +
 arch/x86/include/asm/kvm_para.h  |6 +
 include/linux/kvm_para.h |7 +-
 30 files changed, 1420 insertions(+), 174 deletions(-)
 create mode 100644 Documentation/kvm/ppc-pv.txt
 create mode 100644 arch/powerpc/kernel/kvm.c
 create mode 100644 arch/powerpc/kernel/kvm_emul.S



Re: [PATCH 0/2] Faster MMU lookups for Book3s v3

2010-07-01 Thread Alexander Graf
Avi Kivity wrote:
> On 07/01/2010 11:18 AM, Alexander Graf wrote:
>>
>> How does dirty bitmap flushing work on x86 atm? I loop through all
>> mapped pages and flush the ones that match the range of the region I
>> need to flush. But wouldn't it be a lot more efficient to have an
>> hlist in the memslot and loop through that when I need to flush that
>> memslot?
>>
>
> x86 loops through the reverse-map link list rooted at the memory
> slot.  The linked list links all sptes for a single hva.
>
> So, it's like you describe, except it's an array of lists instead of a
> single list.  We need per-page rmap lists to be able to remove a
> page's sptes in response to an mmu notifier callback, and to be able
> to write protect a guest page if it's used as a page table.
>

But doesn't that mean that you still need to loop through all the hvas
that you want to invalidate? Wouldn't it speed up dirty bitmap flushing
a lot if we just had a simple linked list of all sPTEs belonging to
that memslot?

Alex



Re: [PATCH 1/9] Add Synopsys DesignWare HS USB OTG Controller driver.

2010-07-01 Thread Stefan Roese
Fushen,

On Wednesday 30 June 2010 22:16:52 fushen chen wrote:
> The driver is based on Synopsys driver 2.60a.

OK.
 
> We started to prepare open source submission based on our internal
> version. We sync this version to linux-2.6-denx repository from time to
> time. I'll sync the driver to the latest linux-2.6-denx as Wolfgang
> pointed out, and re-submit patch to open source.

Thanks. I really appreciate this. A lot of effort has gone into this driver to
fix some very troubling issues, and this driver version has undergone very
intensive testing. So please make sure to integrate those changes/fixes. And
please add me and Chuck Meade, who did most of the bigger changes, to Cc on
your new patch versions.
 
Thanks.

Cheers,
Stefan

--
DENX Software Engineering GmbH,  MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich,  Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: off...@denx.de


Re: [PATCH 0/2] Faster MMU lookups for Book3s v3

2010-07-01 Thread Avi Kivity

On 07/01/2010 11:18 AM, Alexander Graf wrote:


> How does dirty bitmap flushing work on x86 atm? I loop through all mapped
> pages and flush the ones that match the range of the region I need to flush.
> But wouldn't it be a lot more efficient to have an hlist in the memslot and
> loop through that when I need to flush that memslot?


x86 loops through the reverse-map link list rooted at the memory slot.  
The linked list links all sptes for a single hva.


So, it's like you describe, except it's an array of lists instead of a 
single list.  We need per-page rmap lists to be able to remove a page's 
sptes in response to an mmu notifier callback, and to be able to write 
protect a guest page if it's used as a page table.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/2] Faster MMU lookups for Book3s v3

2010-07-01 Thread Alexander Graf

On 01.07.2010, at 09:29, Avi Kivity wrote:

> On 06/30/2010 04:18 PM, Alexander Graf wrote:
>> Book3s suffered from my really bad shadow MMU implementation so far. So
>> I finally got around to implement a combined hash and list mechanism that
>> allows for much faster lookup of mapped pages.
>> 
>> To show that it really is faster, I tried to run simple process spawning
>> code inside the guest with and without these patches:
>> 
>> [without]
>> 
>> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello>  /dev/null; 
>> done
>> 
>> real    0m20.235s
>> user    0m10.418s
>> sys     0m9.766s
>> 
>> [with]
>> 
>> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello>  /dev/null; 
>> done
>> 
>> real    0m14.659s
>> user    0m8.967s
>> sys     0m5.688s
>> 
>> So as you can see, performance improved significantly.
>> 
>> v2 ->  v3:
>> 
>>   - use hlist
>>   - use global slab cache
>> 
>>   
> 
> Looks good.

Great :).

How does dirty bitmap flushing work on x86 atm? I loop through all mapped pages 
and flush the ones that match the range of the region I need to flush. But 
wouldn't it be a lot more efficient to have an hlist in the memslot and loop 
through that when I need to flush that memslot?

Alex



Re: machine check in kernel for a mpc870 board

2010-07-01 Thread Shawn Jin
Hi Scott,

> How do I find the address, reg, and range for nodes like localbus,
> soc, eth0, cpm, serial etc.? Do the addresses of localbus and soc
> relate to IMMR? So my localbus and soc should be as follows?
>
>        localbus@fa200100 {
>                compatible = "fsl,mpc885-localbus", "fsl,pq1-localbus",
>                             "simple-bus";
>                #address-cells = <2>;
>                #size-cells = <1>;
>                reg = <0xfa200100 0x40>;
>
>                ranges = <
>                        0 0 0xfe00 0x0100    // I'm not sure about 
> this?
>                >;
>        };

I managed to proceed a little bit further.
Memory <- <0x0 0x800> (128MB)
ENET0: local-mac-address <- 00:09:9b:01:58:64
CPU clock-frequency <- 0x7270e00 (120MHz)
CPU timebase-frequency <- 0x393870 (4MHz)
CPU bus-frequency <- 0x3938700 (60MHz)

zImage starting: loaded at 0x0040 (sp: 0x07d1ccd0)
Allocating 0x186bdd bytes for kernel ...
gunzipping (0x <- 0x0040c000:0x00591c30)...done 0x173b18 bytes

Linux/PowerPC load: root=/dev/ram
Finalizing device tree... flat tree at 0x59e300

GDB showed 0xdeadbeef:
(gdb) target remote ppcbdi:2001
Remote debugging using ppcbdi:2001
0xdeadbeef in ?? ()
(gdb)

The kernel doesn't seem to start. What could be going wrong here?

Thanks a lot,
-Shawn.


Re: [PATCH 0/2] Faster MMU lookups for Book3s v3

2010-07-01 Thread Avi Kivity

On 06/30/2010 04:18 PM, Alexander Graf wrote:

> Book3s suffered from my really bad shadow MMU implementation so far. So
> I finally got around to implement a combined hash and list mechanism that
> allows for much faster lookup of mapped pages.
>
> To show that it really is faster, I tried to run simple process spawning
> code inside the guest with and without these patches:
>
> [without]
>
> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done
>
> real    0m20.235s
> user    0m10.418s
> sys     0m9.766s
>
> [with]
>
> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done
>
> real    0m14.659s
> user    0m8.967s
> sys     0m5.688s
>
> So as you can see, performance improved significantly.
>
> v2 -> v3:
>
>   - use hlist
>   - use global slab cache


Looks good.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
