RE: CONFIG_NO_HZ causing poor console responsiveness
> -Original Message-
> From: linuxppc-dev-bounces+leoli=freescale@lists.ozlabs.org
> [mailto:linuxppc-dev-bounces+leoli=freescale@lists.ozlabs.org]
> On Behalf Of Benjamin Herrenschmidt
> Sent: Friday, July 02, 2010 1:47 PM
> To: Tabi Timur-B04825
> Cc: Linuxppc-dev Development
> Subject: Re: CONFIG_NO_HZ causing poor console responsiveness
>
> On Tue, 2010-06-29 at 14:54 -0500, Timur Tabi wrote:
>> I'm adding support for a new e500-based board (the P1022DS), and in
>> the process I've discovered that enabling CONFIG_NO_HZ (Tickless
>> System / Dynamic Ticks) causes significant responsiveness problems on
>> the serial console. When I type on the console, I see delays of up to
>> a half-second for almost every character. It acts as if there's a
>> background process eating all the CPU.
>>
>> I don't have time to debug this thoroughly at the moment. The problem
>> occurs in the latest kernel, but it appears not to occur in 2.6.32.
>>
>> Has anyone else seen anything like this?
>
> I noticed that on the bimini with 2.6.35-rc* though I didn't get to track
> it down yet.

The patch found at the following location fixed this problem:

http://www.spinics.net/lists/linux-tip-commits/msg08279.html

Hope it has already been merged.

- Leo

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: CONFIG_NO_HZ causing poor console responsiveness
On Tue, 2010-06-29 at 14:54 -0500, Timur Tabi wrote:
> I'm adding support for a new e500-based board (the P1022DS), and in
> the process I've discovered that enabling CONFIG_NO_HZ (Tickless
> System / Dynamic Ticks) causes significant responsiveness problems on
> the serial console. When I type on the console, I see delays of up to
> a half-second for almost every character. It acts as if there's a
> background process eating all the CPU.
>
> I don't have time to debug this thoroughly at the moment. The problem
> occurs in the latest kernel, but it appears not to occur in 2.6.32.
>
> Has anyone else seen anything like this?

I noticed that on the bimini with 2.6.35-rc* though I didn't get to track
it down yet.

Cheers,
Ben.
Re: CONFIG_NO_HZ causing poor console responsiveness
On Jul 1, 2010, at 10:46 PM, "Mike Galbraith" wrote:
>
> Hi Timur,
>
> This has already been fixed. Below is the final fix from tip.

Thanks, Mike. I thought I was using the latest code, but I guess not.
Re: CONFIG_NO_HZ causing poor console responsiveness
On Thu, 2010-07-01 at 16:55 -0500, Timur Tabi wrote:
> On Tue, Jun 29, 2010 at 2:54 PM, Timur Tabi wrote:
> > I'm adding support for a new e500-based board (the P1022DS), and in
> > the process I've discovered that enabling CONFIG_NO_HZ (Tickless
> > System / Dynamic Ticks) causes significant responsiveness problems on
> > the serial console. When I type on the console, I see delays of up to
> > a half-second for almost every character. It acts as if there's a
> > background process eating all the CPU.
>
> I finally finished my git-bisect, and it wasn't that helpful. I had
> to skip several commits because the kernel just wouldn't boot:
>
> There are only 'skip'ped commits left to test.
> The first bad commit could be any of:
> 6bc6cf2b61336ed0c55a615eb4c0c8ed5daf3f08
> 8b911acdf08477c059d1c36c21113ab1696c612b
> 21406928afe43f1db6acab4931bb8c886f4d04ce
> 5ca9880c6f4ba4c84b517bc2fed5366adf63d191
> a64692a3afd85fe048551ab89142fd5ca99a0dbd
> f2e74eeac03ffb779d64b66a643c5e598145a28b
> c6ee36c423c3ed1fb86bb3eabba9fc256a300d16
> e12f31d3e5d36328c7fbd0fce40a95e70b59152c
> 13814d42e45dfbe845a0bbe5184565d9236896ae
> b42e0c41a422a212ddea0666d5a3a0e3c35206db
> 39c0cbe2150cbd848a25ba6cdb271d1ad46818ad <== the crime scene
> beac4c7e4a1cc6d57801f690e5e82fa2c9c245c8
> 41acab8851a0408c1d5ad6c21a07456f88b54d40
> 6427462bfa50f50dc6c088c07037264fcc73eca1
> c9494727cf293ae2ec66af57547a3e79c724fec2
> We cannot bisect more!
>
> These correspond to a batch of scheduler patches, most from Mike Galbraith.
>
> I don't know what to do now. I can't test any of these commits. Even
> if I could, they look like they're all part of one set, so I doubt I
> could narrow it down to one commit anyway.

Hi Timur,

This has already been fixed. Below is the final fix from tip.
commit 3310d4d38fbc514e7b18bd3b1eea8effdd63b5aa
Author: Peter Zijlstra
Date:   Thu Jun 17 18:02:37 2010 +0200

    nohz: Fix nohz ratelimit

    Chris Wedgwood reports that 39c0cbe (sched: Rate-limit nohz) causes a
    serial console regression, unresponsiveness, and indeed it does.

    The reason is that the nohz code is skipped even when the tick was
    already stopped before the nohz_ratelimit(cpu) condition changed.

    Move the nohz_ratelimit() check to the other conditions which prevent
    long idle sleeps.

    Reported-by: Chris Wedgwood
    Tested-by: Brian Bloniarz
    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Cc: Jiri Kosina
    Cc: Linus Torvalds
    Cc: Greg KH
    Cc: Alan Cox
    Cc: OGAWA Hirofumi
    Cc: Jef Driesen
    LKML-Reference: <1276790557.27822.516.ca...@twins>
    Signed-off-by: Thomas Gleixner

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 1d7b9bc..783fbad 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -315,9 +315,6 @@ void tick_nohz_stop_sched_tick(int inidle)
 		goto end;
 	}

-	if (nohz_ratelimit(cpu))
-		goto end;
-
 	ts->idle_calls++;
 	/* Read jiffies and the time when jiffies were updated last */
 	do {
@@ -328,7 +325,7 @@ void tick_nohz_stop_sched_tick(int inidle)
 	} while (read_seqretry(&xtime_lock, seq));

 	if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu) ||
-	    arch_needs_cpu(cpu)) {
+	    arch_needs_cpu(cpu) || nohz_ratelimit(cpu)) {
 		next_jiffies = last_jiffies + 1;
 		delta_jiffies = 1;
 	} else {
Re: [PATCH 0/2] Faster MMU lookups for Book3s v3
On Thu, 2010-07-01 at 16:42 +0300, Avi Kivity wrote:
>> So I think the only reasonable way to implement page ageing is to unmap
>> pages. And that's slow, because it means we have to map them again on
>> access. Bleks. Or we could look for the HTAB entry and only unmap them
>> if the entry is moot.
>
> I think it works out if you update struct page when you clear out an
> HTAB.

Hrm... going to struct page without going through the PTE might work out
indeed. We can get to the struct page from the RPN.

However, that means -reading- the hash entry we want to evict, and that's
a fairly expensive H-Call, especially if we ask phyp to back-translate the
real address into a logical (partition) address so we can get to the
struct page, while we might be able to reconstitute the virtual address
from the hash content + bucket address. However, from the vsid back to the
page table might be tricky as well.

IE. Either way, it's not a simple process. Now, eviction is rare, our MMU
hash is generally big, so maybe the read back with back translate to hit
struct page might be the way to go here.

As for other kinds of invalidations, we do have the PTE around when they
happen, so we can go fetch the HW ref bit and update the PTE I suppose.

Ben.
Re: [PATCH 0/2] Faster MMU lookups for Book3s v3
On Thu, 2010-07-01 at 14:52 +0200, Alexander Graf wrote:
> Page ageing is difficult. The HTAB has a hardware set referenced bit,
> but we don't have a guarantee that the entry is still there when we look
> for it. Something else could have overwritten it by then, but the entry
> could still be lingering around in the TLB.
>
> So I think the only reasonable way to implement page ageing is to unmap
> pages. And that's slow, because it means we have to map them again on
> access. Bleks. Or we could look for the HTAB entry and only unmap them
> if the entry is moot.

Well, not quite. We -could- use the HW reference bit. However, that means
that whenever we flush the hash PTE we get a snapshot of the HW bit and
copy it over to the PTE.

That's not -that- bad for normal invalidations. However, it's a problem
potentially for eviction. IE. When a hash bucket is full, we
pseudo-randomly evict a slot. If we were to use the HW ref bit, we would
need a way to go back to the PTE from the hash bucket to perform that
update (or something really tricky like sticking it in a list somewhere,
and have the young test walk that list when non-empty, etc...)

Cheers,
Ben.
Re: Oops while running fs_racer test on a POWER6 box against latest git
In message <20100701105907.gk22...@laptop> you wrote:
> On Thu, Jul 01, 2010 at 03:04:54PM +1000, Michael Neuling wrote:
> > > While running fs_racer test from LTP on a POWER6 box against latest
> > > git (2.6.35-rc3-git4 - commitid 984bc9601f64fd)
> > > came across the following warning followed by multiple oops.
> > >
> > > [ cut here ]
> > > Badness at kernel/mutex-debug.c:64
> > > NIP: c00be9e8 LR: c00be9cc CTR:
> > > REGS: c0010be8f6f0 TRAP: 0700 Not tainted (2.6.35-rc3-git4-autotest)
> > > MSR: 80029032 CR: 24224422 XER: 0012
> > > TASK = c0010727cf00[8211] 'fs_racer_file_c' THREAD: c0010be8bb50 CPU: 2
> > > GPR00: c0010be8f970 c0d3d798 0001
> > > GPR04: c0010be8fa70 c0010be8c000 c0010727d9f8 0000
> > > GPR08: c43042f0 c16534e8 017a c0c29a1c
> > > GPR12: 28228424 cf600500 c0010be8fc40 2000
> > > GPR16: f000 c00109c73000 c0010be8fc30 00010442
> > > GPR20: 01b6 c0010dd12250
> > > GPR24: c017c08c c0010727cf00 c0010dd12278 c0010dd12210
> > > GPR28: 0001 c0010be8c000 c0ca2008 c0010be8fa70
> > > NIP [c00be9e8] .mutex_remove_waiter+0xa4/0x130
> > > LR [c00be9cc] .mutex_remove_waiter+0x88/0x130
> > > Call Trace:
> > > [c0010be8f970] [c0010be8fa00] 0xc0010be8fa00 (unreliable)
> > > [c0010be8fa00] [c064a9f0] .mutex_lock_nested+0x384/0x430
> > > Instruction dump:
> > > e81f0010 e93d 7fa04800 41fe0028 482e96e5 6000 2fa3 419e0018
> > > e93e8008 8009 2f80 409e0008 <0fe0> e93e8000 8009 2f80
> > > Unable to handle kernel paging request for unknown fault
> > > Faulting instruction address: 0xc008d0f4
> > > Oops: Kernel access of bad area, sig: 7 [#1]
> > > SMP NR_CPUS=1024 NUMA
> > > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > > pSeries
> > > last sysfs file: /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
> > > Modules linked in: ipv6 fuse loop dm_mod sr_mod cdrom ibmveth sg
> > > sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
> > > NIP: c008d0f4 LR: c008d0d0 CTR:
> > > REGS: c0010978f900 TRAP: 0600 Tainted: G W (2.6.35-rc3-git4-autotest)
> > > MSR: 80009032
> > > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c0648ed4
> > > EE,ME,IR,DR> CR: 24022442 XER: 0012
> > > DAR: c0648f54, DSISR: 4001
> > > TASK = c001096e4900[7353] 'fs_racer_file_s' THREAD: c0010978c000 CPU: 10
> > > GPR00: 4000 c0010978fb80 c0d3d798 0001
> > > GPR04: c083539e c1610228 c54c6880
> > > GPR08: 06a5 c0648f54 0007 049b0000
> > > GPR12: cf601900 ffff
> > > GPR16: 4b7dc520 c0010978fea0
> > > GPR20: 0fffcca7e7a0 0fffcca7e7a0 0fffabf7dfd0 0fffabf7dfd0
> > > GPR24: 01200011 c0e1c0a8 c0648ed4
> > > GPR28: c001096e4900 c0ca0458 c0010725d400
> > > NIP [c008d0f4] .copy_process+0x310/0xf40
> > > LR [c008d0d0] .copy_process+0x2ec/0xf40
> > > Call Trace:
> > > [c0010978fb80] [c008d0d0] .copy_process+0x2ec/0xf40 (unreliable)
> > > [c0010978fc80] [c008deb4] .do_fork+0x190/0x3cc
> > > [c0010978fdc0] [c0011ef4] .sys_clone+0x58/0x70
> > > [c0010978fe30] [c00087f0] .ppc_clone+0x8/0xc
> > > Instruction dump:
> > > 419e0010 7fe3fb78 480774cd 6000 801f0014 e93f0008 7800b842 39290080
> > > 78004800 6042 901f0014 38004000 <7d6048a8> 7d6b0078 7d6049ad 40c2fff4
> > >
> > > Kernel version 2.6.34-rc3-git3 works fine.
> >
> > Should this read 2.6.35-rc3-git3?
> >
> > If so, there's only about 20 commits in:
> > 5904b3b81d2516..984bc9601f64fd
> >
> > The likely fs related candidates are from Christoph and Nick Piggin
> > (added to CC)
> >
> > No commits relating to POWER6 or PPC.
>
> Not sure what's happening here. The first warning looks like some mutex
> corruption, but it doesn't have a stack trace (these are 2 separate
> dumps, right? ie. the copy_process stack doesn't relate to the mutex
> warning?) So I don't have much idea.
>
> If it is reproducible, can you try getting a better stack trace, or
> better
Re: CONFIG_NO_HZ causing poor console responsiveness
On Tue, Jun 29, 2010 at 2:54 PM, Timur Tabi wrote:
> I'm adding support for a new e500-based board (the P1022DS), and in
> the process I've discovered that enabling CONFIG_NO_HZ (Tickless
> System / Dynamic Ticks) causes significant responsiveness problems on
> the serial console. When I type on the console, I see delays of up to
> a half-second for almost every character. It acts as if there's a
> background process eating all the CPU.

I finally finished my git-bisect, and it wasn't that helpful. I had
to skip several commits because the kernel just wouldn't boot:

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
6bc6cf2b61336ed0c55a615eb4c0c8ed5daf3f08
8b911acdf08477c059d1c36c21113ab1696c612b
21406928afe43f1db6acab4931bb8c886f4d04ce
5ca9880c6f4ba4c84b517bc2fed5366adf63d191
a64692a3afd85fe048551ab89142fd5ca99a0dbd
f2e74eeac03ffb779d64b66a643c5e598145a28b
c6ee36c423c3ed1fb86bb3eabba9fc256a300d16
e12f31d3e5d36328c7fbd0fce40a95e70b59152c
13814d42e45dfbe845a0bbe5184565d9236896ae
b42e0c41a422a212ddea0666d5a3a0e3c35206db
39c0cbe2150cbd848a25ba6cdb271d1ad46818ad
beac4c7e4a1cc6d57801f690e5e82fa2c9c245c8
41acab8851a0408c1d5ad6c21a07456f88b54d40
6427462bfa50f50dc6c088c07037264fcc73eca1
c9494727cf293ae2ec66af57547a3e79c724fec2
We cannot bisect more!

These correspond to a batch of scheduler patches, most from Mike Galbraith.

I don't know what to do now. I can't test any of these commits. Even
if I could, they look like they're all part of one set, so I doubt I
could narrow it down to one commit anyway.

--
Timur Tabi
Linux kernel developer at Freescale
Re: machine check in kernel for a mpc870 board
On 07/01/2010 03:17 PM, Shawn Jin wrote:
>>>> How do I find the address, reg, and range for nodes like localbus,
>>>> soc, eth0, cpm, serial etc.?
>>
>> If your CCSRBAR is 0xfa20, then pretty much anywhere you see 0xff0x
>> change it to 0xfa2x.
>
> I'm not sure about the range settings of 0xfe00. How do you get this?
>
> local...@fa200100 {
>         compatible = "fsl,mpc885-localbus", "fsl,pq1-localbus", "simple-bus";
>         #address-cells = <2>;
>         #size-cells = <1>;
>         reg = <0xfa200100 0x40>;
>
>         ranges = <
>                 0 0 0xfe00 0x0100 // I'm not sure about this?
>         >;
> };

Change 0xfe00 to wherever u-boot maps your flash, and 0x0100 to whatever
the size of the flash localbus mapping is. Or more generally, update this
section to hold whatever is connected to the localbus on your board. The
first cell is the chipselect.

>> Make sure that you've got Linux platform code enabled that matches the
>> top-level compatible of your device tree. Try enabling
>> PPC_EARLY_DEBUG_CPM, making sure to update PPC_EARLY_DEBUG_CPM_ADDR to
>> 0xfa202008.
>
> I enabled this early debug feature but don't know this address change.

The address change is for the different IMMR base; only this use is too
early/hacky to get it from the device tree.

-Scott
Re: machine check in kernel for a mpc870 board
>>> How do I find the address, reg, and range for nodes like localbus,
>>> soc, eth0, cpm, serial etc.?
>
> If your CCSRBAR is 0xfa20, then pretty much anywhere you see 0xff0x
> change it to 0xfa2x.

I'm not sure about the range settings of 0xfe00. How do you get this?

local...@fa200100 {
        compatible = "fsl,mpc885-localbus", "fsl,pq1-localbus", "simple-bus";
        #address-cells = <2>;
        #size-cells = <1>;
        reg = <0xfa200100 0x40>;

        ranges = <
                0 0 0xfe00 0x0100 // I'm not sure about this?
        >;
};

>> Linux/PowerPC load: root=/dev/ram
>> Finalizing device tree... flat tree at 0x59e300
>>
>> The gdb showed deadbeef.
>> (gdb) target remote ppcbdi:2001
>> Remote debugging using ppcbdi:2001
>> 0xdeadbeef in ?? ()
>> (gdb)
>>
>> The kernel doesn't seem to start. What could go wrong here?
>
> Pretty much anything. :-)

I realized that. :-P The kernel booting was able to stop at
start_kernel(). I'm going to trace further.

> Make sure that you've got Linux platform code enabled that matches the
> top-level compatible of your device tree. Try enabling PPC_EARLY_DEBUG_CPM,
> making sure to update PPC_EARLY_DEBUG_CPM_ADDR to 0xfa202008.

I enabled this early debug feature but don't know this address change.
I'll try it later.

Thanks a lot, Scott.

-Shawn.
Re: Oops while running fs_racer test on a POWER6 box against latest git
On Wednesday, 30 June 2010, divya wrote:
> While running fs_racer test from LTP on a POWER6 box against latest
> git (2.6.35-rc3-git4 - commitid 984bc9601f64fd) came across the following
> warning followed by multiple oops.

I created a Bugzilla entry at
https://bugzilla.kernel.org/show_bug.cgi?id=16324
for your bug report. Please add your address to the CC list in there,
thanks!

--
Maciej Rutecki
http://www.maciek.unixy.pl
RE: [PATCH v1]460EX on-chip SATA driver
Dear All,

The Synopsys DesignWare core is taskfile-oriented, so the driver would
still need CONFIG_ATA_SFF. I will fix the Kconfig file to make the driver
dependent on CONFIG_ATA_SFF.

Regards,
Rup

-Original Message-
From: Wolfgang Denk [mailto:w...@denx.de]
Sent: Thursday, July 01, 2010 4:25 AM
To: Josh Boyer
Cc: Jeff Garzik; linux-...@vger.kernel.org; s...@denx.de; Rupjyoti Sarmah;
linux-ker...@vger.kernel.org; linuxppc-...@ozlabs.org
Subject: Re: [PATCH v1]460EX on-chip SATA driver

Dear Josh Boyer,

In message <20100630200325.gd7...@zod.rchland.ibm.com> you wrote:
>
> The driver doesn't depend on CONFIG_ATA_SFF in its Kconfig file, but
> seems to require it at build time. Isn't that something that needs
> fixing in the driver?

Right. Next question is if this is really needed for this driver.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
Copy from one, it's plagiarism; copy from two, it's research.
Re: machine check in kernel for a mpc870 board
On 07/01/2010 02:50 AM, Shawn Jin wrote:
> Hi Scott,
>
>> How do I find the address, reg, and range for nodes like localbus,
>> soc, eth0, cpm, serial etc.?

If your CCSRBAR is 0xfa20, then pretty much anywhere you see 0xff0x
change it to 0xfa2x.

> I managed to proceed a little bit further.
>
> Memory <- <0x0 0x800> (128MB)
> ENET0: local-mac-address <- 00:09:9b:01:58:64
> CPU clock-frequency <- 0x7270e00 (120MHz)
> CPU timebase-frequency <- 0x393870 (4MHz)
> CPU bus-frequency <- 0x3938700 (60MHz)
> zImage starting: loaded at 0x0040 (sp: 0x07d1ccd0)
> Allocating 0x186bdd bytes for kernel ...
> gunzipping (0x <- 0x0040c000:0x00591c30)...done 0x173b18 bytes
>
> Linux/PowerPC load: root=/dev/ram
> Finalizing device tree... flat tree at 0x59e300
>
> The gdb showed deadbeef.
> (gdb) target remote ppcbdi:2001
> Remote debugging using ppcbdi:2001
> 0xdeadbeef in ?? ()
> (gdb)
>
> The kernel doesn't seem to start. What could go wrong here?

Pretty much anything. :-)

Make sure that you've got Linux platform code enabled that matches the
top-level compatible of your device tree. Try enabling PPC_EARLY_DEBUG_CPM,
making sure to update PPC_EARLY_DEBUG_CPM_ADDR to 0xfa202008.

-Scott
Re: [PATCH] KVM: PPC: Book3S_32 MMU debug compile fixes
Alexander Graf wrote:
> Due to previous changes, the Book3S_32 guest MMU code didn't compile
> properly when enabling debugging.
>
> This patch repairs the broken code paths, making it possible to define
> DEBUG_MMU and friends again.
>
> Signed-off-by: Alexander Graf

Please also don't forget this patch :)

Alex
Re: [PATCH 0/2] Faster MMU lookups for Book3s v3
On Wed, Jun 30, 2010 at 03:18:44PM +0200, Alexander Graf wrote:
> Book3s suffered from my really bad shadow MMU implementation so far. So
> I finally got around to implement a combined hash and list mechanism that
> allows for much faster lookup of mapped pages.
>
> To show that it really is faster, I tried to run simple process spawning
> code inside the guest with and without these patches:
>
> [without]
>
> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done
>
> real    0m20.235s
> user    0m10.418s
> sys     0m9.766s
>
> [with]
>
> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done
>
> real    0m14.659s
> user    0m8.967s
> sys     0m5.688s
>
> So as you can see, performance improved significantly.
>
> v2 -> v3:
>
> - use hlist
> - use global slab cache
>
> Alexander Graf (2):
>   KVM: PPC: Add generic hpte management functions
>   KVM: PPC: Make use of hash based Shadow MMU
>
>  arch/powerpc/include/asm/kvm_book3s.h |    9 +
>  arch/powerpc/include/asm/kvm_host.h   |   17 ++-
>  arch/powerpc/kvm/Makefile             |    2 +
>  arch/powerpc/kvm/book3s.c             |   14 ++-
>  arch/powerpc/kvm/book3s_32_mmu_host.c |  104 ++---
>  arch/powerpc/kvm/book3s_64_mmu_host.c |   98 +---
>  arch/powerpc/kvm/book3s_mmu_hpte.c    |  277 +
>  7 files changed, 331 insertions(+), 190 deletions(-)
>  create mode 100644 arch/powerpc/kvm/book3s_mmu_hpte.c

Applied, thanks.
Re: [PATCH 0/2] Faster MMU lookups for Book3s v3
On 07/01/2010 03:52 PM, Alexander Graf wrote:
>>> Don't you use lazy spte updates?
>>
>> We do, but given enough time, the guest will touch its entire memory.
>
> Oh, so that's the major difference. On PPC we have the HTAB with a
> fraction of all the mapped pages in it. We don't have a notion of a full
> page table for a guest process. We always only have a snapshot of some
> mappings and shadow those lazily. So at worst, we have HPTEG_CACHE_NUM
> shadow pages mapped, which would be (1 << 15) * 4k which again would be
> at most 128MB of guest memory. We can't hold more mappings than that
> anyways, so chances are low we have a mapping for each hva.

Doesn't that seriously impact performance? A guest that recycles pages
from its lru will touch pages at random from its entire address space.
On bare metal that isn't a problem (I imagine) due to large tlbs. But
virtualized on 4K pages that means the htlb will be thrashed.

>>> But then again I probably do need an rmap for the mmu_notifier magic,
>>> right? But I'd rather prefer to have that code path be slow and the
>>> dirty bitmap invalidation fast than the other way around. Swapping is
>>> slow either way.
>>
>> It's not just swapping, it's also page ageing. That needs to be fast.
>> Does ppc have a hardware-set referenced bit? If so, you need a fast
>> rmap for mmu notifiers.
>
> Page ageing is difficult. The HTAB has a hardware set referenced bit,
> but we don't have a guarantee that the entry is still there when we look
> for it. Something else could have overwritten it by then, but the entry
> could still be lingering around in the TLB.

Whoever's dropping the HTAB needs to update the host struct page, and
also reflect the bit into the guest's HTAB, no?

In fact, on x86 shadow, we don't have an spte for a gpte that is not
accessed, precisely so we know the exact point in time when the accessed
bit is set.

> So I think the only reasonable way to implement page ageing is to unmap
> pages. And that's slow, because it means we have to map them again on
> access. Bleks. Or we could look for the HTAB entry and only unmap them
> if the entry is moot.

I think it works out if you update struct page when you clear out an
HTAB.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 0/2] Faster MMU lookups for Book3s v3
Avi Kivity wrote:
> On 07/01/2010 03:28 PM, Alexander Graf wrote:
>>>> Wouldn't it speed up dirty bitmap flushing a lot if we'd just have a
>>>> simple linked list of all sPTEs belonging to that memslot?
>>>
>>> The complexity is O(pages_in_slot) + O(sptes_for_slot).
>>>
>>> Usually, every page is mapped at least once, so sptes_for_slot
>>> dominates. Even when it isn't so, iterating the rmap base pointers is
>>> very fast since they are linear in memory, while sptes are scattered
>>> around, causing cache misses.
>>
>> Why would pages be mapped often?
>
> It's not a question of how often they are mapped (shadow: very often;
> tdp: very rarely) but what percentage of pages are mapped. It's
> usually 100%.
>
>> Don't you use lazy spte updates?
>
> We do, but given enough time, the guest will touch its entire memory.

Oh, so that's the major difference. On PPC we have the HTAB with a
fraction of all the mapped pages in it. We don't have a notion of a full
page table for a guest process. We always only have a snapshot of some
mappings and shadow those lazily. So at worst, we have HPTEG_CACHE_NUM
shadow pages mapped, which would be (1 << 15) * 4k which again would be
at most 128MB of guest memory. We can't hold more mappings than that
anyways, so chances are low we have a mapping for each hva.

>>> Another consideration is that on x86, an spte occupies just 64 bits
>>> (for the hardware pte); if there are multiple sptes per page (rare on
>>> modern hardware), there is also extra memory for rmap chains;
>>> sometimes we also allocate 64 bits for the gfn. Having an extra
>>> linked list would require more memory to be allocated and maintained.
>>
>> Hrm. I was thinking of not having an rmap but only using the chain. The
>> only slots that would require such a chain would be the ones with dirty
>> bitmapping enabled, so no penalty for normal RAM (unless you use kemari
>> or live migration of course).
>
> You could also only chain writeable ptes.

Very true. Probably even more useful :).

>> But then again I probably do need an rmap for the mmu_notifier magic,
>> right? But I'd rather prefer to have that code path be slow and the
>> dirty bitmap invalidation fast than the other way around. Swapping is
>> slow either way.
>
> It's not just swapping, it's also page ageing. That needs to be
> fast. Does ppc have a hardware-set referenced bit? If so, you need a
> fast rmap for mmu notifiers.

Page ageing is difficult. The HTAB has a hardware set referenced bit,
but we don't have a guarantee that the entry is still there when we look
for it. Something else could have overwritten it by then, but the entry
could still be lingering around in the TLB.

So I think the only reasonable way to implement page ageing is to unmap
pages. And that's slow, because it means we have to map them again on
access. Bleks. Or we could look for the HTAB entry and only unmap them
if the entry is moot.

Alex
Re: [PATCH 0/2] Faster MMU lookups for Book3s v3
On 07/01/2010 03:28 PM, Alexander Graf wrote:
>>> Wouldn't it speed up dirty bitmap flushing a lot if we'd just have a
>>> simple linked list of all sPTEs belonging to that memslot?
>>
>> The complexity is O(pages_in_slot) + O(sptes_for_slot).
>>
>> Usually, every page is mapped at least once, so sptes_for_slot
>> dominates. Even when it isn't so, iterating the rmap base pointers is
>> very fast since they are linear in memory, while sptes are scattered
>> around, causing cache misses.
>
> Why would pages be mapped often?

It's not a question of how often they are mapped (shadow: very often;
tdp: very rarely) but what percentage of pages are mapped. It's usually
100%.

> Don't you use lazy spte updates?

We do, but given enough time, the guest will touch its entire memory.

>> Another consideration is that on x86, an spte occupies just 64 bits
>> (for the hardware pte); if there are multiple sptes per page (rare on
>> modern hardware), there is also extra memory for rmap chains;
>> sometimes we also allocate 64 bits for the gfn. Having an extra
>> linked list would require more memory to be allocated and maintained.
>
> Hrm. I was thinking of not having an rmap but only using the chain. The
> only slots that would require such a chain would be the ones with dirty
> bitmapping enabled, so no penalty for normal RAM (unless you use kemari
> or live migration of course).

You could also only chain writeable ptes.

> But then again I probably do need an rmap for the mmu_notifier magic,
> right? But I'd rather prefer to have that code path be slow and the
> dirty bitmap invalidation fast than the other way around. Swapping is
> slow either way.

It's not just swapping, it's also page ageing. That needs to be fast.
Does ppc have a hardware-set referenced bit? If so, you need a fast rmap
for mmu notifiers.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 0/2] Faster MMU lookups for Book3s v3
Avi Kivity wrote:
> On 07/01/2010 01:00 PM, Alexander Graf wrote:
>> But doesn't that mean that you still need to loop through all the hvas
>> that you want to invalidate?
>
> It does.
>
>> Wouldn't it speed up dirty bitmap flushing a lot if we'd just have a
>> simple linked list of all sPTEs belonging to that memslot?
>
> The complexity is O(pages_in_slot) + O(sptes_for_slot).
>
> Usually, every page is mapped at least once, so sptes_for_slot
> dominates. Even when it isn't so, iterating the rmap base pointers is
> very fast since they are linear in memory, while sptes are scattered
> around, causing cache misses.

Why would pages be mapped often? Don't you use lazy spte updates?

> Another consideration is that on x86, an spte occupies just 64 bits
> (for the hardware pte); if there are multiple sptes per page (rare on
> modern hardware), there is also extra memory for rmap chains;
> sometimes we also allocate 64 bits for the gfn. Having an extra
> linked list would require more memory to be allocated and maintained.

Hrm. I was thinking of not having an rmap but only using the chain. The
only slots that would require such a chain would be the ones with dirty
bitmapping enabled, so no penalty for normal RAM (unless you use kemari
or live migration of course).

But then again I probably do need an rmap for the mmu_notifier magic,
right? But I'd rather prefer to have that code path be slow and the
dirty bitmap invalidation fast than the other way around. Swapping is
slow either way.

Alex
Re: [PATCH 14/27] KVM: PPC: Magic Page BookE support
Josh Boyer wrote:
> On Thu, Jul 01, 2010 at 12:42:49PM +0200, Alexander Graf wrote:
>> As we now have Book3s support for the magic page, we also need BookE to
>> join in on the party.
>>
>> This patch implements generic magic page logic for BookE and specific
>> TLB logic for e500. I didn't have any 440 around, so I didn't dare to
>> blindly try and write up broken code.
>
> Is this the only patch in the series that needs 440 specific code? Also,
> does 440 KVM still work after this series is applied even without the
> code not present in this patch?

Yes, pretty much. The rest of the code is generic. But 440 should easily
just work with this patch set. If you have one to try it out, please give
it a try. I can even prepare a 440 enabling patch so you could verify if
it works.

Alex
Re: [PATCH 14/27] KVM: PPC: Magic Page BookE support
On Thu, Jul 01, 2010 at 12:42:49PM +0200, Alexander Graf wrote: >As we now have Book3s support for the magic page, we also need BookE to >join in on the party. > >This patch implements generic magic page logic for BookE and specific >TLB logic for e500. I didn't have any 440 around, so I didn't dare to >blindly try and write up broken code. Is this the only patch in the series that needs 440 specific code? Also, does 440 KVM still work after this series is applied even without the code not present in this patch? josh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/2] Faster MMU lookups for Book3s v3
On 07/01/2010 01:00 PM, Alexander Graf wrote: But doesn't that mean that you still need to loop through all the hvas that you want to invalidate? It does. Wouldn't it speed up dirty bitmap flushing a lot if we'd just have a simple linked list of all sPTEs belonging to that memslot? The complexity is O(pages_in_slot) + O(sptes_for_slot). Usually, every page is mapped at least once, so sptes_for_slot dominates. Even when it isn't so, iterating the rmap base pointers is very fast since they are linear in memory, while sptes are scattered around, causing cache misses. Another consideration is that on x86, an spte occupies just 64 bits (for the hardware pte); if there are multiple sptes per page (rare on modern hardware), there is also extra memory for rmap chains; sometimes we also allocate 64 bits for the gfn. Having an extra linked list would require more memory to be allocated and maintained. -- error compiling committee.c: too many arguments to function ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Oops while running fs_racer test on a POWER6 box against latest git
On Thu, Jul 01, 2010 at 03:04:54PM +1000, Michael Neuling wrote: > > While running fs_racer test from LTP on a POWER6 box against latest > > git(2.6.3 > 5-rc3-git4 - commitid 984bc9601f64fd) > > came across the following warning followed by multiple oops. > > > > [ cut here ] > > > > Badness at kernel/mutex-debug.c:64 > > NIP: c00be9e8 LR: c00be9cc CTR: > > REGS: c0010be8f6f0 TRAP: 0700 Not tainted (2.6.35-rc3-git4-autotest) > > MSR: 80029032CR: 24224422 XER: 0012 > > TASK = c0010727cf00[8211] 'fs_racer_file_c' THREAD: c0010be8bb50 > > CPU: > 2 > > GPR00: c0010be8f970 c0d3d798 0001 > > GPR04: c0010be8fa70 c0010be8c000 c0010727d9f8 > > GPR08: c43042f0 c16534e8 017a c0c29a1c > > GPR12: 28228424 cf600500 c0010be8fc40 2000 > > GPR16: f000 c00109c73000 c0010be8fc30 00010442 > > GPR20: 01b6 c0010dd12250 > > GPR24: c017c08c c0010727cf00 c0010dd12278 c0010dd12210 > > GPR28: 0001 c0010be8c000 c0ca2008 c0010be8fa70 > > NIP [c00be9e8] .mutex_remove_waiter+0xa4/0x130 > > LR [c00be9cc] .mutex_remove_waiter+0x88/0x130 > > Call Trace: > > [c0010be8f970] [c0010be8fa00] 0xc0010be8fa00 (unreliable) > > [c0010be8fa00] [c064a9f0] .mutex_lock_nested+0x384/0x430 > > Instruction dump: > > e81f0010 e93d 7fa04800 41fe0028 482e96e5 6000 2fa3 419e0018 > > e93e8008 8009 2f80 409e0008<0fe0> e93e8000 8009 2f80 > > Unable to handle kernel paging request for unknown fault > > Faulting instruction address: 0xc008d0f4 > > Oops: Kernel access of bad area, sig: 7 [#1] > > SMP NR_CPUS=1024 NUMA > > Unrecoverable FP Unavailable Exception 800 at c0648ed4 > > pSeries > > last sysfs file: /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map > > Modules linked in: ipv6 fuse loop dm_mod sr_mod cdrom ibmveth sg > > sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod > > NIP: c008d0f4 LR: c008d0d0 CTR: > > REGS: c0010978f900 TRAP: 0600 Tainted: GW > > (2.6.35-rc3-git4-a > utotest) > > MSR: 80009032 > > Unrecoverable FP Unavailable Exception 800 at c0648ed4 > > Unrecoverable FP Unavailable 
Exception 800 at c0648ed4 > > Unrecoverable FP Unavailable Exception 800 at c0648ed4 > > Unrecoverable FP Unavailable Exception 800 at c0648ed4 > > Unrecoverable FP Unavailable Exception 800 at c0648ed4 > > EE,ME,IR,DR>CR: 24022442 XER: 0012 > > DAR: c0648f54, DSISR: 4001 > > TASK = c001096e4900[7353] 'fs_racer_file_s' THREAD: c0010978c000 > > CPU: > 10 > > GPR00: 4000 c0010978fb80 c0d3d798 0001 > > GPR04: c083539e c1610228 c54c6880 > > GPR08: 06a5 c0648f54 0007 049b > > GPR12: cf601900 > > GPR16: 4b7dc520 c0010978fea0 > > GPR20: 0fffcca7e7a0 0fffcca7e7a0 0fffabf7dfd0 0fffabf7dfd0 > > GPR24: 01200011 c0e1c0a8 c0648ed4 > > GPR28: c001096e4900 c0ca0458 c0010725d400 > > NIP [c008d0f4] .copy_process+0x310/0xf40 > > LR [c008d0d0] .copy_process+0x2ec/0xf40 > > Call Trace: > > [c0010978fb80] [c008d0d0] .copy_process+0x2ec/0xf40 (unreliable) > > [c0010978fc80] [c008deb4] .do_fork+0x190/0x3cc > > [c0010978fdc0] [c0011ef4] .sys_clone+0x58/0x70 > > [c0010978fe30] [c00087f0] .ppc_clone+0x8/0xc > > Instruction dump: > > 419e0010 7fe3fb78 480774cd 6000 801f0014 e93f0008 7800b842 39290080 > > 78004800 6042 901f0014 38004000<7d6048a8> 7d6b0078 7d6049ad 40c2fff4 > > > > Kernel version 2.6.34-rc3-git3 works fine. > > Should this read 2.6.35-rc3-git3? > > If so, there's only about 20 commits in: > 5904b3b81d2516..984bc9601f64fd > > The likely fs related candidates are from Christoph and Nick Piggin > (added to CC) > > No commits relating to POWER6 or PPC. Not sure what's happening here. The first warning looks like some mutex corruption, but it doesn't have a stack trace (these are 2 seperate dumps, right? ie. the copy_process stack doesn't relate to the mutex warning?) So I don't have much idea. If it is reproducable, can you try getting a better stack trace, or better yet, even bisecting if there is just a small window? Thanks, Nick ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 14/27] KVM: PPC: Magic Page BookE support
As we now have Book3s support for the magic page, we also need BookE to join in on the party. This patch implements generic magic page logic for BookE and specific TLB logic for e500. I didn't have any 440 around, so I didn't dare to blindly try and write up broken code. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/booke.c| 29 + arch/powerpc/kvm/e500_tlb.c | 19 +-- 2 files changed, 46 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 0f8ff9d..9609207 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -244,6 +244,31 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) vcpu->arch.shared->int_pending = 0; } +/* Check if a DTLB miss was on the magic page. Returns !0 if so. */ +int kvmppc_dtlb_magic_page(struct kvm_vcpu *vcpu, ulong eaddr) +{ + ulong mp_ea = vcpu->arch.magic_page_ea; + ulong gpaddr = vcpu->arch.magic_page_pa; + int gtlb_index = 11 | (1 << 16); /* Random number in TLB1 */ + + /* Check for existence of magic page */ + if(likely(!mp_ea)) + return 0; + + /* Check if we're on the magic page */ + if(likely((eaddr >> 12) != (mp_ea >> 12))) + return 0; + + /* Don't map in user mode */ + if(vcpu->arch.shared->msr & MSR_PR) + return 0; + + kvmppc_mmu_map(vcpu, vcpu->arch.magic_page_ea, gpaddr, gtlb_index); + kvmppc_account_exit(vcpu, DTLB_VIRT_MISS_EXITS); + + return 1; +} + /** * kvmppc_handle_exit * @@ -311,6 +336,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, r = RESUME_HOST; break; case EMULATE_FAIL: + case EMULATE_DO_MMIO: /* XXX Deliver Program interrupt to guest. */ printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n", __func__, vcpu->arch.pc, vcpu->arch.last_inst); @@ -380,6 +406,9 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, gpa_t gpaddr; gfn_t gfn; + if (kvmppc_dtlb_magic_page(vcpu, eaddr)) + break; + /* Check the guest TLB. 
*/ gtlb_index = kvmppc_mmu_dtlb_index(vcpu, eaddr); if (gtlb_index < 0) { diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c index 66845a5..f5582ca 100644 --- a/arch/powerpc/kvm/e500_tlb.c +++ b/arch/powerpc/kvm/e500_tlb.c @@ -295,9 +295,22 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, struct page *new_page; struct tlbe *stlbe; hpa_t hpaddr; + u32 mas2 = gtlbe->mas2; + u32 mas3 = gtlbe->mas3; stlbe = &vcpu_e500->shadow_tlb[tlbsel][esel]; + if ((vcpu_e500->vcpu.arch.magic_page_ea) && + ((vcpu_e500->vcpu.arch.magic_page_pa >> PAGE_SHIFT) == gfn) && + !(vcpu_e500->vcpu.arch.shared->msr & MSR_PR)) { + mas2 = 0; + mas3 = E500_TLB_SUPER_PERM_MASK; + hpaddr = virt_to_phys(vcpu_e500->vcpu.arch.shared); + new_page = pfn_to_page(hpaddr >> PAGE_SHIFT); + get_page(new_page); + goto mapped; + } + /* Get reference to new page. */ new_page = gfn_to_page(vcpu_e500->vcpu.kvm, gfn); if (is_error_page(new_page)) { @@ -305,6 +318,8 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, kvm_release_page_clean(new_page); return; } + +mapped: hpaddr = page_to_phys(new_page); /* Drop reference to old page. */ @@ -316,10 +331,10 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, stlbe->mas1 = MAS1_TSIZE(BOOK3E_PAGESZ_4K) | MAS1_TID(get_tlb_tid(gtlbe)) | MAS1_TS | MAS1_VALID; stlbe->mas2 = (gvaddr & MAS2_EPN) - | e500_shadow_mas2_attrib(gtlbe->mas2, + | e500_shadow_mas2_attrib(mas2, vcpu_e500->vcpu.arch.shared->msr & MSR_PR); stlbe->mas3 = (hpaddr & MAS3_RPN) - | e500_shadow_mas3_attrib(gtlbe->mas3, + | e500_shadow_mas3_attrib(mas3, vcpu_e500->vcpu.arch.shared->msr & MSR_PR); stlbe->mas7 = (hpaddr >> 32) & MAS7_RPN; -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
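The guard conditions in kvmppc_dtlb_magic_page() above reduce to a small predicate: the magic page must have been negotiated, the faulting effective address must fall on the magic page, and the guest must not be in user mode. A sketch of just that check (constants assumed; the real logic lives in arch/powerpc/kvm/booke.c):

```c
#include <assert.h>
#include <stdint.h>

#define MSR_PR     (1UL << 14)   /* problem state, i.e. user mode */
#define PAGE_SHIFT 12

/* Returns nonzero if a DTLB miss at eaddr should be served from the
 * magic page: the feature is enabled (mp_ea != 0), eaddr lies on the
 * magic page, and the guest is not running in user mode. */
static int dtlb_hits_magic_page(uint64_t eaddr, uint64_t mp_ea,
                                uint64_t msr)
{
    if (!mp_ea)
        return 0;                              /* not negotiated */
    if ((eaddr >> PAGE_SHIFT) != (mp_ea >> PAGE_SHIFT))
        return 0;                              /* different page */
    if (msr & MSR_PR)
        return 0;                              /* no mapping in user mode */
    return 1;
}
```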
[PATCH 25/27] KVM: PPC: PV mtmsrd L=0 and mtmsr
There is also a form of mtmsr where all bits need to be addressed. While the PPC64 Linux kernel behaves reasonably well here, on PPC32 we do not have an L=1 form; it uses mtmsr even for simple things like only changing EE. So we need to hook into that one as well and check for a mask of bits that we deem safe to change from within guest context. Signed-off-by: Alexander Graf --- v1 -> v2: - use kvm_patch_ins_b --- arch/powerpc/kernel/kvm.c | 51 arch/powerpc/kernel/kvm_emul.S | 84 2 files changed, 135 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 1e32298..2541736 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -62,7 +62,9 @@ #define KVM_INST_MTSPR_DSISR 0x7c1203a6 #define KVM_INST_TLBSYNC 0x7c00046c +#define KVM_INST_MTMSRD_L0 0x7c000164 #define KVM_INST_MTMSRD_L1 0x7c010164 +#define KVM_INST_MTMSR 0x7c000124 static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; @@ -166,6 +168,49 @@ static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt) kvm_patch_ins_b(inst, distance_start); } +extern u32 kvm_emulate_mtmsr_branch_offs; +extern u32 kvm_emulate_mtmsr_reg1_offs; +extern u32 kvm_emulate_mtmsr_reg2_offs; +extern u32 kvm_emulate_mtmsr_reg3_offs; +extern u32 kvm_emulate_mtmsr_orig_ins_offs; +extern u32 kvm_emulate_mtmsr_len; +extern u32 kvm_emulate_mtmsr[]; + +static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_mtmsr_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)&p[kvm_emulate_mtmsr_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start > KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_mtmsr,
kvm_emulate_mtmsr_len * 4); + p[kvm_emulate_mtmsr_branch_offs] |= distance_end & KVM_INST_B_MASK; + p[kvm_emulate_mtmsr_reg1_offs] |= rt; + p[kvm_emulate_mtmsr_reg2_offs] |= rt; + p[kvm_emulate_mtmsr_reg3_offs] |= rt; + p[kvm_emulate_mtmsr_orig_ins_offs] = *inst; + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsr_len * 4); + + /* Patch the invocation */ + kvm_patch_ins_b(inst, distance_start); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -246,6 +291,12 @@ static void kvm_check_ins(u32 *inst) if (get_rt(inst_rt) < 30) kvm_patch_ins_mtmsrd(inst, inst_rt); break; + case KVM_INST_MTMSR: + case KVM_INST_MTMSRD_L0: + /* We use r30 and r31 during the hook */ + if (get_rt(inst_rt) < 30) + kvm_patch_ins_mtmsr(inst, inst_rt); + break; } switch (_inst) { diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index 25e6683..ccf5a42 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -110,3 +110,87 @@ kvm_emulate_mtmsrd_reg_offs: .global kvm_emulate_mtmsrd_len kvm_emulate_mtmsrd_len: .long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4 + + +#define MSR_SAFE_BITS (MSR_EE | MSR_CE | MSR_ME | MSR_RI) +#define MSR_CRITICAL_BITS ~MSR_SAFE_BITS + +.global kvm_emulate_mtmsr +kvm_emulate_mtmsr: + + SCRATCH_SAVE + + /* Fetch old MSR in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Find the changed bits between old and new MSR */ +kvm_emulate_mtmsr_reg1: + xor r31, r0, r31 + + /* Check if we need to really do mtmsr */ + LOAD_REG_IMMEDIATE(r30, MSR_CRITICAL_BITS) + and.r31, r31, r30 + + /* No critical bits changed? Maybe we can stay in the guest. 
*/ + beq maybe_stay_in_guest + +do_mtmsr: + + SCRATCH_RESTORE + + /* Just fire off the mtmsr if it's critical */ +kvm_emulate_mtmsr_orig_ins: + mtmsr r0 + + b kvm_emulate_mtmsr_branch + +maybe_stay_in_guest: + + /* Check if we have to fetch an interrupt */ + lwz r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0) + cmpwi r31, 0 + beq+no_mtmsr + + /* Check if we may trigger an interrupt */ +kvm_emulate_mtmsr_reg2: + andi. r31, r0, MSR_EE + beq no_mtmsr + + b do_mtmsr + +no_mtmsr: + + /* Put MSR into magic page because we don't call mtmsr */ +kvm_emulate_mtmsr_reg3: + STL64(r0, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + SCRATCH_RESTORE + + /* Go back to caller
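The decision the kvm_emulate_mtmsr code above makes — XOR old and new MSR, mask with the critical bits, and only fall back to a real mtmsr if something outside the safe set changed — can be expressed compactly in C. A sketch; the MSR bit positions are the standard PowerPC ones and the safe set is the one the patch defines:

```c
#include <assert.h>
#include <stdint.h>

#define MSR_RI (1UL << 1)
#define MSR_ME (1UL << 12)
#define MSR_EE (1UL << 15)
#define MSR_CE (1UL << 17)

#define MSR_SAFE_BITS     (MSR_EE | MSR_CE | MSR_ME | MSR_RI)
#define MSR_CRITICAL_BITS (~MSR_SAFE_BITS)

/* Returns nonzero if the mtmsr must trap to the hypervisor (the write
 * flips at least one bit outside the safe set); zero if the guest can
 * simply update the MSR copy in the magic page and stay in guest
 * context. */
static int mtmsr_needs_trap(uint64_t old_msr, uint64_t new_msr)
{
    return ((old_msr ^ new_msr) & MSR_CRITICAL_BITS) != 0;
}
```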
[PATCH 19/27] KVM: PPC: PV instructions to loads and stores
Some instructions can simply be replaced by load and store instructions to or from the magic page. This patch replaces often called instructions that fall into the above category. Signed-off-by: Alexander Graf --- v1 -> v2: - use kvm_patch_ins --- arch/powerpc/kernel/kvm.c | 111 + 1 files changed, 111 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 1f328d5..7094ee4 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -32,6 +32,35 @@ #define KVM_MAGIC_PAGE (-4096L) #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) +#define KVM_INST_LWZ 0x8000 +#define KVM_INST_STW 0x9000 +#define KVM_INST_LD0xe800 +#define KVM_INST_STD 0xf800 +#define KVM_INST_NOP 0x6000 +#define KVM_INST_B 0x4800 +#define KVM_INST_B_MASK0x03ff +#define KVM_INST_B_MAX 0x01ff + +#define KVM_MASK_RT0x03e0 +#define KVM_INST_MFMSR 0x7ca6 +#define KVM_INST_MFSPR_SPRG0 0x7c1042a6 +#define KVM_INST_MFSPR_SPRG1 0x7c1142a6 +#define KVM_INST_MFSPR_SPRG2 0x7c1242a6 +#define KVM_INST_MFSPR_SPRG3 0x7c1342a6 +#define KVM_INST_MFSPR_SRR00x7c1a02a6 +#define KVM_INST_MFSPR_SRR10x7c1b02a6 +#define KVM_INST_MFSPR_DAR 0x7c1302a6 +#define KVM_INST_MFSPR_DSISR 0x7c1202a6 + +#define KVM_INST_MTSPR_SPRG0 0x7c1043a6 +#define KVM_INST_MTSPR_SPRG1 0x7c1143a6 +#define KVM_INST_MTSPR_SPRG2 0x7c1243a6 +#define KVM_INST_MTSPR_SPRG3 0x7c1343a6 +#define KVM_INST_MTSPR_SRR00x7c1a03a6 +#define KVM_INST_MTSPR_SRR10x7c1b03a6 +#define KVM_INST_MTSPR_DAR 0x7c1303a6 +#define KVM_INST_MTSPR_DSISR 0x7c1203a6 + static bool kvm_patching_worked = true; static inline void kvm_patch_ins(u32 *inst, u32 new_inst) @@ -40,6 +69,34 @@ static inline void kvm_patch_ins(u32 *inst, u32 new_inst) flush_icache_range((ulong)inst, (ulong)inst + 4); } +static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt) +{ +#ifdef CONFIG_64BIT + kvm_patch_ins(inst, KVM_INST_LD | rt | (addr & 0xfffc)); +#else + kvm_patch_ins(inst, KVM_INST_LWZ | rt | ((addr + 4) & 
0xfffc)); +#endif +} + +static void kvm_patch_ins_lwz(u32 *inst, long addr, u32 rt) +{ + kvm_patch_ins(inst, KVM_INST_LWZ | rt | (addr & 0x)); +} + +static void kvm_patch_ins_std(u32 *inst, long addr, u32 rt) +{ +#ifdef CONFIG_64BIT + kvm_patch_ins(inst, KVM_INST_STD | rt | (addr & 0xfffc)); +#else + kvm_patch_ins(inst, KVM_INST_STW | rt | ((addr + 4) & 0xfffc)); +#endif +} + +static void kvm_patch_ins_stw(u32 *inst, long addr, u32 rt) +{ + kvm_patch_ins(inst, KVM_INST_STW | rt | (addr & 0xfffc)); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -54,6 +111,60 @@ static void kvm_check_ins(u32 *inst) u32 inst_rt = _inst & KVM_MASK_RT; switch (inst_no_rt) { + /* Loads */ + case KVM_INST_MFMSR: + kvm_patch_ins_ld(inst, magic_var(msr), inst_rt); + break; + case KVM_INST_MFSPR_SPRG0: + kvm_patch_ins_ld(inst, magic_var(sprg0), inst_rt); + break; + case KVM_INST_MFSPR_SPRG1: + kvm_patch_ins_ld(inst, magic_var(sprg1), inst_rt); + break; + case KVM_INST_MFSPR_SPRG2: + kvm_patch_ins_ld(inst, magic_var(sprg2), inst_rt); + break; + case KVM_INST_MFSPR_SPRG3: + kvm_patch_ins_ld(inst, magic_var(sprg3), inst_rt); + break; + case KVM_INST_MFSPR_SRR0: + kvm_patch_ins_ld(inst, magic_var(srr0), inst_rt); + break; + case KVM_INST_MFSPR_SRR1: + kvm_patch_ins_ld(inst, magic_var(srr1), inst_rt); + break; + case KVM_INST_MFSPR_DAR: + kvm_patch_ins_ld(inst, magic_var(dar), inst_rt); + break; + case KVM_INST_MFSPR_DSISR: + kvm_patch_ins_lwz(inst, magic_var(dsisr), inst_rt); + break; + + /* Stores */ + case KVM_INST_MTSPR_SPRG0: + kvm_patch_ins_std(inst, magic_var(sprg0), inst_rt); + break; + case KVM_INST_MTSPR_SPRG1: + kvm_patch_ins_std(inst, magic_var(sprg1), inst_rt); + break; + case KVM_INST_MTSPR_SPRG2: + kvm_patch_ins_std(inst, magic_var(sprg2), inst_rt); + break; + case KVM_INST_MTSPR_SPRG3: + kvm_patch_ins_std(inst, magic_var(sprg3), inst_rt); + break; + case KVM_INST_MTSPR_SRR0: + kvm_patch_ins_std(inst, magic_var(srr0), inst_rt); + 
break; + case KVM_INST_MTSPR_SRR1: + kvm_patch_ins_std(inst, magic_var(srr1), inst_rt); + break; + case KVM_INST
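Each rewrite in the patch above is a single OR of three fields: a load/store opcode template, the rt field copied from the original mfspr/mtspr, and the 16-bit displacement of the magic-page field. A sketch of the 32-bit D-form encodings (using plain register numbers where the kernel code ORs in a pre-masked rt field; constants from the PowerPC ISA):

```c
#include <assert.h>
#include <stdint.h>

#define INST_LWZ 0x80000000u   /* lwz rt, d(ra), primary opcode 32 */
#define INST_STW 0x90000000u   /* stw rt, d(ra), primary opcode 36 */

/* Build "lwz rt, addr(0)".  ra = 0 means "no base register" on PPC,
 * so the 16-bit signed displacement alone reaches the magic page
 * mapped at -4096. */
static uint32_t mk_lwz(unsigned rt, long addr)
{
    return INST_LWZ | (rt << 21) | ((uint32_t)addr & 0xffffu);
}

/* Build "stw rt, addr(0)" the same way. */
static uint32_t mk_stw(unsigned rt, long addr)
{
    return INST_STW | (rt << 21) | ((uint32_t)addr & 0xffffu);
}
```

So patching e.g. "mfmsr r5" amounts to replacing the instruction word with mk_lwz(5, magic_var(msr)) and flushing the icache.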
[PATCH 24/27] KVM: PPC: PV mtmsrd L=1
The PowerPC ISA has a special instruction for mtmsr that only changes the EE and RI bits, namely the L=1 form. Since that one is reasonably often occuring and simple to implement, let's go with this first. Writing EE=0 is always just a store. Doing EE=1 also requires us to check for pending interrupts and if necessary exit back to the hypervisor. Signed-off-by: Alexander Graf --- v1 -> v2: - use kvm_patch_ins_b --- arch/powerpc/kernel/kvm.c | 45 arch/powerpc/kernel/kvm_emul.S | 56 2 files changed, 101 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 337e3e5..1e32298 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -62,6 +62,7 @@ #define KVM_INST_MTSPR_DSISR 0x7c1203a6 #define KVM_INST_TLBSYNC 0x7c00046c +#define KVM_INST_MTMSRD_L1 0x7c010164 static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; @@ -128,6 +129,43 @@ static u32 *kvm_alloc(int len) return p; } +extern u32 kvm_emulate_mtmsrd_branch_offs; +extern u32 kvm_emulate_mtmsrd_reg_offs; +extern u32 kvm_emulate_mtmsrd_len; +extern u32 kvm_emulate_mtmsrd[]; + +static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_mtmsrd_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)&p[kvm_emulate_mtmsrd_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start > KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_mtmsrd, kvm_emulate_mtmsrd_len * 4); + p[kvm_emulate_mtmsrd_branch_offs] |= distance_end & KVM_INST_B_MASK; + p[kvm_emulate_mtmsrd_reg_offs] |= rt; + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsrd_len * 4); + + /* Patch the invocation */ + 
kvm_patch_ins_b(inst, distance_start); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -201,6 +239,13 @@ static void kvm_check_ins(u32 *inst) case KVM_INST_TLBSYNC: kvm_patch_ins_nop(inst); break; + + /* Rewrites */ + case KVM_INST_MTMSRD_L1: + /* We use r30 and r31 during the hook */ + if (get_rt(inst_rt) < 30) + kvm_patch_ins_mtmsrd(inst, inst_rt); + break; } switch (_inst) { diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index 7da835a..25e6683 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -54,3 +54,59 @@ /* Disable critical section. We are critical if \ shared->critical == r1 and r2 is always != r1 */ \ STL64(r2, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0); + +.global kvm_emulate_mtmsrd +kvm_emulate_mtmsrd: + + SCRATCH_SAVE + + /* Put MSR & ~(MSR_EE|MSR_RI) in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + lis r30, (~(MSR_EE | MSR_RI))@h + ori r30, r30, (~(MSR_EE | MSR_RI))@l + and r31, r31, r30 + + /* OR the register's (MSR_EE|MSR_RI) on MSR */ +kvm_emulate_mtmsrd_reg: + andi. r30, r0, (MSR_EE|MSR_RI) + or r31, r31, r30 + + /* Put MSR back into magic page */ + STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Check if we have to fetch an interrupt */ + lwz r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0) + cmpwi r31, 0 + beq+no_check + + /* Check if we may trigger an interrupt */ + andi. r30, r30, MSR_EE + beq no_check + + SCRATCH_RESTORE + + /* Nag hypervisor */ + tlbsync + + b kvm_emulate_mtmsrd_branch + +no_check: + + SCRATCH_RESTORE + + /* Go back to caller */ +kvm_emulate_mtmsrd_branch: + b . 
+kvm_emulate_mtmsrd_end: + +.global kvm_emulate_mtmsrd_branch_offs +kvm_emulate_mtmsrd_branch_offs: + .long (kvm_emulate_mtmsrd_branch - kvm_emulate_mtmsrd) / 4 + +.global kvm_emulate_mtmsrd_reg_offs +kvm_emulate_mtmsrd_reg_offs: + .long (kvm_emulate_mtmsrd_reg - kvm_emulate_mtmsrd) / 4 + +.global kvm_emulate_mtmsrd_len +kvm_emulate_mtmsrd_len: + .long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4 -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 27/27] KVM: PPC: Add Documentation about PV interface
We just introduced a new PV interface that screams for documentation. So here it is - a shiny new and awesome text file describing the internal works of the PPC KVM paravirtual interface. Signed-off-by: Alexander Graf --- v1 -> v2: - clarify guest implementation - clarify that privileged instructions still work - explain safe MSR bits - Fix dsisr patch description - change hypervisor calls to use new register values --- Documentation/kvm/ppc-pv.txt | 185 ++ 1 files changed, 185 insertions(+), 0 deletions(-) create mode 100644 Documentation/kvm/ppc-pv.txt diff --git a/Documentation/kvm/ppc-pv.txt b/Documentation/kvm/ppc-pv.txt new file mode 100644 index 000..82de6c6 --- /dev/null +++ b/Documentation/kvm/ppc-pv.txt @@ -0,0 +1,185 @@ +The PPC KVM paravirtual interface += + +The basic execution principle by which KVM on PowerPC works is to run all kernel +space code in PR=1 which is user space. This way we trap all privileged +instructions and can emulate them accordingly. + +Unfortunately that is also the downfall. There are quite some privileged +instructions that needlessly return us to the hypervisor even though they +could be handled differently. + +This is what the PPC PV interface helps with. It takes privileged instructions +and transforms them into unprivileged ones with some help from the hypervisor. +This cuts down virtualization costs by about 50% on some of my benchmarks. + +The code for that interface can be found in arch/powerpc/kernel/kvm* + +Querying for existence +== + +To find out if we're running on KVM or not, we overlay the PVR register. Usually +the PVR register contains an id that identifies your CPU type. If, however, you +pass KVM_PVR_PARA in the register that you want the PVR result in, the register +still contains KVM_PVR_PARA after the mfpvr call. + + LOAD_REG_IMM(r5, KVM_PVR_PARA) + mfpvr r5 + [r5 still contains KVM_PVR_PARA] + +Once determined to run under a PV capable KVM, you can now use hypercalls as +described below. 
+ +PPC hypercalls +== + +The only viable ways to reliably get from guest context to host context are: + + 1) Call an invalid instruction + 2) Call the "sc" instruction with a parameter to "sc" + 3) Call the "sc" instruction with parameters in GPRs + +Method 1 is always a bad idea. Invalid instructions can be replaced later on +by valid instructions, rendering the interface broken. + +Method 2 also has downfalls. If the parameter to "sc" is != 0 the spec is +rather unclear if the sc is targeted directly for the hypervisor or the +supervisor. It would also require that we read the syscall issuing instruction +every time a syscall is issued, slowing down guest syscalls. + +Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R0 and +KVM_SC_MAGIC_R3) in r0 and r3 respectively. If a syscall instruction with these +magic values arrives from the guest's kernel mode, we take the syscall as a +hypercall. + +The parameters are as follows: + + r0 KVM_SC_MAGIC_R0 + r3 KVM_SC_MAGIC_R3 Return code + r4 Hypercall number + r5 First parameter + r6 Second parameter + r7 Third parameter + r8 Fourth parameter + +Hypercall definitions are shared in generic code, so the same hypercall numbers +apply for x86 and powerpc alike. + +The magic page +== + +To enable communication between the hypervisor and guest there is a new shared +page that contains parts of supervisor visible register state. The guest can +map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE. + +With this hypercall issued the guest always gets the magic page mapped at the +desired location in effective and physical address space. For now, we always +map the page to -4096. This way we can access it using absolute load and store +functions. The following instruction reads the first field of the magic page: + + ld rX, -4096(0) + +The interface is designed to be extensible should there be need later to add +additional registers to the magic page. 
If you add fields to the magic page, +also define a new hypercall feature to indicate that the host can give you more +registers. Only if the host supports the additional features, make use of them. + +The magic page has the following layout as described in +arch/powerpc/include/asm/kvm_para.h: + +struct kvm_vcpu_arch_shared { + __u64 scratch1; + __u64 scratch2; + __u64 scratch3; + __u64 critical; /* Guest may not get interrupts if == r1 */ + __u64 sprg0; + __u64 sprg1; + __u64 sprg2; + __u64 sprg3; + __u64 srr0; + __u64 srr1; + __u64 dar; + __u64 msr; + __u32 dsisr; + __u32 int_pending; /* Tells the guest if we have an interrupt
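Inside a guest that has issued KVM_HC_PPC_MAP_MAGIC_PAGE, reading shared state is just an ordinary load through the fixed mapping at -4096. The sketch below simulates that with a caller-supplied buffer standing in for the magic page, since dereferencing -4096 only works inside such a guest; the struct layout follows the documentation text above:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct kvm_vcpu_arch_shared {
    uint64_t scratch1, scratch2, scratch3;
    uint64_t critical;      /* guest may not get interrupts if == r1 */
    uint64_t sprg0, sprg1, sprg2, sprg3;
    uint64_t srr0, srr1;
    uint64_t dar;
    uint64_t msr;
    uint32_t dsisr;
    uint32_t int_pending;   /* set when the host has an interrupt pending */
};

/* In a real guest this would simply be
 *     ((struct kvm_vcpu_arch_shared *)-4096L)->msr;
 * here we copy out of a stand-in buffer instead. */
static uint64_t read_shared_msr(const void *magic_page)
{
    struct kvm_vcpu_arch_shared s;
    memcpy(&s, magic_page, sizeof(s));
    return s.msr;
}
```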
[PATCH 21/27] KVM: PPC: Introduce kvm_tmp framework
We will soon require more sophisticated methods to replace single instructions with multiple instructions. We do that by branching to a memory region where we write replacement code for the instruction to. This region needs to be within 32 MB of the patched instruction though, because that's the furthest we can jump with immediate branches. So we keep 1MB of free space around in bss. After we're done initing we can just tell the mm system that the unused pages are free, but until then we have enough space to fit all our code in. Signed-off-by: Alexander Graf --- arch/powerpc/kernel/kvm.c | 41 +++-- 1 files changed, 39 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 3a49de5..75c9e0b 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -64,6 +64,8 @@ #define KVM_INST_TLBSYNC 0x7c00046c static bool kvm_patching_worked = true; +static char kvm_tmp[1024 * 1024]; +static int kvm_tmp_index; static inline void kvm_patch_ins(u32 *inst, u32 new_inst) { @@ -104,6 +106,23 @@ static void kvm_patch_ins_nop(u32 *inst) kvm_patch_ins(inst, KVM_INST_NOP); } +static u32 *kvm_alloc(int len) +{ + u32 *p; + + if ((kvm_tmp_index + len) > ARRAY_SIZE(kvm_tmp)) { + printk(KERN_ERR "KVM: No more space (%d + %d)\n", + kvm_tmp_index, len); + kvm_patching_worked = false; + return NULL; + } + + p = (void*)&kvm_tmp[kvm_tmp_index]; + kvm_tmp_index += len; + + return p; +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -201,12 +220,27 @@ static void kvm_use_magic_page(void) kvm_check_ins(p); } +static void kvm_free_tmp(void) +{ + unsigned long start, end; + + start = (ulong)&kvm_tmp[kvm_tmp_index + (PAGE_SIZE - 1)] & PAGE_MASK; + end = (ulong)&kvm_tmp[ARRAY_SIZE(kvm_tmp)] & PAGE_MASK; + + /* Free the tmp space we don't need */ + for (; start < end; start += PAGE_SIZE) { + ClearPageReserved(virt_to_page(start)); + init_page_count(virt_to_page(start)); + free_page(start); + 
totalram_pages++; + } +} + static int __init kvm_guest_init(void) { - char *p; if (!kvm_para_available()) - return 0; + goto free_tmp; if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE)) kvm_use_magic_page(); @@ -214,6 +248,9 @@ static int __init kvm_guest_init(void) printk(KERN_INFO "KVM: Live patching for a fast VM %s\n", kvm_patching_worked ? "worked" : "failed"); +free_tmp: + kvm_free_tmp(); + return 0; } -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
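The page arithmetic in kvm_free_tmp() above — round the used watermark up to a page boundary and the buffer end down — is worth checking in isolation. A sketch assuming 4 KiB pages and a page-aligned buffer:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Number of whole pages of the kvm_tmp buffer that can be handed back
 * to the mm system once patching is done: everything from the first
 * page boundary at or above the used index up to the last boundary
 * inside the buffer. */
static unsigned long tmp_pages_freeable(unsigned long buf,
                                        unsigned long size,
                                        unsigned long used)
{
    unsigned long start = (buf + used + PAGE_SIZE - 1) & PAGE_MASK;
    unsigned long end   = (buf + size) & PAGE_MASK;

    return start < end ? (end - start) / PAGE_SIZE : 0;
}
```

With a 1 MB buffer and only a few hundred bytes consumed, 255 of the 256 pages go back to the free list, which is the whole point of keeping the scratch space in bss.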
[PATCH 22/27] KVM: PPC: Introduce branch patching helper
We will need to patch several instruction streams over to a different code path, so we need a way to patch a single instruction with a branch somewhere else. This patch adds a helper to facilitate this patching. Signed-off-by: Alexander Graf --- arch/powerpc/kernel/kvm.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 75c9e0b..337e3e5 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -106,6 +106,11 @@ static void kvm_patch_ins_nop(u32 *inst) kvm_patch_ins(inst, KVM_INST_NOP); } +static void kvm_patch_ins_b(u32 *inst, int addr) +{ + kvm_patch_ins(inst, KVM_INST_B | (addr & KVM_INST_B_MASK)); +} + static u32 *kvm_alloc(int len) { u32 *p; -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
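kvm_patch_ins_b() relies on the I-form branch encoding: primary opcode 18 in the top six bits and a 26-bit signed, word-aligned displacement, which is where the roughly ±32 MB reach mentioned in patch 21 comes from. A sketch of the encoding plus the range check the callers perform (mask values per the PowerPC ISA):

```c
#include <assert.h>
#include <stdint.h>

#define INST_B      0x48000000u   /* "b" I-form template */
#define INST_B_MASK 0x03fffffcu   /* 26-bit displacement field */

/* Encode "b" with a pc-relative displacement, or return 0 if the
 * displacement is unencodable (out of the signed 26-bit range or not
 * word aligned). */
static uint32_t mk_branch(long disp)
{
    if (disp < -0x02000000L || disp > 0x01fffffcL || (disp & 3))
        return 0;
    return INST_B | ((uint32_t)disp & INST_B_MASK);
}
```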
[PATCH 26/27] KVM: PPC: PV wrteei
On BookE the preferred way to write the EE bit is the wrteei instruction. It already encodes the EE bit in the instruction. So in order to get BookE some speedups as well, let's also PV'nize that instruction. Signed-off-by: Alexander Graf --- v1 -> v2: - use kvm_patch_ins_b --- arch/powerpc/kernel/kvm.c | 50 arch/powerpc/kernel/kvm_emul.S | 41 2 files changed, 91 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 2541736..995fadd 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -66,6 +66,9 @@ #define KVM_INST_MTMSRD_L1 0x7c010164 #define KVM_INST_MTMSR 0x7c000124 +#define KVM_INST_WRTEEI_0 0x7c000146 +#define KVM_INST_WRTEEI_1 0x7c008146 + static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; static int kvm_tmp_index; @@ -211,6 +214,47 @@ static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt) kvm_patch_ins_b(inst, distance_start); } +#ifdef CONFIG_BOOKE + +extern u32 kvm_emulate_wrteei_branch_offs; +extern u32 kvm_emulate_wrteei_ee_offs; +extern u32 kvm_emulate_wrteei_len; +extern u32 kvm_emulate_wrteei[]; + +static void kvm_patch_ins_wrteei(u32 *inst) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_wrteei_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)&p[kvm_emulate_wrteei_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start > KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_wrteei, kvm_emulate_wrteei_len * 4); + p[kvm_emulate_wrteei_branch_offs] |= distance_end & KVM_INST_B_MASK; + p[kvm_emulate_wrteei_ee_offs] |= (*inst & MSR_EE); + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_wrteei_len * 4); + + /* Patch the invocation */ +
kvm_patch_ins_b(inst, distance_start); +} + +#endif + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -300,6 +344,12 @@ static void kvm_check_ins(u32 *inst) } switch (_inst) { +#ifdef CONFIG_BOOKE + case KVM_INST_WRTEEI_0: + case KVM_INST_WRTEEI_1: + kvm_patch_ins_wrteei(inst); + break; +#endif } } diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index ccf5a42..b79b9de 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -194,3 +194,44 @@ kvm_emulate_mtmsr_orig_ins_offs: .global kvm_emulate_mtmsr_len kvm_emulate_mtmsr_len: .long (kvm_emulate_mtmsr_end - kvm_emulate_mtmsr) / 4 + + + +.global kvm_emulate_wrteei +kvm_emulate_wrteei: + + SCRATCH_SAVE + + /* Fetch old MSR in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Remove MSR_EE from old MSR */ + li r30, 0 + ori r30, r30, MSR_EE + andc r31, r31, r30 + + /* OR new MSR_EE onto the old MSR */ +kvm_emulate_wrteei_ee: + ori r31, r31, 0 + + /* Write new MSR value back */ + STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + SCRATCH_RESTORE + + /* Go back to caller */ +kvm_emulate_wrteei_branch: + b . +kvm_emulate_wrteei_end: + +.global kvm_emulate_wrteei_branch_offs +kvm_emulate_wrteei_branch_offs: + .long (kvm_emulate_wrteei_branch - kvm_emulate_wrteei) / 4 + +.global kvm_emulate_wrteei_ee_offs +kvm_emulate_wrteei_ee_offs: + .long (kvm_emulate_wrteei_ee - kvm_emulate_wrteei) / 4 + +.global kvm_emulate_wrteei_len +kvm_emulate_wrteei_len: + .long (kvm_emulate_wrteei_end - kvm_emulate_wrteei) / 4 -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
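A note on the two opcode constants in this patch: `KVM_INST_WRTEEI_0` and `KVM_INST_WRTEEI_1` differ only in bit 0x8000 of the instruction word, which is exactly the `MSR_EE` bit. That is what lets `kvm_patch_ins_wrteei()` OR `*inst & MSR_EE` straight into the `ori r31, r31, 0` template at `kvm_emulate_wrteei_ee`. A minimal user-space sketch of that relationship (constants copied from the patch):

```c
#include <stdint.h>

#define KVM_INST_WRTEEI_0 0x7c000146u
#define KVM_INST_WRTEEI_1 0x7c008146u
#define MSR_EE            0x00008000u

/* The EE value a given wrteei instruction carries is simply the
 * MSR_EE bit of its encoding, so it can be OR-ed into the template. */
static uint32_t wrteei_ee(uint32_t inst)
{
    return inst & MSR_EE;
}
```

So no operand decoding is needed at all; the encoding itself carries the bit.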
[PATCH 02/27] KVM: PPC: Convert MSR to shared page
One of the most obvious registers to share with the guest directly is the MSR. The MSR contains the "interrupts enabled" flag which the guest has to toggle in critical sections. So in order to bring the overhead of interrupt en- and disabling down, let's put msr into the shared page. Keep in mind that even though you can fully read its contents, writing to it doesn't always update all state. There are a few safe fields that don't require hypervisor interaction. See the documentation for a list of MSR bits that are safe to be set from inside the guest. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h |1 - arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kernel/asm-offsets.c|2 +- arch/powerpc/kvm/44x_tlb.c |8 ++-- arch/powerpc/kvm/book3s.c| 65 -- arch/powerpc/kvm/book3s_32_mmu.c | 12 +++--- arch/powerpc/kvm/book3s_32_mmu_host.c|4 +- arch/powerpc/kvm/book3s_64_mmu.c | 12 +++--- arch/powerpc/kvm/book3s_64_mmu_host.c|4 +- arch/powerpc/kvm/book3s_emulate.c|9 ++-- arch/powerpc/kvm/book3s_paired_singles.c |7 ++- arch/powerpc/kvm/booke.c | 20 +- arch/powerpc/kvm/booke.h |6 +- arch/powerpc/kvm/booke_emulate.c |6 +- arch/powerpc/kvm/booke_interrupts.S |3 +- arch/powerpc/kvm/e500_tlb.c | 12 +++--- arch/powerpc/kvm/e500_tlb.h |2 +- arch/powerpc/kvm/powerpc.c |3 +- 18 files changed, 93 insertions(+), 84 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 246a3dd..c7aee42 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -210,7 +210,6 @@ struct kvm_vcpu_arch { u32 cr; #endif - ulong msr; #ifdef CONFIG_PPC_BOOK3S ulong shadow_msr; ulong hflags; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 1485ba8..a17dc52 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,7 @@ #include struct kvm_vcpu_arch_shared { + __u64 msr; }; #ifdef __KERNEL__ diff --git 
a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 944f593..a55d47e 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -394,13 +394,13 @@ int main(void) DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, arch.host_stack)); DEFINE(VCPU_HOST_PID, offsetof(struct kvm_vcpu, arch.host_pid)); DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr)); - DEFINE(VCPU_MSR, offsetof(struct kvm_vcpu, arch.msr)); DEFINE(VCPU_SPRG4, offsetof(struct kvm_vcpu, arch.sprg4)); DEFINE(VCPU_SPRG5, offsetof(struct kvm_vcpu, arch.sprg5)); DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6)); DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7)); DEFINE(VCPU_SHADOW_PID, offsetof(struct kvm_vcpu, arch.shadow_pid)); DEFINE(VCPU_SHARED, offsetof(struct kvm_vcpu, arch.shared)); + DEFINE(VCPU_SHARED_MSR, offsetof(struct kvm_vcpu_arch_shared, msr)); /* book3s */ #ifdef CONFIG_PPC_BOOK3S diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c index 8123125..4cbbca7 100644 --- a/arch/powerpc/kvm/44x_tlb.c +++ b/arch/powerpc/kvm/44x_tlb.c @@ -221,14 +221,14 @@ gpa_t kvmppc_mmu_xlate(struct kvm_vcpu *vcpu, unsigned int gtlb_index, int kvmppc_mmu_itlb_index(struct kvm_vcpu *vcpu, gva_t eaddr) { - unsigned int as = !!(vcpu->arch.msr & MSR_IS); + unsigned int as = !!(vcpu->arch.shared->msr & MSR_IS); return kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as); } int kvmppc_mmu_dtlb_index(struct kvm_vcpu *vcpu, gva_t eaddr) { - unsigned int as = !!(vcpu->arch.msr & MSR_DS); + unsigned int as = !!(vcpu->arch.shared->msr & MSR_DS); return kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as); } @@ -353,7 +353,7 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gpa_t gpaddr, stlbe.word1 = (hpaddr & 0xfc00) | ((hpaddr >> 32) & 0xf); stlbe.word2 = kvmppc_44x_tlb_shadow_attrib(flags, - vcpu->arch.msr & MSR_PR); + vcpu->arch.shared->msr & MSR_PR); stlbe.tid = !(asid & 0xff); /* Keep track of the reference so we can 
properly release it later. */ @@ -422,7 +422,7 @@ static int tlbe_is_host_safe(const struct kvm_vcpu *vcpu, /* Does it match current guest AS? */ /* XXX what about IS != DS? */ - if (get_tlb_ts(tlbe) != !!(vcpu->arch.msr & MSR_IS)) + if (get_tlb_ts(tlbe) != !!(vcpu->arch.shared->msr & MSR_IS)) return 0; gpa = get_tlb_raddr(tlbe); di
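The `!!(vcpu->arch.shared->msr & MSR_IS)` / `!!(... & MSR_DS)` conversions in this patch collapse the BookE address-space bits of the now-shared MSR into a 1-bit AS id for the TLB lookup. A small sketch (the MSR bit values are assumptions matching BookE, where MSR_IS/MSR_DS alias MSR_IR/MSR_DR):

```c
#include <stdint.h>

#define MSR_IS 0x20ULL /* instruction address space (BookE) */
#define MSR_DS 0x10ULL /* data address space (BookE) */

/* AS id used to index the guest TLB, as in kvmppc_mmu_itlb_index()
 * and kvmppc_mmu_dtlb_index() above. */
static unsigned int itlb_as(uint64_t msr) { return !!(msr & MSR_IS); }
static unsigned int dtlb_as(uint64_t msr) { return !!(msr & MSR_DS); }
```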
[PATCH 17/27] KVM: PPC: Generic KVM PV guest support
We have all the hypervisor pieces in place now, but the guest parts are still missing. This patch implements basic awareness of KVM when running Linux as guest. It doesn't do anything with it yet though. Signed-off-by: Alexander Graf --- arch/powerpc/kernel/Makefile |2 ++ arch/powerpc/kernel/asm-offsets.c | 15 +++ arch/powerpc/kernel/kvm.c | 34 ++ arch/powerpc/kernel/kvm_emul.S| 27 +++ arch/powerpc/platforms/Kconfig| 10 ++ 5 files changed, 88 insertions(+), 0 deletions(-) create mode 100644 arch/powerpc/kernel/kvm.c create mode 100644 arch/powerpc/kernel/kvm_emul.S diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 58d0572..2d7eb9e 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -125,6 +125,8 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),) obj-y += ppc_save_regs.o endif +obj-$(CONFIG_KVM_GUEST) += kvm.o kvm_emul.o + # Disable GCOV in odd or sensitive code GCOV_PROFILE_prom_init.o := n GCOV_PROFILE_ftrace.o := n diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index a55d47e..e3e740b 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -465,6 +465,21 @@ int main(void) DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr)); #endif /* CONFIG_PPC_BOOK3S */ #endif + +#ifdef CONFIG_KVM_GUEST + DEFINE(KVM_MAGIC_SCRATCH1, offsetof(struct kvm_vcpu_arch_shared, + scratch1)); + DEFINE(KVM_MAGIC_SCRATCH2, offsetof(struct kvm_vcpu_arch_shared, + scratch2)); + DEFINE(KVM_MAGIC_SCRATCH3, offsetof(struct kvm_vcpu_arch_shared, + scratch3)); + DEFINE(KVM_MAGIC_INT, offsetof(struct kvm_vcpu_arch_shared, + int_pending)); + DEFINE(KVM_MAGIC_MSR, offsetof(struct kvm_vcpu_arch_shared, msr)); + DEFINE(KVM_MAGIC_CRITICAL, offsetof(struct kvm_vcpu_arch_shared, + critical)); +#endif + #ifdef CONFIG_44x DEFINE(PGD_T_LOG2, PGD_T_LOG2); DEFINE(PTE_T_LOG2, PTE_T_LOG2); diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c new file mode 100644 index 
000..2d8dd73 --- /dev/null +++ b/arch/powerpc/kernel/kvm.c @@ -0,0 +1,34 @@ +/* + * Copyright (C) 2010 SUSE Linux Products GmbH. All rights reserved. + * + * Authors: + * Alexander Graf + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#define KVM_MAGIC_PAGE (-4096L) +#define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) + diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S new file mode 100644 index 000..c7b9fc9 --- /dev/null +++ b/arch/powerpc/kernel/kvm_emul.S @@ -0,0 +1,27 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. 
+ * + * Copyright SUSE Linux Products GmbH 2010 + * + * Authors: Alexander Graf + */ + +#include +#include +#include +#include +#include + +#define KVM_MAGIC_PAGE (-4096) + diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig index d1663db..1744349 100644 --- a/arch/powerpc/platforms/Kconfig +++ b/arch/powerpc/platforms/Kconfig @@ -21,6 +21,16 @@ source "arch/powerpc/platforms/44x/Kconfig" source "arch/powerpc/platforms/40x/Kconfig" source "arch/powerpc/platforms/amigaone/Kconfig" +config KVM_GUEST + bool "KVM Guest support" + default y + ---help--- + This
[PATCH 23/27] KVM: PPC: PV assembler helpers
When we hook an instruction we need to make sure we don't clobber any of the registers at that point. So we write them out to scratch space in the magic page. To make sure we don't fall into a race with another piece of hooked code, we need to disable interrupts. To make the later patches and the code in general easier to read, let's introduce a set of defines that save and restore r30, r31 and cr. Let's also define some helpers to read the lower 32 bits of a 64 bit field on 32 bit systems. Signed-off-by: Alexander Graf --- arch/powerpc/kernel/kvm_emul.S | 29 + 1 files changed, 29 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index c7b9fc9..7da835a 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -25,3 +25,32 @@ #define KVM_MAGIC_PAGE (-4096) +#ifdef CONFIG_64BIT +#define LL64(reg, offs, reg2) ld reg, (offs)(reg2) +#define STL64(reg, offs, reg2) std reg, (offs)(reg2) +#else +#define LL64(reg, offs, reg2) lwz reg, (offs + 4)(reg2) +#define STL64(reg, offs, reg2) stw reg, (offs + 4)(reg2) +#endif + +#define SCRATCH_SAVE \ + /* Enable critical section. We are critical if \ + shared->critical == r1 */\ + STL64(r1, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0); \ + \ + /* Save state */\ + PPC_STL r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH1)(0); \ + PPC_STL r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH2)(0); \ + mfcr r31;\ + stw r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH3)(0); + +#define SCRATCH_RESTORE \ + /* Restore state */ \ + PPC_LL r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH1)(0); \ + lwz r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH3)(0); \ + mtcr r30;\ + PPC_LL r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH2)(0); \ + \ + /* Disable critical section. We are critical if \ + shared->critical == r1 and r2 is always != r1 */ \ + STL64(r2, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0); -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
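On 32-bit, the `LL64`/`STL64` macros in this patch access `offs + 4` because the shared-page fields are 64-bit big-endian values, so the low word sits in the second half of the field. A quick endianness sketch (plain C, independent of the host's own byte order):

```c
#include <stdint.h>

/* Given the 8 bytes of a big-endian __u64, the 32-bit word at byte
 * offset 4 is its least-significant half -- which is why the 32-bit
 * LL64/STL64 variants add 4 to the field offset. */
static uint32_t be64_low_word(const uint8_t *p)
{
    return ((uint32_t)p[4] << 24) | ((uint32_t)p[5] << 16) |
           ((uint32_t)p[6] << 8) | (uint32_t)p[7];
}
```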
[PATCH 13/27] KVM: PPC: Magic Page Book3s support
We need to override EA as well as PA lookups for the magic page. When the guest tells us to project it, the magic page overrides any guest mappings. In order to reflect that, we need to hook into all the MMU layers of KVM to force map the magic page if necessary. Signed-off-by: Alexander Graf v1 -> v2: - RMO -> PAM --- arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/book3s_32_mmu.c | 16 arch/powerpc/kvm/book3s_32_mmu_host.c | 12 arch/powerpc/kvm/book3s_64_mmu.c | 30 +- arch/powerpc/kvm/book3s_64_mmu_host.c | 12 5 files changed, 76 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 14db032..b22e608 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -554,6 +554,13 @@ mmio: static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn) { + ulong mp_pa = vcpu->arch.magic_page_pa; + + if (unlikely(mp_pa) && + unlikely((mp_pa & KVM_RMO) >> PAGE_SHIFT == gfn)) { + return 1; + } + return kvm_is_visible_gfn(vcpu->kvm, gfn); } diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index 41130c8..5bf4bf8 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -281,8 +281,24 @@ static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data) { int r; + ulong mp_ea = vcpu->arch.magic_page_ea; pte->eaddr = eaddr; + + /* Magic page override */ + if (unlikely(mp_ea) && + unlikely((eaddr & ~0xfffULL) == (mp_ea & ~0xfffULL)) && + !(vcpu->arch.shared->msr & MSR_PR)) { + pte->vpage = kvmppc_mmu_book3s_32_ea_to_vp(vcpu, eaddr, data); + pte->raddr = vcpu->arch.magic_page_pa | (pte->raddr & 0xfff); + pte->raddr &= KVM_PAM; + pte->may_execute = true; + pte->may_read = true; + pte->may_write = true; + + return 0; + } + r = kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, pte, data); if (r < 0) r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, true); diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c 
b/arch/powerpc/kvm/book3s_32_mmu_host.c index 67b8c38..506d187 100644 --- a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -145,6 +145,16 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte) bool primary = false; bool evict = false; struct hpte_cache *pte; + ulong mp_pa = vcpu->arch.magic_page_pa; + + /* Magic page override */ + if (unlikely(mp_pa) && + unlikely((orig_pte->raddr & ~0xfffUL & KVM_PAM) == +(mp_pa & ~0xfffUL & KVM_PAM))) { + hpaddr = (pfn_t)virt_to_phys(vcpu->arch.shared); + get_page(pfn_to_page(hpaddr >> PAGE_SHIFT)); + goto mapped; + } /* Get host physical address for gpa */ hpaddr = gfn_to_pfn(vcpu->kvm, orig_pte->raddr >> PAGE_SHIFT); @@ -155,6 +165,8 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte) } hpaddr <<= PAGE_SHIFT; +mapped: + /* and write the mapping ea -> hpa into the pt */ vcpu->arch.mmu.esid_to_vsid(vcpu, orig_pte->eaddr >> SID_SHIFT, &vsid); map = find_sid_vsid(vcpu, vsid); diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c index 58aa840..d7889ef 100644 --- a/arch/powerpc/kvm/book3s_64_mmu.c +++ b/arch/powerpc/kvm/book3s_64_mmu.c @@ -163,6 +163,22 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, bool found = false; bool perm_err = false; int second = 0; + ulong mp_ea = vcpu->arch.magic_page_ea; + + /* Magic page override */ + if (unlikely(mp_ea) && + unlikely((eaddr & ~0xfffULL) == (mp_ea & ~0xfffULL)) && + !(vcpu->arch.shared->msr & MSR_PR)) { + gpte->eaddr = eaddr; + gpte->vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu, eaddr, data); + gpte->raddr = vcpu->arch.magic_page_pa | (gpte->raddr & 0xfff); + gpte->raddr &= KVM_PAM; + gpte->may_execute = true; + gpte->may_read = true; + gpte->may_write = true; + + return 0; + } slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, eaddr); if (!slbe) @@ -445,6 +461,7 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid, 
ulong ea = esid << SID_SHIFT; struct kvmppc_slb *slb; u64 gvsid = esid; + ulong mp_ea = vcpu->arch.magic_page_ea; if (vcpu->arch.shared->msr & (MSR_DR|MSR_IR)) {
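The EA-side override added to both xlate functions in this patch boils down to one predicate: the magic page is configured, the access falls in the same 4k page as the magic EA, and the guest is not in problem state. A condensed sketch (the MSR_PR value is an assumption, the usual 0x4000):

```c
#include <stdint.h>

#define MSR_PR 0x4000ULL /* problem state (user mode) */

/* Mirrors the "Magic page override" checks in the Book3s xlate code. */
static int ea_hits_magic(uint64_t eaddr, uint64_t mp_ea, uint64_t msr)
{
    return mp_ea &&
           (eaddr & ~0xfffULL) == (mp_ea & ~0xfffULL) &&
           !(msr & MSR_PR);
}
```

When the predicate holds, the translation is short-circuited to the magic page's PA with full r/w/x permissions, exactly as the hunks above do.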
[PATCH 10/27] KVM: PPC: Tell guest about pending interrupts
When the guest turns on interrupts again, it needs to know if we have an interrupt pending for it. Because if so, it should rather get out of guest context and get the interrupt. So we introduce a new field in the shared page that we use to tell the guest that there's a pending interrupt lying around. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/booke.c|7 +++ 3 files changed, 15 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 1f7dccd..82131fc 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -36,6 +36,7 @@ struct kvm_vcpu_arch_shared { __u64 dar; __u64 msr; __u32 dsisr; + __u32 int_pending; /* Tells the guest if we have an interrupt */ }; #define KVM_PVR_PARA 0x4b564d3f /* "KVM?" */ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index ab43744..66313a2 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -337,6 +337,7 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) { unsigned long *pending = &vcpu->arch.pending_exceptions; + unsigned long old_pending = vcpu->arch.pending_exceptions; unsigned int priority; #ifdef EXIT_DEBUG @@ -356,6 +357,12 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) BITS_PER_BYTE * sizeof(*pending), priority + 1); } + + /* Tell the guest about our interrupt status */ + if (*pending) + vcpu->arch.shared->int_pending = 1; + else if (old_pending) + vcpu->arch.shared->int_pending = 0; } void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index b9f8ecf..0f8ff9d 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -224,6 +224,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, void 
kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) { unsigned long *pending = &vcpu->arch.pending_exceptions; + unsigned long old_pending = vcpu->arch.pending_exceptions; unsigned int priority; priority = __ffs(*pending); @@ -235,6 +236,12 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) BITS_PER_BYTE * sizeof(*pending), priority + 1); } + + /* Tell the guest about our interrupt status */ + if (*pending) + vcpu->arch.shared->int_pending = 1; + else if (old_pending) + vcpu->arch.shared->int_pending = 0; } /** -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
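The two hunks are identical by design: `int_pending` should read 1 whenever any exception bit remains set, and be cleared only when the last pending exception was just delivered (`old_pending` nonzero, `*pending` now zero), so deliveries that leave nothing behind and never had anything pending don't touch the field. A compact restatement of that update rule:

```c
#include <stdint.h>

/* Next value of shared->int_pending given the current flag, the
 * remaining pending-exception bits, and their value before delivery. */
static uint32_t next_int_pending(uint32_t cur, unsigned long pending,
                                 unsigned long old_pending)
{
    if (pending)
        return 1;
    if (old_pending)
        return 0;
    return cur; /* nothing was pending before: leave the flag alone */
}
```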
[PATCH 08/27] KVM: PPC: Add PV guest critical sections
When running in hooked code we need a way to disable interrupts without clobbering any interrupts or exiting out to the hypervisor. To achieve this, we have an additional critical field in the shared page. If that field is equal to the r1 register of the guest, it tells the hypervisor that we're in such a critical section and thus may not receive any interrupts. Signed-off-by: Alexander Graf --- v1 -> v2: - make crit detection only trigger in supervisor mode --- arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c | 18 -- arch/powerpc/kvm/booke.c| 15 +++ 3 files changed, 32 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 89c2760..d9c06ab 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,7 @@ #include struct kvm_vcpu_arch_shared { + __u64 critical; /* Guest may not get interrupts if == r1 */ __u64 sprg0; __u64 sprg1; __u64 sprg2; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 10afa48..ab43744 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -251,14 +251,28 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) int deliver = 1; int vec = 0; ulong flags = 0ULL; + ulong crit_raw = vcpu->arch.shared->critical; + ulong crit_r1 = kvmppc_get_gpr(vcpu, 1); + bool crit; + + /* Truncate crit indicators in 32 bit mode */ + if (!(vcpu->arch.shared->msr & MSR_SF)) { + crit_raw &= 0x; + crit_r1 &= 0x; + } + + /* Critical section when crit == r1 */ + crit = (crit_raw == crit_r1); + /* ... 
and we're in supervisor mode */ + crit = crit && !(vcpu->arch.shared->msr & MSR_PR); switch (priority) { case BOOK3S_IRQPRIO_DECREMENTER: - deliver = vcpu->arch.shared->msr & MSR_EE; + deliver = (vcpu->arch.shared->msr & MSR_EE) && !crit; vec = BOOK3S_INTERRUPT_DECREMENTER; break; case BOOK3S_IRQPRIO_EXTERNAL: - deliver = vcpu->arch.shared->msr & MSR_EE; + deliver = (vcpu->arch.shared->msr & MSR_EE) && !crit; vec = BOOK3S_INTERRUPT_EXTERNAL; break; case BOOK3S_IRQPRIO_SYSTEM_RESET: diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index bd812f4..b9f8ecf 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -147,6 +147,20 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, int allowed = 0; ulong uninitialized_var(msr_mask); bool update_esr = false, update_dear = false; + ulong crit_raw = vcpu->arch.shared->critical; + ulong crit_r1 = kvmppc_get_gpr(vcpu, 1); + bool crit; + + /* Truncate crit indicators in 32 bit mode */ + if (!(vcpu->arch.shared->msr & MSR_SF)) { + crit_raw &= 0x; + crit_r1 &= 0x; + } + + /* Critical section when crit == r1 */ + crit = (crit_raw == crit_r1); + /* ... and we're in supervisor mode */ + crit = crit && !(vcpu->arch.shared->msr & MSR_PR); switch (priority) { case BOOKE_IRQPRIO_DTLB_MISS: @@ -181,6 +195,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, case BOOKE_IRQPRIO_DECREMENTER: case BOOKE_IRQPRIO_FIT: allowed = vcpu->arch.shared->msr & MSR_EE; + allowed = allowed && !crit; msr_mask = MSR_CE|MSR_ME|MSR_DE; break; case BOOKE_IRQPRIO_DEBUG: -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
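The crit computation added in both flavors above can be read as one pure function: compare `shared->critical` against r1 in the guest's current word size (truncating both in 32-bit mode), and only honor the section in supervisor mode. A sketch with assumed MSR bit values (MSR_SF = 1<<63, MSR_PR = 1<<14):

```c
#include <stdint.h>

#define MSR_SF 0x8000000000000000ULL /* 64-bit mode */
#define MSR_PR 0x4000ULL             /* problem state */

/* Is the guest inside a PV critical section? */
static int in_crit_section(uint64_t msr, uint64_t crit_raw, uint64_t r1)
{
    if (!(msr & MSR_SF)) { /* truncate crit indicators in 32 bit mode */
        crit_raw &= 0xffffffffULL;
        r1 &= 0xffffffffULL;
    }
    return crit_raw == r1 && !(msr & MSR_PR);
}
```

The truncation matters because a 32-bit guest only ever stores the low word of r1, while the shared-page field is 64 bits wide.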
[PATCH 11/27] KVM: PPC: Make RMO a define
On PowerPC it's very normal to not support all of the physical RAM in real mode. To check if we're matching on the shared page or not, we need to know the limits so we can restrain ourselves to that range. So let's make it a define instead of open-coding it. And while at it, let's also increase it. Signed-off-by: Alexander Graf v1 -> v2: - RMO -> PAM --- arch/powerpc/include/asm/kvm_host.h |3 +++ arch/powerpc/kvm/book3s.c |4 ++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 5674300..fdfb7f0 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -47,6 +47,9 @@ #define HPTEG_HASH_NUM_VPTE (1 << HPTEG_HASH_BITS_VPTE) #define HPTEG_HASH_NUM_VPTE_LONG (1 << HPTEG_HASH_BITS_VPTE_LONG) +/* Physical Address Mask - allowed range of real mode RAM access */ +#define KVM_PAM 0x0fffffffffffffffULL + struct kvm; struct kvm_run; struct kvm_vcpu; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 66313a2..14db032 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -465,7 +465,7 @@ static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data, r = vcpu->arch.mmu.xlate(vcpu, eaddr, pte, data); } else { pte->eaddr = eaddr; - pte->raddr = eaddr & 0xffffffff; + pte->raddr = eaddr & KVM_PAM; pte->vpage = VSID_REAL | eaddr >> 12; pte->may_read = true; pte->may_write = true; @@ -579,7 +579,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, pte.may_execute = true; pte.may_read = true; pte.may_write = true; - pte.raddr = eaddr & 0xffffffff; + pte.raddr = eaddr & KVM_PAM; pte.eaddr = eaddr; pte.vpage = eaddr >> 12; } -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
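With the mask in place, checking whether a guest frame number refers to the magic page is a matter of stripping the top bits that real mode can't address and comparing frame numbers, as the next patch does in `kvmppc_visible_gfn()`. A sketch using the constants from this patch:

```c
#include <stdint.h>

#define KVM_PAM 0x0fffffffffffffffULL /* from the patch */
#define PAGE_SHIFT 12

/* Does gfn address the magic page? mp_pa == 0 means "not configured". */
static int gfn_is_magic(uint64_t mp_pa, uint64_t gfn)
{
    return mp_pa && ((mp_pa & KVM_PAM) >> PAGE_SHIFT) == gfn;
}
```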
[PATCH 05/27] KVM: PPC: Convert SRR0 and SRR1 to shared page
The SRR0 and SRR1 registers contain cached values of the PC and MSR respectively. They get written to by the hypervisor when an interrupt occurs or directly by the kernel. They are also used to tell the rfi(d) instruction where to jump to. Because they only get touched on defined events, they're very simple to share with the guest. Hypervisor and guest both have full r/w access. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h |2 -- arch/powerpc/include/asm/kvm_para.h |2 ++ arch/powerpc/kvm/book3s.c | 12 ++-- arch/powerpc/kvm/book3s_emulate.c |4 ++-- arch/powerpc/kvm/booke.c| 15 --- arch/powerpc/kvm/booke_emulate.c|4 ++-- arch/powerpc/kvm/emulate.c | 12 7 files changed, 28 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 4502c0f..227f770 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -224,8 +224,6 @@ struct kvm_vcpu_arch { ulong sprg5; ulong sprg6; ulong sprg7; - ulong srr0; - ulong srr1; ulong csrr0; ulong csrr1; ulong dsrr0; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index ec72a1c..d7fc6c2 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,8 @@ #include struct kvm_vcpu_arch_shared { + __u64 srr0; + __u64 srr1; __u64 dar; __u64 msr; __u32 dsisr; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 29a3ed6..7cc3da6 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -162,8 +162,8 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr) void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags) { - vcpu->arch.srr0 = kvmppc_get_pc(vcpu); - vcpu->arch.srr1 = vcpu->arch.shared->msr | flags; + vcpu->arch.shared->srr0 = kvmppc_get_pc(vcpu); + vcpu->arch.shared->srr1 = vcpu->arch.shared->msr | flags; 
kvmppc_set_pc(vcpu, to_book3s(vcpu)->hior + vec); vcpu->arch.mmu.reset_msr(vcpu); } @@ -1059,8 +1059,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs->lr = kvmppc_get_lr(vcpu); regs->xer = kvmppc_get_xer(vcpu); regs->msr = vcpu->arch.shared->msr; - regs->srr0 = vcpu->arch.srr0; - regs->srr1 = vcpu->arch.srr1; + regs->srr0 = vcpu->arch.shared->srr0; + regs->srr1 = vcpu->arch.shared->srr1; regs->pid = vcpu->arch.pid; regs->sprg0 = vcpu->arch.sprg0; regs->sprg1 = vcpu->arch.sprg1; @@ -1086,8 +1086,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_lr(vcpu, regs->lr); kvmppc_set_xer(vcpu, regs->xer); kvmppc_set_msr(vcpu, regs->msr); - vcpu->arch.srr0 = regs->srr0; - vcpu->arch.srr1 = regs->srr1; + vcpu->arch.shared->srr0 = regs->srr0; + vcpu->arch.shared->srr1 = regs->srr1; vcpu->arch.sprg0 = regs->sprg0; vcpu->arch.sprg1 = regs->sprg1; vcpu->arch.sprg2 = regs->sprg2; diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index c147864..f333cb4 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -73,8 +73,8 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, switch (get_xop(inst)) { case OP_19_XOP_RFID: case OP_19_XOP_RFI: - kvmppc_set_pc(vcpu, vcpu->arch.srr0); - kvmppc_set_msr(vcpu, vcpu->arch.srr1); + kvmppc_set_pc(vcpu, vcpu->arch.shared->srr0); + kvmppc_set_msr(vcpu, vcpu->arch.shared->srr1); *advance = 0; break; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 5844bcf..8b546fe 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -64,7 +64,8 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu) printk("pc: %08lx msr: %08llx\n", vcpu->arch.pc, vcpu->arch.shared->msr); printk("lr: %08lx ctr: %08lx\n", vcpu->arch.lr, vcpu->arch.ctr); - printk("srr0: %08lx srr1: %08lx\n", vcpu->arch.srr0, vcpu->arch.srr1); + printk("srr0: %08llx srr1: %08llx\n", 
vcpu->arch.shared->srr0, + vcpu->arch.shared->srr1); printk("exceptions: %08lx\n", vcpu->arch.pending_exceptions); @@ -189,8 +190,8 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, } if (allowed) { - vcpu->arch.srr0 = vcpu->arch.pc; - vcpu->arch.srr1 = vcpu->arch.shared->msr; +
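After this conversion, interrupt injection on Book3s reads naturally from the shared page: save the PC into srr0, the MSR plus flags into srr1, then branch to hior + vec. A tiny behavioral model of `kvmppc_inject_interrupt()` as shown in the diff (the struct layout here is hypothetical, only the logic is copied):

```c
#include <stdint.h>

struct shared_page { uint64_t srr0, srr1, msr; };

/* Returns the new program counter after injecting vector 'vec'. */
static uint64_t inject_interrupt(struct shared_page *s, uint64_t pc,
                                 uint64_t flags, uint64_t hior,
                                 uint64_t vec)
{
    s->srr0 = pc;             /* where rfi(d) will return to */
    s->srr1 = s->msr | flags; /* saved MSR plus interrupt flags */
    return hior + vec;
}

/* Small self-check: inject a 0x300-style vector with hior == 0. */
static int inject_demo(void)
{
    struct shared_page s = { .msr = 0x8000 };
    uint64_t pc = inject_interrupt(&s, 0x700, 0x80000, 0, 0x300);
    return s.srr0 == 0x700 && s.srr1 == (0x8000ULL | 0x80000ULL) &&
           pc == 0x300;
}
```

The rfi(d) emulation is the exact inverse: PC from srr0, MSR from srr1, both now read from the shared page.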
[PATCH 09/27] KVM: PPC: Add PV guest scratch registers
While running in hooked code we need to store register contents out because we must not clobber any registers. So let's add some fields to the shared page we can just happily write to. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_para.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index d9c06ab..1f7dccd 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,9 @@ #include struct kvm_vcpu_arch_shared { + __u64 scratch1; + __u64 scratch2; + __u64 scratch3; __u64 critical; /* Guest may not get interrupts if == r1 */ __u64 sprg0; __u64 sprg1; -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 20/27] KVM: PPC: PV tlbsync to nop
With our current MMU scheme we don't need to know about the tlbsync instruction. So we can just nop it out. Signed-off-by: Alexander Graf --- v1 -> v2: - use kvm_patch_ins --- arch/powerpc/kernel/kvm.c | 12 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 7094ee4..3a49de5 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -61,6 +61,8 @@ #define KVM_INST_MTSPR_DAR 0x7c1303a6 #define KVM_INST_MTSPR_DSISR 0x7c1203a6 +#define KVM_INST_TLBSYNC 0x7c00046c + static bool kvm_patching_worked = true; static inline void kvm_patch_ins(u32 *inst, u32 new_inst) @@ -97,6 +99,11 @@ static void kvm_patch_ins_stw(u32 *inst, long addr, u32 rt) kvm_patch_ins(inst, KVM_INST_STW | rt | (addr & 0xfffc)); } +static void kvm_patch_ins_nop(u32 *inst) +{ + kvm_patch_ins(inst, KVM_INST_NOP); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -165,6 +172,11 @@ static void kvm_check_ins(u32 *inst) case KVM_INST_MTSPR_DSISR: kvm_patch_ins_stw(inst, magic_var(dsisr), inst_rt); break; + + /* Nops */ + case KVM_INST_TLBSYNC: + kvm_patch_ins_nop(inst); + break; } switch (_inst) { -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 18/27] KVM: PPC: KVM PV guest stubs
We will soon start to replace instructions from the text section with other, paravirtualized versions. To ease the readability of those patches I split out the generic looping and magic page mapping code. This patch still only contains stubs. But at least it loops through the text section :). Signed-off-by: Alexander Graf --- v1 -> v2: - kvm guest patch framework: introduce patch_ins --- arch/powerpc/kernel/kvm.c | 63 + 1 files changed, 63 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 2d8dd73..1f328d5 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -32,3 +32,66 @@ #define KVM_MAGIC_PAGE (-4096L) #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) +static bool kvm_patching_worked = true; + +static inline void kvm_patch_ins(u32 *inst, u32 new_inst) +{ + *inst = new_inst; + flush_icache_range((ulong)inst, (ulong)inst + 4); +} + +static void kvm_map_magic_page(void *data) +{ + kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, + KVM_MAGIC_PAGE, /* Physical Address */ + KVM_MAGIC_PAGE); /* Effective Address */ +} + +static void kvm_check_ins(u32 *inst) +{ + u32 _inst = *inst; + u32 inst_no_rt = _inst & ~KVM_MASK_RT; + u32 inst_rt = _inst & KVM_MASK_RT; + + switch (inst_no_rt) { + } + + switch (_inst) { + } +} + +static void kvm_use_magic_page(void) +{ + u32 *p; + u32 *start, *end; + + /* Tell the host to map the magic page to -4096 on all CPUs */ + + on_each_cpu(kvm_map_magic_page, NULL, 1); + + /* Now loop through all code and find instructions */ + + start = (void*)_stext; + end = (void*)_etext; + + for (p = start; p < end; p++) + kvm_check_ins(p); +} + +static int __init kvm_guest_init(void) +{ + char *p; + + if (!kvm_para_available()) + return 0; + + if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE)) + kvm_use_magic_page(); + + printk(KERN_INFO "KVM: Live patching for a fast VM %s\n", +kvm_patching_worked ? 
"worked" : "failed"); + + return 0; +} + +postcore_initcall(kvm_guest_init); -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
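The scan loop introduced by this stub patch can be modeled in userspace: walk every 32-bit word between the start and end of the text section and hand it to a checker that may patch it. The fake "text section" array and the demo checker below are stand-ins for the kernel image and kvm_check_ins().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the loop in kvm_use_magic_page(): p walks word by word
 * from _stext to _etext, and the checker decides what to patch. */
typedef int (*ins_check_fn)(uint32_t *inst);

static size_t kvm_scan_text(uint32_t *start, uint32_t *end, ins_check_fn check)
{
    size_t patched = 0;
    uint32_t *p;

    for (p = start; p < end; p++)
        patched += check(p) ? 1 : 0;
    return patched;
}

/* Demo checker: rewrite the made-up word 0xdeadbeef to 0. */
static int demo_check(uint32_t *inst)
{
    if (*inst == 0xdeadbeefu) {
        *inst = 0;
        return 1;
    }
    return 0;
}
```

The follow-up patches in the series fill the checker's two switch statements with real opcode cases.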
[PATCH 16/27] KVM: Move kvm_guest_init out of generic code
Currently x86 is the only architecture that uses kvm_guest_init(). With PowerPC we're getting a second user, but the signature is different there and we don't need to export it, as it uses the normal kernel init framework. So let's move the x86 specific definition of that function over to the x86 specific header file. Signed-off-by: Alexander Graf --- arch/x86/include/asm/kvm_para.h |6 ++ include/linux/kvm_para.h|5 - 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 05eba5e..7b562b6 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -158,6 +158,12 @@ static inline unsigned int kvm_arch_para_features(void) return cpuid_eax(KVM_CPUID_FEATURES); } +#ifdef CONFIG_KVM_GUEST +void __init kvm_guest_init(void); +#else +#define kvm_guest_init() do { } while (0) #endif +#endif /* __KERNEL__ */ + #endif /* _ASM_X86_KVM_PARA_H */ diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h index ac2015a..47a070b 100644 --- a/include/linux/kvm_para.h +++ b/include/linux/kvm_para.h @@ -26,11 +26,6 @@ #include #ifdef __KERNEL__ -#ifdef CONFIG_KVM_GUEST -void __init kvm_guest_init(void); -#else -#define kvm_guest_init() do { } while (0) -#endif static inline int kvm_para_has_feature(unsigned int feature) { -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 15/27] KVM: PPC: Expose magic page support to guest
Now that we have the shared page in place and the MMU code knows about the magic page, we can expose that capability to the guest! Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_para.h |2 ++ arch/powerpc/kvm/powerpc.c | 11 +++ 2 files changed, 13 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 82131fc..3cae15d 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -43,6 +43,8 @@ struct kvm_vcpu_arch_shared { #define KVM_SC_MAGIC_R00x4b564d52 /* "KVMR" */ #define KVM_SC_MAGIC_R30x554c455a /* "ULEZ" */ +#define KVM_FEATURE_MAGIC_PAGE 1 + #ifdef __KERNEL__ static inline int kvm_para_available(void) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 1ebb29e..0be119a 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -60,8 +60,19 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu) } switch (nr) { + case KVM_HC_PPC_MAP_MAGIC_PAGE: + { + vcpu->arch.magic_page_pa = param1; + vcpu->arch.magic_page_ea = param2; + + r = 0; + break; + } case KVM_HC_FEATURES: r = 0; +#if !defined(CONFIG_KVM_440) /* XXX missing bits on 440 */ + r |= (1 << KVM_FEATURE_MAGIC_PAGE); +#endif break; default: r = -KVM_ENOSYS; -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
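The negotiation this patch enables is a simple bitmap exchange: the host's reply to KVM_HC_FEATURES carries one bit per feature, and the guest tests bits with kvm_para_has_feature(). The sketch below mirrors that, with the 440 exclusion modeled as a flag; function names with a _mock suffix are stand-ins, not kernel symbols.

```c
#include <assert.h>

/* Feature bit from the patch. */
#define KVM_FEATURE_MAGIC_PAGE 1

/* Host side: build the KVM_HC_FEATURES reply bitmap. */
static long host_features(int is_440)
{
    long r = 0;

    if (!is_440)                /* XXX missing bits on 440, as in the patch */
        r |= (1 << KVM_FEATURE_MAGIC_PAGE);
    return r;
}

/* Guest side: test one bit of the reply. */
static int kvm_para_has_feature_mock(long features, unsigned int feature)
{
    return !!(features & (1 << feature));
}
```

So a Book3s or e500 guest sees the magic page advertised, while a 440 guest does not until the missing bits are implemented.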
[PATCH 04/27] KVM: PPC: Convert DAR to shared page.
The DAR register contains the address a data page fault occured at. This register behaves pretty much like a simple data storage register that gets written to on data faults. There is no hypervisor interaction required on read or write. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h |1 - arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c| 14 +++--- arch/powerpc/kvm/book3s_emulate.c|6 +++--- arch/powerpc/kvm/book3s_paired_singles.c |2 +- arch/powerpc/kvm/booke.c |2 +- arch/powerpc/kvm/booke_emulate.c |4 ++-- 7 files changed, 15 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index c7aee42..4502c0f 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -230,7 +230,6 @@ struct kvm_vcpu_arch { ulong csrr1; ulong dsrr0; ulong dsrr1; - ulong dear; ulong esr; u32 dec; u32 decar; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 9f7565b..ec72a1c 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,7 @@ #include struct kvm_vcpu_arch_shared { + __u64 dar; __u64 msr; __u32 dsisr; }; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 72917f8..29a3ed6 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -594,14 +594,14 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, if (page_found == -ENOENT) { /* Page not found in guest PTE entries */ - vcpu->arch.dear = kvmppc_get_fault_dar(vcpu); + vcpu->arch.shared->dar = kvmppc_get_fault_dar(vcpu); vcpu->arch.shared->dsisr = to_svcpu(vcpu)->fault_dsisr; vcpu->arch.shared->msr |= (to_svcpu(vcpu)->shadow_srr1 & 0xf800ULL); kvmppc_book3s_queue_irqprio(vcpu, vec); } else if (page_found == -EPERM) { /* Storage protection */ - vcpu->arch.dear = 
kvmppc_get_fault_dar(vcpu); + vcpu->arch.shared->dar = kvmppc_get_fault_dar(vcpu); vcpu->arch.shared->dsisr = to_svcpu(vcpu)->fault_dsisr & ~DSISR_NOHPTE; vcpu->arch.shared->dsisr |= DSISR_PROTFAULT; @@ -610,7 +610,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, kvmppc_book3s_queue_irqprio(vcpu, vec); } else if (page_found == -EINVAL) { /* Page not found in guest SLB */ - vcpu->arch.dear = kvmppc_get_fault_dar(vcpu); + vcpu->arch.shared->dar = kvmppc_get_fault_dar(vcpu); kvmppc_book3s_queue_irqprio(vcpu, vec + 0x80); } else if (!is_mmio && kvmppc_visible_gfn(vcpu, pte.raddr >> PAGE_SHIFT)) { @@ -867,17 +867,17 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, if (to_svcpu(vcpu)->fault_dsisr & DSISR_NOHPTE) { r = kvmppc_handle_pagefault(run, vcpu, dar, exit_nr); } else { - vcpu->arch.dear = dar; + vcpu->arch.shared->dar = dar; vcpu->arch.shared->dsisr = to_svcpu(vcpu)->fault_dsisr; kvmppc_book3s_queue_irqprio(vcpu, exit_nr); - kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFUL); + kvmppc_mmu_pte_flush(vcpu, dar, ~0xFFFUL); r = RESUME_GUEST; } break; } case BOOK3S_INTERRUPT_DATA_SEGMENT: if (kvmppc_mmu_map_segment(vcpu, kvmppc_get_fault_dar(vcpu)) < 0) { - vcpu->arch.dear = kvmppc_get_fault_dar(vcpu); + vcpu->arch.shared->dar = kvmppc_get_fault_dar(vcpu); kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DATA_SEGMENT); } @@ -997,7 +997,7 @@ program_interrupt: if (kvmppc_read_inst(vcpu) == EMULATE_DONE) { vcpu->arch.shared->dsisr = kvmppc_alignment_dsisr(vcpu, kvmppc_get_last_inst(vcpu)); - vcpu->arch.dear = kvmppc_alignment_dar(vcpu, + vcpu->arch.shared->dar = kvmppc_alignment_dar(vcpu, kvmppc_get_last_inst(vcpu)); kvmppc_book3s_queue_irqprio(vcpu, exit_nr); } diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 9982ff1..c147864 100644 --- a/arch/powerpc/kvm/book3s_emulate.c
[PATCH 12/27] KVM: PPC: First magic page steps
We will be introducing a method to project the shared page into guest context. As soon as we're talking about this coupling, the shared page is called the magic page. This patch introduces simple defines, so the follow-up patches are easier to read. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h |2 ++ include/linux/kvm_para.h|1 + 2 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index fdfb7f0..14be0f3 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -286,6 +286,8 @@ struct kvm_vcpu_arch { u64 dec_jiffies; unsigned long pending_exceptions; struct kvm_vcpu_arch_shared *shared; + unsigned long magic_page_pa; /* phys addr to map the magic page to */ + unsigned long magic_page_ea; /* effect. addr to map the magic page to */ #ifdef CONFIG_PPC_BOOK3S struct hlist_head hpte_hash_pte[HPTEG_HASH_NUM_PTE]; diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h index 3b8080e..ac2015a 100644 --- a/include/linux/kvm_para.h +++ b/include/linux/kvm_para.h @@ -18,6 +18,7 @@ #define KVM_HC_VAPIC_POLL_IRQ 1 #define KVM_HC_MMU_OP 2 #define KVM_HC_FEATURES3 +#define KVM_HC_PPC_MAP_MAGIC_PAGE 4 /* * hypercalls use architecture specific -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
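The addressing trick behind the magic page is worth spelling out: the page is mapped at effective address -4096, so guest code reaches any shared-page field at that base plus the field's offset in struct kvm_vcpu_arch_shared. The sketch below uses a mock struct layout (field order taken from the patches in this series) since the real header is not available in userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of kvm_vcpu_arch_shared; field order follows the series. */
struct kvm_vcpu_arch_shared_mock {
    uint64_t sprg0, sprg1, sprg2, sprg3;
    uint64_t srr0, srr1, dar, msr;
    uint32_t dsisr;
};

/* The magic_var() pattern: base address of the magic page plus the
 * field offset yields the effective address of the field. */
#define KVM_MAGIC_PAGE (-4096L)
#define magic_var(x) (KVM_MAGIC_PAGE + \
                      (long)offsetof(struct kvm_vcpu_arch_shared_mock, x))
```

With this, a patched load like "lwz rX, magic_var(dsisr)(0)" reads the register straight from memory instead of trapping into the hypervisor.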
[PATCH 07/27] KVM: PPC: Implement hypervisor interface
To communicate with KVM directly we need to plumb some sort of interface between the guest and KVM. Usually those interfaces use hypercalls. This hypercall implementation is described in the last patch of the series in a special documentation file. Please read that for further information. This patch implements stubs to handle KVM PPC hypercalls on the host and guest side alike. Signed-off-by: Alexander Graf --- v1 -> v2: - change hypervisor calls to use new register values --- arch/powerpc/include/asm/kvm_para.h | 100 ++- arch/powerpc/include/asm/kvm_ppc.h |1 + arch/powerpc/kvm/book3s.c | 10 +++- arch/powerpc/kvm/booke.c| 11 - arch/powerpc/kvm/emulate.c | 11 - arch/powerpc/kvm/powerpc.c | 28 ++ include/linux/kvm_para.h|1 + 7 files changed, 156 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index e402999..89c2760 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -34,16 +34,112 @@ struct kvm_vcpu_arch_shared { __u32 dsisr; }; +#define KVM_PVR_PARA 0x4b564d3f /* "KVM?" 
*/ +#define KVM_SC_MAGIC_R00x4b564d52 /* "KVMR" */ +#define KVM_SC_MAGIC_R30x554c455a /* "ULEZ" */ + #ifdef __KERNEL__ static inline int kvm_para_available(void) { - return 0; + unsigned long pvr = KVM_PVR_PARA; + + asm volatile("mfpvr %0" : "=r"(pvr) : "0"(pvr)); + return pvr == KVM_PVR_PARA; +} + +static inline long kvm_hypercall0(unsigned int nr) +{ + unsigned long register r0 asm("r0") = KVM_SC_MAGIC_R0; + unsigned long register r3 asm("r3") = KVM_SC_MAGIC_R3; + unsigned long register _nr asm("r4") = nr; + + asm volatile("sc" +: "=r"(r3) +: "r"(r0), "r"(r3), "r"(_nr) +: "memory"); + + return r3; } +static inline long kvm_hypercall1(unsigned int nr, unsigned long p1) +{ + unsigned long register r0 asm("r0") = KVM_SC_MAGIC_R0; + unsigned long register r3 asm("r3") = KVM_SC_MAGIC_R3; + unsigned long register _nr asm("r4") = nr; + unsigned long register _p1 asm("r5") = p1; + + asm volatile("sc" +: "=r"(r3) +: "r"(r0), "r"(r3), "r"(_nr), "r"(_p1) +: "memory"); + + return r3; +} + +static inline long kvm_hypercall2(unsigned int nr, unsigned long p1, + unsigned long p2) +{ + unsigned long register r0 asm("r0") = KVM_SC_MAGIC_R0; + unsigned long register r3 asm("r3") = KVM_SC_MAGIC_R3; + unsigned long register _nr asm("r4") = nr; + unsigned long register _p1 asm("r5") = p1; + unsigned long register _p2 asm("r6") = p2; + + asm volatile("sc" +: "=r"(r3) +: "r"(r0), "r"(r3), "r"(_nr), "r"(_p1), "r"(_p2) +: "memory"); + + return r3; +} + +static inline long kvm_hypercall3(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3) +{ + unsigned long register r0 asm("r0") = KVM_SC_MAGIC_R0; + unsigned long register r3 asm("r3") = KVM_SC_MAGIC_R3; + unsigned long register _nr asm("r4") = nr; + unsigned long register _p1 asm("r5") = p1; + unsigned long register _p2 asm("r6") = p2; + unsigned long register _p3 asm("r7") = p3; + + asm volatile("sc" +: "=r"(r3) +: "r"(r0), "r"(r3), "r"(_nr), "r"(_p1), "r"(_p2), "r"(_p3) +: "memory"); + + return r3; +} + +static 
inline long kvm_hypercall4(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3, + unsigned long p4) +{ + unsigned long register r0 asm("r0") = KVM_SC_MAGIC_R0; + unsigned long register r3 asm("r3") = KVM_SC_MAGIC_R3; + unsigned long register _nr asm("r4") = nr; + unsigned long register _p1 asm("r5") = p1; + unsigned long register _p2 asm("r6") = p2; + unsigned long register _p3 asm("r7") = p3; + unsigned long register _p4 asm("r8") = p4; + + asm volatile("sc" +: "=r"(r3) +: "r"(r0), "r"(r3), "r"(_nr), "r"(_p1), "r"(_p2), "r"(_p3), + "r"(_p4) +: "memory"); + + return r3; +} + + static inline unsigned int kvm_arch_para_features(void) { - return 0; + if (!kvm_para_available()) + return 0; + + return kvm_hypercall0(KVM_HC_FEATURES); } #endif /* __KERNEL__ */ diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 18d139e..ecb3bc7 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -10
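On the host side, the hypercall lands in kvmppc_kvm_pv(), which dispatches on the call number and returns the result in r3. A table-free mock of that dispatch, with a stand-in vcpu struct (KVM_ENOSYS is 1000 in include/linux/kvm_para.h; the feature constants match this series):

```c
#include <assert.h>

#define KVM_HC_FEATURES            3
#define KVM_HC_PPC_MAP_MAGIC_PAGE  4
#define KVM_FEATURE_MAGIC_PAGE     1
#define KVM_ENOSYS                 1000

/* Stand-in for the fields added to kvm_vcpu_arch. */
struct mock_vcpu { unsigned long magic_page_pa, magic_page_ea; };

static long kvm_pv(struct mock_vcpu *v, unsigned long nr,
                   unsigned long p1, unsigned long p2)
{
    switch (nr) {
    case KVM_HC_PPC_MAP_MAGIC_PAGE:
        v->magic_page_pa = p1;    /* physical address */
        v->magic_page_ea = p2;    /* effective address */
        return 0;
    case KVM_HC_FEATURES:
        return 1 << KVM_FEATURE_MAGIC_PAGE;
    default:
        return -KVM_ENOSYS;       /* unknown hypercall */
    }
}
```

The negative KVM_ENOSYS return is what the guest's kvm_hypercall*() wrappers see in r3 when it asks for something the host doesn't implement.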
[PATCH 03/27] KVM: PPC: Convert DSISR to shared page
The DSISR register contains information about a data page fault. It is fully read/write from inside the guest context and we don't need to worry about interacting based on writes of this register. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s.h|1 - arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c| 11 ++- arch/powerpc/kvm/book3s_emulate.c|6 +++--- arch/powerpc/kvm/book3s_paired_singles.c |2 +- 5 files changed, 11 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 8274a2d..b5b1961 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -85,7 +85,6 @@ struct kvmppc_vcpu_book3s { u64 hid[6]; u64 gqr[8]; int slb_nr; - u32 dsisr; u64 sdr1; u64 hior; u64 msr_mask; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index a17dc52..9f7565b 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -24,6 +24,7 @@ struct kvm_vcpu_arch_shared { __u64 msr; + __u32 dsisr; }; #ifdef __KERNEL__ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 38cca77..72917f8 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -595,15 +595,16 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, if (page_found == -ENOENT) { /* Page not found in guest PTE entries */ vcpu->arch.dear = kvmppc_get_fault_dar(vcpu); - to_book3s(vcpu)->dsisr = to_svcpu(vcpu)->fault_dsisr; + vcpu->arch.shared->dsisr = to_svcpu(vcpu)->fault_dsisr; vcpu->arch.shared->msr |= (to_svcpu(vcpu)->shadow_srr1 & 0xf800ULL); kvmppc_book3s_queue_irqprio(vcpu, vec); } else if (page_found == -EPERM) { /* Storage protection */ vcpu->arch.dear = kvmppc_get_fault_dar(vcpu); - to_book3s(vcpu)->dsisr = to_svcpu(vcpu)->fault_dsisr & ~DSISR_NOHPTE; - to_book3s(vcpu)->dsisr |= 
DSISR_PROTFAULT; + vcpu->arch.shared->dsisr = + to_svcpu(vcpu)->fault_dsisr & ~DSISR_NOHPTE; + vcpu->arch.shared->dsisr |= DSISR_PROTFAULT; vcpu->arch.shared->msr |= (to_svcpu(vcpu)->shadow_srr1 & 0xf800ULL); kvmppc_book3s_queue_irqprio(vcpu, vec); @@ -867,7 +868,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, r = kvmppc_handle_pagefault(run, vcpu, dar, exit_nr); } else { vcpu->arch.dear = dar; - to_book3s(vcpu)->dsisr = to_svcpu(vcpu)->fault_dsisr; + vcpu->arch.shared->dsisr = to_svcpu(vcpu)->fault_dsisr; kvmppc_book3s_queue_irqprio(vcpu, exit_nr); kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFUL); r = RESUME_GUEST; @@ -994,7 +995,7 @@ program_interrupt: } case BOOK3S_INTERRUPT_ALIGNMENT: if (kvmppc_read_inst(vcpu) == EMULATE_DONE) { - to_book3s(vcpu)->dsisr = kvmppc_alignment_dsisr(vcpu, + vcpu->arch.shared->dsisr = kvmppc_alignment_dsisr(vcpu, kvmppc_get_last_inst(vcpu)); vcpu->arch.dear = kvmppc_alignment_dar(vcpu, kvmppc_get_last_inst(vcpu)); diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 35d3c16..9982ff1 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -221,7 +221,7 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, else if (r == -EPERM) dsisr |= DSISR_PROTFAULT; - to_book3s(vcpu)->dsisr = dsisr; + vcpu->arch.shared->dsisr = dsisr; to_svcpu(vcpu)->fault_dsisr = dsisr; kvmppc_book3s_queue_irqprio(vcpu, @@ -327,7 +327,7 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs) to_book3s(vcpu)->sdr1 = spr_val; break; case SPRN_DSISR: - to_book3s(vcpu)->dsisr = spr_val; + vcpu->arch.shared->dsisr = spr_val; break; case SPRN_DAR: vcpu->arch.dear = spr_val; @@ -440,7 +440,7 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt) kvmppc_set_gpr(vcpu, rt, to_book3s(v
[PATCH 06/27] KVM: PPC: Convert SPRG[0-4] to shared page
When in kernel mode there are 4 additional registers available that are simple data storage. Instead of exiting to the hypervisor to read and write those, we can just share them with the guest using the page. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h |4 arch/powerpc/include/asm/kvm_para.h |4 arch/powerpc/kvm/book3s.c | 16 arch/powerpc/kvm/booke.c| 16 arch/powerpc/kvm/emulate.c | 24 5 files changed, 36 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 227f770..5674300 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -216,10 +216,6 @@ struct kvm_vcpu_arch { ulong guest_owned_ext; #endif u32 mmucr; - ulong sprg0; - ulong sprg1; - ulong sprg2; - ulong sprg3; ulong sprg4; ulong sprg5; ulong sprg6; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index d7fc6c2..e402999 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,10 @@ #include struct kvm_vcpu_arch_shared { + __u64 sprg0; + __u64 sprg1; + __u64 sprg2; + __u64 sprg3; __u64 srr0; __u64 srr1; __u64 dar; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 7cc3da6..0a56e8d 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1062,10 +1062,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs->srr0 = vcpu->arch.shared->srr0; regs->srr1 = vcpu->arch.shared->srr1; regs->pid = vcpu->arch.pid; - regs->sprg0 = vcpu->arch.sprg0; - regs->sprg1 = vcpu->arch.sprg1; - regs->sprg2 = vcpu->arch.sprg2; - regs->sprg3 = vcpu->arch.sprg3; + regs->sprg0 = vcpu->arch.shared->sprg0; + regs->sprg1 = vcpu->arch.shared->sprg1; + regs->sprg2 = vcpu->arch.shared->sprg2; + regs->sprg3 = vcpu->arch.shared->sprg3; regs->sprg5 = vcpu->arch.sprg4; regs->sprg6 = 
vcpu->arch.sprg5; regs->sprg7 = vcpu->arch.sprg6; @@ -1088,10 +1088,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_msr(vcpu, regs->msr); vcpu->arch.shared->srr0 = regs->srr0; vcpu->arch.shared->srr1 = regs->srr1; - vcpu->arch.sprg0 = regs->sprg0; - vcpu->arch.sprg1 = regs->sprg1; - vcpu->arch.sprg2 = regs->sprg2; - vcpu->arch.sprg3 = regs->sprg3; + vcpu->arch.shared->sprg0 = regs->sprg0; + vcpu->arch.shared->sprg1 = regs->sprg1; + vcpu->arch.shared->sprg2 = regs->sprg2; + vcpu->arch.shared->sprg3 = regs->sprg3; vcpu->arch.sprg5 = regs->sprg4; vcpu->arch.sprg6 = regs->sprg5; vcpu->arch.sprg7 = regs->sprg6; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 8b546fe..984c461 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -495,10 +495,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs->srr0 = vcpu->arch.shared->srr0; regs->srr1 = vcpu->arch.shared->srr1; regs->pid = vcpu->arch.pid; - regs->sprg0 = vcpu->arch.sprg0; - regs->sprg1 = vcpu->arch.sprg1; - regs->sprg2 = vcpu->arch.sprg2; - regs->sprg3 = vcpu->arch.sprg3; + regs->sprg0 = vcpu->arch.shared->sprg0; + regs->sprg1 = vcpu->arch.shared->sprg1; + regs->sprg2 = vcpu->arch.shared->sprg2; + regs->sprg3 = vcpu->arch.shared->sprg3; regs->sprg5 = vcpu->arch.sprg4; regs->sprg6 = vcpu->arch.sprg5; regs->sprg7 = vcpu->arch.sprg6; @@ -521,10 +521,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_msr(vcpu, regs->msr); vcpu->arch.shared->srr0 = regs->srr0; vcpu->arch.shared->srr1 = regs->srr1; - vcpu->arch.sprg0 = regs->sprg0; - vcpu->arch.sprg1 = regs->sprg1; - vcpu->arch.sprg2 = regs->sprg2; - vcpu->arch.sprg3 = regs->sprg3; + vcpu->arch.shared->sprg0 = regs->sprg0; + vcpu->arch.shared->sprg1 = regs->sprg1; + vcpu->arch.shared->sprg2 = regs->sprg2; + vcpu->arch.shared->sprg3 = regs->sprg3; vcpu->arch.sprg5 = regs->sprg4; vcpu->arch.sprg6 = 
regs->sprg5; vcpu->arch.sprg7 = regs->sprg6; diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c index ad0fa4f..454869b 100644 --- a/arch/powerpc/kvm/emulate.c +++ b/arch/powerpc/kvm/emulate.c @@ -263,13 +263,17 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) kvmppc_set_gpr(vcpu, rt, get_tb()); break;
[PATCH 01/27] KVM: PPC: Introduce shared page
For transparent variable sharing between the hypervisor and guest, I introduce a shared page. This shared page will contain all the registers the guest can read and write safely without exiting guest context. This patch only implements the stubs required for the basic structure of the shared page. The actual register moving follows. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h |2 ++ arch/powerpc/include/asm/kvm_para.h |5 + arch/powerpc/kernel/asm-offsets.c |1 + arch/powerpc/kvm/44x.c |7 +++ arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/e500.c |7 +++ 6 files changed, 29 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index e004eaf..246a3dd 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -25,6 +25,7 @@ #include #include #include +#include #include #define KVM_MAX_VCPUS 1 @@ -289,6 +290,7 @@ struct kvm_vcpu_arch { struct tasklet_struct tasklet; u64 dec_jiffies; unsigned long pending_exceptions; + struct kvm_vcpu_arch_shared *shared; #ifdef CONFIG_PPC_BOOK3S struct hlist_head hpte_hash_pte[HPTEG_HASH_NUM_PTE]; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 2d48f6a..1485ba8 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -20,6 +20,11 @@ #ifndef __POWERPC_KVM_PARA_H__ #define __POWERPC_KVM_PARA_H__ +#include + +struct kvm_vcpu_arch_shared { +}; + #ifdef __KERNEL__ static inline int kvm_para_available(void) diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 496cc5b..944f593 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -400,6 +400,7 @@ int main(void) DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6)); DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7)); DEFINE(VCPU_SHADOW_PID, offsetof(struct kvm_vcpu, arch.shadow_pid)); + DEFINE(VCPU_SHARED, 
offsetof(struct kvm_vcpu, arch.shared)); /* book3s */ #ifdef CONFIG_PPC_BOOK3S diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c index 73c0a3f..e7b1f3f 100644 --- a/arch/powerpc/kvm/44x.c +++ b/arch/powerpc/kvm/44x.c @@ -123,8 +123,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto free_vcpu; + vcpu->arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu->arch.shared) + goto uninit_vcpu; + return vcpu; +uninit_vcpu: + kvm_vcpu_uninit(vcpu); free_vcpu: kmem_cache_free(kvm_vcpu_cache, vcpu_44x); out: @@ -135,6 +141,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu); + free_page((unsigned long)vcpu->arch.shared); kvm_vcpu_uninit(vcpu); kmem_cache_free(kvm_vcpu_cache, vcpu_44x); } diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 30c0bd5..2c2c3ca 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1247,6 +1247,10 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto free_shadow_vcpu; + vcpu->arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu->arch.shared) + goto uninit_vcpu; + vcpu->arch.host_retip = kvm_return_point; vcpu->arch.host_msr = mfmsr(); #ifdef CONFIG_PPC_BOOK3S_64 @@ -1277,6 +1281,8 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) return vcpu; +uninit_vcpu: + kvm_vcpu_uninit(vcpu); free_shadow_vcpu: kfree(vcpu_book3s->shadow_vcpu); free_vcpu: @@ -1289,6 +1295,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu); + free_page((unsigned long)vcpu->arch.shared); kvm_vcpu_uninit(vcpu); kfree(vcpu_book3s->shadow_vcpu); vfree(vcpu_book3s); diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c index e8a00b0..71750f2 100644 --- a/arch/powerpc/kvm/e500.c +++ b/arch/powerpc/kvm/e500.c @@ -117,8 +117,14 @@ struct kvm_vcpu 
*kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto uninit_vcpu; + vcpu->arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu->arch.shared) + goto uninit_tlb; + return vcpu; +uninit_tlb: + kvmppc_e500_tlb_uninit(vcpu_e500); uninit_vcpu: kvm_vcpu_uninit(vcpu); free_vcpu: @@ -131,6 +137,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu); + free_page((unsigned long)vcpu->arch.shared);
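The error handling in these three vcpu_create() hunks follows the usual kernel goto-unwind pattern: each successful setup step gets a matching label so a later failure unwinds in reverse order. A userspace sketch, with calloc standing in for __get_free_page(GFP_KERNEL | __GFP_ZERO) and a flag standing in for kvm_vcpu_init()/kvm_vcpu_uninit():

```c
#include <assert.h>
#include <stdlib.h>

struct mock_vcpu2 { void *shared; int inited; };

/* fail_shared forces the allocation to fail so the unwind path runs. */
static int vcpu_create(struct mock_vcpu2 *v, int fail_shared)
{
    v->inited = 1;                          /* kvm_vcpu_init() succeeded */

    v->shared = fail_shared ? NULL : calloc(1, 4096);
    if (!v->shared)
        goto uninit_vcpu;

    return 0;

uninit_vcpu:
    v->inited = 0;                          /* kvm_vcpu_uninit() */
    return -1;
}
```

That is also why the patch adds the new uninit_vcpu / uninit_tlb labels on 44x, Book3s, and e500: the shared-page allocation is the last step, so its failure must undo everything before it.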
[PATCH 00/27] KVM PPC PV framework
On PPC we run PR=0 (kernel mode) code in PR=1 (user mode) and don't use the hypervisor extensions. While that is all great to show that virtualization is possible, there are quite some cases where the emulation overhead of privileged instructions is killing performance. This patchset tackles exactly that issue. It introduces a paravirtual framework using which KVM and Linux share a page to exchange register state with. That way we don't have to switch to the hypervisor just to change a value of a privileged register. To prove my point, I ran the same test I did for the MMU optimizations against the PV framework. Here are the results: [without] debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done real0m14.659s user0m8.967s sys 0m5.688s [with] debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done real0m7.557s user0m4.121s sys 0m3.426s So this is a significant performance improvement! I'm quite happy how fast this whole thing becomes :) I tried to take all comments I've heard from people so far about such a PV framework into account. In case you told me something before that is a no-go and I still did it, please just tell me again. Now go and have fun with fast VMs on PPC! Get yourself a G5 on ebay and start experiencing the power yourself. - heh v1 -> v2: - change hypervisor calls to use r0 and r3 - make crit detection only trigger in supervisor mode - RMO -> PAM - introduce kvm_patch_ins - only flush icache when patching - introduce kvm_patch_ins_b - update documentation Alexander Graf (27): KVM: PPC: Introduce shared page KVM: PPC: Convert MSR to shared page KVM: PPC: Convert DSISR to shared page KVM: PPC: Convert DAR to shared page. 
KVM: PPC: Convert SRR0 and SRR1 to shared page KVM: PPC: Convert SPRG[0-4] to shared page KVM: PPC: Implement hypervisor interface KVM: PPC: Add PV guest critical sections KVM: PPC: Add PV guest scratch registers KVM: PPC: Tell guest about pending interrupts KVM: PPC: Make RMO a define KVM: PPC: First magic page steps KVM: PPC: Magic Page Book3s support KVM: PPC: Magic Page BookE support KVM: PPC: Expose magic page support to guest KVM: Move kvm_guest_init out of generic code KVM: PPC: Generic KVM PV guest support KVM: PPC: KVM PV guest stubs KVM: PPC: PV instructions to loads and stores KVM: PPC: PV tlbsync to nop KVM: PPC: Introduce kvm_tmp framework KVM: PPC: Introduce branch patching helper KVM: PPC: PV assembler helpers KVM: PPC: PV mtmsrd L=1 KVM: PPC: PV mtmsrd L=0 and mtmsr KVM: PPC: PV wrteei KVM: PPC: Add Documentation about PV interface Documentation/kvm/ppc-pv.txt | 185 ++ arch/powerpc/include/asm/kvm_book3s.h|1 - arch/powerpc/include/asm/kvm_host.h | 15 +- arch/powerpc/include/asm/kvm_para.h | 121 +- arch/powerpc/include/asm/kvm_ppc.h |1 + arch/powerpc/kernel/Makefile |2 + arch/powerpc/kernel/asm-offsets.c| 18 ++- arch/powerpc/kernel/kvm.c| 408 ++ arch/powerpc/kernel/kvm_emul.S | 237 + arch/powerpc/kvm/44x.c |7 + arch/powerpc/kvm/44x_tlb.c |8 +- arch/powerpc/kvm/book3s.c| 165 - arch/powerpc/kvm/book3s_32_mmu.c | 28 ++- arch/powerpc/kvm/book3s_32_mmu_host.c| 16 +- arch/powerpc/kvm/book3s_64_mmu.c | 42 +++- arch/powerpc/kvm/book3s_64_mmu_host.c| 16 +- arch/powerpc/kvm/book3s_emulate.c| 25 +- arch/powerpc/kvm/book3s_paired_singles.c | 11 +- arch/powerpc/kvm/booke.c | 113 +++-- arch/powerpc/kvm/booke.h |6 +- arch/powerpc/kvm/booke_emulate.c | 14 +- arch/powerpc/kvm/booke_interrupts.S |3 +- arch/powerpc/kvm/e500.c |7 + arch/powerpc/kvm/e500_tlb.c | 31 ++- arch/powerpc/kvm/e500_tlb.h |2 +- arch/powerpc/kvm/emulate.c | 47 +++- arch/powerpc/kvm/powerpc.c | 42 +++- arch/powerpc/platforms/Kconfig | 10 + arch/x86/include/asm/kvm_para.h |6 + 
include/linux/kvm_para.h |7 +- 30 files changed, 1420 insertions(+), 174 deletions(-) create mode 100644 Documentation/kvm/ppc-pv.txt create mode 100644 arch/powerpc/kernel/kvm.c create mode 100644 arch/powerpc/kernel/kvm_emul.S ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/2] Faster MMU lookups for Book3s v3
Avi Kivity wrote: > On 07/01/2010 11:18 AM, Alexander Graf wrote: >> >> How does dirty bitmap flushing work on x86 atm? I loop through all >> mapped pages and flush the ones that match the range of the region I >> need to flush. But wouldn't it be a lot more efficient to have an >> hlist in the memslot and loop through that when I need to flush that >> memslot? >> > > x86 loops through the reverse-map link list rooted at the memory > slot. The linked list links all sptes for a single hva. > > So, it's like you describe, except it's an array of lists instead of a > single list. We need per-page rmap lists to be able to remove a > page's sptes in response to an mmu notifier callback, and to be able > to write protect a guest page if it's used as a page table. > But doesn't that mean that you still need to loop through all the hvas that you want to invalidate? Wouldn't it speed up dirty bitmap flushing a lot if we'd just have a simple linked list of all sPTEs belonging to that memslot? Alex ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/9] Add Synopsys DesignWare HS USB OTG Controller driver.
Fushen, On Wednesday 30 June 2010 22:16:52 fushen chen wrote: > The driver is based on Synopsys driver 2.60a. OK. > We started to prepare open source submission based on our internal > version. We sync this version to linux-2.6-denx repository from time to > time. I'll sync the driver to the latest linux-2.6-denx as Wolfgang > pointed out, and re-submit patch to open source. Thanks. I really appreciate this. A lot of effort has gone into this driver to fix some very troubling issues. And this driver version has undergone very intensive testing. So please make sure to integrate the changes/fixes. And please add myself and Chuck Meade , who did most of the bigger changes, to Cc on your new patch versions. Thanks. Cheers, Stefan -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: off...@denx.de ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/2] Faster MMU lookups for Book3s v3
On 07/01/2010 11:18 AM, Alexander Graf wrote: How does dirty bitmap flushing work on x86 atm? I loop through all mapped pages and flush the ones that match the range of the region I need to flush. But wouldn't it be a lot more efficient to have an hlist in the memslot and loop through that when I need to flush that memslot? x86 loops through the reverse-map link list rooted at the memory slot. The linked list links all sptes for a single hva. So, it's like you describe, except it's an array of lists instead of a single list. We need per-page rmap lists to be able to remove a page's sptes in response to an mmu notifier callback, and to be able to write protect a guest page if it's used as a page table. -- error compiling committee.c: too many arguments to function ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
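The structure Avi describes, one rmap list head per guest page rooted in the memslot, can be sketched minimally: adding an spte links it onto its page's chain, and per-page operations walk only that chain. Sizes and types here are illustrative, not the x86 KVM implementation.

```c
#include <assert.h>

#define SLOT_PAGES 4   /* made-up slot size for the sketch */

struct spte { struct spte *next; unsigned long val; };

/* One list head per page in the slot, indexed by page offset (gfn). */
struct memslot { struct spte *rmap[SLOT_PAGES]; };

static void rmap_add(struct memslot *s, int gfn, struct spte *e)
{
    e->next = s->rmap[gfn];
    s->rmap[gfn] = e;
}

static int rmap_count(struct memslot *s, int gfn)
{
    int n = 0;
    struct spte *e;

    for (e = s->rmap[gfn]; e; e = e->next)
        n++;
    return n;
}
```

Flushing the whole slot walks every list; the per-page granularity is what the mmu-notifier and page-table write-protection cases need, which a single per-slot list could not give.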
Re: [PATCH 0/2] Faster MMU lookups for Book3s v3
On 01.07.2010, at 09:29, Avi Kivity wrote:

> On 06/30/2010 04:18 PM, Alexander Graf wrote:
>> Book3s suffered from my really bad shadow MMU implementation so far. So
>> I finally got around to implement a combined hash and list mechanism that
>> allows for much faster lookup of mapped pages.
>>
>> To show that it really is faster, I tried to run simple process spawning
>> code inside the guest with and without these patches:
>>
>> [without]
>>
>> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done
>>
>> real    0m20.235s
>> user    0m10.418s
>> sys     0m9.766s
>>
>> [with]
>>
>> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done
>>
>> real    0m14.659s
>> user    0m8.967s
>> sys     0m5.688s
>>
>> So as you can see, performance improved significantly.
>>
>> v2 -> v3:
>>
>>   - use hlist
>>   - use global slab cache
>
> Looks good.

Great :).

How does dirty bitmap flushing work on x86 atm? I loop through all mapped
pages and flush the ones that match the range of the region I need to
flush. But wouldn't it be a lot more efficient to have an hlist in the
memslot and loop through that when I need to flush that memslot?

Alex
Re: machine check in kernel for a mpc870 board
Hi Scott,

> How do I find the address, reg, and range for nodes like localbus,
> soc, eth0, cpm, serial etc.? Do the addresses of localbus and soc
> relate to IMMR? So my localbus and soc should be as follows?
>
> local...@fa200100 {
>         compatible = "fsl,mpc885-localbus", "fsl,pq1-localbus",
>                      "simple-bus";
>         #address-cells = <2>;
>         #size-cells = <1>;
>         reg = <0xfa200100 0x40>;
>
>         ranges = <
>                 0 0 0xfe00 0x0100    // I'm not sure about this?
>         >;
> };

I managed to proceed a little bit further.

Memory <- <0x0 0x800> (128MB)
ENET0: local-mac-address <- 00:09:9b:01:58:64
CPU clock-frequency <- 0x7270e00 (120MHz)
CPU timebase-frequency <- 0x393870 (4MHz)
CPU bus-frequency <- 0x3938700 (60MHz)
zImage starting: loaded at 0x0040 (sp: 0x07d1ccd0)
Allocating 0x186bdd bytes for kernel ...
gunzipping (0x <- 0x0040c000:0x00591c30)...done 0x173b18 bytes
Linux/PowerPC load: root=/dev/ram
Finalizing device tree... flat tree at 0x59e300

The gdb session showed deadbeef:

(gdb) target remote ppcbdi:2001
Remote debugging using ppcbdi:2001
0xdeadbeef in ?? ()
(gdb)

The kernel doesn't seem to start. What could go wrong here?

Thanks a lot,
-Shawn.
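[Editor's note: for comparison, here is a minimal sketch of a pq1-style localbus node, assuming the fsl,pq1-localbus binding in which a child address is <chip-select offset> and each ranges entry is <chip-select offset parent-physical-address size>. Every address below is a placeholder for illustration, not a value for Shawn's board — those must come from the board's BRx/ORx chip-select programming.]

```dts
localbus@fa200100 {
	compatible = "fsl,mpc885-localbus", "fsl,pq1-localbus",
	             "simple-bus";
	#address-cells = <2>;    /* child address = <chip-select offset> */
	#size-cells = <1>;
	reg = <0xfa200100 0x40>; /* BRx/ORx registers, IMMR-relative */

	/* <cs offset parent-phys size>: a 16MB CS0 window; the parent
	 * address and size are placeholders and must match what the
	 * firmware programmed into BR0/OR0. */
	ranges = <0x0 0x0 0xfe000000 0x01000000>;
};
```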
Re: [PATCH 0/2] Faster MMU lookups for Book3s v3
On 06/30/2010 04:18 PM, Alexander Graf wrote:
> Book3s suffered from my really bad shadow MMU implementation so far. So
> I finally got around to implement a combined hash and list mechanism that
> allows for much faster lookup of mapped pages.
>
> To show that it really is faster, I tried to run simple process spawning
> code inside the guest with and without these patches:
>
> [without]
>
> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done
>
> real    0m20.235s
> user    0m10.418s
> sys     0m9.766s
>
> [with]
>
> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done
>
> real    0m14.659s
> user    0m8.967s
> sys     0m5.688s
>
> So as you can see, performance improved significantly.
>
> v2 -> v3:
>
>   - use hlist
>   - use global slab cache

Looks good.

--
I have a truly marvellous patch that fixes the bug which this signature
is too narrow to contain.