Re: [Patch v2] kexec: increase max of kexec segments and use dynamic allocation
On 07/27/10 18:00, Milton Miller wrote:

[ Added kexec at lists.infradead.org and linuxppc-dev@lists.ozlabs.org ]

Currently KEXEC_SEGMENT_MAX is only 16, which is too small for a machine with many memory ranges. When hibernating on a machine with disjoint memory we need one segment for each memory region. Increase this hard limit to 16K, which is reasonably large, and change ->segment from a static array to dynamically allocated memory.

Cc: Neil Horman nhor...@redhat.com
Cc: huang ying huang.ying.cari...@gmail.com
Cc: Eric W. Biederman ebied...@xmission.com
Signed-off-by: WANG Cong amw...@redhat.com
---
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index ed31a29..f115585 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -131,10 +131,7 @@ static void copy_segments(unsigned long ind)
 void kexec_copy_flush(struct kimage *image)
 {
 	long i, nr_segments = image->nr_segments;
-	struct kexec_segment ranges[KEXEC_SEGMENT_MAX];
-
-	/* save the ranges on the stack to efficiently flush the icache */
-	memcpy(ranges, image->segment, sizeof(ranges));
+	struct kexec_segment range;

I'm glad you found our copy on the stack and removed the stack overflow that comes with this bump, but ...

 	/*
 	 * After this call we may not use anything allocated in dynamic

@@ -148,9 +145,11 @@ void kexec_copy_flush(struct kimage *image)
 	 * we need to clear the icache for all dest pages sometime,
 	 * including ones that were in place on the original copy
 	 */
-	for (i = 0; i < nr_segments; i++)
-		flush_icache_range((unsigned long)__va(ranges[i].mem),
-			(unsigned long)__va(ranges[i].mem + ranges[i].memsz));
+	for (i = 0; i < nr_segments; i++) {
+		memcpy(&range, &image->segment[i], sizeof(range));
+		flush_icache_range((unsigned long)__va(range.mem),
+			(unsigned long)__va(range.mem + range.memsz));
+	}
 }

This is executed after the copy, so as it says, we may not use anything allocated in dynamic memory.
We could allocate control pages to copy the segment list into. Actually ppc64 doesn't use the existing control page, but that is only 4kB today. We need the list to icache-flush all the pages in all the segments, as the indirect list doesn't include pages that were allocated at their destination. Or maybe the icache flush should be done in the generic code, like it does for crash load segments?

I don't get the point here; according to the comments, it is copied onto the stack for efficiency.

--
The opposite of love is not hate, it's indifference. - Elie Wiesel
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V5] powerpc/mpc512x: Add gpio driver
On Wed, Jul 7, 2010 at 5:28 AM, Peter Korsgaard jac...@sunsite.dk wrote:

> Anatolij == Anatolij Gustschin ag...@denx.de writes:
>
> Hi,
>
> Old mail, I know ..
>
> Anatolij> From: Matthias Fuchs matthias.fu...@esd.eu
> Anatolij> This patch adds a gpio driver for MPC512X PowerPCs.
> Anatolij> It has been tested on our CAN-CBX-CPU5201 module that
> Anatolij> uses a MPC5121 CPU. This platform comes with a couple of
> Anatolij> LEDs and configuration switches that have been used for testing.
> Anatolij> After change to the of-gpio api the reworked driver has been
> Anatolij> tested on pdm360ng board with some configuration switches.
>
> This looks very similar to the existing arch/powerpc/sysdev/mpc8xxx_gpio.c - couldn't we just add 5121 support there instead?
>
> Anatolij> +struct mpc512x_gpio_regs {
> Anatolij> +	u32 gpdir;
> Anatolij> +	u32 gpodr;
> Anatolij> +	u32 gpdat;
> Anatolij> +	u32 gpier;
> Anatolij> +	u32 gpimr;
> Anatolij> +	u32 gpicr1;
> Anatolij> +	u32 gpicr2;
> Anatolij> +};

Hi Anatolij,

Peter's right, the register map looks the same, except for the additional gpicr1 & 2 registers in the 512x version. Can the 512x gpios be supported by the 8xxx gpio driver?

g.
Re: i meet some surprising things,when i modify the dts file
On Thu, Jul 29, 2010 at 12:08 AM, hacklu embedway.t...@gmail.com wrote:

> local...@f0010100 {
> 	ranges = 0 0 FC00 100
> 		 2 0 FA00 100
> 		 1 0 7000 100 ;
> 	fl...@0,0 { . }
> 	fl...@2,0 { }
> 	board-cont...@1,0 { . }
> }
>
> This is part of my dts file. I don't know what each field means in the ranges property. For instance, 2 0 FA00 100. I only know this: 2 means the chip select. What is the 0? FA00 means the start address. 100 means the size of the device.

Ranges translates from the child address domain to the parent address domain. It consists of 3 fields: the child base address, the parent base address, and the size. In this case:

  child base address  := 2 0 (#address-cells = 2 in this node)
  parent base address := 0xfa00 (#address-cells = 1 in the parent node)
  length              := 0x1000000 (16MB)

For the child address, #address-cells is set to 2, meaning 1 cell for the chip select # and 1 cell for an offset into the chip select range. In most cases the offset will be zero in a ranges property. So in this case, the ranges property states that chip select 2 is a 16MB region mapped to base address 0xfa00.

> But I got puzzled: when I put the two flashes on chip selects 0,1 or on 0,2 my linux works well. And the board-control can only be put on chip select 1, otherwise the PCI is not detected.

Unless the bus controller hardware needs to know the chip select number for another purpose (i.e. setting up a local bus DMA transfer), you could really use any number for the chip select as long as it is consistent between the child node and the ranges property.

> So, what is the chip select? Is it based on hardware?

Yes, it is based on hardware. The .dts file is describing which CS line each external device is attached to.

> But my flash can use chip select 0, 1 or 2. Or is it just set by software? But my PCI device can only work on chip select 1.
>
> BTW: I also want to know how to write the dts file. I want to understand each node in the dts files, but I can't get enough documents.
> I have read the linux/document/... Could you provide me some useful information?

See here: http://www.devicetree.org/Device_Tree_Usage

g.
Re: [PATCH V5] powerpc/mpc512x: Add gpio driver
On Thu, 29 Jul 2010 01:19:23 -0600 Grant Likely grant.lik...@secretlab.ca wrote:

> On Wed, Jul 7, 2010 at 5:28 AM, Peter Korsgaard jac...@sunsite.dk wrote:
> > This looks very similar to the existing arch/powerpc/sysdev/mpc8xxx_gpio.c - couldn't we just add 5121 support there instead?
>
> Hi Anatolij,
>
> Peter's right, the register map looks the same, except for the additional gpicr1 & 2 registers in the 512x version. Can the 512x gpios be supported by the 8xxx gpio driver?

Hi Grant,

I wanted to extend/test this driver but didn't have time so far. I'll look at the 8xxx gpio driver this weekend to see if it can be used for 512x gpios.

Anatolij
i meet some surprising things,when i modify the dts file
local...@f0010100 {
	ranges = 0 0 FC00 100
		 2 0 FA00 100
		 1 0 7000 100 ;
	fl...@0,0 { . }
	fl...@2,0 { }
	board-cont...@1,0 { . }
}

This is part of my dts file. I don't know what each field means in the ranges property. For instance, 2 0 FA00 100. I only know this: 2 means the chip select. What is the 0? FA00 means the start address. 100 means the size of the device.

But I got puzzled: when I put the two flashes on chip selects 0,1 or on 0,2 my linux works well. And the board-control can only be put on chip select 1, otherwise the PCI is not detected.

So, what is the chip select? Is it based on hardware? But my flash can use chip select 0, 1 or 2. Or is it just set by software? But my PCI device can only work on chip select 1.

BTW: I also want to know how to write the dts file. I want to understand each node in the dts files, but I can't get enough documents. I have read the linux/document/... Could you provide me some useful information?

Thank you very much~

2010-07-29
hacklu
Re: Memory Mapping Buffers smaller than page size?
Hi Simon,

Thanks for the quick reply. One more thing I want to ask: what if I create a DMA pool (using pci_pool_create()), allocate DMA buffers from that pool and then try to memory map them? Will the buffers in that case be contiguous, and is it possible to memory map them into a single user space page?

Thanks in advance
Ravi Gupta

On Wed, Jul 28, 2010 at 7:51 PM, Simon Richter simon.rich...@hogyros.de wrote:

> Hi,
>
> On Wed, Jul 28, 2010 at 06:44:10PM +0530, Ravi Gupta wrote:
>
> > I am new to linux device driver development. I have created 16 buffers of size 256 bytes each (using kmalloc()) in my device driver code. I want to memory map these buffers to user space. Now is it possible to memory map these buffers (16*256 = 4096 = 1 page on 32 bit linux) into a single page in user space, or do I have to map them into individual pages in user space? Note, all the buffers may not be stored in contiguous memory locations.
>
> Pages are the smallest unit for mappings, so each buffer would end up in its own mapping. If you want the buffers to be accessible without an offset, then you cannot have them in contiguous locations, as you cannot map memory from the middle of a page to the beginning either. So your options are: one page per buffer (wasteful, but gives you granular access control), or allocating all the buffers as a single block.
>
> Simon
Re: [PATCH 0/6] Remove owner field from sysfs attribute structure
On Wed, Jul 28, 2010 at 10:09 PM, Guenter Roeck guenter.ro...@ericsson.com wrote:

> The following comment is found in include/linux/sysfs.h:
>
>   /* FIXME
>    * The *owner field is no longer used.
>    * x86 tree has been cleaned up. The owner
>    * attribute is still left for other arches.
>    */
>
> As it turns out, the *owner field is (again?) initialized in several modules, suggesting that such initialization may be creeping back into the code. This patch set removes the above comment, the *owner field, and each instance in the code where it was found to be initialized.
>
> Compiled with x86 allmodconfig as well as with all alpha, arm, mips, powerpc, and sparc defconfig builds.

This seems reasonable to me. Can we get this in linux-next?

Eric
Re: [PATCH v2 5/7] Add support for ramdisk on ppc32 for uImage-ppc and Elf-ppc
On Tue, Jul 20, 2010 at 03:14:58PM -0500, Matthew McClintock wrote:

> This fixes the --reuseinitrd and --ramdisk options for ppc32 on uImage-ppc and Elf. It works for normal kexec as well as for kdump. When using --reuseinitrd you need to specify retain_initrd on the command line. Also, if you are doing kdump you need to make sure your initrd lives in the crashdump region, otherwise the kdump kernel will not be able to access it. The --ramdisk option should always work.

Thanks, I have applied this change. I had to do a minor merge on the Makefile, could you verify that the result is correct?
Re: [PATCH 1/4] irq: rename IRQF_TIMER to IRQF_NO_SUSPEND
On Wed, 28 Jul 2010, Ian Campbell wrote:

> Continue to provide IRQF_TIMER as an alias to IRQF_NO_SUSPEND since I think it is worth preserving the nice self-documenting name (where it is used appropriately). It also avoids needing to patch all the many users who are using the flag for an actual timer interrupt.

I'm not happy about the alias. What about:

#define __IRQF_TIMER		0x0200
#define IRQF_NO_SUSPEND		0x0400
#define IRQF_TIMER		(__IRQF_TIMER | IRQF_NO_SUSPEND)

Thanks,

	tglx
Re: [PATCH 1/4] irq: rename IRQF_TIMER to IRQF_NO_SUSPEND
On Thu, 2010-07-29 at 09:49 +0100, Thomas Gleixner wrote:

> On Wed, 28 Jul 2010, Ian Campbell wrote:
> > Continue to provide IRQF_TIMER as an alias to IRQF_NO_SUSPEND since I think it is worth preserving the nice self-documenting name (where it is used appropriately). It also avoids needing to patch all the many users who are using the flag for an actual timer interrupt.
>
> I'm not happy about the alias. What about:
>
> #define __IRQF_TIMER		0x0200
> #define IRQF_NO_SUSPEND		0x0400
> #define IRQF_TIMER		(__IRQF_TIMER | IRQF_NO_SUSPEND)

Sure, I'll rework along those lines.

Ian.
Re: [PATCH v3 0/7] Fixup booting with device trees and uImage/elf on ppc32
On Mon, Jul 26, 2010 at 11:22:58PM -0500, Matthew McClintock wrote:

> On Jul 26, 2010, at 9:55 PM, Simon Horman wrote:
> > [Cced linuxppc-dev]
> >
> > On Tue, Jul 20, 2010 at 11:42:57PM -0500, Matthew McClintock wrote:
> > > This patch series adds full support for booting with a flat device tree with either uImage or elf file formats. Kexec and Kdump should work, and you should also be able to use ramdisks or reuse your current ramdisk as well.
> > >
> > > This patch series was tested on an mpc85xx system with a kernel version 2.6.35-rc3
> > >
> > > v1: Initial version
> > > v2: Added support for fs2dt (file system to device tree)
> > > v3: Fix some misc. git problems I had and other code cleanups
> >
> > Hi Matthew,
> >
> > I'm a little concerned that these changes are non trivial and haven't had much review. But I am prepared to put them into my tree once 2.0.2 is released - perhaps that way they will get some test coverage. Does that work for you?
>
> Either way works for me. I know they could use more review, however as Maxim said the current tree does not work AFAIK. Either way, I'm willing to keep addressing everyone's concerns and wait, or move forward and make some quick fixes as well.

All applied. I made some minor changes to three of the patches. I have noted each change in separate emails.
[GIT/PATCH 0/4] Do not use IRQF_TIMER for non timer interrupts
On Thu, 2010-07-29 at 09:49 +0100, Thomas Gleixner wrote:

> On Wed, 28 Jul 2010, Ian Campbell wrote:
> > Continue to provide IRQF_TIMER as an alias to IRQF_NO_SUSPEND since I think it is worth preserving the nice self-documenting name (where it is used appropriately). It also avoids needing to patch all the many users who are using the flag for an actual timer interrupt.
>
> I'm not happy about the alias. What about:
>
> #define __IRQF_TIMER		0x0200
> #define IRQF_NO_SUSPEND		0x0400
> #define IRQF_TIMER		(__IRQF_TIMER | IRQF_NO_SUSPEND)

Resending with this change. Plus I ran checkpatch on the whole lot (I previously managed to run it only on the first patch) and fixed the complaints.

Ian.

The following changes since commit fc0f5ac8fe693d1b05f5a928cc48135d1c8b7f2e:

  Linus Torvalds (1):
        Merge branch 'for-linus' of git://git.kernel.org/.../ericvh/v9fs

are available in the git repository at:

  git://xenbits.xensource.com/people/ianc/linux-2.6.git for-irq/irqf-no-suspend

Ian Campbell (4):
      irq: Add new IRQ flag IRQF_NO_SUSPEND
      ixp4xx-beeper: Use IRQF_NO_SUSPEND not IRQF_TIMER for non-timer interrupt
      powerpc: Use IRQF_NO_SUSPEND not IRQF_TIMER for non-timer interrupts
      xen: do not suspend IPI IRQs.

 arch/powerpc/platforms/powermac/low_i2c.c |    5 +++--
 drivers/input/misc/ixp4xx-beeper.c        |    3 ++-
 drivers/macintosh/via-pmu.c               |    9 +
 drivers/xen/events.c                      |    1 +
 include/linux/interrupt.h                 |    7 ++-
 kernel/irq/manage.c                       |    2 +-
 6 files changed, 18 insertions(+), 9 deletions(-)
[PATCH 1/4] irq: Add new IRQ flag IRQF_NO_SUSPEND
A small number of users of IRQF_TIMER are using it for the implied no-suspend behaviour on interrupts which are not timer interrupts. Therefore add a new IRQF_NO_SUSPEND flag, rename IRQF_TIMER to __IRQF_TIMER and redefine IRQF_TIMER in terms of these new flags.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Jeremy Fitzhardinge jer...@goop.org
Cc: Dmitry Torokhov dmitry.torok...@gmail.com
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Paul Mackerras pau...@samba.org
Cc: Grant Likely grant.lik...@secretlab.ca
Cc: xen-de...@lists.xensource.com
Cc: linux-in...@vger.kernel.org
Cc: linuxppc-...@ozlabs.org
Cc: devicetree-disc...@lists.ozlabs.org
---
 include/linux/interrupt.h |    7 ++-
 kernel/irq/manage.c       |    2 +-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index c233113..a0384a4 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -53,16 +53,21 @@
  * IRQF_ONESHOT - Interrupt is not reenabled after the hardirq handler finished.
  *		Used by threaded interrupts which need to keep the
  *		irq line disabled until the threaded handler has been run.
+ * IRQF_NO_SUSPEND - Do not disable this IRQ during suspend
+ *
  */
 #define IRQF_DISABLED		0x00000020
 #define IRQF_SAMPLE_RANDOM	0x00000040
 #define IRQF_SHARED		0x00000080
 #define IRQF_PROBE_SHARED	0x00000100
-#define IRQF_TIMER		0x00000200
+#define __IRQF_TIMER		0x00000200
 #define IRQF_PERCPU		0x00000400
 #define IRQF_NOBALANCING	0x00000800
 #define IRQF_IRQPOLL		0x00001000
 #define IRQF_ONESHOT		0x00002000
+#define IRQF_NO_SUSPEND		0x00004000
+
+#define IRQF_TIMER		(__IRQF_TIMER | IRQF_NO_SUSPEND)

 /*
  * Bits used by threaded handlers:
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index e149748..c3003e9 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -216,7 +216,7 @@ static inline int setup_affinity(unsigned int irq, struct irq_desc *desc)
 void __disable_irq(struct irq_desc *desc, unsigned int irq, bool suspend)
 {
 	if (suspend) {
-		if (!desc->action || (desc->action->flags & IRQF_TIMER))
+		if (!desc->action || (desc->action->flags & IRQF_NO_SUSPEND))
 			return;
 		desc->status |= IRQ_SUSPENDED;
 	}
--
1.5.6.5
[PATCH 3/4] powerpc: Use IRQF_NO_SUSPEND not IRQF_TIMER for non-timer interrupts
kw_i2c_irq and via_pmu_interrupt are not timer interrupts and therefore should not use IRQF_TIMER. Use the recently introduced IRQF_NO_SUSPEND instead, since that is the actual desired behaviour.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Paul Mackerras pau...@samba.org
Cc: Grant Likely grant.lik...@secretlab.ca
Cc: linuxppc-...@ozlabs.org
Cc: devicetree-disc...@lists.ozlabs.org
---
 arch/powerpc/platforms/powermac/low_i2c.c |    5 +++--
 drivers/macintosh/via-pmu.c               |    9 +
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/low_i2c.c b/arch/powerpc/platforms/powermac/low_i2c.c
index 06a137c..480567e 100644
--- a/arch/powerpc/platforms/powermac/low_i2c.c
+++ b/arch/powerpc/platforms/powermac/low_i2c.c
@@ -542,11 +542,12 @@ static struct pmac_i2c_host_kw *__init kw_i2c_host_init(struct device_node *np)
 	/* Make sure IRQ is disabled */
 	kw_write_reg(reg_ier, 0);

-	/* Request chip interrupt. We set IRQF_TIMER because we don't
+	/* Request chip interrupt. We set IRQF_NO_SUSPEND because we don't
 	 * want that interrupt disabled between the 2 passes of driver
 	 * suspend or we'll have issues running the pfuncs
 	 */
-	if (request_irq(host->irq, kw_i2c_irq, IRQF_TIMER, "keywest i2c", host))
+	if (request_irq(host->irq, kw_i2c_irq, IRQF_NO_SUSPEND,
+			"keywest i2c", host))
 		host->irq = NO_IRQ;

 	printk(KERN_INFO "KeyWest i2c @0x%08x irq %d %s\n",
diff --git a/drivers/macintosh/via-pmu.c b/drivers/macintosh/via-pmu.c
index 3d4fc0f..35bc273 100644
--- a/drivers/macintosh/via-pmu.c
+++ b/drivers/macintosh/via-pmu.c
@@ -400,11 +400,12 @@ static int __init via_pmu_start(void)
 		printk(KERN_ERR "via-pmu: can't map interrupt\n");
 		return -ENODEV;
 	}
-	/* We set IRQF_TIMER because we don't want the interrupt to be disabled
-	 * between the 2 passes of driver suspend, we control our own disabling
-	 * for that one
+	/* We set IRQF_NO_SUSPEND because we don't want the interrupt
+	 * to be disabled between the 2 passes of driver suspend, we
+	 * control our own disabling for that one
 	 */
-	if (request_irq(irq, via_pmu_interrupt, IRQF_TIMER, "VIA-PMU", (void *)0)) {
+	if (request_irq(irq, via_pmu_interrupt, IRQF_NO_SUSPEND,
+			"VIA-PMU", (void *)0)) {
 		printk(KERN_ERR "via-pmu: can't request irq %d\n", irq);
 		return -ENODEV;
 	}
--
1.5.6.5
Re: Problems using UART on MPC5200
Hi Sven,

> I am using a PowerPC MPC5200 from Freescale (with STK5200 board), ELDK 4.2 from DENX and kernel 2.6.34-rc5. My kernel is running fine. The console output is coming over the device ttyPSC0. In future I want to log in over telnet, so I deactivated the kernel option to output the console over the UART device.

It would help if you were more precise in describing what you did and what you try to achieve. What exact option did you change?

> Now I want to read and write to the RS232 interface from a program. But when I try to open the device ttyPSC* I get the following error:
>
>   unable to read portsettings : Inappropriate ioctl for device

The message means what it says - whatever device driver is connected to the device file you open does not support the ioctl you call on it. Now to better understand this, it would help if you tell us what device file you open, what major and minor number it has, what /proc/devices shows this hooks to, and what ioctl you do in your application.

> What does this mean? How can I send and receive data from/to the UART?

This should all work with standard procedures.

Cheers
  Detlev

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-40 Fax: (+49)-8142-66989-80 Email: d...@denx.de
[PATCH 00/27] KVM PPC PV framework v3
On PPC we run PR=0 (kernel mode) code in PR=1 (user mode) and don't use the hypervisor extensions. While that is all great to show that virtualization is possible, there are quite some cases where the emulation overhead of privileged instructions is killing performance.

This patchset tackles exactly that issue. It introduces a paravirtual framework using which KVM and Linux share a page to exchange register state with. That way we don't have to switch to the hypervisor just to change a value of a privileged register.

To prove my point, I ran the same test I did for the MMU optimizations against the PV framework. Here are the results:

[without]

debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real	0m14.659s
user	0m8.967s
sys	0m5.688s

[with]

debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real	0m7.557s
user	0m4.121s
sys	0m3.426s

So this is a significant performance improvement! I'm quite happy how fast this whole thing becomes :)

I tried to take all comments I've heard from people so far about such a PV framework into account. In case you told me something before that is a no-go and I still did it, please just tell me again.

To make use of this whole thing you also need patches to qemu and openbios. I have them in my queue, but want to see this set upstream first before I start sending patches to the other projects.

Now go and have fun with fast VMs on PPC! Get yourself a G5 on ebay and start experiencing the power yourself. - heh

v1 -> v2:

  - change hypervisor calls to use r0 and r3
  - make crit detection only trigger in supervisor mode
  - RMO -> PAM
  - introduce kvm_patch_ins
  - only flush icache when patching
  - introduce kvm_patch_ins_b
  - update documentation

v2 -> v3:

  - use ePAPR conventions for hypercall interface
  - only use r0 as magic sc number
  - remove PVR detection
  - remove BookE shared page mapping support
  - combine book3s-64 and -32 magic page ra override
  - add self-test check if the mapping works to guest code
  - add safety check for relocatable kernels

Alexander Graf (27):
      KVM: PPC: Introduce shared page
      KVM: PPC: Convert MSR to shared page
      KVM: PPC: Convert DSISR to shared page
      KVM: PPC: Convert DAR to shared page.
      KVM: PPC: Convert SRR0 and SRR1 to shared page
      KVM: PPC: Convert SPRG[0-4] to shared page
      KVM: PPC: Implement hypervisor interface
      KVM: PPC: Add PV guest critical sections
      KVM: PPC: Add PV guest scratch registers
      KVM: PPC: Tell guest about pending interrupts
      KVM: PPC: Make PAM a define
      KVM: PPC: First magic page steps
      KVM: PPC: Magic Page Book3s support
      KVM: PPC: Expose magic page support to guest
      KVM: Move kvm_guest_init out of generic code
      KVM: PPC: Generic KVM PV guest support
      KVM: PPC: KVM PV guest stubs
      KVM: PPC: PV instructions to loads and stores
      KVM: PPC: PV tlbsync to nop
      KVM: PPC: Introduce kvm_tmp framework
      KVM: PPC: Introduce branch patching helper
      KVM: PPC: PV assembler helpers
      KVM: PPC: PV mtmsrd L=1
      KVM: PPC: PV mtmsrd L=0 and mtmsr
      KVM: PPC: PV wrteei
      KVM: PPC: Add Documentation about PV interface
      KVM: PPC: Add get_pvinfo interface to query hypercall instructions

 Documentation/kvm/api.txt                |   23 ++
 Documentation/kvm/ppc-pv.txt             |  180 +++
 arch/powerpc/include/asm/kvm_book3s.h    |    2 +-
 arch/powerpc/include/asm/kvm_host.h      |   15 +-
 arch/powerpc/include/asm/kvm_para.h      |  135 -
 arch/powerpc/include/asm/kvm_ppc.h       |    1 +
 arch/powerpc/kernel/Makefile             |    2 +
 arch/powerpc/kernel/asm-offsets.c        |   18 +-
 arch/powerpc/kernel/kvm.c                |  485 ++
 arch/powerpc/kernel/kvm_emul.S           |  247 +++
 arch/powerpc/kvm/44x.c                   |    7 +
 arch/powerpc/kvm/44x_tlb.c               |    8 +-
 arch/powerpc/kvm/book3s.c                |  188
 arch/powerpc/kvm/book3s_32_mmu.c         |   28 ++-
 arch/powerpc/kvm/book3s_32_mmu_host.c    |    6 +-
 arch/powerpc/kvm/book3s_64_mmu.c         |   42 +++-
 arch/powerpc/kvm/book3s_64_mmu_host.c    |   13 +-
 arch/powerpc/kvm/book3s_emulate.c        |   25 +-
 arch/powerpc/kvm/book3s_paired_singles.c |   11 +-
 arch/powerpc/kvm/booke.c                 |   83 --
 arch/powerpc/kvm/booke.h                 |    6 +-
 arch/powerpc/kvm/booke_emulate.c         |   14 +-
 arch/powerpc/kvm/booke_interrupts.S      |    3 +-
 arch/powerpc/kvm/e500.c                  |    7 +
 arch/powerpc/kvm/e500_tlb.c              |   12 +-
 arch/powerpc/kvm/e500_tlb.h              |    2 +-
 arch/powerpc/kvm/emulate.c               |   36 ++-
 arch/powerpc/kvm/powerpc.c               |   84 +-
 arch/powerpc/platforms/Kconfig           |   10 +
 arch/x86/include/asm/kvm_para.h          |    6 +
 include/linux/kvm.h                      |   11 +
 include/linux/kvm_para.h                 |    7 +-
 32 files changed, 1538
[PATCH 08/27] KVM: PPC: Add PV guest critical sections
When running in hooked code we need a way to disable interrupts without clobbering any interrupts or exiting out to the hypervisor. To achieve this, we have an additional critical field in the shared page. If that field is equal to the r1 register of the guest, it tells the hypervisor that we're in such a critical section and thus may not receive any interrupts.

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 -> v2:

  - make crit detection only trigger in supervisor mode

---
 arch/powerpc/include/asm/kvm_para.h |    1 +
 arch/powerpc/kvm/book3s.c           |   18 --
 arch/powerpc/kvm/booke.c            |   15 +++
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h
index 556fd59..4577e7b 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #include <linux/of.h>

 struct kvm_vcpu_arch_shared {
+	__u64 critical;		/* Guest may not get interrupts if == r1 */
 	__u64 sprg0;
 	__u64 sprg1;
 	__u64 sprg2;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 5cb5f0d..d6227ff 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -251,14 +251,28 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
 	int deliver = 1;
 	int vec = 0;
 	ulong flags = 0ULL;
+	ulong crit_raw = vcpu->arch.shared->critical;
+	ulong crit_r1 = kvmppc_get_gpr(vcpu, 1);
+	bool crit;
+
+	/* Truncate crit indicators in 32 bit mode */
+	if (!(vcpu->arch.shared->msr & MSR_SF)) {
+		crit_raw &= 0xffffffff;
+		crit_r1 &= 0xffffffff;
+	}
+
+	/* Critical section when crit == r1 */
+	crit = (crit_raw == crit_r1);
+	/* ... and we're in supervisor mode */
+	crit = crit && !(vcpu->arch.shared->msr & MSR_PR);

 	switch (priority) {
 	case BOOK3S_IRQPRIO_DECREMENTER:
-		deliver = vcpu->arch.shared->msr & MSR_EE;
+		deliver = (vcpu->arch.shared->msr & MSR_EE) && !crit;
 		vec = BOOK3S_INTERRUPT_DECREMENTER;
 		break;
 	case BOOK3S_IRQPRIO_EXTERNAL:
-		deliver = vcpu->arch.shared->msr & MSR_EE;
+		deliver = (vcpu->arch.shared->msr & MSR_EE) && !crit;
 		vec = BOOK3S_INTERRUPT_EXTERNAL;
 		break;
 	case BOOK3S_IRQPRIO_SYSTEM_RESET:
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 13e0747..104d0ee 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -147,6 +147,20 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 	int allowed = 0;
 	ulong uninitialized_var(msr_mask);
 	bool update_esr = false, update_dear = false;
+	ulong crit_raw = vcpu->arch.shared->critical;
+	ulong crit_r1 = kvmppc_get_gpr(vcpu, 1);
+	bool crit;
+
+	/* Truncate crit indicators in 32 bit mode */
+	if (!(vcpu->arch.shared->msr & MSR_SF)) {
+		crit_raw &= 0xffffffff;
+		crit_r1 &= 0xffffffff;
+	}
+
+	/* Critical section when crit == r1 */
+	crit = (crit_raw == crit_r1);
+	/* ... and we're in supervisor mode */
+	crit = crit && !(vcpu->arch.shared->msr & MSR_PR);

 	switch (priority) {
 	case BOOKE_IRQPRIO_DTLB_MISS:
@@ -181,6 +195,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 	case BOOKE_IRQPRIO_DECREMENTER:
 	case BOOKE_IRQPRIO_FIT:
 		allowed = vcpu->arch.shared->msr & MSR_EE;
+		allowed = allowed && !crit;
 		msr_mask = MSR_CE|MSR_ME|MSR_DE;
 		break;
 	case BOOKE_IRQPRIO_DEBUG:
--
1.6.0.2
[PATCH 01/27] KVM: PPC: Introduce shared page
For transparent variable sharing between the hypervisor and guest, I introduce a shared page. This shared page will contain all the registers the guest can read and write safely without exiting guest context.

This patch only implements the stubs required for the basic structure of the shared page. The actual register moving follows.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_host.h |    2 ++
 arch/powerpc/include/asm/kvm_para.h |    5 +++++
 arch/powerpc/kernel/asm-offsets.c   |    1 +
 arch/powerpc/kvm/44x.c              |    7 +++++++
 arch/powerpc/kvm/book3s.c           |    9 ++++++++-
 arch/powerpc/kvm/e500.c             |    7 +++++++
 6 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index b0b23c0..53edacd 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include <linux/interrupt.h>
 #include <linux/types.h>
 #include <linux/kvm_types.h>
+#include <linux/kvm_para.h>
 #include <asm/kvm_asm.h>

 #define KVM_MAX_VCPUS 1
@@ -290,6 +291,7 @@ struct kvm_vcpu_arch {
 	struct tasklet_struct tasklet;
 	u64 dec_jiffies;
 	unsigned long pending_exceptions;
+	struct kvm_vcpu_arch_shared *shared;

 #ifdef CONFIG_PPC_BOOK3S
 	struct hlist_head hpte_hash_pte[HPTEG_HASH_NUM_PTE];
diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h
index 2d48f6a..1485ba8 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -20,6 +20,11 @@
 #ifndef __POWERPC_KVM_PARA_H__
 #define __POWERPC_KVM_PARA_H__

+#include <linux/types.h>
+
+struct kvm_vcpu_arch_shared {
+};
+
 #ifdef __KERNEL__

 static inline int kvm_para_available(void)
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 496cc5b..944f593 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -400,6 +400,7 @@ int main(void)
 	DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6));
 	DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7));
 	DEFINE(VCPU_SHADOW_PID, offsetof(struct kvm_vcpu, arch.shadow_pid));
+	DEFINE(VCPU_SHARED, offsetof(struct kvm_vcpu, arch.shared));

 	/* book3s */
 #ifdef CONFIG_PPC_BOOK3S
diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c
index 73c0a3f..e7b1f3f 100644
--- a/arch/powerpc/kvm/44x.c
+++ b/arch/powerpc/kvm/44x.c
@@ -123,8 +123,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (err)
 		goto free_vcpu;

+	vcpu->arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO);
+	if (!vcpu->arch.shared)
+		goto uninit_vcpu;
+
 	return vcpu;

+uninit_vcpu:
+	kvm_vcpu_uninit(vcpu);
 free_vcpu:
 	kmem_cache_free(kvm_vcpu_cache, vcpu_44x);
 out:
@@ -135,6 +141,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);

+	free_page((unsigned long)vcpu->arch.shared);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, vcpu_44x);
 }
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index a3cef30..b3385dd 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1242,6 +1242,10 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (err)
 		goto free_shadow_vcpu;

+	vcpu->arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO);
+	if (!vcpu->arch.shared)
+		goto uninit_vcpu;
+
 	vcpu->arch.host_retip = kvm_return_point;
 	vcpu->arch.host_msr = mfmsr();
 #ifdef CONFIG_PPC_BOOK3S_64
@@ -1268,10 +1272,12 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)

 	err = kvmppc_mmu_init(vcpu);
 	if (err < 0)
-		goto free_shadow_vcpu;
+		goto uninit_vcpu;

 	return vcpu;

+uninit_vcpu:
+	kvm_vcpu_uninit(vcpu);
 free_shadow_vcpu:
 	kfree(vcpu_book3s->shadow_vcpu);
 free_vcpu:
@@ -1284,6 +1290,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);

+	free_page((unsigned long)vcpu->arch.shared);
 	kvm_vcpu_uninit(vcpu);
 	kfree(vcpu_book3s->shadow_vcpu);
 	vfree(vcpu_book3s);
diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
index e8a00b0..71750f2 100644
--- a/arch/powerpc/kvm/e500.c
+++ b/arch/powerpc/kvm/e500.c
@@ -117,8 +117,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (err)
 		goto uninit_vcpu;

+	vcpu->arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO);
+	if (!vcpu->arch.shared)
+		goto uninit_tlb;
+
 	return vcpu;

+uninit_tlb:
+	kvmppc_e500_tlb_uninit(vcpu_e500);
 uninit_vcpu:
[PATCH 06/27] KVM: PPC: Convert SPRG[0-4] to shared page
When in kernel mode there are 4 additional registers available that are simple data storage. Instead of exiting to the hypervisor to read and write those, we can just share them with the guest using the page. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |4 arch/powerpc/include/asm/kvm_para.h |4 arch/powerpc/kvm/book3s.c | 16 arch/powerpc/kvm/booke.c| 16 arch/powerpc/kvm/emulate.c | 24 5 files changed, 36 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 5255d75..221cf85 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -217,10 +217,6 @@ struct kvm_vcpu_arch { ulong guest_owned_ext; #endif u32 mmucr; - ulong sprg0; - ulong sprg1; - ulong sprg2; - ulong sprg3; ulong sprg4; ulong sprg5; ulong sprg6; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index d7fc6c2..e402999 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,10 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 sprg0; + __u64 sprg1; + __u64 sprg2; + __u64 sprg3; __u64 srr0; __u64 srr1; __u64 dar; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index afa0dd4..cfd7fe5 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1062,10 +1062,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-srr0 = vcpu-arch.shared-srr0; regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; - regs-sprg0 = vcpu-arch.sprg0; - regs-sprg1 = vcpu-arch.sprg1; - regs-sprg2 = vcpu-arch.sprg2; - regs-sprg3 = vcpu-arch.sprg3; + regs-sprg0 = vcpu-arch.shared-sprg0; + regs-sprg1 = vcpu-arch.shared-sprg1; + regs-sprg2 = vcpu-arch.shared-sprg2; + regs-sprg3 = vcpu-arch.shared-sprg3; regs-sprg5 = vcpu-arch.sprg4; regs-sprg6 = 
vcpu-arch.sprg5; regs-sprg7 = vcpu-arch.sprg6; @@ -1088,10 +1088,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_msr(vcpu, regs-msr); vcpu-arch.shared-srr0 = regs-srr0; vcpu-arch.shared-srr1 = regs-srr1; - vcpu-arch.sprg0 = regs-sprg0; - vcpu-arch.sprg1 = regs-sprg1; - vcpu-arch.sprg2 = regs-sprg2; - vcpu-arch.sprg3 = regs-sprg3; + vcpu-arch.shared-sprg0 = regs-sprg0; + vcpu-arch.shared-sprg1 = regs-sprg1; + vcpu-arch.shared-sprg2 = regs-sprg2; + vcpu-arch.shared-sprg3 = regs-sprg3; vcpu-arch.sprg5 = regs-sprg4; vcpu-arch.sprg6 = regs-sprg5; vcpu-arch.sprg7 = regs-sprg6; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 793df28..b2c8c42 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -495,10 +495,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-srr0 = vcpu-arch.shared-srr0; regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; - regs-sprg0 = vcpu-arch.sprg0; - regs-sprg1 = vcpu-arch.sprg1; - regs-sprg2 = vcpu-arch.sprg2; - regs-sprg3 = vcpu-arch.sprg3; + regs-sprg0 = vcpu-arch.shared-sprg0; + regs-sprg1 = vcpu-arch.shared-sprg1; + regs-sprg2 = vcpu-arch.shared-sprg2; + regs-sprg3 = vcpu-arch.shared-sprg3; regs-sprg5 = vcpu-arch.sprg4; regs-sprg6 = vcpu-arch.sprg5; regs-sprg7 = vcpu-arch.sprg6; @@ -521,10 +521,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_msr(vcpu, regs-msr); vcpu-arch.shared-srr0 = regs-srr0; vcpu-arch.shared-srr1 = regs-srr1; - vcpu-arch.sprg0 = regs-sprg0; - vcpu-arch.sprg1 = regs-sprg1; - vcpu-arch.sprg2 = regs-sprg2; - vcpu-arch.sprg3 = regs-sprg3; + vcpu-arch.shared-sprg0 = regs-sprg0; + vcpu-arch.shared-sprg1 = regs-sprg1; + vcpu-arch.shared-sprg2 = regs-sprg2; + vcpu-arch.shared-sprg3 = regs-sprg3; vcpu-arch.sprg5 = regs-sprg4; vcpu-arch.sprg6 = regs-sprg5; vcpu-arch.sprg7 = regs-sprg6; diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c 
index ad0fa4f..454869b 100644 --- a/arch/powerpc/kvm/emulate.c +++ b/arch/powerpc/kvm/emulate.c @@ -263,13 +263,17 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) kvmppc_set_gpr(vcpu, rt, get_tb()); break; case SPRN_SPRG0: - kvmppc_set_gpr(vcpu, rt,
[PATCH 02/27] KVM: PPC: Convert MSR to shared page
One of the most obvious registers to share with the guest directly is the MSR. The MSR contains the interrupts enabled flag which the guest has to toggle in critical sections. So in order to bring the overhead of interrupt en- and disabling down, let's put msr into the shared page. Keep in mind that even though you can fully read its contents, writing to it doesn't always update all state. There are a few safe fields that don't require hypervisor interaction. See the documentation for a list of MSR bits that are safe to be set from inside the guest. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |1 - arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kernel/asm-offsets.c|2 +- arch/powerpc/kvm/44x_tlb.c |8 ++-- arch/powerpc/kvm/book3s.c| 65 -- arch/powerpc/kvm/book3s_32_mmu.c | 12 +++--- arch/powerpc/kvm/book3s_32_mmu_host.c|4 +- arch/powerpc/kvm/book3s_64_mmu.c | 12 +++--- arch/powerpc/kvm/book3s_64_mmu_host.c|4 +- arch/powerpc/kvm/book3s_emulate.c|9 ++-- arch/powerpc/kvm/book3s_paired_singles.c |7 ++- arch/powerpc/kvm/booke.c | 20 +- arch/powerpc/kvm/booke.h |6 +- arch/powerpc/kvm/booke_emulate.c |6 +- arch/powerpc/kvm/booke_interrupts.S |3 +- arch/powerpc/kvm/e500_tlb.c | 12 +++--- arch/powerpc/kvm/e500_tlb.h |2 +- arch/powerpc/kvm/powerpc.c |3 +- 18 files changed, 93 insertions(+), 84 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 53edacd..ba20f90 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -211,7 +211,6 @@ struct kvm_vcpu_arch { u32 cr; #endif - ulong msr; #ifdef CONFIG_PPC_BOOK3S ulong shadow_msr; ulong hflags; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 1485ba8..a17dc52 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,7 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 msr; }; #ifdef __KERNEL__ 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 944f593..a55d47e 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -394,13 +394,13 @@ int main(void) DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, arch.host_stack)); DEFINE(VCPU_HOST_PID, offsetof(struct kvm_vcpu, arch.host_pid)); DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr)); - DEFINE(VCPU_MSR, offsetof(struct kvm_vcpu, arch.msr)); DEFINE(VCPU_SPRG4, offsetof(struct kvm_vcpu, arch.sprg4)); DEFINE(VCPU_SPRG5, offsetof(struct kvm_vcpu, arch.sprg5)); DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6)); DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7)); DEFINE(VCPU_SHADOW_PID, offsetof(struct kvm_vcpu, arch.shadow_pid)); DEFINE(VCPU_SHARED, offsetof(struct kvm_vcpu, arch.shared)); + DEFINE(VCPU_SHARED_MSR, offsetof(struct kvm_vcpu_arch_shared, msr)); /* book3s */ #ifdef CONFIG_PPC_BOOK3S diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c index 8123125..4cbbca7 100644 --- a/arch/powerpc/kvm/44x_tlb.c +++ b/arch/powerpc/kvm/44x_tlb.c @@ -221,14 +221,14 @@ gpa_t kvmppc_mmu_xlate(struct kvm_vcpu *vcpu, unsigned int gtlb_index, int kvmppc_mmu_itlb_index(struct kvm_vcpu *vcpu, gva_t eaddr) { - unsigned int as = !!(vcpu-arch.msr MSR_IS); + unsigned int as = !!(vcpu-arch.shared-msr MSR_IS); return kvmppc_44x_tlb_index(vcpu, eaddr, vcpu-arch.pid, as); } int kvmppc_mmu_dtlb_index(struct kvm_vcpu *vcpu, gva_t eaddr) { - unsigned int as = !!(vcpu-arch.msr MSR_DS); + unsigned int as = !!(vcpu-arch.shared-msr MSR_DS); return kvmppc_44x_tlb_index(vcpu, eaddr, vcpu-arch.pid, as); } @@ -353,7 +353,7 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gpa_t gpaddr, stlbe.word1 = (hpaddr 0xfc00) | ((hpaddr 32) 0xf); stlbe.word2 = kvmppc_44x_tlb_shadow_attrib(flags, - vcpu-arch.msr MSR_PR); + vcpu-arch.shared-msr MSR_PR); stlbe.tid = !(asid 0xff); /* Keep track of the reference so we can properly release it 
later. */ @@ -422,7 +422,7 @@ static int tlbe_is_host_safe(const struct kvm_vcpu *vcpu, /* Does it match current guest AS? */ /* XXX what about IS != DS? */ - if (get_tlb_ts(tlbe) != !!(vcpu-arch.msr MSR_IS)) + if (get_tlb_ts(tlbe) != !!(vcpu-arch.shared-msr MSR_IS)) return 0; gpa = get_tlb_raddr(tlbe); diff
[PATCH 14/27] KVM: PPC: Expose magic page support to guest
Now that we have the shared page in place and the MMU code knows about the magic page, we can expose that capability to the guest! Signed-off-by: Alexander Graf ag...@suse.de --- v2 - v3: - align hypercalls to in/out of ePAPR --- arch/powerpc/include/asm/kvm_para.h |2 ++ arch/powerpc/kvm/powerpc.c | 11 +++ 2 files changed, 13 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 0653b0d..7438ab3 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -45,6 +45,8 @@ struct kvm_vcpu_arch_shared { #define HC_EV_SUCCESS 0 #define HC_EV_UNIMPLEMENTED12 +#define KVM_FEATURE_MAGIC_PAGE 1 + #ifdef __KERNEL__ #ifdef CONFIG_KVM_GUEST diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index a4cf4b4..fecfe04 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -61,8 +61,19 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu) } switch (nr) { + case HC_VENDOR_KVM | KVM_HC_PPC_MAP_MAGIC_PAGE: + { + vcpu-arch.magic_page_pa = param1; + vcpu-arch.magic_page_ea = param2; + + r = HC_EV_SUCCESS; + break; + } case HC_VENDOR_KVM | KVM_HC_FEATURES: r = HC_EV_SUCCESS; +#if defined(CONFIG_PPC_BOOK3S) /* XXX Missing magic page on BookE */ + r2 |= (1 KVM_FEATURE_MAGIC_PAGE); +#endif /* Second return value is in r4 */ kvmppc_set_gpr(vcpu, 4, r2); -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 12/27] KVM: PPC: First magic page steps
We will be introducing a method to project the shared page into guest context. As soon as we're talking about this coupling, the shared page is called the magic page. This patch introduces simple defines, so the follow-up patches are easier to read. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h | 2 ++ include/linux/kvm_para.h | 1 + 2 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 1674da8..e1da775 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -287,6 +287,8 @@ struct kvm_vcpu_arch { u64 dec_jiffies; unsigned long pending_exceptions; struct kvm_vcpu_arch_shared *shared; + unsigned long magic_page_pa; /* phys addr to map the magic page to */ + unsigned long magic_page_ea; /* effect. addr to map the magic page to */ #ifdef CONFIG_PPC_BOOK3S struct hlist_head hpte_hash_pte[HPTEG_HASH_NUM_PTE]; diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h index 3b8080e..ac2015a 100644 --- a/include/linux/kvm_para.h +++ b/include/linux/kvm_para.h @@ -18,6 +18,7 @@ #define KVM_HC_VAPIC_POLL_IRQ 1 #define KVM_HC_MMU_OP 2 #define KVM_HC_FEATURES 3 +#define KVM_HC_PPC_MAP_MAGIC_PAGE 4 /* * hypercalls use architecture specific -- 1.6.0.2
[PATCH 17/27] KVM: PPC: KVM PV guest stubs
We will soon start replacing instructions from the text section with other, paravirtualized versions. To ease the readability of those patches I split the generic looping and magic page mapping code out. This patch still only contains stubs. But at least it loops through the text section :). Signed-off-by: Alexander Graf ag...@suse.de --- v1 -> v2: - kvm guest patch framework: introduce patch_ins v2 -> v3: - add self-test in guest code - remove superfluous new lines in generic guest code --- arch/powerpc/kernel/kvm.c | 95 + 1 files changed, 95 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index a5ece71..e93366f 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -33,6 +33,62 @@ #define KVM_MAGIC_PAGE (-4096L) #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) +#define KVM_MASK_RT 0x03e00000 + +static bool kvm_patching_worked = true; + +static inline void kvm_patch_ins(u32 *inst, u32 new_inst) +{ + *inst = new_inst; + flush_icache_range((ulong)inst, (ulong)inst + 4); +} + +static void kvm_map_magic_page(void *data) +{ + kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, + KVM_MAGIC_PAGE, /* Physical Address */ + KVM_MAGIC_PAGE); /* Effective Address */ +} + +static void kvm_check_ins(u32 *inst) +{ + u32 _inst = *inst; + u32 inst_no_rt = _inst & ~KVM_MASK_RT; + u32 inst_rt = _inst & KVM_MASK_RT; + + switch (inst_no_rt) { + } + + switch (_inst) { + } +} + +static void kvm_use_magic_page(void) +{ + u32 *p; + u32 *start, *end; + u32 tmp; + + /* Tell the host to map the magic page to -4096 on all CPUs */ + on_each_cpu(kvm_map_magic_page, NULL, 1); + + /* Quick self-test to see if the mapping works */ + if (__get_user(tmp, (u32*)KVM_MAGIC_PAGE)) { + kvm_patching_worked = false; + return; + } + + /* Now loop through all code and find instructions */ + start = (void*)_stext; + end = (void*)_etext; + + for (p = start; p < end; p++) + kvm_check_ins(p); + + printk(KERN_INFO "KVM: Live patching for a fast VM %s\n", + kvm_patching_worked ? "worked" : "failed"); +} + unsigned long kvm_hypercall(unsigned long *in, unsigned long *out, unsigned long nr) @@ -69,3 +125,42 @@ unsigned long kvm_hypercall(unsigned long *in, return r3; } EXPORT_SYMBOL_GPL(kvm_hypercall); + +static int kvm_para_setup(void) +{ + extern u32 kvm_hypercall_start; + struct device_node *hyper_node; + u32 *insts; + int len, i; + + hyper_node = of_find_node_by_path("/hypervisor"); + if (!hyper_node) + return -1; + + insts = (u32*)of_get_property(hyper_node, "hcall-instructions", &len); + if (len % 4) + return -1; + if (len < (4 * 4)) + return -1; + + for (i = 0; i < (len / 4); i++) + kvm_patch_ins(&(&kvm_hypercall_start)[i], insts[i]); + + return 0; +} + +static int __init kvm_guest_init(void) +{ + if (!kvm_para_available()) + return 0; + + if (kvm_para_setup()) + return 0; + + if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE)) + kvm_use_magic_page(); + + return 0; +} + +postcore_initcall(kvm_guest_init); -- 1.6.0.2
[PATCH 10/27] KVM: PPC: Tell guest about pending interrupts
When the guest turns on interrupts again, it needs to know if we have an interrupt pending for it. Because if so, it should rather get out of guest context and get the interrupt. So we introduce a new field in the shared page that we use to tell the guest that there's a pending interrupt lying around. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/booke.c|7 +++ 3 files changed, 15 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 5be00c9..0653b0d 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -37,6 +37,7 @@ struct kvm_vcpu_arch_shared { __u64 dar; __u64 msr; __u32 dsisr; + __u32 int_pending; /* Tells the guest if we have an interrupt */ }; #define KVM_SC_MAGIC_R00x4b564d21 /* KVM! */ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index d6227ff..06229fe 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -337,6 +337,7 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) { unsigned long *pending = vcpu-arch.pending_exceptions; + unsigned long old_pending = vcpu-arch.pending_exceptions; unsigned int priority; #ifdef EXIT_DEBUG @@ -356,6 +357,12 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) BITS_PER_BYTE * sizeof(*pending), priority + 1); } + + /* Tell the guest about our interrupt status */ + if (*pending) + vcpu-arch.shared-int_pending = 1; + else if (old_pending) + vcpu-arch.shared-int_pending = 0; } void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 104d0ee..c604277 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -224,6 +224,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, void 
kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) { unsigned long *pending = vcpu-arch.pending_exceptions; + unsigned long old_pending = vcpu-arch.pending_exceptions; unsigned int priority; priority = __ffs(*pending); @@ -235,6 +236,12 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) BITS_PER_BYTE * sizeof(*pending), priority + 1); } + + /* Tell the guest about our interrupt status */ + if (*pending) + vcpu-arch.shared-int_pending = 1; + else if (old_pending) + vcpu-arch.shared-int_pending = 0; } /** -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 20/27] KVM: PPC: Introduce kvm_tmp framework
We will soon require more sophisticated methods to replace single instructions with multiple instructions. We do that by branching to a memory region where we write replacement code for the instruction to. This region needs to be within 32 MB of the patched instruction though, because that's the furthest we can jump with immediate branches. So we keep 1MB of free space around in bss. After we're done initing we can just tell the mm system that the unused pages are free, but until then we have enough space to fit all our code in. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 42 -- 1 files changed, 40 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 3258922..926f93f 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -65,6 +65,8 @@ #define KVM_INST_TLBSYNC 0x7c00046c static bool kvm_patching_worked = true; +static char kvm_tmp[1024 * 1024]; +static int kvm_tmp_index; static inline void kvm_patch_ins(u32 *inst, u32 new_inst) { @@ -105,6 +107,23 @@ static void kvm_patch_ins_nop(u32 *inst) kvm_patch_ins(inst, KVM_INST_NOP); } +static u32 *kvm_alloc(int len) +{ + u32 *p; + + if ((kvm_tmp_index + len) ARRAY_SIZE(kvm_tmp)) { + printk(KERN_ERR KVM: No more space (%d + %d)\n, + kvm_tmp_index, len); + kvm_patching_worked = false; + return NULL; + } + + p = (void*)kvm_tmp[kvm_tmp_index]; + kvm_tmp_index += len; + + return p; +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -270,17 +289,36 @@ static int kvm_para_setup(void) return 0; } +static __init void kvm_free_tmp(void) +{ + unsigned long start, end; + + start = (ulong)kvm_tmp[kvm_tmp_index + (PAGE_SIZE - 1)] PAGE_MASK; + end = (ulong)kvm_tmp[ARRAY_SIZE(kvm_tmp)] PAGE_MASK; + + /* Free the tmp space we don't need */ + for (; start end; start += PAGE_SIZE) { + ClearPageReserved(virt_to_page(start)); + init_page_count(virt_to_page(start)); + free_page(start); + 
totalram_pages++; + } +} + static int __init kvm_guest_init(void) { if (!kvm_para_available()) - return 0; + goto free_tmp; if (kvm_para_setup()) - return 0; + goto free_tmp; if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE)) kvm_use_magic_page(); +free_tmp: + kvm_free_tmp(); + return 0; } -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 09/27] KVM: PPC: Add PV guest scratch registers
While running in hooked code we need to store register contents out because we must not clobber any registers. So let's add some fields to the shared page we can just happily write to. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 4577e7b..5be00c9 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -24,6 +24,9 @@ #include linux/of.h struct kvm_vcpu_arch_shared { + __u64 scratch1; + __u64 scratch2; + __u64 scratch3; __u64 critical; /* Guest may not get interrupts if == r1 */ __u64 sprg0; __u64 sprg1; -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 11/27] KVM: PPC: Make PAM a define
On PowerPC it's very normal to not support all of the physical RAM in real mode. To check if we're matching on the shared page or not, we need to know the limits so we can restrain ourselves to that range. So let's make it a define instead of open-coding it. And while at it, let's also increase it. Signed-off-by: Alexander Graf ag...@suse.de v2 - v3: - RMO - PAM (non-magic page) --- arch/powerpc/include/asm/kvm_host.h |3 +++ arch/powerpc/kvm/book3s.c |4 ++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 221cf85..1674da8 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -48,6 +48,9 @@ #define HPTEG_HASH_NUM_VPTE(1 HPTEG_HASH_BITS_VPTE) #define HPTEG_HASH_NUM_VPTE_LONG (1 HPTEG_HASH_BITS_VPTE_LONG) +/* Physical Address Mask - allowed range of real mode RAM access */ +#define KVM_PAM0x0fffULL + struct kvm; struct kvm_run; struct kvm_vcpu; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 06229fe..0ed5376 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -465,7 +465,7 @@ static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data, r = vcpu-arch.mmu.xlate(vcpu, eaddr, pte, data); } else { pte-eaddr = eaddr; - pte-raddr = eaddr 0x; + pte-raddr = eaddr KVM_PAM; pte-vpage = VSID_REAL | eaddr 12; pte-may_read = true; pte-may_write = true; @@ -579,7 +579,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, pte.may_execute = true; pte.may_read = true; pte.may_write = true; - pte.raddr = eaddr 0x; + pte.raddr = eaddr KVM_PAM; pte.eaddr = eaddr; pte.vpage = eaddr 12; } -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 15/27] KVM: Move kvm_guest_init out of generic code
Currently x86 is the only architecture that uses kvm_guest_init(). With PowerPC we're getting a second user, but the signature is different there and we don't need to export it, as it uses the normal kernel init framework. So let's move the x86 specific definition of that function over to the x86 specfic header file. Signed-off-by: Alexander Graf ag...@suse.de --- arch/x86/include/asm/kvm_para.h |6 ++ include/linux/kvm_para.h|5 - 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 05eba5e..7b562b6 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -158,6 +158,12 @@ static inline unsigned int kvm_arch_para_features(void) return cpuid_eax(KVM_CPUID_FEATURES); } +#ifdef CONFIG_KVM_GUEST +void __init kvm_guest_init(void); +#else +#define kvm_guest_init() do { } while (0) #endif +#endif /* __KERNEL__ */ + #endif /* _ASM_X86_KVM_PARA_H */ diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h index ac2015a..47a070b 100644 --- a/include/linux/kvm_para.h +++ b/include/linux/kvm_para.h @@ -26,11 +26,6 @@ #include asm/kvm_para.h #ifdef __KERNEL__ -#ifdef CONFIG_KVM_GUEST -void __init kvm_guest_init(void); -#else -#define kvm_guest_init() do { } while (0) -#endif static inline int kvm_para_has_feature(unsigned int feature) { -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 04/27] KVM: PPC: Convert DAR to shared page
The DAR register contains the address a data page fault occured at. This register behaves pretty much like a simple data storage register that gets written to on data faults. There is no hypervisor interaction required on read or write. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |1 - arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c| 14 +++--- arch/powerpc/kvm/book3s_emulate.c|6 +++--- arch/powerpc/kvm/book3s_paired_singles.c |2 +- arch/powerpc/kvm/booke.c |2 +- arch/powerpc/kvm/booke_emulate.c |4 ++-- 7 files changed, 15 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index ba20f90..c852408 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -231,7 +231,6 @@ struct kvm_vcpu_arch { ulong csrr1; ulong dsrr0; ulong dsrr1; - ulong dear; ulong esr; u32 dec; u32 decar; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 9f7565b..ec72a1c 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,7 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 dar; __u64 msr; __u32 dsisr; }; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index eb401b6..4d46f8b 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -594,14 +594,14 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, if (page_found == -ENOENT) { /* Page not found in guest PTE entries */ - vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); + vcpu-arch.shared-dar = kvmppc_get_fault_dar(vcpu); vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr; vcpu-arch.shared-msr |= (to_svcpu(vcpu)-shadow_srr1 0xf800ULL); kvmppc_book3s_queue_irqprio(vcpu, vec); } else if (page_found == -EPERM) { /* Storage protection */ - vcpu-arch.dear = 
kvmppc_get_fault_dar(vcpu); + vcpu-arch.shared-dar = kvmppc_get_fault_dar(vcpu); vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr ~DSISR_NOHPTE; vcpu-arch.shared-dsisr |= DSISR_PROTFAULT; @@ -610,7 +610,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, kvmppc_book3s_queue_irqprio(vcpu, vec); } else if (page_found == -EINVAL) { /* Page not found in guest SLB */ - vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); + vcpu-arch.shared-dar = kvmppc_get_fault_dar(vcpu); kvmppc_book3s_queue_irqprio(vcpu, vec + 0x80); } else if (!is_mmio kvmppc_visible_gfn(vcpu, pte.raddr PAGE_SHIFT)) { @@ -867,17 +867,17 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, if (to_svcpu(vcpu)-fault_dsisr DSISR_NOHPTE) { r = kvmppc_handle_pagefault(run, vcpu, dar, exit_nr); } else { - vcpu-arch.dear = dar; + vcpu-arch.shared-dar = dar; vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr; kvmppc_book3s_queue_irqprio(vcpu, exit_nr); - kvmppc_mmu_pte_flush(vcpu, vcpu-arch.dear, ~0xFFFUL); + kvmppc_mmu_pte_flush(vcpu, dar, ~0xFFFUL); r = RESUME_GUEST; } break; } case BOOK3S_INTERRUPT_DATA_SEGMENT: if (kvmppc_mmu_map_segment(vcpu, kvmppc_get_fault_dar(vcpu)) 0) { - vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); + vcpu-arch.shared-dar = kvmppc_get_fault_dar(vcpu); kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DATA_SEGMENT); } @@ -997,7 +997,7 @@ program_interrupt: if (kvmppc_read_inst(vcpu) == EMULATE_DONE) { vcpu-arch.shared-dsisr = kvmppc_alignment_dsisr(vcpu, kvmppc_get_last_inst(vcpu)); - vcpu-arch.dear = kvmppc_alignment_dar(vcpu, + vcpu-arch.shared-dar = kvmppc_alignment_dar(vcpu, kvmppc_get_last_inst(vcpu)); kvmppc_book3s_queue_irqprio(vcpu, exit_nr); } diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 9982ff1..c147864 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++
[PATCH 18/27] KVM: PPC: PV instructions to loads and stores
Some instructions can simply be replaced by load and store instructions to or from the magic page. This patch replaces often called instructions that fall into the above category. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - use kvm_patch_ins --- arch/powerpc/kernel/kvm.c | 109 + 1 files changed, 109 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index e93366f..9ec572c 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -33,7 +33,34 @@ #define KVM_MAGIC_PAGE (-4096L) #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) +#define KVM_INST_LWZ 0x8000 +#define KVM_INST_STW 0x9000 +#define KVM_INST_LD0xe800 +#define KVM_INST_STD 0xf800 +#define KVM_INST_NOP 0x6000 +#define KVM_INST_B 0x4800 +#define KVM_INST_B_MASK0x03ff +#define KVM_INST_B_MAX 0x01ff + #define KVM_MASK_RT0x03e0 +#define KVM_INST_MFMSR 0x7ca6 +#define KVM_INST_MFSPR_SPRG0 0x7c1042a6 +#define KVM_INST_MFSPR_SPRG1 0x7c1142a6 +#define KVM_INST_MFSPR_SPRG2 0x7c1242a6 +#define KVM_INST_MFSPR_SPRG3 0x7c1342a6 +#define KVM_INST_MFSPR_SRR00x7c1a02a6 +#define KVM_INST_MFSPR_SRR10x7c1b02a6 +#define KVM_INST_MFSPR_DAR 0x7c1302a6 +#define KVM_INST_MFSPR_DSISR 0x7c1202a6 + +#define KVM_INST_MTSPR_SPRG0 0x7c1043a6 +#define KVM_INST_MTSPR_SPRG1 0x7c1143a6 +#define KVM_INST_MTSPR_SPRG2 0x7c1243a6 +#define KVM_INST_MTSPR_SPRG3 0x7c1343a6 +#define KVM_INST_MTSPR_SRR00x7c1a03a6 +#define KVM_INST_MTSPR_SRR10x7c1b03a6 +#define KVM_INST_MTSPR_DAR 0x7c1303a6 +#define KVM_INST_MTSPR_DSISR 0x7c1203a6 static bool kvm_patching_worked = true; @@ -43,6 +70,34 @@ static inline void kvm_patch_ins(u32 *inst, u32 new_inst) flush_icache_range((ulong)inst, (ulong)inst + 4); } +static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt) +{ +#ifdef CONFIG_64BIT + kvm_patch_ins(inst, KVM_INST_LD | rt | (addr 0xfffc)); +#else + kvm_patch_ins(inst, KVM_INST_LWZ | rt | ((addr + 4) 0xfffc)); +#endif +} + +static void 
kvm_patch_ins_lwz(u32 *inst, long addr, u32 rt) +{ + kvm_patch_ins(inst, KVM_INST_LWZ | rt | (addr 0x)); +} + +static void kvm_patch_ins_std(u32 *inst, long addr, u32 rt) +{ +#ifdef CONFIG_64BIT + kvm_patch_ins(inst, KVM_INST_STD | rt | (addr 0xfffc)); +#else + kvm_patch_ins(inst, KVM_INST_STW | rt | ((addr + 4) 0xfffc)); +#endif +} + +static void kvm_patch_ins_stw(u32 *inst, long addr, u32 rt) +{ + kvm_patch_ins(inst, KVM_INST_STW | rt | (addr 0xfffc)); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -57,6 +112,60 @@ static void kvm_check_ins(u32 *inst) u32 inst_rt = _inst KVM_MASK_RT; switch (inst_no_rt) { + /* Loads */ + case KVM_INST_MFMSR: + kvm_patch_ins_ld(inst, magic_var(msr), inst_rt); + break; + case KVM_INST_MFSPR_SPRG0: + kvm_patch_ins_ld(inst, magic_var(sprg0), inst_rt); + break; + case KVM_INST_MFSPR_SPRG1: + kvm_patch_ins_ld(inst, magic_var(sprg1), inst_rt); + break; + case KVM_INST_MFSPR_SPRG2: + kvm_patch_ins_ld(inst, magic_var(sprg2), inst_rt); + break; + case KVM_INST_MFSPR_SPRG3: + kvm_patch_ins_ld(inst, magic_var(sprg3), inst_rt); + break; + case KVM_INST_MFSPR_SRR0: + kvm_patch_ins_ld(inst, magic_var(srr0), inst_rt); + break; + case KVM_INST_MFSPR_SRR1: + kvm_patch_ins_ld(inst, magic_var(srr1), inst_rt); + break; + case KVM_INST_MFSPR_DAR: + kvm_patch_ins_ld(inst, magic_var(dar), inst_rt); + break; + case KVM_INST_MFSPR_DSISR: + kvm_patch_ins_lwz(inst, magic_var(dsisr), inst_rt); + break; + + /* Stores */ + case KVM_INST_MTSPR_SPRG0: + kvm_patch_ins_std(inst, magic_var(sprg0), inst_rt); + break; + case KVM_INST_MTSPR_SPRG1: + kvm_patch_ins_std(inst, magic_var(sprg1), inst_rt); + break; + case KVM_INST_MTSPR_SPRG2: + kvm_patch_ins_std(inst, magic_var(sprg2), inst_rt); + break; + case KVM_INST_MTSPR_SPRG3: + kvm_patch_ins_std(inst, magic_var(sprg3), inst_rt); + break; + case KVM_INST_MTSPR_SRR0: + kvm_patch_ins_std(inst, magic_var(srr0), inst_rt); + break; + case KVM_INST_MTSPR_SRR1: + 
kvm_patch_ins_std(inst, magic_var(srr1), inst_rt); + break; + case KVM_INST_MTSPR_DAR: + kvm_patch_ins_std(inst,
[PATCH 05/27] KVM: PPC: Convert SRR0 and SRR1 to shared page
The SRR0 and SRR1 registers contain cached values of the PC and MSR respectively. They get written to by the hypervisor when an interrupt occurs or directly by the kernel. They are also used to tell the rfi(d) instruction where to jump to. Because it only gets touched on defined events that, it's very simple to share with the guest. Hypervisor and guest both have full r/w access. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |2 -- arch/powerpc/include/asm/kvm_para.h |2 ++ arch/powerpc/kvm/book3s.c | 12 ++-- arch/powerpc/kvm/book3s_emulate.c |4 ++-- arch/powerpc/kvm/booke.c| 15 --- arch/powerpc/kvm/booke_emulate.c|4 ++-- arch/powerpc/kvm/emulate.c | 12 7 files changed, 28 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index c852408..5255d75 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -225,8 +225,6 @@ struct kvm_vcpu_arch { ulong sprg5; ulong sprg6; ulong sprg7; - ulong srr0; - ulong srr1; ulong csrr0; ulong csrr1; ulong dsrr0; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index ec72a1c..d7fc6c2 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,8 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 srr0; + __u64 srr1; __u64 dar; __u64 msr; __u32 dsisr; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 4d46f8b..afa0dd4 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -162,8 +162,8 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr) void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags) { - vcpu-arch.srr0 = kvmppc_get_pc(vcpu); - vcpu-arch.srr1 = vcpu-arch.shared-msr | flags; + vcpu-arch.shared-srr0 = kvmppc_get_pc(vcpu); + vcpu-arch.shared-srr1 = vcpu-arch.shared-msr | 
flags; kvmppc_set_pc(vcpu, to_book3s(vcpu)-hior + vec); vcpu-arch.mmu.reset_msr(vcpu); } @@ -1059,8 +1059,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-lr = kvmppc_get_lr(vcpu); regs-xer = kvmppc_get_xer(vcpu); regs-msr = vcpu-arch.shared-msr; - regs-srr0 = vcpu-arch.srr0; - regs-srr1 = vcpu-arch.srr1; + regs-srr0 = vcpu-arch.shared-srr0; + regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; regs-sprg0 = vcpu-arch.sprg0; regs-sprg1 = vcpu-arch.sprg1; @@ -1086,8 +1086,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_lr(vcpu, regs-lr); kvmppc_set_xer(vcpu, regs-xer); kvmppc_set_msr(vcpu, regs-msr); - vcpu-arch.srr0 = regs-srr0; - vcpu-arch.srr1 = regs-srr1; + vcpu-arch.shared-srr0 = regs-srr0; + vcpu-arch.shared-srr1 = regs-srr1; vcpu-arch.sprg0 = regs-sprg0; vcpu-arch.sprg1 = regs-sprg1; vcpu-arch.sprg2 = regs-sprg2; diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index c147864..f333cb4 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -73,8 +73,8 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, switch (get_xop(inst)) { case OP_19_XOP_RFID: case OP_19_XOP_RFI: - kvmppc_set_pc(vcpu, vcpu-arch.srr0); - kvmppc_set_msr(vcpu, vcpu-arch.srr1); + kvmppc_set_pc(vcpu, vcpu-arch.shared-srr0); + kvmppc_set_msr(vcpu, vcpu-arch.shared-srr1); *advance = 0; break; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 4aab6d2..793df28 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -64,7 +64,8 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu) printk(pc: %08lx msr: %08llx\n, vcpu-arch.pc, vcpu-arch.shared-msr); printk(lr: %08lx ctr: %08lx\n, vcpu-arch.lr, vcpu-arch.ctr); - printk(srr0: %08lx srr1: %08lx\n, vcpu-arch.srr0, vcpu-arch.srr1); + printk(srr0: %08llx srr1: %08llx\n, vcpu-arch.shared-srr0, + vcpu-arch.shared-srr1); printk(exceptions: 
%08lx\n, vcpu-arch.pending_exceptions); @@ -189,8 +190,8 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, } if (allowed) { - vcpu-arch.srr0 = vcpu-arch.pc; - vcpu-arch.srr1 = vcpu-arch.shared-msr; + vcpu-arch.shared-srr0 = vcpu-arch.pc; +
[PATCH 07/27] KVM: PPC: Implement hypervisor interface
To communicate with KVM directly we need to plumb some sort of interface between the guest and KVM. Usually those interfaces use hypercalls. This hypercall implementation is described in the last patch of the series in a special documentation file. Please read that for further information. This patch implements stubs to handle KVM PPC hypercalls on the host and guest side alike. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - change hypervisor calls to use new register values v2 - v3: - move PV interface to ePAPR - only check R0 on hypercall - remove PVR hack - align hypercalls to in/out of ePAPR - add kvm.c with hypercall function --- arch/powerpc/include/asm/kvm_para.h | 114 ++- arch/powerpc/include/asm/kvm_ppc.h |1 + arch/powerpc/kernel/Makefile|2 + arch/powerpc/kernel/kvm.c | 68 + arch/powerpc/kvm/book3s.c |9 ++- arch/powerpc/kvm/booke.c| 10 +++- arch/powerpc/kvm/powerpc.c | 32 ++ include/linux/kvm_para.h|1 + 8 files changed, 233 insertions(+), 4 deletions(-) create mode 100644 arch/powerpc/kernel/kvm.c diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index e402999..556fd59 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -21,6 +21,7 @@ #define __POWERPC_KVM_PARA_H__ #include linux/types.h +#include linux/of.h struct kvm_vcpu_arch_shared { __u64 sprg0; @@ -34,16 +35,127 @@ struct kvm_vcpu_arch_shared { __u32 dsisr; }; +#define KVM_SC_MAGIC_R00x4b564d21 /* KVM! 
*/ +#define HC_VENDOR_KVM (42 16) +#define HC_EV_SUCCESS 0 +#define HC_EV_UNIMPLEMENTED12 + #ifdef __KERNEL__ +#ifdef CONFIG_KVM_GUEST + +static inline int kvm_para_available(void) +{ + struct device_node *hyper_node; + + hyper_node = of_find_node_by_path(/hypervisor); + if (!hyper_node) + return 0; + + if (!of_device_is_compatible(hyper_node, linux,kvm)) + return 0; + + return 1; +} + +extern unsigned long kvm_hypercall(unsigned long *in, + unsigned long *out, + unsigned long nr); + +#else + static inline int kvm_para_available(void) { return 0; } +static unsigned long kvm_hypercall(unsigned long *in, + unsigned long *out, + unsigned long nr) +{ + return HC_EV_UNIMPLEMENTED; +} + +#endif + +static inline long kvm_hypercall0_1(unsigned int nr, unsigned long *r2) +{ + unsigned long in[8]; + unsigned long out[8]; + unsigned long r; + + r = kvm_hypercall(in, out, nr | HC_VENDOR_KVM); + *r2 = out[0]; + + return r; +} + +static inline long kvm_hypercall0(unsigned int nr) +{ + unsigned long in[8]; + unsigned long out[8]; + + return kvm_hypercall(in, out, nr | HC_VENDOR_KVM); +} + +static inline long kvm_hypercall1(unsigned int nr, unsigned long p1) +{ + unsigned long in[8]; + unsigned long out[8]; + + in[0] = p1; + return kvm_hypercall(in, out, nr | HC_VENDOR_KVM); +} + +static inline long kvm_hypercall2(unsigned int nr, unsigned long p1, + unsigned long p2) +{ + unsigned long in[8]; + unsigned long out[8]; + + in[0] = p1; + in[1] = p2; + return kvm_hypercall(in, out, nr | HC_VENDOR_KVM); +} + +static inline long kvm_hypercall3(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3) +{ + unsigned long in[8]; + unsigned long out[8]; + + in[0] = p1; + in[1] = p2; + in[2] = p3; + return kvm_hypercall(in, out, nr | HC_VENDOR_KVM); +} + +static inline long kvm_hypercall4(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3, + unsigned long p4) +{ + unsigned long in[8]; + unsigned long out[8]; + + in[0] = p1; + in[1] = p2; + in[2] = p3; + 
in[3] = p4; + return kvm_hypercall(in, out, nr | HC_VENDOR_KVM); +} + + static inline unsigned int kvm_arch_para_features(void) { - return 0; + unsigned long r; + + if (!kvm_para_available()) + return 0; + + if(kvm_hypercall0_1(KVM_HC_FEATURES, r)) + return 0; + + return r; } #endif /* __KERNEL__ */ diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 18d139e..ecb3bc7 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -107,6 +107,7 @@ extern int kvmppc_booke_init(void); extern void kvmppc_booke_exit(void); extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu); +extern int kvmppc_kvm_pv(struct
[PATCH 03/27] KVM: PPC: Convert DSISR to shared page
The DSISR register contains information about a data page fault. It is fully read/write from inside the guest context and we don't need to worry about interacting based on writes of this register. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_book3s.h|1 - arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c| 11 ++- arch/powerpc/kvm/book3s_emulate.c|6 +++--- arch/powerpc/kvm/book3s_paired_singles.c |2 +- 5 files changed, 11 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 8274a2d..b5b1961 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -85,7 +85,6 @@ struct kvmppc_vcpu_book3s { u64 hid[6]; u64 gqr[8]; int slb_nr; - u32 dsisr; u64 sdr1; u64 hior; u64 msr_mask; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index a17dc52..9f7565b 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -24,6 +24,7 @@ struct kvm_vcpu_arch_shared { __u64 msr; + __u32 dsisr; }; #ifdef __KERNEL__ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 2efe692..eb401b6 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -595,15 +595,16 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, if (page_found == -ENOENT) { /* Page not found in guest PTE entries */ vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); - to_book3s(vcpu)-dsisr = to_svcpu(vcpu)-fault_dsisr; + vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr; vcpu-arch.shared-msr |= (to_svcpu(vcpu)-shadow_srr1 0xf800ULL); kvmppc_book3s_queue_irqprio(vcpu, vec); } else if (page_found == -EPERM) { /* Storage protection */ vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); - to_book3s(vcpu)-dsisr = to_svcpu(vcpu)-fault_dsisr ~DSISR_NOHPTE; - to_book3s(vcpu)-dsisr |= 
DSISR_PROTFAULT; + vcpu-arch.shared-dsisr = + to_svcpu(vcpu)-fault_dsisr ~DSISR_NOHPTE; + vcpu-arch.shared-dsisr |= DSISR_PROTFAULT; vcpu-arch.shared-msr |= (to_svcpu(vcpu)-shadow_srr1 0xf800ULL); kvmppc_book3s_queue_irqprio(vcpu, vec); @@ -867,7 +868,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, r = kvmppc_handle_pagefault(run, vcpu, dar, exit_nr); } else { vcpu-arch.dear = dar; - to_book3s(vcpu)-dsisr = to_svcpu(vcpu)-fault_dsisr; + vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr; kvmppc_book3s_queue_irqprio(vcpu, exit_nr); kvmppc_mmu_pte_flush(vcpu, vcpu-arch.dear, ~0xFFFUL); r = RESUME_GUEST; @@ -994,7 +995,7 @@ program_interrupt: } case BOOK3S_INTERRUPT_ALIGNMENT: if (kvmppc_read_inst(vcpu) == EMULATE_DONE) { - to_book3s(vcpu)-dsisr = kvmppc_alignment_dsisr(vcpu, + vcpu-arch.shared-dsisr = kvmppc_alignment_dsisr(vcpu, kvmppc_get_last_inst(vcpu)); vcpu-arch.dear = kvmppc_alignment_dar(vcpu, kvmppc_get_last_inst(vcpu)); diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 35d3c16..9982ff1 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -221,7 +221,7 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, else if (r == -EPERM) dsisr |= DSISR_PROTFAULT; - to_book3s(vcpu)-dsisr = dsisr; + vcpu-arch.shared-dsisr = dsisr; to_svcpu(vcpu)-fault_dsisr = dsisr; kvmppc_book3s_queue_irqprio(vcpu, @@ -327,7 +327,7 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs) to_book3s(vcpu)-sdr1 = spr_val; break; case SPRN_DSISR: - to_book3s(vcpu)-dsisr = spr_val; + vcpu-arch.shared-dsisr = spr_val; break; case SPRN_DAR: vcpu-arch.dear = spr_val; @@ -440,7 +440,7 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt) kvmppc_set_gpr(vcpu, rt, to_book3s(vcpu)-sdr1);
[PATCH 21/27] KVM: PPC: Introduce branch patching helper
We will need to patch several instruction streams over to a different code path, so we need a way to patch a single instruction with a branch somewhere else. This patch adds a helper to facilitate this patching. Signed-off-by: Alexander Graf ag...@suse.de --- v2 -> v3: - add safety check for relocatable kernels --- arch/powerpc/kernel/kvm.c | 14 ++ 1 files changed, 14 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 926f93f..239a70d 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -107,6 +107,20 @@ static void kvm_patch_ins_nop(u32 *inst) kvm_patch_ins(inst, KVM_INST_NOP); } +static void kvm_patch_ins_b(u32 *inst, int addr) +{ +#ifdef CONFIG_RELOCATABLE + /* On relocatable kernels interrupt handlers and our code + can be in different regions, so we don't patch them */ + + extern u32 __end_interrupts; + if ((ulong)inst < (ulong)&__end_interrupts) + return; +#endif + + kvm_patch_ins(inst, KVM_INST_B | (addr & KVM_INST_B_MASK)); +} + static u32 *kvm_alloc(int len) { u32 *p; -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 24/27] KVM: PPC: PV mtmsrd L=0 and mtmsr
There is also a form of mtmsr where all bits need to be addressed. While the PPC64 Linux kernel behaves resonably well here, on PPC32 we do not have an L=1 form. It does mtmsr even for simple things like only changing EE. So we need to hook into that one as well and check for a mask of bits that we deem safe to change from within guest context. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - use kvm_patch_ins_b --- arch/powerpc/kernel/kvm.c | 51 arch/powerpc/kernel/kvm_emul.S | 84 2 files changed, 135 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 717ab0d..8ac57e2 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -63,7 +63,9 @@ #define KVM_INST_MTSPR_DSISR 0x7c1203a6 #define KVM_INST_TLBSYNC 0x7c00046c +#define KVM_INST_MTMSRD_L0 0x7c000164 #define KVM_INST_MTMSRD_L1 0x7c010164 +#define KVM_INST_MTMSR 0x7c000124 static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; @@ -176,6 +178,49 @@ static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt) kvm_patch_ins_b(inst, distance_start); } +extern u32 kvm_emulate_mtmsr_branch_offs; +extern u32 kvm_emulate_mtmsr_reg1_offs; +extern u32 kvm_emulate_mtmsr_reg2_offs; +extern u32 kvm_emulate_mtmsr_reg3_offs; +extern u32 kvm_emulate_mtmsr_orig_ins_offs; +extern u32 kvm_emulate_mtmsr_len; +extern u32 kvm_emulate_mtmsr[]; + +static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_mtmsr_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)p[kvm_emulate_mtmsr_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_mtmsr, 
kvm_emulate_mtmsr_len * 4); + p[kvm_emulate_mtmsr_branch_offs] |= distance_end KVM_INST_B_MASK; + p[kvm_emulate_mtmsr_reg1_offs] |= rt; + p[kvm_emulate_mtmsr_reg2_offs] |= rt; + p[kvm_emulate_mtmsr_reg3_offs] |= rt; + p[kvm_emulate_mtmsr_orig_ins_offs] = *inst; + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsr_len * 4); + + /* Patch the invocation */ + kvm_patch_ins_b(inst, distance_start); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -256,6 +301,12 @@ static void kvm_check_ins(u32 *inst) if (get_rt(inst_rt) 30) kvm_patch_ins_mtmsrd(inst, inst_rt); break; + case KVM_INST_MTMSR: + case KVM_INST_MTMSRD_L0: + /* We use r30 and r31 during the hook */ + if (get_rt(inst_rt) 30) + kvm_patch_ins_mtmsr(inst, inst_rt); + break; } switch (_inst) { diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index 10dc4a6..8cd22f4 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -120,3 +120,87 @@ kvm_emulate_mtmsrd_reg_offs: .global kvm_emulate_mtmsrd_len kvm_emulate_mtmsrd_len: .long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4 + + +#define MSR_SAFE_BITS (MSR_EE | MSR_CE | MSR_ME | MSR_RI) +#define MSR_CRITICAL_BITS ~MSR_SAFE_BITS + +.global kvm_emulate_mtmsr +kvm_emulate_mtmsr: + + SCRATCH_SAVE + + /* Fetch old MSR in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Find the changed bits between old and new MSR */ +kvm_emulate_mtmsr_reg1: + xor r31, r0, r31 + + /* Check if we need to really do mtmsr */ + LOAD_REG_IMMEDIATE(r30, MSR_CRITICAL_BITS) + and.r31, r31, r30 + + /* No critical bits changed? Maybe we can stay in the guest. 
*/ + beq maybe_stay_in_guest + +do_mtmsr: + + SCRATCH_RESTORE + + /* Just fire off the mtmsr if it's critical */ +kvm_emulate_mtmsr_orig_ins: + mtmsr r0 + + b kvm_emulate_mtmsr_branch + +maybe_stay_in_guest: + + /* Check if we have to fetch an interrupt */ + lwz r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0) + cmpwi r31, 0 + beq+no_mtmsr + + /* Check if we may trigger an interrupt */ +kvm_emulate_mtmsr_reg2: + andi. r31, r0, MSR_EE + beq no_mtmsr + + b do_mtmsr + +no_mtmsr: + + /* Put MSR into magic page because we don't call mtmsr */ +kvm_emulate_mtmsr_reg3: + STL64(r0, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + SCRATCH_RESTORE + + /* Go back to
[PATCH 27/27] KVM: PPC: Add get_pvinfo interface to query hypercall instructions
We need to tell the guest the opcodes that make up a hypercall through interfaces that are controlled by userspace. So we need to add a call for userspace to allow it to query those opcodes so it can pass them on. This is required because the hypercall opcodes can change based on the hypervisor conditions. If we're running in hardware accelerated hypervisor mode, a hypercall looks different from when we're running without hardware acceleration. Signed-off-by: Alexander Graf ag...@suse.de --- Documentation/kvm/api.txt | 23 +++ arch/powerpc/kvm/powerpc.c | 38 ++ include/linux/kvm.h| 11 +++ 3 files changed, 72 insertions(+), 0 deletions(-) diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt index 5f5b649..44d9893 100644 --- a/Documentation/kvm/api.txt +++ b/Documentation/kvm/api.txt @@ -1032,6 +1032,29 @@ are defined as follows: eax, ebx, ecx, edx: the values returned by the cpuid instruction for this function/index combination +4.46 KVM_PPC_GET_PVINFO + +Capability: KVM_CAP_PPC_GET_PVINFO +Architectures: ppc +Type: vm ioctl +Parameters: struct kvm_ppc_pvinfo (out) +Returns: 0 on success, !0 on error + +struct kvm_ppc_pvinfo { + __u32 flags; + __u32 hcall[4]; + __u8 pad[108]; +}; + +This ioctl fetches PV specific information that need to be passed to the guest +using the device tree or other means from vm context. + +For now the only implemented piece of information distributed here is an array +of 4 instructions that make up a hypercall. + +If any additional field gets added to this structure later on, a bit for that +additional piece of information will be set in the flags bitmap. + 5. 
The kvm_run structure Application code obtains a pointer to the kvm_run structure by diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index fecfe04..6a53a3f 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -191,6 +191,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_PPC_UNSET_IRQ: case KVM_CAP_ENABLE_CAP: case KVM_CAP_PPC_OSI: + case KVM_CAP_PPC_GET_PVINFO: r = 1; break; case KVM_CAP_COALESCED_MMIO: @@ -578,16 +579,53 @@ out: return r; } +static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo) +{ + u32 inst_lis = 0x3c00; + u32 inst_ori = 0x6000; + u32 inst_nop = 0x6000; + u32 inst_sc = 0x4402; + u32 inst_imm_mask = 0x; + + /* +* The hypercall to get into KVM from within guest context is as +* follows: +* +*lis r0, r0, kvm_sc_magic...@h +*ori r0, kvm_sc_magic...@l +*sc +*nop +*/ + pvinfo-hcall[0] = inst_lis | ((KVM_SC_MAGIC_R0 16) inst_imm_mask); + pvinfo-hcall[1] = inst_ori | (KVM_SC_MAGIC_R0 inst_imm_mask); + pvinfo-hcall[2] = inst_sc; + pvinfo-hcall[3] = inst_nop; + + return 0; +} + long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { + void __user *argp = (void __user *)arg; long r; switch (ioctl) { + case KVM_PPC_GET_PVINFO: { + struct kvm_ppc_pvinfo pvinfo; + r = kvm_vm_ioctl_get_pvinfo(pvinfo); + if (copy_to_user(argp, pvinfo, sizeof(pvinfo))) { + r = -EFAULT; + goto out; + } + + break; + } default: r = -ENOTTY; } +out: return r; } diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 636fc38..3707704 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -414,6 +414,14 @@ struct kvm_enable_cap { __u8 pad[64]; }; +/* for KVM_PPC_GET_PVINFO */ +struct kvm_ppc_pvinfo { + /* out */ + __u32 flags; + __u32 hcall[4]; + __u8 pad[108]; +}; + #define KVMIO 0xAE /* @@ -530,6 +538,7 @@ struct kvm_enable_cap { #ifdef __KVM_HAVE_XCRS #define KVM_CAP_XCRS 56 #endif +#define KVM_CAP_PPC_GET_PVINFO 57 #ifdef KVM_CAP_IRQ_ROUTING @@ -664,6 +673,8 @@ struct 
kvm_clock_data { /* Available with KVM_CAP_PIT_STATE2 */ #define KVM_GET_PIT2 _IOR(KVMIO, 0x9f, struct kvm_pit_state2) #define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2) +/* Available with KVM_CAP_PPC_GET_PVINFO */ +#define KVM_PPC_GET_PVINFO _IOW(KVMIO, 0xa1, struct kvm_ppc_pvinfo) /* * ioctls for vcpu fds -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 26/27] KVM: PPC: Add Documentation about PV interface
We just introduced a new PV interface that screams for documentation. So here it is - a shiny new and awesome text file describing the internal works of the PPC KVM paravirtual interface. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - clarify guest implementation - clarify that privileged instructions still work - explain safe MSR bits - Fix dsisr patch description - change hypervisor calls to use new register values v2 - v3: - update documentation to new hypercall interface - change detection to be device tree based --- Documentation/kvm/ppc-pv.txt | 180 ++ 1 files changed, 180 insertions(+), 0 deletions(-) create mode 100644 Documentation/kvm/ppc-pv.txt diff --git a/Documentation/kvm/ppc-pv.txt b/Documentation/kvm/ppc-pv.txt new file mode 100644 index 000..960cd51 --- /dev/null +++ b/Documentation/kvm/ppc-pv.txt @@ -0,0 +1,180 @@ +The PPC KVM paravirtual interface += + +The basic execution principle by which KVM on PowerPC works is to run all kernel +space code in PR=1 which is user space. This way we trap all privileged +instructions and can emulate them accordingly. + +Unfortunately that is also the downfall. There are quite some privileged +instructions that needlessly return us to the hypervisor even though they +could be handled differently. + +This is what the PPC PV interface helps with. It takes privileged instructions +and transforms them into unprivileged ones with some help from the hypervisor. +This cuts down virtualization costs by about 50% on some of my benchmarks. + +The code for that interface can be found in arch/powerpc/kernel/kvm* + +Querying for existence +== + +To find out if we're running on KVM or not, we leverage the device tree. When +Linux is running on KVM, a node /hypervisor exists. That node contains a +compatible property with the value linux,kvm. + +Once you determined you're running under a PV capable KVM, you can now use +hypercalls as described below. 
+ +KVM hypercalls +== + +Inside the device tree's /hypervisor node there's a property called +'hypercall-instructions'. This property contains at most 4 opcodes that make +up the hypercall. To call a hypercall, just call these instructions. + +The parameters are as follows: + + RegisterIN OUT + + r0 - volatile + r3 1st parameter Return code + r4 2nd parameter 1st output value + r5 3rd parameter 2nd output value + r6 4th parameter 3rd output value + r7 5th parameter 4th output value + r8 6th parameter 5th output value + r9 7th parameter 6th output value + r10 8th parameter 7th output value + r11 hypercall number8th output value + r12 - volatile + +Hypercall definitions are shared in generic code, so the same hypercall numbers +apply for x86 and powerpc alike with the exception that each KVM hypercall +also needs to be ORed with the KVM vendor code which is (42 16). + +Return codes can be as follows: + + CodeMeaning + + 0 Success + 12 Hypercall not implemented + 0 Error + +The magic page +== + +To enable communication between the hypervisor and guest there is a new shared +page that contains parts of supervisor visible register state. The guest can +map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE. + +With this hypercall issued the guest always gets the magic page mapped at the +desired location in effective and physical address space. For now, we always +map the page to -4096. This way we can access it using absolute load and store +functions. The following instruction reads the first field of the magic page: + + ld rX, -4096(0) + +The interface is designed to be extensible should there be need later to add +additional registers to the magic page. If you add fields to the magic page, +also define a new hypercall feature to indicate that the host can give you more +registers. Only if the host supports the additional features, make use of them. 
+ +The magic page has the following layout as described in +arch/powerpc/include/asm/kvm_para.h: + +struct kvm_vcpu_arch_shared { + __u64 scratch1; + __u64 scratch2; + __u64 scratch3; + __u64 critical; /* Guest may not get interrupts if == r1 */ + __u64 sprg0; + __u64 sprg1; + __u64 sprg2; + __u64 sprg3; + __u64 srr0; + __u64 srr1; + __u64 dar; + __u64 msr; + __u32 dsisr; + __u32 int_pending; /* Tells the guest if we have an interrupt */ +}; + +Additions to the
[PATCH 22/27] KVM: PPC: PV assembler helpers
When we hook an instruction we need to make sure we don't clobber any of the registers at that point. So we write them out to scratch space in the magic page. To make sure we don't fall into a race with another piece of hooked code, we need to disable interrupts. To make the later patches and code in general easier readable, let's introduce a set of defines that save and restore r30, r31 and cr. Let's also define some helpers to read the lower 32 bits of a 64 bit field on 32 bit systems. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm_emul.S | 29 + 1 files changed, 29 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index e0d4183..1dac72d 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -35,3 +35,32 @@ kvm_hypercall_start: #define KVM_MAGIC_PAGE (-4096) +#ifdef CONFIG_64BIT +#define LL64(reg, offs, reg2) ld reg, (offs)(reg2) +#define STL64(reg, offs, reg2) std reg, (offs)(reg2) +#else +#define LL64(reg, offs, reg2) lwz reg, (offs + 4)(reg2) +#define STL64(reg, offs, reg2) stw reg, (offs + 4)(reg2) +#endif + +#define SCRATCH_SAVE \ + /* Enable critical section. We are critical if \ + shared-critical == r1 */\ + STL64(r1, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0); \ + \ + /* Save state */\ + PPC_STL r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH1)(0); \ + PPC_STL r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH2)(0); \ + mfcrr31;\ + stw r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH3)(0); + +#define SCRATCH_RESTORE \ + /* Restore state */ \ + PPC_LL r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH1)(0); \ + lwz r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH3)(0); \ + mtcrr30;\ + PPC_LL r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH2)(0); \ + \ + /* Disable critical section. 
We are critical if \ + shared-critical == r1 and r2 is always != r1 */ \ + STL64(r2, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0); -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 19/27] KVM: PPC: PV tlbsync to nop
With our current MMU scheme we don't need to know about the tlbsync instruction. So we can just nop it out. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - use kvm_patch_ins --- arch/powerpc/kernel/kvm.c | 12 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 9ec572c..3258922 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -62,6 +62,8 @@ #define KVM_INST_MTSPR_DAR 0x7c1303a6 #define KVM_INST_MTSPR_DSISR 0x7c1203a6 +#define KVM_INST_TLBSYNC 0x7c00046c + static bool kvm_patching_worked = true; static inline void kvm_patch_ins(u32 *inst, u32 new_inst) @@ -98,6 +100,11 @@ static void kvm_patch_ins_stw(u32 *inst, long addr, u32 rt) kvm_patch_ins(inst, KVM_INST_STW | rt | (addr 0xfffc)); } +static void kvm_patch_ins_nop(u32 *inst) +{ + kvm_patch_ins(inst, KVM_INST_NOP); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -166,6 +173,11 @@ static void kvm_check_ins(u32 *inst) case KVM_INST_MTSPR_DSISR: kvm_patch_ins_stw(inst, magic_var(dsisr), inst_rt); break; + + /* Nops */ + case KVM_INST_TLBSYNC: + kvm_patch_ins_nop(inst); + break; } switch (_inst) { -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 16/27] KVM: PPC: Generic KVM PV guest support
We have all the hypervisor pieces in place now, but the guest parts are still missing. This patch implements basic awareness of KVM when running Linux as guest. It doesn't do anything with it yet though. Signed-off-by: Alexander Graf ag...@suse.de --- v2 - v3: - Add hypercall stub --- arch/powerpc/kernel/Makefile |2 +- arch/powerpc/kernel/asm-offsets.c | 15 +++ arch/powerpc/kernel/kvm.c |3 +++ arch/powerpc/kernel/kvm_emul.S| 37 + arch/powerpc/platforms/Kconfig| 10 ++ 5 files changed, 66 insertions(+), 1 deletions(-) create mode 100644 arch/powerpc/kernel/kvm_emul.S diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 5ea853d..d8e29b4 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -125,7 +125,7 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),) obj-y += ppc_save_regs.o endif -obj-$(CONFIG_KVM_GUEST)+= kvm.o +obj-$(CONFIG_KVM_GUEST)+= kvm.o kvm_emul.o # Disable GCOV in odd or sensitive code GCOV_PROFILE_prom_init.o := n diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index a55d47e..e3e740b 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -465,6 +465,21 @@ int main(void) DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr)); #endif /* CONFIG_PPC_BOOK3S */ #endif + +#ifdef CONFIG_KVM_GUEST + DEFINE(KVM_MAGIC_SCRATCH1, offsetof(struct kvm_vcpu_arch_shared, + scratch1)); + DEFINE(KVM_MAGIC_SCRATCH2, offsetof(struct kvm_vcpu_arch_shared, + scratch2)); + DEFINE(KVM_MAGIC_SCRATCH3, offsetof(struct kvm_vcpu_arch_shared, + scratch3)); + DEFINE(KVM_MAGIC_INT, offsetof(struct kvm_vcpu_arch_shared, + int_pending)); + DEFINE(KVM_MAGIC_MSR, offsetof(struct kvm_vcpu_arch_shared, msr)); + DEFINE(KVM_MAGIC_CRITICAL, offsetof(struct kvm_vcpu_arch_shared, + critical)); +#endif + #ifdef CONFIG_44x DEFINE(PGD_T_LOG2, PGD_T_LOG2); DEFINE(PTE_T_LOG2, PTE_T_LOG2); diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 4f85505..a5ece71 
100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -30,6 +30,9 @@ #include asm/cacheflush.h #include asm/disassemble.h +#define KVM_MAGIC_PAGE (-4096L) +#define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) + unsigned long kvm_hypercall(unsigned long *in, unsigned long *out, unsigned long nr) diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S new file mode 100644 index 000..e0d4183 --- /dev/null +++ b/arch/powerpc/kernel/kvm_emul.S @@ -0,0 +1,37 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright SUSE Linux Products GmbH 2010 + * + * Authors: Alexander Graf ag...@suse.de + */ + +#include asm/ppc_asm.h +#include asm/kvm_asm.h +#include asm/reg.h +#include asm/page.h +#include asm/asm-offsets.h + +/* Hypercall entry point. Will be patched with device tree instructions. 
*/ + +.global kvm_hypercall_start +kvm_hypercall_start: + li r3, -1 + nop + nop + nop + blr + +#define KVM_MAGIC_PAGE (-4096) + diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig index d1663db..1744349 100644 --- a/arch/powerpc/platforms/Kconfig +++ b/arch/powerpc/platforms/Kconfig @@ -21,6 +21,16 @@ source arch/powerpc/platforms/44x/Kconfig source arch/powerpc/platforms/40x/Kconfig source arch/powerpc/platforms/amigaone/Kconfig +config KVM_GUEST + bool KVM Guest support + default y + ---help--- + This option enables various optimizations for running under the KVM + hypervisor. Overhead for the kernel when not running inside KVM should + be minimal. + + In case of doubt, say Y + config PPC_NATIVE bool depends on 6xx || PPC64 -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org
[PATCH 23/27] KVM: PPC: PV mtmsrd L=1
The PowerPC ISA has a special instruction for mtmsr that only changes the EE and RI bits, namely the L=1 form. Since that one is reasonably often occurring and simple to implement, let's go with this first. Writing EE=0 is always just a store. Doing EE=1 also requires us to check for pending interrupts and if necessary exit back to the hypervisor. Signed-off-by: Alexander Graf ag...@suse.de --- v1 -> v2: - use kvm_patch_ins_b --- arch/powerpc/kernel/kvm.c | 45 arch/powerpc/kernel/kvm_emul.S | 56 2 files changed, 101 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 239a70d..717ab0d 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -63,6 +63,7 @@ #define KVM_INST_MTSPR_DSISR 0x7c1203a6 #define KVM_INST_TLBSYNC 0x7c00046c +#define KVM_INST_MTMSRD_L1 0x7c010164 static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; @@ -138,6 +139,43 @@ static u32 *kvm_alloc(int len) return p; } +extern u32 kvm_emulate_mtmsrd_branch_offs; +extern u32 kvm_emulate_mtmsrd_reg_offs; +extern u32 kvm_emulate_mtmsrd_len; +extern u32 kvm_emulate_mtmsrd[]; + +static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_mtmsrd_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)&p[kvm_emulate_mtmsrd_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start > KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_mtmsrd, kvm_emulate_mtmsrd_len * 4); + p[kvm_emulate_mtmsrd_branch_offs] |= distance_end & KVM_INST_B_MASK; + p[kvm_emulate_mtmsrd_reg_offs] |= rt; + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsrd_len * 4); + + /* Patch the invocation */ +
kvm_patch_ins_b(inst, distance_start); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -211,6 +249,13 @@ static void kvm_check_ins(u32 *inst) case KVM_INST_TLBSYNC: kvm_patch_ins_nop(inst); break; + + /* Rewrites */ + case KVM_INST_MTMSRD_L1: + /* We use r30 and r31 during the hook */ + if (get_rt(inst_rt) 30) + kvm_patch_ins_mtmsrd(inst, inst_rt); + break; } switch (_inst) { diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index 1dac72d..10dc4a6 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -64,3 +64,59 @@ kvm_hypercall_start: /* Disable critical section. We are critical if \ shared-critical == r1 and r2 is always != r1 */ \ STL64(r2, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0); + +.global kvm_emulate_mtmsrd +kvm_emulate_mtmsrd: + + SCRATCH_SAVE + + /* Put MSR ~(MSR_EE|MSR_RI) in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + lis r30, (~(MSR_EE | MSR_RI))@h + ori r30, r30, (~(MSR_EE | MSR_RI))@l + and r31, r31, r30 + + /* OR the register's (MSR_EE|MSR_RI) on MSR */ +kvm_emulate_mtmsrd_reg: + andi. r30, r0, (MSR_EE|MSR_RI) + or r31, r31, r30 + + /* Put MSR back into magic page */ + STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Check if we have to fetch an interrupt */ + lwz r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0) + cmpwi r31, 0 + beq+no_check + + /* Check if we may trigger an interrupt */ + andi. r30, r30, MSR_EE + beq no_check + + SCRATCH_RESTORE + + /* Nag hypervisor */ + tlbsync + + b kvm_emulate_mtmsrd_branch + +no_check: + + SCRATCH_RESTORE + + /* Go back to caller */ +kvm_emulate_mtmsrd_branch: + b . 
+kvm_emulate_mtmsrd_end: + +.global kvm_emulate_mtmsrd_branch_offs +kvm_emulate_mtmsrd_branch_offs: + .long (kvm_emulate_mtmsrd_branch - kvm_emulate_mtmsrd) / 4 + +.global kvm_emulate_mtmsrd_reg_offs +kvm_emulate_mtmsrd_reg_offs: + .long (kvm_emulate_mtmsrd_reg - kvm_emulate_mtmsrd) / 4 + +.global kvm_emulate_mtmsrd_len +kvm_emulate_mtmsrd_len: + .long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4 -- 1.6.0.2
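The distance check in kvm_patch_ins_mtmsrd exists because the b instruction's LI field is only 26 bits (word-aligned, sign-extended), i.e. roughly a ±32 MB reach. A rough model of that check (the exact KVM_INST_B_MAX/KVM_INST_B_MASK values are assumed here, not taken from this patch):

```python
B_MASK = 0x03fffffc          # LI field of the PowerPC I-form branch
B_MAX = 0x01fffffc           # assumed largest forward displacement

def branch_reachable(distance):
    # must be word aligned and fit the signed 26-bit LI field
    return (distance & 3) == 0 and -0x02000000 <= distance <= B_MAX

assert branch_reachable(0x1000)
assert not branch_reachable(0x02000000)   # one past the forward range
assert not branch_reachable(6)            # not word aligned
```

If the emulation chunk allocated by kvm_alloc lands out of range, the code simply gives up and marks patching as failed rather than emitting a branch that silently wraps.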
[PATCH 13/27] KVM: PPC: Magic Page Book3s support
We need to override EA as well as PA lookups for the magic page. When the guest tells us to project it, the magic page overrides any guest mappings. In order to reflect that, we need to hook into all the MMU layers of KVM to force map the magic page if necessary. Signed-off-by: Alexander Graf ag...@suse.de --- v2 - v3: - RMO - PAM - combine 32 and 64 real page magic override - remove leftover goto point - align hypercalls to in/out of ePAPR --- arch/powerpc/include/asm/kvm_book3s.h |1 + arch/powerpc/kvm/book3s.c | 35 ++-- arch/powerpc/kvm/book3s_32_mmu.c | 16 +++ arch/powerpc/kvm/book3s_32_mmu_host.c |2 +- arch/powerpc/kvm/book3s_64_mmu.c | 30 +++- arch/powerpc/kvm/book3s_64_mmu_host.c |9 +-- 6 files changed, 81 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index b5b1961..00cf8b0 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -130,6 +130,7 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat, bool upper, u32 val); extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr); extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu); +extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn); extern u32 kvmppc_trampoline_lowmem; extern u32 kvmppc_trampoline_enter; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 0ed5376..eee97b5 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -419,6 +419,25 @@ void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr) } } +pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn) +{ + ulong mp_pa = vcpu-arch.magic_page_pa; + + /* Magic page override */ + if (unlikely(mp_pa) + unlikely(((gfn PAGE_SHIFT) KVM_PAM) == +((mp_pa PAGE_MASK) KVM_PAM))) { + ulong shared_page = ((ulong)vcpu-arch.shared) PAGE_MASK; + pfn_t pfn; + + pfn = (pfn_t)virt_to_phys((void*)shared_page) PAGE_SHIFT; + get_page(pfn_to_page(pfn)); 
+ return pfn; + } + + return gfn_to_pfn(vcpu-kvm, gfn); +} + /* Book3s_32 CPUs always have 32 bytes cache line size, which Linux assumes. To * make Book3s_32 Linux work on Book3s_64, we have to make sure we trap dcbz to * emulate 32 bytes dcbz length. @@ -554,6 +573,13 @@ mmio: static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn) { + ulong mp_pa = vcpu-arch.magic_page_pa; + + if (unlikely(mp_pa) + unlikely((mp_pa KVM_PAM) PAGE_SHIFT == gfn)) { + return 1; + } + return kvm_is_visible_gfn(vcpu-kvm, gfn); } @@ -1257,6 +1283,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) struct kvmppc_vcpu_book3s *vcpu_book3s; struct kvm_vcpu *vcpu; int err = -ENOMEM; + unsigned long p; vcpu_book3s = vmalloc(sizeof(struct kvmppc_vcpu_book3s)); if (!vcpu_book3s) @@ -1274,8 +1301,10 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto free_shadow_vcpu; - vcpu-arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); - if (!vcpu-arch.shared) + p = __get_free_page(GFP_KERNEL|__GFP_ZERO); + /* the real shared page fills the last 4k of our page */ + vcpu-arch.shared = (void*)(p + PAGE_SIZE - 4096); + if (!p) goto uninit_vcpu; vcpu-arch.host_retip = kvm_return_point; @@ -1322,7 +1351,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu); - free_page((unsigned long)vcpu-arch.shared); + free_page((unsigned long)vcpu-arch.shared PAGE_MASK); kvm_vcpu_uninit(vcpu); kfree(vcpu_book3s-shadow_vcpu); vfree(vcpu_book3s); diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index 449bce5..a7d121a 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -281,8 +281,24 @@ static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data) { int r; + ulong mp_ea = vcpu-arch.magic_page_ea; pte-eaddr = eaddr; + + /* Magic page override */ + if (unlikely(mp_ea) + 
unlikely((eaddr & ~0xfffULL) == (mp_ea & ~0xfffULL)) && !(vcpu->arch.shared->msr & MSR_PR)) { pte->vpage = kvmppc_mmu_book3s_32_ea_to_vp(vcpu, eaddr, data); pte->raddr = vcpu->arch.magic_page_pa | (pte->raddr & 0xfff); pte->raddr &= KVM_PAM; pte->may_execute = true; +
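The magic-page override logic above boils down to: if the guest has mapped the magic page (mp_pa/mp_ea non-zero) and the looked-up address falls inside its 4k page, redirect the translation to the shared page. A simplified model of the physical-address side (KVM_PAM masking and the MSR_PR check are elided; names are illustrative):

```python
PAGE_SHIFT = 12

def magic_override(gfn, mp_pa):
    """Return True when this gfn should map to the shared magic page."""
    if mp_pa == 0:   # guest never issued KVM_HC_PPC_MAP_MAGIC_PAGE
        return False
    # compare frame numbers: does gfn point at the magic page's frame?
    return (gfn << PAGE_SHIFT) == (mp_pa & ~0xFFF)

assert magic_override(0x3FF, 0x3FF000)
assert not magic_override(0x3FE, 0x3FF000)
assert not magic_override(0x3FF, 0)   # magic page not mapped
```

This is why the hook has to sit in every MMU flavor (32-bit, 64-bit, EA and PA lookups): any path that resolves the magic address must yield the shared page instead of whatever the guest page tables say.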
[PATCH 25/27] KVM: PPC: PV wrteei
On BookE the preferred way to write the EE bit is the wrteei instruction. It already encodes the EE bit in the instruction. So in order to get BookE some speedups as well, let's also PV'nize that instruction. Signed-off-by: Alexander Graf ag...@suse.de --- v1 -> v2: - use kvm_patch_ins_b --- arch/powerpc/kernel/kvm.c | 50 arch/powerpc/kernel/kvm_emul.S | 41 2 files changed, 91 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 8ac57e2..e936817 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -67,6 +67,9 @@ #define KVM_INST_MTMSRD_L1 0x7c010164 #define KVM_INST_MTMSR 0x7c000124 +#define KVM_INST_WRTEEI_0 0x7c000146 +#define KVM_INST_WRTEEI_1 0x7c008146 + static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; static int kvm_tmp_index; @@ -221,6 +224,47 @@ static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt) kvm_patch_ins_b(inst, distance_start); } +#ifdef CONFIG_BOOKE + +extern u32 kvm_emulate_wrteei_branch_offs; +extern u32 kvm_emulate_wrteei_ee_offs; +extern u32 kvm_emulate_wrteei_len; +extern u32 kvm_emulate_wrteei[]; + +static void kvm_patch_ins_wrteei(u32 *inst) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_wrteei_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)&p[kvm_emulate_wrteei_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start > KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_wrteei, kvm_emulate_wrteei_len * 4); + p[kvm_emulate_wrteei_branch_offs] |= distance_end & KVM_INST_B_MASK; + p[kvm_emulate_wrteei_ee_offs] |= (*inst & MSR_EE); + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_wrteei_len * 4); + + /* Patch the invocation */ +
kvm_patch_ins_b(inst, distance_start); +} + +#endif + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -310,6 +354,12 @@ static void kvm_check_ins(u32 *inst) } switch (_inst) { +#ifdef CONFIG_BOOKE + case KVM_INST_WRTEEI_0: + case KVM_INST_WRTEEI_1: + kvm_patch_ins_wrteei(inst); + break; +#endif } } diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index 8cd22f4..3199f65 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -204,3 +204,44 @@ kvm_emulate_mtmsr_orig_ins_offs: .global kvm_emulate_mtmsr_len kvm_emulate_mtmsr_len: .long (kvm_emulate_mtmsr_end - kvm_emulate_mtmsr) / 4 + + + +.global kvm_emulate_wrteei +kvm_emulate_wrteei: + + SCRATCH_SAVE + + /* Fetch old MSR in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Remove MSR_EE from old MSR */ + li r30, 0 + ori r30, r30, MSR_EE + andc r31, r31, r30 + + /* OR new MSR_EE onto the old MSR */ +kvm_emulate_wrteei_ee: + ori r31, r31, 0 + + /* Write new MSR value back */ + STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + SCRATCH_RESTORE + + /* Go back to caller */ +kvm_emulate_wrteei_branch: + b . +kvm_emulate_wrteei_end: + +.global kvm_emulate_wrteei_branch_offs +kvm_emulate_wrteei_branch_offs: + .long (kvm_emulate_wrteei_branch - kvm_emulate_wrteei) / 4 + +.global kvm_emulate_wrteei_ee_offs +kvm_emulate_wrteei_ee_offs: + .long (kvm_emulate_wrteei_ee - kvm_emulate_wrteei) / 4 + +.global kvm_emulate_wrteei_len +kvm_emulate_wrteei_len: + .long (kvm_emulate_wrteei_end - kvm_emulate_wrteei) / 4 -- 1.6.0.2
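The two wrteei encodings differ only in the E bit, which conveniently sits at the same bit position as MSR_EE; that is why kvm_patch_ins_wrteei can OR (*inst & MSR_EE) straight into the `ori r31, r31, 0` template slot. An illustration:

```python
MSR_EE = 0x8000                 # external-interrupt enable bit
KVM_INST_WRTEEI_0 = 0x7c000146  # wrteei 0
KVM_INST_WRTEEI_1 = 0x7c008146  # wrteei 1

# the E bit in the instruction word lines up with MSR_EE
assert KVM_INST_WRTEEI_1 ^ KVM_INST_WRTEEI_0 == MSR_EE
assert KVM_INST_WRTEEI_1 & MSR_EE == MSR_EE
assert KVM_INST_WRTEEI_0 & MSR_EE == 0
```

So a single emulation template handles both wrteei variants: the patcher copies the desired EE value out of the original instruction into the immediate of the template's `ori`.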
[PATCH 1/7] KVM: PPC: Book3S_32 MMU debug compile fixes
Due to previous changes, the Book3S_32 guest MMU code didn't compile properly when enabling debugging. This patch repairs the broken code paths, making it possible to define DEBUG_MMU and friends again. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s_32_mmu.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index a7d121a..5bf4bf8 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -104,7 +104,7 @@ static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvmppc_vcpu_book3 pteg = (vcpu_book3s->sdr1 & 0x) | hash; dprintk("MMU: pc=0x%lx eaddr=0x%lx sdr1=0x%llx pteg=0x%x vsid=0x%x\n", - vcpu_book3s->vcpu.arch.pc, eaddr, vcpu_book3s->sdr1, pteg, + kvmppc_get_pc(&vcpu_book3s->vcpu), eaddr, vcpu_book3s->sdr1, pteg, sre->vsid); r = gfn_to_hva(vcpu_book3s->vcpu.kvm, pteg >> PAGE_SHIFT); @@ -269,7 +269,7 @@ no_page_found: dprintk_pte("KVM MMU: No PTE found (sdr1=0x%llx ptegp=0x%lx)\n", to_book3s(vcpu)->sdr1, ptegp); for (i=0; i < 16; i+=2) { - dprintk_pte(" %02d: 0x%x - 0x%x (0x%llx)\n", + dprintk_pte(" %02d: 0x%x - 0x%x (0x%x)\n", i, pteg[i], pteg[i+1], ptem); } } -- 1.6.0.2
[PATCH 7/7] KVM: PPC: Move KVM trampolines before __end_interrupts
When using a relocatable kernel we need to make sure that the trampoline code and the interrupt handlers are both copied to low memory. The only way to do this reliably is to put them in the copied section. This patch should make relocated kernels work with KVM. KVM-Stable-Tag Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/exceptions-64s.S |6 ++ arch/powerpc/kernel/head_64.S|6 -- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 3e423fb..a0f25fb 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -299,6 +299,12 @@ slb_miss_user_pseries: b . /* prevent spec. execution */ #endif /* __DISABLED__ */ +/* KVM's trampoline code needs to be close to the interrupt handlers */ + +#ifdef CONFIG_KVM_BOOK3S_64_HANDLER +#include "../kvm/book3s_rmhandlers.S" +#endif + .align 7 .globl __end_interrupts __end_interrupts: diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index 844a44b..d3010a3 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -166,12 +166,6 @@ exception_marker: #include "exceptions-64s.S" #endif -/* KVM trampoline code needs to be close to the interrupt handlers */ - -#ifdef CONFIG_KVM_BOOK3S_64_HANDLER -#include "../kvm/book3s_rmhandlers.S" -#endif - _GLOBAL(generic_secondary_thread_init) mr r24,r3 -- 1.6.0.2
[PATCH 4/7] KVM: PPC: Add book3s_32 tlbie flush acceleration
On Book3s_32 the tlbie instruction flushed effective addresses by the mask 0x0000. This is pretty hard to reflect with a hash that hashes ~0xfff, so to speed up that target we should also keep a special hash around for it. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |4 +++ arch/powerpc/kvm/book3s_mmu_hpte.c | 40 ++ 2 files changed, 39 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index fafc71a..bba3b9b 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -42,9 +42,11 @@ #define HPTEG_CACHE_NUM(1 15) #define HPTEG_HASH_BITS_PTE13 +#define HPTEG_HASH_BITS_PTE_LONG 12 #define HPTEG_HASH_BITS_VPTE 13 #define HPTEG_HASH_BITS_VPTE_LONG 5 #define HPTEG_HASH_NUM_PTE (1 HPTEG_HASH_BITS_PTE) +#define HPTEG_HASH_NUM_PTE_LONG(1 HPTEG_HASH_BITS_PTE_LONG) #define HPTEG_HASH_NUM_VPTE(1 HPTEG_HASH_BITS_VPTE) #define HPTEG_HASH_NUM_VPTE_LONG (1 HPTEG_HASH_BITS_VPTE_LONG) @@ -163,6 +165,7 @@ struct kvmppc_mmu { struct hpte_cache { struct hlist_node list_pte; + struct hlist_node list_pte_long; struct hlist_node list_vpte; struct hlist_node list_vpte_long; struct rcu_head rcu_head; @@ -293,6 +296,7 @@ struct kvm_vcpu_arch { #ifdef CONFIG_PPC_BOOK3S struct hlist_head hpte_hash_pte[HPTEG_HASH_NUM_PTE]; + struct hlist_head hpte_hash_pte_long[HPTEG_HASH_NUM_PTE_LONG]; struct hlist_head hpte_hash_vpte[HPTEG_HASH_NUM_VPTE]; struct hlist_head hpte_hash_vpte_long[HPTEG_HASH_NUM_VPTE_LONG]; int hpte_cache_count; diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c b/arch/powerpc/kvm/book3s_mmu_hpte.c index b643893..02c64ab 100644 --- a/arch/powerpc/kvm/book3s_mmu_hpte.c +++ b/arch/powerpc/kvm/book3s_mmu_hpte.c @@ -45,6 +45,12 @@ static inline u64 kvmppc_mmu_hash_pte(u64 eaddr) return hash_64(eaddr PTE_SIZE, HPTEG_HASH_BITS_PTE); } +static inline u64 kvmppc_mmu_hash_pte_long(u64 eaddr) +{ + return hash_64((eaddr 0x0000) PTE_SIZE, + 
HPTEG_HASH_BITS_PTE_LONG); +} + static inline u64 kvmppc_mmu_hash_vpte(u64 vpage) { return hash_64(vpage 0xfULL, HPTEG_HASH_BITS_VPTE); @@ -66,6 +72,11 @@ void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte) index = kvmppc_mmu_hash_pte(pte-pte.eaddr); hlist_add_head_rcu(pte-list_pte, vcpu-arch.hpte_hash_pte[index]); + /* Add to ePTE_long list */ + index = kvmppc_mmu_hash_pte_long(pte-pte.eaddr); + hlist_add_head_rcu(pte-list_pte_long, + vcpu-arch.hpte_hash_pte_long[index]); + /* Add to vPTE list */ index = kvmppc_mmu_hash_vpte(pte-pte.vpage); hlist_add_head_rcu(pte-list_vpte, vcpu-arch.hpte_hash_vpte[index]); @@ -99,6 +110,7 @@ static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) spin_lock(vcpu-arch.mmu_lock); hlist_del_init_rcu(pte-list_pte); + hlist_del_init_rcu(pte-list_pte_long); hlist_del_init_rcu(pte-list_vpte); hlist_del_init_rcu(pte-list_vpte_long); @@ -150,10 +162,28 @@ static void kvmppc_mmu_pte_flush_page(struct kvm_vcpu *vcpu, ulong guest_ea) rcu_read_unlock(); } -void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong ea_mask) +static void kvmppc_mmu_pte_flush_long(struct kvm_vcpu *vcpu, ulong guest_ea) { - u64 i; + struct hlist_head *list; + struct hlist_node *node; + struct hpte_cache *pte; + + /* Find the list of entries in the map */ + list = vcpu-arch.hpte_hash_pte_long[ + kvmppc_mmu_hash_pte_long(guest_ea)]; + rcu_read_lock(); + + /* Check the list for matching entries and invalidate */ + hlist_for_each_entry_rcu(pte, node, list, list_pte_long) + if ((pte-pte.eaddr 0x0000UL) == guest_ea) + invalidate_pte(vcpu, pte); + + rcu_read_unlock(); +} + +void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong ea_mask) +{ dprintk_mmu(KVM: Flushing %d Shadow PTEs: 0x%lx 0x%lx\n, vcpu-arch.hpte_cache_count, guest_ea, ea_mask); @@ -164,9 +194,7 @@ void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong ea_mask) kvmppc_mmu_pte_flush_page(vcpu, guest_ea); break; case 
0x0000: - /* 32-bit flush w/o segment, go through all possible segments */ - for (i = 0; i 0x1ULL; i += 0x1000ULL) - kvmppc_mmu_pte_flush(vcpu, guest_ea | i, ~0xfffUL); + kvmppc_mmu_pte_flush_long(vcpu, guest_ea); break; case 0: /* Doing a complete
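The point of the new list_pte_long hash is that a segment-less tlbie-style flush ignores both the segment bits and the low page bits, so shadow entries must also be bucketed by the middle EA bits only. A toy model of the bucketing (the 0x0ffff000 mask is assumed from context since it is garbled above, and hash_64 is simplified, not the kernel's exact multiplier):

```python
def hash_64(val, bits):
    # simplified multiplicative hash in the spirit of the kernel's hash_64
    return ((val * 0x9E3779B97F4A7C15) & (2**64 - 1)) >> (64 - bits)

HASH_BITS_PTE_LONG = 12
PTE_SIZE = 12

def hash_pte_long(eaddr):
    # drop the segment bits (top nibble) and page offset before hashing
    return hash_64((eaddr & 0x0FFFF000) >> PTE_SIZE, HASH_BITS_PTE_LONG)

# same bucket regardless of segment and page offset:
assert hash_pte_long(0x12345678) == hash_pte_long(0x02345000)
```

With this extra hash the flush walks one short bucket instead of iterating every possible segment value, which is what the removed loop in kvmppc_mmu_pte_flush used to do.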
[PATCH 5/7] KVM: PPC: Use MSR_DR for external load_up
Book3S_32 requires MSR_DR to be disabled during load_up_xxx while on Book3S_64 it's supposed to be enabled. I misread the code and disabled it in both cases, potentially breaking the PS3 which has a really small RMA. This patch makes KVM work on the PS3 again. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s_rmhandlers.S | 28 +++- 1 files changed, 19 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S index 506d5c3..229d3d6 100644 --- a/arch/powerpc/kvm/book3s_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_rmhandlers.S @@ -202,8 +202,25 @@ _GLOBAL(kvmppc_rmcall) #if defined(CONFIG_PPC_BOOK3S_32) #define STACK_LR INT_FRAME_SIZE+4 + +/* load_up_xxx have to run with MSR_DR=0 on Book3S_32 */ +#define MSR_EXT_START \ + PPC_STL r20, _NIP(r1); \ + mfmsr r20;\ + LOAD_REG_IMMEDIATE(r3, MSR_DR|MSR_EE); \ + andcr3,r20,r3; /* Disable DR,EE */ \ + mtmsr r3; \ + sync + +#define MSR_EXT_END\ + mtmsr r20;/* Enable DR,EE */ \ + sync; \ + PPC_LL r20, _NIP(r1) + #elif defined(CONFIG_PPC_BOOK3S_64) #define STACK_LR _LINK +#define MSR_EXT_START +#define MSR_EXT_END #endif /* @@ -215,19 +232,12 @@ _GLOBAL(kvmppc_load_up_ ## what); \ PPC_STLU r1, -INT_FRAME_SIZE(r1); \ mflrr3; \ PPC_STL r3, STACK_LR(r1); \ - PPC_STL r20, _NIP(r1); \ - mfmsr r20;\ - LOAD_REG_IMMEDIATE(r3, MSR_DR|MSR_EE); \ - andcr3,r20,r3; /* Disable DR,EE */ \ - mtmsr r3; \ - sync; \ + MSR_EXT_START; \ \ bl FUNC(load_up_ ## what); \ \ - mtmsr r20;/* Enable DR,EE */ \ - sync; \ + MSR_EXT_END;\ PPC_LL r3, STACK_LR(r1); \ - PPC_LL r20, _NIP(r1); \ mtlrr3; \ addir1, r1, INT_FRAME_SIZE; \ blr -- 1.6.0.2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/7] KVM: PPC: RCU'ify the Book3s MMU
So far we've been running all code without locking of any sort. This wasn't really an issue because I didn't see any parallel access to the shadow MMU code coming. But then I started to implement dirty bitmapping to MOL which has the video code in its own thread, so suddenly we had the dirty bitmap code run in parallel to the shadow mmu code. And with that came trouble. So I went ahead and made the MMU modifying functions as parallelizable as I could think of. I hope I didn't screw up too much RCU logic :-). If you know your way around RCU and locking and what needs to be done when, please take a look at this patch. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |2 + arch/powerpc/kvm/book3s_mmu_hpte.c | 78 ++ 2 files changed, 61 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index e1da775..fafc71a 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -165,6 +165,7 @@ struct hpte_cache { struct hlist_node list_pte; struct hlist_node list_vpte; struct hlist_node list_vpte_long; + struct rcu_head rcu_head; u64 host_va; u64 pfn; ulong slot; @@ -295,6 +296,7 @@ struct kvm_vcpu_arch { struct hlist_head hpte_hash_vpte[HPTEG_HASH_NUM_VPTE]; struct hlist_head hpte_hash_vpte_long[HPTEG_HASH_NUM_VPTE_LONG]; int hpte_cache_count; + spinlock_t mmu_lock; #endif }; diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c b/arch/powerpc/kvm/book3s_mmu_hpte.c index 4868d4a..b643893 100644 --- a/arch/powerpc/kvm/book3s_mmu_hpte.c +++ b/arch/powerpc/kvm/book3s_mmu_hpte.c @@ -60,68 +60,94 @@ void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte) { u64 index; + spin_lock(vcpu-arch.mmu_lock); + /* Add to ePTE list */ index = kvmppc_mmu_hash_pte(pte-pte.eaddr); - hlist_add_head(pte-list_pte, vcpu-arch.hpte_hash_pte[index]); + hlist_add_head_rcu(pte-list_pte, vcpu-arch.hpte_hash_pte[index]); /* Add to vPTE list */ index = 
kvmppc_mmu_hash_vpte(pte-pte.vpage); - hlist_add_head(pte-list_vpte, vcpu-arch.hpte_hash_vpte[index]); + hlist_add_head_rcu(pte-list_vpte, vcpu-arch.hpte_hash_vpte[index]); /* Add to vPTE_long list */ index = kvmppc_mmu_hash_vpte_long(pte-pte.vpage); - hlist_add_head(pte-list_vpte_long, - vcpu-arch.hpte_hash_vpte_long[index]); + hlist_add_head_rcu(pte-list_vpte_long, + vcpu-arch.hpte_hash_vpte_long[index]); + + spin_unlock(vcpu-arch.mmu_lock); +} + +static void free_pte_rcu(struct rcu_head *head) +{ + struct hpte_cache *pte = container_of(head, struct hpte_cache, rcu_head); + kmem_cache_free(hpte_cache, pte); } static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) { + /* pte already invalidated? */ + if (hlist_unhashed(pte-list_pte)) + return; + dprintk_mmu(KVM: Flushing SPT: 0x%lx (0x%llx) - 0x%llx\n, pte-pte.eaddr, pte-pte.vpage, pte-host_va); /* Different for 32 and 64 bit */ kvmppc_mmu_invalidate_pte(vcpu, pte); + spin_lock(vcpu-arch.mmu_lock); + + hlist_del_init_rcu(pte-list_pte); + hlist_del_init_rcu(pte-list_vpte); + hlist_del_init_rcu(pte-list_vpte_long); + + spin_unlock(vcpu-arch.mmu_lock); + if (pte-pte.may_write) kvm_release_pfn_dirty(pte-pfn); else kvm_release_pfn_clean(pte-pfn); - hlist_del(pte-list_pte); - hlist_del(pte-list_vpte); - hlist_del(pte-list_vpte_long); - vcpu-arch.hpte_cache_count--; - kmem_cache_free(hpte_cache, pte); + call_rcu(pte-rcu_head, free_pte_rcu); } static void kvmppc_mmu_pte_flush_all(struct kvm_vcpu *vcpu) { struct hpte_cache *pte; - struct hlist_node *node, *tmp; + struct hlist_node *node; int i; + rcu_read_lock(); + for (i = 0; i HPTEG_HASH_NUM_VPTE_LONG; i++) { struct hlist_head *list = vcpu-arch.hpte_hash_vpte_long[i]; - hlist_for_each_entry_safe(pte, node, tmp, list, list_vpte_long) + hlist_for_each_entry_rcu(pte, node, list, list_vpte_long) invalidate_pte(vcpu, pte); } + + rcu_read_unlock(); } static void kvmppc_mmu_pte_flush_page(struct kvm_vcpu *vcpu, ulong guest_ea) { struct hlist_head *list; - 
struct hlist_node *node, *tmp; + struct hlist_node *node; struct hpte_cache *pte; /* Find the list of entries in the map */ list = vcpu-arch.hpte_hash_pte[kvmppc_mmu_hash_pte(guest_ea)]; + rcu_read_lock(); + /* Check the list for matching entries and invalidate */ - hlist_for_each_entry_safe(pte,
[PATCH 3/7] KVM: PPC: correctly check gfn_to_pfn() return value
From: Gleb Natapov g...@redhat.com On failure gfn_to_pfn returns bad_page so use correct function to check for that. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s_32_mmu_host.c |2 +- arch/powerpc/kvm/book3s_64_mmu_host.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c index 05e8c9e..343452c 100644 --- a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -148,7 +148,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte) /* Get host physical address for gpa */ hpaddr = kvmppc_gfn_to_pfn(vcpu, orig_pte->raddr >> PAGE_SHIFT); - if (kvm_is_error_hva(hpaddr)) { + if (is_error_pfn(hpaddr)) { printk(KERN_INFO "Couldn't get guest page for gfn %lx!\n", orig_pte->eaddr); return -EINVAL; diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c index 6cdd19a..672b149 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_host.c +++ b/arch/powerpc/kvm/book3s_64_mmu_host.c @@ -102,7 +102,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte) /* Get host physical address for gpa */ hpaddr = kvmppc_gfn_to_pfn(vcpu, orig_pte->raddr >> PAGE_SHIFT); - if (kvm_is_error_hva(hpaddr)) { + if (is_error_pfn(hpaddr)) { printk(KERN_INFO "Couldn't get guest page for gfn %lx!\n", orig_pte->eaddr); return -EINVAL; } -- 1.6.0.2
[PATCH 0/7] Rest of my KVM-PPC patch queue
During the past few weeks a couple of fixes have gathered in my queue. This is a dump of everything that is not related to the PV framework. Please apply on top of the PV stuff. Alexander Graf (6): KVM: PPC: Book3S_32 MMU debug compile fixes KVM: PPC: RCU'ify the Book3s MMU KVM: PPC: Add book3s_32 tlbie flush acceleration KVM: PPC: Use MSR_DR for external load_up KVM: PPC: Make long relocations be ulong KVM: PPC: Move KVM trampolines before __end_interrupts Gleb Natapov (1): KVM: PPC: correctly check gfn_to_pfn() return value arch/powerpc/include/asm/kvm_book3s.h |4 +- arch/powerpc/include/asm/kvm_host.h |6 ++ arch/powerpc/kernel/exceptions-64s.S |6 ++ arch/powerpc/kernel/head_64.S |6 -- arch/powerpc/kvm/book3s_32_mmu.c |4 +- arch/powerpc/kvm/book3s_32_mmu_host.c |2 +- arch/powerpc/kvm/book3s_64_mmu_host.c |2 +- arch/powerpc/kvm/book3s_mmu_hpte.c| 118 ++--- arch/powerpc/kvm/book3s_rmhandlers.S | 32 ++--- 9 files changed, 133 insertions(+), 47 deletions(-) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 6/7] KVM: PPC: Make long relocations be ulong
On Book3S KVM we directly expose some asm pointers to C code as variables. These need to be relocated and thus break on relocatable kernels. To make sure we can at least build, let's mark them as long instead of u32 where 64-bit relocations don't work. This fixes the following build error: WARNING: 2 bad relocations^M c0008590 R_PPC64_ADDR32 .text+0x40008460^M c0008594 R_PPC64_ADDR32 .text+0x40008598^M Please keep in mind that actually using KVM on a relocated kernel might still break. This only fixes the compile problem. Reported-by: Subrata Modak subr...@linux.vnet.ibm.com Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_book3s.h | 4 ++-- arch/powerpc/kvm/book3s_rmhandlers.S | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 00cf8b0..f04f516 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -132,8 +132,8 @@ extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr); extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu); extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn); -extern u32 kvmppc_trampoline_lowmem; -extern u32 kvmppc_trampoline_enter; +extern ulong kvmppc_trampoline_lowmem; +extern ulong kvmppc_trampoline_enter; extern void kvmppc_rmcall(ulong srr0, ulong srr1); extern void kvmppc_load_up_fpu(void); extern void kvmppc_load_up_altivec(void); diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S index 229d3d6..2b9c908 100644 --- a/arch/powerpc/kvm/book3s_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_rmhandlers.S @@ -252,10 +252,10 @@ define_load_up(vsx) .global kvmppc_trampoline_lowmem kvmppc_trampoline_lowmem: - .long kvmppc_handler_lowmem_trampoline - CONFIG_KERNEL_START + PPC_LONG kvmppc_handler_lowmem_trampoline - CONFIG_KERNEL_START .global kvmppc_trampoline_enter kvmppc_trampoline_enter: - .long kvmppc_handler_trampoline_enter - CONFIG_KERNEL_START + PPC_LONG kvmppc_handler_trampoline_enter - CONFIG_KERNEL_START #include "book3s_segment.S" -- 1.6.0.2
Re: [PATCH v2 5/7] Add support for ramdisk on ppc32 for uImage-ppc and Elf-ppc
On Jul 29, 2010, at 3:33 AM, Simon Horman wrote: On Tue, Jul 20, 2010 at 03:14:58PM -0500, Matthew McClintock wrote: This fixes the --reuseinitrd and --ramdisk options for ppc32 on uImage-ppc and Elf. They work for normal kexec as well as for kdump. When using --reuseinitrd you need to specify retain_initrd on the command line. Also, if you are doing kdump you need to make sure your initrd lives in the crashdump region, otherwise the kdump kernel will not be able to access it. The --ramdisk option should always work. Thanks, I have applied this change. I had to do a minor merge on the Makefile, could you verify that the result is correct? Tested and looks good. -M
Re: [PATCH 4/6] regulator: Remove owner field from attribute initialization in regulator core driver
On Wed, Jul 28, 2010 at 10:09:24PM -0700, Guenter Roeck wrote: Signed-off-by: Guenter Roeck guenter.ro...@ericsson.com Acked-by: Mark Brown broo...@opensource.wolfsonmicro.com ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/2 v1.04] Add support for DWC OTG driver.
On Thu, Jul 29, 2010 at 09:26:12AM -0700, Fushen Chen wrote: [PATCH 1/2 v1.04] 1. License information is under clarification. I meant that APM is still working with Synopsys to resolve the GPL license. There is no result yet. Then I would be very careful in posting the code like you have done. As it is, the code is not something that can be legally posted or used in any device, and you might be held liable for it :( good luck, greg k-h
Re: [PATCH 0/2 v1.04] Add support for DWC OTG driver.
[PATCH 1/2 v1.04] 1. License information is under clarification. I meant that APM is still working with Synopsys to resolve the GPL license. There is no result yet. I'll change this line to "License issue is resolved." if that happens. I modified other parts of the patch according to other reviewers' comments. Thanks, Fushen On Wed, Jul 28, 2010 at 8:05 PM, Greg KH gre...@suse.de wrote: On Wed, Jul 28, 2010 at 05:28:41PM -0700, Fushen Chen wrote: [PATCH 1/2 v1.04] 1. License information is under clarification. What do you mean by this? I fail to see a change here, why just repost the same code again? What is being done to resolve the issues I outlined previously? greg k-h
[PATCH] of: Provide default of_node_to_nid() implementation.
of_node_to_nid() is only relevant on a few architectures. Don't force everyone to implement it anyway. Signed-off-by: Grant Likely grant.lik...@secretlab.ca --- v3: make -1 the default return value and let powerpc override it to 0 when CONFIG_NUMA not set. arch/microblaze/include/asm/topology.h | 10 -- arch/powerpc/include/asm/prom.h | 7 +++ arch/powerpc/include/asm/topology.h | 7 --- arch/sparc/include/asm/prom.h | 3 +-- include/linux/of.h | 5 + 5 files changed, 13 insertions(+), 19 deletions(-) diff --git a/arch/microblaze/include/asm/topology.h b/arch/microblaze/include/asm/topology.h index 96bcea5..5428f33 100644 --- a/arch/microblaze/include/asm/topology.h +++ b/arch/microblaze/include/asm/topology.h @@ -1,11 +1 @@ #include <asm-generic/topology.h> - -#ifndef _ASM_MICROBLAZE_TOPOLOGY_H -#define _ASM_MICROBLAZE_TOPOLOGY_H - -struct device_node; -static inline int of_node_to_nid(struct device_node *device) -{ - return 0; -} -#endif /* _ASM_MICROBLAZE_TOPOLOGY_H */ diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h index da7dd63..55bccc0 100644 --- a/arch/powerpc/include/asm/prom.h +++ b/arch/powerpc/include/asm/prom.h @@ -103,6 +103,13 @@ struct device_node *of_find_next_cache_node(struct device_node *np); /* Get the MAC address */ extern const void *of_get_mac_address(struct device_node *np); +#ifdef CONFIG_NUMA +extern int of_node_to_nid(struct device_node *device); +#else +static inline int of_node_to_nid(struct device_node *device) { return 0; } +#endif +#define of_node_to_nid of_node_to_nid + /** * of_irq_map_pci - Resolve the interrupt for a PCI device * @pdev: the device whose interrupt is to be resolved diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h index 32adf72..09dd38c 100644 --- a/arch/powerpc/include/asm/topology.h +++ b/arch/powerpc/include/asm/topology.h @@ -41,8 +41,6 @@ static inline int cpu_to_node(int cpu) cpu_all_mask : \ node_to_cpumask_map[node]) -int of_node_to_nid(struct device_node *device); - struct pci_bus; #ifdef CONFIG_PCI extern int pcibus_to_node(struct pci_bus *bus); @@ -94,11 +92,6 @@ extern void sysfs_remove_device_from_node(struct sys_device *dev, int nid); #else -static inline int of_node_to_nid(struct device_node *device) -{ - return 0; -} - static inline void dump_numa_cpu_topology(void) {} static inline int sysfs_add_device_to_node(struct sys_device *dev, int nid) diff --git a/arch/sparc/include/asm/prom.h b/arch/sparc/include/asm/prom.h index c82a7da..291f125 100644 --- a/arch/sparc/include/asm/prom.h +++ b/arch/sparc/include/asm/prom.h @@ -43,8 +43,7 @@ extern int of_getintprop_default(struct device_node *np, extern int of_find_in_proplist(const char *list, const char *match, int len); #ifdef CONFIG_NUMA extern int of_node_to_nid(struct device_node *dp); -#else -#define of_node_to_nid(dp) (-1) +#define of_node_to_nid of_node_to_nid #endif extern void prom_build_devicetree(void); diff --git a/include/linux/of.h b/include/linux/of.h index b0756f3..cad7cf0 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -146,6 +146,11 @@ static inline unsigned long of_read_ulong(const __be32 *cell, int size) #define OF_BAD_ADDR ((u64)-1) +#ifndef of_node_to_nid +static inline int of_node_to_nid(struct device_node *np) { return -1; } +#define of_node_to_nid of_node_to_nid +#endif + extern struct device_node *of_find_node_by_name(struct device_node *from, const char *name); #define for_each_node_by_name(dn, name) \
[PATCH] of/address: Clean up function declarations
This patch moves the declaration of of_get_address(), of_get_pci_address(), and of_pci_address_to_resource() out of arch code and into the common linux/of_address header file. This patch also fixes some of the asm/prom.h ordering issues. It still includes some header files that it ideally shouldn't, but at least the ordering is consistent now so that of_* overrides work. Signed-off-by: Grant Likely grant.lik...@secretlab.ca --- arch/microblaze/include/asm/prom.h | 33 +++- arch/powerpc/include/asm/prom.h | 49 +++-- arch/powerpc/kernel/legacy_serial.c | 1 + arch/powerpc/kernel/pci-common.c | 1 + arch/powerpc/platforms/52xx/lite5200.c | 1 + arch/powerpc/platforms/amigaone/setup.c | 3 +- arch/powerpc/platforms/iseries/mf.c | 1 + arch/powerpc/platforms/powermac/feature.c | 2 + drivers/char/bsr.c | 1 + drivers/net/fsl_pq_mdio.c | 1 + drivers/net/xilinx_emaclite.c | 2 + drivers/serial/uartlite.c | 1 + drivers/spi/mpc512x_psc_spi.c | 1 + drivers/spi/mpc52xx_psc_spi.c | 1 + drivers/spi/xilinx_spi_of.c | 1 + drivers/usb/gadget/fsl_qe_udc.c | 1 + drivers/video/controlfb.c | 2 + drivers/video/offb.c | 3 +- include/linux/of_address.h | 32 +++ 19 files changed, 74 insertions(+), 63 deletions(-) diff --git a/arch/microblaze/include/asm/prom.h b/arch/microblaze/include/asm/prom.h index cb9c3dd..101fa09 100644 --- a/arch/microblaze/include/asm/prom.h +++ b/arch/microblaze/include/asm/prom.h @@ -20,11 +20,6 @@ #ifndef __ASSEMBLY__ #include <linux/types.h> -#include <linux/of_address.h> -#include <linux/of_irq.h> -#include <linux/of_fdt.h> -#include <linux/proc_fs.h> -#include <linux/platform_device.h> #include <asm/irq.h> #include <asm/atomic.h> @@ -52,25 +47,9 @@ extern void pci_create_OF_bus_map(void); * OF address retreival translation */ -/* Extract an address from a device, returns the region size and - * the address space flags too. 
The PCI version uses a BAR number - * instead of an absolute index - */ -extern const u32 *of_get_address(struct device_node *dev, int index, - u64 *size, unsigned int *flags); -extern const u32 *of_get_pci_address(struct device_node *dev, int bar_no, - u64 *size, unsigned int *flags); - -extern int of_pci_address_to_resource(struct device_node *dev, int bar, - struct resource *r); - #ifdef CONFIG_PCI extern unsigned long pci_address_to_pio(phys_addr_t address); -#else -static inline unsigned long pci_address_to_pio(phys_addr_t address) -{ - return (unsigned long)-1; -} +#define pci_address_to_pio pci_address_to_pio #endif /* CONFIG_PCI */ /* Parse the ibm,dma-window property of an OF node into the busno, phys and @@ -99,8 +78,18 @@ extern const void *of_get_mac_address(struct device_node *np); * resolving using the OF tree walking. */ struct pci_dev; +struct of_irq; extern int of_irq_map_pci(struct pci_dev *pdev, struct of_irq *out_irq); #endif /* __ASSEMBLY__ */ #endif /* __KERNEL__ */ + +/* These includes are put at the bottom because they may contain things + * that are overridden by this file. Ideally they shouldn't be included + * by this file, but there are a bunch of .c files that currently depend + * on it. Eventually they will be cleaned up. */ +#include <linux/of_fdt.h> +#include <linux/of_irq.h> +#include <linux/platform_device.h> + #endif /* _ASM_MICROBLAZE_PROM_H */ diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h index 55bccc0..ae26f2e 100644 --- a/arch/powerpc/include/asm/prom.h +++ b/arch/powerpc/include/asm/prom.h @@ -17,11 +17,6 @@ * 2 of the License, or (at your option) any later version. 
*/ #include <linux/types.h> -#include <linux/of_fdt.h> -#include <linux/of_address.h> -#include <linux/of_irq.h> -#include <linux/proc_fs.h> -#include <linux/platform_device.h> #include <asm/irq.h> #include <asm/atomic.h> @@ -49,41 +44,9 @@ extern void pci_create_OF_bus_map(void); extern u64 of_translate_dma_address(struct device_node *dev, const u32 *in_addr); -/* Extract an address from a device, returns the region size and - * the address space flags too. The PCI version uses a BAR number - * instead of an absolute index - */ -extern const u32 *of_get_address(struct device_node *dev, int index, - u64 *size, unsigned int *flags); -#ifdef CONFIG_PCI -extern const u32 *of_get_pci_address(struct device_node *dev, int bar_no, - u64 *size, unsigned int *flags); -#else -static inline const u32 *of_get_pci_address(struct device_node *dev, - int bar_no, u64 *size, unsigned int *flags) -{ - return NULL; -} -#endif /*
Re: Commit 3da34aa breaks MSI support on MPC8308 (possibly all MPC83xx) [REPOST]
Dear Kumar, Kim, any comments on this issue? Thanks. In message 4c48b384.1020...@emcraft.com Ilya Yanok wrote: Hi Kumar, Kim, Josh, everybody, I'm sorry to disturb you, but I haven't got any reply to my first posting... I've found that MSI works correctly with older kernels on my MPC8308RDB board and doesn't work with newer ones. After bisecting I've found that the source of the problem is commit 3da34aa: commit 3da34aae03d498ee62f75aa7467de93cce3030fd Author: Kumar Gala ga...@kernel.crashing.org Date: Tue May 12 15:51:56 2009 -0500 powerpc/fsl: Support unique MSI addresses per PCIe Root Complex It's feasible, based on how the PCI address map is set up, that the region of PCI address space used for MSIs differs for each PHB on the same SoC. Instead of assuming that the address maps to CCSRBAR 1:1, we read PEXCSRBAR (BAR0) for the PHB that the given pci_dev is on. Signed-off-by: Kumar Gala ga...@kernel.crashing.org I can see BAR0 initialization for 85xx/86xx hardware but not for 83xx, neither in the kernel nor in U-Boot (which makes me think that all 83xx may be affected). I'm not actually a PCI expert, so I just tried writing the IMMR base address to the BAR0 register from U-Boot to get the correct address, but this doesn't help. Please direct me on how to init the 83xx PCIE controller to make it compatible with this patch. Kim, I think MPC8315E is affected too, could you please test it? Thanks in advance. Regards, Ilya. Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, MD: Wolfgang Denk Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de A good aphorism is too hard for the tooth of time, and is not worn away by all the centuries, although it serves as food for every epoch. - Friedrich Wilhelm Nietzsche _Miscellaneous Maxims and Opinions_ no. 168
Re: [PATCH v2] powerpc/kexec: Fix orphaned offline CPUs across kexec
In message 4c511216.30...@ozlabs.org you wrote: When CPU hotplug is used, some CPUs may be offline at the time a kexec is performed. The subsequent kernel may expect these CPUs to be already running, and will declare them stuck. On pseries, there's also a soft-offline (cede) state that CPUs may be in; this can also cause problems as the kexec'ed kernel may ask RTAS if they're online -- and RTAS would say they are. Again, stuck. This patch kicks each present offline CPU awake before the kexec, so that none are lost to these assumptions in the subsequent kernel. There are a lot of cleanups in this patch. The change you are making would be a lot clearer without all the additional cleanups in there. I think I'd like to see this as two patches: one for cleanups and one for the addition of wake_offline_cpus(). Other than that, I'm not completely convinced this is the functionality we want. Do we really want to online these cpus? Why were they offlined in the first place? I understand the stuck problem, but is the solution to online them, or to change the device tree so that the second kernel doesn't detect them as stuck? Mikey Signed-off-by: Matt Evans m...@ozlabs.org --- v2: Added FIXME comment noting a possible problem with incorrectly started secondary CPUs, following feedback from Milton. 
arch/powerpc/kernel/machine_kexec_64.c | 55 -- - 1 files changed, 49 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c index 4fbb3be..37f805e 100644 --- a/arch/powerpc/kernel/machine_kexec_64.c +++ b/arch/powerpc/kernel/machine_kexec_64.c @@ -15,6 +15,8 @@ #include <linux/thread_info.h> #include <linux/init_task.h> #include <linux/errno.h> +#include <linux/kernel.h> +#include <linux/cpu.h> #include <asm/page.h> #include <asm/current.h> @@ -181,7 +183,20 @@ static void kexec_prepare_cpus_wait(int wait_state) int my_cpu, i, notified=-1; my_cpu = get_cpu(); - /* Make sure each CPU has atleast made it to the state we need */ + /* Make sure each CPU has at least made it to the state we need. + * + * FIXME: There is a (slim) chance of a problem if not all of the CPUs + * are correctly onlined. If somehow we start a CPU on boot with RTAS + * start-cpu, but somehow that CPU doesn't write callin_cpu_map[] in + * time, the boot CPU will timeout. If it does eventually execute + * stuff, the secondary will start up (paca[].cpu_start was written) and + * get into a peculiar state. If the platform supports + * smp_ops->take_timebase(), the secondary CPU will probably be spinning + * in there. If not (i.e. pseries), the secondary will continue on and + * try to online itself/idle/etc. If it survives that, we need to find + * these possible-but-not-online-but-should-be CPUs and chaperone them + * into kexec_smp_wait(). 
+ */ for_each_online_cpu(i) { if (i == my_cpu) continue; @@ -189,9 +204,9 @@ static void kexec_prepare_cpus_wait(int wait_state) while (paca[i].kexec_state < wait_state) { barrier(); if (i != notified) { - printk("kexec: waiting for cpu %d (physical %d) to enter %i state\n", - i, paca[i].hw_cpu_id, wait_state); + printk(KERN_INFO "kexec: waiting for cpu %d (physical %d) to enter %i state\n", + i, paca[i].hw_cpu_id, wait_state); notified = i; } } @@ -199,9 +214,32 @@ static void kexec_prepare_cpus_wait(int wait_state) mb(); } -static void kexec_prepare_cpus(void) +/* + * We need to make sure each present CPU is online. The next kernel will scan + * the device tree and assume primary threads are online and query secondary + * threads via RTAS to online them if required. If we don't online primary + * threads, they will be stuck. However, we also online secondary threads as we + * may be using 'cede offline'. In this case RTAS doesn't see the secondary + * threads as offline -- and again, these CPUs will be stuck. + * + * So, we online all CPUs that should be running, including secondary threads. + */ +static void wake_offline_cpus(void) { + int cpu = 0; + for_each_present_cpu(cpu) { + if (!cpu_online(cpu)) { + printk(KERN_INFO "kexec: Waking offline cpu %d.\n", + cpu); + cpu_up(cpu); + } + } +} + +static void kexec_prepare_cpus(void) +{ + wake_offline_cpus(); smp_call_function(kexec_smp_down, NULL, /* wait */0);
Re: [PATCH 1/2 v1.03] Add support for DWC OTG HCD function.
Hi Greg: We will change to a BSD 3-clause license header. Our legal counsel is talking to Synopsys to make this change. We will resubmit once this is in place. Please let me know if you have any additional concerns. Feng Kan Applied Micro On Mon, Jul 26, 2010 at 4:16 PM, Greg KH gre...@suse.de wrote: On Mon, Jul 26, 2010 at 04:05:49PM -0700, Feng Kan wrote: Hi Greg: We are having our legal revisit this again. What would you advise us to do at this point? I thought I was very clear below as to what is needed. Disclose the agreement or have someone with legal authority reply to this thread. Neither will resolve the end issue, right? Perhaps something in the header that states "Applied Micro verified with Synopsys to use this code for GPL purposes." No, that will just make it messier. Someone needs to delete all of the mess in the file, put the proper license information for what the code is being licensed under (whatever it is), and provide a signed-off-by from a person from Synopsys and APM who can speak for the company that they agree that the code can properly be placed into the Linux kernel. thanks, greg k-h -- Feng Kan
Re: [PATCH v2] powerpc/kexec: Fix orphaned offline CPUs across kexec
Michael Neuling wrote: In message 4c511216.30...@ozlabs.org you wrote: When CPU hotplug is used, some CPUs may be offline at the time a kexec is performed. The subsequent kernel may expect these CPUs to be already running, and will declare them stuck. On pseries, there's also a soft-offline (cede) state that CPUs may be in; this can also cause problems as the kexec'ed kernel may ask RTAS if they're online -- and RTAS would say they are. Again, stuck. This patch kicks each present offline CPU awake before the kexec, so that none are lost to these assumptions in the subsequent kernel. There are a lot of cleanups in this patch. The change you are making would be a lot clearer without all the additional cleanups in there. I think I'd like to see this as two patches: one for cleanups and one for the addition of wake_offline_cpus(). Okay, I can split this. Typofixy-add-debug in one, wake_offline_cpus in another. Other than that, I'm not completely convinced this is the functionality we want. Do we really want to online these cpus? Why were they offlined in the first place? I understand the stuck problem, but is the solution to online them, or to change the device tree so that the second kernel doesn't detect them as stuck? Well... There are two cases. If a CPU is soft-offlined on pseries, it must be woken from that cede loop (in platforms/pseries/hotplug-cpu.c) as we're replacing code under its feet. We could either special-case the wakeup from this cede loop to get that CPU to RTAS stop-self itself properly. (Kind of like a wake to die.) So that leaves hard-offline CPUs (perhaps including the above): I don't know why they might have been offlined. If it's something serious, like fire, they'd be removed from the present set too (and thus not be considered in this restarting case). 
We could add a mask to the CPU node to show which of the threads (if any) are running, and alter the startup code to start everything if this mask doesn't exist (non-kexec) or only online currently-running threads if the mask is present. That feels a little weird. My reasoning for restarting everything was: The first time you boot, all of your present CPUs are started up. When you reboot, any CPUs you offlined for fun are restarted. Kexec is (in this non-crash sense) a user-initiated 'quick reboot', so I reasoned that it should look the same as a 'hard reboot' and your new invocation would have all available CPUs running as is usual. Cheers, Matt ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/2 v1.04] Add support for DWC OTG driver.
On 07/28/2010 05:28 PM, Fushen Chen wrote: [PATCH 1/2 v1.04] . . . PATCH 1/2 seems to not have made it to linux-...@vger.kernel.org. I suspect that a spam filter got it. Could you remove whatever there is in the patch that triggers the filter? Or failing that, change the filter so we can all see the patch? Thanks, David Daney ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2 v1.03] Add support for DWC OTG HCD function.
On Thu, Jul 29, 2010 at 05:14:59PM -0700, Feng Kan wrote: Hi Greg: We will change to a BSD 3-clause license header. Our legal counsel is talking to Synopsys to make this change. Why BSD? You do realize what that means when combined within the body of the kernel, right? Are you going to be expecting others to contribute back to the code under this license, or will you accept the fact that future contributions from the community will cause the license to change? We will resubmit once this is in place. Please let me know if you have any additional concerns. My main concern is that you, and everyone else involved in the driver, never considered the license of the code in the first place and expected the kernel community to accept it as-is, placing the problem on us. What will be done in the future to prevent this from happening again? thanks, greg k-h
Re: [PATCH 1/2 v1.03] Add support for DWC OTG HCD function.
Hi Greg: On Thu, Jul 29, 2010 at 5:50 PM, Greg KH gre...@suse.de wrote: On Thu, Jul 29, 2010 at 05:14:59PM -0700, Feng Kan wrote: Hi Greg: We will change to a BSD 3-clause license header. Our legal counsel is talking to Synopsys to make this change. Why BSD? You do realize what that means when combined within the body of the kernel, right? FKAN: We will shoot for a dual BSD/GPL license such as the one in the HP HIL driver. Are you going to be expecting others to contribute back to the code under this license, or will you accept the fact that future contributions from the community will cause the license to change? We will resubmit once this is in place. Please let me know if you have any additional concerns. My main concern is that you, and everyone else involved in the driver, never considered the license of the code in the first place and expected the kernel community to accept it as-is, placing the problem on us. FKAN: Please don't think this is the case; we've gone through this exercise with Denx. We had legal look into the header before submission to them and the kernel. What will be done in the future to prevent this from happening again? FKAN: agreed, once bitten :) thanks, greg k-h
Re: [PATCH 1/2 v1.03] Add support for DWC OTG HCD function.
On Thu, Jul 29, 2010 at 06:19:25PM -0700, Feng Kan wrote: Hi Greg: On Thu, Jul 29, 2010 at 5:50 PM, Greg KH gre...@suse.de wrote: On Thu, Jul 29, 2010 at 05:14:59PM -0700, Feng Kan wrote: Hi Greg: We will change to a BSD 3 clause license header. Our legal counsel is talking to Synopsis to make this change. Why BSD? You do realize what that means when combined within the body of the kernel, right? FKAN: We will shoot for a dual BSD/GPL license such as the one in the HP Hil driver. What specific driver is this? And are you sure that all of the contributors to the code agree with this licensing change? Are you going to require contributors to dual-license their changes? If so, why keep it BSD, what does that get you? Are you going to be expecting others to contribute back to the code under this license, or will you accept the fact that future contributions from the community will cause the license to change? You didn't answer this question, which is a very important one before I can accept this driver. We will resubmit once this is in place. Please let me know if you have any additional concerns. My main concern is that you, and everyone else involved in the driver, never considered the license of the code in the first place and expected the kernel community to accept it as-is, placing the problem on us. FKAN: Please don't think this is the case, we gone through this exercise with Denx. What is Denx? We had legal looking into the header before submission to them and the kernel. Then what happened here? Just curious as to how the driver was public for so long before someone realized this. What will be done in the future to prevent this from happening again? FKAN: agreed, once bitten :) That didn't answer the question :) thanks, greg k-h ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V4] powerpc/prom: Export device tree physical address via proc
On Thu, Jul 15, 2010 at 11:39:21AM -0500, Matthew McClintock wrote: On Thu, 2010-07-15 at 10:22 -0600, Grant Likely wrote: Thanks for taking a look. My first thought was to just blow away all the memreserve regions and start over. But, there are reserve regions for other things that I might not want to blow away. For example, on mpc85xx SMP systems we have an additional reserve region for our boot page. What is your starting point? Where does the device tree (and memreserve list) come from that you're passing to kexec? My first impression is that if you have to scrub the memreserve list, then the source being used to obtain the memreserves is either faulty or unsuitable to the task. I'm pulling the device tree passed in via u-boot and passing it to kexec. It is the most complete device tree and requires the least amount of fixup. I have to scrub two items, the ramdisk/initrd and the device tree because upon kexec'ing the kernel we have the ability to pass in new ramdisk/initrd and device tree. They can also live at different physical addresses for the second reboot. The initrd addresses are already exposed, so we can update/remove/reuse that entry, we just need a way for kexec to determine the current device tree address so it can replace the correct memreserve region for the kexec'ing kernels' device tree. Ok, be careful with this. You do have the information you need, but you might have to split an existing entry. Having a single reserve entry to cover the initrd would be typical, but it doesn't have to happen that way - e.g. if a firmware reserves a big region for its own purposes, and places the initrd within that region. Also, the latest specs do *not* require the device tree itself to be mem reserved. The whole problem comes from repeatedly kexec'ing, we need to make sure we don't keep losing blobs of memory to reserve regions (so we can't just blindly add). 
We also need to make sure we don't lose other memreserve regions that might be important for other things (so we can't just blow them all away). -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V4] powerpc/prom: Export device tree physical address via proc
On Thu, Jul 15, 2010 at 01:18:21PM -0600, Grant Likely wrote: On Thu, Jul 15, 2010 at 12:58 PM, Matthew McClintock m...@freescale.com wrote: On Thu, 2010-07-15 at 12:37 -0600, Grant Likely wrote: On Thu, Jul 15, 2010 at 12:03 PM, Matthew McClintock m...@freescale.com wrote: Yes. Where would we get a list of memreserve sections? I would say the list of reserves that are not under the control of Linux should be explicitly described in the device tree proper. For instance, if you have a region that firmware depends on, then have a node for describing the firmware and a property stating the memory regions that it depends on. The memreserve regions can be generated from that. Ok, so we could traverse the tree node-by-node for a persistent-memreserve property and add them to the /memreserve/ list in the kexec user space tools? Well.. I don't think it should be this way as a matter of spec. But you could use a property as an interim stash for memreserve information. I agree that the precise defined semantics of the memreserve regions is kind of fuzzy and non-obvious. Here's how I believe they need to work: memory in a reserved region must *never* be touched by the OS (or subsequent kexec-invoked OSes) unless something else in the device tree explicitly instructs it how There already exist several mechanisms for instructing the OS to use particular reserved regions for particular purposes: e.g. the initrd properties, and the spin-table properties. More such mechanisms might be added in future ePAPR (or whatever) revisions. But if the OS version doesn't understand such a future mechanism, it must fall back to assuming that the memory is reserved in perpetuity. Now, some of these mechanisms (implicitly) permit the OS to re-use the reserved memory after it's done using them as instructed (initrd is the most obvious one). In that case the OS can re-add the reserved space to its general pools, and excise it from the reserved space for subsequent kexec()-style boots. 
However that's (potentially) a more complex process than just removing an entry - the initial firmware is free to combine adjacent reserved regions into one reserve entry, or even to cover a single reserved region with multiple entries. So in order to do this manipulation you will need an allocator of sorts that does the region reservation/dereservation, correctly handling the semantics on a byte-by-byte basis. You should also be careful that the regions you're handling do actually lie in memory space. Linux doesn't support this right now, but I do have an experimental patch that allows the initrd properties to point to (e.g.) flash instead of RAM. In that case the initrd wouldn't have to lie in an explicitly reserved region, and obviously could not be returned to the general pool after use. I *think* that is okay, but I'd like to hear from Segher, Ben, Mitch, David Gibson, and other device tree experts on whether or not that exact property naming is a good one. Write up a proposed binding (you can use devicetree.org). Post it for review (make sure you cc: both devicetree-discuss and linuxppc-dev, as well as cc'ing the people listed above.) Should we export the reserve sections instead of the device tree location? It shouldn't really be something that the kernel is explicitly exporting because it is a characteristic of the board design. It is something that belongs in the tree proper, i.e. when you extract the tree you have data telling what the region is, and why it is reserved. Agreed. We just need a way to preserve what was there at boot to pass to the new kernel. Yet there is no differentiation between the board-dictated memory reserves and the things that U-Boot/Linux made an arbitrary decision on. The solution should focus not on "can I throw this one away?" but rather "Is this one I should keep?" :-) A subtle difference, I know, but it changes the way you approach the solution. Fair enough.
I think the above solution will work nicely, and I can start implementing something if you agree - if I interpreted your idea correctly. Although it should not require any changes to the kernel proper. Correct. g. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
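[Editorial aside: the "allocator of sorts" described above, one that splits reserve entries when only part of a region is returned to the general pool, can be sketched in plain C. This is a hypothetical userspace illustration with made-up names, not the actual kexec-tools or kernel code.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct reserve { uint64_t start, size; };

/*
 * Excise the released range [rs, rs+rsz) from a reserve map of n entries,
 * writing the result to out (which must have room for up to 2*n entries).
 * Entries that only partially overlap the released range are split, so the
 * reserve semantics are honoured byte-by-byte.  Returns the number of
 * entries written.
 */
static size_t excise_region(const struct reserve *map, size_t n,
                            uint64_t rs, uint64_t rsz, struct reserve *out)
{
    uint64_t re = rs + rsz;
    size_t m = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t s = map[i].start, e = s + map[i].size;

        if (e <= rs || s >= re) {
            out[m++] = map[i];              /* no overlap: keep as-is */
            continue;
        }
        if (s < rs) {                       /* keep the head below rs */
            out[m].start = s;
            out[m].size = rs - s;
            m++;
        }
        if (e > re) {                       /* keep the tail above re */
            out[m].start = re;
            out[m].size = e - re;
            m++;
        }
    }
    return m;
}
```

With one firmware entry covering 0x1000-0x4000, releasing the middle page leaves two reserves; this splitting case is exactly why simply deleting an entry from the map is not enough.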
Re: [PATCH 1/2 v1.03] Add support for DWC OTG HCD function.
On Thu, Jul 29, 2010 at 6:26 PM, Greg KH gre...@suse.de wrote: On Thu, Jul 29, 2010 at 06:19:25PM -0700, Feng Kan wrote: Hi Greg: On Thu, Jul 29, 2010 at 5:50 PM, Greg KH gre...@suse.de wrote: On Thu, Jul 29, 2010 at 05:14:59PM -0700, Feng Kan wrote: Hi Greg: We will change to a BSD 3-clause license header. Our legal counsel is talking to Synopsys to make this change. Why BSD? You do realize what that means when combined within the body of the kernel, right? FKAN: We will shoot for a dual BSD/GPL license such as the one in the HP HIL driver. What specific driver is this? FKAN: this is drivers/input/serio/hil_mlc.c and quite a number of others. And are you sure that all of the contributors to the code agree with this licensing change? Are you going to require contributors to dual-license their changes? If so, why keep it BSD, what does that get you? FKAN: for one thing, to make it future-proof on other submissions. Are you going to be expecting others to contribute back to the code under this license, or will you accept the fact that future contributions from the community will cause the license to change? You didn't answer this question, which is a very important one before I can accept this driver. FKAN: Yes, all of the above. Our legal is working on that. I thought by default GPL defines the above statement. We will resubmit once this is in place. Please let me know if you have any additional concerns. My main concern is that you, and everyone else involved in the driver, never considered the license of the code in the first place and expected the kernel community to accept it as-is, placing the problem on us. FKAN: Please don't think this is the case, we've gone through this exercise with Denx. What is Denx? FKAN: U-Boot Denx.de We had legal looking into the header before submission to them and the kernel. Then what happened here? Just curious as to how the driver was public for so long before someone realized this. FKAN: this was a few years back.
At the time we had the header changed so it was BSD-like to be accepted by Denx. What will be done in the future to prevent this from happening again? FKAN: agreed, once bitten :) That didn't answer the question :) FKAN: we have a system of checks for every patch that goes out. I will send out a guideline to all reviewers to make sure the headers follow kernel precedent. Legal is quite aware of the issue now too. thanks, greg k-h -- Feng Kan
Re: [PATCH v2] powerpc/kexec: Fix orphaned offline CPUs across kexec
(adding kexec list to CC) In message 4c521fd2.4050...@ozlabs.org you wrote: Michael Neuling wrote: In message 4c511216.30...@ozlabs.org you wrote: When CPU hotplug is used, some CPUs may be offline at the time a kexec is performed. The subsequent kernel may expect these CPUs to be already running, and will declare them stuck. On pseries, there's also a soft-offline (cede) state that CPUs may be in; this can also cause problems as the kexeced kernel may ask RTAS if they're online -- and RTAS would say they are. Again, stuck. This patch kicks each present offline CPU awake before the kexec, so that none are lost to these assumptions in the subsequent kernel. There are a lot of cleanups in this patch. The change you are making would be a lot clearer without all the additional cleanups in there. I think I'd like to see this as two patches. One for cleanups and one for the addition of wake_offline_cpus(). Okay, I can split this. Typofixy-add-debug in one, wake_offline_cpus in another. Thanks. Other than that, I'm not completely convinced this is the functionality we want. Do we really want to online these cpus? Why were they offlined in the first place? I understand the stuck problem, but is the solution to online them, or to change the device tree so that the second kernel doesn't detect them as stuck? Well... There are two cases. If a CPU is soft-offlined on pseries, it must be woken from that cede loop (in platforms/pseries/hotplug-cpu.c) as we're replacing code under its feet. We could either special-case the wakeup from this cede loop to get that CPU to RTAS stop-self itself properly. (Kind of like a wake to die.) Makes sense. So that leaves hard-offline CPUs (perhaps including the above): I don't know why they might have been offlined. If it's something serious, like fire, they'd be removed from the present set too (and thus not be considered in this restarting case).
We could add a mask to the CPU node to show which of the threads (if any) are running, and alter the startup code to start everything if this mask doesn't exist (non-kexec) or only online currently-running threads if the mask is present. That feels a little weird. My reasoning for restarting everything was: The first time you boot, all of your present CPUs are started up. When you reboot, any CPUs you offlined for fun are restarted. Kexec is (in this non-crash sense) a user-initiated 'quick reboot', so I reasoned that it should look the same as a 'hard reboot' and your new invocation would have all available CPUs running as is usual. OK, I like this justification. Would be good to include it in the checkin comment since we're changing functionality somewhat. Mikey
Re: [PATCH 1/2 v1.03] Add support for DWC OTG HCD function.
On Thu, Jul 29, 2010 at 07:02:44PM -0700, Feng Kan wrote: On Thu, Jul 29, 2010 at 6:26 PM, Greg KH gre...@suse.de wrote: On Thu, Jul 29, 2010 at 06:19:25PM -0700, Feng Kan wrote: Hi Greg: On Thu, Jul 29, 2010 at 5:50 PM, Greg KH gre...@suse.de wrote: On Thu, Jul 29, 2010 at 05:14:59PM -0700, Feng Kan wrote: Hi Greg: We will change to a BSD 3-clause license header. Our legal counsel is talking to Synopsys to make this change. Why BSD? You do realize what that means when combined within the body of the kernel, right? FKAN: We will shoot for a dual BSD/GPL license such as the one in the HP HIL driver. What specific driver is this? FKAN: this is drivers/input/serio/hil_mlc.c and quite a number of others. Ok, thanks. Are you _sure_ that you didn't take any existing GPL code and put it into this driver when making it? Did all contributors to the code release their contributions under both licenses? And are you sure that all of the contributors to the code agree with this licensing change? Are you going to require contributors to dual-license their changes? If so, why keep it BSD, what does that get you? FKAN: for one thing, to make it future-proof on other submissions. What do you mean by this? What can you do with this code other than use it on a Linux system? You can't put it into any other operating system with a different license, can you? Are you going to be expecting others to contribute back to the code under this license, or will you accept the fact that future contributions from the community will cause the license to change? You didn't answer this question, which is a very important one before I can accept this driver. FKAN: Yes, all of the above. Our legal is working on that. I thought by default GPL defines the above statement. The GPL does, but as you are trying to dual-license the code, you have to be careful about how you accept changes, and under what license. It's a lot more work than I think you realize.
What process do you have in place to handle this? We will resubmit once this is in place. Please let me know if you have any additional concerns. My main concern is that you, and everyone else involved in the driver, never considered the license of the code in the first place and expected the kernel community to accept it as-is, placing the problem on us. FKAN: Please don't think this is the case, we've gone through this exercise with Denx. What is Denx? FKAN: U-Boot Denx.de Ah, thanks. We had legal looking into the header before submission to them and the kernel. Then what happened here? Just curious as to how the driver was public for so long before someone realized this. FKAN: this was a few years back. At the time we had the header changed so it was BSD-like to be accepted by Denx. What will be done in the future to prevent this from happening again? FKAN: agreed, once bitten :) That didn't answer the question :) FKAN: we have a system of checks for every patch that goes out. I will send out a guideline to all reviewers to make sure the headers follow kernel precedent. But you took this code from a different vendor, are you able to properly identify the code contributions to this base and what license it is under and where they got it from? Legal is quite aware of the issue now too. As they should be :) Please reconsider the dual licensing unless you really are ready to handle the implications of it. thanks, greg k-h
Re: [PATCH v2] powerpc/kexec: Fix orphaned offline CPUs across kexec
On Fri, Jul 30, 2010 at 01:15:14PM +1000, Michael Neuling wrote: (adding kexec list to CC) In message 4c521fd2.4050...@ozlabs.org you wrote: Michael Neuling wrote: In message 4c511216.30...@ozlabs.org you wrote: When CPU hotplug is used, some CPUs may be offline at the time a kexec is performed. The subsequent kernel may expect these CPUs to be already running, and will declare them stuck. On pseries, there's also a soft-offline (cede) state that CPUs may be in; this can also cause problems as the kexeced kernel may ask RTAS if they're online -- and RTAS would say they are. Again, stuck. This patch kicks each present offline CPU awake before the kexec, so that none are lost to these assumptions in the subsequent kernel. There are a lot of cleanups in this patch. The change you are making would be a lot clearer without all the additional cleanups in there. I think I'd like to see this as two patches. One for cleanups and one for the addition of wake_offline_cpus(). Okay, I can split this. Typofixy-add-debug in one, wake_offline_cpus in another. Thanks. Other than that, I'm not completely convinced this is the functionality we want. Do we really want to online these cpus? Why were they offlined in the first place? I understand the stuck problem, but is the solution to online them, or to change the device tree so that the second kernel doesn't detect them as stuck? Well... There are two cases. If a CPU is soft-offlined on pseries, it must be woken from that cede loop (in platforms/pseries/hotplug-cpu.c) as we're replacing code under its feet. We could either special-case the wakeup from this cede loop to get that CPU to RTAS stop-self itself properly. (Kind of like a wake to die.) Makes sense. So that leaves hard-offline CPUs (perhaps including the above): I don't know why they might have been offlined. If it's something serious, like fire, they'd be removed from the present set too (and thus not be considered in this restarting case).
We could add a mask to the CPU node to show which of the threads (if any) are running, and alter the startup code to start everything if this mask doesn't exist (non-kexec) or only online currently-running threads if the mask is present. That feels a little weird. My reasoning for restarting everything was: The first time you boot, all of your present CPUs are started up. When you reboot, any CPUs you offlined for fun are restarted. Kexec is (in this non-crash sense) a user-initiated 'quick reboot', so I reasoned that it should look the same as a 'hard reboot' and your new invocation would have all available CPUs running as is usual. OK, I like this justification. Would be good to include it in the checkin comment since we're changing functionality somewhat. FWIW, I do too. Personally I like to think of kexec as soft-reboot. Where soft means in software, not soft-touch.
[PATCH 3/3 v2] mmc: Add ESDHC weird voltage bits workaround
The P4080 ESDHC controller does not support 1.8V and 3.0V voltages, but the host controller capabilities register wrongly sets these bits. This patch adds a workaround to correct the spurious voltage capability bits.

Signed-off-by: Roy Zang tie-fei.z...@freescale.com
---
This is the second version of patch http://patchwork.ozlabs.org/patch/60106/ According to the comments, some unnecessary settings were removed. Together with patches http://patchwork.ozlabs.org/patch/60111/ and http://patchwork.ozlabs.org/patch/60116/, this series adds MMC support for the p4080 silicon.

 drivers/mmc/host/sdhci-of-core.c |    4 ++++
 drivers/mmc/host/sdhci.c         |    8 ++++++++
 drivers/mmc/host/sdhci.h         |    4 ++++
 3 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/drivers/mmc/host/sdhci-of-core.c b/drivers/mmc/host/sdhci-of-core.c
index 0c30242..1f3913d 100644
--- a/drivers/mmc/host/sdhci-of-core.c
+++ b/drivers/mmc/host/sdhci-of-core.c
@@ -164,6 +164,10 @@ static int __devinit sdhci_of_probe(struct of_device *ofdev,
 	if (sdhci_of_wp_inverted(np))
 		host->quirks |= SDHCI_QUIRK_INVERTED_WRITE_PROTECT;
 
+	if (of_device_is_compatible(np, "fsl,p4080-esdhc"))
+		host->quirks |= (SDHCI_QUIRK_QORIQ_NO_VDD_180
+				| SDHCI_QUIRK_QORIQ_NO_VDD_300);
+
 	clk = of_get_property(np, "clock-frequency", &size);
 	if (clk && size == sizeof(*clk) && *clk)
 		of_host->clock = *clk;
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 1424d08..a667790 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -1699,6 +1699,14 @@ int sdhci_add_host(struct sdhci_host *host)
 
 	caps = sdhci_readl(host, SDHCI_CAPABILITIES);
 
+	/* Workaround for P4080 host controller capabilities:
+	 * 1.8V and 3.0V are not supported */
+	if (host->quirks & SDHCI_QUIRK_QORIQ_NO_VDD_180)
+		caps &= ~SDHCI_CAN_VDD_180;
+
+	if (host->quirks & SDHCI_QUIRK_QORIQ_NO_VDD_300)
+		caps &= ~SDHCI_CAN_VDD_300;
+
 	if (host->quirks & SDHCI_QUIRK_FORCE_DMA)
 		host->flags |= SDHCI_USE_SDMA;
 	else if (!(caps & SDHCI_CAN_DO_SDMA))
diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index aa112aa..389b58c 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -243,6 +243,10 @@ struct sdhci_host {
 #define SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC	(1<<26)
 /* Controller uses Auto CMD12 command to stop the transfer */
 #define SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12	(1<<27)
+/* Controller cannot support 1.8V */
+#define SDHCI_QUIRK_QORIQ_NO_VDD_180		(1<<28)
+/* Controller cannot support 3.0V */
+#define SDHCI_QUIRK_QORIQ_NO_VDD_300		(1<<29)
 
 	int			irq;		/* Device IRQ */
 	void __iomem *		ioaddr;		/* Mapped address */
-- 
1.5.6.5
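[Editorial aside: the effect of the quirk bits on the capability word can be shown with a small standalone sketch. The bit positions and names below are illustrative stand-ins, not the real SDHCI register layout from sdhci.h.]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins; the real bit positions live in sdhci.h. */
#define CAN_VDD_330       (1u << 24)
#define CAN_VDD_300       (1u << 25)
#define CAN_VDD_180       (1u << 26)
#define QUIRK_NO_VDD_180  (1u << 28)
#define QUIRK_NO_VDD_300  (1u << 29)

/* Clear capability bits that the quirk flags say the silicon got wrong. */
static uint32_t fixup_caps(uint32_t caps, uint32_t quirks)
{
    if (quirks & QUIRK_NO_VDD_180)
        caps &= ~CAN_VDD_180;
    if (quirks & QUIRK_NO_VDD_300)
        caps &= ~CAN_VDD_300;
    return caps;
}
```

After the fixup, only the voltages the silicon actually supports survive, so the generic sdhci voltage-selection code never offers 1.8V or 3.0V on a quirked controller.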
[PATCH 0/2] powerpc/kexec: Fix orphaned offline CPUs, add comments/debug
Separated the tidy-up/comments/debug changes away from the fix of restarting available offline CPUs before waiting for them on kexec.

Matt Evans (2):
  powerpc/kexec: Add to and tidy debug/comments in machine_kexec64.c
  powerpc/kexec: Fix orphaned offline CPUs across kexec

 arch/powerpc/kernel/machine_kexec_64.c |   55 +++++++++++++++++++++++++------
 1 files changed, 49 insertions(+), 6 deletions(-)
[PATCH 1/2] powerpc/kexec: Add to and tidy debug/comments in machine_kexec64.c
Tidies some typos, KERN_INFO-ises an info msg, and adds a debug msg showing when the final sequence starts. Also adds a comment to kexec_prepare_cpus_wait() to make note of a possible problem: the need for kexec to deal with CPUs that failed to originally start up.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/kernel/machine_kexec_64.c |   29 +++++++++++++++++++++++-----
 1 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 4fbb3be..aa3d5cd 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -15,6 +15,7 @@
 #include <linux/thread_info.h>
 #include <linux/init_task.h>
 #include <linux/errno.h>
+#include <linux/kernel.h>
 
 #include <asm/page.h>
 #include <asm/current.h>
@@ -181,7 +182,20 @@ static void kexec_prepare_cpus_wait(int wait_state)
 	int my_cpu, i, notified=-1;
 
 	my_cpu = get_cpu();
-	/* Make sure each CPU has atleast made it to the state we need */
+	/* Make sure each CPU has at least made it to the state we need.
+	 *
+	 * FIXME: There is a (slim) chance of a problem if not all of the CPUs
+	 * are correctly onlined.  If somehow we start a CPU on boot with RTAS
+	 * start-cpu, but somehow that CPU doesn't write callin_cpu_map[] in
+	 * time, the boot CPU will timeout.  If it does eventually execute
+	 * stuff, the secondary will start up (paca[].cpu_start was written) and
+	 * get into a peculiar state.  If the platform supports
+	 * smp_ops->take_timebase(), the secondary CPU will probably be spinning
+	 * in there.  If not (i.e. pseries), the secondary will continue on and
+	 * try to online itself/idle/etc. If it survives that, we need to find
+	 * these possible-but-not-online-but-should-be CPUs and chaperone them
+	 * into kexec_smp_wait().
+	 */
 	for_each_online_cpu(i) {
 		if (i == my_cpu)
 			continue;
@@ -189,9 +203,9 @@ static void kexec_prepare_cpus_wait(int wait_state)
 		while (paca[i].kexec_state < wait_state) {
 			barrier();
 			if (i != notified) {
-				printk( "kexec: waiting for cpu %d (physical"
-						" %d) to enter %i state\n",
-					i, paca[i].hw_cpu_id, wait_state);
+				printk(KERN_INFO "kexec: waiting for cpu %d "
+				       "(physical %d) to enter %i state\n",
+				       i, paca[i].hw_cpu_id, wait_state);
 				notified = i;
 			}
 		}
@@ -215,7 +229,10 @@ static void kexec_prepare_cpus(void)
 	if (ppc_md.kexec_cpu_down)
 		ppc_md.kexec_cpu_down(0, 0);
 
-	/* Before removing MMU mapings make sure all CPUs have entered real mode */
+	/*
+	 * Before removing MMU mappings make sure all CPUs have entered real
+	 * mode:
+	 */
 	kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
 
 	put_cpu();
@@ -284,6 +301,8 @@ void default_machine_kexec(struct kimage *image)
 	if (crashing_cpu == -1)
 		kexec_prepare_cpus();
 
+	pr_debug("kexec: Starting switchover sequence.\n");
+
 	/* switch to a statically allocated stack. Based on irq stack code.
 	 * XXX: the task struct will likely be invalid once we do the copy!
 	 */
-- 
1.6.3.3
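[Editorial aside: the waiting logic in kexec_prepare_cpus_wait() boils down to polling every other CPU's kexec_state until all of them reach the requested state. A hypothetical userspace sketch of that predicate, with the per-CPU paca array reduced to a plain int array:]

```c
#include <assert.h>

/*
 * Sketch of the condition kexec_prepare_cpus_wait() spins on: every CPU
 * other than the caller ('me') must have advanced to at least wait_state
 * before the boot CPU may proceed.  Returns 1 when all have, 0 otherwise.
 */
static int all_cpus_reached(const int *kexec_state, int ncpus,
                            int me, int wait_state)
{
    for (int i = 0; i < ncpus; i++)
        if (i != me && kexec_state[i] < wait_state)
            return 0;   /* this is the CPU the printk would report */
    return 1;
}
```

The real code additionally prints a "waiting for cpu" message the first time each laggard is noticed, and never compares a CPU against itself, which the `i != me` test mirrors.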
[PATCH 2/2] powerpc/kexec: Fix orphaned offline CPUs across kexec
When CPU hotplug is used, some CPUs may be offline at the time a kexec is performed. The subsequent kernel may expect these CPUs to be already running, and will declare them stuck. On pseries, there's also a soft-offline (cede) state that CPUs may be in; this can also cause problems as the kexeced kernel may ask RTAS if they're online -- and RTAS would say they are. The CPU will either appear stuck, or will cause a crash as we replace its cede loop beneath it.

This patch kicks each present offline CPU awake before the kexec, so that none are forever lost to these assumptions in the subsequent kernel. Now, the behaviour is that all available CPUs that were offlined are online and usable after the kexec. This mimics the behaviour of a full reboot (on which all CPUs will be restarted).

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/kernel/machine_kexec_64.c |   26 +++++++++++++++++++++++++-
 1 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index aa3d5cd..37f805e 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -16,6 +16,7 @@
 #include <linux/init_task.h>
 #include <linux/errno.h>
 #include <linux/kernel.h>
+#include <linux/cpu.h>
 
 #include <asm/page.h>
 #include <asm/current.h>
@@ -213,9 +214,32 @@ static void kexec_prepare_cpus_wait(int wait_state)
 	mb();
 }
 
-static void kexec_prepare_cpus(void)
+/*
+ * We need to make sure each present CPU is online.  The next kernel will scan
+ * the device tree and assume primary threads are online and query secondary
+ * threads via RTAS to online them if required.  If we don't online primary
+ * threads, they will be stuck.  However, we also online secondary threads as we
+ * may be using 'cede offline'.  In this case RTAS doesn't see the secondary
+ * threads as offline -- and again, these CPUs will be stuck.
+ *
+ * So, we online all CPUs that should be running, including secondary threads.
+ */
+static void wake_offline_cpus(void)
 {
+	int cpu = 0;
+
+	for_each_present_cpu(cpu) {
+		if (!cpu_online(cpu)) {
+			printk(KERN_INFO "kexec: Waking offline cpu %d.\n",
+			       cpu);
+			cpu_up(cpu);
+		}
+	}
+}
+
+static void kexec_prepare_cpus(void)
+{
+	wake_offline_cpus();
 	smp_call_function(kexec_smp_down, NULL, /* wait */0);
 	local_irq_disable();
 	mb(); /* make sure IRQs are disabled before we say they are */
-- 
1.6.3.3
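[Editorial aside: the wake_offline_cpus() loop above is simply "for every present CPU that is not online, bring it up". A toy model with 32-bit words standing in for the kernel's cpumasks (bit i set means CPU i is present/online, and setting a bit stands in for cpu_up()):]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of wake_offline_cpus(): bring every present-but-offline CPU
 * online.  Returns the resulting online mask; already-online CPUs are
 * left untouched, and CPUs absent from 'present' are never started.
 */
static uint32_t wake_offline(uint32_t present, uint32_t online)
{
    for (int cpu = 0; cpu < 32; cpu++) {
        uint32_t bit = 1u << cpu;
        if ((present & bit) && !(online & bit))
            online |= bit;              /* cpu_up(cpu) stand-in */
    }
    return online;
}
```

This captures the semantics the thread settled on: after the kexec "quick reboot", the online set equals the present set, just as it would after a full reboot.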