Re: Re: Re: [PATCHv5] atomic: add *_dec_not_zero
On Mon, 2011-12-05 at 08:57 +0100, Sven Eckelmann wrote:
> On Monday 05 December 2011 09:41:55 Benjamin Herrenschmidt wrote:
> > On Sun, 2011-12-04 at 22:18 +, Russell King - ARM Linux wrote:
> >
> > .../...
> >
> > > And really, I believe it would be a good cleanup if all the standard
> > > definitions for atomic64 ops (like atomic64_add_negative) were also
> > > defined in include/linux/atomic.h rather than individually in every
> > > atomic*.h header throughout the kernel source, except where an arch
> > > wants to explicitly override it. Yet again, virtually all architectures
> > > define these in exactly the same way.
> > >
> > > We have more than enough code in arch/ for any architecture to worry
> > > about, we don't need schemes to add more when there's simple and
> > > practical solutions to avoiding doing so if the right design were
> > > chosen (preferably from the outset.)
> > >
> > > So, I'm not going to offer my ack for a change which I don't believe
> > > is the correct approach.
> >
> > I agree with Russell, his approach is a lot easier to maintain long run,
> > we should even consider converting existing definitions.
>
> I would rather go with "the existing definitions have to converted" and this
> means "not by this patch".

Right. I didn't suggest -you- had to do it as a pre-req to your patch.

> At the moment, the atomic64 stuff exist only as
> separate generic or arch specific implementation. It is fine that Russell King
> noticed that people like Arun Sharma did a lot of work to made it true for
> atomic_t, but atomic64_t is a little bit different right now (at least as I
> understand it).

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: oprofile callgraph support missing for common cpus
Benjamin Herrenschmidt wrote on 2011/11/25 06:24:32:
>
> On Fri, 2011-11-18 at 09:22 +0100, Joakim Tjernlund wrote:
>
> > I forgot to ask, oprofile mentions setting -no-omit-framepointer to get
> > correct backtrace but I cannot turn on frame pointers for the ppc kernel.
> > Isn't frame pointers needed for pcc? what about user space?
>
> PowerPC always has frame pointers, ignore that :-)

A bit late, but consider this:

int leaf(int x)
{
	return x+3;
}

which yields (with gcc -O2 -S):

	.file	"leaf.c"
	.section	".text"
	.align 2
	.globl leaf
	.type	leaf, @function
leaf:
	addi 3,3,3
	blr
	.size	leaf, .-leaf
	.section	.note.GNU-stack,"",@progbits
	.ident	"GCC: (GNU) 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)"

Here it is with a frame pointer (I guess that the messing around with r11 and
r31 is a defect?):
(With gcc -O2 -S -fno-omit-frame-pointer)

	.file	"leaf.c"
	.section	".text"
	.align 2
	.globl leaf
	.type	leaf, @function
leaf:
	stwu 1,-16(1)
	addi 3,3,3
	lwz 11,0(1)
	stw 31,12(1)
	mr 31,1
	lwz 31,-4(11)
	mr 1,11
	blr
	.size	leaf, .-leaf
	.section	.note.GNU-stack,"",@progbits
	.ident	"GCC: (GNU) 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)"
Re: oprofile callgraph support missing for common cpus
On Mon, 2011-12-05 at 09:50 +0100, Joakim Tjernlund wrote:
> Benjamin Herrenschmidt wrote on 2011/11/25 06:24:32:
> >
> > On Fri, 2011-11-18 at 09:22 +0100, Joakim Tjernlund wrote:
> >
> > > I forgot to ask, oprofile mentions setting -no-omit-framepointer to get
> > > correct backtrace but I cannot turn on frame pointers for the ppc kernel.
> > > Isn't frame pointers needed for pcc? what about user space?
> >
> > PowerPC always has frame pointers, ignore that :-)
>
> A bit late but consider this:

.../...

Right, I wasn't clear. We do have frame pointers for non-leaf functions,
and we can trace from LR when we are on a leaf function; we can use
__builtin_return_address as well.

We also explicitly prevent -fno-omit-frame-pointer, iirc, due to a bug
with older versions of gcc which could cause miscompiles under some
circumstances (though I don't remember the details).

Cheers,
Ben.

> int leaf(int x)
> {
> 	return x+3;
> }
>
> which yields(with gcc -O2 -S):
> 	.file	"leaf.c"
> 	.section	".text"
> 	.align 2
> 	.globl leaf
> 	.type	leaf, @function
> leaf:
> 	addi 3,3,3
> 	blr
> 	.size	leaf, .-leaf
> 	.section	.note.GNU-stack,"",@progbits
> 	.ident	"GCC: (GNU) 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)"
>
>
> Here there is with frame pointer(I guess that the messing around with r11 and
> r31 is a defect?):
> (With gcc -O2 -S -fno-omit-frame-pointer)
>
> 	.file	"leaf.c"
> 	.section	".text"
> 	.align 2
> 	.globl leaf
> 	.type	leaf, @function
> leaf:
> 	stwu 1,-16(1)
> 	addi 3,3,3
> 	lwz 11,0(1)
> 	stw 31,12(1)
> 	mr 31,1
> 	lwz 31,-4(11)
> 	mr 1,11
> 	blr
> 	.size	leaf, .-leaf
> 	.section	.note.GNU-stack,"",@progbits
> 	.ident	"GCC: (GNU) 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)"
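For reference, a minimal userspace sketch of the __builtin_return_address
point made above, assuming GCC (or a compatible compiler). The helper name
who_called_me() is made up for illustration and is not anything in the
kernel or in oprofile.

```c
#include <stddef.h>

/*
 * Hypothetical helper: report where the current function will return to.
 * noinline keeps this a real call, so there is a return address to read.
 */
__attribute__((noinline)) static void *who_called_me(void)
{
	/*
	 * Level 0 is the return address of the current function. On
	 * PowerPC that is the value the caller's branch-and-link left
	 * in LR, so no frame pointer chain is needed to find it.
	 */
	return __builtin_return_address(0);
}
```

Only level 0 is used here; walking further up the stack with non-zero
levels is where the frame layout starts to matter.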
[PATCH] mmc: sdhci-pltfm: Added sdhci-adjust-timeout quirk
Some controllers provide an incorrect timeout value for transfers, so
they need a quirk to adjust the timeout value to 0xE, e.g. the eSDHC of
MPC8536, P1010, and P2020.

Signed-off-by: Xie Xiaobo
---
 drivers/mmc/host/sdhci-pltfm.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/mmc/host/sdhci-pltfm.c b/drivers/mmc/host/sdhci-pltfm.c
index a9e12ea..b5d6b3f 100644
--- a/drivers/mmc/host/sdhci-pltfm.c
+++ b/drivers/mmc/host/sdhci-pltfm.c
@@ -2,7 +2,7 @@
  * sdhci-pltfm.c Support for SDHCI platform devices
  * Copyright (c) 2009 Intel Corporation
  *
- * Copyright (c) 2007 Freescale Semiconductor, Inc.
+ * Copyright (c) 2007, 2011 Freescale Semiconductor, Inc.
  * Copyright (c) 2009 MontaVista Software, Inc.
  *
  * Authors: Xiaobo Xie
@@ -68,6 +68,9 @@ void sdhci_get_of_property(struct platform_device *pdev)
 	if (of_get_property(np, "sdhci,1-bit-only", NULL))
 		host->quirks |= SDHCI_QUIRK_FORCE_1_BIT_DATA;
 
+	if (of_get_property(np, "sdhci,sdhci-adjust-timeout", NULL))
+		host->quirks |= SDHCI_QUIRK_BROKEN_TIMEOUT_VAL;
+
 	if (sdhci_of_wp_inverted(np))
 		host->quirks |= SDHCI_QUIRK_INVERTED_WRITE_PROTECT;
 
-- 
1.6.4
Re: [PATCH v3 2/3] hvc_init(): Enforce one-time initialization.
On (Tue) 29 Nov 2011 [09:50:41], Miche Baker-Harvey wrote:
> Good grief! Sorry for the spacing mess-up! Here's a resend with
> reformatting.
>
> Amit,
> We aren't using either QEMU or kvmtool, but we are using KVM. All

So it's a different userspace? Any chance this different userspace is
causing these problems to appear? Esp. since I couldn't reproduce
with qemu.

> the issues we are seeing happen when we try to establish multiple
> virtioconsoles at boot time. The command line isn't relevant, but I
> can tell you the protocol that's passing between the host (kvm) and
> the guest (see the end of this message).
>
> We do go through the control_work_handler(), but it's not
> providing synchronization. Here's a trace of the
> control_work_handler() and handle_control_message() calls; note that
> there are two concurrent calls to control_work_handler().

Ah; how does that happen? control_work_handler() should just be
invoked once, and if there are any more pending work items to be
consumed, they should be done within the loop inside
control_work_handler().

> I decorated control_work_handler() with a "lifetime" marker, and
> passed this value to handle_control_message(), so we can see which
> control messages are being handled from which instance of
> the control_work_handler() thread.
>
> Notice that we enter control_work_handler() a second time before
> the handling of the second PORT_ADD message is complete. The
> first CONSOLE_PORT message is handled by the second
> control_work_handler() call, but the second is handled by the first
> control_work_handler() call.
>
> root@myubuntu:~# dmesg | grep MBH
> [3371055.808738] control_work_handler #1
> [3371055.809372] + #1 handle_control_message PORT_ADD
> [3371055.810169] - handle_control_message PORT_ADD
> [3371055.810170] + #1 handle_control_message PORT_ADD
> [3371055.810244] control_work_handler #2
> [3371055.810245] + #2 handle_control_message CONSOLE_PORT
> [3371055.810246] got hvc_ports_mutex
> [3371055.810578] - handle_control_message PORT_ADD
> [3371055.810579] + #1 handle_control_message CONSOLE_PORT
> [3371055.810580] trylock of hvc_ports_mutex failed
> [3371055.811352] got hvc_ports_mutex
> [3371055.811370] - handle_control_message CONSOLE_PORT
> [3371055.816609] - handle_control_message CONSOLE_PORT
>
> So, I'm guessing the bug is that there shouldn't be two instances of
> control_work_handler() running simultaneously?

Yep, I assumed we did that but apparently not. Do you plan to chase
this one down?

		Amit
[PATCH 2/2] mtd/nand: Add ONFI support for FSL NAND controller
- fix NAND_CMD_READID command for ONFI detect. - add NAND_CMD_PARAM command to read the ONFI parameter page. Signed-off-by: Shengzhou Liu --- drivers/mtd/nand/fsl_elbc_nand.c | 19 --- 1 files changed, 12 insertions(+), 7 deletions(-) diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c index 742bf73..08a3aba 100644 --- a/drivers/mtd/nand/fsl_elbc_nand.c +++ b/drivers/mtd/nand/fsl_elbc_nand.c @@ -349,19 +349,24 @@ static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command, fsl_elbc_run_command(mtd); return; - /* READID must read all 5 possible bytes while CEB is active */ case NAND_CMD_READID: - dev_vdbg(priv->dev, "fsl_elbc_cmdfunc: NAND_CMD_READID.\n"); + case NAND_CMD_PARAM: + dev_vdbg(priv->dev, "fsl_elbc_cmdfunc: NAND_CMD %x\n", command); out_be32(&lbc->fir, (FIR_OP_CM0 << FIR_OP0_SHIFT) | (FIR_OP_UA << FIR_OP1_SHIFT) | (FIR_OP_RBW << FIR_OP2_SHIFT)); - out_be32(&lbc->fcr, NAND_CMD_READID << FCR_CMD0_SHIFT); - /* nand_get_flash_type() reads 8 bytes of entire ID string */ - out_be32(&lbc->fbcr, 8); - elbc_fcm_ctrl->read_bytes = 8; + out_be32(&lbc->fcr, command << FCR_CMD0_SHIFT); + /* reads 8 bytes of entire ID string */ + if (NAND_CMD_READID == command) { + out_be32(&lbc->fbcr, 8); + elbc_fcm_ctrl->read_bytes = 8; + } else { + out_be32(&lbc->fbcr, 256); + elbc_fcm_ctrl->read_bytes = 256; + } elbc_fcm_ctrl->use_mdr = 1; - elbc_fcm_ctrl->mdr = 0; + elbc_fcm_ctrl->mdr = column; set_addr(mtd, 0, 0, 0); fsl_elbc_run_command(mtd); -- 1.6.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/2] mtd/nand: fixup for fmr initialization of Freescale NAND controller
There was a bug in the fmr initialization that left fmr always at 0x100
in fsl_elbc_chip_init() and caused FCM command timeouts before
fsl_elbc_chip_init_tail() was called.

Signed-off-by: Shengzhou Liu
---
 drivers/mtd/nand/fsl_elbc_nand.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c
index eedd8ee..742bf73 100644
--- a/drivers/mtd/nand/fsl_elbc_nand.c
+++ b/drivers/mtd/nand/fsl_elbc_nand.c
@@ -659,9 +659,7 @@ static int fsl_elbc_chip_init_tail(struct mtd_info *mtd)
 	if (chip->pagemask & 0xff00)
 		al++;
 
-	/* add to ECCM mode set in fsl_elbc_init */
-	priv->fmr |= (12 << FMR_CWTO_SHIFT) | /* Timeout > 12 ms */
-	             (al << FMR_AL_SHIFT);
+	priv->fmr |= al << FMR_AL_SHIFT;
 
 	dev_dbg(priv->dev, "fsl_elbc_init: nand->numchips = %d\n",
 	        chip->numchips);
@@ -764,8 +762,8 @@ static int fsl_elbc_chip_init(struct fsl_elbc_mtd *priv)
 	priv->mtd.priv = chip;
 	priv->mtd.owner = THIS_MODULE;
 
-	/* Set the ECCM according to the settings in bootloader.*/
-	priv->fmr = in_be32(&lbc->fmr) & FMR_ECCM;
+	/* Set fmr according to the settings in bootloader.*/
+	priv->fmr = in_be32(&lbc->fmr);
 
 	/* fill in nand_chip structure */
 	/* set up function call table */
-- 
1.6.4
RE: Re: [PATCHv5] atomic: add *_dec_not_zero
Looking at this:

> #ifndef atomic_inc_unless_negative
> static inline int atomic_inc_unless_negative(atomic_t *p)
> {
> 	int v, v1;
> 	for (v = 0; v >= 0; v = v1) {
> 		v1 = atomic_cmpxchg(p, v, v + 1);
> 		if (likely(v1 == v))
> 			return 1;
> 	}
> 	return 0;
> }
> #endif

why is it optimised for '*p' being zero??
I'd have thought the initial assignment to 'v' should be made
by reading '*p' without any memory barriers (etc).

	David
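A minimal sketch of the variant suggested above, where the loop is seeded
from a plain read of *p instead of assuming zero. This is a userspace
approximation using C11 <stdatomic.h>, not the kernel's atomic_t API; the
name inc_unless_negative() is made up for illustration.

```c
#include <stdatomic.h>

/*
 * Userspace sketch with C11 atomics, not the kernel's atomic_t helpers.
 * Returns 1 if *p was incremented, 0 if it was found to be negative.
 */
static inline int inc_unless_negative(atomic_int *p)
{
	/* Seed v with the current value via a relaxed (barrier-free) load. */
	int v = atomic_load_explicit(p, memory_order_relaxed);

	while (v >= 0) {
		/*
		 * On failure the compare-exchange stores the value it
		 * actually found in *p back into v, so the loop retries
		 * with fresh data or exits once v is negative.
		 */
		if (atomic_compare_exchange_weak(p, &v, v + 1))
			return 1;
	}
	return 0;
}
```

With this seeding, the common case of a non-zero positive counter can
succeed on the first compare-exchange rather than the second.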
Re: [PATCH 0/6] RFCv2 Fix Fsl 8250 BRK bug
> Anyway, have a look and see if this version of things is acceptable > to all. (Again, the dts update from Kumar isn't shown here). > > Thanks to all who provided the feedback on v1. Looks good to me Acked-by: Alan Cox ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Problem with eLBC?
Dear Kumar Gala! Our company is a client Freescale company, we use mpc8308, mpc8321 and other microprocessors. I have a board on the mpc8308 chip. This board runs the Linux kernel. On this board there is NAND flash and DSP proc. on the Local Bus. Chunk from DTS-file is below: localbus@e0005000 { #address-cells = <2>; #size-cells = <1>; compatible = "fsl,mpc8315-elbc", "fsl,elbc", "simple-bus"; reg = <0xe0005000 0x1000>; interrupts = <77 0x8>; interrupt-parent = <&ipic>; // CS0 and CS1 are swapped when // booting from nand, but the // addresses are the same. ranges = <0x0 0x0 0xfe00 0x0080 0x1 0x0 0xe060 0x2000 0x2 0x0 0xf000 0x0002 0x3 0x0 0xfa00 0x8000>; nand@1,0 { #address-cells = <1>; #size-cells = <1>; compatible = "fsl,mpc8315-fcm-nand", "fsl,elbc-fcm-nand"; reg = <0x1 0x0 0x2000>; u-boot@0 { reg = <0x0 0x10>; read-only; label = "U-Boot-NAND"; }; dtb@10 { reg = <0x10 0x4>; read-only; label = "DTB-NAND"; }; kernel@14 { reg = <0x14 0x20>; read-only; label = "Kernel-NAND"; }; jffs2@34 { reg = <0x0034 0x01c0>; label = "JFFS2-NAND"; }; reserve@1f4 { reg = <0x01f4 0x000c>; label = "Reserve"; }; }; dsp0@D002 { reg = <0xD002 0x1>; interrupts = <18 0x8>; interrupt-parent = <&ipic>; dsp0; }; dsp1@D003 { reg = <0xD003 0x1>; interrupts = <19 0x8>; interrupt-parent = <&ipic>; dsp1; }; User-level application periodically reads data from the DSP and writes the data to the DSP via a character device. When the application is reading from DSP or writing to the DSP, file system calls to cause errors: [root@mpc8308-kd-124 /root]# ls -l mtd->read(0xdc bytes from 0x8af524) returned ECC error mtd->read(0x1fc bytes from 0x8aee04) returned ECC error mtd->read(0xac bytes from 0x8ae554) returned ECC error mtd->read(0x200 bytes from 0x8adc00) returned ECC error .. .. ... when writing to flash errors occur following items: Write of 1662 bytes at 0x01935244 failed. 
returned -5, retlen 0 Not marking the space at 0x01935244 as dirty because the flash driver returned retlen zero Write of 1662 bytes at 0x0193 failed. returned -5, retlen 0 Not marking the space at 0x0193 as dirty because the flash driver returned retlen zero nand_erase: start = 0x01c7, len = 16384 nand_isbad_bbt(): bbt info for offs 0x01c7: (block 1820) 0x00 nand_write_oob: to = 0x01c7, len = 8 cannot write OOB for EB at 0193, requested 8 bytes, read 0 bytes, error -5 I write to support this problem and I was told the followin
[PATCH] sbc834x: put full compat string in board match check
The commit 883c2cfc8bcc0fd00c5d9f596fb8870f481b5bda:

  "fix of_flat_dt_is_compatible() to match the full compatible string"

causes silent boot death on the sbc8349 board because it was
just looking for 8349 and not 8349E -- as originally there
were non-E (no SEC/encryption) chips available. Just add the
E to the board detection string since all boards I've seen
were manufactured with the E versions.

Signed-off-by: Paul Gortmaker

diff --git a/arch/powerpc/platforms/83xx/sbc834x.c b/arch/powerpc/platforms/83xx/sbc834x.c
index af41d8c..f5a783a 100644
--- a/arch/powerpc/platforms/83xx/sbc834x.c
+++ b/arch/powerpc/platforms/83xx/sbc834x.c
@@ -102,11 +102,11 @@ static int __init sbc834x_probe(void)
 {
 	unsigned long root = of_get_flat_dt_root();
 
-	return of_flat_dt_is_compatible(root, "SBC834x");
+	return of_flat_dt_is_compatible(root, "SBC834xE");
 }
 
 define_machine(sbc834x) {
-	.name = "SBC834x",
+	.name = "SBC834xE",
 	.probe = sbc834x_probe,
 	.setup_arch = sbc834x_setup_arch,
 	.init_IRQ = sbc834x_init_IRQ,
-- 
1.7.7
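To illustrate why the shorter string stopped matching, here is a small
userspace sketch contrasting a prefix comparison with a full-string one.
The two helpers are hypothetical stand-ins, not the kernel's
of_flat_dt_is_compatible(), and it assumes the board's device tree
carries "SBC834xE" as its compatible entry, as the patch implies.

```c
#include <string.h>
#include <strings.h>

/* Hypothetical helpers for illustration only. */

/* Prefix-style check: succeeds when `wanted` is merely a prefix of `entry`. */
static int compat_prefix_match(const char *entry, const char *wanted)
{
	return strncasecmp(entry, wanted, strlen(wanted)) == 0;
}

/* Full-string check: `entry` must equal `wanted` (case-insensitively). */
static int compat_full_match(const char *entry, const char *wanted)
{
	return strcasecmp(entry, wanted) == 0;
}
```

Under that assumption, "SBC834x" passes the prefix check against
"SBC834xE" but fails the full-string check, which would explain the
probe silently failing until the "E" is added.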
Re: [PATCH v3 02/10] powerpc: Consolidate mpic_alloc() OF address translation
On Dec 03, 2011, at 10:53, Kumar Gala wrote: > On Dec 2, 2011, at 10:27 AM, Kyle Moffett wrote: >> Instead of using the open-coded "reg" property lookup and address >> translation in mpic_alloc(), directly call of_address_to_resource(). >> This includes various workarounds for special cases which the naive >> of_address_translate() does not. >> >> Afterwards it is possible to remove the copiously copy-pasted calls to >> of_address_translate() from the 85xx/86xx/powermac platforms. >> >> Signed-off-by: Kyle Moffett >> Cc: Benjamin Herrenschmidt >> Cc: Paul Mackerras >> Cc: Grant Likely >> Cc: Kumar Gala >> --- >> arch/powerpc/platforms/85xx/corenet_ds.c |9 + >> arch/powerpc/platforms/85xx/ksi8560.c |9 + >> arch/powerpc/platforms/85xx/mpc8536_ds.c |9 + >> arch/powerpc/platforms/85xx/mpc85xx_ads.c |9 + >> arch/powerpc/platforms/85xx/mpc85xx_cds.c |9 + >> arch/powerpc/platforms/85xx/mpc85xx_ds.c | 11 + >> arch/powerpc/platforms/85xx/mpc85xx_mds.c |9 + >> arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 11 + >> arch/powerpc/platforms/85xx/p1010rdb.c|9 + >> arch/powerpc/platforms/85xx/p1022_ds.c|9 + >> arch/powerpc/platforms/85xx/p1023_rds.c |9 + >> arch/powerpc/platforms/85xx/sbc8548.c |9 + >> arch/powerpc/platforms/85xx/sbc8560.c |9 + >> arch/powerpc/platforms/85xx/socrates.c|9 + >> arch/powerpc/platforms/85xx/stx_gp3.c |9 + >> arch/powerpc/platforms/85xx/tqm85xx.c |9 + >> arch/powerpc/platforms/85xx/xes_mpc85xx.c |9 + >> arch/powerpc/platforms/86xx/pic.c |4 +- >> arch/powerpc/platforms/powermac/pic.c |8 +--- >> arch/powerpc/sysdev/mpic.c| 61 >> - >> 20 files changed, 55 insertions(+), 175 deletions(-) > > What about cleaning up: > > arch/powerpc/platforms/chrp/setup.c:chrp_mpic = mpic_alloc(np, opaddr, > MPIC_PRIMARY, > arch/powerpc/platforms/embedded6xx/holly.c: mpic = mpic_alloc(tsi_pic, > mpic_paddr, > arch/powerpc/platforms/embedded6xx/linkstation.c: mpic = > mpic_alloc(dnp, paddr, MPIC_PRIMARY | MPIC > arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c: mpic = > 
mpic_alloc(tsi_pic, mpic_paddr, > arch/powerpc/platforms/embedded6xx/storcenter.c:mpic = > mpic_alloc(dnp, paddr, MPIC_PRIMARY | MPIC > arch/powerpc/platforms/maple/setup.c: mpic = mpic_alloc(mpic_node, > openpic_addr, flags, > arch/powerpc/platforms/pasemi/setup.c: mpic = mpic_alloc(mpic_node, > openpic_addr, > arch/powerpc/platforms/pseries/setup.c: mpic = mpic_alloc(pSeries_mpic_node, > openpic_addr, > > Seems like we should be able to remove the 'phys_addr' argument altogether. Well, ideally the MPIC code would just be a OF platform_driver with a bit of supplementary platform_data to deal with device-tree flaws. Unfortunately it's quite a long way from that. Some platforms seem to prefer to use a "platform-open-pic" property on the root node instead of setting up the "reg" node of the open-pic itself. Furthermore, the ISU configuration seems to be board-specific. pSeries seems to have all of the ISUs configured as additional cells in the "platform-open-pic" property, but almost all of the rest are just hard-coded offsets from the PIC address in the board-support code. If it was possible to fix the device-trees on the systems with hardcoded offsets then we could put the ISU addresses into the "platform-open-pic" property and test that in mpic_alloc(). Otherwise there's still going to be a fair amount of hardcoding for specific boards. Regardless, I think this patch series is a good first cut and cleaning up some of the more egregious code duplication there. Cheers, Kyle Moffett -- Curious about my work on the Debian powerpcspe port? I'm keeping a blog here: http://pureperl.blogspot.com/ ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 02/10] powerpc: Consolidate mpic_alloc() OF address translation
On Dec 5, 2011, at 12:41 PM, Moffett, Kyle D wrote: > On Dec 03, 2011, at 10:53, Kumar Gala wrote: >> On Dec 2, 2011, at 10:27 AM, Kyle Moffett wrote: >>> Instead of using the open-coded "reg" property lookup and address >>> translation in mpic_alloc(), directly call of_address_to_resource(). >>> This includes various workarounds for special cases which the naive >>> of_address_translate() does not. >>> >>> Afterwards it is possible to remove the copiously copy-pasted calls to >>> of_address_translate() from the 85xx/86xx/powermac platforms. >>> >>> Signed-off-by: Kyle Moffett >>> Cc: Benjamin Herrenschmidt >>> Cc: Paul Mackerras >>> Cc: Grant Likely >>> Cc: Kumar Gala >>> --- >>> arch/powerpc/platforms/85xx/corenet_ds.c |9 + >>> arch/powerpc/platforms/85xx/ksi8560.c |9 + >>> arch/powerpc/platforms/85xx/mpc8536_ds.c |9 + >>> arch/powerpc/platforms/85xx/mpc85xx_ads.c |9 + >>> arch/powerpc/platforms/85xx/mpc85xx_cds.c |9 + >>> arch/powerpc/platforms/85xx/mpc85xx_ds.c | 11 + >>> arch/powerpc/platforms/85xx/mpc85xx_mds.c |9 + >>> arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 11 + >>> arch/powerpc/platforms/85xx/p1010rdb.c|9 + >>> arch/powerpc/platforms/85xx/p1022_ds.c|9 + >>> arch/powerpc/platforms/85xx/p1023_rds.c |9 + >>> arch/powerpc/platforms/85xx/sbc8548.c |9 + >>> arch/powerpc/platforms/85xx/sbc8560.c |9 + >>> arch/powerpc/platforms/85xx/socrates.c|9 + >>> arch/powerpc/platforms/85xx/stx_gp3.c |9 + >>> arch/powerpc/platforms/85xx/tqm85xx.c |9 + >>> arch/powerpc/platforms/85xx/xes_mpc85xx.c |9 + >>> arch/powerpc/platforms/86xx/pic.c |4 +- >>> arch/powerpc/platforms/powermac/pic.c |8 +--- >>> arch/powerpc/sysdev/mpic.c| 61 >>> - >>> 20 files changed, 55 insertions(+), 175 deletions(-) >> >> What about cleaning up: >> >> arch/powerpc/platforms/chrp/setup.c:chrp_mpic = mpic_alloc(np, opaddr, >> MPIC_PRIMARY, >> arch/powerpc/platforms/embedded6xx/holly.c: mpic = mpic_alloc(tsi_pic, >> mpic_paddr, >> arch/powerpc/platforms/embedded6xx/linkstation.c: mpic = >> 
mpic_alloc(dnp, paddr, MPIC_PRIMARY | MPIC >> arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c: mpic = >> mpic_alloc(tsi_pic, mpic_paddr, >> arch/powerpc/platforms/embedded6xx/storcenter.c:mpic = >> mpic_alloc(dnp, paddr, MPIC_PRIMARY | MPIC >> arch/powerpc/platforms/maple/setup.c: mpic = mpic_alloc(mpic_node, >> openpic_addr, flags, >> arch/powerpc/platforms/pasemi/setup.c: mpic = mpic_alloc(mpic_node, >> openpic_addr, >> arch/powerpc/platforms/pseries/setup.c: mpic = mpic_alloc(pSeries_mpic_node, >> openpic_addr, >> >> Seems like we should be able to remove the 'phys_addr' argument altogether. > > Well, ideally the MPIC code would just be a OF platform_driver with a > bit of supplementary platform_data to deal with device-tree flaws. > Unfortunately it's quite a long way from that. > > Some platforms seem to prefer to use a "platform-open-pic" property on > the root node instead of setting up the "reg" node of the open-pic > itself. > > Furthermore, the ISU configuration seems to be board-specific. pSeries > seems to have all of the ISUs configured as additional cells in the > "platform-open-pic" property, but almost all of the rest are just > hard-coded offsets from the PIC address in the board-support code. > > If it was possible to fix the device-trees on the systems with hardcoded > offsets then we could put the ISU addresses into the "platform-open-pic" > property and test that in mpic_alloc(). > > Otherwise there's still going to be a fair amount of hardcoding for > specific boards. > > Regardless, I think this patch series is a good first cut and cleaning > up some of the more egregious code duplication there. > > Cheers, > Kyle Moffett Agreed its a good first pass cleanup but it doesn't seem like we're that far off from remove the 'phys_addr' being passed in. - k ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2] mtd/nand: fixup for fmr initialization of Freescale NAND controller
On 12/05/2011 04:54 AM, Shengzhou Liu wrote:
> There was a bug for fmr initialization, which lead to fmr was always 0x100
> in fsl_elbc_chip_init() and caused FCM command timeout before calling
> fsl_elbc_chip_init_tail().
>
> Signed-off-by: Shengzhou Liu
> ---
>  drivers/mtd/nand/fsl_elbc_nand.c |8 +++-
>  1 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c
> index eedd8ee..742bf73 100644
> --- a/drivers/mtd/nand/fsl_elbc_nand.c
> +++ b/drivers/mtd/nand/fsl_elbc_nand.c
> @@ -659,9 +659,7 @@ static int fsl_elbc_chip_init_tail(struct mtd_info *mtd)
>  	if (chip->pagemask & 0xff00)
>  		al++;
>
> -	/* add to ECCM mode set in fsl_elbc_init */
> -	priv->fmr |= (12 << FMR_CWTO_SHIFT) | /* Timeout > 12 ms */
> -	             (al << FMR_AL_SHIFT);
> +	priv->fmr |= al << FMR_AL_SHIFT;
>
>  	dev_dbg(priv->dev, "fsl_elbc_init: nand->numchips = %d\n",
>  	        chip->numchips);
> @@ -764,8 +762,8 @@ static int fsl_elbc_chip_init(struct fsl_elbc_mtd *priv)
>  	priv->mtd.priv = chip;
>  	priv->mtd.owner = THIS_MODULE;
>
> -	/* Set the ECCM according to the settings in bootloader.*/
> -	priv->fmr = in_be32(&lbc->fmr) & FMR_ECCM;
> +	/* Set fmr according to the settings in bootloader.*/
> +	priv->fmr = in_be32(&lbc->fmr);
>
>  	/* fill in nand_chip structure */
>  	/* set up function call table */

We shouldn't be relying on the bootloader to provide a sane value here --
the bootloader may not have used/initialized NAND at all. It's sort of
OK for ECCM, since unless you're trying to match an externally programmed
flash, or the bootloader uses the flash, all we really care about is that
the value stay consistent. The timeout, OTOH, must not be set too low or
things won't work. We should just set a value that we believe to be high
enough for all uses.

-Scott
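A rough sketch of the approach described above: keep only ECCM from
whatever the bootloader left in FMR, and set the timeout and address
length explicitly. The helper fmr_compose() is hypothetical, and the
mask and shift values are assumptions recalled from asm/fsl_lbc.h rather
than taken from this thread (only FMR_ECCM == 0x100 is implied by the
patch description).

```c
#include <stdint.h>

/* Assumed register layout; check asm/fsl_lbc.h before relying on it. */
#define FMR_CWTO        0x0000F000u
#define FMR_CWTO_SHIFT  12
#define FMR_ECCM        0x00000100u
#define FMR_AL_SHIFT    4

/*
 * Hypothetical helper: build the FMR value the driver would use.
 * Only the ECCM bit is inherited from the bootloader; the command
 * wait timeout (cwto) and address length (al) are chosen by the driver.
 */
static uint32_t fmr_compose(uint32_t bootloader_fmr, uint32_t cwto, uint32_t al)
{
	uint32_t fmr = bootloader_fmr & FMR_ECCM;

	fmr |= (cwto << FMR_CWTO_SHIFT) & FMR_CWTO;
	fmr |= al << FMR_AL_SHIFT;
	return fmr;
}
```

Which cwto value is "high enough for all uses" is left open here, as it
is in the mail above.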
Re: [PATCH 3/3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
On 12/05/2011 12:47 AM, Artem Bityutskiy wrote:
> On Sun, 2011-12-04 at 12:31 +0800, shuo@freescale.com wrote:
>> +	/*
>> +	 * Freescale FCM controller has a 2K size limitation of buffer
>> +	 * RAM, so elbc_fcm_ctrl->buffer have to be used if writesize
>> +	 * of chip is greater than 2048.
>> +	 * We malloc a large enough buffer (maximum page size is 16K).
>> +	 */
>> +	elbc_fcm_ctrl->buffer = kmalloc(1024 * 16 + 1024, GFP_KERNEL);
>> +	if (!elbc_fcm_ctrl->buffer) {
>> +		dev_err(dev, "failed to allocate memory\n");
>> +		mutex_unlock(&fsl_elbc_nand_mutex);
>> +		ret = -ENOMEM;
>> +		goto err;
>> +	}
>
> Sorry for returning to this again and agian - I do not have time to dig
> suggest you the right solutions on the one hand, you do not provide me a
> good answer on the other hand (or I forgot?).
>
> 16KiB pages do not even exist I believe.

Googling turns up some hints of it, but nothing concrete such as a
datasheet. We can assume 8K max for now and adjust it later, as the need
becomes clear.

> And you kmalloc 33KiB or RAM

17KiB, or 9KiB if we forget about 16K-page NAND.

> although in most cases you need only 5KiB. I think this is wrong -
> what is the very strong reason of wasting RAM you have?
>
> Why you cannot allocate exactly the required amount of RAM after
> 'nand_scan_ident()' finishes and you know the page size?

Because this is a controller resource, shared by multiple NAND chips that
may be different page sizes (even if not, it's adding another point of
synchronization required between initialization of different chips). I
don't think it's worth the gymnastics to save a few KiB.

-Scott
Powerbook G4 and sound
Hello,

I've got a problem with sound on my Powerbook G4 Titanium (PowerBook3,5).
The sound always crackles and skips when I move my touchpad, regardless of
the source of the sound. It happens in all my programs, including vlc,
mplayer, e-uae and ioquake3 - they show messages in the log like "underrun
occured" and "broken pipe".

If I remember correctly, this bug appears on kernel 2.6.32-5 (from Debian
Squeeze) and later. On kernel 2.6.38 from Ubuntu sound works _almost_
normally (crackles occur rarely).

I also used the mpd server, but it breaks after the sound crackles (ncmpc
shows "Timeout") and I can only restart the mpd server
(/etc/init.d/mpd restart) to get mpd working (for some time).

The log from alsa-test.sh is attached.

Sorry for my English :)

Best regards,
Петр метель
Re: Problem with eLBC?
On 12/05/2011 08:02 AM, Alexander Lyasin wrote:
> In reply to your Service Request SR 1-807899446:
>
> Yes, due to several design peculiarities in local bus nand controller,
> simultaneous accesses to nand flash and to other local bus memory
> controller may cause nand flash controller access failure. Our linux
> team suggested to use "software lock" method to avoid this problem -
> please do not use other local bus controllers, when nand flash is accessed.

What kernel version are you using? The latest mainline kernel should not
have this issue. Make sure you have these patches:

commit d08e44570ed611c527a1062eb4f8c6ac61832e6e
Author: Shengzhou Liu
Date:   Thu May 19 18:48:01 2011 +0800

    powerpc/fsl_lbc: Add workaround for ELBC-A001 erratum

    Simultaneous FCM and GPCM or UPM operation may erroneously trigger
    bus monitor timeout.

    Set the local bus monitor timeout value to the maximum by setting
    LBCR[BMT] = 0 and LBCR[BMTPS] = 0xF.

    Signed-off-by: Shengzhou Liu
    Signed-off-by: Kumar Gala

and

commit 476459a6cf46d20ec73d9b211f3894ced5f9871e
Author: Scott Wood
Date:   Fri Nov 13 14:13:01 2009 -0600

    mtd: eLBC NAND: use recommended command sequences

    Currently, the program and erase sequences do not wait for completion,
    instead relying on a subsequent waitfunc() callback. However, this
    causes the chipselect to be deasserted while the NAND chip is still
    asserting the busy pin, which can corrupt activity on other chipselects.

    This patch switches to using the sequences recommended by the manual,
    in which a wait is performed within the initial command sequence. We can
    now re-use the status byte from the initial command sequence, rather than
    having to do another status read in the waitfunc.

    Since we're already touching the command sequences, it also cleans up
    some cruft in SEQIN that isn't needed since we cannot program partial
    pages outside of OOB.

    Signed-off-by: Scott Wood
    Reported-by: Suchit Lepcha
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: David Woodhouse

-Scott
[PATCH] rapidio/tsi721: switch to dma_zalloc_coherent
Replaces pair dma_alloc_coherent()+memset() with new dma_zalloc_coherent() added by Andrew Morton for kernel version 3.2 Signed-off-by: Alexandre Bounine --- drivers/rapidio/devices/tsi721.c | 17 - 1 files changed, 4 insertions(+), 13 deletions(-) diff --git a/drivers/rapidio/devices/tsi721.c b/drivers/rapidio/devices/tsi721.c index 5225930..514c28c 100644 --- a/drivers/rapidio/devices/tsi721.c +++ b/drivers/rapidio/devices/tsi721.c @@ -851,14 +851,12 @@ static int tsi721_doorbell_init(struct tsi721_device *priv) INIT_WORK(&priv->idb_work, tsi721_db_dpc); /* Allocate buffer for inbound doorbells queue */ - priv->idb_base = dma_alloc_coherent(&priv->pdev->dev, + priv->idb_base = dma_zalloc_coherent(&priv->pdev->dev, IDB_QSIZE * TSI721_IDB_ENTRY_SIZE, &priv->idb_dma, GFP_KERNEL); if (!priv->idb_base) return -ENOMEM; - memset(priv->idb_base, 0, IDB_QSIZE * TSI721_IDB_ENTRY_SIZE); - dev_dbg(&priv->pdev->dev, "Allocated IDB buffer @ %p (phys = %llx)\n", priv->idb_base, (unsigned long long)priv->idb_dma); @@ -904,7 +902,7 @@ static int tsi721_bdma_ch_init(struct tsi721_device *priv, int chnum) */ /* Allocate space for DMA descriptors */ - bd_ptr = dma_alloc_coherent(&priv->pdev->dev, + bd_ptr = dma_zalloc_coherent(&priv->pdev->dev, bd_num * sizeof(struct tsi721_dma_desc), &bd_phys, GFP_KERNEL); if (!bd_ptr) @@ -913,8 +911,6 @@ static int tsi721_bdma_ch_init(struct tsi721_device *priv, int chnum) priv->bdma[chnum].bd_phys = bd_phys; priv->bdma[chnum].bd_base = bd_ptr; - memset(bd_ptr, 0, bd_num * sizeof(struct tsi721_dma_desc)); - dev_dbg(&priv->pdev->dev, "DMA descriptors @ %p (phys = %llx)\n", bd_ptr, (unsigned long long)bd_phys); @@ -922,7 +918,7 @@ static int tsi721_bdma_ch_init(struct tsi721_device *priv, int chnum) sts_size = (bd_num >= TSI721_DMA_MINSTSSZ) ? 
bd_num : TSI721_DMA_MINSTSSZ; sts_size = roundup_pow_of_two(sts_size); - sts_ptr = dma_alloc_coherent(&priv->pdev->dev, + sts_ptr = dma_zalloc_coherent(&priv->pdev->dev, sts_size * sizeof(struct tsi721_dma_sts), &sts_phys, GFP_KERNEL); if (!sts_ptr) { @@ -938,8 +934,6 @@ static int tsi721_bdma_ch_init(struct tsi721_device *priv, int chnum) priv->bdma[chnum].sts_base = sts_ptr; priv->bdma[chnum].sts_size = sts_size; - memset(sts_ptr, 0, sts_size); - dev_dbg(&priv->pdev->dev, "desc status FIFO @ %p (phys = %llx) size=0x%x\n", sts_ptr, (unsigned long long)sts_phys, sts_size); @@ -1400,7 +1394,7 @@ static int tsi721_open_outb_mbox(struct rio_mport *mport, void *dev_id, /* Outbound message descriptor status FIFO allocation */ priv->omsg_ring[mbox].sts_size = roundup_pow_of_two(entries + 1); - priv->omsg_ring[mbox].sts_base = dma_alloc_coherent(&priv->pdev->dev, + priv->omsg_ring[mbox].sts_base = dma_zalloc_coherent(&priv->pdev->dev, priv->omsg_ring[mbox].sts_size * sizeof(struct tsi721_dma_sts), &priv->omsg_ring[mbox].sts_phys, GFP_KERNEL); @@ -1412,9 +1406,6 @@ static int tsi721_open_outb_mbox(struct rio_mport *mport, void *dev_id, goto out_desc; } - memset(priv->omsg_ring[mbox].sts_base, 0, - entries * sizeof(struct tsi721_dma_sts)); - /* * Configure Outbound Messaging Engine */ -- 1.7.6
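For readers who have not met the helper yet: as the commit message says, dma_zalloc_coherent() folds the allocate-then-memset pair into one call. The sketch below models that idea in plain userspace C. fake_alloc_coherent() and zalloc_coherent_sketch() are hypothetical stand-ins built on malloc(), not the kernel API, and the "bus address" is faked.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for dma_alloc_coherent(): returns uninitialised
 * memory and reports a made-up "bus address" through *handle. */
static void *fake_alloc_coherent(size_t size, uintptr_t *handle)
{
	void *p = malloc(size);

	if (p)
		*handle = (uintptr_t)p;	/* pretend bus address == CPU address */
	return p;
}

/* Allocate and zero in one step, so the length that gets cleared can
 * never disagree with the length that was allocated. */
static void *zalloc_coherent_sketch(size_t size, uintptr_t *handle)
{
	void *p = fake_alloc_coherent(size, handle);

	if (p)
		memset(p, 0, size);
	return p;
}
```

One side effect worth noting: in the removed lines of this patch the status FIFO was allocated with sts_size * sizeof(struct tsi721_dma_sts) bytes but cleared with memset(sts_ptr, 0, sts_size), so the two lengths did not match. With a single size argument that kind of mismatch cannot be written.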
Re: [PATCH 2/2] powerpc/85xx: add a 32-bit P1022DS device tree
Kumar Gala wrote: > look at how mpc8572ds handles 36b.dts we put common definitions in a shared > file. Ok, I've made those changes, but when I boot the kernel, I'm seeing this. Can you give me a clue as to what's wrong? PCI: Probing PCI hardware pci :00:00.0: [1957:0110] type 1 class 0x000b20 pci :00:00.0: ignoring class b20 (doesn't match header type 01) pci :00:00.0: supports D1 D2 pci :00:00.0: PME# supported from D0 D1 D2 D3hot D3cold pci :00:00.0: PME# disabled pci :00:00.0: PCI bridge to [bus 01-ff] pci 0001:02:00.0: [1957:0110] type 1 class 0x000b20 pci 0001:02:00.0: ignoring class b20 (doesn't match header type 01) pci 0001:02:00.0: supports D1 D2 pci 0001:02:00.0: PME# supported from D0 D1 D2 D3hot D3cold pci 0001:02:00.0: PME# disabled pci 0001:02:00.0: PCI bridge to [bus 03-ff] pci 0002:04:00.0: [1957:0110] type 1 class 0x000b20 pci 0002:04:00.0: ignoring class b20 (doesn't match header type 01) pci 0002:04:00.0: supports D1 D2 pci 0002:04:00.0: PME# supported from D0 D1 D2 D3hot D3cold pci 0002:04:00.0: PME# disabled pci 0002:05:00.0: [8086:10d3] type 0 class 0x000200 pci 0002:05:00.0: reg 10: [mem 0x8000-0x8001] pci 0002:05:00.0: reg 14: [mem 0x8008-0x800f] pci 0002:05:00.0: reg 18: [io 0x1000-0x101f] pci 0002:05:00.0: reg 1c: [mem 0x8010-0x80103fff] pci 0002:05:00.0: reg 30: [mem 0x-0x0003 pref] pci 0002:05:00.0: PME# supported from D0 D3hot D3cold pci 0002:05:00.0: PME# disabled pci 0002:04:00.0: PCI bridge to [bus 05-ff] pci 0002:04:00.0: bridge window [mem 0x8000-0x801f] PCI: Cannot allocate resource region 0 of device 0002:05:00.0, will remap PCI: Cannot allocate resource region 1 of device 0002:05:00.0, will remap PCI: Cannot allocate resource region 3 of device 0002:05:00.0, will remap PCI :00 Cannot reserve Legacy IO [io 0xffbed000-0xffbedfff] PCI 0001:02 Cannot reserve Legacy IO [io 0xffbdb000-0xffbdbfff] PCI 0002:04 Cannot reserve Legacy IO [io 0xffbc9000-0xffbc9fff] PCI: max bus depth: 1 pci_try_num: 2 pci :00:00.0: PCI bridge to [bus 
01-01] pci :00:00.0: bridge window [io 0xffbed000-0xffbfcfff] pci :00:00.0: bridge window [mem 0xa000-0xbfff] pci 0001:02:00.0: PCI bridge to [bus 03-03] pci 0001:02:00.0: bridge window [io 0xffbdb000-0xffbeafff] pci 0001:02:00.0: bridge window [mem 0xc000-0xdfff] pci 0002:04:00.0: BAR 9: can't assign mem pref (size 0x10) pci 0002:05:00.0: BAR 1: assigned [mem 0x8000-0x8007] pci 0002:05:00.0: BAR 1: set to [mem 0x8000-0x8007] (PCI address [0xe000 -0xe007]) pci 0002:05:00.0: BAR 6: assigned [mem 0x8008-0x800b pref] pci 0002:05:00.0: BAR 0: assigned [mem 0x800c-0x800d] pci 0002:05:00.0: BAR 0: set to [mem 0x800c-0x800d] (PCI address [0xe00c -0xe00d]) pci 0002:05:00.0: BAR 3: assigned [mem 0x800e-0x800e3fff] pci 0002:05:00.0: BAR 3: set to [mem 0x800e-0x800e3fff] (PCI address [0xe00e -0xe00e3fff]) pci 0002:04:00.0: PCI bridge to [bus 05-05] pci 0002:04:00.0: bridge window [io 0xffbc9000-0xffbd8fff] pci 0002:04:00.0: bridge window [mem 0x8000-0x9fff] pci :00:00.0: enabling device (0106 -> 0107) pci 0001:02:00.0: enabling device (0106 -> 0107) pci 0002:04:00.0: enabling device (0106 -> 0107) pci_bus :00: resource 0 [io 0xffbed000-0xffbfcfff]
next BUG: using smp_processor_id() in preemptible
3.2.0-rc3-next-20111202 with CONFIG_DEBUG_PREEMPT=y gives me lots of

Dec 4 20:03:19 thorn kernel: BUG: using smp_processor_id() in preemptible [] code: startpar/1365
Dec 4 20:03:19 thorn kernel: caller is .arch_local_irq_restore+0x44/0x90
Dec 4 20:03:19 thorn kernel: Call Trace:
Dec 4 20:03:19 thorn kernel: [c001b45a7c60] [c0011fe8] .show_stack+0x6c/0x16c (unreliable)
Dec 4 20:03:19 thorn kernel: [c001b45a7d10] [c024318c] .debug_smp_processor_id+0xe4/0x11c
Dec 4 20:03:19 thorn kernel: [c001b45a7da0] [c000e2e8] .arch_local_irq_restore+0x44/0x90
Dec 4 20:03:19 thorn kernel: [c001b45a7e30] [c0005870] .do_hash_page+0x70/0x74
Dec 4 20:03:21 thorn kernel: debug_smp_processor_id: 21950 callbacks suppressed

from the
	u64 *next_tb = &__get_cpu_var(decrementers_next_tb)
in decrementer_check_overflow():

I've no idea whether it's safe just to use get_cpu_var then put_cpu_var
there instead, but no hurry, I can survive with DEBUG_PREEMPT off.

Hugh
"KVM: PPC: booke: Improve timer register emulation" breaks Book3s HV
I'm not sure why yet, but commit 8a97c432 ("KVM: PPC: booke: Improve
timer register emulation") in Alex's kvm-ppc-next branch is breaking
Book3S HV KVM on POWER7. Guest cpus fail to spin up, and even with just
one cpu, the guest stalls every so often. If I stop the guest and
inspect the state with qemu, PC is at 0x900. Reverting 8a97c432 makes
it work properly again.

Paul.
[PATCH] powerpc: Provide a way for KVM to indicate that NV GPR values are lost
This fixes a problem where a CPU thread coming out of nap mode can think it has valid values in the nonvolatile GPRs (r14 - r31) as saved away in power7_idle, but in fact the values have been trashed because the thread was used for KVM in the mean time. The result is that the thread crashes because code that called power7_idle (e.g., pnv_smp_cpu_kill_self()) goes to use values in registers that have been trashed. The bit field in SRR1 that tells whether state was lost only reflects the most recent nap, which may not have been the nap instruction in power7_idle. So we need an extra PACA field to indicate that state has been lost even if SRR1 indicates that the most recent nap didn't lose state. We clear this field when saving the state in power7_idle, we set it to a non-zero value when we use the thread for KVM, and we test it in power7_wakeup_noloss. Signed-off-by: Paul Mackerras --- I assume this should go via Ben's tree, since it touches more powerpc code than PPC KVM code. arch/powerpc/include/asm/paca.h |1 + arch/powerpc/kernel/asm-offsets.c |1 + arch/powerpc/kernel/idle_power7.S |4 arch/powerpc/kvm/book3s_hv_rmhandlers.S |3 +++ 4 files changed, 9 insertions(+) diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index 17722c7..269c05a 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -135,6 +135,7 @@ struct paca_struct { u8 hard_enabled;/* set if irqs are enabled in MSR */ u8 io_sync; /* writel() needs spin_unlock sync */ u8 irq_work_pending;/* IRQ_WORK interrupt while soft-disable */ + u8 nap_state_lost; /* NV GPR values lost in power7_idle */ #ifdef CONFIG_PPC_POWERNV /* Pointer to OPAL machine check event structure set by the diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index ec24b36..8e0db0b 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -208,6 +208,7 @@ int main(void) DEFINE(PACA_USER_TIME, offsetof(struct 
paca_struct, user_time)); DEFINE(PACA_SYSTEM_TIME, offsetof(struct paca_struct, system_time)); DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save)); + DEFINE(PACA_NAPSTATELOST, offsetof(struct paca_struct, nap_state_lost)); #endif /* CONFIG_PPC64 */ /* RTAS */ diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S index 3a70845..fcdff19 100644 --- a/arch/powerpc/kernel/idle_power7.S +++ b/arch/powerpc/kernel/idle_power7.S @@ -54,6 +54,7 @@ _GLOBAL(power7_idle) li r0,0 stb r0,PACASOFTIRQEN(r13) /* we'll hard-enable shortly */ stb r0,PACAHARDIRQEN(r13) + stb r0,PACA_NAPSTATELOST(r13) /* Continue saving state */ SAVE_GPR(2, r1) @@ -86,6 +87,9 @@ _GLOBAL(power7_wakeup_loss) rfid _GLOBAL(power7_wakeup_noloss) + lbz r0,PACA_NAPSTATELOST(r13) + cmpwi r0,0 + bne .power7_wakeup_loss ld r1,PACAR1(r13) ld r4,_MSR(r1) ld r5,_NIP(r1) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 7b8dbf6..b70bf22 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -112,6 +112,9 @@ kvm_start_guest: stbcix r0, r5, r6 /* clear it */ stwcix r8, r5, r7 /* EOI it */ + /* NV GPR values from power7_idle() will no longer be valid */ + stb r0, PACA_NAPSTATELOST(r13) + .global kvmppc_hv_entry kvmppc_hv_entry:
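To make the lifecycle of the new flag concrete, here is a small userspace model of the three touch points the commit message describes: clear on entry to power7_idle, set when KVM borrows the thread, test on the no-loss wakeup path. The struct and function names are invented for illustration; the real code is the assembly in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the per-CPU state involved; not the real paca_struct. */
struct paca_model {
	uint8_t nap_state_lost;	/* non-zero: NV GPR values are not valid */
};

/* power7_idle: the NV GPRs have just been saved, nothing is lost yet. */
static void model_enter_idle(struct paca_model *paca)
{
	paca->nap_state_lost = 0;
}

/* kvm_start_guest: the thread is borrowed by KVM, trashing r14 - r31. */
static void model_used_by_kvm(struct paca_model *paca)
{
	paca->nap_state_lost = 1;
}

/*
 * Wakeup decision. srr1_says_lost models the SRR1 bit field, which only
 * reflects the most recent nap. Returns true when the full-restore path
 * (power7_wakeup_loss) has to be taken.
 */
static bool model_wakeup_needs_restore(const struct paca_model *paca,
				       bool srr1_says_lost)
{
	return srr1_says_lost || paca->nap_state_lost != 0;
}
```

The interesting case is the one where SRR1 reports no loss but the flag is set: without the extra PACA field that wakeup would trust stale register contents.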
[PATCH] powerpc/powernv: Fix problems in onlining CPUs
At present, on the powernv platform, if you off-line a CPU that was
online, and then try to on-line it again, the kernel generates a
warning message "OPAL Error -1 starting CPU n". Furthermore, if the
CPU is a secondary thread that was used by KVM while it was off-line,
the CPU fails to come online.

The first problem is fixed by only calling OPAL to start the CPU the
first time it is on-lined, as indicated by the cpu_start field of its
PACA being zero. The second problem is fixed by restoring the
cpu_start field to 1 instead of 0 when using the CPU within KVM.

Signed-off-by: Paul Mackerras
---
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index e37f8f4..ca9b733 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -65,7 +65,7 @@ BEGIN_FTR_SECTION
 	lbz	r0,PACAPROCSTART(r13)
 	cmpwi	r0,0x80
 	bne	1f
-	li	r0,0
+	li	r0,1
 	stb	r0,PACAPROCSTART(r13)
 	b	kvm_start_guest
 1:
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index e877366..17210c5 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -75,7 +75,7 @@ int __devinit pnv_smp_kick_cpu(int nr)
 	/* On OPAL v2 the CPU are still spinning inside OPAL itself,
 	 * get them back now
 	 */
-	if (firmware_has_feature(FW_FEATURE_OPALv2)) {
+	if (!paca[nr].cpu_start && firmware_has_feature(FW_FEATURE_OPALv2)) {
 		pr_devel("OPAL: Starting CPU %d (HW 0x%x)...\n", nr, pcpu);
 		rc = opal_start_cpu(pcpu, start_here);
 		if (rc != OPAL_SUCCESS)
[PATCH 07/13] KVM: PPC: Allow use of small pages to back Book3S HV guests
This relaxes the requirement that the guest memory be provided as 16MB huge pages, allowing it to be provided as normal memory, i.e. in pages of PAGE_SIZE bytes (4k or 64k). To allow this, we index the kvm->arch.slot_phys[] arrays with a small page index, even if huge pages are being used, and use the low-order 5 bits of each entry to store the order of the enclosing page with respect to normal pages, i.e. log_2(enclosing_page_size / PAGE_SIZE). Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s_64.h |8 ++ arch/powerpc/include/asm/kvm_host.h |3 +- arch/powerpc/include/asm/kvm_ppc.h |2 +- arch/powerpc/include/asm/reg.h |1 + arch/powerpc/kvm/book3s_64_mmu_hv.c | 122 -- arch/powerpc/kvm/book3s_hv.c | 57 -- arch/powerpc/kvm/book3s_hv_rm_mmu.c |6 +- 7 files changed, 130 insertions(+), 69 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index ab6772e..d55e6b4 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -107,4 +107,12 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l) return 0; /* error */ } +static inline bool slot_is_aligned(struct kvm_memory_slot *memslot, + unsigned long pagesize) +{ + unsigned long mask = (pagesize >> PAGE_SHIFT) - 1; + + return !(memslot->base_gfn & mask) && !(memslot->npages & mask); +} + #endif /* __ASM_KVM_BOOK3S_64_H__ */ diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 2a52bdb..ba1da85 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -176,14 +176,13 @@ struct revmap_entry { }; /* Low-order bits in kvm->arch.slot_phys[][] */ +#define KVMPPC_PAGE_ORDER_MASK 0x1f #define KVMPPC_GOT_PAGE0x80 struct kvm_arch { #ifdef CONFIG_KVM_BOOK3S_64_HV unsigned long hpt_virt; struct revmap_entry *revmap; - unsigned long ram_psize; - unsigned long ram_porder; unsigned int lpid; unsigned int host_lpid; 
unsigned long host_lpcr; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 111e1b4..a61b5b5 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -122,7 +122,7 @@ extern void kvmppc_free_hpt(struct kvm *kvm); extern long kvmppc_prepare_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem); extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu, - struct kvm_memory_slot *memslot); + struct kvm_memory_slot *memslot, unsigned long porder); extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu); extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, struct kvm_create_spapr_tce *args); diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 559da19..4599d12 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -237,6 +237,7 @@ #define LPCR_ISL (1ul << (63-2)) #define LPCR_VC_SH (63-2) #define LPCR_DPFD_SH (63-11) +#define LPCR_VRMASD (0x1ful << (63-16)) #define LPCR_VRMA_L (1ul << (63-12)) #define LPCR_VRMA_LP0(1ul << (63-15)) #define LPCR_VRMA_LP1(1ul << (63-16)) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 87016cc..cc18f3d 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -34,8 +34,6 @@ #include #include -/* Pages in the VRMA are 16MB pages */ -#define VRMA_PAGE_ORDER24 #define VRMA_VSID 0x1ffUL /* 1TB VSID reserved for VRMA */ /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */ @@ -95,17 +93,31 @@ void kvmppc_free_hpt(struct kvm *kvm) free_pages(kvm->arch.hpt_virt, HPT_ORDER - PAGE_SHIFT); } -void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot) +/* Bits in first HPTE dword for pagesize 4k, 64k or 16M */ +static inline unsigned long hpte0_pgsize_encoding(unsigned long pgsize) +{ + return (pgsize > 0x1000) ? 
HPTE_V_LARGE : 0; +} + +/* Bits in second HPTE dword for pagesize 4k, 64k or 16M */ +static inline unsigned long hpte1_pgsize_encoding(unsigned long pgsize) +{ + return (pgsize == 0x1) ? 0x1000 : 0; +} + +void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot, +unsigned long porder) { - struct kvm *kvm = vcpu->kvm; unsigned long i; unsigned long npages; unsigned long hp_v, hp_r; unsigned long addr, hash; - unsigned long porder = kvm->arch.ram_porder; + unsigned long psize; + unsigned long hp0, hp1;
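The slot_phys[] encoding described in the commit message can be sketched in userspace as follows. The sketch assumes 4k base pages (a page shift of 12) and keeps log_2(enclosing_page_size / PAGE_SIZE) in the low 5 bits of each entry. The sk_* names are made up here and are not functions from the patch, and the other low-order flag bits (such as KVMPPC_GOT_PAGE) are left out.

```c
#include <assert.h>

#define SK_PAGE_SHIFT	12		/* assumption: 4k base pages */
#define SK_PAGE_SIZE	(1ul << SK_PAGE_SHIFT)
#define SK_ORDER_MASK	0x1ful		/* mirrors KVMPPC_PAGE_ORDER_MASK */

/* Pack a page-aligned physical address together with the order of the
 * enclosing page relative to SK_PAGE_SIZE. */
static unsigned long sk_pack(unsigned long phys, unsigned int order)
{
	assert((phys & (SK_PAGE_SIZE - 1)) == 0);
	assert(order <= SK_ORDER_MASK);
	return phys | order;
}

/* Recover the order stored in the low-order bits of an entry. */
static unsigned int sk_order(unsigned long entry)
{
	return entry & SK_ORDER_MASK;
}

/* Size in bytes of the enclosing (possibly huge) page. */
static unsigned long sk_enclosing_size(unsigned long entry)
{
	return SK_PAGE_SIZE << sk_order(entry);
}
```

Under these assumptions a 16MB huge page has order 24 - 12 = 12, so every small-page entry that falls inside it carries 12 in its low bits, while an entry backed by an ordinary page carries 0.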
[PATCH 12/13] KVM: PPC: Implement MMU notifiers for Book3S HV guests
This adds the infrastructure to enable us to page out pages underneath a Book3S HV guest, on processors that support virtualized partition memory, that is, POWER7. Instead of pinning all the guest's pages, we now look in the host userspace Linux page tables to find the mapping for a given guest page. Then, if the userspace Linux PTE gets invalidated, kvm_unmap_hva() gets called for that address, and we replace all the guest HPTEs that refer to that page with absent HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit set, which will cause an HDSI when the guest tries to access them. Finally, the page fault handler is extended to reinstantiate the guest HPTE when the guest tries to access a page which has been paged out. Since we can't intercept the guest DSI and ISI interrupts on PPC970, we still have to pin all the guest pages on PPC970. We have a new flag, kvm->arch.using_mmu_notifiers, that indicates whether we can page guest pages out. If it is not set, the MMU notifier callbacks do nothing and everything operates as before. 
Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s.h|4 + arch/powerpc/include/asm/kvm_book3s_64.h | 31 arch/powerpc/include/asm/kvm_host.h | 16 ++ arch/powerpc/include/asm/reg.h |3 + arch/powerpc/kvm/Kconfig |1 + arch/powerpc/kvm/book3s_64_mmu_hv.c | 268 -- arch/powerpc/kvm/book3s_hv.c | 25 ++-- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 140 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 49 ++ arch/powerpc/kvm/powerpc.c |3 + arch/powerpc/mm/hugetlbpage.c|2 + 11 files changed, 483 insertions(+), 59 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 5ac53f9..72688d8 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -145,6 +145,10 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat, extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr); extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu); extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn); +extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev, + unsigned long *rmap, long pte_index, int realmode); +extern void kvmppc_invalidate_hpte(struct kvm *kvm, unsigned long *hptep, + unsigned long pte_index); extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr, unsigned long *nb_ret); extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr); diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 9a59b6d..75a1b42 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -130,6 +130,37 @@ static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type) return (wimg & (HPTE_R_W | HPTE_R_I)) == io_type; } +/* + * Lock and read a linux PTE. If it's present and writable, atomically + * set dirty and referenced bits and return the PTE, otherwise return 0. 
+ */ +static inline pte_t kvmppc_read_update_linux_pte(pte_t *p) +{ + pte_t pte, tmp; + + /* wait until _PAGE_BUSY is clear then set it atomically */ + __asm__ __volatile__ ( + "1: ldarx %0,0,%3\n" + " andi. %1,%0,%4\n" + " bne-1b\n" + " ori %1,%0,%4\n" + " stdcx. %1,0,%3\n" + " bne-1b" + : "=&r" (pte), "=&r" (tmp), "=m" (*p) + : "r" (p), "i" (_PAGE_BUSY) + : "cc"); + + if (pte_present(pte)) { + pte = pte_mkyoung(pte); + if (pte_write(pte)) + pte = pte_mkdirty(pte); + } + + *p = pte; /* clears _PAGE_BUSY */ + + return pte; +} + /* Return HPTE cache control bits corresponding to Linux pte bits */ static inline unsigned long hpte_cache_bits(unsigned long pte_val) { diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index c9c92f0..eb20ddc 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -32,6 +32,7 @@ #include #include #include +#include #define KVM_MAX_VCPUS NR_CPUS #define KVM_MAX_VCORES NR_CPUS @@ -43,6 +44,19 @@ #define KVM_COALESCED_MMIO_PAGE_OFFSET 1 #endif +#ifdef CONFIG_KVM_BOOK3S_64_HV +#include + +#define KVM_ARCH_WANT_MMU_NOTIFIER + +struct kvm; +extern int kvm_unmap_hva(struct kvm *kvm, unsigned long hva); +extern int kvm_age_hva(struct kvm *kvm, unsigned long hva); +extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva); +extern void kvm_set_spte_hva(struct kv
[PATCH 01/13] KVM: PPC: Move kvm_vcpu_ioctl_[gs]et_one_reg down to platform-specific code
This moves the get/set_one_reg implementation down from powerpc.c into booke.c, book3s_pr.c and book3s_hv.c. This avoids #ifdefs in C code, but more importantly, it fixes a bug on Book3s HV where we were accessing beyond the end of the kvm_vcpu struct (via the to_book3s() macro) and corrupting memory, causing random crashes and file corruption. On Book3s HV we only accept setting the HIOR to zero, since the guest runs in supervisor mode and its vectors are never offset from zero. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_ppc.h |3 ++ arch/powerpc/kvm/book3s_hv.c | 33 ++ arch/powerpc/kvm/book3s_pr.c | 33 ++ arch/powerpc/kvm/booke.c | 10 + arch/powerpc/kvm/powerpc.c | 39 5 files changed, 79 insertions(+), 39 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 5192c2e..fc2d696 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -176,6 +176,9 @@ int kvmppc_core_set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs); void kvmppc_get_sregs_ivor(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs); int kvmppc_set_sregs_ivor(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs); +int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg); +int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg); + void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 pid); #ifdef CONFIG_KVM_BOOK3S_64_HV diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index ecc77fa..5efdd5b 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -390,6 +390,39 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, return 0; } +int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) +{ + int r = -EINVAL; + + switch (reg->id) { + case KVM_ONE_REG_PPC_HIOR: + reg->u.reg64 = 0; + r = 0; + break; + default: + break; + } + + return r; +} + +int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct 
kvm_one_reg *reg) +{ + int r = -EINVAL; + + switch (reg->id) { + case KVM_ONE_REG_PPC_HIOR: + /* Only allow this to be set to zero */ + if (reg->u.reg64 == 0) + r = 0; + break; + default: + break; + } + + return r; +} + int kvmppc_core_check_processor_compat(void) { if (cpu_has_feature(CPU_FTR_HVMODE)) diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index cbb7051..1abe35c 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -837,6 +837,39 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, return 0; } +int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) +{ + int r = -EINVAL; + + switch (reg->id) { + case KVM_ONE_REG_PPC_HIOR: + reg->u.reg64 = to_book3s(vcpu)->hior; + r = 0; + break; + default: + break; + } + + return r; +} + +int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) +{ + int r = -EINVAL; + + switch (reg->id) { + case KVM_ONE_REG_PPC_HIOR: + to_book3s(vcpu)->hior = reg->u.reg64; + to_book3s(vcpu)->hior_explicit = true; + r = 0; + break; + default: + break; + } + + return r; +} + int kvmppc_core_check_processor_compat(void) { return 0; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 9e41f45..ee9e1ee 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -887,6 +887,16 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, return kvmppc_core_set_sregs(vcpu, sregs); } +int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) +{ + return -EINVAL; +} + +int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) +{ + return -EINVAL; +} + int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) { return -ENOTSUPP; diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 34515e8..1239c6f 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -620,45 +620,6 @@ static int kvm_vcpu_ioctl_enable_cap(struct 
kvm_vcpu *vcpu, return r; } -static int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, - struct kvm_one_reg *reg) -{ - int r = -EINVAL; - - switch (reg->id) { -#ifdef CONFIG_PPC_BOOK3S - case KVM_ONE_REG_PPC_HIOR: - reg->u.reg64 = to_book3s(vcpu)->hior; - r = 0; -
[PATCH 13/13] KVM: PPC: Allow for read-only pages backing a Book3S HV guest
With this, if a guest does an H_ENTER with a read/write HPTE on a page which is currently read-only, we make the actual HPTE inserted be a read-only version of the HPTE. We now intercept protection faults as well as HPTE not found faults, and for a protection fault we work out whether it should be reflected to the guest (e.g. because the guest HPTE didn't allow write access to usermode) or handled by switching to kernel context and calling kvmppc_book3s_hv_page_fault, which will then request write access to the page and update the actual HPTE. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s_64.h | 20 - arch/powerpc/kvm/book3s_64_mmu_hv.c | 33 +++-- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 32 - arch/powerpc/kvm/book3s_hv_rmhandlers.S |4 +- 4 files changed, 72 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 75a1b42..37755d0 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -115,6 +115,22 @@ static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize) return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT; } +static inline int hpte_is_writable(unsigned long ptel) +{ + unsigned long pp = ptel & (HPTE_R_PP0 | HPTE_R_PP); + + return pp != PP_RXRX && pp != PP_RXXX; +} + +static inline unsigned long hpte_make_readonly(unsigned long ptel) +{ + if ((ptel & HPTE_R_PP0) || (ptel & HPTE_R_PP) == PP_RWXX) + ptel = (ptel & ~HPTE_R_PP) | PP_RXXX; + else + ptel |= PP_RXRX; + return ptel; +} + static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type) { unsigned int wimg = ptel & HPTE_R_WIMG; @@ -134,7 +150,7 @@ static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type) * Lock and read a linux PTE. If it's present and writable, atomically * set dirty and referenced bits and return the PTE, otherwise return 0. 
*/ -static inline pte_t kvmppc_read_update_linux_pte(pte_t *p) +static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing) { pte_t pte, tmp; @@ -152,7 +168,7 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *p) if (pte_present(pte)) { pte = pte_mkyoung(pte); - if (pte_write(pte)) + if (writing && pte_write(pte)) pte = pte_mkdirty(pte); } diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 6919d99..b1b31c7 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -502,6 +502,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, struct page *page, *pages[1]; long index, ret, npages; unsigned long is_io; + unsigned int writing, write_ok; struct vm_area_struct *vma; /* @@ -552,8 +553,11 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, pfn = 0; page = NULL; pte_size = PAGE_SIZE; + writing = (dsisr & DSISR_ISSTORE) != 0; + /* If writing != 0, then the HPTE must allow writing, if we get here */ + write_ok = writing; hva = gfn_to_hva_memslot(memslot, gfn); - npages = get_user_pages_fast(hva, 1, 1, pages); + npages = get_user_pages_fast(hva, 1, writing, pages); if (npages < 1) { /* Check if it's an I/O mapping */ down_read(&current->mm->mmap_sem); @@ -564,6 +568,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, ((hva - vma->vm_start) >> PAGE_SHIFT); pte_size = psize; is_io = hpte_cache_bits(pgprot_val(vma->vm_page_prot)); + write_ok = vma->vm_flags & VM_WRITE; } up_read(&current->mm->mmap_sem); if (!pfn) @@ -574,6 +579,18 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, page = compound_head(page); pte_size <<= compound_order(page); } + /* if the guest wants write access, see if that is OK */ + if (!writing && hpte_is_writable(hpte[2])) { + pte_t *ptep, pte; + + ptep = find_linux_pte_or_hugepte(current->mm->pgd, +hva, NULL); + if (ptep && pte_present(*ptep)) { + pte = 
kvmppc_read_update_linux_pte(ptep, 1); + if (pte_write(pte)) + write_ok = 1; + } + } pfn = page_to_pfn(pa
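The effect of hpte_make_readonly() on each PP encoding is easy to check in isolation. The sketch below copies the two helpers from the patch into a standalone file. The HPTE_R_PP0, HPTE_R_PP and PP_* values are not shown anywhere in this patch; they are assumed values, believed to match the powerpc MMU headers of the time, and should be checked against the tree before relying on them.

```c
#include <stdint.h>

/* Assumed values, not taken from this patch. */
#define HPTE_R_PP0	0x8000000000000000ull
#define HPTE_R_PP	0x0000000000000003ull
#define PP_RWXX		0ull			/* supervisor rw, user none */
#define PP_RWRX		1ull			/* supervisor rw, user read */
#define PP_RWRW		2ull			/* supervisor rw, user rw */
#define PP_RXRX		3ull			/* supervisor read, user read */
#define PP_RXXX		(HPTE_R_PP0 | 2)	/* supervisor read, user none */

/* Same logic as the helper added by the patch. */
static int hpte_is_writable(uint64_t ptel)
{
	uint64_t pp = ptel & (HPTE_R_PP0 | HPTE_R_PP);

	return pp != PP_RXRX && pp != PP_RXXX;
}

/* Same logic as the helper added by the patch: keep user access where
 * the original entry granted it, but drop write permission. */
static uint64_t hpte_make_readonly(uint64_t ptel)
{
	if ((ptel & HPTE_R_PP0) || (ptel & HPTE_R_PP) == PP_RWXX)
		ptel = (ptel & ~HPTE_R_PP) | PP_RXXX;
	else
		ptel |= PP_RXRX;
	return ptel;
}
```

With these values, a supervisor-only read/write entry (PP_RWXX) becomes PP_RXXX, while the two encodings that give user access (PP_RWRX, PP_RWRW) both become PP_RXRX. Bits outside the PP field are untouched.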
[PATCH 08/13] KVM: PPC: Allow I/O mappings in memory slots
This provides for the case where userspace maps an I/O device into the
address range of a memory slot using a VM_PFNMAP mapping.  In that
case, we work out the pfn from vma->vm_pgoff, and record the cache
enable bits from vma->vm_page_prot in two low-order bits in the
slot_phys array entries.  Then, in kvmppc_h_enter() we check that the
cache bits in the HPTE that the guest wants to insert match the cache
bits in the slot_phys array entry.

Signed-off-by: Paul Mackerras
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   26 +++
 arch/powerpc/include/asm/kvm_host.h      |    2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |   67 --
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |    5 +-
 4 files changed, 76 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index d55e6b4..a98e0f6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -107,6 +107,32 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
 	return 0;				/* error */
 }
 
+static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type)
+{
+	unsigned int wimg = ptel & HPTE_R_WIMG;
+
+	/* Handle SAO */
+	if (wimg == (HPTE_R_W | HPTE_R_I | HPTE_R_M) &&
+	    cpu_has_feature(CPU_FTR_ARCH_206))
+		wimg = HPTE_R_M;
+
+	if (!io_type)
+		return wimg == HPTE_R_M;
+
+	return (wimg & (HPTE_R_W | HPTE_R_I)) == io_type;
+}
+
+/* Return HPTE cache control bits corresponding to Linux pte bits */
+static inline unsigned long hpte_cache_bits(unsigned long pte_val)
+{
+#if _PAGE_NO_CACHE == HPTE_R_I && _PAGE_WRITETHRU == HPTE_R_W
+	return pte_val & (HPTE_R_W | HPTE_R_I);
+#else
+	return ((pte_val & _PAGE_NO_CACHE) ? HPTE_R_I : 0) +
+		((pte_val & _PAGE_WRITETHRU) ? HPTE_R_W : 0);
+#endif
+}
+
 static inline bool slot_is_aligned(struct kvm_memory_slot *memslot,
 				   unsigned long pagesize)
 {
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index ba1da85..9b1c247 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -177,6 +177,8 @@ struct revmap_entry {
 
 /* Low-order bits in kvm->arch.slot_phys[][] */
 #define KVMPPC_PAGE_ORDER_MASK	0x1f
+#define KVMPPC_PAGE_NO_CACHE	HPTE_R_I	/* 0x20 */
+#define KVMPPC_PAGE_WRITETHRU	HPTE_R_W	/* 0x40 */
 #define KVMPPC_GOT_PAGE		0x80
 
 struct kvm_arch {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index cc18f3d..b904c40 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -199,7 +199,8 @@ static long kvmppc_get_guest_page(struct kvm *kvm, unsigned long gfn,
 	struct page *page, *hpage, *pages[1];
 	unsigned long s, pgsize;
 	unsigned long *physp;
-	unsigned int got, pgorder;
+	unsigned int is_io, got, pgorder;
+	struct vm_area_struct *vma;
 	unsigned long pfn, i, npages;
 
 	physp = kvm->arch.slot_phys[memslot->id];
@@ -208,34 +209,51 @@ static long kvmppc_get_guest_page(struct kvm *kvm, unsigned long gfn,
 	if (physp[gfn - memslot->base_gfn])
 		return 0;
 
+	is_io = 0;
+	got = 0;
 	page = NULL;
 	pgsize = psize;
+	err = -EINVAL;
 	start = gfn_to_hva_memslot(memslot, gfn);
 
 	/* Instantiate and get the page we want access to */
 	np = get_user_pages_fast(start, 1, 1, pages);
-	if (np != 1)
-		return -EINVAL;
-	page = pages[0];
-	got = KVMPPC_GOT_PAGE;
+	if (np != 1) {
+		/* Look up the vma for the page */
+		down_read(&current->mm->mmap_sem);
+		vma = find_vma(current->mm, start);
+		if (!vma || vma->vm_start > start ||
+		    start + psize > vma->vm_end ||
+		    !(vma->vm_flags & VM_PFNMAP))
+			goto up_err;
+		is_io = hpte_cache_bits(pgprot_val(vma->vm_page_prot));
+		pfn = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
+		/* check alignment of pfn vs. requested page size */
+		if (psize > PAGE_SIZE && (pfn & ((psize >> PAGE_SHIFT) - 1)))
+			goto up_err;
+		up_read(&current->mm->mmap_sem);
 
-	/* See if this is a large page */
-	s = PAGE_SIZE;
-	if (PageHuge(page)) {
-		hpage = compound_head(page);
-		s <<= compound_order(hpage);
-		/* Get the whole large page if slot alignment is ok */
-		if (s > psize && slot_is_aligned(memslot, s) &&
-		    !(memslot->userspace_addr & (s - 1))) {
-			start &= ~(s - 1);
-			pgsize = s;
-
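
For readers who want to poke at the cache-bit check outside the kernel, here is a minimal standalone sketch that mirrors the logic of hpte_cache_flags_ok(). The values for I (0x20) and W (0x40) come from the comments in the patch; the values used for M and for the WIMG mask, and the has_arch_206 parameter standing in for cpu_has_feature(CPU_FTR_ARCH_206), are assumptions made for this example only.

```c
#include <stdbool.h>

#define HPTE_R_W    0x40UL	/* from the patch comments */
#define HPTE_R_I    0x20UL	/* from the patch comments */
#define HPTE_R_M    0x10UL	/* assumption for this sketch */
#define HPTE_R_WIMG 0x78UL	/* assumption for this sketch */

/*
 * io_type is 0 for ordinary RAM, or the W/I bits recorded for an
 * I/O mapping.  Returns true if the guest's HPTE cache bits agree.
 */
static bool cache_flags_ok(unsigned long ptel, unsigned long io_type,
			   bool has_arch_206)
{
	unsigned long wimg = ptel & HPTE_R_WIMG;

	/* W+I+M means SAO on 2.06 CPUs; treat it as plain cacheable */
	if (wimg == (HPTE_R_W | HPTE_R_I | HPTE_R_M) && has_arch_206)
		wimg = HPTE_R_M;

	/* Ordinary memory must be mapped cacheable and coherent */
	if (!io_type)
		return wimg == HPTE_R_M;

	/* I/O: the guest's W and I bits must match what the host recorded */
	return (wimg & (HPTE_R_W | HPTE_R_I)) == io_type;
}
```

The point of the io_type == 0 branch is that a guest is not allowed to map normal RAM cache-inhibited, and conversely a page recorded as cache-inhibited I/O cannot be mapped cacheable.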
[PATCH 11/13] KVM: Add barriers to allow mmu_notifier_retry to be used locklessly
This adds an smp_wmb in kvm_mmu_notifier_invalidate_range_end() and an
smp_rmb in mmu_notifier_retry() so that mmu_notifier_retry() will give
the correct answer when called without kvm->mmu_lock being held.
PowerPC Book3S HV KVM wants to use a bitlock per guest page rather than
a single global spinlock in order to improve the scalability of updates
to the guest MMU hashed page table, and so needs this.

Signed-off-by: Paul Mackerras
---
 include/linux/kvm_host.h |   14 +++++++++-----
 virt/kvm/kvm_main.c      |    6 +++---
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c6a2ec9..17319ad 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -672,12 +672,16 @@ static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_se
 	if (unlikely(vcpu->kvm->mmu_notifier_count))
 		return 1;
 	/*
-	 * Both reads happen under the mmu_lock and both values are
-	 * modified under mmu_lock, so there's no need of smb_rmb()
-	 * here in between, otherwise mmu_notifier_count should be
-	 * read before mmu_notifier_seq, see
-	 * mmu_notifier_invalidate_range_end write side.
+	 * Ensure the read of mmu_notifier_count happens before the read
+	 * of mmu_notifier_seq.  This interacts with the smp_wmb() in
+	 * mmu_notifier_invalidate_range_end to make sure that the caller
+	 * either sees the old (non-zero) value of mmu_notifier_count or
+	 * the new (incremented) value of mmu_notifier_seq.
+	 * PowerPC Book3s HV KVM calls this under a per-page lock
+	 * rather than under kvm->mmu_lock, for scalability, so
+	 * can't rely on kvm->mmu_lock to keep things ordered.
 	 */
+	smp_rmb();
 	if (vcpu->kvm->mmu_notifier_seq != mmu_seq)
 		return 1;
 	return 0;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d9cfb78..ad2a912 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -357,11 +357,11 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
 	 * been freed.
 	 */
 	kvm->mmu_notifier_seq++;
+	smp_wmb();
 	/*
 	 * The above sequence increase must be visible before the
-	 * below count decrease but both values are read by the kvm
-	 * page fault under mmu_lock spinlock so we don't need to add
-	 * a smb_wmb() here in between the two.
+	 * below count decrease, which is ensured by the smp_wmb above
+	 * in conjunction with the smp_rmb in mmu_notifier_retry().
 	 */
 	kvm->mmu_notifier_count--;
 	spin_unlock(&kvm->mmu_lock);
-- 
1.7.5.4
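
The pairing of the two barriers can be tried out in userspace with C11 atomics. The sketch below is only an analogy, not kernel code: the struct and function names are made up for the example, a release fence stands in for smp_wmb() and an acquire fence for smp_rmb(), and the writer side is assumed to be serialized by some lock of its own (kvm->mmu_lock in the real code).

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the two counters kept in struct kvm */
struct notifier_state {
	atomic_ulong seq;	/* bumped once per completed invalidation */
	atomic_long count;	/* non-zero while an invalidation is running */
};

static void invalidate_range_start(struct notifier_state *s)
{
	atomic_fetch_add_explicit(&s->count, 1, memory_order_relaxed);
}

static void invalidate_range_end(struct notifier_state *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
	/* plays the role of smp_wmb(): seq update ordered before count drop */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_sub_explicit(&s->count, 1, memory_order_relaxed);
}

/* Returns 1 if the caller must retry, 0 if its snapshot is still good */
static int notifier_retry(struct notifier_state *s, unsigned long seq_snapshot)
{
	if (atomic_load_explicit(&s->count, memory_order_relaxed))
		return 1;
	/* plays the role of smp_rmb(): count read ordered before seq read */
	atomic_thread_fence(memory_order_acquire);
	if (atomic_load_explicit(&s->seq, memory_order_relaxed) != seq_snapshot)
		return 1;
	return 0;
}
```

With this ordering a lockless reader that sees count == 0 after an invalidation has finished is also guaranteed to see the incremented seq, which is the property the new comment in mmu_notifier_retry() describes.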
[PATCH 04/13] KVM: PPC: Add an interface for pinning guest pages in Book3s HV guests
This adds two new functions, kvmppc_pin_guest_page() and kvmppc_unpin_guest_page(), and uses them to pin the guest pages where the guest has registered areas of memory for the hypervisor to update, (i.e. the per-cpu virtual processor areas, SLB shadow buffers and dispatch trace logs) and then unpin them when they are no longer required. Although it is not strictly necessary to pin the pages at this point, since all guest pages are already pinned, later commits in this series will mean that guest pages aren't all pinned. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s.h |3 + arch/powerpc/kvm/book3s_64_mmu_hv.c | 38 ++ arch/powerpc/kvm/book3s_hv.c | 67 ++--- 3 files changed, 78 insertions(+), 30 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index deb8a4e..16db48c 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -140,6 +140,9 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat, extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr); extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu); extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn); +extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr, + unsigned long *nb_ret); +extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr); extern void kvmppc_entry_trampoline(void); extern void kvmppc_hv_entry_trampoline(void); diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index e4c6069..dcd39dc 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -184,6 +184,44 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, return -ENOENT; } +void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long gpa, + unsigned long *nb_ret) +{ + struct kvm_memory_slot *memslot; + unsigned long gfn = gpa >> PAGE_SHIFT; + 
struct page *page; + unsigned long offset; + unsigned long pfn, pa; + unsigned long *physp; + + memslot = gfn_to_memslot(kvm, gfn); + if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) + return NULL; + physp = kvm->arch.slot_phys[memslot->id]; + if (!physp) + return NULL; + physp += (gfn - memslot->base_gfn) >> + (kvm->arch.ram_porder - PAGE_SHIFT); + pa = *physp; + if (!pa) + return NULL; + pfn = pa >> PAGE_SHIFT; + page = pfn_to_page(pfn); + get_page(page); + offset = gpa & (kvm->arch.ram_psize - 1); + if (nb_ret) + *nb_ret = kvm->arch.ram_psize - offset; + return page_address(page) + offset; +} + +void kvmppc_unpin_guest_page(struct kvm *kvm, void *va) +{ + struct page *page = virt_to_page(va); + + page = compound_head(page); + put_page(page); +} + void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu) { struct kvmppc_mmu *mmu = &vcpu->arch.mmu; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index c2ee5a7..6e94af8 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -137,12 +137,10 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu, unsigned long vcpuid, unsigned long vpa) { struct kvm *kvm = vcpu->kvm; - unsigned long gfn, pg_index, ra, len; - unsigned long pg_offset; + unsigned long len, nb; void *va; struct kvm_vcpu *tvcpu; - struct kvm_memory_slot *memslot; - unsigned long *physp; + int err = H_PARAMETER; tvcpu = kvmppc_find_vcpu(kvm, vcpuid); if (!tvcpu) @@ -155,51 +153,41 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu, if (flags < 4) { if (vpa & 0x7f) return H_PARAMETER; + if (flags >= 2 && !tvcpu->arch.vpa) + return H_RESOURCE; /* registering new area; convert logical addr to real */ - gfn = vpa >> PAGE_SHIFT; - memslot = gfn_to_memslot(kvm, gfn); - if (!memslot || !(memslot->flags & KVM_MEMSLOT_INVALID)) - return H_PARAMETER; - physp = kvm->arch.slot_phys[memslot->id]; - if (!physp) - return H_PARAMETER; - pg_index = (gfn - memslot->base_gfn) >> - (kvm->arch.ram_porder 
- PAGE_SHIFT); - pg_offset = vpa & (kvm->arch.ram_psize - 1); - ra = physp[pg_index]; - if (!ra) + va = kvmppc_pin_guest_page(kvm, vpa, &nb); + if (va == NULL) return H_PARAMETER; -
[PATCH 09/13] KVM: PPC: Maintain a doubly-linked list of guest HPTEs for each gfn
This expands the reverse mapping array to contain two links for each
HPTE which are used to link together HPTEs that correspond to the
same guest logical page.  Each circular list of HPTEs is pointed to
by the rmap array entry for the guest logical page, pointed to by the
relevant memslot.  Links are 32-bit HPT entry indexes rather than
full 64-bit pointers, to save space.  We use 3 of the remaining 32
bits in the rmap array entries as a lock bit, a referenced bit and
a present bit (the present bit is needed since HPTE index 0 is valid).
The bit lock for the rmap chain nests inside the HPTE lock bit.

Signed-off-by: Paul Mackerras
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   18 ++
 arch/powerpc/include/asm/kvm_host.h      |   17 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |   84 +-
 3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index a98e0f6..90e6658 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -107,6 +107,11 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
 	return 0;				/* error */
 }
 
+static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
+{
+	return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
+}
+
 static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type)
 {
 	unsigned int wimg = ptel & HPTE_R_WIMG;
@@ -133,6 +138,19 @@ static inline unsigned long hpte_cache_bits(unsigned long pte_val)
 #endif
 }
 
+static inline void lock_rmap(unsigned long *rmap)
+{
+	do {
+		while (test_bit(KVMPPC_RMAP_LOCK_BIT, rmap))
+			cpu_relax();
+	} while (test_and_set_bit_lock(KVMPPC_RMAP_LOCK_BIT, rmap));
+}
+
+static inline void unlock_rmap(unsigned long *rmap)
+{
+	__clear_bit_unlock(KVMPPC_RMAP_LOCK_BIT, rmap);
+}
+
 static inline bool slot_is_aligned(struct kvm_memory_slot *memslot,
 				   unsigned long pagesize)
 {
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 9b1c247..e369d49 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -169,12 +169,27 @@ struct kvmppc_rma_info {
 /*
  * The reverse mapping array has one entry for each HPTE,
  * which stores the guest's view of the second word of the HPTE
- * (including the guest physical address of the mapping).
+ * (including the guest physical address of the mapping),
+ * plus forward and backward pointers in a doubly-linked ring
+ * of HPTEs that map the same host page.  The pointers in this
+ * ring are 32-bit HPTE indexes, to save space.
  */
 struct revmap_entry {
 	unsigned long guest_rpte;
+	unsigned int forw, back;
 };
 
+/*
+ * We use the top bit of each memslot->rmap entry as a lock bit,
+ * and bit 32 as a present flag.  The bottom 32 bits are the
+ * index in the guest HPT of a HPTE that points to the page.
+ */
+#define KVMPPC_RMAP_LOCK_BIT	63
+#define KVMPPC_RMAP_REF_BIT	33
+#define KVMPPC_RMAP_REFERENCED	(1ul << KVMPPC_RMAP_REF_BIT)
+#define KVMPPC_RMAP_PRESENT	0x100000000ul
+#define KVMPPC_RMAP_INDEX	0xfffffffful
+
 /* Low-order bits in kvm->arch.slot_phys[][] */
 #define KVMPPC_PAGE_ORDER_MASK	0x1f
 #define KVMPPC_PAGE_NO_CACHE	HPTE_R_I	/* 0x20 */
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 88d2add..b600f8c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -57,6 +57,70 @@ static void *real_vmalloc_addr(void *x)
 	return __va(addr);
 }
 
+/*
+ * Add this HPTE into the chain for the real page.
+ * Must be called with the chain locked; it unlocks the chain.
+ */
+static void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
+			     unsigned long *rmap, long pte_index, int realmode)
+{
+	struct revmap_entry *head, *tail;
+	unsigned long i;
+
+	if (*rmap & KVMPPC_RMAP_PRESENT) {
+		i = *rmap & KVMPPC_RMAP_INDEX;
+		head = &kvm->arch.revmap[i];
+		if (realmode)
+			head = real_vmalloc_addr(head);
+		tail = &kvm->arch.revmap[head->back];
+		if (realmode)
+			tail = real_vmalloc_addr(tail);
+		rev->forw = i;
+		rev->back = head->back;
+		tail->forw = pte_index;
+		head->back = pte_index;
+	} else {
+		rev->forw = rev->back = pte_index;
+		i = pte_index;
+	}
+	smp_wmb();
+	*rmap = i | KVMPPC_RMAP_REFERENCED | KVMPPC_RMAP_PRESENT; /* unlock */
+}
+
+/* Remove this HPTE from the chain for a real page */
+static void remove_revmap_chain(struct kvm *kvm, long pte_index,
+
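
The index-linked ring is easy to experiment with in isolation. The following sketch keeps only the linking logic: locking, the referenced bit and the real-mode address translation are left out, and all names are invented for the example. The head word layout (bit 32 as the present flag, low 32 bits as the index) is taken from the comment in the patch. The removal routine is cut off in the diff above, so ring_remove() here is a guess at a reasonable counterpart, not the patch's code.

```c
#include <stdint.h>

#define RMAP_PRESENT	(1ull << 32)	/* bit 32: head index is valid */
#define RMAP_INDEX	0xffffffffull	/* low 32 bits: index of ring head */

/* Links are 32-bit indexes into the entry array, not pointers */
struct rev_entry {
	uint32_t forw, back;
};

/* Insert entry idx at the tail of the ring whose head word is *rmap */
static void ring_add(struct rev_entry *rev, uint64_t *rmap, uint32_t idx)
{
	if (*rmap & RMAP_PRESENT) {
		uint32_t head = (uint32_t)(*rmap & RMAP_INDEX);
		uint32_t tail = rev[head].back;

		rev[idx].forw = head;
		rev[idx].back = tail;
		rev[tail].forw = idx;
		rev[head].back = idx;
	} else {
		/* first entry: a ring of one, pointing at itself */
		rev[idx].forw = rev[idx].back = idx;
		*rmap = idx | RMAP_PRESENT;
	}
}

/* Unlink entry idx; hypothetical counterpart to ring_add() */
static void ring_remove(struct rev_entry *rev, uint64_t *rmap, uint32_t idx)
{
	uint32_t next = rev[idx].forw;
	uint32_t prev = rev[idx].back;

	if (next == idx) {
		/* last entry: the ring becomes empty */
		*rmap &= ~(RMAP_PRESENT | RMAP_INDEX);
		return;
	}
	rev[prev].forw = next;
	rev[next].back = prev;
	if ((uint32_t)(*rmap & RMAP_INDEX) == idx)
		*rmap = (*rmap & ~RMAP_INDEX) | next;
}
```

The separate present bit matters because index 0 is a legitimate entry: a head word of 0 must mean "empty ring", while RMAP_PRESENT | 0 means "ring headed by entry 0".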
[PATCH 03/13] KVM: PPC: Keep page physical addresses in per-slot arrays
This allocates an array for each memory slot that is added to store the physical addresses of the pages in the slot. This array is vmalloc'd and accessed in kvmppc_h_enter using real_vmalloc_addr(). This allows us to remove the ram_pginfo field from the kvm_arch struct, and removes the 64GB guest RAM limit that we had. We use the low-order bits of the array entries to store a flag indicating that we have done get_page on the corresponding page, and therefore need to call put_page when we are finished with the page. Currently this is set for all pages except those in our special RMO regions. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_host.h |8 ++- arch/powerpc/kvm/book3s_64_mmu_hv.c | 18 +++--- arch/powerpc/kvm/book3s_hv.c| 114 +-- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 44 - 4 files changed, 109 insertions(+), 75 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 629df2e..cf6b4d7 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -175,25 +175,27 @@ struct revmap_entry { unsigned long guest_rpte; }; +/* Low-order bits in kvm->arch.slot_phys[][] */ +#define KVMPPC_GOT_PAGE0x80 + struct kvm_arch { #ifdef CONFIG_KVM_BOOK3S_64_HV unsigned long hpt_virt; struct revmap_entry *revmap; - unsigned long ram_npages; unsigned long ram_psize; unsigned long ram_porder; - struct kvmppc_pginfo *ram_pginfo; unsigned int lpid; unsigned int host_lpid; unsigned long host_lpcr; unsigned long sdr1; unsigned long host_sdr1; int tlbie_lock; - int n_rma_pages; unsigned long lpcr; unsigned long rmor; struct kvmppc_rma_info *rma; struct list_head spapr_tce_tables; + unsigned long *slot_phys[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS]; + int slot_npages[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS]; unsigned short last_vcpu[NR_CPUS]; struct kvmppc_vcore *vcores[KVM_MAX_VCORES]; #endif /* CONFIG_KVM_BOOK3S_64_HV */ diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 80ece8d..e4c6069 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -98,16 +98,16 @@ void kvmppc_free_hpt(struct kvm *kvm) void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem) { unsigned long i; - unsigned long npages = kvm->arch.ram_npages; - unsigned long pfn; + unsigned long npages; + unsigned long pa; unsigned long *hpte; unsigned long hash; unsigned long porder = kvm->arch.ram_porder; struct revmap_entry *rev; - struct kvmppc_pginfo *pginfo = kvm->arch.ram_pginfo; + unsigned long *physp; - if (!pginfo) - return; + physp = kvm->arch.slot_phys[mem->slot]; + npages = kvm->arch.slot_npages[mem->slot]; /* VRMA can't be > 1TB */ if (npages > 1ul << (40 - porder)) @@ -117,9 +117,10 @@ void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem) npages = HPT_NPTEG; for (i = 0; i < npages; ++i) { - pfn = pginfo[i].pfn; - if (!pfn) + pa = physp[i]; + if (!pa) break; + pa &= PAGE_MASK; /* can't use hpt_hash since va > 64 bits */ hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & HPT_HASH_MASK; /* @@ -131,8 +132,7 @@ void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem) hash = (hash << 3) + 7; hpte = (unsigned long *) (kvm->arch.hpt_virt + (hash << 4)); /* HPTE low word - RPN, protection, etc. */ - hpte[1] = (pfn << PAGE_SHIFT) | HPTE_R_R | HPTE_R_C | - HPTE_R_M | PP_RWXX; + hpte[1] = pa | HPTE_R_R | HPTE_R_C | HPTE_R_M | PP_RWXX; smp_wmb(); hpte[0] = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) | (i << (VRMA_PAGE_ORDER - 16)) | HPTE_V_BOLTED | diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 5efdd5b..c2ee5a7 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -48,14 +48,6 @@ #include #include -/* - * For now, limit memory to 64GB and require it to be large pages. 
- * This value is chosen because it makes the ram_pginfo array be - * 64kB in size, which is about as large as we want to be trying - * to allocate with kmalloc. - */ -#define MAX_MEM_ORDER 36 - #define LARGE_PAGE_ORDER 24 /* 16MB pages */ /* #define EXIT_DEBUG */ @@ -145,10 +137,12 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu,
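
Storing a flag in the low-order bits of a page-aligned physical address, as the slot_phys[] entries do, can be sketched as follows. KVMPPC_GOT_PAGE is 0x80 as in the patch; the 4 kB PAGE_SHIFT and the helper names are assumptions for this example.

```c
#define PAGE_SHIFT	12	/* assumption: 4 kB base pages */
#define PAGE_MASK	(~((1UL << PAGE_SHIFT) - 1))
#define KVMPPC_GOT_PAGE	0x80UL	/* value from the patch */

/* Pack a page-aligned physical address and the "we hold a reference" flag */
static unsigned long make_slot_entry(unsigned long pa, int got_page)
{
	return (pa & PAGE_MASK) | (got_page ? KVMPPC_GOT_PAGE : 0);
}

/* Recover the physical address, as kvmppc_map_vrma does with pa &= PAGE_MASK */
static unsigned long slot_entry_pa(unsigned long entry)
{
	return entry & PAGE_MASK;
}

/* Does teardown need to drop a page reference for this entry? */
static int slot_entry_needs_put(unsigned long entry)
{
	return (entry & KVMPPC_GOT_PAGE) != 0;
}
```

Because every valid entry has a non-zero physical address, an all-zero entry can still serve as the "nothing here" marker that the loop in kvmppc_map_vrma() tests for.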
[PATCH 06/13] KVM: PPC: Only get pages when actually needed, not in prepare_memory_region()
This removes the code from kvmppc_core_prepare_memory_region() that
looked up the VMA for the region being added and called hva_to_page
to get the pfns for the memory.  We have no guarantee that there will
be anything mapped there at the time of the KVM_SET_USER_MEMORY_REGION
ioctl call; userspace can do that ioctl and then map memory into the
region later.

Instead we defer looking up the pfn for each memory page until it is
needed, which generally means when the guest does an H_ENTER hcall on
the page.  Since we can't call get_user_pages in real mode, if we don't
already have the pfn for the page, kvmppc_h_enter() will return
H_TOO_HARD and we then call kvmppc_virtmode_h_enter() once we get back
to kernel context.  That calls kvmppc_get_guest_page() to get the pfn
for the page, and then calls back to kvmppc_h_enter() to redo the HPTE
insertion.

When the first vcpu starts executing, we need to have the RMO or VRMA
region mapped so that the guest's real mode accesses will work.  Thus
we now have a check in kvmppc_vcpu_run() to see if the RMO/VRMA is set
up and if not, call kvmppc_hv_setup_rma().  It checks if the memslot
starting at guest physical 0 now has RMO memory mapped there; if so it
sets it up for the guest, otherwise on POWER7 it sets up the VRMA.
The function that does that, kvmppc_map_vrma, is now a bit simpler,
as it calls kvmppc_virtmode_h_enter instead of creating the HPTE itself.

Since we are now potentially updating entries in the slot_phys[]
arrays from multiple vcpu threads, we now have a spinlock protecting
those updates to ensure that we don't lose track of any references
to pages.

Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s.h|4 + arch/powerpc/include/asm/kvm_book3s_64.h | 12 ++ arch/powerpc/include/asm/kvm_host.h |2 + arch/powerpc/include/asm/kvm_ppc.h |4 +- arch/powerpc/kvm/book3s_64_mmu_hv.c | 130 +--- arch/powerpc/kvm/book3s_hv.c | 244 +- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 56 7 files changed, 291 insertions(+), 161 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 16db48c..5e7e04b 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -143,6 +143,10 @@ extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn); extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr, unsigned long *nb_ret); extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr); +extern long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags, + long pte_index, unsigned long pteh, unsigned long ptel); +extern long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags, + long pte_index, unsigned long pteh, unsigned long ptel); extern void kvmppc_entry_trampoline(void); extern void kvmppc_hv_entry_trampoline(void); diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index fe45a81..ab6772e 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -95,4 +95,16 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r, return rb; } +static inline unsigned long hpte_page_size(unsigned long h, unsigned long l) +{ + /* only handle 4k, 64k and 16M pages for now */ + if (!(h & HPTE_V_LARGE)) + return 1ul << 12; /* 4k page */ + if ((l & 0xf000) == 0x1000 && cpu_has_feature(CPU_FTR_ARCH_206)) + return 1ul << 16; /* 64k page */ + if ((l & 0xff000) == 0) + return 1ul << 24; /* 16M page */ + return 0; /* error */ +} + #endif /* __ASM_KVM_BOOK3S_64_H__ */ diff --git 
a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index cf6b4d7..2a52bdb 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -193,7 +193,9 @@ struct kvm_arch { unsigned long lpcr; unsigned long rmor; struct kvmppc_rma_info *rma; + int rma_setup_done; struct list_head spapr_tce_tables; + spinlock_t slot_phys_lock; unsigned long *slot_phys[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS]; int slot_npages[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS]; unsigned short last_vcpu[NR_CPUS]; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index fc2d696..111e1b4 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -121,8 +121,8 @@ extern long kvmppc_alloc_hpt(struct kvm *kvm); extern void kvmppc_free_hpt(struct kvm *kvm); extern long kvmppc_prepare_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem); -
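
The real-mode/virtual-mode split described in this patch follows a common fast-path/slow-path shape, which the sketch below tries to capture in plain C. Everything here is hypothetical scaffolding: the struct, the function names, the lookup callback and the numeric return codes are invented for the example, and the slot_phys_lock that the real code takes around updates is omitted.

```c
#include <stddef.h>

/* Illustrative return codes; the values are not taken from the patch */
#define H_SUCCESS	0L
#define H_PARAMETER	(-4L)
#define H_TOO_HARD	9999L

struct slot_sketch {
	unsigned long *phys;	/* 0 means "page not looked up yet" */
	size_t npages;
};

/* Fast path: must not fault pages in, so it gives up on an empty entry */
static long enter_fast(const struct slot_sketch *s, size_t idx)
{
	if (idx >= s->npages)
		return H_PARAMETER;
	if (!s->phys[idx])
		return H_TOO_HARD;
	return H_SUCCESS;	/* the real code would insert the HPTE here */
}

/* Slow path: resolve the page if needed, then redo the fast path */
static long enter_slow(struct slot_sketch *s, size_t idx,
		       unsigned long (*lookup)(size_t idx))
{
	if (idx >= s->npages)
		return H_PARAMETER;
	if (!s->phys[idx]) {
		unsigned long pa = lookup(idx);

		if (!pa)
			return H_PARAMETER;
		s->phys[idx] = pa;
	}
	return enter_fast(s, idx);
}

/* Made-up lookup for demonstration: page idx lives at (idx + 1) << 12 */
static unsigned long demo_lookup(size_t idx)
{
	return (unsigned long)(idx + 1) << 12;
}
```

A caller would try enter_fast() first and fall back to enter_slow() only when it sees H_TOO_HARD, so pages that have already been resolved never leave the fast path.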
[PATCH 0/13] KVM: PPC: Update Book3S HV memory handling
This series of patches updates the Book3S-HV KVM code that manages the
guest hashed page table (HPT) to enable several things:

* MMIO emulation and MMIO pass-through

* Use of small pages (4kB or 64kB, depending on config) to back the
  guest memory

* Pageable guest memory - i.e. backing pages can be removed from the
  guest and reinstated on demand, using the MMU notifier mechanism.

* Guests can be given read-only access to pages even though they think
  they have mapped them read/write.  When they try to write to them
  their access is upgraded to read/write.  This allows KSM to share
  pages between guests.

On PPC970 we have no way to get DSIs and ISIs to come to the
hypervisor, so we can't do MMIO emulation or pageable guest memory.
On POWER7 we set the VPM1 bit in the LPCR to make all DSIs and ISIs
come to the hypervisor (host) as HDSIs or HISIs.

This code is working well in my tests.  The sporadic crashes that I
was seeing earlier are fixed by the first patch in the series.
Somewhat to my surprise, when I implemented the last patch in the
series I started to see KSM coalescing pages without any further
effort on my part -- my tests were on a machine with Fedora 16
installed, and it has ksmtuned running by default.

This series is on top of Alex Graf's kvm-ppc-next branch, although the
last patch on that branch ("KVM: PPC: booke: Improve timer register
emulation") is causing the decrementer not to work properly in Book3S
HV guests, for reasons that I haven't fully determined yet.

These patches only touch arch/powerpc except for patch 11, which adds
a couple of barriers to allow mmu_notifier_retry() to be used outside
of the kvm->mmu_lock.

Unlike the previous version of these patches, we don't look at what's
mapped in the user address space at the time that
kvmppc_core_prepare_memory_region or kvmppc_core_commit_memory_region
gets called; we look up pages only when they are needed, either
because the guest wants to map them with an H_ENTER hypercall, or for
the pages needed for the virtual real-mode area (VRMA), at the time of
the first VCPU_RUN ioctl.

Paul.
[PATCH 02/13] KVM: PPC: Keep a record of HV guest view of hashed page table entries
This adds an array that parallels the guest hashed page table (HPT), that is, it has one entry per HPTE, used to store the guest's view of the second doubleword of the corresponding HPTE. The first doubleword in the HPTE is the same as the guest's idea of it, so we don't need to store a copy, but the second doubleword in the HPTE has the real page number rather than the guest's logical page number. This allows us to remove the back_translate() and reverse_xlate() functions. This "reverse mapping" array is vmalloc'd, meaning that to access it in real mode we have to walk the kernel's page tables explicitly. That is done by the new real_vmalloc_addr() function. (In fact this returns an address in the linear mapping, so the result is usable both in real mode and in virtual mode.) There are also some minor cleanups here: moving the definitions of HPT_ORDER etc. to a header file and defining HPT_NPTE for HPT_NPTEG << 3. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s_64.h |8 +++ arch/powerpc/include/asm/kvm_host.h | 10 arch/powerpc/kvm/book3s_64_mmu_hv.c | 44 +++ arch/powerpc/kvm/book3s_hv_rm_mmu.c | 87 ++ 4 files changed, 103 insertions(+), 46 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index d0ac94f..23bb17e 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -29,6 +29,14 @@ static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu) #define SPAPR_TCE_SHIFT12 +#ifdef CONFIG_KVM_BOOK3S_64_HV +/* For now use fixed-size 16MB page table */ +#define HPT_ORDER 24 +#define HPT_NPTEG (1ul << (HPT_ORDER - 7))/* 128B per pteg */ +#define HPT_NPTE (HPT_NPTEG << 3)/* 8 PTEs per PTEG */ +#define HPT_HASH_MASK (HPT_NPTEG - 1) +#endif + static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r, unsigned long pte_index) { diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h 
index 66c75cd..629df2e 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -166,9 +166,19 @@ struct kvmppc_rma_info { atomic_t use_count; }; +/* + * The reverse mapping array has one entry for each HPTE, + * which stores the guest's view of the second word of the HPTE + * (including the guest physical address of the mapping). + */ +struct revmap_entry { + unsigned long guest_rpte; +}; + struct kvm_arch { #ifdef CONFIG_KVM_BOOK3S_64_HV unsigned long hpt_virt; + struct revmap_entry *revmap; unsigned long ram_npages; unsigned long ram_psize; unsigned long ram_porder; diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index bc3a2ea..80ece8d 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include @@ -33,11 +34,6 @@ #include #include -/* For now use fixed-size 16MB page table */ -#define HPT_ORDER 24 -#define HPT_NPTEG (1ul << (HPT_ORDER - 7))/* 128B per pteg */ -#define HPT_HASH_MASK (HPT_NPTEG - 1) - /* Pages in the VRMA are 16MB pages */ #define VRMA_PAGE_ORDER24 #define VRMA_VSID 0x1ffUL /* 1TB VSID reserved for VRMA */ @@ -51,7 +47,9 @@ long kvmppc_alloc_hpt(struct kvm *kvm) { unsigned long hpt; unsigned long lpid; + struct revmap_entry *rev; + /* Allocate guest's hashed page table */ hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN, HPT_ORDER - PAGE_SHIFT); if (!hpt) { @@ -60,12 +58,20 @@ long kvmppc_alloc_hpt(struct kvm *kvm) } kvm->arch.hpt_virt = hpt; + /* Allocate reverse map array */ + rev = vmalloc(sizeof(struct revmap_entry) * HPT_NPTE); + if (!rev) { + pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n"); + goto out_freehpt; + } + kvm->arch.revmap = rev; + + /* Allocate the guest's logical partition ID */ do { lpid = find_first_zero_bit(lpid_inuse, NR_LPIDS); if (lpid >= NR_LPIDS) { pr_err("kvm_alloc_hpt: No LPIDs free\n"); - free_pages(hpt, 
HPT_ORDER - PAGE_SHIFT); - return -ENOMEM; + goto out_freeboth; } } while (test_and_set_bit(lpid, lpid_inuse)); @@ -74,11 +80,18 @@ long kvmppc_alloc_hpt(struct kvm *kvm) pr_info("KVM guest htab at %lx, LPID %lx\n", hpt, lpid); return 0; + + out_freeboth: + vfree(rev); + out_freehpt: + free_pages(hpt, HPT_ORDER - PAGE_SHIFT); +
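
The sizes implied by the new header definitions are worth working through once. With HPT_ORDER = 24 the table is 16 MB, which at 128 bytes per PTEG gives 2^17 PTEGs and, at 8 HPTEs per PTEG, 2^20 HPTEs. The sketch below repeats the three macros from the patch and computes the size of the reverse map array; the struct is a stand-in that uses a fixed 64-bit field, matching unsigned long on the 64-bit hosts this code targets.

```c
#include <stdint.h>

/* Same definitions as in the patch */
#define HPT_ORDER	24
#define HPT_NPTEG	(1ul << (HPT_ORDER - 7))	/* 128B per pteg */
#define HPT_NPTE	(HPT_NPTEG << 3)		/* 8 PTEs per PTEG */

/* Stand-in for struct revmap_entry: one 64-bit word per HPTE */
struct revmap_entry_sketch {
	uint64_t guest_rpte;
};

/* Bytes needed for the reverse map array: 2^20 entries of 8 bytes = 8 MB */
static unsigned long revmap_bytes(void)
{
	return sizeof(struct revmap_entry_sketch) * HPT_NPTE;
}
```

So the array passed to vmalloc() in kvmppc_alloc_hpt() comes to 8 MB at this stage of the series, half the size of the HPT it shadows.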
[PATCH 05/13] KVM: PPC: Make the H_ENTER hcall more reliable
At present, our implementation of H_ENTER only makes one try at locking each slot that it looks at, and doesn't even retry the ldarx/stdcx. atomic update sequence that it uses to attempt to lock the slot. Thus it can return the H_PTEG_FULL error unnecessarily, particularly when the H_EXACT flag is set, meaning that the caller wants a specific PTEG slot. This improves the situation by making a second pass when no free HPTE slot is found, where we spin until we succeed in locking each slot in turn and then check whether it is full while we hold the lock. If the second pass fails, then we return H_PTEG_FULL. This also moves lock_hpte to a header file (since later commits in this series will need to use it from other source files) and renames it to try_lock_hpte, which is a somewhat less misleading name. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s_64.h | 25 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 63 -- 2 files changed, 59 insertions(+), 29 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 23bb17e..fe45a81 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -37,6 +37,31 @@ static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu) #define HPT_HASH_MASK (HPT_NPTEG - 1) #endif +/* + * We use a lock bit in HPTE dword 0 to synchronize updates and + * accesses to each HPTE, and another bit to indicate non-present + * HPTEs. + */ +#define HPTE_V_HVLOCK 0x40UL + +static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits) +{ + unsigned long tmp, old; + + asm volatile(" ldarx %0,0,%2\n" +" and.%1,%0,%3\n" +" bne 2f\n" +" ori %0,%0,%4\n" +" stdcx. 
%0,0,%2\n" +" beq+2f\n" +" li %1,%3\n" +"2:isync" +: "=&r" (tmp), "=&r" (old) +: "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK) +: "cc", "memory"); + return old == 0; +} + static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r, unsigned long pte_index) { diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 5f45ba7..659175f 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -56,26 +56,6 @@ static void *real_vmalloc_addr(void *x) return __va(addr); } -#define HPTE_V_HVLOCK 0x40UL - -static inline long lock_hpte(unsigned long *hpte, unsigned long bits) -{ - unsigned long tmp, old; - - asm volatile(" ldarx %0,0,%2\n" -" and.%1,%0,%3\n" -" bne 2f\n" -" ori %0,%0,%4\n" -" stdcx. %0,0,%2\n" -" beq+2f\n" -" li %1,%3\n" -"2:isync" -: "=&r" (tmp), "=&r" (old) -: "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK) -: "cc", "memory"); - return old == 0; -} - long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags, long pte_index, unsigned long pteh, unsigned long ptel) { @@ -129,24 +109,49 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags, pteh &= ~0x60UL; ptel &= ~(HPTE_R_PP0 - kvm->arch.ram_psize); ptel |= pa; + if (pte_index >= HPT_NPTE) return H_PARAMETER; if (likely((flags & H_EXACT) == 0)) { pte_index &= ~7UL; hpte = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4)); - for (i = 0; ; ++i) { - if (i == 8) - return H_PTEG_FULL; + for (i = 0; i < 8; ++i) { if ((*hpte & HPTE_V_VALID) == 0 && - lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID)) + try_lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID)) break; hpte += 2; } + if (i == 8) { + /* +* Since try_lock_hpte doesn't retry (not even stdcx. +* failures), it could be that there is a free slot +* but we transiently failed to lock it. Try again, +* actually locking each slot and checking it. +*/ + hpte -= 16; + for (i = 0; i < 8; ++i) { + while (!try_lock_hpte(hpte, HPTE_V_HVLOCK)) + cpu_relax(); +
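
The two-pass search described in the commit message can be modelled in userspace with C11 atomics. This is a sketch under stated assumptions, not the kernel code: a compare-and-swap replaces the ldarx/stdcx. sequence, SLOT_VALID is an assumed stand-in for HPTE_V_VALID, and the function names are invented. SLOT_LOCK uses the same 0x40 value as HPTE_V_HVLOCK in the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SLOT_VALID	0x01ull	/* assumption: stands in for HPTE_V_VALID */
#define SLOT_LOCK	0x40ull	/* same value as HPTE_V_HVLOCK */
#define SLOTS_PER_GROUP	8

/* One attempt only: fails if any of 'bits' is set or the update races */
static int try_lock_slot(_Atomic uint64_t *slot, uint64_t bits)
{
	uint64_t old = atomic_load_explicit(slot, memory_order_relaxed);

	if (old & bits)
		return 0;
	return atomic_compare_exchange_strong_explicit(slot, &old,
			old | SLOT_LOCK,
			memory_order_acquire, memory_order_relaxed);
}

static void unlock_slot(_Atomic uint64_t *slot)
{
	atomic_fetch_and_explicit(slot, ~SLOT_LOCK, memory_order_release);
}

/* Returns the index of a free slot, left locked, or -1 if the group is full */
static int claim_free_slot(_Atomic uint64_t *group)
{
	int i;

	/* Pass 1: opportunistic, skip anything that is locked or valid */
	for (i = 0; i < SLOTS_PER_GROUP; ++i)
		if (try_lock_slot(&group[i], SLOT_LOCK | SLOT_VALID))
			return i;

	/* Pass 2: wait for each lock, then check the slot while holding it */
	for (i = 0; i < SLOTS_PER_GROUP; ++i) {
		while (!try_lock_slot(&group[i], SLOT_LOCK))
			;	/* spin; the kernel code calls cpu_relax() */
		if (!(atomic_load_explicit(&group[i], memory_order_relaxed)
		      & SLOT_VALID))
			return i;
		unlock_slot(&group[i]);
	}
	return -1;
}
```

Only a -1 from the second pass is a trustworthy "group full" answer, since every slot was inspected under its lock; a miss in the first pass may just mean another CPU briefly held a lock.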
[PATCH 10/13] KVM: PPC: Implement MMIO emulation support for Book3S HV guests
This provides the low-level support for MMIO emulation in Book3S HV
guests.  When the guest tries to map a page which is not covered by
any memslot, that page is taken to be an MMIO emulation page.  Instead
of inserting a valid HPTE, we insert an HPTE that has the valid bit
clear but another hypervisor software-use bit set, which we call
HPTE_V_ABSENT, to indicate that this is an absent page.  An
absent page is treated much like a valid page as far as guest hcalls
(H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
an absent HPTE doesn't need to be invalidated with tlbie since it
was never valid as far as the hardware is concerned.

When the guest accesses a page for which there is an absent HPTE, it
will take a hypervisor data storage interrupt (HDSI) since we now set
the VPM1 bit in the LPCR.  Our HDSI handler for HPTE-not-present faults
looks up the hash table and if it finds an absent HPTE mapping the
requested virtual address, will switch to kernel mode and handle the
fault in kvmppc_book3s_hv_page_fault(), which at present just calls
kvmppc_hv_emulate_mmio() to set up the MMIO emulation.

This is based on an earlier patch by Benjamin Herrenschmidt, but since
heavily reworked.

Signed-off-by: Paul Mackerras
---
 arch/powerpc/include/asm/kvm_book3s.h    |    5 +
 arch/powerpc/include/asm/kvm_book3s_64.h |   26 +++
 arch/powerpc/include/asm/kvm_host.h      |    5 +
 arch/powerpc/include/asm/mmu-hash64.h    |    2 +-
 arch/powerpc/include/asm/ppc-opcode.h    |    4 +-
 arch/powerpc/include/asm/reg.h           |    1 +
 arch/powerpc/kernel/asm-offsets.c        |    1 +
 arch/powerpc/kernel/exceptions-64s.S     |    8 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  228 +--
 arch/powerpc/kvm/book3s_hv.c             |   21 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  262 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |  127 ---
 12 files changed, 607 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 5e7e04b..5ac53f9 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -121,6 +121,11 @@ extern void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu);
 extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte);
 extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr);
 extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu);
+extern int kvmppc_book3s_hv_page_fault(struct kvm_run *run,
+			struct kvm_vcpu *vcpu, unsigned long addr,
+			unsigned long status);
+extern long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr,
+			unsigned long slb_v, unsigned long valid);
 extern void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte);
 extern struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu);

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 90e6658..9a59b6d 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -37,12 +37,15 @@ static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu)
 #define HPT_HASH_MASK	(HPT_NPTEG - 1)
 #endif

+#define VRMA_VSID	0x1ffUL	/* 1TB VSID reserved for VRMA */
+
 /*
  * We use a lock bit in HPTE dword 0 to synchronize updates and
  * accesses to each HPTE, and another bit to indicate non-present
  * HPTEs.
  */
 #define HPTE_V_HVLOCK	0x40UL
+#define HPTE_V_ABSENT	0x20UL

 static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
 {
@@ -138,6 +141,29 @@ static inline unsigned long hpte_cache_bits(unsigned long pte_val)
 #endif
 }

+static inline bool hpte_read_permission(unsigned long pp, unsigned long key)
+{
+	if (key)
+		return PP_RWRX <= pp && pp <= PP_RXRX;
+	return 1;
+}
+
+static inline bool hpte_write_permission(unsigned long pp, unsigned long key)
+{
+	if (key)
+		return pp == PP_RWRW;
+	return pp <= PP_RWRW;
+}
+
+static inline int hpte_get_skey_perm(unsigned long hpte_r, unsigned long amr)
+{
+	unsigned long skey;
+
+	skey = ((hpte_r & HPTE_R_KEY_HI) >> 57) |
+		((hpte_r & HPTE_R_KEY_LO) >> 9);
+	return (amr >> (62 - 2 * skey)) & 3;
+}
+
 static inline void lock_rmap(unsigned long *rmap)
 {
 	do {
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index e369d49..c9c92f0 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -209,6 +209,7 @@ struct kvm_arch {
 	unsigned long lpcr;
 	unsigned long rmor;
 	struct kvmppc_rma_info *rma;
+	unsigned long vrma_slb_v;
 	int rma_setup_done;
 	struct list_head spapr_tce_tables;
 	spinlock_t slot_phys_lock;
@@ -451,6 +452,10 @@