Re: Re: Re: [PATCHv5] atomic: add *_dec_not_zero

2011-12-05 Thread Benjamin Herrenschmidt
On Mon, 2011-12-05 at 08:57 +0100, Sven Eckelmann wrote:
> On Monday 05 December 2011 09:41:55 Benjamin Herrenschmidt wrote:
> > On Sun, 2011-12-04 at 22:18 +0000, Russell King - ARM Linux wrote:
> > 
> >  .../...
> > 
> > > And really, I believe it would be a good cleanup if all the standard
> > > definitions for atomic64 ops (like atomic64_add_negative) were also
> > > defined in include/linux/atomic.h rather than individually in every
> > > atomic*.h header throughout the kernel source, except where an arch
> > > wants to explicitly override it.  Yet again, virtually all architectures
> > > define these in exactly the same way.
> > > 
> > > We have more than enough code in arch/ for any architecture to worry
> > > about, we don't need schemes to add more when there's simple and
> > > practical solutions to avoiding doing so if the right design were
> > > chosen (preferably from the outset.)
> > > 
> > > So, I'm not going to offer my ack for a change which I don't believe
> > > is the correct approach.
> > 
> > I agree with Russell, his approach is a lot easier to maintain in the long
> > run; we should even consider converting existing definitions.
> 
> I would rather go with "the existing definitions have to be converted", and
> this means "not by this patch".

Right. I didn't suggest -you- had to do it as a pre-req to your patch.

>  At the moment, the atomic64 stuff exists only as separate generic or
> arch-specific implementations. It is fine that Russell King noticed that
> people like Arun Sharma did a lot of work to make this true for atomic_t,
> but atomic64_t is a little bit different right now (at least as I
> understand it).

Cheers,
Ben.
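
For reference, the generic-fallback pattern Russell describes would look
roughly like this (a sketch built on atomic_add_unless(), which returns
non-zero iff it performed the add; not the actual patch):

#ifndef atomic_dec_not_zero
#define atomic_dec_not_zero(v)		atomic_add_unless((v), -1, 0)
#endif

#ifndef atomic64_dec_not_zero
#define atomic64_dec_not_zero(v)	atomic64_add_unless((v), -1, 0)
#endif

An architecture with a better primitive defines the macro in its own
atomic.h, and the #ifndef guard then keeps the generic version out of the
way.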




Re: oprofile callgraph support missing for common cpus

2011-12-05 Thread Joakim Tjernlund
Benjamin Herrenschmidt  wrote on 2011/11/25 06:24:32:
>
> On Fri, 2011-11-18 at 09:22 +0100, Joakim Tjernlund wrote:
>
> > I forgot to ask, oprofile mentions setting -fno-omit-frame-pointer to get
> > correct backtraces but I cannot turn on frame pointers for the ppc kernel.
> > Aren't frame pointers needed for ppc? What about user space?
>
> PowerPC always has frame pointers, ignore that :-)

A bit late but consider this:

int leaf(int x)
{
	return x+3;
}

which yields (with gcc -O2 -S):
.file   "leaf.c"
.section".text"
.align 2
.globl leaf
.type   leaf, @function
leaf:
addi 3,3,3
blr
.size   leaf, .-leaf
.section.note.GNU-stack,"",@progbits
.ident  "GCC: (GNU) 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)"


Here it is with frame pointers (I guess the messing around with r11 and
r31 is a defect?):
(With gcc -O2 -S -fno-omit-frame-pointer)

.file   "leaf.c"
.section".text"
.align 2
.globl leaf
.type   leaf, @function
leaf:
stwu 1,-16(1)
addi 3,3,3
lwz 11,0(1)
stw 31,12(1)
mr 31,1
lwz 31,-4(11)
mr 1,11
blr
.size   leaf, .-leaf
.section.note.GNU-stack,"",@progbits
.ident  "GCC: (GNU) 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)"



Re: oprofile callgraph support missing for common cpus

2011-12-05 Thread Benjamin Herrenschmidt
On Mon, 2011-12-05 at 09:50 +0100, Joakim Tjernlund wrote:
> Benjamin Herrenschmidt wrote on 2011/11/25 06:24:32:
> >
> > On Fri, 2011-11-18 at 09:22 +0100, Joakim Tjernlund wrote:
> >
> > > I forgot to ask, oprofile mentions setting -fno-omit-frame-pointer to get
> > > correct backtraces but I cannot turn on frame pointers for the ppc kernel.
> > > Aren't frame pointers needed for ppc? What about user space?
> >
> > PowerPC always has frame pointers, ignore that :-)
> 
> A bit late but consider this:

 .../...

Right, I wasn't clear. We do have frame pointers for non-leaf functions,
and we can trace from the LR when we are in a leaf function; we can use
__builtin_return_address() as well.
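
To illustrate, a ppc32 call-graph walk might look roughly like this (a
sketch assuming the SysV ABI frame layout, back chain at offset 0 and LR
save word at offset 4; record_sample() is a hypothetical helper):

/*
 * Each frame begins with a back-chain pointer to the caller's frame.
 * A function's callee saves the LR (the return address back into that
 * function) in the LR save word of the frame it was called from, so
 * frame[1] holds a PC within the function owning 'frame'.  Leaf
 * functions never save LR, which is why the live LR (or
 * __builtin_return_address(0)) is needed for the topmost entry.
 * Validity checks on the pointers are elided.
 */
static void walk_backchain(unsigned long sp)
{
	unsigned long *frame = (unsigned long *)sp;

	while (frame && frame[0]) {
		unsigned long ret = frame[1];	/* LR save word */

		if (ret)
			record_sample(ret);	/* hypothetical */
		frame = (unsigned long *)frame[0];
	}
}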

We also explicitly prevent -fno-omit-frame-pointer, IIRC, due to a bug
in older versions of gcc which could cause miscompiles under some
circumstances (though I don't remember the details).

Cheers,
Ben.


> int leaf(int x)
> {
>   return x+3;
> }
> 
> which yields (with gcc -O2 -S):
>   .file   "leaf.c"
>   .section".text"
>   .align 2
>   .globl leaf
>   .type   leaf, @function
> leaf:
>   addi 3,3,3
>   blr
>   .size   leaf, .-leaf
>   .section.note.GNU-stack,"",@progbits
>   .ident  "GCC: (GNU) 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)"
> 
> 
> Here it is with frame pointers (I guess the messing around with r11 and
> r31 is a defect?):
> (With gcc -O2 -S -fno-omit-frame-pointer)
> 
>   .file   "leaf.c"
>   .section".text"
>   .align 2
>   .globl leaf
>   .type   leaf, @function
> leaf:
>   stwu 1,-16(1)
>   addi 3,3,3
>   lwz 11,0(1)
>   stw 31,12(1)
>   mr 31,1
>   lwz 31,-4(11)
>   mr 1,11
>   blr
>   .size   leaf, .-leaf
>   .section.note.GNU-stack,"",@progbits
>   .ident  "GCC: (GNU) 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)"




[PATCH] mmc: sdhci-pltfm: Added sdhci-adjust-timeout quirk

2011-12-05 Thread Xie Xiaobo
Some controllers provide an incorrect timeout value for transfers,
so we need a quirk to adjust the timeout value to 0xE.
E.g. the eSDHC on MPC8536, P1010, and P2020.

Signed-off-by: Xie Xiaobo 
---
 drivers/mmc/host/sdhci-pltfm.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/mmc/host/sdhci-pltfm.c b/drivers/mmc/host/sdhci-pltfm.c
index a9e12ea..b5d6b3f 100644
--- a/drivers/mmc/host/sdhci-pltfm.c
+++ b/drivers/mmc/host/sdhci-pltfm.c
@@ -2,7 +2,7 @@
  * sdhci-pltfm.c Support for SDHCI platform devices
  * Copyright (c) 2009 Intel Corporation
  *
- * Copyright (c) 2007 Freescale Semiconductor, Inc.
+ * Copyright (c) 2007, 2011 Freescale Semiconductor, Inc.
  * Copyright (c) 2009 MontaVista Software, Inc.
  *
  * Authors: Xiaobo Xie 
@@ -68,6 +68,9 @@ void sdhci_get_of_property(struct platform_device *pdev)
if (of_get_property(np, "sdhci,1-bit-only", NULL))
host->quirks |= SDHCI_QUIRK_FORCE_1_BIT_DATA;
 
+   if (of_get_property(np, "sdhci,sdhci-adjust-timeout", NULL))
+   host->quirks |= SDHCI_QUIRK_BROKEN_TIMEOUT_VAL;
+
if (sdhci_of_wp_inverted(np))
host->quirks |= SDHCI_QUIRK_INVERTED_WRITE_PROTECT;
 
-- 
1.6.4
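
For context, the generic sdhci layer honours SDHCI_QUIRK_BROKEN_TIMEOUT_VAL
when computing the data timeout; the effect is roughly the following
(paraphrased sketch, the function name is illustrative, not the exact
sdhci.c code):

static u8 esdhc_timeout_count(struct sdhci_host *host, u8 computed)
{
	/* When the quirk is set, ignore the controller's computed
	 * timeout and program the maximum counter value instead. */
	if (host->quirks & SDHCI_QUIRK_BROKEN_TIMEOUT_VAL)
		return 0xE;
	return computed;
}

So boards that set the "sdhci,sdhci-adjust-timeout" property get the
longest timeout the controller supports instead of the broken value.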




Re: [PATCH v3 2/3] hvc_init(): Enforce one-time initialization.

2011-12-05 Thread Amit Shah
On (Tue) 29 Nov 2011 [09:50:41], Miche Baker-Harvey wrote:
> Good grief!  Sorry for the spacing mess-up!  Here's a resend with 
> reformatting.
> 
> Amit,
> We aren't using either QEMU or kvmtool, but we are using KVM.  All

So it's a different userspace?  Any chance this different userspace is
causing these problems to appear?  Esp. since I couldn't reproduce
with qemu.

> the issues we are seeing happen when we try to establish multiple
> virtioconsoles at boot time.  The command line isn't relevant, but I
> can tell you the protocol that's passing between the host (kvm) and
> the guest (see the end of this message).
> 
> We do go through the control_work_handler(), but it's not
> providing synchronization.  Here's a trace of the
> control_work_handler() and handle_control_message() calls; note that
> there are two concurrent calls to control_work_handler().

Ah; how does that happen?  control_work_handler() should just be
invoked once, and if there are any more pending work items to be
consumed, they should be done within the loop inside
control_work_handler().
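
The intended shape is a single work instance that drains every pending
control buffer in a loop, roughly like this (paraphrasing the
virtio_console handler, not quoting it exactly):

static void control_work_handler(struct work_struct *work)
{
	struct ports_device *portdev =
		container_of(work, struct ports_device, control_work);
	struct port_buffer *buf;
	unsigned int len;

	spin_lock(&portdev->cvq_lock);
	while ((buf = virtqueue_get_buf(portdev->c_ivq, &len))) {
		spin_unlock(&portdev->cvq_lock);

		buf->len = len;
		buf->offset = 0;
		handle_control_message(portdev, buf);

		spin_lock(&portdev->cvq_lock);
		add_inbuf(portdev->c_ivq, buf);	/* requeue for the host */
	}
	spin_unlock(&portdev->cvq_lock);
}

which is why two concurrent instances in the trace quoted below are
surprising.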

> I decorated control_work_handler() with a "lifetime" marker, and
> passed this value to handle_control_message(), so we can see which
> control messages are being handled from which instance of
> the control_work_handler() thread.
> 
> Notice that we enter control_work_handler() a second time before
> the handling of the second PORT_ADD message is complete. The
> first CONSOLE_PORT message is handled by the second
> control_work_handler() call, but the second is handled by the first
> control_work_handler() call.
> 
> root@myubuntu:~# dmesg | grep MBH
> [3371055.808738] control_work_handler #1
> [3371055.809372] + #1 handle_control_message PORT_ADD
> [3371055.810169] - handle_control_message PORT_ADD
> [3371055.810170] + #1 handle_control_message PORT_ADD
> [3371055.810244]  control_work_handler #2
> [3371055.810245] + #2 handle_control_message CONSOLE_PORT
> [3371055.810246]  got hvc_ports_mutex
> [3371055.810578] - handle_control_message PORT_ADD
> [3371055.810579] + #1 handle_control_message CONSOLE_PORT
> [3371055.810580]  trylock of hvc_ports_mutex failed
> [3371055.811352]  got hvc_ports_mutex
> [3371055.811370] - handle_control_message CONSOLE_PORT
> [3371055.816609] - handle_control_message CONSOLE_PORT
> 
> So, I'm guessing the bug is that there shouldn't be two instances of
> control_work_handler() running simultaneously?

Yep, I assumed we did that but apparently not.  Do you plan to chase
this one down?

Amit


[PATCH 2/2] mtd/nand: Add ONFI support for FSL NAND controller

2011-12-05 Thread Shengzhou Liu
- fix the NAND_CMD_READID command for ONFI detection.
- add the NAND_CMD_PARAM command to read the ONFI parameter page.

Signed-off-by: Shengzhou Liu 
---
 drivers/mtd/nand/fsl_elbc_nand.c |   19 ---
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c
index 742bf73..08a3aba 100644
--- a/drivers/mtd/nand/fsl_elbc_nand.c
+++ b/drivers/mtd/nand/fsl_elbc_nand.c
@@ -349,19 +349,24 @@ static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command,
fsl_elbc_run_command(mtd);
return;
 
-   /* READID must read all 5 possible bytes while CEB is active */
case NAND_CMD_READID:
-   dev_vdbg(priv->dev, "fsl_elbc_cmdfunc: NAND_CMD_READID.\n");
+   case NAND_CMD_PARAM:
+   dev_vdbg(priv->dev, "fsl_elbc_cmdfunc: NAND_CMD %x\n", command);
 
out_be32(&lbc->fir, (FIR_OP_CM0 << FIR_OP0_SHIFT) |
(FIR_OP_UA  << FIR_OP1_SHIFT) |
(FIR_OP_RBW << FIR_OP2_SHIFT));
-   out_be32(&lbc->fcr, NAND_CMD_READID << FCR_CMD0_SHIFT);
-   /* nand_get_flash_type() reads 8 bytes of entire ID string */
-   out_be32(&lbc->fbcr, 8);
-   elbc_fcm_ctrl->read_bytes = 8;
+   out_be32(&lbc->fcr, command << FCR_CMD0_SHIFT);
+   /* reads 8 bytes of entire ID string */
+   if (NAND_CMD_READID == command) {
+   out_be32(&lbc->fbcr, 8);
+   elbc_fcm_ctrl->read_bytes = 8;
+   } else {
+   out_be32(&lbc->fbcr, 256);
+   elbc_fcm_ctrl->read_bytes = 256;
+   }
elbc_fcm_ctrl->use_mdr = 1;
-   elbc_fcm_ctrl->mdr = 0;
+   elbc_fcm_ctrl->mdr = column;
 
set_addr(mtd, 0, 0, 0);
fsl_elbc_run_command(mtd);
-- 
1.6.4




[PATCH 1/2] mtd/nand: fixup for fmr initialization of Freescale NAND controller

2011-12-05 Thread Shengzhou Liu
There was a bug in the fmr initialization which led to fmr always being 0x100
in fsl_elbc_chip_init(), causing an FCM command timeout before
fsl_elbc_chip_init_tail() was called.

Signed-off-by: Shengzhou Liu 
---
 drivers/mtd/nand/fsl_elbc_nand.c |8 +++-
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c
index eedd8ee..742bf73 100644
--- a/drivers/mtd/nand/fsl_elbc_nand.c
+++ b/drivers/mtd/nand/fsl_elbc_nand.c
@@ -659,9 +659,7 @@ static int fsl_elbc_chip_init_tail(struct mtd_info *mtd)
if (chip->pagemask & 0xff00)
al++;
 
-   /* add to ECCM mode set in fsl_elbc_init */
-   priv->fmr |= (12 << FMR_CWTO_SHIFT) |  /* Timeout > 12 ms */
-(al << FMR_AL_SHIFT);
+   priv->fmr |= al << FMR_AL_SHIFT;
 
dev_dbg(priv->dev, "fsl_elbc_init: nand->numchips = %d\n",
chip->numchips);
@@ -764,8 +762,8 @@ static int fsl_elbc_chip_init(struct fsl_elbc_mtd *priv)
priv->mtd.priv = chip;
priv->mtd.owner = THIS_MODULE;
 
-   /* Set the ECCM according to the settings in bootloader.*/
-   priv->fmr = in_be32(&lbc->fmr) & FMR_ECCM;
+   /* Set fmr according to the settings in bootloader.*/
+   priv->fmr = in_be32(&lbc->fmr);
 
/* fill in nand_chip structure */
/* set up function call table */
-- 
1.6.4




RE: Re: [PATCHv5] atomic: add *_dec_not_zero

2011-12-05 Thread David Laight
Looking at this:

> #ifndef atomic_inc_unless_negative
> static inline int atomic_inc_unless_negative(atomic_t *p)
> {
> 	int v, v1;
> 	for (v = 0; v >= 0; v = v1) {
> 		v1 = atomic_cmpxchg(p, v, v + 1);
> 		if (likely(v1 == v))
> 			return 1;
> 	}
> 	return 0;
> }
> #endif

why is it optimised for '*p' being zero?
I'd have thought the initial assignment to 'v' should
be made by reading '*p' without any memory barriers (etc.).
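
For illustration, the variant being suggested, seeded with a plain read of
'*p' instead of the constant zero, might look like this (untested sketch):

#ifndef atomic_inc_unless_negative
static inline int atomic_inc_unless_negative(atomic_t *p)
{
	int v, v1;

	/* start from the current value rather than assuming zero */
	for (v = atomic_read(p); v >= 0; v = v1) {
		v1 = atomic_cmpxchg(p, v, v + 1);
		if (likely(v1 == v))
			return 1;
	}
	return 0;
}
#endif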

David




Re: [PATCH 0/6] RFCv2 Fix Fsl 8250 BRK bug

2011-12-05 Thread Alan Cox
> Anyway, have a look and see if this version of things is acceptable
> to all.  (Again, the dts update from Kumar isn't shown here).
> 
> Thanks to all who provided the feedback on v1.

Looks good to me

Acked-by: Alan Cox 


Problem with eLBC?

2011-12-05 Thread Alexander Lyasin
Dear Kumar Gala!

Our company is a client of Freescale; we use the MPC8308, MPC8321 and other
microprocessors.
I have a board based on the MPC8308 chip. This board runs the Linux kernel.
On this board there are a NAND flash and DSP processors on the local bus.
A chunk from the DTS file is below:

localbus@e0005000 {
        #address-cells = <2>;
        #size-cells = <1>;
        compatible = "fsl,mpc8315-elbc", "fsl,elbc", "simple-bus";
        reg = <0xe0005000 0x1000>;
        interrupts = <77 0x8>;
        interrupt-parent = <&ipic>;

        // CS0 and CS1 are swapped when
        // booting from nand, but the
        // addresses are the same.
        ranges = <0x0 0x0 0xfe000000 0x00800000
                  0x1 0x0 0xe0600000 0x2000
                  0x2 0x0 0xf0000000 0x00020000
                  0x3 0x0 0xfa000000 0x8000>;

        nand@1,0 {
                #address-cells = <1>;
                #size-cells = <1>;
                compatible = "fsl,mpc8315-fcm-nand",
                             "fsl,elbc-fcm-nand";
                reg = <0x1 0x0 0x2000>;

                u-boot@0 {
                        reg = <0x0 0x100000>;
                        read-only;
                        label = "U-Boot-NAND";
                };
                dtb@100000 {
                        reg = <0x100000 0x40000>;
                        read-only;
                        label = "DTB-NAND";
                };
                kernel@140000 {
                        reg = <0x140000 0x200000>;
                        read-only;
                        label = "Kernel-NAND";
                };
                jffs2@340000 {
                        reg = <0x00340000 0x01c00000>;
                        label = "JFFS2-NAND";
                };
                reserve@1f40000 {
                        reg = <0x01f40000 0x000c0000>;
                        label = "Reserve";
                };
        };

        dsp0@D0020000 {
                reg = <0xD0020000 0x10000>;
                interrupts = <18 0x8>;
                interrupt-parent = <&ipic>;
                dsp0;
        };

        dsp1@D0030000 {
                reg = <0xD0030000 0x10000>;
                interrupts = <19 0x8>;
                interrupt-parent = <&ipic>;
                dsp1;
        };

A user-level application periodically reads data from the DSP and writes
data to the DSP via a character device. When the application is reading from
or writing to the DSP, file system calls produce errors:

[root@mpc8308-kd-124 /root]# ls -l  
 
mtd->read(0xdc bytes from 0x8af524) returned ECC error  
 
mtd->read(0x1fc bytes from 0x8aee04) returned ECC error 
 
mtd->read(0xac bytes from 0x8ae554) returned ECC error  
 
mtd->read(0x200 bytes from 0x8adc00) returned ECC error
..
..
...
When writing to the flash, the following errors occur:

Write of 1662 bytes at 0x01935244 failed. returned -5, retlen 0
Not marking the space at 0x01935244 as dirty because the flash driver returned retlen zero
Write of 1662 bytes at 0x01930000 failed. returned -5, retlen 0
Not marking the space at 0x01930000 as dirty because the flash driver returned retlen zero
nand_erase: start = 0x01c70000, len = 16384
nand_isbad_bbt(): bbt info for offs 0x01c70000: (block 1820) 0x00
nand_write_oob: to = 0x01c70000, len = 8
cannot write OOB for EB at 01930000, requested 8 bytes, read 0 bytes, error -5

I wrote to support about this problem and I was told the following:

[PATCH] sbc834x: put full compat string in board match check

2011-12-05 Thread Paul Gortmaker
The commit 883c2cfc8bcc0fd00c5d9f596fb8870f481b5bda:

 "fix of_flat_dt_is_compatible() to match the full compatible string"

causes silent boot death on the sbc8349 board because it was
just looking for 8349 and not 8349E -- as originally there
were non-E (no SEC/encryption) chips available.  Just add the
E to the board detection string since all boards I've seen
were manufactured with the E versions.

Signed-off-by: Paul Gortmaker 

diff --git a/arch/powerpc/platforms/83xx/sbc834x.c b/arch/powerpc/platforms/83xx/sbc834x.c
index af41d8c..f5a783a 100644
--- a/arch/powerpc/platforms/83xx/sbc834x.c
+++ b/arch/powerpc/platforms/83xx/sbc834x.c
@@ -102,11 +102,11 @@ static int __init sbc834x_probe(void)
 {
unsigned long root = of_get_flat_dt_root();
 
-   return of_flat_dt_is_compatible(root, "SBC834x");
+   return of_flat_dt_is_compatible(root, "SBC834xE");
 }
 
 define_machine(sbc834x) {
-   .name   = "SBC834x",
+   .name   = "SBC834xE",
.probe  = sbc834x_probe,
.setup_arch = sbc834x_setup_arch,
.init_IRQ   = sbc834x_init_IRQ,
-- 
1.7.7



Re: [PATCH v3 02/10] powerpc: Consolidate mpic_alloc() OF address translation

2011-12-05 Thread Moffett, Kyle D
On Dec 03, 2011, at 10:53, Kumar Gala wrote:
> On Dec 2, 2011, at 10:27 AM, Kyle Moffett wrote:
>> Instead of using the open-coded "reg" property lookup and address
>> translation in mpic_alloc(), directly call of_address_to_resource().
>> This includes various workarounds for special cases which the naive
>> of_address_translate() does not.
>> 
>> Afterwards it is possible to remove the copiously copy-pasted calls to
>> of_address_translate() from the 85xx/86xx/powermac platforms.
>> 
>> Signed-off-by: Kyle Moffett 
>> Cc: Benjamin Herrenschmidt 
>> Cc: Paul Mackerras 
>> Cc: Grant Likely 
>> Cc: Kumar Gala 
>> ---
>> arch/powerpc/platforms/85xx/corenet_ds.c  |9 +
>> arch/powerpc/platforms/85xx/ksi8560.c |9 +
>> arch/powerpc/platforms/85xx/mpc8536_ds.c  |9 +
>> arch/powerpc/platforms/85xx/mpc85xx_ads.c |9 +
>> arch/powerpc/platforms/85xx/mpc85xx_cds.c |9 +
>> arch/powerpc/platforms/85xx/mpc85xx_ds.c  |   11 +
>> arch/powerpc/platforms/85xx/mpc85xx_mds.c |9 +
>> arch/powerpc/platforms/85xx/mpc85xx_rdb.c |   11 +
>> arch/powerpc/platforms/85xx/p1010rdb.c|9 +
>> arch/powerpc/platforms/85xx/p1022_ds.c|9 +
>> arch/powerpc/platforms/85xx/p1023_rds.c   |9 +
>> arch/powerpc/platforms/85xx/sbc8548.c |9 +
>> arch/powerpc/platforms/85xx/sbc8560.c |9 +
>> arch/powerpc/platforms/85xx/socrates.c|9 +
>> arch/powerpc/platforms/85xx/stx_gp3.c |9 +
>> arch/powerpc/platforms/85xx/tqm85xx.c |9 +
>> arch/powerpc/platforms/85xx/xes_mpc85xx.c |9 +
>> arch/powerpc/platforms/86xx/pic.c |4 +-
>> arch/powerpc/platforms/powermac/pic.c |8 +---
>> arch/powerpc/sysdev/mpic.c|   61 
>> -
>> 20 files changed, 55 insertions(+), 175 deletions(-)
> 
> What about cleaning up:
> 
> arch/powerpc/platforms/chrp/setup.c:chrp_mpic = mpic_alloc(np, opaddr, 
> MPIC_PRIMARY,
> arch/powerpc/platforms/embedded6xx/holly.c: mpic = mpic_alloc(tsi_pic, 
> mpic_paddr,
> arch/powerpc/platforms/embedded6xx/linkstation.c:   mpic = 
> mpic_alloc(dnp, paddr, MPIC_PRIMARY | MPIC
> arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c:  mpic = 
> mpic_alloc(tsi_pic, mpic_paddr,
> arch/powerpc/platforms/embedded6xx/storcenter.c:mpic = 
> mpic_alloc(dnp, paddr, MPIC_PRIMARY | MPIC
> arch/powerpc/platforms/maple/setup.c:   mpic = mpic_alloc(mpic_node, 
> openpic_addr, flags,
> arch/powerpc/platforms/pasemi/setup.c:  mpic = mpic_alloc(mpic_node, 
> openpic_addr,
> arch/powerpc/platforms/pseries/setup.c: mpic = mpic_alloc(pSeries_mpic_node, 
> openpic_addr,
> 
> Seems like we should be able to remove the 'phys_addr' argument altogether.

Well, ideally the MPIC code would just be an OF platform_driver with a
bit of supplementary platform_data to deal with device-tree flaws.
Unfortunately it's quite a long way from that.

Some platforms seem to prefer to use a "platform-open-pic" property on
the root node instead of setting up the "reg" node of the open-pic
itself.

Furthermore, the ISU configuration seems to be board-specific.  pSeries
seems to have all of the ISUs configured as additional cells in the
"platform-open-pic" property, but almost all of the rest are just
hard-coded offsets from the PIC address in the board-support code.

If it was possible to fix the device-trees on the systems with hardcoded
offsets then we could put the ISU addresses into the "platform-open-pic"
property and test that in mpic_alloc().
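
For reference, the centralized lookup presumably reduces to something like
this inside mpic_alloc() (a sketch; the fallback variable is illustrative,
not the actual code):

	struct resource r;
	phys_addr_t paddr;

	/* translate the PIC's "reg" entry via the generic OF helper */
	if (of_address_to_resource(node, 0, &r) == 0)
		paddr = r.start;
	else
		paddr = fallback_paddr;	/* e.g. from "platform-open-pic" */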

Otherwise there's still going to be a fair amount of hardcoding for
specific boards.

Regardless, I think this patch series is a good first cut at cleaning
up some of the more egregious code duplication there.

Cheers,
Kyle Moffett

--
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/



Re: [PATCH v3 02/10] powerpc: Consolidate mpic_alloc() OF address translation

2011-12-05 Thread Kumar Gala

On Dec 5, 2011, at 12:41 PM, Moffett, Kyle D wrote:

> On Dec 03, 2011, at 10:53, Kumar Gala wrote:
>> On Dec 2, 2011, at 10:27 AM, Kyle Moffett wrote:
>>> Instead of using the open-coded "reg" property lookup and address
>>> translation in mpic_alloc(), directly call of_address_to_resource().
>>> This includes various workarounds for special cases which the naive
>>> of_address_translate() does not.
>>> 
>>> Afterwards it is possible to remove the copiously copy-pasted calls to
>>> of_address_translate() from the 85xx/86xx/powermac platforms.
>>> 
>>> Signed-off-by: Kyle Moffett 
>>> Cc: Benjamin Herrenschmidt 
>>> Cc: Paul Mackerras 
>>> Cc: Grant Likely 
>>> Cc: Kumar Gala 
>>> ---
>>> arch/powerpc/platforms/85xx/corenet_ds.c  |9 +
>>> arch/powerpc/platforms/85xx/ksi8560.c |9 +
>>> arch/powerpc/platforms/85xx/mpc8536_ds.c  |9 +
>>> arch/powerpc/platforms/85xx/mpc85xx_ads.c |9 +
>>> arch/powerpc/platforms/85xx/mpc85xx_cds.c |9 +
>>> arch/powerpc/platforms/85xx/mpc85xx_ds.c  |   11 +
>>> arch/powerpc/platforms/85xx/mpc85xx_mds.c |9 +
>>> arch/powerpc/platforms/85xx/mpc85xx_rdb.c |   11 +
>>> arch/powerpc/platforms/85xx/p1010rdb.c|9 +
>>> arch/powerpc/platforms/85xx/p1022_ds.c|9 +
>>> arch/powerpc/platforms/85xx/p1023_rds.c   |9 +
>>> arch/powerpc/platforms/85xx/sbc8548.c |9 +
>>> arch/powerpc/platforms/85xx/sbc8560.c |9 +
>>> arch/powerpc/platforms/85xx/socrates.c|9 +
>>> arch/powerpc/platforms/85xx/stx_gp3.c |9 +
>>> arch/powerpc/platforms/85xx/tqm85xx.c |9 +
>>> arch/powerpc/platforms/85xx/xes_mpc85xx.c |9 +
>>> arch/powerpc/platforms/86xx/pic.c |4 +-
>>> arch/powerpc/platforms/powermac/pic.c |8 +---
>>> arch/powerpc/sysdev/mpic.c|   61 
>>> -
>>> 20 files changed, 55 insertions(+), 175 deletions(-)
>> 
>> What about cleaning up:
>> 
>> arch/powerpc/platforms/chrp/setup.c:chrp_mpic = mpic_alloc(np, opaddr, 
>> MPIC_PRIMARY,
>> arch/powerpc/platforms/embedded6xx/holly.c: mpic = mpic_alloc(tsi_pic, 
>> mpic_paddr,
>> arch/powerpc/platforms/embedded6xx/linkstation.c:   mpic = 
>> mpic_alloc(dnp, paddr, MPIC_PRIMARY | MPIC
>> arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c:  mpic = 
>> mpic_alloc(tsi_pic, mpic_paddr,
>> arch/powerpc/platforms/embedded6xx/storcenter.c:mpic = 
>> mpic_alloc(dnp, paddr, MPIC_PRIMARY | MPIC
>> arch/powerpc/platforms/maple/setup.c:   mpic = mpic_alloc(mpic_node, 
>> openpic_addr, flags,
>> arch/powerpc/platforms/pasemi/setup.c:  mpic = mpic_alloc(mpic_node, 
>> openpic_addr,
>> arch/powerpc/platforms/pseries/setup.c: mpic = mpic_alloc(pSeries_mpic_node, 
>> openpic_addr,
>> 
>> Seems like we should be able to remove the 'phys_addr' argument altogether.
> 
> Well, ideally the MPIC code would just be a OF platform_driver with a
> bit of supplementary platform_data to deal with device-tree flaws.
> Unfortunately it's quite a long way from that.
> 
> Some platforms seem to prefer to use a "platform-open-pic" property on
> the root node instead of setting up the "reg" node of the open-pic
> itself.
> 
> Furthermore, the ISU configuration seems to be board-specific.  pSeries
> seems to have all of the ISUs configured as additional cells in the
> "platform-open-pic" property, but almost all of the rest are just
> hard-coded offsets from the PIC address in the board-support code.
> 
> If it was possible to fix the device-trees on the systems with hardcoded
> offsets then we could put the ISU addresses into the "platform-open-pic"
> property and test that in mpic_alloc().
> 
> Otherwise there's still going to be a fair amount of hardcoding for
> specific boards.
> 
> Regardless, I think this patch series is a good first cut and cleaning
> up some of the more egregious code duplication there.
> 
> Cheers,
> Kyle Moffett

Agreed, it's a good first-pass cleanup, but it doesn't seem like we're that
far off from removing the 'phys_addr' being passed in.

- k


Re: [PATCH 1/2] mtd/nand: fixup for fmr initialization of Freescale NAND controller

2011-12-05 Thread Scott Wood
On 12/05/2011 04:54 AM, Shengzhou Liu wrote:
> There was a bug in the fmr initialization which led to fmr always being 0x100
> in fsl_elbc_chip_init(), causing an FCM command timeout before
> fsl_elbc_chip_init_tail() was called.
> 
> Signed-off-by: Shengzhou Liu 
> ---
>  drivers/mtd/nand/fsl_elbc_nand.c |8 +++-
>  1 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c
> index eedd8ee..742bf73 100644
> --- a/drivers/mtd/nand/fsl_elbc_nand.c
> +++ b/drivers/mtd/nand/fsl_elbc_nand.c
> @@ -659,9 +659,7 @@ static int fsl_elbc_chip_init_tail(struct mtd_info *mtd)
>   if (chip->pagemask & 0xff00)
>   al++;
>  
> - /* add to ECCM mode set in fsl_elbc_init */
> - priv->fmr |= (12 << FMR_CWTO_SHIFT) |  /* Timeout > 12 ms */
> -  (al << FMR_AL_SHIFT);
> + priv->fmr |= al << FMR_AL_SHIFT;
>  
>   dev_dbg(priv->dev, "fsl_elbc_init: nand->numchips = %d\n",
>   chip->numchips);
> @@ -764,8 +762,8 @@ static int fsl_elbc_chip_init(struct fsl_elbc_mtd *priv)
>   priv->mtd.priv = chip;
>   priv->mtd.owner = THIS_MODULE;
>  
> - /* Set the ECCM according to the settings in bootloader.*/
> - priv->fmr = in_be32(&lbc->fmr) & FMR_ECCM;
> + /* Set fmr according to the settings in bootloader.*/
> + priv->fmr = in_be32(&lbc->fmr);
>  
>   /* fill in nand_chip structure */
>   /* set up function call table */

We shouldn't be relying on the bootloader to provide a sane value here
-- the bootloader may not have used/initialized NAND at all.

It's sort of OK for ECCM, since unless you're trying to match an
externally programmed flash, or the bootloader uses the flash, all we
really care about is that the value stay consistent.  The timeout, OTOH,
must not be set too low or things won't work.

We should just set a value that we believe to be high enough for all uses.
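
One way to do that, sketched (assuming the usual 4-bit CWTO field, so 15 is
the maximum, and inheriting only ECCM from the bootloader):

	/* program a conservative command timeout unconditionally and
	 * keep just the bootloader's ECCM setting */
	priv->fmr = (15 << FMR_CWTO_SHIFT) |
		    (in_be32(&lbc->fmr) & FMR_ECCM);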

-Scott



Re: [PATCH 3/3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip

2011-12-05 Thread Scott Wood
On 12/05/2011 12:47 AM, Artem Bityutskiy wrote:
> On Sun, 2011-12-04 at 12:31 +0800, shuo@freescale.com wrote:
>> +/*
>> + * Freescale FCM controller has a 2K size limitation of buffer
>> + * RAM, so elbc_fcm_ctrl->buffer have to be used if writesize
>> + * of chip is greater than 2048.
>> + * We malloc a large enough buffer (maximum page size is 16K).
>> + */
>> +elbc_fcm_ctrl->buffer = kmalloc(1024 * 16 + 1024, GFP_KERNEL);
>> +if (!elbc_fcm_ctrl->buffer) {
>> +dev_err(dev, "failed to allocate memory\n");
>> +mutex_unlock(&fsl_elbc_nand_mutex);
>> +ret = -ENOMEM;
>> +goto err;
>> +}
> 
> Sorry for returning to this again and again - I do not have time to dig in
> and suggest the right solution on the one hand, and you do not provide me a
> good answer on the other hand (or have I forgotten?).
> 
> 16KiB pages do not even exist I believe.

Googling turns up some hints of it, but nothing concrete such as a
datasheet.  We can assume 8K max for now and adjust it later, as the
need becomes clear.

> And you kmalloc 33KiB of RAM

17KiB, or 9KiB if we forget about 16K-page NAND.

> although in most cases you need only 5KiB. I think this is wrong -
> what is the very strong reason you have for wasting RAM?
> 
> Why you cannot allocate exactly the required amount of RAM after
> 'nand_scan_ident()' finishes and you know the page size?

Because this is a controller resource, shared by multiple NAND chips
that may have different page sizes (even if not, it adds another point
of synchronization required between the initialization of different
chips).  I don't think it's worth the gymnastics to save a few KiB.

-Scott



Powerbook G4 and sound

2011-12-05 Thread Петр метель
Hello,

I've got a problem with sound on my PowerBook G4 Titanium
(PowerBook3,5). The sound always crackles and skips when I move my
touchpad, regardless of the source of the sound. It happens in all my
programs, including vlc, mplayer, e-uae and ioquake3 - they show log
messages like "underrun occurred" and "broken pipe".
If I remember correctly, this bug appeared in kernel 2.6.32-5 (from
Debian Squeeze) and later. On kernel 2.6.38 from Ubuntu sound works
_almost_ normally (crackles occur rarely).

I also used the mpd server, but it breaks after the sound crackles
(ncmpc shows "Timeout") and I can only restart the mpd server
(/etc/init.d/mpd restart) to get mpd working again (for some time).

The log from alsa-test.sh is in the attachment.

Sorry for my English :)

Best regards,
Петр метель


alsa-info.txt.zUlUP5eQPO
Description: Binary data

Re: Problem with eLBC?

2011-12-05 Thread Scott Wood
On 12/05/2011 08:02 AM, Alexander Lyasin wrote:
> In reply to your Service Request SR 1-807899446:
> 
> Yes, due to several design peculiarities in the local bus NAND controller,
> simultaneous accesses to the NAND flash and to other local bus memory
> controllers may cause NAND flash controller access failures. Our Linux
> team suggested using a "software lock" method to avoid this problem -
> please do not use other local bus controllers while the NAND flash is accessed.

What kernel version are you using?  The latest mainline kernel should
not have this issue.

Make sure you have these patches:

commit d08e44570ed611c527a1062eb4f8c6ac61832e6e
Author: Shengzhou Liu 
Date:   Thu May 19 18:48:01 2011 +0800

powerpc/fsl_lbc: Add workaround for ELBC-A001 erratum

Simultaneous FCM and GPCM or UPM operation may erroneously trigger
bus monitor timeout.

Set the local bus monitor timeout value to the maximum by setting
LBCR[BMT] = 0 and LBCR[BMTPS] = 0xF.

Signed-off-by: Shengzhou Liu 
Signed-off-by: Kumar Gala 

and

commit 476459a6cf46d20ec73d9b211f3894ced5f9871e
Author: Scott Wood 
Date:   Fri Nov 13 14:13:01 2009 -0600

mtd: eLBC NAND: use recommended command sequences

Currently, the program and erase sequences do not wait for completion,
instead relying on a subsequent waitfunc() callback.  However, this
causes the chipselect to be deasserted while the NAND chip is still
asserting the busy pin, which can corrupt activity on other chipselects.

This patch switches to using the sequences recommended by the manual,
in which a wait is performed within the initial command sequence.  We
can now re-use the status byte from the initial command sequence,
rather than having to do another status read in the waitfunc.

Since we're already touching the command sequences, it also cleans up
some cruft in SEQIN that isn't needed since we cannot program partial
pages outside of OOB.

Signed-off-by: Scott Wood 
Reported-by: Suchit Lepcha 
Signed-off-by: Artem Bityutskiy 
Signed-off-by: David Woodhouse 

-Scott



[PATCH] rapidio/tsi721: switch to dma_zalloc_coherent

2011-12-05 Thread Alexandre Bounine
Replace dma_alloc_coherent()+memset() pairs with the new dma_zalloc_coherent()
added by Andrew Morton for kernel version 3.2.

Signed-off-by: Alexandre Bounine 
---
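For reference, dma_zalloc_coherent() is roughly the following wrapper
(paraphrased from include/linux/dma-mapping.h):

static inline void *dma_zalloc_coherent(struct device *dev, size_t size,
					dma_addr_t *dma_handle, gfp_t flag)
{
	void *ret = dma_alloc_coherent(dev, size, dma_handle, flag);

	if (ret)
		memset(ret, 0, size);	/* zero the whole allocation */
	return ret;
}

Note that it always zeroes the full allocation, unlike a couple of the
memset() calls removed below, which used smaller sizes than the
corresponding allocations.
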
 drivers/rapidio/devices/tsi721.c |   17 -
 1 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/rapidio/devices/tsi721.c b/drivers/rapidio/devices/tsi721.c
index 5225930..514c28c 100644
--- a/drivers/rapidio/devices/tsi721.c
+++ b/drivers/rapidio/devices/tsi721.c
@@ -851,14 +851,12 @@ static int tsi721_doorbell_init(struct tsi721_device *priv)
INIT_WORK(&priv->idb_work, tsi721_db_dpc);
 
/* Allocate buffer for inbound doorbells queue */
-   priv->idb_base = dma_alloc_coherent(&priv->pdev->dev,
+   priv->idb_base = dma_zalloc_coherent(&priv->pdev->dev,
IDB_QSIZE * TSI721_IDB_ENTRY_SIZE,
&priv->idb_dma, GFP_KERNEL);
if (!priv->idb_base)
return -ENOMEM;
 
-   memset(priv->idb_base, 0, IDB_QSIZE * TSI721_IDB_ENTRY_SIZE);
-
dev_dbg(&priv->pdev->dev, "Allocated IDB buffer @ %p (phys = %llx)\n",
priv->idb_base, (unsigned long long)priv->idb_dma);
 
@@ -904,7 +902,7 @@ static int tsi721_bdma_ch_init(struct tsi721_device *priv, int chnum)
 */
 
/* Allocate space for DMA descriptors */
-   bd_ptr = dma_alloc_coherent(&priv->pdev->dev,
+   bd_ptr = dma_zalloc_coherent(&priv->pdev->dev,
bd_num * sizeof(struct tsi721_dma_desc),
&bd_phys, GFP_KERNEL);
if (!bd_ptr)
@@ -913,8 +911,6 @@ static int tsi721_bdma_ch_init(struct tsi721_device *priv, int chnum)
priv->bdma[chnum].bd_phys = bd_phys;
priv->bdma[chnum].bd_base = bd_ptr;
 
-   memset(bd_ptr, 0, bd_num * sizeof(struct tsi721_dma_desc));
-
dev_dbg(&priv->pdev->dev, "DMA descriptors @ %p (phys = %llx)\n",
bd_ptr, (unsigned long long)bd_phys);
 
@@ -922,7 +918,7 @@ static int tsi721_bdma_ch_init(struct tsi721_device *priv, int chnum)
sts_size = (bd_num >= TSI721_DMA_MINSTSSZ) ?
bd_num : TSI721_DMA_MINSTSSZ;
sts_size = roundup_pow_of_two(sts_size);
-   sts_ptr = dma_alloc_coherent(&priv->pdev->dev,
+   sts_ptr = dma_zalloc_coherent(&priv->pdev->dev,
 sts_size * sizeof(struct tsi721_dma_sts),
 &sts_phys, GFP_KERNEL);
if (!sts_ptr) {
@@ -938,8 +934,6 @@ static int tsi721_bdma_ch_init(struct tsi721_device *priv, int chnum)
priv->bdma[chnum].sts_base = sts_ptr;
priv->bdma[chnum].sts_size = sts_size;
 
-   memset(sts_ptr, 0, sts_size);
-
dev_dbg(&priv->pdev->dev,
"desc status FIFO @ %p (phys = %llx) size=0x%x\n",
sts_ptr, (unsigned long long)sts_phys, sts_size);
@@ -1400,7 +1394,7 @@ static int tsi721_open_outb_mbox(struct rio_mport *mport, void *dev_id,
 
/* Outbound message descriptor status FIFO allocation */
priv->omsg_ring[mbox].sts_size = roundup_pow_of_two(entries + 1);
-   priv->omsg_ring[mbox].sts_base = dma_alloc_coherent(&priv->pdev->dev,
+   priv->omsg_ring[mbox].sts_base = dma_zalloc_coherent(&priv->pdev->dev,
priv->omsg_ring[mbox].sts_size *
sizeof(struct tsi721_dma_sts),
&priv->omsg_ring[mbox].sts_phys, GFP_KERNEL);
@@ -1412,9 +1406,6 @@ static int tsi721_open_outb_mbox(struct rio_mport *mport, void *dev_id,
goto out_desc;
}
 
-   memset(priv->omsg_ring[mbox].sts_base, 0,
-   entries * sizeof(struct tsi721_dma_sts));
-
/*
 * Configure Outbound Messaging Engine
 */
-- 
1.7.6



Re: [PATCH 2/2] powerpc/85xx: add a 32-bit P1022DS device tree

2011-12-05 Thread Timur Tabi
Kumar Gala wrote:
> look at how mpc8572ds handles 36b.dts; we put common definitions in a shared
> file.

Ok, I've made those changes, but when I boot the kernel, I'm seeing this.  Can 
you give me a clue as to what's wrong?

PCI: Probing PCI hardware
pci 0000:00:00.0: [1957:0110] type 1 class 0x000b20
pci 0000:00:00.0: ignoring class b20 (doesn't match header type 01)
pci 0000:00:00.0: supports D1 D2
pci 0000:00:00.0: PME# supported from D0 D1 D2 D3hot D3cold
pci 0000:00:00.0: PME# disabled
pci 0000:00:00.0: PCI bridge to [bus 01-ff]
pci 0001:02:00.0: [1957:0110] type 1 class 0x000b20
pci 0001:02:00.0: ignoring class b20 (doesn't match header type 01)
pci 0001:02:00.0: supports D1 D2
pci 0001:02:00.0: PME# supported from D0 D1 D2 D3hot D3cold
pci 0001:02:00.0: PME# disabled
pci 0001:02:00.0: PCI bridge to [bus 03-ff]
pci 0002:04:00.0: [1957:0110] type 1 class 0x000b20
pci 0002:04:00.0: ignoring class b20 (doesn't match header type 01)
pci 0002:04:00.0: supports D1 D2
pci 0002:04:00.0: PME# supported from D0 D1 D2 D3hot D3cold
pci 0002:04:00.0: PME# disabled
pci 0002:05:00.0: [8086:10d3] type 0 class 0x000200
pci 0002:05:00.0: reg 10: [mem 0x80000000-0x8001ffff]
pci 0002:05:00.0: reg 14: [mem 0x80080000-0x800fffff]
pci 0002:05:00.0: reg 18: [io  0x1000-0x101f]
pci 0002:05:00.0: reg 1c: [mem 0x80100000-0x80103fff]
pci 0002:05:00.0: reg 30: [mem 0x00000000-0x0003ffff pref]
pci 0002:05:00.0: PME# supported from D0 D3hot D3cold
pci 0002:05:00.0: PME# disabled
pci 0002:04:00.0: PCI bridge to [bus 05-ff]
pci 0002:04:00.0:   bridge window [mem 0x80000000-0x801fffff]
PCI: Cannot allocate resource region 0 of device 0002:05:00.0, will remap
PCI: Cannot allocate resource region 1 of device 0002:05:00.0, will remap
PCI: Cannot allocate resource region 3 of device 0002:05:00.0, will remap
PCI 0000:00 Cannot reserve Legacy IO [io  0xffbed000-0xffbedfff]
PCI 0001:02 Cannot reserve Legacy IO [io  0xffbdb000-0xffbdbfff]
PCI 0002:04 Cannot reserve Legacy IO [io  0xffbc9000-0xffbc9fff]
PCI: max bus depth: 1 pci_try_num: 2
pci 0000:00:00.0: PCI bridge to [bus 01-01]
pci 0000:00:00.0:   bridge window [io  0xffbed000-0xffbfcfff]
pci 0000:00:00.0:   bridge window [mem 0xa0000000-0xbfffffff]
pci 0001:02:00.0: PCI bridge to [bus 03-03]
pci 0001:02:00.0:   bridge window [io  0xffbdb000-0xffbeafff]
pci 0001:02:00.0:   bridge window [mem 0xc0000000-0xdfffffff]
pci 0002:04:00.0: BAR 9: can't assign mem pref (size 0x100000)
pci 0002:05:00.0: BAR 1: assigned [mem 0x80000000-0x8007ffff]
pci 0002:05:00.0: BAR 1: set to [mem 0x80000000-0x8007ffff] (PCI address [0xe0000000-0xe007ffff])
pci 0002:05:00.0: BAR 6: assigned [mem 0x80080000-0x800bffff pref]
pci 0002:05:00.0: BAR 0: assigned [mem 0x800c0000-0x800dffff]
pci 0002:05:00.0: BAR 0: set to [mem 0x800c0000-0x800dffff] (PCI address [0xe00c0000-0xe00dffff])
pci 0002:05:00.0: BAR 3: assigned [mem 0x800e0000-0x800e3fff]
pci 0002:05:00.0: BAR 3: set to [mem 0x800e0000-0x800e3fff] (PCI address [0xe00e0000-0xe00e3fff])
pci 0002:04:00.0: PCI bridge to [bus 05-05]
pci 0002:04:00.0:   bridge window [io  0xffbc9000-0xffbd8fff]
pci 0002:04:00.0:   bridge window [mem 0x80000000-0x9fffffff]
pci 0000:00:00.0: enabling device (0106 -> 0107)
pci 0001:02:00.0: enabling device (0106 -> 0107)
pci 0002:04:00.0: enabling device (0106 -> 0107)
pci_bus 0000:00: resource 0 [io  0xffbed000-0xffbfcfff]

next BUG: using smp_processor_id() in preemptible

2011-12-05 Thread Hugh Dickins
3.2.0-rc3-next-20111202 with CONFIG_DEBUG_PREEMPT=y gives me lots of

Dec  4 20:03:19 thorn kernel: BUG: using smp_processor_id() in preemptible [00000000] code: startpar/1365
Dec  4 20:03:19 thorn kernel: caller is .arch_local_irq_restore+0x44/0x90
Dec  4 20:03:19 thorn kernel: Call Trace:
Dec  4 20:03:19 thorn kernel: [c0000001b45a7c60] [c000000000011fe8] .show_stack+0x6c/0x16c (unreliable)
Dec  4 20:03:19 thorn kernel: [c0000001b45a7d10] [c00000000024318c] .debug_smp_processor_id+0xe4/0x11c
Dec  4 20:03:19 thorn kernel: [c0000001b45a7da0] [c00000000000e2e8] .arch_local_irq_restore+0x44/0x90
Dec  4 20:03:19 thorn kernel: [c0000001b45a7e30] [c000000000005870] .do_hash_page+0x70/0x74
Dec  4 20:03:21 thorn kernel: debug_smp_processor_id: 21950 callbacks suppressed

from the u64 *next_tb = &__get_cpu_var(decrementers_next_tb)
in decrementer_check_overflow(): I've no idea whether it's safe
just to use get_cpu_var then put_cpu_var there instead,
but no hurry, I can survive with DEBUG_PREEMPT off.
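
For illustration, the change being suggested might look like this (an
untested sketch of decrementer_check_overflow()):

static void decrementer_check_overflow(void)
{
	u64 now = get_tb_or_rtc();
	u64 *next_tb = &get_cpu_var(decrementers_next_tb);

	/* get_cpu_var() disables preemption around the per-cpu access,
	 * so debug_smp_processor_id() no longer fires */
	if (now >= *next_tb)
		set_dec(1);
	put_cpu_var(decrementers_next_tb);
}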

Hugh


"KVM: PPC: booke: Improve timer register emulation" breaks Book3s HV

2011-12-05 Thread Paul Mackerras
I'm not sure why yet, but commit 8a97c432 ("KVM: PPC: booke: Improve
timer register emulation") in Alex's kvm-ppc-next branch is breaking
Book3S HV KVM on POWER7.  Guest cpus fail to spin up, and even with
just one cpu, the guest stalls every so often.  If I stop the guest
and inspect the state with qemu, PC is at 0x900.  Reverting 8a97c432
makes it work properly again.

Paul.


[PATCH] powerpc: Provide a way for KVM to indicate that NV GPR values are lost

2011-12-05 Thread Paul Mackerras
This fixes a problem where a CPU thread coming out of nap mode can
think it has valid values in the nonvolatile GPRs (r14 - r31) as saved
away in power7_idle, but in fact the values have been trashed because
the thread was used for KVM in the mean time.  The result is that the
thread crashes because code that called power7_idle (e.g.,
pnv_smp_cpu_kill_self()) goes to use values in registers that have
been trashed.

The bit field in SRR1 that tells whether state was lost only reflects
the most recent nap, which may not have been the nap instruction in
power7_idle.  So we need an extra PACA field to indicate that state
has been lost even if SRR1 indicates that the most recent nap didn't
lose state.  We clear this field when saving the state in power7_idle,
we set it to a non-zero value when we use the thread for KVM, and we
test it in power7_wakeup_noloss.

Signed-off-by: Paul Mackerras 
---
I assume this should go via Ben's tree, since it touches more powerpc
code than PPC KVM code.

 arch/powerpc/include/asm/paca.h |1 +
 arch/powerpc/kernel/asm-offsets.c   |1 +
 arch/powerpc/kernel/idle_power7.S   |4 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |3 +++
 4 files changed, 9 insertions(+)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 17722c7..269c05a 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -135,6 +135,7 @@ struct paca_struct {
u8 hard_enabled;/* set if irqs are enabled in MSR */
u8 io_sync; /* writel() needs spin_unlock sync */
u8 irq_work_pending;/* IRQ_WORK interrupt while soft-disable */
+   u8 nap_state_lost;  /* NV GPR values lost in power7_idle */
 
 #ifdef CONFIG_PPC_POWERNV
/* Pointer to OPAL machine check event structure set by the
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index ec24b36..8e0db0b 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -208,6 +208,7 @@ int main(void)
DEFINE(PACA_USER_TIME, offsetof(struct paca_struct, user_time));
DEFINE(PACA_SYSTEM_TIME, offsetof(struct paca_struct, system_time));
DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save));
+   DEFINE(PACA_NAPSTATELOST, offsetof(struct paca_struct, nap_state_lost));
 #endif /* CONFIG_PPC64 */
 
/* RTAS */
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 3a70845..fcdff19 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -54,6 +54,7 @@ _GLOBAL(power7_idle)
li  r0,0
stb r0,PACASOFTIRQEN(r13)   /* we'll hard-enable shortly */
stb r0,PACAHARDIRQEN(r13)
+   stb r0,PACA_NAPSTATELOST(r13)
 
/* Continue saving state */
SAVE_GPR(2, r1)
@@ -86,6 +87,9 @@ _GLOBAL(power7_wakeup_loss)
rfid
 
 _GLOBAL(power7_wakeup_noloss)
+   lbz r0,PACA_NAPSTATELOST(r13)
+   cmpwi   r0,0
+   bne .power7_wakeup_loss
ld  r1,PACAR1(r13)
ld  r4,_MSR(r1)
ld  r5,_NIP(r1)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 7b8dbf6..b70bf22 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -112,6 +112,9 @@ kvm_start_guest:
stbcix  r0, r5, r6  /* clear it */
stwcix  r8, r5, r7  /* EOI it */
 
+   /* NV GPR values from power7_idle() will no longer be valid */
+   stb r0, PACA_NAPSTATELOST(r13)
+
 .global kvmppc_hv_entry
 kvmppc_hv_entry:
 


[PATCH] powerpc/powernv: Fix problems in onlining CPUs

2011-12-05 Thread Paul Mackerras
At present, on the powernv platform, if you off-line a CPU that was
online, and then try to on-line it again, the kernel generates a
warning message "OPAL Error -1 starting CPU n".  Furthermore, if the
CPU is a secondary thread that was used by KVM while it was off-line,
the CPU fails to come online.

The first problem is fixed by only calling OPAL to start the CPU the
first time it is on-lined, as indicated by the cpu_start field of its
PACA being zero.  The second problem is fixed by restoring the
cpu_start field to 1 instead of 0 when using the CPU within KVM.

Signed-off-by: Paul Mackerras 
---
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index e37f8f4..ca9b733 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -65,7 +65,7 @@ BEGIN_FTR_SECTION
lbz r0,PACAPROCSTART(r13)
cmpwi   r0,0x80
bne 1f
-   li  r0,0
+   li  r0,1
stb r0,PACAPROCSTART(r13)
b   kvm_start_guest
 1:
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index e877366..17210c5 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -75,7 +75,7 @@ int __devinit pnv_smp_kick_cpu(int nr)
/* On OPAL v2 the CPU are still spinning inside OPAL itself,
 * get them back now
 */
-   if (firmware_has_feature(FW_FEATURE_OPALv2)) {
+   if (!paca[nr].cpu_start && firmware_has_feature(FW_FEATURE_OPALv2)) {
pr_devel("OPAL: Starting CPU %d (HW 0x%x)...\n", nr, pcpu);
rc = opal_start_cpu(pcpu, start_here);
if (rc != OPAL_SUCCESS)


[PATCH 07/13] KVM: PPC: Allow use of small pages to back Book3S HV guests

2011-12-05 Thread Paul Mackerras
This relaxes the requirement that the guest memory be provided as
16MB huge pages, allowing it to be provided as normal memory, i.e.
in pages of PAGE_SIZE bytes (4k or 64k).  To allow this, we index
the kvm->arch.slot_phys[] arrays with a small page index, even if
huge pages are being used, and use the low-order 5 bits of each
entry to store the order of the enclosing page with respect to
normal pages, i.e. log_2(enclosing_page_size / PAGE_SIZE).

Signed-off-by: Paul Mackerras 
---
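For illustration, the encoding described above amounts to roughly this
(names follow the patch; the indexing is schematic):

	/* low 5 bits of a kvm->arch.slot_phys[] entry hold
	 * log2(enclosing_page_size / PAGE_SIZE); see
	 * KVMPPC_PAGE_ORDER_MASK below */
	unsigned long entry  = slot_phys[gfn - memslot->base_gfn];
	unsigned long order  = entry & KVMPPC_PAGE_ORDER_MASK;
	unsigned long pgsize = PAGE_SIZE << order;	/* enclosing page */
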
 arch/powerpc/include/asm/kvm_book3s_64.h |8 ++
 arch/powerpc/include/asm/kvm_host.h  |3 +-
 arch/powerpc/include/asm/kvm_ppc.h   |2 +-
 arch/powerpc/include/asm/reg.h   |1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  122 --
 arch/powerpc/kvm/book3s_hv.c |   57 --
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |6 +-
 7 files changed, 130 insertions(+), 69 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index ab6772e..d55e6b4 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -107,4 +107,12 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
return 0;   /* error */
 }
 
+static inline bool slot_is_aligned(struct kvm_memory_slot *memslot,
+  unsigned long pagesize)
+{
+   unsigned long mask = (pagesize >> PAGE_SHIFT) - 1;
+
+   return !(memslot->base_gfn & mask) && !(memslot->npages & mask);
+}
+
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2a52bdb..ba1da85 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -176,14 +176,13 @@ struct revmap_entry {
 };
 
 /* Low-order bits in kvm->arch.slot_phys[][] */
+#define KVMPPC_PAGE_ORDER_MASK 0x1f
 #define KVMPPC_GOT_PAGE0x80
 
 struct kvm_arch {
 #ifdef CONFIG_KVM_BOOK3S_64_HV
unsigned long hpt_virt;
struct revmap_entry *revmap;
-   unsigned long ram_psize;
-   unsigned long ram_porder;
unsigned int lpid;
unsigned int host_lpid;
unsigned long host_lpcr;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 111e1b4..a61b5b5 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -122,7 +122,7 @@ extern void kvmppc_free_hpt(struct kvm *kvm);
 extern long kvmppc_prepare_vrma(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
 extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
-   struct kvm_memory_slot *memslot);
+   struct kvm_memory_slot *memslot, unsigned long porder);
 extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 559da19..4599d12 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -237,6 +237,7 @@
 #define   LPCR_ISL (1ul << (63-2))
 #define   LPCR_VC_SH   (63-2)
 #define   LPCR_DPFD_SH (63-11)
+#define   LPCR_VRMASD  (0x1ful << (63-16))
 #define   LPCR_VRMA_L  (1ul << (63-12))
 #define   LPCR_VRMA_LP0(1ul << (63-15))
 #define   LPCR_VRMA_LP1(1ul << (63-16))
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 87016cc..cc18f3d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -34,8 +34,6 @@
 #include 
 #include 
 
-/* Pages in the VRMA are 16MB pages */
-#define VRMA_PAGE_ORDER24
 #define VRMA_VSID  0x1ffUL /* 1TB VSID reserved for VRMA */
 
 /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
@@ -95,17 +93,31 @@ void kvmppc_free_hpt(struct kvm *kvm)
free_pages(kvm->arch.hpt_virt, HPT_ORDER - PAGE_SHIFT);
 }
 
-void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot)
+/* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
+static inline unsigned long hpte0_pgsize_encoding(unsigned long pgsize)
+{
+   return (pgsize > 0x1000) ? HPTE_V_LARGE : 0;
+}
+
+/* Bits in second HPTE dword for pagesize 4k, 64k or 16M */
+static inline unsigned long hpte1_pgsize_encoding(unsigned long pgsize)
+{
+   return (pgsize == 0x1) ? 0x1000 : 0;
+}
+
+void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
+unsigned long porder)
 {
-   struct kvm *kvm = vcpu->kvm;
unsigned long i;
unsigned long npages;
unsigned long hp_v, hp_r;
unsigned long addr, hash;
-   unsigned long porder = kvm->arch.ram_porder;
+   unsigned long psize;
+   unsigned long hp0, hp1;
   

[PATCH 12/13] KVM: PPC: Implement MMU notifiers for Book3S HV guests

2011-12-05 Thread Paul Mackerras
This adds the infrastructure to enable us to page out pages underneath
a Book3S HV guest, on processors that support virtualized partition
memory, that is, POWER7.  Instead of pinning all the guest's pages,
we now look in the host userspace Linux page tables to find the
mapping for a given guest page.  Then, if the userspace Linux PTE
gets invalidated, kvm_unmap_hva() gets called for that address, and
we replace all the guest HPTEs that refer to that page with absent
HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit
set, which will cause an HDSI when the guest tries to access them.
Finally, the page fault handler is extended to reinstantiate the
guest HPTE when the guest tries to access a page which has been paged
out.

Since we can't intercept the guest DSI and ISI interrupts on PPC970,
we still have to pin all the guest pages on PPC970.  We have a new flag,
kvm->arch.using_mmu_notifiers, that indicates whether we can page
guest pages out.  If it is not set, the MMU notifier callbacks do
nothing and everything operates as before.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s.h|4 +
 arch/powerpc/include/asm/kvm_book3s_64.h |   31 
 arch/powerpc/include/asm/kvm_host.h  |   16 ++
 arch/powerpc/include/asm/reg.h   |3 +
 arch/powerpc/kvm/Kconfig |1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  268 --
 arch/powerpc/kvm/book3s_hv.c |   25 ++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  140 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   49 ++
 arch/powerpc/kvm/powerpc.c   |3 +
 arch/powerpc/mm/hugetlbpage.c|2 +
 11 files changed, 483 insertions(+), 59 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 5ac53f9..72688d8 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -145,6 +145,10 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat,
 extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr);
extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu);
 extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
+   unsigned long *rmap, long pte_index, int realmode);
+extern void kvmppc_invalidate_hpte(struct kvm *kvm, unsigned long *hptep,
+   unsigned long pte_index);
 extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr,
unsigned long *nb_ret);
 extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr);
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 9a59b6d..75a1b42 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -130,6 +130,37 @@ static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type)
return (wimg & (HPTE_R_W | HPTE_R_I)) == io_type;
 }
 
+/*
+ * Lock and read a linux PTE.  If it's present and writable, atomically
+ * set dirty and referenced bits and return the PTE, otherwise return 0.
+ */
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *p)
+{
+   pte_t pte, tmp;
+
+   /* wait until _PAGE_BUSY is clear then set it atomically */
+   __asm__ __volatile__ (
+   "1: ldarx   %0,0,%3\n"
+   "   andi.   %1,%0,%4\n"
+   "   bne-1b\n"
+   "   ori %1,%0,%4\n"
+   "   stdcx.  %1,0,%3\n"
+   "   bne-1b"
+   : "=&r" (pte), "=&r" (tmp), "=m" (*p)
+   : "r" (p), "i" (_PAGE_BUSY)
+   : "cc");
+
+   if (pte_present(pte)) {
+   pte = pte_mkyoung(pte);
+   if (pte_write(pte))
+   pte = pte_mkdirty(pte);
+   }
+
+   *p = pte;   /* clears _PAGE_BUSY */
+
+   return pte;
+}
+
 /* Return HPTE cache control bits corresponding to Linux pte bits */
 static inline unsigned long hpte_cache_bits(unsigned long pte_val)
 {
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index c9c92f0..eb20ddc 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define KVM_MAX_VCPUS  NR_CPUS
 #define KVM_MAX_VCORES NR_CPUS
@@ -43,6 +44,19 @@
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 #endif
 
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+#include 
+
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+
+struct kvm;
+extern int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+extern int kvm_age_hva(struct kvm *kvm, unsigned long hva);
+extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
+extern void kvm_set_spte_hva(struct kv

[PATCH 01/13] KVM: PPC: Move kvm_vcpu_ioctl_[gs]et_one_reg down to platform-specific code

2011-12-05 Thread Paul Mackerras
This moves the get/set_one_reg implementation down from powerpc.c into
booke.c, book3s_pr.c and book3s_hv.c.  This avoids #ifdefs in C code,
but more importantly, it fixes a bug on Book3s HV where we were
accessing beyond the end of the kvm_vcpu struct (via the to_book3s()
macro) and corrupting memory, causing random crashes and file corruption.

On Book3s HV we only accept setting the HIOR to zero, since the guest
runs in supervisor mode and its vectors are never offset from zero.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_ppc.h |3 ++
 arch/powerpc/kvm/book3s_hv.c   |   33 ++
 arch/powerpc/kvm/book3s_pr.c   |   33 ++
 arch/powerpc/kvm/booke.c   |   10 +
 arch/powerpc/kvm/powerpc.c |   39 
 5 files changed, 79 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 5192c2e..fc2d696 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -176,6 +176,9 @@ int kvmppc_core_set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
 void kvmppc_get_sregs_ivor(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
 int kvmppc_set_sregs_ivor(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
 
+int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg);
+int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg);
+
 void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 pid);
 
 #ifdef CONFIG_KVM_BOOK3S_64_HV
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ecc77fa..5efdd5b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -390,6 +390,39 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
return 0;
 }
 
+int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+   int r = -EINVAL;
+
+   switch (reg->id) {
+   case KVM_ONE_REG_PPC_HIOR:
+   reg->u.reg64 = 0;
+   r = 0;
+   break;
+   default:
+   break;
+   }
+
+   return r;
+}
+
+int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+   int r = -EINVAL;
+
+   switch (reg->id) {
+   case KVM_ONE_REG_PPC_HIOR:
+   /* Only allow this to be set to zero */
+   if (reg->u.reg64 == 0)
+   r = 0;
+   break;
+   default:
+   break;
+   }
+
+   return r;
+}
+
 int kvmppc_core_check_processor_compat(void)
 {
if (cpu_has_feature(CPU_FTR_HVMODE))
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index cbb7051..1abe35c 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -837,6 +837,39 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
return 0;
 }
 
+int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+   int r = -EINVAL;
+
+   switch (reg->id) {
+   case KVM_ONE_REG_PPC_HIOR:
+   reg->u.reg64 = to_book3s(vcpu)->hior;
+   r = 0;
+   break;
+   default:
+   break;
+   }
+
+   return r;
+}
+
+int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+   int r = -EINVAL;
+
+   switch (reg->id) {
+   case KVM_ONE_REG_PPC_HIOR:
+   to_book3s(vcpu)->hior = reg->u.reg64;
+   to_book3s(vcpu)->hior_explicit = true;
+   r = 0;
+   break;
+   default:
+   break;
+   }
+
+   return r;
+}
+
 int kvmppc_core_check_processor_compat(void)
 {
return 0;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 9e41f45..ee9e1ee 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -887,6 +887,16 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
return kvmppc_core_set_sregs(vcpu, sregs);
 }
 
+int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+   return -EINVAL;
+}
+
+int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+   return -EINVAL;
+}
+
 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 {
return -ENOTSUPP;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 34515e8..1239c6f 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -620,45 +620,6 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
return r;
 }
 
-static int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu,
- struct kvm_one_reg *reg)
-{
-   int r = -EINVAL;
-
-   switch (reg->id) {
-#ifdef CONFIG_PPC_BOOK3S
-   case KVM_ONE_REG_PPC_HIOR:
-   reg->u.reg64 = to_book3s(vcpu)->hior;
-   r = 0;
-   

[PATCH 13/13] KVM: PPC: Allow for read-only pages backing a Book3S HV guest

2011-12-05 Thread Paul Mackerras
With this, if a guest does an H_ENTER with a read/write HPTE on a page
which is currently read-only, we make the actual HPTE inserted be a
read-only version of the HPTE.  We now intercept protection faults as
well as HPTE not found faults, and for a protection fault we work out
whether it should be reflected to the guest (e.g. because the guest HPTE
didn't allow write access to usermode) or handled by switching to
kernel context and calling kvmppc_book3s_hv_page_fault, which will then
request write access to the page and update the actual HPTE.
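
In outline, the H_ENTER side of this is (a sketch built on the helpers
added below; guest_ptel and write_ok are illustrative names):

	/* Sketch: downgrade a read/write HPTE when the page is read-only */
	if (!write_ok && hpte_is_writable(guest_ptel))
		guest_ptel = hpte_make_readonly(guest_ptel);
	/* insert guest_ptel; a guest store then takes a protection fault,
	 * which the page-fault path can upgrade to read/write */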

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   20 -
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |   33 +++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   32 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |4 +-
 4 files changed, 72 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 75a1b42..37755d0 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -115,6 +115,22 @@ static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
 }
 
+static inline int hpte_is_writable(unsigned long ptel)
+{
+   unsigned long pp = ptel & (HPTE_R_PP0 | HPTE_R_PP);
+
+   return pp != PP_RXRX && pp != PP_RXXX;
+}
+
+static inline unsigned long hpte_make_readonly(unsigned long ptel)
+{
+   if ((ptel & HPTE_R_PP0) || (ptel & HPTE_R_PP) == PP_RWXX)
+   ptel = (ptel & ~HPTE_R_PP) | PP_RXXX;
+   else
+   ptel |= PP_RXRX;
+   return ptel;
+}
+
 static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type)
 {
unsigned int wimg = ptel & HPTE_R_WIMG;
@@ -134,7 +150,7 @@ static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type)
  * Lock and read a linux PTE.  If it's present and writable, atomically
  * set dirty and referenced bits and return the PTE, otherwise return 0.
  */
-static inline pte_t kvmppc_read_update_linux_pte(pte_t *p)
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
 {
pte_t pte, tmp;
 
@@ -152,7 +168,7 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *p)
 
if (pte_present(pte)) {
pte = pte_mkyoung(pte);
-   if (pte_write(pte))
+   if (writing && pte_write(pte))
pte = pte_mkdirty(pte);
}
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 6919d99..b1b31c7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -502,6 +502,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
struct page *page, *pages[1];
long index, ret, npages;
unsigned long is_io;
+   unsigned int writing, write_ok;
struct vm_area_struct *vma;
 
/*
@@ -552,8 +553,11 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
pfn = 0;
page = NULL;
pte_size = PAGE_SIZE;
+   writing = (dsisr & DSISR_ISSTORE) != 0;
+   /* If writing != 0, then the HPTE must allow writing, if we get here */
+   write_ok = writing;
hva = gfn_to_hva_memslot(memslot, gfn);
-   npages = get_user_pages_fast(hva, 1, 1, pages);
+   npages = get_user_pages_fast(hva, 1, writing, pages);
if (npages < 1) {
/* Check if it's an I/O mapping */
down_read(¤t->mm->mmap_sem);
@@ -564,6 +568,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
((hva - vma->vm_start) >> PAGE_SHIFT);
pte_size = psize;
is_io = hpte_cache_bits(pgprot_val(vma->vm_page_prot));
+   write_ok = vma->vm_flags & VM_WRITE;
}
up_read(¤t->mm->mmap_sem);
if (!pfn)
@@ -574,6 +579,18 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
page = compound_head(page);
pte_size <<= compound_order(page);
}
+   /* if the guest wants write access, see if that is OK */
+   if (!writing && hpte_is_writable(hpte[2])) {
+   pte_t *ptep, pte;
+
+   ptep = find_linux_pte_or_hugepte(current->mm->pgd,
+hva, NULL);
+   if (ptep && pte_present(*ptep)) {
+   pte = kvmppc_read_update_linux_pte(ptep, 1);
+   if (pte_write(pte))
+   write_ok = 1;
+   }
+   }
pfn = page_to_pfn(pa

[PATCH 08/13] KVM: PPC: Allow I/O mappings in memory slots

2011-12-05 Thread Paul Mackerras
This provides for the case where userspace maps an I/O device into the
address range of a memory slot using a VM_PFNMAP mapping.  In that
case, we work out the pfn from vma->vm_pgoff, and record the cache
enable bits from vma->vm_page_prot in two low-order bits in the
slot_phys array entries.  Then, in kvmppc_h_enter() we check that the
cache bits in the HPTE that the guest wants to insert match the cache
bits in the slot_phys array entry.
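
The userspace side this caters for is roughly the following (sketch only;
device_fd, vm_fd, the length and the guest address are made-up examples):

	#include <sys/mman.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Sketch: expose a device region to the guest through a memslot */
	static int map_device_bar(int vm_fd, int device_fd, size_t len)
	{
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_SHARED, device_fd, 0); /* VM_PFNMAP vma */
		struct kvm_userspace_memory_region mr = {
			.slot		 = 1,
			.guest_phys_addr = 0xf0000000,	/* example */
			.memory_size	 = len,
			.userspace_addr	 = (unsigned long)p,
		};

		if (p == MAP_FAILED)
			return -1;
		return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &mr);
	}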

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   26 +++
 arch/powerpc/include/asm/kvm_host.h  |2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |   67 --
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |5 +-
 4 files changed, 76 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index d55e6b4..a98e0f6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -107,6 +107,32 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
return 0;   /* error */
 }
 
+static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type)
+{
+   unsigned int wimg = ptel & HPTE_R_WIMG;
+
+   /* Handle SAO */
+   if (wimg == (HPTE_R_W | HPTE_R_I | HPTE_R_M) &&
+   cpu_has_feature(CPU_FTR_ARCH_206))
+   wimg = HPTE_R_M;
+
+   if (!io_type)
+   return wimg == HPTE_R_M;
+
+   return (wimg & (HPTE_R_W | HPTE_R_I)) == io_type;
+}
+
+/* Return HPTE cache control bits corresponding to Linux pte bits */
+static inline unsigned long hpte_cache_bits(unsigned long pte_val)
+{
+#if _PAGE_NO_CACHE == HPTE_R_I && _PAGE_WRITETHRU == HPTE_R_W
+   return pte_val & (HPTE_R_W | HPTE_R_I);
+#else
+   return ((pte_val & _PAGE_NO_CACHE) ? HPTE_R_I : 0) +
+   ((pte_val & _PAGE_WRITETHRU) ? HPTE_R_W : 0);
+#endif
+}
+
 static inline bool slot_is_aligned(struct kvm_memory_slot *memslot,
   unsigned long pagesize)
 {
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index ba1da85..9b1c247 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -177,6 +177,8 @@ struct revmap_entry {
 
 /* Low-order bits in kvm->arch.slot_phys[][] */
 #define KVMPPC_PAGE_ORDER_MASK 0x1f
+#define KVMPPC_PAGE_NO_CACHE   HPTE_R_I    /* 0x20 */
+#define KVMPPC_PAGE_WRITETHRU  HPTE_R_W    /* 0x40 */
#define KVMPPC_GOT_PAGE        0x80
 
 struct kvm_arch {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index cc18f3d..b904c40 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -199,7 +199,8 @@ static long kvmppc_get_guest_page(struct kvm *kvm, unsigned long gfn,
struct page *page, *hpage, *pages[1];
unsigned long s, pgsize;
unsigned long *physp;
-   unsigned int got, pgorder;
+   unsigned int is_io, got, pgorder;
+   struct vm_area_struct *vma;
unsigned long pfn, i, npages;
 
physp = kvm->arch.slot_phys[memslot->id];
@@ -208,34 +209,51 @@ static long kvmppc_get_guest_page(struct kvm *kvm, unsigned long gfn,
if (physp[gfn - memslot->base_gfn])
return 0;
 
+   is_io = 0;
+   got = 0;
page = NULL;
pgsize = psize;
+   err = -EINVAL;
start = gfn_to_hva_memslot(memslot, gfn);
 
/* Instantiate and get the page we want access to */
np = get_user_pages_fast(start, 1, 1, pages);
-   if (np != 1)
-   return -EINVAL;
-   page = pages[0];
-   got = KVMPPC_GOT_PAGE;
+   if (np != 1) {
+   /* Look up the vma for the page */
+		down_read(&current->mm->mmap_sem);
+   vma = find_vma(current->mm, start);
+   if (!vma || vma->vm_start > start ||
+   start + psize > vma->vm_end ||
+   !(vma->vm_flags & VM_PFNMAP))
+   goto up_err;
+   is_io = hpte_cache_bits(pgprot_val(vma->vm_page_prot));
+   pfn = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
+   /* check alignment of pfn vs. requested page size */
+   if (psize > PAGE_SIZE && (pfn & ((psize >> PAGE_SHIFT) - 1)))
+   goto up_err;
+		up_read(&current->mm->mmap_sem);
 
-   /* See if this is a large page */
-   s = PAGE_SIZE;
-   if (PageHuge(page)) {
-   hpage = compound_head(page);
-   s <<= compound_order(hpage);
-   /* Get the whole large page if slot alignment is ok */
-   if (s > psize && slot_is_aligned(memslot, s) &&
-   !(memslot->userspace_addr & (s - 1))) {
-   start &= ~(s - 1);
-   pgsize = s;
-   

[PATCH 11/13] KVM: Add barriers to allow mmu_notifier_retry to be used locklessly

2011-12-05 Thread Paul Mackerras
This adds an smp_wmb in kvm_mmu_notifier_invalidate_range_end() and an
smp_rmb in mmu_notifier_retry() so that mmu_notifier_retry() will give
the correct answer when called without kvm->mmu_lock being held.
PowerPC Book3S HV KVM wants to use a bitlock per guest page rather than
a single global spinlock in order to improve the scalability of updates
to the guest MMU hashed page table, and so needs this.

Signed-off-by: Paul Mackerras 
---
 include/linux/kvm_host.h |   14 +-
 virt/kvm/kvm_main.c  |6 +++---
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c6a2ec9..17319ad 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -672,12 +672,16 @@ static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq)
if (unlikely(vcpu->kvm->mmu_notifier_count))
return 1;
/*
-* Both reads happen under the mmu_lock and both values are
-* modified under mmu_lock, so there's no need of smb_rmb()
-* here in between, otherwise mmu_notifier_count should be
-* read before mmu_notifier_seq, see
-* mmu_notifier_invalidate_range_end write side.
+* Ensure the read of mmu_notifier_count happens before the read
+* of mmu_notifier_seq.  This interacts with the smp_wmb() in
+* mmu_notifier_invalidate_range_end to make sure that the caller
+* either sees the old (non-zero) value of mmu_notifier_count or
+* the new (incremented) value of mmu_notifier_seq.
+* PowerPC Book3s HV KVM calls this under a per-page lock
+* rather than under kvm->mmu_lock, for scalability, so
+* can't rely on kvm->mmu_lock to keep things ordered.
 */
+   smp_rmb();
if (vcpu->kvm->mmu_notifier_seq != mmu_seq)
return 1;
return 0;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d9cfb78..ad2a912 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -357,11 +357,11 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
 * been freed.
 */
kvm->mmu_notifier_seq++;
+   smp_wmb();
/*
 * The above sequence increase must be visible before the
-* below count decrease but both values are read by the kvm
-* page fault under mmu_lock spinlock so we don't need to add
-* a smb_wmb() here in between the two.
+* below count decrease, which is ensured by the smp_wmb above
+* in conjunction with the smp_rmb in mmu_notifier_retry().
 */
kvm->mmu_notifier_count--;
spin_unlock(&kvm->mmu_lock);
-- 
1.7.5.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 04/13] KVM: PPC: Add an interface for pinning guest pages in Book3s HV guests

2011-12-05 Thread Paul Mackerras
This adds two new functions, kvmppc_pin_guest_page() and
kvmppc_unpin_guest_page(), and uses them to pin the guest pages where
the guest has registered areas of memory for the hypervisor to update
(i.e. the per-cpu virtual processor areas, SLB shadow buffers and
dispatch trace logs) and then unpin them when they are no longer
required.

Although it is not strictly necessary to pin the pages at this point,
since all guest pages are already pinned, later commits in this series
will mean that guest pages aren't all pinned.
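
Callers use the pair like this (sketch of the pattern the book3s_hv.c
changes below adopt; required_len is an illustrative name):

	unsigned long nb;
	void *va = kvmppc_pin_guest_page(kvm, gpa, &nb);

	if (va == NULL || nb < required_len)
		return H_PARAMETER;	/* not mapped, or area too small */
	/* ... hypervisor reads/writes the area through va ... */
	kvmppc_unpin_guest_page(kvm, va);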

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s.h |3 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |   38 ++
 arch/powerpc/kvm/book3s_hv.c  |   67 ++---
 3 files changed, 78 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index deb8a4e..16db48c 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -140,6 +140,9 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat,
 extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr);
 extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu);
 extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr,
+   unsigned long *nb_ret);
+extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr);
 
 extern void kvmppc_entry_trampoline(void);
 extern void kvmppc_hv_entry_trampoline(void);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index e4c6069..dcd39dc 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -184,6 +184,44 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
return -ENOENT;
 }
 
+void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long gpa,
+   unsigned long *nb_ret)
+{
+   struct kvm_memory_slot *memslot;
+   unsigned long gfn = gpa >> PAGE_SHIFT;
+   struct page *page;
+   unsigned long offset;
+   unsigned long pfn, pa;
+   unsigned long *physp;
+
+   memslot = gfn_to_memslot(kvm, gfn);
+   if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
+   return NULL;
+   physp = kvm->arch.slot_phys[memslot->id];
+   if (!physp)
+   return NULL;
+   physp += (gfn - memslot->base_gfn) >>
+   (kvm->arch.ram_porder - PAGE_SHIFT);
+   pa = *physp;
+   if (!pa)
+   return NULL;
+   pfn = pa >> PAGE_SHIFT;
+   page = pfn_to_page(pfn);
+   get_page(page);
+   offset = gpa & (kvm->arch.ram_psize - 1);
+   if (nb_ret)
+   *nb_ret = kvm->arch.ram_psize - offset;
+   return page_address(page) + offset;
+}
+
+void kvmppc_unpin_guest_page(struct kvm *kvm, void *va)
+{
+   struct page *page = virt_to_page(va);
+
+   page = compound_head(page);
+   put_page(page);
+}
+
 void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu)
 {
struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c2ee5a7..6e94af8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -137,12 +137,10 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu,
   unsigned long vcpuid, unsigned long vpa)
 {
struct kvm *kvm = vcpu->kvm;
-   unsigned long gfn, pg_index, ra, len;
-   unsigned long pg_offset;
+   unsigned long len, nb;
void *va;
struct kvm_vcpu *tvcpu;
-   struct kvm_memory_slot *memslot;
-   unsigned long *physp;
+   int err = H_PARAMETER;
 
tvcpu = kvmppc_find_vcpu(kvm, vcpuid);
if (!tvcpu)
@@ -155,51 +153,41 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu,
if (flags < 4) {
if (vpa & 0x7f)
return H_PARAMETER;
+   if (flags >= 2 && !tvcpu->arch.vpa)
+   return H_RESOURCE;
/* registering new area; convert logical addr to real */
-   gfn = vpa >> PAGE_SHIFT;
-   memslot = gfn_to_memslot(kvm, gfn);
-   if (!memslot || !(memslot->flags & KVM_MEMSLOT_INVALID))
-   return H_PARAMETER;
-   physp = kvm->arch.slot_phys[memslot->id];
-   if (!physp)
-   return H_PARAMETER;
-   pg_index = (gfn - memslot->base_gfn) >>
-   (kvm->arch.ram_porder - PAGE_SHIFT);
-   pg_offset = vpa & (kvm->arch.ram_psize - 1);
-   ra = physp[pg_index];
-   if (!ra)
+   va = kvmppc_pin_guest_page(kvm, vpa, &nb);
+   if (va == NULL)
return H_PARAMETER;
-

[PATCH 09/13] KVM: PPC: Maintain a doubly-linked list of guest HPTEs for each gfn

2011-12-05 Thread Paul Mackerras
This expands the reverse mapping array to contain two links for each
HPTE which are used to link together HPTEs that correspond to the
same guest logical page.  Each circular list of HPTEs is pointed to
by the rmap array entry for the guest logical page, pointed to by
the relevant memslot.  Links are 32-bit HPT entry indexes rather than
full 64-bit pointers, to save space.  We use 3 of the remaining 32
bits in the rmap array entries as a lock bit, a referenced bit and
a present bit (the present bit is needed since HPTE index 0 is valid).
The bit lock for the rmap chain nests inside the HPTE lock bit.
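
Concretely, visiting every HPTE that maps a given guest page becomes
(a sketch; the real traversals arrive in later patches of this series):

	lock_rmap(rmap);
	if (*rmap & KVMPPC_RMAP_PRESENT) {
		unsigned long head = *rmap & KVMPPC_RMAP_INDEX;
		unsigned long i = head;

		do {
			struct revmap_entry *rev = &kvm->arch.revmap[i];

			/* ... operate on HPTE number i ... */
			i = rev->forw;
		} while (i != head);
	}
	unlock_rmap(rmap);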

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   18 ++
 arch/powerpc/include/asm/kvm_host.h  |   17 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   84 +-
 3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index a98e0f6..90e6658 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -107,6 +107,11 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
return 0;   /* error */
 }
 
+static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
+{
+   return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
+}
+
 static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type)
 {
unsigned int wimg = ptel & HPTE_R_WIMG;
@@ -133,6 +138,19 @@ static inline unsigned long hpte_cache_bits(unsigned long pte_val)
 #endif
 }
 
+static inline void lock_rmap(unsigned long *rmap)
+{
+   do {
+   while (test_bit(KVMPPC_RMAP_LOCK_BIT, rmap))
+   cpu_relax();
+   } while (test_and_set_bit_lock(KVMPPC_RMAP_LOCK_BIT, rmap));
+}
+
+static inline void unlock_rmap(unsigned long *rmap)
+{
+   __clear_bit_unlock(KVMPPC_RMAP_LOCK_BIT, rmap);
+}
+
 static inline bool slot_is_aligned(struct kvm_memory_slot *memslot,
   unsigned long pagesize)
 {
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 9b1c247..e369d49 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -169,12 +169,27 @@ struct kvmppc_rma_info {
 /*
  * The reverse mapping array has one entry for each HPTE,
  * which stores the guest's view of the second word of the HPTE
- * (including the guest physical address of the mapping).
+ * (including the guest physical address of the mapping),
+ * plus forward and backward pointers in a doubly-linked ring
+ * of HPTEs that map the same host page.  The pointers in this
+ * ring are 32-bit HPTE indexes, to save space.
  */
 struct revmap_entry {
unsigned long guest_rpte;
+   unsigned int forw, back;
 };
 
+/*
+ * We use the top bit of each memslot->rmap entry as a lock bit,
+ * and bit 32 as a present flag.  The bottom 32 bits are the
+ * index in the guest HPT of a HPTE that points to the page.
+ */
+#define KVMPPC_RMAP_LOCK_BIT   63
+#define KVMPPC_RMAP_REF_BIT    33
+#define KVMPPC_RMAP_REFERENCED (1ul << KVMPPC_RMAP_REF_BIT)
+#define KVMPPC_RMAP_PRESENT    0x100000000ul
+#define KVMPPC_RMAP_INDEX      0xfffffffful
+
 /* Low-order bits in kvm->arch.slot_phys[][] */
 #define KVMPPC_PAGE_ORDER_MASK 0x1f
#define KVMPPC_PAGE_NO_CACHE   HPTE_R_I    /* 0x20 */
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 88d2add..b600f8c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -57,6 +57,70 @@ static void *real_vmalloc_addr(void *x)
return __va(addr);
 }
 
+/*
+ * Add this HPTE into the chain for the real page.
+ * Must be called with the chain locked; it unlocks the chain.
+ */
+static void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
+unsigned long *rmap, long pte_index, int realmode)
+{
+   struct revmap_entry *head, *tail;
+   unsigned long i;
+
+   if (*rmap & KVMPPC_RMAP_PRESENT) {
+   i = *rmap & KVMPPC_RMAP_INDEX;
+   head = &kvm->arch.revmap[i];
+   if (realmode)
+   head = real_vmalloc_addr(head);
+   tail = &kvm->arch.revmap[head->back];
+   if (realmode)
+   tail = real_vmalloc_addr(tail);
+   rev->forw = i;
+   rev->back = head->back;
+   tail->forw = pte_index;
+   head->back = pte_index;
+   } else {
+   rev->forw = rev->back = pte_index;
+   i = pte_index;
+   }
+   smp_wmb();
+   *rmap = i | KVMPPC_RMAP_REFERENCED | KVMPPC_RMAP_PRESENT; /* unlock */
+}
+
+/* Remove this HPTE from the chain for a real page */
+static void remove_revmap_chain(struct kvm *kvm, long pte_index,
+  

[PATCH 03/13] KVM: PPC: Keep page physical addresses in per-slot arrays

2011-12-05 Thread Paul Mackerras
This allocates an array for each memory slot that is added to store
the physical addresses of the pages in the slot.  This array is
vmalloc'd and accessed in kvmppc_h_enter using real_vmalloc_addr().
This allows us to remove the ram_pginfo field from the kvm_arch
struct, and removes the 64GB guest RAM limit that we had.

We use the low-order bits of the array entries to store a flag
indicating that we have done get_page on the corresponding page,
and therefore need to call put_page when we are finished with the
page.  Currently this is set for all pages except those in our
special RMO regions.
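
So each slot_phys[] entry carries a real address plus flags in the
low-order bits; releasing a page then looks like (sketch):

	unsigned long pa = physp[i];

	if (pa && (pa & KVMPPC_GOT_PAGE))
		put_page(pfn_to_page(pa >> PAGE_SHIFT));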

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |8 ++-
 arch/powerpc/kvm/book3s_64_mmu_hv.c |   18 +++---
 arch/powerpc/kvm/book3s_hv.c|  114 +--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |   44 -
 4 files changed, 109 insertions(+), 75 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 629df2e..cf6b4d7 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -175,25 +175,27 @@ struct revmap_entry {
unsigned long guest_rpte;
 };
 
+/* Low-order bits in kvm->arch.slot_phys[][] */
+#define KVMPPC_GOT_PAGE        0x80
+
 struct kvm_arch {
 #ifdef CONFIG_KVM_BOOK3S_64_HV
unsigned long hpt_virt;
struct revmap_entry *revmap;
-   unsigned long ram_npages;
unsigned long ram_psize;
unsigned long ram_porder;
-   struct kvmppc_pginfo *ram_pginfo;
unsigned int lpid;
unsigned int host_lpid;
unsigned long host_lpcr;
unsigned long sdr1;
unsigned long host_sdr1;
int tlbie_lock;
-   int n_rma_pages;
unsigned long lpcr;
unsigned long rmor;
struct kvmppc_rma_info *rma;
struct list_head spapr_tce_tables;
+   unsigned long *slot_phys[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS];
+   int slot_npages[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS];
unsigned short last_vcpu[NR_CPUS];
struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
 #endif /* CONFIG_KVM_BOOK3S_64_HV */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 80ece8d..e4c6069 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -98,16 +98,16 @@ void kvmppc_free_hpt(struct kvm *kvm)
 void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem)
 {
unsigned long i;
-   unsigned long npages = kvm->arch.ram_npages;
-   unsigned long pfn;
+   unsigned long npages;
+   unsigned long pa;
unsigned long *hpte;
unsigned long hash;
unsigned long porder = kvm->arch.ram_porder;
struct revmap_entry *rev;
-   struct kvmppc_pginfo *pginfo = kvm->arch.ram_pginfo;
+   unsigned long *physp;
 
-   if (!pginfo)
-   return;
+   physp = kvm->arch.slot_phys[mem->slot];
+   npages = kvm->arch.slot_npages[mem->slot];
 
/* VRMA can't be > 1TB */
if (npages > 1ul << (40 - porder))
@@ -117,9 +117,10 @@ void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem)
npages = HPT_NPTEG;
 
for (i = 0; i < npages; ++i) {
-   pfn = pginfo[i].pfn;
-   if (!pfn)
+   pa = physp[i];
+   if (!pa)
break;
+   pa &= PAGE_MASK;
/* can't use hpt_hash since va > 64 bits */
hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & HPT_HASH_MASK;
/*
@@ -131,8 +132,7 @@ void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem)
hash = (hash << 3) + 7;
hpte = (unsigned long *) (kvm->arch.hpt_virt + (hash << 4));
/* HPTE low word - RPN, protection, etc. */
-   hpte[1] = (pfn << PAGE_SHIFT) | HPTE_R_R | HPTE_R_C |
-   HPTE_R_M | PP_RWXX;
+   hpte[1] = pa | HPTE_R_R | HPTE_R_C | HPTE_R_M | PP_RWXX;
smp_wmb();
hpte[0] = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
(i << (VRMA_PAGE_ORDER - 16)) | HPTE_V_BOLTED |
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 5efdd5b..c2ee5a7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -48,14 +48,6 @@
 #include 
 #include 
 
-/*
- * For now, limit memory to 64GB and require it to be large pages.
- * This value is chosen because it makes the ram_pginfo array be
- * 64kB in size, which is about as large as we want to be trying
- * to allocate with kmalloc.
- */
-#define MAX_MEM_ORDER  36
-
 #define LARGE_PAGE_ORDER   24  /* 16MB pages */
 
 /* #define EXIT_DEBUG */
@@ -145,10 +137,12 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu,
  

[PATCH 06/13] KVM: PPC: Only get pages when actually needed, not in prepare_memory_region()

2011-12-05 Thread Paul Mackerras
This removes the code from kvmppc_core_prepare_memory_region() that
looked up the VMA for the region being added and called hva_to_page
to get the pfns for the memory.  We have no guarantee that there will
be anything mapped there at the time of the KVM_SET_USER_MEMORY_REGION
ioctl call; userspace can do that ioctl and then map memory into the
region later.

Instead we defer looking up the pfn for each memory page until it is
needed, which generally means when the guest does an H_ENTER hcall on
the page.  Since we can't call get_user_pages in real mode, if we don't
already have the pfn for the page, kvmppc_h_enter() will return
H_TOO_HARD and we then call kvmppc_virtmode_h_enter() once we get back
to kernel context.  That calls kvmppc_get_guest_page() to get the pfn
for the page, and then calls back to kvmppc_h_enter() to redo the HPTE
insertion.

When the first vcpu starts executing, we need to have the RMO or VRMA
region mapped so that the guest's real mode accesses will work.  Thus
we now have a check in kvmppc_vcpu_run() to see if the RMO/VRMA is set
up and if not, call kvmppc_hv_setup_rma().  It checks if the memslot
starting at guest physical 0 now has RMO memory mapped there; if so it
sets it up for the guest, otherwise on POWER7 it sets up the VRMA.
The function that does that, kvmppc_map_vrma, is now a bit simpler,
as it calls kvmppc_virtmode_h_enter instead of creating the HPTE itself.

Since we are now potentially updating entries in the slot_phys[]
arrays from multiple vcpu threads, we now have a spinlock protecting
those updates to ensure that we don't lose track of any references
to pages.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s.h|4 +
 arch/powerpc/include/asm/kvm_book3s_64.h |   12 ++
 arch/powerpc/include/asm/kvm_host.h  |2 +
 arch/powerpc/include/asm/kvm_ppc.h   |4 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  130 +---
 arch/powerpc/kvm/book3s_hv.c |  244 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   56 
 7 files changed, 291 insertions(+), 161 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 16db48c..5e7e04b 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -143,6 +143,10 @@ extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr,
unsigned long *nb_ret);
 extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr);
+extern long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
+   long pte_index, unsigned long pteh, unsigned long ptel);
+extern long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
+   long pte_index, unsigned long pteh, unsigned long ptel);
 
 extern void kvmppc_entry_trampoline(void);
 extern void kvmppc_hv_entry_trampoline(void);
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index fe45a81..ab6772e 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -95,4 +95,16 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
return rb;
 }
 
+static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+{
+   /* only handle 4k, 64k and 16M pages for now */
+   if (!(h & HPTE_V_LARGE))
+   return 1ul << 12;   /* 4k page */
+   if ((l & 0xf000) == 0x1000 && cpu_has_feature(CPU_FTR_ARCH_206))
+   return 1ul << 16;   /* 64k page */
+   if ((l & 0xff000) == 0)
+   return 1ul << 24;   /* 16M page */
+   return 0;   /* error */
+}
+
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index cf6b4d7..2a52bdb 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -193,7 +193,9 @@ struct kvm_arch {
unsigned long lpcr;
unsigned long rmor;
struct kvmppc_rma_info *rma;
+   int rma_setup_done;
struct list_head spapr_tce_tables;
+   spinlock_t slot_phys_lock;
unsigned long *slot_phys[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS];
int slot_npages[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS];
unsigned short last_vcpu[NR_CPUS];
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index fc2d696..111e1b4 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -121,8 +121,8 @@ extern long kvmppc_alloc_hpt(struct kvm *kvm);
 extern void kvmppc_free_hpt(struct kvm *kvm);
 extern long kvmppc_prepare_vrma(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
-

[PATCH 0/13] KVM: PPC: Update Book3S HV memory handling

2011-12-05 Thread Paul Mackerras
This series of patches updates the Book3S-HV KVM code that manages the
guest hashed page table (HPT) to enable several things:

* MMIO emulation and MMIO pass-through

* Use of small pages (4kB or 64kB, depending on config) to back the
  guest memory

* Pageable guest memory - i.e. backing pages can be removed from the
  guest and reinstated on demand, using the MMU notifier mechanism.

* Guests can be given read-only access to pages even though they think
  they have mapped them read/write.  When they try to write to them
  their access is upgraded to read/write.  This allows KSM to share
  pages between guests.

On PPC970 we have no way to get DSIs and ISIs to come to the
hypervisor, so we can't do MMIO emulation or pageable guest memory.
On POWER7 we set the VPM1 bit in the LPCR to make all DSIs and ISIs
come to the hypervisor (host) as HDSIs or HISIs.

This code is working well in my tests.  The sporadic crashes that I
was seeing earlier are fixed by the first patch in the series.
Somewhat to my surprise, when I implemented the last patch in the
series I started to see KSM coalescing pages without any further
effort on my part -- my tests were on a machine with Fedora 16
installed, and it has ksmtuned running by default.

This series is on top of Alex Graf's kvm-ppc-next branch, although the
last patch on that branch ("KVM: PPC: booke: Improve timer register
emulation") is causing the decrementer not to work properly in Book3S
HV guests, for reasons that I haven't fully determined yet.

These patches only touch arch/powerpc except for patch 11, which adds
a couple of barriers to allow mmu_notifier_retry() to be used outside
of the kvm->mmu_lock.

Unlike the previous version of these patches, we don't look at what's
mapped in the user address space at the time that
kvmppc_core_prepare_memory_region or kvmppc_core_commit_memory_region
gets called; we look up pages only when they are needed, either
because the guest wants to map them with an H_ENTER hypercall, or for
the pages needed for the virtual real-mode area (VRMA), at the time of
the first VCPU_RUN ioctl.

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 02/13] KVM: PPC: Keep a record of HV guest view of hashed page table entries

2011-12-05 Thread Paul Mackerras
This adds an array that parallels the guest hashed page table (HPT),
that is, it has one entry per HPTE, used to store the guest's view
of the second doubleword of the corresponding HPTE.  The first
doubleword in the HPTE is the same as the guest's idea of it, so we
don't need to store a copy, but the second doubleword in the HPTE has
the real page number rather than the guest's logical page number.
This allows us to remove the back_translate() and reverse_xlate()
functions.

This "reverse mapping" array is vmalloc'd, meaning that to access it
in real mode we have to walk the kernel's page tables explicitly.
That is done by the new real_vmalloc_addr() function.  (In fact this
returns an address in the linear mapping, so the result is usable
both in real mode and in virtual mode.)
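
The function is essentially the following (a sketch reconstructed from
the description; only its tail is visible in the book3s_hv_rm_mmu.c hunk
of patch 09; it assumes no huge pages in the vmalloc area):

	static void *real_vmalloc_addr(void *x)
	{
		unsigned long addr = (unsigned long) x;
		pte_t *p;

		p = find_linux_pte(swapper_pg_dir, addr);
		if (!p || !pte_present(*p))
			return NULL;
		/* return the equivalent linear-mapping address */
		addr = (pte_pfn(*p) << PAGE_SHIFT) | (addr & ~PAGE_MASK);
		return __va(addr);
	}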

There are also some minor cleanups here: moving the definitions of
HPT_ORDER etc. to a header file and defining HPT_NPTE for HPT_NPTEG << 3.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |8 +++
 arch/powerpc/include/asm/kvm_host.h  |   10 
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |   44 +++
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   87 ++
 4 files changed, 103 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index d0ac94f..23bb17e 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -29,6 +29,14 @@ static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu)
 
#define SPAPR_TCE_SHIFT        12
 
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+/* For now use fixed-size 16MB page table */
+#define HPT_ORDER  24
+#define HPT_NPTEG  (1ul << (HPT_ORDER - 7))    /* 128B per pteg */
+#define HPT_NPTE   (HPT_NPTEG << 3)            /* 8 PTEs per PTEG */
+#define HPT_HASH_MASK  (HPT_NPTEG - 1)
+#endif
+
 static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 unsigned long pte_index)
 {
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 66c75cd..629df2e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -166,9 +166,19 @@ struct kvmppc_rma_info {
atomic_t use_count;
 };
 
+/*
+ * The reverse mapping array has one entry for each HPTE,
+ * which stores the guest's view of the second word of the HPTE
+ * (including the guest physical address of the mapping).
+ */
+struct revmap_entry {
+   unsigned long guest_rpte;
+};
+
 struct kvm_arch {
 #ifdef CONFIG_KVM_BOOK3S_64_HV
unsigned long hpt_virt;
+   struct revmap_entry *revmap;
unsigned long ram_npages;
unsigned long ram_psize;
unsigned long ram_porder;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index bc3a2ea..80ece8d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -33,11 +34,6 @@
 #include 
 #include 
 
-/* For now use fixed-size 16MB page table */
-#define HPT_ORDER  24
-#define HPT_NPTEG  (1ul << (HPT_ORDER - 7))    /* 128B per pteg */
-#define HPT_HASH_MASK  (HPT_NPTEG - 1)
-
 /* Pages in the VRMA are 16MB pages */
#define VRMA_PAGE_ORDER        24
#define VRMA_VSID              0x1ffffffUL     /* 1TB VSID reserved for VRMA */
@@ -51,7 +47,9 @@ long kvmppc_alloc_hpt(struct kvm *kvm)
 {
unsigned long hpt;
unsigned long lpid;
+   struct revmap_entry *rev;
 
+   /* Allocate guest's hashed page table */
hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN,
   HPT_ORDER - PAGE_SHIFT);
if (!hpt) {
@@ -60,12 +58,20 @@ long kvmppc_alloc_hpt(struct kvm *kvm)
}
kvm->arch.hpt_virt = hpt;
 
+   /* Allocate reverse map array */
+   rev = vmalloc(sizeof(struct revmap_entry) * HPT_NPTE);
+   if (!rev) {
+   pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
+   goto out_freehpt;
+   }
+   kvm->arch.revmap = rev;
+
+   /* Allocate the guest's logical partition ID */
do {
lpid = find_first_zero_bit(lpid_inuse, NR_LPIDS);
if (lpid >= NR_LPIDS) {
pr_err("kvm_alloc_hpt: No LPIDs free\n");
-   free_pages(hpt, HPT_ORDER - PAGE_SHIFT);
-   return -ENOMEM;
+   goto out_freeboth;
}
} while (test_and_set_bit(lpid, lpid_inuse));
 
@@ -74,11 +80,18 @@ long kvmppc_alloc_hpt(struct kvm *kvm)
 
pr_info("KVM guest htab at %lx, LPID %lx\n", hpt, lpid);
return 0;
+
+ out_freeboth:
+   vfree(rev);
+ out_freehpt:
+   free_pages(hpt, HPT_ORDER - PAGE_SHIFT);
+

[PATCH 05/13] KVM: PPC: Make the H_ENTER hcall more reliable

2011-12-05 Thread Paul Mackerras
At present, our implementation of H_ENTER only makes one try at locking
each slot that it looks at, and doesn't even retry the ldarx/stdcx.
atomic update sequence that it uses to attempt to lock the slot.  Thus
it can return the H_PTEG_FULL error unnecessarily, particularly when
the H_EXACT flag is set, meaning that the caller wants a specific PTEG
slot.

This improves the situation by making a second pass when no free HPTE
slot is found, where we spin until we succeed in locking each slot in
turn and then check whether it is full while we hold the lock.  If the
second pass fails, then we return H_PTEG_FULL.
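
In outline the second pass is (a sketch; the unlock of a full slot here
is just clearing HPTE_V_HVLOCK, see the full hunk below):

	for (i = 0; i < 8; ++i) {
		while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
			cpu_relax();		/* spin until we own the slot */
		if (!(*hpte & HPTE_V_VALID))
			break;			/* genuinely free: use it */
		*hpte &= ~HPTE_V_HVLOCK;	/* full: drop lock, move on */
		hpte += 2;
	}
	if (i == 8)
		return H_PTEG_FULL;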

This also moves lock_hpte to a header file (since later commits in this
series will need to use it from other source files) and renames it to
try_lock_hpte, which is a somewhat less misleading name.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   25 
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   63 --
 2 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 23bb17e..fe45a81 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -37,6 +37,31 @@ static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu)
 #define HPT_HASH_MASK  (HPT_NPTEG - 1)
 #endif
 
+/*
+ * We use a lock bit in HPTE dword 0 to synchronize updates and
+ * accesses to each HPTE, and another bit to indicate non-present
+ * HPTEs.
+ */
+#define HPTE_V_HVLOCK  0x40UL
+
+static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
+{
+   unsigned long tmp, old;
+
+   asm volatile("  ldarx   %0,0,%2\n"
+"  and.%1,%0,%3\n"
+"  bne 2f\n"
+"  ori %0,%0,%4\n"
+"  stdcx.  %0,0,%2\n"
+"  beq+2f\n"
+"  li  %1,%3\n"
+"2:isync"
+: "=&r" (tmp), "=&r" (old)
+: "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK)
+: "cc", "memory");
+   return old == 0;
+}
+
 static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 unsigned long pte_index)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 5f45ba7..659175f 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -56,26 +56,6 @@ static void *real_vmalloc_addr(void *x)
return __va(addr);
 }
 
-#define HPTE_V_HVLOCK  0x40UL
-
-static inline long lock_hpte(unsigned long *hpte, unsigned long bits)
-{
-   unsigned long tmp, old;
-
-   asm volatile("  ldarx   %0,0,%2\n"
-"  and.%1,%0,%3\n"
-"  bne 2f\n"
-"  ori %0,%0,%4\n"
-"  stdcx.  %0,0,%2\n"
-"  beq+2f\n"
-"  li  %1,%3\n"
-"2:isync"
-: "=&r" (tmp), "=&r" (old)
-: "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK)
-: "cc", "memory");
-   return old == 0;
-}
-
 long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
long pte_index, unsigned long pteh, unsigned long ptel)
 {
@@ -129,24 +109,49 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
pteh &= ~0x60UL;
ptel &= ~(HPTE_R_PP0 - kvm->arch.ram_psize);
ptel |= pa;
+
if (pte_index >= HPT_NPTE)
return H_PARAMETER;
if (likely((flags & H_EXACT) == 0)) {
pte_index &= ~7UL;
hpte = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
-   for (i = 0; ; ++i) {
-   if (i == 8)
-   return H_PTEG_FULL;
+   for (i = 0; i < 8; ++i) {
if ((*hpte & HPTE_V_VALID) == 0 &&
-   lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID))
+   try_lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID))
break;
hpte += 2;
}
+   if (i == 8) {
+   /*
+* Since try_lock_hpte doesn't retry (not even stdcx.
+* failures), it could be that there is a free slot
+* but we transiently failed to lock it.  Try again,
+* actually locking each slot and checking it.
+*/
+   hpte -= 16;
+   for (i = 0; i < 8; ++i) {
+   while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+   cpu_relax();
+   

[PATCH 10/13] KVM: PPC: Implement MMIO emulation support for Book3S HV guests

2011-12-05 Thread Paul Mackerras
This provides the low-level support for MMIO emulation in Book3S HV
guests.  When the guest tries to map a page which is not covered by
any memslot, that page is taken to be an MMIO emulation page.  Instead
of inserting a valid HPTE, we insert an HPTE that has the valid bit
clear but another hypervisor software-use bit set, which we call
HPTE_V_ABSENT, to indicate that this is an absent page.  An
absent page is treated much like a valid page as far as guest hcalls
(H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
an absent HPTE doesn't need to be invalidated with tlbie since it
was never valid as far as the hardware is concerned.
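
So a hash slot is now considered occupied if either bit is set, and only
truly valid entries need a tlbie on removal (sketch; do_tlbie is an
illustrative name, not a function from this patch):

	if (hpte[0] & (HPTE_V_VALID | HPTE_V_ABSENT)) {
		/* in use as far as H_ENTER/H_READ/H_REMOVE are concerned */
		if (hpte[0] & HPTE_V_VALID)
			do_tlbie(kvm, pte_index);	/* hardware saw it */
	}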

When the guest accesses a page for which there is an absent HPTE, it
will take a hypervisor data storage interrupt (HDSI) since we now set
the VPM1 bit in the LPCR.  Our HDSI handler for HPTE-not-present faults
looks up the hash table and if it finds an absent HPTE mapping the
requested virtual address, will switch to kernel mode and handle the
fault in kvmppc_book3s_hv_page_fault(), which at present just calls
kvmppc_hv_emulate_mmio() to set up the MMIO emulation.

This is based on an earlier patch by Benjamin Herrenschmidt, but since
heavily reworked.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s.h|5 +
 arch/powerpc/include/asm/kvm_book3s_64.h |   26 +++
 arch/powerpc/include/asm/kvm_host.h  |5 +
 arch/powerpc/include/asm/mmu-hash64.h|2 +-
 arch/powerpc/include/asm/ppc-opcode.h|4 +-
 arch/powerpc/include/asm/reg.h   |1 +
 arch/powerpc/kernel/asm-offsets.c|1 +
 arch/powerpc/kernel/exceptions-64s.S |8 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  228 +--
 arch/powerpc/kvm/book3s_hv.c |   21 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  262 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |  127 ---
 12 files changed, 607 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 5e7e04b..5ac53f9 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -121,6 +121,11 @@ extern void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu);
 extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte);
 extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr);
 extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu);
+extern int kvmppc_book3s_hv_page_fault(struct kvm_run *run,
+   struct kvm_vcpu *vcpu, unsigned long addr,
+   unsigned long status);
+extern long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr,
+   unsigned long slb_v, unsigned long valid);
 
 extern void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte);
 extern struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu);
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 90e6658..9a59b6d 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -37,12 +37,15 @@ static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu)
 #define HPT_HASH_MASK  (HPT_NPTEG - 1)
 #endif
 
+#define VRMA_VSID  0x1ffffffUL /* 1TB VSID reserved for VRMA */
+
 /*
  * We use a lock bit in HPTE dword 0 to synchronize updates and
  * accesses to each HPTE, and another bit to indicate non-present
  * HPTEs.
  */
 #define HPTE_V_HVLOCK  0x40UL
+#define HPTE_V_ABSENT  0x20UL
 
 static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
 {
@@ -138,6 +141,29 @@ static inline unsigned long hpte_cache_bits(unsigned long pte_val)
 #endif
 }
 
+static inline bool hpte_read_permission(unsigned long pp, unsigned long key)
+{
+   if (key)
+   return PP_RWRX <= pp && pp <= PP_RXRX;
+   return 1;
+}
+
+static inline bool hpte_write_permission(unsigned long pp, unsigned long key)
+{
+   if (key)
+   return pp == PP_RWRW;
+   return pp <= PP_RWRW;
+}
+
+static inline int hpte_get_skey_perm(unsigned long hpte_r, unsigned long amr)
+{
+   unsigned long skey;
+
+   skey = ((hpte_r & HPTE_R_KEY_HI) >> 57) |
+   ((hpte_r & HPTE_R_KEY_LO) >> 9);
+   return (amr >> (62 - 2 * skey)) & 3;
+}
+
 static inline void lock_rmap(unsigned long *rmap)
 {
do {
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index e369d49..c9c92f0 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -209,6 +209,7 @@ struct kvm_arch {
unsigned long lpcr;
unsigned long rmor;
struct kvmppc_rma_info *rma;
+   unsigned long vrma_slb_v;
int rma_setup_done;
struct list_head spapr_tce_tables;
spinlock_t slot_phys_lock;
@@ -451,6 +452,10 @@