date:20150129

Re: [RFC,1/2] powerpc/fsl-pci: atomic get_user when pagefault_disabled

2015-01-29 Thread David Hildenbrand

> On Tue, Nov 25, 2014 at 12:43:25PM +0100, David Hildenbrand wrote:
> > Whenever we have pagefaults disabled, we have to use the atomic variants of
> > (set|get)_user and copy_(from|to)_user.
> > 
> > Signed-off-by: David Hildenbrand 
> > ---
> >  arch/powerpc/sysdev/fsl_pci.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> > index 65d2ed4..c0af4ef 100644
> > --- a/arch/powerpc/sysdev/fsl_pci.c
> > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > @@ -1025,7 +1025,7 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> > if (is_in_pci_mem_space(addr)) {
> > if (user_mode(regs)) {
> > pagefault_disable();
> > -   ret = get_user(regs->nip, &inst);
> > +   ret = __get_user_inatomic(regs->nip, &inst);
> > pagefault_enable();
> > } else {
> > ret = probe_kernel_address(regs->nip, inst);
> 
> Please post a non-RFC version if you're ready for this to be merged.
> 
> -Scott
> 

Hi Scott,

Actually, this patch was wrong. We are allowed to use the non-atomic variants
during pagefault_disable(). The semantics of get_user()/put_user() then change:
they will not sleep.

Re-enabling the might_sleep() checks in might_fault() requires counting the
levels of pagefault_disable() calls, to distinguish them from ordinary
preempt_disable() calls. I've got another patchset out there that deals with
this problem and is able to distinguish between them.
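
The counting idea can be sketched in plain userspace C. This is only an
illustration, not the code from the patchset mentioned here: the struct, the
two depth counters and all demo_* names are hypothetical stand-ins, and the
real kernel keeps this state elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state (assumption: two separate nesting counters). */
struct demo_task {
	int preempt_depth;	/* nesting of preempt_disable()-like calls */
	int pagefault_depth;	/* nesting of pagefault_disable()-like calls */
};

static void demo_preempt_disable(struct demo_task *t)
{
	t->preempt_depth++;
}

static void demo_preempt_enable(struct demo_task *t)
{
	assert(t->preempt_depth > 0);
	t->preempt_depth--;
}

static void demo_pagefault_disable(struct demo_task *t)
{
	t->pagefault_depth++;
}

static void demo_pagefault_enable(struct demo_task *t)
{
	assert(t->pagefault_depth > 0);
	t->pagefault_depth--;
}

/* With pagefaults disabled, a user access fails instead of sleeping. */
static bool demo_user_access_is_atomic(const struct demo_task *t)
{
	return t->pagefault_depth > 0;
}

/*
 * A might_sleep()-style check should only complain when preemption is
 * off for some reason other than pagefault_disable().
 */
static bool demo_sleep_check_should_warn(const struct demo_task *t)
{
	return t->preempt_depth > 0 && t->pagefault_depth == 0;
}
```

With a single shared counter the last function could not tell the two cases
apart, which is why the separate nesting level is needed.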

The interest in this feature just doesn't seem to be very high :)

So you can safely ignore this patch.

David

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2] powernv: Add OPAL soft-poweroff routine

2015-01-29 Thread Joel Stanley

Register a notifier for an OPAL message indicating that the machine
should prepare itself for a graceful power off.

OPAL will tell us if the power off is a reboot or shutdown, but for now
we perform the same orderly_poweroff action.

Signed-off-by: Joel Stanley 
---
v2:
  - combine the reboot and off cases, as they are the same code

 arch/powerpc/include/asm/opal.h |  2 +-
 arch/powerpc/platforms/powernv/Makefile |  2 +-
 arch/powerpc/platforms/powernv/opal-power.c | 66 +
 3 files changed, 68 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-power.c

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index ca2dd45..cdf32c0 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -250,7 +250,7 @@ enum OpalMessageType {
 */
OPAL_MSG_MEM_ERR,
OPAL_MSG_EPOW,
-   OPAL_MSG_SHUTDOWN,
+   OPAL_MSG_SHUTDOWN,  /* params[0] = 1 reboot, 0 shutdown */
OPAL_MSG_HMI_EVT,
OPAL_MSG_TYPE_MAX,
 };
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index f241acc..6f3c5d3 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,7 +1,7 @@
 obj-y  += setup.o opal-wrappers.o opal.o opal-async.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
-obj-y  += opal-msglog.o opal-hmi.o
+obj-y  += opal-msglog.o opal-hmi.o opal-power.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-p5ioc2.o pci-ioda.o
diff --git a/arch/powerpc/platforms/powernv/opal-power.c b/arch/powerpc/platforms/powernv/opal-power.c
new file mode 100644
index 000..bbc1054
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-power.c
@@ -0,0 +1,66 @@
+/*
+ * PowerNV OPAL power control for graceful shutdown handling
+ *
+ * Copyright 2015 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#define SOFT_OFF 0x00
+#define SOFT_REBOOT 0x01
+
+static int opal_power_control_event(struct notifier_block *nb,
+				    unsigned long msg_type, void *msg)
+{
+	struct opal_msg *power_msg = msg;
+	uint64_t type;
+
+	type = be64_to_cpu(power_msg->params[0]);
+
+	switch (type) {
+	case SOFT_REBOOT:
+		/* Fall through. The service processor is responsible for
+		 * bringing the machine back up */
+	case SOFT_OFF:
+		pr_info("OPAL: poweroff requested\n");
+		orderly_poweroff(true);
+		break;
+	default:
+		pr_err("OPAL: power control type unexpected %016llx\n", type);
+	}
+
+	return 0;
+}
+
+static struct notifier_block opal_power_control_nb = {
+	.notifier_call	= opal_power_control_event,
+	.next		= NULL,
+	.priority	= 0,
+};
+
+static int __init opal_power_control_init(void)
+{
+	int ret;
+
+	ret = opal_message_notifier_register(OPAL_MSG_SHUTDOWN,
+					     &opal_power_control_nb);
+	if (ret) {
+		pr_err("%s: Can't register OPAL event notifier (%d)\n",
+		       __func__, ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+machine_subsys_initcall(powernv, opal_power_control_init);
-- 
2.1.4

Re: powerpc/book3s: Fix the MCE code to use CONFIG_KVM_BOOK3S_HV_POSSIBLE

2015-01-29 Thread Michael Ellerman

On Wed, 2015-07-01 at 04:43:07 UTC, Mahesh Salgaonkar wrote:
> From: Mahesh Salgaonkar 
> 
> commit id 9975f5e added new config variable CONFIG_KVM_BOOK3S_HV_POSSIBLE
> that helps to select the relevant code in the kernel when HV and PR
> bits are built as separate modules. As part of that commit, all the
> instances of #ifdef CONFIG_KVM_BOOK3S_64_HV were replaced with
> CONFIG_KVM_BOOK3S_HV_POSSIBLE. But the MCE code still depends on
> CONFIG_KVM_BOOK3S_64_HV which is wrong. When HV bits are built as a
> separate module the relevant MCE code gets excluded. This patch fixes
> the MCE code to use CONFIG_KVM_BOOK3S_HV_POSSIBLE.

So what is the symptom? I.e., is it fatal or just annoying?

And depending on that, should this go to stable?

cheers

Re: perf/powerpc: reset event hw state when adding it to the PMU

2015-01-29 Thread Scott Wood

On Thu, Jun 26, 2014 at 11:58:58AM +0300, Alexandru-Cezar Sardan wrote:
> When adding an event to the PMU with PERF_EF_START the STOPPED and UPTODATE
> flags need to be cleared in the hw.event status variable because they are
> preventing the update of the event count on overflow interrupt.
> 
> Signed-off-by: Alexandru-Cezar Sardan 
> ---
>  arch/powerpc/perf/core-fsl-emb.c |4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Sorry for the delay -- it wasn't CCed to me and the subject line didn't
indicate that it was fsl related.

> diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c
> index d35ae52..ef2ce48 100644
> --- a/arch/powerpc/perf/core-fsl-emb.c
> +++ b/arch/powerpc/perf/core-fsl-emb.c
> @@ -330,9 +330,11 @@ static int fsl_emb_pmu_add(struct perf_event *event, int flags)
>   }
>   local64_set(&event->hw.prev_count, val);
>  
> - if (!(flags & PERF_EF_START)) {
> + if (unlikely(!(flags & PERF_EF_START))) {
>   event->hw.state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
>   val = 0;
> + } else {
> + event->hw.state &= ~(PERF_HES_STOPPED | PERF_HES_UPTODATE);
>   }

Why unlikely()?  None of the other perf drivers have that there.

Commit f53d168c does something similar for book3s.  It sets hw.state to
zero instead of clearing the flags.  Any reason why core-fsl-emb should
be different?
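
The difference between the two approaches only shows up when some other bit
is set in the state word. A small userspace sketch, assuming illustrative
flag values (the DEMO_* names and numbers are stand-ins, not taken from the
kernel headers):

```c
#include <assert.h>

/* Illustrative stand-ins for the PERF_HES_* bits (values are assumptions). */
#define DEMO_HES_STOPPED	0x01
#define DEMO_HES_UPTODATE	0x02
#define DEMO_HES_OTHER		0x04	/* hypothetical additional state bit */

/* Variant from the quoted patch: clear only the two flags. */
static int demo_start_by_clearing(int state)
{
	return state & ~(DEMO_HES_STOPPED | DEMO_HES_UPTODATE);
}

/* Variant described for book3s: reset the whole state word. */
static int demo_start_by_zeroing(int state)
{
	(void)state;
	return 0;
}
```

If only STOPPED and UPTODATE can ever be set, both variants give the same
result; they diverge only when a third bit has to survive the start.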

-Scott

Re: [PATCH] cpuidle/powernv: Enter fastsleep on checking if deep idle states are allowed

2015-01-29 Thread Preeti U Murthy

On 09/15/2014 02:22 PM, Preeti U Murthy wrote:
> On 09/15/2014 12:29 PM, Michael Ellerman wrote:
>> On Fri, 2014-09-12 at 16:31 +0530, Preeti U Murthy wrote:
>>> Today the procfs interface /proc/sys/kernel/powersave-nap is used to control
>>> entry into deep idle states beyond snooze. Check for the value of this
>>> parameter before entering fastsleep. We already do this check for nap in
>>> power7_idle().
>>>
>>> Signed-off-by: Preeti U Murthy 
>>> ---
>>>
>>>  drivers/cpuidle/cpuidle-powernv.c |6 ++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
>>> index a64be57..b8ba52e 100644
>>> --- a/drivers/cpuidle/cpuidle-powernv.c
>>> +++ b/drivers/cpuidle/cpuidle-powernv.c
>>> @@ -69,6 +69,12 @@ static int fastsleep_loop(struct cpuidle_device *dev,
>>> unsigned long old_lpcr = mfspr(SPRN_LPCR);
>>> unsigned long new_lpcr;
>>>  
>>> +   /*
>>> +* Verify if snooze is the only valid cpuidle state
>>> +*/
>>> +   if (!(powersave_nap > 0))
>>> +   return index;
>>> +
>>> if (unlikely(system_state < SYSTEM_RUNNING))
>>> return index;
>>
>> Doesn't the above mean we are just going to keep trying to go into fastsleep
>> again and again? Or does the idle code work out that it didn't work based on
>> the fact that we didn't sleep for the right period?
> 
> That's right. The idle code figures that its judgment to enter fastsleep
> was not correct and applies a correction factor to its future decisions.
> This correction factor is intended to influence the cpuidle governor's
> decision on choosing an idle state for the cpu based on the history of
> wakeups. Hence a shallower idle state will be chosen from here on in the
> above circumstance.

On second thoughts, I feel that this is not the best way to disable deep
idle states. Going down this path means the governor should identify
that fastsleep is not being entered into and take corrective action from
then on. If the governor is buggy, we pointlessly enter and exit
fastsleep_loop() during long idle periods.

So a better way to disable fastsleep during runtime is through the
disable flag in sysfs. To disable deep idle states at bootup one could
use powersave=off kernel cmd line parameter.

The functionality provided by powersave_nap and smt_snooze_delay (to
quote another example) carried value during the days that powernv did
not have a cpuidle driver. We retain them so as to not break userspace,
although they do not carry value today. Any future users of these
parameters should be avoided IMO. Hence I retract this patch.

Regards
Preeti U Murthy

> 
>>
>> We were talking about getting rid of powersave_nap altogether, but I think we
>> decided we couldn't, I forget.
> 
> Isn't this a helpful knob to disable cpuidle at runtime? Currently we
> check the value of powersave_nap before entering both nap and fastsleep.
> 
> Regards
> Preeti U Murthy
>>
>> cheers

AW: AW: SPE & Interrupt context (was how to make use of SPE instructions)

2015-01-29 Thread Markus Stockhausen

> From: Scott Wood [scottw...@freescale.com]
> Sent: Friday, 30 January 2015 01:49
> To: Markus Stockhausen
> Cc: Michael Ellerman; linuxppc-dev@lists.ozlabs.org; Herbert Xu
> Subject: Re: AW: SPE & Interrupt context (was how to make use of SPE instructions)
> 
> On Wed, 2015-01-28 at 05:00 +, Markus Stockhausen wrote:
> > > > From: Scott Wood [scottw...@freescale.com]
> > > > Sent: Wednesday, 28 January 2015 05:21
> > > > To: Markus Stockhausen
> > > > Cc: Michael Ellerman; linuxppc-dev@lists.ozlabs.org; Herbert Xu
> > > > Subject: Re: SPE & Interrupt context (was how to make use of SPE instructions)
> > > >
> > > > Hi Scott,
> > > >
> > > > thanks for your helpful feedback. As you might have seen I sent a first
> > > > patch for the sha256 kernel module that takes care of preemption.
> > > >
> > > > Herbert Xu noticed that my module won't run for IPsec as all
> > > > work will be done from interrupt context. Do you have a tip how I can
> > > > mitigate the check I implemented:
> > > >
> > > > static bool spe_usable(void)
> > > > {
> > > >   return !in_interrupt();
> > > > }
> > > >
> > > > Intel guys have something like that
> > > >
> > > > bool irq_fpu_usable(void)
> > > > {
> > > >   return !in_interrupt() ||
> > > > interrupted_user_mode() ||
> > > > interrupted_kernel_fpu_idle();
> > > > }
> > > >
> > > > But I have no idea how to transfer it to the PPC/SPE case.
> > >
> > > I'm not sure what sort of tip you're looking for, other than
> > > implementing it myself. :-)
> >
> > Hi Scott,
> >
> > maybe I did not explain it correctly. interrupted_kernel_fpu_idle()
> > is x86 specific. The same applies to interrupted_user_mode().
> > I'm just searching for a similar feature in the PPC/SPE world.
> 
> There isn't one.
> 
> > I can see that enable_kernel_spe() does something with the
> > MSR_SPE flag, but I have no idea  how to determine if I'm allowed
> > to enable SPE although I'm inside an interrupt context.
> 
> As with x86, you'd want to check whether the kernel interrupted
> userspace.  I don't know what x86 is doing with TS, but on PPC you might
> check whether the interrupted thread had MSR_FP enabled.
> 
> > I'm asking because from the previous posts I conclude that
> > running SPE instructions inside an interrupt might be critical.
> > Because of registers not being saved?
> 
> Yes.  Currently callers of enable_kernel_spe() only need to disable
> preemption, not interrupts.
> 
> > Or can I just save the register contents myself and interrupt
> > context is no longer a showstopper?
> 
> If you only need a small number of registers that might be reasonable,
> but if you need a bunch then you don't want to save them when you don't
> have to.
> 
> Another option is to change enable_kernel_spe() to require interrupts to
> be disabled.

Phew, that is going deeper than I expected. 

I'm a newbie in the topic of interrupts and FPU/SPE registers. Nevertheless,
limiting enable_kernel_spe() to only be available outside of interrupt
context sounds too restrictive to me. Also, checking for thread/CPU flags
of an interrupted process is nothing I can or want to implement. There
might be the risk that I'm starting something that will be too complex
for me.

BUT! Given the fact that SPE registers are only extended GPRs and my
algorithm needs just 10 of them I can live with the following design.

- I must already save several non-volatile registers. Putting 64-bit values
into them requires me to save their contents with evstdd instead of
stw. This of course requires the stack to be aligned to 8 bytes, so only a
few additional alignment instructions are needed during initialization.

- During function cleanup I will restore the registers the same way.

- In case I interrupted myself, I might have saved sensitive data of another
thread on my stack. So I will zero that area after I have restored the
registers. That needs an additional 10 instructions. Compared to ~2000
instructions for one sha256 round that should be negligible.

This little overhead will save me lots of trouble at other locations:

- I can avoid checking for an interrupt context.

- I don't need a fallback to the generic implementation. 

Thinking about it more and more, I think performance will stay the same.
Can you confirm that this will work? If yes, I will send a v2 patch.

Markus

Re: [v4] QE: Move QE from arch/powerpc to drivers/soc

2015-01-29 Thread Scott Wood

On Wed, Nov 12, 2014 at 11:40:13AM +0800, Zhao Qiang wrote:
> ls1 has qe and ls1 has arm cpu.
> move qe from arch/powerpc to drivers/soc/fsl
> to adapt to powerpc and arm
> 
> Signed-off-by: Zhao Qiang 
> ---
> Changes for v2:
>   - move code to driver/soc
> Changes for v3:
>   - change drivers/soc/qe to drivers/soc/fsl-qe
> Changes for v4:
>   - move drivers/soc/fsl-qe to drivers/soc/fsl/qe
>   - move header files for qe from include/linux/fsl to include/soc/fsl
>   - move qe_ic.c to drivers/irqchip/

Need MAINTAINERS update for drivers/soc/fsl/qe, as previously discussed.

-Scott

Re: [v5, 5/6] powerpc/mpc85xx: Add FSL QorIQ DPAA BMan support to device tree(s)

2015-01-29 Thread Scott Wood

On Mon, Dec 08, 2014 at 04:29:20AM -0600, Emil Medve wrote:
> From: Kumar Gala 
> 
> Change-Id: If643fa5ba0a903aef8f5056a2c90ebecc995b760
> Signed-off-by: Kumar Gala 
> Signed-off-by: Geoff Thorpe 
> Signed-off-by: Hai-Ying Wang 
> Signed-off-by: Chunhe Lan 
> Signed-off-by: Poonam Aggrwal 
> [Emil Medve: Sync with the upstream binding]
> Signed-off-by: Emil Medve 

Doesn't apply cleanly

> @@ -408,6 +415,8 @@ crypto: crypto@30 {
>   fsl,iommu-parent = <&pamu1>;
>   };
>  
> +/include/ "qoriq-bman1.dtsi"
> +
>  /include/ "qoriq-fman-0.dtsi"
>  /include/ "qoriq-fman-0-1g-0.dtsi"
>  /include/ "qoriq-fman-0-1g-1.dtsi"

What tree did you base these patches on?  There's no fman in the upstream
device trees yet (just a binding).

-Scott

Re: [RFC,1/2] powerpc/fsl-pci: atomic get_user when pagefault_disabled

2015-01-29 Thread Scott Wood

On Tue, Nov 25, 2014 at 12:43:25PM +0100, David Hildenbrand wrote:
> Whenever we have pagefaults disabled, we have to use the atomic variants of
> (set|get)_user and copy_(from|to)_user.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  arch/powerpc/sysdev/fsl_pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> index 65d2ed4..c0af4ef 100644
> --- a/arch/powerpc/sysdev/fsl_pci.c
> +++ b/arch/powerpc/sysdev/fsl_pci.c
> @@ -1025,7 +1025,7 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
>   if (is_in_pci_mem_space(addr)) {
>   if (user_mode(regs)) {
>   pagefault_disable();
> - ret = get_user(regs->nip, &inst);
> + ret = __get_user_inatomic(regs->nip, &inst);
>   pagefault_enable();
>   } else {
>   ret = probe_kernel_address(regs->nip, inst);

Please post a non-RFC version if you're ready for this to be merged.

-Scott

Re: [PATCH] powerpc/dts: Update platform PLL node

2015-01-29 Thread Scott Wood

On Tue, 2015-01-20 at 02:51 -0600, Liberman Igal-B31950 wrote:
> 
> 
> Regards,
> Igal Liberman.
> 
> > -Original Message-
> > From: Wood Scott-B07421
> > Sent: Tuesday, January 20, 2015 9:44 AM
> > To: Liberman Igal-B31950
> > Cc: linuxppc-dev@lists.ozlabs.org; Medve Emilian-EMMEDVE1
> > Subject: Re: [PATCH] powerpc/dts: Update platform PLL node
> > 
> > On Mon, 2015-01-12 at 08:00 +0200, Igal.Liberman wrote:
> > > From: Igal Liberman 
> > >
> > > Signed-off-by: Igal Liberman 
> > > Change-Id: I92d020651237041d3767aa35e9345439714f9831
> > > ---
> > >  arch/powerpc/boot/dts/fsl/qoriq-clockgen2.dtsi |6 --
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > Please explain this more.  Was it just wrong before?  Is this for a new
> > chip?  If the latter, what effect does this have on existing chips?
> > 
> 
> It wasn't wrong, however it was missing some clocking options which
> might be used by some hardware accelerators available in T/B devices.
> I need these options for FMan, however they might be used for other
> accelerators too.

If the PLL had a div3 option and it wasn't described by the PLL node,
the node was wrong.

Do all chips that use this file have a div3?

> > > diff --git a/arch/powerpc/boot/dts/fsl/qoriq-clockgen2.dtsi
> > > b/arch/powerpc/boot/dts/fsl/qoriq-clockgen2.dtsi
> > > index 48e0b6e..7e1f074 100644
> > > --- a/arch/powerpc/boot/dts/fsl/qoriq-clockgen2.dtsi
> > > +++ b/arch/powerpc/boot/dts/fsl/qoriq-clockgen2.dtsi
> > > @@ -49,14 +49,16 @@ global-utilities@e1000 {
> > >   reg = <0x800 0x4>;
> > >   compatible = "fsl,qoriq-core-pll-2.0";
> > >   clocks = <&sysclk>;
> > > - clock-output-names = "pll0", "pll0-div2", "pll0-div4";
> > > + clock-output-names = "pll0", "pll0-div2", "pll0-div3",
> > > +   "pll0-div4";
> > 
> > You're changing the meaning of existing clock index 2.
> > 
> 
> Yes, however this platform PLL is new work which is not yet used, so we
> aren't breaking any functionality.

No, it's the core PLL node which is already in use.  However, it looks
like the driver already interprets clock index 2 differently based on
whether clock index 3 exists.  None of this is mentioned in the binding
document...
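
To make the index question concrete, here is a userspace sketch of how the
divisor behind an output index could depend on how many outputs are listed.
It is derived only from the two clock-output-names lists quoted above; the
function name is hypothetical and this is not the driver's actual code.

```c
#include <assert.h>

/*
 * Returns the divisor implied by output `index`, or 0 if the index is
 * out of range or the layout is not one of the two shown in the thread.
 */
static unsigned int demo_pll_divisor(unsigned int num_outputs,
				     unsigned int index)
{
	if (index >= num_outputs)
		return 0;
	if (num_outputs == 4)
		return index + 1;	/* pll0, div2, div3, div4 */
	if (num_outputs == 3)
		return 1u << index;	/* pll0, div2, div4 */
	return 0;
}
```

Under this reading, index 2 means div4 with three outputs but div3 with
four, which is the change in meaning being discussed.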

-Scott


Re: [1/3] powerpc/pmac: Fix DT refcount imbalance in pmac_pic_probe_oldstyle

2015-01-29 Thread Michael Ellerman

On Wed, 2015-14-01 at 13:51:57 UTC, Geert Uytterhoeven wrote:
> of_find_node_by_name() calls of_node_put() on its "from" parameter,
> which must not be done on "master", as it's still in use, and will be
> released manually later.  This may cause a zero kref refcount.
> Use of_get_child_by_name() instead to fix this.

But of_find_node_by_name() searches *all* nodes, not just the children of the
parameter.

So this is a logic change AFAICS, and I have no idea what machines we'd need to
test on to check it.

So I think an of_node_get(master) would be safer and also fix the refcounting.
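
A userspace model of the refcounting may help here. It assumes the contract
described in the quoted commit message (the lookup drops a reference on its
"from" argument and takes one on the match); every demo_* name is
hypothetical and the node list is a simplified flat list, not the real
device tree code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_node {
	const char *name;
	int refcount;
	struct demo_node *next;	/* flat "all nodes" list */
};

static struct demo_node *demo_node_get(struct demo_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void demo_node_put(struct demo_node *n)
{
	if (n) {
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/*
 * Model of the lookup: search every node after `from` (not only its
 * children), take a reference on the match and drop one on `from`.
 * In this model `from` is expected to be non-NULL.
 */
static struct demo_node *demo_find_by_name(struct demo_node *from,
					   const char *name)
{
	struct demo_node *n;

	for (n = from ? from->next : NULL; n; n = n->next)
		if (strcmp(n->name, name) == 0)
			break;
	demo_node_get(n);
	demo_node_put(from);
	return n;
}
```

Calling demo_find_by_name(demo_node_get(master), ...) keeps the caller's own
reference on master intact while leaving the search scope unchanged.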

cheers

Re: [V2] cpuidle/powernv: Read target_residency value of idle states from DT if available

2015-01-29 Thread Michael Ellerman

On Wed, 2015-28-01 at 02:13:06 UTC, Preeti U Murthy wrote:
> The device tree now exposes the residency values for different idle states.
> Read these values instead of calculating residency from the latency values.
> The values exposed in the DT are validated for optimal power efficiency.
> However to maintain compatibility with the older firmware code which does
> not expose residency values, use default values as a fallback mechanism.
> While at it, handle some cleanups.
> 
> Signed-off-by: Preeti U Murthy 
> Acked-by: Stewart Smith 

This looks good to me.

Acked-by: Michael Ellerman 

I'm assuming Rafael will take it.

cheers

RE: [PATCH][v4] power/fsl: add MDIO dt binding for FMan

2015-01-29 Thread Shaohui Xie

> -Original Message-
> From: Wood Scott-B07421
> Sent: Friday, January 30, 2015 10:44 AM
> To: Xie Shaohui-B21989
> Cc: linuxppc-dev@lists.ozlabs.org; devicet...@vger.kernel.org; Medve
> Emilian-EMMEDVE1
> Subject: Re: [PATCH][v4] power/fsl: add MDIO dt binding for FMan
> 
> On Thu, 2015-01-29 at 20:38 -0600, Xie Shaohui-B21989 wrote:
> > > -Original Message-
> > > From: Wood Scott-B07421
> > > Sent: Friday, January 30, 2015 8:54 AM
> > > To: shh@gmail.com
> > > Cc: linuxppc-dev@lists.ozlabs.org; devicet...@vger.kernel.org; Medve
> > > Emilian-EMMEDVE1; Xie Shaohui-B21989
> > > Subject: Re: [PATCH][v4] power/fsl: add MDIO dt binding for FMan
> > >
> > > On Wed, 2015-01-28 at 19:54 +0800, shh@gmail.com wrote:
> > > > +- interrupts
> > > > +   Usage: required
> > > > +   Value type: 
> > > > +   Definition: Event interrupt of external MDIO controller.
> > >
> > > What if this MDIO controller is not "external"?  Should Usage say
> > > "required for external MDIO"?
> > [S.H] I thought the definition can tell the interrupt is for external
> > MDIO, I can change the Usage to "required for external MDIO" in the next
> > version. How about the other parts, are they OK?
> 
> Yes.  I'll fix it up when applying.

Thank you!

Shaohui

RE: [PATCH][v4] power/fsl: add MDIO dt binding for FMan

2015-01-29 Thread Shaohui Xie

> -Original Message-
> From: Wood Scott-B07421
> Sent: Friday, January 30, 2015 8:54 AM
> To: shh@gmail.com
> Cc: linuxppc-dev@lists.ozlabs.org; devicet...@vger.kernel.org; Medve
> Emilian-EMMEDVE1; Xie Shaohui-B21989
> Subject: Re: [PATCH][v4] power/fsl: add MDIO dt binding for FMan
> 
> On Wed, 2015-01-28 at 19:54 +0800, shh@gmail.com wrote:
> > +- interrupts
> > +   Usage: required
> > +   Value type: 
> > +   Definition: Event interrupt of external MDIO controller.
> 
> What if this MDIO controller is not "external"?  Should Usage say
> "required for external MDIO"?
[S.H] I thought the definition can tell the interrupt is for external MDIO,
I can change the Usage to "required for external MDIO" in the next version.
How about the other parts, are they OK?

Thanks!
Shaohui

Re: [PATCH][v4] power/fsl: add MDIO dt binding for FMan

2015-01-29 Thread Scott Wood

On Thu, 2015-01-29 at 20:38 -0600, Xie Shaohui-B21989 wrote:
> > -Original Message-
> > From: Wood Scott-B07421
> > Sent: Friday, January 30, 2015 8:54 AM
> > To: shh@gmail.com
> > Cc: linuxppc-dev@lists.ozlabs.org; devicet...@vger.kernel.org; Medve
> > Emilian-EMMEDVE1; Xie Shaohui-B21989
> > Subject: Re: [PATCH][v4] power/fsl: add MDIO dt binding for FMan
> > 
> > On Wed, 2015-01-28 at 19:54 +0800, shh@gmail.com wrote:
> > > +- interrupts
> > > + Usage: required
> > > + Value type: 
> > > + Definition: Event interrupt of external MDIO controller.
> > 
> > What if this MDIO controller is not "external"?  Should Usage say
> > "required for external MDIO"?
> [S.H] I thought the definition can tell the interrupt is for external MDIO,
> I can change the Usage to "required for external MDIO" in the next version.
> How about the other parts, are they OK?

Yes.  I'll fix it up when applying.

-Scott


Re: [PATCHv2 0/8] Fix perf probe issues on powerpc

2015-01-29 Thread Michael Ellerman

On Wed, 2015-01-28 at 12:13 +0530, Naveen N. Rao wrote:
> On 2015/01/28 05:14PM, Michael Ellerman wrote:
> > On Wed, 2015-01-28 at 11:12 +0530, Naveen N. Rao wrote:
> > > On 2014/12/15 08:20PM, Naveen N Rao wrote:
> > > > This patchset fixes various issues with perf probe on powerpc across
> > > > ABIv1 and ABIv2:
> > > > - in the presence of DWARF debug-info,
> > > > - in the absence of DWARF, but with the symbol table, and
> > > > - in the absence of debug-info, but with kallsyms.
> > > > 
> > > > Applies cleanly on -tip. Tested on ppc64 BE and LE.
> > > > 
> > > > Changes from previous version:
> > > > Addressed various review comments from Mike Ellerman largely to
> > > > generalize changes. Some of the simpler patches have been retained in
> > > > their previous form to limit code churn, while others have been
> > > > generalized by introducing arch helpers. Individual patches have more
> > > > details.
> > > 
> > > Michael,
> > > Can you please take a quick look at this?
> > 
> > I merged patch 1.
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/mpe/linux.git/commit/?h=next&id=bf794bf52a80c6278a028f0af2ca32d7c3508c9b
> > 
> > The rest are not for me, they're perf tools, so you need to convince acme
> > they're good.
> 
> Oh, thanks! Sorry, I didn't realize you had already merged it.
> I assume you are ok with my changes in v2 w.r.t your previous review 
> comments.

Yeah it looks like you addressed most of my comments.

cheers


Re: [1/1] powerpc/iommu: Handling null return of kzalloc_node

2015-01-29 Thread Michael Ellerman

On Tue, 2014-10-06 at 07:32:10 UTC, Zhouyi Zhou wrote:
> NULL return of kzalloc_node should be handled 

Yeah it should.

But just returning doesn't seem like it's going to end well. We end up with a
device that's not properly set up.

I think we need to rework that further so that either the error is propagated
up the stack, or the device is left in a working but degraded state.
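
The first option, propagating the error, might look roughly like this in a
userspace model. The quoted setup functions return void, so giving them an
int return is an assumption for illustration only, and all demo_* names
(including the allocation-failure switch) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_table {
	int ready;
};

struct demo_dev {
	struct demo_table *table;
};

/* Hypothetical switch so the failure path can be exercised. */
static int demo_fail_alloc;

static struct demo_table *demo_alloc_table(void)
{
	if (demo_fail_alloc)
		return NULL;
	return calloc(1, sizeof(struct demo_table));
}

/* Returns 0 on success or -ENOMEM, instead of silently returning. */
static int demo_dma_setup(struct demo_dev *dev)
{
	struct demo_table *tbl = demo_alloc_table();

	if (!tbl)
		return -ENOMEM;

	tbl->ready = 1;
	dev->table = tbl;
	return 0;
}
```

A caller can then decide whether to fail the device outright or fall back
to a degraded mode, rather than continuing with a table that was never set.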

cheers


 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 33b552f..593cd3d 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -613,7 +613,11 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
>  
>   tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
>  pci->phb->node);
> -
> + if (!tbl) {
> + pr_debug(" out of memory, can't create iommu_table !\n");
> + return;
> + }
> +
>   iommu_table_setparms(pci->phb, dn, tbl);
>   pci->iommu_table = iommu_init_table(tbl, pci->phb->node);
>   iommu_register_group(tbl, pci_domain_nr(bus), 0);
> @@ -659,6 +663,10 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
>   if (!ppci->iommu_table) {
>   tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
>  ppci->phb->node);
> + if (!tbl) {
> + pr_debug(" out of memory, can't create iommu_table !\n");
> + return;
> + }
>   iommu_table_setparms_lpar(ppci->phb, pdn, tbl, dma_window);
>   ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node);
>   iommu_register_group(tbl, pci_domain_nr(bus), 0);
> @@ -686,6 +694,11 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
>   pr_debug(" --> first child, no bridge. Allocating iommu table.\n");
>   tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
>  phb->node);
> + if (!tbl) {
> + pr_debug(" out of memory, can't create iommu_table !\n");
> + return;
> + }
> +
>   iommu_table_setparms(phb, dn, tbl);
>   PCI_DN(dn)->iommu_table = iommu_init_table(tbl, phb->node);
>   iommu_register_group(tbl, pci_domain_nr(phb->bus), 0);
> @@ -1102,6 +1116,10 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
>   if (!pci->iommu_table) {
>   tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
>  pci->phb->node);
> + if (!tbl) {
> + pr_debug(" out of memory, can't create iommu_table !\n");
> + return;
> + }
>   iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window);
>   pci->iommu_table = iommu_init_table(tbl, pci->phb->node);
>   iommu_register_group(tbl, pci_domain_nr(pci->phb->bus), 0);

Re: [v3,1/2] powerpc/corenet: Enable muxing MDIO buses via GPIO

2015-01-29 Thread Scott Wood

On Thu, Jan 22, 2015 at 04:48:37AM -0600, Emil Medve wrote:
> From: Andy Fleming 
> 
> Change-Id: I4489db79957ad533f4ba3f04fe7d5bcb3288e981
> Signed-off-by: Andy Fleming 
> Signed-off-by: Shaohui Xie 
> Signed-off-by: Shruti Kanetkar 
> ---

scott@snotra:~/fsl/git/linux/upstream$ ./scripts/checkpatch.pl v3-1-2-powerpc-corenet-Enable-muxing-MDIO-buses-via-GPIO.patch
ERROR: Remove Gerrit Change-Id's before submitting upstream.
#21: 
Change-Id: I4489db79957ad533f4ba3f04fe7d5bcb3288e981

total: 1 errors, 0 warnings, 24 lines checked

v3-1-2-powerpc-corenet-Enable-muxing-MDIO-buses-via-GPIO.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

-Scott

Re: [v3,1/2] powerpc/corenet: Enable muxing MDIO buses via GPIO

2015-01-29 Thread Scott Wood

On Thu, Jan 22, 2015 at 04:48:37AM -0600, Emil Medve wrote:
> From: Andy Fleming 
> 
> Change-Id: I4489db79957ad533f4ba3f04fe7d5bcb3288e981
> Signed-off-by: Andy Fleming 
> Signed-off-by: Shaohui Xie 
> Signed-off-by: Shruti Kanetkar 
> ---

These patches are missing your signoff.  Everyone who passes the patch
along needs to sign off.

-Scott

Re: [PATCH] powerpc/pseries: Avoid context switch in EEH reset if required

2015-01-29 Thread Gavin Shan

On Wed, Jan 28, 2015 at 10:58:42AM +1100, Benjamin Herrenschmidt wrote:
>On Tue, 2015-01-27 at 16:58 -0600, Brian King wrote:
>> I'd argue we are our own worst enemy here really. The new user is EEH
>> code.
>> I don't see a huge reason that code would need to use this exact same
>> API.
>> 
>> > In fact, even with IPR and the existing call, how do you wait for
>> the link to come
>> > back for a PERST ? That can take a while...
>> 
>> Basically, I assert reset, delay for 1/2 second via a timer interrupt,
>> deassert reset,
>> delay for 2 seconds via another timer interrupt, then proceed with
>> adapter initialization.
>
>I'm surprised that even works properly... for example in the case of
>PERST we need to mask various error traps before asserting and unmask
>them when the link comes up (such as the surprise link down error), I
>don't see an opportunity in that scheme for FW to do that latter...
>

The FW perhaps does more than it's supposed to do for assert, and
less than it's supposed to do for deassert, but I need to confirm that
with the FW developers. In this case, the link should come up within 1/2
second, which is really short. Otherwise, the FW needs to implement the
deassert function in blocking mode to wait for the link to come back,
which forces the API to be called in non-atomic context. I'll check with
the FW developers on this later.
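
The sequence Brian describes in the quoted text (assert reset, wait 1/2
second, deassert, wait 2 seconds, then initialize) can be driven entirely
from timer callbacks, so no step has to sleep. A minimal sketch in plain C
follows. The names reset_advance and enum reset_state are hypothetical and
are not taken from the IPR driver or the EEH code; the delays are the ones
quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states for a timer-driven reset sequence. */
enum reset_state {
	RESET_IDLE,        /* nothing started yet */
	RESET_ASSERTED,    /* reset line asserted, waiting 500 ms */
	RESET_DEASSERTED,  /* reset line released, waiting 2000 ms */
	RESET_DONE         /* adapter initialization may proceed */
};

/*
 * Advance the sequence by one step. Called once to start, then again
 * from each timer expiry. Returns the delay in milliseconds to arm
 * before the next call, or 0 when no further timer is needed.
 * Nothing here blocks, which is the property that matters in atomic
 * context.
 */
static unsigned int reset_advance(enum reset_state *state)
{
	assert(state != NULL);

	switch (*state) {
	case RESET_IDLE:
		*state = RESET_ASSERTED;    /* a driver would assert reset here */
		return 500;
	case RESET_ASSERTED:
		*state = RESET_DEASSERTED;  /* a driver would deassert reset here */
		return 2000;
	case RESET_DEASSERTED:
		*state = RESET_DONE;        /* a driver would start initialization */
		return 0;
	case RESET_DONE:
	default:
		return 0;
	}
}
```

If the firmware's deassert call had to block until the link came back, the
RESET_ASSERTED step could no longer run from a timer callback, which is the
non-atomic requirement discussed above.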

I guess we have to change the API to be called in non-atomic context in
the long run. For now, Wendy is waiting for the fix so it can be ported to
RHEL7.1. I also sent an alternative patch, which was verified by Wendy.
Would it be reasonable to include the following patch now and change the
driver's code to call this API from non-atomic context later, as a
follow-up enhancement?

https://patchwork.ozlabs.org/patch/432065/

Thanks,
Gavin


Re: [1/4] powerpc/fsl-booke: Add device tree support for T1024/T1023 SoC

2015-01-29 Thread Scott Wood

On Thu, Jan 29, 2015 at 03:52:24PM +0800, Shengzhou Liu wrote:
> +/include/ "qoriq-i2c-0.dtsi"
> +/include/ "qoriq-i2c-1.dtsi"

t1023 has only three i2c controllers -- where do you disable the fourth? 

> +/include/ "t1023si-post.dtsi"
> +
> +/ {
> + aliases {
> + vga = &display;
> + display = &display;
> + };
> +};
> +
> +&soc {
> + display:display@18 {
> + compatible = "fsl,t1024-diu", "fsl,diu";
> + reg = <0x18 1000>;
> + interrupts = <74 2 0 0>;
> + };
> +};

There are other differences between t1023 and t1024.  Where do you
describe t1024's QE?  Where do you describe the DDR and IFC differences? 
Can they be detected at runtime?  t1024 supports deep sleep, but t1023
doesn't -- yet you label both chips as having t1024 rcpm.

-Scott

Re: [1/4] powerpc/fsl-booke: Add device tree support for T1024/T1023 SoC

2015-01-29 Thread Scott Wood

On Thu, Jan 29, 2015 at 03:52:24PM +0800, Shengzhou Liu wrote:
> + corenet-cf@18000 {
> + compatible = "fsl,corenet2-cf";

While the damage has already been done by the t1040 device tree, this is
not 100% compatible with what's on t4240.  I'm not sure if it's worth
doing anything about it at this point, given that you can tell the
difference by the version register even though that register is reserved
on t4240 and similar chips, which is what I do in
http://patchwork.ozlabs.org/patch/419911/

> + reg = <0x18000 0x1000>;
> + interrupts = <16 2 1 31>;
> + fsl,ccf-num-csdids = <32>;
> + fsl,ccf-num-snoopids = <32>;

The t1040/t1024 CCM does not have CSD/Snoop IDs.

> + };
> +
> + iommu@2 {
> + compatible = "fsl,pamu-v1.0", "fsl,pamu";
> + reg = <0x2 0x1000>;
> + ranges = <0 0x2 0x1000>;
> + #address-cells = <1>;
> + #size-cells = <1>;
> + interrupts = <
> + 24 2 0 0
> + 16 2 1 30>;
> + pamu0: pamu@0 {
> + reg = <0 0x1000>;
> + fsl,primary-cache-geometry = <128 1>;
> + fsl,secondary-cache-geometry = <16 2>;
> + };

The secondary cache has 32 entries, not 16.

Please verify all information when submitting a device tree.  Don't just
copy and paste.

> + };
> +
> +/include/ "qoriq-mpic.dtsi"
> +
> + guts: global-utilities@e {
> + compatible = "fsl,t1024-device-config", "fsl,qoriq-device-config-2.0";

Has anyone checked whether these "-2.0" properties make sense on
t1040/t1024?

> +/include/ "elo3-dma-0.dtsi"
> +/include/ "elo3-dma-1.dtsi"
> +
> +/include/ "qoriq-espi-0.dtsi"
> + spi@11 {
> + fsl,espi-num-chipselects = <4>;
> + };
> +
> +/include/ "qoriq-esdhc-0.dtsi"
> + sdhc@114000 {
> + compatible = "fsl,t1024-esdhc", "fsl,esdhc";
> + fsl,iommu-parent = <&pamu0>;
> + fsl,liodn-reg = <&guts 0x530>; /* eSDHCLIODNR */
> + sdhci,auto-cmd12;
> + no-1-8-v;
> + sleep = <&rcpm 0x0080>;
> + };
> +/include/ "qoriq-i2c-0.dtsi"
> +/include/ "qoriq-i2c-1.dtsi"
> +/include/ "qoriq-duart-0.dtsi"
> +/include/ "qoriq-duart-1.dtsi"
> +/include/ "qoriq-gpio-0.dtsi"
> + gpio@13 {
> + sleep = <&rcpm 0x0040>;
> + };
> +/include/ "qoriq-gpio-1.dtsi"
> +/include/ "qoriq-gpio-2.dtsi"
> +/include/ "qoriq-gpio-3.dtsi"
> +/include/ "qoriq-usb2-mph-0.dtsi"
> + usb0: usb@21 {
> + compatible = "fsl-usb2-mph-v2.5", "fsl-usb2-mph";
> + fsl,iommu-parent = <&pamu0>;
> + fsl,liodn-reg = <&guts 0x520>; /* USB1LIODNR */
> + phy_type = "utmi";
> + sleep = <&rcpm 0x0020>;
> + port0;
> + };
> +/include/ "qoriq-usb2-dr-0.dtsi"
> + usb1: usb@211000 {
> + compatible = "fsl-usb2-dr-v2.5", "fsl-usb2-dr";
> + fsl,iommu-parent = <&pamu0>;
> + fsl,liodn-reg = <&guts 0x524>; /* USB2LIODNR */
> + dr_mode = "host";
> + phy_type = "utmi";
> + sleep = <&rcpm 0x0010>;
> + };
> +/include/ "qoriq-sata2-0.dtsi"
> +sata@22 {
> + fsl,iommu-parent = <&pamu0>;
> + fsl,liodn-reg = <&guts 0x550>; /* SATA1LIODNR */
> +};
> +
> +/include/ "qoriq-sec5.0-0.dtsi"
> +};

Please fix indentation.

> +/ {
> + compatible = "fsl,T104x";

Drop this.

-Scott

Re: [PATCH][v4] power/fsl: add MDIO dt binding for FMan

2015-01-29 Thread Scott Wood

On Wed, 2015-01-28 at 19:54 +0800, shh@gmail.com wrote:
> +- interrupts
> + Usage: required
> + Value type: 
> + Definition: Event interrupt of external MDIO controller.

What if this MDIO controller is not "external"?  Should Usage say
"required for external MDIO"?

-Scott



Re: AW: SPE & Interrupt context (was how to make use of SPE instructions)

2015-01-29 Thread Scott Wood

On Wed, 2015-01-28 at 05:00 +, Markus Stockhausen wrote:
> > > From: Scott Wood [scottw...@freescale.com]
> > > Sent: Wednesday, January 28, 2015 05:21
> > > To: Markus Stockhausen
> > > Cc: Michael Ellerman; linuxppc-dev@lists.ozlabs.org; Herbert Xu
> > > Subject: Re: SPE & Interrupt context (was how to make use of SPE instructions)
> > > 
> > > Hi Scott,
> > >
> > > thanks for your helpful feedback. As you might have seen I sent a first
> > > patch for the sha256 kernel module that takes care of preemption.
> > >
> > > Herbert Xu noticed that my module won't work for IPsec as all
> > > work will be done from interrupt context. Do you have a tip how I can
> > > relax the check I implemented:
> > >
> > > static bool spe_usable(void)
> > > {
> > >   return !in_interrupt();
> > > }
> > >
> > > Intel guys have something like that
> > >
> > > bool irq_fpu_usable(void)
> > > {
> > >   return !in_interrupt() ||
> > > interrupted_user_mode() ||
> > > interrupted_kernel_fpu_idle();
> > > }
> > >
> > > But I have no idea how to transfer it to the PPC/SPE case.
> > 
> > I'm not sure what sort of tip you're looking for, other than
> > implementing it myself. :-)
> 
> Hi Scott,
> 
> maybe I did not explain it correctly. interrupted_kernel_fpu_idle()
> is x86 specific. The same applies to interrupted_user_mode().
> I'm just searching for a similar feature in the PPC/SPE world.

There isn't one.

> I can see that enable_kernel_spe() does something with the
> MSR_SPE flag, but I have no idea  how to determine if I'm allowed
> to enable SPE although I'm inside an interrupt context.

As with x86, you'd want to check whether the kernel interrupted
userspace.  I don't know what x86 is doing with TS, but on PPC you might
check whether the interrupted thread had MSR_FP enabled.
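
To make that suggestion concrete, here is a user-space model of such a
check, not kernel code. It assumes the decision can be made from the MSR
value saved for the interrupted context: usable if we are not in an
interrupt, if the interrupt came from user mode (problem-state bit set), or
if the interrupted kernel code did not have the SPE enable bit set. The
struct, the function name and the MODEL_MSR_* constants are hypothetical;
the bit positions are assumptions and should be checked against asm/reg.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define MODEL_MSR_PR  (UINT32_C(1) << 14)  /* problem state, i.e. user mode */
#define MODEL_MSR_SPE (UINT32_C(1) << 25)  /* SPE enabled */

/* Stand-in for the saved register frame of the interrupted context. */
struct model_regs {
	uint32_t msr;
};

/*
 * Model of an "is SPE usable here?" predicate, by analogy with the
 * x86 irq_fpu_usable() quoted above. regs is only consulted when
 * in_interrupt is true.
 */
static bool model_spe_usable(bool in_interrupt, const struct model_regs *regs)
{
	if (!in_interrupt)
		return true;            /* process context: preemption control suffices */

	assert(regs != NULL);

	if (regs->msr & MODEL_MSR_PR)
		return true;            /* interrupted userspace */

	/* Interrupted the kernel: usable only if it was not using SPE. */
	return !(regs->msr & MODEL_MSR_SPE);
}
```

In the user-mode case a real implementation would still have to save the
user's SPE state before touching the registers; the model only covers the
decision, not the save and restore.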

> I'm asking because from the previous posts I conclude that 
> running SPE instructions inside an interrupt might be critical. 
> Because of registers not being saved?

Yes.  Currently callers of enable_kernel_spe() only need to disable
preemption, not interrupts.

> Or can I just save the register contents myself and interrupt
> context is no longer a showstopper?

If you only need a small number of registers that might be reasonable,
but if you need a bunch then you don't want to save them when you don't
have to.

Another option is to change enable_kernel_spe() to require interrupts to
be disabled.

-Scott



[PATCH] powernv: Add OPAL soft-poweroff routine

2015-01-29 Thread Joel Stanley

Register a notifier for an OPAL message indicating that the machine
should prepare itself for a graceful power off.

OPAL will tell us if the power off is a reboot or shutdown, but for now
we perform the same orderly_poweroff action.
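
As an illustration of the decoding step only (a user-space sketch, not the
patch itself): the first message parameter is a 64-bit big-endian value,
with 0 meaning shutdown and 1 meaning reboot. The names load_be64,
classify_power_request and enum power_action are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum power_action { ACTION_POWEROFF, ACTION_REBOOT, ACTION_UNKNOWN };

/* Read a 64-bit big-endian value from raw bytes, whatever the host order. */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	assert(p != NULL);
	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Map the first message parameter to an action: 0 = shutdown, 1 = reboot. */
static enum power_action classify_power_request(const unsigned char *param0)
{
	switch (load_be64(param0)) {
	case 0:
		return ACTION_POWEROFF;
	case 1:
		return ACTION_REBOOT;
	default:
		return ACTION_UNKNOWN;
	}
}
```

Reading the bytes explicitly keeps the sketch independent of host byte
order; in the kernel the same job is done by be64_to_cpu().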

Signed-off-by: Joel Stanley 
---
 arch/powerpc/include/asm/opal.h |  2 +-
 arch/powerpc/platforms/powernv/Makefile |  2 +-
 arch/powerpc/platforms/powernv/opal-power.c | 68 +
 3 files changed, 70 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-power.c

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index ca2dd45..cdf32c0 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -250,7 +250,7 @@ enum OpalMessageType {
 */
OPAL_MSG_MEM_ERR,
OPAL_MSG_EPOW,
-   OPAL_MSG_SHUTDOWN,
+   OPAL_MSG_SHUTDOWN,  /* params[0] = 1 reboot, 0 shutdown */
OPAL_MSG_HMI_EVT,
OPAL_MSG_TYPE_MAX,
 };
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index f241acc..6f3c5d3 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,7 +1,7 @@
 obj-y  += setup.o opal-wrappers.o opal.o opal-async.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
-obj-y  += opal-msglog.o opal-hmi.o
+obj-y  += opal-msglog.o opal-hmi.o opal-power.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-p5ioc2.o pci-ioda.o
diff --git a/arch/powerpc/platforms/powernv/opal-power.c b/arch/powerpc/platforms/powernv/opal-power.c
new file mode 100644
index 000..c047d7e
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-power.c
@@ -0,0 +1,68 @@
+/*
+ * PowerNV OPAL power control for graceful shutdown handling
+ *
+ * Copyright 2015 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#define SOFT_OFF 0x00
+#define SOFT_REBOOT 0x01
+
+static int opal_power_control_event(struct notifier_block *nb,
+   unsigned long msg_type, void *msg)
+{
+   struct opal_msg *power_msg = msg;
+   uint64_t type;
+
+   type = be64_to_cpu(power_msg->params[0]);
+
+   switch (type) {
+   case SOFT_OFF:
+   pr_info("OPAL: poweroff requested\n");
+   orderly_poweroff(true);
+   break;
+   case SOFT_REBOOT:
+   pr_info("OPAL: reboot requested\n");
+   /* TODO: shutdown such that userspace knows it's rebooting */
+   orderly_poweroff(true);
+   break;
+   default:
+   pr_err("OPAL: power control type unexpected %016llx\n", type);
+   }
+
+   return 0;
+}
+
+static struct notifier_block opal_power_control_nb = {
+   .notifier_call  = opal_power_control_event,
+   .next   = NULL,
+   .priority   = 0,
+};
+
+static int __init opal_power_control_init(void)
+{
+   int ret;
+
+   ret = opal_message_notifier_register(OPAL_MSG_SHUTDOWN,
+&opal_power_control_nb);
+   if (ret) {
+   pr_err("%s: Can't register OPAL event notifier (%d)\n",
+   __func__, ret);
+   return ret;
+   }
+
+   return 0;
+}
+
+machine_subsys_initcall(powernv, opal_power_control_init);
-- 
2.1.4


Re: [PATCH v2 0/4] VPHN parsing fixes

2015-01-29 Thread Greg Kurz

On Wed, 17 Dec 2014 10:40:46 +0100
Greg Kurz  wrote:
> Hi,
> 
> This series addresses remarks from Ben and Michael (see individual patches).
> The most notable changes are:
> - the parsing code being pulled out into a separate file in patch 3/4. This
>   allows writing userland tests like the one below.
> - a full rewrite of the parsing logic in patch 4/4
> 

Ping ?

> --
> #include 
> #include 
> 
> typedef unsigned long u64;
> typedef unsigned int u32;
> typedef unsigned short u16;
> typedef enum { false = 0, true } bool;
> 
> #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> #define cpu_to_be32(x)bswap_32(x)
> #define be32_to_cpu(x)bswap_32(x)
> #define be16_to_cpup(x)   bswap_16(*x)
> #define cpu_to_be64(x)bswap_64(x)
> #else
> #define cpu_to_be32(x)(x)
> #define be32_to_cpu(x)(x)
> #define be16_to_cpup(x)   (*x)
> #define cpu_to_be64(x)(x)
> #endif
> 
> #define pr_debug(...) printf(__VA_ARGS__)
> 
> #include "vphn.c"
> 
> void print_packed(const long *packed)
> {
>   char *p = (char*) packed;
>   int i;
> 
>   printf("\nRegisters:\n");
>   for (i = 0; i < VPHN_REGISTER_COUNT; i++)
>   printf("0x%016lx\n", packed[i]);
> 
>   printf("\nMemory layout:\n");
>   for (i = 0; i < 6; i++) {
>   printf("0x %02hhx %02hhx %02hhx %02hhx"
>  " %02hhx %02hhx %02hhx %02hhx\n",
>  *(p + 0), *(p + 1), *(p + 2), *(p + 3),
>  *(p + 4), *(p + 5), *(p + 6), *(p + 7));
>   p += 8;
>   }
> 
>   putchar('\n');
> }
> 
> void print_unpacked(const __be32 *unpacked)
> {
>   int i;
> 
>   printf("\nVPHN associativity:\n");
>   for (i = 0; i <= be32_to_cpu(unpacked[0]); i++)
>   printf("0x%08x\n", be32_to_cpu(unpacked[i]));
> 
>   putchar('\n');
> }
> 
> int main(int argc, char **argv)
> {
>   int i;
>   struct {
>   const char *descr;
>   long packed[VPHN_REGISTER_COUNT];
>   } data[] = {
>   {
>   "16-bit and 32-bit",
>   0x8001800280038004,
>   0x8005800680078008,
>   0x0009000a,
>   0x000b000c,
>   0x,
>   0x
>   },
>   {
>   "filled with 16-bit",
>   0x8001800280038004,
>   0x8005800680078008,
>   0x8009800a800b800c,
>   0x800d800e800f8010,
>   0x8011801280138014,
>   0x8015801680178018,
>   },
>   {
>   "filled with 32-bit",
>   0x00010002,
>   0x00030004,
>   0x00050006,
>   0x00070008,
>   0x0009000a,
>   0x000b000c,
>   },
>   {
>   "32-bit has all ones in 16 lower bits",
>   0x000180028003,
>   0x,
>   0x,
>   0x,
>   0x,
>   0x,
>   },
>   {
>   "32-bit across two 64-bit registers",
>   0x80010002,
>   0x000300048005,
>   0x,
>   0x,
>   0x,
>   0x,
>   },
>   {
>   "Truncated last 32-bit",
>   0x00010002,
>   0x00030004,
>   0x00050006,
>   0x00070008,
>   0x0009000a,
>   0x000b800c0bad,
>   },
>   };
> 
>   for (i = 0; i < sizeof(data) / sizeof(data[0]); i++) {
>   __be32 unpacked[VPHN_ASSOC_BUFSIZE] = { 0 };
>   
> printf("\n==\n");
>   printf("\nSet #%d: %s\n", i, data[i].descr);
>   
> printf("\n==\n");
>   print_packed(data[i].packed);
>   vphn_unpack_associativity(data[i].packed, unpacked);
>   print_unpacked(unpacked);
>   }
> 
>   return 0;
> }
> 
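
A rough sketch of the field format that the test sets above appear to
exercise. It assumes, and this is not stated in the cover letter, that the
packed data is a stream of 16-bit fields in which a set top bit marks a
15-bit value, a clear top bit marks the upper half of a 32-bit value whose
lower half is the next field, and 0xffff in a leading position ends the
list. All names are hypothetical, and the fields are taken in host order
rather than as big-endian registers, so this is not the code from vphn.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIELD_UNUSED 0xffffu  /* assumed end-of-list marker */
#define FIELD_MSB    0x8000u  /* assumed "this is a 15-bit value" flag */
#define FIELD_MASK   0x7fffu  /* payload bits of a 15-bit value */

/*
 * Unpack up to nfields 16-bit fields into out[1..], storing the number
 * of unpacked values in out[0]. A 32-bit value that lacks its lower
 * half because the input ends is dropped. Returns the value count.
 */
static size_t unpack_fields(const uint16_t *fields, size_t nfields,
			    uint32_t *out, size_t out_cap)
{
	size_t n = 0;  /* values written so far */
	size_t i = 0;  /* next field to read */

	assert(fields != NULL && out != NULL && out_cap > 0);

	while (i < nfields && n + 1 < out_cap) {
		uint16_t f = fields[i];

		if (f == FIELD_UNUSED)
			break;                      /* end of list */

		if (f & FIELD_MSB) {
			out[++n] = f & FIELD_MASK;  /* 15-bit value */
			i += 1;
		} else {
			if (i + 1 >= nfields)
				break;              /* truncated 32-bit value */
			/* The lower half may legitimately be 0xffff. */
			out[++n] = ((uint32_t)f << 16) | fields[i + 1];
			i += 2;
		}
	}

	out[0] = (uint32_t)n;
	return n;
}
```

The lower half of a 32-bit value is consumed without checking it against
the end marker, which corresponds to the "32-bit has all ones in 16 lower
bits" case in the test data.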
> ---
> 
> Greg Kurz (4):
>   powerpc/vphn: clarify the H_HOME_NODE_ASSOCIATIVITY API
>   powerpc/vphn: move endianness fixing to vphn_unpack_associativity()
>   powerpc/vphn: move VPHN parsing logic

Re: [PATCH] mmc: sdhci: Apply FSL ESDHC reset handling quirk to OF

2015-01-29 Thread Ulf Hansson

On 28 January 2015 at 20:52, Martin Hicks  wrote:
>
> The reset code was pushed into the esdhc-imx driver, but missed being
> pushed into the FSL OF driver at the same time.  The commit that broke
> the OF ESDHC driver was 0718e59ae259f7c48155b4e852d8b0632d59028e
>
> Signed-off-by: Martin Hicks 

Martin, thanks for the patch. Though I have already queued a patch for
this issue. It's available on my next branch.

Alessio Igor Bogani 
mmc: sdhci: Fix FSL ESDHC reset handling quirk

Kind regards
Uffe

> ---
>  drivers/mmc/host/sdhci-of-esdhc.c |   10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/host/sdhci-of-esdhc.c b/drivers/mmc/host/sdhci-of-esdhc.c
> index 8872c85..4a654d4 100644
> --- a/drivers/mmc/host/sdhci-of-esdhc.c
> +++ b/drivers/mmc/host/sdhci-of-esdhc.c
> @@ -276,6 +276,14 @@ static void esdhc_pltfm_set_bus_width(struct sdhci_host *host, int width)
> ESDHC_CTRL_BUSWIDTH_MASK, ctrl);
>  }
>
> +static void esdhc_reset(struct sdhci_host *host, u8 mask)
> +{
> +   sdhci_reset(host, mask);
> +
> +   sdhci_writel(host, host->ier, SDHCI_INT_ENABLE);
> +   sdhci_writel(host, host->ier, SDHCI_SIGNAL_ENABLE);
> +}
> +
>  static const struct sdhci_ops sdhci_esdhc_ops = {
> .read_l = esdhc_readl,
> .read_w = esdhc_readw,
> @@ -290,7 +298,7 @@ static const struct sdhci_ops sdhci_esdhc_ops = {
> .platform_init = esdhc_of_platform_init,
> .adma_workaround = esdhci_of_adma_workaround,
> .set_bus_width = esdhc_pltfm_set_bus_width,
> -   .reset = sdhci_reset,
> +   .reset = esdhc_reset,
> .set_uhs_signaling = sdhci_set_uhs_signaling,
>  };
>
> --
> 1.7.10.4
>
>
> --
> Martin Hicks P.Eng.|  m...@bork.org
> Bork Consulting Inc.   |  +1 (613) 266-2296
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/4] powerpc/fsl-booke: Add T1023 RDB board support

2015-01-29 Thread Shengzhou Liu

T1023RDB is a Freescale Reference Design Board that hosts T1023 SoC.

T1023RDB board Overview
---
- T1023 SoC integrating two 64-bit e5500 cores up to 1.4GHz
- CoreNet fabric supporting coherent and noncoherent transactions with
  prioritization and bandwidth allocation
- SDRAM memory: 2GB Micron MT40A512M8HX unbuffered 32-bit DDR4 without ECC
- Accelerator: DPAA components consist of FMan, BMan, QMan, DCE and SEC
- Ethernet interfaces:
  - one 1G RGMII port on-board(RTL8211F PHY)
  - one 1G SGMII port on-board(RTL8211F PHY)
  - one 2.5G SGMII port on-board(AQR105 PHY)
- PCIe: Two Mini-PCIe connectors on-board.
- SerDes: 4 lanes up to 10.3125GHz
- NOR:  128MB S29GL01GS110TFIV10 Spansion NOR Flash
- NAND: 512MB S34MS04G200BFI000 Spansion NAND Flash
- eSPI: 64MB S25FL512SAGMFI010 Spansion SPI flash.
- USB: one Type-A USB 2.0 port with internal PHY
- eSDHC: support SD/MMC and eMMC card
- 256Kbit M24256 I2C EEPROM
- RTC: Real-time clock DS1339 on I2C bus
- UART: one serial port on-board with RJ45 connector
- Debugging: JTAG/COP for T1023 debugging

Signed-off-by: Shengzhou Liu 
---
 arch/powerpc/boot/dts/t1023rdb.dts| 150 ++
 arch/powerpc/platforms/85xx/corenet_generic.c |   1 +
 2 files changed, 151 insertions(+)
 create mode 100644 arch/powerpc/boot/dts/t1023rdb.dts

diff --git a/arch/powerpc/boot/dts/t1023rdb.dts b/arch/powerpc/boot/dts/t1023rdb.dts
new file mode 100644
index 000..b187cfe
--- /dev/null
+++ b/arch/powerpc/boot/dts/t1023rdb.dts
@@ -0,0 +1,150 @@
+/*
+ * T1023 RDB Device Tree Source
+ *
+ * Copyright 2014 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor "AS IS" AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/include/ "fsl/t102xsi-pre.dtsi"
+
+/ {
+   model = "fsl,T1023RDB";
+   compatible = "fsl,T1023RDB";
+   #address-cells = <2>;
+   #size-cells = <2>;
+   interrupt-parent = <&mpic>;
+
+   ifc: localbus@ffe124000 {
+   reg = <0xf 0xfe124000 0 0x2000>;
+   ranges = <0 0 0xf 0xe800 0x0800
+ 1 0 0xf 0xff80 0x0001>;
+
+   nor@0,0 {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   compatible = "cfi-flash";
+   reg = <0x0 0x0 0x800>;
+   bank-width = <2>;
+   device-width = <1>;
+   };
+
+   nand@1,0 {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   compatible = "fsl,ifc-nand";
+   reg = <0x2 0x0 0x1>;
+   };
+   };
+
+   memory {
+   device_type = "memory";
+   };
+
+   dcsr: dcsr@f {
+   ranges = <0x 0xf 0x 0x01072000>;
+   };
+
+   soc: soc@ffe00 {
+   ranges = <0x 0xf 0xfe00 0x100>;
+   reg = <0xf 0xfe00 0 0x1000>;
+   spi@11 {
+   flash@0 {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   compatible = "spansion,s25fl512s";
+   reg = <0>;
+

[PATCH 1/4] powerpc/fsl-booke: Add device tree support for T1024/T1023 SoC

2015-01-29 Thread Shengzhou Liu

The T1024 SoC includes the following functions and features:
- Two 64-bit Power architecture e5500 cores, up to 1.4GHz
- Private 256KB L2 cache per core and a shared 256KB CoreNet platform cache (CPC)
- 32-/64-bit DDR3L/DDR4 SDRAM memory controller with ECC and interleaving support
- Data Path Acceleration Architecture (DPAA) incorporating acceleration
- Four MACs for 1G/2.5G/10G network interfaces (RGMII, SGMII, QSGMII, XFI)
- High-speed peripheral interfaces
  - Three PCI Express 2.0 controllers
- Additional peripheral interfaces
  - One SATA 2.0 controller
  - Two USB 2.0 controllers with integrated PHY
  - Enhanced secure digital host controller (SD/eSDHC/eMMC)
  - Enhanced serial peripheral interface (eSPI)
  - Four I2C controllers
  - Four 2-pin UARTs or two 4-pin UARTs
  - Integrated Flash Controller supporting NAND and NOR flash
- Two 8-channel DMA engines
- Multicore programmable interrupt controller (PIC)
- LCD interface (DIU) with 12 bit dual data rate
- QUICC Engine block supporting TDM, HDLC, and UART
- Deep Sleep power implementation (wakeup from GPIO/Timer/Ethernet/USB)
- Support for hardware virtualization and partitioning enforcement
- QorIQ Platform's Trust Architecture 2.0

Signed-off-by: Shengzhou Liu 
---
 arch/powerpc/boot/dts/fsl/t1023si-post.dtsi | 351 
 arch/powerpc/boot/dts/fsl/t1024si-post.dtsi |  50 
 arch/powerpc/boot/dts/fsl/t102xsi-pre.dtsi  |  88 +++
 3 files changed, 489 insertions(+)
 create mode 100644 arch/powerpc/boot/dts/fsl/t1023si-post.dtsi
 create mode 100644 arch/powerpc/boot/dts/fsl/t1024si-post.dtsi
 create mode 100644 arch/powerpc/boot/dts/fsl/t102xsi-pre.dtsi

diff --git a/arch/powerpc/boot/dts/fsl/t1023si-post.dtsi b/arch/powerpc/boot/dts/fsl/t1023si-post.dtsi
new file mode 100644
index 000..23fbc5d
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/t1023si-post.dtsi
@@ -0,0 +1,351 @@
+/*
+ * T1023 Silicon/SoC Device Tree Source (post include)
+ *
+ * Copyright 2014 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+&ifc {
+   #address-cells = <2>;
+   #size-cells = <1>;
+   compatible = "fsl,ifc", "simple-bus";
+   interrupts = <25 2 0 0>;
+};
+
+&pci0 {
+   compatible = "fsl,t1024-pcie", "fsl,qoriq-pcie-v2.4", "fsl,qoriq-pcie";
+   device_type = "pci";
+   #size-cells = <2>;
+   #address-cells = <3>;
+   bus-range = <0x0 0xff>;
+   interrupts = <20 2 0 0>;
+   fsl,iommu-parent = <&pamu0>;
+   pcie@0 {
+   reg = <0 0 0 0 0>;
+   #interrupt-cells = <1>;
+   #size-cells = <2>;
+   #address-cells = <3>;
+   device_type = "pci";
+   interrupts = <20 2 0 0>;
+   interrupt-map-mask = <0xf800 0 0 7>;
+   interrupt-map = <
+   /* IDSEL 0x0 */
+    0 0 1 &mpic 40 1 0 0
+    0 0 2 &mpic 1 1 0 0
+    0 0 3 &mpic 2 1 0 0
+    0 0 4 &mpic 3 1 0 0
+   >;
+   };
+};
+
+&pci1 {
+   compatible = "fsl,t1024-pcie", "fsl,qoriq-pcie-v2.4", "fsl,qoriq-pcie";
+   device_type = "pci";
+   #size-cells = <2>;
+   #address-cells = <3>;

[PATCH 2/4] powerpc/fsl-booke: Add T1024 QDS board support

2015-01-29 Thread Shengzhou Liu

Add support for Freescale T1024/T1023 QorIQ Development System Board.

T1024QDS is a high-performance computing evaluation, development and
test platform for T1024 QorIQ Power Architecture processor.

T1024QDS board Overview
---
- T1024 SoC integrating two 64-bit e5500 cores up to 1.4GHz
- CoreNet fabric supporting coherent and noncoherent transactions with
  prioritization and bandwidth allocation
- 32-/64-bit DDR3L/DDR4 SDRAM memory controller with ECC and interleaving support
- Accelerator: DPAA components consist of FMan, BMan, QMan, PME, DCE and SEC
- Ethernet interfaces:
  - Two 10M/100M/1G RGMII ports on-board
  - Three 1G/2.5Gbps SGMII ports
  - Four 1Gbps QSGMII ports
  - one 10Gbps XFI or 10G Base-KR interface
- SerDes: 4 lanes up to 10.3125GHz supporting SGMII/QSGMII, XFI, PCIe, SATA and Aurora
- PCIe: Three PCI Express controllers with five PCIe slots.
- IFC: 128MB NOR Flash, 2GB NAND Flash, PromJet debug port and Qixis FPGA
- Video: DIU supports video up to 1280x1024x32 bpp.
  - Chrontel CH7201 for HDMI connection.
  - TI DS90C387R for direct LCD connection.
  - Raw (not encoded) video connector for testing or other encoders.
- QUICC Engine block
  - 32-bit RISC controller for flexible support of the communications peripherals
  - Serial DMA channel for receive and transmit on all serial channels
  - Two universal communication controllers, supporting TDM, HDLC, and UART
- Deep sleep power management implementation (wakeup from GPIO/Timer/Ethernet/USB)
- eSPI: Three SPI flash devices.
- SATA: one SATA 2.0.
- USB: Two USB2.0 ports with internal PHY (one Type-A and one micro Type mini-AB)
- eSDHC: Support SD, SDHC, SDXC and MMC/eMMC.
- I2C: Four I2C controllers.
- UART: Two UART on board.

Signed-off-by: Shengzhou Liu 
---
 arch/powerpc/boot/dts/t1024qds.dts|  46 +
 arch/powerpc/boot/dts/t102xqds.dtsi   | 247 ++
 arch/powerpc/platforms/85xx/Kconfig   |   2 +-
 arch/powerpc/platforms/85xx/corenet_generic.c |   1 +
 4 files changed, 295 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/boot/dts/t1024qds.dts
 create mode 100644 arch/powerpc/boot/dts/t102xqds.dtsi

diff --git a/arch/powerpc/boot/dts/t1024qds.dts b/arch/powerpc/boot/dts/t1024qds.dts
new file mode 100644
index 000..30d0d51
--- /dev/null
+++ b/arch/powerpc/boot/dts/t1024qds.dts
@@ -0,0 +1,46 @@
+/*
+ * T1024 QDS Device Tree Source
+ *
+ * Copyright 2014 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor "AS IS" AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/include/ "fsl/t102xsi-pre.dtsi"
+/include/ "t102xqds.dtsi"
+
+/ {
+   model = "fsl,T1024QDS";
+   compatible = "fsl,T1024QDS";
+   #address-cells = <2>;
+   #size-cells = <2>;
+   interrupt-parent = <&mpic>;
+};
+
+/include/ "fsl/t1024si-post.dtsi"
diff --git a/arch/powerpc/boot/dts/t102xqds.dtsi b/arch/powerpc/boot/dts/t102xqds.dtsi
new file mode 100644
index 000..a7eae95
--- /dev/null
+++ b/arch/powerpc/boot/dts/t102xqds.dtsi
@@ -0,0 +1,247 @@
+/*
+ * T102x QDS Device Tree Source
+ *
+ * Copyright 2014 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * *

[PATCH 3/4] powerpc/fsl-booke: Add T1024RDB board support

2015-01-29 Thread Shengzhou Liu

T1024RDB is a Freescale Reference Design Board that hosts the T1024 SoC.

T1024RDB board Overview
---
- Processor: T1024 SoC integrating two 64-bit e5500 cores up to 1.4GHz
- DDR: 64-bit 4GB DDR3L UDIMM with ECC and interleaving support
- Ethernet: two 10M/100M/1Gbps RGMII ports and one 10Gbps Base-T port on-board
- Accelerator: DPAA components consist of FMan, BMan, QMan, PME, DCE and SEC
- SerDes: 4 lanes up to 10.3125GHz
- IFC: 128MB NOR Flash, 1GB NAND Flash and CPLD system controlling
- PCIe: one PCIe slot and two Mini-PCIe connectors on-board
- USB: two Type-A USB2.0 ports with internal PHY
- eSDHC: one SDHC/MMC/eMMC connector
- eSPI: one 64MB N25Q512 SPI flash
- QE-TDM: support TDM Riser card
   - 32-bit RISC controller for flexible support of the communications peripherals
   - Serial DMA channel for receive and transmit on all serial channels
   - Two universal communication controllers, supporting TDM, HDLC, and UART
- I2C: four I2C controllers
- UART: two UARTs on board
- Deep sleep power management implementation

Signed-off-by: Shengzhou Liu 
---
 arch/powerpc/boot/dts/t1024rdb.dts| 185 ++
 arch/powerpc/platforms/85xx/Kconfig   |   2 +-
 arch/powerpc/platforms/85xx/corenet_generic.c |   1 +
 3 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/boot/dts/t1024rdb.dts

diff --git a/arch/powerpc/boot/dts/t1024rdb.dts b/arch/powerpc/boot/dts/t1024rdb.dts
new file mode 100644
index 000..a6b88e3
--- /dev/null
+++ b/arch/powerpc/boot/dts/t1024rdb.dts
@@ -0,0 +1,185 @@
+/*
+ * T1024 RDB Device Tree Source
+ *
+ * Copyright 2014 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor "AS IS" AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/include/ "fsl/t102xsi-pre.dtsi"
+
+/ {
+   model = "fsl,T1024RDB";
+   compatible = "fsl,T1024RDB";
+   #address-cells = <2>;
+   #size-cells = <2>;
+   interrupt-parent = <&mpic>;
+
+   ifc: localbus@ffe124000 {
+   reg = <0xf 0xfe124000 0 0x2000>;
+   ranges = <0 0 0xf 0xe800 0x0800
+ 2 0 0xf 0xff80 0x0001
+ 3 0 0xf 0xffdf 0x8000>;
+
+   nor@0,0 {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   compatible = "cfi-flash";
+   reg = <0x0 0x0 0x800>;
+   bank-width = <2>;
+   device-width = <1>;
+   };
+
+   nand@1,0 {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   compatible = "fsl,ifc-nand";
+   reg = <0x2 0x0 0x1>;
+   };
+
+   board-control@2,0 {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   compatible = "fsl,t1024-cpld", "fsl,deepsleep-cpld";
+   reg = <3 0 0x300>;
+   ranges = <0 3 0 0x300>;
+   bank-width = <1>;
+   device-width = <1>;
+   };
+   };
+
+   memory {
+   device_type = "memory";
+   };
+
+   dcsr: dcsr@f {
+

[PATCH v3 01/24] vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU driver

2015-01-29 Thread Alexey Kardashevskiy

This moves page pinning (get_user_pages_fast()/put_page()) code out of
the platform IOMMU code and puts it into the VFIO IOMMU driver, where it
belongs, as the platform code does not deal with page pinning.

This makes iommu_take_ownership()/iommu_release_ownership() deal with
the IOMMU table bitmap only.

This removes page unpinning from iommu_take_ownership() as the actual
TCE table might contain garbage and doing put_page() on it is undefined
behaviour.

Besides the last part, the rest of the patch is mechanical.
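
As an illustration of what moves with the pinning code, the helper that
decodes a TCE's permission bits into a DMA direction can be sketched in
plain C. This is a minimal user-space sketch, not the kernel code: the
SKETCH_* names and bit values are assumptions made for the example, and
sketch_pin_for_write() only mirrors the "direction != DMA_TO_DEVICE"
argument passed to get_user_pages_fast() in the removed function.

```c
/*
 * Assumed bit values for this sketch only; the kernel defines its own
 * TCE_PCI_READ/TCE_PCI_WRITE flags.
 */
#define SKETCH_TCE_READ   0x1UL
#define SKETCH_TCE_WRITE  0x2UL

/* Hypothetical stand-in for enum dma_data_direction. */
enum sketch_dma_dir {
    SKETCH_DMA_BIDIRECTIONAL,
    SKETCH_DMA_TO_DEVICE,
    SKETCH_DMA_FROM_DEVICE,
    SKETCH_DMA_NONE,
};

/* Decode the permission bits of a TCE into a DMA direction. */
static enum sketch_dma_dir sketch_tce_direction(unsigned long tce)
{
    if ((tce & SKETCH_TCE_READ) && (tce & SKETCH_TCE_WRITE))
        return SKETCH_DMA_BIDIRECTIONAL;
    if (tce & SKETCH_TCE_READ)
        return SKETCH_DMA_TO_DEVICE;
    if (tce & SKETCH_TCE_WRITE)
        return SKETCH_DMA_FROM_DEVICE;
    return SKETCH_DMA_NONE;
}

/*
 * A page has to be pinned writable unless the device only reads it,
 * which is the same test the removed helper used when pinning.
 */
static int sketch_pin_for_write(unsigned long tce)
{
    return sketch_tce_direction(tce) != SKETCH_DMA_TO_DEVICE;
}
```

With these assumed values, a TCE carrying only the read bit is pinned
read-only, while one carrying the write bit is pinned writable.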

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h|  6 ---
 arch/powerpc/kernel/iommu.c | 68 ---
 drivers/vfio/vfio_iommu_spapr_tce.c | 91 +++--
 3 files changed, 78 insertions(+), 87 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 9cfa370..45b07f6 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -191,16 +191,10 @@ extern int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
unsigned long hwaddr, enum dma_data_direction direction);
 extern unsigned long iommu_clear_tce(struct iommu_table *tbl,
unsigned long entry);
-extern int iommu_clear_tces_and_put_pages(struct iommu_table *tbl,
-   unsigned long entry, unsigned long pages);
-extern int iommu_put_tce_user_mode(struct iommu_table *tbl,
-   unsigned long entry, unsigned long tce);
 
 extern void iommu_flush_tce(struct iommu_table *tbl);
 extern int iommu_take_ownership(struct iommu_table *tbl);
 extern void iommu_release_ownership(struct iommu_table *tbl);
 
-extern enum dma_data_direction iommu_tce_direction(unsigned long tce);
-
 #endif /* __KERNEL__ */
 #endif /* _ASM_IOMMU_H */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 5d3968c..456acb1 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -903,19 +903,6 @@ void iommu_register_group(struct iommu_table *tbl,
kfree(name);
 }
 
-enum dma_data_direction iommu_tce_direction(unsigned long tce)
-{
-   if ((tce & TCE_PCI_READ) && (tce & TCE_PCI_WRITE))
-   return DMA_BIDIRECTIONAL;
-   else if (tce & TCE_PCI_READ)
-   return DMA_TO_DEVICE;
-   else if (tce & TCE_PCI_WRITE)
-   return DMA_FROM_DEVICE;
-   else
-   return DMA_NONE;
-}
-EXPORT_SYMBOL_GPL(iommu_tce_direction);
-
 void iommu_flush_tce(struct iommu_table *tbl)
 {
/* Flush/invalidate TLB caches if necessary */
@@ -991,30 +978,6 @@ unsigned long iommu_clear_tce(struct iommu_table *tbl, unsigned long entry)
 }
 EXPORT_SYMBOL_GPL(iommu_clear_tce);
 
-int iommu_clear_tces_and_put_pages(struct iommu_table *tbl,
-   unsigned long entry, unsigned long pages)
-{
-   unsigned long oldtce;
-   struct page *page;
-
-   for ( ; pages; --pages, ++entry) {
-   oldtce = iommu_clear_tce(tbl, entry);
-   if (!oldtce)
-   continue;
-
-   page = pfn_to_page(oldtce >> PAGE_SHIFT);
-   WARN_ON(!page);
-   if (page) {
-   if (oldtce & TCE_PCI_WRITE)
-   SetPageDirty(page);
-   put_page(page);
-   }
-   }
-
-   return 0;
-}
-EXPORT_SYMBOL_GPL(iommu_clear_tces_and_put_pages);
-
 /*
  * hwaddr is a kernel virtual address here (0xc... bazillion),
  * tce_build converts it to a physical address.
@@ -1044,35 +1007,6 @@ int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
 }
 EXPORT_SYMBOL_GPL(iommu_tce_build);
 
-int iommu_put_tce_user_mode(struct iommu_table *tbl, unsigned long entry,
-   unsigned long tce)
-{
-   int ret;
-   struct page *page = NULL;
-   unsigned long hwaddr, offset = tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK;
-   enum dma_data_direction direction = iommu_tce_direction(tce);
-
-   ret = get_user_pages_fast(tce & PAGE_MASK, 1,
-   direction != DMA_TO_DEVICE, &page);
-   if (unlikely(ret != 1)) {
-   /* pr_err("iommu_tce: get_user_pages_fast failed tce=%lx ioba=%lx ret=%d\n",
-   tce, entry << tbl->it_page_shift, ret); */
-   return -EFAULT;
-   }
-   hwaddr = (unsigned long) page_address(page) + offset;
-
-   ret = iommu_tce_build(tbl, entry, hwaddr, direction);
-   if (ret)
-   put_page(page);
-
-   if (ret < 0)
-   pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%d\n",
-   __func__, entry << tbl->it_page_shift, tce, ret);
-
-   return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_put_tce_user_mode);
-
 int iommu_take_ownership(struct iommu_table *tbl)
 {
unsigned long sz = (tbl->it_size + 7) >> 3;
@@ -1086,7 +1020,6 @@ int iommu_take_ownership(struct iommu_table *tbl)
}
 
mem

[PATCH v3 24/24] vfio: powerpc/spapr: Support Dynamic DMA windows

2015-01-29 Thread Alexey Kardashevskiy

This adds create/remove window ioctls to create and remove DMA windows.

This changes the VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return
additional information such as the number of supported windows and the
maximum number of TCE table levels.
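
The extended GET_INFO reply relies on the usual argsz/offsetofend()
pattern for growing an ioctl structure without breaking old callers.
The following is a rough user-space sketch of that size selection; the
struct layout and all sketch_* names are hypothetical and do not claim
to match the real uapi structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the first byte after a member, like the kernel's offsetofend(). */
#define sketch_offsetofend(type, member) \
    (offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical layout: the original fields followed by the new ones. */
struct sketch_tce_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t dma32_window_start;
    uint32_t dma32_window_size;
    /* fields added by the extension */
    uint32_t windows_supported;
    uint32_t levels;
};

/*
 * Decide how many bytes to copy back to the caller:
 *  - 0 if the caller's buffer is too small for even the old layout
 *    (the ioctl would fail with -EINVAL),
 *  - the extended size if the caller announced exactly that size,
 *  - the old size otherwise, so old binaries keep working.
 */
static size_t sketch_info_copy_size(uint32_t argsz)
{
    size_t minsz = sketch_offsetofend(struct sketch_tce_info,
                                      dma32_window_size);
    size_t ddwsz = sketch_offsetofend(struct sketch_tce_info, levels);

    if (argsz < minsz)
        return 0;
    if (argsz == ddwsz)
        return ddwsz;
    return minsz;
}
```

The exact-match test on the extended size mirrors the check in the
patch; accepting any argsz at or above the extended size would be an
alternative design.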

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h|   2 +-
 drivers/vfio/vfio_iommu_spapr_tce.c | 137 +++-
 include/uapi/linux/vfio.h   |  24 ++-
 3 files changed, 160 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 33009f9..7ca1c8c 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -133,7 +133,7 @@ extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
 extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
int nid);
 
-#define POWERPC_IOMMU_MAX_TABLES   1
+#define POWERPC_IOMMU_MAX_TABLES   2
 
 #define POWERPC_IOMMU_DEFAULT_LEVELS   1
 
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 8bcafb7..d3a1cc9 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -300,6 +300,20 @@ static struct iommu_table *spapr_tce_find_table(
return ret;
 }
 
+static int spapr_tce_find_free_table(struct tce_container *container)
+{
+   int i;
+
+   for (i = 0; i < POWERPC_IOMMU_MAX_TABLES; ++i) {
+   struct iommu_table *tbl = &container->tables[i];
+
+   if (!tbl->it_size)
+   return i;
+   }
+
+   return -1;
+}
+
 static unsigned long tce_default_winsize(struct tce_container *container)
 {
struct tce_iommu_group *tcegrp;
@@ -594,7 +608,7 @@ static long tce_iommu_ioctl(void *iommu_data,
 unsigned int cmd, unsigned long arg)
 {
struct tce_container *container = iommu_data;
-   unsigned long minsz;
+   unsigned long minsz, ddwsz;
long ret;
 
switch (cmd) {
@@ -636,6 +650,15 @@ static long tce_iommu_ioctl(void *iommu_data,
 
info.dma32_window_start = iommu->tce32_start;
info.dma32_window_size = iommu->tce32_size;
+   info.windows_supported = iommu->windows_supported;
+   info.levels = iommu->levels;
+   info.flags = iommu->flags;
+
+   ddwsz = offsetofend(struct vfio_iommu_spapr_tce_info,
+   levels);
+
+   if (info.argsz == ddwsz)
+   minsz = ddwsz;
 
if (copy_to_user((void __user *)arg, &info, minsz))
return -EFAULT;
@@ -800,6 +823,118 @@ static long tce_iommu_ioctl(void *iommu_data,
return ret;
}
 
+   case VFIO_IOMMU_SPAPR_TCE_CREATE: {
+   struct vfio_iommu_spapr_tce_create create;
+   struct powerpc_iommu *iommu;
+   struct tce_iommu_group *tcegrp;
+   int num;
+
+   if (!tce_preregistered(container))
+   return -ENXIO;
+
+   minsz = offsetofend(struct vfio_iommu_spapr_tce_create,
+   start_addr);
+
+   if (copy_from_user(&create, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   if (create.argsz < minsz)
+   return -EINVAL;
+
+   if (create.flags)
+   return -EINVAL;
+
+   num = spapr_tce_find_free_table(container);
+   if (num < 0)
+   return -ENOSYS;
+
+   tcegrp = list_first_entry(&container->group_list,
+   struct tce_iommu_group, next);
+   iommu = iommu_group_get_iommudata(tcegrp->grp);
+
+   ret = iommu->ops->create_table(iommu, num,
+   create.page_shift, create.window_shift,
+   create.levels,
+   &container->tables[num]);
+   if (ret)
+   return ret;
+
+   list_for_each_entry(tcegrp, &container->group_list, next) {
+   struct powerpc_iommu *iommutmp =
+   iommu_group_get_iommudata(tcegrp->grp);
+
+   if (WARN_ON_ONCE(iommutmp->ops != iommu->ops))
+   return -EFAULT;
+
+   ret = iommu->ops->set_window(iommutmp, num,
+   &container->tables[num]);
+   if (ret)
+   return ret;
+   }
+
+   create.start_addr =
+   container->tables[num].it_offset <<
+   container->tables[num].it_page_shift;
+
+   if (copy_to_user((void __user *)arg, &create, minsz))
+

[PATCH v3 22/24] powerpc/iommu: Get rid of ownership helpers

2015-01-29 Thread Alexey Kardashevskiy

iommu_take_ownership/iommu_release_ownership used to be used to mark
bits in iommu_table::it_map. Since the IOMMU tables are recreated for
VFIO, it_map is always NULL.
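
For context, the bitmap marking that these helpers performed can be
sketched in user space as follows. This is only an illustration under
stated assumptions: the sketch_* names are invented, the bitmap is
byte-based rather than built on unsigned long, locking is omitted, and
unlike the removed kernel function the sketch reports the busy case to
its caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct iommu_table. */
struct sketch_table {
    unsigned long it_size;    /* number of TCE entries */
    unsigned long it_offset;  /* first entry number of the window */
    unsigned char *it_map;    /* one bit per entry, set = in use */
};

/* Bitmap size in bytes, as in "(tbl->it_size + 7) >> 3". */
static size_t sketch_map_bytes(const struct sketch_table *t)
{
    return (t->it_size + 7) >> 3;
}

static int sketch_map_empty(const struct sketch_table *t)
{
    unsigned long i;

    for (i = 0; i < t->it_size; i++)
        if (t->it_map[i >> 3] & (1u << (i & 7)))
            return 0;
    return 1;
}

/* Mark every entry as used so the kernel allocator leaves the table alone. */
static int sketch_take_ownership(struct sketch_table *t)
{
    /* Bit 0 is reserved when the window starts at 0; ignore it. */
    if (t->it_offset == 0)
        t->it_map[0] &= (unsigned char)~1u;

    if (!sketch_map_empty(t)) {
        /* Somebody still has mappings: restore bit 0 and refuse. */
        if (t->it_offset == 0)
            t->it_map[0] |= 1u;
        return -1;
    }

    memset(t->it_map, 0xff, sketch_map_bytes(t));
    return 0;
}

/* Hand the table back: clear all bits and restore the reserved bit 0. */
static void sketch_release_ownership(struct sketch_table *t)
{
    memset(t->it_map, 0, sketch_map_bytes(t));
    if (t->it_offset == 0)
        t->it_map[0] |= 1u;
}
```

Once the tables are recreated for VFIO and carry no it_map at all, none
of this bookkeeping has anything left to operate on.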

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h |  2 -
 arch/powerpc/kernel/iommu.c  | 96 
 2 files changed, 98 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 8393822..33009f9 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -272,8 +272,6 @@ extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
enum dma_data_direction direction);
 
 extern void iommu_flush_tce(struct iommu_table *tbl);
-extern int iommu_take_ownership(struct powerpc_iommu *iommu);
-extern void iommu_release_ownership(struct powerpc_iommu *iommu);
 
 #endif /* __KERNEL__ */
 #endif /* _ASM_IOMMU_H */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 5f87076..6987115 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1007,102 +1007,6 @@ long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
 }
 EXPORT_SYMBOL_GPL(iommu_tce_xchg);
 
-static int iommu_table_take_ownership(struct iommu_table *tbl)
-{
-   unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
-   int ret = 0;
-
-   /*
-* VFIO does not control TCE entries allocation and the guest
-* can write new TCEs on top of existing ones so iommu_tce_build()
-* must be able to release old pages. This functionality
-* requires exchange() callback defined so if it is not
-* implemented, we disallow taking ownership over the table.
-*/
-   if (!tbl->it_ops->exchange)
-   return -EINVAL;
-
-   spin_lock_irqsave(&tbl->large_pool.lock, flags);
-   for (i = 0; i < tbl->nr_pools; i++)
-   spin_lock(&tbl->pools[i].lock);
-
-   if (tbl->it_offset == 0)
-   clear_bit(0, tbl->it_map);
-
-   if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
-   pr_err("iommu_tce: it_map is not empty");
-   ret = -EBUSY;
-   if (tbl->it_offset == 0)
-   set_bit(0, tbl->it_map);
-   } else {
-   memset(tbl->it_map, 0xff, sz);
-   }
-
-   for (i = 0; i < tbl->nr_pools; i++)
-   spin_unlock(&tbl->pools[i].lock);
-   spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
-
-   return 0;
-}
-
-static void iommu_table_release_ownership(struct iommu_table *tbl);
-
-int iommu_take_ownership(struct powerpc_iommu *iommu)
-{
-   int i, j, rc = 0;
-
-   for (i = 0; i < POWERPC_IOMMU_MAX_TABLES; ++i) {
-   struct iommu_table *tbl = &iommu->tables[i];
-
-   if (!tbl->it_map)
-   continue;
-
-   rc = iommu_table_take_ownership(tbl);
-   if (rc) {
-   for (j = 0; j < i; ++j)
-   iommu_table_release_ownership(
-   &iommu->tables[j]);
-
-   return rc;
-   }
-   }
-
-   return 0;
-}
-EXPORT_SYMBOL_GPL(iommu_take_ownership);
-
-static void iommu_table_release_ownership(struct iommu_table *tbl)
-{
-   unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
-
-   spin_lock_irqsave(&tbl->large_pool.lock, flags);
-   for (i = 0; i < tbl->nr_pools; i++)
-   spin_lock(&tbl->pools[i].lock);
-
-   memset(tbl->it_map, 0, sz);
-
-   /* Restore bit#0 set by iommu_init_table() */
-   if (tbl->it_offset == 0)
-   set_bit(0, tbl->it_map);
-
-   for (i = 0; i < tbl->nr_pools; i++)
-   spin_unlock(&tbl->pools[i].lock);
-   spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
-}
-
-extern void iommu_release_ownership(struct powerpc_iommu *iommu)
-{
-   int i;
-
-   for (i = 0; i < POWERPC_IOMMU_MAX_TABLES; ++i) {
-   struct iommu_table *tbl = &iommu->tables[i];
-
-   if (tbl->it_map)
-   iommu_table_release_ownership(tbl);
-   }
-}
-EXPORT_SYMBOL_GPL(iommu_release_ownership);
-
 int iommu_add_device(struct device *dev)
 {
struct iommu_table *tbl;
-- 
2.0.0

[PATCH v3 23/24] vfio/spapr: Enable multiple groups in a container

2015-01-29 Thread Alexey Kardashevskiy

Signed-off-by: Alexey Kardashevskiy 
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 243 +++-
 1 file changed, 155 insertions(+), 88 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index d0987ae..8bcafb7 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -84,9 +84,15 @@ static void decrement_locked_vm(long npages)
  */
 struct tce_container {
struct mutex lock;
-   struct iommu_group *grp;
bool enabled;
struct list_head mem_list;
+   struct iommu_table tables[POWERPC_IOMMU_MAX_TABLES];
+   struct list_head group_list;
+};
+
+struct tce_iommu_group {
+   struct list_head next;
+   struct iommu_group *grp;
 };
 
 struct tce_memory {
@@ -265,17 +271,21 @@ static bool tce_check_page_size(struct page *page, unsigned page_shift)
return false;
 }
 
+static inline bool tce_groups_attached(struct tce_container *container)
+{
+   return !list_empty(&container->group_list);
+}
+
 static struct iommu_table *spapr_tce_find_table(
struct tce_container *container,
phys_addr_t ioba)
 {
long i;
struct iommu_table *ret = NULL;
-   struct powerpc_iommu *iommu = iommu_group_get_iommudata(container->grp);
 
mutex_lock(&container->lock);
for (i = 0; i < POWERPC_IOMMU_MAX_TABLES; ++i) {
-   struct iommu_table *tbl = &iommu->tables[i];
+   struct iommu_table *tbl = &container->tables[i];
unsigned long entry = ioba >> tbl->it_page_shift;
unsigned long start = tbl->it_offset;
unsigned long end = start + tbl->it_size;
@@ -290,13 +300,31 @@ static struct iommu_table *spapr_tce_find_table(
return ret;
 }
 
+static unsigned long tce_default_winsize(struct tce_container *container)
+{
+   struct tce_iommu_group *tcegrp;
+   struct powerpc_iommu *iommu;
+
+   if (!tce_groups_attached(container))
+   return 0;
+
+   tcegrp = list_first_entry(&container->group_list,
+   struct tce_iommu_group, next);
+   if (!tcegrp)
+   return 0;
+
+   iommu = iommu_group_get_iommudata(tcegrp->grp);
+   if (!iommu)
+   return 0;
+
+   return iommu->tce32_size;
+}
+
 static int tce_iommu_enable(struct tce_container *container)
 {
int ret = 0;
-   struct powerpc_iommu *iommu;
-   struct iommu_table *tbl;
 
-   if (!container->grp)
+   if (!tce_groups_attached(container))
return -ENXIO;
 
if (container->enabled)
@@ -328,12 +356,8 @@ static int tce_iommu_enable(struct tce_container *container)
 * KVM agnostic.
 */
if (!tce_preregistered(container)) {
-   iommu = iommu_group_get_iommudata(container->grp);
-   if (!iommu)
-   return -EFAULT;
-
-   tbl = &iommu->tables[0];
-   ret = try_increment_locked_vm(IOMMU_TABLE_PAGES(tbl));
+   ret = try_increment_locked_vm(
+   tce_default_winsize(container) >> PAGE_SHIFT);
if (ret)
return ret;
}
@@ -343,27 +367,23 @@ static int tce_iommu_enable(struct tce_container *container)
return ret;
 }
 
+static int tce_iommu_clear(struct tce_container *container,
+   struct iommu_table *tbl,
+   unsigned long entry, unsigned long pages);
+
 static void tce_iommu_disable(struct tce_container *container)
 {
-   struct powerpc_iommu *iommu;
-   struct iommu_table *tbl;
-
if (!container->enabled)
return;
 
container->enabled = false;
 
-   if (!container->grp || !current->mm)
+   if (!current->mm)
return;
 
-   if (!tce_preregistered(container)) {
-   iommu = iommu_group_get_iommudata(container->grp);
-   if (!iommu)
-   return;
-
-   tbl = &iommu->tables[0];
-   decrement_locked_vm(IOMMU_TABLE_PAGES(tbl));
-   }
+   if (!tce_preregistered(container))
+   decrement_locked_vm(
+   tce_default_winsize(container) >> PAGE_SHIFT);
 }
 
 static void *tce_iommu_open(unsigned long arg)
@@ -381,20 +401,44 @@ static void *tce_iommu_open(unsigned long arg)
 
mutex_init(&container->lock);
INIT_LIST_HEAD_RCU(&container->mem_list);
+   INIT_LIST_HEAD_RCU(&container->group_list);
 
return container;
 }
 
 static void tce_iommu_release(void *iommu_data)
 {
+   int i;
+   struct powerpc_iommu *iommu;
+   struct tce_iommu_group *tcegrp;
struct tce_container *container = iommu_data;
struct tce_memory *mem, *memtmp;
+   struct powerpc_iommu_ops *iommuops = NULL;
 
-   WARN_ON(container->grp);
tce_iommu_disable(container);
 
-   if (

[PATCH v3 21/24] powerpc/powernv/ioda: Define and implement DMA table/window management callbacks

2015-01-29 Thread Alexey Kardashevskiy

This extends powerpc_iommu_ops by a set of callbacks to support dynamic
DMA windows management.

query() returns IOMMU capabilities such as default DMA window address and
supported number of DMA windows and TCE table levels.

create_table() creates a TCE table with specific parameters. For now
it receives powerpc_iommu to learn the node ID so that it can allocate
TCE table memory close to the PHB. The exact format of an allocated
multi-level table might also be specific to the PHB model (not the case
now though).

set_window() sets the window at specified TVT index on PHB.

unset_window() unsets the window from specified TVT.

free_table() frees the memory occupied by a table.

The purpose of this separation is that we need to be able to create
one table and assign it to a set of PHBs. This way we can support
multiple IOMMU groups in one VFIO container and make use of VFIO on
SPAPR closer to the way it works on x86.

This uses new helpers to remove the default TCE table if the ownership
is being taken and to create it otherwise. So once an external user
(such as VFIO) has obtained ownership of a group, the group does not
have any DMA windows, neither the default 32-bit window nor the bypass
window. The external user is expected to unprogram DMA windows on PHBs
before returning ownership back to the kernel.
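
The "create one table, program it into several PHBs" flow can be
sketched with a small ops structure. Everything here is a hypothetical
user-space model: the sketch_* types and mock callbacks are invented for
the example, and the unwinding on failure is an addition of the sketch,
not something this series is claimed to do.

```c
#include <stddef.h>

#define SKETCH_MAX_WINDOWS 2

struct sketch_table {
    unsigned page_shift, window_shift, levels;
    int allocated;
};

struct sketch_iommu;

/* Modelled on the callback set described above. */
struct sketch_iommu_ops {
    long (*create_table)(struct sketch_iommu *iommu, int num,
                         unsigned page_shift, unsigned window_shift,
                         unsigned levels, struct sketch_table *tbl);
    long (*set_window)(struct sketch_iommu *iommu, int num,
                       struct sketch_table *tbl);
    long (*unset_window)(struct sketch_iommu *iommu, int num);
    void (*free_table)(struct sketch_table *tbl);
};

struct sketch_iommu {
    const struct sketch_iommu_ops *ops;
    struct sketch_table *windows[SKETCH_MAX_WINDOWS];
};

static long mock_create(struct sketch_iommu *iommu, int num,
                        unsigned page_shift, unsigned window_shift,
                        unsigned levels, struct sketch_table *tbl)
{
    (void)iommu;
    if (num < 0 || num >= SKETCH_MAX_WINDOWS || levels < 1)
        return -1;
    tbl->page_shift = page_shift;
    tbl->window_shift = window_shift;
    tbl->levels = levels;
    tbl->allocated = 1;
    return 0;
}

static long mock_set(struct sketch_iommu *iommu, int num,
                     struct sketch_table *tbl)
{
    if (num < 0 || num >= SKETCH_MAX_WINDOWS)
        return -1;
    iommu->windows[num] = tbl;
    return 0;
}

static long mock_unset(struct sketch_iommu *iommu, int num)
{
    if (num < 0 || num >= SKETCH_MAX_WINDOWS)
        return -1;
    iommu->windows[num] = NULL;
    return 0;
}

static void mock_free(struct sketch_table *tbl)
{
    tbl->allocated = 0;
}

static const struct sketch_iommu_ops sketch_mock_ops = {
    mock_create, mock_set, mock_unset, mock_free,
};

/* Create one table via the first group, then program it into all groups. */
static long sketch_share_window(struct sketch_iommu **groups, size_t n,
                                int num, unsigned page_shift,
                                unsigned window_shift, unsigned levels,
                                struct sketch_table *tbl)
{
    size_t i;
    long ret;

    if (n == 0)
        return -1;

    ret = groups[0]->ops->create_table(groups[0], num, page_shift,
                                       window_shift, levels, tbl);
    if (ret)
        return ret;

    for (i = 0; i < n; i++) {
        ret = groups[i]->ops->set_window(groups[i], num, tbl);
        if (ret) {
            /* Undo the windows already programmed, then free the table. */
            while (i--)
                groups[i]->ops->unset_window(groups[i], num);
            groups[0]->ops->free_table(tbl);
            return ret;
        }
    }
    return 0;
}
```

Keeping table creation separate from window programming is what lets a
single table back the same window on every group in a container.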

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h  | 31 ++
 arch/powerpc/platforms/powernv/pci-ioda.c | 98 ++-
 2 files changed, 113 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 283f70f..8393822 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -147,12 +147,43 @@ struct powerpc_iommu_ops {
 */
void (*set_ownership)(struct powerpc_iommu *iommu,
bool enable);
+
+   long (*create_table)(struct powerpc_iommu *iommu,
+   int num,
+   __u32 page_shift,
+   __u32 window_shift,
+   __u32 levels,
+   struct iommu_table *tbl);
+   long (*set_window)(struct powerpc_iommu *iommu,
+   int num,
+   struct iommu_table *tblnew);
+   long (*unset_window)(struct powerpc_iommu *iommu,
+   int num);
+   void (*free_table)(struct iommu_table *tbl);
 };
 
+/* Page size flags for ibm,query-pe-dma-window */
+#define DDW_PGSIZE_4K   0x01
+#define DDW_PGSIZE_64K  0x02
+#define DDW_PGSIZE_16M  0x04
+#define DDW_PGSIZE_32M  0x08
+#define DDW_PGSIZE_64M  0x10
+#define DDW_PGSIZE_128M 0x20
+#define DDW_PGSIZE_256M 0x40
+#define DDW_PGSIZE_16G  0x80
+#define DDW_PGSIZE_MASK 0xFF
+
 struct powerpc_iommu {
 #ifdef CONFIG_IOMMU_API
struct iommu_group *group;
 #endif
+   /* Some key properties of IOMMU */
+   __u32 tce32_start;
+   __u32 tce32_size;
+   __u32 windows_supported;
+   __u32 levels;
+   __u32 flags;
+
struct iommu_table tables[POWERPC_IOMMU_MAX_TABLES];
struct powerpc_iommu_ops *ops;
 };
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 29bd7a4..cf63ebb 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1360,7 +1360,7 @@ static __be64 *pnv_alloc_tce_table(int nid,
return addr;
 }
 
-static long pnv_pci_ioda2_create_table(struct powerpc_iommu *iommu,
+static long pnv_pci_ioda2_create_table(struct powerpc_iommu *iommu, int num,
__u32 page_shift, __u32 window_shift, __u32 levels,
struct iommu_table *tbl)
 {
@@ -1388,8 +1388,8 @@ static long pnv_pci_ioda2_create_table(struct powerpc_iommu *iommu,
shift = ROUND_UP(window_shift - page_shift, levels) / levels;
shift += 3;
shift = max_t(unsigned, shift, IOMMU_PAGE_SHIFT_4K);
-   pr_info("Creating TCE table %08llx, %d levels, TCE table size = %lx\n",
-   1ULL << window_shift, levels, 1UL << shift);
+   pr_info("Creating TCE table #%d %08llx, %d levels, TCE table size = %lx\n",
+   num, 1ULL << window_shift, levels, 1UL << shift);
 
tbl->it_level_size = 1ULL << (shift - 3);
left = tce_table_size;
@@ -1400,11 +1400,10 @@ static long pnv_pci_ioda2_create_table(struct powerpc_iommu *iommu,
tbl->it_indirect_levels = levels - 1;
 
/* Setup linux iommu table */
-   pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
-   page_shift);
+   pnv_pci_setup_iommu_table(tbl, addr, tce_table_size,
+   num ? pe->tce_bypass_base : 0, page_shift);
 
tbl->it_ops = &pnv_ioda2_iommu_ops;
-   iommu_init_table(tbl, nid);
 
return 0;
 }
@@ -1421,8 +1420,18 @@ static void pnv_pci_ioda2_free_table(struct iommu_tab

[PATCH v3 20/24] powerpc/powernv: Change prototypes to receive iommu

2015-01-29 Thread Alexey Kardashevskiy

This changes a few functions to receive a powerpc_iommu pointer
rather than a PE, as they are going to be part of the upcoming
powerpc_iommu_ops callback set.
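
The conversion works because the IOMMU descriptor is embedded in the PE
structure, so the PE can be recovered from the pointer with
container_of(). A minimal user-space sketch of that idiom follows; the
sketch_* types are invented stand-ins, not the real pnv_ioda_pe or
powerpc_iommu definitions.

```c
#include <stddef.h>

/* User-space equivalent of the kernel's container_of() (no type checking). */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct powerpc_iommu. */
struct sketch_iommu {
    int windows_supported;
};

/* Hypothetical stand-in for struct pnv_ioda_pe. */
struct sketch_pe {
    int pe_number;
    struct sketch_iommu iommu;  /* embedded, not a pointer */
};

/* A callback that only gets the iommu pointer can still reach its PE. */
static int sketch_pe_number_from_iommu(struct sketch_iommu *iommu)
{
    struct sketch_pe *pe =
        sketch_container_of(iommu, struct sketch_pe, iommu);

    return pe->pe_number;
}
```

This only holds when the pointer really refers to the member embedded
in a PE; passing a free-standing descriptor would yield a bogus PE.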

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f542819..29bd7a4 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1360,10 +1360,12 @@ static __be64 *pnv_alloc_tce_table(int nid,
return addr;
 }
 
-static long pnv_pci_ioda2_create_table(struct pnv_ioda_pe *pe,
+static long pnv_pci_ioda2_create_table(struct powerpc_iommu *iommu,
__u32 page_shift, __u32 window_shift, __u32 levels,
struct iommu_table *tbl)
 {
+   struct pnv_ioda_pe *pe = container_of(iommu, struct pnv_ioda_pe,
+   iommu);
int nid = pe->phb->hose->node;
void *addr;
unsigned long tce_table_size, left;
@@ -1419,9 +1421,11 @@ static void pnv_pci_ioda2_free_table(struct iommu_table *tbl)
iommu_reset_table(tbl, "ioda2");
 }
 
-static long pnv_pci_ioda2_set_window(struct pnv_ioda_pe *pe,
+static long pnv_pci_ioda2_set_window(struct powerpc_iommu *iommu,
struct iommu_table *tbl)
 {
+   struct pnv_ioda_pe *pe = container_of(iommu, struct pnv_ioda_pe,
+   iommu);
struct pnv_phb *phb = pe->phb;
const __be64 *swinvp;
int64_t rc;
@@ -1554,12 +1558,11 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 
/* The PE will reserve all possible 32-bits space */
pe->tce32_seg = 0;
-
end = (1 << ilog2(phb->ioda.m32_pci_base));
pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
end);
 
-   rc = pnv_pci_ioda2_create_table(pe, IOMMU_PAGE_SHIFT_4K,
+   rc = pnv_pci_ioda2_create_table(&pe->iommu, IOMMU_PAGE_SHIFT_4K,
ilog2(phb->ioda.m32_pci_base),
POWERPC_IOMMU_DEFAULT_LEVELS, tbl);
if (rc) {
@@ -1571,7 +1574,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
pe->iommu.tables[0].it_iommu = &pe->iommu;
pe->iommu.ops = &pnv_pci_ioda2_ops;
 
-   rc = pnv_pci_ioda2_set_window(pe, tbl);
+   rc = pnv_pci_ioda2_set_window(&pe->iommu, tbl);
if (rc) {
pe_err(pe, "Failed to configure 32-bit TCE table,"
   " err %ld\n", rc);
-- 
2.0.0

[PATCH v3 18/24] powerpc/iommu: Split iommu_free_table into 2 helpers

2015-01-29 Thread Alexey Kardashevskiy

The iommu_free_table helper releases the memory it is using (the TCE
table and @it_map) and releases the iommu_table struct as well. We might
not want the very last step as we store iommu_table in parent structures.
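
The split can be pictured as one helper that tears down a table in
place and a second one that additionally frees the owning structure.
The sketch below is a user-space approximation with invented sketch_*
names, using malloc()/free() where the kernel uses page allocations and
kfree().

```c
#include <stdlib.h>
#include <string.h>

struct sketch_iommu;

/* Hypothetical, trimmed-down stand-in for struct iommu_table. */
struct sketch_table {
    unsigned long it_size;
    unsigned char *it_map;
    struct sketch_iommu *it_iommu;  /* back-pointer to the owner */
};

/* The table is embedded in its owner rather than allocated separately. */
struct sketch_iommu {
    struct sketch_table tables[1];
};

/* Release what the table holds, but keep the struct itself reusable. */
static void sketch_reset_table(struct sketch_table *tbl)
{
    if (!tbl)
        return;

    free(tbl->it_map);              /* free(NULL) is a no-op */
    memset(tbl, 0, sizeof(*tbl));
}

/* Full teardown: reset the table, then free the structure that owns it. */
static void sketch_free_table(struct sketch_table *tbl)
{
    struct sketch_iommu *iommu;

    if (!tbl)
        return;

    /* Read the owner first: the reset zeroes the back-pointer. */
    iommu = tbl->it_iommu;
    sketch_reset_table(tbl);
    free(iommu);
}
```

Callers that keep the table inside a longer-lived parent would use only
the reset step, which is safe to call repeatedly.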

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h |  1 +
 arch/powerpc/kernel/iommu.c  | 57 
 2 files changed, 35 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index bf26d47..cc26eca 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -122,6 +122,7 @@ static inline void *get_iommu_table_base(struct device *dev)
 
 extern struct iommu_table *iommu_table_alloc(int node);
 /* Frees table for an individual device node */
+extern void iommu_reset_table(struct iommu_table *tbl, const char *node_name);
 extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
 
 /* Initializes an iommu_table based in values set in the passed-in
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 26feaff..5f87076 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -721,24 +721,46 @@ struct iommu_table *iommu_table_alloc(int node)
return &iommu->tables[0];
 }
 
+void iommu_reset_table(struct iommu_table *tbl, const char *node_name)
+{
+   if (!tbl)
+   return;
+
+   if (tbl->it_map) {
+   unsigned long bitmap_sz;
+   unsigned int order;
+
+   /*
+* In case we have reserved the first bit, we should not emit
+* the warning below.
+*/
+   if (tbl->it_offset == 0)
+   clear_bit(0, tbl->it_map);
+
+   /* verify that table contains no entries */
+   if (!bitmap_empty(tbl->it_map, tbl->it_size))
+   pr_warn("%s: Unexpected TCEs for %s\n", __func__,
+   node_name);
+
+   /* calculate bitmap size in bytes */
+   bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
+
+   /* free bitmap */
+   order = get_order(bitmap_sz);
+   free_pages((unsigned long) tbl->it_map, order);
+   }
+
+   memset(tbl, 0, sizeof(*tbl));
+}
+
 void iommu_free_table(struct iommu_table *tbl, const char *node_name)
 {
-   unsigned long bitmap_sz;
-   unsigned int order;
struct powerpc_iommu *iommu = tbl->it_iommu;
 
-   if (!tbl || !tbl->it_map) {
-   printk(KERN_ERR "%s: expected TCE map for %s\n", __func__,
-   node_name);
+   if (!tbl)
return;
-   }
 
-   /*
-* In case we have reserved the first bit, we should not emit
-* the warning below.
-*/
-   if (tbl->it_offset == 0)
-   clear_bit(0, tbl->it_map);
+   iommu_reset_table(tbl, node_name);
 
 #ifdef CONFIG_IOMMU_API
if (iommu->group) {
@@ -747,17 +769,6 @@ void iommu_free_table(struct iommu_table *tbl, const char *node_name)
}
 #endif
 
-   /* verify that table contains no entries */
-   if (!bitmap_empty(tbl->it_map, tbl->it_size))
-   pr_warn("%s: Unexpected TCEs for %s\n", __func__, node_name);
-
-   /* calculate bitmap size in bytes */
-   bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
-
-   /* free bitmap */
-   order = get_order(bitmap_sz);
-   free_pages((unsigned long) tbl->it_map, order);
-
/* free table */
kfree(iommu);
 }
-- 
2.0.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 19/24] powerpc/powernv: Implement multilevel TCE tables

2015-01-29 Thread Alexey Kardashevskiy

This adds multilevel TCE table support to the pnv_pci_ioda2_create_table()
and pnv_pci_ioda2_free_table() callbacks.
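
As a rough illustration of how the index bits of a window can be split
across levels, here is a minimal user-space sketch. It assumes each level
gets an equal share of the (window_shift - page_shift) index bits, that
every entry is 8 bytes, and that no level is smaller than a 4K page. The
helper name tce_level_shift() is hypothetical and not part of this patch;
it rounds up by division so that level counts which are not powers of two
are handled as well.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch: returns log2 of the size
 * in bytes of one table at each level.
 */
static unsigned tce_level_shift(unsigned window_shift, unsigned page_shift,
				unsigned levels)
{
	unsigned entries_shift, per_level, shift;

	assert(levels >= 1 && window_shift >= page_shift);

	entries_shift = window_shift - page_shift;	/* log2(number of TCEs) */
	per_level = (entries_shift + levels - 1) / levels; /* round up */
	shift = per_level + 3;				/* 8 bytes per entry */

	return shift < 12 ? 12 : shift;			/* at least one 4K page */
}
```

For example, a 1TB window (window_shift = 40) with 4K pages and 2 levels
gives 14 index bits per level, so each table in this sketch is
1 << 17 bytes.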

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h  |   4 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 125 +++---
 arch/powerpc/platforms/powernv/pci.c  |  19 +
 3 files changed, 122 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index cc26eca..283f70f 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -85,6 +85,8 @@ struct iommu_pool {
 struct iommu_table {
unsigned long  it_busno; /* Bus number this table belongs to */
unsigned long  it_size;  /* Size of iommu table in entries */
+   unsigned long  it_indirect_levels;
+   unsigned long  it_level_size;
unsigned long  it_offset;/* Offset into global table */
unsigned long  it_base;  /* mapped address of tce table */
unsigned long  it_index; /* which iommu table this is */
@@ -133,6 +135,8 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
 
 #define POWERPC_IOMMU_MAX_TABLES   1
 
+#define POWERPC_IOMMU_DEFAULT_LEVELS   1
+
 struct powerpc_iommu;
 
 struct powerpc_iommu_ops {
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1f725d4..f542819 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1295,16 +1295,79 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
 }
 
+static void pnv_free_tce_table(unsigned long addr, unsigned size,
+   unsigned level)
+{
+   addr &= ~(TCE_PCI_READ | TCE_PCI_WRITE);
+
+   if (level) {
+   long i;
+   u64 *tmp = (u64 *) addr;
+
+   for (i = 0; i < size; ++i) {
+   unsigned long hpa = be64_to_cpu(tmp[i]);
+
+   if (!(hpa & (TCE_PCI_READ | TCE_PCI_WRITE)))
+   continue;
+
+   pnv_free_tce_table((unsigned long) __va(hpa),
+   size, level - 1);
+   }
+   }
+
+   free_pages(addr, get_order(size << 3));
+}
+
+static __be64 *pnv_alloc_tce_table(int nid,
+   unsigned shift, unsigned levels, unsigned long *left)
+{
+   struct page *tce_mem = NULL;
+   __be64 *addr, *tmp;
+   unsigned order = max_t(unsigned, shift, PAGE_SHIFT) - PAGE_SHIFT;
+   unsigned long chunk = 1UL << shift, i;
+
+   tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
+   if (!tce_mem) {
+   pr_err("Failed to allocate a TCE memory\n");
+   return NULL;
+   }
+
+   if (!*left)
+   return NULL;
+
+   addr = page_address(tce_mem);
+   memset(addr, 0, chunk);
+
+   --levels;
+   if (!levels) {
+   /* This is last level, actual TCEs */
+   *left -= min(*left, chunk);
+   return addr;
+   }
+
+   for (i = 0; i < (chunk >> 3); ++i) {
+   /* We allocated required TCEs, mark the rest "page fault" */
+   if (!*left) {
+   addr[i] = cpu_to_be64(0);
+   continue;
+   }
+
+   tmp = pnv_alloc_tce_table(nid, shift, levels, left);
+   addr[i] = cpu_to_be64(__pa(tmp) |
+   TCE_PCI_READ | TCE_PCI_WRITE);
+   }
+
+   return addr;
+}
+
 static long pnv_pci_ioda2_create_table(struct pnv_ioda_pe *pe,
-   __u32 page_shift, __u32 window_shift,
+   __u32 page_shift, __u32 window_shift, __u32 levels,
struct iommu_table *tbl)
 {
int nid = pe->phb->hose->node;
-   struct page *tce_mem = NULL;
void *addr;
-   unsigned long tce_table_size;
-   int64_t rc;
-   unsigned order;
+   unsigned long tce_table_size, left;
+   unsigned shift;
 
if ((page_shift != 12) && (page_shift != 16) && (page_shift != 24))
return -EINVAL;
@@ -1312,20 +1375,27 @@ static long pnv_pci_ioda2_create_table(struct pnv_ioda_pe *pe,
if ((1ULL << window_shift) > memory_hotplug_max())
return -EINVAL;
 
+   if (!levels || (levels > 5))
+   return -EINVAL;
+
tce_table_size = (1ULL << (window_shift - page_shift)) * 8;
tce_table_size = max(0x1000UL, tce_table_size);
 
/* Allocate TCE table */
-   order = get_order(tce_table_size);
+#define ROUND_UP(x, n) (((x) + (n) - 1u) & ~((n) - 1u))
+   shift = ROUND_UP(window_shift - page_shift, levels) / levels;
+   shift += 3;
+   shift = max_t(unsigned, shift, IOMMU_PAGE_SHIFT_4K);
+   pr_info("Creating TCE table %08llx, %d levels, TCE table size = %lx\n",
+

[PATCH v3 08/24] powerpc/spapr: vfio: Switch from iommu_table to new powerpc_iommu

2015-01-29 Thread Alexey Kardashevskiy

Modern IBM POWERPC systems support multiple (currently two) TCE tables
per IOMMU group (a.k.a. PE). This adds a powerpc_iommu container
for TCE tables. Right now just one table is supported.
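
The relationship between the container and its tables can be sketched in
user space as below. This is only an illustration under simplifying
assumptions: the sketch_* names are hypothetical, the structs keep just
the fields needed to show the back-pointer, and a void pointer stands in
for struct iommu_group.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_TABLES 1

struct sketch_iommu;

struct sketch_table {
	unsigned long it_size;
	struct sketch_iommu *it_iommu;	/* back-pointer to the container */
};

struct sketch_iommu {
	void *group;			/* stands in for struct iommu_group * */
	struct sketch_table tables[SKETCH_MAX_TABLES];
};

/* Wire up every embedded table so it can find its container. */
static void sketch_iommu_init(struct sketch_iommu *iommu)
{
	size_t i;

	assert(iommu != NULL);
	for (i = 0; i < SKETCH_MAX_TABLES; i++)
		iommu->tables[i].it_iommu = iommu;
}

/* Given a table, return the group of the container that owns it. */
static void *sketch_table_group(const struct sketch_table *tbl)
{
	return (tbl && tbl->it_iommu) ? tbl->it_iommu->group : NULL;
}
```

Code that used to reach the group through the table would, in this
sketch, go through the table's it_iommu pointer instead.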

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h|  18 ++--
 arch/powerpc/kernel/eeh.c   |   2 +-
 arch/powerpc/kernel/iommu.c |  34 
 arch/powerpc/platforms/powernv/pci-ioda.c   |  37 +---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |  16 ++--
 arch/powerpc/platforms/powernv/pci.c|   2 +-
 arch/powerpc/platforms/powernv/pci.h|   4 +-
 arch/powerpc/platforms/pseries/iommu.c  |   9 +-
 drivers/vfio/vfio_iommu_spapr_tce.c | 131 
 9 files changed, 170 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 335e3d4..4fe 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -90,9 +90,7 @@ struct iommu_table {
struct iommu_pool pools[IOMMU_NR_POOLS];
unsigned long *it_map;   /* A simple allocation bitmap for now */
unsigned long  it_page_shift;/* table iommu page size */
-#ifdef CONFIG_IOMMU_API
-   struct iommu_group *it_group;
-#endif
+   struct powerpc_iommu *it_iommu;
struct iommu_table_ops *it_ops;
void (*set_bypass)(struct iommu_table *tbl, bool enable);
 };
@@ -126,13 +124,23 @@ extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
  */
 extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
int nid);
+
+#define POWERPC_IOMMU_MAX_TABLES   1
+
+struct powerpc_iommu {
 #ifdef CONFIG_IOMMU_API
-extern void iommu_register_group(struct iommu_table *tbl,
+   struct iommu_group *group;
+#endif
+   struct iommu_table tables[POWERPC_IOMMU_MAX_TABLES];
+};
+
+#ifdef CONFIG_IOMMU_API
+extern void iommu_register_group(struct powerpc_iommu *iommu,
 int pci_domain_number, unsigned long pe_num);
 extern int iommu_add_device(struct device *dev);
 extern void iommu_del_device(struct device *dev);
 #else
-static inline void iommu_register_group(struct iommu_table *tbl,
+static inline void iommu_register_group(struct powerpc_iommu *iommu,
int pci_domain_number,
unsigned long pe_num)
 {
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index e1b6d8e..319eae3 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1360,7 +1360,7 @@ static int dev_has_iommu_table(struct device *dev, void *data)
return 0;
 
tbl = get_iommu_table_base(dev);
-   if (tbl && tbl->it_group) {
+   if (tbl && tbl->it_iommu) {
*ppdev = pdev;
return 1;
}
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 2f7e92b..952939f 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -712,17 +712,20 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)
 
 struct iommu_table *iommu_table_alloc(int node)
 {
-   struct iommu_table *tbl;
+   struct powerpc_iommu *iommu;
 
-   tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL, node);
+   iommu = kzalloc_node(sizeof(struct powerpc_iommu), GFP_KERNEL,
+  node);
+   iommu->tables[0].it_iommu = iommu;
 
-   return tbl;
+   return &iommu->tables[0];
 }
 
 void iommu_free_table(struct iommu_table *tbl, const char *node_name)
 {
unsigned long bitmap_sz;
unsigned int order;
+   struct powerpc_iommu *iommu = tbl->it_iommu;
 
if (!tbl || !tbl->it_map) {
printk(KERN_ERR "%s: expected TCE map for %s\n", __func__,
@@ -738,9 +741,9 @@ void iommu_free_table(struct iommu_table *tbl, const char *node_name)
clear_bit(0, tbl->it_map);
 
 #ifdef CONFIG_IOMMU_API
-   if (tbl->it_group) {
-   iommu_group_put(tbl->it_group);
-   BUG_ON(tbl->it_group);
+   if (iommu->group) {
+   iommu_group_put(iommu->group);
+   BUG_ON(iommu->group);
}
 #endif
 
@@ -756,7 +759,7 @@ void iommu_free_table(struct iommu_table *tbl, const char *node_name)
free_pages((unsigned long) tbl->it_map, order);
 
/* free table */
-   kfree(tbl);
+   kfree(iommu);
 }
 
 /* Creates TCEs for a user provided buffer.  The user buffer must be
@@ -888,11 +891,12 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
  */
 static void group_release(void *iommu_data)
 {
-   struct iommu_table *tbl = iommu_data;
-   tbl->it_group = NULL;
+   struct powerpc_iommu *iommu = iommu_data;
+
+   iommu->group = NULL;
 }
 
-void iommu_register_group(struct iommu_table *tbl,
+void iommu_register_group(

[PATCH v3 06/24] powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table

2015-01-29 Thread Alexey Kardashevskiy

This adds an iommu_table_ops struct and puts a pointer to it into
the iommu_table struct. This moves the tce_build/tce_free/tce_get/tce_flush
callbacks from ppc_md to the new struct where they really belong.

This adds the requirement for @it_ops to be initialized before calling
iommu_init_table() to make sure that we do not leave any IOMMU table
with iommu_table_ops uninitialized. This is not a parameter of
iommu_init_table() though as there will be cases when iommu_init_table()
will not be called on TCE tables used by VFIO.

This does s/tce_build/set/, s/tce_free/clear/ and removes the redundant
"tce_" prefixes.

This removes tce_xxx_rm handlers from ppc_md but does not add
them to iommu_table_ops as this will be done later if we decide to
support TCE hypercalls in real mode.

For pSeries, this always uses tce_buildmulti_pSeriesLP/
tce_freemulti_pSeriesLP. This changes the multi callbacks to fall back to
tce_build_pSeriesLP/tce_free_pSeriesLP if FW_FEATURE_MULTITCE is not
present. The reason for this is that we still have to support the
"multitce=off" boot parameter in disable_multitce() and we do not want to
walk through all IOMMU tables in the system and replace "multi" callbacks
with single ones.
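
The fallback idea can be sketched in user space as follows. This is an
illustration only: the sketch_* names are hypothetical, a plain bool
stands in for the FW_FEATURE_MULTITCE check, and the callbacks just count
calls instead of touching any TCEs. The point is that the table keeps
pointing at the same ops and the "multi" callback decides at call time
whether to delegate to the single-entry path.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_table;

struct sketch_table_ops {
	int (*set)(struct sketch_table *tbl, long index, long npages);
};

struct sketch_table {
	const struct sketch_table_ops *it_ops;
	long single_calls;	/* counts per-entry operations */
	long multi_calls;	/* counts batched operations */
};

/* Stands in for the FW_FEATURE_MULTITCE firmware feature check. */
static bool sketch_have_multitce = true;

static int sketch_set_single(struct sketch_table *tbl, long index, long npages)
{
	(void)index;
	tbl->single_calls += npages;	/* one operation per page */
	return 0;
}

static int sketch_set_multi(struct sketch_table *tbl, long index, long npages)
{
	/* Fall back at call time instead of swapping the ops pointer. */
	if (!sketch_have_multitce)
		return sketch_set_single(tbl, index, npages);

	tbl->multi_calls++;		/* one batched operation */
	return 0;
}

static const struct sketch_table_ops sketch_multi_ops = {
	.set = sketch_set_multi,
};

/* Callers dispatch through it_ops, which must be set beforehand. */
static int sketch_build(struct sketch_table *tbl, long index, long npages)
{
	assert(tbl && tbl->it_ops && tbl->it_ops->set);
	return tbl->it_ops->set(tbl, index, npages);
}
```

With this shape, turning the feature off later does not require visiting
every table, because no table ever needs its it_ops pointer changed.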

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h| 17 +++
 arch/powerpc/include/asm/machdep.h  | 25 
 arch/powerpc/kernel/iommu.c | 46 +++--
 arch/powerpc/kernel/vio.c   |  5 
 arch/powerpc/platforms/cell/iommu.c |  8 +++--
 arch/powerpc/platforms/pasemi/iommu.c   |  7 +++--
 arch/powerpc/platforms/powernv/pci-ioda.c   |  2 ++
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |  1 +
 arch/powerpc/platforms/powernv/pci.c| 23 ---
 arch/powerpc/platforms/powernv/pci.h|  1 +
 arch/powerpc/platforms/pseries/iommu.c  | 34 +++--
 arch/powerpc/sysdev/dart_iommu.c| 12 
 12 files changed, 93 insertions(+), 88 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 45b07f6..eb5822d 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -43,6 +43,22 @@
 extern int iommu_is_off;
 extern int iommu_force_on;
 
+struct iommu_table_ops {
+   int (*set)(struct iommu_table *tbl,
+   long index, long npages,
+   unsigned long uaddr,
+   enum dma_data_direction direction,
+   struct dma_attrs *attrs);
+   void (*clear)(struct iommu_table *tbl,
+   long index, long npages);
+   unsigned long (*get)(struct iommu_table *tbl, long index);
+   void (*flush)(struct iommu_table *tbl);
+};
+
+/* These are used by VIO */
+extern struct iommu_table_ops iommu_table_lpar_multi_ops;
+extern struct iommu_table_ops iommu_table_pseries_ops;
+
 /*
  * IOMAP_MAX_ORDER defines the largest contiguous block
  * of dma space we can get.  IOMAP_MAX_ORDER = 13
@@ -77,6 +93,7 @@ struct iommu_table {
 #ifdef CONFIG_IOMMU_API
struct iommu_group *it_group;
 #endif
+   struct iommu_table_ops *it_ops;
void (*set_bypass)(struct iommu_table *tbl, bool enable);
 };
 
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index c8175a3..2abe744 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -65,31 +65,6 @@ struct machdep_calls {
 * destroyed as well */
void(*hpte_clear_all)(void);
 
-   int (*tce_build)(struct iommu_table *tbl,
-long index,
-long npages,
-unsigned long uaddr,
-enum dma_data_direction direction,
-struct dma_attrs *attrs);
-   void(*tce_free)(struct iommu_table *tbl,
-   long index,
-   long npages);
-   unsigned long   (*tce_get)(struct iommu_table *tbl,
-   long index);
-   void(*tce_flush)(struct iommu_table *tbl);
-
-   /* _rm versions are for real mode use only */
-   int (*tce_build_rm)(struct iommu_table *tbl,
-long index,
-long npages,
-unsigned long uaddr,
-enum dma_data_direction direction,
-struct dma_attrs *attrs);
-   void(*tce_free_rm)(struct iommu_table *tbl,
-   long index,
-   long npages);
-   void(*tce_flush_rm)(struct iommu_table *tbl);
-
void __iomem *  (*ioremap)(phys_addr_t addr, unsigned long size,

[PATCH v3 16/24] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_create_table

2015-01-29 Thread Alexey Kardashevskiy

This is a part of moving TCE table allocation into an iommu_ops
callback to support multiple IOMMU groups per one VFIO container.

This is a mechanical patch.

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 88 +++
 1 file changed, 65 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index ebfea0a..95d9119 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1295,6 +1295,62 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
 }
 
+static long pnv_pci_ioda2_create_table(struct pnv_ioda_pe *pe,
+   __u32 page_shift, __u32 window_shift,
+   struct iommu_table *tbl)
+{
+   int nid = pe->phb->hose->node;
+   struct page *tce_mem = NULL;
+   void *addr;
+   unsigned long tce_table_size;
+   int64_t rc;
+   unsigned order;
+
+   if ((page_shift != 12) && (page_shift != 16) && (page_shift != 24))
+   return -EINVAL;
+
+   if ((1ULL << window_shift) > memory_hotplug_max())
+   return -EINVAL;
+
+   tce_table_size = (1ULL << (window_shift - page_shift)) * 8;
+   tce_table_size = max(0x1000UL, tce_table_size);
+
+   /* Allocate TCE table */
+   order = get_order(tce_table_size);
+
+   tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
+   if (!tce_mem) {
+   pr_err("Failed to allocate a TCE memory, order=%d\n", order);
+   rc = -ENOMEM;
+   goto fail;
+   }
+   addr = page_address(tce_mem);
+   memset(addr, 0, tce_table_size);
+
+   /* Setup linux iommu table */
+   pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
+   page_shift);
+
+   tbl->it_ops = &pnv_ioda2_iommu_ops;
+   iommu_init_table(tbl, nid);
+
+   return 0;
+fail:
+   if (tce_mem)
+   __free_pages(tce_mem, get_order(tce_table_size));
+
+   return rc;
+}
+
+static void pnv_pci_ioda2_free_table(struct iommu_table *tbl)
+{
+   if (!tbl->it_size)
+   return;
+
+   free_pages(tbl->it_base, get_order(tbl->it_size << 3));
+   memset(tbl, 0, sizeof(struct iommu_table));
+}
+
 static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
 {
uint16_t window_id = (pe->pe_number << 1 ) + 1;
@@ -1365,11 +1421,9 @@ static struct powerpc_iommu_ops pnv_pci_ioda2_ops = {
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
   struct pnv_ioda_pe *pe)
 {
-   struct page *tce_mem = NULL;
-   void *addr;
const __be64 *swinvp;
-   struct iommu_table *tbl;
-   unsigned int tce_table_size, end;
+   unsigned int end;
+   struct iommu_table *tbl = &pe->iommu.tables[0];
int64_t rc;
 
/* We shouldn't already have a 32-bit DMA associated */
@@ -1378,31 +1432,20 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 
/* The PE will reserve all possible 32-bits space */
pe->tce32_seg = 0;
+
end = (1 << ilog2(phb->ioda.m32_pci_base));
-   tce_table_size = (end / 0x1000) * 8;
pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
end);
 
-   /* Allocate TCE table */
-   tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
-  get_order(tce_table_size));
-   if (!tce_mem) {
-   pe_err(pe, "Failed to allocate a 32-bit TCE memory\n");
-   goto fail;
+   rc = pnv_pci_ioda2_create_table(pe, IOMMU_PAGE_SHIFT_4K,
+   ilog2(phb->ioda.m32_pci_base), tbl);
+   if (rc) {
+   pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
+   return;
}
-   addr = page_address(tce_mem);
-   memset(addr, 0, tce_table_size);
 
/* Setup iommu */
pe->iommu.tables[0].it_iommu = &pe->iommu;
-
-   /* Setup linux iommu table */
-   tbl = &pe->iommu.tables[0];
-   pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
-   IOMMU_PAGE_SHIFT_4K);
-
-   tbl->it_ops = &pnv_ioda2_iommu_ops;
-   iommu_init_table(tbl, phb->hose->node);
pe->iommu.ops = &pnv_pci_ioda2_ops;
 
/*
@@ -1447,8 +1490,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 fail:
if (pe->tce32_seg >= 0)
pe->tce32_seg = -1;
-   if (tce_mem)
-   __free_pages(tce_mem, get_order(tce_table_size));
+   pnv_pci_ioda2_free_table(tbl);
 }
 
 static void pnv_ioda_setup_dma(struct pnv_phb *phb)
-- 
2.0.0

[PATCH v3 17/24] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window

2015-01-29 Thread Alexey Kardashevskiy

This is a part of moving DMA window programming to an iommu_ops
callback.

This is a mechanical patch.

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 84 ---
 1 file changed, 56 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 95d9119..1f725d4 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1351,6 +1351,57 @@ static void pnv_pci_ioda2_free_table(struct iommu_table *tbl)
memset(tbl, 0, sizeof(struct iommu_table));
 }
 
+static long pnv_pci_ioda2_set_window(struct pnv_ioda_pe *pe,
+   struct iommu_table *tbl)
+{
+   struct pnv_phb *phb = pe->phb;
+   const __be64 *swinvp;
+   int64_t rc;
+   const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
+   const __u64 win_size = tbl->it_size << tbl->it_page_shift;
+
+   pe_info(pe, "Setting up window at %llx..%llx pagesize=0x%x tablesize=0x%lx\n",
+   start_addr, start_addr + win_size - 1,
+   1UL << tbl->it_page_shift, tbl->it_size << 3);
+
+   pe->iommu.tables[0] = *tbl;
+   tbl = &pe->iommu.tables[0];
+   tbl->it_iommu = &pe->iommu;
+
+   /*
+* Map TCE table through TVT. The TVE index is the PE number
+* shifted by 1 bit for 32-bits DMA space.
+*/
+   rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
+   pe->pe_number << 1, 1, __pa(tbl->it_base),
+   tbl->it_size << 3, 1ULL << tbl->it_page_shift);
+   if (rc) {
+   pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
+   goto fail;
+   }
+
+   /* OPAL variant of PHB3 invalidated TCEs */
+   swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
+   if (swinvp) {
+   /* We need a couple more fields -- an address and a data
+* to or.  Since the bus is only printed out on table free
+* errors, and on the first pass the data will be a relative
+* bus number, print that out instead.
+*/
+   pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
+   tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys,
+   8);
+   tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
+   }
+
+   return 0;
+fail:
+   if (pe->tce32_seg >= 0)
+   pe->tce32_seg = -1;
+
+   return rc;
+}
+
 static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
 {
uint16_t window_id = (pe->pe_number << 1 ) + 1;
@@ -1421,7 +1472,6 @@ static struct powerpc_iommu_ops pnv_pci_ioda2_ops = {
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
   struct pnv_ioda_pe *pe)
 {
-   const __be64 *swinvp;
unsigned int end;
struct iommu_table *tbl = &pe->iommu.tables[0];
int64_t rc;
@@ -1448,31 +1498,14 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
pe->iommu.tables[0].it_iommu = &pe->iommu;
pe->iommu.ops = &pnv_pci_ioda2_ops;
 
-   /*
-* Map TCE table through TVT. The TVE index is the PE number
-* shifted by 1 bit for 32-bits DMA space.
-*/
-   rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
-   pe->pe_number << 1, 1, __pa(tbl->it_base),
-   tbl->it_size << 3, 1ULL << tbl->it_page_shift);
+   rc = pnv_pci_ioda2_set_window(pe, tbl);
if (rc) {
pe_err(pe, "Failed to configure 32-bit TCE table,"
   " err %ld\n", rc);
-   goto fail;
-   }
-
-   /* OPAL variant of PHB3 invalidated TCEs */
-   swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
-   if (swinvp) {
-   /* We need a couple more fields -- an address and a data
-* to or.  Since the bus is only printed out on table free
-* errors, and on the first pass the data will be a relative
-* bus number, print that out instead.
-*/
-   pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
-   tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys,
-   8);
-   tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
+   pnv_pci_ioda2_free_table(tbl);
+   if (pe->tce32_seg >= 0)
+   pe->tce32_seg = -1;
+   return;
}
 
iommu_register_group(&pe->iommu, phb->hose->global_number,
@@ -1486,11 +1519,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 
/* Also create a bypass window */
pnv_pci_ioda2_setup_bypass_pe(phb, pe);
-   return;
-fail:
-   if

[PATCH v3 15/24] powerpc/powernv/ioda2: Rework iommu_table creation

2015-01-29 Thread Alexey Kardashevskiy

This moves iommu_table creation to the beginning. This is a mechanical
patch.

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 31 +--
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 6d279d5..ebfea0a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1393,27 +1393,31 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
addr = page_address(tce_mem);
memset(addr, 0, tce_table_size);
 
+   /* Setup iommu */
+   pe->iommu.tables[0].it_iommu = &pe->iommu;
+
+   /* Setup linux iommu table */
+   tbl = &pe->iommu.tables[0];
+   pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
+   IOMMU_PAGE_SHIFT_4K);
+
+   tbl->it_ops = &pnv_ioda2_iommu_ops;
+   iommu_init_table(tbl, phb->hose->node);
+   pe->iommu.ops = &pnv_pci_ioda2_ops;
+
/*
 * Map TCE table through TVT. The TVE index is the PE number
 * shifted by 1 bit for 32-bits DMA space.
 */
rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
-   pe->pe_number << 1, 1, __pa(addr),
-   tce_table_size, 0x1000);
+   pe->pe_number << 1, 1, __pa(tbl->it_base),
+   tbl->it_size << 3, 1ULL << tbl->it_page_shift);
if (rc) {
pe_err(pe, "Failed to configure 32-bit TCE table,"
   " err %ld\n", rc);
goto fail;
}
 
-   /* Setup iommu */
-   pe->iommu.tables[0].it_iommu = &pe->iommu;
-
-   /* Setup linux iommu table */
-   tbl = &pe->iommu.tables[0];
-   pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
-   IOMMU_PAGE_SHIFT_4K);
-
/* OPAL variant of PHB3 invalidated TCEs */
swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
if (swinvp) {
@@ -1427,14 +1431,13 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
8);
tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
}
-   tbl->it_ops = &pnv_ioda2_iommu_ops;
-   iommu_init_table(tbl, phb->hose->node);
-   pe->iommu.ops = &pnv_pci_ioda2_ops;
+
iommu_register_group(&pe->iommu, phb->hose->global_number,
pe->pe_number);
 
if (pe->pdev)
-   set_iommu_table_base_and_group(&pe->pdev->dev, tbl);
+   set_iommu_table_base_and_group(&pe->pdev->dev,
+   &pe->iommu.tables[0]);
else
pnv_ioda_setup_bus_dma(pe, pe->pbus, true);
 
-- 
2.0.0

[PATCH v3 14/24] vfio: powerpc/spapr: Register memory

2015-01-29 Thread Alexey Kardashevskiy

The existing implementation accounts the whole DMA window in
the locked_vm counter, which is going to be even worse with multiple
containers and huge DMA windows.

This introduces 2 ioctls to register/unregister DMA memory. They
receive the user space address and size of the memory region which
needs to be pinned/unpinned and counted in locked_vm.

If any memory region was registered, all subsequent DMA map requests
should address already pinned memory. If no memory was registered,
then the amount of memory required for a single default memory will be
accounted when the container is enabled and every map/unmap will pin/unpin
a page.

Dynamic DMA window and in-kernel acceleration will require memory to
be registered in order to work.

The accounting is done per VFIO container. When the support of
multiple groups per container is added, we will have accurate locked_vm
accounting.

Signed-off-by: Alexey Kardashevskiy 
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 333 
 include/uapi/linux/vfio.h   |  29 
 2 files changed, 331 insertions(+), 31 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 8256275..d0987ae 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -86,8 +86,169 @@ struct tce_container {
struct mutex lock;
struct iommu_group *grp;
bool enabled;
+   struct list_head mem_list;
 };
 
+struct tce_memory {
+   struct list_head next;
+   struct rcu_head rcu;
+   __u64 vaddr;
+   __u64 size;
+   __u64 pfns[];
+};
+
+static void tce_unpin_pages(struct tce_container *container,
+   struct tce_memory *mem, __u64 vaddr, __u64 size)
+{
+   __u64 off;
+   struct page *page = NULL;
+
+
+   for (off = 0; off < size; off += PAGE_SIZE) {
+   if (!mem->pfns[off >> PAGE_SHIFT])
+   continue;
+
+   page = pfn_to_page(mem->pfns[off >> PAGE_SHIFT]);
+   if (!page)
+   continue;
+
+   put_page(page);
+   mem->pfns[off >> PAGE_SHIFT] = 0;
+   }
+}
+
+static void release_tce_memory(struct rcu_head *head)
+{
+   struct tce_memory *mem = container_of(head, struct tce_memory, rcu);
+
+   kfree(mem);
+}
+
+static void tce_do_unregister_pages(struct tce_container *container,
+   struct tce_memory *mem)
+{
+   tce_unpin_pages(container, mem, mem->vaddr, mem->size);
+   decrement_locked_vm(mem->size);
+   list_del_rcu(&mem->next);
+   call_rcu_sched(&mem->rcu, release_tce_memory);
+}
+
+static long tce_unregister_pages(struct tce_container *container,
+   __u64 vaddr, __u64 size)
+{
+   struct tce_memory *mem, *memtmp;
+
+   if (container->enabled)
+   return -EBUSY;
+
+   if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK))
+   return -EINVAL;
+
+   list_for_each_entry_safe(mem, memtmp, &container->mem_list, next) {
+   if ((mem->vaddr == vaddr) && (mem->size == size)) {
+   tce_do_unregister_pages(container, mem);
+   return 0;
+   }
+   }
+
+   return -ENOENT;
+}
+
+static long tce_pin_pages(struct tce_container *container,
+   struct tce_memory *mem, __u64 vaddr, __u64 size)
+{
+   __u64 off;
+   struct page *page = NULL;
+
+   for (off = 0; off < size; off += PAGE_SIZE) {
+   if (1 != get_user_pages_fast(vaddr + off,
+   1/* pages */, 1/* iswrite */, &page)) {
+   tce_unpin_pages(container, mem, vaddr, off);
+   return -EFAULT;
+   }
+
+   mem->pfns[off >> PAGE_SHIFT] = page_to_pfn(page);
+   }
+
+   return 0;
+}
+
+static long tce_register_pages(struct tce_container *container,
+   __u64 vaddr, __u64 size)
+{
+   long ret;
+   struct tce_memory *mem;
+
+   if (container->enabled)
+   return -EBUSY;
+
+   if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK) ||
+   ((vaddr + size) < vaddr))
+   return -EINVAL;
+
+   /* Any overlap with registered chunks? */
+   rcu_read_lock();
+   list_for_each_entry_rcu(mem, &container->mem_list, next) {
+   if ((mem->vaddr < (vaddr + size)) &&
+   (vaddr < (mem->vaddr + mem->size))) {
+   ret = -EBUSY;
+   goto unlock_exit;
+   }
+   }
+
+   ret = try_increment_locked_vm(size >> PAGE_SHIFT);
+   if (ret)
+   goto unlock_exit;
+
+   mem = kzalloc(sizeof(*mem) + (size >> (PAGE_SHIFT - 3)), GFP_KERNEL);
+   if (!mem)
+   goto unlock_exit;
+
+   if (tce_pin_pages(container, mem, vaddr, size))
+   goto free_exit;
+
+   mem->vaddr = vaddr;
+   mem->size = s

[PATCH v3 09/24] powerpc/iommu: Fix IOMMU ownership control functions

2015-01-29 Thread Alexey Kardashevskiy

This adds missing locks in iommu_take_ownership()/
iommu_release_ownership().

This marks all pages busy in iommu_table::it_map in order to catch
errors if there is an attempt to use this table while ownership over it
is taken.

This only clears TCE content if there is no page marked busy in it_map.
Clearing must be done outside of the table locks as iommu_clear_tce()
called from iommu_clear_tces_and_put_pages() does this.
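
The bitmap handling described above can be sketched in user space as
follows. This is only an illustration: the sketch_* names are
hypothetical, the bitmap is a plain byte array, and the pool locks that
this patch adds are deliberately left out. Taking ownership succeeds only
if nothing but the reserved bit 0 is set, and then marks every entry
busy; releasing clears the map and restores the reservation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct sketch_table {
	unsigned long it_size;		/* number of entries */
	unsigned long it_offset;	/* number of the first entry */
	unsigned char *it_map;		/* one bit per entry, 1 = busy */
};

static bool sketch_map_empty(const struct sketch_table *tbl)
{
	unsigned long i;

	for (i = 0; i < tbl->it_size; i++)
		if (tbl->it_map[i >> 3] & (1u << (i & 7)))
			return false;
	return true;
}

/* Returns 0 on success, -1 if the table still has entries in use. */
static int sketch_take_ownership(struct sketch_table *tbl)
{
	unsigned long sz;
	int ret = 0;

	assert(tbl && tbl->it_map);
	sz = (tbl->it_size + 7) >> 3;	/* bitmap size in bytes */

	/* Bit 0 is a reservation on zero-based tables, not a real mapping. */
	if (tbl->it_offset == 0)
		tbl->it_map[0] &= 0xfe;

	if (!sketch_map_empty(tbl)) {
		ret = -1;
		if (tbl->it_offset == 0)
			tbl->it_map[0] |= 1;	/* restore the reservation */
	} else {
		memset(tbl->it_map, 0xff, sz);	/* mark everything busy */
	}

	return ret;
}

static void sketch_release_ownership(struct sketch_table *tbl)
{
	assert(tbl && tbl->it_map);
	memset(tbl->it_map, 0, (tbl->it_size + 7) >> 3);

	/* Restore bit 0 for zero-based tables. */
	if (tbl->it_offset == 0)
		tbl->it_map[0] |= 1;
}
```

Because a successful take leaves every bit set, a second attempt to take
the same table fails in this sketch, which is the error-catching
behaviour the commit message describes.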

Signed-off-by: Alexey Kardashevskiy 
---
Note: we might want to get rid of it as this patchset removes it_map
from tables passed to VFIO.

Changes:
v5:
* do not store bit#0 value, it has to be set for zero-based table
anyway
* removed test_and_clear_bit
* only disable bypass if succeeded
---
 arch/powerpc/kernel/iommu.c | 31 +--
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 952939f..407d0d6 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1024,33 +1024,48 @@ EXPORT_SYMBOL_GPL(iommu_tce_build);
 
 int iommu_take_ownership(struct iommu_table *tbl)
 {
-   unsigned long sz = (tbl->it_size + 7) >> 3;
+   unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
+   int ret = 0;
+
+   spin_lock_irqsave(&tbl->large_pool.lock, flags);
+   for (i = 0; i < tbl->nr_pools; i++)
+   spin_lock(&tbl->pools[i].lock);
 
if (tbl->it_offset == 0)
clear_bit(0, tbl->it_map);
 
if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
pr_err("iommu_tce: it_map is not empty");
-   return -EBUSY;
+   ret = -EBUSY;
+   if (tbl->it_offset == 0)
+   set_bit(0, tbl->it_map);
+   } else {
+   memset(tbl->it_map, 0xff, sz);
}
 
-   memset(tbl->it_map, 0xff, sz);
+   for (i = 0; i < tbl->nr_pools; i++)
+   spin_unlock(&tbl->pools[i].lock);
+   spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
 
/*
 * Disable iommu bypass, otherwise the user can DMA to all of
 * our physical memory via the bypass window instead of just
 * the pages that has been explicitly mapped into the iommu
 */
-   if (tbl->set_bypass)
+   if (!ret && tbl->set_bypass)
tbl->set_bypass(tbl, false);
 
-   return 0;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_take_ownership);
 
 void iommu_release_ownership(struct iommu_table *tbl)
 {
-   unsigned long sz = (tbl->it_size + 7) >> 3;
+   unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
+
+   spin_lock_irqsave(&tbl->large_pool.lock, flags);
+   for (i = 0; i < tbl->nr_pools; i++)
+   spin_lock(&tbl->pools[i].lock);
 
memset(tbl->it_map, 0, sz);
 
@@ -1058,6 +1073,10 @@ void iommu_release_ownership(struct iommu_table *tbl)
if (tbl->it_offset == 0)
set_bit(0, tbl->it_map);
 
+   for (i = 0; i < tbl->nr_pools; i++)
+   spin_unlock(&tbl->pools[i].lock);
+   spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
+
/* The kernel owns the device now, we can restore the iommu bypass */
if (tbl->set_bypass)
tbl->set_bypass(tbl, true);
-- 
2.0.0

[PATCH v3 13/24] powerpc/pseries/lpar: Enable VFIO

2015-01-29 Thread Alexey Kardashevskiy

The previous patch introduced the iommu_table_ops::exchange() callback
which effectively disabled VFIO on pseries. This implements exchange()
for pseries/lpar so VFIO can work in nested guests.

Since the exchange() callback returns an old TCE, it has to call H_GET_TCE
for every TCE being put to the table so VFIO performance in guests
running under PR KVM is expected to be slower than in guests running under
HV KVM or bare metal hosts.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v5:
* added global lock for xchg operations
* added missing be64_to_cpu(oldtce)
---
 arch/powerpc/platforms/pseries/iommu.c | 44 --
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index f537e6e..a903a27 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -137,14 +137,25 @@ static void tce_freemulti_pSeriesLP(struct iommu_table*, 
long, long);
 
 static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
long npages, unsigned long uaddr,
+   unsigned long *old_tces,
enum dma_data_direction direction,
struct dma_attrs *attrs)
 {
u64 rc = 0;
u64 proto_tce, tce;
u64 rpn;
-   int ret = 0;
+   int ret = 0, i = 0;
long tcenum_start = tcenum, npages_start = npages;
+   static spinlock_t get_tces_lock;
+   static bool get_tces_lock_initialized;
+
+   if (old_tces) {
+   if (!get_tces_lock_initialized) {
+   spin_lock_init(&get_tces_lock);
+   get_tces_lock_initialized = true;
+   }
+   spin_lock(&get_tces_lock);
+   }
 
rpn = __pa(uaddr) >> TCE_SHIFT;
proto_tce = TCE_PCI_READ;
@@ -153,6 +164,14 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, 
long tcenum,
 
while (npages--) {
tce = proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT;
+   if (old_tces) {
+   unsigned long oldtce = 0;
+
+   plpar_tce_get((u64)tbl->it_index, (u64)tcenum << 12,
+   &oldtce);
+   old_tces[i] = be64_to_cpu(oldtce);
+   i++;
+   }
rc = plpar_tce_put((u64)tbl->it_index, (u64)tcenum << 12, tce);
 
if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
@@ -173,13 +192,18 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, 
long tcenum,
tcenum++;
rpn++;
}
+
+   if (old_tces)
+   spin_unlock(&get_tces_lock);
+
return ret;
 }
 
 static DEFINE_PER_CPU(__be64 *, tce_page);
 
-static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
+static int tce_xchg_pSeriesLP(struct iommu_table *tbl, long tcenum,
 long npages, unsigned long uaddr,
+unsigned long *old_tces,
 enum dma_data_direction direction,
 struct dma_attrs *attrs)
 {
@@ -194,6 +218,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
*tbl, long tcenum,
 
if ((npages == 1) || !firmware_has_feature(FW_FEATURE_MULTITCE)) {
return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
+  old_tces,
   direction, attrs);
}
 
@@ -210,6 +235,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
*tbl, long tcenum,
if (!tcep) {
local_irq_restore(flags);
return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
+   old_tces,
direction, attrs);
}
__this_cpu_write(tce_page, tcep);
@@ -231,6 +257,10 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
*tbl, long tcenum,
for (l = 0; l < limit; l++) {
tcep[l] = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) 
<< TCE_RPN_SHIFT);
rpn++;
+   if (old_tces)
+   plpar_tce_get((u64)tbl->it_index,
+   (u64)(tcenum + l) << 12,
+   &old_tces[tcenum + l]);
}
 
rc = plpar_tce_put_indirect((u64)tbl->it_index,
@@ -261,6 +291,15 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
*tbl, long tcenum,
return ret;
 }
 
+static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
+long npages, unsigned long uaddr,
+

[PATCH v3 10/24] powerpc/powernv/ioda2: Rework IOMMU ownership control

2015-01-29 Thread Alexey Kardashevskiy

At the moment the iommu_table struct has a set_bypass() which enables/
disables DMA bypass on IODA2 PHB. This is exposed to POWERPC IOMMU code
which calls this callback when external IOMMU users such as VFIO are
about to take over a PHB.

The set_bypass() callback is not really an iommu_table function but
IOMMU/PE function. This introduces a powerpc_iommu_ops struct and
adds a set_ownership() callback to it which is called when an external
user takes control over the IOMMU.

This renames set_bypass() to set_ownership() as it is not necessarily
just enabling bypassing, it can be something else/more so let's give it
a more generic name. The bool parameter is inverted.

The callback is implemented for IODA2 only.

This replaces iommu_take_ownership()/iommu_release_ownership() calls
with the callback calls and it is up to the platform code to call
iommu_take_ownership()/iommu_release_ownership() if needed. Next patches
will remove these calls from IODA2 code.
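The take-all-or-roll-back pattern that the reworked iommu_take_ownership() follows can be sketched in isolation. This is only an illustration: struct sketch_table and the sketch_* helpers are hypothetical stand-ins, not the kernel's iommu_table API, and the "busy" flag merely models a table whose it_map is not empty.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one IOMMU table inside a group. */
struct sketch_table {
	bool present;	/* models "tbl->it_map != NULL" */
	bool busy;	/* models a non-empty allocation bitmap */
	bool owned;	/* set while an external user owns the table */
};

static int sketch_table_take(struct sketch_table *t)
{
	if (t->busy)
		return -EBUSY;
	t->owned = true;
	return 0;
}

static void sketch_table_release(struct sketch_table *t)
{
	t->owned = false;
}

/* Take every present table; on failure undo the ones already taken. */
static int sketch_take_all(struct sketch_table *tables, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!tables[i].present)
			continue;

		int rc = sketch_table_take(&tables[i]);
		if (rc) {
			for (size_t j = 0; j < i; j++)
				if (tables[j].present)
					sketch_table_release(&tables[j]);
			return rc;
		}
	}
	return 0;
}
```

Note that the rollback only helps if the per-table helper actually propagates its error code to the caller.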

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h  | 18 +--
 arch/powerpc/kernel/iommu.c   | 53 +++
 arch/powerpc/platforms/powernv/pci-ioda.c | 30 -
 drivers/vfio/vfio_iommu_spapr_tce.c   | 19 ---
 4 files changed, 90 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 4fe..ba16aa0 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -92,7 +92,6 @@ struct iommu_table {
unsigned long  it_page_shift;/* table iommu page size */
struct powerpc_iommu *it_iommu;
struct iommu_table_ops *it_ops;
-   void (*set_bypass)(struct iommu_table *tbl, bool enable);
 };
 
 /* Pure 2^n version of get_order */
@@ -127,11 +126,24 @@ extern struct iommu_table *iommu_init_table(struct 
iommu_table * tbl,
 
 #define POWERPC_IOMMU_MAX_TABLES   1
 
+struct powerpc_iommu;
+
+struct powerpc_iommu_ops {
+   /*
+* Switches ownership from the kernel itself to an external
+* user. While ownership is enabled, the kernel cannot use IOMMU
+* for itself.
+*/
+   void (*set_ownership)(struct powerpc_iommu *iommu,
+   bool enable);
+};
+
 struct powerpc_iommu {
 #ifdef CONFIG_IOMMU_API
struct iommu_group *group;
 #endif
struct iommu_table tables[POWERPC_IOMMU_MAX_TABLES];
+   struct powerpc_iommu_ops *ops;
 };
 
 #ifdef CONFIG_IOMMU_API
@@ -219,8 +231,8 @@ extern unsigned long iommu_clear_tce(struct iommu_table 
*tbl,
unsigned long entry);
 
 extern void iommu_flush_tce(struct iommu_table *tbl);
-extern int iommu_take_ownership(struct iommu_table *tbl);
-extern void iommu_release_ownership(struct iommu_table *tbl);
+extern int iommu_take_ownership(struct powerpc_iommu *iommu);
+extern void iommu_release_ownership(struct powerpc_iommu *iommu);
 
 #endif /* __KERNEL__ */
 #endif /* _ASM_IOMMU_H */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 407d0d6..9d06425 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1022,7 +1022,7 @@ int iommu_tce_build(struct iommu_table *tbl, unsigned 
long entry,
 }
 EXPORT_SYMBOL_GPL(iommu_tce_build);
 
-int iommu_take_ownership(struct iommu_table *tbl)
+static int iommu_table_take_ownership(struct iommu_table *tbl)
 {
unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
int ret = 0;
@@ -1047,19 +1047,36 @@ int iommu_take_ownership(struct iommu_table *tbl)
spin_unlock(&tbl->pools[i].lock);
spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
 
-   /*
-* Disable iommu bypass, otherwise the user can DMA to all of
-* our physical memory via the bypass window instead of just
-* the pages that has been explicitly mapped into the iommu
-*/
-   if (!ret && tbl->set_bypass)
-   tbl->set_bypass(tbl, false);
-
-   return ret;
+   return 0;
+}
+
+static void iommu_table_release_ownership(struct iommu_table *tbl);
+
+int iommu_take_ownership(struct powerpc_iommu *iommu)
+{
+   int i, j, rc = 0;
+
+   for (i = 0; i < POWERPC_IOMMU_MAX_TABLES; ++i) {
+   struct iommu_table *tbl = &iommu->tables[i];
+
+   if (!tbl->it_map)
+   continue;
+
+   rc = iommu_table_take_ownership(tbl);
+   if (rc) {
+   for (j = 0; j < i; ++j)
+   iommu_table_release_ownership(
+   &iommu->tables[j]);
+
+   return rc;
+   }
+   }
+
+   return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_take_ownership);
 
-void iommu_release_ownership(struct iommu_table *tbl)
+static void iommu_table_release_ownership(struct iommu_table *tbl)
 {
unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
 
@@ -1076,10 +1093,18 @@ vo

[PATCH v3 07/24] powerpc/iommu: Introduce iommu_table_alloc() helper

2015-01-29 Thread Alexey Kardashevskiy

This replaces multiple calls of kzalloc_node() with a new
iommu_table_alloc() helper. Right now it calls kzalloc_node() but
later it will be modified to allocate a powerpc_iommu struct with
a single iommu_table in it.

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h   |  1 +
 arch/powerpc/kernel/iommu.c|  9 +
 arch/powerpc/platforms/powernv/pci.c   |  2 +-
 arch/powerpc/platforms/pseries/iommu.c | 12 
 4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index eb5822d..335e3d4 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -117,6 +117,7 @@ static inline void *get_iommu_table_base(struct device *dev)
return dev->archdata.dma_data.iommu_table_base;
 }
 
+extern struct iommu_table *iommu_table_alloc(int node);
 /* Frees table for an individual device node */
 extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
 
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index c51ad3e..2f7e92b 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -710,6 +710,15 @@ struct iommu_table *iommu_init_table(struct iommu_table 
*tbl, int nid)
return tbl;
 }
 
+struct iommu_table *iommu_table_alloc(int node)
+{
+   struct iommu_table *tbl;
+
+   tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL, node);
+
+   return tbl;
+}
+
 void iommu_free_table(struct iommu_table *tbl, const char *node_name)
 {
unsigned long bitmap_sz;
diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index c4782b1..bbe529b 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -693,7 +693,7 @@ static struct iommu_table *pnv_pci_setup_bml_iommu(struct 
pci_controller *hose)
   hose->dn->full_name);
return NULL;
}
-   tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL, hose->node);
+   tbl = iommu_table_alloc(hose->node);
if (WARN_ON(!tbl))
return NULL;
pnv_pci_setup_iommu_table(tbl, __va(be64_to_cpup(basep)),
diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 1aa1815..bc14299 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -617,8 +617,7 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
pci->phb->dma_window_size = 0x800ul;
pci->phb->dma_window_base_cur = 0x800ul;
 
-   tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
-  pci->phb->node);
+   tbl = iommu_table_alloc(pci->phb->node);
 
iommu_table_setparms(pci->phb, dn, tbl);
tbl->it_ops = &iommu_table_pseries_ops;
@@ -669,8 +668,7 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 pdn->full_name, ppci->iommu_table);
 
if (!ppci->iommu_table) {
-   tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
-  ppci->phb->node);
+   tbl = iommu_table_alloc(ppci->phb->node);
iommu_table_setparms_lpar(ppci->phb, pdn, tbl, dma_window);
tbl->it_ops = &iommu_table_lpar_multi_ops;
ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node);
@@ -697,8 +695,7 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
struct pci_controller *phb = PCI_DN(dn)->phb;
 
pr_debug(" --> first child, no bridge. Allocating iommu 
table.\n");
-   tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
-  phb->node);
+   tbl = iommu_table_alloc(phb->node);
iommu_table_setparms(phb, dn, tbl);
tbl->it_ops = &iommu_table_pseries_ops;
PCI_DN(dn)->iommu_table = iommu_init_table(tbl, phb->node);
@@ -1120,8 +1117,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev 
*dev)
 
pci = PCI_DN(pdn);
if (!pci->iommu_table) {
-   tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
-  pci->phb->node);
+   tbl = iommu_table_alloc(pci->phb->node);
iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window);
tbl->it_ops = &iommu_table_lpar_multi_ops;
pci->iommu_table = iommu_init_table(tbl, pci->phb->node);
-- 
2.0.0

[PATCH v3 12/24] powerpc/iommu/powernv: Release replaced TCE

2015-01-29 Thread Alexey Kardashevskiy

At the moment writing a new TCE value to the IOMMU table fails with EBUSY
if there is a valid entry already. However the PAPR specification allows
the guest to write a new TCE value without clearing it first.

Another problem this patch addresses is the use of pool locks for
external IOMMU users such as VFIO. The pool locks protect the
DMA page allocator rather than entries and since the host kernel does
not control what pages are in use, there is no point in pool locks and
exchange()+put_page(oldtce) is sufficient to avoid possible races.

This adds an exchange() callback to iommu_table_ops which does the same
thing as set() plus it returns replaced TCE(s) so the caller can release
the pages afterwards.
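The exchange-then-release idea can be sketched outside the kernel as follows. This is a minimal illustration under assumptions: it uses C11 atomic_exchange() on a flat array, the permission mask value is invented, and a counter stands in for put_page(); none of the sketch_* names come from the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative permission bits; a non-zero value marks a valid entry. */
#define SKETCH_TCE_PERM_MASK 0x3ull

/* Store the new entry and hand back whatever was there before. */
static uint64_t sketch_tce_exchange(_Atomic uint64_t *table,
				    unsigned long index, uint64_t new_tce)
{
	return atomic_exchange(&table[index], new_tce);
}

/*
 * Replace an entry and "release" the page behind the old one.
 * The counter stands in for put_page() on the replaced page.
 */
static void sketch_tce_replace(_Atomic uint64_t *table, unsigned long index,
			       uint64_t new_tce, unsigned long *released)
{
	uint64_t old = sketch_tce_exchange(table, index, new_tce);

	if (old & SKETCH_TCE_PERM_MASK)
		(*released)++;
}
```

Because the old value is returned by the same operation that installs the new one, two racing writers can never both observe (and release) the same old entry, which is why no pool lock is needed in this model.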

This implements exchange() for IODA2 only. This adds a requirement
for a platform to have exchange() implemented so from now on IODA2 is
the only supported PHB for VFIO-SPAPR.

This replaces iommu_tce_build() and iommu_clear_tce() with
a single iommu_tce_xchg().

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h  | 13 +---
 arch/powerpc/kernel/iommu.c   | 50 +++
 arch/powerpc/platforms/powernv/pci-ioda.c | 16 ++
 arch/powerpc/platforms/powernv/pci.c  | 22 ++
 arch/powerpc/platforms/powernv/pci.h  |  4 +++
 drivers/vfio/vfio_iommu_spapr_tce.c   | 36 ++
 6 files changed, 92 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index ba16aa0..bf26d47 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -49,6 +49,12 @@ struct iommu_table_ops {
unsigned long uaddr,
enum dma_data_direction direction,
struct dma_attrs *attrs);
+   int (*exchange)(struct iommu_table *tbl,
+   long index, long npages,
+   unsigned long uaddr,
+   unsigned long *old_tces,
+   enum dma_data_direction direction,
+   struct dma_attrs *attrs);
void (*clear)(struct iommu_table *tbl,
long index, long npages);
unsigned long (*get)(struct iommu_table *tbl, long index);
@@ -225,10 +231,9 @@ extern int iommu_tce_clear_param_check(struct iommu_table 
*tbl,
unsigned long npages);
 extern int iommu_tce_put_param_check(struct iommu_table *tbl,
unsigned long ioba, unsigned long tce);
-extern int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
-   unsigned long hwaddr, enum dma_data_direction direction);
-extern unsigned long iommu_clear_tce(struct iommu_table *tbl,
-   unsigned long entry);
+extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
+   unsigned long hwaddr, unsigned long *oldtce,
+   enum dma_data_direction direction);
 
 extern void iommu_flush_tce(struct iommu_table *tbl);
 extern int iommu_take_ownership(struct powerpc_iommu *iommu);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 9d06425..26feaff 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -974,44 +974,18 @@ int iommu_tce_put_param_check(struct iommu_table *tbl,
 }
 EXPORT_SYMBOL_GPL(iommu_tce_put_param_check);
 
-unsigned long iommu_clear_tce(struct iommu_table *tbl, unsigned long entry)
-{
-   unsigned long oldtce;
-   struct iommu_pool *pool = get_pool(tbl, entry);
-
-   spin_lock(&(pool->lock));
-
-   oldtce = tbl->it_ops->get(tbl, entry);
-   if (oldtce & (TCE_PCI_WRITE | TCE_PCI_READ))
-   tbl->it_ops->clear(tbl, entry, 1);
-   else
-   oldtce = 0;
-
-   spin_unlock(&(pool->lock));
-
-   return oldtce;
-}
-EXPORT_SYMBOL_GPL(iommu_clear_tce);
-
 /*
  * hwaddr is a kernel virtual address here (0xc... bazillion),
  * tce_build converts it to a physical address.
  */
-int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
-   unsigned long hwaddr, enum dma_data_direction direction)
+long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
+   unsigned long hwaddr, unsigned long *oldtce,
+   enum dma_data_direction direction)
 {
-   int ret = -EBUSY;
-   unsigned long oldtce;
-   struct iommu_pool *pool = get_pool(tbl, entry);
+   long ret;
 
-   spin_lock(&(pool->lock));
-
-   oldtce = tbl->it_ops->get(tbl, entry);
-   /* Add new entry if it is not busy */
-   if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)))
-   ret = tbl->it_ops->set(tbl, entry, 1, hwaddr, direction, NULL);
-
-   spin_unlock(&(pool->lock));
+   ret = tbl->it_ops->exchange(tbl, entry, 1, hwaddr, oldtce,
+   direction, NULL);
 
/* if (unlikely(ret))
pr_err("iommu_tce: %s failed on hwaddr=%lx ioba=%lx kva=%l

[PATCH v3 11/24] powerpc/powernv/ioda/ioda2: Rework tce_build()/tce_free()

2015-01-29 Thread Alexey Kardashevskiy

The pnv_pci_ioda_tce_invalidate() helper invalidates TCE cache. It is
supposed to be called on IODA1/2 and not called on p5ioc2. It receives
start and end host addresses of TCE table. This approach makes it possible
to get pnv_pci_ioda_tce_invalidate() unintentionally called on p5ioc2.
Another issue is that IODA2 needs PCI addresses to invalidate the cache
and those can be calculated from host addresses but since we are going
to implement multi-level TCE tables, calculating PCI address from
a host address might get either tricky or ugly as TCE table remains flat
on PCI bus but not in RAM.

This defines separate iommu_table_ops callbacks for p5ioc2 and IODA1/2
PHBs. They all call common pnv_tce_build/pnv_tce_free/pnv_tce_get helpers
but call PHB specific TCE invalidation helper (when needed).

This changes pnv_pci_ioda2_tce_invalidate() to receive a TCE index and
a number of pages, which are PCI addresses shifted by the IOMMU page shift.

The patch is pretty mechanical and behaviour is not expected to change.

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/pci-ioda.c   | 92 ++---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |  8 ++-
 arch/powerpc/platforms/powernv/pci.c| 76 +---
 arch/powerpc/platforms/powernv/pci.h|  7 ++-
 4 files changed, 110 insertions(+), 73 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index a33a116..dfc56fc 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1041,18 +1041,20 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe 
*pe,
}
 }
 
-static void pnv_pci_ioda1_tce_invalidate(struct pnv_ioda_pe *pe,
-struct iommu_table *tbl,
-__be64 *startp, __be64 *endp, bool rm)
+static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
+   unsigned long index, unsigned long npages, bool rm)
 {
+   struct pnv_ioda_pe *pe = container_of(tbl->it_iommu,
+   struct pnv_ioda_pe, iommu);
__be64 __iomem *invalidate = rm ?
(__be64 __iomem *)pe->tce_inval_reg_phys :
(__be64 __iomem *)tbl->it_index;
unsigned long start, end, inc;
const unsigned shift = tbl->it_page_shift;
 
-   start = __pa(startp);
-   end = __pa(endp);
+   start = __pa((__be64 *)tbl->it_base + index - tbl->it_offset);
+   end = __pa((__be64 *)tbl->it_base + index - tbl->it_offset +
+   npages - 1);
 
/* BML uses this case for p6/p7/galaxy2: Shift addr and put in node */
if (tbl->it_busno) {
@@ -1088,10 +1090,40 @@ static void pnv_pci_ioda1_tce_invalidate(struct 
pnv_ioda_pe *pe,
 */
 }
 
-static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
-struct iommu_table *tbl,
-__be64 *startp, __be64 *endp, bool rm)
+static int pnv_ioda1_tce_build_vm(struct iommu_table *tbl, long index,
+   long npages, unsigned long uaddr,
+   enum dma_data_direction direction,
+   struct dma_attrs *attrs)
 {
+   long ret = pnv_tce_build(tbl, index, npages, uaddr, direction,
+   attrs);
+
+   if (!ret && (tbl->it_type & TCE_PCI_SWINV_CREATE))
+   pnv_pci_ioda1_tce_invalidate(tbl, index, npages, false);
+
+   return ret;
+}
+
+static void pnv_ioda1_tce_free_vm(struct iommu_table *tbl, long index,
+   long npages)
+{
+   pnv_tce_free(tbl, index, npages);
+
+   if (tbl->it_type & TCE_PCI_SWINV_FREE)
+   pnv_pci_ioda1_tce_invalidate(tbl, index, npages, false);
+}
+
+struct iommu_table_ops pnv_ioda1_iommu_ops = {
+   .set = pnv_ioda1_tce_build_vm,
+   .clear = pnv_ioda1_tce_free_vm,
+   .get = pnv_tce_get,
+};
+
+static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
+   unsigned long index, unsigned long npages, bool rm)
+{
+   struct pnv_ioda_pe *pe = container_of(tbl->it_iommu,
+   struct pnv_ioda_pe, iommu);
unsigned long start, end, inc;
__be64 __iomem *invalidate = rm ?
(__be64 __iomem *)pe->tce_inval_reg_phys :
@@ -1104,9 +1136,9 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
pnv_ioda_pe *pe,
end = start;
 
/* Figure out the start, end and step */
-   inc = tbl->it_offset + (((u64)startp - tbl->it_base) / sizeof(u64));
+   inc = tbl->it_offset + index / sizeof(u64);
start |= (inc << shift);
-   inc = tbl->it_offset + (((u64)endp - tbl->it_base) / sizeof(u64));
+   inc = tbl->it_offset + (index + npages - 1) / sizeof(u64);
end |= (inc << shift);
inc = (0x1ull << shift);
mb();
@@ -1120,19 +1152,35 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
pnv_ioda_pe

[PATCH v3 03/24] powerpc/powernv: Do not set "read" flag if direction==DMA_NONE

2015-01-29 Thread Alexey Kardashevskiy

Normally a bitmap from the iommu_table is used to track which TCE entries
are in use. Since we are going to use iommu_table without its locks and
do xchg() instead, it becomes essential not to set permission bits which
are not implied by the direction flag.
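A standalone sketch of why this matters: once the bitmap is out of the picture, the permission bits are the only indication that an entry is valid, so a "no direction" mapping must produce no bits at all. The enum and bit values below are illustrative stand-ins for enum dma_data_direction and TCE_PCI_READ/TCE_PCI_WRITE, not the kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's enum dma_data_direction. */
enum sketch_dma_dir {
	SKETCH_DMA_BIDIRECTIONAL,
	SKETCH_DMA_TO_DEVICE,
	SKETCH_DMA_FROM_DEVICE,
	SKETCH_DMA_NONE
};

/* Illustrative permission bits; the real TCE_PCI_* values may differ. */
#define SKETCH_TCE_READ  0x1ull
#define SKETCH_TCE_WRITE 0x2ull

/* Same mapping as the helper added by the patch. */
static uint64_t sketch_dir_to_flags(enum sketch_dma_dir dir)
{
	switch (dir) {
	case SKETCH_DMA_BIDIRECTIONAL:
	case SKETCH_DMA_FROM_DEVICE:
		return SKETCH_TCE_READ | SKETCH_TCE_WRITE;
	case SKETCH_DMA_TO_DEVICE:
		return SKETCH_TCE_READ;
	default:
		return 0;
	}
}

/* Without a bitmap, an entry counts as used iff a permission bit is set. */
static bool sketch_tce_in_use(uint64_t tce)
{
	return (tce & (SKETCH_TCE_READ | SKETCH_TCE_WRITE)) != 0;
}
```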

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: David Gibson 
---
 arch/powerpc/platforms/powernv/pci.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index 4945e87..9ec7d68 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -589,19 +589,27 @@ struct pci_ops pnv_pci_ops = {
.write = pnv_pci_write_config,
 };
 
+static unsigned long pnv_dmadir_to_flags(enum dma_data_direction direction)
+{
+   switch (direction) {
+   case DMA_BIDIRECTIONAL:
+   case DMA_FROM_DEVICE:
+   return TCE_PCI_READ | TCE_PCI_WRITE;
+   case DMA_TO_DEVICE:
+   return TCE_PCI_READ;
+   default:
+   return 0;
+   }
+}
+
 static int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
 unsigned long uaddr, enum dma_data_direction direction,
 struct dma_attrs *attrs, bool rm)
 {
-   u64 proto_tce;
+   u64 proto_tce = pnv_dmadir_to_flags(direction);
__be64 *tcep, *tces;
u64 rpn;
 
-   proto_tce = TCE_PCI_READ; // Read allowed
-
-   if (direction != DMA_TO_DEVICE)
-   proto_tce |= TCE_PCI_WRITE;
-
tces = tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
rpn = __pa(uaddr) >> tbl->it_page_shift;
 
-- 
2.0.0

[PATCH v3 00/24] powerpc/iommu/vfio: Enable Dynamic DMA windows

2015-01-29 Thread Alexey Kardashevskiy


This enables PAPR defined feature called Dynamic DMA windows (DDW).

Each Partitionable Endpoint (IOMMU group) has a separate DMA window on
a PCI bus where devices are allowed to perform DMA. By default there is
a 1 or 2GB window allocated at host boot time and these windows are
used when an IOMMU group is passed to the userspace (guest). These windows
are mapped at zero offset on a PCI bus.

High-speed devices may suffer from the limited size of this window. On the host
side a TCE bypass mode is enabled on POWER8 CPU which implements
direct mapping of the host memory to a PCI bus at 1<<59.

For the guest, PAPR defines a DDW RTAS API which allows the pseries guest
to query the hypervisor whether it supports DDW and what the parameters
of possible windows are.

Currently POWER8 supports 2 DMA windows per PE: the small 32bit window
mentioned above and a 64bit window which can only start from 1<<59 and
can support various page sizes.

This patchset reworks PPC IOMMU code and adds necessary structures
to extend it to support big windows.

When the guest detects the feature and the PE is capable of 64bit DMA,
it does the following:
1. queries the hypervisor about the number of available windows and page masks;
2. creates a window with the biggest possible page size (current guests can do
64K or 16MB TCEs);
3. maps the entire guest RAM via H_PUT_TCE* hypercalls;
4. switches dma_ops to direct_dma_ops on the selected PE.

Once this is done, H_PUT_TCE is not called anymore and the guest gets
maximum performance.
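To give a feel for why the guest picks the biggest page size, here is a rough sketch of how many TCEs (and how much flat table space) mapping all of guest RAM takes. It assumes 8-byte TCEs and a single-level table; the sketch_* names are hypothetical and multilevel tables are not modelled.

```c
#include <stdint.h>

#define SKETCH_TCE_BYTES 8u	/* one TCE is a 64-bit entry */

/* Number of TCEs needed to cover ram_bytes with 2^page_shift pages. */
static uint64_t sketch_tces_needed(uint64_t ram_bytes, unsigned int page_shift)
{
	uint64_t page_size = (uint64_t)1 << page_shift;

	return (ram_bytes + page_size - 1) >> page_shift;
}

/* Size in bytes of a flat table holding those TCEs. */
static uint64_t sketch_table_bytes(uint64_t ram_bytes, unsigned int page_shift)
{
	return sketch_tces_needed(ram_bytes, page_shift) * SKETCH_TCE_BYTES;
}
```

For a 4GB guest this gives 65536 TCEs (a 512KB table) with 64K pages, but only 256 TCEs (2KB) with 16MB pages, so step 3 needs far fewer hypercalls with the larger page size.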

Changes:
v3:
* (!) redesigned the whole thing
* multiple IOMMU groups per PHB -> one PHB is needed for VFIO in the guest ->
no problems with locked_vm counting; also we save memory on actual tables
* guest RAM preregistration is required for DDW
* PEs (IOMMU groups) are passed to VFIO with no DMA windows at all so
we do not bother with iommu_table::it_map anymore
* added multilevel TCE tables support to support really huge guests

v2:
* added missing __pa() in "powerpc/powernv: Release replaced TCE"
* reposted to make some noise




Alexey Kardashevskiy (24):
  vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU
driver
  vfio: powerpc/iommu: Check that TCE page size is equal to it_page_size
  powerpc/powernv: Do not set "read" flag if direction==DMA_NONE
  vfio: powerpc/spapr: Use it_page_size
  vfio: powerpc/spapr: Move locked_vm accounting to helpers
  powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table
  powerpc/iommu: Introduce iommu_table_alloc() helper
  powerpc/spapr: vfio: Switch from iommu_table to new powerpc_iommu
  powerpc/iommu: Fix IOMMU ownership control functions
  powerpc/powernv/ioda2: Rework IOMMU ownership control
  powerpc/powernv/ioda/ioda2: Rework tce_build()/tce_free()
  powerpc/iommu/powernv: Release replaced TCE
  powerpc/pseries/lpar: Enable VFIO
  vfio: powerpc/spapr: Register memory
  poweppc/powernv/ioda2: Rework iommu_table creation
  powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_create_table
  powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window
  powerpc/iommu: Split iommu_free_table into 2 helpers
  powerpc/powernv: Implement multilevel TCE tables
  powerpc/powernv: Change prototypes to receive iommu
  powerpc/powernv/ioda: Define and implement DMA table/window management
callbacks
  powerpc/iommu: Get rid of ownership helpers
  vfio/spapr: Enable multiple groups in a container
  vfio: powerpc/spapr: Support Dynamic DMA windows

 arch/powerpc/include/asm/iommu.h| 107 +++-
 arch/powerpc/include/asm/machdep.h  |  25 -
 arch/powerpc/kernel/eeh.c   |   2 +-
 arch/powerpc/kernel/iommu.c | 282 +++--
 arch/powerpc/kernel/vio.c   |   5 +
 arch/powerpc/platforms/cell/iommu.c |   8 +-
 arch/powerpc/platforms/pasemi/iommu.c   |   7 +-
 arch/powerpc/platforms/powernv/pci-ioda.c   | 470 ---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |  21 +-
 arch/powerpc/platforms/powernv/pci.c| 130 +++--
 arch/powerpc/platforms/powernv/pci.h|  14 +-
 arch/powerpc/platforms/pseries/iommu.c  |  99 +++-
 arch/powerpc/sysdev/dart_iommu.c|  12 +-
 drivers/vfio/vfio_iommu_spapr_tce.c | 874 
 include/uapi/linux/vfio.h   |  53 +-
 15 files changed, 1584 insertions(+), 525 deletions(-)

-- 
2.0.0

[PATCH v3 05/24] vfio: powerpc/spapr: Move locked_vm accounting to helpers

2015-01-29 Thread Alexey Kardashevskiy

This moves locked pages accounting to helpers.
Later they will be reused for Dynamic DMA windows (DDW).

While we are here, update the comment explaining why RLIMIT_MEMLOCK
might be required to be bigger than the guest RAM. This also prints
pid of the current process in pr_warn/pr_debug.
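The accounting logic of the two helpers can be sketched without any kernel dependencies. Everything here is an assumption made for illustration: struct sketch_lock_account replaces current->mm and rlimit(), the privileged flag replaces capable(CAP_IPC_LOCK), and no locking is modelled.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-process locked memory state. */
struct sketch_lock_account {
	long locked_pages;	/* models mm->locked_vm */
	long limit_pages;	/* models RLIMIT_MEMLOCK in pages */
	bool privileged;	/* models capable(CAP_IPC_LOCK) */
};

/* System pages covered by a table: (entries << tce_shift) >> page_shift. */
static long sketch_table_pages(unsigned long entries, unsigned int tce_shift,
			       unsigned int page_shift)
{
	return (long)((entries << tce_shift) >> page_shift);
}

/* Charge npages, refusing if an unprivileged caller would exceed the limit. */
static int sketch_try_charge(struct sketch_lock_account *acct, long npages)
{
	if (acct->locked_pages + npages > acct->limit_pages && !acct->privileged)
		return -ENOMEM;

	acct->locked_pages += npages;
	return 0;
}

/* Uncharge npages, clamping so the counter never goes negative. */
static void sketch_uncharge(struct sketch_lock_account *acct, long npages)
{
	if (npages > acct->locked_pages)
		npages = acct->locked_pages;
	acct->locked_pages -= npages;
}
```

As an example of the whole-table counting described in the updated comment, a 2GB window with 4K TCEs has 524288 entries, which is 32768 pages on a system with 64K pages, regardless of how much of the window the guest actually maps.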

Signed-off-by: Alexey Kardashevskiy 
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 72 +++--
 1 file changed, 53 insertions(+), 19 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index c596053..29d5708 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -29,6 +29,47 @@
 static void tce_iommu_detach_group(void *iommu_data,
struct iommu_group *iommu_group);
 
+#define IOMMU_TABLE_PAGES(tbl) \
+   (((tbl)->it_size << (tbl)->it_page_shift) >> PAGE_SHIFT)
+
+static long try_increment_locked_vm(long npages)
+{
+   long ret = 0, locked, lock_limit;
+
+   if (!current || !current->mm)
+   return -ESRCH; /* process exited */
+
+   down_write(&current->mm->mmap_sem);
+   locked = current->mm->locked_vm + npages;
+   lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+   if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
+   pr_warn("[%d] RLIMIT_MEMLOCK (%ld) exceeded\n",
+   current->pid, rlimit(RLIMIT_MEMLOCK));
+   ret = -ENOMEM;
+   } else {
+   current->mm->locked_vm += npages;
+   }
+   pr_debug("[%d] RLIMIT_MEMLOCK+ %ld pages\n", current->pid,
+   current->mm->locked_vm);
+   up_write(&current->mm->mmap_sem);
+
+   return ret;
+}
+
+static void decrement_locked_vm(long npages)
+{
+   if (!current || !current->mm)
+   return; /* process exited */
+
+   down_write(&current->mm->mmap_sem);
+   if (npages > current->mm->locked_vm)
+   npages = current->mm->locked_vm;
+   current->mm->locked_vm -= npages;
+   pr_debug("[%d] RLIMIT_MEMLOCK- %ld pages\n", current->pid,
+   current->mm->locked_vm);
+   up_write(&current->mm->mmap_sem);
+}
+
 /*
  * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
  *
@@ -66,8 +107,6 @@ static bool tce_check_page_size(struct page *page, unsigned 
page_shift)
 static int tce_iommu_enable(struct tce_container *container)
 {
int ret = 0;
-   unsigned long locked, lock_limit, npages;
-   struct iommu_table *tbl = container->tbl;
 
if (!container->tbl)
return -ENXIO;
@@ -95,21 +134,19 @@ static int tce_iommu_enable(struct tce_container 
*container)
 * Also we don't have a nice way to fail on H_PUT_TCE due to ulimits,
 * that would effectively kill the guest at random points, much better
 * enforcing the limit based on the max that the guest can map.
+*
+* Unfortunately at the moment it counts whole tables, no matter how
+* much memory the guest has. I.e. for 4GB guest and 4 IOMMU groups
+* each with 2GB DMA window, 8GB will be counted here. The reason for
+* this is that we cannot tell here the amount of RAM used by the guest
+* as this information is only available from KVM and VFIO is
+* KVM agnostic.
 */
-   down_write(&current->mm->mmap_sem);
-   npages = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
-   locked = current->mm->locked_vm + npages;
-   lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-   if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
-   pr_warn("RLIMIT_MEMLOCK (%ld) exceeded\n",
-   rlimit(RLIMIT_MEMLOCK));
-   ret = -ENOMEM;
-   } else {
+   ret = try_increment_locked_vm(IOMMU_TABLE_PAGES(container->tbl));
+   if (ret)
+   return ret;
 
-   current->mm->locked_vm += npages;
-   container->enabled = true;
-   }
-   up_write(&current->mm->mmap_sem);
+   container->enabled = true;
 
return ret;
 }
@@ -124,10 +161,7 @@ static void tce_iommu_disable(struct tce_container 
*container)
if (!container->tbl || !current->mm)
return;
 
-   down_write(&current->mm->mmap_sem);
-   current->mm->locked_vm -= (container->tbl->it_size <<
-   container->tbl->it_page_shift) >> PAGE_SHIFT;
-   up_write(&current->mm->mmap_sem);
+   decrement_locked_vm(IOMMU_TABLE_PAGES(container->tbl));
 }
 
 static void *tce_iommu_open(unsigned long arg)
-- 
2.0.0

[PATCH v3 02/24] vfio: powerpc/iommu: Check that TCE page size is equal to it_page_size

2015-01-29 Thread Alexey Kardashevskiy

This checks that the TCE table page size is not bigger than the size of
a page we just pinned and whose physical address we are going to put into the table.

Otherwise the hardware gets unwanted access to physical memory between
the end of the actual page and the end of the aligned up TCE page.

Since compound_order() and compound_head() work correctly on non-huge
pages, there is no need for an additional check of whether the page is huge.
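
As a rough illustration of the check described above, the following
user-space sketch mirrors the shift comparison. The names
sketch_tce_page_fits and SKETCH_PAGE_SHIFT are hypothetical and a 4K base
page is assumed; in the kernel the order would come from
compound_order(compound_head(page)).

```c
#include <stdbool.h>

/* Assumption: 4K base pages; real powerpc kernels often use 64K. */
#define SKETCH_PAGE_SHIFT 12u

/*
 * Hypothetical stand-in for the kernel check. compound_order is 0 for an
 * ordinary page and log2(number of base pages) for a compound page.
 */
static bool sketch_tce_page_fits(unsigned compound_order,
				 unsigned tce_page_shift)
{
	unsigned backing_shift = SKETCH_PAGE_SHIFT + compound_order;

	/* The backing page must be at least as large as one TCE page. */
	return backing_shift >= tce_page_shift;
}
```

Under these assumptions an ordinary 4K page can back a 4K TCE page but not
a 64K one, while an order-4 compound page (64K) can back a 64K TCE page.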

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v5:
* check is done for all page sizes now, not just for huge pages
* failed check returns EFAULT now (was EINVAL)
* moved the check to VFIO SPAPR IOMMU driver
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index dc4a886..99b98fa 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -47,6 +47,22 @@ struct tce_container {
bool enabled;
 };
 
+static bool tce_check_page_size(struct page *page, unsigned page_shift)
+{
+   unsigned shift;
+
+   /*
+* Check that the TCE table granularity is not bigger than the size of
+* a page we just found. Otherwise the hardware can get access to
+* a bigger memory chunk than it should.
+*/
+   shift = PAGE_SHIFT + compound_order(compound_head(page));
+   if (shift >= page_shift)
+   return true;
+
+   return false;
+}
+
 static int tce_iommu_enable(struct tce_container *container)
 {
int ret = 0;
@@ -199,6 +215,12 @@ static long tce_iommu_build(struct tce_container *container,
ret = -EFAULT;
break;
}
+
+   if (!tce_check_page_size(page, tbl->it_page_shift)) {
+   ret = -EFAULT;
+   break;
+   }
+
hva = (unsigned long) page_address(page) +
(tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK);
 
-- 
2.0.0

[PATCH v3 04/24] vfio: powerpc/spapr: Use it_page_size

2015-01-29 Thread Alexey Kardashevskiy

This makes use of the it_page_size from the iommu_table struct
as page size can differ.

This replaces the missing IOMMU_PAGE_SHIFT macro in commented-out debug
code, as the recently introduced IOMMU_PAGE_XXX macros do not include
IOMMU_PAGE_SHIFT.
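
To see why the per-table shift matters for the locked-memory accounting,
here is a minimal user-space sketch of the npages calculation. The struct
and function names are hypothetical; only the it_size and it_page_shift
fields and the shift expression are taken from the patch.

```c
#include <stdint.h>

/* Hypothetical mirror of the two iommu_table fields the patch uses. */
struct sketch_iommu_table {
	uint64_t it_size;       /* number of TCE entries in the window */
	unsigned it_page_shift; /* log2 of the IOMMU page size */
};

/* Number of system pages to account for the whole DMA window. */
static uint64_t sketch_locked_pages(const struct sketch_iommu_table *tbl,
				    unsigned system_page_shift)
{
	return (tbl->it_size << tbl->it_page_shift) >> system_page_shift;
}
```

For a 2GB window built from 64K IOMMU pages (it_size = 32768) on a kernel
with 64K system pages this gives 32768 pages, whereas a hard-coded 4K
shift would account only (32768 << 12) >> 16 = 2048 pages.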

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: David Gibson 
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 99b98fa..c596053 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -97,7 +97,7 @@ static int tce_iommu_enable(struct tce_container *container)
 * enforcing the limit based on the max that the guest can map.
 */
down_write(&current->mm->mmap_sem);
-   npages = (tbl->it_size << IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
+   npages = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
locked = current->mm->locked_vm + npages;
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
@@ -126,7 +126,7 @@ static void tce_iommu_disable(struct tce_container *container)
 
down_write(&current->mm->mmap_sem);
current->mm->locked_vm -= (container->tbl->it_size <<
-   IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
+   container->tbl->it_page_shift) >> PAGE_SHIFT;
up_write(&current->mm->mmap_sem);
 }
 
@@ -232,7 +232,7 @@ static long tce_iommu_build(struct tce_container *container,
tce, ret);
break;
}
-   tce += IOMMU_PAGE_SIZE_4K;
+   tce += IOMMU_PAGE_SIZE(tbl);
}
 
if (ret)
@@ -277,8 +277,8 @@ static long tce_iommu_ioctl(void *iommu_data,
if (info.argsz < minsz)
return -EINVAL;
 
-   info.dma32_window_start = tbl->it_offset << IOMMU_PAGE_SHIFT_4K;
-   info.dma32_window_size = tbl->it_size << IOMMU_PAGE_SHIFT_4K;
+   info.dma32_window_start = tbl->it_offset << tbl->it_page_shift;
+   info.dma32_window_size = tbl->it_size << tbl->it_page_shift;
info.flags = 0;
 
if (copy_to_user((void __user *)arg, &info, minsz))
@@ -308,8 +308,8 @@ static long tce_iommu_ioctl(void *iommu_data,
VFIO_DMA_MAP_FLAG_WRITE))
return -EINVAL;
 
-   if ((param.size & ~IOMMU_PAGE_MASK_4K) ||
-   (param.vaddr & ~IOMMU_PAGE_MASK_4K))
+   if ((param.size & ~IOMMU_PAGE_MASK(tbl)) ||
+   (param.vaddr & ~IOMMU_PAGE_MASK(tbl)))
return -EINVAL;
 
/* iova is checked by the IOMMU API */
@@ -324,8 +324,8 @@ static long tce_iommu_ioctl(void *iommu_data,
return ret;
 
ret = tce_iommu_build(container, tbl,
-   param.iova >> IOMMU_PAGE_SHIFT_4K,
-   tce, param.size >> IOMMU_PAGE_SHIFT_4K);
+   param.iova >> tbl->it_page_shift,
+   tce, param.size >> tbl->it_page_shift);
 
iommu_flush_tce(tbl);
 
@@ -351,17 +351,17 @@ static long tce_iommu_ioctl(void *iommu_data,
if (param.flags)
return -EINVAL;
 
-   if (param.size & ~IOMMU_PAGE_MASK_4K)
+   if (param.size & ~IOMMU_PAGE_MASK(tbl))
return -EINVAL;
 
ret = iommu_tce_clear_param_check(tbl, param.iova, 0,
-   param.size >> IOMMU_PAGE_SHIFT_4K);
+   param.size >> tbl->it_page_shift);
if (ret)
return ret;
 
ret = tce_iommu_clear(container, tbl,
-   param.iova >> IOMMU_PAGE_SHIFT_4K,
-   param.size >> IOMMU_PAGE_SHIFT_4K);
+   param.iova >> tbl->it_page_shift,
+   param.size >> tbl->it_page_shift);
iommu_flush_tce(tbl);
 
return ret;
-- 
2.0.0
