Re: [PATCH v7 0/7] KVMPPC driver to manage secure guest pages

2019-08-22 Thread Paul Mackerras
On Thu, Aug 22, 2019 at 03:56:13PM +0530, Bharata B Rao wrote:
> Hi,
> 
> A pseries guest can be run as a secure guest on Ultravisor-enabled
> POWER platforms. On such platforms, this driver will be used to manage
> the movement of guest pages between the normal memory managed by the
> hypervisor (HV) and the secure memory managed by the Ultravisor (UV).
> 
> Private ZONE_DEVICE memory equal to the amount of secure memory
> available in the platform for running secure guests is created.
> Whenever a page belonging to the guest becomes secure, a page from
> this private device memory is used to represent and track that secure
> page on the HV side. The movement of pages between normal and secure
> memory is done via migrate_vma_pages(). The reverse movement is driven
> via pagemap_ops.migrate_to_ram().
> 
> The page-in or page-out requests from UV will come to HV as hcalls and
> HV will call back into UV via uvcalls to satisfy these page requests.
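The HV-side bookkeeping described above can be pictured with a small userspace model: one private "device PFN" stands in for each guest page that has moved into secure memory, claimed on page-out and released on page-in. This is a toy sketch with illustrative names only; the real driver uses ZONE_DEVICE pages, `migrate_vma_pages()` and `pagemap_ops.migrate_to_ram()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_DEV_PFNS 4                   /* size of the private device-memory pool */

/* Guest PFN backed by each device PFN, or -1 if the slot is free. */
static long dev_pfn_owner[NR_DEV_PFNS];

static void pool_init(void)
{
	for (size_t i = 0; i < NR_DEV_PFNS; i++)
		dev_pfn_owner[i] = -1;
}

/* "Page-out to secure": claim a device PFN to track the guest page on
 * the HV side.  Returns the device PFN index, or -1 if the pool (i.e.
 * the platform's secure memory) is exhausted. */
static long page_out_to_secure(long guest_pfn)
{
	for (size_t i = 0; i < NR_DEV_PFNS; i++) {
		if (dev_pfn_owner[i] == -1) {
			dev_pfn_owner[i] = guest_pfn;
			return (long)i;
		}
	}
	return -1;
}

/* "Page-in to normal" (the migrate_to_ram direction): release the
 * tracking device PFN.  Returns false if the page was not secure. */
static bool page_in_to_normal(long guest_pfn)
{
	for (size_t i = 0; i < NR_DEV_PFNS; i++) {
		if (dev_pfn_owner[i] == guest_pfn) {
			dev_pfn_owner[i] = -1;
			return true;
		}
	}
	return false;
}
```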
> 
> These patches are against hmm.git
> (https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=hmm)
> 
> plus
> 
> Claudio Carvalho's base ultravisor enablement patchset v6
> (https://lore.kernel.org/linuxppc-dev/20190822034838.27876-1-cclau...@linux.ibm.com/T/#t)

How are you thinking these patches will go upstream?  Are you going to
send them via the hmm tree?

I assume you need Claudio's patchset as a prerequisite for your series
to compile, which means the hmm maintainers would need to pull in a
topic branch from Michael Ellerman's powerpc tree, or something like
that.

Paul.


Re: [PATCH kernel] vfio/spapr_tce: Fix incorrect tce_iommu_group memory free

2019-08-22 Thread Paul Mackerras
On Mon, Aug 19, 2019 at 11:51:17AM +1000, Alexey Kardashevskiy wrote:
> The @tcegrp variable is used 1) in a loop over attached groups and
> 2) to store a pointer to a newly allocated tce_iommu_group if 1) found
> nothing. However, the error handler does not distinguish how we got
> there and incorrectly releases memory for a found-but-incompatible
> group.
> 
> Fix this by adding another error handling case.
> 
> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> Signed-off-by: Alexey Kardashevskiy 

Good catch.  This is potentially nasty since it is a double free.
Alex, are you going to take this, or would you prefer it goes via
Michael Ellerman's tree?

Reviewed-by: Paul Mackerras 
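The hazard is easy to reproduce in miniature: an error path must only free the group when this function allocated it, not when it merely found one owned elsewhere. The sketch below is hypothetical and simplified, not the actual vfio/spapr_tce code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct group { int id; };

/* A pre-existing group, owned by the container, not by attach(). */
static struct group existing = { .id = 1 };

/* Returns 0 on success, -1 on error; *freed reports whether the error
 * path released memory. */
static int attach(int id, bool compatible, bool *freed)
{
	struct group *grp = NULL;
	bool allocated = false;

	*freed = false;
	if (id == existing.id)
		grp = &existing;            /* case 1: found, owned elsewhere */
	if (!grp) {
		grp = malloc(sizeof(*grp)); /* case 2: newly allocated */
		if (!grp)
			return -1;
		grp->id = id;
		allocated = true;
	}

	if (!compatible) {
		/* The buggy version freed unconditionally here, which
		 * double-frees a found group.  The fix distinguishes the
		 * two cases: free only what we allocated. */
		if (allocated) {
			free(grp);
			*freed = true;
		}
		return -1;
	}

	if (allocated)
		free(grp);                  /* toy model: keep no state around */
	return 0;
}
```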


RE: [PATCH v2 06/10] PCI: layerscape: Modify the way of getting capability with different PEX

2019-08-22 Thread Xiaowei Bao


> -Original Message-
> From: Kishon Vijay Abraham I 
> Sent: August 23, 2019 11:40
> To: Xiaowei Bao ; bhelg...@google.com;
> robh...@kernel.org; mark.rutl...@arm.com; shawn...@kernel.org; Leo Li
> ; lorenzo.pieral...@arm.co
> ; a...@arndb.de; gre...@linuxfoundation.org;
> M.h. Lian ; Mingkai Hu ;
> Roy Zang ; jingooh...@gmail.com;
> gustavo.pimen...@synopsys.com; linux-...@vger.kernel.org;
> devicet...@vger.kernel.org; linux-ker...@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org;
> andrew.mur...@arm.com
> Subject: Re: [PATCH v2 06/10] PCI: layerscape: Modify the way of getting
> capability with different PEX
> 
> Hi,
> 
> (Fixed Lorenzo's email address. All the patches in the series have wrong email
> id)
> 
> On 23/08/19 8:09 AM, Xiaowei Bao wrote:
> >
> >
> >> -Original Message-
> >> From: Kishon Vijay Abraham I 
> >> Sent: August 22, 2019 19:44
> >> To: Xiaowei Bao ; bhelg...@google.com;
> >> robh...@kernel.org; mark.rutl...@arm.com; shawn...@kernel.org; Leo
> Li
> >> ; lorenzo.pieral...@arm.co; a...@arndb.de;
> >> gre...@linuxfoundation.org; M.h. Lian ;
> >> Mingkai Hu ; Roy Zang ;
> >> jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> >> linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> >> linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> >> linuxppc-dev@lists.ozlabs.org; andrew.mur...@arm.com
> >> Subject: Re: [PATCH v2 06/10] PCI: layerscape: Modify the way of
> >> getting capability with different PEX
> >>
> >> Hi,
> >>
> >> On 22/08/19 4:52 PM, Xiaowei Bao wrote:
> >>> Different PCIe controllers on one board may have different MSI or
> >>> MSI-X capabilities, so change the way the MSI capability is
> >>> retrieved to make it more flexible.
> >>
> >> please use different pci_epc_features table for different boards.
> > Thanks, but I think it is more flexible to get the MSI or MSI-X
> > capability dynamically; thus we will not need to define the
> > pci_epc_features for different boards.
> 
> Is the restriction because you cannot have different compatible for different
> boards?
Sorry, I am not very clear on what you mean. I think even if I use the
same compatible for different boards, each board will enter the probe
function, where I will get the MSI or MSI-X PCIe capability of the
current controller on that board. Why do I need to define the
pci_epc_features for different boards?
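The approach being argued for here can be modeled in a few lines: derive the MSI/MSI-X support flags per controller instance from the discovered capability offsets, instead of keeping one hard-coded `pci_epc_features` table per board. The names below are illustrative only, not the driver's actual types.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_ep {
	unsigned int msi_cap;   /* capability offset in config space, 0 if absent */
	unsigned int msix_cap;
};

struct toy_features {
	bool msi_capable;
	bool msix_capable;
};

/* Fill the features from what was actually found on this controller,
 * mirroring the "ep->msi_cap ? true : false" assignments in the patch. */
static struct toy_features probe_features(const struct toy_ep *ep)
{
	struct toy_features f = {
		.msi_capable  = ep->msi_cap  != 0,
		.msix_capable = ep->msix_cap != 0,
	};
	return f;
}
```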
> 
> Thanks
> Kishon
> 
> >>
> >> Thanks
> >> Kishon
> >>>
> >>> Signed-off-by: Xiaowei Bao 
> >>> ---
> >>> v2:
> >>>  - Remove the repeated assignment code.
> >>>
> >>>  drivers/pci/controller/dwc/pci-layerscape-ep.c | 26
> >>> +++---
> >>>  1 file changed, 19 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> >>> b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> >>> index 4e92a95..8461f62 100644
> >>> --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> >>> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> >>> @@ -22,6 +22,7 @@
> >>>
> >>>  struct ls_pcie_ep {
> >>>   struct dw_pcie  *pci;
> >>> + struct pci_epc_features *ls_epc;
> >>>  };
> >>>
> >>>  #define to_ls_pcie_ep(x) dev_get_drvdata((x)->dev)
> >>> @@ -40,25 +41,26 @@ static const struct of_device_id
> >> ls_pcie_ep_of_match[] = {
> >>>   { },
> >>>  };
> >>>
> >>> -static const struct pci_epc_features ls_pcie_epc_features = {
> >>> - .linkup_notifier = false,
> >>> - .msi_capable = true,
> >>> - .msix_capable = false,
> >>> -};
> >>> -
> >>>  static const struct pci_epc_features*
> >>> ls_pcie_ep_get_features(struct dw_pcie_ep *ep)  {
> >>> - return &ls_pcie_epc_features;
> >>> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> >>> + struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
> >>> +
> >>> + return pcie->ls_epc;
> >>>  }
> >>>
> >>>  static void ls_pcie_ep_init(struct dw_pcie_ep *ep)  {
> >>>   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> >>> + struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
> >>>   enum pci_barno bar;
> >>>
> >>>   for (bar = BAR_0; bar <= BAR_5; bar++)
> >>>   dw_pcie_ep_reset_bar(pci, bar);
> >>> +
> >>> + pcie->ls_epc->msi_capable = ep->msi_cap ? true : false;
> >>> + pcie->ls_epc->msix_capable = ep->msix_cap ? true : false;
> >>>  }
> >>>
> >>>  static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> >>> @@
> >>> -118,6 +120,7 @@ static int __init ls_pcie_ep_probe(struct
> >>> platform_device
> >> *pdev)
> >>>   struct device *dev = &pdev->dev;
> >>>   struct dw_pcie *pci;
> >>>   struct ls_pcie_ep *pcie;
> >>> + struct pci_epc_features *ls_epc;
> >>>   struct resource *dbi_base;
> >>>   int ret;
> >>>
> >>> @@ -129,6 +132,10 @@ static int __init ls_pcie_ep_probe(struct
> >> platform_device *pdev)
> >>>   if (!pci)
> >>>   return -ENOMEM;
> >>>
> >>> + ls_epc = devm_kzalloc(dev, sizeof(*ls_epc), GFP_KERNEL);
> >>> + if (!ls_epc)
> >>> + return -ENOMEM;
> >>> +
> >>>   dbi_base = platform_get_resource_byname(pdev,
> IORESOURCE_MEM,
> 

Re: [PATCH v3 1/2] powerpc/powernv: Enhance opal message read interface

2019-08-22 Thread Vasant Hegde

On 8/22/19 9:06 PM, Vasant Hegde wrote:

On 8/22/19 11:21 AM, Oliver O'Halloran wrote:

On Wed, 2019-08-21 at 13:43 +0530, Vasant Hegde wrote:

Use "opal-msg-size" device tree property to allocate memory for "opal_msg".

Cc: Mahesh Salgaonkar 
Cc: Jeremy Kerr 
Signed-off-by: Vasant Hegde 
---
Changes in v3:
   - Call BUG_ON, if we fail to allocate memory during init.

-Vasant

  arch/powerpc/platforms/powernv/opal.c | 29 ++-
  1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c

index aba443be7daa..4f1f68f568bf 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -58,6 +58,8 @@ static DEFINE_SPINLOCK(opal_write_lock);
  static struct atomic_notifier_head opal_msg_notifier_head[OPAL_MSG_TYPE_MAX];
  static uint32_t opal_heartbeat;
  static struct task_struct *kopald_tsk;
+static struct opal_msg *opal_msg;
+static uint64_t opal_msg_size;
  void opal_configure_cores(void)
  {
@@ -271,14 +273,9 @@ static void opal_message_do_notify(uint32_t msg_type, 
void *msg)

  static void opal_handle_message(void)
  {
  s64 ret;
-    /*
- * TODO: pre-allocate a message buffer depending on opal-msg-size
- * value in /proc/device-tree.
- */
-    static struct opal_msg msg;
  u32 type;
-    ret = opal_get_msg(__pa(&msg), sizeof(msg));
+    ret = opal_get_msg(__pa(opal_msg), opal_msg_size);
  /* No opal message pending. */
  if (ret == OPAL_RESOURCE)
  return;
@@ -290,14 +287,14 @@ static void opal_handle_message(void)
  return;
  }
-    type = be32_to_cpu(msg.msg_type);
+    type = be32_to_cpu(opal_msg->msg_type);
  /* Sanity check */
  if (type >= OPAL_MSG_TYPE_MAX) {
  pr_warn_once("%s: Unknown message type: %u\n", __func__, type);
  return;
  }
-    opal_message_do_notify(type, (void *)&msg);
+    opal_message_do_notify(type, (void *)opal_msg);
  }
  static irqreturn_t opal_message_notify(int irq, void *data)
@@ -306,9 +303,21 @@ static irqreturn_t opal_message_notify(int irq, void *data)
  return IRQ_HANDLED;
  }
-static int __init opal_message_init(void)
+static int __init opal_message_init(struct device_node *opal_node)
  {
  int ret, i, irq;



+    const __be32 *val;
+
+    val = of_get_property(opal_node, "opal-msg-size", NULL);
+    if (val)
+    opal_msg_size = be32_to_cpup(val);


Use of_property_read_u32()


Yes. Will fix it.




+
+    /* If opal-msg-size property is not available then use default size */
+    if (!opal_msg_size)
+    opal_msg_size = sizeof(struct opal_msg);
+
+    opal_msg = kmalloc(opal_msg_size, GFP_KERNEL);



+    BUG_ON(opal_msg == NULL);


Seems questionable. Why not fall back to using a statically allocated
struct opal_msg? Or re-try the allocation with the size limited to
sizeof(struct opal_msg)?


If we are not able to allocate memory during init then we have a bigger
problem and there is no point in continuing; hence the BUG_ON(). Maybe
I can retry the allocation with the fixed size before calling BUG_ON().
How about something like below:

+	/* If opal-msg-size property is not available then use default size */
+	if (!opal_msg_size)
+		opal_msg_size = sizeof(struct opal_msg);
+
+	opal_msg = kmalloc(opal_msg_size, GFP_KERNEL);
+	if (!opal_msg) {
+		opal_msg = kmalloc(sizeof(struct opal_msg), GFP_KERNEL);
+		BUG_ON(opal_msg == NULL);
+	}

Just to be clear, I will adjust `opal_msg_size` here as well.

-Vasant





Re: [PATCH v6 7/7] powerpc/kvm: Use UV_RETURN ucall to return to ultravisor

2019-08-22 Thread Paul Mackerras
On Thu, Aug 22, 2019 at 12:48:38AM -0300, Claudio Carvalho wrote:
> From: Sukadev Bhattiprolu 
> 
> When an SVM makes a hypercall or incurs some other exception, the
> Ultravisor usually forwards (a.k.a. reflects) the exception to the
> Hypervisor. After processing the exception, Hypervisor uses the
> UV_RETURN ultracall to return control back to the SVM.
> 
> The expected register state on entry to this ultracall is:
> 
> * Non-volatile registers are restored to their original values.
> * If returning from a hypercall, register R0 contains the return value
>   (unlike other ultracalls) and, registers R4 through R12 contain any
>   output values of the hypercall.
> * R3 contains the ultracall number, i.e UV_RETURN.
> * If returning with a synthesized interrupt, R2 contains the
>   synthesized interrupt number.

This isn't accurate: R2 contains the value that should end up in SRR1
when we are back in the secure guest.  HSRR0 and HSRR1 contain the
instruction pointer and MSR that the guest should run with.  They may
be different from the instruction pointer and MSR that the guest vCPU
last had, if the hypervisor has synthesized an interrupt for the
guest.  The ultravisor needs to detect this case and respond
appropriately.

> Thanks to input from Paul Mackerras, Ram Pai and Mike Anderson.
> 
> Signed-off-by: Sukadev Bhattiprolu 
> Signed-off-by: Claudio Carvalho 

Apart from that comment on the patch description -

Acked-by: Paul Mackerras 


Re: [PATCH v2 06/10] PCI: layerscape: Modify the way of getting capability with different PEX

2019-08-22 Thread Kishon Vijay Abraham I
Hi,

(Fixed Lorenzo's email address. All the patches in the series have wrong email 
id)

On 23/08/19 8:09 AM, Xiaowei Bao wrote:
> 
> 
>> -Original Message-
>> From: Kishon Vijay Abraham I 
>> Sent: August 22, 2019 19:44
>> To: Xiaowei Bao ; bhelg...@google.com;
>> robh...@kernel.org; mark.rutl...@arm.com; shawn...@kernel.org; Leo Li
>> ; lorenzo.pieral...@arm.co; a...@arndb.de;
>> gre...@linuxfoundation.org; M.h. Lian ; Mingkai
>> Hu ; Roy Zang ;
>> jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
>> linux-...@vger.kernel.org; devicet...@vger.kernel.org;
>> linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
>> linuxppc-dev@lists.ozlabs.org; andrew.mur...@arm.com
>> Subject: Re: [PATCH v2 06/10] PCI: layerscape: Modify the way of getting
>> capability with different PEX
>>
>> Hi,
>>
>> On 22/08/19 4:52 PM, Xiaowei Bao wrote:
>>> Different PCIe controllers on one board may have different MSI or
>>> MSI-X capabilities, so change the way the MSI capability is
>>> retrieved to make it more flexible.
>>
>> please use different pci_epc_features table for different boards.
> Thanks, but I think it is more flexible to get the MSI or MSI-X
> capability dynamically; thus we will not need to define the
> pci_epc_features for different boards.

Is the restriction because you cannot have different compatible for different
boards?

Thanks
Kishon

>>
>> Thanks
>> Kishon
>>>
>>> Signed-off-by: Xiaowei Bao 
>>> ---
>>> v2:
>>>  - Remove the repeated assignment code.
>>>
>>>  drivers/pci/controller/dwc/pci-layerscape-ep.c | 26
>>> +++---
>>>  1 file changed, 19 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c
>>> b/drivers/pci/controller/dwc/pci-layerscape-ep.c
>>> index 4e92a95..8461f62 100644
>>> --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
>>> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
>>> @@ -22,6 +22,7 @@
>>>
>>>  struct ls_pcie_ep {
>>> struct dw_pcie  *pci;
>>> +   struct pci_epc_features *ls_epc;
>>>  };
>>>
>>>  #define to_ls_pcie_ep(x)   dev_get_drvdata((x)->dev)
>>> @@ -40,25 +41,26 @@ static const struct of_device_id
>> ls_pcie_ep_of_match[] = {
>>> { },
>>>  };
>>>
>>> -static const struct pci_epc_features ls_pcie_epc_features = {
>>> -   .linkup_notifier = false,
>>> -   .msi_capable = true,
>>> -   .msix_capable = false,
>>> -};
>>> -
>>>  static const struct pci_epc_features*  ls_pcie_ep_get_features(struct
>>> dw_pcie_ep *ep)  {
>>> -   return &ls_pcie_epc_features;
>>> +   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>>> +   struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
>>> +
>>> +   return pcie->ls_epc;
>>>  }
>>>
>>>  static void ls_pcie_ep_init(struct dw_pcie_ep *ep)  {
>>> struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>>> +   struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
>>> enum pci_barno bar;
>>>
>>> for (bar = BAR_0; bar <= BAR_5; bar++)
>>> dw_pcie_ep_reset_bar(pci, bar);
>>> +
>>> +   pcie->ls_epc->msi_capable = ep->msi_cap ? true : false;
>>> +   pcie->ls_epc->msix_capable = ep->msix_cap ? true : false;
>>>  }
>>>
>>>  static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no, @@
>>> -118,6 +120,7 @@ static int __init ls_pcie_ep_probe(struct platform_device
>> *pdev)
>>> 	struct device *dev = &pdev->dev;
>>> struct dw_pcie *pci;
>>> struct ls_pcie_ep *pcie;
>>> +   struct pci_epc_features *ls_epc;
>>> struct resource *dbi_base;
>>> int ret;
>>>
>>> @@ -129,6 +132,10 @@ static int __init ls_pcie_ep_probe(struct
>> platform_device *pdev)
>>> if (!pci)
>>> return -ENOMEM;
>>>
>>> +   ls_epc = devm_kzalloc(dev, sizeof(*ls_epc), GFP_KERNEL);
>>> +   if (!ls_epc)
>>> +   return -ENOMEM;
>>> +
>>> dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM,
>> "regs");
>>> pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base);
>>> if (IS_ERR(pci->dbi_base))
>>> @@ -139,6 +146,11 @@ static int __init ls_pcie_ep_probe(struct
>> platform_device *pdev)
>>> 	pci->ops = &ls_pcie_ep_ops;
>>> pcie->pci = pci;
>>>
>>> +   ls_epc->linkup_notifier = false,
>>> +   ls_epc->bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4),
>>> +
>>> +   pcie->ls_epc = ls_epc;
>>> +
>>> platform_set_drvdata(pdev, pcie);
>>>
>>> ret = ls_add_pcie_ep(pcie, pdev);
>>>


RE: [PATCH v2 06/10] PCI: layerscape: Modify the way of getting capability with different PEX

2019-08-22 Thread Xiaowei Bao


> -Original Message-
> From: Kishon Vijay Abraham I 
> Sent: August 22, 2019 19:44
> To: Xiaowei Bao ; bhelg...@google.com;
> robh...@kernel.org; mark.rutl...@arm.com; shawn...@kernel.org; Leo Li
> ; lorenzo.pieral...@arm.co; a...@arndb.de;
> gre...@linuxfoundation.org; M.h. Lian ; Mingkai
> Hu ; Roy Zang ;
> jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> linuxppc-dev@lists.ozlabs.org; andrew.mur...@arm.com
> Subject: Re: [PATCH v2 06/10] PCI: layerscape: Modify the way of getting
> capability with different PEX
> 
> Hi,
> 
> On 22/08/19 4:52 PM, Xiaowei Bao wrote:
> > Different PCIe controllers on one board may have different MSI or
> > MSI-X capabilities, so change the way the MSI capability is
> > retrieved to make it more flexible.
> 
> please use different pci_epc_features table for different boards.
Thanks, but I think it is more flexible to get the MSI or MSI-X
capability dynamically; thus we will not need to define the
pci_epc_features for different boards.
> 
> Thanks
> Kishon
> >
> > Signed-off-by: Xiaowei Bao 
> > ---
> > v2:
> >  - Remove the repeated assignment code.
> >
> >  drivers/pci/controller/dwc/pci-layerscape-ep.c | 26
> > +++---
> >  1 file changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > index 4e92a95..8461f62 100644
> > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > @@ -22,6 +22,7 @@
> >
> >  struct ls_pcie_ep {
> > struct dw_pcie  *pci;
> > +   struct pci_epc_features *ls_epc;
> >  };
> >
> >  #define to_ls_pcie_ep(x)   dev_get_drvdata((x)->dev)
> > @@ -40,25 +41,26 @@ static const struct of_device_id
> ls_pcie_ep_of_match[] = {
> > { },
> >  };
> >
> > -static const struct pci_epc_features ls_pcie_epc_features = {
> > -   .linkup_notifier = false,
> > -   .msi_capable = true,
> > -   .msix_capable = false,
> > -};
> > -
> >  static const struct pci_epc_features*  ls_pcie_ep_get_features(struct
> > dw_pcie_ep *ep)  {
> > -   return &ls_pcie_epc_features;
> > +   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > +   struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
> > +
> > +   return pcie->ls_epc;
> >  }
> >
> >  static void ls_pcie_ep_init(struct dw_pcie_ep *ep)  {
> > struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > +   struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
> > enum pci_barno bar;
> >
> > for (bar = BAR_0; bar <= BAR_5; bar++)
> > dw_pcie_ep_reset_bar(pci, bar);
> > +
> > +   pcie->ls_epc->msi_capable = ep->msi_cap ? true : false;
> > +   pcie->ls_epc->msix_capable = ep->msix_cap ? true : false;
> >  }
> >
> >  static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no, @@
> > -118,6 +120,7 @@ static int __init ls_pcie_ep_probe(struct platform_device
> *pdev)
> > 	struct device *dev = &pdev->dev;
> > struct dw_pcie *pci;
> > struct ls_pcie_ep *pcie;
> > +   struct pci_epc_features *ls_epc;
> > struct resource *dbi_base;
> > int ret;
> >
> > @@ -129,6 +132,10 @@ static int __init ls_pcie_ep_probe(struct
> platform_device *pdev)
> > if (!pci)
> > return -ENOMEM;
> >
> > +   ls_epc = devm_kzalloc(dev, sizeof(*ls_epc), GFP_KERNEL);
> > +   if (!ls_epc)
> > +   return -ENOMEM;
> > +
> > dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM,
> "regs");
> > pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base);
> > if (IS_ERR(pci->dbi_base))
> > @@ -139,6 +146,11 @@ static int __init ls_pcie_ep_probe(struct
> platform_device *pdev)
> > 	pci->ops = &ls_pcie_ep_ops;
> > pcie->pci = pci;
> >
> > +   ls_epc->linkup_notifier = false,
> > +   ls_epc->bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4),
> > +
> > +   pcie->ls_epc = ls_epc;
> > +
> > platform_set_drvdata(pdev, pcie);
> >
> > ret = ls_add_pcie_ep(pcie, pdev);
> >


Re: [PATCH 2/3] powerpc/numa: Early request for home node associativity

2019-08-22 Thread Nathan Lynch
Srikar Dronamraju  writes:
> * Nathan Lynch  [2019-08-22 12:17:48]:
>> > However home node associativity requires cpu's hwid which is set in
>> > smp_setup_pacas. Hence call smp_setup_pacas before numa_setup_cpus.
>> 
>> But this seems like it would negatively affect pacas' NUMA placements?
>> 
>> Would it be less risky to figure out a way to do "early" VPHN hcalls
>> before mem_topology_setup, getting the hwids from the cpu_to_phys_id
>> array perhaps?
>> 
>
> Do you mean for calls from mem_topology_setup(), stuff we use cpu_to_phys_id
> but for the calls from ppc_numa_cpu_prepare() we use the
> get_hard_smp_processor_id()?

Yes, something like that, I think. Although numa_setup_cpu() is used in
both contexts.


Re: [PATCH 2/3] powerpc/numa: Early request for home node associativity

2019-08-22 Thread Srikar Dronamraju
* Nathan Lynch  [2019-08-22 12:17:48]:

> Hi Srikar,

Thanks Nathan for the review.

> 
> > However home node associativity requires cpu's hwid which is set in
> > smp_setup_pacas. Hence call smp_setup_pacas before numa_setup_cpus.
> 
> But this seems like it would negatively affect pacas' NUMA placements?
> 
> Would it be less risky to figure out a way to do "early" VPHN hcalls
> before mem_topology_setup, getting the hwids from the cpu_to_phys_id
> array perhaps?
> 

Do you mean for calls from mem_topology_setup(), stuff we use cpu_to_phys_id
but for the calls from ppc_numa_cpu_prepare() we use the
get_hard_smp_processor_id()?

Thats doable.

> 
> > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > index 88b5157..7965d3b 100644
> > --- a/arch/powerpc/mm/numa.c
> > +++ b/arch/powerpc/mm/numa.c
> > @@ -461,6 +461,21 @@ static int of_drconf_to_nid_single(struct drmem_lmb 
> > *lmb)
> > return nid;
> >  }
> >  
> > +static int vphn_get_nid(unsigned long cpu)
> > +{
> > +   __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
> > +   long rc;
> > +
> > +   /* Use associativity from first thread for all siblings */
> 
> I don't understand how this comment corresponds to the code it
> accompanies.


Okay will rephrase
> 
> 
> > +   rc = hcall_vphn(get_hard_smp_processor_id(cpu),
> > +   VPHN_FLAG_VCPU, associativity);
> > +
> > +   if (rc == H_SUCCESS)
> > +   return  associativity_to_nid(associativity);
>   ^^ extra space
> 
> > @@ -490,7 +505,18 @@ static int numa_setup_cpu(unsigned long lcpu)
> > goto out;
> > }
> >  
> > -   nid = of_node_to_nid_single(cpu);
> > +   /*
> > +* On a shared lpar, the device tree might not have the correct node
> > +* associativity.  At this time lppaca, or its __old_status field
> 
> Sorry but I'm going to quibble with this phrasing a bit. On SPLPAR the
> CPU nodes have no affinity information in the device tree at all. This
> comment implies that they may have incorrect information, which is
> AFAIK not the case.
> 

Okay will clarify.

-- 
Thanks and Regards
Srikar Dronamraju



[PATCH] powerpc/64: don't select ARCH_HAS_SCALED_CPUTIME on book3E

2019-08-22 Thread Christophe Leroy
Book3E doesn't have SPRN_SPURR/SPRN_PURR.

Activating ARCH_HAS_SCALED_CPUTIME is just wasting CPU time.

Signed-off-by: Christophe Leroy 
Link: https://github.com/linuxppc/issues/issues/171
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 0c322abb1495..6137f5e3bb2d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -132,7 +132,7 @@ config PPC
select ARCH_HAS_PTE_DEVMAP  if PPC_BOOK3S_64
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_MEMBARRIER_CALLBACKS
-   select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE 
&& PPC64
+   select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE 
&& PPC_BOOK3S_64
select ARCH_HAS_STRICT_KERNEL_RWX   if ((PPC_BOOK3S_64 || PPC32) && 
!RELOCATABLE && !HIBERNATION)
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UACCESS_FLUSHCACHE
-- 
2.13.3



Re: [PATCH 2/3] powerpc/numa: Early request for home node associativity

2019-08-22 Thread Nathan Lynch
Hi Srikar,

Srikar Dronamraju  writes:
> Currently the kernel detects if it's running on a shared LPAR platform
> and requests home node associativity before the scheduler sched_domains
> are set up. However, between the time NUMA setup is initialized and the
> request for home node associativity, the workqueue initializes its
> per-node cpumask. The per-node workqueue possible cpumask may turn
> invalid after home node associativity, resulting in weird situations
> like the workqueue possible cpumask being a subset of the workqueue
> online cpumask.
>
> This can be fixed by requesting home node associativity earlier, just
> before NUMA setup. However, at NUMA setup time the kernel may not be in
> a position to detect if it's running on a shared LPAR platform, so
> request home node associativity and, if the request fails, fall back
> on the device tree property.

I think this is generally sound at the conceptual level.


> However home node associativity requires cpu's hwid which is set in
> smp_setup_pacas. Hence call smp_setup_pacas before numa_setup_cpus.

But this seems like it would negatively affect pacas' NUMA placements?

Would it be less risky to figure out a way to do "early" VPHN hcalls
before mem_topology_setup, getting the hwids from the cpu_to_phys_id
array perhaps?


> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 88b5157..7965d3b 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -461,6 +461,21 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
>   return nid;
>  }
>  
> +static int vphn_get_nid(unsigned long cpu)
> +{
> + __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
> + long rc;
> +
> + /* Use associativity from first thread for all siblings */

I don't understand how this comment corresponds to the code it
accompanies.


> + rc = hcall_vphn(get_hard_smp_processor_id(cpu),
> + VPHN_FLAG_VCPU, associativity);
> +
> + if (rc == H_SUCCESS)
> + return  associativity_to_nid(associativity);
  ^^ extra space

> @@ -490,7 +505,18 @@ static int numa_setup_cpu(unsigned long lcpu)
>   goto out;
>   }
>  
> - nid = of_node_to_nid_single(cpu);
> + /*
> +  * On a shared lpar, the device tree might not have the correct node
> +  * associativity.  At this time lppaca, or its __old_status field

Sorry but I'm going to quibble with this phrasing a bit. On SPLPAR the
CPU nodes have no affinity information in the device tree at all. This
comment implies that they may have incorrect information, which is
AFAIK not the case.



Re: [PATCH v2 3/8] powerpc: Fix vDSO clock_getres()

2019-08-22 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: a7f290dad32e [PATCH] powerpc: Merge vdso's and add vdso support 
to 32 bits kernel.

The bot has tested the following trees: v5.2.9, v4.19.67, v4.14.139, v4.9.189, 
v4.4.189.

v5.2.9: Build OK!
v4.19.67: Build OK!
v4.14.139: Failed to apply! Possible dependencies:
5c929885f1bb ("powerpc/vdso64: Add support for 
CLOCK_{REALTIME/MONOTONIC}_COARSE")
b5b4453e7912 ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across 
Y2038")

v4.9.189: Failed to apply! Possible dependencies:
454656155110 ("powerpc/asm: Use OFFSET macro in asm-offsets.c")
5c929885f1bb ("powerpc/vdso64: Add support for 
CLOCK_{REALTIME/MONOTONIC}_COARSE")
5d451a87e5eb ("powerpc/64: Retrieve number of L1 cache sets from 
device-tree")
7c5b06cadf27 ("KVM: PPC: Book3S HV: Adapt TLB invalidations to work on 
POWER9")
83677f551e0a ("KVM: PPC: Book3S HV: Adjust host/guest context switch for 
POWER9")
902e06eb86cd ("powerpc/32: Change the stack protector canary value per 
task")
b5b4453e7912 ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across 
Y2038")
bd067f83b084 ("powerpc/64: Fix naming of cache block vs. cache line")
e2827fe5c156 ("powerpc/64: Clean up ppc64_caches using a struct per cache")
e9cf1e085647 ("KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRs")
f4c51f841d2a ("KVM: PPC: Book3S HV: Modify guest entry/exit paths to handle 
radix guests")

v4.4.189: Failed to apply! Possible dependencies:
153086644fd1 ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
3eb5d5888dc6 ("powerpc: Add ppc_strict_facility_enable boot option")
454656155110 ("powerpc/asm: Use OFFSET macro in asm-offsets.c")
579e633e764e ("powerpc: create flush_all_to_thread()")
5c929885f1bb ("powerpc/vdso64: Add support for 
CLOCK_{REALTIME/MONOTONIC}_COARSE")
70fe3d980f5f ("powerpc: Restore FPU/VEC/VSX if previously used")
85baa095497f ("powerpc/livepatch: Add live patching support on ppc64le")
902e06eb86cd ("powerpc/32: Change the stack protector canary value per 
task")
b5b4453e7912 ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across 
Y2038")
bf76f73c5f65 ("powerpc: enable UBSAN support")
c208505900b2 ("powerpc: create giveup_all()")
d1e1cf2e38de ("powerpc: clean up asm/switch_to.h")
dc4fbba11e46 ("powerpc: Create disable_kernel_{fp,altivec,vsx,spe}()")
f17c4e01e906 ("powerpc/module: Mark module stubs with a magic value")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

--
Thanks,
Sasha


Re: next take at setting up a dma mask by default for platform devices v2

2019-08-22 Thread Greg Kroah-Hartman
On Fri, Aug 16, 2019 at 08:24:29AM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this is another attempt to make sure the dma_mask pointer is always
> initialized for platform devices.  Not doing so lead to lots of
> boilerplate code, and makes platform devices different from all our
> major busses like PCI where we always set up a dma_mask.  In the long
> run this should also help to eventually make dma_mask a scalar value
> instead of a pointer and remove even more cruft.
> 
> The bigger blocker for this last time was the fact that the usb
> subsystem uses the presence or lack of a dma_mask to check if the core
> should do dma mapping for the driver, which is highly unusual.  So we
> fix this first.  Note that this has some overlap with the pending
> desire to use the proper dma_mmap_coherent helper for mapping usb
> buffers.  The first two patches have already been queued up by Greg
> and are only included for completeness.

Note to everyone.  The first two patches in this series is already in
5.3-rc5.

I've applied the rest of the series to my usb-next branch (with the 6th
patch landing there later today.)  They are scheduled to be merge to
Linus in 5.4-rc1.

Christoph, thanks so much for these cleanups.

greg k-h


Re: [PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device()

2019-08-22 Thread Marta Rybczynska



- On 16 Aug, 2019, at 18:50, Sergey Miroshnichenko 
s.miroshniche...@yadro.com wrote:

> This is a yet another approach to fix an old [1-2] concurrency issue, when:
> - two or more devices are being hot-added into a bridge which was
>   initially empty;
> - a bridge with two or more devices is being hot-added;
> - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.
> 
> The problem is that a bridge is reported as enabled before the MEM/IO bits
> are actually written to the PCI_COMMAND register, so another driver thread
> starts memory requests through the not-yet-enabled bridge:
> 
> CPU0CPU1
> 
> pci_enable_device_mem() pci_enable_device_mem()
>   pci_enable_bridge() pci_enable_bridge()
> pci_is_enabled()
>   return false;
> atomic_inc_return(enable_cnt)
> Start actual enabling the bridge
> ... pci_is_enabled()
> ...   return true;
> ... Start memory requests <-- FAIL
> ...
> Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
> 
> Protect the pci_enable/disable_device() and pci_enable_bridge(), which is
> similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
> while enabling upstream bridges"), but adding per-device mutexes and
> preventing dev->enable_cnt from incrementing early.
> 
> CC: Srinath Mannam 
> CC: Marta Rybczynska 
> Signed-off-by: Sergey Miroshnichenko 
> 
> [1]
> https://lore.kernel.org/linux-pci/1501858648-8-1-git-send-email-srinath.man...@broadcom.com/T/#u
>[RFC PATCH v3] pci: Concurrency issue during pci enable bridge
> 
> [2]
> https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.javamail.zim...@kalray.eu/T/#u
>[RFC PATCH] nvme: avoid race-conditions when enabling devices
> ---
> drivers/pci/pci.c   | 26 ++
> drivers/pci/probe.c |  1 +
> include/linux/pci.h |  1 +
> 3 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 1b27b5af3d55..e7f8c354e644 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1645,6 +1645,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
>   struct pci_dev *bridge;
>   int retval;
> 
> + mutex_lock(&dev->enable_mutex);
> +
>   bridge = pci_upstream_bridge(dev);
>   if (bridge)
>   pci_enable_bridge(bridge);
> @@ -1652,6 +1654,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
>   if (pci_is_enabled(dev)) {
>   if (!dev->is_busmaster)
>   pci_set_master(dev);
> + mutex_unlock(&dev->enable_mutex);
>   return;
>   }
> 

This code is used by numerous drivers, and when we saw that issue I was
wondering whether there are use-cases where this (or pci_disable_device)
is called with interrupts disabled. It seems that it shouldn't be, but a
BUG_ON or an error when someone calls it that way would be helpful when
debugging.

Marta


Re: [PATCH 1/3] powerpc/vphn: Check for error from hcall_vphn

2019-08-22 Thread Nathan Lynch
Hi Srikar,

Srikar Dronamraju  writes:
> There is no point in unpacking associativity if the
> H_HOME_NODE_ASSOCIATIVITY hcall has returned an error.
>
> Also added error messages for H_PARAMETER and default case in
> vphn_get_associativity.

These are two logical changes and should be separated IMO.


> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 50d68d2..88b5157 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1191,6 +1191,10 @@ static long vphn_get_associativity(unsigned long cpu,
>   VPHN_FLAG_VCPU, associativity);
>  
>   switch (rc) {
> + case H_SUCCESS:
> + dbg("VPHN hcall succeeded. Reset polling...\n");
> + timed_topology_update(0);
> + break;
>   case H_FUNCTION:
>   printk_once(KERN_INFO
>   "VPHN is not supported. Disabling polling...\n");
> @@ -1202,9 +1206,15 @@ static long vphn_get_associativity(unsigned long cpu,
>   "preventing VPHN. Disabling polling...\n");
>   stop_topology_update();
>   break;
> - case H_SUCCESS:
> - dbg("VPHN hcall succeeded. Reset polling...\n");
> - timed_topology_update(0);
> + case H_PARAMETER:
> + printk(KERN_ERR
> + "hcall_vphn() was passed an invalid parameter."
> + "Disabling polling...\n");

This will come out as:

hcall_vphn() was passed an invalid parameter.Disabling polling...
 ^

And it's misleading to say VPHN polling is being disabled when this case
does not invoke stop_topology_update().

> + break;
> + default:
> + printk(KERN_ERR
> + "hcall_vphn() returned %ld. Disabling polling \n", rc);
> + stop_topology_update();
>   break;

Any added prints in this routine must be _once or _ratelimited to avoid
log floods. Also use the pr_ APIs instead of printk please.


[PATCH v2 6/8] powerpc/vdso32: use LOAD_REG_IMMEDIATE()

2019-08-22 Thread Christophe Leroy
Use LOAD_REG_IMMEDIATE() to load registers with immediate value.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/vdso32/gettimeofday.S | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S 
b/arch/powerpc/kernel/vdso32/gettimeofday.S
index 3e55cba19f44..0f87be0ebf7e 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -40,8 +40,7 @@ V_FUNCTION_BEGIN(__kernel_gettimeofday)
get_datapager9, r0
cmplwi  r10,0   /* check if tv is NULL */
beq 3f
-   lis r7,1000000@ha   /* load up USEC_PER_SEC */
-   addir7,r7,1000000@l /* so we get microseconds in r4 */
+   LOAD_REG_IMMEDIATE(r7, 1000000) /* load up USEC_PER_SEC */
bl  __do_get_tspec@local/* get sec/usec from tb & kernel */
stw r3,TVAL32_TV_SEC(r10)
stw r4,TVAL32_TV_USEC(r10)
@@ -84,8 +83,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
   .cfi_register lr,r12
mr  r11,r4  /* r11 saves tp */
get_datapager9, r0
-   lis r7,NSEC_PER_SEC@h   /* want nanoseconds */
-   ori r7,r7,NSEC_PER_SEC@l
+   LOAD_REG_IMMEDIATE(r7, NSEC_PER_SEC)/* load up NSEC_PER_SEC */
beq cr5, .Lcoarse_clocks
 .Lprecise_clocks:
bl  __do_get_tspec@local/* get sec/nsec from tb & kernel */
-- 
2.13.3



[PATCH v2 8/8] powerpc/vdso32: miscellaneous optimisations

2019-08-22 Thread Christophe Leroy
Various optimisations by inverting branches and removing
redundant instructions.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/vdso32/datapage.S |  3 +--
 arch/powerpc/kernel/vdso32/getcpu.S   |  6 +++---
 arch/powerpc/kernel/vdso32/gettimeofday.S | 18 +-
 3 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/vdso32/datapage.S 
b/arch/powerpc/kernel/vdso32/datapage.S
index d480d2d4a3fe..436b88c455d1 100644
--- a/arch/powerpc/kernel/vdso32/datapage.S
+++ b/arch/powerpc/kernel/vdso32/datapage.S
@@ -31,11 +31,10 @@ V_FUNCTION_BEGIN(__kernel_get_syscall_map)
   .cfi_startproc
mflrr12
   .cfi_register lr,r12
-   mr  r4,r3
+   mr. r4,r3
get_datapager3, r0
mtlrr12
addir3,r3,CFG_SYSCALL_MAP32
-   cmpli   cr0,r4,0
beqlr
li  r0,NR_syscalls
stw r0,0(r4)
diff --git a/arch/powerpc/kernel/vdso32/getcpu.S 
b/arch/powerpc/kernel/vdso32/getcpu.S
index bde226ad904d..ac1faa8a2bfd 100644
--- a/arch/powerpc/kernel/vdso32/getcpu.S
+++ b/arch/powerpc/kernel/vdso32/getcpu.S
@@ -31,10 +31,10 @@ V_FUNCTION_BEGIN(__kernel_getcpu)
rlwinm  r7,r5,16,31-15,31-0
beq cr0,1f
stw r6,0(r3)
-1: beq cr1,2f
-   stw r7,0(r4)
-2: crclr   cr0*4+so
+1: crclr   cr0*4+so
li  r3,0/* always success */
+   beqlr   cr1
+   stw r7,0(r4)
blr
   .cfi_endproc
 V_FUNCTION_END(__kernel_getcpu)
diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S 
b/arch/powerpc/kernel/vdso32/gettimeofday.S
index c65f41c612f7..47aa44ab8bbb 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -35,10 +35,9 @@ V_FUNCTION_BEGIN(__kernel_gettimeofday)
mflrr12
   .cfi_register lr,r12
 
-   mr  r10,r3  /* r10 saves tv */
+   mr. r10,r3  /* r10 saves tv */
mr  r11,r4  /* r11 saves tz */
get_datapager9, r0
-   cmplwi  r10,0   /* check if tv is NULL */
beq 3f
LOAD_REG_IMMEDIATE(r7, 1000000) /* load up USEC_PER_SEC */
bl  __do_get_tspec@local/* get sec/usec from tb & kernel */
@@ -46,15 +45,16 @@ V_FUNCTION_BEGIN(__kernel_gettimeofday)
stw r4,TVAL32_TV_USEC(r10)
 
 3: cmplwi  r11,0   /* check if tz is NULL */
-   beq 1f
+   mtlrr12
+   crclr   cr0*4+so
+   li  r3,0
+   beqlr
+
lwz r4,CFG_TZ_MINUTEWEST(r9)/* fill tz */
lwz r5,CFG_TZ_DSTTIME(r9)
stw r4,TZONE_TZ_MINWEST(r11)
stw r5,TZONE_TZ_DSTTIME(r11)
 
-1: mtlrr12
-   crclr   cr0*4+so
-   li  r3,0
blr
   .cfi_endproc
 V_FUNCTION_END(__kernel_gettimeofday)
@@ -248,10 +248,10 @@ V_FUNCTION_BEGIN(__kernel_time)
lwz r3,STAMP_XTIME+TSPEC_TV_SEC(r9)
 
cmplwi  r11,0   /* check if t is NULL */
-   beq 2f
-   stw r3,0(r11)   /* store result at *t */
-2: mtlrr12
+   mtlrr12
crclr   cr0*4+so
+   beqlr
+   stw r3,0(r11)   /* store result at *t */
blr
   .cfi_endproc
 V_FUNCTION_END(__kernel_time)
-- 
2.13.3



[PATCH v2 7/8] powerpc/vdso32: implement clock_getres entirely

2019-08-22 Thread Christophe Leroy
clock_getres returns hrtimer_res for all clocks but the coarse ones,
for which it returns KTIME_LOW_RES.

Return EINVAL for unknown clocks.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/asm-offsets.c |  3 +++
 arch/powerpc/kernel/vdso32/gettimeofday.S | 19 +++
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index b6328a90cad7..dbfd3ddc85dc 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -417,7 +417,10 @@ int main(void)
DEFINE(CLOCK_MONOTONIC, CLOCK_MONOTONIC);
DEFINE(CLOCK_REALTIME_COARSE, CLOCK_REALTIME_COARSE);
DEFINE(CLOCK_MONOTONIC_COARSE, CLOCK_MONOTONIC_COARSE);
+   DEFINE(CLOCK_MAX, CLOCK_TAI);
DEFINE(NSEC_PER_SEC, NSEC_PER_SEC);
+   DEFINE(EINVAL, EINVAL);
+   DEFINE(KTIME_LOW_RES, KTIME_LOW_RES);
 
 #ifdef CONFIG_BUG
DEFINE(BUG_ENTRY_SIZE, sizeof(struct bug_entry));
diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S 
b/arch/powerpc/kernel/vdso32/gettimeofday.S
index 0f87be0ebf7e..c65f41c612f7 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -199,17 +199,20 @@ V_FUNCTION_END(__kernel_clock_gettime)
 V_FUNCTION_BEGIN(__kernel_clock_getres)
   .cfi_startproc
/* Check for supported clock IDs */
-   cmpwi   cr0,r3,CLOCK_REALTIME
-   cmpwi   cr1,r3,CLOCK_MONOTONIC
-   crorcr0*4+eq,cr0*4+eq,cr1*4+eq
-   bne cr0,99f
+   cmplwi  cr0, r3, CLOCK_MAX
+   cmpwi   cr1, r3, CLOCK_REALTIME_COARSE
+   cmpwi   cr7, r3, CLOCK_MONOTONIC_COARSE
+   bgt cr0, 99f
+   LOAD_REG_IMMEDIATE(r5, KTIME_LOW_RES)
+   beq cr1, 1f
+   beq cr7, 1f
 
mflrr12
   .cfi_register lr,r12
get_datapager3, r0
lwz r5, CLOCK_HRTIMER_RES(r3)
mtlrr12
-   li  r3,0
+1: li  r3,0
cmpli   cr0,r4,0
crclr   cr0*4+so
beqlr
@@ -218,11 +221,11 @@ V_FUNCTION_BEGIN(__kernel_clock_getres)
blr
 
/*
-* syscall fallback
+* invalid clock
 */
 99:
-   li  r0,__NR_clock_getres
-   sc
+   li  r3, EINVAL
+   crset   so
blr
   .cfi_endproc
 V_FUNCTION_END(__kernel_clock_getres)
-- 
2.13.3



[PATCH v2 5/8] powerpc/vdso32: Don't read cache line size from the datapage on PPC32.

2019-08-22 Thread Christophe Leroy
On PPC32, the cache lines have a fixed size known at build time.

Don't read it from the datapage.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/vdso_datapage.h |  4 
 arch/powerpc/kernel/asm-offsets.c|  2 +-
 arch/powerpc/kernel/vdso.c   |  5 -
 arch/powerpc/kernel/vdso32/cacheflush.S  | 23 +++
 4 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/vdso_datapage.h 
b/arch/powerpc/include/asm/vdso_datapage.h
index 2ccb938d8544..869df2f43400 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -106,10 +106,6 @@ struct vdso_data {
__u32 stamp_sec_fraction;   /* fractional seconds of stamp_xtime */
__u32 hrtimer_res;  /* hrtimer resolution */
__u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */
-   __u32 dcache_block_size;/* L1 d-cache block size */
-   __u32 icache_block_size;/* L1 i-cache block size */
-   __u32 dcache_log_block_size;/* L1 d-cache log block size */
-   __u32 icache_log_block_size;/* L1 i-cache log block size */
 };
 
 #endif /* CONFIG_PPC64 */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 6279053967fd..b6328a90cad7 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -388,11 +388,11 @@ int main(void)
OFFSET(STAMP_XTIME, vdso_data, stamp_xtime);
OFFSET(STAMP_SEC_FRAC, vdso_data, stamp_sec_fraction);
OFFSET(CLOCK_HRTIMER_RES, vdso_data, hrtimer_res);
+#ifdef CONFIG_PPC64
OFFSET(CFG_ICACHE_BLOCKSZ, vdso_data, icache_block_size);
OFFSET(CFG_DCACHE_BLOCKSZ, vdso_data, dcache_block_size);
OFFSET(CFG_ICACHE_LOGBLOCKSZ, vdso_data, icache_log_block_size);
OFFSET(CFG_DCACHE_LOGBLOCKSZ, vdso_data, dcache_log_block_size);
-#ifdef CONFIG_PPC64
OFFSET(CFG_SYSCALL_MAP64, vdso_data, syscall_map_64);
OFFSET(TVAL64_TV_SEC, timeval, tv_sec);
OFFSET(TVAL64_TV_USEC, timeval, tv_usec);
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index d60598113a9f..87d43e43e2e7 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -750,11 +750,6 @@ static int __init vdso_init(void)
 */
vdso64_pages = (_end - _start) >> PAGE_SHIFT;
DBG("vdso64_kbase: %p, 0x%x pages\n", vdso64_kbase, vdso64_pages);
-#else
-   vdso_data->dcache_block_size = L1_CACHE_BYTES;
-   vdso_data->dcache_log_block_size = L1_CACHE_SHIFT;
-   vdso_data->icache_block_size = L1_CACHE_BYTES;
-   vdso_data->icache_log_block_size = L1_CACHE_SHIFT;
 #endif /* CONFIG_PPC64 */
 
 
diff --git a/arch/powerpc/kernel/vdso32/cacheflush.S 
b/arch/powerpc/kernel/vdso32/cacheflush.S
index e9453837e4ee..b9340a89984e 100644
--- a/arch/powerpc/kernel/vdso32/cacheflush.S
+++ b/arch/powerpc/kernel/vdso32/cacheflush.S
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "datapage.h"
 
@@ -24,28 +25,44 @@
  */
 V_FUNCTION_BEGIN(__kernel_sync_dicache)
   .cfi_startproc
+#ifdef CONFIG_PPC64
mflrr12
   .cfi_register lr,r12
get_datapager10, r0
mtlrr12
+#endif
 
+#ifdef CONFIG_PPC64
lwz r7,CFG_DCACHE_BLOCKSZ(r10)
addir5,r7,-1
+#else
+   li  r5, L1_CACHE_BYTES - 1
+#endif
andcr6,r3,r5/* round low to line bdy */
subfr8,r6,r4/* compute length */
add r8,r8,r5/* ensure we get enough */
+#ifdef CONFIG_PPC64
lwz r9,CFG_DCACHE_LOGBLOCKSZ(r10)
srw.r8,r8,r9/* compute line count */
+#else
+   srwi.   r8, r8, L1_CACHE_SHIFT
+   mr  r7, r6
+#endif
crclr   cr0*4+so
beqlr   /* nothing to do? */
mtctr   r8
 1: dcbst   0,r6
+#ifdef CONFIG_PPC64
add r6,r6,r7
+#else
+   addir6, r6, L1_CACHE_BYTES
+#endif
bdnz1b
sync
 
 /* Now invalidate the instruction cache */
 
+#ifdef CONFIG_PPC64
lwz r7,CFG_ICACHE_BLOCKSZ(r10)
addir5,r7,-1
andcr6,r3,r5/* round low to line bdy */
@@ -55,9 +72,15 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
srw.r8,r8,r9/* compute line count */
crclr   cr0*4+so
beqlr   /* nothing to do? */
+#endif
mtctr   r8
+#ifdef CONFIG_PPC64
 2: icbi0,r6
add r6,r6,r7
+#else
+2: icbi0, r7
+   addir7, r7, L1_CACHE_BYTES
+#endif
bdnz2b
isync
li  r3,0
-- 
2.13.3



[PATCH v2 4/8] powerpc/vdso32: inline __get_datapage()

2019-08-22 Thread Christophe Leroy
__get_datapage() is only a few instructions to retrieve the
address of the page where the kernel stores data to the VDSO.

By inlining this function into its users, a bl/blr pair and
a mflr/mtlr pair are avoided, plus a few reg moves.

The improvement is noticeable (about 55 nsec/call on an 8xx)

vdsotest before the patch:
gettimeofday:vdso: 731 nsec/call
clock-gettime-realtime-coarse:vdso: 668 nsec/call
clock-gettime-monotonic-coarse:vdso: 745 nsec/call

vdsotest after the patch:
gettimeofday:vdso: 677 nsec/call
clock-gettime-realtime-coarse:vdso: 613 nsec/call
clock-gettime-monotonic-coarse:vdso: 690 nsec/call

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/vdso32/cacheflush.S   | 10 +-
 arch/powerpc/kernel/vdso32/datapage.S | 29 -
 arch/powerpc/kernel/vdso32/datapage.h | 11 +++
 arch/powerpc/kernel/vdso32/gettimeofday.S | 13 ++---
 4 files changed, 26 insertions(+), 37 deletions(-)
 create mode 100644 arch/powerpc/kernel/vdso32/datapage.h

diff --git a/arch/powerpc/kernel/vdso32/cacheflush.S 
b/arch/powerpc/kernel/vdso32/cacheflush.S
index 7f882e7b9f43..e9453837e4ee 100644
--- a/arch/powerpc/kernel/vdso32/cacheflush.S
+++ b/arch/powerpc/kernel/vdso32/cacheflush.S
@@ -10,6 +10,8 @@
 #include 
 #include 
 
+#include "datapage.h"
+
.text
 
 /*
@@ -24,14 +26,12 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
   .cfi_startproc
mflrr12
   .cfi_register lr,r12
-   mr  r11,r3
-   bl  __get_datapage@local
+   get_datapager10, r0
mtlrr12
-   mr  r10,r3
 
lwz r7,CFG_DCACHE_BLOCKSZ(r10)
addir5,r7,-1
-   andcr6,r11,r5   /* round low to line bdy */
+   andcr6,r3,r5/* round low to line bdy */
subfr8,r6,r4/* compute length */
add r8,r8,r5/* ensure we get enough */
lwz r9,CFG_DCACHE_LOGBLOCKSZ(r10)
@@ -48,7 +48,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
 
lwz r7,CFG_ICACHE_BLOCKSZ(r10)
addir5,r7,-1
-   andcr6,r11,r5   /* round low to line bdy */
+   andcr6,r3,r5/* round low to line bdy */
subfr8,r6,r4/* compute length */
add r8,r8,r5
lwz r9,CFG_ICACHE_LOGBLOCKSZ(r10)
diff --git a/arch/powerpc/kernel/vdso32/datapage.S 
b/arch/powerpc/kernel/vdso32/datapage.S
index 6984125b9fc0..d480d2d4a3fe 100644
--- a/arch/powerpc/kernel/vdso32/datapage.S
+++ b/arch/powerpc/kernel/vdso32/datapage.S
@@ -11,34 +11,13 @@
 #include 
 #include 
 
+#include "datapage.h"
+
.text
.global __kernel_datapage_offset;
 __kernel_datapage_offset:
.long   0
 
-V_FUNCTION_BEGIN(__get_datapage)
-  .cfi_startproc
-   /* We don't want that exposed or overridable as we want other objects
-* to be able to bl directly to here
-*/
-   .protected __get_datapage
-   .hidden __get_datapage
-
-   mflrr0
-  .cfi_register lr,r0
-
-   bcl 20,31,data_page_branch
-data_page_branch:
-   mflrr3
-   mtlrr0
-   addir3, r3, __kernel_datapage_offset-data_page_branch
-   lwz r0,0(r3)
-  .cfi_restore lr
-   add r3,r0,r3
-   blr
-  .cfi_endproc
-V_FUNCTION_END(__get_datapage)
-
 /*
  * void *__kernel_get_syscall_map(unsigned int *syscall_count) ;
  *
@@ -53,7 +32,7 @@ V_FUNCTION_BEGIN(__kernel_get_syscall_map)
mflrr12
   .cfi_register lr,r12
mr  r4,r3
-   bl  __get_datapage@local
+   get_datapager3, r0
mtlrr12
addir3,r3,CFG_SYSCALL_MAP32
cmpli   cr0,r4,0
@@ -74,7 +53,7 @@ V_FUNCTION_BEGIN(__kernel_get_tbfreq)
   .cfi_startproc
mflrr12
   .cfi_register lr,r12
-   bl  __get_datapage@local
+   get_datapager3, r0
lwz r4,(CFG_TB_TICKS_PER_SEC + 4)(r3)
lwz r3,CFG_TB_TICKS_PER_SEC(r3)
mtlrr12
diff --git a/arch/powerpc/kernel/vdso32/datapage.h 
b/arch/powerpc/kernel/vdso32/datapage.h
new file mode 100644
index ..74f4f57c2da8
--- /dev/null
+++ b/arch/powerpc/kernel/vdso32/datapage.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+.macro get_datapage ptr, tmp
+   bcl 20,31,.+4
+   mflr\ptr
+   addi\ptr, \ptr, __kernel_datapage_offset - (.-4)
+   lwz \tmp, 0(\ptr)
+   add \ptr, \tmp, \ptr
+.endm
+
+
diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S 
b/arch/powerpc/kernel/vdso32/gettimeofday.S
index 355b537d327a..3e55cba19f44 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -12,6 +12,8 @@
 #include 
 #include 
 
+#include "datapage.h"
+
 /* Offset for the low 32-bit part of a field of long type */
 #ifdef CONFIG_PPC64
 #define LOPART 4
@@ -35,8 +37,7 @@ V_FUNCTION_BEGIN(__kernel_gettimeofday)
 
mr 

[PATCH v2 2/8] powerpc/vdso32: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE

2019-08-22 Thread Christophe Leroy
This is copied and adapted from commit 5c929885f1bb ("powerpc/vdso64:
Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE")
from Santosh Sivaraj 

Benchmark from vdsotest-all:
clock-gettime-realtime: syscall: 3601 nsec/call
clock-gettime-realtime:libc: 1072 nsec/call
clock-gettime-realtime:vdso: 931 nsec/call
clock-gettime-monotonic: syscall: 4034 nsec/call
clock-gettime-monotonic:libc: 1213 nsec/call
clock-gettime-monotonic:vdso: 1076 nsec/call
clock-gettime-realtime-coarse: syscall: 2722 nsec/call
clock-gettime-realtime-coarse:libc: 805 nsec/call
clock-gettime-realtime-coarse:vdso: 668 nsec/call
clock-gettime-monotonic-coarse: syscall: 2949 nsec/call
clock-gettime-monotonic-coarse:libc: 882 nsec/call
clock-gettime-monotonic-coarse:vdso: 745 nsec/call

Additional test passed with:
vdsotest -d 30 clock-gettime-monotonic-coarse verify

Signed-off-by: Christophe Leroy 
Cc: Naveen N. Rao 
Cc: Santosh Sivaraj 
Link: https://github.com/linuxppc/issues/issues/41
---
 arch/powerpc/kernel/vdso32/gettimeofday.S | 64 +++
 1 file changed, 57 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S 
b/arch/powerpc/kernel/vdso32/gettimeofday.S
index becd9f8767ed..decd263c16e0 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -71,7 +71,13 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
cmpli   cr0,r3,CLOCK_REALTIME
cmpli   cr1,r3,CLOCK_MONOTONIC
crorcr0*4+eq,cr0*4+eq,cr1*4+eq
-   bne cr0,99f
+
+   cmpli   cr5,r3,CLOCK_REALTIME_COARSE
+   cmpli   cr6,r3,CLOCK_MONOTONIC_COARSE
+   crorcr5*4+eq,cr5*4+eq,cr6*4+eq
+
+   crorcr0*4+eq,cr0*4+eq,cr5*4+eq
+   bne cr0, .Lgettime_fallback
 
mflrr12 /* r12 saves lr */
   .cfi_register lr,r12
@@ -80,8 +86,10 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
mr  r9,r3   /* datapage ptr in r9 */
lis r7,NSEC_PER_SEC@h   /* want nanoseconds */
ori r7,r7,NSEC_PER_SEC@l
-50:bl  __do_get_tspec@local/* get sec/nsec from tb & kernel */
-   bne cr1,80f /* not monotonic -> all done */
+   beq cr5, .Lcoarse_clocks
+.Lprecise_clocks:
+   bl  __do_get_tspec@local/* get sec/nsec from tb & kernel */
+   bne cr1, .Lfinish   /* not monotonic -> all done */
 
/*
 * CLOCK_MONOTONIC
@@ -105,12 +113,53 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
add r9,r9,r0
lwz r0,(CFG_TB_UPDATE_COUNT+LOPART)(r9)
 cmplcr0,r8,r0  /* check if updated */
-   bne-50b
+   bne-.Lprecise_clocks
+   b   .Lfinish_monotonic
+
+   /*
+* For coarse clocks we get data directly from the vdso data page, so
+* we don't need to call __do_get_tspec, but we still need to do the
+* counter trick.
+*/
+.Lcoarse_clocks:
+   lwz r8,(CFG_TB_UPDATE_COUNT+LOPART)(r9)
+   andi.   r0,r8,1 /* pending update ? loop */
+   bne-.Lcoarse_clocks
+   add r9,r9,r0/* r0 is already 0 */
+
+   /*
+* CLOCK_REALTIME_COARSE, below values are needed for MONOTONIC_COARSE
+* too
+*/
+   lwz r3,STAMP_XTIME+TSPC32_TV_SEC(r9)
+   lwz r4,STAMP_XTIME+TSPC32_TV_NSEC(r9)
+   bne cr6,1f
+
+   /* CLOCK_MONOTONIC_COARSE */
+   lwz r5,(WTOM_CLOCK_SEC+LOPART)(r9)
+   lwz r6,WTOM_CLOCK_NSEC(r9)
+
+   /* check if counter has updated */
+   or  r0,r6,r5
+1: or  r0,r0,r3
+   or  r0,r0,r4
+   xor r0,r0,r0
+   add r3,r3,r0
+   lwz r0,CFG_TB_UPDATE_COUNT+LOPART(r9)
+   cmplcr0,r0,r8   /* check if updated */
+   bne-.Lcoarse_clocks
+
+   /* Counter has not updated, so continue calculating proper values for
+* sec and nsec if monotonic coarse, or just return with the proper
+* values for realtime.
+*/
+   bne cr6, .Lfinish
 
/* Calculate and store result. Note that this mimics the C code,
 * which may cause funny results if nsec goes negative... is that
 * possible at all ?
 */
+.Lfinish_monotonic:
add r3,r3,r5
add r4,r4,r6
cmpwcr0,r4,r7
@@ -118,11 +167,12 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
blt 1f
subfr4,r7,r4
addir3,r3,1
-1: bge cr1,80f
+1: bge cr1, .Lfinish
addir3,r3,-1
add r4,r4,r7
 
-80:stw r3,TSPC32_TV_SEC(r11)
+.Lfinish:
+   stw r3,TSPC32_TV_SEC(r11)
stw r4,TSPC32_TV_NSEC(r11)
 
mtlrr12
@@ -133,7 +183,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
/*
 * syscall fallback
 */
-99:
+.Lgettime_fallback:
li  r0,__NR_clock_gettime
   

[PATCH v2 0/8] powerpc/vdso32 enhancement and optimisation

2019-08-22 Thread Christophe Leroy
This series:
- adds getcpu()
- adds coarse clocks in clock_gettime
- fixes and adds all clocks in clock_getres
- optimises the retrieval of the datapage address
- optimises the cache functions

It puts together the three patches sent out earlier, although they
were not presented as a series, hence the 'v2' tag for now.

v2:
- Used named labels in patch 2
- Added patch from Vincenzo to fix clock_getres() (patch 3)
- Removed unnecessary label in patch 4 as suggested by Segher
- Added patches 5 to 8

Christophe Leroy (8):
  powerpc/32: Add VDSO version of getcpu
  powerpc/vdso32: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE
  powerpc: Fix vDSO clock_getres()
  powerpc/vdso32: inline __get_datapage()
  powerpc/vdso32: Don't read cache line size from the datapage on PPC32.
  powerpc/vdso32: use LOAD_REG_IMMEDIATE()
  powerpc/vdso32: implement clock_getres entirely
  powerpc/vdso32: miscellaneous optimisations

 arch/powerpc/include/asm/vdso.h   |   2 +
 arch/powerpc/include/asm/vdso_datapage.h  |   6 +-
 arch/powerpc/kernel/asm-offsets.c |   7 +-
 arch/powerpc/kernel/head_32.h |  13 
 arch/powerpc/kernel/head_booke.h  |  11 +++
 arch/powerpc/kernel/time.c|   1 +
 arch/powerpc/kernel/vdso.c|   5 --
 arch/powerpc/kernel/vdso32/Makefile   |   4 +-
 arch/powerpc/kernel/vdso32/cacheflush.S   |  33 ++--
 arch/powerpc/kernel/vdso32/datapage.S |  32 ++--
 arch/powerpc/kernel/vdso32/datapage.h |  11 +++
 arch/powerpc/kernel/vdso32/getcpu.S   |  13 +++-
 arch/powerpc/kernel/vdso32/gettimeofday.S | 125 +-
 arch/powerpc/kernel/vdso32/vdso32.lds.S   |   2 -
 arch/powerpc/kernel/vdso64/gettimeofday.S |   7 +-
 15 files changed, 183 insertions(+), 89 deletions(-)
 create mode 100644 arch/powerpc/kernel/vdso32/datapage.h

-- 
2.13.3



[PATCH v2 3/8] powerpc: Fix vDSO clock_getres()

2019-08-22 Thread Christophe Leroy
From: Vincenzo Frascino 

clock_getres in the vDSO library has to preserve the same behaviour
as posix_get_hrtimer_res().

In particular, posix_get_hrtimer_res() does:
sec = 0;
ns = hrtimer_resolution;
and hrtimer_resolution depends on the enablement of the high
resolution timers that can happen either at compile or at run time.

Fix the powerpc vdso implementation of clock_getres keeping a copy of
hrtimer_resolution in vdso data and using that directly.

Fixes: a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support
to 32 bits kernel")
Cc: sta...@vger.kernel.org
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Signed-off-by: Vincenzo Frascino 
Reviewed-by: Christophe Leroy 
Acked-by: Shuah Khan 
[chleroy: changed CLOCK_REALTIME_RES to CLOCK_HRTIMER_RES]
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/vdso_datapage.h  | 2 ++
 arch/powerpc/kernel/asm-offsets.c | 2 +-
 arch/powerpc/kernel/time.c| 1 +
 arch/powerpc/kernel/vdso32/gettimeofday.S | 7 +--
 arch/powerpc/kernel/vdso64/gettimeofday.S | 7 +--
 5 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/vdso_datapage.h 
b/arch/powerpc/include/asm/vdso_datapage.h
index c61d59ed3b45..2ccb938d8544 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -82,6 +82,7 @@ struct vdso_data {
__s32 wtom_clock_nsec;  /* Wall to monotonic clock nsec 
*/
__s64 wtom_clock_sec;   /* Wall to monotonic clock sec 
*/
struct timespec stamp_xtime;/* xtime as at tb_orig_stamp */
+   __u32 hrtimer_res;  /* hrtimer resolution */
__u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of syscalls  */
__u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */
 };
@@ -103,6 +104,7 @@ struct vdso_data {
__s32 wtom_clock_nsec;
struct timespec stamp_xtime;/* xtime as at tb_orig_stamp */
__u32 stamp_sec_fraction;   /* fractional seconds of stamp_xtime */
+   __u32 hrtimer_res;  /* hrtimer resolution */
__u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */
__u32 dcache_block_size;/* L1 d-cache block size */
__u32 icache_block_size;/* L1 i-cache block size */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 4ccb6b3a7fbd..6279053967fd 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -387,6 +387,7 @@ int main(void)
OFFSET(WTOM_CLOCK_NSEC, vdso_data, wtom_clock_nsec);
OFFSET(STAMP_XTIME, vdso_data, stamp_xtime);
OFFSET(STAMP_SEC_FRAC, vdso_data, stamp_sec_fraction);
+   OFFSET(CLOCK_HRTIMER_RES, vdso_data, hrtimer_res);
OFFSET(CFG_ICACHE_BLOCKSZ, vdso_data, icache_block_size);
OFFSET(CFG_DCACHE_BLOCKSZ, vdso_data, dcache_block_size);
OFFSET(CFG_ICACHE_LOGBLOCKSZ, vdso_data, icache_log_block_size);
@@ -417,7 +418,6 @@ int main(void)
DEFINE(CLOCK_REALTIME_COARSE, CLOCK_REALTIME_COARSE);
DEFINE(CLOCK_MONOTONIC_COARSE, CLOCK_MONOTONIC_COARSE);
DEFINE(NSEC_PER_SEC, NSEC_PER_SEC);
-   DEFINE(CLOCK_REALTIME_RES, MONOTONIC_RES_NSEC);
 
 #ifdef CONFIG_BUG
DEFINE(BUG_ENTRY_SIZE, sizeof(struct bug_entry));
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 694522308cd5..619447b1b797 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -959,6 +959,7 @@ void update_vsyscall(struct timekeeper *tk)
vdso_data->wtom_clock_nsec = tk->wall_to_monotonic.tv_nsec;
vdso_data->stamp_xtime = xt;
vdso_data->stamp_sec_fraction = frac_sec;
+   vdso_data->hrtimer_res = hrtimer_resolution;
smp_wmb();
++(vdso_data->tb_update_count);
 }
diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S 
b/arch/powerpc/kernel/vdso32/gettimeofday.S
index decd263c16e0..355b537d327a 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -206,12 +206,15 @@ V_FUNCTION_BEGIN(__kernel_clock_getres)
crorcr0*4+eq,cr0*4+eq,cr1*4+eq
bne cr0,99f
 
+   mflrr12
+  .cfi_register lr,r12
+   bl  __get_datapage@local/* get data page */
+   lwz r5, CLOCK_HRTIMER_RES(r3)
+   mtlrr12
li  r3,0
cmpli   cr0,r4,0
crclr   cr0*4+so
beqlr
-   lis r5,CLOCK_REALTIME_RES@h
-   ori r5,r5,CLOCK_REALTIME_RES@l
stw r3,TSPC32_TV_SEC(r4)
stw r5,TSPC32_TV_NSEC(r4)
blr
diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S 
b/arch/powerpc/kernel/vdso64/gettimeofday.S
index 07bfe33fe874..81757f06bbd7 100644
--- a/arch/powerpc/kernel/vdso64/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso64/gettimeofday.S
@@ -186,12 +186,15 @@ 

[PATCH v2 1/8] powerpc/32: Add VDSO version of getcpu

2019-08-22 Thread Christophe Leroy
Commit 18ad51dd342a ("powerpc: Add VDSO version of getcpu") added
getcpu() for PPC64 only, by making use of a user readable general
purpose SPR.

PPC32 doesn't have any such SPR, but a full system call can still be
avoided by implementing a fast system call which reads the CPU id
from the task struct and returns immediately without going back into
virtual mode.

Before the patch, vdsotest reported:
getcpu: syscall: 1572 nsec/call
getcpu:libc: 1787 nsec/call
getcpu:vdso: not tested

Now, vdsotest reports:
getcpu: syscall: 1582 nsec/call
getcpu:libc: 667 nsec/call
getcpu:vdso: 368 nsec/call

For non SMP, just return CPU id 0 from the VDSO directly.

PPC32 doesn't support CONFIG_NUMA, so the NUMA node is always 0.

Signed-off-by: Christophe Leroy 

---
v2: fixed build error in getcpu.S
---
 arch/powerpc/include/asm/vdso.h |  2 ++
 arch/powerpc/kernel/head_32.h   | 13 +
 arch/powerpc/kernel/head_booke.h| 11 +++
 arch/powerpc/kernel/vdso32/Makefile |  4 +---
 arch/powerpc/kernel/vdso32/getcpu.S |  7 +++
 arch/powerpc/kernel/vdso32/vdso32.lds.S |  2 --
 6 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/vdso.h b/arch/powerpc/include/asm/vdso.h
index b5e1f8f8a05c..adb54782df5f 100644
--- a/arch/powerpc/include/asm/vdso.h
+++ b/arch/powerpc/include/asm/vdso.h
@@ -16,6 +16,8 @@
 /* Define if 64 bits VDSO has procedure descriptors */
 #undef VDS64_HAS_DESCRIPTORS
 
+#define NR_MAGIC_FAST_VDSO_SYSCALL 0x789a
+
 #ifndef __ASSEMBLY__
 
 /* Offsets relative to thread->vdso_base */
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 4a692553651f..a2e38b59785a 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -3,6 +3,8 @@
 #define __HEAD_32_H__
 
 #include /* for STACK_FRAME_REGS_MARKER */
+#include 
+#include 
 
 /*
  * MSR_KERNEL is > 0x8000 on 4xx/Book-E since it include MSR_CE.
@@ -74,7 +76,13 @@
 .endm
 
 .macro SYSCALL_ENTRY trapno
+#ifdef CONFIG_SMP
+   cmplwi  cr0, r0, NR_MAGIC_FAST_VDSO_SYSCALL
+#endif
mfspr   r12,SPRN_SPRG_THREAD
+#ifdef CONFIG_SMP
+   beq-1f
+#endif
mfcrr10
lwz r11,TASK_STACK-THREAD(r12)
mflrr9
@@ -152,6 +160,11 @@
mtspr   SPRN_SRR0,r11
SYNC
RFI /* jump to handler, enable MMU */
+#ifdef CONFIG_SMP
+1:
+   lwz r5, TASK_CPU - THREAD(r12)
+   RFI
+#endif
 .endm
 
 /*
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 2ae635df9026..c534e87cac84 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -3,6 +3,8 @@
 #define __HEAD_BOOKE_H__
 
 #include /* for STACK_FRAME_REGS_MARKER */
+#include 
+#include 
 #include 
 #include 
 
@@ -104,6 +106,10 @@ FTR_SECTION_ELSE
 #ifdef CONFIG_KVM_BOOKE_HV
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
 #endif
+#ifdef CONFIG_SMP
+   cmplwi  cr0, r0, NR_MAGIC_FAST_VDSO_SYSCALL
	beq-	1f
+#endif
BOOKE_CLEAR_BTB(r11)
lwz r11, TASK_STACK - THREAD(r10)
rlwinm  r12,r12,0,4,2   /* Clear SO bit in CR */
@@ -176,6 +182,11 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
mtspr   SPRN_SRR0,r11
SYNC
RFI /* jump to handler, enable MMU */
+#ifdef CONFIG_SMP
+1:
+   lwz r5, TASK_CPU - THREAD(r10)
+   RFI
+#endif
 .endm
 
 /* To handle the additional exception priority levels on 40x and Book-E
diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 06f54d947057..e147bbdc12cd 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -2,9 +2,7 @@
 
 # List of files in the vdso, has to be asm only for now
 
-obj-vdso32-$(CONFIG_PPC64) = getcpu.o
-obj-vdso32 = sigtramp.o gettimeofday.o datapage.o cacheflush.o note.o \
-   $(obj-vdso32-y)
+obj-vdso32 = sigtramp.o gettimeofday.o datapage.o cacheflush.o note.o getcpu.o
 
 # Build rules
 
diff --git a/arch/powerpc/kernel/vdso32/getcpu.S b/arch/powerpc/kernel/vdso32/getcpu.S
index 63e914539e1a..bde226ad904d 100644
--- a/arch/powerpc/kernel/vdso32/getcpu.S
+++ b/arch/powerpc/kernel/vdso32/getcpu.S
@@ -17,7 +17,14 @@
  */
 V_FUNCTION_BEGIN(__kernel_getcpu)
   .cfi_startproc
+#if defined(CONFIG_PPC64)
mfspr   r5,SPRN_SPRG_VDSO_READ
+#elif defined(CONFIG_SMP)
+   li  r0, NR_MAGIC_FAST_VDSO_SYSCALL
+   sc  /* returns cpuid in r5, clobbers cr0 and r10-r13 */
+#else
+   li  r5, 0
+#endif
cmpwi   cr0,r3,0
cmpwi   cr1,r4,0
clrlwi  r6,r5,16
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index 099a6db14e67..663880671e20 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -152,9 +152,7 @@ VERSION
__kernel_sync_dicache_p5;

Re: [PATCH] powerpc/vdso64: inline __get_datapage()

2019-08-22 Thread Santosh Sivaraj
Christophe Leroy  writes:

> Le 21/08/2019 à 14:15, Segher Boessenkool a écrit :
>> On Wed, Aug 21, 2019 at 01:50:52PM +0200, Christophe Leroy wrote:
>>> Do you have any idea on how to avoid that bcl/mflr stuff ?
>> 
>> Do a load from some fixed address?  Maybe an absolute address, even?
>> lwz r3,-12344(0)  or similar (that address is in kernel space...)
>> 
>> There aren't many options, and certainly not many *good* options!
>> 
>
> IIUC, the VDSO is seen by apps the same way as a dynamic lib. Couldn't 
> the relocation be done only once when the app loads the VDSO as for a 
> regular .so lib ?

How does address space randomization work for .so libs?

>
> It looks like it is what others do, at least x86 and arm64, unless I 
> misunderstood their code.
>
> Christophe


Re: [PATCH v3 1/2] powerpc/powernv: Enhance opal message read interface

2019-08-22 Thread Vasant Hegde

On 8/22/19 11:21 AM, Oliver O'Halloran wrote:

On Wed, 2019-08-21 at 13:43 +0530, Vasant Hegde wrote:

Use "opal-msg-size" device tree property to allocate memory for "opal_msg".

Cc: Mahesh Salgaonkar 
Cc: Jeremy Kerr 
Signed-off-by: Vasant Hegde 
---
Changes in v3:
   - Call BUG_ON() if we fail to allocate memory during init.

-Vasant

  arch/powerpc/platforms/powernv/opal.c | 29 ++-
  1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index aba443be7daa..4f1f68f568bf 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -58,6 +58,8 @@ static DEFINE_SPINLOCK(opal_write_lock);
  static struct atomic_notifier_head opal_msg_notifier_head[OPAL_MSG_TYPE_MAX];
  static uint32_t opal_heartbeat;
  static struct task_struct *kopald_tsk;
+static struct opal_msg *opal_msg;
+static uint64_t opal_msg_size;
  
  void opal_configure_cores(void)

  {
@@ -271,14 +273,9 @@ static void opal_message_do_notify(uint32_t msg_type, void *msg)
  static void opal_handle_message(void)
  {
s64 ret;
-   /*
-* TODO: pre-allocate a message buffer depending on opal-msg-size
-* value in /proc/device-tree.
-*/
-   static struct opal_msg msg;
u32 type;
  
-	ret = opal_get_msg(__pa(&msg), sizeof(msg));

+   ret = opal_get_msg(__pa(opal_msg), opal_msg_size);
/* No opal message pending. */
if (ret == OPAL_RESOURCE)
return;
@@ -290,14 +287,14 @@ static void opal_handle_message(void)
return;
}
  
-	type = be32_to_cpu(msg.msg_type);

+   type = be32_to_cpu(opal_msg->msg_type);
  
  	/* Sanity check */

if (type >= OPAL_MSG_TYPE_MAX) {
pr_warn_once("%s: Unknown message type: %u\n", __func__, type);
return;
}
-   opal_message_do_notify(type, (void *)&msg);
+   opal_message_do_notify(type, (void *)opal_msg);
  }
  
  static irqreturn_t opal_message_notify(int irq, void *data)

@@ -306,9 +303,21 @@ static irqreturn_t opal_message_notify(int irq, void *data)
return IRQ_HANDLED;
  }
  
-static int __init opal_message_init(void)

+static int __init opal_message_init(struct device_node *opal_node)
  {
int ret, i, irq;



+   const __be32 *val;
+
+   val = of_get_property(opal_node, "opal-msg-size", NULL);
+   if (val)
+   opal_msg_size = be32_to_cpup(val);


Use of_property_read_u32()


Yes. Will fix it.




+
+   /* If opal-msg-size property is not available then use default size */
+   if (!opal_msg_size)
+   opal_msg_size = sizeof(struct opal_msg);
+
+   opal_msg = kmalloc(opal_msg_size, GFP_KERNEL);



+   BUG_ON(opal_msg == NULL);


Seems questionable. Why not fall back to using a statically allocated
struct opal_msg? Or re-try the allocation with the size limited to
sizeof(struct opal_msg)?


If we are not able to allocate memory during init then we have a bigger
problem and there is no point in continuing; hence the BUG_ON().
Maybe I can retry the allocation with a fixed size before calling BUG_ON().
How about something like below :

+   /* If opal-msg-size property is not available then use default size */
+   if (!opal_msg_size)
+   opal_msg_size = sizeof(struct opal_msg);
+
+   opal_msg = kmalloc(opal_msg_size, GFP_KERNEL);
+   if (!opal_msg) {
+   opal_msg = kmalloc(sizeof(struct opal_msg), GFP_KERNEL);
+   BUG_ON(opal_msg == NULL);
+   }
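The retry-then-BUG pattern being discussed can be sketched in plain userspace C. Here get_dt_msg_size() and the sizes are made-up stand-ins for the device-tree lookup and sizeof(struct opal_msg); the kernel code uses of_property_read_u32() and kmalloc():

```c
#include <assert.h>
#include <stdlib.h>

#define DEFAULT_MSG_SIZE 128    /* stands in for sizeof(struct opal_msg) */

/* Hypothetical device-tree lookup: returns 0 when the property is absent. */
static size_t get_dt_msg_size(void)
{
        return 0;
}

static void *alloc_msg_buffer(size_t *out_size)
{
        size_t size = get_dt_msg_size();
        void *buf;

        if (!size)                       /* property missing: use default */
                size = DEFAULT_MSG_SIZE;

        buf = malloc(size);
        if (!buf) {                      /* retry with the minimum size */
                size = DEFAULT_MSG_SIZE;
                buf = malloc(size);
                assert(buf != NULL);     /* the kernel would BUG_ON() here */
        }
        *out_size = size;
        return buf;
}
```

The second malloc() mirrors Vasant's proposal: only give up (BUG_ON) after the minimum-size allocation also fails.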


-Vasant



[PATCH -next] crypto: nx - remove unused variables 'nx_driver_string' and 'nx_driver_version'

2019-08-22 Thread YueHaibing
drivers/crypto/nx/nx.h:12:19: warning:
 nx_driver_string defined but not used [-Wunused-const-variable=]
drivers/crypto/nx/nx.h:13:19: warning:
 nx_driver_version defined but not used [-Wunused-const-variable=]

They are never used, so just remove them.

Reported-by: Hulk Robot 
Signed-off-by: YueHaibing 
---
 drivers/crypto/nx/nx.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/crypto/nx/nx.h b/drivers/crypto/nx/nx.h
index c6b5a3b..7ecca16 100644
--- a/drivers/crypto/nx/nx.h
+++ b/drivers/crypto/nx/nx.h
@@ -9,9 +9,6 @@
 #define NX_STRING  "IBM Power7+ Nest Accelerator Crypto Driver"
 #define NX_VERSION "1.0"
 
-static const char nx_driver_string[] = NX_STRING;
-static const char nx_driver_version[] = NX_VERSION;
-
 /* a scatterlist in the format PHYP is expecting */
 struct nx_sg {
u64 addr;
-- 
2.7.4




[PATCH 3/3] powerpc/numa: Remove late request for home node associativity

2019-08-22 Thread Srikar Dronamraju
With the previous commit ("powerpc/numa: Early request for home node
associativity"), commit 2ea626306810 ("powerpc/topology: Get topology
for shared processors at boot"), which was requesting home node
associativity, becomes redundant.

Hence remove the late request for home node associativity.

Signed-off-by: Srikar Dronamraju 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: linuxppc-dev@lists.ozlabs.org
Reported-by: Satheesh Rajendran 
Reported-by: Abdul Haleem 
---
 arch/powerpc/include/asm/topology.h | 4 
 arch/powerpc/kernel/smp.c   | 5 -
 arch/powerpc/mm/numa.c  | 9 -
 3 files changed, 18 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 2f7e1ea..9bd396f 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -98,7 +98,6 @@ static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
 extern int prrn_is_enabled(void);
 extern int find_and_online_cpu_nid(int cpu);
 extern int timed_topology_update(int nsecs);
-extern void __init shared_proc_topology_init(void);
 #else
 static inline int start_topology_update(void)
 {
@@ -121,9 +120,6 @@ static inline int timed_topology_update(int nsecs)
return 0;
 }
 
-#ifdef CONFIG_SMP
-static inline void shared_proc_topology_init(void) {}
-#endif
 #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
 
 #include 
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ea6adbf..cdd39a0 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1359,11 +1359,6 @@ void __init smp_cpus_done(unsigned int max_cpus)
if (smp_ops && smp_ops->bringup_done)
smp_ops->bringup_done();
 
-   /*
-* On a shared LPAR, associativity needs to be requested.
-* Hence, get numa topology before dumping cpu topology
-*/
-   shared_proc_topology_init();
dump_numa_cpu_topology();
 
 #ifdef CONFIG_SCHED_SMT
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 7965d3b..2efeac8 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1604,15 +1604,6 @@ int prrn_is_enabled(void)
return prrn_enabled;
 }
 
-void __init shared_proc_topology_init(void)
-{
-   if (lppaca_shared_proc(get_lppaca())) {
-   bitmap_fill(cpumask_bits(_associativity_changes_mask),
-   nr_cpumask_bits);
-   numa_update_cpu_topology(false);
-   }
-}
-
 static int topology_read(struct seq_file *file, void *v)
 {
if (vphn_enabled || prrn_enabled)
-- 
1.8.3.1



[PATCH 2/3] powerpc/numa: Early request for home node associativity

2019-08-22 Thread Srikar Dronamraju
Currently the kernel detects if it is running on a shared lpar platform
and requests home node associativity before the scheduler sched_domains
are set up. However, between the time NUMA setup is initialized and the
request for home node associativity, the workqueue initializes its per node
cpumask. The per node workqueue possible cpumask may become invalid
after the home node associativity is obtained, resulting in weird situations
like the workqueue possible cpumask being a subset of the workqueue
online cpumask.

This can be fixed by requesting home node associativity earlier, just
before NUMA setup. However, at NUMA setup time the kernel may not be in
a position to detect if it is running on a shared lpar platform. So
request home node associativity and, if the request fails, fall back
on the device tree property.

However, home node associativity requires the CPU's hwid, which is set in
smp_setup_pacas(). Hence call smp_setup_pacas() before numa_setup_cpus().
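The hcall-first, device-tree-second fallback described above can be sketched in userspace C. Here fake_hcall_vphn() and of_node_to_nid_fallback() are made-up stand-ins for the real hcall_vphn() and of_node_to_nid_single():

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)
#define H_SUCCESS 0

/* Stub for hcall_vphn(): pretend the hcall is unsupported/failing. */
static long fake_hcall_vphn(unsigned long cpu, int *nid)
{
        (void)cpu;
        (void)nid;
        return -1;
}

/* Stub for the device-tree lookup, e.g. 8 threads per node. */
static int of_node_to_nid_fallback(unsigned long cpu)
{
        return (int)(cpu / 8);
}

static int numa_setup_cpu_nid(unsigned long cpu)
{
        int nid = NUMA_NO_NODE;

        /* Ask the hypervisor first, irrespective of shared/dedicated... */
        if (fake_hcall_vphn(cpu, &nid) != H_SUCCESS)
                nid = NUMA_NO_NODE;

        /* ...and fall back to the device tree only when that fails. */
        if (nid == NUMA_NO_NODE)
                nid = of_node_to_nid_fallback(cpu);
        return nid;
}
```

This mirrors the shape of the patched numa_setup_cpu(): the device-tree answer is used only when the VPHN request cannot provide one.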

Signed-off-by: Srikar Dronamraju 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: linuxppc-dev@lists.ozlabs.org
Reported-by: Satheesh Rajendran 
Reported-by: Abdul Haleem 
---
 arch/powerpc/kernel/setup-common.c |  5 +++--
 arch/powerpc/mm/numa.c | 28 +++-
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 1f8db66..9135dba 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -888,6 +888,9 @@ void __init setup_arch(char **cmdline_p)
/* Check the SMT related command line arguments (ppc64). */
check_smt_enabled();
 
+#ifdef CONFIG_SMP
+   smp_setup_pacas();
+#endif
/* Parse memory topology */
mem_topology_setup();
 
@@ -899,8 +902,6 @@ void __init setup_arch(char **cmdline_p)
 * so smp_release_cpus() does nothing for them.
 */
 #ifdef CONFIG_SMP
-   smp_setup_pacas();
-
/* On BookE, setup per-core TLB data structures. */
setup_tlb_core_data();
 
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 88b5157..7965d3b 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -461,6 +461,21 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
return nid;
 }
 
+static int vphn_get_nid(unsigned long cpu)
+{
+   __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
+   long rc;
+
+   /* Use associativity from first thread for all siblings */
+   rc = hcall_vphn(get_hard_smp_processor_id(cpu),
+   VPHN_FLAG_VCPU, associativity);
+
+   if (rc == H_SUCCESS)
+   return  associativity_to_nid(associativity);
+
+   return NUMA_NO_NODE;
+}
+
 /*
  * Figure out to which domain a cpu belongs and stick it there.
  * Return the id of the domain used.
@@ -490,7 +505,18 @@ static int numa_setup_cpu(unsigned long lcpu)
goto out;
}
 
-   nid = of_node_to_nid_single(cpu);
+   /*
+* On a shared lpar, the device tree might not have the correct node
+* associativity.  At this time lppaca, or its __old_status field
+* may not be updated. Hence request an explicit associativity
+* irrespective of whether the lpar is shared or dedicated.  Use the
+* device tree property as a fallback.
+*/
+   if (firmware_has_feature(FW_FEATURE_VPHN))
+   nid = vphn_get_nid(lcpu);
+
+   if (nid == NUMA_NO_NODE)
+   nid = of_node_to_nid_single(cpu);
 
 out_present:
if (nid < 0 || !node_possible(nid))
-- 
1.8.3.1



[PATCH 1/3] powerpc/vphn: Check for error from hcall_vphn

2019-08-22 Thread Srikar Dronamraju
There is no point in unpacking the associativity if the
H_HOME_NODE_ASSOCIATIVITY hcall has returned an error.

Also added error messages for H_PARAMETER and default case in
vphn_get_associativity.

Signed-off-by: Srikar Dronamraju 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: linuxppc-dev@lists.ozlabs.org
Reported-by: Satheesh Rajendran 
Reported-by: Abdul Haleem 
---
 arch/powerpc/mm/numa.c| 16 +---
 arch/powerpc/platforms/pseries/vphn.c |  3 ++-
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 50d68d2..88b5157 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1191,6 +1191,10 @@ static long vphn_get_associativity(unsigned long cpu,
VPHN_FLAG_VCPU, associativity);
 
switch (rc) {
+   case H_SUCCESS:
+   dbg("VPHN hcall succeeded. Reset polling...\n");
+   timed_topology_update(0);
+   break;
case H_FUNCTION:
printk_once(KERN_INFO
"VPHN is not supported. Disabling polling...\n");
@@ -1202,9 +1206,15 @@ static long vphn_get_associativity(unsigned long cpu,
"preventing VPHN. Disabling polling...\n");
stop_topology_update();
break;
-   case H_SUCCESS:
-   dbg("VPHN hcall succeeded. Reset polling...\n");
-   timed_topology_update(0);
+   case H_PARAMETER:
+   printk(KERN_ERR
+   "hcall_vphn() was passed an invalid parameter. "
+   "Disabling polling...\n");
+   break;
+   default:
+   printk(KERN_ERR
+   "hcall_vphn() returned %ld. Disabling polling...\n", rc);
+   stop_topology_update();
break;
}
 
diff --git a/arch/powerpc/platforms/pseries/vphn.c b/arch/powerpc/platforms/pseries/vphn.c
index 3f07bf6..cca474a 100644
--- a/arch/powerpc/platforms/pseries/vphn.c
+++ b/arch/powerpc/platforms/pseries/vphn.c
@@ -82,7 +82,8 @@ long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity)
long retbuf[PLPAR_HCALL9_BUFSIZE] = {0};
 
rc = plpar_hcall9(H_HOME_NODE_ASSOCIATIVITY, retbuf, flags, cpu);
-   vphn_unpack_associativity(retbuf, associativity);
+   if (rc == H_SUCCESS)
+   vphn_unpack_associativity(retbuf, associativity);
 
return rc;
 }
-- 
1.8.3.1



[PATCH 0/3] Early node associativity

2019-08-22 Thread Srikar Dronamraju
Abdul reported a warning on a shared lpar:
"WARNING: workqueue cpumask: online intersect > possible intersect".
This is because the per node workqueue possible mask is set very early in
the boot process, even before the system queries the home node
associativity. However, the per node workqueue online cpumask gets updated
dynamically. Hence there is a chance that the per node workqueue online
cpumask becomes a superset of the per node workqueue possible mask.

The below patches try to fix this problem.
Reported at : https://github.com/linuxppc/issues/issues/167

Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Abdul Haleem 
Cc: Nathan Lynch 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Satheesh Rajendran 

Srikar Dronamraju (3):
  powerpc/vphn: Check for error from hcall_vphn
  powerpc/numa: Early request for home node associativity
  powerpc/numa: Remove late request for home node associativity

 arch/powerpc/include/asm/topology.h   |  4 ---
 arch/powerpc/kernel/setup-common.c|  5 ++--
 arch/powerpc/kernel/smp.c |  5 
 arch/powerpc/mm/numa.c| 53 ++-
 arch/powerpc/platforms/pseries/vphn.c |  3 +-
 5 files changed, 45 insertions(+), 25 deletions(-)

-- 
1.8.3.1



[PATCH v2] powerpc/smp: Use nid as fallback for package_id

2019-08-22 Thread Srikar Dronamraju
The package_id is used to find all cores that are part of the same chip. On
PowerNV machines, package_id defaults to chip_id. However, the ibm,chip-id
property is not present in the device tree of PowerVM LPARs, so lscpu
output shows one core per socket and as many sockets as cores.

To overcome this, use the nid as the package_id on PowerVM LPARs.

Before the patch.
---
Architecture:ppc64le
Byte Order:  Little Endian
CPU(s):  128
On-line CPU(s) list: 0-127
Thread(s) per core:  8
Core(s) per socket:  1   <--
Socket(s):   16  <--
NUMA node(s):2
Model:   2.2 (pvr 004e 0202)
Model name:  POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:   32K
L1i cache:   32K
L2 cache:512K
L3 cache:10240K
NUMA node0 CPU(s):   0-63
NUMA node1 CPU(s):   64-127
 #
 # cat /sys/devices/system/cpu/cpu0/topology/physical_package_id
 -1
 #

After the patch
---
Architecture:ppc64le
Byte Order:  Little Endian
CPU(s):  128
On-line CPU(s) list: 0-127
Thread(s) per core:  8  <--
Core(s) per socket:  8  <--
Socket(s):   2
NUMA node(s):2
Model:   2.2 (pvr 004e 0202)
Model name:  POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:   32K
L1i cache:   32K
L2 cache:512K
L3 cache:10240K
NUMA node0 CPU(s):   0-63
NUMA node1 CPU(s):   64-127
 #
 # cat /sys/devices/system/cpu/cpu0/topology/physical_package_id
 0
 #
Now lscpu output is more in line with the system configuration.

Link to previous posting: https://patchwork.ozlabs.org/patch/1126145

Signed-off-by: Srikar Dronamraju 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Vasant Hegde 
Cc: Vaidyanathan Srinivasan 
---
Changelog from v1:
In v1, cpu_to_chip_id() was overloaded to fall back on the nid. Michael
Ellerman wasn't comfortable with the nid showing up as the chip_id.

 arch/powerpc/include/asm/topology.h |  3 +-
 arch/powerpc/kernel/smp.c   | 46 +++--
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 2f7e1ea5089e..f0c4b2f06665 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -134,7 +134,8 @@ static inline void shared_proc_topology_init(void) {}
 #ifdef CONFIG_PPC64
 #include 
 
-#define topology_physical_package_id(cpu)  (cpu_to_chip_id(cpu))
+extern int get_physical_package_id(int);
+#define topology_physical_package_id(cpu)  (get_physical_package_id(cpu))
 #define topology_sibling_cpumask(cpu)  (per_cpu(cpu_sibling_map, cpu))
 #define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))
 #define topology_core_id(cpu)  (cpu_to_core_id(cpu))
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ea6adbf6a221..4d1541cc5e95 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1185,10 +1185,50 @@ static inline void add_cpu_to_smallcore_masks(int cpu)
}
 }
 
+#ifdef CONFIG_PPC64
+int get_physical_package_id(int cpu)
+{
+   struct device_node *np, *root;
+   struct property *pp;
+   int gppid = cpu_to_chip_id(cpu);
+
+   /*
+* If the platform is PowerNV or Guest on KVM, ibm,chip-id is
+* defined. Hence we would return the chip-id as the
+* get_physical_package_id.
+*/
+   if (gppid == -1 && firmware_has_feature(FW_FEATURE_LPAR) &&
+   machine_is(pseries)) {
+   /*
+* PowerVM hypervisor doesn't export ibm,chip-id property.
+* Currently only PowerVM hypervisor supports
+* /rtas/ibm,configure-kernel-dump property. Use this
+* property to identify PowerVM LPARs within pseries
+* platform.
+*/
+   root = of_find_node_by_path("/rtas");
+   if (root) {
+   pp = of_find_property(root,
+   "ibm,configure-kernel-dump", NULL);
+   if (pp) {
+   np = of_get_cpu_node(cpu, NULL);
+   if (np) {
+   gppid = of_node_to_nid(np);
+   of_node_put(np);
+   }
+   }
+   of_node_put(root);
+   }
+   }
+   return gppid;
+}
+EXPORT_SYMBOL(get_physical_package_id);
+#endif
+
 static void add_cpu_to_masks(int cpu)
 {
int first_thread = cpu_first_thread_sibling(cpu);
-   int chipid = 

[Bug 204371] BUG kmalloc-4k (Tainted: G W ): Object padding overwritten

2019-08-22 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204371

--- Comment #39 from David Sterba (dste...@suse.com) ---
Though I don't like either of the patches, I'll apply one of them so it works
and we can think of a better fix later.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

Re: [PATCH v11 1/7] powerpc/mce: Schedule work from irq_work

2019-08-22 Thread Michael Ellerman
On Tue, 2019-08-20 at 08:13:46 UTC, Santosh Sivaraj wrote:
> schedule_work() cannot be called from MCE exception context as MCE can
> interrupt even in interrupt disabled context.
> 
> Fixes: 733e4a4c ("powerpc/mce: hookup memory_failure for UE errors")
> Reviewed-by: Mahesh Salgaonkar 
> Reviewed-by: Nicholas Piggin 
> Acked-by: Balbir Singh 
> Signed-off-by: Santosh Sivaraj 
> Cc: sta...@vger.kernel.org # v4.15+

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b5bda6263cad9a927e1a4edb7493d542da0c1410

cheers


Re: [PATCH] powerpc/603: fix handling of the DIRTY flag

2019-08-22 Thread Michael Ellerman
On Mon, 2019-08-19 at 06:40:25 UTC, Christophe Leroy wrote:
> If a page is already mapped RW without the DIRTY flag, the DIRTY
> flag is never set and a TLB store miss exception is taken forever.
> 
> This is easily reproduced with the following app:
> 
> void main(void)
> {
>   volatile char *ptr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> 
>   *ptr = *ptr;
> }
> 
> When DIRTY flag is not set, bail out of TLB miss handler and take
> a minor page fault which will set the DIRTY flag.
> 
> Fixes: f8b58c64eaef ("powerpc/603: let's handle PAGE_DIRTY directly")
> Cc: sta...@vger.kernel.org
> Reported-by: Doug Crawford 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/415480dce2ef03bb8335deebd2f402f475443ce0

cheers


Re: [PATCH] powerpc/32: Add warning on misaligned copy_page() or clear_page()

2019-08-22 Thread Michael Ellerman
On Fri, 2019-08-16 at 07:52:20 UTC, Christophe Leroy wrote:
> copy_page() and clear_page() expect page aligned destination, and
> use dcbz instruction to clear entire cache lines based on the
> assumption that the destination is cache aligned.
> 
> As shown during analysis of a bug in BTRFS filesystem, a misaligned
> copy_page() can create bugs that are difficult to locate (see Link).
> 
> Add an explicit WARNING when copy_page() or clear_page() are called
> with misaligned destination.
> 
> Signed-off-by: Christophe Leroy 
> Cc: Erhard F. 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=204371
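The added warning boils down to a page-alignment test on the destination pointer before dcbz is used; a userspace sketch of that check (PAGE_SIZE hardcoded to 4 KiB here purely for illustration):

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* dcbz-based copy_page()/clear_page() assume a page-aligned
 * destination; the patch warns when this does not hold. */
static int page_aligned(const void *p)
{
        return ((uintptr_t)p & (PAGE_SIZE - 1)) == 0;
}
```

A misaligned destination passes this check's negation and would silently zero the wrong cache lines, which is why an explicit WARNING helps locate such bugs.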

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/7ab0b7cb8951d4095d73e203759b74d41916e455

cheers


Re: [PATCH 1/5] powerpc/mm: define empty update_mmu_cache() as static inline

2019-08-22 Thread Michael Ellerman
On Fri, 2019-08-16 at 05:41:40 UTC, Christophe Leroy wrote:
> Only BOOK3S and FSL_BOOK3E have a useful update_mmu_cache().
> 
> For the others, just define it static inline.
> 
> In the meantime, simplify the FSL_BOOK3E related ifdef as
> book3e_hugetlb_preload() only exists when CONFIG_PPC_FSL_BOOK3E
> is selected.
> 
> Signed-off-by: Christophe Leroy 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d9642117914c9d3f800b3bacc19d7e388b04edb4

cheers


Re: [PATCH 1/3] powerpc/xmon: Check for HV mode when dumping XIVE info from OPAL

2019-08-22 Thread Michael Ellerman
On Wed, 2019-08-14 at 15:47:52 UTC, =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= wrote:
> Currently, the xmon 'dx' command calls OPAL to dump the XIVE state in
> the OPAL logs and also outputs some of the fields of the internal XIVE
> structures in Linux. The OPAL calls can only be done on baremetal
> (PowerNV) and they crash a pseries machine. Fix by checking the
> hypervisor feature of the CPU.
> 
> Signed-off-by: Cédric Le Goater 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/c3e0dbd7f780a58c4695f1cd8fc8afde80376737

cheers


Re: [PATCH] powerpc/mm: don't display empty early ioremap area

2019-08-22 Thread Michael Ellerman
On Wed, 2019-08-14 at 14:36:10 UTC, Christophe Leroy wrote:
> On the 8xx, the layout displayed at boot is:
> 
> [0.00] Memory: 121856K/131072K available (5728K kernel code, 592K 
> rwdata, 1248K rodata, 560K init, 448K bss, 9216K reserved, 0K cma-reserved)
> [0.00] Kernel virtual memory layout:
> [0.00]   * 0xffefc000..0xc000  : fixmap
> [0.00]   * 0xffefc000..0xffefc000  : early ioremap
> [0.00]   * 0xc900..0xffefc000  : vmalloc & ioremap
> [0.00] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> 
> Remove display of an empty early ioremap.
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/ad628a34ec4e3558bf838195f60bbaa4c2b68f2a

cheers


Re: [PATCH 1/5] powerpc/ptdump: fix addresses display on PPC32

2019-08-22 Thread Michael Ellerman
On Wed, 2019-08-14 at 12:36:09 UTC, Christophe Leroy wrote:
> Commit 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot")
> wrongly changed KERN_VIRT_START from 0 to PAGE_OFFSET, leading to a
> shift in the displayed addresses.
> 
> Lets revert that change to resync walk_pagetables()'s addr val and
> pgd_t pointer for PPC32.
> 
> Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/7c7a532ba3fc51bf9527d191fb410786c1fdc73c

cheers


Re: [PATCH v2] powerpc/32s: fix boot failure with DEBUG_PAGEALLOC without KASAN.

2019-08-22 Thread Michael Ellerman
On Wed, 2019-08-14 at 10:02:20 UTC, Christophe Leroy wrote:
> When KASAN is selected, the definitive hash table has to be
> set up later, but there is already an early temporary one.
> 
> When KASAN is not selected, there is no early hash table,
> so the setup of the definitive hash table cannot be delayed.
> 
> Reported-by: Jonathan Neuschafer 
> Fixes: 72f208c6a8f7 ("powerpc/32s: move hash code patching out of 
> MMU_init_hw()")
> Tested-by: Jonathan Neuschafer 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/9d6d712fbf7766f21c838940eebcd7b4d476c5e6

cheers


Re: [PATCH] powerpc/futex: fix warning: 'oldval' may be used uninitialized in this function

2019-08-22 Thread Michael Ellerman
On Wed, 2019-08-14 at 09:25:52 UTC, Christophe Leroy wrote:
> CC  kernel/futex.o
> kernel/futex.c: In function 'do_futex':
> kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this 
> function [-Wmaybe-uninitialized]
>return oldval == cmparg;
>  ^
> kernel/futex.c:1651:6: note: 'oldval' was declared here
>   int oldval, ret;
>   ^
> 
> This is because arch_futex_atomic_op_inuser() only sets *oval
> when ret is 0, and GCC doesn't see that the caller will use it only
> when ret is 0.
> 
> Anyway, the non-zero ret path is an error path that won't suffer from
> setting *oval, and as *oval is a local var in futex_atomic_op_inuser()
> it will have no impact.
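One way to silence this class of warning, as the patch does, is to write the out-parameter unconditionally; a userspace sketch where atomic_op() is a made-up stand-in for arch_futex_atomic_op_inuser():

```c
/* Writes *oval unconditionally, so the caller's local can never be
 * read uninitialized even on the error path. */
static int atomic_op(int op, int *oval)
{
        int oldval = 0;
        int ret = (op < 0) ? -1 : 0;    /* error path for negative ops */

        if (!ret)
                oldval = op * 2;        /* pretend "old value" result */

        *oval = oldval;                 /* unconditional store silences
                                           -Wmaybe-uninitialized */
        return ret;
}

static int do_op(int op, int cmparg)
{
        int oldval;
        int ret = atomic_op(op, &oldval);

        if (ret)
                return ret;
        return oldval == cmparg;        /* the line GCC warned about */
}
```

Since the error path ignores the value anyway, the extra store is harmless, which is the argument made in the commit message above.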
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/38a0d0cdb46d3f91534e5b9839ec2d67be14c59d

cheers


Re: [PATCH] powerpc/kasan: fix parallel loading of modules

2019-08-22 Thread Michael Ellerman
On Fri, 2019-08-09 at 14:58:09 UTC, Christophe Leroy wrote:
> Parallel loading of modules may lead to bad setup of shadow
> page table entries.
> 
> First, let's align modules so that two modules never share the same
> shadow page.
> 
> Second, ensure that two modules cannot allocate two page tables for
> the same PMD entry at the same time. This is done by using
> init_mm.page_table_lock in the same way as __pte_alloc_kernel()
> 
> Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/45ff3c55958542c3b76075d59741297b8cb31cbb

cheers


Re: [PATCH] powerpc/kasan: fix shadow area set up for modules.

2019-08-22 Thread Michael Ellerman
On Fri, 2019-08-09 at 14:58:10 UTC, Christophe Leroy wrote:
> When loading modules, from time to time an Oops is encountered
> during the init of shadow area for globals. This is due to the
> last page not always being mapped depending on the exact distance
> between the start and the end of the shadow area and the alignment
> with the page addresses.
> 
> Fix this by aligning the starting address with the page address.
> 
> Reported-by: Erhard F. 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=204479
> Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/663c0c9496a69f80011205ba3194049bcafd681d

cheers


Re: [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM

2019-08-22 Thread Michael Ellerman
On Fri, 2019-08-02 at 19:29:24 UTC, Nathan Lynch wrote:
> The LPAR migration implementation and userspace-initiated cpu hotplug
> can interleave their executions like so:
> 
> 1. Set cpu 7 offline via sysfs.
> 
> 2. Begin a partition migration, whose implementation requires the OS
>to ensure all present cpus are online; cpu 7 is onlined:
> 
>  rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
> 
>This sets cpu 7 online in all respects except for the cpu's
>corresponding struct device; dev->offline remains true.
> 
> 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
>already online and returns success. The driver core (device_online)
>sets dev->offline = false.
> 
> 4. The migration completes and restores cpu 7 to offline state:
> 
>  rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
> 
> This leaves cpu7 in a state where the driver core considers the cpu
> device online, but in all other respects it is offline and
> unused. Attempts to online the cpu via sysfs appear to succeed but the
> driver core actually does not pass the request to the lower-level
> cpuhp support code. This makes the cpu unusable until the cpu device
> is manually set offline and then online again via sysfs.
> 
> Instead of directly calling cpu_up/cpu_down, the migration code should
> use the higher-level device core APIs to maintain consistent state and
> serialize operations.
> 
> Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to 
> migration/hibernation")
> Signed-off-by: Nathan Lynch 
> Reviewed-by: Gautham R. Shenoy 
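The divergence in steps 1-4 can be simulated with two flags tracking the two views of one CPU's state; raw_cpu_up()/raw_cpu_down()/device_online() are made-up stand-ins for cpu_up(), cpu_down(), and the driver core's device_online():

```c
#include <stdbool.h>

/* Two views of a CPU's state: the low-level cpuhp core, and the
 * driver core's dev->offline flag. Calling cpu_up()/cpu_down()
 * directly (as the migration path did) bypasses the latter. */
static bool cpuhp_online = false;   /* step 1: cpu set offline via sysfs */
static bool dev_offline  = true;

static void raw_cpu_up(void)   { cpuhp_online = true; }   /* rtas path */
static void raw_cpu_down(void) { cpuhp_online = false; }  /* rtas path */

/* Sysfs online goes through the driver core, which also updates
 * dev->offline. If the cpu is already up, only the flag changes. */
static void device_online_sim(void)
{
        if (!cpuhp_online)
                raw_cpu_up();
        dev_offline = false;
}
```

Replaying the commit's sequence (migration onlines the cpu, sysfs "onlines" it again, migration offlines it) leaves dev_offline false while the cpu is actually down: the stuck state the patch avoids by routing the migration path through the device core APIs.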

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a6717c01ddc259f6f73364779df058e2c67309f8

cheers


Re: [PATCH 1/5] powerpc/64s/radix: Fix memory hotplug section page table creation

2019-08-22 Thread Michael Ellerman
On Wed, 2019-07-24 at 08:46:34 UTC, Nicholas Piggin wrote:
> create_physical_mapping expects physical addresses, but creating and
> splitting these mappings after boot is supplying virtual (effective)
> addresses. This can be triggered by booting with mem= to limit memory
> then probing an unused physical memory range:
> 
>   echo  > /sys/devices/system/memory/probe
> 
> This mostly works by accident, firstly because __va(__va(x)) == __va(x)
> so the virtual address does not get corrupted. Secondly because pfn_pte
> masks out the upper bits of the pfn beyond the physical address limit,
> so a pfn constructed with a 0xc000 virtual linear address
> will be masked back to the correct physical address in the pte.
> 
> Cc: Reza Arbab 
> Fixes: 6cc27341b21a8 ("powerpc/mm: add radix__create_section_mapping()")
> Signed-off-by: Nicholas Piggin 
> Reviewed-by: Aneesh Kumar K.V 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/8f51e3929470942e6a8744061254fdeef646cd36

cheers


Re: [PATCH kernel v5 1/4] powerpc/powernv/ioda: Fix race in TCE level allocation

2019-08-22 Thread Michael Ellerman
On Thu, 2019-07-18 at 05:11:36 UTC, Alexey Kardashevskiy wrote:
> pnv_tce() returns a pointer to a TCE entry and originally a TCE table
> would be pre-allocated. For the default case of 2GB window the table
> needs only a single level and that is fine. However if more levels are
> requested, it is possible to get a race when 2 threads want a pointer
> to a TCE entry from the same page of TCEs.
> 
> This adds cmpxchg to handle the race. Note that once TCE is non-zero,
> it cannot become zero again.
> 
> CC: sta...@vger.kernel.org # v4.19+
> Fixes: a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels on 
> demand")
> Signed-off-by: Alexey Kardashevskiy 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/56090a3902c80c296e822d11acdb6a101b322c52

cheers


Re: [PATCH] powerpc/hw_breakpoint: move instruction stepping out of hw_breakpoint_handler()

2019-08-22 Thread Michael Ellerman
On Fri, 2019-06-28 at 15:55:52 UTC, Christophe Leroy wrote:
> On 8xx, breakpoints stop after executing the instruction, so
> stepping/emulation is not needed. Move it into a sub-function and
> remove the #ifdefs.
> 
> Signed-off-by: Christophe Leroy 
> Reviewed-by: Ravi Bangoria 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/658d029df0bce6472c94b724ca54d74bc6659c2e

cheers


Re: [PATCH] powerpc/64: allow compiler to cache 'current'

2019-08-22 Thread Michael Ellerman
On Wed, 2019-06-12 at 14:03:17 UTC, Nicholas Piggin wrote:
> current may be cached by the compiler, so remove the volatile asm
> restriction. This results in better generated code that is smaller and
> has fewer dependent loads, and it can avoid store-hit-load flushes
> like this one that shows up in irq_exit():
> 
> preempt_count_sub(HARDIRQ_OFFSET);
> if (!in_interrupt() && ...)
> 
> Which ends up as:
> 
> ((struct thread_info *)current)->preempt_count -= HARDIRQ_OFFSET;
> if (((struct thread_info *)current)->preempt_count ...
> 
> Evaluating current twice presently means it has to be loaded twice, and
> here gcc happens to pick a different register each time, then
> preempt_count is accessed via that base register:
> 
> 1058:   ld  r10,2392(r13) <-- current
> 105c:   lwz r9,0(r10) <-- preempt_count
> 1060:   addis   r9,r9,-1
> 1064:   stw r9,0(r10) <-- preempt_count
> 1068:   ld  r9,2392(r13)  <-- current
> 106c:   lwz r9,0(r9)  <-- preempt_count
> 1070:   rlwinm. r9,r9,0,11,23
> 1074:   bne 1090 
> 
> This can frustrate store-hit-load detection heuristics and cause
> flushes. Allowing the compiler to cache current in a register with this
> patch results in the same base register being used for all accesses,
> which is more likely to be detected as an alias:
> 
> 1058:   ld  r31,2392(r13)
> ...
> 1070:   lwz r9,0(r31)
> 1074:   addis   r9,r9,-1
> 1078:   stw r9,0(r31)
> 107c:   lwz r9,0(r31)
> 1080:   rlwinm. r9,r9,0,11,23
> 1084:   bne 10a0 
> 
> Signed-off-by: Nicholas Piggin 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e354d7dc81d0e81bea33165f381aff1eda45f5d9

cheers


Re: [PATCH v3] powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt()

2019-08-22 Thread Michael Ellerman
On Wed, 2019-05-15 at 07:45:52 UTC, "Gautham R. Shenoy" wrote:
> From: "Gautham R. Shenoy" 
> 
> The calls to arch_add_memory()/arch_remove_memory() are always made
> with the read-side cpu_hotplug_lock acquired via
> memory_hotplug_begin().  On pSeries,
> arch_add_memory()/arch_remove_memory() eventually call resize_hpt()
> which in turn calls stop_machine() which acquires the read-side
> cpu_hotplug_lock again, thereby resulting in the recursive acquisition
> of this lock.
...
> 
> Fix this issue by
>   1) Requiring all the calls to pseries_lpar_resize_hpt() be made
>  with cpu_hotplug_lock held.
> 
>   2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
>  as a consequence of 1)
> 
>   3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
>  with cpu_hotplug_lock held.
> 
> Reported-by: Aneesh Kumar K.V 
> Signed-off-by: Gautham R. Shenoy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/c784be435d5dae28d3b03db31753dd7a18733f0c

cheers


Re: [PATCH v1 4/4] mmc: sdhci-of-esdhc: add erratum A011334 support in ls1028a 1.0 SoC

2019-08-22 Thread Ulf Hansson
On Wed, 14 Aug 2019 at 09:24, Yinbo Zhu  wrote:
>
> This patch is to add erratum A011334 support in ls1028a 1.0 SoC
>
> Signed-off-by: Yinbo Zhu 

Applied for next, thanks!

Kind regards
Uffe


> ---
>  drivers/mmc/host/sdhci-of-esdhc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/mmc/host/sdhci-of-esdhc.c 
> b/drivers/mmc/host/sdhci-of-esdhc.c
> index b16f7d440f78..eb2b290447fc 100644
> --- a/drivers/mmc/host/sdhci-of-esdhc.c
> +++ b/drivers/mmc/host/sdhci-of-esdhc.c
> @@ -1006,6 +1006,7 @@ static struct soc_device_attribute 
> soc_incorrect_hostver[] = {
>  static struct soc_device_attribute soc_fixup_sdhc_clkdivs[] = {
> { .family = "QorIQ LX2160A", .revision = "1.0", },
> { .family = "QorIQ LX2160A", .revision = "2.0", },
> +   { .family = "QorIQ LS1028A", .revision = "1.0", },
> { },
>  };
>
> --
> 2.17.1
>


Re: [PATCH v2 06/10] PCI: layerscape: Modify the way of getting capability with different PEX

2019-08-22 Thread Kishon Vijay Abraham I
Hi,

On 22/08/19 4:52 PM, Xiaowei Bao wrote:
> Different PCIe controllers on one board may have different MSI or
> MSI-X capabilities, so change the way of getting the MSI capability
> to make it more flexible.

Please use different pci_epc_features tables for different boards.

Thanks
Kishon
> 
> Signed-off-by: Xiaowei Bao 
> ---
> v2:
>  - Remove the repeated assignment code.
> 
>  drivers/pci/controller/dwc/pci-layerscape-ep.c | 26 
> +++---
>  1 file changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
> b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> index 4e92a95..8461f62 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -22,6 +22,7 @@
>  
>  struct ls_pcie_ep {
>   struct dw_pcie  *pci;
> + struct pci_epc_features *ls_epc;
>  };
>  
>  #define to_ls_pcie_ep(x) dev_get_drvdata((x)->dev)
> @@ -40,25 +41,26 @@ static const struct of_device_id ls_pcie_ep_of_match[] = {
>   { },
>  };
>  
> -static const struct pci_epc_features ls_pcie_epc_features = {
> - .linkup_notifier = false,
> - .msi_capable = true,
> - .msix_capable = false,
> -};
> -
>  static const struct pci_epc_features*
>  ls_pcie_ep_get_features(struct dw_pcie_ep *ep)
>  {
> - return &ls_pcie_epc_features;
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
> +
> + return pcie->ls_epc;
>  }
>  
>  static void ls_pcie_ep_init(struct dw_pcie_ep *ep)
>  {
>   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
>   enum pci_barno bar;
>  
>   for (bar = BAR_0; bar <= BAR_5; bar++)
>   dw_pcie_ep_reset_bar(pci, bar);
> +
> + pcie->ls_epc->msi_capable = ep->msi_cap ? true : false;
> + pcie->ls_epc->msix_capable = ep->msix_cap ? true : false;
>  }
>  
>  static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> @@ -118,6 +120,7 @@ static int __init ls_pcie_ep_probe(struct platform_device 
> *pdev)
>   struct device *dev = &pdev->dev;
>   struct dw_pcie *pci;
>   struct ls_pcie_ep *pcie;
> + struct pci_epc_features *ls_epc;
>   struct resource *dbi_base;
>   int ret;
>  
> @@ -129,6 +132,10 @@ static int __init ls_pcie_ep_probe(struct 
> platform_device *pdev)
>   if (!pci)
>   return -ENOMEM;
>  
> + ls_epc = devm_kzalloc(dev, sizeof(*ls_epc), GFP_KERNEL);
> + if (!ls_epc)
> + return -ENOMEM;
> +
>   dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
>   pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base);
>   if (IS_ERR(pci->dbi_base))
> @@ -139,6 +146,11 @@ static int __init ls_pcie_ep_probe(struct 
> platform_device *pdev)
>   pci->ops = &ls_pcie_ep_ops;
>   pcie->pci = pci;
>  
> + ls_epc->linkup_notifier = false,
> + ls_epc->bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4),
> +
> + pcie->ls_epc = ls_epc;
> +
>   platform_set_drvdata(pdev, pcie);
>  
>   ret = ls_add_pcie_ep(pcie, pdev);
> 


[PATCH v2 10/10] misc: pci_endpoint_test: Add LS1088a in pci_device_id table

2019-08-22 Thread Xiaowei Bao
Add LS1088a to the pci_device_id table so that pci-epf-test can be used
for testing the PCIe EP in LS1088a.

Signed-off-by: Xiaowei Bao 
---
v2:
 - No change.

 drivers/misc/pci_endpoint_test.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index 6e208a0..d531951 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -793,6 +793,7 @@ static const struct pci_device_id pci_endpoint_test_tbl[] = 
{
{ PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_DRA74x) },
{ PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_DRA72x) },
{ PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, 0x81c0) },
+   { PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, 0x80c0) },
{ PCI_DEVICE_DATA(SYNOPSYS, EDDA, NULL) },
{ PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_AM654),
  .driver_data = (kernel_ulong_t)&am654_data
-- 
2.9.5



[PATCH v2 09/10] arm64: dts: layerscape: Add PCIe EP node for ls1088a

2019-08-22 Thread Xiaowei Bao
Add PCIe EP node for ls1088a to support EP mode.

Signed-off-by: Xiaowei Bao 
---
v2:
 - Remove the pf-offset proparty.

 arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 31 ++
 1 file changed, 31 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
index dfbead4..79109ad 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
@@ -471,6 +471,17 @@
status = "disabled";
};
 
+   pcie_ep@340 {
+   compatible = "fsl,ls1088a-pcie-ep","fsl,ls-pcie-ep";
+   reg = <0x00 0x0340 0x0 0x0010
+  0x20 0x 0x8 0x>;
+   reg-names = "regs", "addr_space";
+   num-ib-windows = <24>;
+   num-ob-windows = <128>;
+   max-functions = /bits/ 8 <2>;
+   status = "disabled";
+   };
+
pcie@350 {
compatible = "fsl,ls1088a-pcie";
reg = <0x00 0x0350 0x0 0x0010   /* controller 
registers */
@@ -497,6 +508,16 @@
status = "disabled";
};
 
+   pcie_ep@350 {
+   compatible = "fsl,ls1088a-pcie-ep","fsl,ls-pcie-ep";
+   reg = <0x00 0x0350 0x0 0x0010
+  0x28 0x 0x8 0x>;
+   reg-names = "regs", "addr_space";
+   num-ib-windows = <6>;
+   num-ob-windows = <8>;
+   status = "disabled";
+   };
+
pcie@360 {
compatible = "fsl,ls1088a-pcie";
reg = <0x00 0x0360 0x0 0x0010   /* controller 
registers */
@@ -523,6 +544,16 @@
status = "disabled";
};
 
+   pcie_ep@360 {
+   compatible = "fsl,ls1088a-pcie-ep","fsl,ls-pcie-ep";
+   reg = <0x00 0x0360 0x0 0x0010
+  0x30 0x 0x8 0x>;
+   reg-names = "regs", "addr_space";
+   num-ib-windows = <6>;
+   num-ob-windows = <8>;
+   status = "disabled";
+   };
+
smmu: iommu@500 {
compatible = "arm,mmu-500";
reg = <0 0x500 0 0x80>;
-- 
2.9.5



[PATCH v2 08/10] PCI: layerscape: Add EP mode support for ls1088a and ls2088a

2019-08-22 Thread Xiaowei Bao
Add PCIe EP mode support for ls1088a and ls2088a. There are some
differences between the LS1 and LS2 platforms, so refactor the code of
the EP driver.

Signed-off-by: Xiaowei Bao 
---
v2:
 - New mechanism for layerscape EP driver.

 drivers/pci/controller/dwc/pci-layerscape-ep.c | 76 --
 1 file changed, 58 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index 7ca5fe8..2a66f07 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -20,27 +20,29 @@
 
 #define PCIE_DBI2_OFFSET   0x1000  /* DBI2 base address*/
 
-struct ls_pcie_ep {
-   struct dw_pcie  *pci;
-   struct pci_epc_features *ls_epc;
+#define to_ls_pcie_ep(x)   dev_get_drvdata((x)->dev)
+
+struct ls_pcie_ep_drvdata {
+   u32 func_offset;
+   const struct dw_pcie_ep_ops *ops;
+   const struct dw_pcie_ops*dw_pcie_ops;
 };
 
-#define to_ls_pcie_ep(x)   dev_get_drvdata((x)->dev)
+struct ls_pcie_ep {
+   struct dw_pcie  *pci;
+   struct pci_epc_features *ls_epc;
+   const struct ls_pcie_ep_drvdata *drvdata;
+};
 
 static int ls_pcie_establish_link(struct dw_pcie *pci)
 {
return 0;
 }
 
-static const struct dw_pcie_ops ls_pcie_ep_ops = {
+static const struct dw_pcie_ops dw_ls_pcie_ep_ops = {
.start_link = ls_pcie_establish_link,
 };
 
-static const struct of_device_id ls_pcie_ep_of_match[] = {
-   { .compatible = "fsl,ls-pcie-ep",},
-   { },
-};
-
 static const struct pci_epc_features*
 ls_pcie_ep_get_features(struct dw_pcie_ep *ep)
 {
@@ -82,10 +84,44 @@ static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 
func_no,
}
 }
 
-static const struct dw_pcie_ep_ops pcie_ep_ops = {
+static unsigned int ls_pcie_ep_func_conf_select(struct dw_pcie_ep *ep,
+   u8 func_no)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
+   u8 header_type;
+
+   header_type = ioread8(pci->dbi_base + PCI_HEADER_TYPE);
+
+   if (header_type & (1 << 7))
+   return pcie->drvdata->func_offset * func_no;
+   else
+   return 0;
+}
+
+static const struct dw_pcie_ep_ops ls_pcie_ep_ops = {
.ep_init = ls_pcie_ep_init,
.raise_irq = ls_pcie_ep_raise_irq,
.get_features = ls_pcie_ep_get_features,
+   .func_conf_select = ls_pcie_ep_func_conf_select,
+};
+
+static const struct ls_pcie_ep_drvdata ls1_ep_drvdata = {
+   .ops = &ls_pcie_ep_ops,
+   .dw_pcie_ops = &dw_ls_pcie_ep_ops,
+};
+
+static const struct ls_pcie_ep_drvdata ls2_ep_drvdata = {
+   .func_offset = 0x2,
+   .ops = &ls_pcie_ep_ops,
+   .dw_pcie_ops = &dw_ls_pcie_ep_ops,
+};
+
+static const struct of_device_id ls_pcie_ep_of_match[] = {
+   { .compatible = "fsl,ls1046a-pcie-ep", .data = &ls1_ep_drvdata },
+   { .compatible = "fsl,ls1088a-pcie-ep", .data = &ls2_ep_drvdata },
+   { .compatible = "fsl,ls2088a-pcie-ep", .data = &ls2_ep_drvdata },
+   { },
 };
 
 static int __init ls_add_pcie_ep(struct ls_pcie_ep *pcie,
@@ -98,7 +134,7 @@ static int __init ls_add_pcie_ep(struct ls_pcie_ep *pcie,
int ret;
 
	ep = &pci->ep;
-   ep->ops = &pcie_ep_ops;
+   ep->ops = pcie->drvdata->ops;
 
res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space");
if (!res)
@@ -137,14 +173,11 @@ static int __init ls_pcie_ep_probe(struct platform_device 
*pdev)
if (!ls_epc)
return -ENOMEM;
 
-   dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
-   pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base);
-   if (IS_ERR(pci->dbi_base))
-   return PTR_ERR(pci->dbi_base);
+   pcie->drvdata = of_device_get_match_data(dev);
 
-   pci->dbi_base2 = pci->dbi_base + PCIE_DBI2_OFFSET;
pci->dev = dev;
-   pci->ops = &ls_pcie_ep_ops;
+   pci->ops = pcie->drvdata->dw_pcie_ops;
+
pcie->pci = pci;
 
ls_epc->linkup_notifier = false,
@@ -152,6 +185,13 @@ static int __init ls_pcie_ep_probe(struct platform_device 
*pdev)
 
pcie->ls_epc = ls_epc;
 
+   dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
+   pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base);
+   if (IS_ERR(pci->dbi_base))
+   return PTR_ERR(pci->dbi_base);
+
+   pci->dbi_base2 = pci->dbi_base + PCIE_DBI2_OFFSET;
+
platform_set_drvdata(pdev, pcie);
 
ret = ls_add_pcie_ep(pcie, pdev);
-- 
2.9.5



[PATCH v2 07/10] PCI: layerscape: Modify the MSIX to the doorbell way

2019-08-22 Thread Xiaowei Bao
The layerscape platform uses the doorbell method to trigger MSI-X
interrupts in EP mode.

Signed-off-by: Xiaowei Bao 
---
v2:
 - No change.

 drivers/pci/controller/dwc/pci-layerscape-ep.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index 8461f62..7ca5fe8 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -74,7 +74,8 @@ static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 
func_no,
case PCI_EPC_IRQ_MSI:
return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
case PCI_EPC_IRQ_MSIX:
-   return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num);
+   return dw_pcie_ep_raise_msix_irq_doorbell(ep, func_no,
+ interrupt_num);
default:
dev_err(pci->dev, "UNKNOWN IRQ type\n");
return -EINVAL;
-- 
2.9.5



[PATCH v2 06/10] PCI: layerscape: Modify the way of getting capability with different PEX

2019-08-22 Thread Xiaowei Bao
Different PCIe controllers on one board may have different MSI or
MSI-X capabilities, so change the way of getting the MSI capability
to make it more flexible.

Signed-off-by: Xiaowei Bao 
---
v2:
 - Remove the repeated assignment code.

 drivers/pci/controller/dwc/pci-layerscape-ep.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index 4e92a95..8461f62 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -22,6 +22,7 @@
 
 struct ls_pcie_ep {
struct dw_pcie  *pci;
+   struct pci_epc_features *ls_epc;
 };
 
 #define to_ls_pcie_ep(x)   dev_get_drvdata((x)->dev)
@@ -40,25 +41,26 @@ static const struct of_device_id ls_pcie_ep_of_match[] = {
{ },
 };
 
-static const struct pci_epc_features ls_pcie_epc_features = {
-   .linkup_notifier = false,
-   .msi_capable = true,
-   .msix_capable = false,
-};
-
 static const struct pci_epc_features*
 ls_pcie_ep_get_features(struct dw_pcie_ep *ep)
 {
-   return &ls_pcie_epc_features;
+   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
+
+   return pcie->ls_epc;
 }
 
 static void ls_pcie_ep_init(struct dw_pcie_ep *ep)
 {
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
enum pci_barno bar;
 
for (bar = BAR_0; bar <= BAR_5; bar++)
dw_pcie_ep_reset_bar(pci, bar);
+
+   pcie->ls_epc->msi_capable = ep->msi_cap ? true : false;
+   pcie->ls_epc->msix_capable = ep->msix_cap ? true : false;
 }
 
 static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
@@ -118,6 +120,7 @@ static int __init ls_pcie_ep_probe(struct platform_device 
*pdev)
	struct device *dev = &pdev->dev;
struct dw_pcie *pci;
struct ls_pcie_ep *pcie;
+   struct pci_epc_features *ls_epc;
struct resource *dbi_base;
int ret;
 
@@ -129,6 +132,10 @@ static int __init ls_pcie_ep_probe(struct platform_device 
*pdev)
if (!pci)
return -ENOMEM;
 
+   ls_epc = devm_kzalloc(dev, sizeof(*ls_epc), GFP_KERNEL);
+   if (!ls_epc)
+   return -ENOMEM;
+
dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base);
if (IS_ERR(pci->dbi_base))
@@ -139,6 +146,11 @@ static int __init ls_pcie_ep_probe(struct platform_device 
*pdev)
	pci->ops = &ls_pcie_ep_ops;
pcie->pci = pci;
 
+   ls_epc->linkup_notifier = false,
+   ls_epc->bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4),
+
+   pcie->ls_epc = ls_epc;
+
platform_set_drvdata(pdev, pcie);
 
ret = ls_add_pcie_ep(pcie, pdev);
-- 
2.9.5



[PATCH v2 05/10] PCI: layerscape: Fix some format issue of the code

2019-08-22 Thread Xiaowei Bao
Fix some formatting issues in the EP driver code.

Signed-off-by: Xiaowei Bao 
---
v2:
 - No change.

 drivers/pci/controller/dwc/pci-layerscape-ep.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index be61d96..4e92a95 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -62,7 +62,7 @@ static void ls_pcie_ep_init(struct dw_pcie_ep *ep)
 }
 
 static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
- enum pci_epc_irq_type type, u16 interrupt_num)
+   enum pci_epc_irq_type type, u16 interrupt_num)
 {
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
 
@@ -86,7 +86,7 @@ static const struct dw_pcie_ep_ops pcie_ep_ops = {
 };
 
 static int __init ls_add_pcie_ep(struct ls_pcie_ep *pcie,
-   struct platform_device *pdev)
+struct platform_device *pdev)
 {
struct dw_pcie *pci = pcie->pci;
struct device *dev = pci->dev;
-- 
2.9.5



[PATCH v2 04/10] dt-bindings: pci: layerscape-pci: add compatible strings for ls1088a and ls2088a

2019-08-22 Thread Xiaowei Bao
Add compatible strings for ls1088a and ls2088a.

Signed-off-by: Xiaowei Bao 
---
v2:
 - No change.

 Documentation/devicetree/bindings/pci/layerscape-pci.txt | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/pci/layerscape-pci.txt 
b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
index e20ceaa..16f592e 100644
--- a/Documentation/devicetree/bindings/pci/layerscape-pci.txt
+++ b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
@@ -22,7 +22,10 @@ Required properties:
 "fsl,ls1043a-pcie"
 "fsl,ls1012a-pcie"
   EP mode:
-   "fsl,ls1046a-pcie-ep", "fsl,ls-pcie-ep"
+   "fsl,ls-pcie-ep"
+   "fsl,ls1046a-pcie-ep"
+   "fsl,ls1088a-pcie-ep"
+   "fsl,ls2088a-pcie-ep"
 - reg: base addresses and lengths of the PCIe controller register blocks.
 - interrupts: A list of interrupt outputs of the controller. Must contain an
   entry for each entry in the interrupt-names property.
-- 
2.9.5



[PATCH v2 03/10] PCI: designware-ep: Move the function of getting MSI capability forward

2019-08-22 Thread Xiaowei Bao
Move the MSI capability lookup to before the call to the init function,
because the init function of the EP platform driver will use the value
it returns.

Signed-off-by: Xiaowei Bao 
---
v2:
 - No change.

 drivers/pci/controller/dwc/pcie-designware-ep.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c 
b/drivers/pci/controller/dwc/pcie-designware-ep.c
index b8388f8..0a6c199 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -656,6 +656,10 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
if (ret < 0)
epc->max_functions = 1;
 
+   ep->msi_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSI);
+
+   ep->msix_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSIX);
+
if (ep->ops->ep_init)
ep->ops->ep_init(ep);
 
@@ -672,9 +676,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
dev_err(dev, "Failed to reserve memory for MSI/MSI-X\n");
return -ENOMEM;
}
-   ep->msi_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSI);
-
-   ep->msix_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSIX);
 
offset = dw_pcie_ep_find_ext_capability(pci, PCI_EXT_CAP_ID_REBAR);
if (offset) {
-- 
2.9.5



[PATCH v2 01/10] PCI: designware-ep: Add multiple PFs support for DWC

2019-08-22 Thread Xiaowei Bao
Add multiple-PF support for DWC. Different PFs have different config
spaces; use the pf-offset property obtained from the DTS to access each
PF's config space.

Signed-off-by: Xiaowei Bao 
---
v2:
 - Remove duplicate redundant code.
 - Reimplement the PF config space access way.

 drivers/pci/controller/dwc/pcie-designware-ep.c | 122 
 drivers/pci/controller/dwc/pcie-designware.c|  59 
 drivers/pci/controller/dwc/pcie-designware.h|  11 ++-
 3 files changed, 134 insertions(+), 58 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c 
b/drivers/pci/controller/dwc/pcie-designware-ep.c
index 2bf5a35..3e2b740 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -19,12 +19,17 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep *ep)
pci_epc_linkup(epc);
 }
 
-static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar,
-  int flags)
+static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, u8 func_no,
+  enum pci_barno bar, int flags)
 {
u32 reg;
+   unsigned int func_offset = 0;
+   struct dw_pcie_ep *ep = &pci->ep;
 
-   reg = PCI_BASE_ADDRESS_0 + (4 * bar);
+   if (ep->ops->func_conf_select)
+   func_offset = ep->ops->func_conf_select(ep, func_no);
+
+   reg = func_offset + PCI_BASE_ADDRESS_0 + (4 * bar);
dw_pcie_dbi_ro_wr_en(pci);
dw_pcie_writel_dbi2(pci, reg, 0x0);
dw_pcie_writel_dbi(pci, reg, 0x0);
@@ -37,7 +42,12 @@ static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum 
pci_barno bar,
 
 void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
 {
-   __dw_pcie_ep_reset_bar(pci, bar, 0);
+   u8 func_no, funcs;
+
+   funcs = pci->ep.epc->max_functions;
+
+   for (func_no = 0; func_no < funcs; func_no++)
+   __dw_pcie_ep_reset_bar(pci, func_no, bar, 0);
 }
 
 static u8 __dw_pcie_ep_find_next_cap(struct dw_pcie *pci, u8 cap_ptr,
@@ -78,28 +88,32 @@ static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 
func_no,
 {
struct dw_pcie_ep *ep = epc_get_drvdata(epc);
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   unsigned int func_offset = 0;
+
+   if (ep->ops->func_conf_select)
+   func_offset = ep->ops->func_conf_select(ep, func_no);
 
dw_pcie_dbi_ro_wr_en(pci);
-   dw_pcie_writew_dbi(pci, PCI_VENDOR_ID, hdr->vendorid);
-   dw_pcie_writew_dbi(pci, PCI_DEVICE_ID, hdr->deviceid);
-   dw_pcie_writeb_dbi(pci, PCI_REVISION_ID, hdr->revid);
-   dw_pcie_writeb_dbi(pci, PCI_CLASS_PROG, hdr->progif_code);
-   dw_pcie_writew_dbi(pci, PCI_CLASS_DEVICE,
+   dw_pcie_writew_dbi(pci, func_offset + PCI_VENDOR_ID, hdr->vendorid);
+   dw_pcie_writew_dbi(pci, func_offset + PCI_DEVICE_ID, hdr->deviceid);
+   dw_pcie_writeb_dbi(pci, func_offset + PCI_REVISION_ID, hdr->revid);
+   dw_pcie_writeb_dbi(pci, func_offset + PCI_CLASS_PROG, hdr->progif_code);
+   dw_pcie_writew_dbi(pci, func_offset + PCI_CLASS_DEVICE,
   hdr->subclass_code | hdr->baseclass_code << 8);
-   dw_pcie_writeb_dbi(pci, PCI_CACHE_LINE_SIZE,
+   dw_pcie_writeb_dbi(pci, func_offset + PCI_CACHE_LINE_SIZE,
   hdr->cache_line_size);
-   dw_pcie_writew_dbi(pci, PCI_SUBSYSTEM_VENDOR_ID,
+   dw_pcie_writew_dbi(pci, func_offset + PCI_SUBSYSTEM_VENDOR_ID,
   hdr->subsys_vendor_id);
-   dw_pcie_writew_dbi(pci, PCI_SUBSYSTEM_ID, hdr->subsys_id);
-   dw_pcie_writeb_dbi(pci, PCI_INTERRUPT_PIN,
+   dw_pcie_writew_dbi(pci, func_offset + PCI_SUBSYSTEM_ID, hdr->subsys_id);
+   dw_pcie_writeb_dbi(pci, func_offset + PCI_INTERRUPT_PIN,
   hdr->interrupt_pin);
dw_pcie_dbi_ro_wr_dis(pci);
 
return 0;
 }
 
-static int dw_pcie_ep_inbound_atu(struct dw_pcie_ep *ep, enum pci_barno bar,
- dma_addr_t cpu_addr,
+static int dw_pcie_ep_inbound_atu(struct dw_pcie_ep *ep, u8 func_no,
+ enum pci_barno bar, dma_addr_t cpu_addr,
  enum dw_pcie_as_type as_type)
 {
int ret;
@@ -112,7 +126,7 @@ static int dw_pcie_ep_inbound_atu(struct dw_pcie_ep *ep, 
enum pci_barno bar,
return -EINVAL;
}
 
-   ret = dw_pcie_prog_inbound_atu(pci, free_win, bar, cpu_addr,
+   ret = dw_pcie_prog_inbound_atu(pci, func_no, free_win, bar, cpu_addr,
   as_type);
if (ret < 0) {
dev_err(pci->dev, "Failed to program IB window\n");
@@ -125,7 +139,8 @@ static int dw_pcie_ep_inbound_atu(struct dw_pcie_ep *ep, 
enum pci_barno bar,
return 0;
 }
 
-static int dw_pcie_ep_outbound_atu(struct dw_pcie_ep *ep, phys_addr_t 
phys_addr,
+static int dw_pcie_ep_outbound_atu(struct dw_pcie_ep *ep, u8 

[PATCH v2 02/10] PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode

2019-08-22 Thread Xiaowei Bao
Add the doorbell mode of MSI-X in EP mode.

Signed-off-by: Xiaowei Bao 
---
v2:
 - Remove the macro of no used.

 drivers/pci/controller/dwc/pcie-designware-ep.c | 14 ++
 drivers/pci/controller/dwc/pcie-designware.h| 12 
 2 files changed, 26 insertions(+)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c 
b/drivers/pci/controller/dwc/pcie-designware-ep.c
index 3e2b740..b8388f8 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -480,6 +480,20 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 
func_no,
return 0;
 }
 
+int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8 func_no,
+  u16 interrupt_num)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   u32 msg_data;
+
+   msg_data = (func_no << PCIE_MSIX_DOORBELL_PF_SHIFT) |
+  (interrupt_num - 1);
+
+   dw_pcie_writel_dbi(pci, PCIE_MSIX_DOORBELL, msg_data);
+
+   return 0;
+}
+
 int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
  u16 interrupt_num)
 {
diff --git a/drivers/pci/controller/dwc/pcie-designware.h 
b/drivers/pci/controller/dwc/pcie-designware.h
index a0fdbf7..895a9ef 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -88,6 +88,9 @@
 #define PCIE_MISC_CONTROL_1_OFF0x8BC
 #define PCIE_DBI_RO_WR_EN  BIT(0)
 
+#define PCIE_MSIX_DOORBELL 0x948
+#define PCIE_MSIX_DOORBELL_PF_SHIFT24
+
 /*
  * iATU Unroll-specific register definitions
  * From 4.80 core version the address translation will be made by unroll
@@ -400,6 +403,8 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 
func_no,
 u8 interrupt_num);
 int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
 u16 interrupt_num);
+int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8 func_no,
+  u16 interrupt_num);
 void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar);
 #else
 static inline void dw_pcie_ep_linkup(struct dw_pcie_ep *ep)
@@ -432,6 +437,13 @@ static inline int dw_pcie_ep_raise_msix_irq(struct 
dw_pcie_ep *ep, u8 func_no,
return 0;
 }
 
+static inline int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep,
+u8 func_no,
+u16 interrupt_num)
+{
+   return 0;
+}
+
 static inline void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno 
bar)
 {
 }
-- 
2.9.5



Re: [PATCH v2 3/3] powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race

2019-08-22 Thread Michael Ellerman
On Tue, 2019-08-13 at 10:06:48 UTC, Paul Mackerras wrote:
> Testing has revealed the existence of a race condition where a XIVE
> interrupt being shut down can be in one of the XIVE interrupt queues
> (of which there are up to 8 per CPU, one for each priority) at the
> point where free_irq() is called.  If this happens, the interrupt
> fetch code can return an interrupt number which has been shut down.
> This can lead to various symptoms:
> 
> - irq_to_desc(irq) can be NULL.  In this case, no end-of-interrupt
>   function gets called, resulting in the CPU's elevated interrupt
>   priority (numerically lowered CPPR) never gets reset.  That then
>   means that the CPU stops processing interrupts, causing device
>   timeouts and other errors in various device drivers.
> 
> - The irq descriptor or related data structures can be in the process
>   of being freed as the interrupt code is using them.  This typically
>   leads to crashes due to bad pointer dereferences.
> 
> This race is basically what commit 62e0468650c3 ("genirq: Add optional
> hardware synchronization for shutdown", 2019-06-28) is intended to
> fix, given a get_irqchip_state() method for the interrupt controller
> being used.  It works by polling the interrupt controller when an
> interrupt is being freed until the controller says it is not pending.
> 
> With XIVE, the PQ bits of the interrupt source indicate the state of
> the interrupt source, and in particular the P bit goes from 0 to 1 at
> the point where the hardware writes an entry into the interrupt queue
> that this interrupt is directed towards.  Normally, the code will then
> process the interrupt and do an end-of-interrupt (EOI) operation which
> will reset PQ to 00 (assuming another interrupt hasn't been generated
> in the meantime).  However, there are situations where the code resets
> P even though a queue entry exists (for example, by setting PQ to 01,
> which disables the interrupt source), and also situations where the
> code leaves P at 1 after removing the queue entry (for example, this
> is done for escalation interrupts so they cannot fire again until
> they are explicitly re-enabled).
> 
> The code already has a 'saved_p' flag for the interrupt source which
> indicates that a queue entry exists, although it isn't maintained
> consistently.  This patch adds a 'stale_p' flag to indicate that
> P has been left at 1 after processing a queue entry, and adds code
> to set and clear saved_p and stale_p as necessary to maintain a
> consistent indication of whether a queue entry may or may not exist.
> 
> With this, we can implement xive_get_irqchip_state() by looking at
> stale_p, saved_p and the ESB PQ bits for the interrupt.
> 
> There is some additional code to handle escalation interrupts
> properly, because they are enabled and disabled in KVM assembly code,
> which does not have access to the xive_irq_data struct for the
> escalation interrupt.  Hence, stale_p may be incorrect when the
> escalation interrupt is freed in kvmppc_xive_{,native_}cleanup_vcpu().
> Fortunately, we can fix it up by looking at vcpu->arch.xive_esc_on,
> with some careful attention to barriers in order to ensure the correct
> result if xive_esc_irq() races with kvmppc_xive_cleanup_vcpu().
> 
> Finally, this adds code to make noise on the console (pr_crit and
> WARN_ON(1)) if we find an interrupt queue entry for an interrupt
> which does not have a descriptor.  While this won't catch the race
> reliably, if it does get triggered it will be an indication that
> the race is occurring and needs to be debugged.
> 
> Signed-off-by: Paul Mackerras 

Applied to powerpc topic/ppc-kvm, thanks.

https://git.kernel.org/powerpc/c/da15c03b047dca891d37b9f4ef9ca14d84a6484f

cheers
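For readers following the stale_p/saved_p discussion above, the pending-state
check that xive_get_irqchip_state() is described as performing can be sketched
as a toy model. This is an illustrative simplification, not the kernel
implementation: the struct and helper names are hypothetical, and only the P
bit of the ESB PQ pair is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the state tracking described above.
 * In XIVE, the P bit of the source's PQ pair goes to 1 when the hardware
 * writes an entry into an interrupt queue. */
struct irq_state {
	bool saved_p;		/* a queue entry is known to exist */
	bool stale_p;		/* P left at 1 after the entry was consumed */
	unsigned int pq;	/* last observed ESB PQ bits (P is bit 1) */
};

/* Returns true if the interrupt may still be in flight, i.e. free_irq()
 * should keep polling before tearing down the descriptor. */
static bool irq_is_pending(const struct irq_state *s)
{
	bool p_set = s->pq & 0x2;

	/* P == 1 only matters if it is not a known-stale leftover */
	return s->saved_p || (p_set && !s->stale_p);
}
```

In this model, a set P bit with stale_p also set is correctly reported as
not pending, which is exactly the case the patch adds stale_p to cover.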


Re: [PATCH v2 2/3] KVM: PPC: Book3S HV: Don't push XIVE context when not using XIVE device

2019-08-22 Thread Michael Ellerman
On Tue, 2019-08-13 at 10:01:00 UTC, Paul Mackerras wrote:
> At present, when running a guest on POWER9 using HV KVM but not using
> an in-kernel interrupt controller (XICS or XIVE), for example if QEMU
> is run with the kernel_irqchip=off option, the guest entry code goes
> ahead and tries to load the guest context into the XIVE hardware, even
> though no context has been set up.
> 
> To fix this, we check that the "CAM word" is non-zero before pushing
> it to the hardware.  The CAM word is initialized to a non-zero value
> in kvmppc_xive_connect_vcpu() and kvmppc_xive_native_connect_vcpu(),
> and is now cleared in kvmppc_xive_{,native_}cleanup_vcpu.
> 
> Cc: sta...@vger.kernel.org # v4.11+
> Reported-by: Cédric Le Goater 
> Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt 
> controller")
> Signed-off-by: Paul Mackerras 
> Reviewed-by: Cédric Le Goater 

Applied to powerpc topic/ppc-kvm, thanks.

https://git.kernel.org/powerpc/c/8d4ba9c931bc384bcc6889a43915aaaf19d3e499

cheers


Re: [PATCH v2 1/3] KVM: PPC: Book3S HV: Fix race in re-enabling XIVE escalation interrupts

2019-08-22 Thread Michael Ellerman
On Tue, 2019-08-13 at 10:03:49 UTC, Paul Mackerras wrote:
> Escalation interrupts are interrupts sent to the host by the XIVE
> hardware when it has an interrupt to deliver to a guest VCPU but that
> VCPU is not running anywhere in the system.  Hence we disable the
> escalation interrupt for the VCPU being run when we enter the guest
> and re-enable it when the guest does an H_CEDE hypercall indicating
> it is idle.
> 
> It is possible that an escalation interrupt gets generated just as we
> are entering the guest.  In that case the escalation interrupt may be
> using a queue entry in one of the interrupt queues, and that queue
> entry may not have been processed when the guest exits with an H_CEDE.
> The existing entry code detects this situation and does not clear the
> vcpu->arch.xive_esc_on flag as an indication that there is a pending
> queue entry (if the queue entry gets processed, xive_esc_irq() will
> clear the flag).  There is a comment in the code saying that if the
> flag is still set on H_CEDE, we have to abort the cede rather than
> re-enabling the escalation interrupt, lest we end up with two
> occurrences of the escalation interrupt in the interrupt queue.
> 
> However, the exit code doesn't do that; it aborts the cede in the sense
> that vcpu->arch.ceded gets cleared, but it still enables the escalation
> interrupt by setting the source's PQ bits to 00.  Instead we need to
> set the PQ bits to 10, indicating that an interrupt has been triggered.
> We also need to avoid setting vcpu->arch.xive_esc_on in this case
> (i.e. vcpu->arch.xive_esc_on seen to be set on H_CEDE) because
> xive_esc_irq() will run at some point and clear it, and if we race with
> that we may end up with an incorrect result (i.e. xive_esc_on set when
> the escalation interrupt has just been handled).
> 
> It is extremely unlikely that having two queue entries would cause
> observable problems; theoretically it could cause queue overflow, but
> the CPU would have to have thousands of interrupts targeted to it for
> that to be possible.  However, this fix will also make it possible to
> determine accurately whether there is an unhandled escalation
> interrupt in the queue, which will be needed by the following patch.
> 
> Cc: sta...@vger.kernel.org # v4.16+
> Fixes: 9b9b13a6d153 ("KVM: PPC: Book3S HV: Keep XIVE escalation interrupt 
> masked unless ceded")
> Signed-off-by: Paul Mackerras 

Applied to powerpc topic/ppc-kvm, thanks.

https://git.kernel.org/powerpc/c/959c5d5134786b4988b6fdd08e444aa67d1667ed

cheers


Re: [PATCH] KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VP

2019-08-22 Thread Michael Ellerman
On Tue, 2019-08-06 at 17:25:38 UTC, Cédric Le Goater wrote:
> When a vCPU is brought down, the XIVE VP is first disabled and then
> the event notification queues are freed. When freeing the queues, we
> check for possible escalation interrupts and free them also.
> 
> But when a XIVE VP is disabled, the underlying XIVE ENDs also are
> disabled in OPAL. When an END is disabled, its ESB pages (ESn and ESe)
> are disabled and loads return all 1s. Which means that any access on
> the ESB page of the escalation interrupt will return invalid values.
> 
> When an interrupt is freed, the shutdown handler computes a 'saved_p'
> field from the value returned by a load in xive_do_source_set_mask().
> This value is incorrect for escalation interrupts for the reason
> described above.
> 
> This has no impact on Linux/KVM today because we don't make use of it
> but we will introduce in future changes a xive_get_irqchip_state()
> handler. This handler will use the 'saved_p' field to return the state
> of an interrupt and 'saved_p' being incorrect, softlockup will occur.
> 
> Fix the vCPU cleanup sequence by first freeing the escalation
> interrupts if any, then disable the XIVE VP and last free the queues.
> 
> Signed-off-by: Cédric Le Goater 

Applied to powerpc topic/ppc-kvm, thanks.

https://git.kernel.org/powerpc/c/237aed48c642328ff0ab19b63423634340224a06

cheers
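The failure mode this patch fixes — ESB loads returning all 1s once the VP
(and hence the ENDs) has been disabled, so any state derived from them is
garbage — can be modelled in a few lines. This is an illustrative sketch
with made-up helper names, not the XIVE driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the problem described above: once the VP is disabled,
 * loads from the escalation interrupt's ESB page return all 1s. */
static unsigned int esb_load(bool vp_enabled, unsigned int pq)
{
	return vp_enabled ? pq : 0xFF;	/* disabled ESB page reads as ~0 */
}

/* The shutdown handler derives saved_p from the P bit of the load */
static bool compute_saved_p(bool vp_enabled, unsigned int pq)
{
	return esb_load(vp_enabled, pq) & 0x2;
}
```

With the old ordering (VP disabled first), compute_saved_p() reports a
pending entry even when PQ was really 00 — which is why the fix frees the
escalation interrupts before disabling the VP.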


[PATCH v7 7/7] KVM: PPC: Ultravisor: Add PPC_UV config option

2019-08-22 Thread Bharata B Rao
From: Anshuman Khandual 

CONFIG_PPC_UV adds support for ultravisor.

Signed-off-by: Anshuman Khandual 
Signed-off-by: Bharata B Rao 
Signed-off-by: Ram Pai 
[ Update config help and commit message ]
Signed-off-by: Claudio Carvalho 
---
 arch/powerpc/Kconfig | 17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d8dcd8820369..044838794112 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -448,6 +448,23 @@ config PPC_TRANSACTIONAL_MEM
help
  Support user-mode Transactional Memory on POWERPC.
 
+config PPC_UV
+   bool "Ultravisor support"
+   depends on KVM_BOOK3S_HV_POSSIBLE
+   select ZONE_DEVICE
+   select DEV_PAGEMAP_OPS
+   select DEVICE_PRIVATE
+   select MEMORY_HOTPLUG
+   select MEMORY_HOTREMOVE
+   default n
+   help
+ This option paravirtualizes the kernel to run on POWER platforms that
+ support the Protected Execution Facility (PEF). On such platforms,
+ the ultravisor firmware runs at a privilege level above the
+ hypervisor.
+
+ If unsure, say "N".
+
 config LD_HEAD_STUB_CATCH
bool "Reserve 256 bytes to cope with linker stubs in HEAD text" if 
EXPERT
depends on PPC64
-- 
2.21.0



[PATCH v7 6/7] kvmppc: Support reset of secure guest

2019-08-22 Thread Bharata B Rao
Add support for reset of secure guest via a new ioctl KVM_PPC_SVM_OFF.
This ioctl will be issued by QEMU during reset and includes
the following steps:

- Ask UV to terminate the guest via UV_SVM_TERMINATE ucall
- Unpin the VPA pages so that they can be migrated back to secure
  side when guest becomes secure again. This is required because
  pinned pages can't be migrated.
- Reinitialize the guest's partition-scoped page tables. These are
  freed when the guest becomes secure (H_SVM_INIT_DONE).
- Release all device pages of the secure guest.

After these steps, guest is ready to issue UV_ESM call once again
to switch to secure mode.
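The four steps above can be sketched as a skeleton of the reset sequence.
All names here are placeholders standing in for the ucall and KVM routines
in the patch (uv_svm_terminate(), VPA unpinning, kvmppc_reinit_partition_table(),
device-page release), not the real kernel API.

```c
#include <assert.h>

/* Hedged toy model of the KVM_PPC_SVM_OFF sequence described above. */
struct toy_kvm {
	int secure;		/* guest is in secure mode */
	int vpa_pinned;		/* VPA pages are pinned */
	int devm_pages;		/* device pages tracking secure pages */
};

static int uv_svm_terminate_(struct toy_kvm *kvm) { kvm->secure = 0; return 0; }
static void unpin_vpa_pages_(struct toy_kvm *kvm) { kvm->vpa_pinned = 0; }
static int reinit_partition_table_(struct toy_kvm *kvm) { (void)kvm; return 0; }
static void free_devm_pages_(struct toy_kvm *kvm) { kvm->devm_pages = 0; }

static int svm_off_(struct toy_kvm *kvm)
{
	if (!kvm->secure)			/* no effect on a normal guest */
		return 0;
	if (uv_svm_terminate_(kvm))		/* 1. UV_SVM_TERMINATE ucall */
		return -1;
	unpin_vpa_pages_(kvm);			/* 2. unpin the VPA pages */
	if (reinit_partition_table_(kvm))	/* 3. fresh partition-scoped tables */
		return -1;
	free_devm_pages_(kvm);			/* 4. release the device pages */
	return 0;
}
```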

Signed-off-by: Bharata B Rao 
Signed-off-by: Sukadev Bhattiprolu 
[Implementation of uv_svm_terminate() and its call from
guest shutdown path]
Signed-off-by: Ram Pai 
[Unpinning of VPA pages]
---
 Documentation/virtual/kvm/api.txt  | 19 ++
 arch/powerpc/include/asm/kvm_book3s_devm.h |  6 ++
 arch/powerpc/include/asm/kvm_ppc.h |  2 +
 arch/powerpc/include/asm/ultravisor-api.h  |  1 +
 arch/powerpc/include/asm/ultravisor.h  |  5 ++
 arch/powerpc/kvm/book3s_hv.c   | 70 ++
 arch/powerpc/kvm/book3s_hv_devm.c  | 60 +++
 arch/powerpc/kvm/powerpc.c | 12 
 include/uapi/linux/kvm.h   |  1 +
 9 files changed, 176 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index e54a3f51ddc5..531562e01343 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4111,6 +4111,25 @@ Valid values for 'action':
 #define KVM_PMU_EVENT_ALLOW 0
 #define KVM_PMU_EVENT_DENY 1
 
+4.121 KVM_PPC_SVM_OFF
+
+Capability: basic
+Architectures: powerpc
+Type: vm ioctl
+Parameters: none
+Returns: 0 on successful completion,
+Errors:
+  EINVAL: if ultravisor failed to terminate the secure guest
+  ENOMEM: if hypervisor failed to allocate new radix page tables for guest
+
+This ioctl is used to turn off the secure mode of the guest or transition
+the guest from secure mode to normal mode. This is invoked when the guest
+is reset. This has no effect if called for a normal guest.
+
+This ioctl issues an ultravisor call to terminate the secure guest,
+unpins the VPA pages, reinitializes guest's partition scoped page
+tables and releases all the device pages that are used to track the
+secure pages by hypervisor.
 
 5. The kvm_run structure
 
diff --git a/arch/powerpc/include/asm/kvm_book3s_devm.h 
b/arch/powerpc/include/asm/kvm_book3s_devm.h
index fc924ef00b91..62e1992cbac7 100644
--- a/arch/powerpc/include/asm/kvm_book3s_devm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_devm.h
@@ -13,6 +13,7 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
unsigned long page_shift);
 unsigned long kvmppc_h_svm_init_start(struct kvm *kvm);
 unsigned long kvmppc_h_svm_init_done(struct kvm *kvm);
+void kvmppc_devm_free_memslot_pfns(struct kvm *kvm, struct kvm_memslots 
*slots);
 #else
 static inline unsigned long
 kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gra,
@@ -37,5 +38,10 @@ static inline unsigned long kvmppc_h_svm_init_done(struct 
kvm *kvm)
 {
return H_UNSUPPORTED;
 }
+
+static inline void kvmppc_devm_free_memslot_pfns(struct kvm *kvm,
+struct kvm_memslots *slots)
+{
+}
 #endif /* CONFIG_PPC_UV */
 #endif /* __POWERPC_KVM_PPC_HMM_H__ */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 2484e6a8f5ca..e4093d067354 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -177,6 +177,7 @@ extern void kvm_spapr_tce_release_iommu_group(struct kvm 
*kvm,
 extern int kvmppc_switch_mmu_to_hpt(struct kvm *kvm);
 extern int kvmppc_switch_mmu_to_radix(struct kvm *kvm);
 extern void kvmppc_setup_partition_table(struct kvm *kvm);
+extern int kvmppc_reinit_partition_table(struct kvm *kvm);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce_64 *args);
@@ -321,6 +322,7 @@ struct kvmppc_ops {
   int size);
int (*store_to_eaddr)(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr,
  int size);
+   int (*svm_off)(struct kvm *kvm);
 };
 
 extern struct kvmppc_ops *kvmppc_hv_ops;
diff --git a/arch/powerpc/include/asm/ultravisor-api.h 
b/arch/powerpc/include/asm/ultravisor-api.h
index cf200d4ce703..3a27a0c0be05 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -30,5 +30,6 @@
 #define UV_PAGE_IN 0xF128
 #define UV_PAGE_OUT0xF12C
 #define UV_PAGE_INVAL  0xF138
+#define UV_SVM_TERMINATE   0xF13C
 
 #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
diff --git 

[PATCH v7 5/7] kvmppc: Radix changes for secure guest

2019-08-22 Thread Bharata B Rao
- After the guest becomes secure, when HV handles a page fault on a page
  belonging to the SVM, send that page to UV via UV_PAGE_IN.
- Whenever a page is unmapped on the HV side, inform UV via UV_PAGE_INVAL.
- Ensure that all routines that walk the secondary page tables of
  the guest don't do so in the case of a secure VM. For a secure guest, the
  active secondary page tables are in secure memory and the secondary
  page tables in HV are freed when guest becomes secure.

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/kvm_host.h   | 12 
 arch/powerpc/include/asm/ultravisor-api.h |  1 +
 arch/powerpc/include/asm/ultravisor.h |  5 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c| 22 ++
 arch/powerpc/kvm/book3s_hv_devm.c | 20 
 5 files changed, 60 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 66e5cc8c9759..29333e8de1c4 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -867,6 +867,8 @@ static inline void kvm_arch_vcpu_block_finish(struct 
kvm_vcpu *vcpu) {}
 #ifdef CONFIG_PPC_UV
 extern int kvmppc_devm_init(void);
 extern void kvmppc_devm_free(void);
+extern bool kvmppc_is_guest_secure(struct kvm *kvm);
+extern int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gpa);
 #else
 static inline int kvmppc_devm_init(void)
 {
@@ -874,6 +876,16 @@ static inline int kvmppc_devm_init(void)
 }
 
 static inline void kvmppc_devm_free(void) {}
+
+static inline bool kvmppc_is_guest_secure(struct kvm *kvm)
+{
+   return false;
+}
+
+static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gpa)
+{
+   return -EFAULT;
+}
 #endif /* CONFIG_PPC_UV */
 
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/ultravisor-api.h 
b/arch/powerpc/include/asm/ultravisor-api.h
index 46b1ee381695..cf200d4ce703 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -29,5 +29,6 @@
 #define UV_UNREGISTER_MEM_SLOT 0xF124
 #define UV_PAGE_IN 0xF128
 #define UV_PAGE_OUT0xF12C
+#define UV_PAGE_INVAL  0xF138
 
 #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
diff --git a/arch/powerpc/include/asm/ultravisor.h 
b/arch/powerpc/include/asm/ultravisor.h
index 719c0c3930b9..b333241bbe4c 100644
--- a/arch/powerpc/include/asm/ultravisor.h
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -57,4 +57,9 @@ static inline int uv_unregister_mem_slot(u64 lpid, u64 slotid)
return ucall_norets(UV_UNREGISTER_MEM_SLOT, lpid, slotid);
 }
 
+static inline int uv_page_inval(u64 lpid, u64 gpa, u64 page_shift)
+{
+   return ucall_norets(UV_PAGE_INVAL, lpid, gpa, page_shift);
+}
+
 #endif /* _ASM_POWERPC_ULTRAVISOR_H */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 2d415c36a61d..93ad34e63045 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -19,6 +19,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * Supported radix tree geometry.
@@ -915,6 +917,9 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
if (!(dsisr & DSISR_PRTABLE_FAULT))
gpa |= ea & 0xfff;
 
+   if (kvmppc_is_guest_secure(kvm))
+   return kvmppc_send_page_to_uv(kvm, gpa & PAGE_MASK);
+
/* Get the corresponding memslot */
memslot = gfn_to_memslot(kvm, gfn);
 
@@ -972,6 +977,11 @@ int kvm_unmap_radix(struct kvm *kvm, struct 
kvm_memory_slot *memslot,
unsigned long gpa = gfn << PAGE_SHIFT;
unsigned int shift;
 
+   if (kvmppc_is_guest_secure(kvm)) {
+   uv_page_inval(kvm->arch.lpid, gpa, PAGE_SIZE);
+   return 0;
+   }
+
ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, );
if (ptep && pte_present(*ptep))
kvmppc_unmap_pte(kvm, ptep, gpa, shift, memslot,
@@ -989,6 +999,9 @@ int kvm_age_radix(struct kvm *kvm, struct kvm_memory_slot 
*memslot,
int ref = 0;
unsigned long old, *rmapp;
 
+   if (kvmppc_is_guest_secure(kvm))
+   return ref;
+
ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, );
if (ptep && pte_present(*ptep) && pte_young(*ptep)) {
old = kvmppc_radix_update_pte(kvm, ptep, _PAGE_ACCESSED, 0,
@@ -1013,6 +1026,9 @@ int kvm_test_age_radix(struct kvm *kvm, struct 
kvm_memory_slot *memslot,
unsigned int shift;
int ref = 0;
 
+   if (kvmppc_is_guest_secure(kvm))
+   return ref;
+
ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, );
if (ptep && pte_present(*ptep) && pte_young(*ptep))
ref = 1;
@@ -1030,6 +1046,9 @@ static int kvm_radix_test_clear_dirty(struct kvm *kvm,
int ret = 0;
unsigned long old, *rmapp;
 
+   if 

[PATCH v7 4/7] kvmppc: Handle memory plug/unplug to secure VM

2019-08-22 Thread Bharata B Rao
Register the new memslot with UV during plug and unregister
the memslot during unplug.

Signed-off-by: Bharata B Rao 
Acked-by: Paul Mackerras 
---
 arch/powerpc/include/asm/ultravisor-api.h |  1 +
 arch/powerpc/include/asm/ultravisor.h |  5 +
 arch/powerpc/kvm/book3s_hv.c  | 17 +
 3 files changed, 23 insertions(+)

diff --git a/arch/powerpc/include/asm/ultravisor-api.h 
b/arch/powerpc/include/asm/ultravisor-api.h
index c578d9b13a56..46b1ee381695 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -26,6 +26,7 @@
 #define UV_WRITE_PATE  0xF104
 #define UV_RETURN  0xF11C
 #define UV_REGISTER_MEM_SLOT   0xF120
+#define UV_UNREGISTER_MEM_SLOT 0xF124
 #define UV_PAGE_IN 0xF128
 #define UV_PAGE_OUT0xF12C
 
diff --git a/arch/powerpc/include/asm/ultravisor.h 
b/arch/powerpc/include/asm/ultravisor.h
index 58ccf5e2d6bb..719c0c3930b9 100644
--- a/arch/powerpc/include/asm/ultravisor.h
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -52,4 +52,9 @@ static inline int uv_register_mem_slot(u64 lpid, u64 
start_gpa, u64 size,
size, flags, slotid);
 }
 
+static inline int uv_unregister_mem_slot(u64 lpid, u64 slotid)
+{
+   return ucall_norets(UV_UNREGISTER_MEM_SLOT, lpid, slotid);
+}
+
 #endif /* _ASM_POWERPC_ULTRAVISOR_H */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 33b8ebffbef0..57f8b0b1c703 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -74,6 +74,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "book3s.h"
 
@@ -4504,6 +4505,22 @@ static void kvmppc_core_commit_memory_region_hv(struct 
kvm *kvm,
if (change == KVM_MR_FLAGS_ONLY && kvm_is_radix(kvm) &&
((new->flags ^ old->flags) & KVM_MEM_LOG_DIRTY_PAGES))
kvmppc_radix_flush_memslot(kvm, old);
+   /*
+* If UV hasn't yet called H_SVM_INIT_START, don't register memslots.
+*/
+   if (!kvm->arch.secure_guest)
+   return;
+
+   /*
+* TODO: Handle KVM_MR_MOVE
+*/
+   if (change == KVM_MR_CREATE) {
+   uv_register_mem_slot(kvm->arch.lpid,
+new->base_gfn << PAGE_SHIFT,
+new->npages * PAGE_SIZE,
+0, new->id);
+   } else if (change == KVM_MR_DELETE)
+   uv_unregister_mem_slot(kvm->arch.lpid, old->id);
 }
 
 /*
-- 
2.21.0



[PATCH v7 3/7] kvmppc: H_SVM_INIT_START and H_SVM_INIT_DONE hcalls

2019-08-22 Thread Bharata B Rao
H_SVM_INIT_START: Initiate securing a VM
H_SVM_INIT_DONE: Conclude securing a VM

As part of H_SVM_INIT_START, register all existing memslots with
the UV. The H_SVM_INIT_DONE call by UV informs HV that the transition of
the guest to secure mode is complete.

These two states (transition to secure mode STARTED and transition
to secure mode COMPLETED) are recorded in kvm->arch.secure_guest.
Setting these states will cause the assembly code that enters the
guest to call the UV_RETURN ucall instead of trying to enter the
guest directly.
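The two recorded states and their effect on guest entry can be shown with a
small model. The flag values match the patch's kvm_host.h additions; the
helper function is illustrative only and is not how the assembly entry path
actually tests kvm->arch.secure_guest.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values for kvm->arch.secure_guest, as added by this patch */
#define KVMPPC_SECURE_INIT_START 0x1UL	/* H_SVM_INIT_START has been called */
#define KVMPPC_SECURE_INIT_DONE  0x2UL	/* H_SVM_INIT_DONE completed */

/* Guest entry must go via the UV_RETURN ucall once either state is set,
 * rather than entering the guest directly. */
static bool enter_via_uv_return(unsigned long secure_guest)
{
	return secure_guest != 0;
}
```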

Signed-off-by: Bharata B Rao 
Acked-by: Paul Mackerras 
---
 arch/powerpc/include/asm/hvcall.h  |  2 ++
 arch/powerpc/include/asm/kvm_book3s_devm.h | 12 
 arch/powerpc/include/asm/kvm_host.h|  4 +++
 arch/powerpc/include/asm/ultravisor-api.h  |  1 +
 arch/powerpc/include/asm/ultravisor.h  |  7 +
 arch/powerpc/kvm/book3s_hv.c   |  7 +
 arch/powerpc/kvm/book3s_hv_devm.c  | 34 ++
 7 files changed, 67 insertions(+)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 05b8536f6653..fa7695928e30 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -343,6 +343,8 @@
 /* Platform-specific hcalls used by the Ultravisor */
 #define H_SVM_PAGE_IN  0xEF00
 #define H_SVM_PAGE_OUT 0xEF04
+#define H_SVM_INIT_START   0xEF08
+#define H_SVM_INIT_DONE0xEF0C
 
 /* Values for 2nd argument to H_SET_MODE */
 #define H_SET_MODE_RESOURCE_SET_CIABR  1
diff --git a/arch/powerpc/include/asm/kvm_book3s_devm.h 
b/arch/powerpc/include/asm/kvm_book3s_devm.h
index 9603c2b48d67..fc924ef00b91 100644
--- a/arch/powerpc/include/asm/kvm_book3s_devm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_devm.h
@@ -11,6 +11,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
unsigned long gra,
unsigned long flags,
unsigned long page_shift);
+unsigned long kvmppc_h_svm_init_start(struct kvm *kvm);
+unsigned long kvmppc_h_svm_init_done(struct kvm *kvm);
 #else
 static inline unsigned long
 kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gra,
@@ -25,5 +27,15 @@ kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gra,
 {
return H_UNSUPPORTED;
 }
+
+static inline unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
+{
+   return H_UNSUPPORTED;
+}
+
+static inline unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
+{
+   return H_UNSUPPORTED;
+}
 #endif /* CONFIG_PPC_UV */
 #endif /* __POWERPC_KVM_PPC_HMM_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 855d82730f44..66e5cc8c9759 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -272,6 +272,10 @@ struct kvm_hpt_info {
 
 struct kvm_resize_hpt;
 
+/* Flag values for kvm_arch.secure_guest */
+#define KVMPPC_SECURE_INIT_START 0x1 /* H_SVM_INIT_START has been called */
+#define KVMPPC_SECURE_INIT_DONE  0x2 /* H_SVM_INIT_DONE completed */
+
 struct kvm_arch {
unsigned int lpid;
unsigned int smt_mode;  /* # vcpus per virtual core */
diff --git a/arch/powerpc/include/asm/ultravisor-api.h 
b/arch/powerpc/include/asm/ultravisor-api.h
index 1cd1f595fd81..c578d9b13a56 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -25,6 +25,7 @@
 /* opcodes */
 #define UV_WRITE_PATE  0xF104
 #define UV_RETURN  0xF11C
+#define UV_REGISTER_MEM_SLOT   0xF120
 #define UV_PAGE_IN 0xF128
 #define UV_PAGE_OUT0xF12C
 
diff --git a/arch/powerpc/include/asm/ultravisor.h 
b/arch/powerpc/include/asm/ultravisor.h
index 0fc4a974b2e8..58ccf5e2d6bb 100644
--- a/arch/powerpc/include/asm/ultravisor.h
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -45,4 +45,11 @@ static inline int uv_page_out(u64 lpid, u64 dst_ra, u64 
src_gpa, u64 flags,
page_shift);
 }
 
+static inline int uv_register_mem_slot(u64 lpid, u64 start_gpa, u64 size,
+  u64 flags, u64 slotid)
+{
+   return ucall_norets(UV_REGISTER_MEM_SLOT, lpid, start_gpa,
+   size, flags, slotid);
+}
+
 #endif /* _ASM_POWERPC_ULTRAVISOR_H */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 00b43ee8b693..33b8ebffbef0 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1089,6 +1089,13 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
kvmppc_get_gpr(vcpu, 5),
kvmppc_get_gpr(vcpu, 6));
break;
+   case H_SVM_INIT_START:
+   ret = kvmppc_h_svm_init_start(vcpu->kvm);
+   break;
+   case H_SVM_INIT_DONE:
+   ret 

[PATCH v7 2/7] kvmppc: Shared pages support for secure guests

2019-08-22 Thread Bharata B Rao
A secure guest will share some of its pages with the hypervisor (e.g.
virtio bounce buffers). Support sharing of pages between the hypervisor and
ultravisor.

Once a secure page is converted to shared page, the device page is
unmapped from the HV side page tables.
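The flags dispatch that this patch adds to kvmppc_h_svm_page_in() — reject
unknown flag bits with H_P2, take the sharing path for H_PAGE_IN_SHARED,
otherwise do a normal secure page-in — can be mirrored in a standalone
sketch. The return values below are symbolic stand-ins, not the real hcall
status codes.

```c
#include <assert.h>

/* Illustrative model of the H_SVM_PAGE_IN flags handling added above. */
#define H_PAGE_IN_SHARED 0x1UL	/* matches the hvcall.h addition */

enum page_in_result {
	PAGE_SECURED_,	/* normal path: page moved to secure memory */
	PAGE_SHARED_,	/* sharing path: page shared with the hypervisor */
	BAD_FLAGS_,	/* stands in for the H_P2 error return */
};

static enum page_in_result page_in(unsigned long flags)
{
	if (flags & ~H_PAGE_IN_SHARED)	/* unknown flag bits -> H_P2 */
		return BAD_FLAGS_;
	if (flags & H_PAGE_IN_SHARED)	/* share with the hypervisor */
		return PAGE_SHARED_;
	return PAGE_SECURED_;		/* page-in to secure memory */
}
```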

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/hvcall.h |  3 ++
 arch/powerpc/kvm/book3s_hv_devm.c | 70 +--
 2 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 2f6b952deb0f..05b8536f6653 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -337,6 +337,9 @@
 #define H_TLB_INVALIDATE   0xF808
 #define H_COPY_TOFROM_GUEST0xF80C
 
+/* Flags for H_SVM_PAGE_IN */
+#define H_PAGE_IN_SHARED0x1
+
 /* Platform-specific hcalls used by the Ultravisor */
 #define H_SVM_PAGE_IN  0xEF00
 #define H_SVM_PAGE_OUT 0xEF04
diff --git a/arch/powerpc/kvm/book3s_hv_devm.c 
b/arch/powerpc/kvm/book3s_hv_devm.c
index 13722f27fa7d..6a3229b78fed 100644
--- a/arch/powerpc/kvm/book3s_hv_devm.c
+++ b/arch/powerpc/kvm/book3s_hv_devm.c
@@ -46,6 +46,7 @@ struct kvmppc_devm_page_pvt {
unsigned long *rmap;
unsigned int lpid;
unsigned long gpa;
+   bool skip_page_out;
 };
 
 /*
@@ -139,6 +140,54 @@ kvmppc_devm_migrate_alloc_and_copy(struct migrate_vma *mig,
return 0;
 }
 
+/*
+ * Shares the page with HV, thus making it a normal page.
+ *
+ * - If the page is already secure, then provision a new page and share
+ * - If the page is a normal page, share the existing page
+ *
+ * In the former case, uses the dev_pagemap_ops migrate_to_ram handler
+ * to unmap the device page from QEMU's page tables.
+ */
+static unsigned long
+kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift)
+{
+
+   int ret = H_PARAMETER;
+   struct page *devm_page;
+   struct kvmppc_devm_page_pvt *pvt;
+   unsigned long pfn;
+   unsigned long *rmap;
+   struct kvm_memory_slot *slot;
+   unsigned long gfn = gpa >> page_shift;
+   int srcu_idx;
+
+   srcu_idx = srcu_read_lock(>srcu);
+   slot = gfn_to_memslot(kvm, gfn);
+   if (!slot)
+   goto out;
+
+   rmap = >arch.rmap[gfn - slot->base_gfn];
+   if (kvmppc_rmap_is_devm_pfn(*rmap)) {
+   devm_page = pfn_to_page(*rmap & ~KVMPPC_RMAP_DEVM_PFN);
+   pvt = (struct kvmppc_devm_page_pvt *)
+   devm_page->zone_device_data;
+   pvt->skip_page_out = true;
+   }
+
+   pfn = gfn_to_pfn(kvm, gpa >> page_shift);
+   if (is_error_noslot_pfn(pfn))
+   goto out;
+
+   ret = uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0, page_shift);
+   if (ret == U_SUCCESS)
+   ret = H_SUCCESS;
+   kvm_release_pfn_clean(pfn);
+out:
+   srcu_read_unlock(>srcu, srcu_idx);
+   return ret;
+}
+
 /*
  * Move page from normal memory to secure memory.
  */
@@ -159,9 +208,12 @@ kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
if (page_shift != PAGE_SHIFT)
return H_P3;
 
-   if (flags)
+   if (flags & ~H_PAGE_IN_SHARED)
return H_P2;
 
+   if (flags & H_PAGE_IN_SHARED)
+   return kvmppc_share_page(kvm, gpa, page_shift);
+
ret = H_PARAMETER;
down_read(>mm->mmap_sem);
srcu_idx = srcu_read_lock(>srcu);
@@ -211,7 +263,7 @@ kvmppc_devm_fault_migrate_alloc_and_copy(struct migrate_vma 
*mig,
struct page *dpage, *spage;
struct kvmppc_devm_page_pvt *pvt;
unsigned long pfn;
-   int ret;
+   int ret = U_SUCCESS;
 
spage = migrate_pfn_to_page(*mig->src);
if (!spage || !(*mig->src & MIGRATE_PFN_MIGRATE))
@@ -226,8 +278,18 @@ kvmppc_devm_fault_migrate_alloc_and_copy(struct 
migrate_vma *mig,
pvt = spage->zone_device_data;
 
pfn = page_to_pfn(dpage);
-   ret = uv_page_out(pvt->lpid, pfn << page_shift, pvt->gpa, 0,
- page_shift);
+
+   /*
+* This same function is used in two cases:
+* - When HV touches a secure page, for which we do page-out
+* - When a secure page is converted to shared page, we touch
+*   the page to essentially unmap the device page. In this
+*   case we skip page-out.
+*/
+   if (!pvt->skip_page_out)
+   ret = uv_page_out(pvt->lpid, pfn << page_shift, pvt->gpa, 0,
+ page_shift);
+
if (ret == U_SUCCESS)
*mig->dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
else {
-- 
2.21.0



[PATCH v7 1/7] kvmppc: Driver to manage pages of secure guest

2019-08-22 Thread Bharata B Rao
KVMPPC driver to manage page transitions of a secure guest
via H_SVM_PAGE_IN and H_SVM_PAGE_OUT hcalls.

H_SVM_PAGE_IN: Move the content of a normal page to secure page
H_SVM_PAGE_OUT: Move the content of a secure page to normal page

Private ZONE_DEVICE memory equal to the amount of secure memory
available in the platform for running secure guests is created.
Whenever a page belonging to the guest becomes secure, a page from
this private device memory is used to represent and track that secure
page on the HV side. The movement of pages between normal and secure
memory is done via migrate_vma_pages() using UV_PAGE_IN and
UV_PAGE_OUT ucalls.

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/hvcall.h  |   4 +
 arch/powerpc/include/asm/kvm_book3s_devm.h |  29 ++
 arch/powerpc/include/asm/kvm_host.h|  23 ++
 arch/powerpc/include/asm/ultravisor-api.h  |   2 +
 arch/powerpc/include/asm/ultravisor.h  |  14 +
 arch/powerpc/kvm/Makefile  |   3 +
 arch/powerpc/kvm/book3s_hv.c   |  19 +
 arch/powerpc/kvm/book3s_hv_devm.c  | 438 +
 8 files changed, 532 insertions(+)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_devm.h
 create mode 100644 arch/powerpc/kvm/book3s_hv_devm.c

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 463c63a9fcf1..2f6b952deb0f 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -337,6 +337,10 @@
 #define H_TLB_INVALIDATE   0xF808
 #define H_COPY_TOFROM_GUEST0xF80C
 
+/* Platform-specific hcalls used by the Ultravisor */
+#define H_SVM_PAGE_IN  0xEF00
+#define H_SVM_PAGE_OUT 0xEF04
+
 /* Values for 2nd argument to H_SET_MODE */
 #define H_SET_MODE_RESOURCE_SET_CIABR  1
 #define H_SET_MODE_RESOURCE_SET_DAWR   2
diff --git a/arch/powerpc/include/asm/kvm_book3s_devm.h 
b/arch/powerpc/include/asm/kvm_book3s_devm.h
new file mode 100644
index ..9603c2b48d67
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s_devm.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __POWERPC_KVM_PPC_HMM_H__
+#define __POWERPC_KVM_PPC_HMM_H__
+
+#ifdef CONFIG_PPC_UV
+unsigned long kvmppc_h_svm_page_in(struct kvm *kvm,
+  unsigned long gra,
+  unsigned long flags,
+  unsigned long page_shift);
+unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
+   unsigned long gra,
+   unsigned long flags,
+   unsigned long page_shift);
+#else
+static inline unsigned long
+kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gra,
+unsigned long flags, unsigned long page_shift)
+{
+   return H_UNSUPPORTED;
+}
+
+static inline unsigned long
+kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gra,
+ unsigned long flags, unsigned long page_shift)
+{
+   return H_UNSUPPORTED;
+}
+#endif /* CONFIG_PPC_UV */
+#endif /* __POWERPC_KVM_PPC_HMM_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 4bb552d639b8..855d82730f44 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -242,6 +242,17 @@ struct revmap_entry {
 #define KVMPPC_RMAP_PRESENT	0x100000000ul
 #define KVMPPC_RMAP_INDEX	0xfffffffful
 
+/*
+ * Bits 60:56 in the rmap entry will be used to identify the
+ * different uses/functions of rmap.
+ */
+#define KVMPPC_RMAP_DEVM_PFN   (0x2ULL << 56)
+
+static inline bool kvmppc_rmap_is_devm_pfn(unsigned long pfn)
+{
+   return !!(pfn & KVMPPC_RMAP_DEVM_PFN);
+}
+
 struct kvm_arch_memory_slot {
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
unsigned long *rmap;
@@ -849,4 +860,16 @@ static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 
+#ifdef CONFIG_PPC_UV
+extern int kvmppc_devm_init(void);
+extern void kvmppc_devm_free(void);
+#else
+static inline int kvmppc_devm_init(void)
+{
+   return 0;
+}
+
+static inline void kvmppc_devm_free(void) {}
+#endif /* CONFIG_PPC_UV */
+
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h
index 6a0f9c74f959..1cd1f595fd81 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -25,5 +25,7 @@
 /* opcodes */
 #define UV_WRITE_PATE  0xF104
 #define UV_RETURN  0xF11C
+#define UV_PAGE_IN 0xF128
+#define UV_PAGE_OUT0xF12C
 
 #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h
index 
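As an aside, the rmap tagging scheme in the kvm_host.h hunk above (bits 60:56 of an rmap entry marking a device-memory PFN) can be exercised in plain userspace C. This is a standalone sketch mirroring those helpers, not the kernel code itself; the tag/untag helpers are additions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror of the patch: bits 60:56 of an rmap entry tag its use. */
#define KVMPPC_RMAP_DEVM_PFN (0x2ULL << 56)

/* True when the rmap entry carries a device-memory PFN. */
static inline bool kvmppc_rmap_is_devm_pfn(uint64_t pfn)
{
	return !!(pfn & KVMPPC_RMAP_DEVM_PFN);
}

/* Tag a raw PFN as device memory (hypothetical helper, for illustration). */
static inline uint64_t devm_tag(uint64_t pfn)
{
	return pfn | KVMPPC_RMAP_DEVM_PFN;
}

/* Strip the tag to recover the raw PFN again. */
static inline uint64_t devm_untag(uint64_t entry)
{
	return entry & ~KVMPPC_RMAP_DEVM_PFN;
}
```

Because the tag lives in otherwise-unused high bits, the PFN value round-trips unchanged through tagging and untagging.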

[PATCH v7 0/7] KVMPPC driver to manage secure guest pages

2019-08-22 Thread Bharata B Rao
Hi,

A pseries guest can be run as a secure guest on Ultravisor-enabled
POWER platforms. On such platforms, this driver will be used to manage
the movement of guest pages between the normal memory managed by
hypervisor(HV) and secure memory managed by Ultravisor(UV).

Private ZONE_DEVICE memory equal to the amount of secure memory
available in the platform for running secure guests is created.
Whenever a page belonging to the guest becomes secure, a page from
this private device memory is used to represent and track that secure
page on the HV side. The movement of pages between normal and secure
memory is done via migrate_vma_pages(). The reverse movement is driven
via pagemap_ops.migrate_to_ram().

The page-in or page-out requests from UV will come to HV as hcalls and
HV will call back into UV via uvcalls to satisfy these page requests.
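That dispatch can be pictured with a minimal userspace sketch. The hcall numbers match the hvcall.h hunk in patch 1, but the handler bodies and the H_UNSUPPORTED value here are placeholders, not the kernel's definitions:

```c
#include <assert.h>

/* Hcall numbers from the patch; return codes are placeholders. */
#define H_SVM_PAGE_IN	0xEF00
#define H_SVM_PAGE_OUT	0xEF04
#define H_SUCCESS	0L
#define H_UNSUPPORTED	(-1L)	/* illustrative value only */

/* Stand-in handlers: the real ones move pages via migrate_vma_pages(). */
static long h_svm_page_in(unsigned long gra)  { (void)gra; return H_SUCCESS; }
static long h_svm_page_out(unsigned long gra) { (void)gra; return H_SUCCESS; }

/* HV-side dispatch of UV-originated page requests. */
static long do_hcall(unsigned long nr, unsigned long gra)
{
	switch (nr) {
	case H_SVM_PAGE_IN:	return h_svm_page_in(gra);
	case H_SVM_PAGE_OUT:	return h_svm_page_out(gra);
	default:		return H_UNSUPPORTED;
	}
}
```

Unknown hcall numbers fall through to H_UNSUPPORTED, which is also what the stub versions of these handlers return when CONFIG_PPC_UV is not set.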

These patches are against hmm.git
(https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=hmm)

plus

Claudio Carvalho's base ultravisor enablement patchset v6
(https://lore.kernel.org/linuxppc-dev/20190822034838.27876-1-cclau...@linux.ibm.com/T/#t)

These patches along with Claudio's above patches are required to
run a secure pseries guest on KVM. This patchset is based on hmm.git
because hmm.git has migrate_vma cleanup and not-device memremap_pages
patchsets that are required by this patchset.

Changes in v7
=============
- The major change in this version is to not create a char device but
  instead use the not-device versions of memremap_pages and
  request_free_mem_region (Christoph Hellwig)
- Other changes
  * Addressed all the changes suggested by Christoph Hellwig for v6.
  * Removed MIGRATE_VMA_HELPER dependency
  * Switched to using of_find_compatible_node() and not doing
find by path (Thiago Jung Bauermann)
  * Moved kvmppc_rmap_is_devm_pfn to kvm_host.h
  * Updated comments
  * use @page_shift argument in H_SVM_PAGE_OUT instead of PAGE_SHIFT
  * Proper handling of return val from kvmppc_devm_fault_migrate_alloc_and_copy

v6: 
https://lore.kernel.org/linuxppc-dev/20190809084108.30343-1-bhar...@linux.ibm.com/T/#t

Anshuman Khandual (1):
  KVM: PPC: Ultravisor: Add PPC_UV config option

Bharata B Rao (6):
  kvmppc: Driver to manage pages of secure guest
  kvmppc: Shared pages support for secure guests
  kvmppc: H_SVM_INIT_START and H_SVM_INIT_DONE hcalls
  kvmppc: Handle memory plug/unplug to secure VM
  kvmppc: Radix changes for secure guest
  kvmppc: Support reset of secure guest

 Documentation/virtual/kvm/api.txt  |  19 +
 arch/powerpc/Kconfig   |  17 +
 arch/powerpc/include/asm/hvcall.h  |   9 +
 arch/powerpc/include/asm/kvm_book3s_devm.h |  47 ++
 arch/powerpc/include/asm/kvm_host.h|  39 ++
 arch/powerpc/include/asm/kvm_ppc.h |   2 +
 arch/powerpc/include/asm/ultravisor-api.h  |   6 +
 arch/powerpc/include/asm/ultravisor.h  |  36 ++
 arch/powerpc/kvm/Makefile  |   3 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  22 +
 arch/powerpc/kvm/book3s_hv.c   | 113 
 arch/powerpc/kvm/book3s_hv_devm.c  | 614 +
 arch/powerpc/kvm/powerpc.c |  12 +
 include/uapi/linux/kvm.h   |   1 +
 14 files changed, 940 insertions(+)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_devm.h
 create mode 100644 arch/powerpc/kvm/book3s_hv_devm.c

-- 
2.21.0



Re: [PATCH v1 5/5] mm/memory_hotplug: Remove zone parameter from __remove_pages()

2019-08-22 Thread David Hildenbrand
On 21.08.19 17:40, David Hildenbrand wrote:
> No longer in use, let's drop it. We no longer access the zone of
> possibly never-onlined memory (and therefore don't read garbage in
> these scenarios).
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Tony Luck 
> Cc: Fenghua Yu 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Heiko Carstens 
> Cc: Vasily Gorbik 
> Cc: Christian Borntraeger 
> Cc: Yoshinori Sato 
> Cc: Rich Felker 
> Cc: Dave Hansen 
> Cc: Andy Lutomirski 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: x...@kernel.org
> Cc: Andrew Morton 
> Cc: Mark Rutland 
> Cc: Steve Capper 
> Cc: Mike Rapoport 
> Cc: Anshuman Khandual 
> Cc: Yu Zhao 
> Cc: Jun Yao 
> Cc: Robin Murphy 
> Cc: Michal Hocko 
> Cc: Oscar Salvador 
> Cc: "Matthew Wilcox (Oracle)" 
> Cc: Christophe Leroy 
> Cc: "Aneesh Kumar K.V" 
> Cc: Pavel Tatashin 
> Cc: Gerald Schaefer 
> Cc: Halil Pasic 
> Cc: Tom Lendacky 
> Cc: Greg Kroah-Hartman 
> Cc: Masahiro Yamada 
> Cc: Dan Williams 
> Cc: Wei Yang 
> Cc: Qian Cai 
> Cc: Jason Gunthorpe 
> Cc: Logan Gunthorpe 
> Cc: Ira Weiny 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-i...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Signed-off-by: David Hildenbrand 
> ---
>  arch/arm64/mm/mmu.c| 4 +---
>  arch/ia64/mm/init.c| 4 +---
>  arch/powerpc/mm/mem.c  | 3 +--
>  arch/s390/mm/init.c| 4 +---
>  arch/sh/mm/init.c  | 4 +---
>  arch/x86/mm/init_32.c  | 4 +---
>  arch/x86/mm/init_64.c  | 4 +---
>  include/linux/memory_hotplug.h | 4 ++--
>  mm/memory_hotplug.c| 6 +++---
>  mm/memremap.c  | 3 +--
>  10 files changed, 13 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index e67bab4d613e..b3843aff12bf 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1080,7 +1080,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  {
>   unsigned long start_pfn = start >> PAGE_SHIFT;
>   unsigned long nr_pages = size >> PAGE_SHIFT;
> - struct zone *zone;
>  
>   /*
>* FIXME: Cleanup page tables (also in arch_add_memory() in case
> @@ -1089,7 +1088,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>* unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
>* unlocked yet.
>*/
> - zone = page_zone(pfn_to_page(start_pfn));
> - __remove_pages(zone, start_pfn, nr_pages, altmap);
> + __remove_pages(start_pfn, nr_pages, altmap);
>  }
>  #endif
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index bf9df2625bc8..a6dd80a2c939 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -689,9 +689,7 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  {
>   unsigned long start_pfn = start >> PAGE_SHIFT;
>   unsigned long nr_pages = size >> PAGE_SHIFT;
> - struct zone *zone;
>  
> - zone = page_zone(pfn_to_page(start_pfn));
> - __remove_pages(zone, start_pfn, nr_pages, altmap);
> + __remove_pages(start_pfn, nr_pages, altmap);
>  }
>  #endif
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 9191a66b3bc5..7351c44c435a 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -130,10 +130,9 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
>  {
>   unsigned long start_pfn = start >> PAGE_SHIFT;
>   unsigned long nr_pages = size >> PAGE_SHIFT;
> - struct page *page = pfn_to_page(start_pfn) + vmem_altmap_offset(altmap);
>   int ret;
>  
> - __remove_pages(page_zone(page), start_pfn, nr_pages, altmap);
> + __remove_pages(start_pfn, nr_pages, altmap);
>  
>   /* Remove htab bolted mappings for this section of memory */
>   start = (unsigned long)__va(start);
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 20340a03ad90..6f13eb66e375 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -296,10 +296,8 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  {
>   unsigned long start_pfn = start >> PAGE_SHIFT;
>   unsigned long nr_pages = size >> PAGE_SHIFT;
> - struct zone *zone;
>  
> - zone = page_zone(pfn_to_page(start_pfn));
> - __remove_pages(zone, start_pfn, nr_pages, altmap);
> + __remove_pages(start_pfn, nr_pages, altmap);
>   vmem_remove_mapping(start, size);
>  }
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> index dfdbaa50946e..d1b1ff2be17a 100644
> --- a/arch/sh/mm/init.c
> +++ b/arch/sh/mm/init.c
> @@ -434,9 +434,7 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  {
>   unsigned long start_pfn = PFN_DOWN(start);
>   unsigned long nr_pages = size >> PAGE_SHIFT;
> - struct zone *zone;
>  
> - zone = page_zone(pfn_to_page(start_pfn));
> - 
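The shape of this cleanup — dropping an argument the callee can derive for itself — looks like this in miniature. The names and the zone mapping below are made up for illustration and are not the mm code:

```c
#include <assert.h>

/* Hypothetical stand-in for page_zone(pfn_to_page(start_pfn)). */
static int zone_of(unsigned long start_pfn)
{
	return start_pfn >> 16;	/* made-up mapping, illustration only */
}

/* Before: every caller computed the zone and passed it down. */
static int remove_pages_old(int zone, unsigned long start_pfn)
{
	(void)start_pfn;
	return zone;
}

/* After: the callee derives the zone itself, so every caller shrinks
 * and no caller can hand in a stale or garbage zone. */
static int remove_pages_new(unsigned long start_pfn)
{
	return zone_of(start_pfn);
}
```

The behaviour is unchanged for valid input, but the failure mode the patch describes (callers reading the zone of never-onlined memory) is gone because the lookup now happens in one place.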

Re: [PATCH] powerpc/8xx: drop unused self-modifying code alternative to FixupDAR.

2019-08-22 Thread Joakim Tjernlund
On Wed, 2019-08-21 at 20:00 +, Christophe Leroy wrote:
> 
> The code which fixups the DAR on TLB errors for dbcX instructions
> has a self-modifying code alternative that has never been used.
> 
> Drop it.

Argh, my masterpiece from way back :)
But it is time for it to go.

Reviewed-by: Joakim Tjernlund 

 Jocke
> 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/head_8xx.S | 24 
>  1 file changed, 24 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
> index b8ca5b43e587..19f583e18402 100644
> --- a/arch/powerpc/kernel/head_8xx.S
> +++ b/arch/powerpc/kernel/head_8xx.S
> @@ -575,8 +575,6 @@ InstructionBreakpoint:
>   * by decoding the registers used by the dcbx instruction and adding them.
>   * DAR is set to the calculated address.
>   */
> - /* define if you don't want to use self modifying code */
> -#define NO_SELF_MODIFYING_CODE
>  FixupDAR:/* Entry point for dcbx workaround. */
> mtspr   SPRN_M_TW, r10
> /* fetch instruction from memory. */
> @@ -640,27 +638,6 @@ FixupDAR:/* Entry point for dcbx workaround. */
> rlwinm  r10, r10,0,7,5  /* Clear store bit for buggy dcbst insn */
> mtspr   SPRN_DSISR, r10
>  142:   /* continue, it was a dcbx, dcbi instruction. */
> -#ifndef NO_SELF_MODIFYING_CODE
> -   andis.  r10,r11,0x1f/* test if reg RA is r0 */
> -   li  r10,modified_instr@l
> -   dcbtst  r0,r10  /* touch for store */
> -   rlwinm  r11,r11,0,0,20  /* Zero lower 10 bits */
> -   orisr11,r11,640 /* Transform instr. to a "add r10,RA,RB" */
> -   ori r11,r11,532
> -   stw r11,0(r10)  /* store add/and instruction */
> -   dcbf0,r10   /* flush new instr. to memory. */
> -   icbi0,r10   /* invalidate instr. cache line */
> -   mfspr   r11, SPRN_SPRG_SCRATCH1 /* restore r11 */
> -   mfspr   r10, SPRN_SPRG_SCRATCH0 /* restore r10 */
> -   isync   /* Wait until new instr is loaded from memory */
> -modified_instr:
> -   .space  4   /* this is where the add instr. is stored */
> -   bne+143f
> -   subfr10,r0,r10  /* r10=r10-r0, only if reg RA is r0 */
> -143:   mtdar   r10 /* store faulting EA in DAR */
> -   mfspr   r10,SPRN_M_TW
> -   b   DARFixed/* Go back to normal TLB handling */
> -#else
> mfctr   r10
> mtdar   r10 /* save ctr reg in DAR */
> rlwinm  r10, r11, 24, 24, 28/* offset into jump table for reg RB */
> @@ -724,7 +701,6 @@ modified_instr:
> add r10, r10, r11   /* add it */
> mfctr   r11 /* restore r11 */
> b   151b
> -#endif
> 
>  /*
>   * This is where the main kernel code starts.
> --
> 2.13.3
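The jump-table path that remains decodes the RA/RB fields of the faulting dcbx instruction and adds the two register values to rebuild the effective address. A userspace C sketch of that decode (register file and helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Register file stand-in; index = PowerPC GPR number. */
static uint32_t gpr[32];

/* Reconstruct the EA of an X-form dcbx instruction:
 * EA = (RA ? GPR[RA] : 0) + GPR[RB].
 * In IBM bit numbering RA occupies bits 11-15 and RB bits 16-20,
 * i.e. (insn >> 16) & 0x1f and (insn >> 11) & 0x1f. */
static uint32_t dcbx_ea(uint32_t insn)
{
	uint32_t ra = (insn >> 16) & 0x1f;
	uint32_t rb = (insn >> 11) & 0x1f;

	return (ra ? gpr[ra] : 0) + gpr[rb];
}
```

The RA == 0 special case mirrors the `subf r10,r0,r10` fixup in the dropped self-modifying variant: register 0 in the RA field means a literal zero, not the contents of r0.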
> 



[PATCH] powerpc: dump kernel log before carrying out fadump or kdump

2019-08-22 Thread Ganesh Goudar
The die and panic paths in the system reset handler dump the kernel
log to nvram. Since commit 4388c9b3a6ee ("powerpc: Do not send system
reset request through the oops path"), a system reset request is not
allowed to take the die path if fadump or kdump is configured, so we
miss dumping the kernel log to nvram. Call kmsg_dump() before
carrying out fadump or kdump.

Fixes: 4388c9b3a6ee ("powerpc: Do not send system reset request through the oops path")
Reviewed-by: Mahesh Salgaonkar 
Signed-off-by: Ganesh Goudar 
---
 arch/powerpc/kernel/traps.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 11caa0291254..82f43535e686 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -472,6 +472,7 @@ void system_reset_exception(struct pt_regs *regs)
if (debugger(regs))
goto out;
 
+   kmsg_dump(KMSG_DUMP_OOPS);
/*
 * A system reset is a request to dump, so we always send
 * it through the crashdump code (if fadump or kdump are
-- 
2.17.2
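The ordering the one-liner establishes can be sketched in userspace: a hypothetical reset path that flushes the log buffer before handing off to the dump machinery, so the messages survive even when the die path is skipped. All names here are stand-ins, not the kernel functions:

```c
#include <assert.h>
#include <string.h>

static char trace[64];

/* Stand-in for kmsg_dump(KMSG_DUMP_OOPS). */
static void kmsg_dump_stub(void)     { strcat(trace, "dump;"); }

/* Stand-in for the fadump/kdump crash handoff. */
static void crash_handoff_stub(void) { strcat(trace, "crash;"); }

/* With the fix, the log is dumped first, then the crashdump code
 * takes over; previously the handoff happened without the dump. */
static void system_reset_stub(void)
{
	kmsg_dump_stub();
	crash_handoff_stub();
}
```

The point of the patch is exactly this sequencing: the dump hook must run before the crash path, because the crash path never returns to the die path that used to do the dumping.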



[PATCH] powerpc/eeh: Fixup EEH for pSeries hotplug

2019-08-22 Thread Sam Bobroff
Signed-off-by: Sam Bobroff 
---
Let's move the test into eeh_add_device_tree_late().

Thanks,
Sam.

 arch/powerpc/kernel/eeh.c | 2 ++
 arch/powerpc/kernel/of_platform.c | 3 +--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 87edac6f2fd9..e95a7a3c9037 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1328,6 +1328,8 @@ void eeh_add_device_tree_late(struct pci_bus *bus)
 {
struct pci_dev *dev;
 
+   if (eeh_has_flag(EEH_FORCE_DISABLED))
+   return;
list_for_each_entry(dev, >devices, bus_list) {
eeh_add_device_late(dev);
if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
diff --git a/arch/powerpc/kernel/of_platform.c b/arch/powerpc/kernel/of_platform.c
index 11c807468ab5..427fc22f72b6 100644
--- a/arch/powerpc/kernel/of_platform.c
+++ b/arch/powerpc/kernel/of_platform.c
@@ -81,8 +81,7 @@ static int of_pci_phb_probe(struct platform_device *dev)
pcibios_claim_one_bus(phb->bus);
 
/* Finish EEH setup */
-   if (!eeh_has_flag(EEH_FORCE_DISABLED))
-   eeh_add_device_tree_late(phb->bus);
+   eeh_add_device_tree_late(phb->bus);
 
/* Add probed PCI devices to the device model */
pci_bus_add_devices(phb->bus);
-- 
2.22.0.216.g00a2a96fc9



Re: [linux-next][PPC][bisected c7d8b7][gcc 6.4.1] build error at drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1471

2019-08-22 Thread Stephen Rothwell
Hi Abdul,

On Thu, 22 Aug 2019 11:16:51 +0530 Abdul Haleem wrote:
>
> Today's linux-next kernel 5.3.0-rc5-next-20190820 failed to build on my
> powerpc machine
> 
> Build errors:
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c: In function 'amdgpu_exit':
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1471:2: error: implicit
> declaration of function 'mmu_notifier_synchronize'
> [-Werror=implicit-function-declaration]
>   mmu_notifier_synchronize();
>   ^~~~ 
> cc1: some warnings being treated as errors
> make[4]: *** [drivers/gpu/drm/amd/amdgpu/amdgpu_drv.o] Error 1
> make[3]: *** [drivers/gpu/drm/amd/amdgpu] Error 2
> 
> It was introduced with commit c7d8b7 (hmm: use mmu_notifier_get/put for
> 'struct hmm')

This should have been fixed in next-20190821.

-- 
Cheers,
Stephen Rothwell


pgpK9H9hqG16g.pgp
Description: OpenPGP digital signature


Re: [PATCH 2/3] powerpc/pcidn: Make VF pci_dn management CONFIG_PCI_IOV specific

2019-08-22 Thread Sam Bobroff
On Wed, Aug 21, 2019 at 04:26:54PM +1000, Oliver O'Halloran wrote:
> The powerpc PCI code requires that a pci_dn structure exists for all
> devices in the system. This is fine for real devices since at boot a pci_dn
> is created for each PCI device in the DT and it's fine for hotplugged devices
> since the hotplug slot driver will manage the pci_dn's for devices in hotplug
> slots. For SR-IOV, we need the platform / pcibios to manage the pci_dn for
> virtual functions since firmware is unaware of VFs, and they aren't
> "hot plugged" in the traditional sense.
> 
> Management of the pci_dn is handled by the, poorly named, functions:
> add_pci_dev_data() and remove_pci_dev_data(). The entire body of these
> functions is #ifdef`ed around CONFIG_PCI_IOV and they cannot be used
> in any other context, so make them only available when CONFIG_PCI_IOV
> is selected, and rename them to reflect their actual usage rather than
> having them masquerade as generic code.
> 
> Signed-off-by: Oliver O'Halloran 

Nice cleanup,

Reviewed-by: Sam Bobroff 
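The declaration pattern the patch moves to — prototypes that exist only under the config option, with the callers guarded the same way — can be shown with a compile-time sketch. The feature macro, struct, and function names below are invented for illustration:

```c
#include <assert.h>

/* Flip this to mimic building without CONFIG_PCI_IOV. */
#define DEMO_PCI_IOV 1

struct demo_pdn { int nr_vfs; };

#if DEMO_PCI_IOV
/* "Real" helper: only compiled when the feature is configured in. */
static struct demo_pdn *add_sriov_vf_pdns_demo(struct demo_pdn *parent, int nvfs)
{
	parent->nr_vfs = nvfs;
	return parent;
}
#endif

static int enable_vfs(struct demo_pdn *pdn, int nvfs)
{
#if DEMO_PCI_IOV
	return add_sriov_vf_pdns_demo(pdn, nvfs) ? 0 : -1;
#else
	/* Caller compiled out too, so no always-present wrapper is needed;
	 * this mirrors the patch dropping the masquerading generic code. */
	(void)pdn; (void)nvfs;
	return -1;
#endif
}
```

Guarding caller and callee together means the disabled build simply contains neither, instead of carrying an empty wrapper that pretends to be generic.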

> ---
>  arch/powerpc/include/asm/pci-bridge.h |  7 +--
>  arch/powerpc/kernel/pci_dn.c  | 15 +--
>  arch/powerpc/platforms/powernv/pci-ioda.c |  4 ++--
>  arch/powerpc/platforms/pseries/pci.c  |  4 ++--
>  4 files changed, 14 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index ea6ec65..69f4cb3 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -223,12 +223,15 @@ struct pci_dn {
>  extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
>  int devfn);
>  extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev);
> -extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev);
> -extern void remove_dev_pci_data(struct pci_dev *pdev);
>  extern struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
>  struct device_node *dn);
>  extern void pci_remove_device_node_info(struct device_node *dn);
>  
> +#ifdef CONFIG_PCI_IOV
> +struct pci_dn *add_sriov_vf_pdns(struct pci_dev *pdev);
> +void remove_sriov_vf_pdns(struct pci_dev *pdev);
> +#endif
> +
>  static inline int pci_device_from_OF_node(struct device_node *np,
> u8 *bus, u8 *devfn)
>  {
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index 795c4e3..24da1d8 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -125,7 +125,7 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
>  }
>  
>  #ifdef CONFIG_PCI_IOV
> -static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
> +static struct pci_dn *add_one_sriov_vf_pdn(struct pci_dn *parent,
>  int vf_index,
>  int busno, int devfn)
>  {
> @@ -151,11 +151,9 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
>  
>   return pdn;
>  }
> -#endif
>  
> -struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
> +struct pci_dn *add_sriov_vf_pdns(struct pci_dev *pdev)
>  {
> -#ifdef CONFIG_PCI_IOV
>   struct pci_dn *parent, *pdn;
>   int i;
>  
> @@ -176,7 +174,7 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
>   for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
>   struct eeh_dev *edev __maybe_unused;
>  
> - pdn = add_one_dev_pci_data(parent, i,
> + pdn = add_one_sriov_vf_pdn(parent, i,
>  pci_iov_virtfn_bus(pdev, i),
>  pci_iov_virtfn_devfn(pdev, i));
>   if (!pdn) {
> @@ -192,14 +190,11 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
>   edev->physfn = pdev;
>  #endif /* CONFIG_EEH */
>   }
> -#endif /* CONFIG_PCI_IOV */
> -
>   return pci_get_pdn(pdev);
>  }
>  
> -void remove_dev_pci_data(struct pci_dev *pdev)
> +void remove_sriov_vf_pdns(struct pci_dev *pdev)
>  {
> -#ifdef CONFIG_PCI_IOV
>   struct pci_dn *parent;
>   struct pci_dn *pdn, *tmp;
>   int i;
> @@ -271,8 +266,8 @@ void remove_dev_pci_data(struct pci_dev *pdev)
>   kfree(pdn);
>   }
>   }
> -#endif /* CONFIG_PCI_IOV */
>  }
> +#endif /* CONFIG_PCI_IOV */
>  
>  struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
>   struct device_node *dn)
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index d8080558d0..f1fa489 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1719,14 +1719,14 @@ int pnv_pcibios_sriov_disable(struct pci_dev *pdev)
>   pnv_pci_sriov_disable(pdev);
>  
>   /* Release PCI data */
> - remove_dev_pci_data(pdev);
> + 

Re: [PATCH 3/3] powerpc/pcidn: Warn when sriov pci_dn management is used incorrectly

2019-08-22 Thread Sam Bobroff
On Wed, Aug 21, 2019 at 04:26:55PM +1000, Oliver O'Halloran wrote:
> These functions can only be used on a SR-IOV capable physical function and
> they're only called in pcibios_sriov_enable / disable. Make them emit a
> warning in the future if they're used incorrectly and remove the dead
> code that checks if the device is a VF.
> 
> Signed-off-by: Oliver O'Halloran 

Looks good, but you might want to consider using WARN_ON_ONCE() just in
case it gets hit a lot.

Reviewed-by: Sam Bobroff 
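The WARN_ON_ONCE() suggestion boils down to limiting the splat to a single firing per call site. A userspace analogue of the two macros (demo names; the real kernel macros live in include/asm-generic/bug.h and do more):

```c
#include <assert.h>
#include <stdio.h>

static int warn_count;

/* Fires every time the condition holds, like WARN_ON(). */
#define WARN_ON_DEMO(cond) \
	((cond) ? (warn_count++, fprintf(stderr, "warn\n"), 1) : 0)

/* Fires at most once per call site, like WARN_ON_ONCE(); uses a GNU C
 * statement expression and a per-site static flag. */
#define WARN_ON_ONCE_DEMO(cond) ({			\
	static int warned;				\
	int c = !!(cond);				\
	if (c && !warned) {				\
		warned = 1;				\
		warn_count++;				\
		fprintf(stderr, "warn (once)\n");	\
	}						\
	c;						\
})
```

Both forms still return the condition, so they can sit inside an if(); the once variant just stops flooding the log when, as Sam notes, the bad path is hit repeatedly.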

> ---
>  arch/powerpc/kernel/pci_dn.c | 17 +++--
>  1 file changed, 3 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index 24da1d8..69dafc3 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -158,8 +158,8 @@ struct pci_dn *add_sriov_vf_pdns(struct pci_dev *pdev)
>   int i;
>  
>   /* Only support IOV for now */
> - if (!pdev->is_physfn)
> - return pci_get_pdn(pdev);
> + if (WARN_ON(!pdev->is_physfn))
> + return NULL;
>  
>   /* Check if VFs have been populated */
>   pdn = pci_get_pdn(pdev);
> @@ -199,19 +199,8 @@ void remove_sriov_vf_pdns(struct pci_dev *pdev)
>   struct pci_dn *pdn, *tmp;
>   int i;
>  
> - /*
> -  * VF and VF PE are created/released dynamically, so we need to
> -  * bind/unbind them.  Otherwise the VF and VF PE would be mismatched
> -  * when re-enabling SR-IOV.
> -  */
> - if (pdev->is_virtfn) {
> - pdn = pci_get_pdn(pdev);
> - pdn->pe_number = IODA_INVALID_PE;
> - return;
> - }
> -
>   /* Only support IOV PF for now */
> - if (!pdev->is_physfn)
> + if (WARN_ON(!pdev->is_physfn))
>   return;
>  
>   /* Check if VFs have been populated */
> -- 
> 2.9.5
> 


signature.asc
Description: PGP signature