Re: linux-next: build failure after merge of the powerpc tree

2016-01-12 Thread Michael Ellerman
On Wed, 2016-01-13 at 11:16 +0530, Aneesh Kumar K.V wrote:
> Michael Ellerman  writes:
> > On Thu, 2016-01-07 at 19:16 +1100, Stephen Rothwell wrote:
> > > After merging the powerpc tree, today's linux-next build (powerpc64
> > > allnoconfig) failed like this:
> > > 
> > > arch/powerpc/mm/hash_utils_64.c: In function 'get_paca_psize':
> > > arch/powerpc/mm/hash_utils_64.c:869:19: error: 'struct paca_struct' has no member named 'context'
> > >   return get_paca()->context.user_psize;
> > >                    ^
> > > arch/powerpc/mm/hash_utils_64.c:870:1: error: control reaches end of non-void function [-Werror=return-type]
> > >  }
> > >  ^
> > > 
> > > Caused by commit
> > > 
> > >   2fc251a8dda5 ("powerpc: Copy only required pieces of the mm_context_t 
> > > to the paca")
> > 
> > Well that's rather embarrassing, for Mikey ;D

> > > This build has CONFIG_PPC_MM_SLICES not set ...
> > 
> > Ugh, but it would seem none of our defconfigs do :/
> 
> 4K page size with hugetlb disabled will get that 

Yeah, but none of our defconfigs do that.

I've got a kisskb target for it now:

  http://kisskb.ellerman.id.au/kisskb/target/28577/

cheers
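
For readers following along, the failure mode can be shown in miniature. This is only a sketch under stated assumptions, not the actual paca code: `struct demo_paca`, `DEMO_MM_SLICES` and both field names are hypothetical stand-ins for the real structure and for CONFIG_PPC_MM_SLICES. The point is that each preprocessor branch may only touch fields that exist in that configuration, so a stale reference goes unnoticed until some config actually builds the other branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_PPC_MM_SLICES. Leave it undefined to
 * build the branch that none of the defconfigs was exercising. */
/* #define DEMO_MM_SLICES 1 */

/* Hypothetical miniature of a per-CPU structure that carries only the
 * pieces of the mm context it needs, not a whole 'context' member. */
struct demo_paca {
#ifdef DEMO_MM_SLICES
	unsigned int slice_psize;
#else
	unsigned int user_psize;
#endif
};

/* Each branch reads a field that exists in its own configuration.
 * Referring to a removed member here would only fail to compile when
 * the branch containing it is built. */
static unsigned int demo_get_psize(const struct demo_paca *paca)
{
	assert(paca != NULL);
#ifdef DEMO_MM_SLICES
	return paca->slice_psize;
#else
	return paca->user_psize;
#endif
}
```

Building the same file once with and once without `-DDEMO_MM_SLICES` is the small-scale equivalent of adding a build target for the unusual config.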

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC PATCH V1 14/33] powerpc/mm: Use helper for finding pte bits mapping I/O area

2016-01-12 Thread Benjamin Herrenschmidt
On Wed, 2016-01-13 at 11:37 +0530, Aneesh Kumar K.V wrote:
> Benjamin Herrenschmidt  writes:
> 
> > On Tue, 2016-01-12 at 10:42 +0300, Denis Kirjanov wrote:
> > > > +static inline unsigned long pte_io_cache_bits(void)
> > > > +{
> > > > + return _PAGE_NO_CACHE | _PAGE_GUARDED;
> > > > +}
> > > This could be just plain #define
> > 
> > Or just use pgprot_noncached()
> > 
> #define pgprot_noncached(prot)  (__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) | \
>                                           _PAGE_NO_CACHE | _PAGE_GUARDED))
> 
> 
> That will return me a pgprot_t.  I can fix that by using
> pgprot_val(pgprot_noncached(0)). Is that what you are suggesting ?

Shouldn't ioremap just use pgprot_noncached(PAGE_KERNEL) or similar?

Cheers,
Ben.
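
The two options discussed in this thread can be compared with a small stand-alone sketch. Everything prefixed `demo_`/`DEMO_` is hypothetical: the bit values are made up for illustration and the wrapper type only mimics the shape of pgprot_t. It shows that recovering bare bits from an empty protection gives the same two bits as the proposed helper, and that applying the macro to a full base protection replaces the cache-control field in one step.

```c
#include <assert.h>

/* Illustrative bit values only; the real ones live in the powerpc
 * page-table headers and differ between page-table formats. */
#define DEMO_PAGE_GUARDED    0x008UL
#define DEMO_PAGE_COHERENT   0x010UL
#define DEMO_PAGE_NO_CACHE   0x020UL
#define DEMO_PAGE_WRITETHRU  0x040UL
#define DEMO_PAGE_CACHE_CTL  (DEMO_PAGE_GUARDED | DEMO_PAGE_COHERENT | \
			      DEMO_PAGE_NO_CACHE | DEMO_PAGE_WRITETHRU)

/* Wrapper type in the spirit of pgprot_t, so that raw bits and
 * protections cannot be mixed up by accident. */
typedef struct { unsigned long pgprot; } demo_pgprot_t;
#define demo_pgprot(x)      ((demo_pgprot_t){ (x) })
#define demo_pgprot_val(x)  ((x).pgprot)

/* Same shape as the pgprot_noncached() quoted above. */
static demo_pgprot_t demo_pgprot_noncached(demo_pgprot_t prot)
{
	return demo_pgprot((demo_pgprot_val(prot) & ~DEMO_PAGE_CACHE_CTL) |
			   DEMO_PAGE_NO_CACHE | DEMO_PAGE_GUARDED);
}

/* Option 1: recover the bare I/O bits from an empty protection. */
static unsigned long demo_io_cache_bits(void)
{
	return demo_pgprot_val(demo_pgprot_noncached(demo_pgprot(0)));
}

/* Option 2: apply the macro to a complete base protection. */
static demo_pgprot_t demo_ioremap_prot(demo_pgprot_t base)
{
	demo_pgprot_t prot = demo_pgprot_noncached(base);

	/* Within the cache-control field only the two I/O bits remain. */
	assert((demo_pgprot_val(prot) & DEMO_PAGE_CACHE_CTL) ==
	       demo_io_cache_bits());
	return prot;
}
```

Option 2 avoids the unwrap and re-wrap round trip, which appears to be the point of the suggestion above.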


[PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled

2016-01-12 Thread Aravinda Prasad
Enhance KVM to cause a guest exit with KVM_EXIT_NMI
exit reasons upon a machine check exception (MCE) in
the guest address space if the KVM_CAP_PPC_FWNMI
capability is enabled (instead of delivering 0x200
interrupt to guest). This enables QEMU to build error
log and deliver machine check exception to guest via
guest registered machine check handler.

This approach simplifies the delivering of machine
check exception to guest OS compared to the earlier
approach of KVM directly invoking 0x200 guest interrupt
vector. In the earlier approach QEMU was enhanced to
patch the 0x200 interrupt vector during boot. The
patched code at 0x200 issued a private hcall to pass
the control to QEMU to build the error log.

This design/approach is based on the feedback for the
QEMU patches to handle machine check exception. Details
of earlier approach of handling machine check exception
in QEMU and related discussions can be found at:

https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
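
As a rough illustration of the userspace side described above, the sketch below shows how a run loop might dispatch on the new exit reason. It is not QEMU code: all `demo_`/`DEMO_` names are hypothetical, the real exit-reason constants come from <linux/kvm.h>, and treating an NMI exit without a registered handler as fatal is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the exit reasons a VMM run loop sees. */
enum demo_exit_reason { DEMO_EXIT_IO, DEMO_EXIT_NMI, DEMO_EXIT_UNKNOWN };

/* What the run loop decides to do next. */
enum demo_action { DEMO_RESUME, DEMO_DELIVER_MCE, DEMO_ABORT };

struct demo_guest {
	/* Non-zero once the guest has registered a machine check handler. */
	int has_registered_handler;
};

static enum demo_action demo_handle_exit(const struct demo_guest *g,
					 enum demo_exit_reason reason)
{
	assert(g != NULL);

	switch (reason) {
	case DEMO_EXIT_IO:
		return DEMO_RESUME;
	case DEMO_EXIT_NMI:
		/* Build the error log and enter the guest's registered
		 * handler; without one, this sketch gives up (assumption). */
		return g->has_registered_handler ? DEMO_DELIVER_MCE
						 : DEMO_ABORT;
	default:
		/* An exit type the VMM does not understand is fatal. */
		return DEMO_ABORT;
	}
}
```

The `default` arm is what an older VMM would hit for the new exit type, which is why the companion patch gates the behaviour behind a capability.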

Signed-off-by: Aravinda Prasad 
---
 arch/powerpc/kvm/book3s_hv.c|   12 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   48 +++
 2 files changed, 26 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a7352b5..4fa03d0 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -858,15 +858,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
r = RESUME_GUEST;
break;
case BOOK3S_INTERRUPT_MACHINE_CHECK:
-   /*
-* Deliver a machine check interrupt to the guest.
-* We have to do this, even if the host has handled the
-* machine check, because machine checks use SRR0/1 and
-* the interrupt might have trashed guest state in them.
-*/
-   kvmppc_book3s_queue_irqprio(vcpu,
-   BOOK3S_INTERRUPT_MACHINE_CHECK);
-   r = RESUME_GUEST;
+   /* Exit to guest with KVM_EXIT_NMI as exit reason */
+   run->exit_reason = KVM_EXIT_NMI;
+   r = RESUME_HOST;
break;
case BOOK3S_INTERRUPT_PROGRAM:
{
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 3c6badc..84e32a3 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
stb r0, HSTATE_HWTHREAD_REQ(r13)
 
/*
-* For external and machine check interrupts, we need
-* to call the Linux handler to process the interrupt.
-* We do that by jumping to absolute address 0x500 for
-* external interrupts, or the machine_check_fwnmi label
-* for machine checks (since firmware might have patched
-* the vector area at 0x200).  The [h]rfid at the end of the
-* handler will return to the book3s_hv_interrupts.S code.
-* For other interrupts we do the rfid to get back
-* to the book3s_hv_interrupts.S code here.
+* For external interrupts we need to call the Linux
+* handler to process the interrupt. We do that by jumping
+* to absolute address 0x500 for external interrupts.
+* The [h]rfid at the end of the handler will return to
+* the book3s_hv_interrupts.S code. For other interrupts
+* we do the rfid to get back to the book3s_hv_interrupts.S
+* code here.
 */
ld  r8, 112+PPC_LR_STKOFF(r1)
addi    r1, r1, 112
ld  r7, HSTATE_HOST_MSR(r13)
 
-   cmpwi   cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
beq 11f
cmpwi   r12, BOOK3S_INTERRUPT_H_DOORBELL
@@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtmsrd  r6, 1   /* Clear RI in MSR */
mtsrr0  r8
mtsrr1  r7
-   beq cr1, 13f/* machine check */
RFI
 
/* On POWER7, we have external interrupts set to use HSRR0/1 */
@@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_HSRR1, r7
ba  0x500
 
-13:    b   machine_check_fwnmi
-
 14:    mtspr   SPRN_HSRR0, r8
mtspr   SPRN_HSRR1, r7
b   hmi_exception_after_realmode
@@ -2390,15 +2384,13 @@ machine_check_realmode:
ld  r9, HSTATE_KVM_VCPU(r13)
li  r12, BOOK3S_INTERRUPT_MACHINE_CHECK
/*
-* Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
-* machine check interrupt (set HSRR0 to 0x200). And for handled
-* errors (no-fatal), just go back to guest execution with current
-* HSRR0 instead of exiting guest. This new approach will inject
-* machine check to guest for fatal error causing guest to

[PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour

2016-01-12 Thread Aravinda Prasad
This patch introduces a new KVM capability to control
how KVM behaves on machine check exception (MCE).
Without this capability, KVM redirects machine check
exceptions to guest's 0x200 vector if the address in
error belongs to the guest. With this capability KVM
causes a guest exit with NMI exit reason.

This is required to avoid problems if a new kernel/KVM
is used with an old QEMU for guests that don't issue
"ibm,nmi-register". As old QEMU does not understand the
NMI exit type, it treats it as a fatal error. However,
the guest could have handled the machine check error
if the exception was delivered to guest's 0x200 interrupt
vector instead of NMI exit in case of old QEMU.

QEMU part can be found at:
http://lists.nongnu.org/archive/html/qemu-ppc/2015-12/msg00199.html
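
The compatibility argument above boils down to an opt-in switch. The following is a minimal sketch of that decision, assuming hypothetical `demo_` names throughout; the host-handled route for addresses outside the guest is an assumption of the sketch, since the description only covers errors in guest memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where a machine check ends up in this sketch. */
enum demo_mce_route {
	DEMO_MCE_HOST,          /* address not in the guest (assumption) */
	DEMO_MCE_GUEST_VECTOR,  /* legacy: guest's 0x200 vector */
	DEMO_MCE_EXIT_NMI       /* new: exit to userspace */
};

struct demo_vm {
	bool fwnmi_enabled;     /* mirrors the per-VM flag the patch adds */
};

/* Userspace that understands the new exit type opts in explicitly. */
static int demo_enable_fwnmi(struct demo_vm *vm)
{
	assert(vm != NULL);
	vm->fwnmi_enabled = true;
	return 0;
}

static enum demo_mce_route demo_route_mce(const struct demo_vm *vm,
					  bool addr_in_guest)
{
	assert(vm != NULL);

	if (!addr_in_guest)
		return DEMO_MCE_HOST;
	/* An old VMM never sets the flag, so it keeps the old behaviour. */
	return vm->fwnmi_enabled ? DEMO_MCE_EXIT_NMI : DEMO_MCE_GUEST_VECTOR;
}
```

Because the default is the legacy route, a new kernel paired with an old VMM behaves exactly as before.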

Change Log v3:
  - Split the patch into 2. First patch introduces the
new capability while the second one enhances KVM to
redirect MCE.
  - Fix access width bug
  - Rebased to v4.4-rc7

Change Log v2:
  - Added KVM capability

Signed-off-by: Aravinda Prasad 
---
 arch/powerpc/include/asm/kvm_host.h |1 +
 arch/powerpc/kernel/asm-offsets.c   |1 +
 arch/powerpc/kvm/powerpc.c  |7 +++
 include/uapi/linux/kvm.h|1 +
 4 files changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index cfa758c..9ac2b84 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -243,6 +243,7 @@ struct kvm_arch {
int hpt_cma_alloc;
struct dentry *debugfs_dir;
struct dentry *htab_dentry;
+   u8 fwnmi_enabled;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 221d584..6a4e81a 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -506,6 +506,7 @@ int main(void)
DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
+   DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6fd2405..a8399b5 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -570,6 +570,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = 1;
break;
 #endif
+   case KVM_CAP_PPC_FWNMI:
+   r = 1;
+   break;
default:
r = 0;
break;
@@ -1132,6 +1135,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu 
*vcpu,
break;
}
 #endif /* CONFIG_KVM_XICS */
+   case KVM_CAP_PPC_FWNMI:
+   r = 0;
+   vcpu->kvm->arch.fwnmi_enabled = true;
+   break;
default:
r = -EINVAL;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 03f3618..d8a07b5 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -831,6 +831,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_GUEST_DEBUG_HW_WPS 120
 #define KVM_CAP_SPLIT_IRQCHIP 121
 #define KVM_CAP_IOEVENTFD_ANY_LENGTH 122
+#define KVM_CAP_PPC_FWNMI 123
 
 #ifdef KVM_CAP_IRQ_ROUTING
 


[PATCH V2] powerpc/powernv: Remove support for p5ioc2

2016-01-12 Thread Russell Currey
"p5ioc2 is used by approximately 2 machines in the world, and has never
ever been a supported configuration."

The code for p5ioc2 is essentially unused and complicates what is already
a very complicated codebase.  Its removal is essentially a "free win" in
the effort to simplify the powernv PCI code.

In addition, support for p5ioc2 has been dropped from skiboot.  There's no
reason to keep it around in the kernel.

Signed-off-by: Russell Currey 
---
V2: Remove pointless union and rebase on -next

Tested on a P7IOC machine and a PHB3 machine.

Skiboot p5ioc2 removal patch: https://patchwork.ozlabs.org/patch/544898/
---
 arch/powerpc/platforms/powernv/Makefile |   2 +-
 arch/powerpc/platforms/powernv/pci-p5ioc2.c | 271 
 arch/powerpc/platforms/powernv/pci.c|  15 +-
 arch/powerpc/platforms/powernv/pci.h| 152 
 4 files changed, 74 insertions(+), 366 deletions(-)
 delete mode 100644 arch/powerpc/platforms/powernv/pci-p5ioc2.c

diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index f1516b5..cd9711e 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -5,7 +5,7 @@ obj-y   += opal-msglog.o opal-hmi.o 
opal-power.o opal-irqchip.o
 obj-y  += opal-kmsg.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
-obj-$(CONFIG_PCI)  += pci.o pci-p5ioc2.o pci-ioda.o npu-dma.o
+obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
 obj-$(CONFIG_EEH)  += eeh-powernv.o
 obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)   += opal-memory-errors.o
diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c 
b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
deleted file mode 100644
index f2bdfea..000
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ /dev/null
@@ -1,271 +0,0 @@
-/*
- * Support PCI/PCIe on PowerNV platforms
- *
- * Currently supports only P5IOC2
- *
- * Copyright 2011 Benjamin Herrenschmidt, IBM Corp.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "powernv.h"
-#include "pci.h"
-
-/* For now, use a fixed amount of TCE memory for each p5ioc2
- * hub, 16M will do
- */
-#define P5IOC2_TCE_MEMORY  0x0100
-
-#ifdef CONFIG_PCI_MSI
-static int pnv_pci_p5ioc2_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
-   unsigned int hwirq, unsigned int virq,
-   unsigned int is_64, struct msi_msg *msg)
-{
-   if (WARN_ON(!is_64))
-   return -ENXIO;
-   msg->data = hwirq - phb->msi_base;
-   msg->address_hi = 0x1000;
-   msg->address_lo = 0;
-
-   return 0;
-}
-
-static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb)
-{
-   unsigned int count;
-   const __be32 *prop = of_get_property(phb->hose->dn,
-"ibm,opal-msi-ranges", NULL);
-   if (!prop)
-   return;
-
-   /* Don't do MSI's on p5ioc2 PCI-X are they are not properly
-* verified in HW
-*/
-   if (of_device_is_compatible(phb->hose->dn, "ibm,p5ioc2-pcix"))
-   return;
-   phb->msi_base = be32_to_cpup(prop);
-   count = be32_to_cpup(prop + 1);
-   if (msi_bitmap_alloc(&phb->msi_bmp, count, phb->hose->dn)) {
-   pr_err("PCI %d: Failed to allocate MSI bitmap !\n",
-  phb->hose->global_number);
-   return;
-   }
-   phb->msi_setup = pnv_pci_p5ioc2_msi_setup;
-   phb->msi32_support = 0;
-   pr_info(" Allocated bitmap for %d MSIs (base IRQ 0x%x)\n",
-   count, phb->msi_base);
-}
-#else
-static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb) { }
-#endif /* CONFIG_PCI_MSI */
-
-static struct iommu_table_ops pnv_p5ioc2_iommu_ops = {
-   .set = pnv_tce_build,
-#ifdef CONFIG_IOMMU_API
-   .exchange = pnv_tce_xchg,
-#endif
-   .clear = pnv_tce_free,
-   .get = pnv_tce_get,
-};
-
-static void pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
-struct pci_dev *pdev)
-{
-   struct iommu_table *tbl = phb->p5ioc2.table_group.tables[0];
-
-   if (!tbl->it_map) {
-   tbl->it_ops = &pnv_p5ioc2_iommu_ops;
-   iommu_init_table(tbl, phb->hose->node);
-   iommu_register_group(&phb->p5ioc2.table_group,
-   pci_domain_nr(phb->hose->bus), phb->opal_id);
-   INIT_LIST_HEAD_RCU(&tbl->it_group_list);
-   pnv_pc

Re: [PATCH] powerpc/powernv: Remove support for p5ioc2

2016-01-12 Thread Russell Currey
On Wed, 2016-01-13 at 17:39 +1100, Andrew Donnellan wrote:
> On 13/01/16 17:10, Russell Currey wrote:
> > "p5ioc2 is used by approximately 2 machines in the world, and has never
> > ever been a supported configuration."
> > 
> > The code for p5ioc2 is essentially unused and complicates what is already
> > a very complicated codebase.  Its removal is essentially a "free win" in
> > the effort to simplify the powernv PCI code.
> > 
> > In addition, support for p5ioc2 has been dropped from skiboot.  There's no
> > reason to keep it around in the kernel.
> > 
> > Signed-off-by: Russell Currey 
> 
> Doesn't apply cleanly on next, but that's minor.
Going to do a V2 to address your other comment, so I might as well fix the next
issue.
> 
> > @@ -117,11 +115,6 @@ struct pnv_phb {
> > 
> >     union {
> >     struct {
> > -   struct iommu_table iommu_table;
> > -   struct iommu_table_group table_group;
> > -   } p5ioc2;
> > -
> > -   struct {
> >     /* Global bridge info */
> >     unsigned int total_pe;
> >     unsigned int reserved_pe;
> 
> Given this leaves struct ioda as the only member of the union, do we 
> want to get rid of the union?
> 
Probably.  I was going to leave that for future patches (which will be a proper
refactoring rather than a pure removal), but given it makes no difference I
should just get rid of it now.

Thanks for the review.
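
To see why dropping the union "makes no difference", here is a small sketch with hypothetical, heavily trimmed structures (the real struct pnv_phb has many more fields). With a single member left, the anonymous union and the plain nested struct are accessed with exactly the same spelling.

```c
#include <assert.h>
#include <stddef.h>

/* Before: an anonymous union whose only remaining member is 'ioda'. */
struct demo_phb_with_union {
	int type;
	union {
		struct {
			unsigned int total_pe;
			unsigned int reserved_pe;
		} ioda;
	};
};

/* After: the union wrapper is gone, the member keeps its name. */
struct demo_phb_flat {
	int type;
	struct {
		unsigned int total_pe;
		unsigned int reserved_pe;
	} ioda;
};

/* Callers spell the access as phb->ioda.<field> in both versions, so
 * code like this compiles unchanged against either layout. */
static unsigned int demo_free_pe(const struct demo_phb_flat *phb)
{
	assert(phb != NULL && phb->ioda.reserved_pe <= phb->ioda.total_pe);
	return phb->ioda.total_pe - phb->ioda.reserved_pe;
}
```

Since no caller has to change, the cleanup can reasonably ride along with the removal patch.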

Re: [PATCH] powerpc/powernv: Remove support for p5ioc2

2016-01-12 Thread Andrew Donnellan

On 13/01/16 17:10, Russell Currey wrote:
> "p5ioc2 is used by approximately 2 machines in the world, and has never
> ever been a supported configuration."
>
> The code for p5ioc2 is essentially unused and complicates what is already
> a very complicated codebase.  Its removal is essentially a "free win" in
> the effort to simplify the powernv PCI code.
>
> In addition, support for p5ioc2 has been dropped from skiboot.  There's no
> reason to keep it around in the kernel.
>
> Signed-off-by: Russell Currey 

Doesn't apply cleanly on next, but that's minor.

> @@ -117,11 +115,6 @@ struct pnv_phb {
>
>     union {
>     struct {
> -   struct iommu_table iommu_table;
> -   struct iommu_table_group table_group;
> -   } p5ioc2;
> -
> -   struct {
>     /* Global bridge info */
>     unsigned int total_pe;
>     unsigned int reserved_pe;

Given this leaves struct ioda as the only member of the union, do we 
want to get rid of the union?


--
Andrew Donnellan  Software Engineer, OzLabs
andrew.donnel...@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)IBM Australia Limited


[PATCH] powerpc/powernv: Remove support for p5ioc2

2016-01-12 Thread Russell Currey
"p5ioc2 is used by approximately 2 machines in the world, and has never
ever been a supported configuration."

The code for p5ioc2 is essentially unused and complicates what is already
a very complicated codebase.  Its removal is essentially a "free win" in
the effort to simplify the powernv PCI code.

In addition, support for p5ioc2 has been dropped from skiboot.  There's no
reason to keep it around in the kernel.

Signed-off-by: Russell Currey 
---
Tested on a P7IOC machine and a PHB3 machine.

Skiboot p5ioc2 removal patch: https://patchwork.ozlabs.org/patch/544898/
---
 arch/powerpc/platforms/powernv/Makefile |   2 +-
 arch/powerpc/platforms/powernv/pci-p5ioc2.c | 271 
 arch/powerpc/platforms/powernv/pci.c|  15 +-
 arch/powerpc/platforms/powernv/pci.h|  12 +-
 4 files changed, 5 insertions(+), 295 deletions(-)
 delete mode 100644 arch/powerpc/platforms/powernv/pci-p5ioc2.c

diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index 1c8cdb6..8a65c9c 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -4,7 +4,7 @@ obj-y   += rng.o opal-elog.o opal-dump.o 
opal-sysparam.o opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
-obj-$(CONFIG_PCI)  += pci.o pci-p5ioc2.o pci-ioda.o
+obj-$(CONFIG_PCI)  += pci.o pci-ioda.o
 obj-$(CONFIG_EEH)  += eeh-powernv.o
 obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)   += opal-memory-errors.o
diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c 
b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
deleted file mode 100644
index f2bdfea..000
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ /dev/null
@@ -1,271 +0,0 @@
-/*
- * Support PCI/PCIe on PowerNV platforms
- *
- * Currently supports only P5IOC2
- *
- * Copyright 2011 Benjamin Herrenschmidt, IBM Corp.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "powernv.h"
-#include "pci.h"
-
-/* For now, use a fixed amount of TCE memory for each p5ioc2
- * hub, 16M will do
- */
-#define P5IOC2_TCE_MEMORY  0x0100
-
-#ifdef CONFIG_PCI_MSI
-static int pnv_pci_p5ioc2_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
-   unsigned int hwirq, unsigned int virq,
-   unsigned int is_64, struct msi_msg *msg)
-{
-   if (WARN_ON(!is_64))
-   return -ENXIO;
-   msg->data = hwirq - phb->msi_base;
-   msg->address_hi = 0x1000;
-   msg->address_lo = 0;
-
-   return 0;
-}
-
-static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb)
-{
-   unsigned int count;
-   const __be32 *prop = of_get_property(phb->hose->dn,
-"ibm,opal-msi-ranges", NULL);
-   if (!prop)
-   return;
-
-   /* Don't do MSI's on p5ioc2 PCI-X are they are not properly
-* verified in HW
-*/
-   if (of_device_is_compatible(phb->hose->dn, "ibm,p5ioc2-pcix"))
-   return;
-   phb->msi_base = be32_to_cpup(prop);
-   count = be32_to_cpup(prop + 1);
-   if (msi_bitmap_alloc(&phb->msi_bmp, count, phb->hose->dn)) {
-   pr_err("PCI %d: Failed to allocate MSI bitmap !\n",
-  phb->hose->global_number);
-   return;
-   }
-   phb->msi_setup = pnv_pci_p5ioc2_msi_setup;
-   phb->msi32_support = 0;
-   pr_info(" Allocated bitmap for %d MSIs (base IRQ 0x%x)\n",
-   count, phb->msi_base);
-}
-#else
-static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb) { }
-#endif /* CONFIG_PCI_MSI */
-
-static struct iommu_table_ops pnv_p5ioc2_iommu_ops = {
-   .set = pnv_tce_build,
-#ifdef CONFIG_IOMMU_API
-   .exchange = pnv_tce_xchg,
-#endif
-   .clear = pnv_tce_free,
-   .get = pnv_tce_get,
-};
-
-static void pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
-struct pci_dev *pdev)
-{
-   struct iommu_table *tbl = phb->p5ioc2.table_group.tables[0];
-
-   if (!tbl->it_map) {
-   tbl->it_ops = &pnv_p5ioc2_iommu_ops;
-   iommu_init_table(tbl, phb->hose->node);
-   iommu_register_group(&phb->p5ioc2.table_group,
-   pci_domain_nr(phb->hose->bus), phb->opal_id);
-   INIT_LIST_HEAD_RCU(&tbl->it_group_list);
-   pnv_pci_link_table_and_group(phb->hose->n

Re: [RFC PATCH V1 14/33] powerpc/mm: Use helper for finding pte bits mapping I/O area

2016-01-12 Thread Aneesh Kumar K.V
Benjamin Herrenschmidt  writes:

> On Tue, 2016-01-12 at 10:42 +0300, Denis Kirjanov wrote:
>> > +static inline unsigned long pte_io_cache_bits(void)
>> > +{
>> > + return _PAGE_NO_CACHE | _PAGE_GUARDED;
>> > +}
>> This could be just plain #define
>
> Or just use pgprot_noncached()
>
#define pgprot_noncached(prot)  (__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) | \
                                          _PAGE_NO_CACHE | _PAGE_GUARDED))


That will return me a pgprot_t.  I can fix that by using
pgprot_val(pgprot_noncached(0)). Is that what you are suggesting ?

-aneesh


Re: [PATCH] powerpc/eeh: Validate arch in eeh_add_device_early()

2016-01-12 Thread Benjamin Herrenschmidt
On Sun, 2016-01-10 at 01:08 -0200, Guilherme G. Piccoli wrote:
> This patch just changes the way the arch checking is done in function
> eeh_add_device_early(): we use no more eeh_enabled(), but instead we check
> the running architecture by using the macro machine_is(). If we are running on
> pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out
> the function. This way, we don't enable EEH on Cell and we don't hit the oops
> on DLPAR either.

Can't we just check for eeh_ops being NULL ?

Cheers,
Ben.
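
The alternative suggested here can be sketched as follows. This is an illustration only: the `demo_` structures and functions are hypothetical and far simpler than the real EEH code. The idea is that a platform which never registers a backend leaves the ops pointer NULL, so one pointer test covers every such platform without naming machines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a platform backend. */
struct demo_eeh_ops {
	const char *name;
	int (*probe)(int devfn);
};

/* Stays NULL on platforms that never register a backend. */
static const struct demo_eeh_ops *demo_ops;

static int demo_probe(int devfn)
{
	return devfn >= 0;
}

static const struct demo_eeh_ops demo_pseries_ops = { "pseries", demo_probe };

static void demo_eeh_register(const struct demo_eeh_ops *ops)
{
	assert(ops != NULL && ops->probe != NULL);
	demo_ops = ops;
}

/* Returns 1 if the device was probed, 0 if it was skipped. */
static int demo_add_device_early(int devfn)
{
	/* No backend registered: nothing to do, and nothing to oops on. */
	if (!demo_ops)
		return 0;
	return demo_ops->probe(devfn);
}
```

Compared with a list of machine_is() tests, this does not need updating when another platform gains or lacks the feature.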

> Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on 
> Cell")
> Signed-off-by: Guilherme G. Piccoli 
> ---
>  arch/powerpc/kernel/eeh.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 40e4d4a..81e2d3e 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1072,7 +1072,13 @@ void eeh_add_device_early(struct pci_dn *pdn)
>   struct pci_controller *phb;
>   struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>  
> - if (!edev || !eeh_enabled())
> + if (!edev)
> + return;
> +
> + /* Some platforms (like Cell) don't have EEH capabilities, so we
> +  * need to abort here. In case of pseries or powernv, we have EEH
> +  * so we can continue. */
> + if (!machine_is(pseries) && !machine_is(powernv))
>   return;
>  
>   if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE))

Re: [RFC PATCH V1 01/33] powerpc/mm: add _PAGE_HASHPTE similar to 4K hash

2016-01-12 Thread Aneesh Kumar K.V
Balbir Singh  writes:

> On Tue, 12 Jan 2016 12:45:36 +0530
> "Aneesh Kumar K.V"  wrote:
>
>> Not really needed. But this brings it back to as it was before
>> 
>
> Could you expand on not really needed. Could the changelog describe how
> the bits will be used in the follow on patches.
>

What confused me in the beginning was the difference between 4k and 64k
page size. I was trying to find out whether we miss a hpte flush in any
scenario because of this, i.e., a pte update on a linux pte for which we
are doing a parallel hash pte insert. After looking at it more closely, my
understanding is this won't happen, because pte update also looks at
_PAGE_BUSY and we will wait for the hash pte insert to finish before going
ahead with the pte update. But to avoid further confusion I was wondering
whether we should keep this closer to what we have with __hash_page_4k.
Hence the statement "Not really needed".

-aneesh
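
The waiting behaviour described above can be sketched with C11 atomics. This is a simplified user-space illustration, not the kernel's pte_update(): the `demo_` names and the bit value are hypothetical. The updater refuses to modify the word while the busy bit is held, so an update cannot slip in during a concurrent insert.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_PAGE_BUSY 0x800UL  /* illustrative value only */

/* Insert side: try to take the busy bit. Returns 1 on success. */
static int demo_pte_trylock(_Atomic unsigned long *ptep)
{
	unsigned long old = atomic_load(ptep);

	if (old & DEMO_PAGE_BUSY)
		return 0;
	return atomic_compare_exchange_strong(ptep, &old,
					      old | DEMO_PAGE_BUSY);
}

/* Insert side: release the busy bit once the insert has finished. */
static void demo_pte_unlock(_Atomic unsigned long *ptep)
{
	atomic_fetch_and(ptep, ~DEMO_PAGE_BUSY);
}

/* Update side: clear and set bits, but only while nobody holds the
 * busy bit. Returns the value that was replaced. */
static unsigned long demo_pte_update(_Atomic unsigned long *ptep,
				     unsigned long clr, unsigned long set)
{
	unsigned long old, new_val;

	assert(ptep != NULL);
	for (;;) {
		old = atomic_load(ptep);
		if (old & DEMO_PAGE_BUSY)
			continue;       /* an insert is in flight: wait */
		new_val = (old & ~clr) | set;
		if (atomic_compare_exchange_weak(ptep, &old, new_val))
			return old;
	}
}
```

The compare-and-exchange also covers the case where the busy bit is taken between the load and the store: the exchange fails and the loop starts over.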


Re: linux-next: build failure after merge of the powerpc tree

2016-01-12 Thread Aneesh Kumar K.V
Michael Ellerman  writes:

> On Thu, 2016-01-07 at 19:16 +1100, Stephen Rothwell wrote:
>> Hi all,
>> 
>> After merging the powerpc tree, today's linux-next build (powerpc64
>> allnoconfig) failed like this:
>> 
>> arch/powerpc/mm/hash_utils_64.c: In function 'get_paca_psize':
>> arch/powerpc/mm/hash_utils_64.c:869:19: error: 'struct paca_struct' has no 
>> member named 'context'
>>   return get_paca()->context.user_psize;
>>^
>> arch/powerpc/mm/hash_utils_64.c:870:1: error: control reaches end of 
>> non-void function [-Werror=return-type]
>>  }
>>  ^
>> 
>> Caused by commit
>> 
>>   2fc251a8dda5 ("powerpc: Copy only required pieces of the mm_context_t to 
>> the paca")
>
> Well that's rather embarrassing, for Mikey ;D
>
>> This build has CONFIG_PPC_MM_SLICES not set ...
>
> Ugh, but it would seem none of our defconfigs do :/

4K page size with hugetlb disabled will get that 

-aneesh


Re: [RFC PATCH V1 14/33] powerpc/mm: Use helper for finding pte bits mapping I/O area

2016-01-12 Thread Benjamin Herrenschmidt
On Tue, 2016-01-12 at 10:42 +0300, Denis Kirjanov wrote:
> > +static inline unsigned long pte_io_cache_bits(void)
> > +{
> > + return _PAGE_NO_CACHE | _PAGE_GUARDED;
> > +}
> This could be just plain #define

Or just use pgprot_noncached()

Cheers,
Ben.


Re: [RFC PATCH V1 01/33] powerpc/mm: add _PAGE_HASHPTE similar to 4K hash

2016-01-12 Thread Balbir Singh
On Tue, 12 Jan 2016 12:45:36 +0530
"Aneesh Kumar K.V"  wrote:

> Not really needed. But this brings it back to as it was before
> 

Could you expand on not really needed. Could the changelog describe how
the bits will be used in the follow on patches.

Balbir

Re: [RFC PATCH kernel] powerpc/ioda: Set "read" permission when "write" is set

2016-01-12 Thread Douglas Miller



On 01/12/2016 05:07 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2016-01-12 at 15:40 +1100, Alexey Kardashevskiy wrote:
>> Quite often drivers set only "write" permission assuming that this
>> includes "read" permission as well and this works on plenty
>> platforms.
>> However IODA2 is strict about this and produces an EEH when "read"
>> permission is not set and reading happens.
>>
>> This adds a workaround in IODA code to always add the "read" bit when
>> the "write" bit is set.
>>
>> Cc: Benjamin Herrenschmidt 
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>
>> Ben, what was the driver which did not set "read" and caused EEH?
>
> aacraid
>
> Cheers,
> Ben.

Just to be precise, the driver wasn't responsible for setting READ. The 
driver called scsi_dma_map() and the scsicmd was set (by scsi layer) as 
DMA_FROM_DEVICE so the current code would set the permissions to 
WRITE-ONLY. Previously, and in other architectures, this scsicmd would 
have resulted in READ+WRITE permissions on the DMA map.
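
A compact way to see the chain of events described above is the sketch below. All `demo_`/`DEMO_` names and bit values are hypothetical; only the direction-to-permission relationship and the "write implies read" fixup are taken from this thread.

```c
#include <assert.h>

/* Illustrative permission bits for a translation entry. */
#define DEMO_TCE_READ   0x1UL
#define DEMO_TCE_WRITE  0x2UL

/* Direction of a DMA mapping, as seen from the CPU side. */
enum demo_dma_dir {
	DEMO_DMA_BIDIRECTIONAL,
	DEMO_DMA_TO_DEVICE,
	DEMO_DMA_FROM_DEVICE
};

/* Strict mapping: a device that only fills memory gets write access
 * and nothing else. */
static unsigned long demo_dir_to_perm(enum demo_dma_dir dir)
{
	switch (dir) {
	case DEMO_DMA_BIDIRECTIONAL:
		return DEMO_TCE_READ | DEMO_TCE_WRITE;
	case DEMO_DMA_TO_DEVICE:
		return DEMO_TCE_READ;   /* device reads from memory */
	case DEMO_DMA_FROM_DEVICE:
		return DEMO_TCE_WRITE;  /* device writes to memory */
	}
	assert(!"unknown DMA direction");
	return 0;
}

/* The workaround from the patch: write permission implies read. */
static unsigned long demo_fixup_perm(unsigned long proto_tce)
{
	if (proto_tce & DEMO_TCE_WRITE)
		proto_tce |= DEMO_TCE_READ;
	return proto_tce;
}
```

With the fixup applied, a DMA_FROM_DEVICE mapping ends up readable as well, while a read-only mapping is left untouched.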



>> ---
>>  arch/powerpc/platforms/powernv/pci.c | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index f2dd772..c7dcae5 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -601,6 +601,9 @@ int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
>>  	u64 rpn = __pa(uaddr) >> tbl->it_page_shift;
>>  	long i;
>>  
>> +	if (proto_tce & TCE_PCI_WRITE)
>> +		proto_tce |= TCE_PCI_READ;
>> +
>>  	for (i = 0; i < npages; i++) {
>>  		unsigned long newtce = proto_tce |
>>  			((rpn + i) << tbl->it_page_shift);
>> @@ -622,6 +625,9 @@ int pnv_tce_xchg(struct iommu_table *tbl, long index,
>>  
>>  	BUG_ON(*hpa & ~IOMMU_PAGE_MASK(tbl));
>>  
>> +	if (newtce & TCE_PCI_WRITE)
>> +		newtce |= TCE_PCI_READ;
>> +
>>  	oldtce = xchg(pnv_tce(tbl, idx), cpu_to_be64(newtce));
>>  	*hpa = be64_to_cpu(oldtce) & ~(TCE_PCI_READ | TCE_PCI_WRITE);
>>  	*direction = iommu_tce_direction(oldtce);


Re: [PATCH] powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages

2016-01-12 Thread Andrew Donnellan

On 13/01/16 12:04, Russell Currey wrote:
> The recently added OPAL API call, OPAL_CONSOLE_FLUSH, originally took no
> parameters and returned nothing.  The call was updated to accept the
> terminal number to flush, and returned various values depending on the
> state of the output buffer.
>
> The prototype has been updated and its usage in the OPAL kmsg dumper has
> been modified to support its new behaviour as an incremental flush.
>
> Signed-off-by: Russell Currey 


Looks fine to me.

Reviewed-by: Andrew Donnellan 

--
Andrew Donnellan  Software Engineer, OzLabs
andrew.donnel...@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)IBM Australia Limited


[PATCH] powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages

2016-01-12 Thread Russell Currey
The recently added OPAL API call, OPAL_CONSOLE_FLUSH, originally took no
parameters and returned nothing.  The call was updated to accept the
terminal number to flush, and returned various values depending on the
state of the output buffer.

The prototype has been updated and its usage in the OPAL kmsg dumper has
been modified to support its new behaviour as an incremental flush.
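
The "incremental flush" calling pattern can be tried out against a mock. The sketch below is not firmware or kernel code: `demo_console_flush()` is a made-up stand-in that drains one chunk per call, and the return codes are illustrative constants. Unlike the patch, this variant reuses the result of the first call as the loop condition instead of issuing a fresh call.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative return codes for the mock firmware call. */
#define DEMO_SUCCESS       0
#define DEMO_PARAMETER    -1
#define DEMO_PARTIAL      -3
#define DEMO_UNSUPPORTED  -7

/* Chunks still sitting in the mock output buffer. */
static int demo_pending = 5;

/* Mock firmware call: drains one chunk per invocation and reports
 * whether anything is left. Only terminal 0 exists in this mock. */
static int64_t demo_console_flush(int64_t term)
{
	assert(demo_pending >= 0);

	if (term != 0)
		return DEMO_PARAMETER;
	if (demo_pending > 0)
		demo_pending--;
	return demo_pending ? DEMO_PARTIAL : DEMO_SUCCESS;
}

/* Keep calling until the buffer is empty. Returns the number of calls
 * made, or -1 if the call cannot be used at all. */
static int demo_force_flush(int64_t term)
{
	int calls = 1;
	int64_t ret = demo_console_flush(term);

	/* Errors that will never clear by retrying: bail out at once. */
	if (ret == DEMO_UNSUPPORTED || ret == DEMO_PARAMETER)
		return -1;

	/* Incrementally flush until there's nothing left. */
	while (ret != DEMO_SUCCESS) {
		ret = demo_console_flush(term);
		calls++;
	}
	return calls;
}
```

Checking for the non-retryable errors before entering the loop is what keeps the loop from spinning forever on firmware that rejects the call outright.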

Signed-off-by: Russell Currey 
---
This patch should be applied on top of "powerpc/powernv: Add a kmsg_dumper
that flushes console output on panic", which was recently merged into
powerpc-next.
---
 arch/powerpc/include/asm/opal.h| 2 +-
 arch/powerpc/platforms/powernv/opal-kmsg.c | 9 -
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index a5fd407..07a99e6 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -35,7 +35,7 @@ int64_t opal_console_read(int64_t term_number, __be64 *length,
  uint8_t *buffer);
 int64_t opal_console_write_buffer_space(int64_t term_number,
__be64 *length);
-void opal_console_flush(void);
+int64_t opal_console_flush(int64_t term_number);
 int64_t opal_rtc_read(__be32 *year_month_day,
  __be64 *hour_minute_second_millisecond);
 int64_t opal_rtc_write(uint32_t year_month_day,
diff --git a/arch/powerpc/platforms/powernv/opal-kmsg.c b/arch/powerpc/platforms/powernv/opal-kmsg.c
index bd3b2ee..6f1214d 100644
--- a/arch/powerpc/platforms/powernv/opal-kmsg.c
+++ b/arch/powerpc/platforms/powernv/opal-kmsg.c
@@ -27,6 +27,7 @@ static void force_opal_console_flush(struct kmsg_dumper *dumper,
 enum kmsg_dump_reason reason)
 {
int i;
+   int64_t ret;
 
/*
 * Outside of a panic context the pollers will continue to run,
@@ -36,7 +37,13 @@ static void force_opal_console_flush(struct kmsg_dumper *dumper,
return;
 
if (opal_check_token(OPAL_CONSOLE_FLUSH)) {
-   opal_console_flush();
+   ret = opal_console_flush(0);
+
+   if (ret == OPAL_UNSUPPORTED || ret == OPAL_PARAMETER)
+   return;
+
+   /* Incrementally flush until there's nothing left */
+   while (opal_console_flush(0) != OPAL_SUCCESS);
} else {
/*
 * If OPAL_CONSOLE_FLUSH is not implemented in the firmware,
-- 
2.7.0
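
The incremental flush described in this patch can be modelled outside the kernel. What follows is only a sketch under stated assumptions, not the kernel code: opal_console_flush() here is a mock that drains a made-up 10-byte buffer four bytes per call, the numeric return codes are stand-ins named after the OPAL ones (OPAL_PARTIAL is assumed to be what a not-yet-empty buffer returns), and flush_until_empty() is a hypothetical helper. Unlike the patch, it reuses the first return value instead of calling the flush again.

```c
#include <assert.h>
#include <stdint.h>

/* Mock return codes, named after the OPAL ones; values are illustrative. */
#define OPAL_SUCCESS      0
#define OPAL_PARAMETER   -1
#define OPAL_PARTIAL     -3
#define OPAL_UNSUPPORTED -7

/* Hypothetical stand-in for firmware state: bytes queued on terminal 0. */
static int pending_bytes = 10;

/* Mock flush call: drains up to 4 bytes per invocation. */
static int64_t opal_console_flush(int64_t term_number)
{
	assert(pending_bytes >= 0);
	if (term_number != 0)
		return OPAL_PARAMETER;      /* only terminal 0 exists in this mock */
	if (pending_bytes > 4) {
		pending_bytes -= 4;
		return OPAL_PARTIAL;        /* more output remains */
	}
	pending_bytes = 0;
	return OPAL_SUCCESS;                /* buffer is now empty */
}

/* Returns the number of flush calls made, or -1 if flushing cannot work. */
static int flush_until_empty(int64_t term)
{
	int calls = 1;
	int64_t ret = opal_console_flush(term);

	if (ret == OPAL_UNSUPPORTED || ret == OPAL_PARAMETER)
		return -1;

	/* Incrementally flush until there's nothing left. */
	while (ret != OPAL_SUCCESS) {
		ret = opal_console_flush(term);
		calls++;
	}
	return calls;
}
```

Bailing out early on OPAL_UNSUPPORTED or OPAL_PARAMETER matters because neither condition can change between calls, so looping on them would never terminate.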


Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Leonid Yegoshin

On 01/12/2016 01:40 PM, Peter Zijlstra wrote:



> > It is selectable only for MIPS R2 but not MIPS R6. The reason is - most of
> > MIPS R2 CPUs have short pipeline and that SYNC is just waste of CPU
> > resource, especially taking into account that "lightweight syncs" are
> > converted to a heavy "SYNC 0" in many of that CPUs. However the latest
> > MIPS/Imagination CPU have a pipeline long enough to hit a problem - absence
> > of SYNC at LL/SC inside atomics, barriers etc.
>
> What ?! Are you saying that because R2 has short pipelines its unlikely
> to hit the reordering issues and we can omit barriers?

It was my guess to explain - why barriers was not included originally.
You can check with Ralf, he knows more about that time MIPS Linux code.

I bother with this more than 2 years and I just try to solve that issue
- in recent CPUs the load after LL/SC synchronization instruction loop
can get ahead of SC for sure, it was tested.

> > > And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> > > are _NOT_ transitive and therefore cannot be used to implement the
> > > smp_mb__{before,after} stuff.
> > >
> > > That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> > > Completion Barriers.
> >
> > Please see above, point 2.
>
> That did not in fact enlighten things. Are they transitive/multi-copy
> atomic or not?

Peter Zijlstra recently wrote: "In particular we're very much all
'confused' about the various notions of transitivity". I am actually
confused too and need some examples here.

> (and here Will will go into great detail on the differences between the
> two and make our collective brains explode :-)

> > > That is, currently all architectures -- with exception of PPC -- have
> > > RCsc locks, but using these non-transitive things will get you RCpc
> > > locks.
> > >
> > > So yes, MIPS can go RCpc for its locks and share the burden of pain with
> > > PPC, but that needs to be a very conscious decision.
> >
> > I don't understand that - I tried hard but I can't find any word like
> > "RCsc", "RCpc" in Documents/ directory. Web search goes nowhere, of course.
>
> From: lkml.kernel.org/r/20150828153921.gf19...@twins.programming.kicks-ass.net
>
> Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
> ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
> not.

MIPS Arch starting from R2 requires that. If some CPU can't, it should
execute a full "SYNC 0" instead, which is a full memory barrier.

> Currently PowerPC is the only arch that (can, and) does RCpc and gives a
> weaker RELEASE + ACQUIRE. Only the CPU who did the ACQUIRE is guaranteed
> to see the stores of the CPU which did the RELEASE in order.

Yes, it was a goal for SYNC_ACQUIRE and SYNC_RELEASE.

Caveats:

- "Full memory barrier" on MIPS means - full barrier for any device 
in coherent domain. In MIPS Tech/Imagination Tech MIPS-based CPU it is 
"for any device connected to CM or IOCU + directly connected memory".


- It is not applied to instruction fetch. However, I-Cache flushes 
and SYNCI are consistent with that. There is also hazard barrier 
instructions to clear CPU pipeline to some extent - to help with this 
limitation.


I don't think that these caveats prevent a correct Acquire/Release semantic.

- Leonid.



Re: [RFC PATCH kernel] powerpc/ioda: Set "read" permission when "write" is set

2016-01-12 Thread Benjamin Herrenschmidt
On Tue, 2016-01-12 at 15:40 +1100, Alexey Kardashevskiy wrote:
> Quite often drivers set only "write" permission assuming that this
> includes "read" permission as well and this works on plenty of
> platforms.
> However IODA2 is strict about this and produces an EEH when "read"
> permission is not set and reading happens.
> 
> This adds a workaround in IODA code to always add the "read" bit when
> the "write" bit is set.
> 
> Cc: Benjamin Herrenschmidt 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> 
> 
> Ben, what was the driver which did not set "read" and caused EEH?

aacraid

Cheers,
Ben.

> 
> ---
>  arch/powerpc/platforms/powernv/pci.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci.c
> b/arch/powerpc/platforms/powernv/pci.c
> index f2dd772..c7dcae5 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -601,6 +601,9 @@ int pnv_tce_build(struct iommu_table *tbl, long
> index, long npages,
>   u64 rpn = __pa(uaddr) >> tbl->it_page_shift;
>   long i;
>  
> + if (proto_tce & TCE_PCI_WRITE)
> + proto_tce |= TCE_PCI_READ;
> +
>   for (i = 0; i < npages; i++) {
>   unsigned long newtce = proto_tce |
>   ((rpn + i) << tbl->it_page_shift);
> @@ -622,6 +625,9 @@ int pnv_tce_xchg(struct iommu_table *tbl, long
> index,
>  
>   BUG_ON(*hpa & ~IOMMU_PAGE_MASK(tbl));
>  
> + if (newtce & TCE_PCI_WRITE)
> + newtce |= TCE_PCI_READ;
> +
>   oldtce = xchg(pnv_tce(tbl, idx), cpu_to_be64(newtce));
>   *hpa = be64_to_cpu(oldtce) & ~(TCE_PCI_READ |
> TCE_PCI_WRITE);
>   *direction = iommu_tce_direction(oldtce);
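
The workaround in the quoted patch comes down to one bit operation applied before each entry is built. A minimal standalone sketch follows; the bit values are assumptions chosen for illustration, and tce_fixup_perms() and tce_make() are hypothetical helper names, not functions from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative permission bits; only the names follow the patch. */
#define TCE_PCI_READ  0x1u
#define TCE_PCI_WRITE 0x2u

/* Add the read bit whenever the write bit is set; leave other bits alone. */
static uint64_t tce_fixup_perms(uint64_t proto_tce)
{
	if (proto_tce & TCE_PCI_WRITE)
		proto_tce |= TCE_PCI_READ;
	return proto_tce;
}

/* Combine fixed-up permissions with a page number, as the build loop does. */
static uint64_t tce_make(uint64_t rpn, unsigned int page_shift,
			 uint64_t proto_tce)
{
	assert(page_shift < 64);
	return tce_fixup_perms(proto_tce) | (rpn << page_shift);
}
```

A read-only or empty permission mask passes through unchanged, so the fixup only affects callers such as aacraid that ask for write without read.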

Re: cxl: Fix DSI misses when the context owning task exits

2016-01-12 Thread Michael Ellerman
On Tue, 2016-01-12 at 13:29 +, David Laight wrote:
> From: Michael Ellerman
> > Sent: 11 January 2016 09:14
> > On Tue, 2015-24-11 at 10:56:18 UTC, Vaibhav Jain wrote:
> > > Presently when a user-space process issues CXL_IOCTL_START_WORK ioctl we
> > > store the pid of the current task_struct and use it to get pointer to
> > > the mm_struct of the process, while processing page or segment faults
> > > from the capi card. However this causes issues when the thread that had
> > > originally issued the start-work ioctl exits in which case the stored
> > > pid is no more valid and the cxl driver is unable to handle faults as
> > > the mm_struct corresponding to process is no more accessible.
> > > 
> > > This patch fixes this issue by using the mm_struct of the next alive
> > > task in the thread group. This is done by iterating over all the tasks
> > > in the thread group starting from thread group leader and calling
> > > get_task_mm on each one of them. When a valid mm_struct is obtained the
> > > pid of the associated task is stored in the context replacing the
> > > exiting one for handling future faults.
> 
> I don't even claim to understand the linux model for handling process
> address maps, nor what the cxl driver is doing, but the above looks
> more than dodgy.

Thanks for reviewing it!

cheers


Re: [PATCH 1/2] scripts/recordmcount.pl: support data in text section on powerpc

2016-01-12 Thread Michael Ellerman
On Tue, 2016-01-12 at 10:42 -0500, Steven Rostedt wrote:
> On Tue, 12 Jan 2016 23:14:22 +1100
> Michael Ellerman  wrote:
> > From: Ulrich Weigand 
> > 
> > If a text section starts out with a data blob before the first
> > function start label, disassembly parsing done in recordmcount.pl
> > gets confused on powerpc, leading to creation of corrupted module
> > objects.
> > 
> > This was not a problem so far since the compiler would never create
> > such text sections.  However, this has changed with a recent change
> > in GCC 6 to support distances of > 2GB between a function and its
> > associated TOC in the ELFv2 ABI, exposing this problem.
> > 
> > There is already code in recordmcount.pl to handle such data blobs
> > on the sparc64 platform.  This patch uses the same method to handle
> > those on powerpc as well.
> > 
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Ulrich Weigand 
> > Signed-off-by: Michael Ellerman 
> > ---
> >  scripts/recordmcount.pl | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > Steve can we get an ack for this one, to go via powerpc? cheers
> 
> Acked-by: Steven Rostedt 

Thanks.


Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Peter Zijlstra
On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote:
> (I try to answer on multiple mails in one)
> 
> First of all, it seems like some generic notes should be given here:
> 
> 1. Generic MIPS "SYNC" (aka "SYNC 0") instruction is a very heavy in some
> CPUs. On that CPUs it basically kills pipelines in each CPU, can do a
> special memory/IO bus transaction (similar to "fence") and hold a system
> until all R/W is completed. It is like Big Kernel Lock but worse. So, the
> move to SMP_* kind of barriers is needed to improve performance, especially
> on newest CPUs with long pipelines.

The MIPS SYNC isn't any worse than the PPC SYNC, x86 MFENCE or arm DSB
SY, yes they're heavy, so what.

> 2. MIPS Arch document may be misleading because words "ordering" and
> "completion" means different from Linux, the SYNC instruction description is
> written for HW engineers. I wrote that in a separate patch of the same
> patchset - http://patchwork.linux-mips.org/patch/10505/ "MIPS: R6: Use
> lightweight SYNC instruction in smp_* memory barriers":

Did you actually say anything here?

> >This instructions were specifically designed to work for smp_*() sort of
> >memory barriers in MIPS R2/R3/R5 and R6.
> >
> >Unfortunately, it's description is very cryptic and is done in HW engineering
> >style which prevents use of it by SW.
> 
> 3. I bother MIPS Arch team long time until I completely understood that MIPS
> SYNC_WMB, SYNC_MB, SYNC_RMB, SYNC_RELEASE and SYNC_ACQUIRE do an exactly
> that is required in Documentation/memory-barriers.txt

Ha! and you think that document covers all the really fun details?

In particular we're very much all 'confused' about the various notions
of transitivity and what barriers imply how much of it.

> In Peter Zijlstra mail:
> 
> >1) you do not make such things selectable; either the hardware needs
> >them or it doesn't. If it does you _must_ use them, however unlikely.

> It is selectable only for MIPS R2 but not MIPS R6. The reason is - most of
> MIPS R2 CPUs have short pipeline and that SYNC is just waste of CPU
> resource, especially taking into account that "lightweight syncs" are
> converted to a heavy "SYNC 0" in many of that CPUs. However the latest
> MIPS/Imagination CPU have a pipeline long enough to hit a problem - absence
> of SYNC at LL/SC inside atomics, barriers etc.

What ?! Are you saying that because R2 has short pipelines its unlikely
to hit the reordering issues and we can omit barriers?

> >And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> >are _NOT_ transitive and therefore cannot be used to implement the
> >smp_mb__{before,after} stuff.
> >
> >That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> >Completion Barriers.
> 
> Please see above, point 2.

That did not in fact enlighten things. Are they transitive/multi-copy
atomic or not?

(and here Will will go into great detail on the differences between the
two and make our collective brains explode :-)

> >That is, currently all architectures -- with exception of PPC -- have
> >RCsc locks, but using these non-transitive things will get you RCpc
> >locks.
> >
> >So yes, MIPS can go RCpc for its locks and share the burden of pain with
> >PPC, but that needs to be a very conscious decision.
> 
> I don't understand that - I tried hard but I can't find any word like
> "RCsc", "RCpc" in Documents/ directory. Web search goes nowhere, of course.

From: lkml.kernel.org/r/20150828153921.gf19...@twins.programming.kicks-ass.net

Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
not.

Currently PowerPC is the only arch that (can, and) does RCpc and gives a
weaker RELEASE + ACQUIRE. Only the CPU who did the ACQUIRE is guaranteed
to see the stores of the CPU which did the RELEASE in order.

As it stands, RCU is the only _known_ codebase where this matters, but
we did in fact write code for a fair number of years 'assuming' RELEASE
+ ACQUIRE was a full barrier, so who knows what else is out there.


RCsc - release consistency sequential consistency
RCpc - release consistency processor consistency

https://en.wikipedia.org/wiki/Processor_consistency
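
The RELEASE + ACQUIRE point can be illustrated with C11 atomics in userspace. This is a rough analogue under stated assumptions, not kernel code: struct tas_lock, tas_lock_acquire(), tas_lock_release() and handoff() are hypothetical names. In the C11 model, releasing one lock and then acquiring a different one does not by itself order an earlier store against a later load, which corresponds to the weaker RCpc behaviour; adding a sequentially consistent fence after the acquire is one way to get the full-barrier behaviour that RCsc provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set lock; the names are not from the kernel. */
struct tas_lock {
	atomic_bool held;
};

static void tas_lock_acquire(struct tas_lock *l)
{
	/* ACQUIRE: later accesses cannot move before the lock is taken. */
	while (atomic_exchange_explicit(&l->held, true, memory_order_acquire))
		;
}

static void tas_lock_release(struct tas_lock *l)
{
	assert(atomic_load_explicit(&l->held, memory_order_relaxed));
	/* RELEASE: earlier accesses cannot move after the lock is dropped. */
	atomic_store_explicit(&l->held, false, memory_order_release);
}

/*
 * Drop one lock and take another. With need_full_barrier set, a full
 * fence follows the acquire so that the pair also orders earlier
 * stores against later loads.
 */
static void handoff(struct tas_lock *from, struct tas_lock *to,
		    bool need_full_barrier)
{
	tas_lock_release(from);
	tas_lock_acquire(to);
	if (need_full_barrier)
		atomic_thread_fence(memory_order_seq_cst);
}
```

Code that was written 'assuming' RELEASE + ACQUIRE is a full barrier is in effect relying on the need_full_barrier behaviour without saying so.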


Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Leonid Yegoshin

(I try to answer on multiple mails in one)

First of all, it seems like some generic notes should be given here:

1. Generic MIPS "SYNC" (aka "SYNC 0") instruction is a very heavy in 
some CPUs. On that CPUs it basically kills pipelines in each CPU, can do 
a special memory/IO bus transaction (similar to "fence") and hold a 
system until all R/W is completed. It is like Big Kernel Lock but worse. 
So, the move to SMP_* kind of barriers is needed to improve performance, 
especially on newest CPUs with long pipelines.


2. MIPS Arch document may be misleading because words "ordering" and 
"completion" means different from Linux, the SYNC instruction 
description is written for HW engineers. I wrote that in a separate 
patch of the same patchset - 
http://patchwork.linux-mips.org/patch/10505/ "MIPS: R6: Use lightweight 
SYNC instruction in smp_* memory barriers":



> This instructions were specifically designed to work for smp_*() sort of
> memory barriers in MIPS R2/R3/R5 and R6.
>
> Unfortunately, it's description is very cryptic and is done in HW engineering
> style which prevents use of it by SW.

3. I bother MIPS Arch team long time until I completely understood that
MIPS SYNC_WMB, SYNC_MB, SYNC_RMB, SYNC_RELEASE and SYNC_ACQUIRE do an
exactly that is required in Documentation/memory-barriers.txt

In Peter Zijlstra mail:

> 1) you do not make such things selectable; either the hardware needs
> them or it doesn't. If it does you _must_ use them, however unlikely.

It is selectable only for MIPS R2 but not MIPS R6. The reason is - most
of MIPS R2 CPUs have short pipeline and that SYNC is just waste of CPU
resource, especially taking into account that "lightweight syncs" are
converted to a heavy "SYNC 0" in many of that CPUs. However the latest
MIPS/Imagination CPU have a pipeline long enough to hit a problem -
absence of SYNC at LL/SC inside atomics, barriers etc.

> And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> are _NOT_ transitive and therefore cannot be used to implement the
> smp_mb__{before,after} stuff.
>
> That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> Completion Barriers.

Please see above, point 2.

> That is, currently all architectures -- with exception of PPC -- have
> RCsc locks, but using these non-transitive things will get you RCpc
> locks.
>
> So yes, MIPS can go RCpc for its locks and share the burden of pain with
> PPC, but that needs to be a very conscious decision.

I don't understand that - I tried hard but I can't find any word like
"RCsc", "RCpc" in Documents/ directory. Web search goes nowhere, of course.



In Will Deacon mail:


> The issue I have with the SYNC description in the text above is that it
> describes the single CPU (program order) and the dual-CPU (confusingly
> named global order) cases, but then doesn't generalise any further. That
> means we can't sensibly reason about transitivity properties when a third
> agent is involved. For example, the WRC+sync+addr test:
>
> P0:
> Wx = 1
>
> P1:
> Rx == 1
> SYNC
> Wy = 1
>
> P2:
> Ry == 1
>
> Rx = 0
>
> I can't find anything to forbid that, given the text. The main problem
> is having the SYNC on P1 affect the write by P0.


As I understand that test, the visibility of P0: W[x] = 1 is identical 
to P1 and P2 here. If P1 got X before SYNC and write to Y after SYNC 
then instruction source register dependency tracking in P2 prevents a 
speculative load of X before P2 obtains Y from the same place as P0/P1 
and calculate address of X. If some load of X in P2 happens before 
address dependency calculation it's result is discarded.


Yes, you can't find that in MIPS SYNC instruction description, it is 
more likely in CM (Coherence Manager) area. I just pointed our arch team 
member responsible for documents and he will think how to explain that.


- Leonid.
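
The three-agent WRC test quoted above can be written down with C11 atomics to make the question concrete. This is only a sketch under assumptions: the seq_cst fence stands in for SYNC, an acquire load stands in for the address dependency (C has no portable way to express one), and p0(), p1(), p2() and wrc_forbidden() are hypothetical names. It says nothing about what any particular MIPS core does; a real litmus run would execute the three functions concurrently on three CPUs many times.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int x, y;   /* both start at 0 */

/* P0: Wx = 1 */
static void p0(void)
{
	atomic_store_explicit(&x, 1, memory_order_relaxed);
}

/* P1: Rx; SYNC; Wy = 1. Returns the value read from x. */
static int p1(void)
{
	int rx = atomic_load_explicit(&x, memory_order_relaxed);

	atomic_thread_fence(memory_order_seq_cst);   /* stands in for SYNC */
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	return rx;
}

/* P2: Ry, then a read of x that is ordered after it. */
static void p2(int *ry, int *rx)
{
	assert(ry && rx);
	/* Acquire is used here in place of the address dependency. */
	*ry = atomic_load_explicit(&y, memory_order_acquire);
	*rx = atomic_load_explicit(&x, memory_order_relaxed);
}

/* The outcome in question: P1 saw x, P2 saw y, yet P2 reads x == 0. */
static int wrc_forbidden(int p1_rx, int p2_ry, int p2_rx)
{
	return p1_rx == 1 && p2_ry == 1 && p2_rx == 0;
}
```

In the C11 model this outcome is not allowed for the code as written; whether the hardware SYNC gives the same guarantee when P0's write comes from a third agent is exactly what the thread is asking.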




Re: [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release()

2016-01-12 Thread Michael S. Tsirkin
On Tue, Jan 12, 2016 at 08:28:44AM -0800, Paul E. McKenney wrote:
> On Sun, Jan 10, 2016 at 04:16:32PM +0200, Michael S. Tsirkin wrote:
> > From: Davidlohr Bueso 
> > 
> > With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()")
> > it was made clear that the context of this call (and thus set_mb)
> > is strictly for CPU ordering, as opposed to IO. As such all archs
> > should use the smp variant of mb(), respecting the semantics and
> > saving a mandatory barrier on UP.
> > 
> > Signed-off-by: Davidlohr Bueso 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > Cc: 
> > Cc: Andrew Morton 
> > Cc: Benjamin Herrenschmidt 
> > Cc: Heiko Carstens 
> > Cc: Linus Torvalds 
> > Cc: Paul E. McKenney 
> > Cc: Peter Zijlstra 
> > Cc: Thomas Gleixner 
> > Cc: Tony Luck 
> > Cc: d...@stgolabs.net
> > Link: 
> > http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-d...@stgolabs.net
> > Signed-off-by: Ingo Molnar 
> 
> Aside from a need for s/lcoking/locking/ in the subject line:
> 
> Reviewed-by: Paul E. McKenney 

Thanks!
Though Ingo already put this in tip tree like this,
and I need a copy in my tree to avoid breaking bisect,
so I will probably keep it exactly the same to avoid confusion.

> > ---
> >  arch/ia64/include/asm/barrier.h| 2 +-
> >  arch/powerpc/include/asm/barrier.h | 2 +-
> >  arch/s390/include/asm/barrier.h| 2 +-
> >  include/asm-generic/barrier.h  | 2 +-
> >  4 files changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
> > index df896a1..209c4b8 100644
> > --- a/arch/ia64/include/asm/barrier.h
> > +++ b/arch/ia64/include/asm/barrier.h
> > @@ -77,7 +77,7 @@ do {  \
> > ___p1;  \
> >  })
> > 
> > -#define smp_store_mb(var, value)   do { WRITE_ONCE(var, value); mb(); } while (0)
> > +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> > 
> >  /*
> >   * The group barrier in front of the rsm & ssm are necessary to ensure
> > diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> > index 0eca6ef..a7af5fb 100644
> > --- a/arch/powerpc/include/asm/barrier.h
> > +++ b/arch/powerpc/include/asm/barrier.h
> > @@ -34,7 +34,7 @@
> >  #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
> >  #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
> > 
> > -#define smp_store_mb(var, value)   do { WRITE_ONCE(var, value); mb(); } while (0)
> > +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> > 
> >  #ifdef __SUBARCH_HAS_LWSYNC
> >  #define SMPWMB  LWSYNC
> > diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
> > index d68e11e..7ffd0b1 100644
> > --- a/arch/s390/include/asm/barrier.h
> > +++ b/arch/s390/include/asm/barrier.h
> > @@ -36,7 +36,7 @@
> >  #define smp_mb__before_atomic()smp_mb()
> >  #define smp_mb__after_atomic() smp_mb()
> > 
> > -#define smp_store_mb(var, value)   do { WRITE_ONCE(var, value); mb(); } while (0)
> > +#define smp_store_mb(var, value)   do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> > 
> >  #define smp_store_release(p, v)  \
> >  do {  \
> > diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> > index b42afad..0f45f93 100644
> > --- a/include/asm-generic/barrier.h
> > +++ b/include/asm-generic/barrier.h
> > @@ -93,7 +93,7 @@
> >  #endif /* CONFIG_SMP */
> > 
> >  #ifndef smp_store_mb
> > -#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); mb(); } while (0)
> > +#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> >  #endif
> > 
> >  #ifndef smp_mb__before_atomic
> > -- 
> > MST
> > 
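
For readers without a kernel tree at hand, the shape of the change can be mimicked in userspace with C11 atomics. This is a sketch under assumptions, not the kernel macros: store_mb(), sketch_smp_mb(), SKETCH_UP, prepare_to_wait(), sleeper_state and wake_condition are all hypothetical names. With SKETCH_UP defined the barrier shrinks to a compiler-only fence, which mirrors the point in the commit message that the smp variant avoids a mandatory hardware barrier on uniprocessor builds.

```c
#include <assert.h>
#include <stdatomic.h>

/* SMP-only barrier analogue: full fence on SMP, compiler fence on UP. */
#ifdef SKETCH_UP
#define sketch_smp_mb() atomic_signal_fence(memory_order_seq_cst)
#else
#define sketch_smp_mb() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Analogue of smp_store_mb(): store the value, then issue the barrier. */
#define store_mb(var, value)                                              \
	do {                                                              \
		atomic_store_explicit(&(var), (value),                    \
				      memory_order_relaxed);              \
		sketch_smp_mb();                                          \
	} while (0)

static atomic_int sleeper_state;    /* 0 = running, 1 = about to sleep */
static atomic_int wake_condition;   /* set by a waker on another CPU */

/*
 * Typical use: publish the new state, then check the condition. The
 * barrier keeps the store from being reordered after the load.
 */
static int prepare_to_wait(void)
{
	store_mb(sleeper_state, 1);
	return atomic_load_explicit(&wake_condition, memory_order_relaxed);
}
```

The ordering only concerns other CPUs observing memory, which is why an I/O-strength barrier is not needed here.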

Re: [PATCH 2/2] ASoC: fsl: select SND_SOC_FSL_SAI or SND_SOC_FSL_SSI depending on SoC type

2016-01-12 Thread Timur Tabi

Lothar Waßmann wrote:

> -   select SND_SOC_FSL_SSI
> +   select SND_SOC_FSL_SAI if SOC_IMX6UL
> +   select SND_SOC_FSL_SSI if SOC_IMX6Q || SOC_IMX6SL || SOC_IMX6SX


I don't think this is compatible with a multiarch kernel.

Re: [PATCH 1/2] ASoC: fsl: imx-sgtl5000: make audmux optional for imx sound driver

2016-01-12 Thread Mark Brown
On Tue, Jan 12, 2016 at 07:13:30PM +0100, Lothar Waßmann wrote:

> i.MX6UL does not have the audio multiplexer (AUDMUX) like e.g. i.MX6Q,
> but apart from that can use the same audio driver. Make audmux
> optional for the imx-sgtl5000 driver, so it can be used on i.MX6UL
> too. Also i.MX6UL requires use of the SAI interface rather than SSI.

If it doesn't have the audmux can you use simple-card?



[PATCH 1/2] ASoC: fsl: imx-sgtl5000: make audmux optional for imx sound driver

2016-01-12 Thread Lothar Waßmann
i.MX6UL does not have the audio multiplexer (AUDMUX) like e.g. i.MX6Q,
but apart from that can use the same audio driver. Make audmux
optional for the imx-sgtl5000 driver, so it can be used on i.MX6UL
too. Also i.MX6UL requires use of the SAI interface rather than SSI.

Signed-off-by: Lothar Waßmann 
---
 sound/soc/fsl/imx-sgtl5000.c | 70 +++-
 1 file changed, 36 insertions(+), 34 deletions(-)

diff --git a/sound/soc/fsl/imx-sgtl5000.c b/sound/soc/fsl/imx-sgtl5000.c
index b99e0b5..7cefb40 100644
--- a/sound/soc/fsl/imx-sgtl5000.c
+++ b/sound/soc/fsl/imx-sgtl5000.c
@@ -65,40 +65,42 @@ static int imx_sgtl5000_probe(struct platform_device *pdev)
int int_port, ext_port;
int ret;
 
-   ret = of_property_read_u32(np, "mux-int-port", &int_port);
-   if (ret) {
-   dev_err(&pdev->dev, "mux-int-port missing or invalid\n");
-   return ret;
-   }
-   ret = of_property_read_u32(np, "mux-ext-port", &ext_port);
-   if (ret) {
-   dev_err(&pdev->dev, "mux-ext-port missing or invalid\n");
-   return ret;
-   }
-
-   /*
-* The port numbering in the hardware manual starts at 1, while
-* the audmux API expects it starts at 0.
-*/
-   int_port--;
-   ext_port--;
-   ret = imx_audmux_v2_configure_port(int_port,
-   IMX_AUDMUX_V2_PTCR_SYN |
-   IMX_AUDMUX_V2_PTCR_TFSEL(ext_port) |
-   IMX_AUDMUX_V2_PTCR_TCSEL(ext_port) |
-   IMX_AUDMUX_V2_PTCR_TFSDIR |
-   IMX_AUDMUX_V2_PTCR_TCLKDIR,
-   IMX_AUDMUX_V2_PDCR_RXDSEL(ext_port));
-   if (ret) {
-   dev_err(&pdev->dev, "audmux internal port setup failed\n");
-   return ret;
-   }
-   ret = imx_audmux_v2_configure_port(ext_port,
-   IMX_AUDMUX_V2_PTCR_SYN,
-   IMX_AUDMUX_V2_PDCR_RXDSEL(int_port));
-   if (ret) {
-   dev_err(&pdev->dev, "audmux external port setup failed\n");
-   return ret;
+   if (!of_property_read_bool(np, "fsl,no-audmux")) {
+   ret = of_property_read_u32(np, "mux-int-port", &int_port);
+   if (ret) {
+   dev_err(&pdev->dev, "mux-int-port missing or invalid\n");
+   return ret;
+   }
+   ret = of_property_read_u32(np, "mux-ext-port", &ext_port);
+   if (ret) {
+   dev_err(&pdev->dev, "mux-ext-port missing or invalid\n");
+   return ret;
+   }
+
+   /*
+* The port numbering in the hardware manual starts at 1, while
+* the audmux API expects it starts at 0.
+*/
+   int_port--;
+   ext_port--;
+   ret = imx_audmux_v2_configure_port(int_port,
+   IMX_AUDMUX_V2_PTCR_SYN |
+   IMX_AUDMUX_V2_PTCR_TFSEL(ext_port) |
+   IMX_AUDMUX_V2_PTCR_TCSEL(ext_port) |
+   IMX_AUDMUX_V2_PTCR_TFSDIR |
+   IMX_AUDMUX_V2_PTCR_TCLKDIR,
+   IMX_AUDMUX_V2_PDCR_RXDSEL(ext_port));
+   if (ret) {
+   dev_err(&pdev->dev, "audmux internal port setup failed\n");
+   return ret;
+   }
+   ret = imx_audmux_v2_configure_port(ext_port,
+   IMX_AUDMUX_V2_PTCR_SYN,
+   IMX_AUDMUX_V2_PDCR_RXDSEL(int_port));
+   if (ret) {
+   dev_err(&pdev->dev, "audmux external port setup failed\n");
+   return ret;
+   }
}
 
ssi_np = of_parse_phandle(pdev->dev.of_node, "ssi-controller", 0);
-- 
2.1.4
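
The control flow this patch introduces can be sketched without the device-tree API. Everything below is an assumption made for illustration: struct sound_node is a flattened stand-in for the properties the probe reads, and struct audmux_cfg and parse_audmux() are hypothetical names. It shows the two points of the change, namely that the port properties are only required when "fsl,no-audmux" is absent, and that the 1-based port numbers from the hardware manual are converted to the 0-based numbers the audmux API expects.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, flattened view of the properties the probe looks at. */
struct sound_node {
	bool no_audmux;       /* "fsl,no-audmux" is present */
	bool has_mux_ports;   /* "mux-int-port" and "mux-ext-port" are present */
	int mux_int_port;     /* 1-based, as in the hardware manual */
	int mux_ext_port;     /* 1-based, as in the hardware manual */
};

struct audmux_cfg {
	bool used;
	int int_port;         /* 0-based, as the audmux API expects */
	int ext_port;
};

/* Returns 0 on success, -1 if audmux is needed but the ports are missing. */
static int parse_audmux(const struct sound_node *np, struct audmux_cfg *cfg)
{
	assert(np && cfg);
	cfg->used = false;

	if (np->no_audmux)
		return 0;             /* e.g. i.MX6UL: nothing to configure */
	if (!np->has_mux_ports)
		return -1;

	cfg->used = true;
	cfg->int_port = np->mux_int_port - 1;
	cfg->ext_port = np->mux_ext_port - 1;
	return 0;
}
```

Boards that already describe their mux ports keep working unchanged, because the absence of the new property selects the old path.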


[PATCH 2/2] ASoC: fsl: select SND_SOC_FSL_SAI or SND_SOC_FSL_SSI depending on SoC type

2016-01-12 Thread Lothar Waßmann
i.MX6UL does not provide an SSI interface like the other i.MX6 SoCs,
but only an SAI interface.
Select the appropriate interface(s) depending on the enabled SoC types.

Signed-off-by: Lothar Waßmann 
---
 sound/soc/fsl/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig
index 14dfdee..c128823 100644
--- a/sound/soc/fsl/Kconfig
+++ b/sound/soc/fsl/Kconfig
@@ -258,7 +258,8 @@ config SND_SOC_IMX_SGTL5000
select SND_SOC_SGTL5000
select SND_SOC_IMX_PCM_DMA
select SND_SOC_IMX_AUDMUX
-   select SND_SOC_FSL_SSI
+   select SND_SOC_FSL_SAI if SOC_IMX6UL
+   select SND_SOC_FSL_SSI if SOC_IMX6Q || SOC_IMX6SL || SOC_IMX6SX
help
  Say Y if you want to add support for SoC audio on an i.MX board with
  a sgtl5000 codec.
-- 
2.1.4


[PATCH 0/2] ASoC: fsl: make snd-soc-imx-sgtl5000 driver useable on i.MX6UL

2016-01-12 Thread Lothar Waßmann
This patchset adds support for the i.MX6UL SoC to the imx-sgtl5000
sound driver.
The first patch makes the audmux setup optional for the driver, since
i.MX6UL does not have this unit.
The second patch selects the SAI interface rather than the SSI
interface for the i.MX6UL SoC.

A patch to make the corresponding DTB changes has been sent
separately.


[PATCH 2/2] clk: imx: add kpp clock for i.MX6UL

2016-01-12 Thread Lothar Waßmann
Add the necessary clock to use the KPP interface on i.MX6UL.

Signed-off-by: Lothar Waßmann 
---
 drivers/clk/imx/clk-imx6ul.c | 1 +
 include/dt-bindings/clock/imx6ul-clock.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/imx/clk-imx6ul.c b/drivers/clk/imx/clk-imx6ul.c
index 3e31ec0..1ee28d3 100644
--- a/drivers/clk/imx/clk-imx6ul.c
+++ b/drivers/clk/imx/clk-imx6ul.c
@@ -365,6 +365,7 @@ static void __init imx6ul_clocks_init(struct device_node *ccm_node)
/* CCGR5 */
clks[IMX6UL_CLK_ROM]= imx_clk_gate2("rom",  "ahb",  base + 0x7c,0);
clks[IMX6UL_CLK_SDMA]   = imx_clk_gate2("sdma", "ahb",  base + 0x7c,6);
+   clks[IMX6UL_CLK_KPP]= imx_clk_gate2("kpp",  "ipg",  base + 0x7c,8);
clks[IMX6UL_CLK_WDOG2]  = imx_clk_gate2("wdog2","ipg",  base + 0x7c,10);
clks[IMX6UL_CLK_SPBA]   = imx_clk_gate2("spba", "ipg",  base + 0x7c,12);
clks[IMX6UL_CLK_SPDIF]  = imx_clk_gate2_shared("spdif", "spdif_podf",   base + 0x7c,14, &share_count_audio);
diff --git a/include/dt-bindings/clock/imx6ul-clock.h b/include/dt-bindings/clock/imx6ul-clock.h
index 08ce4a7..fd8aee8 100644
--- a/include/dt-bindings/clock/imx6ul-clock.h
+++ b/include/dt-bindings/clock/imx6ul-clock.h
@@ -234,7 +234,8 @@
 #define IMX6UL_CLK_CSI_SEL 221
 #define IMX6UL_CLK_CSI_PODF222
 #define IMX6UL_CLK_PLL3_120M   223
+#define IMX6UL_CLK_KPP 224
 
-#define IMX6UL_CLK_END 224
+#define IMX6UL_CLK_END 225
 
 #endif /* __DT_BINDINGS_CLOCK_IMX6UL_H */
-- 
2.1.4
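
The last argument of the new imx_clk_gate2() line (8) is a bit position inside the CCGR5 register at base + 0x7c. A small sketch of what that position means, assuming the usual i.MX CCGR layout of one two-bit field per gate; the helper names are hypothetical, and treating "both bits set" as enabled is a simplification made for this sketch rather than a statement about the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: each clock gate owns a 2-bit field in a CCGR register. */
#define CCGR_FIELD_MASK 0x3u

/* Set both bits of the field starting at 'shift'. */
static uint32_t ccgr_gate_enable(uint32_t reg, unsigned int shift)
{
	assert(shift <= 30 && shift % 2 == 0);
	return reg | (CCGR_FIELD_MASK << shift);
}

/* Clear both bits of the field starting at 'shift'. */
static uint32_t ccgr_gate_disable(uint32_t reg, unsigned int shift)
{
	assert(shift <= 30 && shift % 2 == 0);
	return reg & ~(CCGR_FIELD_MASK << shift);
}

/* In this sketch a gate counts as enabled only when both bits are set. */
static int ccgr_gate_is_enabled(uint32_t reg, unsigned int shift)
{
	assert(shift <= 30 && shift % 2 == 0);
	return ((reg >> shift) & CCGR_FIELD_MASK) == CCGR_FIELD_MASK;
}
```

With this layout the kpp gate at shift 8 sits between sdma (6) and wdog2 (10), which matches where the patch adds the new line.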


[PATCH 1/2] clk: imx: whitespace cleanup; no functional change

2016-01-12 Thread Lothar Waßmann
remove whitespace before TAB.

Signed-off-by: Lothar Waßmann 
---
 drivers/clk/imx/clk-imx6ul.c |  62 ++---
 include/dt-bindings/clock/imx6ul-clock.h | 146 +++
 2 files changed, 104 insertions(+), 104 deletions(-)

diff --git a/drivers/clk/imx/clk-imx6ul.c b/drivers/clk/imx/clk-imx6ul.c
index 08692d7..3e31ec0 100644
--- a/drivers/clk/imx/clk-imx6ul.c
+++ b/drivers/clk/imx/clk-imx6ul.c
@@ -157,9 +157,9 @@ static void __init imx6ul_clocks_init(struct device_node 
*ccm_node)
clk_set_parent(clks[IMX6UL_PLL7_BYPASS], clks[IMX6UL_CLK_PLL7]);
 
clks[IMX6UL_CLK_PLL1_SYS]   = imx_clk_fixed_factor("pll1_sys",  
"pll1_bypass", 1, 1);
-   clks[IMX6UL_CLK_PLL2_BUS]   = imx_clk_gate("pll2_bus",  
"pll2_bypass", base + 0x30, 13);
-   clks[IMX6UL_CLK_PLL3_USB_OTG]   = imx_clk_gate("pll3_usb_otg",  
"pll3_bypass", base + 0x10, 13);
-   clks[IMX6UL_CLK_PLL4_AUDIO] = imx_clk_gate("pll4_audio",
"pll4_bypass", base + 0x70, 13);
+   clks[IMX6UL_CLK_PLL2_BUS]   = imx_clk_gate("pll2_bus",  
"pll2_bypass", base + 0x30, 13);
+   clks[IMX6UL_CLK_PLL3_USB_OTG]   = imx_clk_gate("pll3_usb_otg",  
"pll3_bypass", base + 0x10, 13);
+   clks[IMX6UL_CLK_PLL4_AUDIO] = imx_clk_gate("pll4_audio",
"pll4_bypass", base + 0x70, 13);
clks[IMX6UL_CLK_PLL5_VIDEO] = imx_clk_gate("pll5_video",
"pll5_bypass", base + 0xa0, 13);
clks[IMX6UL_CLK_PLL6_ENET]  = imx_clk_gate("pll6_enet", 
"pll6_bypass", base + 0xe0, 13);
clks[IMX6UL_CLK_PLL7_USB_HOST]  = imx_clk_gate("pll7_usb_host", 
"pll7_bypass", base + 0x20, 13);
@@ -196,8 +196,8 @@ static void __init imx6ul_clocks_init(struct device_node 
*ccm_node)
base + 0xe0, 2, 2, 0, clk_enet_ref_table, 
&imx_ccm_lock);
 
clks[IMX6UL_CLK_ENET2_REF_125M] = imx_clk_gate("enet_ref_125m", 
"enet2_ref", base + 0xe0, 20);
-   clks[IMX6UL_CLK_ENET_PTP_REF]   = imx_clk_fixed_factor("enet_ptp_ref", 
"pll6_enet", 1, 20);
-   clks[IMX6UL_CLK_ENET_PTP]   = imx_clk_gate("enet_ptp", 
"enet_ptp_ref", base + 0xe0, 21);
+   clks[IMX6UL_CLK_ENET_PTP_REF]   = imx_clk_fixed_factor("enet_ptp_ref", 
"pll6_enet", 1, 20);
+   clks[IMX6UL_CLK_ENET_PTP]   = imx_clk_gate("enet_ptp", 
"enet_ptp_ref", base + 0xe0, 21);
 
clks[IMX6UL_CLK_PLL4_POST_DIV]  = clk_register_divider_table(NULL, 
"pll4_post_div", "pll4_audio",
 CLK_SET_RATE_PARENT | CLK_SET_RATE_GATE, base + 0x70, 19, 2, 
0, post_div_table, &imx_ccm_lock);
@@ -210,8 +210,8 @@ static void __init imx6ul_clocks_init(struct device_node 
*ccm_node)
 
/* name 
parent_name  mult  div */
clks[IMX6UL_CLK_PLL2_198M] = imx_clk_fixed_factor("pll2_198m", 
"pll2_pfd2_396m", 1, 2);
-   clks[IMX6UL_CLK_PLL3_80M]  = imx_clk_fixed_factor("pll3_80m",  
"pll3_usb_otg",   1, 6);
-   clks[IMX6UL_CLK_PLL3_60M]  = imx_clk_fixed_factor("pll3_60m",  
"pll3_usb_otg",   1, 8);
+   clks[IMX6UL_CLK_PLL3_80M]  = imx_clk_fixed_factor("pll3_80m",  
"pll3_usb_otg",   1, 6);
+   clks[IMX6UL_CLK_PLL3_60M]  = imx_clk_fixed_factor("pll3_60m",  
"pll3_usb_otg",   1, 8);
clks[IMX6UL_CLK_GPT_3M]= imx_clk_fixed_factor("gpt_3m", "osc",  
 1, 8);
 
np = ccm_node;
@@ -219,34 +219,34 @@ static void __init imx6ul_clocks_init(struct device_node 
*ccm_node)
WARN_ON(!base);
 
clks[IMX6UL_CA7_SECONDARY_SEL]= imx_clk_mux("ca7_secondary_sel", 
base + 0xc, 3, 1, ca7_secondary_sels, ARRAY_SIZE(ca7_secondary_sels));
-   clks[IMX6UL_CLK_STEP] = imx_clk_mux("step", base + 0x0c, 8, 
1, step_sels, ARRAY_SIZE(step_sels));
-   clks[IMX6UL_CLK_PLL1_SW]  = imx_clk_mux_flags("pll1_sw",   base 
+ 0x0c, 2,  1, pll1_sw_sels, ARRAY_SIZE(pll1_sw_sels), 0);
+   clks[IMX6UL_CLK_STEP] = imx_clk_mux("step", base + 0x0c, 8, 
1, step_sels, ARRAY_SIZE(step_sels));
+   clks[IMX6UL_CLK_PLL1_SW]  = imx_clk_mux_flags("pll1_sw",   base 
+ 0x0c, 2,  1, pll1_sw_sels, ARRAY_SIZE(pll1_sw_sels), 0);
clks[IMX6UL_CLK_AXI_ALT_SEL]  = imx_clk_mux("axi_alt_sel",  
base + 0x14, 7,  1, axi_alt_sels, ARRAY_SIZE(axi_alt_sels));
-   clks[IMX6UL_CLK_AXI_SEL]  = imx_clk_mux_flags("axi_sel",
base + 0x14, 6,  1, axi_sels, ARRAY_SIZE(axi_sels), 0);
-   clks[IMX6UL_CLK_PERIPH_PRE]   = imx_clk_mux("periph_pre",   
base + 0x18, 18, 2, periph_pre_sels, ARRAY_SIZE(periph_pre_sels));
-   clks[IMX6UL_CLK_PERIPH2_PRE]  = imx_clk_mux("periph2_pre",  
base + 0x18, 21, 2, periph2_pre_sels, ARRAY_SIZE(periph2_pre_sels));
+   clks[IMX6UL_CLK_AXI_SEL]  = imx_clk_mux_flags("axi_sel",
base + 0x14, 6,  1, axi_sels, ARRAY_SIZE(axi_sels), 0);
+   clks[IMX6UL_CLK_PERIPH_PRE]   = imx_clk_mux("periph_pre",   
base + 0x18, 18, 2, periph_pre_sels, ARRAY_

[PATCH 0/2] clk: imx6: add kpp clock for i.MX6UL

2016-01-12 Thread Lothar Waßmann
This patchset adds the clock which is necessary to operate the KPP
unit on i.MX6UL.
The first patch removes bogus whitespace before TABs in indentation.
The second patch adds the clock definition.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] Add hwcap2 bits for POWER9

2016-01-12 Thread Carlos O'Donell
On 01/12/2016 11:39 AM, Steven Munroe wrote:
>> That's the rule. There are no other discussions to be had.
>>
> Well it was posted to powerpc next:
> https://git.kernel.org/powerpc/c/e708c24cd01ce80b1609d8bacc
> 
> We have agreement between the kernel and GLIBC (and the ABI). 
> 
> The issue is just coordination across communities and individuals that
> may not be paying attention to other communities' deadlines.
> 
> Have you ever tried to push a string uphill? That is open source
> development in a nutshell. ;)

I know exactly what this is like.

> So it is in flight and glibc is soft/slush freeze. I would hate to
> revert this one day just to add it back to the next. Especially if those
> days straddle the hard freeze ...
> 
> So can we let this ride a day or two?

Sure. I'm not an unreasonable person.

My goal as a glibc steward is to remind IBM that our best practice is that
we *wait* until it goes into mainline before committing to glibc master.

There really isn't any reason to check this in to glibc master right now.
It could wait.

Adhemerval as a release manager is also not an unreasonable person.
I have already discussed with Tulio that he should have just waited to
commit these changes, but gotten an exception from Adhemerval to check in
the fairly low-risk patches late in the freeze. That's exactly the purpose
of a release manager's job: to grant you exceptions as we approach release,
particularly when schedules don't quite line up.

Cheers,
Carlos.

Re: [PATCH v3 05/41] powerpc: reuse asm-generic/barrier.h

2016-01-12 Thread Paul E. McKenney
On Sun, Jan 10, 2016 at 04:17:09PM +0200, Michael S. Tsirkin wrote:
> On powerpc read_barrier_depends, smp_read_barrier_depends
> smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the
> asm-generic variants exactly. Drop the local definitions and pull in
> asm-generic/barrier.h instead.
> 
> This is in preparation to refactoring this code area.
> 
> Signed-off-by: Michael S. Tsirkin 
> Acked-by: Arnd Bergmann 

Looks sane to me.

Reviewed-by: Paul E. McKenney 

> ---
>  arch/powerpc/include/asm/barrier.h | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/barrier.h 
> b/arch/powerpc/include/asm/barrier.h
> index a7af5fb..980ad0c 100644
> --- a/arch/powerpc/include/asm/barrier.h
> +++ b/arch/powerpc/include/asm/barrier.h
> @@ -34,8 +34,6 @@
>  #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
>  #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
> 
> -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } 
> while (0)
> -
>  #ifdef __SUBARCH_HAS_LWSYNC
>  #define SMPWMB  LWSYNC
>  #else
> @@ -60,9 +58,6 @@
>  #define smp_wmb()barrier()
>  #endif /* CONFIG_SMP */
> 
> -#define read_barrier_depends()   do { } while (0)
> -#define smp_read_barrier_depends()   do { } while (0)
> -
>  /*
>   * This is a barrier which prevents following instructions from being
>   * started until the value of the argument x is known.  For example, if
> @@ -87,8 +82,8 @@ do {
> \
>   ___p1;  \
>  })
> 
> -#define smp_mb__before_atomic() smp_mb()
> -#define smp_mb__after_atomic()  smp_mb()
>  #define smp_mb__before_spinlock()   smp_mb()
> 
> +#include <asm-generic/barrier.h>
> +
>  #endif /* _ASM_POWERPC_BARRIER_H */
> -- 
> MST
> 
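
The approach this patch relies on, where the architecture header defines only
what differs and the generic header supplies the rest behind #ifndef guards,
can be sketched in user-space C. This is an illustration under assumptions,
not kernel code: the barrier_calls counter and the stand-in smp_mb() are
invented here so that the effect of the fallbacks is observable.

```c
#include <assert.h>

/* Hypothetical counter so the sketch can observe which barriers ran. */
static int barrier_calls;

/* "arch" part: defines only what is architecture specific. */
#define smp_mb() do { barrier_calls++; __sync_synchronize(); } while (0)

/* "generic" part: supplies a default for anything still undefined. */
#ifndef smp_mb__before_atomic
#define smp_mb__before_atomic() smp_mb()
#endif

#ifndef smp_mb__after_atomic
#define smp_mb__after_atomic() smp_mb()
#endif

#ifndef read_barrier_depends
#define read_barrier_depends() do { } while (0)
#endif

/* Returns how many full barriers the update sequence issued. */
static int count_barriers_around_update(int *counter)
{
    assert(counter != 0);
    barrier_calls = 0;
    smp_mb__before_atomic();
    (*counter)++;               /* stands in for an atomic read-modify-write */
    smp_mb__after_atomic();
    read_barrier_depends();     /* expands to nothing */
    return barrier_calls;
}
```

Because the fallbacks are guarded, an architecture that needs a different
definition can still provide one before the generic part is pulled in.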


Re: [PATCH] Add hwcap2 bits for POWER9

2016-01-12 Thread Steven Munroe
On Mon, 2016-01-11 at 15:48 -0500, Carlos O'Donell wrote:
> On 01/11/2016 02:55 PM, Tulio Magno Quites Machado Filho wrote:
> > "Carlos O'Donell"  writes:
> > 
> >> On 01/11/2016 10:16 AM, Tulio Magno Quites Machado Filho wrote:
> >>> Adhemerval Zanella  writes:
> >>>
>  On 08-01-2016 13:36, Peter Bergner wrote:
> > On Fri, 2016-01-08 at 11:25 -0200, Tulio Magno Quites Machado Filho 
> > wrote:
> >> Peter, this solves the issue you reported previously [1].
> >>
> >> [1] https://sourceware.org/ml/libc-alpha/2015-12/msg00522.html
> >
> > Agreed, thanks.  I'll also add the POWER9 support to the GCC side
> > of the patch now that the glibc code is upstream.
> 
>  I do not see these bits being added on the kernel side yet, and GLIBC usually
>  only syncs these kinds of bits *after* they are included on the kernel side.
>  So I would advise either getting these pieces (kernel support and hwcap
>  advertisement) into the kernel before the 2.23 release, or reverting the patches.
> >>>
> >>> Ack.
> >>> It has just been sent to the corresponding Linux mailing list:
> >>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-January/137763.html
> >>
> >> Please revert the changes from glibc until you check in support to the
> >> Linux kernel mainline.
> >>
> >> Leaving these bits in increases the risk that someone deploys a glibc
> >> that may then have the wrong value.
> > 
> > Could you clarify this statement, please?
> > I fail to see how they could have the wrong value.
> 
> Until it is checked into the mainline kernel it is not canonical.
> 
> That's the rule. There are no other discussions to be had.
> 
Well it was posted to powerpc next:
https://git.kernel.org/powerpc/c/e708c24cd01ce80b1609d8bacc

We have agreement between the kernel and GLIBC (and the ABI). 

The issue is just coordination across communities and individuals that
may not be paying attention to other communities' deadlines.

Have you ever tried to push a string uphill? That is open source
development in a nutshell. ;)

So it is in flight and glibc is soft/slush freeze. I would hate to
revert this one day just to add it back to the next. Especially if those
days straddle the hard freeze ...

So can we let this ride a day or two?

> The single rule avoids discussions like "it can never be wrong because that's
> what our ABI says it is."
> 



Re: [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release()

2016-01-12 Thread Paul E. McKenney
On Sun, Jan 10, 2016 at 04:16:32PM +0200, Michael S. Tsirkin wrote:
> From: Davidlohr Bueso 
> 
> With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()")
> it was made clear that the context of this call (and thus set_mb)
> is strictly for CPU ordering, as opposed to IO. As such all archs
> should use the smp variant of mb(), respecting the semantics and
> saving a mandatory barrier on UP.
> 
> Signed-off-by: Davidlohr Bueso 
> Signed-off-by: Peter Zijlstra (Intel) 
> Cc: 
> Cc: Andrew Morton 
> Cc: Benjamin Herrenschmidt 
> Cc: Heiko Carstens 
> Cc: Linus Torvalds 
> Cc: Paul E. McKenney 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: Tony Luck 
> Cc: d...@stgolabs.net
> Link: 
> http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-d...@stgolabs.net
> Signed-off-by: Ingo Molnar 

Aside from a need for s/lcoking/locking/ in the subject line:

Reviewed-by: Paul E. McKenney 

> ---
>  arch/ia64/include/asm/barrier.h| 2 +-
>  arch/powerpc/include/asm/barrier.h | 2 +-
>  arch/s390/include/asm/barrier.h| 2 +-
>  include/asm-generic/barrier.h  | 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
> index df896a1..209c4b8 100644
> --- a/arch/ia64/include/asm/barrier.h
> +++ b/arch/ia64/include/asm/barrier.h
> @@ -77,7 +77,7 @@ do {
> \
>   ___p1;  \
>  })
> 
> -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } 
> while (0)
> +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } 
> while (0)
> 
>  /*
>   * The group barrier in front of the rsm & ssm are necessary to ensure
> diff --git a/arch/powerpc/include/asm/barrier.h 
> b/arch/powerpc/include/asm/barrier.h
> index 0eca6ef..a7af5fb 100644
> --- a/arch/powerpc/include/asm/barrier.h
> +++ b/arch/powerpc/include/asm/barrier.h
> @@ -34,7 +34,7 @@
>  #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
>  #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
> 
> -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } 
> while (0)
> +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } 
> while (0)
> 
>  #ifdef __SUBARCH_HAS_LWSYNC
>  #define SMPWMB  LWSYNC
> diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
> index d68e11e..7ffd0b1 100644
> --- a/arch/s390/include/asm/barrier.h
> +++ b/arch/s390/include/asm/barrier.h
> @@ -36,7 +36,7 @@
>  #define smp_mb__before_atomic()  smp_mb()
>  #define smp_mb__after_atomic()   smp_mb()
> 
> -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); 
> mb(); } while (0)
> +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); 
> } while (0)
> 
>  #define smp_store_release(p, v)  
> \
>  do { \
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index b42afad..0f45f93 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -93,7 +93,7 @@
>  #endif   /* CONFIG_SMP */
> 
>  #ifndef smp_store_mb
> -#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); mb(); } while 
> (0)
> +#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); smp_mb(); } 
> while (0)
>  #endif
> 
>  #ifndef smp_mb__before_atomic
> -- 
> MST
> 
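
A minimal user-space sketch of what the change buys on a uniprocessor build.
The counters and the store_mb_old() name are assumptions added for
illustration; only the shapes of mb(), smp_mb() and smp_store_mb() follow the
patch. With CONFIG_SMP undefined, smp_mb() degrades to a compiler barrier, so
the new smp_store_mb() no longer pays for a hardware barrier.

```c
#include <assert.h>

/* Hypothetical counter: hardware barriers actually executed. */
static int mandatory_barriers;

#define barrier() __asm__ __volatile__("" ::: "memory")
#define mb()      do { mandatory_barriers++; __sync_synchronize(); } while (0)

#ifdef CONFIG_SMP
#define smp_mb()  mb()
#else
#define smp_mb()  barrier()     /* UP build: no other CPU to order against */
#endif

#define WRITE_ONCE(var, value) \
    (*(volatile __typeof__(var) *)&(var) = (value))

/* Before the patch: mandatory barrier, even on UP. */
#define store_mb_old(var, value) \
    do { WRITE_ONCE(var, value); mb(); } while (0)

/* After the patch: SMP-conditional barrier. */
#define smp_store_mb(var, value) \
    do { WRITE_ONCE(var, value); smp_mb(); } while (0)

/* Returns how many hardware barriers the new form saves on this build. */
static int barriers_saved(void)
{
    int flag = 0;
    int old_cost;

    mandatory_barriers = 0;
    store_mb_old(flag, 1);
    old_cost = mandatory_barriers;

    mandatory_barriers = 0;
    smp_store_mb(flag, 2);
    assert(flag == 2);          /* the store itself is unchanged */

    return old_cost - mandatory_barriers;
}
```

Built with -DCONFIG_SMP the two forms cost the same, which matches the
commit message: the saving applies to UP only.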


Re: [PATCH 1/2] scripts/recordmcount.pl: support data in text section on powerpc

2016-01-12 Thread Steven Rostedt
On Tue, 12 Jan 2016 23:14:22 +1100
Michael Ellerman  wrote:

> From: Ulrich Weigand 
> 
> If a text section starts out with a data blob before the first
> function start label, disassembly parsing done in recordmcount.pl
> gets confused on powerpc, leading to creation of corrupted module
> objects.
> 
> This was not a problem so far since the compiler would never create
> such text sections.  However, this has changed with a recent change
> in GCC 6 to support distances of > 2GB between a function and its
> associated TOC in the ELFv2 ABI, exposing this problem.
> 
> There is already code in recordmcount.pl to handle such data blobs
> on the sparc64 platform.  This patch uses the same method to handle
> those on powerpc as well.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Ulrich Weigand 
> Signed-off-by: Michael Ellerman 
> ---
>  scripts/recordmcount.pl | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Steve can we get an ack for this one, to go via powerpc? cheers

Acked-by: Steven Rostedt 

-- Steve

> 
> diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
> index 826470d7f000..96e2486a6fc4 100755
> --- a/scripts/recordmcount.pl
> +++ b/scripts/recordmcount.pl
> @@ -263,7 +263,8 @@ if ($arch eq "x86_64") {
>  
>  } elsif ($arch eq "powerpc") {
>  $local_regex = "^[0-9a-fA-F]+\\s+t\\s+(\\.?\\S+)";
> -$function_regex = "^([0-9a-fA-F]+)\\s+<(\\.?.*?)>:";
> +# See comment in the sparc64 section for why we use '\w'.
> +$function_regex = "^([0-9a-fA-F]+)\\s+<(\\.?\\w*?)>:";
>  $mcount_regex = "^\\s*([0-9a-fA-F]+):.*\\s\\.?_mcount\$";
>  
>  if ($bits == 64) {


Re: [PATCH v3 13/41] x86: reuse asm-generic/barrier.h

2016-01-12 Thread Thomas Gleixner
On Sun, 10 Jan 2016, Michael S. Tsirkin wrote:

> As on most architectures, on x86 read_barrier_depends and
> smp_read_barrier_depends are empty.  Drop the local definitions and pull
> the generic ones from asm-generic/barrier.h instead: they are identical.
> 
> This is in preparation to refactoring this code area.
> 
> Signed-off-by: Michael S. Tsirkin 
> Acked-by: Arnd Bergmann 

Reviewed-by: Thomas Gleixner 

Re: [PATCH v3 27/41] x86: define __smp_xxx

2016-01-12 Thread Thomas Gleixner
On Sun, 10 Jan 2016, Michael S. Tsirkin wrote:

> This defines __smp_xxx barriers for x86,
> for use by virtualization.
> 
> smp_xxx barriers are removed as they are
> defined correctly by asm-generic/barriers.h
> 
> Signed-off-by: Michael S. Tsirkin 
> Acked-by: Arnd Bergmann 

Reviewed-by: Thomas Gleixner 

RE: cxl: Fix DSI misses when the context owning task exits

2016-01-12 Thread David Laight
From: Michael Ellerman
> Sent: 11 January 2016 09:14
> On Tue, 2015-11-24 at 10:56:18 UTC, Vaibhav Jain wrote:
> > Presently when a user-space process issues CXL_IOCTL_START_WORK ioctl we
> > store the pid of the current task_struct and use it to get pointer to
> > the mm_struct of the process, while processing page or segment faults
> > from the capi card. However this causes issues when the thread that had
> > originally issued the start-work ioctl exits in which case the stored
> > pid is no longer valid and the cxl driver is unable to handle faults as
> > the mm_struct corresponding to the process is no longer accessible.
> >
> > This patch fixes this issue by using the mm_struct of the next alive
> > task in the thread group. This is done by iterating over all the tasks
> > in the thread group starting from thread group leader and calling
> > get_task_mm on each one of them. When a valid mm_struct is obtained the
> > pid of the associated task is stored in the context replacing the
> > exiting one for handling future faults.

I don't even claim to understand the linux model for handling process
address maps, nor what the cxl driver is doing, but the above looks
more than dodgy.

David


Re: [PATCH v3 00/41] arch: barrier cleanup + barriers for virt

2016-01-12 Thread Peter Zijlstra
On Sun, Jan 10, 2016 at 04:16:22PM +0200, Michael S. Tsirkin wrote:
> I parked this in vhost tree for now, though the inclusion of patch 1 from tip
> creates a merge conflict - but one that is trivial to resolve.
> 
> So I intend to just merge it all through my tree, including the
> duplicate patch, and assume conflict will be resolved.
> 
> I would really appreciate some feedback on arch bits (especially the x86 
> bits),
> and acks for merging this through the vhost tree.

Thanks for doing this, looks good to me.

Acked-by: Peter Zijlstra (Intel) 

Re: powerpc/powernv: Remove misleading comment in pci.c

2016-01-12 Thread Michael Ellerman
On Fri, 2016-01-08 at 05:16:47 UTC, Russell Currey wrote:
> PCI in powernv now supports quite a bit more than p5ioc2, so remove the
> outdated comment.
> 
> Signed-off-by: Russell Currey 
> Acked-by: Stewart Smith 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b0eab5b29a55fd9f31fad28df5

cheers

Re: powerpc: Implement save_stack_trace_regs() to enable kprobe stack tracing (was [RFC] ppc: Implement save_stack_trace_regs())

2016-01-12 Thread Michael Ellerman
On Mon, 2016-01-11 at 03:30:31 UTC, Michael Ellerman wrote:
> On Fri, 2016-01-08 at 17:50 -0500, Steven Rostedt wrote:
> > > Are you going to take this, or do you want me to?
>
> Sorry, yep I'll take it.
>
> I trimmed the change log a bit, final version below.
>
> powerpc: Implement save_stack_trace_regs() to enable kprobe stack tracing
>
> It has come to my attention that kprobe event stack tracing does not
> work on powerpc. You can see with the following:
>
>   # cd /sys/kernel/debug/tracing
>   # echo stacktrace > trace_options
>   # echo 'p kfree' > kprobe_events
>   # echo 1 > events/kprobes/enable
>
> Will print the following warning:
>   save_stack_trace_regs() not implemented yet.
>
> Although save_stack_trace() (which normal event stack traces use) is
> implemented, save_stack_trace_regs() which kprobe events use is not.
> This is a cheap attempt to implement that function.
>
> Note, This may have issues if a task tries to get a stack trace from
> another task with its regs, because it just passes in "current" to
> save_context_stack(). But this does solve the issue with stack tracing
> kprobe events.
>
> Reported-by: Chunyu Hu 
> Signed-off-by: Steven Rostedt 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/35de3b1aa16842214e0cd7c603

cheers

Re: powerpc: Add HWCAP bits for Power9

2016-01-12 Thread Michael Ellerman
On Mon, 2016-01-11 at 02:59:04 UTC, Michael Ellerman wrote:
> In order to support Power9 we need two new HWCAP bits. We are merging
> these ahead of the cputable entry so that glibc can start referring to
> them.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/e708c24cd01ce80b1609d8bacc

cheers

Re: platforms/powernv: Fix update of NVLink DMA mask

2016-01-12 Thread Michael Ellerman
On Fri, 2016-01-08 at 00:35:09 UTC, Alistair Popple wrote:
> The emulated NVLink PCI devices share the same IODA2 TCE tables but
> only support a single TVT (instead of the normal two for PCI
> devices). This requires the kernel to manually replace windows with
> either the bypass or non-bypass window depending on what the driver
> has requested.
> 
> Unfortunately an incorrect optimisation was made in
> pnv_pci_ioda_dma_set_mask() which caused updating of some NPU device
> PEs to be skipped in certain configurations due to an incorrect
> assumption that a NULL peer PE in the array indicated there were no
> more peers present. This patch fixes the problem by ensuring all peer
> PEs are updated.
> 
> Signed-off-by: Alistair Popple 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/419dbd5e1ff0e45a6e1d28c1f7

cheers
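
The class of bug described above can be sketched as follows. The patch body
is not quoted in this message, so the structure, the array size and both
function names are assumptions made for illustration: the point is only that
a NULL slot in a peer array must be skipped, not treated as the end of the
list.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PEERS 4             /* hypothetical array size for this sketch */

struct pe {
    int bypass;                 /* which window the PE currently uses */
};

/* Incorrect optimisation: the first NULL slot ends the scan. */
static int update_peers_stop_at_null(struct pe *peers[MAX_PEERS], int bypass)
{
    int updated = 0;

    for (int i = 0; i < MAX_PEERS; i++) {
        if (!peers[i])
            break;              /* later peers are never updated */
        peers[i]->bypass = bypass;
        updated++;
    }
    return updated;
}

/* Fixed: skip empty slots but keep scanning the whole array. */
static int update_all_peers(struct pe *peers[MAX_PEERS], int bypass)
{
    int updated = 0;

    for (int i = 0; i < MAX_PEERS; i++) {
        if (!peers[i])
            continue;
        peers[i]->bypass = bypass;
        updated++;
    }
    return updated;
}
```

With peers laid out as { a, NULL, b, NULL }, the first version updates only
a, while the second updates both a and b.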

Re: [next] powerpc/mm: fix _PAGE_SWP_SOFT_DIRTY breaking swapoff

2016-01-12 Thread Michael Ellerman
On Sun, 2016-01-10 at 00:54:59 UTC, Hugh Dickins wrote:
> Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y
> but CONFIG_MEM_SOFT_DIRTY is not set.  That's because the non-zero
> _PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not
> discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be
> recognized.
> 
> (I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on
> CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete
> attempt to solve this problem.)
> 
> It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and
> CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff
> should be made more robust; but nevertheless, fix up the powerpc ifdefs
> as x86_64 and s390 (which met the same problem) have them, defining the
> bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set.
> 
> Signed-off-by: Hugh Dickins 
> Reviewed-by: Cyrill Gorcunov 
> Acked-by: Laurent Dufour 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2f10f1a7884e97a68e52c4b6f7

cheers

Re: [V2] mm/powerpc: Fix _PAGE_PTE breaking swapoff

2016-01-12 Thread Michael Ellerman
On Mon, 2016-01-11 at 15:49:34 UTC, "Aneesh Kumar K.V" wrote:
> The core kernel expects swp_entry_t to consist of
> only the swap type and swap offset. We should not leak pte bits to
> swp_entry_t. This breaks swapoff, which uses the swap type and offset
> to build a swp_entry_t and later compares that to the swp_entry_t
> obtained from the Linux page table pte. Leaking pte bits to swp_entry_t
> breaks that comparison and results in us looping in try_to_unuse.
> 
> The stack trace can be anywhere below try_to_unuse() in mm/swapfile.c,
> since swapoff is circling around and around that function, reading from
> each used swap block into a page, then trying to find where that page
> belongs, looking at every non-file pte of every mm that ever swapped.
> 
> Reported-by: Hugh Dickins 
> Suggested-by: Hugh Dickins 
> Signed-off-by: Aneesh Kumar K.V 
> Acked-by: Hugh Dickins 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/44734f23de2465c3c0d39e4a16

cheers
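
A small sketch of why a leaked pte bit defeats the comparison. The bit layout
below is invented for illustration and is not the powerpc layout; the
function names are simplified stand-ins for the kernel helpers of similar
name, operating on plain integers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only: bit 0 marks "this is a pte",
 * bits 1..5 hold the swap type, the remaining bits hold the swap offset.
 */
#define PTE_MARKER   0x1ull
#define TYPE_SHIFT   1
#define TYPE_MASK    0x1full
#define OFFSET_SHIFT 6

typedef struct { uint64_t val; } swp_entry_t;

/* Built by swapoff from type and offset alone. */
static swp_entry_t swp_entry(unsigned type, uint64_t offset)
{
    swp_entry_t e = {
        ((type & TYPE_MASK) << TYPE_SHIFT) | (offset << OFFSET_SHIFT)
    };
    return e;
}

/* What is stored in the page table: the entry plus the marker bit. */
static uint64_t swp_entry_to_pte(swp_entry_t e)
{
    return e.val | PTE_MARKER;
}

/* Leaky conversion: the marker bit survives into the swp_entry_t. */
static swp_entry_t pte_to_swp_entry_leaky(uint64_t pte)
{
    swp_entry_t e = { pte };
    return e;
}

/* Clean conversion: only type and offset remain. */
static swp_entry_t pte_to_swp_entry(uint64_t pte)
{
    swp_entry_t e = { pte & ~PTE_MARKER };
    return e;
}
```

With the leaky conversion the entry read back from the page table never
equals the one built from type and offset, so a search for it can never
succeed, which is consistent with the looping described in the commit message.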

Re: linux-next: build failure after merge of the powerpc tree

2016-01-12 Thread Michael Ellerman
On Thu, 2016-01-07 at 08:16:13 UTC, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the powerpc tree, today's linux-next build (powerpc64
> allnoconfig) failed like this:
> 
> arch/powerpc/mm/hash_utils_64.c: In function 'get_paca_psize':
> arch/powerpc/mm/hash_utils_64.c:869:19: error: 'struct paca_struct' has no 
> member named 'context'
>   return get_paca()->context.user_psize;
>^
> arch/powerpc/mm/hash_utils_64.c:870:1: error: control reaches end of non-void 
> function [-Werror=return-type]
>  }
>  ^
> 
> Caused by commit
> 
>   2fc251a8dda5 ("powerpc: Copy only required pieces of the mm_context_t to 
> the paca")
> 
> This build has CONFIG_PPC_MM_SLICES not set ...
> 
> I have applied the following patch for today:
> 
> From: Stephen Rothwell 
> Date: Thu, 7 Jan 2016 19:07:18 +1100
> Subject: [PATCH] powerpc: restore the user_psize member of the mm_context_t in
>  the paca
> 
> It is used when CONFIG_PPC_MM_SLICES is not set.
> 
> Fixes: 2fc251a8dda5 ("powerpc: Copy only required pieces of the mm_context_t 
> to the paca")
> Signed-off-by: Stephen Rothwell 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/c33e54fafacaf83b3e345aae0e

cheers

Re: cxl: Enable PCI device ID for future IBM CXL adapter

2016-01-12 Thread Michael Ellerman
On Mon, 2015-12-07 at 22:03:32 UTC, Uma Krishnan wrote:
> Add support for future IBM Coherent Accelerator (CXL) device
> with ID of 0x0601.
> 
> Signed-off-by: Uma Krishnan 
> Reviewed-by: Matthew R. Ochs 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/68adb7bfd66504e97364651fb7

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [2/2] powerpc/powernv: Reserve PE#0 on NPU

2016-01-12 Thread Michael Ellerman
On Mon, 2016-01-11 at 05:53:50 UTC, Alistair Popple wrote:
> P8+ hardware reports all errors on PE#0. This patch ensures PE#0 is
> not assigned to NPU devices so that it can be used for EEH.
> 
> Signed-off-by: Alistair Popple 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/08f48f3234a79bca86c2283a16

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [v2,2/2] cxl: use -Werror only with CONFIG_PPC_WERROR

2016-01-12 Thread Michael Ellerman
On Fri, 2016-01-08 at 18:30:10 UTC, Brian Norris wrote:
> Some developers really like to have -Werror enabled for their code, as
> it helps to ensure warning free code. Others don't want -Werror, as it
> (for example) can cause problems when newer (or older) compilers have
> different sets of warnings, or new warnings can appear just when turning
> up the warning level (e.g., make W=1 or W=2). Thus, it seems prudent to
> have the use of -Werror be configurable.
> 
> It so happens that cxl is only built on PowerPC, and PowerPC already
> has a nice set of Kconfig options for this, under CONFIG_PPC_WERROR. So
> let's use that, and the world is a happy place again! (Note that
> PPC_WERROR defaults to =y, so the common case compile should still be
> enforcing -Werror.)
> 
> Fixes: d3d73f4b38a8 ("cxl: Compile with -Werror")
> Signed-off-by: Brian Norris 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/57f7c3932516b9f7908d9b0a24

cheers

Re: [v2,1/2] cxl: fix build for GCC 4.6.x

2016-01-12 Thread Michael Ellerman
On Fri, 2016-01-08 at 18:30:09 UTC, Brian Norris wrote:
> GCC 4.6.3 does not support -Wno-unused-const-variable. Instead, use the
> kbuild infrastructure that checks if this option exists.
> 
> Fixes: 2cd55c68c0a4 ("cxl: Fix build failure due to -Wunused-variable 
> behaviour change")
> Suggested-by: Michal Marek 
> Suggested-by: Arnd Bergmann 
> Signed-off-by: Brian Norris 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/aa09545589ceeff884421d8eb3

cheers

Re: [1/2] powerpc/powernv: Change NPU PE# assignment

2016-01-12 Thread Michael Ellerman
On Mon, 2016-01-11 at 05:53:49 UTC, Alistair Popple wrote:
> The P8+ hardware supports four partitionable endpoints (PEs) however
> the hardware reports all errors as occurring on PE#0. This means we
> need to reserve this PE for error handling (EEH) and not assign it to
> a NPU device, implying that some devices will need to share PEs.
> 
> This patch changes the PE assignment for NPU devices such that NPU
> devices which connect to the same GPU are assigned to the same
> PE#.
> 
> Signed-off-by: Alistair Popple 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b521549a09ddfac3bed38e2611

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/2] powerpc/module: Handle R_PPC64_ENTRY relocations

2016-01-12 Thread Michael Ellerman
From: Ulrich Weigand 

GCC 6 will include changes to generated code with -mcmodel=large,
which is used to build kernel modules on powerpc64le.  This was
necessary because the large model is supposed to allow arbitrary
sizes and locations of the code and data sections, but the ELFv2
global entry point prolog still made the unconditional assumption
that the TOC associated with any particular function can be found
within 2 GB of the function entry point:

func:
addis r2,r12,(.TOC.-func)@ha
addi  r2,r2,(.TOC.-func)@l
.localentry func, .-func

To remove this assumption, GCC will now generate instead this global
entry point prolog sequence when using -mcmodel=large:

.quad .TOC.-func
func:
.reloc ., R_PPC64_ENTRY
ld    r2, -8(r12)
add   r2, r2, r12
.localentry func, .-func

The new .reloc triggers an optimization in the linker that will
replace this new prolog with the original code (see above) if the
linker determines that the distance between .TOC. and func is in
range after all.

Since this new relocation is now present in module object files,
the kernel module loader is required to handle them too.  This
patch adds support for the new relocation and implements the
same optimization done by the GNU linker.

Cc: sta...@vger.kernel.org
Signed-off-by: Ulrich Weigand 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/uapi/asm/elf.h |  2 ++
 arch/powerpc/kernel/module_64.c | 27 +++
 2 files changed, 29 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/elf.h 
b/arch/powerpc/include/uapi/asm/elf.h
index 59dad113897b..c2d21d11c2d2 100644
--- a/arch/powerpc/include/uapi/asm/elf.h
+++ b/arch/powerpc/include/uapi/asm/elf.h
@@ -295,6 +295,8 @@ do {
\
 #define R_PPC64_TLSLD  108
 #define R_PPC64_TOCSAVE109
 
+#define R_PPC64_ENTRY  118
+
 #define R_PPC64_REL16  249
 #define R_PPC64_REL16_LO   250
 #define R_PPC64_REL16_HI   251
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 68384514506b..59663af9315f 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -635,6 +635,33 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 */
break;
 
+   case R_PPC64_ENTRY:
+   /*
+* Optimize ELFv2 large code model entry point if
+* the TOC is within 2GB range of current location.
+*/
+   value = my_r2(sechdrs, me) - (unsigned long)location;
+   if (value + 0x80008000 > 0xffffffff)
+   break;
+   /*
+* Check for the large code model prolog sequence:
+*  ld r2, ...(r12)
+*  add r2, r2, r12
+*/
+   if ((((uint32_t *)location)[0] & ~0xfffc)
+   != 0xe84c0000)
+   break;
+   if (((uint32_t *)location)[1] != 0x7c426214)
+   break;
+   /*
+* If found, replace it with:
+*  addis r2, r12, (.TOC.-func)@ha
+*  addi r2, r12, (.TOC.-func)@l
+*/
+   ((uint32_t *)location)[0] = 0x3c4c0000 + PPC_HA(value);
+   ((uint32_t *)location)[1] = 0x38420000 + PPC_LO(value);
+   break;
+
case R_PPC64_REL16_HA:
/* Subtract location pointer */
value -= (unsigned long)location;
-- 
2.5.0
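
The rewrite performed for R_PPC64_ENTRY can be exercised outside the kernel
with a sketch like the one below. It is an illustration under assumptions:
the function name and the HA16/LO16 helpers are made up here, and the full
32-bit opcode constants are written out from the Power ISA encodings of the
instructions named in the patch comments (ld r2,off(r12); add r2,r2,r12;
addis r2,r12,ha; addi r2,r2,lo).

```c
#include <assert.h>
#include <stdint.h>

/* High-adjusted and low 16-bit halves of a 32-bit displacement. */
#define HA16(v) ((((v) + 0x8000) >> 16) & 0xffff)
#define LO16(v) ((v) & 0xffff)

/*
 * insn points at the two-instruction prolog; toc_delta is .TOC. minus the
 * address of insn[0], as a wrapped 64-bit value.
 * Returns 1 if the prolog was rewritten, 0 if it was left alone.
 */
static int optimize_entry_prolog(uint32_t insn[2], uint64_t toc_delta)
{
    assert(insn != 0);

    /* The displacement must be reachable with addis + addi. */
    if (toc_delta + 0x80008000ull > 0xffffffffull)
        return 0;

    /* ld r2, <off>(r12): ignore the 14-bit displacement field. */
    if ((insn[0] & ~0xfffcu) != 0xe84c0000u)
        return 0;
    /* add r2, r2, r12 */
    if (insn[1] != 0x7c426214u)
        return 0;

    insn[0] = 0x3c4c0000u + (uint32_t)HA16(toc_delta);  /* addis r2, r12, ha */
    insn[1] = 0x38420000u + (uint32_t)LO16(toc_delta);  /* addi  r2, r2, lo  */
    return 1;
}
```

The high half is adjusted by 0x8000 because addi sign-extends its 16-bit
immediate; a delta of 0x12348000 therefore becomes addis with 0x1235 followed
by addi with 0x8000 (that is, minus 0x8000).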


[PATCH 1/2] scripts/recordmcount.pl: support data in text section on powerpc

2016-01-12 Thread Michael Ellerman
From: Ulrich Weigand 

If a text section starts out with a data blob before the first
function start label, disassembly parsing done in recordmcount.pl
gets confused on powerpc, leading to creation of corrupted module
objects.

This was not a problem so far since the compiler would never create
such text sections.  However, this has changed with a recent change
in GCC 6 to support distances of > 2GB between a function and its
associated TOC in the ELFv2 ABI, exposing this problem.

There is already code in recordmcount.pl to handle such data blobs
on the sparc64 platform.  This patch uses the same method to handle
those on powerpc as well.

Cc: sta...@vger.kernel.org
Signed-off-by: Ulrich Weigand 
Signed-off-by: Michael Ellerman 
---
 scripts/recordmcount.pl | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Steve can we get an ack for this one, to go via powerpc? cheers

diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
index 826470d7f000..96e2486a6fc4 100755
--- a/scripts/recordmcount.pl
+++ b/scripts/recordmcount.pl
@@ -263,7 +263,8 @@ if ($arch eq "x86_64") {
 
 } elsif ($arch eq "powerpc") {
 $local_regex = "^[0-9a-fA-F]+\\s+t\\s+(\\.?\\S+)";
-$function_regex = "^([0-9a-fA-F]+)\\s+<(\\.?.*?)>:";
+# See comment in the sparc64 section for why we use '\w'.
+$function_regex = "^([0-9a-fA-F]+)\\s+<(\\.?\\w*?)>:";
 $mcount_regex = "^\\s*([0-9a-fA-F]+):.*\\s\\.?_mcount\$";
 
 if ($bits == 64) {
-- 
2.5.0
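
To illustrate why the tighter pattern helps, the check can be approximated in C with POSIX regcomp(). This is a sketch, not what recordmcount.pl does internally: is_function_label() is a made-up name, [[:alnum:]_] stands in for Perl's \w, and the offset-style label is assumed to look like the <name-0x18> form that objdump prints for a blob ahead of the first symbol.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if an objdump line looks like a plain function label,
 * i.e. "<name>:" or "<.name>:" with only word characters in the name.
 * Labels carrying an offset, such as "<name-0x18>:", are rejected.
 */
static bool is_function_label(const char *line)
{
	regex_t re;
	bool match;

	if (regcomp(&re, "^[0-9a-fA-F]+[[:space:]]+<\\.?[[:alnum:]_]*>:",
		    REG_EXTENDED | REG_NOSUB) != 0)
		return false;

	match = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return match;
}
```

The old pattern used ".*?" between the angle brackets, so a label with an offset in it would have been accepted as a function start.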


Re: [V3] powerpc/powernv: Add a kmsg_dumper that flushes console output on panic

2016-01-12 Thread Michael Ellerman
On Tue, 2016-01-12 at 15:17 +1100, Russell Currey wrote:
> On Tue, 2016-01-12 at 14:44 +1100, Stewart Smith wrote:
> > Michael Ellerman  writes:
> > > On Fri, 2015-27-11 at 06:23:07 UTC, Russell Currey wrote:
> > > > On BMC machines, console output is controlled by the OPAL firmware and is
> > > > only flushed when its pollers are called.  When the kernel is in a panic
> > > > state, it no longer calls these pollers and thus console output does not
> > > > completely flush, causing some output from the panic to be lost.
> > > > 
> > > > Output is only actually lost when the kernel is configured to not power off
> > > > or reboot after panic (i.e. CONFIG_PANIC_TIMEOUT is set to 0) since OPAL
> > > > flushes the console buffer as part of its power down routines.  Before this
> > > > patch, however, only partial output would be printed during the timeout
> > > > wait.
> > > > 
> > > > This patch adds a new kmsg_dumper which gets called at panic time to ensure
> > > > panic output is not lost.  It accomplishes this by calling OPAL_CONSOLE_FLUSH
> > > > in the OPAL API, and if that is not available, the pollers are called enough
> > > > times to (hopefully) completely flush the buffer.
> > > > 
> > > > The flushing mechanism will only affect output printed at and before the
> > > > kmsg_dump call in kernel/panic.c:panic().  As such, the "end Kernel panic"
> > > > message may still be truncated as follows:
> > > > 

> > > > > Call Trace:
> > > > > [c00f1f603b00] [c08e9458] dump_stack+0x90/0xbc (unreliable)
> > > > > [c00f1f603b30] [c08e7e78] panic+0xf8/0x2c4
> > > > > [c00f1f603bc0] [c0be4860] mount_block_root+0x288/0x33c
> > > > > [c00f1f603c80] [c0be4d14] prepare_namespace+0x1f4/0x254
> > > > > [c00f1f603d00] [c0be43e8] kernel_init_freeable+0x318/0x350
> > > > > [c00f1f603dc0] [c000bd74] kernel_init+0x24/0x130
> > > > > [c00f1f603e30] [c00095b0] ret_from_kernel_thread+0x5c/0xac
> > > > > ---[ end Kernel panic - not
> > > > 
> > > > This functionality is implemented as a kmsg_dumper as it seems to be the
> > > > most sensible way to introduce platform-specific functionality to the
> > > > panic function.
> > > > 
> > > > Signed-off-by: Russell Currey 
> > > > Reviewed-by: Andrew Donnellan 
> > > 
> > > Applied to powerpc next, thanks.
> > > 
> > > https://git.kernel.org/powerpc/c/affddff69c55eb68969448f35f
> > 
> > The firmware interface changed slightly since this kernel patch[1], it
> > added a parameter to OPAL_CONSOLE_FLUSH which accepted the terminal
> > number to flush, theoretically allowing this to be plumbed into TTY
> > layer or something too.
> > 
> > So, we'll either have to update this patch or replace it with an updated
> > one.
> > 
> > [1] i'm pushing the accepted skiboot patch now.
> > 
> I'm working on an updated kernel patch to use the new parameter and additional
> return values, so I suppose it's up to mpe whether or not this patch gets
> merged now and another gets sent later to amend it, or if this patch gets
> reverted in next and I can send a V4 adding the new stuff.

Doh. I'd rather not revert it, unless we have to.

Basically we're passing junk in r3, which skiboot is expecting to be the
terminal number.

So running the current kernel code on the updated skiboot shouldn't crash and
burn, it just won't actually work the way it's supposed to.

So my preference would be just an incremental patch ASAP to fix the kernel to
do the right thing with the new interface.

cheers


Re: [PATCH v3 2/2] powerpc: tracing: don't trace hcalls on offline CPUs

2016-01-12 Thread Michael Ellerman
On Mon, 2015-12-14 at 23:18 +0300, Denis Kirjanov wrote:

> ./drmgr -c cpu -a -r gives the following warning:
> 
> [ 2327.035563]
> RCU used illegally from offline CPU!
> rcu_scheduler_active = 1, debug_locks = 1
> [ 2327.035564] no locks held by swapper/12/0.
> [ 2327.035565]
> stack backtrace:
> [ 2327.035567] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G S 4.3.0-rc3-00060-g353169a #5
> [ 2327.035568] Call Trace:
> [ 2327.035573] [c001d62578e0] [c08977fc] .dump_stack+0x98/0xd4 (unreliable)
> [ 2327.035577] [c001d6257960] [c0109bd8] .lockdep_rcu_suspicious+0x108/0x170
> [ 2327.035580] [c001d62579f0] [c006a1d0] .__trace_hcall_exit+0x2b0/0x2c0
> [ 2327.035584] [c001d6257ab0] [c006a2e8] plpar_hcall_norets_trace+0x70/0x8c
> [ 2327.035588] [c001d6257b20] [c0067a14] .icp_hv_set_cpu_priority+0x54/0xc0
> [ 2327.035592] [c001d6257ba0] [c0066c5c] .xics_teardown_cpu+0x5c/0xa0
> [ 2327.035595] [c001d6257c20] [c00747ac] .pseries_mach_cpu_die+0x6c/0x320
> [ 2327.035598] [c001d6257cd0] [c00439cc] .cpu_die+0x3c/0x60
> [ 2327.035602] [c001d6257d40] [c00183d8] .arch_cpu_idle_dead+0x28/0x40
> [ 2327.035606] [c001d6257db0] [c00ff1dc] .cpu_startup_entry+0x4fc/0x560
> [ 2327.035610] [c001d6257ed0] [c0043728] .start_secondary+0x328/0x360
> [ 2327.035614] [c001d6257f90] [c0008a6c] start_secondary_prolog+0x10/0x14
> [ 2327.035620] cpu 12 (hwid 12) Ready to die...
> [ 2327.144463] cpu 13 (hwid 13) Ready to die...
> [ 2327.294180] cpu 14 (hwid 14) Ready to die...
> [ 2327.403599] cpu 15 (hwid 15) Ready to die...
> 
> Make the hypervisor tracepoints conditional
> by using TRACE_EVENT_FN_COND
> 
> Signed-off-by: Denis Kirjanov 

Acked-by: Michael Ellerman 

cheers


RE: [PATCH v2] perf/probe: Search both .eh_frame and .debug_frame sections for probe location

2016-01-12 Thread 平松雅巳 / HIRAMATU,MASAMI
Hi Hemant,

>From: Hemant Kumar [mailto:hem...@linux.vnet.ibm.com]
>
>perf probe through debuginfo__find_probes() in util/probe-finder.c
>checks for the functions' frame descriptions in either .eh_frame section
>of an ELF or the .debug_frame. The check is based on whether either one
>of these sections is present. Depending on distro, toolchain defaults,
>architecture, build flags, etc., CFI might be found in either .eh_frame
>and/or .debug_frame. Sometimes, it may happen that, .eh_frame, even if
>present, may not be complete and may miss some descriptions. Therefore,
>to be sure, to find the CFI covering an address we will always have to
>investigate both if available.

OK, so we'd better check both cfi's.
 [...]
>+/* Find probe points from debuginfo */
>+static int debuginfo__find_probes(struct debuginfo *dbg,
>+struct probe_finder *pf)
>+{
>+  int ret = 0;
>+
>+#if _ELFUTILS_PREREQ(0, 142)
>+  Elf *elf;
>+  GElf_Ehdr ehdr;
>+  GElf_Shdr shdr;
>+
>+  if (pf->cfi_eh || pf->cfi_dbg)
>+  return debuginfo__find_probe_location(dbg, pf);
>+
>+  /* Get the call frame information from this dwarf */
>+  elf = dwarf_getelf(dbg->dbg);
>+  if (elf == NULL)
>+  return -EINVAL;
>+
>+  if (gelf_getehdr(elf, &ehdr) == NULL)
>+  return -EINVAL;
>+
>+  if (elf_section_by_name(elf, &ehdr, &shdr, ".eh_frame", NULL) &&
>+  shdr.sh_type == SHT_PROGBITS) {
>+  pf->cfi_eh = dwarf_getcfi_elf(elf);
>+  } else {
>+  pf->cfi_dbg = dwarf_getcfi(dbg->dbg);
>+  }

Hmm, if you want to check both of those cfi's, don't we have to do below?

	if (elf_section_by_name(elf, &ehdr, &shdr, ".eh_frame", NULL) &&
	    shdr.sh_type == SHT_PROGBITS)
		pf->cfi_eh = dwarf_getcfi_elf(elf);

	pf->cfi_dbg = dwarf_getcfi(dbg->dbg);

Then, both of pf->cfi_* will be filled (if the elf has ".eh_frame").

Thanks!




Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Will Deacon
On Tue, Jan 12, 2016 at 11:40:12AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 12, 2016 at 11:25:55AM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote:
> > > 2) the changelog _completely_ fails to explain the sync 0x11 and sync
> > > 0x12 semantics nor does it provide a publicly accessible link to
> > > documentation that does.
> > 
> > Ralf pointed me at: https://imgtec.com/mips/architectures/mips64/
> > 
> > > 3) it really should have explained what you did with
> > > smp_llsc_mb/smp_mb__before_llsc() in _detail_.
> > 
> > And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> > are _NOT_ transitive and therefore cannot be used to implement the
> > smp_mb__{before,after} stuff.
> > 
> > That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> > Completion Barriers. They need not be globally performed.
> 
> Which if true; and I know Will has some questions here; would also mean
> that you 'cannot' use the ACQUIRE/RELEASE barriers for your locks as was
> recently suggested by David Daney.

The issue I have with the SYNC description in the text above is that it
describes the single CPU (program order) and the dual-CPU (confusingly
named global order) cases, but then doesn't generalise any further. That
means we can't sensibly reason about transitivity properties when a third
agent is involved. For example, the WRC+sync+addr test:


P0:
Wx = 1

P1:
Rx == 1
SYNC
Wy = 1

P2:
Ry == 1
<address dep>
Rx == 0


I can't find anything to forbid that, given the text. The main problem
is having the SYNC on P1 affect the write by P0.
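
A rough C11 rendering of the same shape, purely to pin down which outcome is being asked about. The mapping is an assumption made for this sketch: a seq_cst fence stands in for SYNC and an acquire load stands in for the address dependency, and neither is claimed to match the MIPS semantics under discussion. The names p0(), p1(), p2() and wrc_violation() are invented here; each pN() is meant to run on its own thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int x, y;		/* shared locations, initially 0 */
static int r1, r2, r3;		/* values observed by P1 and P2 */

/* P0: Wx = 1 */
static void p0(void)
{
	atomic_store_explicit(&x, 1, memory_order_relaxed);
}

/* P1: Rx; SYNC; Wy = 1 */
static void p1(void)
{
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stand-in for SYNC */
	atomic_store_explicit(&y, 1, memory_order_relaxed);
}

/* P2: Ry; dependent Rx */
static void p2(void)
{
	r2 = atomic_load_explicit(&y, memory_order_acquire);	/* stand-in for the dependency */
	r3 = atomic_load_explicit(&x, memory_order_relaxed);
}

/* The outcome in question: P1 saw x == 1, P2 saw y == 1 but still x == 0. */
static bool wrc_violation(void)
{
	return r1 == 1 && r2 == 1 && r3 == 0;
}
```

The question for the architecture text is whether the barrier on P1 is cumulative, that is, whether it also orders P0's write (which P1 merely observed) before P1's own write as seen by P2.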

> That is, currently all architectures -- with exception of PPC -- have
> RCsc locks, but using these non-transitive things will get you RCpc
> locks.
> 
> So yes, MIPS can go RCpc for its locks and share the burden of pain with
> PPC, but that needs to be a very conscious decision.

I think it's much worse than RCpc, given my interpretation of the wording.

Will

Re: [PATCH RESEND v4 4/4] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-01-12 Thread Gautham R Shenoy
Hi Shilpa,

On Tue, Jan 12, 2016 at 04:24:27AM -0600, Shilpasri G Bhat wrote:

> +static inline int get_chip_index(struct kobject *kobj)

Probably have "get_chip_index(int id)". See the reason below.
> +{
> + int i, id;
> +
> + i = kstrtoint(kobj->name + 4, 0, &id);
> + if (i)
> + return i;
> +
> + for (i = 0; i < nr_chips; i++)
> + if (chips[i].id == id)
> + return i;

This pattern to obtain a chip index from the chip id is repeated in
multiple places inside this file. Might be worthwhile to move this to a
helper function, i.e. get_chip_index(id)!

> + return -EINVAL;
> +}
> +
> +static ssize_t throttle_freq_show(struct kobject *kobj,
> +   struct kobj_attribute *attr, char *buf)
> +{
> + int i, count = 0, id;
> + 
We obtain the id from kobj here and then obtain the index from
id via the function below.


> + id = get_chip_index(kobj);


> + if (id < 0)
> + return id;
> +
> + for (i = 0; i < powernv_pstate_info.nr_pstates; i++)
> + count += sprintf(&buf[count], "%d %d\n",
> +powernv_freqs[i].frequency,
> +chips[id].pstate_stat[i]);
> +
> + return count;
> +}
> +
> +static struct kobj_attribute attr_throttle_frequencies =
> +__ATTR(throttle_frequencies, 0444, throttle_freq_show, NULL);
> +
--
Thanks and Regards
gautham.
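
A minimal sketch of the split suggested above, written as plain userspace C so it can be compiled on its own. The chips[] contents, chip_id_from_name() and the use of strtol() in place of kstrtoint() are assumptions made for illustration; only the get_chip_index(id) shape comes from the review.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

struct chip {
	unsigned int id;
};

/* Example table; in the driver this would be filled by init_chip_info(). */
static struct chip chips[] = { { 0 }, { 1 }, { 16 }, { 17 } };
static const int nr_chips = sizeof(chips) / sizeof(chips[0]);

/* Map a chip id to its index in chips[], or -EINVAL if the id is unknown. */
static int get_chip_index(unsigned int id)
{
	for (int i = 0; i < nr_chips; i++)
		if (chips[i].id == id)
			return i;
	return -EINVAL;
}

/* Parse N out of a "chipN" style name; -EINVAL on malformed input. */
static int chip_id_from_name(const char *name)
{
	char *end;
	long id;

	if (strncmp(name, "chip", 4) != 0)
		return -EINVAL;

	errno = 0;
	id = strtol(name + 4, &end, 10);
	if (errno || end == name + 4 || *end != '\0' || id < 0 || id > INT_MAX)
		return -EINVAL;
	return (int)id;
}
```

A sysfs show routine would then parse the id once and call get_chip_index(id), and the other places that currently open-code the loop could call the same helper.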


Re: [PATCH RESEND v4 1/4] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path

2016-01-12 Thread Shreyas B Prabhu


On 01/12/2016 03:54 PM, Shilpasri G Bhat wrote:
> cpu_to_chip_id() does a DT walk through to find out the chip id by taking a
> contended device tree lock. This adds an unnecessary overhead in a hot-path.
> So instead of cpu_to_chip_id() use PIR of the cpu to find the chip id.
> 
> Reported-by: Anton Blanchard 
> Signed-off-by: Shilpasri G Bhat 
> ---
>  drivers/cpufreq/powernv-cpufreq.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> index cb50138..597a084 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -39,6 +39,7 @@
>  #define PMSR_PSAFE_ENABLE    (1UL << 30)
>  #define PMSR_SPR_EM_DISABLE  (1UL << 31)
>  #define PMSR_MAX(x)  ((x >> 32) & 0xFF)
> +#define pir_to_chip_id(pir)  (((pir) >> 7) & 0x3f)

Since this is platform specific and true only for power8, this is not
the right place to put it. Either you can move this to arch/powerpc  or
you can maintain a cpu to chip map within the driver.
> 
>  static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
>  static bool rebooting, throttled, occ_reset;
> @@ -312,13 +313,14 @@ static inline unsigned int get_nominal_index(void)
>  static void powernv_cpufreq_throttle_check(void *data)
>  {
>   unsigned int cpu = smp_processor_id();
> + unsigned int chip_id = pir_to_chip_id(hard_smp_processor_id());
>   unsigned long pmsr;
>   int pmsr_pmax, i;
> 
>   pmsr = get_pmspr(SPRN_PMSR);
> 
>   for (i = 0; i < nr_chips; i++)
> - if (chips[i].id == cpu_to_chip_id(cpu))
> + if (chips[i].id == chip_id)
>   break;
> 
>   /* Check for Pmax Capping */
> @@ -558,7 +560,8 @@ static int init_chip_info(void)
>   unsigned int prev_chip_id = UINT_MAX;
> 
>   for_each_possible_cpu(cpu) {
> - unsigned int id = cpu_to_chip_id(cpu);
> + unsigned int id =
> + pir_to_chip_id(get_hard_smp_processor_id(cpu));
> 
>   if (prev_chip_id != id) {
>   prev_chip_id = id;
> 

Thanks,
Shreyas
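
One way to read the second option (a cpu to chip map kept by the driver) is sketched below in userspace C. Everything here is illustrative: slow_chip_lookup() is a hypothetical stand-in for the cpu_to_chip_id() device-tree walk, NR_CPUS_SKETCH and the 4-CPUs-per-chip layout are invented, and the macro is copied from the patch only to show what the shift and mask extract.

```c
#define NR_CPUS_SKETCH 8

/* From the patch: chip id taken from bits 7..12 of the PIR (POWER8 layout). */
#define pir_to_chip_id(pir) (((pir) >> 7) & 0x3f)

static unsigned int cpu_chip_id[NR_CPUS_SKETCH];
static unsigned int slow_calls;	/* counts trips through the slow path */

/* Hypothetical stand-in for the slow cpu_to_chip_id() lookup. */
static unsigned int slow_chip_lookup(unsigned int cpu)
{
	slow_calls++;
	return (cpu / 4) * 16;	/* pretend 4 CPUs per chip, ids 0, 16, ... */
}

/* Fill the map once at init time, using whatever lookup the platform has. */
static void init_cpu_chip_map(unsigned int (*lookup)(unsigned int),
			      unsigned int ncpus)
{
	for (unsigned int cpu = 0; cpu < ncpus && cpu < NR_CPUS_SKETCH; cpu++)
		cpu_chip_id[cpu] = lookup(cpu);
}

/* Hot-path lookup: a plain array read, no lock and no platform knowledge. */
static unsigned int chip_id_of(unsigned int cpu)
{
	return cpu_chip_id[cpu];
}
```

The map keeps the PIR layout out of the driver entirely, at the cost of one slow lookup per possible CPU during init.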


Re: [PATCH RESEND v4 3/4] cpufreq: powernv: Add a trace print for the throttle event

2016-01-12 Thread Gautham R Shenoy
Hi Shilpa,

Just saw this resend!

On Tue, Jan 12, 2016 at 04:24:26AM -0600, Shilpasri G Bhat wrote:
> Record the throttle event with a trace print replacing the printk,
> except for events like throttling below nominal and occ reset
> event which print a warning message.
> 
> Signed-off-by: Shilpasri G Bhat 
> ---

[..snip..]

> 
> -static void powernv_cpufreq_throttle_check(void *data)
> +static void powernv_cpufreq_check_pmax(void)
   ^^^
This function only contains code moved from
powernv_cpufreq_throttle_check with pr_crit/pr_warns replaced by
trace_powernv_throttle. Furthermore, it is not called from any other
place. Given that the original function was ~60 lines do we really
need to split it into two separate functions ? If yes, could it be an
inline function ?

>  {
>   unsigned int cpu = smp_processor_id();
>   unsigned int chip_id = pir_to_chip_id(hard_smp_processor_id());
> - unsigned long pmsr;
>   int pmsr_pmax, i;
> 
> - pmsr = get_pmspr(SPRN_PMSR);
> + pmsr_pmax = (s8)PMSR_MAX(get_pmspr(SPRN_PMSR));
> 
>   for (i = 0; i < nr_chips; i++)
>   if (chips[i].id == chip_id)
>   break;
> 
> - /* Check for Pmax Capping */
> - pmsr_pmax = (s8)PMSR_MAX(pmsr);
>   if (pmsr_pmax != powernv_pstate_info.max) {
>   if (chips[i].throttled)
> - goto next;
> + return;
> +
>   chips[i].throttled = true;
>   if (pmsr_pmax < powernv_pstate_info.nominal)
> - pr_crit("CPU %d on Chip %u has Pmax reduced below nominal frequency (%d < %d)\n",
> - cpu, chips[i].id, pmsr_pmax,
> - powernv_pstate_info.nominal);
> - else
> - pr_info("CPU %d on Chip %u has Pmax reduced below turbo frequency (%d < %d)\n",
> - cpu, chips[i].id, pmsr_pmax,
> - powernv_pstate_info.max);
> + pr_warn_once("CPU %d on Chip %u has Pmax reduced below nominal frequency (%d < %d)\n",
> +  cpu, chips[i].id, pmsr_pmax,
> +  powernv_pstate_info.nominal);
> +
> + trace_powernv_throttle(chips[i].id,
> +throttle_reason[chips[i].throt_reason],
> +pmsr_pmax);
>   } else if (chips[i].throttled) {
>   chips[i].throttled = false;
> - pr_info("CPU %d on Chip %u has Pmax restored to %d\n", cpu,
> - chips[i].id, pmsr_pmax);
> + trace_powernv_throttle(chips[i].id,
> +throttle_reason[chips[i].throt_reason],
> +pmsr_pmax);
>  }
> +}
> +
> +static void powernv_cpufreq_throttle_check(void *data)
> +{
> + unsigned long pmsr;
> +
> + pmsr = get_pmspr(SPRN_PMSR);
> +
> + /* Check for Pmax Capping */
> + powernv_cpufreq_check_pmax();
  
If you want to retain this function, you could pass pmsr as an
argument instead of computing it afresh in
powernv_cpufreq_check_pmax()

>   /* Check if Psafe_mode_active is set in PMSR. */
> -next:
>   if (pmsr & PMSR_PSAFE_ENABLE) {
>   throttled = true;
>   pr_info("Pstate set to safe frequency\n");

--
Thanks and Regards
gautham.
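
The "pass pmsr as an argument" suggestion could look roughly like this. It is a userspace sketch with invented names (pmsr_to_pmax(), pmax_is_capped()); only PMSR_MAX and the signed 8-bit cast are taken from the patch, and the register value is supplied by the caller instead of being read with get_pmspr().

```c
#include <stdint.h>

#define PMSR_MAX(x) (((x) >> 32) & 0xFF)

/*
 * Pmax sits in bits 32..39 of the PMSR value (counting from the least
 * significant bit). The patch casts it to s8, so it is treated as signed here.
 */
static int pmsr_to_pmax(uint64_t pmsr)
{
	return (int8_t)PMSR_MAX(pmsr);
}

/*
 * Takes the PMSR value the caller already read, so the register is read
 * once and shared by the Pmax check and the later Psafe checks.
 */
static int pmax_is_capped(uint64_t pmsr, int pstate_max)
{
	return pmsr_to_pmax(pmsr) != pstate_max;
}
```

Reading the register once also means the Pmax check and the Psafe check look at the same snapshot.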


Re: [PATCH v3 2/2] powerpc: tracing: don't trace hcalls on offline CPUs

2016-01-12 Thread Denis Kirjanov
On 12/23/15, Steven Rostedt  wrote:
> On Mon, 14 Dec 2015 23:18:06 +0300
> Denis Kirjanov  wrote:
>
>> ./drmgr -c cpu -a -r gives the following warning:
>>
>> [ 2327.035563]
>> RCU used illegally from offline CPU!
>> rcu_scheduler_active = 1, debug_locks = 1
>> [ 2327.035564] no locks held by swapper/12/0.
>> [ 2327.035565]
>> stack backtrace:
>> [ 2327.035567] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G S 4.3.0-rc3-00060-g353169a #5
>> [ 2327.035568] Call Trace:
>> [ 2327.035573] [c001d62578e0] [c08977fc] .dump_stack+0x98/0xd4 (unreliable)
>> [ 2327.035577] [c001d6257960] [c0109bd8] .lockdep_rcu_suspicious+0x108/0x170
>> [ 2327.035580] [c001d62579f0] [c006a1d0] .__trace_hcall_exit+0x2b0/0x2c0
>> [ 2327.035584] [c001d6257ab0] [c006a2e8] plpar_hcall_norets_trace+0x70/0x8c
>> [ 2327.035588] [c001d6257b20] [c0067a14] .icp_hv_set_cpu_priority+0x54/0xc0
>> [ 2327.035592] [c001d6257ba0] [c0066c5c] .xics_teardown_cpu+0x5c/0xa0
>> [ 2327.035595] [c001d6257c20] [c00747ac] .pseries_mach_cpu_die+0x6c/0x320
>> [ 2327.035598] [c001d6257cd0] [c00439cc] .cpu_die+0x3c/0x60
>> [ 2327.035602] [c001d6257d40] [c00183d8] .arch_cpu_idle_dead+0x28/0x40
>> [ 2327.035606] [c001d6257db0] [c00ff1dc] .cpu_startup_entry+0x4fc/0x560
>> [ 2327.035610] [c001d6257ed0] [c0043728] .start_secondary+0x328/0x360
>> [ 2327.035614] [c001d6257f90] [c0008a6c] start_secondary_prolog+0x10/0x14
>> [ 2327.035620] cpu 12 (hwid 12) Ready to die...
>> [ 2327.144463] cpu 13 (hwid 13) Ready to die...
>> [ 2327.294180] cpu 14 (hwid 14) Ready to die...
>> [ 2327.403599] cpu 15 (hwid 15) Ready to die...
>>
>> Make the hypervisor tracepoints conditional
>> by using TRACE_EVENT_FN_COND
>>
>> Signed-off-by: Denis Kirjanov 
>
> I applied the first patch, but I need Acks from the powerpc maintainers
> to take this one.
>

Hi Michael,

Could you please put your ack to the second patch.

Thanks!

> -- Steve
>
>
>>
>> v2 changes:
>>  - Use raw_smp_processor_id as suggested by BenH
>>  since hcalls can be called from preemptable sections
>>
>> v3 changes:
>>  - Fix the subject line
>> ---
>>  arch/powerpc/include/asm/trace.h | 8 ++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/trace.h b/arch/powerpc/include/asm/trace.h
>> index 8e86b48..32e36b1 100644
>> --- a/arch/powerpc/include/asm/trace.h
>> +++ b/arch/powerpc/include/asm/trace.h
>> @@ -57,12 +57,14 @@ DEFINE_EVENT(ppc64_interrupt_class, timer_interrupt_exit,
>>  extern void hcall_tracepoint_regfunc(void);
>>  extern void hcall_tracepoint_unregfunc(void);
>>
>> -TRACE_EVENT_FN(hcall_entry,
>> +TRACE_EVENT_FN_COND(hcall_entry,
>>
>>  TP_PROTO(unsigned long opcode, unsigned long *args),
>>
>>  TP_ARGS(opcode, args),
>>
>> +TP_CONDITION(cpu_online(raw_smp_processor_id())),
>> +
>>  TP_STRUCT__entry(
>>  __field(unsigned long, opcode)
>>  ),
>> @@ -76,13 +78,15 @@ TRACE_EVENT_FN(hcall_entry,
>>  hcall_tracepoint_regfunc, hcall_tracepoint_unregfunc
>>  );
>>
>> -TRACE_EVENT_FN(hcall_exit,
>> +TRACE_EVENT_FN_COND(hcall_exit,
>>
>>  TP_PROTO(unsigned long opcode, unsigned long retval,
>>  unsigned long *retbuf),
>>
>>  TP_ARGS(opcode, retval, retbuf),
>>
>> +TP_CONDITION(cpu_online(raw_smp_processor_id())),
>> +
>>  TP_STRUCT__entry(
>>  __field(unsigned long, opcode)
>>  __field(unsigned long, retval)
>
>

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Peter Zijlstra
On Tue, Jan 12, 2016 at 11:25:55AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote:
> > 2) the changelog _completely_ fails to explain the sync 0x11 and sync
> > 0x12 semantics nor does it provide a publicly accessible link to
> > documentation that does.
> 
> Ralf pointed me at: https://imgtec.com/mips/architectures/mips64/
> 
> > 3) it really should have explained what you did with
> > smp_llsc_mb/smp_mb__before_llsc() in _detail_.
> 
> And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> are _NOT_ transitive and therefore cannot be used to implement the
> smp_mb__{before,after} stuff.
> 
> That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> Completion Barriers. They need not be globally performed.

Which if true; and I know Will has some questions here; would also mean
that you 'cannot' use the ACQUIRE/RELEASE barriers for your locks as was
recently suggested by David Daney.

That is, currently all architectures -- with exception of PPC -- have
RCsc locks, but using these non-transitive things will get you RCpc
locks.

So yes, MIPS can go RCpc for its locks and share the burden of pain with
PPC, but that needs to be a very conscious decision.

[PATCH RESEND v4 4/4] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-01-12 Thread Shilpasri G Bhat
Create sysfs attributes to export throttle information in
/sys/devices/system/cpu/cpufreq/chipN. The newly added sysfs files are as
follows:

1)/sys/devices/system/cpu/cpufreq/chip0/throttle_frequencies
  This gives the throttle stats for each of the available frequencies.
  The throttle stat of a frequency is the total number of times the max
  frequency is reduced to that frequency.
  # cat /sys/devices/system/cpu/cpufreq/chip0/throttle_frequencies
  4023000 0
  3990000 0
  3956000 1
  3923000 0
  3890000 0
  3857000 2
  3823000 0
  3790000 0
  3757000 2
  3724000 1
  3690000 1
  ...

2)/sys/devices/system/cpu/cpufreq/chip0/throttle_reasons
  This directory contains throttle reason files. Each file gives the
  total number of times the max frequency is throttled, except for
  'throttle_reset', which gives the total number of times the max
  frequency is unthrottled after being throttled.
  # cd /sys/devices/system/cpu/cpufreq/chip0/throttle_reasons
  # cat cpu_over_temperature
  7
  # cat occ_reset
  0
  # cat over_current
  0
  # cat power_cap
  0
  # cat power_supply_failure
  0
  # cat throttle_reset
  7

3)/sys/devices/system/cpu/cpufreq/chip0/throttle_stat
  This gives the total number of events of max frequency throttling to
  lower frequencies in the turbo range of frequencies and the sub-turbo(at
  and below nominal) range of frequencies.
  # cat /sys/devices/system/cpu/cpufreq/chip0/throttle_stat
  turbo 7
  sub-turbo 0
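
As an illustration of how userspace might consume the throttle_frequencies format described in 1), here is a small parser sketch. total_throttle_events() is a made-up name, and the "frequency count" per-line layout is assumed from the sample output above rather than from any documented ABI.

```c
#include <stdio.h>

/*
 * Sum the per-frequency throttle counts in text formatted as one
 * "<frequency> <count>" pair per line. Returns -1 on a malformed line.
 */
static long total_throttle_events(const char *text)
{
	long total = 0;
	unsigned long freq;
	int count, consumed;

	while (*text) {
		if (sscanf(text, "%lu %d%n", &freq, &count, &consumed) != 2)
			return -1;
		total += count;
		text += consumed;
		/* Skip the line terminator before the next pair. */
		while (*text == '\n' || *text == ' ')
			text++;
	}
	return total;
}
```

A caller would read the whole sysfs file into a buffer first and pass that buffer in.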

Signed-off-by: Shilpasri G Bhat 
---
Changes from v3:
- Separate the patch to contain only the throttle sysfs attribute changes.
- Add helper inline function get_chip_index()

Changes from v2:
- Fixed kbuild test warning.
drivers/cpufreq/powernv-cpufreq.c:609:2: warning: ignoring return
value of 'kstrtoint', declared with attribute warn_unused_result
[-Wunused-result]

Changes from v1:
- Added a kobject to struct chip
- Grouped the throttle reasons under a separate attribute_group and
  exported each reason as individual file.
- Moved the sysfs files from /sys/devices/system/node/nodeN to
  /sys/devices/system/cpu/cpufreq/chipN
- As suggested by Paul Clarke replaced 'Nominal' with 'sub-turbo'.
- Modified the commit message.

 drivers/cpufreq/powernv-cpufreq.c | 177 +-
 1 file changed, 173 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index c98a6e7..40ccd9d 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -54,6 +54,16 @@ static const char * const throttle_reason[] = {
"OCC Reset"
 };
 
+enum throt_reason_type {
+   NO_THROTTLE = 0,
+   POWERCAP,
+   CPU_OVERTEMP,
+   POWER_SUPPLY_FAILURE,
+   OVERCURRENT,
+   OCC_RESET_THROTTLE,
+   OCC_MAX_REASON
+};
+
 static struct chip {
unsigned int id;
bool throttled;
@@ -61,6 +71,11 @@ static struct chip {
u8 throt_reason;
cpumask_t mask;
struct work_struct throttle;
+   int throt_turbo;
+   int throt_nominal;
+   int reason[OCC_MAX_REASON];
+   int *pstate_stat;
+   struct kobject *kobj;
 } *chips;
 
 static int nr_chips;
@@ -195,6 +210,113 @@ static struct freq_attr *powernv_cpu_freq_attr[] = {
NULL,
 };
 
+static inline int get_chip_index(struct kobject *kobj)
+{
+   int i, id;
+
+   i = kstrtoint(kobj->name + 4, 0, &id);
+   if (i)
+   return i;
+
+   for (i = 0; i < nr_chips; i++)
+   if (chips[i].id == id)
+   return i;
+   return -EINVAL;
+}
+
+static ssize_t throttle_freq_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+   int i, count = 0, id;
+
+   id = get_chip_index(kobj);
+   if (id < 0)
+   return id;
+
+   for (i = 0; i < powernv_pstate_info.nr_pstates; i++)
+   count += sprintf(&buf[count], "%d %d\n",
+  powernv_freqs[i].frequency,
+  chips[id].pstate_stat[i]);
+
+   return count;
+}
+
+static struct kobj_attribute attr_throttle_frequencies =
+__ATTR(throttle_frequencies, 0444, throttle_freq_show, NULL);
+
+static ssize_t throttle_stat_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+   int id, count = 0;
+
+   id = get_chip_index(kobj);
+   if (id < 0)
+   return id;
+
+   count += sprintf(&buf[count], "turbo %d\n", chips[id].throt_turbo);
+   count += sprintf(&buf[count], "sub-turbo %d\n",
+   chips[id].throt_nominal);
+
+   return count;
+}
+
+static struct kobj_attribute attr_throttle_stat =
+__ATTR(throttle_stat, 0444, throttle_stat_show, NULL);
+
+#define define_throttle_reason_attr(attr_name, val)  \
+static ssize_t attr_name##_show(struct kobject *kobj,\
+ st

[PATCH RESEND v4 2/4] cpufreq: powernv/tracing: Add powernv_throttle tracepoint

2016-01-12 Thread Shilpasri G Bhat
This patch adds the powernv_throttle tracepoint to trace the CPU
frequency throttling event, which is used by the powernv-cpufreq
driver in POWER8.

Signed-off-by: Shilpasri G Bhat 
CC: Ingo Molnar 
CC: Steven Rostedt 
---
No changes from v2 and v3.

 include/trace/events/power.h | 22 ++
 kernel/trace/power-traces.c  |  1 +
 2 files changed, 23 insertions(+)

diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 284244e..19e5030 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -38,6 +38,28 @@ DEFINE_EVENT(cpu, cpu_idle,
TP_ARGS(state, cpu_id)
 );
 
+TRACE_EVENT(powernv_throttle,
+
+   TP_PROTO(int chip_id, const char *reason, int pmax),
+
+   TP_ARGS(chip_id, reason, pmax),
+
+   TP_STRUCT__entry(
+   __field(int, chip_id)
+   __string(reason, reason)
+   __field(int, pmax)
+   ),
+
+   TP_fast_assign(
+   __entry->chip_id = chip_id;
+   __assign_str(reason, reason);
+   __entry->pmax = pmax;
+   ),
+
+   TP_printk("Chip %d Pmax %d %s", __entry->chip_id,
+ __entry->pmax, __get_str(reason))
+);
+
 TRACE_EVENT(pstate_sample,
 
TP_PROTO(u32 core_busy,
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index eb4220a..81b8745 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -15,4 +15,5 @@
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(suspend_resume);
 EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
+EXPORT_TRACEPOINT_SYMBOL_GPL(powernv_throttle);
 
-- 
1.9.1


[PATCH RESEND v4 3/4] cpufreq: powernv: Add a trace print for the throttle event

2016-01-12 Thread Shilpasri G Bhat
Record the throttle event with a trace print replacing the printk,
except for events like throttling below nominal and occ reset
event which print a warning message.

Signed-off-by: Shilpasri G Bhat 
---
Changes from v3:
- Separate this patch to contain trace_point changes
- Move struct chip member 'restore' of type bool above 'mask' to reduce
  structure padding.

No changes from v2.

Changes from v1:
- As suggested by Paul Clarke replaced char * throttle_reason[][30] by 
  const char * const throttle_reason[].

 drivers/cpufreq/powernv-cpufreq.c | 95 ---
 1 file changed, 49 insertions(+), 46 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 597a084..c98a6e7 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -44,12 +45,22 @@
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled, occ_reset;
 
+static const char * const throttle_reason[] = {
+   "No throttling",
+   "Power Cap",
+   "Processor Over Temperature",
+   "Power Supply Failure",
+   "Over Current",
+   "OCC Reset"
+};
+
 static struct chip {
unsigned int id;
bool throttled;
+   bool restore;
+   u8 throt_reason;
cpumask_t mask;
struct work_struct throttle;
-   bool restore;
 } *chips;
 
 static int nr_chips;
@@ -310,41 +321,49 @@ static inline unsigned int get_nominal_index(void)
return powernv_pstate_info.max - powernv_pstate_info.nominal;
 }
 
-static void powernv_cpufreq_throttle_check(void *data)
+static void powernv_cpufreq_check_pmax(void)
 {
unsigned int cpu = smp_processor_id();
unsigned int chip_id = pir_to_chip_id(hard_smp_processor_id());
-   unsigned long pmsr;
int pmsr_pmax, i;
 
-   pmsr = get_pmspr(SPRN_PMSR);
+   pmsr_pmax = (s8)PMSR_MAX(get_pmspr(SPRN_PMSR));
 
for (i = 0; i < nr_chips; i++)
if (chips[i].id == chip_id)
break;
 
-   /* Check for Pmax Capping */
-   pmsr_pmax = (s8)PMSR_MAX(pmsr);
if (pmsr_pmax != powernv_pstate_info.max) {
if (chips[i].throttled)
-   goto next;
+   return;
+
chips[i].throttled = true;
if (pmsr_pmax < powernv_pstate_info.nominal)
-   pr_crit("CPU %d on Chip %u has Pmax reduced below nominal frequency (%d < %d)\n",
-   cpu, chips[i].id, pmsr_pmax,
-   powernv_pstate_info.nominal);
-   else
-   pr_info("CPU %d on Chip %u has Pmax reduced below turbo frequency (%d < %d)\n",
-   cpu, chips[i].id, pmsr_pmax,
-   powernv_pstate_info.max);
+   pr_warn_once("CPU %d on Chip %u has Pmax reduced below nominal frequency (%d < %d)\n",
+cpu, chips[i].id, pmsr_pmax,
+powernv_pstate_info.nominal);
+
+   trace_powernv_throttle(chips[i].id,
+  throttle_reason[chips[i].throt_reason],
+  pmsr_pmax);
} else if (chips[i].throttled) {
chips[i].throttled = false;
-   pr_info("CPU %d on Chip %u has Pmax restored to %d\n", cpu,
-   chips[i].id, pmsr_pmax);
+   trace_powernv_throttle(chips[i].id,
+  throttle_reason[chips[i].throt_reason],
+  pmsr_pmax);
}
+}
+
+static void powernv_cpufreq_throttle_check(void *data)
+{
+   unsigned long pmsr;
+
+   pmsr = get_pmspr(SPRN_PMSR);
+
+   /* Check for Pmax Capping */
+   powernv_cpufreq_check_pmax();
 
/* Check if Psafe_mode_active is set in PMSR. */
-next:
if (pmsr & PMSR_PSAFE_ENABLE) {
throttled = true;
pr_info("Pstate set to safe frequency\n");
@@ -358,7 +377,7 @@ next:
 
if (throttled) {
pr_info("PMSR = %16lx\n", pmsr);
-   pr_crit("CPU Frequency could be throttled\n");
+   pr_warn("CPU Frequency could be throttled\n");
}
 }
 
@@ -449,15 +468,6 @@ void powernv_cpufreq_work_fn(struct work_struct *work)
}
 }
 
-static char throttle_reason[][30] = {
-   "No throttling",
-   "Power Cap",
-   "Processor Over Temperature",
-   "Power Supply Failure",
-   "Over Current",
-   "OCC Reset"
-};
-
 static int

[PATCH RESEND v4 1/4] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path

2016-01-12 Thread Shilpasri G Bhat
cpu_to_chip_id() walks the DT to find the chip id and takes a contended
device tree lock while doing so. This adds unnecessary overhead in a hot path.
So instead of cpu_to_chip_id(), use the PIR of the cpu to find the chip id.

Reported-by: Anton Blanchard 
Signed-off-by: Shilpasri G Bhat 
---
 drivers/cpufreq/powernv-cpufreq.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index cb50138..597a084 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -39,6 +39,7 @@
 #define PMSR_PSAFE_ENABLE      (1UL << 30)
 #define PMSR_SPR_EM_DISABLE    (1UL << 31)
 #define PMSR_MAX(x)            ((x >> 32) & 0xFF)
+#define pir_to_chip_id(pir)    (((pir) >> 7) & 0x3f)
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled, occ_reset;
@@ -312,13 +313,14 @@ static inline unsigned int get_nominal_index(void)
 static void powernv_cpufreq_throttle_check(void *data)
 {
unsigned int cpu = smp_processor_id();
+   unsigned int chip_id = pir_to_chip_id(hard_smp_processor_id());
unsigned long pmsr;
int pmsr_pmax, i;
 
pmsr = get_pmspr(SPRN_PMSR);
 
for (i = 0; i < nr_chips; i++)
-   if (chips[i].id == cpu_to_chip_id(cpu))
+   if (chips[i].id == chip_id)
break;
 
/* Check for Pmax Capping */
@@ -558,7 +560,8 @@ static int init_chip_info(void)
unsigned int prev_chip_id = UINT_MAX;
 
for_each_possible_cpu(cpu) {
-   unsigned int id = cpu_to_chip_id(cpu);
+   unsigned int id =
+   pir_to_chip_id(get_hard_smp_processor_id(cpu));
 
if (prev_chip_id != id) {
prev_chip_id = id;
-- 
1.9.1
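
For reference, a minimal standalone sketch of the lookup this patch moves to. The macro mirrors pir_to_chip_id() from the diff; struct chip_entry and find_chip_index() are hypothetical names used only for illustration, and the explicit not-found return is an addition that the driver loop in the patch does not have.

```c
/* Same bit extraction as pir_to_chip_id() in the patch: bits 7..12 of the PIR. */
#define PIR_TO_CHIP_ID(pir) (((pir) >> 7) & 0x3f)

/* Hypothetical stand-in for the driver's per-chip bookkeeping. */
struct chip_entry {
	unsigned int id;
};

/*
 * Return the index of the chip that owns the CPU with the given PIR,
 * or -1 if no entry matches. No device tree access is involved, so
 * nothing here can contend on a lock.
 */
static int find_chip_index(const struct chip_entry *chips, int nr_chips,
			   unsigned long pir)
{
	unsigned int chip_id = PIR_TO_CHIP_ID(pir);
	int i;

	for (i = 0; i < nr_chips; i++)
		if (chips[i].id == chip_id)
			return i;

	return -1;
}
```

The shift and mask are pure arithmetic on a value the CPU already has, which is presumably why it suits the hot path better than a DT walk.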


[PATCH RESEND v4 0/4] cpufreq: powernv: Redesign the presentation of throttle notification

2016-01-12 Thread Shilpasri G Bhat
On POWER8, the OCC (On-Chip Controller) can throttle the frequency of the
CPU when the chip crosses its thermal and power limits. Currently, the
powernv-cpufreq driver detects and reports this event as a console
message. Some machines may not sustain the max turbo frequency in all
conditions and can be throttled frequently. This can flood the console
with throttle messages. So this patchset aims to redesign the
presentation of this event via sysfs counters and tracepoints.

Patches [2] to [4] add a perf tracepoint "power:powernv_throttle" and
sysfs throttle counter stats in /sys/devices/system/cpu/cpufreq/chipN.
Patch [1] fixes a bug in powernv_cpufreq_throttle_check(), which calls
cpu_to_chip_id() in a hot path and so reads the DT every time to find the
chip id.

Resending the patchset as I had cc'ed sta...@vger.kernel.org in the development
cycle and used --in-reply-to to post a new version.

Changes from v3:
- Add a fix to replace cpu_to_chip_id() with a simpler PIR shift to obtain the
  chip id.
- Break patch2 into two patches, separating the tracepoint and sysfs attribute
  changes.

Changes from v2:
- Fixed kbuild test warning.
drivers/cpufreq/powernv-cpufreq.c:609:2: warning: ignoring return
value of 'kstrtoint', declared with attribute warn_unused_result
[-Wunused-result]

Shilpasri G Bhat (4):
  cpufreq: powernv: Remove cpu_to_chip_id() from hot-path
  cpufreq: powernv/tracing: Add powernv_throttle tracepoint
  cpufreq: powernv: Add a trace print for the throttle event
  cpufreq: powernv: Add sysfs attributes to show throttle stats

 drivers/cpufreq/powernv-cpufreq.c | 279 +++---
 include/trace/events/power.h  |  22 +++
 kernel/trace/power-traces.c   |   1 +
 3 files changed, 250 insertions(+), 52 deletions(-)

-- 
1.9.1
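
To illustrate the idea of reporting transitions instead of printing on every check, here is a small userspace sketch. It is not the driver code: struct chip_stats, update_throttle_state() and the enum names are hypothetical, and only the list of reasons follows the throttle_reason[] strings in the driver.

```c
#include <stdbool.h>

/* Reasons in the same order as the driver's throttle_reason[] strings. */
enum throttle_reason {
	NO_THROTTLE,
	POWERCAP,
	CPU_OVERTEMP,
	POWER_SUPPLY_FAILURE,
	OVERCURRENT,
	OCC_RESET,
	NR_REASONS
};

/* Hypothetical per-chip state: current status plus one counter per reason. */
struct chip_stats {
	bool throttled;
	unsigned int reason_count[NR_REASONS];
};

/*
 * Compare the current Pmax with the expected maximum pstate and update
 * the chip state. Returns true only when the state changed, so a caller
 * would emit one trace event per transition rather than one console
 * message per check.
 */
static bool update_throttle_state(struct chip_stats *c, int pmax,
				  int pstate_max, enum throttle_reason why)
{
	bool now_throttled = (pmax != pstate_max);

	if (now_throttled == c->throttled)
		return false;

	c->throttled = now_throttled;
	if (now_throttled)
		c->reason_count[why]++;

	return true;
}
```

A counter that only moves on a transition stays readable even on machines that are throttled often, which is the console-flooding case described above.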


Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Peter Zijlstra
On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote:
> 2) the changelog _completely_ fails to explain the sync 0x11 and sync
> 0x12 semantics nor does it provide a publicly accessible link to
> documentation that does.

Ralf pointed me at: https://imgtec.com/mips/architectures/mips64/

> 3) it really should have explained what you did with
> smp_llsc_mb/smp_mb__before_llsc() in _detail_.

And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
are _NOT_ transitive and therefore cannot be used to implement the
smp_mb__{before,after} stuff.

That is, in MIPS speak, those SYNC types are Ordering Barriers, not
Completion Barriers. They need not be globally performed.
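
To make the transitivity concern concrete, here is a userspace sketch of the classic write-to-read causality (WRC) litmus test, written with C11 atomics and pthreads. It is not kernel code and it says nothing about what SYNC 0x11/0x12 actually do on hardware; all names are made up for illustration. With transitive barriers (the C11 fences used here qualify), the outcome r1 == 1, r2 == 1, r3 == 0 must never be observed.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;
static int r1, r2, r3;

/* T0: publish x. */
static void *writer(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	return NULL;
}

/* T1: observe x, then publish y after a full fence. */
static void *relay(void *arg)
{
	(void)arg;
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	return NULL;
}

/* T2: observe y, then read x after a full fence. */
static void *observer(void *arg)
{
	(void)arg;
	r2 = atomic_load_explicit(&y, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	r3 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/*
 * Run the litmus test 'trials' times and count how often the outcome
 * that transitivity forbids shows up. Error handling is omitted.
 */
int wrc_forbidden_count(int trials)
{
	int forbidden = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t t[3];

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		pthread_create(&t[0], NULL, writer, NULL);
		pthread_create(&t[1], NULL, relay, NULL);
		pthread_create(&t[2], NULL, observer, NULL);
		for (int j = 0; j < 3; j++)
			pthread_join(t[j], NULL);

		if (r1 == 1 && r2 == 1 && r3 == 0)
			forbidden++;
	}
	return forbidden;
}
```

A barrier that only orders accesses locally, without the store to x being made visible to T2 before T2 sees y, would allow that third outcome. A test like this can only show a violation; it cannot prove its absence.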

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Peter Zijlstra
On Tue, Jan 12, 2016 at 10:43:36AM +0200, Michael S. Tsirkin wrote:
> On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote:
> > On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote:
> > >On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
> > >smp_read_barrier_depends, smp_store_release and smp_load_acquire  match
> > >the asm-generic variants exactly. Drop the local definitions and pull in
> > >asm-generic/barrier.h instead.
> > >
> > This statement doesn't fit MIPS barriers variations. Moreover, there is a
> > reason to extend that even more specific, at least for smp_store_release and
> > smp_load_acquire, look into
> > 
> > http://patchwork.linux-mips.org/patch/10506/
> > 
> > - Leonid.
> 
> Fine, but it matches what current code is doing.  Since that
> MIPS_LIGHTWEIGHT_SYNC patch didn't go into linux-next yet, do
> you see a problem reworking it on top of this patchset?

That patch is a complete doorstop atm. It needs a lot more work before
it can go anywhere. Don't worry about it.

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Peter Zijlstra
On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote:

> This statement doesn't fit MIPS barriers variations. Moreover, there is a
> reason to extend that even more specific, at least for smp_store_release and
> smp_load_acquire, look into
> 
> http://patchwork.linux-mips.org/patch/10506/

Dude, that's one horrible patch.

1) you do not make such things selectable; either the hardware needs
them or it doesn't. If it does you _must_ use them, however unlikely.

2) the changelog _completely_ fails to explain the sync 0x11 and sync
0x12 semantics nor does it provide a publicly accessible link to
documentation that does.

3) it really should have explained what you did with
smp_llsc_mb/smp_mb__before_llsc() in _detail_.

And I agree that ideally it should be split into parts.

Seriously, this is _NOT_ OK.

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Michael S. Tsirkin
On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote:
> On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote:
> >On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
> >smp_read_barrier_depends, smp_store_release and smp_load_acquire  match
> >the asm-generic variants exactly. Drop the local definitions and pull in
> >asm-generic/barrier.h instead.
> >
> This statement doesn't fit MIPS barriers variations. Moreover, there is a
> reason to extend that even more specific, at least for smp_store_release and
> smp_load_acquire, look into
> 
> http://patchwork.linux-mips.org/patch/10506/
> 
> - Leonid.

Fine, but it matches what current code is doing.  Since that
MIPS_LIGHTWEIGHT_SYNC patch didn't go into linux-next yet, do
you see a problem reworking it on top of this patchset?

-- 
MST

Re: [RFC PATCH V1 18/33] powerpc/mm: Add helper for update page flags during ioremap

2016-01-12 Thread Denis Kirjanov
On 1/12/16, Aneesh Kumar K.V  wrote:
> They differ between radix and hash. Hence we need a helper
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/book3s/32/pgtable.h | 11 +++
>  arch/powerpc/include/asm/book3s/64/hash.h| 11 +++
>  arch/powerpc/include/asm/nohash/pgtable.h| 20 
>  arch/powerpc/mm/pgtable_64.c | 16 +---
>  4 files changed, 43 insertions(+), 15 deletions(-)
Can we put it alone in some common header file?
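
A rough sketch of what a single shared helper might look like, assuming the three per-MMU copies in this patch only differ by the _PAGE_BAP_SR fixup. The PAGE_* names and bit values below are hypothetical stand-ins for the kernel's _PAGE_* bits, chosen only so that the example builds on its own. Whether this still works once radix needs different handling is exactly what the changelog hints at, so treat it as an illustration of the question rather than a proposal.

```c
/* Hypothetical bit values; the real ones come from the per-MMU headers. */
#define PAGE_EXEC	0x001UL
#define PAGE_RW		0x002UL
#define PAGE_USER	0x004UL
#define PAGE_DIRTY	0x080UL
/* PAGE_BAP_SR would only be defined on BookE with the new PTE format. */

static inline unsigned long ioremap_prot_flags(unsigned long flags)
{
	/* writeable implies dirty for kernel addresses */
	if (flags & PAGE_RW)
		flags |= PAGE_DIRTY;

	/* do not let the user and exec bits leak out */
	flags &= ~(PAGE_USER | PAGE_EXEC);

#ifdef PAGE_BAP_SR
	/* clearing the user bits also cleared supervisor access; restore it */
	flags |= PAGE_BAP_SR;
#endif

	return flags;
}
```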
>
> diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h
> b/arch/powerpc/include/asm/book3s/32/pgtable.h
> index c0898e26ed4a..b53d7504d6f6 100644
> --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
> @@ -491,6 +491,17 @@ static inline unsigned long gup_pte_filter(int write)
>   mask |= _PAGE_RW;
>   return mask;
>  }
> +
> +static inline unsigned long ioremap_prot_flags(unsigned long flags)
> +{
> + /* writeable implies dirty for kernel addresses */
> + if (flags & _PAGE_RW)
> + flags |= _PAGE_DIRTY;
> +
> + /* we don't want to let _PAGE_USER and _PAGE_EXEC leak out */
> + flags &= ~(_PAGE_USER | _PAGE_EXEC);
> + return flags;
> +}
>  #endif /* !__ASSEMBLY__ */
>
>  #endif /*  _ASM_POWERPC_BOOK3S_32_PGTABLE_H */
> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h
> b/arch/powerpc/include/asm/book3s/64/hash.h
> index d51709dad729..4f0fdb9a5d19 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> @@ -592,6 +592,17 @@ static inline unsigned long gup_pte_filter(int write)
>   return mask;
>  }
>
> +static inline unsigned long ioremap_prot_flags(unsigned long flags)
> +{
> + /* writeable implies dirty for kernel addresses */
> + if (flags & _PAGE_RW)
> + flags |= _PAGE_DIRTY;
> +
> + /* we don't want to let _PAGE_USER and _PAGE_EXEC leak out */
> + flags &= ~(_PAGE_USER | _PAGE_EXEC);
> + return flags;
> +}
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  extern void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long
> addr,
>  pmd_t *pmdp, unsigned long old_pmd);
> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h
> b/arch/powerpc/include/asm/nohash/pgtable.h
> index e4173cb06e5b..8861ec146985 100644
> --- a/arch/powerpc/include/asm/nohash/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> @@ -238,6 +238,26 @@ static inline unsigned long gup_pte_filter(int write)
>   return mask;
>  }
>
> +static inline unsigned long ioremap_prot_flags(unsigned long flags)
> +{
> + /* writeable implies dirty for kernel addresses */
> + if (flags & _PAGE_RW)
> + flags |= _PAGE_DIRTY;
> +
> + /* we don't want to let _PAGE_USER and _PAGE_EXEC leak out */
> + flags &= ~(_PAGE_USER | _PAGE_EXEC);
> +
> +#ifdef _PAGE_BAP_SR
> + /* _PAGE_USER contains _PAGE_BAP_SR on BookE using the new PTE format
> +  * which means that we just cleared supervisor access... oops ;-) This
> +  * restores it
> +  */
> + flags |= _PAGE_BAP_SR;
> +#endif
> +
> + return flags;
> +}
> +
>  #ifdef CONFIG_HUGETLB_PAGE
>  static inline int hugepd_ok(hugepd_t hpd)
>  {
> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
> index 21a9a171c267..aa8ff4c74563 100644
> --- a/arch/powerpc/mm/pgtable_64.c
> +++ b/arch/powerpc/mm/pgtable_64.c
> @@ -188,21 +188,7 @@ void __iomem * ioremap_prot(phys_addr_t addr, unsigned
> long size,
>  {
>   void *caller = __builtin_return_address(0);
>
> - /* writeable implies dirty for kernel addresses */
> - if (flags & _PAGE_RW)
> - flags |= _PAGE_DIRTY;
> -
> - /* we don't want to let _PAGE_USER and _PAGE_EXEC leak out */
> - flags &= ~(_PAGE_USER | _PAGE_EXEC);
> -
> -#ifdef _PAGE_BAP_SR
> - /* _PAGE_USER contains _PAGE_BAP_SR on BookE using the new PTE format
> -  * which means that we just cleared supervisor access... oops ;-) This
> -  * restores it
> -  */
> - flags |= _PAGE_BAP_SR;
> -#endif
> -
> + flags = ioremap_prot_flags(flags);
>   if (ppc_md.ioremap)
>   return ppc_md.ioremap(addr, size, flags, caller);
>   return __ioremap_caller(addr, size, flags, caller);
> --
> 2.5.0
>
> ___
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC PATCH V1 14/33] powerpc/mm: Use helper for finding pte bits mapping I/O area

2016-01-12 Thread Denis Kirjanov
On 1/12/16, Aneesh Kumar K.V  wrote:
> We will have different values for hash and radix. Hence we
> cannot use #define constants. Add helper
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/book3s/32/pgtable.h | 5 +
>  arch/powerpc/include/asm/book3s/64/hash.h| 5 +
>  arch/powerpc/include/asm/nohash/pgtable.h| 5 +
>  arch/powerpc/kernel/isa-bridge.c | 4 ++--
>  arch/powerpc/kernel/pci_64.c | 2 +-
>  arch/powerpc/mm/pgtable_64.c | 2 +-
>  6 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h
> b/arch/powerpc/include/asm/book3s/32/pgtable.h
> index 3ed3303c1295..77adada2f3b4 100644
> --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
> @@ -478,6 +478,11 @@ static inline pgprot_t pgprot_writecombine(pgprot_t
> prot)
>   return pgprot_noncached_wc(prot);
>  }
>
> +static inline unsigned long pte_io_cache_bits(void)
> +{
> + return _PAGE_NO_CACHE | _PAGE_GUARDED;
> +}
This could be just plain #define
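
That said, the changelog suggests the value may have to be picked at run time once radix is added, which a compile-time constant cannot do. A sketch of that case, assuming a hypothetical mmu_is_radix flag and made-up bit values; nothing here is taken from the actual radix patches:

```c
#include <stdbool.h>

/* Hypothetical bit values for the two MMU modes. */
#define HASH_PAGE_NO_CACHE	0x020UL
#define HASH_PAGE_GUARDED	0x008UL
#define RADIX_PAGE_NO_CACHE	0x010UL
#define RADIX_PAGE_GUARDED	0x004UL

/* Hypothetical runtime switch, set once during early boot. */
static bool mmu_is_radix;

/* Return the PTE bits used for I/O mappings in the active MMU mode. */
static inline unsigned long pte_io_cache_bits(void)
{
	if (mmu_is_radix)
		return RADIX_PAGE_NO_CACHE | RADIX_PAGE_GUARDED;

	return HASH_PAGE_NO_CACHE | HASH_PAGE_GUARDED;
}
```

If the value never depends on run-time state, a #define does the same job with less code.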
> +
>  #endif /* !__ASSEMBLY__ */
>
>  #endif /*  _ASM_POWERPC_BOOK3S_32_PGTABLE_H */
> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h
> b/arch/powerpc/include/asm/book3s/64/hash.h
> index ced3aed63af2..1b27c0c8effa 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> @@ -578,6 +578,11 @@ static inline pgprot_t pgprot_writecombine(pgprot_t
> prot)
>  extern pgprot_t vm_get_page_prot(unsigned long vm_flags);
>  #define vm_get_page_prot vm_get_page_prot
>
> +static inline unsigned long pte_io_cache_bits(void)
> +{
> + return _PAGE_NO_CACHE | _PAGE_GUARDED;
> +}
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  extern void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long
> addr,
>  pmd_t *pmdp, unsigned long old_pmd);
> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h
> b/arch/powerpc/include/asm/nohash/pgtable.h
> index 11e3767216c0..8c4bb8fda0de 100644
> --- a/arch/powerpc/include/asm/nohash/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> @@ -224,6 +224,11 @@ extern pgprot_t phys_mem_access_prot(struct file *file,
> unsigned long pfn,
>unsigned long size, pgprot_t vma_prot);
>  #define __HAVE_PHYS_MEM_ACCESS_PROT
>
> +static inline unsigned long pte_io_cache_bits(void)
> +{
> + return _PAGE_NO_CACHE | _PAGE_GUARDED;
> +}
> +
>  #ifdef CONFIG_HUGETLB_PAGE
>  static inline int hugepd_ok(hugepd_t hpd)
>  {
> diff --git a/arch/powerpc/kernel/isa-bridge.c
> b/arch/powerpc/kernel/isa-bridge.c
> index 0f1997097960..d81185f025fa 100644
> --- a/arch/powerpc/kernel/isa-bridge.c
> +++ b/arch/powerpc/kernel/isa-bridge.c
> @@ -109,14 +109,14 @@ static void pci_process_ISA_OF_ranges(struct
> device_node *isa_node,
>   size = 0x1;
>
>   __ioremap_at(phb_io_base_phys, (void *)ISA_IO_BASE,
> -  size, _PAGE_NO_CACHE|_PAGE_GUARDED);
> +  size, pte_io_cache_bits());
>   return;
>
>  inval_range:
>   printk(KERN_ERR "no ISA IO ranges or unexpected isa range, "
>  "mapping 64k\n");
>   __ioremap_at(phb_io_base_phys, (void *)ISA_IO_BASE,
> -  0x1, _PAGE_NO_CACHE|_PAGE_GUARDED);
> +  0x1, pte_io_cache_bits());
>  }
>
>
> diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
> index 60bb187cb46a..7fe1dfd214a1 100644
> --- a/arch/powerpc/kernel/pci_64.c
> +++ b/arch/powerpc/kernel/pci_64.c
> @@ -159,7 +159,7 @@ static int pcibios_map_phb_io_space(struct
> pci_controller *hose)
>
>   /* Establish the mapping */
>   if (__ioremap_at(phys_page, area->addr, size_page,
> -  _PAGE_NO_CACHE | _PAGE_GUARDED) == NULL)
> +  pte_io_cache_bits()) == NULL)
>   return -ENOMEM;
>
>   /* Fixup hose IO resource */
> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
> index e5f600d19326..6d161cec2e32 100644
> --- a/arch/powerpc/mm/pgtable_64.c
> +++ b/arch/powerpc/mm/pgtable_64.c
> @@ -253,7 +253,7 @@ void __iomem * __ioremap(phys_addr_t addr, unsigned long
> size,
>
>  void __iomem * ioremap(phys_addr_t addr, unsigned long size)
>  {
> - unsigned long flags = _PAGE_NO_CACHE | _PAGE_GUARDED;
> + unsigned long flags = pte_io_cache_bits();
>   void *caller = __builtin_return_address(0);
>
>   if (ppc_md.ioremap)
> --
> 2.5.0
>
> ___
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH V1 29/33] powerpc/mm: Hash linux abstraction for THP

2016-01-12 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h |  42 ---
 arch/powerpc/include/asm/book3s/64/hash.h |  14 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 154 +-
 arch/powerpc/mm/pgtable-hash64.c  |  58 +-
 4 files changed, 198 insertions(+), 70 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 8008c9a89416..e697fc528c0a 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -190,11 +190,19 @@ static inline int hugepd_ok(hugepd_t hpd)
 #endif /* CONFIG_HUGETLB_PAGE */
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-extern unsigned long pmd_hugepage_update(struct mm_struct *mm,
-unsigned long addr,
-pmd_t *pmdp,
-unsigned long clr,
-unsigned long set);
+
+extern pmd_t pfn_hlpmd(unsigned long pfn, pgprot_t pgprot);
+extern pmd_t mk_hlpmd(struct page *page, pgprot_t pgprot);
+extern pmd_t hlpmd_modify(pmd_t pmd, pgprot_t newprot);
+extern int hl_has_transparent_hugepage(void);
+extern void set_hlpmd_at(struct mm_struct *mm, unsigned long addr,
+pmd_t *pmdp, pmd_t pmd);
+
+extern unsigned long hlpmd_hugepage_update(struct mm_struct *mm,
+  unsigned long addr,
+  pmd_t *pmdp,
+  unsigned long clr,
+  unsigned long set);
 static inline char *get_hpte_slot_array(pmd_t *pmdp)
 {
/*
@@ -253,51 +261,55 @@ static inline void mark_hpte_slot_valid(unsigned char *hpte_slot_array,
  * that for explicit huge pages.
  *
  */
-static inline int pmd_trans_huge(pmd_t pmd)
+static inline int hlpmd_trans_huge(pmd_t pmd)
 {
return !!((pmd_val(pmd) & (H_PAGE_PTE | H_PAGE_THP_HUGE)) ==
  (H_PAGE_PTE | H_PAGE_THP_HUGE));
 }
 
-static inline int pmd_large(pmd_t pmd)
+static inline int hlpmd_large(pmd_t pmd)
 {
return !!(pmd_val(pmd) & H_PAGE_PTE);
 }
 
-static inline pmd_t pmd_mknotpresent(pmd_t pmd)
+static inline pmd_t hlpmd_mknotpresent(pmd_t pmd)
 {
return __pmd(pmd_val(pmd) & ~H_PAGE_PRESENT);
 }
 
-#define __HAVE_ARCH_PMD_SAME
-static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
+static inline pmd_t hlpmd_mkhuge(pmd_t pmd)
+{
+   return __pmd(pmd_val(pmd) | (H_PAGE_PTE | H_PAGE_THP_HUGE));
+}
+
+static inline int hlpmd_same(pmd_t pmd_a, pmd_t pmd_b)
 {
return (((pmd_val(pmd_a) ^ pmd_val(pmd_b)) & ~H_PAGE_HPTEFLAGS) == 0);
 }
 
-static inline int __pmdp_test_and_clear_young(struct mm_struct *mm,
+static inline int __hlpmdp_test_and_clear_young(struct mm_struct *mm,
  unsigned long addr, pmd_t *pmdp)
 {
unsigned long old;
 
if ((pmd_val(*pmdp) & (H_PAGE_ACCESSED | H_PAGE_HASHPTE)) == 0)
return 0;
-   old = pmd_hugepage_update(mm, addr, pmdp, H_PAGE_ACCESSED, 0);
+   old = hlpmd_hugepage_update(mm, addr, pmdp, H_PAGE_ACCESSED, 0);
return ((old & H_PAGE_ACCESSED) != 0);
 }
 
-#define __HAVE_ARCH_PMDP_SET_WRPROTECT
-static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr,
+static inline void hlpmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr,
  pmd_t *pmdp)
 {
 
if ((pmd_val(*pmdp) & H_PAGE_RW) == 0)
return;
 
-   pmd_hugepage_update(mm, addr, pmdp, H_PAGE_RW, 0);
+   hlpmd_hugepage_update(mm, addr, pmdp, H_PAGE_RW, 0);
 }
 
 #endif /*  CONFIG_TRANSPARENT_HUGEPAGE */
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_BOOK3S_64_HASH_64K_H */
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index 20bb9da200c6..f43b26c4d319 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -600,6 +600,20 @@ static inline void hpte_do_hugepage_flush(struct mm_struct *mm,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+extern int hlpmdp_set_access_flags(struct vm_area_struct *vma,
+   unsigned long address, pmd_t *pmdp,
+   pmd_t entry, int dirty);
+extern int hlpmdp_test_and_clear_young(struct vm_area_struct *vma,
+   unsigned long address, pmd_t *pmdp);
+extern pmd_t hlpmdp_huge_get_and_clear(struct mm_struct *mm,
+   unsigned long addr, pmd_t *pmdp);
+extern pmd_t hlpmdp_collapse_flush(struct vm_area_struct *vma,
+   unsigned long address, pmd_t *pmdp);
+extern void hlpgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+