Re: [PATCH v6 0/7] perf report: Show branch type

2017-06-25 Thread Jin, Yao

Hi maintainers,

Is this patch series OK, or is there anything I should update?

Thanks

Jin Yao


On 6/2/2017 4:02 PM, Jin, Yao wrote:

Hi maintainers,

Is this patch series (v6) OK for merging?

Thanks

Jin Yao


On 4/20/2017 5:36 PM, Jiri Olsa wrote:

On Thu, Apr 20, 2017 at 08:07:48PM +0800, Jin Yao wrote:

v6:
Update according to the review comments from
Jiri Olsa . Major modifications are:

1. Move that multiline conditional code inside {} brackets.

2. Move branch_type_stat_display() from builtin-report.c to
   branch.c. Move branch_type_str() from callchain.c to
   branch.c.

3. Keep the original branch info display order, that is:
   predicted, abort, cycles, iterations

for the tools part

Acked-by: Jiri Olsa 

thanks,
jirka






Re: [PATCH v3 3/9] powerpc/kprobes/optprobes: Move over to patch_instruction

2017-06-25 Thread Michael Ellerman
Balbir Singh  writes:

> With text moving to read-only, migrate optprobes to using
> the patch_instruction infrastructure. Without this, optprobes
> will fail and complain.
>
> Signed-off-by: Balbir Singh 
> ---
>  arch/powerpc/kernel/optprobes.c | 58 
> ++---
>  1 file changed, 37 insertions(+), 21 deletions(-)

I picked this up too.

cheers


Re: [PATCH v3 1/9] powerpc/lib/code-patching: Enhance code patching

2017-06-25 Thread Michael Ellerman
Balbir Singh  writes:

> Today our patching happens via direct copy and
> patch_instruction. The patching code is well
> contained in the sense that copying bits are limited.
>
> While considering implementation of CONFIG_STRICT_RWX,
> the first requirement is to create another mapping
> that will allow for patching. We create the window using
> text_poke_area, allocated via get_vm_area(), which might
> be overkill. text_poke_area is per CPU to avoid locking.
> Other arches do similar things, but use fixmaps. The reason
> for not using fixmaps is to make use of any randomization in
> the future. The code also relies on set_pte_at and pte_clear
> to do the appropriate tlb flushing.
>
> diff --git a/arch/powerpc/lib/code-patching.c 
> b/arch/powerpc/lib/code-patching.c
> index 500b0f6..c0a0834 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -12,23 +12,154 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
> +static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> +static unsigned int text_area_patch_avail;

All of this should be under #ifdef STRICT_RWX. So that when STRICT_RWX=n
we basically use the old code.
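
Roughly (a sketch only; the helper names here are made up, and I'm
assuming the config symbol ends up as CONFIG_STRICT_KERNEL_RWX):

	int patch_instruction(unsigned int *addr, unsigned int instr)
	{
	#ifdef CONFIG_STRICT_KERNEL_RWX
		/* new path: patch via the per-cpu text_poke_area mapping */
		return patch_instruction_remap(addr, instr);
	#else
		/* old path: direct __put_user_size() based patching */
		return raw_patch_instruction(addr, instr);
	#endif
	}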

> -int patch_instruction(unsigned int *addr, unsigned int instr)
> +static int text_area_cpu_up(unsigned int cpu)
> +{
> + struct vm_struct *area;
> +
> + area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> + if (!area) {
> + WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> + cpu);
> + return -1;

This is good, it will block bringing up a CPU if we can't get the VM
area, which is the safe option.

> + }
> + this_cpu_write(text_poke_area, area);
> + return 0;
> +}
> +
> +static int text_area_cpu_down(unsigned int cpu)
> +{
> + free_vm_area(this_cpu_read(text_poke_area));
> + return 0;
> +}
> +
> +/*
> + * This is an early_initcall and early_initcalls happen at the right time
> + * for us, after slab is enabled and before we mark ro pages R/O. In the
> + * future if get_vm_area is randomized, this will be more flexible than
> + * fixmap
> + */
> +static int __init setup_text_poke_area(void)
>  {
> + struct vm_struct *area;
> + int cpu;
> +
> + for_each_online_cpu(cpu) {
> + area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> + if (!area) {
> + WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> + cpu);
> + /* Should we disable strict rwx? */
> + continue;
> + }
> + this_cpu_write(text_poke_area, area);
> + }

So I think skip this.

> + cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> + "powerpc/text_poke:online", text_area_cpu_up,
> + text_area_cpu_down);

Use the cpuhp_setup_state() version, which will call it on the boot CPU.
And then just BUG_ON() if it fails.

Also switch this to a late_initcall(), so that if we do BUG_ON() it's
nice and late and the kernel is mostly up.
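
ie. roughly (sketch, untested):

	static int __init setup_text_poke_area(void)
	{
		/*
		 * Unlike cpuhp_setup_state_nocalls(), cpuhp_setup_state()
		 * runs text_area_cpu_up() on every CPU that is already
		 * online, including the boot CPU, so the open-coded
		 * for_each_online_cpu() loop above can go away.
		 */
		BUG_ON(cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
					 "powerpc/text_poke:online",
					 text_area_cpu_up,
					 text_area_cpu_down) < 0);
		return 0;
	}
	late_initcall(setup_text_poke_area);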

> + text_area_patch_avail = 1;

Instead of this global flag, we should just check that each CPUs
text_poke_area is non-NULL before using it ...

> + /*
> +  * The barrier here ensures the write is visible to
> +  * patch_instruction()
> +  */
> + smp_wmb();
> + pr_info("text_poke area ready...\n");
> + return 0;
> +}
> +
> +/*
> + * This can be called for kernel text or a module.
> + */
> +static int kernel_map_addr(void *addr)

map_patch_area() ?

> +{
> + unsigned long pfn;
>   int err;
>  
> - __put_user_size(instr, addr, 4, err);
> + if (is_vmalloc_addr(addr))
> + pfn = vmalloc_to_pfn(addr);
> + else
> + pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> +

.. in here.

ie. if (!this_cpu_read(text_poke_area))
return -Exxx;

> + err = map_kernel_page(
> + (unsigned long)__this_cpu_read(text_poke_area)->addr,
> + (pfn << PAGE_SHIFT), pgprot_val(PAGE_KERNEL));

Use a local or two to make that less gross.
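
ie. something like (sketch, local names made up):

	unsigned long patch_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr;
	unsigned long phys_addr = pfn << PAGE_SHIFT;

	err = map_kernel_page(patch_addr, phys_addr, pgprot_val(PAGE_KERNEL));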

> + pr_devel("Mapped addr %p with pfn %lx\n",
> + __this_cpu_read(text_poke_area)->addr, pfn);

err?

>   if (err)
> - return err;
> - asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r" (addr));
> + return -1;
>   return 0;
>  }
>  
> +static inline void kernel_unmap_addr(void *addr)
> +{
> + pte_t *pte;
> + unsigned long kaddr = (unsigned long)addr;
> +
> + pte = pte_offset_kernel(pmd_offset(pud_offset(pgd_offset_k(kaddr),
> + kaddr), kaddr), kaddr);

This is pretty fragile, I'd rather you checked each level returned
something sane.
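
ie. walk it a level at a time and bail out if anything is missing,
roughly (sketch):

	pgd_t *pgdp = pgd_offset_k(kaddr);
	pud_t *pudp;
	pmd_t *pmdp;
	pte_t *ptep;

	if (WARN_ON(pgd_none(*pgdp)))
		return;
	pudp = pud_offset(pgdp, kaddr);
	if (WARN_ON(pud_none(*pudp)))
		return;
	pmdp = pmd_offset(pudp, kaddr);
	if (WARN_ON(pmd_none(*pmdp)))
		return;
	ptep = pte_offset_kernel(pmdp, kaddr);
	pte_clear(&init_mm, kaddr, ptep);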

> + pr_devel("clearing mm %p, pte %p, kaddr %lx\n", &init_mm, pte, kaddr);
> + pte_clear(&init_mm, kaddr, pte);
> + flush_tlb_kernel_range(kaddr, kaddr + PAGE_SIZE);
> +

[PATCH V5] powerpc/powernv : Add support for OPAL-OCC command/response interface

2017-06-25 Thread Shilpasri G Bhat
In P9, the OCC (On-Chip-Controller) supports a shared-memory-based
command-response interface. Within the shared memory there is an OPAL
command buffer and an OCC response buffer that can be used to send
inband commands to the OCC. This patch adds a platform driver to support
the command/response interface between the OCC and the host.

Signed-off-by: Shilpasri G Bhat 
---
The skiboot patch for the interface is posted here:
https://lists.ozlabs.org/pipermail/skiboot/2017-June/007960.html

Changes from V4:
- Add token as a parameter to the opal_occ_command()
- Use per-occ counter for command request_id instead of using async
  token.

 arch/powerpc/include/asm/opal-api.h|  41 +++-
 arch/powerpc/include/asm/opal.h|   3 +
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/opal-occ.c  | 303 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/opal.c  |   8 +
 6 files changed, 356 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index cb3e624..011d86c 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -42,6 +42,10 @@
 #define OPAL_I2C_STOP_ERR  -24
 #define OPAL_XIVE_PROVISIONING -31
 #define OPAL_XIVE_FREE_ACTIVE  -32
+#define OPAL_OCC_INVALID_STATE -33
+#define OPAL_OCC_BUSY  -34
+#define OPAL_OCC_CMD_TIMEOUT   -35
+#define OPAL_OCC_RSP_MISMATCH  -36
 
 /* API Tokens (in r0) */
 #define OPAL_INVALID_CALL -1
@@ -190,7 +194,8 @@
 #define OPAL_NPU_INIT_CONTEXT  146
 #define OPAL_NPU_DESTROY_CONTEXT   147
 #define OPAL_NPU_MAP_LPAR  148
-#define OPAL_LAST  148
+#define OPAL_OCC_COMMAND   149
+#define OPAL_LAST  149
 
 /* Device tree flags */
 
@@ -829,6 +834,40 @@ struct opal_prd_msg_header {
 
 struct opal_prd_msg;
 
+enum occ_cmd {
+   OCC_CMD_AMESTER_PASS_THRU = 0,
+   OCC_CMD_CLEAR_SENSOR_DATA,
+   OCC_CMD_SET_POWER_CAP,
+   OCC_CMD_SET_POWER_SHIFTING_RATIO,
+   OCC_CMD_SELECT_SENSOR_GROUPS,
+   OCC_CMD_LAST
+};
+
+struct opal_occ_cmd_rsp_msg {
+   __be64 cdata;
+   __be64 rdata;
+   __be16 cdata_size;
+   __be16 rdata_size;
+   u8 cmd;
+   u8 request_id;
+   u8 status;
+};
+
+struct opal_occ_cmd_data {
+   __be16 size;
+   u8 cmd;
+   u8 data[];
+};
+
+struct opal_occ_rsp_data {
+   __be16 size;
+   u8 status;
+   u8 data[];
+};
+
+#define MAX_OPAL_CMD_DATA_LENGTH   4090
+#define MAX_OCC_RSP_DATA_LENGTH 8698
+
 #define OCC_RESET   0
 #define OCC_LOAD1
 #define OCC_THROTTLE2
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 03ed493..84659bd 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -346,6 +346,9 @@ static inline int opal_get_async_rc(struct opal_msg msg)
 
 void opal_wake_poller(void);
 
+int64_t opal_occ_command(int chip_id, struct opal_occ_cmd_rsp_msg *msg,
+int token, bool retry);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_OPAL_H */
diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index b5d98cb..f5f0902 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o 
opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o 
opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
-obj-y  += opal-kmsg.o
+obj-y  += opal-kmsg.o opal-occ.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
diff --git a/arch/powerpc/platforms/powernv/opal-occ.c 
b/arch/powerpc/platforms/powernv/opal-occ.c
new file mode 100644
index 0000000..440304f
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-occ.c
@@ -0,0 +1,303 @@
+/*
+ * Copyright IBM Corporation 2017
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) "opal-occ: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+st

Re: [PATCH v3 2/9] powerpc/kprobes: Move kprobes over to patch_instruction

2017-06-25 Thread Michael Ellerman
Balbir Singh  writes:

> On Wed, 2017-06-07 at 00:35 +0530, Naveen N. Rao wrote:
>> Hi Balbir,
>> 
>> On 2017/06/06 02:29PM, Balbir Singh wrote:
>> > arch_arm/disarm_probe use direct assignment for copying
>> > instructions, replace them with patch_instruction
>> > 
>> > Signed-off-by: Balbir Singh 
>> > ---
>> >  arch/powerpc/kernel/kprobes.c | 4 ++--
>> >  1 file changed, 2 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
>> > index fc43435..b49f8f0 100644
>> > --- a/arch/powerpc/kernel/kprobes.c
>> > +++ b/arch/powerpc/kernel/kprobes.c
>> > @@ -158,7 +158,7 @@ NOKPROBE_SYMBOL(arch_prepare_kprobe);
>> > 
>> >  void arch_arm_kprobe(struct kprobe *p)
>> >  {
>> > -  *p->addr = BREAKPOINT_INSTRUCTION;
>> > +  patch_instruction(p->addr, BREAKPOINT_INSTRUCTION);
>> >flush_icache_range((unsigned long) p->addr,
>> >   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
>> 
>> Do we still need flush_icache_range() after patch_instruction()?
>
> Good catch! No, we don't

I picked this up independently.

cheers


Re: [RFC v3 02/23] powerpc: introduce set_hidx_slot helper

2017-06-25 Thread Benjamin Herrenschmidt
On Mon, 2017-06-26 at 09:03 +1000, Balbir Singh wrote:
> On Wed, 2017-06-21 at 18:39 -0700, Ram Pai wrote:
> > Introduce set_hidx_slot() which sets the (H_PAGE_F_SECOND|H_PAGE_F_GIX)
> > bits at the appropriate location in the PTE of a 4K PTE. In the
> > case of a 64K PTE, it sets the bits in the second part of the PTE. Though
> > the implementation for the former just needs the slot parameter, it does
> > take some additional parameters to keep the prototype consistent.
> > 
> > This function will come in handy as we work towards re-arranging the
> > bits in the later patches.

The name somewhat sucks. Something like pte_set_hash_slot() or
something like that would be much more meaningful.

> > Signed-off-by: Ram Pai 
> > ---
> >  arch/powerpc/include/asm/book3s/64/hash-4k.h  |  7 +++
> >  arch/powerpc/include/asm/book3s/64/hash-64k.h | 16 
> >  2 files changed, 23 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
> > b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > index 9c2c8f1..cef644c 100644
> > --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > @@ -55,6 +55,13 @@ static inline int hash__hugepd_ok(hugepd_t hpd)
> >  }
> >  #endif
> >  
> > +static inline unsigned long set_hidx_slot(pte_t *ptep, real_pte_t rpte,
> > +   unsigned int subpg_index, unsigned long slot)
> > +{
> > +   return (slot << H_PAGE_F_GIX_SHIFT) &
> > +   (H_PAGE_F_SECOND | H_PAGE_F_GIX);
> > +}
> > +
> 
> A comment on top would help explain that 4k and 64k are different, 64k
> is a new layout.
> 
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >  
> >  static inline char *get_hpte_slot_array(pmd_t *pmdp)
> > diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
> > b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> > index 3f49941..4bac70a 100644
> > --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
> > +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> > @@ -75,6 +75,22 @@ static inline unsigned long __rpte_to_hidx(real_pte_t 
> > rpte, unsigned long index)
> > return (pte_val(rpte.pte) >> H_PAGE_F_GIX_SHIFT) & 0xf;
> >  }
> >  
> > +static inline unsigned long set_hidx_slot(pte_t *ptep, real_pte_t rpte,
> > +   unsigned int subpg_index, unsigned long slot)
> > +{
> > +   unsigned long *hidxp = (unsigned long *)(ptep + PTRS_PER_PTE);
> > +
> > +   rpte.hidx &= ~(0xfUL << (subpg_index << 2));
> > +   *hidxp = rpte.hidx  | (slot << (subpg_index << 2));
> > +   /*
> > +* Avoid race with __real_pte()
> > +* hidx must be committed to memory before committing
> > +* the pte.
> > +*/
> > +   smp_wmb();
> 
> What's the other paired barrier? Is it in set_pte()?
> 
> > +   return 0x0UL;
> > +}
> 
> We return 0 here but slot information for 4K pages, which is not
> that clear.
> 
> Balbir Singh.


Re: [PATCH 1/2] powerpc/powernv/pci: Add helper to check if a PE has a single vendor

2017-06-25 Thread Alistair Popple
You may need some kind of temporary unused annotation to shut the
compiler/kbuild robot up, but the patch itself looks fine.
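
ie. something like (sketch):

	static bool __maybe_unused pnv_pci_ioda_pe_single_vendor(struct pnv_ioda_pe *pe)

until the later patch in the series adds a caller.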

Reviewed-by: Alistair Popple 

On Wed, 21 Jun 2017 05:18:03 PM Russell Currey wrote:
> Add a helper that determines if all the devices contained in a given PE
> are all from the same vendor or not.  This can be useful in determining
> if it's okay to make PE-wide changes that may be suitable for some
> devices but not for others.
> 
> This is used later in the series.
> 
> Signed-off-by: Russell Currey 
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 283caf1070c9..13835ac30795 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1718,6 +1718,31 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb 
> *phb, struct pci_dev *pdev
>*/
>  }
>  
> +static bool pnv_pci_ioda_pe_single_vendor(struct pnv_ioda_pe *pe)
> +{
> + unsigned short vendor = 0;
> + struct pci_dev *pdev;
> +
> + if (pe->device_count == 1)
> + return true;
> +
> + /* pe->pdev should be set if it's a single device, pe->pbus if not */
> + if (!pe->pbus)
> + return true;
> +
> + list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
> + if (!vendor) {
> + vendor = pdev->vendor;
> + continue;
> + }
> +
> + if (pdev->vendor != vendor)
> + return false;
> + }
> +
> + return true;
> +}
> +
>  static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
>  {
>   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> 



Re: [PATCH] kernel/power/suspend: use CONFIG_HAVE_SET_MEMORY for include condition

2017-06-25 Thread Balbir Singh
On Sat, Jun 3, 2017 at 11:27 PM, Pavel Machek  wrote:
> On Sat 2017-06-03 20:52:32, Balbir Singh wrote:
>> Kbuild reported a build failure when CONFIG_STRICT_KERNEL_RWX was
>> enabled on powerpc. We don't yet have ARCH_HAS_SET_MEMORY and ppc32
>> saw a build failure.
>>
>> Fixes: 50327dd ("kernel/power/snapshot.c: use set_memory.h header")
>>
>> I've only done a basic compile test with a config that has
>> hibernation enabled.
>>
>> Cc: "Rafael J. Wysocki" 
>> Cc: Len Brown 
> Acked-by: Pavel Machek 

Ping. Could we please pick this up? It breaks any attempt to support
STRICT_KERNEL_RWX on powerpc.

Balbir Singh.


Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next, v3, 3/3] udp: try to avoid 2 cache miss on dequeue)

2017-06-25 Thread Michael Ellerman
Paolo Abeni  writes:
> Thank you!
>
> I'll submit formally the patch after some more testing.

Thanks.

> I noticed this version has entered the ppc patchwork, but I think that
> the formal submission should go towards the net-next tree.

Yeah, it picks up all patches sent to the list. That's fine, I'll just
mark it "Not applicable" and expect to see it arrive via net-next.

cheers


Re: 1M hugepage size being registered on Linux

2017-06-25 Thread Michael Ellerman
victora  writes:
> On 2017-06-22 00:59, Michael Ellerman wrote:
>> 
>> We merged a patch from Aneesh to filter it out in 4.12-rc1:
>> 
>>   a525108cf1cc ("powerpc/mm/hugetlb: Filter out hugepage size not
>> supported by page table layout")
>> 
>> I guess we should probably send that patch to stable et. al.
  ^
  :)
>
> Sorry for the delay. Thanks for merging that patch.
> Was that patch also sent to stable et. al.?

No it wasn't.

cheers


Re: [PATCH v3 2/6] powerpc/vmemmap: Reshuffle vmemmap_free()

2017-06-25 Thread Balbir Singh
On Fri, 23 Jun 2017 18:31:18 +1000
Oliver O'Halloran  wrote:

> Removes an indentation level and shuffles some code around to make the
> following patch cleaner. No functional changes.
> 
> Signed-off-by: Oliver O'Halloran 
> ---
> v1 -> v2: Remove broken initialiser
> ---
>  arch/powerpc/mm/init_64.c | 48 
> ---
>  1 file changed, 25 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index ec84b31c6c86..8851e4f5dbab 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -234,13 +234,15 @@ static unsigned long vmemmap_list_free(unsigned long 
> start)
>  void __ref vmemmap_free(unsigned long start, unsigned long end)
>  {
>   unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
> + unsigned long page_order = get_order(page_size);
>  
>   start = _ALIGN_DOWN(start, page_size);
>  
>   pr_debug("vmemmap_free %lx...%lx\n", start, end);
>  
>   for (; start < end; start += page_size) {
> - unsigned long addr;
> + unsigned long nr_pages, addr;
> + struct page *page;
>  
>   /*
>* the section has already be marked as invalid, so
> @@ -251,29 +253,29 @@ void __ref vmemmap_free(unsigned long start, unsigned 
> long end)
>   continue;
>  
>   addr = vmemmap_list_free(start);
> - if (addr) {
> - struct page *page = pfn_to_page(addr >> PAGE_SHIFT);
> -
> - if (PageReserved(page)) {
> - /* allocated from bootmem */
> - if (page_size < PAGE_SIZE) {
> - /*
> -  * this shouldn't happen, but if it is
> -  * the case, leave the memory there
> -  */
> - WARN_ON_ONCE(1);
> - } else {
> - unsigned int nr_pages =
> - 1 << get_order(page_size);
> - while (nr_pages--)
> - free_reserved_page(page++);
> - }
> - } else
> - free_pages((unsigned long)(__va(addr)),
> - get_order(page_size));
> -
> - vmemmap_remove_mapping(start, page_size);
> + if (!addr)
> + continue;
> +
> + page = pfn_to_page(addr >> PAGE_SHIFT);
> + nr_pages = 1 << page_order;
> +
> + if (PageReserved(page)) {
> + /* allocated from bootmem */
> + if (page_size < PAGE_SIZE) {
> + /*
> +  * this shouldn't happen, but if it is
> +  * the case, leave the memory there
> +  */
> + WARN_ON_ONCE(1);
> + } else {
> + while (nr_pages--)
> + free_reserved_page(page++);
> + }
> + } else {
> + free_pages((unsigned long)(__va(addr)), page_order);
>   }
> +
> + vmemmap_remove_mapping(start, page_size);
>   }
>  }
>  #endif

Reviewed-by: Balbir Singh 

Balbir Singh.


Re: [RFC v3 02/23] powerpc: introduce set_hidx_slot helper

2017-06-25 Thread Balbir Singh
On Wed, 2017-06-21 at 18:39 -0700, Ram Pai wrote:
> Introduce set_hidx_slot() which sets the (H_PAGE_F_SECOND|H_PAGE_F_GIX)
> bits at the appropriate location in the PTE of a 4K PTE. In the
> case of a 64K PTE, it sets the bits in the second part of the PTE. Though
> the implementation for the former just needs the slot parameter, it does
> take some additional parameters to keep the prototype consistent.
> 
> This function will come in handy as we work towards re-arranging the
> bits in the later patches.
> 
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/book3s/64/hash-4k.h  |  7 +++
>  arch/powerpc/include/asm/book3s/64/hash-64k.h | 16 
>  2 files changed, 23 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
> b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> index 9c2c8f1..cef644c 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> @@ -55,6 +55,13 @@ static inline int hash__hugepd_ok(hugepd_t hpd)
>  }
>  #endif
>  
> +static inline unsigned long set_hidx_slot(pte_t *ptep, real_pte_t rpte,
> + unsigned int subpg_index, unsigned long slot)
> +{
> + return (slot << H_PAGE_F_GIX_SHIFT) &
> + (H_PAGE_F_SECOND | H_PAGE_F_GIX);
> +}
> +

A comment on top would help explain that 4k and 64k are different, 64k
is a new layout.

>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  
>  static inline char *get_hpte_slot_array(pmd_t *pmdp)
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
> b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> index 3f49941..4bac70a 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> @@ -75,6 +75,22 @@ static inline unsigned long __rpte_to_hidx(real_pte_t 
> rpte, unsigned long index)
>   return (pte_val(rpte.pte) >> H_PAGE_F_GIX_SHIFT) & 0xf;
>  }
>  
> +static inline unsigned long set_hidx_slot(pte_t *ptep, real_pte_t rpte,
> + unsigned int subpg_index, unsigned long slot)
> +{
> + unsigned long *hidxp = (unsigned long *)(ptep + PTRS_PER_PTE);
> +
> + rpte.hidx &= ~(0xfUL << (subpg_index << 2));
> + *hidxp = rpte.hidx  | (slot << (subpg_index << 2));
> + /*
> +  * Avoid race with __real_pte()
> +  * hidx must be committed to memory before committing
> +  * the pte.
> +  */
> + smp_wmb();

What's the other paired barrier? Is it in set_pte()?
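
If the pairing is meant to be with the hidx read in __real_pte(), the
reader side would need something like this (sketch, not from this
series):

	static inline real_pte_t __real_pte(pte_t pte, pte_t *ptep)
	{
		real_pte_t rpte;

		rpte.pte = pte;
		/*
		 * Pairs with the smp_wmb() in set_hidx_slot(): if we see
		 * the new pte, the hidx written before it must be visible.
		 */
		smp_rmb();
		rpte.hidx = pte_val(*((ptep) + PTRS_PER_PTE));
		return rpte;
	}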

> + return 0x0UL;
> +}

We return 0 here but slot information for 4K pages, which is not
that clear.

Balbir Singh.



Re: gcc 4.6.3 miscompile on ppc32 (was Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3)

2017-06-25 Thread Al Viro
On Sun, Jun 25, 2017 at 04:44:09PM -0500, Segher Boessenkool wrote:

> Do you have a short stand-alone testcase?  4.6 is ancient, of course, but
> the actual problem may still exist in more recent compilers (if it _is_
> a compiler problem; if it's not, you *really* want to know :-) )

Enjoy.  At least 6.3 doesn't step into that.  Look for mtctr in the resulting
asm...

cat <<'EOF' >a.c
struct iovec
{
 void *iov_base;
 unsigned iov_len;
};

unsigned long v;

extern void * barf(void *,int,unsigned);

extern unsigned long bar(void *to, const void *from, unsigned long size);

static inline unsigned long __bar(void *to, const void *from, unsigned long n)
{
 unsigned long res = n;
 if (__builtin_expect(!!((((void)0, (unsigned long)(from)) <= v) &&
     (((n) == 0) || (((n) - 1) <= (v - ((void)0, (unsigned long)(from)))))), 1))
  res = bar(to, from, n);
 if (res)
  barf(to + (n - res), 0, res);
 return res;
}

int foo(int type, const struct iovec * uvector,
 unsigned long nr_segs, unsigned long fast_segs,
 struct iovec *iov,
 struct iovec **ret_pointer)
{
 unsigned long seg;
 int ret;
 if (nr_segs == 0) {
  ret = 0;
  goto out;
 }
 if (nr_segs > 1024) {
  ret = -22;
  goto out;
 }
 if (__bar(iov, uvector, nr_segs*sizeof(*uvector))) {
  ret = -14;
  goto out;
 }
 ret = 0;
 for (seg = 0; seg < nr_segs; seg++) {
  void *buf = iov[seg].iov_base;
  int len = (int)iov[seg].iov_len;
  if (len < 0) {
   ret = -22;
   goto out;
  }
  if (type >= 0
      && __builtin_expect(!!(!((((void)0, (unsigned long)(buf)) <= v) &&
          (((len) == 0) || (((len) - 1) <= (v - ((void)0, (unsigned long)(buf))))))), 0)) {
   ret = -14;
   goto out;
  }
  ret += len;
 }
out:
 *ret_pointer = iov;
 return ret;
}
EOF
powerpc-linux-gcc -m32 -fno-strict-aliasing -fno-common -std=gnu89 -fno-PIE \
	-msoft-float -pipe -ffixed-r2 -mmultiple -mno-altivec -mno-vsx -mno-spe \
	-mspe=no -funit-at-a-time -fno-dwarf2-cfi-asm -mno-string -mcpu=powerpc \
	-Wa,-maltivec -mbig-endian -fno-delete-null-pointer-checks -Os \
	-fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer \
	-fno-var-tracking-assignments -femit-struct-debug-baseonly -fno-var-tracking \
	-fno-strict-overflow -fconserve-stack -fverbose-asm -S a.c


Re: drivers:soc:fsl:qbman:qman.c: Change a comment for an entry check inside drain_mr_fqrni function

2017-06-25 Thread karim eshapa
>Do you mean "sufficient" here rather than "efficient"?  It's far less
>inefficient than what the code was previously doing, but still...

Yes, I'm going to send a new fix for the comment patch and
change the subject of the previous soc/qman patch.

Thanks,
Karim


On 25 June 2017 at 04:49, Scott Wood  wrote:

> On Fri, May 05, 2017 at 10:05:56AM +0200, Karim Eshapa wrote:
> > Change the comment for an entry check inside function
> > drain_mr_fqrni() to describe sleeping for a sufficient
> > period of time instead of spinning for many processor cycles.
> >
> > Signed-off-by: Karim Eshapa 
> > ---
> >  drivers/soc/fsl/qbman/qman.c | 25 +
> >  1 file changed, 13 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
> > index 18d391e..636a7d7 100644
> > --- a/drivers/soc/fsl/qbman/qman.c
> > +++ b/drivers/soc/fsl/qbman/qman.c
> > @@ -1071,18 +1071,19 @@ static int drain_mr_fqrni(struct qm_portal *p)
> >   msg = qm_mr_current(p);
> >   if (!msg) {
> >   /*
> > -  * if MR was full and h/w had other FQRNI entries to produce, we
> > -  * need to allow it time to produce those entries once the
> > -  * existing entries are consumed. A worst-case situation
> > -  * (fully-loaded system) means h/w sequencers may have to do 3-4
> > -  * other things before servicing the portal's MR pump, each of
> > -  * which (if slow) may take ~50 qman cycles (which is ~200
> > -  * processor cycles). So rounding up and then multiplying this
> > -  * worst-case estimate by a factor of 10, just to be
> > -  * ultra-paranoid, goes as high as 10,000 cycles. NB, we consume
> > -  * one entry at a time, so h/w has an opportunity to produce new
> > -  * entries well before the ring has been fully consumed, so
> > -  * we're being *really* paranoid here.
> > +  * if MR was full and h/w had other FQRNI entries to
> > +  * produce, we need to allow it time to produce those
> > +  * entries once the existing entries are consumed.
> > +  * A worst-case situation (fully-loaded system) means
> > +  * h/w sequencers may have to do 3-4 other things
> > +  * before servicing the portal's MR pump, each of
> > +  * which (if slow) may take ~50 qman cycles
> > +  * (which is ~200 processor cycles). So sleep with
> > +  * 1 ms would be very efficient, after this period
> > +  * we can check if there is something produced.
> > +  * NB, we consume one entry at a time, so h/w has
> > +  * an opportunity to produce new entries well before
> > +  * the ring has been fully consumed.
>
> Do you mean "sufficient" here rather than "efficient"?  It's far less
> inefficient than what the code was previously doing, but still...
>
> Otherwise, looks good.
>
> -Scott
>


Re: gcc 4.6.3 miscompile on ppc32 (was Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3)

2017-06-25 Thread Segher Boessenkool
On Sun, Jun 25, 2017 at 09:53:24PM +0100, Al Viro wrote:
> Confirmed.  It manages to bugger the loop immediately after the (successful)
> copying of iovec array in rw_copy_check_uvector(); both with and without
> INLINE_COPY_FROM_USER it has (just before the call of copy_from_user()) r27
> set to nr_segs * sizeof(struct iovec).  The call is made, we check that it
> has succeeded and that's when it hits the fan: without INLINE_COPY_FROM_USER
> we have (interleaved with unrelated insns)
> addi 27,27,-8
> srwi 27,27,3
> addi 27,27,1
> mtctr 27
> Weird, but manages to pass nr_segs to mtctr.

This weirdosity is https://gcc.gnu.org/PR67288 .  Those three instructions
are not the same as just  srwi 27,27,3  in case r27 is 0; GCC does not
figure out this cannot happen here.

> _With_ INLINE_COPY_FROM_USER we
> get this:
> lis 9,0x2000
> mtctr 9
> In other words, the loop will try to go through 8192 iterations.  No idea 
> where
> that number has come from, but it sure as hell is wrong.

8192*65536, even.  This is as if r27 was 0 always.
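
Easy to see in C: the three-instruction sequence computes
((n - 8) >> 3) + 1 rather than n >> 3, and for n == 0 that wraps.
A minimal stand-alone demonstration (sketch):

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint32_t n = 0;	/* r27 == 0 */

		printf("%#x\n", n >> 3);		/* plain srwi: 0 */
		printf("%#x\n", ((n - 8) >> 3) + 1);	/* addi/srwi/addi: 0x20000000 */
		return 0;
	}

0x20000000 is exactly the constant that lis 9,0x2000 / mtctr 9 load.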

Do you have a short stand-alone testcase?  4.6 is ancient, of course, but
the actual problem may still exist in more recent compilers (if it _is_
a compiler problem; if it's not, you *really* want to know :-) )


Segher


gcc 4.6.3 miscompile on ppc32 (was Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3)

2017-06-25 Thread Al Viro
On Sun, Jun 25, 2017 at 12:14:04PM +0100, Al Viro wrote:
> On Sun, Jun 25, 2017 at 10:53:58AM +0100, Al Viro wrote:
> > On Sat, Jun 24, 2017 at 12:29:23PM -0500, Larry Finger wrote:
> > 
> > > I made a break through. If I turn off inline copy to/from users for 32-bit
> > > ppc with the following patch, then the system boots:
> > 
> > OK...  So it's 4.6.3 miscompiling something - it is hardware-independent,
> > reproduced in qemu.  I'd like to get more self-contained example of
> > miscompile, though; should be done by tonight...
> 
> OK, it's the call in rw_copy_check_uvector(); with INLINE_COPY_FROM_USER
> it's miscompiled by 4.6.3.  I hadn't looked through the generated code
> yet; will do that after I grab some sleep.

Confirmed.  It manages to bugger the loop immediately after the (successful)
copying of iovec array in rw_copy_check_uvector(); both with and without
INLINE_COPY_FROM_USER it has (just before the call of copy_from_user()) r27
set to nr_segs * sizeof(struct iovec).  The call is made, we check that it
has succeeded and that's when it hits the fan: without INLINE_COPY_FROM_USER
we have (interleaved with unrelated insns)
addi 27,27,-8
srwi 27,27,3
addi 27,27,1
mtctr 27
Weird, but manages to pass nr_segs to mtctr.  _With_ INLINE_COPY_FROM_USER we
get this:
lis 9,0x2000
mtctr 9
In other words, the loop will try to go through 8192 iterations.  No idea where
that number has come from, but it sure as hell is wrong.  That's where those
-EINVAL, etc. are coming from - we run into something negative in iov[seg].len,
after having run out of on-stack iovec array.

Assembler generated out of rw_copy_check_uvector() with and without
INLINE_COPY_FROM_USER is attached; it's a definite miscompile.  Neither 4.4.5
nor 6.3.0 use mtctr/bdnz for that loop.

The bottom line is, ppc cross-toolchain on kernel.org happens to be
the version that miscompiles rw_copy_check_uvector() with INLINE_COPY_FROM_USER
and hell knows what else.  Said that, I would rather have ppc32 drop the
INLINE_COPY_{TO,FROM}_USER anyway; that won't fix any other places where
the same 4.6.3 bug hits, but I seriously suspect that it will end up being
faster even on non^Wless buggy gcc versions.  Could powerpc folks check
what does removing those two defines from arch/powerpc/include/asm/uaccess.h
do to performance?  If there's no slowdown, I would strongly recommend just
removing those as in the patch Larry has posted upthread.

Fixing whatever it is in gcc 4.6.3 that triggers that behaviour is
IMO pointless - it might make sense to switch kernel.org cross-toolchain to
something more recent, but that's it.
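
For reference, the change Larry posted upthread amounts to deleting
these two lines (assuming the 4.12-era layout of
arch/powerpc/include/asm/uaccess.h):

	#define INLINE_COPY_FROM_USER
	#define INLINE_COPY_TO_USER

so that copy_{to,from}_user() fall back to the generic out-of-line
versions in lib/usercopy.c.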
.globl rw_copy_check_uvector
.type   rw_copy_check_uvector, @function
rw_copy_check_uvector:
.LFB2683:
.loc 1 773 0
stwu 1,-32(1)#,,
.LCFI142:
mflr 0   #,
.LCFI143:
stmw 27,12(1)#,
.LCFI144:
.loc 1 783 0
mr. 27,5 # nr_segs, nr_segs
.loc 1 773 0
mr 30,3  # type, type
stw 0,36(1)  #,
.LCFI145:
.loc 1 773 0
mr 31,4  # uvector, uvector
mr 29,8  # ret_pointer, ret_pointer
.loc 1 776 0
mr 28,7  # iov, fast_pointer
.loc 1 784 0
li 0,0   # ret,
.loc 1 783 0
beq- 0,.L495 #
.loc 1 792 0
cmplwi 7,27,1024 #, tmp160, nr_segs
.loc 1 793 0
li 0,-22 # ret,
.loc 1 792 0
bgt- 7,.L495 #
.loc 1 796 0
cmplw 7,27,6 # fast_segs, tmp161, nr_segs
ble- 7,.L496 #
.LBB1538:
.LBB1539:
.file 21 "./include/linux/slab.h"
.loc 21 495 0
lis 4,0x140  # tmp190,
slwi 3,27,3  #, nr_segs,
ori 4,4,192  #,, tmp190,
bl __kmalloc #
.LBE1539:
.LBE1538:
.loc 1 799 0
li 0,-12 # ret,
.loc 1 798 0
mr. 28,3 # iov,
beq- 0,.L495 #
.L496:
.LBB1540:
.LBB1541:
.LBB1542:
.LBB1543:
.loc 19 113 0
lwz 0,1128(2)# current.192_185->thread.fs.seg, D.39493
.LBE1543:
.LBE1542:
.LBE1541:
.LBE1540:
.loc 1 803 0
slwi 27,27,3 # n, nr_segs,
.LBB1549:
.LBB1548:
.LBB1547:
.LBB1546:
mr 5,27  # n, n
.loc 19 113 0
cmplw 7,31,0 # D.39493, tmp165, uvector
bgt- 7,.L497 #
addi 9,27,-1 # tmp166, n,
subf 0,31,0  # tmp167, uvector, D.39493
cmplw 7,9,0  # tmp167, tmp168, tmp166
bgt- 7,.L497 #
.LBB1544:
.LBB1545:
.file 22 "./arch/powerpc/include/asm/uaccess.h"
.loc 22 305 0
mr 3,28  #, iov
mr 4,31  #, uvector
bl __copy_tofrom_user#
.LBE1545:
.LBE1544:
.loc 19 115 0
mr. 5,3  # n,
beq+ 0,.L498 #
.L497:
.loc 19 116 0
subf 3,5,27  # tmp170, n, n
li 4,0   #,
add 3,28,3   #, iov, tmp170
bl memset#
b

[PATCH] powerpc: Only do ERAT invalidate on radix context switch on P9 DD1

2017-06-25 Thread Benjamin Herrenschmidt
From: Michael Neuling 

On P9 (Nimbus) DD2 and later, in radix mode, the move to the PID
register will implicitly invalidate the user space ERAT entries
and leave the kernel ones alone. Thus the only thing needed is
an isync() to synchronize this with subsequent uaccesses.

Signed-off-by: Michael Neuling 
Signed-off-by: Benjamin Herrenschmidt 

diff --git a/arch/powerpc/mm/mmu_context_book3s64.c 
b/arch/powerpc/mm/mmu_context_book3s64.c
index a3edf813d4..71de2c6d88 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -235,10 +235,15 @@ void destroy_context(struct mm_struct *mm)
 #ifdef CONFIG_PPC_RADIX_MMU
 void radix__switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 {
-   asm volatile("isync": : :"memory");
-   mtspr(SPRN_PID, next->context.id);
-   asm volatile("isync \n"
-PPC_SLBIA(0x7)
-: : :"memory");
+
+   if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
+   isync();
+   mtspr(SPRN_PID, next->context.id);
+   isync();
+   asm volatile(PPC_INVALIDATE_ERAT : : :"memory");
+   } else {
+   mtspr(SPRN_PID, next->context.id);
+   isync();
+   }
 }
 #endif


[PATCH] powerpc/powernv: Tell OPAL about our MMU mode

2017-06-25 Thread Benjamin Herrenschmidt
That will allow OPAL to configure the CPU in an optimal way.

Signed-off-by: Benjamin Herrenschmidt 
---

The matching OPAL change has been sent to the skiboot list.

Setting those bits in the reinit() call with an older OPAL
will result in the call returning an error, which Linux ignores,
but it will still work in the sense that it still honors
the other flags it understands (the endian switch ones).

 arch/powerpc/include/asm/opal-api.h|  9 +
 arch/powerpc/platforms/powernv/opal.c  | 14 --
 arch/powerpc/platforms/powernv/setup.c |  6 +-
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index cb3e624..85e6d88 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -805,6 +805,15 @@ struct OpalIoPhb3ErrorData {
 enum {
OPAL_REINIT_CPUS_HILE_BE= (1 << 0),
OPAL_REINIT_CPUS_HILE_LE= (1 << 1),
+
+   /* These two define the base MMU mode of the host on P9
+*
+* On P9 Nimbus DD2.0 and Cumulus (and later), KVM can still
+* create hash guests in "radix" mode with care (full core
+* switch only).
+*/
+   OPAL_REINIT_CPUS_MMU_HASH   = (1 << 2),
+   OPAL_REINIT_CPUS_MMU_RADIX  = (1 << 3),
 };
 
 typedef struct oppanel_line {
diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 59684b4..e522d6b 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -59,6 +59,8 @@ static struct task_struct *kopald_tsk;
 
 void opal_configure_cores(void)
 {
+   uint64_t reinit_flags = 0;
+
/* Do the actual re-init, This will clobber all FPRs, VRs, etc...
 *
 * It will preserve non volatile GPRs and HSPRG0/1. It will
@@ -66,11 +68,19 @@ void opal_configure_cores(void)
 * but it might clobber a bunch.
 */
 #ifdef __BIG_ENDIAN__
-   opal_reinit_cpus(OPAL_REINIT_CPUS_HILE_BE);
+   reinit_flags |= OPAL_REINIT_CPUS_HILE_BE;
 #else
-   opal_reinit_cpus(OPAL_REINIT_CPUS_HILE_LE);
+   reinit_flags |= OPAL_REINIT_CPUS_HILE_LE;
 #endif
 
+   /* Radix MMU */
+   if (early_radix_enabled())
+   reinit_flags |= OPAL_REINIT_CPUS_MMU_RADIX;
+   else
+   reinit_flags |= OPAL_REINIT_CPUS_MMU_HASH;
+
+   opal_reinit_cpus(reinit_flags);
+
/* Restore some bits */
if (cur_cpu_spec->cpu_restore)
cur_cpu_spec->cpu_restore();
diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 2dc7e5f..d1cef70 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -254,8 +254,12 @@ static void pnv_kexec_cpu_down(int crash_shutdown, int 
secondary)
 * We might be running as little-endian - now that interrupts
 * are disabled, reset the HILE bit to big-endian so we don't
 * take interrupts in the wrong endian later
+*
+* We also switch to radix mode on P9 as this is compatible
+* with hash and will allow earlier kernels to boot.
 */
-   opal_reinit_cpus(OPAL_REINIT_CPUS_HILE_BE);
+   opal_reinit_cpus(OPAL_REINIT_CPUS_HILE_BE |
+OPAL_REINIT_CPUS_MMU_RADIX);
}
 }
 #endif /* CONFIG_KEXEC_CORE */



[PATCH] powerpc/sysfs: Expose MMCR2 spr in sysfs

2017-06-25 Thread Madhavan Srinivasan
Monitor Mode Control Register 2 (MMCR2) is a 64-bit
register that contains 9-bit control fields for
controlling the operation of PMC1 - PMC6. This patch
exposes the MMCR2 SPR in sysfs.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/kernel/sysfs.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 4437c70c7c2b..587eb3a7b5da 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -482,6 +482,7 @@ SYSFS_PMCSETUP(pmc7, SPRN_PMC7);
 SYSFS_PMCSETUP(pmc8, SPRN_PMC8);
 
 SYSFS_PMCSETUP(mmcra, SPRN_MMCRA);
+SYSFS_PMCSETUP(mmcr2, SPRN_MMCR2);
 SYSFS_SPRSETUP(purr, SPRN_PURR);
 SYSFS_SPRSETUP(spurr, SPRN_SPURR);
 SYSFS_SPRSETUP(pir, SPRN_PIR);
@@ -492,6 +493,7 @@ SYSFS_SPRSETUP(pir, SPRN_PIR);
   Lets be conservative and default to pseries.
 */
 static DEVICE_ATTR(mmcra, 0600, show_mmcra, store_mmcra);
+static DEVICE_ATTR(mmcr2, 0600, show_mmcr2, store_mmcr2);
 static DEVICE_ATTR(spurr, 0400, show_spurr, NULL);
 static DEVICE_ATTR(purr, 0400, show_purr, store_purr);
 static DEVICE_ATTR(pir, 0400, show_pir, NULL);
@@ -760,6 +762,9 @@ static int register_cpu_online(unsigned int cpu)
if (cpu_has_feature(CPU_FTR_MMCRA))
device_create_file(s, &dev_attr_mmcra);
 
+   if (cpu_has_feature(CPU_FTR_ARCH_207S))
+   device_create_file(s, &dev_attr_mmcr2);
+
if (cpu_has_feature(CPU_FTR_PURR)) {
if (!firmware_has_feature(FW_FEATURE_LPAR))
add_write_permission_dev_attr(&dev_attr_purr);
@@ -845,6 +850,9 @@ static int unregister_cpu_online(unsigned int cpu)
if (cpu_has_feature(CPU_FTR_MMCRA))
device_remove_file(s, &dev_attr_mmcra);
 
+   if (cpu_has_feature(CPU_FTR_ARCH_207S))
+   device_remove_file(s, &dev_attr_mmcr2);
+
if (cpu_has_feature(CPU_FTR_PURR))
device_remove_file(s, &dev_attr_purr);
 
-- 
2.7.4



[PATCH] powerpc/perf: Fix branch event code for power9

2017-06-25 Thread Madhavan Srinivasan
Correct "branch" event code of Power9 is "r4d05e".
Replace the current "branch" event code with "r4d05e"
and add a hack to use "r10012" as event code for
power9 dd1.

Fixes: d89f473ff6f8 ("powerpc/perf: Fix PM_BRU_CMPL event code for power9")
Reported-by: Anton Blanchard 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/power9-events-list.h | 4 +++-
 arch/powerpc/perf/power9-pmu.c | 8 +++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/power9-events-list.h 
b/arch/powerpc/perf/power9-events-list.h
index 71a6bfee5c02..80204e064362 100644
--- a/arch/powerpc/perf/power9-events-list.h
+++ b/arch/powerpc/perf/power9-events-list.h
@@ -16,7 +16,7 @@ EVENT(PM_CYC, 0x0001e)
 EVENT(PM_ICT_NOSLOT_CYC,   0x100f8)
 EVENT(PM_CMPLU_STALL,  0x1e054)
 EVENT(PM_INST_CMPL,0x2)
-EVENT(PM_BRU_CMPL, 0x10012)
+EVENT(PM_BRU_CMPL, 0x4d05e)
 EVENT(PM_BR_MPRED_CMPL,0x400f6)
 
 /* All L1 D cache load references counted at finish, gated by reject */
@@ -56,3 +56,5 @@ EVENT(PM_RUN_CYC, 0x600f4)
 /* Instruction Dispatched */
 EVENT(PM_INST_DISP,0x200f2)
 EVENT(PM_INST_DISP_ALT,0x300f2)
+/* Alternate Branch event code */
+EVENT(PM_BR_CMPL_ALT,  0x10012)
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index bb28e1a41257..f17435e4a489 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -231,7 +231,7 @@ static int power9_generic_events_dd1[] = {
[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =   PM_ICT_NOSLOT_CYC,
[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =PM_CMPLU_STALL,
[PERF_COUNT_HW_INSTRUCTIONS] =  PM_INST_DISP,
-   [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] =   PM_BRU_CMPL,
+   [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] =   PM_BR_CMPL_ALT,
[PERF_COUNT_HW_BRANCH_MISSES] = PM_BR_MPRED_CMPL,
[PERF_COUNT_HW_CACHE_REFERENCES] =  PM_LD_REF_L1,
[PERF_COUNT_HW_CACHE_MISSES] =  PM_LD_MISS_L1_FIN,
@@ -453,6 +453,12 @@ static int __init init_power9_pmu(void)
 * sampling scenarios in power9 DD1, instead use PM_INST_DISP.
 */
EVENT_VAR(PM_INST_CMPL, _g).id = PM_INST_DISP;
+   /*
+* Power9 DD1 should use PM_BR_CMPL_ALT event code for
+* "branches" to provide correct counter value.
+*/
+   EVENT_VAR(PM_BRU_CMPL, _g).id = PM_BR_CMPL_ALT;
+   EVENT_VAR(PM_BRU_CMPL, _c).id = PM_BR_CMPL_ALT;
rc = register_power_pmu(&power9_isa207_pmu);
} else {
rc = register_power_pmu(&power9_pmu);
-- 
2.7.4



[PATCH] soc/qman: Change a comment for an entry check inside drain_mr_fqrni function

2017-06-25 Thread Karim Eshapa
Change the comment for an entry check inside function
drain_mr_fqrni() to describe sleeping for a sufficient
period of time instead of spinning for many processor cycles.

Signed-off-by: Karim Eshapa 
---
 drivers/soc/fsl/qbman/qman.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 18d391e..636a7d7 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -1071,18 +1071,19 @@ static int drain_mr_fqrni(struct qm_portal *p)
msg = qm_mr_current(p);
if (!msg) {
/*
-* if MR was full and h/w had other FQRNI entries to produce, we
-* need to allow it time to produce those entries once the
-* existing entries are consumed. A worst-case situation
-* (fully-loaded system) means h/w sequencers may have to do 3-4
-* other things before servicing the portal's MR pump, each of
-* which (if slow) may take ~50 qman cycles (which is ~200
-* processor cycles). So rounding up and then multiplying this
-* worst-case estimate by a factor of 10, just to be
-* ultra-paranoid, goes as high as 10,000 cycles. NB, we consume
-* one entry at a time, so h/w has an opportunity to produce new
-* entries well before the ring has been fully consumed, so
-* we're being *really* paranoid here.
+* if MR was full and h/w had other FQRNI entries to
+* produce, we need to allow it time to produce those
+* entries once the existing entries are consumed.
+* A worst-case situation (fully-loaded system) means
+* h/w sequencers may have to do 3-4 other things
+* before servicing the portal's MR pump, each of
+* which (if slow) may take ~50 qman cycles
+* (which is ~200 processor cycles). So sleep with
+* 1 ms would be very sufficient, after this period
+* we can check if there is something produced.
+* NB, we consume one entry at a time, so h/w has
+* an opportunity to produce new entries well before
+* the ring has been fully consumed.
 */
msleep(1);
msg = qm_mr_current(p);
-- 
2.7.4



[PATCH] soc/qman: Sleep instead of stuck hacking jiffies.

2017-06-25 Thread Karim Eshapa
Use msleep() instead of getting stuck busy-waiting on
jiffies for a long delay; it is more efficient.

Signed-off-by: Karim Eshapa 
---
 drivers/soc/fsl/qbman/qman.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 3d891db..18d391e 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -1084,11 +1084,7 @@ static int drain_mr_fqrni(struct qm_portal *p)
 * entries well before the ring has been fully consumed, so
 * we're being *really* paranoid here.
 */
-   u64 now, then = jiffies;
-
-   do {
-   now = jiffies;
-   } while ((then + 1) > now);
+   msleep(1);
msg = qm_mr_current(p);
if (!msg)
return 0;
-- 
2.7.4



Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3

2017-06-25 Thread Al Viro
On Sun, Jun 25, 2017 at 10:53:58AM +0100, Al Viro wrote:
> On Sat, Jun 24, 2017 at 12:29:23PM -0500, Larry Finger wrote:
> 
> > I made a break through. If I turn off inline copy to/from users for 32-bit
> > ppc with the following patch, then the system boots:
> 
> OK...  So it's 4.6.3 miscompiling something - it is hardware-independent,
> reproduced in qemu.  I'd like to get more self-contained example of
> miscompile, though; should be done by tonight...

OK, it's the call in rw_copy_check_uvector(); with INLINE_COPY_FROM_USER
it's miscompiled by 4.6.3.  I hadn't looked through the generated code
yet; will do that after I grab some sleep.


Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3

2017-06-25 Thread Al Viro
On Sat, Jun 24, 2017 at 12:29:23PM -0500, Larry Finger wrote:

> I made a break through. If I turn off inline copy to/from users for 32-bit
> ppc with the following patch, then the system boots:

OK...  So it's 4.6.3 miscompiling something - it is hardware-independent,
reproduced in qemu.  I'd like to get more self-contained example of
miscompile, though; should be done by tonight...