Re: [PATCH net-next v5 12/12] sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW

2019-02-08 Thread Deepa Dinamani
> You touched powerpc in the previous patch but not this one.
>
> That's because we use the asm-generic version I assume.

That is correct.

> Would be good to mention in the change log though to avoid any confusion.

I'm not sure how to do that now. It looks like the series has already
been applied to net-next with a couple of merge conflicts fixed.

-Deepa


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.0-4 tag

2019-02-08 Thread pr-tracker-bot
The pull request you sent on Sat, 09 Feb 2019 00:12:56 +1100:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
> tags/powerpc-5.0-4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/820828bffeb11eee41e197a0c9be1b72afa37482

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

2019-02-08 Thread Linus Torvalds
On Fri, Feb 8, 2019 at 12:31 PM Waiman Long  wrote:
>
> >  (b) what's the new fastpath case
>
> The only change in the fastpath is the use of cmpxchg for writer lock.

.. since a big deal here was about using the generic atomic accessor
functions, I really was looking forward to seeing the *actual* fast
path code generation.

In other words, right now I have very little visibility in how it
actually affects the code. Looking at the patches themselves doesn't
make it obvious. I was hoping for the overview to really explain the
whole "before and after" situation, and it didn't. Not at the high
level, and not at a low level. And no performance numbers in the
overview either.

And yes, I see the numbers in the patches, but what I really hoped for
was some real load numbers. In particular, I would have loved to see
numbers from the kernel test robot "will-it-scale.per_thread_ops"
case, which is the one that had a 65% regression due to the lack of
reader spinning.

So I was kind of hoping to hear whether that regression is basically
entirely gone with this patch series, or if we still have a regression
due to the extra downgrade, or what?

 Linus


Re: [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device

2019-02-08 Thread Paul Mackerras
On Fri, Feb 08, 2019 at 08:58:14AM +0100, Cédric Le Goater wrote:
> On 2/8/19 6:15 AM, David Gibson wrote:
> > On Thu, Feb 07, 2019 at 10:03:15AM +0100, Cédric Le Goater wrote:
> >> That's the plan I have in mind as suggested by Paul if I understood it 
> >> well.
> >> The mechanics are more complex than the patch zapping the PTEs from the VMA
> >> but it's also safer.
> > 
> > Well, yes, where "safer" means "has the possibility to be correct".
> 
> Well, the only problem with the kernel approach is keeping a pointer to
> the VMA. If we could call find_vma(), it would be perfectly safe and
> much simpler.

You seem to be assuming that the kernel can easily work out a single
virtual address which will be the only place where a given set of
interrupt pages are mapped.  But that is really not possible in the
general case, because userspace could have mapped the fd at many
different offsets in many different places.
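
For instance, nothing stops userspace from holding two live mappings of
the same interrupt page at once. A minimal illustrative sketch (the fd
and page size here are hypothetical):

	/* two independent userspace mappings of the same fd page */
	void *a = mmap(NULL, pgsz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	void *b = mmap(NULL, pgsz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	/* a != b, so "the" virtual address of the page is ill-defined */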

QEMU doesn't do that; in QEMU, the mmaps are sufficiently limited that
it can work out a single virtual address that needs to be changed.
The way that QEMU should tell the kernel what that address is and what
the mapping should be changed to, is via the existing munmap()/mmap()
interface.

Paul.


Re: [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

2019-02-08 Thread Waiman Long
On 02/08/2019 02:50 PM, Linus Torvalds wrote:
> On Thu, Feb 7, 2019 at 11:08 AM Waiman Long  wrote:
>> This patchset revamps the current rwsem-xadd implementation to make
>> it saner and easier to work with. This patchset removes all the
>> architecture specific assembly code and uses generic C code for all
>> architectures. This eases maintenance and enables us to enhance the
>> code more easily.
>>
>> This patchset also implements the following 3 new features:
>>
>>  1) Waiter lock handoff
>>  2) Reader optimistic spinning
>>  3) Store write-lock owner in the atomic count (x86-64 only)
> The patches are kind of hard to read, with most of them just doing
> prep-work that doesn't necessarily matter to the big picture.
>
> What I'd really like to see is
>
>  (a) an overview of the new locking logic

The new locking logic is similar to qrwlock (see patch 11). Cmpxchg is
used to acquire the write lock, while xadd is still used for the read
lock. Some of the bits in the count are also reserved for special
purposes such as the has-waiter and lock-handoff flags. Patch 15 tries
to compress the write-lock owner task pointer and put it into the count
field for x86-64, at the expense of fewer bits available for the reader
count. I have sent out an additional patch this morning to make sure
that the reader count won't overflow.
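
To make that concrete, here is a rough sketch of the two fastpaths (the
flag bits and helper names are illustrative, not the exact patch code):

	/* writer fastpath: a single cmpxchg from the unlocked state */
	static inline bool rwsem_write_trylock_fast(struct rw_semaphore *sem)
	{
		long old = 0;

		return atomic_long_try_cmpxchg_acquire(&sem->count, &old,
						       RWSEM_WRITER_LOCKED);
	}

	/* reader fastpath: unconditional xadd of the reader bias */
	static inline bool rwsem_read_lock_fast(struct rw_semaphore *sem)
	{
		long cnt = atomic_long_add_return_acquire(RWSEM_READER_BIAS,
							  &sem->count);

		/* a set writer bit sends the reader to the slowpath */
		return !(cnt & RWSEM_WRITER_LOCKED);
	}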

In terms of performance, there isn't much change with respect to
read-lock performance. For write-lock, I saw a slight drop in some
cases, but nothing significant. The merging of the owner task pointer
into the count field does impose a slightly bigger drop than I would
have liked, which I am going to look into a bit more.

>
>  (b) what's the new fastpath case

The only change in the fastpath is the use of cmpxchg for writer lock.

>
>  (c) some performance numbers

There is performance data in patches 11, 12, 15, 19, 20 and 21. There
was performance data for patch 4 as well, for eliminating the
arch-specific file, but apparently I might have deleted it accidentally.
Anyway, no noticeable performance difference was observed when switching
to the generic C code for x86, ppc and ARM64.

The major gain in performance is due to the reader optimistic spinning
patches. The microbenchmark that I used showed an order-of-magnitude
performance improvement for mixed reader-writer workloads. Of course, we
will see less performance gain with real-world benchmarks.

I am planning to run more performance tests and post the data sometime
next week. Davidlohr is also going to run some of his rwsem performance
tests on this patchset.

>
> to explain the changes from a "this is the point of the whole
> exercise" standpoint.
>
> And yes, I realize that the lock handoff and optimistic spinning is a
> big deal, since I've seen the same regression numbers that presumably
> caused this effort to be resurrected. So it's not that I don't find
> this intriguing and worthwhile, it's literally that I'd like a summary
> not so much of the individual patches, but of the new model.
>
> Please?

Maybe I should break this patchset into a few smaller ones to make it
easier to review. Any suggestions are welcome.

Cheers,
Longman



Re: [PATCH 0/1] Start conversion of PowerPC docs

2019-02-08 Thread Jonathan Corbet
On Fri, 08 Feb 2019 14:40:28 +1100
Michael Ellerman  wrote:

> > - I don't think this should be a top-level directory full of docs; the top
> >   level is already rather overpopulated.  At worst, we should create an
> >   arch/ directory for architecture-specific docs.  
> 
> We currently have arch specific directories for arm, arm64, ia64, m68k,
> nios2, openrisc, parisc, powerpc, s390, sh, sparc, x86, xtensa.
> 
> Do you mean they should all be moved to Documentation/arch ?

Over time I'm really trying to bring some organization to Documentation/,
and to have that reflected in an RST tree that looks like somebody actually
thought about it.  So yes, I would eventually like to see something like
Documentation/arch, just like we have arch/ in the top-level directory.

> >   I kind of think that
> >   this should be thought through a bit more, though, with an eye toward
> >   who the audience is.  Some of it is clearly developer documentation, and
> >   some of it is aimed at admins; ptrace.rst is user-space API stuff.
> >   Nobody ever welcomes me saying this, but we should really split things
> >   into the appropriate manuals according to audience.  
> 
> I don't think any of it's aimed at admins, but I haven't read every
> word. I see it as aimed at kernel devs or people writing directly to the
> kernel API, eg. gdb developers reading ptrace.rst.
> 
> If Documentation/ wants to be more user focused and nicely curated
> perhaps we need arch/foo/docs/ for these developer centric docs?

Stuff for GDB developers is best placed in the userspace-api docbook; we're
trying to concentrate that there.  Stuff for kernel developers is a bit
more diffuse still; arch/foo/docs may end up being the best place for it in
the end, yes.

> > - It would be good to know how much of this stuff is still relevant.
> >   bootwrapper.txt hasn't been modified since it was added in 2008.  
> 
> It hasn't been modified but AFAIK it's still pretty much accurate and
> definitely something we want to have documented.

That's fine for this (and all the others); I'm just hoping that somebody
has thought about it.  We're carrying a *lot* of dusty old stuff that, IMO,
can only serve to confuse those who read it.  If these files don't fall
into that category, that's great.

> We support some hardware that is ~25 years old, so we have some old
> documentation too, and I'd rather we didn't drop things just because
> they're old.

I agree, as long as they remain correct and relevant.

> > - I'm glad you're adding SPDX lines, but do you know that the license is
> >   correct in each case?  It's best to be careful with such things.  
> 
> None of the files have licenses so I think we just fall back to COPYING
> don't we? In which case GPL-2.0 is correct for all files.

That's often the best choice, though some people have resorted to some
rather more in-depth archeology to try to figure out what the original
author actually intended.  Again, I'm just asking so that we're sure it's
the best choice.

Thanks,

jon


Re: [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

2019-02-08 Thread Linus Torvalds
On Thu, Feb 7, 2019 at 11:08 AM Waiman Long  wrote:
>
> This patchset revamps the current rwsem-xadd implementation to make
> it saner and easier to work with. This patchset removes all the
> architecture specific assembly code and uses generic C code for all
> architectures. This eases maintenance and enables us to enhance the
> code more easily.
>
> This patchset also implements the following 3 new features:
>
>  1) Waiter lock handoff
>  2) Reader optimistic spinning
>  3) Store write-lock owner in the atomic count (x86-64 only)

The patches are kind of hard to read, with most of them just doing
prep-work that doesn't necessarily matter to the big picture.

What I'd really like to see is

 (a) an overview of the new locking logic

 (b) what's the new fastpath case

 (c) some performance numbers

to explain the changes from a "this is the point of the whole
exercise" standpoint.

And yes, I realize that the lock handoff and optimistic spinning is a
big deal, since I've seen the same regression numbers that presumably
caused this effort to be resurrected. So it's not that I don't find
this intriguing and worthwhile, it's literally that I'd like a summary
not so much of the individual patches, but of the new model.

Please?

 Linus


Re: [PATCH v03] powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update

2019-02-08 Thread Michael Bringmann
On 2/7/19 11:44 PM, Srikar Dronamraju wrote:
>>
>>  int arch_update_cpu_topology(void)
>>  {
>> -return numa_update_cpu_topology(true);
>> +int changed = topology_changed;
>> +
>> +topology_changed = 0;
>> +return changed;
>>  }
>>
> 
> Do we need Powerpc override for arch_update_cpu_topology() now?  That
> topology_changed sometime back doesn't seem to have helped. The scheduler
> at least now is neglecting whether the topology changed or not.

I was dealing with a concurrency problem.  Revisiting again.
> 
> Also we can do away with the new topology_changed.
> 
>>  static void topology_work_fn(struct work_struct *work)
>>  {
>> -rebuild_sched_domains();
>> +lock_device_hotplug();
>> +if (numa_update_cpu_topology(true))
>> +rebuild_sched_domains();
>> +unlock_device_hotplug();
>>  }
> 
> Should this hunk be a separate patch by itself to say why
> rebuild_sched_domains with a changelog that explains why it should be under
> lock_device_hotplug? rebuild_sched_domains already takes cpuset_mutex. 
> So I am not sure if we need to take device_hotplug_lock.

topology_work_fn runs in its own thread, like the DLPAR operations.
This patch adds calls to Nathan's 'dlpar_cpu_readd' from the topology_work_fn
thread.  The lock/unlock_device_hotplug calls guard against concurrency
issues with the DLPAR operations, grabbing that lock here to avoid overlap
with those other operations.  This mod is dependent upon using dlpar_cpu_readd.

> 
>>  static DECLARE_WORK(topology_work, topology_work_fn);
>>
>> -static void topology_schedule_update(void)
>> +void topology_schedule_update(void)
>>  {
>> -schedule_work(&topology_work);
>> +if (!topology_update_in_progress)
>> +schedule_work(&topology_work);
>>  }
>>
>>  static void topology_timer_fn(struct timer_list *unused)
>>  {
>> +bool sdo = false;
> 
> Is sdo an abbreviation?

It stands for 'do the schedule update'.  Will remove it per below.

> 
>> +
>> +if (topology_scans < 1)
>> +bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
>> +nr_cpumask_bits);
> 
> Why do we need topology_scan? Just to make sure
> cpu_associativity_changes_mask is populated only once?
> can't we use a static bool inside the function for the same?

I was running into a race condition.  On one of my test systems,
start_topology_update ran via shared_proc_topology_init, and the PHYP did
not provide any change info about the CPUs that early in the boot.
The first run erased the cpu bits in cpu_associativity_changes_mask,
and subsequent runs did not pay attention to the reported updates.
Taking another look.
> 
> 
>> +
>>  if (prrn_enabled && cpumask_weight(&cpu_associativity_changes_mask))
>> -topology_schedule_update();
>> -else if (vphn_enabled) {
>> +sdo =  true;
>> +if (vphn_enabled) {
> 
> Any reason to remove the else above?
When both vphn_enabled and prrn_enabled were set, it was not calling
'update_cpu_associativity_changes_mask()', so it was not getting the
necessary change info.

>>  if (update_cpu_associativity_changes_mask() > 0)
>> -topology_schedule_update();
>> +sdo =  true;
>>  reset_topology_timer();
>>  }
>> +if (sdo)
>> +topology_schedule_update();
>> +topology_scans++;
>>  }
> 
> Are the above two hunks necessary? Not getting how the current changes are
> different from the previous.
Not important.  Will undo.
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com



Re: [PATCH v3 1/7] dump_stack: Support adding to the dump stack arch description

2019-02-08 Thread Steven Rostedt
On Thu, Feb 07, 2019 at 11:46:29PM +1100, Michael Ellerman wrote:
> 
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 77740a506ebb..d5fb4f960271 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -198,6 +198,7 @@ u32 log_buf_len_get(void);
>  void log_buf_vmcoreinfo_setup(void);
>  void __init setup_log_buf(int early);
>  __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
> +__printf(1, 2) void dump_stack_add_arch_desc(const char *fmt, ...);
>  void dump_stack_print_info(const char *log_lvl);
>  void show_regs_print_info(const char *log_lvl);
>  extern asmlinkage void dump_stack(void) __cold;
> @@ -256,6 +257,10 @@ static inline __printf(1, 2) void 
> dump_stack_set_arch_desc(const char *fmt, ...)
>  {
>  }
>  
> +static inline __printf(1, 2) void dump_stack_add_arch_desc(const char *fmt, 
> ...)
> +{
> +}
> +
>  static inline void dump_stack_print_info(const char *log_lvl)
>  {
>  }
> diff --git a/lib/dump_stack.c b/lib/dump_stack.c
> index 5cff72f18c4a..69b710ff92b5 100644
> --- a/lib/dump_stack.c
> +++ b/lib/dump_stack.c
> @@ -35,6 +35,64 @@ void __init dump_stack_set_arch_desc(const char *fmt, ...)
>   va_end(args);
>  }
>  
> +/**
> + * dump_stack_add_arch_desc - add arch-specific info to show with task dumps
> + * @fmt: printf-style format string
> + * @...: arguments for the format string
> + *
> + * See dump_stack_set_arch_desc() for why you'd want to use this.
> + *
> + * This version adds to any existing string already created with either
> + * dump_stack_set_arch_desc() or dump_stack_add_arch_desc(). If there is an
> + * existing string a space will be prepended to the passed string.
> + */
> +void __init dump_stack_add_arch_desc(const char *fmt, ...)
> +{
> + va_list args;
> + int pos, len;
> + char *p;
> +
> + /*
> +  * If there's an existing string we snprintf() past the end of it, and
> +  * then turn the terminating NULL of the existing string into a space
> +  * to create one string separated by a space.
> +  *
> +  * If there's no existing string we just snprintf() to the buffer, like
> +  * dump_stack_set_arch_desc(), but without calling it because we'd need
> +  * a varargs version.
> +  */
> + len = strnlen(dump_stack_arch_desc_str, 
> sizeof(dump_stack_arch_desc_str));
> + pos = len;
> +
> + if (len)
> + pos++;
> +
> + if (pos >= sizeof(dump_stack_arch_desc_str))
> + return; /* Ran out of space */
> +
> + p = &dump_stack_arch_desc_str[pos];
> +
> + va_start(args, fmt);
> + vsnprintf(p, sizeof(dump_stack_arch_desc_str) - pos, fmt, args);
> + va_end(args);
> +
> + if (len) {
> + /*
> +  * Order the stores above in vsnprintf() vs the store of the
> +  * space below which joins the two strings. Note this doesn't
> +  * make the code truly race free because there is no barrier on
> +  * the read side. ie. Another CPU might load the uninitialised
> +  * tail of the buffer first and then the space below (rather
> +  * than the NULL that was there previously), and so print the
> +  * uninitialised tail. But the whole string lives in BSS so in
> +  * practice it should just see NULLs.
> +  */
> + smp_wmb();

This shows me that this can be called at a time when more than one CPU is
active. What happens if we have two CPUs calling dump_stack_add_arch_desc() at
the same time? Can't that corrupt the dump_stack_arch_desc_str?

-- Steve

> +
> + dump_stack_arch_desc_str[len] = ' ';
> + }
> +}
> +
>  /**
>   * dump_stack_print_info - print generic debug info for dump_stack()
>   * @log_lvl: log level
> -- 
> 2.20.1


Re: [PATCH v4 3/3] powerpc/32: Add KASAN support

2019-02-08 Thread Christophe Leroy

Hi Daniel,

Le 08/02/2019 à 17:18, Daniel Axtens a écrit :

Hi Christophe,

I've been attempting to port this to 64-bit Book3e nohash (e6500),
although I think I've ended up with an approach more similar to Aneesh's
much earlier (2015) series for book3s.

Part of this is just due to the changes between 32 and 64 bits - we need
to hack around the discontiguous mappings - but one thing that I'm
particularly puzzled by is what the kasan_early_init is supposed to do.


It shouldn't be a problem, as my patch uses a
'for_each_memblock(memory, reg)' loop.

+void __init kasan_early_init(void)
+{
+   unsigned long addr = KASAN_SHADOW_START;
+   unsigned long end = KASAN_SHADOW_END;
+   unsigned long next;
+   pmd_t *pmd = pmd_offset(pud_offset(pgd_offset_k(addr), addr), addr);
+   int i;
+   phys_addr_t pa = __pa(kasan_early_shadow_page);
+
+   BUILD_BUG_ON(KASAN_SHADOW_START & ~PGDIR_MASK);
+
+   if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE))
+   panic("KASAN not supported with Hash MMU\n");
+
+   for (i = 0; i < PTRS_PER_PTE; i++)
+   __set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
+kasan_early_shadow_pte + i,
+pfn_pte(PHYS_PFN(pa), PAGE_KERNEL_RO), 0);
+
+   do {
+   next = pgd_addr_end(addr, end);
+   pmd_populate_kernel(&init_mm, pmd, kasan_early_shadow_pte);
+   } while (pmd++, addr = next, addr != end);
+}


As far as I can tell it's mapping the early shadow page, read-only, over
the KASAN_SHADOW_START->KASAN_SHADOW_END range, and it's using the early
shadow PTE array from the generic code.

I haven't been able to find an answer to why this is in the docs, so I
was wondering if you or anyone else could explain the early part of
kasan init a bit better.


See https://www.kernel.org/doc/html/latest/dev-tools/kasan.html for an 
explanation of the shadow.


When the shadow is 0, it means the memory area is entirely accessible.

It is necessary to set up a shadow area as soon as possible because all
data accesses check the shadow area, from the beginning (except for a
few files where sanitizing has been disabled in Makefiles).

Until the real shadow area is set, all accesses are granted thanks to
the zero shadow area being full of zeros.
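
For generic KASAN the mapping is one shadow byte per 8-byte granule of
memory, roughly as follows (this mirrors the generic
kasan_mem_to_shadow() helper):

	shadow = (void *)((unsigned long)addr >> 3) + KASAN_SHADOW_OFFSET;
	/* *(u8 *)shadow == 0: the whole 8-byte granule is accessible */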


I mainly used the ARM arch as an example when I implemented KASAN for ppc32.



At the moment, I don't do any early init, and like Aneesh's series for
book3s, I end up needing a special flag to disable kasan until after
kasan_init. Also, as with Balbir's series for Radix, some tests didn't
fire, although my missing tests are a superset of his. I suspect the
early init has something to do with these...?


I think you should really focus on establishing a zero shadow area as
early as possible instead of trying to hack the core parts of KASAN.




(I'm happy to collate answers into a patch to the docs, btw!)


We can also have the discussion going via 
https://github.com/linuxppc/issues/issues/106




In the long term I hope to revive Aneesh's and Balbir's series for hash
and radix as well.


Great.

Christophe



Regards,
Daniel


+
+static void __init kasan_init_region(struct memblock_region *reg)
+{
+   void *start = __va(reg->base);
+   void *end = __va(reg->base + reg->size);
+   unsigned long k_start, k_end, k_cur, k_next;
+   pmd_t *pmd;
+
+   if (start >= end)
+   return;
+
+   k_start = (unsigned long)kasan_mem_to_shadow(start);
+   k_end = (unsigned long)kasan_mem_to_shadow(end);
+   pmd = pmd_offset(pud_offset(pgd_offset_k(k_start), k_start), k_start);
+
+   for (k_cur = k_start; k_cur != k_end; k_cur = k_next, pmd++) {
+   k_next = pgd_addr_end(k_cur, k_end);
+   if ((void *)pmd_page_vaddr(*pmd) == kasan_early_shadow_pte) {
+   pte_t *new = pte_alloc_one_kernel(&init_mm);
+
+   if (!new)
+   panic("kasan: pte_alloc_one_kernel() failed");
+   memcpy(new, kasan_early_shadow_pte, PTE_TABLE_SIZE);
+   pmd_populate_kernel(&init_mm, pmd, new);
+   }
+   };
+
+   for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE) {
+   void *va = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+   pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
+
+   if (!va)
+   panic("kasan: memblock_alloc() failed");
+   pmd = pmd_offset(pud_offset(pgd_offset_k(k_cur), k_cur), k_cur);
+   pte_update(pte_offset_kernel(pmd, k_cur), ~0, pte_val(pte));
+   }
+   flush_tlb_kernel_range(k_start, k_end);
+}
+
+void __init kasan_init(void)
+{
+   struct memblock_region *reg;
+
+   for_each_memblock(memory, reg)
+   kasan_init_region(reg);
+
+   kasan_init_tags();
+
+   /* At this point kasan is fully initialized. Enable error messages */
+   init_task.kasan_depth = 0;
+   pr_info("KASAN init done\n");
+}

Re: [PATCH v4 3/3] powerpc/32: Add KASAN support

2019-02-08 Thread Daniel Axtens
Hi Christophe,

I've been attempting to port this to 64-bit Book3e nohash (e6500),
although I think I've ended up with an approach more similar to Aneesh's
much earlier (2015) series for book3s.

Part of this is just due to the changes between 32 and 64 bits - we need
to hack around the discontiguous mappings - but one thing that I'm
particularly puzzled by is what the kasan_early_init is supposed to do.

> +void __init kasan_early_init(void)
> +{
> + unsigned long addr = KASAN_SHADOW_START;
> + unsigned long end = KASAN_SHADOW_END;
> + unsigned long next;
> + pmd_t *pmd = pmd_offset(pud_offset(pgd_offset_k(addr), addr), addr);
> + int i;
> + phys_addr_t pa = __pa(kasan_early_shadow_page);
> +
> + BUILD_BUG_ON(KASAN_SHADOW_START & ~PGDIR_MASK);
> +
> + if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE))
> + panic("KASAN not supported with Hash MMU\n");
> +
> + for (i = 0; i < PTRS_PER_PTE; i++)
> + __set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
> +  kasan_early_shadow_pte + i,
> +  pfn_pte(PHYS_PFN(pa), PAGE_KERNEL_RO), 0);
> +
> + do {
> + next = pgd_addr_end(addr, end);
> + pmd_populate_kernel(&init_mm, pmd, kasan_early_shadow_pte);
> + } while (pmd++, addr = next, addr != end);
> +}

As far as I can tell it's mapping the early shadow page, read-only, over
the KASAN_SHADOW_START->KASAN_SHADOW_END range, and it's using the early
shadow PTE array from the generic code.

I haven't been able to find an answer to why this is in the docs, so I
was wondering if you or anyone else could explain the early part of
kasan init a bit better.

At the moment, I don't do any early init, and like Aneesh's series for
book3s, I end up needing a special flag to disable kasan until after
kasan_init. Also, as with Balbir's series for Radix, some tests didn't
fire, although my missing tests are a superset of his. I suspect the
early init has something to do with these...?

(I'm happy to collate answers into a patch to the docs, btw!)

In the long term I hope to revive Aneesh's and Balbir's series for hash
and radix as well.

Regards,
Daniel

> +
> +static void __init kasan_init_region(struct memblock_region *reg)
> +{
> + void *start = __va(reg->base);
> + void *end = __va(reg->base + reg->size);
> + unsigned long k_start, k_end, k_cur, k_next;
> + pmd_t *pmd;
> +
> + if (start >= end)
> + return;
> +
> + k_start = (unsigned long)kasan_mem_to_shadow(start);
> + k_end = (unsigned long)kasan_mem_to_shadow(end);
> + pmd = pmd_offset(pud_offset(pgd_offset_k(k_start), k_start), k_start);
> +
> + for (k_cur = k_start; k_cur != k_end; k_cur = k_next, pmd++) {
> + k_next = pgd_addr_end(k_cur, k_end);
> + if ((void *)pmd_page_vaddr(*pmd) == kasan_early_shadow_pte) {
> + pte_t *new = pte_alloc_one_kernel(&init_mm);
> +
> + if (!new)
> + panic("kasan: pte_alloc_one_kernel() failed");
> + memcpy(new, kasan_early_shadow_pte, PTE_TABLE_SIZE);
> + pmd_populate_kernel(&init_mm, pmd, new);
> + }
> + };
> +
> + for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE) {
> + void *va = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> + pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
> +
> + if (!va)
> + panic("kasan: memblock_alloc() failed");
> + pmd = pmd_offset(pud_offset(pgd_offset_k(k_cur), k_cur), k_cur);
> + pte_update(pte_offset_kernel(pmd, k_cur), ~0, pte_val(pte));
> + }
> + flush_tlb_kernel_range(k_start, k_end);
> +}
> +
> +void __init kasan_init(void)
> +{
> + struct memblock_region *reg;
> +
> + for_each_memblock(memory, reg)
> + kasan_init_region(reg);
> +
> + kasan_init_tags();
> +
> + /* At this point kasan is fully initialized. Enable error messages */
> + init_task.kasan_depth = 0;
> + pr_info("KASAN init done\n");
> +}
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 33cc6f676fa6..ae7db88b72d6 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -369,6 +369,10 @@ void __init mem_init(void)
>   pr_info("  * 0x%08lx..0x%08lx  : highmem PTEs\n",
>   PKMAP_BASE, PKMAP_ADDR(LAST_PKMAP));
>  #endif /* CONFIG_HIGHMEM */
> +#ifdef CONFIG_KASAN
> + pr_info("  * 0x%08lx..0x%08lx  : kasan shadow mem\n",
> + KASAN_SHADOW_START, KASAN_SHADOW_END);
> +#endif
>  #ifdef CONFIG_NOT_COHERENT_CACHE
>   pr_info("  * 0x%08lx..0x%08lx  : consistent mem\n",
>   IOREMAP_TOP, IOREMAP_TOP + CONFIG_CONSISTENT_SIZE);
> -- 
> 2.13.3


Re: [PATCH v2] powerpc/64: Fix memcmp reading past the end of src/dest

2019-02-08 Thread Segher Boessenkool
On Fri, Feb 08, 2019 at 05:12:21PM +1100, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > On Thu, Feb 07, 2019 at 10:53:13PM +1100, Michael Ellerman wrote:
> >> Chandan reported that fstests' generic/026 test hit a crash:
> >
> >> The instruction dump decodes as:
> >>   subfic  r6,r5,8
> >>   rlwinm  r6,r6,3,0,28
> >>   ldbrx   r9,0,r3
> >>   ldbrx   r10,0,r4 <-
> >> 
> >> Which shows us doing an 8 byte load from c0062ac3fff9, which
> >> crosses the page boundary at c0062ac4 and faults.
> >> 
> >> It's not OK for memcmp to read past the end of the source or
> >> destination buffers.
> >
> > It's not okay to access memory pages unsolicited.  Reading past the end
> > is fine per se.
> 
> Yeah I guess that's true.
> 
> Things like KASAN/valgrind probably disagree, but KASAN at least
> overrides memcmp AIUI.
> 
> I guess I feel better about it not reading past the end of the buffers,
> but maybe I'm being paranoid.

Sure, and that may be the best thing to do in the kernel.  OTOH, newer GCC
will inline many mem* for powerpc, and it will access past the end of
strings and buffers (but not past 4kB boundaries).
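
That 4kB restriction is the crucial property: an over-read is only
harmless if it cannot cross into another, possibly unmapped, page. A
hypothetical sketch of the failure mode from the oops above:

	/* with p ending in ...fff9, this 8-byte load spans the next page
	 * boundary and faults even though every byte of the buffer
	 * itself is valid */
	unsigned long v = *(const unsigned long *)p;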

> The other complication is we support multiple page sizes, so detecting a
> page boundary is more complicated than it could be.

Yeah.

> So I guess I'm inclined to stick with this approach, but I can update
> the change log.

Thanks!  I mentioned it because this was the bug that was hit here: reading
past the end had no ill effect (as far as we know), but accessing the wrong
page did :-)


Segher


Re: [PATCH 1/2] powerpc/64s: Work around spurious warning on old gccs with -fsanitize-coverage

2019-02-08 Thread Segher Boessenkool
On Fri, Feb 08, 2019 at 02:02:24PM +1100, Michael Ellerman wrote:
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
> b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index 8be3721d9302..a1acccd25839 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -675,12 +675,10 @@ static bool __init cpufeatures_process_feature(struct 
> dt_cpu_feature *f)
>   }
>   }
>  
> - if (!known && enable_unknown) {
> - if (!feat_try_enable_unknown(f)) {
> - pr_info("not enabling: %s (unknown and unsupported by 
> kernel)\n",
> - f->name);
> - return false;
> - }
> + if (!known && (!enable_unknown || !feat_try_enable_unknown(f))) {
> + pr_info("not enabling: %s (unknown and unsupported by 
> kernel)\n",
> + f->name);
> + return false;
>   }
>  
>   if (m->cpu_ftr_bit_mask)
> cur_cpu_spec->cpu_features |= m->cpu_ftr_bit_mask;

This still sets the wrong mask here, which is the bug you're trying to fix.
It should only do this if "known", afaics.


Segher


Re: powerpc: Enable kernel XZ compression option on 44x

2019-02-08 Thread Christian Lamparter
On Friday, February 8, 2019 2:02:41 PM CET Michael Ellerman wrote:
> On Thu, 2019-01-31 at 20:59:04 UTC, Christian Lamparter wrote:
> > Enable kernel XZ compression option on 44x.
> > Tested on a Western Digital - MyBook Live NAS.
> > It takes 22 seconds for the 800 MHz CPU to decompress
> > and boot a 2.63 MiB XZ-compressed kernel simpleImage.
> > 
> > Signed-off-by: Christian Lamparter 
> 
> Applied to powerpc next, thanks.
> 
> https://git.kernel.org/powerpc/c/423bfc69d7f491c47fc35921f7d460be
> 
> cheers
> 

Hello,

I'm happy to report that an xz-compressed kernel (as a simpleImage)
also booted on a TP-Link WDR4900-v1 (Freescale P1014):


Hence, I'm inclined to also add PPC_85xx to the list in arch/powerpc/Kconfig:

|select HAVE_KERNEL_XZ  if PPC_BOOK3S || 44x || PPC_85xx

But on the other hand, it could very well be that more (or all?) PPC
platforms would benefit from having HAVE_KERNEL_XZ available. What do you
think? (If you know of any interesting ARCHs, I could ask around too.)

Regards,
Christian




[PATCH v3 1/2] drivers/mtd: Use mtd->name when registering nvmem device

2019-02-08 Thread Aneesh Kumar K.V
With this patch, we use mtd->name as-is instead of concatenating the name
with '0'.

Fixes: c4dfa25ab307 ("mtd: add support for reading MTD devices via the nvmem 
API")
Signed-off-by: Aneesh Kumar K.V 
---
 drivers/mtd/mtdcore.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 999b705769a8..3ef01baef9b6 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -507,6 +507,7 @@ static int mtd_nvmem_add(struct mtd_info *mtd)
 {
struct nvmem_config config = {};
 
+   config.id = -1;
config.dev = &mtd->dev;
config.name = mtd->name;
config.owner = THIS_MODULE;
-- 
2.20.1



[PATCH v3 2/2] drivers/mtd: Fix device registration error

2019-02-08 Thread Aneesh Kumar K.V
This change helps me to get multiple mtd devices registered. Without it
I get:

sysfs: cannot create duplicate filename '/bus/nvmem/devices/flash0'
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2-00557-g1ef20ef21f22 #13
Call Trace:
[c000b38e3220] [c0b58fe4] dump_stack+0xe8/0x164 (unreliable)
[c000b38e3270] [c04cf074] sysfs_warn_dup+0x84/0xb0
[c000b38e32f0] [c04cf6c4] sysfs_do_create_link_sd.isra.0+0x114/0x150
[c000b38e3340] [c0726a84] bus_add_device+0x94/0x1e0
[c000b38e33c0] [c07218f0] device_add+0x4d0/0x830
[c000b38e3480] [c09d54a8] nvmem_register.part.2+0x1c8/0xb30
[c000b38e3560] [c0834530] mtd_nvmem_add+0x90/0x120
[c000b38e3650] [c0835bc8] add_mtd_device+0x198/0x4e0
[c000b38e36f0] [c083619c] mtd_device_parse_register+0x11c/0x280
[c000b38e3780] [c0840830] powernv_flash_probe+0x180/0x250
[c000b38e3820] [c072c120] platform_drv_probe+0x60/0xf0
[c000b38e38a0] [c07283c8] really_probe+0x138/0x4d0
[c000b38e3930] [c0728acc] driver_probe_device+0x13c/0x1b0
[c000b38e39b0] [c0728c7c] __driver_attach+0x13c/0x1c0
[c000b38e3a30] [c0725130] bus_for_each_dev+0xa0/0x120
[c000b38e3a90] [c0727b2c] driver_attach+0x2c/0x40
[c000b38e3ab0] [c07270f8] bus_add_driver+0x228/0x360
[c000b38e3b40] [c072a2e0] driver_register+0x90/0x1a0
[c000b38e3bb0] [c072c020] __platform_driver_register+0x50/0x70
[c000b38e3bd0] [c105c984] powernv_flash_driver_init+0x24/0x38
[c000b38e3bf0] [c0010904] do_one_initcall+0x84/0x464
[c000b38e3cd0] [c1004548] kernel_init_freeable+0x530/0x634
[c000b38e3db0] [c0011154] kernel_init+0x1c/0x168
[c000b38e3e20] [c000bed4] ret_from_kernel_thread+0x5c/0x68
mtd mtd1: Failed to register NVMEM device

With the change we now have

root@(none):/sys/bus/nvmem/devices# ls -al
total 0
drwxr-xr-x 2 root root 0 Feb  6 20:49 .
drwxr-xr-x 4 root root 0 Feb  6 20:49 ..
lrwxrwxrwx 1 root root 0 Feb  6 20:49 flash@0 -> 
../../../devices/platform/ibm,opal:flash@0/mtd/mtd0/flash@0
lrwxrwxrwx 1 root root 0 Feb  6 20:49 flash@1 -> 
../../../devices/platform/ibm,opal:flash@1/mtd/mtd1/flash@1

Fixes: acfe63ec1c59 ("mtd: Convert to using %pOFn instead of device_node.name")
Signed-off-by: Aneesh Kumar K.V 
---
 drivers/mtd/devices/powernv_flash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mtd/devices/powernv_flash.c 
b/drivers/mtd/devices/powernv_flash.c
index 22f753e555ac..83f88b8b5d9f 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -212,7 +212,7 @@ static int powernv_flash_set_driver_info(struct device *dev,
 * Going to have to check what details I need to set and how to
 * get them
 */
-   mtd->name = devm_kasprintf(dev, GFP_KERNEL, "%pOFn", dev->of_node);
+   mtd->name = devm_kasprintf(dev, GFP_KERNEL, "%pOFP", dev->of_node);
mtd->type = MTD_NORFLASH;
mtd->flags = MTD_WRITEABLE;
mtd->size = size;
-- 
2.20.1



[PATCH] tools/selftest/vm: allow choosing mem size and page size in map_hugetlb

2019-02-08 Thread Christophe Leroy
map_hugetlb maps 256 Mbytes of memory with the default hugepage size.

This patch allows the user to pass the size and page shift as
arguments in order to use a different size and page size.
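
For example (hypothetical invocations; the first argument is the size in
Mbytes, the second the page shift):

	./map_hugetlb             # defaults: 256 MB, default hugepage size
	./map_hugetlb 256 21      # 256 MB using 2 MB (1 << 21) hugepages
	./map_hugetlb 512 24      # 512 MB using 16 MB (1 << 24) hugepages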

Signed-off-by: Christophe Leroy 
---
 tools/testing/selftests/vm/map_hugetlb.c | 29 +++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/map_hugetlb.c 
b/tools/testing/selftests/vm/map_hugetlb.c
index 9b777fa95f09..5a2d7b8efc40 100644
--- a/tools/testing/selftests/vm/map_hugetlb.c
+++ b/tools/testing/selftests/vm/map_hugetlb.c
@@ -23,6 +23,14 @@
 #define MAP_HUGETLB 0x4 /* arch specific */
 #endif
 
+#ifndef MAP_HUGE_SHIFT
+#define MAP_HUGE_SHIFT 26
+#endif
+
+#ifndef MAP_HUGE_MASK
+#define MAP_HUGE_MASK 0x3f
+#endif
+
 /* Only ia64 requires this */
 #ifdef __ia64__
 #define ADDR (void *)(0x8000UL)
@@ -58,12 +66,29 @@ static int read_bytes(char *addr)
return 0;
 }
 
-int main(void)
+int main(int argc, char **argv)
 {
void *addr;
int ret;
+   size_t length = LENGTH;
+   int flags = FLAGS;
+   int shift = 0;
+
+   if (argc > 1)
+   length = atol(argv[1]) << 20;
+   if (argc > 2) {
+   shift = atoi(argv[2]);
+   if (shift)
+   flags |= (shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT;
+   }
+
+   if (shift)
+   printf("%u kB hugepages\n", 1 << shift);
+   else
+   printf("Default size hugepages\n");
+   printf("Mapping %lu Mbytes\n", (unsigned long)length >> 20);
 
-   addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, -1, 0);
+   addr = mmap(ADDR, length, PROTECTION, flags, -1, 0);
if (addr == MAP_FAILED) {
perror("mmap");
exit(1);
-- 
2.13.3



Re: [PATCH] powerpc: fix 32-bit KVM-PR lockup and panic with MacOS guest

2019-02-08 Thread Mark Cave-Ayland
On 08/02/2019 14:45, Christophe Leroy wrote:

> Le 08/02/2019 à 15:33, Mark Cave-Ayland a écrit :
>> Commit 8792468da5e1 "powerpc: Add the ability to save FPU without giving it 
>> up"
> 
> Expected format for the above is:
> 
> Commit 123456789abc ("text")

Hi Christophe,

Apologies - I'm fairly new at submitting kernel patches, but I can
re-send it in the correct format later if required.

>> unexpectedly removed the MSR_FE0 and MSR_FE1 bits from the bitmask used to
>> update the MSR of the previous thread in __giveup_fpu() causing a KVM-PR 
>> MacOS
>> guest to lockup and panic the kernel.
>>
>> Reinstate these bits to the MSR bitmask to enable MacOS guests to run under
>> 32-bit KVM-PR once again without issue.
>>
>> Signed-off-by: Mark Cave-Ayland 
> 
> Should include a Fixes: and a Cc to stable ?
> 
> Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it 
> up")
> Cc: sta...@vger.kernel.org

Indeed, but there are still some questions to be asked here:

1) Why were these bits removed from the original bitmask in the first
place without it being documented in the commit message?

2) Is this the right fix? I'm told that MacOS guests already run without
this patch on a G5 under 64-bit KVM-PR, which may suggest that this is a
workaround for another bug elsewhere in the 32-bit powerpc code.


If you think that these points don't matter, then I'm happy to resubmit
the patch as-is based upon your comments above.


ATB,

Mark.


Re: [PATCH] powerpc: fix 32-bit KVM-PR lockup and panic with MacOS guest

2019-02-08 Thread Christophe Leroy

Le 08/02/2019 à 15:33, Mark Cave-Ayland a écrit :

Commit 8792468da5e1 "powerpc: Add the ability to save FPU without giving it up"


Expected format for the above is:

Commit 123456789abc ("text")


unexpectedly removed the MSR_FE0 and MSR_FE1 bits from the bitmask used to
update the MSR of the previous thread in __giveup_fpu() causing a KVM-PR MacOS
guest to lockup and panic the kernel.

Reinstate these bits to the MSR bitmask to enable MacOS guests to run under
32-bit KVM-PR once again without issue.

Signed-off-by: Mark Cave-Ayland 


Should include a Fixes: and a Cc to stable ?

Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without 
giving it up")

Cc: sta...@vger.kernel.org

Christophe


---
  arch/powerpc/kernel/process.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ce393df243aa..71bad4b6f80d 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -176,7 +176,7 @@ static void __giveup_fpu(struct task_struct *tsk)
  
  	save_fpu(tsk);

msr = tsk->thread.regs->msr;
-   msr &= ~MSR_FP;
+   msr &= ~(MSR_FP|MSR_FE0|MSR_FE1);
  #ifdef CONFIG_VSX
if (cpu_has_feature(CPU_FTR_VSX))
msr &= ~MSR_VSX;



[RFC PATCH v1] powerpc/accounting: do not account system time on transition to user.

2019-02-08 Thread Christophe Leroy
Time spent in kernel mode doesn't need to be accounted on the transition
to user space. As long as the time spent in user mode is known, the time
spent in the kernel can be calculated by subtracting the time spent in
user mode.

To do so, this patch modifies vtime_delta() to subtract the user time
accumulated since the last call to vtime_delta().

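In other words, with utime_asm accumulating the user time recorded by the
low-level entry code since the last call (a restatement of the
vtime_delta() hunk below):

	stime = (now - acct->starttime) - acct->utime_asm;
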
This patch gives a 2% improvement in the null_syscall() selftest on an 83xx.

Signed-off-by: Christophe Leroy 
---
But surprisingly, this patch degrades the null_syscall selftest by 20% on
the 8xx. Any idea of the reason?

 arch/powerpc/include/asm/accounting.h | 1 +
 arch/powerpc/include/asm/ppc_asm.h| 8 +---
 arch/powerpc/kernel/asm-offsets.c | 8 ++--
 arch/powerpc/kernel/time.c| 4 +++-
 4 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/accounting.h 
b/arch/powerpc/include/asm/accounting.h
index c607c5d835cc..2f1ff5f9fd7a 100644
--- a/arch/powerpc/include/asm/accounting.h
+++ b/arch/powerpc/include/asm/accounting.h
@@ -27,6 +27,7 @@ struct cpu_accounting_data {
/* Internal counters */
unsigned long starttime;/* TB value snapshot */
unsigned long starttime_user;   /* TB value on exit to usermode */
+   unsigned long utime_asm;
 #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
unsigned long startspurr;   /* SPURR value snapshot */
unsigned long utime_sspurr; /* ->user_time when ->startspurr set */
diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index e0637730a8e7..be17d570d484 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -28,9 +28,8 @@
 #define ACCOUNT_STOLEN_TIME
 #else
 #define ACCOUNT_CPU_USER_ENTRY(ptr, ra, rb)\
-   MFTB(ra);   /* get timebase */  \
PPC_LL  rb, ACCOUNT_STARTTIME_USER(ptr);\
-   PPC_STL ra, ACCOUNT_STARTTIME(ptr); \
+   MFTB(ra);   /* get timebase */  \
subfrb,rb,ra;   /* subtract start value */  \
PPC_LL  ra, ACCOUNT_USER_TIME(ptr); \
add ra,ra,rb;   /* add on to user time */   \
@@ -38,12 +37,7 @@
 
 #define ACCOUNT_CPU_USER_EXIT(ptr, ra, rb) \
MFTB(ra);   /* get timebase */  \
-   PPC_LL  rb, ACCOUNT_STARTTIME(ptr); \
PPC_STL ra, ACCOUNT_STARTTIME_USER(ptr);\
-   subfrb,rb,ra;   /* subtract start value */  \
-   PPC_LL  ra, ACCOUNT_SYSTEM_TIME(ptr);   \
-   add ra,ra,rb;   /* add on to system time */ \
-   PPC_STL ra, ACCOUNT_SYSTEM_TIME(ptr)
 
 #ifdef CONFIG_PPC_SPLPAR
 #define ACCOUNT_STOLEN_TIME\
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 7a1b93c5af63..f2ba7735f56f 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -260,19 +260,15 @@ int main(void)
OFFSET(PACAHWCPUID, paca_struct, hw_cpu_id);
OFFSET(PACAKEXECSTATE, paca_struct, kexec_state);
OFFSET(PACA_DSCR_DEFAULT, paca_struct, dscr_default);
-   OFFSET(ACCOUNT_STARTTIME, paca_struct, accounting.starttime);
OFFSET(ACCOUNT_STARTTIME_USER, paca_struct, accounting.starttime_user);
-   OFFSET(ACCOUNT_USER_TIME, paca_struct, accounting.utime);
-   OFFSET(ACCOUNT_SYSTEM_TIME, paca_struct, accounting.stime);
+   OFFSET(ACCOUNT_USER_TIME, paca_struct, accounting.utime_asm);
OFFSET(PACA_TRAP_SAVE, paca_struct, trap_save);
OFFSET(PACA_NAPSTATELOST, paca_struct, nap_state_lost);
OFFSET(PACA_SPRG_VDSO, paca_struct, sprg_vdso);
 #else /* CONFIG_PPC64 */
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-   OFFSET(ACCOUNT_STARTTIME, thread_info, accounting.starttime);
OFFSET(ACCOUNT_STARTTIME_USER, thread_info, accounting.starttime_user);
-   OFFSET(ACCOUNT_USER_TIME, thread_info, accounting.utime);
-   OFFSET(ACCOUNT_SYSTEM_TIME, thread_info, accounting.stime);
+   OFFSET(ACCOUNT_USER_TIME, thread_info, accounting.utime_asm);
 #endif
 #endif /* CONFIG_PPC64 */
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index bc0503ef9c9c..79420643b45f 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -331,8 +331,10 @@ static unsigned long vtime_delta(struct task_struct *tsk,
WARN_ON_ONCE(!irqs_disabled());
 
now = mftb();
-   stime = now - acct->starttime;
+   stime = now - acct->starttime - acct->utime_asm;
acct->starttime = now;
+   acct->utime += acct->utime_asm;
+   acct->utime_asm = 0;
 
*stime_scaled = vtime_delta_scaled(acct, now, stime);
 
-- 
2.13.3



[PATCH] powerpc: fix 32-bit KVM-PR lockup and panic with MacOS guest

2019-02-08 Thread Mark Cave-Ayland
Commit 8792468da5e1 "powerpc: Add the ability to save FPU without giving it up"
unexpectedly removed the MSR_FE0 and MSR_FE1 bits from the bitmask used to
update the MSR of the previous thread in __giveup_fpu() causing a KVM-PR MacOS
guest to lockup and panic the kernel.

Reinstate these bits to the MSR bitmask to enable MacOS guests to run under
32-bit KVM-PR once again without issue.

Signed-off-by: Mark Cave-Ayland 
---
 arch/powerpc/kernel/process.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ce393df243aa..71bad4b6f80d 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -176,7 +176,7 @@ static void __giveup_fpu(struct task_struct *tsk)
 
save_fpu(tsk);
msr = tsk->thread.regs->msr;
-   msr &= ~MSR_FP;
+   msr &= ~(MSR_FP|MSR_FE0|MSR_FE1);
 #ifdef CONFIG_VSX
if (cpu_has_feature(CPU_FTR_VSX))
msr &= ~MSR_VSX;
-- 
2.11.0



Re: [PATCH-tip 15/22] locking/rwsem: Merge owner into count on x86-64

2019-02-08 Thread Waiman Long
On 02/07/2019 03:54 PM, Waiman Long wrote:
> On 02/07/2019 03:08 PM, Peter Zijlstra wrote:
>> On Thu, Feb 07, 2019 at 02:07:19PM -0500, Waiman Long wrote:
>>> On 32-bit architectures, there aren't enough bits to hold both.
>>> 64-bit architectures, however, can have enough bits to do that. For
>>> x86-64, the physical address can use up to 52 bits. That is 4PB of
>>> memory. That leaves 12 bits available for other use. The task structure
>>> pointer is also aligned to the L1 cache size. That means another 6 bits
>>> (64 bytes cacheline) will be available. Reserving 2 bits for status
>>> flags, we will have 16 bits for the reader count.  That can support
>>> up to (64k-1) readers.
>> 64k readers sounds like a number that is fairly 'easy' to reach, esp. on
>> 64bit. These are preemptible locks after all, all we need to do is get
>> 64k tasks nested on enough CPUs.
>>
>> I'm sure there's some willing Java proglet around that spawns more than
>> 64k threads just because it can. Run it on a big enough machine (ISTR
>> there's a number of >1k CPU systems out there) and voila.
> Yes, that can be a problem.
>
> One possible solution is to check if the count goes negative. If so,
> fail the read lock and make the readers wait in the wait queue until the
> count is in positive territory. That effectively reduces the reader
> count to 15 bits, but it will avoid the overflow situation. I will try
> to add that support into the next version.
>
> Cheers,
> Longman

Something like the attached patch.

Cheers,
Longman
From 746913e7d14e874eeace1e146e63bdaea4dfd4a5 Mon Sep 17 00:00:00 2001
From: Waiman Long 
Date: Fri, 8 Feb 2019 08:58:10 -0500
Subject: [PATCH 23/23] locking/rwsem: Make MSbit of count as guard bit to fail
 readlock

With the merging of owner into count for x86-64, there is only 16 bits
left for reader count. It is theoretically possible for an application to
cause more than 64k readers to acquire a rwsem leading to count overflow.

To prevent this dire situation, the most significant bit of the count
is now treated as a guard bit (RWSEM_FLAG_READFAIL). Read-lock will now
fail for both the fast and optimistic spinning paths whenever this bit
is set. So all those extra readers will be put to sleep in the wait
queue. Wakeup will not happen until the reader count reaches 0.

A limit of 256 is also imposed on the number of readers that can be woken
up in one wakeup function call. This will eliminate the possibility of
waking up more than 64k readers and overflowing the count.

Signed-off-by: Waiman Long 
---
 kernel/locking/lock_events_list.h |  1 +
 kernel/locking/rwsem-xadd.c   | 40 --
 kernel/locking/rwsem-xadd.h   | 41 ++-
 3 files changed, 62 insertions(+), 20 deletions(-)

diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index 0052534..9ecdeac 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -60,6 +60,7 @@
 LOCK_EVENT(rwsem_opt_rlock)	/* # of read locks opt-spin acquired	*/
 LOCK_EVENT(rwsem_opt_wlock)	/* # of write locks opt-spin acquired	*/
 LOCK_EVENT(rwsem_opt_fail)	/* # of failed opt-spinnings		*/
+LOCK_EVENT(rwsem_opt_rfail)	/* # of failed reader-owned readlocks	*/
 LOCK_EVENT(rwsem_opt_nospin)	/* # of disabled reader opt-spinnings	*/
 LOCK_EVENT(rwsem_rlock)		/* # of read locks acquired		*/
 LOCK_EVENT(rwsem_rlock_fast)	/* # of fast read locks acquired	*/
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 213c2aa..a993055 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -110,6 +110,8 @@ enum rwsem_wake_type {
 # define RWSEM_RSPIN_MAX	(1 << 12)
 #endif
 
+#define MAX_READERS_WAKEUP	0x100
+
 /*
  * handle the lock release when processes blocked on it that can now run
  * - if we come here from up_(), then the RWSEM_FLAG_WAITERS bit must
@@ -208,6 +210,12 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		 * after setting the reader waiter to nil.
 		 */
 		wake_q_add_safe(wake_q, tsk);
+
+		/*
+		 * Limit # of readers that can be woken up per wakeup call.
+		 */
+		if (woken >= MAX_READERS_WAKEUP)
+			break;
 	}
 
 	adjustment = woken * RWSEM_READER_BIAS - adjustment;
@@ -445,6 +453,16 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock)
 			break;
 
 		/*
+		 * If a reader cannot acquire a reader-owned lock, we
+		 * have to quit. It is either the handoff bit just got
+		 * set or (unlikely) readfail bit is somehow set.
+		 */
+		if (unlikely(!wlock && (owner_state == OWNER_READER))) {
+			lockevent_inc(rwsem_opt_rfail);
+			break;
+		}
+
+		/*
 		 * An RT task cannot do optimistic spinning if it cannot
 		 * be sure the lock holder is running. When there's no owner
 		 * or is reader-owned, an RT task has to stop spinning or
@@ -526,12 +544,22 @@ static inline bool rwsem_optimistic_spin(struct rw_semaphore *sem,
 * Wait for the read lock to be granted

Re: [PATCH 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache

2019-02-08 Thread Oliver
On Fri, Feb 8, 2019 at 8:47 PM Michael Ellerman  wrote:
>
> Oliver O'Halloran  writes:
> > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> > index f6e65375a8de..d1f0bdf41fac 100644
> > --- a/arch/powerpc/kernel/eeh.c
> > +++ b/arch/powerpc/kernel/eeh.c
> > @@ -1810,7 +1810,7 @@ static int __init eeh_init_proc(void)
> >  &eeh_enable_dbgfs_ops);
> >   debugfs_create_u32("eeh_max_freezes", 0600,
> >   powerpc_debugfs_root, &eeh_max_freezes);
> > -#endif
> > + eeh_cache_debugfs_init();
>
> Oops :)

Yeah :(

> > diff --git a/arch/powerpc/kernel/eeh_cache.c 
> > b/arch/powerpc/kernel/eeh_cache.c
> > index b2c320e0fcef..dba421a577e7 100644
> > --- a/arch/powerpc/kernel/eeh_cache.c
> > +++ b/arch/powerpc/kernel/eeh_cache.c
> > @@ -298,9 +299,34 @@ void eeh_addr_cache_build(void)
> >   eeh_addr_cache_insert_dev(dev);
> >   eeh_sysfs_add_device(dev);
> >   }
> > +}
> >
> > -#ifdef DEBUG
> > - /* Verify tree built up above, echo back the list of addrs. */
> > - eeh_addr_cache_print(&pci_io_addr_cache_root);
> > -#endif
> > +static int eeh_addr_cache_show(struct seq_file *s, void *v)
> > +{
> > + struct rb_node *n = rb_first(&pci_io_addr_cache_root.rb_root);
> > + struct pci_io_addr_range *piar;
> > + int cnt = 0;
> > +
> > + spin_lock(&pci_io_addr_cache_root.piar_lock);
> > + while (n) {
> > + piar = rb_entry(n, struct pci_io_addr_range, rb_node);
> > +
> > + seq_printf(s, "%s addr range %3d [%pap-%pap]: %s\n",
> > +(piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
> > +&piar->addr_lo, &piar->addr_hi, 
> > pci_name(piar->pcidev));
> > +
> > + n = rb_next(n);
> > + cnt++;
> > + }
>
> You can write that as a for loop can't you?
>
> struct rb_node *n;
> int i = 0;
>
> for (n = rb_first(&pci_io_addr_cache_root.rb_root); n; n = 
> rb_next(n), i++) {

IIRC I did try that, but it's too long. 85 cols wide according to my editor.

> piar = rb_entry(n, struct pci_io_addr_range, rb_node);
>
> seq_printf(s, "%s addr range %3d [%pap-%pap]: %s\n",
>(piar->flags & IORESOURCE_IO) ? "i/o" : "mem", i,
>&piar->addr_lo, &piar->addr_hi, 
> pci_name(piar->pcidev));
> }
>
> cheers


[GIT PULL] Please pull powerpc/linux.git powerpc-5.0-4 tag

2019-02-08 Thread Michael Ellerman
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Linus,

Please pull some more powerpc fixes for 5.0:

The following changes since commit 7bea7ac0ca0121798f3618d16201ca4dc4e67a00:

  powerpc/syscalls: Fix syscall tracing (2019-01-15 21:32:25 +1100)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-5.0-4

for you to fetch changes up to 5a3840a470c41ec0b85cd36ca80370330656b163:

  powerpc/papr_scm: Use the correct bind address (2019-02-01 10:13:51 +1100)

- --
powerpc fixes for 5.0 #4

Just two fixes, both going to stable.

Our support for split pmd page table lock had a bug which could lead to a crash
on mremap() when using the Radix MMU (Power9 only).

A fix for the PAPR SCM driver (nvdimm) we added last release, which had a bug
where we might mis-handle a hypervisor response leading to us failing to attach
the memory region.

Thanks to:
  Aneesh Kumar K.V, Oliver O'Halloran.

- --
Aneesh Kumar K.V (1):
  powerpc/radix: Fix kernel crash with mremap()

Oliver O'Halloran (1):
  powerpc/papr_scm: Use the correct bind address


 arch/powerpc/include/asm/book3s/64/pgtable.h | 22 +++---
 arch/powerpc/mm/pgtable-book3s64.c   | 22 ++
 arch/powerpc/platforms/pseries/papr_scm.c|  5 -
 3 files changed, 33 insertions(+), 16 deletions(-)
-BEGIN PGP SIGNATURE-

iQIcBAEBAgAGBQJcXYAdAAoJEFHr6jzI4aWAGAwP/1VlkYVIU0gA+x3iifUW0vD+
MhDhWuzFYpP1xfy5rMvEtwQ+IbwgO/j4220NilCAbCNbB69Ccj6x4mdPayTLWswi
lVb8VVKG0kre0sj0hXm15j/ZEhkIwGuAzu1FH/q6mf0kyEOCsdUiZ1k+vDjWtgJS
TzrpmstgCMlkAtJSq2RfGdWTqQc2N5uZktyjobpmkubvBm9PoEgjA/f+LwMmxZEa
M83cOdQ8HXKV+tZVMftUR9dfDuDo2L+5EhXdrBgreAF0haEtVo9JALp4rwvCdS4H
c5SzuJSNnaBvGTzCVmwOadWJHH8/Ok3N0ryC991SeAonYLedOQhSVZRzgqN9nCF2
KIGahudY5djIMTe41FXDsHC95yYuH6C8rfDPJBZUj6o9I6N98Xjz0gASoMFRJ71p
25kskxbH7ehL3ZdAENp93MmNS0gXyBkfhYs2JGKy+ufdlELUh8Yn6q0kJxrqgGnj
Cc9i221OXkRbi8Z0eZeAdWzzz48cdmqio2bvT679xwghRtFXfy1BhGv0m9F9evS4
TGTIKL/al/UVtCEStjdbTT9XGovGqrUSbIqe7MDaH478cGrQjBTVdwn6eU2G48Vh
nZMyY7g5J3BQWeE9vOkX8T9T/9ChwIIXDDjiXYr76+7jcgamWH9VL6bCwKQO00Lx
+9G3ikkFDRcFqleuTrf3
=BKaS
-END PGP SIGNATURE-


Re: powerpc/32: Include .branch_lt in data section

2019-02-08 Thread Michael Ellerman
On Wed, 2018-11-14 at 03:02:18 UTC, Joel Stanley wrote:
> When building a 32 bit powerpc kernel with Binutils 2.31.1 this warning
> is emitted:
> 
>  powerpc-linux-gnu-ld: warning: orphan section `.branch_lt' from
>  `arch/powerpc/kernel/head_44x.o' being placed in section `.branch_lt'
> 
> As of binutils commit 2d7ad24e8726 ("Support PLT16 relocs against local
> symbols")[1], 32 bit targets can produce .branch_lt sections in their
> output.
> 
> Include these symbols in the .data section as the ppc64 kernel does.
> 
> [1] 
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=2d7ad24e8726ba4c45c9e67be08223a146a837ce
> Signed-off-by: Joel Stanley 
> Reviewed-by: Alan Modra 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/98ecc6768e8fdba95da1fc1efa0ef2d7

cheers


Re: powerpc/pseries: Perform full re-add of CPU for topology update post-migration

2019-02-08 Thread Michael Ellerman
On Mon, 2018-10-29 at 18:43:36 UTC, Nathan Fontenot wrote:
> On pseries systems, performing a partition migration can result in
> altering the nodes a CPU is assigned to on the destination system. For
> example, pre-migration on the source system CPUs are in nodes 1 and 3,
> post-migration on the destination system CPUs are in nodes 2 and 3.
> 
> Handling the node change for a CPU can cause corruption in the slab
> cache if we hit a timing where a CPU's node is changed while cache_reap()
> is invoked. The corruption occurs because the slab cache code appears
> to rely on the CPU and slab cache pages being on the same node.
> 
> The current dynamic updating of a CPU's node done in arch/powerpc/mm/numa.c
> does not prevent us from hitting this scenario.
> 
> Changing the device tree property update notification handler that
> recognizes an affinity change for a CPU to do a full DLPAR remove and
> add of the CPU instead of dynamically changing its node resolves this
> issue.
> 
> Signed-off-by: Nathan Fontenot 
> Signed-off-by: Michael W. Bringmann 
> Tested-by: Michael W. Bringmann 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/81b61324922c67f73813d8a9c175f3c1

cheers


Re: Move static keyword at beginning of declaration

2019-02-08 Thread Michael Ellerman
On Sat, 2019-02-02 at 13:05:35 UTC, Mathieu Malaterre wrote:
> Move the static keyword around to remove the following warnings (W=1):
> 
>   arch/powerpc/platforms/ps3/os-area.c:212:1: error: 'static' is not at 
> beginning of declaration [-Werror=old-style-declaration]
>   arch/powerpc/platforms/ps3/system-bus.c:45:1: error: 'static' is not at 
> beginning of declaration [-Werror=old-style-declaration]
> 
> Signed-off-by: Mathieu Malaterre 
> Acked-by: Geoff Levand 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/8e0f97357533aa5b57b333de47eb008c

cheers
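
The warning is purely about specifier ordering; a minimal illustration (the
identifier is made up, not the actual declaration from os-area.c):

	/* before: storage-class specifier not first => -Wold-style-declaration */
	const static int foo;

	/* after: 'static' moved to the beginning of the declaration */
	static const int foo;
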


Re: powerpc: Remove trailing semicolon after curly brace

2019-02-08 Thread Michael Ellerman
On Sat, 2019-02-02 at 12:54:27 UTC, Mathieu Malaterre wrote:
> There is no point in having a trailing semicolon after a closing curly
> brace. Remove it.
> 
> Signed-off-by: Mathieu Malaterre 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e5c27ef7a5f204ff2f894f0dd7ed3774

cheers


Re: [v2] powerpc: drop page_is_ram() and walk_system_ram_range()

2019-02-08 Thread Michael Ellerman
On Fri, 2019-02-01 at 10:46:52 UTC, Christophe Leroy wrote:
> Since commit c40dd2f76644 ("powerpc: Add System RAM to /proc/iomem")
> it is possible to use the generic walk_system_ram_range() and
> the generic page_is_ram().
> 
> To enable the use of walk_system_ram_range() by the IBM EHEA
> ethernet driver, the generic function has to be exported.
> 
> As powerpc was the only (last?) user of CONFIG_ARCH_HAS_WALK_MEMORY,
> the #ifdef around the generic walk_system_ram_range() has become
> useless and can be dropped.
> 
> Fixes: c40dd2f76644 ("powerpc: Add System RAM to /proc/iomem")
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/26b523356f49a0117c8f9e32ca98aa6d

cheers
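
Concretely, the generic walker only needed exporting so that the (modular)
ehea driver can call it; a hedged sketch of the added line in
kernel/resource.c:

	/* sketch: the generic implementation already existed, it was just
	 * not visible to modules */
	EXPORT_SYMBOL_GPL(walk_system_ram_range);
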


Re: powerpc/powernv: Escalate reset when IODA reset fails

2019-02-08 Thread Michael Ellerman
On Fri, 2019-02-01 at 00:42:01 UTC, Oliver O'Halloran wrote:
> The IODA reset is used to flush out any OS controlled state from the PHB.
> This reset can fail if a PHB fatal error has occurred in early boot,
> probably because of a bad device. We already do a fundamental
> reset of the device in some cases, so this patch just adds a test to force
> a full reset if firmware reports an error when performing the IODA reset.
> 
> Signed-off-by: Oliver O'Halloran 
> Reviewed-by: Alexey Kardashevskiy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b174b4fb919d118d9ac546b99a69574d

cheers
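
A sketch of the escalation logic described above (function names are
illustrative, not the exact ones from pci-ioda.c):

	/* sketch: if firmware reports an error for the IODA table reset,
	 * escalate to a complete PHB reset */
	rc = pnv_ioda_table_reset(phb);		/* hypothetical helper */
	if (rc)
		rc = pnv_full_phb_reset(phb);	/* hypothetical helper */
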


Re: [V2] powerpc/ptrace: Mitigate potential Spectre v1

2019-02-08 Thread Michael Ellerman
On Wed, 2019-01-30 at 12:46:00 UTC, Breno Leitao wrote:
> 'regno' is directly controlled by user space, hence leading to a potential
> exploitation of the Spectre variant 1 vulnerability.
> 
> On PTRACE_SETREGS and PTRACE_GETREGS requests, user space passes the
> register number that would be read or written. This register number is
> called 'regno' which is part of the 'addr' syscall parameter.
> 
> This 'regno' value is checked against the maximum pt_regs structure size,
> and then used to dereference it, which matches the initial part of a
> Spectre v1 (and Spectre v1.1) attack. The dereferenced value, then,
> is returned to userspace in the GETREGS case.
> 
> This patch sanitizes 'regno' before using it to dereference pt_reg.
> 
> Notice that given that speculation windows are large, the policy is
> to kill the speculation on the first load and not worry if it can be
> completed with a dependent load/store [1].
> 
> [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
> 
> Signed-off-by: Breno Leitao 
> Acked-by: Gustavo A. R. Silva 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/ebb0e13ead2ddc186a80b1b0235deeef

cheers
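
The standard way to "kill the speculation on the first load" is the
array_index_nospec() helper from linux/nospec.h; a minimal sketch of the
pattern (the bound shown is illustrative, not the exact expression applied):

	#include <linux/nospec.h>

	/* clamp the user-controlled index under speculation before using it
	 * to dereference pt_regs */
	regno = array_index_nospec(regno,
				   sizeof(struct pt_regs) / sizeof(unsigned long));
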


Re: powerpc: Enable kernel XZ compression option on 44x

2019-02-08 Thread Michael Ellerman
On Thu, 2019-01-31 at 20:59:04 UTC, Christian Lamparter wrote:
> Enable kernel XZ compression option on 44x.
> Tested on a Western Digital - MyBook Live NAS.
> It takes 22 seconds for the 800 MHz CPU to decompress
> and boot a 2.63 MiB XZ-compressed kernel simpleImage.
> 
> Signed-off-by: Christian Lamparter 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/423bfc69d7f491c47fc35921f7d460be

cheers


Re: [v2] powerpc/traps: fix the message printed when stack overflows

2019-02-08 Thread Michael Ellerman
On Tue, 2019-01-29 at 16:37:55 UTC, Christophe Leroy wrote:
> Today's message is useless:
> 
> [   42.253267] Kernel stack overflow in process (ptrval), r1=c65500b0
> 
> This patch fixes it:
> 
> [   66.905235] Kernel stack overflow in process sh[356], r1=c65560b0
> 
> Fixes: ad67b74d2469 ("printk: hash addresses printed with %p")
> Cc: 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/9bf3d3c4e4fd82c7174f4856df372ab2

cheers
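
Given the desired output above, the fix is essentially to print the task
name and pid explicitly instead of a now-hashed pointer; roughly (a sketch,
not necessarily the exact hunk):

	pr_emerg("Kernel stack overflow in process %s[%d], r1=%lx\n",
		 current->comm, task_pid_nr(current), regs->gpr[1]);
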


Re: [RESEND, v3] cxl: Wrap iterations over afu slices inside 'afu_list_lock'

2019-02-08 Thread Michael Ellerman
On Tue, 2019-01-29 at 11:06:18 UTC, Vaibhav Jain wrote:
> Within the cxl module, iteration over the array 'adapter->afu' may be racy
> at a few points, as it might be simultaneously read during an EEH and its
> contents being set to NULL while driver is being unloaded or unbound
> from the adapter. This might result in a NULL pointer to 'struct afu'
> being de-referenced during an EEH thereby causing a kernel oops.
> 
> This patch fixes this by making sure that all access to the array
> 'adapter->afu' is wrapped within the context of spin-lock
> 'adapter->afu_list_lock'.
> 
> Cc: sta...@vger.kernel.org
> Fixes: 9e8df8a2196 ("cxl: EEH support")
> Acked-by: Andrew Donnellan 
> Acked-by: Frederic Barrat 
> Acked-by: Christophe Lombard 
> Signed-off-by: Vaibhav Jain 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/edeb304f659792fb5bab90d7d6f3408b

cheers
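
The shape of the fix is the classic "hold the lock across the whole
iteration" pattern; a sketch using the field names from the quoted message:

	int i;

	spin_lock(&adapter->afu_list_lock);
	for (i = 0; i < adapter->slices; i++) {
		struct cxl_afu *afu = adapter->afu[i];

		if (afu == NULL)	/* slice may have been torn down */
			continue;
		/* ... EEH handling for this AFU ... */
	}
	spin_unlock(&adapter->afu_list_lock);
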


Re: powerpc/kernel/time: Remove duplicate header

2019-02-08 Thread Michael Ellerman
On Mon, 2019-01-28 at 16:11:36 UTC, Brajeswar Ghosh wrote:
> Remove linux/rtc.h which is included more than once
> 
> Signed-off-by: Brajeswar Ghosh 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/75f8a37580b64f87c223fbd08db6b2f7

cheers


Re: powerpc/mm: Add _PAGE_SAO to _PAGE_CACHE_CTL mask

2019-02-08 Thread Michael Ellerman
On Mon, 2019-01-28 at 17:31:42 UTC, Reza Arbab wrote:
> In htab_convert_pte_flags(), _PAGE_CACHE_CTL is used to check for the
> _PAGE_SAO flag:
> 
>   else if ((pteflags & _PAGE_CACHE_CTL) == _PAGE_SAO)
>   rflags |= (HPTE_R_W | HPTE_R_I | HPTE_R_M);
> 
> But, it isn't defined to include that flag:
> 
>   #define _PAGE_CACHE_CTL (_PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT)
> 
> This happens to work, but only because of the flag values:
> 
>   #define _PAGE_SAO   0x00010 /* Strong access order */
>   #define _PAGE_NON_IDEMPOTENT0x00020 /* non idempotent memory */
>   #define _PAGE_TOLERANT  0x00030 /* tolerant memory, cache inhibited */
> 
> To prevent any issues if these particulars ever change, add _PAGE_SAO to
> the mask.
> 
> Suggested-by: Charles Johns 
> Signed-off-by: Reza Arbab 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/865a9432d16fe2f40a1a52005fd30778

cheers
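
With the flag values quoted above, the corrected mask is simply:

	/* values from the quoted message; including _PAGE_SAO makes the
	 * "(pteflags & _PAGE_CACHE_CTL) == _PAGE_SAO" test robust against
	 * future changes to the flag encoding */
	#define _PAGE_CACHE_CTL	(_PAGE_SAO | _PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT)
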


Re: [1/4] powerpc/64s: Clear on-stack exception marker upon exception return

2019-02-08 Thread Michael Ellerman
On Tue, 2019-01-22 at 15:57:21 UTC, Joe Lawrence wrote:
> From: Nicolai Stange 
> 
> The ppc64 specific implementation of the reliable stacktracer,
> save_stack_trace_tsk_reliable(), bails out and reports an "unreliable
> trace" whenever it finds an exception frame on the stack. Stack frames
> are classified as exception frames if the STACK_FRAME_REGS_MARKER magic,
> as written by exception prologues, is found at a particular location.
> 
> However, as observed by Joe Lawrence, it is possible in practice that
> non-exception stack frames can alias with prior exception frames and thus,
> that the reliable stacktracer can find a stale STACK_FRAME_REGS_MARKER on
> the stack. It in turn falsely reports an unreliable stacktrace and blocks
> any live patching transition to finish. Said condition lasts until the
> stack frame is overwritten/initialized by function call or other means.
> 
> In principle, we could mitigate this by making the exception frame
> classification condition in save_stack_trace_tsk_reliable() stronger:
> in addition to testing for STACK_FRAME_REGS_MARKER, we could also take into
> account that for all exceptions executing on the kernel stack
> - their stack frames's backlink pointers always match what is saved
>   in their pt_regs instance's ->gpr[1] slot and that
> - their exception frame size equals STACK_INT_FRAME_SIZE, a value
>   uncommonly large for non-exception frames.
> 
> However, while these are currently true, relying on them would make the
> reliable stacktrace implementation more sensitive towards future changes in
> the exception entry code. Note that false negatives, i.e. not detecting
> exception frames, would silently break the live patching consistency model.
> 
> Furthermore, certain other places (diagnostic stacktraces, perf, xmon)
> rely on STACK_FRAME_REGS_MARKER as well.
> 
> Make the exception exit code clear the on-stack STACK_FRAME_REGS_MARKER
> for those exceptions running on the "normal" kernel stack and returning
> to kernelspace: because the topmost frame is ignored by the reliable stack
> tracer anyway, returns to userspace don't need to take care of clearing
> the marker.
> 
> Furthermore, as I don't have the ability to test this on Book 3E or
> 32 bits, limit the change to Book 3S and 64 bits.
> 
> Finally, make the HAVE_RELIABLE_STACKTRACE Kconfig option depend on
> PPC_BOOK3S_64 for documentation purposes. Before this patch, it depended
> on PPC64 && CPU_LITTLE_ENDIAN and because CPU_LITTLE_ENDIAN implies
> PPC_BOOK3S_64, there's no functional change here.
> 
> Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model")
> Reported-by: Joe Lawrence 
> Signed-off-by: Nicolai Stange 
> Signed-off-by: Joe Lawrence 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/eddd0b332304d554ad6243942f87c2fc

cheers
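
For context, the exception-frame classification the message refers to looks
roughly like this in save_stack_trace_tsk_reliable() (a simplified sketch,
not the exact upstream code):

	/* a frame is treated as an exception frame -- and the whole trace as
	 * unreliable -- when the marker written by the exception prologue is
	 * found at the expected offset in the frame */
	if (stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER)
		return 1;	/* unreliable */
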


Re: powerpc/cell: Remove duplicate header

2019-02-08 Thread Michael Ellerman
On Thu, 2019-01-17 at 16:19:05 UTC, Sabyasachi Gupta wrote:
> Remove linux/syscalls.h which is included more than once
> 
> Signed-off-by: Sabyasachi Gupta 
> Acked-by: Souptick Joarder 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/45a202a3fefc6ee7b19b1222bfb5b067

cheers


Re: powerpc/powernv: Remove duplicate header

2019-02-08 Thread Michael Ellerman
On Thu, 2019-01-17 at 16:10:33 UTC, Sabyasachi Gupta wrote:
> Remove linux/printk.h which is included more than once.
> 
> Signed-off-by: Sabyasachi Gupta 
> Acked-by: Souptick Joarder 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f069a062ecce7ccc17221c24097826e8

cheers


Re: [v2] powerpc/perf: Add mem access events to sysfs

2019-02-08 Thread Michael Ellerman
On Mon, 2018-12-10 at 03:59:05 UTC, Madhavan Srinivasan wrote:
> Add mem-loads/mem-stores events to sysfs.
> The event is formed based on raw event encoding.
> Primary PMU event used here is PM_MRK_INST_CMPL
> along with MMCRA[SM] modes and Thresholding bit
> 
> Signed-off-by: Madhavan Srinivasan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/ab4510e9ac6dcdd5e9ec0380bec279b5

cheers


Re: [1/6] powerpc/eeh: Cleanup eeh_pe_clear_frozen_state()

2019-02-08 Thread Michael Ellerman
On Thu, 2018-11-29 at 03:16:37 UTC, Sam Bobroff wrote:
> The 'clear_sw_state' parameter for eeh_pe_clear_frozen_state() is
> redundant because it has no effect (except in the rare case of a
> hardware error part way through unfreezing a tree of PEs, where it
> would dangerously allow partial de-isolation before returning
> failure).
> 
> It is passed down to __eeh_pe_clear_frozen_state(), and from there to
> eeh_unfreeze_pe(), where it causes EEH_PE_ISOLATED to be removed
> from the state of each PE during the traversal.  However, when the
> traversal finishes, EEH_PE_ISOLATED is unconditionally removed by a
> call to eeh_pe_state_clear() regardless of the parameter's value.
> 
> So remove the flag and pass false to eeh_unfreeze_pe() (to avoid the
> rare case described above, as it was before the flag was introduced).
> Also, perform the recursion directly in the function and eliminate a
> bit of boilerplate.
> 
> There should be no change in functionality, except as mentioned above.
> 
> Signed-off-by: Sam Bobroff 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/3376cb91ed908eb0728900894a77d820

cheers
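
In terms of interface, the cleanup boils down to dropping the flag from the
call chain; a sketch of the resulting signature change (assumed from the
description above, not the merged hunk):

	-int eeh_unfreeze_pe(struct eeh_pe *pe, bool clear_sw_state);
	+int eeh_unfreeze_pe(struct eeh_pe *pe);
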


Re: [PATCH] powerpc: Make PPC_64K_PAGES depend on only 44x or PPC_BOOK3S_64

2019-02-08 Thread Christophe Leroy




On 02/08/2019 12:34 PM, Michael Ellerman wrote:

In commit 7820856a4fcd ("powerpc/mm/book3e/64: Remove unsupported
64Kpage size from 64bit booke") we dropped the 64K page size support
from the 64-bit nohash (Book3E) code.

But we didn't update the dependencies of the PPC_64K_PAGES option,
meaning a randconfig can still trigger this code and cause a build
breakage, eg:
   arch/powerpc/include/asm/nohash/64/pgtable.h:14:2: error: #error "Page size not supported"
   arch/powerpc/include/asm/nohash/mmu-book3e.h:275:2: error: #error Unsupported page size

So remove PPC_BOOK3E_64 from the dependencies. This also means we
don't need to worry about PPC_FSL_BOOK3E, because that was just trying
to prevent the PPC_BOOK3E_64=y && PPC_FSL_BOOK3E=y case.


Does it means some cleanup could be done, for instance:

arch/powerpc/include/asm/nohash/64/pgalloc.h:#ifndef CONFIG_PPC_64K_PAGES
arch/powerpc/include/asm/nohash/64/pgalloc.h:#endif /* CONFIG_PPC_64K_PAGES */

arch/powerpc/include/asm/nohash/64/pgtable.h:#ifdef CONFIG_PPC_64K_PAGES
arch/powerpc/include/asm/nohash/64/slice.h:#ifdef CONFIG_PPC_64K_PAGES
arch/powerpc/include/asm/nohash/64/slice.h:#else /* CONFIG_PPC_64K_PAGES */
arch/powerpc/include/asm/nohash/64/slice.h:#endif /* !CONFIG_PPC_64K_PAGES */

arch/powerpc/include/asm/nohash/pte-book3e.h:#ifdef CONFIG_PPC_64K_PAGES

arch/powerpc/mm/tlb_low_64e.S:#ifdef CONFIG_PPC_64K_PAGES
arch/powerpc/mm/tlb_low_64e.S:#ifndef CONFIG_PPC_64K_PAGES
arch/powerpc/mm/tlb_low_64e.S:#endif /* CONFIG_PPC_64K_PAGES */
arch/powerpc/mm/tlb_low_64e.S:#ifdef CONFIG_PPC_64K_PAGES
arch/powerpc/mm/tlb_low_64e.S:#ifdef CONFIG_PPC_64K_PAGES
arch/powerpc/mm/tlb_low_64e.S:#ifndef CONFIG_PPC_64K_PAGES
arch/powerpc/mm/tlb_low_64e.S:#endif /* CONFIG_PPC_64K_PAGES */
arch/powerpc/mm/tlb_low_64e.S:#ifndef CONFIG_PPC_64K_PAGES
arch/powerpc/mm/tlb_low_64e.S:#endif /* CONFIG_PPC_64K_PAGES */
arch/powerpc/mm/tlb_low_64e.S:#ifndef CONFIG_PPC_64K_PAGES
arch/powerpc/mm/tlb_low_64e.S:#ifdef CONFIG_PPC_64K_PAGES


Christophe



Signed-off-by: Michael Ellerman 
---
  arch/powerpc/Kconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3f237ffa0649..7a16b8a7b54b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -694,7 +694,7 @@ config PPC_16K_PAGES
  
  config PPC_64K_PAGES

bool "64k page size"
-   depends on !PPC_FSL_BOOK3E && (44x || PPC_BOOK3S_64 || PPC_BOOK3E_64)
+   depends on 44x || PPC_BOOK3S_64
select HAVE_ARCH_SOFT_DIRTY if PPC_BOOK3S_64
  
  config PPC_256K_PAGES




Re: [PATCH 5/7] powerpc/pci: Add pci_find_hose_for_domain()

2019-02-08 Thread Oliver
On Fri, Feb 8, 2019 at 8:57 PM Michael Ellerman  wrote:
>
> Oliver O'Halloran  writes:
> > diff --git a/arch/powerpc/include/asm/pci-bridge.h 
> > b/arch/powerpc/include/asm/pci-bridge.h
> > index aee4fcc24990..149053b7f481 100644
> > --- a/arch/powerpc/include/asm/pci-bridge.h
> > +++ b/arch/powerpc/include/asm/pci-bridge.h
> > @@ -274,6 +274,8 @@ extern int pcibios_map_io_space(struct pci_bus *bus);
> >  extern struct pci_controller *pci_find_hose_for_OF_device(
> >   struct device_node* node);
> >
> > +extern struct pci_controller *pci_find_hose_for_domain(uint32_t domain_nr);
>
> I know we use "hose" a lot in the PCI code, but it's a stupid name. Can
> we not introduce new usages?

I was tempted to call it pci_find_horse_for_domain(), but neigh.

>
> It returns a pci_controller so pci_find_controller_for_domain()?

ok

>
> cheers
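
The agreed rename would make the declaration read something like (a sketch
of the prototype only, not the merged hunk):

	extern struct pci_controller *pci_find_controller_for_domain(int domain_nr);
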


[RFC PATCH v1 16/16] powerpc/32: don't do syscall stuff in transfer_to_handler on non BOOKE

2019-02-08 Thread Christophe Leroy
As syscalls are now handled via a fast entry path, syscall-related
actions can be removed from the generic transfer_to_handler path.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 0927d5ff1e79..85f1fc88c237 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -211,7 +211,9 @@ transfer_to_handler_cont:
 */
tophys(r12, r1)
lwz r12,_MSR(r12)
+#ifdef CONFIG_BOOKE/* to be removed once BOOKE uses fast syscall entry */
xor r12,r10,r12
+#endif
andi.   r12,r12,MSR_EE
bne 1f
 
@@ -252,8 +254,10 @@ reenable_mmu:
 * the rest is restored from the exception frame.
 */
 
+#ifdef CONFIG_BOOKE/* to be removed once BOOKE uses fast syscall entry */
/* Are we enabling or disabling interrupts ? */
andi.   r0,r10,MSR_EE
+#endif
 
stwur1,-32(r1)
stw r9,8(r1)
@@ -262,7 +266,9 @@ reenable_mmu:
stw r4,20(r1)
stw r5,24(r1)
 
+#ifdef CONFIG_BOOKE/* to be removed once BOOKE uses fast syscall entry */
bne-0f
+#endif
 
/* If we are disabling interrupts (normal case), simply log it with
 * lockdep
@@ -282,6 +288,7 @@ reenable_mmu:
mtlrr9
bctr/* jump to handler */
 
+#ifdef CONFIG_BOOKE/* to be removed once BOOKE uses fast syscall entry */
/* If we are enabling interrupt, this is a syscall. They shouldn't
 * happen while interrupts are disabled, so let's do a warning here.
 */
@@ -294,6 +301,7 @@ reenable_mmu:
ori r10,r10,MSR_EE
mtmsr   r10
b   2b
+#endif
 #endif /* CONFIG_TRACE_IRQFLAGS */
 
 #if defined (CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500)
-- 
2.13.3



[RFC PATCH v1 15/16] powerpc/32: Remove MSR_PR test when returning from syscall

2019-02-08 Thread Christophe Leroy
Syscalls come from user mode only, so we can account time without checking
whether we are returning to kernel or user, as it will always be user.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 137bd2103051..0927d5ff1e79 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -482,12 +482,7 @@ BEGIN_FTR_SECTION
lwarx   r7,0,r1
 END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
stwcx.  r0,0,r1 /* to clear the reservation */
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-   andi.   r4,r8,MSR_PR
-   beq 3f
ACCOUNT_CPU_USER_EXIT(r2, r5, r7)
-3:
-#endif
lwz r4,_LINK(r1)
lwz r5,_CCR(r1)
mtlrr4
-- 
2.13.3



[RFC PATCH v1 14/16] powerpc/32: implement fast entry for syscalls on non BOOKE

2019-02-08 Thread Christophe Leroy
This patch implements a fast entry for syscalls.

Syscalls don't have to preserve non-volatile registers except LR.

This patch therefore implements a fast entry for syscalls, where
volatile registers get clobbered.

As this entry is dedicated to syscalls, it always sets MSR_EE
and warns in case MSR_EE was previously off.

It also assumes that the call is always from user mode; system calls
from the kernel are unexpected.

The overall series improves the null_syscall selftest by 12.5% on an 83xx
and by 17% on a 8xx.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S | 42 +
 arch/powerpc/kernel/head_32.S  |  3 +-
 arch/powerpc/kernel/head_32.h  | 85 --
 arch/powerpc/kernel/head_40x.S |  3 +-
 arch/powerpc/kernel/head_8xx.S |  3 +-
 5 files changed, 126 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 1e11528d45ae..137bd2103051 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -335,6 +335,46 @@ stack_ovf:
SYNC
RFI
 
+#ifndef CONFIG_BOOKE   /* to be removed once BOOKE uses fast syscall entry */
+#ifdef CONFIG_TRACE_IRQFLAGS
+trace_syscall_entry_irq_off:
+   /*
+* The trace_hardirqs_off will use CALLER_ADDR0 and CALLER_ADDR1.
+* If from user mode there is only one stack frame on the stack, and
+* accessing CALLER_ADDR1 will cause oops. So we need create a dummy
+* stack frame to make trace_hardirqs_on happy.
+*
+*/
+   stwur1,-32(r1)
+
+   /*
+* Syscall shouldn't happen while interrupts are disabled,
+* so let's do a warning here.
+*/
+0: trap
+   EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING
+   bl  trace_hardirqs_on
+
+   addir1,r1,32
+
+   /* Now enable for real */
+   LOAD_MSR_KERNEL(r10, MSR_KERNEL | MSR_EE)
+   mtmsr   r10
+
+   REST_GPR(0, r1)
+   REST_4GPRS(3, r1)
+   REST_2GPRS(7, r1)
+   b   DoSyscall
+#endif /* CONFIG_TRACE_IRQFLAGS */
+
+   .globl  transfer_to_syscall
+transfer_to_syscall:
+#ifdef CONFIG_TRACE_IRQFLAGS
+   andi.   r12,r9,MSR_EE
+   beq-trace_syscall_entry_irq_off
+#endif /* CONFIG_TRACE_IRQFLAGS */
+#endif /* !CONFIG_BOOKE */
+
 /*
  * Handle a system call.
  */
@@ -346,9 +386,11 @@ _GLOBAL(DoSyscall)
stw r3,ORIG_GPR3(r1)
li  r12,0
stw r12,RESULT(r1)
+#ifdef CONFIG_BOOKE/* to be removed once BOOKE uses fast syscall entry */
lwz r11,_CCR(r1)/* Clear SO bit in CR */
rlwinm  r11,r11,0,4,2
stw r11,_CCR(r1)
+#endif
 
 #ifdef CONFIG_TRACE_IRQFLAGS
/* Make sure interrupts are enabled */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 3a1df9edf6da..7576e1374a69 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -368,8 +368,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
. = 0xc00
DO_KVM  0xc00
 SystemCall:
-   EXCEPTION_PROLOG
-   EXC_XFER_SYS(0xc00, DoSyscall)
+   SYSCALL_ENTRY   0xc00
 
 /* Single step - not used on 601 */
EXCEPTION(0xd00, SingleStep, single_step_exception, EXC_XFER_STD)
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 14cb0af2f494..4a692553651f 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -73,6 +73,87 @@
SAVE_2GPRS(7, r11)
 .endm
 
+.macro SYSCALL_ENTRY trapno
+   mfspr   r12,SPRN_SPRG_THREAD
+   mfcrr10
+   lwz r11,TASK_STACK-THREAD(r12)
+   mflrr9
+   addir11,r11,THREAD_SIZE - INT_FRAME_SIZE
+   rlwinm  r10,r10,0,4,2   /* Clear SO bit in CR */
+   tophys(r11,r11)
+   stw r10,_CCR(r11)   /* save registers */
+   mfspr   r10,SPRN_SRR0
+   stw r9,_LINK(r11)
+   mfspr   r9,SPRN_SRR1
+   stw r1,GPR1(r11)
+   stw r1,0(r11)
+   tovirt(r1,r11)  /* set new kernel sp */
+   stw r10,_NIP(r11)
+#ifdef CONFIG_40x
+   rlwinm  r9,r9,0,14,12   /* clear MSR_WE (necessary?) */
+#else
+   LOAD_MSR_KERNEL(r10, MSR_KERNEL & ~(MSR_IR|MSR_DR)) /* can take exceptions */
+   MTMSRD(r10) /* (except for mach check in rtas) */
+#endif
+   lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */
+   stw r2,GPR2(r11)
+   addir10,r10,STACK_FRAME_REGS_MARKER@l
+   stw r9,_MSR(r11)
+   li  r2, \trapno + 1
+   stw r10,8(r11)
+   stw r2,_TRAP(r11)
+   SAVE_GPR(0, r11)
+   SAVE_4GPRS(3, r11)
+   SAVE_2GPRS(7, r11)
+   addir11,r1,STACK_FRAME_OVERHEAD
+   addir2,r12,-THREAD
+   stw r11,PT_REGS(r12)
+#if defined(CONFIG_40x)
+   /* Check to see if the dbcr0 register is set up to debug.  Use the
+  internal debug mode bit to do this. */
+   lwz

[RFC PATCH v1 13/16] powerpc: Fix 32-bit handling of MSR_EE on exceptions

2019-02-08 Thread Christophe Leroy
[text mostly copied from benh's RFC/WIP]

ppc32 is still doing something rather gothic and wrong on 32-bit
which we stopped doing on 64-bit a while ago.

We have that thing where some handlers "copy" the EE value from the
original stack frame into the new MSR before transferring to the
handler.

Thus for a number of exceptions, we enter the handlers with interrupts
enabled.

This is rather fishy, some of the stuff that handlers might do early
on such as irq_enter/exit or user_exit, context tracking, etc...
should be run with interrupts off afaik.

Generally our handlers know when to re-enable interrupts if needed.

The problem we were having is that we assumed these interrupts would
return with interrupts enabled. However that isn't the case.

Instead, this patch changes things so that we always enter exception
handlers with interrupts *off* with the notable exception of syscalls
which are special (and get a fast path).

Suggested-by: Benjamin Herrenschmidt 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S | 117 -
 1 file changed, 68 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index b489aebdc5c5..1e11528d45ae 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "head_32.h"
 
@@ -200,19 +201,42 @@ transfer_to_handler_cont:
mtspr   SPRN_NRI, r0
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
+   /*
+* When tracing IRQ state (lockdep) we enable the MMU before we call
+* the IRQ tracing functions as they might access vmalloc space or
+* perform IOs for console output.
+*
+* To speed up the syscall path where interrupts stay on, let's check
+* first if we are changing the MSR value at all.
+*/
+   tophys(r12, r1)
+   lwz r12,_MSR(r12)
+   xor r12,r10,r12
+   andi.   r12,r12,MSR_EE
+   bne 1f
+
+   /* MSR isn't changing, just transition directly */
+#endif
+   mtspr   SPRN_SRR0,r11
+   mtspr   SPRN_SRR1,r10
+   mtlrr9
+   SYNC
+   RFI /* jump to handler, enable MMU */
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+1: /* MSR is changing, re-enable MMU so we can notify lockdep. We need to
+* keep interrupts disabled at this point otherwise we might risk
+* taking an interrupt before we tell lockdep they are enabled.
+*/
lis r12,reenable_mmu@h
ori r12,r12,reenable_mmu@l
+   LOAD_MSR_KERNEL(r0, MSR_KERNEL)
mtspr   SPRN_SRR0,r12
-   mtspr   SPRN_SRR1,r10
+   mtspr   SPRN_SRR1,r0
SYNC
RFI
-reenable_mmu:  /* re-enable mmu so we can */
-   mfmsr   r10
-   lwz r12,_MSR(r1)
-   xor r10,r10,r12
-   andi.   r10,r10,MSR_EE  /* Did EE change? */
-   beq 1f
 
+reenable_mmu:
/*
 * The trace_hardirqs_off will use CALLER_ADDR0 and CALLER_ADDR1.
 * If from user mode there is only one stack frame on the stack, and
@@ -227,14 +251,24 @@ reenable_mmu: /* re-enable mmu so we can */
 * they aren't useful past this point (aren't syscall arguments),
 * the rest is restored from the exception frame.
 */
+
+   /* Are we enabling or disabling interrupts ? */
+   andi.   r0,r10,MSR_EE
+
stwur1,-32(r1)
stw r9,8(r1)
stw r11,12(r1)
stw r3,16(r1)
stw r4,20(r1)
stw r5,24(r1)
-   bl  trace_hardirqs_off
-   lwz r5,24(r1)
+
+   bne-0f
+
+   /* If we are disabling interrupts (normal case), simply log it with
+* lockdep
+*/
+1: bl  trace_hardirqs_off
+2: lwz r5,24(r1)
lwz r4,20(r1)
lwz r3,16(r1)
lwz r11,12(r1)
@@ -244,15 +278,22 @@ reenable_mmu: /* re-enable mmu so we can */
lwz r6,GPR6(r1)
lwz r7,GPR7(r1)
lwz r8,GPR8(r1)
-1: mtctr   r11
+   mtctr   r11
mtlrr9
bctr/* jump to handler */
-#else /* CONFIG_TRACE_IRQFLAGS */
-   mtspr   SPRN_SRR0,r11
-   mtspr   SPRN_SRR1,r10
-   mtlrr9
-   SYNC
-   RFI /* jump to handler, enable MMU */
+
+   /* If we are enabling interrupt, this is a syscall. They shouldn't
+* happen while interrupts are disabled, so let's do a warning here.
+*/
+0: trap
+   EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING
+   bl  trace_hardirqs_on
+
+   /* Now enable for real */
+   mfmsr   r10
+   ori r10,r10,MSR_EE
+   mtmsr   r10
+   b   2b
 #endif /* CONFIG_TRACE_IRQFLAGS */
 
 #if defined (CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500)
@@ -308,30 

[RFC PATCH v1 12/16] powerpc/32: get rid of COPY_EE in exception entry

2019-02-08 Thread Christophe Leroy
EXC_XFER_TEMPLATE() is not called with COPY_EE anymore so
we can get rid of copyee parameters and related COPY_EE and NOCOPY
macros.

Suggested-by: Benjamin Herrenschmidt 
[split out from benh's RFC patch]

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.h| 12 
 arch/powerpc/kernel/head_40x.S   |  8 +++-
 arch/powerpc/kernel/head_booke.h | 22 --
 3 files changed, 15 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 8881b6887841..14cb0af2f494 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -103,28 +103,24 @@
addir3,r1,STACK_FRAME_OVERHEAD; \
xfer(n, hdlr)
 
-#define EXC_XFER_TEMPLATE(hdlr, trap, msr, copyee, tfer, ret)  \
+#define EXC_XFER_TEMPLATE(hdlr, trap, msr, tfer, ret)  \
li  r10,trap;   \
stw r10,_TRAP(r11); \
LOAD_MSR_KERNEL(r10, msr);  \
-   copyee(r10, r9);\
bl  tfer;   \
.long   hdlr;   \
.long   ret
 
-#define COPY_EE(d, s)  rlwimi d,s,0,MSR_EE
-#define NOCOPY(d, s)
-
 #define EXC_XFER_STD(n, hdlr)  \
-   EXC_XFER_TEMPLATE(hdlr, n, MSR_KERNEL, NOCOPY, transfer_to_handler_full, \
+   EXC_XFER_TEMPLATE(hdlr, n, MSR_KERNEL, transfer_to_handler_full, \
  ret_from_except_full)
 
 #define EXC_XFER_LITE(n, hdlr) \
-   EXC_XFER_TEMPLATE(hdlr, n+1, MSR_KERNEL, NOCOPY, transfer_to_handler, \
+   EXC_XFER_TEMPLATE(hdlr, n+1, MSR_KERNEL, transfer_to_handler, \
  ret_from_except)
 
 #define EXC_XFER_SYS(n, hdlr)  \
-   EXC_XFER_TEMPLATE(hdlr, n+1, MSR_KERNEL | MSR_EE, NOCOPY, transfer_to_handler, \
+   EXC_XFER_TEMPLATE(hdlr, n+1, MSR_KERNEL | MSR_EE, transfer_to_handler, \
  ret_from_except)
 
 #endif /* __HEAD_32_H__ */
diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 549db4fed183..1afab9190147 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -166,8 +166,7 @@ _ENTRY(saved_ksp_limit)
CRITICAL_EXCEPTION_PROLOG;  \
addir3,r1,STACK_FRAME_OVERHEAD; \
EXC_XFER_TEMPLATE(hdlr, n+2, (MSR_KERNEL & ~(MSR_ME|MSR_DE|MSR_CE)), \
- NOCOPY, crit_transfer_to_handler, \
- ret_from_crit_exc)
+ crit_transfer_to_handler, ret_from_crit_exc)
 
 /*
  * 0x0100 - Critical Interrupt Exception
@@ -651,7 +650,7 @@ _ENTRY(saved_ksp_limit)
addir3,r1,STACK_FRAME_OVERHEAD
EXC_XFER_TEMPLATE(DebugException, 0x2002, \
(MSR_KERNEL & ~(MSR_ME|MSR_DE|MSR_CE)), \
-   NOCOPY, crit_transfer_to_handler, ret_from_crit_exc)
+   crit_transfer_to_handler, ret_from_crit_exc)
 
/* Programmable Interval Timer (PIT) Exception. (from 0x1000) */
 Decrementer:
@@ -673,8 +672,7 @@ WDTException:
addir3,r1,STACK_FRAME_OVERHEAD;
EXC_XFER_TEMPLATE(WatchdogException, 0x1020+2,
  (MSR_KERNEL & ~(MSR_ME|MSR_DE|MSR_CE)),
- NOCOPY, crit_transfer_to_handler,
- ret_from_crit_exc)
+ crit_transfer_to_handler, ret_from_crit_exc)
 
 /*
  * The other Data TLB exceptions bail out to this point
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 264976c43f34..56dd1341eb3d 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -217,8 +217,7 @@ END_BTB_FLUSH_SECTION
CRITICAL_EXCEPTION_PROLOG(intno);   \
addir3,r1,STACK_FRAME_OVERHEAD; \
EXC_XFER_TEMPLATE(hdlr, n+2, (MSR_KERNEL & ~(MSR_ME|MSR_DE|MSR_CE)), \
- NOCOPY, crit_transfer_to_handler, \
- ret_from_crit_exc)
+ crit_transfer_to_handler, ret_from_crit_exc)
 
 #define MCHECK_EXCEPTION(n, label, hdlr)   \
START_EXCEPTION(label); \
@@ -227,32 +226,27 @@ END_BTB_FLUSH_SECTION
stw r5,_ESR(r11);   \
addir3,r1,STACK_FRAME_OVERHEAD; \
EXC_XFER_TEMPLATE(hdlr, n+4, (MSR_KERNEL & ~(MSR_ME|MSR_DE|MSR_CE)), \
- NOCOPY, mcheck_transfer_to_handler,   \
- ret_from_mcheck_exc)
+ mcheck_transfer_to_handler, ret_from_mcheck_exc)
 
-#define EXC_XFER_TEMPLATE(hdlr, trap, msr, copyee, tfer, ret)  \
+#define EXC_XFER_T

[RFC PATCH v1 11/16] powerpc/32: Enter exceptions with MSR_EE unset

2019-02-08 Thread Christophe Leroy
All exception handlers know when to re-enable interrupts, so
it is safer to enter all of them with MSR_EE unset, except
for syscalls.

Suggested-by: Benjamin Herrenschmidt 
[split out from benh's RFC patch]

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.S| 68 ++--
 arch/powerpc/kernel/head_32.h|  8 -
 arch/powerpc/kernel/head_40x.S   | 44 +++
 arch/powerpc/kernel/head_44x.S   |  6 ++--
 arch/powerpc/kernel/head_8xx.S   | 32 -
 arch/powerpc/kernel/head_booke.h | 12 ++-
 arch/powerpc/kernel/head_fsl_booke.S | 26 +++---
 7 files changed, 90 insertions(+), 106 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 33fb08b2398f..3a1df9edf6da 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -335,7 +335,7 @@ Alignment:
mfspr   r5,SPRN_DSISR
stw r5,_DSISR(r11)
addir3,r1,STACK_FRAME_OVERHEAD
-   EXC_XFER_EE(0x600, alignment_exception)
+   EXC_XFER_STD(0x600, alignment_exception)
 
 /* Program check exception */
EXCEPTION(0x700, ProgramCheck, program_check_exception, EXC_XFER_STD)
@@ -356,13 +356,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
bl  load_up_fpu /* if from user, just load it up */
b   fast_exception_return
 1: addir3,r1,STACK_FRAME_OVERHEAD
-   EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
+   EXC_XFER_LITE(0x800, kernel_fp_unavailable_exception)
 
 /* Decrementer */
EXCEPTION(0x900, Decrementer, timer_interrupt, EXC_XFER_LITE)
 
-   EXCEPTION(0xa00, Trap_0a, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0xb00, Trap_0b, unknown_exception, EXC_XFER_EE)
+   EXCEPTION(0xa00, Trap_0a, unknown_exception, EXC_XFER_STD)
+   EXCEPTION(0xb00, Trap_0b, unknown_exception, EXC_XFER_STD)
 
 /* System call */
. = 0xc00
@@ -373,7 +373,7 @@ SystemCall:
 
 /* Single step - not used on 601 */
EXCEPTION(0xd00, SingleStep, single_step_exception, EXC_XFER_STD)
-   EXCEPTION(0xe00, Trap_0e, unknown_exception, EXC_XFER_EE)
+   EXCEPTION(0xe00, Trap_0e, unknown_exception, EXC_XFER_STD)
 
 /*
  * The Altivec unavailable trap is at 0x0f20.  Foo.
@@ -617,35 +617,35 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_DTLB_SW_LRU)
 #define altivec_assist_exception   unknown_exception
 #endif
 
-   EXCEPTION(0x1300, Trap_13, instruction_breakpoint_exception, EXC_XFER_EE)
-   EXCEPTION(0x1400, SMI, SMIException, EXC_XFER_EE)
-   EXCEPTION(0x1500, Trap_15, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x1600, Trap_16, altivec_assist_exception, EXC_XFER_EE)
+   EXCEPTION(0x1300, Trap_13, instruction_breakpoint_exception, EXC_XFER_STD)
+   EXCEPTION(0x1400, SMI, SMIException, EXC_XFER_STD)
+   EXCEPTION(0x1500, Trap_15, unknown_exception, EXC_XFER_STD)
+   EXCEPTION(0x1600, Trap_16, altivec_assist_exception, EXC_XFER_STD)
EXCEPTION(0x1700, Trap_17, TAUException, EXC_XFER_STD)
-   EXCEPTION(0x1800, Trap_18, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x1900, Trap_19, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x1a00, Trap_1a, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x1b00, Trap_1b, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x1c00, Trap_1c, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x1d00, Trap_1d, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x1e00, Trap_1e, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x1f00, Trap_1f, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2000, RunMode, RunModeException, EXC_XFER_EE)
-   EXCEPTION(0x2100, Trap_21, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2200, Trap_22, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2300, Trap_23, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2400, Trap_24, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2500, Trap_25, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2600, Trap_26, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2700, Trap_27, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2800, Trap_28, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2900, Trap_29, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2a00, Trap_2a, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2b00, Trap_2b, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2c00, Trap_2c, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2d00, Trap_2d, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2e00, Trap_2e, unknown_exception, EXC_XFER_EE)
-   EXCEPTION(0x2f00, Trap_2f, unknown_exception, EXC_XFER_EE)
+   EXCEPTION(0x1800, Trap_18, unknown_exception, EXC_XFER_STD)
+   EXCEPTION(0x1900, Trap_19, unknown_exception, EXC_XFER_STD)
+   EXCEPTION(0x1a00, Trap_1a, unknown_exception, EXC_XFER_STD)
+   EXCEPTION(0x1b00, Trap_1b, unknown_exception, EXC_XFER_STD)
+   EXCEPTIO

[RFC PATCH v1 10/16] powerpc/32: enter syscall with MSR_EE inconditionaly set

2019-02-08 Thread Christophe Leroy
Syscalls are expected to be entered with MSR_EE set. Let's
make it unconditional by forcing MSR_EE on syscalls.

This patch adds EXC_XFER_SYS for that.

Suggested-by: Benjamin Herrenschmidt 
[split out from benh's RFC patch]

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.S| 2 +-
 arch/powerpc/kernel/head_32.h| 4 
 arch/powerpc/kernel/head_40x.S   | 2 +-
 arch/powerpc/kernel/head_44x.S   | 2 +-
 arch/powerpc/kernel/head_8xx.S   | 2 +-
 arch/powerpc/kernel/head_booke.h | 4 
 arch/powerpc/kernel/head_fsl_booke.S | 2 +-
 7 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 9410e5490c33..33fb08b2398f 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -369,7 +369,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
DO_KVM  0xc00
 SystemCall:
EXCEPTION_PROLOG
-   EXC_XFER_EE_LITE(0xc00, DoSyscall)
+   EXC_XFER_SYS(0xc00, DoSyscall)
 
 /* Single step - not used on 601 */
EXCEPTION(0xd00, SingleStep, single_step_exception, EXC_XFER_STD)
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index aa0131bb09b5..7221418a883f 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -123,6 +123,10 @@
EXC_XFER_TEMPLATE(hdlr, n+1, MSR_KERNEL, NOCOPY, transfer_to_handler, \
  ret_from_except)
 
+#define EXC_XFER_SYS(n, hdlr)  \
+   EXC_XFER_TEMPLATE(hdlr, n+1, MSR_KERNEL | MSR_EE, NOCOPY, transfer_to_handler, \
+ ret_from_except)
+
 #define EXC_XFER_EE(n, hdlr)   \
    EXC_XFER_TEMPLATE(hdlr, n, MSR_KERNEL, COPY_EE, transfer_to_handler_full, \
  ret_from_except_full)
diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 3dc8a35849ff..ee6edf8a20c2 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -350,7 +350,7 @@ _ENTRY(saved_ksp_limit)
 /* 0x0C00 - System Call Exception */
START_EXCEPTION(0x0C00, SystemCall)
EXCEPTION_PROLOG
-   EXC_XFER_EE_LITE(0xc00, DoSyscall)
+   EXC_XFER_SYS(0xc00, DoSyscall)
 
EXCEPTION(0x0D00, Trap_0D, unknown_exception, EXC_XFER_EE)
EXCEPTION(0x0E00, Trap_0E, unknown_exception, EXC_XFER_EE)
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 37117ab11584..9cc01948651f 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -283,7 +283,7 @@ interrupt_base:
/* System Call Interrupt */
START_EXCEPTION(SystemCall)
NORMAL_EXCEPTION_PROLOG(BOOKE_INTERRUPT_SYSCALL)
-   EXC_XFER_EE_LITE(0x0c00, DoSyscall)
+   EXC_XFER_SYS(0x0c00, DoSyscall)
 
/* Auxiliary Processor Unavailable Interrupt */
EXCEPTION(0x2020, BOOKE_INTERRUPT_AP_UNAVAIL, \
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 083f27f383b4..dfd5a8195e5e 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -186,7 +186,7 @@ Alignment:
. = 0xc00
 SystemCall:
EXCEPTION_PROLOG
-   EXC_XFER_EE_LITE(0xc00, DoSyscall)
+   EXC_XFER_SYS(0xc00, DoSyscall)
 
 /* Single step - not used on 601 */
EXCEPTION(0xd00, SingleStep, single_step_exception, EXC_XFER_STD)
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 1b22a8dea399..612f54ba1125 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -251,6 +251,10 @@ END_BTB_FLUSH_SECTION
EXC_XFER_TEMPLATE(hdlr, n+1, MSR_KERNEL, NOCOPY, transfer_to_handler, \
  ret_from_except)
 
+#define EXC_XFER_SYS(n, hdlr)  \
+   EXC_XFER_TEMPLATE(hdlr, n+1, MSR_KERNEL | MSR_EE, NOCOPY, transfer_to_handler, \
+ ret_from_except)
+
 #define EXC_XFER_EE(n, hdlr)   \
    EXC_XFER_TEMPLATE(hdlr, n, MSR_KERNEL, COPY_EE, transfer_to_handler_full, \
  ret_from_except_full)
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index 1881127682e9..89f36623993d 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -418,7 +418,7 @@ interrupt_base:
/* System Call Interrupt */
START_EXCEPTION(SystemCall)
NORMAL_EXCEPTION_PROLOG(SYSCALL)
-   EXC_XFER_EE_LITE(0x0c00, DoSyscall)
+   EXC_XFER_SYS(0x0c00, DoSyscall)
 
/* Auxiliary Processor Unavailable Interrupt */
EXCEPTION(0x2900, AP_UNAVAIL, AuxillaryProcessorUnavailable, \
-- 
2.13.3



[RFC PATCH v1 09/16] powerpc/fsl_booke: ensure SPEFloatingPointException() reenables interrupts

2019-02-08 Thread Christophe Leroy
SPEFloatingPointException() is the only exception handler which 'forgets' to
re-enable interrupts. This patch makes sure it does.

Suggested-by: Benjamin Herrenschmidt 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/traps.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 040b60293613..ea552793fe44 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1997,6 +1997,10 @@ void SPEFloatingPointException(struct pt_regs *regs)
int code = FPE_FLTUNK;
int err;
 
+   /* We restore the interrupt state now */
+   if (!arch_irq_disabled_regs(regs))
+   local_irq_enable();
+
flush_spe_to_thread(current);
 
spefscr = current->thread.spefscr;
@@ -2042,6 +2046,10 @@ void SPEFloatingPointRoundException(struct pt_regs *regs)
extern int speround_handler(struct pt_regs *regs);
int err;
 
+   /* We restore the interrupt state now */
+   if (!arch_irq_disabled_regs(regs))
+   local_irq_enable();
+
preempt_disable();
if (regs->msr & MSR_SPE)
giveup_spe(current);
-- 
2.13.3



[PATCH v1 08/16] powerpc/40x: Refactor exception entry macros by using head_32.h

2019-02-08 Thread Christophe Leroy
Refactor exception entry macros by using the ones defined in head_32.h

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.h  |  4 ++
 arch/powerpc/kernel/head_40x.S | 88 +-
 2 files changed, 6 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 985758cbf577..aa0131bb09b5 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -59,8 +59,12 @@
stw r1,GPR1(r11)
stw r1,0(r11)
tovirt(r1,r11)  /* set new kernel sp */
+#ifdef CONFIG_40x
+   rlwinm  r9,r9,0,14,12   /* clear MSR_WE (necessary?) */
+#else
li  r10,MSR_KERNEL & ~(MSR_IR|MSR_DR) /* can take exceptions */
MTMSRD(r10) /* (except for mach check in rtas) */
+#endif
stw r0,GPR0(r11)
lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */
addir10,r10,STACK_FRAME_REGS_MARKER@l
diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 2961e2aa1d18..3dc8a35849ff 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -44,6 +44,8 @@
 #include 
 #include 
 
+#include "head_32.h"
+
 /* As with the other PowerPC ports, it is expected that when code
  * execution begins here, the following registers contain valid, yet
  * optional, information:
@@ -99,52 +101,6 @@ _ENTRY(saved_ksp_limit)
.space  4
 
 /*
- * Exception vector entry code. This code runs with address translation
- * turned off (i.e. using physical addresses). We assume SPRG_THREAD has
- * the physical address of the current task thread_struct.
- */
-#define EXCEPTION_PROLOG\
-   mtspr   SPRN_SPRG_SCRATCH0,r10; /* save two registers to work with */\
-   mtspr   SPRN_SPRG_SCRATCH1,r11;  \
-   mfcrr10;/* save CR in r10 for now  */\
-   EXCEPTION_PROLOG_1;  \
-   EXCEPTION_PROLOG_2
-
-#define EXCEPTION_PROLOG_1  \
-   mfspr   r11,SPRN_SRR1;  /* check whether user or kernel*/\
-   andi.   r11,r11,MSR_PR;  \
-   tophys(r11,r1);  \
-   beq 1f;  \
-   mfspr   r11,SPRN_SPRG_THREAD;   /* if from user, start at top of   */\
-   lwz r11,TASK_STACK-THREAD(r11); /* this thread's kernel stack */\
-   addir11,r11,THREAD_SIZE; \
-   tophys(r11,r11); \
-1: subir11,r11,INT_FRAME_SIZE  /* Allocate an exception frame */
-
-#define EXCEPTION_PROLOG_2  \
-   stw r10,_CCR(r11);  /* save various registers  */\
-   stw r12,GPR12(r11);  \
-   stw r9,GPR9(r11);\
-   mfspr   r10,SPRN_SPRG_SCRATCH0;  \
-   stw r10,GPR10(r11);  \
-   mfspr   r12,SPRN_SPRG_SCRATCH1;  \
-   stw r12,GPR11(r11);  \
-   mflrr10; \
-   stw r10,_LINK(r11);  \
-   mfspr   r12,SPRN_SRR0;   \
-   stw r1,GPR1(r11);\
-   mfspr   r9,SPRN_SRR1;\
-   stw r1,0(r11);   \
-   tovirt(r1,r11); /* set new kernel sp */ \
-   rlwinm  r9,r9,0,14,12;  /* clear MSR_WE (necessary?)   */\
-   stw r0,GPR0(r11);\
-   lis r10, STACK_FRAME_REGS_MARKER@ha; /* exception frame marker */\
-   addir10, r10, STACK_FRAME_REGS_MARKER@l; \
-   stw r10, 8(r11); \
-   SAVE_4GPRS(3, r11);  \
-   SAVE_2GPRS(7, r11)
-
-/*
  * Exception prolog for critical exceptions.  This is a little different
  * from the normal exception prolog above since a critical exception
  * can potentially occur at any point during normal exception processing.
@@ -205,16 +161,6 @@ _ENTRY(saved_ksp_limit)
 /*
  * Exception vectors.
  */
-#define START_EXCEPTION(n, label) \

[PATCH v1 07/16] powerpc/40x: Split and rename NORMAL_EXCEPTION_PROLOG

2019-02-08 Thread Christophe Leroy
This patch splits NORMAL_EXCEPTION_PROLOG in the same way as in
head_8xx.S and head_32.S and renames it EXCEPTION_PROLOG() as well
to match head_32.h

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_40x.S | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index f3bfb695f952..2961e2aa1d18 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -103,10 +103,14 @@ _ENTRY(saved_ksp_limit)
  * turned off (i.e. using physical addresses). We assume SPRG_THREAD has
  * the physical address of the current task thread_struct.
  */
-#define NORMAL_EXCEPTION_PROLOG
 \
+#define EXCEPTION_PROLOG\
mtspr   SPRN_SPRG_SCRATCH0,r10; /* save two registers to work with */\
mtspr   SPRN_SPRG_SCRATCH1,r11;  \
mfcrr10;/* save CR in r10 for now  */\
+   EXCEPTION_PROLOG_1;  \
+   EXCEPTION_PROLOG_2
+
+#define EXCEPTION_PROLOG_1  \
mfspr   r11,SPRN_SRR1;  /* check whether user or kernel*/\
andi.   r11,r11,MSR_PR;  \
tophys(r11,r1);  \
@@ -115,7 +119,9 @@ _ENTRY(saved_ksp_limit)
lwz r11,TASK_STACK-THREAD(r11); /* this thread's kernel stack */\
addir11,r11,THREAD_SIZE; \
tophys(r11,r11); \
-1: subir11,r11,INT_FRAME_SIZE; /* Allocate an exception frame */\
+1: subir11,r11,INT_FRAME_SIZE  /* Allocate an exception frame */
+
+#define EXCEPTION_PROLOG_2  \
stw r10,_CCR(r11);  /* save various registers  */\
stw r12,GPR12(r11);  \
stw r9,GPR9(r11);\
@@ -205,7 +211,7 @@ label:
 
 #define EXCEPTION(n, label, hdlr, xfer)\
START_EXCEPTION(n, label);  \
-   NORMAL_EXCEPTION_PROLOG;\
+   EXCEPTION_PROLOG;   \
addir3,r1,STACK_FRAME_OVERHEAD; \
xfer(n, hdlr)
 
@@ -396,7 +402,7 @@ label:
  * This is caused by a fetch from non-execute or guarded pages.
  */
START_EXCEPTION(0x0400, InstructionAccess)
-   NORMAL_EXCEPTION_PROLOG
+   EXCEPTION_PROLOG
mr  r4,r12  /* Pass SRR0 as arg2 */
li  r5,0/* Pass zero as arg3 */
EXC_XFER_LITE(0x400, handle_page_fault)
@@ -406,7 +412,7 @@ label:
 
 /* 0x0600 - Alignment Exception */
START_EXCEPTION(0x0600, Alignment)
-   NORMAL_EXCEPTION_PROLOG
+   EXCEPTION_PROLOG
mfspr   r4,SPRN_DEAR/* Grab the DEAR and save it */
stw r4,_DEAR(r11)
addir3,r1,STACK_FRAME_OVERHEAD
@@ -414,7 +420,7 @@ label:
 
 /* 0x0700 - Program Exception */
START_EXCEPTION(0x0700, ProgramCheck)
-   NORMAL_EXCEPTION_PROLOG
+   EXCEPTION_PROLOG
mfspr   r4,SPRN_ESR /* Grab the ESR and save it */
stw r4,_ESR(r11)
addir3,r1,STACK_FRAME_OVERHEAD
@@ -427,7 +433,7 @@ label:
 
 /* 0x0C00 - System Call Exception */
START_EXCEPTION(0x0C00, SystemCall)
-   NORMAL_EXCEPTION_PROLOG
+   EXCEPTION_PROLOG
EXC_XFER_EE_LITE(0xc00, DoSyscall)
 
EXCEPTION(0x0D00, Trap_0D, unknown_exception, EXC_XFER_EE)
@@ -733,7 +739,7 @@ label:
 
/* Programmable Interval Timer (PIT) Exception. (from 0x1000) */
 Decrementer:
-   NORMAL_EXCEPTION_PROLOG
+   EXCEPTION_PROLOG
lis r0,TSR_PIS@h
mtspr   SPRN_TSR,r0 /* Clear the PIT exception */
addir3,r1,STACK_FRAME_OVERHEAD
@@ -741,7 +747,7 @@ Decrementer:
 
/* Fixed Interval Timer (FIT) Exception. (from 0x1010) */
 FITException:
-   NORMAL_EXCEPTION_PROLOG
+   EXCEPTION_PROLOG
addir3,r1,STACK_FRAME_OVERHEAD;
EXC_XFER_EE(0x1010, unknown_exception)
 
@@ -759,7 +765,7 @@ WDTException:
  * if they can't resolve the lightweight TLB fault.
  */
 DataAccess:
-   NORMAL_EXCEPTION_PROLOG
+   EXCEPTION_PROLOG
mfspr   r5,SPRN_ESR /* Grab the ESR, save it, pass arg3 */
stw r5,_ESR(r11)
mfspr   r4,SPRN_DEAR/* Grab the DEAR, save it, pass arg2 */
-- 
2.13.3



[PATCH v1 06/16] powerpc/40x: add exception frame marker

2019-02-08 Thread Christophe Leroy
This patch adds STACK_FRAME_REGS_MARKER on the stack at exception entry
in order to see interrupts in call traces as below:

[0.013964] Call Trace:
[0.014014] [c0745db0] [c007a9d4] tick_periodic.constprop.5+0xd8/0x104 
(unreliable)
[0.014086] [c0745dc0] [c007aa20] tick_handle_periodic+0x20/0x9c
[0.014181] [c0745de0] [c0009cd0] timer_interrupt+0xa0/0x264
[0.014258] [c0745e10] [c000e484] ret_from_except+0x0/0x14
[0.014390] --- interrupt: 901 at console_unlock.part.7+0x3f4/0x528
[0.014390] LR = console_unlock.part.7+0x3f0/0x528
[0.014455] [c0745ee0] [c0050334] console_unlock.part.7+0x114/0x528 
(unreliable)
[0.014542] [c0745f30] [c00524e0] register_console+0x3d8/0x44c
[0.014625] [c0745f60] [c0675aac] cpm_uart_console_init+0x18/0x2c
[0.014709] [c0745f70] [c06614f4] console_init+0x114/0x1cc
[0.014795] [c0745fb0] [c0658b68] start_kernel+0x300/0x3d8
[0.014864] [c0745ff0] [c00022cc] start_here+0x44/0x98

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_40x.S | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 59f6f53f1ac2..f3bfb695f952 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -132,6 +132,9 @@ _ENTRY(saved_ksp_limit)
tovirt(r1,r11); /* set new kernel sp */ \
rlwinm  r9,r9,0,14,12;  /* clear MSR_WE (necessary?)   */\
stw r0,GPR0(r11);\
+   lis r10, STACK_FRAME_REGS_MARKER@ha; /* exception frame marker */\
+   addir10, r10, STACK_FRAME_REGS_MARKER@l; \
+   stw r10, 8(r11); \
SAVE_4GPRS(3, r11);  \
SAVE_2GPRS(7, r11)
 
@@ -174,6 +177,9 @@ _ENTRY(saved_ksp_limit)
tovirt(r1,r11);  \
rlwinm  r9,r9,0,14,12;  /* clear MSR_WE (necessary?)   */\
stw r0,GPR0(r11);\
+   lis r10, STACK_FRAME_REGS_MARKER@ha; /* exception frame marker */\
+   addir10, r10, STACK_FRAME_REGS_MARKER@l; \
+   stw r10, 8(r11); \
SAVE_4GPRS(3, r11);  \
SAVE_2GPRS(7, r11)
 
-- 
2.13.3



[PATCH v1 05/16] powerpc/40x: Don't use SPRN_SPRG_SCRATCH2 in EXCEPTION_PROLOG

2019-02-08 Thread Christophe Leroy
Contrary to what the comment says, r1 is not reused by the critical
exception handler, as it uses a dedicated critirq_ctx stack.
Decrementing r1 early is then unneeded.

Even if the above were valid, the code would be buggy anyway, as
r1 gets some intermediate values that would jeopardise the
whole process (for instance after mfspr   r1,SPRN_SPRG_THREAD)

Using SPRN_SPRG_SCRATCH2 to save r1 is then not needed, r11 can be
used instead. This avoids one mtspr and one mfspr and makes the
prolog closer to what's done on 6xx and 8xx.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_40x.S | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 3088c9f29f5e..59f6f53f1ac2 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -102,23 +102,20 @@ _ENTRY(saved_ksp_limit)
  * Exception vector entry code. This code runs with address translation
  * turned off (i.e. using physical addresses). We assume SPRG_THREAD has
  * the physical address of the current task thread_struct.
- * Note that we have to have decremented r1 before we write to any fields
- * of the exception frame, since a critical interrupt could occur at any
- * time, and it will write to the area immediately below the current r1.
  */
 #define NORMAL_EXCEPTION_PROLOG
 \
mtspr   SPRN_SPRG_SCRATCH0,r10; /* save two registers to work with */\
mtspr   SPRN_SPRG_SCRATCH1,r11;  \
-   mtspr   SPRN_SPRG_SCRATCH2,r1;   \
mfcrr10;/* save CR in r10 for now  */\
mfspr   r11,SPRN_SRR1;  /* check whether user or kernel*/\
andi.   r11,r11,MSR_PR;  \
-   beq 1f;  \
-   mfspr   r1,SPRN_SPRG_THREAD;/* if from user, start at top of   */\
-   lwz r1,TASK_STACK-THREAD(r1); /* this thread's kernel stack   */\
-   addir1,r1,THREAD_SIZE;   \
-1: subir1,r1,INT_FRAME_SIZE;   /* Allocate an exception frame */\
tophys(r11,r1);  \
+   beq 1f;  \
+   mfspr   r11,SPRN_SPRG_THREAD;   /* if from user, start at top of   */\
+   lwz r11,TASK_STACK-THREAD(r11); /* this thread's kernel stack */\
+   addir11,r11,THREAD_SIZE; \
+   tophys(r11,r11); \
+1: subir11,r11,INT_FRAME_SIZE; /* Allocate an exception frame */\
stw r10,_CCR(r11);  /* save various registers  */\
stw r12,GPR12(r11);  \
stw r9,GPR9(r11);\
@@ -128,11 +125,11 @@ _ENTRY(saved_ksp_limit)
stw r12,GPR11(r11);  \
mflrr10; \
stw r10,_LINK(r11);  \
-   mfspr   r10,SPRN_SPRG_SCRATCH2;  \
mfspr   r12,SPRN_SRR0;   \
-   stw r10,GPR1(r11);   \
+   stw r1,GPR1(r11);\
mfspr   r9,SPRN_SRR1;\
-   stw r10,0(r11);  \
+   stw r1,0(r11);   \
+   tovirt(r1,r11); /* set new kernel sp */ \
rlwinm  r9,r9,0,14,12;  /* clear MSR_WE (necessary?)   */\
stw r0,GPR0(r11);\
SAVE_4GPRS(3, r11);  \
-- 
2.13.3



[PATCH v1 04/16] powerpc/32: make the 6xx/8xx EXC_XFER_TEMPLATE() similar to the 40x/booke one

2019-02-08 Thread Christophe Leroy
The 6xx/8xx EXC_XFER_TEMPLATE() macro adds an i##n symbol which is
unused and can be removed.
The 40x and booke EXC_XFER_TEMPLATE() macros take msr from the caller
while the 6xx/8xx version uses only MSR_KERNEL as the msr value.

This patch modifies the 6xx/8xx version to make it similar to the
40x and booke versions.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.h | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index cf3d00844597..985758cbf577 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -99,13 +99,12 @@
addir3,r1,STACK_FRAME_OVERHEAD; \
xfer(n, hdlr)
 
-#define EXC_XFER_TEMPLATE(n, hdlr, trap, copyee, tfer, ret)\
+#define EXC_XFER_TEMPLATE(hdlr, trap, msr, copyee, tfer, ret)  \
li  r10,trap;   \
stw r10,_TRAP(r11); \
-   LOAD_MSR_KERNEL(r10, MSR_KERNEL);   \
+   LOAD_MSR_KERNEL(r10, msr);  \
copyee(r10, r9);\
bl  tfer;   \
-i##n:  \
.long   hdlr;   \
.long   ret
 
@@ -113,19 +112,19 @@ i##n: \
 #define NOCOPY(d, s)
 
 #define EXC_XFER_STD(n, hdlr)  \
-   EXC_XFER_TEMPLATE(n, hdlr, n, NOCOPY, transfer_to_handler_full, \
+   EXC_XFER_TEMPLATE(hdlr, n, MSR_KERNEL, NOCOPY, 
transfer_to_handler_full,\
  ret_from_except_full)
 
 #define EXC_XFER_LITE(n, hdlr) \
-   EXC_XFER_TEMPLATE(n, hdlr, n+1, NOCOPY, transfer_to_handler, \
+   EXC_XFER_TEMPLATE(hdlr, n+1, MSR_KERNEL, NOCOPY, transfer_to_handler, \
  ret_from_except)
 
 #define EXC_XFER_EE(n, hdlr)   \
-   EXC_XFER_TEMPLATE(n, hdlr, n, COPY_EE, transfer_to_handler_full, \
+   EXC_XFER_TEMPLATE(hdlr, n, MSR_KERNEL, COPY_EE, transfer_to_handler_full, \
  ret_from_except_full)
 
 #define EXC_XFER_EE_LITE(n, hdlr)  \
-   EXC_XFER_TEMPLATE(n, hdlr, n+1, COPY_EE, transfer_to_handler, \
+   EXC_XFER_TEMPLATE(hdlr, n+1, MSR_KERNEL, COPY_EE, transfer_to_handler, \
  ret_from_except)
 
 #endif /* __HEAD_32_H__ */
-- 
2.13.3



Re: [PATCH 6/7] powerpc/eeh: Allow disabling recovery

2019-02-08 Thread Oliver
On Fri, Feb 8, 2019 at 8:58 PM Michael Ellerman  wrote:
>
> Oliver O'Halloran  writes:
>
> > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> > index d1f0bdf41fac..92809b137e39 100644
> > --- a/arch/powerpc/kernel/eeh.c
> > +++ b/arch/powerpc/kernel/eeh.c
> > @@ -1810,7 +1817,11 @@ static int __init eeh_init_proc(void)
> >  &eeh_enable_dbgfs_ops);
> >   debugfs_create_u32("eeh_max_freezes", 0600,
> >   powerpc_debugfs_root, &eeh_max_freezes);
> > + debugfs_create_bool("eeh_disable_recovery", 0600,
> > + powerpc_debugfs_root,
> > + &eeh_debugfs_no_recover);
> >   eeh_cache_debugfs_init();
> > +#endif
>
> There's that endif.

Bleh

>
> Whem I'm doing rebasing and think I might have broken bisectability I
> build every commit with:
>
>   https://github.com/mpe/misc-scripts/blob/master/git/for-each-commit

Thanks, I have something similar for skiboot but never got around to
porting it to the kernel.


[PATCH v1 03/16] powerpc/32: move LOAD_MSR_KERNEL() into head_32.h and use it

2019-02-08 Thread Christophe Leroy
As preparation for using head_32.h for head_40x.S, move
LOAD_MSR_KERNEL() there and use it to load r10 with MSR_KERNEL value.

In the meantime, this patch modifies it so that it takes into account
the size of the value actually passed, rather than the size of
MSR_KERNEL, to determine whether 'li' suffices or 'lis/ori' is needed.
This is done with a GAS macro.
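
For illustration (hypothetical constants, not part of the patch), the
macro now picks the instruction sequence from the value it is given:

  LOAD_MSR_KERNEL(r10, 0x1032)    /* assembles to: li   r10,0x1032     */
  LOAD_MSR_KERNEL(r10, 0x21032)   /* assembles to: lis  r10,0x2        */
                                  /*               ori  r10,r10,0x1032 */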

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S |  9 +
 arch/powerpc/kernel/head_32.h  | 15 ++-
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index a5e2d5585dcb..b489aebdc5c5 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -37,14 +37,7 @@
 #include 
 #include 
 
-/*
- * MSR_KERNEL is > 0x10000 on 4xx/Book-E since it include MSR_CE.
- */
-#if MSR_KERNEL >= 0x10000
-#define LOAD_MSR_KERNEL(r, x)  lis r,(x)@h; ori r,r,(x)@l
-#else
-#define LOAD_MSR_KERNEL(r, x)  li r,(x)
-#endif
+#include "head_32.h"
 
 /*
  * Align to 4k in order to ensure that all functions modifying srr0/srr1
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 7456e2a45acc..cf3d00844597 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -5,6 +5,19 @@
 #include /* for STACK_FRAME_REGS_MARKER */
 
 /*
+ * MSR_KERNEL is > 0x8000 on 4xx/Book-E since it include MSR_CE.
+ */
+.macro __LOAD_MSR_KERNEL r, x
+.if \x >= 0x8000
+   lis \r, (\x)@h
+   ori \r, \r, (\x)@l
+.else
+   li \r, (\x)
+.endif
+.endm
+#define LOAD_MSR_KERNEL(r, x) __LOAD_MSR_KERNEL r, x
+
+/*
  * Exception entry code.  This code runs with address translation
  * turned off, i.e. using physical addresses.
  * We assume sprg3 has the physical address of the current
@@ -89,7 +102,7 @@
 #define EXC_XFER_TEMPLATE(n, hdlr, trap, copyee, tfer, ret)\
li  r10,trap;   \
stw r10,_TRAP(r11); \
-   li  r10,MSR_KERNEL; \
+   LOAD_MSR_KERNEL(r10, MSR_KERNEL);   \
copyee(r10, r9);\
bl  tfer;   \
 i##n:  \
-- 
2.13.3



[PATCH v1 02/16] powerpc/32: Refactor EXCEPTION entry macros for head_8xx.S and head_32.S

2019-02-08 Thread Christophe Leroy
EXCEPTION_PROLOG is similar in head_8xx.S and head_32.S.

This patch creates head_32.h and moves the EXCEPTION_PROLOG macro
into it. It also converts it from a C preprocessor macro to a GAS
macro in order to ease later refactoring with 40x, since GAS macros
allow the use of #ifdef/#else/#endif inside them. That also has the
advantage of not requiring the ugly "; \" at the end of each line.

This patch also moves the EXCEPTION() and EXC_XFER_*() macros, which
are likewise similar, while splitting START_EXCEPTION() out of
EXCEPTION().
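
As an illustration of that last point (hypothetical macro, not from
this patch), a GAS macro body can carry preprocessor conditionals
directly and needs no line continuations:

  .macro EXAMPLE_PROLOG
          mtspr   SPRN_SPRG_SCRATCH0, r10
  #ifdef CONFIG_PPC_8xx
          mtspr   SPRN_SPRG_SCRATCH1, r11
  #endif
  .endm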

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.S  |  99 +-
 arch/powerpc/kernel/head_32.h  | 118 +
 arch/powerpc/kernel/head_8xx.S |  98 +-
 3 files changed, 122 insertions(+), 193 deletions(-)
 create mode 100644 arch/powerpc/kernel/head_32.h

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 146385b1c2da..9410e5490c33 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -37,6 +37,8 @@
 #include 
 #include 
 
+#include "head_32.h"
+
 /* 601 only have IBAT; cr0.eq is set on 601 when using this macro */
 #define LOAD_BAT(n, reg, RA, RB)   \
/* see the comment for clear_bats() -- Cort */ \
@@ -242,103 +244,6 @@ __secondary_hold_spinloop:
 __secondary_hold_acknowledge:
.long   -1
 
-/*
- * Exception entry code.  This code runs with address translation
- * turned off, i.e. using physical addresses.
- * We assume sprg3 has the physical address of the current
- * task's thread_struct.
- */
-#define EXCEPTION_PROLOG   \
-   mtspr   SPRN_SPRG_SCRATCH0,r10; \
-   mtspr   SPRN_SPRG_SCRATCH1,r11; \
-   mfcr	r10;\
-   EXCEPTION_PROLOG_1; \
-   EXCEPTION_PROLOG_2
-
-#define EXCEPTION_PROLOG_1 \
-   mfspr   r11,SPRN_SRR1;  /* check whether user or kernel */ \
-   andi.   r11,r11,MSR_PR; \
-   tophys(r11,r1); /* use tophys(r1) if kernel */ \
-   beq 1f; \
-   mfspr   r11,SPRN_SPRG_THREAD;   \
-   lwz r11,TASK_STACK-THREAD(r11); \
-   addi	r11,r11,THREAD_SIZE;\
-   tophys(r11,r11);\
-1: subi	r11,r11,INT_FRAME_SIZE  /* alloc exc. frame */
-
-
-#define EXCEPTION_PROLOG_2 \
-   stw r10,_CCR(r11);  /* save registers */ \
-   stw r12,GPR12(r11); \
-   stw r9,GPR9(r11);   \
-   mfspr   r10,SPRN_SPRG_SCRATCH0; \
-   stw r10,GPR10(r11); \
-   mfspr   r12,SPRN_SPRG_SCRATCH1; \
-   stw r12,GPR11(r11); \
-   mflr	r10;\
-   stw r10,_LINK(r11); \
-   mfspr   r12,SPRN_SRR0;  \
-   mfspr   r9,SPRN_SRR1;   \
-   stw r1,GPR1(r11);   \
-   stw r1,0(r11);  \
-   tovirt(r1,r11); /* set new kernel sp */ \
-   li  r10,MSR_KERNEL & ~(MSR_IR|MSR_DR); /* can take exceptions */ \
-   MTMSRD(r10);/* (except for mach check in rtas) */ \
-   stw r0,GPR0(r11);   \
-   lis r10,STACK_FRAME_REGS_MARKER@ha; /* exception frame marker */ \
-   addi	r10,r10,STACK_FRAME_REGS_MARKER@l; \
-   stw r10,8(r11); \
-   SAVE_4GPRS(3, r11); \
-   SAVE_2GPRS(7, r11)
-
-/*
- * Note: code which follows this uses cr0.eq (set if from kernel),
- * r11, r12 (SRR0), and r9 (SRR1).
- *
- * Note2: once we have set r1 we are in a position to take exceptions
- * again, and we could thus set MSR:RI at that point.
- */
-
-/*
- * Exception vectors.
- */
-#define EXCEPTION(n, label, hdlr, xfer)\
-   . = n;  \
-   DO_KVM n;   \
-label: \
-   EXCEPTION_PROLOG;   \
-   addi	r3,r1,STACK_FRAME_OVERHEAD; \
-   xfer(n, hdlr)
-
-#define EXC_XFER_TEMPLATE(n, hdlr, trap, copyee, tfer, ret)\
-   li  r10,trap;   \
-   stw r10,_TRAP(r11); \
-   li  r10,MSR_KERNEL; \
-   copyee(r10, r9);\
-   bl  tfer;   \
-i##n:  \
-   .long   hdlr;   \
-   .long   ret
-
-#define COPY_EE(d, s)  rlwimi d,s,0,16,16
-#define NOCOPY(d, s)
-
-#define EXC_XFER_STD(n, hdlr)  \
-   EXC_XFER_TEMPLATE(n, hdlr, n, NOCOPY, transfer_to_handler_full, \
- ret_from_except_full)
-
-#define EXC_XFER_LITE(n, hdlr) \
-   EXC_XFER_TEMPLATE(n, hdlr, n+1, NOCOPY, transfer_to_handler, \
- ret_from_except)
-
-#define EXC_XFER_EE(n, hdlr)   \
-   EXC_XFER_TEMPLATE(n, hdlr, n, COPY_EE, transfer_to_handler_f

[PATCH v1 01/16] powerpc: CONFIG_THREAD_INFO_IN_TASK series

2019-02-08 Thread Christophe Leroy
This patch is a squashed version of the
CONFIG_THREAD_INFO_IN_TASK series, included to keep the build robots
happy until that series appears in powerpc/next.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   |   1 +
 arch/powerpc/Makefile  |   7 ++
 arch/powerpc/include/asm/asm-prototypes.h  |   4 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h  |   2 +-
 arch/powerpc/include/asm/exception-64s.h   |   4 +-
 arch/powerpc/include/asm/irq.h |  18 ++--
 arch/powerpc/include/asm/livepatch.h   |   7 +-
 arch/powerpc/include/asm/processor.h   | 105 ++-
 arch/powerpc/include/asm/ptrace.h  |   2 +-
 arch/powerpc/include/asm/reg.h |   2 +-
 arch/powerpc/include/asm/smp.h |  17 +++-
 arch/powerpc/include/asm/task_size_32.h|  21 +
 arch/powerpc/include/asm/task_size_64.h|  79 +
 arch/powerpc/include/asm/thread_info.h |  19 -
 arch/powerpc/kernel/asm-offsets.c  |  12 ++-
 arch/powerpc/kernel/entry_32.S |  80 ++---
 arch/powerpc/kernel/entry_64.S |  12 +--
 arch/powerpc/kernel/epapr_hcalls.S |   5 +-
 arch/powerpc/kernel/exceptions-64e.S   |  13 +--
 arch/powerpc/kernel/exceptions-64s.S   |   2 +-
 arch/powerpc/kernel/head_32.S  |  14 +--
 arch/powerpc/kernel/head_40x.S |   4 +-
 arch/powerpc/kernel/head_44x.S |   8 +-
 arch/powerpc/kernel/head_64.S  |   1 +
 arch/powerpc/kernel/head_8xx.S |   2 +-
 arch/powerpc/kernel/head_booke.h   |  12 +--
 arch/powerpc/kernel/head_fsl_booke.S   |  16 ++--
 arch/powerpc/kernel/idle_6xx.S |   8 +-
 arch/powerpc/kernel/idle_book3e.S  |   2 +-
 arch/powerpc/kernel/idle_e500.S|   8 +-
 arch/powerpc/kernel/idle_power4.S  |   2 +-
 arch/powerpc/kernel/irq.c  | 114 +++--
 arch/powerpc/kernel/kgdb.c |  28 --
 arch/powerpc/kernel/machine_kexec_64.c |   6 +-
 arch/powerpc/kernel/misc_32.S  |  17 ++--
 arch/powerpc/kernel/process.c  |  63 --
 arch/powerpc/kernel/setup-common.c |   2 +-
 arch/powerpc/kernel/setup_32.c |  26 +++---
 arch/powerpc/kernel/setup_64.c |  51 +++
 arch/powerpc/kernel/smp.c  |  16 ++--
 arch/powerpc/kernel/stacktrace.c   |  29 ++-
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |   6 +-
 arch/powerpc/kvm/book3s_hv_hmi.c   |   1 +
 arch/powerpc/mm/hash_low_32.S  |  14 ++-
 arch/powerpc/net/bpf_jit32.h   |   5 +-
 arch/powerpc/sysdev/6xx-suspend.S  |   5 +-
 arch/powerpc/xmon/xmon.c   |   2 +-
 47 files changed, 367 insertions(+), 507 deletions(-)
 create mode 100644 arch/powerpc/include/asm/task_size_32.h
 create mode 100644 arch/powerpc/include/asm/task_size_64.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 08908219fba9..3f237ffa0649 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -241,6 +241,7 @@ config PPC
select RTC_LIB
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
+   select THREAD_INFO_IN_TASK
select VIRT_TO_BUS  if !PPC64
#
# Please keep this list sorted alphabetically.
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index ac033341ed55..7de49889bd5d 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -427,6 +427,13 @@ else
 endif
 endif
 
+ifdef CONFIG_SMP
+prepare: task_cpu_prepare
+
+task_cpu_prepare: prepare0
+   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TASK_CPU") print $$3;}' include/generated/asm-offsets.h))
+endif
+
 # Check toolchain versions:
 # - gcc-4.6 is the minimum kernel-wide version so nothing required.
 checkbin:
diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 1d911f68a23b..1484df6779ab 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -23,8 +23,8 @@
 #include 
 
 /* SMP */
-extern struct thread_info *current_set[NR_CPUS];
-extern struct thread_info *secondary_ti;
+extern struct task_struct *current_set[NR_CPUS];
+extern struct task_struct *secondary_current;
 void start_secondary(void *unused);
 
 /* kexec */
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 12e522807f9f..a28a28079edb 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -23,7 +23,7 @@
  */
 #include 
 #include 
-#include 
+#include 
 #include 
 
 /*
diff --git a/arch/powerpc/includ

[RFC PATCH v1 00/16] powerpc/32: Implement fast syscall entry

2019-02-08 Thread Christophe Leroy
The purpose of this series is to implement a fast syscall entry
on ppc32.

Unlike all other exceptions, which can happen at any time and
require preserving all registers, syscalls do not
require the preservation of volatile registers (except LR).

Syscall entries can then be optimised with lighter entry code
than the general exception handling.

In the meantime this series refactors the exception entry on
40x/6xx/8xx, as they are pretty similar, and it takes in benh's
series rationalising the setting of MSR_EE at exception/syscall
entries, as this change greatly simplifies exception entries.

On an 8xx, this series improves the null_syscall selftest by 17%.
On an 83xx, this series improves the null_syscall selftest by 12.5%.
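
For reference, null_syscall is the benchmark under
tools/testing/selftests/powerpc/benchmarks; assuming a checked-out
tree, the numbers above can be reproduced with something like:

  cd tools/testing/selftests/powerpc/benchmarks
  make
  ./null_syscall        # reports the average cost of a trivial syscall

(The exact output format may differ between kernel versions.)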

Christophe Leroy (16):
  powerpc: CONFIG_THREAD_INFO_IN_TASK series
  powerpc/32: Refactor EXCEPTION entry macros for head_8xx.S and
head_32.S
  powerpc/32: move LOAD_MSR_KERNEL() into head_32.h and use it
  powerpc/32: make the 6xx/8xx EXC_XFER_TEMPLATE() similar to the
40x/booke one
  powerpc/40x: Don't use SPRN_SPRG_SCRATCH2 in EXCEPTION_PROLOG
  powerpc/40x: add exception frame marker
  powerpc/40x: Split and rename NORMAL_EXCEPTION_PROLOG
  powerpc/40x: Refactor exception entry macros by using head_32.h
  powerpc/fsl_booke: ensure SPEFloatingPointException() reenables
interrupts
  powerpc/32: enter syscall with MSR_EE inconditionaly set
  powerpc/32: Enter exceptions with MSR_EE unset
  powerpc/32: get rid of COPY_EE in exception entry
  powerpc: Fix 32-bit handling of MSR_EE on exceptions
  powerpc/32: implement fast entry for syscalls on non BOOKE
  powerpc/32: Remove MSR_PR test when returning from syscall
  powerpc/32: don't do syscall stuff in transfer_to_handler on non BOOKE

 arch/powerpc/Kconfig   |   1 +
 arch/powerpc/Makefile  |   7 +
 arch/powerpc/include/asm/asm-prototypes.h  |   4 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h  |   2 +-
 arch/powerpc/include/asm/exception-64s.h   |   4 +-
 arch/powerpc/include/asm/irq.h |  18 +-
 arch/powerpc/include/asm/livepatch.h   |   7 +-
 arch/powerpc/include/asm/processor.h   | 105 +-
 arch/powerpc/include/asm/ptrace.h  |   2 +-
 arch/powerpc/include/asm/reg.h |   2 +-
 arch/powerpc/include/asm/smp.h |  17 +-
 arch/powerpc/include/asm/task_size_32.h|  21 ++
 arch/powerpc/include/asm/task_size_64.h|  79 
 arch/powerpc/include/asm/thread_info.h |  19 --
 arch/powerpc/kernel/asm-offsets.c  |  12 +-
 arch/powerpc/kernel/entry_32.S | 261 ++---
 arch/powerpc/kernel/entry_64.S |  12 +-
 arch/powerpc/kernel/epapr_hcalls.S |   5 +-
 arch/powerpc/kernel/exceptions-64e.S   |  13 +-
 arch/powerpc/kernel/exceptions-64s.S   |   2 +-
 arch/powerpc/kernel/head_32.S  | 182 -
 arch/powerpc/kernel/head_32.h  | 203 +++
 arch/powerpc/kernel/head_40x.S | 154 ---
 arch/powerpc/kernel/head_44x.S |  16 +-
 arch/powerpc/kernel/head_64.S  |   1 +
 arch/powerpc/kernel/head_8xx.S | 133 ++---
 arch/powerpc/kernel/head_booke.h   |  44 ++---
 arch/powerpc/kernel/head_fsl_booke.S   |  44 ++---
 arch/powerpc/kernel/idle_6xx.S |   8 +-
 arch/powerpc/kernel/idle_book3e.S  |   2 +-
 arch/powerpc/kernel/idle_e500.S|   8 +-
 arch/powerpc/kernel/idle_power4.S  |   2 +-
 arch/powerpc/kernel/irq.c  | 114 ++-
 arch/powerpc/kernel/kgdb.c |  28 ---
 arch/powerpc/kernel/machine_kexec_64.c |   6 +-
 arch/powerpc/kernel/misc_32.S  |  17 +-
 arch/powerpc/kernel/process.c  |  63 +++---
 arch/powerpc/kernel/setup-common.c |   2 +-
 arch/powerpc/kernel/setup_32.c |  26 ++-
 arch/powerpc/kernel/setup_64.c |  51 +
 arch/powerpc/kernel/smp.c  |  16 +-
 arch/powerpc/kernel/stacktrace.c   |  29 ++-
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |   6 +-
 arch/powerpc/kernel/traps.c|   8 +
 arch/powerpc/kvm/book3s_hv_hmi.c   |   1 +
 arch/powerpc/mm/hash_low_32.S  |  14 +-
 arch/powerpc/net/bpf_jit32.h   |   5 +-
 arch/powerpc/sysdev/6xx-suspend.S  |   5 +-
 arch/powerpc/xmon/xmon.c   |   2 +-
 49 files changed, 816 insertions(+), 967 deletions(-)
 create mode 100644 arch/powerpc/include/asm/task_size_32.h
 create mode 100644 arch/powerpc/include/asm/task_size_64.h
 create mode 100644 arch/powerpc/kernel/head_32.h

-- 
2.13.3



Re: [PATCH 7/7] powerpc/eeh: Add eeh_force_recover to debugfs

2019-02-08 Thread Oliver
On Fri, Feb 8, 2019 at 11:32 PM Michael Ellerman  wrote:
>
> Oliver O'Halloran  writes:
>
> > This patch adds a debugfs interface to force scheduling a recovery event.
> > This can be used to recover a specific PE or schedule a "special" recovery
> event that checks for errors at the PHB level.
> > To force a recovery of a normal PE, use:
> >
> >  echo '<#pe>:<#phb>' > /sys/kernel/debug/powerpc/eeh_force_recover
> >
> To force a scan of broken PHBs:
> >
> >  echo 'null' > /sys/kernel/debug/powerpc/eeh_force_recover
>
> Why 'null', that seems like an odd choice. Why not "all" or "scan" or
> something?

When an EEH event occurs, what is sent to the event handler is
just a pointer to the struct eeh_pe. If the pointer is NULL it's then
treated as a special event which indicates a PHB failure. I agree it's
a bit dumb, but I don't really expect anyone except me or samb to use
this interface, so I went with what would make sense to someone
familiar with the internals.
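
To make that concrete, the dispatch is roughly the following (an
illustrative sketch modelled on arch/powerpc/kernel/eeh_event.c;
eeh_dispatch_event is a made-up name, not the upstream function):

  struct eeh_event {
          struct list_head list;
          struct eeh_pe *pe;      /* NULL means "check all PHBs" */
  };

  static void eeh_dispatch_event(struct eeh_event *event)
  {
          if (event->pe)
                  eeh_handle_normal_event(event->pe);  /* recover one PE */
          else
                  eeh_handle_special_event();          /* PHB-level scan */
  }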

>
> Also it oopsed on me:
>
> [   76.323164] sending failure event
> [   76.323421] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
> [   76.323655] Faulting instruction address: 0x
> [   76.323856] Oops: Kernel access of bad area, sig: 11 [#1]
> [   76.323946] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [   76.324295] Modules linked in: vmx_crypto kvm binfmt_misc ip_tables x_tables autofs4 crc32c_vpmsum
> [   76.324669] CPU: 2 PID: 97 Comm: eehd Not tainted 5.0.0-rc2-gcc-8.2.0-00080-gfacc0d1d9517 #435
> [   76.325054] NIP:   LR: c00451f8 CTR: 
> [   76.325402] REGS: c000fec779c0 TRAP: 0400   Not tainted  (5.0.0-rc2-gcc-8.2.0-00080-gfacc0d1d9517)
> [   76.325768] MSR:  80014280b033   CR: 24000482  XER: 2000
> [   76.326243] CFAR: c0002528 IRQMASK: 0
> [   76.326243] GPR00: c0045edc c000fec77c50 c1574000 c000fec77cb0
> [   76.326243] GPR04:  00177d76e3e321bc 00177d76e4293a1f 5deadbeef100
> [   76.326243] GPR08: 5deadbeef200   00177d76e3e3216b
> [   76.326243] GPR12:  c0003fffdf00 c01438a8 c000fe211700
> [   76.326243] GPR16:
> [   76.326243] GPR20:    c0e814e8
> [   76.326243] GPR24: c0e814c0 5deadbeef100 c1622480 0001
> [   76.326243] GPR28: c1413310 c16244e0 c14132f0 c001f84246a0
> [   76.329073] NIP []   (null)
> [   76.329285] LR [c00451f8] eeh_handle_special_event+0x78/0x348
> [   76.329602] Call Trace:
> [   76.329762] [c000fec77c50] [c000fec77ce0] 0xc000fec77ce0 (unreliable)
> [   76.330113] [c000fec77d00] [c0045edc] eeh_event_handler+0x10c/0x1c0
> [   76.330464] [c000fec77db0] [c0143a4c] kthread+0x1ac/0x1c0
> [   76.330681] [c000fec77e20] [c000bdc4] ret_from_kernel_thread+0x5c/0x78
> [   76.331026] Instruction dump:
> [   76.331197]
> [   76.331550]
> [   76.331803] ---[ end trace dc73d37df5bb9ecd ]---
>
>
> cheers

This is probably a side effect of special events being a PowerNV
specific concept. For a pseries guest there should never be any PHB
PEs, since (hardware) PHBs are a concept that is hidden from a guest.
It's like EEH is poorly thought out and full of layering violations or
something...


[PATCH] powerpc: Make PPC_64K_PAGES depend on only 44x or PPC_BOOK3S_64

2019-02-08 Thread Michael Ellerman
In commit 7820856a4fcd ("powerpc/mm/book3e/64: Remove unsupported
64Kpage size from 64bit booke") we dropped the 64K page size support
from the 64-bit nohash (Book3E) code.

But we didn't update the dependencies of the PPC_64K_PAGES option,
meaning a randconfig can still trigger this code and cause a build
breakage, eg:
  arch/powerpc/include/asm/nohash/64/pgtable.h:14:2: error: #error "Page size not supported"
  arch/powerpc/include/asm/nohash/mmu-book3e.h:275:2: error: #error Unsupported page size

So remove PPC_BOOK3E_64 from the dependencies. This also means we
don't need to worry about PPC_FSL_BOOK3E, because that was just trying
to prevent the PPC_BOOK3E_64=y && PPC_FSL_BOOK3E=y case.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3f237ffa0649..7a16b8a7b54b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -694,7 +694,7 @@ config PPC_16K_PAGES
 
 config PPC_64K_PAGES
bool "64k page size"
-   depends on !PPC_FSL_BOOK3E && (44x || PPC_BOOK3S_64 || PPC_BOOK3E_64)
+   depends on 44x || PPC_BOOK3S_64
select HAVE_ARCH_SOFT_DIRTY if PPC_BOOK3S_64
 
 config PPC_256K_PAGES
-- 
2.20.1



Re: [PATCH 7/7] powerpc/eeh: Add eeh_force_recover to debugfs

2019-02-08 Thread Michael Ellerman
Oliver O'Halloran  writes:

> This patch adds a debugfs interface to force scheduling a recovery event.
> This can be used to recover a specific PE or schedule a "special" recovery
> event that checks for errors at the PHB level.
> To force a recovery of a normal PE, use:
>
>  echo '<#pe>:<#phb>' > /sys/kernel/debug/powerpc/eeh_force_recover
>
> To force a scan of broken PHBs:
>
>  echo 'null' > /sys/kernel/debug/powerpc/eeh_force_recover

Why 'null', that seems like an odd choice. Why not "all" or "scan" or
something?

Also it oopsed on me:

[   76.323164] sending failure event
[   76.323421] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
[   76.323655] Faulting instruction address: 0x
[   76.323856] Oops: Kernel access of bad area, sig: 11 [#1]
[   76.323946] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[   76.324295] Modules linked in: vmx_crypto kvm binfmt_misc ip_tables x_tables autofs4 crc32c_vpmsum
[   76.324669] CPU: 2 PID: 97 Comm: eehd Not tainted 5.0.0-rc2-gcc-8.2.0-00080-gfacc0d1d9517 #435
[   76.325054] NIP:   LR: c00451f8 CTR: 
[   76.325402] REGS: c000fec779c0 TRAP: 0400   Not tainted  (5.0.0-rc2-gcc-8.2.0-00080-gfacc0d1d9517)
[   76.325768] MSR:  80014280b033   CR: 24000482  XER: 2000
[   76.326243] CFAR: c0002528 IRQMASK: 0
[   76.326243] GPR00: c0045edc c000fec77c50 c1574000 c000fec77cb0
[   76.326243] GPR04:  00177d76e3e321bc 00177d76e4293a1f 5deadbeef100
[   76.326243] GPR08: 5deadbeef200   00177d76e3e3216b
[   76.326243] GPR12:  c0003fffdf00 c01438a8 c000fe211700
[   76.326243] GPR16:
[   76.326243] GPR20:    c0e814e8
[   76.326243] GPR24: c0e814c0 5deadbeef100 c1622480 0001
[   76.326243] GPR28: c1413310 c16244e0 c14132f0 c001f84246a0
[   76.329073] NIP []   (null)
[   76.329285] LR [c00451f8] eeh_handle_special_event+0x78/0x348
[   76.329602] Call Trace:
[   76.329762] [c000fec77c50] [c000fec77ce0] 0xc000fec77ce0 (unreliable)
[   76.330113] [c000fec77d00] [c0045edc] eeh_event_handler+0x10c/0x1c0
[   76.330464] [c000fec77db0] [c0143a4c] kthread+0x1ac/0x1c0
[   76.330681] [c000fec77e20] [c000bdc4] ret_from_kernel_thread+0x5c/0x78
[   76.331026] Instruction dump:
[   76.331197]
[   76.331550]
[   76.331803] ---[ end trace dc73d37df5bb9ecd ]---


cheers


Re: [PATCH] powerpc/44x: Force PCI on for CURRITUCK

2019-02-08 Thread Michael Ellerman
Geert Uytterhoeven  writes:
> Hi Michael,
>
> On Thu, Feb 7, 2019 at 3:49 AM Michael Ellerman  wrote:
>> The recent rework of PCI kconfig symbols exposed an existing bug in
>> the CURRITUCK kconfig logic.
>>
>> It selects PPC4xx_PCI_EXPRESS which depends on PCI, but PCI is user
>> selectable and might be disabled, leading to a warning:
>>
>>   WARNING: unmet direct dependencies detected for PPC4xx_PCI_EXPRESS
>> Depends on [n]: PCI [=n] && 4xx [=y]
>> Selected by [y]:
>> - CURRITUCK [=y] && PPC_47x [=y]
>>
>> Prior to commit eb01d42a7778 ("PCI: consolidate PCI config entry in
>> drivers/pci") PCI was enabled by default for currituck_defconfig so we
>> didn't see the warning. The bad logic was still there, it just
>> required someone disabling PCI in their .config to hit it.
>>
>> Fix it by forcing PCI on for CURRITUCK, which seems was always the
>> expectation anyway.
>>
>> Fixes: eb01d42a7778 ("PCI: consolidate PCI config entry in drivers/pci")
>> Reported-by: Randy Dunlap 
>> Signed-off-by: Michael Ellerman 
>> ---
>>  arch/powerpc/platforms/44x/Kconfig | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
>> index 4a9a72d01c3c..35be81fd2dc2 100644
>> --- a/arch/powerpc/platforms/44x/Kconfig
>> +++ b/arch/powerpc/platforms/44x/Kconfig
>> @@ -180,6 +180,7 @@ config CURRITUCK
>> depends on PPC_47x
>> select SWIOTLB
>> select 476FPE
>> +   select FORCE_PCI
>> select PPC4xx_PCI_EXPRESS
>
> Would "select PPC4xx_PCI_EXPRESS if PCI" be a suitable alternative?

It would work, but I don't really like it because it means the
dependency on PCI is now encoded in two places.

I also doubt it reflects the intention of the original authors, because
at the time PCI was default y, I suspect they never intended for PCI to
be disabled for that board.
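
For reference, Geert's alternative would read something like this
(illustrative only, reusing the quoted Kconfig block above):

  config CURRITUCK
          depends on PPC_47x
          select SWIOTLB
          select 476FPE
          select PPC4xx_PCI_EXPRESS if PCI

i.e. the dependency on PCI would live in the select line itself.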

cheers


[PATCH v2] powerpc/powernv/idle: Restore IAMR after idle

2019-02-08 Thread Russell Currey
Without restoring the IAMR after idle, execution prevention on POWER9
with Radix MMU is overwritten and the kernel can freely execute
userspace without faulting.

This is necessary when returning from any stop state that modifies user
state, as well as hypervisor state.

To test how this fails without this patch, load the lkdtm driver and
do the following:

   echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT

which won't fault, then boot the kernel with powersave=off, where it
will fault.  Applying this patch will fix this.

Fixes: 3b10d0095a1e ("powerpc/mm/radix: Prevent kernel execution of user space")
Cc: 
Signed-off-by: Russell Currey 
---
Since v1:
  - no longer use paca to save IAMR, instead use _DAR (thanks mpe)
  - remove isync and pnv_wakeup_noloss section (thanks Nick)

 arch/powerpc/kernel/idle_book3s.S | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 7f5ac2e8581b..551cc4649021 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -200,6 +200,13 @@ pnv_powersave_common:
/* Continue saving state */
SAVE_GPR(2, r1)
SAVE_NVGPRS(r1)
+
+BEGIN_FTR_SECTION
+   /* _DAR is unused here, so (ab)use it to save the IAMR */
+   mfspr   r5,SPRN_IAMR
+   std r5,_DAR(r1)
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
+
	mfcr	r5
std r5,_CCR(r1)
std r1,PACAR1(r13)
@@ -924,6 +931,17 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
REST_NVGPRS(r1)
REST_GPR(2, r1)
+
+BEGIN_FTR_SECTION
+   /* IAMR was saved in regs->dar in pnv_powersave_common */
+   ld  r4,_DAR(r1)
+   mtspr   SPRN_IAMR,r4
+   /*
+* We don't need an isync here because the upcoming mtmsrd is
+* execution synchronizing.
+*/
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
+
ld  r4,PACAKMSR(r13)
ld  r5,_LINK(r1)
ld  r6,_CCR(r1)
-- 
2.20.1



Re: use generic DMA mapping code in powerpc V4

2019-02-08 Thread Christian Zigotzky
OK, I will test it.

— Christian

Sent from my iPhone

> On 8. Feb 2019, at 10:18, Christoph Hellwig  wrote:
> 
>> On Fri, Feb 08, 2019 at 10:01:46AM +0100, Christian Zigotzky wrote:
>> Hi Christoph,
>> 
>> Your new patch fixes the problems with the P.A. Semi Ethernet! :-)
> 
> Thanks a lot once again for testing!
> 
> Now can you test with this patch and the whole series?
> 
> I've updated the powerpc-dma.6 branch to include this fix.


Re: [PATCH 6/7] powerpc/eeh: Allow disabling recovery

2019-02-08 Thread Michael Ellerman
Oliver O'Halloran  writes:

> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index d1f0bdf41fac..92809b137e39 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1810,7 +1817,11 @@ static int __init eeh_init_proc(void)
>  &eeh_enable_dbgfs_ops);
>   debugfs_create_u32("eeh_max_freezes", 0600,
>   powerpc_debugfs_root, &eeh_max_freezes);
> + debugfs_create_bool("eeh_disable_recovery", 0600,
> + powerpc_debugfs_root,
> + &eeh_debugfs_no_recover);
>   eeh_cache_debugfs_init();
> +#endif

There's that endif.

When I'm rebasing and think I might have broken bisectability I
build every commit with:

  https://github.com/mpe/misc-scripts/blob/master/git/for-each-commit


cheers


Re: [PATCH 5/7] powerpc/pci: Add pci_find_hose_for_domain()

2019-02-08 Thread Michael Ellerman
Oliver O'Halloran  writes:
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index aee4fcc24990..149053b7f481 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -274,6 +274,8 @@ extern int pcibios_map_io_space(struct pci_bus *bus);
>  extern struct pci_controller *pci_find_hose_for_OF_device(
>   struct device_node* node);
>  
> +extern struct pci_controller *pci_find_hose_for_domain(uint32_t domain_nr);

I know we use "hose" a lot in the PCI code, but it's a stupid name. Can
we not introduce new usages?

It returns a pci_controller so pci_find_controller_for_domain() ?

cheers


Re: [PATCH 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache

2019-02-08 Thread Michael Ellerman
Oliver O'Halloran  writes:
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index f6e65375a8de..d1f0bdf41fac 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1810,7 +1810,7 @@ static int __init eeh_init_proc(void)
>  &eeh_enable_dbgfs_ops);
>   debugfs_create_u32("eeh_max_freezes", 0600,
>   powerpc_debugfs_root, &eeh_max_freezes);
> -#endif
> + eeh_cache_debugfs_init();

Oops :)

> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
> index b2c320e0fcef..dba421a577e7 100644
> --- a/arch/powerpc/kernel/eeh_cache.c
> +++ b/arch/powerpc/kernel/eeh_cache.c
> @@ -298,9 +299,34 @@ void eeh_addr_cache_build(void)
>   eeh_addr_cache_insert_dev(dev);
>   eeh_sysfs_add_device(dev);
>   }
> +}
>  
> -#ifdef DEBUG
> - /* Verify tree built up above, echo back the list of addrs. */
> - eeh_addr_cache_print(&pci_io_addr_cache_root);
> -#endif
> +static int eeh_addr_cache_show(struct seq_file *s, void *v)
> +{
> + struct rb_node *n = rb_first(&pci_io_addr_cache_root.rb_root);
> + struct pci_io_addr_range *piar;
> + int cnt = 0;
> +
> + spin_lock(&pci_io_addr_cache_root.piar_lock);
> + while (n) {
> + piar = rb_entry(n, struct pci_io_addr_range, rb_node);
> +
> + seq_printf(s, "%s addr range %3d [%pap-%pap]: %s\n",
> +(piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
> +&piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
> +
> + n = rb_next(n);
> + cnt++;
> + }

You can write that as a for loop, can't you?

	struct rb_node *n;
	int i = 0;

	for (n = rb_first(&pci_io_addr_cache_root.rb_root); n; n = rb_next(n), i++) {
		piar = rb_entry(n, struct pci_io_addr_range, rb_node);

		seq_printf(s, "%s addr range %3d [%pap-%pap]: %s\n",
			   (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", i,
			   &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
	}

cheers


Re: [PATCH 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes

2019-02-08 Thread Michael Ellerman
Oliver O'Halloran  writes:

> There's no need for the custom getter/setter functions so we should remove
> them in favour of using the generic one. While we're here, change the type
> of eeh_max_freezes to uint32_t and print the value in decimal rather than
> hex.
Please use kernel types, ie. u32.

Looks fine otherwise.
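
For what it's worth, after the conversion the knob behaves like any
other debugfs u32 file (usage below is illustrative):

  cat /sys/kernel/debug/powerpc/eeh_max_freezes    # now prints decimal
  echo 10 > /sys/kernel/debug/powerpc/eeh_max_freezes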

cheers

> Signed-off-by: Oliver O'Halloran 
> ---
>  arch/powerpc/include/asm/eeh.h |  2 +-
>  arch/powerpc/kernel/eeh.c  | 21 +++--
>  2 files changed, 4 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 8b596d096ebe..c003628441cc 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -219,7 +219,7 @@ struct eeh_ops {
>  };
>  
>  extern int eeh_subsystem_flags;
> -extern int eeh_max_freezes;
> +extern uint32_t eeh_max_freezes;
>  extern struct eeh_ops *eeh_ops;
>  extern raw_spinlock_t confirm_error_lock;
>  
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index ae05203eb4de..f6e65375a8de 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -109,7 +109,7 @@ EXPORT_SYMBOL(eeh_subsystem_flags);
>   * frozen count in last hour exceeds this limit, the PE will
>   * be forced to be offline permanently.
>   */
> -int eeh_max_freezes = 5;
> +uint32_t eeh_max_freezes = 5;
>  
>  /* Platform dependent EEH operations */
>  struct eeh_ops *eeh_ops = NULL;
> @@ -1796,22 +1796,8 @@ static int eeh_enable_dbgfs_get(void *data, u64 *val)
>   return 0;
>  }
>  
> -static int eeh_freeze_dbgfs_set(void *data, u64 val)
> -{
> - eeh_max_freezes = val;
> - return 0;
> -}
> -
> -static int eeh_freeze_dbgfs_get(void *data, u64 *val)
> -{
> - *val = eeh_max_freezes;
> - return 0;
> -}
> -
>  DEFINE_DEBUGFS_ATTRIBUTE(eeh_enable_dbgfs_ops, eeh_enable_dbgfs_get,
>eeh_enable_dbgfs_set, "0x%llx\n");
> -DEFINE_DEBUGFS_ATTRIBUTE(eeh_freeze_dbgfs_ops, eeh_freeze_dbgfs_get,
> -  eeh_freeze_dbgfs_set, "0x%llx\n");
>  #endif
>  
>  static int __init eeh_init_proc(void)
> @@ -1822,9 +1808,8 @@ static int __init eeh_init_proc(void)
>   debugfs_create_file_unsafe("eeh_enable", 0600,
>  powerpc_debugfs_root, NULL,
>  &eeh_enable_dbgfs_ops);
> - debugfs_create_file_unsafe("eeh_max_freezes", 0600,
> -powerpc_debugfs_root, NULL,
> -&eeh_freeze_dbgfs_ops);
> + debugfs_create_u32("eeh_max_freezes", 0600,
> + powerpc_debugfs_root, &eeh_max_freezes);
>  #endif
>   }
>  
> -- 
> 2.20.1


Re: use generic DMA mapping code in powerpc V4

2019-02-08 Thread Christoph Hellwig
On Fri, Feb 08, 2019 at 10:01:46AM +0100, Christian Zigotzky wrote:
> Hi Christoph,
>
> Your new patch fixes the problems with the P.A. Semi Ethernet! :-)

Thanks a lot once again for testing!

Now can you test with this patch and the whole series?

I've updated the powerpc-dma.6 branch to include this fix.
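
A sketch of the test flow, with the tree URL left as a placeholder
since it isn't given in this thread:

  git fetch <christoph-tree-url> powerpc-dma.6
  git checkout FETCH_HEAD
  make -j"$(nproc)"   # build, boot, then exercise the P.A. Semi Ethernet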


Re: use generic DMA mapping code in powerpc V4

2019-02-08 Thread Christian Zigotzky

Hi Christoph,

Your new patch fixes the problems with the P.A. Semi Ethernet! :-)

Thanks,
Christian


On 07 February 2019 at 05:34AM, Christian Zigotzky wrote:

Hi Christoph,

I also didn’t notice the 32-bit DMA mask in your patch. I have to read your 
patches and descriptions carefully in the future. I will test your new patch at 
the weekend.

Thanks,
Christian

Sent from my iPhone


On 6. Feb 2019, at 16:16, Christoph Hellwig  wrote:


On Wed, Feb 06, 2019 at 04:15:05PM +0100, Christoph Hellwig wrote:
The last good one was 29e7e2287e196f48fe5d2a6e017617723ea979bf
("dma-direct: we might need GFP_DMA for 32-bit dma masks"), if I
remember correctly.  powerpc/dma: use the dma_direct mapping routines
was the one that you said makes the pasemi ethernet stop working.

Can you post the dmesg from the failing runs?

But I just noticed I sent you a wrong patch - the pasemi ethernet
should set a 64-bit DMA mask, not 32-bit.  Updated version below,
32-bit would just keep the previous status quo.
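
As an aside, a non-WIP version of the hunk below would check the return
value; a minimal sketch, not part of the patch:

  /* dma_set_mask() returns 0 on success; fall back to 32-bit otherwise. */
  if (dma_set_mask(&mac->dma_pdev->dev, DMA_BIT_MASK(64)))
          dma_set_mask(&mac->dma_pdev->dev, DMA_BIT_MASK(32));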

commit 6c8f88045dee3597b9ce2ea5371eee37073a
Author: Christoph Hellwig 
Date:   Mon Feb 4 13:38:22 2019 +0100

pasemi WIP

diff --git a/drivers/net/ethernet/pasemi/pasemi_mac.c b/drivers/net/ethernet/pasemi/pasemi_mac.c
index 8a31a02c9f47..2d7d1589490a 100644
--- a/drivers/net/ethernet/pasemi/pasemi_mac.c
+++ b/drivers/net/ethernet/pasemi/pasemi_mac.c
@@ -1716,6 +1716,7 @@ pasemi_mac_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		err = -ENODEV;
 		goto out;
 	}
+	dma_set_mask(&mac->dma_pdev->dev, DMA_BIT_MASK(64));
 
 	mac->iob_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa001, NULL);
 	if (!mac->iob_pdev) {





Re: [PATCH 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache

2019-02-08 Thread kbuild test robot
Hi Oliver,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.0-rc4 next-20190207]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Oliver-O-Halloran/powerpc-eeh-Use-debugfs_create_u32-for-eeh_max_freezes/20190208-145918
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 8.2.0-11) 8.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=8.2.0 make.cross ARCH=powerpc 

Note: the linux-review/Oliver-O-Halloran/powerpc-eeh-Use-debugfs_create_u32-for-eeh_max_freezes/20190208-145918 HEAD a8dcd44575537e3e67a44fe3139b273a64c0f620 builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

>> arch/powerpc/kernel/eeh.c:1840: error: unterminated #ifdef
#ifdef CONFIG_DEBUG_FS


vim +1840 arch/powerpc/kernel/eeh.c

7f52a526f arch/powerpc/kernel/eeh.c Gavin Shan 2014-04-24  1835  
^1da177e4 arch/ppc64/kernel/eeh.c   Linus Torvalds 2005-04-16  1836  static int __init eeh_init_proc(void)
^1da177e4 arch/ppc64/kernel/eeh.c   Linus Torvalds 2005-04-16  1837  {
7f52a526f arch/powerpc/kernel/eeh.c Gavin Shan 2014-04-24  1838  	if (machine_is(pseries) || machine_is(powernv)) {
3f3942aca arch/powerpc/kernel/eeh.c Christoph Hellwig 2018-05-15  1839  		proc_create_single("powerpc/eeh", 0, NULL, proc_eeh_show);
7f52a526f arch/powerpc/kernel/eeh.c Gavin Shan 2014-04-24 @1840  #ifdef CONFIG_DEBUG_FS

:: The code at line 1840 was first introduced by commit
:: 7f52a526f64c69c913f0027fbf43821ff0b3a7d7 powerpc/eeh: Allow to disable EEH

:: TO: Gavin Shan 
:: CC: Benjamin Herrenschmidt 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation

