Re: [PATCH v2 1/3] powerpc/powernv: Always stop secondaries before reboot/shutdown

2017-11-10 Thread Nicholas Piggin
On Fri, 10 Nov 2017 22:08:32 +1100
Michael Ellerman  wrote:

> Nicholas Piggin  writes:
> 
> > Currently powernv reboot and shutdown requests just leave secondaries
> > to do their own things. This is undesirable because they can trigger
> > any number of watchdogs while waiting for reboot, but also we don't
> > know what else they might be doing, or they might be stuck somewhere
> > causing trouble.
> >
> > The opal scheduled flash update code already ran into watchdog problems
> > due to flashing taking a long time, but it's possible for regular
> > reboots to trigger problems too (this is with watchdog_thresh set to 1,
> > but I have seen it with watchdog_thresh at the default value once too):
> >
> >   reboot: Restarting system
> >   [  360.038896709,5] OPAL: Reboot request...
> >   Watchdog CPU:0 Hard LOCKUP
> >   Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
> >   Watchdog CPU:16 Hard LOCKUP
> >   watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]
> >
> > So remove the special case for flash update, and unconditionally do
> > smp_send_stop before rebooting.
> >
> > Return the CPUs to Linux stop loops rather than OPAL. The reason for
> > this is that the path to firmware is longer, and the CPUs may have
> > been interrupted from firmware, which may cause problems to re-enter
> > it. It's better to put them into a simple spin loop to maximize the
> > chance of a successful reboot.  
> 
> I always assumed we had to send the CPUs back to OPAL for the flashing
> procedure. Is it OK to leave them in Linux?

According to the comment and changelog

2196c6f1ed66eef23df3b478cfe71661ae83726e

It was added just to keep secondaries from going silly. Vasant, can
you remember details?

Thanks,
Nick


Re: [PATCHv4 5/6] symbol lookup: introduce dereference_symbol_descriptor()

2017-11-10 Thread Sergey Senozhatsky
On (11/10/17 10:09), Luck, Tony wrote:
> On Fri, Nov 10, 2017 at 08:48:29AM +0900, Sergey Senozhatsky wrote:
> > -Examples::
> > -
> > -   printk("Going to call: %pF\n", gettimeofday);
> > -   printk("Going to call: %pF\n", p->func);
> > -   printk("%s: called from %pS\n", __func__, (void *)_RET_IP_);
> > -   printk("%s: called from %pS\n", __func__,
> > -   (void *)__builtin_return_address(0));
> > -   printk("Faulted at %pS\n", (void *)regs->ip);
> > -   printk(" %s%pB\n", (reliable ? "" : "? "), (void *)*stack);
> 
> Did you mean to delete the Examples completely?  Wouldn't it
> be better to just update (s/%pF/%pS/g)?

good question. yes, I think I did it deliberately :) we still
kinda have some sort of "examples", right at the beginning of
section "Symbols/Function Pointers"


>  Symbols/Function Pointers
>  =========================
>
>  ::
>
>  %pS   versatile_init+0x0/0x110
>  %ps   versatile_init
>  %pF   versatile_init+0x0/0x110
>  %pf   versatile_init
>  %pSR  versatile_init+0x9/0x110
>        (with __builtin_extract_return_addr() translation)
>  %pB   prev_fn_of_versatile_init+0x88/0x88
>
>  The ``S`` and ``s`` specifiers are used for printing a pointer in symbolic
>  format. They result in the symbol name with (``S``) or without (``s``)
>  offsets. If KALLSYMS are disabled then the symbol address is printed instead.
>
>  Note, that the ``F`` and ``f`` specifiers are identical to ``S`` (``s``)
>  and thus deprecated. We have ``F`` and ``f`` because on ia64, ppc64 and
>  parisc64 function pointers are indirect and, in fact, are function
>  descriptors, which require additional dereferencing before we can lookup
>  the symbol. As of now, ``S`` and ``s`` perform dereferencing on those
>  platforms (when needed), so ``F`` and ``f`` exist for compatibility
>  reasons only.
>
>  The ``B`` specifier results in the symbol name with offsets and should be
>  used when printing stack backtraces. The specifier takes into
>  consideration the effect of compiler optimisations which may occur
>  when tail-calls are used and marked with the noreturn GCC attribute.

I can return Examples back. don't really have a strong opinion
on this. let me know.

-ss
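
As an aside on the function-descriptor note in the quoted documentation: the extra dereference can be pictured with a small userspace sketch. Everything below is hypothetical and only illustrates the idea. The struct layout, the names and the abi_uses_descriptors flag are assumptions, and this is not the dereference_symbol_descriptor() implementation from the patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ABI function descriptor (layout assumed). */
struct func_desc_sketch {
	uintptr_t entry;	/* address of the first instruction */
	uintptr_t toc;		/* e.g. a TOC or global pointer value */
};

/*
 * Return the code address that a symbol lookup should receive.
 * On a descriptor-based ABI the "function pointer" points at a
 * descriptor, so one extra load is needed to reach the entry point.
 * The flag is a simplification made for this sketch.
 */
uintptr_t deref_symbol_sketch(uintptr_t ptr, int abi_uses_descriptors)
{
	assert(ptr != 0);

	if (!abi_uses_descriptors)
		return ptr;

	return ((const struct func_desc_sketch *)ptr)->entry;
}
```

With that in place, a single specifier can do the right thing on both kinds of platform, which is the reason given above for keeping ``F`` and ``f`` only for compatibility.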


Re: [PATCHv4 0/6] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-11-10 Thread Sergey Senozhatsky
On (11/10/17 10:11), Luck, Tony wrote:
> On Fri, Nov 10, 2017 at 08:48:24AM +0900, Sergey Senozhatsky wrote:
> > All Ack-s/Tested-by-s were dropped, since the patch set has been
> > reworked. I'm kindly asking arch-s maintainers and developers to test it
> > once again. Sorry for any inconveniences and thanks for your help in
> > advance.
> 
> You can add back the:
> 
> Tested-by: Tony Luck  #ia64

Thanks a ton, Tony!

-ss


Re: [PATCH 1/9] include: Move compat_timespec/ timeval to compat_time.h

2017-11-10 Thread Steven Rostedt
On Fri, 10 Nov 2017 14:42:51 -0800
Deepa Dinamani  wrote:

> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> index 09ad88572746..db25aa15b705 100644
> --- a/arch/x86/include/asm/ftrace.h
> +++ b/arch/x86/include/asm/ftrace.h
> @@ -49,7 +49,7 @@ int ftrace_int3_handler(struct pt_regs *regs);
>  #if !defined(__ASSEMBLY__) && !defined(COMPILE_OFFSETS)
>  
>  #if defined(CONFIG_FTRACE_SYSCALLS) && defined(CONFIG_IA32_EMULATION)
> -#include 
> +#include 
>  

Acked-by: Steven Rostedt (VMware) 

-- Steve


[PATCH 1/9] include: Move compat_timespec/ timeval to compat_time.h

2017-11-10 Thread Deepa Dinamani
All the current architecture specific defines for these
are the same. Refactor these common defines to a common
header file.

The new common linux/compat_time.h is also useful as it
will eventually be used to hold all the defines that
are needed for compat time types that support non y2038
safe types. New architectures need not have to define these
new types as they will only use new y2038 safe syscalls.
This file can be deleted after y2038 when we stop supporting
non y2038 safe syscalls.

The patch also requires an operation similar to:

git grep "asm/compat\.h" | cut -d ":" -f 1 | xargs -n 1 sed -i -e "s%asm/compat.h%linux/compat.h%g"

Cc: a...@kernel.org
Cc: b...@kernel.crashing.org
Cc: borntrae...@de.ibm.com
Cc: catalin.mari...@arm.com
Cc: cmetc...@mellanox.com
Cc: coh...@redhat.com
Cc: da...@davemloft.net
Cc: del...@gmx.de
Cc: de...@driverdev.osuosl.org
Cc: gerald.schae...@de.ibm.com
Cc: gre...@linuxfoundation.org
Cc: heiko.carst...@de.ibm.com
Cc: hoepp...@linux.vnet.ibm.com
Cc: h...@zytor.com
Cc: j...@parisc-linux.org
Cc: j...@linux.vnet.ibm.com
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: mark.rutl...@arm.com
Cc: mi...@redhat.com
Cc: m...@ellerman.id.au
Cc: ober...@linux.vnet.ibm.com
Cc: oprofile-l...@lists.sf.net
Cc: pau...@samba.org
Cc: pet...@infradead.org
Cc: r...@linux-mips.org
Cc: rost...@goodmis.org
Cc: r...@kernel.org
Cc: schwidef...@de.ibm.com
Cc: seb...@linux.vnet.ibm.com
Cc: sparcli...@vger.kernel.org
Cc: s...@linux.vnet.ibm.com
Cc: ubr...@linux.vnet.ibm.com
Cc: will.dea...@arm.com
Cc: x...@kernel.org
Signed-off-by: Deepa Dinamani 
---
 arch/arm64/include/asm/compat.h   | 11 ---
 arch/arm64/include/asm/stat.h |  1 +
 arch/arm64/kernel/hw_breakpoint.c |  1 -
 arch/arm64/kernel/perf_regs.c |  2 +-
 arch/arm64/kernel/process.c   |  1 -
 arch/mips/include/asm/compat.h| 11 ---
 arch/mips/kernel/signal32.c   |  2 +-
 arch/parisc/include/asm/compat.h  | 11 ---
 arch/powerpc/include/asm/compat.h | 11 ---
 arch/powerpc/kernel/asm-offsets.c |  2 +-
 arch/powerpc/oprofile/backtrace.c |  2 +-
 arch/s390/hypfs/hypfs_sprp.c  |  1 -
 arch/s390/include/asm/compat.h| 11 ---
 arch/s390/include/asm/elf.h   |  3 +--
 arch/s390/kvm/priv.c  |  1 -
 arch/s390/pci/pci_clp.c   |  1 -
 arch/sparc/include/asm/compat.h   | 11 ---
 arch/tile/include/asm/compat.h| 11 ---
 arch/x86/events/core.c|  2 +-
 arch/x86/include/asm/compat.h | 11 ---
 arch/x86/include/asm/ftrace.h |  2 +-
 arch/x86/include/asm/sys_ia32.h   |  2 +-
 arch/x86/kernel/sys_x86_64.c  |  2 +-
 drivers/s390/block/dasd_ioctl.c   |  1 -
 drivers/s390/char/fs3270.c|  1 -
 drivers/s390/char/sclp_ctl.c  |  1 -
 drivers/s390/char/vmcp.c  |  1 -
 drivers/s390/cio/chsc_sch.c   |  1 -
 drivers/s390/net/qeth_core_main.c |  2 +-
 drivers/staging/pi433/pi433_if.c  |  2 +-
 include/linux/compat.h|  1 +
 include/linux/compat_time.h   | 19 +++
 32 files changed, 32 insertions(+), 110 deletions(-)
 create mode 100644 include/linux/compat_time.h

diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h
index e39d487bf724..d4f9c9ee3b15 100644
--- a/arch/arm64/include/asm/compat.h
+++ b/arch/arm64/include/asm/compat.h
@@ -34,7 +34,6 @@
 
 typedef u32compat_size_t;
 typedef s32compat_ssize_t;
-typedef s32compat_time_t;
 typedef s32compat_clock_t;
 typedef s32compat_pid_t;
 typedef u16__compat_uid_t;
@@ -66,16 +65,6 @@ typedef u32  compat_ulong_t;
 typedef u64compat_u64;
 typedef u32compat_uptr_t;
 
-struct compat_timespec {
-   compat_time_t   tv_sec;
-   s32 tv_nsec;
-};
-
-struct compat_timeval {
-   compat_time_t   tv_sec;
-   s32 tv_usec;
-};
-
 struct compat_stat {
 #ifdef __AARCH64EB__
short   st_dev;
diff --git a/arch/arm64/include/asm/stat.h b/arch/arm64/include/asm/stat.h
index 15e35598ac40..eab738019707 100644
--- a/arch/arm64/include/asm/stat.h
+++ b/arch/arm64/include/asm/stat.h
@@ -20,6 +20,7 @@
 
 #ifdef CONFIG_COMPAT
 
+#include 
 #include 
 
 /*
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 749f81779420..bfa2b78cf0e3 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -29,7 +29,6 @@
 #include 
 #include 
 
-#include 
 #include 
 #include 
 #include 
diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c
index 1d091d048d04..929fc369d0be 100644
--- a/arch/arm64/kernel/perf_regs.c
+++ b/arch/arm64/kernel/perf_regs.c
@@ -5,7 +5,7 @@
 #include 
 #include 
 
-#include 
+#include 
 #include 
 #include 
 
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/proc

[PATCH 0/9] posix_clocks: Prepare syscalls for 64 bit time_t conversion

2017-11-10 Thread Deepa Dinamani
The series is a preparation series for individual architectures
to use 64 bit time_t syscalls in compat and 32 bit emulation modes.

This is a follow up to the series Arnd Bergmann posted:
https://sourceware.org/ml/libc-alpha/2015-05/msg00070.html

Big picture is as per the lwn article:
https://lwn.net/Articles/643234/

The series is directed at converting posix clock syscalls:
clock_gettime, clock_settime, clock_getres and clock_nanosleep
to use a new data structure __kernel_timespec at syscall boundaries.
__kernel_timespec maintains 64 bit time_t across all execution modes.

vdso will be handled as part of each architecture when they enable
support for 64 bit time_t.

The compat syscalls are repurposed to provide backward compatibility
by using them as native syscalls as well for 32 bit architectures.
They will continue to use timespec at syscall boundaries.

CONFIG_64BIT_TIME controls whether the syscalls use __kernel_timespec
or timespec at syscall boundaries.

The series does the following:
1. Enable compat syscalls unconditionally.
2. Add a new __kernel_timespec type to be used as the data structure
   for all the new syscalls.
3. Add new config CONFIG_64BIT_TIME (instead of the CONFIG_COMPAT_TIME in
   [1] and [2]) to switch to the new definition of __kernel_timespec. It is
   the same as struct timespec otherwise.
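
To make the difference between the two layouts concrete, here is a rough userspace sketch. The *_sketch names and the exact field types are assumptions made for illustration; the 32-bit layout follows the compat_timespec definition moved in patch 1/9, while the 64-bit layout is only a guess at what __kernel_timespec looks like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit layout, mirroring struct compat_timespec from patch 1/9. */
struct compat_timespec_sketch {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/* Assumed y2038-safe layout: 64-bit seconds in every execution mode. */
struct kernel_timespec_sketch {
	int64_t tv_sec;
	long long tv_nsec;
};

/* Widening never loses information; tv_sec is sign-extended. */
struct kernel_timespec_sketch
widen_timespec_sketch(struct compat_timespec_sketch in)
{
	struct kernel_timespec_sketch out;

	out.tv_sec = in.tv_sec;
	out.tv_nsec = in.tv_nsec;
	return out;
}

/* Narrowing fails (-1) once tv_sec no longer fits in 32 bits. */
int narrow_timespec_sketch(struct kernel_timespec_sketch in,
			   struct compat_timespec_sketch *out)
{
	assert(out != NULL);

	if (in.tv_sec > INT32_MAX || in.tv_sec < INT32_MIN)
		return -1;

	out->tv_sec = (int32_t)in.tv_sec;
	out->tv_nsec = (int32_t)in.tv_nsec;
	return 0;
}
```

The failing narrow case is the y2038 problem in miniature: a seconds value one past INT32_MAX cannot be represented in the old layout at all.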

Arnd Bergmann (1):
  y2038: introduce CONFIG_64BIT_TIME

Deepa Dinamani (8):
  include: Move compat_timespec/ timeval to compat_time.h
  compat: Make compat helpers independent of CONFIG_COMPAT
  compat: enable compat_get/put_timespec64 always
  posix-clocks: Enable compat syscalls always
  include: Add new y2038 safe __kernel_timespec
  fix get_timespec64() for y2038 safe compat interfaces
  change time types to new y2038 safe __kernel_* types
  nanosleep: change time types to safe __kernel_* types

 arch/Kconfig   | 11 
 arch/arm64/include/asm/compat.h| 11 
 arch/arm64/include/asm/stat.h  |  1 +
 arch/arm64/kernel/hw_breakpoint.c  |  1 -
 arch/arm64/kernel/perf_regs.c  |  2 +-
 arch/arm64/kernel/process.c|  1 -
 arch/mips/include/asm/compat.h | 11 
 arch/mips/kernel/signal32.c|  2 +-
 arch/parisc/include/asm/compat.h   | 11 
 arch/powerpc/include/asm/compat.h  | 11 
 arch/powerpc/kernel/asm-offsets.c  |  2 +-
 arch/powerpc/oprofile/backtrace.c  |  2 +-
 arch/s390/hypfs/hypfs_sprp.c   |  1 -
 arch/s390/include/asm/compat.h | 11 
 arch/s390/include/asm/elf.h|  3 +-
 arch/s390/kvm/priv.c   |  1 -
 arch/s390/pci/pci_clp.c|  1 -
 arch/sparc/include/asm/compat.h| 11 
 arch/tile/include/asm/compat.h | 11 
 arch/x86/events/core.c |  2 +-
 arch/x86/include/asm/compat.h  | 11 
 arch/x86/include/asm/ftrace.h  |  2 +-
 arch/x86/include/asm/sys_ia32.h|  2 +-
 arch/x86/kernel/sys_x86_64.c   |  2 +-
 drivers/s390/block/dasd_ioctl.c|  1 -
 drivers/s390/char/fs3270.c |  1 -
 drivers/s390/char/sclp_ctl.c   |  1 -
 drivers/s390/char/vmcp.c   |  1 -
 drivers/s390/cio/chsc_sch.c|  1 -
 drivers/s390/net/qeth_core_main.c  |  2 +-
 drivers/staging/pi433/pi433_if.c   |  2 +-
 include/linux/compat.h |  7 ++-
 include/linux/compat_time.h| 23 +
 include/linux/restart_block.h  |  7 +--
 include/linux/syscalls.h   | 12 ++---
 include/linux/time.h   |  4 +-
 include/linux/time64.h | 10 +++-
 include/uapi/asm-generic/posix_types.h |  1 +
 include/uapi/linux/time.h  |  7 +++
 kernel/Makefile|  2 +-
 kernel/compat.c| 92 ++
 kernel/time/hrtimer.c  |  7 +--
 kernel/time/posix-stubs.c  | 12 ++---
 kernel/time/posix-timers.c | 20 
 kernel/time/time.c | 10 +++-
 45 files changed, 152 insertions(+), 195 deletions(-)
 create mode 100644 include/linux/compat_time.h


base-commit: d9e0e63d9a6f88440eb201e1491fcf730272c706
-- 
2.11.0

Cc: a...@kernel.org
Cc: b...@kernel.crashing.org
Cc: borntrae...@de.ibm.com
Cc: catalin.mari...@arm.com
Cc: cmetc...@mellanox.com
Cc: coh...@redhat.com
Cc: da...@davemloft.net
Cc: del...@gmx.de
Cc: de...@driverdev.osuosl.org
Cc: gerald.schae...@de.ibm.com
Cc: gre...@linuxfoundation.org
Cc: heiko.carst...@de.ibm.com
Cc: hoepp...@linux.vnet.ibm.com
Cc: h...@zytor.com
Cc: j...@parisc-linux.org
Cc: j...@linux.vnet.ibm.com
Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: mark.rutl...@arm.com
Cc: mi...@redhat.com
Cc: m...@ellerman.id.au
Cc: ober...@linux.vnet.ibm.com
Cc: oprofile-

[RFC PATCH v7 for 4.15 02/10] membarrier: powerpc: Skip memory barrier in switch_mm()

2017-11-10 Thread Mathieu Desnoyers
Allow PowerPC to skip the full memory barrier in switch_mm(), and
only issue the barrier when scheduling into a task belonging to a
process that has registered to use expedited private.

Threads targeting the same VM but belonging to different thread
groups are a tricky case. This has a few consequences:

It turns out that we cannot rely on get_nr_threads(p) to count the
number of threads using a VM. We can use
(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
instead to skip the synchronize_sched() for cases where the VM only has
a single user, and that user only has a single thread.

It also turns out that we cannot use for_each_thread() to set
thread flags in all threads using a VM, as it only iterates on the
thread group.

Therefore, test the membarrier state variable directly rather than
relying on thread flags. This means
membarrier_register_private_expedited() needs to set the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and
only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows
private expedited membarrier commands to succeed.
membarrier_arch_switch_mm() now tests for the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.
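
The ordering described above can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions: the flag values, the *_sketch names and the empty stand-in for synchronize_sched() are hypothetical, and none of it is the kernel code from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative flag values; the kernel's actual values may differ. */
enum {
	STATE_PRIVATE_EXPEDITED_READY_SKETCH	= 1 << 0,
	STATE_PRIVATE_EXPEDITED_SKETCH		= 1 << 1,
};

struct mm_sketch {
	atomic_int membarrier_state;
};

/* Stand-in for synchronize_sched(); a real grace period would wait here. */
void synchronize_sched_stub(void)
{
}

/* Registration: set the flag, wait, and only then mark the mm as ready. */
void register_private_expedited_sketch(struct mm_sketch *mm)
{
	assert(mm != NULL);

	if (atomic_load(&mm->membarrier_state) &
	    STATE_PRIVATE_EXPEDITED_READY_SKETCH)
		return;

	atomic_fetch_or(&mm->membarrier_state, STATE_PRIVATE_EXPEDITED_SKETCH);
	synchronize_sched_stub();
	atomic_fetch_or(&mm->membarrier_state,
			STATE_PRIVATE_EXPEDITED_READY_SKETCH);
}

/* A private expedited command only succeeds once READY is set. */
int private_expedited_cmd_sketch(struct mm_sketch *mm)
{
	if (!(atomic_load(&mm->membarrier_state) &
	      STATE_PRIVATE_EXPEDITED_READY_SKETCH))
		return -EPERM;
	return 0;
}

/* What the switch_mm() hook would test before issuing a barrier. */
int switch_mm_needs_barrier_sketch(struct mm_sketch *next)
{
	return (atomic_load(&next->membarrier_state) &
		STATE_PRIVATE_EXPEDITED_SKETCH) != 0;
}
```

Because the state lives in the mm rather than in per-thread flags, every thread sharing the VM observes it, including threads in other thread groups.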

Changes since v1:
- Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
  powerpc membarrier_arch_sched_in(), given that we want to specifically
  check the next thread state.
- Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
- Use task_thread_info() to pass thread_info from task to
  *_ti_thread_flag().

Changes since v2:
- Move membarrier_arch_sched_in() call to finish_task_switch().
- Check for NULL t->mm in membarrier_arch_fork().
- Use membarrier_sched_in() in generic code, which invokes the
  arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
  build on PowerPC.
- Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
  allnoconfig build on PowerPC.
- Build and runtime tested on PowerPC.

Changes since v3:
- Simply rely on copy_mm() to copy the membarrier_private_expedited mm
  field on fork.
- powerpc: test thread flag instead of reading
  membarrier_private_expedited in membarrier_arch_fork().
- powerpc: skip memory barrier in membarrier_arch_sched_in() if coming
  from kernel thread, since mmdrop() implies a full barrier.
- Set membarrier_private_expedited to 1 only after arch registration
  code, thus eliminating a race where concurrent commands could succeed
  when they should fail if issued concurrently with process
  registration.
- Use READ_ONCE() for membarrier_private_expedited field access in
  membarrier_private_expedited. Matches WRITE_ONCE() performed in
  process registration.

Changes since v4:
- Move powerpc hook from sched_in() to switch_mm(), based on feedback
  from Nicholas Piggin.

Changes since v5:
- Rebase on v4.14-rc6.
- Fold "Fix: membarrier: Handle CLONE_VM + !CLONE_THREAD correctly on
  powerpc (v2)"

Changes since v6:
- Rename MEMBARRIER_STATE_SWITCH_MM to MEMBARRIER_STATE_PRIVATE_EXPEDITED.

Signed-off-by: Mathieu Desnoyers 
CC: Peter Zijlstra 
CC: Paul E. McKenney 
CC: Boqun Feng 
CC: Andrew Hunter 
CC: Maged Michael 
CC: Avi Kivity 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Michael Ellerman 
CC: Dave Watson 
CC: Alan Stern 
CC: Will Deacon 
CC: Andy Lutomirski 
CC: Ingo Molnar 
CC: Alexander Viro 
CC: Nicholas Piggin 
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-a...@vger.kernel.org
---
 MAINTAINERS   |  1 +
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/include/asm/membarrier.h | 25 +
 arch/powerpc/mm/mmu_context.c |  7 +++
 include/linux/sched/mm.h  | 12 +++-
 init/Kconfig  |  3 +++
 kernel/sched/core.c   | 10 --
 kernel/sched/membarrier.c |  9 +
 8 files changed, 57 insertions(+), 11 deletions(-)
 create mode 100644 arch/powerpc/include/asm/membarrier.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 1022b5f51cd1..1c02a2be1698 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8837,6 +8837,7 @@ L:linux-ker...@vger.kernel.org
 S: Supported
 F: kernel/sched/membarrier.c
 F: include/uapi/linux/membarrier.h
+F: arch/powerpc/include/asm/membarrier.h
 
 MEMORY MANAGEMENT
 L: linux...@kvack.org
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 41d1dae3b1b5..e54a822e5fb9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -139,6 +139,7 @@ config PPC
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
+   select ARCH_HAS_MEMBARRIER_HOOKS
select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE
select ARCH_HAS_SG_CHAIN
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h
new file mode 100644
index ..046f96768ab5
--- /dev/null
+++ b/arch

RE: [PATCHv4 1/3] ARMv8: dts: ls1046a: add the property of IB and OB

2017-11-10 Thread Leo Li


> -----Original Message-----
> From: Kishon Vijay Abraham I [mailto:kis...@ti.com]
> Sent: Friday, November 10, 2017 12:22 AM
> To: Xiaowei Bao ; robh...@kernel.org;
> mark.rutl...@arm.com; catalin.mari...@arm.com; will.dea...@arm.com;
> bhelg...@google.com; shawn...@kernel.org; Madalin-cristian Bucur
> ; Sumit Garg ; Y.b. Lu
> ; hongtao@nxp.com; Andy Tang
> ; Leo Li ; jingooh...@gmail.com;
> pbrobin...@gmail.com; songxiao...@hisilicon.com;
> devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux-
> ker...@vger.kernel.org; linux-...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org; Z.q. Hou ; Mingkai Hu
> ; M.h. Lian 
> Subject: Re: [PATCHv4 1/3] ARMv8: dts: ls1046a: add the property of IB and OB
> 
> Hi Bao,
> 
> On Friday 10 November 2017 09:18 AM, Bao Xiaowei wrote:
> > Add the property of inbound and outbound windows number for ep driver.
> >
> > Signed-off-by: Bao Xiaowei 
> > Acked-by: Minghuan Lian 
> > ---
> >  v2:
> >  - no change
> >  v3:
> >  - modify the commit message
> >  v4:
> >  - no change
> >
> >  arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 6 ++
> >  1 file changed, 6 insertions(+)
> 
> $subject should start with something like
> arm64: dts: ls1046a: **
> >
> > diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> > index 06b5e12d04d8..f8332669663c 100644
> > --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> > +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> > @@ -674,6 +674,8 @@
> > device_type = "pci";
> > dma-coherent;
> > num-lanes = <4>;
> > +   num-ib-windows = <6>;
> > +   num-ob-windows = <6>;
> 
> EP specific properties shouldn't be added in RC dt node. Ideally you
> should have a separate dt node for RC and EP.

It is a single PCIe controller which can be configured to either RC mode
or EP mode.  Wouldn't it conflict with the device tree principles to have
two device tree nodes for the same PCIe controller?  And obviously the two
modes cannot be used at the same time so we cannot have two drivers both
probe on the same hardware.

Regards,
Leo


Re: [PATCH v2] watchdog: mpc8xxx: use the core worker function

2017-11-10 Thread Guenter Roeck
On Wed, Nov 08, 2017 at 03:39:44PM +0100, Christophe Leroy wrote:
> The watchdog core includes a worker function which pings the
> watchdog until user app starts pinging it and which also
> pings it if the HW require more frequent pings.
> Use that function instead of the dedicated timer.
> In the mean time, we can allow the user to change the timeout.
> 
> Then change the timeout module parameter to use seconds and
> use the watchdog_init_timeout() core function.
> 
> On some HW (eg: the 8xx), SWCRR contains bits unrelated to the
> watchdog which have to be preserved upon write.
> 
> This driver has nothing preventing the use of the magic close, so
> enable it.
> 
> Signed-off-by: Christophe Leroy 

Couple of comments, but unrelated to this patch.

Reviewed-by: Guenter Roeck 
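
For reference, the SWCRR handling mentioned in the commit message amounts to a read-modify-write that touches only the watchdog fields. A minimal userspace sketch follows; the mask values and the function name are assumptions made for illustration and are not taken from this mail.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field masks for the sketch (not copied from the driver). */
#define SWCRR_SWTC 0xFFFF0000u	/* Software Watchdog Time Count */
#define SWCRR_SWF  0x00000008u	/* Software Watchdog Freeze */
#define SWCRR_SWEN 0x00000004u	/* Watchdog Enable */
#define SWCRR_SWRI 0x00000002u	/* Reset/Interrupt Select */
#define SWCRR_SWPR 0x00000001u	/* Counter Prescale */

static_assert((SWCRR_SWTC & (SWCRR_SWF | SWCRR_SWEN | SWCRR_SWRI |
			     SWCRR_SWPR)) == 0,
	      "count field must not overlap the control bits");

/*
 * Compute the value to write back: clear only the watchdog fields,
 * keep every unrelated bit of the old register value, then set the
 * enable, prescale, count and (optionally) reset-select fields.
 */
uint32_t swcrr_start_value_sketch(uint32_t old, uint16_t swtc, int reset)
{
	uint32_t tmp = old;

	tmp &= ~(SWCRR_SWTC | SWCRR_SWF | SWCRR_SWEN | SWCRR_SWRI |
		 SWCRR_SWPR);
	tmp |= SWCRR_SWEN | SWCRR_SWPR | ((uint32_t)swtc << 16);

	if (reset)
		tmp |= SWCRR_SWRI;

	return tmp;
}
```

Bits outside the masked fields pass through unchanged, which is the property the 8xx needs.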

> ---
>  v2: set ddata->wdd.max_hw_heartbeat_ms and ddata->wdd.min_timeout
>  in probe instead of start
> 
>  drivers/watchdog/mpc8xxx_wdt.c | 84 +++---
>  1 file changed, 38 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c
> index 366e5c7e650b..aca2d6323f8a 100644
> --- a/drivers/watchdog/mpc8xxx_wdt.c
> +++ b/drivers/watchdog/mpc8xxx_wdt.c
> @@ -22,7 +22,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -31,10 +30,13 @@
>  #include 

Not needed.

>  #include 
>  
> +#define WATCHDOG_TIMEOUT 10
> +
>  struct mpc8xxx_wdt {
>   __be32 res0;
>   __be32 swcrr; /* System watchdog control register */
>  #define SWCRR_SWTC 0x /* Software Watchdog Time Count. */
> +#define SWCRR_SWF  0x0008 /* Software Watchdog Freeze (mpc8xx). */
>  #define SWCRR_SWEN 0x0004 /* Watchdog Enable bit. */
>  #define SWCRR_SWRI 0x0002 /* Software Watchdog Reset/Interrupt Select bit.*/
>  #define SWCRR_SWPR 0x0001 /* Software Watchdog Counter Prescale bit. */
> @@ -52,14 +54,15 @@ struct mpc8xxx_wdt_type {
>  struct mpc8xxx_wdt_ddata {
>   struct mpc8xxx_wdt __iomem *base;
>   struct watchdog_device wdd;
> - struct timer_list timer;
>   spinlock_t lock;

Not needed (the watchdog core handles locking).

> + u16 swtc;
>  };
>  
> -static u16 timeout = 0x;
> +static u16 timeout;
>  module_param(timeout, ushort, 0);
>  MODULE_PARM_DESC(timeout,
> - "Watchdog timeout in ticks. (0 + "Watchdog timeout in seconds. (1  
>  static bool reset = 1;
>  module_param(reset, bool, 0);
> @@ -80,31 +83,27 @@ static void mpc8xxx_wdt_keepalive(struct mpc8xxx_wdt_ddata *ddata)
>   spin_unlock(&ddata->lock);
>  }
>  
> -static void mpc8xxx_wdt_timer_ping(unsigned long arg)
> -{
> - struct mpc8xxx_wdt_ddata *ddata = (void *)arg;
> -
> - mpc8xxx_wdt_keepalive(ddata);
> - /* We're pinging it twice faster than needed, just to be sure. */
> - mod_timer(&ddata->timer, jiffies + HZ * ddata->wdd.timeout / 2);
> -}
> -
>  static int mpc8xxx_wdt_start(struct watchdog_device *w)
>  {
>   struct mpc8xxx_wdt_ddata *ddata =
>   container_of(w, struct mpc8xxx_wdt_ddata, wdd);
> -
> - u32 tmp = SWCRR_SWEN | SWCRR_SWPR;
> + u32 tmp = in_be32(&ddata->base->swcrr);
>  
>   /* Good, fire up the show */
> + tmp &= ~(SWCRR_SWTC | SWCRR_SWF | SWCRR_SWEN | SWCRR_SWRI | SWCRR_SWPR);
> + tmp |= SWCRR_SWEN | SWCRR_SWPR | (ddata->swtc << 16);
> +
>   if (reset)
>   tmp |= SWCRR_SWRI;
>  
> - tmp |= timeout << 16;
> -
>   out_be32(&ddata->base->swcrr, tmp);
>  
> - del_timer_sync(&ddata->timer);
> + tmp = in_be32(&ddata->base->swcrr);
> + if (!(tmp & SWCRR_SWEN))
> + return -EOPNOTSUPP;
> +
> + ddata->swtc = tmp >> 16;
> + set_bit(WDOG_HW_RUNNING, &ddata->wdd.status);
>  
>   return 0;
>  }
> @@ -118,17 +117,8 @@ static int mpc8xxx_wdt_ping(struct watchdog_device *w)
>   return 0;
>  }
>  
> -static int mpc8xxx_wdt_stop(struct watchdog_device *w)
> -{
> - struct mpc8xxx_wdt_ddata *ddata =
> - container_of(w, struct mpc8xxx_wdt_ddata, wdd);
> -
> - mod_timer(&ddata->timer, jiffies);
> - return 0;
> -}
> -
>  static struct watchdog_info mpc8xxx_wdt_info = {
> - .options = WDIOF_KEEPALIVEPING,
> + .options = WDIOF_KEEPALIVEPING | WDIOF_MAGICCLOSE | WDIOF_SETTIMEOUT,
>   .firmware_version = 1,
>   .identity = "MPC8xxx",
>  };
> @@ -137,7 +127,6 @@ static struct watchdog_ops mpc8xxx_wdt_ops = {
>   .owner = THIS_MODULE,
>   .start = mpc8xxx_wdt_start,
>   .ping = mpc8xxx_wdt_ping,
> - .stop = mpc8xxx_wdt_stop,
>  };
>  
>  static int mpc8xxx_wdt_probe(struct platform_device *ofdev)
> @@ -148,7 +137,6 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev)
>   struct mpc8xxx_wdt_ddata *ddata;
>   u32 freq = fsl_get_sys_freq();
>   bool enabled;
> - unsigned int timeout_sec;
>  
>   wdt_type = of_device_get_match_data(&ofdev->dev);
>   if (!wdt_type)
> @@ -173,27 +161,17 @@ static int mpc

[PATCH 2/2] powerpc/perf: Fix IMC_MAX_PMU macro

2017-11-10 Thread Madhavan Srinivasan
IMC_MAX_PMUS is used for static storage (per_nest_pmu_arr) which holds
nest pmu information. The current value of the macro is 32, based on
the initial number of nest pmu units supported by the nest microcode.
But going forward, microcode could support more nest units. Instead
of using static storage, dynamically allocate the array based on the
number of nest imc units found in the device tree.

Fixes: 8f95faaac56c1 ('powerpc/powernv: Detect and create IMC device')
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/imc-pmu.h|  6 +-
 arch/powerpc/perf/imc-pmu.c   | 15 ---
 arch/powerpc/platforms/powernv/opal-imc.c | 16 
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index 7f74c282710f..fad0e6ff460f 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -21,11 +21,6 @@
 #include 
 
 /*
- * For static allocation of some of the structures.
- */
-#define IMC_MAX_PMUS   32
-
-/*
  * Compatibility macros for IMC devices
  */
 #define IMC_DTB_COMPAT "ibm,opal-in-memory-counters"
@@ -125,4 +120,5 @@ enum {
 extern int init_imc_pmu(struct device_node *parent,
struct imc_pmu *pmu_ptr, int pmu_id);
 extern void thread_imc_disable(void);
+extern int get_max_nest_dev(void);
 #endif /* __ASM_POWERPC_IMC_PMU_H */
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 74db696ef365..c40cb5f7ceaf 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -26,7 +26,7 @@
  */
 static DEFINE_MUTEX(nest_init_lock);
 static DEFINE_PER_CPU(struct imc_pmu_ref *, local_nest_imc_refc);
-static struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+static struct imc_pmu **per_nest_pmu_arr;
 static cpumask_t nest_imc_cpumask;
 struct imc_pmu_ref *nest_imc_refc;
 static int nest_pmus;
@@ -286,13 +286,14 @@ static struct imc_pmu_ref *get_nest_pmu_ref(int cpu)
 static void nest_change_cpu_context(int old_cpu, int new_cpu)
 {
struct imc_pmu **pn = per_nest_pmu_arr;
-   int i;
 
if (old_cpu < 0 || new_cpu < 0)
return;
 
-   for (i = 0; *pn && i < IMC_MAX_PMUS; i++, pn++)
+   while (*pn) {
perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);
+   pn++;
+   }
 }
 
 static int ppc_nest_imc_cpu_offline(unsigned int cpu)
@@ -1192,6 +1193,7 @@ static void imc_common_cpuhp_mem_free(struct imc_pmu *pmu_ptr)
kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]->attrs);
kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]);
kfree(pmu_ptr);
+   kfree(per_nest_pmu_arr);
return;
 }
 
@@ -1216,6 +1218,13 @@ static int imc_mem_init(struct imc_pmu *pmu_ptr, struct device_node *parent,
return -ENOMEM;
 
/* Needed for hotplug/migration */
+   if (!per_nest_pmu_arr) {
+   per_nest_pmu_arr = kcalloc(get_max_nest_dev() + 1,
+   sizeof(struct imc_pmu *),
+   GFP_KERNEL);
+   if (!per_nest_pmu_arr)
+   return -ENOMEM;
+   }
per_nest_pmu_arr[pmu_index] = pmu_ptr;
break;
case IMC_DOMAIN_CORE:
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index b150f4deaccf..4764e6932cb7 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -153,6 +153,22 @@ static void disable_core_pmu_counters(void)
put_online_cpus();
 }
 
+int get_max_nest_dev(void)
+{
+   struct device_node *node;
+   u32 pmu_units, type;
+
+   for_each_compatible_node(node, NULL, IMC_DTB_UNIT_COMPAT) {
+   if (of_property_read_u32(node, "type", &type))
+   continue;
+
+   if (type == IMC_TYPE_CHIP)
+   pmu_units++;
+   }
+
+   return pmu_units;
+}
+
 static int opal_imc_counters_probe(struct platform_device *pdev)
 {
struct device_node *imc_dev = pdev->dev.of_node;
-- 
2.7.4
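
The allocation pattern used here, n + 1 zeroed slots so that the walk can stop at the first NULL entry, can be tried out in userspace. The sketch below is an illustration only: calloc() stands in for kcalloc(), the *_sketch names and the type value are hypothetical, and the counter is explicitly started from zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct pmu_sketch {
	int id;
};

enum { TYPE_CHIP_SKETCH = 0x10 };	/* hypothetical "nest" type value */

/* Count the units of the nest type; the counter starts from zero. */
size_t count_nest_units_sketch(const int *types, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (types[i] == TYPE_CHIP_SKETCH)
			count++;

	return count;
}

/*
 * Allocate max_units + 1 zeroed slots. The extra slot is never filled,
 * so it acts as the NULL terminator for the walk below (this relies on
 * all-zero bits being a NULL pointer, as on common platforms).
 */
struct pmu_sketch **alloc_pmu_table_sketch(size_t max_units)
{
	return calloc(max_units + 1, sizeof(struct pmu_sketch *));
}

/* Visit entries until the terminator, like the while (*pn) loop. */
size_t walk_pmu_table_sketch(struct pmu_sketch **table)
{
	size_t visited = 0;

	assert(table != NULL);

	for (struct pmu_sketch **pn = table; *pn; pn++)
		visited++;

	return visited;
}
```

The walk is only safe if the table is never filled past max_units entries, so the count and the number of registered units have to agree.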



[PATCH 1/2] powerpc/perf: Fix pmu_count to count only nest imc pmus

2017-11-10 Thread Madhavan Srinivasan
"pmu_count" in opal_imc_counters_probe() is intended to hold
the number of successful nest imc pmu registrations. But the
current code also counts other imc units like core_imc and
thread_imc. Add a check to count only nest imc pmus.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/platforms/powernv/opal-imc.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 21f6531fae20..b150f4deaccf 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -191,8 +191,10 @@ static int opal_imc_counters_probe(struct platform_device *pdev)
break;
}
 
-   if (!imc_pmu_create(imc_dev, pmu_count, domain))
-   pmu_count++;
+   if (!imc_pmu_create(imc_dev, pmu_count, domain)) {
+   if (domain == IMC_DOMAIN_NEST)
+   pmu_count++;
+   }
}
 
return 0;
-- 
2.7.4



Re: [PATCH] [net-next,v3] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver

2017-11-10 Thread Desnes Augusto Nunes do Rosário



On 11/10/2017 12:54 PM, Nathan Fontenot wrote:

On 11/10/2017 08:41 AM, Desnes Augusto Nunes do Rosário wrote:



On 11/09/2017 06:31 PM, Nathan Fontenot wrote:

On 11/09/2017 01:00 PM, Desnes Augusto Nunes do Rosario wrote:

This patch implements and enables VPD support for the ibmvnic driver.
Moreover, it includes the implementation of suitable structs, signal
transmission/handling and functions which allow the retrieval of firmware
information from the ibmvnic card through the ethtool command.

Signed-off-by: Desnes A. Nunes do Rosario 
Signed-off-by: Thomas Falcon 
---
   drivers/net/ethernet/ibm/ibmvnic.c | 149 
-
   drivers/net/ethernet/ibm/ibmvnic.h |  27 ++-
   2 files changed, 173 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index d0cff28..693b502 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -573,6 +573,15 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter)
   return 0;
   }

+static void release_vpd_data(struct ibmvnic_adapter *adapter)
+{
+    if (!adapter->vpd)
+    return;
+
+    kfree(adapter->vpd->buff);
+    kfree(adapter->vpd);
+}
+
   static void release_tx_pools(struct ibmvnic_adapter *adapter)
   {
   struct ibmvnic_tx_pool *tx_pool;
@@ -753,6 +762,8 @@ static void release_resources(struct ibmvnic_adapter 
*adapter)
   {
   int i;

+    release_vpd_data(adapter);
+
   release_tx_pools(adapter);
   release_rx_pools(adapter);

@@ -833,6 +844,53 @@ static int set_real_num_queues(struct net_device *netdev)
   return rc;
   }

+static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
+{
+    struct device *dev = &adapter->vdev->dev;
+    union ibmvnic_crq crq;
+    dma_addr_t dma_addr;
+    int len;
+
+    if (adapter->vpd->buff)
+    len = adapter->vpd->len;
+
+    reinit_completion(&adapter->fw_done);
+    crq.get_vpd_size.first = IBMVNIC_CRQ_CMD;
+    crq.get_vpd_size.cmd = GET_VPD_SIZE;
+    ibmvnic_send_crq(adapter, &crq);
+    wait_for_completion(&adapter->fw_done);
+


Shouldn't there be a check for the return code when getting the
vpd size?


Hello Nathan,

This check is already being performed in the handle_vpd_size_rsp() function 
down below.

In short, a GET_VPD_SIZE signal is sent here through an ibmvnic_crq union in 
ibmvnic_send_crq(), whereas handle_query_ip_offload_rsp() receives from the VNIC 
adapter a GET_VPD_SIZE_RSP containing an ibmvnic_crq union with the vpd size 
information and the rc.code. If successful, &adapter->fw_done is completed and 
this part of the code continues; however if not, a dev_err() is issued. The same 
logic applies to GET_VPD/GET_VPD_RSP.



Yes, I did see that code. You do a complete of the completion variable for both
success and failure; this then lets this routine continue regardless of the
results of the get vpd size request. The call to dev_err will print the error
message but does not make us bail out if the get vpd size request fails. Perhaps
set vpd->len to -1 to indicate that the get vpd call failed, which could then be
checked by the requester.

-Nathan

 >> What I am adding in the next version of the patch is a check that 
adapter->vpd->len is nonzero before allocating adapter->vpd->buff, since 
in case of a failure adapter->vpd->len will be 0.


I do concur with your observation that the break is necessary.

If the reception of the vpd size failed, adapter->vpd->len will still be 
zero, since the struct was allocated with kzalloc() in init_resources().


Thus, would you agree if I send the following code in the next version?

===
  +   reinit_completion(&adapter->fw_done);
  +   crq.get_vpd_size.first = IBMVNIC_CRQ_CMD;
  +   crq.get_vpd_size.cmd = GET_VPD_SIZE;
  +   ibmvnic_send_crq(adapter, &crq);
  +   wait_for_completion(&adapter->fw_done);
  +
->+   if(!adapter->vpd->len)
->+   return -ENODATA;
  +
  +   if (!adapter->vpd->buff)
  +   adapter->vpd->buff = kzalloc(adapter->vpd->len, 
GFP_KERNEL);

  +   else if (adapter->vpd->len != len)
  +   adapter->vpd->buff =
  +   krealloc(adapter->vpd->buff,
  +adapter->vpd->len, GFP_KERNEL);
===



Best Regards,





+    if (!adapter->vpd->buff)
+    adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL);
+    else if (adapter->vpd->len != len)
+    adapter->vpd->buff =
+    krealloc(adapter->vpd->buff,
+ adapter->vpd->len, GFP_KERNEL);
+
+    if (!adapter->vpd->buff) {
+    dev_err(dev, "Could allocate VPD buffer\n");
+    return -ENOMEM;
+    }
+
+    adapter->vpd->dma_addr =
+    dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len,
+   DMA_FROM_DEVICE);
+    if (dma_mapping_error(dev, dma_addr)) {
+    dev_err(dev, "Could not map VPD buffer\n");
+    return -ENOMEM;
+    }
+
+   

Re: [PATCHv4 0/6] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-11-10 Thread Luck, Tony
On Fri, Nov 10, 2017 at 08:48:24AM +0900, Sergey Senozhatsky wrote:
>   All Ack-s/Tested-by-s were dropped, since the patch set has been
> reworked. I'm kindly asking arch-s maintainers and developers to test it
> once again. Sorry for any inconveniences and thanks for your help in
> advance.

You can add back the:

Tested-by: Tony Luck  #ia64

Apart from my comment about dropping the Examples from the
Documentation the series looks OK to me.

-Tony


Re: [PATCH v9 00/51] powerpc, mm: Memory Protection Keys

2017-11-10 Thread Christophe LEROY

Hi

Le 06/11/2017 à 09:56, Ram Pai a écrit :

Memory protection keys enable an application to protect its
address space from inadvertent access or corruption
by itself.

These patches along with the pte-bit freeing patch series
enable the protection key feature on powerpc; 4k and 64k
hashpage kernels. They also change the generic and x86
code to expose memkey features through sysfs. Finally
testcases and Documentation are updated.

All patches can be found at --
https://github.com/rampai/memorykeys.git memkey.v9


As far as I can see you are focusing the implementation on 64-bit 
powerpc. This could also be implemented on 32-bit powerpc; for instance 
the 8xx has MMU Access Protection Registers which can be used to define 
16 domains and could, I think, be used for implementing protection keys.
Of course the challenge after that would be to find 4 spare PTE bits. 
I'm sure we can find them on the 8xx: at least when using 16k pages we 
have 2 bits already available, then by merging PAGE_SHARED and PAGE_USER 
and by reducing PAGE_RO to only one bit we can get the 4 spare bits.


Therefore I think it would be great if you could implement a framework 
common to both PPC32 and PPC64.


Christophe



The overall idea:
---
  A process allocates a key and associates it with
  an address range within its address space.
  The process then can dynamically set read/write
  permissions on the key without involving the
  kernel. Any code that violates the permissions
  of the address space, as defined by its associated
  key, will receive a segmentation fault.

This patch series enables the feature on PPC64 HPTE
platform.

ISA3.0 section 5.7.13 describes the detailed
specifications.


High-level view of the design:
---
When an application associates a key with an
address range, program the key in the Linux PTE.
When the MMU detects a page fault, allocate a hash
page and program the key into HPTE. And finally
when the MMU detects a key violation, due to
invalid application access, invoke the registered
signal handler and provide the violated key number.


Testing:
---
This patch series has passed all the protection key
tests available in the selftest directory. The
tests are updated to work on both x86 and powerpc.
The selftests have passed on x86 and powerpc hardware.

History:
---
version v9:
(1) used jump-labels to optimize code
-- Balbir
(2) fixed a register initialization bug noted
by Balbir
(3) fixed inappropriate use of paca to pass
siginfo and keys to signal handler
(4) Cleanup of comment style not to be right
justified -- mpe
(5) restructured the patches to depend on the
availability of VM_PKEY_BIT4 in
include/linux/mm.h
(6) Incorporated comments from Dave Hansen
towards changes to selftest and got
them tested on x86.

version v8:
(1) Contents of the AMR register withdrawn from
the siginfo structure. Applications can always
read the AMR register.
(2) AMR/IAMR/UAMOR are now available through
ptrace system call. -- thanks to Thiago
(3) code changes to handle legacy power cpus
that do not support execute-disable.
(4) incorporates many code improvement
suggestions.

version v7:
(1) refers to device tree property to enable
protection keys.
(2) adds 4K PTE support.
(3) fixes a couple of bugs noticed by Thiago
(4) decouples this patch series from arch-
 independent code. This patch series can
 now stand by itself, with one kludge
patch(2).

version v6:
(1) selftest changes are broken down into 20
incremental patches.
(2) A separate key allocation mask that
includes PKEY_DISABLE_EXECUTE is
added for powerpc
(3) pkey feature is enabled for 64K HPT case
only. RPT and 4k HPT is disabled.
(4) Documentation is updated to better
capture the semantics.
(5) introduced arch_pkeys_enabled() to find
if an arch enables pkeys. Correspond-
ing change the logic that displays
key value in smaps.
(6) code rearranged in many places based on
comments from Dave Hansen, Balbir,
Anshuman.   
(7) fixed one bug where a bogus key could be
associated successfully in
pkey_m

Re: [PATCHv4 5/6] symbol lookup: introduce dereference_symbol_descriptor()

2017-11-10 Thread Luck, Tony
On Fri, Nov 10, 2017 at 08:48:29AM +0900, Sergey Senozhatsky wrote:
> -Examples::
> -
> - printk("Going to call: %pF\n", gettimeofday);
> - printk("Going to call: %pF\n", p->func);
> - printk("%s: called from %pS\n", __func__, (void *)_RET_IP_);
> - printk("%s: called from %pS\n", __func__,
> - (void *)__builtin_return_address(0));
> - printk("Faulted at %pS\n", (void *)regs->ip);
> - printk(" %s%pB\n", (reliable ? "" : "? "), (void *)*stack);

Did you mean to delete the Examples completely?  Wouldn't it
be better to just update (s/%pF/%pS/g)?

-Tony


[PATCH v2 2/2] powerpc/pci: Unroll two pass loop when scanning bridges

2017-11-10 Thread Andy Shevchenko
The current scanning code is really hard to understand because it calls
the same function in a loop where pass value is changed without any
comments explaining it:

  for (pass = 0; pass < 2; pass++)
for_each_pci_bridge(dev, bus)
  max = pci_scan_bridge(bus, dev, max, pass);

An unfamiliar reader cannot easily tell the purpose of this loop
without looking at the internals of pci_scan_bridge().

In order to make this bit easier to understand, open-code the loop in
pci_scan_child_bus() and pci_hp_add_bridge() with added comments.

No functional changes intended.

Cc: Mika Westerberg 
Signed-off-by: Andy Shevchenko 
---
 arch/powerpc/kernel/pci-hotplug.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 741f47295188..cf47b1aec4c2 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -104,7 +104,7 @@ EXPORT_SYMBOL_GPL(pci_hp_remove_devices);
  */
 void pci_hp_add_devices(struct pci_bus *bus)
 {
-   int slotno, mode, pass, max;
+   int slotno, mode, max;
struct pci_dev *dev;
struct pci_controller *phb;
struct device_node *dn = pci_bus_to_OF_node(bus);
@@ -133,10 +133,17 @@ void pci_hp_add_devices(struct pci_bus *bus)
pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
pcibios_setup_bus_devices(bus);
max = bus->busn_res.start;
-   for (pass = 0; pass < 2; pass++) {
-   for_each_pci_bridge(dev, bus)
-   max = pci_scan_bridge(bus, dev, max, pass);
-   }
+   /*
+* Scan bridges that are already configured. We don't touch
+* them unless they are misconfigured (which will be done in
+* the second scan below).
+*/
+   for_each_pci_bridge(dev, bus)
+   max = pci_scan_bridge(bus, dev, max, 0);
+
+   /* Scan bridges that need to be reconfigured */
+   for_each_pci_bridge(dev, bus)
+   max = pci_scan_bridge(bus, dev, max, 1);
}
pcibios_finish_adding_to_bus(bus);
 }
-- 
2.14.2
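
For readers who want to convince themselves of the "no functional changes" claim, here is a minimal userspace sketch, not kernel code. It assumes a hypothetical scan_bridge() stand-in that only records its arguments (the real pci_scan_bridge() does far more), and plain integer indices in place of for_each_pci_bridge(). It checks that the two-pass loop and the unrolled form make the same calls in the same order.

```c
#include <string.h>

#define MAX_BRIDGES 8
#define MAX_CALLS (2 * MAX_BRIDGES)

/* One recorded call: which bridge was scanned, and in which pass. */
struct call {
	int bridge;
	int pass;
};

struct trace {
	struct call calls[MAX_CALLS];
	int n;
};

/* Hypothetical stand-in for pci_scan_bridge(): it only records the call. */
static int scan_bridge(struct trace *t, int bridge, int max, int pass)
{
	t->calls[t->n].bridge = bridge;
	t->calls[t->n].pass = pass;
	t->n++;
	return max + pass;	/* arbitrary, so that 'max' is threaded through */
}

/* Original form: the pass number is the outer loop variable. */
static int scan_two_pass_loop(struct trace *t, int nr_bridges, int max)
{
	int pass, b;

	for (pass = 0; pass < 2; pass++)
		for (b = 0; b < nr_bridges; b++)
			max = scan_bridge(t, b, max, pass);
	return max;
}

/* Unrolled form: one explicit loop per pass, each with its own comment. */
static int scan_unrolled(struct trace *t, int nr_bridges, int max)
{
	int b;

	/* First scan: bridges that are already configured. */
	for (b = 0; b < nr_bridges; b++)
		max = scan_bridge(t, b, max, 0);

	/* Second scan: bridges that need to be reconfigured. */
	for (b = 0; b < nr_bridges; b++)
		max = scan_bridge(t, b, max, 1);
	return max;
}

/* Returns 1 if both forms make identical calls and return the same max. */
static int forms_are_equivalent(int nr_bridges)
{
	struct trace a, b;
	int max_a, max_b;

	if (nr_bridges < 0 || nr_bridges > MAX_BRIDGES)
		return 0;

	memset(&a, 0, sizeof(a));
	memset(&b, 0, sizeof(b));
	max_a = scan_two_pass_loop(&a, nr_bridges, 0);
	max_b = scan_unrolled(&b, nr_bridges, 0);

	return max_a == max_b && a.n == b.n &&
	       memcmp(a.calls, b.calls, sizeof(a.calls)) == 0;
}
```

Calling forms_are_equivalent(n) for any n from 0 to MAX_BRIDGES should return 1; the only thing the unrolling changes is where the comments can live.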



[PATCH v2 1/2] powerpc/pci: convert to use for_each_pci_bridge() helper

2017-11-10 Thread Andy Shevchenko
...which makes code slightly cleaner.

Requires: d43f59ce6c50 ("PCI: Add for_each_pci_bridge() helper")
Acked-by: Michael Ellerman 
Signed-off-by: Andy Shevchenko 
---
 arch/powerpc/kernel/pci-hotplug.c | 7 ++-----
 arch/powerpc/kernel/pci_of_scan.c | 7 ++-----
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 2d71269e7dc1..741f47295188 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -134,11 +134,8 @@ void pci_hp_add_devices(struct pci_bus *bus)
pcibios_setup_bus_devices(bus);
max = bus->busn_res.start;
for (pass = 0; pass < 2; pass++) {
-   list_for_each_entry(dev, &bus->devices, bus_list) {
-   if (pci_is_bridge(dev))
-   max = pci_scan_bridge(bus, dev,
- max, pass);
-   }
+   for_each_pci_bridge(dev, bus)
+   max = pci_scan_bridge(bus, dev, max, pass);
}
}
pcibios_finish_adding_to_bus(bus);
diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 0d790f8432d2..8bdaa2a6fa62 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -369,11 +369,8 @@ static void __of_scan_bus(struct device_node *node, struct pci_bus *bus,
pcibios_setup_bus_devices(bus);
 
/* Now scan child busses */
-   list_for_each_entry(dev, &bus->devices, bus_list) {
-   if (pci_is_bridge(dev)) {
-   of_scan_pci_bridge(dev);
-   }
-   }
+   for_each_pci_bridge(dev, bus)
+   of_scan_pci_bridge(dev);
 }
 
 /**
-- 
2.14.2



Re: [PATCH] powerpc/64s/slice: Use addr limit when computing slice mask

2017-11-10 Thread Aneesh Kumar K.V
Michael Ellerman  writes:

> "Aneesh Kumar K.V"  writes:
>
>> While computing slice mask for the free area we need to make sure we only search
>> in the addr limit applicable for this mmap. We update the slb_addr_limit
>> after we request for a mmap above 128TB. But the following mmap request
>> with hint addr below 128TB should still limit its search to below 128TB. ie.
>> we should not use slb_addr_limit to compute slice mask in this case. Instead,
>> we should derive high addr limit based on the mmap hint addr value.
>>
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>>  arch/powerpc/mm/slice.c | 34 ++++++++++++++++++++++------------
>>  1 file changed, 22 insertions(+), 12 deletions(-)
>
> How does this relate to the fixes Nick has sent?

This patch is on top of the patch series sent by Nick. Without this
patch we will allocate memory across the 128TB boundary if hint_addr <
128TB but hint_addr + len is more. In order to reproduce this issue we
will have to map the stack below. Hence one won't hit the error in the
general case.
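
To make the intended rule concrete, here is a rough userspace sketch of it, not the code from the patch. The names MAP_WINDOW_128TB, search_high_limit() and range_fits() are made up for illustration, and the exact comparison used at the 128TB boundary is an assumption. The idea is that the search limit comes from the hint address, and slb_addr_limit only applies once the hint itself is above 128TB.

```c
#include <stdint.h>

#define TB (1ULL << 40)
#define MAP_WINDOW_128TB (128 * TB)	/* illustrative name for the 128TB boundary */

/*
 * Pick the upper bound for the slice search from the mmap hint address.
 * A hint below 128TB keeps the search below 128TB, even if an earlier
 * mmap has already raised slb_addr_limit above that.
 */
static uint64_t search_high_limit(uint64_t hint_addr, uint64_t slb_addr_limit)
{
	if (hint_addr >= MAP_WINDOW_128TB)
		return slb_addr_limit;

	return slb_addr_limit < MAP_WINDOW_128TB ? slb_addr_limit
						 : MAP_WINDOW_128TB;
}

/* Returns 1 if [hint_addr, hint_addr + len) lies entirely below the limit. */
static int range_fits(uint64_t hint_addr, uint64_t len, uint64_t slb_addr_limit)
{
	uint64_t high = search_high_limit(hint_addr, slb_addr_limit);

	/* Written this way to avoid overflow in hint_addr + len. */
	return len <= high && hint_addr <= high - len;
}
```

With slb_addr_limit at 512TB, a request of 2TB at hint 127TB does not fit under this rule, which is the case described above where the mapping would otherwise straddle 128TB.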



>
> cheers
>
>> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
>> index 564fff06f5c1..23ec2c5e3b78 100644
>> --- a/arch/powerpc/mm/slice.c
>> +++ b/arch/powerpc/mm/slice.c
>> @@ -122,7 +122,8 @@ static int slice_high_has_vma(struct mm_struct *mm, 
>> unsigned long slice)
>>  return !slice_area_is_free(mm, start, end - start);
>>  }
>>  
>> -static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask 
>> *ret)
>> +static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask 
>> *ret,
>> +unsigned long high_limit)
>>  {
>>  unsigned long i;
>>  
>> @@ -133,15 +134,16 @@ static void slice_mask_for_free(struct mm_struct *mm, 
>> struct slice_mask *ret)
>>  if (!slice_low_has_vma(mm, i))
>>  ret->low_slices |= 1u << i;
>>  
>> -if (mm->context.slb_addr_limit <= SLICE_LOW_TOP)
>> +if (high_limit <= SLICE_LOW_TOP)
>>  return;
>>  
>> -for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++)
>> +for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++)
>>  if (!slice_high_has_vma(mm, i))
>>  __set_bit(i, ret->high_slices);
>>  }
>>  
>> -static void slice_mask_for_size(struct mm_struct *mm, int psize, struct 
>> slice_mask *ret)
>> +static void slice_mask_for_size(struct mm_struct *mm, int psize, struct 
>> slice_mask *ret,
>> +unsigned long high_limit)
>>  {
>>  unsigned char *hpsizes;
>>  int index, mask_index;
>> @@ -156,8 +158,11 @@ static void slice_mask_for_size(struct mm_struct *mm, 
>> int psize, struct slice_ma
>>  if (((lpsizes >> (i * 4)) & 0xf) == psize)
>>  ret->low_slices |= 1u << i;
>>  
>> +if (high_limit <= SLICE_LOW_TOP)
>> +return;
>> +
>>  hpsizes = mm->context.high_slices_psize;
>> -for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) {
>> +for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++) {
>>  mask_index = i & 0x1;
>>  index = i >> 1;
>>  if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == psize)
>> @@ -169,6 +174,10 @@ static int slice_check_fit(struct mm_struct *mm,
>> struct slice_mask mask, struct slice_mask available)
>>  {
>>  DECLARE_BITMAP(result, SLICE_NUM_HIGH);
>> +/*
>> + * Make sure we just do bit compare only to the max
>> + * addr limit and not the full bit map size.
>> + */
>>  unsigned long slice_count = 
>> GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
>>  
>>  bitmap_and(result, mask.high_slices,
>> @@ -472,7 +481,7 @@ unsigned long slice_get_unmapped_area(unsigned long 
>> addr, unsigned long len,
>>  /* First make up a "good" mask of slices that have the right size
>>   * already
>>   */
>> -slice_mask_for_size(mm, psize, &good_mask);
>> +slice_mask_for_size(mm, psize, &good_mask, high_limit);
>>  slice_print_mask(" good_mask", good_mask);
>>  
>>  /*
>> @@ -497,7 +506,7 @@ unsigned long slice_get_unmapped_area(unsigned long 
>> addr, unsigned long len,
>>  #ifdef CONFIG_PPC_64K_PAGES
>>  /* If we support combo pages, we can allow 64k pages in 4k slices */
>>  if (psize == MMU_PAGE_64K) {
>> -slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask);
>> +slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask, high_limit);
>>  if (fixed)
>>  slice_or_mask(&good_mask, &compat_mask);
>>  }
>> @@ -530,11 +539,11 @@ unsigned long slice_get_unmapped_area(unsigned long 
>> addr, unsigned long len,
>>  return newaddr;
>>  }
>>  }
>> -
>> -/* We don't fit in the good mask, check what other slices are
>> +/*
>> + * We don't fit in the good mask, check what other slices are
>>   * empty and thus can be converted
>>   */
>> -slice_mask_for_free(mm, &

Re: [PATCH] [net-next,v3] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver

2017-11-10 Thread Nathan Fontenot
On 11/10/2017 08:41 AM, Desnes Augusto Nunes do Rosário wrote:
> 
> 
> On 11/09/2017 06:31 PM, Nathan Fontenot wrote:
>> On 11/09/2017 01:00 PM, Desnes Augusto Nunes do Rosario wrote:
>>> This patch implements and enables VPD support for the ibmvnic driver.
>>> Moreover, it includes the implementation of suitable structs, signal
>>>   transmission/handling and functions which allow the retrieval of firmware
>>>   information from the ibmvnic card through the ethtool command.
>>>
>>> Signed-off-by: Desnes A. Nunes do Rosario 
>>> Signed-off-by: Thomas Falcon 
>>> ---
>>>   drivers/net/ethernet/ibm/ibmvnic.c | 149 
>>> -
>>>   drivers/net/ethernet/ibm/ibmvnic.h |  27 ++-
>>>   2 files changed, 173 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
>>> b/drivers/net/ethernet/ibm/ibmvnic.c
>>> index d0cff28..693b502 100644
>>> --- a/drivers/net/ethernet/ibm/ibmvnic.c
>>> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
>>> @@ -573,6 +573,15 @@ static int reset_tx_pools(struct ibmvnic_adapter 
>>> *adapter)
>>>   return 0;
>>>   }
>>>
>>> +static void release_vpd_data(struct ibmvnic_adapter *adapter)
>>> +{
>>> +    if (!adapter->vpd)
>>> +    return;
>>> +
>>> +    kfree(adapter->vpd->buff);
>>> +    kfree(adapter->vpd);
>>> +}
>>> +
>>>   static void release_tx_pools(struct ibmvnic_adapter *adapter)
>>>   {
>>>   struct ibmvnic_tx_pool *tx_pool;
>>> @@ -753,6 +762,8 @@ static void release_resources(struct ibmvnic_adapter 
>>> *adapter)
>>>   {
>>>   int i;
>>>
>>> +    release_vpd_data(adapter);
>>> +
>>>   release_tx_pools(adapter);
>>>   release_rx_pools(adapter);
>>>
>>> @@ -833,6 +844,53 @@ static int set_real_num_queues(struct net_device 
>>> *netdev)
>>>   return rc;
>>>   }
>>>
>>> +static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
>>> +{
>>> +    struct device *dev = &adapter->vdev->dev;
>>> +    union ibmvnic_crq crq;
>>> +    dma_addr_t dma_addr;
>>> +    int len;
>>> +
>>> +    if (adapter->vpd->buff)
>>> +    len = adapter->vpd->len;
>>> +
>>> +    reinit_completion(&adapter->fw_done);
>>> +    crq.get_vpd_size.first = IBMVNIC_CRQ_CMD;
>>> +    crq.get_vpd_size.cmd = GET_VPD_SIZE;
>>> +    ibmvnic_send_crq(adapter, &crq);
>>> +    wait_for_completion(&adapter->fw_done);
>>> +
>>
>> Shouldn't there be a check for the return code when getting the
>> vpd size?
> 
> Hello Nathan,
> 
> This check is already being performed in the handle_vpd_size_rsp() function 
> down below.
> 
> In short, a GET_VPD_SIZE signal is sent here through an ibmvnic_crq union in 
> ibmvnic_send_crq(), whereas handle_query_ip_offload_rsp() receives from the 
> VNIC adapter a GET_VPD_SIZE_RSP containing an ibmvnic_crq union with the vpd 
> size information and the rc.code. If successful, &adapter->fw_done is 
> completed and this part of the code continues; however if not, a dev_err() 
> is issued. The same logic applies to GET_VPD/GET_VPD_RSP.
> 

Yes, I did see that code. You do a complete of the completion variable for both
success and failure; this then lets this routine continue regardless of the
results of the get vpd size request. The call to dev_err will print the error
message but does not make us bail out if the get vpd size request fails. Perhaps
set vpd->len to -1 to indicate that the get vpd call failed, which could then be
checked by the requester.

-Nathan
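
A rough userspace model of the suggestion, for illustration only. The struct and function names (vpd_model, handle_size_rsp(), get_vpd_size()) are hypothetical and none of this is driver code. The response handler records failure in len, and the requester checks it after the wait returns, since the completion fires on both success and failure.

```c
#include <errno.h>

/* Hypothetical stand-in for the vpd bookkeeping in the adapter. */
struct vpd_model {
	long len;
};

/*
 * Models the response handler: the completion is signalled in both
 * cases, so the only way to report failure is through vpd->len.
 */
static void handle_size_rsp(struct vpd_model *vpd, int rc, long len)
{
	vpd->len = rc ? -1 : len;
	/* complete(&adapter->fw_done) would be called here either way */
}

/*
 * Models the requester: fw_rc and fw_len play the role of the firmware
 * response that arrives while the caller waits for the completion.
 */
static int get_vpd_size(struct vpd_model *vpd, int fw_rc, long fw_len)
{
	handle_size_rsp(vpd, fw_rc, fw_len);

	/* Bail out instead of allocating a buffer of bogus size. */
	if (vpd->len <= 0)
		return -ENODATA;

	return 0;
}
```

Checking for len <= 0 covers both the explicit -1 marker and a length that was never filled in.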


> What I am adding in the next version of the patch is a check that 
> adapter->vpd->len is nonzero before allocating adapter->vpd->buff, 
> since in case of a failure adapter->vpd->len will be 0.
> 
> Best Regards,
> 
>>
>>
>>> +    if (!adapter->vpd->buff)
>>> +    adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL);
>>> +    else if (adapter->vpd->len != len)
>>> +    adapter->vpd->buff =
>>> +    krealloc(adapter->vpd->buff,
>>> + adapter->vpd->len, GFP_KERNEL);
>>> +
>>> +    if (!adapter->vpd->buff) {
>>> +    dev_err(dev, "Could allocate VPD buffer\n");
>>> +    return -ENOMEM;
>>> +    }
>>> +
>>> +    adapter->vpd->dma_addr =
>>> +    dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len,
>>> +   DMA_FROM_DEVICE);
>>> +    if (dma_mapping_error(dev, dma_addr)) {
>>> +    dev_err(dev, "Could not map VPD buffer\n");
>>> +    return -ENOMEM;
>>> +    }
>>> +
>>> +    reinit_completion(&adapter->fw_done);
>>> +    crq.get_vpd.first = IBMVNIC_CRQ_CMD;
>>> +    crq.get_vpd.cmd = GET_VPD;
>>> +    crq.get_vpd.ioba = cpu_to_be32(adapter->vpd->dma_addr);
>>> +    crq.get_vpd.len = cpu_to_be32((u32)adapter->vpd->len);
>>> +    ibmvnic_send_crq(adapter, &crq);
>>> +    wait_for_completion(&adapter->fw_done);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>   static int init_resources(struct ibmvnic_adapter *adapter)
>>>   {
>>>   struct net_device *netdev = adapter->netdev;
>>> @@ -850,6 +908,10 @@ static int init_resources(struct ibmvnic_adapter 
>>> *adapter)
>>>  

Re: [PATCH] [net-next,v3] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver

2017-11-10 Thread Desnes Augusto Nunes do Rosário



On 11/09/2017 06:31 PM, Nathan Fontenot wrote:

On 11/09/2017 01:00 PM, Desnes Augusto Nunes do Rosario wrote:

This patch implements and enables VPD support for the ibmvnic driver.
Moreover, it includes the implementation of suitable structs, signal
transmission/handling and functions which allow the retrieval of firmware
information from the ibmvnic card through the ethtool command.

Signed-off-by: Desnes A. Nunes do Rosario 
Signed-off-by: Thomas Falcon 
---
  drivers/net/ethernet/ibm/ibmvnic.c | 149 -
  drivers/net/ethernet/ibm/ibmvnic.h |  27 ++-
  2 files changed, 173 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index d0cff28..693b502 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -573,6 +573,15 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter)
return 0;
  }

+static void release_vpd_data(struct ibmvnic_adapter *adapter)
+{
+   if (!adapter->vpd)
+   return;
+
+   kfree(adapter->vpd->buff);
+   kfree(adapter->vpd);
+}
+
  static void release_tx_pools(struct ibmvnic_adapter *adapter)
  {
struct ibmvnic_tx_pool *tx_pool;
@@ -753,6 +762,8 @@ static void release_resources(struct ibmvnic_adapter 
*adapter)
  {
int i;

+   release_vpd_data(adapter);
+
release_tx_pools(adapter);
release_rx_pools(adapter);

@@ -833,6 +844,53 @@ static int set_real_num_queues(struct net_device *netdev)
return rc;
  }

+static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
+{
+   struct device *dev = &adapter->vdev->dev;
+   union ibmvnic_crq crq;
+   dma_addr_t dma_addr;
+   int len;
+
+   if (adapter->vpd->buff)
+   len = adapter->vpd->len;
+
+   reinit_completion(&adapter->fw_done);
+   crq.get_vpd_size.first = IBMVNIC_CRQ_CMD;
+   crq.get_vpd_size.cmd = GET_VPD_SIZE;
+   ibmvnic_send_crq(adapter, &crq);
+   wait_for_completion(&adapter->fw_done);
+


Shouldn't there be a check for the return code when getting the
vpd size?


Hello Nathan,

This check is already being performed in the handle_vpd_size_rsp() 
function down below.


In short, a GET_VPD_SIZE signal is sent here through an ibmvnic_crq union 
in ibmvnic_send_crq(), whereas handle_query_ip_offload_rsp() receives 
from the VNIC adapter a GET_VPD_SIZE_RSP containing an ibmvnic_crq union 
with the vpd size information and the rc.code. If successful, 
&adapter->fw_done is completed and this part of the code continues; however 
if not, a dev_err() is issued. The same logic applies to GET_VPD/GET_VPD_RSP.


What I am adding in the next version of the patch is a check that 
adapter->vpd->len is nonzero before allocating 
adapter->vpd->buff, since in case of a failure adapter->vpd->len 
will be 0.


Best Regards,





+   if (!adapter->vpd->buff)
+   adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL);
+   else if (adapter->vpd->len != len)
+   adapter->vpd->buff =
+   krealloc(adapter->vpd->buff,
+adapter->vpd->len, GFP_KERNEL);
+
+   if (!adapter->vpd->buff) {
+   dev_err(dev, "Could allocate VPD buffer\n");
+   return -ENOMEM;
+   }
+
+   adapter->vpd->dma_addr =
+   dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len,
+  DMA_FROM_DEVICE);
+   if (dma_mapping_error(dev, dma_addr)) {
+   dev_err(dev, "Could not map VPD buffer\n");
+   return -ENOMEM;
+   }
+
+   reinit_completion(&adapter->fw_done);
+   crq.get_vpd.first = IBMVNIC_CRQ_CMD;
+   crq.get_vpd.cmd = GET_VPD;
+   crq.get_vpd.ioba = cpu_to_be32(adapter->vpd->dma_addr);
+   crq.get_vpd.len = cpu_to_be32((u32)adapter->vpd->len);
+   ibmvnic_send_crq(adapter, &crq);
+   wait_for_completion(&adapter->fw_done);
+
+   return 0;
+}
+
  static int init_resources(struct ibmvnic_adapter *adapter)
  {
struct net_device *netdev = adapter->netdev;
@@ -850,6 +908,10 @@ static int init_resources(struct ibmvnic_adapter *adapter)
if (rc)
return rc;

+   adapter->vpd = kzalloc(sizeof(*adapter->vpd), GFP_KERNEL);
+   if (!adapter->vpd)
+   return -ENOMEM;
+
adapter->map_id = 1;
adapter->napi = kcalloc(adapter->req_rx_queues,
sizeof(struct napi_struct), GFP_KERNEL);
@@ -950,6 +1012,10 @@ static int ibmvnic_open(struct net_device *netdev)

rc = __ibmvnic_open(netdev);
netif_carrier_on(netdev);
+
+   /* Vital Product Data (VPD) */
+   ibmvnic_get_vpd(adapter);
+
mutex_unlock(&adapter->reset_lock);

return rc;
@@ -1878,11 +1944,15 @@ static int ibmvnic_get_link_ksettings(struct net_device 
*netdev,
return 0;
  }

-static void ibmvnic_get_drvi

Re: [linux-next][0692229e] next-20171106 fails to boot on Power 7

2017-11-10 Thread Michal Hocko
Hi Abdul,

On Tue 07-11-17 11:28:54, Michal Hocko wrote:
> On Tue 07-11-17 15:20:29, Abdul Haleem wrote:
> > Hi,
> > 
> > Today's next kernel fails to boot on Power 7 Machine with below errors
> > in boot log messages.
> > 
> > 'Uhuuh, elf segement at 1004 requested but the memory is
> > mapped already'
> > 
> > It was introduced with commit:
> > 
> > 0692229e : fs/binfmt_elf.c: drop MAP_FIXED usage from elf_map
> 
> Weird. Clashes shouldn't really happen. Maybe power is doing something
> different from other platforms. Could you apply the following debugging
> patch to see what is going on there?

Did you have a chance to test with this debugging patch, please?

> ---
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 0abc30d681ae..f098aaf60039 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -361,8 +361,28 @@ static unsigned long elf_vm_mmap(struct file *filep, 
> unsigned long addr,
>   return map_addr;
>  
>   if ((type & MAP_FIXED) && map_addr != addr) {
> - pr_info("Uhuuh, elf segement at %p requested but the memory is 
> mapped already\n",
> - (void*)addr);
> + struct mm_struct *mm = current->mm;
> + struct vm_area_struct *vma;
> + pr_info("Uhuuh, elf segement at %p requested but the memory is 
> mapped already, got %p\n",
> + (void*)addr, (void*)map_addr);
> + down_read(&mm->mmap_sem);
> + vma = find_vma(mm, addr);
> + if (vma) {
> + const char *name;
> + pr_info("Clashing vma [%lx, %lx] flags:%lx", 
> vma->vm_start, vma->vm_end, vma->vm_flags);
> + name = arch_vma_name(vma);
> + if (!name) {
> + if (vma->vm_start <= mm->brk &&
> + vma->vm_end >= mm->start_brk)
> + name = "[heap]";
> + else if (vma->vm_start <= 
> vma->vm_mm->start_stack &&
> + vma->vm_end >= 
> vma->vm_mm->start_stack)
> + name = "[stack]";
> + }
> + pr_cont(" name:%s\n", name);
> + } else
> + pr_info("Uhm, no clashing VMA\n");
> + up_read(&mm->mmap_sem);
>   return -EAGAIN;
>   }
>  
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


RE: POWER: Unexpected fault when writing to brk-allocated memory

2017-11-10 Thread David Laight
From: Matthew Wilcox
> Sent: 09 November 2017 19:44
> 
> On Fri, Nov 10, 2017 at 04:15:26AM +1100, Nicholas Piggin wrote:
> > So these semantics are what we're going with? Anything that does mmap() is
> > guaranteed of getting a 47-bit pointer and it can use the top 17 bits for
> > itself? Is this intended to be cross-platform or just x86 and power specific?
> 
> It is x86 and powerpc specific.  The arm64 people have apparently stumbled
> across apps that expect to be able to use bit 48 for their own purposes.
> And their address space is 48 bit by default.  Oops.

(Do you mean 49bit?)

Aren't such apps just doomed to be broken?

ISTR there is something on (IIRC) sparc64 that does a 'match'
on the high address bits to make it much harder to overrun
one area into another.

David
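
For reference, the kind of usage being discussed looks roughly like the sketch below. It assumes the 47-bit guarantee mentioned above, and the helper names are made up for illustration. An application packs a tag into the top 17 bits of a 64-bit word and masks it off again before using the address.

```c
#include <stdint.h>

#define ADDR_BITS 47				/* assumed user address width */
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)
#define TAG_MAX   ((UINT64_C(1) << (64 - ADDR_BITS)) - 1)	/* 17 bits */

/* Returns 1 if the address leaves the top 17 bits clear. */
static int fits_in_addr_bits(uint64_t addr)
{
	return (addr & ~ADDR_MASK) == 0;
}

/* Pack a tag of up to 17 bits above a 47-bit address. */
static uint64_t tag_pointer(uint64_t addr, uint64_t tag)
{
	return (addr & ADDR_MASK) | ((tag & TAG_MAX) << ADDR_BITS);
}

/* Recover the plain address; must be done before dereferencing. */
static uint64_t untag_pointer(uint64_t tagged)
{
	return tagged & ADDR_MASK;
}

/* Recover the tag stored in the top bits. */
static uint64_t pointer_tag(uint64_t tagged)
{
	return tagged >> ADDR_BITS;
}
```

The scheme silently breaks as soon as mmap() hands back an address for which fits_in_addr_bits() is false, which is why the default address space size matters to such applications.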



Re: [PATCH v9 44/51] selftest/vm: powerpc implementation for generic abstraction

2017-11-10 Thread Breno Leitao
Hi Ram,

On Thu, Nov 09, 2017 at 03:37:46PM -0800, Ram Pai wrote:
> On Thu, Nov 09, 2017 at 04:47:15PM -0200, Breno Leitao wrote:
> > On Mon, Nov 06, 2017 at 12:57:36AM -0800, Ram Pai wrote:
> > > @@ -206,12 +209,14 @@ void signal_handler(int signum, siginfo_t *si, void 
> > > *vucontext)
> > >  
> > >   trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
> > >   ip = uctxt->uc_mcontext.gregs[REG_IP_IDX];
> > > - fpregset = uctxt->uc_mcontext.fpregs;
> > > - fpregs = (void *)fpregset;
> > 
> > Since you removed all references for fpregset now, you probably want to
> > remove the declaration of the variable above.
> 
> fpregs is still needed.

Right, fpregs is still needed, but not fpregset. Every reference to this
variable was removed by your patch.

Grepping for this identifier in a tree with your patches applied, I see:

 $ grep fpregset protection_keys.c 
 fpregset_t fpregset;


Re: [PATCH v2 1/3] powerpc/powernv: Always stop secondaries before reboot/shutdown

2017-11-10 Thread Michael Ellerman
Nicholas Piggin  writes:

> Currently powernv reboot and shutdown requests just leave secondaries
> to do their own things. This is undesirable because they can trigger
> any number of watchdogs while waiting for reboot, but also we don't
> know what else they might be doing, or they might be stuck somewhere
> causing trouble.
>
> The opal scheduled flash update code already ran into watchdog problems
> due to flashing taking a long time, but it's possible for regular
> reboots to trigger problems too (this is with watchdog_thresh set to 1,
> but I have seen it with watchdog_thresh at the default value once too):
>
>   reboot: Restarting system
>   [  360.038896709,5] OPAL: Reboot request...
>   Watchdog CPU:0 Hard LOCKUP
>   Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
>   Watchdog CPU:16 Hard LOCKUP
>   watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]
>
> So remove the special case for flash update, and unconditionally do
> smp_send_stop before rebooting.
>
> Return the CPUs to Linux stop loops rather than OPAL. The reason for
> this is that the path to firmware is longer, and the CPUs may have
> been interrupted from firmware, which may cause problems to re-enter
> it. It's better to put them into a simple spin loop to maximize the
> chance of a successful reboot.

I always assumed we had to send the CPUs back to OPAL for the flashing
procedure. Is it OK to leave them in Linux?

cheers


Re: [PATCH v3] kernel/module_64.c: Add REL24 relocation support of livepatch symbols

2017-11-10 Thread Michael Ellerman
Josh Poimboeuf  writes:

> On Fri, Nov 10, 2017 at 01:06:25PM +1100, Balbir Singh wrote:
>> On Fri, Nov 10, 2017 at 2:19 AM, Josh Poimboeuf  wrote:
>> > FWIW, I think it won't matter anyway.  I'm currently pursuing the option
>> > of inserting nops after local calls, because it has less runtime
>> > complexity than using a stub.
>> >
>> > I think I've figured out a way to do it with a GCC plugin, but if that
>> > doesn't work I'll try the asm listing sed approach suggested by Michael.
>> >
>> 
>> A plugin that runs for the new kernel with the patch? Just for
>> specific files involved in the patch?
>
> The plugin will affect the code generation of all functions in the patch
> module.  So all calls in all replacement functions will have the nops.
>
> Here's a prototype (not yet fully tested):
>
>   https://github.com/jpoimboe/kpatch/blob/TODO-ppc-fix/kpatch-build/gcc-plugins/ppc64le-plugin.c

Nice.

cheers


Re: [PATCH] powerpc/64s/slice: Use addr limit when computing slice mask

2017-11-10 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:

> While computing the slice mask for the free area we need to make sure we only
> search within the addr limit applicable to this mmap. We update the
> slb_addr_limit after we request an mmap above 128TB. But a following mmap
> request with a hint addr below 128TB should still limit its search to below
> 128TB, i.e. we should not use slb_addr_limit to compute the slice mask in this
> case. Instead, we should derive the high addr limit from the mmap hint addr
> value.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/mm/slice.c | 34 ++
>  1 file changed, 22 insertions(+), 12 deletions(-)

How does this relate to the fixes Nick has sent?

cheers
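
To make the changelog concrete, here is a rough userspace sketch of the
rule it describes. It is not the kernel code: high_limit_for_hint() and
high_slices_to_scan() are hypothetical helpers, the 128TB window and the
1TB high-slice size are assumed values for illustration, and treating a
hint of exactly 128TB as "above" is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define TB                 (UINT64_C(1) << 40)
#define DEFAULT_MAP_WINDOW (128 * TB)	/* assumed default mmap window */
#define SLICE_HIGH_SHIFT   40		/* assumed: 1TB high slices */

/*
 * Hypothetical helper: pick the search limit from the mmap hint rather
 * than from the (possibly already raised) slb_addr_limit.
 */
static uint64_t high_limit_for_hint(uint64_t hint_addr, uint64_t slb_addr_limit)
{
	assert(slb_addr_limit >= DEFAULT_MAP_WINDOW);

	/* Only a hint at or above the default window opts in to the full range. */
	if (hint_addr >= DEFAULT_MAP_WINDOW)
		return slb_addr_limit;

	return DEFAULT_MAP_WINDOW;
}

/* Hypothetical helper: number of high slices a mask computation would visit. */
static uint64_t high_slices_to_scan(uint64_t high_limit)
{
	return high_limit >> SLICE_HIGH_SHIFT;
}
```

With an slb_addr_limit already raised to 512TB, a hint at 1TB still yields
a 128TB limit in this sketch, so only the slices below 128TB are
considered.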

> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index 564fff06f5c1..23ec2c5e3b78 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -122,7 +122,8 @@ static int slice_high_has_vma(struct mm_struct *mm, unsigned long slice)
>   return !slice_area_is_free(mm, start, end - start);
>  }
>  
> -static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret)
> +static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
> + unsigned long high_limit)
>  {
>   unsigned long i;
>  
> @@ -133,15 +134,16 @@ static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret)
>   if (!slice_low_has_vma(mm, i))
>   ret->low_slices |= 1u << i;
>  
> - if (mm->context.slb_addr_limit <= SLICE_LOW_TOP)
> + if (high_limit <= SLICE_LOW_TOP)
>   return;
>  
> - for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++)
> + for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++)
>   if (!slice_high_has_vma(mm, i))
>   __set_bit(i, ret->high_slices);
>  }
>  
> -static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_mask *ret)
> +static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_mask *ret,
> + unsigned long high_limit)
>  {
>   unsigned char *hpsizes;
>   int index, mask_index;
> @@ -156,8 +158,11 @@ static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_ma
>   if (((lpsizes >> (i * 4)) & 0xf) == psize)
>   ret->low_slices |= 1u << i;
>  
> + if (high_limit <= SLICE_LOW_TOP)
> + return;
> +
>   hpsizes = mm->context.high_slices_psize;
> - for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) {
> + for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++) {
>   mask_index = i & 0x1;
>   index = i >> 1;
>   if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == psize)
> @@ -169,6 +174,10 @@ static int slice_check_fit(struct mm_struct *mm,
>  struct slice_mask mask, struct slice_mask available)
>  {
>   DECLARE_BITMAP(result, SLICE_NUM_HIGH);
> + /*
> +  * Make sure we just do bit compare only to the max
> +  * addr limit and not the full bit map size.
> +  */
>   unsigned long slice_count = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
>  
>   bitmap_and(result, mask.high_slices,
> @@ -472,7 +481,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   /* First make up a "good" mask of slices that have the right size
>* already
>*/
> - slice_mask_for_size(mm, psize, &good_mask);
> + slice_mask_for_size(mm, psize, &good_mask, high_limit);
>   slice_print_mask(" good_mask", good_mask);
>  
>   /*
> @@ -497,7 +506,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>  #ifdef CONFIG_PPC_64K_PAGES
>   /* If we support combo pages, we can allow 64k pages in 4k slices */
>   if (psize == MMU_PAGE_64K) {
> - slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask);
> + slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask, high_limit);
>   if (fixed)
>   slice_or_mask(&good_mask, &compat_mask);
>   }
> @@ -530,11 +539,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   return newaddr;
>   }
>   }
> -
> - /* We don't fit in the good mask, check what other slices are
> + /*
> +  * We don't fit in the good mask, check what other slices are
>* empty and thus can be converted
>*/
> - slice_mask_for_free(mm, &potential_mask);
> + slice_mask_for_free(mm, &potential_mask, high_limit);
>   slice_or_mask(&potential_mask, &good_mask);
>   slice_print_mask(" potential", potential_mask);
>  
> @@ -744,17 +753,18 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
>  {
>   struct slice_mask mask, available;
>   unsigned int psize = mm->context.user_p

Re: [PATCHv4 2/3] ARMv8: layerscape: add the pcie ep function support

2017-11-10 Thread Kishon Vijay Abraham I
Hi,

On Friday 10 November 2017 09:18 AM, Bao Xiaowei wrote:
> Add the pcie controller ep function support of layerscape base on
> pcie ep framework.
> 
> Signed-off-by: Bao Xiaowei 
> ---
>  v2:
>  - fix the ioremap function used but no ioumap issue
>  - optimize the code structure
>  - add code comments
>  v3:
>  - fix the msi outband window request failed issue
>  v4:
>  - optimize the code, adjust the format
> 
>  drivers/pci/dwc/pci-layerscape.c | 120 
> ---
>  1 file changed, 113 insertions(+), 7 deletions(-)

$subject should begin with
PCI: layerscape:
> 
> diff --git a/drivers/pci/dwc/pci-layerscape.c b/drivers/pci/dwc/pci-layerscape.c
> index 87fa486bee2c..6f3e434599e0 100644
> --- a/drivers/pci/dwc/pci-layerscape.c
> +++ b/drivers/pci/dwc/pci-layerscape.c
> @@ -34,7 +34,12 @@
>  /* PEX Internal Configuration Registers */
>  #define PCIE_STRFMR1 0x71c /* Symbol Timer & Filter Mask Register1 */
>  
> +#define PCIE_DBI2_BASE   0x1000  /* DBI2 base address*/

The base address should come from dt.
> +#define PCIE_MSI_MSG_DATA_OFF0x5c/* MSI Data register address*/
> +#define PCIE_MSI_OB_SIZE 4096
> +#define PCIE_MSI_ADDR_OFFSET (1024 * 1024)
>  #define PCIE_IATU_NUM6
> +#define PCIE_EP_ADDR_SPACE_SIZE 0x1
>  
>  struct ls_pcie_drvdata {
>   u32 lut_offset;
> @@ -44,12 +49,20 @@ struct ls_pcie_drvdata {
>   const struct dw_pcie_ops *dw_pcie_ops;
>  };
>  
> +struct ls_pcie_ep {
> + dma_addr_t msi_phys_addr;
> + void __iomem *msi_virt_addr;
> + u64 msi_msg_addr;
> + u16 msi_msg_data;
> +};
> +
>  struct ls_pcie {
>   struct dw_pcie *pci;
>   void __iomem *lut;
>   struct regmap *scfg;
>   const struct ls_pcie_drvdata *drvdata;
>   int index;
> + struct ls_pcie_ep *pcie_ep;
>  };
>  
>  #define to_ls_pcie(x)dev_get_drvdata((x)->dev)
> @@ -263,6 +276,99 @@ static const struct of_device_id ls_pcie_of_match[] = {
>   { },
>  };
>  
> +static void ls_pcie_raise_msi_irq(struct ls_pcie_ep *pcie_ep)
> +{
> + iowrite32(pcie_ep->msi_msg_data, pcie_ep->msi_virt_addr);
> +}
> +
> +static int ls_pcie_raise_irq(struct dw_pcie_ep *ep,
> + enum pci_epc_irq_type type, u8 interrupt_num)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + struct ls_pcie *pcie = to_ls_pcie(pci);
> + struct ls_pcie_ep *pcie_ep = pcie->pcie_ep;
> + u32 free_win;
> +
> + /* get the msi message address and msi message data */
> + pcie_ep->msi_msg_addr = ioread32(pci->dbi_base + MSI_MESSAGE_ADDR_L32) |
> + (((u64)ioread32(pci->dbi_base + MSI_MESSAGE_ADDR_U32)) << 32);
> + pcie_ep->msi_msg_data = ioread16(pci->dbi_base + PCIE_MSI_MSG_DATA_OFF);
> +
> + /* request and config the outband window for msi */
> + free_win = find_first_zero_bit(&ep->ob_window_map,
> + sizeof(ep->ob_window_map));
> + if (free_win >= ep->num_ob_windows) {
> + dev_err(pci->dev, "no free outbound window\n");
> + return -ENOMEM;
> + }
> +
> + dw_pcie_prog_outbound_atu(pci, free_win, PCIE_ATU_TYPE_MEM,
> + pcie_ep->msi_phys_addr,
> + pcie_ep->msi_msg_addr,
> + PCIE_MSI_OB_SIZE);
> +
> + set_bit(free_win, &ep->ob_window_map);

This custom logic is not required. You can use [1] instead

[1] -> https://lkml.org/lkml/2017/11/3/318
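
For reference, the custom logic above boils down to a first-free-bit
allocator over the window bitmap. The sketch below is a plain userspace
illustration of that pattern, not the helper from [1]; first_zero_bit(),
claim_window() and release_window() are hypothetical names. Note that the
search length is a count of bits: the kernel's find_first_zero_bit() also
takes its size argument in bits, whereas the quoted code passes sizeof(),
which is a byte count.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for a single-word find_first_zero_bit(). */
static unsigned int first_zero_bit(unsigned long map, unsigned int nbits)
{
	unsigned int i;

	for (i = 0; i < nbits; i++)
		if (!(map & (1UL << i)))
			return i;

	return nbits;	/* no free bit in the first nbits */
}

/* Claim the lowest free window; returns its index, or -1 if all are busy. */
static int claim_window(unsigned long *map, unsigned int num_windows)
{
	unsigned int win;

	assert(num_windows <= sizeof(*map) * CHAR_BIT);

	win = first_zero_bit(*map, num_windows);
	if (win >= num_windows)
		return -1;

	*map |= 1UL << win;
	return (int)win;
}

/* Give a window back so a later claim can reuse it. */
static void release_window(unsigned long *map, unsigned int win)
{
	*map &= ~(1UL << win);
}
```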
> +
> + /* generate the msi interrupt */
> + ls_pcie_raise_msi_irq(pcie_ep);
> +
> + /* release the outband window of msi */
> + dw_pcie_disable_atu(pci, free_win, DW_PCIE_REGION_OUTBOUND);
> + clear_bit(free_win, &ep->ob_window_map);
> +
> + return 0;
> +}
> +
> +static struct dw_pcie_ep_ops pcie_ep_ops = {
> + .raise_irq = ls_pcie_raise_irq,
> +};
> +
> +static int __init ls_add_pcie_ep(struct ls_pcie *pcie,
> + struct platform_device *pdev)
> +{
> + struct dw_pcie *pci = pcie->pci;
> + struct device *dev = pci->dev;
> + struct dw_pcie_ep *ep;
> + struct ls_pcie_ep *pcie_ep;
> + struct resource *cfg_res;
> + int ret;
> +
> + ep = &pci->ep;
> + ep->ops = &pcie_ep_ops;
> +
> + pcie_ep = devm_kzalloc(dev, sizeof(*pcie_ep), GFP_KERNEL);
> + if (!pcie_ep)
> + return -ENOMEM;
> +
> + pcie->pcie_ep = pcie_ep;
> +
> + cfg_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "config");
> + if (cfg_res) {
> + ep->phys_base = cfg_res->start;
> + ep->addr_size = PCIE_EP_ADDR_SPACE_SIZE;
> + } else {
> + dev_err(dev, "missing *config* space\n");
> + return -ENODEV;
> + }
> +
> + pcie_ep->msi_phys_addr = ep->phys_base + PCIE_MSI_ADDR_OFFSET;
> +
> + pcie_ep->msi_virt_addr = ioremap(pcie_ep->msi_phys_addr,
> + PCIE_MSI_OB_SI

Re: [PATCHv4 1/3] ARMv8: dts: ls1046a: add the property of IB and OB

2017-11-10 Thread Kishon Vijay Abraham I
Hi Bao,

On Friday 10 November 2017 09:18 AM, Bao Xiaowei wrote:
> Add the property of inbound and outbound windows number for ep
> driver.
> 
> Signed-off-by: Bao Xiaowei 
> Acked-by: Minghuan Lian 
> ---
>  v2:
>  - no change
>  v3:
>  - modify the commit message
>  v4:
>  - no change
> 
>  arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 6 ++
>  1 file changed, 6 insertions(+)

$subject should start with something like
arm64: dts: ls1046a: **
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> index 06b5e12d04d8..f8332669663c 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> @@ -674,6 +674,8 @@
>   device_type = "pci";
>   dma-coherent;
>   num-lanes = <4>;
> + num-ib-windows = <6>;
> + num-ob-windows = <6>;

EP-specific properties shouldn't be added to the RC dt node. Ideally you should
have separate dt nodes for RC and EP.

Thanks
Kishon