Re: [PATCH v2 1/3] powerpc/powernv: Always stop secondaries before reboot/shutdown
On Fri, 10 Nov 2017 22:08:32 +1100 Michael Ellerman wrote: > Nicholas Piggin writes: > > > Currently powernv reboot and shutdown requests just leave secondaries > > to do their own things. This is undesirable because they can trigger > > any number of watchdogs while waiting for reboot, but also we don't > > know what else they might be doing, or they might be stuck somewhere > > causing trouble. > > > > The opal scheduled flash update code already ran into watchdog problems > > due to flashing taking a long time, but it's possible for regular > > reboots to trigger problems too (this is with watchdog_thresh set to 1, > > but I have seen it with watchdog_thresh at the default value once too): > > > > reboot: Restarting system > > [ 360.038896709,5] OPAL: Reboot request... > > Watchdog CPU:0 Hard LOCKUP > > Watchdog CPU:44 detected Hard LOCKUP other CPUS:16 > > Watchdog CPU:16 Hard LOCKUP > > watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0] > > > > So remove the special case for flash update, and unconditionally do > > smp_send_stop before rebooting. > > > > Return the CPUs to Linux stop loops rather than OPAL. The reason for > > this is that the path to firmware is longer, and the CPUs may have > > been interrupted from firmware, which may cause problems to re-enter > > it. It's better to put them into a simple spin loop to maximize the > > chance of a successful reboot. > > I always assumed we had to send the CPUs back to OPAL for the flashing > procedure. Is it OK to leave them in Linux? According to the comment and changelog of commit 2196c6f1ed66eef23df3b478cfe71661ae83726e, it was added just to keep the secondaries from going silly. Vasant, can you remember the details? Thanks, Nick
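The patch proposes a specific ordering. First, ask every secondary to stop. Then wait until all of them are parked in a trivial spin loop. Only after that, ask firmware to reboot.

That ordering can be sketched in userspace C. This is a hypothetical model in which POSIX threads stand in for CPUs. `secondary_cpu()` and `reboot_with_secondaries_stopped()` are illustrative names; they are not the powernv code or the real `smp_send_stop()`.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_SECONDARIES 64

static atomic_bool stop_requested;	/* raised by the CPU doing the reboot */
static atomic_int cpus_parked;		/* secondaries that acknowledged */
static atomic_bool model_release;	/* model only: real parked CPUs never leave */

/* A secondary CPU: runs "normal work" until asked to stop, then parks in a
 * trivial spin loop that calls no firmware and takes no locks. */
static void *secondary_cpu(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop_requested))
		sched_yield();		/* stand-in for whatever it was doing */
	atomic_fetch_add(&cpus_parked, 1);
	while (!atomic_load(&model_release))
		sched_yield();
	return NULL;
}

/* Returns the number of secondaries parked before the (simulated) firmware
 * reboot call, or -1 for an out-of-range count. */
int reboot_with_secondaries_stopped(int nr_secondaries)
{
	pthread_t tid[MAX_SECONDARIES];
	int created = 0, parked;

	if (nr_secondaries < 0 || nr_secondaries > MAX_SECONDARIES)
		return -1;

	atomic_store(&stop_requested, false);
	atomic_store(&cpus_parked, 0);
	atomic_store(&model_release, false);

	for (int i = 0; i < nr_secondaries; i++)
		if (pthread_create(&tid[created], NULL, secondary_cpu, NULL) == 0)
			created++;

	/* The smp_send_stop() step: request a stop, then wait for every ack. */
	atomic_store(&stop_requested, true);
	while (atomic_load(&cpus_parked) < created)
		sched_yield();
	parked = atomic_load(&cpus_parked);

	/* ...the firmware reboot request would be issued here... */

	atomic_store(&model_release, true);	/* let the model threads exit */
	for (int i = 0; i < created; i++)
		pthread_join(tid[i], NULL);
	return parked;
}
```

The parking loop touches nothing but one flag, and that is deliberate. Returning to OPAL could re-enter firmware that the CPU was interrupted out of; the spin loop cannot.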
Re: [PATCHv4 5/6] symbol lookup: introduce dereference_symbol_descriptor()
On (11/10/17 10:09), Luck, Tony wrote: > On Fri, Nov 10, 2017 at 08:48:29AM +0900, Sergey Senozhatsky wrote: > > -Examples:: > > - > > - printk("Going to call: %pF\n", gettimeofday); > > - printk("Going to call: %pF\n", p->func); > > - printk("%s: called from %pS\n", __func__, (void *)_RET_IP_); > > - printk("%s: called from %pS\n", __func__, > > - (void *)__builtin_return_address(0)); > > - printk("Faulted at %pS\n", (void *)regs->ip); > > - printk(" %s%pB\n", (reliable ? "" : "? "), (void *)*stack); > > Did you mean to delete the Examples completely? Wouldn't it > be better to just update (s/%pF/%pS/g)? good question. yes, I think I did it deliberately :) we still kinda have some sort of "examples", right at the beginning of section "Symbols/Function Pointers" > Symbols/Function Pointers > = > > :: > > %pS versatile_init+0x0/0x110 > %ps versatile_init > %pF versatile_init+0x0/0x110 > %pf versatile_init > %pSRversatile_init+0x9/0x110 > (with __builtin_extract_return_addr() translation) > %pB prev_fn_of_versatile_init+0x88/0x88 > > The ``S`` and ``s`` specifiers are used for printing a pointer in symbolic > format. They result in the symbol name with (``S``) or without (``s``) > offsets. If KALLSYMS are disabled then the symbol address is printed instead. > > Note, that the ``F`` and ``f`` specifiers are identical to ``S`` (``s``) > and thus deprecated. We have ``F`` and ``f`` because on ia64, ppc64 and > parisc64 function pointers are indirect and, in fact, are function > descriptors, which require additional dereferencing before we can lookup > the symbol. As of now, ``S`` and ``s`` perform dereferencing on those > platforms (when needed), so ``F`` and ``f`` exist for compatibility > reasons only. > > The ``B`` specifier results in the symbol name with offsets and should be > used when printing stack backtraces. 
The specifier takes into > consideration the effect of compiler optimisations which may occur > when tail-call``s are used and marked with the noreturn GCC attribute. I can bring the Examples back; I don't really have a strong opinion on this. Let me know. -ss
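A small model shows why `%pF` needed an extra dereference on ia64, ppc64 and parisc64. On those platforms a C function pointer points at a descriptor. The descriptor holds the code address, which is what the symbol table knows, plus other words such as a TOC/GP value.

The struct layout and helper below are illustrative assumptions. They are not the kernel's `dereference_symbol_descriptor()`, and real descriptor layouts differ per architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative descriptor: on ia64 and ppc64 ELFv1 the entry address is
 * the first word; other fields (TOC/GP, environment) vary by arch, and
 * parisc64 uses a different layout altogether. */
struct func_desc {
	uintptr_t entry;	/* address of the function's first instruction */
	uintptr_t toc;		/* TOC/GP value the callee expects */
};

/* Hypothetical helper: map a C function pointer to the address that the
 * symbol table knows about. */
static uintptr_t model_deref_descriptor(uintptr_t fnptr, int arch_uses_descriptors)
{
	if (!arch_uses_descriptors)
		return fnptr;	/* e.g. x86-64: the pointer is already the code address */
	return ((const struct func_desc *)fnptr)->entry;
}
```

`%pS` now performs this step itself where needed, so `%pF` adds nothing. That is why the series deprecates it.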
Re: [PATCHv4 0/6] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers
On (11/10/17 10:11), Luck, Tony wrote: > On Fri, Nov 10, 2017 at 08:48:24AM +0900, Sergey Senozhatsky wrote: > > All Ack-s/Tested-by-s were dropped, since the patch set has been > > reworked. I'm kindly asking arch-s maintainers and developers to test it > > once again. Sorry for any inconveniences and thanks for your help in > > advance. > > You can add back the: > > Tested-by: Tony Luck #ia64 Thanks a ton, Tony! -ss
Re: [PATCH 1/9] include: Move compat_timespec/ timeval to compat_time.h
On Fri, 10 Nov 2017 14:42:51 -0800 Deepa Dinamani wrote: > diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h > index 09ad88572746..db25aa15b705 100644 > --- a/arch/x86/include/asm/ftrace.h > +++ b/arch/x86/include/asm/ftrace.h > @@ -49,7 +49,7 @@ int ftrace_int3_handler(struct pt_regs *regs); > #if !defined(__ASSEMBLY__) && !defined(COMPILE_OFFSETS) > > #if defined(CONFIG_FTRACE_SYSCALLS) && defined(CONFIG_IA32_EMULATION) > -#include > +#include > Acked-by: Steven Rostedt (VMware) -- Steve
[PATCH 1/9] include: Move compat_timespec/ timeval to compat_time.h
All the current architecture specific defines for these are the same. Refactor these common defines to a common header file. The new common linux/compat_time.h is also useful as it will eventually be used to hold all the defines that are needed for compat time types that support non y2038 safe types. New architectures need not define these new types as they will only use new y2038 safe syscalls. This file can be deleted after y2038 when we stop supporting non y2038 safe syscalls. The patch also requires an operation similar to: git grep "asm/compat\.h" | cut -d ":" -f 1 | xargs -n 1 sed -i -e "s%asm/compat.h%linux/compat.h%g" Cc: a...@kernel.org Cc: b...@kernel.crashing.org Cc: borntrae...@de.ibm.com Cc: catalin.mari...@arm.com Cc: cmetc...@mellanox.com Cc: coh...@redhat.com Cc: da...@davemloft.net Cc: del...@gmx.de Cc: de...@driverdev.osuosl.org Cc: gerald.schae...@de.ibm.com Cc: gre...@linuxfoundation.org Cc: heiko.carst...@de.ibm.com Cc: hoepp...@linux.vnet.ibm.com Cc: h...@zytor.com Cc: j...@parisc-linux.org Cc: j...@linux.vnet.ibm.com Cc: linux-ker...@vger.kernel.org Cc: linux-m...@linux-mips.org Cc: linux-par...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s...@vger.kernel.org Cc: mark.rutl...@arm.com Cc: mi...@redhat.com Cc: m...@ellerman.id.au Cc: ober...@linux.vnet.ibm.com Cc: oprofile-l...@lists.sf.net Cc: pau...@samba.org Cc: pet...@infradead.org Cc: r...@linux-mips.org Cc: rost...@goodmis.org Cc: r...@kernel.org Cc: schwidef...@de.ibm.com Cc: seb...@linux.vnet.ibm.com Cc: sparcli...@vger.kernel.org Cc: s...@linux.vnet.ibm.com Cc: ubr...@linux.vnet.ibm.com Cc: will.dea...@arm.com Cc: x...@kernel.org Signed-off-by: Deepa Dinamani --- arch/arm64/include/asm/compat.h | 11 --- arch/arm64/include/asm/stat.h | 1 + arch/arm64/kernel/hw_breakpoint.c | 1 - arch/arm64/kernel/perf_regs.c | 2 +- arch/arm64/kernel/process.c | 1 - arch/mips/include/asm/compat.h| 11 --- arch/mips/kernel/signal32.c | 2 +- arch/parisc/include/asm/compat.h | 11 --- 
arch/powerpc/include/asm/compat.h | 11 --- arch/powerpc/kernel/asm-offsets.c | 2 +- arch/powerpc/oprofile/backtrace.c | 2 +- arch/s390/hypfs/hypfs_sprp.c | 1 - arch/s390/include/asm/compat.h| 11 --- arch/s390/include/asm/elf.h | 3 +-- arch/s390/kvm/priv.c | 1 - arch/s390/pci/pci_clp.c | 1 - arch/sparc/include/asm/compat.h | 11 --- arch/tile/include/asm/compat.h| 11 --- arch/x86/events/core.c| 2 +- arch/x86/include/asm/compat.h | 11 --- arch/x86/include/asm/ftrace.h | 2 +- arch/x86/include/asm/sys_ia32.h | 2 +- arch/x86/kernel/sys_x86_64.c | 2 +- drivers/s390/block/dasd_ioctl.c | 1 - drivers/s390/char/fs3270.c| 1 - drivers/s390/char/sclp_ctl.c | 1 - drivers/s390/char/vmcp.c | 1 - drivers/s390/cio/chsc_sch.c | 1 - drivers/s390/net/qeth_core_main.c | 2 +- drivers/staging/pi433/pi433_if.c | 2 +- include/linux/compat.h| 1 + include/linux/compat_time.h | 19 +++ 32 files changed, 32 insertions(+), 110 deletions(-) create mode 100644 include/linux/compat_time.h diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h index e39d487bf724..d4f9c9ee3b15 100644 --- a/arch/arm64/include/asm/compat.h +++ b/arch/arm64/include/asm/compat.h @@ -34,7 +34,6 @@ typedef u32compat_size_t; typedef s32compat_ssize_t; -typedef s32compat_time_t; typedef s32compat_clock_t; typedef s32compat_pid_t; typedef u16__compat_uid_t; @@ -66,16 +65,6 @@ typedef u32 compat_ulong_t; typedef u64compat_u64; typedef u32compat_uptr_t; -struct compat_timespec { - compat_time_t tv_sec; - s32 tv_nsec; -}; - -struct compat_timeval { - compat_time_t tv_sec; - s32 tv_usec; -}; - struct compat_stat { #ifdef __AARCH64EB__ short st_dev; diff --git a/arch/arm64/include/asm/stat.h b/arch/arm64/include/asm/stat.h index 15e35598ac40..eab738019707 100644 --- a/arch/arm64/include/asm/stat.h +++ b/arch/arm64/include/asm/stat.h @@ -20,6 +20,7 @@ #ifdef CONFIG_COMPAT +#include #include /* diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c index 749f81779420..bfa2b78cf0e3 
100644 --- a/arch/arm64/kernel/hw_breakpoint.c +++ b/arch/arm64/kernel/hw_breakpoint.c @@ -29,7 +29,6 @@ #include #include -#include #include #include #include diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c index 1d091d048d04..929fc369d0be 100644 --- a/arch/arm64/kernel/perf_regs.c +++ b/arch/arm64/kernel/perf_regs.c @@ -5,7 +5,7 @@ #include #include -#include +#include #include #include diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/proc
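The diff above removes the same three definitions from each architecture's `asm/compat.h`; arm64's copy is shown in full. A shared header would carry a single copy of them.

The sketch below is an assumption about what the new `include/linux/compat_time.h` contains, since its contents are not shown here. The `s32` typedef is a userspace stand-in for the kernel type.

```c
#include <assert.h>
#include <stdint.h>

#ifndef _COMPAT_TIME_MODEL_H
#define _COMPAT_TIME_MODEL_H

typedef int32_t s32;		/* userspace stand-in for the kernel's s32 */
typedef s32 compat_time_t;	/* 32-bit seconds, as in the removed copies */

struct compat_timespec {
	compat_time_t	tv_sec;
	s32		tv_nsec;
};

struct compat_timeval {
	compat_time_t	tv_sec;
	s32		tv_usec;
};

#endif /* _COMPAT_TIME_MODEL_H */
```

The 32-bit `compat_time_t` is exactly what makes these types unsafe past 2038. That is why the commit message expects the file to be deleted once those syscalls are no longer supported.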
[PATCH 0/9] posix_clocks: Prepare syscalls for 64 bit time_t conversion
The series prepares individual architectures to use 64 bit time_t syscalls in compat and 32 bit emulation modes. This is a follow up to the series Arnd Bergmann posted: https://sourceware.org/ml/libc-alpha/2015-05/msg00070.html The big picture is as described in the LWN article: https://lwn.net/Articles/643234/ The series is directed at converting the posix clock syscalls clock_gettime, clock_settime, clock_getres and clock_nanosleep to use a new data structure, __kernel_timespec, at syscall boundaries. __kernel_timespec maintains a 64 bit time_t across all execution modes. The vdso will be handled by each architecture when it enables support for 64 bit time_t. The compat syscalls are repurposed to provide backward compatibility by also using them as the native syscalls on 32 bit architectures. They will continue to use timespec at syscall boundaries. CONFIG_64BIT_TIME controls whether the syscalls use __kernel_timespec or timespec at syscall boundaries. The series does the following: 1. Enable compat syscalls unconditionally. 2. Add a new __kernel_timespec type to be used as the data structure for all the new syscalls. 3. Add a new config, CONFIG_64BIT_TIME (instead of the CONFIG_COMPAT_TIME in [1] and [2]), to switch to the new definition of __kernel_timespec. It is the same as struct timespec otherwise.
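The cover letter relies on one property: `tv_sec` stays 64 bits wide whatever the execution mode. A small model can check this.

Some assumptions apply. The field types follow the `__kernel_timespec` that, as far as I know, later landed in mainline's uapi headers. The structs are renamed to avoid clashing with real headers. `widen_timespec32()` is a hypothetical helper that shows widening a legacy 32-bit value is lossless, while times after 2038-01-19 03:14:07 UTC fit only in the 64-bit layout.

```c
#include <assert.h>
#include <stdint.h>

typedef long long time64_model;		/* 64-bit on every ABI */

/* Modelled on __kernel_timespec; field types are an assumption. */
struct kernel_timespec_model {
	time64_model	tv_sec;
	long long	tv_nsec;
};

/* Legacy layout with 32-bit seconds, as seen by old 32-bit binaries. */
struct old_timespec32_model {
	int32_t	tv_sec;
	int32_t	tv_nsec;
};

/* Widening is lossless: sign extension keeps pre-1970 times intact. */
static struct kernel_timespec_model
widen_timespec32(struct old_timespec32_model ts)
{
	struct kernel_timespec_model out = {
		.tv_sec = ts.tv_sec,
		.tv_nsec = ts.tv_nsec,
	};
	return out;
}

/* INT32_MAX seconds is 2038-01-19 03:14:07 UTC, the last 32-bit second. */
static int fits_in_32bit_time(time64_model secs)
{
	return secs >= INT32_MIN && secs <= INT32_MAX;
}
```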
Arnd Bergmann (1): y2038: introduce CONFIG_64BIT_TIME Deepa Dinamani (8): include: Move compat_timespec/ timeval to compat_time.h compat: Make compat helpers independent of CONFIG_COMPAT compat: enable compat_get/put_timespec64 always posix-clocks: Enable compat syscalls always include: Add new y2038 safe __kernel_timespec fix get_timespec64() for y2038 safe compat interfaces change time types to new y2038 safe __kernel_* types nanosleep: change time types to safe __kernel_* types arch/Kconfig | 11 arch/arm64/include/asm/compat.h| 11 arch/arm64/include/asm/stat.h | 1 + arch/arm64/kernel/hw_breakpoint.c | 1 - arch/arm64/kernel/perf_regs.c | 2 +- arch/arm64/kernel/process.c| 1 - arch/mips/include/asm/compat.h | 11 arch/mips/kernel/signal32.c| 2 +- arch/parisc/include/asm/compat.h | 11 arch/powerpc/include/asm/compat.h | 11 arch/powerpc/kernel/asm-offsets.c | 2 +- arch/powerpc/oprofile/backtrace.c | 2 +- arch/s390/hypfs/hypfs_sprp.c | 1 - arch/s390/include/asm/compat.h | 11 arch/s390/include/asm/elf.h| 3 +- arch/s390/kvm/priv.c | 1 - arch/s390/pci/pci_clp.c| 1 - arch/sparc/include/asm/compat.h| 11 arch/tile/include/asm/compat.h | 11 arch/x86/events/core.c | 2 +- arch/x86/include/asm/compat.h | 11 arch/x86/include/asm/ftrace.h | 2 +- arch/x86/include/asm/sys_ia32.h| 2 +- arch/x86/kernel/sys_x86_64.c | 2 +- drivers/s390/block/dasd_ioctl.c| 1 - drivers/s390/char/fs3270.c | 1 - drivers/s390/char/sclp_ctl.c | 1 - drivers/s390/char/vmcp.c | 1 - drivers/s390/cio/chsc_sch.c| 1 - drivers/s390/net/qeth_core_main.c | 2 +- drivers/staging/pi433/pi433_if.c | 2 +- include/linux/compat.h | 7 ++- include/linux/compat_time.h| 23 + include/linux/restart_block.h | 7 +-- include/linux/syscalls.h | 12 ++--- include/linux/time.h | 4 +- include/linux/time64.h | 10 +++- include/uapi/asm-generic/posix_types.h | 1 + include/uapi/linux/time.h | 7 +++ kernel/Makefile| 2 +- kernel/compat.c| 92 ++ kernel/time/hrtimer.c | 7 +-- kernel/time/posix-stubs.c | 12 ++--- kernel/time/posix-timers.c | 20 
kernel/time/time.c | 10 +++- 45 files changed, 152 insertions(+), 195 deletions(-) create mode 100644 include/linux/compat_time.h base-commit: d9e0e63d9a6f88440eb201e1491fcf730272c706 -- 2.11.0 Cc: a...@kernel.org Cc: b...@kernel.crashing.org Cc: borntrae...@de.ibm.com Cc: catalin.mari...@arm.com Cc: cmetc...@mellanox.com Cc: coh...@redhat.com Cc: da...@davemloft.net Cc: del...@gmx.de Cc: de...@driverdev.osuosl.org Cc: gerald.schae...@de.ibm.com Cc: gre...@linuxfoundation.org Cc: heiko.carst...@de.ibm.com Cc: hoepp...@linux.vnet.ibm.com Cc: h...@zytor.com Cc: j...@parisc-linux.org Cc: j...@linux.vnet.ibm.com Cc: linux-...@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: linux-m...@linux-mips.org Cc: linux-par...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s...@vger.kernel.org Cc: mark.rutl...@arm.com Cc: mi...@redhat.com Cc: m...@ellerman.id.au Cc: ober...@linux.vnet.ibm.com Cc: oprofile-
[RFC PATCH v7 for 4.15 02/10] membarrier: powerpc: Skip memory barrier in switch_mm()
Allow PowerPC to skip the full memory barrier in switch_mm(), and only issue the barrier when scheduling into a task belonging to a process that has registered to use private expedited membarrier. Threads that share the same VM but belong to different thread groups are a tricky case, with a few consequences: It turns out that we cannot rely on get_nr_threads(p) to count the number of threads using a VM. We can use (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) instead to skip the synchronize_sched() for cases where the VM only has a single user, and that user only has a single thread. It also turns out that we cannot use for_each_thread() to set thread flags in all threads using a VM, as it only iterates over the thread group. Therefore, test the membarrier state variable directly rather than relying on thread flags. This means membarrier_register_private_expedited() needs to set the MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY, which allows private expedited membarrier commands to succeed. membarrier_arch_switch_mm() now tests for the MEMBARRIER_STATE_PRIVATE_EXPEDITED flag. Changes since v1: - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in powerpc membarrier_arch_sched_in(), given that we want to specifically check the next thread state. - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig. - Use task_thread_info() to pass thread_info from task to *_ti_thread_flag(). Changes since v2: - Move membarrier_arch_sched_in() call to finish_task_switch(). - Check for NULL t->mm in membarrier_arch_fork(). - Use membarrier_sched_in() in generic code, which invokes the arch-specific membarrier_arch_sched_in(). This fixes allnoconfig build on PowerPC. - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing allnoconfig build on PowerPC. - Build and runtime tested on PowerPC.
Changes since v3: - Simply rely on copy_mm() to copy the membarrier_private_expedited mm field on fork. - powerpc: test thread flag instead of reading membarrier_private_expedited in membarrier_arch_fork(). - powerpc: skip memory barrier in membarrier_arch_sched_in() if coming from kernel thread, since mmdrop() implies a full barrier. - Set membarrier_private_expedited to 1 only after arch registration code, thus eliminating a race where concurrent commands could succeed when they should fail if issued concurrently with process registration. - Use READ_ONCE() for membarrier_private_expedited field access in membarrier_private_expedited. Matches WRITE_ONCE() performed in process registration. Changes since v4: - Move powerpc hook from sched_in() to switch_mm(), based on feedback from Nicholas Piggin. Changes since v5: - Rebase on v4.14-rc6. - Fold "Fix: membarrier: Handle CLONE_VM + !CLONE_THREAD correctly on powerpc (v2)" Changes since v6: - Rename MEMBARRIER_STATE_SWITCH_MM to MEMBARRIER_STATE_PRIVATE_EXPEDITED. Signed-off-by: Mathieu Desnoyers CC: Peter Zijlstra CC: Paul E. 
McKenney CC: Boqun Feng CC: Andrew Hunter CC: Maged Michael CC: Avi Kivity CC: Benjamin Herrenschmidt CC: Paul Mackerras CC: Michael Ellerman CC: Dave Watson CC: Alan Stern CC: Will Deacon CC: Andy Lutomirski CC: Ingo Molnar CC: Alexander Viro CC: Nicholas Piggin CC: linuxppc-dev@lists.ozlabs.org CC: linux-a...@vger.kernel.org --- MAINTAINERS | 1 + arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/membarrier.h | 25 + arch/powerpc/mm/mmu_context.c | 7 +++ include/linux/sched/mm.h | 12 +++- init/Kconfig | 3 +++ kernel/sched/core.c | 10 -- kernel/sched/membarrier.c | 9 + 8 files changed, 57 insertions(+), 11 deletions(-) create mode 100644 arch/powerpc/include/asm/membarrier.h diff --git a/MAINTAINERS b/MAINTAINERS index 1022b5f51cd1..1c02a2be1698 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8837,6 +8837,7 @@ L:linux-ker...@vger.kernel.org S: Supported F: kernel/sched/membarrier.c F: include/uapi/linux/membarrier.h +F: arch/powerpc/include/asm/membarrier.h MEMORY MANAGEMENT L: linux...@kvack.org diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 41d1dae3b1b5..e54a822e5fb9 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -139,6 +139,7 @@ config PPC select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_GCOV_PROFILE_ALL + select ARCH_HAS_MEMBARRIER_HOOKS select ARCH_HAS_SCALED_CPUTIME if VIRT_CPU_ACCOUNTING_NATIVE select ARCH_HAS_SG_CHAIN select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h new file mode 100644 index ..046f96768ab5 --- /dev/null +++ b/arch
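The changelog gives a registration ordering: set `MEMBARRIER_STATE_PRIVATE_EXPEDITED`, wait for `synchronize_sched()`, then set `MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY`. That ordering, together with the single-user shortcut, can be modelled in userspace C11.

Everything here is a hypothetical sketch, not the patch's code. The following are all assumptions:
- the bit values;
- `struct mm_model`;
- the counting `synchronize_sched_model()` stand-in;
- the `-EPERM` return for unregistered callers.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Flag names follow the changelog; the bit values are made up. */
enum {
	MEMBARRIER_STATE_PRIVATE_EXPEDITED       = 1 << 0,
	MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY = 1 << 1,
};

struct mm_model {
	atomic_int membarrier_state;
	int mm_users;		/* stand-in for atomic_read(&mm->mm_users) */
};

static int sync_sched_calls;	/* how often the grace-period stand-in ran */

static void synchronize_sched_model(void)
{
	sync_sched_calls++;	/* the real call waits for a grace period */
}

static void mm_model_init(struct mm_model *mm, int users)
{
	atomic_init(&mm->membarrier_state, 0);
	mm->mm_users = users;
}

static void register_private_expedited(struct mm_model *mm, int nr_threads)
{
	atomic_fetch_or(&mm->membarrier_state, MEMBARRIER_STATE_PRIVATE_EXPEDITED);
	/* With one user and one thread, no other CPU can be running this mm,
	 * so the grace period can be skipped. */
	if (!(mm->mm_users == 1 && nr_threads == 1))
		synchronize_sched_model();
	atomic_fetch_or(&mm->membarrier_state,
			MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY);
}

/* switch_mm() hook: issue the barrier only for registered processes. */
static bool switch_mm_needs_barrier(struct mm_model *next)
{
	return atomic_load(&next->membarrier_state) &
	       MEMBARRIER_STATE_PRIVATE_EXPEDITED;
}

static int private_expedited_command(struct mm_model *mm)
{
	if (!(atomic_load(&mm->membarrier_state) &
	      MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
		return -EPERM;
	return 0;		/* the real command would IPI CPUs running this mm */
}
```

READY is published only after the grace period. As a result, no private expedited command can succeed while some CPU may still be inside a `switch_mm()` that has not yet seen the flag.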
RE: [PATCHv4 1/3] ARMv8: dts: ls1046a: add the property of IB and OB
> -Original Message- > From: Kishon Vijay Abraham I [mailto:kis...@ti.com] > Sent: Friday, November 10, 2017 12:22 AM > To: Xiaowei Bao ; robh...@kernel.org; > mark.rutl...@arm.com; catalin.mari...@arm.com; will.dea...@arm.com; > bhelg...@google.com; shawn...@kernel.org; Madalin-cristian Bucur > ; Sumit Garg ; Y.b. Lu > ; hongtao@nxp.com; Andy Tang > ; Leo Li ; jingooh...@gmail.com; > pbrobin...@gmail.com; songxiao...@hisilicon.com; > devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux- > ker...@vger.kernel.org; linux-...@vger.kernel.org; linuxppc- > d...@lists.ozlabs.org; Z.q. Hou ; Mingkai Hu > ; M.h. Lian > Subject: Re: [PATCHv4 1/3] ARMv8: dts: ls1046a: add the property of IB and OB > > Hi Bao, > > On Friday 10 November 2017 09:18 AM, Bao Xiaowei wrote: > > Add the property of inbound and outbound windows number for ep driver. > > > > Signed-off-by: Bao Xiaowei > > Acked-by: Minghuan Lian > > --- > > v2: > > - no change > > v3: > > - modify the commit message > > v4: > > - no change > > > > arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 6 ++ > > 1 file changed, 6 insertions(+) > > $subject should start with something like > arm64: dts: ls1046a: ** > > > > diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi > > b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi > > index 06b5e12d04d8..f8332669663c 100644 > > --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi > > +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi > > @@ -674,6 +674,8 @@ > > device_type = "pci"; > > dma-coherent; > > num-lanes = <4>; > > + num-ib-windows = <6>; > > + num-ob-windows = <6>; > > EP specific properties shouldn't be added in RC dt node. Ideally you should > have > a separate dt node for RC and EP. It is a single PCIe controller which can be configured to either RC mode or EP mode. Wouldn't it conflict with the device tree principles to have two device tree nodes for the same PCIe controller? 
And obviously the two modes cannot be used at the same time so we cannot have two drivers both probe on the same hardware. Regards, Leo
Re: [PATCH v2] watchdog: mpc8xxx: use the core worker function
On Wed, Nov 08, 2017 at 03:39:44PM +0100, Christophe Leroy wrote: > The watchdog core includes a worker function which pings the > watchdog until the user app starts pinging it and which also > pings it if the HW requires more frequent pings. > Use that function instead of the dedicated timer. > In the meantime, we can allow the user to change the timeout. > > Then change the timeout module parameter to use seconds and > use the watchdog_init_timeout() core function. > > On some HW (e.g. the 8xx), SWCRR contains bits unrelated to the > watchdog which have to be preserved upon write. > > This driver has nothing preventing the use of the magic close, so > enable it. > > Signed-off-by: Christophe Leroy A couple of comments, but unrelated to this patch. Reviewed-by: Guenter Roeck > --- > v2: set ddata->wdd.max_hw_heartbeat_ms and ddata->wdd.min_timeout > in probe instead of start > > drivers/watchdog/mpc8xxx_wdt.c | 84 > +++--- > 1 file changed, 38 insertions(+), 46 deletions(-) > > diff --git a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c > index 366e5c7e650b..aca2d6323f8a 100644 > --- a/drivers/watchdog/mpc8xxx_wdt.c > +++ b/drivers/watchdog/mpc8xxx_wdt.c > @@ -22,7 +22,6 @@ > #include > #include > #include > -#include > #include > #include > #include > @@ -31,10 +30,13 @@ > #include Not needed. > #include > > +#define WATCHDOG_TIMEOUT 10 > + > struct mpc8xxx_wdt { > __be32 res0; > __be32 swcrr; /* System watchdog control register */ > #define SWCRR_SWTC 0x /* Software Watchdog Time Count. */ > +#define SWCRR_SWF 0x0008 /* Software Watchdog Freeze (mpc8xx). */ > #define SWCRR_SWEN 0x0004 /* Watchdog Enable bit. */ > #define SWCRR_SWRI 0x0002 /* Software Watchdog Reset/Interrupt Select > bit.*/ > #define SWCRR_SWPR 0x0001 /* Software Watchdog Counter Prescale bit.
*/ > @@ -52,14 +54,15 @@ struct mpc8xxx_wdt_type { > struct mpc8xxx_wdt_ddata { > struct mpc8xxx_wdt __iomem *base; > struct watchdog_device wdd; > - struct timer_list timer; > spinlock_t lock; Not needed (the watchdog core handles locking). > + u16 swtc; > }; > > -static u16 timeout = 0x; > +static u16 timeout; > module_param(timeout, ushort, 0); > MODULE_PARM_DESC(timeout, > - "Watchdog timeout in ticks. (0 + "Watchdog timeout in seconds. (1 > static bool reset = 1; > module_param(reset, bool, 0); > @@ -80,31 +83,27 @@ static void mpc8xxx_wdt_keepalive(struct > mpc8xxx_wdt_ddata *ddata) > spin_unlock(&ddata->lock); > } > > -static void mpc8xxx_wdt_timer_ping(unsigned long arg) > -{ > - struct mpc8xxx_wdt_ddata *ddata = (void *)arg; > - > - mpc8xxx_wdt_keepalive(ddata); > - /* We're pinging it twice faster than needed, just to be sure. */ > - mod_timer(&ddata->timer, jiffies + HZ * ddata->wdd.timeout / 2); > -} > - > static int mpc8xxx_wdt_start(struct watchdog_device *w) > { > struct mpc8xxx_wdt_ddata *ddata = > container_of(w, struct mpc8xxx_wdt_ddata, wdd); > - > - u32 tmp = SWCRR_SWEN | SWCRR_SWPR; > + u32 tmp = in_be32(&ddata->base->swcrr); > > /* Good, fire up the show */ > + tmp &= ~(SWCRR_SWTC | SWCRR_SWF | SWCRR_SWEN | SWCRR_SWRI | SWCRR_SWPR); > + tmp |= SWCRR_SWEN | SWCRR_SWPR | (ddata->swtc << 16); > + > if (reset) > tmp |= SWCRR_SWRI; > > - tmp |= timeout << 16; > - > out_be32(&ddata->base->swcrr, tmp); > > - del_timer_sync(&ddata->timer); > + tmp = in_be32(&ddata->base->swcrr); > + if (!(tmp & SWCRR_SWEN)) > + return -EOPNOTSUPP; > + > + ddata->swtc = tmp >> 16; > + set_bit(WDOG_HW_RUNNING, &ddata->wdd.status); > > return 0; > } > @@ -118,17 +117,8 @@ static int mpc8xxx_wdt_ping(struct watchdog_device *w) > return 0; > } > > -static int mpc8xxx_wdt_stop(struct watchdog_device *w) > -{ > - struct mpc8xxx_wdt_ddata *ddata = > - container_of(w, struct mpc8xxx_wdt_ddata, wdd); > - > - mod_timer(&ddata->timer, jiffies); > - return 0; > -} > - > static 
struct watchdog_info mpc8xxx_wdt_info = { > - .options = WDIOF_KEEPALIVEPING, > + .options = WDIOF_KEEPALIVEPING | WDIOF_MAGICCLOSE | WDIOF_SETTIMEOUT, > .firmware_version = 1, > .identity = "MPC8xxx", > }; > @@ -137,7 +127,6 @@ static struct watchdog_ops mpc8xxx_wdt_ops = { > .owner = THIS_MODULE, > .start = mpc8xxx_wdt_start, > .ping = mpc8xxx_wdt_ping, > - .stop = mpc8xxx_wdt_stop, > }; > > static int mpc8xxx_wdt_probe(struct platform_device *ofdev) > @@ -148,7 +137,6 @@ static int mpc8xxx_wdt_probe(struct platform_device > *ofdev) > struct mpc8xxx_wdt_ddata *ddata; > u32 freq = fsl_get_sys_freq(); > bool enabled; > - unsigned int timeout_sec; > > wdt_type = of_device_get_match_data(&ofdev->dev); > if (!wdt_type) > @@ -173,27 +161,17 @@ static int mpc
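The start path in this patch reads SWCRR, clears only the watchdog fields and writes the value back, so the unrelated bits on the 8xx survive. That arithmetic can be checked in isolation.

This is a sketch with two assumptions:
- The full-width mask values are assumed, because the constants are truncated above. The SWTC mask is inferred from the `swtc << 16` and `tmp >> 16` shifts.
- `secs_to_swtc()` is a hypothetical seconds-to-count conversion. It assumes the counter ticks once every `prescaler` input clocks when SWPR is set.

```c
#include <assert.h>
#include <stdint.h>

/* Full-width values are assumptions; SWTC is taken to be the top 16 bits
 * because the driver shifts the count by 16. */
#define SWCRR_SWTC	0xFFFF0000u	/* time count */
#define SWCRR_SWF	0x00000008u	/* freeze (mpc8xx) */
#define SWCRR_SWEN	0x00000004u	/* enable */
#define SWCRR_SWRI	0x00000002u	/* reset/interrupt select */
#define SWCRR_SWPR	0x00000001u	/* counter prescale */

/* New SWCRR value for start(): clear only the watchdog fields of the
 * current value, then set enable, prescale, count and (optionally) reset. */
static uint32_t swcrr_start_value(uint32_t cur, uint16_t swtc, int reset)
{
	uint32_t tmp = cur;

	tmp &= ~(SWCRR_SWTC | SWCRR_SWF | SWCRR_SWEN | SWCRR_SWRI | SWCRR_SWPR);
	tmp |= SWCRR_SWEN | SWCRR_SWPR | ((uint32_t)swtc << 16);
	if (reset)
		tmp |= SWCRR_SWRI;
	return tmp;
}

/* Hypothetical seconds -> count conversion, assuming one count every
 * `prescaler` input clocks; clamped to the 16-bit field. */
static uint16_t secs_to_swtc(unsigned int secs, uint32_t freq, uint32_t prescaler)
{
	uint64_t ticks;

	assert(prescaler != 0);
	ticks = (uint64_t)secs * freq / prescaler;
	if (ticks == 0)
		ticks = 1;
	return ticks > 0xFFFF ? 0xFFFF : (uint16_t)ticks;
}
```

The patch then reads SWCRR back. That tells the driver whether the enable bit actually stuck and which count the hardware latched.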
[PATCH 2/2] powerpc/perf: Fix IMC_MAX_PMU macro
IMC_MAX_PMU is used for static storage (per_nest_pmu_arr) which holds nest pmu information. Current value for the macro is 32 based on the initial number of nest pmu units supported by the nest microcode. But going forward, microcode could support more nest units. Instead of static storage, patch to fix the code to dynamically allocate an array based on the number of nest imc units found in the device tree. Fixes:8f95faaac56c1 ('powerpc/powernv: Detect and create IMC device') Signed-off-by: Madhavan Srinivasan --- arch/powerpc/include/asm/imc-pmu.h| 6 +- arch/powerpc/perf/imc-pmu.c | 15 --- arch/powerpc/platforms/powernv/opal-imc.c | 16 3 files changed, 29 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h index 7f74c282710f..fad0e6ff460f 100644 --- a/arch/powerpc/include/asm/imc-pmu.h +++ b/arch/powerpc/include/asm/imc-pmu.h @@ -21,11 +21,6 @@ #include /* - * For static allocation of some of the structures. - */ -#define IMC_MAX_PMUS 32 - -/* * Compatibility macros for IMC devices */ #define IMC_DTB_COMPAT "ibm,opal-in-memory-counters" @@ -125,4 +120,5 @@ enum { extern int init_imc_pmu(struct device_node *parent, struct imc_pmu *pmu_ptr, int pmu_id); extern void thread_imc_disable(void); +extern int get_max_nest_dev(void); #endif /* __ASM_POWERPC_IMC_PMU_H */ diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c index 74db696ef365..c40cb5f7ceaf 100644 --- a/arch/powerpc/perf/imc-pmu.c +++ b/arch/powerpc/perf/imc-pmu.c @@ -26,7 +26,7 @@ */ static DEFINE_MUTEX(nest_init_lock); static DEFINE_PER_CPU(struct imc_pmu_ref *, local_nest_imc_refc); -static struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS]; +static struct imc_pmu **per_nest_pmu_arr; static cpumask_t nest_imc_cpumask; struct imc_pmu_ref *nest_imc_refc; static int nest_pmus; @@ -286,13 +286,14 @@ static struct imc_pmu_ref *get_nest_pmu_ref(int cpu) static void nest_change_cpu_context(int old_cpu, int new_cpu) { struct imc_pmu **pn = 
per_nest_pmu_arr; - int i; if (old_cpu < 0 || new_cpu < 0) return; - for (i = 0; *pn && i < IMC_MAX_PMUS; i++, pn++) + while (*pn) { perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu); + pn++; + } } static int ppc_nest_imc_cpu_offline(unsigned int cpu) @@ -1192,6 +1193,7 @@ static void imc_common_cpuhp_mem_free(struct imc_pmu *pmu_ptr) kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]->attrs); kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]); kfree(pmu_ptr); + kfree(per_nest_pmu_arr); return; } @@ -1216,6 +1218,13 @@ static int imc_mem_init(struct imc_pmu *pmu_ptr, struct device_node *parent, return -ENOMEM; /* Needed for hotplug/migration */ + if (!per_nest_pmu_arr) { + per_nest_pmu_arr = kcalloc(get_max_nest_dev() + 1, + sizeof(struct imc_pmu *), + GFP_KERNEL); + if (!per_nest_pmu_arr) + return -ENOMEM; + } per_nest_pmu_arr[pmu_index] = pmu_ptr; break; case IMC_DOMAIN_CORE: diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c index b150f4deaccf..4764e6932cb7 100644 --- a/arch/powerpc/platforms/powernv/opal-imc.c +++ b/arch/powerpc/platforms/powernv/opal-imc.c @@ -153,6 +153,22 @@ static void disable_core_pmu_counters(void) put_online_cpus(); } +int get_max_nest_dev(void) +{ + struct device_node *node; + u32 pmu_units, type; + + for_each_compatible_node(node, NULL, IMC_DTB_UNIT_COMPAT) { + if (of_property_read_u32(node, "type", &type)) + continue; + + if (type == IMC_TYPE_CHIP) + pmu_units++; + } + + return pmu_units; +} + static int opal_imc_counters_probe(struct platform_device *pdev) { struct device_node *imc_dev = pdev->dev.of_node; -- 2.7.4
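The patch replaces the fixed `IMC_MAX_PMUS` array with one sized from the device tree. It allocates one extra slot, so the `while (*pn)` walk in `nest_change_cpu_context()` stops at a NULL terminator.

A userspace model of that pattern follows. `struct unit_node`, the type codes and the helper names are hypothetical. Note that the counter here starts at 0. In the quoted `get_max_nest_dev()`, `pmu_units` is declared without an initializer before being incremented, so its starting value is indeterminate.

```c
#include <assert.h>
#include <stdlib.h>

enum { TYPE_CHIP = 0x10, TYPE_CORE = 0x4 };	/* made-up type codes */

struct unit_node {
	int has_type;		/* 0 models a failed of_property_read_u32() */
	unsigned int type;
};

struct pmu { int id; };

/* Count chip-level (nest) units; the counter is explicitly zeroed. */
static unsigned int count_nest_units(const struct unit_node *nodes, size_t n)
{
	unsigned int units = 0;

	for (size_t i = 0; i < n; i++) {
		if (!nodes[i].has_type)
			continue;
		if (nodes[i].type == TYPE_CHIP)
			units++;
	}
	return units;
}

/* One extra zeroed slot keeps the array NULL-terminated. */
static struct pmu **alloc_pmu_array(unsigned int units)
{
	return calloc((size_t)units + 1, sizeof(struct pmu *));
}

/* Walk until the terminator, as nest_change_cpu_context() does. */
static int walk_pmus(struct pmu **arr)
{
	int visited = 0;

	for (struct pmu **pn = arr; *pn; pn++)
		visited++;	/* perf_pmu_migrate_context() would run here */
	return visited;
}
```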
[PATCH 1/2] powerpc/perf: Fix pmu_count to count only nest imc pmus
"pmu_count" in opal_imc_counters_probe() is intended to hold the number of successful nest imc pmu registrations. But the current code also counts other imc units like core_imc and thread_imc. This patch adds a check to count only nest imc pmus. Signed-off-by: Madhavan Srinivasan --- arch/powerpc/platforms/powernv/opal-imc.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c index 21f6531fae20..b150f4deaccf 100644 --- a/arch/powerpc/platforms/powernv/opal-imc.c +++ b/arch/powerpc/platforms/powernv/opal-imc.c @@ -191,8 +191,10 @@ static int opal_imc_counters_probe(struct platform_device *pdev) break; } - if (!imc_pmu_create(imc_dev, pmu_count, domain)) - pmu_count++; + if (!imc_pmu_create(imc_dev, pmu_count, domain)) { + if (domain == IMC_DOMAIN_NEST) + pmu_count++; + } } return 0; -- 2.7.4
Re: [PATCH] [net-next,v3] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver
On 11/10/2017 12:54 PM, Nathan Fontenot wrote: On 11/10/2017 08:41 AM, Desnes Augusto Nunes do Rosário wrote: On 11/09/2017 06:31 PM, Nathan Fontenot wrote: On 11/09/2017 01:00 PM, Desnes Augusto Nunes do Rosario wrote: This patch implements and enables VPD support for the ibmvnic driver. Moreover, it includes the implementation of suitable structs, signal transmission/handling and functions which allow the retrieval of firmware information from the ibmvnic card through the ethtool command. Signed-off-by: Desnes A. Nunes do Rosario Signed-off-by: Thomas Falcon --- drivers/net/ethernet/ibm/ibmvnic.c | 149 - drivers/net/ethernet/ibm/ibmvnic.h | 27 ++- 2 files changed, 173 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index d0cff28..693b502 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -573,6 +573,15 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter) return 0; } +static void release_vpd_data(struct ibmvnic_adapter *adapter) +{ + if (!adapter->vpd) + return; + + kfree(adapter->vpd->buff); + kfree(adapter->vpd); +} + static void release_tx_pools(struct ibmvnic_adapter *adapter) { struct ibmvnic_tx_pool *tx_pool; @@ -753,6 +762,8 @@ static void release_resources(struct ibmvnic_adapter *adapter) { int i; + release_vpd_data(adapter); + release_tx_pools(adapter); release_rx_pools(adapter); @@ -833,6 +844,53 @@ static int set_real_num_queues(struct net_device *netdev) return rc; } +static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter) +{ + struct device *dev = &adapter->vdev->dev; + union ibmvnic_crq crq; + dma_addr_t dma_addr; + int len; + + if (adapter->vpd->buff) + len = adapter->vpd->len; + + reinit_completion(&adapter->fw_done); + crq.get_vpd_size.first = IBMVNIC_CRQ_CMD; + crq.get_vpd_size.cmd = GET_VPD_SIZE; + ibmvnic_send_crq(adapter, &crq); + wait_for_completion(&adapter->fw_done); + Shouldn't there be a check for the return code when
getting the vpd size? Hello Nathan, This check is already being performed in the handle_vpd_size_rsp() function down below. In short, a GET_VPD_SIZE signal is sent here through an ibmvnic_crq union in ibmvnic_send_crq(), whereas handle_query_ip_offload_rsp() receives from the VNIC adapter a GET_VPD_SIZE_RSP containing an ibmvnic_crq union with the vpd size information and the rc.code. If successful, &adapter->fw_done is completed and this part of the code continues; if not, dev_err() is called. The same logic applies to GET_VPD/GET_VPD_RSP. Yes, I did see that code. You complete the completion variable for both success and failure, which lets this routine continue regardless of the result of the get vpd size request. The call to dev_err will print the error message, but it does not make us bail out if the get vpd size request fails. Perhaps set vpd->len to -1 to indicate that the get vpd call failed, which the requester could then check. -Nathan >> What I am adding in the next version of the patch is a check that adapter->vpd->len is different from 0 before allocating adapter->vpd->buff, since in case of a failure adapter->vpd->len will be 0. I do concur with your observation that bailing out is necessary. If the reception of the vpd failed, adapter->vpd->len will still be zeroed out, since it was allocated with kzalloc in init_resources(). Would you agree with the following code for the next version?
=== + reinit_completion(&adapter->fw_done); + crq.get_vpd_size.first = IBMVNIC_CRQ_CMD; + crq.get_vpd_size.cmd = GET_VPD_SIZE; + ibmvnic_send_crq(adapter, &crq); + wait_for_completion(&adapter->fw_done); + ->+ if(!adapter->vpd->len) ->+ return -ENODATA; + + if (!adapter->vpd->buff) + adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL); + else if (adapter->vpd->len != len) + adapter->vpd->buff = + krealloc(adapter->vpd->buff, +adapter->vpd->len, GFP_KERNEL); === Best Regards, + if (!adapter->vpd->buff) + adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL); + else if (adapter->vpd->len != len) + adapter->vpd->buff = + krealloc(adapter->vpd->buff, + adapter->vpd->len, GFP_KERNEL); + + if (!adapter->vpd->buff) { + dev_err(dev, "Could allocate VPD buffer\n"); + return -ENOMEM; + } + + adapter->vpd->dma_addr = + dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len, + DMA_FROM_DEVICE); + if (dma_mapping_error(dev, dma_addr)) { + dev_err(dev, "Could not map VPD buffer\n"); + return -ENOMEM; + } + +
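Nathan's suggestion above is to record a failed GET_VPD_SIZE response in vpd->len so that the requester can bail out. It can be modeled in plain userspace C. This is a hypothetical sketch, not driver code. struct vpd_model, handle_vpd_size_rsp_model() and prepare_vpd_buffer() are illustrative names. realloc() stands in for krealloc(), and a direct function call replaces the CRQ round trip and the completion.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's VPD bookkeeping. */
struct vpd_model {
	long len;            /* bytes reported by firmware, or -1 on failure */
	unsigned char *buff; /* buffer from the last successful allocation */
	long buff_len;       /* size of buff, tracked separately from len */
};

/* Models the size-response handler: record failure instead of only logging it. */
static void handle_vpd_size_rsp_model(struct vpd_model *vpd, int rc, long size)
{
	vpd->len = rc ? -1 : size;
}

/*
 * Models the requester once the completion has fired: bail out on a
 * failed or empty response, and grow the buffer without losing the old
 * pointer if reallocation fails (assigning realloc()'s result straight
 * back to vpd->buff would leak the old buffer on failure).
 */
static int prepare_vpd_buffer(struct vpd_model *vpd)
{
	unsigned char *tmp;

	assert(vpd != NULL);
	if (vpd->len <= 0)
		return -ENODATA;

	if (vpd->buff && vpd->buff_len == vpd->len)
		return 0; /* same size as before: reuse the buffer */

	tmp = realloc(vpd->buff, (size_t)vpd->len);
	if (!tmp)
		return -ENOMEM; /* vpd->buff is untouched and still owned */

	vpd->buff = tmp;
	vpd->buff_len = vpd->len;
	return 0;
}
```

The same care applies to the DMA step that follows in the driver. In the quoted patch, dma_map_single() stores its handle in adapter->vpd->dma_addr, but dma_mapping_error() is passed the uninitialized local dma_addr. It should be given the handle that the mapping call actually returned.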
Re: [PATCHv4 0/6] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers
On Fri, Nov 10, 2017 at 08:48:24AM +0900, Sergey Senozhatsky wrote: > All Ack-s/Tested-by-s were dropped, since the patch set has been > reworked. I'm kindly asking arch-s maintainers and developers to test it > once again. Sorry for any inconveniences and thanks for your help in > advance. You can add back the: Tested-by: Tony Luck #ia64 Apart from my comment about dropping the Examples from the Documentation the series looks OK to me. -Tony
Re: [PATCH v9 00/51] powerpc, mm: Memory Protection Keys
Hi, On 06/11/2017 at 09:56, Ram Pai wrote: Memory protection keys enable applications to protect their address space from inadvertent access or corruption by themselves. These patches, along with the pte-bit freeing patch series, enable the protection key feature on powerpc, for 4k and 64k hashpage kernels. They also change the generic and x86 code to expose memkey features through sysfs. Finally, testcases and Documentation are updated. All patches can be found at -- https://github.com/rampai/memorykeys.git memkey.v9 As far as I can see, you are focusing the implementation on 64-bit powerpc. This could also be implemented on 32-bit powerpc; for instance, the 8xx has MMU Access Protection Registers which can be used to define 16 domains and could, I think, be used for implementing protection keys. Of course, the challenge after that would be to find 4 spare PTE bits. I'm sure we can find them on the 8xx: at least when using 16k pages we have 2 bits already available, and then by merging PAGE_SHARED and PAGE_USER and by reducing PAGE_RO to only one bit we can get the 4 spare bits. Therefore I think it would be great if you could implement a framework common to both PPC32 and PPC64. Christophe The overall idea: - A process allocates a key and associates it with an address range within its address space. The process can then dynamically set read/write permissions on the key without involving the kernel. Any code that violates the permissions of the address space, as defined by its associated key, will receive a segmentation fault. This patch series enables the feature on the PPC64 HPTE platform. ISA 3.0 section 5.7.13 describes the detailed specifications. High-level view of the design: --- When an application associates a key with an address range, program the key in the Linux PTE. When the MMU detects a page fault, allocate a hash page and program the key into the HPTE.
And finally, when the MMU detects a key violation due to invalid application access, invoke the registered signal handler and provide the violated key number. Testing: --- This patch series has passed all the protection key tests available in the selftest directory. The tests are updated to work on both x86 and powerpc. The selftests have passed on x86 and powerpc hardware. History: --- version v9: (1) used jump-labels to optimize code -- Balbir (2) fixed a register initialization bug noted by Balbir (3) fixed inappropriate use of paca to pass siginfo and keys to signal handler (4) Cleanup of comment style not to be right justified -- mpe (5) restructured the patches to depend on the availability of VM_PKEY_BIT4 in include/linux/mm.h (6) Incorporated comments from Dave Hansen towards changes to selftest and got them tested on x86. version v8: (1) Contents of the AMR register withdrawn from the siginfo structure. Applications can always read the AMR register. (2) AMR/IAMR/UAMOR are now available through ptrace system call. -- thanks to Thiago (3) code changes to handle legacy power cpus that do not support execute-disable. (4) incorporates many code improvement suggestions. version v7: (1) refers to device tree property to enable protection keys. (2) adds 4K PTE support. (3) fixes a couple of bugs noticed by Thiago (4) decouples this patch series from arch-independent code. This patch series can now stand by itself, with one kludge patch(2). version v6: (1) selftest changes are broken down into 20 incremental patches. (2) A separate key allocation mask that includes PKEY_DISABLE_EXECUTE is added for powerpc (3) pkey feature is enabled for 64K HPT case only. RPT and 4k HPT are disabled.
(4) Documentation is updated to better capture the semantics. (5) introduced arch_pkeys_enabled() to find if an arch enables pkeys, and correspondingly changed the logic that displays the key value in smaps. (6) code rearranged in many places based on comments from Dave Hansen, Balbir, Anshuman. (7) fixed one bug where a bogus key could be associated successfully in pkey_m
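The permission check lives in a per-thread register, so "set read/write permissions on the key without involving the kernel" comes down to bit arithmetic on the AMR. The sketch below covers only that arithmetic. It assumes two bits per key, with key 0 in the most significant pair and the write-disable bit above the read-disable bit. AMR_RD_BIT, AMR_WR_BIT and pkeyshift() are named after what this series appears to use, but treat the constants and layout as assumptions and check them against ISA 3.0 section 5.7.13. amr_set_rights() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed AMR layout: 2 bits per key, key 0 in the top two bits. */
#define AMR_BITS_PER_PKEY	2
#define AMR_RD_BIT		UINT64_C(0x1)	/* assumed: set => loads denied */
#define AMR_WR_BIT		UINT64_C(0x2)	/* assumed: set => stores denied */
#define PKEY_REG_BITS		64
#define NR_PKEYS_MODEL		32

/* Bit position of the low bit of pkey's pair in the 64-bit register. */
static unsigned int pkeyshift(int pkey)
{
	return PKEY_REG_BITS - (pkey + 1) * AMR_BITS_PER_PKEY;
}

/* Return amr with key pkey set to rights (any mix of RD/WR bits). */
static uint64_t amr_set_rights(uint64_t amr, int pkey, uint64_t rights)
{
	uint64_t mask;

	/* Validate before shifting, so a bad key never yields a bad shift. */
	assert(pkey >= 0 && pkey < NR_PKEYS_MODEL);
	assert((rights & ~(AMR_RD_BIT | AMR_WR_BIT)) == 0);

	mask = (AMR_RD_BIT | AMR_WR_BIT) << pkeyshift(pkey);
	return (amr & ~mask) | (rights << pkeyshift(pkey));
}
```

Because userspace can write the resulting value to the AMR itself, changing a key's rights needs no system call. That is the property the cover letter relies on.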
Re: [PATCHv4 5/6] symbol lookup: introduce dereference_symbol_descriptor()
On Fri, Nov 10, 2017 at 08:48:29AM +0900, Sergey Senozhatsky wrote: > -Examples:: > - > - printk("Going to call: %pF\n", gettimeofday); > - printk("Going to call: %pF\n", p->func); > - printk("%s: called from %pS\n", __func__, (void *)_RET_IP_); > - printk("%s: called from %pS\n", __func__, > - (void *)__builtin_return_address(0)); > - printk("Faulted at %pS\n", (void *)regs->ip); > - printk(" %s%pB\n", (reliable ? "" : "? "), (void *)*stack); Did you mean to delete the Examples completely? Wouldn't it be better to just update (s/%pF/%pS/g)? -Tony
[PATCH v2 2/2] powerpc/pci: Unroll two pass loop when scanning bridges
The current scanning code is really hard to understand because it calls the same function in a loop where the pass value changes, without any comments explaining it: for (pass = 0; pass < 2; pass++) for_each_pci_bridge(dev, bus) max = pci_scan_bridge(bus, dev, max, pass); An unfamiliar reader cannot easily tell the purpose of this loop without looking at the internals of pci_scan_bridge(). In order to make this bit easier to understand, open-code the loop in pci_hp_add_devices() with added comments. No functional changes intended. Cc: Mika Westerberg Signed-off-by: Andy Shevchenko --- arch/powerpc/kernel/pci-hotplug.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c index 741f47295188..cf47b1aec4c2 100644 --- a/arch/powerpc/kernel/pci-hotplug.c +++ b/arch/powerpc/kernel/pci-hotplug.c @@ -104,7 +104,7 @@ EXPORT_SYMBOL_GPL(pci_hp_remove_devices); */ void pci_hp_add_devices(struct pci_bus *bus) { - int slotno, mode, pass, max; + int slotno, mode, max; struct pci_dev *dev; struct pci_controller *phb; struct device_node *dn = pci_bus_to_OF_node(bus); @@ -133,10 +133,17 @@ void pci_hp_add_devices(struct pci_bus *bus) pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); pcibios_setup_bus_devices(bus); max = bus->busn_res.start; - for (pass = 0; pass < 2; pass++) { - for_each_pci_bridge(dev, bus) - max = pci_scan_bridge(bus, dev, max, pass); - } + /* +* Scan bridges that are already configured. We don't touch +* them unless they are misconfigured (which will be done in +* the second scan below). +*/ + for_each_pci_bridge(dev, bus) + max = pci_scan_bridge(bus, dev, max, 0); + + /* Scan bridges that need to be reconfigured */ + for_each_pci_bridge(dev, bus) + max = pci_scan_bridge(bus, dev, max, 1); } pcibios_finish_adding_to_bus(bus); } -- 2.14.2
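The toy model below makes the pass semantics concrete in plain C. It does not reproduce pci_scan_bridge(). struct toy_bridge, toy_scan_bridge() and the bus-number bookkeeping are hypothetical simplifications that capture only the ordering the new comments describe. Pass 0 keeps firmware's numbering for bridges that are already configured. Pass 1 then hands the remaining bridges bus numbers above everything seen so far.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bridge {
	int configured;  /* firmware already assigned bus numbers */
	int secondary;   /* bus number behind the bridge, 0 if none */
};

/* Hypothetical model of one pci_scan_bridge(bus, dev, max, pass) call. */
static int toy_scan_bridge(struct toy_bridge *b, int max, int pass)
{
	assert(pass == 0 || pass == 1);

	if (pass == 0 && b->configured) {
		/* Keep firmware's numbering; just track the highest bus seen. */
		return b->secondary > max ? b->secondary : max;
	}
	if (pass == 1 && !b->configured) {
		/* Assign the next free bus number above everything seen so far. */
		b->secondary = max + 1;
		return b->secondary;
	}
	return max; /* nothing to do for this bridge in this pass */
}

static int toy_scan_bus(struct toy_bridge *bridges, size_t n, int max)
{
	size_t i;

	/* Pass 0: bridges that are already configured. */
	for (i = 0; i < n; i++)
		max = toy_scan_bridge(&bridges[i], max, 0);

	/* Pass 1: bridges that need to be (re)configured. */
	for (i = 0; i < n; i++)
		max = toy_scan_bridge(&bridges[i], max, 1);

	return max;
}
```

In this model, handling the unconfigured bridges first could assign a bus number that a firmware-configured bridge later in the list already uses. Scanning the configured bridges first avoids that.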
[PATCH v2 1/2] powerpc/pci: convert to use for_each_pci_bridge() helper
...which makes code slightly cleaner. Requires: d43f59ce6c50 ("PCI: Add for_each_pci_bridge() helper") Acked-by: Michael Ellerman Signed-off-by: Andy Shevchenko --- arch/powerpc/kernel/pci-hotplug.c | 7 ++- arch/powerpc/kernel/pci_of_scan.c | 7 ++- 2 files changed, 4 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c index 2d71269e7dc1..741f47295188 100644 --- a/arch/powerpc/kernel/pci-hotplug.c +++ b/arch/powerpc/kernel/pci-hotplug.c @@ -134,11 +134,8 @@ void pci_hp_add_devices(struct pci_bus *bus) pcibios_setup_bus_devices(bus); max = bus->busn_res.start; for (pass = 0; pass < 2; pass++) { - list_for_each_entry(dev, &bus->devices, bus_list) { - if (pci_is_bridge(dev)) - max = pci_scan_bridge(bus, dev, - max, pass); - } + for_each_pci_bridge(dev, bus) + max = pci_scan_bridge(bus, dev, max, pass); } } pcibios_finish_adding_to_bus(bus); diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c index 0d790f8432d2..8bdaa2a6fa62 100644 --- a/arch/powerpc/kernel/pci_of_scan.c +++ b/arch/powerpc/kernel/pci_of_scan.c @@ -369,11 +369,8 @@ static void __of_scan_bus(struct device_node *node, struct pci_bus *bus, pcibios_setup_bus_devices(bus); /* Now scan child busses */ - list_for_each_entry(dev, &bus->devices, bus_list) { - if (pci_is_bridge(dev)) { - of_scan_pci_bridge(dev); - } - } + for_each_pci_bridge(dev, bus) + of_scan_pci_bridge(dev); } /** -- 2.14.2
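As far as I know, the helper from d43f59ce6c50 is a filtering wrapper around list_for_each_entry(). It skips non-bridge devices using the "if (!cond) {} else" idiom, so the caller's loop body can be a single statement. The userspace analogue below shows that idiom. A hypothetical singly linked toy_dev list stands in for bus->devices, and for_each_toy_bridge() and count_bridges() are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev {
	int is_bridge;
	struct toy_dev *next;
};

/*
 * Iterate over every device on the list, running the caller's body only
 * for bridges. Writing "if (!cond) {} else" instead of "if (cond)" means
 * the macro's "if" already has its "else", so a caller's own trailing
 * "else" cannot accidentally bind to it.
 */
#define for_each_toy_bridge(dev, head)				\
	for ((dev) = (head); (dev); (dev) = (dev)->next)	\
		if (!(dev)->is_bridge) {} else

static int count_bridges(struct toy_dev *head)
{
	struct toy_dev *dev;
	int n = 0;

	for_each_toy_bridge(dev, head)
		n++;
	return n;
}
```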
Re: [PATCH] powerpc/64s/slice: Use addr limit when computing slice mask
Michael Ellerman writes: > "Aneesh Kumar K.V" writes: > >> While computing the slice mask for the free area we need to make sure we only search >> in the addr limit applicable for this mmap. We update the slb_addr_limit >> after a request for an mmap above 128TB. But a following mmap request >> with a hint addr below 128TB should still limit its search to below 128TB, i.e. >> we should not use slb_addr_limit to compute the slice mask in this case. Instead, >> we should derive the high addr limit from the mmap hint addr value. >> >> Signed-off-by: Aneesh Kumar K.V >> --- >> arch/powerpc/mm/slice.c | 34 ++ >> 1 file changed, 22 insertions(+), 12 deletions(-) > > How does this relate to the fixes Nick has sent? This patch is on top of the patch series sent by Nick. Without this patch we will allocate memory across the 128TB boundary if hint_addr < 128TB but hint_addr + len is above it. In order to recreate this issue we will have to map the stack below. Hence one won't hit the error in the general case. > > cheers > >> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c >> index 564fff06f5c1..23ec2c5e3b78 100644 >> --- a/arch/powerpc/mm/slice.c >> +++ b/arch/powerpc/mm/slice.c >> @@ -122,7 +122,8 @@ static int slice_high_has_vma(struct mm_struct *mm, >> unsigned long slice) >> return !slice_area_is_free(mm, start, end - start); >> } >> >> -static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask >> *ret) >> +static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask >> *ret, >> +unsigned long high_limit) >> { >> unsigned long i; >> >> @@ -133,15 +134,16 @@ static void slice_mask_for_free(struct mm_struct *mm, >> struct slice_mask *ret) >> if (!slice_low_has_vma(mm, i)) >> ret->low_slices |= 1u << i; >> >> -if (mm->context.slb_addr_limit <= SLICE_LOW_TOP) >> +if (high_limit <= SLICE_LOW_TOP) >> return; >> >> -for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) >> +for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++) >> if
(!slice_high_has_vma(mm, i)) >> __set_bit(i, ret->high_slices); >> } >> >> -static void slice_mask_for_size(struct mm_struct *mm, int psize, struct >> slice_mask *ret) >> +static void slice_mask_for_size(struct mm_struct *mm, int psize, struct >> slice_mask *ret, >> +unsigned long high_limit) >> { >> unsigned char *hpsizes; >> int index, mask_index; >> @@ -156,8 +158,11 @@ static void slice_mask_for_size(struct mm_struct *mm, >> int psize, struct slice_ma >> if (((lpsizes >> (i * 4)) & 0xf) == psize) >> ret->low_slices |= 1u << i; >> >> +if (high_limit <= SLICE_LOW_TOP) >> +return; >> + >> hpsizes = mm->context.high_slices_psize; >> -for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) { >> +for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++) { >> mask_index = i & 0x1; >> index = i >> 1; >> if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == psize) >> @@ -169,6 +174,10 @@ static int slice_check_fit(struct mm_struct *mm, >> struct slice_mask mask, struct slice_mask available) >> { >> DECLARE_BITMAP(result, SLICE_NUM_HIGH); >> +/* >> + * Make sure we just do bit compare only to the max >> + * addr limit and not the full bit map size. 
>> + */ >> unsigned long slice_count = >> GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); >> >> bitmap_and(result, mask.high_slices, >> @@ -472,7 +481,7 @@ unsigned long slice_get_unmapped_area(unsigned long >> addr, unsigned long len, >> /* First make up a "good" mask of slices that have the right size >> * already >> */ >> -slice_mask_for_size(mm, psize, &good_mask); >> +slice_mask_for_size(mm, psize, &good_mask, high_limit); >> slice_print_mask(" good_mask", good_mask); >> >> /* >> @@ -497,7 +506,7 @@ unsigned long slice_get_unmapped_area(unsigned long >> addr, unsigned long len, >> #ifdef CONFIG_PPC_64K_PAGES >> /* If we support combo pages, we can allow 64k pages in 4k slices */ >> if (psize == MMU_PAGE_64K) { >> -slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask); >> +slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask, high_limit); >> if (fixed) >> slice_or_mask(&good_mask, &compat_mask); >> } >> @@ -530,11 +539,11 @@ unsigned long slice_get_unmapped_area(unsigned long >> addr, unsigned long len, >> return newaddr; >> } >> } >> - >> -/* We don't fit in the good mask, check what other slices are >> +/* >> + * We don't fit in the good mask, check what other slices are >> * empty and thus can be converted >> */ >> -slice_mask_for_free(mm, &
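A little arithmetic shows the effect of passing high_limit instead of mm->context.slb_addr_limit. This sketch assumes 1TB high slices (SLICE_HIGH_SHIFT of 40), a 128TB default map window and a 512TB maximum. The hunk that computes high_limit is not in this excerpt, so high_limit_for_hint() is a hypothetical reconstruction along the lines of the hint rule in Nick's series that this patch builds on. It is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed constants (bytes); check against the tree you are reading. */
#define TB				(UINT64_C(1) << 40)
#define SLICE_HIGH_SHIFT		40	/* 1TB high slices */
#define DEFAULT_MAP_WINDOW_MODEL	(128 * TB)
#define TASK_SIZE_MODEL			(512 * TB)

/* Number of high slices below limit (mirrors GET_HIGH_SLICE_INDEX). */
static uint64_t high_slice_count(uint64_t limit)
{
	return limit >> SLICE_HIGH_SHIFT;
}

/*
 * Hypothetical: derive the search limit from this request's hint rather
 * than from the per-mm slb_addr_limit, which stays raised once any
 * mapping above 128TB has been made.
 */
static uint64_t high_limit_for_hint(uint64_t addr, uint64_t len, bool fixed)
{
	uint64_t high_limit = DEFAULT_MAP_WINDOW_MODEL;

	if (addr >= high_limit || (fixed && addr + len > high_limit))
		high_limit = TASK_SIZE_MODEL;
	return high_limit;
}
```

With a hint below 128TB, the masks cover only the 128 high slices under 128TB. With slb_addr_limit already raised to 512TB, the old loops would cover all 512, which is how a below-128TB hint could end up with an allocation straddling 128TB.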
Re: [PATCH] [net-next,v3] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver
On 11/10/2017 08:41 AM, Desnes Augusto Nunes do Rosário wrote: > > > On 11/09/2017 06:31 PM, Nathan Fontenot wrote: >> On 11/09/2017 01:00 PM, Desnes Augusto Nunes do Rosario wrote: >>> This patch implements and enables VPD support for the ibmvnic driver. >>> Moreover, it includes the implementation of suitable structs, signal >>> transmission/handling and functions which allow the retrieval of firmware >>> information from the ibmvnic card through the ethtool command. >>> >>> Signed-off-by: Desnes A. Nunes do Rosario >>> Signed-off-by: Thomas Falcon >>> --- >>> drivers/net/ethernet/ibm/ibmvnic.c | 149 >>> - >>> drivers/net/ethernet/ibm/ibmvnic.h | 27 ++- >>> 2 files changed, 173 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c >>> b/drivers/net/ethernet/ibm/ibmvnic.c >>> index d0cff28..693b502 100644 >>> --- a/drivers/net/ethernet/ibm/ibmvnic.c >>> +++ b/drivers/net/ethernet/ibm/ibmvnic.c >>> @@ -573,6 +573,15 @@ static int reset_tx_pools(struct ibmvnic_adapter >>> *adapter) >>> return 0; >>> } >>> >>> +static void release_vpd_data(struct ibmvnic_adapter *adapter) >>> +{ >>> + if (!adapter->vpd) >>> + return; >>> + >>> + kfree(adapter->vpd->buff); >>> + kfree(adapter->vpd); >>> +} >>> + >>> static void release_tx_pools(struct ibmvnic_adapter *adapter) >>> { >>> struct ibmvnic_tx_pool *tx_pool; >>> @@ -753,6 +762,8 @@ static void release_resources(struct ibmvnic_adapter >>> *adapter) >>> { >>> int i; >>> >>> + release_vpd_data(adapter); >>> + >>> release_tx_pools(adapter); >>> release_rx_pools(adapter); >>> >>> @@ -833,6 +844,53 @@ static int set_real_num_queues(struct net_device >>> *netdev) >>> return rc; >>> } >>> >>> +static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter) >>> +{ >>> + struct device *dev = &adapter->vdev->dev; >>> + union ibmvnic_crq crq; >>> + dma_addr_t dma_addr; >>> + int len; >>> + >>> + if (adapter->vpd->buff) >>> + len = adapter->vpd->len; >>> + >>> + reinit_completion(&adapter->fw_done); >>> +
crq.get_vpd_size.first = IBMVNIC_CRQ_CMD; >>> + crq.get_vpd_size.cmd = GET_VPD_SIZE; >>> + ibmvnic_send_crq(adapter, &crq); >>> + wait_for_completion(&adapter->fw_done); >>> + >> >> Shouldn't there be a check for the return code when getting the >> vpd size? > > Hello Nathan, > > This check is already being performed in the handle_vpd_size_rsp() function > down below. > > In short, a GET_VPD_SIZE signal is sent here through an ibmvnic_crq union in > ibmvnic_send_crq(), whereas handle_query_ip_offload_rsp() receives from the > VNIC adapter a GET_VPD_SIZE_RSP containing an ibmvnic_crq union with the vpd > size information and the rc.code. If successful, &adapter->fw_done is completed > and this part of the code continues; if not, dev_err() is called. > The same logic applies to GET_VPD/GET_VPD_RSP. > Yes, I did see that code. You complete the completion variable for both success and failure, which lets this routine continue regardless of the result of the get vpd size request. The call to dev_err will print the error message, but it does not make us bail out if the get vpd size request fails. Perhaps set vpd->len to -1 to indicate that the get vpd call failed, which the requester could then check. -Nathan > What I am adding in the next version of the patch is a check that > adapter->vpd->len is different from 0 before allocating adapter->vpd->buff, > since in case of a failure adapter->vpd->len will be 0.
> > Best Regards, > >> >> >>> + if (!adapter->vpd->buff) >>> + adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL); >>> + else if (adapter->vpd->len != len) >>> + adapter->vpd->buff = >>> + krealloc(adapter->vpd->buff, >>> + adapter->vpd->len, GFP_KERNEL); >>> + >>> + if (!adapter->vpd->buff) { >>> + dev_err(dev, "Could allocate VPD buffer\n"); >>> + return -ENOMEM; >>> + } >>> + >>> + adapter->vpd->dma_addr = >>> + dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len, >>> + DMA_FROM_DEVICE); >>> + if (dma_mapping_error(dev, dma_addr)) { >>> + dev_err(dev, "Could not map VPD buffer\n"); >>> + return -ENOMEM; >>> + } >>> + >>> + reinit_completion(&adapter->fw_done); >>> + crq.get_vpd.first = IBMVNIC_CRQ_CMD; >>> + crq.get_vpd.cmd = GET_VPD; >>> + crq.get_vpd.ioba = cpu_to_be32(adapter->vpd->dma_addr); >>> + crq.get_vpd.len = cpu_to_be32((u32)adapter->vpd->len); >>> + ibmvnic_send_crq(adapter, &crq); >>> + wait_for_completion(&adapter->fw_done); >>> + >>> + return 0; >>> +} >>> + >>> static int init_resources(struct ibmvnic_adapter *adapter) >>> { >>> struct net_device *netdev = adapter->netdev; >>> @@ -850,6 +908,10 @@ static int init_resources(struct ibmvnic_adapter >>> *adapter) >>>
Re: [PATCH] [net-next,v3] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver
On 11/09/2017 06:31 PM, Nathan Fontenot wrote: On 11/09/2017 01:00 PM, Desnes Augusto Nunes do Rosario wrote: This patch implements and enables VPD support for the ibmvnic driver. Moreover, it includes the implementation of suitable structs, signal transmission/handling and functions which allow the retrieval of firmware information from the ibmvnic card through the ethtool command. Signed-off-by: Desnes A. Nunes do Rosario Signed-off-by: Thomas Falcon --- drivers/net/ethernet/ibm/ibmvnic.c | 149 - drivers/net/ethernet/ibm/ibmvnic.h | 27 ++- 2 files changed, 173 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index d0cff28..693b502 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -573,6 +573,15 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter) return 0; } +static void release_vpd_data(struct ibmvnic_adapter *adapter) +{ + if (!adapter->vpd) + return; + + kfree(adapter->vpd->buff); + kfree(adapter->vpd); +} + static void release_tx_pools(struct ibmvnic_adapter *adapter) { struct ibmvnic_tx_pool *tx_pool; @@ -753,6 +762,8 @@ static void release_resources(struct ibmvnic_adapter *adapter) { int i; + release_vpd_data(adapter); + release_tx_pools(adapter); release_rx_pools(adapter); @@ -833,6 +844,53 @@ static int set_real_num_queues(struct net_device *netdev) return rc; } +static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter) +{ + struct device *dev = &adapter->vdev->dev; + union ibmvnic_crq crq; + dma_addr_t dma_addr; + int len; + + if (adapter->vpd->buff) + len = adapter->vpd->len; + + reinit_completion(&adapter->fw_done); + crq.get_vpd_size.first = IBMVNIC_CRQ_CMD; + crq.get_vpd_size.cmd = GET_VPD_SIZE; + ibmvnic_send_crq(adapter, &crq); + wait_for_completion(&adapter->fw_done); + Shouldn't there be a check for the return code when getting the vpd size?
Hello Nathan, This check is already being performed in the handle_vpd_size_rsp() function down below. In short, a GET_VPD_SIZE signal is sent here through an ibmvnic_crq union in ibmvnic_send_crq(), whereas handle_query_ip_offload_rsp() receives from the VNIC adapter a GET_VPD_SIZE_RSP containing an ibmvnic_crq union with the vpd size information and the rc.code. If successful, &adapter->fw_done is completed and this part of the code continues; if not, dev_err() is called. The same logic applies to GET_VPD/GET_VPD_RSP. What I am adding in the next version of the patch is a check that adapter->vpd->len is different from 0 before allocating adapter->vpd->buff, since in case of a failure adapter->vpd->len will be 0. Best Regards, + if (!adapter->vpd->buff) + adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL); + else if (adapter->vpd->len != len) + adapter->vpd->buff = + krealloc(adapter->vpd->buff, +adapter->vpd->len, GFP_KERNEL); + + if (!adapter->vpd->buff) { + dev_err(dev, "Could allocate VPD buffer\n"); + return -ENOMEM; + } + + adapter->vpd->dma_addr = + dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len, + DMA_FROM_DEVICE); + if (dma_mapping_error(dev, dma_addr)) { + dev_err(dev, "Could not map VPD buffer\n"); + return -ENOMEM; + } + + reinit_completion(&adapter->fw_done); + crq.get_vpd.first = IBMVNIC_CRQ_CMD; + crq.get_vpd.cmd = GET_VPD; + crq.get_vpd.ioba = cpu_to_be32(adapter->vpd->dma_addr); + crq.get_vpd.len = cpu_to_be32((u32)adapter->vpd->len); + ibmvnic_send_crq(adapter, &crq); + wait_for_completion(&adapter->fw_done); + + return 0; +} + static int init_resources(struct ibmvnic_adapter *adapter) { struct net_device *netdev = adapter->netdev; @@ -850,6 +908,10 @@ static int init_resources(struct ibmvnic_adapter *adapter) if (rc) return rc; + adapter->vpd = kzalloc(sizeof(*adapter->vpd), GFP_KERNEL); + if (!adapter->vpd) + return -ENOMEM; + adapter->map_id = 1; adapter->napi = kcalloc(adapter->req_rx_queues, sizeof(struct
napi_struct), GFP_KERNEL); @@ -950,6 +1012,10 @@ static int ibmvnic_open(struct net_device *netdev) rc = __ibmvnic_open(netdev); netif_carrier_on(netdev); + + /* Vital Product Data (VPD) */ + ibmvnic_get_vpd(adapter); + mutex_unlock(&adapter->reset_lock); return rc; @@ -1878,11 +1944,15 @@ static int ibmvnic_get_link_ksettings(struct net_device *netdev, return 0; } -static void ibmvnic_get_drvi
Re: [linux-next][0692229e] next-20171106 fails to boot on Power 7
Hi Abdul, On Tue 07-11-17 11:28:54, Michal Hocko wrote: > On Tue 07-11-17 15:20:29, Abdul Haleem wrote: > > Hi, > > > > Today's next kernel fails to boot on a Power 7 machine with the errors below > > in the boot log messages. > > > > 'Uhuuh, elf segement at 1004 requested but the memory is > > mapped already' > > > > It was introduced with commit: > > > > 0692229e : fs/binfmt_elf.c: drop MAP_FIXED usage from elf_map > > Weird. Clashes shouldn't really happen. Maybe power is doing something > different from other platforms. Could you apply the following debugging > patch to see what is going on there? Did you have a chance to test with this debugging patch? > --- > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c > index 0abc30d681ae..f098aaf60039 100644 > --- a/fs/binfmt_elf.c > +++ b/fs/binfmt_elf.c > @@ -361,8 +361,28 @@ static unsigned long elf_vm_mmap(struct file *filep, > unsigned long addr, > return map_addr; > > if ((type & MAP_FIXED) && map_addr != addr) { > - pr_info("Uhuuh, elf segement at %p requested but the memory is > mapped already\n", > - (void*)addr); > + struct mm_struct *mm = current->mm; > + struct vm_area_struct *vma; > + pr_info("Uhuuh, elf segement at %p requested but the memory is > mapped already, got %p\n", > + (void*)addr, (void*)map_addr); > + down_read(&mm->mmap_sem); > + vma = find_vma(mm, addr); > + if (vma) { > + const char *name; > + pr_info("Clashing vma [%lx, %lx] flags:%lx", > vma->vm_start, vma->vm_end, vma->vm_flags); > + name = arch_vma_name(vma); > + if (!name) { > + if (vma->vm_start <= mm->brk && > + vma->vm_end >= mm->start_brk) > + name = "[heap]"; > + else if (vma->vm_start <= > vma->vm_mm->start_stack && > + vma->vm_end >= > vma->vm_mm->start_stack) > + name = "[stack]"; > + } > + pr_cont(" name:%s\n", name); > + } else > + pr_info("Uhm, no clashing VMA\n"); > + up_read(&mm->mmap_sem); > return -EAGAIN; > } > > > -- > Michal Hocko > SUSE Labs -- Michal Hocko SUSE Labs
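When arch_vma_name() returns NULL, the debugging patch falls back to labelling the clashing VMA by how its range overlaps the heap and the stack. The userspace restatement below covers just that fallback. struct toy_mm and toy_vma_name() are hypothetical; the struct carries only the three mm fields the patch reads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_mm {
	unsigned long start_brk, brk, start_stack;
};

/* Mirrors the patch's fallback naming; NULL when neither range matches. */
static const char *toy_vma_name(const struct toy_mm *mm,
				unsigned long vm_start, unsigned long vm_end)
{
	/* Overlaps [start_brk, brk]: treat as heap. */
	if (vm_start <= mm->brk && vm_end >= mm->start_brk)
		return "[heap]";
	/* Contains start_stack: treat as stack. */
	if (vm_start <= mm->start_stack && vm_end >= mm->start_stack)
		return "[stack]";
	return NULL;
}
```

When neither case matches, the patch passes NULL to %s. The kernel's vsprintf() prints that as "(null)", so the log line is still readable.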
RE: POWER: Unexpected fault when writing to brk-allocated memory
From: Matthew Wilcox > Sent: 09 November 2017 19:44 > > On Fri, Nov 10, 2017 at 04:15:26AM +1100, Nicholas Piggin wrote: > > So these semantics are what we're going with? Anything that does mmap() is > > guaranteed to get a 47-bit pointer and it can use the top 17 bits for > > itself? Is it intended to be cross-platform or just x86 and power specific? > > It is x86 and powerpc specific. The arm64 people have apparently stumbled > across apps that expect to be able to use bit 48 for their own purposes. > And their address space is 48 bit by default. Oops. (Do you mean 49bit?) Aren't such apps just doomed to be broken? ISTR there is something on (IIRC) sparc64 that does a 'match' on the high address bits to make it much harder to overrun one area into another. David
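Under the convention Nick describes, mmap() results fit in 47 bits, leaving the top 17 bits of a 64-bit pointer value free for the application. The sketch below shows that kind of packing on plain integers, assuming the 47-bit user address width stated in the thread. tag_ptr(), untag_ptr() and ptr_tag() are hypothetical helpers. The scheme breaks as soon as the kernel hands out an address that uses any of the bits the application has claimed, which is the arm64 problem Matthew mentions.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS	47				/* assumed user VA width */
#define ADDR_MASK	((UINT64_C(1) << ADDR_BITS) - 1)
#define TAG_BITS	(64 - ADDR_BITS)		/* 17 spare bits */

/* Pack a tag into the high bits; only valid if addr fits in 47 bits. */
static uint64_t tag_ptr(uint64_t addr, uint32_t tag)
{
	assert((addr & ~ADDR_MASK) == 0);	/* a wider address breaks the scheme */
	assert(tag < (UINT32_C(1) << TAG_BITS));
	return addr | ((uint64_t)tag << ADDR_BITS);
}

/* Recover the address by clearing the tag bits. */
static uint64_t untag_ptr(uint64_t v)
{
	return v & ADDR_MASK;
}

/* Recover the tag from the top 17 bits. */
static uint32_t ptr_tag(uint64_t v)
{
	return (uint32_t)(v >> ADDR_BITS);
}
```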
Re: [PATCH v9 44/51] selftest/vm: powerpc implementation for generic abstraction
Hi Ram, On Thu, Nov 09, 2017 at 03:37:46PM -0800, Ram Pai wrote: > On Thu, Nov 09, 2017 at 04:47:15PM -0200, Breno Leitao wrote: > > On Mon, Nov 06, 2017 at 12:57:36AM -0800, Ram Pai wrote: > > > @@ -206,12 +209,14 @@ void signal_handler(int signum, siginfo_t *si, void > > > *vucontext) > > > > > > trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO]; > > > ip = uctxt->uc_mcontext.gregs[REG_IP_IDX]; > > > - fpregset = uctxt->uc_mcontext.fpregs; > > > - fpregs = (void *)fpregset; > > > > Since you removed all references to fpregset now, you probably want to > > remove the declaration of the variable above. > > fpregs is still needed. Right, fpregs is still needed, but not fpregset. Every reference to this variable was removed by your patch. Grepping for this identifier in a tree with your patches applied, I see: $ grep fpregset protection_keys.c fpregset_t fpregset;
Re: [PATCH v2 1/3] powerpc/powernv: Always stop secondaries before reboot/shutdown
Nicholas Piggin writes: > Currently powernv reboot and shutdown requests just leave secondaries > to do their own thing. This is undesirable because they can trigger > any number of watchdogs while waiting for reboot, but also we don't > know what else they might be doing, or they might be stuck somewhere > causing trouble. > > The opal scheduled flash update code already ran into watchdog problems > due to flashing taking a long time, but it's possible for regular > reboots to trigger problems too (this is with watchdog_thresh set to 1, > but I have seen it with watchdog_thresh at the default value once too): > > reboot: Restarting system > [ 360.038896709,5] OPAL: Reboot request... > Watchdog CPU:0 Hard LOCKUP > Watchdog CPU:44 detected Hard LOCKUP other CPUS:16 > Watchdog CPU:16 Hard LOCKUP > watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0] > > So remove the special case for flash update, and unconditionally do > smp_send_stop before rebooting. > > Return the CPUs to Linux stop loops rather than OPAL. The reason for > this is that the path to firmware is longer, and the CPUs may have > been interrupted from firmware, which may cause problems when re-entering > it. It's better to put them into a simple spin loop to maximize the > chance of a successful reboot. I always assumed we had to send the CPUs back to OPAL for the flashing procedure. Is it OK to leave them in Linux? cheers
Re: [PATCH v3] kernel/module_64.c: Add REL24 relocation support of livepatch symbols
Josh Poimboeuf writes: > On Fri, Nov 10, 2017 at 01:06:25PM +1100, Balbir Singh wrote: >> On Fri, Nov 10, 2017 at 2:19 AM, Josh Poimboeuf wrote: >> > FWIW, I think it won't matter anyway. I'm currently pursuing the option >> > of inserting nops after local calls, because it has less runtime >> > complexity than using a stub. >> > >> > I think I've figured out a way to do it with a GCC plugin, but if that >> > doesn't work I'll try the asm listing sed approach suggested by Michael. >> > >> >> A plugin that runs for the new kernel with the patch? Just for >> specific files involved in the patch? > > The plugin will affect the code generation of all functions in the patch > module. So all calls in all replacement functions will have the nops. > > Here's a prototype (not yet fully tested): > > > https://github.com/jpoimboe/kpatch/blob/TODO-ppc-fix/kpatch-build/gcc-plugins/ppc64le-plugin.c Nice. cheers
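A nop after each local call helps for the following reason. On ppc64, when a call has to go through a stub that changes r2 (the TOC pointer), the module loader must restore r2 afterwards, and it does so by rewriting the instruction that follows the bl. The sketch below performs that rewrite on an instruction array. It assumes ELFv2 (ppc64le), where the TOC save slot is 24(r1), so the restore is ld r2,24(r1), encoded as 0xe8410018. The placeholder is the canonical nop, ori 0,0,0, encoded as 0x60000000. restore_toc_after_call() is a hypothetical name, loosely modeled on what the loader does.

```c
#include <assert.h>
#include <stdint.h>

#define PPC_INST_NOP		0x60000000u	/* ori r0,r0,0 */
#define PPC_INST_LD_TOC		0xe8410018u	/* ld r2,24(r1): assumed ELFv2 save slot */

/*
 * after_call points at the instruction following a bl. Returns 0 if it
 * was a nop and has been replaced with the TOC restore, or -1 if there
 * is no room to restore r2 (the case the GCC plugin is meant to rule out).
 */
static int restore_toc_after_call(uint32_t *after_call)
{
	if (*after_call != PPC_INST_NOP)
		return -1;
	*after_call = PPC_INST_LD_TOC;
	return 0;
}
```

Without the nop, the loader has nowhere to put the restore and has to reject the relocation. A compiler-inserted nop after every call in the patch module is therefore enough to make the stub approach safe.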
Re: [PATCH] powerpc/64s/slice: Use addr limit when computing slice mask
"Aneesh Kumar K.V" writes: > While computing the slice mask for the free area we need to make sure we only search > in the addr limit applicable for this mmap. We update the slb_addr_limit > after a request for an mmap above 128TB. But a following mmap request > with a hint addr below 128TB should still limit its search to below 128TB, i.e. > we should not use slb_addr_limit to compute the slice mask in this case. Instead, > we should derive the high addr limit from the mmap hint addr value. > > Signed-off-by: Aneesh Kumar K.V > --- > arch/powerpc/mm/slice.c | 34 ++ > 1 file changed, 22 insertions(+), 12 deletions(-) How does this relate to the fixes Nick has sent? cheers > diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c > index 564fff06f5c1..23ec2c5e3b78 100644 > --- a/arch/powerpc/mm/slice.c > +++ b/arch/powerpc/mm/slice.c > @@ -122,7 +122,8 @@ static int slice_high_has_vma(struct mm_struct *mm, > unsigned long slice) > return !slice_area_is_free(mm, start, end - start); > } > > -static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret) > +static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret, > + unsigned long high_limit) > { > unsigned long i; > > @@ -133,15 +134,16 @@ static void slice_mask_for_free(struct mm_struct *mm, > struct slice_mask *ret) > if (!slice_low_has_vma(mm, i)) > ret->low_slices |= 1u << i; > > - if (mm->context.slb_addr_limit <= SLICE_LOW_TOP) > + if (high_limit <= SLICE_LOW_TOP) > return; > > - for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) > + for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++) > if (!slice_high_has_vma(mm, i)) > __set_bit(i, ret->high_slices); > } > > -static void slice_mask_for_size(struct mm_struct *mm, int psize, struct > slice_mask *ret) > +static void slice_mask_for_size(struct mm_struct *mm, int psize, struct > slice_mask *ret, > + unsigned long high_limit) > { > unsigned char *hpsizes; > int index, mask_index; > @@ -156,8 +158,11 @@ static
void slice_mask_for_size(struct mm_struct *mm, > int psize, struct slice_ma > if (((lpsizes >> (i * 4)) & 0xf) == psize) > ret->low_slices |= 1u << i; > > + if (high_limit <= SLICE_LOW_TOP) > + return; > + > hpsizes = mm->context.high_slices_psize; > - for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) { > + for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++) { > mask_index = i & 0x1; > index = i >> 1; > if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == psize) > @@ -169,6 +174,10 @@ static int slice_check_fit(struct mm_struct *mm, > struct slice_mask mask, struct slice_mask available) > { > DECLARE_BITMAP(result, SLICE_NUM_HIGH); > + /* > + * Make sure we just do bit compare only to the max > + * addr limit and not the full bit map size. > + */ > unsigned long slice_count = > GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); > > bitmap_and(result, mask.high_slices, > @@ -472,7 +481,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, > unsigned long len, > /* First make up a "good" mask of slices that have the right size > * already > */ > - slice_mask_for_size(mm, psize, &good_mask); > + slice_mask_for_size(mm, psize, &good_mask, high_limit); > slice_print_mask(" good_mask", good_mask); > > /* > @@ -497,7 +506,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, > unsigned long len, > #ifdef CONFIG_PPC_64K_PAGES > /* If we support combo pages, we can allow 64k pages in 4k slices */ > if (psize == MMU_PAGE_64K) { > - slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask); > + slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask, high_limit); > if (fixed) > slice_or_mask(&good_mask, &compat_mask); > } > @@ -530,11 +539,11 @@ unsigned long slice_get_unmapped_area(unsigned long > addr, unsigned long len, > return newaddr; > } > } > - > - /* We don't fit in the good mask, check what other slices are > + /* > + * We don't fit in the good mask, check what other slices are > * empty and thus can be converted > */ > - 
slice_mask_for_free(mm, &potential_mask); > + slice_mask_for_free(mm, &potential_mask, high_limit); > slice_or_mask(&potential_mask, &good_mask); > slice_print_mask(" potential", potential_mask); > > @@ -744,17 +753,18 @@ int is_hugepage_only_range(struct mm_struct *mm, > unsigned long addr, > { > struct slice_mask mask, available; > unsigned int psize = mm->context.user_p
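The idea behind the patch, bounding the free/size slice search by a limit derived from this mmap's hint rather than by the mm-wide mm->context.slb_addr_limit, can be modelled in plain userspace C. This is an illustrative sketch, not the kernel code: mmap_high_limit(), free_high_slices(), MAX_USER_ADDR and the used[] array (a stand-in for slice_high_has_vma()) are hypothetical, and the constants (1TB high slices, a 4GB low-slice region, a 128TB default window, a 512TB maximum) are assumptions chosen to match the 128TB boundary described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed values for illustration only: 1TB high slices, low slices
 * below 4GB, a 128TB default mmap window and a 512TB maximum.
 */
#define SLICE_HIGH_SHIFT    40
#define SLICE_LOW_TOP       (1ULL << 32)
#define DEFAULT_MAP_WINDOW  (1ULL << 47)
#define MAX_USER_ADDR       (1ULL << 49)               /* hypothetical */
#define SLICE_NUM_HIGH      (MAX_USER_ADDR >> SLICE_HIGH_SHIFT)

#define GET_HIGH_SLICE_INDEX(addr) ((addr) >> SLICE_HIGH_SHIFT)

/*
 * Hypothetical helper: choose the search limit from this mmap's hint,
 * not from a limit an earlier above-128TB mmap may already have raised.
 */
static uint64_t mmap_high_limit(uint64_t hint, uint64_t len, bool fixed)
{
	uint64_t high_limit = DEFAULT_MAP_WINDOW;

	if (hint >= high_limit || (fixed && hint + len > high_limit))
		high_limit = MAX_USER_ADDR;
	return high_limit;
}

/*
 * Mark free high slices in free_out[], scanning only below high_limit.
 * used[i] stands in for slice_high_has_vma(mm, i). Returns the count.
 */
static unsigned long free_high_slices(const bool used[SLICE_NUM_HIGH],
				      uint64_t high_limit,
				      bool free_out[SLICE_NUM_HIGH])
{
	unsigned long i, n = 0;

	for (i = 0; i < SLICE_NUM_HIGH; i++)
		free_out[i] = false;

	/* Nothing above the low-slice region to search. */
	if (high_limit <= SLICE_LOW_TOP)
		return 0;

	/* The limit must never index past the bitmap. */
	assert(GET_HIGH_SLICE_INDEX(high_limit) <= SLICE_NUM_HIGH);

	for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++) {
		if (!used[i]) {
			free_out[i] = true;
			n++;
		}
	}
	return n;
}
```

In this model, even if an earlier mmap with a hint above 128TB had raised the per-mm limit to the maximum, a later call with a hint below 128TB only considers the first 128 high slices, which is the behaviour the patch description asks for.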
Re: [PATCHv4 2/3] ARMv8: layerscape: add the pcie ep function support
Hi, On Friday 10 November 2017 09:18 AM, Bao Xiaowei wrote: > Add the pcie controller ep function support of layerscape base on > pcie ep framework. > > Signed-off-by: Bao Xiaowei > --- > v2: > - fix the ioremap function used but no ioumap issue > - optimize the code structure > - add code comments > v3: > - fix the msi outband window request failed issue > v4: > - optimize the code, adjust the format > > drivers/pci/dwc/pci-layerscape.c | 120 > --- > 1 file changed, 113 insertions(+), 7 deletions(-) $subject should begin with PCI: layerscape: > > diff --git a/drivers/pci/dwc/pci-layerscape.c > b/drivers/pci/dwc/pci-layerscape.c > index 87fa486bee2c..6f3e434599e0 100644 > --- a/drivers/pci/dwc/pci-layerscape.c > +++ b/drivers/pci/dwc/pci-layerscape.c > @@ -34,7 +34,12 @@ > /* PEX Internal Configuration Registers */ > #define PCIE_STRFMR1 0x71c /* Symbol Timer & Filter Mask Register1 */ > > +#define PCIE_DBI2_BASE 0x1000 /* DBI2 base address*/ The base address should come from dt. > +#define PCIE_MSI_MSG_DATA_OFF 0x5c /* MSI Data register address*/ > +#define PCIE_MSI_OB_SIZE 4096 > +#define PCIE_MSI_ADDR_OFFSET (1024 * 1024) > #define PCIE_IATU_NUM 6 > +#define PCIE_EP_ADDR_SPACE_SIZE 0x1 > > struct ls_pcie_drvdata { > u32 lut_offset; > @@ -44,12 +49,20 @@ struct ls_pcie_drvdata { > const struct dw_pcie_ops *dw_pcie_ops; > }; > > +struct ls_pcie_ep { > + dma_addr_t msi_phys_addr; > + void __iomem *msi_virt_addr; > + u64 msi_msg_addr; > + u16 msi_msg_data; > +}; > + > struct ls_pcie { > struct dw_pcie *pci; > void __iomem *lut; > struct regmap *scfg; > const struct ls_pcie_drvdata *drvdata; > int index; > + struct ls_pcie_ep *pcie_ep; > }; > > #define to_ls_pcie(x) dev_get_drvdata((x)->dev) > @@ -263,6 +276,99 @@ static const struct of_device_id ls_pcie_of_match[] = { > { }, > }; > > +static void ls_pcie_raise_msi_irq(struct ls_pcie_ep *pcie_ep) > +{ > + iowrite32(pcie_ep->msi_msg_data, pcie_ep->msi_virt_addr); > +} > + > +static int ls_pcie_raise_irq(struct dw_pcie_ep
*ep, > + enum pci_epc_irq_type type, u8 interrupt_num) > +{ > + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > + struct ls_pcie *pcie = to_ls_pcie(pci); > + struct ls_pcie_ep *pcie_ep = pcie->pcie_ep; > + u32 free_win; > + > + /* get the msi message address and msi message data */ > + pcie_ep->msi_msg_addr = ioread32(pci->dbi_base + MSI_MESSAGE_ADDR_L32) | > + (((u64)ioread32(pci->dbi_base + MSI_MESSAGE_ADDR_U32)) << 32); > + pcie_ep->msi_msg_data = ioread16(pci->dbi_base + PCIE_MSI_MSG_DATA_OFF); > + > + /* request and config the outband window for msi */ > + free_win = find_first_zero_bit(&ep->ob_window_map, > + sizeof(ep->ob_window_map)); > + if (free_win >= ep->num_ob_windows) { > + dev_err(pci->dev, "no free outbound window\n"); > + return -ENOMEM; > + } > + > + dw_pcie_prog_outbound_atu(pci, free_win, PCIE_ATU_TYPE_MEM, > + pcie_ep->msi_phys_addr, > + pcie_ep->msi_msg_addr, > + PCIE_MSI_OB_SIZE); > + > + set_bit(free_win, &ep->ob_window_map); This custom logic is not required. You can use [1] instead [1] -> https://lkml.org/lkml/2017/11/3/318 > + > + /* generate the msi interrupt */ > + ls_pcie_raise_msi_irq(pcie_ep); > + > + /* release the outband window of msi */ > + dw_pcie_disable_atu(pci, free_win, DW_PCIE_REGION_OUTBOUND); > + clear_bit(free_win, &ep->ob_window_map); > + > + return 0; > +} > + > +static struct dw_pcie_ep_ops pcie_ep_ops = { > + .raise_irq = ls_pcie_raise_irq, > +}; > + > +static int __init ls_add_pcie_ep(struct ls_pcie *pcie, > + struct platform_device *pdev) > +{ > + struct dw_pcie *pci = pcie->pci; > + struct device *dev = pci->dev; > + struct dw_pcie_ep *ep; > + struct ls_pcie_ep *pcie_ep; > + struct resource *cfg_res; > + int ret; > + > + ep = &pci->ep; > + ep->ops = &pcie_ep_ops; > + > + pcie_ep = devm_kzalloc(dev, sizeof(*pcie_ep), GFP_KERNEL); > + if (!pcie_ep) > + return -ENOMEM; > + > + pcie->pcie_ep = pcie_ep; > + > + cfg_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "config"); > + if (cfg_res) { > + ep->phys_base 
= cfg_res->start; > + ep->addr_size = PCIE_EP_ADDR_SPACE_SIZE; > + } else { > + dev_err(dev, "missing *config* space\n"); > + return -ENODEV; > + } > + > + pcie_ep->msi_phys_addr = ep->phys_base + PCIE_MSI_ADDR_OFFSET; > + > + pcie_ep->msi_virt_addr = ioremap(pcie_ep->msi_phys_addr, > + PCIE_MSI_OB_SI
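The MSI path in ls_pcie_raise_irq() follows a claim/program/write/release pattern on the outbound windows: assemble the 64-bit MSI address from two 32-bit registers, claim a free window, program it, post the data, then release the window. The userspace sketch below models only that bookkeeping; struct ob_windows, msi_addr(), ob_window_get(), ob_window_put() and raise_msi() are hypothetical names, and the ATU programming and MMIO write are reduced to a comment and a plain store. One point it makes explicit: find_first_zero_bit() takes its size argument in bits, so bounding the search by the window count, rather than by sizeof(ep->ob_window_map) (a byte count), appears to be the safer choice.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical model of the outbound (OB) window bookkeeping. */
struct ob_windows {
	unsigned long map;            /* bit i set => window i in use */
	unsigned int num_ob_windows;  /* e.g. the DT "num-ob-windows" value */
};

/* Combine the low and high halves of the MSI message address. */
static uint64_t msi_addr(uint32_t lo, uint32_t hi)
{
	return (uint64_t)lo | ((uint64_t)hi << 32);
}

/*
 * Claim the lowest free window. The search is bounded by the number of
 * windows (a bit count), not by sizeof(map) (a byte count).
 */
static int ob_window_get(struct ob_windows *w)
{
	unsigned int i;

	assert(w->num_ob_windows <= sizeof(w->map) * CHAR_BIT);
	for (i = 0; i < w->num_ob_windows; i++) {
		if (!(w->map & (1UL << i))) {
			w->map |= 1UL << i;
			return (int)i;
		}
	}
	return -1;	/* the driver reports -ENOMEM in this case */
}

/* Release a previously claimed window. */
static void ob_window_put(struct ob_windows *w, int idx)
{
	w->map &= ~(1UL << idx);
}

/*
 * Claim a window, post the MSI data, release the window.
 * Returns the window index used, or -1 if none was free.
 */
static int raise_msi(struct ob_windows *w, uint32_t *mailbox, uint16_t data)
{
	int win = ob_window_get(w);

	if (win < 0)
		return -1;
	/* dw_pcie_prog_outbound_atu() would program window `win` here. */
	*mailbox = data;	/* stands in for iowrite32() to msi_virt_addr */
	ob_window_put(w, win);
	return win;
}
```

Because the window is released immediately after the write, the same window is normally reused for every MSI; the helper the reviewer points to in [1] may make this per-call claim unnecessary.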
Re: [PATCHv4 1/3] ARMv8: dts: ls1046a: add the property of IB and OB
Hi Bao, On Friday 10 November 2017 09:18 AM, Bao Xiaowei wrote: > Add the property of inbound and outbound windows number for ep > driver. > > Signed-off-by: Bao Xiaowei > Acked-by: Minghuan Lian > --- > v2: > - no change > v3: > - modify the commit message > v4: > - no change > > arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 6 ++ > 1 file changed, 6 insertions(+) $subject should start with something like arm64: dts: ls1046a: ** > > diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi > b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi > index 06b5e12d04d8..f8332669663c 100644 > --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi > +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi > @@ -674,6 +674,8 @@ > device_type = "pci"; > dma-coherent; > num-lanes = <4>; > + num-ib-windows = <6>; > + num-ob-windows = <6>; EP specific properties shouldn't be added in RC dt node. Ideally you should have a separate dt node for RC and EP. Thanks Kishon