[PATCH] powerpc: dts: p2020rdb: add missing peripherals
This patch adds dts entries for some peripherals:
- i2c: temperature sensor ADT7461
- i2c: eeprom m24256
- i2c: eeprom at24c01
- i2c: pmic zl2006
- i2c: gpio expander
- phy: reset pins for phy
- dsa: switch vsc7385

It was required to adjust the rgmii settings for enet0 because the switch with the dsa driver acts differently without 8081 steering.

Signed-off-by: Pawel Dembicki
---
 arch/powerpc/boot/dts/fsl/p2020rdb.dts | 73 ++++++++++++++++++++++---
 1 file changed, 70 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/p2020rdb.dts b/arch/powerpc/boot/dts/fsl/p2020rdb.dts
index 3acd3890b397..1f2ddeca0375 100644
--- a/arch/powerpc/boot/dts/fsl/p2020rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/p2020rdb.dts
@@ -7,6 +7,9 @@
 /include/ "p2020si-pre.dtsi"

+#include <dt-bindings/gpio/gpio.h>
+#include
+
 / {
 	model = "fsl,P2020RDB";
 	compatible = "fsl,P2020RDB";
@@ -131,22 +134,84 @@ partition@110 {
 		L2switch@2,0 {
 			#address-cells = <1>;
 			#size-cells = <1>;
-			compatible = "vitesse-7385";
+			compatible = "vitesse,vsc7385";
 			reg = <0x2 0x0 0x2>;
-		};
+			reset-gpios = <&gpio0 12 GPIO_ACTIVE_LOW>;
+
+			ports {
+				#address-cells = <1>;
+				#size-cells = <0>;
+				port@1 {
+					reg = <1>;
+					label = "e1-sw-p1";
+				};
+				port@2 {
+					reg = <2>;
+					label = "e1-sw-p2";
+				};
+				port@3 {
+					reg = <3>;
+					label = "e1-sw-p3";
+				};
+				port@4 {
+					reg = <4>;
+					label = "e1-sw-p4";
+				};
+				port@6 {
+					reg = <6>;
+					label = "cpu";
+					ethernet = <&enet0>;
+					phy-mode = "rgmii";
+					fixed-link {
+						speed = <1000>;
+						full-duplex;
+						pause;
+					};
+				};
+			};
+		};
 	};

 	soc: soc@ffe0 {
 		ranges = <0x0 0x0 0xffe0 0x10>;

+		gpio0: gpio-controller@fc00 {
+		};
+
 		i2c@3000 {
+			temperature-sensor@4c {
+				compatible = "adi,adt7461";
+				reg = <0x4c>;
+			};
+
+			eeprom@50 {
+				compatible = "atmel,24c256";
+				reg = <0x50>;
+			};

 			rtc@68 {
 				compatible = "dallas,ds1339";
 				reg = <0x68>;
 			};
 		};

+		i2c@3100 {
+			pmic@11 {
+				compatible = "zl2006";
+				reg = <0x11>;
+			};
+
+			gpio@18 {
+				compatible = "nxp,pca9557";
+				reg = <0x18>;
+			};
+
+			eeprom@52 {
+				compatible = "atmel,24c01";
+				reg = <0x52>;
+			};
+		};
+
 		spi@7000 {
 			flash@0 {
				#address-cells = <1>;
@@ -200,10 +265,12 @@ mdio@24520 {
 		phy0: ethernet-phy@0 {
 			interrupts = <3 1 0 0>;
 			reg = <0x0>;
+			reset-gpios = <&gpio0 14 GPIO_ACTIVE_LOW>;
 		};
 		phy1: ethernet-phy@1 {
 			interrupts = <3 1 0 0>;
 			reg = <0x1>;
+			reset-gpios = <&gpio0 6 GPIO_ACTIVE_LOW>;
 		};
 		tbi-phy@2 {
 			device_type = "tbi-phy";
@@ -233,7 +300,7 @@ ptp_clock@24e00 {
 	enet0: ethernet@24000 {
 		fixed-link = <1 1 1000 0 0>;
-		phy-connection-type = "rgmii-id";
+
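As an aside for readers of the diff above, the `fixed-link = <1 1 1000 0 0>` property on enet0 is the legacy 5-cell encoding: emulated PHY id, full-duplex flag, speed, pause, asym-pause. A minimal C sketch of a decoder for that tuple (a hypothetical helper for illustration, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Legacy 5-cell fixed-link devicetree encoding:
 * <emulated-phy-id full-duplex speed pause asym-pause>.
 * Hypothetical decoder mirroring how a driver reads the cells. */
struct fixed_link {
    uint32_t phy_id;
    int full_duplex;
    uint32_t speed;      /* Mbit/s */
    int pause;
    int asym_pause;
};

static struct fixed_link decode_fixed_link(const uint32_t cells[5])
{
    struct fixed_link fl = {
        .phy_id = cells[0],
        .full_duplex = cells[1] != 0,
        .speed = cells[2],
        .pause = cells[3] != 0,
        .asym_pause = cells[4] != 0,
    };
    return fl;
}
```

For the p2020rdb value above this decodes to a 1000 Mbit/s full-duplex link without pause frames on the MAC side.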
[PATCH v2] Adds a new ioctl32 syscall for backwards compatibility layers
From: Ryan Houdek

Problem presented: A backwards compatibility layer that allows running x86-64 and x86 processes inside of an AArch64 process.
- CPU is emulated
- Syscall interface is mostly passthrough
- Some syscalls require patching or emulation depending on behaviour
- Not viable from the emulator design to use an AArch32 host process

x86-64 and x86 userspace emulator source: https://github.com/FEX-Emu/FEX
Usage of ioctl32 is currently in a downstream fork. This will be the first user of the syscall.
Cross documentation: https://github.com/FEX-Emu/FEX/wiki/32Bit-x86-Woes#ioctl---54

ioctls are opaque from the emulator perspective and the data wants to be passed through a syscall as unimpeded as possible. Sadly, due to ioctl struct differences between x86 and x86-64, we need a syscall that exposes the compatibility ioctl handler to userspace in a 64-bit process. This is necessary because of the behaviour differences between an x86 process doing an ioctl and an x86-64 process doing an ioctl, both of which are captured and passed through the AArch64 ioctl space.

This implements a new ioctl32 syscall that allows us to pass 32-bit x86 ioctls through to the kernel with zero or minimal manipulation. The only supported hosts we currently care about are AArch64 and x86-64 (for testing purposes). PPC64LE, MIPS64LE, and RISC-V64 might be interesting to support in the future, but I don't have any platforms that get anywhere near Cortex-A77 performance on those architectures, nor the time to bring up the emulator on them. x86-64 can get to the compatibility ioctl through the int $0x80 handler.
This does not solve the following problems:
1) compat_alloc_user_space inside ioctl
2) ioctls that check task mode instead of entry point for behaviour
3) ioctls allocating memory
4) struct packing problems between architectures

Workarounds for the problems presented:
1a) Do a stack pivot to the lower 32 bits from userspace
    - Forces the host 64-bit process to have its thread stacks live in 32-bit space. Not ideal.
    - Only do a stack pivot on ioctl to save previous 32-bit VA space
1b) Teach the kernel that compat_alloc_user_space can return a 64-bit pointer
    - x86-64 truncates the stack from this function
    - AArch64 returns the full stack pointer
    - Only ~29 users. Validating all of them support a 64-bit stack is trivial?
2a) Any application using these can be checked for compatibility in userspace and put on a block list.
2b) Fix any ioctls doing broken behaviour based on task mode rather than ioctl entry point
3a) Userspace consumes all VA space above 32 bits, forcing allocations to occur in the lower 32 bits
    - This is the current implementation
3b) Ensure any allocation in the ioctl honours the ioctl entry point rather than just allowing generic memory allocations in the full VA space
    - This is hard to guarantee
4a) Blocklist any application using ioctls that have different struct packing across the boundary
    - Can happen when struct packing of a 32-bit x86 application goes down the aarch64 compat_ioctl path
    - Userspace is an AArch64 process passing 32-bit x86 ioctl structures through the compat_ioctl path, which is typically for AArch32 processes
    - None currently identified
4b) Work with upstream kernel and userspace projects to evaluate and fix
    - Identify the problem ioctls
    - Implement a new ioctl with more sane struct packing that matches cross-arch
    - Implement the new ioctl while maintaining backwards compatibility with the previous ioctl handler
    - Change the upstream project to use the new compatibility ioctl
    - ioctl deprecation will be case by case per device and project
4c) Userspace implements a full ioctl emulation layer
    - Parses the full ioctl tree
    - Either passes through ioctls that it doesn't understand or transforms ioctls that it knows are trouble
    - Has the downside that it can still run into edge cases that will fail
    - Performance of additional tracking is a concern
    - Prone to failure keeping the kernel ioctl and userspace ioctl handling in sync
    - Really want to have it in kernel space as much as possible

Changes in v2:
- Added the syscall to all architecture tables
- Disabled on 32-bit and BE platforms. They can call ioctl directly.
- Disabled on x86-64 as well since you can call this from the ia32 or x32 dispatch tables

Signed-off-by: Ryan Houdek
---
 arch/alpha/kernel/syscalls/syscall.tbl      | 1 +
 arch/arm/tools/syscall.tbl                  | 1 +
 arch/arm64/include/asm/unistd.h             | 2 +-
 arch/arm64/include/asm/unistd32.h           | 2 ++
 arch/ia64/kernel/syscalls/syscall.tbl       | 1 +
 arch/m68k/kernel/syscalls/syscall.tbl       | 1 +
 arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   | 2 ++
 arch/mips/kernel/syscalls/syscall_o32.tbl   | 1 +
 arch/parisc/kernel/syscalls/syscall.tbl     | 1 +
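The struct-layout mismatch motivating the series can be illustrated with a pair of invented struct definitions (these are not a real driver ABI): on an LP64 host, pointer- and long-sized fields give the native ioctl argument a different size and field offsets from what a 32-bit x86 caller marshals, which is why the compat_ioctl handler has to be reachable for these calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustration only: the layout a 64-bit ioctl handler expects vs. the
 * layout a 32-bit x86 caller actually produced. Field types model how
 * pointers and longs differ across the ABI boundary. */

struct native_arg {        /* seen by the native 64-bit ioctl path */
    uint64_t buf;          /* user pointer: 8 bytes on LP64 */
    long     len;          /* 8 bytes on LP64 */
};

struct compat_arg {        /* produced by a 32-bit x86 caller */
    uint32_t buf;          /* 32-bit user pointer */
    int32_t  len;          /* 32-bit long */
};
```

Because both sizes and offsets differ, blindly handing the 32-bit buffer to the native handler misreads every field after the first; the compat path translates instead.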
[PATCH] Adds a new ioctl32 syscall for backwards compatibility layers
From: Ryan Houdek

Problem presented: A backwards compatibility layer that allows running x86-64 and x86 processes inside of an AArch64 process.
- CPU is emulated
- Syscall interface is mostly passthrough
- Some syscalls require patching or emulation depending on behaviour
- Not viable from the emulator design to use an AArch32 host process

x86-64 and x86 userspace emulator source: https://github.com/FEX-Emu/FEX
Usage of ioctl32 is currently in a downstream fork. This will be the first user of the syscall.
Cross documentation: https://github.com/FEX-Emu/FEX/wiki/32Bit-x86-Woes#ioctl---54

ioctls are opaque from the emulator perspective and the data wants to be passed through a syscall as unimpeded as possible. Sadly, due to ioctl struct differences between x86 and x86-64, we need a syscall that exposes the compatibility ioctl handler to userspace in a 64-bit process. This is necessary because of the behaviour differences between an x86 process doing an ioctl and an x86-64 process doing an ioctl, both of which are captured and passed through the AArch64 ioctl space.

This implements a new ioctl32 syscall that allows us to pass 32-bit x86 ioctls through to the kernel with zero or minimal manipulation. The only supported hosts we currently care about are AArch64 and x86-64 (for testing purposes). PPC64LE, MIPS64LE, and RISC-V64 might be interesting to support in the future, but I don't have any platforms that get anywhere near Cortex-A77 performance on those architectures, nor the time to bring up the emulator on them. x86-64 can get to the compatibility ioctl through the int $0x80 handler.
This does not solve the following problems:
1) compat_alloc_user_space inside ioctl
2) ioctls that check task mode instead of entry point for behaviour
3) ioctls allocating memory
4) struct packing problems between architectures

Workarounds for the problems presented:
1a) Do a stack pivot to the lower 32 bits from userspace
    - Forces the host 64-bit process to have its thread stacks live in 32-bit space. Not ideal.
    - Only do a stack pivot on ioctl to save previous 32-bit VA space
1b) Teach the kernel that compat_alloc_user_space can return a 64-bit pointer
    - x86-64 truncates the stack from this function
    - AArch64 returns the full stack pointer
    - Only ~29 users. Validating all of them support a 64-bit stack is trivial?
2a) Any application using these can be checked for compatibility in userspace and put on a block list.
2b) Fix any ioctls doing broken behaviour based on task mode rather than ioctl entry point
3a) Userspace consumes all VA space above 32 bits, forcing allocations to occur in the lower 32 bits
    - This is the current implementation
3b) Ensure any allocation in the ioctl honours the ioctl entry point rather than just allowing generic memory allocations in the full VA space
    - This is hard to guarantee
4a) Blocklist any application using ioctls that have different struct packing across the boundary
    - Can happen when struct packing of a 32-bit x86 application goes down the aarch64 compat_ioctl path
    - Userspace is an AArch64 process passing 32-bit x86 ioctl structures through the compat_ioctl path, which is typically for AArch32 processes
    - None currently identified
4b) Work with upstream kernel and userspace projects to evaluate and fix
    - Identify the problem ioctls
    - Implement a new ioctl with more sane struct packing that matches cross-arch
    - Implement the new ioctl while maintaining backwards compatibility with the previous ioctl handler
    - Change the upstream project to use the new compatibility ioctl
    - ioctl deprecation will be case by case per device and project
4c) Userspace implements a full ioctl emulation layer
    - Parses the full ioctl tree
    - Either passes through ioctls that it doesn't understand or transforms ioctls that it knows are trouble
    - Has the downside that it can still run into edge cases that will fail
    - Performance of additional tracking is a concern
    - Prone to failure keeping the kernel ioctl and userspace ioctl handling in sync
    - Really want to have it in kernel space as much as possible

Signed-off-by: Ryan Houdek
---
 arch/alpha/kernel/syscalls/syscall.tbl      | 1 +
 arch/arm/tools/syscall.tbl                  | 1 +
 arch/arm64/include/asm/unistd.h             | 2 +-
 arch/arm64/include/asm/unistd32.h           | 2 ++
 arch/ia64/kernel/syscalls/syscall.tbl       | 1 +
 arch/m68k/kernel/syscalls/syscall.tbl       | 1 +
 arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   | 2 ++
 arch/mips/kernel/syscalls/syscall_o32.tbl   | 1 +
 arch/parisc/kernel/syscalls/syscall.tbl     | 1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    | 1 +
 arch/s390/kernel/syscalls/syscall.tbl       | 1 +
 arch/sh/kernel/syscalls/syscall.tbl         | 1 +
 arch/sparc/kernel/syscalls/syscall.tbl      | 1 +
 arch/x86/entry/syscalls/syscall_32.tbl
Re: [PATCH] perf tools: Resolve symbols against debug file first
Hello,

On Thu, Jan 14, 2021 at 8:17 PM Michael Ellerman wrote:
> Namhyung Kim writes:
> > On Wed, Jan 13, 2021 at 8:43 PM Jiri Slaby wrote:
> >> On 13. 01. 21, 11:46, Jiri Olsa wrote:
> >> > On Wed, Jan 13, 2021 at 09:01:28AM +0100, Jiri Slaby wrote:
> >> >> With LTO, there are symbols like these:
> >> >> /usr/lib/debug/usr/lib64/libantlr4-runtime.so.4.8-4.8-1.4.x86_64.debug
> >> >> 10305: 00955fa4 0 NOTYPE LOCAL DEFAULT 29 Predicate.cpp.2bc410e7
> >> >>
> >> >> This comes from a runtime/debug split done by the standard way:
> >> >> objcopy --only-keep-debug $runtime $debug
> >> >> objcopy --add-gnu-debuglink=$debugfn -R .comment -R .GCC.command.line --strip-all $runtime
> >> >>
...
> >> >> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> >> >> index f3577f7d72fe..a31b716fa61c 100644
> >> >> --- a/tools/perf/util/symbol-elf.c
> >> >> +++ b/tools/perf/util/symbol-elf.c
> >> >> @@ -1226,12 +1226,20 @@ int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *syms_ss,
> >> >> 		if (sym.st_shndx == SHN_ABS)
> >> >> 			continue;
> >> >>
> >> >> -	sec = elf_getscn(runtime_ss->elf, sym.st_shndx);
> >> >> +	sec = elf_getscn(syms_ss->elf, sym.st_shndx);
> >> >> 		if (!sec)
> >> >> 			goto out_elf_end;
> >> >
> >> > we iterate symbols from syms_ss, so the fix seems to be correct
> >> > to call elf_getscn on syms_ss, not on runtime_ss as we do now
> >> >
> >> > I'd think this worked only when runtime_ss == syms_ss
> >>
> >> No, because the headers are copied 1:1 from runtime_ss to syms_ss. And
> >> runtime_ss is then stripped, so only .debug* sections are removed there.
> >> (And syms_ss's are set as NOBITS.)
> >>
> >> We iterated .debug* sections in syms_ss and used runtime_ss section
> >> _headers_ only to adjust symbols (sometimes). That worked.
> >
> > It seems PPC has an opd section only in the runtime_ss and that's why
> > we use it for section headers.
>
> At least on my system (Ubuntu 20.04.1) I see .opd in the debug file with
> NOBITS set:
>
> $ readelf -e vmlinux.debug | grep opd
> [37] .opd NOBITS c1c1f548 01202e14
>
> But possibly that's not the case with older toolchains?

I was referring to this commit:

commit 261360b6e90a782f0a63d8f61a67683c376c88cf
Author: Cody P Schafer
Date: Fri Aug 10 15:23:01 2012 -0700

    perf symbols: Convert dso__load_syms to take 2 symsrc's

    To properly handle platforms with an opd section, both a runtime image
    (which contains the opd section but possibly lacks symbols) and a symbol
    image (which probably lacks an opd section but has symbols).

    The next patch ("perf symbol: use both runtime and debug images")
    adjusts the callsite in dso__load() to take advantage of being able to
    pass both runtime & debug images.

    Assumptions made here:
    - The opd section, if it exists in the runtime image, has headers in
      both the runtime image and the debug/syms image.
    - The index of the opd section (again, only if it exists in the runtime
      image) is the same in both the runtime and debug/symbols image.

    Both of these are true on RHEL, but it is unclear how accurate they are
    in general (on platforms with function descriptors in opd sections).

    Signed-off-by: Cody P Schafer

Thanks,
Namhyung
[PATCH] powerpc/sstep: Fix array out of bound warning
Compiling the kernel with -Warray-bounds throws the warning below:

In function 'emulate_vsx_store':
warning: array subscript is above array bounds [-Warray-bounds]
  buf.d[2] = byterev_8(reg->d[1]);
  buf.d[3] = byterev_8(reg->d[0]);

Fix it by converting the local variable 'union vsx_reg buf' into an array. Also treat the function argument 'union vsx_reg *reg' as an array instead of a pointer, because callers actually pass an array to it.

Fixes: af99da74333b ("powerpc/sstep: Support VSX vector paired storage access instructions")
Signed-off-by: Ravi Bangoria
---
 arch/powerpc/lib/sstep.c | 61
 1 file changed, 31 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index bf7a7d62ae8b..5b4281ade5b6 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -723,7 +723,8 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 	const unsigned char *bp;

 	size = GETSIZE(op->type);
-	reg->d[0] = reg->d[1] = 0;
+	reg[0].d[0] = reg[0].d[1] = 0;
+	reg[1].d[0] = reg[1].d[1] = 0;

 	switch (op->element_size) {
 	case 32:
@@ -742,25 +743,25 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 		/* scalar loads, lxvd2x, lxvdsx */
 		read_size = (size >= 8) ? 8 : size;
 		i = IS_LE ? 8 : 8 - read_size;
-		memcpy(&reg->b[i], mem, read_size);
+		memcpy(&reg[0].b[i], mem, read_size);
 		if (rev)
-			do_byte_reverse(&reg->b[i], 8);
+			do_byte_reverse(&reg[0].b[i], 8);
 		if (size < 8) {
 			if (op->type & SIGNEXT) {
 				/* size == 4 is the only case here */
-				reg->d[IS_LE] = (signed int) reg->d[IS_LE];
+				reg[0].d[IS_LE] = (signed int)reg[0].d[IS_LE];
 			} else if (op->vsx_flags & VSX_FPCONV) {
 				preempt_disable();
-				conv_sp_to_dp(&reg->fp[1 + IS_LE],
-					      &reg->dp[IS_LE]);
+				conv_sp_to_dp(&reg[0].fp[1 + IS_LE],
+					      &reg[0].dp[IS_LE]);
 				preempt_enable();
 			}
 		} else {
 			if (size == 16) {
 				unsigned long v = *(unsigned long *)(mem + 8);
-				reg->d[IS_BE] = !rev ? v : byterev_8(v);
+				reg[0].d[IS_BE] = !rev ? v : byterev_8(v);
 			} else if (op->vsx_flags & VSX_SPLAT)
-				reg->d[IS_BE] = reg->d[IS_LE];
+				reg[0].d[IS_BE] = reg[0].d[IS_LE];
 		}
 		break;
 	case 4:
@@ -768,13 +769,13 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 		wp = mem;
 		for (j = 0; j < size / 4; ++j) {
 			i = IS_LE ? 3 - j : j;
-			reg->w[i] = !rev ? *wp++ : byterev_4(*wp++);
+			reg[0].w[i] = !rev ? *wp++ : byterev_4(*wp++);
 		}
 		if (op->vsx_flags & VSX_SPLAT) {
-			u32 val = reg->w[IS_LE ? 3 : 0];
+			u32 val = reg[0].w[IS_LE ? 3 : 0];
 			for (; j < 4; ++j) {
 				i = IS_LE ? 3 - j : j;
-				reg->w[i] = val;
+				reg[0].w[i] = val;
 			}
 		}
 		break;
@@ -783,7 +784,7 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 		hp = mem;
 		for (j = 0; j < size / 2; ++j) {
 			i = IS_LE ? 7 - j : j;
-			reg->h[i] = !rev ? *hp++ : byterev_2(*hp++);
+			reg[0].h[i] = !rev ? *hp++ : byterev_2(*hp++);
 		}
 		break;
 	case 1:
@@ -791,7 +792,7 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 		bp = mem;
 		for (j = 0; j < size; ++j) {
 			i = IS_LE ? 15 - j : j;
-			reg->b[i] = *bp++;
+			reg[0].b[i] = *bp++;
 		}
 		break;
 	}
@@ -804,7 +805,7 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
 {
 	int size, write_size;
 	int i, j;
-	union vsx_reg buf;
+	union vsx_reg buf[2];
 	unsigned int *wp;
 	unsigned short *hp;
 	unsigned char *bp;
@@ -818,11 +819,11 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
 			break;
 		if (rev) {
 			/*
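The shape of the fix can be modelled outside the kernel: a vector-paired access touches two 16-byte VSX registers, so the helpers index `reg[0]`/`reg[1]` of a two-element union array instead of subscripting past the end of a single union (what -Warray-bounds flagged for `buf.d[2]`/`buf.d[3]`). A minimal sketch, where `byterev_8` is a plain C byte swap standing in for the kernel helper and the pairing order is illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Two interpretations of one 16-byte VSX register. */
union vsx_reg {
    uint8_t  b[16];
    uint64_t d[2];
};

/* Plain-C 64-bit byte reversal (stand-in for the kernel's byterev_8). */
static uint64_t byterev_8(uint64_t x)
{
    x = ((x & 0x00ff00ff00ff00ffULL) << 8)  | ((x >> 8)  & 0x00ff00ff00ff00ffULL);
    x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
    return (x << 32) | (x >> 32);
}

/* Byte-reverse a register pair into buf, staying inside each element
 * of the two-element arrays rather than indexing d[2]/d[3]. */
static void store_pair_rev(union vsx_reg buf[2], const union vsx_reg reg[2])
{
    buf[0].d[0] = byterev_8(reg[1].d[1]);
    buf[0].d[1] = byterev_8(reg[1].d[0]);
    buf[1].d[0] = byterev_8(reg[0].d[1]);
    buf[1].d[1] = byterev_8(reg[0].d[0]);
}
```

With `union vsx_reg buf[2]`, every subscript is within bounds, so the compiler's array-bounds analysis has nothing to complain about.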
Re: [PATCH 5/6] powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE
On 15/1/21 9:00 am, Nathan Lynch wrote: RTAS_RMOBUF_MAX doesn't actually describe a "maximum" value in any sense. It represents the size of an area of memory set aside for user space to use as work areas for certain RTAS calls. Rename it to RTAS_USER_REGION, and express the value in terms of the number of work areas allocated. Signed-off-by: Nathan Lynch squash! powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE I think you meant to get rid of this line... -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH 3/6] powerpc/rtas: remove ibm_suspend_me_token
On 15/1/21 9:00 am, Nathan Lynch wrote: There's not a compelling reason to cache the value of the token for the ibm,suspend-me function. Just look it up when needed in the RTAS syscall's special case for it. Signed-off-by: Nathan Lynch Reviewed-by: Andrew Donnellan --- arch/powerpc/kernel/rtas.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index d126d71ea5bd..60fcf7f7b0b8 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -828,7 +828,6 @@ void rtas_activate_firmware(void) pr_err("ibm,activate-firmware failed (%i)\n", fwrc); } -static int ibm_suspend_me_token = RTAS_UNKNOWN_SERVICE; #ifdef CONFIG_PPC_PSERIES /** * rtas_call_reentrant() - Used for reentrant rtas calls @@ -1103,7 +1102,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) return -EINVAL; /* Need to handle ibm,suspend_me call specially */ - if (token == ibm_suspend_me_token) { + if (token == rtas_token("ibm,suspend-me")) { /* * rtas_ibm_suspend_me assumes the streamid handle is in cpu @@ -1191,10 +1190,8 @@ void __init rtas_initialize(void) * the stop-self token if any */ #ifdef CONFIG_PPC64 - if (firmware_has_feature(FW_FEATURE_LPAR)) { + if (firmware_has_feature(FW_FEATURE_LPAR)) rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX); - ibm_suspend_me_token = rtas_token("ibm,suspend-me"); - } #endif rtas_rmo_buf = memblock_phys_alloc_range(RTAS_RMOBUF_MAX, PAGE_SIZE, 0, rtas_region); -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH 2/6] powerpc/rtas-proc: remove unused RMO_READ_BUF_MAX
On 15/1/21 9:00 am, Nathan Lynch wrote: This constant is unused. Signed-off-by: Nathan Lynch A quick grep agrees. Reviewed-by: Andrew Donnellan --- arch/powerpc/kernel/rtas-proc.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c index e0f8329966d6..d2b0d99824a4 100644 --- a/arch/powerpc/kernel/rtas-proc.c +++ b/arch/powerpc/kernel/rtas-proc.c @@ -755,8 +755,6 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v) return 0; } -#define RMO_READ_BUF_MAX 30 - /** * ppc_rtas_rmo_buf_show() - Describe RTAS-addressable region for user space. * -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH 1/6] powerpc/rtas: improve ppc_rtas_rmo_buf_show documentation
On 15/1/21 8:59 am, Nathan Lynch wrote: Add kerneldoc for ppc_rtas_rmo_buf_show(), the callback for /proc/powerpc/rtas/rmo_buffer, explaining its expected use. Signed-off-by: Nathan Lynch Looks good. Reviewed-by: Andrew Donnellan -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH 4/6] powerpc/rtas: move syscall filter setup into separate function
On 15/1/21 9:00 am, Nathan Lynch wrote: Reduce conditionally compiled sections within rtas_initialize() by moving the filter table initialization into its own function already guarded by CONFIG_PPC_RTAS_FILTER. No behavior change intended. Signed-off-by: Nathan Lynch Acked-by: Andrew Donnellan +static void __init rtas_syscall_filter_init(void) +{ + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) { + rtas_filters[i].token = rtas_token(rtas_filters[i].name); + } + +} Unnecessary empty line, but vitally important braces :) -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH 4/6] powerpc/rtas: move syscall filter setup into separate function
On 15/01/2021 09:00, Nathan Lynch wrote: Reduce conditionally compiled sections within rtas_initialize() by moving the filter table initialization into its own function already guarded by CONFIG_PPC_RTAS_FILTER. No behavior change intended. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 60fcf7f7b0b8..55f6aa170e57 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1051,6 +1051,16 @@ static bool block_rtas_call(int token, int nargs, return true; } +static void __init rtas_syscall_filter_init(void) +{ + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) { + rtas_filters[i].token = rtas_token(rtas_filters[i].name); + } + Unnecessary curly braces (I understand it is cut-n-paste but still) and an empty line. Otherwise: Reviewed-by: Alexey Kardashevskiy +} + #else static bool block_rtas_call(int token, int nargs, @@ -1059,6 +1069,10 @@ static bool block_rtas_call(int token, int nargs, return false; } +static void __init rtas_syscall_filter_init(void) +{ +} + #endif /* CONFIG_PPC_RTAS_FILTER */ /* We assume to be passed big endian arguments */ @@ -1162,9 +1176,6 @@ void __init rtas_initialize(void) unsigned long rtas_region = RTAS_INSTANTIATE_MAX; u32 base, size, entry; int no_base, no_size, no_entry; -#ifdef CONFIG_PPC_RTAS_FILTER - int i; -#endif /* Get RTAS dev node and fill up our "rtas" structure with infos * about it. @@ -1203,11 +1214,7 @@ void __init rtas_initialize(void) rtas_last_error_token = rtas_token("rtas-last-error"); #endif -#ifdef CONFIG_PPC_RTAS_FILTER - for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) { - rtas_filters[i].token = rtas_token(rtas_filters[i].name); - } -#endif + rtas_syscall_filter_init(); } int __init early_init_dt_scan_rtas(unsigned long node, -- Alexey
Re: [PATCH 1/6] powerpc/rtas: improve ppc_rtas_rmo_buf_show documentation
On 15/01/2021 08:59, Nathan Lynch wrote: Add kerneldoc for ppc_rtas_rmo_buf_show(), the callback for /proc/powerpc/rtas/rmo_buffer, explaining its expected use. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas-proc.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c index 2d33f342a293..e0f8329966d6 100644 --- a/arch/powerpc/kernel/rtas-proc.c +++ b/arch/powerpc/kernel/rtas-proc.c @@ -757,7 +757,16 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v) #define RMO_READ_BUF_MAX 30 -/* RTAS Userspace access */ +/** + * ppc_rtas_rmo_buf_show() - Describe RTAS-addressable region for user space. + * + * Base + size description of a range of RTAS-addressable memory set + * aside for user space to use as work area(s) for certain RTAS + * functions. User space accesses this region via /dev/mem. mmm ufff wut argh^w^w^w^w Thanks for documenting it :) Reviewed-by: Alexey Kardashevskiy Apart from + * security policies, the kernel does not arbitrate or serialize + * access to this region, and user space must ensure that concurrent + * users do not interfere with each other. + */ static int ppc_rtas_rmo_buf_show(struct seq_file *m, void *v) { seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_RMOBUF_MAX); -- Alexey
Re: [PATCH 3/6] powerpc/rtas: remove ibm_suspend_me_token
On 15/01/2021 09:00, Nathan Lynch wrote: There's not a compelling reason to cache the value of the token for the ibm,suspend-me function. Just look it up when needed in the RTAS syscall's special case for it. Signed-off-by: Nathan Lynch Reviewed-by: Alexey Kardashevskiy --- arch/powerpc/kernel/rtas.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index d126d71ea5bd..60fcf7f7b0b8 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -828,7 +828,6 @@ void rtas_activate_firmware(void) pr_err("ibm,activate-firmware failed (%i)\n", fwrc); } -static int ibm_suspend_me_token = RTAS_UNKNOWN_SERVICE; #ifdef CONFIG_PPC_PSERIES /** * rtas_call_reentrant() - Used for reentrant rtas calls @@ -1103,7 +1102,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) return -EINVAL; /* Need to handle ibm,suspend_me call specially */ - if (token == ibm_suspend_me_token) { + if (token == rtas_token("ibm,suspend-me")) { /* * rtas_ibm_suspend_me assumes the streamid handle is in cpu @@ -1191,10 +1190,8 @@ void __init rtas_initialize(void) * the stop-self token if any */ #ifdef CONFIG_PPC64 - if (firmware_has_feature(FW_FEATURE_LPAR)) { + if (firmware_has_feature(FW_FEATURE_LPAR)) rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX); - ibm_suspend_me_token = rtas_token("ibm,suspend-me"); - } #endif rtas_rmo_buf = memblock_phys_alloc_range(RTAS_RMOBUF_MAX, PAGE_SIZE, 0, rtas_region); -- Alexey
Re: [PATCH 5/6] powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE
On 15/01/2021 09:00, Nathan Lynch wrote: RTAS_RMOBUF_MAX doesn't actually describe a "maximum" value in any sense. It represents the size of an area of memory set aside for user space to use as work areas for certain RTAS calls. Rename it to RTAS_USER_REGION, and express the value in terms of the number of work areas allocated. Signed-off-by: Nathan Lynch squash! powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE --- arch/powerpc/include/asm/rtas.h | 9 ++--- arch/powerpc/kernel/rtas-proc.c | 2 +- arch/powerpc/kernel/rtas.c | 2 +- 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h index 332e1000ca0f..1aa7ab1cbc84 100644 --- a/arch/powerpc/include/asm/rtas.h +++ b/arch/powerpc/include/asm/rtas.h @@ -19,8 +19,11 @@ #define RTAS_UNKNOWN_SERVICE (-1) #define RTAS_INSTANTIATE_MAX (1ULL<<30) /* Don't instantiate rtas at/above this value */ -/* Buffer size for ppc_rtas system call. */ -#define RTAS_RMOBUF_MAX (64 * 1024) +/* Work areas shared with RTAS must be 4K, naturally aligned. */ Why exactly 4K and not (for example) PAGE_SIZE? +#define RTAS_WORK_AREA_SIZE 4096 + +/* Work areas allocated for user space access. */ +#define RTAS_USER_REGION_SIZE (RTAS_WORK_AREA_SIZE * 16) This is still 64K but no clarity why. There is 16 of something, what is it? 
/* RTAS return status codes */ #define RTAS_BUSY -2/* RTAS Busy */ @@ -357,7 +360,7 @@ extern void rtas_take_timebase(void); static inline int page_is_rtas_user_buf(unsigned long pfn) { unsigned long paddr = (pfn << PAGE_SHIFT); - if (paddr >= rtas_rmo_buf && paddr < (rtas_rmo_buf + RTAS_RMOBUF_MAX)) + if (paddr >= rtas_rmo_buf && paddr < (rtas_rmo_buf + RTAS_USER_REGION_SIZE)) return 1; return 0; } diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c index d2b0d99824a4..6857a5b0a1c3 100644 --- a/arch/powerpc/kernel/rtas-proc.c +++ b/arch/powerpc/kernel/rtas-proc.c @@ -767,6 +767,6 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v) */ static int ppc_rtas_rmo_buf_show(struct seq_file *m, void *v) { - seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_RMOBUF_MAX); + seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_USER_REGION_SIZE); return 0; } diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 55f6aa170e57..da65faadbbb2 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1204,7 +1204,7 @@ void __init rtas_initialize(void) if (firmware_has_feature(FW_FEATURE_LPAR)) rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX); #endif - rtas_rmo_buf = memblock_phys_alloc_range(RTAS_RMOBUF_MAX, PAGE_SIZE, + rtas_rmo_buf = memblock_phys_alloc_range(RTAS_USER_REGION_SIZE, PAGE_SIZE, 0, rtas_region); if (!rtas_rmo_buf) panic("ERROR: RTAS: Failed to allocate %lx bytes below %pa\n", -- Alexey
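For reference, the arithmetic behind the renamed constant and the bounds check quoted above can be sketched in a few lines of C. The base address here is invented for the example; the real `rtas_rmo_buf` comes from the memblock allocation:

```c
#include <assert.h>

/* Renamed constants from the patch: 16 work areas of 4 KiB each. */
#define RTAS_WORK_AREA_SIZE   4096
#define RTAS_USER_REGION_SIZE (RTAS_WORK_AREA_SIZE * 16)  /* 64 KiB */
#define PAGE_SHIFT 12

/* Illustrative base; in the kernel this is set by memblock_phys_alloc_range(). */
static unsigned long rtas_rmo_buf = 0x200000;

/* Same logic as page_is_rtas_user_buf(): is this pfn inside the
 * user-accessible RTAS work-area region? */
static int page_is_rtas_user_buf(unsigned long pfn)
{
    unsigned long paddr = pfn << PAGE_SHIFT;

    return paddr >= rtas_rmo_buf &&
           paddr < rtas_rmo_buf + RTAS_USER_REGION_SIZE;
}
```

With a 64 KiB region at 0x200000, page frames 0x200 through 0x20f are inside the region and everything else is rejected, which is what /dev/mem mapping permission checks rely on.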
Re: [PATCH 2/6] powerpc/rtas-proc: remove unused RMO_READ_BUF_MAX
On 15/01/2021 09:00, Nathan Lynch wrote: This constant is unused. Signed-off-by: Nathan Lynch Reviewed-by: Alexey Kardashevskiy --- arch/powerpc/kernel/rtas-proc.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c index e0f8329966d6..d2b0d99824a4 100644 --- a/arch/powerpc/kernel/rtas-proc.c +++ b/arch/powerpc/kernel/rtas-proc.c @@ -755,8 +755,6 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v) return 0; } -#define RMO_READ_BUF_MAX 30 - /** * ppc_rtas_rmo_buf_show() - Describe RTAS-addressable region for user space. * -- Alexey
Re: [PATCH 6/6] powerpc/rtas: constrain user region allocation to RMA
On 15/01/2021 09:00, Nathan Lynch wrote:

Memory locations passed as arguments from the OS to RTAS usually need to be addressable in 32-bit mode and must reside in the Real Mode Area. On PAPR guests, the RMA starts at logical address 0 and is the first logical memory block reported in the LPAR's device tree.

On powerpc targets with RTAS, Linux makes available to user space a region of memory suitable for arguments to be passed to RTAS via sys_rtas(). This region (rtas_rmo_buf) is allocated via the memblock API during boot in order to ensure that it satisfies the requirements described above.

With radix MMU, the upper limit supplied to the memblock allocation can exceed the bounds of the first logical memory block, since ppc64_rma_size is ULONG_MAX and RTAS_INSTANTIATE_MAX is 1GB. (512MB is a common size of the first memory block according to a small sample of LPARs I have checked.) This leads to failures when user space invokes an RTAS function that uses a work area, such as ibm,configure-connector.

Alter the determination of the upper limit for rtas_rmo_buf's allocation to consult the device tree directly, ensuring placement within the RMA regardless of the MMU in use.

Can we tie this with RTAS (which also needs to be in the RMA) and simply add an extra 64K in prom_instantiate_rtas(), then advertise this address (ALIGN_UP(rtas-base + rtas-size, PAGE_SIZE)) to user space? We do not need this RMO area before that point. And probably do the same with the per-cpu RTAS argument structures mentioned in the cover letter?
> Signed-off-by: Nathan Lynch
> ---
>  arch/powerpc/kernel/rtas.c | 80 +++---
>  1 file changed, 65 insertions(+), 15 deletions(-)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index da65faadbbb2..98dfb112f4df 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -1166,6 +1166,70 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>  	return 0;
>  }
>
> +/*
> + * Memory locations passed to RTAS must be in the RMA as described by
> + * the range in /memory@0.
> + */
> +static phys_addr_t rtas_arg_addr_limit(void)
> +{
> +	unsigned int addr_cells;
> +	unsigned int size_cells;
> +	struct device_node *np;
> +	const __be32 *prop;
> +	u64 limit;
> +	u64 base;
> +
> +	/* RTAS is instantiated in 32-bit mode. */
> +	limit = 1ULL << 32;
> +
> +	/* Account for mem=. */
> +	if (memory_limit != 0)
> +		limit = min(limit, memory_limit);
> +
> +	np = of_find_node_by_path("/memory@0");
> +	if (!np)
> +		goto out;
> +
> +	prop = of_get_property(np, "reg", NULL);
> +	if (!prop)
> +		goto put;
> +
> +	addr_cells = of_n_addr_cells(np);
> +	base = of_read_number(prop, addr_cells);
> +	prop += addr_cells;
> +	size_cells = of_n_size_cells(np);
> +	limit = min(limit, of_read_number(prop, size_cells));
> +put:
> +	of_node_put(np);
> +out:
> +	pr_debug("%s: base = %#llx limit = %#llx", __func__, base, limit);
> +
> +	return limit;
> +}
> +
> +static void __init rtas_user_region_setup(void)
> +{
> +	phys_addr_t limit, align, size;
> +
> +	limit = rtas_arg_addr_limit();
> +	size = RTAS_USER_REGION_SIZE;
> +
> +	/*
> +	 * Although work areas need only 4KB alignment, user space
> +	 * accesses this region via mmap so it must be placed on a
> +	 * page boundary.
> +	 */
> +	align = PAGE_SIZE;
> +
> +	rtas_rmo_buf = memblock_phys_alloc_range(size, align, 0, limit);
> +	if (rtas_rmo_buf == 0) {
> +		panic("Failed to allocate %llu bytes for user region below %pa\n",
> +		      size, &limit);
> +	}
> +
> +	pr_debug("RTAS user region allocated at %pa\n", &rtas_rmo_buf);
> +}
> +
>  /*
>   * Call early during boot, before mem init, to retrieve the RTAS
>   * information from the device-tree and allocate the RMO buffer for userland
> @@ -1173,7 +1237,6 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>   */
>  void __init rtas_initialize(void)
>  {
> -	unsigned long rtas_region = RTAS_INSTANTIATE_MAX;
>  	u32 base, size, entry;
>  	int no_base, no_size, no_entry;
>
> @@ -1197,23 +1260,10 @@ void __init rtas_initialize(void)
>  	no_entry = of_property_read_u32(rtas.dev, "linux,rtas-entry", &entry);
>  	rtas.entry = no_entry ? rtas.base : entry;
>
> -	/* If RTAS was found, allocate the RMO buffer for it and look for
> -	 * the stop-self token if any
> -	 */
> -#ifdef CONFIG_PPC64
> -	if (firmware_has_feature(FW_FEATURE_LPAR))
> -		rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX);
> -#endif
> -	rtas_rmo_buf = memblock_phys_alloc_range(RTAS_USER_REGION_SIZE, PAGE_SIZE,
> -						 0, rtas_region);
> -	if (!rtas_rmo_buf)
> -		panic("ERROR: RTAS: Failed to allocate %lx bytes below %pa\n",
> -		      PAGE_SIZE, &rtas_region);
> -
>  #ifdef CONFIG_RTAS_ERROR_LOGGING
>  	rtas_last_error_token =
Re: [PATCH 10/18] arch: powerpc: Stop building and using oprofile
On 14-01-21, 17:05, Viresh Kumar wrote:
> The "oprofile" user-space tools don't use the kernel OPROFILE support
> any more, and haven't in a long time. User-space has been converted to
> the perf interfaces.
>
> This commit stops building oprofile for powerpc and removes any
> reference to it from directories in arch/powerpc/ apart from
> arch/powerpc/oprofile, which will be removed in the next commit (this is
> broken into two commits as the size of the commit became very big, ~5k
> lines).
>
> Note that the member "oprofile_cpu_type" in "struct cpu_spec" isn't
> removed as it was also used by other parts of the code.
>
> Suggested-by: Christoph Hellwig
> Suggested-by: Linus Torvalds
> Signed-off-by: Viresh Kumar

+ this to fix a warning:

diff --git a/arch/powerpc/platforms/cell/spufs/run.c b/arch/powerpc/platforms/cell/spufs/run.c
index 466006918003..ce52b87496d2 100644
--- a/arch/powerpc/platforms/cell/spufs/run.c
+++ b/arch/powerpc/platforms/cell/spufs/run.c
@@ -353,7 +353,6 @@ static int spu_process_callback(struct spu_context *ctx)
 long spufs_run_spu(struct spu_context *ctx, u32 *npc, u32 *event)
 {
 	int ret;
-	struct spu *spu;
 	u32 status;
 
 	if (mutex_lock_interruptible(&ctx->run_mutex))
@@ -386,7 +385,6 @@ long spufs_run_spu(struct spu_context *ctx, u32 *npc, u32 *event)
 		mutex_lock(&ctx->state_mutex);
 		break;
 	}
-	spu = ctx->spu;
 	if (unlikely(test_and_clear_bit(SPU_SCHED_NOTIFY_ACTIVE,
 					&ctx->sched_flags))) {
 		if (!(status & SPU_STATUS_STOPPED_BY_STOP))

--
viresh
Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool
On Fri, Jan 15, 2021 at 2:52 AM Florian Fainelli wrote:
>
> On 1/14/21 1:08 AM, Claire Chang wrote:
> > On Wed, Jan 13, 2021 at 7:48 AM Florian Fainelli wrote:
> >>
> >> On 1/5/21 7:41 PM, Claire Chang wrote:
> >>> If a device is not behind an IOMMU, we look up the device node and set
> >>> up the restricted DMA when the restricted-dma-pool is presented.
> >>>
> >>> Signed-off-by: Claire Chang
> >>> ---
> >>
> >> [snip]
> >>
> >>> +int of_dma_set_restricted_buffer(struct device *dev)
> >>> +{
> >>> +	struct device_node *node;
> >>> +	int count, i;
> >>> +
> >>> +	if (!dev->of_node)
> >>> +		return 0;
> >>> +
> >>> +	count = of_property_count_elems_of_size(dev->of_node, "memory-region",
> >>> +						sizeof(phandle));
> >>
> >> You could have an early check for count < 0, along with an error
> >> message, if that is deemed useful.
> >>
> >>> +	for (i = 0; i < count; i++) {
> >>> +		node = of_parse_phandle(dev->of_node, "memory-region", i);
> >>> +		if (of_device_is_compatible(node, "restricted-dma-pool"))
> >>
> >> And you may want to add here an of_device_is_available(node). A platform
> >> that provides the Device Tree firmware and tries to support multiple
> >> different SoCs may try to determine if an IOMMU is present, and if it
> >> is, it could be marking the restricted-dma-pool region with a 'status =
> >> "disabled"' property, or any variant of that scheme.
> >
> > This function is called only when there is no IOMMU present (check in
> > drivers/of/device.c). I can still add of_device_is_available(node)
> > here if you think it's helpful.
>
> I believe it is, since the boot loader can have a shared Device Tree blob
> skeleton and do various adaptations based on the chip (that's what we
> do), and adding a status property is much simpler than inserting new
> nodes at run time.
>
> >>
> >>> +			return of_reserved_mem_device_init_by_idx(
> >>> +				dev, dev->of_node, i);
> >>
> >> This does not seem to be supporting more than one memory region, did not
> >> you want something like instead:
> >>
> >> ret = of_reserved_mem_device_init_by_idx(...);
> >> if (ret)
> >> 	return ret;
> >>
> >
> > Yes. This implementation only supports one restricted-dma-pool memory
> > region with the assumption that there is only one memory region with the
> > compatible string, restricted-dma-pool, in the dts. IIUC, it's similar
> > to shared-dma-pool.
>
> Then if there is such a known limitation it should be both documented and
> enforced here; you should not be iterating over all of the phandles that
> you find, stop at the first one and issue a warning if count > 1?

What I have in mind is there might be multiple memory regions, but only
one is for restricted-dma-pool. Say, if you want a separate region for
coherent DMA and only do streaming DMA in this restricted-dma-pool
region, you can add another reserved-memory node with shared-dma-pool in
the dts and the current implementation will try to allocate the memory
via dma_alloc_from_dev_coherent() first (see dma_alloc_attrs() in
kernel/dma/mapping.c). Or if you have a vendor-specific memory region,
you can still set up restricted-dma-pool by adding another
reserved-memory node in the dts. Does this make sense to you? I'll
document this for sure.

> --
> Florian
Re: [PATCH v5 00/21] ibmvfc: initial MQ development/enablement
Tyrel,

> Recent updates in pHyp Firmware and VIOS releases provide new
> infrastructure towards enabling Subordinate Command Response Queues
> (Sub-CRQs) such that each Sub-CRQ is a channel backed by an actual
> hardware queue in the FC stack on the partner VIOS. Sub-CRQs are
> registered with the firmware via hypercalls and then negotiated with
> the VIOS via new Management Datagrams (MADs) for channel setup.
>
> This initial implementation adds the necessary Sub-CRQ framework and
> implements the new MADs for negotiating and assigning a set of
> Sub-CRQs to associated VIOS HW backed channels.

Applied to 5.12/scsi-staging, thanks!

--
Martin K. Petersen
Oracle Linux Engineering
[powerpc:next-test] BUILD SUCCESS 6629114a7f6f21d41640cbc006c3694e65940729
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: 6629114a7f6f21d41640cbc006c3694e65940729  powerpc/vas: Fix IRQ name allocation

elapsed time: 726m

configs tested: 122
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm                      defconfig
arm64                    allyesconfig
arm64                    defconfig
arm                      allyesconfig
arm                      allmodconfig
mips                     tb0287_defconfig
s390                     debug_defconfig
openrisc                 defconfig
arc                      nsim_700_defconfig
arm                      sunxi_defconfig
mips                     cavium_octeon_defconfig
alpha                    alldefconfig
sh                       r7780mp_defconfig
sparc64                  defconfig
m68k                     stmark2_defconfig
arc                      nsimosci_defconfig
powerpc                  mpc5200_defconfig
arm                      cns3420vb_defconfig
powerpc                  holly_defconfig
mips                     pistachio_defconfig
mips                     nlm_xlr_defconfig
powerpc                  tqm8xx_defconfig
powerpc                  katmai_defconfig
powerpc                  eiger_defconfig
arc                      alldefconfig
m68k                     mvme147_defconfig
um                       kunit_defconfig
m68k                     alldefconfig
arm                      at91_dt_defconfig
m68k                     m5407c3_defconfig
riscv                    rv32_defconfig
arm                      davinci_all_defconfig
powerpc                  pq2fads_defconfig
powerpc                  mpc8272_ads_defconfig
arm                      corgi_defconfig
arm                      aspeed_g5_defconfig
arm                      mvebu_v5_defconfig
arm                      assabet_defconfig
arm                      multi_v7_defconfig
mips                     loongson1b_defconfig
mips                     mtx1_defconfig
arm                      spear3xx_defconfig
arc                      axs103_defconfig
nios2                    defconfig
arm                      cm_x300_defconfig
sh                       se7750_defconfig
sh                       se7206_defconfig
powerpc                  tqm8555_defconfig
arm                      lpc32xx_defconfig
sh                       se7722_defconfig
nds32                    allnoconfig
mips                     maltaup_defconfig
csky                     alldefconfig
arm                      iop32x_defconfig
ia64                     allmodconfig
ia64                     defconfig
ia64                     allyesconfig
m68k                     allmodconfig
m68k                     defconfig
m68k                     allyesconfig
arc                      allyesconfig
c6x                      allyesconfig
nds32                    defconfig
nios2                    allyesconfig
csky                     defconfig
alpha                    defconfig
alpha                    allyesconfig
xtensa                   allyesconfig
h8300                    allyesconfig
arc                      defconfig
sh                       allmodconfig
parisc                   defconfig
s390                     allyesconfig
parisc                   allyesconfig
s390                     defconfig
i386                     allyesconfig
sparc                    allyesconfig
sparc                    defconfig
i386                     tinyconfig
i386                     defconfig
mips                     allyesconfig
mips                     allmodconfig
powerpc                  allyesconfig
powerpc                  allmodconfig
powerpc                  allnoconfig
i386                     randconfig-a002-20210114
i386                     randconfig-a005-20210114
i386                     randconfig-a006-20210114
i386                     randconfig-a001-20210114
i386                     randconfig-a003-20210114
i386                     randconfig-a004-20210114
x86_64                   randconfig-a015-20210114
x86_64                   randconfig-a012-20210114
x86_64                   randconfig-a013-20210114
x86_64                   randconfig-a016-20210114
x86_64                   randconfig-a014-20210114
x86_64                   randconfig-a011-20210114
i386                     randconfig-a012-20210114
i386                     randconfig-a011-20210114
i386                     randconfig-a016-20210114
i386                     randconfig
[powerpc:merge] BUILD SUCCESS 56f0e929cb28f6e28e6f88d88c03b75979cf1f40
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 56f0e929cb28f6e28e6f88d88c03b75979cf1f40  Automatic merge of 'fixes' into merge (2021-01-14 15:57)

elapsed time: 725m

configs tested: 134
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm                      defconfig
arm64                    allyesconfig
arm64                    defconfig
arm                      allyesconfig
arm                      allmodconfig
powerpc                  mpc836x_rdk_defconfig
powerpc                  mpc8313_rdb_defconfig
arm                      mvebu_v7_defconfig
mips                     bcm47xx_defconfig
sh                       rsk7201_defconfig
mips                     tb0287_defconfig
s390                     debug_defconfig
openrisc                 defconfig
arc                      nsim_700_defconfig
arm                      sunxi_defconfig
powerpc                  mpc5200_defconfig
arm                      cns3420vb_defconfig
powerpc                  holly_defconfig
mips                     pistachio_defconfig
mips                     nlm_xlr_defconfig
powerpc                  tqm8xx_defconfig
powerpc                  katmai_defconfig
powerpc                  eiger_defconfig
arc                      alldefconfig
m68k                     mvme147_defconfig
um                       kunit_defconfig
arm                      zeus_defconfig
mips                     workpad_defconfig
arm                      eseries_pxa_defconfig
mips                     cu1000-neo_defconfig
mips                     mpc30x_defconfig
mips                     maltasmvp_defconfig
m68k                     alldefconfig
arm                      at91_dt_defconfig
m68k                     m5407c3_defconfig
riscv                    rv32_defconfig
arm                      corgi_defconfig
arm                      aspeed_g5_defconfig
arm                      mvebu_v5_defconfig
arm                      assabet_defconfig
arm                      multi_v7_defconfig
sh                       se7750_defconfig
m68k                     q40_defconfig
m68k                     m5475evb_defconfig
arm                      imx_v6_v7_defconfig
arm                      tegra_defconfig
mips                     loongson1b_defconfig
mips                     mtx1_defconfig
arm                      spear3xx_defconfig
arc                      axs103_defconfig
nios2                    defconfig
arm                      cm_x300_defconfig
sh                       se7206_defconfig
powerpc                  tqm8555_defconfig
arm                      lpc32xx_defconfig
sh                       se7722_defconfig
nds32                    allnoconfig
mips                     maltaup_defconfig
csky                     alldefconfig
arm                      iop32x_defconfig
powerpc                  tqm8541_defconfig
sh                       ap325rxa_defconfig
arm                      palmz72_defconfig
s390                     alldefconfig
h8300                    h8300h-sim_defconfig
powerpc                  tqm8540_defconfig
ia64                     allmodconfig
ia64                     defconfig
ia64                     allyesconfig
m68k                     allmodconfig
m68k                     defconfig
m68k                     allyesconfig
arc                      allyesconfig
c6x                      allyesconfig
nds32                    defconfig
nios2                    allyesconfig
csky                     defconfig
alpha                    defconfig
alpha                    allyesconfig
xtensa                   allyesconfig
h8300                    allyesconfig
arc                      defconfig
sh                       allmodconfig
parisc                   defconfig
s390                     allyesconfig
parisc                   allyesconfig
s390                     defconfig
i386                     allyesconfig
sparc                    allyesconfig
sparc                    defconfig
i386                     tinyconfig
i386                     defconfig
mips                     allyesconfig
mips                     allmodconfig
powerpc                  allyesconfig
powerpc                  allmodconfig
powerpc                  allnoconfig
i386                     randconfig-a002-20210114
i386                     randconfig-a005-20210114
i386                     randconfig-a006-20210114
i386
[powerpc:fixes-test] BUILD SUCCESS 41131a5e54ae7ba5a2bb8d7b30d1818b3f5b13d2
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test
branch HEAD: 41131a5e54ae7ba5a2bb8d7b30d1818b3f5b13d2  powerpc/vdso: Fix clock_gettime_fallback for vdso32

elapsed time: 728m

configs tested: 115
configs skipped: 94

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm                      defconfig
arm64                    allyesconfig
arm64                    defconfig
arm                      allyesconfig
arm                      allmodconfig
mips                     tb0287_defconfig
s390                     debug_defconfig
openrisc                 defconfig
arc                      nsim_700_defconfig
arm                      sunxi_defconfig
powerpc                  mpc5200_defconfig
arm                      cns3420vb_defconfig
powerpc                  holly_defconfig
mips                     pistachio_defconfig
mips                     nlm_xlr_defconfig
powerpc                  tqm8xx_defconfig
powerpc                  katmai_defconfig
powerpc                  eiger_defconfig
arc                      alldefconfig
m68k                     mvme147_defconfig
um                       kunit_defconfig
m68k                     alldefconfig
arm                      at91_dt_defconfig
m68k                     m5407c3_defconfig
riscv                    rv32_defconfig
arm                      corgi_defconfig
arm                      aspeed_g5_defconfig
arm                      mvebu_v5_defconfig
arm                      assabet_defconfig
arm                      multi_v7_defconfig
powerpc                  pcm030_defconfig
powerpc                  cm5200_defconfig
mips                     loongson1b_defconfig
mips                     mtx1_defconfig
arm                      spear3xx_defconfig
arc                      axs103_defconfig
nios2                    defconfig
arm                      cm_x300_defconfig
sh                       se7750_defconfig
sh                       se7206_defconfig
powerpc                  tqm8555_defconfig
arm                      lpc32xx_defconfig
sh                       se7722_defconfig
nds32                    allnoconfig
mips                     maltaup_defconfig
csky                     alldefconfig
arm                      iop32x_defconfig
ia64                     allmodconfig
ia64                     defconfig
ia64                     allyesconfig
m68k                     allmodconfig
m68k                     defconfig
m68k                     allyesconfig
arc                      allyesconfig
c6x                      allyesconfig
nds32                    defconfig
nios2                    allyesconfig
csky                     defconfig
alpha                    defconfig
alpha                    allyesconfig
xtensa                   allyesconfig
h8300                    allyesconfig
arc                      defconfig
sh                       allmodconfig
parisc                   defconfig
s390                     allyesconfig
parisc                   allyesconfig
s390                     defconfig
i386                     allyesconfig
sparc                    allyesconfig
sparc                    defconfig
i386                     tinyconfig
i386                     defconfig
mips                     allyesconfig
mips                     allmodconfig
powerpc                  allyesconfig
powerpc                  allmodconfig
powerpc                  allnoconfig
i386                     randconfig-a002-20210114
i386                     randconfig-a005-20210114
i386                     randconfig-a006-20210114
i386                     randconfig-a001-20210114
i386                     randconfig-a003-20210114
i386                     randconfig-a004-20210114
x86_64                   randconfig-a015-20210114
x86_64                   randconfig-a012-20210114
x86_64                   randconfig-a013-20210114
x86_64                   randconfig-a016-20210114
x86_64                   randconfig-a014-20210114
x86_64                   randconfig-a011-20210114
i386                     randconfig-a012-20210114
i386                     randconfig-a011-20210114
i386                     randconfig-a016-20210114
i386                     randconfig-a015-20210114
i386                     randconfig-a013-20210114
i386                     randconfig-a014-20210114
riscv                    nommu_k210_defconfig
riscv                    allyesconfig
riscv                    nommu_virt_defconfig
riscv                    allnoconfig
riscv
Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement
On Thu, Jan 14, 2021 at 11:24:35AM -0600, Brian King wrote:
> On 1/13/21 7:27 PM, Ming Lei wrote:
> > On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
> >> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> >>> On 1/12/21 2:54 PM, Brian King wrote:
> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> > Introduce several new vhost fields for managing MQ state of the adapter
> > as well as initial defaults for MQ enablement.
> >
> > Signed-off-by: Tyrel Datwyler
> > ---
> > drivers/scsi/ibmvscsi/ibmvfc.c | 8
> > drivers/scsi/ibmvscsi/ibmvfc.h | 9 +
> > 2 files changed, 17 insertions(+)
> >
> > diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> > index ba95438a8912..9200fe49c57e 100644
> > --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> > +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> > @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
> > 	.max_sectors = IBMVFC_MAX_SECTORS,
> > 	.shost_attrs = ibmvfc_attrs,
> > 	.track_queue_depth = 1,
> > +	.host_tagset = 1,
>
> This doesn't seem right. You are setting host_tagset, which means you want a
> shared, host wide, tag set for commands. It also means that the total
> queue depth for the host is can_queue. However, it looks like you are allocating
> max_requests events for each sub crq, which means you are over allocating memory.
> >>>
> >>> With the shared tagset yes the queue depth for the host is can_queue, but this
> >>> also implies that the max queue depth for each hw queue is also can_queue. So,
> >>> in the worst case that all commands are queued down the same hw queue we need an
> >>> event pool with can_queue commands.
> >>>
> Looking at this closer, we might have bigger problems. There is a host wide
> max number of commands that the VFC host supports, which gets returned on
> NPIV Login. This value can change across a live migration event.
> >>>
> >>> From what I understand the max commands can only become less.
> >>>
> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
> can_queue on the scsi_host *after* the tag set has been allocated. This looks
> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
> we look at can_queue once the tag set is setup, and I'm not seeing a good way
> to dynamically change the host queue depth once the tag set is setup.
>
> Unless I'm missing something, our best options appear to either be to implement
> our own host wide busy reference counting, which doesn't sound very good, or
> we need to add some API to block / scsi that allows us to dynamically change
> can_queue.
> >>>
> >>> Changing can_queue won't do us any good with the shared tagset because each
> >>> queue still needs to be able to queue can_queue number of commands in the worst
> >>> case.
> >>
> >> The issue I'm trying to highlight here is the following scenario:
> >>
> >> 1. We set shost->can_queue, then call scsi_add_host, which allocates
> >> the tag set.
> >>
> >> 2. On our NPIV login response from the VIOS, we might get a lower value
> >> than we initially set in shost->can_queue, so we update it, but nobody
> >> ever looks at it again, and we don't have any protection against sending
> >> too many commands to the host.
> >>
> >> Basically, we no longer have any code that ensures we don't send more
> >> commands to the VIOS than we are told it supports. According to the
> >> architecture, if we actually do this, the VIOS will do an h_free_crq,
> >> which would be a bit of a bug on our part.
> >>
> >> I don't think it was ever clearly defined in the API that a driver can
> >> change shost->can_queue after calling scsi_add_host, but up until
> >> commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now
> >> it doesn't.
> >
> > Actually it isn't related with commit 6eb045e092ef, because
> > blk_mq_alloc_tag_set() uses .can_queue to create driver tag sbitmap
> > and request pool.
> >
> > So even though without 6eb045e092ef, the updated .can_queue can't work
> > as expected because the max driver tag depth has been fixed by blk-mq
> > already.
>
> There are two scenarios here. In the scenario of someone increasing can_queue
> after the tag set is allocated, I agree, blk-mq will never take advantage
> of this. However, in the scenario of someone *decreasing* can_queue after the
> tag set is allocated, prior to 6eb045e092ef, the shost->host_busy code provided
> this protection.

When .can_queue is decreased, blk-mq still may allocate driver tag which is >
.can_queue, this way might break driver/device too, but it depends on how
driver uses req->tag.

> > > > What
Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
Excerpts from Christophe Leroy's message of January 14, 2021 11:28 pm:
>
> Le 14/01/2021 à 14:17, Nicholas Piggin a écrit :
>> Excerpts from Christophe Leroy's message of January 14, 2021 10:25 pm:
>>>
>>> Le 14/01/2021 à 13:09, Nicholas Piggin a écrit :
>>>> Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm:
> Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am:
>>
>> Le 13/01/2021 à 08:31, Nicholas Piggin a écrit :
>>> The page fault handling still has some complex logic particularly around
>>> hash table handling, in asm. Implement this in C instead.
>>>
>>> Signed-off-by: Nicholas Piggin
>>> ---
>>> arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 +
>>> arch/powerpc/kernel/exceptions-64s.S | 131 +++---
>>> arch/powerpc/mm/book3s64/hash_utils.c | 77 ++
>>> arch/powerpc/mm/fault.c | 46 --
>>> 4 files changed, 107 insertions(+), 148 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>> b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>> index 066b1d34c7bc..60a669379aa0 100644
>>> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>> @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn,
>>> #define HPTE_NOHPTE_UPDATE 0x2
>>> #define HPTE_USE_KERNEL_KEY 0x4
>>>
>>> +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr);
>>> extern int __hash_page_4K(unsigned long ea, unsigned long access,
>>> 			  unsigned long vsid, pte_t *ptep, unsigned long trap,
>>> 			  unsigned long flags, int ssize, int subpage_prot);
>>> diff --git a/arch/powerpc/kernel/exceptions-64s.S
>>> b/arch/powerpc/kernel/exceptions-64s.S
>>> index 6e53f7638737..bcb5e81d2088 100644
>>> --- a/arch/powerpc/kernel/exceptions-64s.S
>>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>>> @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>>>  *
>>>  * Handling:
>>>  * - Hash MMU
>>> - *   Go to do_hash_page first to see if the HPT can be filled from an entry in
>>> - *   the Linux page table. Hash faults can hit in kernel mode in a fairly
>>> + *   Go to do_hash_fault, which attempts to fill the HPT from an entry in the
>>> + *   Linux page table. Hash faults can hit in kernel mode in a fairly
>>>  *   arbitrary state (e.g., interrupts disabled, locks held) when accessing
>>>  *   "non-bolted" regions, e.g., vmalloc space. However these should always be
>>> - *   backed by Linux page tables.
>>> + *   backed by Linux page table entries.
>>>  *
>>> - * If none is found, do a Linux page fault. Linux page faults can happen in
>>> - * kernel mode due to user copy operations of course.
>>> + * If no entry is found the Linux page fault handler is invoked (by
>>> + * do_hash_fault). Linux page faults can happen in kernel mode due to user
>>> + * copy operations of course.
>>>  *
>>>  * KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest
>>>  * MMU context, which may cause a DSI in the host, which must go to the
>>> @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common)
>>> 	GEN_COMMON data_access
>>> 	ld r4,_DAR(r1)
>>> 	ld r5,_DSISR(r1)
>>
>> We have DSISR here. I think the dispatch between page fault or
>> do_break() should be done here:
>> - It would be more similar to other arches
>
> Other sub-archs?
>
>> - Would avoid doing it also in instruction fault
>
> True but it's hidden under an unlikely branch so won't really help
> instruction fault.
>
>> - Would avoid that -1 return which looks more like a hack.
>
> I don't really see it as a hack, we return a code to the asm caller to
> direct whether to restore registers or not, we already have this
> pattern.
>
> (I'm hoping all that might go away one day by controlling NV
> regs from C if we can get good code generation but even if not we
> still have it in the interrupt returns).
>
> That said I will give it a try here. At very least it might be a
> better intermediate step.

Ah yes, this way doesn't work well for later patches because you end up,
e.g., with the do_break call having to call the interrupt handler
wrappers again when they actually expect to be in the asm entry state
(e.g., irq soft-mask state) when called, and return via interrupt_return
after the exit wrapper runs (which 64s uses
Re: [PATCH v5 00/21] ibmvfc: initial MQ development/enablement
Tyrel,

I think this patch series is looking pretty good. I don't think we need
to wait for resolution of the can_queue issue being discussed, since
that is an issue that exists prior to this patch series and this patch
series doesn't make the issue any worse. Let's work that separately.

Thanks,

Brian

--
Brian King
Power Linux I/O
IBM Linux Technology Center
Re: [PATCH v5 21/21] ibmvfc: provide modules parameters for MQ settings
Reviewed-by: Brian King

--
Brian King
Power Linux I/O
IBM Linux Technology Center
Re: [PATCH v5 18/21] ibmvfc: send Cancel MAD down each hw scsi channel
Reviewed-by: Brian King

--
Brian King
Power Linux I/O
IBM Linux Technology Center
[PATCH 5/6] powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE
RTAS_RMOBUF_MAX doesn't actually describe a "maximum" value in any
sense. It represents the size of an area of memory set aside for user
space to use as work areas for certain RTAS calls.

Rename it to RTAS_USER_REGION_SIZE, and express the value in terms of
the number of work areas allocated.

Signed-off-by: Nathan Lynch

squash! powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE
---
 arch/powerpc/include/asm/rtas.h | 9 ++---
 arch/powerpc/kernel/rtas-proc.c | 2 +-
 arch/powerpc/kernel/rtas.c      | 2 +-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 332e1000ca0f..1aa7ab1cbc84 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -19,8 +19,11 @@
 #define RTAS_UNKNOWN_SERVICE (-1)
 #define RTAS_INSTANTIATE_MAX (1ULL<<30) /* Don't instantiate rtas at/above this value */
 
-/* Buffer size for ppc_rtas system call. */
-#define RTAS_RMOBUF_MAX (64 * 1024)
+/* Work areas shared with RTAS must be 4K, naturally aligned. */
+#define RTAS_WORK_AREA_SIZE 4096
+
+/* Work areas allocated for user space access. */
+#define RTAS_USER_REGION_SIZE (RTAS_WORK_AREA_SIZE * 16)
 
 /* RTAS return status codes */
 #define RTAS_BUSY		-2	/* RTAS Busy */
@@ -357,7 +360,7 @@ extern void rtas_take_timebase(void);
 static inline int page_is_rtas_user_buf(unsigned long pfn)
 {
 	unsigned long paddr = (pfn << PAGE_SHIFT);
-	if (paddr >= rtas_rmo_buf && paddr < (rtas_rmo_buf + RTAS_RMOBUF_MAX))
+	if (paddr >= rtas_rmo_buf && paddr < (rtas_rmo_buf + RTAS_USER_REGION_SIZE))
 		return 1;
 	return 0;
 }
diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c
index d2b0d99824a4..6857a5b0a1c3 100644
--- a/arch/powerpc/kernel/rtas-proc.c
+++ b/arch/powerpc/kernel/rtas-proc.c
@@ -767,6 +767,6 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v)
  */
 static int ppc_rtas_rmo_buf_show(struct seq_file *m, void *v)
 {
-	seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_RMOBUF_MAX);
+	seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_USER_REGION_SIZE);
 	return 0;
 }
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 55f6aa170e57..da65faadbbb2 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1204,7 +1204,7 @@ void __init rtas_initialize(void)
 	if (firmware_has_feature(FW_FEATURE_LPAR))
 		rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX);
 #endif
-	rtas_rmo_buf = memblock_phys_alloc_range(RTAS_RMOBUF_MAX, PAGE_SIZE,
+	rtas_rmo_buf = memblock_phys_alloc_range(RTAS_USER_REGION_SIZE, PAGE_SIZE,
 						 0, rtas_region);
 	if (!rtas_rmo_buf)
 		panic("ERROR: RTAS: Failed to allocate %lx bytes below %pa\n",
-- 
2.29.2
[PATCH 0/6] powerpc/rtas: miscellaneous cleanups, user region allocation
The region exposed to user space for use as work areas passed to
sys_rtas() can be incorrectly allocated on radix, leading to failures in
users of librtas. Correct this and clean up some of the code visited
along the way.

I think the cleanups should be unobjectionable and I've placed them
first in the series. Please check my work on the rtas_rmo_buf allocation
changes; they are only lightly tested so far (slot add on Power9
PowerVM, and comparison of /memory@0/reg with the contents of
/proc/powerpc/rtas/rmo_buf on qemu Power9 w/radix).

I suspect the per-cpu RTAS argument structures for reentrant calls need
similar measures, but I can add that to the series once there is
consensus on the approach.

Nathan Lynch (6):
  powerpc/rtas: improve ppc_rtas_rmo_buf_show documentation
  powerpc/rtas-proc: remove unused RMO_READ_BUF_MAX
  powerpc/rtas: remove ibm_suspend_me_token
  powerpc/rtas: move syscall filter setup into separate function
  powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE
  powerpc/rtas: constrain user region allocation to RMA

 arch/powerpc/include/asm/rtas.h |   9 ++-
 arch/powerpc/kernel/rtas-proc.c |  15 +++--
 arch/powerpc/kernel/rtas.c      | 108
 3 files changed, 98 insertions(+), 34 deletions(-)

-- 
2.29.2
[PATCH 6/6] powerpc/rtas: constrain user region allocation to RMA
Memory locations passed as arguments from the OS to RTAS usually need to be addressable in 32-bit mode and must reside in the Real Mode Area. On PAPR guests, the RMA starts at logical address 0 and is the first logical memory block reported in the LPAR’s device tree. On powerpc targets with RTAS, Linux makes available to user space a region of memory suitable for arguments to be passed to RTAS via sys_rtas(). This region (rtas_rmo_buf) is allocated via the memblock API during boot in order to ensure that it satisfies the requirements described above. With radix MMU, the upper limit supplied to the memblock allocation can exceed the bounds of the first logical memory block, since ppc64_rma_size is ULONG_MAX and RTAS_INSTANTIATE_MAX is 1GB. (512MB is a common size of the first memory block according to a small sample of LPARs I have checked.) This leads to failures when user space invokes an RTAS function that uses a work area, such as ibm,configure-connector. Alter the determination of the upper limit for rtas_rmo_buf's allocation to consult the device tree directly, ensuring placement within the RMA regardless of the MMU in use. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas.c | 80 +++--- 1 file changed, 65 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index da65faadbbb2..98dfb112f4df 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1166,6 +1166,70 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) return 0; } +/* + * Memory locations passed to RTAS must be in the RMA as described by + * the range in /memory@0. + */ +static phys_addr_t rtas_arg_addr_limit(void) +{ + unsigned int addr_cells; + unsigned int size_cells; + struct device_node *np; + const __be32 *prop; + u64 limit; + u64 base; + + /* RTAS is instantiated in 32-bit mode. */ + limit = 1ULL << 32; + + /* Account for mem=. 
*/ + if (memory_limit != 0) + limit = min(limit, memory_limit); + + np = of_find_node_by_path("/memory@0"); + if (!np) + goto out; + + prop = of_get_property(np, "reg", NULL); + if (!prop) + goto put; + + addr_cells = of_n_addr_cells(np); + base = of_read_number(prop, addr_cells); + prop += addr_cells; + size_cells = of_n_size_cells(np); + limit = min(limit, of_read_number(prop, size_cells)); +put: + of_node_put(np); +out: + pr_debug("%s: base = %#llx limit = %#llx", __func__, base, limit); + + return limit; +} + +static void __init rtas_user_region_setup(void) +{ + phys_addr_t limit, align, size; + + limit = rtas_arg_addr_limit(); + size = RTAS_USER_REGION_SIZE; + + /* +* Although work areas need only 4KB alignment, user space +* accesses this region via mmap so it must be placed on a +* page boundary. +*/ + align = PAGE_SIZE; + + rtas_rmo_buf = memblock_phys_alloc_range(size, align, 0, limit); + if (rtas_rmo_buf == 0) { + panic("Failed to allocate %llu bytes for user region below %pa\n", + size, &limit); + } + + pr_debug("RTAS user region allocated at %pa\n", &rtas_rmo_buf); +} + /* * Call early during boot, before mem init, to retrieve the RTAS * information from the device-tree and allocate the RMO buffer for userland @@ -1173,7 +1237,6 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) */ void __init rtas_initialize(void) { - unsigned long rtas_region = RTAS_INSTANTIATE_MAX; u32 base, size, entry; int no_base, no_size, no_entry; @@ -1197,23 +1260,10 @@ void __init rtas_initialize(void) no_entry = of_property_read_u32(rtas.dev, "linux,rtas-entry", &entry); rtas.entry = no_entry ?
rtas.base : entry; - /* If RTAS was found, allocate the RMO buffer for it and look for -* the stop-self token if any -*/ -#ifdef CONFIG_PPC64 - if (firmware_has_feature(FW_FEATURE_LPAR)) - rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX); -#endif - rtas_rmo_buf = memblock_phys_alloc_range(RTAS_USER_REGION_SIZE, PAGE_SIZE, -0, rtas_region); - if (!rtas_rmo_buf) - panic("ERROR: RTAS: Failed to allocate %lx bytes below %pa\n", - PAGE_SIZE, &rtas_region); - #ifdef CONFIG_RTAS_ERROR_LOGGING rtas_last_error_token = rtas_token("rtas-last-error"); #endif - + rtas_user_region_setup(); rtas_syscall_filter_init(); } -- 2.29.2
[PATCH 4/6] powerpc/rtas: move syscall filter setup into separate function
Reduce conditionally compiled sections within rtas_initialize() by moving the filter table initialization into its own function already guarded by CONFIG_PPC_RTAS_FILTER. No behavior change intended. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 60fcf7f7b0b8..55f6aa170e57 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1051,6 +1051,16 @@ static bool block_rtas_call(int token, int nargs, return true; } +static void __init rtas_syscall_filter_init(void) +{ + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) { + rtas_filters[i].token = rtas_token(rtas_filters[i].name); + } + +} + #else static bool block_rtas_call(int token, int nargs, @@ -1059,6 +1069,10 @@ static bool block_rtas_call(int token, int nargs, return false; } +static void __init rtas_syscall_filter_init(void) +{ +} + #endif /* CONFIG_PPC_RTAS_FILTER */ /* We assume to be passed big endian arguments */ @@ -1162,9 +1176,6 @@ void __init rtas_initialize(void) unsigned long rtas_region = RTAS_INSTANTIATE_MAX; u32 base, size, entry; int no_base, no_size, no_entry; -#ifdef CONFIG_PPC_RTAS_FILTER - int i; -#endif /* Get RTAS dev node and fill up our "rtas" structure with infos * about it. @@ -1203,11 +1214,7 @@ void __init rtas_initialize(void) rtas_last_error_token = rtas_token("rtas-last-error"); #endif -#ifdef CONFIG_PPC_RTAS_FILTER - for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) { - rtas_filters[i].token = rtas_token(rtas_filters[i].name); - } -#endif + rtas_syscall_filter_init(); } int __init early_init_dt_scan_rtas(unsigned long node, -- 2.29.2
[PATCH 3/6] powerpc/rtas: remove ibm_suspend_me_token
There's not a compelling reason to cache the value of the token for the ibm,suspend-me function. Just look it up when needed in the RTAS syscall's special case for it. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index d126d71ea5bd..60fcf7f7b0b8 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -828,7 +828,6 @@ void rtas_activate_firmware(void) pr_err("ibm,activate-firmware failed (%i)\n", fwrc); } -static int ibm_suspend_me_token = RTAS_UNKNOWN_SERVICE; #ifdef CONFIG_PPC_PSERIES /** * rtas_call_reentrant() - Used for reentrant rtas calls @@ -1103,7 +1102,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) return -EINVAL; /* Need to handle ibm,suspend_me call specially */ - if (token == ibm_suspend_me_token) { + if (token == rtas_token("ibm,suspend-me")) { /* * rtas_ibm_suspend_me assumes the streamid handle is in cpu @@ -1191,10 +1190,8 @@ void __init rtas_initialize(void) * the stop-self token if any */ #ifdef CONFIG_PPC64 - if (firmware_has_feature(FW_FEATURE_LPAR)) { + if (firmware_has_feature(FW_FEATURE_LPAR)) rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX); - ibm_suspend_me_token = rtas_token("ibm,suspend-me"); - } #endif rtas_rmo_buf = memblock_phys_alloc_range(RTAS_RMOBUF_MAX, PAGE_SIZE, 0, rtas_region); -- 2.29.2
[PATCH 1/6] powerpc/rtas: improve ppc_rtas_rmo_buf_show documentation
Add kerneldoc for ppc_rtas_rmo_buf_show(), the callback for /proc/powerpc/rtas/rmo_buffer, explaining its expected use. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas-proc.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c index 2d33f342a293..e0f8329966d6 100644 --- a/arch/powerpc/kernel/rtas-proc.c +++ b/arch/powerpc/kernel/rtas-proc.c @@ -757,7 +757,16 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v) #define RMO_READ_BUF_MAX 30 -/* RTAS Userspace access */ +/** + * ppc_rtas_rmo_buf_show() - Describe RTAS-addressable region for user space. + * + * Base + size description of a range of RTAS-addressable memory set + * aside for user space to use as work area(s) for certain RTAS + * functions. User space accesses this region via /dev/mem. Apart from + * security policies, the kernel does not arbitrate or serialize + * access to this region, and user space must ensure that concurrent + * users do not interfere with each other. + */ static int ppc_rtas_rmo_buf_show(struct seq_file *m, void *v) { seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_RMOBUF_MAX); -- 2.29.2
[PATCH 2/6] powerpc/rtas-proc: remove unused RMO_READ_BUF_MAX
This constant is unused. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas-proc.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c index e0f8329966d6..d2b0d99824a4 100644 --- a/arch/powerpc/kernel/rtas-proc.c +++ b/arch/powerpc/kernel/rtas-proc.c @@ -755,8 +755,6 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v) return 0; } -#define RMO_READ_BUF_MAX 30 - /** * ppc_rtas_rmo_buf_show() - Describe RTAS-addressable region for user space. * -- 2.29.2
Re: [PATCH v2 0/7] Rid W=1 warnings in Ethernet
On Thu, 14 Jan 2021 08:33:49 +0000 Lee Jones wrote: > On Wed, 13 Jan 2021, Jakub Kicinski wrote: > > On Wed, 13 Jan 2021 16:41:16 +0000 Lee Jones wrote: > > > Resending the stragglers again. > > > This set is part of a larger effort attempting to clean-up W=1 > > > kernel builds, which are currently overwhelmingly riddled with > > > niggly little warnings. > > > v2: > > > - Squashed IBM patches > > > - Fixed real issue in SMSC > > > - Added Andrew's Reviewed-by tags on remainder > > Does not apply, please rebase on net-next/master. > These are based on Tuesday's next/master. What's next/master? This is net-next: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/ > I just rebased them now with no issue. > What conflict are you seeing? Applying: net: ethernet: smsc: smc91x: Fix function name in kernel-doc header error: patch failed: drivers/net/ethernet/smsc/smc91x.c:2192 error: drivers/net/ethernet/smsc/smc91x.c: patch does not apply Patch failed at 0001 net: ethernet: smsc: smc91x: Fix function name in kernel-doc header hint: Use 'git am --show-current-patch=diff' to see the failed patch When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort".
[PATCH v5 17/21] ibmvfc: add cancel mad initialization helper
Add a helper routine for initializing a Cancel MAD. This will be useful for a channelized client that needs to send Cancel commands down every channel commands were sent for a particular LUN. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 68 -- 1 file changed, 40 insertions(+), 28 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 578e27180f10..b0b0212344f3 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -2379,6 +2379,45 @@ static int ibmvfc_wait_for_ops(struct ibmvfc_host *vhost, void *device, return SUCCESS; } +static struct ibmvfc_event *ibmvfc_init_tmf(struct ibmvfc_queue *queue, + struct scsi_device *sdev, + int type) +{ + struct ibmvfc_host *vhost = shost_priv(sdev->host); + struct scsi_target *starget = scsi_target(sdev); + struct fc_rport *rport = starget_to_rport(starget); + struct ibmvfc_event *evt; + struct ibmvfc_tmf *tmf; + + evt = ibmvfc_get_event(queue); + ibmvfc_init_event(evt, ibmvfc_sync_completion, IBMVFC_MAD_FORMAT); + + tmf = &evt->iu.tmf; + memset(tmf, 0, sizeof(*tmf)); + if (ibmvfc_check_caps(vhost, IBMVFC_HANDLE_VF_WWPN)) { + tmf->common.version = cpu_to_be32(2); + tmf->target_wwpn = cpu_to_be64(rport->port_name); + } else { + tmf->common.version = cpu_to_be32(1); + } + tmf->common.opcode = cpu_to_be32(IBMVFC_TMF_MAD); + tmf->common.length = cpu_to_be16(sizeof(*tmf)); + tmf->scsi_id = cpu_to_be64(rport->port_id); + int_to_scsilun(sdev->lun, &tmf->lun); + if (!ibmvfc_check_caps(vhost, IBMVFC_CAN_SUPPRESS_ABTS)) + type &= ~IBMVFC_TMF_SUPPRESS_ABTS; + if (vhost->state == IBMVFC_ACTIVE) + tmf->flags = cpu_to_be32((type | IBMVFC_TMF_LUA_VALID)); + else + tmf->flags = cpu_to_be32(((type & IBMVFC_TMF_SUPPRESS_ABTS) | IBMVFC_TMF_LUA_VALID)); + tmf->cancel_key = cpu_to_be32((unsigned long)sdev->hostdata); + tmf->my_cancel_key = cpu_to_be32((unsigned long)starget->hostdata); + + init_completion(&evt->comp); + + return evt; +} + /** *
ibmvfc_cancel_all - Cancel all outstanding commands to the device * @sdev: scsi device to cancel commands @@ -2393,9 +2432,6 @@ static int ibmvfc_wait_for_ops(struct ibmvfc_host *vhost, void *device, static int ibmvfc_cancel_all(struct scsi_device *sdev, int type) { struct ibmvfc_host *vhost = shost_priv(sdev->host); - struct scsi_target *starget = scsi_target(sdev); - struct fc_rport *rport = starget_to_rport(starget); - struct ibmvfc_tmf *tmf; struct ibmvfc_event *evt, *found_evt; union ibmvfc_iu rsp; int rsp_rc = -EBUSY; @@ -2422,32 +2458,8 @@ static int ibmvfc_cancel_all(struct scsi_device *sdev, int type) } if (vhost->logged_in) { - evt = ibmvfc_get_event(&vhost->crq); - ibmvfc_init_event(evt, ibmvfc_sync_completion, IBMVFC_MAD_FORMAT); - - tmf = &evt->iu.tmf; - memset(tmf, 0, sizeof(*tmf)); - if (ibmvfc_check_caps(vhost, IBMVFC_HANDLE_VF_WWPN)) { - tmf->common.version = cpu_to_be32(2); - tmf->target_wwpn = cpu_to_be64(rport->port_name); - } else { - tmf->common.version = cpu_to_be32(1); - } - tmf->common.opcode = cpu_to_be32(IBMVFC_TMF_MAD); - tmf->common.length = cpu_to_be16(sizeof(*tmf)); - tmf->scsi_id = cpu_to_be64(rport->port_id); - int_to_scsilun(sdev->lun, &tmf->lun); - if (!ibmvfc_check_caps(vhost, IBMVFC_CAN_SUPPRESS_ABTS)) - type &= ~IBMVFC_TMF_SUPPRESS_ABTS; - if (vhost->state == IBMVFC_ACTIVE) - tmf->flags = cpu_to_be32((type | IBMVFC_TMF_LUA_VALID)); - else - tmf->flags = cpu_to_be32(((type & IBMVFC_TMF_SUPPRESS_ABTS) | IBMVFC_TMF_LUA_VALID)); - tmf->cancel_key = cpu_to_be32((unsigned long)sdev->hostdata); - tmf->my_cancel_key = cpu_to_be32((unsigned long)starget->hostdata); - + evt = ibmvfc_init_tmf(&vhost->crq, sdev, type); + evt->sync_iu = &rsp; - init_completion(&evt->comp); rsp_rc = ibmvfc_send_event(evt, vhost, default_timeout); } -- 2.27.0
[PATCH v5 10/21] ibmvfc: define Sub-CRQ interrupt handler routine
Simple handler that calls Sub-CRQ drain routine directly. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index f3cd092478ee..51bcafad9490 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3571,6 +3571,16 @@ static void ibmvfc_drain_sub_crq(struct ibmvfc_queue *scrq) } } +static irqreturn_t ibmvfc_interrupt_scsi(int irq, void *scrq_instance) +{ + struct ibmvfc_queue *scrq = (struct ibmvfc_queue *)scrq_instance; + + ibmvfc_toggle_scrq_irq(scrq, 0); + ibmvfc_drain_sub_crq(scrq); + + return IRQ_HANDLED; +} + /** * ibmvfc_init_tgt - Set the next init job step for the target * @tgt: ibmvfc target struct -- 2.27.0
[PATCH v5 18/21] ibmvfc: send Cancel MAD down each hw scsi channel
In general the client needs to send Cancel MADs and task management commands down the same channel as the command(s) intended to cancel or abort. The client assigns cancel keys per LUN and thus must send a Cancel down each channel commands were submitted for that LUN. Further, the client then must wait for those cancel completions prior to submitting a LUN RESET or ABORT TASK SET. Add a cancel rsp iu synchronization field to the ibmvfc_queue struct such that the cancel routine can sync the cancel response to each queue that requires a cancel command. Build a list of each cancel event sent and wait for the completion of each submitted cancel. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 109 + drivers/scsi/ibmvscsi/ibmvfc.h | 3 + 2 files changed, 100 insertions(+), 12 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index b0b0212344f3..5ca8fcafd1d5 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -2418,18 +2418,82 @@ static struct ibmvfc_event *ibmvfc_init_tmf(struct ibmvfc_queue *queue, return evt; } -/** - * ibmvfc_cancel_all - Cancel all outstanding commands to the device - * @sdev: scsi device to cancel commands - * @type: type of error recovery being performed - * - * This sends a cancel to the VIOS for the specified device. This does - * NOT send any abort to the actual device. That must be done separately.
- * - * Returns: - * 0 on success / other on failure - **/ -static int ibmvfc_cancel_all(struct scsi_device *sdev, int type) +static int ibmvfc_cancel_all_mq(struct scsi_device *sdev, int type) +{ + struct ibmvfc_host *vhost = shost_priv(sdev->host); + struct ibmvfc_event *evt, *found_evt, *temp; + struct ibmvfc_queue *queues = vhost->scsi_scrqs.scrqs; + unsigned long flags; + int num_hwq, i; + int fail = 0; + LIST_HEAD(cancelq); + u16 status; + + ENTER; + spin_lock_irqsave(vhost->host->host_lock, flags); + num_hwq = vhost->scsi_scrqs.active_queues; + for (i = 0; i < num_hwq; i++) { + spin_lock(queues[i].q_lock); + spin_lock(&queues[i].l_lock); + found_evt = NULL; + list_for_each_entry(evt, &queues[i].sent, queue_list) { + if (evt->cmnd && evt->cmnd->device == sdev) { + found_evt = evt; + break; + } + } + spin_unlock(&queues[i].l_lock); + + if (found_evt && vhost->logged_in) { + evt = ibmvfc_init_tmf(&queues[i], sdev, type); + evt->sync_iu = &queues[i].cancel_rsp; + ibmvfc_send_event(evt, vhost, default_timeout); + list_add_tail(&evt->cancel, &cancelq); + } + + spin_unlock(queues[i].q_lock); + } + spin_unlock_irqrestore(vhost->host->host_lock, flags); + + if (list_empty(&cancelq)) { + if (vhost->log_level > IBMVFC_DEFAULT_LOG_LEVEL) + sdev_printk(KERN_INFO, sdev, "No events found to cancel\n"); + return 0; + } + + sdev_printk(KERN_INFO, sdev, "Cancelling outstanding commands.\n"); + + list_for_each_entry_safe(evt, temp, &cancelq, cancel) { + wait_for_completion(&evt->comp); + status = be16_to_cpu(evt->queue->cancel_rsp.mad_common.status); + list_del(&evt->cancel); + ibmvfc_free_event(evt); + + if (status != IBMVFC_MAD_SUCCESS) { + sdev_printk(KERN_WARNING, sdev, "Cancel failed with rc=%x\n", status); + switch (status) { + case IBMVFC_MAD_DRIVER_FAILED: + case IBMVFC_MAD_CRQ_ERROR: + /* Host adapter most likely going through reset, return success to +* the caller will wait for the command being cancelled to get returned +*/ + break; + default: + fail = 1; + break; + } + } + } + + if (fail) + return -EIO; + + sdev_printk(KERN_INFO, sdev, 
"Successfully cancelled outstanding commands\n"); + LEAVE; + return 0; +} + +static int ibmvfc_cancel_all_sq(struct scsi_device *sdev, int type) { struct ibmvfc_host *vhost = shost_priv(sdev->host); struct ibmvfc_event *evt, *found_evt; @@ -2498,6 +2562,27 @@ static int ibmvfc_cancel_all(struct scsi_device *sdev, int type) return 0; } +/** + * ibmvfc_cancel_all - Cancel all outstanding commands to the device + * @sdev: scsi device to cancel commands + * @type: type of error recovery being performed + * + * This sends a cancel to the VIOS for the specified
[PATCH v5 14/21] ibmvfc: set and track hw queue in ibmvfc_event struct
Extract the hwq id from a SCSI command and store it in the ibmvfc_event structure to identify which Sub-CRQ to send the command down when channels are being utilized. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 5 + drivers/scsi/ibmvscsi/ibmvfc.h | 1 + 2 files changed, 6 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 0653d52d4ea0..3f3cc37a263f 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -1483,6 +1483,7 @@ static void ibmvfc_init_event(struct ibmvfc_event *evt, evt->_done = done; evt->done = ibmvfc_locked_done; } + evt->hwq = 0; } /** @@ -1839,6 +1840,8 @@ static int ibmvfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd) struct ibmvfc_cmd *vfc_cmd; struct ibmvfc_fcp_cmd_iu *iu; struct ibmvfc_event *evt; + u32 tag_and_hwq = blk_mq_unique_tag(cmnd->request); + u16 hwq = blk_mq_unique_tag_to_hwq(tag_and_hwq); int rc; if (unlikely((rc = fc_remote_port_chkready(rport))) || @@ -1865,6 +1868,8 @@ static int ibmvfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd) } vfc_cmd->correlation = cpu_to_be64((u64)evt); + if (vhost->using_channels) + evt->hwq = hwq % vhost->scsi_scrqs.active_queues; if (likely(!(rc = ibmvfc_map_sg_data(cmnd, evt, vfc_cmd, vhost->dev)))) return ibmvfc_send_event(evt, vhost, 0); diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index 3d76cd3c1fd9..2dbce7071409 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -753,6 +753,7 @@ struct ibmvfc_event { struct completion comp; struct completion *eh_comp; struct timer_list timer; + u16 hwq; }; /* a pool of event structs for use */ -- 2.27.0
[PATCH v5 20/21] ibmvfc: enable MQ and set reasonable defaults
Turn on MQ by default and set sane values for the upper limit on hw queues for the scsi host, and number of hw scsi channels to request from the partner VIOS. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index c3bb83c9d8a6..0391cb746d0b 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -41,9 +41,9 @@ #define IBMVFC_DEFAULT_LOG_LEVEL 2 #define IBMVFC_MAX_CDB_LEN 16 #define IBMVFC_CLS3_ERROR 0 -#define IBMVFC_MQ 0 -#define IBMVFC_SCSI_CHANNELS 0 -#define IBMVFC_SCSI_HW_QUEUES 1 +#define IBMVFC_MQ 1 +#define IBMVFC_SCSI_CHANNELS 8 +#define IBMVFC_SCSI_HW_QUEUES 8 #define IBMVFC_MIG_NO_SUB_TO_CRQ 0 #define IBMVFC_MIG_NO_N_TO_M 0 -- 2.27.0
[PATCH v5 19/21] ibmvfc: purge scsi channels after transport loss/reset
Grab the queue and list lock for each Sub-CRQ and add any uncompleted events to the host purge list. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 16 1 file changed, 16 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 5ca8fcafd1d5..d314dffaafc4 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -1056,7 +1056,13 @@ static void ibmvfc_fail_request(struct ibmvfc_event *evt, int error_code) static void ibmvfc_purge_requests(struct ibmvfc_host *vhost, int error_code) { struct ibmvfc_event *evt, *pos; + struct ibmvfc_queue *queues = vhost->scsi_scrqs.scrqs; unsigned long flags; + int hwqs = 0; + int i; + + if (vhost->using_channels) + hwqs = vhost->scsi_scrqs.active_queues; ibmvfc_dbg(vhost, "Purging all requests\n"); spin_lock_irqsave(&vhost->crq.l_lock, flags); @@ -1064,6 +1070,16 @@ static void ibmvfc_purge_requests(struct ibmvfc_host *vhost, int error_code) ibmvfc_fail_request(evt, error_code); list_splice_init(&vhost->crq.sent, &vhost->purge); spin_unlock_irqrestore(&vhost->crq.l_lock, flags); + + for (i = 0; i < hwqs; i++) { + spin_lock_irqsave(queues[i].q_lock, flags); + spin_lock(&queues[i].l_lock); + list_for_each_entry_safe(evt, pos, &queues[i].sent, queue_list) + ibmvfc_fail_request(evt, error_code); + list_splice_init(&queues[i].sent, &vhost->purge); + spin_unlock(&queues[i].l_lock); + spin_unlock_irqrestore(queues[i].q_lock, flags); + } } /** -- 2.27.0
[PATCH v5 21/21] ibmvfc: provide modules parameters for MQ settings
Add the various module parameter toggles for adjusting the MQ characteristics at boot/load time as well as a device attribute for changing the client scsi channel request amount. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 75 ++ drivers/scsi/ibmvscsi/ibmvfc.h | 1 + 2 files changed, 67 insertions(+), 9 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index d314dffaafc4..7097028d4cb6 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -40,6 +40,12 @@ static unsigned int disc_threads = IBMVFC_MAX_DISC_THREADS; static unsigned int ibmvfc_debug = IBMVFC_DEBUG; static unsigned int log_level = IBMVFC_DEFAULT_LOG_LEVEL; static unsigned int cls3_error = IBMVFC_CLS3_ERROR; +static unsigned int mq_enabled = IBMVFC_MQ; +static unsigned int nr_scsi_hw_queues = IBMVFC_SCSI_HW_QUEUES; +static unsigned int nr_scsi_channels = IBMVFC_SCSI_CHANNELS; +static unsigned int mig_channels_only = IBMVFC_MIG_NO_SUB_TO_CRQ; +static unsigned int mig_no_less_channels = IBMVFC_MIG_NO_N_TO_M; + static LIST_HEAD(ibmvfc_head); static DEFINE_SPINLOCK(ibmvfc_driver_lock); static struct scsi_transport_template *ibmvfc_transport_template; @@ -49,6 +55,22 @@ MODULE_AUTHOR("Brian King "); MODULE_LICENSE("GPL"); MODULE_VERSION(IBMVFC_DRIVER_VERSION); +module_param_named(mq, mq_enabled, uint, S_IRUGO); +MODULE_PARM_DESC(mq, "Enable multiqueue support. " +"[Default=" __stringify(IBMVFC_MQ) "]"); +module_param_named(scsi_host_queues, nr_scsi_hw_queues, uint, S_IRUGO); +MODULE_PARM_DESC(scsi_host_queues, "Number of SCSI Host submission queues. " +"[Default=" __stringify(IBMVFC_SCSI_HW_QUEUES) "]"); +module_param_named(scsi_hw_channels, nr_scsi_channels, uint, S_IRUGO); +MODULE_PARM_DESC(scsi_hw_channels, "Number of hw scsi channels to request. 
" +"[Default=" __stringify(IBMVFC_SCSI_CHANNELS) "]"); +module_param_named(mig_channels_only, mig_channels_only, uint, S_IRUGO); +MODULE_PARM_DESC(mig_channels_only, "Prevent migration to non-channelized system. " +"[Default=" __stringify(IBMVFC_MIG_NO_SUB_TO_CRQ) "]"); +module_param_named(mig_no_less_channels, mig_no_less_channels, uint, S_IRUGO); +MODULE_PARM_DESC(mig_no_less_channels, "Prevent migration to system with less channels. " +"[Default=" __stringify(IBMVFC_MIG_NO_N_TO_M) "]"); + module_param_named(init_timeout, init_timeout, uint, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(init_timeout, "Initialization timeout in seconds. " "[Default=" __stringify(IBMVFC_INIT_TIMEOUT) "]"); @@ -926,7 +948,7 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost) crq->cur = 0; if (vhost->scsi_scrqs.scrqs) { - for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) { + for (i = 0; i < nr_scsi_hw_queues; i++) { scrq = &vhost->scsi_scrqs.scrqs[i]; spin_lock(scrq->q_lock); memset(scrq->msgs.scrq, 0, PAGE_SIZE); @@ -3397,6 +3419,37 @@ static ssize_t ibmvfc_store_log_level(struct device *dev, return strlen(buf); } +static ssize_t ibmvfc_show_scsi_channels(struct device *dev, +struct device_attribute *attr, char *buf) +{ + struct Scsi_Host *shost = class_to_shost(dev); + struct ibmvfc_host *vhost = shost_priv(shost); + unsigned long flags = 0; + int len; + + spin_lock_irqsave(shost->host_lock, flags); + len = snprintf(buf, PAGE_SIZE, "%d\n", vhost->client_scsi_channels); + spin_unlock_irqrestore(shost->host_lock, flags); + return len; +} + +static ssize_t ibmvfc_store_scsi_channels(struct device *dev, +struct device_attribute *attr, +const char *buf, size_t count) +{ + struct Scsi_Host *shost = class_to_shost(dev); + struct ibmvfc_host *vhost = shost_priv(shost); + unsigned long flags = 0; + unsigned int channels; + + spin_lock_irqsave(shost->host_lock, flags); + channels = simple_strtoul(buf, NULL, 10); + vhost->client_scsi_channels = min(channels, nr_scsi_hw_queues); + 
ibmvfc_hard_reset_host(vhost); + spin_unlock_irqrestore(shost->host_lock, flags); + return strlen(buf); +} + static DEVICE_ATTR(partition_name, S_IRUGO, ibmvfc_show_host_partition_name, NULL); static DEVICE_ATTR(device_name, S_IRUGO, ibmvfc_show_host_device_name, NULL); static DEVICE_ATTR(port_loc_code, S_IRUGO, ibmvfc_show_host_loc_code, NULL); @@ -3405,6 +3458,8 @@ static DEVICE_ATTR(npiv_version, S_IRUGO, ibmvfc_show_host_npiv_version, NULL); static DEVICE_ATTR(capabilities, S_IRUGO, ibmvfc_show_host_capabilities, NULL); static DEVICE_ATTR(log_level, S_IRUGO | S_IWUSR, ibmvfc_show_log_level, ibmvfc_store_log_level); +static DEVICE_ATTR(nr_scsi_channels,
[PATCH v5 15/21] ibmvfc: send commands down HW Sub-CRQ when channelized
When the client has negotiated the use of channels all vfcFrames are required to go down a Sub-CRQ channel or it is a protocol violation. If the adapter state is channelized submit vfcFrames to the appropriate Sub-CRQ via the h_send_sub_crq() helper. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 39 -- 1 file changed, 33 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 3f3cc37a263f..865b87881d86 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -704,6 +704,15 @@ static int ibmvfc_send_crq(struct ibmvfc_host *vhost, u64 word1, u64 word2) return plpar_hcall_norets(H_SEND_CRQ, vdev->unit_address, word1, word2); } +static int ibmvfc_send_sub_crq(struct ibmvfc_host *vhost, u64 cookie, u64 word1, + u64 word2, u64 word3, u64 word4) +{ + struct vio_dev *vdev = to_vio_dev(vhost->dev); + + return plpar_hcall_norets(H_SEND_SUB_CRQ, vdev->unit_address, cookie, + word1, word2, word3, word4); +} + /** * ibmvfc_send_crq_init - Send a CRQ init message * @vhost: ibmvfc host struct @@ -1623,8 +1632,17 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt, mb(); - if ((rc = ibmvfc_send_crq(vhost, be64_to_cpu(crq_as_u64[0]), - be64_to_cpu(crq_as_u64[1])))) { + if (evt->queue->fmt == IBMVFC_SUB_CRQ_FMT) + rc = ibmvfc_send_sub_crq(vhost, +evt->queue->vios_cookie, +be64_to_cpu(crq_as_u64[0]), +be64_to_cpu(crq_as_u64[1]), +0, 0); + else + rc = ibmvfc_send_crq(vhost, be64_to_cpu(crq_as_u64[0]), +be64_to_cpu(crq_as_u64[1])); + + if (rc) { list_del(&evt->queue_list); spin_unlock_irqrestore(&evt->queue->l_lock, flags); del_timer(&evt->timer); @@ -1842,6 +1860,7 @@ static int ibmvfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd) struct ibmvfc_event *evt; u32 tag_and_hwq = blk_mq_unique_tag(cmnd->request); u16 hwq = blk_mq_unique_tag_to_hwq(tag_and_hwq); + u16 scsi_channel; int rc; if (unlikely((rc = fc_remote_port_chkready(rport))) || @@ -1852,7 +1871,13 @@ static int ibmvfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd) } cmnd->result = (DID_OK << 16); - evt = ibmvfc_get_event(&vhost->crq); + if (vhost->using_channels) { + scsi_channel = hwq % vhost->scsi_scrqs.active_queues; + evt = ibmvfc_get_event(&vhost->scsi_scrqs.scrqs[scsi_channel]); + evt->hwq = hwq % vhost->scsi_scrqs.active_queues; + } else + evt = ibmvfc_get_event(&vhost->crq); + ibmvfc_init_event(evt, ibmvfc_scsi_done, IBMVFC_CMD_FORMAT); evt->cmnd = cmnd; @@ -1868,8 +1893,6 @@ static int ibmvfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd) } vfc_cmd->correlation = cpu_to_be64((u64)evt); - if (vhost->using_channels) - evt->hwq = hwq % vhost->scsi_scrqs.active_queues; if (likely(!(rc = ibmvfc_map_sg_data(cmnd, evt, vfc_cmd, vhost->dev)))) return ibmvfc_send_event(evt, vhost, 0); @@ -2200,7 +2223,11 @@ static int ibmvfc_reset_device(struct scsi_device *sdev, int type, char *desc) spin_lock_irqsave(vhost->host->host_lock, flags); if (vhost->state == IBMVFC_ACTIVE) { - evt = ibmvfc_get_event(&vhost->crq); + if (vhost->using_channels) + evt = ibmvfc_get_event(&vhost->scsi_scrqs.scrqs[0]); + else + evt = ibmvfc_get_event(&vhost->crq); + ibmvfc_init_event(evt, ibmvfc_sync_completion, IBMVFC_CMD_FORMAT); tmf = ibmvfc_init_vfc_cmd(evt, sdev); iu = ibmvfc_get_fcp_iu(vhost, tmf); -- 2.27.0
[PATCH v5 13/21] ibmvfc: advertise client support for using hardware channels
Previous patches have plumbed the necessary Sub-CRQ interface and channel negotiation MADs to fully channelize via hardware backed queues. Advertise client support via NPIV Login capability IBMVFC_CAN_USE_CHANNELS when the client bits have MQ enabled via vhost->mq_enabled, or when channels were already in use during a subsequent NPIV Login. The latter is required because channel support is only renegotiated after a CRQ pair is broken. Simple NPIV Logout/Logins require the client to continue to advertise the channel capability until the CRQ pair between the client is broken. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index a00f38558613..0653d52d4ea0 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -1410,6 +1410,10 @@ static void ibmvfc_set_login_info(struct ibmvfc_host *vhost) login_info->max_cmds = cpu_to_be32(max_requests + IBMVFC_NUM_INTERNAL_REQ); login_info->capabilities = cpu_to_be64(IBMVFC_CAN_MIGRATE | IBMVFC_CAN_SEND_VF_WWPN); + + if (vhost->mq_enabled || vhost->using_channels) + login_info->capabilities |= cpu_to_be64(IBMVFC_CAN_USE_CHANNELS); + login_info->async.va = cpu_to_be64(vhost->async_crq.msg_token); login_info->async.len = cpu_to_be32(async_crq->size * sizeof(*async_crq->msgs.async)); -- 2.27.0
[PATCH v5 12/21] ibmvfc: implement channel enquiry and setup commands
New NPIV_ENQUIRY_CHANNEL and NPIV_SETUP_CHANNEL management datagrams (MADs) were defined in a previous patchset. If the client advertises a desire to use channels and the partner VIOS is channel capable then the client must proceed with channel enquiry to determine the maximum number of channels the VIOS is capable of providing, and registering SubCRQs via channel setup with the VIOS immediately following NPIV Login. This handshaking should not be performed for subsequent NPIV Logins unless the CRQ connection has been reset. Implement these two new MADs and issue them following a successful NPIV login where the VIOS has set the SUPPORT_CHANNELS capability bit in the NPIV Login response. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 135 - drivers/scsi/ibmvscsi/ibmvfc.h | 3 + 2 files changed, 136 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index d3d7c6b53d4f..a00f38558613 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -909,6 +909,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost) spin_lock(vhost->crq.q_lock); vhost->state = IBMVFC_NO_CRQ; vhost->logged_in = 0; + vhost->do_enquiry = 1; + vhost->using_channels = 0; /* Clean out the queue */ memset(crq->msgs.crq, 0, PAGE_SIZE); @@ -4586,6 +4588,118 @@ static void ibmvfc_discover_targets(struct ibmvfc_host *vhost) ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD); } +static void ibmvfc_channel_setup_done(struct ibmvfc_event *evt) +{ + struct ibmvfc_host *vhost = evt->vhost; + u32 mad_status = be16_to_cpu(evt->xfer_iu->channel_setup.common.status); + int level = IBMVFC_DEFAULT_LOG_LEVEL; + + ibmvfc_free_event(evt); + + switch (mad_status) { + case IBMVFC_MAD_SUCCESS: + ibmvfc_dbg(vhost, "Channel Setup succeded\n"); + vhost->do_enquiry = 0; + break; + case IBMVFC_MAD_FAILED: + level += ibmvfc_retry_host_init(vhost); + ibmvfc_log(vhost, level, "Channel Setup failed\n"); + 
fallthrough; + case IBMVFC_MAD_DRIVER_FAILED: + return; + default: + dev_err(vhost->dev, "Invalid Channel Setup response: 0x%x\n", + mad_status); + ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD); + return; + } + + ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_QUERY); + wake_up(&vhost->work_wait_q); +} + +static void ibmvfc_channel_setup(struct ibmvfc_host *vhost) +{ + struct ibmvfc_channel_setup_mad *mad; + struct ibmvfc_channel_setup *setup_buf = vhost->channel_setup_buf; + struct ibmvfc_event *evt = ibmvfc_get_event(&vhost->crq); + + memset(setup_buf, 0, sizeof(*setup_buf)); + setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS); + + ibmvfc_init_event(evt, ibmvfc_channel_setup_done, IBMVFC_MAD_FORMAT); + mad = &evt->iu.channel_setup; + memset(mad, 0, sizeof(*mad)); + mad->common.version = cpu_to_be32(1); + mad->common.opcode = cpu_to_be32(IBMVFC_CHANNEL_SETUP); + mad->common.length = cpu_to_be16(sizeof(*mad)); + mad->buffer.va = cpu_to_be64(vhost->channel_setup_dma); + mad->buffer.len = cpu_to_be32(sizeof(*vhost->channel_setup_buf)); + + ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_INIT_WAIT); + + if (!ibmvfc_send_event(evt, vhost, default_timeout)) + ibmvfc_dbg(vhost, "Sent channel setup\n"); + else + ibmvfc_link_down(vhost, IBMVFC_LINK_DOWN); +} + +static void ibmvfc_channel_enquiry_done(struct ibmvfc_event *evt) +{ + struct ibmvfc_host *vhost = evt->vhost; + struct ibmvfc_channel_enquiry *rsp = &evt->xfer_iu->channel_enquiry; + u32 mad_status = be16_to_cpu(rsp->common.status); + int level = IBMVFC_DEFAULT_LOG_LEVEL; + + switch (mad_status) { + case IBMVFC_MAD_SUCCESS: + ibmvfc_dbg(vhost, "Channel Enquiry succeeded\n"); + vhost->max_vios_scsi_channels = be32_to_cpu(rsp->num_scsi_subq_channels); + ibmvfc_free_event(evt); + break; + case IBMVFC_MAD_FAILED: + level += ibmvfc_retry_host_init(vhost); + ibmvfc_log(vhost, level, "Channel Enquiry failed\n"); + fallthrough; + case IBMVFC_MAD_DRIVER_FAILED: + ibmvfc_free_event(evt); + return; + default: + dev_err(vhost->dev, "Invalid 
Channel Enquiry response: 0x%x\n", + mad_status); + ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD); + ibmvfc_free_event(evt); + return; + } + + ibmvfc_channel_setup(vhost); +} + +static void ibmvfc_channel_enquiry(struct ibmvfc_host *vhost) +{ + struct
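The enquiry response only records max_vios_scsi_channels; the actual channel count is decided later, when channel setup submits the smaller of what the client wants and what the VIOS reported (the driver uses the kernel's min() for this in a later patch of the series). A one-line sketch of that negotiation, with an invented helper name:

```c
#include <assert.h>

/* Hypothetical helper: the enquiry/setup handshake reduces to taking the
 * smaller of the client's requested channel count and the VIOS maximum. */
static unsigned int negotiated_channels(unsigned int client_req,
					unsigned int vios_max)
{
	return client_req < vios_max ? client_req : vios_max;
}
```

A request of zero channels degenerates to the cancel case handled by channel setup.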
[PATCH v5 16/21] ibmvfc: register Sub-CRQ handles with VIOS during channel setup
If the ibmvfc client adapter requests channels it must submit a number of Sub-CRQ handles matching the number of channels being requested. In its response the VIOS overwrites this field with the actual number of channel resources allocated, which may be less than what was requested. The client then must store the VIOS Sub-CRQ handle for each queue. This VIOS handle is needed as a parameter with h_send_sub_crq(). Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 32 +++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 865b87881d86..578e27180f10 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -4627,15 +4627,35 @@ static void ibmvfc_discover_targets(struct ibmvfc_host *vhost) static void ibmvfc_channel_setup_done(struct ibmvfc_event *evt) { struct ibmvfc_host *vhost = evt->vhost; + struct ibmvfc_channel_setup *setup = vhost->channel_setup_buf; + struct ibmvfc_scsi_channels *scrqs = &vhost->scsi_scrqs; u32 mad_status = be16_to_cpu(evt->xfer_iu->channel_setup.common.status); int level = IBMVFC_DEFAULT_LOG_LEVEL; + int flags, active_queues, i; ibmvfc_free_event(evt); switch (mad_status) { case IBMVFC_MAD_SUCCESS: ibmvfc_dbg(vhost, "Channel Setup succeeded\n"); + flags = be32_to_cpu(setup->flags); vhost->do_enquiry = 0; + active_queues = be32_to_cpu(setup->num_scsi_subq_channels); + scrqs->active_queues = active_queues; + + if (flags & IBMVFC_CHANNELS_CANCELED) { + ibmvfc_dbg(vhost, "Channels Canceled\n"); + vhost->using_channels = 0; + } else { + if (active_queues) + vhost->using_channels = 1; + for (i = 0; i < active_queues; i++) + scrqs->scrqs[i].vios_cookie = + be64_to_cpu(setup->channel_handles[i]); + + ibmvfc_dbg(vhost, "Using %u channels\n", + vhost->scsi_scrqs.active_queues); + } break; case IBMVFC_MAD_FAILED: level += ibmvfc_retry_host_init(vhost); @@ -4659,9 +4679,19 @@ static void ibmvfc_channel_setup(struct ibmvfc_host 
*vhost) struct ibmvfc_channel_setup_mad *mad; struct ibmvfc_channel_setup *setup_buf = vhost->channel_setup_buf; struct ibmvfc_event *evt = ibmvfc_get_event(&vhost->crq); + struct ibmvfc_scsi_channels *scrqs = &vhost->scsi_scrqs; + unsigned int num_channels = + min(vhost->client_scsi_channels, vhost->max_vios_scsi_channels); + int i; memset(setup_buf, 0, sizeof(*setup_buf)); - setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS); + if (num_channels == 0) + setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS); + else { + setup_buf->num_scsi_subq_channels = cpu_to_be32(num_channels); + for (i = 0; i < num_channels; i++) + setup_buf->channel_handles[i] = cpu_to_be64(scrqs->scrqs[i].cookie); + } ibmvfc_init_event(evt, ibmvfc_channel_setup_done, IBMVFC_MAD_FORMAT); mad = &evt->iu.channel_setup; -- 2.27.0
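The handle exchange above is an endianness contract: cookies live in CPU byte order on the client, but cross the MAD buffer as big-endian (cpu_to_be64() on the way out, be64_to_cpu() on the way back). A minimal sketch of the 64-bit swap — this is what those conversions compile to on a little-endian host; the helper name here is made up, and the swap is its own inverse:

```c
#include <assert.h>
#include <stdint.h>

/* Unconditional 64-bit byte reversal: equivalent to cpu_to_be64() (and
 * be64_to_cpu(), its inverse) on a little-endian CPU. */
static uint64_t swap_be64(uint64_t v)
{
	uint64_t r = 0;
	int i;

	/* Peel bytes off v least-significant first; each shift of r pushes
	 * them back most-significant first, reversing the order. */
	for (i = 0; i < 8; i++)
		r = (r << 8) | ((v >> (8 * i)) & 0xff);
	return r;
}
```

On a big-endian POWER host the kernel's cpu_to_be64() is instead a no-op, which is why the driver never open-codes the swap.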
[PATCH v5 11/21] ibmvfc: map/request irq and register Sub-CRQ interrupt handler
Create an irq mapping for the hw_irq number provided from phyp firmware. Request an irq and assign it our Sub-CRQ interrupt handler. Unmap these irqs at Sub-CRQ teardown. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 25 + 1 file changed, 25 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 51bcafad9490..d3d7c6b53d4f 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -5292,12 +5292,34 @@ static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost, goto reg_failed; } + scrq->irq = irq_create_mapping(NULL, scrq->hw_irq); + + if (!scrq->irq) { + rc = -EINVAL; + dev_err(dev, "Error mapping sub-crq[%d] irq\n", index); + goto irq_failed; + } + + snprintf(scrq->name, sizeof(scrq->name), "ibmvfc-%x-scsi%d", +vdev->unit_address, index); + rc = request_irq(scrq->irq, ibmvfc_interrupt_scsi, 0, scrq->name, scrq); + + if (rc) { + dev_err(dev, "Couldn't register sub-crq[%d] irq\n", index); + irq_dispose_mapping(scrq->irq); + goto irq_failed; + } + scrq->hwq_id = index; scrq->vhost = vhost; LEAVE; return 0; +irq_failed: + do { + rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, scrq->cookie); + } while (rc == H_BUSY || H_IS_LONG_BUSY(rc)); reg_failed: ibmvfc_free_queue(vhost, scrq); LEAVE; @@ -5313,6 +5335,9 @@ static void ibmvfc_deregister_scsi_channel(struct ibmvfc_host *vhost, int index) ENTER; + free_irq(scrq->irq, scrq); + irq_dispose_mapping(scrq->irq); + do { rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, scrq->cookie); -- 2.27.0
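The error handling added here follows the kernel's goto-unwind idiom: each failure label releases exactly what was acquired before the failing step (sub-CRQ registration, then the irq mapping, then the irq itself), in reverse order. A toy model of that ordering — every name below is invented, and the bitmask simply stands in for held resources:

```c
#include <assert.h>

/* Resources acquired, in order, by the registration path. */
enum { REG = 1, MAP = 2, IRQ = 4 };

/* fail_at: 0 = success, 1 = irq mapping fails, 2 = request_irq fails.
 * Returns the bitmask of resources still held on exit; any failure must
 * leave it at 0, success leaves all three held. */
static int register_channel(int fail_at)
{
	int held = 0;

	held |= REG;			/* h_reg_sub_crq() succeeded */
	if (fail_at == 1)
		goto irq_failed;	/* irq_create_mapping() failed */
	held |= MAP;
	if (fail_at == 2) {
		held &= ~MAP;		/* inline irq_dispose_mapping() */
		goto irq_failed;	/* request_irq() failed */
	}
	held |= IRQ;
	return held;			/* success: REG | MAP | IRQ */

irq_failed:
	held &= ~REG;			/* H_FREE_SUB_CRQ retry loop + free_queue */
	return held;
}
```

The deregistration hunk mirrors the same order in reverse: free_irq(), then irq_dispose_mapping(), then H_FREE_SUB_CRQ.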
[PATCH v5 03/21] ibmvfc: init/free event pool during queue allocation/free
The event pool and CRQ used to be separate entities of the adapter host structure and as such were allocated and freed independently of each other. Recent work has defined a generic queue structure with an event pool specific to each queue. As such the event pool for each queue shouldn't be allocated/freed independently, but instead performed as part of the queue allocation/free routines. Move the calls to ibmvfc_event_pool_{init|free} into ibmvfc_{alloc|free}_queue respectively. The only functional change here is that the CRQ cannot be released in ibmvfc_remove until after the event pool has been successfully purged, since releasing the queue will also free the event pool. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index cd9273a5fadb..9330f5a65a7e 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -807,6 +807,8 @@ static void ibmvfc_free_queue(struct ibmvfc_host *vhost, dma_unmap_single(dev, queue->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL); free_page((unsigned long)queue->msgs.handle); queue->msgs.handle = NULL; + + ibmvfc_free_event_pool(vhost, queue); } /** @@ -5019,6 +5021,10 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost, switch (fmt) { case IBMVFC_CRQ_FMT: fmt_size = sizeof(*queue->msgs.crq); + if (ibmvfc_init_event_pool(vhost, queue)) { + dev_err(dev, "Couldn't initialize event pool.\n"); + return -ENOMEM; + } break; case IBMVFC_ASYNC_FMT: fmt_size = sizeof(*queue->msgs.async); @@ -5333,13 +5339,8 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) goto kill_kthread; } - if ((rc = ibmvfc_init_event_pool(vhost, &vhost->crq))) { - dev_err(dev, "Couldn't initialize event pool. rc=%d\n", rc); - goto release_crq; - } - if ((rc = scsi_add_host(shost, dev))) - goto release_event_pool; + goto release_crq; fc_host_dev_loss_tmo(shost) = IBMVFC_DEV_LOSS_TMO; @@ -5362,8 +5363,6 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) remove_shost: scsi_remove_host(shost); -release_event_pool: - ibmvfc_free_event_pool(vhost, &vhost->crq); release_crq: ibmvfc_release_crq_queue(vhost); kill_kthread: @@ -5398,7 +5397,6 @@ static int ibmvfc_remove(struct vio_dev *vdev) spin_unlock_irqrestore(vhost->host->host_lock, flags); ibmvfc_wait_while_resetting(vhost); - ibmvfc_release_crq_queue(vhost); kthread_stop(vhost->work_thread); fc_remove_host(vhost->host); scsi_remove_host(vhost->host); @@ -5408,7 +5406,7 @@ static int ibmvfc_remove(struct vio_dev *vdev) list_splice_init(&vhost->purge, &purge); spin_unlock_irqrestore(vhost->host->host_lock, flags); ibmvfc_complete_purge(&purge); - ibmvfc_free_event_pool(vhost, &vhost->crq); + ibmvfc_release_crq_queue(vhost); ibmvfc_free_mem(vhost); spin_lock(&ibmvfc_driver_lock); -- 2.27.0
[PATCH v5 09/21] ibmvfc: add handlers to drain and complete Sub-CRQ responses
The logic for iterating over the Sub-CRQ responses is similar to that of the primary CRQ. Add the necessary handlers for processing those responses. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 86 ++ 1 file changed, 86 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 5d7ada0ed0d6..f3cd092478ee 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3485,6 +3485,92 @@ static int ibmvfc_toggle_scrq_irq(struct ibmvfc_queue *scrq, int enable) return rc; } +static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost, + struct list_head *evt_doneq) +{ + struct ibmvfc_event *evt = (struct ibmvfc_event *)be64_to_cpu(crq->ioba); + + switch (crq->valid) { + case IBMVFC_CRQ_CMD_RSP: + break; + case IBMVFC_CRQ_XPORT_EVENT: + return; + default: + dev_err(vhost->dev, "Got an invalid message type 0x%02x\n", crq->valid); + return; + } + + /* The only kind of payload CRQs we should get are responses to +* things we send. Make sure this response is to something we +* actually sent +*/ + if (unlikely(!ibmvfc_valid_event(&evt->queue->evt_pool, evt))) { + dev_err(vhost->dev, "Returned correlation_token 0x%08llx is invalid!\n", + crq->ioba); + return; + } + + if (unlikely(atomic_read(&evt->free))) { + dev_err(vhost->dev, "Received duplicate correlation_token 0x%08llx!\n", + crq->ioba); + return; + } + + spin_lock(&evt->queue->l_lock); + list_move_tail(&evt->queue_list, evt_doneq); + spin_unlock(&evt->queue->l_lock); +} + +static struct ibmvfc_crq *ibmvfc_next_scrq(struct ibmvfc_queue *scrq) +{ + struct ibmvfc_crq *crq; + + crq = &scrq->msgs.scrq[scrq->cur].crq; + if (crq->valid & 0x80) { + if (++scrq->cur == scrq->size) + scrq->cur = 0; + rmb(); + } else + crq = NULL; + + return crq; +} + +static void ibmvfc_drain_sub_crq(struct ibmvfc_queue *scrq) +{ + struct ibmvfc_crq *crq; + struct ibmvfc_event *evt, *temp; + unsigned long flags; + int done = 0; + LIST_HEAD(evt_doneq); + + spin_lock_irqsave(scrq->q_lock, flags); + while (!done) { + while ((crq = ibmvfc_next_scrq(scrq)) != NULL) { + ibmvfc_handle_scrq(crq, scrq->vhost, &evt_doneq); + crq->valid = 0; + wmb(); + } + + ibmvfc_toggle_scrq_irq(scrq, 1); + if ((crq = ibmvfc_next_scrq(scrq)) != NULL) { + ibmvfc_toggle_scrq_irq(scrq, 0); + ibmvfc_handle_scrq(crq, scrq->vhost, &evt_doneq); + crq->valid = 0; + wmb(); + } else + done = 1; + } + spin_unlock_irqrestore(scrq->q_lock, flags); + + list_for_each_entry_safe(evt, temp, &evt_doneq, queue_list) { + del_timer(&evt->timer); + list_del(&evt->queue_list); + ibmvfc_trc_end(evt); + evt->done(evt); + } +} + /** * ibmvfc_init_tgt - Set the next init job step for the target * @tgt: ibmvfc target struct -- 2.27.0
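ibmvfc_next_scrq() is the classic valid-bit circular-buffer walk: consume while the current slot's 0x80 bit is set, re-enable the interrupt, then poll one more time to close the race against firmware posting a response just before the re-enable. A host-side toy model of the cursor logic — all names are invented and there are no kernel dependencies, so the memory barriers and locking are omitted:

```c
#include <assert.h>
#include <stddef.h>

struct crq_elem {
	unsigned char valid;	/* firmware sets 0x80 when the slot holds a response */
};

struct toy_scrq {
	struct crq_elem msgs[4];
	size_t cur;		/* consumer cursor */
	size_t size;
};

/* Mirrors ibmvfc_next_scrq(): return the current element if its valid
 * bit is set, advancing and wrapping the cursor; otherwise NULL. */
static struct crq_elem *next_scrq(struct toy_scrq *q)
{
	struct crq_elem *crq = &q->msgs[q->cur];

	if (crq->valid & 0x80) {
		if (++q->cur == q->size)
			q->cur = 0;
		return crq;
	}
	return NULL;
}

/* Mirrors the inner loop of ibmvfc_drain_sub_crq(): consume every valid
 * element, handing each slot back by clearing its valid bit. */
static size_t drain(struct toy_scrq *q)
{
	struct crq_elem *crq;
	size_t n = 0;

	while ((crq = next_scrq(q)) != NULL) {
		crq->valid = 0;
		n++;
	}
	return n;
}

/* Build a 4-slot queue with the first nvalid slots valid, then drain it. */
static size_t drain_n_valid(size_t nvalid)
{
	struct toy_scrq q = { .size = 4 };
	size_t i;

	for (i = 0; i < nvalid && i < 4; i++)
		q.msgs[i].valid = 0x80;
	return drain(&q);
}
```

In the real driver the rmb()/wmb() pairs order the valid-bit read against the payload read and the payload clear against the valid-bit clear, which the toy model does not need.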
[PATCH v5 05/21] ibmvfc: define hcall wrapper for registering a Sub-CRQ
Sub-CRQs are registered with firmware via a hypercall. Abstract that interface into a simpler helper function. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 524e81164d70..612c7f3d7bd3 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -138,6 +138,20 @@ static void ibmvfc_tgt_move_login(struct ibmvfc_target *); static const char *unknown_error = "unknown error"; +static long h_reg_sub_crq(unsigned long unit_address, unsigned long ioba, + unsigned long length, unsigned long *cookie, + unsigned long *irq) +{ + unsigned long retbuf[PLPAR_HCALL_BUFSIZE]; + long rc; + + rc = plpar_hcall(H_REG_SUB_CRQ, retbuf, unit_address, ioba, length); + *cookie = retbuf[0]; + *irq = retbuf[1]; + + return rc; +} + static int ibmvfc_check_caps(struct ibmvfc_host *vhost, unsigned long cap_flags) { u64 host_caps = be64_to_cpu(vhost->login_buf->resp.capabilities); -- 2.27.0
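The wrapper's job is purely ergonomic: plpar_hcall() returns its outputs through a fixed-size buffer, and h_reg_sub_crq() unpacks the two the caller cares about (the sub-CRQ cookie and the hardware irq number) into out-parameters. A standalone sketch of that shape, with the hypercall replaced by a fake that fills in fixed values (all values and the fake's name are invented):

```c
#include <assert.h>

#define PLPAR_HCALL_BUFSIZE 4	/* size of the real hypercall return buffer */

/* Stand-in for plpar_hcall(): real firmware fills retbuf with outputs;
 * this fake returns fixed values so the wrapper can be exercised. */
static long fake_plpar_hcall(unsigned long retbuf[PLPAR_HCALL_BUFSIZE])
{
	retbuf[0] = 0x1234;	/* sub-CRQ cookie */
	retbuf[1] = 42;		/* hardware irq number */
	return 0;		/* H_SUCCESS */
}

/* Mirrors the shape of h_reg_sub_crq(): make the call, then unpack the
 * two outputs the caller cares about from the return buffer. */
static long reg_sub_crq(unsigned long *cookie, unsigned long *irq)
{
	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
	long rc = fake_plpar_hcall(retbuf);

	*cookie = retbuf[0];
	*irq = retbuf[1];
	return rc;
}
```

Callers then check only the long return code, exactly as the later registration patch does with h_reg_sub_crq().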
[PATCH v5 08/21] ibmvfc: add Sub-CRQ IRQ enable/disable routine
Each Sub-CRQ has its own interrupt. A hypercall is required to toggle the IRQ state. Provide the necessary mechanism via a helper function. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 20 1 file changed, 20 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index a198e118887d..5d7ada0ed0d6 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3465,6 +3465,26 @@ static void ibmvfc_tasklet(void *data) } } +static int ibmvfc_toggle_scrq_irq(struct ibmvfc_queue *scrq, int enable) +{ + struct device *dev = scrq->vhost->dev; + struct vio_dev *vdev = to_vio_dev(dev); + unsigned long rc; + int irq_action = H_ENABLE_VIO_INTERRUPT; + + if (!enable) + irq_action = H_DISABLE_VIO_INTERRUPT; + + rc = plpar_hcall_norets(H_VIOCTL, vdev->unit_address, irq_action, + scrq->hw_irq, 0, 0); + + if (rc) + dev_err(dev, "Couldn't %s sub-crq[%lu] irq. rc=%ld\n", + enable ? "enable" : "disable", scrq->hwq_id, rc); + + return rc; +} + /** * ibmvfc_init_tgt - Set the next init job step for the target * @tgt: ibmvfc target struct -- 2.27.0
[PATCH v5 07/21] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels
Allocate a set of Sub-CRQs in advance. During channel setup the client and VIOS negotiate the number of queues the VIOS supports and the number that the client desires to request. It's possible that the final number of channel resources allocated is less than requested, but the client is still responsible for sending handles for every queue it is hoping for. Also, provide deallocation cleanup routines. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 125 + drivers/scsi/ibmvscsi/ibmvfc.h | 1 + 2 files changed, 126 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 612c7f3d7bd3..a198e118887d 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -895,6 +895,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost) unsigned long flags; struct vio_dev *vdev = to_vio_dev(vhost->dev); struct ibmvfc_queue *crq = &vhost->crq; + struct ibmvfc_queue *scrq; + int i; /* Close the CRQ */ do { @@ -912,6 +914,16 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost) memset(crq->msgs.crq, 0, PAGE_SIZE); crq->cur = 0; + if (vhost->scsi_scrqs.scrqs) { + for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) { + scrq = &vhost->scsi_scrqs.scrqs[i]; + spin_lock(scrq->q_lock); + memset(scrq->msgs.scrq, 0, PAGE_SIZE); + scrq->cur = 0; + spin_unlock(scrq->q_lock); + } + } + /* And re-open it again */ rc = plpar_hcall_norets(H_REG_CRQ, vdev->unit_address, crq->msg_token, PAGE_SIZE); @@ -5045,6 +5057,11 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost, case IBMVFC_ASYNC_FMT: fmt_size = sizeof(*queue->msgs.async); break; + case IBMVFC_SUB_CRQ_FMT: + fmt_size = sizeof(*queue->msgs.scrq); + /* We need one extra event for Cancel Commands */ + pool_size = max_requests + 1; + break; default: dev_warn(dev, "Unknown command/response queue message format: %d\n", fmt); return -EINVAL; @@ -5136,6 +5153,107 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost) return retrc; } +static int 
ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost, + int index) +{ + struct device *dev = vhost->dev; + struct vio_dev *vdev = to_vio_dev(dev); + struct ibmvfc_queue *scrq = &vhost->scsi_scrqs.scrqs[index]; + int rc = -ENOMEM; + + ENTER; + + if (ibmvfc_alloc_queue(vhost, scrq, IBMVFC_SUB_CRQ_FMT)) + return -ENOMEM; + + rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE, + &scrq->cookie, &scrq->hw_irq); + + if (rc) { + dev_warn(dev, "Error registering sub-crq: %d\n", rc); + if (rc == H_PARAMETER) + dev_warn_once(dev, "Firmware may not support MQ\n"); + goto reg_failed; + } + + scrq->hwq_id = index; + scrq->vhost = vhost; + + LEAVE; + return 0; + +reg_failed: + ibmvfc_free_queue(vhost, scrq); + LEAVE; + return rc; +} + +static void ibmvfc_deregister_scsi_channel(struct ibmvfc_host *vhost, int index) +{ + struct device *dev = vhost->dev; + struct vio_dev *vdev = to_vio_dev(dev); + struct ibmvfc_queue *scrq = &vhost->scsi_scrqs.scrqs[index]; + long rc; + + ENTER; + + do { + rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, + scrq->cookie); + } while (rc == H_BUSY || H_IS_LONG_BUSY(rc)); + + if (rc) + dev_err(dev, "Failed to free sub-crq[%d]: rc=%ld\n", index, rc); + + ibmvfc_free_queue(vhost, scrq); + LEAVE; +} + +static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost) +{ + int i, j; + + ENTER; + + vhost->scsi_scrqs.scrqs = kcalloc(IBMVFC_SCSI_HW_QUEUES, + sizeof(*vhost->scsi_scrqs.scrqs), + GFP_KERNEL); + if (!vhost->scsi_scrqs.scrqs) + return -1; + + for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) { + if (ibmvfc_register_scsi_channel(vhost, i)) { + for (j = i; j > 0; j--) + ibmvfc_deregister_scsi_channel(vhost, j - 1); + kfree(vhost->scsi_scrqs.scrqs); + vhost->scsi_scrqs.scrqs = NULL; + vhost->scsi_scrqs.active_queues = 0; + LEAVE; + return -1; + } + } + + LEAVE; + return 0; +} + +static void ibmvfc_release_sub_crqs(struct
[PATCH v5 04/21] ibmvfc: add size parameter to ibmvfc_init_event_pool
With the upcoming addition of Sub-CRQs the event pool size may vary per-queue. Add a size parameter to ibmvfc_init_event_pool such that different size event pools can be requested by ibmvfc_alloc_queue. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 9330f5a65a7e..524e81164d70 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -723,19 +723,23 @@ static int ibmvfc_send_crq_init_complete(struct ibmvfc_host *vhost) * Returns zero on success. **/ static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost, - struct ibmvfc_queue *queue) + struct ibmvfc_queue *queue, + unsigned int size) { int i; struct ibmvfc_event_pool *pool = &queue->evt_pool; ENTER; - pool->size = max_requests + IBMVFC_NUM_INTERNAL_REQ; - pool->events = kcalloc(pool->size, sizeof(*pool->events), GFP_KERNEL); + if (!size) + return 0; + + pool->size = size; + pool->events = kcalloc(size, sizeof(*pool->events), GFP_KERNEL); if (!pool->events) return -ENOMEM; pool->iu_storage = dma_alloc_coherent(vhost->dev, - pool->size * sizeof(*pool->iu_storage), + size * sizeof(*pool->iu_storage), &pool->iu_token, 0); if (!pool->iu_storage) { @@ -747,7 +751,7 @@ static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost, INIT_LIST_HEAD(&queue->free); spin_lock_init(&queue->l_lock); - for (i = 0; i < pool->size; ++i) { + for (i = 0; i < size; ++i) { struct ibmvfc_event *evt = &pool->events[i]; atomic_set(&evt->free, 1); @@ -5013,6 +5017,7 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost, { struct device *dev = vhost->dev; size_t fmt_size; + unsigned int pool_size = 0; ENTER; spin_lock_init(&queue->_lock); @@ -5021,10 +5026,7 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost, switch (fmt) { case IBMVFC_CRQ_FMT: fmt_size = sizeof(*queue->msgs.crq); - if (ibmvfc_init_event_pool(vhost, queue)) { - dev_err(dev, "Couldn't initialize event pool.\n"); - return -ENOMEM; - } + pool_size = max_requests + IBMVFC_NUM_INTERNAL_REQ; break; case IBMVFC_ASYNC_FMT: fmt_size = sizeof(*queue->msgs.async); @@ -5034,6 +5036,11 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost, return -EINVAL; } + if (ibmvfc_init_event_pool(vhost, queue, pool_size)) { + dev_err(dev, "Couldn't initialize event pool.\n"); + return -ENOMEM; + } + queue->msgs.handle = (void *)get_zeroed_page(GFP_KERNEL); if (!queue->msgs.handle) return -ENOMEM; -- 2.27.0
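The per-format sizing this patch introduces can be sketched as a pure function: the primary CRQ gets a full pool, the async queue gets none (a size of zero makes the init a no-op), and a later patch in the series gives Sub-CRQs max_requests plus one extra event for Cancel Commands. The constants below are illustrative stand-ins, not the driver's values:

```c
#include <assert.h>

enum msg_fmt { CRQ_FMT, ASYNC_FMT, SUB_CRQ_FMT };

#define MAX_REQUESTS		100	/* stands in for the max_requests module parameter */
#define NUM_INTERNAL_REQ	4	/* illustrative; the driver's constant differs */

/* Mirrors the sizing decision moved into ibmvfc_alloc_queue(). */
static unsigned int pool_size_for(enum msg_fmt fmt)
{
	switch (fmt) {
	case CRQ_FMT:
		return MAX_REQUESTS + NUM_INTERNAL_REQ;
	case SUB_CRQ_FMT:
		return MAX_REQUESTS + 1;	/* one extra event for Cancel */
	case ASYNC_FMT:
	default:
		return 0;			/* no event pool at all */
	}
}
```

Centralizing the size here is what lets a single ibmvfc_init_event_pool(vhost, queue, pool_size) call serve every queue type.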
[PATCH v5 06/21] ibmvfc: add Subordinate CRQ definitions
Subordinate Command Response Queues (Sub-CRQs) are used in conjunction with the primary CRQ when more than one queue is needed by the virtual IO adapter. Recent phyp firmware versions support Sub-CRQs with ibmvfc adapters. This feature is a prerequisite for supporting multiple hardware-backed submission queues in the vfc adapter. The Sub-CRQ command element differs from the standard CRQ in that it is 32 bytes long, as opposed to 16 bytes for the latter. Despite the extra 16 bytes, the ibmvfc protocol will initially use the original CRQ command element mapped to the first 16 bytes of the Sub-CRQ element. Add definitions for the Sub-CRQ command element and queue. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.h | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index dd6d89292867..b9eed05c165f 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -650,6 +650,11 @@ struct ibmvfc_crq { volatile __be64 ioba; } __packed __aligned(8); +struct ibmvfc_sub_crq { + struct ibmvfc_crq crq; + __be64 reserved[2]; +} __packed __aligned(8); + enum ibmvfc_ae_link_state { IBMVFC_AE_LS_LINK_UP = 0x01, IBMVFC_AE_LS_LINK_BOUNCED = 0x02, @@ -761,12 +766,14 @@ struct ibmvfc_event_pool { enum ibmvfc_msg_fmt { IBMVFC_CRQ_FMT = 0, IBMVFC_ASYNC_FMT, + IBMVFC_SUB_CRQ_FMT, }; union ibmvfc_msgs { void *handle; struct ibmvfc_crq *crq; struct ibmvfc_async_crq *async; + struct ibmvfc_sub_crq *scrq; }; struct ibmvfc_queue { @@ -781,6 +788,20 @@ struct ibmvfc_queue { struct list_head sent; struct list_head free; spinlock_t l_lock; + + /* Sub-CRQ fields */ + struct ibmvfc_host *vhost; + unsigned long cookie; + unsigned long vios_cookie; + unsigned long hw_irq; + unsigned long irq; + unsigned long hwq_id; + char name[32]; +}; + +struct ibmvfc_scsi_channels { + struct ibmvfc_queue *scrqs; + unsigned int active_queues; }; enum ibmvfc_host_action { -- 2.27.0
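The 16-byte vs 32-byte claim above can be checked with a standalone layout sketch. The leading fields of ibmvfc_crq are not shown in this hunk, so they are collapsed into an 8-byte pad here, and plain uint64_t stands in for __be64; only the overall sizes are being verified:

```c
#include <assert.h>
#include <stdint.h>

/* 16 bytes: the original CRQ element (leading fields collapsed to a pad). */
struct crq_elem {
	uint8_t pad[8];		/* valid/format/reserved fields */
	uint64_t ioba;		/* __be64 in the driver */
} __attribute__((packed, aligned(8)));

/* 32 bytes: the Sub-CRQ element; the original element maps onto its
 * first 16 bytes, exactly as the commit message describes. */
struct sub_crq_elem {
	struct crq_elem crq;
	uint64_t reserved[2];
} __attribute__((packed, aligned(8)));
```

Embedding the old element at offset zero is what lets the protocol keep using the existing CRQ parsing on Sub-CRQ responses for now.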
[PATCH v5 01/21] ibmvfc: add vhost fields and defaults for MQ enablement
Introduce several new vhost fields for managing MQ state of the adapter as well as initial defaults for MQ enablement. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 8 drivers/scsi/ibmvscsi/ibmvfc.h | 9 + 2 files changed, 17 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index ba95438a8912..9200fe49c57e 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = { .max_sectors = IBMVFC_MAX_SECTORS, .shost_attrs = ibmvfc_attrs, .track_queue_depth = 1, + .host_tagset = 1, }; /** @@ -5290,6 +5291,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) shost->max_sectors = IBMVFC_MAX_SECTORS; shost->max_cmd_len = IBMVFC_MAX_CDB_LEN; shost->unique_id = shost->host_no; + shost->nr_hw_queues = IBMVFC_MQ ? IBMVFC_SCSI_HW_QUEUES : 1; vhost = shost_priv(shost); INIT_LIST_HEAD(&vhost->targets); @@ -5300,6 +5302,12 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) vhost->partition_number = -1; vhost->log_level = log_level; vhost->task_set = 1; + + vhost->mq_enabled = IBMVFC_MQ; + vhost->client_scsi_channels = IBMVFC_SCSI_CHANNELS; + vhost->using_channels = 0; + vhost->do_enquiry = 1; + strcpy(vhost->partition_name, "UNKNOWN"); init_waitqueue_head(&vhost->work_wait_q); init_waitqueue_head(&vhost->init_wait_q); diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index 632e977449c5..dd6d89292867 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -41,6 +41,11 @@ #define IBMVFC_DEFAULT_LOG_LEVEL 2 #define IBMVFC_MAX_CDB_LEN 16 #define IBMVFC_CLS3_ERROR 0 +#define IBMVFC_MQ 0 +#define IBMVFC_SCSI_CHANNELS 0 +#define IBMVFC_SCSI_HW_QUEUES 1 +#define IBMVFC_MIG_NO_SUB_TO_CRQ 0 +#define IBMVFC_MIG_NO_N_TO_M 0 /* * Ensure we have resources for ERP and initialization: @@ -840,6 +845,10 @@ struct ibmvfc_host { int delay_init; int scan_complete; int logged_in; + int mq_enabled; + int using_channels; + int do_enquiry; + int client_scsi_channels; int aborting_passthru; int events_to_log; #define IBMVFC_AE_LINKUP 0x0001 -- 2.27.0
[PATCH v5 00/21] ibmvfc: initial MQ development/enablement
Recent updates in pHyp Firmware and VIOS releases provide new infrastructure towards enabling Subordinate Command Response Queues (Sub-CRQs) such that each Sub-CRQ is a channel backed by an actual hardware queue in the FC stack on the partner VIOS. Sub-CRQs are registered with the firmware via hypercalls and then negotiated with the VIOS via new Management Datagrams (MADs) for channel setup. This initial implementation adds the necessary Sub-CRQ framework and implements the new MADs for negotiating and assigning a set of Sub-CRQs to associated VIOS HW backed channels. This latest series is completely rebased and reimplemented on top of the recent ("ibmvfc: MQ prepartory locking work") series. [1] [1] https://lore.kernel.org/linux-scsi/20210106201835.1053593-1-tyr...@linux.ibm.com/ changes in v5: * Addressed comments from brking in following patches: * Patch 18: Drop queue lock in loop after sending cancel event Remove cancel event from list after completion Return -EIO on unknown failure * Patch 21: Removed can_queue rebase artifact and range check user supplied nr_scsi_hw_queue value changes in v4: * Series rebased and reworked on top of previous ibmvfc locking series * Dropped all previous Reviewed-by tags changes in v3: * Patch 4: changed firmware support logging to dev_warn_once * Patch 6: adjusted locking * Patch 15: dropped logging verbosity, moved cancel event tracking into subqueue * Patch 17: removed write permission for migration module parameters drive hard reset after update to num of scsi channels changes in v2: * Patch 4: NULL'd scsi_scrq reference after deallocation * Patch 6: Added switch case to handle XPORT event * Patch 9: fixed ibmvfc_event leak and double free * added support for cancel command with MQ * added parameter toggles for MQ settings Tyrel Datwyler (21): ibmvfc: add vhost fields and defaults for MQ enablement ibmvfc: move event pool init/free routines ibmvfc: init/free event pool during queue allocation/free ibmvfc: add size parameter 
to ibmvfc_init_event_pool ibmvfc: define hcall wrapper for registering a Sub-CRQ ibmvfc: add Subordinate CRQ definitions ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels ibmvfc: add Sub-CRQ IRQ enable/disable routine ibmvfc: add handlers to drain and complete Sub-CRQ responses ibmvfc: define Sub-CRQ interrupt handler routine ibmvfc: map/request irq and register Sub-CRQ interrupt handler ibmvfc: implement channel enquiry and setup commands ibmvfc: advertise client support for using hardware channels ibmvfc: set and track hw queue in ibmvfc_event struct ibmvfc: send commands down HW Sub-CRQ when channelized ibmvfc: register Sub-CRQ handles with VIOS during channel setup ibmvfc: add cancel mad initialization helper ibmvfc: send Cancel MAD down each hw scsi channel ibmvfc: purge scsi channels after transport loss/reset ibmvfc: enable MQ and set reasonable defaults ibmvfc: provide modules parameters for MQ settings drivers/scsi/ibmvscsi/ibmvfc.c | 917 - drivers/scsi/ibmvscsi/ibmvfc.h | 39 ++ 2 files changed, 828 insertions(+), 128 deletions(-) -- 2.27.0
[PATCH v5 02/21] ibmvfc: move event pool init/free routines
The next patch in this series reworks the event pool allocation calls to happen within the individual queue allocation routines instead of as independent calls. Move the init/free routines earlier in ibmvfc.c to prevent undefined reference errors when calling these functions from the queue allocation code. No functional change. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 151 + 1 file changed, 76 insertions(+), 75 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 9200fe49c57e..cd9273a5fadb 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -716,6 +716,82 @@ static int ibmvfc_send_crq_init_complete(struct ibmvfc_host *vhost) return ibmvfc_send_crq(vhost, 0xC002LL, 0); } +/** + * ibmvfc_init_event_pool - Allocates and initializes the event pool for a host + * @vhost: ibmvfc host who owns the event pool + * + * Returns zero on success. + **/ +static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost, + struct ibmvfc_queue *queue) +{ + int i; + struct ibmvfc_event_pool *pool = &queue->evt_pool; + + ENTER; + pool->size = max_requests + IBMVFC_NUM_INTERNAL_REQ; + pool->events = kcalloc(pool->size, sizeof(*pool->events), GFP_KERNEL); + if (!pool->events) + return -ENOMEM; + + pool->iu_storage = dma_alloc_coherent(vhost->dev, + pool->size * sizeof(*pool->iu_storage), + &pool->iu_token, 0); + + if (!pool->iu_storage) { + kfree(pool->events); + return -ENOMEM; + } + + INIT_LIST_HEAD(&queue->sent); + INIT_LIST_HEAD(&queue->free); + spin_lock_init(&queue->l_lock); + + for (i = 0; i < pool->size; ++i) { + struct ibmvfc_event *evt = &pool->events[i]; + + atomic_set(&evt->free, 1); + evt->crq.valid = 0x80; + evt->crq.ioba = cpu_to_be64(pool->iu_token + (sizeof(*evt->xfer_iu) * i)); + evt->xfer_iu = pool->iu_storage + i; + evt->vhost = vhost; + evt->queue = queue; + evt->ext_list = NULL; + list_add_tail(&evt->queue_list, &queue->free); + } + + LEAVE; + return 0; +} + +/** + * ibmvfc_free_event_pool - Frees memory of the event pool of a host + * @vhost: ibmvfc host who owns the event pool + * + **/ +static void ibmvfc_free_event_pool(struct ibmvfc_host *vhost, + struct ibmvfc_queue *queue) +{ + int i; + struct ibmvfc_event_pool *pool = &queue->evt_pool; + + ENTER; + for (i = 0; i < pool->size; ++i) { + list_del(&pool->events[i].queue_list); + BUG_ON(atomic_read(&pool->events[i].free) != 1); + if (pool->events[i].ext_list) + dma_pool_free(vhost->sg_pool, + pool->events[i].ext_list, + pool->events[i].ext_list_token); + } + + kfree(pool->events); + dma_free_coherent(vhost->dev, + pool->size * sizeof(*pool->iu_storage), + pool->iu_storage, pool->iu_token); + LEAVE; +} + /** * ibmvfc_free_queue - Deallocate queue * @vhost: ibmvfc host struct @@ -1312,81 +1388,6 @@ static void ibmvfc_set_login_info(struct ibmvfc_host *vhost) strncpy(login_info->drc_name, location, IBMVFC_MAX_NAME); } -/** - * ibmvfc_init_event_pool - Allocates and initializes the event pool for a host - * @vhost: ibmvfc host who owns the event pool - * - * Returns zero on success. - **/ -static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost, - struct ibmvfc_queue *queue) -{ - int i; - struct ibmvfc_event_pool *pool = &queue->evt_pool; - - ENTER; - pool->size = max_requests + IBMVFC_NUM_INTERNAL_REQ; - pool->events = kcalloc(pool->size, sizeof(*pool->events), GFP_KERNEL); - if (!pool->events) - return -ENOMEM; - - pool->iu_storage = dma_alloc_coherent(vhost->dev, - pool->size * sizeof(*pool->iu_storage), - &pool->iu_token, 0); - - if (!pool->iu_storage) { - kfree(pool->events); - return -ENOMEM; - } - - INIT_LIST_HEAD(&queue->sent); - INIT_LIST_HEAD(&queue->free); - spin_lock_init(&queue->l_lock); - - for (i = 0; i < pool->size; ++i) { - struct ibmvfc_event *evt = &pool->events[i]; - atomic_set(&evt->free, 1); - evt->crq.valid = 0x80; - evt->crq.ioba = cpu_to_be64(pool->iu_token + (sizeof(*evt->xfer_iu) * i)); - evt->xfer_iu = pool->iu_storage + i; - evt->vhost = vhost; - evt->queue =
Re: [PATCH v2 0/7] Rid W=1 warnings in Ethernet
On Thu, 14 Jan 2021, Jakub Kicinski wrote:

> On Thu, 14 Jan 2021 08:33:49 + Lee Jones wrote:
> > On Wed, 13 Jan 2021, Jakub Kicinski wrote:
> > > On Wed, 13 Jan 2021 16:41:16 + Lee Jones wrote:
> > > > Resending the stragglers again.
> > > >
> > > > This set is part of a larger effort attempting to clean-up W=1
> > > > kernel builds, which are currently overwhelmingly riddled with
> > > > niggly little warnings.
> > > >
> > > > v2:
> > > >  - Squashed IBM patches
> > > >  - Fixed real issue in SMSC
> > > >  - Added Andrew's Reviewed-by tags on remainder
> > >
> > > Does not apply, please rebase on net-next/master.
> >
> > These are based on Tuesday's next/master.
>
> What's next/master?

I'm not sure if this is a joke, or not? :)

next/master == Linux Next. The daily merged repo where all of the *-next branches end up to ensure interoperability. It's also the branch that is most heavily tested by the auto-builders to ensure the vast majority of issues are ironed out before hitting Mainline.

> This is net-next:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/

Looks like net-next gets merged into next/master:

commit 452958f1f3d1c8980a8414f9c37c8c6de24c7d32
Merge: 1eabba209a17a f50e2f9f79164
Author: Stephen Rothwell
Date:   Thu Jan 14 10:35:40 2021 +1100

    Merge remote-tracking branch 'net-next/master'

So I'm not sure what it's conflicting with. Do you have patches in net-next that didn't make it into next/master for some reason?

I'll try to rebase again tomorrow. Hopefully I am able to reproduce your issue by then.

> > I just rebased them now with no issue.
> >
> > What conflict are you seeing?
> Applying: net: ethernet: smsc: smc91x: Fix function name in kernel-doc header
> error: patch failed: drivers/net/ethernet/smsc/smc91x.c:2192
> error: drivers/net/ethernet/smsc/smc91x.c: patch does not apply
> Patch failed at 0001 net: ethernet: smsc: smc91x: Fix function name in kernel-doc header
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog
Re: [PATCH] spi: fsl: Fix driver breakage when SPI_CS_HIGH is not set in spi->mode
On Thu, 14 Jan 2021 13:09:37 + (UTC), Christophe Leroy wrote: > Commit 766c6b63aa04 ("spi: fix client driver breakages when using GPIO > descriptors") broke fsl spi driver. > > As now we fully rely on gpiolib for handling the polarity of > chip selects, the driver shall not alter the GPIO value anymore > when SPI_CS_HIGH is not set in spi->mode. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next Thanks! [1/1] spi: fsl: Fix driver breakage when SPI_CS_HIGH is not set in spi->mode commit: 7a2da5d7960a64ee923fe3e31f01a1101052c66f All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
[PATCH 1/3] tty: hvcs: Drop unnecessary if block
If hvcs_probe() succeeded dev_set_drvdata() is called with a non-NULL value, and if hvcs_probe() failed hvcs_remove() isn't called. So there is no way dev_get_drvdata() can return NULL in hvcs_remove() and the check can just go away.

Signed-off-by: Uwe Kleine-König
---
 drivers/tty/hvc/hvcs.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 509d1042825a..3e0461285c34 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -825,9 +825,6 @@ static int hvcs_remove(struct vio_dev *dev)
 	unsigned long flags;
 	struct tty_struct *tty;
 
-	if (!hvcsd)
-		return -ENODEV;
-
 	/* By this time the vty-server won't be getting any more interrupts */
 	spin_lock_irqsave(&hvcsd->lock, flags);
-- 
2.29.2
[PATCH 3/3] tty: vcc: Drop impossible to hit WARN_ON
vcc_get() returns the port that has provided port->index. As the port that is about to be removed isn't removed yet this trivially will find this port. So simplify the call to not assign an identical value to the port pointer and drop the warning that is never hit.

Signed-off-by: Uwe Kleine-König
---
 drivers/tty/vcc.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/vcc.c b/drivers/tty/vcc.c
index d9b0dc6deae9..e2d6205f83ce 100644
--- a/drivers/tty/vcc.c
+++ b/drivers/tty/vcc.c
@@ -692,12 +692,9 @@ static int vcc_remove(struct vio_dev *vdev)
 		tty_vhangup(port->tty);
 
 	/* Get exclusive reference to VCC, ensures that there are no other
-	 * clients to this port
+	 * clients to this port. This cannot fail.
 	 */
-	port = vcc_get(port->index, true);
-
-	if (WARN_ON(!port))
-		return -ENODEV;
+	vcc_get(port->index, true);
 
 	tty_unregister_device(vcc_tty_driver, port->index);
-- 
2.29.2
[PATCH 0/3] tty: some cleanups in remove functions
Hello, while working on changing the prototype of struct vio_driver::remove to return void I noticed a few exit paths in such callbacks that return an error code. This is a bad thing because the return value is ignored (which is the motivation to make it void) and the corresponding device then ends in some limbo state. Luckily for the three offenders here these cases cannot happen and are simplified accordingly. This then makes the patch that changes the remove callback's prototype simpler because it only changes prototypes and drops "return 0"s. Best regards and thanks for considering, Uwe Kleine-König Uwe Kleine-König (3): tty: hvcs: Drop unnecessary if block tty: vcc: Drop unnecessary if block tty: vcc: Drop impossible to hit WARN_ON drivers/tty/hvc/hvcs.c | 3 --- drivers/tty/vcc.c | 10 ++ 2 files changed, 2 insertions(+), 11 deletions(-) -- 2.29.2
[PATCH 2/3] tty: vcc: Drop unnecessary if block
If vcc_probe() succeeded dev_set_drvdata() is called with a non-NULL value, and if vcc_probe() failed vcc_remove() isn't called. So there is no way dev_get_drvdata() can return NULL in vcc_remove() and the check can just go away.

Signed-off-by: Uwe Kleine-König
---
 drivers/tty/vcc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/tty/vcc.c b/drivers/tty/vcc.c
index 9ffd42e333b8..d9b0dc6deae9 100644
--- a/drivers/tty/vcc.c
+++ b/drivers/tty/vcc.c
@@ -681,9 +681,6 @@ static int vcc_remove(struct vio_dev *vdev)
 {
 	struct vcc_port *port = dev_get_drvdata(&vdev->dev);
 
-	if (!port)
-		return -ENODEV;
-
 	del_timer_sync(&port->rx_timer);
 	del_timer_sync(&port->tx_timer);
-- 
2.29.2
Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement
On 1/13/21 7:27 PM, Ming Lei wrote: > On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote: >> On 1/12/21 6:33 PM, Tyrel Datwyler wrote: >>> On 1/12/21 2:54 PM, Brian King wrote: On 1/11/21 5:12 PM, Tyrel Datwyler wrote: > Introduce several new vhost fields for managing MQ state of the adapter > as well as initial defaults for MQ enablement. > > Signed-off-by: Tyrel Datwyler > --- > drivers/scsi/ibmvscsi/ibmvfc.c | 8 > drivers/scsi/ibmvscsi/ibmvfc.h | 9 + > 2 files changed, 17 insertions(+) > > diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c > b/drivers/scsi/ibmvscsi/ibmvfc.c > index ba95438a8912..9200fe49c57e 100644 > --- a/drivers/scsi/ibmvscsi/ibmvfc.c > +++ b/drivers/scsi/ibmvscsi/ibmvfc.c > @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = { > .max_sectors = IBMVFC_MAX_SECTORS, > .shost_attrs = ibmvfc_attrs, > .track_queue_depth = 1, > + .host_tagset = 1, This doesn't seem right. You are setting host_tagset, which means you want a shared, host wide, tag set for commands. It also means that the total queue depth for the host is can_queue. However, it looks like you are allocating max_requests events for each sub crq, which means you are over allocating memory. >>> >>> With the shared tagset yes the queue depth for the host is can_queue, but >>> this >>> also implies that the max queue depth for each hw queue is also can_queue. >>> So, >>> in the worst case that all commands are queued down the same hw queue we >>> need an >>> event pool with can_queue commands. >>> Looking at this closer, we might have bigger problems. There is a host wide max number of commands that the VFC host supports, which gets returned on NPIV Login. This value can change across a live migration event. >>> >>> From what I understand the max commands can only become less. >>> The ibmvfc driver, which does the same thing the lpfc driver does, modifies can_queue on the scsi_host *after* the tag set has been allocated. 
This looks to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like we look at can_queue once the tag set is setup, and I'm not seeing a good way to dynamically change the host queue depth once the tag set is setup. Unless I'm missing something, our best options appear to either be to implement our own host wide busy reference counting, which doesn't sound very good, or we need to add some API to block / scsi that allows us to dynamically change can_queue. >>> >>> Changing can_queue won't do use any good with the shared tagset becasue each >>> queue still needs to be able to queue can_queue number of commands in the >>> worst >>> case. >> >> The issue I'm trying to highlight here is the following scenario: >> >> 1. We set shost->can_queue, then call scsi_add_host, which allocates the tag >> set. >> >> 2. On our NPIV login response from the VIOS, we might get a lower value than >> we >> initially set in shost->can_queue, so we update it, but nobody ever looks at >> it >> again, and we don't have any protection against sending too many commands to >> the host. >> >> >> Basically, we no longer have any code that ensures we don't send more >> commands to the VIOS than we are told it supports. According to the >> architecture, >> if we actually do this, the VIOS will do an h_free_crq, which would be a bit >> of a bug on our part. >> >> I don't think it was ever clearly defined in the API that a driver can >> change shost->can_queue after calling scsi_add_host, but up until >> commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now >> it doesn't. > > Actually it isn't related with commit 6eb045e092ef, because > blk_mq_alloc_tag_set() > uses .can_queue to create driver tag sbitmap and request pool. > > So even thought without 6eb045e092ef, the updated .can_queue can't work > as expected because the max driver tag depth has been fixed by blk-mq already. There are two scenarios here. 
In the scenario of someone increasing can_queue after the tag set is allocated, I agree, blk-mq will never take advantage of this. However, in the scenario of someone *decreasing* can_queue after the tag set is allocated, prior to 6eb045e092ef, the shost->host_busy code provided this protection. > > What 6eb045e092ef does is just to remove the double check on max > host-wide allowed commands because that has been respected by blk-mq > driver tag allocation already. > >> >> I started looking through drivers that do this, and so far, it looks like the >> following drivers do: ibmvfc, lpfc, aix94xx, libfc, BusLogic, and likely >> others... >> >> We probably need an API that lets us change shost->can_queue dynamically. > > I'd suggest to confirm changing .can_queue is one real usecase. For ibmvfc, the total number of
Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool
On 1/14/21 1:08 AM, Claire Chang wrote: > On Wed, Jan 13, 2021 at 7:48 AM Florian Fainelli wrote: >> >> On 1/5/21 7:41 PM, Claire Chang wrote: >>> If a device is not behind an IOMMU, we look up the device node and set >>> up the restricted DMA when the restricted-dma-pool is presented. >>> >>> Signed-off-by: Claire Chang >>> --- >> >> [snip] >> >>> +int of_dma_set_restricted_buffer(struct device *dev) >>> +{ >>> + struct device_node *node; >>> + int count, i; >>> + >>> + if (!dev->of_node) >>> + return 0; >>> + >>> + count = of_property_count_elems_of_size(dev->of_node, "memory-region", >>> + sizeof(phandle)); >> >> You could have an early check for count < 0, along with an error >> message, if that is deemed useful. >> >>> + for (i = 0; i < count; i++) { >>> + node = of_parse_phandle(dev->of_node, "memory-region", i); >>> + if (of_device_is_compatible(node, "restricted-dma-pool")) >> >> And you may want to add here an of_device_is_available(node). A platform >> that provides the Device Tree firmware and try to support multiple >> different SoCs may try to determine if an IOMMU is present, and if it >> is, it could be marking the restriced-dma-pool region with a 'status = >> "disabled"' property, or any variant of that scheme. > > This function is called only when there is no IOMMU present (check in > drivers/of/device.c). I can still add of_device_is_available(node) > here if you think it's helpful. I believe it is, since boot loader can have a shared Device Tree blob skeleton and do various adaptations based on the chip (that's what we do) and adding a status property is much simpler than insertion new nodes are run time. > >> >>> + return of_reserved_mem_device_init_by_idx( >>> + dev, dev->of_node, i); >> >> This does not seem to be supporting more than one memory region, did not >> you want something like instead: >> >> ret = of_reserved_mem_device_init_by_idx(...); >> if (ret) >> return ret; >> > > Yes. 
This implement only supports one restriced-dma-pool memory region
> with the assumption that there is only one memory region with the
> compatible string, restricted-dma-pool, in the dts. IIUC, it's similar
> to shared-dma-pool.

Then if there is such a known limitation it should be both documented and enforced here: you should not be iterating over all of the phandles that you find, stop at the first one and issue a warning if count > 1?
-- 
Florian
Re: [PATCH v5 03/21] powerpc: remove arguments from fault handler functions
Le 13/01/2021 à 08:31, Nicholas Piggin a écrit : Make mm fault handlers all just take the pt_regs * argument and load DAR/DSISR from that. Make those that return a value return long. This is done to make the function signatures match other handlers, which will help with a future patch to add wrappers. Explicit arguments could be added for performance but that would require more wrapper macro variants. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/asm-prototypes.h | 4 ++-- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +- arch/powerpc/include/asm/bug.h| 2 +- arch/powerpc/kernel/entry_32.S| 6 +- arch/powerpc/kernel/exceptions-64e.S | 2 -- arch/powerpc/kernel/exceptions-64s.S | 14 ++ arch/powerpc/kernel/head_40x.S| 10 +- arch/powerpc/kernel/head_8xx.S| 6 +++--- arch/powerpc/kernel/head_book3s_32.S | 5 ++--- arch/powerpc/kernel/head_booke.h | 4 +--- arch/powerpc/mm/book3s64/hash_utils.c | 8 +--- arch/powerpc/mm/book3s64/slb.c| 11 +++ arch/powerpc/mm/fault.c | 7 --- 13 files changed, 34 insertions(+), 47 deletions(-) diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 238eacfda7b0..a32157ce0551 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -277,7 +277,7 @@ reenable_mmu: * r3 can be different from GPR3(r1) at this point, r9 and r11 * contains the old MSR and handler address respectively, * r4 & r5 can contain page fault arguments that need to be passed The line above should be dropped as well (its end on the line below is dropped already) -* along as well. r0, r6-r8, r12, CCR, CTR, XER etc... are left +* r0, r4-r8, r12, CCR, CTR, XER etc... are left * clobbered as they aren't useful past this point. */ Christophe
Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
On Thu, Jan 14, 2021 at 02:42:26PM +0100, Christophe Leroy wrote:

> Le 14/01/2021 à 14:22, Mark Brown a écrit :
> > For GPIO chipselects you should really fix the driver to just hand the
> > GPIO off to the core rather than trying to implement this itself, that
> > will avoid driver specific differences like this.

> IIUC, it is not trivial as it requires implementing transfer_one() instead
> of the existing transfer_one_message() in the driver. Am I right ?

Yes, that's a good idea in general though. It should normally be pretty simple since the conversion is mostly just deleting code doing things which will be handled by the core.

> What's the difference/benefit of transfer_one() compared to the existing
> transfer_one_message() ?

It factors out all the handling of chip selects, including per-transfer chip select inversion and so on, and also factors out all the handling of in-message delays. If nothing else it reduces the amount of duplicated code that might require maintenance and can have issues with misaligned expectations from the core or client drivers.
Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
Le 14/01/2021 à 14:22, Mark Brown a écrit : For GPIO chipselects you should really fix the driver to just hand the GPIO off to the core rather than trying to implement this itself, that will avoid driver specific differences like this. IIUC, it is not trivial as it requires implementing transfer_one() instead of the existing transfer_one_message() in the driver. Am I right ? What's the difference/benefit of transfer_one() compared to the existing transfer_one_message() ?
Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
Le 14/01/2021 à 14:17, Nicholas Piggin a écrit : Excerpts from Christophe Leroy's message of January 14, 2021 10:25 pm: Le 14/01/2021 à 13:09, Nicholas Piggin a écrit : Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm: Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am: Le 13/01/2021 à 08:31, Nicholas Piggin a écrit : The page fault handling still has some complex logic particularly around hash table handling, in asm. Implement this in C instead. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 + arch/powerpc/kernel/exceptions-64s.S | 131 +++--- arch/powerpc/mm/book3s64/hash_utils.c | 77 ++ arch/powerpc/mm/fault.c | 46 -- 4 files changed, 107 insertions(+), 148 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index 066b1d34c7bc..60a669379aa0 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn, #define HPTE_NOHPTE_UPDATE 0x2 #define HPTE_USE_KERNEL_KEY 0x4 +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr); extern int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid, pte_t *ptep, unsigned long trap, unsigned long flags, int ssize, int subpage_prot); diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 6e53f7638737..bcb5e81d2088 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) * * Handling: * - Hash MMU - * Go to do_hash_page first to see if the HPT can be filled from an entry in - * the Linux page table. Hash faults can hit in kernel mode in a fairly + * Go to do_hash_fault, which attempts to fill the HPT from an entry in the + * Linux page table. 
Hash faults can hit in kernel mode in a fairly * arbitrary state (e.g., interrupts disabled, locks held) when accessing * "non-bolted" regions, e.g., vmalloc space. However these should always be - * backed by Linux page tables. + * backed by Linux page table entries. * - * If none is found, do a Linux page fault. Linux page faults can happen in - * kernel mode due to user copy operations of course. + * If no entry is found the Linux page fault handler is invoked (by + * do_hash_fault). Linux page faults can happen in kernel mode due to user + * copy operations of course. * * KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest * MMU context, which may cause a DSI in the host, which must go to the @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common) GEN_COMMON data_access ld r4,_DAR(r1) ld r5,_DSISR(r1) We have DSISR here. I think the dispatch between page fault or do_break() should be done here: - It would be more similar to other arches Other sub-archs? - Would avoid doing it also in instruction fault True but it's hidden under an unlikely branch so won't really help instruction fault. - Would avoid that -1 return which looks more like a hack. I don't really see it as a hack, we return a code to asm caller to direct whether to restore registers or not, we alrady have this pattern. (I'm hoping all that might be go away one day by conrolling NV regs from C if we can get good code generation but even if not we still have it in the interrupt returns). That said I will give it a try here. At very least it might be a better intermediate step. Ah yes, this way doesn't work well for later patches because you end e.g., with the do_break call having to call the interrupt handler wrappers again when they actually expect to be in the asm entry state (e.g., irq soft-mask state) when called, and return via interrupt_return after the exit wrapper runs (which 64s uses to implement better context tracking for example). 
That could possibly be hacked up to deal with multiple interrupt wrappers per interrupt, but I'd rather not go backwards. That does leave the other sub archs as having this issue, but they don't do so much in their handlers. 32 doesn't have soft-mask or context tracking to deal with for example. We will need to fix this up though and unify things more. Not sure I understand what you mean exactly. On the 8xx, do_break() is called by totally different exceptions: - Exception 0x1c00 Data breakpoint ==> do_break() - Exception 0x1300 Instruction TLB error ==> handle_page_fault() - Exception 0x1400 Data TLB error ==> handle_page_fault() On book3s/32, we now (after my patch ie patch 1 in your series ) have either do_break() or handle_page_fault() being called from very
Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
On Thu, Jan 14, 2021 at 01:33:50PM +0100, Christophe Leroy wrote: > Le 14/01/2021 à 12:59, Mark Brown a écrit : > > On Thu, Jan 14, 2021 at 12:27:42PM +0100, Christophe Leroy wrote: > > > Today I have in the DTS the CS GPIOs declared as ACTIVE_LOW. > > > If I declare them as ACTIVE_HIGH instead, then I also have to set > > > spi-cs-high property, otherwise of_gpio_flags_quirks() is not happy and > > > forces the GPIO ACTIVE LOW. > > > When I set spi-cs-high property, it sets the SPI_CS_HIGH bit in spi->mode. > > OK, so it sounds like you want SPI_CS_HIGH and that is being set > > correctly? > > > In fsl_spi_chipselect(), we have > > > > > > bool pol = spi->mode & SPI_CS_HIGH > > > > > > Then > > > pdata->cs_control(spi, pol); > > > So changing the board config is compensated by the above, and at the end > > > it still doesn't work. > > This is a driver bug, the driver set_cs() operation should not be > > modifying the value it is told to set. > A driver bug ? Or maybe a change forgotten in commit 766c6b63aa04 ("spi: > fix client driver breakages when using GPIO descriptors") ? The expectation that the driver will be using the chip select exactly as passed in and not attempting to implement SPI_CS_HIGH itself when it has set_cs() has been there for a considerable time now, that's not new with the cleanup. Drivers should only be paying attention to SPI_CS_HIGH in cases where the hardware controls the chip select autonomously and so set_cs() can't be provided. > I'm almost sure it was not a bug, it is in line which what is said in > the comment removed by the above mentionned commit. Please take a look at the list archive discussions around this - there's been a lot of confusion with GPIO descriptors in particular due to there being multiple places where you can set the inversion. 
Note that the situation you describe above is that you end up with all the various places other than your driver agreeing that the chip select is active high as it (AFAICT from what you're saying) actually is. For GPIO chipselects you should really fix the driver to just hand the GPIO off to the core rather than trying to implement this itself, that will avoid driver specific differences like this.
Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
Excerpts from Christophe Leroy's message of January 14, 2021 10:25 pm: > > > Le 14/01/2021 à 13:09, Nicholas Piggin a écrit : >> Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm: >>> Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am: Le 13/01/2021 à 08:31, Nicholas Piggin a écrit : > The page fault handling still has some complex logic particularly around > hash table handling, in asm. Implement this in C instead. > > Signed-off-by: Nicholas Piggin > --- >arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 + >arch/powerpc/kernel/exceptions-64s.S | 131 +++--- >arch/powerpc/mm/book3s64/hash_utils.c | 77 ++ >arch/powerpc/mm/fault.c | 46 -- >4 files changed, 107 insertions(+), 148 deletions(-) > > diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h > b/arch/powerpc/include/asm/book3s/64/mmu-hash.h > index 066b1d34c7bc..60a669379aa0 100644 > --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h > +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h > @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long > vpn, >#define HPTE_NOHPTE_UPDATE 0x2 >#define HPTE_USE_KERNEL_KEY0x4 > > +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long > dsisr); >extern int __hash_page_4K(unsigned long ea, unsigned long access, > unsigned long vsid, pte_t *ptep, unsigned > long trap, > unsigned long flags, int ssize, int > subpage_prot); > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 6e53f7638737..bcb5e81d2088 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) > * > * Handling: > * - Hash MMU > - * Go to do_hash_page first to see if the HPT can be filled from an > entry in > - * the Linux page table. Hash faults can hit in kernel mode in a fairly > + * Go to do_hash_fault, which attempts to fill the HPT from an entry > in the > + * Linux page table. 
Hash faults can hit in kernel mode in a fairly > * arbitrary state (e.g., interrupts disabled, locks held) when > accessing > * "non-bolted" regions, e.g., vmalloc space. However these should > always be > - * backed by Linux page tables. > + * backed by Linux page table entries. > * > - * If none is found, do a Linux page fault. Linux page faults can > happen in > - * kernel mode due to user copy operations of course. > + * If no entry is found the Linux page fault handler is invoked (by > + * do_hash_fault). Linux page faults can happen in kernel mode due to > user > + * copy operations of course. > * > * KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in > guest > * MMU context, which may cause a DSI in the host, which must go to > the > @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common) > GEN_COMMON data_access > ld r4,_DAR(r1) > ld r5,_DSISR(r1) We have DSISR here. I think the dispatch between page fault or do_break() should be done here: - It would be more similar to other arches >>> >>> Other sub-archs? >>> - Would avoid doing it also in instruction fault >>> >>> True but it's hidden under an unlikely branch so won't really help >>> instruction fault. >>> - Would avoid that -1 return which looks more like a hack. >>> >>> I don't really see it as a hack, we return a code to asm caller to >>> direct whether to restore registers or not, we alrady have this >>> pattern. >>> >>> (I'm hoping all that might be go away one day by conrolling NV >>> regs from C if we can get good code generation but even if not we >>> still have it in the interrupt returns). >>> >>> That said I will give it a try here. At very least it might be a >>> better intermediate step. 
>> >> Ah yes, this way doesn't work well for later patches because you end >> e.g., with the do_break call having to call the interrupt handler >> wrappers again when they actually expect to be in the asm entry state >> (e.g., irq soft-mask state) when called, and return via interrupt_return >> after the exit wrapper runs (which 64s uses to implement better context >> tracking for example). >> >> That could possibly be hacked up to deal with multiple interrupt >> wrappers per interrupt, but I'd rather not go backwards. >> >> That does leave the other sub archs as having this issue, but they don't >> do so much in their handlers. 32 doesn't have soft-mask or context >> tracking to deal
[PATCH] spi: fsl: Fix driver breakage when SPI_CS_HIGH is not set in spi->mode
Commit 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors") broke fsl spi driver.

As now we fully rely on gpiolib for handling the polarity of chip selects, the driver shall not alter the GPIO value anymore when SPI_CS_HIGH is not set in spi->mode.

Fixes: 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy
---
 drivers/spi/spi-fsl-spi.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/spi/spi-fsl-spi.c b/drivers/spi/spi-fsl-spi.c
index 649766808caf..9e297b2715f6 100644
--- a/drivers/spi/spi-fsl-spi.c
+++ b/drivers/spi/spi-fsl-spi.c
@@ -115,14 +115,13 @@ static void fsl_spi_chipselect(struct spi_device *spi, int value)
 {
 	struct mpc8xxx_spi *mpc8xxx_spi = spi_master_get_devdata(spi->master);
 	struct fsl_spi_platform_data *pdata;
-	bool pol = spi->mode & SPI_CS_HIGH;
 	struct spi_mpc8xxx_cs *cs = spi->controller_state;
 
 	pdata = spi->dev.parent->parent->platform_data;
 
 	if (value == BITBANG_CS_INACTIVE) {
 		if (pdata->cs_control)
-			pdata->cs_control(spi, !pol);
+			pdata->cs_control(spi, false);
 	}
 
 	if (value == BITBANG_CS_ACTIVE) {
@@ -134,7 +133,7 @@ static void fsl_spi_chipselect(struct spi_device *spi, int value)
 		fsl_spi_change_mode(spi);
 
 		if (pdata->cs_control)
-			pdata->cs_control(spi, pol);
+			pdata->cs_control(spi, true);
 	}
 }
-- 
2.25.0
Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
Excerpts from Michal Suchánek's message of January 14, 2021 10:40 pm: > On Mon, Oct 19, 2020 at 02:50:51PM +1000, Nicholas Piggin wrote: >> Excerpts from Nicholas Piggin's message of October 19, 2020 11:00 am: >> > Excerpts from Michal Suchánek's message of October 17, 2020 6:14 am: >> >> On Mon, Sep 07, 2020 at 11:13:47PM +1000, Nicholas Piggin wrote: >> >>> Excerpts from Michael Ellerman's message of August 31, 2020 8:50 pm: >> >>> > Michal Suchánek writes: >> >>> >> On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote: >> >>> >>> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am: >> >>> >>> > Hello, >> >>> >>> > >> >>> >>> > on POWER8 KVM hosts lock up since commit 10d91611f426 >> >>> >>> > ("powerpc/64s: >> >>> >>> > Reimplement book3s idle code in C"). >> >>> >>> > >> >>> >>> > The symptom is host locking up completely after some hours of KVM >> >>> >>> > workload with messages like >> >>> >>> > >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab >> >>> >>> > cpu 47 >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab >> >>> >>> > cpu 71 >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab >> >>> >>> > cpu 47 >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab >> >>> >>> > cpu 71 >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab >> >>> >>> > cpu 47 >> >>> >>> > >> >>> >>> > printed before the host locks up. >> >>> >>> > >> >>> >>> > The machines run sandboxed builds which is a mixed workload >> >>> >>> > resulting in >> >>> >>> > IO/single core/mutiple core load over time and there are periods >> >>> >>> > of no >> >>> >>> > activity and no VMS runnig as well. The VMs are shortlived so VM >> >>> >>> > setup/terdown is somewhat excercised as well. >> >>> >>> > >> >>> >>> > POWER9 with the new guest entry fast path does not seem to be >> >>> >>> > affected. 
>> >>> >>> > >> >>> >>> > Reverted the patch and the followup idle fixes on top of 5.2.14 and >> >>> >>> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR >> >>> >>> > after idle") which gives same idle code as 5.1.16 and the kernel >> >>> >>> > seems >> >>> >>> > stable. >> >>> >>> > >> >>> >>> > Config is attached. >> >>> >>> > >> >>> >>> > I cannot easily revert this commit, especially if I want to use >> >>> >>> > the same >> >>> >>> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are >> >>> >>> > applicable >> >>> >>> > only to the new idle code. >> >>> >>> > >> >>> >>> > Any idea what can be the problem? >> >>> >>> >> >>> >>> So hwthread_state is never getting back to to HWTHREAD_IN_IDLE on >> >>> >>> those threads. I wonder what they are doing. POWER8 doesn't have a >> >>> >>> good >> >>> >>> NMI IPI and I don't know if it supports pdbg dumping registers from >> >>> >>> the >> >>> >>> BMC unfortunately. >> >>> >> >> >>> >> It may be possible to set up fadump with a later kernel version that >> >>> >> supports it on powernv and dump the whole kernel. >> >>> > >> >>> > Your firmware won't support it AFAIK. >> >>> > >> >>> > You could try kdump, but if we have CPUs stuck in KVM then there's a >> >>> > good chance it won't work :/ >> >>> >> >>> I haven't had any luck yet reproducing this still. Testing with sub >> >>> cores of various different combinations, etc. I'll keep trying though. >> >> >> >> Hello, >> >> >> >> I tried running some KVM guests to simulate the workload and what I get >> >> is guests failing to start with a rcu stall. 
>> >> Tried both 5.3 and 5.9 kernel and qemu 4.2.1 and 5.1.0
>> >>
>> >> To start some guests I run
>> >>
>> >> for i in $(seq 0 9) ; do /opt/qemu/bin/qemu-system-ppc64 -m 2048 -accel
>> >> kvm -smp 8 -kernel /boot/vmlinux -initrd /boot/initrd -nodefaults
>> >> -nographic -serial mon:telnet::444$i,server,wait & done
>> >>
>> >> To simulate some workload I run
>> >>
>> >> xz -zc9T0 < /dev/zero > /dev/null &
>> >> while true; do
>> >> killall -STOP xz; sleep 1; killall -CONT xz; sleep 1;
>> >> done &
>> >>
>> >> on the host and add a job that executes this to the ramdisk. However, most
>> >> guests never get to the point where the job is executed.
>> >>
>> >> Any idea what might be the problem?
>> >
>> > I would say try without pv queued spin locks (but if the same thing is
>> > happening with 5.3 then it must be something else I guess).
>> >
>> > I'll try to test a similar setup on a POWER8 here.
>>
>> Couldn't reproduce the guest hang, they seem to run fine even with
>> queued spinlocks. Might have a different .config.
>>
>> I might have got a lockup in the host (although different symptoms than
>> the original report). I'll look into that a bit further.
>
> Hello,
>
> any progress on this?

No progress, I still wasn't able to reproduce, and it fell off the radar, sorry. I expect hwthread_state must be getting corrupted somewhere, or a secondary thread is getting stuck.
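For reference, the stop/cont stress pattern in the report can be run as a standalone sketch. This is a hedged illustration, not the exact reproducer: it swaps `xz` for a harmless background `sleep` (a substitution made purely so it runs anywhere) and bounds the loop instead of running forever; the qemu command line from the thread is unchanged and not repeated here.

```shell
# Bounded sketch of the workload loop from the report: repeatedly freeze and
# thaw a background worker with SIGSTOP/SIGCONT, which on the host side forces
# cores in and out of idle. `sleep` stands in for xz here.
sleep 30 &
pid=$!
i=0
while [ "$i" -lt 5 ]; do
    kill -STOP "$pid"    # freeze the worker (as `killall -STOP xz` does)
    kill -CONT "$pid"    # thaw it again
    i=$((i + 1))
done
# the worker should have survived the stop/cont churn
if kill -0 "$pid" 2>/dev/null; then status=alive; else status=dead; fi
kill "$pid" 2>/dev/null
```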
Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
On Mon, Oct 19, 2020 at 02:50:51PM +1000, Nicholas Piggin wrote: > Excerpts from Nicholas Piggin's message of October 19, 2020 11:00 am: > > Excerpts from Michal Suchánek's message of October 17, 2020 6:14 am: > >> On Mon, Sep 07, 2020 at 11:13:47PM +1000, Nicholas Piggin wrote: > >>> Excerpts from Michael Ellerman's message of August 31, 2020 8:50 pm: > >>> > Michal Suchánek writes: > >>> >> On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote: > >>> >>> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am: > >>> >>> > Hello, > >>> >>> > > >>> >>> > on POWER8 KVM hosts lock up since commit 10d91611f426 ("powerpc/64s: > >>> >>> > Reimplement book3s idle code in C"). > >>> >>> > > >>> >>> > The symptom is host locking up completely after some hours of KVM > >>> >>> > workload with messages like > >>> >>> > > >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab > >>> >>> > cpu 47 > >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab > >>> >>> > cpu 71 > >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab > >>> >>> > cpu 47 > >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab > >>> >>> > cpu 71 > >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab > >>> >>> > cpu 47 > >>> >>> > > >>> >>> > printed before the host locks up. > >>> >>> > > >>> >>> > The machines run sandboxed builds which is a mixed workload > >>> >>> > resulting in > >>> >>> > IO/single core/mutiple core load over time and there are periods of > >>> >>> > no > >>> >>> > activity and no VMS runnig as well. The VMs are shortlived so VM > >>> >>> > setup/terdown is somewhat excercised as well. > >>> >>> > > >>> >>> > POWER9 with the new guest entry fast path does not seem to be > >>> >>> > affected. 
> >>> >>> > > >>> >>> > Reverted the patch and the followup idle fixes on top of 5.2.14 and > >>> >>> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR > >>> >>> > after idle") which gives same idle code as 5.1.16 and the kernel > >>> >>> > seems > >>> >>> > stable. > >>> >>> > > >>> >>> > Config is attached. > >>> >>> > > >>> >>> > I cannot easily revert this commit, especially if I want to use the > >>> >>> > same > >>> >>> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are > >>> >>> > applicable > >>> >>> > only to the new idle code. > >>> >>> > > >>> >>> > Any idea what can be the problem? > >>> >>> > >>> >>> So hwthread_state is never getting back to to HWTHREAD_IN_IDLE on > >>> >>> those threads. I wonder what they are doing. POWER8 doesn't have a > >>> >>> good > >>> >>> NMI IPI and I don't know if it supports pdbg dumping registers from > >>> >>> the > >>> >>> BMC unfortunately. > >>> >> > >>> >> It may be possible to set up fadump with a later kernel version that > >>> >> supports it on powernv and dump the whole kernel. > >>> > > >>> > Your firmware won't support it AFAIK. > >>> > > >>> > You could try kdump, but if we have CPUs stuck in KVM then there's a > >>> > good chance it won't work :/ > >>> > >>> I haven't had any luck yet reproducing this still. Testing with sub > >>> cores of various different combinations, etc. I'll keep trying though. > >> > >> Hello, > >> > >> I tried running some KVM guests to simulate the workload and what I get > >> is guests failing to start with a rcu stall. 
> >> Tried both 5.3 and 5.9 kernel and qemu 4.2.1 and 5.1.0
> >>
> >> To start some guests I run
> >>
> >> for i in $(seq 0 9) ; do /opt/qemu/bin/qemu-system-ppc64 -m 2048 -accel
> >> kvm -smp 8 -kernel /boot/vmlinux -initrd /boot/initrd -nodefaults
> >> -nographic -serial mon:telnet::444$i,server,wait & done
> >>
> >> To simulate some workload I run
> >>
> >> xz -zc9T0 < /dev/zero > /dev/null &
> >> while true; do
> >> killall -STOP xz; sleep 1; killall -CONT xz; sleep 1;
> >> done &
> >>
> >> on the host and add a job that executes this to the ramdisk. However, most
> >> guests never get to the point where the job is executed.
> >>
> >> Any idea what might be the problem?
> >
> > I would say try without pv queued spin locks (but if the same thing is
> > happening with 5.3 then it must be something else I guess).
> >
> > I'll try to test a similar setup on a POWER8 here.
>
> Couldn't reproduce the guest hang, they seem to run fine even with
> queued spinlocks. Might have a different .config.
>
> I might have got a lockup in the host (although different symptoms than
> the original report). I'll look into that a bit further.

Hello,

any progress on this? I considered reinstating the old assembly code for POWER[78] but even the way it's called has changed slightly.

Thanks

Michal
Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
Le 14/01/2021 à 12:59, Mark Brown a écrit :
> On Thu, Jan 14, 2021 at 12:27:42PM +0100, Christophe Leroy wrote:
>> Today I have in the DTS the CS GPIOs declared as ACTIVE_LOW. If I declare
>> them as ACTIVE_HIGH instead, then I also have to set spi-cs-high property,
>> otherwise of_gpio_flags_quirks() is not happy and forces the GPIO ACTIVE
>> LOW. When I set spi-cs-high property, it sets the SPI_CS_HIGH bit in
>> spi->mode.
>
> OK, so it sounds like you want SPI_CS_HIGH and that is being set correctly?
>
>> In fsl_spi_chipselect(), we have
>>
>> bool pol = spi->mode & SPI_CS_HIGH
>>
>> Then
>>
>> pdata->cs_control(spi, pol);
>>
>> So changing the board config is compensated by the above, and at the end it
>> still doesn't work.
>
> This is a driver bug, the driver set_cs() operation should not be modifying
> the value it is told to set.

A driver bug? Or maybe a change forgotten in commit 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")? I'm almost sure it was not a bug; it is in line with what is said in the comment removed by the above-mentioned commit.

I'll send a fix.

Christophe
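The polarity chain being debated can be modeled in isolation. This is a hedged sketch, not the fsl_spi driver: spi->mode is a plain bitmask, SPI_CS_HIGH uses the value from the uapi SPI header, and cs_level() is a hypothetical helper standing in for what the core asks a driver's set_cs()/cs_control() callback to drive. It mirrors the `bool pol = spi->mode & SPI_CS_HIGH` computation quoted in the thread, extended to the deassert case.

```c
#include <assert.h>
#include <stdbool.h>

/* SPI_CS_HIGH as defined in the uapi SPI header. */
#define SPI_CS_HIGH 0x04u

/*
 * Hypothetical helper: the electrical level chip select should be driven
 * to, given the mode bits and whether the core wants CS asserted. If the
 * core has already resolved SPI_CS_HIGH before calling the driver, a
 * driver that re-derives polarity from spi->mode inverts it a second
 * time, which is the double-inversion symptom described in the thread.
 */
static bool cs_level(unsigned int mode, bool assert)
{
    bool active_high = (mode & SPI_CS_HIGH) != 0;

    return assert ? active_high : !active_high;
}
```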
Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
Le 14/01/2021 à 13:09, Nicholas Piggin a écrit : Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm: Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am: Le 13/01/2021 à 08:31, Nicholas Piggin a écrit : The page fault handling still has some complex logic particularly around hash table handling, in asm. Implement this in C instead. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 + arch/powerpc/kernel/exceptions-64s.S | 131 +++--- arch/powerpc/mm/book3s64/hash_utils.c | 77 ++ arch/powerpc/mm/fault.c | 46 -- 4 files changed, 107 insertions(+), 148 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index 066b1d34c7bc..60a669379aa0 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn, #define HPTE_NOHPTE_UPDATE 0x2 #define HPTE_USE_KERNEL_KEY 0x4 +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr); extern int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid, pte_t *ptep, unsigned long trap, unsigned long flags, int ssize, int subpage_prot); diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 6e53f7638737..bcb5e81d2088 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) * * Handling: * - Hash MMU - * Go to do_hash_page first to see if the HPT can be filled from an entry in - * the Linux page table. Hash faults can hit in kernel mode in a fairly + * Go to do_hash_fault, which attempts to fill the HPT from an entry in the + * Linux page table. Hash faults can hit in kernel mode in a fairly * arbitrary state (e.g., interrupts disabled, locks held) when accessing * "non-bolted" regions, e.g., vmalloc space. 
However these should always be - * backed by Linux page tables. + * backed by Linux page table entries. * - * If none is found, do a Linux page fault. Linux page faults can happen in - * kernel mode due to user copy operations of course. + * If no entry is found the Linux page fault handler is invoked (by + * do_hash_fault). Linux page faults can happen in kernel mode due to user + * copy operations of course. * * KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest * MMU context, which may cause a DSI in the host, which must go to the @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common) GEN_COMMON data_access ld r4,_DAR(r1) ld r5,_DSISR(r1) We have DSISR here. I think the dispatch between page fault or do_break() should be done here: - It would be more similar to other arches Other sub-archs? - Would avoid doing it also in instruction fault True but it's hidden under an unlikely branch so won't really help instruction fault. - Would avoid that -1 return which looks more like a hack. I don't really see it as a hack, we return a code to asm caller to direct whether to restore registers or not, we alrady have this pattern. (I'm hoping all that might be go away one day by conrolling NV regs from C if we can get good code generation but even if not we still have it in the interrupt returns). That said I will give it a try here. At very least it might be a better intermediate step. Ah yes, this way doesn't work well for later patches because you end e.g., with the do_break call having to call the interrupt handler wrappers again when they actually expect to be in the asm entry state (e.g., irq soft-mask state) when called, and return via interrupt_return after the exit wrapper runs (which 64s uses to implement better context tracking for example). That could possibly be hacked up to deal with multiple interrupt wrappers per interrupt, but I'd rather not go backwards. 
That does leave the other sub archs as having this issue, but they don't do so much in their handlers. 32 doesn't have soft-mask or context tracking to deal with for example. We will need to fix this up though and unify things more.

Not sure I understand what you mean exactly. On the 8xx, do_break() is called by totally different exceptions:

- Exception 0x1c00 Data breakpoint ==> do_break()
- Exception 0x1300 Instruction TLB error ==> handle_page_fault()
- Exception 0x1400 Data TLB error ==> handle_page_fault()

On book3s/32, we now (after my patch, i.e. patch 1 in your series) have either do_break() or handle_page_fault() being called from very early in ASM. If you do the same in book3s/64, then there is no issue with interrupt wrappers being called twice, is there?

Christophe
Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm: > Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am: >> >> >> Le 13/01/2021 à 08:31, Nicholas Piggin a écrit : >>> The page fault handling still has some complex logic particularly around >>> hash table handling, in asm. Implement this in C instead. >>> >>> Signed-off-by: Nicholas Piggin >>> --- >>> arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 + >>> arch/powerpc/kernel/exceptions-64s.S | 131 +++--- >>> arch/powerpc/mm/book3s64/hash_utils.c | 77 ++ >>> arch/powerpc/mm/fault.c | 46 -- >>> 4 files changed, 107 insertions(+), 148 deletions(-) >>> >>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h >>> b/arch/powerpc/include/asm/book3s/64/mmu-hash.h >>> index 066b1d34c7bc..60a669379aa0 100644 >>> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h >>> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h >>> @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn, >>> #define HPTE_NOHPTE_UPDATE0x2 >>> #define HPTE_USE_KERNEL_KEY 0x4 >>> >>> +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long >>> dsisr); >>> extern int __hash_page_4K(unsigned long ea, unsigned long access, >>> unsigned long vsid, pte_t *ptep, unsigned long trap, >>> unsigned long flags, int ssize, int subpage_prot); >>> diff --git a/arch/powerpc/kernel/exceptions-64s.S >>> b/arch/powerpc/kernel/exceptions-64s.S >>> index 6e53f7638737..bcb5e81d2088 100644 >>> --- a/arch/powerpc/kernel/exceptions-64s.S >>> +++ b/arch/powerpc/kernel/exceptions-64s.S >>> @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) >>>* >>>* Handling: >>>* - Hash MMU >>> - * Go to do_hash_page first to see if the HPT can be filled from an >>> entry in >>> - * the Linux page table. Hash faults can hit in kernel mode in a fairly >>> + * Go to do_hash_fault, which attempts to fill the HPT from an entry in >>> the >>> + * Linux page table. 
Hash faults can hit in kernel mode in a fairly >>>* arbitrary state (e.g., interrupts disabled, locks held) when >>> accessing >>>* "non-bolted" regions, e.g., vmalloc space. However these should >>> always be >>> - * backed by Linux page tables. >>> + * backed by Linux page table entries. >>>* >>> - * If none is found, do a Linux page fault. Linux page faults can happen >>> in >>> - * kernel mode due to user copy operations of course. >>> + * If no entry is found the Linux page fault handler is invoked (by >>> + * do_hash_fault). Linux page faults can happen in kernel mode due to >>> user >>> + * copy operations of course. >>>* >>>* KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest >>>* MMU context, which may cause a DSI in the host, which must go to the >>> @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common) >>> GEN_COMMON data_access >>> ld r4,_DAR(r1) >>> ld r5,_DSISR(r1) >> >> We have DSISR here. I think the dispatch between page fault or do_break() >> should be done here: >> - It would be more similar to other arches > > Other sub-archs? > >> - Would avoid doing it also in instruction fault > > True but it's hidden under an unlikely branch so won't really help > instruction fault. > >> - Would avoid that -1 return which looks more like a hack. > > I don't really see it as a hack, we return a code to asm caller to > direct whether to restore registers or not, we alrady have this > pattern. > > (I'm hoping all that might be go away one day by conrolling NV > regs from C if we can get good code generation but even if not we > still have it in the interrupt returns). > > That said I will give it a try here. At very least it might be a > better intermediate step. 
Ah yes, this way doesn't work well for later patches because you end e.g., with the do_break call having to call the interrupt handler wrappers again when they actually expect to be in the asm entry state (e.g., irq soft-mask state) when called, and return via interrupt_return after the exit wrapper runs (which 64s uses to implement better context tracking for example). That could possibly be hacked up to deal with multiple interrupt wrappers per interrupt, but I'd rather not go backwards. That does leave the other sub archs as having this issue, but they don't do so much in their handlers. 32 doesn't have soft-mask or context tracking to deal with for example. We will need to fix this up though and unify things more. Thanks, Nick
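The early dispatch being discussed, deciding between do_break() and the page fault path from DSISR before entering the common handler, can be sketched in isolation. This is a hedged model, not kernel code: the DSISR_DABRMATCH value matches the kernel's definition, but the enum and function are placeholders standing in for the asm-level branch under debate (on 64s the same decision is currently conveyed via the -1 return from the C handler).

```c
#include <assert.h>

/* DSISR_DABRMATCH as defined in the powerpc reg headers. */
#define DSISR_DABRMATCH 0x00400000UL

/* Placeholder outcomes for the two handlers named in the thread. */
enum dsi_handler { HANDLE_PAGE_FAULT, HANDLE_BREAK };

/*
 * Sketch of the dispatch: if the data storage interrupt was raised by a
 * breakpoint match, go to do_break(); otherwise fall through to the page
 * fault / hash fault path.
 */
static enum dsi_handler dispatch_dsi(unsigned long dsisr)
{
    if (dsisr & DSISR_DABRMATCH)
        return HANDLE_BREAK;
    return HANDLE_PAGE_FAULT;
}
```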
Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
On Thu, Jan 14, 2021 at 12:27:42PM +0100, Christophe Leroy wrote:
> Today I have in the DTS the CS GPIOs declared as ACTIVE_LOW.
> If I declare them as ACTIVE_HIGH instead, then I also have to set
> spi-cs-high property, otherwise of_gpio_flags_quirks() is not happy and
> forces the GPIO ACTIVE LOW.
> When I set spi-cs-high property, it sets the SPI_CS_HIGH bit in spi->mode.

OK, so it sounds like you want SPI_CS_HIGH and that is being set correctly?

> In fsl_spi_chipselect(), we have
>
> bool pol = spi->mode & SPI_CS_HIGH
>
> Then
>
> pdata->cs_control(spi, pol);
>
> So changing the board config is compensated by the above, and at the end it
> still doesn't work.

This is a driver bug, the driver set_cs() operation should not be modifying the value it is told to set.
[PATCH 11/18] arch: powerpc: Remove oprofile
The previous commit already disabled building oprofile, lets remove the oprofile directory now. Suggested-by: Christoph Hellwig Suggested-by: Linus Torvalds Signed-off-by: Viresh Kumar --- arch/powerpc/oprofile/Makefile | 19 - arch/powerpc/oprofile/backtrace.c | 120 -- arch/powerpc/oprofile/cell/pr_util.h | 110 -- arch/powerpc/oprofile/cell/spu_profiler.c | 248 --- arch/powerpc/oprofile/cell/spu_task_sync.c | 657 arch/powerpc/oprofile/cell/vma_map.c | 279 arch/powerpc/oprofile/common.c | 243 --- arch/powerpc/oprofile/op_model_7450.c | 207 --- arch/powerpc/oprofile/op_model_cell.c | 1709 arch/powerpc/oprofile/op_model_fsl_emb.c | 380 - arch/powerpc/oprofile/op_model_pa6t.c | 227 --- arch/powerpc/oprofile/op_model_power4.c| 438 - 12 files changed, 4637 deletions(-) delete mode 100644 arch/powerpc/oprofile/Makefile delete mode 100644 arch/powerpc/oprofile/backtrace.c delete mode 100644 arch/powerpc/oprofile/cell/pr_util.h delete mode 100644 arch/powerpc/oprofile/cell/spu_profiler.c delete mode 100644 arch/powerpc/oprofile/cell/spu_task_sync.c delete mode 100644 arch/powerpc/oprofile/cell/vma_map.c delete mode 100644 arch/powerpc/oprofile/common.c delete mode 100644 arch/powerpc/oprofile/op_model_7450.c delete mode 100644 arch/powerpc/oprofile/op_model_cell.c delete mode 100644 arch/powerpc/oprofile/op_model_fsl_emb.c delete mode 100644 arch/powerpc/oprofile/op_model_pa6t.c delete mode 100644 arch/powerpc/oprofile/op_model_power4.c diff --git a/arch/powerpc/oprofile/Makefile b/arch/powerpc/oprofile/Makefile deleted file mode 100644 index bb2d94c8cbe6.. 
--- a/arch/powerpc/oprofile/Makefile +++ /dev/null @@ -1,19 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0 - -ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC) - -obj-$(CONFIG_OPROFILE) += oprofile.o - -DRIVER_OBJS := $(addprefix ../../../drivers/oprofile/, \ - oprof.o cpu_buffer.o buffer_sync.o \ - event_buffer.o oprofile_files.o \ - oprofilefs.o oprofile_stats.o \ - timer_int.o ) - -oprofile-y := $(DRIVER_OBJS) common.o backtrace.o -oprofile-$(CONFIG_OPROFILE_CELL) += op_model_cell.o \ - cell/spu_profiler.o cell/vma_map.o \ - cell/spu_task_sync.o -oprofile-$(CONFIG_PPC_BOOK3S_64) += op_model_power4.o op_model_pa6t.o -oprofile-$(CONFIG_FSL_EMB_PERFMON) += op_model_fsl_emb.o -oprofile-$(CONFIG_PPC_BOOK3S_32) += op_model_7450.o diff --git a/arch/powerpc/oprofile/backtrace.c b/arch/powerpc/oprofile/backtrace.c deleted file mode 100644 index 9db7ada79d10.. --- a/arch/powerpc/oprofile/backtrace.c +++ /dev/null @@ -1,120 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/** - * Copyright (C) 2005 Brian Rogan , IBM - * -**/ - -#include -#include -#include -#include -#include -#include -#include - -#define STACK_SP(STACK)*(STACK) - -#define STACK_LR64(STACK) *((unsigned long *)(STACK) + 2) -#define STACK_LR32(STACK) *((unsigned int *)(STACK) + 1) - -#ifdef CONFIG_PPC64 -#define STACK_LR(STACK)STACK_LR64(STACK) -#else -#define STACK_LR(STACK)STACK_LR32(STACK) -#endif - -static unsigned int user_getsp32(unsigned int sp, int is_first) -{ - unsigned int stack_frame[2]; - void __user *p = compat_ptr(sp); - - /* -* The most likely reason for this is that we returned -EFAULT, -* which means that we've done all that we can do from -* interrupt context. -*/ - if (copy_from_user_nofault(stack_frame, (void __user *)p, - sizeof(stack_frame))) - return 0; - - if (!is_first) - oprofile_add_trace(STACK_LR32(stack_frame)); - - /* -* We do not enforce increasing stack addresses here because -* we may transition to a different stack, eg a signal handler. 
-*/ - return STACK_SP(stack_frame); -} - -#ifdef CONFIG_PPC64 -static unsigned long user_getsp64(unsigned long sp, int is_first) -{ - unsigned long stack_frame[3]; - - if (copy_from_user_nofault(stack_frame, (void __user *)sp, - sizeof(stack_frame))) - return 0; - - if (!is_first) - oprofile_add_trace(STACK_LR64(stack_frame)); - - return STACK_SP(stack_frame); -} -#endif - -static unsigned long kernel_getsp(unsigned long sp, int is_first) -{ - unsigned long *stack_frame = (unsigned long *)sp; - - if (!validate_sp(sp, current, STACK_FRAME_OVERHEAD)) - return 0; - - if (!is_first) - oprofile_add_trace(STACK_LR(stack_frame)); - - /* -* We do not enforce increasing stack addresses here because -* we might be transitioning from an interrupt stack to a kernel -* stack. validate_sp() is designed to understand this, so just -
[PATCH 10/18] arch: powerpc: Stop building and using oprofile
The "oprofile" user-space tools don't use the kernel OPROFILE support any more, and haven't in a long time. User-space has been converted to the perf interfaces. This commits stops building oprofile for powerpc and removes any reference to it from directories in arch/powerpc/ apart from arch/powerpc/oprofile, which will be removed in the next commit (this is broken into two commits as the size of the commit became very big, ~5k lines). Note that the member "oprofile_cpu_type" in "struct cpu_spec" isn't removed as it was also used by other parts of the code. Suggested-by: Christoph Hellwig Suggested-by: Linus Torvalds Signed-off-by: Viresh Kumar --- arch/powerpc/Kconfig | 1 - arch/powerpc/Makefile | 2 - arch/powerpc/configs/44x/akebono_defconfig| 1 - arch/powerpc/configs/44x/currituck_defconfig | 1 - arch/powerpc/configs/44x/fsp2_defconfig | 1 - arch/powerpc/configs/44x/iss476-smp_defconfig | 1 - arch/powerpc/configs/cell_defconfig | 1 - arch/powerpc/configs/g5_defconfig | 1 - arch/powerpc/configs/maple_defconfig | 1 - arch/powerpc/configs/pasemi_defconfig | 1 - arch/powerpc/configs/pmac32_defconfig | 1 - arch/powerpc/configs/powernv_defconfig| 1 - arch/powerpc/configs/ppc64_defconfig | 1 - arch/powerpc/configs/ppc64e_defconfig | 1 - arch/powerpc/configs/ppc6xx_defconfig | 1 - arch/powerpc/configs/ps3_defconfig| 1 - arch/powerpc/configs/pseries_defconfig| 1 - arch/powerpc/include/asm/cputable.h | 20 --- arch/powerpc/include/asm/oprofile_impl.h | 135 -- arch/powerpc/include/asm/spu.h| 33 - arch/powerpc/kernel/cputable.c| 67 - arch/powerpc/kernel/dt_cpu_ftrs.c | 2 - arch/powerpc/platforms/cell/Kconfig | 5 - arch/powerpc/platforms/cell/spu_notify.c | 55 --- arch/powerpc/platforms/cell/spufs/run.c | 4 +- arch/powerpc/platforms/cell/spufs/sched.c | 5 - arch/powerpc/platforms/cell/spufs/spufs.h | 1 - 27 files changed, 1 insertion(+), 344 deletions(-) delete mode 100644 arch/powerpc/include/asm/oprofile_impl.h delete mode 100644 arch/powerpc/platforms/cell/spu_notify.c 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 107bb4319e0e..1897c0ff2f2b 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -226,7 +226,6 @@ config PPC select HAVE_MOD_ARCH_SPECIFIC select HAVE_NMI if PERF_EVENTS || (PPC64 && PPC_BOOK3S) select HAVE_HARDLOCKUP_DETECTOR_ARCHif (PPC64 && PPC_BOOK3S) - select HAVE_OPROFILE select HAVE_OPTPROBES if PPC64 select HAVE_PERF_EVENTS select HAVE_PERF_EVENTS_NMI if PPC64 diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 08cf0eade56a..b959fdaec713 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -276,8 +276,6 @@ head-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE) += arch/powerpc/kernel/prom_init.o # See arch/powerpc/Kbuild for content of core part of the kernel core-y += arch/powerpc/ -drivers-$(CONFIG_OPROFILE) += arch/powerpc/oprofile/ - # Default to zImage, override when needed all: zImage diff --git a/arch/powerpc/configs/44x/akebono_defconfig b/arch/powerpc/configs/44x/akebono_defconfig index 3894ba8f8ffc..72b8f93a9bdd 100644 --- a/arch/powerpc/configs/44x/akebono_defconfig +++ b/arch/powerpc/configs/44x/akebono_defconfig @@ -8,7 +8,6 @@ CONFIG_EXPERT=y CONFIG_KALLSYMS_ALL=y # CONFIG_SLUB_CPU_PARTIAL is not set CONFIG_PROFILING=y -CONFIG_OPROFILE=y CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y # CONFIG_BLK_DEV_BSG is not set diff --git a/arch/powerpc/configs/44x/currituck_defconfig b/arch/powerpc/configs/44x/currituck_defconfig index 34c86b3abecb..717827219921 100644 --- a/arch/powerpc/configs/44x/currituck_defconfig +++ b/arch/powerpc/configs/44x/currituck_defconfig @@ -6,7 +6,6 @@ CONFIG_LOG_BUF_SHIFT=14 CONFIG_EXPERT=y CONFIG_KALLSYMS_ALL=y CONFIG_PROFILING=y -CONFIG_OPROFILE=y CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y # CONFIG_BLK_DEV_BSG is not set diff --git a/arch/powerpc/configs/44x/fsp2_defconfig b/arch/powerpc/configs/44x/fsp2_defconfig index 30845ce0885a..8da316e61a08 100644 --- a/arch/powerpc/configs/44x/fsp2_defconfig +++ b/arch/powerpc/configs/44x/fsp2_defconfig @@ 
-17,7 +17,6 @@ CONFIG_KALLSYMS_ALL=y CONFIG_BPF_SYSCALL=y CONFIG_EMBEDDED=y CONFIG_PROFILING=y -CONFIG_OPROFILE=y CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y # CONFIG_BLK_DEV_BSG is not set diff --git a/arch/powerpc/configs/44x/iss476-smp_defconfig b/arch/powerpc/configs/44x/iss476-smp_defconfig index 2c3834eebca3..c11e777b2f3d 100644 --- a/arch/powerpc/configs/44x/iss476-smp_defconfig +++ b/arch/powerpc/configs/44x/iss476-smp_defconfig @@ -7,7 +7,6 @@ CONFIG_BLK_DEV_INITRD=y
Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
Le 13/01/2021 à 13:33, Mark Brown a écrit :
> On Wed, Jan 13, 2021 at 09:49:12AM +0100, Christophe Leroy wrote:
>> With commit 766c6b63aa04 ("spi: fix client driver breakages when using
>> GPIO descriptors") reverted, it is back to work:
>> ...
>> What shall I do ?
>
> I would guess that there's an error with the chip select polarity
> configuration on your system that just happened to work previously, I'd
> suggest fixing this in the board configuration to bring it in line with
> everything else.

Not that easy. Today I have in the DTS the CS GPIOs declared as ACTIVE_LOW. If I declare them as ACTIVE_HIGH instead, then I also have to set spi-cs-high property, otherwise of_gpio_flags_quirks() is not happy and forces the GPIO ACTIVE LOW. When I set spi-cs-high property, it sets the SPI_CS_HIGH bit in spi->mode.

In fsl_spi_chipselect(), we have

bool pol = spi->mode & SPI_CS_HIGH

Then

pdata->cs_control(spi, pol);

So changing the board config is compensated by the above, and at the end it still doesn't work. Whereas reverting the above-mentioned commit sets SPI_CS_HIGH back into spi->mode without changing the ACTIVE level of the GPIO, resulting in the correct polarity.

So, I'm a bit lost, where is the problem exactly?

Thanks

Christophe
Re: [PATCH] perf tools: Resolve symbols against debug file first
Namhyung Kim writes: > On Wed, Jan 13, 2021 at 8:43 PM Jiri Slaby wrote: >> >> On 13. 01. 21, 11:46, Jiri Olsa wrote: >> > On Wed, Jan 13, 2021 at 09:01:28AM +0100, Jiri Slaby wrote: >> >> With LTO, there are symbols like these: >> >> /usr/lib/debug/usr/lib64/libantlr4-runtime.so.4.8-4.8-1.4.x86_64.debug >> >> 10305: 00955fa4 0 NOTYPE LOCAL DEFAULT 29 >> >> Predicate.cpp.2bc410e7 >> >> >> >> This comes from a runtime/debug split done by the standard way: >> >> objcopy --only-keep-debug $runtime $debug >> >> objcopy --add-gnu-debuglink=$debugfn -R .comment -R .GCC.command.line >> >> --strip-all $runtime >> >> ... >> >> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c >> >> index f3577f7d72fe..a31b716fa61c 100644 >> >> --- a/tools/perf/util/symbol-elf.c >> >> +++ b/tools/perf/util/symbol-elf.c >> >> @@ -1226,12 +1226,20 @@ int dso__load_sym(struct dso *dso, struct map >> >> *map, struct symsrc *syms_ss, >> >> if (sym.st_shndx == SHN_ABS) >> >> continue; >> >> >> >> -sec = elf_getscn(runtime_ss->elf, sym.st_shndx); >> >> +sec = elf_getscn(syms_ss->elf, sym.st_shndx); >> >> if (!sec) >> >> goto out_elf_end; >> > >> > we iterate symbols from syms_ss, so the fix seems to be correct >> > to call elf_getscn on syms_ss, not on runtime_ss as we do now >> > >> > I'd think this worked only when runtime_ss == syms_ss >> >> No, because the headers are copied 1:1 from runtime_ss to syms_ss. And >> runtime_ss is then stripped, so only .debug* sections are removed there. >> (And syms_ss's are set as NOBITS.) >> >> We iterated .debug* sections in syms_ss and used runtime_ss section >> _headers_ only to adjust symbols (sometimes). That worked. > > It seems PPC has an opd section only in the runtime_ss and that's why > we use it for section headers. At least on my system (Ubuntu 20.04.1) I see .opd in the debug file with NOBITS set: $ readelf -e vmlinux.debug | grep opd [37] .opd NOBITS c1c1f548 01202e14 But possibly that's not the case with older toolchains? 
cheers
Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool
On Wed, Jan 13, 2021 at 7:48 AM Florian Fainelli wrote:
>
> On 1/5/21 7:41 PM, Claire Chang wrote:
> > If a device is not behind an IOMMU, we look up the device node and set
> > up the restricted DMA when the restricted-dma-pool is presented.
> >
> > Signed-off-by: Claire Chang
> > ---
>
> [snip]
>
> > +int of_dma_set_restricted_buffer(struct device *dev)
> > +{
> > +	struct device_node *node;
> > +	int count, i;
> > +
> > +	if (!dev->of_node)
> > +		return 0;
> > +
> > +	count = of_property_count_elems_of_size(dev->of_node, "memory-region",
> > +						sizeof(phandle));
>
> You could have an early check for count < 0, along with an error
> message, if that is deemed useful.
>
> > +	for (i = 0; i < count; i++) {
> > +		node = of_parse_phandle(dev->of_node, "memory-region", i);
> > +		if (of_device_is_compatible(node, "restricted-dma-pool"))
>
> And you may want to add here an of_device_is_available(node). A platform
> that provides the Device Tree firmware and tries to support multiple
> different SoCs may try to determine if an IOMMU is present, and if it
> is, it could be marking the restricted-dma-pool region with a 'status =
> "disabled"' property, or any variant of that scheme.

This function is called only when there is no IOMMU present (check in drivers/of/device.c). I can still add of_device_is_available(node) here if you think it's helpful.

>
> > +		return of_reserved_mem_device_init_by_idx(
> > +			dev, dev->of_node, i);
>
> This does not seem to be supporting more than one memory region, did not
> you want something like instead:
>
> ret = of_reserved_mem_device_init_by_idx(...);
> if (ret)
>     return ret;

Yes. This implementation only supports one restricted-dma-pool memory region, with the assumption that there is only one memory region with the compatible string, restricted-dma-pool, in the dts. IIUC, it's similar to shared-dma-pool.
> > +	}
> > +
> > +	return 0;
> > +}
> > diff --git a/drivers/of/device.c b/drivers/of/device.c
> > index aedfaaafd3e7..e2c7409956ab 100644
> > --- a/drivers/of/device.c
> > +++ b/drivers/of/device.c
> > @@ -182,6 +182,10 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
> >  	arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
> >
> >  	dev->dma_range_map = map;
> > +
> > +	if (!iommu)
> > +		return of_dma_set_restricted_buffer(dev);
> > +
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(of_dma_configure_id);
> > diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
> > index d9e6a324de0a..28a2dfa197ba 100644
> > --- a/drivers/of/of_private.h
> > +++ b/drivers/of/of_private.h
> > @@ -161,12 +161,17 @@ struct bus_dma_region;
> >  #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
> >  int of_dma_get_range(struct device_node *np,
> >  		const struct bus_dma_region **map);
> > +int of_dma_set_restricted_buffer(struct device *dev);
> >  #else
> >  static inline int of_dma_get_range(struct device_node *np,
> >  		const struct bus_dma_region **map)
> >  {
> >  	return -ENODEV;
> >  }
> > +static inline int of_dma_get_restricted_buffer(struct device *dev)
> > +{
> > +	return -ENODEV;
> > +}
> >  #endif
> >
> >  #endif /* _LINUX_OF_PRIVATE_H */
> >
>
> --
> Florian
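The two review points in this exchange, skipping `status = "disabled"` nodes and the single-region assumption, can be sketched as a self-contained toy loop. This is plain C with a made-up `struct region` standing in for the OF helpers, not the kernel code itself:

```c
#include <string.h>

/* Toy stand-in for a "memory-region" phandle entry: 'available'
 * models of_device_is_available(), i.e. no 'status = "disabled"'. */
struct region {
	const char *compatible;
	int available;
};

/* Find the first usable restricted-dma-pool region, or -1 if none.
 * Returning inside the loop mirrors the single-region assumption
 * discussed above: only the first match is ever used. */
static int find_restricted_pool(const struct region *r, int count)
{
	for (int i = 0; i < count; i++) {
		if (!r[i].available)	/* disabled on this SoC: skip */
			continue;
		if (strcmp(r[i].compatible, "restricted-dma-pool") == 0)
			return i;
	}
	return -1;
}

/* Example table: a shared pool, a disabled restricted pool (e.g. the
 * SoC variant that has an IOMMU), and an enabled restricted pool. */
static const struct region example[] = {
	{ "shared-dma-pool",     1 },
	{ "restricted-dma-pool", 0 },
	{ "restricted-dma-pool", 1 },
};
```

With the availability check in place, the disabled entry at index 1 is skipped and index 2 is selected; without any enabled entry the lookup fails cleanly.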
Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
On Wed, Jan 13, 2021 at 8:42 PM Christoph Hellwig wrote:
>
> > +#ifdef CONFIG_SWIOTLB
> > +	struct io_tlb_mem *dma_io_tlb_mem;
> >  #endif
>
> Please add a new config option for this code instead of always building
> it when swiotlb is enabled.
>
> > +static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
> > +				   size_t size)
>
> Can you split the refactoring in swiotlb.c into one or more prep
> patches?
>
> > +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> > +				    struct device *dev)
> > +{
> > +	struct io_tlb_mem *mem = rmem->priv;
> > +	int ret;
> > +
> > +	if (dev->dma_io_tlb_mem)
> > +		return -EBUSY;
> > +
> > +	if (!mem) {
> > +		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> > +		if (!mem)
> > +			return -ENOMEM;
>
> What is the calling convention here that allows for a NULL and non-NULL
> private data?

Since multiple devices can share the same pool, the private data, the
io_tlb_mem struct, will be initialized by the first device attached to
it. This is similar to rmem_dma_device_init() in kernel/dma/coherent.c.
I'll add a comment for it in the next version.
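The calling convention described in that reply, the first attached device initializes the shared `rmem->priv`, later devices reuse it, can be sketched as a standalone toy. Names are simplified and this is not the kernel code: plain `calloc` replaces `kzalloc`, and -1/-2 stand in for -EBUSY/-ENOMEM.

```c
#include <stdlib.h>

struct pool { int refs; };			/* stand-in for io_tlb_mem */
struct rmem { struct pool *priv; };		/* stand-in for reserved_mem */
struct dev  { struct pool *pool; };		/* stand-in for struct device */

/* First caller finds priv == NULL and allocates the shared pool;
 * every later caller just attaches to the already-initialized one. */
static int attach(struct rmem *rmem, struct dev *dev)
{
	if (dev->pool)
		return -1;			/* -EBUSY analogue */
	if (!rmem->priv) {
		rmem->priv = calloc(1, sizeof(*rmem->priv));
		if (!rmem->priv)
			return -2;		/* -ENOMEM analogue */
	}
	rmem->priv->refs++;
	dev->pool = rmem->priv;
	return 0;
}
```

Both devices end up pointing at the same pool object, and re-attaching an already-attached device fails, mirroring the `-EBUSY` check quoted above.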
Re: [PATCH v2 0/7] Rid W=1 warnings in Ethernet
On Wed, 13 Jan 2021, Jakub Kicinski wrote:

> On Wed, 13 Jan 2021 16:41:16 + Lee Jones wrote:
> > Resending the stragglers again.
> >
> > > > This set is part of a larger effort attempting to clean-up W=1
> > > > kernel builds, which are currently overwhelmingly riddled with
> > > > niggly little warnings.
> > > >
> > > > v2:
> > > >  - Squashed IBM patches
> > > >  - Fixed real issue in SMSC
> >  - Added Andrew's Reviewed-by tags on remainder
>
> Does not apply, please rebase on net-next/master.

These are based on Tuesday's next/master.

I just rebased them now with no issue. What conflict are you seeing?

--
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog