[PATCH] powerpc: dts: p2020rdb: add missing peripherals
This patch adds dts entries for some peripherals:
- i2c: temperature sensor ADT7461
- i2c: eeprom m24256
- i2c: eeprom at24c01
- i2c: pmic zl2006
- i2c: gpio expander
- phy: reset pins for phy
- dsa: switch vsc7385

It was required to adjust the rgmii settings for enet0 because the switch with the dsa driver acts differently without 8081 steering.

Signed-off-by: Pawel Dembicki
---
 arch/powerpc/boot/dts/fsl/p2020rdb.dts | 73 ++++++++++++++++++++++---
 1 file changed, 70 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/p2020rdb.dts b/arch/powerpc/boot/dts/fsl/p2020rdb.dts
index 3acd3890b397..1f2ddeca0375 100644
--- a/arch/powerpc/boot/dts/fsl/p2020rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/p2020rdb.dts
@@ -7,6 +7,9 @@
 /include/ "p2020si-pre.dtsi"

+#include <dt-bindings/gpio/gpio.h>
+#include
+
 / {
 	model = "fsl,P2020RDB";
 	compatible = "fsl,P2020RDB";
@@ -131,22 +134,84 @@ partition@110 {
 		L2switch@2,0 {
 			#address-cells = <1>;
 			#size-cells = <1>;
-			compatible = "vitesse-7385";
+			compatible = "vitesse,vsc7385";
 			reg = <0x2 0x0 0x2>;
-		};
+			reset-gpios = <&gpio0 12 GPIO_ACTIVE_LOW>;
+
+			ports {
+				#address-cells = <1>;
+				#size-cells = <0>;
+				port@1 {
+					reg = <1>;
+					label = "e1-sw-p1";
+				};
+				port@2 {
+					reg = <2>;
+					label = "e1-sw-p2";
+				};
+				port@3 {
+					reg = <3>;
+					label = "e1-sw-p3";
+				};
+				port@4 {
+					reg = <4>;
+					label = "e1-sw-p4";
+				};
+				port@6 {
+					reg = <6>;
+					label = "cpu";
+					ethernet = <&enet0>;
+					phy-mode = "rgmii";
+					fixed-link {
+						speed = <1000>;
+						full-duplex;
+						pause;
+					};
+				};
+			};
+		};
 	};

 	soc: soc@ffe0 {
 		ranges = <0x0 0x0 0xffe0 0x10>;

+		gpio0: gpio-controller@fc00 {
+		};
+
 		i2c@3000 {
+			temperature-sensor@4c {
+				compatible = "adi,adt7461";
+				reg = <0x4c>;
+			};
+
+			eeprom@50 {
+				compatible = "atmel,24c256";
+				reg = <0x50>;
+			};

 			rtc@68 {
 				compatible = "dallas,ds1339";
 				reg = <0x68>;
 			};
 		};

+		i2c@3100 {
+			pmic@11 {
+				compatible = "zl2006";
+				reg = <0x11>;
+			};
+
+			gpio@18 {
+				compatible = "nxp,pca9557";
+				reg = <0x18>;
+			};
+
+			eeprom@52 {
+				compatible = "atmel,24c01";
+				reg = <0x52>;
+			};
+		};
+
 		spi@7000 {
 			flash@0 {
				#address-cells = <1>;
@@ -200,10 +265,12 @@ mdio@24520 {
 		phy0: ethernet-phy@0 {
 			interrupts = <3 1 0 0>;
 			reg = <0x0>;
+			reset-gpios = <&gpio0 14 GPIO_ACTIVE_LOW>;
 		};
 		phy1: ethernet-phy@1 {
 			interrupts = <3 1 0 0>;
 			reg = <0x1>;
+			reset-gpios = <&gpio0 6 GPIO_ACTIVE_LOW>;
 		};
 		tbi-phy@2 {
 			device_type = "tbi-phy";
@@ -233,7 +300,7 @@ ptp_clock@24e00 {
 	enet0: ethernet@24000 {
 		fixed-link = <1 1 1000 0 0>;
-		phy-connection-type = "rgmii-id";
+
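As an aside for readers of the diff above, the `fixed-link = <1 1 1000 0 0>` property on enet0 is the legacy 5-cell encoding: emulated PHY id, full-duplex flag, speed, pause, asym-pause. A minimal C sketch of a decoder for that tuple (a hypothetical helper for illustration, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Legacy 5-cell fixed-link devicetree encoding:
 * <emulated-phy-id full-duplex speed pause asym-pause>.
 * Hypothetical decoder mirroring how a driver reads the cells. */
struct fixed_link {
    uint32_t phy_id;
    int full_duplex;
    uint32_t speed;      /* Mbit/s */
    int pause;
    int asym_pause;
};

static struct fixed_link decode_fixed_link(const uint32_t cells[5])
{
    struct fixed_link fl = {
        .phy_id = cells[0],
        .full_duplex = cells[1] != 0,
        .speed = cells[2],
        .pause = cells[3] != 0,
        .asym_pause = cells[4] != 0,
    };
    return fl;
}
```

For the p2020rdb value above this decodes to a 1000 Mbit/s full-duplex link without pause frames on the MAC side.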
[PATCH v2] Adds a new ioctl32 syscall for backwards compatibility layers
From: Ryan Houdek

Problem presented: A backwards compatibility layer that allows running x86-64 and x86 processes inside of an AArch64 process.
- CPU is emulated
- Syscall interface is mostly passthrough
- Some syscalls require patching or emulation depending on behaviour
- Not viable from the emulator design to use an AArch32 host process

x86-64 and x86 userspace emulator source: https://github.com/FEX-Emu/FEX
Usage of ioctl32 is currently in a downstream fork. This will be the first user of the syscall.
Cross documentation: https://github.com/FEX-Emu/FEX/wiki/32Bit-x86-Woes#ioctl---54

ioctls are opaque from the emulator perspective and the data wants to be passed through a syscall as unimpeded as possible. Sadly, due to ioctl struct differences between x86 and x86-64, we need a syscall that exposes the compatibility ioctl handler to userspace in a 64-bit process. This is necessary because of the behaviour differences between an x86 process doing an ioctl and an x86-64 process doing an ioctl, both of which are captured and passed through the AArch64 ioctl space.

This implements a new ioctl32 syscall that allows us to pass 32-bit x86 ioctls through to the kernel with zero or minimal manipulation. The only supported hosts we currently care about are AArch64 and x86-64 (for testing purposes). PPC64LE, MIPS64LE, and RISC-V64 might be interesting to support in the future, but I don't have any platforms that get anywhere near Cortex-A77 performance on those architectures, nor the time to bring up the emulator on them. x86-64 can get to the compatibility ioctl through the int $0x80 handler.
This does not solve the following problems:
1) compat_alloc_user_space inside ioctl
2) ioctls that check task mode instead of entry point for behaviour
3) ioctls allocating memory
4) struct packing problems between architectures

Workarounds for the problems presented:
1a) Do a stack pivot to the lower 32 bits from userspace
    - Forces the host 64-bit process to have its thread stacks live in 32-bit space. Not ideal.
    - Only do a stack pivot on ioctl to save previous 32-bit VA space
1b) Teach the kernel that compat_alloc_user_space can return a 64-bit pointer
    - x86-64 truncates the stack from this function
    - AArch64 returns the full stack pointer
    - Only ~29 users. Validating all of them support a 64-bit stack is trivial?
2a) Any application using these can be checked for compatibility in userspace and put on a block list.
2b) Fix any ioctls doing broken behaviour based on task mode rather than ioctl entry point
3a) Userspace consumes all VA space above 32 bits, forcing allocations to occur in the lower 32 bits
    - This is the current implementation
3b) Ensure any allocation in the ioctl honours the ioctl entry point rather than just allowing generic memory allocations in the full VA space
    - This is hard to guarantee
4a) Blocklist any application using ioctls that have different struct packing across the boundary
    - Can happen when struct packing of a 32-bit x86 application goes down the aarch64 compat_ioctl path
    - Userspace is an AArch64 process passing 32-bit x86 ioctl structures through the compat_ioctl path, which is typically for AArch32 processes
    - None currently identified
4b) Work with upstream kernel and userspace projects to evaluate and fix
    - Identify the problem ioctls
    - Implement a new ioctl with more sane struct packing that matches cross-arch
    - Implement the new ioctl while maintaining backwards compatibility with the previous ioctl handler
    - Change the upstream project to use the new compatibility ioctl
    - ioctl deprecation will be case by case per device and project
4c) Userspace implements a full ioctl emulation layer
    - Parses the full ioctl tree
    - Either passes through ioctls that it doesn't understand or transforms ioctls that it knows are trouble
    - Has the downside that it can still run into edge cases that will fail
    - Performance of additional tracking is a concern
    - Prone to failure keeping the kernel ioctl and userspace ioctl handling in sync
    - Really want to have it in kernel space as much as possible

Changes in v2:
- Added the syscall to all architecture tables
- Disabled on 32-bit and BE platforms. They can call ioctl directly.
- Disabled on x86-64 as well since you can call this from the ia32 or x32 dispatch tables

Signed-off-by: Ryan Houdek
---
 arch/alpha/kernel/syscalls/syscall.tbl      | 1 +
 arch/arm/tools/syscall.tbl                  | 1 +
 arch/arm64/include/asm/unistd.h             | 2 +-
 arch/arm64/include/asm/unistd32.h           | 2 ++
 arch/ia64/kernel/syscalls/syscall.tbl       | 1 +
 arch/m68k/kernel/syscalls/syscall.tbl       | 1 +
 arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   | 2 ++
 arch/mips/kernel/syscalls/syscall_o32.tbl   | 1 +
 arch/parisc/kernel/syscalls/syscall.tbl     | 1 +
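The struct-layout mismatch motivating the series can be illustrated with a pair of invented struct definitions (these are not a real driver ABI): on an LP64 host, pointer- and long-sized fields give the native ioctl argument a different size and field offsets from what a 32-bit x86 caller marshals, which is why the compat_ioctl handler has to be reachable for these calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustration only: the layout a 64-bit ioctl handler expects vs. the
 * layout a 32-bit x86 caller actually produced. Field types model how
 * pointers and longs differ across the ABI boundary. */

struct native_arg {        /* seen by the native 64-bit ioctl path */
    uint64_t buf;          /* user pointer: 8 bytes on LP64 */
    long     len;          /* 8 bytes on LP64 */
};

struct compat_arg {        /* produced by a 32-bit x86 caller */
    uint32_t buf;          /* 32-bit user pointer */
    int32_t  len;          /* 32-bit long */
};
```

Because both sizes and offsets differ, blindly handing the 32-bit buffer to the native handler misreads every field after the first; the compat path translates instead.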
[PATCH] Adds a new ioctl32 syscall for backwards compatibility layers
From: Ryan Houdek

Problem presented: A backwards compatibility layer that allows running x86-64 and x86 processes inside of an AArch64 process.
- CPU is emulated
- Syscall interface is mostly passthrough
- Some syscalls require patching or emulation depending on behaviour
- Not viable from the emulator design to use an AArch32 host process

x86-64 and x86 userspace emulator source: https://github.com/FEX-Emu/FEX
Usage of ioctl32 is currently in a downstream fork. This will be the first user of the syscall.
Cross documentation: https://github.com/FEX-Emu/FEX/wiki/32Bit-x86-Woes#ioctl---54

ioctls are opaque from the emulator perspective and the data wants to be passed through a syscall as unimpeded as possible. Sadly, due to ioctl struct differences between x86 and x86-64, we need a syscall that exposes the compatibility ioctl handler to userspace in a 64-bit process. This is necessary because of the behaviour differences between an x86 process doing an ioctl and an x86-64 process doing an ioctl, both of which are captured and passed through the AArch64 ioctl space.

This implements a new ioctl32 syscall that allows us to pass 32-bit x86 ioctls through to the kernel with zero or minimal manipulation. The only supported hosts we currently care about are AArch64 and x86-64 (for testing purposes). PPC64LE, MIPS64LE, and RISC-V64 might be interesting to support in the future, but I don't have any platforms that get anywhere near Cortex-A77 performance on those architectures, nor the time to bring up the emulator on them. x86-64 can get to the compatibility ioctl through the int $0x80 handler.
This does not solve the following problems:
1) compat_alloc_user_space inside ioctl
2) ioctls that check task mode instead of entry point for behaviour
3) ioctls allocating memory
4) struct packing problems between architectures

Workarounds for the problems presented:
1a) Do a stack pivot to the lower 32 bits from userspace
    - Forces the host 64-bit process to have its thread stacks live in 32-bit space. Not ideal.
    - Only do a stack pivot on ioctl to save previous 32-bit VA space
1b) Teach the kernel that compat_alloc_user_space can return a 64-bit pointer
    - x86-64 truncates the stack from this function
    - AArch64 returns the full stack pointer
    - Only ~29 users. Validating all of them support a 64-bit stack is trivial?
2a) Any application using these can be checked for compatibility in userspace and put on a block list.
2b) Fix any ioctls doing broken behaviour based on task mode rather than ioctl entry point
3a) Userspace consumes all VA space above 32 bits, forcing allocations to occur in the lower 32 bits
    - This is the current implementation
3b) Ensure any allocation in the ioctl honours the ioctl entry point rather than just allowing generic memory allocations in the full VA space
    - This is hard to guarantee
4a) Blocklist any application using ioctls that have different struct packing across the boundary
    - Can happen when struct packing of a 32-bit x86 application goes down the aarch64 compat_ioctl path
    - Userspace is an AArch64 process passing 32-bit x86 ioctl structures through the compat_ioctl path, which is typically for AArch32 processes
    - None currently identified
4b) Work with upstream kernel and userspace projects to evaluate and fix
    - Identify the problem ioctls
    - Implement a new ioctl with more sane struct packing that matches cross-arch
    - Implement the new ioctl while maintaining backwards compatibility with the previous ioctl handler
    - Change the upstream project to use the new compatibility ioctl
    - ioctl deprecation will be case by case per device and project
4c) Userspace implements a full ioctl emulation layer
    - Parses the full ioctl tree
    - Either passes through ioctls that it doesn't understand or transforms ioctls that it knows are trouble
    - Has the downside that it can still run into edge cases that will fail
    - Performance of additional tracking is a concern
    - Prone to failure keeping the kernel ioctl and userspace ioctl handling in sync
    - Really want to have it in kernel space as much as possible

Signed-off-by: Ryan Houdek
---
 arch/alpha/kernel/syscalls/syscall.tbl      | 1 +
 arch/arm/tools/syscall.tbl                  | 1 +
 arch/arm64/include/asm/unistd.h             | 2 +-
 arch/arm64/include/asm/unistd32.h           | 2 ++
 arch/ia64/kernel/syscalls/syscall.tbl       | 1 +
 arch/m68k/kernel/syscalls/syscall.tbl       | 1 +
 arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   | 2 ++
 arch/mips/kernel/syscalls/syscall_o32.tbl   | 1 +
 arch/parisc/kernel/syscalls/syscall.tbl     | 1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    | 1 +
 arch/s390/kernel/syscalls/syscall.tbl       | 1 +
 arch/sh/kernel/syscalls/syscall.tbl         | 1 +
 arch/sparc/kernel/syscalls/syscall.tbl      | 1 +
 arch/x86/entry/syscalls/syscall_32.tbl
Re: [PATCH] perf tools: Resolve symbols against debug file first
Hello,

On Thu, Jan 14, 2021 at 8:17 PM Michael Ellerman wrote:
> Namhyung Kim writes:
> > On Wed, Jan 13, 2021 at 8:43 PM Jiri Slaby wrote:
> >> On 13. 01. 21, 11:46, Jiri Olsa wrote:
> >> > On Wed, Jan 13, 2021 at 09:01:28AM +0100, Jiri Slaby wrote:
> >> >> With LTO, there are symbols like these:
> >> >> /usr/lib/debug/usr/lib64/libantlr4-runtime.so.4.8-4.8-1.4.x86_64.debug
> >> >> 10305: 00955fa4 0 NOTYPE LOCAL DEFAULT 29 Predicate.cpp.2bc410e7
> >> >>
> >> >> This comes from a runtime/debug split done by the standard way:
> >> >> objcopy --only-keep-debug $runtime $debug
> >> >> objcopy --add-gnu-debuglink=$debugfn -R .comment -R .GCC.command.line --strip-all $runtime
> >> >>
...
> >> >> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> >> >> index f3577f7d72fe..a31b716fa61c 100644
> >> >> --- a/tools/perf/util/symbol-elf.c
> >> >> +++ b/tools/perf/util/symbol-elf.c
> >> >> @@ -1226,12 +1226,20 @@ int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *syms_ss,
> >> >> 		if (sym.st_shndx == SHN_ABS)
> >> >> 			continue;
> >> >>
> >> >> -	sec = elf_getscn(runtime_ss->elf, sym.st_shndx);
> >> >> +	sec = elf_getscn(syms_ss->elf, sym.st_shndx);
> >> >> 		if (!sec)
> >> >> 			goto out_elf_end;
> >> >
> >> > we iterate symbols from syms_ss, so the fix seems to be correct
> >> > to call elf_getscn on syms_ss, not on runtime_ss as we do now
> >> >
> >> > I'd think this worked only when runtime_ss == syms_ss
> >>
> >> No, because the headers are copied 1:1 from runtime_ss to syms_ss. And
> >> runtime_ss is then stripped, so only .debug* sections are removed there.
> >> (And syms_ss's are set as NOBITS.)
> >>
> >> We iterated .debug* sections in syms_ss and used runtime_ss section
> >> _headers_ only to adjust symbols (sometimes). That worked.
> >
> > It seems PPC has an opd section only in the runtime_ss and that's why
> > we use it for section headers.
>
> At least on my system (Ubuntu 20.04.1) I see .opd in the debug file with
> NOBITS set:
>
> $ readelf -e vmlinux.debug | grep opd
> [37] .opd NOBITS c1c1f548 01202e14
>
> But possibly that's not the case with older toolchains?

I was referring to this commit:

commit 261360b6e90a782f0a63d8f61a67683c376c88cf
Author: Cody P Schafer
Date: Fri Aug 10 15:23:01 2012 -0700

    perf symbols: Convert dso__load_syms to take 2 symsrc's

    To properly handle platforms with an opd section, both a runtime image
    (which contains the opd section but possibly lacks symbols) and a symbol
    image (which probably lacks an opd section but has symbols).

    The next patch ("perf symbol: use both runtime and debug images")
    adjusts the callsite in dso__load() to take advantage of being able to
    pass both runtime & debug images.

    Assumptions made here:
    - The opd section, if it exists in the runtime image, has headers in
      both the runtime image and the debug/syms image.
    - The index of the opd section (again, only if it exists in the runtime
      image) is the same in both the runtime and debug/symbols image.

    Both of these are true on RHEL, but it is unclear how accurate they are
    in general (on platforms with function descriptors in opd sections).

    Signed-off-by: Cody P Schafer

Thanks,
Namhyung
[PATCH] powerpc/sstep: Fix array out of bound warning
Compiling the kernel with -Warray-bounds throws the warning below:

In function 'emulate_vsx_store':
warning: array subscript is above array bounds [-Warray-bounds]
  buf.d[2] = byterev_8(reg->d[1]);
  buf.d[3] = byterev_8(reg->d[0]);

Fix it by converting the local variable 'union vsx_reg buf' into an array. Also treat the function argument 'union vsx_reg *reg' as an array instead of a pointer, because callers actually pass an array to it.

Fixes: af99da74333b ("powerpc/sstep: Support VSX vector paired storage access instructions")
Signed-off-by: Ravi Bangoria
---
 arch/powerpc/lib/sstep.c | 61
 1 file changed, 31 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index bf7a7d62ae8b..5b4281ade5b6 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -723,7 +723,8 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 	const unsigned char *bp;

 	size = GETSIZE(op->type);
-	reg->d[0] = reg->d[1] = 0;
+	reg[0].d[0] = reg[0].d[1] = 0;
+	reg[1].d[0] = reg[1].d[1] = 0;

 	switch (op->element_size) {
 	case 32:
@@ -742,25 +743,25 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 		/* scalar loads, lxvd2x, lxvdsx */
 		read_size = (size >= 8) ? 8 : size;
 		i = IS_LE ? 8 : 8 - read_size;
-		memcpy(&reg->b[i], mem, read_size);
+		memcpy(&reg[0].b[i], mem, read_size);
 		if (rev)
-			do_byte_reverse(&reg->b[i], 8);
+			do_byte_reverse(&reg[0].b[i], 8);
 		if (size < 8) {
 			if (op->type & SIGNEXT) {
 				/* size == 4 is the only case here */
-				reg->d[IS_LE] = (signed int) reg->d[IS_LE];
+				reg[0].d[IS_LE] = (signed int)reg[0].d[IS_LE];
 			} else if (op->vsx_flags & VSX_FPCONV) {
 				preempt_disable();
-				conv_sp_to_dp(&reg->fp[1 + IS_LE],
-					      &reg->dp[IS_LE]);
+				conv_sp_to_dp(&reg[0].fp[1 + IS_LE],
+					      &reg[0].dp[IS_LE]);
 				preempt_enable();
 			}
 		} else {
 			if (size == 16) {
 				unsigned long v = *(unsigned long *)(mem + 8);
-				reg->d[IS_BE] = !rev ? v : byterev_8(v);
+				reg[0].d[IS_BE] = !rev ? v : byterev_8(v);
 			} else if (op->vsx_flags & VSX_SPLAT)
-				reg->d[IS_BE] = reg->d[IS_LE];
+				reg[0].d[IS_BE] = reg[0].d[IS_LE];
 		}
 		break;
 	case 4:
@@ -768,13 +769,13 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 		wp = mem;
 		for (j = 0; j < size / 4; ++j) {
 			i = IS_LE ? 3 - j : j;
-			reg->w[i] = !rev ? *wp++ : byterev_4(*wp++);
+			reg[0].w[i] = !rev ? *wp++ : byterev_4(*wp++);
 		}
 		if (op->vsx_flags & VSX_SPLAT) {
-			u32 val = reg->w[IS_LE ? 3 : 0];
+			u32 val = reg[0].w[IS_LE ? 3 : 0];
 			for (; j < 4; ++j) {
 				i = IS_LE ? 3 - j : j;
-				reg->w[i] = val;
+				reg[0].w[i] = val;
 			}
 		}
 		break;
@@ -783,7 +784,7 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 		hp = mem;
 		for (j = 0; j < size / 2; ++j) {
 			i = IS_LE ? 7 - j : j;
-			reg->h[i] = !rev ? *hp++ : byterev_2(*hp++);
+			reg[0].h[i] = !rev ? *hp++ : byterev_2(*hp++);
 		}
 		break;
 	case 1:
@@ -791,7 +792,7 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 		bp = mem;
 		for (j = 0; j < size; ++j) {
 			i = IS_LE ? 15 - j : j;
-			reg->b[i] = *bp++;
+			reg[0].b[i] = *bp++;
 		}
 		break;
 	}
@@ -804,7 +805,7 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
 {
 	int size, write_size;
 	int i, j;
-	union vsx_reg buf;
+	union vsx_reg buf[2];
 	unsigned int *wp;
 	unsigned short *hp;
 	unsigned char *bp;
@@ -818,11 +819,11 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
 			break;
 		if (rev) {
 			/*
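The shape of the fix can be modelled outside the kernel: a vector-paired access touches two 16-byte VSX registers, so the helpers index `reg[0]`/`reg[1]` of a two-element union array instead of subscripting past the end of a single union (what -Warray-bounds flagged for `buf.d[2]`/`buf.d[3]`). A minimal sketch, where `byterev_8` is a plain C byte swap standing in for the kernel helper and the pairing order is illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Two interpretations of one 16-byte VSX register. */
union vsx_reg {
    uint8_t  b[16];
    uint64_t d[2];
};

/* Plain-C 64-bit byte reversal (stand-in for the kernel's byterev_8). */
static uint64_t byterev_8(uint64_t x)
{
    x = ((x & 0x00ff00ff00ff00ffULL) << 8)  | ((x >> 8)  & 0x00ff00ff00ff00ffULL);
    x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
    return (x << 32) | (x >> 32);
}

/* Byte-reverse a register pair into buf, staying inside each element
 * of the two-element arrays rather than indexing d[2]/d[3]. */
static void store_pair_rev(union vsx_reg buf[2], const union vsx_reg reg[2])
{
    buf[0].d[0] = byterev_8(reg[1].d[1]);
    buf[0].d[1] = byterev_8(reg[1].d[0]);
    buf[1].d[0] = byterev_8(reg[0].d[1]);
    buf[1].d[1] = byterev_8(reg[0].d[0]);
}
```

With `union vsx_reg buf[2]`, every subscript is within bounds, so the compiler's array-bounds analysis has nothing to complain about.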
Re: [PATCH 5/6] powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE
On 15/1/21 9:00 am, Nathan Lynch wrote: RTAS_RMOBUF_MAX doesn't actually describe a "maximum" value in any sense. It represents the size of an area of memory set aside for user space to use as work areas for certain RTAS calls. Rename it to RTAS_USER_REGION, and express the value in terms of the number of work areas allocated. Signed-off-by: Nathan Lynch squash! powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE I think you meant to get rid of this line... -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH 3/6] powerpc/rtas: remove ibm_suspend_me_token
On 15/1/21 9:00 am, Nathan Lynch wrote: There's not a compelling reason to cache the value of the token for the ibm,suspend-me function. Just look it up when needed in the RTAS syscall's special case for it. Signed-off-by: Nathan Lynch Reviewed-by: Andrew Donnellan --- arch/powerpc/kernel/rtas.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index d126d71ea5bd..60fcf7f7b0b8 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -828,7 +828,6 @@ void rtas_activate_firmware(void) pr_err("ibm,activate-firmware failed (%i)\n", fwrc); } -static int ibm_suspend_me_token = RTAS_UNKNOWN_SERVICE; #ifdef CONFIG_PPC_PSERIES /** * rtas_call_reentrant() - Used for reentrant rtas calls @@ -1103,7 +1102,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) return -EINVAL; /* Need to handle ibm,suspend_me call specially */ - if (token == ibm_suspend_me_token) { + if (token == rtas_token("ibm,suspend-me")) { /* * rtas_ibm_suspend_me assumes the streamid handle is in cpu @@ -1191,10 +1190,8 @@ void __init rtas_initialize(void) * the stop-self token if any */ #ifdef CONFIG_PPC64 - if (firmware_has_feature(FW_FEATURE_LPAR)) { + if (firmware_has_feature(FW_FEATURE_LPAR)) rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX); - ibm_suspend_me_token = rtas_token("ibm,suspend-me"); - } #endif rtas_rmo_buf = memblock_phys_alloc_range(RTAS_RMOBUF_MAX, PAGE_SIZE, 0, rtas_region); -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH 2/6] powerpc/rtas-proc: remove unused RMO_READ_BUF_MAX
On 15/1/21 9:00 am, Nathan Lynch wrote: This constant is unused. Signed-off-by: Nathan Lynch A quick grep agrees. Reviewed-by: Andrew Donnellan --- arch/powerpc/kernel/rtas-proc.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c index e0f8329966d6..d2b0d99824a4 100644 --- a/arch/powerpc/kernel/rtas-proc.c +++ b/arch/powerpc/kernel/rtas-proc.c @@ -755,8 +755,6 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v) return 0; } -#define RMO_READ_BUF_MAX 30 - /** * ppc_rtas_rmo_buf_show() - Describe RTAS-addressable region for user space. * -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH 1/6] powerpc/rtas: improve ppc_rtas_rmo_buf_show documentation
On 15/1/21 8:59 am, Nathan Lynch wrote: Add kerneldoc for ppc_rtas_rmo_buf_show(), the callback for /proc/powerpc/rtas/rmo_buffer, explaining its expected use. Signed-off-by: Nathan Lynch Looks good. Reviewed-by: Andrew Donnellan -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH 4/6] powerpc/rtas: move syscall filter setup into separate function
On 15/1/21 9:00 am, Nathan Lynch wrote: Reduce conditionally compiled sections within rtas_initialize() by moving the filter table initialization into its own function already guarded by CONFIG_PPC_RTAS_FILTER. No behavior change intended. Signed-off-by: Nathan Lynch Acked-by: Andrew Donnellan +static void __init rtas_syscall_filter_init(void) +{ + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) { + rtas_filters[i].token = rtas_token(rtas_filters[i].name); + } + +} Unnecessary empty line, but vitally important braces :) -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH 4/6] powerpc/rtas: move syscall filter setup into separate function
On 15/01/2021 09:00, Nathan Lynch wrote: Reduce conditionally compiled sections within rtas_initialize() by moving the filter table initialization into its own function already guarded by CONFIG_PPC_RTAS_FILTER. No behavior change intended. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 60fcf7f7b0b8..55f6aa170e57 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1051,6 +1051,16 @@ static bool block_rtas_call(int token, int nargs, return true; } +static void __init rtas_syscall_filter_init(void) +{ + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) { + rtas_filters[i].token = rtas_token(rtas_filters[i].name); + } + Unnecessary curly braces (I understand it is cut-n-paste but still) and an empty line. Otherwise: Reviewed-by: Alexey Kardashevskiy +} + #else static bool block_rtas_call(int token, int nargs, @@ -1059,6 +1069,10 @@ static bool block_rtas_call(int token, int nargs, return false; } +static void __init rtas_syscall_filter_init(void) +{ +} + #endif /* CONFIG_PPC_RTAS_FILTER */ /* We assume to be passed big endian arguments */ @@ -1162,9 +1176,6 @@ void __init rtas_initialize(void) unsigned long rtas_region = RTAS_INSTANTIATE_MAX; u32 base, size, entry; int no_base, no_size, no_entry; -#ifdef CONFIG_PPC_RTAS_FILTER - int i; -#endif /* Get RTAS dev node and fill up our "rtas" structure with infos * about it. @@ -1203,11 +1214,7 @@ void __init rtas_initialize(void) rtas_last_error_token = rtas_token("rtas-last-error"); #endif -#ifdef CONFIG_PPC_RTAS_FILTER - for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) { - rtas_filters[i].token = rtas_token(rtas_filters[i].name); - } -#endif + rtas_syscall_filter_init(); } int __init early_init_dt_scan_rtas(unsigned long node, -- Alexey
Re: [PATCH 1/6] powerpc/rtas: improve ppc_rtas_rmo_buf_show documentation
On 15/01/2021 08:59, Nathan Lynch wrote: Add kerneldoc for ppc_rtas_rmo_buf_show(), the callback for /proc/powerpc/rtas/rmo_buffer, explaining its expected use. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas-proc.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c index 2d33f342a293..e0f8329966d6 100644 --- a/arch/powerpc/kernel/rtas-proc.c +++ b/arch/powerpc/kernel/rtas-proc.c @@ -757,7 +757,16 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v) #define RMO_READ_BUF_MAX 30 -/* RTAS Userspace access */ +/** + * ppc_rtas_rmo_buf_show() - Describe RTAS-addressable region for user space. + * + * Base + size description of a range of RTAS-addressable memory set + * aside for user space to use as work area(s) for certain RTAS + * functions. User space accesses this region via /dev/mem. mmm ufff wut argh^w^w^w^w Thanks for documenting it :) Reviewed-by: Alexey Kardashevskiy Apart from + * security policies, the kernel does not arbitrate or serialize + * access to this region, and user space must ensure that concurrent + * users do not interfere with each other. + */ static int ppc_rtas_rmo_buf_show(struct seq_file *m, void *v) { seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_RMOBUF_MAX); -- Alexey
Re: [PATCH 3/6] powerpc/rtas: remove ibm_suspend_me_token
On 15/01/2021 09:00, Nathan Lynch wrote: There's not a compelling reason to cache the value of the token for the ibm,suspend-me function. Just look it up when needed in the RTAS syscall's special case for it. Signed-off-by: Nathan Lynch Reviewed-by: Alexey Kardashevskiy --- arch/powerpc/kernel/rtas.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index d126d71ea5bd..60fcf7f7b0b8 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -828,7 +828,6 @@ void rtas_activate_firmware(void) pr_err("ibm,activate-firmware failed (%i)\n", fwrc); } -static int ibm_suspend_me_token = RTAS_UNKNOWN_SERVICE; #ifdef CONFIG_PPC_PSERIES /** * rtas_call_reentrant() - Used for reentrant rtas calls @@ -1103,7 +1102,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) return -EINVAL; /* Need to handle ibm,suspend_me call specially */ - if (token == ibm_suspend_me_token) { + if (token == rtas_token("ibm,suspend-me")) { /* * rtas_ibm_suspend_me assumes the streamid handle is in cpu @@ -1191,10 +1190,8 @@ void __init rtas_initialize(void) * the stop-self token if any */ #ifdef CONFIG_PPC64 - if (firmware_has_feature(FW_FEATURE_LPAR)) { + if (firmware_has_feature(FW_FEATURE_LPAR)) rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX); - ibm_suspend_me_token = rtas_token("ibm,suspend-me"); - } #endif rtas_rmo_buf = memblock_phys_alloc_range(RTAS_RMOBUF_MAX, PAGE_SIZE, 0, rtas_region); -- Alexey
Re: [PATCH 5/6] powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE
On 15/01/2021 09:00, Nathan Lynch wrote: RTAS_RMOBUF_MAX doesn't actually describe a "maximum" value in any sense. It represents the size of an area of memory set aside for user space to use as work areas for certain RTAS calls. Rename it to RTAS_USER_REGION, and express the value in terms of the number of work areas allocated. Signed-off-by: Nathan Lynch squash! powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE --- arch/powerpc/include/asm/rtas.h | 9 ++--- arch/powerpc/kernel/rtas-proc.c | 2 +- arch/powerpc/kernel/rtas.c | 2 +- 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h index 332e1000ca0f..1aa7ab1cbc84 100644 --- a/arch/powerpc/include/asm/rtas.h +++ b/arch/powerpc/include/asm/rtas.h @@ -19,8 +19,11 @@ #define RTAS_UNKNOWN_SERVICE (-1) #define RTAS_INSTANTIATE_MAX (1ULL<<30) /* Don't instantiate rtas at/above this value */ -/* Buffer size for ppc_rtas system call. */ -#define RTAS_RMOBUF_MAX (64 * 1024) +/* Work areas shared with RTAS must be 4K, naturally aligned. */ Why exactly 4K and not (for example) PAGE_SIZE? +#define RTAS_WORK_AREA_SIZE 4096 + +/* Work areas allocated for user space access. */ +#define RTAS_USER_REGION_SIZE (RTAS_WORK_AREA_SIZE * 16) This is still 64K but no clarity why. There is 16 of something, what is it? 
/* RTAS return status codes */ #define RTAS_BUSY -2/* RTAS Busy */ @@ -357,7 +360,7 @@ extern void rtas_take_timebase(void); static inline int page_is_rtas_user_buf(unsigned long pfn) { unsigned long paddr = (pfn << PAGE_SHIFT); - if (paddr >= rtas_rmo_buf && paddr < (rtas_rmo_buf + RTAS_RMOBUF_MAX)) + if (paddr >= rtas_rmo_buf && paddr < (rtas_rmo_buf + RTAS_USER_REGION_SIZE)) return 1; return 0; } diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c index d2b0d99824a4..6857a5b0a1c3 100644 --- a/arch/powerpc/kernel/rtas-proc.c +++ b/arch/powerpc/kernel/rtas-proc.c @@ -767,6 +767,6 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v) */ static int ppc_rtas_rmo_buf_show(struct seq_file *m, void *v) { - seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_RMOBUF_MAX); + seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_USER_REGION_SIZE); return 0; } diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 55f6aa170e57..da65faadbbb2 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1204,7 +1204,7 @@ void __init rtas_initialize(void) if (firmware_has_feature(FW_FEATURE_LPAR)) rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX); #endif - rtas_rmo_buf = memblock_phys_alloc_range(RTAS_RMOBUF_MAX, PAGE_SIZE, + rtas_rmo_buf = memblock_phys_alloc_range(RTAS_USER_REGION_SIZE, PAGE_SIZE, 0, rtas_region); if (!rtas_rmo_buf) panic("ERROR: RTAS: Failed to allocate %lx bytes below %pa\n", -- Alexey
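For reference, the arithmetic behind the renamed constant and the bounds check quoted above can be sketched in a few lines of C. The base address here is invented for the example; the real `rtas_rmo_buf` comes from the memblock allocation:

```c
#include <assert.h>

/* Renamed constants from the patch: 16 work areas of 4 KiB each. */
#define RTAS_WORK_AREA_SIZE   4096
#define RTAS_USER_REGION_SIZE (RTAS_WORK_AREA_SIZE * 16)  /* 64 KiB */
#define PAGE_SHIFT 12

/* Illustrative base; in the kernel this is set by memblock_phys_alloc_range(). */
static unsigned long rtas_rmo_buf = 0x200000;

/* Same logic as page_is_rtas_user_buf(): is this pfn inside the
 * user-accessible RTAS work-area region? */
static int page_is_rtas_user_buf(unsigned long pfn)
{
    unsigned long paddr = pfn << PAGE_SHIFT;

    return paddr >= rtas_rmo_buf &&
           paddr < rtas_rmo_buf + RTAS_USER_REGION_SIZE;
}
```

With a 64 KiB region at 0x200000, page frames 0x200 through 0x20f are inside the region and everything else is rejected, which is what /dev/mem mapping permission checks rely on.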
Re: [PATCH 2/6] powerpc/rtas-proc: remove unused RMO_READ_BUF_MAX
On 15/01/2021 09:00, Nathan Lynch wrote: This constant is unused. Signed-off-by: Nathan Lynch Reviewed-by: Alexey Kardashevskiy --- arch/powerpc/kernel/rtas-proc.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c index e0f8329966d6..d2b0d99824a4 100644 --- a/arch/powerpc/kernel/rtas-proc.c +++ b/arch/powerpc/kernel/rtas-proc.c @@ -755,8 +755,6 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v) return 0; } -#define RMO_READ_BUF_MAX 30 - /** * ppc_rtas_rmo_buf_show() - Describe RTAS-addressable region for user space. * -- Alexey
Re: [PATCH 6/6] powerpc/rtas: constrain user region allocation to RMA
On 15/01/2021 09:00, Nathan Lynch wrote:

Memory locations passed as arguments from the OS to RTAS usually need to be addressable in 32-bit mode and must reside in the Real Mode Area. On PAPR guests, the RMA starts at logical address 0 and is the first logical memory block reported in the LPAR's device tree.

On powerpc targets with RTAS, Linux makes available to user space a region of memory suitable for arguments to be passed to RTAS via sys_rtas(). This region (rtas_rmo_buf) is allocated via the memblock API during boot in order to ensure that it satisfies the requirements described above.

With radix MMU, the upper limit supplied to the memblock allocation can exceed the bounds of the first logical memory block, since ppc64_rma_size is ULONG_MAX and RTAS_INSTANTIATE_MAX is 1GB. (512MB is a common size of the first memory block according to a small sample of LPARs I have checked.) This leads to failures when user space invokes an RTAS function that uses a work area, such as ibm,configure-connector.

Alter the determination of the upper limit for rtas_rmo_buf's allocation to consult the device tree directly, ensuring placement within the RMA regardless of the MMU in use.

Can we tie this with RTAS (which also needs to be in the RMA) and simply add an extra 64K in prom_instantiate_rtas(), then advertise this address (ALIGN_UP(rtas-base + rtas-size, PAGE_SIZE)) to user space? We do not need this RMO area before that point. And probably do the same with the per-cpu RTAS argument structures mentioned in the cover letter?
> Signed-off-by: Nathan Lynch
> ---
>  arch/powerpc/kernel/rtas.c | 80 +++---
>  1 file changed, 65 insertions(+), 15 deletions(-)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index da65faadbbb2..98dfb112f4df 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -1166,6 +1166,70 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>  	return 0;
>  }
>
> +/*
> + * Memory locations passed to RTAS must be in the RMA as described by
> + * the range in /memory@0.
> + */
> +static phys_addr_t rtas_arg_addr_limit(void)
> +{
> +	unsigned int addr_cells;
> +	unsigned int size_cells;
> +	struct device_node *np;
> +	const __be32 *prop;
> +	u64 limit;
> +	u64 base;
> +
> +	/* RTAS is instantiated in 32-bit mode. */
> +	limit = 1ULL << 32;
> +
> +	/* Account for mem=. */
> +	if (memory_limit != 0)
> +		limit = min(limit, memory_limit);
> +
> +	np = of_find_node_by_path("/memory@0");
> +	if (!np)
> +		goto out;
> +
> +	prop = of_get_property(np, "reg", NULL);
> +	if (!prop)
> +		goto put;
> +
> +	addr_cells = of_n_addr_cells(np);
> +	base = of_read_number(prop, addr_cells);
> +	prop += addr_cells;
> +	size_cells = of_n_size_cells(np);
> +	limit = min(limit, of_read_number(prop, size_cells));
> +put:
> +	of_node_put(np);
> +out:
> +	pr_debug("%s: base = %#llx limit = %#llx", __func__, base, limit);
> +
> +	return limit;
> +}
> +
> +static void __init rtas_user_region_setup(void)
> +{
> +	phys_addr_t limit, align, size;
> +
> +	limit = rtas_arg_addr_limit();
> +	size = RTAS_USER_REGION_SIZE;
> +
> +	/*
> +	 * Although work areas need only 4KB alignment, user space
> +	 * accesses this region via mmap so it must be placed on a
> +	 * page boundary.
> +	 */
> +	align = PAGE_SIZE;
> +
> +	rtas_rmo_buf = memblock_phys_alloc_range(size, align, 0, limit);
> +	if (rtas_rmo_buf == 0) {
> +		panic("Failed to allocate %llu bytes for user region below %pa\n",
> +		      size, &limit);
> +	}
> +
> +	pr_debug("RTAS user region allocated at %pa\n", &rtas_rmo_buf);
> +}
> +
>  /*
>   * Call early during boot, before mem init, to retrieve the RTAS
>   * information from the device-tree and allocate the RMO buffer for userland
> @@ -1173,7 +1237,6 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>   */
>  void __init rtas_initialize(void)
>  {
> -	unsigned long rtas_region = RTAS_INSTANTIATE_MAX;
>  	u32 base, size, entry;
>  	int no_base, no_size, no_entry;
>
> @@ -1197,23 +1260,10 @@ void __init rtas_initialize(void)
>  	no_entry = of_property_read_u32(rtas.dev, "linux,rtas-entry", &entry);
>  	rtas.entry = no_entry ? rtas.base : entry;
>
> -	/* If RTAS was found, allocate the RMO buffer for it and look for
> -	 * the stop-self token if any
> -	 */
> -#ifdef CONFIG_PPC64
> -	if (firmware_has_feature(FW_FEATURE_LPAR))
> -		rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX);
> -#endif
> -	rtas_rmo_buf = memblock_phys_alloc_range(RTAS_USER_REGION_SIZE, PAGE_SIZE,
> -						 0, rtas_region);
> -	if (!rtas_rmo_buf)
> -		panic("ERROR: RTAS: Failed to allocate %lx bytes below %pa\n",
> -		      PAGE_SIZE, &rtas_region);
> -
>  #ifdef CONFIG_RTAS_ERROR_LOGGING
>  	rtas_last_error_token =
Re: [PATCH 10/18] arch: powerpc: Stop building and using oprofile
On 14-01-21, 17:05, Viresh Kumar wrote:
> The "oprofile" user-space tools don't use the kernel OPROFILE support
> any more, and haven't in a long time. User-space has been converted to
> the perf interfaces.
>
> This commit stops building oprofile for powerpc and removes any
> reference to it from directories in arch/powerpc/ apart from
> arch/powerpc/oprofile, which will be removed in the next commit (this is
> broken into two commits as the size of the commit became very big, ~5k
> lines).
>
> Note that the member "oprofile_cpu_type" in "struct cpu_spec" isn't
> removed as it was also used by other parts of the code.
>
> Suggested-by: Christoph Hellwig
> Suggested-by: Linus Torvalds
> Signed-off-by: Viresh Kumar

+ this to fix a warning:

diff --git a/arch/powerpc/platforms/cell/spufs/run.c b/arch/powerpc/platforms/cell/spufs/run.c
index 466006918003..ce52b87496d2 100644
--- a/arch/powerpc/platforms/cell/spufs/run.c
+++ b/arch/powerpc/platforms/cell/spufs/run.c
@@ -353,7 +353,6 @@ static int spu_process_callback(struct spu_context *ctx)
 long spufs_run_spu(struct spu_context *ctx, u32 *npc, u32 *event)
 {
 	int ret;
-	struct spu *spu;
 	u32 status;
 
 	if (mutex_lock_interruptible(&ctx->run_mutex))
@@ -386,7 +385,6 @@ long spufs_run_spu(struct spu_context *ctx, u32 *npc, u32 *event)
 		mutex_lock(&ctx->state_mutex);
 		break;
 	}
-	spu = ctx->spu;
 	if (unlikely(test_and_clear_bit(SPU_SCHED_NOTIFY_ACTIVE,
 					&ctx->sched_flags))) {
 		if (!(status & SPU_STATUS_STOPPED_BY_STOP))

--
viresh
Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool
On Fri, Jan 15, 2021 at 2:52 AM Florian Fainelli wrote:
>
> On 1/14/21 1:08 AM, Claire Chang wrote:
> > On Wed, Jan 13, 2021 at 7:48 AM Florian Fainelli wrote:
> >>
> >> On 1/5/21 7:41 PM, Claire Chang wrote:
> >>> If a device is not behind an IOMMU, we look up the device node and set
> >>> up the restricted DMA when the restricted-dma-pool is presented.
> >>>
> >>> Signed-off-by: Claire Chang
> >>> ---
> >>
> >> [snip]
> >>
> >>> +int of_dma_set_restricted_buffer(struct device *dev)
> >>> +{
> >>> +	struct device_node *node;
> >>> +	int count, i;
> >>> +
> >>> +	if (!dev->of_node)
> >>> +		return 0;
> >>> +
> >>> +	count = of_property_count_elems_of_size(dev->of_node, "memory-region",
> >>> +						sizeof(phandle));
> >>
> >> You could have an early check for count < 0, along with an error
> >> message, if that is deemed useful.
> >>
> >>> +	for (i = 0; i < count; i++) {
> >>> +		node = of_parse_phandle(dev->of_node, "memory-region", i);
> >>> +		if (of_device_is_compatible(node, "restricted-dma-pool"))
> >>
> >> And you may want to add here an of_device_is_available(node). A platform
> >> that provides the Device Tree firmware and tries to support multiple
> >> different SoCs may try to determine if an IOMMU is present, and if it
> >> is, it could be marking the restricted-dma-pool region with a 'status =
> >> "disabled"' property, or any variant of that scheme.
> >
> > This function is called only when there is no IOMMU present (check in
> > drivers/of/device.c). I can still add of_device_is_available(node)
> > here if you think it's helpful.
>
> I believe it is, since the boot loader can have a shared Device Tree blob
> skeleton and do various adaptations based on the chip (that's what we
> do), and adding a status property is much simpler than inserting new
> nodes at run time.
>
> >>
> >>> +			return of_reserved_mem_device_init_by_idx(
> >>> +				dev, dev->of_node, i);
> >>
> >> This does not seem to be supporting more than one memory region, did not
> >> you want something like instead:
> >>
> >> ret = of_reserved_mem_device_init_by_idx(...);
> >> if (ret)
> >> 	return ret;
> >>
> >
> > Yes. This implementation only supports one restricted-dma-pool memory
> > region with the assumption that there is only one memory region with the
> > compatible string, restricted-dma-pool, in the dts. IIUC, it's similar
> > to shared-dma-pool.
>
> Then if there is such a known limitation it should be both documented and
> enforced here; you should not be iterating over all of the phandles that
> you find, stop at the first one and issue a warning if count > 1?

What I have in mind is there might be multiple memory regions, but only
one is for restricted-dma-pool. Say, if you want a separate region for
coherent DMA and only do streaming DMA in this restricted-dma-pool
region, you can add another reserved-memory node with shared-dma-pool in
the dts and the current implementation will try to allocate the memory
via dma_alloc_from_dev_coherent() first (see dma_alloc_attrs() in
kernel/dma/mapping.c). Or if you have a vendor-specific memory region,
you can still set up restricted-dma-pool by adding another
reserved-memory node in the dts. Does this make sense to you? I'll
document this for sure.

> --
> Florian
Re: [PATCH v5 00/21] ibmvfc: initial MQ development/enablement
Tyrel,

> Recent updates in pHyp Firmware and VIOS releases provide new
> infrastructure towards enabling Subordinate Command Response Queues
> (Sub-CRQs) such that each Sub-CRQ is a channel backed by an actual
> hardware queue in the FC stack on the partner VIOS. Sub-CRQs are
> registered with the firmware via hypercalls and then negotiated with
> the VIOS via new Management Datagrams (MADs) for channel setup.
>
> This initial implementation adds the necessary Sub-CRQ framework and
> implements the new MADs for negotiating and assigning a set of
> Sub-CRQs to associated VIOS HW backed channels.

Applied to 5.12/scsi-staging, thanks!

--
Martin K. Petersen
Oracle Linux Engineering
[powerpc:next-test] BUILD SUCCESS 6629114a7f6f21d41640cbc006c3694e65940729
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: 6629114a7f6f21d41640cbc006c3694e65940729  powerpc/vas: Fix IRQ name allocation

elapsed time: 726m

configs tested: 122
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm                      defconfig
arm64                    allyesconfig
arm64                    defconfig
arm                      allyesconfig
arm                      allmodconfig
mips                     tb0287_defconfig
s390                     debug_defconfig
openrisc                 defconfig
arc                      nsim_700_defconfig
arm                      sunxi_defconfig
mips                     cavium_octeon_defconfig
alpha                    alldefconfig
sh                       r7780mp_defconfig
sparc64                  defconfig
m68k                     stmark2_defconfig
arc                      nsimosci_defconfig
powerpc                  mpc5200_defconfig
arm                      cns3420vb_defconfig
powerpc                  holly_defconfig
mips                     pistachio_defconfig
mips                     nlm_xlr_defconfig
powerpc                  tqm8xx_defconfig
powerpc                  katmai_defconfig
powerpc                  eiger_defconfig
arc                      alldefconfig
m68k                     mvme147_defconfig
um                       kunit_defconfig
m68k                     alldefconfig
arm                      at91_dt_defconfig
m68k                     m5407c3_defconfig
riscv                    rv32_defconfig
arm                      davinci_all_defconfig
powerpc                  pq2fads_defconfig
powerpc                  mpc8272_ads_defconfig
arm                      corgi_defconfig
arm                      aspeed_g5_defconfig
arm                      mvebu_v5_defconfig
arm                      assabet_defconfig
arm                      multi_v7_defconfig
mips                     loongson1b_defconfig
mips                     mtx1_defconfig
arm                      spear3xx_defconfig
arc                      axs103_defconfig
nios2                    defconfig
arm                      cm_x300_defconfig
sh                       se7750_defconfig
sh                       se7206_defconfig
powerpc                  tqm8555_defconfig
arm                      lpc32xx_defconfig
sh                       se7722_defconfig
nds32                    allnoconfig
mips                     maltaup_defconfig
csky                     alldefconfig
arm                      iop32x_defconfig
ia64                     allmodconfig
ia64                     defconfig
ia64                     allyesconfig
m68k                     allmodconfig
m68k                     defconfig
m68k                     allyesconfig
arc                      allyesconfig
c6x                      allyesconfig
nds32                    defconfig
nios2                    allyesconfig
csky                     defconfig
alpha                    defconfig
alpha                    allyesconfig
xtensa                   allyesconfig
h8300                    allyesconfig
arc                      defconfig
sh                       allmodconfig
parisc                   defconfig
s390                     allyesconfig
parisc                   allyesconfig
s390                     defconfig
i386                     allyesconfig
sparc                    allyesconfig
sparc                    defconfig
i386                     tinyconfig
i386                     defconfig
mips                     allyesconfig
mips                     allmodconfig
powerpc                  allyesconfig
powerpc                  allmodconfig
powerpc                  allnoconfig
i386                     randconfig-a002-20210114
i386                     randconfig-a005-20210114
i386                     randconfig-a006-20210114
i386                     randconfig-a001-20210114
i386                     randconfig-a003-20210114
i386                     randconfig-a004-20210114
x86_64                   randconfig-a015-20210114
x86_64                   randconfig-a012-20210114
x86_64                   randconfig-a013-20210114
x86_64                   randconfig-a016-20210114
x86_64                   randconfig-a014-20210114
x86_64                   randconfig-a011-20210114
i386                     randconfig-a012-20210114
i386                     randconfig-a011-20210114
i386                     randconfig-a016-20210114
i386                     randconfig
[powerpc:merge] BUILD SUCCESS 56f0e929cb28f6e28e6f88d88c03b75979cf1f40
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 56f0e929cb28f6e28e6f88d88c03b75979cf1f40  Automatic merge of 'fixes' into merge (2021-01-14 15:57)

elapsed time: 725m

configs tested: 134
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm                      defconfig
arm64                    allyesconfig
arm64                    defconfig
arm                      allyesconfig
arm                      allmodconfig
powerpc                  mpc836x_rdk_defconfig
powerpc                  mpc8313_rdb_defconfig
arm                      mvebu_v7_defconfig
mips                     bcm47xx_defconfig
sh                       rsk7201_defconfig
mips                     tb0287_defconfig
s390                     debug_defconfig
openrisc                 defconfig
arc                      nsim_700_defconfig
arm                      sunxi_defconfig
powerpc                  mpc5200_defconfig
arm                      cns3420vb_defconfig
powerpc                  holly_defconfig
mips                     pistachio_defconfig
mips                     nlm_xlr_defconfig
powerpc                  tqm8xx_defconfig
powerpc                  katmai_defconfig
powerpc                  eiger_defconfig
arc                      alldefconfig
m68k                     mvme147_defconfig
um                       kunit_defconfig
arm                      zeus_defconfig
mips                     workpad_defconfig
arm                      eseries_pxa_defconfig
mips                     cu1000-neo_defconfig
mips                     mpc30x_defconfig
mips                     maltasmvp_defconfig
m68k                     alldefconfig
arm                      at91_dt_defconfig
m68k                     m5407c3_defconfig
riscv                    rv32_defconfig
arm                      corgi_defconfig
arm                      aspeed_g5_defconfig
arm                      mvebu_v5_defconfig
arm                      assabet_defconfig
arm                      multi_v7_defconfig
sh                       se7750_defconfig
m68k                     q40_defconfig
m68k                     m5475evb_defconfig
arm                      imx_v6_v7_defconfig
arm                      tegra_defconfig
mips                     loongson1b_defconfig
mips                     mtx1_defconfig
arm                      spear3xx_defconfig
arc                      axs103_defconfig
nios2                    defconfig
arm                      cm_x300_defconfig
sh                       se7206_defconfig
powerpc                  tqm8555_defconfig
arm                      lpc32xx_defconfig
sh                       se7722_defconfig
nds32                    allnoconfig
mips                     maltaup_defconfig
csky                     alldefconfig
arm                      iop32x_defconfig
powerpc                  tqm8541_defconfig
sh                       ap325rxa_defconfig
arm                      palmz72_defconfig
s390                     alldefconfig
h8300                    h8300h-sim_defconfig
powerpc                  tqm8540_defconfig
ia64                     allmodconfig
ia64                     defconfig
ia64                     allyesconfig
m68k                     allmodconfig
m68k                     defconfig
m68k                     allyesconfig
arc                      allyesconfig
c6x                      allyesconfig
nds32                    defconfig
nios2                    allyesconfig
csky                     defconfig
alpha                    defconfig
alpha                    allyesconfig
xtensa                   allyesconfig
h8300                    allyesconfig
arc                      defconfig
sh                       allmodconfig
parisc                   defconfig
s390                     allyesconfig
parisc                   allyesconfig
s390                     defconfig
i386                     allyesconfig
sparc                    allyesconfig
sparc                    defconfig
i386                     tinyconfig
i386                     defconfig
mips                     allyesconfig
mips                     allmodconfig
powerpc                  allyesconfig
powerpc                  allmodconfig
powerpc                  allnoconfig
i386                     randconfig-a002-20210114
i386                     randconfig-a005-20210114
i386                     randconfig-a006-20210114
i386
[powerpc:fixes-test] BUILD SUCCESS 41131a5e54ae7ba5a2bb8d7b30d1818b3f5b13d2
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test
branch HEAD: 41131a5e54ae7ba5a2bb8d7b30d1818b3f5b13d2  powerpc/vdso: Fix clock_gettime_fallback for vdso32

elapsed time: 728m

configs tested: 115
configs skipped: 94

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm                      defconfig
arm64                    allyesconfig
arm64                    defconfig
arm                      allyesconfig
arm                      allmodconfig
mips                     tb0287_defconfig
s390                     debug_defconfig
openrisc                 defconfig
arc                      nsim_700_defconfig
arm                      sunxi_defconfig
powerpc                  mpc5200_defconfig
arm                      cns3420vb_defconfig
powerpc                  holly_defconfig
mips                     pistachio_defconfig
mips                     nlm_xlr_defconfig
powerpc                  tqm8xx_defconfig
powerpc                  katmai_defconfig
powerpc                  eiger_defconfig
arc                      alldefconfig
m68k                     mvme147_defconfig
um                       kunit_defconfig
m68k                     alldefconfig
arm                      at91_dt_defconfig
m68k                     m5407c3_defconfig
riscv                    rv32_defconfig
arm                      corgi_defconfig
arm                      aspeed_g5_defconfig
arm                      mvebu_v5_defconfig
arm                      assabet_defconfig
arm                      multi_v7_defconfig
powerpc                  pcm030_defconfig
powerpc                  cm5200_defconfig
mips                     loongson1b_defconfig
mips                     mtx1_defconfig
arm                      spear3xx_defconfig
arc                      axs103_defconfig
nios2                    defconfig
arm                      cm_x300_defconfig
sh                       se7750_defconfig
sh                       se7206_defconfig
powerpc                  tqm8555_defconfig
arm                      lpc32xx_defconfig
sh                       se7722_defconfig
nds32                    allnoconfig
mips                     maltaup_defconfig
csky                     alldefconfig
arm                      iop32x_defconfig
ia64                     allmodconfig
ia64                     defconfig
ia64                     allyesconfig
m68k                     allmodconfig
m68k                     defconfig
m68k                     allyesconfig
arc                      allyesconfig
c6x                      allyesconfig
nds32                    defconfig
nios2                    allyesconfig
csky                     defconfig
alpha                    defconfig
alpha                    allyesconfig
xtensa                   allyesconfig
h8300                    allyesconfig
arc                      defconfig
sh                       allmodconfig
parisc                   defconfig
s390                     allyesconfig
parisc                   allyesconfig
s390                     defconfig
i386                     allyesconfig
sparc                    allyesconfig
sparc                    defconfig
i386                     tinyconfig
i386                     defconfig
mips                     allyesconfig
mips                     allmodconfig
powerpc                  allyesconfig
powerpc                  allmodconfig
powerpc                  allnoconfig
i386                     randconfig-a002-20210114
i386                     randconfig-a005-20210114
i386                     randconfig-a006-20210114
i386                     randconfig-a001-20210114
i386                     randconfig-a003-20210114
i386                     randconfig-a004-20210114
x86_64                   randconfig-a015-20210114
x86_64                   randconfig-a012-20210114
x86_64                   randconfig-a013-20210114
x86_64                   randconfig-a016-20210114
x86_64                   randconfig-a014-20210114
x86_64                   randconfig-a011-20210114
i386                     randconfig-a012-20210114
i386                     randconfig-a011-20210114
i386                     randconfig-a016-20210114
i386                     randconfig-a015-20210114
i386                     randconfig-a013-20210114
i386                     randconfig-a014-20210114
riscv                    nommu_k210_defconfig
riscv                    allyesconfig
riscv                    nommu_virt_defconfig
riscv                    allnoconfig
riscv
Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement
On Thu, Jan 14, 2021 at 11:24:35AM -0600, Brian King wrote:
> On 1/13/21 7:27 PM, Ming Lei wrote:
> > On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
> >> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> >>> On 1/12/21 2:54 PM, Brian King wrote:
> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> > Introduce several new vhost fields for managing MQ state of the adapter
> > as well as initial defaults for MQ enablement.
> >
> > Signed-off-by: Tyrel Datwyler
> > ---
> > drivers/scsi/ibmvscsi/ibmvfc.c | 8
> > drivers/scsi/ibmvscsi/ibmvfc.h | 9 +
> > 2 files changed, 17 insertions(+)
> >
> > diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> > index ba95438a8912..9200fe49c57e 100644
> > --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> > +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> > @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
> > 	.max_sectors = IBMVFC_MAX_SECTORS,
> > 	.shost_attrs = ibmvfc_attrs,
> > 	.track_queue_depth = 1,
> > +	.host_tagset = 1,
>
> This doesn't seem right. You are setting host_tagset, which means you want a
> shared, host wide, tag set for commands. It also means that the total
> queue depth for the host is can_queue. However, it looks like you are allocating
> max_requests events for each sub crq, which means you are over allocating memory.
> >>>
> >>> With the shared tagset yes the queue depth for the host is can_queue, but this
> >>> also implies that the max queue depth for each hw queue is also can_queue. So,
> >>> in the worst case that all commands are queued down the same hw queue we need an
> >>> event pool with can_queue commands.
> >>>
> Looking at this closer, we might have bigger problems. There is a host wide
> max number of commands that the VFC host supports, which gets returned on
> NPIV Login. This value can change across a live migration event.
> >>>
> >>> From what I understand the max commands can only become less.
> >>>
> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
> can_queue on the scsi_host *after* the tag set has been allocated. This looks
> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
> we look at can_queue once the tag set is setup, and I'm not seeing a good way
> to dynamically change the host queue depth once the tag set is setup.
>
> Unless I'm missing something, our best options appear to either be to implement
> our own host wide busy reference counting, which doesn't sound very good, or
> we need to add some API to block / scsi that allows us to dynamically change
> can_queue.
> >>>
> >>> Changing can_queue won't do us any good with the shared tagset because each
> >>> queue still needs to be able to queue can_queue number of commands in the worst
> >>> case.
> >>
> >> The issue I'm trying to highlight here is the following scenario:
> >>
> >> 1. We set shost->can_queue, then call scsi_add_host, which allocates
> >> the tag set.
> >>
> >> 2. On our NPIV login response from the VIOS, we might get a lower value
> >> than we initially set in shost->can_queue, so we update it, but nobody
> >> ever looks at it again, and we don't have any protection against sending
> >> too many commands to the host.
> >>
> >> Basically, we no longer have any code that ensures we don't send more
> >> commands to the VIOS than we are told it supports. According to the
> >> architecture, if we actually do this, the VIOS will do an h_free_crq,
> >> which would be a bit of a bug on our part.
> >>
> >> I don't think it was ever clearly defined in the API that a driver can
> >> change shost->can_queue after calling scsi_add_host, but up until
> >> commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now
> >> it doesn't.
> >
> > Actually it isn't related with commit 6eb045e092ef, because
> > blk_mq_alloc_tag_set() uses .can_queue to create driver tag sbitmap
> > and request pool.
> >
> > So even though without 6eb045e092ef, the updated .can_queue can't work
> > as expected because the max driver tag depth has been fixed by blk-mq
> > already.
>
> There are two scenarios here. In the scenario of someone increasing can_queue
> after the tag set is allocated, I agree, blk-mq will never take advantage
> of this. However, in the scenario of someone *decreasing* can_queue after the
> tag set is allocated, prior to 6eb045e092ef, the shost->host_busy code provided
> this protection.

When .can_queue is decreased, blk-mq still may allocate driver tag which is >
.can_queue, this way might break driver/device too, but it depends on how
driver uses req->tag.

> > > > What
Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
Excerpts from Christophe Leroy's message of January 14, 2021 11:28 pm:
>
> Le 14/01/2021 à 14:17, Nicholas Piggin a écrit :
>> Excerpts from Christophe Leroy's message of January 14, 2021 10:25 pm:
>>>
>>> Le 14/01/2021 à 13:09, Nicholas Piggin a écrit :
>>>> Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm:
> Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am:
>>
>> Le 13/01/2021 à 08:31, Nicholas Piggin a écrit :
>>> The page fault handling still has some complex logic particularly around
>>> hash table handling, in asm. Implement this in C instead.
>>>
>>> Signed-off-by: Nicholas Piggin
>>> ---
>>> arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 +
>>> arch/powerpc/kernel/exceptions-64s.S | 131 +++---
>>> arch/powerpc/mm/book3s64/hash_utils.c | 77 ++
>>> arch/powerpc/mm/fault.c | 46 --
>>> 4 files changed, 107 insertions(+), 148 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>> b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>> index 066b1d34c7bc..60a669379aa0 100644
>>> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>> @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn,
>>> #define HPTE_NOHPTE_UPDATE 0x2
>>> #define HPTE_USE_KERNEL_KEY 0x4
>>>
>>> +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr);
>>> extern int __hash_page_4K(unsigned long ea, unsigned long access,
>>> 			  unsigned long vsid, pte_t *ptep, unsigned long trap,
>>> 			  unsigned long flags, int ssize, int subpage_prot);
>>> diff --git a/arch/powerpc/kernel/exceptions-64s.S
>>> b/arch/powerpc/kernel/exceptions-64s.S
>>> index 6e53f7638737..bcb5e81d2088 100644
>>> --- a/arch/powerpc/kernel/exceptions-64s.S
>>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>>> @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>>>  *
>>>  * Handling:
>>>  * - Hash MMU
>>> - *   Go to do_hash_page first to see if the HPT can be filled from an entry in
>>> - *   the Linux page table. Hash faults can hit in kernel mode in a fairly
>>> + *   Go to do_hash_fault, which attempts to fill the HPT from an entry in the
>>> + *   Linux page table. Hash faults can hit in kernel mode in a fairly
>>>  *   arbitrary state (e.g., interrupts disabled, locks held) when accessing
>>>  *   "non-bolted" regions, e.g., vmalloc space. However these should always be
>>> - *   backed by Linux page tables.
>>> + *   backed by Linux page table entries.
>>>  *
>>> - * If none is found, do a Linux page fault. Linux page faults can happen in
>>> - * kernel mode due to user copy operations of course.
>>> + * If no entry is found the Linux page fault handler is invoked (by
>>> + * do_hash_fault). Linux page faults can happen in kernel mode due to user
>>> + * copy operations of course.
>>>  *
>>>  * KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest
>>>  * MMU context, which may cause a DSI in the host, which must go to the
>>> @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common)
>>> 	GEN_COMMON data_access
>>> 	ld r4,_DAR(r1)
>>> 	ld r5,_DSISR(r1)
>>
>> We have DSISR here. I think the dispatch between page fault or
>> do_break() should be done here:
>> - It would be more similar to other arches
>
> Other sub-archs?
>
>> - Would avoid doing it also in instruction fault
>
> True but it's hidden under an unlikely branch so won't really help
> instruction fault.
>
>> - Would avoid that -1 return which looks more like a hack.
>
> I don't really see it as a hack, we return a code to the asm caller to
> direct whether to restore registers or not, we already have this
> pattern.
>
> (I'm hoping all that might go away one day by controlling NV
> regs from C if we can get good code generation but even if not we
> still have it in the interrupt returns).
>
> That said I will give it a try here. At very least it might be a
> better intermediate step.

Ah yes, this way doesn't work well for later patches because you end up,
e.g., with the do_break call having to call the interrupt handler
wrappers again when they actually expect to be in the asm entry state
(e.g., irq soft-mask state) when called, and return via interrupt_return
after the exit wrapper runs (which 64s uses
Re: [PATCH v5 00/21] ibmvfc: initial MQ development/enablement
Tyrel,

I think this patch series is looking pretty good. I don't think we need
to wait for resolution of the can_queue issue being discussed, since
that is an issue that exists prior to this patch series and this patch
series doesn't make the issue any worse. Let's work that separately.

Thanks,

Brian

--
Brian King
Power Linux I/O
IBM Linux Technology Center
Re: [PATCH v5 21/21] ibmvfc: provide modules parameters for MQ settings
Reviewed-by: Brian King

--
Brian King
Power Linux I/O
IBM Linux Technology Center
Re: [PATCH v5 18/21] ibmvfc: send Cancel MAD down each hw scsi channel
Reviewed-by: Brian King

--
Brian King
Power Linux I/O
IBM Linux Technology Center
[PATCH 5/6] powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE
RTAS_RMOBUF_MAX doesn't actually describe a "maximum" value in any
sense. It represents the size of an area of memory set aside for user
space to use as work areas for certain RTAS calls.

Rename it to RTAS_USER_REGION_SIZE, and express the value in terms of
the number of work areas allocated.

Signed-off-by: Nathan Lynch

squash! powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE
---
 arch/powerpc/include/asm/rtas.h | 9 ++---
 arch/powerpc/kernel/rtas-proc.c | 2 +-
 arch/powerpc/kernel/rtas.c      | 2 +-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 332e1000ca0f..1aa7ab1cbc84 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -19,8 +19,11 @@
 #define RTAS_UNKNOWN_SERVICE (-1)
 #define RTAS_INSTANTIATE_MAX (1ULL<<30) /* Don't instantiate rtas at/above this value */
 
-/* Buffer size for ppc_rtas system call. */
-#define RTAS_RMOBUF_MAX (64 * 1024)
+/* Work areas shared with RTAS must be 4K, naturally aligned. */
+#define RTAS_WORK_AREA_SIZE 4096
+
+/* Work areas allocated for user space access. */
+#define RTAS_USER_REGION_SIZE (RTAS_WORK_AREA_SIZE * 16)
 
 /* RTAS return status codes */
 #define RTAS_BUSY		-2	/* RTAS Busy */
@@ -357,7 +360,7 @@ extern void rtas_take_timebase(void);
 static inline int page_is_rtas_user_buf(unsigned long pfn)
 {
 	unsigned long paddr = (pfn << PAGE_SHIFT);
-	if (paddr >= rtas_rmo_buf && paddr < (rtas_rmo_buf + RTAS_RMOBUF_MAX))
+	if (paddr >= rtas_rmo_buf && paddr < (rtas_rmo_buf + RTAS_USER_REGION_SIZE))
 		return 1;
 	return 0;
 }
diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c
index d2b0d99824a4..6857a5b0a1c3 100644
--- a/arch/powerpc/kernel/rtas-proc.c
+++ b/arch/powerpc/kernel/rtas-proc.c
@@ -767,6 +767,6 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v)
  */
 static int ppc_rtas_rmo_buf_show(struct seq_file *m, void *v)
 {
-	seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_RMOBUF_MAX);
+	seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_USER_REGION_SIZE);
 	return 0;
 }
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 55f6aa170e57..da65faadbbb2 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1204,7 +1204,7 @@ void __init rtas_initialize(void)
 	if (firmware_has_feature(FW_FEATURE_LPAR))
 		rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX);
 #endif
-	rtas_rmo_buf = memblock_phys_alloc_range(RTAS_RMOBUF_MAX, PAGE_SIZE,
+	rtas_rmo_buf = memblock_phys_alloc_range(RTAS_USER_REGION_SIZE, PAGE_SIZE,
 						 0, rtas_region);
 	if (!rtas_rmo_buf)
 		panic("ERROR: RTAS: Failed to allocate %lx bytes below %pa\n",
-- 
2.29.2
[PATCH 0/6] powerpc/rtas: miscellaneous cleanups, user region allocation
The region exposed to user space for use as work areas passed to
sys_rtas() can be incorrectly allocated on radix, leading to failures in
users of librtas. Correct this and clean up some of the code visited
along the way.

I think the cleanups should be unobjectionable and I've placed them
first in the series. Please check my work on the rtas_rmo_buf allocation
changes; they are only lightly tested so far (slot add on Power9
PowerVM, and comparison of /memory@0/reg with the contents of
/proc/powerpc/rtas/rmo_buf on qemu Power9 w/radix).

I suspect the per-cpu RTAS argument structures for reentrant calls need
similar measures, but I can add that to the series once there is
consensus on the approach.

Nathan Lynch (6):
  powerpc/rtas: improve ppc_rtas_rmo_buf_show documentation
  powerpc/rtas-proc: remove unused RMO_READ_BUF_MAX
  powerpc/rtas: remove ibm_suspend_me_token
  powerpc/rtas: move syscall filter setup into separate function
  powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE
  powerpc/rtas: constrain user region allocation to RMA

 arch/powerpc/include/asm/rtas.h |   9 ++-
 arch/powerpc/kernel/rtas-proc.c |  15 +++--
 arch/powerpc/kernel/rtas.c      | 108
 3 files changed, 98 insertions(+), 34 deletions(-)

-- 
2.29.2
[PATCH 6/6] powerpc/rtas: constrain user region allocation to RMA
Memory locations passed as arguments from the OS to RTAS usually need to be addressable in 32-bit mode and must reside in the Real Mode Area. On PAPR guests, the RMA starts at logical address 0 and is the first logical memory block reported in the LPAR’s device tree. On powerpc targets with RTAS, Linux makes available to user space a region of memory suitable for arguments to be passed to RTAS via sys_rtas(). This region (rtas_rmo_buf) is allocated via the memblock API during boot in order to ensure that it satisfies the requirements described above. With radix MMU, the upper limit supplied to the memblock allocation can exceed the bounds of the first logical memory block, since ppc64_rma_size is ULONG_MAX and RTAS_INSTANTIATE_MAX is 1GB. (512MB is a common size of the first memory block according to a small sample of LPARs I have checked.) This leads to failures when user space invokes an RTAS function that uses a work area, such as ibm,configure-connector. Alter the determination of the upper limit for rtas_rmo_buf's allocation to consult the device tree directly, ensuring placement within the RMA regardless of the MMU in use. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas.c | 80 +++--- 1 file changed, 65 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index da65faadbbb2..98dfb112f4df 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1166,6 +1166,70 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) return 0; } +/* + * Memory locations passed to RTAS must be in the RMA as described by + * the range in /memory@0. + */ +static phys_addr_t rtas_arg_addr_limit(void) +{ + unsigned int addr_cells; + unsigned int size_cells; + struct device_node *np; + const __be32 *prop; + u64 limit; + u64 base; + + /* RTAS is instantiated in 32-bit mode. */ + limit = 1ULL << 32; + + /* Account for mem=. 
*/ + if (memory_limit != 0) + limit = min(limit, memory_limit); + + np = of_find_node_by_path("/memory@0"); + if (!np) + goto out; + + prop = of_get_property(np, "reg", NULL); + if (!prop) + goto put; + + addr_cells = of_n_addr_cells(np); + base = of_read_number(prop, addr_cells); + prop += addr_cells; + size_cells = of_n_size_cells(np); + limit = min(limit, of_read_number(prop, size_cells)); +put: + of_node_put(np); +out: + pr_debug("%s: base = %#llx limit = %#llx", __func__, base, limit); + + return limit; +} + +static void __init rtas_user_region_setup(void) +{ + phys_addr_t limit, align, size; + + limit = rtas_arg_addr_limit(); + size = RTAS_USER_REGION_SIZE; + + /* +* Although work areas need only 4KB alignment, user space +* accesses this region via mmap so it must be placed on a +* page boundary. +*/ + align = PAGE_SIZE; + + rtas_rmo_buf = memblock_phys_alloc_range(size, align, 0, limit); + if (rtas_rmo_buf == 0) { + panic("Failed to allocate %llu bytes for user region below %pa\n", + size, &limit); + } + + pr_debug("RTAS user region allocated at %pa\n", &rtas_rmo_buf); +} + /* * Call early during boot, before mem init, to retrieve the RTAS * information from the device-tree and allocate the RMO buffer for userland @@ -1173,7 +1237,6 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) */ void __init rtas_initialize(void) { - unsigned long rtas_region = RTAS_INSTANTIATE_MAX; u32 base, size, entry; int no_base, no_size, no_entry; @@ -1197,23 +1260,10 @@ void __init rtas_initialize(void) no_entry = of_property_read_u32(rtas.dev, "linux,rtas-entry", &entry); rtas.entry = no_entry ?
rtas.base : entry; - /* If RTAS was found, allocate the RMO buffer for it and look for -* the stop-self token if any -*/ -#ifdef CONFIG_PPC64 - if (firmware_has_feature(FW_FEATURE_LPAR)) - rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX); -#endif - rtas_rmo_buf = memblock_phys_alloc_range(RTAS_USER_REGION_SIZE, PAGE_SIZE, -0, rtas_region); - if (!rtas_rmo_buf) - panic("ERROR: RTAS: Failed to allocate %lx bytes below %pa\n", - PAGE_SIZE, &rtas_region); - #ifdef CONFIG_RTAS_ERROR_LOGGING rtas_last_error_token = rtas_token("rtas-last-error"); #endif - + rtas_user_region_setup(); rtas_syscall_filter_init(); } -- 2.29.2
[PATCH 4/6] powerpc/rtas: move syscall filter setup into separate function
Reduce conditionally compiled sections within rtas_initialize() by moving the filter table initialization into its own function already guarded by CONFIG_PPC_RTAS_FILTER. No behavior change intended. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 60fcf7f7b0b8..55f6aa170e57 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1051,6 +1051,16 @@ static bool block_rtas_call(int token, int nargs, return true; } +static void __init rtas_syscall_filter_init(void) +{ + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) { + rtas_filters[i].token = rtas_token(rtas_filters[i].name); + } + +} + #else static bool block_rtas_call(int token, int nargs, @@ -1059,6 +1069,10 @@ static bool block_rtas_call(int token, int nargs, return false; } +static void __init rtas_syscall_filter_init(void) +{ +} + #endif /* CONFIG_PPC_RTAS_FILTER */ /* We assume to be passed big endian arguments */ @@ -1162,9 +1176,6 @@ void __init rtas_initialize(void) unsigned long rtas_region = RTAS_INSTANTIATE_MAX; u32 base, size, entry; int no_base, no_size, no_entry; -#ifdef CONFIG_PPC_RTAS_FILTER - int i; -#endif /* Get RTAS dev node and fill up our "rtas" structure with infos * about it. @@ -1203,11 +1214,7 @@ void __init rtas_initialize(void) rtas_last_error_token = rtas_token("rtas-last-error"); #endif -#ifdef CONFIG_PPC_RTAS_FILTER - for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) { - rtas_filters[i].token = rtas_token(rtas_filters[i].name); - } -#endif + rtas_syscall_filter_init(); } int __init early_init_dt_scan_rtas(unsigned long node, -- 2.29.2
[PATCH 3/6] powerpc/rtas: remove ibm_suspend_me_token
There's not a compelling reason to cache the value of the token for the ibm,suspend-me function. Just look it up when needed in the RTAS syscall's special case for it. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index d126d71ea5bd..60fcf7f7b0b8 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -828,7 +828,6 @@ void rtas_activate_firmware(void) pr_err("ibm,activate-firmware failed (%i)\n", fwrc); } -static int ibm_suspend_me_token = RTAS_UNKNOWN_SERVICE; #ifdef CONFIG_PPC_PSERIES /** * rtas_call_reentrant() - Used for reentrant rtas calls @@ -1103,7 +1102,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) return -EINVAL; /* Need to handle ibm,suspend_me call specially */ - if (token == ibm_suspend_me_token) { + if (token == rtas_token("ibm,suspend-me")) { /* * rtas_ibm_suspend_me assumes the streamid handle is in cpu @@ -1191,10 +1190,8 @@ void __init rtas_initialize(void) * the stop-self token if any */ #ifdef CONFIG_PPC64 - if (firmware_has_feature(FW_FEATURE_LPAR)) { + if (firmware_has_feature(FW_FEATURE_LPAR)) rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX); - ibm_suspend_me_token = rtas_token("ibm,suspend-me"); - } #endif rtas_rmo_buf = memblock_phys_alloc_range(RTAS_RMOBUF_MAX, PAGE_SIZE, 0, rtas_region); -- 2.29.2
[PATCH 1/6] powerpc/rtas: improve ppc_rtas_rmo_buf_show documentation
Add kerneldoc for ppc_rtas_rmo_buf_show(), the callback for /proc/powerpc/rtas/rmo_buffer, explaining its expected use. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas-proc.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c index 2d33f342a293..e0f8329966d6 100644 --- a/arch/powerpc/kernel/rtas-proc.c +++ b/arch/powerpc/kernel/rtas-proc.c @@ -757,7 +757,16 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v) #define RMO_READ_BUF_MAX 30 -/* RTAS Userspace access */ +/** + * ppc_rtas_rmo_buf_show() - Describe RTAS-addressable region for user space. + * + * Base + size description of a range of RTAS-addressable memory set + * aside for user space to use as work area(s) for certain RTAS + * functions. User space accesses this region via /dev/mem. Apart from + * security policies, the kernel does not arbitrate or serialize + * access to this region, and user space must ensure that concurrent + * users do not interfere with each other. + */ static int ppc_rtas_rmo_buf_show(struct seq_file *m, void *v) { seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_RMOBUF_MAX); -- 2.29.2
[PATCH 2/6] powerpc/rtas-proc: remove unused RMO_READ_BUF_MAX
This constant is unused. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas-proc.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c index e0f8329966d6..d2b0d99824a4 100644 --- a/arch/powerpc/kernel/rtas-proc.c +++ b/arch/powerpc/kernel/rtas-proc.c @@ -755,8 +755,6 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v) return 0; } -#define RMO_READ_BUF_MAX 30 - /** * ppc_rtas_rmo_buf_show() - Describe RTAS-addressable region for user space. * -- 2.29.2
Re: [PATCH v2 0/7] Rid W=1 warnings in Ethernet
On Thu, 14 Jan 2021 08:33:49 +0000 Lee Jones wrote: > On Wed, 13 Jan 2021, Jakub Kicinski wrote: > > On Wed, 13 Jan 2021 16:41:16 +0000 Lee Jones wrote: > > > Resending the stragglers again. > > > This set is part of a larger effort attempting to clean-up W=1 > > > kernel builds, which are currently overwhelmingly riddled with > > > niggly little warnings. > > > v2: > > > - Squashed IBM patches > > > - Fixed real issue in SMSC > > > - Added Andrew's Reviewed-by tags on remainder > > Does not apply, please rebase on net-next/master. > These are based on Tuesday's next/master. What's next/master? This is net-next: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/ > I just rebased them now with no issue. > What conflict are you seeing? Applying: net: ethernet: smsc: smc91x: Fix function name in kernel-doc header error: patch failed: drivers/net/ethernet/smsc/smc91x.c:2192 error: drivers/net/ethernet/smsc/smc91x.c: patch does not apply Patch failed at 0001 net: ethernet: smsc: smc91x: Fix function name in kernel-doc header hint: Use 'git am --show-current-patch=diff' to see the failed patch When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort".
[PATCH v5 17/21] ibmvfc: add cancel mad initialization helper
Add a helper routine for initializing a Cancel MAD. This will be useful for a channelized client that needs to send Cancel commands down every channel commands were sent for a particular LUN. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 68 -- 1 file changed, 40 insertions(+), 28 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 578e27180f10..b0b0212344f3 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -2379,6 +2379,45 @@ static int ibmvfc_wait_for_ops(struct ibmvfc_host *vhost, void *device, return SUCCESS; } +static struct ibmvfc_event *ibmvfc_init_tmf(struct ibmvfc_queue *queue, + struct scsi_device *sdev, + int type) +{ + struct ibmvfc_host *vhost = shost_priv(sdev->host); + struct scsi_target *starget = scsi_target(sdev); + struct fc_rport *rport = starget_to_rport(starget); + struct ibmvfc_event *evt; + struct ibmvfc_tmf *tmf; + + evt = ibmvfc_get_event(queue); + ibmvfc_init_event(evt, ibmvfc_sync_completion, IBMVFC_MAD_FORMAT); + + tmf = &evt->iu.tmf; + memset(tmf, 0, sizeof(*tmf)); + if (ibmvfc_check_caps(vhost, IBMVFC_HANDLE_VF_WWPN)) { + tmf->common.version = cpu_to_be32(2); + tmf->target_wwpn = cpu_to_be64(rport->port_name); + } else { + tmf->common.version = cpu_to_be32(1); + } + tmf->common.opcode = cpu_to_be32(IBMVFC_TMF_MAD); + tmf->common.length = cpu_to_be16(sizeof(*tmf)); + tmf->scsi_id = cpu_to_be64(rport->port_id); + int_to_scsilun(sdev->lun, &tmf->lun); + if (!ibmvfc_check_caps(vhost, IBMVFC_CAN_SUPPRESS_ABTS)) + type &= ~IBMVFC_TMF_SUPPRESS_ABTS; + if (vhost->state == IBMVFC_ACTIVE) + tmf->flags = cpu_to_be32((type | IBMVFC_TMF_LUA_VALID)); + else + tmf->flags = cpu_to_be32(((type & IBMVFC_TMF_SUPPRESS_ABTS) | IBMVFC_TMF_LUA_VALID)); + tmf->cancel_key = cpu_to_be32((unsigned long)sdev->hostdata); + tmf->my_cancel_key = cpu_to_be32((unsigned long)starget->hostdata); + + init_completion(&evt->comp); + + return evt; +} + /** *
ibmvfc_cancel_all - Cancel all outstanding commands to the device * @sdev: scsi device to cancel commands @@ -2393,9 +2432,6 @@ static int ibmvfc_wait_for_ops(struct ibmvfc_host *vhost, void *device, static int ibmvfc_cancel_all(struct scsi_device *sdev, int type) { struct ibmvfc_host *vhost = shost_priv(sdev->host); - struct scsi_target *starget = scsi_target(sdev); - struct fc_rport *rport = starget_to_rport(starget); - struct ibmvfc_tmf *tmf; struct ibmvfc_event *evt, *found_evt; union ibmvfc_iu rsp; int rsp_rc = -EBUSY; @@ -2422,32 +2458,8 @@ static int ibmvfc_cancel_all(struct scsi_device *sdev, int type) } if (vhost->logged_in) { - evt = ibmvfc_get_event(&vhost->crq); - ibmvfc_init_event(evt, ibmvfc_sync_completion, IBMVFC_MAD_FORMAT); - - tmf = &evt->iu.tmf; - memset(tmf, 0, sizeof(*tmf)); - if (ibmvfc_check_caps(vhost, IBMVFC_HANDLE_VF_WWPN)) { - tmf->common.version = cpu_to_be32(2); - tmf->target_wwpn = cpu_to_be64(rport->port_name); - } else { - tmf->common.version = cpu_to_be32(1); - } - tmf->common.opcode = cpu_to_be32(IBMVFC_TMF_MAD); - tmf->common.length = cpu_to_be16(sizeof(*tmf)); - tmf->scsi_id = cpu_to_be64(rport->port_id); - int_to_scsilun(sdev->lun, &tmf->lun); - if (!ibmvfc_check_caps(vhost, IBMVFC_CAN_SUPPRESS_ABTS)) - type &= ~IBMVFC_TMF_SUPPRESS_ABTS; - if (vhost->state == IBMVFC_ACTIVE) - tmf->flags = cpu_to_be32((type | IBMVFC_TMF_LUA_VALID)); - else - tmf->flags = cpu_to_be32(((type & IBMVFC_TMF_SUPPRESS_ABTS) | IBMVFC_TMF_LUA_VALID)); - tmf->cancel_key = cpu_to_be32((unsigned long)sdev->hostdata); - tmf->my_cancel_key = cpu_to_be32((unsigned long)starget->hostdata); - + evt = ibmvfc_init_tmf(&vhost->crq, sdev, type); + evt->sync_iu = &rsp; - init_completion(&evt->comp); rsp_rc = ibmvfc_send_event(evt, vhost, default_timeout); } -- 2.27.0
[PATCH v5 10/21] ibmvfc: define Sub-CRQ interrupt handler routine
Simple handler that calls Sub-CRQ drain routine directly. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index f3cd092478ee..51bcafad9490 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3571,6 +3571,16 @@ static void ibmvfc_drain_sub_crq(struct ibmvfc_queue *scrq) } } +static irqreturn_t ibmvfc_interrupt_scsi(int irq, void *scrq_instance) +{ + struct ibmvfc_queue *scrq = (struct ibmvfc_queue *)scrq_instance; + + ibmvfc_toggle_scrq_irq(scrq, 0); + ibmvfc_drain_sub_crq(scrq); + + return IRQ_HANDLED; +} + /** * ibmvfc_init_tgt - Set the next init job step for the target * @tgt: ibmvfc target struct -- 2.27.0
[PATCH v5 18/21] ibmvfc: send Cancel MAD down each hw scsi channel
In general the client needs to send Cancel MADs and task management commands down the same channel as the command(s) intended to cancel or abort. The client assigns cancel keys per LUN and thus must send a Cancel down each channel commands were submitted for that LUN. Further, the client then must wait for those cancel completions prior to submitting a LUN RESET or ABORT TASK SET. Add a cancel rsp iu synchronization field to the ibmvfc_queue struct such that the cancel routine can sync the cancel response to each queue that requires a cancel command. Build a list of each cancel event sent and wait for the completion of each submitted cancel. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 109 + drivers/scsi/ibmvscsi/ibmvfc.h | 3 + 2 files changed, 100 insertions(+), 12 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index b0b0212344f3..5ca8fcafd1d5 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -2418,18 +2418,82 @@ static struct ibmvfc_event *ibmvfc_init_tmf(struct ibmvfc_queue *queue, return evt; } -/** - * ibmvfc_cancel_all - Cancel all outstanding commands to the device - * @sdev: scsi device to cancel commands - * @type: type of error recovery being performed - * - * This sends a cancel to the VIOS for the specified device. This does - * NOT send any abort to the actual device. That must be done separately.
- * - * Returns: - * 0 on success / other on failure - **/ -static int ibmvfc_cancel_all(struct scsi_device *sdev, int type) +static int ibmvfc_cancel_all_mq(struct scsi_device *sdev, int type) +{ + struct ibmvfc_host *vhost = shost_priv(sdev->host); + struct ibmvfc_event *evt, *found_evt, *temp; + struct ibmvfc_queue *queues = vhost->scsi_scrqs.scrqs; + unsigned long flags; + int num_hwq, i; + int fail = 0; + LIST_HEAD(cancelq); + u16 status; + + ENTER; + spin_lock_irqsave(vhost->host->host_lock, flags); + num_hwq = vhost->scsi_scrqs.active_queues; + for (i = 0; i < num_hwq; i++) { + spin_lock(queues[i].q_lock); + spin_lock(&queues[i].l_lock); + found_evt = NULL; + list_for_each_entry(evt, &queues[i].sent, queue_list) { + if (evt->cmnd && evt->cmnd->device == sdev) { + found_evt = evt; + break; + } + } + spin_unlock(&queues[i].l_lock); + + if (found_evt && vhost->logged_in) { + evt = ibmvfc_init_tmf(&queues[i], sdev, type); + evt->sync_iu = &queues[i].cancel_rsp; + ibmvfc_send_event(evt, vhost, default_timeout); + list_add_tail(&evt->cancel, &cancelq); + } + + spin_unlock(queues[i].q_lock); + } + spin_unlock_irqrestore(vhost->host->host_lock, flags); + + if (list_empty(&cancelq)) { + if (vhost->log_level > IBMVFC_DEFAULT_LOG_LEVEL) + sdev_printk(KERN_INFO, sdev, "No events found to cancel\n"); + return 0; + } + + sdev_printk(KERN_INFO, sdev, "Cancelling outstanding commands.\n"); + + list_for_each_entry_safe(evt, temp, &cancelq, cancel) { + wait_for_completion(&evt->comp); + status = be16_to_cpu(evt->queue->cancel_rsp.mad_common.status); + list_del(&evt->cancel); + ibmvfc_free_event(evt); + + if (status != IBMVFC_MAD_SUCCESS) { + sdev_printk(KERN_WARNING, sdev, "Cancel failed with rc=%x\n", status); + switch (status) { + case IBMVFC_MAD_DRIVER_FAILED: + case IBMVFC_MAD_CRQ_ERROR: + /* Host adapter most likely going through reset, return success to +* the caller will wait for the command being cancelled to get returned +*/ + break; + default: + fail = 1; + break; + } + } + } + + if (fail) + return -EIO; + + sdev_printk(KERN_INFO, sdev, 
"Successfully cancelled outstanding commands\n"); + LEAVE; + return 0; +} + +static int ibmvfc_cancel_all_sq(struct scsi_device *sdev, int type) { struct ibmvfc_host *vhost = shost_priv(sdev->host); struct ibmvfc_event *evt, *found_evt; @@ -2498,6 +2562,27 @@ static int ibmvfc_cancel_all(struct scsi_device *sdev, int type) return 0; } +/** + * ibmvfc_cancel_all - Cancel all outstanding commands to the device + * @sdev: scsi device to cancel commands + * @type: type of error recovery being performed + * + * This sends a cancel to the VIOS for the specified
[PATCH v5 14/21] ibmvfc: set and track hw queue in ibmvfc_event struct
Extract the hwq id from a SCSI command and store it in the ibmvfc_event structure to identify which Sub-CRQ to send the command down when channels are being utilized. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 5 + drivers/scsi/ibmvscsi/ibmvfc.h | 1 + 2 files changed, 6 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 0653d52d4ea0..3f3cc37a263f 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -1483,6 +1483,7 @@ static void ibmvfc_init_event(struct ibmvfc_event *evt, evt->_done = done; evt->done = ibmvfc_locked_done; } + evt->hwq = 0; } /** @@ -1839,6 +1840,8 @@ static int ibmvfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd) struct ibmvfc_cmd *vfc_cmd; struct ibmvfc_fcp_cmd_iu *iu; struct ibmvfc_event *evt; + u32 tag_and_hwq = blk_mq_unique_tag(cmnd->request); + u16 hwq = blk_mq_unique_tag_to_hwq(tag_and_hwq); int rc; if (unlikely((rc = fc_remote_port_chkready(rport))) || @@ -1865,6 +1868,8 @@ static int ibmvfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd) } vfc_cmd->correlation = cpu_to_be64((u64)evt); + if (vhost->using_channels) + evt->hwq = hwq % vhost->scsi_scrqs.active_queues; if (likely(!(rc = ibmvfc_map_sg_data(cmnd, evt, vfc_cmd, vhost->dev)))) return ibmvfc_send_event(evt, vhost, 0); diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index 3d76cd3c1fd9..2dbce7071409 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -753,6 +753,7 @@ struct ibmvfc_event { struct completion comp; struct completion *eh_comp; struct timer_list timer; + u16 hwq; }; /* a pool of event structs for use */ -- 2.27.0
[PATCH v5 20/21] ibmvfc: enable MQ and set reasonable defaults
Turn on MQ by default and set sane values for the upper limit on hw queues for the scsi host, and number of hw scsi channels to request from the partner VIOS. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index c3bb83c9d8a6..0391cb746d0b 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -41,9 +41,9 @@ #define IBMVFC_DEFAULT_LOG_LEVEL 2 #define IBMVFC_MAX_CDB_LEN 16 #define IBMVFC_CLS3_ERROR 0 -#define IBMVFC_MQ 0 -#define IBMVFC_SCSI_CHANNELS 0 -#define IBMVFC_SCSI_HW_QUEUES 1 +#define IBMVFC_MQ 1 +#define IBMVFC_SCSI_CHANNELS 8 +#define IBMVFC_SCSI_HW_QUEUES 8 #define IBMVFC_MIG_NO_SUB_TO_CRQ 0 #define IBMVFC_MIG_NO_N_TO_M 0 -- 2.27.0
[PATCH v5 19/21] ibmvfc: purge scsi channels after transport loss/reset
Grab the queue and list lock for each Sub-CRQ and add any uncompleted events to the host purge list. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 16 1 file changed, 16 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 5ca8fcafd1d5..d314dffaafc4 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -1056,7 +1056,13 @@ static void ibmvfc_fail_request(struct ibmvfc_event *evt, int error_code) static void ibmvfc_purge_requests(struct ibmvfc_host *vhost, int error_code) { struct ibmvfc_event *evt, *pos; + struct ibmvfc_queue *queues = vhost->scsi_scrqs.scrqs; unsigned long flags; + int hwqs = 0; + int i; + + if (vhost->using_channels) + hwqs = vhost->scsi_scrqs.active_queues; ibmvfc_dbg(vhost, "Purging all requests\n"); spin_lock_irqsave(&vhost->crq.l_lock, flags); @@ -1064,6 +1070,16 @@ static void ibmvfc_purge_requests(struct ibmvfc_host *vhost, int error_code) ibmvfc_fail_request(evt, error_code); list_splice_init(&vhost->crq.sent, &vhost->purge); spin_unlock_irqrestore(&vhost->crq.l_lock, flags); + + for (i = 0; i < hwqs; i++) { + spin_lock_irqsave(queues[i].q_lock, flags); + spin_lock(&queues[i].l_lock); + list_for_each_entry_safe(evt, pos, &queues[i].sent, queue_list) + ibmvfc_fail_request(evt, error_code); + list_splice_init(&queues[i].sent, &vhost->purge); + spin_unlock(&queues[i].l_lock); + spin_unlock_irqrestore(queues[i].q_lock, flags); + } } /** -- 2.27.0
[PATCH v5 21/21] ibmvfc: provide modules parameters for MQ settings
Add the various module parameter toggles for adjusting the MQ characteristics at boot/load time as well as a device attribute for changing the client scsi channel request amount. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 75 ++ drivers/scsi/ibmvscsi/ibmvfc.h | 1 + 2 files changed, 67 insertions(+), 9 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index d314dffaafc4..7097028d4cb6 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -40,6 +40,12 @@ static unsigned int disc_threads = IBMVFC_MAX_DISC_THREADS; static unsigned int ibmvfc_debug = IBMVFC_DEBUG; static unsigned int log_level = IBMVFC_DEFAULT_LOG_LEVEL; static unsigned int cls3_error = IBMVFC_CLS3_ERROR; +static unsigned int mq_enabled = IBMVFC_MQ; +static unsigned int nr_scsi_hw_queues = IBMVFC_SCSI_HW_QUEUES; +static unsigned int nr_scsi_channels = IBMVFC_SCSI_CHANNELS; +static unsigned int mig_channels_only = IBMVFC_MIG_NO_SUB_TO_CRQ; +static unsigned int mig_no_less_channels = IBMVFC_MIG_NO_N_TO_M; + static LIST_HEAD(ibmvfc_head); static DEFINE_SPINLOCK(ibmvfc_driver_lock); static struct scsi_transport_template *ibmvfc_transport_template; @@ -49,6 +55,22 @@ MODULE_AUTHOR("Brian King "); MODULE_LICENSE("GPL"); MODULE_VERSION(IBMVFC_DRIVER_VERSION); +module_param_named(mq, mq_enabled, uint, S_IRUGO); +MODULE_PARM_DESC(mq, "Enable multiqueue support. " +"[Default=" __stringify(IBMVFC_MQ) "]"); +module_param_named(scsi_host_queues, nr_scsi_hw_queues, uint, S_IRUGO); +MODULE_PARM_DESC(scsi_host_queues, "Number of SCSI Host submission queues. " +"[Default=" __stringify(IBMVFC_SCSI_HW_QUEUES) "]"); +module_param_named(scsi_hw_channels, nr_scsi_channels, uint, S_IRUGO); +MODULE_PARM_DESC(scsi_hw_channels, "Number of hw scsi channels to request. 
" +"[Default=" __stringify(IBMVFC_SCSI_CHANNELS) "]"); +module_param_named(mig_channels_only, mig_channels_only, uint, S_IRUGO); +MODULE_PARM_DESC(mig_channels_only, "Prevent migration to non-channelized system. " +"[Default=" __stringify(IBMVFC_MIG_NO_SUB_TO_CRQ) "]"); +module_param_named(mig_no_less_channels, mig_no_less_channels, uint, S_IRUGO); +MODULE_PARM_DESC(mig_no_less_channels, "Prevent migration to system with less channels. " +"[Default=" __stringify(IBMVFC_MIG_NO_N_TO_M) "]"); + module_param_named(init_timeout, init_timeout, uint, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(init_timeout, "Initialization timeout in seconds. " "[Default=" __stringify(IBMVFC_INIT_TIMEOUT) "]"); @@ -926,7 +948,7 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost) crq->cur = 0; if (vhost->scsi_scrqs.scrqs) { - for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) { + for (i = 0; i < nr_scsi_hw_queues; i++) { scrq = &vhost->scsi_scrqs.scrqs[i]; spin_lock(scrq->q_lock); memset(scrq->msgs.scrq, 0, PAGE_SIZE); @@ -3397,6 +3419,37 @@ static ssize_t ibmvfc_store_log_level(struct device *dev, return strlen(buf); } +static ssize_t ibmvfc_show_scsi_channels(struct device *dev, +struct device_attribute *attr, char *buf) +{ + struct Scsi_Host *shost = class_to_shost(dev); + struct ibmvfc_host *vhost = shost_priv(shost); + unsigned long flags = 0; + int len; + + spin_lock_irqsave(shost->host_lock, flags); + len = snprintf(buf, PAGE_SIZE, "%d\n", vhost->client_scsi_channels); + spin_unlock_irqrestore(shost->host_lock, flags); + return len; +} + +static ssize_t ibmvfc_store_scsi_channels(struct device *dev, +struct device_attribute *attr, +const char *buf, size_t count) +{ + struct Scsi_Host *shost = class_to_shost(dev); + struct ibmvfc_host *vhost = shost_priv(shost); + unsigned long flags = 0; + unsigned int channels; + + spin_lock_irqsave(shost->host_lock, flags); + channels = simple_strtoul(buf, NULL, 10); + vhost->client_scsi_channels = min(channels, nr_scsi_hw_queues); + 
ibmvfc_hard_reset_host(vhost); + spin_unlock_irqrestore(shost->host_lock, flags); + return strlen(buf); +} + static DEVICE_ATTR(partition_name, S_IRUGO, ibmvfc_show_host_partition_name, NULL); static DEVICE_ATTR(device_name, S_IRUGO, ibmvfc_show_host_device_name, NULL); static DEVICE_ATTR(port_loc_code, S_IRUGO, ibmvfc_show_host_loc_code, NULL); @@ -3405,6 +3458,8 @@ static DEVICE_ATTR(npiv_version, S_IRUGO, ibmvfc_show_host_npiv_version, NULL); static DEVICE_ATTR(capabilities, S_IRUGO, ibmvfc_show_host_capabilities, NULL); static DEVICE_ATTR(log_level, S_IRUGO | S_IWUSR, ibmvfc_show_log_level, ibmvfc_store_log_level); +static DEVICE_ATTR(nr_scsi_channels,
[PATCH v5 15/21] ibmvfc: send commands down HW Sub-CRQ when channelized
When the client has negotiated the use of channels all vfcFrames are required to go down a Sub-CRQ channel or it is a protocol violation. If the adapter state is channelized submit vfcFrames to the appropriate Sub-CRQ via the h_send_sub_crq() helper. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 39 -- 1 file changed, 33 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 3f3cc37a263f..865b87881d86 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -704,6 +704,15 @@ static int ibmvfc_send_crq(struct ibmvfc_host *vhost, u64 word1, u64 word2) return plpar_hcall_norets(H_SEND_CRQ, vdev->unit_address, word1, word2); } +static int ibmvfc_send_sub_crq(struct ibmvfc_host *vhost, u64 cookie, u64 word1, + u64 word2, u64 word3, u64 word4) +{ + struct vio_dev *vdev = to_vio_dev(vhost->dev); + + return plpar_hcall_norets(H_SEND_SUB_CRQ, vdev->unit_address, cookie, + word1, word2, word3, word4); +} + /** * ibmvfc_send_crq_init - Send a CRQ init message * @vhost: ibmvfc host struct @@ -1623,8 +1632,17 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt, mb(); - if ((rc = ibmvfc_send_crq(vhost, be64_to_cpu(crq_as_u64[0]), - be64_to_cpu(crq_as_u64[1])))) { + if (evt->queue->fmt == IBMVFC_SUB_CRQ_FMT) + rc = ibmvfc_send_sub_crq(vhost, +evt->queue->vios_cookie, +be64_to_cpu(crq_as_u64[0]), +be64_to_cpu(crq_as_u64[1]), +0, 0); + else + rc = ibmvfc_send_crq(vhost, be64_to_cpu(crq_as_u64[0]), +be64_to_cpu(crq_as_u64[1])); + + if (rc) { list_del(&evt->queue_list); spin_unlock_irqrestore(&evt->queue->l_lock, flags); del_timer(&evt->timer); @@ -1842,6 +1860,7 @@ static int ibmvfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd) struct ibmvfc_event *evt; u32 tag_and_hwq = blk_mq_unique_tag(cmnd->request); u16 hwq = blk_mq_unique_tag_to_hwq(tag_and_hwq); + u16 scsi_channel; int rc; if (unlikely((rc = fc_remote_port_chkready(rport))) || @@ -1852,7 +1871,13 @@ static int ibmvfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd) } cmnd->result = (DID_OK << 16); - evt = ibmvfc_get_event(&vhost->crq); + if (vhost->using_channels) { + scsi_channel = hwq % vhost->scsi_scrqs.active_queues; + evt = ibmvfc_get_event(&vhost->scsi_scrqs.scrqs[scsi_channel]); + evt->hwq = hwq % vhost->scsi_scrqs.active_queues; + } else + evt = ibmvfc_get_event(&vhost->crq); + ibmvfc_init_event(evt, ibmvfc_scsi_done, IBMVFC_CMD_FORMAT); evt->cmnd = cmnd; @@ -1868,8 +1893,6 @@ static int ibmvfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd) } vfc_cmd->correlation = cpu_to_be64((u64)evt); - if (vhost->using_channels) - evt->hwq = hwq % vhost->scsi_scrqs.active_queues; if (likely(!(rc = ibmvfc_map_sg_data(cmnd, evt, vfc_cmd, vhost->dev)))) return ibmvfc_send_event(evt, vhost, 0); @@ -2200,7 +2223,11 @@ static int ibmvfc_reset_device(struct scsi_device *sdev, int type, char *desc) spin_lock_irqsave(vhost->host->host_lock, flags); if (vhost->state == IBMVFC_ACTIVE) { - evt = ibmvfc_get_event(&vhost->crq); + if (vhost->using_channels) + evt = ibmvfc_get_event(&vhost->scsi_scrqs.scrqs[0]); + else + evt = ibmvfc_get_event(&vhost->crq); + ibmvfc_init_event(evt, ibmvfc_sync_completion, IBMVFC_CMD_FORMAT); tmf = ibmvfc_init_vfc_cmd(evt, sdev); iu = ibmvfc_get_fcp_iu(vhost, tmf); -- 2.27.0
[PATCH v5 13/21] ibmvfc: advertise client support for using hardware channels
Previous patches have plumbed the necessary Sub-CRQ interface and channel negotiation MADs to fully channelize via hardware backed queues. Advertise client support via NPIV Login capability IBMVFC_CAN_USE_CHANNELS when the client bits have MQ enabled via vhost->mq_enabled, or when channels were already in use during a subsequent NPIV Login. The latter is required because channel support is only renegotiated after a CRQ pair is broken. Simple NPIV Logout/Logins require the client to continue to advertise the channel capability until the CRQ pair between the client is broken. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index a00f38558613..0653d52d4ea0 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -1410,6 +1410,10 @@ static void ibmvfc_set_login_info(struct ibmvfc_host *vhost) login_info->max_cmds = cpu_to_be32(max_requests + IBMVFC_NUM_INTERNAL_REQ); login_info->capabilities = cpu_to_be64(IBMVFC_CAN_MIGRATE | IBMVFC_CAN_SEND_VF_WWPN); + + if (vhost->mq_enabled || vhost->using_channels) + login_info->capabilities |= cpu_to_be64(IBMVFC_CAN_USE_CHANNELS); + login_info->async.va = cpu_to_be64(vhost->async_crq.msg_token); login_info->async.len = cpu_to_be32(async_crq->size * sizeof(*async_crq->msgs.async)); -- 2.27.0
[PATCH v5 12/21] ibmvfc: implement channel enquiry and setup commands
New NPIV_ENQUIRY_CHANNEL and NPIV_SETUP_CHANNEL management datagrams (MADs) were defined in a previous patchset. If the client advertises a desire to use channels and the partner VIOS is channel capable then the client must proceed with channel enquiry to determine the maximum number of channels the VIOS is capable of providing, and registering SubCRQs via channel setup with the VIOS immediately following NPIV Login. This handshaking should not be performed for subsequent NPIV Logins unless the CRQ connection has been reset. Implement these two new MADs and issue them following a successful NPIV login where the VIOS has set the SUPPORT_CHANNELS capability bit in the NPIV Login response. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 135 - drivers/scsi/ibmvscsi/ibmvfc.h | 3 + 2 files changed, 136 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index d3d7c6b53d4f..a00f38558613 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -909,6 +909,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost) spin_lock(vhost->crq.q_lock); vhost->state = IBMVFC_NO_CRQ; vhost->logged_in = 0; + vhost->do_enquiry = 1; + vhost->using_channels = 0; /* Clean out the queue */ memset(crq->msgs.crq, 0, PAGE_SIZE); @@ -4586,6 +4588,118 @@ static void ibmvfc_discover_targets(struct ibmvfc_host *vhost) ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD); } +static void ibmvfc_channel_setup_done(struct ibmvfc_event *evt) +{ + struct ibmvfc_host *vhost = evt->vhost; + u32 mad_status = be16_to_cpu(evt->xfer_iu->channel_setup.common.status); + int level = IBMVFC_DEFAULT_LOG_LEVEL; + + ibmvfc_free_event(evt); + + switch (mad_status) { + case IBMVFC_MAD_SUCCESS: + ibmvfc_dbg(vhost, "Channel Setup succeded\n"); + vhost->do_enquiry = 0; + break; + case IBMVFC_MAD_FAILED: + level += ibmvfc_retry_host_init(vhost); + ibmvfc_log(vhost, level, "Channel Setup failed\n"); + 
fallthrough; + case IBMVFC_MAD_DRIVER_FAILED: + return; + default: + dev_err(vhost->dev, "Invalid Channel Setup response: 0x%x\n", + mad_status); + ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD); + return; + } + + ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_QUERY); + wake_up(&vhost->work_wait_q); +} + +static void ibmvfc_channel_setup(struct ibmvfc_host *vhost) +{ + struct ibmvfc_channel_setup_mad *mad; + struct ibmvfc_channel_setup *setup_buf = vhost->channel_setup_buf; + struct ibmvfc_event *evt = ibmvfc_get_event(&vhost->crq); + + memset(setup_buf, 0, sizeof(*setup_buf)); + setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS); + + ibmvfc_init_event(evt, ibmvfc_channel_setup_done, IBMVFC_MAD_FORMAT); + mad = &evt->iu.channel_setup; + memset(mad, 0, sizeof(*mad)); + mad->common.version = cpu_to_be32(1); + mad->common.opcode = cpu_to_be32(IBMVFC_CHANNEL_SETUP); + mad->common.length = cpu_to_be16(sizeof(*mad)); + mad->buffer.va = cpu_to_be64(vhost->channel_setup_dma); + mad->buffer.len = cpu_to_be32(sizeof(*vhost->channel_setup_buf)); + + ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_INIT_WAIT); + + if (!ibmvfc_send_event(evt, vhost, default_timeout)) + ibmvfc_dbg(vhost, "Sent channel setup\n"); + else + ibmvfc_link_down(vhost, IBMVFC_LINK_DOWN); +} + +static void ibmvfc_channel_enquiry_done(struct ibmvfc_event *evt) +{ + struct ibmvfc_host *vhost = evt->vhost; + struct ibmvfc_channel_enquiry *rsp = &evt->xfer_iu->channel_enquiry; + u32 mad_status = be16_to_cpu(rsp->common.status); + int level = IBMVFC_DEFAULT_LOG_LEVEL; + + switch (mad_status) { + case IBMVFC_MAD_SUCCESS: + ibmvfc_dbg(vhost, "Channel Enquiry succeeded\n"); + vhost->max_vios_scsi_channels = be32_to_cpu(rsp->num_scsi_subq_channels); + ibmvfc_free_event(evt); + break; + case IBMVFC_MAD_FAILED: + level += ibmvfc_retry_host_init(vhost); + ibmvfc_log(vhost, level, "Channel Enquiry failed\n"); + fallthrough; + case IBMVFC_MAD_DRIVER_FAILED: + ibmvfc_free_event(evt); + return; + default: + dev_err(vhost->dev, "Invalid 
Channel Enquiry response: 0x%x\n", + mad_status); + ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD); + ibmvfc_free_event(evt); + return; + } + + ibmvfc_channel_setup(vhost); +} + +static void ibmvfc_channel_enquiry(struct ibmvfc_host *vhost) +{ + struct
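The enquiry response only records max_vios_scsi_channels; the actual channel count is decided later, when channel setup submits the smaller of what the client wants and what the VIOS reported (the driver uses the kernel's min() for this in a later patch of the series). A one-line sketch of that negotiation, with an invented helper name:

```c
#include <assert.h>

/* Hypothetical helper: the enquiry/setup handshake reduces to taking the
 * smaller of the client's requested channel count and the VIOS maximum. */
static unsigned int negotiated_channels(unsigned int client_req,
					unsigned int vios_max)
{
	return client_req < vios_max ? client_req : vios_max;
}
```

A request of zero channels degenerates to the cancel case handled by channel setup.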
[PATCH v5 16/21] ibmvfc: register Sub-CRQ handles with VIOS during channel setup
If the ibmvfc client adapter requests channels it must submit a number of Sub-CRQ handles matching the number of channels being requested. In its response the VIOS overwrites this field with the actual number of channel resources allocated, which may be less than what was requested. The client then must store the VIOS Sub-CRQ handle for each queue. This VIOS handle is needed as a parameter with h_send_sub_crq(). Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 32 +++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 865b87881d86..578e27180f10 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -4627,15 +4627,35 @@ static void ibmvfc_discover_targets(struct ibmvfc_host *vhost) static void ibmvfc_channel_setup_done(struct ibmvfc_event *evt) { struct ibmvfc_host *vhost = evt->vhost; + struct ibmvfc_channel_setup *setup = vhost->channel_setup_buf; + struct ibmvfc_scsi_channels *scrqs = &vhost->scsi_scrqs; u32 mad_status = be16_to_cpu(evt->xfer_iu->channel_setup.common.status); int level = IBMVFC_DEFAULT_LOG_LEVEL; + int flags, active_queues, i; ibmvfc_free_event(evt); switch (mad_status) { case IBMVFC_MAD_SUCCESS: ibmvfc_dbg(vhost, "Channel Setup succeeded\n"); + flags = be32_to_cpu(setup->flags); vhost->do_enquiry = 0; + active_queues = be32_to_cpu(setup->num_scsi_subq_channels); + scrqs->active_queues = active_queues; + + if (flags & IBMVFC_CHANNELS_CANCELED) { + ibmvfc_dbg(vhost, "Channels Canceled\n"); + vhost->using_channels = 0; + } else { + if (active_queues) + vhost->using_channels = 1; + for (i = 0; i < active_queues; i++) + scrqs->scrqs[i].vios_cookie = + be64_to_cpu(setup->channel_handles[i]); + + ibmvfc_dbg(vhost, "Using %u channels\n", + vhost->scsi_scrqs.active_queues); + } break; case IBMVFC_MAD_FAILED: level += ibmvfc_retry_host_init(vhost); @@ -4659,9 +4679,19 @@ static void ibmvfc_channel_setup(struct ibmvfc_host 
*vhost) struct ibmvfc_channel_setup_mad *mad; struct ibmvfc_channel_setup *setup_buf = vhost->channel_setup_buf; struct ibmvfc_event *evt = ibmvfc_get_event(&vhost->crq); + struct ibmvfc_scsi_channels *scrqs = &vhost->scsi_scrqs; + unsigned int num_channels = + min(vhost->client_scsi_channels, vhost->max_vios_scsi_channels); + int i; memset(setup_buf, 0, sizeof(*setup_buf)); - setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS); + if (num_channels == 0) + setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS); + else { + setup_buf->num_scsi_subq_channels = cpu_to_be32(num_channels); + for (i = 0; i < num_channels; i++) + setup_buf->channel_handles[i] = cpu_to_be64(scrqs->scrqs[i].cookie); + } ibmvfc_init_event(evt, ibmvfc_channel_setup_done, IBMVFC_MAD_FORMAT); mad = &evt->iu.channel_setup; -- 2.27.0
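The handle exchange above is an endianness contract: cookies live in CPU byte order on the client, but cross the MAD buffer as big-endian (cpu_to_be64() on the way out, be64_to_cpu() on the way back). A minimal sketch of the 64-bit swap — this is what those conversions compile to on a little-endian host; the helper name here is made up, and the swap is its own inverse:

```c
#include <assert.h>
#include <stdint.h>

/* Unconditional 64-bit byte reversal: equivalent to cpu_to_be64() (and
 * be64_to_cpu(), its inverse) on a little-endian CPU. */
static uint64_t swap_be64(uint64_t v)
{
	uint64_t r = 0;
	int i;

	/* Peel bytes off v least-significant first; each shift of r pushes
	 * them back most-significant first, reversing the order. */
	for (i = 0; i < 8; i++)
		r = (r << 8) | ((v >> (8 * i)) & 0xff);
	return r;
}
```

On a big-endian POWER host the kernel's cpu_to_be64() is instead a no-op, which is why the driver never open-codes the swap.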
[PATCH v5 11/21] ibmvfc: map/request irq and register Sub-CRQ interrupt handler
Create an irq mapping for the hw_irq number provided from phyp firmware. Request an irq and assign it our Sub-CRQ interrupt handler. Unmap these irqs at Sub-CRQ teardown. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 25 + 1 file changed, 25 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 51bcafad9490..d3d7c6b53d4f 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -5292,12 +5292,34 @@ static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost, goto reg_failed; } + scrq->irq = irq_create_mapping(NULL, scrq->hw_irq); + + if (!scrq->irq) { + rc = -EINVAL; + dev_err(dev, "Error mapping sub-crq[%d] irq\n", index); + goto irq_failed; + } + + snprintf(scrq->name, sizeof(scrq->name), "ibmvfc-%x-scsi%d", +vdev->unit_address, index); + rc = request_irq(scrq->irq, ibmvfc_interrupt_scsi, 0, scrq->name, scrq); + + if (rc) { + dev_err(dev, "Couldn't register sub-crq[%d] irq\n", index); + irq_dispose_mapping(scrq->irq); + goto irq_failed; + } + scrq->hwq_id = index; scrq->vhost = vhost; LEAVE; return 0; +irq_failed: + do { + rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, scrq->cookie); + } while (rc == H_BUSY || H_IS_LONG_BUSY(rc)); reg_failed: ibmvfc_free_queue(vhost, scrq); LEAVE; @@ -5313,6 +5335,9 @@ static void ibmvfc_deregister_scsi_channel(struct ibmvfc_host *vhost, int index) ENTER; + free_irq(scrq->irq, scrq); + irq_dispose_mapping(scrq->irq); + do { rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, scrq->cookie); -- 2.27.0
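The error handling added here follows the kernel's goto-unwind idiom: each failure label releases exactly what was acquired before the failing step (sub-CRQ registration, then the irq mapping, then the irq itself), in reverse order. A toy model of that ordering — every name below is invented, and the bitmask simply stands in for held resources:

```c
#include <assert.h>

/* Resources acquired, in order, by the registration path. */
enum { REG = 1, MAP = 2, IRQ = 4 };

/* fail_at: 0 = success, 1 = irq mapping fails, 2 = request_irq fails.
 * Returns the bitmask of resources still held on exit; any failure must
 * leave it at 0, success leaves all three held. */
static int register_channel(int fail_at)
{
	int held = 0;

	held |= REG;			/* h_reg_sub_crq() succeeded */
	if (fail_at == 1)
		goto irq_failed;	/* irq_create_mapping() failed */
	held |= MAP;
	if (fail_at == 2) {
		held &= ~MAP;		/* inline irq_dispose_mapping() */
		goto irq_failed;	/* request_irq() failed */
	}
	held |= IRQ;
	return held;			/* success: REG | MAP | IRQ */

irq_failed:
	held &= ~REG;			/* H_FREE_SUB_CRQ retry loop + free_queue */
	return held;
}
```

The deregistration hunk mirrors the same order in reverse: free_irq(), then irq_dispose_mapping(), then H_FREE_SUB_CRQ.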
[PATCH v5 03/21] ibmvfc: init/free event pool during queue allocation/free
The event pool and CRQ used to be separate entities of the adapter host structure and as such were allocated and freed independently of each other. Recent work has defined a generic queue structure with an event pool specific to each queue. As such the event pool for each queue shouldn't be allocated/freed independently, but instead performed as part of the queue allocation/free routines. Move the calls to ibmvfc_event_pool_{init|free} into ibmvfc_{alloc|free}_queue respectively. The only functional change here is that the CRQ cannot be released in ibmvfc_remove until after the event pool has been successfully purged, since releasing the queue will also free the event pool. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index cd9273a5fadb..9330f5a65a7e 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -807,6 +807,8 @@ static void ibmvfc_free_queue(struct ibmvfc_host *vhost, dma_unmap_single(dev, queue->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL); free_page((unsigned long)queue->msgs.handle); queue->msgs.handle = NULL; + + ibmvfc_free_event_pool(vhost, queue); } /** @@ -5019,6 +5021,10 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost, switch (fmt) { case IBMVFC_CRQ_FMT: fmt_size = sizeof(*queue->msgs.crq); + if (ibmvfc_init_event_pool(vhost, queue)) { + dev_err(dev, "Couldn't initialize event pool.\n"); + return -ENOMEM; + } break; case IBMVFC_ASYNC_FMT: fmt_size = sizeof(*queue->msgs.async); @@ -5333,13 +5339,8 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) goto kill_kthread; } - if ((rc = ibmvfc_init_event_pool(vhost, &vhost->crq))) { - dev_err(dev, "Couldn't initialize event pool. rc=%d\n", rc); - goto release_crq; - } - if ((rc = scsi_add_host(shost, dev))) - goto release_event_pool; + goto release_crq; fc_host_dev_loss_tmo(shost) = IBMVFC_DEV_LOSS_TMO; @@ -5362,8 +5363,6 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) remove_shost: scsi_remove_host(shost); -release_event_pool: - ibmvfc_free_event_pool(vhost, &vhost->crq); release_crq: ibmvfc_release_crq_queue(vhost); kill_kthread: @@ -5398,7 +5397,6 @@ static int ibmvfc_remove(struct vio_dev *vdev) spin_unlock_irqrestore(vhost->host->host_lock, flags); ibmvfc_wait_while_resetting(vhost); - ibmvfc_release_crq_queue(vhost); kthread_stop(vhost->work_thread); fc_remove_host(vhost->host); scsi_remove_host(vhost->host); @@ -5408,7 +5406,7 @@ static int ibmvfc_remove(struct vio_dev *vdev) list_splice_init(&vhost->purge, &purge); spin_unlock_irqrestore(vhost->host->host_lock, flags); ibmvfc_complete_purge(&purge); - ibmvfc_free_event_pool(vhost, &vhost->crq); + ibmvfc_release_crq_queue(vhost); ibmvfc_free_mem(vhost); spin_lock(&ibmvfc_driver_lock); -- 2.27.0
[PATCH v5 09/21] ibmvfc: add handlers to drain and complete Sub-CRQ responses
The logic for iterating over the Sub-CRQ responses is similar to that of the primary CRQ. Add the necessary handlers for processing those responses. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 86 ++ 1 file changed, 86 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 5d7ada0ed0d6..f3cd092478ee 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3485,6 +3485,92 @@ static int ibmvfc_toggle_scrq_irq(struct ibmvfc_queue *scrq, int enable) return rc; } +static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost, + struct list_head *evt_doneq) +{ + struct ibmvfc_event *evt = (struct ibmvfc_event *)be64_to_cpu(crq->ioba); + + switch (crq->valid) { + case IBMVFC_CRQ_CMD_RSP: + break; + case IBMVFC_CRQ_XPORT_EVENT: + return; + default: + dev_err(vhost->dev, "Got an invalid message type 0x%02x\n", crq->valid); + return; + } + + /* The only kind of payload CRQs we should get are responses to +* things we send. Make sure this response is to something we +* actually sent +*/ + if (unlikely(!ibmvfc_valid_event(&evt->queue->evt_pool, evt))) { + dev_err(vhost->dev, "Returned correlation_token 0x%08llx is invalid!\n", + crq->ioba); + return; + } + + if (unlikely(atomic_read(&evt->free))) { + dev_err(vhost->dev, "Received duplicate correlation_token 0x%08llx!\n", + crq->ioba); + return; + } + + spin_lock(&evt->queue->l_lock); + list_move_tail(&evt->queue_list, evt_doneq); + spin_unlock(&evt->queue->l_lock); +} + +static struct ibmvfc_crq *ibmvfc_next_scrq(struct ibmvfc_queue *scrq) +{ + struct ibmvfc_crq *crq; + + crq = &scrq->msgs.scrq[scrq->cur].crq; + if (crq->valid & 0x80) { + if (++scrq->cur == scrq->size) + scrq->cur = 0; + rmb(); + } else + crq = NULL; + + return crq; +} + +static void ibmvfc_drain_sub_crq(struct ibmvfc_queue *scrq) +{ + struct ibmvfc_crq *crq; + struct ibmvfc_event *evt, *temp; + unsigned long flags; + int done = 0; + LIST_HEAD(evt_doneq); + + spin_lock_irqsave(scrq->q_lock, flags); + while (!done) { + while ((crq = ibmvfc_next_scrq(scrq)) != NULL) { + ibmvfc_handle_scrq(crq, scrq->vhost, &evt_doneq); + crq->valid = 0; + wmb(); + } + + ibmvfc_toggle_scrq_irq(scrq, 1); + if ((crq = ibmvfc_next_scrq(scrq)) != NULL) { + ibmvfc_toggle_scrq_irq(scrq, 0); + ibmvfc_handle_scrq(crq, scrq->vhost, &evt_doneq); + crq->valid = 0; + wmb(); + } else + done = 1; + } + spin_unlock_irqrestore(scrq->q_lock, flags); + + list_for_each_entry_safe(evt, temp, &evt_doneq, queue_list) { + del_timer(&evt->timer); + list_del(&evt->queue_list); + ibmvfc_trc_end(evt); + evt->done(evt); + } +} + /** * ibmvfc_init_tgt - Set the next init job step for the target * @tgt: ibmvfc target struct -- 2.27.0
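ibmvfc_next_scrq() is the classic valid-bit circular-buffer walk: consume while the current slot's 0x80 bit is set, re-enable the interrupt, then poll one more time to close the race against firmware posting a response just before the re-enable. A host-side toy model of the cursor logic — all names are invented and there are no kernel dependencies, so the memory barriers and locking are omitted:

```c
#include <assert.h>
#include <stddef.h>

struct crq_elem {
	unsigned char valid;	/* firmware sets 0x80 when the slot holds a response */
};

struct toy_scrq {
	struct crq_elem msgs[4];
	size_t cur;		/* consumer cursor */
	size_t size;
};

/* Mirrors ibmvfc_next_scrq(): return the current element if its valid
 * bit is set, advancing and wrapping the cursor; otherwise NULL. */
static struct crq_elem *next_scrq(struct toy_scrq *q)
{
	struct crq_elem *crq = &q->msgs[q->cur];

	if (crq->valid & 0x80) {
		if (++q->cur == q->size)
			q->cur = 0;
		return crq;
	}
	return NULL;
}

/* Mirrors the inner loop of ibmvfc_drain_sub_crq(): consume every valid
 * element, handing each slot back by clearing its valid bit. */
static size_t drain(struct toy_scrq *q)
{
	struct crq_elem *crq;
	size_t n = 0;

	while ((crq = next_scrq(q)) != NULL) {
		crq->valid = 0;
		n++;
	}
	return n;
}

/* Build a 4-slot queue with the first nvalid slots valid, then drain it. */
static size_t drain_n_valid(size_t nvalid)
{
	struct toy_scrq q = { .size = 4 };
	size_t i;

	for (i = 0; i < nvalid && i < 4; i++)
		q.msgs[i].valid = 0x80;
	return drain(&q);
}
```

In the real driver the rmb()/wmb() pairs order the valid-bit read against the payload read and the payload clear against the valid-bit clear, which the toy model does not need.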
[PATCH v5 05/21] ibmvfc: define hcall wrapper for registering a Sub-CRQ
Sub-CRQs are registered with firmware via a hypercall. Abstract that interface into a simpler helper function. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 524e81164d70..612c7f3d7bd3 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -138,6 +138,20 @@ static void ibmvfc_tgt_move_login(struct ibmvfc_target *); static const char *unknown_error = "unknown error"; +static long h_reg_sub_crq(unsigned long unit_address, unsigned long ioba, + unsigned long length, unsigned long *cookie, + unsigned long *irq) +{ + unsigned long retbuf[PLPAR_HCALL_BUFSIZE]; + long rc; + + rc = plpar_hcall(H_REG_SUB_CRQ, retbuf, unit_address, ioba, length); + *cookie = retbuf[0]; + *irq = retbuf[1]; + + return rc; +} + static int ibmvfc_check_caps(struct ibmvfc_host *vhost, unsigned long cap_flags) { u64 host_caps = be64_to_cpu(vhost->login_buf->resp.capabilities); -- 2.27.0
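The wrapper's job is purely ergonomic: plpar_hcall() returns its outputs through a fixed-size buffer, and h_reg_sub_crq() unpacks the two the caller cares about (the sub-CRQ cookie and the hardware irq number) into out-parameters. A standalone sketch of that shape, with the hypercall replaced by a fake that fills in fixed values (all values and the fake's name are invented):

```c
#include <assert.h>

#define PLPAR_HCALL_BUFSIZE 4	/* size of the real hypercall return buffer */

/* Stand-in for plpar_hcall(): real firmware fills retbuf with outputs;
 * this fake returns fixed values so the wrapper can be exercised. */
static long fake_plpar_hcall(unsigned long retbuf[PLPAR_HCALL_BUFSIZE])
{
	retbuf[0] = 0x1234;	/* sub-CRQ cookie */
	retbuf[1] = 42;		/* hardware irq number */
	return 0;		/* H_SUCCESS */
}

/* Mirrors the shape of h_reg_sub_crq(): make the call, then unpack the
 * two outputs the caller cares about from the return buffer. */
static long reg_sub_crq(unsigned long *cookie, unsigned long *irq)
{
	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
	long rc = fake_plpar_hcall(retbuf);

	*cookie = retbuf[0];
	*irq = retbuf[1];
	return rc;
}
```

Callers then check only the long return code, exactly as the later registration patch does with h_reg_sub_crq().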
[PATCH v5 08/21] ibmvfc: add Sub-CRQ IRQ enable/disable routine
Each Sub-CRQ has its own interrupt. A hypercall is required to toggle the IRQ state. Provide the necessary mechanism via a helper function. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 20 1 file changed, 20 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index a198e118887d..5d7ada0ed0d6 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3465,6 +3465,26 @@ static void ibmvfc_tasklet(void *data) } } +static int ibmvfc_toggle_scrq_irq(struct ibmvfc_queue *scrq, int enable) +{ + struct device *dev = scrq->vhost->dev; + struct vio_dev *vdev = to_vio_dev(dev); + unsigned long rc; + int irq_action = H_ENABLE_VIO_INTERRUPT; + + if (!enable) + irq_action = H_DISABLE_VIO_INTERRUPT; + + rc = plpar_hcall_norets(H_VIOCTL, vdev->unit_address, irq_action, + scrq->hw_irq, 0, 0); + + if (rc) + dev_err(dev, "Couldn't %s sub-crq[%lu] irq. rc=%ld\n", + enable ? "enable" : "disable", scrq->hwq_id, rc); + + return rc; +} + /** * ibmvfc_init_tgt - Set the next init job step for the target * @tgt: ibmvfc target struct -- 2.27.0
[PATCH v5 07/21] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels
Allocate a set of Sub-CRQs in advance. During channel setup the client and VIOS negotiate the number of queues the VIOS supports and the number that the client desires to request. It's possible that the final number of channel resources allocated is less than requested, but the client is still responsible for sending handles for every queue it is hoping for. Also, provide deallocation cleanup routines. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 125 + drivers/scsi/ibmvscsi/ibmvfc.h | 1 + 2 files changed, 126 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 612c7f3d7bd3..a198e118887d 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -895,6 +895,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost) unsigned long flags; struct vio_dev *vdev = to_vio_dev(vhost->dev); struct ibmvfc_queue *crq = &vhost->crq; + struct ibmvfc_queue *scrq; + int i; /* Close the CRQ */ do { @@ -912,6 +914,16 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost) memset(crq->msgs.crq, 0, PAGE_SIZE); crq->cur = 0; + if (vhost->scsi_scrqs.scrqs) { + for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) { + scrq = &vhost->scsi_scrqs.scrqs[i]; + spin_lock(scrq->q_lock); + memset(scrq->msgs.scrq, 0, PAGE_SIZE); + scrq->cur = 0; + spin_unlock(scrq->q_lock); + } + } + /* And re-open it again */ rc = plpar_hcall_norets(H_REG_CRQ, vdev->unit_address, crq->msg_token, PAGE_SIZE); @@ -5045,6 +5057,11 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost, case IBMVFC_ASYNC_FMT: fmt_size = sizeof(*queue->msgs.async); break; + case IBMVFC_SUB_CRQ_FMT: + fmt_size = sizeof(*queue->msgs.scrq); + /* We need one extra event for Cancel Commands */ + pool_size = max_requests + 1; + break; default: dev_warn(dev, "Unknown command/response queue message format: %d\n", fmt); return -EINVAL; @@ -5136,6 +5153,107 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost) return retrc; } +static int 
ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost, + int index) +{ + struct device *dev = vhost->dev; + struct vio_dev *vdev = to_vio_dev(dev); + struct ibmvfc_queue *scrq = &vhost->scsi_scrqs.scrqs[index]; + int rc = -ENOMEM; + + ENTER; + + if (ibmvfc_alloc_queue(vhost, scrq, IBMVFC_SUB_CRQ_FMT)) + return -ENOMEM; + + rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE, + &scrq->cookie, &scrq->hw_irq); + + if (rc) { + dev_warn(dev, "Error registering sub-crq: %d\n", rc); + if (rc == H_PARAMETER) + dev_warn_once(dev, "Firmware may not support MQ\n"); + goto reg_failed; + } + + scrq->hwq_id = index; + scrq->vhost = vhost; + + LEAVE; + return 0; + +reg_failed: + ibmvfc_free_queue(vhost, scrq); + LEAVE; + return rc; +} + +static void ibmvfc_deregister_scsi_channel(struct ibmvfc_host *vhost, int index) +{ + struct device *dev = vhost->dev; + struct vio_dev *vdev = to_vio_dev(dev); + struct ibmvfc_queue *scrq = &vhost->scsi_scrqs.scrqs[index]; + long rc; + + ENTER; + + do { + rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, + scrq->cookie); + } while (rc == H_BUSY || H_IS_LONG_BUSY(rc)); + + if (rc) + dev_err(dev, "Failed to free sub-crq[%d]: rc=%ld\n", index, rc); + + ibmvfc_free_queue(vhost, scrq); + LEAVE; +} + +static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost) +{ + int i, j; + + ENTER; + + vhost->scsi_scrqs.scrqs = kcalloc(IBMVFC_SCSI_HW_QUEUES, + sizeof(*vhost->scsi_scrqs.scrqs), + GFP_KERNEL); + if (!vhost->scsi_scrqs.scrqs) + return -1; + + for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) { + if (ibmvfc_register_scsi_channel(vhost, i)) { + for (j = i; j > 0; j--) + ibmvfc_deregister_scsi_channel(vhost, j - 1); + kfree(vhost->scsi_scrqs.scrqs); + vhost->scsi_scrqs.scrqs = NULL; + vhost->scsi_scrqs.active_queues = 0; + LEAVE; + return -1; + } + } + + LEAVE; + return 0; +} + +static void ibmvfc_release_sub_crqs(struct
[PATCH v5 04/21] ibmvfc: add size parameter to ibmvfc_init_event_pool
With the upcoming addition of Sub-CRQs the event pool size may vary per-queue. Add a size parameter to ibmvfc_init_event_pool such that different size event pools can be requested by ibmvfc_alloc_queue. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 9330f5a65a7e..524e81164d70 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -723,19 +723,23 @@ static int ibmvfc_send_crq_init_complete(struct ibmvfc_host *vhost) * Returns zero on success. **/ static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost, - struct ibmvfc_queue *queue) + struct ibmvfc_queue *queue, + unsigned int size) { int i; struct ibmvfc_event_pool *pool = &queue->evt_pool; ENTER; - pool->size = max_requests + IBMVFC_NUM_INTERNAL_REQ; - pool->events = kcalloc(pool->size, sizeof(*pool->events), GFP_KERNEL); + if (!size) + return 0; + + pool->size = size; + pool->events = kcalloc(size, sizeof(*pool->events), GFP_KERNEL); if (!pool->events) return -ENOMEM; pool->iu_storage = dma_alloc_coherent(vhost->dev, - pool->size * sizeof(*pool->iu_storage), + size * sizeof(*pool->iu_storage), &pool->iu_token, 0); if (!pool->iu_storage) { @@ -747,7 +751,7 @@ static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost, INIT_LIST_HEAD(&queue->free); spin_lock_init(&queue->l_lock); - for (i = 0; i < pool->size; ++i) { + for (i = 0; i < size; ++i) { struct ibmvfc_event *evt = &pool->events[i]; atomic_set(&evt->free, 1); @@ -5013,6 +5017,7 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost, { struct device *dev = vhost->dev; size_t fmt_size; + unsigned int pool_size = 0; ENTER; spin_lock_init(&queue->_lock); @@ -5021,10 +5026,7 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost, switch (fmt) { case IBMVFC_CRQ_FMT: fmt_size = sizeof(*queue->msgs.crq); - if (ibmvfc_init_event_pool(vhost, queue)) { - dev_err(dev, "Couldn't initialize event pool.\n"); - return -ENOMEM; - } + pool_size = max_requests + IBMVFC_NUM_INTERNAL_REQ; break; case IBMVFC_ASYNC_FMT: fmt_size = sizeof(*queue->msgs.async); @@ -5034,6 +5036,11 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost, return -EINVAL; } + if (ibmvfc_init_event_pool(vhost, queue, pool_size)) { + dev_err(dev, "Couldn't initialize event pool.\n"); + return -ENOMEM; + } + queue->msgs.handle = (void *)get_zeroed_page(GFP_KERNEL); if (!queue->msgs.handle) return -ENOMEM; -- 2.27.0
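The per-format sizing this patch introduces can be sketched as a pure function: the primary CRQ gets a full pool, the async queue gets none (a size of zero makes the init a no-op), and a later patch in the series gives Sub-CRQs max_requests plus one extra event for Cancel Commands. The constants below are illustrative stand-ins, not the driver's values:

```c
#include <assert.h>

enum msg_fmt { CRQ_FMT, ASYNC_FMT, SUB_CRQ_FMT };

#define MAX_REQUESTS		100	/* stands in for the max_requests module parameter */
#define NUM_INTERNAL_REQ	4	/* illustrative; the driver's constant differs */

/* Mirrors the sizing decision moved into ibmvfc_alloc_queue(). */
static unsigned int pool_size_for(enum msg_fmt fmt)
{
	switch (fmt) {
	case CRQ_FMT:
		return MAX_REQUESTS + NUM_INTERNAL_REQ;
	case SUB_CRQ_FMT:
		return MAX_REQUESTS + 1;	/* one extra event for Cancel */
	case ASYNC_FMT:
	default:
		return 0;			/* no event pool at all */
	}
}
```

Centralizing the size here is what lets a single ibmvfc_init_event_pool(vhost, queue, pool_size) call serve every queue type.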
[PATCH v5 06/21] ibmvfc: add Subordinate CRQ definitions
Subordinate Command Response Queues (Sub-CRQs) are used in conjunction with the primary CRQ when more than one queue is needed by the virtual IO adapter. Recent phyp firmware versions support Sub-CRQs with ibmvfc adapters. This feature is a prerequisite for supporting multiple hardware-backed submission queues in the vfc adapter. The Sub-CRQ command element differs from the standard CRQ in that it is 32 bytes long, as opposed to 16 bytes for the latter. Despite the extra 16 bytes, the ibmvfc protocol will initially use the original CRQ command element mapped to the first 16 bytes of the Sub-CRQ element. Add definitions for the Sub-CRQ command element and queue. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.h | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index dd6d89292867..b9eed05c165f 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -650,6 +650,11 @@ struct ibmvfc_crq { volatile __be64 ioba; } __packed __aligned(8); +struct ibmvfc_sub_crq { + struct ibmvfc_crq crq; + __be64 reserved[2]; +} __packed __aligned(8); + enum ibmvfc_ae_link_state { IBMVFC_AE_LS_LINK_UP = 0x01, IBMVFC_AE_LS_LINK_BOUNCED = 0x02, @@ -761,12 +766,14 @@ struct ibmvfc_event_pool { enum ibmvfc_msg_fmt { IBMVFC_CRQ_FMT = 0, IBMVFC_ASYNC_FMT, + IBMVFC_SUB_CRQ_FMT, }; union ibmvfc_msgs { void *handle; struct ibmvfc_crq *crq; struct ibmvfc_async_crq *async; + struct ibmvfc_sub_crq *scrq; }; struct ibmvfc_queue { @@ -781,6 +788,20 @@ struct ibmvfc_queue { struct list_head sent; struct list_head free; spinlock_t l_lock; + + /* Sub-CRQ fields */ + struct ibmvfc_host *vhost; + unsigned long cookie; + unsigned long vios_cookie; + unsigned long hw_irq; + unsigned long irq; + unsigned long hwq_id; + char name[32]; +}; + +struct ibmvfc_scsi_channels { + struct ibmvfc_queue *scrqs; + unsigned int active_queues; }; enum ibmvfc_host_action { -- 2.27.0
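The 16-byte vs 32-byte claim above can be checked with a standalone layout sketch. The leading fields of ibmvfc_crq are not shown in this hunk, so they are collapsed into an 8-byte pad here, and plain uint64_t stands in for __be64; only the overall sizes are being verified:

```c
#include <assert.h>
#include <stdint.h>

/* 16 bytes: the original CRQ element (leading fields collapsed to a pad). */
struct crq_elem {
	uint8_t pad[8];		/* valid/format/reserved fields */
	uint64_t ioba;		/* __be64 in the driver */
} __attribute__((packed, aligned(8)));

/* 32 bytes: the Sub-CRQ element; the original element maps onto its
 * first 16 bytes, exactly as the commit message describes. */
struct sub_crq_elem {
	struct crq_elem crq;
	uint64_t reserved[2];
} __attribute__((packed, aligned(8)));
```

Embedding the old element at offset zero is what lets the protocol keep using the existing CRQ parsing on Sub-CRQ responses for now.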
[PATCH v5 01/21] ibmvfc: add vhost fields and defaults for MQ enablement
Introduce several new vhost fields for managing MQ state of the adapter as well as initial defaults for MQ enablement. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 8 drivers/scsi/ibmvscsi/ibmvfc.h | 9 + 2 files changed, 17 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index ba95438a8912..9200fe49c57e 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = { .max_sectors = IBMVFC_MAX_SECTORS, .shost_attrs = ibmvfc_attrs, .track_queue_depth = 1, + .host_tagset = 1, }; /** @@ -5290,6 +5291,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) shost->max_sectors = IBMVFC_MAX_SECTORS; shost->max_cmd_len = IBMVFC_MAX_CDB_LEN; shost->unique_id = shost->host_no; + shost->nr_hw_queues = IBMVFC_MQ ? IBMVFC_SCSI_HW_QUEUES : 1; vhost = shost_priv(shost); INIT_LIST_HEAD(&vhost->targets); @@ -5300,6 +5302,12 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) vhost->partition_number = -1; vhost->log_level = log_level; vhost->task_set = 1; + + vhost->mq_enabled = IBMVFC_MQ; + vhost->client_scsi_channels = IBMVFC_SCSI_CHANNELS; + vhost->using_channels = 0; + vhost->do_enquiry = 1; + strcpy(vhost->partition_name, "UNKNOWN"); init_waitqueue_head(&vhost->work_wait_q); init_waitqueue_head(&vhost->init_wait_q); diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index 632e977449c5..dd6d89292867 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -41,6 +41,11 @@ #define IBMVFC_DEFAULT_LOG_LEVEL 2 #define IBMVFC_MAX_CDB_LEN 16 #define IBMVFC_CLS3_ERROR 0 +#define IBMVFC_MQ 0 +#define IBMVFC_SCSI_CHANNELS 0 +#define IBMVFC_SCSI_HW_QUEUES 1 +#define IBMVFC_MIG_NO_SUB_TO_CRQ 0 +#define IBMVFC_MIG_NO_N_TO_M 0 /* * Ensure we have resources for ERP and initialization: @@ -840,6 +845,10 @@ struct ibmvfc_host { int delay_init; int scan_complete; int logged_in; + int mq_enabled; + int using_channels; + int do_enquiry; + int client_scsi_channels; int aborting_passthru; int events_to_log; #define IBMVFC_AE_LINKUP 0x0001 -- 2.27.0
[PATCH v5 00/21] ibmvfc: initial MQ development/enablement
Recent updates in pHyp Firmware and VIOS releases provide new infrastructure towards enabling Subordinate Command Response Queues (Sub-CRQs) such that each Sub-CRQ is a channel backed by an actual hardware queue in the FC stack on the partner VIOS. Sub-CRQs are registered with the firmware via hypercalls and then negotiated with the VIOS via new Management Datagrams (MADs) for channel setup. This initial implementation adds the necessary Sub-CRQ framework and implements the new MADs for negotiating and assigning a set of Sub-CRQs to associated VIOS HW backed channels. This latest series is completely rebased and reimplemented on top of the recent ("ibmvfc: MQ prepartory locking work") series. [1] [1] https://lore.kernel.org/linux-scsi/20210106201835.1053593-1-tyr...@linux.ibm.com/ changes in v5: * Addressed comments from brking in following patches: * Patch 18: Drop queue lock in loop after sending cancel event Remove cancel event from list after completion Return -EIO on unknown failure * Patch 21: Removed can_queue rebase artifact and range check user supplied nr_scsi_hw_queue value changes in v4: * Series rebased and reworked on top of previous ibmvfc locking series * Dropped all previous Reviewed-by tags changes in v3: * Patch 4: changed firmware support logging to dev_warn_once * Patch 6: adjusted locking * Patch 15: dropped logging verbosity, moved cancel event tracking into subqueue * Patch 17: removed write permission for migration module parameters drive hard reset after update to num of scsi channels changes in v2: * Patch 4: NULL'd scsi_scrq reference after deallocation * Patch 6: Added switch case to handle XPORT event * Patch 9: fixed ibmvfc_event leak and double free * added support for cancel command with MQ * added parameter toggles for MQ settings Tyrel Datwyler (21): ibmvfc: add vhost fields and defaults for MQ enablement ibmvfc: move event pool init/free routines ibmvfc: init/free event pool during queue allocation/free ibmvfc: add size parameter 
to ibmvfc_init_event_pool ibmvfc: define hcall wrapper for registering a Sub-CRQ ibmvfc: add Subordinate CRQ definitions ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels ibmvfc: add Sub-CRQ IRQ enable/disable routine ibmvfc: add handlers to drain and complete Sub-CRQ responses ibmvfc: define Sub-CRQ interrupt handler routine ibmvfc: map/request irq and register Sub-CRQ interrupt handler ibmvfc: implement channel enquiry and setup commands ibmvfc: advertise client support for using hardware channels ibmvfc: set and track hw queue in ibmvfc_event struct ibmvfc: send commands down HW Sub-CRQ when channelized ibmvfc: register Sub-CRQ handles with VIOS during channel setup ibmvfc: add cancel mad initialization helper ibmvfc: send Cancel MAD down each hw scsi channel ibmvfc: purge scsi channels after transport loss/reset ibmvfc: enable MQ and set reasonable defaults ibmvfc: provide modules parameters for MQ settings drivers/scsi/ibmvscsi/ibmvfc.c | 917 - drivers/scsi/ibmvscsi/ibmvfc.h | 39 ++ 2 files changed, 828 insertions(+), 128 deletions(-) -- 2.27.0
[PATCH v5 02/21] ibmvfc: move event pool init/free routines
The next patch in this series reworks the event pool allocation calls to happen within the individual queue allocation routines instead of as independent calls. Move the init/free routines earlier in ibmvfc.c to prevent undefined reference errors when calling these functions from the queue allocation code. No functional change. Signed-off-by: Tyrel Datwyler Reviewed-by: Brian King --- drivers/scsi/ibmvscsi/ibmvfc.c | 151 + 1 file changed, 76 insertions(+), 75 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 9200fe49c57e..cd9273a5fadb 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -716,6 +716,82 @@ static int ibmvfc_send_crq_init_complete(struct ibmvfc_host *vhost) return ibmvfc_send_crq(vhost, 0xC002LL, 0); } +/** + * ibmvfc_init_event_pool - Allocates and initializes the event pool for a host + * @vhost: ibmvfc host who owns the event pool + * + * Returns zero on success. + **/ +static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost, + struct ibmvfc_queue *queue) +{ + int i; + struct ibmvfc_event_pool *pool = &queue->evt_pool; + + ENTER; + pool->size = max_requests + IBMVFC_NUM_INTERNAL_REQ; + pool->events = kcalloc(pool->size, sizeof(*pool->events), GFP_KERNEL); + if (!pool->events) + return -ENOMEM; + + pool->iu_storage = dma_alloc_coherent(vhost->dev, + pool->size * sizeof(*pool->iu_storage), + &pool->iu_token, 0); + + if (!pool->iu_storage) { + kfree(pool->events); + return -ENOMEM; + } + + INIT_LIST_HEAD(&queue->sent); + INIT_LIST_HEAD(&queue->free); + spin_lock_init(&queue->l_lock); + + for (i = 0; i < pool->size; ++i) { + struct ibmvfc_event *evt = &pool->events[i]; + + atomic_set(&evt->free, 1); + evt->crq.valid = 0x80; + evt->crq.ioba = cpu_to_be64(pool->iu_token + (sizeof(*evt->xfer_iu) * i)); + evt->xfer_iu = pool->iu_storage + i; + evt->vhost = vhost; + evt->queue = queue; + evt->ext_list = NULL; + list_add_tail(&evt->queue_list, &queue->free); + } + + LEAVE; + return 0; +} + +/** + * ibmvfc_free_event_pool - Frees memory of the event pool of a host + * @vhost: ibmvfc host who owns the event pool + * + **/ +static void ibmvfc_free_event_pool(struct ibmvfc_host *vhost, + struct ibmvfc_queue *queue) +{ + int i; + struct ibmvfc_event_pool *pool = &queue->evt_pool; + + ENTER; + for (i = 0; i < pool->size; ++i) { + list_del(&pool->events[i].queue_list); + BUG_ON(atomic_read(&pool->events[i].free) != 1); + if (pool->events[i].ext_list) + dma_pool_free(vhost->sg_pool, + pool->events[i].ext_list, + pool->events[i].ext_list_token); + } + + kfree(pool->events); + dma_free_coherent(vhost->dev, + pool->size * sizeof(*pool->iu_storage), + pool->iu_storage, pool->iu_token); + LEAVE; +} + /** * ibmvfc_free_queue - Deallocate queue * @vhost: ibmvfc host struct @@ -1312,81 +1388,6 @@ static void ibmvfc_set_login_info(struct ibmvfc_host *vhost) strncpy(login_info->drc_name, location, IBMVFC_MAX_NAME); } -/** - * ibmvfc_init_event_pool - Allocates and initializes the event pool for a host - * @vhost: ibmvfc host who owns the event pool - * - * Returns zero on success. - **/ -static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost, - struct ibmvfc_queue *queue) -{ - int i; - struct ibmvfc_event_pool *pool = &queue->evt_pool; - - ENTER; - pool->size = max_requests + IBMVFC_NUM_INTERNAL_REQ; - pool->events = kcalloc(pool->size, sizeof(*pool->events), GFP_KERNEL); - if (!pool->events) - return -ENOMEM; - - pool->iu_storage = dma_alloc_coherent(vhost->dev, - pool->size * sizeof(*pool->iu_storage), - &pool->iu_token, 0); - - if (!pool->iu_storage) { - kfree(pool->events); - return -ENOMEM; - } - - INIT_LIST_HEAD(&queue->sent); - INIT_LIST_HEAD(&queue->free); - spin_lock_init(&queue->l_lock); - - for (i = 0; i < pool->size; ++i) { - struct ibmvfc_event *evt = &pool->events[i]; - atomic_set(&evt->free, 1); - evt->crq.valid = 0x80; - evt->crq.ioba = cpu_to_be64(pool->iu_token + (sizeof(*evt->xfer_iu) * i)); - evt->xfer_iu = pool->iu_storage + i; - evt->vhost = vhost; - evt->queue =
Re: [PATCH v2 0/7] Rid W=1 warnings in Ethernet
On Thu, 14 Jan 2021, Jakub Kicinski wrote:

> On Thu, 14 Jan 2021 08:33:49 + Lee Jones wrote:
> > On Wed, 13 Jan 2021, Jakub Kicinski wrote:
> > > On Wed, 13 Jan 2021 16:41:16 + Lee Jones wrote:
> > > > Resending the stragglers again.
> > > >
> > > > This set is part of a larger effort attempting to clean-up W=1
> > > > kernel builds, which are currently overwhelmingly riddled with
> > > > niggly little warnings.
> > > >
> > > > v2:
> > > >  - Squashed IBM patches
> > > >  - Fixed real issue in SMSC
> > > >  - Added Andrew's Reviewed-by tags on remainder
> > >
> > > Does not apply, please rebase on net-next/master.
> >
> > These are based on Tuesday's next/master.
>
> What's next/master?

I'm not sure if this is a joke, or not? :)

next/master == Linux Next. The daily merged repo where all of the *-next branches end up to ensure interoperability. It's also the branch that is most heavily tested by the auto-builders to ensure the vast majority of issues are ironed out before hitting Mainline.

> This is net-next:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/

Looks like net-next gets merged into next/master:

commit 452958f1f3d1c8980a8414f9c37c8c6de24c7d32
Merge: 1eabba209a17a f50e2f9f79164
Author: Stephen Rothwell
Date:   Thu Jan 14 10:35:40 2021 +1100

    Merge remote-tracking branch 'net-next/master'

So I'm not sure what it's conflicting with. Do you have patches in net-next that didn't make it into next/master for some reason?

I'll try to rebase again tomorrow. Hopefully I am able to reproduce your issue by then.

> > I just rebased them now with no issue.
> >
> > What conflict are you seeing?
> Applying: net: ethernet: smsc: smc91x: Fix function name in kernel-doc header
> error: patch failed: drivers/net/ethernet/smsc/smc91x.c:2192
> error: drivers/net/ethernet/smsc/smc91x.c: patch does not apply
> Patch failed at 0001 net: ethernet: smsc: smc91x: Fix function name in kernel-doc header
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog
Re: [PATCH] spi: fsl: Fix driver breakage when SPI_CS_HIGH is not set in spi->mode
On Thu, 14 Jan 2021 13:09:37 + (UTC), Christophe Leroy wrote: > Commit 766c6b63aa04 ("spi: fix client driver breakages when using GPIO > descriptors") broke fsl spi driver. > > As now we fully rely on gpiolib for handling the polarity of > chip selects, the driver shall not alter the GPIO value anymore > when SPI_CS_HIGH is not set in spi->mode. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next Thanks! [1/1] spi: fsl: Fix driver breakage when SPI_CS_HIGH is not set in spi->mode commit: 7a2da5d7960a64ee923fe3e31f01a1101052c66f All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
[PATCH 1/3] tty: hvcs: Drop unnecessary if block
If hvcs_probe() succeeded dev_set_drvdata() is called with a non-NULL value, and if hvcs_probe() failed hvcs_remove() isn't called. So there is no way dev_get_drvdata() can return NULL in hvcs_remove() and the check can just go away.

Signed-off-by: Uwe Kleine-König
---
 drivers/tty/hvc/hvcs.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 509d1042825a..3e0461285c34 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -825,9 +825,6 @@ static int hvcs_remove(struct vio_dev *dev)
 	unsigned long flags;
 	struct tty_struct *tty;
 
-	if (!hvcsd)
-		return -ENODEV;
-
 	/* By this time the vty-server won't be getting any more interrupts */
 	spin_lock_irqsave(&hvcsd->lock, flags);
-- 
2.29.2
[PATCH 3/3] tty: vcc: Drop impossible to hit WARN_ON
vcc_get() returns the port that has provided port->index. As the port that is about to be removed isn't removed yet this trivially will find this port. So simplify the call to not assign an identical value to the port pointer and drop the warning that is never hit.

Signed-off-by: Uwe Kleine-König
---
 drivers/tty/vcc.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/vcc.c b/drivers/tty/vcc.c
index d9b0dc6deae9..e2d6205f83ce 100644
--- a/drivers/tty/vcc.c
+++ b/drivers/tty/vcc.c
@@ -692,12 +692,9 @@ static int vcc_remove(struct vio_dev *vdev)
 		tty_vhangup(port->tty);
 
 	/* Get exclusive reference to VCC, ensures that there are no other
-	 * clients to this port
+	 * clients to this port. This cannot fail.
 	 */
-	port = vcc_get(port->index, true);
-
-	if (WARN_ON(!port))
-		return -ENODEV;
+	vcc_get(port->index, true);
 
 	tty_unregister_device(vcc_tty_driver, port->index);
-- 
2.29.2
[PATCH 0/3] tty: some cleanups in remove functions
Hello, while working on changing the prototype of struct vio_driver::remove to return void I noticed a few exit paths in such callbacks that return an error code. This is a bad thing because the return value is ignored (which is the motivation to make it void) and the corresponding device then ends in some limbo state. Luckily for the three offenders here these cases cannot happen and are simplified accordingly. This then makes the patch that changes the remove callback's prototype simpler because it only changes prototypes and drops "return 0"s. Best regards and thanks for considering, Uwe Kleine-König Uwe Kleine-König (3): tty: hvcs: Drop unnecessary if block tty: vcc: Drop unnecessary if block tty: vcc: Drop impossible to hit WARN_ON drivers/tty/hvc/hvcs.c | 3 --- drivers/tty/vcc.c | 10 ++ 2 files changed, 2 insertions(+), 11 deletions(-) -- 2.29.2
[PATCH 2/3] tty: vcc: Drop unnecessary if block
If vcc_probe() succeeded dev_set_drvdata() is called with a non-NULL value, and if vcc_probe() failed vcc_remove() isn't called. So there is no way dev_get_drvdata() can return NULL in vcc_remove() and the check can just go away.

Signed-off-by: Uwe Kleine-König
---
 drivers/tty/vcc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/tty/vcc.c b/drivers/tty/vcc.c
index 9ffd42e333b8..d9b0dc6deae9 100644
--- a/drivers/tty/vcc.c
+++ b/drivers/tty/vcc.c
@@ -681,9 +681,6 @@ static int vcc_remove(struct vio_dev *vdev)
 {
 	struct vcc_port *port = dev_get_drvdata(&vdev->dev);
 
-	if (!port)
-		return -ENODEV;
-
 	del_timer_sync(&port->rx_timer);
 	del_timer_sync(&port->tx_timer);
-- 
2.29.2
Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement
On 1/13/21 7:27 PM, Ming Lei wrote: > On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote: >> On 1/12/21 6:33 PM, Tyrel Datwyler wrote: >>> On 1/12/21 2:54 PM, Brian King wrote: On 1/11/21 5:12 PM, Tyrel Datwyler wrote: > Introduce several new vhost fields for managing MQ state of the adapter > as well as initial defaults for MQ enablement. > > Signed-off-by: Tyrel Datwyler > --- > drivers/scsi/ibmvscsi/ibmvfc.c | 8 > drivers/scsi/ibmvscsi/ibmvfc.h | 9 + > 2 files changed, 17 insertions(+) > > diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c > b/drivers/scsi/ibmvscsi/ibmvfc.c > index ba95438a8912..9200fe49c57e 100644 > --- a/drivers/scsi/ibmvscsi/ibmvfc.c > +++ b/drivers/scsi/ibmvscsi/ibmvfc.c > @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = { > .max_sectors = IBMVFC_MAX_SECTORS, > .shost_attrs = ibmvfc_attrs, > .track_queue_depth = 1, > + .host_tagset = 1, This doesn't seem right. You are setting host_tagset, which means you want a shared, host wide, tag set for commands. It also means that the total queue depth for the host is can_queue. However, it looks like you are allocating max_requests events for each sub crq, which means you are over allocating memory. >>> >>> With the shared tagset yes the queue depth for the host is can_queue, but >>> this >>> also implies that the max queue depth for each hw queue is also can_queue. >>> So, >>> in the worst case that all commands are queued down the same hw queue we >>> need an >>> event pool with can_queue commands. >>> Looking at this closer, we might have bigger problems. There is a host wide max number of commands that the VFC host supports, which gets returned on NPIV Login. This value can change across a live migration event. >>> >>> From what I understand the max commands can only become less. >>> The ibmvfc driver, which does the same thing the lpfc driver does, modifies can_queue on the scsi_host *after* the tag set has been allocated. 
This looks to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like we look at can_queue once the tag set is setup, and I'm not seeing a good way to dynamically change the host queue depth once the tag set is setup. Unless I'm missing something, our best options appear to either be to implement our own host wide busy reference counting, which doesn't sound very good, or we need to add some API to block / scsi that allows us to dynamically change can_queue. >>> >>> Changing can_queue won't do use any good with the shared tagset becasue each >>> queue still needs to be able to queue can_queue number of commands in the >>> worst >>> case. >> >> The issue I'm trying to highlight here is the following scenario: >> >> 1. We set shost->can_queue, then call scsi_add_host, which allocates the tag >> set. >> >> 2. On our NPIV login response from the VIOS, we might get a lower value than >> we >> initially set in shost->can_queue, so we update it, but nobody ever looks at >> it >> again, and we don't have any protection against sending too many commands to >> the host. >> >> >> Basically, we no longer have any code that ensures we don't send more >> commands to the VIOS than we are told it supports. According to the >> architecture, >> if we actually do this, the VIOS will do an h_free_crq, which would be a bit >> of a bug on our part. >> >> I don't think it was ever clearly defined in the API that a driver can >> change shost->can_queue after calling scsi_add_host, but up until >> commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now >> it doesn't. > > Actually it isn't related with commit 6eb045e092ef, because > blk_mq_alloc_tag_set() > uses .can_queue to create driver tag sbitmap and request pool. > > So even thought without 6eb045e092ef, the updated .can_queue can't work > as expected because the max driver tag depth has been fixed by blk-mq already. There are two scenarios here. 
In the scenario of someone increasing can_queue after the tag set is allocated, I agree, blk-mq will never take advantage of this. However, in the scenario of someone *decreasing* can_queue after the tag set is allocated, prior to 6eb045e092ef, the shost->host_busy code provided this protection. > > What 6eb045e092ef does is just to remove the double check on max > host-wide allowed commands because that has been respected by blk-mq > driver tag allocation already. > >> >> I started looking through drivers that do this, and so far, it looks like the >> following drivers do: ibmvfc, lpfc, aix94xx, libfc, BusLogic, and likely >> others... >> >> We probably need an API that lets us change shost->can_queue dynamically. > > I'd suggest to confirm changing .can_queue is one real usecase. For ibmvfc, the total number of
Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool
On 1/14/21 1:08 AM, Claire Chang wrote: > On Wed, Jan 13, 2021 at 7:48 AM Florian Fainelli wrote: >> >> On 1/5/21 7:41 PM, Claire Chang wrote: >>> If a device is not behind an IOMMU, we look up the device node and set >>> up the restricted DMA when the restricted-dma-pool is presented. >>> >>> Signed-off-by: Claire Chang >>> --- >> >> [snip] >> >>> +int of_dma_set_restricted_buffer(struct device *dev) >>> +{ >>> + struct device_node *node; >>> + int count, i; >>> + >>> + if (!dev->of_node) >>> + return 0; >>> + >>> + count = of_property_count_elems_of_size(dev->of_node, "memory-region", >>> + sizeof(phandle)); >> >> You could have an early check for count < 0, along with an error >> message, if that is deemed useful. >> >>> + for (i = 0; i < count; i++) { >>> + node = of_parse_phandle(dev->of_node, "memory-region", i); >>> + if (of_device_is_compatible(node, "restricted-dma-pool")) >> >> And you may want to add here an of_device_is_available(node). A platform >> that provides the Device Tree firmware and try to support multiple >> different SoCs may try to determine if an IOMMU is present, and if it >> is, it could be marking the restriced-dma-pool region with a 'status = >> "disabled"' property, or any variant of that scheme. > > This function is called only when there is no IOMMU present (check in > drivers/of/device.c). I can still add of_device_is_available(node) > here if you think it's helpful. I believe it is, since boot loader can have a shared Device Tree blob skeleton and do various adaptations based on the chip (that's what we do) and adding a status property is much simpler than insertion new nodes are run time. > >> >>> + return of_reserved_mem_device_init_by_idx( >>> + dev, dev->of_node, i); >> >> This does not seem to be supporting more than one memory region, did not >> you want something like instead: >> >> ret = of_reserved_mem_device_init_by_idx(...); >> if (ret) >> return ret; >> > > Yes. 
This implement only supports one restriced-dma-pool memory region
> with the assumption that there is only one memory region with the
> compatible string, restricted-dma-pool, in the dts. IIUC, it's similar
> to shared-dma-pool.

Then if there is such a known limitation it should be both documented and enforced here: you should not be iterating over all of the phandles that you find, stop at the first one and issue a warning if count > 1?
-- 
Florian
Re: [PATCH v5 03/21] powerpc: remove arguments from fault handler functions
Le 13/01/2021 à 08:31, Nicholas Piggin a écrit : Make mm fault handlers all just take the pt_regs * argument and load DAR/DSISR from that. Make those that return a value return long. This is done to make the function signatures match other handlers, which will help with a future patch to add wrappers. Explicit arguments could be added for performance but that would require more wrapper macro variants. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/asm-prototypes.h | 4 ++-- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +- arch/powerpc/include/asm/bug.h| 2 +- arch/powerpc/kernel/entry_32.S| 6 +- arch/powerpc/kernel/exceptions-64e.S | 2 -- arch/powerpc/kernel/exceptions-64s.S | 14 ++ arch/powerpc/kernel/head_40x.S| 10 +- arch/powerpc/kernel/head_8xx.S| 6 +++--- arch/powerpc/kernel/head_book3s_32.S | 5 ++--- arch/powerpc/kernel/head_booke.h | 4 +--- arch/powerpc/mm/book3s64/hash_utils.c | 8 +--- arch/powerpc/mm/book3s64/slb.c| 11 +++ arch/powerpc/mm/fault.c | 7 --- 13 files changed, 34 insertions(+), 47 deletions(-) diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 238eacfda7b0..a32157ce0551 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -277,7 +277,7 @@ reenable_mmu: * r3 can be different from GPR3(r1) at this point, r9 and r11 * contains the old MSR and handler address respectively, * r4 & r5 can contain page fault arguments that need to be passed The line above should be dropped as well (its end on the line below is dropped already) -* along as well. r0, r6-r8, r12, CCR, CTR, XER etc... are left +* r0, r4-r8, r12, CCR, CTR, XER etc... are left * clobbered as they aren't useful past this point. */ Christophe
Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
On Thu, Jan 14, 2021 at 02:42:26PM +0100, Christophe Leroy wrote:

> Le 14/01/2021 à 14:22, Mark Brown a écrit :
> > For GPIO chipselects you should really fix the driver to just hand the
> > GPIO off to the core rather than trying to implement this itself, that
> > will avoid driver specific differences like this.

> IIUC, it is not trivial as it requires implementing transfer_one() instead
> of the existing transfer_one_message() in the driver. Am I right ?

Yes, that's a good idea in general though. It should normally be pretty simple since the conversion is mostly just deleting code doing things which will be handled by the core.

> What's the difference/benefit of transfer_one() compared to the existing
> transfer_one_message() ?

It factors out all the handling of chip selects, including per-transfer chip select inversion and so on, and also factors out all the handling of in-message delays. If nothing else it reduces the amount of duplicated code that might require maintenance and can have issues with misaligned expectations from the core or client drivers.
Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
Le 14/01/2021 à 14:22, Mark Brown a écrit : For GPIO chipselects you should really fix the driver to just hand the GPIO off to the core rather than trying to implement this itself, that will avoid driver specific differences like this. IIUC, it is not trivial as it requires implementing transfer_one() instead of the existing transfer_one_message() in the driver. Am I right ? What's the difference/benefit of transfer_one() compared to the existing transfer_one_message() ?
Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
Le 14/01/2021 à 14:17, Nicholas Piggin a écrit : Excerpts from Christophe Leroy's message of January 14, 2021 10:25 pm: Le 14/01/2021 à 13:09, Nicholas Piggin a écrit : Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm: Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am: Le 13/01/2021 à 08:31, Nicholas Piggin a écrit : The page fault handling still has some complex logic particularly around hash table handling, in asm. Implement this in C instead. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 + arch/powerpc/kernel/exceptions-64s.S | 131 +++--- arch/powerpc/mm/book3s64/hash_utils.c | 77 ++ arch/powerpc/mm/fault.c | 46 -- 4 files changed, 107 insertions(+), 148 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index 066b1d34c7bc..60a669379aa0 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn, #define HPTE_NOHPTE_UPDATE 0x2 #define HPTE_USE_KERNEL_KEY 0x4 +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr); extern int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid, pte_t *ptep, unsigned long trap, unsigned long flags, int ssize, int subpage_prot); diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 6e53f7638737..bcb5e81d2088 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) * * Handling: * - Hash MMU - * Go to do_hash_page first to see if the HPT can be filled from an entry in - * the Linux page table. Hash faults can hit in kernel mode in a fairly + * Go to do_hash_fault, which attempts to fill the HPT from an entry in the + * Linux page table. 
Hash faults can hit in kernel mode in a fairly * arbitrary state (e.g., interrupts disabled, locks held) when accessing * "non-bolted" regions, e.g., vmalloc space. However these should always be - * backed by Linux page tables. + * backed by Linux page table entries. * - * If none is found, do a Linux page fault. Linux page faults can happen in - * kernel mode due to user copy operations of course. + * If no entry is found the Linux page fault handler is invoked (by + * do_hash_fault). Linux page faults can happen in kernel mode due to user + * copy operations of course. * * KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest * MMU context, which may cause a DSI in the host, which must go to the @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common) GEN_COMMON data_access ld r4,_DAR(r1) ld r5,_DSISR(r1) We have DSISR here. I think the dispatch between page fault or do_break() should be done here: - It would be more similar to other arches Other sub-archs? - Would avoid doing it also in instruction fault True but it's hidden under an unlikely branch so won't really help instruction fault. - Would avoid that -1 return which looks more like a hack. I don't really see it as a hack, we return a code to asm caller to direct whether to restore registers or not, we alrady have this pattern. (I'm hoping all that might be go away one day by conrolling NV regs from C if we can get good code generation but even if not we still have it in the interrupt returns). That said I will give it a try here. At very least it might be a better intermediate step. Ah yes, this way doesn't work well for later patches because you end e.g., with the do_break call having to call the interrupt handler wrappers again when they actually expect to be in the asm entry state (e.g., irq soft-mask state) when called, and return via interrupt_return after the exit wrapper runs (which 64s uses to implement better context tracking for example). 
That could possibly be hacked up to deal with multiple interrupt wrappers per interrupt, but I'd rather not go backwards. That does leave the other sub archs as having this issue, but they don't do so much in their handlers. 32 doesn't have soft-mask or context tracking to deal with for example. We will need to fix this up though and unify things more. Not sure I understand what you mean exactly. On the 8xx, do_break() is called by totally different exceptions: - Exception 0x1c00 Data breakpoint ==> do_break() - Exception 0x1300 Instruction TLB error ==> handle_page_fault() - Exception 0x1400 Data TLB error ==> handle_page_fault() On book3s/32, we now (after my patch ie patch 1 in your series ) have either do_break() or handle_page_fault() being called from very
Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
On Thu, Jan 14, 2021 at 01:33:50PM +0100, Christophe Leroy wrote: > Le 14/01/2021 à 12:59, Mark Brown a écrit : > > On Thu, Jan 14, 2021 at 12:27:42PM +0100, Christophe Leroy wrote: > > > Today I have in the DTS the CS GPIOs declared as ACTIVE_LOW. > > > If I declare them as ACTIVE_HIGH instead, then I also have to set > > > spi-cs-high property, otherwise of_gpio_flags_quirks() is not happy and > > > forces the GPIO ACTIVE LOW. > > > When I set spi-cs-high property, it sets the SPI_CS_HIGH bit in spi->mode. > > OK, so it sounds like you want SPI_CS_HIGH and that is being set > > correctly? > > > In fsl_spi_chipselect(), we have > > > > > > bool pol = spi->mode & SPI_CS_HIGH > > > > > > Then > > > pdata->cs_control(spi, pol); > > > So changing the board config is compensated by the above, and at the end > > > it still doesn't work. > > This is a driver bug, the driver set_cs() operation should not be > > modifying the value it is told to set. > A driver bug ? Or maybe a change forgotten in commit 766c6b63aa04 ("spi: > fix client driver breakages when using GPIO descriptors") ? The expectation that the driver will be using the chip select exactly as passed in and not attempting to implement SPI_CS_HIGH itself when it has set_cs() has been there for a considerable time now, that's not new with the cleanup. Drivers should only be paying attention to SPI_CS_HIGH in cases where the hardware controls the chip select autonomously and so set_cs() can't be provided. > I'm almost sure it was not a bug, it is in line which what is said in > the comment removed by the above mentionned commit. Please take a look at the list archive discussions around this - there's been a lot of confusion with GPIO descriptors in particular due to there being multiple places where you can set the inversion. 
Note that the situation you describe above is that you end up with all the various places other than your driver agreeing that the chip select is active high as it (AFAICT from what you're saying) actually is. For GPIO chipselects you should really fix the driver to just hand the GPIO off to the core rather than trying to implement this itself, that will avoid driver specific differences like this.
Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
Excerpts from Christophe Leroy's message of January 14, 2021 10:25 pm: > > > Le 14/01/2021 à 13:09, Nicholas Piggin a écrit : >> Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm: >>> Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am: Le 13/01/2021 à 08:31, Nicholas Piggin a écrit : > The page fault handling still has some complex logic particularly around > hash table handling, in asm. Implement this in C instead. > > Signed-off-by: Nicholas Piggin > --- >arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 + >arch/powerpc/kernel/exceptions-64s.S | 131 +++--- >arch/powerpc/mm/book3s64/hash_utils.c | 77 ++ >arch/powerpc/mm/fault.c | 46 -- >4 files changed, 107 insertions(+), 148 deletions(-) > > diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h > b/arch/powerpc/include/asm/book3s/64/mmu-hash.h > index 066b1d34c7bc..60a669379aa0 100644 > --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h > +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h > @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long > vpn, >#define HPTE_NOHPTE_UPDATE 0x2 >#define HPTE_USE_KERNEL_KEY0x4 > > +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long > dsisr); >extern int __hash_page_4K(unsigned long ea, unsigned long access, > unsigned long vsid, pte_t *ptep, unsigned > long trap, > unsigned long flags, int ssize, int > subpage_prot); > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 6e53f7638737..bcb5e81d2088 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) > * > * Handling: > * - Hash MMU > - * Go to do_hash_page first to see if the HPT can be filled from an > entry in > - * the Linux page table. Hash faults can hit in kernel mode in a fairly > + * Go to do_hash_fault, which attempts to fill the HPT from an entry > in the > + * Linux page table. 
Hash faults can hit in kernel mode in a fairly > * arbitrary state (e.g., interrupts disabled, locks held) when > accessing > * "non-bolted" regions, e.g., vmalloc space. However these should > always be > - * backed by Linux page tables. > + * backed by Linux page table entries. > * > - * If none is found, do a Linux page fault. Linux page faults can > happen in > - * kernel mode due to user copy operations of course. > + * If no entry is found the Linux page fault handler is invoked (by > + * do_hash_fault). Linux page faults can happen in kernel mode due to > user > + * copy operations of course. > * > * KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in > guest > * MMU context, which may cause a DSI in the host, which must go to > the > @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common) > GEN_COMMON data_access > ld r4,_DAR(r1) > ld r5,_DSISR(r1) We have DSISR here. I think the dispatch between page fault or do_break() should be done here: - It would be more similar to other arches >>> >>> Other sub-archs? >>> - Would avoid doing it also in instruction fault >>> >>> True but it's hidden under an unlikely branch so won't really help >>> instruction fault. >>> - Would avoid that -1 return which looks more like a hack. >>> >>> I don't really see it as a hack, we return a code to asm caller to >>> direct whether to restore registers or not, we alrady have this >>> pattern. >>> >>> (I'm hoping all that might be go away one day by conrolling NV >>> regs from C if we can get good code generation but even if not we >>> still have it in the interrupt returns). >>> >>> That said I will give it a try here. At very least it might be a >>> better intermediate step. 
>> >> Ah yes, this way doesn't work well for later patches because you end >> e.g., with the do_break call having to call the interrupt handler >> wrappers again when they actually expect to be in the asm entry state >> (e.g., irq soft-mask state) when called, and return via interrupt_return >> after the exit wrapper runs (which 64s uses to implement better context >> tracking for example). >> >> That could possibly be hacked up to deal with multiple interrupt >> wrappers per interrupt, but I'd rather not go backwards. >> >> That does leave the other sub archs as having this issue, but they don't >> do so much in their handlers. 32 doesn't have soft-mask or context >> tracking to deal
[PATCH] spi: fsl: Fix driver breakage when SPI_CS_HIGH is not set in spi->mode
Commit 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors") broke fsl spi driver.

As now we fully rely on gpiolib for handling the polarity of chip selects, the driver shall not alter the GPIO value anymore when SPI_CS_HIGH is not set in spi->mode.

Fixes: 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy
---
 drivers/spi/spi-fsl-spi.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/spi/spi-fsl-spi.c b/drivers/spi/spi-fsl-spi.c
index 649766808caf..9e297b2715f6 100644
--- a/drivers/spi/spi-fsl-spi.c
+++ b/drivers/spi/spi-fsl-spi.c
@@ -115,14 +115,13 @@ static void fsl_spi_chipselect(struct spi_device *spi, int value)
 {
 	struct mpc8xxx_spi *mpc8xxx_spi = spi_master_get_devdata(spi->master);
 	struct fsl_spi_platform_data *pdata;
-	bool pol = spi->mode & SPI_CS_HIGH;
 	struct spi_mpc8xxx_cs *cs = spi->controller_state;
 
 	pdata = spi->dev.parent->parent->platform_data;
 
 	if (value == BITBANG_CS_INACTIVE) {
 		if (pdata->cs_control)
-			pdata->cs_control(spi, !pol);
+			pdata->cs_control(spi, false);
 	}
 
 	if (value == BITBANG_CS_ACTIVE) {
@@ -134,7 +133,7 @@ static void fsl_spi_chipselect(struct spi_device *spi, int value)
 		fsl_spi_change_mode(spi);
 
 		if (pdata->cs_control)
-			pdata->cs_control(spi, pol);
+			pdata->cs_control(spi, true);
 	}
 }
-- 
2.25.0
Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
Excerpts from Michal Suchánek's message of January 14, 2021 10:40 pm: > On Mon, Oct 19, 2020 at 02:50:51PM +1000, Nicholas Piggin wrote: >> Excerpts from Nicholas Piggin's message of October 19, 2020 11:00 am: >> > Excerpts from Michal Suchánek's message of October 17, 2020 6:14 am: >> >> On Mon, Sep 07, 2020 at 11:13:47PM +1000, Nicholas Piggin wrote: >> >>> Excerpts from Michael Ellerman's message of August 31, 2020 8:50 pm: >> >>> > Michal Suchánek writes: >> >>> >> On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote: >> >>> >>> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am: >> >>> >>> > Hello, >> >>> >>> > >> >>> >>> > on POWER8 KVM hosts lock up since commit 10d91611f426 >> >>> >>> > ("powerpc/64s: >> >>> >>> > Reimplement book3s idle code in C"). >> >>> >>> > >> >>> >>> > The symptom is host locking up completely after some hours of KVM >> >>> >>> > workload with messages like >> >>> >>> > >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab >> >>> >>> > cpu 47 >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab >> >>> >>> > cpu 71 >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab >> >>> >>> > cpu 47 >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab >> >>> >>> > cpu 71 >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab >> >>> >>> > cpu 47 >> >>> >>> > >> >>> >>> > printed before the host locks up. >> >>> >>> > >> >>> >>> > The machines run sandboxed builds which is a mixed workload >> >>> >>> > resulting in >> >>> >>> > IO/single core/mutiple core load over time and there are periods >> >>> >>> > of no >> >>> >>> > activity and no VMS runnig as well. The VMs are shortlived so VM >> >>> >>> > setup/terdown is somewhat excercised as well. >> >>> >>> > >> >>> >>> > POWER9 with the new guest entry fast path does not seem to be >> >>> >>> > affected. 
>> >>> >>> > >> >>> >>> > Reverted the patch and the followup idle fixes on top of 5.2.14 and >> >>> >>> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR >> >>> >>> > after idle") which gives same idle code as 5.1.16 and the kernel >> >>> >>> > seems >> >>> >>> > stable. >> >>> >>> > >> >>> >>> > Config is attached. >> >>> >>> > >> >>> >>> > I cannot easily revert this commit, especially if I want to use >> >>> >>> > the same >> >>> >>> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are >> >>> >>> > applicable >> >>> >>> > only to the new idle code. >> >>> >>> > >> >>> >>> > Any idea what can be the problem? >> >>> >>> >> >>> >>> So hwthread_state is never getting back to to HWTHREAD_IN_IDLE on >> >>> >>> those threads. I wonder what they are doing. POWER8 doesn't have a >> >>> >>> good >> >>> >>> NMI IPI and I don't know if it supports pdbg dumping registers from >> >>> >>> the >> >>> >>> BMC unfortunately. >> >>> >> >> >>> >> It may be possible to set up fadump with a later kernel version that >> >>> >> supports it on powernv and dump the whole kernel. >> >>> > >> >>> > Your firmware won't support it AFAIK. >> >>> > >> >>> > You could try kdump, but if we have CPUs stuck in KVM then there's a >> >>> > good chance it won't work :/ >> >>> >> >>> I haven't had any luck yet reproducing this still. Testing with sub >> >>> cores of various different combinations, etc. I'll keep trying though. >> >> >> >> Hello, >> >> >> >> I tried running some KVM guests to simulate the workload and what I get >> >> is guests failing to start with a rcu stall. 
>> >> Tried both 5.3 and 5.9 kernel and qemu 4.2.1 and 5.1.0
>> >>
>> >> To start some guests I run
>> >>
>> >> for i in $(seq 0 9) ; do /opt/qemu/bin/qemu-system-ppc64 -m 2048 -accel
>> >> kvm -smp 8 -kernel /boot/vmlinux -initrd /boot/initrd -nodefaults
>> >> -nographic -serial mon:telnet::444$i,server,wait & done
>> >>
>> >> To simulate some workload I run
>> >>
>> >> xz -zc9T0 < /dev/zero > /dev/null &
>> >> while true; do
>> >> killall -STOP xz; sleep 1; killall -CONT xz; sleep 1;
>> >> done &
>> >>
>> >> on the host and add a job that executes this to the ramdisk. However, most
>> >> guests never get to the point where the job is executed.
>> >>
>> >> Any idea what might be the problem?
>> >
>> > I would say try without pv queued spin locks (but if the same thing is
>> > happening with 5.3 then it must be something else I guess).
>> >
>> > I'll try to test a similar setup on a POWER8 here.
>>
>> Couldn't reproduce the guest hang, they seem to run fine even with
>> queued spinlocks. Might have a different .config.
>>
>> I might have got a lockup in the host (although different symptoms than
>> the original report). I'll look into that a bit further.
>
> Hello,
>
> any progress on this?

No progress, I still wasn't able to reproduce, and it fell off the radar, sorry. I expect hwthread_state must be getting corrupted somewhere, or a secondary thread is getting stuck.
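For reference, the stop/cont stress pattern in the report can be run as a standalone sketch. This is a hedged illustration, not the exact reproducer: it swaps `xz` for a harmless background `sleep` (a substitution made purely so it runs anywhere) and bounds the loop instead of running forever; the qemu command line from the thread is unchanged and not repeated here.

```shell
# Bounded sketch of the workload loop from the report: repeatedly freeze and
# thaw a background worker with SIGSTOP/SIGCONT, which on the host side forces
# cores in and out of idle. `sleep` stands in for xz here.
sleep 30 &
pid=$!
i=0
while [ "$i" -lt 5 ]; do
    kill -STOP "$pid"    # freeze the worker (as `killall -STOP xz` does)
    kill -CONT "$pid"    # thaw it again
    i=$((i + 1))
done
# the worker should have survived the stop/cont churn
if kill -0 "$pid" 2>/dev/null; then status=alive; else status=dead; fi
kill "$pid" 2>/dev/null
```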
Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
On Mon, Oct 19, 2020 at 02:50:51PM +1000, Nicholas Piggin wrote: > Excerpts from Nicholas Piggin's message of October 19, 2020 11:00 am: > > Excerpts from Michal Suchánek's message of October 17, 2020 6:14 am: > >> On Mon, Sep 07, 2020 at 11:13:47PM +1000, Nicholas Piggin wrote: > >>> Excerpts from Michael Ellerman's message of August 31, 2020 8:50 pm: > >>> > Michal Suchánek writes: > >>> >> On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote: > >>> >>> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am: > >>> >>> > Hello, > >>> >>> > > >>> >>> > on POWER8 KVM hosts lock up since commit 10d91611f426 ("powerpc/64s: > >>> >>> > Reimplement book3s idle code in C"). > >>> >>> > > >>> >>> > The symptom is host locking up completely after some hours of KVM > >>> >>> > workload with messages like > >>> >>> > > >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab > >>> >>> > cpu 47 > >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab > >>> >>> > cpu 71 > >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab > >>> >>> > cpu 47 > >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab > >>> >>> > cpu 71 > >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab > >>> >>> > cpu 47 > >>> >>> > > >>> >>> > printed before the host locks up. > >>> >>> > > >>> >>> > The machines run sandboxed builds which is a mixed workload > >>> >>> > resulting in > >>> >>> > IO/single core/mutiple core load over time and there are periods of > >>> >>> > no > >>> >>> > activity and no VMS runnig as well. The VMs are shortlived so VM > >>> >>> > setup/terdown is somewhat excercised as well. > >>> >>> > > >>> >>> > POWER9 with the new guest entry fast path does not seem to be > >>> >>> > affected. 
> >>> >>> > > >>> >>> > Reverted the patch and the followup idle fixes on top of 5.2.14 and > >>> >>> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR > >>> >>> > after idle") which gives same idle code as 5.1.16 and the kernel > >>> >>> > seems > >>> >>> > stable. > >>> >>> > > >>> >>> > Config is attached. > >>> >>> > > >>> >>> > I cannot easily revert this commit, especially if I want to use the > >>> >>> > same > >>> >>> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are > >>> >>> > applicable > >>> >>> > only to the new idle code. > >>> >>> > > >>> >>> > Any idea what can be the problem? > >>> >>> > >>> >>> So hwthread_state is never getting back to to HWTHREAD_IN_IDLE on > >>> >>> those threads. I wonder what they are doing. POWER8 doesn't have a > >>> >>> good > >>> >>> NMI IPI and I don't know if it supports pdbg dumping registers from > >>> >>> the > >>> >>> BMC unfortunately. > >>> >> > >>> >> It may be possible to set up fadump with a later kernel version that > >>> >> supports it on powernv and dump the whole kernel. > >>> > > >>> > Your firmware won't support it AFAIK. > >>> > > >>> > You could try kdump, but if we have CPUs stuck in KVM then there's a > >>> > good chance it won't work :/ > >>> > >>> I haven't had any luck yet reproducing this still. Testing with sub > >>> cores of various different combinations, etc. I'll keep trying though. > >> > >> Hello, > >> > >> I tried running some KVM guests to simulate the workload and what I get > >> is guests failing to start with a rcu stall. 
> >> Tried both 5.3 and 5.9 kernel and qemu 4.2.1 and 5.1.0
> >>
> >> To start some guests I run
> >>
> >> for i in $(seq 0 9) ; do /opt/qemu/bin/qemu-system-ppc64 -m 2048 -accel
> >> kvm -smp 8 -kernel /boot/vmlinux -initrd /boot/initrd -nodefaults
> >> -nographic -serial mon:telnet::444$i,server,wait & done
> >>
> >> To simulate some workload I run
> >>
> >> xz -zc9T0 < /dev/zero > /dev/null &
> >> while true; do
> >> killall -STOP xz; sleep 1; killall -CONT xz; sleep 1;
> >> done &
> >>
> >> on the host and add a job that executes this to the ramdisk. However, most
> >> guests never get to the point where the job is executed.
> >>
> >> Any idea what might be the problem?
> >
> > I would say try without pv queued spin locks (but if the same thing is
> > happening with 5.3 then it must be something else I guess).
> >
> > I'll try to test a similar setup on a POWER8 here.
>
> Couldn't reproduce the guest hang, they seem to run fine even with
> queued spinlocks. Might have a different .config.
>
> I might have got a lockup in the host (although different symptoms than
> the original report). I'll look into that a bit further.

Hello,

any progress on this? I considered reinstating the old assembly code for POWER[78] but even the way it's called has changed slightly.

Thanks

Michal
Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
Le 14/01/2021 à 12:59, Mark Brown a écrit :
> On Thu, Jan 14, 2021 at 12:27:42PM +0100, Christophe Leroy wrote:
>> Today I have in the DTS the CS GPIOs declared as ACTIVE_LOW. If I declare
>> them as ACTIVE_HIGH instead, then I also have to set spi-cs-high property,
>> otherwise of_gpio_flags_quirks() is not happy and forces the GPIO ACTIVE
>> LOW. When I set spi-cs-high property, it sets the SPI_CS_HIGH bit in
>> spi->mode.
>
> OK, so it sounds like you want SPI_CS_HIGH and that is being set correctly?
>
>> In fsl_spi_chipselect(), we have
>>
>> bool pol = spi->mode & SPI_CS_HIGH
>>
>> Then
>>
>> pdata->cs_control(spi, pol);
>>
>> So changing the board config is compensated by the above, and at the end it
>> still doesn't work.
>
> This is a driver bug, the driver set_cs() operation should not be modifying
> the value it is told to set.

A driver bug? Or maybe a change forgotten in commit 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")? I'm almost sure it was not a bug; it is in line with what is said in the comment removed by the above-mentioned commit.

I'll send a fix.

Christophe
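The polarity chain being debated can be modeled in isolation. This is a hedged sketch, not the fsl_spi driver: spi->mode is a plain bitmask, SPI_CS_HIGH uses the value from the uapi SPI header, and cs_level() is a hypothetical helper standing in for what the core asks a driver's set_cs()/cs_control() callback to drive. It mirrors the `bool pol = spi->mode & SPI_CS_HIGH` computation quoted in the thread, extended to the deassert case.

```c
#include <assert.h>
#include <stdbool.h>

/* SPI_CS_HIGH as defined in the uapi SPI header. */
#define SPI_CS_HIGH 0x04u

/*
 * Hypothetical helper: the electrical level chip select should be driven
 * to, given the mode bits and whether the core wants CS asserted. If the
 * core has already resolved SPI_CS_HIGH before calling the driver, a
 * driver that re-derives polarity from spi->mode inverts it a second
 * time, which is the double-inversion symptom described in the thread.
 */
static bool cs_level(unsigned int mode, bool assert)
{
    bool active_high = (mode & SPI_CS_HIGH) != 0;

    return assert ? active_high : !active_high;
}
```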
Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
Le 14/01/2021 à 13:09, Nicholas Piggin a écrit : Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm: Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am: Le 13/01/2021 à 08:31, Nicholas Piggin a écrit : The page fault handling still has some complex logic particularly around hash table handling, in asm. Implement this in C instead. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 + arch/powerpc/kernel/exceptions-64s.S | 131 +++--- arch/powerpc/mm/book3s64/hash_utils.c | 77 ++ arch/powerpc/mm/fault.c | 46 -- 4 files changed, 107 insertions(+), 148 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index 066b1d34c7bc..60a669379aa0 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn, #define HPTE_NOHPTE_UPDATE 0x2 #define HPTE_USE_KERNEL_KEY 0x4 +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr); extern int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid, pte_t *ptep, unsigned long trap, unsigned long flags, int ssize, int subpage_prot); diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 6e53f7638737..bcb5e81d2088 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) * * Handling: * - Hash MMU - * Go to do_hash_page first to see if the HPT can be filled from an entry in - * the Linux page table. Hash faults can hit in kernel mode in a fairly + * Go to do_hash_fault, which attempts to fill the HPT from an entry in the + * Linux page table. Hash faults can hit in kernel mode in a fairly * arbitrary state (e.g., interrupts disabled, locks held) when accessing * "non-bolted" regions, e.g., vmalloc space. 
However these should always be - * backed by Linux page tables. + * backed by Linux page table entries. * - * If none is found, do a Linux page fault. Linux page faults can happen in - * kernel mode due to user copy operations of course. + * If no entry is found the Linux page fault handler is invoked (by + * do_hash_fault). Linux page faults can happen in kernel mode due to user + * copy operations of course. * * KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest * MMU context, which may cause a DSI in the host, which must go to the @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common) GEN_COMMON data_access ld r4,_DAR(r1) ld r5,_DSISR(r1) We have DSISR here. I think the dispatch between page fault or do_break() should be done here: - It would be more similar to other arches Other sub-archs? - Would avoid doing it also in instruction fault True but it's hidden under an unlikely branch so won't really help instruction fault. - Would avoid that -1 return which looks more like a hack. I don't really see it as a hack, we return a code to asm caller to direct whether to restore registers or not, we alrady have this pattern. (I'm hoping all that might be go away one day by conrolling NV regs from C if we can get good code generation but even if not we still have it in the interrupt returns). That said I will give it a try here. At very least it might be a better intermediate step. Ah yes, this way doesn't work well for later patches because you end e.g., with the do_break call having to call the interrupt handler wrappers again when they actually expect to be in the asm entry state (e.g., irq soft-mask state) when called, and return via interrupt_return after the exit wrapper runs (which 64s uses to implement better context tracking for example). That could possibly be hacked up to deal with multiple interrupt wrappers per interrupt, but I'd rather not go backwards. 
That does leave the other sub archs as having this issue, but they don't do so much in their handlers. 32 doesn't have soft-mask or context tracking to deal with for example. We will need to fix this up though and unify things more.

Not sure I understand what you mean exactly. On the 8xx, do_break() is called by totally different exceptions:

- Exception 0x1c00 Data breakpoint ==> do_break()
- Exception 0x1300 Instruction TLB error ==> handle_page_fault()
- Exception 0x1400 Data TLB error ==> handle_page_fault()

On book3s/32, we now (after my patch, i.e. patch 1 in your series) have either do_break() or handle_page_fault() being called from very early in ASM. If you do the same in book3s/64, then there is no issue with interrupt wrappers being called twice, is there?

Christophe
Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm: > Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am: >> >> >> Le 13/01/2021 à 08:31, Nicholas Piggin a écrit : >>> The page fault handling still has some complex logic particularly around >>> hash table handling, in asm. Implement this in C instead. >>> >>> Signed-off-by: Nicholas Piggin >>> --- >>> arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 + >>> arch/powerpc/kernel/exceptions-64s.S | 131 +++--- >>> arch/powerpc/mm/book3s64/hash_utils.c | 77 ++ >>> arch/powerpc/mm/fault.c | 46 -- >>> 4 files changed, 107 insertions(+), 148 deletions(-) >>> >>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h >>> b/arch/powerpc/include/asm/book3s/64/mmu-hash.h >>> index 066b1d34c7bc..60a669379aa0 100644 >>> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h >>> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h >>> @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn, >>> #define HPTE_NOHPTE_UPDATE0x2 >>> #define HPTE_USE_KERNEL_KEY 0x4 >>> >>> +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long >>> dsisr); >>> extern int __hash_page_4K(unsigned long ea, unsigned long access, >>> unsigned long vsid, pte_t *ptep, unsigned long trap, >>> unsigned long flags, int ssize, int subpage_prot); >>> diff --git a/arch/powerpc/kernel/exceptions-64s.S >>> b/arch/powerpc/kernel/exceptions-64s.S >>> index 6e53f7638737..bcb5e81d2088 100644 >>> --- a/arch/powerpc/kernel/exceptions-64s.S >>> +++ b/arch/powerpc/kernel/exceptions-64s.S >>> @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) >>>* >>>* Handling: >>>* - Hash MMU >>> - * Go to do_hash_page first to see if the HPT can be filled from an >>> entry in >>> - * the Linux page table. Hash faults can hit in kernel mode in a fairly >>> + * Go to do_hash_fault, which attempts to fill the HPT from an entry in >>> the >>> + * Linux page table. 
Hash faults can hit in kernel mode in a fairly >>>* arbitrary state (e.g., interrupts disabled, locks held) when >>> accessing >>>* "non-bolted" regions, e.g., vmalloc space. However these should >>> always be >>> - * backed by Linux page tables. >>> + * backed by Linux page table entries. >>>* >>> - * If none is found, do a Linux page fault. Linux page faults can happen >>> in >>> - * kernel mode due to user copy operations of course. >>> + * If no entry is found the Linux page fault handler is invoked (by >>> + * do_hash_fault). Linux page faults can happen in kernel mode due to >>> user >>> + * copy operations of course. >>>* >>>* KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest >>>* MMU context, which may cause a DSI in the host, which must go to the >>> @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common) >>> GEN_COMMON data_access >>> ld r4,_DAR(r1) >>> ld r5,_DSISR(r1) >> >> We have DSISR here. I think the dispatch between page fault or do_break() >> should be done here: >> - It would be more similar to other arches > > Other sub-archs? > >> - Would avoid doing it also in instruction fault > > True but it's hidden under an unlikely branch so won't really help > instruction fault. > >> - Would avoid that -1 return which looks more like a hack. > > I don't really see it as a hack, we return a code to asm caller to > direct whether to restore registers or not, we alrady have this > pattern. > > (I'm hoping all that might be go away one day by conrolling NV > regs from C if we can get good code generation but even if not we > still have it in the interrupt returns). > > That said I will give it a try here. At very least it might be a > better intermediate step. 
Ah yes, this way doesn't work well for later patches because you end e.g., with the do_break call having to call the interrupt handler wrappers again when they actually expect to be in the asm entry state (e.g., irq soft-mask state) when called, and return via interrupt_return after the exit wrapper runs (which 64s uses to implement better context tracking for example). That could possibly be hacked up to deal with multiple interrupt wrappers per interrupt, but I'd rather not go backwards. That does leave the other sub archs as having this issue, but they don't do so much in their handlers. 32 doesn't have soft-mask or context tracking to deal with for example. We will need to fix this up though and unify things more. Thanks, Nick
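The early dispatch being discussed, deciding between do_break() and the page fault path from DSISR before entering the common handler, can be sketched in isolation. This is a hedged model, not kernel code: the DSISR_DABRMATCH value matches the kernel's definition, but the enum and function are placeholders standing in for the asm-level branch under debate (on 64s the same decision is currently conveyed via the -1 return from the C handler).

```c
#include <assert.h>

/* DSISR_DABRMATCH as defined in the powerpc reg headers. */
#define DSISR_DABRMATCH 0x00400000UL

/* Placeholder outcomes for the two handlers named in the thread. */
enum dsi_handler { HANDLE_PAGE_FAULT, HANDLE_BREAK };

/*
 * Sketch of the dispatch: if the data storage interrupt was raised by a
 * breakpoint match, go to do_break(); otherwise fall through to the page
 * fault / hash fault path.
 */
static enum dsi_handler dispatch_dsi(unsigned long dsisr)
{
    if (dsisr & DSISR_DABRMATCH)
        return HANDLE_BREAK;
    return HANDLE_PAGE_FAULT;
}
```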
Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
On Thu, Jan 14, 2021 at 12:27:42PM +0100, Christophe Leroy wrote:
> Today I have in the DTS the CS GPIOs declared as ACTIVE_LOW.
> If I declare them as ACTIVE_HIGH instead, then I also have to set
> spi-cs-high property, otherwise of_gpio_flags_quirks() is not happy and
> forces the GPIO ACTIVE LOW.
> When I set spi-cs-high property, it sets the SPI_CS_HIGH bit in spi->mode.

OK, so it sounds like you want SPI_CS_HIGH and that is being set correctly?

> In fsl_spi_chipselect(), we have
>
> bool pol = spi->mode & SPI_CS_HIGH
>
> Then
>
> pdata->cs_control(spi, pol);
>
> So changing the board config is compensated by the above, and at the end it
> still doesn't work.

This is a driver bug, the driver set_cs() operation should not be modifying the value it is told to set.
[PATCH 11/18] arch: powerpc: Remove oprofile
The previous commit already disabled building oprofile, lets remove the oprofile directory now. Suggested-by: Christoph Hellwig Suggested-by: Linus Torvalds Signed-off-by: Viresh Kumar --- arch/powerpc/oprofile/Makefile | 19 - arch/powerpc/oprofile/backtrace.c | 120 -- arch/powerpc/oprofile/cell/pr_util.h | 110 -- arch/powerpc/oprofile/cell/spu_profiler.c | 248 --- arch/powerpc/oprofile/cell/spu_task_sync.c | 657 arch/powerpc/oprofile/cell/vma_map.c | 279 arch/powerpc/oprofile/common.c | 243 --- arch/powerpc/oprofile/op_model_7450.c | 207 --- arch/powerpc/oprofile/op_model_cell.c | 1709 arch/powerpc/oprofile/op_model_fsl_emb.c | 380 - arch/powerpc/oprofile/op_model_pa6t.c | 227 --- arch/powerpc/oprofile/op_model_power4.c| 438 - 12 files changed, 4637 deletions(-) delete mode 100644 arch/powerpc/oprofile/Makefile delete mode 100644 arch/powerpc/oprofile/backtrace.c delete mode 100644 arch/powerpc/oprofile/cell/pr_util.h delete mode 100644 arch/powerpc/oprofile/cell/spu_profiler.c delete mode 100644 arch/powerpc/oprofile/cell/spu_task_sync.c delete mode 100644 arch/powerpc/oprofile/cell/vma_map.c delete mode 100644 arch/powerpc/oprofile/common.c delete mode 100644 arch/powerpc/oprofile/op_model_7450.c delete mode 100644 arch/powerpc/oprofile/op_model_cell.c delete mode 100644 arch/powerpc/oprofile/op_model_fsl_emb.c delete mode 100644 arch/powerpc/oprofile/op_model_pa6t.c delete mode 100644 arch/powerpc/oprofile/op_model_power4.c diff --git a/arch/powerpc/oprofile/Makefile b/arch/powerpc/oprofile/Makefile deleted file mode 100644 index bb2d94c8cbe6.. 
--- a/arch/powerpc/oprofile/Makefile +++ /dev/null @@ -1,19 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0 - -ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC) - -obj-$(CONFIG_OPROFILE) += oprofile.o - -DRIVER_OBJS := $(addprefix ../../../drivers/oprofile/, \ - oprof.o cpu_buffer.o buffer_sync.o \ - event_buffer.o oprofile_files.o \ - oprofilefs.o oprofile_stats.o \ - timer_int.o ) - -oprofile-y := $(DRIVER_OBJS) common.o backtrace.o -oprofile-$(CONFIG_OPROFILE_CELL) += op_model_cell.o \ - cell/spu_profiler.o cell/vma_map.o \ - cell/spu_task_sync.o -oprofile-$(CONFIG_PPC_BOOK3S_64) += op_model_power4.o op_model_pa6t.o -oprofile-$(CONFIG_FSL_EMB_PERFMON) += op_model_fsl_emb.o -oprofile-$(CONFIG_PPC_BOOK3S_32) += op_model_7450.o diff --git a/arch/powerpc/oprofile/backtrace.c b/arch/powerpc/oprofile/backtrace.c deleted file mode 100644 index 9db7ada79d10.. --- a/arch/powerpc/oprofile/backtrace.c +++ /dev/null @@ -1,120 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/** - * Copyright (C) 2005 Brian Rogan , IBM - * -**/ - -#include -#include -#include -#include -#include -#include -#include - -#define STACK_SP(STACK)*(STACK) - -#define STACK_LR64(STACK) *((unsigned long *)(STACK) + 2) -#define STACK_LR32(STACK) *((unsigned int *)(STACK) + 1) - -#ifdef CONFIG_PPC64 -#define STACK_LR(STACK)STACK_LR64(STACK) -#else -#define STACK_LR(STACK)STACK_LR32(STACK) -#endif - -static unsigned int user_getsp32(unsigned int sp, int is_first) -{ - unsigned int stack_frame[2]; - void __user *p = compat_ptr(sp); - - /* -* The most likely reason for this is that we returned -EFAULT, -* which means that we've done all that we can do from -* interrupt context. -*/ - if (copy_from_user_nofault(stack_frame, (void __user *)p, - sizeof(stack_frame))) - return 0; - - if (!is_first) - oprofile_add_trace(STACK_LR32(stack_frame)); - - /* -* We do not enforce increasing stack addresses here because -* we may transition to a different stack, eg a signal handler. 
-*/ - return STACK_SP(stack_frame); -} - -#ifdef CONFIG_PPC64 -static unsigned long user_getsp64(unsigned long sp, int is_first) -{ - unsigned long stack_frame[3]; - - if (copy_from_user_nofault(stack_frame, (void __user *)sp, - sizeof(stack_frame))) - return 0; - - if (!is_first) - oprofile_add_trace(STACK_LR64(stack_frame)); - - return STACK_SP(stack_frame); -} -#endif - -static unsigned long kernel_getsp(unsigned long sp, int is_first) -{ - unsigned long *stack_frame = (unsigned long *)sp; - - if (!validate_sp(sp, current, STACK_FRAME_OVERHEAD)) - return 0; - - if (!is_first) - oprofile_add_trace(STACK_LR(stack_frame)); - - /* -* We do not enforce increasing stack addresses here because -* we might be transitioning from an interrupt stack to a kernel -* stack. validate_sp() is designed to understand this, so just -
[PATCH 10/18] arch: powerpc: Stop building and using oprofile
The "oprofile" user-space tools don't use the kernel OPROFILE support any more, and haven't in a long time. User-space has been converted to the perf interfaces. This commits stops building oprofile for powerpc and removes any reference to it from directories in arch/powerpc/ apart from arch/powerpc/oprofile, which will be removed in the next commit (this is broken into two commits as the size of the commit became very big, ~5k lines). Note that the member "oprofile_cpu_type" in "struct cpu_spec" isn't removed as it was also used by other parts of the code. Suggested-by: Christoph Hellwig Suggested-by: Linus Torvalds Signed-off-by: Viresh Kumar --- arch/powerpc/Kconfig | 1 - arch/powerpc/Makefile | 2 - arch/powerpc/configs/44x/akebono_defconfig| 1 - arch/powerpc/configs/44x/currituck_defconfig | 1 - arch/powerpc/configs/44x/fsp2_defconfig | 1 - arch/powerpc/configs/44x/iss476-smp_defconfig | 1 - arch/powerpc/configs/cell_defconfig | 1 - arch/powerpc/configs/g5_defconfig | 1 - arch/powerpc/configs/maple_defconfig | 1 - arch/powerpc/configs/pasemi_defconfig | 1 - arch/powerpc/configs/pmac32_defconfig | 1 - arch/powerpc/configs/powernv_defconfig| 1 - arch/powerpc/configs/ppc64_defconfig | 1 - arch/powerpc/configs/ppc64e_defconfig | 1 - arch/powerpc/configs/ppc6xx_defconfig | 1 - arch/powerpc/configs/ps3_defconfig| 1 - arch/powerpc/configs/pseries_defconfig| 1 - arch/powerpc/include/asm/cputable.h | 20 --- arch/powerpc/include/asm/oprofile_impl.h | 135 -- arch/powerpc/include/asm/spu.h| 33 - arch/powerpc/kernel/cputable.c| 67 - arch/powerpc/kernel/dt_cpu_ftrs.c | 2 - arch/powerpc/platforms/cell/Kconfig | 5 - arch/powerpc/platforms/cell/spu_notify.c | 55 --- arch/powerpc/platforms/cell/spufs/run.c | 4 +- arch/powerpc/platforms/cell/spufs/sched.c | 5 - arch/powerpc/platforms/cell/spufs/spufs.h | 1 - 27 files changed, 1 insertion(+), 344 deletions(-) delete mode 100644 arch/powerpc/include/asm/oprofile_impl.h delete mode 100644 arch/powerpc/platforms/cell/spu_notify.c 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 107bb4319e0e..1897c0ff2f2b 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -226,7 +226,6 @@ config PPC select HAVE_MOD_ARCH_SPECIFIC select HAVE_NMI if PERF_EVENTS || (PPC64 && PPC_BOOK3S) select HAVE_HARDLOCKUP_DETECTOR_ARCHif (PPC64 && PPC_BOOK3S) - select HAVE_OPROFILE select HAVE_OPTPROBES if PPC64 select HAVE_PERF_EVENTS select HAVE_PERF_EVENTS_NMI if PPC64 diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 08cf0eade56a..b959fdaec713 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -276,8 +276,6 @@ head-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE) += arch/powerpc/kernel/prom_init.o # See arch/powerpc/Kbuild for content of core part of the kernel core-y += arch/powerpc/ -drivers-$(CONFIG_OPROFILE) += arch/powerpc/oprofile/ - # Default to zImage, override when needed all: zImage diff --git a/arch/powerpc/configs/44x/akebono_defconfig b/arch/powerpc/configs/44x/akebono_defconfig index 3894ba8f8ffc..72b8f93a9bdd 100644 --- a/arch/powerpc/configs/44x/akebono_defconfig +++ b/arch/powerpc/configs/44x/akebono_defconfig @@ -8,7 +8,6 @@ CONFIG_EXPERT=y CONFIG_KALLSYMS_ALL=y # CONFIG_SLUB_CPU_PARTIAL is not set CONFIG_PROFILING=y -CONFIG_OPROFILE=y CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y # CONFIG_BLK_DEV_BSG is not set diff --git a/arch/powerpc/configs/44x/currituck_defconfig b/arch/powerpc/configs/44x/currituck_defconfig index 34c86b3abecb..717827219921 100644 --- a/arch/powerpc/configs/44x/currituck_defconfig +++ b/arch/powerpc/configs/44x/currituck_defconfig @@ -6,7 +6,6 @@ CONFIG_LOG_BUF_SHIFT=14 CONFIG_EXPERT=y CONFIG_KALLSYMS_ALL=y CONFIG_PROFILING=y -CONFIG_OPROFILE=y CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y # CONFIG_BLK_DEV_BSG is not set diff --git a/arch/powerpc/configs/44x/fsp2_defconfig b/arch/powerpc/configs/44x/fsp2_defconfig index 30845ce0885a..8da316e61a08 100644 --- a/arch/powerpc/configs/44x/fsp2_defconfig +++ b/arch/powerpc/configs/44x/fsp2_defconfig @@ 
-17,7 +17,6 @@ CONFIG_KALLSYMS_ALL=y CONFIG_BPF_SYSCALL=y CONFIG_EMBEDDED=y CONFIG_PROFILING=y -CONFIG_OPROFILE=y CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y # CONFIG_BLK_DEV_BSG is not set diff --git a/arch/powerpc/configs/44x/iss476-smp_defconfig b/arch/powerpc/configs/44x/iss476-smp_defconfig index 2c3834eebca3..c11e777b2f3d 100644 --- a/arch/powerpc/configs/44x/iss476-smp_defconfig +++ b/arch/powerpc/configs/44x/iss476-smp_defconfig @@ -7,7 +7,6 @@ CONFIG_BLK_DEV_INITRD=y
Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
Le 13/01/2021 à 13:33, Mark Brown a écrit :
> On Wed, Jan 13, 2021 at 09:49:12AM +0100, Christophe Leroy wrote:
>> With commit 766c6b63aa04 ("spi: fix client driver breakages when using
>> GPIO descriptors") reverted, it is back to work:
>> ...
>> What shall I do ?
>
> I would guess that there's an error with the chip select polarity
> configuration on your system that just happened to work previously, I'd
> suggest fixing this in the board configuration to bring it in line with
> everything else.

Not that easy. Today I have in the DTS the CS GPIOs declared as ACTIVE_LOW. If I declare them as ACTIVE_HIGH instead, then I also have to set spi-cs-high property, otherwise of_gpio_flags_quirks() is not happy and forces the GPIO ACTIVE LOW. When I set spi-cs-high property, it sets the SPI_CS_HIGH bit in spi->mode.

In fsl_spi_chipselect(), we have

bool pol = spi->mode & SPI_CS_HIGH

Then

pdata->cs_control(spi, pol);

So changing the board config is compensated by the above, and at the end it still doesn't work. Whereas reverting the above-mentioned commit sets SPI_CS_HIGH back into spi->mode without changing the ACTIVE level of the GPIO, resulting in the correct polarity.

So, I'm a bit lost, where is the problem exactly?

Thanks

Christophe
Re: [PATCH] perf tools: Resolve symbols against debug file first
Namhyung Kim writes: > On Wed, Jan 13, 2021 at 8:43 PM Jiri Slaby wrote: >> >> On 13. 01. 21, 11:46, Jiri Olsa wrote: >> > On Wed, Jan 13, 2021 at 09:01:28AM +0100, Jiri Slaby wrote: >> >> With LTO, there are symbols like these: >> >> /usr/lib/debug/usr/lib64/libantlr4-runtime.so.4.8-4.8-1.4.x86_64.debug >> >> 10305: 00955fa4 0 NOTYPE LOCAL DEFAULT 29 >> >> Predicate.cpp.2bc410e7 >> >> >> >> This comes from a runtime/debug split done by the standard way: >> >> objcopy --only-keep-debug $runtime $debug >> >> objcopy --add-gnu-debuglink=$debugfn -R .comment -R .GCC.command.line >> >> --strip-all $runtime >> >> ... >> >> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c >> >> index f3577f7d72fe..a31b716fa61c 100644 >> >> --- a/tools/perf/util/symbol-elf.c >> >> +++ b/tools/perf/util/symbol-elf.c >> >> @@ -1226,12 +1226,20 @@ int dso__load_sym(struct dso *dso, struct map >> >> *map, struct symsrc *syms_ss, >> >> if (sym.st_shndx == SHN_ABS) >> >> continue; >> >> >> >> -sec = elf_getscn(runtime_ss->elf, sym.st_shndx); >> >> +sec = elf_getscn(syms_ss->elf, sym.st_shndx); >> >> if (!sec) >> >> goto out_elf_end; >> > >> > we iterate symbols from syms_ss, so the fix seems to be correct >> > to call elf_getscn on syms_ss, not on runtime_ss as we do now >> > >> > I'd think this worked only when runtime_ss == syms_ss >> >> No, because the headers are copied 1:1 from runtime_ss to syms_ss. And >> runtime_ss is then stripped, so only .debug* sections are removed there. >> (And syms_ss's are set as NOBITS.) >> >> We iterated .debug* sections in syms_ss and used runtime_ss section >> _headers_ only to adjust symbols (sometimes). That worked. > > It seems PPC has an opd section only in the runtime_ss and that's why > we use it for section headers. At least on my system (Ubuntu 20.04.1) I see .opd in the debug file with NOBITS set: $ readelf -e vmlinux.debug | grep opd [37] .opd NOBITS c1c1f548 01202e14 But possibly that's not the case with older toolchains? 
cheers
Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool
On Wed, Jan 13, 2021 at 7:48 AM Florian Fainelli wrote:
>
> On 1/5/21 7:41 PM, Claire Chang wrote:
> > If a device is not behind an IOMMU, we look up the device node and set
> > up the restricted DMA when the restricted-dma-pool is presented.
> >
> > Signed-off-by: Claire Chang
> > ---
>
> [snip]
>
> > +int of_dma_set_restricted_buffer(struct device *dev)
> > +{
> > +	struct device_node *node;
> > +	int count, i;
> > +
> > +	if (!dev->of_node)
> > +		return 0;
> > +
> > +	count = of_property_count_elems_of_size(dev->of_node, "memory-region",
> > +						sizeof(phandle));
>
> You could have an early check for count < 0, along with an error
> message, if that is deemed useful.
>
> > +	for (i = 0; i < count; i++) {
> > +		node = of_parse_phandle(dev->of_node, "memory-region", i);
> > +		if (of_device_is_compatible(node, "restricted-dma-pool"))
>
> And you may want to add here an of_device_is_available(node). A platform
> that provides the Device Tree firmware and tries to support multiple
> different SoCs may try to determine if an IOMMU is present, and if it
> is, it could be marking the restricted-dma-pool region with a 'status =
> "disabled"' property, or any variant of that scheme.

This function is called only when there is no IOMMU present (check in drivers/of/device.c). I can still add of_device_is_available(node) here if you think it's helpful.

>
> > +		return of_reserved_mem_device_init_by_idx(
> > +			dev, dev->of_node, i);
>
> This does not seem to be supporting more than one memory region, did not
> you want something like instead:
>
> ret = of_reserved_mem_device_init_by_idx(...);
> if (ret)
>     return ret;

Yes. This implementation only supports one restricted-dma-pool memory region, with the assumption that there is only one memory region with the compatible string, restricted-dma-pool, in the dts. IIUC, it's similar to shared-dma-pool.
> > +	}
> > +
> > +	return 0;
> > +}
> > diff --git a/drivers/of/device.c b/drivers/of/device.c
> > index aedfaaafd3e7..e2c7409956ab 100644
> > --- a/drivers/of/device.c
> > +++ b/drivers/of/device.c
> > @@ -182,6 +182,10 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
> >  	arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
> >
> >  	dev->dma_range_map = map;
> > +
> > +	if (!iommu)
> > +		return of_dma_set_restricted_buffer(dev);
> > +
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(of_dma_configure_id);
> > diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
> > index d9e6a324de0a..28a2dfa197ba 100644
> > --- a/drivers/of/of_private.h
> > +++ b/drivers/of/of_private.h
> > @@ -161,12 +161,17 @@ struct bus_dma_region;
> >  #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
> >  int of_dma_get_range(struct device_node *np,
> >  		const struct bus_dma_region **map);
> > +int of_dma_set_restricted_buffer(struct device *dev);
> >  #else
> >  static inline int of_dma_get_range(struct device_node *np,
> >  		const struct bus_dma_region **map)
> >  {
> >  	return -ENODEV;
> >  }
> > +static inline int of_dma_get_restricted_buffer(struct device *dev)
> > +{
> > +	return -ENODEV;
> > +}
> >  #endif
> >
> >  #endif /* _LINUX_OF_PRIVATE_H */
> >
>
> --
> Florian
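The two review points in this exchange, skipping `status = "disabled"` nodes and the single-region assumption, can be sketched as a self-contained toy loop. This is plain C with a made-up `struct region` standing in for the OF helpers, not the kernel code itself:

```c
#include <string.h>

/* Toy stand-in for a "memory-region" phandle entry: 'available'
 * models of_device_is_available(), i.e. no 'status = "disabled"'. */
struct region {
	const char *compatible;
	int available;
};

/* Find the first usable restricted-dma-pool region, or -1 if none.
 * Returning inside the loop mirrors the single-region assumption
 * discussed above: only the first match is ever used. */
static int find_restricted_pool(const struct region *r, int count)
{
	for (int i = 0; i < count; i++) {
		if (!r[i].available)	/* disabled on this SoC: skip */
			continue;
		if (strcmp(r[i].compatible, "restricted-dma-pool") == 0)
			return i;
	}
	return -1;
}

/* Example table: a shared pool, a disabled restricted pool (e.g. the
 * SoC variant that has an IOMMU), and an enabled restricted pool. */
static const struct region example[] = {
	{ "shared-dma-pool",     1 },
	{ "restricted-dma-pool", 0 },
	{ "restricted-dma-pool", 1 },
};
```

With the availability check in place, the disabled entry at index 1 is skipped and index 2 is selected; without any enabled entry the lookup fails cleanly.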
Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
On Wed, Jan 13, 2021 at 8:42 PM Christoph Hellwig wrote:
>
> > +#ifdef CONFIG_SWIOTLB
> > +	struct io_tlb_mem *dma_io_tlb_mem;
> >  #endif
>
> Please add a new config option for this code instead of always building
> it when swiotlb is enabled.
>
> > +static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
> > +				   size_t size)
>
> Can you split the refactoring in swiotlb.c into one or more prep
> patches?
>
> > +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> > +				    struct device *dev)
> > +{
> > +	struct io_tlb_mem *mem = rmem->priv;
> > +	int ret;
> > +
> > +	if (dev->dma_io_tlb_mem)
> > +		return -EBUSY;
> > +
> > +	if (!mem) {
> > +		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> > +		if (!mem)
> > +			return -ENOMEM;
>
> What is the calling convention here that allows for a NULL and non-NULL
> private data?

Since multiple devices can share the same pool, the private data, the
io_tlb_mem struct, will be initialized by the first device attached to
it. This is similar to rmem_dma_device_init() in kernel/dma/coherent.c.
I'll add a comment for it in the next version.
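The calling convention described in that reply, the first attached device initializes the shared `rmem->priv`, later devices reuse it, can be sketched as a standalone toy. Names are simplified and this is not the kernel code: plain `calloc` replaces `kzalloc`, and -1/-2 stand in for -EBUSY/-ENOMEM.

```c
#include <stdlib.h>

struct pool { int refs; };			/* stand-in for io_tlb_mem */
struct rmem { struct pool *priv; };		/* stand-in for reserved_mem */
struct dev  { struct pool *pool; };		/* stand-in for struct device */

/* First caller finds priv == NULL and allocates the shared pool;
 * every later caller just attaches to the already-initialized one. */
static int attach(struct rmem *rmem, struct dev *dev)
{
	if (dev->pool)
		return -1;			/* -EBUSY analogue */
	if (!rmem->priv) {
		rmem->priv = calloc(1, sizeof(*rmem->priv));
		if (!rmem->priv)
			return -2;		/* -ENOMEM analogue */
	}
	rmem->priv->refs++;
	dev->pool = rmem->priv;
	return 0;
}
```

Both devices end up pointing at the same pool object, and re-attaching an already-attached device fails, mirroring the `-EBUSY` check quoted above.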
Re: [PATCH v2 0/7] Rid W=1 warnings in Ethernet
On Wed, 13 Jan 2021, Jakub Kicinski wrote:

> On Wed, 13 Jan 2021 16:41:16 + Lee Jones wrote:
> > Resending the stragglers again.
> >
> > > > This set is part of a larger effort attempting to clean-up W=1
> > > > kernel builds, which are currently overwhelmingly riddled with
> > > > niggly little warnings.
> > > >
> > > > v2:
> > > >  - Squashed IBM patches
> > > >  - Fixed real issue in SMSC
> >  - Added Andrew's Reviewed-by tags on remainder
>
> Does not apply, please rebase on net-next/master.

These are based on Tuesday's next/master.

I just rebased them now with no issue. What conflict are you seeing?

--
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog