Re: [PATCH v2 1/4] powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR

2021-04-08 Thread Daniel Axtens
Hi Ravi,

> Add selftests to test multiple active DAWRs with ptrace interface.

It would be good if somewhere (maybe in the cover letter) you explain
what DAWR stands for and where to find more information about it. I
found the Power ISA v3.1 Book 3 Chapter 9 very helpful.

Apart from that, I don't have any specific comments about this patch. It
looks good to me, it seems to do what it says, and there are no comments
from checkpatch. It is a bit sparse in terms of comments but it is
consistent with the rest of the file so I can't really complain there :)

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> Sample o/p:
>   $ ./ptrace-hwbreak
>   ...
>   PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO, len: 6: Ok
>   PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO, len: 6: Ok
>   PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO, len: 6: Ok
>   PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO, len: 6: Ok
>
> Signed-off-by: Ravi Bangoria 
> ---
>  .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 79 +++
>  1 file changed, 79 insertions(+)
>
> diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
> index 2e0d86e0687e..a0635a3819aa 100644
> --- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
> +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
> @@ -194,6 +194,18 @@ static void test_workload(void)
>   big_var[rand() % DAWR_MAX_LEN] = 'a';
>   else
>   cvar = big_var[rand() % DAWR_MAX_LEN];
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO test */
> + gstruct.a[rand() % A_LEN] = 'a';
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO test */
> + cvar = gstruct.b[rand() % B_LEN];
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO test */
> + gstruct.a[rand() % A_LEN] = 'a';
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO test */
> + cvar = gstruct.a[rand() % A_LEN];
>  }
>  
>  static void check_success(pid_t child_pid, const char *name, const char *type,
> @@ -417,6 +429,69 @@ static void test_sethwdebug_range_aligned(pid_t child_pid)
>   ptrace_delhwdebug(child_pid, wh);
>  }
>  
> +static void test_multi_sethwdebug_range(pid_t child_pid)
> +{
> + struct ppc_hw_breakpoint info1, info2;
> + unsigned long wp_addr1, wp_addr2;
> + char *name1 = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED";
> + char *name2 = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED";
> + int len1, len2;
> + int wh1, wh2;
> +
> + wp_addr1 = (unsigned long)&gstruct.a;
> + wp_addr2 = (unsigned long)&gstruct.b;
> + len1 = A_LEN;
> + len2 = B_LEN;
> + get_ppc_hw_breakpoint(&info1, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr1, len1);
> + get_ppc_hw_breakpoint(&info2, PPC_BREAKPOINT_TRIGGER_READ, wp_addr2, len2);
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO test */
> + wh1 = ptrace_sethwdebug(child_pid, &info1);
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO test */
> + wh2 = ptrace_sethwdebug(child_pid, &info2);
> +
> + ptrace(PTRACE_CONT, child_pid, NULL, 0);
> + check_success(child_pid, name1, "WO", wp_addr1, len1);
> +
> + ptrace(PTRACE_CONT, child_pid, NULL, 0);
> + check_success(child_pid, name2, "RO", wp_addr2, len2);
> +
> + ptrace_delhwdebug(child_pid, wh1);
> + ptrace_delhwdebug(child_pid, wh2);
> +}
> +
> +static void test_multi_sethwdebug_range_dawr_overlap(pid_t child_pid)
> +{
> + struct ppc_hw_breakpoint info1, info2;
> + unsigned long wp_addr1, wp_addr2;
> + char *name = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap";
> + int len1, len2;
> + int wh1, wh2;
> +
> + wp_addr1 = (unsigned long)&gstruct.a;
> + wp_addr2 = (unsigned long)&gstruct.a;
> + len1 = A_LEN;
> + len2 = A_LEN;
> + get_ppc_hw_breakpoint(&info1, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr1, len1);
> + get_ppc_hw_breakpoint(&info2, PPC_BREAKPOINT_TRIGGER_READ, wp_addr2, len2);
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO test */
> + wh1 = ptrace_sethwdebug(child_pid, &info1);
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO test */
> + wh2 = ptrace_sethwdebug(child_pid, &info2);
> +
> + ptrace(PTRACE_CONT, child_pid, NULL, 0);
> + check_success(child_pid, name, "WO", wp_addr1, len1);
> +
> + ptrace(PTRACE_CONT, child_pid, NULL, 0);
> + check_success(child_pid, name, "RO", wp_addr2, len2);
> +
> + ptrace_delhwdebug(child_pid, wh1);
> + ptrace_delhwdebug(child_pid, wh2);
> +}
> +
>  static void test_sethwdebug_range_unaligned(pid_t child_pid)
>  {
>   struct ppc_hw_breakpoint info;
> @@ -504,6 +579,10 @@ run_tests(pid_t child_pid, struct ppc_debug_info *dbginfo, bool dawr)
>   test_sethwdebug_range_unaligned(child_pid);
>  

Re: [PATCH v1 1/1] kernel.h: Split out panic and oops helpers

2021-04-08 Thread Andrew Morton
On Wed, 7 Apr 2021 11:46:37 +0300 Andy Shevchenko wrote:

> On Wed, Apr 7, 2021 at 11:17 AM Kees Cook  wrote:
> >
> > On Tue, Apr 06, 2021 at 04:31:58PM +0300, Andy Shevchenko wrote:
> > > kernel.h is being used as a dump for all kinds of stuff for a long time.
> > > Here is the attempt to start cleaning it up by splitting out panic and
> > > oops helpers.
> > >
> > > At the same time convert users in header and lib folder to use new header.
> > > Though for time being include new header back to kernel.h to avoid twisted
> > > indirected includes for existing users.
> > >
> > > Signed-off-by: Andy Shevchenko 
> >
> > I like it! Do you have a multi-arch CI to do allmodconfig builds to
> > double-check this?
> 
> Unfortunately no, I rely on plenty of bots that are harvesting mailing lists.
> 
> But I will appreciate it if somebody can run this through various build tests.
> 

um, did you try x86_64 allmodconfig?

I'm up to
kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix-fix-fix-fix.patch
and counting.

From: Andrew Morton 
Subject: kernelh-split-out-panic-and-oops-helpers-fix

more files need panic_notifier.h

Cc: Andy Shevchenko 
Signed-off-by: Andrew Morton 
---

 arch/x86/xen/enlighten.c|1 +
 drivers/video/fbdev/hyperv_fb.c |1 +
 2 files changed, 2 insertions(+)

--- a/arch/x86/xen/enlighten.c~kernelh-split-out-panic-and-oops-helpers-fix
+++ a/arch/x86/xen/enlighten.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include <linux/panic_notifier.h>
 
 #include 
 #include 
--- a/drivers/video/fbdev/hyperv_fb.c~kernelh-split-out-panic-and-oops-helpers-fix
+++ a/drivers/video/fbdev/hyperv_fb.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include <linux/panic_notifier.h>
 #include 
 #include 
 
_


From: Andrew Morton 
Subject: kernelh-split-out-panic-and-oops-helpers-fix-fix

arch/x86/purgatory/purgatory.c needs kernel.h

Cc: Andy Shevchenko 
Signed-off-by: Andrew Morton 
---

 arch/x86/purgatory/purgatory.c |1 +
 1 file changed, 1 insertion(+)

--- a/arch/x86/purgatory/purgatory.c~kernelh-split-out-panic-and-oops-helpers-fix-fix
+++ a/arch/x86/purgatory/purgatory.c
@@ -8,6 +8,7 @@
  *   Vivek Goyal 
  */
 
+#include <linux/kernel.h>
 #include 
 #include 
 #include 
_

From: Andrew Morton 
Subject: kernelh-split-out-panic-and-oops-helpers-fix-fix-fix

drivers/clk/analogbits/wrpll-cln28hpc.c needs minmax.h, math.h and limits.h

Cc: Andy Shevchenko 
Signed-off-by: Andrew Morton 
---

 drivers/clk/analogbits/wrpll-cln28hpc.c |4 
 1 file changed, 4 insertions(+)

--- a/drivers/clk/analogbits/wrpll-cln28hpc.c~kernelh-split-out-panic-and-oops-helpers-fix-fix-fix
+++ a/drivers/clk/analogbits/wrpll-cln28hpc.c
@@ -25,6 +25,10 @@
 #include 
 #include 
 #include 
+#include <linux/minmax.h>
+#include <linux/math.h>
+#include <linux/limits.h>
+
 #include 
 
 /* MIN_INPUT_FREQ: minimum input clock frequency, in Hz (Fref_min) */
_

From: Andrew Morton 
Subject: kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix

drivers/misc/pvpanic/pvpanic.c needs panic_notifier.h

Cc: Andy Shevchenko 
Signed-off-by: Andrew Morton 
---

 drivers/misc/pvpanic/pvpanic.c |1 +
 1 file changed, 1 insertion(+)

--- a/drivers/misc/pvpanic/pvpanic.c~kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix
+++ a/drivers/misc/pvpanic/pvpanic.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include <linux/panic_notifier.h>
 #include 
 #include 
 #include 
_
From: Andrew Morton 
Subject: kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix-fix

fix drivers/misc/pvpanic/pvpanic.c and drivers/net/ipa/ipa_smp2p.c

Cc: Andy Shevchenko 
Signed-off-by: Andrew Morton 
---

 drivers/net/ipa/ipa_smp2p.c |1 +
 1 file changed, 1 insertion(+)

--- a/drivers/net/ipa/ipa_smp2p.c~kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix-fix
+++ a/drivers/net/ipa/ipa_smp2p.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include <linux/panic_notifier.h>
 #include 
 #include 
 
_

From: Andrew Morton 
Subject: kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix-fix-fix

fix drivers/power/reset/ltc2952-poweroff.c and drivers/misc/bcm-vk/bcm_vk_dev.c

Cc: Andy Shevchenko 
Signed-off-by: Andrew Morton 
---

 drivers/misc/bcm-vk/bcm_vk_dev.c   |1 +
 drivers/power/reset/ltc2952-poweroff.c |1 +
 2 files changed, 2 insertions(+)

--- a/drivers/power/reset/ltc2952-poweroff.c~kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix-fix-fix
+++ a/drivers/power/reset/ltc2952-poweroff.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include <linux/panic_notifier.h>
 #include 
 #include 
 #include 
--- a/drivers/misc/bcm-vk/bcm_vk_dev.c~kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix-fix-fix
+++ a/drivers/misc/bcm-vk/bcm_vk_dev.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include <linux/panic_notifier.h>
 #include 
 #include 
 #include 
_

From: Andrew Morton 
Subject: kernelh-split-out-panic-and-oops-helpers-fix-fix-fix-fix-fix-fix-fix

fix drivers/leds/trigger/ledtrig-panic.c and drivers/firmware/google/gsmi.c

Cc: Andy Shevchenko 
Signed-off-by: Andrew Morton 
---

 drivers/firmware/google/gsmi.c   |1 +
 drivers/leds/trigger/ledtrig-panic.c |

Re: [PATCH v3 0/9] Speedup mremap on ppc64

2021-04-08 Thread Aneesh Kumar K.V



"Aneesh Kumar K.V"  writes:

> This patchset enables MOVE_PMD/MOVE_PUD support on power. This requires
> the platform to support updating higher-level page tables without
> updating page table entries. This also needs to invalidate the Page Walk
> Cache on architecture supporting the same.
>
> Changes from v2:
> * switch from using mmu_gather to flush_pte_tlb_pwc_range() 
>
> Changes from v1:
> * Rebase to recent upstream
> * Fix build issues with tlb_gather_mmu changes
>

Gentle ping. Any objections for this series? 

-aneesh


Re: [powerpc:next-test] BUILD REGRESSION 3ac6488df9160f52bbd8b8ec3387a53ac3d0f2eb

2021-04-08 Thread Christophe Leroy




On 09/04/2021 at 04:28, kernel test robot wrote:

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
branch HEAD: 3ac6488df9160f52bbd8b8ec3387a53ac3d0f2eb  powerpc/xive: Modernize 
XIVE-IPI domain with an 'alloc' handler

Error/Warning reports:

https://lore.kernel.org/linuxppc-dev/202104090230.acwno03u-...@intel.com
https://lore.kernel.org/linuxppc-dev/202104090827.jh0wbicc-...@intel.com

Error/Warning in current branch:

include/linux/compiler_types.h:320:38: error: call to '__compiletime_assert_171' declared with attribute error: BUILD_BUG_ON failed: TASK_SIZE > MODULES_VADDR


As I pointed out in the report, this is because the randconfig sets TASK_SIZE to 0xc000 without
changing PAGE_OFFSET. Therefore there is no space in between for the 256MB segment for modules.

Guarding against this in Kconfig is too complex; that's why we have a
BUILD_BUG_ON().

There was already a similar kind of build test to make sure TASK_SIZE is not 
greater than KERNEL_START.



Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- powerpc-randconfig-s031-20210408
|   |-- drivers-w1-slaves-w1_ds28e04.c:sparse:sparse:incorrect-type-in-initializer-(different-address-spaces)-expected-char-const-noderef-__user-_gu_addr-got-char-const-buf
|   `-- drivers-w1-slaves-w1_ds28e04.c:sparse:sparse:incorrect-type-in-initializer-(different-address-spaces)-expected-char-noderef-__user-_pu_addr-got-char-buf
`-- powerpc64-randconfig-c004-20210408
 `-- include-linux-compiler_types.h:error:call-to-__compiletime_assert_NNN-declared-with-attribute-error:BUILD_BUG_ON-failed:TASK_SIZE-MODULES_VADDR

elapsed time: 727m

configs tested: 166
configs skipped: 2

gcc tested configs:
arm        defconfig
arm64      allyesconfig
arm64      defconfig
arm        allyesconfig
arm        allmodconfig
x86_64     allyesconfig
riscv      allmodconfig
riscv      allyesconfig
i386       allyesconfig
mips       rt305x_defconfig
um         allnoconfig
sh         urquell_defconfig
sh         titan_defconfig
arm        ezx_defconfig
arm        oxnas_v6_defconfig
powerpc    akebono_defconfig
arm        eseries_pxa_defconfig
arm        pleb_defconfig
m68k       amcore_defconfig
sparc      sparc32_defconfig
powerpc    ppa8548_defconfig
x86_64     alldefconfig
mips       maltaup_xpa_defconfig
xtensa     cadence_csp_defconfig
powerpc    allnoconfig
powerpc    mgcoge_defconfig
powerpc    linkstation_defconfig
sh         migor_defconfig
mips       lemote2f_defconfig
m68k       m5407c3_defconfig
arm        lart_defconfig
arm        spitz_defconfig
arm        palmz72_defconfig
arm        lpc32xx_defconfig
ia64       alldefconfig
powerpc    mpc832x_mds_defconfig
powerpc    ppc6xx_defconfig
sh         sh7770_generic_defconfig
sh         sh2007_defconfig
mips       ip28_defconfig
sh         r7780mp_defconfig
m68k       mvme16x_defconfig
arm        multi_v5_defconfig
powerpc    kmeter1_defconfig
arc        nsimosci_hs_defconfig
arm        clps711x_defconfig
xtensa     xip_kc705_defconfig
m68k       bvme6000_defconfig
h8300      alldefconfig
riscv      nommu_k210_defconfig
mips       loongson1b_defconfig
mips       decstation_64_defconfig
powerpc    ppc64e_defconfig
mips       rb532_defconfig
powerpc    mpc834x_mds_defconfig
sh         landisk_defconfig
powerpc    arches_defconfig
m68k       hp300_defconfig
s390       debug_defconfig
sh         kfr2r09-romimage_defconfig
arm        mxs_defconfig
mips       malta_defconfig
arm        u8500_defconfig
sh         se7206_defconfig
nios2      alldefconfig
arc        vdk_hs38_defconfig
sh         sdk7786_defconfig
powerpc    mpc83xx_defconfig
arm        pxa3xx_defconfig
um

Re: [powerpc:next-test 168/182] include/linux/compiler_types.h:320:38: error: call to '__compiletime_assert_171' declared with attribute error: BUILD_BUG_ON failed: TASK_SIZE > MODULES_VADDR

2021-04-08 Thread Christophe Leroy




On 09/04/2021 at 02:41, kernel test robot wrote:

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
head:   3ac6488df9160f52bbd8b8ec3387a53ac3d0f2eb
commit: 093cb12967d4bde01a4170fd342bc0d443004599 [168/182] powerpc/32s: Define a MODULE area below kernel text all the time
config: powerpc64-randconfig-c004-20210408 (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
 wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
 chmod +x ~/bin/make.cross
 chmod +x ~/bin/make.cross
 # https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=093cb12967d4bde01a4170fd342bc0d443004599
 git remote add powerpc https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
 git fetch --no-tags powerpc next-test
 git checkout 093cb12967d4bde01a4170fd342bc0d443004599
 # save the attached .config to linux build tree
 COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

In file included from :
arch/powerpc/kernel/module.c: In function 'module_alloc':

include/linux/compiler_types.h:320:38: error: call to '__compiletime_assert_171' declared with attribute error: BUILD_BUG_ON failed: TASK_SIZE > MODULES_VADDR


I don't think there is much we can do about that.

TASK_SIZE is set to 0xb000 by default on BOOK3S/32 in Kconfig.

If the user forces a greater value without increasing PAGE_OFFSET accordingly, it won't work. The BUILD_BUG_ON() is there to catch it.


Christophe


Re: [PATCH v2 1/1] powerpc/iommu: Enable remaining IOMMU Pagesizes present in LoPAR

2021-04-08 Thread Leonardo Bras
On Fri, Apr 9, 2021, at 01:36, Alexey Kardashevskiy wrote:

>
>
> On 08/04/2021 19:04, Michael Ellerman wrote:
> > Alexey Kardashevskiy  writes:
> >> On 08/04/2021 15:37, Michael Ellerman wrote:
> >>> Leonardo Bras  writes:
>  According to LoPAR, ibm,query-pe-dma-window output named "IO Page
> Sizes"
>  will let the OS know all possible pagesizes that can be used for
> creating a
>  new DDW.
> 
>  Currently Linux will only try using 3 of the 8 available options:
>  4K, 64K and 16M. According to LoPAR, Hypervisor may also offer 32M,
> 64M,
>  128M, 256M and 16G.
> >>>
> >>> Do we know of any hardware & hypervisor combination that will actually
> >>> give us bigger pages?
> >>
> >>
> >> On P8 16MB host pages and 16MB hardware iommu pages worked.
> >>
> >> On P9, VM's 16MB IOMMU pages worked on top of 2MB host pages + 2MB
> >> hardware IOMMU pages.
> >
> > The current code already tries 16MB though.
> >
> > I'm wondering if we're going to ask for larger sizes that have never
> > been tested and possibly expose bugs. But it sounds like this is mainly
> > targeted at future platforms.
>
>
> I tried for fun to pass through a PCI device to a guest with this patch as:
>
> pbuild/qemu-killslof-aiku1904le-ppc64/qemu-system-ppc64 \
> -nodefaults \
> -chardev stdio,id=STDIO0,signal=off,mux=on \
> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
> -mon id=MON0,chardev=STDIO0,mode=readline \
> -nographic \
> -vga none \
> -enable-kvm \
> -m 16G \
> -kernel ./vmldbg \
> -initrd /home/aik/t/le.cpio \
> -device vfio-pci,id=vfio0001_01_00_0,host=0001:01:00.0 \
> -mem-prealloc \
> -mem-path qemu_hp_1G_node0 \
> -global spapr-pci-host-bridge.pgsz=0xff000 \
> -machine cap-cfpc=broken,cap-ccf-assist=off \
> -smp 1,threads=1 \
> -L /home/aik/t/qemu-ppc64-bios/ \
> -trace events=qemu_trace_events \
> -d guest_errors,mmu \
> -chardev socket,id=SOCKET0,server=on,wait=off,path=qemu.mon.1_1_0_0 \
> -mon chardev=SOCKET0,mode=control
>
>
> The guest created a huge window:
>
> xhci_hcd :00:00.0: ibm,create-pe-dma-window(2027) 0 800 2000
> 22 22 returned 0 (liobn = 0x8001 starting addr = 800 0)
>
> The first "22" is page_shift in hex (16GB), the second "22" is
> window_shift (so we have 1 TCE).
>
> On the host side the window#1 was created with 1GB pages:
> pci 0001:01 : [PE# fd] Setting up window#1
> 800..80007ff pg=4000
>
>
> The XHCI seems working. Without the patch 16MB was the maximum.
>
>
> >
>  diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>  index 9fc5217f0c8e..6cda1c92597d 100644
>  --- a/arch/powerpc/platforms/pseries/iommu.c
>  +++ b/arch/powerpc/platforms/pseries/iommu.c
>  @@ -53,6 +53,20 @@ enum {
> DDW_EXT_QUERY_OUT_SIZE = 2
> };
> >>>
> >>> A comment saying where the values come from would be good.
> >>>
>  +#define QUERY_DDW_PGSIZE_4K   0x01
>  +#define QUERY_DDW_PGSIZE_64K  0x02
>  +#define QUERY_DDW_PGSIZE_16M  0x04
>  +#define QUERY_DDW_PGSIZE_32M  0x08
>  +#define QUERY_DDW_PGSIZE_64M  0x10
>  +#define QUERY_DDW_PGSIZE_128M 0x20
>  +#define QUERY_DDW_PGSIZE_256M 0x40
>  +#define QUERY_DDW_PGSIZE_16G  0x80
> >>>
> >>> I'm not sure the #defines really gain us much vs just putting the
> >>> literal values in the array below?
> >>
> >> Then someone says "u magic values" :) I do not mind either way.
> Thanks,
> >
> > Yeah that's true. But #defining them doesn't make them less magic, if
> > you only use them in one place :)
>
> Defining them with "QUERY_DDW" in the names kinda tells where they are
> from. Can also grep QEMU using these to see how the other side handles
> it. Dunno.
>
> btw the bot complained about __builtin_ctz(SZ_16G) which should be
> __builtin_ctzl(SZ_16G) so we have to ask Leonardo to repost anyway :)
>

Thanks for testing!

http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210408201915.174217-1-leobra...@gmail.com/

I sent a v3 a few hours ago, fixing this by using __builtin_ctzll() instead
of __builtin_ctz() in all sizes, and it worked like a charm.

I also reverted to the previous approach of not having QUERY_DDW defines
for masks, as Michael suggested.

I can revert back to v2 approach if you guys decide it's better.

Best regards,
Leonardo Bras


Re: [PATCH v2 1/1] powerpc/iommu: Enable remaining IOMMU Pagesizes present in LoPAR

2021-04-08 Thread Alexey Kardashevskiy




On 08/04/2021 19:04, Michael Ellerman wrote:

Alexey Kardashevskiy  writes:

On 08/04/2021 15:37, Michael Ellerman wrote:

Leonardo Bras  writes:

According to LoPAR, ibm,query-pe-dma-window output named "IO Page Sizes"
will let the OS know all possible pagesizes that can be used for creating a
new DDW.

Currently Linux will only try using 3 of the 8 available options:
4K, 64K and 16M. According to LoPAR, Hypervisor may also offer 32M, 64M,
128M, 256M and 16G.


Do we know of any hardware & hypervisor combination that will actually
give us bigger pages?



On P8 16MB host pages and 16MB hardware iommu pages worked.

On P9, VM's 16MB IOMMU pages worked on top of 2MB host pages + 2MB
hardware IOMMU pages.


The current code already tries 16MB though.

I'm wondering if we're going to ask for larger sizes that have never
been tested and possibly expose bugs. But it sounds like this is mainly
targeted at future platforms.



I tried for fun to pass through a PCI device to a guest with this patch as:

pbuild/qemu-killslof-aiku1904le-ppc64/qemu-system-ppc64 \
-nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-nographic \
-vga none \
-enable-kvm \
-m 16G \
-kernel ./vmldbg \
-initrd /home/aik/t/le.cpio \
-device vfio-pci,id=vfio0001_01_00_0,host=0001:01:00.0 \
-mem-prealloc \
-mem-path qemu_hp_1G_node0 \
-global spapr-pci-host-bridge.pgsz=0xff000 \
-machine cap-cfpc=broken,cap-ccf-assist=off \
-smp 1,threads=1 \
-L /home/aik/t/qemu-ppc64-bios/ \
-trace events=qemu_trace_events \
-d guest_errors,mmu \
-chardev socket,id=SOCKET0,server=on,wait=off,path=qemu.mon.1_1_0_0 \
-mon chardev=SOCKET0,mode=control


The guest created a huge window:

xhci_hcd :00:00.0: ibm,create-pe-dma-window(2027) 0 800 2000 
22 22 returned 0 (liobn = 0x8001 starting addr = 800 0)


The first "22" is page_shift in hex (16GB), the second "22" is 
window_shift (so we have 1 TCE).


On the host side the window#1 was created with 1GB pages:
pci 0001:01 : [PE# fd] Setting up window#1 
800..80007ff pg=4000



The XHCI seems working. Without the patch 16MB was the maximum.





diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9fc5217f0c8e..6cda1c92597d 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -53,6 +53,20 @@ enum {
DDW_EXT_QUERY_OUT_SIZE = 2
   };


A comment saying where the values come from would be good.


+#define QUERY_DDW_PGSIZE_4K0x01
+#define QUERY_DDW_PGSIZE_64K   0x02
+#define QUERY_DDW_PGSIZE_16M   0x04
+#define QUERY_DDW_PGSIZE_32M   0x08
+#define QUERY_DDW_PGSIZE_64M   0x10
+#define QUERY_DDW_PGSIZE_128M  0x20
+#define QUERY_DDW_PGSIZE_256M  0x40
+#define QUERY_DDW_PGSIZE_16G   0x80


I'm not sure the #defines really gain us much vs just putting the
literal values in the array below?


Then someone says "u magic values" :) I do not mind either way. Thanks,


Yeah that's true. But #defining them doesn't make them less magic, if
you only use them in one place :)


Defining them with "QUERY_DDW" in the names kinda tells where they are 
from. Can also grep QEMU using these to see how the other side handles 
it. Dunno.


btw the bot complained about __builtin_ctz(SZ_16G) which should be 
__builtin_ctzl(SZ_16G) so we have to ask Leonardo to repost anyway :)




--
Alexey


Re: [PATCH v6 30/48] KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C

2021-04-08 Thread Alexey Kardashevskiy




On 05/04/2021 11:19, Nicholas Piggin wrote:

Almost all logic is moved to C, by introducing a new in_guest mode for
the P9 path that branches very early in the KVM interrupt handler to
P9 exit code.

The main P9 entry and exit assembly is now only about 160 lines of low
level stack setup and register save/restore, plus a bad-interrupt
handler.

There are two motivations for this, the first is just make the code more
maintainable being in C. The second is to reduce the amount of code
running in a special KVM mode, "realmode". In quotes because with radix
it is no longer necessarily real-mode in the MMU, but it still has to be
treated specially because it may be in real-mode, and has various
important registers like PID, DEC, TB, etc set to guest. This is hostile
to the rest of Linux and can't use arbitrary kernel functionality or be
instrumented well.

This initial patch is a reasonably faithful conversion of the asm code,
but it does lack any loop to return quickly back into the guest without
switching out of realmode in the case of unimportant or easily handled
interrupts. As explained in previous changes, handling HV interrupts
in real mode is not so important for P9.

Use of Linux 64s interrupt entry code register conventions including
paca EX_ save areas are brought into the KVM code. There is no point
shuffling things into different paca save areas and making up a
different calling convention for KVM.

Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/include/asm/asm-prototypes.h |   3 +-
  arch/powerpc/include/asm/kvm_asm.h|   3 +-
  arch/powerpc/include/asm/kvm_book3s_64.h  |   8 +
  arch/powerpc/include/asm/kvm_host.h   |   7 +-
  arch/powerpc/kernel/security.c|   5 +-
  arch/powerpc/kvm/Makefile |   1 +
  arch/powerpc/kvm/book3s_64_entry.S| 247 ++
  arch/powerpc/kvm/book3s_hv.c  |   9 +-
  arch/powerpc/kvm/book3s_hv_interrupt.c| 218 +++
  arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 125 +--
  10 files changed, 501 insertions(+), 125 deletions(-)
  create mode 100644 arch/powerpc/kvm/book3s_hv_interrupt.c

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 939f3c94c8f3..7c74c80ed994 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -122,6 +122,7 @@ extern s32 patch__call_flush_branch_caches3;
  extern s32 patch__flush_count_cache_return;
  extern s32 patch__flush_link_stack_return;
  extern s32 patch__call_kvm_flush_link_stack;
+extern s32 patch__call_kvm_flush_link_stack_p9;
  extern s32 patch__memset_nocache, patch__memcpy_nocache;
  
  extern long flush_branch_caches;

@@ -142,7 +143,7 @@ void kvmhv_load_host_pmu(void);
  void kvmhv_save_guest_pmu(struct kvm_vcpu *vcpu, bool pmu_in_use);
  void kvmhv_load_guest_pmu(struct kvm_vcpu *vcpu);
  
-int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu);

+void kvmppc_p9_enter_guest(struct kvm_vcpu *vcpu);
  
  long kvmppc_h_set_dabr(struct kvm_vcpu *vcpu, unsigned long dabr);

  long kvmppc_h_set_xdabr(struct kvm_vcpu *vcpu, unsigned long dabr,
diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index a3633560493b..b4f9996bd331 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -146,7 +146,8 @@
  #define KVM_GUEST_MODE_GUEST  1
  #define KVM_GUEST_MODE_SKIP   2
  #define KVM_GUEST_MODE_GUEST_HV   3
-#define KVM_GUEST_MODE_HOST_HV 4
+#define KVM_GUEST_MODE_GUEST_HV_FAST   4 /* ISA v3.0 with host radix mode */
+#define KVM_GUEST_MODE_HOST_HV 5
  
  #define KVM_INST_FETCH_FAILED	-1
  
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h

index 9bb9bb370b53..c214bcffb441 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -153,9 +153,17 @@ static inline bool kvmhv_vcpu_is_radix(struct kvm_vcpu *vcpu)
return radix;
  }
  
+int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu);

+
  #define KVM_DEFAULT_HPT_ORDER 24  /* 16MB HPT by default */
  #endif
  
+/*

+ * Invalid HDSISR value which is used to indicate when HW has not set the reg.
+ * Used to work around an errata.
+ */
+#define HDSISR_CANARY  0x7fff
+
  /*
   * We use a lock bit in HPTE dword 0 to synchronize updates and
   * accesses to each HPTE, and another bit to indicate non-present
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 05fb00d37609..fa0083345b11 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -690,7 +690,12 @@ struct kvm_vcpu_arch {
ulong fault_dar;
u32 fault_dsisr;
unsigned long intr_msr;
-   ulong fault_gpa;/* guest real address of page fault (POWER9) */
+   /*
+* POWER9 and later, fault_gpa contains the guest real address of page
+* fault for a

Re: [PATCH 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug

2021-04-08 Thread Leonardo Bras
Hello David, thanks for commenting.

On Tue, 2021-03-23 at 10:45 +1100, David Gibson wrote:
> > @@ -805,6 +808,10 @@ static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
> >     if (shrinking) {
> > 
> > +   /* When batch removing entries, only resizes HPT at the end. */
> > +   if (atomic_read_acquire(&hpt_resize_disable))
> > +   return 0;
> > +
> 
> I'm not quite convinced by this locking.  Couldn't hpt_resize_disable
> be set after this point, but while you're still inside
> resize_hpt_for_hotplug()?  Probably better to use an explicit mutex
> (and mutex_trylock()) to make the critical sections clearer.

Sure, I can do that for v2.

> Except... do we even need the fancy mechanics to suppress the resizes
> in one place to do them elswhere.  Couldn't we just replace the
> existing resize calls with the batched ones?

What do you think of having batched resizes-down in the HPT?
Other than the current approach, I could only think of a way that would
touch a lot of generic code, and/or duplicate some functions, as
dlpar_add_lmb() does a lot of other stuff.

> > +void hash_memory_batch_shrink_end(void)
> > +{
> > +   unsigned long newsize;
> > +
> > +   /* Re-enables HPT resize-down after hot-unplug */
> > +   atomic_set_release(&hpt_resize_disable, 0);
> > +
> > +   newsize = memblock_phys_mem_size();
> > +   /* Resize to smallest SHIFT possible */
> > +   while (resize_hpt_for_hotplug(newsize, true) == -ENOSPC) {
> > +   newsize *= 2;
> 
> As noted earlier, doing this without an explicit cap on the new hpt
> size (of the existing size) this makes me nervous. 
> 

I can add a stop in v2.

>  Less so, but doing
> the calculations on memory size, rather than explictly on HPT size /
> HPT order also seems kinda clunky.

Agree, but at this point, it would seem kind of a waste to find the
shift from newsize, then calculate (1 << shift) for each retry of
resize_hpt_for_hotplug() only to point that we are retrying the order
value.

But sure, if you think it looks better, I can change that. 

> > +void memory_batch_shrink_begin(void)
> > +{
> > +   if (!radix_enabled())
> > +   hash_memory_batch_shrink_begin();
> > +}
> > +
> > +void memory_batch_shrink_end(void)
> > +{
> > +   if (!radix_enabled())
> > +   hash_memory_batch_shrink_end();
> > +}
> 
> Again, these wrappers don't seem particularly useful to me.

Options would be add 'if (!radix_enabled())' to hotplug-memory.c
functions or to hash* functions, which look kind of wrong.

> > +   memory_batch_shrink_end();
> 
> remove_by_index only removes a single LMB, so there's no real point to
> batching here.

Sure, will be fixed for v2.

> > @@ -700,6 +712,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> >     if (lmbs_added != lmbs_to_add) {
> >     pr_err("Memory hot-add failed, removing any added LMBs\n");
> > 
> > +   memory_batch_shrink_begin();
> 
> 
> The effect of these on the memory grow path is far from clear.
> 

On hotplug, HPT is resized-up before adding LMBs.
On hotunplug, HPT is resized-down after removing LMBs.
And each one has it's own mechanism to batch HPT resizes...

I can't see exactly how using it on the hotplug failure path is any
different from using it on hotunplug.

Can you please help me understand this?

Best regards,
Leonardo Bras



Re: [PATCH 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on memory hotplug

2021-04-08 Thread Leonardo Bras
Hello David, thanks for the feedback!

On Mon, 2021-03-22 at 18:55 +1100, David Gibson wrote:
> > +void hash_memory_batch_expand_prepare(unsigned long newsize)
> > +{
> > +   /*
> > +* Resizing-up HPT should never fail, but there are some cases system 
> > starts with higher
> > +* SHIFT than required, and we go through the funny case of resizing 
> > HPT down while
> > +* adding memory
> > +*/
> > +
> > +   while (resize_hpt_for_hotplug(newsize, false) == -ENOSPC) {
> > +   newsize *= 2;
> > +   pr_warn("Hash collision while resizing HPT\n");
> 
> This unbounded increase in newsize makes me nervous - we should be
> bounded by the current size of the HPT at least.  In practice we
> should be fine, since the resize should always succeed by the time we
> reach our current HPT size, but that's far from obvious from this
> point in the code.

Sure, I will add bounds in v2.

> 
> And... you're doubling newsize which is a value which might not be a
> power of 2.  I'm wondering if there's an edge case where this could
> actually cause us to skip the current size and erroneously resize to
> one bigger than we have currently.

I also thought that at the start, but it seems quite reliable.
Before using this value, htab_shift_for_mem_size() will always round it
up to the next power of 2.
Ex.
Any value between 0b0101 and 0b1000 will be rounded to 0b1000 for the
shift calculation. If we multiply it by 2 (same as << 1), anything
between 0b01010 and 0b10000 will be rounded to 0b10000.

This works just fine as long as we are multiplying.
Division could have the behavior you describe, as 0b0101 >> 1 would
become 0b010 and skip a shift.

> > +void memory_batch_expand_prepare(unsigned long newsize)
> 
> This wrapper doesn't seem useful.

Yeah, it does little, but I can't just jump into hash_* functions
directly from hotplug-memory.c, without even knowing if it's using hash
pagetables. (in case the suggestion would be to test for disable_radix
inside hash_memory_batch*)

> 
> > +{
> > +   if (!radix_enabled())
> > +   hash_memory_batch_expand_prepare(newsize);
> > +}
> >  #endif /* CONFIG_MEMORY_HOTPLUG */
> >  
> > 
> > +   memory_batch_expand_prepare(memblock_phys_mem_size() +
> > +drmem_info->n_lmbs * drmem_lmb_size());
> 
> This doesn't look right.  memory_add_by_index() is adding a *single*
> LMB, I think using drmem_info->n_lmbs here means you're counting this
> as adding again as much memory as you already have hotplugged.

Yeah, my mistake. This makes sense.
I will change it to something like 
memblock_phys_mem_size() + drmem_lmb_size()

> > 
> > +   memory_batch_expand_prepare(memblock_phys_mem_size() + lmbs_to_add * 
> > drmem_lmb_size());
> > +
> >     for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
> >     if (lmb->flags & DRCONF_MEM_ASSIGNED)
> >     continue;
> 
> I don't see memory_batch_expand_prepare() suppressing any existing HPT
> resizes.  Won't this just resize to the right size for the full add,
> then resize several times again as we perform the add?  Or.. I guess
> that will be suppressed by patch 1/3. 

Correct.

>  That's seems kinda fragile, though.

What do you mean by fragile here?
What would you suggest doing different?

Best regards,
Leonardo Bras



[powerpc:next] BUILD SUCCESS c46bbf5d2defae50d61ddf31502017ee8952af83

2021-04-08 Thread kernel test robot
onfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
x86_64   randconfig-a004-20210408
x86_64   randconfig-a005-20210408
x86_64   randconfig-a003-20210408
x86_64   randconfig-a001-20210408
x86_64   randconfig-a002-20210408
x86_64   randconfig-a006-20210408
i386 randconfig-a006-20210408
i386 randconfig-a003-20210408
i386 randconfig-a001-20210408
i386 randconfig-a004-20210408
i386 randconfig-a005-20210408
i386 randconfig-a002-20210408
i386 randconfig-a014-20210408
i386 randconfig-a016-20210408
i386 randconfig-a011-20210408
i386 randconfig-a012-20210408
i386 randconfig-a013-20210408
i386 randconfig-a015-20210408
riscvnommu_virt_defconfig
riscv allnoconfig
riscv   defconfig
riscv  rv32_defconfig
um   allmodconfig
um   allyesconfig
um  defconfig
x86_64rhel-8.3-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  rhel-8.3-kbuiltin
x86_64  kexec

clang tested configs:
x86_64   randconfig-a014-20210408
x86_64   randconfig-a015-20210408
x86_64   randconfig-a012-20210408
x86_64   randconfig-a011-20210408
x86_64   randconfig-a013-20210408
x86_64   randconfig-a016-20210408

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[powerpc:next-test] BUILD REGRESSION 3ac6488df9160f52bbd8b8ec3387a53ac3d0f2eb

2021-04-08 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
branch HEAD: 3ac6488df9160f52bbd8b8ec3387a53ac3d0f2eb  powerpc/xive: Modernize 
XIVE-IPI domain with an 'alloc' handler

Error/Warning reports:

https://lore.kernel.org/linuxppc-dev/202104090230.acwno03u-...@intel.com
https://lore.kernel.org/linuxppc-dev/202104090827.jh0wbicc-...@intel.com

Error/Warning in current branch:

include/linux/compiler_types.h:320:38: error: call to 
'__compiletime_assert_171' declared with attribute error: BUILD_BUG_ON failed: 
TASK_SIZE > MODULES_VADDR

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- powerpc-randconfig-s031-20210408
|   |-- 
drivers-w1-slaves-w1_ds28e04.c:sparse:sparse:incorrect-type-in-initializer-(different-address-spaces)-expected-char-const-noderef-__user-_gu_addr-got-char-const-buf
|   `-- 
drivers-w1-slaves-w1_ds28e04.c:sparse:sparse:incorrect-type-in-initializer-(different-address-spaces)-expected-char-noderef-__user-_pu_addr-got-char-buf
`-- powerpc64-randconfig-c004-20210408
`-- 
include-linux-compiler_types.h:error:call-to-__compiletime_assert_NNN-declared-with-attribute-error:BUILD_BUG_ON-failed:TASK_SIZE-MODULES_VADDR

elapsed time: 727m

configs tested: 166
configs skipped: 2

gcc tested configs:
arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
x86_64   allyesconfig
riscvallmodconfig
riscvallyesconfig
i386 allyesconfig
mips rt305x_defconfig
umallnoconfig
sh  urquell_defconfig
shtitan_defconfig
arm ezx_defconfig
armoxnas_v6_defconfig
powerpc akebono_defconfig
arm eseries_pxa_defconfig
armpleb_defconfig
m68k amcore_defconfig
sparc   sparc32_defconfig
powerpc ppa8548_defconfig
x86_64   alldefconfig
mipsmaltaup_xpa_defconfig
xtensa  cadence_csp_defconfig
powerpc   allnoconfig
powerpc  mgcoge_defconfig
powerpc linkstation_defconfig
shmigor_defconfig
mips   lemote2f_defconfig
m68km5407c3_defconfig
armlart_defconfig
arm   spitz_defconfig
arm palmz72_defconfig
arm lpc32xx_defconfig
ia64 alldefconfig
powerpc mpc832x_mds_defconfig
powerpc  ppc6xx_defconfig
sh   sh7770_generic_defconfig
sh   sh2007_defconfig
mips   ip28_defconfig
sh  r7780mp_defconfig
m68kmvme16x_defconfig
armmulti_v5_defconfig
powerpc kmeter1_defconfig
arc nsimosci_hs_defconfig
armclps711x_defconfig
xtensaxip_kc705_defconfig
m68k   bvme6000_defconfig
h8300alldefconfig
riscvnommu_k210_defconfig
mips loongson1b_defconfig
mips  decstation_64_defconfig
powerpc  ppc64e_defconfig
mips  rb532_defconfig
powerpc mpc834x_mds_defconfig
sh  landisk_defconfig
powerpc  arches_defconfig
m68k  hp300_defconfig
s390  debug_defconfig
sh kfr2r09-romimage_defconfig
arm mxs_defconfig
mips  malta_defconfig
arm   u8500_defconfig
sh   se7206_defconfig
nios2alldefconfig
arcvdk_hs38_defconfig
sh  sdk7786_defconfig
powerpc mpc83xx_defconfig
arm  pxa3xx_defconfig
um   x86_64_defconfig
armzeus_defconfig
arm  footbridge_defconfig
powerpcwarp_defconfig
mips   ip22_defconfig
m68k  multi_defconfig
sh  lboxre2_defconfig
powerpc mpc5200_defconfig
powerpc  ep88xc_defconfig
m68k  amiga_defconfig
arm  colib

[powerpc:merge] BUILD SUCCESS f2b8ef18c8e0634e176be99dcf242e515cfdb1d3

2021-04-08 Thread kernel test robot
onfig
s390defconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a004-20210408
x86_64   randconfig-a005-20210408
x86_64   randconfig-a003-20210408
x86_64   randconfig-a001-20210408
x86_64   randconfig-a002-20210408
x86_64   randconfig-a006-20210408
i386 randconfig-a006-20210408
i386 randconfig-a003-20210408
i386 randconfig-a001-20210408
i386 randconfig-a004-20210408
i386 randconfig-a005-20210408
i386 randconfig-a002-20210408
i386 randconfig-a006-20210407
i386 randconfig-a003-20210407
i386 randconfig-a001-20210407
i386 randconfig-a004-20210407
i386 randconfig-a002-20210407
i386 randconfig-a005-20210407
x86_64   randconfig-a014-20210407
x86_64   randconfig-a015-20210407
x86_64   randconfig-a013-20210407
x86_64   randconfig-a011-20210407
x86_64   randconfig-a012-20210407
x86_64   randconfig-a016-20210407
i386 randconfig-a014-20210408
i386 randconfig-a016-20210408
i386 randconfig-a011-20210408
i386 randconfig-a012-20210408
i386 randconfig-a013-20210408
i386 randconfig-a015-20210408
i386 randconfig-a014-20210407
i386 randconfig-a011-20210407
i386 randconfig-a016-20210407
i386 randconfig-a012-20210407
i386 randconfig-a015-20210407
i386 randconfig-a013-20210407
riscvnommu_k210_defconfig
riscvnommu_virt_defconfig
riscv allnoconfig
riscv   defconfig
riscv  rv32_defconfig
um   allmodconfig
umallnoconfig
um   allyesconfig
um  defconfig
x86_64rhel-8.3-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  rhel-8.3-kbuiltin
x86_64  kexec

clang tested configs:
x86_64   randconfig-a014-20210408
x86_64   randconfig-a015-20210408
x86_64   randconfig-a012-20210408
x86_64   randconfig-a011-20210408
x86_64   randconfig-a013-20210408
x86_64   randconfig-a016-20210408

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug

2021-04-08 Thread Leonardo Bras
Hello David, thanks for your feedback.

On Mon, 2021-03-22 at 17:49 +1100, David Gibson wrote:
> I don't love this approach.  Adding the extra flag at this level seems
> a bit inelegant, and it means we're passing up an easy opportunity to
> reduce our resource footprint on the host.

I understand, but trying to reduce the resource footprint on the host,
and mostly failing, is what causes hot-add and hot-remove to take so long.

> But... maybe we'll have to do it.  I'd like to see if we can get
> things to work well enough with just the "batching" to avoid multiple
> resize attempts first.

This batching is something I had thought a lot about.
The problem is that there are a lot of generic interfaces between memory
hotplug and actually resizing the HPT. I tried a simpler approach in
patches 2 & 3, so I don't touch much stuff there.

Best regards,
Leonardo Bras






Re: [PATCH 2/8] CMDLINE: drivers: of: ifdef out cmdline section

2021-04-08 Thread Daniel Walker
On Wed, Apr 07, 2021 at 05:59:15PM -0500, Rob Herring wrote:
> On Tue, Mar 30, 2021 at 04:17:53PM -0700, Daniel Walker wrote:
> > On Tue, Mar 30, 2021 at 02:49:13PM -0500, Rob Herring wrote:
> > > On Tue, Mar 30, 2021 at 12:57 PM Daniel Walker  wrote:
> > > >
> > > > It looks like there's some seepage of cmdline stuff into
> > > > the generic device tree code. This conflicts with the
> > > > generic cmdline implementation so I remove it in the case
> > > > when that's enabled.
> > > >
> > > > Cc: xe-linux-exter...@cisco.com
> > > > Signed-off-by: Ruslan Ruslichenko 
> > > > Signed-off-by: Daniel Walker 
> > > > ---
> > > >  drivers/of/fdt.c | 14 ++
> > > >  1 file changed, 14 insertions(+)
> > > >
> > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > index dcc1dd96911a..d8805cd9717a 100644
> > > > --- a/drivers/of/fdt.c
> > > > +++ b/drivers/of/fdt.c
> > > > @@ -25,6 +25,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >
> > > >  #include   /* for COMMAND_LINE_SIZE */
> > > >  #include 
> > > > @@ -1050,6 +1051,18 @@ int __init early_init_dt_scan_chosen(unsigned 
> > > > long node, const char *uname,
> > > >
> > > > /* Retrieve command line */
> > > > p = of_get_flat_dt_prop(node, "bootargs", &l);
> > > > +
> > > > +#if defined(CONFIG_GENERIC_CMDLINE) && 
> > > > defined(CONFIG_GENERIC_CMDLINE_OF)
> > > 
> > > Moving in the wrong direction... This code already has too many
> > > #ifdef's. I like Christophe's version as it gets rid of all the code
> > > here.
> >  
> > It's temporary. Notice CONFIG_GENERIC_CMDLINE_OF is only used on
> > PowerPC. I experienced doubling on arm64 when this was used (i.e. the
> > append and prepend were added twice).
> > 
> > I don't think there are any other users which can't be moved outside
> > the device tree code, but powerpc uses this function three times during
> > boot up plus the prom_init user. It's possible to use the generic
> > command line in all four places, but it becomes space inefficient.
> 
> What's the 3rd use? I count kaslr code and in 
> early_init_dt_scan_chosen_ppc. Do we need to build the command line for 
> kaslr seed? Getting any build time value from the kernel is pointless.

I think I may have been mistaken. I added a dump_stack(), but there may
have been other stack traces during bootup on prior -rcX's I was testing.

I re-ran the test and I only see one user on powerpc and powerpc64,

powerpc64,

[T0] Call Trace:
[T0] [c1517d00] [c077e910] dump_stack+0xc4/0x114 
(unreliable)
[T0] [c1517d50] [c1186fb4] 
early_init_dt_scan_chosen+0x238/0x324
[T0] [c1517de0] [c1138b00] 
early_init_dt_scan_chosen_ppc+0x20/0x194
[T0] [c1517e10] [c1186ae0] of_scan_flat_dt+0xc8/0x130
[T0] [c1517e70] [c1139404] early_init_devtree+0xa4/0x48c
[T0] [c1517f10] [c113ac90] early_setup+0xc8/0x254
[T0] [c1517f90] [c754] 0xc754

powerpc32,

Call Trace:
[c06bbee0] [c067e334] early_init_dt_scan_chosen+0xf8/0x1dc (unreliable)
[c06bbf10] [c0666ec4] early_init_dt_scan_chosen_ppc+0x18/0x6c
[c06bbf30] [c067e048] of_scan_flat_dt+0x98/0xf4
[c06bbf70] [c0667234] early_init_devtree+0x48/0x2d0
[c06bbfb0] [c06679cc] machine_init+0x98/0xcc
[c06bbff0] [c398] set_ivor+0x114/0x154

I think it would be possible to just move the generic handling entirely
into architecture code.

Daniel


Re: [PATCH v4 19/20] mips: Convert to GENERIC_CMDLINE

2021-04-08 Thread Daniel Walker
On Thu, Apr 08, 2021 at 02:04:08PM -0500, Rob Herring wrote:
> On Tue, Apr 06, 2021 at 10:38:36AM -0700, Daniel Walker wrote:
> > On Fri, Apr 02, 2021 at 03:18:21PM +, Christophe Leroy wrote:
> > > -config CMDLINE_BOOL
> > > - bool "Built-in kernel command line"
> > > - help
> > > -   For most systems, it is firmware or second stage bootloader that
> > > -   by default specifies the kernel command line options.  However,
> > > -   it might be necessary or advantageous to either override the
> > > -   default kernel command line or add a few extra options to it.
> > > -   For such cases, this option allows you to hardcode your own
> > > -   command line options directly into the kernel.  For that, you
> > > -   should choose 'Y' here, and fill in the extra boot arguments
> > > -   in CONFIG_CMDLINE.
> > > -
> > > -   The built-in options will be concatenated to the default command
> > > -   line if CMDLINE_OVERRIDE is set to 'N'. Otherwise, the default
> > > -   command line will be ignored and replaced by the built-in string.
> > > -
> > > -   Most MIPS systems will normally expect 'N' here and rely upon
> > > -   the command line from the firmware or the second-stage bootloader.
> > > -
> > 
> > 
> > See how you complained that I have CMDLINE_BOOL in my changes, and you
> > think it shouldn't exist.
> > 
> > Yet here mips has it, and you just deleted it with no feature parity in your
> > changes for this.
> 
> AFAICT, CMDLINE_BOOL equates to a non-empty or empty CONFIG_CMDLINE. You 
> seem to need it just because you have CMDLINE_PREPEND and 
> CMDLINE_APPEND. If that's not it, what feature is missing? CMDLINE_BOOL 
> is not a feature, but an implementation detail.

Not true.

It makes it easier to turn it all off inside the Kconfig, so it's for
usability, and multiple architectures have it even with just CMDLINE, as
I was commenting here.

Daniel


Re: [PATCH] powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC

2021-04-08 Thread Nicholas Piggin
I was going to nitpick "overflown" here as something birds do, but some
sources say overflown is okay for past tense.

You could use "overflowed" for that, but I understand the issue with the
word: you are talking about counters that are currently in an "overflow"
state, but the overflow occurred in the past and is not still happening,
so "overflowing" doesn't exactly fit either.

overflown kind of works; for some reason you can kind of use it for the
present tense!

Excerpts from Athira Rajeev's message of April 7, 2021 12:47 am:
> Running perf fuzzer showed below in dmesg logs:
> "Can't find PMC that caused IRQ"
> 
> This means a PMU exception happened, but none of the PMCs (Performance
> Monitor Counters) were found to be overflown. There are some corner cases
> that clear the PMCs after the PMI gets masked. In such cases, the perf
> interrupt handler will not find the active PMC values that had caused
> the overflow, which leads to this message while replaying.
> 
> Case 1: PMU Interrupt happens during replay of other interrupts and
> counter values gets cleared by PMU callbacks before replay:
> 
> During replay of interrupts like timer, __do_irq and doorbell exception, we
> conditionally enable interrupts via may_hard_irq_enable(). This could
> potentially create a window to generate a PMI. Since irq soft mask is set
> to ALL_DISABLED, the PMI will get masked here.

I wonder if may_hard_irq_enable shouldn't enable if PMI is soft
disabled. And also maybe replay should not set ALL_DISABLED if
there are no PMI interrupts pending.

Still, I think those are a bit more tricky and might take a while
to get right or just not be worth while, so I think your patch is
fine.

> We could get IPIs run before the
> perf interrupt is replayed, and the PMU events could be deleted or stopped.
> This will change the PMU SPR values and reset the counters. Snippet of
> ftrace log showing PMU callbacks invoked in "__do_irq":
> 
> -0 [051] dns. 132025441306354: __do_irq <-call_do_irq
> -0 [051] dns. 132025441306430: irq_enter <-__do_irq
> -0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
> -0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
> <<>>
> -0 [051] dnH. 132025441307770: 
> generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed
> -0 [051] dnH. 132025441307839: flush_smp_call_function_queue 
> <-smp_ipi_demux_relaxed
> -0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
> -0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
> -0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
> -0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
> -0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
> -0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
> -0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
> -0 [051] dnH. 132025441308787: power_pmu_event_idx 
> <-perf_event_update_userpage
> -0 [051] dnH. 132025441308859: rcu_read_unlock_strict 
> <-perf_event_update_userpage
> -0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
> <<>>
> -0 [051] dnH. 132025441311108: irq_exit <-__do_irq
> -0 [051] dns. 132025441311319: performance_monitor_exception 
> <-replay_soft_interrupts
> 
> Case 2: PMI's masked during local_* operations, example local_add.
> If the local_add operation happens within a local_irq_save, replay of
> PMI will be during local_irq_restore. Similar to case 1, this could
> also create a window before replay where PMU events gets deleted or
> stopped.

Here as well perhaps PMIs should be replayed if they are unmasked
even if other interrupts are still masked. Again that might be more
complexity than it's worth.

> 
> Patch adds a fix to update the PMU callback functions (del,stop,enable) to
> check for pending perf interrupt. If there is an overflown PMC and pending
> perf interrupt indicated in Paca, clear the PMI bit in paca to drop that
> sample. In case of power_pmu_del, also clear the MMCR0 PMAO bit which
> otherwise could lead to spurious interrupts in some corner cases. Example,
> a timer after power_pmu_del which will re-enable interrupts since PMI is
> cleared and triggers a PMI again since PMAO bit is still set.
> 
> We can't just replay PMI any time. Hence this approach is preferred rather
> than replaying PMI before resetting overflown PMC. Patch also documents
> core-book3s on a race condition which can trigger these PMC messages during
> idle path in PowerNV.
> 
> Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and 
> replay them")
> Reported-by: Nageswara R Sastry 
> Suggested-by: Nicholas Piggin 
> Suggested-by: Madhavan Srinivasan 
> Signed-off-by: Athira Rajeev 
> ---
>  arch/powerpc/include/asm/pmc.h  | 11 +
>  arch/powerpc/perf/core-book3s.c | 55 
> +
>  2 files changed, 66 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/pmc.h b/arch/powerpc/include/asm/pmc.h
> index c6bbe9778d3c..97b4bd8de25b 100644
> --- a/arch/powerpc/in

[powerpc:next-test 168/182] include/linux/compiler_types.h:320:38: error: call to '__compiletime_assert_171' declared with attribute error: BUILD_BUG_ON failed: TASK_SIZE > MODULES_VADDR

2021-04-08 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
head:   3ac6488df9160f52bbd8b8ec3387a53ac3d0f2eb
commit: 093cb12967d4bde01a4170fd342bc0d443004599 [168/182] powerpc/32s: Define 
a MODULE area below kernel text all the time
config: powerpc64-randconfig-c004-20210408 (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=093cb12967d4bde01a4170fd342bc0d443004599
git remote add powerpc 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
git fetch --no-tags powerpc next-test
git checkout 093cb12967d4bde01a4170fd342bc0d443004599
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   In file included from :
   arch/powerpc/kernel/module.c: In function 'module_alloc':
>> include/linux/compiler_types.h:320:38: error: call to 
>> '__compiletime_assert_171' declared with attribute error: BUILD_BUG_ON 
>> failed: TASK_SIZE > MODULES_VADDR
 320 |  _compiletime_assert(condition, msg, __compiletime_assert_, 
__COUNTER__)
 |  ^
   include/linux/compiler_types.h:301:4: note: in definition of macro 
'__compiletime_assert'
 301 |prefix ## suffix();\
 |^~
   include/linux/compiler_types.h:320:2: note: in expansion of macro 
'_compiletime_assert'
 320 |  _compiletime_assert(condition, msg, __compiletime_assert_, 
__COUNTER__)
 |  ^~~
   include/linux/build_bug.h:39:37: note: in expansion of macro 
'compiletime_assert'
  39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
 | ^~
   include/linux/build_bug.h:50:2: note: in expansion of macro 
'BUILD_BUG_ON_MSG'
  50 |  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
 |  ^~~~
   arch/powerpc/kernel/module.c:105:2: note: in expansion of macro 
'BUILD_BUG_ON'
 105 |  BUILD_BUG_ON(TASK_SIZE > MODULES_VADDR);
 |  ^~~~


vim +/__compiletime_assert_171 +320 include/linux/compiler_types.h

eb5c2d4b45e3d2 Will Deacon 2020-07-21  306  
eb5c2d4b45e3d2 Will Deacon 2020-07-21  307  #define 
_compiletime_assert(condition, msg, prefix, suffix) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21  308  __compiletime_assert(condition, 
msg, prefix, suffix)
eb5c2d4b45e3d2 Will Deacon 2020-07-21  309  
eb5c2d4b45e3d2 Will Deacon 2020-07-21  310  /**
eb5c2d4b45e3d2 Will Deacon 2020-07-21  311   * compiletime_assert - break build 
and emit msg if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21  312   * @condition: a compile-time 
constant condition to check
eb5c2d4b45e3d2 Will Deacon 2020-07-21  313   * @msg:   a message to emit if 
condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21  314   *
eb5c2d4b45e3d2 Will Deacon 2020-07-21  315   * In tradition of POSIX assert, 
this macro will break the build if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21  316   * supplied condition is *false*, 
emitting the supplied error message if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21  317   * compiler has support to do so.
eb5c2d4b45e3d2 Will Deacon 2020-07-21  318   */
eb5c2d4b45e3d2 Will Deacon 2020-07-21  319  #define 
compiletime_assert(condition, msg) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 @320  _compiletime_assert(condition, 
msg, __compiletime_assert_, __COUNTER__)
eb5c2d4b45e3d2 Will Deacon 2020-07-21  321  

:: The code at line 320 was first introduced by commit
:: eb5c2d4b45e3d2d5d052ea6b8f1463976b1020d5 compiler.h: Move 
compiletime_assert() macros into compiler_types.h

:: TO: Will Deacon 
:: CC: Will Deacon 

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH v7] soc: fsl: enable acpi support in RCPM driver

2021-04-08 Thread Li Yang
On Wed, Apr 7, 2021 at 9:58 PM Ran Wang  wrote:
>
> From: Peng Ma 
>
> This patch enables ACPI support in RCPM driver.
>
> Signed-off-by: Peng Ma 
> Signed-off-by: Ran Wang 

Applied for next.  Thanks.

> ---
> Change in v7:
>  - Update comment for checking RCPM node which referred to
>
> Change in v6:
>  - Remove copyright update to rebase on latest mainline
>
> Change in v5:
>  - Fix panic when dev->of_node is null
>
> Change in v4:
>  - Make commit subject more accurate
>  - Remove unrelated new blank line
>
> Change in v3:
>  - Add #ifdef CONFIG_ACPI for acpi_device_id
>  - Rename rcpm_acpi_imx_ids to rcpm_acpi_ids
>
> Change in v2:
>  - Update acpi_device_id to fix conflict with other driver
>
>  drivers/soc/fsl/rcpm.c | 24 ++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c
> index 4ace28cab314..90d3f4060b0c 100644
> --- a/drivers/soc/fsl/rcpm.c
> +++ b/drivers/soc/fsl/rcpm.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #define RCPM_WAKEUP_CELL_MAX_SIZE  7
>
> @@ -78,10 +79,20 @@ static int rcpm_pm_prepare(struct device *dev)
> "fsl,rcpm-wakeup", value,
> rcpm->wakeup_cells + 1);
>
> -   /*  Wakeup source should refer to current rcpm device */
> -   if (ret || (np->phandle != value[0]))
> +   if (ret)
> continue;
>
> +   /*
> +* For DT mode, would handle devices with "fsl,rcpm-wakeup"
> +* pointing to the current RCPM node.
> +*
> +* For ACPI mode, currently we assume there is only one
> +* RCPM controller existing.
> +*/
> +   if (is_of_node(dev->fwnode))
> +   if (np->phandle != value[0])
> +   continue;
> +
> /* Property "#fsl,rcpm-wakeup-cells" of rcpm node defines the
>  * number of IPPDEXPCR register cells, and "fsl,rcpm-wakeup"
> * of wakeup source IP contains an integer array:
> @@ -172,10 +183,19 @@ static const struct of_device_id rcpm_of_match[] = {
>  };
>  MODULE_DEVICE_TABLE(of, rcpm_of_match);
>
> +#ifdef CONFIG_ACPI
> +static const struct acpi_device_id rcpm_acpi_ids[] = {
> +   {"NXP0015",},
> +   { }
> +};
> +MODULE_DEVICE_TABLE(acpi, rcpm_acpi_ids);
> +#endif
> +
>  static struct platform_driver rcpm_driver = {
> .driver = {
> .name = "rcpm",
> .of_match_table = rcpm_of_match,
> +   .acpi_match_table = ACPI_PTR(rcpm_acpi_ids),
> .pm = &rcpm_pm_ops,
> },
> .probe = rcpm_probe,
> --
> 2.25.1
>


[PATCH v3 1/1] powerpc/iommu: Enable remaining IOMMU Pagesizes present in LoPAR

2021-04-08 Thread Leonardo Bras
According to LoPAR, ibm,query-pe-dma-window output named "IO Page Sizes"
will let the OS know all possible pagesizes that can be used for creating a
new DDW.

Currently Linux will only try using 3 of the 8 available options:
4K, 64K and 16M. According to LoPAR, Hypervisor may also offer 32M, 64M,
128M, 256M and 16G.

Enabling bigger pages would be interesting for direct mapping systems
with a lot of RAM, while using fewer TCE entries.

Signed-off-by: Leonardo Bras 
---
Changes since v2:
 - Restore 'int array & shift' strategy
 - Remove defines for RTAS "IO Page Size" output of ibm,query-pe-dma-window
 - Added/Improved comments
Link: 
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210407195613.131140-1-leobra...@gmail.com/
Changes since v1:
- Remove page shift defines, replace by __builtin_ctzll(SZ_XXX)
- Add bit field defines for RTAS "IO Page Shift" output of 
ibm,query-pe-dma-window
- Use struct array instead of int array to be more explicit on pagesizes
Link: 
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210322190943.715368-1-leobra...@gmail.com/
 

 arch/powerpc/platforms/pseries/iommu.c | 37 +-
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 9fc5217f0c8e..67c9953a6503 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1099,6 +1099,33 @@ static void reset_dma_window(struct pci_dev *dev, struct 
device_node *par_dn)
 ret);
 }
 
+/* Return largest page shift based on "IO Page Sizes" output of 
ibm,query-pe-dma-window. */
+static int iommu_get_page_shift(u32 query_page_size)
+{
+   /* Supported IO page-sizes according to LoPAR */
+   const int shift[] = {
+   __builtin_ctzll(SZ_4K),   __builtin_ctzll(SZ_64K), 
__builtin_ctzll(SZ_16M),
+   __builtin_ctzll(SZ_32M),  __builtin_ctzll(SZ_64M), 
__builtin_ctzll(SZ_128M),
+   __builtin_ctzll(SZ_256M), __builtin_ctzll(SZ_16G)
+   };
+
+   int i = ARRAY_SIZE(shift) - 1;
+
+   /*
+* On LoPAR, ibm,query-pe-dma-window outputs "IO Page Sizes" using a 
bit field:
+* - bit 31 means 4k pages are supported,
+* - bit 30 means 64k pages are supported, and so on.
+* Larger pagesizes map more memory with the same amount of TCEs, so 
start probing them.
+*/
+   for (; i >= 0 ; i--) {
+   if (query_page_size & (1 << i))
+   return shift[i];
+   }
+
+   /* No valid page size found. */
+   return 0;
+}
+
 /*
  * If the PE supports dynamic dma windows, and there is space for a table
  * that can map all pages in a linear offset, then setup such a table,
@@ -1206,13 +1233,9 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
device_node *pdn)
goto out_failed;
}
}
-   if (query.page_size & 4) {
-   page_shift = 24; /* 16MB */
-   } else if (query.page_size & 2) {
-   page_shift = 16; /* 64kB */
-   } else if (query.page_size & 1) {
-   page_shift = 12; /* 4kB */
-   } else {
+
+   page_shift = iommu_get_page_shift(query.page_size);
+   if (!page_shift) {
dev_dbg(&dev->dev, "no supported direct page size in mask %x",
  query.page_size);
goto out_failed;
-- 
2.30.2



Re: [RFC PATCH v6 1/1] cmdline: Add capability to both append and prepend at the same time

2021-04-08 Thread Rob Herring
On Sun, Apr 4, 2021 at 12:20 PM Christophe Leroy
 wrote:
>
> One user has expressed the need to both append and prepend some
> built-in parameters to the command line provided by the bootloader.
>
> Although it is a corner case, it is easy to implement, so let's do it.
>
> When the user chooses to prepend the bootloader provided command line
> with the built-in command line, he is offered the possibility to enter
> an additional built-in command line to be appended after the
> bootloader provided command line.
>
> It is a complementary feature which has no impact on the already
> existing ones and/or the existing defconfig.
>
> Suggested-by: Daniel Walker 
> Signed-off-by: Christophe Leroy 
> ---
> Sending this out as an RFC, applies on top of the series
> ("Implement GENERIC_CMDLINE"). I will add it to the series next spin
> unless someone is against it.

Well, it works, but you are working around the existing kconfig and
the result is not great. You'd never design it this way.

Rob


Re: [PATCH 1/1] powerpc/smp: Set numa node before updating mask

2021-04-08 Thread Nathan Lynch
Srikar Dronamraju  writes:
> * Nathan Lynch  [2021-04-07 14:46:24]:
>> I don't know. I guess this question just makes me wonder whether powerpc
>> needs to have the additional lookup table. How is it different from the
>> generic per_cpu numa_node?
>
> The lookup table is for early cpu-to-node mapping, i.e. when per_cpu variables may
> not be available. This would mean that calling set_numa_node/set_cpu_numa_node from
> map_cpu_to_node() may not always be an option, since map_cpu_to_node() does
> end up getting called very early in the system.

Ah that's right, thanks.


Re: [PATCH v4 18/20] x86: Convert to GENERIC_CMDLINE

2021-04-08 Thread Rob Herring
On Fri, Apr 02, 2021 at 03:18:20PM +, Christophe Leroy wrote:
> This converts the architecture to GENERIC_CMDLINE.
> 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/x86/Kconfig| 45 ++---
>  arch/x86/kernel/setup.c | 17 ++--
>  2 files changed, 4 insertions(+), 58 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index a20684d56b4b..66b384228ca3 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -104,6 +104,7 @@ config X86
>   select ARCH_USE_QUEUED_SPINLOCKS
>   select ARCH_USE_SYM_ANNOTATIONS
>   select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> + select ARCH_WANT_CMDLINE_PREPEND_BY_DEFAULT

Seems to be non-existent kconfig option.

>   select ARCH_WANT_DEFAULT_BPF_JIT    if X86_64
>   select ARCH_WANTS_DYNAMIC_TASK_STRUCT
>   select ARCH_WANT_HUGE_PMD_SHARE
> @@ -118,6 +119,7 @@ config X86
>   select EDAC_SUPPORT
>   select GENERIC_CLOCKEVENTS_BROADCAST    if X86_64 || (X86_32 && X86_LOCAL_APIC)
>   select GENERIC_CLOCKEVENTS_MIN_ADJUST
> + select GENERIC_CMDLINE
>   select GENERIC_CMOS_UPDATE
>   select GENERIC_CPU_AUTOPROBE
>   select GENERIC_CPU_VULNERABILITIES
> @@ -2358,49 +2360,6 @@ choice
>  
>  endchoice
>  
> -config CMDLINE_BOOL
> - bool "Built-in kernel command line"
> - help
> -   Allow for specifying boot arguments to the kernel at
> -   build time.  On some systems (e.g. embedded ones), it is
> -   necessary or convenient to provide some or all of the
> -   kernel boot arguments with the kernel itself (that is,
> -   to not rely on the boot loader to provide them.)
> -
> -   To compile command line arguments into the kernel,
> -   set this option to 'Y', then fill in the
> -   boot arguments in CONFIG_CMDLINE.
> -
> -   Systems with fully functional boot loaders (i.e. non-embedded)
> -   should leave this option set to 'N'.
> -
> -config CMDLINE
> - string "Built-in kernel command string"
> - depends on CMDLINE_BOOL
> - default ""
> - help
> -   Enter arguments here that should be compiled into the kernel
> -   image and used at boot time.  If the boot loader provides a
> -   command line at boot time, it is appended to this string to
> -   form the full kernel command line, when the system boots.
> -
> -   However, you can use the CONFIG_CMDLINE_FORCE option to
> -   change this behavior.
> -
> -   In most cases, the command line (whether built-in or provided
> -   by the boot loader) should specify the device for the root
> -   file system.
> -
> -config CMDLINE_FORCE
> - bool "Built-in command line overrides boot loader arguments"
> - depends on CMDLINE_BOOL && CMDLINE != ""
> - help
> -   Set this option to 'Y' to have the kernel ignore the boot loader
> -   command line, and use ONLY the built-in command line.
> -
> -   This is used to work around broken boot loaders.  This should
> -   be set to 'N' under normal conditions.
> -
>  config MODIFY_LDT_SYSCALL
>   bool "Enable the LDT (local descriptor table)" if EXPERT
>   default y
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 6f2de58eeb54..3f274b02e51c 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -5,6 +5,7 @@
>   * This file contains the setup_arch() code, which handles the 
> architecture-dependent
>   * parts of early kernel initialization.
>   */
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -161,9 +162,6 @@ unsigned long saved_video_mode;
>  #define RAMDISK_LOAD_FLAG       0x4000
>  
>  static char __initdata command_line[COMMAND_LINE_SIZE];
> -#ifdef CONFIG_CMDLINE_BOOL
> -static char __initdata builtin_cmdline[COMMAND_LINE_SIZE] = CONFIG_CMDLINE;
> -#endif
>  
>  #if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE)
>  struct edd edd;
> @@ -883,18 +881,7 @@ void __init setup_arch(char **cmdline_p)
>   bss_resource.start = __pa_symbol(__bss_start);
>   bss_resource.end = __pa_symbol(__bss_stop)-1;
>  
> -#ifdef CONFIG_CMDLINE_BOOL
> -#ifdef CONFIG_CMDLINE_FORCE
> - strlcpy(boot_command_line, builtin_cmdline, COMMAND_LINE_SIZE);
> -#else
> - if (builtin_cmdline[0]) {
> - /* append boot loader cmdline to builtin */
> - strlcat(builtin_cmdline, " ", COMMAND_LINE_SIZE);
> - strlcat(builtin_cmdline, boot_command_line, COMMAND_LINE_SIZE);
> - strlcpy(boot_command_line, builtin_cmdline, COMMAND_LINE_SIZE);
> - }
> -#endif
> -#endif
> + cmdline_build(boot_command_line, boot_command_line);
>  
>   strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
>   *cmdline_p = command_line;

Once this is all done, I wonder if we can get rid of the strlcpy and 
perhaps also cmdline_p.

Rob


Re: [PATCH v4 19/20] mips: Convert to GENERIC_CMDLINE

2021-04-08 Thread Rob Herring
On Tue, Apr 06, 2021 at 10:38:36AM -0700, Daniel Walker wrote:
> On Fri, Apr 02, 2021 at 03:18:21PM +, Christophe Leroy wrote:
> > -config CMDLINE_BOOL
> > -   bool "Built-in kernel command line"
> > -   help
> > - For most systems, it is firmware or second stage bootloader that
> > - by default specifies the kernel command line options.  However,
> > - it might be necessary or advantageous to either override the
> > - default kernel command line or add a few extra options to it.
> > - For such cases, this option allows you to hardcode your own
> > - command line options directly into the kernel.  For that, you
> > - should choose 'Y' here, and fill in the extra boot arguments
> > - in CONFIG_CMDLINE.
> > -
> > - The built-in options will be concatenated to the default command
> > - line if CMDLINE_OVERRIDE is set to 'N'. Otherwise, the default
> > - command line will be ignored and replaced by the built-in string.
> > -
> > - Most MIPS systems will normally expect 'N' here and rely upon
> > - the command line from the firmware or the second-stage bootloader.
> > -
> 
> 
> See how you complained that I have CMDLINE_BOOL in my changes, and you think
> it shouldn't exist.
> 
> Yet here mips has it, and you just deleted it with no feature parity in your
> changes for this.

AFAICT, CMDLINE_BOOL equates to a non-empty or empty CONFIG_CMDLINE. You 
seem to need it just because you have CMDLINE_PREPEND and 
CMDLINE_APPEND. If that's not it, what feature is missing? CMDLINE_BOOL 
is not a feature, but an implementation detail.

Rob


Re: [OpenRISC] [PATCH v6 1/9] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

2021-04-08 Thread Waiman Long

On 4/6/21 7:52 PM, Stafford Horne wrote:


For OpenRISC I did ack the patch to convert to
CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32=y.  But I think you are right, the
generic code in xchg_tail and the xchg16 emulation code produced by OpenRISC
using xchg32 would produce very similar code.  I have not compared instructions,
but it does seem like duplicate functionality.

Why doesn't RISC-V add the xchg16 emulation code similar to OpenRISC?  For
OpenRISC we added xchg16 and xchg8 emulation code to enable qspinlocks.  So
one thought is with CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32=y, can we remove our
xchg16/xchg8 emulation code?


For the record, the latest qspinlock code doesn't use xchg8 anymore. It
still needs xchg16, though.


Cheers,
Longman



[powerpc:next-test 120/182] drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: sparse: incorrect type in initializer (different address spaces)

2021-04-08 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
head:   3ac6488df9160f52bbd8b8ec3387a53ac3d0f2eb
commit: e72fcdb26cde72985c418b39f72ecaa222e1f4d5 [120/182] powerpc/uaccess: Refactor get/put_user() and __get/put_user()
config: powerpc-randconfig-s031-20210408 (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-279-g6d5d9b42-dirty
        # https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=e72fcdb26cde72985c418b39f72ecaa222e1f4d5
        git remote add powerpc https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
        git fetch --no-tags powerpc next-test
        git checkout e72fcdb26cde72985c418b39f72ecaa222e1f4d5
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=powerpc

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot 


sparse warnings: (new ones prefixed by >>)
>> drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected char [noderef] __user *_pu_addr @@ got char *buf @@
   drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: expected char [noderef] __user *_pu_addr
   drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: got char *buf
>> drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected char const [noderef] __user *_gu_addr @@ got char const *buf @@
   drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: expected char const [noderef] __user *_gu_addr
   drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: got char const *buf

vim +342 drivers/w1/slaves/w1_ds28e04.c

fa33a65a9cf7e2 Greg Kroah-Hartman 2013-08-21  338  
fa33a65a9cf7e2 Greg Kroah-Hartman 2013-08-21  339  static ssize_t crccheck_show(struct device *dev, struct device_attribute *attr,
fa33a65a9cf7e2 Greg Kroah-Hartman 2013-08-21  340  			     char *buf)
fbf7f7b4e2ae40 Markus Franke  2012-05-26  341  {
fbf7f7b4e2ae40 Markus Franke  2012-05-26 @342  	if (put_user(w1_enable_crccheck + 0x30, buf))
fbf7f7b4e2ae40 Markus Franke  2012-05-26  343  		return -EFAULT;
fbf7f7b4e2ae40 Markus Franke  2012-05-26  344  
fbf7f7b4e2ae40 Markus Franke  2012-05-26  345  	return sizeof(w1_enable_crccheck);
fbf7f7b4e2ae40 Markus Franke  2012-05-26  346  }
fbf7f7b4e2ae40 Markus Franke  2012-05-26  347  
fa33a65a9cf7e2 Greg Kroah-Hartman 2013-08-21  348  static ssize_t crccheck_store(struct device *dev, struct device_attribute *attr,
fbf7f7b4e2ae40 Markus Franke  2012-05-26  349  			       const char *buf, size_t count)
fbf7f7b4e2ae40 Markus Franke  2012-05-26  350  {
fbf7f7b4e2ae40 Markus Franke  2012-05-26  351  	char val;
fbf7f7b4e2ae40 Markus Franke  2012-05-26  352  
fbf7f7b4e2ae40 Markus Franke  2012-05-26  353  	if (count != 1 || !buf)
fbf7f7b4e2ae40 Markus Franke  2012-05-26  354  		return -EINVAL;
fbf7f7b4e2ae40 Markus Franke  2012-05-26  355  
fbf7f7b4e2ae40 Markus Franke  2012-05-26 @356  	if (get_user(val, buf))
fbf7f7b4e2ae40 Markus Franke  2012-05-26  357  		return -EFAULT;
fbf7f7b4e2ae40 Markus Franke  2012-05-26  358  
fbf7f7b4e2ae40 Markus Franke  2012-05-26  359  	/* convert to decimal */
fbf7f7b4e2ae40 Markus Franke  2012-05-26  360  	val = val - 0x30;
fbf7f7b4e2ae40 Markus Franke  2012-05-26  361  	if (val != 0 && val != 1)
fbf7f7b4e2ae40 Markus Franke  2012-05-26  362  		return -EINVAL;
fbf7f7b4e2ae40 Markus Franke  2012-05-26  363  
fbf7f7b4e2ae40 Markus Franke  2012-05-26  364  	/* set the new value */
fbf7f7b4e2ae40 Markus Franke  2012-05-26  365  	w1_enable_crccheck = val;
fbf7f7b4e2ae40 Markus Franke  2012-05-26  366  
fbf7f7b4e2ae40 Markus Franke  2012-05-26  367  	return sizeof(w1_enable_crccheck);
fbf7f7b4e2ae40 Markus Franke  2012-05-26  368  }
fbf7f7b4e2ae40 Markus Franke  2012-05-26  369  

:: The code at line 342 was first introduced by commit
:: fbf7f7b4e2ae40f790828c86d31beff2d49e9ac8 w1: Add 1-wire slave device driver for DS28E04-100

:: TO: Markus Franke 
:: CC: Greg Kroah-Hartman 

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH] ASoC: fsl: sunxi: remove redundant dev_err call

2021-04-08 Thread Mark Brown
On Wed, 7 Apr 2021 14:56:34 +0500, Muhammad Usama Anjum wrote:
> devm_ioremap_resource() prints an error message itself. Remove the
> dev_err call to avoid a redundant error message.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: fsl: sunxi: remove redundant dev_err call
  commit: a93799d55fd479f540ed97066e69114aa7709787

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


[PATCH v1 2/2] powerpc/atomics: Use immediate operand when possible

2021-04-08 Thread Christophe Leroy
Today we get the following code generation for atomic operations:

c001bb2c:   39 20 00 01 li  r9,1
c001bb30:   7d 40 18 28 lwarx   r10,0,r3
c001bb34:   7d 09 50 50 subf    r8,r9,r10
c001bb38:   7d 00 19 2d stwcx.  r8,0,r3

c001c7a8:   39 40 00 01 li  r10,1
c001c7ac:   7d 00 18 28 lwarx   r8,0,r3
c001c7b0:   7c ea 42 14 add r7,r10,r8
c001c7b4:   7c e0 19 2d stwcx.  r7,0,r3

By allowing GCC to choose between immediate or regular operation,
we get:

c001bb2c:   7d 20 18 28 lwarx   r9,0,r3
c001bb30:   39 49 ff ff addi    r10,r9,-1
c001bb34:   7d 40 19 2d stwcx.  r10,0,r3
--
c001c7a4:   7d 40 18 28 lwarx   r10,0,r3
c001c7a8:   39 0a 00 01 addi    r8,r10,1
c001c7ac:   7d 00 19 2d stwcx.  r8,0,r3

For "and", the dot form has to be used because "andi" doesn't exist.

For logical operations we use unsigned 16 bits immediate.
For arithmetic operations we use signed 16 bits immediate.

On pmac32_defconfig, it reduces the text by approx another 8 kbytes.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/atomic.h | 56 +++
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/atomic.h 
b/arch/powerpc/include/asm/atomic.h
index 61c6e8b200e8..e4b5e2f25ba7 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -37,62 +37,62 @@ static __inline__ void atomic_set(atomic_t *v, int i)
__asm__ __volatile__("stw%U0%X0 %1,%0" : "=m"UPD_CONSTR(v->counter) : 
"r"(i));
 }
 
-#define ATOMIC_OP(op, asm_op)  \
+#define ATOMIC_OP(op, asm_op, dot, sign)   \
 static __inline__ void atomic_##op(int a, atomic_t *v) \
 {  \
int t;  \
\
__asm__ __volatile__(   \
 "1:lwarx   %0,0,%3 # atomic_" #op "\n" \
-   #asm_op " %0,%2,%0\n"   \
+   #asm_op "%I2" dot " %0,%0,%2\n" \
 "  stwcx.  %0,0,%3 \n" \
 "  bne-1b\n"   \
-   : "=&r" (t), "+m" (v->counter)  \
-   : "r" (a), "r" (&v->counter)\
+   : "=&b" (t), "+m" (v->counter)  \
+   : "r"#sign (a), "r" (&v->counter)   \
: "cc");\
 }  \
 
-#define ATOMIC_OP_RETURN_RELAXED(op, asm_op)   \
+#define ATOMIC_OP_RETURN_RELAXED(op, asm_op, dot, sign)
\
 static inline int atomic_##op##_return_relaxed(int a, atomic_t *v) \
 {  \
int t;  \
\
__asm__ __volatile__(   \
 "1:lwarx   %0,0,%3 # atomic_" #op "_return_relaxed\n"  \
-   #asm_op " %0,%2,%0\n"   \
+   #asm_op "%I2" dot " %0,%0,%2\n" \
 "  stwcx.  %0,0,%3\n"  \
 "  bne-1b\n"   \
-   : "=&r" (t), "+m" (v->counter)  \
-   : "r" (a), "r" (&v->counter)\
+   : "=&b" (t), "+m" (v->counter)  \
+   : "r"#sign (a), "r" (&v->counter)   \
: "cc");\
\
return t;   \
 }
 
-#define ATOMIC_FETCH_OP_RELAXED(op, asm_op)\
+#define ATOMIC_FETCH_OP_RELAXED(op, asm_op, dot, sign) \
 static inline int atomic_fetch_##op##_relaxed(int a, atomic_t *v)  \
 {  \
int res, t; \
\
__asm__ __volatile__( 

[PATCH v1 1/2] powerpc/bitops: Use immediate operand when possible

2021-04-08 Thread Christophe Leroy
Today we get the following code generation for bitops like
set or clear bit:

c0009fe0:   39 40 08 00 li  r10,2048
c0009fe4:   7c e0 40 28 lwarx   r7,0,r8
c0009fe8:   7c e7 53 78 or  r7,r7,r10
c0009fec:   7c e0 41 2d stwcx.  r7,0,r8

c000c044:   39 40 20 00 li  r10,8192
c000c048:   7c e0 40 28 lwarx   r7,0,r8
c000c04c:   7c e7 50 78 andc    r7,r7,r10
c000c050:   7c e0 41 2d stwcx.  r7,0,r8

Most set-bit masks are constants that fit in the lower 16 bits, so the
operation can easily be replaced by the "immediate" version of it. Allow
GCC to choose between the normal or immediate form.

For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' when
all bits to be cleared are consecutive. For the time being only
handle the single bit case, which we detect by checking whether the
mask is a power of two. Can't use is_power_of_2() function because it
is not included yet, but it is easy to code with (mask & (mask - 1))
and even the 0 case which is not a power of two is acceptable for us.

On 64 bits we don't have any equivalent single operation, we'd need
two 'rldicl' so it is not worth it.

With this patch we get:

c0009fe0:   7d 00 50 28 lwarx   r8,0,r10
c0009fe4:   61 08 08 00 ori r8,r8,2048
c0009fe8:   7d 00 51 2d stwcx.  r8,0,r10

c000c034:   7d 00 50 28 lwarx   r8,0,r10
c000c038:   55 08 04 e2 rlwinm  r8,r8,0,19,17
c000c03c:   7d 00 51 2d stwcx.  r8,0,r10

On pmac32_defconfig, it reduces the text by approx 10 kbytes.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/bitops.h | 77 +++
 1 file changed, 69 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/bitops.h 
b/arch/powerpc/include/asm/bitops.h
index 299ab33505a6..0b0c6bdd9be9 100644
--- a/arch/powerpc/include/asm/bitops.h
+++ b/arch/powerpc/include/asm/bitops.h
@@ -71,19 +71,49 @@ static inline void fn(unsigned long mask,   \
__asm__ __volatile__ (  \
prefix  \
 "1:"   PPC_LLARX(%0,0,%3,0) "\n"   \
-   stringify_in_c(op) "%0,%0,%2\n" \
+   #op "%I2 %0,%0,%2\n"\
PPC_STLCX "%0,0,%3\n"   \
"bne- 1b\n" \
: "=&r" (old), "+m" (*p)\
-   : "r" (mask), "r" (p)   \
+   : "rK" (mask), "r" (p)  \
: "cc", "memory");  \
 }
 
 DEFINE_BITOP(set_bits, or, "")
-DEFINE_BITOP(clear_bits, andc, "")
-DEFINE_BITOP(clear_bits_unlock, andc, PPC_RELEASE_BARRIER)
 DEFINE_BITOP(change_bits, xor, "")
 
+#define DEFINE_CLROP(fn, prefix)   \
+static inline void fn(unsigned long mask, volatile unsigned long *_p)  \
+{  \
+   unsigned long old;  \
+   unsigned long *p = (unsigned long *)_p; \
+   if (IS_ENABLED(CONFIG_PPC32) && \
+   __builtin_constant_p(mask) && !(mask & (mask - 1))) {   \
+   asm volatile (  \
+   prefix  \
+   "1:""lwarx  %0,0,%3\n"  \
+   "rlwinm %0,%0,0,%2\n"   \
+   "stwcx. %0,0,%3\n"  \
+   "bne- 1b\n" \
+   : "=&r" (old), "+m" (*p)\
+   : "i" (~mask), "r" (p)  \
+   : "cc", "memory");  \
+   } else {\
+   asm volatile (  \
+   prefix  \
+   "1:"PPC_LLARX(%0,0,%3,0) "\n"   \
+   "andc %0,%0,%2\n"   \
+   PPC_STLCX "%0,0,%3\n"   \
+   "bne- 1b\n" \
+   : "=&r" (old), "+m" (*p)\
+   : "r" (mask), "r" (p)   \
+   : "cc", "memory");  \
+   }   \
+}
+
+DEFINE_CLROP(clear_bits, "")
+DEFINE_CLROP(clear_bits_unlock, PPC_RELEASE_BARRIER)
+
 static inline void arch_set_bit(int nr, volatile unsigned long *addr)
 {
set_

[PATCH v2 9/9] powerpc/mem: Use kmap_local_page() in flushing functions

2021-04-08 Thread Christophe Leroy
Flushing functions don't rely on preemption being disabled, so
use kmap_local_page() instead of kmap_atomic().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/cacheflush.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
index d9eafa077c09..63363787e000 100644
--- a/arch/powerpc/mm/cacheflush.c
+++ b/arch/powerpc/mm/cacheflush.c
@@ -152,16 +152,16 @@ static void flush_dcache_icache_hugepage(struct page *page)
 {
int i;
int nr = compound_nr(page);
-   void *start;
 
if (!PageHighMem(page)) {
for (i = 0; i < nr; i++)
__flush_dcache_icache(lowmem_page_address(page + i));
} else {
for (i = 0; i < nr; i++) {
-   start = kmap_atomic(page+i);
+   void *start = kmap_local_page(page + i);
+
__flush_dcache_icache(start);
-   kunmap_atomic(start);
+   kunmap_local(start);
}
}
 }
@@ -177,9 +177,10 @@ void flush_dcache_icache_page(struct page *page)
if (!PageHighMem(page)) {
__flush_dcache_icache(lowmem_page_address(page));
	} else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
-   void *start = kmap_atomic(page);
+   void *start = kmap_local_page(page);
+
__flush_dcache_icache(start);
-   kunmap_atomic(start);
+   kunmap_local(start);
} else {
flush_dcache_icache_phys(page_to_phys(page));
}
@@ -225,9 +226,9 @@ void copy_user_page(void *vto, void *vfrom, unsigned long vaddr,
 void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
 unsigned long addr, int len)
 {
-   unsigned long maddr;
+   void *maddr;
 
-   maddr = (unsigned long) kmap(page) + (addr & ~PAGE_MASK);
-   flush_icache_range(maddr, maddr + len);
-   kunmap(page);
+   maddr = kmap_local_page(page) + (addr & ~PAGE_MASK);
+   flush_icache_range((unsigned long)maddr, (unsigned long)maddr + len);
+   kunmap_local(maddr);
 }
-- 
2.25.0



[PATCH v2 8/9] powerpc/mem: Inline flush_dcache_page()

2021-04-08 Thread Christophe Leroy
flush_dcache_page() is only a few lines, so it is worth inlining.

ia64, csky, mips, openrisc and riscv have a similar
flush_dcache_page() and inline it.

On pmac32_defconfig, we get a small size reduction.
On ppc64_defconfig, we get a very small size increase.

In both cases that's in the noise (less than 0.1%).

    text     data     bss      dec      hex  filename
18991155  5934744  1497624  26423523  19330e3  vmlinux64.before
18994829  5936732  1497624  26429185  1934701  vmlinux64.after
 9150963  2467502   184548  11803013   b41985  vmlinux32.before
 9149689  2467302   184548  11801539   b413c3  vmlinux32.after

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/cacheflush.h | 14 +-
 arch/powerpc/mm/cacheflush.c  | 15 ---
 2 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index 9110489ea411..7564dd4fd12b 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -30,7 +30,19 @@ static inline void flush_cache_vmap(unsigned long start, unsigned long end)
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-extern void flush_dcache_page(struct page *page);
+/*
+ * This is called when a page has been modified by the kernel.
+ * It just marks the page as not i-cache clean.  We do the i-cache
+ * flush later when the page is given to a user process, if necessary.
+ */
+static inline void flush_dcache_page(struct page *page)
+{
+   if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
+   return;
+   /* avoid an atomic op if possible */
+   if (test_bit(PG_dcache_clean, &page->flags))
+   clear_bit(PG_dcache_clean, &page->flags);
+}
 
 void flush_icache_range(unsigned long start, unsigned long stop);
 #define flush_icache_range flush_icache_range
diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
index abeef69ed4e4..d9eafa077c09 100644
--- a/arch/powerpc/mm/cacheflush.c
+++ b/arch/powerpc/mm/cacheflush.c
@@ -121,21 +121,6 @@ static void flush_dcache_icache_phys(unsigned long physaddr)
 }
 #endif
 
-/*
- * This is called when a page has been modified by the kernel.
- * It just marks the page as not i-cache clean.  We do the i-cache
- * flush later when the page is given to a user process, if necessary.
- */
-void flush_dcache_page(struct page *page)
-{
-   if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
-   return;
-   /* avoid an atomic op if possible */
-   if (test_bit(PG_dcache_clean, &page->flags))
-   clear_bit(PG_dcache_clean, &page->flags);
-}
-EXPORT_SYMBOL(flush_dcache_page);
-
 /**
  * __flush_dcache_icache(): Flush a particular page from the data cache to RAM.
  * Note: this is necessary because the instruction cache does *not*
-- 
2.25.0



[PATCH v2 7/9] powerpc/mem: Help GCC realise __flush_dcache_icache() flushes single pages

2021-04-08 Thread Christophe Leroy
'And' the given page address with PAGE_MASK to help GCC.

With the patch:

0024 <__flush_dcache_icache>:
  24:   54 63 00 26 rlwinm  r3,r3,0,0,19
  28:   39 40 00 40 li  r10,64
  2c:   7c 69 1b 78 mr  r9,r3
  30:   7d 49 03 a6 mtctr   r10
  34:   7c 00 48 6c dcbst   0,r9
  38:   39 29 00 20 addi    r9,r9,32
  3c:   7c 00 48 6c dcbst   0,r9
  40:   39 29 00 20 addi    r9,r9,32
  44:   42 00 ff f0 bdnz34 <__flush_dcache_icache+0x10>
  48:   7c 00 04 ac hwsync
  4c:   39 20 00 40 li  r9,64
  50:   7d 29 03 a6 mtctr   r9
  54:   7c 00 1f ac icbi    0,r3
  58:   38 63 00 20 addi    r3,r3,32
  5c:   7c 00 1f ac icbi    0,r3
  60:   38 63 00 20 addi    r3,r3,32
  64:   42 00 ff f0 bdnz54 <__flush_dcache_icache+0x30>
  68:   7c 00 04 ac hwsync
  6c:   4c 00 01 2c isync
  70:   4e 80 00 20 blr

Without the patch:

0024 <__flush_dcache_icache>:
  24:   54 6a 00 34 rlwinm  r10,r3,0,0,26
  28:   39 23 10 1f addi    r9,r3,4127
  2c:   7d 2a 48 50 subf    r9,r10,r9
  30:   55 29 d9 7f rlwinm. r9,r9,27,5,31
  34:   41 82 00 94 beq c8 <__flush_dcache_icache+0xa4>
  38:   71 28 00 01 andi.   r8,r9,1
  3c:   38 c9 ff ff addi    r6,r9,-1
  40:   7d 48 53 78 mr  r8,r10
  44:   7d 27 4b 78 mr  r7,r9
  48:   40 82 00 6c bne b4 <__flush_dcache_icache+0x90>
  4c:   54 e7 f8 7e rlwinm  r7,r7,31,1,31
  50:   7c e9 03 a6 mtctr   r7
  54:   7c 00 40 6c dcbst   0,r8
  58:   39 08 00 20 addi    r8,r8,32
  5c:   7c 00 40 6c dcbst   0,r8
  60:   39 08 00 20 addi    r8,r8,32
  64:   42 00 ff f0 bdnz54 <__flush_dcache_icache+0x30>
  68:   7c 00 04 ac hwsync
  6c:   71 28 00 01 andi.   r8,r9,1
  70:   39 09 ff ff addi    r8,r9,-1
  74:   40 82 00 2c bne a0 <__flush_dcache_icache+0x7c>
  78:   55 29 f8 7e rlwinm  r9,r9,31,1,31
  7c:   7d 29 03 a6 mtctr   r9
  80:   7c 00 57 ac icbi    0,r10
  84:   39 4a 00 20 addi    r10,r10,32
  88:   7c 00 57 ac icbi    0,r10
  8c:   39 4a 00 20 addi    r10,r10,32
  90:   42 00 ff f0 bdnz80 <__flush_dcache_icache+0x5c>
  94:   7c 00 04 ac hwsync
  98:   4c 00 01 2c isync
  9c:   4e 80 00 20 blr
  a0:   7c 00 57 ac icbi    0,r10
  a4:   2c 08 00 00 cmpwi   r8,0
  a8:   39 4a 00 20 addi    r10,r10,32
  ac:   40 82 ff cc bne 78 <__flush_dcache_icache+0x54>
  b0:   4b ff ff e4 b   94 <__flush_dcache_icache+0x70>
  b4:   7c 00 50 6c dcbst   0,r10
  b8:   2c 06 00 00 cmpwi   r6,0
  bc:   39 0a 00 20 addi    r8,r10,32
  c0:   40 82 ff 8c bne 4c <__flush_dcache_icache+0x28>
  c4:   4b ff ff a4 b   68 <__flush_dcache_icache+0x44>
  c8:   7c 00 04 ac hwsync
  cc:   7c 00 04 ac hwsync
  d0:   4c 00 01 2c isync
  d4:   4e 80 00 20 blr

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/cacheflush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
index 2d92cb6bc423..abeef69ed4e4 100644
--- a/arch/powerpc/mm/cacheflush.c
+++ b/arch/powerpc/mm/cacheflush.c
@@ -145,7 +145,7 @@ EXPORT_SYMBOL(flush_dcache_page);
  */
 static void __flush_dcache_icache(void *p)
 {
-   unsigned long addr = (unsigned long)p;
+   unsigned long addr = (unsigned long)p & PAGE_MASK;
 
clean_dcache_range(addr, addr + PAGE_SIZE);
 
-- 
2.25.0



[PATCH v2 6/9] powerpc/mem: flush_dcache_icache_phys() is for HIGHMEM pages only

2021-04-08 Thread Christophe Leroy
__flush_dcache_icache() is usable for non HIGHMEM pages on
every platform.

It is only for HIGHMEM pages that BOOKE needs kmap() and
BOOK3S needs flush_dcache_icache_phys().

So make flush_dcache_icache_phys() dependent on CONFIG_HIGHMEM and
call it only when it is a HIGHMEM page.

We could make flush_dcache_icache_phys() available at all times,
but as it is declared NOKPROBE_SYMBOL(), GCC doesn't optimise
it out when it is not used.

So define a stub for !CONFIG_HIGHMEM in order to remove the #ifdef in
flush_dcache_icache_page() and use IS_ENABLED() instead.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/cacheflush.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
index 3268a3e55c3f..2d92cb6bc423 100644
--- a/arch/powerpc/mm/cacheflush.c
+++ b/arch/powerpc/mm/cacheflush.c
@@ -76,7 +76,7 @@ void flush_icache_range(unsigned long start, unsigned long stop)
 }
 EXPORT_SYMBOL(flush_icache_range);
 
-#if !defined(CONFIG_PPC_8xx) && !defined(CONFIG_PPC64)
+#ifdef CONFIG_HIGHMEM
 /**
  * flush_dcache_icache_phys() - Flush a page by it's physical address
  * @physaddr: the physical address of the page
@@ -115,7 +115,11 @@ static void flush_dcache_icache_phys(unsigned long physaddr)
: "ctr", "memory");
 }
 NOKPROBE_SYMBOL(flush_dcache_icache_phys)
-#endif // !defined(CONFIG_PPC_8xx) && !defined(CONFIG_PPC64)
+#else
+static void flush_dcache_icache_phys(unsigned long physaddr)
+{
+}
+#endif
 
 /*
  * This is called when a page has been modified by the kernel.
@@ -185,18 +189,15 @@ void flush_dcache_icache_page(struct page *page)
if (PageCompound(page))
return flush_dcache_icache_hugepage(page);
 
-#if defined(CONFIG_PPC_8xx) || defined(CONFIG_PPC64)
-   /* On 8xx there is no need to kmap since highmem is not supported */
-   __flush_dcache_icache(page_address(page));
-#else
-   if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
+   if (!PageHighMem(page)) {
+   __flush_dcache_icache(lowmem_page_address(page));
+	} else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
void *start = kmap_atomic(page);
__flush_dcache_icache(start);
kunmap_atomic(start);
} else {
flush_dcache_icache_phys(page_to_phys(page));
}
-#endif
 }
 EXPORT_SYMBOL(flush_dcache_icache_page);
 
-- 
2.25.0



[PATCH v2 5/9] powerpc/mem: Optimise flush_dcache_icache_hugepage()

2021-04-08 Thread Christophe Leroy
flush_dcache_icache_hugepage() is a static function, with
only one caller. That caller calls it when PageCompound() is true,
so bugging on !PageCompound() is useless if we can trust the
compiler a little. Remove the BUG_ON(!PageCompound()).

The number of pages in a compound page won't change during the loop, but
GCC doesn't know that, so it re-evaluates compound_nr() at every iteration.

To avoid that, call compound_nr() outside the loop and save it in
a local variable.

Whether the page is a HIGHMEM page or not doesn't change over time.

But GCC doesn't know it so it does the test on every iteration.

Do the test outside the loop.

When the page is not a HIGHMEM page, page_address() will fallback on
lowmem_page_address(), so call lowmem_page_address() directly and
don't suffer the call to page_address() on every iteration.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/cacheflush.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
index 811045c50d82..3268a3e55c3f 100644
--- a/arch/powerpc/mm/cacheflush.c
+++ b/arch/powerpc/mm/cacheflush.c
@@ -162,14 +162,14 @@ static void __flush_dcache_icache(void *p)
 static void flush_dcache_icache_hugepage(struct page *page)
 {
int i;
+   int nr = compound_nr(page);
void *start;
 
-   BUG_ON(!PageCompound(page));
-
-   for (i = 0; i < compound_nr(page); i++) {
-   if (!PageHighMem(page)) {
-   __flush_dcache_icache(page_address(page+i));
-   } else {
+   if (!PageHighMem(page)) {
+   for (i = 0; i < nr; i++)
+   __flush_dcache_icache(lowmem_page_address(page + i));
+   } else {
+   for (i = 0; i < nr; i++) {
start = kmap_atomic(page+i);
__flush_dcache_icache(start);
kunmap_atomic(start);
-- 
2.25.0



[PATCH v2 4/9] powerpc/mem: Call flush_coherent_icache() at higher level

2021-04-08 Thread Christophe Leroy
flush_coherent_icache() doesn't need the address anymore,
so it can be called immediately when entering the public
functions and doesn't need to be disseminated among
lower level functions.

And use page_to_phys() instead of open-coding the phys address
calculation when calling flush_dcache_icache_phys().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/cacheflush.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
index dc2d39da6f63..811045c50d82 100644
--- a/arch/powerpc/mm/cacheflush.c
+++ b/arch/powerpc/mm/cacheflush.c
@@ -143,9 +143,6 @@ static void __flush_dcache_icache(void *p)
 {
unsigned long addr = (unsigned long)p;
 
-   if (flush_coherent_icache())
-   return;
-
clean_dcache_range(addr, addr + PAGE_SIZE);
 
/*
@@ -182,6 +179,8 @@ static void flush_dcache_icache_hugepage(struct page *page)
 
 void flush_dcache_icache_page(struct page *page)
 {
+   if (flush_coherent_icache())
+   return;
 
if (PageCompound(page))
return flush_dcache_icache_hugepage(page);
@@ -195,11 +194,7 @@ void flush_dcache_icache_page(struct page *page)
__flush_dcache_icache(start);
kunmap_atomic(start);
} else {
-   unsigned long addr = page_to_pfn(page) << PAGE_SHIFT;
-
-   if (flush_coherent_icache())
-   return;
-   flush_dcache_icache_phys(addr);
+   flush_dcache_icache_phys(page_to_phys(page));
}
 #endif
 }
-- 
2.25.0



[PATCH v2 3/9] powerpc/mem: Remove address argument to flush_coherent_icache()

2021-04-08 Thread Christophe Leroy
flush_coherent_icache() can use any valid address, as mentioned
in the comment.

Use PAGE_OFFSET as base address. This allows removing the
user access stuff.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/cacheflush.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
index 742d3e0fb12f..dc2d39da6f63 100644
--- a/arch/powerpc/mm/cacheflush.c
+++ b/arch/powerpc/mm/cacheflush.c
@@ -5,10 +5,9 @@
 
 /**
  * flush_coherent_icache() - if a CPU has a coherent icache, flush it
- * @addr: The base address to use (can be any valid address, the whole cache will be flushed)
  * Return true if the cache was flushed, false otherwise
  */
-static inline bool flush_coherent_icache(unsigned long addr)
+static inline bool flush_coherent_icache(void)
 {
/*
 * For a snooping icache, we still need a dummy icbi to purge all the
@@ -18,9 +17,7 @@ static inline bool flush_coherent_icache(unsigned long addr)
 */
if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) {
mb(); /* sync */
-   allow_read_from_user((const void __user *)addr, L1_CACHE_BYTES);
-   icbi((void *)addr);
-   prevent_read_from_user((const void __user *)addr, L1_CACHE_BYTES);
+   icbi((void *)PAGE_OFFSET);
mb(); /* sync */
isync();
return true;
@@ -60,7 +57,7 @@ static void invalidate_icache_range(unsigned long start, unsigned long stop)
  */
 void flush_icache_range(unsigned long start, unsigned long stop)
 {
-   if (flush_coherent_icache(start))
+   if (flush_coherent_icache())
return;
 
clean_dcache_range(start, stop);
@@ -146,7 +143,7 @@ static void __flush_dcache_icache(void *p)
 {
unsigned long addr = (unsigned long)p;
 
-   if (flush_coherent_icache(addr))
+   if (flush_coherent_icache())
return;
 
clean_dcache_range(addr, addr + PAGE_SIZE);
@@ -200,7 +197,7 @@ void flush_dcache_icache_page(struct page *page)
} else {
unsigned long addr = page_to_pfn(page) << PAGE_SHIFT;
 
-   if (flush_coherent_icache(addr))
+   if (flush_coherent_icache())
return;
flush_dcache_icache_phys(addr);
}
-- 
2.25.0



[PATCH v2 2/9] powerpc/mem: Declare __flush_dcache_icache() static

2021-04-08 Thread Christophe Leroy
__flush_dcache_icache() is only used in mem.c.

Move it before the functions that use it and declare it static.

And also fix the name of the parameter in the comment.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/cacheflush.h |  1 -
 arch/powerpc/mm/cacheflush.c  | 60 +--
 2 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index f63495109f63..9110489ea411 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -40,7 +40,6 @@ void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
 #define flush_icache_user_page flush_icache_user_page
 
 void flush_dcache_icache_page(struct page *page);
-void __flush_dcache_icache(void *page);
 
 /**
  * flush_dcache_range(): Write any modified data cache blocks out to memory and
diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
index 40613d2fda37..742d3e0fb12f 100644
--- a/arch/powerpc/mm/cacheflush.c
+++ b/arch/powerpc/mm/cacheflush.c
@@ -135,6 +135,36 @@ void flush_dcache_page(struct page *page)
 }
 EXPORT_SYMBOL(flush_dcache_page);
 
+/**
+ * __flush_dcache_icache(): Flush a particular page from the data cache to RAM.
+ * Note: this is necessary because the instruction cache does *not*
+ * snoop from the data cache.
+ *
+ * @p: the address of the page to flush
+ */
+static void __flush_dcache_icache(void *p)
+{
+   unsigned long addr = (unsigned long)p;
+
+   if (flush_coherent_icache(addr))
+   return;
+
+   clean_dcache_range(addr, addr + PAGE_SIZE);
+
+   /*
+* We don't flush the icache on 44x. Those have a virtual icache and we
+* don't have access to the virtual address here (it's not the page
+* vaddr but where it's mapped in user space). The flushing of the
+* icache on these is handled elsewhere, when a change in the address
+* space occurs, before returning to user space.
+*/
+
+   if (mmu_has_feature(MMU_FTR_TYPE_44x))
+   return;
+
+   invalidate_icache_range(addr, addr + PAGE_SIZE);
+}
+
 static void flush_dcache_icache_hugepage(struct page *page)
 {
int i;
@@ -178,36 +208,6 @@ void flush_dcache_icache_page(struct page *page)
 }
 EXPORT_SYMBOL(flush_dcache_icache_page);
 
-/**
- * __flush_dcache_icache(): Flush a particular page from the data cache to RAM.
- * Note: this is necessary because the instruction cache does *not*
- * snoop from the data cache.
- *
- * @page: the address of the page to flush
- */
-void __flush_dcache_icache(void *p)
-{
-   unsigned long addr = (unsigned long)p;
-
-   if (flush_coherent_icache(addr))
-   return;
-
-   clean_dcache_range(addr, addr + PAGE_SIZE);
-
-   /*
-* We don't flush the icache on 44x. Those have a virtual icache and we
-* don't have access to the virtual address here (it's not the page
-* vaddr but where it's mapped in user space). The flushing of the
-* icache on these is handled elsewhere, when a change in the address
-* space occurs, before returning to user space.
-*/
-
-   if (mmu_has_feature(MMU_FTR_TYPE_44x))
-   return;
-
-   invalidate_icache_range(addr, addr + PAGE_SIZE);
-}
-
 void clear_user_page(void *page, unsigned long vaddr, struct page *pg)
 {
clear_page(page);
-- 
2.25.0



[PATCH v2 1/9] powerpc/mem: Move cache flushing functions into mm/cacheflush.c

2021-04-08 Thread Christophe Leroy
Cache flushing functions are in the middle of completely
unrelated stuff in mm/mem.c

Create a dedicated mm/cacheflush.c for those functions.

Also cleanup the list of included headers.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/Makefile |   3 +-
 arch/powerpc/mm/cacheflush.c | 255 +++
 arch/powerpc/mm/mem.c| 281 ---
 3 files changed, 257 insertions(+), 282 deletions(-)
 create mode 100644 arch/powerpc/mm/cacheflush.c

diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 3b4e9e4e25ea..c3df3a8501d4 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -8,7 +8,8 @@ ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC)
 obj-y  := fault.o mem.o pgtable.o mmap.o maccess.o \
   init_$(BITS).o pgtable_$(BITS).o \
   pgtable-frag.o ioremap.o ioremap_$(BITS).o \
-  init-common.o mmu_context.o drmem.o
+  init-common.o mmu_context.o drmem.o \
+  cacheflush.o
 obj-$(CONFIG_PPC_MMU_NOHASH)   += nohash/
 obj-$(CONFIG_PPC_BOOK3S_32)+= book3s32/
 obj-$(CONFIG_PPC_BOOK3S_64)+= book3s64/
diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
new file mode 100644
index ..40613d2fda37
--- /dev/null
+++ b/arch/powerpc/mm/cacheflush.c
@@ -0,0 +1,255 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include 
+#include 
+
+/**
+ * flush_coherent_icache() - if a CPU has a coherent icache, flush it
+ * @addr: The base address to use (can be any valid address, the whole cache will be flushed)
+ * Return true if the cache was flushed, false otherwise
+ */
+static inline bool flush_coherent_icache(unsigned long addr)
+{
+   /*
+* For a snooping icache, we still need a dummy icbi to purge all the
+* prefetched instructions from the ifetch buffers. We also need a sync
+* before the icbi to order the actual stores to memory that might
+* have modified instructions with the icbi.
+*/
+   if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) {
+   mb(); /* sync */
+   allow_read_from_user((const void __user *)addr, L1_CACHE_BYTES);
+   icbi((void *)addr);
+   prevent_read_from_user((const void __user *)addr, L1_CACHE_BYTES);
+   mb(); /* sync */
+   isync();
+   return true;
+   }
+
+   return false;
+}
+
+/**
+ * invalidate_icache_range() - Flush the icache by issuing icbi across an address range
+ * @start: the start address
+ * @stop: the stop address (exclusive)
+ */
+static void invalidate_icache_range(unsigned long start, unsigned long stop)
+{
+   unsigned long shift = l1_icache_shift();
+   unsigned long bytes = l1_icache_bytes();
+   char *addr = (char *)(start & ~(bytes - 1));
+   unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+   unsigned long i;
+
+   for (i = 0; i < size >> shift; i++, addr += bytes)
+   icbi(addr);
+
+   mb(); /* sync */
+   isync();
+}
+
+/**
+ * flush_icache_range: Write any modified data cache blocks out to memory
+ * and invalidate the corresponding blocks in the instruction cache
+ *
+ * Generic code will call this after writing memory, before executing from it.
+ *
+ * @start: the start address
+ * @stop: the stop address (exclusive)
+ */
+void flush_icache_range(unsigned long start, unsigned long stop)
+{
+   if (flush_coherent_icache(start))
+   return;
+
+   clean_dcache_range(start, stop);
+
+   if (IS_ENABLED(CONFIG_44x)) {
+   /*
+* Flash invalidate on 44x because we are passed kmapped
+* addresses and this doesn't work for userspace pages due to
+* the virtually tagged icache.
+*/
+   iccci((void *)start);
+   mb(); /* sync */
+   isync();
+   } else
+   invalidate_icache_range(start, stop);
+}
+EXPORT_SYMBOL(flush_icache_range);
+
+#if !defined(CONFIG_PPC_8xx) && !defined(CONFIG_PPC64)
+/**
+ * flush_dcache_icache_phys() - Flush a page by its physical address
+ * @physaddr: the physical address of the page
+ */
+static void flush_dcache_icache_phys(unsigned long physaddr)
+{
+   unsigned long bytes = l1_dcache_bytes();
+   unsigned long nb = PAGE_SIZE / bytes;
+   unsigned long addr = physaddr & PAGE_MASK;
+   unsigned long msr, msr0;
+   unsigned long loop1 = addr, loop2 = addr;
+
+   msr0 = mfmsr();
+   msr = msr0 & ~MSR_DR;
+   /*
+* This must remain as ASM to prevent potential memory accesses
+* while the data MMU is disabled
+*/
+   asm volatile(
+   "   mtctr %2;\n"
+   "   mtmsr %3;\n"
+   "   isync;\n"
+   

Re: [PATCH v1 2/8] powerpc/mem: Remove address argument to flush_coherent_icache()

2021-04-08 Thread Christophe Leroy




On 08/04/2021 at 10:50, Aneesh Kumar K.V wrote:

Christophe Leroy  writes:


flush_coherent_icache() can use any valid address, as mentioned
in the comment.

Use PAGE_OFFSET as base address. This allows removing the
user access stuff.

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/mm/mem.c | 13 +
  1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index ce6c81ce4362..19f807b87697 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -342,10 +342,9 @@ void free_initmem(void)
  
  /**

   * flush_coherent_icache() - if a CPU has a coherent icache, flush it
- * @addr: The base address to use (can be any valid address, the whole cache will be flushed)
   * Return true if the cache was flushed, false otherwise
   */
-static inline bool flush_coherent_icache(unsigned long addr)
+static inline bool flush_coherent_icache(void)
  {
/*
 * For a snooping icache, we still need a dummy icbi to purge all the
@@ -355,9 +354,7 @@ static inline bool flush_coherent_icache(unsigned long addr)
 */
if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) {
mb(); /* sync */
-   allow_read_from_user((const void __user *)addr, L1_CACHE_BYTES);
-   icbi((void *)addr);
-   prevent_read_from_user((const void __user *)addr, L1_CACHE_BYTES);
+   icbi((void *)PAGE_OFFSET);
mb(); /* sync */
isync();
return true;


do we need that follow-up sync? The user manual suggests a sync; icbi(any address); isync sequence.


I don't know.

The original implementation is here: https://github.com/linuxppc/linux/commit/0ce636700

Christophe


[PATCH v3] powerpc/traps: Enhance readability for trap types

2021-04-08 Thread Xiongwei Song
From: Xiongwei Song 

Create a new header named traps.h, define macros to list ppc interrupt
types in traps.h, replace the reference of the trap hex values with these
macros.

The hex values are taken from arch/powerpc/kernel/exceptions-64e.S,
arch/powerpc/kernel/exceptions-64s.S and
arch/powerpc/include/asm/kvm_asm.h.

v2-v3:
Correct the prefix of trap macros with INTERRUPT_, the previous prefix
is TRAP_, which is not precise. This is suggested by Segher Boessenkool
and Nicholas Piggin.

v1-v2:
Define more trap macros to replace more trap hexs in code, not just for
the __show_regs function. This is suggested by Christophe Leroy.

Signed-off-by: Xiongwei Song 
---
 arch/powerpc/include/asm/interrupt.h  |  9 +---
 arch/powerpc/include/asm/ptrace.h |  3 ++-
 arch/powerpc/include/asm/traps.h  | 32 +++
 arch/powerpc/kernel/interrupt.c   |  3 ++-
 arch/powerpc/kernel/process.c |  5 -
 arch/powerpc/mm/book3s64/hash_utils.c |  5 +++--
 arch/powerpc/mm/fault.c   | 21 +++---
 arch/powerpc/perf/core-book3s.c   |  5 +++--
 arch/powerpc/xmon/xmon.c  | 16 +++---
 9 files changed, 78 insertions(+), 21 deletions(-)
 create mode 100644 arch/powerpc/include/asm/traps.h

diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index 7c633896d758..5ce9898bc9a6 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct interrupt_state {
 #ifdef CONFIG_PPC_BOOK3E_64
@@ -59,7 +60,7 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrup
 * CT_WARN_ON comes here via program_check_exception,
 * so avoid recursion.
 */
-   if (TRAP(regs) != 0x700)
+   if (TRAP(regs) != INTERRUPT_PROGRAM)
CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
}
 #endif
@@ -156,7 +157,8 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte
/* Don't do any per-CPU operations until interrupt state is fixed */
 #endif
/* Allow DEC and PMI to be traced when they are soft-NMI */
-   if (TRAP(regs) != 0x900 && TRAP(regs) != 0xf00 && TRAP(regs) != 0x260) {
+   if (TRAP(regs) != INTERRUPT_DECREMENTER &&
+   TRAP(regs) != INTERRUPT_PERFMON) {
state->ftrace_enabled = this_cpu_get_ftrace_enabled();
this_cpu_set_ftrace_enabled(0);
}
@@ -180,7 +182,8 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter
nmi_exit();
 
 #ifdef CONFIG_PPC64
-   if (TRAP(regs) != 0x900 && TRAP(regs) != 0xf00 && TRAP(regs) != 0x260)
+   if (TRAP(regs) != INTERRUPT_DECREMENTER &&
+   TRAP(regs) != INTERRUPT_PERFMON)
this_cpu_set_ftrace_enabled(state->ftrace_enabled);
 
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index f10498e1b3f6..7a17e0365d43 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -21,6 +21,7 @@
 
 #include 
 #include 
+#include 
 
 #ifndef __ASSEMBLY__
 struct pt_regs
@@ -237,7 +238,7 @@ static inline bool trap_is_unsupported_scv(struct pt_regs *regs)
 
 static inline bool trap_is_syscall(struct pt_regs *regs)
 {
-   return (trap_is_scv(regs) || TRAP(regs) == 0xc00);
+   return (trap_is_scv(regs) || TRAP(regs) == INTERRUPT_SYSCALL);
 }
 
 static inline bool trap_norestart(struct pt_regs *regs)
diff --git a/arch/powerpc/include/asm/traps.h b/arch/powerpc/include/asm/traps.h
new file mode 100644
index ..cb416a17097c
--- /dev/null
+++ b/arch/powerpc/include/asm/traps.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_PPC_TRAPS_H
+#define _ASM_PPC_TRAPS_H
+
+#if defined(CONFIG_BOOKE) || defined(CONFIG_4xx)
+#define INTERRUPT_MACHINE_CHECK   0x000
+#define INTERRUPT_CRITICAL_INPUT  0x100
+#define INTERRUPT_ALTIVEC_UNAVAIL 0x200
+#define INTERRUPT_PERFMON 0x260
+#define INTERRUPT_DOORBELL0x280
+#define INTERRUPT_DEBUG   0xd00
+#elif defined(CONFIG_PPC_BOOK3S)
+#define INTERRUPT_SYSTEM_RESET0x100
+#define INTERRUPT_MACHINE_CHECK   0x200
+#define INTERRUPT_DATA_SEGMENT0x380
+#define INTERRUPT_INST_SEGMENT0x480
+#define INTERRUPT_DOORBELL0xa00
+#define INTERRUPT_TRACE   0xd00
+#define INTERRUPT_H_DATA_STORAGE  0xe00
+#define INTERRUPT_PERFMON 0xf00
+#define INTERRUPT_H_FAC_UNAVAIL   0xf80
+#endif
+
+#define INTERRUPT_DATA_STORAGE0x300
+#define INTERRUPT_INST_STORAGE0x400
+#define INTERRUPT_ALIGNMENT   0x600
+#define INTERRUPT_PROGRAM 0x700
+#define INTERRUPT_FP_UNAVAIL  0x800
+#define INTERRUPT_DECREMENTER 0x900
+#define INTERRUPT_SYSCALL 0xc00
+
+#endif /* _ASM_PPC_TRAPS_H */
diff --git a/arch/powerpc/kernel/i

[PATCH v2 3/5] powerpc/rtas: remove ibm_suspend_me_token

2021-04-08 Thread Nathan Lynch
There's not a compelling reason to cache the value of the token for
the ibm,suspend-me function. Just look it up when needed in the RTAS
syscall's special case for it.

Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Andrew Donnellan 
Signed-off-by: Nathan Lynch 
---
 arch/powerpc/kernel/rtas.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index d126d71ea5bd..60fcf7f7b0b8 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -828,7 +828,6 @@ void rtas_activate_firmware(void)
pr_err("ibm,activate-firmware failed (%i)\n", fwrc);
 }
 
-static int ibm_suspend_me_token = RTAS_UNKNOWN_SERVICE;
 #ifdef CONFIG_PPC_PSERIES
 /**
  * rtas_call_reentrant() - Used for reentrant rtas calls
@@ -1103,7 +1102,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
return -EINVAL;
 
/* Need to handle ibm,suspend_me call specially */
-   if (token == ibm_suspend_me_token) {
+   if (token == rtas_token("ibm,suspend-me")) {
 
/*
 * rtas_ibm_suspend_me assumes the streamid handle is in cpu
@@ -1191,10 +1190,8 @@ void __init rtas_initialize(void)
 * the stop-self token if any
 */
 #ifdef CONFIG_PPC64
-   if (firmware_has_feature(FW_FEATURE_LPAR)) {
+   if (firmware_has_feature(FW_FEATURE_LPAR))
rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX);
-   ibm_suspend_me_token = rtas_token("ibm,suspend-me");
-   }
 #endif
rtas_rmo_buf = memblock_phys_alloc_range(RTAS_RMOBUF_MAX, PAGE_SIZE,
 0, rtas_region);
-- 
2.30.2



[PATCH v2 2/5] powerpc/rtas-proc: remove unused RMO_READ_BUF_MAX

2021-04-08 Thread Nathan Lynch
This constant is unused.

Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Andrew Donnellan 
Signed-off-by: Nathan Lynch 
---
 arch/powerpc/kernel/rtas-proc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c
index e0f8329966d6..d2b0d99824a4 100644
--- a/arch/powerpc/kernel/rtas-proc.c
+++ b/arch/powerpc/kernel/rtas-proc.c
@@ -755,8 +755,6 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v)
return 0;
 }
 
-#define RMO_READ_BUF_MAX 30
-
 /**
  * ppc_rtas_rmo_buf_show() - Describe RTAS-addressable region for user space.
  *
-- 
2.30.2



[PATCH v2 4/5] powerpc/rtas: move syscall filter setup into separate function

2021-04-08 Thread Nathan Lynch
Reduce conditionally compiled sections within rtas_initialize() by
moving the filter table initialization into its own function already
guarded by CONFIG_PPC_RTAS_FILTER. No behavior change intended.

Reviewed-by: Alexey Kardashevskiy 
Acked-by: Andrew Donnellan 
Signed-off-by: Nathan Lynch 
---
 arch/powerpc/kernel/rtas.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 60fcf7f7b0b8..24dc7bc463a8 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1051,6 +1051,14 @@ static bool block_rtas_call(int token, int nargs,
return true;
 }
 
+static void __init rtas_syscall_filter_init(void)
+{
+   unsigned int i;
+
+   for (i = 0; i < ARRAY_SIZE(rtas_filters); i++)
+   rtas_filters[i].token = rtas_token(rtas_filters[i].name);
+}
+
 #else
 
 static bool block_rtas_call(int token, int nargs,
@@ -1059,6 +1067,10 @@ static bool block_rtas_call(int token, int nargs,
return false;
 }
 
+static void __init rtas_syscall_filter_init(void)
+{
+}
+
 #endif /* CONFIG_PPC_RTAS_FILTER */
 
 /* We assume to be passed big endian arguments */
@@ -1162,9 +1174,6 @@ void __init rtas_initialize(void)
unsigned long rtas_region = RTAS_INSTANTIATE_MAX;
u32 base, size, entry;
int no_base, no_size, no_entry;
-#ifdef CONFIG_PPC_RTAS_FILTER
-   int i;
-#endif
 
/* Get RTAS dev node and fill up our "rtas" structure with infos
 * about it.
@@ -1203,11 +1212,7 @@ void __init rtas_initialize(void)
rtas_last_error_token = rtas_token("rtas-last-error");
 #endif
 
-#ifdef CONFIG_PPC_RTAS_FILTER
-   for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) {
-   rtas_filters[i].token = rtas_token(rtas_filters[i].name);
-   }
-#endif
+   rtas_syscall_filter_init();
 }
 
 int __init early_init_dt_scan_rtas(unsigned long node,
-- 
2.30.2



[PATCH v2 5/5] powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE

2021-04-08 Thread Nathan Lynch
RTAS_RMOBUF_MAX doesn't actually describe a "maximum" value in any
sense. It represents the size of an area of memory set aside for user
space to use as work areas for certain RTAS calls.

Rename it to RTAS_USER_REGION_SIZE.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/include/asm/rtas.h | 6 +++---
 arch/powerpc/kernel/rtas-proc.c | 2 +-
 arch/powerpc/kernel/rtas.c  | 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 658448ca5b8a..9dc97d2f9d27 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -19,8 +19,8 @@
 #define RTAS_UNKNOWN_SERVICE (-1)
 #define RTAS_INSTANTIATE_MAX (1ULL<<30) /* Don't instantiate rtas at/above this value */
 
-/* Buffer size for ppc_rtas system call. */
-#define RTAS_RMOBUF_MAX (64 * 1024)
+/* Memory set aside for sys_rtas to use with calls that need a work area. */
+#define RTAS_USER_REGION_SIZE (64 * 1024)
 
 /* RTAS return status codes */
 #define RTAS_BUSY  -2/* RTAS Busy */
@@ -357,7 +357,7 @@ extern void rtas_take_timebase(void);
 static inline int page_is_rtas_user_buf(unsigned long pfn)
 {
unsigned long paddr = (pfn << PAGE_SHIFT);
-   if (paddr >= rtas_rmo_buf && paddr < (rtas_rmo_buf + RTAS_RMOBUF_MAX))
+   if (paddr >= rtas_rmo_buf && paddr < (rtas_rmo_buf + RTAS_USER_REGION_SIZE))
return 1;
return 0;
 }
diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c
index d2b0d99824a4..6857a5b0a1c3 100644
--- a/arch/powerpc/kernel/rtas-proc.c
+++ b/arch/powerpc/kernel/rtas-proc.c
@@ -767,6 +767,6 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v)
  */
 static int ppc_rtas_rmo_buf_show(struct seq_file *m, void *v)
 {
-   seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_RMOBUF_MAX);
+   seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_USER_REGION_SIZE);
return 0;
 }
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 24dc7bc463a8..6bada744402b 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -987,10 +987,10 @@ static struct rtas_filter rtas_filters[] __ro_after_init = {
 static bool in_rmo_buf(u32 base, u32 end)
 {
return base >= rtas_rmo_buf &&
-   base < (rtas_rmo_buf + RTAS_RMOBUF_MAX) &&
+   base < (rtas_rmo_buf + RTAS_USER_REGION_SIZE) &&
base <= end &&
end >= rtas_rmo_buf &&
-   end < (rtas_rmo_buf + RTAS_RMOBUF_MAX);
+   end < (rtas_rmo_buf + RTAS_USER_REGION_SIZE);
 }
 
 static bool block_rtas_call(int token, int nargs,
@@ -1202,7 +1202,7 @@ void __init rtas_initialize(void)
if (firmware_has_feature(FW_FEATURE_LPAR))
rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX);
 #endif
-   rtas_rmo_buf = memblock_phys_alloc_range(RTAS_RMOBUF_MAX, PAGE_SIZE,
+   rtas_rmo_buf = memblock_phys_alloc_range(RTAS_USER_REGION_SIZE, PAGE_SIZE,
 0, rtas_region);
if (!rtas_rmo_buf)
panic("ERROR: RTAS: Failed to allocate %lx bytes below %pa\n",
-- 
2.30.2



[PATCH v2 1/5] powerpc/rtas: improve ppc_rtas_rmo_buf_show documentation

2021-04-08 Thread Nathan Lynch
Add kerneldoc for ppc_rtas_rmo_buf_show(), the callback for
/proc/powerpc/rtas/rmo_buffer, explaining its expected use.

Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Andrew Donnellan 
Signed-off-by: Nathan Lynch 
---
 arch/powerpc/kernel/rtas-proc.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c
index 2d33f342a293..e0f8329966d6 100644
--- a/arch/powerpc/kernel/rtas-proc.c
+++ b/arch/powerpc/kernel/rtas-proc.c
@@ -757,7 +757,16 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, void *v)
 
 #define RMO_READ_BUF_MAX 30
 
-/* RTAS Userspace access */
+/**
+ * ppc_rtas_rmo_buf_show() - Describe RTAS-addressable region for user space.
+ *
+ * Base + size description of a range of RTAS-addressable memory set
+ * aside for user space to use as work area(s) for certain RTAS
+ * functions. User space accesses this region via /dev/mem. Apart from
+ * security policies, the kernel does not arbitrate or serialize
+ * access to this region, and user space must ensure that concurrent
+ * users do not interfere with each other.
+ */
 static int ppc_rtas_rmo_buf_show(struct seq_file *m, void *v)
 {
seq_printf(m, "%016lx %x\n", rtas_rmo_buf, RTAS_RMOBUF_MAX);
-- 
2.30.2



[PATCH v2 0/5] powerpc/rtas: miscellaneous cleanups

2021-04-08 Thread Nathan Lynch
This is a reroll of the series posted here:
https://lore.kernel.org/linuxppc-dev/20210114220004.1138993-1-nath...@linux.ibm.com/

Originally this work was prompted by failures on radix MMU PowerVM
guests when passing buffers to RTAS that lay outside of its idea of
the RMA. In v1 I approached this as a problem to be solved in Linux,
but RTAS development has since decided to change their code so that
the RMA restriction does not apply with radix.

So in v2 I retain the cleanups and discard the more significant change
which accommodated the misbehaving RTAS versions.

Changes since v1:
- Correct missing conversion of RTAS_RMOBUF_MAX ->
  RTAS_USER_REGION_SIZE in in_rmo_buf().
- Remove unnecessary braces in rtas_syscall_filter_init().
- Leave expression of RTAS_WORK_AREA_SIZE as-is instead of changing
  the factors in a confusing way, per discussion with Alexey.
- Drop "powerpc/rtas: constrain user region allocation to RMA"

Nathan Lynch (5):
  powerpc/rtas: improve ppc_rtas_rmo_buf_show documentation
  powerpc/rtas-proc: remove unused RMO_READ_BUF_MAX
  powerpc/rtas: remove ibm_suspend_me_token
  powerpc/rtas: move syscall filter setup into separate function
  powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE

 arch/powerpc/include/asm/rtas.h |  6 +++---
 arch/powerpc/kernel/rtas-proc.c | 15 +++
 arch/powerpc/kernel/rtas.c  | 34 +
 3 files changed, 32 insertions(+), 23 deletions(-)

-- 
2.30.2



Re: [PATCH v1 1/1] kernel.h: Split out panic and oops helpers

2021-04-08 Thread Andy Shevchenko
On Thu, Apr 08, 2021 at 02:45:12PM +0200, Rasmus Villemoes wrote:
> On 06/04/2021 15.31, Andy Shevchenko wrote:
> > kernel.h is being used as a dump for all kinds of stuff for a long time.
> > Here is the attempt to start cleaning it up by splitting out panic and
> > oops helpers.
> 
> Yay.
> 
> Acked-by: Rasmus Villemoes 

Thanks!

> > At the same time convert users in header and lib folders to use the new header.
> > Though for the time being, include the new header back into kernel.h to avoid
> > twisted indirect includes for existing users.
> 
> I think it would be good to have some place to note that "This #include
> is just for backwards compatibility, it will go away RealSoonNow, so if
> you rely on something from linux/panic.h, include that explicitly
> yourself TYVM. And if you're looking for a janitorial task, write a
> script to check that every file that uses some identifier defined in
> panic.h actually includes that file. When all offenders are found and
> dealt with, remove the #include and this note.".

Good and...

> > +struct taint_flag {
> > +   char c_true;/* character printed when tainted */
> > +   char c_false;   /* character printed when not tainted */
> > +   bool module;/* also show as a per-module taint flag */
> > +};
> > +
> > +extern const struct taint_flag taint_flags[TAINT_FLAGS_COUNT];
> 
> While you're doing this, nothing outside of kernel/panic.c cares about
> the definition of struct taint_flag or use the taint_flags array, so
> could you make the definition private to that file and make the array
> static? (Another patch, of course.)

...according to the above if *you are looking for a janitorial task*... :-))

> > +enum lockdep_ok {
> > +   LOCKDEP_STILL_OK,
> > +   LOCKDEP_NOW_UNRELIABLE,
> > +};
> > +
> > +extern const char *print_tainted(void);
> > +extern void add_taint(unsigned flag, enum lockdep_ok);
> > +extern int test_taint(unsigned flag);
> > +extern unsigned long get_taint(void);
> 
> I know you're just moving code, but it would be a nice opportunity to
> drop the redundant externs.

As above. But for all these I have heard you. So, I'll keep this response
as part of my always only growing TODO list.

-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH v1 1/1] kernel.h: Split out panic and oops helpers

2021-04-08 Thread Rasmus Villemoes
On 06/04/2021 15.31, Andy Shevchenko wrote:
> kernel.h is being used as a dump for all kinds of stuff for a long time.
> Here is the attempt to start cleaning it up by splitting out panic and
> oops helpers.

Yay.

Acked-by: Rasmus Villemoes 

> At the same time convert users in header and lib folders to use the new header.
> Though for the time being, include the new header back into kernel.h to avoid
> twisted indirect includes for existing users.

I think it would be good to have some place to note that "This #include
is just for backwards compatibility, it will go away RealSoonNow, so if
you rely on something from linux/panic.h, include that explicitly
yourself TYVM. And if you're looking for a janitorial task, write a
script to check that every file that uses some identifier defined in
panic.h actually includes that file. When all offenders are found and
dealt with, remove the #include and this note.".
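For the janitorial task described above, the script could start as small as this (a sketch only; the identifier list is a tiny illustrative subset of panic.h, and real tooling would likely build the list from the header itself):

```shell
#!/bin/sh
# Hypothetical checker: list files that use panic.h identifiers but
# never include <linux/panic.h> directly.  The identifier list below
# is an illustrative subset, not the full panic.h surface.
check_panic_includes() {
    idents='add_taint|test_taint|get_taint|print_tainted|taint_flag|LOCKDEP_NOW_UNRELIABLE'
    grep -rlE "$idents" --include='*.c' "$1" | while read -r f; do
        grep -q '#include <linux/panic\.h>' "$f" || echo "$f"
    done
}
```

Run against a tree, the function prints every offender that relies on the transitional kernel.h include.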

> +
> +struct taint_flag {
> + char c_true;/* character printed when tainted */
> + char c_false;   /* character printed when not tainted */
> + bool module;/* also show as a per-module taint flag */
> +};
> +
> +extern const struct taint_flag taint_flags[TAINT_FLAGS_COUNT];

While you're doing this, nothing outside of kernel/panic.c cares about
the definition of struct taint_flag or use the taint_flags array, so
could you make the definition private to that file and make the array
static? (Another patch, of course.)

> +enum lockdep_ok {
> + LOCKDEP_STILL_OK,
> + LOCKDEP_NOW_UNRELIABLE,
> +};
> +
> +extern const char *print_tainted(void);
> +extern void add_taint(unsigned flag, enum lockdep_ok);
> +extern int test_taint(unsigned flag);
> +extern unsigned long get_taint(void);

I know you're just moving code, but it would be a nice opportunity to
drop the redundant externs.

Rasmus


Re: [PATCH 1/1] powerpc/smp: Set numa node before updating mask

2021-04-08 Thread Srikar Dronamraju
* Nathan Lynch  [2021-04-07 14:46:24]:

> Srikar Dronamraju  writes:
> 
> > * Nathan Lynch  [2021-04-07 07:19:10]:
> >
> >> Sorry for the delay in following up here.
> >> 
> >
> > No issues.
> >
> >> >> So I'd suggest that pseries_add_processor() be made to update
> >> >> these things when the CPUs are marked present, before onlining them.
> >> >
> >> > In pseries_add_processor, we are only marking the cpu as present. i.e
> >> > I believe numa_setup_cpu() would not have been called. So we may not 
> >> > have a
> >> > way to associate the CPU to the node. Otherwise we will have to call
> >> > numa_setup_cpu() or the hcall_vphn.
> >> >
> >> > We could try calling numa_setup_cpu() immediately after we set the
> >> > CPU to be present, but that would be one more extra hcall + I dont know 
> >> > if
> >> > there are any more steps needed before CPU being made present and
> >> > associating the CPU to the node.
> >> 
> >> An additional hcall in this path doesn't seem too expensive.
> >> 
> >> > Are we sure the node is already online?
> >> 
> >> I see that dlpar_online_cpu() calls find_and_online_cpu_nid(), so yes I
> >> think that's covered.
> >
> > Okay, 
> >
> > Can we just call set_cpu_numa_node() at the end of map_cpu_to_node().
> > The advantage would be the update to numa_cpu_lookup_table and cpu_to_node
> > would happen at the same time and would be in sync.
> 
> I don't know. I guess this question just makes me wonder whether powerpc
> needs to have the additional lookup table. How is it different from the
> generic per_cpu numa_node?

The lookup table is for the early cpu-to-node mapping, i.e. for when per_cpu
variables may not yet be available. This means that calling
set_numa_node()/set_cpu_numa_node() from map_cpu_to_node() may not always be
an option, since map_cpu_to_node() does end up getting called very early in
boot.
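To make the early-vs-per-cpu distinction concrete, here is a toy model of the two maps under discussion (the array name mirrors the kernel's, but the sizes and the readiness flag are invented for illustration):

```c
#include <assert.h>

#define NR_CPUS 8

/* Early-boot map: a plain array, usable before per-CPU areas exist. */
static int numa_cpu_lookup_table[NR_CPUS];

/* Stand-in for the generic per_cpu numa_node, which only becomes
 * usable once per-CPU areas are set up. */
static int percpu_numa_node[NR_CPUS];
static int percpu_ready;	/* 0 early in boot */

/*
 * Doing both updates in one helper keeps the maps in sync -- the catch
 * being discussed is that this helper can run before percpu_ready.
 */
static void map_cpu_to_node(int cpu, int node)
{
	numa_cpu_lookup_table[cpu] = node;
	if (percpu_ready)
		percpu_numa_node[cpu] = node;
}
```

The early call leaves the per-CPU side untouched, which is exactly why the two can drift apart if they are updated in separate places.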

-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH 2/2] powerpc: make 'boot_text_mapped' static

2021-04-08 Thread yukuai (C)

On 2021/04/08 13:04, Christophe Leroy wrote:



On 08/04/2021 at 03:18, Yu Kuai wrote:

The sparse tool complains as follow:

arch/powerpc/kernel/btext.c:48:5: warning:
  symbol 'boot_text_mapped' was not declared. Should it be static?

This symbol is not used outside of btext.c, so this commit makes
it static.

Signed-off-by: Yu Kuai 
---
  arch/powerpc/kernel/btext.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/btext.c b/arch/powerpc/kernel/btext.c
index 359d0f4ca532..8df9230be6fa 100644
--- a/arch/powerpc/kernel/btext.c
+++ b/arch/powerpc/kernel/btext.c
@@ -45,7 +45,7 @@ unsigned long disp_BAT[2] __initdata = {0, 0};
  static unsigned char vga_font[cmapsz];
-int boot_text_mapped __force_data = 0;
+static int boot_text_mapped __force_data;


Are you sure the initialisation to 0 can be removed? Usually an 
initialisation to 0 is not needed because uninitialised variables go 
in the BSS section, which is zeroed at startup. But here the variable is 
flagged with __force_data, so it is not going in the BSS section.
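The point can be sketched outside the kernel with a plain GCC section attribute standing in for __force_data (the section name .btext_data is invented for the example; this is not the kernel's definition):

```c
#include <assert.h>

/* Stand-in for the kernel's __force_data: force the variable into a
 * named data section instead of letting the compiler choose.  The
 * section name is invented for this example. */
#define __force_data __attribute__((section(".btext_data")))

/*
 * Without the attribute, a zero-initialised static would be a BSS
 * candidate and get its zero from startup BSS clearing.  With a
 * section attribute the variable is no longer a plain BSS candidate,
 * so whether it may safely lose its "= 0" is not obvious -- keeping
 * the explicit initializer is the conservative choice.
 */
static int boot_text_mapped __force_data = 0;

static int read_boot_text_mapped(void)
{
	return boot_text_mapped;
}
```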


Hi,

I removed the initialisation to 0 because checkpatch complained about
it. I am not familiar with '__force_data'; thanks for pointing it out.

Thanks,
Yu Kuai




  extern void rmci_on(void);
  extern void rmci_off(void);





Re: [PATCH 1/2] powerpc: remove set but not used variable 'force_printk_to_btext'

2021-04-08 Thread yukuai (C)

On 2021/04/08 13:01, Christophe Leroy wrote:



On 08/04/2021 at 03:18, Yu Kuai wrote:

Fixes gcc '-Wunused-but-set-variable' warning:

arch/powerpc/kernel/btext.c:49:12: error: 'force_printk_to_btext'
defined but not used.


You don't get this error as it is now.
You will get this error only if you make it 'static', which is what you 
did in your first patch based on the 'sparse' report.


When removing a non-static variable, you should explain that you can 
remove it after you have verified that it is used nowhere, neither in 
that file nor in any other one.


Hi,

I did use 'git grep force_printk_to_btext' to confirm that
'force_printk_to_btext' is not used anywhere. Maybe it's better to
mention that in the commit message?

Thanks
Yu Kuai




It is never used, and so can be removed.

Signed-off-by: Yu Kuai 
---
  arch/powerpc/kernel/btext.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/btext.c b/arch/powerpc/kernel/btext.c
index 803c2a45b22a..359d0f4ca532 100644
--- a/arch/powerpc/kernel/btext.c
+++ b/arch/powerpc/kernel/btext.c
@@ -46,7 +46,6 @@ unsigned long disp_BAT[2] __initdata = {0, 0};
  static unsigned char vga_font[cmapsz];
  int boot_text_mapped __force_data = 0;
-int force_printk_to_btext = 0;
  extern void rmci_on(void);
  extern void rmci_off(void);





Re: [PATCH v2 1/1] powerpc/iommu: Enable remaining IOMMU Pagesizes present in LoPAR

2021-04-08 Thread Michael Ellerman
Leonardo Bras  writes:
> On Thu, 2021-04-08 at 03:20 -0300, Leonardo Bras wrote:
>> > > +#define QUERY_DDW_PGSIZE_4K 0x01
>> > > +#define QUERY_DDW_PGSIZE_64K0x02
>> > > +#define QUERY_DDW_PGSIZE_16M0x04
>> > > +#define QUERY_DDW_PGSIZE_32M0x08
>> > > +#define QUERY_DDW_PGSIZE_64M0x10
>> > > +#define QUERY_DDW_PGSIZE_128M   0x20
>> > > +#define QUERY_DDW_PGSIZE_256M   0x40
>> > > +#define QUERY_DDW_PGSIZE_16G0x80
>> > 
>> > I'm not sure the #defines really gain us much vs just putting the
>> > literal values in the array below?
>> 
>> My v1 did not use the define approach, what do you think of that?
>> http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210322190943.715368-1-leobra...@gmail.com/
>> 
>> 
> (of course, it would be that without the pageshift defines also, using
> the __builtin_ctz() approach suggested by Alexey.)

Yeah I think I like that better.

cheers


Re: [PATCH v2 1/1] powerpc/iommu: Enable remaining IOMMU Pagesizes present in LoPAR

2021-04-08 Thread Michael Ellerman
Alexey Kardashevskiy  writes:
> On 08/04/2021 15:37, Michael Ellerman wrote:
>> Leonardo Bras  writes:
>>> According to LoPAR, ibm,query-pe-dma-window output named "IO Page Sizes"
>>> will let the OS know all possible pagesizes that can be used for creating a
>>> new DDW.
>>>
>>> Currently Linux will only try using 3 of the 8 available options:
>>> 4K, 64K and 16M. According to LoPAR, Hypervisor may also offer 32M, 64M,
>>> 128M, 256M and 16G.
>> 
>> Do we know of any hardware & hypervisor combination that will actually
>> give us bigger pages?
>
>
> On P8 16MB host pages and 16MB hardware iommu pages worked.
>
> On P9, VM's 16MB IOMMU pages worked on top of 2MB host pages + 2MB 
> hardware IOMMU pages.

The current code already tries 16MB though.

I'm wondering if we're going to ask for larger sizes that have never
been tested and possibly expose bugs. But it sounds like this is mainly
targeted at future platforms.


>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
>>> b/arch/powerpc/platforms/pseries/iommu.c
>>> index 9fc5217f0c8e..6cda1c92597d 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -53,6 +53,20 @@ enum {
>>> DDW_EXT_QUERY_OUT_SIZE = 2
>>>   };
>> 
>> A comment saying where the values come from would be good.
>> 
>>> +#define QUERY_DDW_PGSIZE_4K0x01
>>> +#define QUERY_DDW_PGSIZE_64K   0x02
>>> +#define QUERY_DDW_PGSIZE_16M   0x04
>>> +#define QUERY_DDW_PGSIZE_32M   0x08
>>> +#define QUERY_DDW_PGSIZE_64M   0x10
>>> +#define QUERY_DDW_PGSIZE_128M  0x20
>>> +#define QUERY_DDW_PGSIZE_256M  0x40
>>> +#define QUERY_DDW_PGSIZE_16G   0x80
>> 
>> I'm not sure the #defines really gain us much vs just putting the
>> literal values in the array below?
>
> Then someone says "u magic values" :) I do not mind either way. Thanks,

Yeah that's true. But #defining them doesn't make them less magic, if
you only use them in one place :)

cheers


Re: [PATCH v1 2/8] powerpc/mem: Remove address argument to flush_coherent_icache()

2021-04-08 Thread Aneesh Kumar K.V
Christophe Leroy  writes:

> flush_coherent_icache() can use any valid address as mentioned
> by the comment.
>
> Use PAGE_OFFSET as base address. This allows removing the
> user access stuff.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/mm/mem.c | 13 +
>  1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index ce6c81ce4362..19f807b87697 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -342,10 +342,9 @@ void free_initmem(void)
>  
>  /**
>   * flush_coherent_icache() - if a CPU has a coherent icache, flush it
> - * @addr: The base address to use (can be any valid address, the whole cache 
> will be flushed)
>   * Return true if the cache was flushed, false otherwise
>   */
> -static inline bool flush_coherent_icache(unsigned long addr)
> +static inline bool flush_coherent_icache(void)
>  {
>   /*
>* For a snooping icache, we still need a dummy icbi to purge all the
> @@ -355,9 +354,7 @@ static inline bool flush_coherent_icache(unsigned long 
> addr)
>*/
>   if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) {
>   mb(); /* sync */
> - allow_read_from_user((const void __user *)addr, L1_CACHE_BYTES);
> - icbi((void *)addr);
> - prevent_read_from_user((const void __user *)addr, 
> L1_CACHE_BYTES);
> + icbi((void *)PAGE_OFFSET);
>   mb(); /* sync */
>   isync();
>   return true;

Do we need that follow-up sync? The user manual suggests a sync; icbi(any
address); isync sequence.

-aneesh


Re: [PATCH v2 1/1] powerpc/iommu: Enable remaining IOMMU Pagesizes present in LoPAR

2021-04-08 Thread kernel test robot
Hi Leonardo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.12-rc6 next-20210407]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Leonardo-Bras/powerpc-iommu-Enable-remaining-IOMMU-Pagesizes-present-in-LoPAR/20210408-035800
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-randconfig-r016-20210407 (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/faa8b10e5b9652dbd56ed8e759a1cc09b95805be
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Leonardo-Bras/powerpc-iommu-Enable-remaining-IOMMU-Pagesizes-present-in-LoPAR/20210408-035800
git checkout faa8b10e5b9652dbd56ed8e759a1cc09b95805be
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   In file included from include/vdso/const.h:5,
from include/linux/const.h:4,
from include/linux/bits.h:5,
from include/linux/bitops.h:6,
from include/linux/kernel.h:11,
from include/asm-generic/bug.h:20,
from arch/powerpc/include/asm/bug.h:109,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/gfp.h:5,
from include/linux/slab.h:15,
from arch/powerpc/platforms/pseries/iommu.c:15:
   arch/powerpc/platforms/pseries/iommu.c: In function 'iommu_get_page_shift':
>> include/uapi/linux/const.h:20:19: error: conversion from 'long long unsigned 
>> int' to 'unsigned int' changes value from '17179869184' to '0' 
>> [-Werror=overflow]
  20 | #define __AC(X,Y) (X##Y)
 |   ^~
   include/uapi/linux/const.h:21:18: note: in expansion of macro '__AC'
  21 | #define _AC(X,Y) __AC(X,Y)
 |  ^~~~
   include/linux/sizes.h:48:19: note: in expansion of macro '_AC'
   48 | #define SZ_16G  _AC(0x400000000, ULL)
 |   ^~~
   arch/powerpc/platforms/pseries/iommu.c:1120:42: note: in expansion of macro 
'SZ_16G'
1120 |   { QUERY_DDW_PGSIZE_16G,  __builtin_ctz(SZ_16G)  },
 |  ^~
   cc1: all warnings being treated as errors
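The diagnostic boils down to __builtin_ctz() taking an unsigned int while SZ_16G needs 35 bits; a minimal sketch of the truncation (assuming a fix would switch to the 64-bit builtin):

```c
#include <assert.h>

/* SZ_16G as defined in include/linux/sizes.h */
#define SZ_16G 0x400000000ULL

/* The truncation the robot reports: __builtin_ctz() takes an unsigned
 * int, so 0x400000000 is cut down to 0 before any counting happens
 * (and __builtin_ctz(0) is undefined behaviour anyway). */
static unsigned int truncated_16g(void)
{
	return (unsigned int)SZ_16G;
}

/* The 64-bit builtin sees the full value: 16G = 2^34. */
static int shift_of_16g(void)
{
	return __builtin_ctzll(SZ_16G);
}
```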


vim +20 include/uapi/linux/const.h

9d291e787b2b71 include/asm-x86_64/const.h Vivek Goyal 2007-05-02   6  
9d291e787b2b71 include/asm-x86_64/const.h Vivek Goyal 2007-05-02   7  
/* Some constant macros are used in both assembler and
9d291e787b2b71 include/asm-x86_64/const.h Vivek Goyal 2007-05-02   8   
* C code.  Therefore we cannot annotate them always with
6df95fd7ad9a84 include/linux/const.h  Randy Dunlap2007-05-08   9   
* 'UL' and other type specifiers unilaterally.  We
9d291e787b2b71 include/asm-x86_64/const.h Vivek Goyal 2007-05-02  10   
* use the following macros to deal with this.
74ef649fe847fd include/linux/const.h  Jeremy Fitzhardinge 2008-01-30  11   *
74ef649fe847fd include/linux/const.h  Jeremy Fitzhardinge 2008-01-30  12   
* Similarly, _AT() will cast an expression with a type in C, but
74ef649fe847fd include/linux/const.h  Jeremy Fitzhardinge 2008-01-30  13   
* leave it unchanged in asm.
9d291e787b2b71 include/asm-x86_64/const.h Vivek Goyal 2007-05-02  14   
*/
9d291e787b2b71 include/asm-x86_64/const.h Vivek Goyal 2007-05-02  15  
9d291e787b2b71 include/asm-x86_64/const.h Vivek Goyal 2007-05-02  16  
#ifdef __ASSEMBLY__
9d291e787b2b71 include/asm-x86_64/const.h Vivek Goyal 2007-05-02  17  
#define _AC(X,Y)  X
74ef649fe847fd include/linux/const.h  Jeremy Fitzhardinge 2008-01-30  18  
#define _AT(T,X)  X
9d291e787b2b71 include/asm-x86_64/const.h Vivek Goyal 2007-05-02  19  
#else
9d291e787b2b71 include/asm-x86_64/const.h Vivek Goyal 2007-05-02 @20  
#define __AC(X,Y) (X##Y)
9d291e787b2b71 include/asm-x86_64/const.h Vivek Goyal 2007-05-02  21  
#define _AC(X,Y)  __AC(X,Y)
74ef649fe847fd include/linux/const.h  Jeremy Fitzhardinge 2008-01-30  22  
#define _AT(T,X)  ((T)(X))
9d291e787b2b71 include/asm-x86_64/cons

[PATCH v4 1/2] powerpc/perf: Infrastructure to support checking of attr.config*

2021-04-08 Thread Madhavan Srinivasan
Introduce code to support the checking of attr.config* for
values which are reserved for a given platform.
Performance Monitoring Unit (PMU) configuration registers
have fields that are reserved and some specific values for
bit fields are reserved. For ex., MMCRA[61:62] is
Random Sampling Mode (SM) and value of 0b11 for this field
is reserved.

Writing non-zero or invalid values in these fields will
have unknown behaviours.

Patch adds a generic call-back function "check_attr_config"
in "struct power_pmu", to be called in event_init to
check for attr.config* values for a given platform.

Signed-off-by: Madhavan Srinivasan 
---
Changelog v3:
-Made check_attr_config() be called for all event types instead of
 only for the raw event type.

Changelog v2:
-Fixed commit message

Changelog v1:
-Fixed commit message and in-code comments

 arch/powerpc/include/asm/perf_event_server.h |  6 ++
 arch/powerpc/perf/core-book3s.c  | 11 +++
 2 files changed, 17 insertions(+)

diff --git a/arch/powerpc/include/asm/perf_event_server.h 
b/arch/powerpc/include/asm/perf_event_server.h
index 00e7e671bb4b..dde97d7d9253 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -67,6 +67,12 @@ struct power_pmu {
 * the pmu supports extended perf regs capability
 */
int capabilities;
+   /*
+* Function to check event code for values which are
+* reserved. Function takes struct perf_event as input,
+* since event code could be spread in attr.config*
+*/
+   int (*check_attr_config)(struct perf_event *ev);
 };
 
 /*
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 766f064f00fb..b17358e8dc12 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1963,6 +1963,17 @@ static int power_pmu_event_init(struct perf_event *event)
return -ENOENT;
}
 
+   /*
+* PMU config registers have fields that are
+* reserved and some specific values for bit fields are reserved.
+* For ex., MMCRA[61:62] is Random Sampling Mode (SM)
+* and a value of 0b11 for this field is reserved.
+* Check for invalid values in attr.config.
+*/
+   if (ppmu->check_attr_config &&
+   ppmu->check_attr_config(event))
+   return -EINVAL;
+
event->hw.config_base = ev;
event->hw.idx = 0;
 
-- 
2.26.2



[PATCH 2/2] powerpc/perf: Add platform specific check_attr_config

2021-04-08 Thread Madhavan Srinivasan
Add platform specific attr.config value checks. Patch
includes checks for both power9 and power10.

Signed-off-by: Madhavan Srinivasan 
---
Changelog v3:
- No changes

Changelog v2:
- Changed function name as suggested.
- Added name of source document referred for reserved values

Changelog v1:
- No changes

 arch/powerpc/perf/isa207-common.c | 42 +++
 arch/powerpc/perf/isa207-common.h |  2 ++
 arch/powerpc/perf/power10-pmu.c   | 13 ++
 arch/powerpc/perf/power9-pmu.c| 13 ++
 4 files changed, 70 insertions(+)

diff --git a/arch/powerpc/perf/isa207-common.c 
b/arch/powerpc/perf/isa207-common.c
index e4f577da33d8..358a0e95ba5f 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -694,3 +694,45 @@ int isa207_get_alternatives(u64 event, u64 alt[], int 
size, unsigned int flags,
 
return num_alt;
 }
+
+int isa3XX_check_attr_config(struct perf_event *ev)
+{
+   u64 val, sample_mode;
+   u64 event = ev->attr.config;
+
+   val = (event >> EVENT_SAMPLE_SHIFT) & EVENT_SAMPLE_MASK;
+   sample_mode = val & 0x3;
+
+   /*
+* MMCRA[61:62] is Random Sampling Mode (SM).
+* value of 0b11 is reserved.
+*/
+   if (sample_mode == 0x3)
+   return -EINVAL;
+
+   /*
+* Check for all reserved value
+* Source: Performance Monitoring Unit User Guide
+*/
+   switch (val) {
+   case 0x5:
+   case 0x9:
+   case 0xD:
+   case 0x19:
+   case 0x1D:
+   case 0x1A:
+   case 0x1E:
+   return -EINVAL;
+   }
+
+   /*
+* MMCRA[48:51]/[52:55]) Threshold Start/Stop
+* Events Selection.
+* 0b11110000/0b00001111 is reserved.
+*/
+   val = (event >> EVENT_THR_CTL_SHIFT) & EVENT_THR_CTL_MASK;
+   if (((val & 0xF0) == 0xF0) || ((val & 0xF) == 0xF))
+   return -EINVAL;
+
+   return 0;
+}
diff --git a/arch/powerpc/perf/isa207-common.h 
b/arch/powerpc/perf/isa207-common.h
index 1af0e8c97ac7..b4d2a2b2b346 100644
--- a/arch/powerpc/perf/isa207-common.h
+++ b/arch/powerpc/perf/isa207-common.h
@@ -280,4 +280,6 @@ void isa207_get_mem_data_src(union perf_mem_data_src *dsrc, 
u32 flags,
struct pt_regs *regs);
 void isa207_get_mem_weight(u64 *weight);
 
+int isa3XX_check_attr_config(struct perf_event *ev);
+
 #endif
diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c
index a901c1348cad..f9d64c63bb4a 100644
--- a/arch/powerpc/perf/power10-pmu.c
+++ b/arch/powerpc/perf/power10-pmu.c
@@ -106,6 +106,18 @@ static int power10_get_alternatives(u64 event, unsigned 
int flags, u64 alt[])
return num_alt;
 }
 
+static int power10_check_attr_config(struct perf_event *ev)
+{
+   u64 val;
+   u64 event = ev->attr.config;
+
+   val = (event >> EVENT_SAMPLE_SHIFT) & EVENT_SAMPLE_MASK;
+   if (val == 0x10 || isa3XX_check_attr_config(ev))
+   return -EINVAL;
+
+   return 0;
+}
+
 GENERIC_EVENT_ATTR(cpu-cycles, PM_RUN_CYC);
 GENERIC_EVENT_ATTR(instructions,   PM_RUN_INST_CMPL);
 GENERIC_EVENT_ATTR(branch-instructions,PM_BR_CMPL);
@@ -559,6 +571,7 @@ static struct power_pmu power10_pmu = {
.attr_groups= power10_pmu_attr_groups,
.bhrb_nr= 32,
.capabilities   = PERF_PMU_CAP_EXTENDED_REGS,
+   .check_attr_config  = power10_check_attr_config,
 };
 
 int init_power10_pmu(void)
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 2a57e93a79dc..ff3382140d7e 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -151,6 +151,18 @@ static int power9_get_alternatives(u64 event, unsigned int 
flags, u64 alt[])
return num_alt;
 }
 
+static int power9_check_attr_config(struct perf_event *ev)
+{
+   u64 val;
+   u64 event = ev->attr.config;
+
+   val = (event >> EVENT_SAMPLE_SHIFT) & EVENT_SAMPLE_MASK;
+   if (val == 0xC || isa3XX_check_attr_config(ev))
+   return -EINVAL;
+
+   return 0;
+}
+
 GENERIC_EVENT_ATTR(cpu-cycles, PM_CYC);
 GENERIC_EVENT_ATTR(stalled-cycles-frontend,PM_ICT_NOSLOT_CYC);
 GENERIC_EVENT_ATTR(stalled-cycles-backend, PM_CMPLU_STALL);
@@ -437,6 +449,7 @@ static struct power_pmu power9_pmu = {
.attr_groups= power9_pmu_attr_groups,
.bhrb_nr= 32,
.capabilities   = PERF_PMU_CAP_EXTENDED_REGS,
+   .check_attr_config  = power9_check_attr_config,
 };
 
 int init_power9_pmu(void)
-- 
2.26.2
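The reserved-field logic in isa3XX_check_attr_config() above can be exercised standalone; the sketch below hard-codes EVENT_SAMPLE_SHIFT/EVENT_SAMPLE_MASK values assumed from isa207-common.h, so treat them as illustrative rather than authoritative:

```c
#include <assert.h>
#include <stdint.h>

/* Shift/mask assumed from isa207-common.h -- illustrative values. */
#define EVENT_SAMPLE_SHIFT	24
#define EVENT_SAMPLE_MASK	0x1f
#define EINVAL			22

/* Standalone version of the MMCRA[61:62] Random Sampling Mode check:
 * an SM field of 0b11 is reserved, so such event codes are rejected. */
static int check_sample_mode(uint64_t event)
{
	uint64_t val = (event >> EVENT_SAMPLE_SHIFT) & EVENT_SAMPLE_MASK;

	if ((val & 0x3) == 0x3)
		return -EINVAL;
	return 0;
}
```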



Re: [PATCH v6 00/48] KVM: PPC: Book3S: C-ify the P9 entry/exit code

2021-04-08 Thread Nicholas Piggin
Excerpts from Nicholas Piggin's message of April 5, 2021 11:19 am:
> Git tree here
> 
> https://github.com/npiggin/linux/tree/kvm-in-c-v6
> 

In the interest of making things more manageable, I would like to submit 
some initial things for merge, which have mostly had pretty good review
(I'll repost them in a new series or set of series if there is no
objection, rather than pick from this series).

> Nicholas Piggin (48):
>   KVM: PPC: Book3S HV: Nested move LPCR sanitising to sanitise_hv_regs
>   KVM: PPC: Book3S HV: Add a function to filter guest LPCR bits
>   KVM: PPC: Book3S HV: Disallow LPCR[AIL] to be set to 1 or 2
>   KVM: PPC: Book3S HV: Prevent radix guests setting LPCR[TC]
>   KVM: PPC: Book3S HV: Remove redundant mtspr PSPB
>   KVM: PPC: Book3S HV: remove unused kvmppc_h_protect argument
>   KVM: PPC: Book3S HV: Fix CONFIG_SPAPR_TCE_IOMMU=n default hcalls
>   powerpc/64s: Remove KVM handler support from CBE_RAS interrupts
>   powerpc/64s: remove KVM SKIP test from instruction breakpoint handler
>   KVM: PPC: Book3S HV: Ensure MSR[ME] is always set in guest MSR
>   KVM: PPC: Book3S HV: Ensure MSR[HV] is always clear in guest MSR

1-11 are pretty small, mostly isolated improvements.

>   KVM: PPC: Book3S 64: move KVM interrupt entry to a common entry point
>   KVM: PPC: Book3S 64: Move GUEST_MODE_SKIP test into KVM
>   KVM: PPC: Book3S 64: add hcall interrupt handler
>   KVM: PPC: Book3S 64: Move hcall early register setup to KVM
>   KVM: PPC: Book3S 64: Move interrupt early register setup to KVM
>   KVM: PPC: Book3S 64: move bad_host_intr check to HV handler
>   KVM: PPC: Book3S 64: Minimise hcall handler calling convention
> differences

12-18 includes all the exception-64s.S <-> KVM API changes required. I 
think these changes are improvements in their own right, certainly the 
exception-64s.S side is far nicer.

>   KVM: PPC: Book3S HV P9: Move radix MMU switching instructions together

19 I would like to include because these MMU SPRs have a special 
relationship that can't just be set in any order. This code is also much 
better suited to sim and prototyping work for proposed changes to the 
MMU context switching architecture.

>   KVM: PPC: Book3S HV P9: implement kvmppc_xive_pull_vcpu in C
>   KVM: PPC: Book3S HV P9: Move xive vcpu context management into
> kvmhv_p9_guest_entry

20-21 are stand-alone, I think they're good. Existing asm is duplicated 
in C but the C documents it and anyway matches its inverse which is 
already in C.

And I think it's better to be doing CI MMIOs while we're still mostly in 
host context.

>   KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9
> path

22 moves down together with "Implement the rest of the P9 path in C".

>   KVM: PPC: Book3S HV P9: Move setting HDEC after switching to guest
> LPCR
>   KVM: PPC: Book3S HV P9: Use large decrementer for HDEC
>   KVM: PPC: Book3S HV P9: Use host timer accounting to avoid decrementer
> read
>   KVM: PPC: Book3S HV P9: Reduce mftb per guest entry/exit
>   KVM: PPC: Book3S HV P9: Reduce irq_work vs guest decrementer races
>   KMV: PPC: Book3S HV: Use set_dec to set decrementer to host
>   powerpc/time: add API for KVM to re-arm the host timer/decrementer

23-29 try to get all these timekeeping things in. They ended up being 
mostly unrelated to the C conversion but the way I started out writing 
the C conversion, these changes fell out and ended up collecting here. I 
think they're generally improvements.

That leaves about 20 patches remaining. Of those, only about the first 5 
are necessary to reimplement the existing P9 path functionality in C, 
which is a lot less scary than nearly 50.

Thanks,
Nick

>   KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C
>   KVM: PPC: Book3S HV P9: inline kvmhv_load_hv_regs_and_go into
> __kvmhv_vcpu_entry_p9
>   KVM: PPC: Book3S HV P9: Read machine check registers while MSR[RI] is
> 0
>   KVM: PPC: Book3S HV P9: Improve exit timing accounting coverage
>   KVM: PPC: Book3S HV P9: Move SPR loading after expiry time check
>   KVM: PPC: Book3S HV P9: Add helpers for OS SPR handling
>   KVM: PPC: Book3S HV P9: Switch to guest MMU context as late as
> possible
>   KVM: PPC: Book3S HV: Implement radix prefetch workaround by disabling
> MMU
>   KVM: PPC: Book3S HV: Remove support for dependent threads mode on P9
>   KVM: PPC: Book3S HV: Remove radix guest support from P7/8 path
>   KVM: PPC: Book3S HV: Remove virt mode checks from real mode handlers
>   KVM: PPC: Book3S HV: Remove unused nested HV tests in XICS emulation
>   KVM: PPC: Book3S HV P9: Allow all P9 processors to enable nested HV
>   KVM: PPC: Book3S HV: small pseries_do_hcall cleanup
>   KVM: PPC: Book3S HV: add virtual mode handlers for HPT hcalls and page
> faults
>   KVM: PPC: Book3S HV P9: Reflect userspace hcalls to hash guests to
> support PR KVM
>   KVM: PPC: Book3S HV P9: implement hash guest support
>   KVM: PPC: Book3S HV P9: implement hash host

Re: [PATCH-next] powerpc/interrupt: Remove duplicate header file

2021-04-08 Thread Chenyi (Johnny)




On 2021/4/8 12:57, Christophe Leroy wrote:



On 08/04/2021 at 05:56, johnny.che...@huawei.com wrote:

From: Chen Yi 

Delete one of the header files  that are included
twice.


Guys, we have been flooded with such tiny patches over the last weeks, 
some changes being sent several times by different people.


That one is included in 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210323062916.295346-1-wanjiab...@vivo.com/ 



And was already submitted a few hours earlier by someone else: 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/1616464656-59372-1-git-send-email-zhouchuan...@vivo.com/ 



Could you all work together and cook an overall patch including all 
duplicate removals from arch/powerpc/ files?


The best way, I think, would be to file an issue at 
https://github.com/linuxppc/issues/issues , then do a complete 
analysis and list in the issue all places to be modified; once the 
analysis is complete, send a full single patch.
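The analysis step suggested above lends itself to a quick script (a rough sketch, not a polished tool -- it only catches byte-identical duplicate #include lines):

```shell
#!/bin/sh
# Rough sketch for the analysis step: print each file under the given
# directory that contains a byte-identical #include line twice.
find_dup_includes() {
    find "$1" \( -name '*.c' -o -name '*.h' \) | while read -r f; do
        dups=$(grep -E '^#include' "$f" | sort | uniq -d)
        if [ -n "$dups" ]; then
            printf '%s: %s\n' "$f" "$dups"
        fi
    done
}
```

Its output would be the list of places to record in the issue before cooking the single overall patch.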


Thanks
Christophe


Dear Christophe,
	Thanks for your reply. I have checked that there are no header files 
which have been included twice by mistake in arch/powerpc/. I will file 
an issue next time.


Best regards,
Chen Yi




Signed-off-by: Chen Yi 
---
  arch/powerpc/kernel/interrupt.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/interrupt.c 
b/arch/powerpc/kernel/interrupt.c

index c4dd4b8f9cfa..f64ace0208b7 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -7,7 +7,6 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  #include 
  #include 




Re: [PATCH v2 1/1] powerpc/iommu: Enable remaining IOMMU Pagesizes present in LoPAR

2021-04-08 Thread Alexey Kardashevskiy




On 08/04/2021 15:37, Michael Ellerman wrote:

Leonardo Bras  writes:

According to LoPAR, ibm,query-pe-dma-window output named "IO Page Sizes"
will let the OS know all possible pagesizes that can be used for creating a
new DDW.

Currently Linux will only try using 3 of the 8 available options:
4K, 64K and 16M. According to LoPAR, Hypervisor may also offer 32M, 64M,
128M, 256M and 16G.


Do we know of any hardware & hypervisor combination that will actually
give us bigger pages?



On P8 16MB host pages and 16MB hardware iommu pages worked.

On P9, VM's 16MB IOMMU pages worked on top of 2MB host pages + 2MB 
hardware IOMMU pages.






Enabling bigger pages would be interesting for direct mapping systems
with a lot of RAM, while using fewer TCE entries.

Signed-off-by: Leonardo Bras 
---
  arch/powerpc/platforms/pseries/iommu.c | 49 ++
  1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 9fc5217f0c8e..6cda1c92597d 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -53,6 +53,20 @@ enum {
DDW_EXT_QUERY_OUT_SIZE = 2
  };


A comment saying where the values come from would be good.


+#define QUERY_DDW_PGSIZE_4K0x01
+#define QUERY_DDW_PGSIZE_64K   0x02
+#define QUERY_DDW_PGSIZE_16M   0x04
+#define QUERY_DDW_PGSIZE_32M   0x08
+#define QUERY_DDW_PGSIZE_64M   0x10
+#define QUERY_DDW_PGSIZE_128M  0x20
+#define QUERY_DDW_PGSIZE_256M  0x40
+#define QUERY_DDW_PGSIZE_16G   0x80


I'm not sure the #defines really gain us much vs just putting the
literal values in the array below?



Then someone says "u magic values" :) I do not mind either way. Thanks,




+struct iommu_ddw_pagesize {
+   u32 mask;
+   int shift;
+};
+
  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
  {
struct iommu_table_group *table_group;
@@ -1099,6 +1113,31 @@ static void reset_dma_window(struct pci_dev *dev, struct 
device_node *par_dn)
 ret);
  }
  
+/* Returns page shift based on "IO Page Sizes" output at ibm,query-pe-dma-window. See LoPAR */

+static int iommu_get_page_shift(u32 query_page_size)
+{
+   const struct iommu_ddw_pagesize ddw_pagesize[] = {
+   { QUERY_DDW_PGSIZE_16G,  __builtin_ctz(SZ_16G)  },
+   { QUERY_DDW_PGSIZE_256M, __builtin_ctz(SZ_256M) },
+   { QUERY_DDW_PGSIZE_128M, __builtin_ctz(SZ_128M) },
+   { QUERY_DDW_PGSIZE_64M,  __builtin_ctz(SZ_64M)  },
+   { QUERY_DDW_PGSIZE_32M,  __builtin_ctz(SZ_32M)  },
+   { QUERY_DDW_PGSIZE_16M,  __builtin_ctz(SZ_16M)  },
+   { QUERY_DDW_PGSIZE_64K,  __builtin_ctz(SZ_64K)  },
+   { QUERY_DDW_PGSIZE_4K,   __builtin_ctz(SZ_4K)   }
+   };



cheers



--
Alexey


[PATCH -next] powerpc/fadump: make symbol 'rtas_fadump_set_regval' static

2021-04-08 Thread Pu Lehui
Fix sparse warnings:

arch/powerpc/platforms/pseries/rtas-fadump.c:250:6: warning:
 symbol 'rtas_fadump_set_regval' was not declared. Should it be static?

Signed-off-by: Pu Lehui 
---
 arch/powerpc/platforms/pseries/rtas-fadump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.c 
b/arch/powerpc/platforms/pseries/rtas-fadump.c
index 81343908ed33..f8f73b47b107 100644
--- a/arch/powerpc/platforms/pseries/rtas-fadump.c
+++ b/arch/powerpc/platforms/pseries/rtas-fadump.c
@@ -247,7 +247,7 @@ static inline int rtas_fadump_gpr_index(u64 id)
return i;
 }
 
-void rtas_fadump_set_regval(struct pt_regs *regs, u64 reg_id, u64 reg_val)
+static void rtas_fadump_set_regval(struct pt_regs *regs, u64 reg_id, u64 
reg_val)
 {
int i;
 
-- 
2.17.1