Re: linux-next: manual merge of the spi tree with the powerpc tree

2021-02-12 Thread Stephen Rothwell
Hi Mark,

On Fri, 12 Feb 2021 12:27:59 + Mark Brown  wrote:
>
> On Fri, Feb 12, 2021 at 03:31:42PM +1100, Stephen Rothwell wrote:
> 
> > BTW Mark: the author's address in 258ea99fe25a uses a non-existent domain :-(
> 
> Ugh, I think that's something gone wrong with b4 :(  A bit late now to
> try to fix it up.

Not sure about that, the email (following the link to lore from the
commit) has the same address (...@public.gmane.com) and that domain
does not exist. In fact the email headers (in lore) look like this:

From: Sergiu Cuciurean 

To: ,
,

Cc: Sergiu Cuciurean


So I am surprised that it was received by anyone.  Maybe gmane has an
internal reply system that is screwed.
-- 
Cheers,
Stephen Rothwell




Re: [PATCH v5 10/10] powerpc/signal64: Use __get_user() to copy sigset_t

2021-02-12 Thread Daniel Axtens
"Christopher M. Riedl"  writes:

> Usually sigset_t is exactly 8B which is a "trivial" size and does not
> warrant using __copy_from_user(). Use __get_user() directly in
> anticipation of future work to remove the trivial size optimizations
> from __copy_from_user(). Calling __get_user() also results in a small
> boost to signal handling throughput here.

Modulo the comments from Christophe, this looks good to me. It seems to
do what it says on the tin.

Reviewed-by: Daniel Axtens 

Do you know if this patch is responsible for the slight increase in
radix performance you observed in your cover letter? The rest of the
series shouldn't really make things faster than the no-KUAP case...

Kind regards,
Daniel

>
> Signed-off-by: Christopher M. Riedl 
> ---
>  arch/powerpc/kernel/signal_64.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 817b64e1e409..42fdc4a7ff72 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -97,6 +97,14 @@ static void prepare_setup_sigcontext(struct task_struct *tsk, int ctx_has_vsx_re
>  #endif /* CONFIG_VSX */
>  }
>  
> +static inline int get_user_sigset(sigset_t *dst, const sigset_t *src)
> +{
> + if (sizeof(sigset_t) <= 8)
> + return __get_user(dst->sig[0], &src->sig[0]);
> + else
> + return __copy_from_user(dst, src, sizeof(sigset_t));
> +}
> +
>  /*
>   * Set up the sigcontext for the signal frame.
>   */
> @@ -701,8 +709,9 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
>* We kill the task with a SIGSEGV in this situation.
>*/
>  
> - if (__copy_from_user(&set, &new_ctx->uc_sigmask, sizeof(set)))
> + if (get_user_sigset(&set, &new_ctx->uc_sigmask))
>   do_exit(SIGSEGV);
> +
>   set_current_blocked(&set);
>  
>   if (!user_read_access_begin(new_ctx, ctx_size))
> @@ -740,8 +749,9 @@ SYSCALL_DEFINE0(rt_sigreturn)
>   if (!access_ok(uc, sizeof(*uc)))
>   goto badframe;
>  
> - if (__copy_from_user(&set, &uc->uc_sigmask, sizeof(set)))
> + if (get_user_sigset(&set, &uc->uc_sigmask))
>   goto badframe;
> +
>   set_current_blocked(&set);
>  
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> -- 
> 2.26.1
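
As a side note on the "trivial size" reasoning above: sizeof(sigset_t) is a
compile-time constant, so the if/else in get_user_sigset() costs nothing at
run time: the compiler keeps only one branch. A minimal user-space analogue
of that dispatch (illustrative only, not kernel code):

/* small_set stands in for the 8-byte ppc64 sigset_t. */
#include <stdint.h>
#include <string.h>

struct small_set { uint64_t sig[1]; };

static int copy_set(struct small_set *dst, const struct small_set *src)
{
	/* sizeof() is a constant expression, so only one branch survives. */
	if (sizeof(*src) <= 8) {
		dst->sig[0] = src->sig[0];	/* analogue of __get_user() */
		return 0;
	}
	memcpy(dst, src, sizeof(*src));		/* analogue of __copy_from_user() */
	return 0;
}

int main(void)
{
	struct small_set a = { { 42 } }, b;
	return copy_set(&b, &a) || b.sig[0] != 42;	/* exit 0 on success */
}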


Re: [PATCH v5 07/10] powerpc/signal64: Replace restore_sigcontext() w/ unsafe_restore_sigcontext()

2021-02-12 Thread Daniel Axtens
Hi Chris,

> Previously restore_sigcontext() performed a costly KUAP switch on every
> uaccess operation. These repeated uaccess switches cause a significant
> drop in signal handling performance.
>
> Rewrite restore_sigcontext() to assume that a userspace read access
> window is open. Replace all uaccess functions with their 'unsafe'
> versions which avoid the repeated uaccess switches.
>

Much of the same comments apply here as to the last patch:
 - the commit message might be improved by adding that you are also
   changing the calling functions to open the uaccess window before
   calling into the new unsafe functions

 - I have checked that the safe to unsafe conversions look right.

 - I think you're opening too wide a window in user_read_access_begin,
   it seems to me that it could be reduced to just the
   uc_mcontext. (Again, not that it makes a difference with the current
   HW.)

Kind regards,
Daniel

> Signed-off-by: Christopher M. Riedl 
> ---
>  arch/powerpc/kernel/signal_64.c | 68 -
>  1 file changed, 41 insertions(+), 27 deletions(-)
>
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 4248e4489ff1..d668f8af18fe 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -326,14 +326,14 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
>  /*
>   * Restore the sigcontext from the signal frame.
>   */
> -
> -static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig,
> -   struct sigcontext __user *sc)
> +#define unsafe_restore_sigcontext(tsk, set, sig, sc, e) \
> + unsafe_op_wrap(__unsafe_restore_sigcontext(tsk, set, sig, sc), e)
> +static long notrace __unsafe_restore_sigcontext(struct task_struct *tsk, sigset_t *set,
> + int sig, struct sigcontext __user *sc)
>  {
>  #ifdef CONFIG_ALTIVEC
>   elf_vrreg_t __user *v_regs;
>  #endif
> - unsigned long err = 0;
>   unsigned long save_r13 = 0;
>   unsigned long msr;
>   struct pt_regs *regs = tsk->thread.regs;
> @@ -348,27 +348,28 @@ static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig,
>   save_r13 = regs->gpr[13];
>  
>   /* copy the GPRs */
> - err |= __copy_from_user(regs->gpr, sc->gp_regs, sizeof(regs->gpr));
> - err |= __get_user(regs->nip, &sc->gp_regs[PT_NIP]);
> + unsafe_copy_from_user(regs->gpr, sc->gp_regs, sizeof(regs->gpr),
> +   efault_out);
> + unsafe_get_user(regs->nip, &sc->gp_regs[PT_NIP], efault_out);
>   /* get MSR separately, transfer the LE bit if doing signal return */
> - err |= __get_user(msr, &sc->gp_regs[PT_MSR]);
> + unsafe_get_user(msr, &sc->gp_regs[PT_MSR], efault_out);
>   if (sig)
>   regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
> - err |= __get_user(regs->orig_gpr3, &sc->gp_regs[PT_ORIG_R3]);
> - err |= __get_user(regs->ctr, &sc->gp_regs[PT_CTR]);
> - err |= __get_user(regs->link, &sc->gp_regs[PT_LNK]);
> - err |= __get_user(regs->xer, &sc->gp_regs[PT_XER]);
> - err |= __get_user(regs->ccr, &sc->gp_regs[PT_CCR]);
> + unsafe_get_user(regs->orig_gpr3, &sc->gp_regs[PT_ORIG_R3], efault_out);
> + unsafe_get_user(regs->ctr, &sc->gp_regs[PT_CTR], efault_out);
> + unsafe_get_user(regs->link, &sc->gp_regs[PT_LNK], efault_out);
> + unsafe_get_user(regs->xer, &sc->gp_regs[PT_XER], efault_out);
> + unsafe_get_user(regs->ccr, &sc->gp_regs[PT_CCR], efault_out);
>   /* Don't allow userspace to set SOFTE */
>   set_trap_norestart(regs);
> - err |= __get_user(regs->dar, &sc->gp_regs[PT_DAR]);
> - err |= __get_user(regs->dsisr, &sc->gp_regs[PT_DSISR]);
> - err |= __get_user(regs->result, &sc->gp_regs[PT_RESULT]);
> + unsafe_get_user(regs->dar, &sc->gp_regs[PT_DAR], efault_out);
> + unsafe_get_user(regs->dsisr, &sc->gp_regs[PT_DSISR], efault_out);
> + unsafe_get_user(regs->result, &sc->gp_regs[PT_RESULT], efault_out);
>  
>   if (!sig)
>   regs->gpr[13] = save_r13;
>   if (set != NULL)
> - err |=  __get_user(set->sig[0], &sc->oldmask);
> + unsafe_get_user(set->sig[0], &sc->oldmask, efault_out);
>  
>   /*
>* Force reload of FP/VEC.
> @@ -378,29 +379,28 @@ static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig,
>   regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
>  
>  #ifdef CONFIG_ALTIVEC
> - err |= __get_user(v_regs, &sc->v_regs);
> - if (err)
> - return err;
> + unsafe_get_user(v_regs, &sc->v_regs, efault_out);
>   if (v_regs && !access_ok(v_regs, 34 * sizeof(vector128)))
>   return -EFAULT;
>   /* Copy 33 vec registers (vr0..31 and vscr) from the stack */
>   if (v_regs != NULL && (msr & MSR_VEC) != 0) {
> - err |= __copy_from_user(&tsk->thread.vr_state, v_regs,
> - 33 * sizeof(vector128));
> +   
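
For readers following the uaccess discussion, this is roughly the pattern the
series converts to; a minimal sketch using the helpers named in the hunks
above (the function and its arguments are made up for illustration, and error
handling is trimmed to the essentials):

/* One KUAP open/close around the whole sequence instead of one per access. */
static long sketch_restore_nip(struct sigcontext __user *sc, unsigned long *nip)
{
	if (!user_read_access_begin(sc, sizeof(*sc)))	/* single uaccess switch */
		return -EFAULT;
	unsafe_get_user(*nip, &sc->gp_regs[PT_NIP], efault_out); /* no switch here */
	user_read_access_end();
	return 0;

efault_out:
	user_read_access_end();
	return -EFAULT;
}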

Re: [PATCH] powerpc/pseries: Don't enforce MSI affinity with kdump

2021-02-12 Thread kernel test robot
Hi Greg,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on linus/master v5.11-rc7 next-20210211]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Greg-Kurz/powerpc-pseries-Don-t-enforce-MSI-affinity-with-kdump/20210213-004658
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/1e5f7523fcfc57ab9437b8c7b29a974b62bde79d
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Greg-Kurz/powerpc-pseries-Don-t-enforce-MSI-affinity-with-kdump/20210213-004658
git checkout 1e5f7523fcfc57ab9437b8c7b29a974b62bde79d
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   arch/powerpc/platforms/pseries/msi.c: In function 'rtas_setup_msi_irqs':
>> arch/powerpc/platforms/pseries/msi.c:478:7: error: implicit declaration of function 'is_kdump_kernel' [-Werror=implicit-function-declaration]
 478 |   if (is_kdump_kernel())
 |   ^~~
   cc1: some warnings being treated as errors


vim +/is_kdump_kernel +478 arch/powerpc/platforms/pseries/msi.c

   369  
   370  static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int 
type)
   371  {
   372  struct pci_dn *pdn;
   373  int hwirq, virq, i, quota, rc;
   374  struct msi_desc *entry;
   375  struct msi_msg msg;
   376  int nvec = nvec_in;
   377  int use_32bit_msi_hack = 0;
   378  
   379  if (type == PCI_CAP_ID_MSIX)
   380  rc = check_req_msix(pdev, nvec);
   381  else
   382  rc = check_req_msi(pdev, nvec);
   383  
   384  if (rc)
   385  return rc;
   386  
   387  quota = msi_quota_for_device(pdev, nvec);
   388  
   389  if (quota && quota < nvec)
   390  return quota;
   391  
   392  if (type == PCI_CAP_ID_MSIX && check_msix_entries(pdev))
   393  return -EINVAL;
   394  
   395  /*
   396   * Firmware currently refuse any non power of two allocation
   397   * so we round up if the quota will allow it.
   398   */
   399  if (type == PCI_CAP_ID_MSIX) {
   400  int m = roundup_pow_of_two(nvec);
   401  quota = msi_quota_for_device(pdev, m);
   402  
   403  if (quota >= m)
   404  nvec = m;
   405  }
   406  
   407  pdn = pci_get_pdn(pdev);
   408  
   409  /*
   410   * Try the new more explicit firmware interface, if that fails 
fall
   411   * back to the old interface. The old interface is known to 
never
   412   * return MSI-Xs.
   413   */
   414  again:
   415  if (type == PCI_CAP_ID_MSI) {
   416  if (pdev->no_64bit_msi) {
   417  rc = rtas_change_msi(pdn, RTAS_CHANGE_32MSI_FN, 
nvec);
   418  if (rc < 0) {
   419  /*
   420   * We only want to run the 32 bit MSI 
hack below if
   421   * the max bus speed is Gen2 speed
   422   */
   423  if (pdev->bus->max_bus_speed != 
PCIE_SPEED_5_0GT)
   424  return rc;
   425  
   426  use_32bit_msi_hack = 1;
   427  }
   428  } else
   429  rc = -1;
   430  
   431  if (rc < 0)
   432  rc = rtas_change_msi(pdn, RTAS_CHANGE_MSI_FN, 
nvec);
   433  
   434  if (rc < 0) {
   435  pr_debug("rtas_msi: trying the old firmware 
call.\n");
   436  rc = rtas_change_msi(pdn, RTAS_CHANGE_FN, nvec);
   437  }
   438  
   439  if (use_32bit_msi_hack && rc > 0)
   440  rtas_hack_32bit_msi_gen2(pdev);
   441  } else
   442  rc = rtas_change_msi(pdn, RTAS_CHANGE_MSIX_FN, nvec);
   443  
   444  if (rc != nvec) {
   445   
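
For what it's worth, the error above just means msi.c never sees a declaration
of is_kdump_kernel(); assuming it is declared in <linux/crash_dump.h> as in
mainline, the likely fix is a one-line include (placement shown is
illustrative):

/* arch/powerpc/platforms/pseries/msi.c, next to the existing includes: */
#include <linux/crash_dump.h>	/* declares is_kdump_kernel() */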

Re: [RFC PATCH 3/9] KVM: PPC: Book3S 64: add hcall interrupt handler

2021-02-12 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Add a separate hcall entry point. This can be used to deal with the
> different calling convention.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/kernel/exceptions-64s.S | 4 ++--
>  arch/powerpc/kvm/book3s_64_entry.S   | 4 ++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index e6f7fc7c61a1..c25395b5921a 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -2028,13 +2028,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>* Requires __LOAD_FAR_HANDLER beause kvmppc_interrupt lives
>* outside the head section.
>*/
> - __LOAD_FAR_HANDLER(r10, kvmppc_interrupt)
> + __LOAD_FAR_HANDLER(r10, kvmppc_hcall)
>   mtctr   r10
>   ld  r10,PACA_EXGEN+EX_R10(r13)
>   bctr
>  #else
>   ld  r10,PACA_EXGEN+EX_R10(r13)
> - b   kvmppc_interrupt
> + b   kvmppc_hcall
>  #endif
>  #endif
>
> diff --git a/arch/powerpc/kvm/book3s_64_entry.S 
> b/arch/powerpc/kvm/book3s_64_entry.S
> index 8e7216f3c3ee..3b894b90862f 100644
> --- a/arch/powerpc/kvm/book3s_64_entry.S
> +++ b/arch/powerpc/kvm/book3s_64_entry.S
> @@ -9,6 +9,10 @@
>  /*
>   * We come here from the first-level interrupt handlers.
>   */
> +.global  kvmppc_hcall
> +.balign IFETCH_ALIGN_BYTES
> +kvmppc_hcall:
> +
>  .global  kvmppc_interrupt
>  .balign IFETCH_ALIGN_BYTES
>  kvmppc_interrupt:


Re: [RFC PATCH 2/9] KVM: PPC: Book3S 64: Move GUEST_MODE_SKIP test into KVM

2021-02-12 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Move the GUEST_MODE_SKIP logic into KVM code. This is quite a KVM
> internal detail that has no real need to be in common handlers.

LGTM,

>
> (XXX: Need to confirm CBE handlers etc)

Do these interrupts exist only in Cell? I see that they set HSRRs and
MSR_HV, but CPU_FTRS_CELL does not contain CPU_HVMODE. So I don't get
why they use the KVM macros.

And for the instruction_breakpoint (0x1300) I think it would help if we
could at least restrict when it is built. But I don't know what
ISA/processor version it is from.

>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 64 
>  arch/powerpc/kvm/book3s_64_entry.S   | 50 ++
>  2 files changed, 42 insertions(+), 72 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index 65659ea3cec4..e6f7fc7c61a1 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -133,7 +133,6 @@ name:
>  #define IBRANCH_TO_COMMON .L_IBRANCH_TO_COMMON_\name\() /* ENTRY branch to common */
>  #define IREALMODE_COMMON .L_IREALMODE_COMMON_\name\() /* Common runs in realmode */
>  #define IMASK .L_IMASK_\name\()   /* IRQ soft-mask bit */
> -#define IKVM_SKIP .L_IKVM_SKIP_\name\()   /* Generate KVM skip handler */
>  #define IKVM_REAL .L_IKVM_REAL_\name\()   /* Real entry tests KVM */
>  #define __IKVM_REAL(name) .L_IKVM_REAL_ ## name
>  #define IKVM_VIRT .L_IKVM_VIRT_\name\()   /* Virt entry tests KVM */
> @@ -191,9 +190,6 @@ do_define_int n
>   .ifndef IMASK
>   IMASK=0
>   .endif
> - .ifndef IKVM_SKIP
> - IKVM_SKIP=0
> - .endif
>   .ifndef IKVM_REAL
>   IKVM_REAL=0
>   .endif
> @@ -254,15 +250,10 @@ do_define_int n
>   .balign IFETCH_ALIGN_BYTES
>  \name\()_kvm:
>
> - .if IKVM_SKIP
> - cmpwi   r10,KVM_GUEST_MODE_SKIP
> - beq 89f
> - .else
>  BEGIN_FTR_SECTION
>   ld  r10,IAREA+EX_CFAR(r13)
>   std r10,HSTATE_CFAR(r13)
>  END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
> - .endif
>
>   ld  r10,IAREA+EX_CTR(r13)
>   mtctr   r10
> @@ -289,27 +280,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>   ori r12,r12,(IVEC)
>   .endif
>   b   kvmppc_interrupt
> -
> - .if IKVM_SKIP
> -89:  mtocrf  0x80,r9
> - ld  r10,IAREA+EX_CTR(r13)
> - mtctr   r10
> - ld  r9,IAREA+EX_R9(r13)
> - ld  r10,IAREA+EX_R10(r13)
> - ld  r11,IAREA+EX_R11(r13)
> - ld  r12,IAREA+EX_R12(r13)
> - .if IHSRR_IF_HVMODE
> - BEGIN_FTR_SECTION
> - b   kvmppc_skip_Hinterrupt
> - FTR_SECTION_ELSE
> - b   kvmppc_skip_interrupt
> - ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
> - .elseif IHSRR
> - b   kvmppc_skip_Hinterrupt
> - .else
> - b   kvmppc_skip_interrupt
> - .endif
> - .endif
>  .endm
>
>  #else
> @@ -1128,7 +1098,6 @@ INT_DEFINE_BEGIN(machine_check)
>   ISET_RI=0
>   IDAR=1
>   IDSISR=1
> - IKVM_SKIP=1
>   IKVM_REAL=1
>  INT_DEFINE_END(machine_check)
>
> @@ -1419,7 +1388,6 @@ INT_DEFINE_BEGIN(data_access)
>   IVEC=0x300
>   IDAR=1
>   IDSISR=1
> - IKVM_SKIP=1
>   IKVM_REAL=1
>  INT_DEFINE_END(data_access)
>
> @@ -1469,7 +1437,6 @@ INT_DEFINE_BEGIN(data_access_slb)
>   IAREA=PACA_EXSLB
>   IRECONCILE=0
>   IDAR=1
> - IKVM_SKIP=1
>   IKVM_REAL=1
>  INT_DEFINE_END(data_access_slb)
>
> @@ -2116,7 +2083,6 @@ INT_DEFINE_BEGIN(h_data_storage)
>   IHSRR=1
>   IDAR=1
>   IDSISR=1
> - IKVM_SKIP=1
>   IKVM_REAL=1
>   IKVM_VIRT=1
>  INT_DEFINE_END(h_data_storage)
> @@ -2573,7 +2539,6 @@ EXC_VIRT_NONE(0x5100, 0x100)
>  INT_DEFINE_BEGIN(cbe_system_error)
>   IVEC=0x1200
>   IHSRR=1
> - IKVM_SKIP=1
>   IKVM_REAL=1
>  INT_DEFINE_END(cbe_system_error)
>
> @@ -2598,7 +2563,6 @@ EXC_VIRT_NONE(0x5200, 0x100)
>  INT_DEFINE_BEGIN(instruction_breakpoint)
>   IVEC=0x1300
>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
> - IKVM_SKIP=1
>   IKVM_REAL=1
>  #endif
>  INT_DEFINE_END(instruction_breakpoint)
> @@ -2744,7 +2708,6 @@ EXC_COMMON_BEGIN(denorm_exception_common)
>  INT_DEFINE_BEGIN(cbe_maintenance)
>   IVEC=0x1600
>   IHSRR=1
> - IKVM_SKIP=1
>   IKVM_REAL=1
>  INT_DEFINE_END(cbe_maintenance)
>
> @@ -2797,7 +2760,6 @@ EXC_COMMON_BEGIN(altivec_assist_common)
>  INT_DEFINE_BEGIN(cbe_thermal)
>   IVEC=0x1800
>   IHSRR=1
> - IKVM_SKIP=1
>   IKVM_REAL=1
>  INT_DEFINE_END(cbe_thermal)
>
> @@ -3081,32 +3043,6 @@ EXPORT_SYMBOL(do_uaccess_flush)
>  MASKED_INTERRUPT
>  MASKED_INTERRUPT hsrr=1
>
> -#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
> -kvmppc_skip_interrupt:
> - /*
> -  * Here all GPRs are unchanged from when the interrupt happened
> -  * except for r13, which is saved in SPRG_SCRATCH0.
> -  */
> - mfspr   

Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function

2021-02-12 Thread Thiago Jung Bauermann


Lakshmi Ramasubramanian  writes:

> On 2/12/21 10:24 AM, Rob Herring wrote:
>> On Fri, Feb 12, 2021 at 11:19 AM Lakshmi Ramasubramanian
>>  wrote:
>>>
>>> On 2/12/21 6:38 AM, Rob Herring wrote:
 On Thu, Feb 11, 2021 at 7:17 PM Lakshmi Ramasubramanian
  wrote:
>
> On 2/11/21 5:09 PM, Thiago Jung Bauermann wrote:
>>
>> There's actually a complication that I just noticed and needs to be
>> addressed. More below.
>>
>
> <...>
>
>>> +
>>> +/*
>>> + * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened 
>>> Device Tree
>>> + *
>>> + * @image:  kexec image being loaded.
>>> + * @initrd_load_addr:   Address where the next initrd will be 
>>> loaded.
>>> + * @initrd_len: Size of the next initrd, or 0 if there 
>>> will be none.
>>> + * @cmdline:Command line for the next kernel, or NULL 
>>> if there will
>>> + *  be none.
>>> + *
>>> + * Return: fdt on success, or NULL errno on error.
>>> + */
>>> +void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
>>> +   unsigned long initrd_load_addr,
>>> +   unsigned long initrd_len,
>>> +   const char *cmdline)
>>> +{
>>> +void *fdt;
>>> +int ret, chosen_node;
>>> +const void *prop;
>>> +unsigned long fdt_size;
>>> +
>>> +fdt_size = fdt_totalsize(initial_boot_params) +
>>> +   (cmdline ? strlen(cmdline) : 0) +
>>> +   FDT_EXTRA_SPACE;
>>
>> Just adding 4 KB to initial_boot_params won't be enough for crash
>> kernels on ppc64. The current powerpc code doubles the size of
>> initial_boot_params (which is normally larger than 4 KB) and even that
>> isn't enough. A patch was added to powerpc/next today which uses a more
>> precise (but arch-specific) formula:
>>
>> https://lore.kernel.org/linuxppc-dev/161243826811.119001.14083048209224609814.stgit@hbathini/
>>
>> So I believe we need a hook here where architectures can provide their
>> own specific calculation for the size of the fdt. Perhaps a weakly
>> defined function providing a default implementation which an
>> arch-specific file can override (a la arch_kexec_kernel_image_load())?
>>
>> Then the powerpc specific hook would be the kexec_fdt_totalsize_ppc64()
>> function from the patch I linked above.
>>
>
> Do you think it'd better to add "fdt_size" parameter to
> of_kexec_alloc_and_setup_fdt() so that the caller can provide the
> desired FDT buffer size?

 Yes, I guess so. But please define the param as extra size, not total
 size. The kernel command line size addition can be in the common code.
>>>
>>> Will do. Just to clarify -
>>>
>>> The common code will do:
>>>
>>> fdt_totalsize(initial_boot_params) + strlen(cmdline) + extra_fdt_size
>>>
>>> The caller will pass "extra_fdt_size"
>>> ARM64 => 4KB
>>> PPC64 => fdt_totalsize(initial_boot_params) - which will be updated when
>>> the patch Thiago had referred to is merged.
>> Yes, I'd leave the 4KB in there by default and arm64 use 0.
>> 
>
> Sounds good.
>
> common:
> fdt_totalsize(initial_boot_params) + strlen(cmdline) + 0x1000 + extra
>
> arm64 => 0 for extra
> ppc => fdt_totalsize(initial_boot_params) for extra.

Looks good to me.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center
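
Spelling the agreed split out as code, the common helper would compute
something like the following (a sketch only: the helper name and the
extra_fdt_size parameter are illustrative, and FDT_EXTRA_SPACE is assumed to
be the 4 KB constant from the quoted patch):

/* Common code: base dtb copy + command line + 4 KB slack + arch-specific extra. */
static unsigned long kexec_fdt_size(const char *cmdline,
				    unsigned long extra_fdt_size)
{
	return fdt_totalsize(initial_boot_params) +
	       (cmdline ? strlen(cmdline) : 0) +
	       FDT_EXTRA_SPACE +	/* 0x1000 */
	       extra_fdt_size;		/* arm64: 0, ppc64: fdt_totalsize(initial_boot_params) */
}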


Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function

2021-02-12 Thread Lakshmi Ramasubramanian

On 2/12/21 10:24 AM, Rob Herring wrote:

On Fri, Feb 12, 2021 at 11:19 AM Lakshmi Ramasubramanian
 wrote:


On 2/12/21 6:38 AM, Rob Herring wrote:

On Thu, Feb 11, 2021 at 7:17 PM Lakshmi Ramasubramanian
 wrote:


On 2/11/21 5:09 PM, Thiago Jung Bauermann wrote:


There's actually a complication that I just noticed and needs to be
addressed. More below.



<...>


+
+/*
+ * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened Device Tree
+ *
+ * @image:  kexec image being loaded.
+ * @initrd_load_addr:   Address where the next initrd will be loaded.
+ * @initrd_len: Size of the next initrd, or 0 if there will be 
none.
+ * @cmdline:Command line for the next kernel, or NULL if there 
will
+ *  be none.
+ *
+ * Return: fdt on success, or NULL errno on error.
+ */
+void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
+   unsigned long initrd_load_addr,
+   unsigned long initrd_len,
+   const char *cmdline)
+{
+void *fdt;
+int ret, chosen_node;
+const void *prop;
+unsigned long fdt_size;
+
+fdt_size = fdt_totalsize(initial_boot_params) +
+   (cmdline ? strlen(cmdline) : 0) +
+   FDT_EXTRA_SPACE;


Just adding 4 KB to initial_boot_params won't be enough for crash
kernels on ppc64. The current powerpc code doubles the size of
initial_boot_params (which is normally larger than 4 KB) and even that
isn't enough. A patch was added to powerpc/next today which uses a more
precise (but arch-specific) formula:

https://lore.kernel.org/linuxppc-dev/161243826811.119001.14083048209224609814.stgit@hbathini/

So I believe we need a hook here where architectures can provide their
own specific calculation for the size of the fdt. Perhaps a weakly
defined function providing a default implementation which an
arch-specific file can override (a la arch_kexec_kernel_image_load())?

Then the powerpc specific hook would be the kexec_fdt_totalsize_ppc64()
function from the patch I linked above.



Do you think it'd better to add "fdt_size" parameter to
of_kexec_alloc_and_setup_fdt() so that the caller can provide the
desired FDT buffer size?


Yes, I guess so. But please define the param as extra size, not total
size. The kernel command line size addition can be in the common code.


Will do. Just to clarify -

The common code will do:

fdt_totalsize(initial_boot_params) + strlen(cmdline) + extra_fdt_size

The caller will pass "extra_fdt_size"
ARM64 => 4KB
PPC64 => fdt_totalsize(initial_boot_params) - which will be updated when
the patch Thiago had referred to is merged.


Yes, I'd leave the 4KB in there by default and arm64 use 0.



Sounds good.

common:
fdt_totalsize(initial_boot_params) + strlen(cmdline) + 0x1000 + extra

arm64 => 0 for extra
ppc => fdt_totalsize(initial_boot_params) for extra.

 -lakshmi



Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function

2021-02-12 Thread Rob Herring
On Fri, Feb 12, 2021 at 11:19 AM Lakshmi Ramasubramanian
 wrote:
>
> On 2/12/21 6:38 AM, Rob Herring wrote:
> > On Thu, Feb 11, 2021 at 7:17 PM Lakshmi Ramasubramanian
> >  wrote:
> >>
> >> On 2/11/21 5:09 PM, Thiago Jung Bauermann wrote:
> >>>
> >>> There's actually a complication that I just noticed and needs to be
> >>> addressed. More below.
> >>>
> >>
> >> <...>
> >>
>  +
>  +/*
>  + * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened 
>  Device Tree
>  + *
>  + * @image:  kexec image being loaded.
>  + * @initrd_load_addr:   Address where the next initrd will be 
>  loaded.
>  + * @initrd_len: Size of the next initrd, or 0 if there will 
>  be none.
>  + * @cmdline:Command line for the next kernel, or NULL 
>  if there will
>  + *  be none.
>  + *
>  + * Return: fdt on success, or NULL errno on error.
>  + */
>  +void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
>  +   unsigned long initrd_load_addr,
>  +   unsigned long initrd_len,
>  +   const char *cmdline)
>  +{
>  +void *fdt;
>  +int ret, chosen_node;
>  +const void *prop;
>  +unsigned long fdt_size;
>  +
>  +fdt_size = fdt_totalsize(initial_boot_params) +
>  +   (cmdline ? strlen(cmdline) : 0) +
>  +   FDT_EXTRA_SPACE;
> >>>
> >>> Just adding 4 KB to initial_boot_params won't be enough for crash
> >>> kernels on ppc64. The current powerpc code doubles the size of
> >>> initial_boot_params (which is normally larger than 4 KB) and even that
> >>> isn't enough. A patch was added to powerpc/next today which uses a more
> >>> precise (but arch-specific) formula:
> >>>
> >>> https://lore.kernel.org/linuxppc-dev/161243826811.119001.14083048209224609814.stgit@hbathini/
> >>>
> >>> So I believe we need a hook here where architectures can provide their
> >>> own specific calculation for the size of the fdt. Perhaps a weakly
> >>> defined function providing a default implementation which an
> >>> arch-specific file can override (a la arch_kexec_kernel_image_load())?
> >>>
> >>> Then the powerpc specific hook would be the kexec_fdt_totalsize_ppc64()
> >>> function from the patch I linked above.
> >>>
> >>
> >> Do you think it'd better to add "fdt_size" parameter to
> >> of_kexec_alloc_and_setup_fdt() so that the caller can provide the
> >> desired FDT buffer size?
> >
> > Yes, I guess so. But please define the param as extra size, not total
> > size. The kernel command line size addition can be in the common code.
>
> Will do. Just to clarify -
>
> The common code will do:
>
> fdt_totalsize(initial_boot_params) + strlen(cmdline) + extra_fdt_size
>
> The caller will pass "extra_fdt_size"
> ARM64 => 4KB
> PPC64 => fdt_totalsize(initial_boot_params) - which will be updated when
> the patch Thiago had referred to is merged.

Yes, I'd leave the 4KB in there by default and arm64 use 0.

Rob


Re: [PATCH] powerpc/pseries: Don't enforce MSI affinity with kdump

2021-02-12 Thread Cédric Le Goater
On 2/12/21 5:41 PM, Greg Kurz wrote:
> Depending on the number of online CPUs in the original kernel, it is
> likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
> in the affinity mappings provided by irq_create_affinity_masks() are
> thus not started by irq_startup(), as per-design with managed IRQs.
> 
> This can be a problem with multi-queue block devices driven by blk-mq :
> such a non-started IRQ is very likely paired with the single queue
> enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
> causes the device to remain silent and likely hangs the guest at
> some point.
> 
> This is a regression caused by commit 9ea69a55b3b9 ("powerpc/pseries:
> Pass MSI affinity to irq_create_mapping()"). Note that this only happens
> with the XIVE interrupt controller because XICS has a workaround to bypass
> affinity, which is activated during kdump with the "noirqdistrib" kernel
> parameter.
> 
> The issue comes from a combination of factors:
> - discrepancy between the number of queues detected by the multi-queue
>   block driver, that was used to create the MSI vectors, and the single
>   queue mode enforced later on by blk-mq because of kdump (i.e. keeping
>   all queues fixes the issue)
> - CPU#0 offline (i.e. kdump always succeeds with CPU#0)
> 
> Given that I couldn't reproduce on x86, which seems to always have CPU#0
> online even during kdump, I'm not sure where this should be fixed. Hence
> going for another approach : fine-grained affinity is for performance
> and we don't really care about that during kdump. Simply revert to the
> previous working behavior of ignoring affinity masks in this case only.
> 
> Fixes: 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to 
> irq_create_mapping()")
> Cc: lviv...@redhat.com
> Cc: sta...@vger.kernel.org
> Signed-off-by: Greg Kurz 


Reviewed-by: Cédric Le Goater 

Thanks for tracking this issue. 

This layer needs a rework. Patches adding a MSI domain should be ready 
in a couple of releases. Hopefully. 

C. 

> ---
>  arch/powerpc/platforms/pseries/msi.c | 24 ++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/msi.c 
> b/arch/powerpc/platforms/pseries/msi.c
> index b3ac2455faad..29d04b83288d 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -458,8 +458,28 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int 
> nvec_in, int type)
>   return hwirq;
>   }
>  
> - virq = irq_create_mapping_affinity(NULL, hwirq,
> -entry->affinity);
> + /*
> +  * Depending on the number of online CPUs in the original
> +  * kernel, it is likely for CPU #0 to be offline in a kdump
> +  * kernel. The associated IRQs in the affinity mappings
> +  * provided by irq_create_affinity_masks() are thus not
> +  * started by irq_startup(), as per-design for managed IRQs.
> +  * This can be a problem with multi-queue block devices driven
> +  * by blk-mq : such a non-started IRQ is very likely paired
> +  * with the single queue enforced by blk-mq during kdump (see
> +  * blk_mq_alloc_tag_set()). This causes the device to remain
> +  * silent and likely hangs the guest at some point.
> +  *
> +  * We don't really care for fine-grained affinity when doing
> +  * kdump actually : simply ignore the pre-computed affinity
> +  * masks in this case and let the default mask with all CPUs
> +  * be used when creating the IRQ mappings.
> +  */
> + if (is_kdump_kernel())
> + virq = irq_create_mapping(NULL, hwirq);
> + else
> + virq = irq_create_mapping_affinity(NULL, hwirq,
> +entry->affinity);
>  
>   if (!virq) {
>   pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
> 



Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function

2021-02-12 Thread Lakshmi Ramasubramanian

On 2/12/21 6:38 AM, Rob Herring wrote:

On Thu, Feb 11, 2021 at 7:17 PM Lakshmi Ramasubramanian
 wrote:


On 2/11/21 5:09 PM, Thiago Jung Bauermann wrote:


There's actually a complication that I just noticed and needs to be
addressed. More below.



<...>


+
+/*
+ * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened Device Tree
+ *
+ * @image:  kexec image being loaded.
+ * @initrd_load_addr:   Address where the next initrd will be loaded.
+ * @initrd_len: Size of the next initrd, or 0 if there will be 
none.
+ * @cmdline:Command line for the next kernel, or NULL if there 
will
+ *  be none.
+ *
+ * Return: fdt on success, or NULL errno on error.
+ */
+void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
+   unsigned long initrd_load_addr,
+   unsigned long initrd_len,
+   const char *cmdline)
+{
+void *fdt;
+int ret, chosen_node;
+const void *prop;
+unsigned long fdt_size;
+
+fdt_size = fdt_totalsize(initial_boot_params) +
+   (cmdline ? strlen(cmdline) : 0) +
+   FDT_EXTRA_SPACE;


Just adding 4 KB to initial_boot_params won't be enough for crash
kernels on ppc64. The current powerpc code doubles the size of
initial_boot_params (which is normally larger than 4 KB) and even that
isn't enough. A patch was added to powerpc/next today which uses a more
precise (but arch-specific) formula:

https://lore.kernel.org/linuxppc-dev/161243826811.119001.14083048209224609814.stgit@hbathini/

So I believe we need a hook here where architectures can provide their
own specific calculation for the size of the fdt. Perhaps a weakly
defined function providing a default implementation which an
arch-specific file can override (a la arch_kexec_kernel_image_load())?

Then the powerpc specific hook would be the kexec_fdt_totalsize_ppc64()
function from the patch I linked above.



Do you think it'd better to add "fdt_size" parameter to
of_kexec_alloc_and_setup_fdt() so that the caller can provide the
desired FDT buffer size?


Yes, I guess so. But please define the param as extra size, not total
size. The kernel command line size addition can be in the common code.


Will do. Just to clarify -

The common code will do:

fdt_totalsize(initial_boot_params) + strlen(cmdline) + extra_fdt_size

The caller will pass "extra_fdt_size"
ARM64 => 4KB
PPC64 => fdt_totalsize(initial_boot_params) - which will be updated when 
the patch Thiago had referred to is merged.




The above change is also going to conflict, so I think this may have
to wait. Or I'll take the common and arm bits and powerpc can be
converted next cycle (or after the merge window).



thanks.

 -lakshmi




Re: [PATCH] powerpc/pseries: Don't enforce MSI affinity with kdump

2021-02-12 Thread Laurent Vivier
On 12/02/2021 17:41, Greg Kurz wrote:
> Depending on the number of online CPUs in the original kernel, it is
> likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
> in the affinity mappings provided by irq_create_affinity_masks() are
> thus not started by irq_startup(), as per-design with managed IRQs.
> 
> This can be a problem with multi-queue block devices driven by blk-mq :
> such a non-started IRQ is very likely paired with the single queue
> enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
> causes the device to remain silent and likely hangs the guest at
> some point.
> 
> This is a regression caused by commit 9ea69a55b3b9 ("powerpc/pseries:
> Pass MSI affinity to irq_create_mapping()"). Note that this only happens
> with the XIVE interrupt controller because XICS has a workaround to bypass
> affinity, which is activated during kdump with the "noirqdistrib" kernel
> parameter.
> 
> The issue comes from a combination of factors:
> - discrepancy between the number of queues detected by the multi-queue
>   block driver, that was used to create the MSI vectors, and the single
>   queue mode enforced later on by blk-mq because of kdump (i.e. keeping
>   all queues fixes the issue)
> - CPU#0 offline (i.e. kdump always succeeds with CPU#0)
> 
> Given that I couldn't reproduce on x86, which seems to always have CPU#0
> online even during kdump, I'm not sure where this should be fixed. Hence
> going for another approach : fine-grained affinity is for performance
> and we don't really care about that during kdump. Simply revert to the
> previous working behavior of ignoring affinity masks in this case only.
> 
> Fixes: 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to 
> irq_create_mapping()")
> Cc: lviv...@redhat.com
> Cc: sta...@vger.kernel.org
> Signed-off-by: Greg Kurz 
> ---
>  arch/powerpc/platforms/pseries/msi.c | 24 ++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/msi.c 
> b/arch/powerpc/platforms/pseries/msi.c
> index b3ac2455faad..29d04b83288d 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -458,8 +458,28 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int 
> nvec_in, int type)
>   return hwirq;
>   }
>  
> - virq = irq_create_mapping_affinity(NULL, hwirq,
> -entry->affinity);
> + /*
> +  * Depending on the number of online CPUs in the original
> +  * kernel, it is likely for CPU #0 to be offline in a kdump
> +  * kernel. The associated IRQs in the affinity mappings
> +  * provided by irq_create_affinity_masks() are thus not
> +  * started by irq_startup(), as per-design for managed IRQs.
> +  * This can be a problem with multi-queue block devices driven
> +  * by blk-mq : such a non-started IRQ is very likely paired
> +  * with the single queue enforced by blk-mq during kdump (see
> +  * blk_mq_alloc_tag_set()). This causes the device to remain
> +  * silent and likely hangs the guest at some point.
> +  *
> +  * We don't really care for fine-grained affinity when doing
> +  * kdump actually : simply ignore the pre-computed affinity
> +  * masks in this case and let the default mask with all CPUs
> +  * be used when creating the IRQ mappings.
> +  */
> + if (is_kdump_kernel())
> + virq = irq_create_mapping(NULL, hwirq);
> + else
> + virq = irq_create_mapping_affinity(NULL, hwirq,
> +entry->affinity);
>  
>   if (!virq) {
>   pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
> 

Reviewed-by: Laurent Vivier 



[PATCH] powerpc/pseries: Don't enforce MSI affinity with kdump

2021-02-12 Thread Greg Kurz
Depending on the number of online CPUs in the original kernel, it is
likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
in the affinity mappings provided by irq_create_affinity_masks() are
thus not started by irq_startup(), as per-design with managed IRQs.

This can be a problem with multi-queue block devices driven by blk-mq :
such a non-started IRQ is very likely paired with the single queue
enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
causes the device to remain silent and likely hangs the guest at
some point.

This is a regression caused by commit 9ea69a55b3b9 ("powerpc/pseries:
Pass MSI affinity to irq_create_mapping()"). Note that this only happens
with the XIVE interrupt controller because XICS has a workaround to bypass
affinity, which is activated during kdump with the "noirqdistrib" kernel
parameter.

The issue comes from a combination of factors:
- discrepancy between the number of queues detected by the multi-queue
  block driver, that was used to create the MSI vectors, and the single
  queue mode enforced later on by blk-mq because of kdump (i.e. keeping
  all queues fixes the issue)
- CPU#0 offline (i.e. kdump always succeeds with CPU#0)

Given that I couldn't reproduce on x86, which seems to always have CPU#0
online even during kdump, I'm not sure where this should be fixed. Hence
going for another approach : fine-grained affinity is for performance
and we don't really care about that during kdump. Simply revert to the
previous working behavior of ignoring affinity masks in this case only.

Fixes: 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to 
irq_create_mapping()")
Cc: lviv...@redhat.com
Cc: sta...@vger.kernel.org
Signed-off-by: Greg Kurz 
---
 arch/powerpc/platforms/pseries/msi.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c 
b/arch/powerpc/platforms/pseries/msi.c
index b3ac2455faad..29d04b83288d 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -458,8 +458,28 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int 
nvec_in, int type)
return hwirq;
}
 
-   virq = irq_create_mapping_affinity(NULL, hwirq,
-  entry->affinity);
+   /*
+* Depending on the number of online CPUs in the original
+* kernel, it is likely for CPU #0 to be offline in a kdump
+* kernel. The associated IRQs in the affinity mappings
+* provided by irq_create_affinity_masks() are thus not
+* started by irq_startup(), as per-design for managed IRQs.
+* This can be a problem with multi-queue block devices driven
+* by blk-mq : such a non-started IRQ is very likely paired
+* with the single queue enforced by blk-mq during kdump (see
+* blk_mq_alloc_tag_set()). This causes the device to remain
+* silent and likely hangs the guest at some point.
+*
+* We don't really care for fine-grained affinity when doing
+* kdump actually : simply ignore the pre-computed affinity
+* masks in this case and let the default mask with all CPUs
+* be used when creating the IRQ mappings.
+*/
+   if (is_kdump_kernel())
+   virq = irq_create_mapping(NULL, hwirq);
+   else
+   virq = irq_create_mapping_affinity(NULL, hwirq,
+  entry->affinity);
 
if (!virq) {
pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
-- 
2.26.2



Re: [PATCH 2/2] powerpc/uaccess: Move might_fault() into user_access_begin()

2021-02-12 Thread Christophe Leroy




On 08/02/2021 at 14:57, Michael Ellerman wrote:

We have a might_fault() check in __unsafe_put_user_goto(), but that is
dangerous as it potentially calls lots of code while user access is
enabled.

It also triggers the check Alexey added to the irq restore path to
catch cases like that:

   WARNING: CPU: 30 PID: 1 at arch/powerpc/include/asm/book3s/64/kup.h:324 
arch_local_irq_restore+0x160/0x190
   NIP arch_local_irq_restore+0x160/0x190
   LR  lock_is_held_type+0x140/0x200
   Call Trace:
 0xc0007f392ff8 (unreliable)
 ___might_sleep+0x180/0x320
 __might_fault+0x50/0xe0
 filldir64+0x2d0/0x5d0
 call_filldir+0xc8/0x180
 ext4_readdir+0x948/0xb40
 iterate_dir+0x1ec/0x240
 sys_getdents64+0x80/0x290
 system_call_exception+0x160/0x280
 system_call_common+0xf0/0x27c

So remove the might fault check from unsafe_put_user().

Any call to unsafe_put_user() must be inside a region that's had user
access enabled with user_access_begin(), so move the might_fault() in
there. That also allows us to drop the is_kernel_addr() test, because
there should be no code using user_access_begin() in order to access a
kernel address.


x86 and mips only have might_fault() in get_user() and put_user(), not in
__get_user(), __put_user(), or the unsafe alternatives.


We have might_fault() in __get_user_nocheck(), which is used by __get_user() and
__get_user_allowed(), i.e. by unsafe_get_user().


Shouldn't those be dropped as well?

Christophe



Signed-off-by: Michael Ellerman 
---
  arch/powerpc/include/asm/uaccess.h | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 70347ee34c94..71640eca7341 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -214,8 +214,6 @@ do {
\
  #define __unsafe_put_user_goto(x, ptr, size, label)   \
  do {  \
__typeof__(*(ptr)) __user *__pu_addr = (ptr);   \
-   if (!is_kernel_addr((unsigned long)__pu_addr))  \
-   might_fault();  \
__chk_user_ptr(ptr);\
__put_user_size_goto((x), __pu_addr, (size), label);\
  } while (0)
@@ -494,6 +492,8 @@ extern void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
  
  static __must_check inline bool user_access_begin(const void __user *ptr, size_t len)
  {
+   might_fault();
+
if (unlikely(!access_ok(ptr, len)))
return false;
allow_read_write_user((void __user *)ptr, ptr, len);



Re: [PATCH v17 02/10] of: Add a common kexec FDT setup function

2021-02-12 Thread Rob Herring
On Thu, Feb 11, 2021 at 7:17 PM Lakshmi Ramasubramanian
 wrote:
>
> On 2/11/21 5:09 PM, Thiago Jung Bauermann wrote:
> >
> > There's actually a complication that I just noticed and needs to be
> > addressed. More below.
> >
>
> <...>
>
> >> +
> >> +/*
> >> + * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened Device 
> >> Tree
> >> + *
> >> + * @image:  kexec image being loaded.
> >> + * @initrd_load_addr:   Address where the next initrd will be loaded.
> >> + * @initrd_len: Size of the next initrd, or 0 if there will 
> >> be none.
> >> + * @cmdline:Command line for the next kernel, or NULL if 
> >> there will
> >> + *  be none.
> >> + *
> >> + * Return: fdt on success, or NULL errno on error.
> >> + */
> >> +void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
> >> +   unsigned long initrd_load_addr,
> >> +   unsigned long initrd_len,
> >> +   const char *cmdline)
> >> +{
> >> +void *fdt;
> >> +int ret, chosen_node;
> >> +const void *prop;
> >> +unsigned long fdt_size;
> >> +
> >> +fdt_size = fdt_totalsize(initial_boot_params) +
> >> +   (cmdline ? strlen(cmdline) : 0) +
> >> +   FDT_EXTRA_SPACE;
> >
> > Just adding 4 KB to initial_boot_params won't be enough for crash
> > kernels on ppc64. The current powerpc code doubles the size of
> > initial_boot_params (which is normally larger than 4 KB) and even that
> > isn't enough. A patch was added to powerpc/next today which uses a more
> > precise (but arch-specific) formula:
> >
> > https://lore.kernel.org/linuxppc-dev/161243826811.119001.14083048209224609814.stgit@hbathini/
> >
> > So I believe we need a hook here where architectures can provide their
> > own specific calculation for the size of the fdt. Perhaps a weakly
> > defined function providing a default implementation which an
> > arch-specific file can override (a la arch_kexec_kernel_image_load())?
> >
> > Then the powerpc specific hook would be the kexec_fdt_totalsize_ppc64()
> > function from the patch I linked above.
> >
>
> Do you think it'd better to add "fdt_size" parameter to
> of_kexec_alloc_and_setup_fdt() so that the caller can provide the
> desired FDT buffer size?

Yes, I guess so. But please define the param as extra size, not total
size. The kernel command line size addition can be in the common code.

The above change is also going to conflict, so I think this may have
to wait. Or I'll take the common and arm bits and powerpc can be
converted next cycle (or after the merge window).

Rob


Re: linux-next: manual merge of the spi tree with the powerpc tree

2021-02-12 Thread Mark Brown
On Fri, Feb 12, 2021 at 03:31:42PM +1100, Stephen Rothwell wrote:

> BTW Mark: the author's address in 258ea99fe25a uses a non-existent domain :-(

Ugh, I think that's something gone wrong with b4 :(  A bit late now to
try to fix it up.




[PATCH for 5.10] powerpc/32: Preserve cr1 in exception prolog stack check to fix build error

2021-02-12 Thread Christophe Leroy
This is backport of 3642eb21256a ("powerpc/32: Preserve cr1 in
exception prolog stack check to fix build error") for kernel 5.10

It fixes the build failure on v5.10 reported by kernel test robot
and by David Michael.

This fix is not in Linux tree yet, it is in next branch in powerpc tree.

(cherry picked from commit 3642eb21256a317ac14e9ed560242c6d20cf06d9)

THREAD_ALIGN_SHIFT = THREAD_SHIFT + 1 = PAGE_SHIFT + 1
Maximum PAGE_SHIFT is 18 for 256k pages so
THREAD_ALIGN_SHIFT is 19 at the maximum.

No need to clobber cr1, it can be preserved when moving r1
into CR when we check stack overflow.

This reduces the number of instructions in Machine Check Exception
prolog and fixes a build failure reported by the kernel test robot
on v5.10 stable when building with RTAS + VMAP_STACK + KVM. That
build failure is due to too many instructions in the prolog hence
not fitting between 0x200 and 0x300. Although the problem doesn't
show up in mainline, it is still worth the change.

Fixes: 98bf2d3f4970 ("powerpc/32s: Fix RTAS machine check with VMAP stack")
Cc: sta...@vger.kernel.org
Reported-by: kernel test robot 
Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
Link: 
https://lore.kernel.org/r/5ae4d545e3ac58e133d2599e0deb88843cb494fc.1612768623.git.christophe.le...@csgroup.eu
---
 arch/powerpc/kernel/head_32.h| 2 +-
 arch/powerpc/kernel/head_book3s_32.S | 6 --
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index c88e66adecb5..fef0b34a77c9 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -56,7 +56,7 @@
 1:
tophys_novmstack r11, r11
 #ifdef CONFIG_VMAP_STACK
-   mtcrf   0x7f, r1
+   mtcrf   0x3f, r1
bt  32 - THREAD_ALIGN_SHIFT, stack_overflow
 #endif
 .endm
diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index d66da35f2e8d..2729d8fa6e77 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -280,12 +280,6 @@ MachineCheck:
 7: EXCEPTION_PROLOG_2
addir3,r1,STACK_FRAME_OVERHEAD
 #ifdef CONFIG_PPC_CHRP
-#ifdef CONFIG_VMAP_STACK
-   mfspr   r4, SPRN_SPRG_THREAD
-   tovirt(r4, r4)
-   lwz r4, RTAS_SP(r4)
-   cmpwi   cr1, r4, 0
-#endif
beq cr1, machine_check_tramp
twi 31, 0, 0
 #else
-- 
2.25.0
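
As a sanity check of the mask change above (assuming the usual CR numbering,
where CR field n covers bits 4n..4n+3 and the mtcrf mask selects fields
CR0..CR7 starting from its most significant bit):

  THREAD_ALIGN_SHIFT <= 19  =>  bt tests CR bit 32 - THREAD_ALIGN_SHIFT >= 13
  mtcrf 0x3f, r1            =>  loads CR fields 2..7, i.e. CR bits 8..31
  13 >= 8                   =>  every bit the bt can test is still loaded,
                                while CR0 and CR1 (bits 0..7) are preserved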