[PATCH] treewide: drop CONFIG_EMBEDDED

2023-08-15 Thread Randy Dunlap
There is only one Kconfig user of CONFIG_EMBEDDED and it can be
switched to EXPERT or "if !ARCH_MULTIPLATFORM" (suggested by Arnd).

Signed-off-by: Randy Dunlap 
Cc: Russell King 
Cc: linux-arm-ker...@lists.infradead.org
Cc: Arnd Bergmann 
Cc: Jason A. Donenfeld 
Cc: wiregu...@lists.zx2c4.com
Cc: linux-a...@vger.kernel.org
Cc: linux-snps-...@lists.infradead.org
Cc: Vineet Gupta 
Cc: Brian Cain 
Cc: linux-hexa...@vger.kernel.org
Cc: Greg Ungerer 
Cc: Geert Uytterhoeven 
Cc: linux-m...@lists.linux-m68k.org
Cc: Michal Simek 
Cc: Thomas Bogendoerfer 
Cc: Dinh Nguyen 
Cc: Jonas Bonn 
Cc: Stefan Kristiansson 
Cc: Stafford Horne 
Cc: linux-openr...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ri...@lists.infradead.org
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: Albert Ou 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: John Paul Adrian Glaubitz 
Cc: linux...@vger.kernel.org
Cc: Max Filippov 
Cc: Josh Triplett 
Cc: Masahiro Yamada 
Cc: linux-kbu...@vger.kernel.org
Cc: Andrew Morton 
---
 arch/arc/configs/axs101_defconfig|2 +-
 arch/arc/configs/axs103_defconfig|2 +-
 arch/arc/configs/axs103_smp_defconfig|2 +-
 arch/arc/configs/haps_hs_smp_defconfig   |2 +-
 arch/arc/configs/hsdk_defconfig  |2 +-
 arch/arc/configs/nsim_700_defconfig  |2 +-
 arch/arc/configs/nsimosci_defconfig  |2 +-
 arch/arc/configs/nsimosci_hs_defconfig   |2 +-
 arch/arc/configs/tb10x_defconfig |2 +-
 arch/arc/configs/vdk_hs38_defconfig  |2 +-
 arch/arc/configs/vdk_hs38_smp_defconfig  |2 +-
 arch/arm/Kconfig |2 +-
 arch/arm/configs/aspeed_g4_defconfig |2 +-
 arch/arm/configs/aspeed_g5_defconfig |2 +-
 arch/arm/configs/at91_dt_defconfig   |2 +-
 arch/arm/configs/axm55xx_defconfig   |2 +-
 arch/arm/configs/bcm2835_defconfig   |2 +-
 arch/arm/configs/clps711x_defconfig  |2 +-
 arch/arm/configs/keystone_defconfig  |2 +-
 arch/arm/configs/lpc18xx_defconfig   |2 +-
 arch/arm/configs/lpc32xx_defconfig   |2 +-
 arch/arm/configs/milbeaut_m10v_defconfig |2 +-
 arch/arm/configs/moxart_defconfig|2 +-
 arch/arm/configs/multi_v4t_defconfig |2 +-
 arch/arm/configs/multi_v7_defconfig  |2 +-
 arch/arm/configs/pxa_defconfig   |2 +-
 arch/arm/configs/qcom_defconfig  |2 +-
 arch/arm/configs/sama5_defconfig |2 +-
 arch/arm/configs/sama7_defconfig |2 +-
 arch/arm/configs/socfpga_defconfig   |2 +-
 arch/arm/configs/stm32_defconfig |2 +-
 arch/arm/configs/tegra_defconfig |2 +-
 arch/arm/configs/vf610m4_defconfig   |2 +-
 arch/hexagon/configs/comet_defconfig |2 +-
 arch/m68k/configs/amcore_defconfig   |2 +-
 arch/m68k/configs/m5475evb_defconfig |2 +-
 arch/m68k/configs/stmark2_defconfig  |2 +-
 arch/microblaze/configs/mmu_defconfig|2 +-
 arch/mips/configs/ath25_defconfig|2 +-
 arch/mips/configs/ath79_defconfig|2 +-
 arch/mips/configs/bcm47xx_defconfig  |2 +-
 arch/mips/configs/ci20_defconfig |2 +-
 arch/mips/configs/cu1000-neo_defconfig   |2 +-
 arch/mips/configs/cu1830-neo_defconfig   |2 +-
 arch/mips/configs/db1xxx_defconfig   |2 +-
 arch/mips/configs/gcw0_defconfig |2 +-
 arch/mips/configs/generic_defconfig  |2 +-
 arch/mips/configs/loongson2k_defconfig   |2 +-
 arch/mips/configs/loongson3_defconfig|2 +-
 arch/mips/configs/malta_qemu_32r6_defconfig  |2 +-
 arch/mips/configs/maltaaprp_defconfig|2 +-
 arch/mips/configs/maltasmvp_defconfig|2 +-
 arch/mips/configs/maltasmvp_eva_defconfig|2 +-
 arch/mips/configs/maltaup_defconfig  |2 +-
 arch/mips/configs/omega2p_defconfig  |2 +-
 arch/mips/configs/pic32mzda_defconfig|2 +-
 arch/mips/configs/qi_lb60_defconfig  |2 +-
 arch/mips/configs/rs90_defconfig |2 +-
 arch/mips/configs/rt305x_defconfig   |2 +-
 arch/mips/configs/vocore2_defconfig  |2 +-
 arch/mips/configs/xway_defconfig |2 
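Each defconfig change in the diffstat is presumably the same one-line
swap, e.g. in arch/arc/configs/axs101_defconfig:

-CONFIG_EMBEDDED=y
+CONFIG_EXPERT=y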

[powerpc:topic/ppc-kvm 4/6] include/linux/kern_levels.h:5:25: warning: format '%d' expects argument of type 'int', but argument 2 has type 'long unsigned int'

2023-08-15 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
topic/ppc-kvm
head:   c95bf4c16099f2ed29a7f6949559bc4187d30710
commit: adf03dd2204acf7eab000a02bdfe50099f2fa089 [4/6] KVM: PPC: Book3s HV: 
Hold LPIDs in an unsigned long
config: powerpc-allmodconfig 
(https://download.01.org/0day-ci/archive/20230816/202308161348.nnnxubxr-...@intel.com/config)
compiler: powerpc64-linux-gcc (GCC) 12.3.0
reproduce: 
(https://download.01.org/0day-ci/archive/20230816/202308161348.nnnxubxr-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202308161348.nnnxubxr-...@intel.com/

All warnings (new ones prefixed by >>):

   In file included from include/asm-generic/bug.h:22,
                    from arch/powerpc/include/asm/bug.h:116,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:6,
                    from include/linux/pagemap.h:8,
                    from arch/powerpc/kvm/book3s_hv_uvmem.c:89:
   arch/powerpc/kvm/book3s_hv_uvmem.c: In function 'kvmppc_h_svm_init_done':
>> include/linux/kern_levels.h:5:25: warning: format '%d' expects argument of type 'int', but argument 2 has type 'long unsigned int' [-Wformat=]
       5 | #define KERN_SOH        "\001"  /* ASCII Start Of Header */
         |                         ^~
   include/linux/printk.h:427:25: note: in definition of macro 'printk_index_wrap'
     427 |                 _p_func(_fmt, ##__VA_ARGS__);                   \
         |                         ^~~~
   include/linux/printk.h:528:9: note: in expansion of macro 'printk'
     528 |         printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
         |         ^~
   include/linux/kern_levels.h:14:25: note: in expansion of macro 'KERN_SOH'
      14 | #define KERN_INFO       KERN_SOH "6"    /* informational */
         |                         ^~~~
   include/linux/printk.h:528:16: note: in expansion of macro 'KERN_INFO'
     528 |         printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
         |                ^
   arch/powerpc/kvm/book3s_hv_uvmem.c:860:9: note: in expansion of macro 'pr_info'
     860 |         pr_info("LPID %d went secure\n", kvm->arch.lpid);
         |         ^~~
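
The warning points at the pr_info() in kvmppc_h_svm_init_done() now
that the lpid is an unsigned long. A minimal fix sketch (not the
actual follow-up patch) is to widen the format, or cast if the value
is known to fit:

	/* sketch: match the format to the widened lpid ... */
	pr_info("LPID %lu went secure\n", kvm->arch.lpid);
	/* ... or keep %d and cast, where valid lpids still fit an int */
	pr_info("LPID %d went secure\n", (int)kvm->arch.lpid);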


vim +5 include/linux/kern_levels.h

314ba3520e513a Joe Perches 2012-07-30  4  
04d2c8c83d0e3a Joe Perches 2012-07-30 @5  #define KERN_SOH	"\001"	/* ASCII Start Of Header */
04d2c8c83d0e3a Joe Perches 2012-07-30  6  #define KERN_SOH_ASCII	'\001'
04d2c8c83d0e3a Joe Perches 2012-07-30  7  

:: The code at line 5 was first introduced by commit
:: 04d2c8c83d0e3ac5f78aeede51babb3236200112 printk: convert the format for KERN_<LEVEL> to a 2 byte pattern

:: TO: Joe Perches 
:: CC: Linus Torvalds 

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Re: [PATCH v2] powerpc/ptrace: Split gpr32_set_common

2023-08-15 Thread Michael Ellerman
Christophe Leroy  writes:
> objtool reports the following warning:
>
>   arch/powerpc/kernel/ptrace/ptrace-view.o: warning: objtool:
> gpr32_set_common+0x23c (.text+0x860): redundant UACCESS disable
>
> gpr32_set_common() conditionally opens and closes UACCESS based on
> whether the kbuf pointer is NULL or not. This is fragile.
>
> Split gpr32_set_common() into two functions, one for user and one for
> kernel.
>
> Signed-off-by: Christophe Leroy 
> ---
> v2: Mark gpr32_set_common_kernel() and gpr32_set_common_user() static
> ---
>  arch/powerpc/kernel/ptrace/ptrace-view.c | 106 ++-
>  1 file changed, 67 insertions(+), 39 deletions(-)
>
> diff --git a/arch/powerpc/kernel/ptrace/ptrace-view.c 
> b/arch/powerpc/kernel/ptrace/ptrace-view.c
> index 3910cd7bb2d9..42abbed452cd 100644
> --- a/arch/powerpc/kernel/ptrace/ptrace-view.c
> +++ b/arch/powerpc/kernel/ptrace/ptrace-view.c
> @@ -716,73 +716,89 @@ int gpr32_get_common(struct task_struct *target,
>   return membuf_zero(&to, (ELF_NGREG - PT_REGS_COUNT) * sizeof(u32));
>  }
>  
> -int gpr32_set_common(struct task_struct *target,
> -  const struct user_regset *regset,
> -  unsigned int pos, unsigned int count,
> -  const void *kbuf, const void __user *ubuf,
> -  unsigned long *regs)
> +static int gpr32_set_common_kernel(struct task_struct *target,
> +const struct user_regset *regset,
> +unsigned int pos, unsigned int count,
> +const void *kbuf, unsigned long *regs)
>  {
>   const compat_ulong_t *k = kbuf;
> +
> + pos /= sizeof(compat_ulong_t);
> + count /= sizeof(compat_ulong_t);
> +
> + for (; count > 0 && pos < PT_MSR; --count)
> + regs[pos++] = *k++;
> +
> + if (count > 0 && pos == PT_MSR) {
> + set_user_msr(target, *k++);
> + ++pos;
> + --count;
> + }
> +
> + for (; count > 0 && pos <= PT_MAX_PUT_REG; --count)
> + regs[pos++] = *k++;
> + for (; count > 0 && pos < PT_TRAP; --count, ++pos)
> + ++k;
> +
> + if (count > 0 && pos == PT_TRAP) {
> + set_user_trap(target, *k++);
> + ++pos;
> + --count;
> + }
> +
> + kbuf = k;
> + pos *= sizeof(compat_ulong_t);
> + count *= sizeof(compat_ulong_t);
> + user_regset_copyin_ignore(&pos, &count, &kbuf, NULL,
> +   (PT_TRAP + 1) * sizeof(compat_ulong_t), -1);
> + return 0;
> +}
> +
> +static int gpr32_set_common_user(struct task_struct *target,
> +  const struct user_regset *regset,
> +  unsigned int pos, unsigned int count,
> +  const void __user *ubuf, unsigned long *regs)
> +{
>   const compat_ulong_t __user *u = ubuf;
>   compat_ulong_t reg;
>  
> - if (!kbuf && !user_read_access_begin(u, count))
> + if (!user_read_access_begin(u, count))
>   return -EFAULT;
>  
>   pos /= sizeof(reg);
>   count /= sizeof(reg);
>  
> - if (kbuf)
> - for (; count > 0 && pos < PT_MSR; --count)
> - regs[pos++] = *k++;
> - else
> - for (; count > 0 && pos < PT_MSR; --count) {
> - unsafe_get_user(reg, u++, Efault);
> - regs[pos++] = reg;
> - }
> -
> + for (; count > 0 && pos < PT_MSR; --count) {
> + unsafe_get_user(reg, u++, Efault);
> + regs[pos++] = reg;
> + }
>  
>   if (count > 0 && pos == PT_MSR) {
> - if (kbuf)
> - reg = *k++;
> - else
> - unsafe_get_user(reg, u++, Efault);
> + unsafe_get_user(reg, u++, Efault);
>   set_user_msr(target, reg);
>   ++pos;
>   --count;
>   }
>  
> - if (kbuf) {
> - for (; count > 0 && pos <= PT_MAX_PUT_REG; --count)
> - regs[pos++] = *k++;
> - for (; count > 0 && pos < PT_TRAP; --count, ++pos)
> - ++k;
> - } else {
> - for (; count > 0 && pos <= PT_MAX_PUT_REG; --count) {
> - unsafe_get_user(reg, u++, Efault);
> - regs[pos++] = reg;
> - }
> - for (; count > 0 && pos < PT_TRAP; --count, ++pos)
> - unsafe_get_user(reg, u++, Efault);
> + for (; count > 0 && pos <= PT_MAX_PUT_REG; --count) {
> + unsafe_get_user(reg, u++, Efault);
> + regs[pos++] = reg;
>   }
> + for (; count > 0 && pos < PT_TRAP; --count, ++pos)
> + unsafe_get_user(reg, u++, Efault);
>  
>   if (count > 0 && pos == PT_TRAP) {
> - if (kbuf)
> - reg = *k++;
> - else
> - unsafe_get_user(reg, u++, Efault);
> + unsafe_get_user(reg, u++, 

[powerpc:topic/ppc-kvm 3/6] arch/powerpc/kvm/guest-state-buffer.c:505: warning: expecting prototype for kvmppc_gsm_init(). Prototype was for kvmppc_gsm_new() instead

2023-08-15 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
topic/ppc-kvm
head:   c95bf4c16099f2ed29a7f6949559bc4187d30710
commit: 596bda338f11c9d2f860439733e5239100803d6c [3/6] KVM: PPC: Add helper 
library for Guest State Buffers
config: powerpc-allmodconfig 
(https://download.01.org/0day-ci/archive/20230816/202308161359.9d2z52f5-...@intel.com/config)
compiler: powerpc64-linux-gcc (GCC) 12.3.0
reproduce: 
(https://download.01.org/0day-ci/archive/20230816/202308161359.9d2z52f5-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202308161359.9d2z52f5-...@intel.com/

All warnings (new ones prefixed by >>):

>> arch/powerpc/kvm/guest-state-buffer.c:505: warning: expecting prototype for kvmppc_gsm_init(). Prototype was for kvmppc_gsm_new() instead
>> arch/powerpc/kvm/guest-state-buffer.c:565: warning: expecting prototype for kvmppc_gsm_fill_info(). Prototype was for kvmppc_gsm_refresh_info() instead


vim +505 arch/powerpc/kvm/guest-state-buffer.c

   493  
   494  /**
   495   * kvmppc_gsm_init() - creates a new guest state message
   496   * @ops: callbacks
   497   * @data: private data
   498   * @flags: guest wide or thread wide
   499   * @gfp_flags: GFP allocation flags
   500   *
   501   * Returns an initialized guest state message.
   502   */
   503  struct kvmppc_gs_msg *kvmppc_gsm_new(struct kvmppc_gs_msg_ops *ops, void *data,
   504                                       unsigned long flags, gfp_t gfp_flags)
 > 505  {
   506  struct kvmppc_gs_msg *gsm;
   507  
   508  gsm = kzalloc(sizeof(*gsm), gfp_flags);
   509  if (!gsm)
   510  return NULL;
   511  
   512  kvmppc_gsm_init(gsm, ops, data, flags);
   513  
   514  return gsm;
   515  }
   516  EXPORT_SYMBOL_GPL(kvmppc_gsm_new);
   517  
   518  /**
   519   * kvmppc_gsm_size() - creates a new guest state message
   520   * @gsm: self
   521   *
   522   * Returns the size required for the message.
   523   */
   524  size_t kvmppc_gsm_size(struct kvmppc_gs_msg *gsm)
   525  {
   526  if (gsm->ops->get_size)
   527  return gsm->ops->get_size(gsm);
   528  return 0;
   529  }
   530  EXPORT_SYMBOL_GPL(kvmppc_gsm_size);
   531  
   532  /**
   533   * kvmppc_gsm_free() - free guest state message
   534   * @gsm: guest state message
   535   *
   536   * Returns the size required for the message.
   537   */
   538  void kvmppc_gsm_free(struct kvmppc_gs_msg *gsm)
   539  {
   540  kfree(gsm);
   541  }
   542  EXPORT_SYMBOL_GPL(kvmppc_gsm_free);
   543  
   544  /**
   545   * kvmppc_gsm_fill_info() - serialises message to guest state buffer format
   546   * @gsm: self
   547   * @gsb: buffer to serialise into
   548   */
   549  int kvmppc_gsm_fill_info(struct kvmppc_gs_msg *gsm, struct kvmppc_gs_buff *gsb)
   550  {
   551  if (!gsm->ops->fill_info)
   552  return -EINVAL;
   553  
   554  return gsm->ops->fill_info(gsb, gsm);
   555  }
   556  EXPORT_SYMBOL_GPL(kvmppc_gsm_fill_info);
   557  
   558  /**
   559   * kvmppc_gsm_fill_info() - deserialises from guest state buffer
   560   * @gsm: self
   561   * @gsb: buffer to serialise from
   562   */
   563  int kvmppc_gsm_refresh_info(struct kvmppc_gs_msg *gsm,
   564  struct kvmppc_gs_buff *gsb)
 > 565  {
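
Both warnings are stale kerneldoc names; the likely fix (a sketch, not
the actual follow-up patch) is simply renaming the comments to match
the functions:

- * kvmppc_gsm_init() - creates a new guest state message
+ * kvmppc_gsm_new() - creates a new guest state message

- * kvmppc_gsm_fill_info() - deserialises from guest state buffer
+ * kvmppc_gsm_refresh_info() - deserialises from guest state buffer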

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Re: [PATCH v3 4/6] KVM: PPC: Book3s HV: Hold LPIDs in an unsigned long

2023-08-15 Thread Jordan Niethe




On 15/8/23 8:45 pm, Michael Ellerman wrote:

"Nicholas Piggin"  writes:

On Mon Aug 7, 2023 at 11:45 AM AEST, Jordan Niethe wrote:

The LPID register is 32 bits long. The host keeps the lpids for each
guest in an unsigned word in struct kvm_arch. Currently, LPIDs are already
limited by mmu_lpid_bits and KVM_MAX_NESTED_GUESTS_SHIFT.

The nestedv2 API returns a 64 bit "Guest ID" to be used by the L1 host
for each L2 guest. This value is used as an lpid, e.g. it is the
parameter used by H_RPT_INVALIDATE. To minimize needless special casing
it makes sense to keep this "Guest ID" in struct kvm_arch::lpid.

This means that struct kvm_arch::lpid is too small so prepare for this
and make it an unsigned long. This is not a problem for the KVM-HV and
nestedv1 cases as their lpid values are already limited to valid ranges
so in those contexts the lpid can be used as an unsigned word safely as
needed.

In the PAPR, the H_RPT_INVALIDATE pid/lpid parameter is already
specified as an unsigned long so change pseries_rpt_invalidate() to
match that.  Update the callers of pseries_rpt_invalidate() to also take
an unsigned long if they take an lpid value.


I don't suppose it would be worth having an lpid_t.


diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 4adff4f1896d..229f0a1ffdd4 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -886,10 +886,10 @@ int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
 
 	if (single_escalation)
 		name = kasprintf(GFP_KERNEL, "kvm-%d-%d",
-				 vcpu->kvm->arch.lpid, xc->server_num);
+				 (unsigned int)vcpu->kvm->arch.lpid, xc->server_num);
 	else
 		name = kasprintf(GFP_KERNEL, "kvm-%d-%d-%d",
-				 vcpu->kvm->arch.lpid, xc->server_num, prio);
+				 (unsigned int)vcpu->kvm->arch.lpid, xc->server_num, prio);
 	if (!name) {
 		pr_err("Failed to allocate escalation irq name for queue %d of VCPU %d\n",
 		       prio, xc->server_num);


I would have thought you'd keep the type and change the format.


Yeah. Don't we risk having ambiguous names by discarding the high bits?
Not sure that would be a bug per se, but it could be confusing.


In this context it would always be constrained by the number of LPID 
bits, so it wouldn't be ambiguous, but I'm going to change the format.
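
That is, keeping the unsigned long lpid and widening the kasprintf()
format instead of casting, roughly (a sketch of the discussed change,
not a posted patch):

	/* sketch: widen the format to match the unsigned long lpid */
	name = kasprintf(GFP_KERNEL, "kvm-%lu-%d",
			 vcpu->kvm->arch.lpid, xc->server_num);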




cheers



Re: [PATCH v3 2/6] KVM: PPC: Rename accessor generator macros

2023-08-15 Thread Jordan Niethe




On 14/8/23 6:27 pm, Nicholas Piggin wrote:

On Mon Aug 7, 2023 at 11:45 AM AEST, Jordan Niethe wrote:

More "wrapper" style accessor generating macros will be introduced for
the nestedv2 guest support. Rename the existing macros with more
descriptive names now so there is a consistent naming convention.

Signed-off-by: Jordan Niethe 



---
v3:
   - New to series
---
  arch/powerpc/include/asm/kvm_ppc.h | 60 +++---
  1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index d16d80ad2ae4..b66084a81dd0 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -927,19 +927,19 @@ static inline bool kvmppc_shared_big_endian(struct 
kvm_vcpu *vcpu)
  #endif
  }
  
-#define SPRNG_WRAPPER_GET(reg, bookehv_spr)\

+#define KVMPPC_BOOKE_HV_SPRNG_ACESSOR_GET(reg, bookehv_spr)\
  static inline ulong kvmppc_get_##reg(struct kvm_vcpu *vcpu)   \
  { \
return mfspr(bookehv_spr);  \
  } \
  
-#define SPRNG_WRAPPER_SET(reg, bookehv_spr)\

+#define KVMPPC_BOOKE_HV_SPRNG_ACESSOR_SET(reg, bookehv_spr)\
  static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, ulong val) \
  { \
mtspr(bookehv_spr, val);
\
  } \
  
-#define SHARED_WRAPPER_GET(reg, size)	\

+#define KVMPPC_VCPU_SHARED_REGS_ACESSOR_GET(reg, size) \
  static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu) \
  { \
if (kvmppc_shared_big_endian(vcpu)) \
@@ -948,7 +948,7 @@ static inline u##size kvmppc_get_##reg(struct kvm_vcpu 
*vcpu)   \
   return le##size##_to_cpu(vcpu->arch.shared->reg);  \
  } \
  
-#define SHARED_WRAPPER_SET(reg, size)	\

+#define KVMPPC_VCPU_SHARED_REGS_ACESSOR_SET(reg, size) \
  static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)   
\
  { \
if (kvmppc_shared_big_endian(vcpu)) \
@@ -957,36 +957,36 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu 
*vcpu, u##size val)   \
   vcpu->arch.shared->reg = cpu_to_le##size(val); \
  } \
  
-#define SHARED_WRAPPER(reg, size)	\

-   SHARED_WRAPPER_GET(reg, size)   \
-   SHARED_WRAPPER_SET(reg, size)   \
+#define KVMPPC_VCPU_SHARED_REGS_ACESSOR(reg, size) 
\
+   KVMPPC_VCPU_SHARED_REGS_ACESSOR_GET(reg, size)  
\
+   KVMPPC_VCPU_SHARED_REGS_ACESSOR_SET(reg, size)  
\
  
-#define SPRNG_WRAPPER(reg, bookehv_spr)	\

-   SPRNG_WRAPPER_GET(reg, bookehv_spr) \
-   SPRNG_WRAPPER_SET(reg, bookehv_spr) \
+#define KVMPPC_BOOKE_HV_SPRNG_ACESSOR(reg, bookehv_spr)
\
+   KVMPPC_BOOKE_HV_SPRNG_ACESSOR_GET(reg, bookehv_spr) 
\
+   KVMPPC_BOOKE_HV_SPRNG_ACESSOR_SET(reg, bookehv_spr) 
\
  
  #ifdef CONFIG_KVM_BOOKE_HV
  
-#define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr)			\

-   SPRNG_WRAPPER(reg, bookehv_spr) \
+#define KVMPPC_BOOKE_HV_SPRNG_OR_VCPU_SHARED_REGS_ACCESSOR(reg, size, 
bookehv_spr) \
+   KVMPPC_BOOKE_HV_SPRNG_ACESSOR(reg, bookehv_spr) \
  
  #else
  
-#define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr)			\

-   SHARED_WRAPPER(reg, size)   \
+#define KVMPPC_BOOKE_HV_SPRNG_OR_VCPU_SHARED_REGS_ACCESSOR(reg, size, 
bookehv_spr) \
+   KVMPPC_VCPU_SHARED_REGS_ACESSOR(reg, size)  \


Not the greatest name I've ever seen :D Hard to be concise and
consistent though, this is an odd one.


Yes, it is a bit wordy.



Reviewed-by: Nicholas Piggin 


Thanks.



  
  #endif
  
-SHARED_WRAPPER(critical, 64)

-SHARED_SPRNG_WRAPPER(sprg0, 64, SPRN_GSPRG0)
-SHARED_SPRNG_WRAPPER(sprg1, 64, SPRN_GSPRG1)
-SHARED_SPRNG_WRAPPER(sprg2, 64, SPRN_GSPRG2)
-SHARED_SPRNG_WRAPPER(sprg3, 64, SPRN_GSPRG3)
-SHARED_SPRNG_WRAPPER(srr0, 64, SPRN_GSRR0)
-SHARED_SPRNG_WRAPPER(srr1, 64, SPRN_GSRR1)
-SHARED_SPRNG_WRAPPER(dar, 64, SPRN_GDEAR)

Re: [PATCH v3 4/6] KVM: PPC: Book3s HV: Hold LPIDs in an unsigned long

2023-08-15 Thread Jordan Niethe




On 14/8/23 6:15 pm, David Laight wrote:

From: Jordan Niethe

Sent: 07 August 2023 02:46

The LPID register is 32 bits long. The host keeps the lpids for each
guest in an unsigned word in struct kvm_arch. Currently, LPIDs are already
limited by mmu_lpid_bits and KVM_MAX_NESTED_GUESTS_SHIFT.

The nestedv2 API returns a 64 bit "Guest ID" to be used by the L1 host
for each L2 guest. This value is used as an lpid, e.g. it is the
parameter used by H_RPT_INVALIDATE. To minimize needless special casing
it makes sense to keep this "Guest ID" in struct kvm_arch::lpid.

This means that struct kvm_arch::lpid is too small so prepare for this
and make it an unsigned long. This is not a problem for the KVM-HV and
nestedv1 cases as their lpid values are already limited to valid ranges
so in those contexts the lpid can be used as an unsigned word safely as
needed.


Shouldn't it be changed to u64?


This will only be for 64-bit PPC so an unsigned long will always be 64 
bits wide, but I can use a u64 instead.




David
  


-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)



Re: [PATCH v3 4/6] KVM: PPC: Book3s HV: Hold LPIDs in an unsigned long

2023-08-15 Thread Jordan Niethe




On 14/8/23 6:12 pm, Nicholas Piggin wrote:

On Mon Aug 7, 2023 at 11:45 AM AEST, Jordan Niethe wrote:

The LPID register is 32 bits long. The host keeps the lpids for each
guest in an unsigned word in struct kvm_arch. Currently, LPIDs are already
limited by mmu_lpid_bits and KVM_MAX_NESTED_GUESTS_SHIFT.

The nestedv2 API returns a 64 bit "Guest ID" to be used by the L1 host
for each L2 guest. This value is used as an lpid, e.g. it is the
parameter used by H_RPT_INVALIDATE. To minimize needless special casing
it makes sense to keep this "Guest ID" in struct kvm_arch::lpid.

This means that struct kvm_arch::lpid is too small so prepare for this
and make it an unsigned long. This is not a problem for the KVM-HV and
nestedv1 cases as their lpid values are already limited to valid ranges
so in those contexts the lpid can be used as an unsigned word safely as
needed.

In the PAPR, the H_RPT_INVALIDATE pid/lpid parameter is already
specified as an unsigned long so change pseries_rpt_invalidate() to
match that.  Update the callers of pseries_rpt_invalidate() to also take
an unsigned long if they take an lpid value.


I don't suppose it would be worth having an lpid_t.


I actually introduced that while developing, for the purpose of doing 
the conversion, but I felt it was unnecessary in the end; it is just a 
wider integer and it is simpler to treat it that way imho.





diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 4adff4f1896d..229f0a1ffdd4 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -886,10 +886,10 @@ int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
 
 	if (single_escalation)
 		name = kasprintf(GFP_KERNEL, "kvm-%d-%d",
-				 vcpu->kvm->arch.lpid, xc->server_num);
+				 (unsigned int)vcpu->kvm->arch.lpid, xc->server_num);
 	else
 		name = kasprintf(GFP_KERNEL, "kvm-%d-%d-%d",
-				 vcpu->kvm->arch.lpid, xc->server_num, prio);
+				 (unsigned int)vcpu->kvm->arch.lpid, xc->server_num, prio);
 	if (!name) {
 		pr_err("Failed to allocate escalation irq name for queue %d of VCPU %d\n",
 		       prio, xc->server_num);


I would have thought you'd keep the type and change the format.


yeah, I will do that.



Otherwise seems okay too.


Thanks.



Thanks,
Nick



Re: [PATCH v3 1/6] KVM: PPC: Use getters and setters for vcpu register state

2023-08-15 Thread Jordan Niethe




On 14/8/23 6:08 pm, Nicholas Piggin wrote:

On Mon Aug 7, 2023 at 11:45 AM AEST, Jordan Niethe wrote:

There are already some getter and setter functions used for accessing
vcpu register state, e.g. kvmppc_get_pc(). There are also more
complicated examples that are generated by macros like
kvmppc_get_sprg0() which are generated by the SHARED_SPRNG_WRAPPER()
macro.

In the new PAPR "Nestedv2" API for nested guest partitions the L1 is
required to communicate with the L0 to modify and read nested guest
state.

Prepare to support this by replacing direct accesses to vcpu register
state with wrapper functions. Follow the existing pattern of using
macros to generate individual wrappers. These wrappers will
be augmented for supporting Nestedv2 guests later.

Signed-off-by: Gautam Menghani 
Signed-off-by: Jordan Niethe 
---
v3:
   - Do not add a helper for pvr
   - Use an expression when declaring variable in case
   - Squash in all getters and setters
   - Guatam: Pass vector registers by reference
---
  arch/powerpc/include/asm/kvm_book3s.h  | 123 +-
  arch/powerpc/include/asm/kvm_booke.h   |  10 ++
  arch/powerpc/kvm/book3s.c  |  38 ++---
  arch/powerpc/kvm/book3s_64_mmu_hv.c|   4 +-
  arch/powerpc/kvm/book3s_64_mmu_radix.c |   9 +-
  arch/powerpc/kvm/book3s_64_vio.c   |   4 +-
  arch/powerpc/kvm/book3s_hv.c   | 220 +
  arch/powerpc/kvm/book3s_hv.h   |  58 +++
  arch/powerpc/kvm/book3s_hv_builtin.c   |  10 +-
  arch/powerpc/kvm/book3s_hv_p9_entry.c  |   4 +-
  arch/powerpc/kvm/book3s_hv_ras.c   |   5 +-
  arch/powerpc/kvm/book3s_hv_rm_mmu.c|   8 +-
  arch/powerpc/kvm/book3s_hv_rm_xics.c   |   4 +-
  arch/powerpc/kvm/book3s_xive.c |   9 +-
  arch/powerpc/kvm/emulate_loadstore.c   |   2 +-
  arch/powerpc/kvm/powerpc.c |  76 -
  16 files changed, 395 insertions(+), 189 deletions(-)



[snip]


+
  /* Expiry time of vcpu DEC relative to host TB */
  static inline u64 kvmppc_dec_expires_host_tb(struct kvm_vcpu *vcpu)
  {
-   return vcpu->arch.dec_expires - vcpu->arch.vcore->tb_offset;
+   return kvmppc_get_dec_expires(vcpu) - kvmppc_get_tb_offset_hv(vcpu);
  }


I don't see kvmppc_get_tb_offset_hv in this patch.


It should be generated by:

KVMPPC_BOOK3S_VCORE_ACCESSOR(tb_offset, 64)
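
For reference, an accessor generator of that shape expands to a
getter/setter pair along these lines (a sketch of the pattern only,
not the exact v3 code):

#define KVMPPC_BOOK3S_VCORE_ACCESSOR(reg, size)				\
static inline u##size kvmppc_get_##reg##_hv(struct kvm_vcpu *vcpu)	\
{									\
	return vcpu->arch.vcore->reg;					\
}									\
static inline void kvmppc_set_##reg##_hv(struct kvm_vcpu *vcpu,		\
					 u##size val)			\
{									\
	vcpu->arch.vcore->reg = val;					\
}

so KVMPPC_BOOK3S_VCORE_ACCESSOR(tb_offset, 64) emits
kvmppc_get_tb_offset_hv() and kvmppc_set_tb_offset_hv().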




diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 7f765d5ad436..738f2ecbe9b9 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -347,7 +347,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu 
*vcpu, gva_t eaddr,
unsigned long v, orig_v, gr;
__be64 *hptep;
long int index;
-   int virtmode = vcpu->arch.shregs.msr & (data ? MSR_DR : MSR_IR);
+   int virtmode = kvmppc_get_msr(vcpu) & (data ? MSR_DR : MSR_IR);
  
  	if (kvm_is_radix(vcpu->kvm))

return kvmppc_mmu_radix_xlate(vcpu, eaddr, gpte, data, iswrite);


So this isn't _only_ adding new accessors. This should be functionally a
noop, but I think it introduces a branch if PR is defined.


That being checking kvmppc_shared_big_endian()?



Shared page is a slight annoyance for HV, I'd like to get rid of it...
but that's another story. I think the pattern here would be to add a
kvmppc_get_msr_hv() accessor.


Yes, that will work.



And as a nitpick, for anywhere employing existing access functions, gprs
and such, could that be split into its own patch?


Sure will do. One other thing I could do is make the existing functions 
use the macros if they don't already. Do you think that is worth doing?




Looks pretty good aside from those little things.


Thanks.



Thanks,
Nick



[PATCH] powerpc/eeh: Use pci_dev_id() to simplify the code

2023-08-15 Thread Jialin Zhang
PCI core API pci_dev_id() can be used to get the BDF number for a PCI
device. We don't need to compose it manually. Use pci_dev_id() to
simplify the code a little bit.

Signed-off-by: Jialin Zhang 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index a83cb679dd59..af3a5d37a149 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -855,8 +855,7 @@ static int pnv_eeh_bridge_reset(struct pci_dev *pdev, int 
option)
struct pci_controller *hose = pci_bus_to_host(pdev->bus);
struct pnv_phb *phb = hose->private_data;
struct device_node *dn = pci_device_to_OF_node(pdev);
-   uint64_t id = PCI_SLOT_ID(phb->opal_id,
- (pdev->bus->number << 8) | pdev->devfn);
+   uint64_t id = PCI_SLOT_ID(phb->opal_id, pci_dev_id(pdev));
uint8_t scope;
int64_t rc;
 
-- 
2.25.1
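
For context, pci_dev_id() is a small helper in include/linux/pci.h,
roughly along these lines (a paraphrase, not part of this patch):

static inline u16 pci_dev_id(struct pci_dev *dev)
{
	/* composes the same BDF the patch used to build by hand:
	 * (bus->number << 8) | devfn */
	return PCI_DEVID(dev->bus->number, dev->devfn);
}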



Re: [PATCH v4 6/6] integrity: PowerVM support for loading third party code signing keys

2023-08-15 Thread Mimi Zohar
On Tue, 2023-08-15 at 07:27 -0400, Nayna Jain wrote:
> On secure boot enabled PowerVM LPAR, third party code signing keys are
> needed during early boot to verify signed third party modules. These
> third party keys are stored in moduledb object in the Platform
> KeyStore (PKS).
> 
> Load third party code signing keys onto .secondary_trusted_keys keyring.
> 
> Signed-off-by: Nayna Jain 

Reviewed-and-tested-by: Mimi Zohar 



Re: [RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory

2023-08-15 Thread Sean Christopherson
On Tue, Aug 15, 2023, Ackerley Tng wrote:
> Sean Christopherson  writes:
> 
> >> I feel that memslots form a natural way of managing usage of the gmem
> >> file. When a memslot is created, it is using the file; hence we take a
> >> refcount on the gmem file, and as memslots are removed, we drop
> >> refcounts on the gmem file.
> >
> > Yes and no.  It's definitely more natural *if* the goal is to allow 
> > guest_memfd
> > memory to exist without being attached to a VM.  But I'm not at all 
> > convinced
> > that we want to allow that, or that it has desirable properties.  With TDX 
> > and
> > SNP in particular, I'm pretty sure that allowing memory to outlive the VM 
> > is
> > very undesirable (more below).
> >
> 
> This is a little confusing, with the file/inode split in gmem where the
> physical memory/data is attached to the inode and the file represents
> the VM's view of that memory, won't the memory outlive the VM?

Doh, I overloaded the term "VM".  By "VM" I meant the virtual machine as a 
"thing"
the rest of the world sees and interacts with, not the original "struct kvm" 
object.

Because yes, you're absolutely correct that the memory will outlive "struct 
kvm",
but it won't outlive the virtual machine, and specifically won't outlive the
ASID (SNP) / HKID (TDX) to which it's bound.

> This [1] POC was built based on that premise, that the gmem inode can be
> linked to another file and handed off to another VM, to facilitate
> intra-host migration, where the point is to save the work of rebuilding
> the VM's memory in the destination VM.
> 
> With this, the bindings don't outlive the VM, but the data/memory
> does. I think this split design you proposed is really nice.
> 
> >> The KVM pointer is shared among all the bindings in gmem’s xarray, and we 
> >> can
> >> enforce that a gmem file is used only with one VM:
> >>
> >> + When binding a memslot to the file, if a kvm pointer exists, it must
> >>   be the same kvm as the one in this binding
> >> + When the binding to the last memslot is removed from a file, NULL the
> >>   kvm pointer.
> >
> > Nullifying the KVM pointer isn't sufficient, because without additional 
> > actions
> > userspace could extract data from a VM by deleting its memslots and then 
> > binding
> > the guest_memfd to an attacker controlled VM.  Or more likely with TDX and 
> > SNP,
> > induce badness by coercing KVM into mapping memory into a guest with the 
> > wrong
> > ASID/HKID.
> >
> > I can think of three ways to handle that:
> >
> >   (a) prevent a different VM from *ever* binding to the gmem instance
> >   (b) free/zero physical pages when unbinding
> >   (c) free/zero when binding to a different VM
> >
> > Option (a) is easy, but that pretty much defeats the purpose of decoupling
> > guest_memfd from a VM.
> >
> > Option (b) isn't hard to implement, but it screws up the lifecycle of the 
> > memory,
> > e.g. would require memory when a memslot is deleted.  That isn't 
> > necessarily a
> > deal-breaker, but it runs counter to how KVM memslots currently operate.  
> > Memslots
> > are basically just weird page tables, e.g. deleting a memslot doesn't have 
> > any
> > impact on the underlying data in memory.  TDX throws a wrench in this as 
> > removing
> > a page from the Secure EPT is effectively destructive to the data (can't be 
> > mapped
> > back in to the VM without zeroing the data), but IMO that's an oddity with 
> > TDX and
> > not necessarily something we want to carry over to other VM types.
> >
> > There would also be performance implications (probably a non-issue in 
> > practice),
> > and weirdness if/when we get to sharing, linking and/or mmap()ing gmem.  
> > E.g. what
> > should happen if the last memslot (binding) is deleted, but there are 
> > outstanding userspace
> > mappings?
> >
> > Option (c) is better from a lifecycle perspective, but it adds its own 
> > flavor of
> > complexity, e.g. the performant way to reclaim TDX memory requires the TDMR
> > (effectively the VM pointer), and so a deferred reclaim doesn't really work 
> > for
> > TDX.  And I'm pretty sure it *can't* work for SNP, because RMP entries must 
> > not
> > outlive the VM; KVM can't reuse an ASID if there are pages assigned to that 
> > ASID
> > in the RMP, i.e. until all memory belonging to the VM has been fully freed.
> >
> 
> If we are on the same page that the memory should outlive the VM but not
> the bindings, then associating the gmem inode to a new VM should be a
> feature and not a bug.
> 
> What do we want to defend against here?
> 
> (a) Malicious host VMM
> 
> For a malicious host VMM to read guest memory (with TDX and SNP), it can
> create a new VM with the same HKID/ASID as the victim VM, rebind the
> gmem inode to a VM crafted with an image that dumps the memory.
> 
> I believe it is not possible for userspace to arbitrarily select a
> matching HKID unless userspace uses the intra-host migration ioctls, but if 
> the
> migration ioctl is used, then EPTs are migrated 

Re: KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)

2023-08-15 Thread Erhard Furtner
On Tue, 15 Aug 2023 17:25:13 +
Christophe Leroy  wrote:

> I just sent a patch with additional pr_info() in order to help locate 
> the issue, please provide output with that patch.
> 
> Thanks
> Christophe

Thanks for your efforts Christophe! With the patch applied I get this output + 
a blank black line:

[0.00] printk: bootconsole [udbg0] enabled
[0.00] Total memory = 2048MB; using 4096kB for hash table
[0.00] mapin_ram:125
[0.00] mmu_mapin_ram:169 0 3000 140 200
[0.00] __mmu_mapin_ram:146 0 140
[0.00] __mmu_mapin_ram:155 140
[0.00] __mmu_mapin_ram:146 140 3000
[0.00] __mmu_mapin_ram:155 2000
[0.00] __mapin_ram_chunk:107 2000 3000
[0.00] __mapin_ram_chunk:117
[0.00] mapin_ram:134
[0.00] btext_unmap:129


Afterwards the freeze. On a warm boot after booting another kernel the output 
continues with "[0.00] btext_unmap:131" instead of the black line. Full 
dmesg of a warm boot attached.

Regards,
Erhard


dmesg_65-rc6_g4_01
Description: Binary data


Re: [RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory

2023-08-15 Thread Ackerley Tng
Sean Christopherson  writes:

> On Mon, Aug 07, 2023, Ackerley Tng wrote:
>> I’d like to propose an alternative to the refcounting approach between
>> the gmem file and associated kvm, where we think of KVM’s memslots as
>> users of the gmem file.
>>
>> Instead of having the gmem file pin the VM (i.e. take a refcount on
>> kvm), we could let memslot take a refcount on the gmem file when the
>> memslots are configured.
>>
>> Here’s a POC patch that flips the refcounting (and modified selftests in
>> the next commit):
>> https://github.com/googleprodkernel/linux-cc/commit/7f487b029b89b9f3e9b094a721bc0772f3c8c797
>>
>> One side effect of having the gmem file pin the VM is that now the gmem
>> file becomes sort of a false handle on the VM:
>>
>> + Closing the file destroys the file pointers in the VM and invalidates
>>   the pointers
>
> Yeah, this is less than ideal.  But, it's also how things operate today.  KVM
> doesn't hold references to VMAs or files, e.g. if userspace munmap()s memory,
> any and all SPTEs pointing at the memory are zapped.  The only difference with
> gmem is that KVM needs to explicitly invalidate file pointers, instead of that
> happening behind the scenes (no more VMAs to find).  Again, I agree the 
> resulting
> code is more complex than I would prefer, but from a userspace perspective I
> don't see this as problematic.
>
>> + Keeping the file open keeps the VM around in the kernel even though
>>   the VM fd may already be closed.
>
> That is perfectly ok.  There is plenty of prior art, as well as plenty of ways
> for userspace to shoot itself in the foot.  E.g. open a stats fd for a vCPU 
> and
> the VM and all its vCPUs will be kept alive.  And conceptually it's sound,
> anything created in the scope of a VM _should_ pin the VM.
>

Thanks for explaining!

>> I feel that memslots form a natural way of managing usage of the gmem
>> file. When a memslot is created, it is using the file; hence we take a
>> refcount on the gmem file, and as memslots are removed, we drop
>> refcounts on the gmem file.
>
> Yes and no.  It's definitely more natural *if* the goal is to allow 
> guest_memfd
> memory to exist without being attached to a VM.  But I'm not at all convinced
> that we want to allow that, or that it has desirable properties.  With TDX and
> SNP in particular, I'm pretty sure that allowing memory to outlive the VM is
> very undesirable (more below).
>

This is a little confusing, with the file/inode split in gmem where the
physical memory/data is attached to the inode and the file represents
the VM's view of that memory, won't the memory outlive the VM?

This [1] POC was built based on that premise, that the gmem inode can be
linked to another file and handed off to another VM, to facilitate
intra-host migration, where the point is to save the work of rebuilding
the VM's memory in the destination VM.

With this, the bindings don't outlive the VM, but the data/memory
does. I think this split design you proposed is really nice.

>> The KVM pointer is shared among all the bindings in gmem’s xarray, and we can
>> enforce that a gmem file is used only with one VM:
>>
>> + When binding a memslot to the file, if a kvm pointer exists, it must
>>   be the same kvm as the one in this binding
>> + When the binding to the last memslot is removed from a file, NULL the
>>   kvm pointer.
>
> Nullifying the KVM pointer isn't sufficient, because without additional 
> actions
> userspace could extract data from a VM by deleting its memslots and then 
> binding
> the guest_memfd to an attacker controlled VM.  Or more likely with TDX and 
> SNP,
> induce badness by coercing KVM into mapping memory into a guest with the wrong
> ASID/HKID.
>
> I can think of three ways to handle that:
>
>   (a) prevent a different VM from *ever* binding to the gmem instance
>   (b) free/zero physical pages when unbinding
>   (c) free/zero when binding to a different VM
>
> Option (a) is easy, but that pretty much defeats the purpose of decoupling
> guest_memfd from a VM.
>
> Option (b) isn't hard to implement, but it screws up the lifecycle of the 
> memory,
> e.g. would require memory when a memslot is deleted.  That isn't necessarily a
> deal-breaker, but it runs counter to how KVM memslots currently operate.  
> Memslots
> are basically just weird page tables, e.g. deleting a memslot doesn't have any
> impact on the underlying data in memory.  TDX throws a wrench in this as 
> removing
> a page from the Secure EPT is effectively destructive to the data (can't be 
> mapped
> back in to the VM without zeroing the data), but IMO that's an oddity with 
> TDX and
> not necessarily something we want to carry over to other VM types.
>
> There would also be performance implications (probably a non-issue in 
> practice),
> and weirdness if/when we get to sharing, linking and/or mmap()ing gmem.  E.g. 
> what
> should happen if the last memslot (binding) is deleted, but there are outstanding 
> userspace
> mappings?
>
> Option (c) 

[PATCH v2] powerpc/32s: Cleanup the mess in __set_pte_at()

2023-08-15 Thread Christophe Leroy
__set_pte_at() handles 3 main cases with #ifdefs plus the 'percpu'
subcase which leads to code duplication.

Rewrite the function using IS_ENABLED() to minimise the total number
of cases and remove duplicated code.

Signed-off-by: Christophe Leroy 
---
v2: Reorganise comments, first case becomes third and third become first.
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 77 
 1 file changed, 31 insertions(+), 46 deletions(-)
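
For readers less familiar with the idiom: unlike #ifdef, IS_ENABLED()
keeps every branch visible to the compiler, which type-checks them all
and then discards the dead ones. A toy sketch (hypothetical helpers,
not from this patch):

	if (IS_ENABLED(CONFIG_PTE_64BIT))
		store_pte_64bit(ptep, pte);	/* hypothetical helper */
	else
		store_pte_32bit(ptep, pte);	/* hypothetical helper */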

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 7bf1fe7297c6..d49c2a9d4ffe 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -541,58 +541,43 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t 
newprot)
 
 
 /* This low level function performs the actual PTE insertion
- * Setting the PTE depends on the MMU type and other factors. It's
- * an horrible mess that I'm not going to try to clean up now but
- * I'm keeping it in one place rather than spread around
+ * Setting the PTE depends on the MMU type and other factors.
+ *
+ * First case is 32-bit in UP mode with 32-bit PTEs, we need to preserve
+ * the _PAGE_HASHPTE bit since we may not have invalidated the previous
+ * translation in the hash yet (done in a subsequent flush_tlb_xxx())
+ * and see we need to keep track that this PTE needs invalidating.
+ *
+ * Second case is 32-bit with 64-bit PTE.  In this case, we
+ * can just store as long as we do the two halves in the right order
+ * with a barrier in between. This is possible because we take care,
+ * in the hash code, to pre-invalidate if the PTE was already hashed,
+ * which synchronizes us with any concurrent invalidation.
+ * In the percpu case, we fallback to the simple update preserving
+ * the hash bits (ie, same as the non-SMP case).
+ *
+ * Third case is 32-bit in SMP mode with 32-bit PTEs. We use the
+ * helper pte_update() which does an atomic update. We need to do that
+ * because a concurrent invalidation can clear _PAGE_HASHPTE. If it's a
+ * per-CPU PTE such as a kmap_atomic, we also do a simple update preserving
+ * the hash bits instead.
  */
 static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, int percpu)
 {
-#if defined(CONFIG_SMP) && !defined(CONFIG_PTE_64BIT)
-   /* First case is 32-bit Hash MMU in SMP mode with 32-bit PTEs. We use 
the
-* helper pte_update() which does an atomic update. We need to do that
-* because a concurrent invalidation can clear _PAGE_HASHPTE. If it's a
-* per-CPU PTE such as a kmap_atomic, we do a simple update preserving
-* the hash bits instead (ie, same as the non-SMP case)
-*/
-   if (percpu)
-   *ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE)
- | (pte_val(pte) & ~_PAGE_HASHPTE));
-   else
-   pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, pte_val(pte), 0);
+   if ((!IS_ENABLED(CONFIG_SMP) && !IS_ENABLED(CONFIG_PTE_64BIT)) || 
percpu) {
+   *ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE) |
+ (pte_val(pte) & ~_PAGE_HASHPTE));
+   } else if (IS_ENABLED(CONFIG_PTE_64BIT)) {
+   if (pte_val(*ptep) & _PAGE_HASHPTE)
+   flush_hash_entry(mm, ptep, addr);
 
-#elif defined(CONFIG_PTE_64BIT)
-   /* Second case is 32-bit with 64-bit PTE.  In this case, we
-* can just store as long as we do the two halves in the right order
-* with a barrier in between. This is possible because we take care,
-* in the hash code, to pre-invalidate if the PTE was already hashed,
-* which synchronizes us with any concurrent invalidation.
-* In the percpu case, we also fallback to the simple update preserving
-* the hash bits
-*/
-   if (percpu) {
-   *ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE)
- | (pte_val(pte) & ~_PAGE_HASHPTE));
-   return;
+   asm volatile("stw%X0 %2,%0; eieio; stw%X1 %L2,%1" :
+"=m" (*ptep), "=m" (*((unsigned char *)ptep+4)) :
+"r" (pte) : "memory");
+   } else {
+   pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, pte_val(pte), 0);
}
-   if (pte_val(*ptep) & _PAGE_HASHPTE)
-   flush_hash_entry(mm, ptep, addr);
-   __asm__ __volatile__("\
-   stw%X0 %2,%0\n\
-   eieio\n\
-   stw%X1 %L2,%1"
-   : "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
-   : "r" (pte) : "memory");
-
-#else
-   /* Third case is 32-bit hash table in UP mode, we need to preserve
-* the _PAGE_HASHPTE bit since we may not have invalidated the previous
-* translation in the hash yet (done in a subsequent flush_tlb_xxx())
-* and see we need to keep track that this PTE needs 

Re: KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)

2023-08-15 Thread Christophe Leroy


On 14/08/2023 at 19:27, Erhard Furtner wrote:
> On Mon, 14 Aug 2023 09:40:44 +
> Christophe Leroy  wrote:
> 
>> Interesting. That means we get stuck somewhere around  MMU_init()
>>
>> We know that MMU_init_hw() is called and runs at least until:
>>
>>  pr_info("Total memory = %lldMB; using %ldkB for hash table\n",
>>  (unsigned long long)(total_memory >> 20), Hash_size >> 10);
>>
>> But we never reach the print in setup_kuap() which is itself called by
>> set_kup():
>>  pr_info("Activating Kernel Userspace Access Protection\n");
>>
>>
>> Could you try to narrow more the issue by spreading pr_info() at places
>> in the code below and/or the called functions ? Either we never come
>> back from MMU_init_hw(), or one of mapin_ram() btext_unmap()
>> kasan_mmu_init() fails.
>>
>> So the piece of code we are interested in is located in
>> arch/powerpc/mm/init_32.c and is:
>>
>>  /* Initialize the MMU hardware */
>>  if (ppc_md.progress)
>>  ppc_md.progress("MMU:hw init", 0x300);
>> ==>  MMU_init_hw();
>>
>>  /* Map in all of RAM starting at KERNELBASE */
>>  if (ppc_md.progress)
>>  ppc_md.progress("MMU:mapin", 0x301);
>>  mapin_ram();
>>
>>  /* Initialize early top-down ioremap allocator */
>>  ioremap_bot = IOREMAP_TOP;
>>
>>  if (ppc_md.progress)
>>  ppc_md.progress("MMU:exit", 0x211);
>>
>>  /* From now on, btext is no longer BAT mapped if it was at all */
>> #ifdef CONFIG_BOOTX_TEXT
>>  btext_unmap();
>> #endif
>>
>>  kasan_mmu_init();
>>
>> ==>  setup_kup();
> 
> I added a pr_info() right after MMU_init_hw() and another one right after 
> setup_kup().
> 
> Output of PPC_EARLY_DEBUG changes so that I get an additional black blank 
> line after
> [0.00] printk: bootconsole [udbg0] enabled
> [0.00] Total memory = 2048MB; using 4096kB for hash table
> 
> and the freeze afterwards. So it looks like we return from MMU_init_hw() but 
> not from setup_kup().

I just sent a patch with additional pr_info() in order to help locate 
the issue, please provide output with that patch.

Thanks
Christophe


[PATCH] Add pr_info() traces for investigation

2023-08-15 Thread Christophe Leroy
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/btext.c | 2 ++
 arch/powerpc/mm/book3s32/kuap.c | 5 +
 arch/powerpc/mm/book3s32/mmu.c  | 3 +++
 arch/powerpc/mm/kasan/init_32.c | 3 +++
 arch/powerpc/mm/pgtable_32.c| 4 
 5 files changed, 17 insertions(+)

diff --git a/arch/powerpc/kernel/btext.c b/arch/powerpc/kernel/btext.c
index 19e46fd623b0..ec989e1011f0 100644
--- a/arch/powerpc/kernel/btext.c
+++ b/arch/powerpc/kernel/btext.c
@@ -126,7 +126,9 @@ void __init btext_setup_display(int width, int height, int 
depth, int pitch,
 
 void __init btext_unmap(void)
 {
+   pr_info("%s:%d\n", __func__, __LINE__);
boot_text_mapped = 0;
+   pr_info("%s:%d\n", __func__, __LINE__);
 }
 
 /* Here's a small text engine to use during early boot
diff --git a/arch/powerpc/mm/book3s32/kuap.c b/arch/powerpc/mm/book3s32/kuap.c
index 28676cabb005..856d18c135bf 100644
--- a/arch/powerpc/mm/book3s32/kuap.c
+++ b/arch/powerpc/mm/book3s32/kuap.c
@@ -20,17 +20,22 @@ EXPORT_SYMBOL(kuap_unlock_all_ool);
 
 void setup_kuap(bool disabled)
 {
+   pr_info("%s:%d\n", __func__, __LINE__);
if (!disabled) {
+   pr_info("%s:%d\n", __func__, __LINE__);
kuap_lock_all_ool();
init_mm.context.sr0 |= SR_KS;
current->thread.sr0 |= SR_KS;
}
+   pr_info("%s:%d\n", __func__, __LINE__);
 
if (smp_processor_id() != boot_cpuid)
return;
 
+   pr_info("%s:%d\n", __func__, __LINE__);
if (disabled)
static_branch_enable(&disable_kuap_key);
else
pr_info("Activating Kernel Userspace Access Protection\n");
+   pr_info("%s:%d\n", __func__, __LINE__);
 }
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 850783cfa9c7..ab0c19acd3f3 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -143,6 +143,7 @@ static unsigned long __init __mmu_mapin_ram(unsigned long 
base, unsigned long to
 {
int idx;
 
+   pr_info("%s:%d %lx %lx\n", __func__, __LINE__, base, top);
while ((idx = find_free_bat()) != -1 && base != top) {
unsigned int size = bat_block_size(base, top);
 
@@ -151,6 +152,7 @@ static unsigned long __init __mmu_mapin_ram(unsigned long 
base, unsigned long to
setbat(idx, PAGE_OFFSET + base, base, size, PAGE_KERNEL_X);
base += size;
}
+   pr_info("%s:%d %lx\n", __func__, __LINE__, base);
 
return base;
 }
@@ -164,6 +166,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
size = roundup_pow_of_two((unsigned long)_einittext - PAGE_OFFSET);
setibat(0, PAGE_OFFSET, 0, size, PAGE_KERNEL_X);
 
+   pr_info("%s:%d %lx %lx %lx %lx\n", __func__, __LINE__, base, top, 
border, size);
if (debug_pagealloc_enabled_or_kfence()) {
pr_debug_once("Read-Write memory mapped without BATs\n");
if (base >= border)
diff --git a/arch/powerpc/mm/kasan/init_32.c b/arch/powerpc/mm/kasan/init_32.c
index a70828a6d935..a0f82d9a1fa6 100644
--- a/arch/powerpc/mm/kasan/init_32.c
+++ b/arch/powerpc/mm/kasan/init_32.c
@@ -126,12 +126,15 @@ void __init kasan_mmu_init(void)
 {
int ret;
 
+   pr_info("%s:%d\n", __func__, __LINE__);
if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE)) {
ret = kasan_init_shadow_page_tables(KASAN_SHADOW_START, 
KASAN_SHADOW_END);
+   pr_info("%s:%d %d\n", __func__, __LINE__, ret);
 
if (ret)
panic("kasan: kasan_init_shadow_page_tables() failed");
}
+   pr_info("%s:%d\n", __func__, __LINE__);
 }
 
 void __init kasan_init(void)
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 5c02fd08d61e..d75084f67d9b 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -104,6 +104,7 @@ static void __init __mapin_ram_chunk(unsigned long offset, 
unsigned long top)
phys_addr_t p;
bool ktext;
 
+   pr_info("%s:%d %lx %lx\n", __func__, __LINE__, offset, top);
s = offset;
v = PAGE_OFFSET + s;
p = memstart_addr + s;
@@ -113,6 +114,7 @@ static void __init __mapin_ram_chunk(unsigned long offset, 
unsigned long top)
v += PAGE_SIZE;
p += PAGE_SIZE;
}
+   pr_info("%s:%d\n", __func__, __LINE__);
 }
 
 void __init mapin_ram(void)
@@ -120,6 +122,7 @@ void __init mapin_ram(void)
phys_addr_t base, end;
u64 i;
 
+   pr_info("%s:%d\n", __func__, __LINE__);
for_each_mem_range(i, &base, &end) {
phys_addr_t top = min(end, total_lowmem);
 
@@ -128,6 +131,7 @@ void __init mapin_ram(void)
base = mmu_mapin_ram(base, top);
__mapin_ram_chunk(base, top);
}
+   pr_info("%s:%d\n", __func__, __LINE__);
 }
 
 void mark_initmem_nx(void)
-- 
2.41.0



Re: [BUG] Re: [PATCH v3 10/13] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock()

2023-08-15 Thread Hugh Dickins
On Tue, 15 Aug 2023, David Hildenbrand wrote:
> On 15.08.23 08:34, Hugh Dickins wrote:
> > On Mon, 14 Aug 2023, Jann Horn wrote:
> >>
> >>  /* step 4: remove page table */
> >> +   if (strcmp(current->comm, "DELAYME") == 0) {
> >> +   pr_warn("%s: BEGIN DELAY INJECTION\n", __func__);
> >> +   mdelay(5000);
> >> +   pr_warn("%s: END DELAY INJECTION\n", __func__);
> >> +   }
> >>
> >>  /* Huge page lock is still held, so page table must remain empty
> >>  */
> >>  pml = pmd_lock(mm, pmd);
> >>
> >>
> >> And then run the attached reproducer against mm/mm-everything. You
> >> should get this in dmesg:
> >>
> >> [  206.578096] BUG: Bad rss-counter state mm:0942ebea
> >> type:MM_ANONPAGES val:1
> > 
> > Thanks a lot, Jann. I haven't thought about it at all yet; and just
> > tried to reproduce, but haven't yet got the "BUG: Bad rss-counter":
> > just see "Invalid argument" on the UFFDIO_COPY ioctl.
> > Will investigate tomorrow.
> 
> Maybe you're missing a fixup:
> 
> https://lkml.kernel.org/r/20230810192128.1855570-1-axelrasmus...@google.com
> 
> When the src address is not page aligned, UFFDIO_COPY in mm-unstable would
> erroneously fail.

You got it, many thanks David: I had assumed that my next-20230808 tree
would be up-to-date enough, but it wasn't.  Reproduced now.

Hugh 


[PATCH v4 5/6] integrity: PowerVM machine keyring enablement

2023-08-15 Thread Nayna Jain
Update Kconfig to enable machine keyring and limit to CA certificates
on PowerVM. Only key signing CA keys are allowed.

Signed-off-by: Nayna Jain 
Reviewed-and-tested-by: Mimi Zohar 
Reviewed-by: Jarkko Sakkinen 
---
 security/integrity/Kconfig | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/security/integrity/Kconfig b/security/integrity/Kconfig
index ec6e0d789da1..232191ee09e3 100644
--- a/security/integrity/Kconfig
+++ b/security/integrity/Kconfig
@@ -67,7 +67,9 @@ config INTEGRITY_MACHINE_KEYRING
depends on SECONDARY_TRUSTED_KEYRING
depends on INTEGRITY_ASYMMETRIC_KEYS
depends on SYSTEM_BLACKLIST_KEYRING
-   depends on LOAD_UEFI_KEYS
+   depends on LOAD_UEFI_KEYS || LOAD_PPC_KEYS
+   select INTEGRITY_CA_MACHINE_KEYRING if LOAD_PPC_KEYS
+   select INTEGRITY_CA_MACHINE_KEYRING_MAX if LOAD_PPC_KEYS
help
 If set, provide a keyring to which Machine Owner Keys (MOK) may
 be added. This keyring shall contain just MOK keys.  Unlike keys
-- 
2.31.1



[PATCH v4 1/6] integrity: PowerVM support for loading CA keys on machine keyring

2023-08-15 Thread Nayna Jain
Keys that derive their trust from an entity such as a security officer,
administrator, system owner, or machine owner are said to have "imputed
trust". CA keys with imputed trust can be loaded onto the machine keyring.
The mechanism for loading these keys onto the machine keyring is platform
dependent.

Load keys stored in the variable trustedcadb onto the .machine keyring
on PowerVM platform.

Signed-off-by: Nayna Jain 
Reviewed-and-tested-by: Mimi Zohar 
---
 .../integrity/platform_certs/keyring_handler.c  |  8 
 .../integrity/platform_certs/keyring_handler.h  |  5 +
 .../integrity/platform_certs/load_powerpc.c | 17 +
 3 files changed, 30 insertions(+)

diff --git a/security/integrity/platform_certs/keyring_handler.c 
b/security/integrity/platform_certs/keyring_handler.c
index 8a1124e4d769..1649d047e3b8 100644
--- a/security/integrity/platform_certs/keyring_handler.c
+++ b/security/integrity/platform_certs/keyring_handler.c
@@ -69,6 +69,14 @@ __init efi_element_handler_t get_handler_for_mok(const 
efi_guid_t *sig_type)
return NULL;
 }
 
+__init efi_element_handler_t get_handler_for_ca_keys(const efi_guid_t 
*sig_type)
+{
+   if (efi_guidcmp(*sig_type, efi_cert_x509_guid) == 0)
+   return add_to_machine_keyring;
+
+   return NULL;
+}
+
 /*
  * Return the appropriate handler for particular signature list types found in
  * the UEFI dbx and MokListXRT tables.
diff --git a/security/integrity/platform_certs/keyring_handler.h 
b/security/integrity/platform_certs/keyring_handler.h
index 212d894a8c0c..6f15bb4cc8dc 100644
--- a/security/integrity/platform_certs/keyring_handler.h
+++ b/security/integrity/platform_certs/keyring_handler.h
@@ -29,6 +29,11 @@ efi_element_handler_t get_handler_for_db(const efi_guid_t 
*sig_type);
  */
 efi_element_handler_t get_handler_for_mok(const efi_guid_t *sig_type);
 
+/*
+ * Return the handler for particular signature list types for CA keys.
+ */
+efi_element_handler_t get_handler_for_ca_keys(const efi_guid_t *sig_type);
+
 /*
  * Return the handler for particular signature list types found in the dbx.
  */
diff --git a/security/integrity/platform_certs/load_powerpc.c 
b/security/integrity/platform_certs/load_powerpc.c
index 170789dc63d2..339053d9726d 100644
--- a/security/integrity/platform_certs/load_powerpc.c
+++ b/security/integrity/platform_certs/load_powerpc.c
@@ -59,6 +59,7 @@ static __init void *get_cert_list(u8 *key, unsigned long 
keylen, u64 *size)
 static int __init load_powerpc_certs(void)
 {
void *db = NULL, *dbx = NULL, *data = NULL;
+   void *trustedca;
u64 dsize = 0;
u64 offset = 0;
int rc = 0;
@@ -120,6 +121,22 @@ static int __init load_powerpc_certs(void)
kfree(data);
}
 
+   data = get_cert_list("trustedcadb", 12,  );
+   if (!data) {
+   pr_info("Couldn't get trustedcadb list from firmware\n");
+   } else if (IS_ERR(data)) {
+   rc = PTR_ERR(data);
+   pr_err("Error reading trustedcadb from firmware: %d\n", rc);
+   } else {
+   extract_esl(trustedca, data, dsize, offset);
+
+   rc = parse_efi_signature_list("powerpc:trustedca", trustedca, 
dsize,
+ get_handler_for_ca_keys);
+   if (rc)
+   pr_err("Couldn't parse trustedcadb signatures: %d\n", 
rc);
+   kfree(data);
+   }
+
return rc;
 }
 late_initcall(load_powerpc_certs);
-- 
2.31.1



[PATCH v4 0/6] Enable loading local and third party keys on PowerVM guest

2023-08-15 Thread Nayna Jain
On a secure boot enabled PowerVM guest, local and third party code signing
keys are needed to verify signed applications, configuration files, and
kernel modules.

Loading these keys onto either the .secondary_trusted_keys or .ima
keyrings requires the certificates be signed by keys on the
.builtin_trusted_keys, .machine or .secondary_trusted_keys keyrings.

Keys on the .builtin_trusted_keys keyring are trusted because of the chain
of trust from secure boot up to and including the linux kernel.  Keys on
the .machine keyring that derive their trust from an entity such as a
security officer, administrator, system owner, or machine owner are said
to have "imputed trust." The type of certificates and the mechanism for
loading them onto the .machine keyring is platform dependent.

Userspace may load certificates onto the .secondary_trusted_keys or .ima
keyrings. However, keys may also need to be loaded by the kernel if they
are needed for verification early in boot. On a PowerVM guest, third
party code signing keys are loaded from the moduledb variable in the
Platform KeyStore (PKS) onto the .secondary_trusted_keys keyring.

The purpose of this patch set is to allow loading of local and third party
code signing keys on PowerVM.

Changelog:

v4:

* Fixed build error reported by Nageswara R Sastry as part of his
testing of the patches.
* Included Jarkko's and Mimi's feedback

v3:

* Included Jarkko's feedback for Patch 6/6.

v2:

* Patch 5/6: Update CA restriction to allow only key signing CAs.
* Rebase on Jarkko's master tree - https://kernel.googlesource.com/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
* Tested after reverting cfa7522f280aa95 because of a build failure due
to this commit.

Nayna Jain (6):
  integrity: PowerVM support for loading CA keys on machine keyring
  integrity: ignore keys failing CA restrictions on non-UEFI platform
  integrity: remove global variable from machine_keyring.c
  integrity: check whether imputed trust is enabled
  integrity: PowerVM machine keyring enablement
  integrity: PowerVM support for loading third party code signing keys

 certs/system_keyring.c| 30 
 include/keys/system_keyring.h |  4 +++
 security/integrity/Kconfig|  4 ++-
 security/integrity/digsig.c   |  2 +-
 security/integrity/integrity.h|  5 +--
 .../platform_certs/keyring_handler.c  | 19 ++-
 .../platform_certs/keyring_handler.h  | 10 ++
 .../integrity/platform_certs/load_powerpc.c   | 34 +++
 .../platform_certs/machine_keyring.c  | 22 +---
 9 files changed, 121 insertions(+), 9 deletions(-)

-- 
2.31.1


[PATCH v4 4/6] integrity: check whether imputed trust is enabled

2023-08-15 Thread Nayna Jain
trust_moklist() is specific to UEFI enabled systems. Other platforms
rely only on the Kconfig.

Define a generic wrapper named imputed_trust_enabled().

Signed-off-by: Nayna Jain 
Reviewed-by: Mimi Zohar 
---
 security/integrity/digsig.c|  2 +-
 security/integrity/integrity.h |  5 +++--
 .../integrity/platform_certs/keyring_handler.c |  3 ++-
 .../integrity/platform_certs/machine_keyring.c | 18 --
 4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/security/integrity/digsig.c b/security/integrity/digsig.c
index d0704b1597d4..df387de29bfa 100644
--- a/security/integrity/digsig.c
+++ b/security/integrity/digsig.c
@@ -113,7 +113,7 @@ static int __init __integrity_init_keyring(const unsigned int id,
} else {
if (id == INTEGRITY_KEYRING_PLATFORM)
set_platform_trusted_keys(keyring[id]);
-   if (id == INTEGRITY_KEYRING_MACHINE && trust_moklist())
+   if (id == INTEGRITY_KEYRING_MACHINE && imputed_trust_enabled())
set_machine_trusted_keys(keyring[id]);
if (id == INTEGRITY_KEYRING_IMA)
load_module_cert(keyring[id]);
diff --git a/security/integrity/integrity.h b/security/integrity/integrity.h
index 7167a6e99bdc..d7553c93f5c0 100644
--- a/security/integrity/integrity.h
+++ b/security/integrity/integrity.h
@@ -320,13 +320,14 @@ static inline void __init add_to_platform_keyring(const char *source,
 
 #ifdef CONFIG_INTEGRITY_MACHINE_KEYRING
void __init add_to_machine_keyring(const char *source, const void *data, size_t len);
-bool __init trust_moklist(void);
+bool __init imputed_trust_enabled(void);
 #else
 static inline void __init add_to_machine_keyring(const char *source,
  const void *data, size_t len)
 {
 }
-static inline bool __init trust_moklist(void)
+
+static inline bool __init imputed_trust_enabled(void)
 {
return false;
 }
diff --git a/security/integrity/platform_certs/keyring_handler.c b/security/integrity/platform_certs/keyring_handler.c
index 1649d047e3b8..586027b9a3f5 100644
--- a/security/integrity/platform_certs/keyring_handler.c
+++ b/security/integrity/platform_certs/keyring_handler.c
@@ -61,7 +61,8 @@ __init efi_element_handler_t get_handler_for_db(const efi_guid_t *sig_type)
 __init efi_element_handler_t get_handler_for_mok(const efi_guid_t *sig_type)
 {
if (efi_guidcmp(*sig_type, efi_cert_x509_guid) == 0) {
-   if (IS_ENABLED(CONFIG_INTEGRITY_MACHINE_KEYRING) && trust_moklist())
+   if (IS_ENABLED(CONFIG_INTEGRITY_MACHINE_KEYRING) &&
+   imputed_trust_enabled())
return add_to_machine_keyring;
else
return add_to_platform_keyring;
diff --git a/security/integrity/platform_certs/machine_keyring.c b/security/integrity/platform_certs/machine_keyring.c
index 9482e16cb2ca..a401640a63cd 100644
--- a/security/integrity/platform_certs/machine_keyring.c
+++ b/security/integrity/platform_certs/machine_keyring.c
@@ -34,7 +34,8 @@ void __init add_to_machine_keyring(const char *source, const void *data, size_t
 * If the restriction check does not pass and the platform keyring
 * is configured, try to add it into that keyring instead.
 */
-   if (rc && efi_enabled(EFI_BOOT) && IS_ENABLED(CONFIG_INTEGRITY_PLATFORM_KEYRING))
+   if (rc && efi_enabled(EFI_BOOT) &&
+   IS_ENABLED(CONFIG_INTEGRITY_PLATFORM_KEYRING))
rc = integrity_load_cert(INTEGRITY_KEYRING_PLATFORM, source,
 data, len, perm);
 
@@ -60,7 +61,7 @@ static __init bool uefi_check_trust_mok_keys(void)
return false;
 }
 
-bool __init trust_moklist(void)
+static bool __init trust_moklist(void)
 {
static bool initialized;
static bool trust_mok;
@@ -75,3 +76,16 @@ bool __init trust_moklist(void)
 
return trust_mok;
 }
+
+/*
+ * Provides platform specific check for trusting imputed keys before loading
+ * on .machine keyring. UEFI systems enable this trust based on a variable,
+ * and for other platforms, it is always enabled.
+ */
+bool __init imputed_trust_enabled(void)
+{
+   if (efi_enabled(EFI_BOOT))
+   return trust_moklist();
+
+   return true;
+}
-- 
2.31.1



Re: [PATCH] powerpc: Make virt_to_pfn() a static inline

2023-08-15 Thread Michael Ellerman
Linus Walleij  writes:
> On Tue, Aug 15, 2023 at 9:30 AM Michael Ellerman  wrote:
>> Linus Walleij  writes:
>
>> > - return ((unsigned long)__va(pmd_val(pmd) & ~PMD_MASKED_BITS));
>> > + return (const void *)((unsigned long)__va(pmd_val(pmd) & ~PMD_MASKED_BITS));
>>
>> This can also just be:
>>
>> return __va(pmd_val(pmd) & ~PMD_MASKED_BITS);
>>
>> I've squashed that in.
>
> Oh you applied it, then I don't need to send revised versions, thanks Michael!

Yeah, it's in my next-test, so I can still change it if needed for a day
or two. But if you're happy with me squashing those changes in then
that's easy, no need to send a v2.

cheers


Re: [PATCH] powerpc/32s: Cleanup the mess in __set_pte_at()

2023-08-15 Thread Michael Ellerman
Christophe Leroy  writes:
> __set_pte_at() handles 3 main cases with #ifdefs plus the 'percpu'
> subcase which leads to code duplication.
>
> Rewrite the function using IS_ENABLED() to minimise the total number
> of cases and remove duplicated code.

I think the code change is good, but the comment becomes misleading:
it still talks about "the first case" etc., while the code is now
structured differently.

cheers
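One way to keep the comment in sync would be to reorder it to match the
new control flow, e.g. (a sketch against the restructured code quoted
below, not the actual resend; wording taken from the existing comment):

/* This low level function performs the actual PTE insertion.
 * Setting the PTE depends on the MMU type and other factors.
 *
 * For a per-CPU PTE such as a kmap_atomic, or on non-SMP 32-bit hash
 * with 32-bit PTEs, do a simple update preserving _PAGE_HASHPTE, since
 * the previous translation may not have been invalidated in the hash
 * yet (done in a subsequent flush_tlb_xxx()).
 *
 * For 32-bit with 64-bit PTEs, pre-invalidate any existing hash entry,
 * then store the two halves in the right order with a barrier in
 * between; the pre-invalidation synchronizes us with any concurrent
 * invalidation.
 *
 * Otherwise (32-bit hash MMU in SMP mode with 32-bit PTEs), use
 * pte_update() for an atomic update, because a concurrent invalidation
 * can clear _PAGE_HASHPTE.
 */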

> diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
> index 40041ac713d9..2a0ca1f9a1ff 100644
> --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
> @@ -534,58 +534,43 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>  
>  
>  /* This low level function performs the actual PTE insertion
> - * Setting the PTE depends on the MMU type and other factors. It's
> - * an horrible mess that I'm not going to try to clean up now but
> - * I'm keeping it in one place rather than spread around
> + * Setting the PTE depends on the MMU type and other factors.
> + *
> + * First case is 32-bit Hash MMU in SMP mode with 32-bit PTEs. We use the
> + * helper pte_update() which does an atomic update. We need to do that
> + * because a concurrent invalidation can clear _PAGE_HASHPTE. If it's a
> + * per-CPU PTE such as a kmap_atomic, we do a simple update preserving
> + * the hash bits instead (ie, same as the non-SMP case)
> + *
> + * Second case is 32-bit with 64-bit PTE.  In this case, we
> + * can just store as long as we do the two halves in the right order
> + * with a barrier in between. This is possible because we take care,
> + * in the hash code, to pre-invalidate if the PTE was already hashed,
> + * which synchronizes us with any concurrent invalidation.
> + * In the percpu case, we also fallback to the simple update preserving
> + * the hash bits
> + *
> + * Third case is 32-bit hash table in UP mode, we need to preserve
> + * the _PAGE_HASHPTE bit since we may not have invalidated the previous
> + * translation in the hash yet (done in a subsequent flush_tlb_xxx())
> + * and see we need to keep track that this PTE needs invalidating
>   */
>  static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
>   pte_t *ptep, pte_t pte, int percpu)
>  {
> -#if defined(CONFIG_SMP) && !defined(CONFIG_PTE_64BIT)
> - /* First case is 32-bit Hash MMU in SMP mode with 32-bit PTEs. We use the
> -  * helper pte_update() which does an atomic update. We need to do that
> -  * because a concurrent invalidation can clear _PAGE_HASHPTE. If it's a
> -  * per-CPU PTE such as a kmap_atomic, we do a simple update preserving
> -  * the hash bits instead (ie, same as the non-SMP case)
> -  */
> - if (percpu)
> - *ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE)
> -   | (pte_val(pte) & ~_PAGE_HASHPTE));
> - else
> - pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, pte_val(pte), 0);
> + if ((!IS_ENABLED(CONFIG_SMP) && !IS_ENABLED(CONFIG_PTE_64BIT)) || percpu) {
> + *ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE) |
> +   (pte_val(pte) & ~_PAGE_HASHPTE));
> + } else if (IS_ENABLED(CONFIG_PTE_64BIT)) {
> + if (pte_val(*ptep) & _PAGE_HASHPTE)
> + flush_hash_entry(mm, ptep, addr);
>  
> -#elif defined(CONFIG_PTE_64BIT)
> - /* Second case is 32-bit with 64-bit PTE.  In this case, we
> -  * can just store as long as we do the two halves in the right order
> -  * with a barrier in between. This is possible because we take care,
> -  * in the hash code, to pre-invalidate if the PTE was already hashed,
> -  * which synchronizes us with any concurrent invalidation.
> -  * In the percpu case, we also fallback to the simple update preserving
> -  * the hash bits
> -  */
> - if (percpu) {
> - *ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE)
> -   | (pte_val(pte) & ~_PAGE_HASHPTE));
> - return;
> + asm volatile("stw%X0 %2,%0; eieio; stw%X1 %L2,%1" :
> +  "=m" (*ptep), "=m" (*((unsigned char *)ptep+4)) :
> +  "r" (pte) : "memory");
> + } else {
> + pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, pte_val(pte), 0);
>   }
> - if (pte_val(*ptep) & _PAGE_HASHPTE)
> - flush_hash_entry(mm, ptep, addr);
> - __asm__ __volatile__("\
> - stw%X0 %2,%0\n\
> - eieio\n\
> - stw%X1 %L2,%1"
> - : "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
> - : "r" (pte) : "memory");
> -
> -#else
> - /* Third case is 32-bit hash table in UP mode, we need to preserve
> -  * the _PAGE_HASHPTE bit since we may not have invalidated the previous
> -  * translation in the hash yet (done in a subsequent flush_tlb_xxx())
> -  

[PATCH v4 6/6] integrity: PowerVM support for loading third party code signing keys

2023-08-15 Thread Nayna Jain
On a secure boot enabled PowerVM LPAR, third party code signing keys are
needed during early boot to verify signed third party modules. These
third party keys are stored in the moduledb object in the Platform
KeyStore (PKS).

Load the third party code signing keys onto the .secondary_trusted_keys
keyring.

Signed-off-by: Nayna Jain 
---
 certs/system_keyring.c| 30 +++
 include/keys/system_keyring.h |  4 +++
 .../platform_certs/keyring_handler.c  |  8 +
 .../platform_certs/keyring_handler.h  |  5 
 .../integrity/platform_certs/load_powerpc.c   | 17 +++
 5 files changed, 64 insertions(+)

diff --git a/certs/system_keyring.c b/certs/system_keyring.c
index b348e0898d34..33841c91f12c 100644
--- a/certs/system_keyring.c
+++ b/certs/system_keyring.c
@@ -152,6 +152,36 @@ static __init struct key_restriction *get_builtin_and_secondary_restriction(void
 
return restriction;
 }
+
+/**
+ * add_to_secondary_keyring - Add to secondary keyring.
+ * @source: Source of key
+ * @data: The blob holding the key
+ * @len: The length of the data blob
+ *
+ * Add a key to the secondary keyring. The key must be vouched for by a key in the builtin,
+ * machine or secondary keyring itself.
+ */
+void __init add_to_secondary_keyring(const char *source, const void *data, size_t len)
+{
+   key_ref_t key;
+   key_perm_t perm;
+
+   perm = (KEY_POS_ALL & ~KEY_POS_SETATTR) | KEY_USR_VIEW;
+
+   key = key_create_or_update(make_key_ref(secondary_trusted_keys, 1),
+  "asymmetric",
+  NULL, data, len, perm,
+  KEY_ALLOC_NOT_IN_QUOTA);
+   if (IS_ERR(key)) {
+   pr_err("Problem loading X.509 certificate from %s to secondary 
keyring %ld\n",
+  source, PTR_ERR(key));
+   return;
+   }
+
+   pr_notice("Loaded X.509 cert '%s'\n", key_ref_to_ptr(key)->description);
+   key_ref_put(key);
+}
 #endif
 #ifdef CONFIG_INTEGRITY_MACHINE_KEYRING
 void __init set_machine_trusted_keys(struct key *keyring)
diff --git a/include/keys/system_keyring.h b/include/keys/system_keyring.h
index 7e2583208820..8365adf842ef 100644
--- a/include/keys/system_keyring.h
+++ b/include/keys/system_keyring.h
@@ -50,9 +50,13 @@ int restrict_link_by_digsig_builtin_and_secondary(struct key *keyring,
  const struct key_type *type,
  const union key_payload *payload,
  struct key *restriction_key);
+void __init add_to_secondary_keyring(const char *source, const void *data, size_t len);
 #else
#define restrict_link_by_builtin_and_secondary_trusted restrict_link_by_builtin_trusted
#define restrict_link_by_digsig_builtin_and_secondary restrict_link_by_digsig_builtin
+static inline void __init add_to_secondary_keyring(const char *source, const void *data, size_t len)
+{
+}
 #endif
 
 #ifdef CONFIG_INTEGRITY_MACHINE_KEYRING
diff --git a/security/integrity/platform_certs/keyring_handler.c b/security/integrity/platform_certs/keyring_handler.c
index 586027b9a3f5..13ea17207902 100644
--- a/security/integrity/platform_certs/keyring_handler.c
+++ b/security/integrity/platform_certs/keyring_handler.c
@@ -78,6 +78,14 @@ __init efi_element_handler_t get_handler_for_ca_keys(const efi_guid_t *sig_type)
return NULL;
 }
 
+__init efi_element_handler_t get_handler_for_code_signing_keys(const efi_guid_t *sig_type)
+{
+   if (efi_guidcmp(*sig_type, efi_cert_x509_guid) == 0)
+   return add_to_secondary_keyring;
+
+   return NULL;
+}
+
 /*
  * Return the appropriate handler for particular signature list types found in
  * the UEFI dbx and MokListXRT tables.
diff --git a/security/integrity/platform_certs/keyring_handler.h b/security/integrity/platform_certs/keyring_handler.h
index 6f15bb4cc8dc..f92895cc50f6 100644
--- a/security/integrity/platform_certs/keyring_handler.h
+++ b/security/integrity/platform_certs/keyring_handler.h
@@ -34,6 +34,11 @@ efi_element_handler_t get_handler_for_mok(const efi_guid_t *sig_type);
  */
 efi_element_handler_t get_handler_for_ca_keys(const efi_guid_t *sig_type);
 
+/*
 * Return the handler for particular signature list types for code signing keys.
+ */
efi_element_handler_t get_handler_for_code_signing_keys(const efi_guid_t *sig_type);
+
 /*
  * Return the handler for particular signature list types found in the dbx.
  */
diff --git a/security/integrity/platform_certs/load_powerpc.c b/security/integrity/platform_certs/load_powerpc.c
index 339053d9726d..c85febca3343 100644
--- a/security/integrity/platform_certs/load_powerpc.c
+++ b/security/integrity/platform_certs/load_powerpc.c
@@ -60,6 +60,7 @@ static int __init load_powerpc_certs(void)
 {
void *db = NULL, *dbx = NULL, *data = NULL;
void *trustedca;
+   void 

[PATCH v4 3/6] integrity: remove global variable from machine_keyring.c

2023-08-15 Thread Nayna Jain
The trust_mok variable is only accessed locally, within a single function.

Change trust_mok from a global to a local static variable.

Signed-off-by: Nayna Jain 
Reviewed-and-tested-by: Mimi Zohar 
Reviewed-by: Jarkko Sakkinen 
---
 security/integrity/platform_certs/machine_keyring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/security/integrity/platform_certs/machine_keyring.c b/security/integrity/platform_certs/machine_keyring.c
index 389a6e7c9245..9482e16cb2ca 100644
--- a/security/integrity/platform_certs/machine_keyring.c
+++ b/security/integrity/platform_certs/machine_keyring.c
@@ -8,8 +8,6 @@
 #include 
 #include "../integrity.h"
 
-static bool trust_mok;
-
 static __init int machine_keyring_init(void)
 {
int rc;
@@ -65,9 +63,11 @@ static __init bool uefi_check_trust_mok_keys(void)
 bool __init trust_moklist(void)
 {
static bool initialized;
+   static bool trust_mok;
 
if (!initialized) {
initialized = true;
+   trust_mok = false;
 
if (uefi_check_trust_mok_keys())
trust_mok = true;
-- 
2.31.1



[PATCH v4 2/6] integrity: ignore keys failing CA restrictions on non-UEFI platform

2023-08-15 Thread Nayna Jain
On non-UEFI platforms, handle restrict_link_by_ca failures differently.

Certificates which do not satisfy CA restrictions on non-UEFI platforms
are ignored.

Signed-off-by: Nayna Jain 
Reviewed-and-tested-by: Mimi Zohar 
Acked-by: Jarkko Sakkinen 
---
 security/integrity/platform_certs/machine_keyring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/integrity/platform_certs/machine_keyring.c b/security/integrity/platform_certs/machine_keyring.c
index 7aaed7950b6e..389a6e7c9245 100644
--- a/security/integrity/platform_certs/machine_keyring.c
+++ b/security/integrity/platform_certs/machine_keyring.c
@@ -36,7 +36,7 @@ void __init add_to_machine_keyring(const char *source, const void *data, size_t
 * If the restriction check does not pass and the platform keyring
 * is configured, try to add it into that keyring instead.
 */
-   if (rc && IS_ENABLED(CONFIG_INTEGRITY_PLATFORM_KEYRING))
+   if (rc && efi_enabled(EFI_BOOT) && IS_ENABLED(CONFIG_INTEGRITY_PLATFORM_KEYRING))
rc = integrity_load_cert(INTEGRITY_KEYRING_PLATFORM, source,
 data, len, perm);
 
-- 
2.31.1



[PATCH] powerpc: Move zalloc_maybe_bootmem() into pci-common.c

2023-08-15 Thread Christophe Leroy
zalloc_maybe_bootmem() is only used by PCI related functions.

Move it into pci-common.c and remove the always-built alloc.c.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/pci.h   |  2 ++
 arch/powerpc/include/asm/setup.h |  1 -
 arch/powerpc/kernel/pci-common.c | 16 
 arch/powerpc/lib/Makefile|  2 +-
 arch/powerpc/lib/alloc.c | 23 ---
 5 files changed, 19 insertions(+), 25 deletions(-)
 delete mode 100644 arch/powerpc/lib/alloc.c

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index f5078a7dd85a..13d36ec3a5ea 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -45,6 +45,8 @@ static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int channel)
return channel ? 15 : 14;
 }
 
+void *zalloc_maybe_bootmem(size_t size, gfp_t mask);
+
 #ifdef CONFIG_PCI
 void __init set_pci_dma_ops(const struct dma_map_ops *dma_ops);
 #else  /* CONFIG_PCI */
diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index e29e83f8a89c..eed74c1fb832 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -8,7 +8,6 @@
 extern void ppc_printk_progress(char *s, unsigned short hex);
 
 extern unsigned long long memory_limit;
-extern void *zalloc_maybe_bootmem(size_t size, gfp_t mask);
 
 struct device_node;
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index e88d7c9feeec..34e66b06a030 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -121,6 +122,21 @@ static int get_phb_number(struct device_node *dn)
return phb_id;
 }
 
+void * __ref zalloc_maybe_bootmem(size_t size, gfp_t mask)
+{
+   void *p;
+
+   if (slab_is_available()) {
+   p = kzalloc(size, mask);
+   } else {
+   p = memblock_alloc(size, SMP_CACHE_BYTES);
+   if (!p)
+   panic("%s: Failed to allocate %zu bytes\n", __func__,
+ size);
+   }
+   return p;
+}
+
 struct pci_controller *pcibios_alloc_controller(struct device_node *dev)
 {
struct pci_controller *phb;
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 9aa8286c9687..51ad0397c17a 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -27,7 +27,7 @@ endif
 CFLAGS_code-patching.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
 CFLAGS_feature-fixups.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
 
-obj-y += alloc.o code-patching.o feature-fixups.o pmem.o
+obj-y += code-patching.o feature-fixups.o pmem.o
 
 obj-$(CONFIG_CODE_PATCHING_SELFTEST) += test-code-patching.o
 
diff --git a/arch/powerpc/lib/alloc.c b/arch/powerpc/lib/alloc.c
deleted file mode 100644
index ce180870bd52..
--- a/arch/powerpc/lib/alloc.c
+++ /dev/null
@@ -1,23 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-
-void * __ref zalloc_maybe_bootmem(size_t size, gfp_t mask)
-{
-   void *p;
-
-   if (slab_is_available())
-   p = kzalloc(size, mask);
-   else {
-   p = memblock_alloc(size, SMP_CACHE_BYTES);
-   if (!p)
-   panic("%s: Failed to allocate %zu bytes\n", __func__,
- size);
-   }
-   return p;
-}
-- 
2.41.0
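For reference, the moved helper keeps its dual-mode behaviour, so the
typical PCI call site is unchanged. A minimal sketch of the pattern,
modelled on pcibios_alloc_controller() (not part of the diff above):

	/* Works both before and after the slab allocator is up: kzalloc()
	 * once slab_is_available(), memblock otherwise. */
	struct pci_controller *phb = zalloc_maybe_bootmem(sizeof(*phb), GFP_KERNEL);
	if (phb == NULL)
		return NULL;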



Re: [PATCH v2] powernv/opal-prd: Silence memcpy() run-time false positive warnings

2023-08-15 Thread Michael Ellerman
Joel Stanley  writes:
> On Fri, 7 Jul 2023 at 05:11, Mahesh Salgaonkar  wrote:
>>
>> opal_prd_msg_notifier extracts the opal prd message size from the message
>> header and uses it for allocating opal_prd_msg_queue_item that includes
>> the correct message size to be copied. However, while running under
>> CONFIG_FORTIFY_SOURCE=y, it triggers following run-time warning:
>>
>> [ 6458.234352] memcpy: detected field-spanning write (size 32) of single field "&item->msg" at arch/powerpc/platforms/powernv/opal-prd.c:355 (size 4)
>> [ 6458.234390] WARNING: CPU: 9 PID: 660 at arch/powerpc/platforms/powernv/opal-prd.c:355 opal_prd_msg_notifier+0x174/0x188 [opal_prd]
>> [...]
>> [ 6458.234709] NIP [c0080e0c0e6c] opal_prd_msg_notifier+0x174/0x188 [opal_prd]
>> [ 6458.234723] LR [c0080e0c0e68] opal_prd_msg_notifier+0x170/0x188 [opal_prd]
>> [ 6458.234736] Call Trace:
>> [ 6458.234742] [c002acb23c10] [c0080e0c0e68] opal_prd_msg_notifier+0x170/0x188 [opal_prd] (unreliable)
>> [ 6458.234759] [c002acb23ca0] [c019ccc0] notifier_call_chain+0xc0/0x1b0
>> [ 6458.234774] [c002acb23d00] [c019ceac] atomic_notifier_call_chain+0x2c/0x40
>> [ 6458.234788] [c002acb23d20] [c00d69b4] opal_message_notify+0xf4/0x2c0
>> [...]
>>
>> Split the copy to avoid false positive run-time warning.
>>
>> Reported-by: Aneesh Kumar K.V 
>> Signed-off-by: Mahesh Salgaonkar 
>
> I hit this on a box running the Ubuntu 6.2.0-27-generic kernel.
>
> Do we plan on merging this fix?

I thought it was papering over the issue rather than fixing the root
cause.

I'll send a new version, as soon as I can work out how to trigger that
code path.

cheers
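For context, the idea in the patch under discussion is to stop
memcpy()ing across the 'msg' header field boundary. A minimal sketch of
that split-copy shape (structure layout and names here are illustrative,
not lifted from the actual patch):

	/* FORTIFY_SOURCE sees only the small 'msg' member as the
	 * destination, so copy the fixed-size header and the
	 * variable-length tail separately instead of in one
	 * field-spanning memcpy(). */
	memcpy(&item->msg, msg, sizeof(item->msg));
	memcpy((char *)&item->msg + sizeof(item->msg),
	       (const char *)msg + sizeof(item->msg),
	       msg_size - sizeof(item->msg));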


Re: [PATCH v3 4/6] KVM: PPC: Book3s HV: Hold LPIDs in an unsigned long

2023-08-15 Thread Michael Ellerman
"Nicholas Piggin"  writes:
> On Mon Aug 7, 2023 at 11:45 AM AEST, Jordan Niethe wrote:
>> The LPID register is 32 bits long. The host keeps the lpids for each
>> guest in an unsigned word struct kvm_arch. Currently, LPIDs are already
>> limited by mmu_lpid_bits and KVM_MAX_NESTED_GUESTS_SHIFT.
>>
>> The nestedv2 API returns a 64 bit "Guest ID" to be used by the L1 host
>> for each L2 guest. This value is used as an lpid, e.g. it is the
>> parameter used by H_RPT_INVALIDATE. To minimize needless special casing
>> it makes sense to keep this "Guest ID" in struct kvm_arch::lpid.
>>
>> This means that struct kvm_arch::lpid is too small so prepare for this
>> and make it an unsigned long. This is not a problem for the KVM-HV and
>> nestedv1 cases as their lpid values are already limited to valid ranges
>> so in those contexts the lpid can be used as an unsigned word safely as
>> needed.
>>
>> In the PAPR, the H_RPT_INVALIDATE pid/lpid parameter is already
>> specified as an unsigned long so change pseries_rpt_invalidate() to
>> match that.  Update the callers of pseries_rpt_invalidate() to also take
>> an unsigned long if they take an lpid value.
>
> I don't suppose it would be worth having an lpid_t.
>
>> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
>> index 4adff4f1896d..229f0a1ffdd4 100644
>> --- a/arch/powerpc/kvm/book3s_xive.c
>> +++ b/arch/powerpc/kvm/book3s_xive.c
>> @@ -886,10 +886,10 @@ int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
>>  
>>  if (single_escalation)
>>  name = kasprintf(GFP_KERNEL, "kvm-%d-%d",
>> - vcpu->kvm->arch.lpid, xc->server_num);
>> + (unsigned int)vcpu->kvm->arch.lpid, xc->server_num);
>>  else
>>  name = kasprintf(GFP_KERNEL, "kvm-%d-%d-%d",
>> - vcpu->kvm->arch.lpid, xc->server_num, prio);
>> + (unsigned int)vcpu->kvm->arch.lpid, xc->server_num, prio);
>>  if (!name) {
>>  pr_err("Failed to allocate escalation irq name for queue %d of VCPU %d\n",
>> prio, xc->server_num);
>
> I would have thought you'd keep the type and change the format.

Yeah. Don't we risk having ambiguous names by discarding the high bits?
Not sure that would be a bug per se, but it could be confusing.

cheers
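For illustration, keeping the unsigned long and widening the format
string instead, as suggested, would look something like this (a sketch,
not a tested patch):

	if (single_escalation)
		name = kasprintf(GFP_KERNEL, "kvm-%lu-%d",
				 vcpu->kvm->arch.lpid, xc->server_num);
	else
		name = kasprintf(GFP_KERNEL, "kvm-%lu-%d-%d",
				 vcpu->kvm->arch.lpid, xc->server_num, prio);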


Re: [linus:master] [locking] c8afaa1b0f: stress-ng.zero.ops_per_sec 6.3% improvement

2023-08-15 Thread Mateusz Guzik
On 8/15/23, Linus Torvalds  wrote:
> On Tue, 15 Aug 2023 at 07:12, kernel test robot wrote:
>>
>> kernel test robot noticed a 6.3% improvement of stress-ng.zero.ops_per_sec
>> on:
>
> WTF? That's ridiculous. Why would that even test new_inode() at all?
> And why would it make any difference anyway to prefetch a new inode?
> The 'zero' test claims to just read /dev/zero in a loop...
>
> [ Goes looking ]
>

Yeah man, I was puzzled myself, but I just figured it out and was about to respond ;)

# bpftrace -e 'kprobe:new_inode { @[kstack()] = count(); }'
Attaching 1 probe...

@[
new_inode+1
shmem_get_inode+137
__shmem_file_setup+195
shmem_zero_setup+46
mmap_region+1937
do_mmap+956
vm_mmap_pgoff+224
do_syscall_64+46
entry_SYSCALL_64_after_hwframe+115
]: 2689570

The bench is doing this *A LOT*, which looks fishy both for the bench
itself and for the kernel doing it, but I'm not going to dig into any of
that.

>>  39.35   -0.3   39.09   perf-profile.calltrace.cycles-pp.inode_sb_list_add.new_inode.shmem_get_inode.__shmem_file_setup.shmem_zero_setup
>
> Ahh. It also does the mmap side, and the shared case ends up always
> creating a new inode.
>
> And while the test only tests *reading* and the mmap is read-only, the
> /dev/zero file descriptor was opened for writing too, for a different
> part of a test.
>
> So even though the mapping is never written to, MAYWRITE is set, and
> so the /dev/zero mapping is done as a shared memory mapping and we
> can't do it as just a private one.
>
> That's kind of stupid and looks unintentional, but whatever.
>
> End result: that benchmark ends up being at least partly (and a fairly
> noticeable part) a shmem setup benchmark, for no actual good reason.
>
> Oh well. I certainly don't mind the removal apparently then also
> helping some odd benchmark case, but I don't think this translates to
> anything real. Very random.
>
> Linus
>


-- 
Mateusz Guzik 


Re: [PATCH] powerpc: Make virt_to_pfn() a static inline

2023-08-15 Thread Linus Walleij
On Tue, Aug 15, 2023 at 9:30 AM Michael Ellerman  wrote:
> Linus Walleij  writes:

> > - return ((unsigned long)__va(pmd_val(pmd) & ~PMD_MASKED_BITS));
> > + return (const void *)((unsigned long)__va(pmd_val(pmd) & ~PMD_MASKED_BITS));
>
> This can also just be:
>
> return __va(pmd_val(pmd) & ~PMD_MASKED_BITS);
>
> I've squashed that in.

Oh you applied it, then I don't need to send revised versions, thanks Michael!

Yours,
Linus Walleij


Re: [linus:master] [locking] c8afaa1b0f: stress-ng.zero.ops_per_sec 6.3% improvement

2023-08-15 Thread Linus Torvalds
On Tue, 15 Aug 2023 at 07:12, kernel test robot  wrote:
>
> kernel test robot noticed a 6.3% improvement of stress-ng.zero.ops_per_sec on:

WTF? That's ridiculous. Why would that even test new_inode() at all?
And why would it make any difference anyway to prefetch a new inode?
The 'zero' test claims to just read /dev/zero in a loop...

[ Goes looking ]

>  39.35   -0.3   39.09   perf-profile.calltrace.cycles-pp.inode_sb_list_add.new_inode.shmem_get_inode.__shmem_file_setup.shmem_zero_setup

Ahh. It also does the mmap side, and the shared case ends up always
creating a new inode.

And while the test only tests *reading* and the mmap is read-only, the
/dev/zero file descriptor was opened for writing too, for a different
part of a test.

So even though the mapping is never written to, MAYWRITE is set, and
so the /dev/zero mapping is done as a shared memory mapping and we
can't do it as just a private one.

That's kind of stupid and looks unintentional, but whatever.

End result: that benchmark ends up being at least partly (and a fairly
noticeable part) a shmem setup benchmark, for no actual good reason.

Oh well. I certainly don't mind the removal apparently then also
helping some odd benchmark case, but I don't think this translates to
anything real. Very random.

Linus
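For anyone who wants to see the pattern in isolation, here is a minimal
userspace sketch of what the benchmark effectively does (a self-contained
illustration, not stress-ng's actual code):

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		/* Opened for writing too, like the stress-ng fd. */
		int fd = open("/dev/zero", O_RDWR);
		if (fd < 0)
			return 1;
		/* Read-only PROT, but MAP_SHARED on a writable fd keeps
		 * VM_SHARED, so the kernel goes through shmem_zero_setup()
		 * and allocates a new shmem inode for every mapping; with an
		 * O_RDONLY fd it would be downgraded to a private mapping. */
		char *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;
		printf("first byte: %d\n", p[0]);	/* always 0 */
		munmap(p, 4096);
		close(fd);
		return 0;
	}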


Re: [PATCH] powerpc: Make virt_to_pfn() a static inline

2023-08-15 Thread Michael Ellerman
Linus Walleij  writes:

> Making virt_to_pfn() a static inline taking a strongly typed
> (const void *) makes the contract of a passing a pointer of that
> type to the function explicit and exposes any misuse of the
> macro virt_to_pfn() acting polymorphic and accepting many types
> such as (void *), (uintptr_t) or (unsigned long) as arguments
> without warnings.
>
> Move the virt_to_pfn() and related functions below the
> declaration of __pa() so it compiles.
>
> For symmetry do the same with pfn_to_kaddr().
>
> As the file is included right into the linker file, we need
> to surround the functions with ifndef __ASSEMBLY__ so we
> don't cause compilation errors.
>
> The conversion moreover exposes the fact that pmd_page_vaddr()
> was returning an unsigned long rather than a const void * as
> could be expected, so all the sites defining pmd_page_vaddr()
> had to be augmented as well.
...
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 6a88bfdaa69b..a9515d3d7831 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -60,9 +60,9 @@ static inline pgprot_t pte_pgprot(pte_t pte)
>  }
>  
>  #ifndef pmd_page_vaddr
> -static inline unsigned long pmd_page_vaddr(pmd_t pmd)
> +static inline const void *pmd_page_vaddr(pmd_t pmd)
>  {
> - return ((unsigned long)__va(pmd_val(pmd) & ~PMD_MASKED_BITS));
> + return (const void *)((unsigned long)__va(pmd_val(pmd) & ~PMD_MASKED_BITS));

This can also just be:

return __va(pmd_val(pmd) & ~PMD_MASKED_BITS);

I've squashed that in.

cheers


Re: [PATCH] powerpc: Make virt_to_pfn() a static inline

2023-08-15 Thread Michael Ellerman
Christophe Leroy  writes:
> On 14/08/2023 at 14:37, Michael Ellerman wrote:
>> Linus Walleij  writes:
>>> Making virt_to_pfn() a static inline taking a strongly typed
>>> (const void *) makes the contract of a passing a pointer of that
>>> type to the function explicit and exposes any misuse of the
>>> macro virt_to_pfn() acting polymorphic and accepting many types
>>> such as (void *), (unitptr_t) or (unsigned long) as arguments
>>> without warnings.
>> ...
>>> diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
>>> index f2b6bf5687d0..9ee4b6d4a82a 100644
>>> --- a/arch/powerpc/include/asm/page.h
>>> +++ b/arch/powerpc/include/asm/page.h
>>> @@ -233,6 +224,25 @@ extern long long virt_phys_offset;
>>>   #endif
>>>   #endif
>>>   
>>> +#ifndef __ASSEMBLY__
>>> +static inline unsigned long virt_to_pfn(const void *kaddr)
>>> +{
>>> +   return __pa(kaddr) >> PAGE_SHIFT;
>>> +}
>>> +
>>> +static inline const void *pfn_to_kaddr(unsigned long pfn)
>>> +{
>>> +   return (const void *)(((unsigned long)__va(pfn)) << PAGE_SHIFT);
>> 
>> Any reason to do it this way rather than:
>> 
>> +   return __va(pfn << PAGE_SHIFT);
>
> Even cleaner:
>
>   return __va(PFN_PHYS(pfn));

PFN_PHYS() includes a cast to phys_addr_t before shifting, so it's not
entirely equivalent.

But if phys_addr_t is larger than unsigned long then that cast is
important. Which makes me wonder how/if pfn_to_kaddr() has been working
until now for CONFIG_PHYS_ADDR_T_64BIT=y.

cheers
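The difference is easy to demonstrate outside the kernel. A
self-contained sketch (the typedefs mimic a 32-bit kernel with
CONFIG_PHYS_ADDR_T_64BIT=y; names are illustrative):

	#include <stdio.h>

	#define PAGE_SHIFT 12
	typedef unsigned int ulong32;            /* stand-in for 32-bit unsigned long */
	typedef unsigned long long phys_addr_t;  /* 64-bit physical addresses */

	int main(void)
	{
		ulong32 pfn = 0x100000;                   /* the page at 4 GiB */
		ulong32 narrow = pfn << PAGE_SHIFT;       /* truncates to 0 */
		phys_addr_t wide = (phys_addr_t)pfn << PAGE_SHIFT; /* PFN_PHYS() widens first */

		printf("narrow=0x%x wide=0x%llx\n", narrow, wide);
		return 0;
	}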


[linus:master] [locking] c8afaa1b0f: stress-ng.zero.ops_per_sec 6.3% improvement

2023-08-15 Thread kernel test robot



Hello,

kernel test robot noticed a 6.3% improvement of stress-ng.zero.ops_per_sec on:


commit: c8afaa1b0f8bc93d013ab2ea6b9649958af3f1d3 ("locking: remove 
spin_lock_prefetch")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz 
(Ice Lake) with 256G memory
parameters:

nr_threads: 100%
testtime: 60s
class: memory
test: zero
cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230815/202308151426.97be5bd8-oliver.s...@intel.com

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  memory/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp8/zero/stress-ng/60s

commit:
  3feecb1b84 ("Merge tag 'char-misc-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc")
  c8afaa1b0f ("locking: remove spin_lock_prefetch")

3feecb1b848359b1 c8afaa1b0f8bc93d013ab2ea6b9
---------------- ---------------------------
       %stddev      %change        %stddev
           \            |              \
     20.98 ±  8%     +12.7%      23.65 ±  4%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
     21.05 ±  8%    +803.4%     190.14 ±196%  perf-sched.total_sch_delay.max.ms
     46437            +2.4%      47564        stress-ng.time.involuntary_context_switches
  87942414            +6.3%   93441484        stress-ng.time.minor_page_faults
  21983137            +6.3%   23357886        stress-ng.zero.ops
    366380            +6.3%     389295        stress-ng.zero.ops_per_sec
    100683            +4.1%     104861 ±  2%  proc-vmstat.nr_shmem
  60215587            +6.2%   63957836        proc-vmstat.numa_hit
  60148996            +6.2%   63889951        proc-vmstat.numa_local
  22046746            +6.2%   23421583        proc-vmstat.pgactivate
  83092777            +6.3%   88309102        proc-vmstat.pgalloc_normal
  88854159            +6.1%   94276960        proc-vmstat.pgfault
  82294936            +6.3%   87489838        proc-vmstat.pgfree
  21970411            +6.3%   23344438        proc-vmstat.unevictable_pgs_culled
  21970116            +6.3%   23344165        proc-vmstat.unevictable_pgs_mlocked
  21970115            +6.3%   23344164        proc-vmstat.unevictable_pgs_munlocked
  21970113            +6.3%   23344161        proc-vmstat.unevictable_pgs_rescued
 1.455e+10            +4.2%  1.517e+10        perf-stat.i.branch-instructions
  58358654            +5.0%   61304729        perf-stat.i.branch-misses
  1.12e+08            +5.2%  1.179e+08        perf-stat.i.cache-misses
 2.569e+08            +5.1%  2.698e+08        perf-stat.i.cache-references
      3.32            -4.4%       3.17        perf-stat.i.cpi
      2031 ±  2%      -5.0%       1930 ±  2%  perf-stat.i.cycles-between-cache-misses
 1.603e+10            +4.4%  1.674e+10        perf-stat.i.dTLB-loads
 7.449e+09            +6.1%  7.901e+09        perf-stat.i.dTLB-stores
  6.52e+10            +4.4%  6.807e+10        perf-stat.i.instructions
      0.31            +5.7%       0.33 ±  3%  perf-stat.i.ipc
    825.05            +4.8%     864.24        perf-stat.i.metric.K/sec
    598.07            +4.7%     626.06        perf-stat.i.metric.M/sec
  12910790            +4.3%   13471810        perf-stat.i.node-load-misses
   7901301 ±  2%      +5.7%    8348185        perf-stat.i.node-loads
  21890957 ±  3%      +6.9%   23410670 ±  2%  perf-stat.i.node-stores
      3.38            -4.3%       3.23        perf-stat.overall.cpi
      1964            -5.1%       1864        perf-stat.overall.cycles-between-cache-misses
      0.30            +4.5%       0.31        perf-stat.overall.ipc
 1.431e+10            +4.3%  1.493e+10        perf-stat.ps.branch-instructions
  57370846            +5.0%   60264193        perf-stat.ps.branch-misses
 1.103e+08            +5.3%   1.16e+08        perf-stat.ps.cache-misses
 2.528e+08            +5.1%  2.657e+08        perf-stat.ps.cache-references
 1.577e+10            +4.5%  1.647e+10        perf-stat.ps.dTLB-loads
  7.33e+09            +6.1%  7.776e+09        perf-stat.ps.dTLB-stores
 6.415e+10            +4.4%  6.699e+10        perf-stat.ps.instructions
  12704753            +4.4%   13259951        perf-stat.ps.node-load-misses
   7778242 ±  2%      +5.7%    8224062        perf-stat.ps.node-loads
  21539559 ±  3%      +7.0%   23044455 ±  2%  perf-stat.ps.node-stores
 4.005e+12            +5.0%  4.205e+12        perf-stat.total.instructions
     38.85            -0.8       38.07

Re: [BUG] Re: [PATCH v3 10/13] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock()

2023-08-15 Thread David Hildenbrand

On 15.08.23 08:34, Hugh Dickins wrote:
> On Mon, 14 Aug 2023, Jann Horn wrote:
>> On Wed, Jul 12, 2023 at 6:42 AM Hugh Dickins  wrote:
>>> Bring collapse_and_free_pmd() back into collapse_pte_mapped_thp().
>>> It does need mmap_read_lock(), but it does not need mmap_write_lock(),
>>> nor vma_start_write() nor i_mmap lock nor anon_vma lock.  All racing
>>> paths are relying on pte_offset_map_lock() and pmd_lock(), so use those.
>>
>> We can still have a racing userfaultfd operation at the "/* step 4:
>> remove page table */" point that installs a new PTE before the page
>> table is removed.
>>
>> To reproduce, patch a delay into the kernel like this:
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 9a6e0d507759..27cc8dfbf3a7 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -20,6 +20,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  #include 
>>  #include 
>> @@ -1617,6 +1618,11 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>>  }
>>
>>  /* step 4: remove page table */
>> +   if (strcmp(current->comm, "DELAYME") == 0) {
>> +   pr_warn("%s: BEGIN DELAY INJECTION\n", __func__);
>> +   mdelay(5000);
>> +   pr_warn("%s: END DELAY INJECTION\n", __func__);
>> +   }
>>
>>  /* Huge page lock is still held, so page table must remain empty */
>>  pml = pmd_lock(mm, pmd);
>>
>> And then run the attached reproducer against mm/mm-everything. You
>> should get this in dmesg:
>>
>> [  206.578096] BUG: Bad rss-counter state mm:0942ebea type:MM_ANONPAGES val:1
>
> Thanks a lot, Jann. I haven't thought about it at all yet; and just
> tried to reproduce, but haven't yet got the "BUG: Bad rss-counter":
> just see "Invalid argument" on the UFFDIO_COPY ioctl.
> Will investigate tomorrow.

Maybe you're missing a fixup:

https://lkml.kernel.org/r/20230810192128.1855570-1-axelrasmus...@google.com

When the src address is not page aligned, UFFDIO_COPY in mm-unstable
would erroneously fail.


--
Cheers,

David / dhildenb
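For anyone reproducing this, the call in question looks roughly like the
sketch below (uffd setup and registration elided; the struct fields are
from linux/userfaultfd.h). Without the linked fixup, mm-unstable
rejected a non-page-aligned .src with EINVAL even though only .dst must
be page aligned:

	#include <linux/userfaultfd.h>
	#include <sys/ioctl.h>

	static int uffd_copy(int uffd, void *dst, const void *src, unsigned long len)
	{
		struct uffdio_copy copy = {
			.dst = (unsigned long)dst,	/* must be page aligned */
			.src = (unsigned long)src,	/* may be unaligned */
			.len = len,
			.mode = 0,
		};
		return ioctl(uffd, UFFDIO_COPY, &copy);	/* -1/errno on failure */
	}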



Re: [BUG] Re: [PATCH v3 10/13] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock()

2023-08-15 Thread Hugh Dickins
On Mon, 14 Aug 2023, Jann Horn wrote:
> On Wed, Jul 12, 2023 at 6:42 AM Hugh Dickins  wrote:
> > Bring collapse_and_free_pmd() back into collapse_pte_mapped_thp().
> > It does need mmap_read_lock(), but it does not need mmap_write_lock(),
> > nor vma_start_write() nor i_mmap lock nor anon_vma lock.  All racing
> > paths are relying on pte_offset_map_lock() and pmd_lock(), so use those.
> 
> We can still have a racing userfaultfd operation at the "/* step 4:
> remove page table */" point that installs a new PTE before the page
> table is removed.
> 
> To reproduce, patch a delay into the kernel like this:
> 
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 9a6e0d507759..27cc8dfbf3a7 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include 
>  #include 
> @@ -1617,6 +1618,11 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
> }
> 
> /* step 4: remove page table */
> +   if (strcmp(current->comm, "DELAYME") == 0) {
> +   pr_warn("%s: BEGIN DELAY INJECTION\n", __func__);
> +   mdelay(5000);
> +   pr_warn("%s: END DELAY INJECTION\n", __func__);
> +   }
> 
> /* Huge page lock is still held, so page table must remain empty */
> pml = pmd_lock(mm, pmd);
> 
> 
> And then run the attached reproducer against mm/mm-everything. You
> should get this in dmesg:
> 
> [  206.578096] BUG: Bad rss-counter state mm:0942ebea type:MM_ANONPAGES val:1

Thanks a lot, Jann. I haven't thought about it at all yet; and just
tried to reproduce, but haven't yet got the "BUG: Bad rss-counter":
just see "Invalid argument" on the UFFDIO_COPY ioctl.
Will investigate tomorrow.

Hugh