Re: [PATCH 3/3] crypto/nx: Constify static attribute_group structs

2022-02-10 Thread Daniel Axtens
Hi Rikard,

> The only usage of these is to pass their address to
> sysfs_{create,remove}_group(), which takes pointers to const struct
> attribute_group. Make them const to allow the compiler to put them in
> read-only memory.

I checked the file. Indeed, those structs are only used in
sysfs_{create,remove}_group. So I agree that they can be constified.
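
For anyone curious why this is safe: both consumers take a
pointer-to-const. From memory (worth double-checking against
include/linux/sysfs.h), the signatures are:

    int sysfs_create_group(struct kobject *kobj,
                           const struct attribute_group *grp);
    void sysfs_remove_group(struct kobject *kobj,
                            const struct attribute_group *grp);

so a const-qualified definition like

    static const struct attribute_group example_group = {
            .name  = NULL,            /* put in device directory */
            .attrs = example_entries, /* NULL-terminated array */
    };

can be placed in .rodata by the compiler (example_group/_entries are
made-up names, of course).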

They also pass all the automated tests:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220210202805.7750-4-rikard.falkeb...@gmail.com/

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

>
> Signed-off-by: Rikard Falkeborn 
> ---
>  drivers/crypto/nx/nx-common-pseries.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/crypto/nx/nx-common-pseries.c 
> b/drivers/crypto/nx/nx-common-pseries.c
> index 4e304f6081e4..7584a34ba88c 100644
> --- a/drivers/crypto/nx/nx-common-pseries.c
> +++ b/drivers/crypto/nx/nx-common-pseries.c
> @@ -962,7 +962,7 @@ static struct attribute *nx842_sysfs_entries[] = {
>   NULL,
>  };
>  
> -static struct attribute_group nx842_attribute_group = {
> +static const struct attribute_group nx842_attribute_group = {
>   .name = NULL,   /* put in device directory */
>   .attrs = nx842_sysfs_entries,
>  };
> @@ -992,7 +992,7 @@ static struct attribute *nxcop_caps_sysfs_entries[] = {
>   NULL,
>  };
>  
> -static struct attribute_group nxcop_caps_attr_group = {
> +static const struct attribute_group nxcop_caps_attr_group = {
>   .name   =   "nx_gzip_caps",
>   .attrs  =   nxcop_caps_sysfs_entries,
>  };
> -- 
> 2.35.1


Re: [RFC PATCH 0/2] powerpc/pseries: add support for local secure storage called Platform Keystore(PKS)

2022-01-23 Thread Daniel Axtens
Hi Greg,

> Ok, this is like the 3rd or 4th different platform-specific proposal for
> this type of functionality.  I think we need to give up on
> platform-specific user/kernel apis on this (random sysfs/securityfs
> files scattered around the tree), and come up with a standard place for
> all of this.

I agree that we do have a number of platforms exposing superficially
similar functionality. Indeed, back in 2019 I had a crack at a unified
approach: [1] [2].

Looking back at it now, I am not sure it ever would have worked because
the semantics of the underlying firmware stores are quite
different. Here are the ones I know about:

 - OpenPower/PowerNV Secure Variables:
 
   * Firmware semantics:
 - flat variable space
 - variables are fixed in firmware, can neither be created nor
destroyed
 - variable names are ASCII
 - no concept of policy/attributes
 
   * Current kernel interface semantics:
 - names are case sensitive
 - directory per variable 


 - (U)EFI variables:
 
   * Firmware semantics:
 - flat variable space
 - variables can be created/destroyed but the semantics are fiddly
[3]
 - variable names are UTF-16 + UUID
 - variables have 32-bit attributes

   * efivarfs interface semantics:
 - file per variable
 - attributes are the first 4 bytes of the file
 - names are partially case-insensitive (UUID part) and partially
case-sensitive ('name' part)

   * sysfs interface semantics (as used by CONFIG_GOOGLE_SMI)
 - directory per variable
 - attributes are a separate sysfs file
 - to create a variable you write a serialised structure to
`/sys/firmware/efi/vars/new_var`, to delete a var you write
to `.../del_var`
 - names are case-sensitive including the UUID


 - PowerVM Partition Key Store Variables:
 
   * Firmware semantics:
 - _not_ a flat space, there are 3 domains ("consumers"): firmware,
bootloader and OS (not yet supported by the patch set)
 - variables can be created and destroyed but the semantics are
fiddly and fiddly in different ways to UEFI [4]
 - variable names are arbitrary byte strings: the hypervisor permits
names to contain nul and /.
 - variables have 32-bit attributes ("policy") that don't align with
UEFI attributes

   * No stable kernel interface yet

Even if we could come up with some stable kernel interface features
(e.g. decide if we want file per variable vs directory per variable), I
don't know how easy it would be to deal with the underlying semantic
differences - I think userspace would still need substantial
per-platform knowledge.
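
(To make one of the differences concrete, here is a minimal userspace
sketch - my own illustration, not from any patch set - of reading a
variable through efivarfs, where the 32-bit attributes are the first
4 bytes of the file. The variable name is just an example.)

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            char buf[4096];
            uint32_t attrs;
            ssize_t n;
            int fd = open("/sys/firmware/efi/efivars/"
                          "BootOrder-8be4df61-93ca-11d2-aa0d-00e098032b8c",
                          O_RDONLY);

            if (fd < 0)
                    return 1;
            n = read(fd, buf, sizeof(buf));
            close(fd);
            if (n < 4)
                    return 1;
            memcpy(&attrs, buf, 4);  /* attributes word comes first */
            printf("attrs=0x%x, payload=%zd bytes\n",
                   (unsigned)attrs, n - 4);
            return 0;
    }

None of the other stores can be read this way, which is rather the
point.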

Or have I misunderstood what you're asking for? (If you want them all to
live under /sys/firmware, these ones all already do...)

Kind regards,
Daniel


[1]: https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-May/190735.html
[2]: discussion continues at 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-June/191365.html
[3]: https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-June/191496.html

[4]: An unsigned variable cannot be updated, it can only be deleted
 (unless it was created with the immutable policy) and then
 re-created. A signed variable, on the other hand, can be updated
 and the only way to delete it is to submit a validly signed empty
 update.


Re: [PATCH] powerpc/64s/radix: Fix unmapping huge vmaps when CONFIG_HUGETLB_PAGE=n

2021-11-25 Thread Daniel Axtens
Hi,

> pmd_huge is defined out to false when HUGETLB_PAGE is not configured,
> but the vmap code still installs huge PMDs. This leads to errors
> encountering bad PMDs when vunmapping because it is not seen as a
> huge PTE, and the bad PMD check catches it. The end result may not
> be much more serious than some bad pmd warning messages, because the
> pmd_none_or_clear_bad() does what we wanted and clears the huge PTE
> anyway.

Huh. So vmap seems to key off arch_vmap_p?d_supported which checks for
radix and HAVE_ARCH_HUGE_VMAP.

> Fix this by checking pmd_is_leaf(), which checks for a PTE regardless
> of config options. The whole huge/large/leaf stuff is a tangled mess
> but that's kernel-wide and not something we can improve much in
> arch/powerpc code.

I guess I'm a bit late to the party here because p?d_is_leaf was added
in 2019 in commit d6eacedd1f0e ("powerpc/book3s: Use config independent
helpers for page table walk") but why wouldn't we just make pmd_huge()
not config dependent?
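
(For anyone following along, the difference is roughly this -
paraphrasing the book3s64 definitions from memory, so not verbatim:

    /* always compiled in; just checks the leaf/PTE bit */
    static inline bool pmd_is_leaf(pmd_t pmd)
    {
            return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
    }

whereas with CONFIG_HUGETLB_PAGE=n, the generic stubs define
pmd_huge() to 0, so a huge vmap PMD is never recognised as huge.)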

Also, looking at that commit, there are a few places that might still
throw warnings, e.g. find_linux_pte, find_current_mm_pte, pud_page which
seem like they might still throw warnings if they were to encounter a
huge vmap page:

struct page *pud_page(pud_t pud)
{
if (pud_is_leaf(pud)) {
VM_WARN_ON(!pud_huge(pud));

Do these functions need special treatment for huge vmappings?

Apart from those questions, the patch itself makes sense to me and I can
follow how it would fix a problem.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel


Re: [PATCH 1/2] powerpc/mce: Avoid using irq_work_queue() in realmode

2021-11-11 Thread Daniel Axtens
Hi Ganesh,

> In realmode mce handler we use irq_work_queue() to defer
> the processing of mce events, irq_work_queue() can only
> be called when translation is enabled because it touches
> memory outside RMA, hence we enable translation before
> calling irq_work_queue and disable on return, though it
> is not safe to do in realmode.
>
> To avoid this, program the decrementer and call the event
> processing functions from timer handler.
>

This is an interesting approach, and it would indeed be nice to clear up
the MCE handling a bit.

I have a few questions:

> diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
> index 331d944280b8..187810f13669 100644
> --- a/arch/powerpc/include/asm/mce.h
> +++ b/arch/powerpc/include/asm/mce.h
> @@ -235,8 +235,10 @@ extern void machine_check_print_event_info(struct 
> machine_check_event *evt,
>  unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr);
>  extern void mce_common_process_ue(struct pt_regs *regs,
> struct mce_error_info *mce_err);
> +extern void machine_check_raise_dec_intr(void);

Does this need an extern? I think that's the default...?

>  int mce_register_notifier(struct notifier_block *nb);
>  int mce_unregister_notifier(struct notifier_block *nb);
> +void mce_run_late_handlers(void);
>  #ifdef CONFIG_PPC_BOOK3S_64
>  void flush_and_reload_slb(void);
>  void flush_erat(void);
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index dc05a862e72a..f49180f8c9be 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -280,6 +280,7 @@ struct paca_struct {
>  #endif
>  #ifdef CONFIG_PPC_BOOK3S_64
>   struct mce_info *mce_info;
> + atomic_t mces_to_process;

This is in the PACA, which is supposed to be a per-cpu structure: why
does it need to be atomic_t? Isn't there only one CPU accessing it?

Does this variable provide anything new compared to mce_nest_count or
mce_queue_count + mce_ue_count? It looks like
machine_check_process_queued_event will clear a queue based on value of
mce_queue_count and machine_check_process_ue_event will clear a queue
based on mce_ue_count...

I think (absent nested interrupts which I talk about below) it should be
the case that mces_to_process == mce_queue_count + mce_ue_count but I
might be wrong?

>  #endif /* CONFIG_PPC_BOOK3S_64 */
>  } cacheline_aligned;

  
>  /*
>   * Decode and save high level MCE information into per cpu buffer which
>   * is an array of machine_check_event structure.
> @@ -135,6 +131,8 @@ void save_mce_event(struct pt_regs *regs, long handled,
>   if (mce->error_type == MCE_ERROR_TYPE_UE)
>   mce->u.ue_error.ignore_event = mce_err->ignore_event;
>  
> + atomic_inc(&local_paca->mces_to_process);
> +

Is there any chance the decrementer will fire between when you do this
atomic_inc() and when you finish adding all the information to the mce
data structure in the remainder of save_mce_event? (e.g. filling in the
tlb_errror.effective_address field)?

(Or does save_mce_event get called with interrupts masked? I find it
very hard to remember what parts of the MCE code path happen under what
circumstances!)

>   if (!addr)
>   return;
>  


> @@ -338,7 +322,7 @@ static void machine_process_ue_event(struct work_struct 
> *work)
>   * process pending MCE event from the mce event queue. This function will be
>   * called during syscall exit.

Is this comment still accurate if this patch is applied?

>   */
> -static void machine_check_process_queued_event(struct irq_work *work)
> +static void machine_check_process_queued_event(void)
>  {
>   int index;
>   struct machine_check_event *evt;
> @@ -363,6 +347,17 @@ static void machine_check_process_queued_event(struct 
> irq_work *work)
>   }
>  }
>  
> +void mce_run_late_handlers(void)
> +{
> + if (unlikely(atomic_read(&local_paca->mces_to_process))) {
> + if (ppc_md.machine_check_log_err)
> + ppc_md.machine_check_log_err();
> + machine_check_process_queued_event();
> + machine_check_ue_work();
> + atomic_dec(&local_paca->mces_to_process);
> + }
> +}

What happens if you get a nested MCE between log_err() and
process_queued_event()? If my very foggy memory of the MCE handling is
correct, we enable nested MCEs very early in the process because the
alternative is that a nested MCE will checkstop the box. So I think this
might be possible, albeit probably unlikely.

It looks like process_queued_event clears the entire MCE queue as
determined by the mce_queue_count. So, consider the following sequence
of events:

1. Take MCE 1. Save to queue, increment mce_queue_count, increment
   mces_to_process, set decrementer to fire.

2. Decrementer fires. mce_run_late_handlers is called.

3. mces_to_process = 1, so we call machine_check_log_err(), which prints
   (on pseries) the info for MCE 1.

4. Take MCE 2

Re: [PATCH v1] powerpc/watchdog: help remote CPUs to flush NMI printk output

2021-11-11 Thread Daniel Axtens
Hi,

> The printk layer at the moment does not seem to have a good way to force
> flush printk messages that are created in NMI context, except in the
> panic path.
>
> NMI-context printk messages normally get to the console with irq_work,
> but that won't help if the CPU is stuck with irqs disabled, as can be
> the case for hard lockup watchdog messages.
>
> The watchdog currently flushes the printk buffers after detecting a
> lockup on remote CPUs, but they may not have processed their NMI IPI
> yet by that stage, or they may have self-detected a lockup in which
> case they won't go via this NMI IPI path.
>
> Improve the situation by having NMI-context mark a flag if it called
> printk, and have watchdog timer interrupts check if that flag was set
> and try to flush if it was. Latency is not a big problem because we
> were already stuck for a while, just need to try to make sure the
> messages eventually make it out.

Initially I was surprised that this doesn't affect the printk code
itself, just the powerpc code...

>
> Cc: Laurent Dufour 
> Signed-off-by: Nicholas Piggin 
> ---
> This patch is actually based on top of this one which is planned to go
> upstream in rc1/2. https://marc.info/?l=linux-kernel&m=163626070312052&w=2
>
> Prior to commit 93d102f094be that is fixed by the above, we had a printk
> flush function with a different name but basically does the same job, so
> this patch can be backported, just needs some care. I'm posting it for
> review now for feedback so it's ready to go when the printk patch is
> upstream.
>
> Thanks,
> Nick
>
>  arch/powerpc/kernel/watchdog.c | 29 +++--
>  1 file changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index b6533539386b..a7b6b0691203 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -86,6 +86,7 @@ static DEFINE_PER_CPU(u64, wd_timer_tb);
>  /* SMP checker bits */
>  static unsigned long __wd_smp_lock;
>  static unsigned long __wd_reporting;
> +static unsigned long __wd_nmi_output;
>  static cpumask_t wd_smp_cpus_pending;
>  static cpumask_t wd_smp_cpus_stuck;
>  static u64 wd_smp_last_reset_tb;
> @@ -154,6 +155,18 @@ static void wd_lockup_ipi(struct pt_regs *regs)
>   else
>   dump_stack();
>  
> + /*
> +  * We printk()ed from NMI context, the contents may not get flushed
> +  * if we return to a context with interrupts disabled because
> +  * printk uses irq_work to schedule flushes of NMI output.
> +  * __wd_nmi_output says there has been output from NMI context, so
> +  * other CPUs are recruited to help flush it.
> +  *
> +  * xchg is not needed here (it could be a simple atomic store), but
> +  * it gives the memory ordering and atomicity required.
> +  */
> + xchg(&__wd_nmi_output, 1);
> +
>   /* Do not panic from here because that can recurse into NMI IPI layer */
>  }

I think, looking at this and the other site where __wd_nmi_output is
set, that this works because you set the flag only when you are done
printing from the non-panic lockup context on this CPU. I was initially
worried that you set this flag part way through printing, and then it
might get cleared by another CPU while you're still trying to print.
However, in this function it's right at the end - there's nothing else
left to do, and ...

>  DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
> @@ -386,6 +401,8 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>   print_irqtrace_events(current);
>   show_regs(regs);
>  
> + xchg(&__wd_nmi_output, 1); // see wd_lockup_ipi
> +
>   if (sysctl_hardlockup_all_cpu_backtrace)
>   trigger_allbutself_cpu_backtrace();

in this one, the only things that can happen afterwards are
 - a panic, which does its own flushing, and

- trigger_allbutself_cpu_backtrace(), which seems to just send IPIs, not
 do any printing of its own.

All of which is fine, but I wonder if we need a more descriptive
variable name or if the comment needs to specify that the flag should
only be set at certain times.
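
For what it's worth, the consumer pattern I'd expect on the flush side
is something like this sketch, with flush_fn standing in for whatever
the printk flush helper ends up being called:

    /* sketch: atomically claim the pending-output flag, so only
     * one CPU performs the flush for a given batch of output */
    if (READ_ONCE(__wd_nmi_output) && xchg(&__wd_nmi_output, 0))
            flush_fn();

which only works cleanly because, as above, the flag is set after the
output is complete.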

Kind regards,
Daniel


Re: [PATCH 0/3] KEXEC_SIG with appended signature

2021-11-07 Thread Daniel Axtens
Michal Suchánek  writes:

> On Fri, Nov 05, 2021 at 09:55:52PM +1100, Daniel Axtens wrote:
>> Michal Suchanek  writes:
>> 
>> > S390 uses appended signature for kernel but implements the check
>> > separately from module loader.
>> >
>> > Support for secure boot on powerpc with appended signature is planned -
>> > grub patches submitted upstream but not yet merged.
>> 
>> Power Non-Virtualised / OpenPower already supports secure boot via kexec
>> with signature verification via IMA. I think you have now sent a
>> follow-up series that merges some of the IMA implementation, I just
>> wanted to make sure it was clear that we actually already have support
>
> So is IMA_KEXEC and KEXEC_SIG redundant?
>
> I see some architectures have both. I also see there is a lot of overlap
> between the IMA framework and the KEXEC_SIG and MODULE_SIG.


Mimi would be much better placed than me to answer this.

The limits of my knowledge are basically that signature verification for
modules and kexec kernels can be enforced by IMA policies.

For example a secure booted powerpc kernel with module support will have
the following IMA policy set at the arch level:

"appraise func=KEXEC_KERNEL_CHECK appraise_flag=check_blacklist 
appraise_type=imasig|modsig",
(in arch/powerpc/kernel/ima_arch.c)

Module signature enforcement can be set with either IMA (policy like
"appraise func=MODULE_CHECK appraise_flag=check_blacklist 
appraise_type=imasig|modsig" )
or with CONFIG_MODULE_SIG_FORCE/module.sig_enforce=1.
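
(As an aside: with CONFIG_IMA_READ_POLICY=y, the effective policy can
be inspected at runtime, which helps when untangling which mechanism
is enforcing what, e.g.:

    # cat /sys/kernel/security/ima/policy
    appraise func=KEXEC_KERNEL_CHECK appraise_flag=check_blacklist appraise_type=imasig|modsig
)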

Sometimes this leads to arguably unexpected interactions - for example
commit fa4f3f56ccd2 ("powerpc/ima: Fix secure boot rules in ima arch
policy"), so it might be interesting to see if we can make things easier
to understand.  I suspect another amusing interaction is that if the IMA
verification is used, the signature could be in an xattr rather than an
appended signature.

Kind regards,
Daniel



Re: [PATCH 0/3] KEXEC_SIG with appended signature

2021-11-05 Thread Daniel Axtens
Michal Suchanek  writes:

> S390 uses appended signature for kernel but implements the check
> separately from module loader.
>
> Support for secure boot on powerpc with appended signature is planned -
> grub patches submitted upstream but not yet merged.

Power Non-Virtualised / OpenPower already supports secure boot via kexec
with signature verification via IMA. I think you have now sent a
follow-up series that merges some of the IMA implementation, I just
wanted to make sure it was clear that we actually already have support
for this in the kernel, it's just grub that is getting new support.

> This is an attempt at unified appended signature verification.

I am always in favour of fewer reimplementations of the same feature in
the kernel :)

Regards,
Daniel

>
> Thanks
>
> Michal
>
> Michal Suchanek (3):
>   s390/kexec_file: Don't opencode appended signature verification.
>   module: strip the signature marker in the verification function.
>   powerpc/kexec_file: Add KEXEC_SIG support.
>
>  arch/powerpc/Kconfig  | 11 +++
>  arch/powerpc/kexec/elf_64.c   | 14 +
>  arch/s390/kernel/machine_kexec_file.c | 42 +++
>  include/linux/verification.h  |  3 ++
>  kernel/module-internal.h  |  2 --
>  kernel/module.c   | 11 +++
>  kernel/module_signing.c   | 32 ++--
>  7 files changed, 59 insertions(+), 56 deletions(-)
>
> -- 
> 2.31.1


Re: [PATCH] powerpc/32e: Ignore ESR in instruction storage interrupt handler

2021-10-28 Thread Daniel Axtens
Hi Nick,

> A e5500 machine running a 32-bit kernel sometimes hangs at boot,
> seemingly going into an infinite loop of instruction storage interrupts.
> The ESR SPR has a value of 0x80 (store) when this happens, which is
> likely set by a previous store. An instruction TLB miss interrupt would
> then leave ESR unchanged, and if no PTE exists it calls directly to the
> instruction storage interrupt handler without changing ESR.

I hadn't heard of the ESR before and your patch just uses the acronym:
apparently it is the Exception Syndrome Register. In ISA 2.07 it's in
Section 7.2.13 of Book III-E. (e5500 implements 2.06 but I assume it's
roughly the same there.)

The ISA says that ESR_ST is set for the following types of exception:
 - Alignment
 - Data Storage
 - Data TLB
 - LRAT Error

So it makes sense to assume that it was not set by the instruction
storage exception. (I see you have a discussion as to how that might
occur in
https://lore.kernel.org/linuxppc-dev/1635413197.83rhc4s3fc.astr...@bobo.none/#t
and you concluded that in the comment you add that we arrive here via
the TLB handler jumping to the ISI.)

> access_error() does not cause a segfault due to a store to a read-only
> vma because is_exec is true. Most subsequent fault handling does not
> check for a write fault on a read-only vma, and might do strange things
> like create a writeable PTE or call page_mkwrite on a read only vma or
> file. It's not clear what happens here to cause the infinite faulting in
> this case, a fault handler failure or low level PTE or TLB handling.

I'm just trying to unpick this:

 - INSTRUCTION_STORAGE_EXCEPTION ends up branching to do_page_fault ->
   __do_page_fault -> ___do_page_fault

 - ___do_page_fault has is_exec = true because the exception is a
   instruction storage interrupt

 - ___do_page_fault also decides that is_write = true because the
   ESR_DST bit is set.

This puts us in a bad position because we're taking some information
from the current exception and some information from a previous
exception, so it makes sense that things would go wrong from here!

> In any case this can be fixed by having the instruction storage
> interrupt zero regs->dsisr rather than storing the ESR value to it.

Is it valid to just ignore the contents of ESR for an Instruction
Storage exception?

The 2.07 ISA at least says that the following ESR bits can be set by an
Instruction Storage exception:
 - Byte Ordering
 - TLB Ineligible
 - Page Table

It sounds from

> + * In any case, do_page_fault for BOOK3E does not use ESR and always expects
> + * dsisr to be 0. ESR_DST from a prior store in particular would confuse 
> fault
> + * handling.

that we don't actually ever check any of those three bits in the
do_page_fault path. reg_booke.h defines ESR_BO but the definition is
never used, and there is no ESR_TLBI or ESR_PT constant defined.

So much as it seems a bit dodgy to me, I guess it is right to say that
we're not changing behaviour by setting dsisr to 0 and just ignoring
those 3 bits? Should we document that in the comment?

I probably would have masked ESR_DST but I guess extra bit-twiddling in
an interrupt path when zeroing would do is probably not worth it for
theoretical correctness?
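
(That is, something like this hypothetical, untested alternative:

    mfspr   r5,SPRN_ESR      /* grab the possibly-stale ESR... */
    rlwinm  r5,r5,0,~ESR_DST /* ...and clear just the store bit */
    stw     r5,_ESR(r11)

versus the simple li r5,0 the patch uses.)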

Kind regards,
Daniel

> Link: 
> https://lore.kernel.org/linuxppc-dev/1635306738.0z8wt7619v.astr...@bobo.none/
> Fixes: a01a3f2ddbcd ("powerpc: remove arguments from fault handler functions")
> Reported-by: Jacques de Laval 
> Tested-by: Jacques de Laval 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kernel/head_booke.h | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/head_booke.h 
> b/arch/powerpc/kernel/head_booke.h
> index e5503420b6c6..ef8d1b1c234e 100644
> --- a/arch/powerpc/kernel/head_booke.h
> +++ b/arch/powerpc/kernel/head_booke.h
> @@ -465,12 +465,21 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
>   bl  do_page_fault;\
>   b   interrupt_return
>  
> +/*
> + * Instruction TLB Error interrupt handlers may call InstructionStorage
> + * directly without clearing ESR, so the ESR at this point may be left over
> + * from a prior interrupt.
> + *
> + * In any case, do_page_fault for BOOK3E does not use ESR and always expects
> + * dsisr to be 0. ESR_DST from a prior store in particular would confuse 
> fault
> + * handling.
> + */
>  #define INSTRUCTION_STORAGE_EXCEPTION
>   \
>   START_EXCEPTION(InstructionStorage)   \
> - NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE); \
> - mfspr   r5,SPRN_ESR;/* Grab the ESR and save it */\
> + NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE); \
> + li  r5,0;   /* Store 0 in regs->esr (dsisr) */\
>   stw r5,_ESR(r11); \
> - 

Re: [PATCH v2] powerpc/pseries/mobility: ignore ibm,platform-facilities updates

2021-10-22 Thread Daniel Axtens
Nathan Lynch  writes:

> Daniel Axtens  writes:
>>> On VMs with NX encryption, compression, and/or RNG offload, these
>>> capabilities are described by nodes in the ibm,platform-facilities device
>>> tree hierarchy:
>>>
>>>   $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
>>>   /sys/firmware/devicetree/base/ibm,platform-facilities/
>>>   ├── ibm,compression-v1
>>>   ├── ibm,random-v1
>>>   └── ibm,sym-encryption-v1
>>>
>>>   3 directories
>>>
>>> The acceleration functions that these nodes describe are not disrupted by
>>> live migration, not even temporarily.
>>>
>>> But the post-migration ibm,update-nodes sequence firmware always sends
>>> "delete" messages for this hierarchy, followed by an "add" directive to
>>> reconstruct it via ibm,configure-connector (log with debugging statements
>>> enabled in mobility.c):
>>>
>>>   mobility: removing node /ibm,platform-facilities/ibm,random-v1:4294967285
>>>   mobility: removing node 
>>> /ibm,platform-facilities/ibm,compression-v1:4294967284
>>>   mobility: removing node 
>>> /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283
>>>   mobility: removing node /ibm,platform-facilities:4294967286
>>>   ...
>>>   mobility: added node /ibm,platform-facilities:4294967286
>>>
>>> Note we receive a single "add" message for the entire hierarchy, and what
>>> we receive from the ibm,configure-connector sequence is the top-level
>>> platform-facilities node along with its three children. The debug message
>>> simply reports the parent node and not the whole subtree.
>>
>> If I understand correctly, (and again, this is not my area at all!) we
>> still have to go out to the firmware and call the
>> ibm,configure-connector sequence in order to figure out that the node
>> we're supposed to add is the ibm,platform-facilites node, right? We
>> can't save enough information at delete time to avoid the trip out to
>> firmware?
>
> That is right... but maybe I don't understand your angle here. Unsure
> what avoiding the configure-connector sequence for the nodes would buy
> us.

It's not meant to be a tricky question, so the simple answer is probably
the right one. Just wondering if there was a marginal efficiency gain -
although I believe it's not really a hot path anyway.

>
>
>>> Until that can be realized we have a confirmed use-after-free and the
>>> possibility of memory corruption. So add a limited workaround that
>>> discriminates on the node type, ignoring adds and removes. This should be
>>> amenable to backporting in the meantime.
>>
>> Yeah it's an unpleasant situation to find ourselves in. It's a bit icky
>> but as I think you said in a previous email, at least this isn't worse:
>> in the common case it should now succeed and if properties change
>> significantly it will still fail.
>>
>> My one question (from more of a security point of view) is:
>>  1) Say you start using the facilities with a particular set of
>> parameters.
>>
>>  2) Say you then get migrated and the parameters change.
>>
>>  3) If you keep using the platform facilities as if the original
>> properties are still valid, can this cause any Interesting,
>> unexpected or otherwise Bad consequences? Are we going to end up
>> (for example) scribbling over random memory somehow?
>
> If drivers are safely handling errors from H_COP_OP etc, then no. (I
> know, this looks like a Well That Would Be a Driver Bug dismissal, but
> that's not my attitude.) And again this is a case where the change
> cannot make things worse.
>
> In the current design of the pseries LPM implementation, user space and
> other normal system activity resume as soon as we return from the
> stop_machine() call which suspends the partition, executing concurrently
> with any device tree updates. So even if we had code in place to
> correctly resolve the DT changes and the drivers were able to respond to
> the changes, there would still be a window of exposure to the kind of
> problem you describe: the changed characteristics, if any, of the
> destination obtain as soon as execution resumes, regardless of when the
> OS initiates the update-nodes sequence.
>
> The way out of that mess is to use the Linux suspend framework, or
> otherwise prevent user space from executing until the destination
> system's characteristics have been appropriately propagated out to the
> n

Re: [PATCH v2] powerpc/pseries/mobility: ignore ibm,platform-facilities updates

2021-10-20 Thread Daniel Axtens
Hi Nathan,

Thanks for the detailed explanation.

I've not really worked with the partition migration code before, but I
was able to follow your logic.

> On VMs with NX encryption, compression, and/or RNG offload, these
> capabilities are described by nodes in the ibm,platform-facilities device
> tree hierarchy:
>
>   $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
>   /sys/firmware/devicetree/base/ibm,platform-facilities/
>   ├── ibm,compression-v1
>   ├── ibm,random-v1
>   └── ibm,sym-encryption-v1
>
>   3 directories
>
> The acceleration functions that these nodes describe are not disrupted by
> live migration, not even temporarily.
>
> But the post-migration ibm,update-nodes sequence firmware always sends
> "delete" messages for this hierarchy, followed by an "add" directive to
> reconstruct it via ibm,configure-connector (log with debugging statements
> enabled in mobility.c):
>
>   mobility: removing node /ibm,platform-facilities/ibm,random-v1:4294967285
>   mobility: removing node 
> /ibm,platform-facilities/ibm,compression-v1:4294967284
>   mobility: removing node 
> /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283
>   mobility: removing node /ibm,platform-facilities:4294967286
>   ...
>   mobility: added node /ibm,platform-facilities:4294967286
>
> Note we receive a single "add" message for the entire hierarchy, and what
> we receive from the ibm,configure-connector sequence is the top-level
> platform-facilities node along with its three children. The debug message
> simply reports the parent node and not the whole subtree.

If I understand correctly, (and again, this is not my area at all!) we
still have to go out to the firmware and call the
ibm,configure-connector sequence in order to figure out that the node
we're supposed to add is the ibm,platform-facilites node, right? We
can't save enough information at delete time to avoid the trip out to
firmware?

> Also, significantly, the nodes added are almost completely equivalent to
> the ones removed; even phandles are unchanged. ibm,shared-interrupt-pool in
> the leaf nodes is the only property I've observed to differ, and Linux does
> not use that. So in practice, the sum of update messages Linux receives for
> this hierarchy is equivalent to minor property updates.
>
> We succeed in removing the original hierarchy from the device tree. But the
> vio bus code is ignorant of this, and does not unbind or relinquish its
> references. The leaf nodes, still reachable through sysfs, of course still
> refer to the now-freed ibm,platform-facilities parent node, which makes
> use-after-free possible:
>
>   refcount_t: addition on 0; use-after-free.
>   WARNING: CPU: 3 PID: 1706 at lib/refcount.c:25 
> refcount_warn_saturate+0x164/0x1f0
>   refcount_warn_saturate+0x160/0x1f0 (unreliable)
>   kobject_get+0xf0/0x100
>   of_node_get+0x30/0x50
>   of_get_parent+0x50/0xb0
>   of_fwnode_get_parent+0x54/0x90
>   fwnode_count_parents+0x50/0x150
>   fwnode_full_name_string+0x30/0x110
>   device_node_string+0x49c/0x790
>   vsnprintf+0x1c0/0x4c0
>   sprintf+0x44/0x60
>   devspec_show+0x34/0x50
>   dev_attr_show+0x40/0xa0
>   sysfs_kf_seq_show+0xbc/0x200
>   kernfs_seq_show+0x44/0x60
>   seq_read_iter+0x2a4/0x740
>   kernfs_fop_read_iter+0x254/0x2e0
>   new_sync_read+0x120/0x190
>   vfs_read+0x1d0/0x240
>
> Moreover, the "new" replacement subtree is not correctly added to the
> device tree, resulting in ibm,platform-facilities parent node without the
> appropriate leaf nodes, and broken symlinks in the sysfs device hierarchy:
>
>   $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
>   /sys/firmware/devicetree/base/ibm,platform-facilities/
>
>   0 directories
>
>   $ cd /sys/devices/vio ; find . -xtype l -exec file {} +
>   ./ibm,sym-encryption-v1/of_node: broken symbolic link to
> 
> ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,sym-encryption-v1
>   ./ibm,random-v1/of_node: broken symbolic link to
> ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,random-v1
>   ./ibm,compression-v1/of_node:broken symbolic link to
> 
> ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,compression-v1
>
> This is because add_dt_node() -> dlpar_attach_node() attaches only the
> parent node returned from configure-connector, ignoring any children. This
> should be corrected for the general case, but fixing that won't help with
> the stale OF node references, which is the more urgent problem.
>
> One way to address that would be to make the drivers respond to node
> removal notifications, so that node references can be dropped
> appropriately. But this would likely force the drivers to disrupt active
> clients for no useful purpose: equivalent nodes are immediately re-added.
> And recall that the acceleration capabilities described by the nodes remain
> available throughout the whole process.
>
> The solution I believe to be robust for this situation is to convert
> remove+add of a node 

[PATCH 2/2] powerpc/eeh: Use a goto for recovery failures

2021-10-15 Thread Daniel Axtens
From: Oliver O'Halloran 

The EEH recovery logic in eeh_handle_normal_event() has some pretty strange
flow control. If we remove all the actual recovery logic we're left with
the following skeleton:

if (result != PCI_ERS_RESULT_DISCONNECT) {
...
}

if (result != PCI_ERS_RESULT_DISCONNECT) {
...
}

if (result == PCI_ERS_RESULT_NONE) {
...
}

if (result == PCI_ERS_RESULT_CAN_RECOVER) {
...
}

if (result == PCI_ERS_RESULT_CAN_RECOVER) {
...
}

if (result == PCI_ERS_RESULT_NEED_RESET) {
...
}

if ((result == PCI_ERS_RESULT_RECOVERED) ||
(result == PCI_ERS_RESULT_NONE)) {
...
goto out;
}

/*
 * unsuccessful recovery / PCI_ERS_RESULT_DISCONECTED
 * handling is here.
 */
...

out:
...

Most of the "if () { ... }" blocks above change "result" to
PCI_ERS_RESULT_DISCONNECTED if an error occurs in that recovery step. This
makes the control flow a bit confusing since it breaks the early-exit
pattern that is generally used in Linux. In any case we end up handling the
error in the final else block so why not just jump there directly? Doing so
also allows us to de-indent a bunch of code.

No functional changes.

Cc: Mahesh Salgaonkar 
Signed-off-by: Oliver O'Halloran 
[dja: rebase on top of linux-next + my preceding refactor,
  move clearing the EEH_DEV_NO_HANDLER bit above the first goto so that
  it is always clear in the error handler code as it was before.]
Signed-off-by: Daniel Axtens 
---
 arch/powerpc/kernel/eeh_driver.c | 93 
 1 file changed, 45 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index cb3ac555c944..b8d5805c446a 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -905,18 +905,19 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
}
 #endif /* CONFIG_STACKTRACE */
 
+   eeh_for_each_pe(pe, tmp_pe)
+   eeh_pe_for_each_dev(tmp_pe, edev, tmp)
+   edev->mode &= ~EEH_DEV_NO_HANDLER;
+
eeh_pe_update_time_stamp(pe);
pe->freeze_count++;
if (pe->freeze_count > eeh_max_freezes) {
pr_err("EEH: PHB#%x-PE#%x has failed %d times in the last hour 
and has been permanently disabled.\n",
   pe->phb->global_number, pe->addr,
   pe->freeze_count);
-   result = PCI_ERS_RESULT_DISCONNECT;
-   }
 
-   eeh_for_each_pe(pe, tmp_pe)
-   eeh_pe_for_each_dev(tmp_pe, edev, tmp)
-   edev->mode &= ~EEH_DEV_NO_HANDLER;
+   goto recover_failed;
+   }
 
/* Walk the various device drivers attached to this slot through
 * a reset sequence, giving each an opportunity to do what it needs
@@ -928,39 +929,38 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 * the error. Override the result if necessary to have partially
 * hotplug for this case.
 */
-   if (result != PCI_ERS_RESULT_DISCONNECT) {
-   pr_warn("EEH: This PCI device has failed %d times in the last 
hour and will be permanently disabled after %d failures.\n",
-   pe->freeze_count, eeh_max_freezes);
-   pr_info("EEH: Notify device drivers to shutdown\n");
-   eeh_set_channel_state(pe, pci_channel_io_frozen);
-   eeh_set_irq_state(pe, false);
-   eeh_pe_report("error_detected(IO frozen)", pe,
- eeh_report_error, &result);
-   if ((pe->type & EEH_PE_PHB) &&
-   result != PCI_ERS_RESULT_NONE &&
-   result != PCI_ERS_RESULT_NEED_RESET)
-   result = PCI_ERS_RESULT_NEED_RESET;
-   }
+   pr_warn("EEH: This PCI device has failed %d times in the last hour and 
will be permanently disabled after %d failures.\n",
+   pe->freeze_count, eeh_max_freezes);
+   pr_info("EEH: Notify device drivers to shutdown\n");
+   eeh_set_channel_state(pe, pci_channel_io_frozen);
+   eeh_set_irq_state(pe, false);
+   eeh_pe_report("error_detected(IO frozen)", pe,
+ eeh_report_error, &result);
+   if (result == PCI_ERS_RESULT_DISCONNECT)
+   goto recover_failed;
+
+   /*
+* Error logged on a PHB are always fences which need a full
+* PHB reset to clear so force that to happen.
+*/
+   if ((pe->type & EEH_PE_PHB) && result != PCI_ERS_RESULT_NONE)
+   resul

[PATCH 1/2] eeh: Small refactor of eeh_handle_normal_event

2021-10-15 Thread Daniel Axtens
The control flow of eeh_handle_normal_event is a bit tricky.

Break out one of the error handling paths - rather than be in
an else block, we'll make it part of the regular body of the
function and put a 'goto out;' in the true limb of the if.

Signed-off-by: Daniel Axtens 

---

This doesn't really make things that much simpler to comprehend
by itself but it makes the next patch a bit cleaner.
---
 arch/powerpc/kernel/eeh_driver.c | 69 
 1 file changed, 35 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 3eff6a4888e7..cb3ac555c944 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -1054,45 +1054,46 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
}
 
pr_info("EEH: Recovery successful.\n");
-   } else  {
-   /*
-* About 90% of all real-life EEH failures in the field
-* are due to poorly seated PCI cards. Only 10% or so are
-* due to actual, failed cards.
-*/
-   pr_err("EEH: Unable to recover from failure from 
PHB#%x-PE#%x.\n"
-  "Please try reseating or replacing it\n",
-   pe->phb->global_number, pe->addr);
+   goto out;
+   }
 
-   eeh_slot_error_detail(pe, EEH_LOG_PERM);
+   /*
+* About 90% of all real-life EEH failures in the field
+* are due to poorly seated PCI cards. Only 10% or so are
+* due to actual, failed cards.
+*/
+   pr_err("EEH: Unable to recover from failure from PHB#%x-PE#%x.\n"
+   "Please try reseating or replacing it\n",
+   pe->phb->global_number, pe->addr);
 
-   /* Notify all devices that they're about to go down. */
-   eeh_set_channel_state(pe, pci_channel_io_perm_failure);
-   eeh_set_irq_state(pe, false);
-   eeh_pe_report("error_detected(permanent failure)", pe,
- eeh_report_failure, NULL);
+   eeh_slot_error_detail(pe, EEH_LOG_PERM);
 
-   /* Mark the PE to be removed permanently */
-   eeh_pe_state_mark(pe, EEH_PE_REMOVED);
+   /* Notify all devices that they're about to go down. */
+   eeh_set_channel_state(pe, pci_channel_io_perm_failure);
+   eeh_set_irq_state(pe, false);
+   eeh_pe_report("error_detected(permanent failure)", pe,
+ eeh_report_failure, NULL);
 
-   /*
-* Shut down the device drivers for good. We mark
-* all removed devices correctly to avoid access
-* the their PCI config any more.
-*/
-   if (pe->type & EEH_PE_VF) {
-   eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
-   eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
-   } else {
-   eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
-   eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
+   /* Mark the PE to be removed permanently */
+   eeh_pe_state_mark(pe, EEH_PE_REMOVED);
 
-   pci_lock_rescan_remove();
-   pci_hp_remove_devices(bus);
-   pci_unlock_rescan_remove();
-   /* The passed PE should no longer be used */
-   return;
-   }
+   /*
+* Shut down the device drivers for good. We mark
+* all removed devices correctly to avoid access
+* the their PCI config any more.
+*/
+   if (pe->type & EEH_PE_VF) {
+   eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
+   eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
+   } else {
+   eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
+   eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
+
+   pci_lock_rescan_remove();
+   pci_hp_remove_devices(bus);
+   pci_unlock_rescan_remove();
+   /* The passed PE should no longer be used */
+   return;
}
 
 out:
-- 
2.25.1



Re: [PATCH v2 03/13] powerpc: Remove func_descr_t

2021-10-14 Thread Daniel Axtens
Christophe Leroy  writes:

> 'func_descr_t' is redundant with 'struct ppc64_opd_entry'

So, if I understand the overall direction of the series, you're
consolidating powerpc around one single type for function descriptors,
and then you're creating a generic typedef so that generic code can
always do ((func_desc_t)x)->addr to get the address of a function out of
a function descriptor regardless of arch. (And regardless of whether the
arch uses function descriptors or not.)
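
(That is, if I've read the series right, generic code ends up with
roughly this pattern - my sketch, reusing the names from the cover
letter:

    static unsigned long generic_func_addr(void *func)
    {
    #ifdef HAVE_FUNCTION_DESCRIPTORS
            /* func really points at a descriptor */
            return ((func_desc_t *)func)->addr;
    #else
            return (unsigned long)func;
    #endif
    }
)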

So:

 - why pick ppc64_opd_entry over func_descr_t?

 - Why not make our struct just called func_desc_t - why have a
   ppc64_opd_entry type or a func_descr_t typedef?

 - Should this patch wait until after you've made the generic
   func_desc_t change and move directly to that new interface? (rather
   than move from func_descr_t -> ppc64_opd_entry -> ...) Or is there a
   particular reason arch specific code should use an arch-specific
   struct or named type?

> Remove it.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/code-patching.h | 2 +-
>  arch/powerpc/include/asm/types.h | 6 --
>  arch/powerpc/kernel/signal_64.c  | 8 
>  3 files changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/code-patching.h 
> b/arch/powerpc/include/asm/code-patching.h
> index 4ba834599c4d..f3445188d319 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -110,7 +110,7 @@ static inline unsigned long ppc_function_entry(void *func)
>* function's descriptor. The first entry in the descriptor is the
>* address of the function text.
>*/
> - return ((func_descr_t *)func)->entry;
> + return ((struct ppc64_opd_entry *)func)->addr;
>  #else
>   return (unsigned long)func;
>  #endif
> diff --git a/arch/powerpc/include/asm/types.h 
> b/arch/powerpc/include/asm/types.h
> index f1630c553efe..97da77bc48c9 100644
> --- a/arch/powerpc/include/asm/types.h
> +++ b/arch/powerpc/include/asm/types.h
> @@ -23,12 +23,6 @@
>  
>  typedef __vector128 vector128;
>  
> -typedef struct {
> - unsigned long entry;
> - unsigned long toc;
> - unsigned long env;
> -} func_descr_t;

I was a little concerned about going from a 3-element struct to a
2-element struct (as ppc64_opd_entry doesn't have an element for env) -
but we don't seem to take the sizeof this anywhere, nor do we use env
anywhere, nor do we do funky macro stuff with it in the signal handling
code that might implictly use the 3rd element, so I guess this will
work. Still, func_descr_t seems to describe the underlying ABI better
than ppc64_opd_entry...

>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* _ASM_POWERPC_TYPES_H */
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 1831bba0582e..63ddbe7b108c 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -933,11 +933,11 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t 
> *set,
>* descriptor is the entry address of signal and the second
>* entry is the TOC value we need to use.
>*/
> - func_descr_t __user *funct_desc_ptr =
> - (func_descr_t __user *) ksig->ka.sa.sa_handler;
> + struct ppc64_opd_entry __user *funct_desc_ptr =
> + (struct ppc64_opd_entry __user *)ksig->ka.sa.sa_handler;
>  
> - err |= get_user(regs->ctr, &funct_desc_ptr->entry);
> - err |= get_user(regs->gpr[2], &funct_desc_ptr->toc);
> + err |= get_user(regs->ctr, &funct_desc_ptr->addr);
> + err |= get_user(regs->gpr[2], &funct_desc_ptr->r2);

Likewise, r2 seems like a worse name than toc. I guess we could clean
that up another time though.

Kind regards,
Daniel

>   }
>  
>   /* enter the signal handler in native-endian mode */
> -- 
> 2.31.1


Re: [PATCH v2 02/13] powerpc: Rename 'funcaddr' to 'addr' in 'struct ppc64_opd_entry'

2021-10-14 Thread Daniel Axtens
Christophe Leroy  writes:

> There are three architectures with function descriptors, try to
> have common names for the address they contain in order to
> refactor some functions into generic functions later.
>
> powerpc has 'funcaddr'
> ia64 has 'ip'
> parisc has 'addr'
>
> Vote for 'addr' and update 'struct ppc64_opd_entry' accordingly.

I would have picked 'funcaddr', but at least 'addr' is better than 'ip'!
And I agree that consistency, and then making things generic is worthwhile.

I grepped the latest powerpc/next for uses of 'funcaddr'. There were 5,
your patch changes all 5.

The series passes build tests and this patch has no checkpatch or other
style concerns.

On that basis:
Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> Reviewed-by: Kees Cook 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/elf.h  | 2 +-
>  arch/powerpc/include/asm/sections.h | 2 +-
>  arch/powerpc/kernel/module_64.c | 6 +++---
>  3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h
> index a4406714c060..bb0f278f9ed4 100644
> --- a/arch/powerpc/include/asm/elf.h
> +++ b/arch/powerpc/include/asm/elf.h
> @@ -178,7 +178,7 @@ void relocate(unsigned long final_address);
>  
>  /* There's actually a third entry here, but it's unused */
>  struct ppc64_opd_entry {
> - unsigned long funcaddr;
> + unsigned long addr;
>   unsigned long r2;
>  };
>  
> diff --git a/arch/powerpc/include/asm/sections.h 
> b/arch/powerpc/include/asm/sections.h
> index 6e4af4492a14..32e7035863ac 100644
> --- a/arch/powerpc/include/asm/sections.h
> +++ b/arch/powerpc/include/asm/sections.h
> @@ -77,7 +77,7 @@ static inline void *dereference_function_descriptor(void 
> *ptr)
>   struct ppc64_opd_entry *desc = ptr;
>   void *p;
>  
> - if (!get_kernel_nofault(p, (void *)&desc->funcaddr))
> + if (!get_kernel_nofault(p, (void *)&desc->addr))
>   ptr = p;
>   return ptr;
>  }
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index 6baa676e7cb6..82908c9be627 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -72,11 +72,11 @@ static func_desc_t func_desc(unsigned long addr)
>  }
>  static unsigned long func_addr(unsigned long addr)
>  {
> - return func_desc(addr).funcaddr;
> + return func_desc(addr).addr;
>  }
>  static unsigned long stub_func_addr(func_desc_t func)
>  {
> - return func.funcaddr;
> + return func.addr;
>  }
>  static unsigned int local_entry_offset(const Elf64_Sym *sym)
>  {
> @@ -187,7 +187,7 @@ static int relacmp(const void *_x, const void *_y)
>  static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
>   const Elf64_Shdr *sechdrs)
>  {
> - /* One extra reloc so it's always 0-funcaddr terminated */
> + /* One extra reloc so it's always 0-addr terminated */
>   unsigned long relocs = 1;
>   unsigned i;
>  
> -- 
> 2.31.1


Re: [PATCH v2 00/13] Fix LKDTM for PPC64/IA64/PARISC

2021-10-14 Thread Daniel Axtens
Christophe Leroy  writes:

> PPC64/IA64/PARISC have function descriptors. LKDTM doesn't work
> on those three architectures because LKDTM messes up function
> descriptors with functions.

Just to nitpick, it's powerpc 64-bit using the ELFv1 ABI. [1]

The ELFv2 ABI [2] doesn't use function descriptors. (ELFv2 is used
primarily for ppc64le, but some people like musl support it for BE as
well.)

This doesn't affect the correctness or desirability of your changes, it
was just bugging me when I was reading the commit messages :-)

Kind regards,
Daniel

[1] See e.g. https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html
[2] 
https://openpowerfoundation.org/wp-content/uploads/2016/03/ABI64BitOpenPOWERv1.1_16July2015_pub4.pdf


> This series does some cleanup in the three architectures and
> refactors function descriptors so that it can then easily use it
> in a generic way in LKDTM.
>
> Patch 8 is not absolutely necessary but it is a good trivial cleanup.
>
> Changes in v2:
> - Addressed received comments
> - Moved dereference_[kernel]_function_descriptor() out of line
> - Added patches to remove func_descr_t and func_desc_t in powerpc
> - Using func_desc_t instead of funct_descr_t
> - Renamed HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR to HAVE_FUNCTION_DESCRIPTORS
> - Added a new lkdtm test to check protection of function descriptors
>
> Christophe Leroy (13):
>   powerpc: Move 'struct ppc64_opd_entry' back into asm/elf.h
>   powerpc: Rename 'funcaddr' to 'addr' in 'struct ppc64_opd_entry'
>   powerpc: Remove func_descr_t
>   powerpc: Prepare func_desc_t for refactorisation
>   ia64: Rename 'ip' to 'addr' in 'struct fdesc'
>   asm-generic: Use HAVE_FUNCTION_DESCRIPTORS to define associated stubs
>   asm-generic: Define 'func_desc_t' to commonly describe function
> descriptors
>   asm-generic: Refactor dereference_[kernel]_function_descriptor()
>   lkdtm: Force do_nothing() out of line
>   lkdtm: Really write into kernel text in WRITE_KERN
>   lkdtm: Fix lkdtm_EXEC_RODATA()
>   lkdtm: Fix execute_[user]_location()
>   lkdtm: Add a test for function descriptors protection
>
>  arch/ia64/include/asm/elf.h  |  2 +-
>  arch/ia64/include/asm/sections.h | 25 ++---
>  arch/ia64/kernel/module.c|  6 +--
>  arch/parisc/include/asm/sections.h   | 17 +++---
>  arch/parisc/kernel/process.c | 21 
>  arch/powerpc/include/asm/code-patching.h |  2 +-
>  arch/powerpc/include/asm/elf.h   |  6 +++
>  arch/powerpc/include/asm/sections.h  | 30 ++-
>  arch/powerpc/include/asm/types.h |  6 ---
>  arch/powerpc/include/uapi/asm/elf.h  |  8 ---
>  arch/powerpc/kernel/module_64.c  | 38 +
>  arch/powerpc/kernel/signal_64.c  |  8 +--
>  drivers/misc/lkdtm/core.c|  1 +
>  drivers/misc/lkdtm/lkdtm.h   |  1 +
>  drivers/misc/lkdtm/perms.c   | 68 
>  include/asm-generic/sections.h   | 13 -
>  include/linux/kallsyms.h |  2 +-
>  kernel/extable.c | 23 +++-
>  18 files changed, 138 insertions(+), 139 deletions(-)
>
> -- 
> 2.31.1


Re: [PATCH v2 01/13] powerpc: Move 'struct ppc64_opd_entry' back into asm/elf.h

2021-10-14 Thread Daniel Axtens
Hi Christophe,

> 'struct ppc64_opd_entry' doesn't belong to uapi/asm/elf.h
>
> It was initially in module_64.c and commit 2d291e902791 ("Fix compile
> failure with non modular builds") moved it into asm/elf.h
>
> But it was by mistake added outside of __KERNEL__ section,
> therefore commit c3617f72036c ("UAPI: (Scripted) Disintegrate
> arch/powerpc/include/asm") moved it to uapi/asm/elf.h

As Michael said on v1, I'm a little nervous about moving it out of uAPI
after so long, although I do take the points of Arnd and Kees that we're
not breaking compiled binaries, nor should people be using this struct
to begin with...

I've cc:ed the linux-api@ list.

Kind regards,
Daniel

> Move it back into asm/elf.h, this brings it back in line with
> IA64 and PARISC architectures.
>
> Fixes: 2d291e902791 ("Fix compile failure with non modular builds")
> Reviewed-by: Kees Cook 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/elf.h  | 6 ++
>  arch/powerpc/include/uapi/asm/elf.h | 8 
>  2 files changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h
> index b8425e3cfd81..a4406714c060 100644
> --- a/arch/powerpc/include/asm/elf.h
> +++ b/arch/powerpc/include/asm/elf.h
> @@ -176,4 +176,10 @@ do { 
> \
>  /* Relocate the kernel image to @final_address */
>  void relocate(unsigned long final_address);
>  
> +/* There's actually a third entry here, but it's unused */
> +struct ppc64_opd_entry {
> + unsigned long funcaddr;
> + unsigned long r2;
> +};
> +
>  #endif /* _ASM_POWERPC_ELF_H */
> diff --git a/arch/powerpc/include/uapi/asm/elf.h 
> b/arch/powerpc/include/uapi/asm/elf.h
> index 860c59291bfc..308857123a08 100644
> --- a/arch/powerpc/include/uapi/asm/elf.h
> +++ b/arch/powerpc/include/uapi/asm/elf.h
> @@ -289,12 +289,4 @@ typedef elf_fpreg_t elf_vsrreghalf_t32[ELF_NVSRHALFREG];
>  /* Keep this the last entry.  */
>  #define R_PPC64_NUM  253
>  
> -/* There's actually a third entry here, but it's unused */
> -struct ppc64_opd_entry
> -{
> - unsigned long funcaddr;
> - unsigned long r2;
> -};
> -
> -
>  #endif /* _UAPI_ASM_POWERPC_ELF_H */
> -- 
> 2.31.1


Re: [PATCH] powerpc/eeh:Fix docstrings in eeh

2021-10-07 Thread Daniel Axtens
Hi Kai,

Thank you for your patch! I have 3 very minor tweaks and I am otherwise
very happy with it.

Firstly, in your commit name, there should be a space between
"powerpc/eeh:" and "Fix docstrings". You might also want to say "in
eeh.c" rather than "in eeh" because there is eeh code in a number of
other files too.

> We fix the following warnings when building kernel with W=1:
> arch/powerpc/kernel/eeh.c:598: warning: Function parameter or member 
> 'function' not described in 'eeh_pci_enable'
> arch/powerpc/kernel/eeh.c:774: warning: Function parameter or member 'edev' 
> not described in 'eeh_set_dev_freset'
> arch/powerpc/kernel/eeh.c:774: warning: expecting prototype for 
> eeh_set_pe_freset(). Prototype was for eeh_set_dev_freset() instead
> arch/powerpc/kernel/eeh.c:814: warning: Function parameter or member 
> 'include_passed' not described in 'eeh_pe_reset_full'
> arch/powerpc/kernel/eeh.c:944: warning: Function parameter or member 'ops' 
> not described in 'eeh_init'
> arch/powerpc/kernel/eeh.c:1451: warning: Function parameter or member 
> 'include_passed' not described in 'eeh_pe_reset'
> arch/powerpc/kernel/eeh.c:1526: warning: Function parameter or member 'func' 
> not described in 'eeh_pe_inject_err'
> arch/powerpc/kernel/eeh.c:1526: warning: Excess function parameter 'function' 
> described in 'eeh_pe_inject_err'
>
> Signed-off-by: Kai Song 
> ---
>  arch/powerpc/kernel/eeh.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index e9b597ed423c..57a6868a41ab 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -589,6 +589,7 @@ EXPORT_SYMBOL(eeh_check_failure);
>  /**
>   * eeh_pci_enable - Enable MMIO or DMA transfers for this slot
>   * @pe: EEH PE
> + * @function : EEH function
I don't think there should be a space between '@function' and ':'.

I know the parameter is called 'function' and I think "EEH function" was
a good guess for the docstring. However, if we look at the way
'function' is used, it is compared with EEH_OPT_xxx constants, and then
it's passed to eeh_ops->set_option(), eeh_pci_enable() is also called in
eeh_pe_set_option() with a parameter called 'option'. So I think maybe
'function' should be described as the "EEH option"?

This is still very unsatisfactory but it's not the fault of your patch -
the EEH codebase is very messy and it's worth fixing the W=1 warnings
even if we don't fully clean up the EEH codebase.
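
Concretely, something like this would cover both points (just a
suggestion):

/**
 * eeh_pci_enable - Enable MMIO or DMA transfers for this slot
 * @pe: EEH PE
 * @function: EEH option (EEH_OPT_*), passed to eeh_ops->set_option()
 */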

> - * eeh_set_pe_freset - Check the required reset for the indicated device
> - * @data: EEH device
> + * eeh_set_dev_freset - Check the required reset for the indicated device
> + * @edev: EEH device
>   * @flag: return value
>   *

This is good.

>  /**
>   * eeh_pe_reset_full - Complete a full reset process on the indicated PE
>   * @pe: EEH PE
> + * @include_passed: include passed-through devices?
>   *
>   * This function executes a full reset procedure on a PE, including setting
>   * the appropriate flags, performing a fundamental or hot reset, and then
> @@ -937,6 +939,7 @@ static struct notifier_block eeh_device_nb = {

This is OK.

 
>  /**
>   * eeh_init - System wide EEH initialization
> + * @ops: struct to trace EEH operation callback functions

I think "@ops: platform-specific functions for EEH operations" is
probably clearer?

>   *
>   * It's the platform's job to call this from an arch_initcall().
>   */
> @@ -1442,6 +1445,7 @@ static int eeh_pe_reenable_devices(struct eeh_pe *pe, 
> bool include_passed)
>   * eeh_pe_reset - Issue PE reset according to specified type
>   * @pe: EEH PE
>   * @option: reset type
> + * @include_passed: include passed-through devices?
>   *

This is OK.

>   * The routine is called to reset the specified PE with the
>   * indicated type, either fundamental reset or hot reset.
> @@ -1513,12 +1517,12 @@ EXPORT_SYMBOL_GPL(eeh_pe_configure);
>   * eeh_pe_inject_err - Injecting the specified PCI error to the indicated PE
>   * @pe: the indicated PE
>   * @type: error type
> - * @function: error function
> + * @func: error function

This is good.

>   * @addr: address
>   * @mask: address mask
>   *
>   * The routine is called to inject the specified PCI error, which
> - * is determined by @type and @function, to the indicated PE for
> + * is determined by @type and @func, to the indicated PE for

This is good.

When you resend, you can include:
 Reviewed-by: Daniel Axtens 

Kind regards,
Daniel


Re: [PATCH v3 2/4] mm: Make generic arch_is_kernel_initmem_freed() do what it says

2021-10-01 Thread Daniel Axtens
>  #ifdef __KERNEL__
> +/*
> + * Check if an address is part of freed initmem. After initmem is freed,
> + * memory can be allocated from it, and such allocations would then have
> + * addresses within the range [_stext, _end].
> + */
> +#ifndef arch_is_kernel_initmem_freed
> +static int arch_is_kernel_initmem_freed(unsigned long addr)
> +{
> + if (system_state < SYSTEM_FREEING_INITMEM)
> + return 0;
> +
> + return init_section_contains((void *)addr, 1);

Is init_section_contains sufficient here?

include/asm-generic/sections.h says:
 * [__init_begin, __init_end]: contains .init.* sections, but .init.text.*
 *   may be out of this range on some architectures.
 * [_sinittext, _einittext]: contains .init.text.* sections

init_section_contains only checks __init_*:
static inline bool init_section_contains(void *virt, size_t size)
{
return memory_contains(__init_begin, __init_end, virt, size);
}

Do we need to check against _sinittext and _einittext?

Your proposed generic code will work for powerpc and s390 because those
archs only test against __init_* anyway. I don't know if any platform
actually does place .init.text outside of __init_begin=>__init_end, but
the comment seems to suggest that they could.
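
To make the question concrete, the sort of belt-and-braces check I'm
wondering about would look something like this (untested sketch, using
the existing helpers from include/asm-generic/sections.h):

static int arch_is_kernel_initmem_freed(unsigned long addr)
{
	if (system_state < SYSTEM_FREEING_INITMEM)
		return 0;

	/* covers [__init_begin, __init_end] */
	if (init_section_contains((void *)addr, 1))
		return 1;

	/* covers .init.text that may live outside that range */
	return memory_contains(_sinittext, _einittext, (void *)addr, 1);
}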

Kind regards,
Daniel


Re: [V2 3/4] powerpc/perf: Expose instruction and data address registers as part of extended regs

2021-09-30 Thread Daniel Axtens
Athira Rajeev  writes:

> Patch adds support to include Sampled Instruction Address Register

This is a nit and doesn't require a new revision, but I think this
should read "Include Sampled Instruction Address ...", not "Patch adds
support to include Sampled Instruction ..." - see
https://www.kernel.org/doc/html/v5.11/process/submitting-patches.html#describe-your-changes

> (SIAR) and Sampled Data Address Register (SDAR) SPRs as part of extended
> registers. Update the definition of PERF_REG_PMU_MASK_300/31 and
> PERF_REG_EXTENDED_MAX to include these SPR's.

> diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
> index b931eed482c9..51d31b65e423 100644
> --- a/arch/powerpc/perf/perf_regs.c
> +++ b/arch/powerpc/perf/perf_regs.c
> @@ -90,7 +90,11 @@ static u64 get_ext_regs_value(int idx)
>   return mfspr(SPRN_SIER2);
>   case PERF_REG_POWERPC_SIER3:
>   return mfspr(SPRN_SIER3);
> + case PERF_REG_POWERPC_SDAR:
> + return mfspr(SPRN_SDAR);
>  #endif
> + case PERF_REG_POWERPC_SIAR:
> + return mfspr(SPRN_SIAR);

I was initially confused about why SIAR was outside the CONFIG_PPC64
block and SDAR was inside. But it turns out that SIAR is also defined
for a 32 bit platform, so that makes sense.

I'm not an expert on how the perf subsystem works, but this all seems
consistent with the surrounding code and it seems to do what the commit
message says, so on that limited basis:

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel


Re: [V2 2/4] tools/perf: Refactor the code definition of perf reg extended mask in tools side header file

2021-09-30 Thread Daniel Axtens
Hi Athira,

> PERF_REG_PMU_MASK_300 and PERF_REG_PMU_MASK_31 defines the mask
> value for extended registers. Current definition of these mask values
> uses hex constant and does not use registers by name, making it less
> readable. Patch refactor the macro values in perf tools side header file
> by or'ing together the actual register value constants.
>
This looks like a good simplification.

> -/* Exclude MMCR3, SIER2, SIER3 for CPU_FTR_ARCH_300 */
> -#define  PERF_EXCLUDE_REG_EXT_300(7ULL << PERF_REG_POWERPC_MMCR3)

This file is uAPI - are we allowed to remove a define? Could a program
built against these headers now fail to compile because we've removed it?

> -
>  /*
>   * PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300
>   * includes 9 SPRS from MMCR0 to PMC6 excluding the
> - * unsupported SPRS in PERF_EXCLUDE_REG_EXT_300.
> + * unsupported SPRS MMCR3, SIER2 and SIER3.
>   */
> -#define PERF_REG_PMU_MASK_300   ((0xfffULL << PERF_REG_POWERPC_MMCR0) - 
> PERF_EXCLUDE_REG_EXT_300)
> +#define PERF_REG_PMU_MASK_300\
> + ((1ul << PERF_REG_POWERPC_MMCR0) | (1ul << PERF_REG_POWERPC_MMCR1) | \
> + (1ul << PERF_REG_POWERPC_MMCR2) | (1ul << PERF_REG_POWERPC_PMC1) | \
> + (1ul << PERF_REG_POWERPC_PMC2) | (1ul << PERF_REG_POWERPC_PMC3) | \
> + (1ul << PERF_REG_POWERPC_PMC4) | (1ul << PERF_REG_POWERPC_PMC5) | \
> + (1ul << PERF_REG_POWERPC_PMC6))
>  
>  /*
>   * PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_31
>   * includes 12 SPRs from MMCR0 to PMC6.
>   */
> -#define PERF_REG_PMU_MASK_31   (0xfffULL << PERF_REG_POWERPC_MMCR0)
> +#define PERF_REG_PMU_MASK_31 \
> + (PERF_REG_PMU_MASK_300 | (1ul << PERF_REG_POWERPC_MMCR3) | \
> + (1ul << PERF_REG_POWERPC_SIER2) | (1ul << PERF_REG_POWERPC_SIER3))
>  
> -#define PERF_REG_EXTENDED_MAX  (PERF_REG_POWERPC_PMC6 + 1)

Likewise for this define?

I think this might also be an issue for some of your other patches which
change an include/uapi/ file.
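
If it turns out we do need to keep them for userspace, I guess the old
names could be retained as deprecated aliases, e.g. (sketch only):

/* Deprecated: kept only so existing userspace still compiles */
#define PERF_EXCLUDE_REG_EXT_300	(7ULL << PERF_REG_POWERPC_MMCR3)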

Kind regards,
Daniel

> -- 
> 2.30.1 (Apple Git-130)


Re: [PATCH] powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload

2021-09-30 Thread Daniel Axtens
Hi Vasant,

> Commit 587164cd, introduced new opal message type (OPAL_MSG_PRD2) and added
> opal notifier. But I missed to unregister the notifier during module unload
> path. This results in below call trace if you try to unload and load
> opal_prd module.
>
> Fixes: 587164cd ("powerpc/powernv: Add new opal message type")

In reviewing this patch, I've become a bit worried the underlying patch
has another issue that we should fix.

Consider opal_prd_probe and the opal_prd_event_nb:

static struct notifier_block opal_prd_event_nb = {
.notifier_call  = opal_prd_msg_notifier,
.next   = NULL,
.priority   = 0,
};

static int opal_prd_probe(struct platform_device *pdev)
{
...
rc = opal_message_notifier_register(OPAL_MSG_PRD, &opal_prd_event_nb);
if (rc) { ... }

rc = opal_message_notifier_register(OPAL_MSG_PRD2, &opal_prd_event_nb);
if (rc) { ... }

I don't think you can reuse a single notifier block for multiple
notifier_register calls. opal_message_notifier_register() calls
atomic_notifier_chain_register(), which takes a spinlock and calls
notifier_chain_register(), which reads:

static int notifier_chain_register(struct notifier_block **nl,
struct notifier_block *n)
{
while ((*nl) != NULL) {
// ... skip some error detection
// ... find the right spot in the list to add this
if (n->priority > (*nl)->priority)
break;
nl = &((*nl)->next);
}
n->next = *nl; // <-- mutate the notifier block!!
rcu_assign_pointer(*nl, n);
return 0;
}

So we have the same notifier block getting inserted into two different
linked lists. This is unlikely to break at the moment because nothing
else registers an MSG_PRD/PRD2 notifier, but nonetheless I think you
need to use a different notifier_block struct for each list you insert
your notifier into.

Likewise this could cause other problems during removal, if there were
other entries in either notifier list.
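
To make that concrete, I'd expect the fix to look roughly like this
(sketch, not tested - the "nb2" name is just for illustration):

static struct notifier_block opal_prd_event_nb = {
	.notifier_call	= opal_prd_msg_notifier,
	.next		= NULL,
	.priority	= 0,
};

/* a second block, so each notifier chain gets its own list entry */
static struct notifier_block opal_prd_event_nb2 = {
	.notifier_call	= opal_prd_msg_notifier,
	.next		= NULL,
	.priority	= 0,
};

...
rc = opal_message_notifier_register(OPAL_MSG_PRD, &opal_prd_event_nb);
...
rc = opal_message_notifier_register(OPAL_MSG_PRD2, &opal_prd_event_nb2);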

Kind regards,
Daniel


Re: [PATCH] powerpc/eeh:Fix some mistakes in comments

2021-09-29 Thread Daniel Axtens
Hi Kai,

Thank you for your contribution to the powerpc kernel!

> Get rid of warning:
> arch/powerpc/kernel/eeh.c:774: warning: expecting prototype for 
> eeh_set_pe_freset(). Prototype was for eeh_set_dev_freset() instead

You haven't said where this warning is from. I thought it might be from
sparse but I couldn't seem to reproduce it - is my version of sparse too
old or are you using a different tool?

>  /**
> - * eeh_set_pe_freset - Check the required reset for the indicated device
> - * @data: EEH device
> + * eeh_set_dev_freset - Check the required reset for the indicated device
> + * @edev: EEH device
>   * @flag: return value
>   *
>   * Each device might have its preferred reset type: fundamental or

This looks like a good and correct change.

I checked through git history with git blame to see when the function
was renamed. There are 2 commits that should have updated the comment:
one renamed the function and one renamed an argument. So, I think this
commit could have:

Fixes: d6c4932fbf24 ("powerpc/eeh: Strengthen types of eeh traversal functions")
Fixes: c270a24c59bd ("powerpc/eeh: Do reset based on PE")

But I don't know if an out of date comment is enough of a 'bug' to
justify a Fixes: tag? (mpe, I'm sure I've asked this before, sorry!)

All up, this is a good correction to the comment.

There are a few other functions in the file that have incorrect
docstrings:

 - eeh_pci_enable - missing parameter

 - eeh_pe_reset and eeh_pe_reset_full - missing parameter

 - eeh_init - missing parameter

 - eeh_pe_inject_err - wrong name for a parameter

Could you fix all of the docstrings in the file at once?

Kind regards,
Daniel



Re: [PATCH] KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()

2021-09-29 Thread Daniel Axtens
Hi Michael,

> kvmppc_h_set_dabr(), and kvmppc_h_set_xdabr() which jumps into
> it, need to use _GLOBAL_TOC to setup the kernel TOC pointer, because
> kvmppc_h_set_dabr() uses LOAD_REG_ADDR() to load dawr_force_enable.

This makes sense. LOAD_REG_ADDR() does ld reg,name@got(r2) and
_GLOBAL_TOC sets r2 based on r12 and .TOC. .

Looking at
e.g. https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1610846.html
it seems that we use GOT and TOC largely interchangeably... so assuming
I haven't completely misunderstood, the change this patch makes seems to
make sense to me. :)

> When called from hcall_try_real_mode() we have the kernel TOC in r2,
> established near the start of kvmppc_interrupt_hv(), so there is no
> issue.
>
> But they can also be called from kvmppc_pseries_do_hcall() which is
> module code, so the access ends up happening with the kvm-hv module's
> r2, which will not point at dawr_force_enable and could even cause a
> fault.

I checked and there isn't anywhere else the functions are called, so
this will now cover everything.

> With the current code layout and compilers we haven't observed a fault
> in practice, the load hits somewhere in kvm-hv.ko and silently returns
> some bogus value.
>
> Note that we expect p8/p9 guests to use the DAWR, but SLOF uses
> h_set_dabr() to test if sc1 works correctly, see SLOF's
> lib/libhvcall/brokensc1.c.

I assume that something (the module loader?) patches the callsite to
restore r2 after the function call? I imagine something must, otherwise
things would fall apart pretty quickly...

> Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")

That patch seems to only affect the DA_W_R not the DA_B_R - how does it
cause this bug?

All in all this looks good to me:
Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 90484425a1e6..30a8a07cff18 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1999,7 +1999,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
>   .globl  hcall_real_table_end
>  hcall_real_table_end:
>  
> -_GLOBAL(kvmppc_h_set_xdabr)
> +_GLOBAL_TOC(kvmppc_h_set_xdabr)
>  EXPORT_SYMBOL_GPL(kvmppc_h_set_xdabr)
>   andi.   r0, r5, DABRX_USER | DABRX_KERNEL
>   beq 6f
> @@ -2009,7 +2009,7 @@ EXPORT_SYMBOL_GPL(kvmppc_h_set_xdabr)
>  6:   li  r3, H_PARAMETER
>   blr
>  
> -_GLOBAL(kvmppc_h_set_dabr)
> +_GLOBAL_TOC(kvmppc_h_set_dabr)
>  EXPORT_SYMBOL_GPL(kvmppc_h_set_dabr)
>   li  r5, DABRX_USER | DABRX_KERNEL
>  3:
> -- 
> 2.25.1


Re: [PATCH v1 1/2] powerpc/64s: system call rfscv workaround for TM bugs

2021-09-17 Thread Daniel Axtens
Nicholas Piggin  writes:

> The rfscv instruction does not work correctly with the fake-suspend mode
> in POWER9, which can end up with the hypervisor restoring an incorrect
> checkpoint.

If I understand correctly from commit 4bb3c7a0208f ("KVM: PPC: Book3S
HV: Work around transactional memory bugs in POWER9"), this is because
rfscv does not cause a soft-patch interrupt in the way that rfid etc do.
So we need to avoid calling rfscv if we are in fake-suspend state -
instead we must call something that does indeed get soft-patched - like
rfid.

> Work around this by setting the _TIF_RESTOREALL flag if a system call
> returns to a transaction active state, causing rfid to be used instead
> of rfscv to return, which will do the right thing. The contents of the
> registers are irrelevant because they will be overwritten in this case
> anyway.

I can follow that this will indeed cause syscall_exit_prepare to return
non-zero and therefore we should take the
syscall_vectored_*_restore_regs path which does an RFID_TO_USER rather
than a RFSCV_TO_USER. My only question/concern is:

.Lsyscall_vectored_\name\()_exit:
addir4,r1,STACK_FRAME_OVERHEAD
li  r5,1 /* scv */
bl  syscall_exit_prepare< we get r3 != 0  here
std r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */
.Lsyscall_vectored_\name\()_rst_start:
lbz r11,PACAIRQHAPPENED(r13)
andi.   r11,r11,(~PACA_IRQ_HARD_DIS)@l
bne-syscall_vectored_\name\()_restart <-- can we end up taking
  this branch?

Are there any circumstances that would take us down the _restart path,
and if so, will we still go through the correct RFID_TO_USER branch
rather than the RFSCV_TO_USER branch?

Apart from that this looks good to me, although with the heavy
disclaimer that I only learned about fake suspend for the first time
while reviewing the patch.

Kind regards,
Daniel

>
> Reported-by: Eirik Fuller 
> Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv 
> instructions")
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kernel/interrupt.c | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index c77c80214ad3..917a2ac4def6 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -139,6 +139,19 @@ notrace long system_call_exception(long r3, long r4, 
> long r5,
>*/
>   irq_soft_mask_regs_set_state(regs, IRQS_ENABLED);
>  
> + /*
> +  * If system call is called with TM active, set _TIF_RESTOREALL to
> +  * prevent RFSCV being used to return to userspace, because POWER9
> +  * TM implementation has problems with this instruction returning to
> +  * transactional state. Final register values are not relevant because
> +  * the transaction will be aborted upon return anyway. Or in the case
> +  * of unsupported_scv SIGILL fault, the return state does not much
> +  * matter because it's an edge case.
> +  */
> + if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
> + unlikely(MSR_TM_TRANSACTIONAL(regs->msr)))
> + current_thread_info()->flags |= _TIF_RESTOREALL;
> +
>   /*
>* If the system call was made with a transaction active, doom it and
>* return without performing the system call. Unless it was an
> -- 
> 2.23.0


Re: [PATCH v2] powerpc/mce: Fix access error in mce handler

2021-09-16 Thread Daniel Axtens
Hi Ganesh,

> We queue an irq work for deferred processing of mce event
> in realmode mce handler, where translation is disabled.
> Queuing of the work may result in accessing memory outside
> RMO region, such access needs the translation to be enabled
> for an LPAR running with hash mmu else the kernel crashes.
>
> After enabling translation in mce_handle_error() we used to
> leave it enabled to avoid crashing here, but now with the
> commit 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before
> returning from handler") we are restoring the MSR to disable
> translation.
>
> Hence to fix this enable the translation before queuing the work.

[snip]

> Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from 
> handler")

That patch changes arch/powerpc/powerpc/platforms/pseries/ras.c just
below this comment:

/*
 * Enable translation as we will be accessing per-cpu variables
 * in save_mce_event() which may fall outside RMO region, also
 * leave it enabled because subsequently we will be queuing work
 * to workqueues where again per-cpu variables accessed, besides
 * fwnmi_release_errinfo() crashes when called in realmode on
 * pseries.
 * Note: All the realmode handling like flushing SLB entries for
 *   SLB multihit is done by now.
 */

That suggests per-cpu variables need protection. In your patch, you
enable translations just around irq_work_queue:

> + /* Queue irq work to process this event later. Before
> +  * queuing the work enable translation for non radix LPAR,
> +  * as irq_work_queue may try to access memory outside RMO
> +  * region.
> +  */
> + if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
> + msr = mfmsr();
> + mtmsr(msr | MSR_IR | MSR_DR);
> + irq_work_queue(&mce_event_process_work);
> + mtmsr(msr);
> + } else {
> + irq_work_queue(&mce_event_process_work);
> + }

However, just before that in the function, there are a few things that
access per-cpu variables via the local_paca, e.g.:

memcpy(&local_paca->mce_info->mce_event_queue[index],
   &evt, sizeof(evt));

Do we need to widen the window where translations are enabled in order
to protect accesses to local_paca?
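
i.e. something along these lines (very much a sketch - I haven't
checked what else in between is safe to run with translation on):

	if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
		msr = mfmsr();
		mtmsr(msr | MSR_IR | MSR_DR);
	}

	memcpy(&local_paca->mce_info->mce_event_queue[index],
	       &evt, sizeof(evt));

	/* ... */

	irq_work_queue(&mce_event_process_work);

	if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR))
		mtmsr(msr);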

Kind regards,
Daniel


Re: [PATCH] powerpc/time: Remove generic_suspend_{dis/en}able_irqs()

2021-09-03 Thread Daniel Axtens
Christophe Leroy  writes:

> Commit d75d68cfef49 ("powerpc: Clean up obsolete code relating to
> decrementer and timebase") made generic_suspend_enable_irqs() and
> generic_suspend_disable_irqs() static.
>
> Fold them into their only caller.

This does what it says, and simplifies the code.
Reviewed-by: Daniel Axtens 

Kind regards,
Daniel


Re: [PATCH] powerpc/machdep: Remove stale functions from ppc_md structure

2021-09-02 Thread Daniel Axtens
Hi Christophe,

> ppc_md.iommu_save() is not set anymore by any platform after
> commit c40785ad305b ("powerpc/dart: Use a cachable DART").
> So iommu_save() has become a nop and can be removed.

I wonder if it makes sense to have an iommu_restore() without an
iommu_save. Only dart_iommu.c defines an iommu_restore(), but I couldn't
figure out if it was safe to remove and it seems like it still did
something...

> ppc_md.show_percpuinfo() is not set anymore by any platform after
> commit 4350147a816b ("[PATCH] ppc64: SMU based macs cpufreq support").
>
> Last users of ppc_md.rtc_read_val() and ppc_md.rtc_write_val() were
> removed by commit 0f03a43b8f0f ("[POWERPC] Remove todc code from
> ARCH=powerpc")
>
> Last user of kgdb_map_scc() was removed by commit 17ce452f7ea3 ("kgdb,
> powerpc: arch specific powerpc kgdb support").
>
> ppc.machine_kexec_prepare() has not been used since
> commit 8ee3e0d69623 ("powerpc: Remove the main legacy iSerie platform
> code"). This allows the removal of machine_kexec_prepare() and the
> rename of default_machine_kexec_prepare() into machine_kexec_prepare()

I think you should also remove the prototype from
arch/powerpc/include/asm/kexec.h

Apart from that:
Reviewed-by: Daniel Axtens 

Kind regards,
Daniel Axtens


[PATCH] powerpc: Remove unused prototype for of_show_percpuinfo

2021-09-02 Thread Daniel Axtens
commit 6d7f58b04d82 ("[PATCH] powerpc: Some minor cleanups to setup_32.c")
removed of_show_percpuinfo but didn't remove the prototype.

Remove it.

Fixes: 6d7f58b04d82 ("[PATCH] powerpc: Some minor cleanups to setup_32.c")
Signed-off-by: Daniel Axtens 
---
 arch/powerpc/platforms/powermac/pmac.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/platforms/powermac/pmac.h 
b/arch/powerpc/platforms/powermac/pmac.h
index 0d715db434dc..29d2036dcc9d 100644
--- a/arch/powerpc/platforms/powermac/pmac.h
+++ b/arch/powerpc/platforms/powermac/pmac.h
@@ -27,7 +27,6 @@ extern void pmac_nvram_update(void);
 extern unsigned char pmac_nvram_read_byte(int addr);
 extern void pmac_nvram_write_byte(int addr, unsigned char val);
 extern void pmac_pcibios_after_init(void);
-extern int of_show_percpuinfo(struct seq_file *m, int i);
 
 extern void pmac_setup_pci_dma(void);
 extern void pmac_check_ht_link(void);
-- 
2.25.1



Re: [PATCH] powerpc/64: Avoid link stack corruption in kexec_wait()

2021-08-31 Thread Daniel Axtens
Christophe Leroy  writes:

> On 31/08/2021 at 08:17, Daniel Axtens wrote:
>> Hi Christophe,
>> 
>>> Use bcl 20,31,+4 instead of bl in order to preserve link stack.
>>>
>>> See commit c974809a26a1 ("powerpc/vdso: Avoid link stack corruption
>>> in __get_datapage()") for details.
>> 
>>  From my understanding of that commit message, the change helps to keep
>> the link stack correctly balanced which is helpful for performance,
>> rather than for correctness. If I understand correctly, kexec_wait is
>> not in a hot path - rather it is where CPUs spin while waiting for
>> kexec. Is there any benefit in using the more complicated opcode in this
>> situation?
>
> AFAICS the main benefit is to keep things consistent over the kernel
> and not have to wonder "is it a hot path or not? If it is I use
> bcl 20,31, if it is not I use bl". The best way to keep things in
> order is to always use the right instruction.

Yeah, Nick Piggin convinced me of this offline as well.

>
>> 
>>> Signed-off-by: Christophe Leroy 
>>> ---
>>>   arch/powerpc/kernel/misc_64.S | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
>>> index 4b761a18a74d..613509907166 100644
>>> --- a/arch/powerpc/kernel/misc_64.S
>>> +++ b/arch/powerpc/kernel/misc_64.S
>>> @@ -255,7 +255,7 @@ _GLOBAL(scom970_write)
>>>* Physical (hardware) cpu id should be in r3.
>>>*/
>>>   _GLOBAL(kexec_wait)
>>> -   bl  1f
>>> +   bcl 20,31,1f
>>>   1:mflrr5
>> 
>> Would it be better to create a macro of some sort to wrap this unusual
>> special form so that the meaning is more clear?
>
> Not sure, I think people working with assembly will easily recognise
> that form whereas an obscure macro is always puzzling.
>
> I like macros when they allow you to not repeat again and again the
> same sequence of several instructions, but here it is a single quite
> simple instruction which is not worth a macro in my mind.
>


Sure - I was mostly thinking specifically of the bcl; mflr situation but
I agree that for the single instruction it's not needed.

In short, I am convinced, and so:
Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> Christophe


Re: [PATCH] powerpc/64: Avoid link stack corruption in kexec_wait()

2021-08-30 Thread Daniel Axtens
Hi Christophe,

> Use bcl 20,31,+4 instead of bl in order to preserve link stack.
>
> See commit c974809a26a1 ("powerpc/vdso: Avoid link stack corruption
> in __get_datapage()") for details.

From my understanding of that commit message, the change helps to keep
the link stack correctly balanced which is helpful for performance,
rather than for correctness. If I understand correctly, kexec_wait is
not in a hot path - rather it is where CPUs spin while waiting for
kexec. Is there any benefit in using the more complicated opcode in this
situation?

> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/misc_64.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
> index 4b761a18a74d..613509907166 100644
> --- a/arch/powerpc/kernel/misc_64.S
> +++ b/arch/powerpc/kernel/misc_64.S
> @@ -255,7 +255,7 @@ _GLOBAL(scom970_write)
>   * Physical (hardware) cpu id should be in r3.
>   */
>  _GLOBAL(kexec_wait)
> - bl  1f
> + bcl 20,31,1f
>  1:   mflrr5

Would it be better to create a macro of some sort to wrap this unusual
special form so that the meaning is more clear?

Kind regards,
Daniel

>   addir5,r5,kexec_flag-1b
>  
> -- 
> 2.25.0


Re: [PATCH linux-next] power:pkeys: fix bugon.cocci warnings

2021-08-30 Thread Daniel Axtens
Hi Jing,

Thanks for your patch.

The patch looks good, but looking at the output of `make coccicheck
M=arch/powerpc MODE=report`, it looks like there might be a few other
things that we might want to fix. Would it be worth trying to make the
arch/powerpc directory free from coccinelle warnings in one big patch
series, and then we could add coccicheck to our automatic patch testing?
(see e.g.
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210825064228.70487-1-deng.changch...@zte.com.cn/ )

For this patch, I think we should try to fix all of arch/powerpc at the
same time. The check points out the following other possible uses of
BUG_ON:

arch/powerpc/include/asm/book3s/64/pgtable-64k.h:68:2-5: WARNING: Use BUG_ON 
instead of if condition followed by BUG.
Please make sure the condition has no side effects (see conditional BUG_ON 
definition in include/asm-generic/bug.h)
arch/powerpc/platforms/cell/spufs/sched.c:908:2-5: WARNING: Use BUG_ON instead 
of if condition followed by BUG.
Please make sure the condition has no side effects (see conditional BUG_ON 
definition in include/asm-generic/bug.h)
arch/powerpc/platforms/powernv/idle.c:968:3-6: WARNING: Use BUG_ON instead of 
if condition followed by BUG.
Please make sure the condition has no side effects (see conditional BUG_ON 
definition in include/asm-generic/bug.h)
arch/powerpc/platforms/powernv/idle.c:456:2-5: WARNING: Use BUG_ON instead of 
if condition followed by BUG.
Please make sure the condition has no side effects (see conditional BUG_ON 
definition in include/asm-generic/bug.h)

Kind regards,
Daniel


> Use BUG_ON instead of a if condition followed by BUG.
>
> ./arch/powerpc/include/asm/book3s/64/pkeys.h:21:2-5:WARNING
> Use BUG_ON instead of if condition followed by BUG.
> ./arch/powerpc/include/asm/book3s/64/pkeys.h:14:2-5:WARNING
> Use BUG_ON instead of if condition followed by BUG.
>
> Generated by: scripts/coccinelle/misc/bugon.cocci
>
> Reported-by: Zeal Robot 
> Signed-off-by: Jing Yangyang 
> ---
>  arch/powerpc/include/asm/book3s/64/pkeys.h | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pkeys.h 
> b/arch/powerpc/include/asm/book3s/64/pkeys.h
> index 5b17813..5f74f0c 100644
> --- a/arch/powerpc/include/asm/book3s/64/pkeys.h
> +++ b/arch/powerpc/include/asm/book3s/64/pkeys.h
> @@ -10,15 +10,13 @@ static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
>   if (!mmu_has_feature(MMU_FTR_PKEY))
>   return 0x0UL;
>  
> - if (radix_enabled())
> - BUG();
> + BUG_ON(radix_enabled());
>   return hash__vmflag_to_pte_pkey_bits(vm_flags);
>  }
>  
>  static inline u16 pte_to_pkey_bits(u64 pteflags)
>  {
> - if (radix_enabled())
> - BUG();
> + BUG_ON(radix_enabled());
>   return hash__pte_to_pkey_bits(pteflags);
>  }
>  
> -- 
> 1.8.3.1


Re: [PATCH v2 1/4] powerpc/64: handle MSR EE and RI in interrupt entry wrapper

2021-08-27 Thread Daniel Axtens
Hi,

> Similarly to the system call change in the previous patch, the mtmsrd to

I don't actually know what patch this was - I assume it's from a series
that has since been applied?

> enable RI can be combined with the mtmsrd to enable EE for interrupts
> which enable the latter, which tends to be the important synchronous
> interrupts (i.e., page faults).
>
> Do this by enabling EE and RI together at the beginning of the entry
> wrapper if PACA_IRQ_HARD_DIS is clear, and just enabling RI if it is set
> (which means something wanted EE=0).


> diff --git a/arch/powerpc/include/asm/interrupt.h 
> b/arch/powerpc/include/asm/interrupt.h
> index 6b800d3e2681..e3228a911b35 100644
> --- a/arch/powerpc/include/asm/interrupt.h
> +++ b/arch/powerpc/include/asm/interrupt.h
> @@ -148,9 +148,21 @@ static inline void interrupt_enter_prepare(struct 
> pt_regs *regs, struct interrup
>  #endif
>  
>  #ifdef CONFIG_PPC64
> - if (irq_soft_mask_set_return(IRQS_ALL_DISABLED) == IRQS_ENABLED)
> + bool trace_enable = false;
> +
> + if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS)) {
> + if (irq_soft_mask_set_return(IRQS_DISABLED) == IRQS_ENABLED)

In the previous code we had IRQS_ALL_DISABLED, now we just have
IRQS_DISABLED. Is that intentional?

> + trace_enable = true;
> + } else {
> + irq_soft_mask_set(IRQS_DISABLED);
> + }
> + /* If the interrupt was taken with HARD_DIS set, don't enable MSR[EE] */
> + if (local_paca->irq_happened & PACA_IRQ_HARD_DIS)
> + __hard_RI_enable();
> + else
> + __hard_irq_enable();
> + if (trace_enable)
>   trace_hardirqs_off();
> - local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
>  
>   if (user_mode(regs)) {
>   CT_WARN_ON(ct_state() != CONTEXT_USER);


> @@ -901,11 +892,6 @@ INT_DEFINE_BEGIN(system_reset)
>   IVEC=0x100
>   IAREA=PACA_EXNMI
>   IVIRT=0 /* no virt entry point */
> - /*
> -  * MSR_RI is not enabled, because PACA_EXNMI and nmi stack is
> -  * being used, so a nested NMI exception would corrupt it.
> -  */
> - ISET_RI=0
>   ISTACK=0
>   IKVM_REAL=1
>  INT_DEFINE_END(system_reset)

> @@ -986,8 +972,6 @@ EXC_COMMON_BEGIN(system_reset_common)

Right before this change, there's a comment that reads:

/*
 * Increment paca->in_nmi then enable MSR_RI. SLB or MCE will be able
 * to recover, but nested NMI will notice in_nmi and not recover
...

You've taken out the bit that enables MSR_RI, which means the comment is
no longer accurate.

Beyond that, I'm still trying to follow all the various changes in code
flows. It seems to make at least vague sense: we move the setting of
MSR_RI from the early asm to interrupt*enter_prepare. I'm just
struggling to convince myself that, when we hit the underlying handler,
the RI states all line up.

Kind regards,
Daniel



Re: [PATCH v8 2/3] tty: hvc: pass DMA capable memory to put_chars()

2021-08-19 Thread Daniel Axtens
Xianting Tian  writes:

> As well known, hvc backend driver(eg, virtio-console) can register its
> operations to hvc framework. The operations can contain put_chars(),
> get_chars() and so on.
>
> Some hvc backend may do dma in its operations. eg, put_chars() of
> virtio-console. But in the code of hvc framework, it may pass DMA
> incapable memory to put_chars() under a specific configuration, which
> is explained in commit c4baad5029(virtio-console: avoid DMA from stack):

We could also run into issues on powerpc: Andrew is working on adding
vmap-stack, but the opal hvc driver assumes that the buffer it is
passed is in the linear mapping, not in vmalloc space. So it would be
good to fix this (or more clearly document what drivers can expect).

> 1, c[] is on stack,
>hvc_console_print():
>   char c[N_OUTBUF] __ALIGNED__;
>   cons_ops[index]->put_chars(vtermnos[index], c, i);
> 2, ch is on stack,
>static void hvc_poll_put_char(,,char ch)
>{
>   struct tty_struct *tty = driver->ttys[0];
>   struct hvc_struct *hp = tty->driver_data;
>   int n;
>
>   do {
>   n = hp->ops->put_chars(hp->vtermno, &ch, 1);
>   } while (n <= 0);
>}
>
> Commit c4baad5029 is just the fix to avoid DMA from stack memory, which
> is passed to virtio-console by hvc framework in above code. But I think
> the fix is aggressive, it directly uses kmemdup() to alloc new buffer
> from kmalloc area and do memcpy no matter the memory is in kmalloc area
> or not. But most importantly, it should better be fixed in the hvc
> framework, by changing it to never pass stack memory to the put_chars()
> function in the first place. Otherwise, we still face the same issue if
> a new hvc backend using dma added in the future.
>
> In this patch, we make 'char out_buf[N_OUTBUF]' and 'chat out_ch' part
> of 'struct hvc_struct', so both two buf are no longer the stack memory.
> we can use it in above two cases separately.
>
> Introduce another array(cons_outbufs[]) for buffer pointers next to
> the cons_ops[] and vtermnos[] arrays. With the array, we can easily find
> the buffer, instead of traversing hp list.
>
> With the patch, we can remove the fix c4baad5029.
>
> Signed-off-by: Xianting Tian 
> Reviewed-by: Shile Zhang 

>  struct hvc_struct {
>   struct tty_port port;
>   spinlock_t lock;
>   int index;
>   int do_wakeup;
> - char *outbuf;
> - int outbuf_size;
>   int n_outbuf;
>   uint32_t vtermno;
>   const struct hv_ops *ops;
> @@ -48,6 +56,10 @@ struct hvc_struct {
>   struct work_struct tty_resize;
>   struct list_head next;
>   unsigned long flags;
> + char out_ch;
> + char out_buf[N_OUTBUF] __ALIGNED__;
> + int outbuf_size;
> + char outbuf[0] __ALIGNED__;

I'm trying to understand this patch but I am finding it very difficult
to understand what the difference between `out_buf` and `outbuf`
(without the underscore) is supposed to be. `out_buf` is statically
sized and the size of `outbuf` is supposed to depend on the arguments to
hvc_alloc(), but I can't quite figure out what the roles of each one are
and their names are confusingly similar!

I looked briefly at the older revisions of the series but it didn't make
things much clearer.

Could you give them clearer names?
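
For example (the names here are made up purely to illustrate - pick
whatever you think is clearest):

	char poll_out_ch;                        /* for hvc_poll_put_char() */
	char cons_outbuf[N_OUTBUF] __ALIGNED__;  /* for hvc_console_print() */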

Also, looking at Documentation/process/deprecated.rst, it looks like
maybe we want to use a 'flexible array member' instead:

.. note:: If you are using struct_size() on a structure containing a zero-length
or a one-element array as a trailing array member, please refactor such
array usage and switch to a `flexible array member
<#zero-length-and-one-element-arrays>`_ instead.

I think we want:

> + char outbuf[] __ALIGNED__;
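
With that, the allocation in hvc_alloc() could then become something
like this (sketch, assuming the buffer length still arrives via the
outbuf_size parameter):

	hp = kzalloc(struct_size(hp, outbuf, outbuf_size), GFP_KERNEL);
	if (!hp)
		return ERR_PTR(-ENOMEM);

with struct_size() from include/linux/overflow.h taking care of the
overflow checking for you.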

Kind regards,
Daniel


Re: [PATCH v2 2/2] powerpc: rectify selection to ARCH_ENABLE_SPLIT_PMD_PTLOCK

2021-08-19 Thread Daniel Axtens
Lukas Bulwahn  writes:

> Commit 66f24fa766e3 ("mm: drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK")
> selects the non-existing config ARCH_ENABLE_PMD_SPLIT_PTLOCK in
> ./arch/powerpc/platforms/Kconfig.cputype, but clearly it intends to select
> ARCH_ENABLE_SPLIT_PMD_PTLOCK here (notice the word swapping!), as this
> commit does select that for all other architectures.
>
> Rectify selection to ARCH_ENABLE_SPLIT_PMD_PTLOCK instead.
>

Yikes, yes, 66f24fa766e3 does seem to have got that wrong. It looks like
that went into 5.13.

I think we want to specifically target this for stable so that we don't
lose the performance and scalability benefits of split pmd ptlocks:

Cc: sta...@vger.kernel.org # v5.13+

(I don't think you need to do another revision for this, I think mpe
could add it when merging.)

I tried to check whether we accidentally broke SPLIT_PMD_PTLOCKs while
they were disabled:

 - There hasn't been any change to the pgtable_pmd_page_ctor or _dtor
   prototypes, and we haven't made any relevant changes to any of the
   files in arch/powerpc that called it.

 - I checked out v5.13 and powerpc/merge, applied this patch, built a
   pseries_le_defconfig and boot tested it in qemu. It didn't crash on
   boot or with /bin/sh and some shell commands, but I didn't exactly
   stress test the VM subsystem either.

This gives me some confidence it's both good for powerpc and stable-worthy.

Overall:
Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> Fixes: 66f24fa766e3 ("mm: drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK")
> Signed-off-by: Lukas Bulwahn 
> ---
>  arch/powerpc/platforms/Kconfig.cputype | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/Kconfig.cputype 
> b/arch/powerpc/platforms/Kconfig.cputype
> index 6794145603de..a208997ade88 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -98,7 +98,7 @@ config PPC_BOOK3S_64
>   select PPC_HAVE_PMU_SUPPORT
>   select HAVE_ARCH_TRANSPARENT_HUGEPAGE
>   select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
> - select ARCH_ENABLE_PMD_SPLIT_PTLOCK
> + select ARCH_ENABLE_SPLIT_PMD_PTLOCK
>   select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
>   select ARCH_SUPPORTS_HUGETLBFS
>   select ARCH_SUPPORTS_NUMA_BALANCING
> -- 
> 2.26.2


Re: [PATCH v2 1/2] powerpc: kvm: remove obsolete and unneeded select

2021-08-19 Thread Daniel Axtens
Hi Lukas,

> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index e45644657d49..ff581d70f20c 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -38,7 +38,6 @@ config KVM_BOOK3S_32_HANDLER
>  config KVM_BOOK3S_64_HANDLER
>   bool
>   select KVM_BOOK3S_HANDLER
> - select PPC_DAWR_FORCE_ENABLE

I looked at some of the history here. It looks like this select was left
over from an earlier version of the patch series that added PPC_DAWR: v2
of the series had a new symbol PPC_DAWR_FORCE_ENABLE, but by version 4
that symbol had disappeared while the select had not.

v2: https://lore.kernel.org/linuxppc-dev/20190513071703.25243-1-mi...@neuling.org/
v5: https://lore.kernel.org/linuxppc-dev/20190604030037.9424-2-mi...@neuling.org/

The rest of the patch reasoning makes sense to me: DAWR support will be
selected anyway by virtue of PPC64->PPC_DAWR so there's no need to try
to select it again anyway.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

>  
>  config KVM_BOOK3S_PR_POSSIBLE
>   bool
> -- 
> 2.26.2


Re: [PATCH/RFC] powerpc/module_64: allow .init_array constructors to run

2021-08-17 Thread Daniel Axtens
Jan Stancek  writes:

> gcov and kasan rely on compiler generated constructor code.
> For modules, gcc-8 with gcov enabled generates .init_array section,
> but on ppc64le it doesn't get executed. find_module_sections() never
> finds .init_array section, because module_frob_arch_sections() renames
> it to _init_array.
>
> Avoid renaming .init_array section, so do_mod_ctors() can use it.

This (the existing renaming) breaks a KASAN test too, so I'd love to see
this fixed.

However, I don't know that renaming the section is a complete fix: from
memory it may still be possible to get relocations that are too far away
and require separate trampolines. But I wasn't able to construct a
module that exhibited this behaviour and test a fix before I got pulled
away to other things.

Kind regards,
Daniel

>
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Christophe Leroy 
> Signed-off-by: Jan Stancek 
> ---
> I wasn't able to trace the comment:
>   "We don't handle .init for the moment: rename to _init"
> to original patch (it pre-dates .git). I'm not sure if it
> still applies today, so I limited patch to .init_array. This
> fixes gcov for modules for me on ppc64le 5.14.0-rc6.
>
> Renaming issue is also mentioned in kasan patches here:
>   
> https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20210319144058.772525-1-dja@axtens
>
>  arch/powerpc/kernel/module_64.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index 6baa676e7cb6..c604b13ea6bf 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -299,8 +299,16 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr,
> sechdrs[i].sh_size);
>  
>   /* We don't handle .init for the moment: rename to _init */
> - while ((p = strstr(secstrings + sechdrs[i].sh_name, ".init")))
> + while ((p = strstr(secstrings + sechdrs[i].sh_name, ".init"))) {
> +#ifdef CONFIG_CONSTRUCTORS
> + /* find_module_sections() needs .init_array intact */
> + if (strstr(secstrings + sechdrs[i].sh_name,
> + ".init_array")) {
> + break;
> + }
> +#endif
>   p[0] = '_';
> + }
>  
>   if (sechdrs[i].sh_type == SHT_SYMTAB)
>   dedotify((void *)hdr + sechdrs[i].sh_offset,
> -- 
> 2.27.0


Re: [PATCH] ppc: add "-z notext" flag to disable diagnostic

2021-08-13 Thread Daniel Axtens
Bill Wendling  writes:

> The "-z notext" flag disables reporting an error if DT_TEXTREL is set on
> PPC with CONFIG=kdump:
>
>   ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against
> local symbol in readonly segment; recompile object files with -fPIC
> or pass '-Wl,-z,notext' to allow text relocations in the output
>   >>> defined in built-in.a(arch/powerpc/kernel/misc.o)
>   >>> referenced by arch/powerpc/kernel/misc.o:(.text+0x20) in archive
>   built-in.a
>
> The BFD linker disables this by default (though it's configurable in
> current versions). LLD enables this by default. So we add the flag to
> keep LLD from emitting the error.

You didn't provide a huge amount of context but I was able to reproduce
a similar set of errors with pseries_le_defconfig and

make ARCH=powerpc CROSS_COMPILE=powerpc64-linux-gnu- CC="ccache clang-11" 
LD=ld.lld-11 AR=llvm-ar-11 -j4 vmlinux

I also checked the manpage, and indeed the system ld does not issue this
warning/error by default.

> ---
>  arch/powerpc/Makefile | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 6505d66f1193..17a9fbf9b789 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -122,6 +122,7 @@ endif
>  
>  LDFLAGS_vmlinux-y := -Bstatic
>  LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
> +LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) += -z notext

Is there any reason this should be gated on CONFIG_RELOCATABLE? (I tried
without it and got different but possibly related linker errors...)

Also, is this behaviour new?

Kind regards,
Daniel

>  LDFLAGS_vmlinux  := $(LDFLAGS_vmlinux-y)
>  
>  ifdef CONFIG_PPC64
> -- 
> 2.33.0.rc1.237.g0d66db33f3-goog


Re: [PATCH v2 1/2] powerpc/book3s64/radix: make tlb_single_page_flush_ceiling a debugfs entry

2021-08-13 Thread Daniel Axtens
"Aneesh Kumar K.V"  writes:

> Similar to x86/s390 add a debugfs file to tune tlb_single_page_flush_ceiling.
> Also add a debugfs entry for tlb_local_single_page_flush_ceiling.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
> Changes from v1:
> * switch to debugfs_create_u32
>
>  arch/powerpc/mm/book3s64/radix_tlb.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c 
> b/arch/powerpc/mm/book3s64/radix_tlb.c
> index aefc100d79a7..1fa2bc6a969e 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "internal.h"
>  
> @@ -1106,8 +1107,8 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
>   * invalidating a full PID, so it has a far lower threshold to change from
>   * individual page flushes to full-pid flushes.
>   */
> -static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
> -static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = 
> POWER9_TLB_SETS_RADIX * 2;
> +static u32 tlb_single_page_flush_ceiling __read_mostly = 33;
> +static u32 tlb_local_single_page_flush_ceiling __read_mostly = 
> POWER9_TLB_SETS_RADIX * 2;
>  
>  static inline void __radix__flush_tlb_range(struct mm_struct *mm,
>   unsigned long start, unsigned long 
> end)
> @@ -1524,3 +1525,14 @@ void do_h_rpt_invalidate_prt(unsigned long pid, 
> unsigned long lpid,
>  EXPORT_SYMBOL_GPL(do_h_rpt_invalidate_prt);
>  
>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
> +
> +static int __init create_tlb_single_page_flush_ceiling(void)
> +{
> + debugfs_create_u32("tlb_single_page_flush_ceiling", 0600,
> +powerpc_debugfs_root, 
> &tlb_single_page_flush_ceiling);
> + debugfs_create_u32("tlb_local_single_page_flush_ceiling", 0600,
> +powerpc_debugfs_root, 
> &tlb_local_single_page_flush_ceiling);
> + return 0;
> +}
> +late_initcall(create_tlb_single_page_flush_ceiling);

This patch seems to do what the commit message says, and it does seem to
make sense to have these parameters as tunables.

I was briefly concerned that switching from an unsigned long to a u32
might lead to suboptimal code generation in older gcc versions, but it
doesn't seem to be a case where a single instruction is going to make a
huge impact.

I also wondered what the C integer promotion rules would do with the
nr_pages > tlb*flush_ceiling comparisons, but if we are trying to flush
more than 4 billion pages we might have other, bigger problems! (Also,
if I understand the C integer rules correctly the u32 will get promoted
to an unsigned long anyway.)
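
(That is, in something like:

	unsigned long nr_pages = ...;
	u32 ceiling = 33;

	if (nr_pages > ceiling)
		...

the usual arithmetic conversions promote ceiling to unsigned long
before the comparison, so nothing gets truncated.)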

All in all this seems good to me.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel



> +
> -- 
> 2.31.1


[PATCH v16 4/4] kasan: use MAX_PTRS_PER_* for early shadow tables

2021-06-23 Thread Daniel Axtens
powerpc has a variable number of PTRS_PER_*, set at runtime based
on the MMU that the kernel is booted under.

This means the PTRS_PER_* are no longer constants, and therefore
breaks the build. Switch to using MAX_PTRS_PER_*, which are constant.

Suggested-by: Christophe Leroy 
Suggested-by: Balbir Singh 
Reviewed-by: Christophe Leroy 
Reviewed-by: Balbir Singh 
Reviewed-by: Marco Elver 
Reviewed-by: Andrey Konovalov 
Signed-off-by: Daniel Axtens 
---
 include/linux/kasan.h | 6 +++---
 mm/kasan/init.c   | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 768d7d342757..5310e217bd74 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -41,9 +41,9 @@ struct kunit_kasan_expectation {
 #endif
 
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 
 int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index 348f31d15a97..cc64ed6858c6 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -41,7 +41,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
 static inline bool kasan_pud_table(p4d_t p4d)
 {
return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -53,7 +53,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
 static inline bool kasan_pmd_table(pud_t pud)
 {
return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -64,7 +64,7 @@ static inline bool kasan_pmd_table(pud_t pud)
return false;
 }
 #endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS]
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]
__page_aligned_bss;
 
 static inline bool kasan_pte_table(pmd_t pmd)
-- 
2.30.2



[PATCH v16 3/4] mm: define default MAX_PTRS_PER_* in include/pgtable.h

2021-06-23 Thread Daniel Axtens
Commit c65e774fb3f6 ("x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable")
made PTRS_PER_P4D variable on x86 and introduced MAX_PTRS_PER_P4D as a
constant for cases which need a compile-time constant (e.g. fixed-size
arrays).

powerpc likewise has boot-time selectable MMU features which can cause
other mm "constants" to vary. For KASAN, we have some static
PTE/PMD/PUD/P4D arrays so we need compile-time maximums for all these
constants. Extend the MAX_PTRS_PER_ idiom, and place default definitions
in include/pgtable.h. These define MAX_PTRS_PER_x to be PTRS_PER_x unless
an architecture has defined MAX_PTRS_PER_x in its arch headers.

Clean up pgtable-nop4d.h and s390's MAX_PTRS_PER_P4D definitions while
we're at it: both can just pick up the default now.

Acked-by: Andrey Konovalov 
Reviewed-by: Christophe Leroy 
Reviewed-by: Marco Elver 
Signed-off-by: Daniel Axtens 

---

s390 was compile tested only.
---
 arch/s390/include/asm/pgtable.h |  2 --
 include/asm-generic/pgtable-nop4d.h |  1 -
 include/linux/pgtable.h | 22 ++
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 79742f497cb5..dcac7b2df72c 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -343,8 +343,6 @@ static inline int is_module_addr(void *addr)
 #define PTRS_PER_P4D   _CRST_ENTRIES
 #define PTRS_PER_PGD   _CRST_ENTRIES
 
-#define MAX_PTRS_PER_P4D   PTRS_PER_P4D
-
 /*
  * Segment table and region3 table entry encoding
  * (R = read-only, I = invalid, y = young bit):
diff --git a/include/asm-generic/pgtable-nop4d.h 
b/include/asm-generic/pgtable-nop4d.h
index 2f1d0aad645c..03b7dae47dd4 100644
--- a/include/asm-generic/pgtable-nop4d.h
+++ b/include/asm-generic/pgtable-nop4d.h
@@ -9,7 +9,6 @@
 typedef struct { pgd_t pgd; } p4d_t;
 
 #define P4D_SHIFT  PGDIR_SHIFT
-#define MAX_PTRS_PER_P4D   1
 #define PTRS_PER_P4D   1
 #define P4D_SIZE   (1UL << P4D_SHIFT)
 #define P4D_MASK   (~(P4D_SIZE-1))
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index fb20c57de2ce..d147480cdefc 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1634,4 +1634,26 @@ typedef unsigned int pgtbl_mod_mask;
 #define pte_leaf_size(x) PAGE_SIZE
 #endif
 
+/*
+ * Some architectures have MMUs that are configurable or selectable at boot
+ * time. These lead to variable PTRS_PER_x. For statically allocated arrays it
+ * helps to have a static maximum value.
+ */
+
+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif
+
+#ifndef MAX_PTRS_PER_P4D
+#define MAX_PTRS_PER_P4D PTRS_PER_P4D
+#endif
+
 #endif /* _LINUX_PGTABLE_H */
-- 
2.30.2



[PATCH v16 2/4] kasan: allow architectures to provide an outline readiness check

2021-06-23 Thread Daniel Axtens
Allow architectures to define a kasan_arch_is_ready() hook that bails
out of any function that's about to touch the shadow unless the arch
says that it is ready for the memory to be accessed. This is fairly
uninvasive and should have a negligible performance penalty.

This will only work in outline mode, so an arch must specify
ARCH_DISABLE_KASAN_INLINE if it requires this.

Cc: Balbir Singh 
Cc: Aneesh Kumar K.V 
Suggested-by: Christophe Leroy 
Reviewed-by: Marco Elver 
Signed-off-by: Daniel Axtens 

--

Both previous RFCs for ppc64 - by 2 different people - have
needed this trick! See:
 - https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
 - https://patchwork.ozlabs.org/patch/795211/  # ppc radix series

Build tested on arm64 with SW_TAGS and x86 with INLINE: the error fires
if I add a kasan_arch_is_ready define.
---
 mm/kasan/common.c  | 3 +++
 mm/kasan/generic.c | 3 +++
 mm/kasan/kasan.h   | 6 ++
 mm/kasan/shadow.c  | 6 ++
 4 files changed, 18 insertions(+)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 10177cc26d06..2baf121fb8c5 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -331,6 +331,9 @@ static inline bool kasan_slab_free(struct kmem_cache 
*cache, void *object,
u8 tag;
void *tagged_object;
 
+   if (!kasan_arch_is_ready())
+   return false;
+
tag = get_tag(object);
tagged_object = object;
object = kasan_reset_tag(object);
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 53cbf28859b5..c3f5ba7a294a 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -163,6 +163,9 @@ static __always_inline bool check_region_inline(unsigned 
long addr,
size_t size, bool write,
unsigned long ret_ip)
 {
+   if (!kasan_arch_is_ready())
+   return true;
+
if (unlikely(size == 0))
return true;
 
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 8f450bc28045..4dbc8def64f4 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -449,6 +449,12 @@ static inline void kasan_poison_last_granule(const void 
*address, size_t size) {
 
 #endif /* CONFIG_KASAN_GENERIC */
 
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void)   { return true; }
+#elif !defined(CONFIG_KASAN_GENERIC) || !defined(CONFIG_KASAN_OUTLINE)
+#error kasan_arch_is_ready only works in KASAN generic outline mode!
+#endif
+
 /*
  * Exported functions for interfaces called from assembly or from generated
  * code. Declarations here to avoid warning about missing declarations.
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 082ee5b6d9a1..8d95ee52d019 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -73,6 +73,9 @@ void kasan_poison(const void *addr, size_t size, u8 value, 
bool init)
 {
void *shadow_start, *shadow_end;
 
+   if (!kasan_arch_is_ready())
+   return;
+
/*
 * Perform shadow offset calculation based on untagged address, as
 * some of the callers (e.g. kasan_poison_object_data) pass tagged
@@ -99,6 +102,9 @@ EXPORT_SYMBOL(kasan_poison);
 #ifdef CONFIG_KASAN_GENERIC
 void kasan_poison_last_granule(const void *addr, size_t size)
 {
+   if (!kasan_arch_is_ready())
+   return;
+
if (size & KASAN_GRANULE_MASK) {
u8 *shadow = (u8 *)kasan_mem_to_shadow(addr + size);
*shadow = size & KASAN_GRANULE_MASK;
-- 
2.30.2



[PATCH v16 1/4] kasan: allow an architecture to disable inline instrumentation

2021-06-23 Thread Daniel Axtens
For annoying architectural reasons, it's very difficult to support inline
instrumentation on powerpc64.*

Add a Kconfig flag to allow an arch to disable inline. (It's a bit
annoying to be 'backwards', but I'm not aware of any way to have
an arch force a symbol to be 'n', rather than 'y'.)

We also disable stack instrumentation in this case as it does things that
are functionally equivalent to inline instrumentation, namely adding
code that touches the shadow directly without going through a C helper.

* on ppc64 atm, the shadow lives in virtual memory and isn't accessible in
real mode. However, before we turn on virtual memory, we parse the device
tree to determine which platform and MMU we're running under. That calls
generic DT code, which is instrumented. Inline instrumentation in DT would
unconditionally attempt to touch the shadow region, which we won't have
set up yet, and would crash. We can make outline mode wait for the arch to
be ready, but we can't change what the compiler inserts for inline mode.

Reviewed-by: Marco Elver 
Signed-off-by: Daniel Axtens 
---
 lib/Kconfig.kasan | 12 
 1 file changed, 12 insertions(+)

diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index cffc2ebbf185..c3b228828a80 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -12,6 +12,13 @@ config HAVE_ARCH_KASAN_HW_TAGS
 config HAVE_ARCH_KASAN_VMALLOC
bool
 
+config ARCH_DISABLE_KASAN_INLINE
+   bool
+   help
+ An architecture might not support inline instrumentation.
+ When this option is selected, inline and stack instrumentation are
+ disabled.
+
 config CC_HAS_KASAN_GENERIC
def_bool $(cc-option, -fsanitize=kernel-address)
 
@@ -130,6 +137,7 @@ config KASAN_OUTLINE
 
 config KASAN_INLINE
bool "Inline instrumentation"
+   depends on !ARCH_DISABLE_KASAN_INLINE
help
  Compiler directly inserts code checking shadow memory before
  memory accesses. This is faster than outline (in some workloads
@@ -141,6 +149,7 @@ endchoice
 config KASAN_STACK
bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && 
!COMPILE_TEST
depends on KASAN_GENERIC || KASAN_SW_TAGS
+   depends on !ARCH_DISABLE_KASAN_INLINE
default y if CC_IS_GCC
help
  The LLVM stack address sanitizer has a know problem that
@@ -154,6 +163,9 @@ config KASAN_STACK
  but clang users can still enable it for builds without
  CONFIG_COMPILE_TEST.  On gcc it is assumed to always be safe
  to use and enabled by default.
+ If the architecture disables inline instrumentation, stack
+ instrumentation is also disabled as it adds inline-style
+ instrumentation that is run unconditionally.
 
 config KASAN_SW_TAGS_IDENTIFY
bool "Enable memory corruption identification"
-- 
2.30.2



[PATCH v16 0/4] KASAN core changes for ppc64 radix KASAN

2021-06-23 Thread Daniel Axtens
Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU. I've been
trying this for a while, but we keep having collisions between the
kasan code in the mm tree and the code I want to put in to the ppc
tree.

This series just contains the kasan core changes that we need. These
can go in via the mm tree. I will then propose the powerpc changes for
a later cycle. (The most recent RFC for the powerpc changes is in the
v12 series at
https://lore.kernel.org/linux-mm/20210615014705.2234866-1-...@axtens.net/
)

v16 applies to next-20210622. There should be no noticeable changes to
other platforms.

Changes since v15: Review comments from Andrey. Thanks Andrey.

Changes since v14: Included a bunch of Reviewed-by:s, thanks
Christophe and Marco. Cleaned up the build time error #ifdefs, thanks
Christophe.

Changes since v13: move the MAX_PTRS_PER_* definitions out of kasan and
into pgtable.h. Add a build time error to hopefully prevent any
confusion about when the new hook is applicable. Thanks Marco and
Christophe.

Changes since v12: respond to Marco's review comments - clean up the
help for ARCH_DISABLE_KASAN_INLINE, and add an arch readiness check to
the new granule poisoning function. Thanks Marco.

Daniel Axtens (4):
  kasan: allow an architecture to disable inline instrumentation
  kasan: allow architectures to provide an outline readiness check
  mm: define default MAX_PTRS_PER_* in include/pgtable.h
  kasan: use MAX_PTRS_PER_* for early shadow tables

 arch/s390/include/asm/pgtable.h |  2 --
 include/asm-generic/pgtable-nop4d.h |  1 -
 include/linux/kasan.h   |  6 +++---
 include/linux/pgtable.h | 22 ++
 lib/Kconfig.kasan   | 12 
 mm/kasan/common.c   |  3 +++
 mm/kasan/generic.c  |  3 +++
 mm/kasan/init.c |  6 +++---
 mm/kasan/kasan.h|  6 ++
 mm/kasan/shadow.c   |  6 ++
 10 files changed, 58 insertions(+), 9 deletions(-)

-- 
2.30.2



Re: [PATCH 2/3] powerpc: Define swapper_pg_dir[] in C

2021-06-23 Thread Daniel Axtens
Michael Ellerman  writes:

> Daniel Axtens  writes:
>> Hi Christophe,
>>
>> This breaks booting a radix KVM guest with 4k pages for me:
>>
>> make pseries_le_defconfig
>> scripts/config -d CONFIG_PPC_64K_PAGES
>> scripts/config -e CONFIG_PPC_4K_PAGES
>> make vmlinux
>> sudo qemu-system-ppc64 -enable-kvm -M pseries -m 1G -nographic -vga none 
>> -smp 4 -cpu host -kernel vmlinux
>>
>> Boot hangs after printing 'Booting Linux via __start()' and qemu's 'info
>> registers' reports that it's stuck at the instruction fetch exception.
>>
>> My host is Power9, 64k page size radix, and
>> gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 
>> 2.34
>>
>
> ...
>>> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
>>> index 730838c7ca39..79f2d1e61abd 100644
>>> --- a/arch/powerpc/kernel/head_64.S
>>> +++ b/arch/powerpc/kernel/head_64.S
>>> @@ -997,18 +997,3 @@ start_here_common:
>>>  0: trap
>>> EMIT_BUG_ENTRY 0b, __FILE__, __LINE__, 0
>>> .previous
>>> -
>>> -/*
>>> - * We put a few things here that have to be page-aligned.
>>> - * This stuff goes at the beginning of the bss, which is page-aligned.
>>> - */
>>> -   .section ".bss"
>>> -/*
>>> - * pgd dir should be aligned to PGD_TABLE_SIZE which is 64K.
>>> - * We will need to find a better way to fix this
>>> - */
>>> -   .align  16
>>> -
>>> -   .globl  swapper_pg_dir
>>> -swapper_pg_dir:
>>> -   .space  PGD_TABLE_SIZE
>
> This is now 4K aligned whereas it used to be 64K.
>
> This fixes it and is not completely ugly?
>
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index 1707ab580ee2..298469beaa90 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -28,7 +28,13 @@
>  #include 
>  #include 
>  
> -pgd_t swapper_pg_dir[MAX_PTRS_PER_PGD] __page_aligned_bss;
> +#ifdef CONFIG_PPC64
> +#define PGD_ALIGN 0x10000
> +#else
> +#define PGD_ALIGN PAGE_SIZE
> +#endif
> +
> +pgd_t swapper_pg_dir[MAX_PTRS_PER_PGD] __section(".bss..page_aligned") 
> __aligned(PGD_ALIGN);

The fix works for me, thank you.

Kind regards,
Daniel
>  
>  static inline int is_exec_fault(void)
>  {
>
>
> cheers


Re: [PATCH 2/3] powerpc: Define swapper_pg_dir[] in C

2021-06-23 Thread Daniel Axtens
Hi Christophe,

This breaks booting a radix KVM guest with 4k pages for me:

make pseries_le_defconfig
scripts/config -d CONFIG_PPC_64K_PAGES
scripts/config -e CONFIG_PPC_4K_PAGES
make vmlinux
sudo qemu-system-ppc64 -enable-kvm -M pseries -m 1G -nographic -vga none -smp 4 
-cpu host -kernel vmlinux

Boot hangs after printing 'Booting Linux via __start()' and qemu's 'info
registers' reports that it's stuck at the instruction fetch exception.

My host is Power9, 64k page size radix, and
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34

Kind regards,
Daniel

> Don't duplicate swapper_pg_dir[] in each platform's head.S
>
> Define it in mm/pgtable.c
>
> Define MAX_PTRS_PER_PGD because on book3s/64 PTRS_PER_PGD is
> not a constant.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h |  3 +++
>  arch/powerpc/include/asm/pgtable.h   |  4 
>  arch/powerpc/kernel/asm-offsets.c|  5 -
>  arch/powerpc/kernel/head_40x.S   | 11 ---
>  arch/powerpc/kernel/head_44x.S   | 17 +
>  arch/powerpc/kernel/head_64.S| 15 ---
>  arch/powerpc/kernel/head_8xx.S   | 12 
>  arch/powerpc/kernel/head_book3s_32.S | 11 ---
>  arch/powerpc/kernel/head_fsl_booke.S | 12 
>  arch/powerpc/mm/pgtable.c|  2 ++
>  10 files changed, 10 insertions(+), 82 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index a666d561b44d..4d9941b2fe51 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -232,6 +232,9 @@ extern unsigned long __pmd_frag_size_shift;
>  #define PTRS_PER_PUD (1 << PUD_INDEX_SIZE)
>  #define PTRS_PER_PGD (1 << PGD_INDEX_SIZE)
>  
> +#define MAX_PTRS_PER_PGD (1 << (H_PGD_INDEX_SIZE > RADIX_PGD_INDEX_SIZE 
> ? \
> +H_PGD_INDEX_SIZE : RADIX_PGD_INDEX_SIZE))
> +
>  /* PMD_SHIFT determines what a second-level page table entry can map */
>  #define PMD_SHIFT(PAGE_SHIFT + PTE_INDEX_SIZE)
>  #define PMD_SIZE (1UL << PMD_SHIFT)
> diff --git a/arch/powerpc/include/asm/pgtable.h 
> b/arch/powerpc/include/asm/pgtable.h
> index c6a676714f04..b9c8641654f4 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -41,6 +41,10 @@ struct mm_struct;
>  
>  #ifndef __ASSEMBLY__
>  
> +#ifndef MAX_PTRS_PER_PGD
> +#define MAX_PTRS_PER_PGD PTRS_PER_PGD
> +#endif
> +
>  /* Keep these as a macros to avoid include dependency mess */
>  #define pte_page(x)  pfn_to_page(pte_pfn(x))
>  #define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
> diff --git a/arch/powerpc/kernel/asm-offsets.c 
> b/arch/powerpc/kernel/asm-offsets.c
> index 0480f4006e0c..f1b6ff14c8a0 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -361,11 +361,6 @@ int main(void)
>   DEFINE(BUG_ENTRY_SIZE, sizeof(struct bug_entry));
>  #endif
>  
> -#ifdef CONFIG_PPC_BOOK3S_64
> - DEFINE(PGD_TABLE_SIZE, (sizeof(pgd_t) << max(RADIX_PGD_INDEX_SIZE, 
> H_PGD_INDEX_SIZE)));
> -#else
> - DEFINE(PGD_TABLE_SIZE, PGD_TABLE_SIZE);
> -#endif
>   DEFINE(PTE_SIZE, sizeof(pte_t));
>  
>  #ifdef CONFIG_KVM
> diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
> index 92b6c7356161..7d72ee5ab387 100644
> --- a/arch/powerpc/kernel/head_40x.S
> +++ b/arch/powerpc/kernel/head_40x.S
> @@ -701,14 +701,3 @@ _GLOBAL(abort)
>  mfspr   r13,SPRN_DBCR0
>  orisr13,r13,DBCR0_RST_SYSTEM@h
>  mtspr   SPRN_DBCR0,r13
> -
> -/* We put a few things here that have to be page-aligned. This stuff
> - * goes at the beginning of the data segment, which is page-aligned.
> - */
> - .data
> - .align  12
> - .globl  sdata
> -sdata:
> - .globl  swapper_pg_dir
> -swapper_pg_dir:
> - .space  PGD_TABLE_SIZE
> diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
> index e037eb615757..ddc978a2d381 100644
> --- a/arch/powerpc/kernel/head_44x.S
> +++ b/arch/powerpc/kernel/head_44x.S
> @@ -1233,23 +1233,8 @@ head_start_common:
>   isync
>   blr
>  
> -/*
> - * We put a few things here that have to be page-aligned. This stuff
> - * goes at the beginning of the data segment, which is page-aligned.
> - */
> - .data
> - .align  PAGE_SHIFT
> - .globl  sdata
> -sdata:
> -
> -/*
> - * To support >32-bit physical addresses, we use an 8KB pgdir.
> - */
> - .globl  swapper_pg_dir
> -swapper_pg_dir:
> - .space  PGD_TABLE_SIZE
> -
>  #ifdef CONFIG_SMP
> + .data
>   .align  12
>  temp_boot_stack:
>   .space  1024
> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index 730838c7ca39..79f2d1e61abd 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/

Re: [PATCH v15 2/4] kasan: allow architectures to provide an outline readiness check

2021-06-23 Thread Daniel Axtens
>> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
>> index 10177cc26d06..0ad615f3801d 100644
>> --- a/mm/kasan/common.c
>> +++ b/mm/kasan/common.c
>> @@ -331,6 +331,10 @@ static inline bool kasan_slab_free(struct 
>> kmem_cache *cache, void *object,
>> u8 tag;
>> void *tagged_object;
>>
>> +   /* Bail if the arch isn't ready */
>
> This comment brings no value. The fact that we bail is clear from the
> following line. The comment should explain why we bail.
>
>> +   if (!kasan_arch_is_ready())
>> +   return false;

Fair enough, I've just dropped the comments as I don't think there's
really a lot of scope for the generic/core comment to explain why a
particular architecture might not be ready.

> Have you considered including these checks into the high-level
> wrappers in include/linux/kasan.h? Would that work?

I don't think those wrappers will catch the outline check functions
like __asan_load*, which also need guarding.
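
For reference, the arch-side definition that this hook enables would look
something like the below (a sketch modelled on the ppc64 series this is
for; the static key name is illustrative, not something this patch adds):

	/* arch header, e.g. asm/kasan.h (sketch) */
	#include <linux/jump_label.h>

	DECLARE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);

	static __always_inline bool kasan_arch_is_ready(void)
	{
		return static_branch_likely(&powerpc_kasan_enabled_key);
	}
	#define kasan_arch_is_ready kasan_arch_is_ready

The trailing #define is how the generic code's #ifndef detects that the
arch has supplied its own hook.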

Kind regards,
Daniel


Re: [PATCH 1/2] powerpc/prom_init: Convert prom_strcpy() into prom_strscpy_pad()

2021-06-21 Thread Daniel Axtens
Hi

> -static char __init *prom_strcpy(char *dest, const char *src)
> +static ssize_t __init prom_strscpy_pad(char *dest, const char *src, size_t n)
>  {
> - char *tmp = dest;
> + ssize_t rc;
> + size_t i;
>  
> - while ((*dest++ = *src++) != '\0')
> - /* nothing */;
> - return tmp;
> + if (n == 0 || n > INT_MAX)
> + return -E2BIG;
> +
> + // Copy up to n bytes
> + for (i = 0; i < n && src[i] != '\0'; i++)
> + dest[i] = src[i];
> +
> + rc = i;
> +
> + // If we copied all n then we have run out of space for the nul
> + if (rc == n) {
> + // Rewind by one character to ensure nul termination
> + i--;
> + rc = -E2BIG;
> + }
> +
> + for (; i < n; i++)
> + dest[i] = '\0';
> +
> + return rc;
>  }
>  

This implementation seems good to me.

I copied it into a new C file - dropping __init and adding <limits.h>
and <sys/types.h> so that INT_MAX and ssize_t resolve - and added the
following:

#include <assert.h>
#include <errno.h>	/* E2BIG */
#include <string.h>	/* memcmp */

int main()
{
	char longstr[255] = "abcdefghijklmnopqrstuvwxyz";
	char shortstr[5];

	assert(prom_strscpy_pad(longstr, "", 0) == -E2BIG);
	assert(prom_strscpy_pad(longstr, "hello", 255) == 5);
	assert(prom_strscpy_pad(shortstr, "hello", 5) == -E2BIG);
	assert(memcmp(shortstr, "hell", 5) == 0);
	assert(memcmp(longstr, "hello\0\0\0\0\0\0\0\0\0", 6) == 0);
	return 0;
}

All the assertions pass. I believe this covers all the conditions from
the strscpy_pad docstring.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

>  static int __init prom_strncmp(const char *cs, const char *ct, size_t count)
> @@ -2701,7 +2719,7 @@ static void __init flatten_device_tree(void)
>  
>   /* Add "phandle" in there, we'll need it */
>   namep = make_room(&mem_start, &mem_end, 16, 1);
> - prom_strcpy(namep, "phandle");
> + prom_strscpy_pad(namep, "phandle", sizeof("phandle"));
>   mem_start = (unsigned long)namep + prom_strlen(namep) + 1;
>  
>   /* Build string array */
> -- 
> 2.25.1


Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching

2021-06-20 Thread Daniel Axtens
Hi Chris,

> + /*
> +  * Choose a randomized, page-aligned address from the range:
> +  * [PAGE_SIZE, DEFAULT_MAP_WINDOW - PAGE_SIZE]
> +  * The lower address bound is PAGE_SIZE to avoid the zero-page.
> +  * The upper address bound is DEFAULT_MAP_WINDOW - PAGE_SIZE to stay
> +  * under DEFAULT_MAP_WINDOW with the Book3s64 Hash MMU.
> +  */
> + patching_addr = PAGE_SIZE + ((get_random_long() & PAGE_MASK)
> + % (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE));

I checked and poking_init() comes after the functions that init the RNG,
so this should be fine. The maths - while a bit fiddly to reason about -
does check out.
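
For the record, writing W for DEFAULT_MAP_WINDOW and P for PAGE_SIZE
(x, W and W - 2*P are all multiples of P):

	x = get_random_long() & PAGE_MASK;  /* page-aligned */
	r = x % (W - 2*P);                  /* page-aligned, 0 <= r <= W - 3*P */
	patching_addr = P + r;              /* so P <= patching_addr <= W - 2*P */

which sits comfortably inside the stated [P, W - P] range.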

> +
> + /*
> +  * PTE allocation uses GFP_KERNEL which means we need to pre-allocate
> +  * the PTE here. We cannot do the allocation during patching with IRQs
> +  * disabled (ie. "atomic" context).
> +  */
> + ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
> + BUG_ON(!ptep);
> + pte_unmap_unlock(ptep, ptl);
> +}
>  
>  #if IS_BUILTIN(CONFIG_LKDTM)
>  unsigned long read_cpu_patching_addr(unsigned int cpu)
>  {
> - return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
> + return patching_addr;
>  }
>  #endif
>  
> -static int text_area_cpu_up(unsigned int cpu)
> +struct patch_mapping {
> + spinlock_t *ptl; /* for protecting pte table */
> + pte_t *ptep;
> + struct temp_mm temp_mm;
> +};
> +
> +#ifdef CONFIG_PPC_BOOK3S_64
> +
> +static inline int hash_prefault_mapping(pgprot_t pgprot)
>  {
> - struct vm_struct *area;
> + int err;
>  
> - area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> - if (!area) {
> - WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> - cpu);
> - return -1;
> - }
> - this_cpu_write(text_poke_area, area);
> + if (radix_enabled())
> + return 0;
>  
> - return 0;
> -}
> + err = slb_allocate_user(patching_mm, patching_addr);
> + if (err)
> + pr_warn("map patch: failed to allocate slb entry\n");
>  

Here if slb_allocate_user() fails, you'll print a warning and then fall
through to the rest of the function. You do return err, but there's a
later call to hash_page_mm() that also sets err. Can slb_allocate_user()
fail while hash_page_mm() succeeds, and would that be a problem?
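
i.e. the flow I'm slightly worried about, condensed from the hunk below
(arguments elided):

	err = slb_allocate_user(patching_mm, patching_addr);
	if (err)
		pr_warn("map patch: failed to allocate slb entry\n");
					/* err is set, but we carry on... */
	err = hash_page_mm(...);	/* ...and err is overwritten here */
	if (err)
		pr_warn("map patch: failed to insert hashed page\n");
	isync();
	return err;			/* an SLB-only failure is lost */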

> -static int text_area_cpu_down(unsigned int cpu)
> -{
> - free_vm_area(this_cpu_read(text_poke_area));
> - return 0;
> + err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> +HPTE_USE_KERNEL_KEY);
> + if (err)
> + pr_warn("map patch: failed to insert hashed page\n");
> +
> + /* See comment in switch_slb() in mm/book3s64/slb.c */
> + isync();
> +

The comment reads:

/*
 * Synchronize slbmte preloads with possible subsequent user memory
 * address accesses by the kernel (user mode won't happen until
 * rfid, which is safe).
 */
 isync();

I have to say having read the description of isync I'm not 100% sure why
that's enough (don't we also need stores to complete?) but I'm happy to
take commit 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache")
on trust here!

I think it does make sense for you to have that barrier here: you are
potentially about to start poking at the memory mapped through that SLB
entry so you should make sure you're fully synchronised.

> + return err;
>  }
>  

> + init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> + use_temporary_mm(&patch_mapping->temp_mm);
>  
> - pmdp = pmd_offset(pudp, addr);
> - if (unlikely(!pmdp))
> - return -EINVAL;
> + /*
> +  * On Book3s64 with the Hash MMU we have to manually insert the SLB
> +  * entry and HPTE to prevent taking faults on the patching_addr later.
> +  */
> + return(hash_prefault_mapping(pgprot));

hmm, `return hash_prefault_mapping(pgprot);` or
`return (hash_prefault_mapping(pgprot));` maybe?

Kind regards,
Daniel


Re: [RESEND PATCH v4 05/11] powerpc/64s: Add ability to skip SLB preload

2021-06-20 Thread Daniel Axtens
"Christopher M. Riedl"  writes:

> Switching to a different mm with Hash translation causes SLB entries to
> be preloaded from the current thread_info. This reduces SLB faults, for
> example when threads share a common mm but operate on different address
> ranges.
>
> Preloading entries from the thread_info struct may not always be
> appropriate - such as when switching to a temporary mm. Introduce a new
> boolean in mm_context_t to skip the SLB preload entirely. Also move the
> SLB preload code into a separate function since switch_slb() is already
> quite long. The default behavior (preloading SLB entries from the
> current thread_info struct) remains unchanged.
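
Just to check my understanding of the intended use (a hypothetical
caller, not something in this patch): a temporary patching mm would opt
out straight after creation, something like

	struct mm_struct *patching_mm = copy_init_mm();

	if (!patching_mm)
		return -ENOMEM;
	skip_slb_preload_mm(patching_mm); /* never preload from thread_info */

so that every later switch_slb() into that mm skips the preload loop.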
>
> Signed-off-by: Christopher M. Riedl 
>
> ---
>
> v4:  * New to series.
> ---
>  arch/powerpc/include/asm/book3s/64/mmu.h |  3 ++
>  arch/powerpc/include/asm/mmu_context.h   | 13 ++
>  arch/powerpc/mm/book3s64/mmu_context.c   |  2 +
>  arch/powerpc/mm/book3s64/slb.c   | 56 ++--
>  4 files changed, 50 insertions(+), 24 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
> b/arch/powerpc/include/asm/book3s/64/mmu.h
> index eace8c3f7b0a1..b23a9dcdee5af 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -130,6 +130,9 @@ typedef struct {
>   u32 pkey_allocation_map;
>   s16 execute_only_pkey; /* key holding execute-only protection */
>  #endif
> +
> + /* Do not preload SLB entries from thread_info during switch_slb() */
> + bool skip_slb_preload;
>  } mm_context_t;
>  
>  static inline u16 mm_ctx_user_psize(mm_context_t *ctx)
> diff --git a/arch/powerpc/include/asm/mmu_context.h 
> b/arch/powerpc/include/asm/mmu_context.h
> index 4bc45d3ed8b0e..264787e90b1a1 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -298,6 +298,19 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
>   return 0;
>  }
>  
> +#ifdef CONFIG_PPC_BOOK3S_64
> +
> +static inline void skip_slb_preload_mm(struct mm_struct *mm)
> +{
> + mm->context.skip_slb_preload = true;
> +}
> +
> +#else
> +
> +static inline void skip_slb_preload_mm(struct mm_struct *mm) {}
> +
> +#endif /* CONFIG_PPC_BOOK3S_64 */
> +
>  #include 
>  
>  #endif /* __KERNEL__ */
> diff --git a/arch/powerpc/mm/book3s64/mmu_context.c 
> b/arch/powerpc/mm/book3s64/mmu_context.c
> index c10fc8a72fb37..3479910264c59 100644
> --- a/arch/powerpc/mm/book3s64/mmu_context.c
> +++ b/arch/powerpc/mm/book3s64/mmu_context.c
> @@ -202,6 +202,8 @@ int init_new_context(struct task_struct *tsk, struct 
> mm_struct *mm)
>   atomic_set(&mm->context.active_cpus, 0);
>   atomic_set(&mm->context.copros, 0);
>  
> + mm->context.skip_slb_preload = false;
> +
>   return 0;
>  }
>  
> diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
> index c91bd85eb90e3..da0836cb855af 100644
> --- a/arch/powerpc/mm/book3s64/slb.c
> +++ b/arch/powerpc/mm/book3s64/slb.c
> @@ -441,10 +441,39 @@ static void slb_cache_slbie_user(unsigned int index)
>   asm volatile("slbie %0" : : "r" (slbie_data));
>  }
>  
> +static void preload_slb_entries(struct task_struct *tsk, struct mm_struct 
> *mm)
Should this be explicitly inline or even __always_inline? I'm thinking
switch_slb is probably a fairly hot path on hash?

> +{
> + struct thread_info *ti = task_thread_info(tsk);
> + unsigned char i;
> +
> + /*
> +  * We gradually age out SLBs after a number of context switches to
> +  * reduce reload overhead of unused entries (like we do with FP/VEC
> +  * reload). Each time we wrap 256 switches, take an entry out of the
> +  * SLB preload cache.
> +  */
> + tsk->thread.load_slb++;
> + if (!tsk->thread.load_slb) {
> + unsigned long pc = KSTK_EIP(tsk);
> +
> + preload_age(ti);
> + preload_add(ti, pc);
> + }
> +
> + for (i = 0; i < ti->slb_preload_nr; i++) {
> + unsigned char idx;
> + unsigned long ea;
> +
> + idx = (ti->slb_preload_tail + i) % SLB_PRELOAD_NR;
> + ea = (unsigned long)ti->slb_preload_esid[idx] << SID_SHIFT;
> +
> + slb_allocate_user(mm, ea);
> + }
> +}
> +
>  /* Flush all user entries from the segment table of the current processor. */
>  void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
>  {
> - struct thread_info *ti = task_thread_info(tsk);
>   unsigned char i;
>  
>   /*
> @@ -502,29 +531,8 @@ void switch_slb(struct task_struct *tsk, struct 
> mm_struct *mm)
>  
>   copy_mm_to_paca(mm);
>  
> - /*
> -  * We gradually age out SLBs after a number of context switches to
> -  * reduce reload overhead of unused entries (like we do with FP/VEC
> -  * reload). Each time we wrap 256 switches, take an entry out of the
> -  * SLB preload cache.
> -  */
> - tsk->thread.load_slb++;
> - if (!tsk->thread.load_slb) {
> -   

Re: [PATCH 4/4] powerpc: Enable KFENCE on BOOK3S/64

2021-06-18 Thread Daniel Axtens
Daniel Axtens  writes:

>> +#ifdef CONFIG_PPC64
>> +static inline bool kfence_protect_page(unsigned long addr, bool protect)
>> +{
>> +struct page *page;
>> +
>> +page = virt_to_page(addr);
>> +if (protect)
>> +__kernel_map_pages(page, 1, 0);
>> +else
>> +__kernel_map_pages(page, 1, 1);
>
> I lose track of the type conversions and code conventions involved, but
> can we do something like __kernel_map_pages(page, 1, !!protect)?

Ah, I missed that the if changed the truth/falsity. !protect, not
!!protect :P
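
i.e. the call would collapse to (sketch):

	__kernel_map_pages(page, 1, !protect);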
>
> Apart from that, this seems good.
>
> Kind regards,
> Daniel


Re: [PATCH 4/4] powerpc: Enable KFENCE on BOOK3S/64

2021-06-18 Thread Daniel Axtens


> +#ifdef CONFIG_PPC64
> +static inline bool kfence_protect_page(unsigned long addr, bool protect)
> +{
> + struct page *page;
> +
> + page = virt_to_page(addr);
> + if (protect)
> + __kernel_map_pages(page, 1, 0);
> + else
> + __kernel_map_pages(page, 1, 1);

I lose track of the type conversions and code conventions involved, but
can we do something like __kernel_map_pages(page, 1, !!protect)?

Apart from that, this seems good.

Kind regards,
Daniel


Re: [PATCH 2/4] powerpc/64s: Remove unneeded #ifdef CONFIG_DEBUG_PAGEALLOC in hash_utils

2021-06-18 Thread Daniel Axtens
Hi Jordan and Christophe,

> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -126,11 +126,8 @@ EXPORT_SYMBOL_GPL(mmu_slb_size);
>  #ifdef CONFIG_PPC_64K_PAGES
>  int mmu_ci_restrictions;
>  #endif
> -#ifdef CONFIG_DEBUG_PAGEALLOC
>  static u8 *linear_map_hash_slots;
>  static unsigned long linear_map_hash_count;
> -static DEFINE_SPINLOCK(linear_map_hash_lock);
> -#endif /* CONFIG_DEBUG_PAGEALLOC */
>  struct mmu_hash_ops mmu_hash_ops;
>  EXPORT_SYMBOL(mmu_hash_ops);
>  

> @@ -1944,6 +1937,8 @@ long hpte_insert_repeating(unsigned long hash, unsigned 
> long vpn,
>  }
>  
>  #ifdef CONFIG_DEBUG_PAGEALLOC
> +static DEFINE_SPINLOCK(linear_map_hash_lock);
> +

I had some trouble figuring out why the spinlock has to be in the
ifdef. A bit of investigation suggests that it's only used in functions
that are only defined under CONFIG_DEBUG_PAGEALLOC - unlike the other
variables. So that makes sense.

While I was poking around, I noticed that linear_map_hash_slots is
manipulated under linear_map_hash_lock in kernel_(un)map_linear_page but
is manipulated outside the lock in htab_bolt_mapping(). Is that OK? (I
don't know when htab_bolt_mapping is called, it's possible it's only
called at times where nothing else could be happening to that array.)
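
To spell out the two patterns I'm comparing (condensed from
hash_utils.c from memory, so the details may be slightly off):

	/* kernel_(un)map_linear_page() - slot updated under the lock */
	spin_lock(&linear_map_hash_lock);
	linear_map_hash_slots[lmi] = ret | 0x80;
	spin_unlock(&linear_map_hash_lock);

	/* htab_bolt_mapping() - same array, no lock held */
	linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;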

Kind regards,
Daniel

>  static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
>  {
>   unsigned long hash;
> -- 
> 2.25.1


Re: [PATCH 1/4] powerpc/64s: Add DEBUG_PAGEALLOC for radix

2021-06-18 Thread Daniel Axtens
Jordan Niethe  writes:

> There is support for DEBUG_PAGEALLOC on hash but not on radix.
> Add support on radix.

Somewhat off-topic but I wonder at what point we can drop the weird !PPC
condition in mm/Kconfig.debug:

config DEBUG_PAGEALLOC
bool "Debug page memory allocations"
depends on DEBUG_KERNEL
depends on !HIBERNATION || ARCH_SUPPORTS_DEBUG_PAGEALLOC && !PPC && 
!SPARC

I can't figure out from git history why it exists or if it still serves
any function. Given that HIBERNATION isn't much use on Book3S systems it
probably never matters, it just bugs me a bit.

Again, nothing that has to block this series, just maybe something to
follow up at some vague and undefined point in the future!

> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index a666d561b44d..b89482aed82a 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -812,6 +822,15 @@ static inline bool check_pte_access(unsigned long 
> access, unsigned long ptev)
>   * Generic functions with hash/radix callbacks
>   */
>  
> +#ifdef CONFIG_DEBUG_PAGEALLOC
> +static inline void __kernel_map_pages(struct page *page, int numpages, int 
> enable)
> +{
> + if (radix_enabled())
> + radix__kernel_map_pages(page, numpages, enable);
> + hash__kernel_map_pages(page, numpages, enable);


Does this require an else? On radix we will call both radix__ and
hash__kernel_map_pages.
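
For clarity, the guarded form I would have expected is (sketch):

	static inline void __kernel_map_pages(struct page *page,
					      int numpages, int enable)
	{
		if (radix_enabled())
			radix__kernel_map_pages(page, numpages, enable);
		else
			hash__kernel_map_pages(page, numpages, enable);
	}

although, as I work through below, the fall-through may turn out to be
harmless in practice.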

I notice we call both hash__ and radix__ in map_kernel_page under radix,
so maybe this makes sense?

I don't fully understand the mechanism by which memory removal works: it
looks like on radix you mark the page as 'absent', which I think is
enough? Then you fall through to the hash code here:

for (i = 0; i < numpages; i++, page++) {
vaddr = (unsigned long)page_address(page);
lmi = __pa(vaddr) >> PAGE_SHIFT;
if (lmi >= linear_map_hash_count)
continue;

I think linear_map_hash_count will be zero unless it gets inited to a
non-zero value in htab_initialize(). I am fairly sure htab_initialize()
doesn't get called for a radix MMU. In that case you'll just `continue;`
out of every iteration of the loop, which would explain why nothing
weird would happen on radix.

Have I missed something here?

The rest of the patch looks good to me.

Kind regards,
Daniel


[PATCH v15 4/4] kasan: use MAX_PTRS_PER_* for early shadow tables

2021-06-17 Thread Daniel Axtens
powerpc has a variable number of PTRS_PER_*, set at runtime based
on the MMU that the kernel is booted under.

This means the PTRS_PER_* are no longer constants, and therefore
breaks the build. Switch to using MAX_PTRS_PER_*, which are constant.

Suggested-by: Christophe Leroy 
Suggested-by: Balbir Singh 
Reviewed-by: Christophe Leroy 
Reviewed-by: Balbir Singh 
Reviewed-by: Marco Elver 
Signed-off-by: Daniel Axtens 
---
 include/linux/kasan.h | 6 +++---
 mm/kasan/init.c   | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 768d7d342757..5310e217bd74 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -41,9 +41,9 @@ struct kunit_kasan_expectation {
 #endif
 
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 
 int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index 348f31d15a97..cc64ed6858c6 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -41,7 +41,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
 static inline bool kasan_pud_table(p4d_t p4d)
 {
return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -53,7 +53,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
 static inline bool kasan_pmd_table(pud_t pud)
 {
return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -64,7 +64,7 @@ static inline bool kasan_pmd_table(pud_t pud)
return false;
 }
 #endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS]
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]
__page_aligned_bss;
 
 static inline bool kasan_pte_table(pmd_t pmd)
-- 
2.30.2



[PATCH v15 3/4] mm: define default MAX_PTRS_PER_* in include/pgtable.h

2021-06-17 Thread Daniel Axtens
Commit c65e774fb3f6 ("x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable")
made PTRS_PER_P4D variable on x86 and introduced MAX_PTRS_PER_P4D as a
constant for cases which need a compile-time constant (e.g. fixed-size
arrays).

powerpc likewise has boot-time selectable MMU features which can cause
other mm "constants" to vary. For KASAN, we have some static
PTE/PMD/PUD/P4D arrays so we need compile-time maximums for all these
constants. Extend the MAX_PTRS_PER_ idiom, and place default definitions
in include/pgtable.h. These define MAX_PTRS_PER_x to be PTRS_PER_x unless
an architecture has defined MAX_PTRS_PER_x in its arch headers.

Clean up pgtable-nop4d.h and s390's MAX_PTRS_PER_P4D definitions while
we're at it: both can just pick up the default now.
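
For illustration, an arch override would look something like this (the
index-size macro names are illustrative, modelled on the powerpc PGD
case quoted elsewhere in this thread; an arch that defines nothing just
picks up the PTRS_PER_x default):

	/* arch pgtable header: compile-time worst case across MMUs */
	#define MAX_PTRS_PER_PUD \
		(1 << (H_PUD_INDEX_SIZE > RADIX_PUD_INDEX_SIZE ? \
		       H_PUD_INDEX_SIZE : RADIX_PUD_INDEX_SIZE))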

Reviewed-by: Christophe Leroy 
Reviewed-by: Marco Elver 
Signed-off-by: Daniel Axtens 

---

s390 was compile tested only.
---
 arch/s390/include/asm/pgtable.h |  2 --
 include/asm-generic/pgtable-nop4d.h |  1 -
 include/linux/pgtable.h | 22 ++
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 7c66ae5d7e32..cf05954ce013 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -342,8 +342,6 @@ static inline int is_module_addr(void *addr)
 #define PTRS_PER_P4D   _CRST_ENTRIES
 #define PTRS_PER_PGD   _CRST_ENTRIES
 
-#define MAX_PTRS_PER_P4D   PTRS_PER_P4D
-
 /*
  * Segment table and region3 table entry encoding
  * (R = read-only, I = invalid, y = young bit):
diff --git a/include/asm-generic/pgtable-nop4d.h 
b/include/asm-generic/pgtable-nop4d.h
index ce2cbb3c380f..2f6b1befb129 100644
--- a/include/asm-generic/pgtable-nop4d.h
+++ b/include/asm-generic/pgtable-nop4d.h
@@ -9,7 +9,6 @@
 typedef struct { pgd_t pgd; } p4d_t;
 
 #define P4D_SHIFT  PGDIR_SHIFT
-#define MAX_PTRS_PER_P4D   1
 #define PTRS_PER_P4D   1
 #define P4D_SIZE   (1UL << P4D_SHIFT)
 #define P4D_MASK   (~(P4D_SIZE-1))
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 9e6f71265f72..69700e3e615f 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1625,4 +1625,26 @@ typedef unsigned int pgtbl_mod_mask;
 #define pte_leaf_size(x) PAGE_SIZE
 #endif
 
+/*
+ * Some architectures have MMUs that are configurable or selectable at boot
+ * time. These lead to variable PTRS_PER_x. For statically allocated arrays it
+ * helps to have a static maximum value.
+ */
+
+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif
+
+#ifndef MAX_PTRS_PER_P4D
+#define MAX_PTRS_PER_P4D PTRS_PER_P4D
+#endif
+
 #endif /* _LINUX_PGTABLE_H */
-- 
2.30.2



[PATCH v15 2/4] kasan: allow architectures to provide an outline readiness check

2021-06-17 Thread Daniel Axtens
Allow architectures to define a kasan_arch_is_ready() hook that bails
out of any function that's about to touch the shadow unless the arch
says that it is ready for the memory to be accessed. This is fairly
non-invasive and should have a negligible performance penalty.

This will only work in outline mode, so an arch must specify
ARCH_DISABLE_KASAN_INLINE if it requires this.

Cc: Balbir Singh 
Cc: Aneesh Kumar K.V 
Suggested-by: Christophe Leroy 
Reviewed-by: Marco Elver 
Signed-off-by: Daniel Axtens 

--

Both previous RFCs for ppc64 - by 2 different people - have
needed this trick! See:
 - https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
 - https://patchwork.ozlabs.org/patch/795211/  # ppc radix series

Build tested on arm64 with SW_TAGS and x86 with INLINE: the error fires
if I add a kasan_arch_is_ready define.
---
 mm/kasan/common.c  | 4 
 mm/kasan/generic.c | 3 +++
 mm/kasan/kasan.h   | 6 ++
 mm/kasan/shadow.c  | 8 
 4 files changed, 21 insertions(+)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 10177cc26d06..0ad615f3801d 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -331,6 +331,10 @@ static inline bool kasan_slab_free(struct kmem_cache 
*cache, void *object,
u8 tag;
void *tagged_object;
 
+   /* Bail if the arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return false;
+
tag = get_tag(object);
tagged_object = object;
object = kasan_reset_tag(object);
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 53cbf28859b5..c3f5ba7a294a 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -163,6 +163,9 @@ static __always_inline bool check_region_inline(unsigned 
long addr,
size_t size, bool write,
unsigned long ret_ip)
 {
+   if (!kasan_arch_is_ready())
+   return true;
+
if (unlikely(size == 0))
return true;
 
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 8f450bc28045..4dbc8def64f4 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -449,6 +449,12 @@ static inline void kasan_poison_last_granule(const void 
*address, size_t size) {
 
 #endif /* CONFIG_KASAN_GENERIC */
 
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void)   { return true; }
+#elif !defined(CONFIG_KASAN_GENERIC) || !defined(CONFIG_KASAN_OUTLINE)
+#error kasan_arch_is_ready only works in KASAN generic outline mode!
+#endif
+
 /*
  * Exported functions for interfaces called from assembly or from generated
  * code. Declarations here to avoid warning about missing declarations.
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 082ee5b6d9a1..3c7f7efe6f68 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -73,6 +73,10 @@ void kasan_poison(const void *addr, size_t size, u8 value, 
bool init)
 {
void *shadow_start, *shadow_end;
 
+   /* Don't touch the shadow memory if arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return;
+
/*
 * Perform shadow offset calculation based on untagged address, as
 * some of the callers (e.g. kasan_poison_object_data) pass tagged
@@ -99,6 +103,10 @@ EXPORT_SYMBOL(kasan_poison);
 #ifdef CONFIG_KASAN_GENERIC
 void kasan_poison_last_granule(const void *addr, size_t size)
 {
+   /* Don't touch the shadow memory if arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return;
+
if (size & KASAN_GRANULE_MASK) {
u8 *shadow = (u8 *)kasan_mem_to_shadow(addr + size);
*shadow = size & KASAN_GRANULE_MASK;
-- 
2.30.2



[PATCH v15 1/4] kasan: allow an architecture to disable inline instrumentation

2021-06-17 Thread Daniel Axtens
For annoying architectural reasons, it's very difficult to support inline
instrumentation on powerpc64.*

Add a Kconfig flag to allow an arch to disable inline. (It's a bit
annoying to be 'backwards', but I'm not aware of any way to have
an arch force a symbol to be 'n', rather than 'y'.)

We also disable stack instrumentation in this case as it does things that
are functionally equivalent to inline instrumentation, namely adding
code that touches the shadow directly without going through a C helper.

* on ppc64 atm, the shadow lives in virtual memory and isn't accessible in
real mode. However, before we turn on virtual memory, we parse the device
tree to determine which platform and MMU we're running under. That calls
generic DT code, which is instrumented. Inline instrumentation in DT would
unconditionally attempt to touch the shadow region, which we won't have
set up yet, and would crash. We can make outline mode wait for the arch to
be ready, but we can't change what the compiler inserts for inline mode.

Reviewed-by: Marco Elver 
Signed-off-by: Daniel Axtens 
---
 lib/Kconfig.kasan | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index cffc2ebbf185..cb5e02d09e11 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -12,6 +12,15 @@ config HAVE_ARCH_KASAN_HW_TAGS
 config HAVE_ARCH_KASAN_VMALLOC
bool
 
+config ARCH_DISABLE_KASAN_INLINE
+   bool
+   help
+ Sometimes an architecture might not be able to support inline
+ instrumentation but might be able to support outline instrumentation.
+ This option allows an architecture to prevent inline and stack
+ instrumentation from being enabled.
+
+
 config CC_HAS_KASAN_GENERIC
def_bool $(cc-option, -fsanitize=kernel-address)
 
@@ -130,6 +139,7 @@ config KASAN_OUTLINE
 
 config KASAN_INLINE
bool "Inline instrumentation"
+   depends on !ARCH_DISABLE_KASAN_INLINE
help
  Compiler directly inserts code checking shadow memory before
  memory accesses. This is faster than outline (in some workloads
@@ -141,6 +151,7 @@ endchoice
 config KASAN_STACK
bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && 
!COMPILE_TEST
depends on KASAN_GENERIC || KASAN_SW_TAGS
+   depends on !ARCH_DISABLE_KASAN_INLINE
default y if CC_IS_GCC
help
  The LLVM stack address sanitizer has a known problem that
@@ -154,6 +165,9 @@ config KASAN_STACK
  but clang users can still enable it for builds without
  CONFIG_COMPILE_TEST.  On gcc it is assumed to always be safe
  to use and enabled by default.
+ If the architecture disables inline instrumentation, this is
+ also disabled as it adds inline-style instrumentation that
+ is run unconditionally.
 
 config KASAN_SW_TAGS_IDENTIFY
bool "Enable memory corruption identification"
-- 
2.30.2



[PATCH v15 0/4] KASAN core changes for ppc64 radix KASAN

2021-06-17 Thread Daniel Axtens
Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU. I've been
trying this for a while, but we keep having collisions between the
kasan code in the mm tree and the code I want to put in to the ppc
tree.

This series just contains the kasan core changes that we need. These
can go in via the mm tree. I will then propose the powerpc changes for
a later cycle. (The most recent RFC for the powerpc changes is in the
v12 series at
https://lore.kernel.org/linux-mm/20210615014705.2234866-1-...@axtens.net/
)

v15 applies to next-20210611. There should be no noticeable changes to
other platforms.

Changes since v14: Included a bunch of Reviewed-by:s, thanks
Christophe and Marco. Cleaned up the build time error #ifdefs, thanks
Christophe.

Changes since v13: move the MAX_PTRS_PER_* definitions out of kasan and
into pgtable.h. Add a build time error to hopefully prevent any
confusion about when the new hook is applicable. Thanks Marco and
Christophe.

Changes since v12: respond to Marco's review comments - clean up the
help for ARCH_DISABLE_KASAN_INLINE, and add an arch readiness check to
the new granule poisoning function. Thanks Marco.

Daniel Axtens (4):
  kasan: allow an architecture to disable inline instrumentation
  kasan: allow architectures to provide an outline readiness check
  mm: define default MAX_PTRS_PER_* in include/pgtable.h
  kasan: use MAX_PTRS_PER_* for early shadow tables

 arch/s390/include/asm/pgtable.h |  2 --
 include/asm-generic/pgtable-nop4d.h |  1 -
 include/linux/kasan.h   |  6 +++---
 include/linux/pgtable.h | 22 ++
 lib/Kconfig.kasan   | 14 ++
 mm/kasan/common.c   |  4 
 mm/kasan/generic.c  |  3 +++
 mm/kasan/init.c |  6 +++---
 mm/kasan/kasan.h|  6 ++
 mm/kasan/shadow.c   |  8 
 10 files changed, 63 insertions(+), 9 deletions(-)

-- 
2.30.2



[PATCH v14 4/4] kasan: use MAX_PTRS_PER_* for early shadow tables

2021-06-16 Thread Daniel Axtens
powerpc has a variable number of PTRS_PER_*, set at runtime based
on the MMU that the kernel is booted under.

This means the PTRS_PER_* are no longer constants, and therefore
breaks the build. Switch to using MAX_PTRS_PER_*, which are constant.

Suggested-by: Christophe Leroy 
Suggested-by: Balbir Singh 
Reviewed-by: Christophe Leroy 
Reviewed-by: Balbir Singh 
Signed-off-by: Daniel Axtens 
---
 include/linux/kasan.h | 6 +++---
 mm/kasan/init.c   | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 768d7d342757..5310e217bd74 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -41,9 +41,9 @@ struct kunit_kasan_expectation {
 #endif
 
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 
 int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index 348f31d15a97..cc64ed6858c6 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -41,7 +41,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
 static inline bool kasan_pud_table(p4d_t p4d)
 {
return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -53,7 +53,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
 static inline bool kasan_pmd_table(pud_t pud)
 {
return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -64,7 +64,7 @@ static inline bool kasan_pmd_table(pud_t pud)
return false;
 }
 #endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS]
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]
__page_aligned_bss;
 
 static inline bool kasan_pte_table(pmd_t pmd)
-- 
2.30.2



[PATCH v14 3/4] mm: define default MAX_PTRS_PER_* in include/pgtable.h

2021-06-16 Thread Daniel Axtens
Commit c65e774fb3f6 ("x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable")
made PTRS_PER_P4D variable on x86 and introduced MAX_PTRS_PER_P4D as a
constant for cases which need a compile-time constant (e.g. fixed-size
arrays).

powerpc likewise has boot-time selectable MMU features which can cause
other mm "constants" to vary. For KASAN, we have some static
PTE/PMD/PUD/P4D arrays so we need compile-time maximums for all these
constants. Extend the MAX_PTRS_PER_ idiom, and place default definitions
in include/pgtable.h. These define MAX_PTRS_PER_x to be PTRS_PER_x unless
an architecture has defined MAX_PTRS_PER_x in its arch headers.

Clean up pgtable-nop4d.h and s390's MAX_PTRS_PER_P4D definitions while
we're at it: both can just pick up the default now.

Signed-off-by: Daniel Axtens 

---

s390 was compile tested only.
---
 arch/s390/include/asm/pgtable.h |  2 --
 include/asm-generic/pgtable-nop4d.h |  1 -
 include/linux/pgtable.h | 22 ++
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 7c66ae5d7e32..cf05954ce013 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -342,8 +342,6 @@ static inline int is_module_addr(void *addr)
 #define PTRS_PER_P4D   _CRST_ENTRIES
 #define PTRS_PER_PGD   _CRST_ENTRIES
 
-#define MAX_PTRS_PER_P4D   PTRS_PER_P4D
-
 /*
  * Segment table and region3 table entry encoding
  * (R = read-only, I = invalid, y = young bit):
diff --git a/include/asm-generic/pgtable-nop4d.h 
b/include/asm-generic/pgtable-nop4d.h
index ce2cbb3c380f..2f6b1befb129 100644
--- a/include/asm-generic/pgtable-nop4d.h
+++ b/include/asm-generic/pgtable-nop4d.h
@@ -9,7 +9,6 @@
 typedef struct { pgd_t pgd; } p4d_t;
 
 #define P4D_SHIFT  PGDIR_SHIFT
-#define MAX_PTRS_PER_P4D   1
 #define PTRS_PER_P4D   1
 #define P4D_SIZE   (1UL << P4D_SHIFT)
 #define P4D_MASK   (~(P4D_SIZE-1))
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 9e6f71265f72..69700e3e615f 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1625,4 +1625,26 @@ typedef unsigned int pgtbl_mod_mask;
 #define pte_leaf_size(x) PAGE_SIZE
 #endif
 
+/*
+ * Some architectures have MMUs that are configurable or selectable at boot
+ * time. These lead to variable PTRS_PER_x. For statically allocated arrays it
+ * helps to have a static maximum value.
+ */
+
+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif
+
+#ifndef MAX_PTRS_PER_P4D
+#define MAX_PTRS_PER_P4D PTRS_PER_P4D
+#endif
+
 #endif /* _LINUX_PGTABLE_H */
-- 
2.30.2



[PATCH v14 2/4] kasan: allow architectures to provide an outline readiness check

2021-06-16 Thread Daniel Axtens
Allow architectures to define a kasan_arch_is_ready() hook that bails
out of any function that's about to touch the shadow unless the arch
says that it is ready for the memory to be accessed. This is fairly
non-invasive and should have a negligible performance penalty.

This will only work in outline mode, so an arch must specify
ARCH_DISABLE_KASAN_INLINE if it requires this.

Cc: Balbir Singh 
Cc: Aneesh Kumar K.V 
Suggested-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 

--

Both previous RFCs for ppc64 - by 2 different people - have
needed this trick! See:
 - https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
 - https://patchwork.ozlabs.org/patch/795211/  # ppc radix series

I haven't been able to exercise the arch hook error for !GENERIC as I
don't have a particularly modern aarch64 toolchain or a lot of experience
cross-compiling with clang. But it does fire for GENERIC + INLINE on x86.
---
 mm/kasan/common.c  | 4 
 mm/kasan/generic.c | 3 +++
 mm/kasan/kasan.h   | 8 
 mm/kasan/shadow.c  | 8 
 4 files changed, 23 insertions(+)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 10177cc26d06..0ad615f3801d 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -331,6 +331,10 @@ static inline bool kasan_slab_free(struct kmem_cache 
*cache, void *object,
u8 tag;
void *tagged_object;
 
+   /* Bail if the arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return false;
+
tag = get_tag(object);
tagged_object = object;
object = kasan_reset_tag(object);
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 53cbf28859b5..c3f5ba7a294a 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -163,6 +163,9 @@ static __always_inline bool check_region_inline(unsigned 
long addr,
size_t size, bool write,
unsigned long ret_ip)
 {
+   if (!kasan_arch_is_ready())
+   return true;
+
if (unlikely(size == 0))
return true;
 
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 8f450bc28045..b18abaf8c78e 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -449,6 +449,14 @@ static inline void kasan_poison_last_granule(const void 
*address, size_t size) {
 
 #endif /* CONFIG_KASAN_GENERIC */
 
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void)   { return true; }
+#else
+#if !defined(CONFIG_KASAN_GENERIC) || !defined(CONFIG_KASAN_OUTLINE)
+#error kasan_arch_is_ready only works in KASAN generic outline mode!
+#endif
+#endif
+
 /*
  * Exported functions for interfaces called from assembly or from generated
  * code. Declarations here to avoid warning about missing declarations.
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 082ee5b6d9a1..3c7f7efe6f68 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -73,6 +73,10 @@ void kasan_poison(const void *addr, size_t size, u8 value, 
bool init)
 {
void *shadow_start, *shadow_end;
 
+   /* Don't touch the shadow memory if arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return;
+
/*
 * Perform shadow offset calculation based on untagged address, as
 * some of the callers (e.g. kasan_poison_object_data) pass tagged
@@ -99,6 +103,10 @@ EXPORT_SYMBOL(kasan_poison);
 #ifdef CONFIG_KASAN_GENERIC
 void kasan_poison_last_granule(const void *addr, size_t size)
 {
+   /* Don't touch the shadow memory if arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return;
+
if (size & KASAN_GRANULE_MASK) {
u8 *shadow = (u8 *)kasan_mem_to_shadow(addr + size);
*shadow = size & KASAN_GRANULE_MASK;
-- 
2.30.2



[PATCH v14 1/4] kasan: allow an architecture to disable inline instrumentation

2021-06-16 Thread Daniel Axtens
For annoying architectural reasons, it's very difficult to support inline
instrumentation on powerpc64.*

Add a Kconfig flag to allow an arch to disable inline. (It's a bit
annoying to be 'backwards', but I'm not aware of any way to have
an arch force a symbol to be 'n', rather than 'y'.)

We also disable stack instrumentation in this case as it does things that
are functionally equivalent to inline instrumentation, namely adding
code that touches the shadow directly without going through a C helper.

* on ppc64 atm, the shadow lives in virtual memory and isn't accessible in
real mode. However, before we turn on virtual memory, we parse the device
tree to determine which platform and MMU we're running under. That calls
generic DT code, which is instrumented. Inline instrumentation in DT would
unconditionally attempt to touch the shadow region, which we won't have
set up yet, and would crash. We can make outline mode wait for the arch to
be ready, but we can't change what the compiler inserts for inline mode.

Signed-off-by: Daniel Axtens 
---
 lib/Kconfig.kasan | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index cffc2ebbf185..cb5e02d09e11 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -12,6 +12,15 @@ config HAVE_ARCH_KASAN_HW_TAGS
 config HAVE_ARCH_KASAN_VMALLOC
bool
 
+config ARCH_DISABLE_KASAN_INLINE
+   bool
+   help
+ Sometimes an architecture might not be able to support inline
+ instrumentation but might be able to support outline instrumentation.
+ This option allows an architecture to prevent inline and stack
+ instrumentation from being enabled.
+
+
 config CC_HAS_KASAN_GENERIC
def_bool $(cc-option, -fsanitize=kernel-address)
 
@@ -130,6 +139,7 @@ config KASAN_OUTLINE
 
 config KASAN_INLINE
bool "Inline instrumentation"
+   depends on !ARCH_DISABLE_KASAN_INLINE
help
  Compiler directly inserts code checking shadow memory before
  memory accesses. This is faster than outline (in some workloads
@@ -141,6 +151,7 @@ endchoice
 config KASAN_STACK
bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && 
!COMPILE_TEST
depends on KASAN_GENERIC || KASAN_SW_TAGS
+   depends on !ARCH_DISABLE_KASAN_INLINE
default y if CC_IS_GCC
help
  The LLVM stack address sanitizer has a known problem that
@@ -154,6 +165,9 @@ config KASAN_STACK
  but clang users can still enable it for builds without
  CONFIG_COMPILE_TEST.  On gcc it is assumed to always be safe
  to use and enabled by default.
+ If the architecture disables inline instrumentation, this is
+ also disabled as it adds inline-style instrumentation that
+ is run unconditionally.
 
 config KASAN_SW_TAGS_IDENTIFY
bool "Enable memory corruption identification"
-- 
2.30.2



[PATCH v14 0/4] KASAN core changes for ppc64 radix KASAN

2021-06-16 Thread Daniel Axtens
Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU. I've been
trying this for a while, but we keep having collisions between the
kasan code in the mm tree and the code I want to put in to the ppc
tree.

This series just contains the kasan core changes that we need. These
can go in via the mm tree. I will then propose the powerpc changes for
a later cycle. (The most recent RFC for the powerpc changes is in the
v12 series at
https://lore.kernel.org/linux-mm/20210615014705.2234866-1-...@axtens.net/
)

v14 applies to next-20210611. There should be no noticeable changes to
other platforms.

Changes since v13: move the MAX_PTRS_PER_* definitions out of kasan and
into pgtable.h. Add a build time error to hopefully prevent any
confusion about when the new hook is applicable. Thanks Marco and
Christophe.

Changes since v12: respond to Marco's review comments - clean up the
help for ARCH_DISABLE_KASAN_INLINE, and add an arch readiness check to
the new granule poisoning function. Thanks Marco.

Daniel Axtens (4):
  kasan: allow an architecture to disable inline instrumentation
  kasan: allow architectures to provide an outline readiness check
  mm: define default MAX_PTRS_PER_* in include/pgtable.h
  kasan: use MAX_PTRS_PER_* for early shadow tables


[PATCH v13 3/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables

2021-06-16 Thread Daniel Axtens
powerpc has a variable number of PTRS_PER_*, set at runtime based
on the MMU that the kernel is booted under.

This means the PTRS_PER_* are no longer constants, and therefore
breaks the build.

Define default MAX_PTRS_PER_*s in the same style as MAX_PTRS_PER_P4D.
As KASAN is the only user at the moment, just define them in the kasan
header, and have them default to PTRS_PER_* unless overridden in arch
code.

Suggested-by: Christophe Leroy 
Suggested-by: Balbir Singh 
Reviewed-by: Christophe Leroy 
Reviewed-by: Balbir Singh 
Signed-off-by: Daniel Axtens 
---
 include/linux/kasan.h | 18 +++---
 mm/kasan/init.c   |  6 +++---
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 768d7d342757..fd65f477ac92 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -40,10 +40,22 @@ struct kunit_kasan_expectation {
 #define PTE_HWTABLE_PTRS 0
 #endif
 
+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif
+
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 
 int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index 348f31d15a97..cc64ed6858c6 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -41,7 +41,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
 static inline bool kasan_pud_table(p4d_t p4d)
 {
return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -53,7 +53,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
 static inline bool kasan_pmd_table(pud_t pud)
 {
return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -64,7 +64,7 @@ static inline bool kasan_pmd_table(pud_t pud)
return false;
 }
 #endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS]
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]
__page_aligned_bss;
 
 static inline bool kasan_pte_table(pmd_t pmd)
-- 
2.30.2



[PATCH v13 2/3] kasan: allow architectures to provide an outline readiness check

2021-06-16 Thread Daniel Axtens
Allow architectures to define a kasan_arch_is_ready() hook that bails
out of any function that's about to touch the shadow unless the arch
says that it is ready for the memory to be accessed. This is fairly
non-invasive and should have a negligible performance penalty.

This will only work in outline mode, so an arch must specify
ARCH_DISABLE_KASAN_INLINE if it requires this.

Cc: Balbir Singh 
Cc: Aneesh Kumar K.V 
Suggested-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 

--

I discuss the justification for this later in the series. Also,
both previous RFCs for ppc64 - by 2 different people - have
needed this trick! See:
 - https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
 - https://patchwork.ozlabs.org/patch/795211/  # ppc radix series
---
 mm/kasan/common.c  | 4 
 mm/kasan/generic.c | 3 +++
 mm/kasan/kasan.h   | 4 
 mm/kasan/shadow.c  | 8 
 4 files changed, 19 insertions(+)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 10177cc26d06..0ad615f3801d 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -331,6 +331,10 @@ static inline bool kasan_slab_free(struct kmem_cache 
*cache, void *object,
u8 tag;
void *tagged_object;
 
+   /* Bail if the arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return false;
+
tag = get_tag(object);
tagged_object = object;
object = kasan_reset_tag(object);
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 53cbf28859b5..c3f5ba7a294a 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -163,6 +163,9 @@ static __always_inline bool check_region_inline(unsigned 
long addr,
size_t size, bool write,
unsigned long ret_ip)
 {
+   if (!kasan_arch_is_ready())
+   return true;
+
if (unlikely(size == 0))
return true;
 
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 8f450bc28045..19323a3d5975 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -449,6 +449,10 @@ static inline void kasan_poison_last_granule(const void *address, size_t size) {
 
 #endif /* CONFIG_KASAN_GENERIC */
 
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void)   { return true; }
+#endif
+
 /*
  * Exported functions for interfaces called from assembly or from generated
  * code. Declarations here to avoid warning about missing declarations.
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 082ee5b6d9a1..3c7f7efe6f68 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -73,6 +73,10 @@ void kasan_poison(const void *addr, size_t size, u8 value, bool init)
 {
void *shadow_start, *shadow_end;
 
+   /* Don't touch the shadow memory if arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return;
+
/*
 * Perform shadow offset calculation based on untagged address, as
 * some of the callers (e.g. kasan_poison_object_data) pass tagged
@@ -99,6 +103,10 @@ EXPORT_SYMBOL(kasan_poison);
 #ifdef CONFIG_KASAN_GENERIC
 void kasan_poison_last_granule(const void *addr, size_t size)
 {
+   /* Don't touch the shadow memory if arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return;
+
if (size & KASAN_GRANULE_MASK) {
u8 *shadow = (u8 *)kasan_mem_to_shadow(addr + size);
*shadow = size & KASAN_GRANULE_MASK;
-- 
2.30.2



[PATCH v13 1/3] kasan: allow an architecture to disable inline instrumentation

2021-06-16 Thread Daniel Axtens
For annoying architectural reasons, it's very difficult to support inline
instrumentation on powerpc64.*

Add a Kconfig flag to allow an arch to disable inline. (It's a bit
annoying to be 'backwards', but I'm not aware of any way to have
an arch force a symbol to be 'n', rather than 'y'.)

We also disable stack instrumentation in this case as it does things that
are functionally equivalent to inline instrumentation, namely adding
code that touches the shadow directly without going through a C helper.

* on ppc64 atm, the shadow lives in virtual memory and isn't accessible in
real mode. However, before we turn on virtual memory, we parse the device
tree to determine which platform and MMU we're running under. That calls
generic DT code, which is instrumented. Inline instrumentation in DT would
unconditionally attempt to touch the shadow region, which we won't have
set up yet, and would crash. We can make outline mode wait for the arch to
be ready, but we can't change what the compiler inserts for inline mode.
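
(To spell out the difference: in outline mode every instrumented access
compiles to a call like __asan_load8(addr), a C helper we can gate; in
inline mode the compiler open-codes the check, conceptually:

	/* sketch of what the compiler emits inline - not real kernel code */
	shadow = (s8 *)(((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT) +
			KASAN_SHADOW_OFFSET);
	if (*shadow)
		__asan_report_load8_noabort(addr);

so there is no C function in which a readiness check could live.)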

Signed-off-by: Daniel Axtens 
---
 lib/Kconfig.kasan | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index cffc2ebbf185..cb5e02d09e11 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -12,6 +12,15 @@ config HAVE_ARCH_KASAN_HW_TAGS
 config HAVE_ARCH_KASAN_VMALLOC
bool
 
+config ARCH_DISABLE_KASAN_INLINE
+   bool
+   help
+ Sometimes an architecture might not be able to support inline
+ instrumentation but might be able to support outline instrumentation.
+ This option allows an architecture to prevent inline and stack
+ instrumentation from being enabled.
+
+
 config CC_HAS_KASAN_GENERIC
def_bool $(cc-option, -fsanitize=kernel-address)
 
@@ -130,6 +139,7 @@ config KASAN_OUTLINE
 
 config KASAN_INLINE
bool "Inline instrumentation"
+   depends on !ARCH_DISABLE_KASAN_INLINE
help
  Compiler directly inserts code checking shadow memory before
  memory accesses. This is faster than outline (in some workloads
@@ -141,6 +151,7 @@ endchoice
 config KASAN_STACK
bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && 
!COMPILE_TEST
depends on KASAN_GENERIC || KASAN_SW_TAGS
+   depends on !ARCH_DISABLE_KASAN_INLINE
default y if CC_IS_GCC
help
	  The LLVM stack address sanitizer has a known problem that
@@ -154,6 +165,9 @@ config KASAN_STACK
  but clang users can still enable it for builds without
  CONFIG_COMPILE_TEST.  On gcc it is assumed to always be safe
  to use and enabled by default.
+ If the architecture disables inline instrumentation, this is
+ also disabled as it adds inline-style instrumentation that
+ is run unconditionally.
 
 config KASAN_SW_TAGS_IDENTIFY
bool "Enable memory corruption identification"
-- 
2.30.2



[PATCH v13 0/3] KASAN core changes for ppc64 radix KASAN

2021-06-16 Thread Daniel Axtens
Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU. I've been
trying this for a while, but we keep having collisions between the
kasan code in the mm tree and the code I want to put in to the ppc
tree.

So this series just contains the kasan core changes that we
need. These can go in via the mm tree. I will then propose the powerpc
changes for a later cycle. (The most recent RFC for the powerpc
changes is in the last series at
https://lore.kernel.org/linux-mm/20210615014705.2234866-1-...@axtens.net/
)

v13 applies to next-20210611. There should be no noticeable changes to
other platforms.

Changes since v12: respond to Marco's review comments - clean up the
help for ARCH_DISABLE_KASAN_INLINE, and add an arch readiness check to
the new granule poisoning function. Thanks Marco.

Kind regards,
Daniel

Daniel Axtens (3):
  kasan: allow an architecture to disable inline instrumentation
  kasan: allow architectures to provide an outline readiness check
  kasan: define and use MAX_PTRS_PER_* for early shadow tables

 include/linux/kasan.h | 18 +++---
 lib/Kconfig.kasan | 14 ++
 mm/kasan/common.c |  4 
 mm/kasan/generic.c|  3 +++
 mm/kasan/init.c   |  6 +++---
 mm/kasan/kasan.h  |  4 
 mm/kasan/shadow.c |  8 
 7 files changed, 51 insertions(+), 6 deletions(-)

-- 
2.30.2



Re: [PATCH v12 2/6] kasan: allow architectures to provide an outline readiness check

2021-06-15 Thread Daniel Axtens
Hi Marco,
>> +   /* Don't touch the shadow memory if arch isn't ready */
>> +   if (!kasan_arch_is_ready())
>> +   return;
>> +
>
> What about kasan_poison_last_granule()? kasan_unpoison() currently
> seems to potentially trip on that.

Ah the perils of rebasing an old series! I'll re-audit the generic code
for functions that touch memory and make sure I have covered them all.

Thanks for the review.

Kind regards,
Daniel



Re: [PATCH v12 1/6] kasan: allow an architecture to disable inline instrumentation

2021-06-15 Thread Daniel Axtens
Hi Marco,

@@ -12,6 +12,15 @@ config HAVE_ARCH_KASAN_HW_TAGS
>>  config HAVE_ARCH_KASAN_VMALLOC
>> bool
>>
>> +# Sometimes an architecture might not be able to support inline instrumentation
>> +# but might be able to support outline instrumentation. This option allows an
>> +# arch to prevent inline and stack instrumentation from being enabled.
>
> This comment could be moved into 'help' of this new config option.

It could. I did wonder if that made sense given that this is not a user
selectable option so I'm not sure if the help will ever be visible, but
I see that we do this sort of thing in Kconfig.kcsan and Kconfig.kgdb.
I've changed it over.

>> +# ppc64 turns on virtual memory late in boot, after calling into generic code
>> +# like the device-tree parser, so it uses this in conjunction with a hook in
>> +# outline mode to avoid invalid access early in boot.
>
> I think the ppc64-related comment isn't necessary and can be moved to
> arch/ppc64 somewhere, if there isn't one already.

Fair enough. I'll pull it out of this file and look for a good place to
put the information in arch/powerpc in a later patch/series.

Kind regards,
Daniel



[PATCH v12 6/6] [RFC] powerpc: Book3S 64-bit outline-only KASAN support

2021-06-14 Thread Daniel Axtens
[I'm hoping to get this in a subsequent merge window after we get the core
changes in. I know there are still a few outstanding review comments, I just
wanted to make sure that I supplied a real use-case for the core changes I'm
proposing.]

Implement a limited form of KASAN for Book3S 64-bit machines running under
the Radix MMU, supporting only outline mode.

 - Enable the compiler instrumentation to check addresses and maintain the
   shadow region. (This is the guts of KASAN which we can easily reuse.)

 - Require kasan-vmalloc support to handle modules and anything else in
   vmalloc space.

 - KASAN needs to be able to validate all pointer accesses, but we can't
   instrument all kernel addresses - only linear map and vmalloc. On boot,
   set up a single page of read-only shadow that marks all iomap and
   vmemmap accesses as valid.

 - Document KASAN in both generic and powerpc docs.

Background
--

KASAN support on Book3S is a bit tricky to get right:

 - It would be good to support inline instrumentation so as to be able to
   catch stack issues that cannot be caught with outline mode.

 - Inline instrumentation requires a fixed offset (see the sketch after
   this list).

 - Book3S runs code with translations off ("real mode") during boot,
   including a lot of generic device-tree parsing code which is used to
   determine MMU features.

[ppc64 mm note: The kernel installs a linear mapping at effective
address c000...-c008 This is a one-to-one mapping with physical
memory from ... onward. Because of how memory accesses work on
powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
same memory both with translations on (accessing as an 'effective
address'), and with translations off (accessing as a 'real
address'). This works in both guests and the hypervisor. For more
details, see s5.7 of Book III of version 3 of the ISA, in particular
the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
KASAN implementation currently only supports Radix.]

 - Some code - most notably a lot of KVM code - also runs with translations
   off after boot.

 - Therefore any offset has to point to memory that is valid with
   translations on or off.
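
To make the fixed-offset requirement concrete: generic KASAN translates
an address to its shadow with the existing helper (quoted from memory,
so treat it as a sketch):

	static inline void *kasan_mem_to_shadow(const void *addr)
	{
		return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
			+ KASAN_SHADOW_OFFSET;
	}

Inline instrumentation bakes KASAN_SHADOW_OFFSET into every generated
check, so it has to be a compile-time constant that is usable with
translations on or off.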

One approach is just to give up on inline instrumentation. This way
boot-time checks can be delayed until after the MMU is set up, and we
can just not instrument any code that runs with translations off after
booting. Take this approach for now and require outline instrumentation.

Previous attempts allowed inline instrumentation. However, they came with
some unfortunate restrictions: only physically contiguous memory could be
used and it had to be specified at compile time. Maybe we can do better in
the future.

Cc: Aneesh Kumar K.V  # ppc64 hash version
Cc: Christophe Leroy  # ppc32 version
Originally-by: Balbir Singh  # ppc64 out-of-line radix version
Signed-off-by: Daniel Axtens 
---
 Documentation/dev-tools/kasan.rst| 11 +--
 Documentation/powerpc/kasan.txt  | 48 +-
 arch/powerpc/Kconfig |  4 +-
 arch/powerpc/Kconfig.debug   |  3 +-
 arch/powerpc/include/asm/book3s/64/hash.h|  4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h |  4 +
 arch/powerpc/include/asm/book3s/64/radix.h   | 13 ++-
 arch/powerpc/include/asm/kasan.h | 22 +
 arch/powerpc/kernel/Makefile | 11 +++
 arch/powerpc/kernel/process.c| 16 ++--
 arch/powerpc/kvm/Makefile|  5 ++
 arch/powerpc/mm/book3s64/Makefile|  9 ++
 arch/powerpc/mm/kasan/Makefile   |  1 +
 arch/powerpc/mm/kasan/init_book3s_64.c   | 95 
 arch/powerpc/mm/ptdump/ptdump.c  | 20 -
 arch/powerpc/platforms/Kconfig.cputype   |  1 +
 arch/powerpc/platforms/powernv/Makefile  |  6 ++
 arch/powerpc/platforms/pseries/Makefile  |  3 +
 18 files changed, 257 insertions(+), 19 deletions(-)
 create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c

diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index 05d2d428a332..f8d6048db1bb 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -36,8 +36,9 @@ Both software KASAN modes work with SLUB and SLAB memory allocators,
 while the hardware tag-based KASAN currently only supports SLUB.
 
 Currently, generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390,
-and riscv architectures. It is also supported on 32-bit powerpc kernels.
-Tag-based KASAN modes are supported only for arm64.
+and riscv architectures. It is also supported on powerpc for 32-bit kernels and
+for 64-bit kernels running under the Radix MMU. Tag-based KASAN modes are
+supported only for arm64.
 
 Usage
 -
@@ -344,10 +345,10 @@ CONFIG_KASAN_VMALLOC
 
 With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
cost of greater memory usage.

[PATCH v12 5/6] powerpc/mm/kasan: rename kasan_init_32.c to init_32.c

2021-06-14 Thread Daniel Axtens
kasan is already implied by the directory name, we don't need to
repeat it.

Suggested-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 
---
 arch/powerpc/mm/kasan/Makefile   | 2 +-
 arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)

diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index bb1a5408b86b..42fb628a44fd 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -2,6 +2,6 @@
 
 KASAN_SANITIZE := n
 
-obj-$(CONFIG_PPC32)   += kasan_init_32.o
+obj-$(CONFIG_PPC32)   += init_32.o
 obj-$(CONFIG_PPC_8xx)  += 8xx.o
 obj-$(CONFIG_PPC_BOOK3S_32)+= book3s_32.o
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/init_32.c
similarity index 100%
rename from arch/powerpc/mm/kasan/kasan_init_32.c
rename to arch/powerpc/mm/kasan/init_32.c
-- 
2.27.0



[PATCH v12 4/6] kasan: Document support on 32-bit powerpc

2021-06-14 Thread Daniel Axtens
KASAN is supported on 32-bit powerpc and the docs should reflect this.

Suggested-by: Christophe Leroy 
Reviewed-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 
---
 Documentation/dev-tools/kasan.rst |  8 ++--
 Documentation/powerpc/kasan.txt   | 12 
 2 files changed, 18 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/powerpc/kasan.txt

diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index 83ec4a556c19..05d2d428a332 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -36,7 +36,8 @@ Both software KASAN modes work with SLUB and SLAB memory allocators,
 while the hardware tag-based KASAN currently only supports SLUB.
 
 Currently, generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390,
-and riscv architectures, and tag-based KASAN modes are supported only for arm64.
+and riscv architectures. It is also supported on 32-bit powerpc kernels.
+Tag-based KASAN modes are supported only for arm64.
 
 Usage
 -
@@ -343,7 +344,10 @@ CONFIG_KASAN_VMALLOC
 
 With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
 cost of greater memory usage. Currently, this is supported on x86,
-riscv, s390, and powerpc.
+riscv, s390, and 32-bit powerpc.
+
+It is optional, except on 32-bit powerpc kernels with module support,
+where it is required.
 
 This works by hooking into vmalloc and vmap and dynamically
 allocating real shadow memory to back the mappings.
diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
new file mode 100644
index ..26bb0e8bb18c
--- /dev/null
+++ b/Documentation/powerpc/kasan.txt
@@ -0,0 +1,12 @@
+KASAN is supported on powerpc on 32-bit only.
+
+32 bit support
+==
+
+KASAN is supported on both hash and nohash MMUs on 32-bit.
+
+The shadow area sits at the top of the kernel virtual memory space above the
+fixmap area and occupies one eighth of the total kernel virtual memory space.
+
+Instrumentation of the vmalloc area is optional, unless built with modules,
+in which case it is required.
-- 
2.27.0



[PATCH v12 3/6] kasan: define and use MAX_PTRS_PER_* for early shadow tables

2021-06-14 Thread Daniel Axtens
powerpc has a variable number of PTRS_PER_*, set at runtime based
on the MMU that the kernel is booted under.

This means the PTRS_PER_* are no longer constants, and therefore
breaks the build.

Define default MAX_PTRS_PER_*s in the same style as MAX_PTRS_PER_P4D.
As KASAN is the only user at the moment, just define them in the kasan
header, and have them default to PTRS_PER_* unless overridden in arch
code.
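
For instance, an arch that picks its MMU at boot could cap the early
shadow tables at the largest geometry it might run with, along these
lines (values illustrative only - the real powerpc numbers come with
the arch series):

	/* sketch: arch pgtable header, values made up */
	#define MAX_PTRS_PER_PTE	(1 << 9)	/* max over supported MMUs */
	#define MAX_PTRS_PER_PMD	(1 << 9)
	#define MAX_PTRS_PER_PUD	(1 << 10)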

Suggested-by: Christophe Leroy 
Suggested-by: Balbir Singh 
Reviewed-by: Christophe Leroy 
Reviewed-by: Balbir Singh 
Signed-off-by: Daniel Axtens 
---
 include/linux/kasan.h | 18 +++---
 mm/kasan/init.c   |  6 +++---
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 768d7d342757..fd65f477ac92 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -40,10 +40,22 @@ struct kunit_kasan_expectation {
 #define PTE_HWTABLE_PTRS 0
 #endif
 
+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif
+
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 
 int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index 348f31d15a97..cc64ed6858c6 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -41,7 +41,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
 static inline bool kasan_pud_table(p4d_t p4d)
 {
return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -53,7 +53,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
 static inline bool kasan_pmd_table(pud_t pud)
 {
return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -64,7 +64,7 @@ static inline bool kasan_pmd_table(pud_t pud)
return false;
 }
 #endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS]
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]
__page_aligned_bss;
 
 static inline bool kasan_pte_table(pmd_t pmd)
-- 
2.27.0



[PATCH v12 2/6] kasan: allow architectures to provide an outline readiness check

2021-06-14 Thread Daniel Axtens
Allow architectures to define a kasan_arch_is_ready() hook that bails
out of any function that's about to touch the shadow unless the arch
says that it is ready for the memory to be accessed. This is fairly
non-invasive and should have a negligible performance penalty.

This will only work in outline mode, so an arch must specify
ARCH_DISABLE_KASAN_INLINE if it requires this.

Cc: Balbir Singh 
Cc: Aneesh Kumar K.V 
Suggested-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 

--

I discuss the justification for this later in the series. Also,
both previous RFCs for ppc64 - by 2 different people - have
needed this trick! See:
 - https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
 - https://patchwork.ozlabs.org/patch/795211/  # ppc radix series
---
 mm/kasan/common.c  | 4 
 mm/kasan/generic.c | 3 +++
 mm/kasan/kasan.h   | 4 
 mm/kasan/shadow.c  | 4 
 4 files changed, 15 insertions(+)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 10177cc26d06..0ad615f3801d 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -331,6 +331,10 @@ static inline bool kasan_slab_free(struct kmem_cache *cache, void *object,
u8 tag;
void *tagged_object;
 
+   /* Bail if the arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return false;
+
tag = get_tag(object);
tagged_object = object;
object = kasan_reset_tag(object);
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 53cbf28859b5..c3f5ba7a294a 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -163,6 +163,9 @@ static __always_inline bool check_region_inline(unsigned long addr,
size_t size, bool write,
unsigned long ret_ip)
 {
+   if (!kasan_arch_is_ready())
+   return true;
+
if (unlikely(size == 0))
return true;
 
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 8f450bc28045..19323a3d5975 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -449,6 +449,10 @@ static inline void kasan_poison_last_granule(const void *address, size_t size) {
 
 #endif /* CONFIG_KASAN_GENERIC */
 
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void)   { return true; }
+#endif
+
 /*
  * Exported functions for interfaces called from assembly or from generated
  * code. Declarations here to avoid warning about missing declarations.
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 082ee5b6d9a1..74134b657d7d 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -73,6 +73,10 @@ void kasan_poison(const void *addr, size_t size, u8 value, bool init)
 {
void *shadow_start, *shadow_end;
 
+   /* Don't touch the shadow memory if arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return;
+
/*
 * Perform shadow offset calculation based on untagged address, as
 * some of the callers (e.g. kasan_poison_object_data) pass tagged
-- 
2.27.0



[PATCH v12 1/6] kasan: allow an architecture to disable inline instrumentation

2021-06-14 Thread Daniel Axtens
For annoying architectural reasons, it's very difficult to support inline
instrumentation on powerpc64.

Add a Kconfig flag to allow an arch to disable inline. (It's a bit
annoying to be 'backwards', but I'm not aware of any way to have
an arch force a symbol to be 'n', rather than 'y'.)

We also disable stack instrumentation in this case as it does things that
are functionally equivalent to inline instrumentation, namely adding
code that touches the shadow directly without going through a C helper.

Signed-off-by: Daniel Axtens 
---
 lib/Kconfig.kasan | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index cffc2ebbf185..935814f332a7 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -12,6 +12,15 @@ config HAVE_ARCH_KASAN_HW_TAGS
 config HAVE_ARCH_KASAN_VMALLOC
bool
 
+# Sometimes an architecture might not be able to support inline instrumentation
+# but might be able to support outline instrumentation. This option allows an 
+# arch to prevent inline and stack instrumentation from being enabled.
+# ppc64 turns on virtual memory late in boot, after calling into generic code
+# like the device-tree parser, so it uses this in conjunction with a hook in
+# outline mode to avoid invalid access early in boot.
+config ARCH_DISABLE_KASAN_INLINE
+   bool
+
 config CC_HAS_KASAN_GENERIC
def_bool $(cc-option, -fsanitize=kernel-address)
 
@@ -130,6 +139,7 @@ config KASAN_OUTLINE
 
 config KASAN_INLINE
bool "Inline instrumentation"
+   depends on !ARCH_DISABLE_KASAN_INLINE
help
  Compiler directly inserts code checking shadow memory before
  memory accesses. This is faster than outline (in some workloads
@@ -141,6 +151,7 @@ endchoice
 config KASAN_STACK
bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && 
!COMPILE_TEST
depends on KASAN_GENERIC || KASAN_SW_TAGS
+   depends on !ARCH_DISABLE_KASAN_INLINE
default y if CC_IS_GCC
help
	  The LLVM stack address sanitizer has a known problem that
@@ -154,6 +165,9 @@ config KASAN_STACK
  but clang users can still enable it for builds without
  CONFIG_COMPILE_TEST.  On gcc it is assumed to always be safe
  to use and enabled by default.
+ If the architecture disables inline instrumentation, this is
+ also disabled as it adds inline-style instrumentation that
+ is run unconditionally.
 
 config KASAN_SW_TAGS_IDENTIFY
bool "Enable memory corruption identification"
-- 
2.27.0



[PATCH v12 0/6] KASAN core changes for ppc64 radix KASAN

2021-06-14 Thread Daniel Axtens
Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU.

I've been trying this for a while, but we keep having collisions
between the kasan code in the mm tree and the code I want to put in to
the ppc tree. So my aim here is for patches 1 through 4 or 1 through 5
to go in via the mm tree. I will then propose the powerpc changes for
a later cycle. (I have attached them to this series as an RFC, and
there are still outstanding review comments I need to attend to.)

v12 applies to next-20210611. There should be no noticeable changes to
other platforms.

Kind regards,
Daniel

Daniel Axtens (6):
  kasan: allow an architecture to disable inline instrumentation
  kasan: allow architectures to provide an outline readiness check
  kasan: define and use MAX_PTRS_PER_* for early shadow tables
  kasan: Document support on 32-bit powerpc
  powerpc/mm/kasan: rename kasan_init_32.c to init_32.c
  [RFC] powerpc: Book3S 64-bit outline-only KASAN support

 Documentation/dev-tools/kasan.rst |  7 +-
 Documentation/powerpc/kasan.txt   | 58 +++
 arch/powerpc/Kconfig  |  4 +-
 arch/powerpc/Kconfig.debug|  3 +-
 arch/powerpc/include/asm/book3s/64/hash.h |  4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  4 +
 arch/powerpc/include/asm/book3s/64/radix.h| 13 ++-
 arch/powerpc/include/asm/kasan.h  | 22 +
 arch/powerpc/kernel/Makefile  | 11 +++
 arch/powerpc/kernel/process.c | 16 ++--
 arch/powerpc/kvm/Makefile |  5 +
 arch/powerpc/mm/book3s64/Makefile |  9 ++
 arch/powerpc/mm/kasan/Makefile|  3 +-
 .../mm/kasan/{kasan_init_32.c => init_32.c}   |  0
 arch/powerpc/mm/kasan/init_book3s_64.c| 95 +++
 arch/powerpc/mm/ptdump/ptdump.c   | 20 +++-
 arch/powerpc/platforms/Kconfig.cputype|  1 +
 arch/powerpc/platforms/powernv/Makefile   |  6 ++
 arch/powerpc/platforms/pseries/Makefile   |  3 +
 include/linux/kasan.h | 18 +++-
 lib/Kconfig.kasan | 14 +++
 mm/kasan/common.c |  4 +
 mm/kasan/generic.c|  3 +
 mm/kasan/init.c   |  6 +-
 mm/kasan/kasan.h  |  4 +
 mm/kasan/shadow.c |  4 +
 26 files changed, 316 insertions(+), 21 deletions(-)
 create mode 100644 Documentation/powerpc/kasan.txt
 rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)
 create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c

-- 
2.27.0



Re: [PATCH] powerpc: Fix initrd corruption with relative jump labels

2021-06-14 Thread Daniel Axtens
Hi Michael,

> The fix is simply to make the key value relative to the jump_entry
> struct in the ARCH_STATIC_BRANCH macro.

This fixes the boot issues I observed. Thank you very much!!

Tested-by: Daniel Axtens 

Kind regards,
Daniel


[PATCH v2] powerpc: make stack walking KASAN-safe

2021-06-14 Thread Daniel Axtens
Make our stack-walking code KASAN-safe by using __no_sanitize_address.
Generic code, arm64, s390 and x86 all make accesses unchecked for similar
sorts of reasons: when unwinding a stack, we might touch memory that KASAN
has marked as being out-of-bounds. In ppc64 KASAN development, I hit this
sometimes when checking for an exception frame - because we're checking
an arbitrary offset into the stack frame.

See commit 20955746320e ("s390/kasan: avoid false positives during stack
unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing
frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack:
Prevent KASAN false positive warnings") and commit 6e22c8366416
("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer").

Cc: Naveen N. Rao 
Signed-off-by: Daniel Axtens 

---

v2: Use __no_sanitize_address, thanks Naveen
---
 arch/powerpc/kernel/process.c| 5 +++--
 arch/powerpc/kernel/stacktrace.c | 8 
 arch/powerpc/perf/callchain.c| 2 +-
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 89e34aa273e2..3464064a0b8b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2121,8 +2121,9 @@ unsigned long get_wchan(struct task_struct *p)
 
 static int kstack_depth_to_print = CONFIG_PRINT_STACK_DEPTH;
 
-void show_stack(struct task_struct *tsk, unsigned long *stack,
-   const char *loglvl)
+void __no_sanitize_address show_stack(struct task_struct *tsk,
+ unsigned long *stack,
+ const char *loglvl)
 {
unsigned long sp, ip, lr, newsp;
int count = 0;
diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index 1deb1bf331dd..1961e6d5e33b 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -23,8 +23,8 @@
 
 #include 
 
-void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
-struct task_struct *task, struct pt_regs *regs)
+void __no_sanitize_address arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
+					   struct task_struct *task, struct pt_regs *regs)
 {
unsigned long sp;
 
@@ -61,8 +61,8 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
  *
  * If the task is not 'current', the caller *must* ensure the task is inactive.
  */
-int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
-void *cookie, struct task_struct *task)
+int __no_sanitize_address arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
+						   void *cookie, struct task_struct *task)
 {
unsigned long sp;
unsigned long newsp;
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 6c028ee513c0..082f6d0308a4 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -40,7 +40,7 @@ static int valid_next_sp(unsigned long sp, unsigned long prev_sp)
return 0;
 }
 
-void
+void __no_sanitize_address
 perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
unsigned long sp, next_sp;
-- 
2.27.0



Re: [PATCH] powerpc: make show_stack's stack walking KASAN-safe

2021-06-01 Thread Daniel Axtens
"Naveen N. Rao"  writes:

> Daniel Axtens wrote:
>> Make our stack-walking code KASAN-safe by using READ_ONCE_NOCHECK -
>> generic code, arm64, s390 and x86 all do this for similar sorts of
>> reasons: when unwinding a stack, we might touch memory that KASAN has
>> marked as being out-of-bounds. In ppc64 KASAN development, I hit this
>> sometimes when checking for an exception frame - because we're checking
>> an arbitrary offset into the stack frame.
>> 
>> See commit 20955746320e ("s390/kasan: avoid false positives during stack
>> unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing
>> frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack:
>> Prevent KASAN false positive warnings") and commit 6e22c8366416
>> ("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer").
>> 
>> Signed-off-by: Daniel Axtens 
>> ---
>>  arch/powerpc/kernel/process.c | 16 +---
>>  1 file changed, 9 insertions(+), 7 deletions(-)
>> 
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index 89e34aa273e2..430cf06f9406 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -2151,8 +2151,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
>>  break;
>>  
>>  stack = (unsigned long *) sp;
>> -newsp = stack[0];
>> -ip = stack[STACK_FRAME_LR_SAVE];
>> +newsp = READ_ONCE_NOCHECK(stack[0]);
>> +ip = READ_ONCE_NOCHECK(stack[STACK_FRAME_LR_SAVE]);
>
> Just curious:
> Given that we validate the stack pointer before these accesses, can we 
> annotate show_stack() with __no_sanitize_address instead?
>
> I ask because we have other places where we walk the stack: 
> arch_stack_walk(), as well as in perf callchain. Similar changes will be 
> needed there as well.

Oh good points. Yes, it probably makes most sense to mark all the
functions with __no_sanitize_address, that resolves Christophe's issue
as well. I'll send a v2.

Kind regards,
Daniel

>
>
> - Naveen


[PATCH] powerpc: make show_stack's stack walking KASAN-safe

2021-05-28 Thread Daniel Axtens
Make our stack-walking code KASAN-safe by using READ_ONCE_NOCHECK -
generic code, arm64, s390 and x86 all do this for similar sorts of
reasons: when unwinding a stack, we might touch memory that KASAN has
marked as being out-of-bounds. In ppc64 KASAN development, I hit this
sometimes when checking for an exception frame - because we're checking
an arbitrary offset into the stack frame.

See commit 20955746320e ("s390/kasan: avoid false positives during stack
unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing
frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack:
Prevent KASAN false positive warnings") and commit 6e22c8366416
("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer").

Signed-off-by: Daniel Axtens 
---
 arch/powerpc/kernel/process.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 89e34aa273e2..430cf06f9406 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2151,8 +2151,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
break;
 
stack = (unsigned long *) sp;
-   newsp = stack[0];
-   ip = stack[STACK_FRAME_LR_SAVE];
+   newsp = READ_ONCE_NOCHECK(stack[0]);
+   ip = READ_ONCE_NOCHECK(stack[STACK_FRAME_LR_SAVE]);
if (!firstframe || ip != lr) {
printk("%s["REG"] ["REG"] %pS",
loglvl, sp, ip, (void *)ip);
@@ -2170,17 +2170,19 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
 * See if this is an exception frame.
 * We look for the "regshere" marker in the current frame.
 */
-   if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS)
-   && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
+   if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS) &&
+   (READ_ONCE_NOCHECK(stack[STACK_FRAME_MARKER]) ==
+STACK_FRAME_REGS_MARKER)) {
struct pt_regs *regs = (struct pt_regs *)
(sp + STACK_FRAME_OVERHEAD);
 
-   lr = regs->link;
+   lr = READ_ONCE_NOCHECK(regs->link);
printk("%s--- interrupt: %lx at %pS\n",
-  loglvl, regs->trap, (void *)regs->nip);
+  loglvl, READ_ONCE_NOCHECK(regs->trap),
+  (void *)READ_ONCE_NOCHECK(regs->nip));
__show_regs(regs);
printk("%s--- interrupt: %lx\n",
-  loglvl, regs->trap);
+  loglvl, READ_ONCE_NOCHECK(regs->trap));
 
firstframe = 1;
}
-- 
2.27.0



Re: [PATCH 1/2] powerpc/sstep: Add emulation support for ‘setb’ instruction

2021-04-24 Thread Daniel Axtens
Segher Boessenkool  writes:

> Hi!
>
> On Fri, Apr 16, 2021 at 05:44:52PM +1000, Daniel Axtens wrote:
>> Sathvika Vasireddy  writes:
>> Ok, if I've understood correctly...
>> 
>> > +  ra = ra & ~0x3;
>> 
>> This masks off the bits of RA that are not part of BTF:
>> 
>> ra is in [0, 31] which is [0b00000, 0b11111]
>> Then ~0x3 = ~0b00011
>> ra = ra & 0b11100
>> 
>> This gives us then,
>> ra = btf << 2; or
>> btf = ra >> 2;
>
> Yes.  In effect, you want the offset in bits of the CR field, which is
> just fine like this.  But a comment would not hurt.
>
>> Let's then check to see if your calculations read the right fields.
>> 
>> > +  if ((regs->ccr) & (1 << (31 - ra)))
>> > +  op->val = -1;
>> > +  else if ((regs->ccr) & (1 << (30 - ra)))
>> > +  op->val = 1;
>> > +  else
>> > +  op->val = 0;
>
> It imo is clearer if written
>
>	if ((regs->ccr << ra) & 0x80000000)
>		op->val = -1;
>	else if ((regs->ccr << ra) & 0x40000000)
>		op->val = 1;
>	else
>		op->val = 0;
>
> but I guess not everyone agrees :-)
>
>> CR field:  76543210
>> bit:  0123 0123 0123 0123 0123 0123 0123 0123
>> normal bit #: 0...31
>> ibm bit #:    31...0
>
> The bit numbers in CR fields are *always* numbered left-to-right.  I
> have never seen anyone use LE for it, anyway.
>
> Also, even people who write LE have the bigger end on the left normally
> (they just write some things right-to-left, and other things
> left-to-right).

Sorry, the numbers in the CR fields weren't meant to be especially
meaningful, I was just trying to convince myself that we referenced the
same bits doing the ISA way vs the way this code did it.

Kind regards,
Daniel
>
>> Checkpatch does have one complaint:
>> 
>> CHECK:UNNECESSARY_PARENTHESES: Unnecessary parentheses around 'regs->ccr'
>> #30: FILE: arch/powerpc/lib/sstep.c:1971:
>> +if ((regs->ccr) & (1 << (31 - ra)))
>> 
>> I don't really mind the parentheses: I think you are safe to ignore
>> checkpatch here unless someone else complains :)
>
> I find them annoying.  If there are too many parentheses, it is hard to
> see at a glance what groups where.  Also, a suspicious reader might
> think there is something special going on (with macros for example).
>
> This is simple code of course, but :-)
>
>> If you do end up respinning the patch, I think it would be good to make
>> the maths a bit clearer. I think it works because a left shift of 2 is
>> the same as multiplying by 4, but it would be easier to follow if you
>> used a temporary variable for btf.
>
> It is very simple.  The BFA instruction field is closely related to the
> BI instruction field, which is 5 bits, and selects one of the 32 bits in
> the CR.  If you have "BFA00 BFA01 BFA10 BFA11", that gives the bit
> numbers of all four bits in the selected CR field.  So the "& ~3" does
> all you need.  It is quite pretty :-)
>
>
> Segher


Re: [PATCH] powerpc/powernv/pci: remove dead code from !CONFIG_EEH

2021-04-22 Thread Daniel Axtens
Hi Nick,

> While looking at -Wundef warnings, the #if CONFIG_EEH stood out as a
> possible candidate to convert to #ifdef CONFIG_EEH, but it seems that
> based on Kconfig dependencies it's not possible to build this file
> without CONFIG_EEH enabled.

This seemed odd to me, but I think you're right:

arch/powerpc/platforms/Kconfig contains:

config EEH
bool
depends on (PPC_POWERNV || PPC_PSERIES) && PCI
default y

It's not configurable from e.g. make menuconfig because there's no prompt.
You can attempt to explicitly disable it with e.g. `scripts/config -d EEH`
but then something like `make oldconfig` will silently re-enable it for
you.
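
For instance (illustrative session):

    $ scripts/config -d EEH
    $ grep -w CONFIG_EEH .config
    # CONFIG_EEH is not set
    $ make oldconfig
    $ grep -w CONFIG_EEH .config
    CONFIG_EEH=y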

It's been forced on since commit e49f7a9997c6 ("powerpc/pseries: Rivet
CONFIG_EEH for pSeries platform") in 2012 which fixed it for
pseries. That moved out from pseries to pseries + powernv later on.

There are other cleanups in the same vein that could be made, from the
Makefile (which has files only built with CONFIG_EEH) through to other
source files. It looks like there's one `#ifdef CONFIG_EEH` in
arch/powerpc/platforms/powernv/pci-ioda.c that could be pulled out, for
example.

I think it's probably worth trying to rip out all of those in one patch?

Kind regards,
Daniel


Re: [PATCH kernel] powerpc/makefile: Do not redefine $(CPP) for preprocessor

2021-04-22 Thread Daniel Axtens
Hi Alexey,

> The $(CPP) (do only preprocessing) macro is already defined in Makefile.
> However POWERPC redefines it and adds $(KBUILD_CFLAGS) which results
> in flags duplication. Which is not a big deal by itself except for
> the flags which depend on other flags and the compiler checks them
> as it parses the command line.
>
> Specifically, scripts/Makefile.build:304 generates ksyms for .S files.
> If clang+llvm+sanitizer are enabled, this results in
> -fno-lto -flto -fsanitize=cfi-mfcall   -fno-lto -flto 
> -fsanitize=cfi-mfcall

Checkpatch doesn't like this line:
WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#14: 
-fno-lto -flto -fsanitize=cfi-mfcall   -fno-lto -flto -fsanitize=cfi-mfcall
However, it doesn't make sense to wrap the line so we should just ignore
checkpatch here.

> in the clang command line and triggers error:
>
> clang-13: error: invalid argument '-fsanitize=cfi-mfcall' only allowed with 
> '-flto'
>
> This removes unnecessary CPP redefinition.
>
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/Makefile | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index c9d2c7825cd6..3a2f2001c62b 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -214,7 +214,6 @@ KBUILD_CPPFLAGS   += -I $(srctree)/arch/$(ARCH) $(asinstr)
>  KBUILD_AFLAGS+= $(AFLAGS-y)
>  KBUILD_CFLAGS+= $(call cc-option,-msoft-float)
>  KBUILD_CFLAGS+= -pipe $(CFLAGS-y)
> -CPP  = $(CC) -E $(KBUILD_CFLAGS)

I was trying to check the history to see why powerpc has its own
definition. It seems to date back at least as far as merging the two
powerpc platforms into one, maybe it was helpful then. I agree it
doesn't seem to be needed now.

Snowpatch claims that this breaks the build, but I haven't been able to
reproduce the issue in either pmac32 or ppc64 defconfig. I have sent it
off to a fork of mpe's linux-ci repo to see if any of those builds hit
any issues: https://github.com/daxtens/linux-ci/actions

Assuming that doesn't break, this patch looks good to me:
Reviewed-by: Daniel Axtens 

Kind regards,
Daniel


Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()

2021-04-21 Thread Daniel Axtens
Daniel Axtens  writes:

> Hi Lakshmi,
>
>> On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote:
>>
>> Sorry - missed copying device-tree and powerpc mailing lists.
>>
>>> There are a few "goto out;" statements before the local variable "fdt"
>>> is initialized through the call to of_kexec_alloc_and_setup_fdt() in
>>> elf64_load(). This will result in an uninitialized "fdt" being passed
>>> to kvfree() in this function if there is an error before the call to
>>> of_kexec_alloc_and_setup_fdt().
>>> 
>>> Initialize the local variable "fdt" to NULL.
>>>
> I'm a huge fan of initialising local variables! But I'm struggling to
> find the code path that will lead to an uninit fdt being returned...

OK, so perhaps this was putting it too strongly. I have been bitten
by uninitialised things enough in C that I may have taken a slightly
overly-aggressive view of fixing them in the source rather than the
compiler. I do think compiler-level mitigations are better, and I take
the point that we don't want to defeat compiler checking.

(Does anyone - and by anyone I mean any large distro - compile with
local variables inited by the compiler?)
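
(For reference - and I'm recalling the names here, so double-check me -
the compiler-level mechanism is CONFIG_INIT_STACK_ALL_ZERO, which
passes -ftrivial-auto-var-init=zero so that locals start out zeroed.)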

I was reading the version in powerpc/next, clearly I should have looked
at linux-next. Having said that, I think I will leave the rest of the
bikeshedding to the rest of you, you all seem to have it in hand :)

Kind regards,
Daniel

>
> The out label reads in part:
>
>   /* Make kimage_file_post_load_cleanup free the fdt buffer for us. */
>   return ret ? ERR_PTR(ret) : fdt;
>
> As far as I can tell, any time we get a non-zero ret, we're going to
> return an error pointer rather than the uninitialised value...
>
> (btw, it does look like we might leak fdt if we have an error after we
> successfully kmalloc it.)
>
> Am I missing something? Can you link to the report for the kernel test
> robot or from Dan? 
>
> FWIW, I think it's worth including this patch _anyway_ because initing
> local variables is good practice, but I'm just not sure on the
> justification.
>
> Kind regards,
> Daniel
>
>>> Signed-off-by: Lakshmi Ramasubramanian 
>>> Reported-by: kernel test robot 
>>> Reported-by: Dan Carpenter 
>>> ---
>>>   arch/powerpc/kexec/elf_64.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
>>> index 5a569bb51349..0051440c1f77 100644
>>> --- a/arch/powerpc/kexec/elf_64.c
>>> +++ b/arch/powerpc/kexec/elf_64.c
>>> @@ -32,7 +32,7 @@ static void *elf64_load(struct kimage *image, char 
>>> *kernel_buf,
>>> int ret;
>>> unsigned long kernel_load_addr;
>>> unsigned long initrd_load_addr = 0, fdt_load_addr;
>>> -   void *fdt;
>>> +   void *fdt = NULL;
>>> const void *slave_code;
>>> struct elfhdr ehdr;
>>> char *modified_cmdline = NULL;
>>> 
>>
>> thanks,
>>   -lakshmi


Re: [PATCH 1/2] powerpc/sstep: Add emulation support for ‘setb’ instruction

2021-04-16 Thread Daniel Axtens
Sathvika Vasireddy  writes:

> This adds emulation support for the following instruction:
>* Set Boolean (setb)
>
> Signed-off-by: Sathvika Vasireddy 
> ---
>  arch/powerpc/lib/sstep.c | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index c6aebc149d14..263c613d7490 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -1964,6 +1964,18 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
>   op->val = ~(regs->gpr[rd] | regs->gpr[rb]);
>   goto logical_done;
>  
> + case 128:   /* setb */
> + if (!cpu_has_feature(CPU_FTR_ARCH_300))
> + goto unknown_opcode;

Ok, if I've understood correctly...

> + ra = ra & ~0x3;

This masks off the bits of RA that are not part of BTF:

ra is in [0, 31] which is [0b00000, 0b11111]
Then ~0x3 = ~0b00011
ra = ra & 0b11100

This gives us then,
ra = btf << 2; or
btf = ra >> 2;

Let's then check to see if your calculations read the right fields.

> + if ((regs->ccr) & (1 << (31 - ra)))
> + op->val = -1;
> + else if ((regs->ccr) & (1 << (30 - ra)))
> + op->val = 1;
> + else
> + op->val = 0;


CR field:  76543210
bit:  0123 0123 0123 0123 0123 0123 0123 0123
normal bit #: 0...31
ibm bit #:    31...0

If btf = 0, ra = 0, check normal bits 31 and 30, which are both in CR0.
CR field:  76543210
bit:  0123 0123 0123 0123 0123 0123 0123 0123
   ^^

If btf = 7, ra = 0b11100 = 28, so check normal bits 31-28 and 30-28,
which are 3 and 2.

CR field:  76543210
bit:  0123 0123 0123 0123 0123 0123 0123 0123
^^

If btf = 3, ra = 0b01100 = 12, for normal bits 19 and 18:

CR field:  76543210
bit:  0123 0123 0123 0123 0123 0123 0123 0123
^^

So yes, your calculations, while I struggle to follow _how_ they work,
do in fact seem to work.

Checkpatch does have one complaint:

CHECK:UNNECESSARY_PARENTHESES: Unnecessary parentheses around 'regs->ccr'
#30: FILE: arch/powerpc/lib/sstep.c:1971:
+   if ((regs->ccr) & (1 << (31 - ra)))

I don't really mind the parentheses: I think you are safe to ignore
checkpatch here unless someone else complains :)

If you do end up respinning the patch, I think it would be good to make
the maths a bit clearer. I think it works because a left shift of 2 is
the same as multiplying by 4, but it would be easier to follow if you
used a temporary variable for btf.
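
Something like this is what I had in mind (sketch only, untested):

	int btf = ra >> 2;	/* BFA field: which CR field */
	int shift = 4 * btf;	/* bit offset of that field within the CR */

	if (regs->ccr & (0x80000000UL >> shift))	/* LT bit of the field */
		op->val = -1;
	else if (regs->ccr & (0x40000000UL >> shift))	/* GT bit */
		op->val = 1;
	else
		op->val = 0;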

However, I do think this is still worth adding to the kernel either way,
so:

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> + goto compute_done;
> +
>   case 154:   /* prtyw */
>   do_prty(regs, op, regs->gpr[rd], 32);
>   goto logical_done_nocc;
> -- 
> 2.16.4


Re: [PATCH] powerpc/pseries: extract host bridge from pci_bus prior to bus removal

2021-04-16 Thread Daniel Axtens
Hi Tyrel,

> The pci_bus->bridge reference may no longer be valid after
> pci_bus_remove() resulting in passing a bad value to device_unregister()
> for the associated bridge device.
>
> Store the host_bridge reference in a separate variable prior to
> pci_bus_remove().
>
The patch certainly seems to do what you say. I'm not really up on the
innards of PCI, so I'm struggling to figure out by what code path
pci_bus_remove() might invalidate pci_bus->bridge? A quick look at
pci_remove_bus was not very illuminating but I didn't chase down every
call it made.

Kind regards,
Daniel

> Fixes: 7340056567e3 ("powerpc/pci: Reorder pci bus/bridge unregistration during PHB removal")
> Signed-off-by: Tyrel Datwyler 
> ---
>  arch/powerpc/platforms/pseries/pci_dlpar.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
> index f9ae17e8a0f4..a8f9140a24fa 100644
> --- a/arch/powerpc/platforms/pseries/pci_dlpar.c
> +++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
> @@ -50,6 +50,7 @@ EXPORT_SYMBOL_GPL(init_phb_dynamic);
>  int remove_phb_dynamic(struct pci_controller *phb)
>  {
>   struct pci_bus *b = phb->bus;
> + struct pci_host_bridge *host_bridge = to_pci_host_bridge(b->bridge);
>   struct resource *res;
>   int rc, i;
>  
> @@ -76,7 +77,8 @@ int remove_phb_dynamic(struct pci_controller *phb)
>   /* Remove the PCI bus and unregister the bridge device from sysfs */
>   phb->bus = NULL;
>   pci_remove_bus(b);
> - device_unregister(b->bridge);
> + host_bridge->bus = NULL;
> + device_unregister(&host_bridge->dev);
>  
>   /* Now release the IO resource */
>   if (res->flags & IORESOURCE_IO)
> -- 
> 2.27.0


Re: [PATCH] soc: fsl: qe: remove unused function

2021-04-15 Thread Daniel Axtens
Hi Jiapeng,

> Fix the following clang warning:
>
> drivers/soc/fsl/qe/qe_ic.c:234:29: warning: unused function
> 'qe_ic_from_irq' [-Wunused-function].
>
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 
> ---
>  drivers/soc/fsl/qe/qe_ic.c | 5 -
>  1 file changed, 5 deletions(-)
>
> diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/soc/fsl/qe/qe_ic.c
> index 0390af9..b573712 100644
> --- a/drivers/soc/fsl/qe/qe_ic.c
> +++ b/drivers/soc/fsl/qe/qe_ic.c
> @@ -231,11 +231,6 @@ static inline void qe_ic_write(__be32  __iomem *base, unsigned int reg,
>   qe_iowrite32be(value, base + (reg >> 2));
>  }
>  
> -static inline struct qe_ic *qe_ic_from_irq(unsigned int virq)
> -{
> - return irq_get_chip_data(virq);
> -}

This seems good to me.

 * We know that this function can't be called directly from outside the
  file, because it is static.

 * The function address isn't used as a function pointer anywhere, so
   that means it can't be called from outside the file that way (also
   it's inline, which would make using a function pointer unwise!)

 * There are no obvious macros in that file that might construct the name
   of the function in a way that is hidden from grep.

All in all, I am fairly confident that the function is indeed not used.
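
(FWIW, the sort of check I mean - illustrative command and output:

    $ git grep -nw qe_ic_from_irq drivers/soc/fsl/qe/
    drivers/soc/fsl/qe/qe_ic.c:234:static inline struct qe_ic *qe_ic_from_irq(unsigned int virq)

i.e. nothing but the definition itself turns up.)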

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> -
>  static inline struct qe_ic *qe_ic_from_irq_data(struct irq_data *d)
>  {
>   return irq_data_get_irq_chip_data(d);
> -- 
> 1.8.3.1


Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()

2021-04-15 Thread Daniel Axtens
Hi Lakshmi,

> On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote:
>
> Sorry - missed copying device-tree and powerpc mailing lists.
>
>> There are a few "goto out;" statements before the local variable "fdt"
>> is initialized through the call to of_kexec_alloc_and_setup_fdt() in
>> elf64_load(). This will result in an uninitialized "fdt" being passed
>> to kvfree() in this function if there is an error before the call to
>> of_kexec_alloc_and_setup_fdt().
>> 
>> Initialize the local variable "fdt" to NULL.
>>
I'm a huge fan of initialising local variables! But I'm struggling to
find the code path that will lead to an uninit fdt being returned...

The out label reads in part:

/* Make kimage_file_post_load_cleanup free the fdt buffer for us. */
return ret ? ERR_PTR(ret) : fdt;

As far as I can tell, any time we get a non-zero ret, we're going to
return an error pointer rather than the uninitialised value...

(btw, it does look like we might leak fdt if we have an error after we
successfully kmalloc it.)

Am I missing something? Can you link to the report for the kernel test
robot or from Dan? 

FWIW, I think it's worth including this patch _anyway_ because initing
local variables is good practice, but I'm just not sure on the
justification.

Kind regards,
Daniel

>> Signed-off-by: Lakshmi Ramasubramanian 
>> Reported-by: kernel test robot 
>> Reported-by: Dan Carpenter 
>> ---
>>   arch/powerpc/kexec/elf_64.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
>> index 5a569bb51349..0051440c1f77 100644
>> --- a/arch/powerpc/kexec/elf_64.c
>> +++ b/arch/powerpc/kexec/elf_64.c
>> @@ -32,7 +32,7 @@ static void *elf64_load(struct kimage *image, char 
>> *kernel_buf,
>>  int ret;
>>  unsigned long kernel_load_addr;
>>  unsigned long initrd_load_addr = 0, fdt_load_addr;
>> -void *fdt;
>> +void *fdt = NULL;
>>  const void *slave_code;
>>  struct elfhdr ehdr;
>>  char *modified_cmdline = NULL;
>> 
>
> thanks,
>   -lakshmi


Re: [PATCH] symbol : Make the size of the compile-related array fixed

2021-04-15 Thread Daniel Axtens
Hi,

Thanks for your contribution to the kernel!

I notice that your patch is submitted as an attachment. In future,
please could you submit your patch inline, rather than as an attachment?
See https://www.kernel.org/doc/html/v4.15/process/5.Posting.html
I'd recommend you use git send-email if possible: see e.g.
https://www.kernel.org/doc/html/v4.15/process/email-clients.html

> Subject: [PATCH] symbol : Make the size of the compile-related array fixed
>
> For the same code, the machine's user name, hostname, or compilation time
> may cause the kernel symbol address to be inconsistent, which is not
> friendly to some symbol-dependent software, such as Crash.

If I understand correctly, this patch makes it easier to recompile the
kernel from the same source but at a different time or on a different
machine or with a different user, but still get the same symbols.
Is that right?

I wonder if there are other reproducible build techniques that might be
simpler to apply? There is a kernel documentation page at
https://www.kernel.org/doc/html/latest/kbuild/reproducible-builds.html
which gives existing techniques to override the date, user and host.
Would they be sufficient to address your use case?
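
For example, something along these lines before building should pin
down the user/host/time parts of those strings:

    export KBUILD_BUILD_TIMESTAMP='Thu Jan  1 00:00:00 UTC 1970'
    export KBUILD_BUILD_USER=build
    export KBUILD_BUILD_HOST=buildhost
    make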

Kind regards,
Daniel

>
> Signed-off-by: Han Dapeng 
> ---
>  arch/powerpc/mm/nohash/kaslr_booke.c | 2 +-
>  arch/s390/boot/version.c | 2 +-
>  arch/x86/boot/compressed/kaslr.c | 2 +-
>  arch/x86/boot/version.c  | 2 +-
>  init/version.c   | 4 ++--
>  scripts/mkcompile_h  | 2 ++
>  6 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c b/arch/powerpc/mm/nohash/kaslr_booke.c
> index 4c74e8a5482b..494ef408e60c 100644
> --- a/arch/powerpc/mm/nohash/kaslr_booke.c
> +++ b/arch/powerpc/mm/nohash/kaslr_booke.c
> @@ -37,7 +37,7 @@ struct regions {
>  };
>  
>  /* Simplified build-specific string for starting entropy. */
> -static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
> +static const char build_str[COMPILE_STR_MAX] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
>   LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
>  
>  struct regions __initdata regions;
> diff --git a/arch/s390/boot/version.c b/arch/s390/boot/version.c
> index d32e58bdda6a..627416a27d74 100644
> --- a/arch/s390/boot/version.c
> +++ b/arch/s390/boot/version.c
> @@ -3,5 +3,5 @@
>  #include 
>  #include "boot.h"
>  
> -const char kernel_version[] = UTS_RELEASE
> +const char kernel_version[COMPILE_STR_MAX] = UTS_RELEASE
>   " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ") " UTS_VERSION;
> diff --git a/arch/x86/boot/compressed/kaslr.c 
> b/arch/x86/boot/compressed/kaslr.c
> index b92fffbe761f..7b72b518a4c8 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -43,7 +43,7 @@
>  extern unsigned long get_cmd_line_ptr(void);
>  
>  /* Simplified build-specific string for starting entropy. */
> -static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
> +static const char build_str[COMPILE_STR_MAX] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
>   LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
>  
>  static unsigned long rotate_xor(unsigned long hash, const void *area,
> diff --git a/arch/x86/boot/version.c b/arch/x86/boot/version.c
> index a1aaaf6c06a6..08feaa2d7a10 100644
> --- a/arch/x86/boot/version.c
> +++ b/arch/x86/boot/version.c
> @@ -14,6 +14,6 @@
>  #include 
>  #include 
>  
> -const char kernel_version[] =
> +const char kernel_version[COMPILE_STR_MAX] =
>   UTS_RELEASE " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ") "
>   UTS_VERSION;
> diff --git a/init/version.c b/init/version.c
> index 92afc782b043..adfc9e91b56b 100644
> --- a/init/version.c
> +++ b/init/version.c
> @@ -35,11 +35,11 @@ struct uts_namespace init_uts_ns = {
>  EXPORT_SYMBOL_GPL(init_uts_ns);
>  
>  /* FIXED STRINGS! Don't touch! */
> -const char linux_banner[] =
> +const char linux_banner[COMPILE_STR_MAX] =
>   "Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@"
>   LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n";
>  
> -const char linux_proc_banner[] =
> +const char linux_proc_banner[COMPILE_STR_MAX] =
>   "%s version %s"
>   " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ")"
>   " (" LINUX_COMPILER ") %s\n";
> diff --git a/scripts/mkcompile_h b/scripts/mkcompile_h
> index 4ae735039daf..02b9d9d54da9 100755
> --- a/scripts/mkcompile_h
> +++ b/scripts/mkcompile_h
> @@ -65,6 +65,8 @@ UTS_VERSION="$(echo $UTS_VERSION $CONFIG_FLAGS $TIMESTAMP | 
> cut -b -$UTS_LEN)"
>LD_VERSION=$($LD -v | head -n1 | sed 's/(compatible with [^)]*)//' \
> | sed 's/[[:space:]]*$//')
>printf '#define LINUX_COMPILER "%s"\n' "$CC_VERSION, $LD_VERSION"
> +
> +  echo \#define COMPILE_STR_MAX 512
>  } > .tmpcompile
>  
>  # Only replace the real compile.h if the new one is different,
> -- 
> 2.27.0


Re: [PATCH v1 4/5] mm: ptdump: Support hugepd table entries

2021-04-15 Thread Daniel Axtens
Hi Christophe,

> With hugepd, page table entries can be at any level
> and can be of any size.
>
> Add support for them.
>
> Signed-off-by: Christophe Leroy 
> ---
>  mm/ptdump.c | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/mm/ptdump.c b/mm/ptdump.c
> index 61cd16afb1c8..6efdb8c15a7d 100644
> --- a/mm/ptdump.c
> +++ b/mm/ptdump.c
> @@ -112,11 +112,24 @@ static int ptdump_pte_entry(pte_t *pte, unsigned long 
> addr,
>  {
>   struct ptdump_state *st = walk->private;
>   pte_t val = ptep_get(pte);
> + unsigned long page_size = next - addr;
> + int level;
> +
> + if (page_size >= PGDIR_SIZE)
> + level = 0;
> + else if (page_size >= P4D_SIZE)
> + level = 1;
> + else if (page_size >= PUD_SIZE)
> + level = 2;
> + else if (page_size >= PMD_SIZE)
> + level = 3;
> + else
> + level = 4;
>  
>   if (st->effective_prot)
> - st->effective_prot(st, 4, pte_val(val));
> + st->effective_prot(st, level, pte_val(val));
>  
> - st->note_page(st, addr, 4, pte_val(val), PAGE_SIZE);
> + st->note_page(st, addr, level, pte_val(val), page_size);

It seems to me that passing both level and page_size is a bit redundant,
but I guess it does reduce the impact on each arch's code?

Kind regards,
Daniel

>  
>   return 0;
>  }
> -- 
> 2.25.0


Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

2021-04-15 Thread Daniel Axtens
Hi Christophe,

>  static void note_page(struct ptdump_state *pt_st, unsigned long addr, int 
> level,
> -   u64 val)
> +   u64 val, unsigned long page_size)

Compilers can warn about unused parameters at -Wextra level.  However,
reading scripts/Makefile.extrawarn it looks like the warning is
explicitly _disabled_ in the kernel at W=1 and not reenabled at W=2 or
W=3. So I guess this is fine...

> @@ -126,7 +126,7 @@ static int ptdump_hole(unsigned long addr, unsigned long 
> next,
>  {
>   struct ptdump_state *st = walk->private;
>  
> - st->note_page(st, addr, depth, 0);
> + st->note_page(st, addr, depth, 0, 0);

I know it doesn't matter at this point, but I'm not really thrilled by
the idea of passing 0 as the size here. Doesn't the hole have a known
page size?

>  
>   return 0;
>  }
> @@ -153,5 +153,5 @@ void ptdump_walk_pgd(struct ptdump_state *st, struct 
> mm_struct *mm, pgd_t *pgd)
>   mmap_read_unlock(mm);
>  
>   /* Flush out the last page */
> - st->note_page(st, 0, -1, 0);
> + st->note_page(st, 0, -1, 0, 0);

I'm more OK with the idea of passing 0 as the size when the depth is -1
(don't know): if we don't know the depth we conceptually can't know the
page size.

Regards,
Daniel



Re: [PATCH v1 1/5] mm: pagewalk: Fix walk for hugepage tables

2021-04-15 Thread Daniel Axtens
Hi Christophe,

> Pagewalk ignores hugepd entries and walks down the tables
> as if they were traditional entries, leading to crazy results.
>
> Add walk_hugepd_range() and use it to walk hugepage tables.
>
> Signed-off-by: Christophe Leroy 
> ---
>  mm/pagewalk.c | 54 +--
>  1 file changed, 48 insertions(+), 6 deletions(-)
>
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index e81640d9f177..410a9d8f7572 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -58,6 +58,32 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, 
> unsigned long end,
>   return err;
>  }
>  
> +static int walk_hugepd_range(hugepd_t *phpd, unsigned long addr,
> +  unsigned long end, struct mm_walk *walk, int 
> pdshift)
> +{
> + int err = 0;
> +#ifdef CONFIG_ARCH_HAS_HUGEPD
> + const struct mm_walk_ops *ops = walk->ops;
> + int shift = hugepd_shift(*phpd);
> + int page_size = 1 << shift;
> +
> + if (addr & (page_size - 1))
> + return 0;
> +
> + for (;;) {
> + pte_t *pte = hugepte_offset(*phpd, addr, pdshift);
> +
> + err = ops->pte_entry(pte, addr, addr + page_size, walk);
> + if (err)
> + break;
> + if (addr >= end - page_size)
> + break;
> + addr += page_size;
> + }

Initially I thought this was a somewhat unintuitive way to structure
this loop, but I see it parallels the structure of walk_pte_range_inner,
so I think the consistency is worth it.

I notice the pte walking code potentially takes some locks: does this
code need to do that?

arch/powerpc/mm/hugetlbpage.c says that hugepds are protected by the
mm->page_table_lock, but I don't think we're taking it in this code.
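
If it is needed, I'd expect the loop to be bracketed roughly like this
(an untested sketch - it also assumes the pte_entry callbacks are safe
to call under a spinlock, which I haven't verified):

    spin_lock(&walk->mm->page_table_lock);
    for (;;) {
            pte_t *pte = hugepte_offset(*phpd, addr, pdshift);

            err = ops->pte_entry(pte, addr, addr + page_size, walk);
            if (err)
                    break;
            if (addr >= end - page_size)
                    break;
            addr += page_size;
    }
    spin_unlock(&walk->mm->page_table_lock);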

> +#endif
> + return err;
> +}
> +
>  static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
> struct mm_walk *walk)
>  {
> @@ -108,7 +134,10 @@ static int walk_pmd_range(pud_t *pud, unsigned long 
> addr, unsigned long end,
>   goto again;
>   }
>  
> - err = walk_pte_range(pmd, addr, next, walk);
> + if (is_hugepd(__hugepd(pmd_val(*pmd))))
> + err = walk_hugepd_range((hugepd_t *)pmd, addr, next, 
> walk, PMD_SHIFT);
> + else
> + err = walk_pte_range(pmd, addr, next, walk);
>   if (err)
>   break;
>   } while (pmd++, addr = next, addr != end);
> @@ -157,7 +186,10 @@ static int walk_pud_range(p4d_t *p4d, unsigned long 
> addr, unsigned long end,
>   if (pud_none(*pud))
>   goto again;
>  
> - err = walk_pmd_range(pud, addr, next, walk);
> + if (is_hugepd(__hugepd(pud_val(*pud))))
> + err = walk_hugepd_range((hugepd_t *)pud, addr, next, 
> walk, PUD_SHIFT);
> + else
> + err = walk_pmd_range(pud, addr, next, walk);

I'm a bit worried you might end up calling into walk_hugepd_range with
ops->pte_entry == NULL, and then jumping to 0.

static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
  struct mm_walk *walk)
{
...
pud = pud_offset(p4d, addr);
do {
...
if ((!walk->vma && (pud_leaf(*pud) || !pud_present(*pud))) ||
walk->action == ACTION_CONTINUE ||
!(ops->pmd_entry || ops->pte_entry)) <<< THIS CHECK
continue;
...
if (is_hugepd(__hugepd(pud_val(*pud))))
err = walk_hugepd_range((hugepd_t *)pud, addr, next, 
walk, PUD_SHIFT);
else
err = walk_pmd_range(pud, addr, next, walk);
if (err)
break;
} while (pud++, addr = next, addr != end);

walk_pud_range will proceed if there is _either_ an ops->pmd_entry _or_
an ops->pte_entry, but walk_hugepd_range will call ops->pte_entry
unconditionally.

The same issue applies to walk_{p4d,pgd}_range...
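
One way to be defensive in walk_hugepd_range itself would be a guard
along these lines (sketch only - whether silently skipping the entry is
the right semantic here is a separate question):

    pte_t *pte = hugepte_offset(*phpd, addr, pdshift);

    if (ops->pte_entry)
            err = ops->pte_entry(pte, addr, addr + page_size, walk);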

Kind regards,
Daniel


Re: [PATCH v2 2/4] powerpc/selftests/perf-hwbreak: Coalesce event creation code

2021-04-09 Thread Daniel Axtens
Hi Ravi,

> perf-hwbreak selftest opens hw-breakpoint event at multiple places for
> which it has same code repeated. Coalesce that code into a function.
>
> Signed-off-by: Ravi Bangoria 
> ---
>  .../selftests/powerpc/ptrace/perf-hwbreak.c   | 78 +--

This doesn't simplify things very much for the code as it stands now,
but I think your next patch adds a bunch of calls to these functions, so
I agree that it makes sense to consolidate them now.

>  1 file changed, 38 insertions(+), 40 deletions(-)
>
> diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c 
> b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
> index c1f324afdbf3..bde475341c8a 100644
> --- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
> +++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
> @@ -34,28 +34,46 @@
>  
>  #define DAWR_LENGTH_MAX ((0x3f + 1) * 8)
>  
> -static inline int sys_perf_event_open(struct perf_event_attr *attr, pid_t 
> pid,
> -   int cpu, int group_fd,
> -   unsigned long flags)
> +static void perf_event_attr_set(struct perf_event_attr *attr,
> + __u32 type, __u64 addr, __u64 len,
> + bool exclude_user)
>  {
> - attr->size = sizeof(*attr);
> - return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
> + memset(attr, 0, sizeof(struct perf_event_attr));
> + attr->type   = PERF_TYPE_BREAKPOINT;
> + attr->size   = sizeof(struct perf_event_attr);
> + attr->bp_type= type;
> + attr->bp_addr= addr;
> + attr->bp_len = len;
> + attr->exclude_kernel = 1;
> + attr->exclude_hv = 1;
> + attr->exclude_guest  = 1;

Only one of the existing perf_event_open() calls sets
exclude_{kernel,hv,guest} - I assume
there's no issue with setting them always but I wanted to check.

> + attr->exclude_user   = exclude_user;
> + attr->disabled   = 1;
>  }
>  
> - /* setup counters */
> - memset(&attr, 0, sizeof(attr));
> - attr.disabled = 1;
> - attr.type = PERF_TYPE_BREAKPOINT;
> - attr.bp_type = readwriteflag;
> - attr.bp_addr = (__u64)ptr;
> - attr.bp_len = sizeof(int);
> - if (arraytest)
> - attr.bp_len = DAWR_LENGTH_MAX;
> - attr.exclude_user = exclude_user;
> - break_fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
> + break_fd = perf_process_event_open_exclude_user(readwriteflag, 
> (__u64)ptr,
> + arraytest ? DAWR_LENGTH_MAX : sizeof(int),
> + exclude_user);

checkpatch doesn't like this very much:

CHECK: Alignment should match open parenthesis
#103: FILE: tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c:115:
+   break_fd = perf_process_event_open_exclude_user(readwriteflag, 
(__u64)ptr,
+   arraytest ? DAWR_LENGTH_MAX : sizeof(int),
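
i.e. checkpatch wants the continuation lines lined up under the opening
parenthesis, something like:

    break_fd = perf_process_event_open_exclude_user(readwriteflag, (__u64)ptr,
                                                    arraytest ? DAWR_LENGTH_MAX : sizeof(int),
                                                    exclude_user);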

Apart from that, this seems good but I haven't checked in super fine
detail just yet :)

Kind regards,
Daniel


Re: [PATCH v2 1/4] powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR

2021-04-08 Thread Daniel Axtens
Hi Ravi,

> Add selftests to test multiple active DAWRs with ptrace interface.

It would be good if somewhere (maybe in the cover letter) you explain
what DAWR stands for and where to find more information about it. I
found the Power ISA v3.1 Book 3 Chapter 9 very helpful.

Apart from that, I don't have any specific comments about this patch. It
looks good to me, it seems to do what it says, and there are no comments
from checkpatch. It is a bit sparse in terms of comments but it is
consistent with the rest of the file so I can't really complain there :)

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> Sample o/p:
>   $ ./ptrace-hwbreak
>   ...
>   PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO, len: 6: Ok
>   PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO, len: 6: Ok
>   PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO, len: 6: Ok
>   PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO, len: 6: Ok
>
> Signed-off-by: Ravi Bangoria 
> ---
>  .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 79 +++
>  1 file changed, 79 insertions(+)
>
> diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c 
> b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
> index 2e0d86e0687e..a0635a3819aa 100644
> --- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
> +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
> @@ -194,6 +194,18 @@ static void test_workload(void)
>   big_var[rand() % DAWR_MAX_LEN] = 'a';
>   else
>   cvar = big_var[rand() % DAWR_MAX_LEN];
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO test */
> + gstruct.a[rand() % A_LEN] = 'a';
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO test */
> + cvar = gstruct.b[rand() % B_LEN];
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO test */
> + gstruct.a[rand() % A_LEN] = 'a';
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO test */
> + cvar = gstruct.a[rand() % A_LEN];
>  }
>  
>  static void check_success(pid_t child_pid, const char *name, const char 
> *type,
> @@ -417,6 +429,69 @@ static void test_sethwdebug_range_aligned(pid_t 
> child_pid)
>   ptrace_delhwdebug(child_pid, wh);
>  }
>  
> +static void test_multi_sethwdebug_range(pid_t child_pid)
> +{
> + struct ppc_hw_breakpoint info1, info2;
> + unsigned long wp_addr1, wp_addr2;
> + char *name1 = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED";
> + char *name2 = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED";
> + int len1, len2;
> + int wh1, wh2;
> +
> + wp_addr1 = (unsigned long)&gstruct.a;
> + wp_addr2 = (unsigned long)&gstruct.b;
> + len1 = A_LEN;
> + len2 = B_LEN;
> + get_ppc_hw_breakpoint(&info1, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr1, 
> len1);
> + get_ppc_hw_breakpoint(&info2, PPC_BREAKPOINT_TRIGGER_READ, wp_addr2, 
> len2);
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO test */
> + wh1 = ptrace_sethwdebug(child_pid, &info1);
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO test */
> + wh2 = ptrace_sethwdebug(child_pid, &info2);
> +
> + ptrace(PTRACE_CONT, child_pid, NULL, 0);
> + check_success(child_pid, name1, "WO", wp_addr1, len1);
> +
> + ptrace(PTRACE_CONT, child_pid, NULL, 0);
> + check_success(child_pid, name2, "RO", wp_addr2, len2);
> +
> + ptrace_delhwdebug(child_pid, wh1);
> + ptrace_delhwdebug(child_pid, wh2);
> +}
> +
> +static void test_multi_sethwdebug_range_dawr_overlap(pid_t child_pid)
> +{
> + struct ppc_hw_breakpoint info1, info2;
> + unsigned long wp_addr1, wp_addr2;
> + char *name = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap";
> + int len1, len2;
> + int wh1, wh2;
> +
> + wp_addr1 = (unsigned long)&gstruct.a;
> + wp_addr2 = (unsigned long)&gstruct.a;
> + len1 = A_LEN;
> + len2 = A_LEN;
> + get_ppc_hw_breakpoint(&info1, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr1, 
> len1);
> + get_ppc_hw_breakpoint(&info2, PPC_BREAKPOINT_TRIGGER_READ, wp_addr2, 
> len2);
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO test */
> + wh1 = ptrace_sethwdebug(child_pid, &info1);
> +
> + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO test */
> + wh2 = ptrace_sethwdebug(child_pid, &info2);
> +
> + ptrace(PTRACE_CONT, child_pid, NULL, 0);
> + check_success(child_pid, name, "WO", wp_addr1, len1);
> +
> + ptrace(P

Re: [PATCH] powerpc/iommu/debug: Remove redundant NULL check

2021-03-25 Thread Daniel Axtens
Daniel Axtens  writes:

It looks like the kernel test robot also reported this:
"[PATCH] powerpc/iommu/debug: fix ifnullfree.cocci warnings"
Weirdly I don't see it in patchwork.

I'm not sure which one mpe will want to take but either would do.

>> Fix the following coccicheck warnings:
>>
>> ./fs/io_uring.c:5989:4-9: WARNING: NULL check before some freeing
>> functions is not needed.

(Also, while unimportant, that's technically not the error you fix here
as it's for a different file!)

>
> This looks correct to me, and matches the description of debugfs_remove
> in Documentation/filesystems/debugfs.rst.
>
> If you have a number of similar fixes it might be helpful to do them in
> a single bigger patch, but I'm not sure if coccicheck reports much else
> as I don't have coccinelle installed at the moment.
>
> Reviewed-by: Daniel Axtens 
>
> Kind regards,
> Daniel


Re: [PATCH] crypto: vmx: fix incorrect kernel-doc comment syntax in files

2021-03-25 Thread Daniel Axtens
Hi Aditya,

Thanks for your patch!

> The opening comment mark '/**' is used for highlighting the beginning of
> kernel-doc comments.
> There are certain files in drivers/crypto/vmx, which follow this syntax,
> but the content inside does not comply with kernel-doc.
> Such lines were probably not meant for kernel-doc parsing, but are parsed
> due to the presence of kernel-doc like comment syntax(i.e, '/**'), which
> causes unexpected warnings from kernel-doc.
>
> E.g., presence of kernel-doc like comment in the header line for
> drivers/crypto/vmx/vmx.c causes this warning by kernel-doc:
>
> "warning: expecting prototype for Routines supporting VMX instructions on the 
> Power 8(). Prototype was for p8_init() instead"

checkpatch (scripts/checkpatch.pl --strict -g HEAD) complains about this line:
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per 
line)
but checkpatch should be ignored here, as you did the right thing by not
breaking an error message across multiple lines.

> Similarly for other files too.
>
> Provide a simple fix by replacing such occurrences with general comment
> format, i.e. '/*', to prevent kernel-doc from parsing it.

This makes sense.
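
For contrast, the shape kernel-doc actually expects from a '/**'
comment is roughly this (illustrative only, not taken from the driver):

    /**
     * p8_init() - register the VMX accelerated algorithms
     *
     * Return: 0 on success, negative errno otherwise.
     */

Anything that opens with '/**' but doesn't follow that shape is fair
game for the warning you quoted.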

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

>
> Signed-off-by: Aditya Srivastava 
> ---
> * Applies perfectly on next-20210319
>
>  drivers/crypto/vmx/aes.c | 2 +-
>  drivers/crypto/vmx/aes_cbc.c | 2 +-
>  drivers/crypto/vmx/aes_ctr.c | 2 +-
>  drivers/crypto/vmx/aes_xts.c | 2 +-
>  drivers/crypto/vmx/ghash.c   | 2 +-
>  drivers/crypto/vmx/vmx.c | 2 +-
>  6 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/crypto/vmx/aes.c b/drivers/crypto/vmx/aes.c
> index d05c02baebcf..ec06189fbf99 100644
> --- a/drivers/crypto/vmx/aes.c
> +++ b/drivers/crypto/vmx/aes.c
> @@ -1,5 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0-only
> -/**
> +/*
>   * AES routines supporting VMX instructions on the Power 8
>   *
>   * Copyright (C) 2015 International Business Machines Inc.
> diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c
> index d88084447f1c..ed0debc7acb5 100644
> --- a/drivers/crypto/vmx/aes_cbc.c
> +++ b/drivers/crypto/vmx/aes_cbc.c
> @@ -1,5 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0-only
> -/**
> +/*
>   * AES CBC routines supporting VMX instructions on the Power 8
>   *
>   * Copyright (C) 2015 International Business Machines Inc.
> diff --git a/drivers/crypto/vmx/aes_ctr.c b/drivers/crypto/vmx/aes_ctr.c
> index 79ba062ee1c1..9a3da8cd62f3 100644
> --- a/drivers/crypto/vmx/aes_ctr.c
> +++ b/drivers/crypto/vmx/aes_ctr.c
> @@ -1,5 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0-only
> -/**
> +/*
>   * AES CTR routines supporting VMX instructions on the Power 8
>   *
>   * Copyright (C) 2015 International Business Machines Inc.
> diff --git a/drivers/crypto/vmx/aes_xts.c b/drivers/crypto/vmx/aes_xts.c
> index 9fee1b1532a4..dabbccb41550 100644
> --- a/drivers/crypto/vmx/aes_xts.c
> +++ b/drivers/crypto/vmx/aes_xts.c
> @@ -1,5 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0-only
> -/**
> +/*
>   * AES XTS routines supporting VMX In-core instructions on Power 8
>   *
>   * Copyright (C) 2015 International Business Machines Inc.
> diff --git a/drivers/crypto/vmx/ghash.c b/drivers/crypto/vmx/ghash.c
> index 14807ac2e3b9..5bc5710a6de0 100644
> --- a/drivers/crypto/vmx/ghash.c
> +++ b/drivers/crypto/vmx/ghash.c
> @@ -1,5 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0
> -/**
> +/*
>   * GHASH routines supporting VMX instructions on the Power 8
>   *
>   * Copyright (C) 2015, 2019 International Business Machines Inc.
> diff --git a/drivers/crypto/vmx/vmx.c b/drivers/crypto/vmx/vmx.c
> index a40d08e75fc0..7eb713cc87c8 100644
> --- a/drivers/crypto/vmx/vmx.c
> +++ b/drivers/crypto/vmx/vmx.c
> @@ -1,5 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0-only
> -/**
> +/*
>   * Routines supporting VMX instructions on the Power 8
>   *
>   * Copyright (C) 2015 International Business Machines Inc.
> -- 
> 2.17.1


Re: [PATCH] powerpc/iommu/debug: Remove redundant NULL check

2021-03-25 Thread Daniel Axtens
Hi Jiapeng Chong,

Jiapeng Chong writes:

> Fix the following coccicheck warnings:
>
> ./fs/io_uring.c:5989:4-9: WARNING: NULL check before some freeing
> functions is not needed.

This looks correct to me, and matches the description of debugfs_remove
in Documentation/filesystems/debugfs.rst.
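
For anyone reading along without the hunk in front of them, the
transformation is the usual one (variable name invented here, since the
diff isn't quoted above):

    if (de)                 /* redundant: debugfs_remove(NULL) is a no-op */
            debugfs_remove(de);

    /* can simply become */
    debugfs_remove(de);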

If you have a number of similar fixes it might be helpful to do them in
a single bigger patch, but I'm not sure if coccicheck reports much else
as I don't have coccinelle installed at the moment.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel


Re: [PATCH] [v2] tools: testing: Remove duplicate includes

2021-03-25 Thread Daniel Axtens
Wan Jiabing  writes:

> sched.h has been included at line 33, so remove the 
> duplicate one at line 36.

> pthread.h has been included at line 17,so remove the 
> duplicate one at line 20.

I can see that both of these are correct from the diff.

> inttypes.h has been included at line 19, so remove the 
> duplicate one at line 23.

For this one I checked the file. Indeed there is another inttypes.h, so
this is also correct.

Again, all the automated checks pass (although I don't think any of the
automated builds include selftests).

So:
Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

>
> Signed-off-by: Wan Jiabing 
> ---
>  tools/testing/selftests/powerpc/mm/tlbie_test.c | 1 -
>  tools/testing/selftests/powerpc/tm/tm-poison.c  | 1 -
>  tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c | 1 -
>  3 files changed, 3 deletions(-)
>
> diff --git a/tools/testing/selftests/powerpc/mm/tlbie_test.c 
> b/tools/testing/selftests/powerpc/mm/tlbie_test.c
> index f85a0938ab25..48344a74b212 100644
> --- a/tools/testing/selftests/powerpc/mm/tlbie_test.c
> +++ b/tools/testing/selftests/powerpc/mm/tlbie_test.c
> @@ -33,7 +33,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> diff --git a/tools/testing/selftests/powerpc/tm/tm-poison.c 
> b/tools/testing/selftests/powerpc/tm/tm-poison.c
> index 29e5f26af7b9..27c083a03d1f 100644
> --- a/tools/testing/selftests/powerpc/tm/tm-poison.c
> +++ b/tools/testing/selftests/powerpc/tm/tm-poison.c
> @@ -20,7 +20,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include "tm.h"
>  
> diff --git a/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c 
> b/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
> index e2a0c07e8362..9ef37a9836ac 100644
> --- a/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
> +++ b/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
> @@ -17,7 +17,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include "tm.h"
>  #include "utils.h"
> -- 
> 2.25.1


Re: [PATCH] [v2] arch: powerpc: Remove duplicate includes

2021-03-25 Thread Daniel Axtens
Wan Jiabing  writes:

> mmu-hash.h: asm/bug.h has been included at line 12, so remove 
> the duplicate one at line 21.

Looking at the file I had wondered if this was due to a #ifdef being
removed, but no, the second one was just added in commit 891121e6c02c
("powerpc/mm: Differentiate between hugetlb and THP during page
walk"). How odd!

Anyway, all of these look good to me, and the automated checks at
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210323062916.295346-1-wanjiab...@vivo.com/
have all passed.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel


Re: [PATCH v2 1/1] hotplug-cpu.c: show 'last online CPU' error in dlpar_cpu_offline()

2021-03-25 Thread Daniel Axtens
Hi Daniel,

Two small nitpicks:

> This patch adds a 'last online' check in dlpar_cpu_offline() to catch
> the 'last online CPU' offline error, eturning a more informative error
   ^--- s/eturning/returning/;


> + /* device_offline() will return -EBUSY (via cpu_down())
> +  * if there is only one CPU left. Check it here to fail
> +  * earlier and with a more informative error message,
> +  * while also retaining the cpu_add_remove_lock to be 
> sure
> +  * that no CPUs are being online/offlined during this
> +  * check. */

Checkpatch has a small issue with this comment:

WARNING: Block comments use a trailing */ on a separate line
#50: FILE: arch/powerpc/platforms/pseries/hotplug-cpu.c:279:
+* check. */
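
i.e. kernel style wants the comment shaped like:

    /*
     * device_offline() will return -EBUSY (via cpu_down())
     * if there is only one CPU left. Check it here to fail
     * earlier and with a more informative error message,
     * while also retaining the cpu_add_remove_lock to be sure
     * that no CPUs are being online/offlined during this
     * check.
     */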

Apart from that, this patch seems sane to me, but I haven't been able to
test it.

Kind regards,
Daniel