[PATCH] powerpc/smp: poll cpu_callin_map more aggressively in __cpu_up()

2022-01-24 Thread Nathan Lynch
Replace the outdated iteration and timeout calculations here with an
indefinite spin_until_cond()-wrapped poll of cpu_callin_map. __cpu_up()
already does this when waiting for the cpu to set its online bit before
returning, so this change is not really making the function more brittle.

Removing the msleep(1) in the hotplug path here reduces the time it takes
to online a CPU on a P9 PowerVM LPAR from about 30ms to 1ms when exercised
via thaw_secondary_cpus().

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/kernel/smp.c | 25 ++---
 1 file changed, 2 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index b7fd6a72aa76..990893365fe0 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1270,7 +1270,7 @@ static void cpu_idle_thread_init(unsigned int cpu, struct 
task_struct *idle)
 
 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
-   int rc, c;
+   int rc;
 
/*
 * Don't allow secondary threads to come online if inhibited
@@ -1314,28 +1314,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
return rc;
}
 
-   /*
-* wait to see if the cpu made a callin (is actually up).
-* use this value that I found through experimentation.
-* -- Cort
-*/
-   if (system_state < SYSTEM_RUNNING)
-   for (c = 5; c && !cpu_callin_map[cpu]; c--)
-   udelay(100);
-#ifdef CONFIG_HOTPLUG_CPU
-   else
-   /*
-* CPUs can take much longer to come up in the
-* hotplug case.  Wait five seconds.
-*/
-   for (c = 5000; c && !cpu_callin_map[cpu]; c--)
-   msleep(1);
-#endif
-
-   if (!cpu_callin_map[cpu]) {
-   printk(KERN_ERR "Processor %u is stuck.\n", cpu);
-   return -ENOENT;
-   }
+   spin_until_cond(cpu_callin_map[cpu] != 0);
 
DBG("Processor %u found.\n", cpu);
 
-- 
2.34.1



Re: [PATCH v4 2/7] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c

2022-01-24 Thread Oscar Salvador

On 2022-01-19 20:06, Zi Yan wrote:

From: Zi Yan 

has_unmovable_pages() is only used in mm/page_isolation.c. Move it from
mm/page_alloc.c and make it static.

Signed-off-by: Zi Yan 


Reviewed-by: Oscar Salvador 

--
Oscar Salvador
SUSE Labs


Re: [PATCH 0/2] powerpc: Disable syscall emulation and stepping

2022-01-24 Thread Christophe Leroy


Le 25/01/2022 à 04:04, Nicholas Piggin a écrit :
> +Naveen (sorry missed cc'ing you at first)
> 
> Excerpts from Christophe Leroy's message of January 24, 2022 4:39 pm:
>>
>>
>> Le 24/01/2022 à 06:57, Nicholas Piggin a écrit :
>>> As discussed previously
>>>
>>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/238946.html
>>>
>>> I'm wondering whether PPC32 should be returning -1 for syscall
>>> instructions too here? That could be done in another patch anyway.
>>>
>>
>> The 'Programming Environments Manual for 32-Bit Implementations of the
>> PowerPC™ Architecture' says:
>>
>> The following are not traced:
>> • rfi instruction
>> • sc and trap instructions that trap
>> • Other instructions that cause interrupts (other than trace interrupts)
>> • The first instruction of any interrupt handler
>> • Instructions that are emulated by software
>>
>>
>> So I think PPC32 should return -1 as well.
> 
> I agree.
> 
> What about the trap instructions? analyse_instr returns 0 for them
> which falls through to return 0 for emulate_step; should they
> return -1 as well, or am I missing something?
> 

For the traps I don't know. The manual says "trap instructions that 
trap" are not traced. It means that "trap instructions that _don't_ 
trap" are traced. Taking into account that trap instructions don't trap 
at least 99.9% of the time, I'm not sure returning -1 is needed.

Although that'd probably be the safest.

But then what happens with other instructions that will sparsely generate 
an exception, like a DSI or so? If we do it for the traps then we should 
do it for those as well, and then it becomes a never-ending story.

So in the end it's probably ok to return 0, both for them and for traps.

Christophe

Re: [PATCH 6/7] modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC

2022-01-24 Thread Christophe Leroy


Le 24/01/2022 à 22:43, Doug Anderson a écrit :
> Hi,
> 
> On Mon, Jan 24, 2022 at 1:22 AM Christophe Leroy
>  wrote:
>>
>> --- a/kernel/debug/kdb/kdb_main.c
>> +++ b/kernel/debug/kdb/kdb_main.c
>> @@ -2022,8 +2022,11 @@ static int kdb_lsmod(int argc, const char **argv)
>>  if (mod->state == MODULE_STATE_UNFORMED)
>>  continue;
>>
>> -   kdb_printf("%-20s%8u  0x%px ", mod->name,
>> -  mod->core_layout.size, (void *)mod);
>> +   kdb_printf("%-20s%8u", mod->name, mod->core_layout.size);
>> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
>> +   kdb_printf("/%8u  0x%px ", mod->data_layout.size);
> 
> Just counting percentages and arguments, it seems like something's
> wrong in the above print statement.
> 

Yes, so it seems; the build robot reported something here as well.

Thanks
Christophe

Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-24 Thread Sachin Sant


> On 24-Jan-2022, at 10:15 PM, Steven Rostedt  wrote:
> 
> On Mon, 24 Jan 2022 20:15:06 +0800
> Yinan Liu  wrote:
> 
>> Hi, Steven and Sachin
>> 
>> I don't have a powerpc machine for testing, but I guess ppc has a 
>> similar problem to the s390. It's not clear to me why the compiler 
>> does this. Maybe we can handle ppc like you did with the s390 before, 
>> but I'm not sure whether other architectures have similar issues. Or 
>> should we limit BUILDTIME_MCOUNT_SORT to a smaller scope and make it 
>> only available for x86 and arm?
>> 
>> Steven, what's your opinion?
> 
> Yeah, I think it's time to opt in, instead of opting out.
> 
> Something like this:
> 
Thanks. This fixes the reported problem.

Tested-by: Sachin Sant 

- Sachin


Re: [PATCH 1/2] KVM: PPC: Book3S PR: Disable SCV when running AIL is disabled

2022-01-24 Thread Nicholas Piggin
Excerpts from Fabiano Rosas's message of January 25, 2022 8:49 am:
> Nicholas Piggin  writes:
> 
>> PR KVM does not support running with AIL enabled, and SCV is not
>> supported with AIL disabled.
>>
>> Fix this by ensuring the SCV facility is disabled with FSCR while a
>> CPU can be running with AIL=0. PowerNV host supports disabling AIL on a
>> per-CPU basis, so SCV just needs to be disabled when a vCPU is run.
>>
>> The pSeries machine can only switch AIL on a system-wide basis, so it
>> must disable SCV support at boot if the configuration can potentially
>> run a PR KVM guest.
>>
>> SCV is not emulated for the PR guest at the moment; this just fixes the
>> host crashes.
>>
>> Alternatives considered and rejected:
>> - SCV support can not be disabled by PR KVM after boot, because it is
>>   advertised to userspace with HWCAP.
>> - AIL can not be disabled on a per-CPU basis. At least when running on
>>   pseries it is a per-LPAR setting.
>> - Support for real-mode SCV vectors will not be added because they are
>>   at 0x17000 so making such a large fixed head space causes immediate
>>   value limits to be exceeded, requiring a lot of rework and more code.
>> - Disabling SCV for any PR KVM possible kernel will cause a slowdown
>>   when not using PR KVM.
>> - A boot time option to disable SCV to use PR KVM is user-hostile.
>> - System call instruction emulation for SCV facility unavailable
>>   instructions is too complex and old emulation code was subtly broken
>>   and removed.
>>
>> Signed-off-by: Nicholas Piggin 
>> ---
>>  arch/powerpc/kernel/exceptions-64s.S |  4 
>>  arch/powerpc/kernel/setup_64.c   | 15 +++
>>  arch/powerpc/kvm/book3s_pr.c | 20 ++--
>>  3 files changed, 33 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
>> b/arch/powerpc/kernel/exceptions-64s.S
>> index 55caeee37c08..b66dd6f775a4 100644
>> --- a/arch/powerpc/kernel/exceptions-64s.S
>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>> @@ -809,6 +809,10 @@ __start_interrupts:
>>   * - MSR_EE|MSR_RI is clear (no reentrant exceptions)
>>   * - Standard kernel environment is set up (stack, paca, etc)
>>   *
>> + * KVM:
>> + * These interrupts do not elevate HV 0->1, so HV is not involved. PR KVM
>> + * ensures that FSCR[SCV] is disabled whenever it has to force AIL off.
>> + *
>>   * Call convention:
>>   *
>>   * syscall register convention is in Documentation/powerpc/syscall64-abi.rst
>> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
>> index be8577ac9397..ac52c69a3811 100644
>> --- a/arch/powerpc/kernel/setup_64.c
>> +++ b/arch/powerpc/kernel/setup_64.c
>> @@ -197,6 +197,21 @@ static void __init configure_exceptions(void)
>>
>>  /* Under a PAPR hypervisor, we need hypercalls */
>>  if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
>> +/*
>> + * PR KVM does not support AIL mode interrupts in the host, and
>> + * SCV system call interrupt vectors are only implemented for
>> + * AIL mode. Under pseries, AIL mode can only be enabled and
>> + * disabled system-wide so when PR KVM is loaded, all CPUs in
>> + * the host are set to AIL=0 mode. SCV can not be disabled
>> + * dynamically because the feature is advertised to host
>> + * userspace, so SCV support must not be enabled if PR KVM can
>> + * possibly be run.
>> + */
>> +if (IS_ENABLED(CONFIG_KVM_BOOK3S_PR_POSSIBLE) && 
>> !radix_enabled()) {
>> +init_task.thread.fscr &= ~FSCR_SCV;
>> +cur_cpu_spec->cpu_user_features2 &= ~PPC_FEATURE2_SCV;
>> +}
>> +
> 
> "Under pseries, AIL mode can only be enabled and disabled system-wide so
>  when PR KVM is loaded, all CPUs in the host are set to AIL=0 mode."
> 
> Loaded as in 'modprobe kvm_pr'?

In kvmppc_core_init_vm_pr(), so while there is a PR guest running in the 
system.

> And host as in "nested host"
> surely. Unless I completely misunderstood the patch (likely).

Yes, the PR KVM host. I didn't want to say nested because it runs in 
supervisor mode, so there is basically no difference whether it is under 
a HV or not; I'm not sure "nested" is the right term for PR KVM, and it 
could be confused with nested HV.

I will see if I can make it a bit clearer.

> Is there a way to make this less unexpected to users? Maybe a few words
> in the Kconfig entry for PR_POSSIBLE saying "if you enable this and run
> a Hash MMU guest, you lose SCV"?

That's not a bad idea; also, if you run a PR guest under it you lose
AIL in the host, which slows down interrupts and system calls.

Thanks,
Nick

> 
>>  /* Enable AIL if possible */
>>  if (!pseries_enable_reloc_on_exc()) {
>>  init_task.thread.fscr &= ~FSCR_SCV;
>> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
>> index 34a801c3604a..4d1c84b94b77 100644
>> --- 

Re: [PATCH v2 4/4] KVM: PPC: Decrement module refcount if init_vm fails

2022-01-24 Thread Nicholas Piggin
Excerpts from Fabiano Rosas's message of January 25, 2022 8:08 am:
> We increment the reference count for KVM-HV/PR before the call to
> kvmppc_core_init_vm. If that function fails we need to decrement the
> refcount.
> 
> Signed-off-by: Fabiano Rosas 
> ---
> Caught this while testing Nick's LPID patches by looking at
> /sys/module/kvm_hv/refcnt

Nice catch. Is this the only change in the series?

You can just use kvm_ops->owner like try_module_get() does, I think? Also,
try_module_get() works on a NULL module the same as module_put() does by
the looks, so you could adjust that in this patch to remove the NULL check
so it is consistent with the put.

Reviewed-by: Nicholas Piggin 

Thanks,
Nick


> ---
>  arch/powerpc/kvm/powerpc.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 2ad0ccd202d5..4285d0eac900 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -431,6 +431,8 @@ int kvm_arch_check_processor_compat(void *opaque)
>  int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  {
>   struct kvmppc_ops *kvm_ops = NULL;
> + int r;
> +
>   /*
>* if we have both HV and PR enabled, default is HV
>*/
> @@ -456,7 +458,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>   return -ENOENT;
>  
>   kvm->arch.kvm_ops = kvm_ops;
> - return kvmppc_core_init_vm(kvm);
> + r = kvmppc_core_init_vm(kvm);
> + if (r)
> + module_put(kvm->arch.kvm_ops->owner);
> + return r;
>  err_out:
>   return -EINVAL;
>  }
> -- 
> 2.34.1
> 
> 


Re: [PATCH v4 5/5] KVM: PPC: mmio: Deliver DSI after emulation failure

2022-01-24 Thread Nicholas Piggin
Excerpts from Fabiano Rosas's message of January 22, 2022 8:26 am:
> MMIO emulation can fail if the guest uses an instruction that we are
> not prepared to emulate. Since these instructions can be and most
> likely are valid ones, this is (slightly) closer to an access fault
> than to an illegal instruction, so deliver a Data Storage interrupt
> instead of a Program interrupt.
> 
> Suggested-by: Nicholas Piggin 
> Signed-off-by: Fabiano Rosas 
> ---
>  arch/powerpc/kvm/emulate_loadstore.c | 10 +++---
>  arch/powerpc/kvm/powerpc.c   | 12 
>  2 files changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/emulate_loadstore.c 
> b/arch/powerpc/kvm/emulate_loadstore.c
> index 48272a9b9c30..cfc9114b87d0 100644
> --- a/arch/powerpc/kvm/emulate_loadstore.c
> +++ b/arch/powerpc/kvm/emulate_loadstore.c
> @@ -73,7 +73,6 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
>  {
>   u32 inst;
>   enum emulation_result emulated = EMULATE_FAIL;
> - int advance = 1;
>   struct instruction_op op;
>  
>   /* this default type might be overwritten by subcategories */
> @@ -98,6 +97,8 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
>   int type = op.type & INSTR_TYPE_MASK;
>   int size = GETSIZE(op.type);
>  
> + vcpu->mmio_is_write = OP_IS_STORE(type);
> +
>   switch (type) {
>   case LOAD:  {
>   int instr_byte_swap = op.type & BYTEREV;
> @@ -355,15 +356,10 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
>   }
>   }
>  
> - if (emulated == EMULATE_FAIL) {
> - advance = 0;
> - kvmppc_core_queue_program(vcpu, 0);
> - }
> -
>   trace_kvm_ppc_instr(inst, kvmppc_get_pc(vcpu), emulated);
>  
>   /* Advance past emulated instruction. */
> - if (advance)
> + if (emulated != EMULATE_FAIL)
>   kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
>  
>   return emulated;
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 214602c58f13..9befb121dddb 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -305,10 +305,22 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
>   case EMULATE_FAIL:
>   {
>   u32 last_inst;
> + ulong store_bit = DSISR_ISSTORE;
> + ulong cause = DSISR_BADACCESS;
>  
> +#ifdef CONFIG_BOOKE
> + store_bit = ESR_ST;
> + cause = 0;
> +#endif

BookE can not cause a bad page fault in the guest with ESR bits AFAIKS, 
so it would cause an infinite fault loop here. Maybe stick with the 
program interrupt for BookE with a comment about that here.

And if it could use if (IS_ENABLED()), that would be good?

Otherwise looks good, it should do the right thing on BookS.

Thanks,
Nick

>   kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
>   pr_info_ratelimited("KVM: guest access to device memory using 
> unsupported instruction (PID: %d opcode: %#08x)\n",
>   current->pid, last_inst);
> +
> + if (vcpu->mmio_is_write)
> + cause |= store_bit;
> +
> + kvmppc_core_queue_data_storage(vcpu, vcpu->arch.vaddr_accessed,
> +cause);
>   r = RESUME_GUEST;
>   break;
>   }
> -- 
> 2.34.1
> 
> 


Re: [PATCH v4 4/5] KVM: PPC: mmio: Return to guest after emulation failure

2022-01-24 Thread Nicholas Piggin
Excerpts from Fabiano Rosas's message of January 22, 2022 8:26 am:
> If MMIO emulation fails we don't want to crash the whole guest by
> returning to userspace.
> 
> The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
> implementation") added a todo:
> 
>   /* XXX Deliver Program interrupt to guest. */
> 
> and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
> emulation from priv emulation") added the Program interrupt injection
> but in another file, so I'm assuming it was missed that this block
> needed to be altered.
> 
> Also change the message to a ratelimited one since we're letting the
> guest run and it could flood the host logs.
> 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Nicholas Piggin 

One small thing...

> ---
>  arch/powerpc/kvm/powerpc.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 27fb2b70f631..214602c58f13 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -307,9 +307,9 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
>   u32 last_inst;
>  
>   kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
> - /* XXX Deliver Program interrupt to guest. */
> - pr_emerg("%s: emulation failed (%08x)\n", __func__, last_inst);
> - r = RESUME_HOST;
> + pr_info_ratelimited("KVM: guest access to device memory using 
> unsupported instruction (PID: %d opcode: %#08x)\n",
> + current->pid, last_inst);

Minor thing but KVM now has some particular printing helpers so I wonder 
if we should start moving to them in general with our messages.

vcpu_debug_ratelimited() maybe?

Thanks,
Nick


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-24 Thread Yinan Liu

Yeah, I think it's time to opt in, instead of opting out.

Something like this:

-- Steve

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c2724d986fa0..5256ebe57451 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -82,6 +82,7 @@ config ARM
select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32
select HAVE_CONTEXT_TRACKING
select HAVE_C_RECORDMCOUNT
+   select HAVE_BUILDTIME_MCOUNT_SORT
select HAVE_DEBUG_KMEMLEAK if !XIP_KERNEL
select HAVE_DMA_CONTIGUOUS if MMU
select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17..7996548b2b27 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -166,6 +166,7 @@ config ARM64
select HAVE_ASM_MODVERSIONS
select HAVE_EBPF_JIT
select HAVE_C_RECORDMCOUNT
+   select HAVE_BUILDTIME_MCOUNT_SORT
select HAVE_CMPXCHG_DOUBLE
select HAVE_CMPXCHG_LOCAL
select HAVE_CONTEXT_TRACKING
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7399327d1eff..46080dea5dba 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -186,6 +186,7 @@ config X86
select HAVE_CONTEXT_TRACKING_OFFSTACK   if HAVE_CONTEXT_TRACKING
select HAVE_C_RECORDMCOUNT
select HAVE_OBJTOOL_MCOUNT  if STACK_VALIDATION
+   select HAVE_BUILDTIME_MCOUNT_SORT
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 752ed89a293b..7e5b92090faa 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -70,10 +70,16 @@ config HAVE_C_RECORDMCOUNT
help
  C version of recordmcount available?
  
+config HAVE_BUILDTIME_MCOUNT_SORT

+   bool
+   help
+ An architecture selects this if it sorts the mcount_loc section
+at build time.
+
  config BUILDTIME_MCOUNT_SORT
 bool
 default y
-   depends on BUILDTIME_TABLE_SORT && !S390
+   depends on HAVE_BUILDTIME_MCOUNT_SORT
 help
   Sort the mcount_loc section at build time.


LGTM. This will no longer destroy ftrace on other architectures.
Those arches that we are not sure about can test and enable this 
feature themselves.



Best regards
--yinan


Re: [PATCH v4 3/5] KVM: PPC: mmio: Reject instructions that access more than mmio.data size

2022-01-24 Thread Nicholas Piggin
Excerpts from Fabiano Rosas's message of January 22, 2022 8:26 am:
> The MMIO interface between the kernel and userspace uses a structure
> that supports a maximum of 8-bytes of data. Instructions that access
> more than that need to be emulated in parts.
> 
> We currently don't have generic support for splitting the emulation in
> parts and each set of instructions needs to be explicitly included.
> 
> There's already an error message being printed when a load or store
> exceeds the mmio.data buffer but we don't fail the emulation until
> later at kvmppc_complete_mmio_load and even then we allow userspace to
> make a partial copy of the data, which ends up overwriting some fields
> of the mmio structure.
> 
> This patch makes the emulation fail earlier at kvmppc_handle_load|store,
> which will send a Program interrupt to the guest. This is better than
> allowing the guest to proceed with partial data.
> 
> Note that this was caught in a somewhat artificial scenario using
> quadword instructions (lq/stq); there's no report of an actual guest
> in the wild running instructions that are not properly emulated.
> 
> (While here, remove the "bad MMIO" messages. The caller already has an
> error message.)
> 
> Signed-off-by: Fabiano Rosas 
> Reviewed-by: Alexey Kardashevskiy 

Reviewed-by: Nicholas Piggin 

> ---
>  arch/powerpc/kvm/powerpc.c | 16 +---
>  1 file changed, 5 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index c2bd29e90314..27fb2b70f631 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -1114,10 +1114,8 @@ static void kvmppc_complete_mmio_load(struct kvm_vcpu 
> *vcpu)
>   struct kvm_run *run = vcpu->run;
>   u64 gpr;
>  
> - if (run->mmio.len > sizeof(gpr)) {
> - printk(KERN_ERR "bad MMIO length: %d\n", run->mmio.len);
> + if (run->mmio.len > sizeof(gpr))
>   return;
> - }
>  
>   if (!vcpu->arch.mmio_host_swabbed) {
>   switch (run->mmio.len) {
> @@ -1236,10 +1234,8 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
>   host_swabbed = !is_default_endian;
>   }
>  
> - if (bytes > sizeof(run->mmio.data)) {
> - printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
> -run->mmio.len);
> - }
> + if (bytes > sizeof(run->mmio.data))
> + return EMULATE_FAIL;
>  
>   run->mmio.phys_addr = vcpu->arch.paddr_accessed;
>   run->mmio.len = bytes;
> @@ -1325,10 +1321,8 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
>   host_swabbed = !is_default_endian;
>   }
>  
> - if (bytes > sizeof(run->mmio.data)) {
> - printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
> -run->mmio.len);
> - }
> + if (bytes > sizeof(run->mmio.data))
> + return EMULATE_FAIL;
>  
>   run->mmio.phys_addr = vcpu->arch.paddr_accessed;
>   run->mmio.len = bytes;
> -- 
> 2.34.1
> 
> 


Re: [PATCH 0/2] powerpc: Disable syscall emulation and stepping

2022-01-24 Thread Nicholas Piggin
+Naveen (sorry missed cc'ing you at first)

Excerpts from Christophe Leroy's message of January 24, 2022 4:39 pm:
> 
> 
> Le 24/01/2022 à 06:57, Nicholas Piggin a écrit :
>> As discussed previously
>> 
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/238946.html
>> 
>> I'm wondering whether PPC32 should be returning -1 for syscall
>> instructions too here? That could be done in another patch anyway.
>> 
> 
> The 'Programming Environments Manual for 32-Bit Implementations of the 
> PowerPC™ Architecture' says:
> 
> The following are not traced:
> • rfi instruction
> • sc and trap instructions that trap
> • Other instructions that cause interrupts (other than trace interrupts)
> • The first instruction of any interrupt handler
> • Instructions that are emulated by software
> 
> 
> So I think PPC32 should return -1 as well.

I agree.

What about the trap instructions? analyse_instr returns 0 for them
which falls through to return 0 for emulate_step; should they
return -1 as well, or am I missing something?

Thanks,
Nick


Re: Build regressions/improvements in v5.17-rc1

2022-01-24 Thread Randy Dunlap



On 1/24/22 17:23, Felix Kuehling wrote:
> 
> Am 2022-01-24 um 14:11 schrieb Randy Dunlap:
>> On 1/24/22 10:55, Geert Uytterhoeven wrote:
>>> Hi Alex,
>>>
>>> On Mon, Jan 24, 2022 at 7:52 PM Alex Deucher  wrote:
 On Mon, Jan 24, 2022 at 5:25 AM Geert Uytterhoeven  
 wrote:
> On Sun, 23 Jan 2022, Geert Uytterhoeven wrote:
>>   + /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: 
>> error: control reaches end of non-void function [-Werror=return-type]:  
>> => 1560:1
 I don't really see what's going on here:

 #ifdef CONFIG_X86_64
 return cpu_data(first_cpu_of_numa_node).apicid;
 #else
 return first_cpu_of_numa_node;
 #endif
>>> Ah, the actual failure causing this was not included:
>>>
>>> In file included from /kisskb/src/arch/x86/um/asm/processor.h:41:0,
>>>   from /kisskb/src/include/linux/mutex.h:19,
>>>   from /kisskb/src/include/linux/kernfs.h:11,
>>>   from /kisskb/src/include/linux/sysfs.h:16,
>>>   from /kisskb/src/include/linux/kobject.h:20,
>>>   from /kisskb/src/include/linux/pci.h:35,
>>>   from
>>> /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:25:
>>> /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: In
>>> function 'kfd_cpumask_to_apic_id':
>>> /kisskb/src/arch/um/include/asm/processor-generic.h:103:18: error:
>>> called object is not a function or function pointer
>>>   #define cpu_data (&boot_cpu_data)
>>>    ^
>>> /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1556:9:
>>> note: in expansion of macro 'cpu_data'
>>>    return cpu_data(first_cpu_of_numa_node).apicid;
>>>   ^
>>> /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1560:1:
>>> error: control reaches end of non-void function [-Werror=return-type]
>>>   }
>>>   ^
>> ah yes, UML.
>> I have a bunch of UML fixes that I have been hesitant to post.
>>
>> This is one of them.
>> What do people think about this?
> 
> Does it make sense to configure a UML kernel with a real device driver in the 
> first place? Or should we just prevent enabling amdgpu for UML with a Kconfig 
> dependency?
> 

Hi,

Either option works, IMO. I have seen both opinions given.
I also meant to reply that someone could just add
depends on !UML
for this device, like you are suggesting.

I'm fine with it either way.

thanks.

> 
>>
>> thanks.
>>
>> ---
>> From: Randy Dunlap 
>>
>>
>> ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1556:9: note: in 
>> expansion of macro ‘cpu_data’
>>    return cpu_data(first_cpu_of_numa_node).apicid;
>>   ^~~~
>> ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1560:1: error: 
>> control reaches end of non-void function [-Werror=return-type]
>>
>> ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c: In function 
>> ‘kfd_fill_iolink_info_for_cpu’:
>> ../arch/um/include/asm/processor-generic.h:103:19: error: called object is 
>> not a function or function pointer
>>   #define cpu_data (&boot_cpu_data)
>>    ~^~~
>> ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1688:27: note: in 
>> expansion of macro ‘cpu_data’
>>    struct cpuinfo_x86 *c = &cpu_data(0);
>>     ^~~~
>> ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1691:7: error: 
>> dereferencing pointer to incomplete type ‘struct cpuinfo_x86’
>>    if (c->x86_vendor == X86_VENDOR_AMD)
>>     ^~
>> ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1691:23: error: 
>> ‘X86_VENDOR_AMD’ undeclared (first use in this function); did you mean 
>> ‘X86_VENDOR_ANY’?
>>    if (c->x86_vendor == X86_VENDOR_AMD)
>>     ^~
>>     X86_VENDOR_ANY
>>
>> ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c: In function 
>> ‘kfd_create_vcrat_image_cpu’:
>> ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1742:11: warning: unused 
>> variable ‘entries’ [-Wunused-variable]
>>    uint32_t entries = 0;
>>
>> Signed-off-by: Randy Dunlap 
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c |    6 +++---
>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |    2 +-
>>   2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> --- linux-next-20220107.orig/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> +++ linux-next-20220107/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> @@ -1552,7 +1552,7 @@ static int kfd_cpumask_to_apic_id(const
>>   first_cpu_of_numa_node = cpumask_first(cpumask);
>>   if (first_cpu_of_numa_node >= nr_cpu_ids)
>>   return -1;
>> -#ifdef CONFIG_X86_64
>> +#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
>>   return cpu_data(first_cpu_of_numa_node).apicid;
>>   #else
>>   return first_cpu_of_numa_node;
>> --- linux-next-20220107.orig/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>> +++ linux-next-20220107/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>> @@ -1679,7 +1679,7 @@ static int kfd_fill_mem_

Re: Build regressions/improvements in v5.17-rc1

2022-01-24 Thread Felix Kuehling



Am 2022-01-24 um 14:11 schrieb Randy Dunlap:

On 1/24/22 10:55, Geert Uytterhoeven wrote:

Hi Alex,

On Mon, Jan 24, 2022 at 7:52 PM Alex Deucher  wrote:

On Mon, Jan 24, 2022 at 5:25 AM Geert Uytterhoeven  wrote:

On Sun, 23 Jan 2022, Geert Uytterhoeven wrote:

  + /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: error: 
control reaches end of non-void function [-Werror=return-type]:  => 1560:1

I don't really see what's going on here:

#ifdef CONFIG_X86_64
return cpu_data(first_cpu_of_numa_node).apicid;
#else
return first_cpu_of_numa_node;
#endif

Ah, the actual failure causing this was not included:

In file included from /kisskb/src/arch/x86/um/asm/processor.h:41:0,
  from /kisskb/src/include/linux/mutex.h:19,
  from /kisskb/src/include/linux/kernfs.h:11,
  from /kisskb/src/include/linux/sysfs.h:16,
  from /kisskb/src/include/linux/kobject.h:20,
  from /kisskb/src/include/linux/pci.h:35,
  from
/kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:25:
/kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: In
function 'kfd_cpumask_to_apic_id':
/kisskb/src/arch/um/include/asm/processor-generic.h:103:18: error:
called object is not a function or function pointer
  #define cpu_data (&boot_cpu_data)
   ^
/kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1556:9:
note: in expansion of macro 'cpu_data'
   return cpu_data(first_cpu_of_numa_node).apicid;
  ^
/kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1560:1:
error: control reaches end of non-void function [-Werror=return-type]
  }
  ^

ah yes, UML.
I have a bunch of UML fixes that I have been hesitant to post.

This is one of them.
What do people think about this?


Does it make sense to configure a UML kernel with a real device driver 
in the first place? Or should we just prevent enabling amdgpu for UML 
with a Kconfig dependency?


Regards,
  Felix




thanks.

---
From: Randy Dunlap 


../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1556:9: note: in 
expansion of macro ‘cpu_data’
   return cpu_data(first_cpu_of_numa_node).apicid;
  ^~~~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1560:1: error: control 
reaches end of non-void function [-Werror=return-type]

../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c: In function 
‘kfd_fill_iolink_info_for_cpu’:
../arch/um/include/asm/processor-generic.h:103:19: error: called object is not 
a function or function pointer
  #define cpu_data (&boot_cpu_data)
   ~^~~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1688:27: note: in expansion 
of macro ‘cpu_data’
   struct cpuinfo_x86 *c = &cpu_data(0);
^~~~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1691:7: error: dereferencing 
pointer to incomplete type ‘struct cpuinfo_x86’
   if (c->x86_vendor == X86_VENDOR_AMD)
^~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1691:23: error: 
‘X86_VENDOR_AMD’ undeclared (first use in this function); did you mean 
‘X86_VENDOR_ANY’?
   if (c->x86_vendor == X86_VENDOR_AMD)
^~
X86_VENDOR_ANY

../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c: In function 
‘kfd_create_vcrat_image_cpu’:
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1742:11: warning: unused 
variable ‘entries’ [-Wunused-variable]
   uint32_t entries = 0;

Signed-off-by: Randy Dunlap 
---
  drivers/gpu/drm/amd/amdkfd/kfd_crat.c |6 +++---
  drivers/gpu/drm/amd/amdkfd/kfd_topology.c |2 +-
  2 files changed, 4 insertions(+), 4 deletions(-)

--- linux-next-20220107.orig/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ linux-next-20220107/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1552,7 +1552,7 @@ static int kfd_cpumask_to_apic_id(const
first_cpu_of_numa_node = cpumask_first(cpumask);
if (first_cpu_of_numa_node >= nr_cpu_ids)
return -1;
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
return cpu_data(first_cpu_of_numa_node).apicid;
  #else
return first_cpu_of_numa_node;
--- linux-next-20220107.orig/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ linux-next-20220107/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -1679,7 +1679,7 @@ static int kfd_fill_mem_info_for_cpu(int
return 0;
  }
  
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
  static int kfd_fill_iolink_info_for_cpu(int numa_node_id, int *avail_size,
uint32_t *num_entries,
struct crat_subtype_iolink *sub_type_hdr)
@@ -1738,7 +1738,7 @@ static int kfd_create_vcrat_image_cpu(vo
struct crat_subtype_generic *sub_type_hdr;
int avail_size = *size;
int numa_node_id;
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
uint32_t entries = 0

Re: [PATCH 2/2] KVM: PPC: Book3S PR: Disallow AIL != 0

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> KVM PR does not implement address translation modes on interrupt, so it
> must not allow H_SET_MODE to succeed.
>
> This is not compatible with QEMU behaviour. The solution might be to
> have a cap-ail for this, but now it's broken either way so fix it in
> KVM to start with.
>
> This allows PR Linux guests that are using the SCV facility to boot and
> run, because Linux disables the use of SCV if AIL can not be set to 3.
> This isn't a real fix because Linux or another OS could implement real
> mode SCV vectors and try to enable it. The right solution is for KVM to
> emulate scv interrupts from the facility unavailable interrupt.
>
> Signed-off-by: Nicholas Piggin 
> ---

Reviewed-by: Fabiano Rosas 

>  arch/powerpc/kvm/book3s_pr_papr.c | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/arch/powerpc/kvm/book3s_pr_papr.c 
> b/arch/powerpc/kvm/book3s_pr_papr.c
> index 1f10e7dfcdd0..dc4f51ac84bc 100644
> --- a/arch/powerpc/kvm/book3s_pr_papr.c
> +++ b/arch/powerpc/kvm/book3s_pr_papr.c
> @@ -281,6 +281,22 @@ static int kvmppc_h_pr_logical_ci_store(struct kvm_vcpu 
> *vcpu)
>   return EMULATE_DONE;
>  }
>
> +static int kvmppc_h_pr_set_mode(struct kvm_vcpu *vcpu)
> +{
> + unsigned long mflags = kvmppc_get_gpr(vcpu, 4);
> + unsigned long resource = kvmppc_get_gpr(vcpu, 5);
> +
> + if (resource == H_SET_MODE_RESOURCE_ADDR_TRANS_MODE) {
> + /* KVM PR does not provide AIL!=0 to guests */
> + if (mflags == 0)
> + kvmppc_set_gpr(vcpu, 3, H_SUCCESS);
> + else
> + kvmppc_set_gpr(vcpu, 3, H_UNSUPPORTED_FLAG_START - 63);
> + return EMULATE_DONE;
> + }
> + return EMULATE_FAIL;
> +}
> +
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  static int kvmppc_h_pr_put_tce(struct kvm_vcpu *vcpu)
>  {
> @@ -384,6 +400,8 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
>   return kvmppc_h_pr_logical_ci_load(vcpu);
>   case H_LOGICAL_CI_STORE:
>   return kvmppc_h_pr_logical_ci_store(vcpu);
> + case H_SET_MODE:
> + return kvmppc_h_pr_set_mode(vcpu);
>   case H_XIRR:
>   case H_CPPR:
>   case H_EOI:
> @@ -421,6 +439,7 @@ int kvmppc_hcall_impl_pr(unsigned long cmd)
>   case H_CEDE:
>   case H_LOGICAL_CI_LOAD:
>   case H_LOGICAL_CI_STORE:
> + case H_SET_MODE:
>  #ifdef CONFIG_KVM_XICS
>   case H_XIRR:
>   case H_CPPR:
> @@ -447,6 +466,7 @@ static unsigned int default_hcall_list[] = {
>   H_BULK_REMOVE,
>   H_PUT_TCE,
>   H_CEDE,
> + H_SET_MODE,
>  #ifdef CONFIG_KVM_XICS
>   H_XIRR,
>   H_CPPR,


Re: [PATCH 1/2] KVM: PPC: Book3S PR: Disable SCV when running AIL is disabled

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> PR KVM does not support running with AIL enabled, and SCV is not
> supported with AIL disabled.
>
> Fix this by ensuring the SCV facility is disabled with FSCR while a
> CPU can be running with AIL=0. PowerNV host supports disabling AIL on a
> per-CPU basis, so SCV just needs to be disabled when a vCPU is run.
>
> The pSeries machine can only switch AIL on a system-wide basis, so it
> must disable SCV support at boot if the configuration can potentially
> run a PR KVM guest.
>
> SCV is not emulated for the PR guest at the moment, this just fixes the
> host crashes.
>
> Alternatives considered and rejected:
> - SCV support can not be disabled by PR KVM after boot, because it is
>   advertised to userspace with HWCAP.
> - AIL can not be disabled on a per-CPU basis. At least when running on
>   pseries it is a per-LPAR setting.
> - Support for real-mode SCV vectors will not be added because they are
>   at 0x17000 so making such a large fixed head space causes immediate
>   value limits to be exceeded, requiring a lot of rework and more code.
> - Disabling SCV for any PR KVM possible kernel will cause a slowdown
>   when not using PR KVM.
> - A boot time option to disable SCV to use PR KVM is user-hostile.
> - System call instruction emulation for SCV facility unavailable
>   instructions is too complex and old emulation code was subtly broken
>   and removed.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kernel/exceptions-64s.S |  4 
>  arch/powerpc/kernel/setup_64.c   | 15 +++
>  arch/powerpc/kvm/book3s_pr.c | 20 ++--
>  3 files changed, 33 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index 55caeee37c08..b66dd6f775a4 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -809,6 +809,10 @@ __start_interrupts:
>   * - MSR_EE|MSR_RI is clear (no reentrant exceptions)
>   * - Standard kernel environment is set up (stack, paca, etc)
>   *
> + * KVM:
> + * These interrupts do not elevate HV 0->1, so HV is not involved. PR KVM
> + * ensures that FSCR[SCV] is disabled whenever it has to force AIL off.
> + *
>   * Call convention:
>   *
>   * syscall register convention is in Documentation/powerpc/syscall64-abi.rst
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index be8577ac9397..ac52c69a3811 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -197,6 +197,21 @@ static void __init configure_exceptions(void)
>
>   /* Under a PAPR hypervisor, we need hypercalls */
>   if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
> + /*
> +  * PR KVM does not support AIL mode interrupts in the host, and
> +  * SCV system call interrupt vectors are only implemented for
> +  * AIL mode. Under pseries, AIL mode can only be enabled and
> +  * disabled system-wide so when PR KVM is loaded, all CPUs in
> +  * the host are set to AIL=0 mode. SCV can not be disabled
> +  * dynamically because the feature is advertised to host
> +  * userspace, so SCV support must not be enabled if PR KVM can
> +  * possibly be run.
> +  */
> + if (IS_ENABLED(CONFIG_KVM_BOOK3S_PR_POSSIBLE) && 
> !radix_enabled()) {
> + init_task.thread.fscr &= ~FSCR_SCV;
> + cur_cpu_spec->cpu_user_features2 &= ~PPC_FEATURE2_SCV;
> + }
> +

"Under pseries, AIL mode can only be enabled and disabled system-wide so
 when PR KVM is loaded, all CPUs in the host are set to AIL=0 mode."

Loaded as in 'modprobe kvm_pr'? And host as in "nested host"
surely. Unless I completely misunderstood the patch (likely).

Is there a way to make this less unexpected to users? Maybe a few words
in the Kconfig entry for PR_POSSIBLE saying "if you enable this and run
a Hash MMU guest, you lose SCV"?

>   /* Enable AIL if possible */
>   if (!pseries_enable_reloc_on_exc()) {
>   init_task.thread.fscr &= ~FSCR_SCV;
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 34a801c3604a..4d1c84b94b77 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -140,9 +140,12 @@ static void kvmppc_core_vcpu_load_pr(struct kvm_vcpu 
> *vcpu, int cpu)
>  #endif
>
>   /* Disable AIL if supported */
> - if (cpu_has_feature(CPU_FTR_HVMODE) &&
> - cpu_has_feature(CPU_FTR_ARCH_207S))
> - mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_AIL);
> + if (cpu_has_feature(CPU_FTR_HVMODE)) {
> + if (cpu_has_feature(CPU_FTR_ARCH_207S))
> + mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_AIL);
> + if (cpu_has_feature(CPU_FTR_ARCH_300) && (current->thread.fscr 
> & FSCR_SCV))
> + 
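Fabiano's suggestion above about documenting the trade-off could take a shape like the following. This is a sketch only, not a proposed patch: the real KVM_BOOK3S_PR_POSSIBLE entry is a hidden bool, so the exact wording and placement of such help text would need discussion.

```kconfig
config KVM_BOOK3S_PR_POSSIBLE
	bool
	select KVM_MMIO
	help
	  Note: enabling this may disable the SCV facility system-wide
	  on pseries hosts using the hash MMU, because PR KVM requires
	  AIL=0 and SCV interrupt vectors are only implemented for AIL
	  mode.
```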

[PATCH v2 0/4] KVM: PPC: KVM module exit fixes

2022-01-24 Thread Fabiano Rosas
I stumbled upon another issue with our module exit so I'm sending
another version to add a fix for it.

- patches 1 and 3 are already reviewed;

- patch 2 lacks a Reviewed-by. Nick asked about an issue Alexey might
  have encountered. I haven't heard of any issues with the module exit
  aside from the ones that this series fixes;

- patch 4 is new. It fixes an issue with module refcounting.

v1:
https://lore.kernel.org/r/20211223211931.3560887-1-faro...@linux.ibm.com

Fabiano Rosas (4):
  KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
  KVM: PPC: Book3S HV: Delay setting of kvm ops
  KVM: PPC: Book3S HV: Free allocated memory if module init fails
  KVM: PPC: Decrement module refcount if init_vm fails

 arch/powerpc/kvm/book3s_hv.c | 28 
 arch/powerpc/kvm/powerpc.c   |  7 ++-
 2 files changed, 26 insertions(+), 9 deletions(-)

-- 
2.34.1



Re: [PATCH 6/6] KVM: PPC: Book3S HV: Remove KVMPPC_NR_LPIDS

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> KVMPPC_NR_LPIDS no longer represents any size restriction on the
> LPID space and can be removed. A CPU with more than 12 LPID bits
> implemented will now be able to create more than 4095 guests.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/include/asm/kvm_book3s_asm.h | 3 ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c   | 3 ---
>  2 files changed, 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
> b/arch/powerpc/include/asm/kvm_book3s_asm.h
> index e6bda70b1d93..c8882d9b86c2 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
> @@ -14,9 +14,6 @@
>  #define XICS_MFRR0xc
>  #define XICS_IPI 2   /* interrupt source # for IPIs */
>
> -/* LPIDs we support with this build -- runtime limit may be lower */
> -#define KVMPPC_NR_LPIDS  (1UL << 12)
> -
>  /* Maximum number of threads per physical core */
>  #define MAX_SMT_THREADS  8
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index f983fb36cbf2..aafd2a74304c 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -269,9 +269,6 @@ int kvmppc_mmu_hv_init(void)
>   nr_lpids = 1UL << KVM_MAX_NESTED_GUESTS_SHIFT;
>   }
>
> - if (nr_lpids > KVMPPC_NR_LPIDS)
> - nr_lpids = KVMPPC_NR_LPIDS;
> -
>   if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
>   /* POWER7 has 10-bit LPIDs, POWER8 has 12-bit LPIDs */
>   if (cpu_has_feature(CPU_FTR_ARCH_207S))


Re: [PATCH 5/6] KVM: PPC: Book3S Nested: Use explicit 4096 LPID maximum

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Rather than tie this to KVMPPC_NR_LPIDS which is becoming more dynamic,
> fix it to 4096 (12-bits) explicitly for now.
>
> kvmhv_get_nested() does not have to check against KVM_MAX_NESTED_GUESTS
> because the L1 partition table registration hcall already did that, and
> it checks against the partition table size.
>
> This patch also puts all the partition table size calculations into the
> same form, using 12 for the architected size field shift and 4 for the
> shift corresponding to the partition table entry size.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/include/asm/kvm_host.h |  7 ++-
>  arch/powerpc/kvm/book3s_64_mmu_hv.c |  2 +-
>  arch/powerpc/kvm/book3s_hv_nested.c | 24 +++-
>  3 files changed, 18 insertions(+), 15 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index 5fd0564e5c94..e6fb03884dcc 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -34,7 +34,12 @@
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>  #include   /* for MAX_SMT_THREADS */
>  #define KVM_MAX_VCPU_IDS (MAX_SMT_THREADS * KVM_MAX_VCORES)
> -#define KVM_MAX_NESTED_GUESTSKVMPPC_NR_LPIDS
> +
> +/*
> + * Limit the nested partition table to 4096 entries (because that's what
> + * hardware supports). Both guest and host use this value.
> + */
> +#define KVM_MAX_NESTED_GUESTS_SHIFT  12
>
>  #else
>  #define KVM_MAX_VCPU_IDS KVM_MAX_VCPUS
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 5be92d5bc099..f983fb36cbf2 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -266,7 +266,7 @@ int kvmppc_mmu_hv_init(void)
>   return -EINVAL;
>   nr_lpids = 1UL << mmu_lpid_bits;
>   } else {
> - nr_lpids = KVM_MAX_NESTED_GUESTS;
> + nr_lpids = 1UL << KVM_MAX_NESTED_GUESTS_SHIFT;
>   }
>
>   if (nr_lpids > KVMPPC_NR_LPIDS)
> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
> b/arch/powerpc/kvm/book3s_hv_nested.c
> index 1eff969b095c..75169e0753ce 100644
> --- a/arch/powerpc/kvm/book3s_hv_nested.c
> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
> @@ -439,10 +439,11 @@ long kvmhv_nested_init(void)
>   if (!radix_enabled())
>   return -ENODEV;
>
> - /* find log base 2 of KVMPPC_NR_LPIDS, rounding up */
> - ptb_order = __ilog2(KVMPPC_NR_LPIDS - 1) + 1;
> - if (ptb_order < 8)
> - ptb_order = 8;
> + /* Partition table entry is 1<<4 bytes in size, hence the 4. */
> + ptb_order = KVM_MAX_NESTED_GUESTS_SHIFT + 4;
> + /* Minimum partition table size is 1<<12 bytes */
> + if (ptb_order < 12)
> + ptb_order = 12;
>   pseries_partition_tb = kmalloc(sizeof(struct patb_entry) << ptb_order,
>  GFP_KERNEL);
>   if (!pseries_partition_tb) {
> @@ -450,7 +451,7 @@ long kvmhv_nested_init(void)
>   return -ENOMEM;
>   }
>
> - ptcr = __pa(pseries_partition_tb) | (ptb_order - 8);
> + ptcr = __pa(pseries_partition_tb) | (ptb_order - 12);
>   rc = plpar_hcall_norets(H_SET_PARTITION_TABLE, ptcr);
>   if (rc != H_SUCCESS) {
>   pr_err("kvm-hv: Parent hypervisor does not support nesting 
> (rc=%ld)\n",
> @@ -534,16 +535,14 @@ long kvmhv_set_partition_table(struct kvm_vcpu *vcpu)
>   long ret = H_SUCCESS;
>
>   srcu_idx = srcu_read_lock(&kvm->srcu);
> - /*
> -  * Limit the partition table to 4096 entries (because that's what
> -  * hardware supports), and check the base address.
> -  */
> - if ((ptcr & PRTS_MASK) > 12 - 8 ||
> + /* Check partition size and base address. */
> + if ((ptcr & PRTS_MASK) + 12 - 4 > KVM_MAX_NESTED_GUESTS_SHIFT ||
>   !kvm_is_visible_gfn(vcpu->kvm, (ptcr & PRTB_MASK) >> PAGE_SHIFT))
>   ret = H_PARAMETER;
>   srcu_read_unlock(&kvm->srcu, srcu_idx);
>   if (ret == H_SUCCESS)
>   kvm->arch.l1_ptcr = ptcr;
> +
>   return ret;
>  }
>
> @@ -639,7 +638,7 @@ static void kvmhv_update_ptbl_cache(struct 
> kvm_nested_guest *gp)
>
>   ret = -EFAULT;
>   ptbl_addr = (kvm->arch.l1_ptcr & PRTB_MASK) + (gp->l1_lpid << 4);
> - if (gp->l1_lpid < (1ul << ((kvm->arch.l1_ptcr & PRTS_MASK) + 8))) {
> + if (gp->l1_lpid < (1ul << ((kvm->arch.l1_ptcr & PRTS_MASK) + 12 - 4))) {
>   int srcu_idx = srcu_read_lock(&kvm->srcu);
>   ret = kvm_read_guest(kvm, ptbl_addr,
>&ptbl_entry, sizeof(ptbl_entry));
> @@ -809,8 +808,7 @@ struct kvm_nested_guest *kvmhv_get_nested(struct kvm 
> *kvm, int l1_lpid,
>  {
>   struct kvm_nested_guest *gp, *newgp;
>
> - if (l1_lpid >= KVM_MAX_NESTED_GUESTS ||
> - l1_lpid >= (1ul << ((kvm->arch.l1_ptcr & PRTS_MASK) + 12 - 4)))
> + if (l1_lpid >= (

Re: [PATCH 4/6] KVM: PPC: Book3S HV Nested: Change nested guest lookup to use idr

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> This removes the fixed sized kvm->arch.nested_guests array.
>
> Signed-off-by: Nicholas Piggin 
> ---

Reviewed-by: Fabiano Rosas 

>  arch/powerpc/include/asm/kvm_host.h |   3 +-
>  arch/powerpc/kvm/book3s_hv_nested.c | 110 +++-
>  2 files changed, 59 insertions(+), 54 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index d9bf60bf0816..5fd0564e5c94 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -326,8 +326,7 @@ struct kvm_arch {
>   struct list_head uvmem_pfns;
>   struct mutex mmu_setup_lock;/* nests inside vcpu mutexes */
>   u64 l1_ptcr;
> - int max_nested_lpid;
> - struct kvm_nested_guest *nested_guests[KVM_MAX_NESTED_GUESTS];
> + struct idr kvm_nested_guest_idr;
>   /* This array can grow quite large, keep it at the end */
>   struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
>  #endif
> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
> b/arch/powerpc/kvm/book3s_hv_nested.c
> index 9d373f8963ee..1eff969b095c 100644
> --- a/arch/powerpc/kvm/book3s_hv_nested.c
> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
> @@ -521,11 +521,6 @@ static void kvmhv_set_nested_ptbl(struct 
> kvm_nested_guest *gp)
>   kvmhv_set_ptbl_entry(gp->shadow_lpid, dw0, gp->process_table);
>  }
>
> -void kvmhv_vm_nested_init(struct kvm *kvm)
> -{
> - kvm->arch.max_nested_lpid = -1;
> -}
> -
>  /*
>   * Handle the H_SET_PARTITION_TABLE hcall.
>   * r4 = guest real address of partition table + log_2(size) - 12
> @@ -660,6 +655,35 @@ static void kvmhv_update_ptbl_cache(struct 
> kvm_nested_guest *gp)
>   kvmhv_set_nested_ptbl(gp);
>  }
>
> +void kvmhv_vm_nested_init(struct kvm *kvm)
> +{
> + idr_init(&kvm->arch.kvm_nested_guest_idr);
> +}
> +
> +static struct kvm_nested_guest *__find_nested(struct kvm *kvm, int lpid)
> +{
> + return idr_find(&kvm->arch.kvm_nested_guest_idr, lpid);
> +}
> +
> +static bool __prealloc_nested(struct kvm *kvm, int lpid)
> +{
> + if (idr_alloc(&kvm->arch.kvm_nested_guest_idr,
> + NULL, lpid, lpid + 1, GFP_KERNEL) != lpid)
> + return false;
> + return true;
> +}
> +
> +static void __add_nested(struct kvm *kvm, int lpid, struct kvm_nested_guest 
> *gp)
> +{
> + if (idr_replace(&kvm->arch.kvm_nested_guest_idr, gp, lpid))
> + WARN_ON(1);
> +}
> +
> +static void __remove_nested(struct kvm *kvm, int lpid)
> +{
> + idr_remove(&kvm->arch.kvm_nested_guest_idr, lpid);
> +}
> +
>  static struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm *kvm, unsigned 
> int lpid)
>  {
>   struct kvm_nested_guest *gp;
> @@ -720,13 +744,8 @@ static void kvmhv_remove_nested(struct kvm_nested_guest 
> *gp)
>   long ref;
>
>   spin_lock(&kvm->mmu_lock);
> - if (gp == kvm->arch.nested_guests[lpid]) {
> - kvm->arch.nested_guests[lpid] = NULL;
> - if (lpid == kvm->arch.max_nested_lpid) {
> - while (--lpid >= 0 && !kvm->arch.nested_guests[lpid])
> - ;
> - kvm->arch.max_nested_lpid = lpid;
> - }
> + if (gp == __find_nested(kvm, lpid)) {
> + __remove_nested(kvm, lpid);
>   --gp->refcnt;
>   }
>   ref = gp->refcnt;
> @@ -743,24 +762,22 @@ static void kvmhv_remove_nested(struct kvm_nested_guest 
> *gp)
>   */
>  void kvmhv_release_all_nested(struct kvm *kvm)
>  {
> - int i;
> + int lpid;
>   struct kvm_nested_guest *gp;
>   struct kvm_nested_guest *freelist = NULL;
>   struct kvm_memory_slot *memslot;
>   int srcu_idx, bkt;
>
>   spin_lock(&kvm->mmu_lock);
> - for (i = 0; i <= kvm->arch.max_nested_lpid; i++) {
> - gp = kvm->arch.nested_guests[i];
> - if (!gp)
> - continue;
> - kvm->arch.nested_guests[i] = NULL;
> + idr_for_each_entry(&kvm->arch.kvm_nested_guest_idr, gp, lpid) {
> + __remove_nested(kvm, lpid);
>   if (--gp->refcnt == 0) {
>   gp->next = freelist;
>   freelist = gp;
>   }
>   }
> - kvm->arch.max_nested_lpid = -1;
> + idr_destroy(&kvm->arch.kvm_nested_guest_idr);
> + /* idr is empty and may be reused at this point */
>   spin_unlock(&kvm->mmu_lock);
>   while ((gp = freelist) != NULL) {
>   freelist = gp->next;
> @@ -797,7 +814,7 @@ struct kvm_nested_guest *kvmhv_get_nested(struct kvm 
> *kvm, int l1_lpid,
>   return NULL;
>
>   spin_lock(&kvm->mmu_lock);
> - gp = kvm->arch.nested_guests[l1_lpid];
> + gp = __find_nested(kvm, l1_lpid);
>   if (gp)
>   ++gp->refcnt;
>   spin_unlock(&kvm->mmu_lock);
> @@ -808,17 +825,19 @@ struct kvm_nested_guest *kvmhv_get_nested(struct kvm 
> *kvm, int l1_lpid,
>   newgp = kvmhv_alloc_nested(kvm, l1_lpid);
>   if 

Re: [PATCH 3/6] KVM: PPC: Book3S HV: Use IDA allocator for LPID allocator

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> This removes the fixed-size lpid_inuse array.
>
> Signed-off-by: Nicholas Piggin 
> ---

Reviewed-by: Fabiano Rosas 

>  arch/powerpc/kvm/powerpc.c | 25 +
>  1 file changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 102993462872..c527a5751b46 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -2453,20 +2453,22 @@ long kvm_arch_vm_ioctl(struct file *filp,
>   return r;
>  }
>
> -static unsigned long lpid_inuse[BITS_TO_LONGS(KVMPPC_NR_LPIDS)];
> +static DEFINE_IDA(lpid_inuse);
>  static unsigned long nr_lpids;
>
>  long kvmppc_alloc_lpid(void)
>  {
> - long lpid;
> + int lpid;
>
> - do {
> - lpid = find_first_zero_bit(lpid_inuse, KVMPPC_NR_LPIDS);
> - if (lpid >= nr_lpids) {
> + /* The host LPID must always be 0 (allocation starts at 1) */
> + lpid = ida_alloc_range(&lpid_inuse, 1, nr_lpids - 1, GFP_KERNEL);
> + if (lpid < 0) {
> + if (lpid == -ENOMEM)
> + pr_err("%s: Out of memory\n", __func__);
> + else
>   pr_err("%s: No LPIDs free\n", __func__);
> - return -ENOMEM;
> - }
> - } while (test_and_set_bit(lpid, lpid_inuse));
> + return -ENOMEM;
> + }
>
>   return lpid;
>  }
> @@ -2474,15 +2476,14 @@ EXPORT_SYMBOL_GPL(kvmppc_alloc_lpid);
>
>  void kvmppc_free_lpid(long lpid)
>  {
> - clear_bit(lpid, lpid_inuse);
> + ida_free(&lpid_inuse, lpid);
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_free_lpid);
>
> +/* nr_lpids_param includes the host LPID */
>  void kvmppc_init_lpid(unsigned long nr_lpids_param)
>  {
> - nr_lpids = min_t(unsigned long, KVMPPC_NR_LPIDS, nr_lpids_param);
> - memset(lpid_inuse, 0, sizeof(lpid_inuse));
> - set_bit(0, lpid_inuse); /* The host LPID must always be 0 */
> + nr_lpids = nr_lpids_param;
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_init_lpid);


Re: [PATCH 2/6] KVM: PPC: Book3S HV: Update LPID allocator init for POWER9, Nested

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> The LPID allocator init is changed to:
> - use mmu_lpid_bits rather than hard-coding;
> - use KVM_MAX_NESTED_GUESTS for nested hypervisors;
> - not reserve the top LPID on POWER9 and newer CPUs.
>
> The reserved LPID is made a POWER7/8-specific detail.
>
> Signed-off-by: Nicholas Piggin 
> ---

Reviewed-by: Fabiano Rosas 

>  arch/powerpc/include/asm/kvm_book3s_asm.h |  2 +-
>  arch/powerpc/include/asm/reg.h|  2 --
>  arch/powerpc/kvm/book3s_64_mmu_hv.c   | 29 ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  8 +++
>  arch/powerpc/mm/init_64.c |  3 +++
>  5 files changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
> b/arch/powerpc/include/asm/kvm_book3s_asm.h
> index b6d31bff5209..e6bda70b1d93 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
> @@ -15,7 +15,7 @@
>  #define XICS_IPI 2   /* interrupt source # for IPIs */
>
>  /* LPIDs we support with this build -- runtime limit may be lower */
> -#define KVMPPC_NR_LPIDS  (LPID_RSVD + 1)
> +#define KVMPPC_NR_LPIDS  (1UL << 12)
>
>  /* Maximum number of threads per physical core */
>  #define MAX_SMT_THREADS  8
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 1e14324c5190..1e8b2e04e626 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -473,8 +473,6 @@
>  #ifndef SPRN_LPID
>  #define SPRN_LPID0x13F   /* Logical Partition Identifier */
>  #endif
> -#define   LPID_RSVD_POWER7   0x3ff   /* Reserved LPID for partn switching */
> -#define   LPID_RSVD  0xfff   /* Reserved LPID for partn switching */
>  #define  SPRN_HMER   0x150   /* Hypervisor maintenance exception reg 
> */
>  #define   HMER_DEBUG_TRIG(1ul << (63 - 17)) /* Debug trigger */
>  #define  SPRN_HMEER  0x151   /* Hyp maintenance exception enable reg 
> */
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 09fc52b6f390..5be92d5bc099 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -256,7 +256,7 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct 
> kvm_memory_slot *memslot,
>
>  int kvmppc_mmu_hv_init(void)
>  {
> - unsigned long rsvd_lpid;
> + unsigned long nr_lpids;
>
>   if (!mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE))
>   return -EINVAL;
> @@ -264,16 +264,29 @@ int kvmppc_mmu_hv_init(void)
>   if (cpu_has_feature(CPU_FTR_HVMODE)) {
>   if (WARN_ON(mfspr(SPRN_LPID) != 0))
>   return -EINVAL;
> + nr_lpids = 1UL << mmu_lpid_bits;
> + } else {
> + nr_lpids = KVM_MAX_NESTED_GUESTS;
>   }
>
> - /* POWER8 and above have 12-bit LPIDs (10-bit in POWER7) */
> - if (cpu_has_feature(CPU_FTR_ARCH_207S))
> - rsvd_lpid = LPID_RSVD;
> - else
> - rsvd_lpid = LPID_RSVD_POWER7;
> + if (nr_lpids > KVMPPC_NR_LPIDS)
> + nr_lpids = KVMPPC_NR_LPIDS;
> +
> + if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
> + /* POWER7 has 10-bit LPIDs, POWER8 has 12-bit LPIDs */
> + if (cpu_has_feature(CPU_FTR_ARCH_207S))
> + WARN_ON(nr_lpids != 1UL << 12);
> + else
> + WARN_ON(nr_lpids != 1UL << 10);
> +
> + /*
> +  * Reserve the last implemented LPID use in partition
> +  * switching for POWER7 and POWER8.
> +  */
> + nr_lpids -= 1;
> + }
>
> - /* rsvd_lpid is reserved for use in partition switching */
> - kvmppc_init_lpid(rsvd_lpid);
> + kvmppc_init_lpid(nr_lpids);
>
>   return 0;
>  }
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index d185dee26026..0c552885a032 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -50,6 +50,14 @@
>  #define STACK_SLOT_UAMOR (SFS-88)
>  #define STACK_SLOT_FSCR  (SFS-96)
>
> +/*
> + * Use the last LPID (all implemented LPID bits = 1) for partition switching.
> + * This is reserved in the LPID allocator. POWER7 only implements 0x3ff, but
> + * we write 0xfff into the LPID SPR anyway, which seems to work and just
> + * ignores the top bits.
> + */
> +#define   LPID_RSVD  0xfff
> +
>  /*
>   * Call kvmppc_hv_entry in real mode.
>   * Must be called with interrupts hard-disabled.
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index 35f46bf54281..ad1a41e3ff1c 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -371,6 +371,9 @@ void register_page_bootmem_memmap(unsigned long 
> section_nr,
>
>  #ifdef CONFIG_PPC_BOOK3S_64
>  unsigned int mmu_lpid_bits;
> +#ifdef CONFIG_KVM_B

[PATCH v2 4/4] KVM: PPC: Decrement module refcount if init_vm fails

2022-01-24 Thread Fabiano Rosas
We increment the reference count for KVM-HV/PR before the call to
kvmppc_core_init_vm. If that function fails we need to decrement the
refcount.

Signed-off-by: Fabiano Rosas 
---
Caught this while testing Nick's LPID patches by looking at
/sys/module/kvm_hv/refcnt
---
 arch/powerpc/kvm/powerpc.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2ad0ccd202d5..4285d0eac900 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -431,6 +431,8 @@ int kvm_arch_check_processor_compat(void *opaque)
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
struct kvmppc_ops *kvm_ops = NULL;
+   int r;
+
/*
 * if we have both HV and PR enabled, default is HV
 */
@@ -456,7 +458,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return -ENOENT;
 
kvm->arch.kvm_ops = kvm_ops;
-   return kvmppc_core_init_vm(kvm);
+   r = kvmppc_core_init_vm(kvm);
+   if (r)
+   module_put(kvm->arch.kvm_ops->owner);
+   return r;
 err_out:
return -EINVAL;
 }
-- 
2.34.1



[PATCH v2 3/4] KVM: PPC: Book3S HV: Free allocated memory if module init fails

2022-01-24 Thread Fabiano Rosas
The module's exit function is not called when the init fails, we need
to do cleanup before returning.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b9aace212599..87a49651a402 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6104,7 +6104,7 @@ static int kvmppc_book3s_init_hv(void)
if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
r = kvm_init_subcore_bitmap();
if (r)
-   return r;
+   goto err;
}
 
/*
@@ -6120,7 +6120,8 @@ static int kvmppc_book3s_init_hv(void)
np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc");
if (!np) {
pr_err("KVM-HV: Cannot determine method for accessing 
XICS\n");
-   return -ENODEV;
+   r = -ENODEV;
+   goto err;
}
/* presence of intc confirmed - node can be dropped again */
of_node_put(np);
@@ -6133,12 +6134,12 @@ static int kvmppc_book3s_init_hv(void)
 
r = kvmppc_mmu_hv_init();
if (r)
-   return r;
+   goto err;
 
if (kvmppc_radix_possible()) {
r = kvmppc_radix_init();
if (r)
-   return r;
+   goto err;
}
 
r = kvmppc_uvmem_init();
@@ -6151,6 +6152,12 @@ static int kvmppc_book3s_init_hv(void)
kvmppc_hv_ops = &kvm_ops_hv;
 
return 0;
+
+err:
+   kvmhv_nested_exit();
+   kvmppc_radix_exit();
+
+   return r;
 }
 
 static void kvmppc_book3s_exit_hv(void)
-- 
2.34.1



[PATCH v2 2/4] KVM: PPC: Book3S HV: Delay setting of kvm ops

2022-01-24 Thread Fabiano Rosas
Delay the setting of kvm_hv_ops until after all init code has
completed. This avoids leaving the ops still accessible if the init
fails.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_hv.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3a3845f366d4..b9aace212599 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6127,9 +6127,6 @@ static int kvmppc_book3s_init_hv(void)
}
 #endif
 
-   kvm_ops_hv.owner = THIS_MODULE;
-   kvmppc_hv_ops = &kvm_ops_hv;
-
init_default_hcalls();
 
init_vcore_lists();
@@ -6145,10 +6142,15 @@ static int kvmppc_book3s_init_hv(void)
}
 
r = kvmppc_uvmem_init();
-   if (r < 0)
+   if (r < 0) {
pr_err("KVM-HV: kvmppc_uvmem_init failed %d\n", r);
+   return r;
+   }
 
-   return r;
+   kvm_ops_hv.owner = THIS_MODULE;
+   kvmppc_hv_ops = &kvm_ops_hv;
+
+   return 0;
 }
 
 static void kvmppc_book3s_exit_hv(void)
-- 
2.34.1



[PATCH v2 1/4] KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init

2022-01-24 Thread Fabiano Rosas
The return of the function is being shadowed by the call to
kvmppc_uvmem_init.

Fixes: ca9f4942670c ("KVM: PPC: Book3S HV: Support for running secure guests")
Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d1817cd9a691..3a3845f366d4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6138,8 +6138,11 @@ static int kvmppc_book3s_init_hv(void)
if (r)
return r;
 
-   if (kvmppc_radix_possible())
+   if (kvmppc_radix_possible()) {
r = kvmppc_radix_init();
+   if (r)
+   return r;
+   }
 
r = kvmppc_uvmem_init();
if (r < 0)
-- 
2.34.1



Re: [PATCH 6/7] modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC

2022-01-24 Thread Doug Anderson
Hi,

On Mon, Jan 24, 2022 at 1:22 AM Christophe Leroy
 wrote:
>
> --- a/kernel/debug/kdb/kdb_main.c
> +++ b/kernel/debug/kdb/kdb_main.c
> @@ -2022,8 +2022,11 @@ static int kdb_lsmod(int argc, const char **argv)
> if (mod->state == MODULE_STATE_UNFORMED)
> continue;
>
> -   kdb_printf("%-20s%8u  0x%px ", mod->name,
> -  mod->core_layout.size, (void *)mod);
> +   kdb_printf("%-20s%8u", mod->name, mod->core_layout.size);
> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> +   kdb_printf("/%8u  0x%px ", mod->data_layout.size);

Just counting format specifiers and arguments, it seems like something's
wrong in the above print statement.

-Doug


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-24 Thread Yinan Liu

在 2022/1/24 下午5:19, Sachin Sant 写道:

While running stress_code_patching test from selftests/powerpc/mm
against 5.17-rc1 booted on a POWER10 LPAR following ftrace warning
is seen:

WARNING: CPU: 1 PID: 2017392 at kernel/trace/ftrace.c:2068 
ftrace_bug+0x274/0x2d8
Modules linked in: dm_mod bonding rfkill sunrpc pseries_rng xts vmx_crypto 
uio_pdrv_genirq uio sch_fq_codel ip_tables ext4 mbcache jbd2 sd_mod t10_pi sg 
ibmvscsi ibmveth scsi_transport_srp fuse
CPU: 1 PID: 2017392 Comm: stress_code_pat Not tainted 5.17.0-rc1-gdd81e1c7d5fb 
#1
NIP:  c02d561c LR: c02d5618 CTR: 005b4448
REGS: c000332fb760 TRAP: 0700   Not tainted  (5.17.0-rc1-gdd81e1c7d5fb)
MSR:  8282b033   CR: 48228224  XER: 
0009
CFAR: c01f6b00 IRQMASK: 0
GPR00: c02d5618 c000332fba00 c2a2 0022
GPR04: 7fff c000332fb720 c000332fb718 0027
GPR08: c0167cca7e10 0001 0027 c28d6d08
GPR12: 8000 c0167fa30780 4000 7fff9a089798
GPR16: 7fff9a089724 7fff9a026be8 7fff99fbf4f0 7fff9a08d568
GPR20: 7fffce533ed0 0001 7fff9a0399d8 7fffd9eccf94
GPR24: 0001  c000332fbc70 c0fb0d18
GPR28: c0ff5080 c0fadd38 c20032ec c70800a8
NIP [c02d561c] ftrace_bug+0x274/0x2d8
LR [c02d5618] ftrace_bug+0x270/0x2d8


Hi, Steven and Sachin

I don't have a powerpc machine for testing, I guess the ppc has a 
similar problem with the s390. It's not clear to me why the compiler 
does this. Maybe we can handle ppc like you did with the s390 before, 
but I'm not sure if other architectures have similar issues. Or limit 
BUILDTIME_MCOUNT_SORT to a smaller scope and make it only available for 
x86 and arm?


steven, what's your opinion?


Best regards
--yinan



[PATCH v4] powerpc/papr_scm: Implement initial support for injecting smart errors

2022-01-24 Thread Vaibhav Jain
Presently PAPR doesn't support injecting smart errors on an
NVDIMM. This makes testing the NVDIMM health reporting functionality
difficult, as simulating NVDIMM health related events needs a hacked-up
qemu version.

To solve this problem this patch proposes simulating a certain set of
NVDIMM health related events in papr_scm, specifically the 'fatal' health
state and the 'dirty' shutdown state. These errors can be injected via the
user-space 'ndctl-inject-smart(1)' command. With the proposed patch and
corresponding ndctl patches the following command flow is expected:

$ sudo ndctl list -DH -d nmem0
...
  "health_state":"ok",
  "shutdown_state":"clean",
...
 # inject unsafe shutdown and fatal health error
$ sudo ndctl inject-smart nmem0 -Uf
...
  "health_state":"fatal",
  "shutdown_state":"dirty",
...
 # uninject all errors
$ sudo ndctl inject-smart nmem0 -N
...
  "health_state":"ok",
  "shutdown_state":"clean",
...

The patch adds a new member 'health_bitmap_inject_mask' inside struct
papr_scm_priv which is then bitwise ANDed with the health bitmap fetched from
the hypervisor. The value for 'health_bitmap_inject_mask' is accessible from
sysfs at nmemX/papr/health_bitmap_inject.

A new PDSM named 'SMART_INJECT' is proposed that accepts the newly
introduced 'struct nd_papr_pdsm_smart_inject' as a payload that's
exchanged between libndctl and papr_scm to indicate the requested
smart-error states.

When processing the PDSM 'SMART_INJECT', papr_pdsm_smart_inject()
constructs a pair of 'inject_mask' and 'clear_mask' bitmaps from the payload
and bit-blts them onto the 'health_bitmap_inject_mask'. This ensures that,
after being fetched from the hypervisor, the health_bitmap reflects the
requested smart-error states.

Signed-off-by: Vaibhav Jain 
Signed-off-by: Shivaprasad G Bhat 
---
Changelog:

Since v3:
* Renamed the sysfs entry from 'health_bitmap_override' to
'health_bitmap_inject'.
* Simplified the variable names and removed the 'health_bitmap_{mask,override}'
members. Instead replaced them with a single 'health_bitmap_inject_mask'
member. [Aneesh]
* Updated the sysfs documentations and commit description.
* Used READ/WRITE_ONCE macros at places where 'health_bitmap_inject_mask' may be
accessed concurrently.

Since v2:
* Rebased the patch to ppc-next
* Added documentation for newly introduced sysfs attribute 
'health_bitmap_override'

Since v1:
* Updated the patch description.
* Removed dependency of a header movement patch.
* Removed '__packed' attribute for 'struct nd_papr_pdsm_smart_inject' [Aneesh]
---
 Documentation/ABI/testing/sysfs-bus-papr-pmem | 12 +++
 arch/powerpc/include/uapi/asm/papr_pdsm.h | 18 
 arch/powerpc/platforms/pseries/papr_scm.c | 90 ++-
 3 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem 
b/Documentation/ABI/testing/sysfs-bus-papr-pmem
index 95254cec92bf..4ac0673901e7 100644
--- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -61,3 +61,15 @@ Description:
* "CchRHCnt" : Cache Read Hit Count
* "CchWHCnt" : Cache Write Hit Count
* "FastWCnt" : Fast Write Count
+
+What:  /sys/bus/nd/devices/nmemX/papr/health_bitmap_inject
+Date:  Jan, 2022
+KernelVersion: v5.17
+Contact:   linuxppc-dev , 
nvd...@lists.linux.dev,
+Description:
+   (RO) Reports the health bitmap inject bitmap that is applied to
+   bitmap received from PowerVM via the H_SCM_HEALTH. This is used
+   to forcibly set specific bits returned from Hcall. These are then
+   used to simulate various health or shutdown states for an nvdimm
+   and are set by user-space tools like ndctl by issuing a PAPR 
DSM.
+
diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h 
b/arch/powerpc/include/uapi/asm/papr_pdsm.h
index 82488b1e7276..17439925045c 100644
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -116,6 +116,22 @@ struct nd_papr_pdsm_health {
};
 };
 
+/* Flags for injecting specific smart errors */
+#define PDSM_SMART_INJECT_HEALTH_FATAL (1 << 0)
+#define PDSM_SMART_INJECT_BAD_SHUTDOWN (1 << 1)
+
+struct nd_papr_pdsm_smart_inject {
+   union {
+   struct {
+   /* One or more of PDSM_SMART_INJECT_ */
+   __u32 flags;
+   __u8 fatal_enable;
+   __u8 unsafe_shutdown_enable;
+   };
+   __u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
+   };
+};
+
 /*
  * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
  * via 'nd_cmd_pkg.nd_command' member of the ioctl struct
@@ -123,12 +139,14 @@ struct nd_papr_pdsm_health {
 enum papr_pdsm {
PAPR_PDSM_MIN = 0x0,
PAPR_PDSM_HEALTH,
+   PAPR_PDSM_SMART_INJECT,
PAPR_PDSM_MAX,
 };
 
 /* Maximal union that can hol

Re: Build regressions/improvements in v5.17-rc1

2022-01-24 Thread Randy Dunlap



On 1/24/22 10:55, Geert Uytterhoeven wrote:
> Hi Alex,
> 
> On Mon, Jan 24, 2022 at 7:52 PM Alex Deucher  wrote:
>> On Mon, Jan 24, 2022 at 5:25 AM Geert Uytterhoeven  
>> wrote:
>>> On Sun, 23 Jan 2022, Geert Uytterhoeven wrote:
  + /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: error: 
 control reaches end of non-void function [-Werror=return-type]:  => 1560:1
>>
>> I don't really see what's going on here:
>>
>> #ifdef CONFIG_X86_64
>> return cpu_data(first_cpu_of_numa_node).apicid;
>> #else
>> return first_cpu_of_numa_node;
>> #endif
> 
> Ah, the actual failure causing this was not included:
> 
> In file included from /kisskb/src/arch/x86/um/asm/processor.h:41:0,
>  from /kisskb/src/include/linux/mutex.h:19,
>  from /kisskb/src/include/linux/kernfs.h:11,
>  from /kisskb/src/include/linux/sysfs.h:16,
>  from /kisskb/src/include/linux/kobject.h:20,
>  from /kisskb/src/include/linux/pci.h:35,
>  from
> /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:25:
> /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: In
> function 'kfd_cpumask_to_apic_id':
> /kisskb/src/arch/um/include/asm/processor-generic.h:103:18: error:
> called object is not a function or function pointer
>  #define cpu_data (&boot_cpu_data)
>   ^
> /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1556:9:
> note: in expansion of macro 'cpu_data'
>   return cpu_data(first_cpu_of_numa_node).apicid;
>  ^
> /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1560:1:
> error: control reaches end of non-void function [-Werror=return-type]
>  }
>  ^

ah yes, UML.
I have a bunch of UML fixes that I have been hesitant to post.

This is one of them.
What do people think about this?

thanks.

---
From: Randy Dunlap 


../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1556:9: note: in 
expansion of macro ‘cpu_data’
  return cpu_data(first_cpu_of_numa_node).apicid;
 ^~~~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1560:1: error: control 
reaches end of non-void function [-Werror=return-type]

../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c: In function 
‘kfd_fill_iolink_info_for_cpu’:
../arch/um/include/asm/processor-generic.h:103:19: error: called object is not 
a function or function pointer
 #define cpu_data (&boot_cpu_data)
  ~^~~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1688:27: note: in expansion 
of macro ‘cpu_data’
  struct cpuinfo_x86 *c = &cpu_data(0);
   ^~~~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1691:7: error: dereferencing 
pointer to incomplete type ‘struct cpuinfo_x86’
  if (c->x86_vendor == X86_VENDOR_AMD)
   ^~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1691:23: error: 
‘X86_VENDOR_AMD’ undeclared (first use in this function); did you mean 
‘X86_VENDOR_ANY’?
  if (c->x86_vendor == X86_VENDOR_AMD)
   ^~
   X86_VENDOR_ANY

../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c: In function 
‘kfd_create_vcrat_image_cpu’:
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:1742:11: warning: unused 
variable ‘entries’ [-Wunused-variable]
  uint32_t entries = 0;

Signed-off-by: Randy Dunlap 
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |6 +++---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

--- linux-next-20220107.orig/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ linux-next-20220107/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1552,7 +1552,7 @@ static int kfd_cpumask_to_apic_id(const
first_cpu_of_numa_node = cpumask_first(cpumask);
if (first_cpu_of_numa_node >= nr_cpu_ids)
return -1;
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
return cpu_data(first_cpu_of_numa_node).apicid;
 #else
return first_cpu_of_numa_node;
--- linux-next-20220107.orig/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ linux-next-20220107/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -1679,7 +1679,7 @@ static int kfd_fill_mem_info_for_cpu(int
return 0;
 }
 
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
 static int kfd_fill_iolink_info_for_cpu(int numa_node_id, int *avail_size,
uint32_t *num_entries,
struct crat_subtype_iolink *sub_type_hdr)
@@ -1738,7 +1738,7 @@ static int kfd_create_vcrat_image_cpu(vo
struct crat_subtype_generic *sub_type_hdr;
int avail_size = *size;
int numa_node_id;
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
uint32_t entries = 0;
 #endif
int ret = 0;
@@ -1803,7 +1803,7 @@ static int kfd_create_vcrat_image_cpu(vo
sub_type_hdr->length);
 
/* Fill in Subtype:

Re: Build regressions/improvements in v5.17-rc1

2022-01-24 Thread Geert Uytterhoeven
Hi Alex,

On Mon, Jan 24, 2022 at 7:52 PM Alex Deucher  wrote:
> On Mon, Jan 24, 2022 at 5:25 AM Geert Uytterhoeven  
> wrote:
> > On Sun, 23 Jan 2022, Geert Uytterhoeven wrote:
> > >  + /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: 
> > > error: control reaches end of non-void function [-Werror=return-type]:  
> > > => 1560:1
>
> I don't really see what's going on here:
>
> #ifdef CONFIG_X86_64
> return cpu_data(first_cpu_of_numa_node).apicid;
> #else
> return first_cpu_of_numa_node;
> #endif

Ah, the actual failure causing this was not included:

In file included from /kisskb/src/arch/x86/um/asm/processor.h:41:0,
 from /kisskb/src/include/linux/mutex.h:19,
 from /kisskb/src/include/linux/kernfs.h:11,
 from /kisskb/src/include/linux/sysfs.h:16,
 from /kisskb/src/include/linux/kobject.h:20,
 from /kisskb/src/include/linux/pci.h:35,
 from
/kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:25:
/kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: In
function 'kfd_cpumask_to_apic_id':
/kisskb/src/arch/um/include/asm/processor-generic.h:103:18: error:
called object is not a function or function pointer
 #define cpu_data (&boot_cpu_data)
  ^
/kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1556:9:
note: in expansion of macro 'cpu_data'
  return cpu_data(first_cpu_of_numa_node).apicid;
 ^
/kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1560:1:
error: control reaches end of non-void function [-Werror=return-type]
 }
 ^

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: Build regressions/improvements in v5.17-rc1

2022-01-24 Thread Alex Deucher
On Mon, Jan 24, 2022 at 5:25 AM Geert Uytterhoeven  wrote:
>
> On Sun, 23 Jan 2022, Geert Uytterhoeven wrote:
> > Below is the list of build error/warning regressions/improvements in
> > v5.17-rc1[1] compared to v5.16[2].
> >
> > Summarized:
> >  - build errors: +17/-2
> >  - build warnings: +23/-25
> >
> > Note that there may be false regressions, as some logs are incomplete.
> > Still, they're build errors/warnings.
> >
> > Happy fixing! ;-)
> >
> > Thanks to the linux-next team for providing the build service.
> >
> > [1] 
> > http://kisskb.ellerman.id.au/kisskb/branch/linus/head/e783362eb54cd99b2cac8b3a9aeac942e6f6ac07/
> >  (all 99 configs)
> > [2] 
> > http://kisskb.ellerman.id.au/kisskb/branch/linus/head/df0cc57e057f18e44dac8e6c18aba47ab53202f9/
> >  (98 out of 99 configs)
> >
> >
> > *** ERRORS ***
> >
> > 17 error regressions:
> >  + /kisskb/src/arch/powerpc/kernel/stacktrace.c: error: implicit 
> > declaration of function 'nmi_cpu_backtrace' 
> > [-Werror=implicit-function-declaration]:  => 171:2
> >  + /kisskb/src/arch/powerpc/kernel/stacktrace.c: error: implicit 
> > declaration of function 'nmi_trigger_cpumask_backtrace' 
> > [-Werror=implicit-function-declaration]:  => 226:2
>
> powerpc-gcc5/skiroot_defconfig
>
> >  + /kisskb/src/arch/sparc/mm/srmmu.c: error: cast between incompatible 
> > function types from 'void (*)(long unsigned int)' to 'void (*)(long 
> > unsigned int,  long unsigned int,  long unsigned int,  long unsigned int,  
> > long unsigned int)' [-Werror=cast-function-type]:  => 1756:13, 1639:13
> >  + /kisskb/src/arch/sparc/mm/srmmu.c: error: cast between incompatible 
> > function types from 'void (*)(struct mm_struct *)' to 'void (*)(long 
> > unsigned int,  long unsigned int,  long unsigned int,  long unsigned int,  
> > long unsigned int)' [-Werror=cast-function-type]:  => 1674:29, 1662:29
> >  + /kisskb/src/arch/sparc/mm/srmmu.c: error: cast between incompatible 
> > function types from 'void (*)(struct mm_struct *, long unsigned int)' to 
> > 'void (*)(long unsigned int,  long unsigned int,  long unsigned int,  long 
> > unsigned int,  long unsigned int)' [-Werror=cast-function-type]:  => 1767:21
> >  + /kisskb/src/arch/sparc/mm/srmmu.c: error: cast between incompatible 
> > function types from 'void (*)(struct vm_area_struct *, long unsigned int)' 
> > to 'void (*)(long unsigned int,  long unsigned int,  long unsigned int,  
> > long unsigned int,  long unsigned int)' [-Werror=cast-function-type]:  => 
> > 1741:29, 1726:29
> >  + /kisskb/src/arch/sparc/mm/srmmu.c: error: cast between incompatible 
> > function types from 'void (*)(struct vm_area_struct *, long unsigned int,  
> > long unsigned int)' to 'void (*)(long unsigned int,  long unsigned int,  
> > long unsigned int,  long unsigned int,  long unsigned int)' 
> > [-Werror=cast-function-type]:  => 1694:29, 1711:29
>
> sparc64-gcc11/sparc-allmodconfig
>
> >  + /kisskb/src/arch/um/include/asm/processor-generic.h: error: called 
> > object is not a function or function pointer:  => 103:18
> >  + /kisskb/src/drivers/vfio/pci/vfio_pci_rdwr.c: error: assignment makes 
> > pointer from integer without a cast [-Werror=int-conversion]:  => 324:9, 
> > 317:9
> >  + /kisskb/src/drivers/vfio/pci/vfio_pci_rdwr.c: error: implicit 
> > declaration of function 'ioport_map' 
> > [-Werror=implicit-function-declaration]:  => 317:11
> >  + /kisskb/src/drivers/vfio/pci/vfio_pci_rdwr.c: error: implicit 
> > declaration of function 'ioport_unmap' 
> > [-Werror=implicit-function-declaration]:  => 338:15
>
> um-x86_64/um-allyesconfig
>
> >  + /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: error: 
> > control reaches end of non-void function [-Werror=return-type]:  => 1560:1

I don't really see what's going on here:

#ifdef CONFIG_X86_64
return cpu_data(first_cpu_of_numa_node).apicid;
#else
return first_cpu_of_numa_node;
#endif

Alex

>
> um-x86_64/um-all{mod,yes}config
>
> >  + /kisskb/src/drivers/net/ethernet/freescale/fec_mpc52xx.c: error: passing 
> > argument 2 of 'mpc52xx_fec_set_paddr' discards 'const' qualifier from 
> > pointer target type [-Werror=discarded-qualifiers]:  => 659:29
>
> powerpc-gcc5/ppc32_allmodconfig
>
> >  + /kisskb/src/drivers/pinctrl/pinctrl-thunderbay.c: error: assignment 
> > discards 'const' qualifier from pointer target type 
> > [-Werror=discarded-qualifiers]:  => 815:8, 815:29
>
> arm64-gcc5.4/arm64-allmodconfig
> arm64-gcc8/arm64-allmodconfig
>
> >  + /kisskb/src/lib/test_printf.c: error: "PTR" redefined [-Werror]:  => 
> > 247:0, 247
> >  + /kisskb/src/sound/pci/ca0106/ca0106.h: error: "PTR" redefined [-Werror]: 
> >  => 62, 62:0
>
> mips-gcc8/mips-allmodconfig
> mipsel/mips-allmodconfig
>
> >  + error: arch/powerpc/kvm/book3s_64_entry.o: relocation truncated to fit: 
> > R_PPC64_REL14 (stub) against symbol `machine_check_common' defined in .text 
> > section in arch/powerpc/kernel/head_64.o:  => (.text+0x3e4)
>
> powerpc-gcc5/powerpc-allyesconfig
>
> Gr{oetje,eeting}s,
>
> 

Re: [PATCH] KVM: PPC: Book3S HV P9: Optimise loads around context switch

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> It is better to get all loads for the register values in flight
> before starting to switch LPID, PID, and LPCR because those
> mtSPRs are expensive and serialising.
>
> This also just tidies up the code for a potential future change
> to the context switching sequence.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 


Re: Build regressions/improvements in v5.17-rc1

2022-01-24 Thread Jakub Kicinski
On Mon, 24 Jan 2022 09:04:33 -0800 Jakub Kicinski wrote:
> On Mon, 24 Jan 2022 08:55:40 +0100 (CET) Geert Uytterhoeven wrote:
> > >  + /kisskb/src/drivers/net/ethernet/freescale/fec_mpc52xx.c: error: 
> > > passing argument 2 of 'mpc52xx_fec_set_paddr' discards 'const' qualifier 
> > > from pointer target type [-Werror=discarded-qualifiers]:  => 659:29
> > 
> > powerpc-gcc5/ppc32_allmodconfig

Sent:
https://lore.kernel.org/r/20220124172249.2827138-1-k...@kernel.org/

> > >  + /kisskb/src/drivers/pinctrl/pinctrl-thunderbay.c: error: assignment 
> > > discards 'const' qualifier from pointer target type 
> > > [-Werror=discarded-qualifiers]:  => 815:8, 815:29
> > 
> > arm64-gcc5.4/arm64-allmodconfig
> > arm64-gcc8/arm64-allmodconfig  

I take this one back, that's not me.


Re: [PATCH] KVM: PPC: Book3S HV: HFSCR[PREFIX] does not exist

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> This facility is controlled by FSCR only. Reserved bits should not be
> set in the HFSCR register (although it's likely harmless as this
> position would not be re-used, and the L0 is forgiving here too).
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/include/asm/reg.h | 1 -
>  arch/powerpc/kvm/book3s_hv.c   | 2 +-
>  2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 2835f6363228..1e14324c5190 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -417,7 +417,6 @@
>  #define   FSCR_DSCR  __MASK(FSCR_DSCR_LG)
>  #define   FSCR_INTR_CAUSE (ASM_CONST(0xFF) << 56)/* interrupt cause */
>  #define SPRN_HFSCR   0xbe/* HV=1 Facility Status & Control Register */
> -#define   HFSCR_PREFIX   __MASK(FSCR_PREFIX_LG)
>  #define   HFSCR_MSGP __MASK(FSCR_MSGP_LG)
>  #define   HFSCR_TAR  __MASK(FSCR_TAR_LG)
>  #define   HFSCR_EBB  __MASK(FSCR_EBB_LG)
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 84c89f08ae9a..be8914c3dde9 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -2830,7 +2830,7 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu 
> *vcpu)
>* to trap and then we emulate them.
>*/
>   vcpu->arch.hfscr = HFSCR_TAR | HFSCR_EBB | HFSCR_PM | HFSCR_BHRB |
> - HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP | HFSCR_PREFIX;
> + HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP;
>   if (cpu_has_feature(CPU_FTR_HVMODE)) {
>   vcpu->arch.hfscr &= mfspr(SPRN_HFSCR);
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM


Re: [PATCH] KVM: PPC: Book3S HV Nested: Fix nested HFSCR being clobbered with multiple vCPUs

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> The L0 is storing HFSCR requested by the L1 for the L2 in struct
> kvm_nested_guest when the L1 requests a vCPU enter L2. kvm_nested_guest
> is not a per-vCPU structure. Hilarity ensues.
>
> Fix it by moving the nested hfscr into the vCPU structure together with
> the other per-vCPU nested fields.
>
> Fixes: 8b210a880b35 ("KVM: PPC: Book3S HV Nested: Make nested HFSCR state 
> accessible")
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 


Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages

2022-01-24 Thread Zi Yan
On 24 Jan 2022, at 4:55, Oscar Salvador wrote:

> On 2022-01-19 20:06, Zi Yan wrote:
>> From: Zi Yan 
>>
>> Enable set_migratetype_isolate() to check specified sub-range for
>> unmovable pages during isolation. Page isolation is done
>> at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
>> pages within that granularity are intended to be isolated. For example,
>> alloc_contig_range(), which uses page isolation, allows ranges without
>> alignment. This commit makes unmovable page check only look for
>> interesting pages, so that page isolation can succeed for any
>> non-overlapping ranges.
>
> Hi Zi Yan,
>
> I had to re-read this several times as I found this a bit misleading.
> I was mainly confused by the fact that memory_hotplug does isolation on 
> PAGES_PER_SECTION granularity, and reading the above seems to indicate that 
> either do it at MAX_ORDER_NR_PAGES or at pageblock_nr_pages granularity.

You are right. Sorry for the confusion. I think it should be
“Page isolation is done at least on max(MAX_ORDER_NR_PAGES,
pageblock_nr_pages) granularity.”

memory_hotplug uses PAGES_PER_SECTION. It is greater than that.


>
> True is that start_isolate_page_range() expects the range to be pageblock 
> aligned and works in pageblock_nr_pages chunks, but I do not think that is 
> what you meant to say here.

Actually, start_isolate_page_range() should expect max(MAX_ORDER_NR_PAGES,
pageblock_nr_pages) alignment instead of pageblock alignment. It seems to
be an uncovered bug in the current code, since all callers use at least
max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) alignment.

The reason is that if start_isolate_page_range() is only pageblock aligned
and a caller wants to isolate one pageblock from a MAX_ORDER-1
(2 pageblocks on x86_64 systems) free page, this will lead to MIGRATE_ISOLATE
accounting errors. To avoid this, start_isolate_page_range() needs to isolate
the max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) aligned range.


>
> Now, to the change itself, below:
>
>
>> @@ -47,8 +51,8 @@ static struct page *has_unmovable_pages(struct zone
>> *zone, struct page *page,
>>  return page;
>>  }
>>
>> -for (; iter < pageblock_nr_pages - offset; iter++) {
>> -page = pfn_to_page(pfn + iter);
>> +for (pfn = first_pfn; pfn < last_pfn; pfn++) {
>
> You already did pfn = first_pfn before.

Got it. Will remove the redundant code.

>
>>  /**
>>   * start_isolate_page_range() - make page-allocation-type of range of pages 
>> to
>>   * be MIGRATE_ISOLATE.
>> - * @start_pfn:  The lower PFN of the range to be isolated.
>> - * @end_pfn:The upper PFN of the range to be isolated.
>> + * @start_pfn:  The lower PFN of the range to be checked for
>> + *  possibility of isolation.
>> + * @end_pfn:The upper PFN of the range to be checked for
>> + *  possibility of isolation.
>> + * @isolate_start:  The lower PFN of the range to be isolated.
>> + * @isolate_end:The upper PFN of the range to be isolated.
>
> So, what does "possibility" means here. I think this need to be clarified a 
> bit better.

start_isolate_page_range() needs to check if unmovable pages exist in the
range [start_pfn, end_pfn) but mark all pageblocks within [isolate_start,
isolate_end) MIGRATE_ISOLATE (isolate_* need to be max(MAX_ORDER_NR_PAGES,
pageblock_nr_pages) aligned). But now I realize “possibility” here is very
confusing, since both ranges decide whether the isolation can succeed.

>
> From what you pointed out in the commit message I think what you are doing is:
>
> - alloc_contig_range() gets a range to be isolated.
> - then you pass two ranges to start_isolate_page_range()
>   (start_pfn, end_pfn]: which is the unaligned range you got in 
> alloc_contig_range()
>   (isolate_start, isolate_end]: which got aligned to, let's say, to 
> MAX_ORDER_NR_PAGES
>
> Now, most likely, (start_pfn, end_pfn] only covers a sub-range of 
> (isolate_start, isolate_end], and that
> sub-range is what you really want to isolate (so (start_pfn, end_pfn])?

Correct.

I agree that isolate_start and isolate_end are pretty confusing here.
They are implementation details of start_isolate_page_range() and should
not be exposed. I will remove them from the parameter list and produce
them inside start_isolate_page_range(). They are pfn_max_align_down()
and pfn_max_align_up() of start_pfn and end_pfn, respectively.

In alloc_contig_range(), the code is still needed to save and restore
migratetypes for [isolate_start, start_pfn) and (end_pfn, isolate_end],
because [start_pfn, end_pfn) is not required to be max(MAX_ORDER_NR_PAGES,
pageblock_nr_pages) aligned. Like I said in the patch, the code will
go away once MIGRATE_ISOLATE becomes a standalone bit without overwriting
existing migratetypes during page isolation. And then isolate_start
and isolate_end here will be completely transparent to callers of
start_isolate_page_ra

Re: Build regressions/improvements in v5.17-rc1

2022-01-24 Thread Jakub Kicinski
On Mon, 24 Jan 2022 08:55:40 +0100 (CET) Geert Uytterhoeven wrote:
> >  + /kisskb/src/drivers/net/ethernet/freescale/fec_mpc52xx.c: error: passing 
> > argument 2 of 'mpc52xx_fec_set_paddr' discards 'const' qualifier from 
> > pointer target type [-Werror=discarded-qualifiers]:  => 659:29  
> 
> powerpc-gcc5/ppc32_allmodconfig
> 
> >  + /kisskb/src/drivers/pinctrl/pinctrl-thunderbay.c: error: assignment 
> > discards 'const' qualifier from pointer target type 
> > [-Werror=discarded-qualifiers]:  => 815:8, 815:29  
> 
> arm64-gcc5.4/arm64-allmodconfig
> arm64-gcc8/arm64-allmodconfig

Let me take care of these in net.


Re: [PATCH v4 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.

2022-01-24 Thread Mel Gorman
On Mon, Jan 24, 2022 at 11:12:07AM -0500, Zi Yan wrote:
> On 24 Jan 2022, at 9:02, Mel Gorman wrote:
> 
> > On Wed, Jan 19, 2022 at 02:06:17PM -0500, Zi Yan wrote:
> >> From: Zi Yan 
> >>
> >> This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
> >> It prepares for the upcoming removal of the MAX_ORDER-1 alignment
> >> requirement for CMA and alloc_contig_range().
> >>
> >> MIGRATE_HIGHATOMIC should not merge with other migratetypes like
> >> MIGRATE_ISOLATE and MIGRATE_CMA[1], so this commit prevents that too.
> >> Also add MIGRATE_HIGHATOMIC to the fallbacks array for completeness.
> >>
> >> [1] 
> >> https://lore.kernel.org/linux-mm/20211130100853.gp3...@techsingularity.net/
> >>
> >> Signed-off-by: Zi Yan 
> >>
> >> 
> >>
> >> @@ -2484,6 +2483,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
> >>[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   
> >> MIGRATE_TYPES },
> >>[MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, 
> >> MIGRATE_TYPES },
> >>[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   
> >> MIGRATE_TYPES },
> >> +  [MIGRATE_HIGHATOMIC] = { MIGRATE_TYPES }, /* Never used */
> >>  #ifdef CONFIG_CMA
> >>[MIGRATE_CMA] = { MIGRATE_TYPES }, /* Never used */
> >>  #endif
> >
> > If it's never used, why is it added?
> 
> Just to make the fallbacks list complete, since MIGRATE_CMA and
> MIGRATE_ISOLATE are in the list. Instead, I can remove MIGRATE_CMA and
> MIGRATE_ISOLATE. WDYT?
> 

It probably makes more sense to remove them or replace them with a comment
stating what migratetypes do not have a fallback list. Do it as a separate
patch that stands alone. It does not need to be part of this series.

-- 
Mel Gorman
SUSE Labs


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-24 Thread Steven Rostedt
On Mon, 24 Jan 2022 20:15:06 +0800
Yinan Liu  wrote:

> Hi, Steven and Sachin
> 
> I don't have a powerpc machine for testing, I guess the ppc has a 
> similar problem with the s390. It's not clear to me why the compiler 
> does this. Maybe we can handle ppc like you did with the s390 before, 
> but I'm not sure if other architectures have similar issues. Or limit 
> BUILDTIME_MCOUNT_SORT to a smaller scope and make it only available for 
> x86 and arm?
> 
> steven, what's your opinion?

Yeah, I think it's time to opt in, instead of opting out.

Something like this:

-- Steve

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c2724d986fa0..5256ebe57451 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -82,6 +82,7 @@ config ARM
select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32
select HAVE_CONTEXT_TRACKING
select HAVE_C_RECORDMCOUNT
+   select HAVE_BUILDTIME_MCOUNT_SORT
select HAVE_DEBUG_KMEMLEAK if !XIP_KERNEL
select HAVE_DMA_CONTIGUOUS if MMU
select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17..7996548b2b27 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -166,6 +166,7 @@ config ARM64
select HAVE_ASM_MODVERSIONS
select HAVE_EBPF_JIT
select HAVE_C_RECORDMCOUNT
+   select HAVE_BUILDTIME_MCOUNT_SORT
select HAVE_CMPXCHG_DOUBLE
select HAVE_CMPXCHG_LOCAL
select HAVE_CONTEXT_TRACKING
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7399327d1eff..46080dea5dba 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -186,6 +186,7 @@ config X86
select HAVE_CONTEXT_TRACKING_OFFSTACK   if HAVE_CONTEXT_TRACKING
select HAVE_C_RECORDMCOUNT
select HAVE_OBJTOOL_MCOUNT  if STACK_VALIDATION
+   select HAVE_BUILDTIME_MCOUNT_SORT
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 752ed89a293b..7e5b92090faa 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -70,10 +70,16 @@ config HAVE_C_RECORDMCOUNT
help
  C version of recordmcount available?
 
+config HAVE_BUILDTIME_MCOUNT_SORT
+   bool
+   help
+ An architecture selects this if it sorts the mcount_loc section
+ at build time.
+
 config BUILDTIME_MCOUNT_SORT
bool
default y
-   depends on BUILDTIME_TABLE_SORT && !S390
+   depends on HAVE_BUILDTIME_MCOUNT_SORT
help
  Sort the mcount_loc section at build time.
 


Re: [PATCH v4 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.

2022-01-24 Thread Zi Yan
On 24 Jan 2022, at 9:02, Mel Gorman wrote:

> On Wed, Jan 19, 2022 at 02:06:17PM -0500, Zi Yan wrote:
>> From: Zi Yan 
>>
>> This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
>> It prepares for the upcoming removal of the MAX_ORDER-1 alignment
>> requirement for CMA and alloc_contig_range().
>>
>> MIGRATE_HIGHATOMIC should not merge with other migratetypes like
>> MIGRATE_ISOLATE and MIGRATE_CMA[1], so this commit prevents that too.
>> Also add MIGRATE_HIGHATOMIC to the fallbacks array for completeness.
>>
>> [1] 
>> https://lore.kernel.org/linux-mm/20211130100853.gp3...@techsingularity.net/
>>
>> Signed-off-by: Zi Yan 
>>
>> 
>>
>> @@ -2484,6 +2483,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
>>  [MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   
>> MIGRATE_TYPES },
>>  [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, 
>> MIGRATE_TYPES },
>>  [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   
>> MIGRATE_TYPES },
>> +[MIGRATE_HIGHATOMIC] = { MIGRATE_TYPES }, /* Never used */
>>  #ifdef CONFIG_CMA
>>  [MIGRATE_CMA] = { MIGRATE_TYPES }, /* Never used */
>>  #endif
>
> If it's never used, why is it added?

Just to make the fallbacks list complete, since MIGRATE_CMA and
MIGRATE_ISOLATE are in the list. Instead, I can remove MIGRATE_CMA and
MIGRATE_ISOLATE. WDYT?

>
> Otherwise looks fine so
>
> Acked-by: Mel Gorman 

Thanks.

--
Best Regards,
Yan, Zi


signature.asc
Description: OpenPGP digital signature


[PATCH 3/3] powerpc/time: improve decrementer clockevent processing

2022-01-24 Thread Nicholas Piggin
The stop/shutdown op should not use decrementer_set_next_event because
that sets decrementers_next_tb to now + decrementer_max, which means a
decrementer interrupt that occurs after that time will call the
clockevent event handler unexpectedly. Set next_tb to ~0 here to prevent
any clock event call. Init all clockevents to stopped.

Then the decrementer clockevent device always has event_handler set and
applicable because we know the clock event device was not stopped. So
make this call unconditional to show that it is always called. next_tb
need not be set to ~0 before the event handler is called because it will
stop the clockevent device if there is no other timer.

Finally, the timer broadcast interrupt should not modify next_tb because
it is not involved with the local decrementer clockevent on this CPU.

This doesn't fix a known bug, just tidies the code.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/time.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 384f58a3f373..f3845601ab6a 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -107,7 +107,12 @@ struct clock_event_device decrementer_clockevent = {
 };
 EXPORT_SYMBOL(decrementer_clockevent);
 
-DEFINE_PER_CPU(u64, decrementers_next_tb);
+/*
+ * This always puts next_tb beyond now, so the clock event will never fire
+ * with the usual comparison, no need for a separate test for stopped.
+ */
+#define DEC_CLOCKEVENT_STOPPED ~0ULL
+DEFINE_PER_CPU(u64, decrementers_next_tb) = DEC_CLOCKEVENT_STOPPED;
 EXPORT_SYMBOL_GPL(decrementers_next_tb);
 static DEFINE_PER_CPU(struct clock_event_device, decrementers);
 
@@ -644,9 +649,7 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(timer_interrupt)
 
now = get_tb();
if (now >= *next_tb) {
-   *next_tb = ~(u64)0;
-   if (evt->event_handler)
-   evt->event_handler(evt);
+   evt->event_handler(evt);
__this_cpu_inc(irq_stat.timer_irqs_event);
} else {
now = *next_tb - now;
@@ -665,9 +668,6 @@ EXPORT_SYMBOL(timer_interrupt);
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 void timer_broadcast_interrupt(void)
 {
-   u64 *next_tb = this_cpu_ptr(&decrementers_next_tb);
-
-   *next_tb = ~(u64)0;
tick_receive_broadcast();
__this_cpu_inc(irq_stat.broadcast_irqs_event);
 }
@@ -893,7 +893,9 @@ static int decrementer_set_next_event(unsigned long evt,
 
 static int decrementer_shutdown(struct clock_event_device *dev)
 {
-   decrementer_set_next_event(decrementer_max, dev);
+   __this_cpu_write(decrementers_next_tb, DEC_CLOCKEVENT_STOPPED);
+   set_dec_or_work(decrementer_max);
+
return 0;
 }
 
-- 
2.23.0



[PATCH 2/3] powerpc/time: Fix KVM host re-arming a timer beyond decrementer range

2022-01-24 Thread Nicholas Piggin
If the next host timer is beyond decrementer range, timer_rearm_host_dec
will leave the decrementer unprogrammed. This will not cause a problem
for the host, which will just set the decrementer correctly when the
decrementer interrupt hits, but it seems safer not to leave the next
host decrementer interrupt timing able to be influenced by a guest.

This code is only used in the P9 KVM paths, so it's unlikely to be hit
in practice unless the large decrementer is force-disabled in the host.

Fixes: 25aa145856cd ("powerpc/time: add API for KVM to re-arm the host 
timer/decrementer")
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/time.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index cd0b8b71ecdd..384f58a3f373 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -582,8 +582,9 @@ void timer_rearm_host_dec(u64 now)
local_paca->irq_happened |= PACA_IRQ_DEC;
} else {
now = *next_tb - now;
-   if (now <= decrementer_max)
-   set_dec_or_work(now);
+   if (now > decrementer_max)
+   now = decrementer_max;
+   set_dec_or_work(now);
}
 }
 EXPORT_SYMBOL_GPL(timer_rearm_host_dec);
-- 
2.23.0



[PATCH 1/3] powerpc/64s/interrupt: Fix decrementer storm

2022-01-24 Thread Nicholas Piggin
The decrementer exception can fail to be cleared when the interrupt
returns in the case where the decrementer wraps with the next timer
still beyond decrementer_max. This results in a decrementer interrupt
storm. This is triggerable on a small decrementer system with the hard
and soft watchdogs disabled.

Fix this by always programming the decrementer if there was no timer.

Fixes: 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq 
handlers unless perf is in use")
Reported-by: Alexey Kardashevskiy 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/time.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 62361cc7281c..cd0b8b71ecdd 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -649,8 +649,9 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(timer_interrupt)
__this_cpu_inc(irq_stat.timer_irqs_event);
} else {
now = *next_tb - now;
-   if (now <= decrementer_max)
-   set_dec_or_work(now);
+   if (now > decrementer_max)
+   now = decrementer_max;
+   set_dec_or_work(now);
__this_cpu_inc(irq_stat.timer_irqs_others);
}
 
-- 
2.23.0



Re: [PATCH v4 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.

2022-01-24 Thread Mel Gorman
On Wed, Jan 19, 2022 at 02:06:17PM -0500, Zi Yan wrote:
> From: Zi Yan 
> 
> This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
> It prepares for the upcoming removal of the MAX_ORDER-1 alignment
> requirement for CMA and alloc_contig_range().
> 
> MIGRATE_HIGHATOMIC should not merge with other migratetypes like
> MIGRATE_ISOLATE and MIGRATE_CMA[1], so this commit prevents that too.
> Also add MIGRATE_HIGHATOMIC to the fallbacks array for completeness.
> 
> [1] 
> https://lore.kernel.org/linux-mm/20211130100853.gp3...@techsingularity.net/
> 
> Signed-off-by: Zi Yan 
>
> 
>
> @@ -2484,6 +2483,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
>   [MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   
> MIGRATE_TYPES },
>   [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, 
> MIGRATE_TYPES },
>   [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   
> MIGRATE_TYPES },
> + [MIGRATE_HIGHATOMIC] = { MIGRATE_TYPES }, /* Never used */
>  #ifdef CONFIG_CMA
>   [MIGRATE_CMA] = { MIGRATE_TYPES }, /* Never used */
>  #endif

If it's never used, why is it added?

Otherwise looks fine so

Acked-by: Mel Gorman 

-- 
Mel Gorman
SUSE Labs


Re: [PATCH kernel v5] KVM: PPC: Merge powerpc's debugfs entry content into generic entry

2022-01-24 Thread Cédric Le Goater

On 1/11/22 01:54, Alexey Kardashevskiy wrote:

At the moment KVM on PPC creates 4 types of entries under the kvm debugfs:
1) "%pid-%fd" per KVM instance (for all platforms);
2) "vm%pid" (for PPC Book3s HV KVM);
3) "vm%u_vcpu%u_timing" (for PPC Book3e KVM);
4) "kvm-xive-%p" (for XIVE PPC Book3s KVM, the same for XICS);

The problem with this is that multiple VMs per process are not allowed for
2) and 3), which makes it possible for userspace to trigger errors by
creating duplicate debugfs entries.

This merges all these into 1).

This defines kvm_arch_create_kvm_debugfs() similar to
kvm_arch_create_vcpu_debugfs().

This defines 2 hooks in kvmppc_ops that allow specific KVM implementations
to add the necessary entries, and adds the _e500 suffix to
kvmppc_create_vcpu_debugfs_e500() to make it clear which platform it is for.

This makes use of already existing kvm_arch_create_vcpu_debugfs() on PPC.

This removes no more used debugfs_dir pointers from PPC kvm_arch structs.

This stops removing vcpu entries since, once created, vcpus stay around
for the entire life of a VM and are removed when the KVM instance is closed,
see commit d56f5136b010 ("KVM: let kvm_destroy_vm_debugfs clean up vCPU
debugfs directories").

Suggested-by: Fabiano Rosas 
Signed-off-by: Alexey Kardashevskiy 


Reviewed-by: Cédric Le Goater 

Thanks,

C.


---
Changes:
v5:
* fixed e500mc2

v4:
* added "kvm-xive-%p"

v3:
* reworked commit log, especially, the bit about removing vcpus

v2:
* handled powerpc-booke
* s/kvm/vm/ in arch hooks
---
  arch/powerpc/include/asm/kvm_host.h|  6 ++---
  arch/powerpc/include/asm/kvm_ppc.h |  2 ++
  arch/powerpc/kvm/timing.h  | 12 +-
  arch/powerpc/kvm/book3s_64_mmu_hv.c|  2 +-
  arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
  arch/powerpc/kvm/book3s_hv.c   | 31 ++
  arch/powerpc/kvm/book3s_xics.c | 13 ++-
  arch/powerpc/kvm/book3s_xive.c | 13 ++-
  arch/powerpc/kvm/book3s_xive_native.c  | 13 ++-
  arch/powerpc/kvm/e500.c|  1 +
  arch/powerpc/kvm/e500mc.c  |  1 +
  arch/powerpc/kvm/powerpc.c | 16 ++---
  arch/powerpc/kvm/timing.c  | 21 +
  13 files changed, 51 insertions(+), 82 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 17263276189e..f5e14fa683f4 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -26,6 +26,8 @@
  #include 
  #include 
  
+#define __KVM_HAVE_ARCH_VCPU_DEBUGFS

+
  #define KVM_MAX_VCPUS NR_CPUS
  #define KVM_MAX_VCORESNR_CPUS
  
@@ -295,7 +297,6 @@ struct kvm_arch {

bool dawr1_enabled;
pgd_t *pgtable;
u64 process_table;
-   struct dentry *debugfs_dir;
struct kvm_resize_hpt *resize_hpt; /* protected by kvm->lock */
  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
@@ -673,7 +674,6 @@ struct kvm_vcpu_arch {
u64 timing_min_duration[__NUMBER_OF_KVM_EXIT_TYPES];
u64 timing_max_duration[__NUMBER_OF_KVM_EXIT_TYPES];
u64 timing_last_exit;
-   struct dentry *debugfs_exit_timing;
  #endif
  
  #ifdef CONFIG_PPC_BOOK3S

@@ -829,8 +829,6 @@ struct kvm_vcpu_arch {
struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */
struct kvmhv_tb_accumulator guest_time; /* guest execution */
struct kvmhv_tb_accumulator cede_time;  /* time napping inside guest */
-
-   struct dentry *debugfs_dir;
  #endif /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */
  };
  
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h

index 33db83b82fbd..d2b192dea0d2 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -316,6 +316,8 @@ struct kvmppc_ops {
int (*svm_off)(struct kvm *kvm);
int (*enable_dawr1)(struct kvm *kvm);
bool (*hash_v3_possible)(void);
+   int (*create_vm_debugfs)(struct kvm *kvm);
+   int (*create_vcpu_debugfs)(struct kvm_vcpu *vcpu, struct dentry 
*debugfs_dentry);
  };
  
  extern struct kvmppc_ops *kvmppc_hv_ops;

diff --git a/arch/powerpc/kvm/timing.h b/arch/powerpc/kvm/timing.h
index feef7885ba82..45817ab82bb4 100644
--- a/arch/powerpc/kvm/timing.h
+++ b/arch/powerpc/kvm/timing.h
@@ -14,8 +14,8 @@
  #ifdef CONFIG_KVM_EXIT_TIMING
  void kvmppc_init_timing_stats(struct kvm_vcpu *vcpu);
  void kvmppc_update_timing_stats(struct kvm_vcpu *vcpu);
-void kvmppc_create_vcpu_debugfs(struct kvm_vcpu *vcpu, unsigned int id);
-void kvmppc_remove_vcpu_debugfs(struct kvm_vcpu *vcpu);
+int kvmppc_create_vcpu_debugfs_e500(struct kvm_vcpu *vcpu,
+   struct dentry *debugfs_dentry);
  
  static inline void kvmppc_set_exit_type(struct kvm_vcpu *vcpu, int type)

  {
@@ -26,9 +26,11 @@ static inline void kvmppc_set_exit_type(struct kvm_vcpu 
*vcpu, int type)
  /* if exit timing is n

[PATCH] powerpc/64: Move paca allocation later in boot

2022-01-24 Thread Michael Ellerman
Mahesh & Sourabh identified two problems[1][2] with ppc64_bolted_size()
and paca allocation.

The first is that on a Radix capable machine but with "disable_radix" on
the command line, there is a window during early boot where
early_radix_enabled() is true, even though it will later become false.

  early_init_devtree:   <- early_radix_enabled() = false
early_init_dt_scan_cpus:<- early_radix_enabled() = false
...
check_cpu_pa_features:  <- early_radix_enabled() = false
...   ^ <- early_radix_enabled() = TRUE
allocate_paca:| <- early_radix_enabled() = TRUE
...   |
ppc64_bolted_size:| <- early_radix_enabled() = TRUE
if (early_radix_enabled())| <- early_radix_enabled() = TRUE
return ULONG_MAX; |
...   |
...   | <- early_radix_enabled() = TRUE
...   | <- early_radix_enabled() = TRUE
mmu_early_init_devtree()  V
... <- early_radix_enabled() = false

This causes ppc64_bolted_size() to return ULONG_MAX for the boot CPU's
paca allocation, even though later it will return a different value.
This is not currently a bug because the paca allocation is also limited
by the RMA size, but that is very fragile.

The second issue is that when using the Hash MMU, when we call
ppc64_bolted_size() for the boot CPU's paca allocation, we have not yet
detected whether 1T segments are available. That causes
ppc64_bolted_size() to return 256MB, even if the machine can actually
support up to 1T. This is usually OK, we generally have space below
256MB for one paca, but for a kdump kernel placed above 256MB it causes
the boot to fail.

At boot we cannot discover all the features of the machine
instantaneously, so there will always be some periods where we have
incomplete knowledge of the system. However both the above problems stem
from the fact that we allocate the boot CPU's paca (and paca pointers
array) before we decide which MMU we are using, or discover its exact
features.

Moving the paca allocation slightly later still can solve both the
issues described above, and means for a normal boot we don't do any
permanent allocations until after we've discovered the MMU.

Note that although we move the boot CPU's paca allocation later, we
still have a temporary paca (boot_paca) accessible via r13, so code that
does read only access to paca fields is safe. The only risk is that some
code writes to the boot_paca, and that write will then be lost when we
switch away from the boot_paca later in early_setup().

The additional code that runs before the paca allocation is primarily
mmu_early_init_devtree(), which is scanning the device tree and
populating globals and cur_cpu_spec with MMU related flags. I do not see
any additional code that writes to paca fields.

[1]: https://lore.kernel.org/r/20211018084434.217772-2-sourabhj...@linux.ibm.com
[2]: https://lore.kernel.org/r/20211018084434.217772-3-sourabhj...@linux.ibm.com

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/prom.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 3d30d40a0e9c..86c4f009563d 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -352,6 +352,9 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
be32_to_cpu(intserv[found_thread]));
boot_cpuid = found;
 
+   // Pass the boot CPU's hard CPU id back to our caller
+   *((u32 *)data) = be32_to_cpu(intserv[found_thread]);
+
/*
 * PAPR defines "logical" PVR values for cpus that
 * meet various levels of the architecture:
@@ -388,9 +391,7 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
cur_cpu_spec->cpu_features &= ~CPU_FTR_SMT;
else if (!dt_cpu_ftrs_in_use())
cur_cpu_spec->cpu_features |= CPU_FTR_SMT;
-   allocate_paca(boot_cpuid);
 #endif
-   set_hard_smp_processor_id(found, be32_to_cpu(intserv[found_thread]));
 
return 0;
 }
@@ -714,6 +715,7 @@ static inline void save_fscr_to_task(void) {}
 
 void __init early_init_devtree(void *params)
 {
+   u32 boot_cpu_hwid;
phys_addr_t limit;
 
DBG(" -> early_init_devtree(%px)\n", params);
@@ -790,8 +792,6 @@ void __init early_init_devtree(void *params)
 * FIXME .. and the initrd too? */
move_device_tree();
 
-   allocate_paca_ptrs();
-
DBG("Scanning CPUs ...\n");
 
dt_cpu_ftrs_scan();
@@ -799,7 +799,7 @@ void __init early_init_devtree(void *params)
/* Retrieve CPU related informations from the flat tree
 * (altivec support, boot CPU ID, ...)
 */
- 

[RFC V1 04/31] powerpc/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT

2022-01-24 Thread Anshuman Khandual
This defines and exports a platform specific custom vm_get_page_prot() via
subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX
macros can be dropped which are no longer needed. While here, this also
localizes arch_vm_get_page_prot() as powerpc_vm_get_page_prot().

Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/include/asm/mman.h|  3 +-
 arch/powerpc/include/asm/pgtable.h | 19 
 arch/powerpc/mm/mmap.c | 47 ++
 4 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b779603978e1..ddb4a3687c05 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -135,6 +135,7 @@ config PPC
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UACCESS_FLUSHCACHE
select ARCH_HAS_UBSAN_SANITIZE_ALL
+   select ARCH_HAS_VM_GET_PAGE_PROT
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_KEEP_MEMBLOCK
select ARCH_MIGHT_HAVE_PC_PARPORT
diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 7cb6d18f5cd6..7b10c2031e82 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -24,7 +24,7 @@ static inline unsigned long arch_calc_vm_prot_bits(unsigned 
long prot,
 }
 #define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
 
-static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
+static inline pgprot_t powerpc_vm_get_page_prot(unsigned long vm_flags)
 {
 #ifdef CONFIG_PPC_MEM_KEYS
return (vm_flags & VM_SAO) ?
@@ -34,7 +34,6 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long 
vm_flags)
return (vm_flags & VM_SAO) ? __pgprot(_PAGE_SAO) : __pgprot(0);
 #endif
 }
-#define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
 
 static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
 {
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index d564d0ecd4cd..3cbb6de20f9d 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -20,25 +20,6 @@ struct mm_struct;
 #include 
 #endif /* !CONFIG_PPC_BOOK3S */
 
-/* Note due to the way vm flags are laid out, the bits are XWR */
-#define __P000 PAGE_NONE
-#define __P001 PAGE_READONLY
-#define __P010 PAGE_COPY
-#define __P011 PAGE_COPY
-#define __P100 PAGE_READONLY_X
-#define __P101 PAGE_READONLY_X
-#define __P110 PAGE_COPY_X
-#define __P111 PAGE_COPY_X
-
-#define __S000 PAGE_NONE
-#define __S001 PAGE_READONLY
-#define __S010 PAGE_SHARED
-#define __S011 PAGE_SHARED
-#define __S100 PAGE_READONLY_X
-#define __S101 PAGE_READONLY_X
-#define __S110 PAGE_SHARED_X
-#define __S111 PAGE_SHARED_X
-
 #ifndef __ASSEMBLY__
 
 #ifndef MAX_PTRS_PER_PGD
diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c
index c475cf810aa8..7f05e7903bd2 100644
--- a/arch/powerpc/mm/mmap.c
+++ b/arch/powerpc/mm/mmap.c
@@ -254,3 +254,50 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct 
rlimit *rlim_stack)
mm->get_unmapped_area = arch_get_unmapped_area_topdown;
}
 }
+
+static inline pgprot_t __vm_get_page_prot(unsigned long vm_flags)
+{
+   switch (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) {
+   case VM_NONE:
+   return PAGE_NONE;
+   case VM_READ:
+   return PAGE_READONLY;
+   case VM_WRITE:
+   return PAGE_COPY;
+   case VM_READ | VM_WRITE:
+   return PAGE_COPY;
+   case VM_EXEC:
+   return PAGE_READONLY_X;
+   case VM_EXEC | VM_READ:
+   return PAGE_READONLY_X;
+   case VM_EXEC | VM_WRITE:
+   return PAGE_COPY_X;
+   case VM_EXEC | VM_READ | VM_WRITE:
+   return PAGE_COPY_X;
+   case VM_SHARED:
+   return PAGE_NONE;
+   case VM_SHARED | VM_READ:
+   return PAGE_READONLY;
+   case VM_SHARED | VM_WRITE:
+   return PAGE_SHARED;
+   case VM_SHARED | VM_READ | VM_WRITE:
+   return PAGE_SHARED;
+   case VM_SHARED | VM_EXEC:
+   return PAGE_READONLY_X;
+   case VM_SHARED | VM_EXEC | VM_READ:
+   return PAGE_READONLY_X;
+   case VM_SHARED | VM_EXEC | VM_WRITE:
+   return PAGE_SHARED_X;
+   case VM_SHARED | VM_EXEC | VM_READ | VM_WRITE:
+   return PAGE_SHARED_X;
+   default:
+   BUILD_BUG();
+   }
+}
+
+pgprot_t vm_get_page_prot(unsigned long vm_flags)
+{
+   return __pgprot(pgprot_val(__vm_get_page_prot(vm_flags)) |
+  pgprot_val(powerpc_vm_get_page_prot(vm_flags)));
+}
+EXPORT_SYMBOL(vm_get_page_prot);
-- 
2.25.1



Re: [PATCH 1/7] modules: Refactor within_module_core() and within_module_init()

2022-01-24 Thread Christophe Leroy


Le 24/01/2022 à 13:32, Christoph Hellwig a écrit :
> On Mon, Jan 24, 2022 at 09:22:15AM +, Christophe Leroy wrote:
>> +static inline bool within_range(unsigned long addr, void *base, unsigned 
>> int size)
> 
> Please avoid the overly long line.
> 
> .. But given that this function only has a single caller I see no
> point in factoring it out anyway.

Patch 2 brings a second caller.

Having it in patch 1 reduces churn in patch 2. Is that the wrong way to do it?

Christophe

Re: [PATCH 1/7] modules: Refactor within_module_core() and within_module_init()

2022-01-24 Thread Christoph Hellwig
On Mon, Jan 24, 2022 at 09:22:15AM +, Christophe Leroy wrote:
> +static inline bool within_range(unsigned long addr, void *base, unsigned int 
> size)

Please avoid the overly long line.

.. But given that this function only has a single caller I see no
point in factoring it out anyway.


Re: [PATCH 7/7] powerpc: Select ARCH_WANTS_MODULES_DATA_IN_VMALLOC on book3s/32 and 8xx

2022-01-24 Thread kernel test robot
Hi Christophe,

I love your patch! Perhaps something to improve:

[auto build test WARNING on mcgrof/modules-next]
[also build test WARNING on powerpc/next linus/master jeyu/modules-next 
v5.17-rc1 next-20220124]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Christophe-Leroy/Allocate-module-text-and-data-separately/20220124-172517
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git 
modules-next
config: powerpc-allmodconfig 
(https://download.01.org/0day-ci/archive/20220124/202201242036.ojeeplob-...@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/2a5f7a254dd5c1efcfb852f5747632c85582016d
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Christophe-Leroy/Allocate-module-text-and-data-separately/20220124-172517
git checkout 2a5f7a254dd5c1efcfb852f5747632c85582016d
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross 
O=build_dir ARCH=powerpc SHELL=/bin/bash kernel/debug/kdb/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   kernel/debug/kdb/kdb_main.c: In function 'kdb_lsmod':
>> kernel/debug/kdb/kdb_main.c:2027:38: warning: format '%p' expects a matching 
>> 'void *' argument [-Wformat=]
2027 | kdb_printf("/%8u  0x%px ", mod->data_layout.size);
 | ~^
 |  |
 |  void *


vim +2027 kernel/debug/kdb/kdb_main.c

5d5314d6795f3c1 Jason Wessel 2010-05-20  2006  
5d5314d6795f3c1 Jason Wessel 2010-05-20  2007  #if defined(CONFIG_MODULES)
5d5314d6795f3c1 Jason Wessel 2010-05-20  2008  /*
5d5314d6795f3c1 Jason Wessel 2010-05-20  2009   * kdb_lsmod - This function 
implements the 'lsmod' command.  Lists
5d5314d6795f3c1 Jason Wessel 2010-05-20  2010   *   currently loaded kernel 
modules.
5d5314d6795f3c1 Jason Wessel 2010-05-20  2011   *   Mostly taken from 
userland lsmod.
5d5314d6795f3c1 Jason Wessel 2010-05-20  2012   */
5d5314d6795f3c1 Jason Wessel 2010-05-20  2013  static int kdb_lsmod(int 
argc, const char **argv)
5d5314d6795f3c1 Jason Wessel 2010-05-20  2014  {
5d5314d6795f3c1 Jason Wessel 2010-05-20  2015   struct module *mod;
5d5314d6795f3c1 Jason Wessel 2010-05-20  2016  
5d5314d6795f3c1 Jason Wessel 2010-05-20  2017   if (argc != 0)
5d5314d6795f3c1 Jason Wessel 2010-05-20  2018   return 
KDB_ARGCOUNT;
5d5314d6795f3c1 Jason Wessel 2010-05-20  2019  
5d5314d6795f3c1 Jason Wessel 2010-05-20  2020   kdb_printf("Module  
Size  modstruct Used by\n");
5d5314d6795f3c1 Jason Wessel 2010-05-20  2021   
list_for_each_entry(mod, kdb_modules, list) {
0d21b0e3477395e Rusty Russell2013-01-12  2022   if (mod->state 
== MODULE_STATE_UNFORMED)
0d21b0e3477395e Rusty Russell2013-01-12  2023   
continue;
5d5314d6795f3c1 Jason Wessel 2010-05-20  2024  
299a20e0bead4b7 Christophe Leroy 2022-01-24  2025   
kdb_printf("%-20s%8u", mod->name, mod->core_layout.size);
299a20e0bead4b7 Christophe Leroy 2022-01-24  2026  #ifdef 
CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
299a20e0bead4b7 Christophe Leroy 2022-01-24 @2027   
kdb_printf("/%8u  0x%px ", mod->data_layout.size);
299a20e0bead4b7 Christophe Leroy 2022-01-24  2028  #endif
299a20e0bead4b7 Christophe Leroy 2022-01-24  2029   kdb_printf("  
0x%px ", (void *)mod);
5d5314d6795f3c1 Jason Wessel 2010-05-20  2030  #ifdef CONFIG_MODULE_UNLOAD
d5db139ab376464 Rusty Russell2015-01-22  2031   kdb_printf("%4d 
", module_refcount(mod));
5d5314d6795f3c1 Jason Wessel 2010-05-20  2032  #endif
5d5314d6795f3c1 Jason Wessel 2010-05-20  2033   if (mod->state 
== MODULE_STATE_GOING)
5d5314d6795f3c1 Jason Wessel 2010-05-20  2034   
kdb_printf(" (Unloading)");
5d5314d6795f3c1 Jason Wessel 2010-05-20  2035   else if 
(mod->state == MODULE_STATE_COMING)
5d5314d6795f3c1 Jason Wessel 2010-05-20  2036   
kdb_printf(" (Loading)");
5d5314d6795f3c1 Jason Wessel 2010-05-20  2037   else
5d5314d6795f3c1 

[PATCH 2/2] KVM: PPC: Book3S PR: Disallow AIL != 0

2022-01-24 Thread Nicholas Piggin
KVM PR does not implement address translation modes on interrupt, so it
must not allow H_SET_MODE to succeed.

This is not compatible with QEMU behaviour. The solution might be to
have a cap-ail for this, but it's broken either way now, so fix it in
KVM to start with.

This allows PR Linux guests that are using the SCV facility to boot and
run, because Linux disables the use of SCV if AIL can not be set to 3.
This isn't a real fix because Linux or another OS could implement real
mode SCV vectors and try to enable it. The right solution is for KVM to
emulate scv interrupts from the facility unavailable interrupt.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_pr_papr.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_pr_papr.c 
b/arch/powerpc/kvm/book3s_pr_papr.c
index 1f10e7dfcdd0..dc4f51ac84bc 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -281,6 +281,22 @@ static int kvmppc_h_pr_logical_ci_store(struct kvm_vcpu 
*vcpu)
return EMULATE_DONE;
 }
 
+static int kvmppc_h_pr_set_mode(struct kvm_vcpu *vcpu)
+{
+   unsigned long mflags = kvmppc_get_gpr(vcpu, 4);
+   unsigned long resource = kvmppc_get_gpr(vcpu, 5);
+
+   if (resource == H_SET_MODE_RESOURCE_ADDR_TRANS_MODE) {
+   /* KVM PR does not provide AIL!=0 to guests */
+   if (mflags == 0)
+   kvmppc_set_gpr(vcpu, 3, H_SUCCESS);
+   else
+   kvmppc_set_gpr(vcpu, 3, H_UNSUPPORTED_FLAG_START - 63);
+   return EMULATE_DONE;
+   }
+   return EMULATE_FAIL;
+}
+
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 static int kvmppc_h_pr_put_tce(struct kvm_vcpu *vcpu)
 {
@@ -384,6 +400,8 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
return kvmppc_h_pr_logical_ci_load(vcpu);
case H_LOGICAL_CI_STORE:
return kvmppc_h_pr_logical_ci_store(vcpu);
+   case H_SET_MODE:
+   return kvmppc_h_pr_set_mode(vcpu);
case H_XIRR:
case H_CPPR:
case H_EOI:
@@ -421,6 +439,7 @@ int kvmppc_hcall_impl_pr(unsigned long cmd)
case H_CEDE:
case H_LOGICAL_CI_LOAD:
case H_LOGICAL_CI_STORE:
+   case H_SET_MODE:
 #ifdef CONFIG_KVM_XICS
case H_XIRR:
case H_CPPR:
@@ -447,6 +466,7 @@ static unsigned int default_hcall_list[] = {
H_BULK_REMOVE,
H_PUT_TCE,
H_CEDE,
+   H_SET_MODE,
 #ifdef CONFIG_KVM_XICS
H_XIRR,
H_CPPR,
-- 
2.23.0



[PATCH 1/2] KVM: PPC: Book3S PR: Disable SCV when running AIL is disabled

2022-01-24 Thread Nicholas Piggin
PR KVM does not support running with AIL enabled, and SCV is not
supported with AIL disabled.

Fix this by ensuring the SCV facility is disabled with FSCR while a
CPU can be running with AIL=0. PowerNV host supports disabling AIL on a
per-CPU basis, so SCV just needs to be disabled when a vCPU is run.

The pSeries machine can only switch AIL on a system-wide basis, so it
must disable SCV support at boot if the configuration can potentially
run a PR KVM guest.

SCV is not emulated for the PR guest at the moment, this just fixes the
host crashes.

Alternatives considered and rejected:
- SCV support can not be disabled by PR KVM after boot, because it is
  advertised to userspace with HWCAP.
- AIL can not be disabled on a per-CPU basis. At least when running on
  pseries it is a per-LPAR setting.
- Support for real-mode SCV vectors will not be added because they are
  at 0x17000, so such a large fixed head space causes immediate
  value limits to be exceeded, requiring a lot of rework and more code.
- Disabling SCV for any PR KVM possible kernel will cause a slowdown
  when not using PR KVM.
- A boot time option to disable SCV to use PR KVM is user-hostile.
- System call instruction emulation for SCV facility unavailable
  instructions is too complex and old emulation code was subtly broken
  and removed.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S |  4 
 arch/powerpc/kernel/setup_64.c   | 15 +++
 arch/powerpc/kvm/book3s_pr.c | 20 ++--
 3 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 55caeee37c08..b66dd6f775a4 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -809,6 +809,10 @@ __start_interrupts:
  * - MSR_EE|MSR_RI is clear (no reentrant exceptions)
  * - Standard kernel environment is set up (stack, paca, etc)
  *
+ * KVM:
+ * These interrupts do not elevate HV 0->1, so HV is not involved. PR KVM
+ * ensures that FSCR[SCV] is disabled whenever it has to force AIL off.
+ *
  * Call convention:
  *
  * syscall register convention is in Documentation/powerpc/syscall64-abi.rst
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index be8577ac9397..ac52c69a3811 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -197,6 +197,21 @@ static void __init configure_exceptions(void)
 
/* Under a PAPR hypervisor, we need hypercalls */
if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
+   /*
+* PR KVM does not support AIL mode interrupts in the host, and
+* SCV system call interrupt vectors are only implemented for
+* AIL mode. Under pseries, AIL mode can only be enabled and
+* disabled system-wide so when PR KVM is loaded, all CPUs in
+* the host are set to AIL=0 mode. SCV can not be disabled
+* dynamically because the feature is advertised to host
+* userspace, so SCV support must not be enabled if PR KVM can
+* possibly be run.
+*/
+   if (IS_ENABLED(CONFIG_KVM_BOOK3S_PR_POSSIBLE) && !radix_enabled()) {
+   init_task.thread.fscr &= ~FSCR_SCV;
+   cur_cpu_spec->cpu_user_features2 &= ~PPC_FEATURE2_SCV;
+   }
+
/* Enable AIL if possible */
if (!pseries_enable_reloc_on_exc()) {
init_task.thread.fscr &= ~FSCR_SCV;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 34a801c3604a..4d1c84b94b77 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -140,9 +140,12 @@ static void kvmppc_core_vcpu_load_pr(struct kvm_vcpu *vcpu, int cpu)
 #endif
 
/* Disable AIL if supported */
-   if (cpu_has_feature(CPU_FTR_HVMODE) &&
-   cpu_has_feature(CPU_FTR_ARCH_207S))
-   mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_AIL);
+   if (cpu_has_feature(CPU_FTR_HVMODE)) {
+   if (cpu_has_feature(CPU_FTR_ARCH_207S))
+   mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_AIL);
+   if (cpu_has_feature(CPU_FTR_ARCH_300) && (current->thread.fscr & FSCR_SCV))
+   mtspr(SPRN_FSCR, mfspr(SPRN_FSCR) & ~FSCR_SCV);
+   }
 
vcpu->cpu = smp_processor_id();
 #ifdef CONFIG_PPC_BOOK3S_32
@@ -175,9 +178,12 @@ static void kvmppc_core_vcpu_put_pr(struct kvm_vcpu *vcpu)
kvmppc_save_tm_pr(vcpu);
 
/* Enable AIL if supported */
-   if (cpu_has_feature(CPU_FTR_HVMODE) &&
-   cpu_has_feature(CPU_FTR_ARCH_207S))
-   mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_AIL_3);
+   if (cpu_has_feature(CPU_FTR_HVMODE)) {
+   if (cpu_has_feature(CPU_FTR_ARCH_207S))
+   mtspr(SPRN_LPCR, mfspr(SPRN

[PATCH 0/2] KVM: PPC: Book3S PR: SCV fixes

2022-01-24 Thread Nicholas Piggin
These patches seem to be the quickest and easiest way to avoid the
host crash and to get recent Linux images (that use scv) to run under
PR.

These are independent of the syscall emulation series. Those just came
about when looking at ways to fix the host problem and seeing emulation
was broken.

Thanks,
Nick

Nicholas Piggin (2):
  KVM: PPC: Book3S PR: Disable SCV when AIL could be disabled
  KVM: PPC: Book3S PR: Disallow AIL != 0

 arch/powerpc/kernel/exceptions-64s.S |  4 
 arch/powerpc/kernel/setup_64.c   | 15 +++
 arch/powerpc/kvm/book3s_pr.c | 20 ++--
 arch/powerpc/kvm/book3s_pr_papr.c| 20 
 4 files changed, 53 insertions(+), 6 deletions(-)

-- 
2.23.0



Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages

2022-01-24 Thread Oscar Salvador

On 2022-01-19 20:06, Zi Yan wrote:

From: Zi Yan 

Enable set_migratetype_isolate() to check specified sub-range for
unmovable pages during isolation. Page isolation is done
at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
pages within that granularity are intended to be isolated. For example,
alloc_contig_range(), which uses page isolation, allows ranges without
alignment. This commit makes unmovable page check only look for
interesting pages, so that page isolation can succeed for any
non-overlapping ranges.


Hi Zi Yan,

I had to re-read this several times as I found this a bit misleading.
I was mainly confused by the fact that memory_hotplug does isolation on 
PAGES_PER_SECTION granularity, while reading the above seems to indicate 
that it is done either at MAX_ORDER_NR_PAGES or at pageblock_nr_pages 
granularity.


It is true that start_isolate_page_range() expects the range to be 
pageblock aligned and works in pageblock_nr_pages chunks, but I do not 
think that is what you meant to say here.


Now, to the change itself, below:



@@ -47,8 +51,8 @@ static struct page *has_unmovable_pages(struct zone
*zone, struct page *page,
return page;
}

-   for (; iter < pageblock_nr_pages - offset; iter++) {
-   page = pfn_to_page(pfn + iter);
+   for (pfn = first_pfn; pfn < last_pfn; pfn++) {


You already did pfn = first_pfn before.


 /**
  * start_isolate_page_range() - make page-allocation-type of range of 
pages to

  * be MIGRATE_ISOLATE.
- * @start_pfn: The lower PFN of the range to be isolated.
- * @end_pfn:   The upper PFN of the range to be isolated.
+ * @start_pfn: The lower PFN of the range to be checked for
+ * possibility of isolation.
+ * @end_pfn:   The upper PFN of the range to be checked for
+ * possibility of isolation.
+ * @isolate_start: The lower PFN of the range to be isolated.
+ * @isolate_end:   The upper PFN of the range to be isolated.


So, what does "possibility" mean here? I think this needs to be 
clarified a bit better.


From what you pointed out in the commit message I think what you are 
doing is:


- alloc_contig_range() gets a range to be isolated.
- then you pass two ranges to start_isolate_page_range()
  (start_pfn, end_pfn]: which is the unaligned range you got in alloc_contig_range()
  (isolate_start, isolate_end]: which got aligned to, let's say, to MAX_ORDER_NR_PAGES


Now, most likely, (start_pfn, end_pfn] only covers a sub-range of 
(isolate_start, isolate_end], and that sub-range is what you really 
want to isolate (so (start_pfn, end_pfn])?

If so, should not the names be reversed?


--
Oscar Salvador
SUSE Labs


[PATCH 7/7] powerpc: Select ARCH_WANTS_MODULES_DATA_IN_VMALLOC on book3s/32 and 8xx

2022-01-24 Thread Christophe Leroy
book3s/32 and 8xx have a separate area for allocating modules,
defined by MODULES_VADDR / MODULES_END.

On book3s/32, it is not possible to protect against execution
on a page basis. A full 256M segment is either Exec or NoExec.
The module area is in an Exec segment while vmalloc area is
in a NoExec segment.

In order to protect module data against execution, select
ARCH_WANTS_MODULES_DATA_IN_VMALLOC.

For the 8xx (and possibly other 32-bit platforms in the future),
there is no such constraint on Exec/NoExec protection, however
there is a critical distance between kernel functions and callers
that needs to remain below 32 Mbytes in order to avoid costly
trampolines. By allocating data outside of the module area, we
increase the chance for module text to remain within an acceptable
distance from the kernel core text.

So select ARCH_WANTS_MODULES_DATA_IN_VMALLOC for 8xx as well.

Signed-off-by: Christophe Leroy 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 0631c9241af3..0360d6438359 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -161,6 +161,7 @@ config PPC
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
select ARCH_WANT_LD_ORPHAN_WARN
+   select ARCH_WANTS_MODULES_DATA_IN_VMALLOC   if PPC_BOOK3S_32 || PPC_8xx
select ARCH_WEAK_RELEASE_ACQUIRE
select BINFMT_ELF
select BUILDTIME_TABLE_SORT
-- 
2.33.1


[PATCH 6/7] modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC

2022-01-24 Thread Christophe Leroy
Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC to allow architectures
to request having modules data in vmalloc area instead of module area.

This is required on powerpc book3s/32 in order to set data non
executable, because it is not possible to set executability on a
per-page basis; it is done per 256 Mbyte segment. The module area
has exec rights, the vmalloc area has noexec.

This can also be useful on other powerpc/32 in order to maximize the
chance of code being close enough to kernel core to avoid branch
trampolines.

Signed-off-by: Christophe Leroy 
Cc: Jason Wessel 
Cc: Daniel Thompson 
Cc: Douglas Anderson 
---
 arch/Kconfig|  6 +++
 include/linux/module.h  |  8 
 kernel/debug/kdb/kdb_main.c | 10 -
 kernel/module.c | 73 -
 4 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 847fde3d22cd..ed6a5ab8796d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -883,6 +883,12 @@ config MODULES_USE_ELF_REL
  Modules only use ELF REL relocations.  Modules with ELF RELA
  relocations will give an error.
 
+config ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+   bool
+   help
+ For architectures like powerpc/32 which have constraints on module
+ allocation and need to allocate module data outside of module area.
+
 config HAVE_IRQ_EXIT_ON_IRQ_STACK
bool
help
diff --git a/include/linux/module.h b/include/linux/module.h
index fc7adb110a81..3d908bb7ed08 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -421,6 +421,9 @@ struct module {
/* Core layout: rbtree is accessed frequently, so keep together. */
struct module_layout core_layout __module_layout_align;
struct module_layout init_layout;
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+   struct module_layout data_layout;
+#endif
 
/* Arch-specific module values */
struct mod_arch_specific arch;
@@ -592,7 +595,12 @@ static inline bool within_module_layout(unsigned long addr,
 static inline bool within_module_core(unsigned long addr,
  const struct module *mod)
 {
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+   return within_module_layout(addr, &mod->core_layout) ||
+  within_module_layout(addr, &mod->data_layout);
+#else
return within_module_layout(addr, &mod->core_layout);
+#endif
 }
 
 static inline bool within_module_init(unsigned long addr,
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 0852a537dad4..b09e92f2c78d 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2022,8 +2022,11 @@ static int kdb_lsmod(int argc, const char **argv)
if (mod->state == MODULE_STATE_UNFORMED)
continue;
 
-   kdb_printf("%-20s%8u  0x%px ", mod->name,
-  mod->core_layout.size, (void *)mod);
+   kdb_printf("%-20s%8u", mod->name, mod->core_layout.size);
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+   kdb_printf("/%8u", mod->data_layout.size);
+#endif
+   kdb_printf("  0x%px ", (void *)mod);
 #ifdef CONFIG_MODULE_UNLOAD
kdb_printf("%4d ", module_refcount(mod));
 #endif
@@ -2034,6 +2037,9 @@ static int kdb_lsmod(int argc, const char **argv)
else
kdb_printf(" (Live)");
kdb_printf(" 0x%px", mod->core_layout.base);
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+   kdb_printf("/0x%px", mod->data_layout.base);
+#endif
 
 #ifdef CONFIG_MODULE_UNLOAD
{
diff --git a/kernel/module.c b/kernel/module.c
index de1a9de6a544..53486a65750e 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -81,7 +81,9 @@
 /* If this is set, the section belongs in the init part of the module */
 #define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
 
+#ifndef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
 #definedata_layout core_layout
+#endif
 
 /*
  * Mutex protects:
@@ -108,6 +110,12 @@ static struct mod_tree_root {
.addr_min = -1UL,
 };
 
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+static struct mod_tree_root mod_data_tree __cacheline_aligned = {
+   .addr_min = -1UL,
+};
+#endif
+
 #ifdef CONFIG_MODULES_TREE_LOOKUP
 
 /*
@@ -183,6 +191,11 @@ static void mod_tree_insert(struct module *mod)
__mod_tree_insert(&mod->core_layout.mtn, &mod_tree);
if (mod->init_layout.size)
__mod_tree_insert(&mod->init_layout.mtn, &mod_tree);
+
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+   mod->data_layout.mtn.mod = mod;
+   __mod_tree_insert(&mod->data_layout.mtn, &mod_data_tree);
+#endif
 }
 
 static void mod_tree_remove_init(struct module *mod)
@@ -195,6 +208,9 @@ static void mod_tree_remove(struct module *mod)
 {
__mod_tree_remove(&mod->core_layout.mtn, &mod_tree);
mod_tree_remove_in

[PATCH 5/7] modules: Introduce data_layout

2022-01-24 Thread Christophe Leroy
In order to allow separation of data from text, add another layout,
called data_layout. For architectures requesting separation of text
and data, only text will go in core_layout and data will go in
data_layout.

For architectures which keep text and data together, make data_layout
an alias of core_layout; that way data_layout can be used for all
data manipulations, regardless of whether data is in core_layout or
data_layout.

Signed-off-by: Christophe Leroy 
---
 kernel/module.c | 52 -
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 051fecef416b..de1a9de6a544 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -81,6 +81,8 @@
 /* If this is set, the section belongs in the init part of the module */
 #define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
 
+#definedata_layout core_layout
+
 /*
  * Mutex protects:
  * 1) List of modules (also safely readable with preempt_disable),
@@ -2012,19 +2014,20 @@ static void module_enable_ro(const struct module *mod, bool after_init)
set_vm_flush_reset_perms(mod->init_layout.base);
frob_text(&mod->core_layout, set_memory_ro);
 
-   frob_rodata(&mod->core_layout, set_memory_ro);
+   frob_rodata(&mod->data_layout, set_memory_ro);
+
frob_text(&mod->init_layout, set_memory_ro);
frob_rodata(&mod->init_layout, set_memory_ro);
 
if (after_init)
-   frob_ro_after_init(&mod->core_layout, set_memory_ro);
+   frob_ro_after_init(&mod->data_layout, set_memory_ro);
 }
 
 static void module_enable_nx(const struct module *mod)
 {
-   frob_rodata(&mod->core_layout, set_memory_nx);
-   frob_ro_after_init(&mod->core_layout, set_memory_nx);
-   frob_writable_data(&mod->core_layout, set_memory_nx);
+   frob_rodata(&mod->data_layout, set_memory_nx);
+   frob_ro_after_init(&mod->data_layout, set_memory_nx);
+   frob_writable_data(&mod->data_layout, set_memory_nx);
frob_rodata(&mod->init_layout, set_memory_nx);
frob_writable_data(&mod->init_layout, set_memory_nx);
 }
@@ -2202,7 +2205,7 @@ static void free_module(struct module *mod)
percpu_modfree(mod);
 
/* Free lock-classes; relies on the preceding sync_rcu(). */
-   lockdep_free_key_range(mod->core_layout.base, mod->core_layout.size);
+   lockdep_free_key_range(mod->data_layout.base, mod->data_layout.size);
 
/* Finally, free the core (containing the module structure) */
module_memfree(mod->core_layout.base);
@@ -2449,7 +2452,10 @@ static void layout_sections(struct module *mod, struct load_info *info)
|| s->sh_entsize != ~0UL
|| module_init_layout_section(sname))
continue;
-   s->sh_entsize = get_offset(mod, &mod->core_layout.size, s, i);
+   if (m)
+   s->sh_entsize = get_offset(mod, &mod->data_layout.size, s, i);
+   else
+   s->sh_entsize = get_offset(mod, &mod->core_layout.size, s, i);
pr_debug("\t%s\n", sname);
}
switch (m) {
@@ -2458,15 +2464,15 @@ static void layout_sections(struct module *mod, struct load_info *info)
mod->core_layout.text_size = mod->core_layout.size;
break;
case 1: /* RO: text and ro-data */
-   mod->core_layout.size = debug_align(mod->core_layout.size);
-   mod->core_layout.ro_size = mod->core_layout.size;
+   mod->data_layout.size = debug_align(mod->data_layout.size);
+   mod->data_layout.ro_size = mod->data_layout.size;
break;
case 2: /* RO after init */
-   mod->core_layout.size = debug_align(mod->core_layout.size);
-   mod->core_layout.ro_after_init_size = mod->core_layout.size;
+   mod->data_layout.size = debug_align(mod->data_layout.size);
+   mod->data_layout.ro_after_init_size = mod->data_layout.size;
break;
case 4: /* whole core */
-   mod->core_layout.size = debug_align(mod->core_layout.size);
+   mod->data_layout.size = debug_align(mod->data_layout.size);
break;
}
}
@@ -2719,12 +2725,12 @@ static void layout_symtab(struct module *mod, struct load_info *info)
}
 
/* Append room for core symbols at end of core part. */
-   info->symoffs = ALIGN(mod->core_layout.size, symsect->sh_addralign ?: 1);
-   info->stroffs = mod->core_layout.size = info->symoffs + ndst * sizeof(Elf_Sym);
-   mod->core_layout.size += strtab_size;
-   info->core_typeoffs = mod->core_l

[PATCH 4/7] modules: Prepare for handling several RB trees

2022-01-24 Thread Christophe Leroy
In order to separate text and data, we need to set up
two RB trees. So modify the functions to take the tree
as a parameter.

Signed-off-by: Christophe Leroy 
---
 kernel/module.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 346bc2e7a150..051fecef416b 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -159,14 +159,14 @@ static const struct latch_tree_ops mod_tree_ops = {
.comp = mod_tree_comp,
 };
 
-static noinline void __mod_tree_insert(struct mod_tree_node *node)
+static noinline void __mod_tree_insert(struct mod_tree_node *node, struct mod_tree_root *tree)
 {
-   latch_tree_insert(&node->node, &mod_tree.root, &mod_tree_ops);
+   latch_tree_insert(&node->node, &tree->root, &mod_tree_ops);
 }
 
-static void __mod_tree_remove(struct mod_tree_node *node)
+static void __mod_tree_remove(struct mod_tree_node *node, struct mod_tree_root *tree)
 {
-   latch_tree_erase(&node->node, &mod_tree.root, &mod_tree_ops);
+   latch_tree_erase(&node->node, &tree->root, &mod_tree_ops);
 }
 
 /*
@@ -178,28 +178,28 @@ static void mod_tree_insert(struct module *mod)
mod->core_layout.mtn.mod = mod;
mod->init_layout.mtn.mod = mod;
 
-   __mod_tree_insert(&mod->core_layout.mtn);
+   __mod_tree_insert(&mod->core_layout.mtn, &mod_tree);
if (mod->init_layout.size)
-   __mod_tree_insert(&mod->init_layout.mtn);
+   __mod_tree_insert(&mod->init_layout.mtn, &mod_tree);
 }
 
 static void mod_tree_remove_init(struct module *mod)
 {
if (mod->init_layout.size)
-   __mod_tree_remove(&mod->init_layout.mtn);
+   __mod_tree_remove(&mod->init_layout.mtn, &mod_tree);
 }
 
 static void mod_tree_remove(struct module *mod)
 {
-   __mod_tree_remove(&mod->core_layout.mtn);
+   __mod_tree_remove(&mod->core_layout.mtn, &mod_tree);
mod_tree_remove_init(mod);
 }
 
-static struct module *mod_find(unsigned long addr)
+static struct module *mod_find(unsigned long addr, struct mod_tree_root *tree)
 {
struct latch_tree_node *ltn;
 
-   ltn = latch_tree_find((void *)addr, &mod_tree.root, &mod_tree_ops);
+   ltn = latch_tree_find((void *)addr, &tree->root, &mod_tree_ops);
if (!ltn)
return NULL;
 
@@ -212,7 +212,7 @@ static void mod_tree_insert(struct module *mod) { }
 static void mod_tree_remove_init(struct module *mod) { }
 static void mod_tree_remove(struct module *mod) { }
 
-static struct module *mod_find(unsigned long addr)
+static struct module *mod_find(unsigned long addr, struct mod_tree_root *tree)
 {
struct module *mod;
 
@@ -231,22 +231,22 @@ static struct module *mod_find(unsigned long addr)
  * Bounds of module text, for speeding up __module_address.
  * Protected by module_mutex.
  */
-static void __mod_update_bounds(void *base, unsigned int size)
+static void __mod_update_bounds(void *base, unsigned int size, struct mod_tree_root *tree)
 {
unsigned long min = (unsigned long)base;
unsigned long max = min + size;
 
-   if (min < mod_tree.addr_min)
-   mod_tree.addr_min = min;
-   if (max > mod_tree.addr_max)
-   mod_tree.addr_max = max;
+   if (min < tree->addr_min)
+   tree->addr_min = min;
+   if (max > tree->addr_max)
+   tree->addr_max = max;
 }
 
 static void mod_update_bounds(struct module *mod)
 {
-   __mod_update_bounds(mod->core_layout.base, mod->core_layout.size);
+   __mod_update_bounds(mod->core_layout.base, mod->core_layout.size, &mod_tree);
if (mod->init_layout.size)
-   __mod_update_bounds(mod->init_layout.base, mod->init_layout.size);
+   __mod_update_bounds(mod->init_layout.base, mod->init_layout.size, &mod_tree);
 }
 
 #ifdef CONFIG_KGDB_KDB
@@ -4719,7 +4719,7 @@ struct module *__module_address(unsigned long addr)
 
module_assert_mutex_or_preempt();
 
-   mod = mod_find(addr);
+   mod = mod_find(addr, &mod_tree);
if (mod) {
BUG_ON(!within_module(addr, mod));
if (mod->state == MODULE_STATE_UNFORMED)
-- 
2.33.1


[PATCH 3/7] modules: Always have struct mod_tree_root

2022-01-24 Thread Christophe Leroy
In order to separate text and data, we need to set up
two RB trees.

This also means that struct mod_tree_root is required even without
MODULES_TREE_LOOKUP.

Also remove module_addr_min and module_addr_max as there will
be one min and one max for each tree.

Signed-off-by: Christophe Leroy 
---
 kernel/module.c | 39 ++-
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 201d27643c84..346bc2e7a150 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -85,7 +85,7 @@
  * Mutex protects:
  * 1) List of modules (also safely readable with preempt_disable),
  * 2) module_use links,
- * 3) module_addr_min/module_addr_max.
+ * 3) mod_tree.addr_min/mod_tree.addr_max.
  * (delete and add uses RCU list operations).
  */
 static DEFINE_MUTEX(module_mutex);
@@ -96,6 +96,16 @@ static void do_free_init(struct work_struct *w);
 static DECLARE_WORK(init_free_wq, do_free_init);
 static LLIST_HEAD(init_free_list);
 
+static struct mod_tree_root {
+#ifdef CONFIG_MODULES_TREE_LOOKUP
+   struct latch_tree_root root;
+#endif
+   unsigned long addr_min;
+   unsigned long addr_max;
+} mod_tree __cacheline_aligned = {
+   .addr_min = -1UL,
+};
+
 #ifdef CONFIG_MODULES_TREE_LOOKUP
 
 /*
@@ -149,17 +159,6 @@ static const struct latch_tree_ops mod_tree_ops = {
.comp = mod_tree_comp,
 };
 
-static struct mod_tree_root {
-   struct latch_tree_root root;
-   unsigned long addr_min;
-   unsigned long addr_max;
-} mod_tree __cacheline_aligned = {
-   .addr_min = -1UL,
-};
-
-#define module_addr_min mod_tree.addr_min
-#define module_addr_max mod_tree.addr_max
-
 static noinline void __mod_tree_insert(struct mod_tree_node *node)
 {
latch_tree_insert(&node->node, &mod_tree.root, &mod_tree_ops);
@@ -209,8 +208,6 @@ static struct module *mod_find(unsigned long addr)
 
 #else /* MODULES_TREE_LOOKUP */
 
-static unsigned long module_addr_min = -1UL, module_addr_max = 0;
-
 static void mod_tree_insert(struct module *mod) { }
 static void mod_tree_remove_init(struct module *mod) { }
 static void mod_tree_remove(struct module *mod) { }
@@ -239,10 +236,10 @@ static void __mod_update_bounds(void *base, unsigned int 
size)
unsigned long min = (unsigned long)base;
unsigned long max = min + size;
 
-   if (min < module_addr_min)
-   module_addr_min = min;
-   if (max > module_addr_max)
-   module_addr_max = max;
+   if (min < mod_tree.addr_min)
+   mod_tree.addr_min = min;
+   if (max > mod_tree.addr_max)
+   mod_tree.addr_max = max;
 }
 
 static void mod_update_bounds(struct module *mod)
@@ -4526,14 +4523,14 @@ static void cfi_init(struct module *mod)
mod->exit = *exit;
 #endif
 
-   cfi_module_add(mod, module_addr_min);
+   cfi_module_add(mod, mod_tree.addr_min);
 #endif
 }
 
 static void cfi_cleanup(struct module *mod)
 {
 #ifdef CONFIG_CFI_CLANG
-   cfi_module_remove(mod, module_addr_min);
+   cfi_module_remove(mod, mod_tree.addr_min);
 #endif
 }
 
@@ -4717,7 +4714,7 @@ struct module *__module_address(unsigned long addr)
 {
struct module *mod;
 
-   if (addr < module_addr_min || addr > module_addr_max)
+   if (addr < mod_tree.addr_min || addr > mod_tree.addr_max)
return NULL;
 
module_assert_mutex_or_preempt();
-- 
2.33.1


[PATCH 2/7] modules: Add within_module_text() macro

2022-01-24 Thread Christophe Leroy
Add a macro to check whether an address is within a module's text.

Signed-off-by: Christophe Leroy 
---
 include/linux/module.h | 13 +
 kernel/module.c| 17 +
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 33b4db8f5ca5..fc7adb110a81 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -570,6 +570,19 @@ static inline bool within_range(unsigned long addr, void *base, unsigned int size)
return addr >= (unsigned long)base && addr < (unsigned long)base + size;
 }
 
+static inline bool within_module_layout_text(unsigned long addr,
+const struct module_layout *layout)
+{
+   return within_range(addr, layout->base, layout->text_size);
+}
+
+static inline bool within_module_text(unsigned long addr,
+ const struct module *mod)
+{
+   return within_module_layout_text(addr, &mod->core_layout) ||
+  within_module_layout_text(addr, &mod->init_layout);
+}
+
 static inline bool within_module_layout(unsigned long addr,
const struct module_layout *layout)
 {
diff --git a/kernel/module.c b/kernel/module.c
index 84a9141a5e15..201d27643c84 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -4224,11 +4224,6 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user 
*, uargs, int, flags)
return load_module(&info, uargs, flags);
 }
 
-static inline int within(unsigned long addr, void *start, unsigned long size)
-{
-   return ((void *)addr >= start && (void *)addr < start + size);
-}
-
 #ifdef CONFIG_KALLSYMS
 /*
  * This ignores the intensely annoying "mapping symbols" found
@@ -4765,13 +4760,11 @@ bool is_module_text_address(unsigned long addr)
 struct module *__module_text_address(unsigned long addr)
 {
struct module *mod = __module_address(addr);
-   if (mod) {
-   /* Make sure it's within the text section. */
-   if (!within(addr, mod->init_layout.base, mod->init_layout.text_size)
-   && !within(addr, mod->core_layout.base, mod->core_layout.text_size))
-   mod = NULL;
-   }
-   return mod;
+
+   if (mod && within_module_text(addr, mod))
+   return mod;
+
+   return NULL;
 }
 
 /* Don't grab lock, we're oopsing. */
-- 
2.33.1


[PATCH 0/7] Allocate module text and data separately

2022-01-24 Thread Christophe Leroy
This series allow architectures to request having modules data in
vmalloc area instead of module area.

This is required on powerpc book3s/32 in order to set data non
executable, because it is not possible to set executability on a
per-page basis; it is done per 256 Mbyte segment. The module area
has exec rights, the vmalloc area has noexec.

This can also be useful on other powerpc/32 in order to maximize the
chance of code being close enough to kernel core to avoid branch
trampolines.

Christophe Leroy (7):
  modules: Refactor within_module_core() and within_module_init()
  modules: Add within_module_text() macro
  modules: Always have struct mod_tree_root
  modules: Prepare for handling several RB trees
  modules: Introduce data_layout
  modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
  powerpc: Select ARCH_WANTS_MODULES_DATA_IN_VMALLOC on book3s/32 and
8xx

 arch/Kconfig|   6 ++
 arch/powerpc/Kconfig|   1 +
 include/linux/module.h  |  38 ++-
 kernel/debug/kdb/kdb_main.c |  10 +-
 kernel/module.c | 207 
 5 files changed, 186 insertions(+), 76 deletions(-)

-- 
2.33.1


[PATCH 1/7] modules: Refactor within_module_core() and within_module_init()

2022-01-24 Thread Christophe Leroy
within_module_core() and within_module_init() are doing the exact same
test, one on core_layout, the second on init_layout.

In preparation of increasing the complexity of that verification,
refactor it into a single function called within_module_layout().

Signed-off-by: Christophe Leroy 
---
 include/linux/module.h | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index c9f1200b2312..33b4db8f5ca5 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -565,18 +565,27 @@ bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr);
 bool is_module_percpu_address(unsigned long addr);
 bool is_module_text_address(unsigned long addr);
 
+static inline bool within_range(unsigned long addr, void *base, unsigned int size)
+{
+   return addr >= (unsigned long)base && addr < (unsigned long)base + size;
+}
+
+static inline bool within_module_layout(unsigned long addr,
+   const struct module_layout *layout)
+{
+   return within_range(addr, layout->base, layout->size);
+}
+
 static inline bool within_module_core(unsigned long addr,
  const struct module *mod)
 {
-   return (unsigned long)mod->core_layout.base <= addr &&
-  addr < (unsigned long)mod->core_layout.base + mod->core_layout.size;
+   return within_module_layout(addr, &mod->core_layout);
 }
 
 static inline bool within_module_init(unsigned long addr,
  const struct module *mod)
 {
-   return (unsigned long)mod->init_layout.base <= addr &&
-  addr < (unsigned long)mod->init_layout.base + mod->init_layout.size;
+   return within_module_layout(addr, &mod->init_layout);
 }
 
 static inline bool within_module(unsigned long addr, const struct module *mod)
-- 
2.33.1


Re: Build regressions/improvements in v5.17-rc1

2022-01-24 Thread Geert Uytterhoeven

On Sun, 23 Jan 2022, Geert Uytterhoeven wrote:

Below is the list of build error/warning regressions/improvements in
v5.17-rc1[1] compared to v5.16[2].

Summarized:
 - build errors: +17/-2
 - build warnings: +23/-25

Note that there may be false regressions, as some logs are incomplete.
Still, they're build errors/warnings.

Happy fixing! ;-)

Thanks to the linux-next team for providing the build service.

[1] 
http://kisskb.ellerman.id.au/kisskb/branch/linus/head/e783362eb54cd99b2cac8b3a9aeac942e6f6ac07/
 (all 99 configs)
[2] 
http://kisskb.ellerman.id.au/kisskb/branch/linus/head/df0cc57e057f18e44dac8e6c18aba47ab53202f9/
 (98 out of 99 configs)


*** ERRORS ***

17 error regressions:
 + /kisskb/src/arch/powerpc/kernel/stacktrace.c: error: implicit declaration of 
function 'nmi_cpu_backtrace' [-Werror=implicit-function-declaration]:  => 171:2
 + /kisskb/src/arch/powerpc/kernel/stacktrace.c: error: implicit declaration of 
function 'nmi_trigger_cpumask_backtrace' [-Werror=implicit-function-declaration]:  
=> 226:2


powerpc-gcc5/skiroot_defconfig


 + /kisskb/src/arch/sparc/mm/srmmu.c: error: cast between incompatible function 
types from 'void (*)(long unsigned int)' to 'void (*)(long unsigned int,  long 
unsigned int,  long unsigned int,  long unsigned int,  long unsigned int)' 
[-Werror=cast-function-type]:  => 1756:13, 1639:13
 + /kisskb/src/arch/sparc/mm/srmmu.c: error: cast between incompatible function 
types from 'void (*)(struct mm_struct *)' to 'void (*)(long unsigned int,  long 
unsigned int,  long unsigned int,  long unsigned int,  long unsigned int)' 
[-Werror=cast-function-type]:  => 1674:29, 1662:29
 + /kisskb/src/arch/sparc/mm/srmmu.c: error: cast between incompatible function 
types from 'void (*)(struct mm_struct *, long unsigned int)' to 'void (*)(long 
unsigned int,  long unsigned int,  long unsigned int,  long unsigned int,  long 
unsigned int)' [-Werror=cast-function-type]:  => 1767:21
 + /kisskb/src/arch/sparc/mm/srmmu.c: error: cast between incompatible function 
types from 'void (*)(struct vm_area_struct *, long unsigned int)' to 'void 
(*)(long unsigned int,  long unsigned int,  long unsigned int,  long unsigned int, 
 long unsigned int)' [-Werror=cast-function-type]:  => 1741:29, 1726:29
 + /kisskb/src/arch/sparc/mm/srmmu.c: error: cast between incompatible function 
types from 'void (*)(struct vm_area_struct *, long unsigned int,  long unsigned 
int)' to 'void (*)(long unsigned int,  long unsigned int,  long unsigned int,  
long unsigned int,  long unsigned int)' [-Werror=cast-function-type]:  => 
1694:29, 1711:29


sparc64-gcc11/sparc-allmodconfig


 + /kisskb/src/arch/um/include/asm/processor-generic.h: error: called object is 
not a function or function pointer:  => 103:18
 + /kisskb/src/drivers/vfio/pci/vfio_pci_rdwr.c: error: assignment makes pointer 
from integer without a cast [-Werror=int-conversion]:  => 324:9, 317:9
 + /kisskb/src/drivers/vfio/pci/vfio_pci_rdwr.c: error: implicit declaration of 
function 'ioport_map' [-Werror=implicit-function-declaration]:  => 317:11
 + /kisskb/src/drivers/vfio/pci/vfio_pci_rdwr.c: error: implicit declaration of 
function 'ioport_unmap' [-Werror=implicit-function-declaration]:  => 338:15


um-x86_64/um-allyesconfig


 + /kisskb/src/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: error: control 
reaches end of non-void function [-Werror=return-type]:  => 1560:1


um-x86_64/um-all{mod,yes}config


 + /kisskb/src/drivers/net/ethernet/freescale/fec_mpc52xx.c: error: passing 
argument 2 of 'mpc52xx_fec_set_paddr' discards 'const' qualifier from pointer 
target type [-Werror=discarded-qualifiers]:  => 659:29


powerpc-gcc5/ppc32_allmodconfig


 + /kisskb/src/drivers/pinctrl/pinctrl-thunderbay.c: error: assignment discards 
'const' qualifier from pointer target type [-Werror=discarded-qualifiers]:  => 
815:8, 815:29


arm64-gcc5.4/arm64-allmodconfig
arm64-gcc8/arm64-allmodconfig


 + /kisskb/src/lib/test_printf.c: error: "PTR" redefined [-Werror]:  => 247:0, 
247
 + /kisskb/src/sound/pci/ca0106/ca0106.h: error: "PTR" redefined [-Werror]:  => 
62, 62:0


mips-gcc8/mips-allmodconfig
mipsel/mips-allmodconfig


 + error: arch/powerpc/kvm/book3s_64_entry.o: relocation truncated to fit: 
R_PPC64_REL14 (stub) against symbol `machine_check_common' defined in .text 
section in arch/powerpc/kernel/head_64.o:  => (.text+0x3e4)


powerpc-gcc5/powerpc-allyesconfig

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds